Statistics

I

Introduction

Statistics, branch of mathematics that deals with the collection, organization, and analysis of numerical data and with such problems as experiment design and decision making.

II

History

Simple forms of statistics have been used since the beginning of civilization, when pictorial representations or other symbols were used to record numbers of people, animals, and inanimate objects on skins, slabs, or sticks of wood and the walls of caves. Before 3000 BC the Babylonians used small clay tablets to record tabulations of agricultural yields and of commodities bartered or sold. The Egyptians analyzed the population and material wealth of their country before beginning to build the pyramids in the 31st century BC. The biblical books of Numbers and 1 Chronicles contain extensive statistical records, the former containing two separate censuses of the Israelites and the latter describing the material wealth of various Jewish tribes. Similar numerical records existed in China before 2000 BC. As early as 594 BC the ancient Greeks held censuses that served as a basis for taxation. See Census.

The Roman Empire was the first government to gather extensive data about the population, area, and wealth of the territories that it controlled. During the Middle Ages in Europe few comprehensive censuses were made. The Carolingian kings Pepin the Short and Charlemagne ordered surveys of ecclesiastical holdings: Pepin in 758 and Charlemagne in 762. Following the Norman Conquest of England in 1066, William I, king of England, ordered a census to be taken; the information gathered in this census, conducted in 1086, was recorded in the Domesday Book. The registration of births and deaths began in England in the early 16th century, and in 1662 John Graunt published the first noteworthy statistical study of population, Observations on the London Bills of Mortality. A similar study of mortality made in Breslau (now Wrocław, Poland) in 1691 was used by the English astronomer Edmond Halley as the basis for the earliest mortality table. In the 19th century, as the scientific method was applied to phenomena throughout the natural and social sciences, investigators recognized the need to reduce information to numerical values to avoid the ambiguity of verbal description.

Today statistics is a reliable means of accurately describing economic, political, social, psychological, biological, and physical data, and it serves as a tool for correlating and analyzing such data. The statistician's work is no longer confined to gathering and tabulating data; it is chiefly a process of interpreting the information. The development of the theory of probability broadened the scope of statistical applications. Many sets of data can be closely approximated by known probability distributions, and the properties of those distributions can then be used to analyze the data. Probability can also be used to test the reliability of statistical inferences and to indicate the kind and amount of data required for a particular problem.



III

Statistical Methods

The raw materials of statistics are sets of numbers obtained from enumerations or measurements. In collecting statistical data, adequate precautions must be taken to secure complete and accurate information.

The first problem of the statistician is to determine what data to collect and how much. The census taker who seeks an accurate and complete count of the population, like the physicist who wishes to count the number of molecular collisions per second in a given volume of gas under given conditions, must first decide the precise nature of the items to be counted. The problem becomes more complex when, for example, the statistician wishes to take a sample poll or straw vote: it is no simple matter to gauge the size and composition of a sample that will yield reasonably accurate predictions about the behavior of the total population.
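The article does not give a rule for choosing the size of a sample. One standard approach, shown here only as a sketch, assumes simple random sampling from a large population and a poll that estimates a single proportion (for example, the share of voters favoring a candidate). Under those assumptions the required size is n = z² · p(1 - p) / E², where E is the desired margin of error, z ≈ 1.96 for roughly 95 percent confidence, and p = 0.5 is the most conservative guess for the unknown proportion. The function name `sample_size_for_proportion` is hypothetical.

```python
import math


def sample_size_for_proportion(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Smallest simple-random-sample size whose confidence interval for a
    population proportion has half-width at most `margin`.

    Assumptions: large population, normal approximation, no nonresponse.
    z = 1.96 corresponds to roughly 95 percent confidence; p = 0.5 maximizes
    p * (1 - p) and so gives the most conservative (largest) size.
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    n = z ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(n)  # round up: a fractional respondent is not possible


if __name__ == "__main__":
    for e in (0.05, 0.03):
        print(f"margin {e:.0%}: n = {sample_size_for_proportion(e)}")
```

The formula speaks only to the size of the sample. The constitution of the sample, which the text also stresses, depends on how respondents are selected and is not captured by n.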

In protracted studies to establish a physical, biological, or social law, the statistician may start with one set of data and gradually modify it in light of experience. For example, in early studies of population growth, the future change in the size of a population was predicted by calculating the excess of births over deaths in a given period. Population statisticians soon recognized that the rate of increase ultimately depends on the number of births, regardless of the number of deaths, so they began to calculate future population growth on the basis of the number of births each year per 1000 population. When predictions based on this method proved inaccurate, statisticians realized that other limiting factors affect population growth. Because the number of births possible depends on the number of women rather than on the total population, and because women bear children during only part of their lifetime, the basic datum used to calculate future population size is now the number of live births per 1000 women of childbearing age. The predictive value of this datum can be further refined by combining it with data on the percentage of women who remain childless because of choice or circumstance, sterility, contraception, death before the end of the childbearing period, and other limiting factors. The excess of births over deaths, therefore, is meaningful only as an indication of gross population growth over a definite period in the past; the number of births per 1000 population is meaningful only as an expression of the proportion of increase during a similar period; and the number of live births per 1000 women of childbearing age is meaningful for predicting the future size of populations.
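The three measures compared in the paragraph above can be computed directly from counts. The sketch below uses invented round numbers purely for illustration and assumes that "childbearing age" means ages 15 to 49, a common demographic convention the article does not specify. "Crude birth rate" and "general fertility rate" are the usual demographic names for births per 1000 population and births per 1000 women of childbearing age; the function `population_measures` itself is hypothetical.

```python
def per_thousand(count: float, base: float) -> float:
    """Express `count` as a rate per 1000 of `base`."""
    return 1000 * count / base


def population_measures(births: int, deaths: int, population: int,
                        women_childbearing_age: int) -> dict:
    """Compute the three measures discussed in the text.

    Assumption: `women_childbearing_age` counts women aged 15-49.
    """
    return {
        # Excess of births over deaths: gross growth over a past period.
        "natural_increase": births - deaths,
        # Births per 1000 population (crude birth rate).
        "crude_birth_rate": per_thousand(births, population),
        # Live births per 1000 women of childbearing age (general fertility rate).
        "general_fertility_rate": per_thousand(births, women_childbearing_age),
    }


# Invented figures for one year in a hypothetical population.
measures = population_measures(births=12_000, deaths=9_000,
                               population=1_000_000,
                               women_childbearing_age=240_000)
print(measures)
```

With these figures the same 12,000 births give 12 per 1000 against the whole population but 50 per 1000 against women of childbearing age, which illustrates why the choice of base matters for prediction.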

IV

Tabulation and Presentation of Data

The collected data must be arranged, tabulated, and presented so as to permit ready and meaningful analysis and interpretation. To study and interpret the distribution of examination grades in a class of 30 pupils, for instance, the grades are first arranged in ascending order: 30, 35, 43, 52, 61, 65, 65, 65, 68, 70, 72, 72, 73, 75, 75, 76, 77, 78, 78, 80, 83, 85, 88, 88, 90, 91, 96, 97, 100, 100. This progression shows at a glance that the maximum is 100, the minimum is 30, and the range, or difference between the maximum and the minimum, is 70.
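The arrangement step described above can be expressed in a few lines of Python. This is a minimal sketch: the grade list is copied from the text, and `summarize` is a hypothetical helper name. Sorting a list that is already in ascending order changes nothing here, but the same call would order raw, unsorted grades.

```python
# The 30 examination grades from the text.
GRADES = [30, 35, 43, 52, 61, 65, 65, 65, 68, 70,
          72, 72, 73, 75, 75, 76, 77, 78, 78, 80,
          83, 85, 88, 88, 90, 91, 96, 97, 100, 100]


def summarize(grades):
    """Arrange grades in ascending order and report minimum, maximum, and range."""
    ordered = sorted(grades)
    return {
        "count": len(ordered),
        "minimum": ordered[0],
        "maximum": ordered[-1],
        "range": ordered[-1] - ordered[0],  # difference between maximum and minimum
    }


print(summarize(GRADES))
# {'count': 30, 'minimum': 30, 'maximum': 100, 'range': 70}
```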

In a cumulative-frequency graph, such as Fig. 1, the grades are marked on the horizontal axis, and the vertical axis carries two scales: the cumulative number of grades on the left and the corresponding percentage of the total number on the right. Each dot represents the accumulated number of students who received a particular grade or less. For example, dot A corresponds to the second 72; reading across to the vertical axis shows that 12 grades, or 40 percent, are equal to or less than 72.
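The points plotted in a cumulative-frequency graph of this kind can be computed as follows. This sketch assumes the grade list is sorted in ascending order and uses the standard-library function `bisect.bisect_right`, which returns the number of items less than or equal to a given value in a sorted list; `cumulative_points` is a hypothetical helper name.

```python
from bisect import bisect_right

# The 30 examination grades from the text, already in ascending order.
GRADES = [30, 35, 43, 52, 61, 65, 65, 65, 68, 70,
          72, 72, 73, 75, 75, 76, 77, 78, 78, 80,
          83, 85, 88, 88, 90, 91, 96, 97, 100, 100]


def cumulative_points(sorted_grades):
    """For each distinct grade g, return (g, number of grades <= g, percentage)."""
    total = len(sorted_grades)
    points = []
    for g in sorted(set(sorted_grades)):
        count = bisect_right(sorted_grades, g)  # grades equal to or less than g
        points.append((g, count, 100 * count / total))
    return points


for grade, count, percent in cumulative_points(GRADES):
    if grade == 72:
        print(grade, count, percent)  # 72 12 40.0
```

Repeated grades such as the two 72s share a single cumulative value here; on a dot graph they may instead be drawn as separate dots stepping up to that value.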

In analyzing the grades received by 10 sections of 30 pupils each on four examinations, a total of 1200 grades, the amount of data is too large to be exhibited conveniently as in Fig. 1. The statistician therefore separates the data into suitably chosen groups, or intervals. For example, ten intervals might be used to tabulate the 1200 grades, as in column (a) of the accompanying frequency-distribution table; the actual number of grades in an interval, called the frequency of the interval, is entered in column (c). The numbers that define the interval range are called the interval boundaries. It is convenient to choose the interval boundaries so that all intervals have equal ranges and so that the interval midpoints (half the sum of the two boundaries) are simple numbers, because the midpoints are used in many calculations. A grade such as 87 will be tallied in the 80-90 interval; a boundary grade such as 90 may be tallied in either the lower or the upper interval, provided the same choice is made consistently for every boundary. The relative frequency, column (d), is the ratio of the frequency of an interval to the total count; multiplying the relative frequency by 100 gives the percent relative frequency. The cumulative frequency, column (e), is the number of students receiving grades equal to or less than the upper boundary of each successive interval; thus, the number of students with grades of 30 or less is obtained by adding the frequencies in column (c) for the first three intervals, which total 53. The cumulative relative frequency, column (f), is the ratio of the cumulative frequency to the total number of grades.
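Because the 1200 grades themselves are not given, the sketch below builds the same kind of table from the 30 grades listed earlier. It assumes integer grades from 0 to 100 and ten intervals of width 10 in which a boundary grade goes to the lower interval, so each interval includes only its upper boundary (a grade of 0 is placed in the first interval). With that convention, the cumulative frequency of the third interval is exactly the number of grades of 30 or less. The function name `frequency_table` and its dictionary keys are hypothetical.

```python
# The 30 examination grades from the text.
GRADES = [30, 35, 43, 52, 61, 65, 65, 65, 68, 70,
          72, 72, 73, 75, 75, 76, 77, 78, 78, 80,
          83, 85, 88, 88, 90, 91, 96, 97, 100, 100]


def frequency_table(grades, low=0, high=100, n_intervals=10):
    """Tabulate integer grades into equal-width intervals.

    Assumption: each interval includes its upper boundary, so a boundary grade
    such as 90 is tallied in the 80-90 interval; a grade equal to `low` goes
    into the first interval. Grades must lie between `low` and `high`.
    """
    width = (high - low) // n_intervals
    freq = [0] * n_intervals
    for g in grades:
        # Integer ceiling of (g - low) / width, minus 1, clamped at 0.
        idx = max(0, (g - low + width - 1) // width - 1)
        freq[idx] += 1

    total = len(grades)
    rows, cumulative = [], 0
    for i, f in enumerate(freq):
        lower, upper = low + i * width, low + (i + 1) * width
        cumulative += f
        rows.append({
            "interval": (lower, upper),                 # column (a)
            "midpoint": (lower + upper) / 2,            # half the sum of the boundaries
            "frequency": f,                             # column (c)
            "relative": f / total,                      # column (d)
            "cumulative": cumulative,                   # column (e)
            "cumulative_relative": cumulative / total,  # column (f)
        })
    return rows


for row in frequency_table(GRADES):
    lo_b, hi_b = row["interval"]
    print(f"{lo_b:>3}-{hi_b:<3} {row['frequency']:>3} "
          f"{100 * row['relative']:5.1f}% {row['cumulative']:>3} "
          f"{100 * row['cumulative_relative']:5.1f}%")
```

For the full set of 1200 grades the function would apply unchanged; only the input list differs.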

The data of a frequency-distribution table can be presented graphically in a frequency histogram, as in Fig. 2, or in a cumulative-frequency polygon, as in Fig. 3. The histogram is a series of rectangles whose bases equal the interval ranges and whose areas are proportional to the frequencies. The polygon in Fig. 3 is drawn by connecting with straight lines the interval midpoints of a cumulative-frequency histogram.
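The histogram's area rule and the polygon construction described above can be checked numerically. This sketch assumes the frequencies of the 30 grades listed earlier, tallied in ten intervals of width 10 with boundary grades placed in the lower interval; `histogram_heights` and `polygon_vertices` are hypothetical names. Heights are computed as frequency divided by interval width, so each rectangle's area equals its frequency even if the intervals had unequal widths. Following the text, the polygon's vertices sit at the interval midpoints, at heights equal to the cumulative frequencies.

```python
BOUNDARIES = list(range(0, 101, 10))           # 0, 10, ..., 100
FREQUENCIES = [0, 0, 1, 1, 1, 1, 6, 10, 5, 5]  # counts for the 30 listed grades


def histogram_heights(boundaries, frequencies):
    """Rectangle heights such that width * height equals the interval frequency."""
    return [f / (upper - lower)
            for lower, upper, f in zip(boundaries, boundaries[1:], frequencies)]


def polygon_vertices(boundaries, frequencies):
    """(interval midpoint, cumulative frequency) pairs, to be joined by straight lines."""
    vertices, running = [], 0
    for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
        running += f
        vertices.append(((lower + upper) / 2, running))
    return vertices


heights = histogram_heights(BOUNDARIES, FREQUENCIES)
# A crude text histogram: one '#' per grade in each interval (equal widths here).
for lower, upper, f in zip(BOUNDARIES, BOUNDARIES[1:], FREQUENCIES):
    print(f"{lower:>3}-{upper:<3} {'#' * f}")
print(polygon_vertices(BOUNDARIES, FREQUENCIES))
```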

Newspapers and other printed media frequently present statistical data pictorially by using different lengths or sizes of various symbols to indicate different values.

© 2008 Microsoft