Statistics

Encyclopedia Article
V

Measures of Central Tendency

After data have been collected and tabulated, analysis begins with the calculation of a single number, which will summarize or represent all the data. Because data often exhibit a cluster or central point, this number is called a measure of central tendency.

Let x1, x2, …, xn be the n tabulated (but ungrouped) numbers of some statistic; the most frequently used measure is the simple arithmetic average, or mean, written $\bar{x}$, which is the sum of the numbers divided by n:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

If the x's are grouped into k intervals, with midpoints m1, m2, …, mk and frequencies f1, f2, …, fk, respectively, the simple arithmetic average is given by

$$\bar{x} = \frac{\sum_{i} f_i m_i}{\sum_{i} f_i}$$

with i = 1, 2, …, k.
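
As an illustration, the two averages can be computed in a few lines of Python. This is a minimal sketch; the function names and the sample numbers are invented for the example and do not come from the article.

```python
def mean(xs):
    """Arithmetic mean of ungrouped values: the sum divided by n."""
    return sum(xs) / len(xs)


def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum of f_i * m_i divided by the sum of f_i."""
    total = sum(frequencies)
    return sum(f * m for m, f in zip(midpoints, frequencies)) / total


# Illustrative data only.
print(mean([2, 4, 4, 5, 10]))                # 5.0
print(grouped_mean([5, 15, 25], [2, 5, 3]))  # 16.0
```

The grouped form is an approximation: it treats every measurement in an interval as if it sat at the interval's midpoint.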

The median and the mode are two other measures of central tendency. Let the x's be arranged in numerical order; if n is odd, the median is the middle x; if n is even, the median is the average of the two middle x's. The mode is the x that occurs most frequently. If two or more distinct x's occur with equal frequencies, but none with greater frequency, the set of x's may be said not to have a mode or to be bimodal, with modes at the two most frequent x's, or trimodal, with modes at the three most frequent x's.
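
The median and mode rules just described can be sketched in Python as follows. The function names are hypothetical, and returning every tied value as a mode is one possible reading of the bimodal and trimodal cases.

```python
from collections import Counter


def median(xs):
    """Middle value if n is odd, average of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def modes(xs):
    """All values that occur with the greatest frequency, in ascending order."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(x for x, c in counts.items() if c == top)


print(median([3, 1, 2]))        # 2
print(median([4, 1, 3, 2]))     # 2.5
print(modes([1, 2, 2, 3, 3]))   # [2, 3]  (bimodal)
```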



VI

Measures of Variability

The investigator is frequently concerned with the variability of the distribution, that is, whether the measurements are clustered tightly around the mean or spread over the range. One measure of this variability is the difference between two percentiles, usually the 25th and the 75th percentiles. The pth percentile is a number such that p percent of the measurements are less than or equal to it; in particular, the 25th and the 75th percentiles are called the lower and upper quartiles, respectively. The pth percentile is readily found from the cumulative-frequency graph (Fig. 1): run a horizontal line from the p percent mark on the vertical axis until it meets the graph, then drop a vertical line from that point to the horizontal axis. The abscissa of the point where the vertical line meets the horizontal axis is the value of the pth percentile.
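
Without a graph, a percentile can be read directly from the sorted data. The sketch below uses the nearest-rank convention (the smallest measurement with at least p percent of the data at or below it), which is an assumption: it is only one of several conventions in use and is not the graphical method described above. The function name and data are invented for illustration.

```python
import math


def percentile(xs, p):
    """Nearest-rank percentile: smallest value with at least p% of the data <= it."""
    s = sorted(xs)
    rank = max(1, math.ceil(p * len(s) / 100))  # 1-based position in the sorted list
    return s[rank - 1]


data = [15, 20, 35, 40, 50]
lower = percentile(data, 25)   # lower quartile
upper = percentile(data, 75)   # upper quartile
print(lower, upper, upper - lower)  # 20 40 20
```

The last number printed is the difference between the upper and lower quartiles, the measure of variability discussed in the text.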

The standard deviation is a measure of variability that is more convenient than percentile differences for further investigation and analysis of statistical data. The standard deviation of a set of measurements x1, x2, …, xn, with the mean $\bar{x}$, is defined as the square root of the mean of the squares of the deviations; it is usually designated by the Greek letter sigma (σ). In symbols

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

The square, σ², of the standard deviation is called the variance. If the standard deviation is small, the measurements are tightly clustered around the mean; if it is large, they are widely scattered.
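
A minimal Python sketch of this definition follows. It divides by n, as the phrase "the mean of the squares of the deviations" implies, rather than by n - 1 as is common for sample estimates. The function names and data are illustrative.

```python
import math


def variance(xs):
    """Mean of the squared deviations from the mean (divisor n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def standard_deviation(xs):
    """Square root of the variance."""
    return math.sqrt(variance(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5
print(variance(data))              # 4.0
print(standard_deviation(data))    # 2.0
```

Python's standard library offers `statistics.pvariance` and `statistics.pstdev`, which use the same divisor n.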

VII

Correlation

When two social, physical, or biological phenomena increase or decrease proportionately and simultaneously because of identical external factors, the phenomena are correlated positively; under the same conditions, if one increases in the same proportion that the other decreases, the two phenomena are negatively correlated. Investigators calculate the degree of correlation by applying a coefficient of correlation to data concerning the two phenomena. The most common correlation coefficient is expressed as

$$r = \frac{\sum xy}{N \sigma_x \sigma_y}$$

in which x is the deviation of one variable from its mean, y is the deviation of the other variable from its mean, σx and σy are the standard deviations of the two variables, and N is the total number of cases in the series. A perfect positive correlation between the two variables results in a coefficient of +1, a perfect negative correlation in a coefficient of -1, and a total absence of correlation in a coefficient of 0. Intermediate values between +1 and 0 or -1 are interpreted by degree of correlation. Thus, 0.89 indicates high positive correlation, -0.76 high negative correlation, and 0.13 low positive correlation.
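
The calculation can be sketched in Python, assuming the coefficient meant here is the usual product-moment form, the sum of the products xy divided by N times the two standard deviations. The function name and the sample series are invented for the example.

```python
import math


def correlation(us, vs):
    """Product-moment correlation of two equally long series."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    x = [u - mean_u for u in us]          # deviations of the first variable
    y = [v - mean_v for v in vs]          # deviations of the second variable
    sigma_x = math.sqrt(sum(d * d for d in x) / n)
    sigma_y = math.sqrt(sum(d * d for d in y) / n)
    return sum(a * b for a, b in zip(x, y)) / (n * sigma_x * sigma_y)


print(round(correlation([1, 2, 3, 4], [2, 4, 6, 8]), 6))   # 1.0
print(round(correlation([1, 2, 3, 4], [8, 6, 4, 2]), 6))   # -1.0
```

The function fails with a division by zero if either series is constant, since its standard deviation is then 0 and the coefficient is undefined.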

VIII

Mathematical Models

A mathematical model is a mathematical idealization in the form of a system, proposition, formula, or equation of a physical, biological, or social phenomenon. Thus, a theoretical, perfectly balanced die that can be tossed in a purely random fashion is a mathematical model for an actual physical die. The probability that in n throws of a mathematical die a throw of 6 will occur k times is

$$P(k) = \binom{n}{k} \left(\frac{1}{6}\right)^{k} \left(\frac{5}{6}\right)^{n-k}$$

in which $\binom{n}{k}$ is the symbol for the binomial coefficient

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

The statistician confronted with a real physical die will devise an experiment, such as tossing the die n times repeatedly, for a total of Nn tosses, and then determine from the observed throws the likelihood that the die is balanced and that it was thrown in a random way.
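
For the idealized die, the probability of exactly k sixes in n throws is the binomial probability with success chance 1/6. A short Python sketch, with a hypothetical function name, is:

```python
from math import comb


def prob_sixes(n, k, p=1 / 6):
    """Probability of exactly k sixes in n throws of a balanced die."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


print(round(prob_sixes(10, 2), 4))   # 0.2907

# The probabilities for k = 0, 1, ..., n add up to 1 (up to rounding error).
print(round(sum(prob_sixes(10, k) for k in range(11)), 6))   # 1.0
```

Comparing these model probabilities with the frequencies observed in N runs of n tosses is the kind of experiment the paragraph above describes.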

In a related but more involved example of a mathematical model, many sets of measurements have been found to have the same type of frequency distribution. For example, let x1, x2, …, xN be the number of 6's cast in the N respective runs of n tosses of a die and assume N to be moderately large. Let y1, y2, …, yN be the weights, correct to the nearest 1/100 g, of N lima beans chosen haphazardly from a 100-kg bag of lima beans. Let z1, z2, …, zN be the barometric pressures recorded to the nearest 1/1000 cm by N students in succession, reading the same barometer. It will be observed that the x's, y's, and z's have amazingly similar frequency patterns. The statistician adopts a model that is a mathematical prototype or idealization of all these patterns or distributions. One form of the mathematical model is an equation for the frequency distribution, in which N is assumed to be infinite:

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}$$

in which e (approximately 2.718) is the base for natural logarithms (see Logarithm). The graph of this equation (Fig. 4) is the bell-shaped curve called the normal, or Gaussian, probability curve. If a variate x is normally distributed, the probability that its value lies between a and b is given by

$$P(a \le x \le b) = \frac{1}{\sqrt{2\pi}} \int_{a}^{b} e^{-x^{2}/2}\, dx$$

The mean of the x's is 0, and the standard deviation is 1. In practice, if N is large, the error is exceedingly small.
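
For the standard normal curve (mean 0, standard deviation 1), the probability of falling between a and b can be evaluated with the error function from Python's math module. This is a sketch under that assumption; the function names are chosen for the example.

```python
import math


def normal_density(x):
    """Height of the standard normal curve at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def normal_cdf(x):
    """Probability that a standard normal variate is <= x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def prob_between(a, b):
    """Probability that a standard normal variate lies between a and b."""
    return normal_cdf(b) - normal_cdf(a)


print(round(normal_density(0), 4))     # 0.3989
print(round(prob_between(-1, 1), 4))   # 0.6827
```

The second figure is the familiar result that about 68 percent of a normally distributed variate lies within one standard deviation of the mean.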

IX

Tests of Reliability

The statistician is often called upon to decide whether an assumed hypothesis for some phenomenon is valid or not. The assumed hypothesis leads to a mathematical model; the model, in turn, yields certain predicted or expected values, for example, 10, 15, 25. The corresponding actually observed values are 12, 16, 21. To determine whether the hypothesis is to be kept or rejected, these deviations must be judged as normal fluctuations caused by sampling techniques or as significant discrepancies. Statisticians have devised several tests for the significance or reliability of data. One is the chi-square (χ²) test. The deviations (observed values minus expected values) are squared, divided by the expected values, and summed:

$$\chi^{2} = \sum_{i} \frac{(O_i - E_i)^{2}}{E_i}$$

Here O_i and E_i denote the observed and expected values. The value of χ² is then compared with values in a statistical table to determine the significance of the deviations.
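
Using the expected values 10, 15, 25 and the observed values 12, 16, 21 from the example above, the statistic can be computed with a short Python sketch. The function name is hypothetical, and the sketch stops at the statistic itself; the comparison with a table is not reproduced here.

```python
def chi_square(observed, expected):
    """Sum of squared deviations, each divided by its expected value."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [10, 15, 25]
observed = [12, 16, 21]
print(round(chi_square(observed, expected), 3))   # 1.107
```

The three terms are 4/10, 1/15, and 16/25, which sum to roughly 1.107.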



© 2008 Microsoft