Article Outline
Introduction; History; Statistical Methods; Tabulation and Presentation of Data; Measures of Central Tendency; Measures of Variability; Correlation; Mathematical Models; Tests of Reliability; Higher Statistics
After data have been collected and tabulated, analysis begins with the calculation of a single number, which will summarize or represent all the data. Because data often exhibit a cluster or central point, this number is called a measure of central tendency.
Let x1, x2, …, xn be the n tabulated (but ungrouped) numbers of some statistic; the most frequently used measure is the simple arithmetic average, or mean, written $\bar{x}$, which is the sum of the numbers divided by n:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
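If the x's are grouped into k intervals, with midpoints m1, m2, …, mk and frequencies f1, f2, …, fk, respectively, the simple arithmetic average is given by

$$\bar{x} = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_k m_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{j=1}^{k} f_j m_j}{\sum_{j=1}^{k} f_j}$$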
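The two averages can be sketched in a few lines of Python. The function names `mean` and `grouped_mean` and the sample figures are illustrative assumptions, not part of the article.

```python
def mean(xs):
    """Arithmetic mean of ungrouped numbers: their sum divided by n."""
    return sum(xs) / len(xs)


def grouped_mean(midpoints, freqs):
    """Mean of grouped data: frequency-weighted average of the interval midpoints."""
    weighted_sum = sum(m * f for m, f in zip(midpoints, freqs))
    return weighted_sum / sum(freqs)


# Hypothetical sample data, chosen only for illustration.
print(mean([2, 4, 4, 5, 10]))              # 5.0
print(grouped_mean([1, 3, 5], [2, 3, 5]))  # 3.6
```

The grouped form treats every number in an interval as if it sat at the interval's midpoint, so it generally only approximates the mean of the ungrouped numbers.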
The median and the mode are two other measures of central tendency. Let the x's be arranged in numerical order; if n is odd, the median is the middle x; if n is even, the median is the average of the two middle x's. The mode is the x that occurs most frequently. If two or more distinct x's occur with equal frequencies, but none with greater frequency, the set of x's may be said not to have a mode or to be bimodal, with modes at the two most frequent x's, or trimodal, with modes at the three most frequent x's.
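A minimal Python sketch of the median and the mode as just defined follows. The names `median` and `modes` are illustrative; `modes` returns every value tied for the greatest frequency, so a list of two values signals a bimodal set.

```python
from collections import Counter


def median(xs):
    """Middle value of the sorted data, or the average of the two middle values."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def modes(xs):
    """All values that occur with the greatest frequency, in ascending order."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(x for x, c in counts.items() if c == top)


print(median([3, 1, 2]))        # 2
print(median([4, 1, 3, 2]))     # 2.5
print(modes([1, 2, 2, 3, 3]))   # [2, 3]
```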
The investigator frequently is concerned with the variability of the distribution, that is, whether the measurements are clustered tightly around the mean or spread over the range. One measure of this variability is the difference between two percentiles, usually the 25th and the 75th percentiles. The pth percentile is a number such that p percent of the measurements are less than or equal to it; in particular, the 25th and the 75th percentiles are called the lower and upper quartiles, respectively. The pth percentile is readily found from the cumulative-frequency graph (Fig. 1) by running a horizontal line from the p percent mark on the vertical axis to the graph, then a vertical line from that point on the graph down to the horizontal axis; the abscissa of the intersection is the value of the pth percentile.
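For ungrouped data the same quantities can be computed directly rather than read off a graph. The sketch below assumes the "nearest-rank" convention, which takes the smallest measurement such that at least p percent of the data are less than or equal to it; other conventions interpolate between neighboring values and can give slightly different results. The function names are illustrative.

```python
import math


def percentile(xs, p):
    """Nearest-rank pth percentile: smallest value with at least p% of data <= it."""
    s = sorted(xs)
    rank = max(1, math.ceil(p / 100 * len(s)))  # 1-based rank in the sorted data
    return s[rank - 1]


def interquartile_range(xs):
    """Difference between the upper (75th) and lower (25th) quartiles."""
    return percentile(xs, 75) - percentile(xs, 25)


data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical measurements
print(percentile(data, 25))        # 2
print(percentile(data, 75))        # 6
print(interquartile_range(data))   # 4
```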
The standard deviation is a measure of variability that is more convenient than percentile differences for further investigation and analysis of statistical data. The standard deviation of a set of measurements x1, x2, …, xn, with the mean $\bar{x}$, is defined as the square root of the mean of the squares of the deviations; it is usually designated by the Greek letter sigma (σ). In symbols

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
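The definition translates directly into code. The sketch below divides by n, as the definition states; the function name `std_dev` and the sample data are illustrative. Python's standard library offers `statistics.pstdev` for the same quantity.

```python
import math


def std_dev(xs):
    """Square root of the mean of the squared deviations from the mean."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / n)


# Hypothetical measurements with mean 5; the squared deviations sum to 32.
print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```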
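When two social, physical, or biological phenomena increase or decrease proportionately and simultaneously because of identical external factors, the phenomena are correlated positively; under the same conditions, if one increases in the same proportion that the other decreases, the two phenomena are negatively correlated. Investigators calculate the degree of correlation by applying a coefficient of correlation to data concerning the two phenomena. The most common correlation coefficient is expressed as

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x\,\sigma_y}$$

in which x1, x2, …, xn and y1, y2, …, yn are the paired measurements of the two phenomena, $\bar{x}$ and $\bar{y}$ are their means, and σx and σy are their standard deviations. The coefficient ranges from −1 (perfect negative correlation) through 0 (no linear correlation) to +1 (perfect positive correlation).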
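As an illustration, the following sketch computes the Pearson product-moment coefficient, on the assumption that this is the "most common" coefficient meant here. The function name `pearson_r` and the paired data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: sum of products of deviations over the product of their norms."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cross / (norm_x * norm_y)


xs = [1, 2, 3, 4]
print(round(pearson_r(xs, [2, 4, 6, 8]), 6))   # 1.0  (rises in proportion)
print(round(pearson_r(xs, [8, 6, 4, 2]), 6))   # -1.0 (falls in proportion)
```

Dividing by the two norms is algebraically the same as dividing the sum of products by n times the two standard deviations, because the factors of n cancel.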
A mathematical model is a mathematical idealization in the form of a system, proposition, formula, or equation of a physical, biological, or social phenomenon. Thus, a theoretical, perfectly balanced die that can be tossed in a purely random fashion is a mathematical model for an actual physical die. The probability that in n throws of a mathematical die a throw of 6 will occur k times is

$$P(k) = \binom{n}{k}\left(\frac{1}{6}\right)^{k}\left(\frac{5}{6}\right)^{n-k}, \qquad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
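The probability for the idealized die is the binomial probability with success chance 1/6, and it can be evaluated with the short sketch below. The function name `prob_k_sixes` is illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def prob_k_sixes(n, k):
    """Probability of exactly k sixes in n throws of a perfectly balanced die."""
    return comb(n, k) * (1 / 6) ** k * (5 / 6) ** (n - k)


# Probabilities of 0, 1, 2, 3 sixes in 3 throws; together they sum to 1.
for k in range(4):
    print(k, round(prob_k_sixes(3, k), 4))
```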
In a related but more involved example of a mathematical model, many sets of measurements have been found to have the same type of frequency distribution. For example, let x1, x2, …, xN be the number of 6's cast in the N respective runs of n tosses of a die and assume N to be moderately large. Let y1, y2, …, yN be the weights, correct to the nearest 1/100 g, of N lima beans chosen haphazardly from a 100-kg bag of lima beans. Let z1, z2, …, zN be the barometric pressures recorded to the nearest 1/1000 cm by N students in succession, reading the same barometer. It will be observed that the x's, y's, and z's have amazingly similar frequency patterns. The statistician adopts a model that is a mathematical prototype or idealization of all these patterns or distributions. One form of the mathematical model is an equation for the frequency distribution, in which N is assumed to be infinite:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-(x-\mu)^2 / (2\sigma^2)}$$

in which μ is the mean and σ the standard deviation of the distribution, and e (approximately 2.718) is the base of natural logarithms. This is the normal distribution, whose graph is the familiar bell-shaped curve.
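The similarity of such patterns can be checked numerically for the die example. The sketch below assumes that the idealized model meant here is the normal (Gaussian) curve, and compares it with the exact probabilities of k sixes in n = 600 tosses, using mean n/6 and standard deviation √(n · 1/6 · 5/6). The function names and the choice n = 600 are illustrative assumptions.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(n, k, p):
    """Exact probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


n, p = 600, 1 / 6              # hypothetical run of 600 tosses of a fair die
mu = n * p                     # expected number of sixes
sigma = sqrt(n * p * (1 - p))  # standard deviation of the number of sixes

# Print the exact probability next to the normal-curve height for a few counts.
for k in (90, 100, 110):
    print(k, round(binomial_pmf(n, k, p), 4), round(normal_pdf(k, mu, sigma), 4))
```

Near the mean the two columns agree closely, and the agreement improves as n grows.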
The statistician is often called upon to decide whether an assumed hypothesis for some phenomenon is valid or not. The assumed hypothesis leads to a mathematical model; the model, in turn, yields certain predicted or expected values, for example, 10, 15, 25. The corresponding actually observed values are 12, 16, 21. To determine whether the hypothesis is to be kept or rejected, these deviations must be judged as normal fluctuations caused by sampling techniques or as significant discrepancies. Statisticians have devised several tests for the significance or reliability of data. One is the chi-square (χ²) test. The deviations (observed values minus expected values) are squared, divided by the expected values, and summed:

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} = \frac{(12-10)^2}{10} + \frac{(16-15)^2}{15} + \frac{(21-25)^2}{25} \approx 1.107$$

in which the O's are the observed values and the E's the corresponding expected values.
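The chi-square statistic for the observed and expected values quoted above can be computed with the following sketch. The function name `chi_square` is illustrative. Judging whether the result is significant requires comparing it with a critical value for the appropriate degrees of freedom, which is not shown here.

```python
def chi_square(observed, expected):
    """Sum of squared deviations (observed minus expected), each divided by the expected value."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [12, 16, 21]
expected = [10, 15, 25]
print(round(chi_square(observed, expected), 3))  # 1.107
```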
© 1993-2008 Microsoft Corporation. All Rights Reserved.