
LIB Basics: Facts, Statistics & Bias

Basic Concepts in Statistics

Average -- The average is a way of picking a "midpoint" in any set of data. There are three kinds of average:

Mean -- This is what people usually mean when they say "average." You add up the values of all the items and divide by the number of items. Advantages: It is easy to calculate. Disadvantages: It is strongly affected by extremely high and/or low values. Example: 5, 5, 5, 10, 10, 50 Mean = 85 / 6 = about 14.2

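Median -- This is the value in the middle of the set of data, if you arrange it by value. If there is an even number of items, the Median is halfway between the two middle values. Advantages: It is not strongly affected by extremely high and/or low values. Disadvantages: Slightly more complex to find than a Mean, and with an even number of items it may not be the value of an actual item. Example: 5, 5, 5, 10, 10, 50 Median = 7.5 (halfway between 5 and 10)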

Mode -- The value that shows up most often. Advantages: It is always the value of an actual item, and it shows the value that appears the most frequently. Disadvantages: More complex to find than the Mean or the Median, and it's not as familiar as the other two. Example: 5, 5, 5, 10, 10, 50 Mode = 5
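
The three averages can be checked with a short script. This is a minimal sketch using Python's built-in statistics module on the example set above; the variable names are illustrative only.

```python
import statistics

# The example set used in the definitions above
values = [5, 5, 5, 10, 10, 50]

mean = statistics.mean(values)      # add up all values, divide by how many there are
median = statistics.median(values)  # middle value; halfway between the two middle values for an even count
mode = statistics.mode(values)      # the value that shows up most often

print(f"Mean   = {mean:.1f}")
print(f"Median = {median}")
print(f"Mode   = {mode}")
```

The same six numbers give three different "averages" (about 14.2, 7.5, and 5), and the single value of 50 pulls the Mean well above the other two.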

Why is this Important?
Since there are a variety of ways to calculate an "average," terms like "average cost" and "average salary" can be misleading, especially if you are comparing averages calculated by different methods. You need to know what kind of average was used before you can make sense of the data!

Standard Deviation -- The standard deviation measures how spread out the values are around the Mean. Many kinds of data (e.g. people's heights) tend to fall in a "Bell Curve." This means that the values cluster around the Mean and trail off in both directions. In a Bell Curve, about 68% of the values will be within + or - one standard deviation of the Mean, while an additional 27% will be between one and two standard deviations, so about 95% of the values fall within two standard deviations of the Mean.
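
The percentages for a Bell Curve can be reproduced with Python's built-in statistics module. This is a rough sketch that assumes an idealized bell curve (a normal distribution); real data will only approximate these figures, and the variable names are illustrative.

```python
from statistics import NormalDist

# An idealized bell curve with Mean 0 and standard deviation 1
bell = NormalDist(mu=0, sigma=1)

# Share of values within one and within two standard deviations of the Mean
within_one = bell.cdf(1) - bell.cdf(-1)
within_two = bell.cdf(2) - bell.cdf(-2)

# Share of values lying between one and two standard deviations from the Mean
between_one_and_two = within_two - within_one

print(f"Within 1 standard deviation:       {within_one:.1%}")
print(f"Between 1 and 2 standard deviations: {between_one_and_two:.1%}")
print(f"Within 2 standard deviations:      {within_two:.1%}")
```

The three results come out to roughly 68%, 27%, and 95%.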

www.uri.edu/artsci/newecn/Classes/Art/306a/Outlines/Statistics/normal4.gif

Why is this Important?
A low Standard Deviation means that the data is close to the Mean, and that there are not too many really high or low values. On the other hand, a high Standard Deviation means that the data is all over the place, which could mean that your measuring wasn't very good or that there just isn't a lot of similarity among the items. If you are measuring the height of college freshmen, you would expect the data to be pretty uniform. If you are measuring a random group of kids from age 1 to 20, the values will be all over the place, and the Mean will not tell you much.
 

This work is licensed under a Creative Commons Attribution 4.0 International License.