Demystifying Statistics

Your Jewish Fairy Godmother’s 10 Commandments
for Coping with Esoteric Math and Strange Greek Symbols


We’ve all had that moment when we look like deer in the headlights: someone’s
making a presentation and using all sorts of mystical jargon and strange
symbols. They survey the room and seem to look straight at you, arched
eyebrow implying, You get it, don’t you? You can either fake it and nod, or admit
you have no idea what the person is talking about. You won’t actually be the only
clueless one in the room, but everyone else will be staring intently at you, eyes
carefully averted from the speaker.

The commandments below won’t substitute for a full-bore statistics class. But
they should be enough to get you out of the room with your ego and job intact.
Grab a pencil and paper, take them step by step, remember to breathe, and
you’ll see it’s not as tough as its reputation.

Commandment Number 1: Know your odds.

Everything in life is measured in odds. A sure thing has a 100% chance of
happening. It’s guaranteed, or, in statistical terms, an event with a 100%
probability. Events are how statisticians talk about things that happen. Probability
is the statistician’s word for what most of us loosely call the odds, or chance,
of something happening. If you go to sleep tonight in your own bed, there’s a
99.99% chance that’s where you’ll wake up in the morning. The world could end in
between, or you could roll out, but we tend to assume life is more predictable.
Something that’s highly unlikely to occur, say that you’ll wake up an amnesiac or
in China, has a probability of something like .00001 (that’s 0.001%), which
approaches 0%. Anything you can describe or measure has a probability that ranges
between 0 and 100% (statisticians often write it as a decimal between 0 and 1).
(See, this is e.a.s.y…) Just to be safe, statisticians rarely use the 0 or the 100.
They say “approaches 0” or “approaches 100” (percent implied).
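To make the 0-to-100% idea concrete, here is a small Python sketch. The function name as_percent is our own invention, not from any library. It treats a probability the way statisticians usually write it, as a decimal between 0 and 1, refuses anything outside that range, and converts it to a percent, using the two examples from the paragraph above.

```python
def as_percent(p: float) -> float:
    """Convert a probability written as a decimal (0 to 1) into a percent."""
    # A probability can never be below 0 (impossible) or above 1 (certain).
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability must be between 0 and 1, got {p}")
    return p * 100


wake_up_in_own_bed = 0.9999   # the "99.99% chance" example
wake_up_in_china = 0.00001    # highly unlikely, approaching 0

print(f"{as_percent(wake_up_in_own_bed):.2f}%")  # 99.99%
print(f"{as_percent(wake_up_in_china):.3f}%")    # 0.001%
```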

Commandment Number 2: Identify your population.

No matter what you want to measure, you have to define it. You might care about
the height of NBA players or the age of employees. The life expectancy of people
or lightbulbs. You need to set inclusion criteria, statisto-speak for the rules
that identify who or what you’re going to study and measure. If you need to know
the height of NBA players, the height of college players isn’t relevant. If
someone talks about “the height of basketball players,” you have the right to ask
if they mean K-12, college, pros, or the kids playing in the neighborhood park.
You get to define your population and your study variable however you like. But
once you say what they are, those are the definitions you stick with.
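One way to see what “define your population” means in practice is to write the definition down as a rule and apply it to every record. The Player records, names, and league labels below are made up for illustration. The point of the sketch is that the rule, not the person running the numbers, decides who counts.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    league: str      # hypothetical labels: "NBA", "college", "park"
    height_in: int   # height in inches


def in_population(p: Player) -> bool:
    """Inclusion rule: our population is NBA players only."""
    return p.league == "NBA"


# Hypothetical records, invented for this example.
everyone = [
    Player("A", "NBA", 79),
    Player("B", "college", 76),
    Player("C", "NBA", 82),
    Player("D", "park", 70),
]

population = [p for p in everyone if in_population(p)]
print([p.name for p in population])  # ['A', 'C']
```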

Commandment Number 3: Know n from N.

Big N stands for your whole population: every event you could possibly measure.
Every NBA player. Every light bulb manufactured. Every employee who works for
your company. However you define your population, that’s N (strictly speaking, N
is the number of members it has). Little n is the size of a sample drawn from N:
the data points you’re actually going to do statistics on. If you tested every
light bulb, you’d have none left and would sit in the dark. So you use a sample, a
representative subset of N. There are many possible samples in N. Trust me on
this: a common rule of thumb treats any random sample of 30 or more as a workable
size, no matter how big N is. Amazing but true. The goal is to find a sample that
is truly representative of N. Randomness is generally considered a good way to
eliminate bias. For example, if you do a survey but ask only the opinions of your
friends, that’s a biased sample. Better to assign everyone in N a number, put the
numbers in a hat, and have some stranger on the street draw out 30 of them. Then
you have a legitimate random sample of size n = 30.
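The numbers-in-a-hat method maps directly onto the sample method of Python’s standard random module, which draws without replacement, so nobody gets picked twice. The population size of 5,000 employees and the seed are arbitrary assumptions for this sketch.

```python
import random

N = 5000                            # hypothetical population size (e.g. employees)
population_ids = range(1, N + 1)    # give everyone in N a number: 1..N

rng = random.Random(42)             # fixed seed so the "stranger's" draw is repeatable
sample_ids = rng.sample(population_ids, 30)  # draw 30 numbers out of the hat

n = len(sample_ids)
print(n)                            # 30
print(sorted(sample_ids)[:5])       # the first few employee numbers drawn
```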

Commandment Number 4: Show off what you found out.

Once you’ve measured your sample, simply summarizing what you found is already
descriptive statistics! The next step is to make your results visual. In
elementary school you learned about Pie Charts: a circle broken into different
size slices, each slice representing a percent of the whole. Also, Bar Charts:
the height of each bar shows how many people are in a given category. There are
lots of other, fancier techniques, but almost all of them are limited to the two
dimensions of a piece of paper. In computer programs you can make graphs look
three-dimensional, but you need to think about what you’re really trying to show.
Generally speaking, in addition to whatever you measured (in units of however you
measured it), you have to convey how many events got what score, the time period
things change over, and perhaps contrasts between different groups (e.g., men vs
women, salaried vs hourly, or technical vs sales). You can use different colors,
footnotes, and other tools. Your goal is to be able to show your chart to someone
else and have them understand it.
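Even without charting software you can sketch a bar chart. This minimal Python example counts how many people fall into each category with collections.Counter and draws one row of # marks per category, biggest first. The helper name text_bar_chart and the department labels are hypothetical.

```python
from collections import Counter


def text_bar_chart(values: list[str]) -> list[str]:
    """Return one line per category: label, a bar of '#', and the count."""
    counts = Counter(values)
    width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.most_common():  # biggest bar first
        lines.append(f"{label:<{width}} | {'#' * count} {count}")
    return lines


# Hypothetical survey: which department each respondent works in.
responses = ["sales", "technical", "sales", "technical", "technical", "admin"]
for line in text_bar_chart(responses):
    print(line)
```

Each bar’s length is the count, which is exactly what the height of a bar shows in a printed bar chart.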

Commandment Number 5: Look at the shape of the distribution.

You’ve got your sample and measured whatever variable you’re studying. Now you
want to understand what the results are telling you. The simplest way is to rank
the scores from biggest to smallest. Distributions, and the inferential statistics
built on them (ways to describe the population N based on what you observed in
your sample n), are usually graphed as a curvy line on a grid with a horizontal
and a vertical axis. (Soon you’ll understand the bell curve.) Imagine a horizontal
line from left to right (the x axis) and a vertical one (the y axis), crossing at
zero. Mark the x axis with key intervals (for example 5’0”-5’6”, 5’7”-5’11”,
etc.). On the y axis, count how many events/people/etc. fall into each category.
Then connect the tops of the categories. If everyone scored the same, you’d have
a single tall mark in one category. If everyone were spread equally across
categories, you’d see a flat line across them. For most things you measure, there
will be groupings: tall categories with more observations and shorter ones with
fewer. Look at the picture and see what it tells you.
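Here is the same idea in Python: sort heights into equal 6-inch intervals along the x axis and count how many fall into each one for the y axis. The heights (in inches), the interval width, and the helper name frequency_table are invented for illustration. With real data you would then plot these counts and connect their tops.

```python
def frequency_table(values: list[int], start: int, width: int,
                    bins: int) -> list[tuple[str, int]]:
    """Count how many values fall in each interval lo..lo+width-1."""
    counts = [0] * bins
    for v in values:
        i = (v - start) // width        # which interval this value lands in
        if 0 <= i < bins:               # ignore anything off the chart
            counts[i] += 1
    table = []
    for i, c in enumerate(counts):
        lo = start + i * width
        hi = lo + width - 1
        table.append((f"{lo}-{hi} in", c))
    return table


# Hypothetical heights in inches (60 in = 5'0").
heights = [61, 64, 66, 67, 68, 68, 69, 70, 71, 72, 74, 77]
for label, count in frequency_table(heights, start=60, width=6, bins=3):
    print(f"{label:>10} {'*' * count}")
```

The middle interval gets the tallest column, which is the “grouping” the paragraph above tells you to look for.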

Commandment Number 6: Know one average from another.

Averages tell you about the middle of your sample. There are three kinds of
averages, and each one tells you something different. If everyone scored exactly
the same, all three would be identical and you could stop counting now. If you
looked at all the observations in a ranked list, the median is the number in the
middle. For example, if you look at the salaries of 101 employees and rank them
from lowest to highest, the median is the salary of the 51st person. The mode
goes back to the shape of the distribution. It’s the category with the most
observations in it. For example, if you’re looking at how long people stay with
your company, and more of your employees are in the 3-6 year group than in any
other group, that’s the modal group, and its middle, 4.5 years, is often reported
as the mode (even if no single person has worked there 4.5 years). The mean is
the number you get if everyone shares equally. You add up all the scores and
divide the total by how many people you measured. For example, if you added up
the heights of all the players in the NBA and divided by the number of players,
the mean height might be 6’3”, even though there are some short guys and some
giants. BTW, whenever someone says “average,” try to find out which average
they’re using. In a perfect bell-shaped distribution, all three averages are at the
top of the bell.
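Python’s standard statistics module has all three averages built in. The salaries below (in thousands of dollars) are made up, with one very large salary thrown in to show how a single giant pulls the mean up while the median and the mode barely notice.

```python
import statistics

# Hypothetical salaries in $ thousands; one executive earns far more.
salaries = [40, 45, 45, 50, 55, 60, 300]

print(statistics.mean(salaries))    # 85 (total of 595 shared equally among 7)
print(statistics.median(salaries))  # 50 (the 4th of 7 in ranked order)
print(statistics.mode(salaries))    # 45 (the most common value)
```

This is why it matters which “average” someone quotes: here the mean is well above what most people actually earn.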

Commandment Number 7: Know how different your group is from itself.

The fancy statistical name for this concept is standard deviation. It measures
how much the members of your sample n (and implicitly N) differ from one another,
roughly how far a typical member sits from the mean. Imagine a startup firm where
everyone has worked there a very short time. If you are measuring length of
service among employees, there’d be a very small standard deviation. If you look
instead at a place like the US military, you might find career soldiers in the
same sample as new recruits. The standard deviation would be much larger. For a
different visual, imagine an NBA team where everyone is between 6’1” and 6’5” (a
small standard deviation), compared to one with a guy at 5’5” and another at 7’2”.
The two teams might have the same “average” height, but they’d look very different
when they lined up for the Pledge of Allegiance. Note: There’s math to calculate a
standard deviation, but most calculators will do it for you.
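For the curious, here is the math the calculator is doing. The sketch computes the sample standard deviation by hand (square each distance from the mean, average those squares using n - 1, take the square root) and checks it against statistics.stdev from Python’s standard library; statistics.pstdev is the version that divides by N when you have the whole population. The two rosters are invented heights in inches, chosen so both teams have the same mean, and the helper name sample_sd is our own.

```python
import math
import statistics


def sample_sd(xs: list[float]) -> float:
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    m = sum(xs) / len(xs)
    squared_distances = [(x - m) ** 2 for x in xs]
    return math.sqrt(sum(squared_distances) / (len(xs) - 1))


# Hypothetical rosters, heights in inches (6'1" = 73, 5'5" = 65, 7'2" = 86).
tight_team = [73, 74, 75, 76, 77]
spread_team = [65, 70, 77, 77, 86]

for team in (tight_team, spread_team):
    print(statistics.mean(team), round(sample_sd(team), 2))
# Both means are 75 inches; the spreads are about 1.58 vs 7.97.

assert math.isclose(sample_sd(spread_team), statistics.stdev(spread_team))
```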

Commandment Number 8: Understand for whom the bell tolls.

The infamous bell curve (as in “Do you grade on a curve?”) is a distribution
shaped like a bell, drawn from knowing only two numbers, the mean and the
standard deviation. (This is where it gets very cool.) You’ve been doing this
intuitively for years, as in: It takes me 30 minutes to get to work, give or take five.
That means, most of the time, you will get to work in 25-35 minutes. Less often
it’ll take 20-25 minutes or 35-40 minutes. Rarely you’ll get there in less than 20
or more than 40. By knowing only two numbers, the mean and standard deviation,
you can get a very good and surprisingly accurate picture of your population.
Generally speaking, for normally distributed variables, which describe many
(though not all) of the things we measure, about 68% of the population will fall
within one standard deviation of the mean (mean +/- 1 sd), about 95% within mean
+/- 2 sd, and about 99.7% within mean +/- 3 sd. Just from knowing two numbers,
you can sketch a bell curve and get a pretty good picture of what’s going on in the
world, all from measuring a random sample of 30 or more, provided the thing
you’re measuring really is roughly bell-shaped. Amazing but true.
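You can check the 68/95/99.7 rule yourself with statistics.NormalDist (Python 3.8 and later). This sketch models the commute from the paragraph above as a normal curve with a mean of 30 minutes and a standard deviation of 5, which assumes your commute times really are bell-shaped.

```python
from statistics import NormalDist

commute = NormalDist(mu=30, sigma=5)  # minutes: 30, give or take 5

for k in (1, 2, 3):
    lo, hi = 30 - k * 5, 30 + k * 5
    share = commute.cdf(hi) - commute.cdf(lo)  # area under the curve from lo to hi
    print(f"{lo}-{hi} minutes: {share:.1%}")
# 25-35 minutes: 68.3%
# 20-40 minutes: 95.4%
# 15-45 minutes: 99.7%
```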

Commandment Number 9: Know what’s significant.

This is probably the simplest and most sophisticated concept in statistics. Once
you have a mean and a standard deviation, you can do what are called tests.
Tests are a fancy way of asking: if the truth is “this,” and in our sample we found
“that,” then what are the chances that, by sheer dumb luck, we’d have stumbled
onto a sample that far away, improbably far away, from “this”? It’d be like
concluding the average height of NBA players is 5’9” just because we happened
to pick a sample that included a lot of the shorter guys. When people say “our
results are statistically significant,” what they’re really saying is: if the truth
really were “this,” there’d be only a very small chance, say 1% or 5%, of getting a
result as far off as “that” by luck alone, so we’re willing to bet the truth isn’t
“this.” One important note: the person doing the stats decides how sure they
want or have to be. If you’re testing an experimental drug that has a side effect
of death, you’d probably want to take a smaller chance of thinking you’re right
when you’re wrong than if you’re asking people which cola they prefer.
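A simple version of such a test is the one-sample z-test, sketched below with statistics.NormalDist. It asks how many standard errors the sample mean sits from the claimed value and turns that into a two-sided p-value, the “sheer dumb luck” chance. The claimed mean of 79 inches, the sample figures, and the helper name z_test_p_value are all hypothetical. With small samples, or when the standard deviation is estimated from the sample, statisticians would usually reach for a t-test instead.

```python
import math
from statistics import NormalDist


def z_test_p_value(sample_mean: float, claimed_mean: float,
                   sd: float, n: int) -> float:
    """Two-sided p-value: the chance of a sample mean at least this far from
    claimed_mean by luck alone, if claimed_mean were the truth."""
    standard_error = sd / math.sqrt(n)          # how much sample means wobble
    z = (sample_mean - claimed_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))   # both tails of the bell


alpha = 0.05  # the analyst's choice: how big a chance of being fooled is acceptable

p = z_test_p_value(sample_mean=77.0, claimed_mean=79.0, sd=3.5, n=30)
print(f"p = {p:.4f}, significant: {p < alpha}")
```

For the drug trial in the paragraph above, you would simply pick a smaller alpha, such as 0.01.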

Commandment Number 10: Get out your crystal ball.

There are many more complicated statistical techniques that try to predict
things. For those you generally need to look at more than one variable at a time.
For example, if you’re trying to figure out what you’d pay for a new pickup, you’d
want to know lots of things like: year, mileage, brand (yes, there are ways to
measure things that are names and not numbers), automatic vs 5-speed,
options, what part of the country you’re buying in, accident history, etc., etc.,
etc., all the way down to whether or not it has genuine leopard skin seats. If you
have enough info, you can predict what it should cost. That’s roughly the idea
behind the Kelley Blue Book. These techniques are interesting, though complex,
and you’ll need a more advanced guide.
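As a taste of what those guides cover, here is one of the simplest predictive techniques, least-squares regression with a single variable. It fits a straight line of price against mileage and uses that line to predict a price. The four trucks and the helper name fit_line are invented, and a real pricing model would juggle many variables at once (multiple regression); this sketch only shows the core idea.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares line y = intercept + slope * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical pickups: mileage (thousands of miles) and price (thousands of $).
mileage = [10, 30, 50, 70]
price = [30, 26, 22, 18]

intercept, slope = fit_line(mileage, price)
predicted = intercept + slope * 40  # a truck with 40,000 miles
print(f"price ~ {intercept:.1f} + ({slope:.2f} x mileage); "
      f"40k miles -> ${predicted:.1f}k")
```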

Try to think about statistics as looking like algebra but really being geometry.
You’re trying to draw a picture that shows someone what you think is true about
everyone you haven’t measured, based on the people you did measure. If you’re
interested, think about taking a class. If you can master this kind of thinking, it’s a
fast track to advancement.