Demystifying Statistics

Your Jewish Fairy Godmother’s 10 Commandments

for Coping with Esoteric Math and Strange Greek Symbols

We’ve all had that moment when we look like deer in the headlights: someone’s

making a presentation and using all sorts of mystical jargon and strange

symbols. They survey the room and seem to look straight at you, arched

eyebrow implying, You get it, don’t you? You can either fake it and nod, or admit

you have no idea what the person is talking about. You won’t actually be the only

clueless one in the room, but everyone else will be staring intently at you, eyes

carefully averted from the speaker.

The commandments below won’t substitute for a full-bore statistics class. But

they should be enough to get you out of the room with your ego and job intact.

Grab a pencil and paper, take them step by step, remember to breathe, and

you’ll see it’s not as tough as its reputation.

Commandment Number 1: Know your odds.

Everything in life is measured in odds. A sure thing has a 100% chance of

happening. It’s guaranteed, or, in statistical terms, an event with a 100%

probability. Events are how statisticians talk about things that happen. Probability

is a fancy way of saying odds. If you go to sleep tonight in your own bed, there’s

a 99.99% chance that’s where you’ll wake up in the morning. The world could

end in-between, or you could roll out, but we tend to assume life is more

predictable. Something that’s highly unlikely to occur, say that you’ll wake up an

amnesiac or in China, has a probability of .00001, one that approaches 0%. Anything

you can describe or measure has a probability that ranges between 0 and 100%.

(See, this is e.a.s.y…). Just to be safe, statisticians rarely use the 0 or the 100.

They say, approaches 0, or approaches 100 (percent implied).

Commandment Number 2: Identify your population.

No matter what you want to measure, you have to define it. You might care about

the height of NBA players or the age of employees. The life expectancy of people

or lightbulbs. You need to set parameters, which is statisto-speak for criteria that

identify who or what you’re going to study and measure. If you need to know the

height of NBA players, the height of college players isn’t relevant. If someone

talks about “the height of basketball players” you have the right to ask if they

mean K-12, college, pros, or the kids playing in the neighborhood park. Your

study variable is however you define it. But once you say what it is, that’s the

definition you stick with.

Commandment Number 3: Know n from N.

Big N means every event you could possibly measure. Every NBA player. Every

light bulb manufactured. Every employee who works for your company. However

you define your population, that’s N. Little n is a sample of N. It represents the

data events that you’re going to do statistics on. If you tested every light bulb,

you’d have none left and would sit in the dark. So you use a sample, a

representative subset of N. There’re many possible n’s in N. Trust me on this, but

as a rule of thumb, any group of 30 or more is considered a good size no matter how big N is.

Amazing but true. The goal is to find an n that is truly representative of N.

Generally randomness is considered a good way to eliminate bias. For example,

if you do a survey but ask only the opinions of your friends, that’s a biased

sample. Better to assign everyone in N a random number, put the numbers in a

hat, and have some stranger on the street draw out 30 of them. Then you have a

legitimate random sample of size n = 30.
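
If you’d rather let a computer play the stranger with the hat, here is a minimal sketch in Python. The 500 ID numbers and the seed of 42 are made up for illustration; your real N is whatever population you defined.

```python
import random

# Hypothetical population: ID numbers 1 through 500 stand in for everyone in N.
population = list(range(1, 501))

# A fixed seed (an assumption for this demo) makes the draw repeatable.
rng = random.Random(42)

# Draw 30 IDs without replacement, like pulling 30 slips out of a hat.
sample = rng.sample(population, k=30)

print(len(sample))       # 30
print(len(set(sample)))  # 30, because no ID is drawn twice
```

Drop the seed if you want a fresh draw every time you run it.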

Commandment Number 4: Show off what you found out.

Even without measuring anything else, you’re already doing descriptive statistics!

The next step is to make them visual. In elementary school you learned about

Pie Charts: a circle broken into different size slices, each slice representing a

percent of the whole. Also, Bar Charts: the height of each bar shows how many

people are in a given category. There are lots of other, fancier techniques. But for

almost all of them you’re limited to the two dimensions of a piece of paper. In

computer programs you can make graphs look three-dimensional, but you need

to think about what you’re really trying to show. Generally speaking, in addition to

whatever you measured (in units of however you measured it), you have to

convey how many events got what score, the time period things change over,

perhaps contrasts between different groups (e.g. men and women, or salary vs

hourly, or technical vs sales). You can use different colors, footnotes, and other

tools. Your goal is to be able to show your chart to someone else and have them

understand it.

Commandment Number 5: Look at the shape of the distribution.

You’ve got your sample and measured whatever variable you’re studying. Now

you want to understand what the results are telling you. The simplest way is to

rank the scores from biggest to smallest. Inferential statistics (ways to describe

the population N based on what you observed in your sample n) are usually

graphed in a curvy line on a grid with a horizontal and vertical axis. (Soon you’ll

understand the bell curve.) Imagine a horizontal line from left to right (the x axis),

and a vertical one (the y axis) where the crossing point is zero. Mark the x axis with

key intervals (for example 5’0”-5’6”, 5’7”-5’11”, etc.). On the y axis you

count how many events/people/etc. fall into a given category. Then connect the

tops of each category. If everyone scored the same, you’d have only a tall mark

in that category. If everyone was spread equally across categories, you’d see a

straight line across them. For most things you measure, there will be groupings,

tall categories with more observations and flatter ones with fewer. Look at the

picture and see what it tells you.
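
To see the counting step in action, here is a small sketch in Python. The twelve heights are made-up data, and the 6-inch bands (60-65 inches is 5’0” to 5’5”, and so on) are just one reasonable way to set the intervals.

```python
from collections import Counter

# Hypothetical heights in inches for a sample of 12 people (made-up data).
heights = [62, 64, 65, 66, 67, 67, 68, 69, 70, 70, 71, 74]

def category(inches):
    """Label a height by its 6-inch band, e.g. 62 falls in '60-65'."""
    low = (inches // 6) * 6
    return f"{low}-{low + 5}"

# Count how many people fall into each category (the y axis).
counts = Counter(category(h) for h in heights)

# Print a sideways bar for each category (the x axis), shortest band first.
for label in sorted(counts):
    print(label, "#" * counts[label])
```

The middle band comes out tallest, with a short bar on either side. That lump in the middle is the shape you’re looking for.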

Commandment Number 6: Know one average from another.

Averages tell you about the middle of your sample. There are three kinds of

averages. Each one tells you something different. If everyone scored exactly the

same, you could stop counting now. If you looked at all the observations in a

ranked list, the median is the number in the middle. For example, if you look at

the salaries of 101 employees, and rank them from lowest to highest, the median

is the salary of the 51st person. The mode goes back to the shape of the

distribution. It’s the category with the most observations in it. For example, if

you’re looking at how long people stay with your company, and more of your

employees are in 3-6 years than any other group, the mode is 4.5 (the middle of

the biggest group, even if no single person has worked there 4.5 years). The

mean is the number you get if you share equally. It’s as if you added up all the

scores and divided them by how many people you measured. For example, if you

added up the heights of all the players in the NBA and divided by the number of players,

the mean height might be 6’3”, even though there are some short guys and some

giants. BTW, whenever someone says “average,” try to know which average

they’re using. In a perfect bell-shaped distribution, all three averages are at the

top of the bell.
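
Here is a quick sketch of all three averages using Python’s built-in statistics module. The years of service are made-up data, and note that this mode is the most common single value, a simpler cousin of the biggest-category version described above.

```python
import statistics

# Hypothetical years of service for 9 employees (made-up data).
years = [1, 2, 2, 3, 4, 4, 4, 7, 18]

mean_years = statistics.mean(years)      # add them all up, divide by 9
median_years = statistics.median(years)  # the 5th value in the ranked list
mode_years = statistics.mode(years)      # the value that shows up most often

print(mean_years, median_years, mode_years)
```

The one 18-year veteran pulls the mean up to 5, while the median and mode both stay at 4. That gap is why it pays to ask which average someone is quoting.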

Commandment Number 7: Know how different your group is from itself.

The fancy statistical name for this concept is standard deviation. It has to do with

how unalike the members of your sample n (and implicitly N) are from one

another. Imagine a startup firm, where everyone has worked there a very short

time. If you are measuring length of service among employees, there’d be a very

small standard deviation. If you look instead at a place like the US military, you

might find career soldiers in the same sample as new recruits. The standard

deviation would be much larger. For a different visual, imagine an NBA team

where everyone is between 6’1” and 6’5” (a small standard deviation), compared to

one with a guy 5’5” and another 7’2”. The two teams might have the same

“average” height, but they’d look very different when they lined up for the pledge

of allegiance. Note: There’s math to calculate a standard deviation, but most

calculators will do it for you.
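
So will Python. Here is a minimal sketch of the two-team picture; both rosters are invented for illustration, with heights in inches. The stdev function uses the sample formula (it divides by n - 1), which is the usual choice when your numbers are a sample n rather than all of N.

```python
import statistics

# Hypothetical rosters, heights in inches (made-up data).
team_a = [73, 74, 75, 76, 77]   # everyone between 6'1" and 6'5"
team_b = [65, 74, 75, 75, 86]   # includes a guy at 5'5" and another at 7'2"

# Both teams have the same mean height of 75 inches.
print(statistics.mean(team_a), statistics.mean(team_b))

# The standard deviations tell the two teams apart.
sd_a = statistics.stdev(team_a)
sd_b = statistics.stdev(team_b)
print(round(sd_a, 2), round(sd_b, 2))
```

Same “average,” very different lineups: the second team’s standard deviation comes out several times larger than the first’s.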

Commandment Number 8: Understand for whom the bell tolls.

The infamous bell curve (as in “Do you grade on a curve?”) is a distribution

shaped like a bell, drawn from knowing only two numbers, the mean and the

standard deviation. (This is where it gets very cool.) You’ve been doing this

intuitively for years, as in: It takes me 30 minutes to get to work, give or take five.

That means, most of the time, you will get to work in 25-35 minutes. Less often

it’ll take 20-25 minutes or 35-40 minutes. Rarely you’ll get there in less than 20 or

more than 40. By knowing only two numbers, the mean and standard deviation,

you can get a very good and surprisingly accurate picture of your population.

Generally speaking, for normally distributed variables, which is a lot of what we

measure, about 68% of the population will fall within one standard deviation of the

mean (mean +/- 1 sd), 95% within mean +/- 2 sd, and 99.7% within mean +/- 3

sd. Just from knowing two numbers, you can make a bell curve and get a pretty

good picture of what’s going on in the world, all from measuring a random

sample of 30 or more. Amazing but true.
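
You can check those percentages yourself. The sketch below uses Python’s statistics.NormalDist to build the commute example (mean 30, standard deviation 5) and asks what share of a perfectly normal distribution lands within 1, 2, and 3 standard deviations. Treating your commute as perfectly normal is an assumption, of course.

```python
from statistics import NormalDist

# "30 minutes, give or take five": mean 30, standard deviation 5.
commute = NormalDist(mu=30, sigma=5)

def share_within(k):
    """Share of the distribution within k standard deviations of the mean."""
    low = commute.mean - k * commute.stdev
    high = commute.mean + k * commute.stdev
    return commute.cdf(high) - commute.cdf(low)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```

The three shares come out at roughly 0.683, 0.954, and 0.997, whatever mean and standard deviation you plug in.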

Commandment Number 9: Know what’s significant.

This is probably the simplest and most sophisticated concept in statistics. Once

you have a mean and a standard deviation, you can do what are called tests.

Tests are a fancy way of asking, if the truth is “this,” and in our sample we found

“that,” then what’re the chances that by sheer dumb luck we’d have stumbled

onto a sample that would be very far away, improbably away, from “this?” It’d be

like concluding the average height of NBA players is 5’9”, just because we

happened to pick a sample that included a lot of the shorter guys. When people

say “our results are statistically significant,” what they’re really saying is, if the

mean really were “this,” there’d be only a very small chance, say 1% or 5%, of dumb

luck handing us a sample as far off as “that.” One important note: the person doing the

stats decides how sure they want or have to be. If you’re testing an experimental

drug that has a side effect of death, you’d probably want to take a smaller chance

of thinking you’re right when you’re wrong than if you’re asking people which cola

they prefer.
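
Here is one way such a test might look in Python. It is a sketch of a simple one-sample z-test, which assumes you already know the population’s standard deviation; real analyses often use a t-test instead. Every number below (the claimed mean of 78 inches, the standard deviation of 3.5, the sample mean of 76) is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

claimed_mean = 78.0   # "this": the hypothetical claimed mean height, in inches
known_sd = 3.5        # assumed population standard deviation, in inches
n = 30                # sample size
sample_mean = 76.0    # "that": what our hypothetical sample showed

# How many standard errors separate "that" from "this"?
standard_error = known_sd / sqrt(n)
z = (sample_mean - claimed_mean) / standard_error

# Two-sided p-value: the chance of landing at least this far away by luck alone.
p_value = 2 * NormalDist().cdf(-abs(z))

alpha = 0.05          # how sure we decided to be, chosen before looking
print(round(z, 2), p_value < alpha)
```

With these made-up numbers the sample sits about three standard errors below the claim, so the result counts as significant at the 5% level. Set alpha to 0.01 for the scary drug, and you’re demanding more proof.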

Commandment Number 10: Get out your crystal ball.

There are many more complicated statistical techniques that try to predict

things. For those you generally need to look at more than one variable at a time.

For example, if you’re trying to figure out what you’d pay for a new pickup, you’d

want to know lots of things like: year, mileage, brand (yes, there are ways to

measure things that are names and not numbers), automatic vs 5-speed,

options, what part of the country you’re buying in, accident history, etc etc etc all

the way down to whether or not it has genuine leopard skin seats. If you have

enough info, you can predict what it should cost. That’s how the Kelley Blue Book

works. These techniques are interesting, though complex, and you’ll need a more

advanced guide.
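
For a small taste, here is a sketch of the simplest version: predicting price from just one variable, mileage, by fitting a straight line (ordinary least squares). The five pickups are made-up data, and a real pricing guide would juggle many more variables than this.

```python
# Hypothetical pickups: mileage in thousands of miles, price in thousands of dollars.
mileage = [20, 40, 60, 80, 100]
price = [30, 27, 24, 20, 19]

mean_x = sum(mileage) / len(mileage)
mean_y = sum(price) / len(price)

# Least-squares slope: how much the price moves per extra thousand miles.
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(mileage, price))
    / sum((x - mean_x) ** 2 for x in mileage)
)
# The intercept is the fitted price at zero miles.
intercept = mean_y - slope * mean_x

def predict(miles_in_thousands):
    """Predicted price (in thousands of dollars) from the fitted line."""
    return intercept + slope * miles_in_thousands

print(round(predict(50), 2))
```

In this toy data each extra thousand miles knocks about $145 off the price, so a truck with 50,000 miles is predicted at roughly $25,450.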

Try to think about statistics as looking like algebra but really being geometry.

You’re trying to draw a picture that shows someone what you think is true about

everyone you haven’t measured, based on the people you did measure. If you’re

interested, think about taking a class. If you can master this kind of thinking, it’s a

fast track to advancement.