# A layman's guide to Statistics and Probability!

July 19, 2015

The look that I get from A-level students who are glancing at their S1 textbook for the first time, and stumble across those large tables in the appendix with piles and piles of numbers, is partly what prompted me to make this resource. That look is often extreme confusion / fear / anxiety about what is to come... until a month or two later, when those tables have become a magical calculator that every student uses without a second thought.

In this guide I hope to help you understand statistical jargon and tell apart the terms "Statistics" and "Probability". I've always found higher level Maths easy once I've comprehended the underlying principles - which is exactly the manner in which I tutor. Anyway, let's get started:

Statistics is concerned with the collection, presentation and interpretation of data. (The easy bit - all your GCSE maths)

i) collection - how we can collect data efficiently while reducing bias in our samples.

ii) presentation - the data we collect is useless if we just gather it and look at it. That's why we have fancy graphs and charts, so that our human brain can interact with the visual aid and easily make inferences about what the data is saying by building a picture. (Bar Charts, Histograms, Pie charts, blah blah...)

iii) interpretation - can be unique to each person. We can interpret the data we collected and presented in our own way, but a more rigorous approach is to use statistics (a statistic simply means a function of the data we collected which tells us something about it - the mean, median, mode, quartiles, percentiles...)
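To make "a function of the data" concrete, here is a minimal sketch using Python's built-in `statistics` module. The exam marks are made-up numbers purely for illustration, not real data.

```python
import statistics

# Hypothetical sample of exam marks (invented for this illustration)
marks = [52, 61, 61, 68, 70, 74, 79, 85, 90]

mean = statistics.mean(marks)                # add them up, divide by how many
median = statistics.median(marks)            # the middle value once sorted
mode = statistics.mode(marks)                # the most common value
quartiles = statistics.quantiles(marks, n=4) # three cut points: Q1, Q2, Q3

print(f"mean = {mean:.2f}")
print(f"median = {median}")
print(f"mode = {mode}")
print(f"quartiles = {quartiles}")
```

Each of these takes the whole sample in and hands one summary number (or a few) back, which is all a statistic is.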

Statistics forms one half of a two-way interaction with Probability theory. The statistics part is usually there so that Probability theory can be used to make inferences about the behaviour of an underlying population.

Statistics in itself can be explained very qualitatively; it's Probability theory which gets very messy, and the messier it gets the more it helps in understanding certain phenomena and modelling such processes. A few professions in which advanced Probability theory is used:

*Actuarial Science (the application of mathematics to sectors such as insurance, pensions (my one) and investment).

*Investment Banks / Trading - Quants (mathematical geniuses who use probability theory to predict values of stocks and financial contracts to hedge risk and make millions).

Probability theory starts off from Random Variables. A random variable is simply something of interest which we can quantify (assign a number to), but which is random (we can't predict it with certainty). So basically anything!!

For example, for some odd reason I'm interested in the amount of rain that falls on the 20th July at 9.15 am (quite random, but that's the whole point of this). Firstly, this is a random variable because it ticks the box that we can't predict it with 100% accuracy, and if we did guess it exactly it would be just coincidence (a lot of factors underpin how much it will rain, and we have no way of fully understanding them).

The first distinction to make about Random variables is whether they are Continuous (these can take any value in a range) or Discrete (separate, countable values only, usually whole numbers). I know rain can be 3.22mm if it wants, or 1045.5345495425cm, so that makes it a Continuous random variable.
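Here is a small sketch of the difference, using Python's `random` module. The 0.4 chance of a rainy day and the 0 to 100mm range are assumptions invented for the example, not real weather figures.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Discrete: the number of rainy days in a week can only be 0, 1, 2, ..., 7.
# The 0.4 chance of rain on any given day is a made-up figure.
rainy_days = sum(1 for _ in range(7) if random.random() < 0.4)

# Continuous: the amount of rain can land anywhere in a range.
# The 0 to 100 mm range is an assumption for illustration.
rainfall_mm = random.uniform(0.0, 100.0)

print(f"rainy days this week (discrete): {rainy_days}")
print(f"rainfall in mm (continuous): {rainfall_mm}")
```

The first number is always a whole count; the second comes with as many decimal places as the computer can hold.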

The next step is assigning a Probability distribution. This essentially helps build our model for predicting the rain. The most famous continuous distribution is the Normal (Gaussian) distribution - it's called Normal because many natural processes have a tendency to follow it... so anything not normal is abnormal (those are the confusing ones).

Let me take a step back and explain what a probability distribution actually is - a list of the values that a random variable can take and the probability with which each value occurs.

To make it easier, let's pretend rain can only be anything between 0 and 100mm inclusive. The fact that it's continuous means it can take an infinite number of values (any number on that stretch of the number line). So we wouldn't have a list in this case, because it would be infinite...
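When the random variable is discrete, that list really can be written down. As a minimal sketch, here is a fair six-sided die (my own illustrative example, separate from the rainfall story) stored as a Python dictionary of value to probability.

```python
from fractions import Fraction

# A fair six-sided die: every face has probability 1/6.
# (Illustrative example only; the die is not part of the rainfall story.)
die = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities in any distribution must add up to exactly 1.
total = sum(die.values())

# To get the probability of an event, add up the values that count.
p_even = sum(p for face, p in die.items() if face % 2 == 0)

print(f"total probability = {total}")
print(f"P(even number) = {p_even}")
```

Using `Fraction` keeps the arithmetic exact, so the total comes out as 1 rather than something like 0.9999999.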

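The question is: how do we work out the probability of the rain being around 1mm, or 2.15mm, or 55.9mm? Well, we use the probability density function (p.d.f.), which is the formula for a continuous distribution that tells you how to work out probabilities. Strictly, the p.d.f. gives the height of the curve at each value, and a probability comes from the area under the curve over a range of values (the chance of hitting one exact value, like 55.9mm to infinitely many decimal places, is zero). For the normal distribution the formula is pretty intense:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

So that's why we have those tables at the back of your S1 textbook: a list of numbers with probabilities beside them, which are simply the areas worked out from the formula above for you. That's why I said those tables are quite magical.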
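One detail worth seeing in numbers: the p.d.f. gives a height, and a probability comes from an area under the curve. Here is a rough sketch using `NormalDist` from Python's `statistics` module (Python 3.8+). The mean of 50mm and standard deviation of 15mm are assumptions picked purely for illustration.

```python
from statistics import NormalDist

# Assumed parameters, purely for illustration: mean 50 mm, standard deviation 15 mm
rain = NormalDist(mu=50, sigma=15)

# Height of the curve at 55.9 mm. This is a density, not a probability.
density_at_55_9 = rain.pdf(55.9)

# Probability of between 55 and 56 mm: the area under the curve on that stretch,
# found by subtracting two cumulative probabilities.
p_55_to_56 = rain.cdf(56) - rain.cdf(55)

# Squeeze the interval around 55.9 mm and the probability shrinks towards zero.
p_tiny = rain.cdf(55.9001) - rain.cdf(55.8999)

print(f"density at 55.9 mm = {density_at_55_9:.5f}")
print(f"P(55 <= rain <= 56) = {p_55_to_56:.5f}")
print(f"P(55.8999 <= rain <= 55.9001) = {p_tiny:.8f}")
```

The `cdf` calls are doing the job of the textbook tables: each one returns the area under the curve to the left of a value.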

Now back to the normal distribution. Each distribution needs parameters before we can use it (these are like statistics - but from the population rather than a sample). For the normal one, we need the mean (the population mean, written μ and pronounced "mew") and the variance (σ², sigma squared) - these are in the formula above. Now all I need to do is pick a value I'm interested in, for example 0mm (I hate rain), standardise it (using another formula: subtract the mean, then divide by the standard deviation), get the standardised value and read the probability off the table to see what the chance of having no rain is (strictly, the chance of 0mm or less). Unfortunately, if I use the normal distribution the number I get gives me a very true representation of the British climate.
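The standardise-then-look-up routine can be sketched in a few lines of Python. The mean of 5mm and standard deviation of 2.5mm are made-up parameters, and `phi` and `standardise` are helper names I've invented here; `phi` rebuilds the kind of number the tables print, using the standard library's `math.erf`.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z <= z), as listed in the tables."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def standardise(x, mu, sigma):
    """Turn a raw value into a z-value: subtract the mean, divide by the standard deviation."""
    return (x - mu) / sigma

# Made-up parameters for illustration: mean 5 mm, standard deviation 2.5 mm
mu, sigma = 5.0, 2.5

z = standardise(0.0, mu, sigma)  # 0 mm sits two standard deviations below the mean
p_no_rain = phi(z)               # chance of 0 mm or less under this model

print(f"z = {z}")
print(f"P(rain <= 0 mm) = {p_no_rain:.4f}")
```

With these assumed numbers the chance of no rain comes out at a little over 2%, which is roughly what you would read off the table for a z-value of 2 (after using symmetry for the negative side).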

To summarise :

A random variable > assign probability distribution > use probability density function > get your probability.

I will continue this series by delving further into discrete random variables and other concepts...