# Applied statistics/Tutorials

## Rules of chance

### The addition rule

For two mutually exclusive events, A and B,

the probability that either A or B will occur is equal to the probability that A will occur plus the probability that B will occur,

- P(A or B) = P(A) + P(B).
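The rule can be checked by counting outcomes. The sketch below assumes one roll of a fair six-sided die, which is an illustrative example and not part of the text above; the helper `probability` and the events `A` and `B` are hypothetical names.

```python
from fractions import Fraction

# Illustrative assumption: one roll of a fair six-sided die.
outcomes = range(1, 7)

def probability(event):
    """Probability of an event given as a set of equally likely outcomes."""
    favourable = sum(1 for o in outcomes if o in event)
    return Fraction(favourable, len(outcomes))

A = {1}  # the roll is a 1
B = {2}  # the roll is a 2 (cannot happen together with A)

p_a_or_b = probability(A | B)            # counted directly
p_sum = probability(A) + probability(B)  # addition rule

print(p_a_or_b, p_sum)
```

Both values come out as 1/3. If A and B overlapped (for example A = {1, 2} and B = {2, 3}), the two values would differ, which is why the rule is stated only for mutually exclusive events.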

### The multiplication rule

For two independent (unrelated) events, A and B,

the probability that A and B will both occur is equal to the probability that A will occur multiplied by the probability that B will occur,

- P(A and B) = P(A) x P(B).
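As a small check of this rule, the sketch below enumerates two rolls of a fair die. The dice, the helper `prob` and the chosen events are illustrative assumptions, not taken from the text above.

```python
from fractions import Fraction
from itertools import product

# Illustrative assumption: two independent rolls of a fair six-sided die.
rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs

def prob(predicate):
    """Fraction of the 36 outcomes for which predicate(first, second) holds."""
    hits = sum(1 for a, b in rolls if predicate(a, b))
    return Fraction(hits, len(rolls))

p_a = prob(lambda a, b: a == 6)                    # A: first roll is a 6
p_b = prob(lambda a, b: b == 6)                    # B: second roll is a 6
p_a_and_b = prob(lambda a, b: a == 6 and b == 6)   # both occur

print(p_a_and_b, p_a * p_b)
```

Counting the outcomes directly and multiplying the two separate probabilities both give 1/36.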

### Bayes' theorem

The probability that event A will occur, given that event B has occurred, is equal to the probability that B will occur, given that A has occurred, multiplied by the probability that A will occur and divided by the probability that B will occur,

- P(A/B) = P(B/A) x P(A)/P(B).
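The formula translates directly into a one-line function. In the sketch below the function name `bayes` and the die example (A = "the roll is a 6", B = "the roll is even") are illustrative assumptions.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A/B) = P(B/A) x P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Illustrative assumption: one roll of a fair six-sided die.
# A = "the roll is a 6", B = "the roll is even".
p_b_given_a = Fraction(1)   # a 6 is always even
p_a = Fraction(1, 6)
p_b = Fraction(1, 2)

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(p_a_given_b)
```

The result is 1/3, which agrees with direct counting: of the three even faces (2, 4, 6), one is a 6.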

## Common fallacies

### The double birthday fallacy

#### The fallacy

That it is very unlikely that 2 people in a group of 24 have the same birthday.

#### The truth

That there is a better than 50 percent probability that 2 people in any group of 23 or more will have the same birthday.

#### Proof

**Step 1**: the following chain of argument proves that the number of different pairs in a group of 23 people is 253:

(a) the number of pairs that there would be if each of 23 people paired with one of 23 people is 23 x 23 = 529;
(b) deducting the 23 cases in which a person would be paired with himself leaves 529 - 23 = 506; and,
(c) deducting the duplicates that occur if a pair such as AB were counted as well as the pair BA leaves 506/2 = 253.
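The count in Step 1 can be confirmed by listing the pairs. This is a minimal sketch in which people are simply numbered 0 to 22; the variable names are hypothetical.

```python
from itertools import combinations
from math import comb

people = range(23)  # label the 23 people 0..22

# Every unordered pair of two different people, listed explicitly.
pairs = list(combinations(people, 2))
n_pairs = len(pairs)

# The same count following steps (a) to (c): 23 x 23, minus 23, halved.
by_steps = (23 * 23 - 23) // 2

print(n_pairs, by_steps, comb(23, 2))
```

All three numbers are 253: the explicit listing, the step-by-step arithmetic, and the standard "23 choose 2" formula.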

**Step 2**: the following argument proves that the probability that the two people making up one particular pair **do not** have the same birthday is 99.726 per cent:

(a) of the 365 days in a year there are 364 days that are not A's birthday; (b) there is a one in 365 chance that B's birthday falls on any one of those days; (c) therefore the probability that B's birthday falls on a day that is not A's birthday is 364/365 = 0.99726 or 99.726 per cent.

**Step 3**: the following argument shows that the probability that none of the 253 different pairs have the same birthday is approximately 49.95 per cent:

(a) since the probability that one particular pair do not have the same birthday is 99.726 per cent, the probability that neither of two selected pairs have the same birthday is 0.99726 x 0.99726 or (0.99726)^{2}, and that for none of three selected pairs it is (0.99726)^{3}, and so on (this treats the pairs as independent, which is a close approximation rather than an exact result: if A matches B and B matches C, then A must match C); (b) so the probability that none of the 253 possible pairs of step 1 have a birthday in common is (0.99726)^{253} = 0.4995 or 49.95 per cent.

**Step 4**: since the probability that none of the 253 pairs has the same birthday is 0.4995, the probability that at least one of the pairs does have the same birthday must be 1 - 0.4995 = 0.5005 or 50.05 per cent.
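The four steps can be checked numerically. The sketch below first follows the pair-based argument above, then adds the standard exact calculation as a cross-check; the exact calculation is not part of the argument in the text, and both ignore leap years and assume all 365 birthdays are equally likely.

```python
from math import comb, prod

n = 23       # people in the group
days = 365   # assumption: no leap years, all days equally likely

# Pair-based argument from the steps above: treat the 253 pairs as independent.
p_no_match_pairs = (364 / 365) ** comb(n, 2)
p_match_pairs = 1 - p_no_match_pairs

# Cross-check (not in the text): person k+1 must avoid the k birthdays
# already taken, so the "no match" probability is a product of n terms.
p_no_match_exact = prod((days - k) / days for k in range(n))
p_match_exact = 1 - p_no_match_exact

print(round(p_match_pairs, 4), round(p_match_exact, 4))
```

The pair-based figure comes out just over 50 per cent and the exact figure is a little higher, at roughly 50.7 per cent, so the conclusion holds either way.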

### The false positive fallacy

#### The fallacy

That if a test for a disease that has a prevalence rate of 1 in 1000 has a false positive rate of 5%, there is a 95 per cent probability that a person who has been given a positive result actually has the disease.

#### The truth

The true probability is about 2 per cent.

#### Proof

Let A denote the event of having the disease and B the event of having been tested positive (for the purpose of applying Bayes' theorem).

Then P(B/A) which is the probability of having been tested positive when having the disease, can be taken as equal to 1;

And P(A) is the probability of having the disease, which with a prevalence of 1 in 1000 must be equal to 1/1000;

And P(B) is the probability of being tested positive, which can be arrived at by 3 steps:

**Step 1** is to observe that since the prevalence of the disease is 1 in 1000, 999 persons out of every 1000 are healthy.

**Step 2** is to recall that for each healthy person the probability of being tested positive is 5% or 1 in 20.

**Step 3** is to apply the multiplication rule and get the answer:

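- P(B) = 999/1000 multiplied by 1/20, which is near enough 1/20 (this leaves out the 1 in 1000 who have the disease and test positive, which adds only 0.001 to the total).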
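To see how little the rounding matters, the sketch below computes P(B) both ways: from healthy people only, as in the three steps, and with the people who have the disease and test positive included. The variable names are hypothetical, and the figures are those given in the text.

```python
from fractions import Fraction

prevalence = Fraction(1, 1000)          # P(A), from the text
false_positive_rate = Fraction(1, 20)   # 5%, from the text
p_positive_if_sick = Fraction(1)        # P(B/A), taken as 1 in the text

# Steps 1 to 3: healthy people who nevertheless test positive.
p_healthy_and_positive = (1 - prevalence) * false_positive_rate

# The small extra term: people who have the disease and test positive.
p_sick_and_positive = prevalence * p_positive_if_sick

p_b_full = p_healthy_and_positive + p_sick_and_positive

print(float(p_healthy_and_positive), float(p_b_full))
```

The two values are 0.04995 and 0.05095, both close to 1/20 = 0.05.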

So applying Bayes' theorem, the probability of having the disease, given that you have been tested positive is given by

- P(A/B) = P(B/A) x P(A)/P(B), or:
- = 1 x (1/1000)/(1/20) - which is 0.02, or 2%.
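The same calculation can be run in a few lines. This sketch uses the figures from the text, once with P(B) rounded to 1/20 and once with P(B) computed in full (healthy and sick people who test positive); the variable names are hypothetical.

```python
from fractions import Fraction

p_a = Fraction(1, 1000)      # P(A): prevalence, from the text
p_b_given_a = Fraction(1)    # P(B/A): taken as 1 in the text
fp_rate = Fraction(1, 20)    # false positive rate of 5%

# P(B) rounded to 1/20, as in the derivation above.
p_b_rounded = Fraction(1, 20)
posterior_rounded = p_b_given_a * p_a / p_b_rounded

# P(B) in full: true positives plus false positives.
p_b_full = p_b_given_a * p_a + fp_rate * (1 - p_a)
posterior_full = p_b_given_a * p_a / p_b_full

print(float(posterior_rounded), round(float(posterior_full), 4))
```

The rounded version gives 0.02 and the full version gives about 0.0196, so "2 per cent" is a fair summary in both cases.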

### The prosecutor's fallacy

#### The fallacy (an example)

The fact that the accused's DNA matched that of the sperm found on the victim in a test which has a one in a thousand chance of giving a false positive result means that there is only a one in a thousand chance of the accused's innocence.

#### The truth

In fact it means nothing of the sort. One in a thousand of the rest of the population would give the same result, so if the accused is one of half a million people who could have committed the crime, there would be about 500 people (in addition to the real rapist) giving the same result. So, in the absence of other evidence, the positive result establishes only about a one in 500 probability of the accused's guilt. (DNA evidence can, of course, provide valid proof of guilt when it is used to establish who, among a restricted group of suspects, committed the crime.)
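The arithmetic behind this example can be sketched as follows. It uses the figures from the text and adds two assumptions that the text leaves implicit: the real culprit always matches, and before the test every one of the possible culprits is equally likely to be guilty. The variable names are hypothetical.

```python
from fractions import Fraction

population = 500_000                   # people who could have committed the crime
false_match_rate = Fraction(1, 1000)   # chance an innocent person matches

# Innocent people expected to match, rounded as in the text
# (strictly 499,999 innocents, which makes no practical difference).
expected_false_matches = population * false_match_rate

# Assumption: the culprit always matches, so the pool of matching people
# is the false matches plus one, and the accused is one of that pool.
p_guilt = 1 / (expected_false_matches + 1)

print(expected_false_matches, p_guilt, float(p_guilt))
```

This gives 500 expected false matches and a probability of guilt of 1/501, or about 0.2 per cent, rather than the 99.9 per cent that the fallacy suggests.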