Probability space

In probability theory, the notion of probability space is the conventional mathematical model of randomness. It formalizes three interrelated ideas by three mathematical notions. First, a sample point (also called an elementary event): something to be chosen at random (an outcome of an experiment, a state of nature, a possibility, etc.). Second, an event: something that will occur or not, depending on the chosen sample point. Third, the probability of an event.

Alternative models of randomness (finitely additive probability, non-additive probability) are sometimes advocated in connection with various probability interpretations.

Introduction
The notion "probability space" provides a basis of the formal structure of probability theory. It may puzzle a non-mathematician for several reasons:


 * it is called "space" but is far from geometry;


 * it is said to provide a basis, but many people applying probability theory in practice neither understand nor need this quite technical notion.

These puzzling facts are explained below. First, a mathematical definition is given; it is quite technical, but the reader may skip it. Second, the elementary case of a finite probability space is presented. Third, the puzzling facts are explained. The remaining topics are countably infinite probability spaces and general probability spaces.

Definition
A probability space is a measure space such that the measure of the whole space is equal to 1.

In other words: a probability space is a triple $$\textstyle (\Omega, \mathcal F, P)$$ consisting of a set $$\textstyle \Omega$$ (called the sample space), a σ-algebra (called also σ-field) $$\textstyle \mathcal F $$ of subsets of $$\textstyle \Omega$$ (these subsets are called events), and a measure $$\textstyle P$$ on $$\textstyle (\Omega, \mathcal F)$$ such that $$\textstyle P(\Omega)=1$$ (called the probability measure).

Elementary level: finite probability space
On the elementary level, a probability space consists of a finite number $$n$$ of sample points $$ \omega_1, \dots, \omega_n $$ and their probabilities $$ p_1, \dots, p_n $$ — positive numbers satisfying $$ p_1 + \dots + p_n = 1. $$ The set $$ \Omega = \{ \omega_1, \dots, \omega_n \} $$ of all sample points is called the sample space. Every subset $$ A \subset \Omega $$ of the sample space is called an event; its probability is the sum of probabilities of its sample points. For example, if $$ A = \{ \omega_1, \omega_8, \omega_9 \} $$ then $$ \mathbb{P} (A) = p_1 + p_8 + p_9 $$.
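The finite definition above can be sketched directly in code. This is a minimal illustration; the sample points and their probabilities are assumed values, not taken from the text.

```python
# A toy finite probability space: sample points w1..w4 with probabilities
# summing to 1 (the numbers are illustrative assumptions).
probabilities = {"w1": 0.1, "w2": 0.2, "w3": 0.3, "w4": 0.4}
assert abs(sum(probabilities.values()) - 1.0) < 1e-12

def prob(event):
    """Probability of an event: the sum of the probabilities of its points."""
    return sum(probabilities[w] for w in event)

A = {"w1", "w3"}   # an event, i.e. a subset of the sample space
print(prob(A))     # p_1 + p_3, here 0.1 + 0.3
```

The whole sample space, viewed as an event, gets probability 1, matching the requirement $$ P(\Omega)=1 $$ in the definition.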

A random variable $$ X $$ is described by real numbers $$ x_1, \dots, x_n $$ (not necessarily different) corresponding to the sample points $$ \omega_1, \dots, \omega_n. $$ Its expectation is $$ \mathbb{E} (X) = x_1 p_1 + \dots + x_n p_n. $$
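The expectation formula is a weighted sum and can be checked in a few lines. The particular values and probabilities below are assumptions chosen only for the sketch.

```python
# A random variable on a finite probability space assigns a real number x_i
# to each sample point; its expectation is x_1*p_1 + ... + x_n*p_n.
p = [0.1, 0.2, 0.3, 0.4]     # probabilities p_1..p_4, summing to 1 (assumed)
x = [5.0, 5.0, -1.0, 2.0]    # values x_1..x_4, not necessarily distinct

expectation = sum(xi * pi for xi, pi in zip(x, p))
print(expectation)   # 0.5 + 1.0 - 0.3 + 0.8, i.e. approximately 2.0
```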

Why "space"?
Fact: it is called "space" but is far from geometry.

Explanation: see Space (mathematics).

What is it good for?
Fact: it is said to provide a basis, but many people applying probability theory in practice do not need this notion. For them, formulas (such as the addition rule, the multiplication rule, the inclusion-exclusion rule, the law of total probability, Bayes' rule, etc.) are instrumental; probability spaces are not: they reign but do not rule.

Explanation 1. Likewise, one may say that points are of no use in geometry. Formulas connecting lengths and angles (such as the Pythagorean theorem, the law of sines, etc.) are instrumental; points are not.

However, these useful formulas follow from the axioms of geometry formulated in terms of points (and some other notions). It would be very cumbersome and unnatural, if at all possible, to reformulate geometry avoiding points.

Similarly, the formulas of probability follow from the axioms of probability formulated in terms of probability spaces. It would be very cumbersome and unnatural, if at all possible, to reformulate probability theory avoiding probability spaces.

Explanation 2. One of the most useful formulas is linearity of expectation: $$ \mathbb{E} (aX+bY) = a \mathbb{E} (X) + b \mathbb{E} (Y) $$ whenever $$ X, Y $$ are random variables and $$ a, b $$ are (non-random) coefficients. One may derive this formula avoiding probability spaces, by transforming the sum
 * $$ \sum_{x,y} (ax+by) \mathbb{P} (X=x, Y=y ) $$

into the linear combination
 * $$ a \sum_x x \mathbb{P} (X=x) + b \sum_y y \mathbb{P} (Y=y). $$

However, much better insight is provided by probability spaces: the expectation $$ \mathbb{E} (X) = x_1 p_1 + \dots + x_n p_n $$ is a linear function of the variables $$ x_1, \dots, x_n. $$ Moreover, a helpful connection to linear algebra appears: random variables form an $$n$$-dimensional linear space, and the expectation is a linear functional on this space.
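The probability-space view of linearity can be verified numerically: put $$X$$ and $$Y$$ on one finite space and compare both sides of the identity. The space and the values of the random variables below are illustrative assumptions.

```python
# Numerical check of linearity of expectation, E(aX + bY) = a E(X) + b E(Y),
# on a single finite probability space carrying both random variables.
p = [0.2, 0.3, 0.5]    # probabilities of the three sample points (assumed)
X = [1.0, -2.0, 4.0]   # value of X at each sample point
Y = [0.0, 3.0, -1.0]   # value of Y at each sample point
a, b = 2.0, -3.0       # non-random coefficients

def E(values):
    """Expectation as a linear function of the values x_1, ..., x_n."""
    return sum(v * q for v, q in zip(values, p))

lhs = E([a * xi + b * yi for xi, yi in zip(X, Y)])
rhs = a * E(X) + b * E(Y)
print(lhs, rhs)   # the two sides agree
```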

Two approaches to infinity
Everything is finite in applications, but mathematical theories often benefit from using infinity. In mathematical analysis, infinity appears only indirectly, via a limiting procedure, as when one says that something "tends to infinity". In set theory, infinity appears directly; for instance, one says that "the set of prime numbers is infinite". Both approaches to infinity can be used in probability theory.

Example 1. "A randomly chosen integer is even with probability 0.5." This phrase is interpreted via limiting procedure: the fraction of even numbers among $$ 1,\dots,n $$ converges to 0.5 as $$ n $$ tends to infinity. This approach introduces an infinite sequence of finite probability spaces; the $$n$$-th space consists of sample points $$ 1,\dots,n $$ endowed with equal probabilities $$ 1/n. $$
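The limiting procedure of Example 1 is easy to observe computationally; the sketch below simply counts even numbers in the $$n$$-th finite space for a few assumed values of $$n$$.

```python
# Fraction of even numbers among 1..n: the n-th finite probability space
# assigns equal probability 1/n to each of the points 1, ..., n.
def fraction_even(n):
    return sum(1 for k in range(1, n + 1) if k % 2 == 0) / n

for n in (9, 99, 999, 9999):      # odd n chosen so the fraction is not yet 0.5
    print(n, fraction_even(n))    # values approach 0.5 as n grows
```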

Example 2. "Flipping a fair coin repeatedly, one must get "heads" sooner or later." This phrase, too, may be interpreted via an infinite sequence of finite probability spaces: flipping the coin $$n$$ times, one gets "heads" at least once with probability $$ 1 - 2^{-n} $$, which converges to 1 as $$n$$ tends to infinity. Another interpretation is possible, via a single infinite probability space consisting of the sequences H, TH, TTH, TTTH and so on ("TTH" means: "tails" twice, then "heads"; the coin is tossed until "heads") having the probabilities
 * $$ \mathbb{P} (H) = 1/2; \quad \mathbb{P} (TH) = 1/4; \quad \mathbb{P} (TTH) = 1/8; \quad \dots $$

whose sum is $$ 2^{-1} + 2^{-2} + 2^{-3} + \dots = 1. $$ One may also add the infinite sequence "TTT..." ("tails forever") to the sample space; but then necessarily
 * $$ \mathbb{P} (TTT\dots) = 0 $$

since the sum of probabilities cannot exceed 1.
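The single infinite space of Example 2 can be sketched by summing the geometric probabilities $$ 2^{-1}, 2^{-2}, \dots $$; the partial sums approach 1, which is what forces the "tails forever" point to carry probability 0.

```python
# Sample points H, TH, TTH, ... : the first "heads" appears on flip k
# with probability 2**(-k), for k = 1, 2, 3, ...
def p_first_heads_at(k):
    return 2.0 ** (-k)

partial = sum(p_first_heads_at(k) for k in range(1, 51))
print(partial)   # equals 1 - 2**(-50), extremely close to 1
```

Since every partial sum stays below 1 yet tends to 1, no positive probability is left over for the infinite sequence "TTT...".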

It is tempting to extend this approach (a single infinite probability space) to the case of Example 1, defining
 * $$ \mathbb{P} (A) = \lim_{n\to\infty} \frac{ | A \cap \{1,\dots,n\} | }{ n } $$

for $$ A \subset \Omega = \{ 1,2,\dots \}; $$ here $$ | A \cap \{1,\dots,n\} | $$ is the number of elements of $$A$$ among $$ 1,\dots,n. $$ This limit, called the density of $$A,$$ is a useful mathematical device. However, treating it as a probability one gets numerous paradoxes. One paradox: an integer chosen at random must have more than one decimal digit, since $$ \mathbb{P} ( \{ 1,\dots,9 \} ) = 0. $$ Similarly, it must have more than two digits; and so on. Thus, it must have infinitely many digits, which cannot happen to an integer. Another paradox: let two integers $$ X, Y $$ be chosen at random, independently. Then $$ \mathbb{P} ( X \le Y ) = 0, $$ since $$ \mathbb{P} ( X \le 1 ) = 0, $$ $$ \mathbb{P} ( X \le 2 ) = 0 $$ and so on. Similarly, $$ \mathbb{P} ( Y \le X ) = 0. $$ Thus, almost surely $$ X > Y $$ and $$ Y > X $$ simultaneously, which is impossible.
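The first paradox can be watched numerically: the finite counterpart of the density of the single-digit integers shrinks toward 0 as $$n$$ grows. The cutoff values of $$n$$ below are assumed for illustration.

```python
# Finite-n counterpart of the density: |A intersect {1,...,n}| / n.
def density_upto(A, n):
    return sum(1 for k in range(1, n + 1) if k in A) / n

single_digit = set(range(1, 10))          # the event {1, ..., 9}
for n in (10, 1000, 100000):
    print(n, density_upto(single_digit, n))   # fractions shrink toward 0
```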

By default (unless explicitly stated otherwise), probability theory deals with a single probability space. When solving a specific problem, the probability space is usually (but not always) chosen according to the given problem; when developing general theory, it is arbitrary.

The notions "negligible" and "almost sure"
A sample point of zero probability can be added to a probability space or removed from it at will, since it cannot contribute to any probability (or expectation). Such a point is called negligible.

In Example 2 (above) the case "tails forever" is negligible.

An event of probability 1 is said to happen almost surely.

In Example 2 (above), "heads" appears (sooner or later) almost surely.

The following anecdote follows a real event.

Professor (dealing with a random variable $$X$$): ...here we use the evident fact that $$ -1 \le \sin X \le 1 $$ almost surely.

Student: Why "almost surely"? It holds surely.

Professor (laughing): You see, I am a probabilist. We probabilists do not say "sure"; "almost sure" is our strongest expression.

Countable additivity
As was noted above, paradoxes prevent treating the density of a set $$ A \subset \Omega = \{ 1,2,\dots \} $$ as its probability. These paradoxes are caused by a violation of countable additivity. Namely, the single-point sets $$ A_1 = \{1\}, \, A_2 = \{2\}, \, A_3 = \{3\}, \dots $$ each have density 0, but their union $$ A_1 \cup A_2 \cup \dots = \Omega $$ has density 1.

Countable additivity requires
 * $$ \mathbb{P} ( A_1 \cup A_2 \cup \dots ) = \mathbb{P} (A_1) + \mathbb{P} (A_2) + \dots $$

whenever the events $$ A_1, A_2, \dots $$ are mutually exclusive (in other words, disjoint as sets).
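The failure of countable additivity for the density can be sketched with finite truncations: each singleton has vanishing density, yet a union of many singletons fills the whole truncated space. The truncation size is an assumed illustrative value.

```python
# Finite-n counterpart of the density: |A intersect {1,...,n}| / n.
# Each singleton {k} has density 0 in the limit, but the union of all
# singletons is the whole of {1, 2, ...}, which has density 1.
def density_upto(A, n):
    return sum(1 for k in range(1, n + 1) if k in A) / n

n = 10 ** 5
print(density_upto({7}, n))                   # a single point: near 0
print(density_upto(set(range(1, n + 1)), n))  # the union up to n: exactly 1.0
```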