STATS 101

Probability: The Logic of Uncertainty

Why we can predict the future (even when we can't predict the next coin flip).

KnowStatistics · Jan 12, 2026 · 6 min read
Figure: Order emerging from chaos (abstract visualisation of randomness coalescing into a pattern)

Why Probability Comes After Data (Not Before)

In our previous articles, like Scatterplots and Correlation, we spent our time looking at data that had already been collected. We measured heights, calculated averages, and looked for patterns in exam scores. This is Descriptive Statistics - the art of summarising the past.

But data science isn't just about history; it's about the future. We want to use the data we have to make guesses about the data we don't have. We want to know if a medical treatment will work for a new patient, or who is likely to win the next election. To make that leap from the known to the unknown, we need a bridge. That bridge is Probability.

Data is the evidence. Probability is the tool we use to weigh that evidence. It allows us to quantify our uncertainty, giving us a number that tells us how confident we should be in our predictions.

Randomness vs. Uncertainty

When we say something is "random" in daily life, we usually mean it's chaotic or haphazard. But in statistics, randomness has a very specific meaning. It describes a phenomenon where individual outcomes are uncertain, but there is a regular distribution of outcomes in a large number of repetitions.

Think of a coin flip. If you flip it once, you have no idea if it will be Heads or Tails. It is unpredictable. But if you flip it 10,000 times, you can predict with near certainty that roughly 50% of them will be Heads.

This is the Law of Large Numbers. It states that as you perform more and more trials of a random process, the average result settles down to a specific, predictable value. This is why casinos always make money in the long run - they can't predict your next hand of blackjack, but they can predict the aggregate outcome of a million hands with remarkable accuracy.

The Law of Large Numbers Simulator

Flip a virtual coin. In the short run, the percentage of heads fluctuates wildly. In the long run, watch it settle.

[Interactive widget: running totals of flips, heads, and the proportion of heads]
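If you would rather run the experiment yourself, here is a minimal sketch in Python (not the widget's actual code; the seed and checkpoints are my own choices). It flips a simulated fair coin 100,000 times and prints the running proportion of heads at a few milestones:

```python
import random

random.seed(42)  # fix the seed so the run is reproducible

heads = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}

for flip in range(1, 100_001):
    heads += random.random() < 0.5  # a fair coin: True counts as 1 head
    if flip in checkpoints:
        print(f"{flip:>7,} flips: proportion of heads = {heads / flip:.4f}")
```

The early checkpoints wander noticeably; by 100,000 flips the proportion typically sits within a fraction of a percentage point of 0.5.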

Sample Space & Events (Without the Math Trauma)

To calculate probability, we first need to define the universe of possibilities. We call this the Sample Space (often denoted as $S$). It is simply the list of everything that could possibly happen.

  • Coin flip: $S = \{Heads, Tails\}$
  • Rolling a die: $S = \{1, 2, 3, 4, 5, 6\}$

An Event (usually denoted as $A$, $B$, etc.) is a specific outcome or set of outcomes we are interested in. For example, rolling an even number on a die. The event $A$ would be $\{2, 4, 6\}$.

If every outcome is equally likely (like a fair die), the probability of event $A$ is just a simple fraction:

$P(A) = \frac{\text{Number of outcomes in A}}{\text{Total outcomes in S}}$
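This counting rule translates directly into code. A minimal sketch in Python (the variable names are mine), using exact fractions for the even-number example above:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}                     # S for a fair die
event_even = {s for s in sample_space if s % 2 == 0}  # A = {2, 4, 6}

p_a = Fraction(len(event_even), len(sample_space))    # outcomes in A / outcomes in S
print(p_a)  # 1/2
```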

Basic Probability Rules (Addition & Multiplication)

Life rarely involves just one event. We often want to know the probability of multiple things happening. For this, we have two main rules:

1. The Addition Rule (OR)

This is used when you are satisfied if either event happens. What is the probability of rolling a 5 OR a 6? Since you can't roll both at once (they are mutually exclusive), you simply add their probabilities:

$P(5 \text{ or } 6) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$

2. The Multiplication Rule (AND)

This is used when you need both events to happen. What is the probability of flipping Heads AND then rolling a 6? Since the coin flip doesn't affect the die roll (they are independent), you multiply their probabilities:

$P(\text{Heads and } 6) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12}$
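Both rules are easy to verify by brute force. A rough simulation sketch in Python (the setup and names are my own) estimates each probability and compares it with the theory:

```python
import random

random.seed(0)
trials = 100_000

or_hits = 0   # die shows 5 OR 6
and_hits = 0  # coin shows Heads AND die shows 6

for _ in range(trials):
    coin = random.choice(["H", "T"])
    die = random.randint(1, 6)
    or_hits += die in (5, 6)
    and_hits += coin == "H" and die == 6

print(f"P(5 or 6)      ~ {or_hits / trials:.3f} (theory: {2/6:.3f})")
print(f"P(Heads and 6) ~ {and_hits / trials:.3f} (theory: {1/12:.3f})")
```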

Two-Dice Explorer

Roll two dice 1,000 times. Observe how the "Sample Space" creates a pattern. Why is 7 the most common sum? (Hint: How many combinations make 7 vs. 2?)
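You can reproduce the explorer's pattern in a few lines. A small sketch along the same lines (assuming 1,000 rolls, as in the widget; the bar scaling is my choice):

```python
import random
from collections import Counter

random.seed(7)
sums = Counter(random.randint(1, 6) + random.randint(1, 6) for _ in range(1_000))

for total in range(2, 13):
    bar = "#" * (sums[total] // 5)  # crude histogram: one '#' per 5 rolls
    print(f"{total:>2}: {bar} ({sums[total]})")
```

The answer to the hint: six of the 36 equally likely pairs sum to 7 ((1,6), (2,5), (3,4), (4,3), (5,2), (6,1)), while only one pair, (1,1), sums to 2.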

Conditional Probability (The "Given That" Idea)

Probability is not static; it flows and changes based on information. The probability of an event changes if we know that another event has already occurred. This is Conditional Probability, written as $P(A|B)$ - read as "the probability of A given B".

Imagine a deck of cards. The probability of drawing an Ace is $4/52$. But if I tell you "I have already removed the King of Hearts from the deck," the probability changes slightly to $4/51$.

Or, think back to our scatterplots. If we pick a random student, the probability they scored over 90% might be low. But if we add the condition "Given that the student studied for 10 hours," the probability of that high score shoots up. Information refines our prediction.
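Simulation gives a hands-on way to see conditioning: estimate $P(A|B)$ by counting how often $A$ and $B$ occur together, divided by how often $B$ occurs. A sketch with two dice (the particular events are my own choice of example):

```python
import random

random.seed(1)
trials = 100_000

b_count = 0        # B: first die shows 6
a_and_b_count = 0  # A and B: first die shows 6 AND the sum is at least 10

for _ in range(trials):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    if d1 == 6:
        b_count += 1
        if d1 + d2 >= 10:
            a_and_b_count += 1

print(f"P(sum >= 10 | first die is 6) ~ {a_and_b_count / b_count:.3f}")  # theory: 3/6 = 0.5
```

Unconditionally, $P(\text{sum} \geq 10)$ is only $6/36 \approx 0.167$; learning that the first die is a 6 triples it. Information refines the prediction.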

Independence vs. Dependence

Understanding the relationship between events is crucial.

  • Independent Events: The outcome of one does not affect the other. A coin has no memory. If you flip 10 Heads in a row, the probability of the next flip being Heads is still exactly 50%. Believing otherwise is the famous Gambler's Fallacy - put to the test in the sketch after this list.
  • Dependent Events: The outcome of one affects the other. Drawing cards from a deck without putting them back is a classic example. If you draw an Ace, the deck is now "poorer" in Aces, lowering the chance for the next person.
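Here is a sketch that puts the Gambler's Fallacy on trial (the flip count and seed are arbitrary choices of mine): generate a long run of fair-coin flips, find every moment that follows 10 Heads in a row, and tally the very next flip:

```python
import random

random.seed(3)
flips = [random.random() < 0.5 for _ in range(2_000_000)]  # True = Heads

streak = 0
next_after_streak = []
for i in range(len(flips) - 1):
    streak = streak + 1 if flips[i] else 0
    if streak >= 10:                        # 10 (or more) Heads in a row just happened
        next_after_streak.append(flips[i + 1])

print(f"streaks found: {len(next_after_streak):,}")
print(f"P(Heads | 10 Heads in a row) ~ {sum(next_after_streak) / len(next_after_streak):.3f}")
```

On a typical run the conditional proportion lands near 0.5 - the coin really has no memory.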

Why Probability Is the Language of Inference

Why do we learn all this? Because in the real world, we never know the "Truth" (the population parameter). We only have samples.

When a scientist sees a result in their data, they have to ask: "Is this a real pattern, or did I just get lucky?" Probability provides the answer. It allows us to calculate how "surprising" a result is. If a coin lands on Heads 100 times in a row, probability tells us that this is so astronomically unlikely to happen by chance that the coin is almost certainly rigged.
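To put a number on that surprise: each flip of a fair coin is independent, so by the multiplication rule,

$P(\text{100 Heads in a row}) = \left(\frac{1}{2}\right)^{100} \approx 7.9 \times 10^{-31}$

a chance so vanishingly small that "the coin is rigged" is by far the better explanation.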

This "measure of surprise" is the basis of the P-value, a concept we will explore deeply in future articles.

Conclusion: From Patterns to Predictions

We have moved from looking at patterns (scatterplots) to understanding the rules of chance that generate those patterns. We've learned that randomness is not chaos but long-term predictability. We've seen how to combine probabilities and how information changes the odds.

Now that we have the language of probability, we are ready for the core of statistics. In the next post, we will look at Sampling Distributions and finally answer the question: how can a poll of 1,000 people predict the behaviour of 300 million?

Tags: Probability · Randomness · Inference · Law of Large Numbers

About KnowStatistics

Hello, I'm Nina. I'm the founder, writer, and designer behind KnowStatistics. I spend a lot of my time pondering the whys, and chances are, you've stumbled upon an idea I once had.

I believe statistics to be the key to understanding the world, and it is imperative that it is accessible - so that's the heart of KnowStatistics.
