48  The Birthday Problem in IoT Networks

Key Concepts
  • Birthday Problem: A probability theory result showing that in a group of 23 people, there is a >50% chance that two share a birthday; analogous to collision probability in networks
  • Collision Probability: The mathematical likelihood that two or more devices transmit simultaneously on a shared channel in a given time interval
  • Poisson Process: A statistical model for random independent events over time; models packet arrivals and transmission attempts in a network
  • Load Factor (G): The ratio of total offered traffic (including retransmissions) to channel capacity; used in ALOHA and CSMA performance models
  • Throughput (S): The fraction of time the channel carries successful (non-colliding) transmissions; maximised at a specific load factor for each MAC protocol
  • Pure ALOHA: The simplest random-access protocol where devices transmit immediately; peak throughput of 18.4% due to collisions
  • Slotted ALOHA: An improvement where transmissions are aligned to time slots; doubles peak throughput to 36.8%

48.1 In 60 Seconds

The Birthday Problem reveals that address collisions in IoT networks happen far sooner than intuition suggests: with only 23 people in a room, there is a 50% chance two share a birthday. Applied to networking, a 16-bit address space (65,536 values) reaches 50% collision probability at just 301 devices. This directly explains why protocols like Zigbee limit network size and why IPv6’s 128-bit addresses virtually eliminate collisions at scale.

48.2 Learning Objectives

By the end of this chapter, you will be able to:

  • Apply the Birthday Problem formula to calculate collision probability
  • Explain why address collisions occur more often than intuition suggests
  • Evaluate the impact of address size on collision probability
  • Calculate collision thresholds for different IoT address spaces

The birthday problem is a famous math puzzle: in a room of just 23 people, there is a 50% chance two share the same birthday. In IoT networks, a similar effect happens when devices are randomly assigned addresses from a limited pool – even a modest number of devices can cause surprising numbers of address collisions, just like surprising birthday matches in a small group.

“Let me show you a magic trick!” said Max the Microcontroller. “If we put just 23 sensors in a room, there is a 50% chance two of them share the same network address. Sounds impossible, right?”

“No way!” Sammy the Sensor protested. “There are thousands of possible addresses. How could 23 devices cause a match?” Max grinned. “Because you are not comparing each device to ONE specific address – you are comparing every device to EVERY other device. With 23 devices, that is 253 different pairs to check. The matches add up way faster than your brain expects!”

Lila the LED looked worried. “So in a Zigbee network with only 65,536 possible short addresses, we hit 50% collision chance at just 301 devices? That explains the network size limit!”

“Exactly,” said Bella the Battery. “That is why IPv6 uses 128-bit addresses – the address space is so enormous that you would need quintillions of devices before collisions become likely. It is like having a birthday calendar with trillions of days instead of 365. The Birthday Problem teaches us that address spaces need to be MUCH bigger than the number of devices we plan to use!”

48.3 Introduction

Time: ~10 min | Difficulty: Intermediate | Unit: P07.C15.U04a

When designing IoT networks, engineers often assume that address collisions are extremely unlikely given the large address spaces available. However, the Birthday Problem (also called the Birthday Paradox) reveals a surprising truth: collisions occur far more frequently than intuition suggests, with critical implications for MAC addresses, short addresses, and network design.

48.4 The Classic Birthday Problem

The Question: In a room of randomly selected people, how many people do you need before there’s a 50% chance that two people share the same birthday?

Intuitive Answer: Most people guess around 183 (half of 365 days in a year).

Actual Answer: Only 23 people are needed for a 50% probability of a shared birthday!

Why So Few? The key insight is that we’re not comparing each person to a specific birthday – we’re comparing all possible pairs of people. With 23 people, there are 253 possible pairs (23 x 22 / 2 = 253), which is why the probability grows much faster than expected.

48.5 The Birthday Problem Formula

The general formula for collision probability is:

\[P(\text{collision}) \approx 1 - e^{-n^2/(2m)}\]

Where:

  • n = number of items (devices, addresses assigned)
  • m = number of possible values (address space size)
  • e = Euler’s number (approximately 2.718)

Key Insight: Probability depends on n squared, not n. This quadratic relationship means collisions grow much faster than the number of devices.

The 50% collision threshold occurs at approximately:

\[n_{50\%} \approx \sqrt{2m \cdot \ln 2} \approx 1.177\sqrt{m}\]

This square root relationship is the core of the Birthday Problem: you only need about the square root of the address space to reach a coin-flip chance of collision.

Try It: Birthday Problem Calculator

48.6 Application to IoT: MAC Address Collisions

MAC Address Space:

  • 48-bit MAC addresses = 2^48 = 281,474,976,710,656 addresses (281 trillion)
  • Seems impossible to ever have collisions, right?

Reality Check Using Birthday Problem:

The 50% collision threshold for 48-bit addresses is approximately 19.75 million randomly assigned devices. The table below shows how collision probability grows:

Number of Devices Collision Probability Practical Scenario
100,000 0.002% Campus IoT deployment
1 million 0.18% City-wide IoT network
5 million 4.3% National smart meter rollout
10 million 16.3% Large-scale regional deployment
20 million 50.9% (1 in 2) Global smart home devices threshold
50 million 98.8% Well beyond collision certainty

Calculation Example (1 million devices):

\[P(\text{collision}) \approx 1 - e^{-(10^6)^2/(2 \times 2^{48})} = 1 - e^{-0.00178} \approx 0.00178 = 0.178\%\]

Real-World Impact: Even at 0.18% probability with 1 million devices, the expected number of colliding pairs is approximately 1.78. As deployments scale toward tens of millions of randomly assigned addresses, collisions become near-certain. This is why manufacturers must carefully manage MAC address allocation from IEEE-assigned OUI blocks rather than assigning randomly.

48.7 Short Addresses: The Critical Case

Many IoT protocols use short addresses for efficiency. The Birthday Problem becomes severe in these constrained address spaces.

48.7.1 16-bit Short Addresses (802.15.4, Zigbee)

Address Space: 2^16 = 65,536 addresses

Collision Probability:

Line chart showing exponential growth of collision probability for 16-bit address space, rising from 2% at 50 devices to 64% at 350 devices, illustrating the Birthday Problem's impact on IoT networks with constrained addressing
Figure 48.1: Collision probability growth from 2% at 50 devices to 64% at 350 devices

Critical Threshold: With just 301 devices, there’s a 50% chance of address collision!

Why Zigbee Networks Limit Device Count:

802.15.4 Personal Area Network (PAN):
- Coordinator assigns 16-bit short addresses randomly
- With 301 devices: 50% collision probability
- With 250 devices: 38% collision probability (manageable with detection)
- With 500 devices: 85% collision probability (network breakdown!)

This is why Zigbee specifications recommend:
- Maximum 250-300 devices per PAN
- Use collision detection and reassignment
- Or fall back to 64-bit extended addresses

48.8 Collision Probability Across Address Sizes

The following table shows how many devices produce a 50% collision probability for different address sizes commonly used in IoT:

Address Size Total Addresses Devices for 50% Collision IoT Protocol Examples
8-bit 256 ~19 devices BLE advertising channels
16-bit 65,536 ~301 devices 802.15.4 short addresses, Zigbee PANs
24-bit 16,777,216 ~4,823 devices Some proprietary protocols
32-bit 4,294,967,296 ~77,163 devices IPv4 private networks (with NAT)
48-bit 281 trillion ~19.75 million devices MAC addresses (Ethernet, Wi-Fi, BLE)
64-bit 18.4 quintillion ~5.06 billion devices EUI-64, IPv6 IIDs, LoRaWAN DevEUI
128-bit 340 undecillion ~21.7 quintillion devices IPv6 addresses, UUIDs

Key Observation: The 50% threshold device count is approximately \(\sqrt{2m \cdot \ln 2} \approx 1.177\sqrt{m}\), where \(m\) is the address space size. This square root relationship means doubling the address bits squares the address space, which increases the safe device count by roughly the square root of that factor. Going from 16-bit to 32-bit addresses increases the threshold from 301 to 77,163 – a 256x improvement.

How does the Birthday Problem formula explain Zigbee’s practical network size limits? Let’s calculate the exact collision probability.

Zigbee PAN uses 16-bit short addresses assigned randomly by the coordinator. \(\text{Address space} = 2^{16} = 65{,}536\text{ possible addresses}\)

Birthday Problem formula: \(P(\text{collision}) \approx 1 - e^{-n^2/(2m)}\)

At 300 devices (Zigbee’s recommended maximum): \(P = 1 - e^{-(300)^2/(2 \times 65{,}536)}\) \(= 1 - e^{-90{,}000/131{,}072}\) \(= 1 - e^{-0.687}\) \(= 1 - 0.503 = 0.497 = 49.7\%\)

At 500 devices (beyond recommended limit): \(P = 1 - e^{-(500)^2/(2 \times 65{,}536)}\) \(= 1 - e^{-250{,}000/131{,}072}\) \(= 1 - e^{-1.907}\) \(= 1 - 0.148 = 0.852 = 85.2\%\)

Why the quadratic growth matters: \(\text{Number of address pairs} = \frac{n(n-1)}{2}\)

  • 100 devices: \(\frac{100 \times 99}{2} = 4{,}950\text{ pairs}\)
  • 300 devices: \(\frac{300 \times 299}{2} = 44{,}850\text{ pairs (9x more!)}\)
  • 500 devices: \(\frac{500 \times 499}{2} = 124{,}750\text{ pairs (25x more!)}\)

Key insight: Zigbee’s 300-device limit is NOT arbitrary – it’s mathematically derived from the Birthday Problem. At ~301 devices, collision probability crosses 50%, requiring coordinator intervention (collision detection and address reassignment). IPv6’s 128-bit address space needs roughly 21.7 quintillion devices to reach the same 50% threshold, making collisions physically impossible for any foreseeable IoT deployment.

48.9 Worked Example: LoRaWAN Transmission Collisions

The Birthday Problem also applies to time-slot collisions in uncoordinated IoT networks. LoRaWAN Class A devices transmit at random times, and two devices transmitting simultaneously on the same channel and spreading factor cause a collision.

Scenario: A city deploys 2,000 LoRaWAN parking sensors, each transmitting a 20-byte status update every 10 minutes. The gateway has 8 channels and supports 6 spreading factors (SF7-SF12). What is the collision rate?

48.9.1 Step 1: Calculate Airtime per Transmission

At SF10, 125 kHz bandwidth:
  Symbol time: 2^10 / 125000 = 8.19 ms
  Preamble: 8 symbols = 65.5 ms
  Payload symbols (20 bytes): ~15 symbols = 122.9 ms
  Total airtime: ~190 ms per transmission

48.9.2 Step 2: Calculate Collision Window

Two transmissions collide if they overlap in time on the same channel AND SF.
Collision window: 2 x airtime = 380 ms (device A starts during B, or B starts during A)

Available channel-SF combinations: 8 channels x 6 SFs = 48 "virtual channels"
Transmissions per device per hour: 6
Total transmissions per hour: 2,000 x 6 = 12,000

Transmissions per virtual channel per hour: 12,000 / 48 = 250

48.9.3 Step 3: Apply Birthday Problem to Time Slots

Think of each hour as having "time slots" of 380 ms width:
  Slots per hour: 3,600,000 ms / 380 ms = 9,474 slots

Birthday Problem: 250 transmissions randomly placed in 9,474 slots
  P(at least one collision) = 1 - e^(-250^2 / (2 x 9,474))
  P = 1 - e^(-3.30) = 0.963 = 96.3%

Expected collisions per hour per virtual channel:
  E[collisions] = 250^2 / (2 x 9,474) = 3.30

Total collisions per hour: 3.30 x 48 = 158 collisions
Collision rate: 158 / 12,000 = 1.3%

48.9.4 Step 4: Evaluate Impact

Without retransmission: 1.3% packet loss (acceptable for parking status)
With 1 retry (confirmed uplink): ~0.017% loss (1.3% x 1.3%)

At 2,000 devices: 1.3% collision rate is manageable
At 10,000 devices: collision rate rises to ~6.5% (Birthday Problem quadratic growth)
At 20,000 devices: collision rate reaches ~13% (requires ADR optimization)

Key Insight: The Birthday Problem explains why LoRaWAN networks experience collisions much sooner than device density alone would suggest. Doubling the device count quadruples the collision rate – not because the network is “full” (channel utilization is under 3%) but because random arrivals create pairwise collision opportunities that grow with n squared.

48.10 Practical Design Rules from the Birthday Problem

The Birthday Problem provides concrete design rules for IoT network engineers:

Address/Slot Space Safe Count (<1%) 50% Threshold Design Rule
2^8 = 256 ~2 devices ~19 Use for tiny networks only
2^16 = 65,536 ~26 devices ~301 Add collision detection above 200 devices or use extended addresses
2^24 = 16.7M ~410 devices ~4,823 Safe for single-building deployments
2^32 = 4.3B ~6,554 devices ~77,163 Adequate for campus-scale IoT
2^48 = 281T ~1.68M devices ~19.75M Safe for regional deployments (with managed OUI allocation)
2^64 = 18.4Q ~430M devices ~5.06B Safe for global deployments
2^128 ~1.85Q devices ~21.7Q Collision-proof for any foreseeable deployment

Rule of Thumb: For less than 1% collision probability, limit device count to approximately \(\sqrt{m}/10\), where \(m\) is the address space size. For Zigbee (16-bit): \(\sqrt{65{,}536}/10 \approx 26\) devices per PAN for near-collision-free operation. For LoRaWAN DevEUI (64-bit): \(\sqrt{2^{64}}/10 \approx 430\) million devices globally.

48.11 Real-World Case Study: Wi-Fi Probe Request Collisions in Retail Analytics

A retail analytics company deployed Wi-Fi sensors in 200 shopping malls to track foot traffic by passively capturing smartphone probe requests. Each sensor recorded the randomized MAC addresses from nearby smartphones to estimate unique visitor counts.

The Problem: Apple, Google, and Samsung implemented MAC address randomization in 2014-2020, where phones broadcast random MAC addresses in probe requests instead of their real hardware MAC. The analytics system counted unique MAC addresses to estimate visitors, but randomization meant each phone generated 5-15 different MAC addresses per visit.

Birthday Problem Impact on Counting Accuracy:

Shopping mall with 3,000 actual visitors per day
Each visitor generates ~10 randomized MACs = 30,000 observed MACs
MAC randomization uses 46-bit random space (2 bits reserved for local/unicast)
Address space: 2^46 = 70.4 trillion addresses

Birthday Problem collision probability:
P = 1 - e^(-(30,000)^2 / (2 x 70.4 trillion))
P = 1 - e^(-0.0000064)
P = 0.00064% (negligible)

Conclusion: MAC collisions between DIFFERENT phones are essentially zero.
The real problem is the opposite -- collision ABSENCE means overcounting.

The Twist: The Birthday Problem revealed that the company was NOT suffering from collisions (different phones sharing the same random MAC) – the address space was large enough. Instead, they had the inverse problem: the same phone generating many non-colliding addresses, inflating visitor counts by 5-15x.

Solution Using Birthday Problem Mathematics:

The company used the Birthday Problem formula in reverse to estimate unique visitors from observed MAC counts:

If N unique phones each generate k random MACs:
  Total observed MACs = N x k
  Expected duplicate MACs (same phone, same random): N x k x (k-1) / (2 x 2^46) ~ 0

Since collision rate is near zero, each observed MAC is unique.
Estimate actual visitors: observed_MACs / average_k

Calibration: ground-truth cameras showed k ~ 8 for iPhones, k ~ 12 for Android
Corrected estimate: 30,000 MACs / 10 average = 3,000 visitors (matches ground truth)

Business Impact: Without understanding the Birthday Problem mathematics, the company initially reported 30,000 daily visitors to mall tenants instead of 3,000 – a 10x overcount that led to inflated lease negotiations and eventual lawsuits when actual sales data contradicted the traffic claims.

48.12 Review Activities

48.12.1 Match Address Sizes to Their Collision Characteristics

48.12.2 Order the Steps to Calculate Collision Risk for an IoT Network

Common Pitfalls

With 100 devices each transmitting once per second on a shared channel, collision probability rises dramatically. Fix: use the ALOHA throughput formula to calculate collision probability before deploying more than 20–30 nodes on an unslotted shared channel.

The birthday problem assumes uniform random choice from a fixed set. Network collisions depend on traffic patterns, packet sizes, and backoff algorithms that the simple birthday model does not capture. Fix: use the birthday problem as an intuition-builder, not a precise engineering tool.

After a collision, both devices retransmit, increasing the offered load G and potentially triggering more collisions (congestion collapse). Fix: implement exponential backoff to reduce retransmission rates under high load.

48.13 Summary

  • The Birthday Problem shows collision probability grows with n squared (quadratic), causing collisions far sooner than intuition suggests
  • 23 people create 50% birthday collision probability – the same math applies to network addresses
  • The 50% threshold is approximately \(1.177\sqrt{m}\) where \(m\) is the address space size
  • 16-bit addresses support ~301 devices before 50% collision probability, explaining Zigbee’s network size limits
  • 48-bit MAC addresses reach 50% collision probability at about 19.75 million randomly assigned devices
  • Time-slot collisions in LoRaWAN and other uncoordinated protocols follow the same mathematics – 2,000 devices on 48 virtual channels produce 1.3% collision rate

48.14 Try It Yourself

Scenario: You’re deploying sensors in an office building with 3 floors. Plan the address space to avoid Birthday Problem collisions.

Given:

  • Floor 1: 80 temperature sensors
  • Floor 2: 60 occupancy sensors
  • Floor 3: 40 air quality sensors
  • Total: 180 devices
  • Growth plan: Add 20% more devices per year for 3 years
  • Final device count: 180 x 1.2^3 = 311 devices

Tasks:

  1. Calculate 50% collision threshold for 16-bit (65,536 addresses)
  2. Calculate 50% collision threshold for 24-bit (16,777,216 addresses)
  3. Calculate collision probability for 311 devices with each address size
  4. Select minimum safe address size (target <5% collision probability)

Hint: Use formula P(collision) = 1 - e(-n2 / (2 x address_space))

Solution:

16-bit (65,536 addresses):

  • 50% threshold: \(\sqrt{2 \times 65{,}536 \times \ln 2} \approx 301\) devices
  • With 311 devices: \(P = 1 - e^{-(311)^2/(2 \times 65{,}536)} = 52.2\%\) – UNACCEPTABLE

24-bit (16,777,216 addresses):

  • 50% threshold: \(\sqrt{2 \times 16{,}777{,}216 \times \ln 2} \approx 4{,}823\) devices
  • With 311 devices: \(P = 1 - e^{-(311)^2/(2 \times 16{,}777{,}216)} = 0.29\%\) – SAFE

32-bit (4,294,967,296 addresses):

  • With 311 devices: \(P = 1 - e^{-(311)^2/(2 \times 4{,}294{,}967{,}296)} = 0.001\%\) – EXCELLENT

Recommendation: Use 24-bit addresses (3 bytes) for this deployment. Provides adequate safety margin with minimal overhead compared to 32-bit. Zigbee’s 16-bit short addresses would fail – use 64-bit extended addresses instead.

Key Lesson: Always calculate for your FINAL expected device count, not initial deployment. The Birthday Problem’s quadratic growth means adding 73% more devices (180 to 311) increases collision probability by approximately 3x!

48.15 See Also

48.16 What’s Next

Topic Chapter Description
Collision Design Strategies Collision Design Strategies Practical solutions for handling address collisions, including Zigbee limitations, IPv6 addressing, and resolution protocols
Bandwidth Requirements Bandwidth Requirements Bandwidth calculations and common misconceptions that lead to over-provisioning in IoT networks
Zigbee Fundamentals Zigbee Fundamentals and Architecture Why 16-bit short addresses constrain Zigbee PAN size and how extended addresses provide a remedy
LoRaWAN Network Architecture LoRaWAN Network Architecture Network topology and capacity considerations, including channel and spreading-factor management
MAC Address Management Layered Models Labs IEEE OUI allocation, MAC address randomization, and collision detection at the data link layer
IPv6 Addressing IPv6 Addressing How 128-bit IPv6 addresses eliminate collision risk for any foreseeable IoT deployment scale