1467  Location Privacy Leaks

1467.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Assess Location Data Sensitivity: Understand what location traces reveal about individuals
  • Explain De-anonymization Attacks: Describe how attackers re-identify users from “anonymized” location data
  • Calculate Anonymity Sets: Determine how many spatiotemporal points uniquely identify users
  • Apply Location Privacy Defenses: Implement techniques to reduce location tracking exposure

1467.2 Prerequisites

Before diving into this chapter, you should be familiar with:

  • Mobile sensing basics: How apps collect GPS traces and request location permissions
  • Anonymization concepts: What it means to strip identifiers from a dataset, and the idea of an anonymity set

1467.3 Introduction

Location data is extremely sensitive and can reveal intimate details about an individual’s life. Even when “anonymized,” location traces remain highly identifiable due to their unique spatiotemporal patterns. This chapter examines why location privacy is fundamentally different from other data types.

1467.4 What Location Data Reveals

Location data can reveal:

  • Home and work addresses
  • Daily routines and habits
  • Social relationships (who you meet, where)
  • Health conditions (hospital visits, pharmacy trips)
  • Religious beliefs (place of worship attendance)
  • Political affiliations (rally attendance, campaign offices)

Figure 1467.1: Cellular network privacy leaks (exposed metadata includes cell tower triangulation for location tracking, IMEI and IMSI identifiers, call and SMS metadata patterns, and network connection logs revealing user behavior)

1467.5 De-anonymization Using Location Data

Research Finding: Knowing a user’s home and work location at census block granularity reduces anonymity set to median size of 1 in US working population.

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#E67E22', 'secondaryColor': '#16A085', 'tertiaryColor': '#E67E22', 'fontSize': '12px'}}}%%
flowchart TB
    GPS[GPS Location Traces<br/>15 months, 1.5M users] --> CLUSTER[Temporal Clustering]

    CLUSTER --> HOME[Home Location<br/>Nighttime stays<br/>10 PM - 6 AM]
    CLUSTER --> WORK[Work Location<br/>Weekday stays<br/>9 AM - 5 PM]

    HOME --> CENSUS1[Census Block 1<br/>~2000 people]
    WORK --> CENSUS2[Census Block 2<br/>~2000 people]

    CENSUS1 --> INTERSECT[Intersection:<br/>Home x Work]
    CENSUS2 --> INTERSECT

    INTERSECT --> UNIQUE[Anonymity Set = 1<br/>UNIQUELY IDENTIFIED]

    style GPS fill:#16A085,stroke:#0e6655,color:#fff
    style HOME fill:#2C3E50,stroke:#16A085,color:#fff
    style WORK fill:#2C3E50,stroke:#16A085,color:#fff
    style UNIQUE fill:#c0392b,stroke:#a93226,color:#fff
```

Figure 1467.2: Location-Based De-anonymization: Home and Work Inference from GPS Traces

1467.5.1 Location Inference Features

Attackers use these features to infer home and work locations:

  • Last destination of the day (likely home)
  • Long-stay locations
  • Time patterns (work hours vs. home hours)
  • Movement speed (walking, driving, public transit)
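As a concrete sketch, these heuristics can be turned into a minimal home/work classifier. The point format, grid size, and time windows below are illustrative assumptions, not values from the chapter:

```python
from collections import Counter
from datetime import datetime

def infer_home_work(points, cell=0.02):
    """Guess home and work grid cells from timestamped GPS points.

    points: list of (iso_timestamp, lat, lon) tuples (assumed format).
    cell: grid size in degrees used to coarsen coordinates (~2 km).
    """
    home_votes, work_votes = Counter(), Counter()
    for ts, lat, lon in points:
        dt = datetime.fromisoformat(ts)
        # Snap coordinates to a coarse grid cell.
        key = (round(lat / cell) * cell, round(lon / cell) * cell)
        if dt.hour >= 22 or dt.hour < 6:               # nighttime stay -> home
            home_votes[key] += 1
        elif dt.weekday() < 5 and 9 <= dt.hour < 17:   # weekday work hours
            work_votes[key] += 1
    top = lambda votes: votes.most_common(1)[0][0] if votes else None
    return top(home_votes), top(work_votes)
```

Real attacks additionally weight stay duration and movement speed; this sketch only votes by time of day.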

1467.6 Quantifying De-anonymization Risk

Key Research Findings:

Data Points              Unique Identification Rate
-----------------------  --------------------------
4 spatiotemporal points  95% of individuals
Home + work location     Median anonymity set = 1
8 movie ratings          99% of Netflix users

Why Location is Worse Than Ratings:

  • Data sparsity: Infinite location possibilities (continuous GPS coordinates) vs. discrete choices (five-star ratings)
  • Temporal correlations: Sequential activities create unique patterns—(gym then coffee then office at 7am) is your fingerprint
  • Auxiliary information attacks: Public data enables re-identification via voter registration, property records, social media check-ins

Key Insight: 4 Points = Identity

Research on 1.5 million mobile users over 15 months proves that 4 spatiotemporal points uniquely identify 95% of individuals. This means any “anonymized” location dataset with moderate temporal resolution is effectively an identity database.
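This claim can be made concrete with a toy attacker model: given a handful of observed (cell, hour) points, count how many users in a dataset are consistent with all of them. The trace format is an assumption for illustration:

```python
def anonymity_set(traces, observations):
    """Users whose trace contains every observed (cell, hour) point.

    traces: dict of user_id -> set of (cell_id, hour) tuples (assumed format).
    observations: spatiotemporal points known to the attacker.
    """
    obs = set(observations)
    # A user remains a candidate only if all observed points are in their trace.
    return sorted(uid for uid, pts in traces.items() if obs <= pts)

# Toy dataset: each extra observed point shrinks the candidate set.
traces = {
    "u1": {("A", 8), ("B", 9), ("C", 18), ("D", 22)},
    "u2": {("A", 8), ("B", 9), ("E", 18), ("D", 22)},
    "u3": {("A", 8), ("F", 9), ("C", 18), ("G", 22)},
}
```

With one observed point all three users match; with three points, only "u1" remains, mirroring how few points are needed against real traces.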

1467.7 K-Anonymity Requirements for Mobility

K-anonymity means ensuring each record is indistinguishable from at least K-1 other records.

Different data types require vastly different K values:

Data Type        Required K   Why
---------------  -----------  ----------------------------------------------
Movie ratings    K >= 5       Discrete choices, limited correlations
Mobility traces  K >= 5,000   Continuous space, strong temporal correlations

Why mobility requires 1,000x more anonymity:

  1. Continuous space: GPS has infinite precision vs. 5 star levels
  2. Stronger correlations: Sequential dependencies (gym then shower then breakfast is distinct from breakfast then gym)
  3. Higher temporal resolution: Second-level timestamps vs. approximate dates
  4. Multiple dimensions: Location + time + activity + network simultaneously

Question: Research shows movie rating datasets achieve privacy with K=5 (5 indistinguishable users). What K-value do mobile sensing traces require for similar privacy?

Mobile sensing data requires K >= 5,000 for basic privacy, 1,000x more than movie ratings. Why?

  1. Data sparsity: Movie ratings are discrete (5 star levels x ~20,000 movies), but location is continuous (infinite lat/lon combinations)
  2. Stronger correlations: Activities are sequentially dependent (gym then shower then breakfast is distinct from breakfast then gym)
  3. Temporal resolution: Movie ratings lack precise timestamps; mobile data has second-level precision revealing detailed routines
  4. Multiple dimensions: Mobile data combines location + activity + Wi-Fi + Bluetooth + cellular simultaneously

Netflix Prize researchers re-identified 99% of “anonymized” users with just 8 movie ratings; mobile data is exponentially worse.
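One way to see why large K forces such coarse data: keep reducing coordinate precision until every user shares a cell with at least K-1 others. A minimal sketch, with made-up home coordinates:

```python
from collections import Counter

def k_anonymous_precision(homes, k):
    """Highest decimal precision at which every rounded home location
    is shared by at least k users (None if even ~100 km cells fail).

    homes: list of (lat, lon) tuples (hypothetical dataset).
    """
    for digits in range(5, -1, -1):   # ~1 m cells down to ~100 km cells
        cells = Counter((round(lat, digits), round(lon, digits))
                        for lat, lon in homes)
        # Check that every user's cell holds at least k users.
        if all(cells[(round(lat, digits), round(lon, digits))] >= k
               for lat, lon in homes):
            return digits
    return None
```

With ten users spread over a few blocks, only roughly kilometer-scale rounding satisfies even k = 10; real mobility data adds the time dimension, making the problem far harder.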

1467.8 Differential Privacy Limitations

Question: You want to share anonymized location traces from 10,000 users for research. Differential privacy with epsilon=1.0 adds Gaussian noise to each location. Why is this insufficient protection?

Adding noise to individual locations is insufficient because sequential locations are highly correlated. Trajectory reconstruction attacks work because:

  1. Map constraints: Roads, buildings, and geography limit possible paths, so noisy locations can be snapped back to actual routes
  2. Temporal correlations: Speed and direction constraints (you can’t teleport or move at 200 mph) reduce the noise effect
  3. Auxiliary information: Knowing one true location (e.g., from a social media check-in) helps reconstruct the entire trajectory
  4. Multiple observations: With enough noisy samples, statistical techniques (Kalman filtering, particle filtering) recover the true path

Better approaches: coarsen spatial granularity (city-level only), add temporal obfuscation (plus or minus hours, not seconds), or use synthetic trajectory generation preserving statistical properties without real paths.
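The multiple-observations point is easy to demonstrate: even a crude moving-average smoother (a stand-in for the Kalman or particle filters mentioned above) claws back most of the added noise on a synthetic straight-line trajectory:

```python
import random

def smooth(path, window=5):
    """Moving-average smoother exploiting temporal correlation between fixes."""
    half = window // 2
    out = []
    for i in range(len(path)):
        seg = path[max(0, i - half): i + half + 1]
        out.append((sum(p[0] for p in seg) / len(seg),
                    sum(p[1] for p in seg) / len(seg)))
    return out

def mean_error(a, b):
    """Mean Euclidean distance between two equal-length point sequences."""
    return sum(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)

random.seed(0)
true_path = [(i * 0.001, 0.0) for i in range(200)]     # steady movement
noisy = [(x + random.gauss(0, 0.005), y + random.gauss(0, 0.005))
         for x, y in true_path]
recovered = smooth(noisy)
# mean_error(recovered, true_path) ends up well below mean_error(noisy, true_path)
```

Real attacks go further by snapping the smoothed points to the road network, which this sketch omits.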

1467.9 Location Privacy Attack Example

Question: You have one week of GPS traces showing nighttime locations (10pm-6am) all cluster around (37.7749, -122.4194) and daytime weekday locations (9am-5pm) cluster around (37.7849, -122.4094). What can be inferred at census block granularity?

Research shows that knowing home and work locations at census block granularity reduces the anonymity set to a median of 1 person in the US working population. Census blocks are small areas (typically 600-3,000 people), and the combination of two blocks (home x work) creates a unique fingerprint. Even with just 4 spatiotemporal points, 95% of individuals are uniquely identifiable. This demonstrates why location “anonymization” is fundamentally broken—the data itself is a unique identifier.
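The arithmetic behind this is just a set intersection. Assuming attacker-side auxiliary tables mapping census blocks to residents and workers (hypothetical data, e.g. from public records):

```python
def home_work_candidates(home_block, work_block, residents, workers):
    """People who both live in home_block and work in work_block.

    residents, workers: dicts of block_id -> set of person ids.
    """
    return residents.get(home_block, set()) & workers.get(work_block, set())

# ~2,000 people per block, but only one person appears in both blocks:
residents = {"block_home": set(range(0, 2000))}
workers = {"block_work": set(range(1999, 3999))}
```

Each block alone hides the target among thousands; the home x work pair narrows the candidate set to one.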

1467.10 Suspicious Location Access Patterns

Question: Your mobile privacy framework detects an app accessed LOCATION permission 847 times in one day, but only transmitted data to the server 3 times. What privacy risk does this indicate?

This usage pattern is highly suspicious. 847 location accesses but only 3 transmissions suggests:

  1. Local profiling: The app is building a detailed location history database locally
  2. Delayed exfiltration: Waiting for a Wi-Fi connection, a battery-state trigger, or another specific condition before bulk upload
  3. Evasion tactics: Infrequent uploads avoid detection by bandwidth monitors or privacy tools
  4. Granular tracking: 847 daily accesses is roughly one every 100 seconds, far more than legitimate use requires

Real-world example: Ad networks do this to create detailed movement patterns while appearing “privacy-friendly” in network analysis. Detection strategy: Monitor storage access patterns, analyze local databases, and track cumulative permission usage over time, not just network transmissions. This is why comprehensive privacy frameworks monitor sources, storage, AND sinks.
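A privacy framework can encode this as a simple source-vs-sink heuristic. The thresholds below are illustrative assumptions, not values from the chapter:

```python
def flag_location_hoarding(access_count, transmit_count,
                           max_daily_accesses=100, min_transmit_ratio=0.05):
    """Flag apps that read location far more often than they transmit.

    access_count: daily LOCATION permission accesses (source events).
    transmit_count: daily network transmissions of location data (sink events).
    """
    reasons = []
    if access_count > max_daily_accesses:
        reasons.append("excessive access frequency")
    if access_count and transmit_count / access_count < min_transmit_ratio:
        reasons.append("possible local hoarding / delayed exfiltration")
    return reasons
```

For the scenario above, flag_location_hoarding(847, 3) trips both checks, while a weather app polling 10 times with 5 uploads trips neither.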

1467.11 Location Privacy Defenses

Effective Defenses:

  1. Limit collection: Use “Only while using” permission when possible
  2. Coarsen granularity: City-level location for weather apps
  3. Temporal obfuscation: Delay location-based features by hours
  4. Dummy locations: Mix real locations with synthetic ones
  5. Local processing: Perform location-based computations on-device
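Defenses 2 (coarsen granularity) and 4 (dummy locations) are simple enough to sketch directly; the grid size and dummy count here are illustrative assumptions:

```python
import random

def coarsen_location(lat, lon, digits=1):
    """City-level granularity: one decimal degree is roughly an 11 km cell."""
    return (round(lat, digits), round(lon, digits))

def with_dummies(real, n_dummies=4, spread=0.5, rng=random):
    """Mix the real (coarsened) report with synthetic nearby locations so a
    server cannot tell which one is genuine."""
    lat, lon = real
    reports = [(round(lat + rng.uniform(-spread, spread), 1),
                round(lon + rng.uniform(-spread, spread), 1))
               for _ in range(n_dummies)]
    reports.append(real)
    rng.shuffle(reports)   # hide the real report's position in the list
    return reports
```

A weather app using this would report one of five city-scale cells, leaving the server with at best a 1-in-5 guess at the user's actual area.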

Ineffective Defenses:

  1. Simple anonymization: Removing names/IDs insufficient
  2. Adding noise to individual points: Trajectory reconstruction attacks succeed
  3. Hashing location data: Small space enables rainbow tables
  4. K-anonymity with K less than 5000: Insufficient for mobility data

1467.12 Summary

Location data poses unique privacy challenges:

What Location Reveals:

  • Home, work, and frequently visited locations
  • Social relationships, health conditions, beliefs
  • Daily routines and behavioral patterns

De-anonymization Risks:

  • 4 spatiotemporal points identify 95% of individuals
  • Home + work location = unique identifier
  • K-anonymity requires K >= 5,000 for mobility (1,000x more than ratings)

Why Anonymization Fails:

  • Continuous space (infinite GPS precision)
  • Strong temporal correlations
  • Auxiliary information attacks
  • Map constraints enable trajectory reconstruction

Key Takeaway: Location data is inherently identifiable. Privacy protection requires preventing collection, not trusting post-hoc anonymization.

1467.13 What’s Next

With location privacy risks understood, the next chapter explores Wi-Fi and Sensing Privacy where you’ll learn how Wi-Fi probe requests and motion sensors create additional tracking vectors, and how mobile sensing data enables de-anonymization through behavioral fingerprinting.

Continue to Wi-Fi and Sensing Privacy →