K-anonymity ensures that each record is indistinguishable from at least k-1 other records on its quasi-identifiers. L-diversity additionally requires at least l distinct values of the sensitive attribute within each k-anonymous group.
K-Anonymity: For a dataset D, every record must be identical to at least k-1 other records on quasi-identifiers (age, zip code, gender).
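The k-anonymity condition can be checked by counting how many records share each combination of quasi-identifier values. The following is a minimal Python sketch, assuming records are plain dicts and the caller supplies the quasi-identifier column names. The function name `is_k_anonymous` and the toy records are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records (each record matches >= k-1 others)."""
    class_sizes = Counter(
        tuple(rec[q] for q in quasi_identifiers) for rec in records
    )
    return all(size >= k for size in class_sizes.values())

# Toy, already-generalized records (hypothetical)
records = [
    {"age": "40-45", "zip": "941**", "gender": "F"},
    {"age": "40-45", "zip": "941**", "gender": "F"},
    {"age": "40-45", "zip": "941**", "gender": "M"},
]
print(is_k_anonymous(records, ["age", "zip"], k=3))            # True
print(is_k_anonymous(records, ["age", "zip", "gender"], k=2))  # False: the "M" record is unique
```

Adding a quasi-identifier can only split equivalence classes, so a dataset that is k-anonymous on fewer columns may fail once another column such as gender is included.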
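L-Diversity (entropy variant): Within each k-anonymous group, the entropy of the sensitive attribute S must satisfy: \[H(S) = -\sum_{i=1}^{m} p_i \log_2(p_i) \geq \log_2(l)\]
where \(m\) is the number of distinct sensitive values in the group and \(p_i\) is the fraction of the group's records that have the \(i\)-th value. Because \(H(S) \leq \log_2(m)\), meeting this bound also guarantees at least l distinct values, so entropy l-diversity is stricter than simply counting distinct values.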
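The entropy condition above translates directly into a per-group check. This is a minimal Python sketch; the helper names `entropy_bits` and `is_entropy_l_diverse` and the small tolerance `tol` are assumptions, not part of any library.

```python
import math
from collections import Counter

def entropy_bits(sensitive_values):
    """Shannon entropy H(S), in bits, of the sensitive values in one group."""
    counts = Counter(sensitive_values)
    n = len(sensitive_values)
    # sum of p_i * log2(1/p_i), which equals -sum of p_i * log2(p_i)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def is_entropy_l_diverse(sensitive_values, l, tol=1e-9):
    """True if H(S) >= log2(l); `tol` absorbs floating-point rounding
    when the group is exactly uniform over l values."""
    return entropy_bits(sensitive_values) >= math.log2(l) - tol

group = ["Arrhythmia", "Arrhythmia", "Heart Failure", "Healthy"]
print(entropy_bits(group))             # 1.5
print(is_entropy_l_diverse(group, 2))  # True
print(is_entropy_l_diverse(group, 3))  # False: 1.5 < log2(3) ≈ 1.585
```

Note that this toy group has three distinct diagnoses yet still fails entropy 3-diversity, because one value dominates.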
Working through an example. Given: a hospital IoT patient-monitoring system with 10,000 patients. The data include age, ZIP code, and heart-rate (HR) monitoring frequency as quasi-identifiers, with diagnosis as the sensitive attribute. Apply k=5 anonymity and l=3 diversity.
Original dataset sample:

| Patient | Age | ZIP | HR Frequency | Diagnosis |
|---------|-----|-----|--------------|-----------|
| P001 | 42 | 94102 | High | Arrhythmia |
| P002 | 43 | 94102 | High | Arrhythmia |
| P003 | 42 | 94103 | Low | Healthy |
Step 1: Apply generalization for k=5
- Age: 42 → "40-45", 43 → "40-45"
- ZIP: 94102 → "941**", 94103 → "941**" (last two digits masked)
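The two generalizations in Step 1 can be sketched as simple value mappings. This Python sketch assumes 5-year age buckets whose label "40-45" covers ages 40 through 44 (the text does not say whether 45 is included) and 5-digit ZIP strings; `generalize_age` and `generalize_zip` are hypothetical names.

```python
def generalize_age(age, width=5):
    """Map an exact age to a fixed-width bucket label.
    Assumption: the label "40-45" covers ages 40..44 (half-open)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width}"

def generalize_zip(zip_code, keep=3):
    """Keep the first `keep` digits of the ZIP and mask the rest with '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

for age, zip_code in [(42, "94102"), (43, "94102"), (42, "94103")]:
    print(generalize_age(age), generalize_zip(zip_code))
# 40-45 941**
# 40-45 941**
# 40-45 941**
```

After generalization all three sample patients share the same age and ZIP values, so only HR frequency still separates them into groups.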
K-anonymous groups (each has ≥5 members):

| Group | Age Range | ZIP | HR Frequency | Count |
|-------|-----------|-----|--------------|-------|
| G1 | 40-45 | 941** | High | 8 patients |
| G2 | 40-45 | 941** | Low | 5 patients |
Step 2: Check l-diversity for group G1 (8 patients with high HR monitoring frequency)
- Diagnoses in G1: Arrhythmia (5), Heart Failure (2), Healthy (1)
- Distribution: \(p_1 = 5/8,\ p_2 = 2/8,\ p_3 = 1/8\)
- Entropy: \(H = -(5/8)\log_2(5/8) - (2/8)\log_2(2/8) - (1/8)\log_2(1/8)\)
- \(H = -(0.625)(-0.678) - (0.25)(-2) - (0.125)(-3)\)
- \(H = 0.424 + 0.5 + 0.375 = 1.299\) bits
Step 3: Verify the l-diversity requirement
- Required: \(H \geq \log_2(3) \approx 1.585\) bits
- Actual: \(H = 1.299\) bits
- Result: FAILS l=3 diversity (the group is dominated by Arrhythmia)
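Steps 2 and 3 can be reproduced numerically from G1's diagnosis counts. A minimal Python sketch; `entropy_from_counts` is a hypothetical helper that takes a mapping from sensitive value to record count.

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits from a {sensitive value: record count} mapping."""
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values() if c > 0)

# Diagnosis counts for group G1 (Step 2)
g1 = {"Arrhythmia": 5, "Heart Failure": 2, "Healthy": 1}
h = entropy_from_counts(g1)
required = math.log2(3)  # l = 3

print(f"H = {h:.3f} bits, required >= {required:.3f} bits")
print("l=3 diversity:", "PASS" if h >= required else "FAIL")
# H = 1.299 bits, required >= 1.585 bits
# l=3 diversity: FAIL
```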
Step 4: Fix by suppressing group G1 or merging it with another group
- Merge G1 with a similar group (by further generalizing the quasi-identifier that separates them) to increase diagnosis diversity
- The merged group has 6 distinct diagnoses
- New entropy: \(H = 2.1\) bits > 1.585 bits ✓
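Merging equivalence classes amounts to pooling their sensitive-value counts and recomputing the entropy. The Python sketch below illustrates this with G1's counts and a HYPOTHETICAL neighboring group whose diagnoses and counts are invented purely for illustration; its entropy therefore will not match the 2.1 bits quoted for the document's merged group. `merge_groups` and `entropy_from_counts` are hypothetical helper names.

```python
import math
from collections import Counter

def entropy_from_counts(counts):
    """Entropy in bits: sum of p * log2(1/p) over the value counts."""
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values() if c > 0)

def merge_groups(*groups):
    """Pool the sensitive-value counts of several equivalence classes.
    This models the effect of generalizing away the quasi-identifier
    that separated them, so their records become indistinguishable."""
    merged = Counter()
    for counts in groups:
        merged.update(counts)  # Counter.update adds counts rather than replacing them
    return merged

g1 = Counter({"Arrhythmia": 5, "Heart Failure": 2, "Healthy": 1})
# HYPOTHETICAL neighboring group: counts are illustrative only
neighbor = Counter({"Healthy": 3, "Hypertension": 2, "Diabetes": 2, "Sleep Apnea": 1})

merged = merge_groups(g1, neighbor)
h = entropy_from_counts(merged)
print(f"{len(merged)} distinct diagnoses, {sum(merged.values())} patients, H = {h:.3f} bits")
print("l=3 diversity:", "PASS" if h >= math.log2(3) else "FAIL")
```

Merging trades utility for privacy: the larger group is more diverse, but analysts lose the quasi-identifier detail that used to separate the two groups.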
Result: k=5 anonymity was achieved through generalization alone, but l=3 diversity required merging groups to prevent diagnosis inference from the quasi-identifiers.
In practice: k-anonymity alone fails when the sensitive values within a group are homogeneous (e.g., if all patients with high HR monitoring frequency had heart conditions, group membership alone would reveal that). L-diversity ensures that knowing someone is in a k-anonymous group does not, on its own, reveal their sensitive attribute. For IoT health data (Tier 3), l≥3 diversity is the minimum to guard against inference attacks that use background knowledge.
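To make the homogeneity failure concrete, here is a small Python sketch on toy data (the helper `entropy_bits` is hypothetical): a five-record group passes the k=5 size check yet has zero entropy, so anyone who knows a patient is in it learns the diagnosis.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Entropy in bits of a list of sensitive values (sum of p * log2(1/p))."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())

# Toy equivalence class: 5 records sharing the same generalized
# quasi-identifiers and, here, the same diagnosis.
homogeneous_group = ["Arrhythmia"] * 5

k_ok = len(homogeneous_group) >= 5                      # k=5 holds
l_ok = entropy_bits(homogeneous_group) >= math.log2(3)  # entropy is 0.0
print(f"k=5: {k_ok}, l=3: {l_ok}")  # k=5: True, l=3: False
```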