%% fig-alt: "Hash function showing any input producing fixed 256-bit output with avalanche effect"
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22'}}}%%
flowchart LR
subgraph Inputs["Any Input Size"]
I1["#quot;Hello#quot;<br/>5 bytes"]
I2["#quot;Firmware v2.1.3#quot;<br/>2 MB"]
I3["#quot;Sensor data stream#quot;<br/>1 GB"]
end
subgraph Hash["SHA-256"]
H[Hash Function]
end
subgraph Outputs["Fixed Output: 256 bits"]
O1["185f8db3..."]
O2["a3f2c9e8..."]
O3["7d4e2f1a..."]
end
I1 --> H --> O1
I2 --> H --> O2
I3 --> H --> O3
style Inputs fill:#E67E22,stroke:#2C3E50,color:#fff
style Hash fill:#2C3E50,stroke:#16A085,color:#fff
style Outputs fill:#16A085,stroke:#2C3E50,color:#fff
1431 Hash Functions and Data Integrity
1431.1 Learning Objectives
By the end of this chapter, you will be able to:
- Understand Hash Function Properties: Explain collision resistance, preimage resistance, and the avalanche effect
- Apply SHA-256 for Integrity: Use cryptographic hashes to verify firmware and validate data, and choose dedicated password-hashing functions to store passwords securely
- Implement HMAC Authentication: Combine hash functions with secret keys for message authentication
- Avoid Deprecated Algorithms: Recognize why MD5 and SHA-1 are insecure and should never be used
A hash function creates a fixed-size “fingerprint” of any data. Unlike a human fingerprint, it is extremely sensitive: even a tiny change to the input produces a completely different output. Hash functions are also one-way: computing a hash from data is easy, but recovering the original data from the hash is computationally infeasible.
Why it matters for IoT: When your smart thermostat downloads a firmware update, it computes the SHA-256 hash and compares it to the manufacturer’s published hash. If they match, the firmware is unmodified, and it is authentic as long as the published hash itself came from a trusted source (for example, a signed release manifest). If even one bit was changed (by an attacker or a network error), the computed hash will be completely different.
1431.2 How Hash Functions Work
A cryptographic hash function takes input of any size and produces a fixed-size output called a digest or hash.
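A quick way to confirm the fixed-size property is to hash inputs of very different sizes and compare digest lengths. This sketch uses Python's standard hashlib module; the sample inputs are stand-ins chosen to mirror the diagram's examples (a 2 MB block of zero bytes plays the firmware image). The first diagram output, 185f8db3..., is the real prefix of SHA-256("Hello").

```python
import hashlib


def sha256_summary(data: bytes) -> tuple:
    """Return (input length in bytes, digest length in bytes, hex digest)."""
    digest = hashlib.sha256(data).digest()
    return len(data), len(digest), digest.hex()


# Inputs of very different sizes (stand-ins for the diagram's examples)
samples = {
    "Hello": b"Hello",                                 # 5 bytes
    "2 MB firmware stand-in": bytes(2 * 1024 * 1024),  # 2 MB of zero bytes
    "sensor batch": b"temp=25.0C;" * 100,              # 1,100 bytes
}

for label, data in samples.items():
    n_in, n_out, hex_digest = sha256_summary(data)
    print(f"{label}: {n_in} bytes in -> {n_out} bytes out ({hex_digest[:8]}...)")
```

Every line reports 32 bytes out, regardless of how large the input is.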
1431.3 Critical Properties
1431.3.1 1. Collision Resistance
It must be computationally infeasible to find two different inputs that produce the same hash.
Why it matters: If attackers could create a malicious firmware file with the same hash as legitimate firmware, they could bypass integrity checks.
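Collision resistance depends on output length: a birthday search over an n-bit hash finds a collision after roughly 2^(n/2) attempts, which is why SHA-256's 256-bit output gives 128-bit collision resistance. The sketch below makes this concrete by truncating SHA-256 to a hypothetical 24-bit digest (TRUNC_BYTES = 3 is an assumption for demonstration only), where a collision typically appears after a few thousand tries. Full-length SHA-256 would need on the order of 2^128 attempts.

```python
import hashlib
from itertools import count

TRUNC_BYTES = 3  # hypothetical 24-bit digest, far too short to resist collisions


def truncated_sha256(data: bytes) -> bytes:
    """First TRUNC_BYTES bytes of SHA-256; for demonstration only."""
    return hashlib.sha256(data).digest()[:TRUNC_BYTES]


def find_collision() -> tuple:
    """Birthday search: hash distinct messages until two share a truncated digest."""
    seen = {}  # truncated digest -> message that produced it
    for i in count():
        msg = f"firmware-build-{i}".encode()
        tag = truncated_sha256(msg)
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg


a, b, attempts = find_collision()
print(f"{a!r} and {b!r} collide after {attempts} attempts")
```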
1431.3.2 2. Preimage Resistance (One-Way)
Given a hash output, it must be computationally infeasible to find any input that produces that hash.
Why it matters: Password hashes stored in databases cannot be mathematically reversed to reveal the original passwords (although weak passwords can still be guessed; see 1431.11).
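Preimage resistance protects against reversing a hash, not against guessing its input. When the input space is tiny, an attacker simply hashes every candidate and looks for a match. This sketch assumes a hypothetical leaked, unsalted SHA-256 of a 4-digit device PIN and recovers it by trying all 10,000 values; the same problem at larger scale is why passwords need slow, salted password-hashing functions.

```python
import hashlib
from typing import Optional


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


# Hypothetical leaked record: unsalted SHA-256 of a 4-digit device PIN
leaked_hash = sha256_hex("4821")


def guess_pin(target_hash: str) -> Optional[str]:
    """Try all 10,000 PINs; the hash is never reversed, only matched."""
    for candidate in range(10_000):
        pin = f"{candidate:04d}"
        if sha256_hex(pin) == target_hash:
            return pin
    return None


print(guess_pin(leaked_hash))  # 4821
```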
1431.3.3 3. Avalanche Effect
Changing even one bit of input produces a completely different hash output (on average, about half of the 256 output bits flip).
Input: "Sensor: Temp=25.0C"
Hash: a3f2c9e847b1... (illustrative value)
Input: "Sensor: Temp=25.1C" (one character changed)
Hash: 7d4e2f1a93c8... (illustrative value; completely different!)
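The avalanche effect can be measured directly. This sketch hashes the two example readings, which differ in a single input bit ('0' is 0x30 and '1' is 0x31), and counts how many of the 256 output bits differ. The printed digests are real SHA-256 values, so they will not match the shortened illustrative hashes in the example; the bit difference should land somewhere near 128.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two SHA-256 digests."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


m1 = b"Sensor: Temp=25.0C"
m2 = b"Sensor: Temp=25.1C"
print(hashlib.sha256(m1).hexdigest()[:12])
print(hashlib.sha256(m2).hexdigest()[:12])
print(f"{bit_difference(m1, m2)} of 256 output bits differ")
```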
1431.4 SHA-256: The Standard
SHA-256 (Secure Hash Algorithm 256-bit) is the recommended hash function for IoT:
| Property | Value |
|---|---|
| Output Size | 256 bits (32 bytes) |
| Block Size | 512 bits |
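| Security Level | 128-bit collision resistance, 256-bit preimage resistance |
| Speed | ~200-500 MB/s in software on modern desktop CPUs (faster with SHA hardware extensions); far slower on microcontrollers without a crypto accelerator |
| Status | Current standard (FIPS 180-4); no practical collision or preimage attacks known. Raw SHA-256(key + message) is vulnerable to length-extension attacks, so use HMAC for keyed integrity |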
1431.4.1 IoT Use Cases
1. Firmware Integrity Verification
Manufacturer publishes:
firmware_v2.1.bin
SHA-256: 7d4e2f1a93c8b5d6e7f8a9b0c1d2e3f4...
Device downloads firmware, computes:
SHA-256(downloaded_firmware)
If hashes match: INSTALL
If hashes differ: REJECT (corrupted or tampered)
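The firmware check can be sketched in Python as follows. The helper names sha256_file and verify_firmware are hypothetical. The file is hashed in fixed-size chunks so a large image never has to fit in RAM, and hmac.compare_digest performs the comparison. The sketch assumes the published hash was obtained from a trusted source (for example, a signed manifest); a hash downloaded over the same untrusted channel as the firmware proves nothing.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 4096) -> str:
    """Hash a file incrementally so large images never need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_firmware(path: str, published_hex: str) -> bool:
    """True means INSTALL; False means REJECT (corrupted or tampered)."""
    return hmac.compare_digest(sha256_file(path), published_hex.lower())


# Demo with a throwaway file standing in for firmware_v2.1.bin
image = b"\x7fELF" + bytes(10_000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(image)
published = hashlib.sha256(image).hexdigest()  # what the manufacturer would publish
ok = verify_firmware(tmp.name, published)
tampered_ok = verify_firmware(tmp.name, hashlib.sha256(image + b"x").hexdigest())
os.remove(tmp.name)
print("INSTALL" if ok else "REJECT")  # INSTALL
```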
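2. Password Storage
User password: "MySecretP@ss123"
Stored in DB: salt, Argon2id(password, salt)
(or another slow password-hashing function such as bcrypt, scrypt, or PBKDF2)
Never store plaintext passwords, and never store a single fast hash such as SHA-256(password + salt) (see 1431.11).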
3. Data Integrity in Transit
Sensor sends:
{data: "temp=25.0C", hash: SHA-256(data)}
Gateway receives and verifies:
computed_hash = SHA-256(received_data)
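if computed_hash == received_hash: VALID
Note: this only detects accidental corruption. An attacker who alters the data can recompute SHA-256(data) as well, so use HMAC (1431.5) when tampering is a concern.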
4. Blockchain and Merkle Trees
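IoT audit logs use hash chains in which each entry includes the hash of the previous entry, so altering any entry changes its hash and breaks every later link, making tampering detectable. Merkle trees extend the idea to a tree of hashes, letting a single root hash commit to many records.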
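A minimal hash-chained audit log, assuming records are small JSON-serializable dicts (the record fields, the helper names, and the all-zeros GENESIS value are hypothetical). Each entry's hash covers the previous entry's hash plus a canonical JSON encoding of the record, so editing any record breaks verification from that point on.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous link together with a canonical JSON encoding of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(records: list) -> list:
    chain, prev = [], GENESIS
    for record in records:
        h = entry_hash(prev, record)
        chain.append({"record": record, "prev": prev, "hash": h})
        prev = h
    return chain


def verify_chain(chain: list) -> bool:
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry_hash(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


log = build_chain([
    {"event": "door_unlock", "device": "lock-07"},
    {"event": "firmware_update", "device": "thermostat-2"},
])
print(verify_chain(log))  # True
log[0]["record"]["event"] = "door_lock"  # tamper with an early entry
print(verify_chain(log))  # False
```

An attacker who can rewrite the entire log could recompute every hash, so in practice the latest hash is also stored or signed somewhere the attacker cannot modify.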
1431.5 HMAC: Hash-Based Message Authentication
HMAC combines a hash function with a secret key to provide both integrity AND authentication.
%% fig-alt: "HMAC process showing message combined with secret key to produce authentication tag"
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22'}}}%%
flowchart LR
subgraph Sender["Sender"]
Msg1[Message]
Key1[Secret Key K]
HMAC1[HMAC-SHA256]
Tag1[MAC Tag<br/>32 bytes]
Msg1 --> HMAC1
Key1 --> HMAC1
HMAC1 --> Tag1
end
subgraph Transit["Network"]
Send[Message + MAC Tag]
end
subgraph Receiver["Receiver"]
Msg2[Received Message]
Key2[Secret Key K]
HMAC2[HMAC-SHA256]
Tag2[Computed Tag]
Check{Tags Match?}
Msg2 --> HMAC2
Key2 --> HMAC2
HMAC2 --> Tag2
Tag2 --> Check
end
Msg1 --> Send
Tag1 --> Send --> Check
Send --> Msg2
Check -->|Yes| Valid[Authentic + Unmodified]
Check -->|No| Invalid[Tampered or Forged]
style Valid fill:#27AE60,stroke:#2C3E50,color:#fff
style Invalid fill:#E74C3C,stroke:#2C3E50,color:#fff
1431.5.1 HMAC vs Plain Hash
| Feature | SHA-256(data) | HMAC-SHA256(key, data) |
|---|---|---|
| Integrity | Yes | Yes |
| Authentication | No | Yes |
| Key Required | No | Yes |
| Prevents Forgery | No | Yes |
Always use HMAC when you need to verify that a message came from an authorized sender AND wasn’t modified.
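To show how HMAC mixes in the key, here is an educational re-implementation of the RFC 2104 construction, HMAC(K, m) = H((K ⊕ opad) ‖ H((K ⊕ ipad) ‖ m)), checked against Python's built-in hmac module. The nested structure is what defeats the length-extension attacks that affect the naive SHA-256(key + message). The function name and sample key are hypothetical; use hmac.new in real code.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes 512-bit (64-byte) blocks


def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """Educational HMAC-SHA256 following RFC 2104; use hmac.new() in real code."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()    # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")      # then zero-padded to one block
    inner_key = bytes(k ^ 0x36 for k in key)  # K XOR ipad
    outer_key = bytes(k ^ 0x5C for k in key)  # K XOR opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()


key = b"device-42-secret"  # hypothetical per-device key
msg = b"temp=25.5,humidity=60%"
print(hmac_sha256_manual(key, msg) == hmac.new(key, msg, hashlib.sha256).digest())  # True
```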
1431.6 Deprecated Algorithms
1431.7 Never Use MD5 or SHA-1
These algorithms have known vulnerabilities and must not be used for security:
| Algorithm | Status | Vulnerability |
|---|---|---|
| MD5 | Broken | Collision attacks in seconds |
| SHA-1 | Deprecated | Collision demonstrated in 2017 |
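Real-world impact: In 2008, researchers used an MD5 collision to create a rogue certificate authority certificate that browsers would have trusted, which could have enabled man-in-the-middle attacks on HTTPS. In 2012, the Flame malware used an MD5 collision to forge a Microsoft code-signing certificate.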
1431.7.1 Migration Path
| Old Algorithm | Replacement | Notes |
|---|---|---|
| MD5 | SHA-256 | Direct replacement |
| SHA-1 | SHA-256 or SHA-3 | Required for all new systems |
| SHA-256 | SHA-3 (optional) | Different design for defense-in-depth |
1431.8 Hash Function Recommendations
| Hash Function | Output Size | Collision Resistance | Use Case |
|---|---|---|---|
| SHA-256 | 256 bits | 128-bit | Default choice for IoT |
| SHA-3 (SHA3-256 / SHA3-512) | 256/512 bits | 128/256-bit | Future-proofing, defense-in-depth |
| BLAKE2 (BLAKE2s / BLAKE2b) | 256/512 bits | 128/256-bit | High-performance hashing |
| SHA-512 | 512 bits | 256-bit | When a larger security margin is required |
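All of these families are available in Python's standard hashlib module, so switching algorithms is usually a one-line change. This sketch (with an arbitrary sample input) prints each digest size; BLAKE2s and BLAKE2b default to 256-bit and 512-bit outputs respectively.

```python
import hashlib

data = b"firmware_v2.1.bin contents"  # arbitrary sample input
candidates = {
    "SHA-256": hashlib.sha256(data),
    "SHA3-256": hashlib.sha3_256(data),
    "BLAKE2s": hashlib.blake2s(data),
    "BLAKE2b": hashlib.blake2b(data),
    "SHA-512": hashlib.sha512(data),
}

for name, h in candidates.items():
    print(f"{name:>8}: {h.digest_size * 8} bits  {h.hexdigest()[:16]}...")
```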
1431.9 Knowledge Check
A smart home device uses SHA-256 to verify firmware updates before installation. Which property of hash functions makes this security measure effective?
Options:
- A) Hash functions can be reversed to recover the original firmware if needed
- B) Hash functions are extremely slow, preventing attackers from generating fake firmware quickly
- C) Hash functions produce a unique fixed-size fingerprint that changes completely if even one bit of the input changes
- D) Hash functions encrypt the firmware so attackers cannot read it
Correct: C
Cryptographic hash functions produce a fixed-size output (256 bits for SHA-256) that serves as an effectively unique “fingerprint” of the input data. The avalanche effect means changing even a single bit produces a completely different hash. When the manufacturer publishes the official hash through a trusted channel, the device computes its own hash of the downloaded firmware; if they match, the firmware has not been altered.
Hash functions are NOT encryption (they don’t hide data), are NOT reversible (one-way only), and are actually very fast (not slow).
Your IoT gateway receives sensor readings from remote devices. You want to verify that readings come from authorized sensors AND haven’t been tampered with. Which approach should you use?
Options:
- A) SHA-256 hash of the sensor data
- B) HMAC-SHA256 with a shared secret key
- C) RSA encryption of the sensor data
- D) Base64 encoding of the sensor data
Correct: B
HMAC-SHA256 combines a hash with a secret key to provide both integrity (data wasn’t modified) AND authentication (sender knows the secret). Plain SHA-256 only provides integrity - anyone can compute a hash, so it doesn’t prove who sent the data. RSA is for encryption/signatures (overkill here). Base64 is just encoding, not security.
1431.10 Implementation Example
1431.10.1 Python: HMAC-SHA256 for Sensor Data
```python
import hmac
import hashlib

# Shared secret key (pre-provisioned on sensor and gateway)
secret_key = b"sensor_shared_secret_key"

# Sensor reading
sensor_data = b"temp=25.5,humidity=60%,time=1642345678"

# Compute HMAC
mac_tag = hmac.new(secret_key, sensor_data, hashlib.sha256).hexdigest()

# Sensor sends: {data: sensor_data, mac: mac_tag}

# Gateway verifies:
received_data = sensor_data  # from network
received_mac = mac_tag       # from network
computed_mac = hmac.new(secret_key, received_data, hashlib.sha256).hexdigest()

if hmac.compare_digest(computed_mac, received_mac):
    print("VALID: Data is authentic and unmodified")
else:
    print("INVALID: Data tampered or wrong key")
```
1431.10.2 Key Points
- Use hmac.compare_digest() - Prevents timing attacks by comparing in constant time
- Use unique keys per device - Limits blast radius if a key is compromised
- Rotate keys periodically - Limits exposure from key leakage
- Include a timestamp or counter in the MACed data and reject stale or repeated values - Prevents replay attacks
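The replay-protection advice can be sketched as follows. Everything here is an assumption for illustration: the sign helper, the Gateway class, the 30-second MAX_AGE_S window, and the counter|timestamp|data message layout. The key idea is that the counter and timestamp are covered by the MAC, so an attacker cannot alter them, and the gateway rejects stale timestamps and any counter it has already seen.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"sensor_shared_secret_key"  # hypothetical pre-provisioned key
MAX_AGE_S = 30  # hypothetical freshness window in seconds


def sign(data: bytes, counter: int, timestamp: int, key: bytes = SECRET_KEY) -> bytes:
    """MAC over counter, timestamp, and data so none can be altered in transit."""
    header = f"{counter}|{timestamp}|".encode()
    return hmac.new(key, header + data, hashlib.sha256).digest()


class Gateway:
    """Hypothetical gateway tracking the last accepted counter for one sensor."""

    def __init__(self, key: bytes = SECRET_KEY):
        self.key = key
        self.last_counter = -1

    def accept(self, data: bytes, counter: int, timestamp: int, tag: bytes,
               now: Optional[int] = None) -> bool:
        now = int(time.time()) if now is None else now
        if not hmac.compare_digest(sign(data, counter, timestamp, self.key), tag):
            return False  # forged, modified, or wrong key
        if abs(now - timestamp) > MAX_AGE_S:
            return False  # stale: outside the freshness window
        if counter <= self.last_counter:
            return False  # replayed or reordered message
        self.last_counter = counter
        return True


gw = Gateway()
ts = int(time.time())
reading = b"temp=25.5,humidity=60%"
tag = sign(reading, 1, ts)

first = gw.accept(reading, 1, ts, tag)        # fresh and authentic
replayed = gw.accept(reading, 1, ts, tag)     # same message sent again
forged = gw.accept(b"temp=99.9", 2, ts, tag)  # data changed, old tag reused
print(first, replayed, forged)  # True False False
```

A real gateway would keep the last counter per device and persist it across restarts, otherwise a reboot would reopen the replay window.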
1431.11 Password Hashing
For storing user passwords, use password hashing functions (not plain SHA-256):
| Function | Status | Use Case |
|---|---|---|
| Argon2 (Argon2id) | Recommended | Modern systems, adjustable memory-hardness |
| bcrypt | Good | Legacy systems, widely supported |
| scrypt | Good | Memory-hard, resists GPU/ASIC attacks |
| PBKDF2 | Acceptable | When FIPS compliance is required |
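Argon2 is not part of Python's standard library (it needs a third-party package such as argon2-cffi), but scrypt is available as hashlib.scrypt. Below is a minimal store-and-verify sketch, assuming the classic interactive-login cost parameters n=2**14, r=8, p=1; tune them so one hash takes an acceptable time on your hardware. The function names are hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical cost parameters (classic interactive-login setting); tune per device
N, R, P = 2**14, 8, 1
MAXMEM = 64 * 1024 * 1024  # allow up to 64 MiB; these parameters need about 16 MiB


def hash_password(password: str) -> tuple:
    """Return (salt, derived_key) to store; the salt is random and unique per user."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P,
                         maxmem=MAXMEM, dklen=32)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P,
                               maxmem=MAXMEM, dklen=32)
    return hmac.compare_digest(candidate, stored_key)


salt, stored = hash_password("MySecretP@ss123")
print(verify_password("MySecretP@ss123", salt, stored))  # True
print(verify_password("wrong-guess", salt, stored))      # False
```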
Why not plain SHA-256?
Plain hashes are too fast - attackers can try billions of passwords per second. Password hashing functions are intentionally slow (100ms+) to make brute-force attacks impractical.
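```python
import hashlib
import os

password = "MySecretP@ss123"
salt = os.urandom(16)  # random per-user salt, stored next to the hash

# WRONG: Too fast to resist brute-force
password_hash = hashlib.sha256(password.encode()).hexdigest()
# RIGHT: Intentionally slow (OWASP currently suggests 600,000 iterations for PBKDF2-HMAC-SHA256)
password_hash = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 600_000)
```
1431.12 Summary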
- Hash functions create fixed-size fingerprints of any data
- SHA-256 is the recommended hash for IoT - 256-bit output, no known practical attacks
- Collision resistance prevents attackers from creating fake data with matching hashes
- Avalanche effect means tiny changes produce completely different hashes
- HMAC combines hashing with a secret key for authentication + integrity
- Never use MD5 or SHA-1 - both are cryptographically broken
- Password hashing requires slow functions (Argon2, bcrypt) not plain SHA-256
1431.13 What’s Next
Continue to TLS/DTLS Transport Security to learn how transport layer security protocols protect IoT communications by combining encryption, hashing, and authentication into a complete security solution.