1431  Hash Functions and Data Integrity

1431.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Understand Hash Function Properties: Explain collision resistance, preimage resistance, and the avalanche effect
  • Apply SHA-256 for Integrity: Use cryptographic hashes to verify firmware and validate data, and recognize when a dedicated password hashing function is needed instead
  • Implement HMAC Authentication: Combine hash functions with secret keys for message authentication
  • Avoid Deprecated Algorithms: Recognize why MD5 and SHA-1 are insecure and should never be used
Tip: In Plain English

A hash function creates a fixed-size “fingerprint” of any data. Like a fingerprint, even tiny changes produce completely different outputs. Hash functions are one-way: you can create a hash from data, but you cannot reverse it to get the original data back.

Why it matters for IoT: When your smart thermostat downloads a firmware update, it computes the SHA-256 hash and compares it to the manufacturer’s published hash. If they match, the firmware is identical to what the manufacturer published, provided the published hash itself was obtained over a trusted channel. If even one bit was changed (by an attacker or network error), the hash will be completely different.

1431.2 How Hash Functions Work

A cryptographic hash function takes input of any size and produces a fixed-size output called a digest or hash.

%% fig-alt: "Hash function showing any input producing fixed 256-bit output with avalanche effect"
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22'}}}%%
flowchart LR
    subgraph Inputs["Any Input Size"]
        I1["'Hello'<br/>5 bytes"]
        I2["'Firmware v2.1.3'<br/>2 MB"]
        I3["'Sensor data stream'<br/>1 GB"]
    end

    subgraph Hash["SHA-256"]
        H[Hash Function]
    end

    subgraph Outputs["Fixed Output: 256 bits"]
        O1["185f8db3..."]
        O2["a3f2c9e8..."]
        O3["7d4e2f1a..."]
    end

    I1 --> H --> O1
    I2 --> H --> O2
    I3 --> H --> O3

    style Inputs fill:#E67E22,stroke:#2C3E50,color:#fff
    style Hash fill:#2C3E50,stroke:#16A085,color:#fff
    style Outputs fill:#16A085,stroke:#2C3E50,color:#fff

Figure 1431.1: Hash functions produce fixed-size outputs regardless of input size
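
The fixed-size property is easy to check with Python's standard `hashlib` module. The following is a minimal sketch; the three inputs are arbitrary examples chosen only to differ widely in size.

```python
import hashlib

# Inputs of very different sizes (arbitrary examples)
inputs = [b"Hello", b"Firmware v2.1.3" * 1000, b"x" * 1_000_000]

digests = [hashlib.sha256(data).hexdigest() for data in inputs]

for data, digest in zip(inputs, digests):
    # Every digest is 64 hex characters = 32 bytes = 256 bits
    print(f"{len(data):>9} bytes -> {digest[:8]}... ({len(digest) * 4} bits)")
```

Whatever the input length, the digest length stays at 256 bits.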

1431.3 Critical Properties

1431.3.1 1. Collision Resistance

It must be computationally infeasible to find two different inputs that produce the same hash.

Why it matters: If attackers could create a malicious firmware file with the same hash as legitimate firmware, they could bypass integrity checks.
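
Collision resistance depends on output size. By the birthday bound, a collision in an n-bit hash is expected after roughly 2^(n/2) attempts. The sketch below makes this tangible by deliberately weakening SHA-256 to its first 3 bytes (24 bits), which is an assumption made purely for demonstration. At that size a collision turns up after a few thousand attempts, whereas the full 256-bit output would need on the order of 2^128.

```python
import hashlib

def truncated_hash(data: bytes, n_bytes: int) -> bytes:
    """First n_bytes of SHA-256: a deliberately weakened hash for the demo."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int):
    """Hash counter strings until two different inputs share a truncated digest."""
    seen = {}
    counter = 0
    while True:
        data = str(counter).encode()
        h = truncated_hash(data, n_bytes)
        if h in seen:
            # Two distinct inputs, same truncated digest
            return seen[h], data, counter + 1
        seen[h] = data
        counter += 1

first, second, attempts = find_collision(3)  # 24-bit digest
print(f"{first!r} and {second!r} collide after {attempts} attempts")
```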

1431.3.2 2. Preimage Resistance (One-Way)

Given a hash output, it must be computationally infeasible to find any input that produces that hash.

Why it matters: Password hashes stored in databases cannot be inverted to reveal the original passwords; an attacker can only guess candidate passwords and hash each one.

1431.3.3 3. Avalanche Effect

Changing even one bit of input produces a completely different hash output (approximately 50% of bits change).

Input:  "Sensor: Temp=25.0C"
Hash:   a3f2c9e847b1...

Input:  "Sensor: Temp=25.1C"  (one character changed)
Hash:   7d4e2f1a93c8...      (completely different!)
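
The "approximately 50%" figure can be measured directly. In this illustrative sketch, the two sensor strings from the example differ in exactly one bit (the characters `0` and `1` are 0x30 and 0x31); `bit_difference` is a hypothetical helper name.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many of the 256 SHA-256 output bits differ between a and b."""
    ha = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hb = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(ha ^ hb).count("1")

changed = bit_difference(b"Sensor: Temp=25.0C", b"Sensor: Temp=25.1C")
print(f"{changed} of 256 bits differ ({changed / 256:.0%})")
```

The count should land near 128, though the exact value varies from one pair of inputs to another.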

1431.4 SHA-256: The Standard

SHA-256 (Secure Hash Algorithm 256-bit) is the recommended hash function for IoT:

| Property | Value |
|---|---|
| Output Size | 256 bits (32 bytes) |
| Block Size | 512 bits |
| Security Level | 128-bit collision resistance |
| Speed | ~200-500 MB/s on modern processors |
| Status | Current standard, no known practical attacks |

1431.4.1 IoT Use Cases

1. Firmware Integrity Verification

Manufacturer publishes:
  firmware_v2.1.bin
  SHA-256: 7d4e2f1a93c8b5d6e7f8a9b0c1d2e3f4...

Device downloads firmware, computes:
  SHA-256(downloaded_firmware)

If hashes match: INSTALL
If hashes differ: REJECT (corrupted or tampered)
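
A minimal sketch of this check in Python, assuming the firmware image is available as a local file and the published digest is a hex string. The function names `sha256_file` and `firmware_ok` are hypothetical, not part of any vendor API.

```python
import hashlib
import hmac

def sha256_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash a file in chunks so large images need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def firmware_ok(path: str, published_hex: str) -> bool:
    """Return True only if the file's digest equals the published digest."""
    return hmac.compare_digest(sha256_file(path), published_hex.strip().lower())
```

This check is only as trustworthy as the published digest: if an attacker can replace both the firmware and the hash, the comparison still passes, so the digest should arrive over a trusted channel such as a signed manifest.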

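2. Password Storage

User password:  "MySecretP@ss123"
Stored in DB:   salt, PBKDF2-HMAC-SHA256(password, salt, iterations)

Never store plaintext passwords! A single fast SHA-256(password + salt) is also not sufficient; see Password Hashing (Section 1431.11).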

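3. Data Integrity in Transit

Sensor sends:
  {data: "temp=25.0C", hash: SHA-256(data)}

Gateway receives and verifies:
  computed_hash = SHA-256(received_data)
  if computed_hash == received_hash: VALID
  else: REJECT

A plain hash detects accidental corruption only. An attacker who alters the data can simply recompute the hash, so use HMAC (Section 1431.5) when deliberate tampering is a concern.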

4. Blockchain and Merkle Trees

IoT audit logs use hash chains where each entry includes the hash of the previous entry, making tampering detectable.
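
The hash-chain idea can be sketched in a few lines. The entry layout, the all-zero `GENESIS` value, and the function names below are assumptions made for illustration, not a standard log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()

def append_entry(log: list, payload: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev_hash,
                "hash": entry_hash(prev_hash, payload)})

def verify_log(log: list) -> bool:
    """Walk the chain and recompute every link."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"event": "door_open", "t": 1642345678})
append_entry(log, {"event": "door_close", "t": 1642345690})
print(verify_log(log))                 # True
log[0]["payload"]["event"] = "none"    # tamper with an earlier entry
print(verify_log(log))                 # False
```

A chain like this only detects edits by someone who cannot rewrite every later entry. In practice the latest hash is usually anchored somewhere the attacker cannot reach, or each entry is protected with a keyed MAC or signature.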

1431.5 HMAC: Hash-Based Message Authentication

HMAC combines a hash function with a secret key to provide both integrity AND authentication.

%% fig-alt: "HMAC process showing message combined with secret key to produce authentication tag"
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22'}}}%%
flowchart LR
    subgraph Sender["Sender"]
        Msg1[Message]
        Key1[Secret Key K]
        HMAC1[HMAC-SHA256]
        Tag1[MAC Tag<br/>32 bytes]

        Msg1 --> HMAC1
        Key1 --> HMAC1
        HMAC1 --> Tag1
    end

    subgraph Transit["Network"]
        Send[Message + MAC Tag]
    end

    subgraph Receiver["Receiver"]
        Msg2[Received Message]
        Key2[Secret Key K]
        HMAC2[HMAC-SHA256]
        Tag2[Computed Tag]
        Check{Tags Match?}

        Msg2 --> HMAC2
        Key2 --> HMAC2
        HMAC2 --> Tag2
        Tag2 --> Check
    end

    Tag1 --> Send --> Check

    Check -->|Yes| Valid[Authentic + Unmodified]
    Check -->|No| Invalid[Tampered or Forged]

    style Valid fill:#27AE60,stroke:#2C3E50,color:#fff
    style Invalid fill:#E74C3C,stroke:#2C3E50,color:#fff

Figure 1431.2: HMAC provides authentication and integrity verification

1431.5.1 HMAC vs Plain Hash

| Feature | SHA-256(data) | HMAC-SHA256(key, data) |
|---|---|---|
| Integrity | Yes | Yes |
| Authentication | No | Yes |
| Key Required | No | Yes |
| Prevents Forgery | No | Yes |

Always use HMAC when you need to verify that a message came from an authorized sender AND wasn’t modified.
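
The difference in the "Prevents Forgery" row can be demonstrated with a short sketch. The key and messages are hypothetical values, and the attacker is assumed not to know the key.

```python
import hashlib
import hmac

def plain_verify(message: bytes, digest: bytes) -> bool:
    return hmac.compare_digest(hashlib.sha256(message).digest(), digest)

def hmac_verify(key: bytes, message: bytes, tag: bytes) -> bool:
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

key = b"example-shared-key"   # hypothetical; the attacker does not know it
genuine = b"valve=closed"
genuine_tag = hmac.new(key, genuine, hashlib.sha256).digest()

forged = b"valve=open"
# Attacker recomputes the plain hash over the forged message: accepted
plain_accepts = plain_verify(forged, hashlib.sha256(forged).digest())
# Attacker can only reuse the old tag or guess at a key: rejected
replayed_tag_accepts = hmac_verify(key, forged, genuine_tag)
guessed_tag = hmac.new(b"attacker-guess", forged, hashlib.sha256).digest()
guessed_key_accepts = hmac_verify(key, forged, guessed_tag)

print(plain_accepts, replayed_tag_accepts, guessed_key_accepts)  # True False False
```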

1431.6 Deprecated Algorithms

1431.7 Never Use MD5 or SHA-1

These algorithms have known vulnerabilities and must not be used for security:

| Algorithm | Status | Vulnerability |
|---|---|---|
| MD5 | Broken | Collision attacks in seconds |
| SHA-1 | Deprecated | Collision demonstrated in 2017 |

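Real-world impact: In 2008, researchers used MD5 collisions to create a rogue certificate authority certificate that browsers trusted, and in 2012 the Flame malware used an MD5 collision to forge a Microsoft code-signing certificate. Forged certificates of this kind enable man-in-the-middle attacks.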

1431.7.1 Migration Path

| Old Algorithm | Replacement | Notes |
|---|---|---|
| MD5 | SHA-256 | Direct replacement |
| SHA-1 | SHA-256 or SHA-3 | All new systems |
| SHA-256 | SHA-3 (optional) | Different design for defense-in-depth |

1431.8 Hash Function Recommendations

| Hash Function | Output Size | Security Level | Use Case |
|---|---|---|---|
| SHA-256 | 256 bits | Strong | Default choice for IoT |
| SHA-3 | 256/512 bits | Very Strong | Future-proofing, defense-in-depth |
| BLAKE2 | 256/512 bits | Strong | High-performance hashing |
| SHA-512 | 512 bits | Very Strong | When 256-bit is insufficient |

1431.9 Knowledge Check

A smart home device uses SHA-256 to verify firmware updates before installation. Which property of hash functions makes this security measure effective?

Options:

    A. Hash functions can be reversed to recover the original firmware if needed
    B. Hash functions are extremely slow, preventing attackers from generating fake firmware quickly
    C. Hash functions produce an effectively unique fixed-size fingerprint that changes completely if even one bit of the input changes
    D. Hash functions encrypt the firmware so attackers cannot read it

Correct: C

Cryptographic hash functions produce an effectively unique, fixed-size output (256 bits for SHA-256) that serves as a “fingerprint” of the input data. Collisions exist in principle, but finding one is computationally infeasible. The avalanche effect means changing even a single bit produces a completely different hash. When the manufacturer publishes the official hash, the device computes its own hash of the downloaded firmware; if they match, the firmware is identical to what the manufacturer published.

Hash functions are NOT encryption (they don’t hide data), are NOT reversible (one-way only), and are actually very fast (not slow).

Your IoT gateway receives sensor readings from remote devices. You want to verify that readings come from authorized sensors AND haven’t been tampered with. Which approach should you use?

Options:

    A. SHA-256 hash of the sensor data
    B. HMAC-SHA256 with a shared secret key
    C. RSA encryption of the sensor data
    D. Base64 encoding of the sensor data

Correct: B

HMAC-SHA256 combines a hash with a secret key to provide both integrity (data wasn’t modified) AND authentication (sender knows the secret). Plain SHA-256 only provides integrity - anyone can compute a hash, so it doesn’t prove who sent the data. RSA is for encryption/signatures (overkill here). Base64 is just encoding, not security.

1431.10 Implementation Example

1431.10.1 Python: HMAC-SHA256 for Sensor Data

import hmac
import hashlib

# Shared secret key (pre-provisioned on sensor and gateway)
secret_key = b"sensor_shared_secret_key"

# Sensor reading
sensor_data = b"temp=25.5,humidity=60%,time=1642345678"

# Compute HMAC
mac_tag = hmac.new(secret_key, sensor_data, hashlib.sha256).hexdigest()

# Sensor sends: {data: sensor_data, mac: mac_tag}

# Gateway verifies:
received_data = sensor_data  # from network
received_mac = mac_tag       # from network

computed_mac = hmac.new(secret_key, received_data, hashlib.sha256).hexdigest()

if hmac.compare_digest(computed_mac, received_mac):
    print("VALID: Data is authentic and unmodified")
else:
    print("INVALID: Data tampered or wrong key")

1431.10.2 Key Points

  1. Use hmac.compare_digest() - Prevents timing attacks
  2. Use unique keys per device - Limits blast radius if compromised
  3. Rotate keys periodically - Limits exposure from key leakage
  4. Include timestamp - Prevents replay attacks
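
Key point 4 can be sketched as follows. The packet layout, the `|` separator, and the 30-second `MAX_AGE_SECONDS` window are assumptions for illustration; the optional `now` parameter exists only to make the functions easy to test.

```python
import hashlib
import hmac
import time

MAX_AGE_SECONDS = 30  # assumed freshness window; tune for your network and clocks

def sign_reading(key: bytes, reading: bytes, now=None) -> dict:
    ts = int(time.time() if now is None else now)
    # The timestamp is covered by the MAC, so it cannot be altered undetected
    msg = str(ts).encode() + b"|" + reading
    tag = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return {"ts": ts, "data": reading, "mac": tag}

def verify_reading(key: bytes, packet: dict, now=None) -> bool:
    now = time.time() if now is None else now
    msg = str(packet["ts"]).encode() + b"|" + packet["data"]
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, packet["mac"]):
        return False  # tampered or wrong key
    # Reject stale packets, which may be replays of old traffic
    return abs(now - packet["ts"]) <= MAX_AGE_SECONDS
```

A timestamp window assumes roughly synchronized clocks and still admits replays inside the window; a per-device message counter or nonce tracked by the gateway closes that gap.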

1431.11 Password Hashing

For storing user passwords, use password hashing functions (not plain SHA-256):

| Function | Status | Use Case |
|---|---|---|
| Argon2 | Recommended | Modern systems, adjustable memory-hardness |
| bcrypt | Good | Legacy systems, widely supported |
| scrypt | Good | Memory-hard, resists GPU attacks |
| PBKDF2 | Acceptable | FIPS compliance required |

Why not plain SHA-256?

Plain hashes are too fast - attackers can try billions of passwords per second. Password hashing functions are intentionally slow (100ms+) to make brute-force attacks impractical.

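import hashlib
import os

password = "MySecretP@ss123"  # example input
salt = os.urandom(16)         # random per-password salt, stored alongside the hash

# WRONG: a single fast hash is cheap to brute-force
password_hash = hashlib.sha256(password.encode()).hexdigest()

# RIGHT: Intentionally slow (100,000+ iterations)
password_hash = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100000)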
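
Storing and later checking a password could look like the sketch below. It uses PBKDF2 because it ships with Python's standard library; `hash_password` and `check_password` are hypothetical helper names, and the iteration count simply mirrors the figure used in this section.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # mirrors the figure in this section; raise it as hardware allows

def hash_password(password: str) -> tuple:
    """Return (salt, digest); store both, never the password itself."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Because the salt is random, hashing the same password twice yields different digests, which prevents attackers from reusing precomputed tables across accounts.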

1431.12 Summary

  • Hash functions create fixed-size fingerprints of any data
  • SHA-256 is the recommended hash for IoT - 256-bit output, no known practical attacks
  • Collision resistance prevents attackers from creating fake data with matching hashes
  • Avalanche effect means tiny changes produce completely different hashes
  • HMAC combines hashing with a secret key for authentication + integrity
  • Never use MD5 or SHA-1 - both are cryptographically broken
  • Password hashing requires slow functions (Argon2, bcrypt) not plain SHA-256

1431.13 What’s Next

Continue to TLS/DTLS Transport Security to learn how transport layer security protocols protect IoT communications by combining encryption, hashing, and authentication into a complete security solution.