Explain Hash Function Properties: Describe collision resistance, preimage resistance, and the avalanche effect
Apply SHA-256 for Integrity: Verify firmware authenticity, validate data, and store passwords securely using cryptographic hashes
Implement HMAC Authentication: Combine hash functions with secret keys for message authentication
Distinguish Deprecated Algorithms: Identify why MD5 and SHA-1 are insecure and select appropriate replacements
In 60 Seconds
Cryptographic hash functions produce a fixed-size, irreversible fingerprint of any data, enabling integrity verification and password storage for IoT devices without exposing the original values.
For Beginners: Hash Functions and Data Integrity
Hashing and data integrity checks ensure that IoT data has not been tampered with during transmission. Think of a wax seal on an envelope – if the seal is broken, you know someone opened it. Hash functions create a unique digital fingerprint of your data, so any change, no matter how small, is immediately detectable.
Sensor Squad: The Digital Fingerprint Machine!
“Every piece of data has a unique fingerprint!” Sammy the Sensor said, holding up a data packet. “When I feed this temperature reading into the SHA-256 hash function, out comes a 64-character hex string. Change even ONE tiny bit of the reading, and the fingerprint looks COMPLETELY different!”
Max the Microcontroller demonstrated. “Watch this: ‘Hello’ hashes to 185f8db… But ‘hello’ with a lowercase h hashes to 2cf24db… Totally different! This is called the avalanche effect – small changes cause massive differences in the output. That is what makes hash functions so useful for detecting tampering.”
“Hash functions are one-way,” Lila the LED explained. “You can turn data INTO a hash, but you cannot turn a hash BACK into data. It is like putting a document through a paper shredder – you can verify the shreds match the original, but you cannot reconstruct the document from the shreds. That is why hashes are safe for storing passwords – even if someone steals the hash, they cannot reverse it to find the password.”
“HMAC adds a secret key to the hash,” Bella the Battery noted. “Without a key, anyone can compute the hash. But with HMAC, only someone who knows the secret key can create or verify the hash. It is like a wax seal that needs a specific signet ring – without the ring, you cannot create a valid seal. And never use MD5 or SHA-1 – they have been broken! Always use SHA-256 or better.”
In Plain English
A hash function creates a fixed-size “fingerprint” of any data. Like a fingerprint, even tiny changes produce completely different outputs. Hash functions are one-way: you can create a hash from data, but you cannot reverse it to get the original data back.
Why it matters for IoT: When your smart thermostat downloads a firmware update, it computes the SHA-256 hash and compares it to the manufacturer’s published hash. If they match, the firmware is authentic and unmodified. If even one bit was changed (by an attacker or network error), the hash will be completely different.
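The check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the firmware image is a local file and the published digest is available as a hex string; `sha256_file` and `verify_firmware` are hypothetical helper names, and the "firmware" bytes are stand-in data.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_file(path, chunk_size=4096):
    """Hash a file in chunks so large images need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_firmware(path, published_hex):
    """Return True only if the computed digest equals the published one."""
    return hmac.compare_digest(sha256_file(path), published_hex.lower())


# Demo with a stand-in "firmware" file (illustrative bytes, not a real image).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x7fFIRMWARE v2.1 payload" * 100)
    firmware_path = tmp.name

published = sha256_file(firmware_path)         # what the vendor would publish
ok_before = verify_firmware(firmware_path, published)

with open(firmware_path, "r+b") as f:          # overwrite one byte to simulate tampering
    f.seek(10)
    f.write(b"\x00")
ok_after = verify_firmware(firmware_path, published)

os.remove(firmware_path)
print("untouched image accepted:", ok_before)
print("modified image accepted:", ok_after)
```

A matching digest only shows that the image equals whatever was published, so the published hash itself has to reach the device over a trusted channel (or be signed).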
12.2 How Hash Functions Work
A cryptographic hash function takes input of any size and produces a fixed-size output called a digest or hash.
Figure 12.1: Hash functions produce fixed-size outputs regardless of input size
12.2.1 Critical Properties
12.2.1.1 1. Collision Resistance
It must be computationally infeasible to find two different inputs that produce the same hash.
Why it matters: If attackers could create a malicious firmware file with the same hash as legitimate firmware, they could bypass integrity checks.
12.2.1.2 2. Preimage Resistance (One-Way)
Given a hash output, it must be computationally infeasible to find any input that produces that hash.
Why it matters: Password hashes stored in databases cannot be reversed to reveal the original passwords.
12.2.1.3 3. Avalanche Effect
Changing even one bit of input produces a completely different hash output (approximately 50% of bits change).
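The avalanche effect can be measured directly at the bit level. The sketch below flips a single input bit and counts how many of the 256 output bits change; `bit_difference` is a hypothetical helper and the sensor string is an illustrative input.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = bytearray(b"temperature:22.5")
flipped = bytearray(original)
flipped[0] ^= 0x01  # flip the lowest bit of the first byte

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(flipped).digest()

changed = bit_difference(d1, d2)
print("Input bits changed:  1")
print(f"Output bits changed: {changed} of 256 ({100 * changed / 256:.1f}%)")
```

The count should land near 128; the exact figure varies from input to input.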
SHA-256 (Secure Hash Algorithm 256-bit) is the recommended hash function for IoT:
| Property | Value |
|---|---|
| Output Size | 256 bits (32 bytes, 64 hex characters) |
| Block Size | 512 bits |
| Security Level | 128-bit collision resistance |
| Speed | ~200-500 MB/s on modern processors |
| Status | Current standard, no known attacks |
12.3.1 IoT Use Cases
1. Firmware Integrity Verification
Manufacturer publishes:
firmware_v2.1.bin
SHA-256: 7d4e2f1a93c8b5d6e7f8a9b0c1d2e3f4...
Device downloads firmware, computes:
SHA-256(downloaded_firmware)
If hashes match: INSTALL
If hashes differ: REJECT (corrupted or tampered)
2. Password Storage
User password: "MySecretP@ss123"
Stored in DB: SHA-256(password + salt)
= "a3f2c9e847b1d5f6..."
Never store plaintext passwords!
(See Password Hashing section below for best practices)
3. Data Integrity in Transit
Sensor sends:
{data: "temp=25.0C", hash: SHA-256(data)}
Gateway receives and verifies:
computed_hash = SHA-256(received_data)
if computed_hash == received_hash: VALID
4. Blockchain and Merkle Trees
IoT audit logs use hash chains where each entry includes the hash of the previous entry, making tampering detectable.
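A hash chain of this kind can be sketched as follows. The entry layout, the all-zero `GENESIS` value, and the function names are assumptions made for illustration, not a specific product's log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log, payload):
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "payload": payload,
                "hash": entry_hash(prev_hash, payload)})


def verify_chain(log):
    """Return the index of the first broken entry, or -1 if the chain is intact."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        expected = entry_hash(prev_hash, entry["payload"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return i
        prev_hash = entry["hash"]
    return -1


log = []
for reading in ["door=open", "door=closed", "alarm=armed"]:
    append_entry(log, reading)

intact = verify_chain(log)
log[1]["payload"] = "door=open"   # attacker rewrites the middle entry
broken_at = verify_chain(log)
print("intact chain ->", intact)
print("after tampering, first bad entry ->", broken_at)
```

An attacker who can rewrite every later entry could recompute the whole chain, so in practice the most recent hash is stored somewhere the attacker cannot reach, or the entries are keyed with HMAC.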
12.4 HMAC: Hash-Based Message Authentication
HMAC combines a hash function with a secret key to provide both integrity AND authentication.
Figure 12.2: HMAC provides authentication and integrity verification
12.4.1 HMAC vs Plain Hash
| Feature | SHA-256(data) | HMAC-SHA256(key, data) |
|---|---|---|
| Integrity | Yes | Yes |
| Authentication | No | Yes |
| Key Required | No | Yes |
| Prevents Forgery | No | Yes |
Always use HMAC when you need to verify that a message came from an authorized sender AND was not modified. A plain hash only proves the data was not altered; HMAC also proves who created it.
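The difference shows up as soon as an attacker modifies a message. In this sketch (the key and readings are illustrative values, and the two `gateway_accepts_*` functions are hypothetical), the attacker can recompute a plain hash for forged data but cannot produce a valid HMAC tag without the key.

```python
import hashlib
import hmac

key = b"demo-key-known-only-to-sensor-and-gateway"  # illustrative value


def gateway_accepts_plain(data, tag):
    """Gateway check when the sender attaches only SHA-256(data)."""
    return hmac.compare_digest(hashlib.sha256(data).hexdigest(), tag)


def gateway_accepts_hmac(data, tag):
    """Gateway check when the sender attaches HMAC-SHA256(key, data)."""
    expected = hmac.new(key, data, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


# Attacker intercepts a message and replaces the reading.
forged = b"temp=99.9C"

# Plain hash: the attacker simply recomputes the hash for the forged data.
forged_plain_tag = hashlib.sha256(forged).hexdigest()
plain_fooled = gateway_accepts_plain(forged, forged_plain_tag)

# HMAC: without the key, the attacker can only reuse a tag seen earlier.
old_tag = hmac.new(key, b"temp=25.0C", hashlib.sha256).hexdigest()
hmac_fooled = gateway_accepts_hmac(forged, old_tag)

print("plain SHA-256 accepts forgery:", plain_fooled)
print("HMAC-SHA256 accepts forgery:  ", hmac_fooled)
```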
12.5 Deprecated Algorithms
12.6 Never Use MD5 or SHA-1
These algorithms have known vulnerabilities and must not be used for security:
| Algorithm | Output Size | Status | Vulnerability |
|---|---|---|---|
| MD5 | 128 bits | Broken | Collision attacks in seconds on a laptop |
| SHA-1 | 160 bits | Deprecated | Practical collision demonstrated by Google in 2017 (SHAttered) |
Real-world impact: Attackers have created fake SSL certificates using MD5 collisions, enabling man-in-the-middle attacks.
Collision probability calculator (birthday bound): for \(n\) hashed items and an \(H\)-bit hash, \(P \approx 1 - e^{-n^2 / (2 \cdot 2^H)}\).
Why this matters for IoT: If an attacker could find a collision for your firmware hash, they could create malicious firmware that passes integrity checks. MD5 collisions can be found in seconds on a laptop. A generic collision search against SHA-256 requires approximately 2^128 operations, far beyond any computing capacity available today.
12.8 Worked Example: Rainbow Table Attack Cost Analysis
Scenario: A smart home hub stores 50,000 user passwords as unsalted SHA-256 hashes. An attacker obtains the database backup. Calculate the attack cost and compare defenses.
Step 1: Estimate brute-force speed
Modern GPUs can compute SHA-256 hashes at extraordinary speeds:
| Hardware | SHA-256 Hashes/second | Cost |
|---|---|---|
| RTX 4090 (1 GPU) | ~22 billion/sec | ~$1,600 |
| 8x RTX 4090 rig | ~176 billion/sec | ~$15,000 |
| Cloud (100 GPUs) | ~2.2 trillion/sec | ~$50/hour |
| ASIC (Bitcoin miner) | 100+ trillion/sec | ~$3,000 |
Step 2: Calculate crack times for common password patterns
Password Crack Time Calculator (SHA-256): inputs are password length (4 to 20 characters), character set (digits 10, lowercase 26, mixed case 52, alphanumeric 62, all printable ASCII 95), and number of RTX 4090 GPUs (1 to 100). Keyspace = character set size ^ password length; time to exhaust keyspace = keyspace / (22 billion hashes/sec x number of GPUs). Assessment: under an hour is TRIVIAL, under a day is WEAK, under a year is MODERATE, under 1,000 years is STRONG, and anything longer is EXCELLENT (computationally infeasible).
Note: This assumes brute-force search of the full keyspace with unsalted SHA-256. Real attacks often use dictionaries and rules, which are much faster for common passwords. Always use bcrypt or Argon2 for password storage.
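The same arithmetic can be reproduced offline in Python. This is a small sketch that assumes the single-GPU rate of 22 billion SHA-256 hashes per second from Step 1; `crack_seconds` and `humanize` are hypothetical helper names.

```python
RTX_4090_SHA256_PER_SEC = 22e9  # per-GPU rate taken from the Step 1 hardware figures

CHARSETS = {"digits": 10, "lowercase": 26, "mixed case": 52,
            "alphanumeric": 62, "printable ASCII": 95}


def crack_seconds(charset_size, length, gpus=1):
    """Worst-case time to try every password of the given length."""
    keyspace = charset_size ** length
    return keyspace / (RTX_4090_SHA256_PER_SEC * gpus)


def humanize(seconds):
    """Render a duration in the largest unit that keeps the value >= 1."""
    for unit, size in [("years", 86400 * 365), ("days", 86400),
                       ("hours", 3600), ("minutes", 60), ("seconds", 1)]:
        if seconds >= size:
            return f"{seconds / size:,.1f} {unit}"
    return f"{seconds * 1000:.3f} ms"


for name, length in [("digits", 6), ("lowercase", 8), ("mixed case", 8),
                     ("alphanumeric", 8), ("printable ASCII", 8),
                     ("alphanumeric", 12)]:
    t = crack_seconds(CHARSETS[name], length)
    print(f"{length}-char {name:<16} {humanize(t)}")
```

These are exhaustive-search times; on average a password is found after about half the keyspace, and dictionary attacks are typically far faster for common passwords.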
Results (single RTX 4090):
| Password Type | Keyspace | Time to Exhaust | Cost |
|---|---|---|---|
| 6-digit PIN | 1 million | 0.045 ms | Free |
| 8-char lowercase | 209 billion | 9.5 seconds | Free |
| 8-char mixed case | 53 trillion | 40 minutes | Free |
| 8-char alphanumeric | 218 trillion | 2.8 hours | $0.14 electricity |
| 8-char all printable | 6.6 quadrillion | 3.5 days | $4.20 electricity |
| 12-char alphanumeric | 3.2 x 10^21 | ~4,650 years | Impractical |
Step 3: Rainbow table pre-computation
For the 50,000 unsalted hashes, an attacker uses a pre-computed rainbow table:
| Rainbow Table | Passwords Covered | Size | Lookup Time |
|---|---|---|---|
| 8-char alphanumeric | All 218 trillion | ~2 TB | <1 second per hash |
| RockYou + common variants | 14 million + mutations | 50 GB | Instant |
Cost to crack 50,000 unsalted hashes: Under $100 using cloud GPUs and publicly available rainbow tables. Time: minutes to hours.
Step 4: Defense comparison
| Defense | Crack Time (8-char password) | Cost to Attack |
|---|---|---|
| Unsalted SHA-256 | 2.8 hours | $0.14 |
| Salted SHA-256 | 2.8 hours per password (x50,000) | $7,000 |
| bcrypt (cost=12) | ~72 years per password | $millions |
| Argon2id (64 MB, 3 iterations) | 500+ years per password | Impractical |
Key insight: Salting defeats rainbow tables (each password must be cracked individually), but SHA-256 is still too fast. Password hashing functions like bcrypt and Argon2 add deliberate slowness that makes GPU attacks impractical.
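The effect of salting is easy to see on a toy user table. The sketch below uses made-up usernames and passwords, and deliberately keeps fast SHA-256 to isolate what the salt contributes; it is an illustration of the salting step only, not a recommended storage scheme.

```python
import hashlib
import os


def unsalted(password):
    return hashlib.sha256(password.encode()).hexdigest()


def salted(password, salt):
    return hashlib.sha256(salt + password.encode()).hexdigest()


users = {"alice": "password123", "bob": "password123", "carol": "hunter2"}

# Unsalted: equal passwords collapse to one hash, so one lookup cracks both.
unsalted_db = {u: unsalted(p) for u, p in users.items()}

# Salted: a fresh 16-byte random salt per user makes every stored hash distinct.
salted_db = {}
for user, pw in users.items():
    salt = os.urandom(16)
    salted_db[user] = (salt, salted(pw, salt))

print("unsalted hashes equal:", unsalted_db["alice"] == unsalted_db["bob"])
print("salted hashes equal:  ", salted_db["alice"][1] == salted_db["bob"][1])
```

Because the salt is stored next to the hash, verification still works; a precomputed table would have to be rebuilt for every individual salt.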
Real-World Breach: LinkedIn (2012)
What happened: 6.5 million LinkedIn passwords leaked, stored as unsalted SHA-1 hashes. Within hours, security researchers cracked 60% of passwords using rainbow tables and dictionary attacks.
Why unsalted hashes failed: With no salt, identical passwords (“password123”) produced identical hashes. Attackers pre-computed hashes for the top 10 million common passwords and matched them instantly against the entire database.
The fix: LinkedIn migrated to bcrypt with per-user salts. In 2016 it emerged that the original 2012 leak had actually exposed about 117 million accounts, but the bcrypt-protected passwords from the updated system remained uncracked.
IoT relevance: Many IoT platforms store device credentials the same way. A smart home cloud service storing 100,000 device passwords as unsalted hashes is one database leak away from total fleet compromise.
12.9 Password Hashing: Why Plain SHA-256 Is Not Enough
For storing user or device passwords, use password hashing functions (not plain SHA-256):
| Function | Status | Use Case |
|---|---|---|
| Argon2id | Recommended | Modern systems, adjustable memory-hardness |
| bcrypt | Good | Legacy systems, widely supported |
| scrypt | Good | Memory-hard, prevents GPU attacks |
| PBKDF2 | Acceptable | FIPS compliance required |
Why not plain SHA-256? Plain hashes are too fast – attackers can try billions of passwords per second. Password hashing functions are intentionally slow (100ms+) to make brute-force attacks impractical.
import hashlib
import os
from hashlib import pbkdf2_hmac

password = "MySecretP@ss123"  # example value

# WRONG: a single fast hash is cheap for an attacker to brute-force
password_hash = hashlib.sha256(password.encode()).hexdigest()

# RIGHT: intentionally slow, with a unique random salt per password
salt = os.urandom(16)
password_hash = pbkdf2_hmac('sha256', password.encode(), salt, 100_000)
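Storing a password is only half of the job; the login path has to re-derive the key with the stored salt and compare it safely. The following is a minimal sketch using PBKDF2 from the standard library. The iteration count is an assumed work factor to be tuned for the target hardware, and `hash_password` and `verify_password` are hypothetical helper names.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune to the target hardware


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, iterations, derived_key) for storage alongside the account."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, key


def verify_password(password, salt, iterations, stored_key):
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, stored_key)


record = hash_password("MySecretP@ss123")
print("correct password:", verify_password("MySecretP@ss123", *record))
print("wrong password:  ", verify_password("mysecretp@ss123", *record))
```

Keeping the iteration count in the stored record lets the work factor be raised later without invalidating existing accounts.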
12.10 Implementation Example
12.10.1 Python: HMAC-SHA256 for Sensor Data
import hmac
import hashlib

# Shared secret key (pre-provisioned on sensor and gateway)
secret_key = b"sensor_shared_secret_key"

# Sensor reading
sensor_data = b"temp=25.5,humidity=60%,time=1642345678"

# Compute HMAC
mac_tag = hmac.new(secret_key, sensor_data, hashlib.sha256).hexdigest()

# Sensor sends: {data: sensor_data, mac: mac_tag}

# Gateway verifies:
received_data = sensor_data  # from network
received_mac = mac_tag       # from network
computed_mac = hmac.new(secret_key, received_data, hashlib.sha256).hexdigest()

if hmac.compare_digest(computed_mac, received_mac):
    print("VALID: Data is authentic and unmodified")
else:
    print("INVALID: Data tampered or wrong key")
12.10.2 Key Points
Use hmac.compare_digest() – Prevents timing attacks
Use unique keys per device – Limits blast radius if compromised
Rotate keys periodically – Limits exposure from key leakage
Include timestamp – Prevents replay attacks
Try It: SHA-256 Hashing and the Avalanche Effect
Objective: See how even a tiny change in input produces a completely different hash output.
import hashlib

# Hash a simple IoT sensor message
message1 = "temperature:22.5"
message2 = "temperature:22.6"  # Just 0.1 degree difference

hash1 = hashlib.sha256(message1.encode()).hexdigest()
hash2 = hashlib.sha256(message2.encode()).hexdigest()

print(f"Message 1: {message1}")
print(f"Hash 1: {hash1}")
print(f"\nMessage 2: {message2}")
print(f"Hash 2: {hash2}")

# Count how many characters differ
differences = sum(1 for a, b in zip(hash1, hash2) if a != b)
print(f"\nCharacters that differ: {differences} out of {len(hash1)}")
print(f"Percentage changed: {100 * differences / len(hash1):.1f}%")
What to Observe:
The two hashes look completely unrelated despite inputs differing by one digit
Most hex characters differ (about 15 in 16, or roughly 94%), which corresponds to roughly 50% of the underlying bits flipping (the avalanche effect)
Both hashes are exactly 64 hex characters (256 bits) regardless of input length
Try It: HMAC for IoT Message Authentication
Objective: Build a simple sensor-to-gateway authentication system using HMAC-SHA256.
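A minimal sketch of such a system, under stated assumptions: the device IDs, keys, JSON message layout, and the 30-second freshness window are all hypothetical choices made for illustration. The MAC covers the device ID, the timestamp, and the reading, so none of them can be changed without the key.

```python
import hashlib
import hmac
import json
import time

# Hypothetical per-device keys; in practice these are provisioned securely.
DEVICE_KEYS = {"sensor-01": b"key-for-sensor-01", "sensor-02": b"key-for-sensor-02"}
MAX_AGE_SECONDS = 30  # assumed freshness window


def sensor_send(device_id, reading, now=None):
    """Build a message whose MAC covers the device id, timestamp, and reading."""
    ts = int(time.time() if now is None else now)
    body = json.dumps({"id": device_id, "ts": ts, "reading": reading}, sort_keys=True)
    mac = hmac.new(DEVICE_KEYS[device_id], body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "mac": mac}


def gateway_verify(message, now=None):
    """Return (accepted, reason)."""
    now = time.time() if now is None else now
    try:
        fields = json.loads(message["body"])
        key = DEVICE_KEYS[fields["id"]]
    except (ValueError, KeyError, TypeError):
        return False, "malformed or unknown device"
    expected = hmac.new(key, message["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["mac"]):
        return False, "bad MAC"
    if abs(now - fields["ts"]) > MAX_AGE_SECONDS:
        return False, "stale timestamp (possible replay)"
    return True, "ok"


msg = sensor_send("sensor-01", "temp=25.5", now=1_700_000_000)
print(gateway_verify(msg, now=1_700_000_005))        # fresh message
print(gateway_verify(msg, now=1_700_000_900))        # same message replayed later
tampered = dict(msg, body=msg["body"].replace("25.5", "99.9"))
print(gateway_verify(tampered, now=1_700_000_005))   # modified reading
```

A replay inside the freshness window would still be accepted; a per-device message counter that the gateway tracks is one way to close that gap.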
Objective: Compute SHA-256 hashes on an ESP32 using the built-in mbedTLS library.
#include <mbedtls/sha256.h>
#include <Arduino.h>

void printHash(unsigned char* hash) {
  for (int i = 0; i < 32; i++) {
    Serial.printf("%02x", hash[i]);
  }
  Serial.println();
}

void setup() {
  Serial.begin(115200);
  delay(1000);
  Serial.println("=== ESP32 SHA-256 Demo ===\n");

  // Hash a sensor reading
  const char* msg1 = "temperature:22.5";
  const char* msg2 = "temperature:22.6";
  unsigned char hash1[32], hash2[32];

  mbedtls_sha256((const unsigned char*)msg1, strlen(msg1), hash1, 0);
  mbedtls_sha256((const unsigned char*)msg2, strlen(msg2), hash2, 0);

  Serial.print("Message: "); Serial.println(msg1);
  Serial.print("SHA-256: "); printHash(hash1);
  Serial.print("\nMessage: "); Serial.println(msg2);
  Serial.print("SHA-256: "); printHash(hash2);

  // Count differing bytes
  int diff = 0;
  for (int i = 0; i < 32; i++) {
    if (hash1[i] != hash2[i]) diff++;
  }
  Serial.printf("\nBytes that differ: %d/32 (%.0f%%)\n", diff, 100.0 * diff / 32);

  // Timing test
  unsigned long start = micros();
  for (int i = 0; i < 1000; i++) {
    mbedtls_sha256((const unsigned char*)msg1, strlen(msg1), hash1, 0);
  }
  unsigned long elapsed = micros() - start;
  Serial.printf("\n1000 hashes in %lu us (%.1f us each)\n", elapsed, elapsed / 1000.0);
}

void loop() {}
What to Observe: The ESP32 computes SHA-256 in microseconds thanks to hardware acceleration. The hash output is always 32 bytes regardless of input size.
12.11 Knowledge Check
Question: Hash Properties
A smart home device uses SHA-256 to verify firmware updates before installation. Which property of hash functions makes this security measure effective?
Options:
A. Hash functions can be reversed to recover the original firmware if needed
B. Hash functions are extremely slow, preventing attackers from generating fake firmware quickly
C. Hash functions produce a unique fixed-size fingerprint that changes completely if even one bit of the input changes
D. Hash functions encrypt the firmware so attackers cannot read it
Answer
Correct: C
Cryptographic hash functions produce a unique, fixed-size output (256 bits for SHA-256) that serves as a “fingerprint” of the input data. The avalanche effect means changing even a single bit produces a completely different hash. When the manufacturer publishes the official hash, the device computes its own hash of downloaded firmware – if they match, the firmware is authentic.
Hash functions are NOT encryption (they do not hide data), are NOT reversible (one-way only), and are actually very fast (not slow).
Question: HMAC vs Hash
Your IoT gateway receives sensor readings from remote devices. You want to verify that readings come from authorized sensors AND have not been tampered with. Which approach should you use?
Options:
A. SHA-256 hash of the sensor data
B. HMAC-SHA256 with a shared secret key
C. RSA encryption of the sensor data
D. Base64 encoding of the sensor data
Answer
Correct: B
HMAC-SHA256 combines a hash with a secret key to provide both integrity (data was not modified) AND authentication (sender knows the secret). Plain SHA-256 only provides integrity – anyone can compute a hash, so it does not prove who sent the data. RSA is for encryption/signatures (overkill here). Base64 is just encoding, not security.
Concept Relationships
| Concept | Depends On | Enables | Common Mistake |
|---|---|---|---|
| Hash Functions | One-way mathematical functions | Data integrity verification | “Hashes provide confidentiality” – NO, only integrity |
| Collision Resistance | Large output space (256+ bits) | Secure firmware verification | Using MD5/SHA-1 (both broken) |
| Preimage Resistance | Computational hardness | Password storage | Storing plaintext passwords |
| Avalanche Effect | Good hash design | Tamper detection | Thinking minor changes will not affect hash |
| HMAC | Hash function + secret key | Message authentication | Using plain hash for authentication |
| Password Hashing | Intentional slowness (KDF) | Brute-force resistance | Using fast hashes like SHA-256 |
| Birthday Attack | Probability theory | Determines required hash size | Ignoring collision probability |
Key Distinction: Hashing is not Encryption. Hashes are one-way (cannot decrypt), encryption is two-way. Use hashes for integrity/authentication, encryption for confidentiality.
Putting Numbers to It: Hash Collision Probability (Birthday Paradox)
The probability of finding at least one hash collision among \(n\) hashed items with \(H\)-bit output follows the birthday paradox.
Step 4: Items needed for 50% collision probability \[n_{50\%} = \sqrt{2 \times 2^{256} \times \ln(2)} \approx 4.0 \times 10^{38}\]
This is approximately \(2^{128}\), which is why SHA-256 has 128-bit collision resistance.
Result: With 1 billion firmware images, SHA-256 collision probability is \(4.3 \times 10^{-60}\) (effectively zero). You need approximately \(2^{128}\) hashes for a 50% collision chance.
In practice: SHA-256 provides 128-bit collision resistance (vs 256-bit preimage resistance). MD5 (128-bit output) has a generic birthday bound of only \(2^{64}\), and known cryptanalytic attacks find MD5 collisions far faster than that. Always use SHA-256 minimum for security-critical hashing.
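The figures above can be checked numerically with the standard birthday approximation \(P \approx 1 - e^{-n^2 / (2 \cdot 2^H)}\). The sketch below uses `math.expm1` so that very small probabilities are not rounded to zero; the function names are hypothetical.

```python
import math


def collision_probability(n_items, hash_bits):
    """Birthday approximation: P ~= 1 - exp(-n^2 / (2 * 2^H))."""
    exponent = -(n_items ** 2) / (2 * 2 ** hash_bits)
    return -math.expm1(exponent)  # accurate even when the result is tiny


def items_for_half(hash_bits):
    """Number of items at which the approximation reaches 50%."""
    return math.sqrt(2 * 2 ** hash_bits * math.log(2))


p = collision_probability(10 ** 9, 256)
print(f"1e9 items, SHA-256: P = {p:.2e}")
print(f"50% point, SHA-256: n = {items_for_half(256):.2e}")
print(f"50% point, MD5:     n = {items_for_half(128):.2e}")
```

For MD5 the 50% point is on the order of \(2^{64}\) items, which is the generic bound only; it does not reflect the much cheaper structural attacks on MD5.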
Hash Function: A deterministic function mapping arbitrary-length input to a fixed-length output (digest); the same input always produces the same output.
SHA-256: A 256-bit cryptographic hash function from the SHA-2 family; widely used for integrity verification and digital signatures.
BLAKE2: A fast cryptographic hash function optimized for embedded systems; often faster than MD5 with stronger security guarantees.
Collision Resistance: A property ensuring it is computationally infeasible to find two different inputs that produce the same hash output.
Pre-image Resistance: A property ensuring it is computationally infeasible to reverse a hash — given a hash output, you cannot recover the original input.
HMAC: Hash-based Message Authentication Code — a construction combining a hash function with a secret key to authenticate message origin and integrity.
Salt: A random value added to a password before hashing, ensuring that identical passwords produce different hashes and preventing rainbow table attacks.
Common Pitfalls
1. Using MD5 or SHA-1 for Security Purposes
MD5 and SHA-1 have known collision vulnerabilities — attackers can craft different inputs producing the same hash. Use SHA-256 or BLAKE2 for any security-sensitive application.
2. Comparing Hashes with Non-Constant-Time Equality
Standard string comparison (==) short-circuits on the first differing byte, leaking timing information that attackers use to forge valid hashes. Always use constant-time comparison functions like hmac.compare_digest().
3. Hashing Passwords Without a Salt
Unsalted password hashes can be attacked with precomputed rainbow tables. Add a cryptographically random salt unique to each user/device before hashing, and use a password-specific hash function (bcrypt, PBKDF2, Argon2).
4. Using a Hash Alone for Message Authentication
A plain hash without a secret key can be recomputed by an attacker after modifying the message. Use HMAC (which incorporates a secret key) whenever message authentication is required.
12.13 Summary
Hash functions create fixed-size fingerprints of any data
SHA-256 is the recommended hash for IoT – 256-bit output, no known attacks
Collision resistance prevents attackers from creating fake data with matching hashes
Avalanche effect means tiny changes produce completely different hashes
HMAC combines hashing with a secret key for authentication + integrity
Never use MD5 or SHA-1 – both are cryptographically broken
Password hashing requires slow functions (Argon2id, bcrypt) not plain SHA-256
12.14 What’s Next
If you want to…
Read this
Understand how HMAC uses hash functions for authentication
Continue to TLS/DTLS Transport Security to learn how transport layer security protocols protect IoT communications by combining encryption, hashing, and authentication into a complete security solution.