5 Hash Functions and Data Integrity

Digests, Collision Resistance, HMAC, KDFs, and Password Hashing


Keywords

IoT hash functions, data integrity, HMAC, collision resistance, KDF, password hashing, SHA-256, SHA-3, salting

5.1 Start With the Fingerprint Question

Imagine a device downloads a firmware image and a short digest that is supposed to prove the image is safe. The digest can tell whether the bytes match an expected value, but it cannot say who chose that value unless the expected value is protected. That is the whole hash lesson in one question: what exactly is being compared, and who made the comparison trustworthy?

This chapter follows the hash family from that question. A plain digest detects change, HMAC adds shared-key proof, a signature protects artifacts many devices must trust, a KDF derives scoped keys, and a password hash slows offline guessing.

A Fingerprint for Data, Not a Lock and Not a Signature

A cryptographic hash function takes any input, of any size, and produces a fixed-size value called a digest. The same input always yields the same digest, and even a one-bit change to the input produces a completely different digest. That makes a digest a compact fingerprint: a way to tell whether two pieces of data are identical, or whether something changed in transit or in storage. In an IoT system, a digest can catch a corrupted firmware download or prove that a retained telemetry record still matches the bytes the gateway wrote.
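The fingerprint behaviour can be observed directly. The sketch below uses Python's standard `hashlib` with SHA-256; the input string and the helper names are made up for illustration and are not part of any real firmware format.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical input; any bytes would do.
original = b"firmware-image-v1.4.2"
# Flip the lowest bit of the first byte: a one-bit change to the input.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(flipped).digest()

print(len(d1) * 8)                                   # digest size in bits: 256
print(sha256_hex(original) == sha256_hex(original))  # same input, same digest: True
print(d1 == d2)                                      # one flipped bit: False
print(differing_bits(d1, d2), "of 256 digest bits differ")
```

The digest length stays at 256 bits however long the input is, and the count printed on the last line should land near half of the digest bits.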

The most common confusion is to treat a hash as if it were encryption or a signature. It is neither. A hash has no key and runs in one direction only: you cannot turn a digest back into the original data, so it is not a way to store data for later recovery. It is not a confidentiality mechanism either, because anyone who can guess the input can hash the guess and compare. And because anyone can compute a hash over data they choose, a bare digest does not prove who produced the data. A hash detects change; it does not, on its own, keep a secret or prove an identity.

If you only need the intuition, this layer is enough: a hash is a one-way fingerprint that detects change but is not encryption and not a signature. To prove a message came from a specific party, add a key with HMAC or a digital signature. To store passwords, use a slow, salted password-hashing function, not a plain digest.

Think of a tamper-evident sticker that shows a unique pattern. If the pattern matches the one you expect, the package was not disturbed; if it differs, something changed. But the sticker alone does not say who applied it, and an attacker who can print their own stickers can reseal a package they opened. That is the gap between detecting change, which a hash does, and proving authenticity, which needs a key or a signature.

Choose a hash-based construction by its security role, not by digest length alone: a plain digest, an HMAC tag, a KDF, and a password verifier solve different jobs.

The One-Minute View

One-way fingerprint

A digest detects change, whether accidental or deliberate, when it is compared against a trusted expected value. It cannot be reversed and carries no key, so it is not encryption.

Detecting change is not proving who

Anyone can recompute a plain hash. Proving a message came from a specific party needs a key (HMAC) or a private-key signature.

Pick the construction for the job

Plain digest, HMAC, KDF, and password hashing are different tools. A fast general-purpose hash is the wrong choice for storing passwords.

Beginner Examples

Comparing a downloaded file against a digest published in a signed release note catches corruption and tampering, because the expected digest itself is protected.
Appending a plain SHA-256 digest to a message proves nothing about the sender: an attacker who changes the message simply recomputes the digest. A keyed HMAC closes that gap.
Storing user passwords as plain fast hashes is unsafe; an attacker who steals the database can guess billions of candidates per second. A salted, deliberately slow password hash is the right tool.
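The second example can be made concrete with a short sketch. It assumes a hypothetical shared key and two made-up messages, and compares a plain SHA-256 "tag" with an HMAC-SHA-256 tag from Python's standard `hmac` module.

```python
import hashlib
import hmac


def plain_tag(message: bytes) -> bytes:
    """Unkeyed digest: anyone can compute this."""
    return hashlib.sha256(message).digest()


def keyed_tag(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256: computing a valid tag requires the key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_plain(message: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(plain_tag(message), tag)


def verify_keyed(key: bytes, message: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(keyed_tag(key, message), tag)


shared_key = b"demo-key-not-for-production-use!"  # hypothetical key
genuine = b"valve=closed"
forged = b"valve=open"

# An attacker without the key changes the message and recomputes the digest.
plain_accepts_forgery = verify_plain(forged, plain_tag(forged))

# Without the key, the best the attacker can do here is reuse the old tag.
observed_tag = keyed_tag(shared_key, genuine)
keyed_accepts_forgery = verify_keyed(shared_key, forged, observed_tag)

print(plain_accepts_forgery)  # True: the plain digest "verifies" the forgery
print(keyed_accepts_forgery)  # False: the HMAC check rejects it
```

The keyed check still says nothing about replay of the genuine message; that needs a sequence number or nonce inside the covered bytes.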

Overview Knowledge Check

If you can explain why a hash detects change but is not encryption or a signature, you can stop here. Continue to Practitioner to choose the right hash-based construction for a real integrity decision.

Choose the Construction and Protect the Expected Value

The practical job is matching each integrity decision to the right construction and proving the expected value is trustworthy. A plain hash only helps when the digest you compare against already arrives through a protected path; the moment authenticity matters, the design needs a key or a signature. The common review failure is accepting a bare digest as if it authenticated its source. For example, a gateway that accepts an MQTT command from a device should verify a tag over the topic, device identity, sequence or nonce, and payload, not only a SHA-256 value over the payload text.
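One way to bind those fields is sketched below. The field layout, the length-prefix encoding, the topic names, and the key are all assumptions made for illustration; a real deployment would follow whatever canonical encoding its protocol defines.

```python
import hashlib
import hmac
import struct


def encode_fields(*fields: bytes) -> bytes:
    """Prefix each field with its 4-byte big-endian length so that field
    boundaries cannot shift without changing the encoded bytes."""
    return b"".join(struct.pack(">I", len(f)) + f for f in fields)


def command_tag(key: bytes, topic: str, device_id: str,
                sequence: int, payload: bytes) -> bytes:
    """HMAC-SHA-256 over topic, device identity, sequence number, and payload."""
    covered = encode_fields(
        topic.encode("utf-8"),
        device_id.encode("utf-8"),
        struct.pack(">Q", sequence),  # 64-bit sequence number
        payload,
    )
    return hmac.new(key, covered, hashlib.sha256).digest()


key = b"hypothetical-per-device-key-0001"  # illustration only
tag = command_tag(key, "plant/valve/7/set", "dev-0042", 17, b"open")

# The same payload on another topic, or with another sequence number,
# produces a different tag, so it cannot be replayed elsewhere unchanged.
other_topic = command_tag(key, "plant/valve/8/set", "dev-0042", 17, b"open")
other_seq = command_tag(key, "plant/valve/7/set", "dev-0042", 16, b"open")
print(tag != other_topic, tag != other_seq)  # True True

# Naive concatenation is ambiguous; the length-prefixed encoding is not.
print(b"ab" + b"c" == b"a" + b"bc")                              # True
print(encode_fields(b"ab", b"c") == encode_fields(b"a", b"bc"))  # False
```

A tag computed over the payload alone would be identical in all three cases, which is exactly what lets a captured command be redirected or replayed.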

Walkthrough: From Input to a Trustworthy Check

Classify the input and the threat. Decide whether you are checking public data for corruption, authenticating a message between parties that share a secret, deriving keys, storing a password, or verifying a signed artifact.
Pick the construction. Plain digest for integrity against a trusted expected value, HMAC for keyed message authentication, a KDF for key derivation, a password-hashing function for verifier storage, and a digital signature for artifacts many devices must trust.
Protect the expected value. A plain digest is only as trustworthy as the channel that delivered the digest you compare against, so put it inside a signed manifest or an authenticated protocol.
Bind the exact bytes and context. Name precisely which bytes are hashed or covered by the tag, including metadata such as version, identity, and command fields, so nothing security-relevant is left outside.
Test failure closed. Prove that a modified input, a wrong key, a wrong context, a replay, and an unsupported algorithm are all rejected, and that logs record the failure category without leaking secrets.
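The last step can be sketched as a verifier that returns a failure category and a handful of negative tests. Everything here is illustrative: the `Outcome` names, the allowed-algorithm table, and the counter-based replay rule are assumptions, not a prescribed design.

```python
import hashlib
import hmac
import struct
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "accepted"
    UNSUPPORTED_ALGORITHM = "unsupported_algorithm"
    BAD_TAG = "bad_tag"  # modified input, wrong key, or wrong context
    REPLAY = "replay"


# Hypothetical policy: only these digests may back an HMAC tag.
ALLOWED = {"sha256": hashlib.sha256, "sha384": hashlib.sha384}


def make_tag(key, algorithm, context, counter, payload):
    """HMAC over length-prefixed context, counter, and payload."""
    message = b"".join(
        struct.pack(">I", len(part)) + part
        for part in (context, struct.pack(">Q", counter), payload)
    )
    return hmac.new(key, message, ALLOWED[algorithm]).digest()


def verify(key, algorithm, context, counter, payload, tag, last_counter):
    """Return ACCEPTED only if every check passes; otherwise a category
    that is safe to log because it contains no key or tag material."""
    if algorithm not in ALLOWED:
        return Outcome.UNSUPPORTED_ALGORITHM
    expected = make_tag(key, algorithm, context, counter, payload)
    if not hmac.compare_digest(expected, tag):
        return Outcome.BAD_TAG
    if counter <= last_counter:
        return Outcome.REPLAY
    return Outcome.ACCEPTED


key = b"hypothetical-shared-key-00000001"
ctx = b"cmd/v1/dev-0042"
good = make_tag(key, "sha256", ctx, 5, b"reboot")

print(verify(key, "sha256", ctx, 5, b"reboot", good, last_counter=4))   # ACCEPTED
print(verify(key, "sha256", ctx, 5, b"rebootX", good, last_counter=4))  # BAD_TAG
print(verify(key, "sha256", ctx, 5, b"reboot", good, last_counter=5))   # REPLAY
print(verify(key, "md5", ctx, 5, b"reboot", good, last_counter=4))      # UNSUPPORTED_ALGORITHM
```

A modified input, a wrong key, and a wrong context all surface as the same `BAD_TAG` category, because the verifier cannot tell them apart and should not try to.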

A plain hash flags a mismatch but anyone can recompute it; HMAC mixes in a secret key, so a valid tag also proves the sender knew the key.

Choosing the Construction

| Construction | Use For | Evidence to Show | Do Not Use For |
| --- | --- | --- | --- |
| Plain digest | Detecting corruption, content addressing, or comparing against a trusted expected digest. | Digest algorithm, the exact bytes hashed, and a trusted source for the expected digest. | Authenticating a sender or accepting an artifact from an untrusted channel. |
| HMAC | Integrity plus authenticity between two parties that share a secret key. | Key owner, algorithm, covered fields, tag length, and replay handling. | Public verification by many independent receivers with no shared secret. |
| Digital signature | Firmware, manifests, and certificates verified by many devices. | Signing authority, certificate chain or pinned key, exact signed bytes, rollback policy. | A constrained two-party channel where a shared key is the real trust model. |
| Password hashing | Storing password verifiers for users, devices, or service accounts. | Unique salt, deliberate work or memory cost, and an upgrade policy. | High-speed message integrity or deriving keys from strong shared secrets. |

Worked Review: A Firmware Manifest Digest

A firmware package ships an image, a manifest, and a digest, all downloaded from the same server. The team says the digest proves the firmware is genuine. The reviewer turns that into evidence questions.

What the claim covers

The digest reliably detects accidental corruption of the image, and it confirms the image matches the digest that was supplied.

What the claim misses

Because the digest came from the same unauthenticated channel as the image, an attacker can replace both, and the device would happily verify the attacker's image against the attacker's digest.

Conclusion

Put the digest inside a signed manifest or authenticated protocol, verify the signing authority, version policy, rollback counter, and image digest, and add negative tests for a modified image, modified manifest, wrong key, and unsupported algorithm.
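The gap the reviewer found can be reproduced in a few lines. In this sketch the image bytes are placeholders, and the "trusted" digest simply stands in for a value that would, in a real design, be read from a manifest whose signature has already been verified; signature checking itself is not shown.

```python
import hashlib
import hmac


def image_digest(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()


def digest_matches(image: bytes, expected_hex: str) -> bool:
    """Compare the image digest with an expected value. The result is only
    as trustworthy as the source of expected_hex."""
    return hmac.compare_digest(image_digest(image), expected_hex)


vendor_image = b"\x7fVENDOR-FIRMWARE-1.4.2"     # placeholder bytes
attacker_image = b"\x7fATTACKER-FIRMWARE-9.9.9"  # placeholder bytes

# Case 1: image and digest arrive over the same unauthenticated channel,
# so the attacker replaces both.
same_channel_digest = image_digest(attacker_image)
accepted_same_channel = digest_matches(attacker_image, same_channel_digest)

# Case 2: the expected digest comes from a protected source (assumed here
# to be a signed manifest that was verified before this point).
trusted_digest = image_digest(vendor_image)
accepted_protected = digest_matches(attacker_image, trusted_digest)

print(accepted_same_channel)  # True: the attacker's image "verifies"
print(accepted_protected)     # False: the substitution is caught
```

The comparison code is identical in both cases; only the provenance of the expected value changes the outcome.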

Practitioner Knowledge Check

If you can pick the construction and prove the expected value is protected, you can stop here. Continue to Under the Hood for the security properties, the HMAC and KDF mechanisms, and the failure modes.

Hash Properties, HMAC, KDFs, and Password Hashing

The deeper layer explains the properties a hash must have for security use, why collision resistance is the property that protects signatures, and why HMAC, KDFs, and password hashing are distinct constructions rather than the same hash applied differently.

The Properties That Make a Hash Cryptographic

A non-cryptographic hash, such as one used for a hash table, only needs to scatter inputs evenly. A cryptographic hash must also resist three attacks. Preimage resistance means that, given only a digest, it is infeasible to find any input that produces it. Second-preimage resistance means that, given one input, it is infeasible to find a different input with the same digest. Collision resistance means it is infeasible to find any two distinct inputs that share a digest. The avalanche effect supports all three: changing a single input bit flips about half the output bits, so similar inputs do not produce similar digests. Crucially, collision resistance is bounded by roughly half the digest length because of the birthday effect, so an n-bit digest offers only about n/2 bits of collision resistance. SHA-256, for example, offers about 128 bits against collisions, which is one reason very short or truncated digests are unsuitable for security.
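The birthday bound is easy to see on a deliberately weakened digest. The sketch below truncates SHA-256 to 24 bits (an artificial choice for the demonstration, not a real algorithm) and searches for two different messages with the same truncated value.

```python
import hashlib
from itertools import count


def truncated_digest(data: bytes, n_bits: int) -> int:
    """Top n_bits of the SHA-256 digest, as an integer."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - n_bits)


def find_collision(n_bits: int):
    """Hash distinct messages until two share a truncated digest.
    Returns (first_message, second_message, messages_tried)."""
    seen = {}
    for i in count():
        message = f"message-{i}".encode()
        digest = truncated_digest(message, n_bits)
        if digest in seen:
            return seen[digest], message, i + 1
        seen[digest] = message


N_BITS = 24
first, second, tried = find_collision(N_BITS)

print(first != second)                                                     # True
print(truncated_digest(first, N_BITS) == truncated_digest(second, N_BITS))  # True
print(f"collision after {tried} messages; 2**(n/2) = {2 ** (N_BITS // 2)}")
```

The number of messages tried should be within a small factor of 2**12, far below the 2**24 possible digest values. The same square-root effect is why a 256-bit digest is credited with about 128 bits of collision resistance.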

Why Collision Resistance Protects Signatures

Signatures and certificates do not sign a whole document; they sign its digest. If an attacker can find two different documents with the same digest, a signature created over the harmless one is automatically valid for the malicious one. This is exactly why hash algorithms with known practical collisions, such as MD5 and SHA-1, must not be used for new signatures, certificates, or firmware approval, regardless of digest length. Current designs use the SHA-2 family (for example SHA-256) or SHA-3, and a release should name the algorithm its update profile allows and reject anything weaker.
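A release policy of this kind can be expressed as an allow-list that is checked before any digest is computed for signing or verification. The list below and the function name are illustrative assumptions; the signature operation itself is outside the sketch.

```python
import hashlib

# Hypothetical update-profile policy, using hashlib's algorithm names.
ALLOWED_DIGESTS = frozenset({"sha256", "sha384", "sha512", "sha3_256", "sha3_512"})


def digest_for_signature(algorithm: str, document: bytes) -> bytes:
    """Return the digest a signature would cover, or refuse if the
    algorithm is not on the allow-list (for example md5 or sha1)."""
    if algorithm not in ALLOWED_DIGESTS:
        raise ValueError(f"digest algorithm not allowed by policy: {algorithm!r}")
    return hashlib.new(algorithm, document).digest()


manifest = b"image=fw-1.4.2.bin;version=1.4.2"  # placeholder manifest bytes
print(len(digest_for_signature("sha256", manifest)))    # 32
print(len(digest_for_signature("sha3_256", manifest)))  # 32

for weak in ("md5", "sha1"):
    try:
        digest_for_signature(weak, manifest)
    except ValueError as err:
        print("rejected:", err)
```

Checking the algorithm name before hashing means a manifest that names a weak digest fails closed instead of being verified under it.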

HMAC: Keys Done Right

It is tempting to build a keyed tag as hash(key || message), but with the Merkle-Damgård construction used by SHA-256 and SHA-512 that is vulnerable to a length-extension attack: an attacker who sees the message and its tag, and who knows or can guess the key length, can append data and produce a valid tag for the extended message without knowing the key. HMAC avoids this with a nested, two-pass construction that mixes the key into both an inner and an outer hash, so it stays secure even with such hash functions. HMAC gives integrity and authenticity to anyone holding the shared key, but because the key is shared it cannot prove which of the two holders produced the tag; that non-repudiation property needs an asymmetric signature.
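The nested construction is short enough to write out. The sketch below follows the HMAC definition in RFC 2104 for SHA-256 and checks the result against Python's standard `hmac` module; the key and message are placeholders, and production code should call the library rather than a hand-written version.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks


def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC(K, m) = H((K' xor opad) || H((K' xor ipad) || m))."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")    # then zero-padded to one block
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_pad + message).digest()  # first pass
    return hashlib.sha256(outer_pad + inner).digest()     # second pass


key = b"placeholder-shared-key"
message = b"seq=17;cmd=open"

manual = hmac_sha256_manual(key, message)
library = hmac.new(key, message, hashlib.sha256).digest()
naive = hashlib.sha256(key + message).digest()  # the construction to avoid

print(hmac.compare_digest(manual, library))  # True
print(manual == naive)                       # False
```

Because the outer pass hashes a fixed-length inner digest under the key, extending the message no longer lets an attacker continue the hash state the way the naive form does.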

KDFs Versus Password Hashing

Both transform a secret input into output, but their threat models differ. A key derivation function such as HKDF assumes a high-entropy input, like a shared secret from key agreement, and its job is to extract uniform key material and expand it into separate keys bound to context: protocol, role, identities, and algorithm. A password-hashing function assumes a low-entropy input that a human or factory chose, so its job is to make offline guessing expensive. It adds a unique salt to every entry, which defeats precomputed tables and ensures identical passwords store differently, and it applies a deliberate work factor or memory cost so each guess is slow. Functions such as Argon2, scrypt, bcrypt, and PBKDF2 are designed for this; Argon2 and scrypt are also memory-hard, which blunts GPU and custom-hardware acceleration, while PBKDF2 relies on iteration count alone. A fast general-purpose hash does none of this. Using a fast KDF such as HKDF on a weak password does not add this guessing cost, and using a password hash for high-volume message integrity is needlessly slow.
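The extract-then-expand shape of HKDF can be sketched with the standard library, following RFC 5869 with SHA-256. The shared secret, salt, and context strings below are placeholders invented for the example; a maintained library implementation is preferable in real code.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_extract(salt: bytes, input_key_material: bytes) -> bytes:
    """Concentrate the input's entropy into one pseudorandom key."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869 default: HashLen zero bytes
    return hmac.new(salt, input_key_material, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand the pseudorandom key into `length` bytes bound to `info`."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF")
    output, block, counter = b"", b"", 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]


# Placeholder for a high-entropy secret, e.g. the output of key agreement.
shared_secret = bytes(range(32))
prk = hkdf_extract(b"hypothetical-protocol-salt", shared_secret)

# Distinct context strings yield independent keys from the same secret.
enc_key = hkdf_expand(prk, b"iot-demo/v1 dev-0042 telemetry-encryption", 32)
mac_key = hkdf_expand(prk, b"iot-demo/v1 dev-0042 command-authentication", 32)

print(len(enc_key), len(mac_key))  # 32 32
print(enc_key != mac_key)          # True
```

Nothing in this function is slow on purpose, which is why it suits strong secrets and does nothing to protect a guessable password.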

A KDF expands a strong shared secret into context-bound keys; password hashing defends a low-entropy secret with salt and deliberate cost. The inputs differ, so the evidence differs.
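A salted, deliberately slow verifier can be sketched with PBKDF2 from the standard library. PBKDF2 is used here only because it needs no extra package; the iteration count and record layout are illustrative assumptions to be tuned per deployment, and a memory-hard function such as Argon2 or scrypt is often preferred where available.

```python
import hashlib
import hmac
import os


def hash_password(password: str, iterations: int = 600_000) -> dict:
    """Create a verifier record with a fresh random salt."""
    salt = os.urandom(16)  # unique per entry
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Storing the parameters next to the hash allows the cost to be raised later.
    return {"algorithm": "pbkdf2_sha256", "iterations": iterations,
            "salt": salt, "hash": derived}


def verify_password(password: str, record: dict) -> bool:
    """Recompute with the stored salt and cost, then compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), record["salt"], record["iterations"]
    )
    return hmac.compare_digest(candidate, record["hash"])


record_a = hash_password("correct horse battery staple")
record_b = hash_password("correct horse battery staple")

print(record_a["hash"] != record_b["hash"])                        # True: salts differ
print(verify_password("correct horse battery staple", record_a))   # True
print(verify_password("wrong guess", record_a))                    # False
```

Keeping the algorithm name and iteration count in each record is one way to support the upgrade policy: a verifier can be rehashed at a higher cost the next time the correct password is presented.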

Mechanisms and Failure Modes

| Mechanism | What It Guarantees | Evidence to Request | Failure Mode If Weak |
| --- | --- | --- | --- |
| Collision-resistant algorithm | It is infeasible to find two different inputs that share a digest, so a signature binds one document. | A current algorithm (SHA-2 or SHA-3) and a policy that rejects MD5 and SHA-1. | A collision lets a signature over a benign file validate a malicious one. |
| Protected expected value | A digest comparison actually reflects the trusted source's intent. | Expected digest delivered via signed manifest, certificate chain, or authenticated channel. | A digest from the same untrusted channel as the data authenticates nothing. |
| HMAC keyed tag | Integrity and authenticity between holders of a shared key. | Key owner, covered fields, tag length, and a length-extension-safe construction. | A keyless or naive hash(key \|\| message) tag can be forged. |
| Context-bound KDF | Each derived key is distinct and tied to its purpose. | HKDF (or equivalent) with explicit role, identity, and protocol context. | Reused or context-free derivation collapses key separation. |
| Salted, costly password hash | Stolen verifiers resist offline guessing and precomputation. | Unique per-entry salt, a tuned work or memory cost, and an upgrade path. | A fast unsalted hash falls to rainbow tables and high-speed guessing. |

Common Pitfalls

Treating a digest as a signature. A plain hash is not authenticity if an attacker can replace both the data and the expected digest.
Fast hashes for passwords. Password verifiers need unique salts and deliberate cost; a fast general-purpose hash makes offline guessing cheap.
Confusing KDF output with entropy. A KDF derives keys from input material and context; it does not manufacture entropy a weak input lacks.
Legacy collision-prone hashes. MD5 and SHA-1 must not back new signatures, certificates, or firmware approval, no matter the digest length.
Leaving fields outside the tag. An HMAC or signature must cover every security-relevant field, including routing, identity, and command metadata.
No migration path. Devices need a way to move to stronger algorithms, costs, and salts without disabling validation entirely.

Under-the-Hood Knowledge Check

At this depth, hashing is a family of distinct guarantees built on a single one-way primitive: collision resistance that protects signatures, a protected expected value that makes a comparison meaningful, HMAC that adds a key safely, KDFs that derive scoped keys from strong secrets, and password hashing that makes weak secrets expensive to guess. Name the role, use a current algorithm, and test that bad inputs fail closed, rather than trusting any digest because it looks long enough.

5.2 Summary

A cryptographic hash maps any input to a fixed-size digest; it is one-way and keyless, so it detects change but is not encryption and not a signature.
Security needs preimage, second-preimage, and collision resistance plus the avalanche effect; collision resistance is bounded by roughly half the digest length.
A plain digest only authenticates when the expected value is protected by a signed manifest, certificate chain, or authenticated channel.
HMAC mixes in a shared secret to give integrity and authenticity, and its nested construction avoids the length-extension weakness of a naive keyed hash.
Signatures sign a digest, so a hash with practical collisions (MD5, SHA-1) breaks them; use SHA-2 or SHA-3 for new security decisions.
A KDF such as HKDF derives context-bound keys from a strong shared secret; it does not create entropy that a weak input lacks.
Password hashing uses a unique salt and deliberate work or memory cost (Argon2, scrypt, bcrypt, PBKDF2) to resist offline guessing; a fast hash is the wrong tool.

Key Takeaway

A hash is not encryption. Use a plain digest to detect change against a protected expected value, HMAC to authenticate messages with a shared secret, a digital signature for artifacts many devices trust, a KDF to derive keys from strong secrets, and a salted, costly password hash for stored passwords. Match the construction to the job, and use a current collision-resistant algorithm.

5.3 See Also

Symmetric Encryption

See how AEAD pairs confidentiality with a built-in authentication tag, giving in one construction the keyed integrity that HMAC provides separately.

Public Key Cryptography

Follow how signatures sign a digest, and why collision resistance is what keeps a signature bound to one document.

Encryption Key Management

Connect KDF outputs, HMAC keys, and password verifiers to ownership, derivation context, and rotation.