10  Monitoring & Intrusion Detection

In 60 Seconds

Security monitoring is the detection layer of IoT defense-in-depth, continuously observing network traffic and device behavior to catch attacks that bypass preventive controls like firewalls and encryption. Two complementary approaches work together: signature-based detection matches known attack patterns (low false positives, but blind to novel attacks) and anomaly-based detection flags statistical outliers from established behavioral baselines (catches zero-day attacks, but requires tuning). With the average breach taking 194 days to detect, deploying both approaches with SIEM integration for cross-device event correlation can reduce detection time from months to minutes.

10.1 Learning Objectives

By the end of this chapter, you should be able to:

  • Differentiate signature-based from anomaly-based intrusion detection and select the appropriate approach for given IoT scenarios
  • Design anomaly detection systems that establish behavioral baselines for IoT device monitoring
  • Implement comprehensive security logging and audit trail architectures for forensic analysis
  • Integrate security events with SIEM systems for cross-device correlation
  • Apply security monitoring concepts through hands-on detection exercises

What is Security Monitoring? Security monitoring continuously observes systems for signs of attack or compromise. It’s like having security cameras and guards watching for suspicious behavior, but for digital systems.

Why does it matter for IoT? Even with strong preventive controls (firewalls, encryption), attacks can still succeed. Detection systems catch attacks in progress, enabling rapid response before significant damage occurs. The average time to detect a breach is 194 days (IBM, 2024) – monitoring aims to reduce this dramatically.

Key terms:

| Term | Definition |
|------|------------|
| IDS | Intrusion Detection System - monitors for attacks |
| IPS | Intrusion Prevention System - blocks detected attacks |
| SIEM | Security Information and Event Management - correlates events |
| Anomaly Detection | Detecting unusual behavior vs. an established baseline |
| Signature-based | Detecting known attack patterns |

“Shhh, everyone quiet!” Sammy the Sensor whispered, his detection lights dimmed to stealth mode. “I am on night watch duty, monitoring everything that moves on our network. Every single data packet that comes through – I check it!”

Max the Microcontroller nodded. “Sammy does two kinds of watching. First, he has a book of known troublemakers – like a wanted poster list. If data matches a known attack pattern, that is called signature-based detection. But the really clever part is his second trick: he learns what NORMAL looks like, so when something unusual happens – like a sensor suddenly sending ten times more data than usual at 3 AM – he spots it even if it is a brand new type of attack!”

“When Sammy catches something suspicious, that is where I come in!” Lila the LED exclaimed. “I flash bright red alerts to the security team. It is like a burglar alarm for the digital world. The faster we alert, the faster humans can respond and stop the bad guys.”

Bella the Battery added, “And I make sure the monitoring system NEVER runs out of power. Because what good is a security camera that goes dark? The average time to catch a breach without monitoring is almost 200 days. With our Sensor Squad on watch, we aim to catch problems in minutes, not months!”

10.2 Prerequisites

Before diving into this chapter, you should be familiar with:

10.3 Introduction

Security monitoring is the detection layer of a defense-in-depth strategy. While preventive controls such as firewalls, encryption, and access control reduce attack surface, no prevention is perfect. Monitoring systems continuously observe network traffic, device behavior, and system events to detect attacks in progress, enabling rapid response before significant damage occurs. This chapter covers two core detection approaches – signature-based and anomaly-based intrusion detection – along with the logging infrastructure and SIEM integration needed to operationalize them in IoT environments.

10.4 Intrusion Detection Systems

Intrusion detection systems monitor network traffic and system activity for malicious activity or policy violations.

How It Works: Signature-Based Detection

Step 1: Build Signature Database

  • Security researchers analyze known malware and attacks
  • Extract patterns: specific byte sequences, packet structures, behavioral indicators
  • Example: Mirai botnet signature = telnet login with username “root” + password from list of 62 defaults

Step 2: Capture Network Traffic

  • IDS monitors network packets (NIDS) or host activity (HIDS)
  • Every packet analyzed against signature database in real-time
  • Low overhead: simple pattern matching (regex or hash comparison)

Step 3: Match and Alert

  • If packet matches signature → immediate alert generated
  • Alert includes: signature name, source IP, destination, severity, timestamp
  • Example: “SQL_INJECTION detected from 192.168.1.100 targeting device 192.168.10.50”

Step 4: Response Action

  • IDS (detection): Log alert, notify security team
  • IPS (prevention): Block traffic automatically + log + notify

Why It Works: Known attacks have recognizable patterns. Like a burglar alarm programmed to detect specific break-in techniques.

Limitation: Zero-day attacks with no existing signature go undetected. This is why anomaly detection is also needed.

How It Works: Anomaly-Based Detection Using Z-Score

Step 1: Establish Baseline (Learning Phase - 7-30 days)

  • Monitor normal device behavior: packet sizes, transmission intervals, destinations contacted
  • Calculate statistics: mean (average), standard deviation (variation)
  • Example: Temperature sensor baseline = 1024 bytes every 60 seconds ± 50 bytes

Step 2: Compute Z-Score for Each New Observation

Z = (Current Value - Baseline Mean) / Baseline Standard Deviation

If Z > 3: Value is 3+ standard deviations away from normal
→ Only 0.3% of normal observations would be this extreme

Step 3: Threshold Decision

  • Z > 3: Flag as MEDIUM severity (investigate)
  • Z > 5: Flag as HIGH severity (likely attack)
  • Z > 10: Flag as CRITICAL (almost certainly malicious)

Step 4: Alert and Adapt

  • Generate alert with Z-score, current value, baseline for context
  • Periodically retrain baseline (monthly) to adapt to legitimate changes
  • Example: After firmware update, normal packet size increases from 1KB to 1.5KB → retrain baseline

Why It Works: Attacks cause behavioral changes. Data exfiltration = huge packet sizes. DDoS = massive transmission rate. Both are statistical outliers.

Limitation: Requires accurate baseline. False positives if baseline drifts. Higher computational cost than signature matching.

10.4.1 Signature-Based vs Anomaly-Based Detection

| Aspect | Signature-Based | Anomaly-Based |
|--------|-----------------|---------------|
| How it works | Match patterns against known attack signatures | Compare behavior against established baseline |
| Detection | Known attacks only | Novel attacks + known attacks |
| False positives | Low | Higher (requires tuning) |
| Setup effort | Low (use vendor signatures) | High (must establish baseline) |
| Maintenance | Update signature database | Retrain models periodically |
| Best for | Detecting known malware, exploits | Detecting zero-day attacks, insider threats |

10.4.2 Signature-Based Detection Example

# Signature-based IDS for MQTT (simplified example)
import re
from datetime import datetime

class SignatureBasedIDS:
    def __init__(self):
        self.signatures = [
            {
                "name": "MQTT_Injection",
                "pattern": r"[\x00-\x1f]",  # Control characters in payload
                "action": "block"
            },
            {
                "name": "SQL_Injection",
                "pattern": r"(SELECT|INSERT|DELETE|DROP)\s",
                "action": "alert"
            },
            {
                "name": "Command_Injection",
                "pattern": r"[;&|`$]",  # Shell metacharacters
                "action": "block"
            },
            {
                "name": "Port_Scan",
                # Behavioral rule, not a payload regex: detecting it requires
                # stateful connection tracking, so the regex loop skips it
                "rule": "5+ connection attempts in 10 seconds",
                "action": "alert"
            }
        ]

    def analyze_traffic(self, packet):
        # Match each payload signature against the packet; behavioral
        # rules (entries without a "pattern" key) need a separate stateful engine
        for sig in self.signatures:
            if "pattern" in sig and re.search(sig["pattern"], packet.payload):
                self.trigger_alert(sig, packet)

    def trigger_alert(self, signature, packet):
        alert = {
            "type": signature["name"],
            "severity": "HIGH" if signature["action"] == "block" else "MEDIUM",
            "source_ip": packet.source_ip,
            "timestamp": datetime.now()
        }
        self.send_to_siem(alert)

    def send_to_siem(self, alert):
        # Placeholder: forward to the SIEM pipeline (Section 10.6.3)
        print(alert)

10.4.3 Anomaly-Based Detection Example

# Anomaly-based IDS using statistical analysis
import numpy as np
from datetime import datetime

class AnomalyBasedIDS:
    def __init__(self):
        self.baseline = {}  # device_id -> baseline metrics

    def establish_baseline(self, device_id, historical_data):
        """Build baseline from 7-14 days of normal traffic"""
        self.baseline[device_id] = {
            "packet_size": {
                "mean": np.mean(historical_data["sizes"]),
                # Floor the std to avoid division by zero for constant metrics
                "std": max(np.std(historical_data["sizes"]), 1e-6)
            },
            "interval": {
                "mean": np.mean(historical_data["intervals"]),
                "std": max(np.std(historical_data["intervals"]), 1e-6)
            },
            "destinations": set(historical_data["destinations"])
        }

    def analyze_traffic(self, device_id, current_data):
        """Compare current behavior to baseline"""
        if device_id not in self.baseline:
            return  # No baseline yet

        baseline = self.baseline[device_id]

        # Check packet size anomaly (z-score > 3)
        z_size = (current_data["size"] - baseline["packet_size"]["mean"]) / \
                 baseline["packet_size"]["std"]
        if abs(z_size) > 3:
            self.trigger_alert("PACKET_SIZE_ANOMALY", device_id, z_size)

        # Check for new destinations (possible C&C communication)
        if current_data["destination"] not in baseline["destinations"]:
            self.trigger_alert("NEW_DESTINATION", device_id,
                             current_data["destination"])

        # Check transmission interval (beaconing / DDoS indicator)
        z_interval = (current_data["interval"] - baseline["interval"]["mean"]) / \
                     baseline["interval"]["std"]
        if abs(z_interval) > 3:
            self.trigger_alert("INTERVAL_ANOMALY", device_id, z_interval)

    def trigger_alert(self, alert_type, device_id, details):
        alert = {
            "type": alert_type,
            "device_id": device_id,
            "details": details,
            "severity": "HIGH" if "NEW_DESTINATION" in alert_type else "MEDIUM",
            "timestamp": datetime.now()
        }
        self.send_to_siem(alert)

    def send_to_siem(self, alert):
        # Placeholder: forward to the SIEM pipeline (Section 10.6.3)
        print(alert)

10.5 Attack Detection Scenarios

| Attack Type | Normal Behavior | Anomalous Behavior | Detection Method |
|-------------|-----------------|--------------------|------------------|
| Data Exfiltration | 1 KB packets | 10 MB packets | Packet size z-score > 5 |
| DDoS Participation | 1 packet/min | 1000 packets/sec | Interval z-score > 5 |
| C&C Beaconing | No C&C contact | Contact unknown IP every 5 sec | Unknown destination IP |
| Cryptomining | 5-10% CPU | 95% CPU sustained | CPU usage z-score > 3 |
| Port Scanning | MQTT:8883 only | Scan ports 1-65535 | Connection count > 10 |
[Figure: four-stage pipeline: baseline normal traffic patterns → compare real-time traffic against the baseline → apply severity-based alerting thresholds → respond (log, alert, or block) according to severity]
Figure 10.1: Network anomaly detection with severity-based alerting

10.6 Security Logging and Audit Trails

Comprehensive logging enables incident investigation, compliance verification, and forensic analysis.

10.6.1 What to Log

| Event Category | Specific Events | Retention |
|----------------|-----------------|-----------|
| Authentication | Login success/failure, session start/end | 1 year |
| Authorization | Access granted/denied, permission changes | 1 year |
| Configuration | Setting changes, firmware updates | 7 years |
| Network | Connection attempts, data transfers | 90 days |
| Security | IDS alerts, policy violations | 2 years |
| System | Boot, shutdown, errors | 90 days |

10.6.2 Security Log Schema

{
  "event_id": "uuid-v4",
  "timestamp": "2026-01-10T14:32:15.234Z",
  "event_type": "AUTHENTICATION_ATTEMPT",
  "outcome": "FAILURE",
  "severity": "MEDIUM",
  "source": {
    "device_id": "sensor-42",
    "ip_address": "192.168.1.100",
    "mac_address": "AA:BB:CC:DD:EE:FF"
  },
  "target": {
    "service": "MQTT_BROKER",
    "endpoint": "mqtt.company.com:8883"
  },
  "details": {
    "reason": "INVALID_CERTIFICATE",
    "certificate_cn": "sensor-42.factory.com",
    "certificate_expiry": "2025-12-31T23:59:59Z"
  },
  "context": {
    "session_id": "sess-123456",
    "request_id": "req-789012"
  }
}
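Events in this schema can be assembled with the standard library alone. A minimal sketch; the helper name `make_security_event` and the field values are illustrative, not part of any particular logging library:

```python
import json
import uuid
from datetime import datetime, timezone

def make_security_event(event_type, outcome, severity, source, target,
                        details, context=None):
    """Build a security log event following the schema above.

    Every event gets a unique ID and a UTC timestamp so that SIEM
    correlation across devices does not depend on local clocks alone.
    """
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "event_type": event_type,
        "outcome": outcome,
        "severity": severity,
        "source": source,
        "target": target,
        "details": details,
        "context": context or {},
    }

event = make_security_event(
    event_type="AUTHENTICATION_ATTEMPT",
    outcome="FAILURE",
    severity="MEDIUM",
    source={"device_id": "sensor-42", "ip_address": "192.168.1.100"},
    target={"service": "MQTT_BROKER", "endpoint": "mqtt.company.com:8883"},
    details={"reason": "INVALID_CERTIFICATE"},
)
print(json.dumps(event, indent=2))
```

Emitting the event as one JSON object per line (rather than pretty-printed) is the usual choice for log shippers.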

10.6.3 SIEM Integration

Security Information and Event Management (SIEM) systems correlate events across multiple sources to detect complex attacks.
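As a sketch of what "correlate events across multiple sources" means in practice, the toy class below (hypothetical, not a real SIEM API) applies one correlation rule to events shaped like the schema in Section 10.6.2: repeated authentication failures from a single source IP, across any number of devices, within a sliding time window:

```python
from collections import defaultdict
from datetime import datetime, timedelta

class SimpleSIEM:
    """Toy correlation engine: one rule, one sliding time window.

    Rule: >= `threshold` authentication failures from the same source IP
    inside `window`, across any devices, raises a CREDENTIAL_STUFFING incident.
    """

    def __init__(self, window=timedelta(minutes=5), threshold=5):
        self.window = window
        self.threshold = threshold
        self.failures = defaultdict(list)  # source_ip -> recent failure timestamps
        self.incidents = []

    def ingest(self, event):
        if event["event_type"] != "AUTHENTICATION_ATTEMPT" or \
           event["outcome"] != "FAILURE":
            return
        ip = event["source"]["ip_address"]
        ts = datetime.fromisoformat(event["timestamp"])
        # Slide the window: keep only failures recent enough to matter
        self.failures[ip] = [t for t in self.failures[ip] if ts - t <= self.window]
        self.failures[ip].append(ts)
        if len(self.failures[ip]) >= self.threshold:
            self.incidents.append({
                "rule": "CREDENTIAL_STUFFING",
                "source_ip": ip,
                "count": len(self.failures[ip]),
            })

siem = SimpleSIEM()
base = datetime(2026, 1, 10, 14, 0, 0)
# Six failed logins from one IP against six different devices, 30 s apart
for i in range(6):
    siem.ingest({
        "event_type": "AUTHENTICATION_ATTEMPT",
        "outcome": "FAILURE",
        "source": {"ip_address": "203.0.113.7", "device_id": f"sensor-{i}"},
        "timestamp": (base + timedelta(seconds=30 * i)).isoformat(),
    })
print(siem.incidents)  # incident raised at the 5th and 6th failures
```

The key point is that no single device sees anything alarming here; only the aggregated view reveals the pattern.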

[Figure: four-stage SIEM pipeline: collect logs from all IoT devices → aggregate and normalize → analyze for patterns and anomalies → trigger automated incident-response alerts]
Figure 10.2: SIEM integration for centralized security monitoring

Knowledge Check: Signature vs Anomaly Detection

Question: A new zero-day exploit targets MQTT brokers by sending malformed CONNECT packets. Your IoT network has both signature-based and anomaly-based IDS deployed. Which system detects this attack first, and why?

Answer: The anomaly-based IDS detects it first. Signature-based IDS requires a pre-existing pattern match for the specific exploit, which does not exist for a zero-day attack. The anomaly-based system detects the malformed packets because they deviate from the established baseline of normal MQTT CONNECT packet structure (size, frequency, payload patterns). After the zero-day is analyzed and a signature is created, the signature-based IDS can then detect future occurrences with lower false-positive rates.

10.7 Practice Exercises

Objective: Design and implement a role-based access control system for an IoT platform, with emphasis on the audit logging that feeds into the monitoring pipeline.

Scenario: You’re building a smart building management system with 3 user types: Administrators, Operators, and Viewers.

Tasks:

  1. Define Role-Permission Matrix:

    | Role | Read Sensors | Write Data | Control Actuators | Manage Devices | Admin Panel |
    |------|--------------|------------|-------------------|----------------|-------------|
    | Admin | Yes | Yes | Yes | Yes | Yes |
    | Operator | Yes | No | Yes | No | No |
    | Viewer | Yes | No | No | No | No |
  2. Design Access Control Flow:

    • User authenticates with credentials or certificate
    • System looks up user’s role in the role-permission matrix
    • Permission check: Does this role have the required permission?
    • If yes, execute action and log success; if no, return 403 and log denial
  3. Test RBAC Enforcement:

    | Test | User Role | Action | Expected Result |
    |------|-----------|--------|-----------------|
    | 1 | Viewer | Unlock door | 403 Forbidden |
    | 2 | Operator | Delete device | 403 Forbidden |
    | 3 | Admin | Unlock door | 200 Success |
    | 4 | (Invalid token) | Any | 401 Unauthorized |

Expected Outcome: Working RBAC implementation with complete audit trail. Every access decision (granted or denied) should be logged in the format described in the Security Log Schema section above, enabling SIEM correlation.
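The access-control flow in Task 2 can be sketched in a few lines. The function and permission names below are illustrative choices for this exercise, and the audit entries deliberately mirror (in simplified form) the schema from Section 10.6.2:

```python
from datetime import datetime, timezone

# Role-permission matrix from Task 1 (permission names are illustrative)
PERMISSIONS = {
    "admin": {"read_sensors", "write_data", "control_actuators",
              "manage_devices", "admin_panel"},
    "operator": {"read_sensors", "control_actuators"},
    "viewer": {"read_sensors"},
}

audit_log = []

def check_access(user, role, permission):
    """Return (HTTP status, allowed) and log every decision, granted or denied."""
    allowed = permission in PERMISSIONS.get(role, set())
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": "AUTHORIZATION",
        "outcome": "GRANTED" if allowed else "DENIED",
        "user": user,
        "role": role,
        "permission": permission,
    })
    return (200 if allowed else 403), allowed

# Test matrix from Task 3 (unlock door -> control_actuators,
# delete device -> manage_devices)
print(check_access("alice", "viewer", "control_actuators"))  # (403, False)
print(check_access("bob", "operator", "manage_devices"))     # (403, False)
print(check_access("carol", "admin", "control_actuators"))   # (200, True)
```

Note that denials are logged exactly like grants: the denied attempts are the ones your SIEM correlation rules will care about.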

Objective: Implement mutual TLS authentication for device-to-server communication.

Scenario: Your smart factory has 100 industrial sensors connecting to an MQTT broker. Implement mTLS so only authorized devices can connect.

Tasks:

  1. Create Certificate Authority:

    # Generate CA
    openssl genrsa -out ca.key 4096
    openssl req -new -x509 -days 3650 -key ca.key -out ca.crt \
        -subj "/C=US/ST=CA/O=MyCompany/CN=MyCompany Root CA"
    
    # Generate server certificate
    openssl genrsa -out server.key 2048
    openssl req -new -key server.key -out server.csr \
        -subj "/C=US/ST=CA/O=MyCompany/CN=mqtt.mycompany.com"
    openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
        -CAcreateserial -out server.crt -days 365
    
    # Generate device certificate
    openssl genrsa -out device1.key 2048
    openssl req -new -key device1.key -out device1.csr \
        -subj "/C=US/ST=CA/O=MyCompany/CN=device-001"
    openssl x509 -req -in device1.csr -CA ca.crt -CAkey ca.key \
        -CAcreateserial -out device1.crt -days 365
  2. Configure MQTT broker for mTLS:

    listener 8883
    certfile /etc/mosquitto/certs/server.crt
    keyfile /etc/mosquitto/certs/server.key
    cafile /etc/mosquitto/certs/ca.crt
    require_certificate true
    use_identity_as_username true
  3. Test Security:

    • Device with valid certificate connects successfully
    • Device without certificate is rejected
    • Device with wrong certificate is rejected
    • Device rejects connection to fake broker

Expected Outcome: Bidirectional authentication with full audit trail. Log all connection attempts (successful and rejected) so your IDS can detect patterns like certificate brute-forcing or unauthorized device enrollment.
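If you prototype the broker side in Python rather than mosquitto, the same requirements map onto the standard library's `ssl` module. This is a sketch only; the certificate files from Task 1 must exist before the commented load calls can run:

```python
import ssl

# Server-side TLS context mirroring the mosquitto listener settings:
#   require_certificate true  ->  verify_mode = CERT_REQUIRED
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # reject legacy protocol versions
ctx.verify_mode = ssl.CERT_REQUIRED            # client MUST present a certificate

# With the files from Task 1 in place, load them like this:
# ctx.load_cert_chain(certfile="server.crt", keyfile="server.key")
# ctx.load_verify_locations(cafile="ca.crt")

print(ctx.verify_mode, ctx.minimum_version)
```

`CERT_REQUIRED` is what makes the handshake mutual: the server aborts the connection if the client presents no certificate, or one not signed by the loaded CA.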

Objective: Implement anomaly-based intrusion detection for IoT devices.

Scenario: Your smart home temperature sensors normally send 1KB every 60 seconds. Build a system to detect unusual behavior.

Tasks:

  1. Establish Normal Baseline:

    | Metric | Normal Value | Purpose |
    |--------|--------------|---------|
    | Packet Size | 1024 ± 50 bytes | Detect data exfiltration |
    | Interval | 60 ± 5 sec | Detect beaconing/DDoS |
    | Destinations | mqtt.company.com only | Detect C&C communication |
  2. Implement Detection:

    def detect_anomaly(current, baseline):
        z_score = (current - baseline['mean']) / baseline['std']
        if abs(z_score) > 3:
            return "MEDIUM" if abs(z_score) <= 5 else "HIGH"
        return None
  3. Test Detection Scenarios:

    | Attack | Normal | Anomalous | Detection |
    |--------|--------|-----------|-----------|
    | Exfiltration | 1 KB | 10 MB | z > 5 |
    | DDoS | 1/min | 1000/sec | z > 5 |
    | C&C | Known IP | Unknown IP | New destination |

Expected Outcome: Working anomaly detection with tuned thresholds and false positive analysis.
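The false positive analysis asked for in the Expected Outcome can be done by replaying simulated normal traffic through the Task 2 function. A sketch, with Gaussian parameters taken from the baseline table; about 0.3% of genuinely normal readings should still trip the |z| > 3 threshold:

```python
import random

def detect_anomaly(current, baseline):
    # Same thresholds as Task 2: |z| > 3 -> MEDIUM, |z| > 5 -> HIGH
    z_score = (current - baseline["mean"]) / baseline["std"]
    if abs(z_score) > 3:
        return "MEDIUM" if abs(z_score) <= 5 else "HIGH"
    return None

random.seed(1)  # reproducible run
baseline = {"mean": 1024, "std": 50}  # packet-size baseline from Task 1

# Replay 10,000 simulated normal readings; roughly 0.27% should exceed
# |z| > 3 purely by chance
normal = [random.gauss(1024, 50) for _ in range(10_000)]
false_positives = sum(1 for v in normal if detect_anomaly(v, baseline))
print(f"False positives: {false_positives}/10000 ({false_positives / 100:.2f}%)")

# Sanity checks against the Task 3 scenarios
assert detect_anomaly(10_000_000, baseline) == "HIGH"  # exfiltration-sized packet
assert detect_anomaly(1024, baseline) is None          # perfectly normal packet
```

Multiply the per-reading false positive rate by readings per day per device to estimate the daily alert load before you commit to a threshold.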

Objective: Build a secure over-the-air update system with cryptographic verification.

Scenario: You have 1,000 IoT sensors deployed. Push a security patch without physical access.

Tasks:

  1. Generate signing keys:

    openssl genrsa -out firmware_signing_key.pem 2048
    openssl rsa -in firmware_signing_key.pem -pubout -out firmware_public_key.pem
  2. Sign firmware:

    openssl dgst -sha256 -sign firmware_signing_key.pem \
        -sigopt rsa_padding_mode:pss \
        -out firmware_v2.0.sig \
        firmware_v2.0.bin
  3. Verify on device (before installation):

    • Compute SHA-256 hash of downloaded firmware
    • Verify RSA signature with embedded public key
    • Reject if signature invalid
    • Install only if verification passes
  4. Test Security:

    • Correct signature: Installation succeeds
    • Modified firmware: Signature fails, blocked
    • Wrong key signature: Verification fails, blocked
    • Old firmware (downgrade): Version check fails

Expected Outcome: Secure OTA with anti-rollback protection. All update events (download, verification, installation, rollback) should be logged and forwarded to the SIEM for fleet-wide anomaly detection (e.g., unexpected update failures across multiple devices).
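The on-device decision logic from Tasks 3 and 4 can be sketched as below. To stay standard-library-only, this sketch substitutes a SHA-256 digest comparison for the RSA-PSS signature verification of Task 2; a real device would perform an RSA verify with the embedded public key at the marked step:

```python
import hashlib

def verify_and_install(firmware: bytes, expected_digest: str,
                       new_version: tuple, installed_version: tuple):
    """Simplified on-device update gate (Task 3 checks + Task 4 anti-rollback).

    Returns (installed?, list of audit events to forward to the SIEM).
    """
    events = []
    # Steps 1-2: integrity/authenticity check.  In production this is an
    # RSA-PSS signature verification; a digest comparison stands in here.
    if hashlib.sha256(firmware).hexdigest() != expected_digest:
        events.append("VERIFY_FAILED")
        return False, events
    events.append("VERIFY_OK")
    # Anti-rollback: never install a version <= the currently running one
    if new_version <= installed_version:
        events.append("ROLLBACK_BLOCKED")
        return False, events
    events.append("INSTALLED")
    return True, events

fw = b"firmware v2.0 payload"
good = hashlib.sha256(fw).hexdigest()

print(verify_and_install(fw, good, (2, 0), (1, 9)))         # install succeeds
print(verify_and_install(fw + b"X", good, (2, 0), (1, 9)))  # tampered -> blocked
print(verify_and_install(fw, good, (1, 0), (1, 9)))         # downgrade -> blocked
```

Returning the audit events alongside the decision keeps the logging requirement from the Expected Outcome explicit rather than an afterthought.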

10.8 Hands-On Lab: Network Behavior Monitoring

Objective: Build a simple network behavior monitoring system that establishes baselines and detects anomalous device behavior – the core of any IoT IDS.

import random
import math
from collections import defaultdict

class IoTNetworkMonitor:
    """Behavioral anomaly detection for IoT device traffic"""

    def __init__(self, baseline_window=50, z_threshold=3.0):
        self.baselines = {}  # device_id -> {metric -> {mean, std}}
        self.history = defaultdict(lambda: defaultdict(list))
        self.baseline_window = baseline_window
        self.z_threshold = z_threshold
        self.alerts = []

    def update_baseline(self, device_id, metric, values):
        """Compute rolling baseline statistics"""
        if len(values) < 10:
            return
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        if device_id not in self.baselines:
            self.baselines[device_id] = {}
        self.baselines[device_id][metric] = {"mean": mean, "std": max(std, 0.1)}

    def check_reading(self, device_id, metrics):
        """Check device metrics against baseline, return alerts"""
        alerts = []
        for metric, value in metrics.items():
            self.history[device_id][metric].append(value)
            recent = self.history[device_id][metric][-self.baseline_window:]
            self.update_baseline(device_id, metric, recent[:-1])

            if device_id in self.baselines and metric in self.baselines[device_id]:
                baseline = self.baselines[device_id][metric]
                z = abs(value - baseline["mean"]) / baseline["std"]
                if z > self.z_threshold:
                    severity = "CRITICAL" if z > 5 else "HIGH" if z > 4 else "MEDIUM"
                    alert = {
                        "device": device_id, "metric": metric,
                        "value": value, "z_score": round(z, 2),
                        "baseline_mean": round(baseline["mean"], 1),
                        "severity": severity
                    }
                    alerts.append(alert)
                    self.alerts.append(alert)
        return alerts

# Simulate IoT network with normal and attack traffic
monitor = IoTNetworkMonitor()
random.seed(42)

# Build baseline with 60 normal readings per device
devices = ["sensor_001", "sensor_002", "gateway_001"]
for _ in range(60):
    for dev in devices:
        monitor.check_reading(dev, {
            "bytes_per_min": random.gauss(1024, 100),
            "requests_per_min": random.gauss(10, 2),
            "unique_destinations": random.gauss(2, 0.5),
        })

print("Baseline established. Simulating attack scenarios...\n")

# Scenario 1: Data exfiltration (sensor_001 sends 50x normal data)
print("--- Scenario 1: Data Exfiltration ---")
alerts = monitor.check_reading("sensor_001", {
    "bytes_per_min": 50000,   # 50x normal
    "requests_per_min": 12,   # Normal
    "unique_destinations": 2, # Normal
})
for a in alerts:
    print(f"  [{a['severity']}] {a['device']}: {a['metric']}={a['value']} "
          f"(z={a['z_score']}, baseline={a['baseline_mean']})")

# Scenario 2: C&C beaconing (gateway contacts many new destinations)
print("\n--- Scenario 2: C&C Beaconing ---")
alerts = monitor.check_reading("gateway_001", {
    "bytes_per_min": 1100,         # Slightly above normal
    "requests_per_min": 50,        # 5x normal
    "unique_destinations": 15,     # 7x normal destinations
})
for a in alerts:
    print(f"  [{a['severity']}] {a['device']}: {a['metric']}={a['value']} "
          f"(z={a['z_score']}, baseline={a['baseline_mean']})")

# Scenario 3: DDoS amplification (sensor becomes attack reflector)
print("\n--- Scenario 3: DDoS Amplification ---")
alerts = monitor.check_reading("sensor_002", {
    "bytes_per_min": 500000,  # 500x normal
    "requests_per_min": 1000, # 100x normal
    "unique_destinations": 50, # 25x normal
})
for a in alerts:
    print(f"  [{a['severity']}] {a['device']}: {a['metric']}={a['value']} "
          f"(z={a['z_score']}, baseline={a['baseline_mean']})")

print(f"\nTotal alerts generated: {len(monitor.alerts)}")

What to Observe:

  1. The monitor establishes per-device behavioral baselines from normal traffic
  2. Data exfiltration triggers a bytes anomaly but request count stays normal
  3. C&C beaconing shows abnormal destinations and request rates
  4. DDoS amplification triggers all three metrics simultaneously – highest severity
  5. Z-score-based detection requires no prior knowledge of attack signatures

10.9 Case Study: Verkada Camera Breach (2021)

Real-World Case Study: How Anomaly Detection Could Have Stopped the Verkada Hack

Company: Verkada, a cloud-based security camera company serving 5,200+ customers including hospitals, prisons, schools, and Tesla factories.

What happened (March 2021): A hacking group gained access to Verkada’s internal support tools using a hardcoded administrator credential found in an exposed Jenkins server. They accessed live feeds from 150,000+ cameras, including sensitive locations like Halifax Health hospital patient rooms and Tesla manufacturing floors.

Timeline:

  • Attackers found exposed Jenkins CI/CD server with default credentials
  • Used Jenkins access to discover a super-admin service account password
  • Super-admin account had unrestricted access to all customer cameras
  • Attackers had access for approximately 36 hours before Verkada detected the breach
  • Detection came from a public tweet by the attackers, not from monitoring systems

What monitoring should have caught:

| Anomaly Signal | Normal Baseline | Attack Behavior | Detection Method |
|----------------|-----------------|-----------------|------------------|
| Admin login location | Mountain View, CA (HQ) | Multiple countries simultaneously | Geo-velocity check: impossible travel speed |
| Camera access pattern | Support tickets access 1-3 cameras per session | 150,000+ cameras accessed in batch | Volume anomaly: z-score > 100 |
| Access time pattern | Business hours, weekdays | Continuous 24/7 access over 36 hours | Time-based anomaly detection |
| Data export volume | ~50 MB/day for support | Terabytes of video exported | Egress volume z-score > 50 |
| Account behavior | 1 concurrent session per admin | Same account active from 10+ IPs | Session multiplicity alert |

Cost of inadequate monitoring:

  • SEC investigation and regulatory fines
  • $100+ million class-action lawsuits from affected customers
  • Loss of customer trust – several major clients (including Tesla) publicly reviewed their contracts
  • Verkada had to rebuild its entire access control architecture

Key lessons for IoT security monitoring:

  1. Never use hardcoded credentials – but if they exist, anomaly detection on privileged accounts catches misuse
  2. Baseline every admin account: typical login times, locations, access volumes, and session patterns
  3. Implement impossible-travel detection: if an account logs in from New York and London within 30 minutes, it is compromised
  4. Alert on access volume spikes: no legitimate admin needs to view 150,000 cameras in one session
  5. Detect data exfiltration: monitor egress bandwidth per account – terabytes of video export should trigger immediate investigation

The 36-hour detection gap is significant. With the anomaly-based IDS techniques described in this chapter (z-score thresholds on access volume, geo-velocity checks, session multiplicity monitoring), automated alerts would have fired within minutes of the attackers beginning to access cameras at scale.
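Lesson 3 (impossible-travel detection) reduces to a great-circle distance and a speed threshold. A sketch, with the coordinates and the 900 km/h ceiling (roughly a commercial jet) chosen for illustration:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + \
        math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def impossible_travel(login_a, login_b, max_speed_kmh=900):
    """Flag two logins whose implied travel speed no human could achieve."""
    dist = haversine_km(login_a["lat"], login_a["lon"],
                        login_b["lat"], login_b["lon"])
    hours = abs(login_b["t"] - login_a["t"]) / 3600
    return hours > 0 and dist / hours > max_speed_kmh

# New York, then London 30 minutes later (coordinates approximate)
ny = {"lat": 40.71, "lon": -74.01, "t": 0}
ldn = {"lat": 51.51, "lon": -0.13, "t": 1800}
print(impossible_travel(ny, ldn))  # ~5,570 km in 0.5 h -> flagged
```

In production, VPNs and mobile carriers cause legitimate location jumps, so this check is usually one correlated signal rather than an automatic block.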

Worked Example: Detecting Data Exfiltration with Z-Scores

A smart factory has 200 IoT sensors. Each sensor normally sends 10 KB of telemetry data per hour to an MQTT broker. Sensor ID 42 suddenly sends 50 MB in one hour. Calculate whether this triggers an anomaly alert using z-score threshold detection.

Baseline (established over 30 days of normal operation):

Mean (μ): 10,240 bytes/hour per sensor
Standard deviation (σ): 512 bytes/hour (sensors are consistent)
Sample size: 200 sensors × 720 hours = 144,000 observations

Current observation (Sensor 42, hour 1842):

Actual traffic: 50,000,000 bytes (50 MB)

Z-Score Calculation:

z = (X - μ) / σ

Where:
  X = current value = 50,000,000 bytes
  μ = baseline mean = 10,240 bytes
  σ = baseline standard deviation = 512 bytes

z = (50,000,000 - 10,240) / 512
z = 49,989,760 / 512
z = 97,636

Absolute z-score: |97,636| = 97,636

Threshold Evaluation:

Alert thresholds (from security policy):
  MEDIUM severity: z > 3 (only 0.3% of normal values exceed this)
  HIGH severity: z > 5 (only 0.00006% of normal values exceed this)
  CRITICAL severity: z > 10 (virtually certain attack)

Sensor 42 z-score: 97,636
  >> 10 (CRITICAL threshold)

Severity: CRITICAL

Alert Details:

{
  "alert_id": "ALT-2026-01-08-001842",
  "severity": "CRITICAL",
  "type": "DATA_EXFILTRATION_DETECTED",
  "device_id": "sensor-042",
  "metric": "hourly_bytes_transmitted",
  "baseline_mean": 10240,
  "baseline_std": 512,
  "current_value": 50000000,
  "z_score": 97636,
  "deviation_factor": "4,882× baseline",
  "timestamp": "2026-01-08T14:32:00Z",
  "recommended_action": "ISOLATE_DEVICE_IMMEDIATELY"
}

Interpretation:

The z-score of 97,636 indicates the current traffic is 97,636 standard deviations away from the mean. In a normal distribution:

  • z = 3: 99.7% of observations fall within (1 in 370 is beyond)
  • z = 5: 99.99994% within (1 in 1.7 million beyond)
  • z = 97,636: probability of natural occurrence ≈ 10^-2,074,000,000 (essentially impossible)

Conclusion: This is definitively an attack, not random variation. Sensor 42 is either compromised (data exfiltration) or malfunctioning. Automated response: Isolate sensor from network immediately, trigger investigation workflow.

Key Insight: Z-score detection requires no prior knowledge of attack signatures. The algorithm simply recognizes “this behavior is statistically impossible under normal conditions” and flags it, even if the attack is zero-day malware never seen before.
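The arithmetic above can be checked in a few lines:

```python
# Baseline and observation from the worked example above
mean, std = 10_240, 512
current = 50_000_000
z = (current - mean) / std
print(z)  # 97636.25, reported as 97,636 in the alert
assert z > 10  # far past the CRITICAL threshold
```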


You’re securing a 500-device IoT network with constrained budget. Compare signature-based vs anomaly-based intrusion detection systems.

| Criterion | Signature-Based IDS | Anomaly-Based IDS | Hybrid IDS | Best For |
|-----------|---------------------|-------------------|------------|----------|
| Detects Known Attacks | Excellent (99%+) | Good (85-90%) | Excellent (99%+) | CVE exploits, malware |
| Detects Zero-Day Attacks | Poor (0%) | Excellent (80-95%) | Excellent (90%+) | Novel threats |
| False Positive Rate | Very Low (0.1-1%) | Medium-High (5-20%) | Low (1-5%) | Operations impact |
| Setup Effort | Low (load vendor signatures) | High (30-90 day baseline) | High (baseline + signatures) | Time to deployment |
| Maintenance | Medium (update signatures monthly) | Medium (retrain quarterly) | High (both) | Ongoing effort |
| Resource Usage | Low (pattern matching) | High (statistical computation) | High (both engines) | Constrained gateways |
| Cost | $5-20/device/year | $10-40/device/year | $20-60/device/year | Budget |

Decision Criteria:

Use Signature-Based IDS if:

  • Protecting against known threats (SQL injection, command injection, malware)
  • Low false positive tolerance (can’t afford alarm fatigue)
  • Limited baseline data (new deployment, <30 days operational)
  • Constrained gateway resources (ESP32, Raspberry Pi)
  • Example: Small smart home (20 devices, consumer-grade gateway)
  • Cost: $100-400/year for 20 devices
  • Limitation: Won’t catch novel attacks (assumes attacker uses known techniques)

Use Anomaly-Based IDS if:

  • High-value target likely to face zero-day attacks (critical infrastructure, defense)
  • Can afford 5-20% false positive rate (security team can investigate)
  • Have 30-90 days of clean traffic to establish baseline
  • Sufficient gateway resources (x86 server, commercial edge gateway)
  • Example: Industrial control system (500 devices, dedicated security team)
  • Cost: $5,000-20,000/year for 500 devices
  • Benefit: Detected Stuxnet-like attacks before signatures existed

Use Hybrid IDS if:

  • Need both known threat protection and zero-day detection
  • Can afford higher cost ($20-60/device/year)
  • Have security operations center (SOC) to handle 1-5% false positive rate
  • Example: Smart city infrastructure (2,000+ devices, city IT department)
  • Cost: $40,000-120,000/year for 2,000 devices
  • Approach: Signature-based catches 90% of known attacks (auto-block), anomaly-based flags suspicious behavior for human review

Recommended Hybrid Configuration (80% of Enterprise IoT Deployments):

Layer 1: Signature-Based (Snort/Suricata)
  - Block known attacks automatically
  - Rules for MQTT injection, CoAP exploits, Modbus attacks
  - Update signatures weekly from vendor feeds

Layer 2: Anomaly Detection (Custom/Darktrace IoT)
  - Flag z-score > 3 as MEDIUM (human review)
  - Flag z-score > 5 as HIGH (auto-isolate + review)
  - Metrics: packet size, interval, destination IPs, protocol usage
  - Retrain baseline monthly (rolling 30-day window)

Layer 3: SIEM Correlation (Splunk/ELK)
  - Correlate signature hits + anomaly alerts + authentication logs
  - Multi-stage attack detection (e.g., scan → exploit → exfiltration)
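The Layer 2 tiering logic above can be sketched in a few lines. This is an illustrative example, not a production detector: the record shapes and function names (`score_metric`, `classify`) are hypothetical, and a real deployment would maintain per-device, per-metric baselines over a rolling 30-day window as described.

```python
import statistics

def classify(z: float) -> str:
    """Map an absolute z-score to the hybrid deployment's alert tiers."""
    if abs(z) > 5:
        return "HIGH"    # auto-isolate + human review
    if abs(z) > 3:
        return "MEDIUM"  # human review
    return "OK"

def score_metric(baseline: list, observed: float) -> float:
    """z-score of one observation against a rolling baseline window."""
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)
    return (observed - mean) / stdev if stdev else 0.0

# Example: packet sizes (bytes) from a device's recent baseline window
baseline = [210, 205, 198, 215, 202, 208, 211, 199, 204, 207]
print(classify(score_metric(baseline, 206)))    # typical reading
print(classify(score_metric(baseline, 2000)))   # exfiltration-sized burst
```

The same `classify` thresholds drive both the MEDIUM (human review) and HIGH (auto-isolate) paths, so tuning happens in one place.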

Budget Allocation:

  • $10K budget → Signature-based only (Snort + pfSense)
  • $50K budget → Hybrid (Snort + basic anomaly + SIEM)
  • $200K+ budget → Commercial hybrid (Darktrace IoT + CrowdStrike + SIEM)

Common Mistake: Setting Anomaly Detection Thresholds Too Tight Without a Tuning Period

The Error: A developer deploys anomaly-based IDS with default threshold z > 2 (95% confidence) without a tuning period, expecting to catch all suspicious behavior immediately. Within 24 hours, the security team receives 847 alerts. Investigation reveals 839 (99%) are false positives caused by legitimate but unusual events (firmware update, weekend traffic drop, network maintenance).

Why It Fails: Statistical anomaly detection requires understanding normal variance before defining “abnormal”. A two-tailed z-score threshold of 2 means roughly 5% of all observations will trigger alerts even under perfectly normal conditions (by definition of the 95% confidence interval). For a network with 500 devices generating 1 metric per minute:

Total observations per day: 500 devices × 1,440 minutes = 720,000 observations

Expected false positives at z > 2 threshold:
  720,000 × 5% = 36,000 alerts/day (99% false positives)

Expected false positives at z > 3 threshold:
  720,000 × 0.3% = 2,160 alerts/day

Expected false positives at z > 5 threshold:
  720,000 × 0.00006% = 0.4 alerts/day (mostly true positives)

The Impact:

The security team is overwhelmed by 847 alerts on day 1, spends 16 hours investigating, finds all but 8 are false positives, and disables the IDS entirely to stop the noise. Two weeks later, an actual data exfiltration attack (z=37) goes undetected because monitoring was turned off.

Real-World Example (2018 Healthcare IoT Deployment): A hospital deployed anomaly detection across 1,200 medical IoT devices (infusion pumps, monitors, beds). Initial threshold z > 2 generated 3,000+ alerts/day. Security staff investigated for 2 weeks, found 99.7% false positives (nurse call button spikes during shift change, bed weight sensors during patient transfer). They increased threshold to z > 4, reducing alerts to 12/day, and added 30-day tuning period to learn shift-change patterns. After tuning, false positive rate dropped to 5%, and the system detected a ransomware infection 6 minutes after initial device compromise (before lateral spread).

How to Avoid:

Phase 1: Baseline Period (30-90 days)

  1. Run IDS in monitoring-only mode (no alerts, just collect data)
  2. Capture full range of normal variation (weekday/weekend, business hours, maintenance windows)
  3. Calculate mean and standard deviation per metric per device
  4. Identify expected high-variance events (firmware updates, backups, shift changes)
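Step 3 of the baseline phase (per-metric, per-device mean and standard deviation) can be sketched as a simple aggregation. The record layout here is hypothetical; real deployments would read from the monitoring-only log store:

```python
import statistics
from collections import defaultdict

# Hypothetical monitoring-only log records collected during Phase 1
records = [
    {"device": "pump-01", "metric": "packet_size", "value": 210},
    {"device": "pump-01", "metric": "packet_size", "value": 205},
    {"device": "pump-01", "metric": "packet_size", "value": 198},
    {"device": "cam-07",  "metric": "interval_s",  "value": 30.1},
    {"device": "cam-07",  "metric": "interval_s",  "value": 29.8},
]

# Group observations by (device, metric), then summarize each group
samples = defaultdict(list)
for r in records:
    samples[(r["device"], r["metric"])].append(r["value"])

baseline = {
    key: {"mean": statistics.fmean(vals), "stdev": statistics.stdev(vals)}
    for key, vals in samples.items()
}
print(baseline[("pump-01", "packet_size")])
```

These per-key statistics are what the tuning-phase z-score comparisons run against.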

Phase 2: Tuning Period (30 days)

  1. Start with conservative threshold: z > 5 (0.00006% false positive rate)
  2. Log all alerts, but don’t block traffic (alert-only mode)
  3. Review all alerts daily, classify as true positive or false positive
  4. Adjust thresholds based on false positive rate:
    • Goal: <5% false positive rate per day
    • If too many false positives: increase threshold to z > 6 or z > 7
    • If no true positives in 30 days: decrease threshold to z > 4

Phase 3: Production (ongoing)

  1. Enable automated response for z > 7 (critical severity, very rare false positives)
  2. Human review for z > 4 (high severity, moderate false positive rate)
  3. Retrain baseline quarterly (rolling 90-day window)

Threshold Selection Guide:

| Z-Score | False Positive Rate | Alert Volume (500 devices) | Use Case |
|---------|---------------------|----------------------------|----------|
| > 2 | 5% | 36,000/day | Never use (too many alerts) |
| > 3 | 0.3% | 2,160/day | Tuning phase only |
| > 4 | 0.006% | 43/day | Human review, normal operations |
| > 5 | 0.00006% | 0.4/day | Auto-alert, high confidence |
| > 7 | ~0% | 0.01/day | Auto-block, critical severity |

10.10 Concept Relationships

How Monitoring Concepts Connect
| Core Concept | Builds Upon | Enables | Real-World Application |
|--------------|-------------|---------|------------------------|
| Signature-Based IDS | Known attack patterns | Detection of Mirai, SQL injection, port scans | Blocks 90% of automated attacks with near-zero false positives |
| Anomaly-Based IDS | Statistical baselines | Zero-day attack detection | Caught Stuxnet before signatures existed |
| SIEM Integration | IDS alerts + auth logs + access logs | Multi-stage attack correlation | Links failed auth → anomaly → data exfiltration into single incident |
| Audit Logging | Comprehensive event capture | Forensic investigation, compliance | HIPAA requires 6-year retention of all access attempts |
| Automated Response | IDS + SIEM | Quarantine, blocking, key revocation | Verkada breach: auto-quarantine would have limited blast radius |

Critical Insight: Monitoring is the final layer of defense-in-depth. Preventive controls (firewalls, encryption) reduce attack surface. Detective controls (IDS, logging) catch attacks that bypass prevention. Response controls (SIEM, SOAR) contain breaches before catastrophic damage.

A SIEM system must ingest, correlate, and alert on security events across all monitored sources. Sizing requires calculating total event volume and peak throughput.

\[\text{Events}_{\text{total}} = \sum_{i=1}^{n} (\text{Devices}_i \times \text{Rate}_i \times \text{LogTypes}_i)\]

Working through an example:

Given: IoT deployment with 500 sensors, 100 gateways, 50 cameras, 10 servers

Step 1: Calculate Per-Source Event Rates

| Source | Count | Events/sec/device | Log Types | EPS Contribution |
|--------|-------|-------------------|-----------|------------------|
| Sensors | 500 | 0.017 (1/min) | 2 (data + auth) | \(500 \times 0.017 \times 2 = 17\) |
| Gateways | 100 | 0.5 | 5 (fw, sys, auth, app, ids) | \(100 \times 0.5 \times 5 = 250\) |
| Cameras | 50 | 0.1 | 3 (motion, auth, error) | \(50 \times 0.1 \times 3 = 15\) |
| Servers | 10 | 10 | 8 (multi-tier logs) | \(10 \times 10 \times 8 = 800\) |

\[\text{EPS}_{\text{sustained}} = 17 + 250 + 15 + 800 = 1{,}082 \text{ events/sec}\]

Step 2: Calculate Peak Capacity Requirement

Peak traffic during attacks or firmware updates: \[\text{EPS}_{\text{peak}} = \text{EPS}_{\text{sustained}} \times \text{BurstFactor} = 1{,}082 \times 5 = 5{,}410 \text{ events/sec}\]

Step 3: Calculate Storage Requirements

Assuming 90-day retention with 500 bytes average per event: \[\text{Storage} = \text{EPS}_{\text{sustained}} \times 86{,}400 \frac{\text{sec}}{\text{day}} \times 90 \text{ days} \times 500 \text{ bytes}\] \[= 1{,}082 \times 86{,}400 \times 90 \times 500 \approx 4.2 \text{ TB (uncompressed)}\]

Step 4: Calculate Alert Precision and Recall

True Positives (TP) = 45, False Positives (FP) = 12, False Negatives (FN) = 3 \[\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} = \frac{45}{45 + 12} = 0.789 = 78.9\%\] \[\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{45}{45 + 3} = 0.938 = 93.8\%\]

Result: The SIEM must handle 1,082 EPS sustained and 5,410 EPS peak, with approximately 4.2 TB of storage for 90-day retention. Current tuning achieves 78.9% precision (21.1% of alerts are false positives) and 93.8% recall (catches 93.8% of actual threats).

In practice: IoT generates 10-100x more events than traditional IT (constant sensor telemetry vs occasional user actions). An under-provisioned SIEM drops events during attacks — precisely when you need them most. The math shows you must provision for peak throughput (5x sustained), not just the average.
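The four sizing steps above can be collapsed into a short calculator. The source counts, rates, and log-type multipliers come straight from the worked example; plug in your own deployment's figures:

```python
# (count, events/sec/device, log types) per source class, from the example
SOURCES = {
    "sensors":  (500, 0.017, 2),
    "gateways": (100, 0.5,   5),
    "cameras":  (50,  0.1,   3),
    "servers":  (10,  10,    8),
}

# Step 1: sustained event rate across all sources
eps_sustained = sum(n * rate * types for n, rate, types in SOURCES.values())

# Step 2: peak capacity with a 5x burst factor (attacks, firmware updates)
eps_peak = eps_sustained * 5

# Step 3: 90-day retention at 500 bytes/event, uncompressed
storage_tb = eps_sustained * 86_400 * 90 * 500 / 1e12

# Step 4: alert quality from investigation outcomes
tp, fp, fn = 45, 12, 3
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"Sustained: {eps_sustained:.0f} EPS, peak: {eps_peak:.0f} EPS")
print(f"Storage (90 days): {storage_tb:.1f} TB")
print(f"Precision: {precision:.1%}, recall: {recall:.1%}")
```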

10.10.1 Interactive: SIEM Capacity Planner

Adjust the deployment parameters to estimate SIEM sizing requirements for your IoT network.

10.11 See Also

Foundation Concepts:

Complementary Security:

Advanced Topics:

10.12 Chapter Summary

Security monitoring provides visibility into system behavior, enabling detection of attacks that bypass preventive controls. Signature-based detection identifies known attack patterns with low false positive rates, while anomaly-based detection catches novel attacks by comparing current behavior against established baselines.

Comprehensive logging creates audit trails for incident investigation, compliance verification, and forensic analysis. SIEM systems correlate events across multiple sources to detect complex, multi-stage attacks that individual systems might miss.

The practice exercises reinforce key concepts: RBAC implementation, mTLS configuration, anomaly detection system design, and secure OTA update verification. Together, these monitoring capabilities complete the defense-in-depth security architecture.

10.13 What’s Next

This completes the Cybersecurity Methods series. For the complete overview and links to all topics, see the Cybersecurity Methods Overview.

For deeper exploration of related topics:



Common Pitfalls

Network-level monitoring detects anomalies in communication patterns but misses on-device indicators of compromise (unexpected processes, modified firmware, abnormal memory access). Combine network monitoring with device health telemetry for complete visibility.

A security monitoring system that produces thousands of alerts per day creates alert fatigue where operators ignore all alerts — including real incidents. Tune alert thresholds, implement correlation rules, and prioritise by asset criticality to keep alert volume manageable.

Centralising device logs in a SIEM without defining detection rules, retention policies, and analyst workflows means the logs are useless in practice. Define what you are looking for before building the collection infrastructure.

Anomaly-based monitoring requires a baseline of normal behaviour to compare against. Deploying monitoring without a baseline produces endless false positives as every normal but unusual activity triggers an alert.