10 Monitoring & Intrusion Detection
10.1 Learning Objectives
By the end of this chapter, you should be able to:
- Differentiate signature-based from anomaly-based intrusion detection and select the appropriate approach for given IoT scenarios
- Design anomaly detection systems that establish behavioral baselines for IoT device monitoring
- Implement comprehensive security logging and audit trail architectures for forensic analysis
- Integrate security events with SIEM systems for cross-device correlation
- Apply security monitoring concepts through hands-on detection exercises
For Beginners: Understanding Security Monitoring
What is Security Monitoring? Security monitoring continuously observes systems for signs of attack or compromise. It’s like having security cameras and guards watching for suspicious behavior, but for digital systems.
Why does it matter for IoT? Even with strong preventive controls (firewalls, encryption), attacks can still succeed. Detection systems catch attacks in progress, enabling rapid response before significant damage occurs. The average time to detect a breach is 194 days (IBM, 2024) – monitoring aims to reduce this dramatically.
Key terms:

| Term | Definition |
|---|---|
| IDS | Intrusion Detection System - monitors for attacks |
| IPS | Intrusion Prevention System - blocks detected attacks |
| SIEM | Security Information and Event Management - correlates events |
| Anomaly Detection | Detecting unusual behavior vs. established baseline |
| Signature-based | Detecting known attack patterns |
Sensor Squad: The Night Watch Patrol!
“Shhh, everyone quiet!” Sammy the Sensor whispered, his detection lights dimmed to stealth mode. “I am on night watch duty, monitoring everything that moves on our network. Every single data packet that comes through – I check it!”
Max the Microcontroller nodded. “Sammy does two kinds of watching. First, he has a book of known troublemakers – like a wanted poster list. If data matches a known attack pattern, that is called signature-based detection. But the really clever part is his second trick: he learns what NORMAL looks like, so when something unusual happens – like a sensor suddenly sending ten times more data than usual at 3 AM – he spots it even if it is a brand new type of attack!”
“When Sammy catches something suspicious, that is where I come in!” Lila the LED exclaimed. “I flash bright red alerts to the security team. It is like a burglar alarm for the digital world. The faster we alert, the faster humans can respond and stop the bad guys.”
Bella the Battery added, “And I make sure the monitoring system NEVER runs out of power. Because what good is a security camera that goes dark? The average time to catch a breach without monitoring is almost 200 days. With our Sensor Squad on watch, we aim to catch problems in minutes, not months!”
10.2 Prerequisites
Before diving into this chapter, you should be familiar with:
- Defense in Depth: Where monitoring fits in the security architecture
- Access Control: Understanding what constitutes an access violation
10.3 Introduction
Security monitoring is the detection layer of a defense-in-depth strategy. While preventive controls such as firewalls, encryption, and access control reduce attack surface, no prevention is perfect. Monitoring systems continuously observe network traffic, device behavior, and system events to detect attacks in progress, enabling rapid response before significant damage occurs. This chapter covers two core detection approaches – signature-based and anomaly-based intrusion detection – along with the logging infrastructure and SIEM integration needed to operationalize them in IoT environments.
10.4 Intrusion Detection Systems
Intrusion detection systems monitor network traffic and system activity for malicious activity or policy violations.
How It Works: Signature-Based Detection
Step 1: Build Signature Database
- Security researchers analyze known malware and attacks
- Extract patterns: specific byte sequences, packet structures, behavioral indicators
- Example: Mirai botnet signature = telnet login with username “root” + password from list of 62 defaults
Step 2: Capture Network Traffic
- IDS monitors network packets (NIDS) or host activity (HIDS)
- Every packet analyzed against signature database in real-time
- Low overhead: simple pattern matching (regex or hash comparison)
Step 3: Match and Alert
- If packet matches signature → immediate alert generated
- Alert includes: signature name, source IP, destination, severity, timestamp
- Example: “SQL_INJECTION detected from 192.168.1.100 targeting device 192.168.10.50”
Step 4: Response Action
- IDS (detection): Log alert, notify security team
- IPS (prevention): Block traffic automatically + log + notify
Why It Works: Known attacks have recognizable patterns. Like a burglar alarm programmed to detect specific break-in techniques.
Limitation: Zero-day attacks with no existing signature go undetected. This is why anomaly detection is also needed.
How It Works: Anomaly-Based Detection Using Z-Score
Step 1: Establish Baseline (Learning Phase - 7-30 days)
- Monitor normal device behavior: packet sizes, transmission intervals, destinations contacted
- Calculate statistics: mean (average), standard deviation (variation)
- Example: Temperature sensor baseline = 1024 bytes every 60 seconds ± 50 bytes
Step 2: Compute Z-Score for Each New Observation
Z = (Current Value - Baseline Mean) / Baseline Standard Deviation
If Z > 3: Value is 3+ standard deviations away from normal
→ Only 0.3% of normal observations would be this extreme
Step 3: Threshold Decision
- Z > 3: Flag as MEDIUM severity (investigate)
- Z > 5: Flag as HIGH severity (likely attack)
- Z > 10: Flag as CRITICAL (definitely malicious)
Step 4: Alert and Adapt
- Generate alert with Z-score, current value, baseline for context
- Periodically retrain baseline (monthly) to adapt to legitimate changes
- Example: After firmware update, normal packet size increases from 1KB to 1.5KB → retrain baseline
Why It Works: Attacks cause behavioral changes. Data exfiltration = huge packet sizes. DDoS = massive transmission rate. Both are statistical outliers.
Limitation: Requires accurate baseline. False positives if baseline drifts. Higher computational cost than signature matching.
10.4.1 Signature-Based vs Anomaly-Based Detection
| Aspect | Signature-Based | Anomaly-Based |
|---|---|---|
| How it works | Match patterns against known attack signatures | Compare behavior against established baseline |
| Detection | Known attacks only | Novel attacks + known attacks |
| False positives | Low | Higher (requires tuning) |
| Setup effort | Low (use vendor signatures) | High (must establish baseline) |
| Maintenance | Update signature database | Retrain models periodically |
| Best for | Detecting known malware, exploits | Detecting zero-day attacks, insider threats |
10.4.2 Signature-Based Detection Example
```python
# Signature-based IDS for MQTT (simplified)
import re
from datetime import datetime

class SignatureBasedIDS:
    def __init__(self):
        self.signatures = [
            {
                "name": "MQTT_Injection",
                "pattern": r"[\x00-\x1f]",  # control characters
                "action": "block"
            },
            {
                "name": "SQL_Injection",
                "pattern": r"(SELECT|INSERT|DELETE|DROP)\s",
                "action": "alert"
            },
            {
                "name": "Command_Injection",
                "pattern": r"[;&|`$]",  # shell metacharacters
                "action": "block"
            },
            # Rate-based rules such as "Port_Scan: 5+ connection attempts
            # in 10 seconds" require stateful counters rather than regex
            # matching, so they are omitted from this pattern matcher.
        ]

    def analyze_traffic(self, packet):
        for sig in self.signatures:
            if re.search(sig["pattern"], packet.payload):
                self.trigger_alert(sig, packet)

    def trigger_alert(self, signature, packet):
        alert = {
            "type": signature["name"],
            "severity": "HIGH" if signature["action"] == "block" else "MEDIUM",
            "source_ip": packet.source_ip,
            "timestamp": datetime.now().isoformat()
        }
        self.send_to_siem(alert)

    def send_to_siem(self, alert):
        # Forward to the SIEM pipeline (Section 10.6.3); stubbed here.
        print(alert)
```

10.4.3 Anomaly-Based Detection Example
```python
# Anomaly-based IDS using statistical analysis
from datetime import datetime
import numpy as np

class AnomalyBasedIDS:
    def __init__(self):
        self.baseline = {}  # device_id -> baseline metrics

    def establish_baseline(self, device_id, historical_data):
        """Build baseline from 7-14 days of normal traffic."""
        self.baseline[device_id] = {
            "packet_size": {
                "mean": np.mean(historical_data["sizes"]),
                "std": np.std(historical_data["sizes"])
            },
            "interval": {
                "mean": np.mean(historical_data["intervals"]),
                "std": np.std(historical_data["intervals"])
            },
            "destinations": set(historical_data["destinations"])
        }

    def analyze_traffic(self, device_id, current_data):
        """Compare current behavior to baseline."""
        if device_id not in self.baseline:
            return  # no baseline yet
        baseline = self.baseline[device_id]

        # Check packet size anomaly (|z| > 3)
        z_size = (current_data["size"] - baseline["packet_size"]["mean"]) / \
                 baseline["packet_size"]["std"]
        if abs(z_size) > 3:
            self.trigger_alert("PACKET_SIZE_ANOMALY", device_id, z_size)

        # Check for destinations never seen during baselining
        if current_data["destination"] not in baseline["destinations"]:
            self.trigger_alert("NEW_DESTINATION", device_id,
                               current_data["destination"])

        # Check transmission interval (|z| > 3)
        z_interval = (current_data["interval"] - baseline["interval"]["mean"]) / \
                     baseline["interval"]["std"]
        if abs(z_interval) > 3:
            self.trigger_alert("INTERVAL_ANOMALY", device_id, z_interval)

    def trigger_alert(self, alert_type, device_id, details):
        alert = {
            "type": alert_type,
            "device_id": device_id,
            "details": details,
            "severity": "HIGH" if alert_type == "NEW_DESTINATION" else "MEDIUM",
            "timestamp": datetime.now().isoformat()
        }
        self.send_to_siem(alert)

    def send_to_siem(self, alert):
        # Forward to the SIEM pipeline (Section 10.6.3); stubbed here.
        print(alert)
```

10.5 Attack Detection Scenarios
| Attack Type | Normal Behavior | Anomalous Behavior | Detection Method |
|---|---|---|---|
| Data Exfiltration | 1 KB packets | 10 MB packets | Packet size z-score > 5 |
| DDoS Participation | 1 packet/min | 1000 packets/sec | Interval z-score > 5 |
| C&C Beaconing | No C&C contact | Contact unknown IP every 5 sec | Unknown destination IP |
| Cryptomining | 5-10% CPU | 95% CPU sustained | CPU usage z-score > 3 |
| Port Scanning | MQTT:8883 only | Scan ports 1-65535 | Connection count > 10 |
10.6 Security Logging and Audit Trails
Comprehensive logging enables incident investigation, compliance verification, and forensic analysis.
10.6.1 What to Log
| Event Category | Specific Events | Retention |
|---|---|---|
| Authentication | Login success/failure, session start/end | 1 year |
| Authorization | Access granted/denied, permission changes | 1 year |
| Configuration | Setting changes, firmware updates | 7 years |
| Network | Connection attempts, data transfers | 90 days |
| Security | IDS alerts, policy violations | 2 years |
| System | Boot, shutdown, errors | 90 days |
10.6.2 Security Log Schema
```json
{
  "event_id": "uuid-v4",
  "timestamp": "2026-01-10T14:32:15.234Z",
  "event_type": "AUTHENTICATION_ATTEMPT",
  "outcome": "FAILURE",
  "severity": "MEDIUM",
  "source": {
    "device_id": "sensor-42",
    "ip_address": "192.168.1.100",
    "mac_address": "AA:BB:CC:DD:EE:FF"
  },
  "target": {
    "service": "MQTT_BROKER",
    "endpoint": "mqtt.company.com:8883"
  },
  "details": {
    "reason": "INVALID_CERTIFICATE",
    "certificate_cn": "sensor-42.factory.com",
    "certificate_expiry": "2025-12-31T23:59:59Z"
  },
  "context": {
    "session_id": "sess-123456",
    "request_id": "req-789012"
  }
}
```

10.6.3 SIEM Integration
Security Information and Event Management (SIEM) systems correlate events across multiple sources to detect complex attacks.
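A minimal correlation rule makes this concrete. In the sketch below, the event field names, the five-minute window, and the three-failure threshold are illustrative assumptions rather than the conventions of any particular SIEM product; the rule links repeated authentication failures to a subsequent IDS alert from the same device:

```python
# Minimal SIEM-style correlation sketch: raise one incident when a
# device produces several auth failures followed by an IDS alert
# within a single time window. All names/thresholds are illustrative.
from collections import defaultdict

WINDOW_SECONDS = 300       # correlate events within 5 minutes
AUTH_FAILURE_THRESHOLD = 3

def correlate(events):
    """events: dicts with 'device_id', 'type', 'ts' (epoch seconds).
    Returns a list of correlated incidents."""
    by_device = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["ts"]):
        by_device[ev["device_id"]].append(ev)

    incidents = []
    for device_id, evs in by_device.items():
        for i, ev in enumerate(evs):
            if ev["type"] != "IDS_ALERT":
                continue
            # Count auth failures in the window preceding the IDS alert
            failures = [e for e in evs[:i]
                        if e["type"] == "AUTH_FAILURE"
                        and ev["ts"] - e["ts"] <= WINDOW_SECONDS]
            if len(failures) >= AUTH_FAILURE_THRESHOLD:
                incidents.append({
                    "device_id": device_id,
                    "pattern": "BRUTE_FORCE_THEN_ANOMALY",
                    "events": failures + [ev],
                })
    return incidents
```

Fed three `AUTH_FAILURE` events and one `IDS_ALERT` from the same device inside five minutes, `correlate` emits a single `BRUTE_FORCE_THEN_ANOMALY` incident instead of four unrelated alerts — the core value a SIEM adds over standalone IDS.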
Knowledge Check: Signature vs Anomaly Detection
Question: A new zero-day exploit targets MQTT brokers by sending malformed CONNECT packets. Your IoT network has both signature-based and anomaly-based IDS deployed. Which system detects this attack first, and why?
Click to reveal answer
Answer: The anomaly-based IDS detects it first. Signature-based IDS requires a pre-existing pattern match for the specific exploit, which does not exist for a zero-day attack. The anomaly-based system detects the malformed packets because they deviate from the established baseline of normal MQTT CONNECT packet structure (size, frequency, payload patterns). After the zero-day is analyzed and a signature is created, the signature-based IDS can then detect future occurrences with lower false-positive rates.
10.7 Practice Exercises
Exercise 1: Implementing RBAC with Audit Logging
Objective: Design and implement a role-based access control system for an IoT platform, with emphasis on the audit logging that feeds into the monitoring pipeline.
Scenario: You’re building a smart building management system with 3 user types: Administrators, Operators, and Viewers.
Tasks:
Define Role-Permission Matrix:

| Role | Read Sensors | Write Data | Control Actuators | Manage Devices | Admin Panel |
|---|---|---|---|---|---|
| Admin | Yes | Yes | Yes | Yes | Yes |
| Operator | Yes | No | Yes | No | No |
| Viewer | Yes | No | No | No | No |

Design Access Control Flow:
- User authenticates with credentials or certificate
- System looks up user’s role in the role-permission matrix
- Permission check: Does this role have the required permission?
- If yes, execute action and log success; if no, return 403 and log denial
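The flow above can be sketched in a few lines. The role and permission names follow the exercise's matrix; the function name and log fields are illustrative assumptions, with the audit record shaped after the Security Log Schema section:

```python
# Minimal RBAC check with audit logging (illustrative sketch; roles and
# permissions mirror the exercise's role-permission matrix).
from datetime import datetime, timezone

ROLE_PERMISSIONS = {
    "admin":    {"read_sensors", "write_data", "control_actuators",
                 "manage_devices", "admin_panel"},
    "operator": {"read_sensors", "control_actuators"},
    "viewer":   {"read_sensors"},
}

AUDIT_LOG = []  # in production: append to durable storage / forward to SIEM

def check_access(user, role, permission):
    """Return (allowed, http_status) and append an audit record
    for every decision, granted or denied."""
    allowed = permission in ROLE_PERMISSIONS.get(role, set())
    AUDIT_LOG.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": "AUTHORIZATION_DECISION",
        "outcome": "GRANTED" if allowed else "DENIED",
        "source": {"user": user, "role": role},
        "details": {"permission": permission},
    })
    return allowed, 200 if allowed else 403
```

For example, `check_access("alice", "viewer", "control_actuators")` returns `(False, 403)` and leaves a `DENIED` record behind — the denial itself is valuable monitoring data.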
Test RBAC Enforcement:

| Test | User Role | Action | Expected Result |
|---|---|---|---|
| 1 | Viewer | Unlock door | 403 Forbidden |
| 2 | Operator | Delete device | 403 Forbidden |
| 3 | Admin | Unlock door | 200 Success |
| 4 | (Invalid token) | Any | 401 Unauthorized |
Expected Outcome: Working RBAC implementation with complete audit trail. Every access decision (granted or denied) should be logged in the format described in the Security Log Schema section above, enabling SIEM correlation.
Exercise 2: Mutual TLS (mTLS) Authentication
Objective: Implement mutual TLS authentication for device-to-server communication.
Scenario: Your smart factory has 100 industrial sensors connecting to an MQTT broker. Implement mTLS so only authorized devices can connect.
Tasks:
Create Certificate Authority:

```shell
# Generate CA
openssl genrsa -out ca.key 4096
openssl req -new -x509 -days 3650 -key ca.key -out ca.crt \
  -subj "/C=US/ST=CA/O=MyCompany/CN=MyCompany Root CA"

# Generate server certificate
openssl genrsa -out server.key 2048
openssl req -new -key server.key -out server.csr \
  -subj "/C=US/ST=CA/O=MyCompany/CN=mqtt.mycompany.com"
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out server.crt -days 365

# Generate device certificate
openssl genrsa -out device1.key 2048
openssl req -new -key device1.key -out device1.csr \
  -subj "/C=US/ST=CA/O=MyCompany/CN=device-001"
openssl x509 -req -in device1.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out device1.crt -days 365
```

Configure MQTT broker for mTLS (mosquitto.conf):

```
listener 8883
certfile /etc/mosquitto/certs/server.crt
keyfile /etc/mosquitto/certs/server.key
cafile /etc/mosquitto/certs/ca.crt
require_certificate true
use_identity_as_username true
```

Test Security:
- Device with valid certificate connects successfully
- Device without certificate is rejected
- Device with wrong certificate is rejected
- Device rejects connection to fake broker
Expected Outcome: Bidirectional authentication with full audit trail. Log all connection attempts (successful and rejected) so your IDS can detect patterns like certificate brute-forcing or unauthorized device enrollment.
Exercise 3: Intrusion Detection with Anomaly Detection
Objective: Implement anomaly-based intrusion detection for IoT devices.
Scenario: Your smart home temperature sensors normally send 1KB every 60 seconds. Build a system to detect unusual behavior.
Tasks:
Establish Normal Baseline:

| Metric | Normal Value | Purpose |
|---|---|---|
| Packet Size | 1024 bytes ± 50 | Detect data exfiltration |
| Interval | 60 sec ± 5 | Detect beaconing/DDoS |
| Destinations | mqtt.company.com only | Detect C&C communication |

Implement Detection:

```python
def detect_anomaly(current, baseline):
    z_score = (current - baseline["mean"]) / baseline["std"]
    if abs(z_score) > 3:
        return "MEDIUM" if abs(z_score) <= 5 else "HIGH"
    return None
```

Test Detection Scenarios:

| Attack | Normal | Anomalous | Detection |
|---|---|---|---|
| Exfiltration | 1 KB | 10 MB | z > 5 |
| DDoS | 1/min | 1000/sec | z > 5 |
| C&C | Known IP | Unknown IP | New destination |
Expected Outcome: Working anomaly detection with tuned thresholds and false positive analysis.
Exercise 4: Secure OTA Updates with Signature Verification
Objective: Build a secure over-the-air update system with cryptographic verification.
Scenario: You have 1,000 IoT sensors deployed. Push a security patch without physical access.
Tasks:
Generate signing keys:

```shell
openssl genrsa -out firmware_signing_key.pem 2048
openssl rsa -in firmware_signing_key.pem -pubout -out firmware_public_key.pem
```

Sign firmware:

```shell
openssl dgst -sha256 -sign firmware_signing_key.pem \
  -sigopt rsa_padding_mode:pss \
  -out firmware_v2.0.sig \
  firmware_v2.0.bin
```

Verify on device (before installation):
- Compute SHA-256 hash of downloaded firmware
- Verify RSA signature with embedded public key
- Reject if signature invalid
- Install only if verification passes
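The device-side decision flow might look like the sketch below. To keep it self-contained, the platform-specific RSA-PSS check is abstracted behind a `verify_signature` callback (an assumption of this sketch; real firmware would call mbedTLS, wolfSSL, or a similar crypto library), so the ordering of checks is the point here, not the cryptography:

```python
# Device-side OTA verification sketch (illustrative). The actual
# RSA-PSS verification is delegated to `verify_signature`; this
# function enforces the order: version check -> hash -> signature.
import hashlib

def verify_and_install(firmware: bytes, signature: bytes,
                       new_version: tuple, current_version: tuple,
                       verify_signature) -> str:
    """Return 'INSTALLED' or a rejection reason; never install on failure."""
    # 1. Anti-rollback: reject anything not newer than what is running
    if new_version <= current_version:
        return "REJECTED_DOWNGRADE"
    # 2. Hash the exact bytes that were downloaded
    digest = hashlib.sha256(firmware).digest()
    # 3. Verify the signature over the digest with the embedded public key
    if not verify_signature(digest, signature):
        return "REJECTED_BAD_SIGNATURE"
    # 4. Only now hand the image to the installer
    return "INSTALLED"
```

Note the anti-rollback check runs before any cryptographic work: a correctly signed but older image must be rejected, otherwise an attacker can legitimately "update" devices back to a vulnerable firmware version.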
Test Security:
- Correct signature: Installation succeeds
- Modified firmware: Signature fails, blocked
- Wrong key signature: Verification fails, blocked
- Old firmware (downgrade): Version check fails
Expected Outcome: Secure OTA with anti-rollback protection. All update events (download, verification, installation, rollback) should be logged and forwarded to the SIEM for fleet-wide anomaly detection (e.g., unexpected update failures across multiple devices).
10.8 Knowledge Checks
10.9 Case Study: Verkada Camera Breach (2021)
Real-World Case Study: How Anomaly Detection Could Have Stopped the Verkada Hack
Company: Verkada, a cloud-based security camera company serving 5,200+ customers including hospitals, prisons, schools, and Tesla factories.
What happened (March 2021): A hacking group gained access to Verkada’s internal support tools using a hardcoded administrator credential found in an exposed Jenkins server. They accessed live feeds from 150,000+ cameras, including sensitive locations like Halifax Health hospital patient rooms and Tesla manufacturing floors.
Timeline:
- Attackers found exposed Jenkins CI/CD server with default credentials
- Used Jenkins access to discover a super-admin service account password
- Super-admin account had unrestricted access to all customer cameras
- Attackers had access for approximately 36 hours before Verkada detected the breach
- Detection came from a public tweet by the attackers, not from monitoring systems
What monitoring should have caught:
| Anomaly Signal | Normal Baseline | Attack Behavior | Detection Method |
|---|---|---|---|
| Admin login location | Mountain View, CA (HQ) | Multiple countries simultaneously | Geo-velocity check: impossible travel speed |
| Camera access pattern | Support tickets access 1-3 cameras per session | 150,000+ cameras accessed in batch | Volume anomaly: z-score > 100 |
| Access time pattern | Business hours, weekdays | Continuous access 24/7 over 36 hours | Time-based anomaly detection |
| Data export volume | ~50 MB/day for support | Terabytes of video exported | Egress volume z-score > 50 |
| Account behavior | 1 concurrent session per admin | Same account active from 10+ IPs | Session multiplicity alert |
Cost of inadequate monitoring:
- SEC investigation and regulatory fines
- $100+ million class-action lawsuits from affected customers
- Loss of customer trust – several major clients (including Tesla) publicly reviewed their contracts
- Verkada had to rebuild its entire access control architecture
Key lessons for IoT security monitoring:
- Never use hardcoded credentials – but if they exist, anomaly detection on privileged accounts catches misuse
- Baseline every admin account: typical login times, locations, access volumes, and session patterns
- Implement impossible-travel detection: if an account logs in from New York and London within 30 minutes, it is compromised
- Alert on access volume spikes: no legitimate admin needs to view 150,000 cameras in one session
- Detect data exfiltration: monitor egress bandwidth per account – terabytes of video export should trigger immediate investigation
The 36-hour detection gap is significant. With the anomaly-based IDS techniques described in this chapter (z-score thresholds on access volume, geo-velocity checks, session multiplicity monitoring), automated alerts would have fired within minutes of the attackers beginning to access cameras at scale.
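The impossible-travel check described above reduces to a great-circle distance and a speed threshold. This sketch assumes login records carry latitude, longitude, and a timestamp; the 900 km/h limit (roughly commercial-flight speed) is an illustrative policy choice:

```python
# Impossible-travel (geo-velocity) check sketch. The 900 km/h cutoff
# and the login-record shape are illustrative assumptions.
import math

MAX_PLAUSIBLE_KMH = 900  # faster than this between logins => compromised

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def impossible_travel(prev_login, new_login):
    """Each login: (lat, lon, epoch_seconds). True if the account would
    have had to move faster than MAX_PLAUSIBLE_KMH between logins."""
    lat1, lon1, t1 = prev_login
    lat2, lon2, t2 = new_login
    hours = max((t2 - t1) / 3600.0, 1e-6)  # guard against zero elapsed time
    return haversine_km(lat1, lon1, lat2, lon2) / hours > MAX_PLAUSIBLE_KMH
```

The New York-to-London example from the lessons list (roughly 5,570 km in 30 minutes, implying over 11,000 km/h) trips this check immediately, while an admin commuting across town does not.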
Worked Example: Detecting Data Exfiltration Using Z-Score Anomaly Detection
A smart factory has 200 IoT sensors. Each sensor normally sends 10 KB of telemetry data per hour to an MQTT broker. Sensor ID 42 suddenly sends 50 MB in one hour. Calculate whether this triggers an anomaly alert using z-score threshold detection.
Baseline (established over 30 days of normal operation):
Mean (μ): 10,240 bytes/hour per sensor
Standard deviation (σ): 512 bytes/hour (sensors are consistent)
Sample size: 200 sensors × 720 hours = 144,000 observations
Current observation (Sensor 42, hour 1842):
Actual traffic: 50,000,000 bytes (50 MB)
Z-Score Calculation:
z = (X - μ) / σ
Where:
X = current value = 50,000,000 bytes
μ = baseline mean = 10,240 bytes
σ = baseline standard deviation = 512 bytes
z = (50,000,000 - 10,240) / 512
z = 49,989,760 / 512
z = 97,636
Absolute z-score: |97,636| = 97,636
Threshold Evaluation:
Alert thresholds (from security policy):
MEDIUM severity: z > 3 (only 0.3% of normal values exceed this)
HIGH severity: z > 5 (only 0.00006% of normal values exceed this)
CRITICAL severity: z > 10 (virtually certain attack)
Sensor 42 z-score: 97,636
>> 10 (CRITICAL threshold)
Severity: CRITICAL
Alert Details:
```json
{
  "alert_id": "ALT-2026-01-08-001842",
  "severity": "CRITICAL",
  "type": "DATA_EXFILTRATION_DETECTED",
  "device_id": "sensor-042",
  "metric": "hourly_bytes_transmitted",
  "baseline_mean": 10240,
  "baseline_std": 512,
  "current_value": 50000000,
  "z_score": 97636,
  "deviation_factor": "4,882× baseline",
  "timestamp": "2026-01-08T14:32:00Z",
  "recommended_action": "ISOLATE_DEVICE_IMMEDIATELY"
}
```

Interpretation:
The z-score of 97,636 indicates the current traffic is 97,636 standard deviations away from the mean. In a normal distribution:

- z = 3: 99.7% of observations fall within (1 in 370 is beyond)
- z = 5: 99.99994% within (1 in 1.7 million beyond)
- z = 97,636: probability of natural occurrence ≈ 10^-2,074,000,000 (essentially impossible)
Conclusion: This is definitively an attack, not random variation. Sensor 42 is either compromised (data exfiltration) or malfunctioning. Automated response: Isolate sensor from network immediately, trigger investigation workflow.
Key Insight: Z-score detection requires no prior knowledge of attack signatures. The algorithm simply recognizes “this behavior is statistically impossible under normal conditions” and flags it, even if the attack is zero-day malware never seen before.
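The worked example's arithmetic can be reproduced directly; the function names below are illustrative, and the thresholds are the chapter's:

```python
# Z-score severity classification reproducing the worked example
# (baseline mean 10,240 B/hour, std 512 B/hour, observation 50 MB).
def z_score(value, mean, std):
    return (value - mean) / std

def classify(z):
    """Map |z| onto the chapter's severity thresholds."""
    z = abs(z)
    if z > 10:
        return "CRITICAL"
    if z > 5:
        return "HIGH"
    if z > 3:
        return "MEDIUM"
    return None

z = z_score(50_000_000, 10_240, 512)
print(round(z, 2), classify(z))  # 97636.25 CRITICAL
```

The exact quotient is 97,636.25, which the worked example rounds to 97,636 — either way, five orders of magnitude beyond the CRITICAL threshold of 10.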
10.9.1 Interactive: Z-Score Anomaly Detector
Adjust the parameters below to see how z-score anomaly detection works with different baselines and observed values.
Decision Framework: Choosing Between Signature-Based and Anomaly-Based IDS
You’re securing a 500-device IoT network with constrained budget. Compare signature-based vs anomaly-based intrusion detection systems.
| Criterion | Signature-Based IDS | Anomaly-Based IDS | Hybrid IDS | Best For |
|---|---|---|---|---|
| Detects Known Attacks | Excellent (99%+) | Good (85-90%) | Excellent (99%+) | CVE exploits, malware |
| Detects Zero-Day Attacks | Poor (0%) | Excellent (80-95%) | Excellent (90%+) | Novel threats |
| False Positive Rate | Very Low (0.1-1%) | Medium-High (5-20%) | Low (1-5%) | Operations impact |
| Setup Effort | Low (load vendor signatures) | High (30-90 day baseline) | High (baseline + signatures) | Time to deployment |
| Maintenance | Medium (update signatures monthly) | Medium (retrain quarterly) | High (both) | Ongoing effort |
| Resource Usage | Low (pattern matching) | High (statistical computation) | High (both engines) | Constrained gateways |
| Cost | $5-20/device/year | $10-40/device/year | $20-60/device/year | Budget |
Decision Criteria:
Use Signature-Based IDS if:
- Protecting against known threats (SQL injection, command injection, malware)
- Low false positive tolerance (can’t afford alarm fatigue)
- Limited baseline data (new deployment, <30 days operational)
- Constrained gateway resources (ESP32, Raspberry Pi)
- Example: Small smart home (20 devices, consumer-grade gateway)
- Cost: $100-400/year for 20 devices
- Limitation: Won’t catch novel attacks (assumes attacker uses known techniques)
Use Anomaly-Based IDS if:
- High-value target likely to face zero-day attacks (critical infrastructure, defense)
- Can afford 5-20% false positive rate (security team can investigate)
- Have 30-90 days of clean traffic to establish baseline
- Sufficient gateway resources (x86 server, commercial edge gateway)
- Example: Industrial control system (500 devices, dedicated security team)
- Cost: $5,000-20,000/year for 500 devices
- Benefit: can flag Stuxnet-like behavioral anomalies before signatures exist
Use Hybrid IDS if:
- Need both known threat protection and zero-day detection
- Can afford higher cost ($20-60/device/year)
- Have security operations center (SOC) to handle 1-5% false positive rate
- Example: Smart city infrastructure (2,000+ devices, city IT department)
- Cost: $40,000-120,000/year for 2,000 devices
- Approach: Signature-based catches 90% of known attacks (auto-block), anomaly-based flags suspicious behavior for human review
Recommended Hybrid Configuration (80% of Enterprise IoT Deployments):
Layer 1: Signature-Based (Snort/Suricata)
- Block known attacks automatically
- Rules for MQTT injection, CoAP exploits, Modbus attacks
- Update signatures weekly from vendor feeds
Layer 2: Anomaly Detection (Custom/Darktrace IoT)
- Flag z-score > 3 as MEDIUM (human review)
- Flag z-score > 5 as HIGH (auto-isolate + review)
- Metrics: packet size, interval, destination IPs, protocol usage
- Retrain baseline monthly (rolling 30-day window)
Layer 3: SIEM Correlation (Splunk/ELK)
- Correlate signature hits + anomaly alerts + authentication logs
- Multi-stage attack detection (e.g., scan → exploit → exfiltration)
Budget Allocation:
- $10K budget → Signature-based only (Snort + pfSense)
- $50K budget → Hybrid (Snort + basic anomaly + SIEM)
- $200K+ budget → Commercial hybrid (Darktrace IoT + CrowdStrike + SIEM)
Common Mistake: Setting Anomaly Detection Thresholds Too Tight Without Tuning Period
The Error: A developer deploys anomaly-based IDS with default threshold z > 2 (95% confidence) without a tuning period, expecting to catch all suspicious behavior immediately. Within 24 hours, the security team receives 847 alerts. Investigation reveals 839 (99%) are false positives caused by legitimate but unusual events (firmware update, weekend traffic drop, network maintenance).
Why It Fails: Statistical anomaly detection requires understanding normal variance before defining “abnormal”. A z-score threshold of 2 means 5% of all observations will trigger alerts even under perfectly normal conditions (by definition of 95% confidence interval). For a network with 500 devices generating 1 metric per minute:
Total observations per day: 500 devices × 1,440 minutes = 720,000 observations
Expected false positives at z > 2 threshold:
720,000 × 5% = 36,000 alerts/day (99% false positives)
Expected false positives at z > 3 threshold:
720,000 × 0.3% = 2,160 alerts/day
Expected false positives at z > 5 threshold:
720,000 × 0.00006% = 0.4 alerts/day (mostly true positives)
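The expected alert volumes above can be computed from the exact Gaussian tail probability rather than the rounded percentages (which is why these figures come out slightly below the text's 36,000 and 2,160):

```python
# Expected daily false-alert volume vs. z-threshold for 500 devices
# reporting one metric per minute, using the exact normal-tail
# probability P(|Z| > z) = erfc(z / sqrt(2)).
import math

def two_tailed_p(z):
    """P(|Z| > z) for a standard normal variable."""
    return math.erfc(z / math.sqrt(2))

observations_per_day = 500 * 1440  # 720,000
for z in (2, 3, 5):
    alerts = observations_per_day * two_tailed_p(z)
    print(f"z > {z}: ~{alerts:,.1f} expected false alerts/day")
```

Running this gives roughly 32,760 alerts/day at z > 2, 1,944 at z > 3, and 0.4 at z > 5 — confirming that the threshold choice, not the detector itself, determines whether the security team drowns in noise.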
The Impact:
The security team is overwhelmed by 847 alerts on day 1, spends 16 hours investigating, finds all but 8 are false positives, and disables the IDS entirely to stop the noise. Two weeks later, an actual data exfiltration attack (z=37) goes undetected because monitoring was turned off.
Real-World Example (2018 Healthcare IoT Deployment): A hospital deployed anomaly detection across 1,200 medical IoT devices (infusion pumps, monitors, beds). Initial threshold z > 2 generated 3,000+ alerts/day. Security staff investigated for 2 weeks, found 99.7% false positives (nurse call button spikes during shift change, bed weight sensors during patient transfer). They increased threshold to z > 4, reducing alerts to 12/day, and added 30-day tuning period to learn shift-change patterns. After tuning, false positive rate dropped to 5%, and the system detected a ransomware infection 6 minutes after initial device compromise (before lateral spread).
How to Avoid:
Phase 1: Baseline Period (30-90 days)
- Run IDS in monitoring-only mode (no alerts, just collect data)
- Capture full range of normal variation (weekday/weekend, business hours, maintenance windows)
- Calculate mean and standard deviation per metric per device
- Identify expected high-variance events (firmware updates, backups, shift changes)
Phase 2: Tuning Period (30 days)
- Start with conservative threshold: z > 5 (0.00006% false positive rate)
- Log all alerts, but don’t block traffic (alert-only mode)
- Review all alerts daily, classify as true positive or false positive
- Adjust thresholds based on false positive rate:
- Goal: <5% false positive rate per day
- If too many false positives: increase threshold to z > 6 or z > 7
- If no true positives in 30 days: decrease threshold to z > 4
Phase 3: Production (ongoing)
- Enable automated response for z > 7 (critical severity, very rare false positives)
- Human review for z > 4 (high severity, moderate false positive rate)
- Retrain baseline quarterly (rolling 90-day window)
Threshold Selection Guide:

| Z-Score | False Positive Rate | Alert Volume (500 devices) | Use Case |
|---|---|---|---|
| > 2 | 5% | 36,000/day | Never use (too many alerts) |
| > 3 | 0.3% | 2,160/day | Tuning phase only |
| > 4 | 0.006% | 43/day | Human review, normal operations |
| > 5 | 0.00006% | 0.4/day | Auto-alert, high confidence |
| > 7 | ~0% | 0.01/day | Auto-block, critical severity |
10.10 Concept Relationships
How Monitoring Concepts Connect
| Core Concept | Builds Upon | Enables | Real-World Application |
|---|---|---|---|
| Signature-Based IDS | Known attack patterns | Detection of Mirai, SQL injection, port scans | Blocks 90% of automated attacks with near-zero false positives |
| Anomaly-Based IDS | Statistical baselines | Zero-day attack detection | Can flag Stuxnet-class behavioral anomalies before signatures exist |
| SIEM Integration | IDS alerts + auth logs + access logs | Multi-stage attack correlation | Links failed auth → anomaly → data exfiltration into single incident |
| Audit Logging | Comprehensive event capture | Forensic investigation, compliance | HIPAA requires 6-year retention of all access attempts |
| Automated Response | IDS + SIEM | Quarantine, blocking, key revocation | Verkada breach: auto-quarantine would have limited blast radius |
Critical Insight: Monitoring is the final layer of defense-in-depth. Preventive controls (firewalls, encryption) reduce attack surface. Detective controls (IDS, logging) catch attacks that bypass prevention. Response controls (SIEM, SOAR) contain breaches before catastrophic damage.
Putting Numbers to It: SIEM Event Rate Capacity Planning
A SIEM system must ingest, correlate, and alert on security events across all monitored sources. Sizing requires calculating total event volume and peak throughput.
\[\text{Events}_{\text{total}} = \sum_{i=1}^{n} (\text{Devices}_i \times \text{Rate}_i \times \text{LogTypes}_i)\]
Working through an example:
Given: IoT deployment with 500 sensors, 100 gateways, 50 cameras, 10 servers
Step 1: Calculate Per-Source Event Rates
| Source | Count | Events/sec/device | Log Types | EPS Contribution |
|---|---|---|---|---|
| Sensors | 500 | 0.017 (1/min) | 2 (data + auth) | \(500 \times 0.017 \times 2 = 17\) |
| Gateways | 100 | 0.5 | 5 (fw, sys, auth, app, ids) | \(100 \times 0.5 \times 5 = 250\) |
| Cameras | 50 | 0.1 | 3 (motion, auth, error) | \(50 \times 0.1 \times 3 = 15\) |
| Servers | 10 | 10 | 8 (multi-tier logs) | \(10 \times 10 \times 8 = 800\) |
\[\text{EPS}_{\text{sustained}} = 17 + 250 + 15 + 800 = 1{,}082 \text{ events/sec}\]
Step 2: Calculate Peak Capacity Requirement
Peak traffic during attacks or firmware updates: \[\text{EPS}_{\text{peak}} = \text{EPS}_{\text{sustained}} \times \text{BurstFactor} = 1{,}082 \times 5 = 5{,}410 \text{ events/sec}\]
Step 3: Calculate Storage Requirements
Assuming 90-day retention with 500 bytes average per event: \[\text{Storage} = \text{EPS}_{\text{sustained}} \times 86{,}400 \frac{\text{sec}}{\text{day}} \times 90 \text{ days} \times 500 \text{ bytes}\] \[= 1{,}082 \times 86{,}400 \times 90 \times 500 \approx 4.2 \text{ TB (uncompressed)}\]
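Steps 1-3 can be collected into a short sizing script. This is a minimal sketch using the worked-example figures above; the device counts, per-device rates, log-type multipliers, burst factor, and 500-byte average event size are the example's assumptions, not vendor guidance.

```python
# SIEM capacity sizing: sustained EPS, peak EPS, and retention storage.
# Figures mirror the worked example; adjust for your own deployment.

SOURCES = {
    # name: (device count, events/sec/device, log types per device)
    "sensors":  (500, 0.017, 2),   # ~1 event/min, rounded as in the table
    "gateways": (100, 0.5,   5),
    "cameras":  (50,  0.1,   3),
    "servers":  (10,  10,    8),
}

BURST_FACTOR = 5          # headroom for attack or firmware-update bursts
RETENTION_DAYS = 90
AVG_EVENT_BYTES = 500

def sustained_eps(sources):
    """Sum of devices x rate x log-types across all source classes."""
    return sum(count * rate * types for count, rate, types in sources.values())

eps = sustained_eps(SOURCES)
peak = eps * BURST_FACTOR
storage_tb = eps * 86_400 * RETENTION_DAYS * AVG_EVENT_BYTES / 1e12

print(f"Sustained: {eps:.0f} EPS")       # 1082 EPS
print(f"Peak:      {peak:.0f} EPS")      # 5410 EPS
print(f"Storage:   {storage_tb:.1f} TB") # 4.2 TB uncompressed
```

Keeping the per-source breakdown in a table like `SOURCES` makes it easy to re-run the sizing when a deployment adds devices or enables additional log types.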
Step 4: Calculate Alert Precision and Recall
Given alert-triage results of True Positives (TP) = 45, False Positives (FP) = 12, False Negatives (FN) = 3: \[\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} = \frac{45}{45 + 12} = 0.789 = 78.9\%\] \[\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{45}{45 + 3} = 0.938 = 93.8\%\]
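The precision/recall arithmetic is a two-line computation worth keeping in your tuning toolkit. A minimal sketch, using the Step 4 counts (which are the worked example's illustrative values, not measured data):

```python
# Precision: of the alerts raised, what fraction were real attacks?
# Recall:    of the real attacks, what fraction raised an alert?

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

p, r = precision_recall(tp=45, fp=12, fn=3)
print(f"Precision: {p:.1%}  Recall: {r:.1%}")  # Precision: 78.9%  Recall: 93.8%
```

Tracking both numbers matters: tightening thresholds to raise precision usually lowers recall, so tuning is a trade-off, not a single dial.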
Result: The SIEM must handle 1,082 EPS sustained and 5,410 EPS peak, with approximately 4.2 TB of storage for 90-day retention. Current tuning achieves 78.9% precision (21.1% of alerts are false alarms) and 93.8% recall (catches 93.8% of actual threats).
In practice: IoT generates 10-100x more events than traditional IT (constant sensor telemetry versus occasional user actions). An under-provisioned SIEM drops events during attacks, exactly when visibility matters most. The math shows you must provision for peak load (5x sustained), not just the average.
10.10.1 Interactive: SIEM Capacity Planner
Adjust the deployment parameters to estimate SIEM sizing requirements for your IoT network.
10.11 See Also
Foundation Concepts:
- Defense in Depth - Monitoring fits in the detection layer of the security architecture
- Access Control - What access violations to detect
- Threats and Vulnerabilities - Attack patterns that IDS signatures target
Complementary Security:
- Network Segmentation - Limits lateral movement that IDS must monitor
- Intrusion Detection - IoT-specific IDS deployment strategies
- Secure Communications - Protecting the data being monitored
Advanced Topics:
- Zero Trust Security - Continuous verification complements monitoring
- Incident Response - What to do when IDS triggers alerts
10.12 Chapter Summary
Security monitoring provides visibility into system behavior, enabling detection of attacks that bypass preventive controls. Signature-based detection identifies known attack patterns with low false positive rates, while anomaly-based detection catches novel attacks by comparing current behavior against established baselines.
Comprehensive logging creates audit trails for incident investigation, compliance verification, and forensic analysis. SIEM systems correlate events across multiple sources to detect complex, multi-stage attacks that individual systems might miss.
The practice exercises reinforce key concepts: RBAC implementation, mTLS configuration, anomaly detection system design, and secure OTA update verification. Together, these monitoring capabilities complete the defense-in-depth security architecture.
10.13 What’s Next
This completes the Cybersecurity Methods series. For the complete overview and links to all topics, see the Cybersecurity Methods Overview.
For deeper exploration of related topics:
- Threats and Vulnerabilities - Understanding what you’re monitoring for
- Threat Modeling - Systematic approach to identifying risks
Common Pitfalls
1. Monitoring network traffic but not device behavior
Network-level monitoring detects anomalies in communication patterns but misses on-device indicators of compromise (unexpected processes, modified firmware, abnormal memory access). Combine network monitoring with device health telemetry for complete visibility.
2. Generating too many alerts to act on
A security monitoring system that produces thousands of alerts per day creates alert fatigue: operators start ignoring all alerts, including real incidents. Tune alert thresholds, implement correlation rules, and prioritize by asset criticality to keep alert volume manageable.
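One of the simplest fatigue-reduction techniques is suppressing repeats: collapse alerts for the same device and rule inside a time window so a burst shows up as one alert plus a suppression count. A minimal sketch; the 300-second window and the `(device, rule)` alert shape are assumptions, and production SIEMs offer richer correlation rules.

```python
from collections import defaultdict

SUPPRESS_WINDOW = 300  # seconds: repeats within this window are folded together

class AlertDeduplicator:
    """Emit the first alert per (device, rule); count repeats within the window."""

    def __init__(self, window: float = SUPPRESS_WINDOW):
        self.window = window
        self.last_seen = {}                 # (device, rule) -> last emitted timestamp
        self.suppressed = defaultdict(int)  # (device, rule) -> folded-repeat count

    def should_emit(self, device: str, rule: str, ts: float) -> bool:
        key = (device, rule)
        last = self.last_seen.get(key)
        if last is not None and ts - last < self.window:
            self.suppressed[key] += 1       # counted for context, not shown
            return False
        self.last_seen[key] = ts
        return True

dedup = AlertDeduplicator()
emitted = [dedup.should_emit("cam-07", "auth_fail_burst", t) for t in (0, 10, 40, 400)]
print(emitted)  # [True, False, False, True]
```

The suppressed counts are still valuable evidence: "1 alert (+212 suppressed)" tells an analyst far more than 213 identical pages.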
3. Collecting logs but not analyzing them
Centralizing device logs in a SIEM without defining detection rules, retention policies, and analyst workflows means the logs are useless in practice. Define what you are looking for before building the collection infrastructure.
4. Not establishing a normal-behavior baseline before monitoring for anomalies
Anomaly-based monitoring requires a baseline of normal behavior to compare against. Deploying monitoring without one produces endless false positives, because every legitimate-but-infrequent activity triggers an alert.