135  SDN Analytics & OpenFlow

In 60 Seconds

OpenFlow statistics collection queries five counter types: flow (packet/byte counts per rule), port (RX/TX per interface), table (active entries, lookups, matches), queue (queue depth, drops), and meter (rate limiting enforcement). Poll every 5-10 seconds for real-time monitoring or 30-60 seconds for baseline establishment. Rolling-window baselines (24-hour windows with 2-sigma normal ranges and 3-sigma alert thresholds) enable anomaly detection with <1% false positive rate for stable IoT networks.

135.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Collect OpenFlow Statistics: Query flow, port, table, queue, and meter statistics from switches using standardized OpenFlow messages
  • Configure Polling Intervals: Justify polling interval selections that balance detection speed against controller overhead
  • Construct Analytics Workflows: Build three-step monitoring pipelines (collection, detection, response) for IoT environments
  • Establish Baselines: Calculate rolling-window baselines and statistical thresholds for anomaly detection
  • Scale Analytics Systems: Design sampling and tiered polling strategies for large network deployments exceeding 500 switches

Software-Defined Networking (SDN) separates the brain of a network (the control plane) from the muscles (the data plane). Think of a traffic management center: instead of each traffic light making its own decisions, a central system monitors all intersections and coordinates them for optimal flow. SDN brings this same centralized intelligence to IoT networks.

135.2 Prerequisites

Before diving into this chapter, you should be familiar with:

Pitfall: Polling Statistics Too Frequently and Overloading the Controller

The Mistake: Setting aggressive polling intervals (every 1-5 seconds) across all switches and all flow tables to achieve “real-time” visibility, which overwhelms the controller CPU and causes delayed responses to actual network events.

Why It Happens: Teams equate faster polling with better security and visibility. They don’t calculate the actual message load: polling 100 switches every 5 seconds with 1000 flows each generates 20,000 statistics messages per second. The controller becomes a bottleneck, ironically reducing its ability to respond quickly to genuine threats.

The Fix: Use tiered polling intervals based on criticality: 10-15 seconds for port statistics, 15-30 seconds for flow statistics, 30-60 seconds for table statistics. For large networks (more than 500 switches), implement sampling where you poll 10-20% of switches each interval on a rotating basis. Use event-driven collection (PACKET_IN triggers) for suspicious flows rather than constant polling. Monitor controller CPU and message queue depth as key health metrics. If you need sub-second visibility for specific flows, install those flows with counters and poll only those entries, not the entire flow table.
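The rotating-sample strategy above can be sketched in a few lines. The helper name `sampling_schedule` is illustrative, not from any controller framework; a real controller would walk through the returned groups, polling one group per cycle so the whole network is covered over several cycles.

```python
from typing import List

def sampling_schedule(switch_ids: List[str], fraction: float) -> List[List[str]]:
    """Partition switches into rotating poll groups.

    Each polling cycle queries one group (roughly `fraction` of the
    network); cycling through all groups restores full coverage.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    group_size = max(1, round(len(switch_ids) * fraction))
    return [switch_ids[i:i + group_size]
            for i in range(0, len(switch_ids), group_size)]
```

With 100 switches and a 20% fraction, this yields five groups of 20 switches, so a 15-second cycle gives every switch fresh statistics at least once every 75 seconds.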

135.3 OpenFlow Statistics Collection

OpenFlow protocol provides standardized statistics messages for network monitoring:

Figure 135.1: OpenFlow Statistics Collection: Five Types of Switch Metrics. The SDN controller sends a statistics request to each OpenFlow switch; the switch replies with flow stats (per-flow counters, 15-30s), port stats (RX/TX counters, 10-15s), table stats (flow table usage, 30-60s), queue stats (QoS metrics, 15-30s), and meter stats (rate limiting, 15-30s). Replies feed the processing and analysis layer before the cycle repeats.

OpenFlow Statistics Types:

| Statistics Type | Information Provided | Update Frequency | IoT Use Case |
|---|---|---|---|
| Flow Stats | Per-flow packet/byte counts, duration | 15-30s | Identify elephant flows, detect DDoS |
| Port Stats | Per-port RX/TX counters, errors, drops | 10-15s | Monitor device health, detect failures |
| Table Stats | Flow table utilization, lookups, matches | 30-60s | Capacity planning, rule optimization |
| Queue Stats | Per-queue packet counts, errors | 15-30s | QoS verification, priority enforcement |
| Meter Stats | Rate-limiting statistics, band counts | 15-30s | Verify rate limits, adjust thresholds |
| Group Stats | Multi-path forwarding statistics | 30-60s | Load balancing analysis |

135.4 Implementation Workflow

Scenario: Monitor IoT sensor network for unusual traffic patterns

Three-Step Implementation Process:

Figure 135.2: SDN Analytics Implementation: Three-Step Workflow with Baseline Comparison. Step 1 configures collection (register switches, poll at 15-second intervals), Step 2 processes and detects (extract flow metrics, compare against the baseline, flag anomalies exceeding 3x normal), and Step 3 responds automatically (create a rate-limiting meter, install a flow rule, log the action). A baseline database stores mean +/- 2 standard deviations over a 24-hour window, supplying comparison data and receiving updates.

135.4.1 Step 1: Configure Periodic Statistics Collection

The controller maintains connections to all switches and periodically requests statistics:

  • Initialize Data Structures: Store switch connections and historical flow statistics
  • Start Monitoring Thread: Background process polls switches every 15 seconds
  • Send Statistics Requests: OpenFlow messages (FlowStatsRequest, PortStatsRequest) to each switch
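A minimal sketch of Step 1. The injected `send_request` callable stands in for a real controller's OpenFlow send path (e.g. building and sending an OFPFlowStatsRequest); the class and method names here are illustrative, not from any specific framework.

```python
import threading
import time
from typing import Callable, Dict, List

class StatsCollector:
    """Background poller: requests statistics from every registered switch."""

    def __init__(self, send_request: Callable[[str], None], interval: float = 15.0):
        self.send_request = send_request        # stand-in for the OpenFlow send path
        self.interval = interval                # polling interval in seconds
        self.switches: List[str] = []
        self.history: Dict[str, list] = {}      # per-switch reply history

    def register(self, switch_id: str) -> None:
        """Initialize data structures for a newly connected switch."""
        self.switches.append(switch_id)
        self.history[switch_id] = []

    def poll_once(self) -> int:
        """One polling cycle: send a stats request to each switch."""
        for sw in self.switches:
            self.send_request(sw)
        return len(self.switches)

    def start(self) -> threading.Thread:
        """Start the monitoring thread that polls on a fixed interval."""
        def loop():
            while True:
                self.poll_once()
                time.sleep(self.interval)
        t = threading.Thread(target=loop, daemon=True)
        t.start()
        return t
```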

135.4.2 Step 2: Process Statistics and Detect Anomalies

When statistics replies arrive, the controller analyzes traffic patterns:

  • Extract Flow Metrics: Parse source/destination IPs, packet counts, byte counts, duration
  • Calculate Rates: packets_per_sec = packet_count / duration
  • Compare Against Baseline: Retrieve historical mean for source IP
  • Flag Anomalies: If current rate > 3x baseline, trigger alert and mitigation
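The rate and threshold checks in Step 2 reduce to two small functions (the names are illustrative):

```python
def packets_per_sec(packet_count: int, duration_sec: float) -> float:
    """Rate from cumulative flow-stats counters: packet_count / duration."""
    return packet_count / duration_sec if duration_sec > 0 else 0.0

def is_anomalous(current_pps: float, baseline_pps: float,
                 multiplier: float = 3.0) -> bool:
    """Flag a flow whose rate exceeds `multiplier` times its baseline."""
    return current_pps > multiplier * baseline_pps
```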

135.4.3 Step 3: Automated Response Implementation

Upon detecting an anomaly, the controller installs mitigation rules:

  • Create Meter: OpenFlow meter band with rate limit (e.g., 100 kbps) and burst size
  • Install Flow Rule: Match suspicious source IP, apply meter, forward normally (rate-limited)
  • Log Action: Record mitigation for auditing and future analysis
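A schematic sketch of Step 3. The dict layout only illustrates what a real OFPMeterMod/OFPFlowMod pair would carry, and `build_mitigation` is a hypothetical helper name; a production controller would serialize these into actual OpenFlow messages.

```python
def build_mitigation(src_ip: str, rate_kbps: int = 100,
                     burst_kb: int = 10, meter_id: int = 1):
    """Return schematic meter, flow rule, and log line for a flagged source."""
    meter = {"meter_id": meter_id,
             "bands": [{"type": "DROP",            # drop traffic above the rate
                        "rate_kbps": rate_kbps,
                        "burst_kb": burst_kb}]}
    flow = {"match": {"ipv4_src": src_ip},         # match the suspicious source
            "instructions": [f"meter:{meter_id}",  # rate-limit via the meter
                             "output:NORMAL"],     # then forward normally
            "priority": 100}
    log = f"mitigation: rate-limited {src_ip} to {rate_kbps} kbps"
    return meter, flow, log
```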

135.5 Baseline Establishment Strategy

| Aspect | Implementation Approach |
|---|---|
| Data Collection | Store per-source metrics (packets/sec, bytes/sec) in time-series database |
| Window Size | Rolling 24-hour window for typical daily patterns |
| Statistical Model | Calculate mean (μ) and standard deviation (σ) |
| Normal Range | μ +/- 2σ captures 95% of traffic under normal conditions |
| Anomaly Threshold | Alert when current rate > μ + 3σ (99.7% confidence) |
| Cold Start | Use default baseline (e.g., 10 pps) for new devices with <10 samples |
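The baseline strategy above can be sketched as a small per-source class. The defaults follow the table; the class name and the sample-count window (rather than a timestamp-keyed 24-hour window) are simplifying assumptions.

```python
import statistics
from collections import deque

class RollingBaseline:
    """Per-source rolling baseline with a cold-start default."""

    def __init__(self, max_samples: int = 100,
                 cold_start_pps: float = 10.0, min_samples: int = 10):
        # deque(maxlen=...) discards the oldest sample automatically,
        # approximating a rolling window
        self.samples = deque(maxlen=max_samples)
        self.cold_start_pps = cold_start_pps
        self.min_samples = min_samples

    def add(self, pps: float) -> None:
        self.samples.append(pps)

    def mean(self) -> float:
        if len(self.samples) < self.min_samples:
            return self.cold_start_pps          # cold start: default baseline
        return statistics.fmean(self.samples)

    def threshold(self) -> float:
        """Anomaly threshold: mean + 3 sigma (99.7% confidence)."""
        if len(self.samples) < self.min_samples:
            return 3 * self.cold_start_pps      # conservative cold-start value
        return self.mean() + 3 * statistics.stdev(self.samples)
```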

135.6 Performance Considerations

Polling Interval Tradeoff: Faster detection vs. controller overhead

| Interval | Use Case | Controller Impact |
|---|---|---|
| 5-10 seconds | Critical infrastructure requiring rapid response | High (reserve for small networks) |
| 15-30 seconds | Typical IoT deployments | Moderate (recommended default) |
| 60-120 seconds | Low-priority monitoring with minimal overhead | Low (suitable for large networks) |

Scalability Analysis:

  • 1000 flows x 15-second polling = ~67 statistics messages/second
  • Modern controllers handle 10,000+ messages/second
  • Use sampling for very large networks (monitor 10% of flows, rotate coverage)

Storage Requirements:

  • ~100 samples/source x 1000 sources x 50 bytes/sample = 5 MB (manageable in-memory)
  • For persistent storage, use time-series databases (InfluxDB, TimescaleDB)

Calculating Statistics Message Load

A campus network with 200 switches and average 500 active flows per switch needs continuous monitoring. Calculate controller message load:

Flow statistics polling at 15-second intervals:

  • Total flows: \(N_{flows} = 200 \times 500 = 100{,}000\) flows
  • Messages per poll cycle: 200 stats requests + 100,000 stats replies
  • Cycle duration: 15 seconds

Message rate:

\[R_{messages} = \frac{N_{requests} + N_{replies}}{T_{cycle}} = \frac{200 + 100{,}000}{15} = 6{,}680 \text{ messages/second}\]

Bandwidth consumption (assuming 128 bytes per stats reply):

\[B_{stats} = \frac{100{,}000 \times 128 \text{ bytes}}{15 \text{ s}} = \frac{12.8 \text{ MB}}{15} = 0.85 \text{ MB/s} = 6.8 \text{ Mbps}\]

Controller CPU (2 ms processing per 1000 stats):

\[T_{CPU} = \frac{100{,}000}{1{,}000} \times 0.002 = 0.2 \text{ seconds of CPU per 15-second cycle} = 1.33\% \text{ utilization}\]

Scaling to 1,000 switches: \(R_{messages} = 33{,}400\) msg/s, \(B_{stats} = 34\) Mbps, \(CPU = 6.7\%\). Still manageable.

Bottleneck emerges at 5,000 switches: \(R_{messages} = 167{,}000\) msg/s exceeds typical controller capacity (10K-50K msg/s). Solution: sampling (poll 20% of switches per cycle, rotating), reducing load to \(33{,}400\) msg/s while maintaining coverage.
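The calculation above can be packaged as a reusable function. The function name `stats_load` is illustrative; the 128-byte reply size and 2 ms per 1,000 stats CPU cost are the worked example's own assumptions.

```python
def stats_load(n_switches: int, flows_per_switch: int, interval_s: float,
               reply_bytes: int = 128, cpu_s_per_1000: float = 0.002):
    """Controller load for flow-stats polling (formulas from the worked example).

    Returns (messages/sec, reply bandwidth in Mbps, CPU utilization fraction).
    """
    n_flows = n_switches * flows_per_switch
    # One request per switch plus one reply per flow, spread over the cycle
    msgs_per_sec = (n_switches + n_flows) / interval_s
    # Reply bandwidth in Mbps (bytes -> bits, per second, -> mega)
    mbps = n_flows * reply_bytes * 8 / interval_s / 1e6
    # CPU seconds per cycle divided by cycle length = utilization fraction
    cpu_util = (n_flows / 1000) * cpu_s_per_1000 / interval_s
    return msgs_per_sec, mbps, cpu_util
```

For the 200-switch campus this reproduces about 6,680 messages/second, roughly 6.8 Mbps of reply traffic, and about 1.33% CPU utilization; scaling the switch count scales all three linearly.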

Tiered Polling Strategy for Large Networks:

Tier 1 (Critical): 10-second polling
  - Edge switches connecting high-value assets
  - Internet gateway switches
  - ~10% of switches

Tier 2 (Standard): 30-second polling
  - Distribution layer switches
  - Building aggregation
  - ~30% of switches

Tier 3 (Low Priority): 60-second polling
  - Access layer switches
  - Low-traffic segments
  - ~60% of switches

135.6.1 Interactive: Statistics Message Load Calculator

Calculate the controller message load for your network deployment and determine if tiered polling is needed.

135.7 Knowledge Check

135.8 Worked Example: Detecting IoT Botnet Traffic in a Smart Campus

Scenario: A university campus has 3,000 IoT devices (smart thermostats, occupancy sensors, IP cameras) managed via an SDN controller (ONOS) with 12 OpenFlow switches. Security operations receive an alert that compromised IoT devices are being used in a DDoS botnet. Design an OpenFlow statistics-based detection and mitigation pipeline.

Network Characteristics:

| Device Type | Count | Normal Traffic | Update Interval |
|---|---|---|---|
| Thermostats | 1,500 | 2 pps (MQTT publish) | 60 seconds |
| Occupancy sensors | 1,000 | 0.5 pps (CoAP observe) | 30 seconds |
| IP cameras | 500 | 200 pps (RTSP stream) | Continuous |

Step 1: Establish Normal Traffic Baselines

Collect 7 days of flow statistics to build per-device-type baselines:

Thermostat baseline (1,500 devices):
  Mean: 2.1 pps    Std dev: 0.8 pps
  Normal range (mean +/- 2 sigma): 0.5 - 3.7 pps
  Anomaly threshold (mean + 3 sigma): 4.5 pps

Occupancy sensor baseline (1,000 devices):
  Mean: 0.6 pps    Std dev: 0.3 pps
  Normal range: 0.0 - 1.2 pps
  Anomaly threshold: 1.5 pps

IP camera baseline (500 devices):
  Mean: 195 pps    Std dev: 35 pps
  Normal range: 125 - 265 pps
  Anomaly threshold: 300 pps

Step 2: Configure Tiered Polling

Tier 1 (15-second polling): 2 edge switches connecting
  cameras and external-facing ports
  Statistics messages: 2 switches x (flow + port) = 4 msgs
  per 15s = 0.27 msgs/sec

Tier 2 (30-second polling): 6 distribution switches
  connecting building aggregation
  Statistics messages: 6 x 2 = 12 msgs per 30s = 0.4 msgs/sec

Tier 3 (60-second polling): 4 access switches
  for low-traffic sensor VLANs
  Statistics messages: 4 x 2 = 8 msgs per 60s = 0.13 msgs/sec

Total controller load: 0.8 msgs/sec (well within ONOS
  capacity of 10,000+ msgs/sec)
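The per-tier arithmetic above generalizes to a one-line helper (`tiered_load` is a hypothetical name):

```python
def tiered_load(tiers):
    """Total stats-message rate for tiered polling.

    Each tier is (n_switches, requests_per_switch, interval_s);
    e.g. requests_per_switch = 2 for one flow plus one port request.
    """
    return sum(n * reqs / interval for n, reqs, interval in tiers)
```

For the campus above, `tiered_load([(2, 2, 15), (6, 2, 30), (4, 2, 60)])` gives the 0.8 messages/second quoted in the example.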

Step 3: Detect Anomaly

At 2:47 AM on Tuesday, the controller detects anomalous flow statistics:

| Metric | Normal | Detected | Multiplier |
|---|---|---|---|
| Outbound pps (127 thermostats) | 2.1 pps each | 850 pps each | 405x |
| Destination diversity | 1-3 IPs (MQTT broker) | 47,000 unique IPs | 15,667x |
| Packet size | 120-200 bytes (MQTT) | 64 bytes (SYN flood) | Suspicious |
| Protocol | TCP port 8883 (MQTTS) | TCP port 80 (HTTP) | Wrong protocol |

Detection logic: 127 thermostats exceeded the 4.5 pps threshold simultaneously. Destination diversity (47,000 unique IPs instead of 1-3) and protocol mismatch (HTTP instead of MQTTS) confirm botnet behavior. Aggregate attack traffic: 127 devices x 850 pps x 64 bytes = 6.9 MB/s (about 55 Mbps) outbound.
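The three independent signals in the detection logic can be combined into a simple score. The thresholds mirror the scenario; the 10x destination-diversity multiplier is an assumed heuristic, not from the source.

```python
def botnet_signals(pps: float, threshold_pps: float,
                   unique_dsts: int, expected_dsts: int,
                   dst_port: int, allowed_ports: set) -> int:
    """Count independent anomaly signals; requiring >= 2 cuts false positives."""
    signals = 0
    if pps > threshold_pps:
        signals += 1                       # rate anomaly
    if unique_dsts > 10 * expected_dsts:
        signals += 1                       # destination-diversity anomaly
    if dst_port not in allowed_ports:
        signals += 1                       # protocol anomaly
    return signals
```

A compromised thermostat from the scenario (850 pps, 47,000 destinations, HTTP) scores 3 of 3; a healthy one (2.1 pps, MQTTS to its broker) scores 0.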

Step 4: Automated Mitigation

The controller executes a three-phase response within 2 seconds of detection:

Phase 1 (immediate, 200ms): Rate-limit compromised devices
  For each of 127 flagged thermostats:
    Install OpenFlow meter: 5 pps max (above normal 2.1 pps,
    preserves legitimate MQTT traffic)
    Install flow rule: match src_ip=<thermostat>,
    apply meter, forward normally

Phase 2 (1 second): Quarantine network segment
  Install flow rule on edge switches:
    match src_ip=10.20.0.0/16 (thermostat VLAN),
    dst_port=80 -> DROP
  Result: Block all HTTP from thermostat VLAN
  (MQTTS on port 8883 still allowed)

Phase 3 (5 seconds): Alert and log
  REST API call to SIEM: incident details, affected MACs
  Update baseline: exclude anomaly window from rolling average
  Generate report: 127 compromised devices, 6.9 MB/s attack

Step 5: Measure Effectiveness

| Metric | Before Mitigation | After Phase 1 | After Phase 2 |
|---|---|---|---|
| Attack traffic | 6.9 MB/s | 0.04 MB/s | 0 |
| Legitimate MQTT | 100% delivered | 100% delivered | 100% delivered |
| Detection-to-mitigation | - | 200 ms | 1.2 seconds |
| False positives (other thermostats) | - | 0 (per-IP targeting) | 3 (VLAN-wide HTTP block) |

Key lessons:

  1. Per-device baselines catch botnet behavior that aggregate monitoring misses: 127 out of 1,500 thermostats (8.5%) were compromised. An aggregate threshold tuned to total campus volume could easily miss a change driven by a small fraction of devices, but per-device monitoring flagged every compromised thermostat individually.

  2. Protocol-aware detection reduces false positives: Thermostats should never generate HTTP traffic. Protocol mismatch detection identified the attack before rate thresholds alone would have. Combining rate anomaly + destination anomaly + protocol anomaly achieves near-zero false positives.

  3. Tiered mitigation preserves legitimate services: Rate-limiting (Phase 1) immediately reduces attack impact while preserving legitimate MQTT. VLAN-level protocol blocking (Phase 2) stops the attack completely. Neither action disrupts normal thermostat operation because legitimate traffic uses MQTTS on port 8883, not HTTP on port 80.

  4. SDN response time vs traditional networks: Traditional network response requires manual firewall rule changes (15-30 minutes). The SDN automated response took 1.2 seconds from detection to full mitigation. For an attack streaming 6.9 MB/s, even a 15-minute manual response window means roughly 6 GB of attack traffic that automation prevents.

Key Concepts

  • SDN (Software-Defined Networking): An architectural approach separating the network control plane (routing decisions) from the data plane (packet forwarding), centralizing control in a software controller for programmable network management
  • Control Plane: The network intelligence layer making routing and forwarding decisions, centralized in an SDN controller rather than distributed across individual switches as in traditional networking
  • Data Plane: The network forwarding layer physically moving packets based on rules installed by the control plane — in SDN, this is the switch hardware executing OpenFlow flow table entries
  • OpenFlow: The foundational SDN protocol enabling communication between an SDN controller and network switches, allowing the controller to install, modify, and delete flow table entries that govern packet forwarding
  • Flow Table Statistics: Per-flow byte and packet counters maintained by OpenFlow switches, polled by the controller via OFPST_FLOW requests to track traffic volumes, detect inactive flows, and populate analytics dashboards
  • Port Statistics: Per-physical-port counters (TX/RX bytes, packets, errors, dropped) available via OFPST_PORT requests, used to detect link utilization, errors, and congestion on SDN switch interfaces
  • Proactive Flow Installation: Pre-installing flow rules in switches before traffic arrives based on predicted patterns, avoiding per-flow controller consultation delay (packet-in latency) for expected traffic — essential for latency-sensitive IoT control traffic

Common Pitfalls

Querying OpenFlow flow statistics every 100 ms from all 100 switches. At 10 switches per 100 ms cycle with 1000 flows per switch, the controller processes 100,000 statistics responses per second — consuming significant CPU that should be available for flow installation. Use 5–30 second polling intervals for non-critical analytics.

Interpreting a long-duration flow table entry as an indication of an active connection. Flow entries persist until explicitly deleted or their idle_timeout expires — a 1-day-old entry may represent an IoT device that disconnected hours ago. Use idle_timeout to garbage-collect stale entries.

Installing flow rules without idle_timeout or hard_timeout values. In busy IoT networks, stale flow entries accumulate, consuming switch flow table memory (typically 2,000–10,000 entries on commodity hardware). Always set appropriate timeouts based on expected IoT session duration.
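A schematic flow rule with both timeouts set, as this pitfall recommends. The field names mirror OpenFlow FlowMod concepts, but the dict layout and `iot_flow_rule` helper are illustrative only.

```python
def iot_flow_rule(src_ip: str, idle_timeout_s: int = 120,
                  hard_timeout_s: int = 3600) -> dict:
    """Flow rule that self-expires, so stale entries never accumulate."""
    if idle_timeout_s <= 0 or hard_timeout_s < idle_timeout_s:
        raise ValueError("timeouts must be positive, hard >= idle")
    return {"match": {"ipv4_src": src_ip},
            "idle_timeout": idle_timeout_s,   # expire after inactivity
            "hard_timeout": hard_timeout_s,   # absolute lifetime cap
            "actions": ["output:NORMAL"]}
```

Choosing `idle_timeout` slightly above the device's reporting interval (e.g. 120 s for a 60-second MQTT publisher) keeps active flows installed while garbage-collecting disconnected devices.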

Relying on packet-in events to the controller for every new IoT device flow. Controller round-trip time (10–100 ms) adds latency to the first packet of every new connection. Pre-install wildcard flow rules for known IoT device communication patterns using proactive flow installation.

135.9 Summary

This chapter covered practical implementation of SDN analytics using OpenFlow:

OpenFlow Statistics Types:

  • Flow Stats: Per-flow packet/byte counts and duration for traffic analysis
  • Port Stats: RX/TX counters and error rates for device health monitoring
  • Table Stats: Flow table utilization for capacity planning
  • Queue Stats: QoS metrics for priority enforcement verification
  • Meter Stats: Rate-limiting statistics for threshold adjustment

Three-Step Implementation:

  1. Configure Collection: Register switches, start polling threads, set intervals
  2. Process & Detect: Extract metrics, calculate rates, compare against baselines
  3. Automated Response: Create meters, install flow rules, log actions

Baseline Strategy:

  • 24-hour rolling window captures daily traffic patterns
  • Mean + 3σ alert threshold provides 99.7% confidence
  • Cold start with conservative defaults for new devices
  • Weekly updates to adapt to changing patterns

Performance Optimization:

  • Tiered polling intervals based on criticality (10s/30s/60s)
  • Sampling strategy for large networks (10-20% rotating coverage)
  • Monitor controller CPU and message queue depth
  • Event-driven collection for specific suspicious flows

OpenFlow statistics are like a fitness tracker for your network – counting every message, measuring speed, and alerting you when something seems off!

135.9.1 The Sensor Squad Adventure: The Network Fitness Tracker

The Sensor Squad wanted to keep their network healthy, so they gave every switch a fitness tracker! Each tracker counted five important things:

  1. Flow Counter: “I count how many messages each rule handles!” (Like counting steps)
  2. Port Counter: “I watch how busy each connection is!” (Like measuring heart rate)
  3. Table Counter: “I track how full the rule book is!” (Like checking how full your backpack is)
  4. Queue Counter: “I measure how long messages wait in line!” (Like timing how long you wait for lunch)
  5. Meter Counter: “I check if anyone is going too fast!” (Like a speed limit checker)

Every 15 seconds, Connie the Controller collected all the fitness data. One day, the Port Counter on Switch 3 shouted: “My utilization just jumped from 5% to 95%! Something is wrong!”

Connie compared this to the baseline – normally that port only used 10% of its capacity. This was definitely abnormal! Connie installed a rate-limiting meter to slow down the suspicious traffic and sent an alert to the security team.

“See?” said Sammy the Sensor. “By checking the fitness trackers regularly, we catch problems before they become disasters!”

135.9.2 Key Words for Kids

| Word | What It Means |
|---|---|
| Statistics | Numbers that tell you how the network is doing (like a report card) |
| Polling | Checking the numbers at regular intervals (like checking your watch every few minutes) |
| Baseline | What the normal numbers look like, so you know when something is unusual |
Key Takeaway

OpenFlow statistics collection provides five types of switch metrics (flow, port, table, queue, meter) that enable anomaly detection when compared against 24-hour rolling baselines. Use tiered polling intervals (10s for critical infrastructure, 30s for standard, 60s for low-priority) to balance detection speed against controller overhead, and implement sampling for networks exceeding 500 switches.

135.10 What’s Next

| If you want to… | Read this |
|---|---|
| Study SDN analytics architecture | SDN Analytics Architecture |
| Explore SDN anomaly detection | SDN Anomaly Detection |
| Learn about SDN controllers and use cases | SDN Controllers and Use Cases |
| Review OpenFlow architecture | OpenFlow Architecture |
| Study SDN production deployment | SDN Production Framework |