119  SDN Controller Basics

In 60 Seconds

SDN controllers (OpenDaylight, ONOS, Ryu, Floodlight) centralize network decision-making via northbound REST APIs and southbound OpenFlow protocols. ONOS targets carrier-grade deployments at 1M+ flows/sec; Ryu suits prototyping with its Python API. Controller clustering with Raft consensus provides automatic failover within seconds for production IoT networks.

119.1 Learning Objectives

By the end of this chapter series, you will be able to:

  • Diagram Controller Architecture: Illustrate the internal components and trace message flow within an SDN controller
  • Contrast Major Controllers: Evaluate OpenDaylight, ONOS, Ryu, and Floodlight for different IoT deployment scenarios
  • Classify Controller APIs: Distinguish northbound (REST/gRPC) from southbound (OpenFlow) API interactions and their message types
  • Architect High Availability: Design controller clustering and failover strategies that meet production uptime requirements
  • Justify Controller Selection: Recommend appropriate SDN controllers based on scale, team expertise, and deployment constraints

119.2 Prerequisites

Before diving into this chapter, you should be familiar with:

  • SDN Fundamentals and OpenFlow: Understanding the basic SDN architecture, control/data plane separation, and OpenFlow protocol is essential for grasping controller internals
  • Networking Basics: Knowledge of network protocols, routing, and packet forwarding provides context for controller decision-making
  • IoT Reference Models: Familiarity with layered IoT architectures helps understand where controllers fit in the system design

Think of the SDN controller as the brain of a traffic control system.

In a traditional network, each router or switch makes its own decisions - like individual traffic lights operating independently. An SDN controller centralizes all decision-making, like a smart city traffic control center that coordinates every intersection.

Simple Analogy:

| Traditional Network | SDN with Controller |
| --- | --- |
| Each device has its own brain | One central brain (controller) |
| Devices communicate via shouting | Controller tells each device what to do |
| Hard to coordinate | Easy network-wide changes |
| Each device learns slowly | Controller has instant global view |

What the controller does:

  1. Receives events - “A new device connected!” or “Link failed!”
  2. Makes decisions - “Route traffic via path A” or “Block this IP”
  3. Programs switches - Sends flow rules telling switches how to forward packets
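
The three steps above can be sketched as a toy event handler. This is a minimal illustration, not how any real controller is implemented: `ToyController`, `FlowRule`, the event fields, and the in-memory `flow_tables` dictionary are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRule:
    match: dict   # header fields to match, e.g. {"dst_ip": "10.0.0.5"}
    action: str   # e.g. "output:2" or "drop"


@dataclass
class ToyController:
    # Hypothetical in-memory stand-in for the switches' flow tables, keyed by switch ID.
    flow_tables: dict = field(default_factory=dict)
    blocked_ips: set = field(default_factory=set)

    def handle_event(self, switch_id, event):
        # Step 1: receive the event (here, a dict describing a new packet).
        src = event["src_ip"]
        # Step 2: make a decision based on the controller's global policy.
        if src in self.blocked_ips:
            rule = FlowRule(match={"src_ip": src}, action="drop")
        else:
            rule = FlowRule(match={"dst_ip": event["dst_ip"]},
                            action=f"output:{event['out_port']}")
        # Step 3: program the switch by adding the rule to its flow table.
        self.flow_tables.setdefault(switch_id, []).append(rule)
        return rule


controller = ToyController(blocked_ips={"10.0.0.66"})
allowed = controller.handle_event(
    "s1", {"src_ip": "10.0.0.7", "dst_ip": "10.0.0.5", "out_port": 2})
blocked = controller.handle_event(
    "s1", {"src_ip": "10.0.0.66", "dst_ip": "10.0.0.5", "out_port": 2})
print(allowed.action, blocked.action)  # prints: output:2 drop
```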

Why this matters for IoT:

  • Thousands of devices - Controller manages them all from one place
  • Dynamic networks - Controller adapts instantly when sensors join/leave
  • Security - Controller can isolate compromised devices network-wide

The controller is software running on a server - it’s not a special hardware box. Popular controllers include OpenDaylight (enterprise), ONOS (telecom), Ryu (education), and Floodlight (performance).

119.3 Chapter Overview

This topic has been organized into three focused chapters for easier learning:

Figure 119.1: Navigation map for SDN Controller topics showing three focused chapters with their main content areas and estimated reading times.

119.3.1 1. SDN Controller Architecture

Read: SDN Controller Architecture (15 min)

Learn about the internal structure of SDN controllers:

  • Controller components: Topology Discovery, Device Manager, Flow Manager, Statistics Collector, Policy Engine
  • Message flow: Event-driven communication between applications, controller, and switches
  • Reactive vs proactive: Trade-offs between latency (20-50ms vs <1ms) and flow table memory usage
  • IoT implications: Pre-installing rules for sensor traffic patterns

119.3.2 2. SDN Controller Comparison

Read: SDN Controller Comparison (18 min)

Compare the four major open-source SDN controllers:

| Controller | Best For | Performance | Scalability |
| --- | --- | --- | --- |
| OpenDaylight | Enterprise, multi-protocol | 200K-500K flows/sec | 1K-5K devices |
| ONOS | Carrier-grade, smart cities | 1M+ flows/sec | 10K+ devices |
| Ryu | Learning, prototyping | 50K-100K flows/sec | <1K devices |
| Floodlight | High performance apps | 300K-600K flows/sec | 1K-3K devices |

119.3.3 3. SDN APIs and High Availability

Read: SDN APIs and High Availability (25 min)

Understand how to build production-ready SDN deployments:

  • Northbound APIs: REST, gRPC, NETCONF for application integration
  • Southbound APIs: OpenFlow messages (Packet-In, Flow-Mod, Stats-Request)
  • Clustering strategies: Active-Standby, Active-Active, DHT-based
  • State synchronization: Raft consensus, eventual consistency, failover behavior

119.4 Quick Reference

119.4.1 Controller Selection Cheat Sheet

| Your Situation | Recommended Controller | Why |
| --- | --- | --- |
| Learning SDN | Ryu | Python, simple, great tutorials |
| Production IoT (10K+ devices) | ONOS | Scale, clustering, 1M+ flows/sec |
| Enterprise (mixed protocols) | OpenDaylight | Comprehensive feature set |
| Performance-critical | Floodlight | 600K flows/sec, optimized pipeline |

119.4.2 Key Metrics to Remember

| Metric | Value | Context |
| --- | --- | --- |
| Reactive flow latency | 20-50ms | Controller computes path |
| Proactive flow latency | <1ms | Pre-installed rules |
| Cluster failover | 3-10s | Switch detects, requests new master |
| State sync overhead | 5-10ms | Raft consensus latency |
| Optimal cluster size | 3 nodes | Survives 1 failure, minimal overhead |
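
The "3 nodes, survives 1 failure" entry follows from majority-quorum arithmetic, which Raft-style consensus relies on. A small sketch (the function names are illustrative) shows why a fourth node adds overhead without adding fault tolerance:

```python
def raft_quorum(nodes):
    """Smallest majority of a cluster with `nodes` members."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """Members that can fail while a majority is still reachable."""
    return nodes - raft_quorum(nodes)


for n in (1, 2, 3, 4, 5):
    print(f"{n} nodes: quorum={raft_quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Three and four nodes both tolerate a single failure, so odd cluster sizes are the usual choice; five nodes are needed to survive two failures.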

Cross-Hub Connections

Related Learning Resources:

  • Simulations Hub: Try the Network Topology Visualizer to see how SDN controllers manage different network structures. Practice controller concepts with Mininet simulations.
  • Videos Hub: Watch SDN controller demonstrations showing OpenDaylight, ONOS, and Ryu in action.
  • Quizzes Hub: Test your understanding of controller architecture, API design, and high availability with SDN quizzes.
  • Knowledge Gaps Hub: Address common misconceptions about controller performance, clustering, and API design patterns.

SDN controller capacity planning compares the required flow installations per second against cluster throughput limits. A peak rate of 120 flows/min is 2 flows/second; at 3 switches per flow, that is 2 × 3 = 6 installations/second. A 3-node ONOS cluster distributes the load: 6 installations/second ÷ 3 nodes = 2 installations/second per instance (0.0004% utilization against a per-instance capacity of 500K flows/sec). Flow setup latency = 3 switches × (15 ms synchronous installation + 5 ms network RTT per hop) = 60 ms. Cluster headroom = (3 × 500,000) / 6 = 250,000× current load, so latency (not throughput) is the bottleneck for sub-10ms applications.

Scenario: A smart building has 800 IoT devices (sensors, actuators, access control) generating new flows at an average rate of 120 flows/minute during peak occupancy (8 AM - 6 PM). Each flow requires the controller to compute a path and install flow rules across an average of 3 switches. The controller uses an ONOS cluster with 3 nodes.

Given:

  • Peak flow rate: 120 flows/minute = 2 flows/second
  • Average path length: 3 switches (3 flow installations per new flow)
  • ONOS controller performance: 500,000 flows/second per instance (per-instance theoretical max)
  • Synchronous flow installation latency: 15 ms per switch

Step 1: Calculate total flow installations per second

  • 2 new flows/second × 3 switches/flow = 6 flow installations/second

Step 2: Determine controller capacity with clustering

  • 3-node ONOS cluster distributes load
  • Each instance handles: 6 installations / 3 nodes = 2 installations/second per instance
  • Utilization per instance: 2 / 500,000 = 0.0004% = negligible

Step 3: Calculate worst-case flow setup latency

  • Synchronous installation: 3 switches × 15 ms = 45 ms per flow
  • With network RTT overhead (5 ms per hop): 3 × (15 + 5) = 60 ms

Step 4: Evaluate scaling headroom

  • Current load: 6 installations/second
  • Cluster capacity: 3 × 500,000 = 1,500,000 installations/second
  • Headroom: 1,500,000 / 6 = 250,000x current load

Conclusion: The smart building deployment operates well below controller capacity. The limiting factor is flow setup latency (60 ms), not throughput. For applications requiring sub-10ms response, proactive flow installation (pre-installing rules for predictable traffic patterns) should be used instead of reactive installation.

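Design validation: Even during a fire alarm event triggering 500 simultaneous new flows (evacuation routes, access control overrides), the controller must handle 500 × 3 = 1,500 installations. At a cluster capacity of 1,500,000 installations/second this is about 1 ms of controller processing, so if the flows are set up in parallel the burst completes in roughly the 60 ms per-flow setup latency, which is under 100 ms (within emergency response requirements).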
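
The worked example can be reproduced in a few lines, which makes it easy to substitute your own figures. All inputs below are the scenario's stated values; the variable names are illustrative, and the burst estimate assumes controller throughput is the only processing cost.

```python
# Scenario inputs (from the "Given" list above)
PEAK_FLOWS_PER_MIN = 120
SWITCHES_PER_FLOW = 3
CLUSTER_NODES = 3
PER_INSTANCE_CAPACITY = 500_000   # installations/second, theoretical max
INSTALL_MS_PER_SWITCH = 15
RTT_MS_PER_HOP = 5

# Step 1: total flow installations per second
flows_per_sec = PEAK_FLOWS_PER_MIN / 60
installs_per_sec = flows_per_sec * SWITCHES_PER_FLOW

# Step 2: per-instance load and utilization
per_instance = installs_per_sec / CLUSTER_NODES
utilization_pct = per_instance / PER_INSTANCE_CAPACITY * 100

# Step 3: worst-case (sequential) flow setup latency
setup_latency_ms = SWITCHES_PER_FLOW * (INSTALL_MS_PER_SWITCH + RTT_MS_PER_HOP)

# Step 4: scaling headroom
cluster_capacity = CLUSTER_NODES * PER_INSTANCE_CAPACITY
headroom = cluster_capacity / installs_per_sec

# Design validation: 500 simultaneous new flows
burst_installs = 500 * SWITCHES_PER_FLOW
burst_processing_ms = burst_installs / cluster_capacity * 1000

print(f"installations/s: {installs_per_sec:.0f}")
print(f"utilization per instance: {utilization_pct:.4f}%")
print(f"flow setup latency: {setup_latency_ms} ms")
print(f"headroom: {headroom:,.0f}x")
print(f"burst: {burst_installs} installs, ~{burst_processing_ms:.1f} ms of processing")
```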

| Criterion | Ryu | OpenDaylight | ONOS | Floodlight | Best For |
| --- | --- | --- | --- | --- | --- |
| Programming language | Python | Java | Java | Java | Ryu: Rapid prototyping, Python developers |
| Flow setup throughput | 50-100K flows/sec | 200-500K flows/sec | 1M+ flows/sec | 300-600K flows/sec | ONOS: Carrier-grade deployments (10K+ devices) |
| Clustering support | No (single instance) | Yes (OSGi modular) | Yes (Raft consensus) | Limited | ONOS: High availability (99.99% uptime required) |
| Learning curve | Low (50-100 LOC for apps) | High (complex plugin system) | Moderate (Java proficiency) | Moderate | Ryu: Education, PoC development |
| Device support | Basic OpenFlow 1.0-1.3 | Multi-protocol (OF, NETCONF, OVSDB) | OpenFlow 1.0-1.5 + NETCONF | OpenFlow 1.0-1.4 | OpenDaylight: Mixed vendor environments |
| Community & docs | Active, excellent tutorials | Large, complex ecosystem | Strong, carrier focus | Moderate, declining | Ryu: Student projects; ONOS: Production IoT |
| Northbound APIs | REST + custom | RESTCONF, MD-SAL | REST + gRPC + Intent | REST | OpenDaylight: Enterprise IT integration |
| Memory footprint | 50-200 MB | 1-4 GB | 2-8 GB | 200-800 MB | Ryu: Raspberry Pi edge deployments |

Decision tree:

  • Choose Ryu when: Learning SDN (days to first working app), prototyping new algorithms (Python data science ecosystem), small IoT deployments (<100 devices), edge SDN on constrained hardware
  • Choose ONOS when: Production deployment (1,000-100,000 devices), high availability is mandatory (carrier networks, critical infrastructure), need clustering with automatic failover, working with service providers
  • Choose OpenDaylight when: Enterprise IoT with mixed vendors (Cisco, Juniper, Arista), need multi-protocol support beyond OpenFlow, integrating SDN with existing NETCONF-based management systems, large plugin ecosystem is valuable
  • Choose Floodlight when: Prioritize raw performance (data center), simpler than OpenDaylight but more scalable than Ryu, deploying in 100-1,000 device range
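
The decision tree above can be written as a small function. This is only a sketch: the device-count thresholds come from the list, but the order in which the criteria are checked (learning first, then multi-protocol needs, then high availability and scale) is an assumption made here to resolve overlaps, and `recommend_controller` is a hypothetical helper name.

```python
def recommend_controller(devices, needs_ha=False, multi_protocol=False, learning=False):
    """Suggest a controller using the chapter's rules of thumb."""
    if learning:
        return "Ryu"            # learning and prototyping come first
    if multi_protocol:
        return "OpenDaylight"   # mixed vendors, protocols beyond OpenFlow
    if needs_ha or devices >= 1000:
        return "ONOS"           # clustering with automatic failover, large scale
    if devices >= 100:
        return "Floodlight"     # 100-1,000 device range
    return "Ryu"                # small deployments (<100 devices)


print(recommend_controller(50))                         # Ryu
print(recommend_controller(500))                        # Floodlight
print(recommend_controller(10_000))                     # ONOS
print(recommend_controller(2_000, multi_protocol=True)) # OpenDaylight
```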

Migration path: Start with Ryu for concept validation (1-2 weeks), migrate to OpenDaylight for enterprise PoC (1-2 months), deploy ONOS for production at scale (3+ months to proficiency). All three use OpenFlow southbound, so switch infrastructure remains compatible.

Common Mistake: Deploying SDN Controller Without Flow Table Overflow Protection

What practitioners do wrong: Deploy an SDN controller with reactive flow installation (install-on-demand for every new flow) without implementing flow table management, assuming switch memory is unlimited.

Why it fails:

  • Physical switches have limited TCAM (Ternary Content Addressable Memory) for flow tables: OpenFlow hardware switches typically support 2K-32K flow entries depending on model (e.g., Cisco Nexus 3K: 8,000 entries; Broadcom Trident 2: 16,000 entries)
  • Software switches (Open vSwitch) have higher limits (100K-1M entries) but still finite
  • When the flow table is full, new rules cannot be installed, so unmatched traffic keeps generating PACKET_IN messages for every new flow, overwhelming the controller and creating a “control plane storm”
  • In IoT deployments with thousands of devices, a single security scan event can generate 50,000+ unique flows instantly (each sensor IP × each destination port)

Correct approach:

  1. Flow aging policies: Configure idle_timeout (e.g., 30 seconds) and hard_timeout (e.g., 300 seconds) on flow rules to auto-expire unused entries
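  2. Flow table monitoring: Poll switch flow table usage every 10 seconds via table statistics requests (OFPMP_TABLE multipart requests in OpenFlow 1.3 and later, OFPST_TABLE stats requests in OpenFlow 1.0)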
  3. Proactive eviction: When usage exceeds 80%, evict least-recently-used flows or lowest-priority rules before table is full
  4. Wildcard rules for IoT traffic: Instead of per-device flows, install aggregated rules for IoT subnets (e.g., “all sensors in VLAN 100 → gateway” as one rule instead of 500 per-sensor rules)
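
The aging and eviction steps can be modelled with a toy flow table. This is a simplified sketch under stated assumptions: on real hardware the switch itself expires entries by idle_timeout and hard_timeout, whereas here a hypothetical `ManagedFlowTable` class simulates that behaviour in one place and adds least-recently-used eviction above the 80% threshold. Timestamps are passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    installed_at: float
    last_used: float
    idle_timeout: float = 30.0    # seconds without a matching packet
    hard_timeout: float = 300.0   # seconds since installation


class ManagedFlowTable:
    """Toy model of a switch flow table with aging and LRU eviction."""

    def __init__(self, capacity, evict_threshold=0.8):
        self.capacity = capacity
        self.evict_threshold = evict_threshold
        self.entries = {}  # flow key -> FlowEntry

    def usage(self):
        return len(self.entries) / self.capacity

    def touch(self, key, now):
        """Record that a packet matched this flow, resetting its idle timer."""
        self.entries[key].last_used = now

    def expire(self, now):
        """Drop entries whose idle or hard timeout has elapsed."""
        expired = [k for k, e in self.entries.items()
                   if now - e.last_used >= e.idle_timeout
                   or now - e.installed_at >= e.hard_timeout]
        for k in expired:
            del self.entries[k]

    def install(self, key, now):
        """Install a flow, then evict LRU entries while usage exceeds the threshold."""
        self.expire(now)
        self.entries[key] = FlowEntry(installed_at=now, last_used=now)
        while self.usage() > self.evict_threshold:
            lru = min(self.entries, key=lambda k: self.entries[k].last_used)
            del self.entries[lru]


table = ManagedFlowTable(capacity=10)
for t in range(9):
    table.install(f"flow-{t}", now=float(t))
print(sorted(table.entries))   # flow-0 was evicted as least recently used
table.install("flow-late", now=100.0)
print(sorted(table.entries))   # the older flows have idle-timed out
```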

Real-world example: A university campus deployed SDN for 3,000 IoT sensors (building automation, access control). During a security audit, the penetration testing team ran an Nmap scan that generated 65,535 unique flows per sensor (every TCP/UDP port). Within 5 minutes, all campus OpenFlow switches hit flow table capacity (8,000 entries), causing controller CPU to spike to 100% processing PACKET_IN messages. The network experienced 20 minutes of degraded performance until the audit was halted.

Solution implemented: (1) Install wildcard rules for common IoT traffic patterns (MQTT to broker, HTTP to management servers) reducing per-sensor rule count from 10 to 2. (2) Implement flow table usage threshold alerts at 60% (warning) and 80% (critical). (3) Configure aggressive idle_timeout of 10 seconds for scan-like traffic (high port numbers, single-packet flows). Post-fix, the same security scan generated only 1,200 flow table entries (96% reduction) and completed without network disruption.
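
Two parts of this fix are easy to sketch: the 60%/80% alert thresholds and the replacement of per-sensor rules with one aggregated subnet rule. The subnet `10.100.0.0/23`, the rule dictionary layout, and the `alert_level` helper are hypothetical values chosen for illustration; only the thresholds and the 8,000-entry table size come from the text.

```python
import ipaddress


def alert_level(used, capacity):
    """Classify flow table usage: warning at 60%, critical at 80%."""
    if used * 100 >= 80 * capacity:
        return "critical"
    if used * 100 >= 60 * capacity:
        return "warning"
    return "ok"


# Hypothetical sensor subnet with room for 510 hosts.
sensor_subnet = ipaddress.ip_network("10.100.0.0/23")
sensors = list(sensor_subnet.hosts())[:500]

# One aggregated rule covers every sensor, instead of 500 per-sensor rules.
aggregated_rule = {"match": {"ipv4_src": str(sensor_subnet)},
                   "action": "output:gateway"}
covered = all(ip in sensor_subnet for ip in sensors)

print(alert_level(4_000, 8_000), alert_level(4_800, 8_000), alert_level(6_400, 8_000))
print(len(sensors), "sensors covered by one rule:", covered)
```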


119.5 Concept Relationships

| Concept | Relationship to SDN Controllers | Importance |
| --- | --- | --- |
| Control Plane Separation | Fundamental SDN principle enabling centralized controller logic | Critical - core architectural concept |
| Northbound APIs | Enable application-controller communication without OpenFlow knowledge | High - simplifies app development |
| Southbound APIs | Standardized protocols (OpenFlow, NETCONF) for controller-switch communication | High - enables multi-vendor interoperability |
| Controller Clustering | High availability mechanism providing fault tolerance and load distribution | Critical - production deployment requirement |
| Flow Setup Latency | Time to install rules (20-50ms reactive, <1ms proactive); determines responsiveness | High - impacts IoT real-time applications |

Key Concepts

  • SDN (Software-Defined Networking): An architectural approach separating the network control plane (routing decisions) from the data plane (packet forwarding), centralizing control in a software controller for programmable network management
  • Control Plane: The network intelligence layer making routing and forwarding decisions, centralized in an SDN controller rather than distributed across individual switches as in traditional networking
  • Data Plane: The network forwarding layer physically moving packets based on rules installed by the control plane — in SDN, this is the switch hardware executing OpenFlow flow table entries
  • OpenFlow: The foundational SDN protocol enabling communication between an SDN controller and network switches, allowing the controller to install, modify, and delete flow table entries that govern packet forwarding
  • SDN Controller: The centralized network operating system providing a global view of the network topology and programming switch forwarding behavior through southbound APIs (OpenFlow) and exposing northbound APIs to applications
  • Flow Table: A data structure in an SDN switch containing match-action rules: each entry matches packet headers (source IP, destination MAC, port number) and specifies forwarding action (forward, drop, modify, send-to-controller)
  • Southbound API: The interface between an SDN controller and the data plane switches — OpenFlow is the dominant southbound API, though NETCONF, OVSDB, and P4Runtime are also used
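
The flow table's match-action behaviour can be illustrated with a short lookup function. This is a simplified sketch: the dictionary layout and field names are illustrative, real switches match in hardware (TCAM), and the priority-0 empty-match entry stands in for an explicit table-miss rule that sends unmatched packets to the controller.

```python
def lookup(flow_table, packet):
    """Return the action of the highest-priority entry whose match fields all fit the packet."""
    candidates = [entry for entry in flow_table
                  if all(packet.get(name) == value
                         for name, value in entry["match"].items())]
    if not candidates:
        return "drop"  # assumption for this sketch: no table-miss entry means drop
    return max(candidates, key=lambda entry: entry["priority"])["action"]


flow_table = [
    {"priority": 100, "match": {"src_ip": "10.0.0.66"}, "action": "drop"},
    {"priority": 10, "match": {"dst_mac": "aa:bb:cc:dd:ee:01"}, "action": "forward:2"},
    {"priority": 0, "match": {}, "action": "send-to-controller"},  # table-miss entry
]

known = {"src_ip": "10.0.0.7", "dst_mac": "aa:bb:cc:dd:ee:01"}
quarantined = {"src_ip": "10.0.0.66", "dst_mac": "aa:bb:cc:dd:ee:01"}
unknown = {"src_ip": "10.0.0.8", "dst_mac": "aa:bb:cc:dd:ee:99"}

print(lookup(flow_table, known))        # forward:2
print(lookup(flow_table, quarantined))  # drop
print(lookup(flow_table, unknown))      # send-to-controller
```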

Common Pitfalls

Thinking the SDN controller forwards packets like a router. The controller only programs flow tables in switches — it does not forward data plane traffic (except initial packet-in processing). The controller communicates with switches via a separate control channel, not through the data path.

Not configuring packet-in rate limits on SDN switches. Every unknown flow triggers a packet-in event to the controller. In IoT networks where new device connections happen frequently, unconstrained packet-in events can saturate the controller. Configure packet-in rate limiting at switches.
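
One common way to express such a limit is a token bucket. The sketch below shows the logic only; it is not a switch configuration, and `TokenBucket` with its `rate_per_sec` and `burst` parameters is a hypothetical class. Time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allow up to `burst` events at once, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill tokens for the time elapsed since the previous event.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True    # forward this packet-in to the controller
        return False       # drop or queue it at the switch


bucket = TokenBucket(rate_per_sec=100, burst=5)
decisions = [bucket.allow(now=0.0) for _ in range(10)]
print(decisions.count(True), decisions.count(False))  # prints: 5 5
print(bucket.allow(now=1.0))                          # prints: True
```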

Deploying SDN without measuring controller processing latency per flow. Control plane latency directly impacts new-flow forwarding delay. If the controller takes 50 ms to install a flow rule, every new connection experiences a 50 ms initial delay. Monitor and alert on controller latency percentiles.
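
A minimal sketch of percentile-based alerting, assuming flow setup latencies have already been collected in milliseconds. The `latency_percentiles` and `should_alert` helpers and the 50 ms threshold are illustrative choices; how samples are gathered depends on the controller in use.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50, p95 and p99 of the given latency samples."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def should_alert(percentiles, threshold_ms=50.0):
    """Alert when the 99th percentile exceeds the threshold."""
    return percentiles["p99"] > threshold_ms


# Synthetic samples: 1 ms, 2 ms, ..., 101 ms.
samples = list(range(1, 102))
result = latency_percentiles(samples)
print(result)
print("alert:", should_alert(result))
```

Alerting on a high percentile rather than the mean catches the slow tail that new connections actually experience.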

Designing an SDN architecture assuming all devices support OpenFlow, then discovering that IoT gateways, industrial switches, and wireless access points use NETCONF, SNMP, or vendor-specific APIs. Design the southbound protocol abstraction layer to support multiple protocols from the start.
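
One way to keep protocol choices out of application logic is a small driver interface. The sketch below is purely illustrative: `SouthboundDriver`, `OpenFlowDriver`, `NetconfDriver`, and the device inventory are hypothetical, and the drivers return descriptive strings instead of speaking the real protocols.

```python
from abc import ABC, abstractmethod


class SouthboundDriver(ABC):
    """Common interface the controller core uses for every device type."""

    @abstractmethod
    def apply_rule(self, device_id, rule):
        ...


class OpenFlowDriver(SouthboundDriver):
    def apply_rule(self, device_id, rule):
        # Placeholder: a real driver would send an OpenFlow Flow-Mod message.
        return f"Flow-Mod to {device_id}: {rule}"


class NetconfDriver(SouthboundDriver):
    def apply_rule(self, device_id, rule):
        # Placeholder: a real driver would issue a NETCONF edit-config operation.
        return f"NETCONF edit-config to {device_id}: {rule}"


# Hypothetical inventory mapping each device to the protocol it supports.
drivers = {"core-switch-1": OpenFlowDriver(), "iot-gateway-7": NetconfDriver()}


def program_device(device_id, rule):
    """Apply a rule without the caller knowing the device's protocol."""
    return drivers[device_id].apply_rule(device_id, rule)


print(program_device("core-switch-1", {"vlan": 100, "action": "forward"}))
print(program_device("iot-gateway-7", {"vlan": 100, "action": "forward"}))
```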

119.6 Summary

Key Takeaways:

  1. Controller architecture has three layers: Application (northbound), Control (core services), Infrastructure (southbound)
  2. Major controllers have different strengths: OpenDaylight (features), ONOS (scalability/HA), Ryu (simplicity), Floodlight (performance)
  3. Northbound APIs (REST/gRPC) allow applications to program the network without OpenFlow knowledge
  4. Southbound APIs (OpenFlow/NETCONF) control network devices with standardized protocols
  5. Clustering provides high availability (99.99%+) but at a performance cost (10-25% slower)
  6. Controller selection depends on deployment scale, reliability requirements, and ecosystem integration needs
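
To make the 99.99% availability figure concrete, the permitted downtime per year can be computed directly. The helper name is illustrative, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def allowed_downtime_minutes(availability_pct):
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} min/year")
```

At 99.99%, the budget is under an hour per year, so a few failovers of 3-10 seconds each fit comfortably while a single multi-hour outage does not.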

Practical guidelines:

  • Start with Ryu for learning (days to proficiency)
  • Use ONOS for production IoT requiring high availability (weeks to proficiency)
  • Consider OpenDaylight for complex enterprise with mixed devices (months to proficiency)
  • Deploy 3-node clusters for 99.99%+ uptime requirements
  • Use REST APIs for application integration, not direct OpenFlow manipulation

119.7 See Also

An SDN controller is like the principal of a school – one person who knows everything happening in every classroom!

119.7.1 The Sensor Squad Adventure: The Network Principal

The Sensor Squad’s network was growing fast! There were hundreds of switches sending messages in every direction, and nobody was in charge. Messages were getting lost, confused, and sometimes delivered to the wrong place.

“We need a principal!” said Sammy the Sensor. “Someone who can see the WHOLE network and make smart decisions for everyone!”

That’s when they hired Connie the Controller. Connie sat in a special office with screens showing every single switch in the network. When a new message arrived and a switch didn’t know where to send it, the switch would call Connie: “Principal! I have a message for Building 5 but I don’t know the way!”

Connie would look at the big map and say: “Send it left, then straight, then right – that’s the fastest path!” Then Connie would write that instruction down so the switch would remember for next time.

Lila the LED asked, “What if Connie gets sick?”

“That’s why we have THREE Connies!” said Max the Microcontroller. “If one goes down, another takes over in seconds. The network never stops!”

119.7.2 Key Words for Kids

| Word | What It Means |
| --- | --- |
| Controller | The central brain that sees the whole network and makes routing decisions |
| Clustering | Having backup controllers ready to take over if the main one fails |
| API | A special language that lets apps talk to the controller (like a phone number) |

119.8 What’s Next

| If you want to… | Read this |
| --- | --- |
| Study SDN architecture fundamentals | SDN Architecture Fundamentals |
| Learn OpenFlow core concepts | OpenFlow Core Concepts |
| Explore SDN basics and controllers | SDN Controller Comparison |
| Review SDN fundamentals overview | SDN Fundamentals and OpenFlow |
| Study SDN IoT applications | SDN IoT Applications |