300  SDN Production Case Studies

300.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Analyze Google B4 WAN architecture achieving 95%+ link utilization with centralized traffic engineering
  • Evaluate Barcelona smart city network slicing for 19,500 IoT sensors with differentiated QoS
  • Understand Siemens industrial IoT deterministic scheduling using SDN with Time-Sensitive Networking
  • Extract key lessons for applying SDN principles to your own IoT deployments

300.2 Prerequisites

Required Chapters:

  • SDN Production Framework: Enterprise architecture and controller platforms

Technical Background:

  • Controller clustering concepts
  • OpenFlow flow tables
  • Network slicing and QoS

Estimated Time: 15 minutes

Note: Cross-Hub Connections

Interactive Learning:

  • Simulations Hub: Explore network topology and traffic engineering simulations
  • Videos Hub: Watch case study presentations from Google and Siemens

Knowledge Assessment:

  • Quizzes Hub: Test your understanding of SDN deployment patterns

These case studies demonstrate how major organizations solve real networking challenges with SDN:

  • Google B4: How to achieve near-100% link utilization through centralized control
  • Barcelona: How to run multiple isolated services on shared infrastructure
  • Siemens: How to guarantee sub-millisecond timing for industrial control

Each case study includes architecture diagrams, results, and lessons you can apply to your own projects.

300.3 Case Study 1: Google B4 WAN

Background: Google operates a global WAN connecting data centers for inter-DC traffic (e.g., search index replication, video distribution to edge caches). Traditional WAN routing achieved only 30-40% average link utilization due to conservative traffic engineering.

SDN Implementation:

  • Custom SDN controller (Central Traffic Engineering - CTE)
  • OpenFlow-based switches with centralized path computation
  • Bandwidth-aware routing with application-level priorities

Architecture:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D'}}}%%
graph TB
    subgraph Apps[Application Layer]
        Search[Search Index Sync<br/>High Priority]
        Video[Video Distribution<br/>Medium Priority]
        Backup[Data Backup<br/>Best Effort]
    end
    subgraph Controller[SDN Controller - CTE]
        TE[Traffic Engineering]
        Topo[Topology Manager]
        BW[Bandwidth Allocator]
    end
    subgraph WAN[WAN Data Plane]
        DC1[Data Center 1]
        DC2[Data Center 2]
        DC3[Data Center 3]
        DC4[Data Center 4]
    end
    Apps -->|Traffic Demands| Controller
    Controller -->|Flow Rules| WAN
    DC1 <-->|Link Util 95%| DC2
    DC2 <-->|Link Util 95%| DC3
    DC3 <-->|Link Util 95%| DC4
    DC4 <-->|Link Util 95%| DC1
    style Apps fill:#7F8C8D,color:#fff
    style Controller fill:#2C3E50,color:#fff
    style WAN fill:#16A085,color:#fff

Figure 300.1: Google B4 WAN SDN: Application-Aware Traffic Engineering with 95% Utilization
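The bandwidth allocator in the diagram can be pictured as filling a link in priority order up to a utilization target. The following is a minimal sketch of that idea, not Google's actual algorithm: the function name `allocate_by_priority`, the demand values, and the single-link simplification are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def allocate_by_priority(
    capacity_gbps: float,
    demands: List[Tuple[str, int, float]],
    target_utilization: float = 0.95,
) -> Dict[str, float]:
    """Grant demands in priority order until the usable capacity runs out.

    demands holds (name, priority, requested_gbps); a lower priority
    number means more important traffic.
    """
    remaining = capacity_gbps * target_utilization
    grants: Dict[str, float] = {}
    for name, _priority, requested in sorted(demands, key=lambda d: d[1]):
        granted = min(requested, remaining)
        grants[name] = granted
        remaining -= granted
    return grants


# Hypothetical demands for the three application classes in the figure.
demands = [
    ("backup", 3, 60.0),
    ("search_index_sync", 1, 40.0),
    ("video_distribution", 2, 35.0),
]
grants = allocate_by_priority(100.0, demands)
for name, gbps in grants.items():
    print(f"{name}: {gbps:.1f} Gbps")
```

With these assumed demands, the high- and medium-priority classes are served in full and the best-effort backup absorbs whatever is left below the 95% target.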

Alternative View - Utilization Comparison:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D'}}}%%
graph LR
    subgraph Traditional["Traditional WAN Routing"]
        T_Link["Link Capacity: 100 Gbps"]
        T_Used["Used: 30-40 Gbps<br/>(conservative buffers)"]
        T_Waste["Wasted: 60-70 Gbps<br/>Reserved for bursts"]
    end

    subgraph SDN_B4["Google B4 SDN Routing"]
        S_Link["Link Capacity: 100 Gbps"]
        S_Used["Used: 95 Gbps<br/>(dynamic allocation)"]
        S_Reserve["Reserve: 5 Gbps<br/>Managed headroom"]
    end

    subgraph Savings["Annual Impact"]
        Cost["3.17x Efficiency Gain<br/>$10M+ deferred CAPEX"]
        Time["Recovery: <2s<br/>vs 15-30 min OSPF"]
    end

    T_Link --> T_Used
    T_Used --> T_Waste
    S_Link --> S_Used
    S_Used --> S_Reserve
    T_Waste -.->|"SDN Transforms"| S_Used
    S_Reserve --> Cost
    S_Reserve --> Time

    style Traditional fill:#7F8C8D,color:#fff
    style SDN_B4 fill:#16A085,color:#fff
    style Savings fill:#E67E22,color:#fff
    style T_Waste fill:#c0392b,color:#fff

Figure 300.2: Alternative view: Utilization comparison between traditional WAN and Google B4 SDN. Traditional routing uses only 30-40% of link capacity due to conservative buffer allocation, wasting 60-70% of available bandwidth. SDN’s centralized traffic engineering enables 95% utilization with managed 5% headroom, achieving 3.17x efficiency improvement and deferring $10M+ in capacity expansion costs while reducing failure recovery time from 15-30 minutes to under 2 seconds.
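The 3.17x figure follows from dividing the SDN utilization by the low end of the traditional range. A small sketch of that arithmetic, with `efficiency_gain` as a hypothetical helper name, shows how the gain changes across the 30-40% baseline range:

```python
def efficiency_gain(baseline_utilization: float, sdn_utilization: float) -> float:
    """Ratio of carried traffic on the same link capacity."""
    return sdn_utilization / baseline_utilization


LINK_CAPACITY_GBPS = 100  # capacity used in the figure

for baseline in (0.30, 0.40):
    gain = efficiency_gain(baseline, 0.95)
    used = baseline * LINK_CAPACITY_GBPS
    print(f"baseline {used:.0f} Gbps used -> gain {gain:.3f}x")
```

The quoted 3.17x corresponds to the 30% baseline; against a 40% baseline the same calculation gives a smaller gain.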

300.3.1 Results

  • 95%+ link utilization (vs. 30-40% traditional routing)
  • Near-zero packet loss through congestion-aware routing
  • Rapid failure recovery (<2 seconds) via centralized rerouting
  • Cost savings: Better utilization = fewer links needed

300.3.2 Key Lessons for IoT

  • Centralized visibility enables better resource allocation
  • Application-aware routing improves QoS significantly
  • SDN scales to massive networks (Google’s WAN is planetary scale)

300.4 Case Study 2: Smart City IoT - Barcelona

Background: Barcelona deployed 19,500 IoT sensors across the city for smart lighting, parking, environmental monitoring, and public Wi-Fi. Traditional network management couldn’t handle dynamic traffic patterns and multi-tenant isolation.

SDN Implementation:

  • OpenDaylight controller cluster (3 nodes)
  • Network slicing for different city services
  • QoS policies prioritizing emergency services over convenience apps

Network Slices:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D'}}}%%
graph TB
    subgraph Physical[Physical Infrastructure]
        SW1[OpenFlow Switch 1]
        SW2[OpenFlow Switch 2]
        SW3[OpenFlow Switch 3]
    end
    subgraph Slice1[Emergency Services Slice - VLAN 100]
        Fire[Fire Sensors]
        Police[Police Systems]
    end
    subgraph Slice2[Environmental Monitoring - VLAN 200]
        Air[Air Quality]
        Noise[Noise Sensors]
    end
    subgraph Slice3[Convenience Services - VLAN 300]
        Parking[Smart Parking]
        Lighting[Street Lights]
    end
    Slice1 -->|High Priority<br/>QoS| Physical
    Slice2 -->|Medium Priority| Physical
    Slice3 -->|Best Effort| Physical
    style Slice1 fill:#E67E22,color:#fff
    style Slice2 fill:#16A085,color:#fff
    style Slice3 fill:#7F8C8D,color:#fff
    style Physical fill:#2C3E50,color:#fff

Figure 300.3: Barcelona Smart City: SDN Network Slicing for 19,500 IoT Sensors

Alternative View - QoS Priority Enforcement:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D'}}}%%
graph LR
    subgraph Congestion["Network Congestion Event"]
        BW["Available Bandwidth:<br/>100 Mbps total"]
    end

    subgraph Priority["QoS Priority Enforcement"]
        P1["Emergency: 50 Mbps<br/>Guaranteed minimum"]
        P2["Environmental: 30 Mbps<br/>Minimum reserve"]
        P3["Convenience: 20 Mbps<br/>Best effort only"]
    end

    subgraph Result["Congestion Resolution"]
        R1["Fire alert: Delivered<br/><50ms latency"]
        R2["Air quality: Delivered<br/>~100ms latency"]
        R3["Parking update: Queued<br/>Delayed 2 seconds"]
    end

    BW --> P1
    BW --> P2
    BW --> P3
    P1 --> R1
    P2 --> R2
    P3 --> R3

    style Congestion fill:#7F8C8D,color:#fff
    style P1 fill:#E67E22,color:#fff
    style P2 fill:#16A085,color:#fff
    style P3 fill:#2C3E50,color:#fff

Figure 300.4: Alternative view: QoS enforcement during congestion. When bandwidth is constrained, SDN prioritizes emergency services (guaranteed 50 Mbps), then environmental monitoring (30 Mbps reserve), while convenience services (parking, lighting) use remaining capacity. This operational view shows how network slicing protects critical traffic during peak demand.
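One way to picture the enforcement in the figure is a two-pass allocation: guaranteed minimums first, then leftover capacity in priority order. This is a sketch under stated assumptions, not Barcelona's actual configuration: the `Slice` type, the `allocate_slices` function, and the demand values are hypothetical, and the guarantees are assumed to sum to no more than the link capacity.

```python
from typing import Dict, List, NamedTuple


class Slice(NamedTuple):
    name: str
    guaranteed_mbps: float
    demand_mbps: float


def allocate_slices(total_mbps: float, slices: List[Slice]) -> Dict[str, float]:
    """Allocate bandwidth to slices listed from highest to lowest priority."""
    # Pass 1: every slice receives its demand, capped at its guarantee.
    alloc = {s.name: min(s.demand_mbps, s.guaranteed_mbps) for s in slices}
    leftover = total_mbps - sum(alloc.values())
    # Pass 2: unmet demand is served from the leftover, in priority order.
    for s in slices:
        extra = min(s.demand_mbps - alloc[s.name], leftover)
        alloc[s.name] += extra
        leftover -= extra
    return alloc


# Guarantees from the figure; demands are made-up values for a parking burst.
slices = [
    Slice("emergency", 50, 10),
    Slice("environmental", 30, 25),
    Slice("convenience", 0, 120),
]
allocation = allocate_slices(100, slices)
print(allocation)
```

In this run the emergency and environmental slices get everything they ask for, and the convenience slice is limited to the capacity the other two leave unused.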

300.4.1 Results

  • Energy savings: 30% reduction in street lighting costs via adaptive scheduling
  • Traffic reduction: 20% fewer cars circling for parking
  • Response time: Emergency services get guaranteed <50ms latency
  • Multi-tenancy: Clean isolation between city departments

300.4.2 Key Lessons for IoT

  • Network slicing is essential for mixed-priority IoT workloads
  • SDN enables dynamic policy updates without rewiring
  • Centralized monitoring provides citywide visibility

300.5 Case Study 3: Industrial IoT - Siemens Factory

Background: A Siemens manufacturing plant operates 3,000 industrial IoT sensors (vibration monitors, temperature sensors, robotic arms) that require deterministic latency and ultra-high reliability (99.9999% uptime).

SDN Implementation:

  • ONOS controller with Time-Sensitive Networking (TSN) extensions
  • Scheduled traffic for deterministic flows (robotic control)
  • In-band Network Telemetry (INT) for microsecond-level monitoring

Deterministic Scheduling:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D'}}}%%
gantt
    title Time-Sliced TSN Traffic Schedule (Deterministic)
    dateFormat X
    axisFormat %L ms
    section Robot Control
    Robot Slot 1 :active, 0, 100
    Robot Slot 2 :active, 200, 300
    Robot Slot 3 :active, 400, 500
    section Sensor Data
    Sensor Slot 1 :crit, 100, 200
    Sensor Slot 2 :crit, 300, 400
    section Best Effort
    Monitoring 1 :done, 150, 200
    Monitoring 2 :done, 350, 400

Figure 300.5: Industrial IoT TSN: Deterministic Time-Sliced Traffic Scheduling

300.5.1 Results

  • 99.9999% uptime (about 32 seconds of downtime per year)
  • Deterministic latency: <1ms jitter for robotic control loops
  • Predictive maintenance: SDN flow analytics detect anomalies before failures
  • Production increase: 15% throughput improvement via optimized coordination

300.5.2 Key Lessons for IoT

  • Industrial IoT demands determinism that SDN-TSN can provide
  • Flow monitoring enables predictive analytics
  • SDN complements rather than replaces domain-specific protocols (OPC-UA, Modbus)

300.6 Cross-Case Comparison

| Metric | Google B4 | Barcelona | Siemens |
|---|---|---|---|
| Scale | Planetary WAN | City-wide (19,500 sensors) | Factory (3,000 sensors) |
| Controller | Custom CTE | OpenDaylight | ONOS + TSN |
| Primary Goal | Utilization | Multi-tenancy | Determinism |
| Latency Target | Seconds | <50ms critical | <1ms |
| Availability | 99.999% | 99.99% | 99.9999% |
| Key Innovation | Traffic engineering | Network slicing | TSN scheduling |
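The availability row is easier to compare when converted into permitted downtime per year. The sketch below does that conversion, assuming a 365-day year; `annual_downtime_seconds` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds in a non-leap year


def annual_downtime_seconds(availability_percent: float) -> float:
    """Maximum downtime per year implied by an availability percentage."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


for label, availability in [
    ("Barcelona", 99.99),
    ("Google B4", 99.999),
    ("Siemens", 99.9999),
]:
    seconds = annual_downtime_seconds(availability)
    print(f"{label}: {availability}% -> {seconds:.1f} s/year ({seconds / 60:.2f} min)")
```

Each additional nine cuts the downtime budget by a factor of ten: roughly 53 minutes, 5 minutes, and 32 seconds per year for the three deployments.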

300.7 Understanding Check

Scenario: Google’s WAN connects datacenters worldwide. Traditional OSPF routing achieves 30-40% average link utilization (conservative to avoid congestion). B4 SDN achieves 95%+ utilization. Two 10 Gbps links: Link A (shortest path) at 100% capacity, Link B (alternate) at 20%.

Think about:

  1. Why does traditional routing leave Link B underutilized?
  2. How does centralized controller rebalance traffic across both links?
  3. Calculate throughput improvement: 30% vs 95% utilization on 100 Gbps total capacity

Key Insight: Traditional routing forwards on the shortest path only, so Link A is overloaded (packet loss) while Link B sits mostly idle. To avoid such hot spots, operators set conservative link weights, which holds average utilization near 30%. SDN solution: the controller sees the global topology and real-time utilization. The two links carry 12 Gbps in total (10 Gbps on Link A, 2 Gbps on Link B), so the controller shifts 4 Gbps from Link A to Link B, leaving 6 Gbps on each -> both at ~60% -> no congestion, better utilization. Application-aware: high-priority traffic gets the low-latency path, bulk transfers use alternate paths. Throughput on 100 Gbps of total capacity: Traditional (30% util) = 30 Gbps used, 70 Gbps idle. B4 SDN (95% util) = 95 Gbps used. 95 / 30 = 3.17x improvement. Annual savings: defer capacity expansion by 3+ years (~$10M).
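The rebalancing step for the two-link scenario can be sketched as follows, assuming the controller simply spreads the total offered load across links in proportion to their capacity. The function name `rebalance` is hypothetical, and a production traffic-engineering system would also account for path latency and traffic priority.

```python
from typing import List


def rebalance(loads_gbps: List[float], capacities_gbps: List[float]) -> List[float]:
    """Spread the total load across links in proportion to capacity."""
    total_load = sum(loads_gbps)
    total_capacity = sum(capacities_gbps)
    return [total_load * cap / total_capacity for cap in capacities_gbps]


capacities = [10.0, 10.0]   # Link A, Link B
before = [10.0, 2.0]        # Link A at 100%, Link B at 20%
after = rebalance(before, capacities)

for name, load, cap in zip(("Link A", "Link B"), after, capacities):
    print(f"{name}: {load:.1f} Gbps ({load / cap:.0%} utilized)")
```

Both links end up carrying 6 Gbps, which is 60% utilization each, so neither link is congested.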

Scenario: Barcelona smart city SDN manages emergency services (VLAN 100), environmental monitoring (VLAN 200), parking sensors (VLAN 300). 5,000 parking sensors simultaneously update, flooding network with 50 Mbps burst traffic. Emergency fire alarm must reach dispatch in <50ms.

Think about:

  1. How do priority-based flow rules + queue scheduling ensure emergency latency?
  2. Calculate queue servicing: strict priority Queue 1 vs best-effort Queue 3
  3. Why doesn’t parking flood starve emergency traffic?

Key Insight: Controller installs priority-based rules: Emergency (priority=1000) -> Queue 1 (strict priority, 10 Mbps reserved). Parking (priority=100) -> Queue 3 (best-effort). Parking flood: 5,000 sensors -> 50 Mbps burst -> Queue 3 fills up -> packets delayed/dropped. Emergency packet arrives: Matches VLAN 100 rule -> Queue 1 -> switch services Queue 1 BEFORE Queue 3 -> fire alarm forwarded in <1ms despite Queue 3 congestion. Isolation: Physical infrastructure shared, but performance differentiated. Emergency gets guaranteed service, parking gets leftover bandwidth. Network slicing = virtual networks on shared hardware.
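The queue servicing described above can be illustrated with a toy model of one switch port. This is a simplified sketch, not switch firmware: the `StrictPriorityPort` class is hypothetical, Queue 1 and Queue 3 map to list indices 0 and 2, and each sensor is assumed to send one packet.

```python
from collections import deque
from typing import Deque, List, Optional


class StrictPriorityPort:
    """Output port that always serves the lowest-numbered non-empty queue."""

    def __init__(self, num_queues: int = 3) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(num_queues)]

    def enqueue(self, queue_index: int, packet: str) -> None:
        self.queues[queue_index].append(packet)

    def dequeue(self) -> Optional[str]:
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None  # all queues are empty


port = StrictPriorityPort()

# Parking burst: 5,000 sensors each place one packet in Queue 3 (index 2).
for sensor_id in range(5000):
    port.enqueue(2, f"parking-{sensor_id}")

# The fire alarm arrives last but lands in Queue 1 (index 0).
port.enqueue(0, "fire-alarm")

first_out = port.dequeue()
print(first_out)  # fire-alarm
```

The alarm leaves the port first even though 5,000 parking packets were queued ahead of it, which is why the flood in Queue 3 does not add to emergency latency.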

Scenario: Siemens factory robotic assembly line requires <1ms jitter for closed-loop control (robot arm updates position every 1ms). Traditional Ethernet: best-effort forwarding causes 0.1-10ms variable latency (100x jitter). SDN + TSN provides deterministic scheduling.

Think about:

  1. How does time-triggered scheduling eliminate jitter?
  2. Calculate bounded latency: robot packet arrives at 99us, scheduled window 0-100us
  3. Why can’t traditional Ethernet provide deterministic guarantees?

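Key Insight: TSN pre-allocates transmission windows that are synchronized across all switches (IEEE 802.1AS clock sync, <1us accuracy). Schedule per 1000us cycle: Robot control: 0-100us (reserved), Sensor data: 100-200us, Best-effort: 200-1000us. Bounded latency: a robot packet arriving at 99us is still inside its reserved window and is forwarded without queuing behind other traffic, provided its transmission fits before the window closes at 100us. A packet that misses the window waits for the next one, at most 900us later, so the worst case is fixed by the schedule rather than by competing traffic, and jitter for packets sent inside the window is bounded by the 100us window length. Traditional Ethernet fails: Best-effort queuing -> robot packet waits for bulk data transfer -> 10ms delay -> control loop misses deadline -> robot positioning error. SDN role: Controller computes the end-to-end schedule and installs time-triggered flow rules on all switches. Result: deterministic <1ms jitter for industrial control, AR/VR, medical devices.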
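The time-triggered schedule can be modeled as a lookup from a point in the cycle to the traffic class whose gate is open. The sketch below uses the window boundaries from the schedule above and assumes a 1000us cycle; the names `WINDOWS`, `open_class`, and `wait_for_window` are hypothetical, and frame transmission time and guard bands are ignored.

```python
from typing import Dict, Tuple

CYCLE_US = 1000  # assumed cycle length, taken from the end of the best-effort window

# Traffic class -> (window start, window end) in microseconds within one cycle.
WINDOWS: Dict[str, Tuple[int, int]] = {
    "robot_control": (0, 100),
    "sensor_data": (100, 200),
    "best_effort": (200, 1000),
}


def open_class(t_us: int) -> str:
    """Return the traffic class whose gate is open at absolute time t_us."""
    offset = t_us % CYCLE_US
    for name, (start, end) in WINDOWS.items():
        if start <= offset < end:
            return name
    raise ValueError("schedule does not cover the full cycle")


def wait_for_window(traffic_class: str, t_us: int) -> int:
    """Microseconds until the gate for traffic_class is open (0 if open now)."""
    start, end = WINDOWS[traffic_class]
    offset = t_us % CYCLE_US
    if start <= offset < end:
        return 0
    if offset < start:
        return start - offset
    return CYCLE_US - offset + start  # wait for the window in the next cycle


print(open_class(99))                          # robot_control
print(wait_for_window("robot_control", 99))    # 0
print(wait_for_window("robot_control", 100))   # 900
```

The wait is a function of the schedule alone: it never depends on how much best-effort traffic is queued, which is the property that makes the latency bound deterministic.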

300.8 Summary

This chapter examined three production SDN deployments demonstrating different IoT applications:

Key Takeaways:

  1. Google B4: Centralized traffic engineering achieves 95%+ link utilization, 3.17x improvement over traditional routing

  2. Barcelona Smart City: Network slicing isolates 19,500 sensors across emergency, environmental, and convenience services with differentiated QoS

  3. Siemens Factory: SDN + TSN provides 99.9999% uptime with <1ms jitter for industrial robotic control

  4. Common Themes: Centralized visibility, programmable policies, and application-awareness enable optimizations impossible with distributed routing

  5. Scale Diversity: SDN applies from factory floors to planetary-scale WANs

Related Chapters:

  • SDN Production Framework: Controller platforms and architecture
  • SDN Production Best Practices: HA, security, monitoring, and optimization

300.9 What’s Next?

The next chapter covers best practices for SDN production deployments, including high availability, security hardening, and monitoring.

Continue to SDN Best Practices ->