1287 Database Detective Game

Interactive Puzzle Game: Master IoT Database Selection Through Investigation

animation

database

data-management

game

interactive

learning

1287.1 Database Detective: The IoT Investigation

Welcome to Database Detective! You are a data architect consultant called in to solve database selection puzzles for IoT companies. Each case presents a unique scenario with specific requirements. Your mission: identify the right database type for each situation.

Game Overview

Databases to Master:

Time-Series (InfluxDB): Optimized for timestamped data, high write throughput, retention policies
Document (MongoDB): Flexible schemas, nested data, varied structures
Key-Value (Redis): Ultra-fast caching, session state, real-time operations
Relational (PostgreSQL): ACID compliance, complex relationships, structured queries
Graph (Neo4j): Relationship-heavy data, network topology, device connections

Game Features:

3 difficulty levels with progressively complex scenarios
CAP theorem considerations
Detailed feedback explaining each choice
Score tracking and expertise rating

How to Play

Read the case file describing the IoT scenario
Analyze the requirements (volume, velocity, query patterns)
Select the database type you think fits best
Review the detective’s verdict explaining why it works or doesn’t
Build your score to earn your Database Expertise Rating

Show code

{
  // Load D3.js
  const d3 = await require("d3@7");

  // IEEE Color palette
  const colors = {
    navy: "#2C3E50",
    teal: "#16A085",
    orange: "#E67E22",
    gray: "#7F8C8D",
    lightGray: "#ECF0F1",
    white: "#FFFFFF",
    red: "#E74C3C",
    green: "#27AE60",
    purple: "#9B59B6",
    yellow: "#F1C40F",
    darkBlue: "#1a252f",
    gold: "#D4A500"
  };

  // Database definitions with CAP properties
  const databases = {
    timeseries: {
      name: "Time-Series (InfluxDB)",
      shortName: "Time-Series",
      color: "#22ADF6",
      icon: "\u23F1",
      cap: { c: 0.6, a: 0.9, p: 0.8 },
      strengths: ["High write throughput", "Time-range queries", "Retention policies", "Downsampling"],
      weaknesses: ["Limited joins", "Eventual consistency", "Cardinality limits"],
      bestFor: ["Sensor data", "Metrics", "Logs", "Monitoring"]
    },
    document: {
      name: "Document (MongoDB)",
      shortName: "Document",
      color: "#00ED64",
      icon: "\uD83D\uDCC4",
      cap: { c: 0.7, a: 0.85, p: 0.85 },
      strengths: ["Flexible schema", "Nested data", "Horizontal scaling", "Rich queries"],
      weaknesses: ["Storage overhead", "No native time-series", "Memory intensive"],
      bestFor: ["Device configs", "User profiles", "Events", "Catalogs"]
    },
    keyvalue: {
      name: "Key-Value (Redis)",
      shortName: "Key-Value",
      color: "#DC382D",
      icon: "\u26A1",
      cap: { c: 0.5, a: 0.95, p: 0.9 },
      strengths: ["Sub-millisecond latency", "Pub/sub messaging", "Data structures", "Caching"],
      weaknesses: ["Memory-bound", "Limited queries", "No complex relationships"],
      bestFor: ["Caching", "Session state", "Real-time counters", "Rate limiting"]
    },
    relational: {
      name: "Relational (PostgreSQL)",
      shortName: "Relational",
      color: "#336791",
      icon: "\uD83D\uDDC3",
      cap: { c: 0.95, a: 0.7, p: 0.6 },
      strengths: ["ACID compliance", "Complex joins", "Data integrity", "Rich ecosystem"],
      weaknesses: ["Scaling complexity", "Schema rigidity", "Write bottlenecks"],
      bestFor: ["Inventory", "Billing", "Compliance", "User management"]
    },
    graph: {
      name: "Graph (Neo4j)",
      shortName: "Graph",
      color: "#008CC1",
      icon: "\uD83D\uDD78",
      cap: { c: 0.8, a: 0.75, p: 0.7 },
      strengths: ["Relationship traversal", "Pattern matching", "Network analysis", "Recommendations"],
      weaknesses: ["Not for bulk analytics", "Complex operations", "Scaling challenges"],
      bestFor: ["Device relationships", "Network topology", "Impact analysis", "Dependencies"]
    }
  };

  // Scenarios organized by difficulty
  const scenarios = {
    simple: [
      {
        id: 1,
        title: "Smart Factory Sensors",
        description: "A manufacturing plant has 500 vibration sensors sampling at 100Hz. They need to store 90 days of raw data and query last-hour trends for anomaly detection dashboards.",
        requirements: {
          volume: "~4TB/month",
          velocity: "50,000 writes/sec",
          queryPattern: "Time-range aggregations",
          consistency: "Eventual OK",
          retention: "90 days raw, 2 years aggregated"
        },
        correct: "timeseries",
        feedback: {
          timeseries: "Correct! Time-series databases excel at high-frequency sensor data with built-in retention policies and efficient time-range queries. InfluxDB handles 50K writes/sec easily with automatic downsampling.",
          document: "Not ideal. MongoDB can store time-series data, but lacks native retention policies and time-optimized indexes. Storage overhead would be significant at this volume.",
          keyvalue: "Incorrect. Redis is memory-bound and lacks time-range query capabilities. 4TB/month would require expensive RAM and manual data management.",
          relational: "Suboptimal. PostgreSQL cannot efficiently handle 50K writes/sec without TimescaleDB extension. Standard indexes struggle with time-range aggregations at this scale.",
          graph: "Incorrect. Graph databases are for relationships between entities, not high-velocity sensor readings. This would be extremely inefficient."
        }
      },
      {
        id: 2,
        title: "IoT Device Registry",
        description: "A smart home platform needs to store configuration for 100,000 devices. Each device has different attributes (thermostats have schedules, cameras have resolution settings, locks have access lists). Configs are read frequently but updated rarely.",
        requirements: {
          volume: "~10GB total",
          velocity: "100 reads/sec, 10 writes/sec",
          queryPattern: "Key lookups, filtered searches",
          consistency: "Strong preferred",
          retention: "Indefinite"
        },
        correct: "document",
        feedback: {
          timeseries: "Incorrect. Time-series databases are optimized for append-only timestamped data, not device configurations that are updated in place.",
          document: "Correct! Document databases excel with varied, nested structures. Each device type can have different fields without schema migrations. MongoDB's flexible schema perfectly fits heterogeneous device configs.",
          keyvalue: "Partial fit. Redis could work for simple key-value lookups but lacks query capabilities to filter devices by attributes (e.g., 'all thermostats with eco mode').",
          relational: "Possible but rigid. You'd need many nullable columns or a complex EAV pattern. Schema changes require migrations when adding new device types.",
          graph: "Overkill. Graph databases are for relationship traversals. Device configs don't have significant interconnections requiring graph queries."
        }
      },
      {
        id: 3,
        title: "Real-Time Dashboard Cache",
        description: "A fleet management system shows live vehicle positions to 1,000 concurrent users. GPS updates arrive every 2 seconds per vehicle (5,000 vehicles). Dashboards need instant refresh with sub-50ms response.",
        requirements: {
          volume: "~500MB active data",
          velocity: "2,500 writes/sec, 10,000 reads/sec",
          queryPattern: "Point lookups by vehicle ID",
          consistency: "Eventual (latest position is fine)",
          retention: "Current state only (historical stored elsewhere)"
        },
        correct: "keyvalue",
        feedback: {
          timeseries: "Not optimal for this use case. While time-series DBs store historical data well, real-time dashboards need sub-50ms lookups which Redis provides natively.",
          document: "Slower than needed. MongoDB queries typically take 5-20ms, not suitable for 10K reads/sec with strict latency requirements.",
          keyvalue: "Correct! Redis provides sub-millisecond lookups perfect for real-time dashboards. Store current vehicle state as key-value pairs, publish updates via pub/sub for instant UI refresh.",
          relational: "Too slow. PostgreSQL point lookups are 2-10ms, and connection pooling overhead adds latency. Not designed for 10K concurrent reads.",
          graph: "Incorrect. No relationship traversal needed here. Graph databases would add unnecessary complexity and latency."
        }
      },
      {
        id: 4,
        title: "Utility Billing System",
        description: "An electric utility company needs to store smart meter readings and generate monthly bills. Billing requires joining meter data with customer accounts, rate plans, and payment history. Must maintain audit trails for regulatory compliance.",
        requirements: {
          volume: "~50GB/month",
          velocity: "1,000 writes/sec",
          queryPattern: "Complex joins, aggregate calculations",
          consistency: "Strong ACID required",
          retention: "7 years (regulatory)"
        },
        correct: "relational",
        feedback: {
          timeseries: "Insufficient. Time-series databases lack JOIN capabilities needed to link meters, customers, rate plans, and payments for billing calculations.",
          document: "Risky. MongoDB lacks ACID transactions across collections (until v4.0+) and JOIN operations. Audit trails and regulatory compliance need relational integrity.",
          keyvalue: "Incorrect. Billing requires complex queries joining multiple entities. Key-value stores cannot express billing logic efficiently.",
          relational: "Correct! Billing systems need ACID compliance for financial accuracy, JOINs to connect customers-meters-rates-payments, and audit trails. PostgreSQL with proper indexing handles this workload perfectly.",
          graph: "Not suited. While relationships exist, billing is tabular/aggregate rather than graph traversal. Graph databases aren't optimized for SUM/AVG calculations."
        }
      },
      {
        id: 5,
        title: "Application Event Logs",
        description: "A smart building system generates events from 10,000 sensors (door opens, motion detected, temperature alerts). Events have varied structures and need to be searchable by time, location, and event type for security audits.",
        requirements: {
          volume: "~100GB/month",
          velocity: "5,000 events/sec",
          queryPattern: "Time-range + field filters",
          consistency: "Eventual OK",
          retention: "1 year"
        },
        correct: "timeseries",
        feedback: {
          timeseries: "Correct! Event logs are naturally timestamped and queried by time ranges. InfluxDB's tag-based filtering handles location and event type queries efficiently with automatic retention.",
          document: "Acceptable alternative. MongoDB can store varied event structures, but lacks native time-series optimizations. Queries would be slower at this scale without careful indexing.",
          keyvalue: "Incorrect. Event logs require range queries and filtering, not just key lookups. Redis lacks the query capabilities needed.",
          relational: "Suboptimal. Varied event structures would require EAV patterns or JSONB columns. Time-range queries need careful partitioning to perform well.",
          graph: "Incorrect. Events are independent records, not interconnected entities requiring relationship traversal."
        }
      }
    ],
    multi: [
      {
        id: 6,
        title: "Smart Agriculture Platform",
        description: "A precision farming startup manages 1,000 farms with soil sensors (hourly readings), weather stations (minute data), and crop planning documents. Farmers query historical trends, but also need real-time irrigation triggers when soil moisture drops.",
        requirements: {
          volume: "~200GB/month sensor data, ~5GB configs",
          velocity: "10,000 writes/sec sensors, 100 writes/sec configs",
          queryPattern: "Time-series aggregations + document lookups + real-time alerts",
          consistency: "Eventual for sensors, Strong for farm configs",
          retention: "5 years trends, current configs"
        },
        correct: "timeseries",
        hint: "Think about the PRIMARY data workload. What generates the most volume?",
        feedback: {
          timeseries: "Correct! The primary workload is sensor time-series data (200GB vs 5GB configs). Use InfluxDB for sensor data with continuous queries for irrigation alerts. Farm configs can use a secondary document store or PostgreSQL.",
          document: "Partial. MongoDB can handle both, but sensor queries would be inefficient. The 10K writes/sec sensor workload dominates the architecture decision.",
          keyvalue: "Incorrect for primary storage. Redis could cache real-time alerts but can't store 200GB of historical sensor data or support trend queries.",
          relational: "Insufficient. 10K sensor writes/sec requires time-series optimization. PostgreSQL without TimescaleDB would struggle with this volume.",
          graph: "Incorrect. Farm-sensor relationships are simple hierarchies, not complex graphs. Time-series queries dominate the workload."
        }
      },
      {
        id: 7,
        title: "Industrial IoT Gateway",
        description: "An edge gateway aggregates data from 100 PLCs (Programmable Logic Controllers). It must cache recent values for local HMI displays (sub-10ms), buffer messages during network outages, and forward batches to cloud. Memory is limited to 2GB.",
        requirements: {
          volume: "~100MB active cache",
          velocity: "5,000 reads/sec from HMI, 1,000 writes/sec from PLCs",
          queryPattern: "Point lookups by tag name",
          consistency: "Latest value sufficient",
          retention: "Current + 1 hour buffer"
        },
        correct: "keyvalue",
        hint: "Edge constraints matter. What database fits in 2GB RAM with sub-10ms latency?",
        feedback: {
          timeseries: "Overkill for edge. InfluxDB's footprint exceeds 2GB for proper operation. Edge caching needs simpler, lighter storage.",
          document: "Too heavy. MongoDB requires ~1GB minimum and doesn't provide sub-10ms lookups reliably. Not suited for constrained edge environments.",
          keyvalue: "Correct! Redis (or embedded alternatives like LevelDB) fits edge constraints perfectly. Sub-millisecond lookups for HMI, efficient buffering with lists/streams, minimal memory footprint.",
          relational: "Too slow and heavy. SQLite could work for buffering but can't match sub-10ms read requirements at 5K reads/sec.",
          graph: "Incorrect. No relationship queries needed. Tag-value lookups are simple key-value operations."
        }
      },
      {
        id: 8,
        title: "Smart City Traffic System",
        description: "A city deploys 5,000 traffic sensors measuring vehicle counts and speeds. The system must identify traffic patterns, detect congestion in real-time, and model how traffic flows between intersections to optimize signal timing.",
        requirements: {
          volume: "~500GB/month sensor data",
          velocity: "20,000 measurements/sec",
          queryPattern: "Time-series analysis + graph traversal for flow modeling",
          consistency: "Eventual for sensors, Strong for signal timing",
          retention: "1 year historical, real-time current"
        },
        correct: "timeseries",
        hint: "Which workload dominates? Raw measurements or relationship queries?",
        feedback: {
          timeseries: "Correct! The 20K measurements/sec sensor workload dominates. Use time-series DB for traffic counts/speeds. Flow modeling can use pre-computed adjacency or a secondary graph store for occasional optimization queries.",
          document: "Insufficient. Traffic data is inherently timestamped. Document stores lack time-series optimizations for 20K writes/sec.",
          keyvalue: "Partial. Good for real-time congestion state but cannot store 500GB/month historical data or support trend analysis.",
          relational: "Not optimized. High-velocity time-series writes need specialized storage. Adjacency modeling for signal optimization is secondary to measurement storage.",
          graph: "Secondary concern. While intersection flow is graph-like, the 20K/sec measurement ingestion is the primary challenge. Graph queries run occasionally, not per-measurement."
        }
      },
      {
        id: 9,
        title: "Healthcare Wearable Platform",
        description: "A telehealth company monitors 50,000 patients with wearables. Heart rate (every second), blood pressure (hourly), and daily activity summaries. Doctors query patient histories, and compliance regulations require 10-year audit trails with no data modification.",
        requirements: {
          volume: "~2TB/month (mostly heart rate)",
          velocity: "50,000 writes/sec peak",
          queryPattern: "Patient time-series + compliance audits",
          consistency: "Strong for audit integrity",
          retention: "10 years, immutable"
        },
        correct: "timeseries",
        hint: "HIPAA compliance needs audit trails, but what's the primary storage challenge?",
        feedback: {
          timeseries: "Correct! Heart rate data at 50K writes/sec is the dominant challenge. Time-series DBs with append-only storage support HIPAA immutability. Use PostgreSQL alongside for patient records and audit metadata.",
          document: "Insufficient for write volume. MongoDB can't efficiently handle 50K writes/sec of time-stamped data without significant infrastructure.",
          keyvalue: "Cannot meet requirements. 2TB/month exceeds practical Redis capacity, and regulatory audit trails need queryable history, not just current state.",
          relational: "Cannot handle primary workload. 50K heart rate writes/sec requires time-series optimization. PostgreSQL is needed for patient records but not raw vitals.",
          graph: "Incorrect. Patient-doctor relationships are simple and don't require graph traversal. Time-series vitals dominate the architecture."
        }
      },
      {
        id: 10,
        title: "E-commerce IoT Inventory",
        description: "A warehouse uses RFID readers tracking 500,000 items. The system must maintain accurate inventory counts (for orders), track item locations in real-time (for picking), and generate compliance reports for auditors.",
        requirements: {
          volume: "~20GB inventory + ~50GB movement logs/month",
          velocity: "10,000 RFID scans/sec peak",
          queryPattern: "Inventory lookups + location queries + compliance reports",
          consistency: "Strong for inventory counts",
          retention: "Current inventory, 2 years movement history"
        },
        correct: "relational",
        hint: "Inventory accuracy is critical for orders. What ensures data integrity?",
        feedback: {
          timeseries: "Partial. Good for movement history but cannot maintain authoritative inventory counts with ACID guarantees for order fulfillment.",
          document: "Risky. Inventory counts need transactional accuracy. Document stores historically lack multi-document ACID (improved in MongoDB 4.0+).",
          keyvalue: "Insufficient. While fast for location lookups, inventory counts need transactional integrity that Redis cannot provide.",
          relational: "Correct! Inventory management requires ACID transactions to prevent overselling. PostgreSQL handles inventory counts with foreign keys to orders, locations, and compliance records. Movement history can use a secondary time-series store.",
          graph: "Not suited. Item-location relationships are simple hierarchies, not complex graphs. Inventory accuracy needs relational integrity."
        }
      }
    ],
    tradeoff: [
      {
        id: 11,
        title: "Global Fleet Telematics",
        description: "A logistics company tracks 100,000 vehicles across 50 countries. Each vehicle sends GPS, fuel, and engine data every 5 seconds. Regional offices need local data access during network partitions. Corporate needs global analytics.",
        requirements: {
          volume: "~5TB/month global",
          velocity: "20,000 writes/sec per region",
          queryPattern: "Regional real-time + global analytics",
          consistency: "Regional strong, global eventual",
          retention: "1 year per region, 5 years aggregated global"
        },
        capFocus: "Partition Tolerance vs Consistency",
        correct: "timeseries",
        hint: "CAP theorem: Network partitions between regions are guaranteed. What trade-off fits?",
        feedback: {
          timeseries: "Correct! Time-series DBs like InfluxDB Enterprise support multi-region deployment with eventual consistency for global aggregation while maintaining regional consistency. AP-focused for partitions.",
          document: "Acceptable. MongoDB's sharding supports global distribution, but time-series queries would be less efficient than purpose-built TSDB.",
          keyvalue: "Insufficient. Cannot store 5TB/month historical data and lacks cross-region replication for analytics.",
          relational: "Struggles with partitions. PostgreSQL prioritizes consistency (CP), causing regional unavailability during network issues between datacenters.",
          graph: "Incorrect workload. Telematics is time-series GPS data, not relationship traversal."
        }
      },
      {
        id: 12,
        title: "Smart Grid Network Topology",
        description: "A power utility models its distribution network: 500,000 nodes (transformers, switches, meters) with 2 million connections. Engineers query 'what's affected if transformer X fails?' and 'find shortest path for power rerouting.'",
        requirements: {
          volume: "~50GB topology data",
          velocity: "Low (topology changes rarely)",
          queryPattern: "Multi-hop traversals, impact analysis",
          consistency: "Strong for topology accuracy",
          retention: "Current topology + change history"
        },
        capFocus: "Consistency for topology integrity",
        correct: "graph",
        hint: "What database type excels at 'what's connected to this?' queries?",
        feedback: {
          timeseries: "Incorrect. Network topology isn't time-series data. Meters might send readings (time-series), but the grid structure is a graph problem.",
          document: "Possible but inefficient. Traversing 'what's downstream of transformer X' requires recursive queries that document stores don't optimize.",
          keyvalue: "Cannot express relationships. 'Find all meters affected by outage' requires graph traversal, not key lookups.",
          relational: "Technically possible with recursive CTEs, but graph databases are 100-1000x faster for multi-hop traversals on 2M connections.",
          graph: "Correct! Graph databases excel at relationship traversal. 'What's affected by failure X?' is a natural graph query. Neo4j's Cypher handles multi-hop paths efficiently on 500K nodes."
        }
      },
      {
        id: 13,
        title: "Real-Time Fraud Detection",
        description: "A payment processor analyzes IoT point-of-sale transactions. The system must check each transaction against the last 100 transactions for the card (velocity check), customer location history (impossible travel), and known fraud patterns. Decision needed in <50ms.",
        requirements: {
          volume: "~1TB/month transactions",
          velocity: "50,000 transactions/sec peak",
          queryPattern: "Point lookups + pattern matching",
          consistency: "Latest data critical",
          retention: "90 days detailed, 7 years summary"
        },
        capFocus: "Availability under load",
        correct: "keyvalue",
        hint: "50ms decision time with 50K TPS. What provides sub-millisecond lookups?",
        feedback: {
          timeseries: "Too slow. Time-series queries optimize for range aggregations, not sub-50ms per-transaction lookups at 50K TPS.",
          document: "Possible but borderline. MongoDB queries at 50K TPS would require careful optimization to meet 50ms latency consistently.",
          keyvalue: "Correct! Redis provides sub-millisecond lookups for last-N transactions and location history. Use sorted sets for velocity checks, geo commands for location. 50K TPS is routine for Redis clusters.",
          relational: "Too slow. Even optimized PostgreSQL can't reliably deliver 50ms at 50K TPS for complex per-transaction queries.",
          graph: "Not suited. Fraud detection here is pattern matching on recent history, not relationship traversal. Graph would add latency."
        }
      },
      {
        id: 14,
        title: "IoT Platform Multi-Tenancy",
        description: "A SaaS IoT platform hosts 500 customers, each with different data volumes (10 devices to 100,000 devices). Customers must be isolated (can't see each other's data), and large customers shouldn't impact small customer performance.",
        requirements: {
          volume: "~10TB total across tenants",
          velocity: "Varies wildly by tenant",
          queryPattern: "Tenant-scoped queries",
          consistency: "Per-tenant strong",
          retention: "Customer-configurable"
        },
        capFocus: "Tenant isolation and fairness",
        correct: "timeseries",
        hint: "What database supports tenant isolation with independent retention policies?",
        feedback: {
          timeseries: "Correct! InfluxDB's bucket-per-tenant or TimescaleDB's schema-per-tenant provides isolation. Per-tenant retention policies, independent scaling, and query isolation prevent noisy neighbors.",
          document: "Possible. MongoDB's database-per-tenant works but requires careful index design per tenant. Time-series queries less efficient.",
          keyvalue: "Difficult. Tenant isolation in Redis requires separate instances or careful key namespacing. No built-in retention policies.",
          relational: "Complex. Schema-per-tenant in PostgreSQL works but time-series queries need TimescaleDB. Row-level security has performance overhead.",
          graph: "Not primary concern. IoT platforms are dominated by time-series sensor data, not relationship queries."
        }
      },
      {
        id: 15,
        title: "Predictive Maintenance Pipeline",
        description: "A manufacturing analytics company builds ML models on equipment sensor data. Data scientists need to: (1) query historical data for model training, (2) join sensor data with maintenance records, (3) serve model predictions with <100ms latency.",
        requirements: {
          volume: "~20TB historical",
          velocity: "10,000 predictions/sec serving",
          queryPattern: "Bulk historical reads + real-time serving",
          consistency: "Eventual for training, Strong for predictions",
          retention: "5 years historical"
        },
        capFocus: "Read optimization for ML",
        correct: "timeseries",
        hint: "ML pipelines have distinct training vs serving workloads. What handles both?",
        feedback: {
          timeseries: "Correct! Time-series DBs excel at historical sensor data queries for ML training. Use TimescaleDB for SQL JOINs with maintenance records. Prediction serving typically uses a separate cache/model server, not the primary DB.",
          document: "Suboptimal. Bulk historical reads for ML training would be slow. Document stores don't optimize for time-range scans on 20TB.",
          keyvalue: "Serving only. Redis can cache predictions for <100ms serving but cannot store 20TB historical data or support training queries.",
          relational: "Partial. PostgreSQL handles JOINs well but struggles with 20TB time-series bulk reads without TimescaleDB extension.",
          graph: "Incorrect. Predictive maintenance is time-series analysis, not relationship traversal. Equipment-sensor relationships are simple hierarchies."
        }
      }
    ]
  };

  // Game state
  let currentLevel = "simple";
  let currentScenarioIndex = 0;
  let score = 0;
  let totalAttempted = 0;
  let streak = 0;
  let showingFeedback = false;
  let selectedAnswer = null;
  let answeredScenarios = new Set();

  // Create container
  const container = d3.create("div")
    .style("font-family", "system-ui, -apple-system, sans-serif")
    .style("max-width", "900px")
    .style("margin", "0 auto");

  // Header section
  const header = container.append("div")
    .style("background", `linear-gradient(135deg, ${colors.navy} 0%, ${colors.darkBlue} 100%)`)
    .style("padding", "20px")
    .style("border-radius", "12px")
    .style("margin-bottom", "20px")
    .style("box-shadow", "0 4px 15px rgba(0,0,0,0.2)");

  const headerTop = header.append("div")
    .style("display", "flex")
    .style("justify-content", "space-between")
    .style("align-items", "center")
    .style("flex-wrap", "wrap")
    .style("gap", "15px");

  headerTop.append("div")
    .style("font-size", "24px")
    .style("font-weight", "bold")
    .style("color", colors.white)
    .html("\uD83D\uDD0D Database Detective");

  // Score display
  const scoreDisplay = headerTop.append("div")
    .style("display", "flex")
    .style("gap", "20px")
    .style("align-items", "center");

  const scoreBox = scoreDisplay.append("div")
    .style("background", "rgba(255,255,255,0.1)")
    .style("padding", "10px 20px")
    .style("border-radius", "8px")
    .style("text-align", "center");

  scoreBox.append("div")
    .style("font-size", "11px")
    .style("color", colors.gray)
    .style("text-transform", "uppercase")
    .text("Score");

  const scoreValue = scoreBox.append("div")
    .style("font-size", "24px")
    .style("font-weight", "bold")
    .style("color", colors.teal);

  const streakBox = scoreDisplay.append("div")
    .style("background", "rgba(255,255,255,0.1)")
    .style("padding", "10px 20px")
    .style("border-radius", "8px")
    .style("text-align", "center");

  streakBox.append("div")
    .style("font-size", "11px")
    .style("color", colors.gray)
    .style("text-transform", "uppercase")
    .text("Streak");

  const streakValue = streakBox.append("div")
    .style("font-size", "24px")
    .style("font-weight", "bold")
    .style("color", colors.orange);

  // Level selector
  const levelSelector = header.append("div")
    .style("display", "flex")
    .style("gap", "10px")
    .style("margin-top", "15px")
    .style("flex-wrap", "wrap");

  const levels = [
    { key: "simple", label: "Level 1: Simple", desc: "Single database choice" },
    { key: "multi", label: "Level 2: Multi-Req", desc: "Multiple requirements" },
    { key: "tradeoff", label: "Level 3: Trade-offs", desc: "CAP theorem analysis" }
  ];

  levels.forEach(level => {
    levelSelector.append("button")
      .attr("data-level", level.key)
      .style("padding", "10px 20px")
      .style("border", "none")
      .style("border-radius", "6px")
      .style("cursor", "pointer")
      .style("font-size", "13px")
      .style("font-weight", "bold")
      .style("transition", "all 0.2s")
      .style("background", level.key === currentLevel ? colors.teal : "rgba(255,255,255,0.1)")
      .style("color", colors.white)
      .text(level.label)
      .on("click", function() {
        currentLevel = level.key;
        currentScenarioIndex = 0;
        showingFeedback = false;
        selectedAnswer = null;
        updateLevelButtons();
        renderScenario();
      })
      .on("mouseover", function() {
        if (level.key !== currentLevel) {
          d3.select(this).style("background", "rgba(255,255,255,0.2)");
        }
      })
      .on("mouseout", function() {
        if (level.key !== currentLevel) {
          d3.select(this).style("background", "rgba(255,255,255,0.1)");
        }
      });
  });

  function updateLevelButtons() {
    levelSelector.selectAll("button").each(function() {
      const btn = d3.select(this);
      const level = btn.attr("data-level");
      btn.style("background", level === currentLevel ? colors.teal : "rgba(255,255,255,0.1)");
    });
  }

  // Main game area
  const gameArea = container.append("div")
    .style("display", "flex")
    .style("gap", "20px")
    .style("flex-wrap", "wrap");

  // Case file panel
  const casePanel = gameArea.append("div")
    .style("flex", "1.5")
    .style("min-width", "350px");

  // Answer panel
  const answerPanel = gameArea.append("div")
    .style("flex", "1")
    .style("min-width", "280px");

  // Feedback panel (below game area)
  const feedbackPanel = container.append("div")
    .style("margin-top", "20px")
    .style("display", "none");

  // Progress bar
  const progressContainer = container.append("div")
    .style("margin-top", "20px")
    .style("background", colors.lightGray)
    .style("border-radius", "8px")
    .style("padding", "15px");

  progressContainer.append("div")
    .style("font-size", "12px")
    .style("color", colors.navy)
    .style("margin-bottom", "8px")
    .style("font-weight", "bold")
    .text("Investigation Progress");

  const progressBar = progressContainer.append("div")
    .style("display", "flex")
    .style("gap", "4px")
    .style("flex-wrap", "wrap");

  // Expertise rating panel
  const expertisePanel = container.append("div")
    .style("margin-top", "20px")
    .style("background", colors.white)
    .style("border", `2px solid ${colors.lightGray}`)
    .style("border-radius", "8px")
    .style("padding", "20px");

  function getExpertiseRating(pct) {
    if (pct >= 90) return { title: "Database Architect", color: colors.gold, icon: "\uD83C\uDFC6" };
    if (pct >= 75) return { title: "Senior Data Engineer", color: colors.purple, icon: "\uD83E\uDD47" };
    if (pct >= 60) return { title: "Data Engineer", color: colors.teal, icon: "\uD83E\uDD48" };
    if (pct >= 40) return { title: "Junior Developer", color: colors.orange, icon: "\uD83E\uDD49" };
    return { title: "Trainee", color: colors.gray, icon: "\uD83D\uDCDA" };
  }

  function renderScenario() {
    const levelScenarios = scenarios[currentLevel];
    const scenario = levelScenarios[currentScenarioIndex];

    // Clear panels
    casePanel.html("");
    answerPanel.html("");
    feedbackPanel.style("display", "none").html("");

    // Case file header
    const caseHeader = casePanel.append("div")
      .style("background", colors.navy)
      .style("color", colors.white)
      .style("padding", "15px")
      .style("border-radius", "8px 8px 0 0")
      .style("display", "flex")
      .style("justify-content", "space-between")
      .style("align-items", "center");

    caseHeader.append("div")
      .style("font-size", "14px")
      .style("font-weight", "bold")
      .html(`\uD83D\uDCC1 Case #${scenario.id}: ${scenario.title}`);

    caseHeader.append("div")
      .style("font-size", "12px")
      .style("background", "rgba(255,255,255,0.2)")
      .style("padding", "4px 10px")
      .style("border-radius", "4px")
      .text(`${currentScenarioIndex + 1} of ${levelScenarios.length}`);

    // Case content
    const caseContent = casePanel.append("div")
      .style("background", colors.white)
      .style("border", `2px solid ${colors.lightGray}`)
      .style("border-top", "none")
      .style("border-radius", "0 0 8px 8px")
      .style("padding", "20px");

    // Description
    caseContent.append("div")
      .style("font-size", "14px")
      .style("line-height", "1.6")
      .style("color", colors.navy)
      .style("margin-bottom", "20px")
      .text(scenario.description);

    // Requirements table
    caseContent.append("div")
      .style("font-size", "13px")
      .style("font-weight", "bold")
      .style("color", colors.teal)
      .style("margin-bottom", "10px")
      .text("Evidence Summary:");

    const reqTable = caseContent.append("div")
      .style("display", "grid")
      .style("grid-template-columns", "auto 1fr")
      .style("gap", "8px 15px")
      .style("font-size", "13px")
      .style("background", colors.lightGray)
      .style("padding", "15px")
      .style("border-radius", "6px");

    const reqLabels = {
      volume: "\uD83D\uDCCA Data Volume",
      velocity: "\u26A1 Velocity",
      queryPattern: "\uD83D\uDD0D Query Pattern",
      consistency: "\uD83D\uDD12 Consistency",
      retention: "\uD83D\uDCC5 Retention"
    };

    Object.entries(scenario.requirements).forEach(([key, value]) => {
      reqTable.append("div")
        .style("font-weight", "bold")
        .style("color", colors.navy)
        .text(reqLabels[key] || key);
      reqTable.append("div")
        .style("color", colors.gray)
        .text(value);
    });

    // CAP focus for trade-off level
    if (scenario.capFocus) {
      caseContent.append("div")
        .style("margin-top", "15px")
        .style("padding", "12px")
        .style("background", `linear-gradient(135deg, ${colors.purple}22, ${colors.purple}11)`)
        .style("border-left", `4px solid ${colors.purple}`)
        .style("border-radius", "4px")
        .style("font-size", "13px")
        .html(`<strong style="color: ${colors.purple}">CAP Focus:</strong> <span style="color: ${colors.navy}">${scenario.capFocus}</span>`);
    }

    // Hint for harder levels
    if (scenario.hint && currentLevel !== "simple") {
      caseContent.append("div")
        .style("margin-top", "15px")
        .style("padding", "12px")
        .style("background", `linear-gradient(135deg, ${colors.orange}22, ${colors.orange}11)`)
        .style("border-left", `4px solid ${colors.orange}`)
        .style("border-radius", "4px")
        .style("font-size", "13px")
        .html(`<strong style="color: ${colors.orange}">Detective's Hint:</strong> <span style="color: ${colors.navy}">${scenario.hint}</span>`);
    }

    // Answer panel
    answerPanel.append("div")
      .style("font-size", "14px")
      .style("font-weight", "bold")
      .style("color", colors.navy)
      .style("margin-bottom", "15px")
      .text("Select the Database:");

    const dbOptions = answerPanel.append("div")
      .style("display", "flex")
      .style("flex-direction", "column")
      .style("gap", "10px");

    Object.entries(databases).forEach(([key, db]) => {
      const btn = dbOptions.append("button")
        .attr("data-db", key)
        .style("display", "flex")
        .style("align-items", "center")
        .style("gap", "12px")
        .style("padding", "12px 15px")
        .style("border", `2px solid ${colors.lightGray}`)
        .style("border-radius", "8px")
        .style("background", colors.white)
        .style("cursor", "pointer")
        .style("transition", "all 0.2s")
        .style("text-align", "left")
        .on("click", function() {
          if (!showingFeedback) {
            selectAnswer(key, scenario);
          }
        })
        .on("mouseover", function() {
          if (!showingFeedback) {
            d3.select(this)
              .style("border-color", db.color)
              .style("background", `${db.color}11`);
          }
        })
        .on("mouseout", function() {
          if (!showingFeedback && selectedAnswer !== key) {
            d3.select(this)
              .style("border-color", colors.lightGray)
              .style("background", colors.white);
          }
        });

      btn.append("span")
        .style("font-size", "24px")
        .text(db.icon);

      const btnText = btn.append("div");

      btnText.append("div")
        .style("font-size", "13px")
        .style("font-weight", "bold")
        .style("color", colors.navy)
        .text(db.shortName);

      btnText.append("div")
        .style("font-size", "11px")
        .style("color", colors.gray)
        .text(db.bestFor.slice(0, 2).join(", "));
    });

    // CAP theorem mini-reference
    if (currentLevel === "tradeoff") {
      answerPanel.append("div")
        .style("margin-top", "20px")
        .style("padding", "15px")
        .style("background", colors.lightGray)
        .style("border-radius", "8px")
        .style("font-size", "12px")
        .html(`
          <div style="font-weight: bold; color: ${colors.navy}; margin-bottom: 8px;">CAP Theorem Reference</div>
          <div style="color: ${colors.gray}; line-height: 1.6;">
            <strong>C</strong>onsistency: All nodes see same data<br>
            <strong>A</strong>vailability: Every request gets response<br>
            <strong>P</strong>artition tolerance: System works despite network splits<br><br>
            <em>Pick 2 of 3 during network partitions</em>
          </div>
        `);
    }

    updateProgress();
    updateScore();
  }

  function selectAnswer(dbKey, scenario) {
    selectedAnswer = dbKey;
    showingFeedback = true;
    const isCorrect = dbKey === scenario.correct;

    if (isCorrect) {
      score++;
      streak++;
    } else {
      streak = 0;
    }
    totalAttempted++;
    answeredScenarios.add(scenario.id);

    // Update button states
    answerPanel.selectAll("button").each(function() {
      const btn = d3.select(this);
      const btnDb = btn.attr("data-db");

      if (btnDb === scenario.correct) {
        btn.style("border-color", colors.green)
           .style("background", `${colors.green}22`);
      } else if (btnDb === dbKey && !isCorrect) {
        btn.style("border-color", colors.red)
           .style("background", `${colors.red}22`);
      } else {
        btn.style("opacity", "0.5")
           .style("cursor", "default");
      }
    });

    // Show feedback
    feedbackPanel.style("display", "block").html("");

    const feedbackHeader = feedbackPanel.append("div")
      .style("background", isCorrect ? colors.green : colors.red)
      .style("color", colors.white)
      .style("padding", "15px")
      .style("border-radius", "8px 8px 0 0")
      .style("display", "flex")
      .style("align-items", "center")
      .style("gap", "10px");

    feedbackHeader.append("span")
      .style("font-size", "24px")
      .text(isCorrect ? "\u2705" : "\u274C");

    feedbackHeader.append("span")
      .style("font-size", "16px")
      .style("font-weight", "bold")
      .text(isCorrect ? "Correct! Case Solved" : "Not Quite Right");

    if (isCorrect && streak > 1) {
      feedbackHeader.append("span")
        .style("font-size", "12px")
        .style("background", "rgba(255,255,255,0.2)")
        .style("padding", "4px 10px")
        .style("border-radius", "4px")
        .text(`\uD83D\uDD25 ${streak} streak!`);
    }

    const feedbackContent = feedbackPanel.append("div")
      .style("background", colors.white)
      .style("border", `2px solid ${colors.lightGray}`)
      .style("border-top", "none")
      .style("border-radius", "0 0 8px 8px")
      .style("padding", "20px");

    // Show explanation
    feedbackContent.append("div")
      .style("font-size", "14px")
      .style("line-height", "1.6")
      .style("color", colors.navy)
      .style("margin-bottom", "15px")
      .text(scenario.feedback[dbKey]);

    // If wrong, show correct answer explanation
    if (!isCorrect) {
      feedbackContent.append("div")
        .style("padding", "15px")
        .style("background", `${colors.green}11`)
        .style("border-left", `4px solid ${colors.green}`)
        .style("border-radius", "4px")
        .style("margin-bottom", "15px")
        .html(`<strong style="color: ${colors.green}">Correct Answer: ${databases[scenario.correct].name}</strong><br><span style="font-size: 13px; color: ${colors.navy}">${scenario.feedback[scenario.correct]}</span>`);
    }

    // CAP visualization for correct answer
    const correctDb = databases[scenario.correct];
    const capViz = feedbackContent.append("div")
      .style("margin-top", "15px")
      .style("padding", "15px")
      .style("background", colors.lightGray)
      .style("border-radius", "8px");

    capViz.append("div")
      .style("font-size", "12px")
      .style("font-weight", "bold")
      .style("color", colors.navy)
      .style("margin-bottom", "10px")
      .text(`${correctDb.name} CAP Profile`);

    const capBars = capViz.append("div")
      .style("display", "flex")
      .style("gap", "15px");

    ["Consistency", "Availability", "Partition Tol."].forEach((label, i) => {
      const key = ["c", "a", "p"][i];
      const value = correctDb.cap[key];

      const bar = capBars.append("div")
        .style("flex", "1");

      bar.append("div")
        .style("font-size", "11px")
        .style("color", colors.gray)
        .style("margin-bottom", "4px")
        .text(label);

      const barBg = bar.append("div")
        .style("height", "8px")
        .style("background", colors.white)
        .style("border-radius", "4px")
        .style("overflow", "hidden");

      barBg.append("div")
        .style("width", `${value * 100}%`)
        .style("height", "100%")
        .style("background", correctDb.color)
        .style("border-radius", "4px");
    });

    // Next button
    const levelScenarios = scenarios[currentLevel];
    const hasMore = currentScenarioIndex < levelScenarios.length - 1;

    feedbackContent.append("button")
      .style("margin-top", "20px")
      .style("padding", "12px 24px")
      .style("background", hasMore ? colors.teal : colors.purple)
      .style("color", colors.white)
      .style("border", "none")
      .style("border-radius", "6px")
      .style("font-size", "14px")
      .style("font-weight", "bold")
      .style("cursor", "pointer")
      .text(hasMore ? "Next Case \u2192" : "View Final Rating")
      .on("click", function() {
        if (hasMore) {
          currentScenarioIndex++;
          showingFeedback = false;
          selectedAnswer = null;
          renderScenario();
        } else {
          showFinalRating();
        }
      });

    updateScore();
    updateProgress();
  }

  function showFinalRating() {
    const pct = Math.round((score / totalAttempted) * 100);
    const rating = getExpertiseRating(pct);

    gameArea.style("display", "none");
    feedbackPanel.style("display", "none");
    progressContainer.style("display", "none");

    expertisePanel.html("");
    expertisePanel.style("text-align", "center")
      .style("border-color", rating.color);

    expertisePanel.append("div")
      .style("font-size", "60px")
      .text(rating.icon);

    expertisePanel.append("div")
      .style("font-size", "24px")
      .style("font-weight", "bold")
      .style("color", rating.color)
      .style("margin", "10px 0")
      .text(rating.title);

    expertisePanel.append("div")
      .style("font-size", "48px")
      .style("font-weight", "bold")
      .style("color", colors.navy)
      .text(`${pct}%`);

    expertisePanel.append("div")
      .style("font-size", "14px")
      .style("color", colors.gray)
      .style("margin-bottom", "20px")
      .text(`${score} of ${totalAttempted} cases solved correctly`);

    // Breakdown by level
    const breakdown = expertisePanel.append("div")
      .style("display", "flex")
      .style("justify-content", "center")
      .style("gap", "30px")
      .style("margin", "20px 0")
      .style("flex-wrap", "wrap");

    levels.forEach(level => {
      const levelScenarios = scenarios[level.key];
      const answered = levelScenarios.filter(s => answeredScenarios.has(s.id)).length;

      breakdown.append("div")
        .style("text-align", "center")
        .html(`
          <div style="font-size: 12px; color: ${colors.gray}; text-transform: uppercase">${level.label.split(":")[1].trim()}</div>
          <div style="font-size: 20px; font-weight: bold; color: ${colors.navy}">${answered}/${levelScenarios.length}</div>
        `);
    });

    // Play again button
    expertisePanel.append("button")
      .style("padding", "15px 30px")
      .style("background", colors.teal)
      .style("color", colors.white)
      .style("border", "none")
      .style("border-radius", "8px")
      .style("font-size", "16px")
      .style("font-weight", "bold")
      .style("cursor", "pointer")
      .text("Play Again")
      .on("click", function() {
        // Reset game
        currentLevel = "simple";
        currentScenarioIndex = 0;
        score = 0;
        totalAttempted = 0;
        streak = 0;
        showingFeedback = false;
        selectedAnswer = null;
        answeredScenarios = new Set();

        gameArea.style("display", "flex");
        progressContainer.style("display", "block");
        expertisePanel.html("");
        expertisePanel.style("border-color", colors.lightGray);

        updateLevelButtons();
        renderScenario();
      });
  }

  function updateScore() {
    scoreValue.text(`${score}/${totalAttempted}`);
    streakValue.text(streak);

    if (streak >= 3) {
      streakValue.style("color", colors.orange);
    } else {
      streakValue.style("color", colors.gray);
    }
  }

  function updateProgress() {
    progressBar.html("");

    const allScenarios = [...scenarios.simple, ...scenarios.multi, ...scenarios.tradeoff];

    allScenarios.forEach((scenario, i) => {
      const answered = answeredScenarios.has(scenario.id);
      const isCurrent = scenarios[currentLevel][currentScenarioIndex]?.id === scenario.id;

      progressBar.append("div")
        .style("width", "18px")
        .style("height", "18px")
        .style("border-radius", "4px")
        .style("background", answered ? colors.teal : isCurrent ? colors.orange : colors.lightGray)
        .style("border", isCurrent ? `2px solid ${colors.navy}` : "none")
        .style("transition", "all 0.2s");
    });
  }

  // Initialize expertise panel (empty initially)
  expertisePanel.html("");

  // Initial render
  renderScenario();

  return container.node();
}

1287.2 Understanding Database Selection Through Play

The Database Detective game reinforces these key concepts:

1287.2.1 Level 1: Simple Scenarios

Learn the fundamental strengths of each database type:

Database Type	Primary Strength	Typical IoT Use Case
Time-Series	High write throughput, time-range queries	Sensor data, metrics, logs
Document	Schema flexibility, nested data	Device configs, user profiles
Key-Value	Sub-millisecond latency	Caching, real-time state
Relational	ACID compliance, complex joins	Billing, inventory, compliance
Graph	Relationship traversal	Network topology, dependencies

1287.2.2 Level 2: Multi-Requirement Scenarios

Real IoT systems have competing requirements. Learn to identify the dominant workload:

Volume: Which data type generates the most storage?
Velocity: Which operations need the highest throughput?
Query Pattern: What questions do users ask most frequently?

1287.2.3 Level 3: Trade-off Analysis (CAP Theorem)

The CAP theorem states that distributed systems can only guarantee 2 of 3 properties during network partitions:

%% fig-alt: "CAP theorem triangle showing Consistency, Availability, and Partition Tolerance at vertices, with database types positioned according to their trade-offs: PostgreSQL prioritizes CP, Redis and Cassandra prioritize AP, and MongoDB offers tunable consistency"
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
    subgraph CAP["CAP Theorem"]
        C["<strong>C</strong>onsistency<br/>All nodes see same data"]
        A["<strong>A</strong>vailability<br/>Every request gets response"]
        P["<strong>P</strong>artition Tolerance<br/>Works despite network splits"]
    end

    subgraph CP["CP Systems"]
        PG["PostgreSQL"]
        MS["MySQL"]
    end

    subgraph AP["AP Systems"]
        RD["Redis Cluster"]
        CS["Cassandra"]
        IF["InfluxDB"]
    end

    subgraph Tunable["Tunable"]
        MG["MongoDB"]
        TB["TimescaleDB"]
    end

    C --> CP
    P --> CP
    A --> AP
    P --> AP

    style C fill:#2C3E50,stroke:#16A085,color:#fff
    style A fill:#16A085,stroke:#2C3E50,color:#fff
    style P fill:#E67E22,stroke:#2C3E50,color:#fff

Figure 1287.1

Decision Framework:

CP (Consistency + Partition Tolerance): Choose when data accuracy is critical (billing, inventory)
AP (Availability + Partition Tolerance): Choose when uptime matters more than perfect accuracy (dashboards, caching)
CA (Consistency + Availability): Only possible in non-distributed systems (single-node databases)

1287.3 Knowledge Check

Knowledge Check: Database Selection Mastery Quick Check

Question 1: A smart building has 10,000 occupancy sensors sending presence updates every 30 seconds. The building management system needs to answer: “How many people were in zone X between 2-3pm yesterday?” Which database type is MOST appropriate?

Explanation: The query “How many people were in zone X between 2-3pm yesterday?” is a time-range aggregation - exactly what time-series databases optimize for. InfluxDB’s continuous queries can pre-compute these aggregations efficiently.

Question 2: An IoT gateway needs to cache the latest reading from 1,000 sensors for a local HMI display. Updates arrive 10x per second per sensor. Display refresh requires <10ms response. Memory is limited to 512MB. Which database fits?

Explanation: The requirements are: sub-10ms reads, 10K writes/sec, 512MB memory, current-value-only storage. Redis provides sub-millisecond lookups, runs efficiently in limited memory, and naturally overwrites previous values. Other databases are heavier and slower for this use case.

Question 3: A utility company needs to model its power grid: 100,000 nodes (transformers, switches, meters) with 500,000 connections. Engineers frequently ask “If transformer T-4502 fails, which customers lose power?” Which database type is MOST efficient for this query?

Explanation: “Which customers are downstream of transformer T-4502?” is a graph traversal query. Graph databases like Neo4j are optimized for multi-hop relationship queries, executing them 100-1000x faster than relational JOINs on 500K connections.