Deterministic multi-agent combat simulator — performance characterization and training-data pipeline verification
This is a deterministic real-time multi-agent simulator written in Rust, modeled after Supercell's Clash Royale — a competitive two-player strategy game where each player deploys troops, spells, and buildings onto a shared arena in real time. The game runs at 20 ticks per second with complex agent interactions: melee and ranged combat, area-of-effect spells, spawner mechanics (troops that periodically produce child troops), death-spawn chains (a destroyed unit splitting into smaller units), and buff/debuff systems. I chose Clash Royale as the modeling target because it concentrates many multi-agent coordination challenges into a compact, well-documented system: heterogeneous agent types, continuous spatial dynamics, discrete resource management, and adversarial decision-making under real-time constraints.
The engine is integer-only arithmetic, fully reproducible bit-for-bit, and exposes a Python API via PyO3 for AI agent integration. All card stats (hitpoints, damage, speed, range, attack timing, projectile behavior, buff parameters) are loaded from JSON data files — zero hardcoded heuristics. This document characterizes two critical properties: (1) can the engine maintain real-time throughput as agent count scales from 4 to 3,000+ — the same constraint faced by real-time multi-agent coordination where tick latency budgets are hard — and (2) can full simulation state be captured and transformed into fixed-size observation vectors for RL training — the observability pipeline required for any online AI system that learns from a real-time environment.
All measurements were taken on a single core of an Apple M1 Pro (16 GB RAM). No approximations — every number was measured from the tick loop. Theoretical projections are clearly labeled.
Before presenting results, it is important to distinguish the two independent scaling axes that this document measures.

Axis 1 is the agent count within a single match: all N agents share one arena and interact through the O(N²) targeting pass, so per-tick cost grows superlinearly. Section 1 measures this axis up to 3,000+ agents. Axis 2 is the number of parallel matches: each match is an independent GameState in memory; they do not interact, so total cost scales linearly with per-match tick cost. Section 7 measures this axis up to 10,000 concurrent matches.

For RL training, the parallel simulation count is typically the bottleneck: you want thousands of lightweight environments generating diverse training data simultaneously. The agent count per match is bounded by the game mechanics (a typical Clash Royale match has 10–30 agents on the field at any moment; stress tests push to thousands).
The simulation runs at 20 ticks/second (50 ms per tick budget). Each tick processes: phase/resource update → deploy timers → spawner waves → spell zones → O(N²) targeting → movement → O(N) collision → combat → projectile flight → tower attacks → buff ticks → death processing → cleanup. The dominant cost is targeting: every agent scans all opponents for the nearest valid target.
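The dominant targeting pass can be sketched as a nearest-enemy scan. This is a minimal illustration, not the engine's actual data layout: field names are assumed, and comparing squared distances avoids a square root, consistent with the engine's integer-only arithmetic.

```python
import math

# Minimal sketch of the dominant O(N^2) targeting pass: each agent scans
# all opponents for the nearest valid target. Field names are illustrative,
# not the engine's actual data layout.
def acquire_targets(agents):
    targets = {}
    for a in agents:
        best_id, best_d2 = None, math.inf
        for b in agents:
            if b["team"] == a["team"] or b["hp"] <= 0:
                continue  # skip allies and dead agents
            d2 = (a["x"] - b["x"]) ** 2 + (a["y"] - b["y"]) ** 2
            if d2 < best_d2:
                best_id, best_d2 = b["id"], d2
        targets[a["id"]] = best_id
    return targets

agents = [
    {"id": 0, "team": 0, "x": 0, "y": 0, "hp": 100},
    {"id": 1, "team": 1, "x": 3, "y": 0, "hp": 100},
    {"id": 2, "team": 1, "x": 10, "y": 0, "hp": 100},
]
print(acquire_targets(agents))  # → {0: 1, 1: 0, 2: 0}
```

The inner loop over all opponents is what makes the per-tick cost grow quadratically with agent count.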
Rather than testing a few hand-picked agent counts and projecting, we ran an escalation test: spawn N agents (half per side), run 100 ticks of live combat, measure p99 tick latency. Repeat at N = 100, 200, 400, 600, 800, 1000, 1500, 2000, 2500, 3000.
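The measurement harness can be sketched from the Python side. This assumes a hypothetical `Match` binding with a `step()` method; the actual PyO3 entry points may differ.

```python
import statistics
import time

# Sketch of the escalation harness, assuming a hypothetical Match binding
# with a step() method (name illustrative, not the actual PyO3 API).
def measure_tick_latency(match, ticks=100):
    samples_ms = []
    for _ in range(ticks):
        t0 = time.perf_counter()
        match.step()  # advance the simulation by one tick
        samples_ms.append((time.perf_counter() - t0) * 1000.0)
    p50 = statistics.median(samples_ms)
    p99 = statistics.quantiles(samples_ms, n=100)[98]  # 99th percentile
    return p50, p99
```

Repeating this at each spawn level N yields the table below.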
| Agents spawned | Alive after 100 ticks | p50 (ms) | p99 (ms) | Budget used |
|---|---|---|---|---|
| 100 | 100 | 0.069 | 0.140 | 0.28% |
| 200 | 198 | 0.212 | 0.277 | 0.55% |
| 400 | 387 | 0.703 | 0.803 | 1.61% |
| 600 | 574 | 1.434 | 1.538 | 3.08% |
| 800 | 763 | 2.427 | 2.673 | 5.35% |
| 1,000 | 902 | 3.500 | 4.420 | 8.84% |
| 1,500 | 985 | 4.606 | 9.529 | 19.1% |
| 2,000 | 1,025 | 4.110 | 14.619 | 29.2% |
| 2,500 | 1,501 | 7.215 | 23.637 | 47.3% |
| 3,000 | 2,003 | 10.584 | 34.887 | 69.8% |
Result: 3,000 agents still fit within the 50 ms budget; the engine never exceeded the real-time constraint at any tested level. Extrapolating the latency curve, the ceiling is approximately 3,500–3,800 agents before p99 would reach 50 ms.
The per-tick cost decomposes into a fixed overhead plus a variable targeting cost: t(N) ≈ t₀ + α·N + β·N², where the quadratic term reflects the all-pairs targeting scan and dominates at high agent counts.
However, a subtlety emerges in the data above N = 1,000: combat attrition reduces the alive agent count. At N = 1,500 spawned, only ~985 remain alive after 100 ticks, because low-HP agents (32 HP each) destroy each other rapidly. This means the p99 at high spawn counts is dominated by the first few ticks, when all N agents are alive, rather than by the steady state. The p50 at N = 1,500 (4.6 ms) and N = 2,000 (4.1 ms) are nearly identical because by mid-measurement both have roughly 1,000 survivors.
Fitting the quadratic model to the first six data points (where alive ≈ spawned, so attrition does not distort the measurement) agrees with the measured data: the real-time ceiling is approximately 3,500 agents on a single M1 Pro core. For context, a typical Clash Royale match has 10–30 agents on the field at any moment, so the engine has roughly 115–350× headroom over realistic gameplay.
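As a sanity check, the quadratic model can be fitted to three p50 points from the table above and solved for the 50 ms budget. This is a sketch using an exact three-point solve rather than a least-squares fit; the ceiling quoted in this document extrapolates the p99 curve, which rises faster and crosses the budget earlier.

```python
import math

# Fit t(N) = a + b*N + c*N^2 exactly through three p50 points from the
# escalation table, then solve t(N) = 50 ms for the p50-based ceiling.
pts = [(100, 0.069), (400, 0.703), (800, 2.427)]  # (agents, p50 ms)
(n1, t1), (n2, t2), (n3, t3) = pts

# Eliminate a by subtracting adjacent equations, then solve for c and b.
b1, c1, r1 = n2 - n1, n2**2 - n1**2, t2 - t1
b2, c2, r2 = n3 - n2, n3**2 - n2**2, t3 - t2
c = (r2 / b2 - r1 / b1) / (c2 / b2 - c1 / b1)
b = (r1 - c1 * c) / b1
a = t1 - b * n1 - c * n1**2

# Positive root of c*N^2 + b*N + (a - 50) = 0.
ceiling = (-b + math.sqrt(b * b + 4 * c * (50 - a))) / (2 * c)
print(round(ceiling))  # roughly 3,900 on the p50 curve
```

The p50-based root lands near 3,900 agents; since p99 exceeds p50 by a growing margin at high N, this is consistent with the 3,500–3,800 p99 ceiling quoted above.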
In a real match, agent count is not static. Spawner agents (analogous to base stations that periodically deploy drones) continuously create new agents, while combat removes them. The system reaches a dynamic equilibrium — a birth-death process where the arrival rate (spawners) balances the departure rate (combat kills + lifetime expiry).
Scenario: Two spawner agents deployed at t=0. Each spawner produces 4 child agents every 7 seconds (140 ticks). At t=10s, a swarm of 15 low-HP agents is deployed simultaneously (analogous to a sensor burst). Enemy agents engage and destroy the swarm over 15 seconds.
The system can be modeled as a continuous-time birth-death process. Let λ be the aggregate spawn rate and μ be the per-agent combat death rate; then dN/dt = λ − μN, which relaxes to a steady-state population of N* = λ/μ.
This is directly analogous to the resource scheduling problem in edge clusters: containers (agents) are launched by orchestrators (spawners), consume resources (arena space, targeting bandwidth), and terminate when their task completes (combat death). The steady-state count determines the computational load on the cluster — in our case, the tick latency.
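A minimal numeric sketch of this relaxation: λ follows from the scenario above (2 spawners × 4 children every 140 ticks), while μ is an assumed illustrative value, since the engine does not expose a death rate directly.

```python
# Forward-Euler sketch of the birth-death dynamics dN/dt = λ − μN.
lam = 2 * 4 / 140.0  # ≈ 0.057 agents spawned per tick (from the scenario)
mu = 0.01            # assumed per-agent death rate (hypothetical value)

n = 0.0
for _ in range(5000):    # 1-tick Euler steps
    n += lam - mu * n
print(round(n, 1), round(lam / mu, 1))  # → 5.7 5.7 (the λ/μ fixed point)
```

Whatever the true μ, the population converges to λ/μ, which is what determines the steady-state tick latency.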
We ran 5 consecutive full matches (each 5 minutes of simulated time, with continuous agent deployment every 2 seconds). Memory was sampled between matches.
| Measurement | Value |
|---|---|
| Before first match | 18.97 MB |
| After 5th match | 19.30 MB |
| Total delta | 0.33 MB |
| Peak agents per match | 86 (consistent across all 5) |
| Total agents spawned+killed | ~2,000 across 5 matches |
0.33 MB growth over 2,000+ agent create/destroy cycles. The Rust engine deallocates all entity memory on death via Vec::retain(|e| e.alive) every tick. No garbage collector, no reference counting — deterministic deallocation. This is critical for long-running simulation processes where memory leaks compound over hours of continuous operation.
For RL training, the simulator must emit complete state snapshots every tick. We validate that the state-capture API returns all required fields, is JSON-serializable for DataLake ingestion, and adds negligible overhead to the tick loop.
| Component | Fields | Source |
|---|---|---|
| Per-agent (N agents) | id, team, position (x,y,z), HP, max_HP, shield, damage, kind, buffs, attack_phase, phase_timer | get_entities() |
| Per-player (2 players) | elixir, hand (4 cards), tower HP (3), tower alive (3), crowns, troop_count | get_observation(p) |
| Global | tick, phase, time_remaining | Match metadata |
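Assembling one tick's snapshot from the three sources in the table might look like the following sketch. `get_entities()` and `get_observation(p)` are the documented calls, but the exact return shapes here are assumptions.

```python
import json

# Sketch: combine per-agent, per-player, and global state into one
# JSON-serializable snapshot for DataLake ingestion.
def capture_snapshot(match, tick):
    snap = {
        "tick": tick,                             # global metadata
        "entities": match.get_entities(),         # per-agent fields
        "players": [match.get_observation(p) for p in (0, 1)],
    }
    return json.dumps(snap)  # must round-trip cleanly every tick
```

Calling this once per tick is what the 600-tick validation below exercises.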
Measured over 600 consecutive ticks: zero field errors. Every required field was present every tick. JSON serialization confirmed for all 60 sampled snapshots (one per 10 ticks).
The Rust engine currently has no native event emission. To reconstruct events for training-data annotation, we diff consecutive get_entities() snapshots. Each diff produces a set of typed events:
| Event type | Detection rule | Training-data use |
|---|---|---|
| SPAWN | Entity ID appears in current but not previous snapshot | Reward shaping: opponent spawned a counter |
| DEATH | Entity ID in previous but not current (cleanup removed it) | Reward: kill credit, death-spawn trigger |
| DAMAGE | Same entity, HP decreased | DPS computation, threat assessment |
| MOVE | Same entity, position (x,y) changed | Trajectory prediction, spatial features |
| TOWER_DAMAGE | Tower HP decreased between ticks | Reward signal: objective progress |
| BUFF_APPLY | num_buffs increased | Debuff tracking for tactical state |
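The diff rules above can be sketched directly over two snapshots keyed by entity ID. The snapshot shape here (a dict with "hp", "pos", and "num_buffs" per entity) is an assumption; the real `get_entities()` layout may differ.

```python
# Sketch of snapshot-diff event reconstruction per the rules above.
def diff_events(prev, curr):
    events = []
    for eid in curr.keys() - prev.keys():
        events.append(("SPAWN", eid))
    for eid in prev.keys() - curr.keys():
        events.append(("DEATH", eid))
    for eid in curr.keys() & prev.keys():
        p, c = prev[eid], curr[eid]
        if c["hp"] < p["hp"]:
            events.append(("DAMAGE", eid, p["hp"] - c["hp"]))
        if c["pos"] != p["pos"]:
            events.append(("MOVE", eid, c["pos"]))
        if c["num_buffs"] > p["num_buffs"]:
            events.append(("BUFF_APPLY", eid))
    return events

prev = {1: {"hp": 50, "pos": (0, 0), "num_buffs": 0}}
curr = {1: {"hp": 40, "pos": (0, 1), "num_buffs": 0},
        2: {"hp": 32, "pos": (5, 5), "num_buffs": 0}}
print(sorted(diff_events(prev, curr)))
# → [('DAMAGE', 1, 10), ('MOVE', 1, (0, 1)), ('SPAWN', 2)]
```

Because the diff runs at tick resolution, a parent death and its child spawns land in the same or adjacent diffs, which is what the death-spawn test below relies on.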
A critical test: when a compound agent dies, it spawns child agents (analogous to a drone releasing sub-drones on destruction). We must detect both the parent DEATH and the child SPAWNs in adjacent ticks.
Test: A high-HP compound agent (3200 HP, death_spawn_count=2) is attacked by 8 agents (combined 600 DPS); the compound agent dies at tick ~107.
Events from diff-based reconstruction:
tick ~107: DEATH entity_id=1 card_key="golem"
tick ~107: SPAWN entity_id=10 card_key="Golemite" ← death-spawn #1
tick ~107: SPAWN entity_id=11 card_key="Golemite" ← death-spawn #2
Validation: golem_died=True, golemite_spawned=True, golemite_count=2 (exact match)
The 1-tick resolution of the diff correctly captures the parent-death → child-spawn causality chain. For RL training, this provides the reward signal: "destroying a compound agent creates 2 weaker sub-agents that must also be dealt with."
The RL agent needs a fixed-dimensional numeric input every tick, regardless of how many agents are on the field. We define a 20-float observation vector and validate its consistency across 400 ticks of live simulation.
The observation must be symmetric: what Player 1 sees as "opponent's king tower HP" must exactly equal what Player 2 sees as "my king tower HP." This is verified at tick 0, before any combat alters state.
Symmetry ensures that a single RL policy can play as either player without observational bias — the same architecture used in self-play training for competitive multi-agent systems.
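The symmetry property can be illustrated with a toy state. The layout below is illustrative only, not the engine's actual 20-dim observation schema, and the HP value is a placeholder.

```python
# Toy illustration of the mirrored-observation requirement.
state = {"king_hp": [4824, 4824], "elixir": [5.0, 5.0]}  # hypothetical tick-0 state

def observe(p):
    me, opp = p, 1 - p
    return [
        state["elixir"][me],
        state["king_hp"][me],   # "my king tower HP"
        state["king_hp"][opp],  # "opponent's king tower HP"
    ]

o0, o1 = observe(0), observe(1)
assert o0[2] == o1[1] and o1[2] == o0[1]  # mirrored views agree at tick 0
```

The same index always means the same thing from the acting player's perspective, which is what lets one policy play either seat.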
This section measures the second scaling axis: how many independent matches can a single core sustain at 20 tps real-time. Unlike Section 1 (which stresses the O(N²) within-match targeting), this tests the O(M) across-match cost, where M independent GameState instances are all stepped within each 50 ms frame.
Method: create M matches (each with 4 active agents — light combat, representative of a typical RL training environment), step all M once, measure wall-clock time for the entire batch. Repeat for 20 frames (1 second of real-time). Escalate M from 10 to 10,000.
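The batch-stepping loop can be sketched as follows, again assuming a hypothetical `Match` binding that exposes `step()`.

```python
import time

# Sketch of the parallel-match benchmark: step M independent matches once
# and report batch time plus amortized per-match cost.
def step_batch(matches):
    t0 = time.perf_counter()
    for m in matches:
        m.step()  # independent GameStates: no cross-match coupling
    batch_ms = (time.perf_counter() - t0) * 1000.0
    per_match_us = batch_ms * 1000.0 / len(matches)
    return batch_ms, per_match_us
```

Escalating the batch size and recording p99 over 20 frames yields the table below.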
| Parallel matches | Avg batch (ms) | p99 batch (ms) | Per-match cost (μs) | Budget used |
|---|---|---|---|---|
| 10 | 0.025 | 0.036 | 2.53 | 0.07% |
| 100 | 0.242 | 0.305 | 2.42 | 0.61% |
| 500 | 1.321 | 1.695 | 2.64 | 3.39% |
| 1,000 | 2.637 | 3.343 | 2.64 | 6.69% |
| 2,000 | 5.160 | 6.730 | 2.58 | 13.5% |
| 5,000 | 13.333 | 18.437 | 2.67 | 36.9% |
| 10,000 | 26.981 | 36.700 | 2.70 | 73.4% |
Result: 10,000 simultaneous matches on a single core, all stepped within one 50 ms frame. This is a measured result, not a projection. The per-match cost stays remarkably stable at ~2.6 μs across three orders of magnitude — near-perfect linear scaling with no cache degradation up to 10,000 matches.
Estimated ceiling (from linear extrapolation of per-match cost): ~18,500 parallel matches per core.
Section 1 showed O(N²) scaling for agents within a match because every agent scans all opponents. But across matches, there is no interaction — stepping match A has zero coupling to match B. The only potential degradation is L3 cache pressure: 10,000 GameState instances (~2 KB each) total ~20 MB, which fits within the M1 Pro's shared cache. If the working set exceeded cache, we would see a latency knee — this did not occur up to 10,000.
| Deployment | Cores | Matches per core | Projected total matches | Training samples/sec |
|---|---|---|---|---|
| Single M1 Pro core | 1 | 10,000 (measured) | 10,000 | 200,000 |
| Single M1 Pro core (est. ceiling) | 1 | 18,500 (extrapolated) | 18,500 | 370,000 |
| Edge node (8 perf. cores) | 8 | 18,500 (extrapolated) | 148,000 | 2,960,000 |
| K8s cluster (64 cores) | 64 | 18,500 (extrapolated) | 1,184,000 | 23,680,000 |
Training samples/sec assumes 20 observations per second per match (the engine's tick rate). The single-core measured figure of 10,000 parallel RL environments at real-time speed means a small cluster can generate tens of millions of training samples per second — sufficient for large-scale self-play RL without the simulation being the bottleneck.
RESULTS: 9/9 passed, 0/9 failed
Hardware: Apple M1 Pro, 16 GB RAM, single core
Performance:
✓ 5.1 Multi-unit scaling 150 agents, p99 = 223 μs, 0 dead entities in list
✓ 5.1b Spawner growth 108 engine-spawned agents, peak 32, steady state 6
✓ 5.2 Memory stability 0.33 MB delta across 5 matches (no leak)
✓ 5.3 Tick latency Overall p99 = 35.5 μs, throughput 110,354 tps
✓ 5.4 Agent ceiling 3,000 agents: p99 = 34.9 ms (still under 50 ms budget)
✓ 5.5 Parallel simulation 10,000 matches at real-time speed (measured, not projected)
Observability:
✓ 6.1 State logging 600 ticks, 0 field errors, JSON serializable
✓ 6.2 Event tracing Golem death-spawn chain verified (2 Golemites)
✓ 6.3 Feature extraction Consistent 20-dim vector, 0 value errors, symmetry OK
Key numbers:
Agent ceiling: 3,000 tested, ~3,600 estimated (single core)
Parallel match ceiling: 10,000 measured, ~18,500 estimated (single core)
Tick budget usage: 0.36% (sim + capture + features at typical load)
AI inference headroom: 49.8 ms per tick
Memory footprint: 19.3 MB total (data + engine + entities)
Per-match cost: ~2.6 μs (constant across 10–10,000 matches)