Node Scaling — 2000 Edge Nodes

Status & scope

Stage: Draft
Date: 2026-03-14
Requirement: the VRS use case requires 2000 edge nodes reporting to a single hub.

Problem

Two scaling axes:

Ingest capacity: can the hub accept 2000 concurrent POSTs?
Fusion capacity: can the fusion engine process observations from 2000 nodes, and how long does it take?

The Quadratic Problem

run_multi_fusion() uses itertools.combinations(node_ids, 2) for pairwise comparison, so the number of node pairs grows as n(n-1)/2:

- 3 nodes → 3 pairs (current demo)
- 100 nodes → 4,950 pairs
- 500 nodes → 124,750 pairs
- 2000 nodes → 1,999,000 pairs

Each node pair runs multi-pass blocking and scoring. Even at 0.1 ms per pair, 1,999,000 pairs take about 200 seconds. This loop is the bottleneck.
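
A quick check of the pair counts and the 200-second estimate. The flat 0.1 ms per-pair cost is the hypothetical figure from the paragraph above, not a measurement, and pair_stats is an illustrative helper:

```python
from math import comb
from typing import Tuple

PER_PAIR_MS = 0.1  # hypothetical per-pair cost, taken from the estimate above


def pair_stats(n_nodes: int, per_pair_ms: float = PER_PAIR_MS) -> Tuple[int, float]:
    """Return (node pairs, estimated seconds) for one full pairwise pass."""
    pairs = comb(n_nodes, 2)  # same count that itertools.combinations(node_ids, 2) yields
    return pairs, pairs * per_pair_ms / 1000


for n in (3, 100, 500, 2000):
    pairs, seconds = pair_stats(n)
    print(f"{n:>5} nodes -> {pairs:>9,} pairs -> ~{seconds:,.1f} s")
```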

Mitigation: Pre-merge by Blocking Key

Most node pairs will share ZERO blocking keys and produce ZERO candidates, yet the pairwise loop still visits every one of them. Instead of iterating over all node pairs, we can:

- Build a GLOBAL blocking index: {blocking_key: [(node_id, record_id, record), ...]}
- Compare only records that share a blocking key, regardless of which node they came from.

This replaces O(nodes²) node-pair iterations with O(Σ records_per_block²) comparisons, roughly O(blocks × records_per_block²) when blocks are evenly sized.
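
The global index and candidate generation can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline.py implementation: the entry layout mirrors the index shape above, while build_global_index, candidate_pairs, and key_fn are hypothetical names, and scoring of each candidate pair is left out. Whether same-node pairs should be compared is not specified here; the sketch skips them as an assumption, since the node-pair loop only ever compares records from two different nodes.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Callable, Dict, Hashable, Iterator, List, Tuple

# (node_id, record_id, record): the entry layout of the global index
Entry = Tuple[str, str, Dict[str, Any]]
Index = Dict[Hashable, List[Entry]]


def build_global_index(entries: List[Entry], key_fn: Callable[[Dict[str, Any]], Hashable]) -> Index:
    """Group records from every node under their blocking key (one pass over all records)."""
    index: Index = defaultdict(list)
    for node_id, record_id, record in entries:
        index[key_fn(record)].append((node_id, record_id, record))
    return index


def candidate_pairs(index: Index) -> Iterator[Tuple[Entry, Entry]]:
    """Yield pairs of records that share a blocking key.

    Assumption: pairs from the same node are skipped, mirroring the node-pair
    loop, which only compares records from two different nodes.
    """
    for block in index.values():
        for a, b in combinations(block, 2):
            if a[0] != b[0]:
                yield a, b


# Toy input; the "phenomenon_class" field follows the island demo's blocking attribute.
entries = [
    ("n1", "r1", {"phenomenon_class": "fog"}),
    ("n2", "r2", {"phenomenon_class": "fog"}),
    ("n3", "r3", {"phenomenon_class": "rain"}),
    ("n1", "r4", {"phenomenon_class": "fog"}),
]
index = build_global_index(entries, key_fn=lambda r: r["phenomenon_class"])
pairs = [(a[1], b[1]) for a, b in candidate_pairs(index)]
print(pairs)  # [('r1', 'r2'), ('r2', 'r4')]
```

The ("r1", "r4") pair is dropped because both records come from n1, and "r3" has no partner in its block, so no work is spent on it.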

For the island demo, blocking on phenomenon_class alone (~5 values): 2000 nodes × 5 obs = 10,000 records, or ~2,000 records per phenomenon if evenly spread. Candidate pairs per block: 2000 × 1999 / 2 ≈ 2M; across 5 blocks ≈ 10M total. Still large.
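
The block arithmetic for the island demo, assuming the ~5 phenomenon_class values are evenly populated (real data will be skewed, and the largest block dominates):

```python
from math import comb

N_NODES = 2000
OBS_PER_NODE = 5
N_PHENOMENA = 5  # ~5 phenomenon_class values in the island demo

total_records = N_NODES * OBS_PER_NODE         # 10,000
per_block = total_records // N_PHENOMENA       # 2,000, assuming an even spread
candidates = N_PHENOMENA * comb(per_block, 2)  # 5 * 1,999,000 = 9,995,000

print(f"{total_records:,} records, {per_block:,} per block, {candidates:,} candidate pairs")
```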

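Adding temporal bucketing to the key (phenomenon_class + time bucket) shrinks this further: if each phenomenon block splits evenly into T time buckets, the candidate count falls by roughly a factor of T.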
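
A sketch of a composite key and its effect on the candidate count. The "timestamp" field (epoch seconds), the 600-second bucket width, and both helper names are hypothetical, and the count assumes records spread evenly across phenomena and buckets:

```python
from math import comb
from typing import Any, Dict, Hashable

BUCKET_SECONDS = 600  # hypothetical bucket width; this spec does not fix one


def blocking_key(record: Dict[str, Any], bucket_seconds: float = BUCKET_SECONDS) -> Hashable:
    """Composite key: phenomenon_class plus a fixed-width time bucket.

    The "timestamp" field (epoch seconds) is a hypothetical record attribute.
    """
    return (record["phenomenon_class"], int(record["timestamp"] // bucket_seconds))


def bucketed_candidates(n_buckets: int, total_records: int = 10_000, n_phenomena: int = 5) -> int:
    """Candidate pairs if records spread evenly over phenomena and time buckets."""
    per_block = total_records // (n_phenomena * n_buckets)
    return n_phenomena * n_buckets * comb(per_block, 2)


for t in (1, 10, 20):
    print(f"{t:>3} buckets -> {bucketed_candidates(t):,} candidate pairs")
```

One trade-off of fixed-width buckets: two observations a few seconds apart can land on opposite sides of a boundary and never be compared. Comparing each bucket with its neighboring bucket as well is a common way to cover that, at roughly three times the within-bucket candidate count.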

Test Plan

Phase 1: Find the Breaking Point (local, no Docker)

Test	Nodes	Obs/Node	Total Obs	Expected
S-20a	10	5	50	Baseline
S-20b	50	5	250	Fast
S-20c	100	5	500	Should work
S-20d	500	5	2,500	May slow down
S-20e	1000	5	5,000	Stress
S-20f	2000	5	10,000	Target

Measure: ingest time, fusion time, peak memory, and match count.
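
A minimal measurement harness for the Phase 1 runs, assuming observations can be represented as plain dicts keyed by node id. make_nodes, measure, and the record fields are hypothetical, and run_multi_fusion() is replaced by a stand-in because its signature is not given here:

```python
import random
import time
import tracemalloc
from typing import Any, Callable, Dict, List, Tuple

Nodes = Dict[str, List[Dict[str, Any]]]


def make_nodes(n_nodes: int, obs_per_node: int, n_phenomena: int = 5, seed: int = 0) -> Nodes:
    """Synthetic observations: n_nodes nodes with obs_per_node records each (hypothetical schema)."""
    rng = random.Random(seed)  # fixed seed keeps runs comparable across scales
    return {
        f"node-{i:04d}": [
            {
                "phenomenon_class": f"p{rng.randrange(n_phenomena)}",
                "timestamp": rng.uniform(0, 3600),
            }
            for _ in range(obs_per_node)
        ]
        for i in range(n_nodes)
    }


def measure(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float, int]:
    """Run fn(*args); return (result, wall-clock seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


# S-20a scale with a stand-in for the fusion call; swap in the real pipeline entry point.
nodes = make_nodes(10, 5)
result, seconds, peak_bytes = measure(lambda n: sum(len(obs) for obs in n.values()), nodes)
print(f"records={result} time={seconds:.4f}s peak={peak_bytes / 1024:.1f} KiB")
```

tracemalloc only sees allocations made through Python's allocator and slows execution, so the fusion-time column may be more trustworthy from a separate, untraced run.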

Phase 2: Ingest Load Test (local, threading)

Test	Concurrent POSTs	Target wall time
S-20g	100 simultaneous	< 1s
S-20h	500 simultaneous	< 2s
S-20i	2000 simultaneous	< 5s
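
The Phase 2 burst can be driven from plain threads. In this sketch, burst and fake_post are hypothetical helpers: fake_post stands in for one POST to the hub's ingest endpoint (the path is not given here), and a threading.Barrier holds every worker until all are ready, so the requests overlap as closely as the thread scheduler allows:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def burst(post: Callable[[int], int], n_requests: int) -> Tuple[float, List[int]]:
    """Issue n_requests calls to post() as close to simultaneously as threads allow.

    Returns (wall-clock seconds, status codes). Timing starts before threads are
    spawned, so it includes thread start-up cost.
    """
    barrier = threading.Barrier(n_requests)

    def worker(i: int) -> int:
        barrier.wait()  # block until all n_requests workers have arrived
        return post(i)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        statuses = list(pool.map(worker, range(n_requests)))
    return time.perf_counter() - start, statuses


def fake_post(i: int) -> int:
    """Hypothetical stand-in for one ingest POST; a real test would call the hub."""
    time.sleep(0.001)
    return 200


elapsed, statuses = burst(fake_post, 100)  # S-20g scale
print(f"{len(statuses)} POSTs in {elapsed:.3f}s, all 200: {all(s == 200 for s in statuses)}")
```

At S-20i scale this spawns 2000 OS threads; if that is too heavy for the test host, an async client is the usual alternative.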

Phase 3: Optimize if Needed

If Phase 1 shows that run_multi_fusion() is too slow at 2000 nodes, implement the global blocking index optimization described above. This is a performance fix in pipeline.py, not an architecture change.

File Plan

File	Action
`tests/test_node_scaling.py`	New — all scaling tests
`parallax/ops/fusion/pipeline.py`	May need optimization for global blocking

Acceptance Criteria

2000 nodes ingest in < 5s (local TestClient)
Fusion over 2000 nodes completes. Establishing the baseline time is the first deliverable of this spec; the measured value becomes the regression budget.
Accuracy unchanged from 3-node baseline where ground truth overlaps
No crashes or OOM at any scale
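
Two of these criteria imply automated checks. A minimal sketch of both, under assumptions this spec does not fix: accuracy is scored as precision and recall over unordered pairs of matched record ids, restricted to records covered by ground truth, and the regression budget allows a 20% margin over the recorded baseline. precision_recall and within_budget are hypothetical helpers:

```python
from typing import FrozenSet, Set, Tuple

Pair = FrozenSet[str]  # unordered pair of matched record ids


def precision_recall(predicted: Set[Pair], truth: Set[Pair], scope: Set[str]) -> Tuple[float, float]:
    """Score predicted matches only where ground truth overlaps (both records in scope)."""
    in_scope = {p for p in predicted if p <= scope}
    hits = len(in_scope & truth)
    precision = hits / len(in_scope) if in_scope else 1.0
    recall = hits / len(truth) if truth else 1.0
    return precision, recall


def within_budget(measured_s: float, baseline_s: float, tolerance: float = 0.20) -> bool:
    """True if a fusion run stays within the recorded baseline plus a tolerance (assumed 20%)."""
    return measured_s <= baseline_s * (1 + tolerance)


truth = {frozenset({"a", "b"})}
predicted = {frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"x", "y"})}
print(precision_recall(predicted, truth, scope={"a", "b", "c"}))  # (0.5, 1.0)
print(within_budget(1.1, 1.0), within_budget(1.5, 1.0))           # True False
```

In the scaling test, the 2000-node scores would be asserted equal to the 3-node baseline scores computed with the same helper.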

Depends on: component.parallax.three-phase-protocol

Realizes: product.fusion