Spatial Atlas

v1.0.0   Spatial-aware research agent built on compute-grounded reasoning

AgentX-AgentBeats Phase 2, Sprint 2 · Research Agent Track

Spatial Atlas implements compute-grounded reasoning (CGR): compute what can be computed deterministically, then let LLMs reason only about what must be generated. It operates as a single A2A server handling two benchmarks through a unified architecture.

Benchmarks

Benchmark       What                                                Input                            Output
FieldWorkArena  Multimodal spatial QA (factory, warehouse, retail)  Text + images, PDFs, videos      Formatted answer
MLE-Bench       75 Kaggle ML competitions                           Instructions + competition data  submission.csv

Architecture

+--------------------------------------------------+
|            A2A Protocol Server                    |
+--------------------------------------------------+
                     |
              +------v------+
              |   Domain    |
              | Classifier  |
              +------+------+
              /              \
   (goal format)          (tar.gz)
        /                      \
+------v------+        +-------v------+
| FieldWork-  |        |  MLE-Bench   |
| Arena       |        |  Handler     |
| Handler     |        |              |
+------+------+        +-------+------+
       |                       |
+------v------+        +-------v------+
| Spatial     |        | Self-Healing |
| Scene Graph |        | ML Pipeline  |
| Engine      |        |              |
+------+------+        +-------+------+
       \                      /
        \                    /
   +-----v--------------------v-----+
   | Shared Infrastructure          |
   | LiteLLM | 3-Tier Routing |     |
   | Cost Tracking                  |
   +---------------+----------------+
                   |
   +---------------v----------------+
   | Entropy-Guided Reasoning       |
   +--------------------------------+
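
The routing step at the top of the diagram could look roughly like the sketch below. It assumes that an MLE-Bench task arrives with a tar.gz attachment and that everything else is a FieldWorkArena goal; the names Domain and classify_task are hypothetical, and the project's actual detection logic may differ.

```python
from enum import Enum
from typing import Optional


class Domain(Enum):
    FIELDWORK_ARENA = "fieldworkarena"
    MLE_BENCH = "mle-bench"


# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def classify_task(text: str, attachment: Optional[bytes] = None,
                  filename: str = "") -> Domain:
    """Route to MLE-Bench when a tar.gz is attached, else FieldWorkArena.

    Hypothetical helper: checks the file name first, then the gzip magic bytes.
    """
    if filename.endswith((".tar.gz", ".tgz")):
        return Domain.MLE_BENCH
    if attachment is not None and attachment[:2] == GZIP_MAGIC:
        return Domain.MLE_BENCH
    return Domain.FIELDWORK_ARENA
```

Checking the payload rather than asking an LLM keeps this step deterministic, in line with the compute-first approach described above.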

Key Innovations

1. Spatial Scene Graphs

Extract entities from vision descriptions, build a queryable graph with typed relations, compute distances and violations deterministically, then feed computed facts to the LLM.

+21-24 pts over pure VLM baselines.
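
A minimal sketch of the idea, assuming entities carry 2D positions in metres and that a violation means two entity kinds are closer than a minimum distance. Entity, SceneGraph, and the fact format are illustrative names, not the project's actual classes.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str
    x: float  # metres, hypothetical scene coordinates
    y: float


@dataclass
class SceneGraph:
    entities: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (subject, relation, object)

    def add(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def relate(self, subj: str, relation: str, obj: str) -> None:
        self.relations.append((subj, relation, obj))

    def distance(self, a: str, b: str) -> float:
        ea, eb = self.entities[a], self.entities[b]
        return math.hypot(ea.x - eb.x, ea.y - eb.y)

    def violations(self, kind_a: str, kind_b: str, min_dist: float) -> list:
        """Return (a, b, distance) for every kind_a/kind_b pair closer than min_dist."""
        found = []
        for a in self.entities.values():
            for b in self.entities.values():
                if a.name != b.name and a.kind == kind_a and b.kind == kind_b:
                    d = self.distance(a.name, b.name)
                    if d < min_dist:
                        found.append((a.name, b.name, round(d, 2)))
        return found

    def facts(self, kind_a: str, kind_b: str, min_dist: float) -> str:
        """Render relations and computed violations as plain text for the LLM prompt."""
        lines = [f"{s} {r} {o}" for s, r, o in self.relations]
        for a, b, d in self.violations(kind_a, kind_b, min_dist):
            lines.append(f"VIOLATION: {a} is {d} m from {b} (minimum {min_dist} m)")
        return "\n".join(lines)
```

The LLM only sees the output of facts(), so distances and rule checks never depend on the model's own arithmetic.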

2. Entropy-Guided Reasoning

Information-theoretic framework estimating answer entropy at each step. Triggers reflection when confidence is low, routes to stronger models only when needed.

+7-8 pts accuracy improvement.
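
One way to estimate answer entropy is to sample several candidate answers and measure the Shannon entropy of their empirical distribution. The sketch below assumes that approach; the function names and both thresholds are illustrative, not values taken from the project.

```python
import math
from collections import Counter


def answer_entropy(samples: list) -> float:
    """Shannon entropy in bits of the empirical distribution over sampled answers."""
    counts = Counter(s.strip().lower() for s in samples)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def next_action(samples: list, reflect_at: float = 0.5,
                escalate_at: float = 1.0) -> str:
    """Pick the next step from the entropy estimate (thresholds are assumptions)."""
    h = answer_entropy(samples)
    if h >= escalate_at:
        return "escalate"  # route to the stronger model
    if h >= reflect_at:
        return "reflect"   # ask the current model to re-check its answer
    return "accept"
```

When all samples agree the entropy is zero and no extra model call is spent, which is what keeps the stronger tier reserved for genuinely uncertain steps.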

3. Self-Healing ML Pipeline

Strategy-aware code generation with automatic error detection, diagnosis, and repair. Covers tabular, NLP, vision, time series, and general strategies.

82% valid submission rate across 75 competitions.
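
The detect, diagnose, repair cycle can be sketched as a bounded retry loop. This is an assumption about the overall shape only: run_with_repair is a hypothetical name, and the repair callable stands in for whatever LLM call proposes a fix from the error output.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable


def run_with_repair(code: str, repair: Callable[[str, str], str],
                    max_attempts: int = 3, timeout: int = 60) -> tuple:
    """Run `code`; on failure pass (code, stderr) to `repair` and retry.

    Returns (succeeded, final_code, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        with tempfile.TemporaryDirectory() as tmp:
            script = Path(tmp) / "pipeline.py"
            script.write_text(code)
            try:
                proc = subprocess.run(
                    [sys.executable, str(script)],
                    capture_output=True, text=True, timeout=timeout, cwd=tmp,
                )
                ok, err = proc.returncode == 0, proc.stderr
            except subprocess.TimeoutExpired:
                ok, err = False, f"timed out after {timeout}s"
        if ok:
            return True, code, attempt
        if attempt < max_attempts:
            # Diagnosis and repair are delegated to the caller-supplied function.
            code = repair(code, err)
    return False, code, max_attempts
```

Capping the number of attempts bounds cost and latency per competition; a run that still fails after the last attempt is reported as failed rather than retried indefinitely.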

4. Score-Driven Refinement

Parses validation scores from pipeline output, uses a cross-provider model to propose targeted improvements, keeps whichever submission scores higher.

35-40% improvement rate on eligible tasks.
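
A sketch of the score comparison, assuming the pipeline prints a line such as "Validation score: 0.8421". The log format, the regex, and the function names are assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed log format: "Validation score: 0.8421" (or "validation score = 0.8421").
SCORE_RE = re.compile(r"validation\s+score\s*[:=]\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_score(stdout: str) -> Optional[float]:
    """Return the last validation score found in the output, or None."""
    matches = SCORE_RE.findall(stdout)
    return float(matches[-1]) if matches else None


def pick_submission(base_out: str, refined_out: str,
                    higher_is_better: bool = True) -> str:
    """Keep the refined submission only if its parsed score beats the baseline."""
    base, refined = parse_score(base_out), parse_score(refined_out)
    if refined is None:
        return "baseline"  # refinement crashed or printed no score
    if base is None:
        return "refined"
    improved = refined > base if higher_is_better else refined < base
    return "refined" if improved else "baseline"
```

The higher_is_better flag matters because some competition metrics (for example RMSE or log loss) are minimized rather than maximized.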

5. Leak Audit Registry

Prompt-based exploit framework detecting train/test leakage via ID overlap, row fingerprinting, temporal ordering, and byte hashing at codegen time.
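
Two of the listed checks, ID overlap and row fingerprinting, can be illustrated as follows. This is a sketch of the kind of check involved, with rows represented as plain dicts; the function names, the column names "id" and "target", and the choice of SHA-256 are assumptions.

```python
import hashlib


def id_overlap(train_rows: list, test_rows: list, id_col: str = "id") -> set:
    """IDs that appear in both train and test."""
    return {r[id_col] for r in train_rows} & {r[id_col] for r in test_rows}


def fingerprint(row: dict, ignore: tuple = ("id", "target")) -> str:
    """Hash of the feature values, independent of column order and of ignored columns."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row) if k not in ignore)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def duplicate_rows(train_rows: list, test_rows: list, id_col: str = "id",
                   ignore: tuple = ("id", "target")) -> list:
    """(train_id, test_id) pairs whose feature fingerprints match exactly."""
    seen = {fingerprint(r, ignore): r[id_col] for r in train_rows}
    pairs = []
    for r in test_rows:
        fp = fingerprint(r, ignore)
        if fp in seen:
            pairs.append((seen[fp], r[id_col]))
    return pairs
```

A test row whose features exactly match a labelled train row is flagged even when its ID differs, which is the case a plain ID comparison misses.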

6. 3-Tier Model Routing

Fast: GPT-4.1-mini (parsing, classification). Standard: GPT-4.1 (code gen, reasoning). Strong: configurable (reflection, refinement).
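
The tier assignment above amounts to a lookup from task type to model. In the sketch below, the task labels, the lowercase model identifiers, and the model_for helper are assumptions; the strong tier is passed in as a parameter because the source describes it only as configurable.

```python
# Tier -> model identifier (lowercase IDs are an assumption about naming).
TIER_MODELS = {
    "fast": "gpt-4.1-mini",
    "standard": "gpt-4.1",
}

# Task type -> tier, following the assignment listed above.
TASK_TIER = {
    "parsing": "fast",
    "classification": "fast",
    "code_gen": "standard",
    "reasoning": "standard",
    "reflection": "strong",
    "refinement": "strong",
}


def model_for(task: str, strong_model: str = "gpt-4.1") -> str:
    """Return the model for a task; unknown tasks fall back to the standard tier."""
    tier = TASK_TIER.get(task, "standard")
    if tier == "strong":
        return strong_model
    return TIER_MODELS[tier]
```

For example, model_for("reflection", strong_model="some-stronger-model") returns the configured strong model, while parsing and classification always stay on the cheapest tier.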

Evaluation Results

FieldWorkArena Ablation

Configuration                Factory  Warehouse  Retail
Full System (SSG + EG + F2)  0.72     0.68       0.74
Without Spatial Scene Graph  0.51     0.44       0.55
Without Entropy-Guided       0.65     0.60       0.67
Without Florence-2           0.63     0.58       0.66
VLM Baseline (GPT-4V)        0.48     0.41       0.52

MLE-Bench Results

Category     Valid Submission  Medal Rate  n
Tabular      0.91              0.42        32
NLP          0.78              0.28        18
Vision       0.65              0.15        12
Time Series  0.85              0.35        8
Other        0.72              0.20        5
Overall      0.82              0.32        75
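
The Overall row is consistent with a mean of the category rows weighted by n. The quick check below assumes n is the number of competitions per category and uses only the figures from the table.

```python
# (valid submission rate, medal rate, n) per category, copied from the table.
rows = {
    "Tabular": (0.91, 0.42, 32),
    "NLP": (0.78, 0.28, 18),
    "Vision": (0.65, 0.15, 12),
    "Time Series": (0.85, 0.35, 8),
    "Other": (0.72, 0.20, 5),
}

n_total = sum(n for _, _, n in rows.values())
valid_overall = sum(v * n for v, _, n in rows.values()) / n_total
medal_overall = sum(m * n for _, m, n in rows.values()) / n_total

print(n_total, round(valid_overall, 2), round(medal_overall, 2))
```

The category counts sum to 75, and the weighted means round to 0.82 and 0.32, matching the Overall row.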

Cost Analysis

Domain                       Avg. Tokens  Avg. Cost  Avg. Latency
FieldWorkArena               45,200       $0.18      12s
MLE-Bench (no refinement)    92,400       $0.52      180s
MLE-Bench (with refinement)  128,600      $1.85      340s

Quick Start

git clone https://github.com/arunshar/spatial-atlas.git
cd spatial-atlas
cp sample.env .env   # add your OPENAI_API_KEY
uv run src/server.py --host 127.0.0.1 --port 9019
curl http://localhost:9019/.well-known/agent-card.json
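
The same agent-card check can be done from Python with the standard library only. This sketch assumes the server from the previous step is running on the port shown; agent_card_url and fetch_agent_card are hypothetical helper names, and no particular fields of the returned JSON are assumed.

```python
import json
import urllib.request


def agent_card_url(host: str = "localhost", port: int = 9019) -> str:
    """Build the well-known agent card URL used in the curl command above."""
    return f"http://{host}:{port}/.well-known/agent-card.json"


def fetch_agent_card(host: str = "localhost", port: int = 9019,
                     timeout: float = 5.0) -> dict:
    """Fetch and decode the agent card; raises URLError if the server is not up."""
    with urllib.request.urlopen(agent_card_url(host, port), timeout=timeout) as resp:
        return json.load(resp)
```

Calling fetch_agent_card() once the server is running returns the card as a dict, which can be pretty-printed with json.dumps(card, indent=2).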