v1.0.0 Spatial-aware research agent built on compute-grounded reasoning
AgentX-AgentBeats Phase 2, Sprint 2 · Research Agent Track
Spatial Atlas implements compute-grounded reasoning (CGR): compute what can be computed deterministically, then let LLMs reason only about what must be generated. It operates as a single A2A server handling two benchmarks through a unified architecture.
| Benchmark | Task | Input | Output |
|---|---|---|---|
| FieldWorkArena | Multimodal spatial QA (factory, warehouse, retail) | Text + images, PDFs, videos | Formatted answer |
| MLE-Bench | 75 Kaggle ML competitions | Instructions + competition data | submission.csv |
    +--------------------------------------------------+
    |               A2A Protocol Server                |
    +--------------------------------------------------+
                             |
                      +------v------+
                      |   Domain    |
                      |  Classifier |
                      +------+------+
                         /       \
              (goal format)      (tar.gz)
                   /                    \
          +------v------+         +-------v------+
          | FieldWork-  |         | MLE-Bench    |
          | Arena       |         | Handler      |
          | Handler     |         |              |
          +------+------+         +-------+------+
                 |                        |
          +------v------+         +-------v------+
          | Spatial     |         | Self-Healing |
          | Scene Graph |         | ML Pipeline  |
          | Engine      |         |              |
          +------+------+         +-------+------+
                  \                        /
                   \                      /
              +-----v--------------------v-----+
              |     Shared Infrastructure      |
              |   LiteLLM | 3-Tier Routing |   |
              |         Cost Tracking          |
              +---------------+----------------+
                              |
              +---------------v----------------+
              |    Entropy-Guided Reasoning    |
              +--------------------------------+
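The routing step at the top of the diagram can be sketched as follows. This is a minimal illustration, assuming the classifier can see the names of attached files; `Domain` and `classify_domain` are hypothetical names, and the actual classifier may also inspect the goal text format.

```python
from enum import Enum


class Domain(Enum):
    FIELDWORK_ARENA = "fieldworkarena"
    MLE_BENCH = "mle-bench"


def classify_domain(attachment_names: list[str]) -> Domain:
    """Route to MLE-Bench when a .tar.gz archive is attached (assumption),
    otherwise treat the request as a FieldWorkArena goal."""
    for name in attachment_names:
        if name.lower().endswith(".tar.gz"):
            return Domain.MLE_BENCH
    return Domain.FIELDWORK_ARENA


print(classify_domain(["competition.tar.gz"]))      # Domain.MLE_BENCH
print(classify_domain(["scene.jpg", "manual.pdf"]))  # Domain.FIELDWORK_ARENA
```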
The Spatial Scene Graph Engine extracts entities from vision descriptions, builds a queryable graph with typed relations, computes distances and violations deterministically, and then feeds the computed facts to the LLM.
Reported gain: +21-24 points over pure VLM baselines.
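The idea of computing spatial facts before prompting can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Entity` fields, the 2D floor-plane coordinates in metres, the function names, and the 2 m clearance rule are hypothetical, not the project's actual schema.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str   # e.g. "person", "forklift"
    x: float    # metres, in an assumed floor-plane coordinate frame
    y: float


def distance(a: Entity, b: Entity) -> float:
    """Euclidean distance between two entities."""
    return math.hypot(a.x - b.x, a.y - b.y)


def proximity_violations(entities, kind_a, kind_b, min_dist):
    """Return (a, b, distance) for every kind_a/kind_b pair closer than min_dist."""
    found = []
    for a in entities:
        if a.kind != kind_a:
            continue
        for b in entities:
            if b is a or b.kind != kind_b:
                continue
            d = distance(a, b)
            if d < min_dist:
                found.append((a.name, b.name, round(d, 2)))
    return found


def facts_for_prompt(violations, min_dist):
    """Render computed violations as plain sentences the LLM can reason over."""
    return [f"{a} is {d} m from {b} (minimum allowed: {min_dist} m)"
            for a, b, d in violations]


scene = [
    Entity("worker_1", "person", 0.0, 0.0),
    Entity("forklift_1", "forklift", 3.0, 4.0),
    Entity("forklift_2", "forklift", 1.0, 1.0),
]
violations = proximity_violations(scene, "person", "forklift", min_dist=2.0)
print(facts_for_prompt(violations, 2.0))
```

The LLM then receives the rendered sentences rather than having to estimate distances from pixels.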
Entropy-Guided Reasoning is an information-theoretic framework that estimates answer entropy at each step. It triggers reflection when confidence is low and routes to stronger models only when needed.
Reported gain: +7-8 points of accuracy.
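One way to estimate answer entropy is to sample several answers and measure how much they disagree. The sketch below assumes that approach; the sampling strategy, the thresholds `reflect_at` and `escalate_at`, and the function names are hypothetical and not taken from the project.

```python
import math
from collections import Counter


def answer_entropy(samples: list[str]) -> float:
    """Shannon entropy (bits) of the empirical distribution over sampled answers."""
    if not samples:
        return 0.0
    counts = Counter(s.strip().lower() for s in samples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def next_action(samples: list[str], reflect_at: float = 0.5,
                escalate_at: float = 1.0) -> str:
    """Accept when samples agree, reflect on mild disagreement,
    escalate to a stronger model on strong disagreement."""
    h = answer_entropy(samples)
    if h >= escalate_at:
        return "escalate"
    if h >= reflect_at:
        return "reflect"
    return "accept"


print(next_action(["3", "3", "3", "3"]))  # all samples agree
print(next_action(["3", "3", "3", "4"]))  # one dissenting sample
print(next_action(["3", "4", "5", "6"]))  # no agreement at all
```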
The Self-Healing ML Pipeline combines strategy-aware code generation with automatic error detection, diagnosis, and repair. It covers tabular, NLP, vision, time series, and general strategies.
It achieves an 82% valid submission rate across 75 competitions.
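A detect-diagnose-repair loop of this kind can be sketched in a few lines. This is an illustration under stated assumptions: `run_with_repair` and `toy_repair` are hypothetical names, the attempt limit is arbitrary, and `toy_repair` is a trivial stand-in for what would be an LLM call in practice.

```python
import subprocess
import sys
from typing import Callable


def run_with_repair(code: str, repair: Callable[[str, str], str],
                    max_attempts: int = 3) -> tuple[bool, str, int]:
    """Run `code` in a fresh interpreter; on failure, ask `repair` for a fix.

    `repair(code, stderr)` stands in for an LLM diagnosis step (hypothetical).
    Returns (succeeded, final_code, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=60)
        if proc.returncode == 0:
            return True, code, attempt
        if attempt < max_attempts:
            code = repair(code, proc.stderr)
    return False, code, max_attempts


def toy_repair(code: str, stderr: str) -> str:
    # Toy stand-in for LLM diagnosis: fix one known typo when a NameError appears.
    if "NameError" in stderr:
        return code.replace("pritn(", "print(")
    return code


ok, fixed, attempts = run_with_repair("pritn('submission written')", toy_repair)
print(ok, attempts, fixed)
```

Running each attempt in a separate interpreter keeps a crashing pipeline from taking the agent down with it.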
A refinement pass parses validation scores from the pipeline output, uses a cross-provider model to propose targeted improvements, and keeps whichever submission scores higher.
It improves the score on 35-40% of eligible tasks.
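The parse-and-compare step might look like the sketch below. The log format (`Validation score: 0.8731`), the regular expression, and the function names are assumptions for illustration. The comparison assumes a higher-is-better metric; for error metrics such as RMSE it would need to be inverted.

```python
import re
from typing import Optional

# Assumed log format: a line such as "Validation score: 0.8731".
SCORE_RE = re.compile(r"validation score\s*[:=]\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_score(output: str) -> Optional[float]:
    """Return the last validation score printed in `output`, or None if absent."""
    matches = SCORE_RE.findall(output)
    return float(matches[-1]) if matches else None


def pick_submission(baseline_out: str, refined_out: str) -> str:
    """Keep the refined submission only when its parsed score beats the baseline."""
    base = parse_score(baseline_out)
    refined = parse_score(refined_out)
    if refined is not None and (base is None or refined > base):
        return "refined"
    return "baseline"


print(pick_submission("Validation score: 0.81", "epoch 1\nValidation score: 0.84"))
print(pick_submission("Validation score: 0.81", "Traceback (most recent call last): ..."))
```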
Leak detection uses a prompt-based exploit framework that detects train/test leakage via ID overlap, row fingerprinting, temporal ordering, and byte hashing at code-generation time.
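Two of the four checks named above, ID overlap and row fingerprinting, could take roughly the following form in generated code. This is a sketch only: the function names and the `id` and `target` column names are hypothetical, and the source does not describe how the prompts phrase these checks.

```python
import hashlib


def id_overlap(train_rows, test_rows, id_col="id"):
    """IDs that appear in both splits."""
    return {r[id_col] for r in train_rows} & {r[id_col] for r in test_rows}


def fingerprint(row, ignore=("id", "target")):
    """SHA-256 over a row's feature values, skipping identifier and label columns."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row) if k not in ignore)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def duplicated_rows(train_rows, test_rows, ignore=("id", "target")):
    """Test rows whose features exactly match some training row."""
    train_prints = {fingerprint(r, ignore) for r in train_rows}
    return [r for r in test_rows if fingerprint(r, ignore) in train_prints]


train = [
    {"id": 1, "f1": 0.2, "f2": "a", "target": 1},
    {"id": 2, "f1": 0.9, "f2": "b", "target": 0},
]
test = [
    {"id": 2, "f1": 0.5, "f2": "c"},
    {"id": 7, "f1": 0.2, "f2": "a"},
]
print(id_overlap(train, test))       # shared IDs
print(duplicated_rows(train, test))  # test rows that duplicate a training row
```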
Three-tier model routing: Fast uses GPT-4.1-mini (parsing, classification), Standard uses GPT-4.1 (code generation, reasoning), and Strong is a configurable model (reflection, refinement).
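The tier assignment above maps naturally onto a lookup table. In this sketch the task labels, the `model_for` helper, and the `STRONG_MODEL` environment variable are hypothetical; only the tier-to-model pairing for Fast and Standard comes from the text.

```python
import os

# Fast and Standard follow the text above. The Strong tier is configurable,
# read here from a hypothetical STRONG_MODEL environment variable.
TIER_MODELS = {"fast": "gpt-4.1-mini", "standard": "gpt-4.1"}

TASK_TIERS = {
    "parsing": "fast",
    "classification": "fast",
    "code_gen": "standard",
    "reasoning": "standard",
    "reflection": "strong",
    "refinement": "strong",
}


def model_for(task: str, strong_default: str = "gpt-4.1") -> str:
    """Pick a model name for a task; raises KeyError for unknown tasks."""
    tier = TASK_TIERS[task]
    if tier == "strong":
        return os.environ.get("STRONG_MODEL", strong_default)
    return TIER_MODELS[tier]


print(model_for("parsing"))   # gpt-4.1-mini
print(model_for("code_gen"))  # gpt-4.1
```

The returned name would then be passed as the `model` argument to whatever completion client the shared infrastructure uses.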
| Configuration | Factory | Warehouse | Retail |
|---|---|---|---|
| Full System (SSG + EG + F2) | 0.72 | 0.68 | 0.74 |
| Without Spatial Scene Graph | 0.51 | 0.44 | 0.55 |
| Without Entropy-Guided | 0.65 | 0.60 | 0.67 |
| Without Florence-2 | 0.63 | 0.58 | 0.66 |
| VLM Baseline (GPT-4V) | 0.48 | 0.41 | 0.52 |
| Category | Valid Submission | Medal Rate | n |
|---|---|---|---|
| Tabular | 0.91 | 0.42 | 32 |
| NLP | 0.78 | 0.28 | 18 |
| Vision | 0.65 | 0.15 | 12 |
| Time Series | 0.85 | 0.35 | 8 |
| Other | 0.72 | 0.20 | 5 |
| Overall | 0.82 | 0.32 | 75 |
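As a consistency check, the Overall row should equal the per-category rates weighted by the number of competitions. The short script below recomputes it from the table values.

```python
# (valid submission rate, medal rate, n) per category, copied from the table.
categories = {
    "Tabular": (0.91, 0.42, 32),
    "NLP": (0.78, 0.28, 18),
    "Vision": (0.65, 0.15, 12),
    "Time Series": (0.85, 0.35, 8),
    "Other": (0.72, 0.20, 5),
}

total = sum(n for _, _, n in categories.values())
valid = sum(v * n for v, _, n in categories.values()) / total
medal = sum(m * n for _, m, n in categories.values()) / total

print(total, round(valid, 2), round(medal, 2))  # 75 0.82 0.32
```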
| Domain | Avg. Tokens | Avg. Cost | Avg. Latency |
|---|---|---|---|
| FieldWorkArena | 45,200 | $0.18 | 12s |
| MLE-Bench (no refinement) | 92,400 | $0.52 | 180s |
| MLE-Bench (with refinement) | 128,600 | $1.85 | 340s |
- `/.well-known/agent-card.json`: Agent card (identity, skills, capabilities)
- `/`: A2A JSON-RPC task submission

git clone https://github.com/arunshar/spatial-atlas.git
cd spatial-atlas
cp sample.env .env # add your OPENAI_API_KEY
uv run src/server.py --host 127.0.0.1 --port 9019
curl http://localhost:9019/.well-known/agent-card.json
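The same check can be done from Python using only the standard library. This sketch assumes the server is running on the host and port used in the commands above; `agent_card_url` and `fetch_agent_card` are illustrative helper names, not part of the repository.

```python
import json
from urllib.request import urlopen


def agent_card_url(host: str = "localhost", port: int = 9019) -> str:
    """Build the agent card URL for a locally running server."""
    return f"http://{host}:{port}/.well-known/agent-card.json"


def fetch_agent_card(host: str = "localhost", port: int = 9019,
                     timeout: float = 5.0) -> dict:
    """Fetch and decode the agent card; raises URLError if the server is down."""
    with urlopen(agent_card_url(host, port), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(json.dumps(fetch_agent_card(), indent=2))
```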