
# Experiment Index

Searchable table of all PodPedia experiments. Each entry links to a full report in this directory.

## Quick Reference

| ID | Date | Hypothesis | Conclusion | Commit | Status |
|----|------|------------|------------|--------|--------|
| ADR-001-live-ab | 2026-05-13 | 10K chunk threshold (ADR-001) reduces pipeline latency vs. 20K threshold | Promising but inconclusive (n=15, environment confound) | Production vs. Staging | 🟡 inconclusive |
| ADR-007-010-pipeline | 2026-05-12 | Merged ADR-007 + ADR-010 introduce no regression in graph latency or memory | Inconclusive (benchtime mismatch: 1s vs 30s) | e5d631c / dc1fbe5 | ⚪ inconclusive |
| EXP-001 | 2026-05-11 | Formalized experiment tracking will improve performance decision rigor and reduce regression incidents | Pending (deadline 2026-05-25) | — | 🟡 planned |
| EXP-002 | — | Ollama-based LLM handler achieves comparable p95 latency to Vertex AI for <500 token prompts | — | — | ⬜ planned |
| EXP-003 | — | SQLite-backed graph storage outperforms in-memory graph for entity counts >10,000 | — | — | ⬜ planned |
| EXP-004 | — | Batched entity resolution (chunk size 50) reduces p99 latency vs. sequential resolution by ≥30% | — | — | ⬜ planned |
| EXP-005 | — | Token-bucket rate limiter sustains 2x throughput vs. sliding-window under bursty ingest patterns | — | — | ⬜ planned |

## Status Legend

| Icon | Meaning |
|------|---------|
| 🟢 | Confirmed |
| 🔴 | Rejected |
| 🟡 | In progress / planned with deadline |
| ⬜ | Planned (no start date) |
| ⚪ | Inconclusive |

## ADR-001-live-ab — 10K Character Threshold Live A/B Latency Test

| Field | Value |
|-------|-------|
| ID | ADR-001-live-ab |
| Type | Live A/B (production vs. staging) |
| Date | 2026-05-13 |
| Author | @coding-agent |
| Status | 🟡 inconclusive — low trial count |

### Hypothesis

Lowering the parallel graph extraction chunking threshold from 20K → 10K characters (ADR-001) reduces wall-clock extraction latency.
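
A minimal sketch of the mechanism under test, assuming a simple fixed-size splitter; `splitIntoChunks` is illustrative and not the actual ADR-001 implementation:

```go
package main

import "fmt"

// splitIntoChunks breaks a transcript into pieces of at most threshold
// characters so each piece can be extracted in parallel. Illustrative
// sketch only, not the production extraction code.
func splitIntoChunks(text string, threshold int) []string {
	if threshold <= 0 || len(text) <= threshold {
		return []string{text}
	}
	var chunks []string
	for start := 0; start < len(text); start += threshold {
		end := start + threshold
		if end > len(text) {
			end = len(text)
		}
		chunks = append(chunks, text[start:end])
	}
	return chunks
}

func main() {
	payload := string(make([]byte, 25000)) // the 25K-char A/B payload size
	// 20K threshold yields 2 chunks; 10K yields 3, i.e. more parallelism.
	fmt.Println(len(splitIntoChunks(payload, 20000)))
	fmt.Println(len(splitIntoChunks(payload, 10000)))
}
```

Under this model, the extra parallelism at 10K is why the 25K payload shows the largest speedup, while a 5K payload is a single chunk under either threshold and should be unaffected.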

### Results Summary

| Payload | Production (20K) | Staging (10K) | Δ | KS p-value |
|---------|------------------|---------------|---|------------|
| 5K chars | 50.1s | 35.6s | -28.9% | 0.017* |
| 15K chars | 45.8s | 37.0s | -19.2% | 0.308 |
| 25K chars | 116.2s | 37.0s | -68.2% | <0.001* |

### Conclusion

Directional evidence strongly favors staging (10K threshold), especially at 25K chars (68% faster, d=-2.06, p<0.001). However, the 5K result reveals an environment confound (staging is faster even where no parallelization benefit exists), and the critical 15K payload did not reach significance (p=0.31, n=15). Next iteration: re-run in a controlled environment with ≥30 trials.

Report: [2026-05-13-adr001-live-ab.md](2026-05-13-adr001-live-ab.md)


## ADR-007-010-pipeline — Pipeline Regression Experiment

| Field | Value |
|-------|-------|
| ID | ADR-007-010-pipeline |
| Type | Regression Test |
| Target | `backend/handlers/graph.go` |
| Date | 2026-05-12 |
| Author | @coding-agent |
| PR | feature/experiment-adr007-010 |
| Commit (baseline) | `e5d631ca266d22d1efd2bc9e6d0537c7a6f4b978` |
| Commit (experiment) | `dc1fbe51c83a020cdad6a99325d65a83076bf0c5` |
| Status | ⚪ inconclusive — benchtime mismatch |

### Hypothesis

Merged ADR-007 + ADR-010 changes introduce no statistically significant regression in graph snapshot latency, graph traversal latency, or memory allocations.

### Results

### Conclusion

Inconclusive. The shorter benchtime invalidates direct latency comparison. Memory analysis (valid regardless of benchtime) shows zero allocation changes, a positive signal. A re-run with identical benchmark parameters is recommended.
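
Allocation counts survive a benchtime mismatch because allocs/op is a count per iteration, not a duration, so it is stable across `-benchtime` settings. A sketch, where the hypothetical `buildSnapshot` stands in for the real handler path:

```go
package main

import (
	"fmt"
	"testing"
)

var sink []int // keep results live so the allocation is not optimized away

// buildSnapshot is a hypothetical stand-in for the graph-snapshot path;
// it performs exactly one allocation (the make) per call.
func buildSnapshot(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	// Allocations per op are counted per iteration, so the figure is
	// comparable even when two runs used different -benchtime values.
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = buildSnapshot(1024)
		}
	})
	fmt.Println(res.AllocsPerOp()) // the single make() in buildSnapshot
}
```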

Report: [2026-05-12-adr007-010-pipeline.md](2026-05-12-adr007-010-pipeline.md)


## EXP-001 — Meta-Experiment: Formalized Tracking Efficacy

| Field | Value |
|-------|-------|
| ID | EXP-001 |
| Type | Meta-experiment |
| Start Date | 2026-05-11 |
| Deadline | 2026-05-25 |
| Author | @team |
| Status | 🟡 planned |

### Hypothesis

Formalized experiment tracking (ADR-009) will reduce the number of performance regressions merged to main by ≥50% compared to the 3 months prior, and will increase the rate at which performance questions are answered with data rather than intuition.

### Success Criteria

### Methodology

This is a process meta-experiment. Over a 2-week trial period, all performance-sensitive PRs must comply with ADR-009. Data collected:

### Report

Report file: `EXP-001-formalized-tracking-efficacy.md` (pending)


## EXP-002 — Ollama vs. Vertex AI LLM Latency Comparison

| Field | Value |
|-------|-------|
| ID | EXP-002 |
| Type | A/B Comparison |
| Target Component | `backend/handlers/llm_ollama.go`, `backend/handlers/llm_vertex.go` |
| Status | ⬜ planned |

### Hypothesis

The Ollama-based LLM handler achieves comparable p95 latency to the Vertex AI handler for prompts under 500 tokens, with latency within ±10% of Vertex at p50 and p95.

### Success Criteria

### Key Metrics


## EXP-003 — SQLite Graph vs. In-Memory Graph at Scale

| Field | Value |
|-------|-------|
| ID | EXP-003 |
| Type | Optimization Experiment |
| Target Component | `backend/handlers/graph_sqlite.go`, `backend/handlers/graph.go` |
| Status | ⬜ planned |

### Hypothesis

SQLite-backed graph storage outperforms in-memory graph for entity counts exceeding 10,000, reducing p95 traversal latency by ≥40% while adding <10% overhead for small graphs (<1,000 entities).
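
Benchmarking the two backends fairly needs a shared interface so the traversal workload is identical. The interface and map-based implementation below are illustrative assumptions, not the actual handlers; the SQLite side is omitted since it sits behind a driver dependency:

```go
package main

import "fmt"

// GraphStore is a hypothetical common interface letting one traversal
// benchmark drive both the in-memory and SQLite-backed graphs.
type GraphStore interface {
	AddEdge(from, to string)
	Neighbors(id string) []string
}

// memGraph is a minimal adjacency-map implementation standing in for
// the in-memory backend.
type memGraph struct{ adj map[string][]string }

func newMemGraph() *memGraph { return &memGraph{adj: map[string][]string{}} }

func (g *memGraph) AddEdge(from, to string)      { g.adj[from] = append(g.adj[from], to) }
func (g *memGraph) Neighbors(id string) []string { return g.adj[id] }

// traverse does a breadth-first walk from start and returns the number
// of entities reached; its latency at >10,000 entities is the quantity
// the experiment would compare across backends.
func traverse(g GraphStore, start string) int {
	seen := map[string]bool{start: true}
	queue := []string{start}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, n := range g.Neighbors(cur) {
			if !seen[n] {
				seen[n] = true
				queue = append(queue, n)
			}
		}
	}
	return len(seen)
}

func main() {
	g := newMemGraph()
	g.AddEdge("a", "b")
	g.AddEdge("b", "c")
	fmt.Println(traverse(g, "a")) // 3 entities reached along a -> b -> c
}
```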

### Success Criteria

### Key Metrics


## EXP-004 — Batched vs. Sequential Entity Resolution

| Field | Value |
|-------|-------|
| ID | EXP-004 |
| Type | Optimization Experiment |
| Target Component | `backend/handlers/entity_resolver.go` |
| Status | ⬜ planned |

### Hypothesis

Batched entity resolution with chunk size 50 reduces p99 resolution latency by ≥30% compared to sequential resolution, without increasing resolution error rate by more than 1 percentage point.
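
The batching half of the comparison can be sketched as a simple partition step; `chunk` and the resolver hand-off are illustrative, not the actual `entity_resolver.go` code:

```go
package main

import "fmt"

// chunk partitions entities into batches of at most size items; 50 is
// the batch size named in the hypothesis.
func chunk(entities []string, size int) [][]string {
	var batches [][]string
	for start := 0; start < len(entities); start += size {
		end := start + size
		if end > len(entities) {
			end = len(entities)
		}
		batches = append(batches, entities[start:end])
	}
	return batches
}

func main() {
	entities := make([]string, 120)
	for i := range entities {
		entities[i] = fmt.Sprintf("entity-%d", i)
	}
	// 120 entities -> 3 batches (50, 50, 20): three resolver calls
	// instead of 120 sequential ones, which is where the p99 win
	// would have to come from.
	fmt.Println(len(chunk(entities, 50)))
}
```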

### Success Criteria

### Key Metrics


## EXP-005 — Token-Bucket vs. Sliding-Window Rate Limiter

| Field | Value |
|-------|-------|
| ID | EXP-005 |
| Type | A/B Comparison |
| Target Component | `backend/handlers/ratelimit.go` |
| Status | ⬜ planned |

### Hypothesis

The token-bucket rate limiter sustains 2x the throughput of the sliding-window limiter under bursty ingest patterns (burst size 100 requests, 1s window), while maintaining equivalent fairness (measured by max single-client starvation time).

### Success Criteria

### Key Metrics
