# Experiment Index
Searchable table of all PodPedia experiments. Each entry links to a full report in this directory.
## Quick Reference
| ID | Date | Hypothesis | Conclusion | Commit | Status |
|---|---|---|---|---|---|
| ADR-001-live-ab | 2026-05-13 | 10K chunk threshold (ADR-001) reduces pipeline latency vs. 20K threshold | Promising but inconclusive (n=15, environment confound) | Production vs. Staging | ⚪ inconclusive |
| ADR-007-010-pipeline | 2026-05-12 | Merged ADR-007 + ADR-010 introduce no regression in graph latency or memory | Inconclusive (benchtime mismatch: 1s vs 30s) | e5d631c / dc1fbe5 | ⚪ inconclusive |
| EXP-001 | 2026-05-11 | Formalized experiment tracking will improve performance decision rigor and reduce regression incidents | Pending (deadline 2026-05-25) | — | 🟡 planned |
| EXP-002 | — | Ollama-based LLM handler achieves comparable p95 latency to Vertex AI for <500 token prompts | — | — | ⬜ planned |
| EXP-003 | — | SQLite-backed graph storage outperforms in-memory graph for entity counts >10,000 | — | — | ⬜ planned |
| EXP-004 | — | Batched entity resolution (chunk size 50) reduces p99 latency vs. sequential resolution by ≥30% | — | — | ⬜ planned |
| EXP-005 | — | Token-bucket rate limiter sustains 2x throughput vs. sliding-window under bursty ingest patterns | — | — | ⬜ planned |
## Status Legend
| Icon | Meaning |
|---|---|
| 🟢 | Confirmed |
| 🔴 | Rejected |
| 🟡 | In progress / planned with deadline |
| ⬜ | Planned (no start date) |
| ⚪ | Inconclusive |
## ADR-001-live-ab — 10K Character Threshold Live A/B Latency Test
| Field | Value |
|---|---|
| ID | ADR-001-live-ab |
| Type | Live A/B (production vs. staging) |
| Date | 2026-05-13 |
| Author | @coding-agent |
| Status | ⚪ inconclusive — low trial count |
### Hypothesis
Lowering the parallel graph extraction chunking threshold from 20K → 10K characters (ADR-001) reduces wall-clock extraction latency.
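As a sketch of the mechanism under test, the snippet below splits a document into threshold-sized chunks for parallel extraction. All names here (`splitForExtraction`, `chunkThreshold`) are hypothetical illustrations, not the real pipeline code:

```go
package main

import "fmt"

// chunkThreshold is the ADR-001 value under test; the baseline is 20_000.
const chunkThreshold = 10_000

// splitForExtraction cuts documents longer than the threshold into
// threshold-sized chunks that can be extracted in parallel; shorter
// documents pass through as a single chunk.
func splitForExtraction(text string, threshold int) []string {
	if len(text) <= threshold {
		return []string{text}
	}
	var chunks []string
	for start := 0; start < len(text); start += threshold {
		end := start + threshold
		if end > len(text) {
			end = len(text)
		}
		chunks = append(chunks, text[start:end])
	}
	return chunks
}

func main() {
	doc := make([]byte, 25_000) // a 25K-char payload, as in the A/B test
	fmt.Println(len(splitForExtraction(string(doc), chunkThreshold))) // 3
}
```

Under this model a 25K payload yields three parallelizable chunks at the 10K threshold but only two at 20K, which is consistent with the largest observed speedup appearing at 25K chars.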
### Results Summary
| Payload | Production (20K) | Staging (10K) | Δ | KS p-value |
|---|---|---|---|---|
| 5K chars | 50.1s | 35.6s | -28.9% | 0.017* |
| 15K chars | 45.8s | 37.0s | -19.2% | 0.308 |
| 25K chars | 116.2s | 37.0s | -68.2% | <0.001* |

\* significant at p < 0.05
### Conclusion
Directional evidence strongly favors staging (10K threshold), especially at 25K chars (68% faster, d=-2.06, p<0.001). However, the 5K result reveals an environment confound: staging is faster even where no parallelization benefit exists. The critical 15K payload also did not reach significance (p=0.31, n=15). Next iteration: re-run both thresholds in a controlled environment with ≥30 trials.
Report: [2026-05-13-adr001-live-ab.md](2026-05-13-adr001-live-ab.md)
## ADR-007-010-pipeline — Pipeline Regression Experiment
| Field | Value |
|---|---|
| ID | ADR-007-010-pipeline |
| Type | Regression Test |
| Target | backend/handlers/graph.go |
| Date | 2026-05-12 |
| Author | @coding-agent |
| PR | feature/experiment-adr007-010 |
| Commit (baseline) | e5d631ca266d22d1efd2bc9e6d0537c7a6f4b978 |
| Commit (experiment) | dc1fbe51c83a020cdad6a99325d65a83076bf0c5 |
| Status | ⚪ inconclusive — benchtime mismatch |
### Hypothesis
Merged ADR-007 + ADR-010 changes introduce no statistically significant regression in graph snapshot latency, graph traversal latency, or memory allocations.
### Results
- All 12 benchmark variants showed statistically significant improvements (2.15% to 10.69%)
- Memory allocations: zero change (identical allocs/op and bytes/op across all variants)
- However: baseline used `-benchtime=30s -count=30`, experiment used `-benchtime=1s -count=10`
- The consistent improvement across all variants strongly suggests a benchtime artifact, not an actual code improvement
### Conclusion
Inconclusive. The shorter benchtime invalidates any direct latency comparison. Memory analysis, which is valid regardless of benchtime, shows zero allocation changes: a positive signal. A re-run with identical benchmark parameters is recommended.
Report: [2026-05-12-adr007-010-pipeline.md](2026-05-12-adr007-010-pipeline.md)
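One way to keep the re-run comparable is to drive both commits through the same in-process harness via `testing.Benchmark`, so the settings cannot drift between runs. `buildSnapshot` below is a hypothetical stand-in for the real snapshot path in `backend/handlers/graph.go`:

```go
package main

import (
	"fmt"
	"testing"
)

// buildSnapshot is a hypothetical placeholder for the real graph
// snapshot path; only the benchmarking shape matters here.
func buildSnapshot(n int) []int {
	s := make([]int, n)
	for i := range s {
		s[i] = i * i
	}
	return s
}

func main() {
	// CLI equivalent, run identically against BOTH commits:
	//   go test -bench=. -benchtime=30s -count=30 ./backend/handlers/...
	// testing.Benchmark applies the same in-process settings to whichever
	// commit is checked out, so the results remain directly comparable.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = buildSnapshot(1024)
		}
	})
	fmt.Println(res.N >= 1) // true: the benchmark ran at least once
}
```

Comparing the two resulting benchmark logs with `benchstat` then gives the significance test the original comparison lacked.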
## EXP-001 — Meta-Experiment: Formalized Tracking Efficacy
| Field | Value |
|---|---|
| ID | EXP-001 |
| Type | Meta-experiment |
| Start Date | 2026-05-11 |
| Deadline | 2026-05-25 |
| Author | @team |
| Status | 🟡 planned |
### Hypothesis
Formalized experiment tracking (ADR-009) will reduce the number of performance regressions merged to main by ≥50% compared to the 3 months prior, and will increase the rate at which performance questions are answered with data rather than intuition.
### Success Criteria
- ≥50% reduction in performance regression incidents (counted as reverts or hotfixes due to perf)
- ≥80% of performance-sensitive PRs have a linked experiment report
- Developer survey shows increased confidence in performance decisions
### Methodology
This is a process meta-experiment. Over a 2-week trial period, all performance-sensitive PRs must comply with ADR-009. Data collected:
- Count of performance regressions merged before/after
- Count of PRs with/without linked experiments
- Qualitative developer feedback
### Report
Report file: EXP-001-formalized-tracking-efficacy.md (pending)
## EXP-002 — Ollama vs. Vertex AI LLM Latency Comparison
| Field | Value |
|---|---|
| ID | EXP-002 |
| Type | A/B Comparison |
| Target Component | backend/handlers/llm_ollama.go, backend/handlers/llm_vertex.go |
| Status | ⬜ planned |
### Hypothesis
The Ollama-based LLM handler achieves p50 and p95 latency within ±10% of the Vertex AI handler for prompts under 500 tokens.
### Success Criteria
- p95 Ollama latency ≤ 1.10 × p95 Vertex latency for <500 token prompts
- p99 Ollama latency ≤ 1.20 × p99 Vertex latency
- No significant regression in output quality (measured via structured output validation)
### Key Metrics
- p50, p95, p99 latency per handler
- Token throughput (tokens/sec)
- Memory allocations per invocation
## EXP-003 — SQLite Graph vs. In-Memory Graph at Scale
| Field | Value |
|---|---|
| ID | EXP-003 |
| Type | Optimization Experiment |
| Target Component | backend/handlers/graph_sqlite.go, backend/handlers/graph.go |
| Status | ⬜ planned |
### Hypothesis
SQLite-backed graph storage outperforms in-memory graph for entity counts exceeding 10,000, reducing p95 traversal latency by ≥40% while adding <10% overhead for small graphs (<1,000 entities).
### Success Criteria
- p95 traversal latency reduced by ≥40% for >10K entity graphs
- p95 latency for <1K entity graphs not increased by >10%
- Memory usage reduced by ≥50% for >10K entity graphs
### Key Metrics
- Graph traversal latency (p50, p95, p99)
- Memory usage (RSS at steady state)
- Insert/update latency for incremental graph mutations
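A fair comparison requires both backends to sit behind one API so the benchmark harness is identical. The interface and `memStore` below are hypothetical sketches, not the real types in `graph.go` / `graph_sqlite.go`:

```go
package main

import "fmt"

// GraphStore is a hypothetical abstraction that lets the benchmark swap
// the in-memory and SQLite-backed implementations behind one API.
type GraphStore interface {
	AddEntity(id string) error
	Neighbors(id string) ([]string, error)
}

// memStore is a minimal in-memory implementation for illustration only.
type memStore struct {
	edges map[string][]string
}

func (m *memStore) AddEntity(id string) error {
	if m.edges == nil {
		m.edges = map[string][]string{}
	}
	if _, ok := m.edges[id]; !ok {
		m.edges[id] = nil
	}
	return nil
}

func (m *memStore) Neighbors(id string) ([]string, error) {
	return m.edges[id], nil
}

func main() {
	var s GraphStore = &memStore{} // the SQLite variant would satisfy the same interface
	_ = s.AddEntity("ep-42")
	n, _ := s.Neighbors("ep-42")
	fmt.Println(len(n)) // 0: freshly added entity has no neighbors
}
```

With both implementations behind `GraphStore`, a single table-driven benchmark can run the 1K and 10K entity scenarios against each backend without duplicated measurement code.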
## EXP-004 — Batched vs. Sequential Entity Resolution
| Field | Value |
|---|---|
| ID | EXP-004 |
| Type | Optimization Experiment |
| Target Component | backend/handlers/entity_resolver.go |
| Status | ⬜ planned |
### Hypothesis
Batched entity resolution with chunk size 50 reduces p99 resolution latency by ≥30% compared to sequential resolution, without increasing resolution error rate by more than 1 percentage point.
### Success Criteria
- p99 latency reduced by ≥30% (KS test p < 0.05)
- Resolution error rate Δ ≤ 1.0 percentage point
- Memory usage not increased by >20%
### Key Metrics
- Entity resolution latency (p50, p95, p99)
- Resolution accuracy (precision/recall vs. gold standard)
- Memory allocations per resolution batch
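A sketch of the batched strategy under test: mentions are grouped into fixed-size batches that fan out to concurrent workers. `resolveBatch` is a hypothetical placeholder for one resolution round trip, not the real resolver in `entity_resolver.go`:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// resolveBatch stands in for one round trip that resolves a whole batch
// of entity mentions at once (hypothetical placeholder).
func resolveBatch(mentions []string) []string {
	out := make([]string, len(mentions))
	for i, m := range mentions {
		out[i] = strings.ToLower(m) // trivial "canonical form" for illustration
	}
	return out
}

// resolveAll fans batches of the given size (50 in the experiment) out
// to concurrent workers; each worker writes into a disjoint slice range.
func resolveAll(mentions []string, batchSize int) []string {
	out := make([]string, len(mentions))
	var wg sync.WaitGroup
	for start := 0; start < len(mentions); start += batchSize {
		end := start + batchSize
		if end > len(mentions) {
			end = len(mentions)
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			copy(out[start:end], resolveBatch(mentions[start:end]))
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(resolveAll([]string{"Alice", "ALICE", "Bob"}, 2))
	// prints: [alice alice bob]
}
```

The sequential baseline is the same code with `batchSize` set to 1 and the goroutine removed, which keeps the two arms of the experiment structurally comparable.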
## EXP-005 — Token-Bucket vs. Sliding-Window Rate Limiter
| Field | Value |
|---|---|
| ID | EXP-005 |
| Type | A/B Comparison |
| Target Component | backend/handlers/ratelimit.go |
| Status | ⬜ planned |
### Hypothesis
The token-bucket rate limiter sustains 2x the throughput of the sliding-window limiter under bursty ingest patterns (burst size 100 requests, 1s window), while maintaining equivalent fairness (measured by max single-client starvation time).
### Success Criteria
- Token-bucket throughput ≥ 2.0 × sliding-window throughput under bursty load
- Max single-client starvation time ≤ sliding-window baseline
- p99 request latency under token-bucket ≤ p99 under sliding-window
### Key Metrics
- Sustained throughput (req/sec)
- Request latency distribution (p50, p95, p99)
- Client fairness (max consecutive denials per client)