Hone vs. The 1 Billion Row Challenge
1,000,000,000 rows of data. No hand-tuning. Just an agent, a benchmark, and a budget.
The 1 Billion Row Challenge is simple on paper: read a file with 1B rows of weather station measurements, compute min/mean/max per station, as fast as possible. In Python, a naive solution takes minutes. The best human-optimized ones use memory-mapped files, multiprocessing, and numpy.
I'm not optimizing it by hand. I'm giving it to Hone — and letting it figure it out.
Hone is now on PyPI. Install it with pip install hone-ai.
This is a living document. I'll update it as each run completes. Follow the code at laxmena/hone-1brc.
The Setup
The challenge: Parse a 1B-row file. Each row: Hamburg;12.0. Compute min/mean/max per station. Print results sorted alphabetically.
The metric: Wall-clock runtime in seconds. Lower is better.
The constraints: Python standard library only. No numpy, no pandas, no third-party packages. Correctness must be preserved — output format and values must not change.
The baseline: Simple. Correct. Slow. One thread, one line at a time, float() on every value.
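The baseline described above can be sketched roughly like this. This is my illustration of the shape the post describes (text mode, one thread, one line at a time, float() per value), not Hone's actual baseline code; the function name and return shape are mine.

```python
from collections import defaultdict

def baseline(path):
    """Naive pass: text mode, one line at a time, float() on every value."""
    # station -> [min, max, running total, count]
    stats = defaultdict(lambda: [float("inf"), float("-inf"), 0.0, 0])
    with open(path) as f:
        for line in f:
            line = line.strip()
            i = line.index(";")
            station = line[:i]
            temp = float(line[i + 1:])  # one float() call per row: the hot spot
            s = stats[station]
            if temp < s[0]:
                s[0] = temp
            if temp > s[1]:
                s[1] = temp
            s[2] += temp
            s[3] += 1
    # alphabetical output: station -> (min, mean, max)
    return {k: (v[0], v[2] / v[3], v[1]) for k, v in sorted(stats.items())}
```

Every inefficiency the later episodes attack is visible here: text decoding, two scans per line, a float() per row, a single thread.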
Results at a Glance
| Run | Model | Dataset | Baseline | Optimized | Improvement |
|---|---|---|---|---|---|
| 1 | Haiku | 1M rows | 0.546s | 0.471s | 13.7% |
| 2 | Haiku | 100M rows | 47.197s | 42.739s | 9.4% |
| 3 | Sonnet | 100M rows | 48.104s | 10.110s | 79% |
| 4 | Sonnet (100M solution, no re-run) | 1B rows | 487.525s | 130.080s | 73.3% |
| 5 | Sonnet | 1B rows | 487.525s | 90.929s | 81.4% |
Episode 1: Haiku, 1M rows — 13.7% faster
March 25, 2026
0.546s → 0.471s
First run: claude-haiku-4-5, 1M rows, $5 budget, 50 max iterations.
The 13.7% gain looks decent on paper. It isn't. The absolute numbers are tiny — we're talking 75 milliseconds. At this scale, Python startup time and OS disk caching dominate. The agent is optimizing noise, not the algorithm. Haiku made incremental tweaks but never found a structural breakthrough.
Wrong dataset size. Move on.
Hone v1.2.0: --goal-file
March 25, 2026
Episode 1 exposed a friction point. Pasting a long goal string into the terminal every run is error-prone and hard to version. For complex, multi-constraint goals it breaks down fast.
I added --goal-file to Hone — pass a path to a plain text file, Hone reads the goal from there. Same idea as Karpathy's program.md in autoresearch. The goal now lives alongside the code, versioned in git.
Live in v1.2.0. pip install --upgrade hone-ai.
Episode 2: Haiku, 100M rows — 9.4% faster
March 25, 2026
47.197s → 42.739s
100x more rows than Episode 1. Now I/O pressure actually matters: 4.5 seconds saved is a real signal.
But Haiku still couldn't find the structural moves. It made safe, local edits — better buffering, minor parsing cleanup — and never stepped back to reconsider the architecture. No parallelism. No mmap. No integer parsing. It hit its ceiling.
Episode 3: Sonnet, 100M rows — 79% faster
March 25, 2026
48.104s → 10.110s
Same benchmark. Same constraints. One change: claude-haiku-4-5 → claude-sonnet-4-6.
38 seconds saved. The agent didn't tune the baseline — it replaced it.
What Sonnet actually did
1. Text → Binary reads with mmap
The baseline opens the file in text mode and reads line by line. Sonnet switched to binary mode with memory-mapped I/O — the OS maps the file directly into memory, eliminating repeated read syscalls.
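A minimal sketch of the idea, assuming the shape described above rather than Sonnet's exact code: map the file once, then walk it with find() instead of issuing a read per line.

```python
import mmap

def iter_lines(path):
    """Yield raw byte lines from a memory-mapped file (no per-line read syscalls)."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = 0
        while True:
            nl = mm.find(b"\n", start)  # scan for the next newline in mapped memory
            if nl == -1:
                break
            yield mm[start:nl]          # slicing an mmap returns bytes
            start = nl + 1
```

Because slices of an mmap come back as bytes, all downstream parsing stays in binary mode, and the OS pages the file in lazily and shares its cache across processes.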
2. float() → integer arithmetic
float() is expensive, and the baseline calls it a billion times. Sonnet eliminated it entirely. Temperatures are stored as integers ×10 — 12.3 becomes 123. The decimal point is skipped by exploiting its fixed position in the byte string. Division back to float happens only once, at output time. Sonnet also pre-built a lookup table for all valid temperature values (-99.9 to 99.9), skipping even the manual digit arithmetic in the common case.
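The trick in miniature, under the fixed-format assumption (optional sign, one or two integer digits, exactly one decimal digit). Names here are mine for illustration, not Hone's:

```python
def parse_tenths(b):
    """b'12.3' -> 123, b'-5.0' -> -50: no float(), just byte arithmetic."""
    neg = b[:1] == b"-"
    if neg:
        b = b[1:]
    if len(b) == 3:  # d.d  (indexing bytes yields ints; 48 == ord('0'))
        v = (b[0] - 48) * 10 + (b[2] - 48)
    else:            # dd.d
        v = (b[0] - 48) * 100 + (b[1] - 48) * 10 + (b[3] - 48)
    return -v if neg else v

# Lookup table for every valid reading (-99.9 .. 99.9), bytes -> tenths,
# so the common case is a single dict hit instead of any arithmetic.
TENTHS = {f"{t / 10:.1f}".encode(): t for t in range(-999, 1000)}
```

The accumulated sums stay integers too; the only division happens once per station at print time, e.g. `total / (10 * count)`.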
3. Multiprocessing across all CPU cores
The baseline is single-threaded. Sonnet split the file into cpu_count() × 8 chunks, aligned each boundary to the next newline to avoid splitting rows, and ran each chunk in a separate process. Results merged at the end.
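The boundary-alignment step is the subtle part. A sketch of how it can be done (my version, assuming the chunking strategy described above): cut the file at even intervals, then advance each cut to the next newline so no row is split.

```python
import os

def chunk_ranges(path, n_chunks):
    """Split a file into (start, end) byte ranges aligned to newline boundaries."""
    size = os.path.getsize(path)
    step = max(1, size // n_chunks)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(min(i * step, size))
            f.readline()  # advance past the partial row at the cut point
            offsets.append(min(f.tell(), size))
    offsets.append(size)
    offsets = sorted(set(offsets))  # dedupe in case cuts collapsed on small files
    return list(zip(offsets, offsets[1:]))
```

Each (start, end) pair then goes to a worker via multiprocessing.Pool; every worker builds its own per-station dict over its slice, and the parent merges by taking min of mins, max of maxes, and summing totals and counts.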
4. strip() + index() → partition()
The baseline does line.strip() then line.index(";") — two passes. Sonnet used line.partition(b';') — one pass, station and temperature in a single call.
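The difference side by side (illustrative, in binary mode as used by the optimized solution):

```python
line = b"Hamburg;12.0\n"

# Baseline style: strip(), then index() -- two scans plus an extra slice
s = line.strip()
i = s.index(b";")
station_a, temp_a = s[:i], s[i + 1:]

# partition() style: one scan, three-way split in a single call
station_b, _sep, temp_b = line.rstrip(b"\n").partition(b";")

assert (station_a, temp_a) == (station_b, temp_b) == (b"Hamburg", b"12.0")
```

A small win per line, but multiplied by a billion lines it adds up.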
Why Haiku couldn't find this
Haiku made safe, local edits. It never stepped back to reconsider the architecture. Sonnet saw the whole picture: the bottleneck isn't any single line, it's the approach. Single-threaded text parsing doesn't scale. The winning move was to throw it out and start from a parallel, binary-aware design.
Open question: does model choice matter more than iteration count?
Episode 4: Sonnet's 100M solution, dropped on 1B rows — 73.3% faster
April 7, 2026
487.525s → 130.080s
Before spending more API budget, I wanted to answer a simpler question first: does the architecture Sonnet found at 100M rows even generalize? No new Hone run. No new cost. Just the existing solution, run unchanged against the full 1B dataset.
357 seconds saved. The answer is yes — it generalizes. mmap, multiprocessing, and integer arithmetic aren't tricks tuned to a particular file size. They're structural. The solution held up.
But 130 seconds also exposed the ceiling. Optimizing against a 10x smaller proxy leaves performance on the table. The solution was good — just not good enough. Time to run Hone against the real target.
Episode 5: Sonnet, 1B rows directly — 81.4% faster
April 7, 2026
487.525s → 90.929s
Same model, same constraints. This time Hone optimized against the full 1B row dataset from the start.
396 seconds saved. Under 91 seconds for a billion rows of Python.
The gains from Episode 4 weren't wasted — they were the floor. Hone started from a strong architecture and pushed further. 81.4% beats the 79% from Episode 3. More data, better result. The solution isn't fragile.
The lesson: optimize against the real target. A proxy dataset is useful for iteration speed, but the final run needs to face the actual problem.
What's Next
Under 91 seconds on 1B rows. The question now is how much headroom is left — and whether Hone can find it without numpy or third-party packages.
Updates appear here as experiments run. Subscribe below or follow via RSS.