<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>hone &#x2014; laxmena</title>
    <link>https://laxmena.com/tag:hone</link>
    <description></description>
    <pubDate>Wed, 15 Apr 2026 01:19:18 +0000</pubDate>
    <image>
      <url>https://i.snap.as/n9575tJN.png</url>
      <title>hone &#x2014; laxmena</title>
      <link>https://laxmena.com/tag:hone</link>
    </image>
    <item>
      <title>Hone vs. The 1 Billion Row Challenge</title>
      <link>https://laxmena.com/hone-vs-the-1-billion-row-challenge?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[1,000,000,000 rows of data. No hand-tuning. Just an agent, a benchmark, and a budget. The 1 Billion Row Challenge: read a file with 1B rows of weather station measurements and compute min/mean/max per station, as fast as possible, using only the Python standard library. I&#39;m not optimizing it by hand; I&#39;m giving it to Hone and letting it figure it out. This is a living document, updated as each run completes. Best result so far: 487.525s baseline down to 90.929s on the full 1B rows (81.4% faster).]]&gt;</description>
      <content:encoded><![CDATA[<p>1,000,000,000 rows of data. No hand-tuning. Just an agent, a benchmark, and a budget.</p>

<p>The <a href="https://github.com/gunnarmorling/1brc">1 Billion Row Challenge</a> is simple on paper: read a file with 1B rows of weather station measurements, compute min/mean/max per station, as fast as possible. In Python, a naive solution takes minutes. The best human-optimized ones use memory-mapped files, multiprocessing, and numpy.</p>

<p>I&#39;m not optimizing it by hand. I&#39;m giving it to <a href="https://github.com/laxmena/hone">Hone</a> — and letting it figure it out.</p>

<p>Hone is now on PyPI. Install it with <code>pip install hone-ai</code>.</p>

<p>This is a living document. I&#39;ll update it as each run completes. Follow the code at <a href="https://github.com/laxmena/hone-1brc">laxmena/hone-1brc</a>.</p>



<hr/>

<h2 id="the-setup">The Setup</h2>

<p><strong>The challenge:</strong> Parse a 1B-row file. Each row: <code>Hamburg;12.0</code>. Compute min/mean/max per station. Print results sorted alphabetically.</p>

<p><strong>The metric:</strong> Wall-clock runtime in seconds. Lower is better.</p>

<p><strong>The constraints:</strong> Python standard library only. No numpy, no pandas, no third-party packages. Correctness must be preserved — output format and values must not change.</p>

<p><strong>The baseline:</strong> Simple. Correct. Slow. One thread, one line at a time, <code>float()</code> on every value.</p>
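<p>A minimal sketch of such a baseline (illustrative, not the repo&#39;s exact code): one thread, text mode, two scans per line, <code>float()</code> on every value.</p>

```python
def baseline(path):
    # station -> [min, max, running_total, count]
    stats = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()            # scan 1 over the line
            i = line.index(";")            # scan 2 over the line
            station, temp = line[:i], float(line[i + 1:])
            s = stats.get(station)
            if s is None:
                stats[station] = [temp, temp, temp, 1]
            else:
                if temp < s[0]:
                    s[0] = temp
                if temp > s[1]:
                    s[1] = temp
                s[2] += temp
                s[3] += 1
    # station -> (min, mean, max), sorted alphabetically
    return {k: (v[0], v[2] / v[3], v[1]) for k, v in sorted(stats.items())}
```

Correct, easy to read, and exactly what the later episodes tear apart.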

<hr/>

<h2 id="results-at-a-glance">Results at a Glance</h2>

<table>
<thead>
<tr>
<th>Run</th>
<th>Model</th>
<th>Dataset</th>
<th>Baseline</th>
<th>Optimized</th>
<th>Improvement</th>
</tr>
</thead>

<tbody>
<tr>
<td>1</td>
<td>Haiku</td>
<td>1M rows</td>
<td>0.546s</td>
<td>0.471s</td>
<td>13.7%</td>
</tr>

<tr>
<td>2</td>
<td>Haiku</td>
<td>100M rows</td>
<td>47.197s</td>
<td>42.739s</td>
<td>9.4%</td>
</tr>

<tr>
<td>3</td>
<td>Sonnet</td>
<td>100M rows</td>
<td>48.104s</td>
<td>10.110s</td>
<td><strong>79%</strong></td>
</tr>

<tr>
<td>4</td>
<td>Sonnet (100M solution, no re-run)</td>
<td>1B rows</td>
<td>487.525s</td>
<td>130.080s</td>
<td>73.3%</td>
</tr>

<tr>
<td>5</td>
<td>Sonnet</td>
<td>1B rows</td>
<td>487.525s</td>
<td>90.929s</td>
<td><strong>81.4%</strong></td>
</tr>
</tbody>
</table>

<hr/>

<h2 id="episode-1-haiku-1m-rows-13-7-faster">Episode 1: Haiku, 1M rows — 13.7% faster</h2>

<p><em>March 25, 2026</em></p>

<p><code>0.546s → 0.471s</code></p>

<p>First run: <code>claude-haiku-4-5</code>, 1M rows, $5 budget, 50 max iterations.</p>

<p>The 13.7% gain looks decent on paper. It isn&#39;t. The absolute numbers are tiny — we&#39;re talking 75 milliseconds. At this scale, Python startup time and OS disk caching dominate. The agent is optimizing noise, not the algorithm. Haiku made incremental tweaks but never found a structural breakthrough.</p>

<p>Wrong dataset size. Move on.</p>

<hr/>

<h2 id="hone-v1-2-0-goal-file">Hone v1.2.0: <code>--goal-file</code></h2>

<p><em>March 25, 2026</em></p>

<p>Episode 1 exposed a friction point. Pasting a long goal string into the terminal every run is error-prone and hard to version. For complex, multi-constraint goals it breaks down fast.</p>

<p>I added <code>--goal-file</code> to Hone — pass it the path to a plain-text file and it reads the goal from there. Same idea as Karpathy&#39;s <code>program.md</code> in autoresearch. The goal now lives alongside the code, versioned in git.</p>

<p>Live in <a href="https://github.com/laxmena/hone/commit/477a83d5050628355bf45ceded3807fea75b8ce6">v1.2.0</a>. <code>pip install --upgrade hone-ai</code>.</p>
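<p>For illustration only, a goal file for this benchmark might look like the following (hypothetical wording; the actual goal text used in these runs isn&#39;t shown in this post). It would be passed with <code>--goal-file goal.txt</code>.</p>

```text
Minimize wall-clock runtime of the solution on the measurements file.
Preserve correctness: output format and values must not change.
Constraints: Python standard library only. No numpy, no pandas,
no third-party packages.
```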

<hr/>

<h2 id="episode-2-haiku-100m-rows-9-4-faster">Episode 2: Haiku, 100M rows — 9.4% faster</h2>

<p><em>March 25, 2026</em></p>

<p><code>47.197s → 42.739s</code></p>

<p>A dataset 100x larger. Now I/O pressure actually matters — 4.5 seconds saved is a real signal.</p>

<p>But Haiku still couldn&#39;t find the structural moves. It made safe, local edits — better buffering, minor parsing cleanup — and never stepped back to reconsider the architecture. No parallelism. No mmap. No integer parsing. It hit its ceiling.</p>

<hr/>

<h2 id="episode-3-sonnet-100m-rows-79-faster">Episode 3: Sonnet, 100M rows — <strong>79% faster</strong></h2>

<p><em>March 25, 2026</em></p>

<p><code>48.104s → 10.110s</code></p>

<p>Same benchmark. Same constraints. One change: <code>claude-haiku-4-5</code> → <code>claude-sonnet-4-6</code>.</p>

<p>38 seconds saved. The agent didn&#39;t tune the baseline — it replaced it.</p>

<h3 id="what-sonnet-actually-did">What Sonnet actually did</h3>

<p><strong>1. Text → Binary reads with <code>mmap</code></strong></p>

<p>The baseline opens the file in text mode and reads line by line. Sonnet switched to binary mode with memory-mapped I/O — the OS maps the file directly into memory, eliminating repeated read syscalls.</p>
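<p>A sketch of that read path (illustrative, using the stdlib <code>mmap</code> module; not the agent&#39;s exact code):</p>

```python
import mmap

def iter_lines(path):
    # Map the file into the process address space. The OS pages bytes
    # in on demand, so iterating lines issues no per-read() syscalls.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            line = mm.readline()
            while line:
                yield line                # raw bytes, newline included
                line = mm.readline()
```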

<p><strong>2. <code>float()</code> → integer arithmetic</strong></p>

<p>Every <code>float()</code> call in the baseline is expensive. Sonnet eliminated them entirely. Temperatures are stored as integers ×10 — <code>12.3</code> becomes <code>123</code>. The decimal point is skipped by knowing its fixed position in the byte string. Division back to float happens only once, at output time. It also pre-built a lookup table for all valid temperature values (<code>-99.9</code> to <code>99.9</code>) to skip even manual parsing on the common case.</p>
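<p>Both ideas in miniature (an illustrative sketch, not the generated code). A reading like <code>12.3</code> becomes the integer <code>123</code>, and a table keyed on the raw bytes handles the common case without any parsing at all:</p>

```python
def parse_tenths(b):
    # b is a temperature as bytes, e.g. b"12.3" or b"-5.7", with exactly
    # one decimal digit. The "." sits at a fixed offset from the end, so
    # skip it instead of calling float(): b"12.3" -> 123, b"-5.7" -> -57.
    if b[0] == 45:                        # ord("-")
        return -(int(b[1:-2]) * 10 + (b[-1] - 48))
    return int(b[:-2]) * 10 + (b[-1] - 48)

# Lookup table for every valid reading in -99.9 .. 99.9: the common
# case becomes a single dict hit on the raw bytes, no parsing at all.
TENTHS = {b"%.1f" % (v / 10): v for v in range(-999, 1000)}
```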

<p><strong>3. Multiprocessing across all CPU cores</strong></p>

<p>The baseline is single-threaded. Sonnet split the file into <code>cpu_count() × 8</code> chunks, aligned each boundary to the next newline to avoid splitting rows, and ran each chunk in a separate process. Results merged at the end.</p>
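<p>The chunking and boundary alignment can be sketched like this (illustrative; the worker processes and the final merge are omitted):</p>

```python
import os

def chunk_ranges(path, n_chunks):
    # Cut the file into roughly equal byte ranges, then push every cut
    # forward to the next newline so no row is split across two chunks.
    size = os.path.getsize(path)
    approx = max(1, size // n_chunks)
    starts = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(min(i * approx, size))
            f.readline()                  # advance past the current row
            pos = f.tell()
            if starts[-1] < pos < size:
                starts.append(pos)
    return list(zip(starts, starts[1:] + [size]))
```

Each <code>(start, end)</code> range would then go to one worker process, with the per-chunk dictionaries merged at the end.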

<p><strong>4. <code>strip()</code> + <code>index()</code> → <code>partition()</code></strong></p>

<p>The baseline does <code>line.strip()</code> then <code>line.index(&#34;;&#34;)</code> — two passes. Sonnet used <code>line.partition(b&#39;;&#39;)</code> — one pass, station and temperature in a single call.</p>
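<p>The difference on a single row (illustrative):</p>

```python
line = b"Hamburg;12.0\n"

# Baseline: two full scans of the line
s = line.strip()                      # scan 1
i = s.index(b";")                     # scan 2
station_a, temp_a = s[:i], s[i + 1:]

# Optimized: one call finds the separator and splits in a single pass;
# only the short temperature tail still needs its newline stripped.
station_b, _, rest = line.partition(b";")
temp_b = rest.strip()

assert (station_a, temp_a) == (station_b, temp_b) == (b"Hamburg", b"12.0")
```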

<h3 id="why-haiku-couldn-t-find-this">Why Haiku couldn&#39;t find this</h3>

<p>Haiku made safe, local edits. It never stepped back to reconsider the architecture. Sonnet saw the whole picture: the bottleneck isn&#39;t any single line, it&#39;s the approach. Single-threaded text parsing doesn&#39;t scale. The winning move was to throw it out and start from a parallel, binary-aware design.</p>

<p><strong>Q: Does model choice matter more than iteration count?</strong></p>

<hr/>

<h2 id="episode-4-sonnet-s-100m-solution-dropped-on-1b-rows-73-3-faster">Episode 4: Sonnet&#39;s 100M solution, dropped on 1B rows — 73.3% faster</h2>

<p><em>April 7, 2026</em></p>

<p><code>487.525s → 130.080s</code></p>

<p>Before spending more API budget, I wanted to answer a simpler question first: does the architecture Sonnet found at 100M rows even generalize? No new Hone run. No new cost. Just the existing solution, run unchanged against the full 1B dataset.</p>

<p>357 seconds saved. The answer is yes — it generalizes. mmap, multiprocessing, and integer arithmetic aren&#39;t tricks tuned to a particular file size. They&#39;re structural. The solution held up.</p>

<p>But 130 seconds also exposed the ceiling. Optimizing against a 10x smaller proxy leaves performance on the table. The solution was good — just not good enough. Time to run Hone against the real target.</p>

<p><a href="https://gist.github.com/laxmena/a55238ce48ab0a3157087b8f345a0775">Source code as Gist here</a></p>

<hr/>

<h2 id="episode-5-sonnet-1b-rows-directly-81-4-faster">Episode 5: Sonnet, 1B rows directly — <strong>81.4% faster</strong></h2>

<p><em>April 7, 2026</em></p>

<p><code>487.525s → 90.929s</code></p>

<p>Same model, same constraints. This time Hone optimized against the full 1B row dataset from the start.</p>

<p>396 seconds saved. Under 91 seconds for a billion rows of Python.</p>

<p>The gains from Episode 4 weren&#39;t wasted — they were the floor. Hone started from a strong architecture and pushed further. 81.4% beats the 79% from Episode 3. More data, better result. The solution isn&#39;t fragile.</p>

<p><a href="https://gist.github.com/laxmena/0cb6ba6c5d8a5e235d245295afa0b9fd">Source code as Gist here</a></p>

<p>The lesson: optimize against the real target. A proxy dataset is useful for iteration speed, but the final run needs to face the actual problem.</p>

<hr/>

<h2 id="what-s-next">What&#39;s Next</h2>

<p>Under 91 seconds on 1B rows. The question now is how much headroom is left — and whether Hone can find it without numpy or third-party packages.</p>

<hr/>

<p><em>Updates appear here as experiments run. Subscribe below or follow via <a href="https://write.as/laxmena/feed/">RSS</a>.</em></p>

<p><a href="https://laxmena.com/tag:engineering" class="hashtag"><span>#</span><span class="p-category">engineering</span></a> <a href="https://laxmena.com/tag:hone" class="hashtag"><span>#</span><span class="p-category">hone</span></a> <a href="https://laxmena.com/tag:ai" class="hashtag"><span>#</span><span class="p-category">ai</span></a></p>


]]></content:encoded>
      <guid>https://laxmena.com/hone-vs-the-1-billion-row-challenge</guid>
      <pubDate>Wed, 25 Mar 2026 04:06:42 +0000</pubDate>
    </item>
  </channel>
</rss>