Hone vs. The 1 Billion Row Challenge
1,000,000,000 rows of data. No hand-tuning. Just an agent, a benchmark, and a budget.
The 1 Billion Row Challenge is simple on paper: read a file with 1B rows of weather station measurements, compute min/mean/max per station, as fast as possible. In Python, a naive solution takes minutes. The best human-optimized ones use memory-mapped files, multiprocessing, and numpy.
I'm not optimizing it by hand. I'm giving it to Hone — and letting it figure it out.
Hone is now on PyPI. Install it with pip install hone-ai.
This is a living document. I'll update it as each run completes. Follow the code at laxmena/hone-1brc.
The Setup
The challenge: Parse a 1B-row file. Each row: Hamburg;12.0. Compute min/mean/max per station. Print results sorted alphabetically.
The metric: Wall-clock runtime in seconds. Lower is better.
The constraints: Python standard library only. No numpy, no pandas, no third-party packages. Correctness must be preserved — output format and values must not change.
The baseline:
with open(filepath, "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        sep = line.index(";")
        station = line[:sep]
        temp = float(line[sep + 1:])
        ...
Simple. Correct. Slow. One thread, one line at a time, float() on every value.
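For reference, here is one way to fill in the elided aggregation and turn the snippet into a complete baseline. The helper name `aggregate` is illustrative; the actual benchmark script may differ in details.

```python
# A complete, minimal baseline in the spirit of the snippet above.
# `aggregate` is an illustrative name, not the real script's API.
from collections import defaultdict

def aggregate(filepath):
    # station -> [min, max, sum, count]
    stats = defaultdict(lambda: [float("inf"), float("-inf"), 0.0, 0])
    with open(filepath, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            sep = line.index(";")
            station = line[:sep]
            temp = float(line[sep + 1:])
            s = stats[station]
            if temp < s[0]:
                s[0] = temp
            if temp > s[1]:
                s[1] = temp
            s[2] += temp
            s[3] += 1
    # station -> (min, mean, max), sorted alphabetically
    return {st: (s[0], s[2] / s[3], s[1]) for st, s in sorted(stats.items())}
```

Printing `station=min/mean/max` from the returned dict produces the required output.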
Results at a Glance
| Run | Model | Dataset | Baseline | Optimized | Improvement |
|---|---|---|---|---|---|
| 1 | Haiku | 1M rows | 0.546s | 0.471s | 13.7% |
| 2 | Haiku | 100M rows | 47.197s | 42.739s | 9.4% |
| 3 | Sonnet | 100M rows | 48.104s | 10.110s | 79% |
Episode 1: Haiku, 1M rows — 13.7% faster
0.546s → 0.471s
First run: claude-haiku-4-5, 1M rows, $5 budget, 50 max iterations.
The 13.7% gain looks decent on paper. It isn't. The absolute numbers are tiny — we're talking 75 milliseconds. At this scale, Python startup time and OS disk caching dominate. The agent is optimizing noise, not the algorithm. Haiku made incremental tweaks but never found a structural breakthrough.
Wrong dataset size. Move on.
Hone v1.2.0: --goal-file
Episode 1 exposed a friction point. Pasting a long goal string into the terminal every run is error-prone and hard to version. For complex, multi-constraint goals it breaks down fast.
I added --goal-file to Hone — pass a path to a plain text file, Hone reads the goal from there. Same idea as Karpathy's program.md in autoresearch. The goal now lives alongside the code, versioned in git.
hone --goal-file program.md \
  --bench "python benchmark.py data/measurements_100M.txt" \
  --files "solution.py" \
  --optimize lower \
  --score-pattern "Time Taken:\s*(\d+\.\d+)" \
  --budget 3.0 \
  --max-iter 50 \
  --model claude-haiku-4-5
Live in v1.2.0. pip install --upgrade hone-ai.
Episode 2: Haiku, 100M rows — 9.4% faster
47.197s → 42.739s
A 100x larger dataset. Now I/O pressure actually matters, and the 4.5 seconds saved is a real signal.
But Haiku still couldn't find the structural moves. It made safe, local edits — better buffering, minor parsing cleanup — and never stepped back to reconsider the architecture. No parallelism. No mmap. No integer parsing. It hit its ceiling.
Episode 3: Sonnet, 100M rows — 79% faster
48.104s → 10.110s
Same benchmark. Same constraints. One change: claude-haiku-4-5 → claude-sonnet-4-6.
38 seconds saved. The agent didn't tune the baseline — it replaced it.
What Sonnet actually did
1. Text → Binary reads with mmap
The baseline opens the file in text mode and reads line by line. Sonnet switched to binary mode with memory-mapped I/O — the OS maps the file directly into memory, eliminating repeated read syscalls.
mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
chunk = mm[start:end]
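Fleshed out, the read path might look like this. The helper name `read_chunk` is an assumption for illustration; the generated solution differs in detail.

```python
# Sketch of the mmap read path: the OS maps the file into memory, so
# slicing the map replaces a loop of read() syscalls.
import mmap

def read_chunk(filepath, start, end):
    with open(filepath, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return mm[start:end]   # raw bytes for this worker's chunk
        finally:
            mm.close()
```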
2. float() → integer arithmetic
Every float() call in the baseline is expensive. Sonnet eliminated them entirely. Temperatures are stored as integers ×10 — 12.3 becomes 123. The decimal point is skipped by knowing its fixed position in the byte string. Division back to float happens only once, at output time.
d0 = tb[-1] - 48                                  # digit after the decimal point
# two-digit integer part: b'12.3' → 123
val = (tb[0] - 48) * 100 + (tb[1] - 48) * 10 + d0
It also pre-built a lookup table for all valid temperature values (-99.9 to 99.9), so the common case skips even this manual digit arithmetic.
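A sketch of the lookup-table idea, assuming the file stores exactly one fractional digit. `TEMP_LOOKUP` and `parse_temp` are illustrative names, not the generated code.

```python
# Precompute every valid byte pattern (-99.9 ... 99.9, one fractional
# digit) and map it to its x10 integer value.
TEMP_LOOKUP = {}
for t in range(-999, 1000):            # tenths of a degree
    TEMP_LOOKUP[b"%.1f" % (t / 10)] = t

def parse_temp(tb):
    # common case: a single dict lookup; fall back to float() otherwise
    v = TEMP_LOOKUP.get(tb)
    return v if v is not None else int(round(float(tb) * 10))
```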
3. Multiprocessing across all CPU cores
The baseline is single-threaded. Sonnet split the file into cpu_count() × 8 chunks, aligned each boundary to the next newline to avoid splitting rows, and ran each chunk in a separate process. Results merged at the end.
num_workers = cpu_count()
boundaries = find_chunk_boundaries(filepath, num_workers * 8)
with Pool(processes=num_workers) as pool:
    all_stats = pool.map(process_chunk, args)
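The boundary-alignment step can be sketched as follows. The body of `find_chunk_boundaries` here is an assumption about how such a helper works, not the generated code: each boundary is snapped forward to the end of its row so no line is split across workers.

```python
# Split a file into byte ranges whose boundaries land on newlines.
import os

def find_chunk_boundaries(filepath, n_chunks):
    size = os.path.getsize(filepath)
    step = max(1, size // n_chunks)
    bounds = [0]
    with open(filepath, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(min(i * step, size))
            f.readline()                 # advance to the end of the current row
            pos = f.tell()
            if pos < size and pos > bounds[-1]:
                bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))
```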
4. strip() + index() → partition()
The baseline does line.strip() then line.index(";") — two passes. Sonnet used line.partition(b';') — one pass, station and temperature in a single call.
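A one-line illustration of the single-pass split, on raw bytes as in the optimized binary path:

```python
# partition() finds the separator and splits in one pass; rstrip()
# removes only the trailing newline instead of a full strip().
line = b"Hamburg;12.0\n"
station, _, temp_bytes = line.rstrip(b"\n").partition(b";")
```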
Why Haiku couldn't find this
Haiku made safe, local edits. It never stepped back to reconsider the architecture. Sonnet saw the whole picture: the bottleneck isn't any single line, it's the approach. Single-threaded text parsing doesn't scale. The winning move was to throw it out and start from a parallel, binary-aware design.
Open question: does model choice matter more than iteration count?
What's Next
100M rows, 79% faster. The real test is 1B rows — 10x again. Running next.
Updates appear here as experiments run. Subscribe below or follow via RSS.