Hone vs. The 1 Billion Row Challenge

1,000,000,000 rows of data. No hand-tuning. Just an agent, a benchmark, and a budget.

The 1 Billion Row Challenge is simple on paper: read a file with 1B rows of weather station measurements, compute min/mean/max per station, as fast as possible. In Python, a naive solution takes minutes. The best human-optimized ones use memory-mapped files, multiprocessing, and numpy.

I'm not optimizing it by hand. I'm giving it to Hone — and letting it figure it out.

Hone is now on PyPI. Install it with pip install hone-ai.

This is a living document. I'll update it as each run completes. Follow the code at laxmena/hone-1brc.


The Setup

The challenge: Parse a 1B-row file. Each row: Hamburg;12.0. Compute min/mean/max per station. Print results sorted alphabetically.

The metric: Wall-clock runtime in seconds. Lower is better.

The constraints: Python standard library only. No numpy, no pandas, no third-party packages. Correctness must be preserved — output format and values must not change.

The baseline:

with open(filepath, "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        sep = line.index(";")
        station = line[:sep]
        temp = float(line[sep + 1:])
        ...

Simple. Correct. Slow. One thread, one line at a time, float() on every value.
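
For context, everything the "..." stands in for is just a running min/max/sum/count per station plus an alphabetical print at the end. A minimal sketch of that part, with a dict layout and output format that are illustrative rather than the repo's exact code:

# assumes stats = {} was initialized before the loop
s = stats.setdefault(station, [temp, temp, 0.0, 0])
s[0] = min(s[0], temp)   # running min
s[1] = max(s[1], temp)   # running max
s[2] += temp             # running sum
s[3] += 1                # row count

# after the loop: alphabetical min/mean/max output
for name in sorted(stats):
    mn, mx, total, n = stats[name]
    print(f"{name}={mn:.1f}/{total / n:.1f}/{mx:.1f}")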


Results at a Glance

Run  Model   Dataset    Baseline   Optimized  Improvement
1    Haiku   1M rows    0.546s     0.471s     13.7%
2    Haiku   100M rows  47.197s    42.739s    9.4%
3    Sonnet  100M rows  48.104s    10.110s    79%

Episode 1: Haiku, 1M rows — 13.7% faster

0.546s → 0.471s

First run: claude-haiku-4-5, 1M rows, $5 budget, 50 max iterations.

The 13.7% gain looks decent on paper. It isn't. The absolute numbers are tiny — we're talking 75 milliseconds. At this scale, Python startup time and OS disk caching dominate. The agent is optimizing noise, not the algorithm. Haiku made incremental tweaks but never found a structural breakthrough.

Wrong dataset size. Move on.


Hone v1.2.0: --goal-file

Episode 1 exposed a friction point. Pasting a long goal string into the terminal every run is error-prone and hard to version. For complex, multi-constraint goals it breaks down fast.

I added --goal-file to Hone: pass it the path to a plain-text file and Hone reads the goal from there. Same idea as Karpathy's program.md in autoresearch. The goal now lives alongside the code, versioned in git.

hone --goal-file program.md \
     --bench "python benchmark.py data/measurements_100M.txt" \
     --files "solution.py" \
     --optimize lower \
     --score-pattern "Time Taken:\s*(\d+\.\d+)" \
     --budget 3.0 \
     --max-iter 50 \
     --model claude-haiku-4-5
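
For reference, the goal file can carry the whole multi-constraint brief. Something like this, as an illustrative sketch rather than the exact program.md from the repo:

Optimize solution.py to minimize wall-clock runtime on the benchmark.
Constraints:
- Python standard library only: no numpy, no pandas, no third-party packages.
- Correctness must be preserved: output format and values must not change.
- The benchmark prints "Time Taken: <seconds>"; lower is better.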

Live in v1.2.0. pip install --upgrade hone-ai.


Episode 2: Haiku, 100M rows — 9.4% faster

47.197s → 42.739s

100x larger dataset. Now I/O pressure actually matters: 4.5 seconds saved is a real signal.

But Haiku still couldn't find the structural moves. It made safe, local edits — better buffering, minor parsing cleanup — and never stepped back to reconsider the architecture. No parallelism. No mmap. No integer parsing. It hit its ceiling.


Episode 3: Sonnet, 100M rows — 79% faster

48.104s → 10.110s

Same benchmark. Same constraints. One change: claude-haiku-4-5 → claude-sonnet-4-6.

38 seconds saved. The agent didn't tune the baseline — it replaced it.

What Sonnet actually did

1. Text → Binary reads with mmap

The baseline opens the file in text mode and reads line by line. Sonnet switched to binary mode with memory-mapped I/O — the OS maps the file directly into memory, eliminating repeated read syscalls.

import mmap
with open(filepath, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    chunk = mm[start:end]   # each worker slices out its own byte range

2. float() → integer arithmetic

Every float() call in the baseline is expensive. Sonnet eliminated them entirely. Temperatures are stored as integers ×10 — 12.3 becomes 123. The decimal point is skipped by knowing its fixed position in the byte string. Division back to float happens only once, at output time.

d0 = tb[-1] - 48                                                     # fractional digit (sign stripped beforehand)
if tb[1] == 46:  val = (tb[0] - 48) * 10 + d0                        # dot at index 1: b'2.3'  → 23
else:            val = (tb[0] - 48) * 100 + (tb[1] - 48) * 10 + d0   # dot at index 2: b'12.3' → 123

It also pre-built a lookup table for all valid temperature values (-99.9 to 99.9) to skip even this manual parsing in the common case.
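
A table like that is cheap to build once up front. Here is a sketch of the idea; TEMP_TABLE, temp_bytes, and parse_temp are illustrative names, not the agent's exact code:

# every legal reading from -99.9 to 99.9, keyed by its raw byte form
TEMP_TABLE = {f"{v / 10:.1f}".encode(): v for v in range(-999, 1000)}

# hot loop: one dict hit replaces the per-byte arithmetic in the common case
val = TEMP_TABLE.get(temp_bytes)
if val is None:
    val = parse_temp(temp_bytes)   # hypothetical fallback to the manual parse above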

3. Multiprocessing across all CPU cores

The baseline is single-threaded. Sonnet split the file into cpu_count() × 8 chunks, aligned each chunk boundary to the next newline so no row is ever split, and farmed the chunks out to a pool of worker processes, one per core. Per-chunk results are merged at the end.

from multiprocessing import Pool, cpu_count

num_workers = cpu_count()
boundaries = find_chunk_boundaries(filepath, num_workers * 8)
chunk_args = [(filepath, start, end) for start, end in zip(boundaries, boundaries[1:])]   # one task per chunk
with Pool(processes=num_workers) as pool:
    all_stats = pool.map(process_chunk, chunk_args)
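
The newline alignment is the subtle part: each evenly spaced split point is pushed forward to the end of its row so no record is ever cut in half. A sketch of how a find_chunk_boundaries helper could do that, as my reconstruction rather than the agent's exact code:

import os

def find_chunk_boundaries(filepath, n_chunks):
    """Roughly even byte offsets, each advanced to the next newline."""
    size = os.path.getsize(filepath)
    boundaries = [0]
    with open(filepath, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)   # jump to the approximate split point
            f.readline()                   # consume the rest of that row, including its '\n'
            boundaries.append(f.tell())
    boundaries.append(size)
    return boundaries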

4. strip() + index() → partition()

The baseline does line.strip() then line.index(";") — two passes. Sonnet used line.partition(b';') — one pass, station and temperature in a single call.
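
Roughly this difference, shown on the sample row from the setup as bytes, the way the mmap version sees it:

line = b"Hamburg;12.0"

# baseline idea: find the separator, then slice (after a separate strip() pass)
sep = line.index(b";")
station, temp = line[:sep], line[sep + 1:]

# optimized: partition scans the line once and returns both sides
station, _, temp = line.partition(b";")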

Why Haiku couldn't find this

Haiku made safe, local edits. It never stepped back to reconsider the architecture. Sonnet saw the whole picture: the bottleneck isn't any single line, it's the approach. Single-threaded text parsing doesn't scale. The winning move was to throw it out and start from a parallel, binary-aware design.

Q: Does model choice matter more than iteration count?


What's Next

100M rows, 79% faster. The real test is 1B rows — 10x again. Running next.


Updates appear here as experiments run. Subscribe below or follow via RSS.

#engineering #hone #ai