Hone vs. The 1 Billion Row Challenge

1,000,000,000 rows of data. No hand-tuning. Just an agent, a benchmark, and a budget.

The 1 Billion Row Challenge is simple on paper: read a file with 1B rows of weather station measurements, compute min/mean/max per station, as fast as possible. In Python, a naive solution takes minutes. The best human-optimized ones use memory-mapped files, multiprocessing, and numpy.

I'm not optimizing it by hand. I'm giving it to Hone — and letting it figure it out.

Hone is now on PyPI. Install it with pip install hone-ai.

This is a living document. I'll update it as each run completes. Follow the code at laxmena/hone-1brc.


The Setup

The challenge: Parse a 1B-row file. Each row: Hamburg;12.0. Compute min/mean/max per station. Print results sorted alphabetically.

The metric: Wall-clock runtime in seconds. Lower is better.

The constraints: Python standard library only. No numpy, no pandas, no third-party packages. Correctness must be preserved — output format and values must not change.

The rules:
1. Start with the most naive Python implementation possible
2. Feed it to Hone with runtime as the only objective
3. No hints. No hand-holding.
4. Document every move the agent makes
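To make the setup reproducible, here's a sketch of how a measurements file in this format can be generated. The station list and temperature range below are illustrative, not the official challenge generator's:

```python
import os
import random

# Illustrative station list; the official challenge generator uses a much
# larger fixed set of real station names.
STATIONS = ["Hamburg", "Bulawayo", "Palembang", "St. John's", "Cracow"]

def generate(path, n_rows, seed=42):
    """Write n_rows lines of 'StationName;Temperature' to path."""
    rng = random.Random(seed)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        for _ in range(n_rows):
            f.write(f"{rng.choice(STATIONS)};{rng.uniform(-99.9, 99.9):.1f}\n")

generate("data/measurements_1M.txt", 1_000_000)
```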


Episode 1: Baseline + Run 1 (1M rows)

The baseline is intentionally blunt. Open the file, read line by line, split on ;, accumulate min/mean/max into a dict, print sorted results. No buffering tricks. No compiled regex. Just Python doing Python things.
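The baseline described above has roughly this shape. This is my reconstruction, not the repo's exact solution.py, and the output formatting is illustrative:

```python
import sys

def main(path):
    # station -> [min, total, count, max]
    stats = {}
    with open(path) as f:
        for line in f:
            station, temp_str = line.rstrip("\n").split(";")
            temp = float(temp_str)
            s = stats.get(station)
            if s is None:
                stats[station] = [temp, temp, 1, temp]
            else:
                if temp < s[0]: s[0] = temp
                if temp > s[3]: s[3] = temp
                s[1] += temp
                s[2] += 1
    for station in sorted(stats):
        mn, total, count, mx = stats[station]
        print(f"{station}={mn:.1f}/{total / count:.1f}/{mx:.1f}")

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```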

Hone launched with this command:

hone "Optimize solution.py to minimize wall-clock execution time when processing \
a large measurements file. The program reads lines in the format \
'StationName;Temperature', computes min, mean, and max temperature per station, \
and prints results sorted alphabetically. Optimizations must use Python standard \
library only — no third-party packages. Correctness must be preserved: output \
format and values must remain unchanged. Focus on I/O throughput, parsing speed, \
and efficient aggregation." \
     --bench "python benchmark.py data/measurements_1M.txt" \
     --files "solution.py" \
     --optimize lower \
     --score-pattern "Time Taken:\s*(\d+\.\d+)" \
     --budget 5.0 \
     --max-iter 50 \
     --model claude-haiku-4-5

Budget: $5. Max iterations: 50. Model: claude-haiku-4-5 — deliberately a smaller, faster model. The interesting question isn't whether a frontier model can optimize Python. It's whether a cheap model, given enough iterations, can find the same insights a senior engineer would.
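The post doesn't show benchmark.py, but a minimal harness that satisfies the --score-pattern regex above could look like this (the structure is my assumption; all it must do is print a line matching "Time Taken: <seconds>"):

```python
import subprocess
import sys
import time

def bench(data_file):
    # Run the solution as a subprocess and report wall-clock time in the
    # exact format Hone's --score-pattern expects: "Time Taken: <seconds>"
    start = time.perf_counter()
    subprocess.run([sys.executable, "solution.py", data_file], check=True)
    elapsed = time.perf_counter() - start
    print(f"Time Taken: {elapsed:.3f}")

if __name__ == "__main__" and len(sys.argv) > 1:
    bench(sys.argv[1])
```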

Results (1M rows — lightweight test): The 1M row file is small enough that most optimization attempts didn't move the needle significantly. The agent found some wins, but the gains were modest. At this scale, Python startup overhead and OS caching effects dominate. The real test is larger files.
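For a sense of what "larger files" changes: the class of stdlib-only rewrite that starts paying off at scale is things like reading large byte chunks instead of decoded lines. This is an illustration of the idea, not the agent's actual output:

```python
def aggregate_chunked(path, chunk_size=1 << 20):
    """Min/total/count/max per station, reading 1 MiB byte chunks
    instead of iterating decoded lines, to cut per-line overhead."""
    stats = {}

    def update(line):
        station, temp_str = line.split(b";")
        temp = float(temp_str)
        s = stats.get(station)
        if s is None:
            stats[station] = [temp, temp, 1, temp]
        else:
            if temp < s[0]: s[0] = temp
            if temp > s[3]: s[3] = temp
            s[1] += temp
            s[2] += 1

    with open(path, "rb") as f:
        leftover = b""
        while chunk := f.read(chunk_size):
            chunk = leftover + chunk
            # Split off the partial line at the end and carry it forward.
            nl = chunk.rfind(b"\n")
            leftover = chunk[nl + 1:]
            for line in chunk[:nl + 1].splitlines():
                update(line)
        if leftover:  # handle a file that doesn't end with a newline
            update(leftover)
    return stats
```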


Hone v1.2.0: --goal-file

Running Episode 1 exposed a real friction point: pasting a long goal string into the terminal every run is tedious and error-prone. It's fine for short goals. For complex, multi-constraint optimization tasks, it breaks down fast.

I extended Hone to accept a --goal-file flag — pass a path to a plain text file, and Hone reads the goal from there. Same idea as Karpathy's program.md in autoresearch. Keep your goal versioned alongside your code.
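For this project, goal.txt simply holds the goal text that was previously inlined in the Episode 1 command:

```
Optimize solution.py to minimize wall-clock execution time when processing
a large measurements file. The program reads lines in the format
'StationName;Temperature', computes min, mean, and max temperature per station,
and prints results sorted alphabetically. Optimizations must use Python standard
library only — no third-party packages. Correctness must be preserved: output
format and values must remain unchanged. Focus on I/O throughput, parsing speed,
and efficient aggregation.
```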

hone --goal-file goal.txt \
     --bench "python benchmark.py data/measurements_1M.txt" \
     --files "solution.py" \
     --optimize lower \
     --score-pattern "Time Taken:\s*(\d+\.\d+)" \
     --budget 5.0 \
     --max-iter 50 \
     --model claude-haiku-4-5

The change is live in v1.2.0 and published to PyPI. pip install --upgrade hone-ai to get it.


Episode 2: Run 2 (100M rows, with --goal-file)

100x more data. Episode 1 ran on 1M rows — too small for meaningful signal. Episode 2 steps up to 100M rows, where I/O pressure and parsing overhead actually matter.

The goal string now lives in program.md, versioned with the code.

hone \
     --goal-file program.md \
     --bench "python benchmark.py data/measurements_100M.txt" \
     --files "solution.py" \
     --optimize lower \
     --score-pattern "Time Taken:\s*(\d+\.\d+)" \
     --budget 3.0 \
     --max-iter 50 \
     --model claude-haiku-4-5

Budget: $3. Same model, same constraints. I kicked this off before bed — Hone is running through the night while I sleep. Results tomorrow.


Updates appear here as experiments run. Subscribe below or follow via RSS.

#engineering #hone #llm