<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>hone &#x2014; laxmena</title>
    <link>https://laxmena.com/tag:hone</link>
    <description></description>
    <pubDate>Wed, 15 Apr 2026 01:19:18 +0000</pubDate>
    <image>
      <url>https://i.snap.as/n9575tJN.png</url>
      <title>hone &#x2014; laxmena</title>
      <link>https://laxmena.com/tag:hone</link>
    </image>
    <item>
      <title>Hone vs. The 1 Billion Row Challenge</title>
      <link>https://laxmena.com/hone-vs-the-1-billion-row-challenge?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[1,000,000,000 rows of data. No hand-tuning. Just an agent, a benchmark, and a budget. The 1 Billion Row Challenge: read a file with 1B rows of weather station measurements and compute min/mean/max per station, as fast as possible, using only the Python standard library. I&#39;m not optimizing it by hand; I&#39;m giving it to Hone and letting it figure it out. This is a living document, updated as each run completes. Best result so far: 487.525s baseline down to 90.929s on the full 1B rows (81.4% faster).]]&gt;</description>
      <content:encoded><![CDATA[<p>1,000,000,000 rows of data. No hand-tuning. Just an agent, a benchmark, and a budget.</p>

<p>The <a href="https://github.com/gunnarmorling/1brc">1 Billion Row Challenge</a> is simple on paper: read a file with 1B rows of weather station measurements, compute min/mean/max per station, as fast as possible. In Python, a naive solution takes minutes. The best human-optimized ones use memory-mapped files, multiprocessing, and numpy.</p>

<p>I&#39;m not optimizing it by hand. I&#39;m giving it to <a href="https://github.com/laxmena/hone">Hone</a> — and letting it figure it out.</p>

<p>Hone is now on PyPI. Install it with <code>pip install hone-ai</code>.</p>

<p>This is a living document. I&#39;ll update it as each run completes. Follow the code at <a href="https://github.com/laxmena/hone-1brc">laxmena/hone-1brc</a>.</p>



<hr/>

<h2 id="the-setup">The Setup</h2>

<p><strong>The challenge:</strong> Parse a 1B-row file. Each row: <code>Hamburg;12.0</code>. Compute min/mean/max per station. Print results sorted alphabetically.</p>

<p><strong>The metric:</strong> Wall-clock runtime in seconds. Lower is better.</p>

<p><strong>The constraints:</strong> Python standard library only. No numpy, no pandas, no third-party packages. Correctness must be preserved — output format and values must not change.</p>

<p><strong>The baseline:</strong> Simple. Correct. Slow. One thread, one line at a time, <code>float()</code> on every value.</p>
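<p>A minimal sketch of such a baseline (illustrative, not the repo&#39;s exact code): one thread, text mode, two scans per line, <code>float()</code> on every value.</p>

```python
def baseline(path):
    # station -> [min, max, running_total, count]
    stats = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()            # scan 1 over the line
            i = line.index(";")            # scan 2 over the line
            station, temp = line[:i], float(line[i + 1:])
            s = stats.get(station)
            if s is None:
                stats[station] = [temp, temp, temp, 1]
            else:
                if temp < s[0]:
                    s[0] = temp
                if temp > s[1]:
                    s[1] = temp
                s[2] += temp
                s[3] += 1
    # station -> (min, mean, max), sorted alphabetically
    return {k: (v[0], v[2] / v[3], v[1]) for k, v in sorted(stats.items())}
```

Correct, easy to read, and exactly what the later episodes tear apart.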

<hr/>

<h2 id="results-at-a-glance">Results at a Glance</h2>

<table>
<thead>
<tr>
<th>Run</th>
<th>Model</th>
<th>Dataset</th>
<th>Baseline</th>
<th>Optimized</th>
<th>Improvement</th>
</tr>
</thead>

<tbody>
<tr>
<td>1</td>
<td>Haiku</td>
<td>1M rows</td>
<td>0.546s</td>
<td>0.471s</td>
<td>13.7%</td>
</tr>

<tr>
<td>2</td>
<td>Haiku</td>
<td>100M rows</td>
<td>47.197s</td>
<td>42.739s</td>
<td>9.4%</td>
</tr>

<tr>
<td>3</td>
<td>Sonnet</td>
<td>100M rows</td>
<td>48.104s</td>
<td>10.110s</td>
<td><strong>79%</strong></td>
</tr>

<tr>
<td>4</td>
<td>Sonnet (100M solution, no re-run)</td>
<td>1B rows</td>
<td>487.525s</td>
<td>130.080s</td>
<td>73.3%</td>
</tr>

<tr>
<td>5</td>
<td>Sonnet</td>
<td>1B rows</td>
<td>487.525s</td>
<td>90.929s</td>
<td><strong>81.4%</strong></td>
</tr>
</tbody>
</table>

<hr/>

<h2 id="episode-1-haiku-1m-rows-13-7-faster">Episode 1: Haiku, 1M rows — 13.7% faster</h2>

<p><em>March 25, 2026</em></p>

<p><code>0.546s → 0.471s</code></p>

<p>First run: <code>claude-haiku-4-5</code>, 1M rows, $5 budget, 50 max iterations.</p>

<p>The 13.7% gain looks decent on paper. It isn&#39;t. The absolute numbers are tiny — we&#39;re talking 75 milliseconds. At this scale, Python startup time and OS disk caching dominate. The agent is optimizing noise, not the algorithm. Haiku made incremental tweaks but never found a structural breakthrough.</p>

<p>Wrong dataset size. Move on.</p>

<hr/>

<h2 id="hone-v1-2-0-goal-file">Hone v1.2.0: <code>--goal-file</code></h2>

<p><em>March 25, 2026</em></p>

<p>Episode 1 exposed a friction point. Pasting a long goal string into the terminal every run is error-prone and hard to version. For complex, multi-constraint goals it breaks down fast.</p>

<p>I added <code>--goal-file</code> to Hone — pass it the path to a plain-text file and it reads the goal from there. Same idea as Karpathy&#39;s <code>program.md</code> in autoresearch. The goal now lives alongside the code, versioned in git.</p>

<p>Live in <a href="https://github.com/laxmena/hone/commit/477a83d5050628355bf45ceded3807fea75b8ce6">v1.2.0</a>. <code>pip install --upgrade hone-ai</code>.</p>
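<p>For illustration only, a goal file for this benchmark might look like the following (hypothetical wording; the actual goal text used in these runs isn&#39;t shown in this post). It would be passed with <code>--goal-file goal.txt</code>.</p>

```text
Minimize wall-clock runtime of the solution on the measurements file.
Preserve correctness: output format and values must not change.
Constraints: Python standard library only. No numpy, no pandas,
no third-party packages.
```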

<hr/>

<h2 id="episode-2-haiku-100m-rows-9-4-faster">Episode 2: Haiku, 100M rows — 9.4% faster</h2>

<p><em>March 25, 2026</em></p>

<p><code>47.197s → 42.739s</code></p>

<p>A dataset 100x larger. Now I/O pressure actually matters — 4.5 seconds saved is a real signal.</p>

<p>But Haiku still couldn&#39;t find the structural moves. It made safe, local edits — better buffering, minor parsing cleanup — and never stepped back to reconsider the architecture. No parallelism. No mmap. No integer parsing. It hit its ceiling.</p>

<hr/>

<h2 id="episode-3-sonnet-100m-rows-79-faster">Episode 3: Sonnet, 100M rows — <strong>79% faster</strong></h2>

<p><em>March 25, 2026</em></p>

<p><code>48.104s → 10.110s</code></p>

<p>Same benchmark. Same constraints. One change: <code>claude-haiku-4-5</code> → <code>claude-sonnet-4-6</code>.</p>

<p>38 seconds saved. The agent didn&#39;t tune the baseline — it replaced it.</p>

<h3 id="what-sonnet-actually-did">What Sonnet actually did</h3>

<p><strong>1. Text → Binary reads with <code>mmap</code></strong></p>

<p>The baseline opens the file in text mode and reads line by line. Sonnet switched to binary mode with memory-mapped I/O — the OS maps the file directly into memory, eliminating repeated read syscalls.</p>
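<p>A sketch of that read path (illustrative, using the stdlib <code>mmap</code> module; not the agent&#39;s exact code):</p>

```python
import mmap

def iter_lines(path):
    # Map the file into the process address space. The OS pages bytes
    # in on demand, so iterating lines issues no per-read() syscalls.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            line = mm.readline()
            while line:
                yield line                # raw bytes, newline included
                line = mm.readline()
```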

<p><strong>2. <code>float()</code> → integer arithmetic</strong></p>

<p>Every <code>float()</code> call in the baseline is expensive. Sonnet eliminated them entirely. Temperatures are stored as integers ×10 — <code>12.3</code> becomes <code>123</code>. The decimal point is skipped by knowing its fixed position in the byte string. Division back to float happens only once, at output time. It also pre-built a lookup table for all valid temperature values (<code>-99.9</code> to <code>99.9</code>) to skip even manual parsing on the common case.</p>
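<p>Both ideas in miniature (an illustrative sketch, not the generated code). A reading like <code>12.3</code> becomes the integer <code>123</code>, and a table keyed on the raw bytes handles the common case without any parsing at all:</p>

```python
def parse_tenths(b):
    # b is a temperature as bytes, e.g. b"12.3" or b"-5.7", with exactly
    # one decimal digit. The "." sits at a fixed offset from the end, so
    # skip it instead of calling float(): b"12.3" -> 123, b"-5.7" -> -57.
    if b[0] == 45:                        # ord("-")
        return -(int(b[1:-2]) * 10 + (b[-1] - 48))
    return int(b[:-2]) * 10 + (b[-1] - 48)

# Lookup table for every valid reading in -99.9 .. 99.9: the common
# case becomes a single dict hit on the raw bytes, no parsing at all.
TENTHS = {b"%.1f" % (v / 10): v for v in range(-999, 1000)}
```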

<p><strong>3. Multiprocessing across all CPU cores</strong></p>

<p>The baseline is single-threaded. Sonnet split the file into <code>cpu_count() × 8</code> chunks, aligned each boundary to the next newline to avoid splitting rows, and ran each chunk in a separate process. Results merged at the end.</p>
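<p>The chunking and boundary alignment can be sketched like this (illustrative; the worker processes and the final merge are omitted):</p>

```python
import os

def chunk_ranges(path, n_chunks):
    # Cut the file into roughly equal byte ranges, then push every cut
    # forward to the next newline so no row is split across two chunks.
    size = os.path.getsize(path)
    approx = max(1, size // n_chunks)
    starts = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(min(i * approx, size))
            f.readline()                  # advance past the current row
            pos = f.tell()
            if starts[-1] < pos < size:
                starts.append(pos)
    return list(zip(starts, starts[1:] + [size]))
```

Each <code>(start, end)</code> range would then go to one worker process, with the per-chunk dictionaries merged at the end.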

<p><strong>4. <code>strip()</code> + <code>index()</code> → <code>partition()</code></strong></p>

<p>The baseline does <code>line.strip()</code> then <code>line.index(&#34;;&#34;)</code> — two passes. Sonnet used <code>line.partition(b&#39;;&#39;)</code> — one pass, station and temperature in a single call.</p>
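<p>The difference on a single row (illustrative):</p>

```python
line = b"Hamburg;12.0\n"

# Baseline: two full scans of the line
s = line.strip()                      # scan 1
i = s.index(b";")                     # scan 2
station_a, temp_a = s[:i], s[i + 1:]

# Optimized: one call finds the separator and splits in a single pass;
# only the short temperature tail still needs its newline stripped.
station_b, _, rest = line.partition(b";")
temp_b = rest.strip()

assert (station_a, temp_a) == (station_b, temp_b) == (b"Hamburg", b"12.0")
```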

<h3 id="why-haiku-couldn-t-find-this">Why Haiku couldn&#39;t find this</h3>

<p>Haiku made safe, local edits. It never stepped back to reconsider the architecture. Sonnet saw the whole picture: the bottleneck isn&#39;t any single line, it&#39;s the approach. Single-threaded text parsing doesn&#39;t scale. The winning move was to throw it out and start from a parallel, binary-aware design.</p>

<p><strong>Q: Does model choice matter more than iteration count?</strong></p>

<hr/>

<h2 id="episode-4-sonnet-s-100m-solution-dropped-on-1b-rows-73-3-faster">Episode 4: Sonnet&#39;s 100M solution, dropped on 1B rows — 73.3% faster</h2>

<p><em>April 7, 2026</em></p>

<p><code>487.525s → 130.080s</code></p>

<p>Before spending more API budget, I wanted to answer a simpler question first: does the architecture Sonnet found at 100M rows even generalize? No new Hone run. No new cost. Just the existing solution, run unchanged against the full 1B dataset.</p>

<p>357 seconds saved. The answer is yes — it generalizes. mmap, multiprocessing, and integer arithmetic aren&#39;t tricks tuned to a particular file size. They&#39;re structural. The solution held up.</p>

<p>But 130 seconds also exposed the ceiling. Optimizing against a 10x smaller proxy leaves performance on the table. The solution was good — just not good enough. Time to run Hone against the real target.</p>

<p><a href="https://gist.github.com/laxmena/a55238ce48ab0a3157087b8f345a0775">Source code as Gist here</a></p>

<hr/>

<h2 id="episode-5-sonnet-1b-rows-directly-81-4-faster">Episode 5: Sonnet, 1B rows directly — <strong>81.4% faster</strong></h2>

<p><em>April 7, 2026</em></p>

<p><code>487.525s → 90.929s</code></p>

<p>Same model, same constraints. This time Hone optimized against the full 1B row dataset from the start.</p>

<p>396 seconds saved. Under 91 seconds for a billion rows of Python.</p>

<p>The gains from Episode 4 weren&#39;t wasted — they were the floor. Hone started from a strong architecture and pushed further. 81.4% beats the 79% from Episode 3. More data, better result. The solution isn&#39;t fragile.</p>

<p><a href="https://gist.github.com/laxmena/0cb6ba6c5d8a5e235d245295afa0b9fd">Source code as Gist here</a></p>

<p>The lesson: optimize against the real target. A proxy dataset is useful for iteration speed, but the final run needs to face the actual problem.</p>

<hr/>

<h2 id="what-s-next">What&#39;s Next</h2>

<p>Under 91 seconds on 1B rows. The question now is how much headroom is left — and whether Hone can find it without numpy or third-party packages.</p>

<hr/>

<p><em>Updates appear here as experiments run. Subscribe below or follow via <a href="https://write.as/laxmena/feed/">RSS</a>.</em></p>

<p><a href="https://laxmena.com/tag:engineering" class="hashtag"><span>#</span><span class="p-category">engineering</span></a> <a href="https://laxmena.com/tag:hone" class="hashtag"><span>#</span><span class="p-category">hone</span></a> <a href="https://laxmena.com/tag:ai" class="hashtag"><span>#</span><span class="p-category">ai</span></a></p>


]]></content:encoded>
      <guid>https://laxmena.com/hone-vs-the-1-billion-row-challenge</guid>
      <pubDate>Wed, 25 Mar 2026 04:06:42 +0000</pubDate>
    </item>
  </channel>
</rss>