HILLCLIMBER
An open-source /goal alternative.
Auto-improve your code. Define your goal, budget, and models—hillclimber orchestrates, executes, and monitors the work.
Open-source and harness-agnostic.
Write a spec file
path_to_artefact = "my_repo/src/"
strategy = "chain"

# Explicit target to climb
[goal]
direction = "maximize"
target = 1.0

# Hard stop: max cycles, tokens or money.
[budget]
cycles = 5

# Eval function
[scorer]
kind = "command"
cmd = "python eval.py"

# kind = "none" to switch the sandbox off.
[sandbox]
kind = "seatbelt"

# Proposes the next hypothesis for improving the artefact.
[agents.orchestrator]
harness = "claude"
model = "claude-opus-4-8"

# Applies the proposed change to the artefact.
[agents.worker]
harness = "claude"
model = "claude-opus-4-8"
The spec file defines the core of the long-running experiment. Define your goal, budget, and eval function to measure the improvement rate.
To generate the spec and eval files, execute hillclimber init.
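For orientation, here is a minimal sketch of what an eval file behind cmd = "python eval.py" could look like. It is not the file that hillclimber init generates. The accuracy helper, the placeholder data, and the convention of printing a single number to stdout are all assumptions made for illustration, so check the generated eval file for the real contract between hillclimber and the scorer command.

```python
"""Hypothetical eval.py for a command scorer (illustrative sketch only)."""


def accuracy(predictions, expected):
    """Return the fraction of predictions that exactly match the expected values."""
    if not expected:
        raise ValueError("expected must not be empty")
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected differ in length")
    hits = sum(p == e for p, e in zip(predictions, expected))
    return hits / len(expected)


def main():
    # Placeholder data: a real eval would run the artefact on a held-out set.
    expected = ["2024-01-05", "2023-11-30", "2022-07-14", "2021-03-09"]
    predictions = ["2024-01-05", "2023-11-30", "2022-07-14", "unknown"]
    score = accuracy(predictions, expected)
    # Assumption: the scorer reads one number from stdout.
    print(f"{score:.3f}")


if __name__ == "__main__":
    main()
```

A score in the range 0 to 1 pairs naturally with direction = "maximize" and target = 1.0 in the spec above. Keeping the eval deterministic makes score differences between cycles easier to trust.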
Run hillclimber
$ hillclimber run

# preflight — score the untouched artefact, check models
✓ baseline 0.712
✓ models verified
✓ strategy: chain

# each cycle: propose → apply → score, keep what climbs
◆ cycle 001: Strip markup before matching field boundaries
▴ cycle 001 scored 0.781 (+0.069)
◆ cycle 002: Fuzzy-match malformed date fields
▾ cycle 002 scored 0.774 (-0.007)
◆ cycle 003: Normalize unicode before matching fields

# live status — redraws in place, gone when the run ends
⠹ cycle 3/5 — applying the hypothesis 12:47
baseline 0.712 · best 0.781
│ Read(file_path='src/extract.py')
│ Edit(file_path='src/extract.py', old_string=…)
│ tool returned: ok
Hillclimber reads your spec and orchestrates the experiment. Each cycle runs in an isolated git worktree, with a dedicated coding agent and a tight feedback loop.
To start climbing, execute hillclimber run.
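The propose, apply, score cycle described above can be sketched as a generic greedy loop. This is an illustration of the idea, not hillclimber's implementation: the climb function, the Run record, and the callback signatures are hypothetical, and worktree isolation, agents, sandboxing, and token or money budgets are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Bookkeeping for one experiment (hypothetical structure)."""
    baseline: float
    best: float
    history: list = field(default_factory=list)


def climb(artefact, propose, apply, score, cycles, target):
    """Greedy hill climb: keep a change only if it beats the best score so far."""
    baseline = score(artefact)  # preflight: score the untouched artefact
    run = Run(baseline=baseline, best=baseline)
    for cycle in range(1, cycles + 1):
        if run.best >= target:  # goal reached, stop before spending more budget
            break
        hypothesis = propose(artefact, run.history)
        candidate = apply(artefact, hypothesis)
        candidate_score = score(candidate)
        kept = candidate_score > run.best
        run.history.append((cycle, hypothesis, candidate_score, kept))
        if kept:
            artefact, run.best = candidate, candidate_score
    return artefact, run


# Toy stand-ins: the "artefact" is a number and the score peaks at 1.0 when x == 3.
steps = iter([1.0, -0.5, 2.0, 0.5, 0.25])
best, run = climb(
    artefact=0.0,
    propose=lambda artefact, history: next(steps),
    apply=lambda artefact, step: artefact + step,
    score=lambda x: 1.0 - (x - 3.0) ** 2 / 9.0,
    cycles=5,
    target=1.0,
)
print(best, run.best, [kept for *_, kept in run.history])
```

In this toy run the second step lowers the score and is discarded, and the loop stops early once the target is reached, which mirrors the roles of the [goal] and [budget] sections of the spec.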
# Models are great at iteratively improving performance. But without explicit constraints and goals, you risk burning tokens and losing control.
# I built hillclimber to do two things:
1. Force you to be explicit upfront — what you want, and how much you're willing to spend.
2. Leave you free to choose any model provider you like.
You're in control
Explicitly set the goal, budget, and models.
Free & open-source
It's completely free to use, and you are more than welcome to modify the source code however you like.
Extendable by design
The architecture supports adding new strategies, harnesses, and sandboxes. Work with what suits you best.
Durable execution
If the agent crashes, you can always run hillclimber continue to resume where you left off.
Use with your harness
Let your harness do all the work and use hillclimber only as the experiment orchestrator.