contributors
Build with us.
The fastest way in is a small reproducible artifact: a trace, a failing case, a benchmark, or a patch. We are two people. There is no Slack, no Discord, no weekly call. Contribution is async and lives on GitHub.
How to start
Pick one of these. Each is specified in enough detail that you can start today without asking us first.
01
Reproduce Gate 2 on your own hardware.
The Gate 2 numbers (53.9 ms solo, 61.5 ms under flooder, 26× better than FIFO) were measured on A100 with vLLM 0.19.1. Re-run the harness on different hardware — H100, L40S, MI300X, even a 4090 — and open a PR with your traces and a one-page note. The harness is at coconut-labs/kvwarden/bench/. Reproductions on hardware we do not own are the most useful contribution we can receive right now.
02
Run the H100 saturation case.
The current H100 result shows modest deltas because the engine did not saturate at 32 RPS. We want a follow-up at higher flooder RPS (128+) or larger tenant count (N=16) on H100 SXM. Estimated cost: ~$3, ~30 min. If you have credit on Lambda, RunPod, or Modal and want to take this on, open an issue named “Gate 2.1b H100 saturation” and we will write up the runbook in the same thread.
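The runbook does not exist yet, so treat the following as an illustrative parameter sketch of the run described above, not a harness API; every key name here is a guess.

```python
# Hypothetical parameter sketch for the "Gate 2.1b H100 saturation" run.
# Key names are illustrative; the real runbook will live in the issue thread.
gate_2_1b = {
    "gpu": "H100 SXM",
    "flooder_rps": 128,   # up from the 32 RPS that did not saturate the engine
    "tenants": 16,        # the N=16 tenant-count variant
    "est_cost_usd": 3,    # ~$3, ~30 min on Lambda / RunPod / Modal credit
}
```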
03
Add a baseline scheduler we have not compared against.
KVWarden today is benchmarked against FIFO and solo. We want comparisons against at least vLLM's native scheduler at higher concurrency, and against any cache-aware baseline you can wire into the harness. The interface is in kvwarden/scheduler/baseline.py. Add a class, run the harness, ship a plot.
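As a starting point, here is a minimal sketch of what a new baseline might look like. We have not pinned down the shape of the interface in `baseline.py` here, so the class and method names below are guesses; adapt them to whatever the file actually defines. The example is a per-tenant round-robin baseline, one of the simplest cache-oblivious alternatives to FIFO.

```python
# Hypothetical sketch of a baseline for kvwarden/scheduler/baseline.py.
# The real interface may differ; treat class and method names as assumptions.
from collections import deque

class RoundRobinBaseline:
    """Cycle across tenants instead of serving in global arrival order."""

    def __init__(self):
        self.queues = {}      # tenant_id -> deque of pending requests
        self.order = deque()  # tenant ids in rotation order

    def enqueue(self, tenant_id, request):
        if tenant_id not in self.queues:
            self.queues[tenant_id] = deque()
            self.order.append(tenant_id)
        self.queues[tenant_id].append(request)

    def next_request(self):
        # Rotate through tenants, skipping any whose queue is empty.
        for _ in range(len(self.order)):
            tenant = self.order[0]
            self.order.rotate(-1)
            if self.queues[tenant]:
                return self.queues[tenant].popleft()
        return None
```

With two tenants enqueued, `next_request()` alternates between them rather than draining one tenant's backlog first, which is exactly the behavior you would plot against FIFO.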
04
Find a failure mode in the fairness claim.
The Gate 2 result is narrow on purpose: one quiet tenant, one flooder, one trace shape. Construct a workload where KVWarden does worse than FIFO — different arrival distributions, adversarial prompt lengths, mixed model sizes. We will publish the counter-example as a research note with co-authorship if it holds up. Adversarial reproductions are at least as valuable to us as confirmatory ones.
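To make the adversarial directions above concrete, here is one way to generate such a workload: heavy-tailed (Pareto) inter-arrival gaps so the "quiet" tenant arrives in bursts, plus bimodal prompt lengths. None of this is harness API; all parameter names and defaults are illustrative.

```python
# Sketch of an adversarial workload generator. All knobs are illustrative
# assumptions, not part of the kvwarden harness.
import random

def bursty_arrivals(n, shape=1.2, scale=0.05, seed=0):
    """Heavy-tailed inter-arrival gaps: bursts separated by long lulls."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += scale * rng.paretovariate(shape)  # Pareto-distributed gap
        times.append(t)
    return times

def bimodal_prompt_lengths(n, short=64, long=8192, p_long=0.1, seed=0):
    """Mostly short prompts, with occasional very long adversarial ones."""
    rng = random.Random(seed)
    return [long if rng.random() < p_long else short for _ in range(n)]
```

Feeding arrival times and prompt lengths like these into the harness trace format is the kind of counter-example construction described above.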
05
Patch the harness.
The harness has rough edges: brittle config loading, no built-in support for streaming output measurement, no per-tenant histograms. Issues tagged `harness` in coconut-labs/kvwarden are real, current, and small enough to land in a weekend.
What we give back
- Commit attribution. Every PR lands with your name on the commit. We do not squash to hide who did the work.
- Co-authorship on substantive contributions. If your work materially shapes a research note, your name goes on the byline. We will negotiate this in the PR thread, not after the fact.
- A contributor list. Your name lands on this page once a PR merges. The list updates from the canonical CONTRIBUTORS file in the relevant repo.
- References and endorsements. If you do good work here and ask, we will write you a real reference for grad school, jobs, or grants.
What we don't do
- Paid contracting. We do not pay for contributions. We are also not paid by anyone for the lab's work. If money is the right shape for what you are offering, we are the wrong door.
- Recruiting outreach. We are not hiring. If we are ever hiring, the page you are reading will say so.
- Sales calls. No demo decks, no discovery calls, no enterprise pilots. If KVWarden does not solve your problem from the README, it probably does not solve it.
What we're not looking for right now
- Productizing KVWarden into a SaaS. The lab is research-first. The middleware is open source and stays that way.
- Staffing the team. Coconut Labs is two people on purpose. Adding a third person is a decision we have not made and will not make casually.
- VC introductions. We are not raising. We will say so on this page if that ever changes.
the actual files
contributors so far
Just us, for now.