KVWarden is a scheduler and cache-pressure experiment for shared LLM inference. Its first public result is deliberately narrow: a quiet tenant stays near its solo latency while a flooder puts the shared system under heavy load. The harness is public, and the plots report the quiet tenant on its own rather than hiding it in an aggregate.
Live
KVWarden
Tenant fairness on shared inference.
Quiet tenant: 1.14× of solo TTFT, 26× better than FIFO
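Why a flooder hurts a quiet tenant under FIFO, and why per-tenant scheduling helps, can be illustrated with a toy discrete-event simulation. This is not KVWarden's scheduler or harness. It assumes a single worker, a fixed prefill time per request, TTFT approximated as the time from arrival to the end of that request's prefill, and a hypothetical "least-served tenant first" policy standing in for fair scheduling. `simulate`, `pick_fifo`, and `pick_fair` are illustrative names.

```python
def pick_fifo(waiting, served):
    # FIFO: earliest arrival goes first, regardless of which tenant sent it.
    return min(range(len(waiting)), key=lambda i: waiting[i][0])


def pick_fair(waiting, served):
    # Hypothetical fair policy: the tenant served least so far goes first;
    # ties are broken by arrival time.
    return min(
        range(len(waiting)),
        key=lambda i: (served.get(waiting[i][1], 0), waiting[i][0]),
    )


def simulate(requests, pick, service_time=1.0):
    """requests: list of (arrival_time, tenant). Returns {tenant: [ttft, ...]}."""
    pending = sorted(requests)  # ordered by arrival time
    waiting, served, ttft = [], {}, {}
    clock, i = 0.0, 0
    while i < len(pending) or waiting:
        # Admit every request that has arrived by the current time.
        while i < len(pending) and pending[i][0] <= clock:
            waiting.append(pending[i])
            i += 1
        if not waiting:
            clock = pending[i][0]  # worker idles until the next arrival
            continue
        arrival, tenant = waiting.pop(pick(waiting, served))
        clock += service_time  # first token is emitted when prefill finishes
        served[tenant] = served.get(tenant, 0) + 1
        ttft.setdefault(tenant, []).append(clock - arrival)
    return ttft


if __name__ == "__main__":
    flood = [(0.0, "flood")] * 20  # flooder submits 20 requests at t=0
    quiet = [(0.5, "quiet")]       # quiet tenant submits one request at t=0.5
    solo = simulate(quiet, pick_fifo)["quiet"][0]
    for name, pick in [("FIFO", pick_fifo), ("fair", pick_fair)]:
        q = simulate(flood + quiet, pick)["quiet"][0]
        print(f"{name}: quiet TTFT {q:.1f} ({q / solo:.2f}x of solo)")
```

In this toy setup FIFO places the quiet request behind all 20 flooder requests (20.5× of solo), while the least-served policy runs it right after the request already in progress (1.5× of solo). These ratios come from the toy assumptions only and are not KVWarden's measurements.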