Live

KVWarden

Tenant fairness on shared inference.

Quiet-tenant TTFT at 1.14× solo, 26× better than FIFO


KVWarden is a scheduler and cache-pressure experiment for shared LLM inference. The first public result is narrow on purpose: a quiet tenant stays near solo latency while a flooder pushes the system. The harness is public; the plots do not hide the quiet tenant in an aggregate.