A KVCache-centric disaggregated architecture for LLM serving. Won Best Paper at FAST 2025.

Outputs (2)

Mooncake: KVCache-centric Disaggregated Architecture

paper

arXiv: 2407.00079

Venue: FAST 2025

Mooncake Dataset & Code

dataset

GitHub Repository

infrastructure, efficiency