Abstract: Block-based inference engines, powered by noncontiguous key-value (KV) cache management, have emerged as a new paradigm for large language model (LLM) inference due to their efficient memory ...
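To make the idea concrete, the following is a minimal sketch of noncontiguous KV-cache block management in the spirit of PagedAttention-style engines; it is not the API of any particular system, and all names (`KVBlockManager`, `BLOCK_SIZE`, `Sequence`) are hypothetical.

```python
# Minimal sketch (assumed names) of block-based KV-cache management:
# each sequence's logically contiguous tokens are mapped onto physically
# noncontiguous fixed-size blocks, allocated on demand from a shared pool.

from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative value)


@dataclass
class Sequence:
    seq_id: int
    num_tokens: int = 0
    # Block table: logical block index -> physical block id.
    block_table: list[int] = field(default_factory=list)


class KVBlockManager:
    """Hands out physical blocks from a free pool; the per-sequence
    block table provides the logical-to-physical indirection."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids

    def append_token(self, seq: Sequence) -> None:
        # Allocate a new physical block only when the current one is full,
        # so memory is committed on demand rather than preallocated.
        if seq.num_tokens % BLOCK_SIZE == 0:
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; preempt or swap a sequence")
            seq.block_table.append(self.free_blocks.pop())
        seq.num_tokens += 1

    def free(self, seq: Sequence) -> None:
        # Return all of a finished sequence's blocks to the shared pool.
        self.free_blocks.extend(seq.block_table)
        seq.block_table.clear()
        seq.num_tokens = 0


if __name__ == "__main__":
    mgr = KVBlockManager(num_blocks=8)
    seq = Sequence(seq_id=0)
    for _ in range(40):  # decode 40 tokens -> ceil(40 / 16) = 3 blocks
        mgr.append_token(seq)
    print(seq.block_table)       # e.g. [7, 6, 5]: noncontiguous physical blocks
    mgr.free(seq)
    print(len(mgr.free_blocks))  # 8: all blocks reclaimed
```

Under this scheme, internal fragmentation is confined to the unfilled tail of each sequence's last block, which is what makes the memory use efficient relative to preallocating one contiguous region per sequence.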