vllm.v1.worker.gpu.spec_decode.eagle.cudagraph
DecodeEagleCudaGraphManager
Bases: CudaGraphManager
Eagle CudaGraphManager for decode draft generation, building its own attention metadata from scratch.
Source code in vllm/v1/worker/gpu/spec_decode/eagle/cudagraph.py
PrefillEagleCudaGraphManager
Bases: CudaGraphManager
Eagle CudaGraphManager for prefill, using pre-built attention states from the target model's capture.
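The two docstrings differ mainly in where the attention metadata comes from: the decode manager builds it itself, while the prefill manager takes states already built during the target model's capture. The toy sketch below illustrates only that contrast in plain Python. Every name in it (`ToyAttnMetadata`, `ToyGraphManager`, `get_metadata`, `capture`, `replay`) is hypothetical and is not part of vLLM's API; real CUDA graph capture is replaced by storing a closure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAttnMetadata:
    """Hypothetical stand-in for attention metadata (not a vLLM type)."""
    num_tokens: int
    built_by: str


class ToyGraphManager:
    """Hypothetical base class: 'captures' one closure per batch size."""

    def __init__(self):
        self.graphs = {}

    def get_metadata(self, num_tokens, prebuilt=None):
        raise NotImplementedError

    def capture(self, num_tokens, prebuilt=None):
        meta = self.get_metadata(num_tokens, prebuilt)
        # A real manager would record GPU work here; we just store a closure.
        self.graphs[num_tokens] = lambda: meta

    def replay(self, num_tokens):
        return self.graphs[num_tokens]()


class ToyDecodeManager(ToyGraphManager):
    """Builds its own metadata from scratch, like the decode docstring says."""

    def get_metadata(self, num_tokens, prebuilt=None):
        return ToyAttnMetadata(num_tokens, built_by="draft")


class ToyPrefillManager(ToyGraphManager):
    """Uses states pre-built elsewhere, like the prefill docstring says."""

    def get_metadata(self, num_tokens, prebuilt=None):
        if prebuilt is None:
            raise ValueError("prefill sketch needs pre-built attention states")
        return prebuilt[num_tokens]


# Pretend the target model's capture already produced these states.
target_states = {4: ToyAttnMetadata(4, built_by="target")}

decode = ToyDecodeManager()
decode.capture(4)

prefill = ToyPrefillManager()
prefill.capture(4, prebuilt=target_states)
```

In this sketch, `decode.replay(4)` returns metadata tagged as built by the draft side, while `prefill.replay(4)` returns the exact object taken from `target_states`.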