Inference: Attention Optimizations
Attention optimizations for inference: Multi-Query Attention, PagedAttention, Hybrid Attention (local + global), Ring Attention, Infini-attention, Block Transformer, Longformer attention, and RadixAttention.
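
As a rough illustration of why Multi-Query Attention helps at inference time, the sketch below estimates key/value cache size when all query heads share a single K/V head versus standard multi-head attention. The function name `kv_cache_bytes` and the model dimensions (32 layers, 32 heads, head size 128, 4096-token context, fp16) are hypothetical values chosen for the example, not figures from any particular model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes.

    Each layer stores one key tensor and one value tensor (hence the
    factor 2), each of shape [batch, num_kv_heads, seq_len, head_dim].
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


if __name__ == "__main__":
    # Hypothetical model shape, for illustration only.
    layers, heads, head_dim, seq_len = 32, 32, 128, 4096

    # Multi-head attention: one K/V head per query head.
    mha = kv_cache_bytes(layers, heads, head_dim, seq_len)
    # Multi-query attention: a single K/V head shared by all query heads.
    mqa = kv_cache_bytes(layers, 1, head_dim, seq_len)

    print(f"MHA cache: {mha / 2**20:.0f} MiB")
    print(f"MQA cache: {mqa / 2**20:.0f} MiB")
    print(f"Reduction: {mha // mqa}x")
```

Under these assumptions the cache shrinks by a factor equal to the number of query heads. The other techniques in the list target different costs (for example, PagedAttention addresses how cache memory is allocated rather than how much each token needs), so this estimate covers only the Multi-Query Attention case.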