Fluid Attacks Database

Description

vLLM Vulnerable to Remote DoS via Special-Token Placeholders

Summary

This report describes a token-injection vulnerability in vLLM’s multimodal processing. An unauthenticated, text-only prompt that spells out special tokens is treated as control tokens rather than as plain text. When image or video placeholder sequences arrive without matching multimodal data, vLLM indexes into empty grids during input-position computation, raising an unhandled IndexError that terminates the worker or degrades availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. Severity: High (remote DoS). Reproduced on vLLM 0.10.0 with Qwen2.5-VL.

Details

Affected component: multimodal input position computation.

File/functions (paths are indicative):

vllm/model_executor/layers/rotary_embedding.py

get_input_positions_tensor(...)

_vl_get_input_positions_tensor(...)

Failure mechanism:

The code counts detected vision tokens and then indexes video_grid_thw/image_grid_thw accordingly.

When user input carries placeholder tokens but no actual multimodal payload, these grids are empty. The code does not bounds-check before indexing.
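The failure mechanism above can be illustrated with a minimal, self-contained sketch. It is not vLLM's code: the helper names, the IMAGE_TOKEN_ID value, and the shape of the grid list are all hypothetical and chosen only to show how counting placeholders and then indexing an empty grid list leads to an IndexError, and how a length check ahead of the loop would turn that into a controlled rejection.

```python
from typing import List, Sequence, Tuple

# Hypothetical placeholder token id; the real vocabulary value is model-specific.
IMAGE_TOKEN_ID = 1001

Grid = Tuple[int, int, int]  # (t, h, w), mirroring the "thw" naming


def grids_per_placeholder(input_tokens: Sequence[int],
                          image_grid_thw: Sequence[Grid]) -> List[Grid]:
    """Unguarded pattern: one grid lookup per detected placeholder token."""
    grids = []
    image_index = 0
    for token in input_tokens:
        if token == IMAGE_TOKEN_ID:
            # Raises IndexError when no multimodal payload was supplied.
            grids.append(image_grid_thw[image_index])
            image_index += 1
    return grids


def grids_per_placeholder_checked(input_tokens: Sequence[int],
                                  image_grid_thw: Sequence[Grid]) -> List[Grid]:
    """Guarded variant: compare counts before indexing."""
    placeholders = sum(1 for token in input_tokens if token == IMAGE_TOKEN_ID)
    if placeholders > len(image_grid_thw):
        raise ValueError(
            f"{placeholders} image placeholder(s) but "
            f"{len(image_grid_thw)} grid entr(ies) supplied"
        )
    return grids_per_placeholder(input_tokens, image_grid_thw)
```

Calling grids_per_placeholder([7, IMAGE_TOKEN_ID, 8], []) raises IndexError, while the checked variant raises a ValueError that a caller could map to a client error instead of letting it propagate into the worker.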

Representative snippet (context):

# vllm/model_executor/layers/rotary_embedding.py
@classmethod
def _vl_get_input_positions_tensor(
    cls,
    input_tokens,
    hf_config,
    image_grid_thw,
    video_grid_thw,...

Abbreviated call path:

OpenAI API request
 → vllm.v1.engine.core: step/execute_model
 → vllm.v1.worker.gpu_model_runner: _update_states/execute_model
 → vllm.model_executor.layers.rotary_embedding: get_input_positions_tensor
 → _vl_get_input_positions_tensor
 → IndexError: list index out of range

PoC

Environment

vLLM: 0.10.0

Model: Qwen/Qwen2.5-VL-3B-Instruct

Launch server:

python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-VL-3B-Instruct \
  --port 8000

Request (text-only, no image/video data)

cat > request.json <<'JSON'
{
  "model": "Qwen/Qwen2.5-VL-3B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text",...

Observed result

HTTP 500; logs show IndexError: list index out of range from _vl_get_input_positions_tensor(...).

In some deployments, the worker exits and capacity remains reduced until manual restart.

Impact

Type: Token Injection leading to Remote Denial of Service (unauthenticated). A single request can trigger the fault.

Scope: Any vLLM deployment that serves VLMs and accepts raw user text via OpenAI-compatible endpoints (self-hosted or proxied/managed fronts).

Effect: Request → unhandled exception in position computation → worker termination / service unavailability.
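Because the fault is triggered by raw user text, one way to reason about exposure is a request-level check placed in front of the server. The sketch below is an assumption-laden illustration, not part of the upstream fix: the marker strings are modeled on Qwen-style special tokens and are not taken from this advisory, the "video_url" part type is assumed, and has_unbacked_placeholders is a hypothetical helper name.

```python
from typing import Any, Dict, List

# Assumed marker strings (illustrative only; verify against the served
# model's tokenizer before relying on them).
PLACEHOLDER_MARKERS = ("<|image_pad|>", "<|video_pad|>")

# Content part types treated as carrying real multimodal data (assumed).
MEDIA_PART_TYPES = {"image_url", "video_url"}


def has_unbacked_placeholders(messages: List[Dict[str, Any]]) -> bool:
    """Return True when text spells placeholder markers but the request
    carries no image or video parts."""
    marker_hits = 0
    media_parts = 0
    for message in messages:
        content = message.get("content")
        # Chat content may be a plain string or a list of typed parts.
        if isinstance(content, str):
            parts = [{"type": "text", "text": content}]
        else:
            parts = content or []
        for part in parts:
            part_type = part.get("type")
            if part_type == "text":
                text = part.get("text", "")
                marker_hits += sum(text.count(m) for m in PLACEHOLDER_MARKERS)
            elif part_type in MEDIA_PART_TYPES:
                media_parts += 1
    return marker_hits > 0 and media_parts == 0
```

A gateway could reject requests for which this returns True. String matching of this kind is only a stopgap under the stated assumptions; updating to the patched version listed under Mitigation remains the documented remedy.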

Fixes

Changes associated with https://github.com/vllm-project/vllm/issues/32656

Credits

Pengyu Ding (Infra Security, Ant Group)
Ziteng Xu (Infra Security, Ant Group)

Mitigation

Update Impact

Minimal update. May introduce new vulnerabilities or breaking changes.

Ecosystem	Package	Affected version	Patched versions
pypi	vllm	>=0.6.1 <0.20.0	0.20.0

Aliases

1. CVE-2026-44222
2. NVD-2026-44222
3. OSV-2026-44222
4. GHSA-hpv8-x276-m59f
5. GCVE-2026-44222

References

1. https://github.com/vllm-project/vllm/security/advisories/GHSA-hpv8-x276-m59f
2. https://github.com/vllm-project/vllm/issues/32656