vLLM - WednesdayAI

vLLM is a high-throughput inference server for open-weight models. OpenClaw connects to vLLM via its OpenAI-compatible API.

Prerequisites

A running vLLM server with its OpenAI-compatible API enabled
The server must be reachable from the OpenClaw gateway host

Activation

vLLM auto-activates when VLLM_API_KEY is set in the environment. The default base URL is http://127.0.0.1:8000/v1.

Run openclaw onboard to configure vLLM interactively — it generates the correct models.providers.vllm config block for you.

Configuration

models:
  providers:
    vllm:
      baseUrl: "http://127.0.0.1:8000/v1"
      api: "openai-completions"
      apiKey: "VLLM_API_KEY"   # env var name, not the literal key
      models:
        - id: "meta-llama/Llama-3.1-8B-Instruct"
          name: "Llama 3.1 8B"
          contextWindow: 128000
          maxTokens: 4096
          input: 0
          cost: 0

Key	Type	Description
`baseUrl`	string	Base URL of the vLLM OpenAI-compatible endpoint
`api`	string	API style — `openai-completions` for vLLM
`apiKey`	string	Name of the env var holding the API key
`models[].id`	string	Model ID as loaded by your vLLM server
`models[].contextWindow`	integer	Context window size in tokens
`models[].maxTokens`	integer	Max output tokens
`models[].input` / `cost`	number	Cost per token (use `0` for self-hosted)

Running vLLM

docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct

Verify the connection

openclaw models status --probe-provider vllm

vLLM does not expose a model list endpoint by default. The --probe flag sends a test completion to verify connectivity.

Amazon Bedrock Mistral AI

​Prerequisites

​Activation

​Configuration

​Running vLLM

​Verify the connection

Prerequisites

Activation

Configuration

Running vLLM

Verify the connection