Local LLM in VS Code
Use Vllama to chat with a local language model directly inside VS Code's native chat panel — no API keys, no subscriptions, no data sent to the cloud.
How It Works
Vllama runs a local LLM as an HTTP server on your machine. The Vllama VS Code extension connects to that server and plugs it into VS Code's native Chat with AI panel. Your model, your hardware, your data.
Your Machine
┌────────────────────────────────────────────┐
│ vllama run_llm (Flask server :2513) │
│ ↕ HTTP │
│ VS Code Vllama Extension │
│ ↕ VS Code Chat API │
│ VS Code Chat panel │
└────────────────────────────────────────────┘
Step 1: Install the VS Code Extension
- Open VS Code
- Press Ctrl+Shift+X (Windows/Linux) or Cmd+Shift+X (macOS)
- Search for Vllama
- Click Install
- Reload VS Code when prompted
Step 2: Start the Local LLM Server
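In your terminal, start the server with the `vllama run_llm` command.
On first run, this downloads the model weights (~1 GB for Qwen 0.5B). Wait until the server reports that it is running on `http://localhost:2513`.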
Keep this terminal open — closing it stops the server.
Step 3: Chat in VS Code
- In VS Code, open the Chat panel: View → Chat with AI (or Ctrl+Alt+I)
- Select your Vllama local model from the model dropdown
- Start chatting
Recommended Models by Use Case
| Goal | Model | Download Size |
|---|---|---|
| Code help, fast | Qwen/Qwen2.5-Coder-0.5B-Instruct | ~1 GB |
| General assistant (low RAM) | microsoft/DialoGPT-medium | ~1.5 GB |
| Better quality (needs 8 GB+ RAM) | meta-llama/Llama-2-7b-chat-hf | ~14 GB |
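Before the first run, it can help to confirm that the download will fit on disk. The following is a minimal sketch, not part of Vllama: the sizes are copied from the table above, while the helper name `models_that_fit`, the 1 GB headroom, and the choice to measure free space on the home directory's drive are assumptions made for illustration.

```python
import shutil
from pathlib import Path

# Approximate download sizes in GB, taken from the table above.
DOWNLOAD_SIZES_GB = {
    "Qwen/Qwen2.5-Coder-0.5B-Instruct": 1.0,
    "microsoft/DialoGPT-medium": 1.5,
    "meta-llama/Llama-2-7b-chat-hf": 14.0,
}


def models_that_fit(free_gb: float, headroom_gb: float = 1.0) -> list[str]:
    """Return the models whose download size plus headroom fits in free_gb.

    headroom_gb is an arbitrary safety margin (an assumption, not a Vllama setting).
    """
    return [
        model
        for model, size_gb in DOWNLOAD_SIZES_GB.items()
        if size_gb + headroom_gb <= free_gb
    ]


if __name__ == "__main__":
    # Assumption: the weights land on the same drive as the home directory.
    free_gb = shutil.disk_usage(Path.home()).free / 1e9
    print(f"Free disk space: {free_gb:.1f} GB")
    for model in models_that_fit(free_gb):
        print(f"  fits: {model}")
```

If the weights are stored on a different drive, pass that location to `shutil.disk_usage` instead.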
Troubleshooting
"VS Code extension can't connect"
- Make sure `vllama run_llm` is running and shows `Running on http://localhost:2513`
- Check the VS Code extension settings; the default port is 2513
- On Linux, check whether a firewall is blocking localhost connections: `sudo ufw status`
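To separate a server problem from an extension problem, you can test the port directly. This is a minimal sketch using only the Python standard library. It assumes the default host and port mentioned above (`localhost:2513`), and the function name `server_is_reachable` is made up for this example. It checks only that something accepts TCP connections on the port, not that the Vllama API itself is answering correctly.

```python
import socket


def server_is_reachable(host: str = "localhost", port: int = 2513,
                        timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if server_is_reachable():
        print("Port 2513 is accepting connections; check the extension settings next.")
    else:
        print("Nothing is listening on port 2513; is the server terminal still open?")
```

If this reports that the port is open but the extension still cannot connect, the port configured in the extension settings is the next thing to compare.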
Model responses are slow
Smaller models (0.5B–1B parameters) typically respond in seconds even on CPU. Larger models (7B+) are slow on CPU and are best run on a GPU. For fast responses, try Qwen/Qwen2.5-Coder-0.5B-Instruct.
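Much of the speed difference comes down to how many bytes of weights have to be held in memory and read for every generated token. As a rough back-of-the-envelope sketch, assuming the weights are stored at 16-bit precision (2 bytes per parameter, an assumption about the checkpoint format rather than a documented Vllama setting), weight size scales linearly with parameter count:

```python
def estimate_weights_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 corresponds to 16-bit weights (assumed here);
    use 4 for 32-bit weights or 1 for 8-bit quantized weights.
    """
    total_bytes = params_billions * 1e9 * bytes_per_param
    return total_bytes / 1e9


if __name__ == "__main__":
    for name, params in [("0.5B model", 0.5), ("7B model", 7.0)]:
        print(f"{name}: ~{estimate_weights_gb(params):.1f} GB of weights")
```

Under that assumption a 0.5B model comes to about 1 GB and a 7B model to about 14 GB. Actual memory use is higher once activations and runtime overhead are included, and lower if the model is loaded at reduced precision.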
"Model not found"
Verify that the model ID exists on huggingface.co. Gated models (such as Llama 2) also require a Hugging Face access token.
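One way to check whether a token is visible to the server process is to look at the environment it was started from. The sketch below assumes the model download goes through the `huggingface_hub` library, which reads the `HF_TOKEN` environment variable (and the older `HUGGING_FACE_HUB_TOKEN`). Whether Vllama relies on that mechanism is an assumption, and the helper name `find_hf_token` is invented for this example.

```python
import os


def find_hf_token(env=None):
    """Return the first non-empty Hugging Face token found in env, or None.

    env defaults to os.environ; a plain dict can be passed in for testing.
    """
    env = os.environ if env is None else env
    # HF_TOKEN is the current variable name; the second is the legacy one.
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


if __name__ == "__main__":
    # Report only whether a token exists; never print the token itself.
    if find_hf_token() is None:
        print("No Hugging Face token found in the environment.")
    else:
        print("A Hugging Face token is set in the environment.")
```

Run it in the same terminal you use to start the server, since environment variables are per-shell. You also need to have accepted the model's license on its Hugging Face page before a token grants access.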
Also: Chat from the Terminal
You don't need VS Code: while the server is running, you can also chat directly from a second terminal window.
Both the CLI chat and VS Code extension can connect to the same server simultaneously.