Installation¶
Requirements¶
- Python 3.8 or higher
- pip
That's all you need to get started. Heavy dependencies like PyTorch and Diffusers are pulled in on first use, not at install time.
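To confirm both requirements before installing, the interpreter itself can be asked. The following is a minimal sketch; the helper names `meets_minimum` and `pip_is_importable` are made up for illustration and are not part of vllama.

```python
import importlib.util
import sys

MINIMUM_PYTHON = (3, 8)  # taken from the requirements list above


def meets_minimum(version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor, ...) tuple is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def pip_is_importable():
    """Return True if the pip package can be found by this interpreter."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    print("Python version OK:", meets_minimum(sys.version_info))
    print("pip available:", pip_is_importable())
```

Run it with the same `python` executable you plan to use for `pip install`, since different interpreters on one machine can give different answers.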
Install from PyPI¶
pip install vllama
Verify the install:
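One way to confirm that the package landed in the current environment, independent of any CLI flags, is to query Python's package metadata. This is a sketch that assumes the installed distribution name matches the PyPI name `vllama`; `installed_version` is an illustrative helper, not a vllama function.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(distribution="vllama"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("vllama is not installed in this environment")
    else:
        print(f"vllama {version} is installed")
```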
System-specific Notes¶
Installation works normally on Windows. For speech commands (vllama stt), you may need to install PortAudio separately.
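To see whether a system-wide PortAudio library is already discoverable, the standard library's `ctypes.util.find_library` can be used. This is only a rough check: which audio package vllama relies on is not stated here, and some Python audio wheels bundle their own copy of PortAudio, so a result of `None` is not conclusive. `portaudio_library` is an illustrative helper name.

```python
from ctypes.util import find_library


def portaudio_library():
    """Return the name or path of the PortAudio shared library, or None if not found."""
    return find_library("portaudio")


if __name__ == "__main__":
    found = portaudio_library()
    if found is None:
        print("No system-wide PortAudio library was found")
    else:
        print("PortAudio library found:", found)
```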
Virtual Environment (Recommended)¶
Running in a virtual environment keeps your global Python clean:
python -m venv vllama-env
# Activate
source vllama-env/bin/activate # Linux / macOS
vllama-env\Scripts\activate # Windows
pip install vllama
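Before running `pip install`, it can help to confirm that the environment is actually active. A small sketch, using an illustrative helper named `in_virtualenv`:

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If this reports `False` after activation, the shell is probably resolving `python` to a different interpreter than the one inside `vllama-env`.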
GPU Support (Optional but Faster)¶
Vllama works on CPU out of the box. For GPU acceleration, install PyTorch with CUDA support before installing vllama:
# CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# Then install vllama
pip install vllama
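After installing a CUDA build, you can check whether PyTorch actually sees the GPU. The sketch below uses PyTorch's `torch.cuda.is_available()` and `torch.version.cuda`; the wrapper `describe_torch_backend` is an illustrative name, and it falls back gracefully when PyTorch is not installed yet.

```python
def describe_torch_backend():
    """Summarize whether PyTorch is installed and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version the wheel was built against.
        return f"CUDA available (PyTorch built for CUDA {torch.version.cuda})"
    return "PyTorch is installed but is using the CPU only"


if __name__ == "__main__":
    print(describe_torch_backend())
```

A CPU-only result after installing a CUDA wheel usually points to a driver mismatch or to a CPU build of PyTorch having been installed first.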
No GPU locally?
Use Vllama's built-in Kaggle GPU offload to run heavy models for free without any local GPU.
VS Code Extension¶
To use the VS Code chat integration:
- Open VS Code
- Press Ctrl+Shift+X (or Cmd+Shift+X on macOS)
- Search for Vllama
- Click Install
The extension connects to a local LLM server you start with vllama run_llm. See the Local LLM in VS Code guide.
Updating¶
pip install --upgrade vllama
Uninstalling¶
pip uninstall vllama
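Uninstalling the package does not remove model weights cached by Hugging Face, which are stored in ~/.cache/huggingface by default. Remove that directory to free disk space; note that other tools using Hugging Face libraries share the same cache.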
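Before deleting the cache, it may be worth checking how much space it occupies. A minimal sketch, assuming the default location given above; `directory_size` is an illustrative helper.

```python
import os
from pathlib import Path

# Location named in the text above; a custom HF_HOME would move it.
CACHE_DIR = Path.home() / ".cache" / "huggingface"


def directory_size(path):
    """Return the total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked snapshot files are not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


if __name__ == "__main__":
    size_gib = directory_size(CACHE_DIR) / 1024 ** 3
    print(f"{CACHE_DIR}: {size_gib:.2f} GiB")
```

A missing directory simply reports zero bytes, so the script is safe to run on a machine that has never downloaded a model.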