Build llama.cpp on Ubuntu with an NVIDIA GPU
In this article we will see how to build llama.cpp with CUDA support on Ubuntu.
Installing build essentials and Initial Setup
```shell
sudo apt update
sudo apt install -y build-essential libcurl4-openssl-dev cmake git
```
Set up the nvcc path
```shell
# Adjust the path to match your installed CUDA version
export CUDACXX=/usr/local/cuda-13.0/bin/nvcc
```
Find the GPU architecture (make sure the nvidia-smi command works first):
```shell
# Query the GPU's compute capability (e.g. 8.6) and strip the dot (-> 86)
GPU_ARCH=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1 | tr -d '.')
```
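As a sanity check, the conversion done by `tr -d '.'` can be seen on a sample value (here a hypothetical compute capability of 8.6, as reported by Ampere cards such as the RTX 3060):

```shell
# "8.6" with the dot removed becomes "86", the format CMake expects
echo "8.6" | tr -d '.'
# -> 86
```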
Clone the repository
```shell
git clone git@github.com:ggml-org/llama.cpp.git
# or over HTTPS: git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
```
Build the binaries
Run the CMake configure step:
```shell
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=$GPU_ARCH
```
Compile:
```shell
# $(nproc) returns the number of CPU cores (12 on my machine)
cmake --build build --config Release -j $(nproc)
```
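The `-j $(nproc)` flag runs one compile job per CPU core; `nproc` simply prints the number of processing units available:

```shell
# Prints the number of processing units available to the current process
nproc
```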
Once the build succeeds, the binaries will be in $ROOT/build/bin, including:
- llama-cli
- llama-server
Running a GGUF model from a Hugging Face repository
```shell
./build/bin/llama-server -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M \
  --host 0.0.0.0 --port 8080 \
  -ngl 99 -fa on -c 49152 -b 2048 -ub 1024 \
  --cache-type-k q8_0 --cache-type-v q8_0
```
Here `-ngl 99` offloads all model layers to the GPU, `-fa on` enables flash attention, `-c` sets the context size, and the `q8_0` cache types quantize the KV cache to save VRAM.
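Once the server is up, it exposes an OpenAI-compatible HTTP API. A minimal sketch of querying it with curl (the prompt is hypothetical; the endpoint assumes the --host/--port values used above):

```shell
# Hypothetical test prompt against the OpenAI-compatible chat endpoint
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello, who are you?"}]}'
```

llama-server also serves a `/health` endpoint, handy for readiness checks.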
Thanks!