Build llama.cpp on Ubuntu with an NVIDIA GPU

In this article we will build llama.cpp from source on Ubuntu with CUDA support.

Installing build essentials and Initial Setup

sudo apt update
sudo apt install -y build-essential libcurl4-openssl-dev cmake git
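Before going further, it can help to confirm the toolchain actually landed on PATH. A small loop like this (the tool list is simply what the build below needs) prints anything that is missing:

```shell
# Report any build tool that did not end up on PATH after the apt install.
for tool in gcc g++ make cmake git; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```

If the loop prints nothing, everything is installed.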

Set up the nvcc path

Point the build at the CUDA compiler; adjust the cuda-13.0 directory to match the CUDA toolkit version installed on your machine.

export CUDACXX=/usr/local/cuda-13.0/bin/nvcc

Find the GPU architecture (make sure the nvidia-smi command works first):

# Queries the GPU for compute capability (e.g., 8.6) and converts it to the format CMake expects (86)
GPU_ARCH=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1 | tr -d '.')
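As a sketch of what that pipeline does, here is the same transformation applied to a hard-coded sample value (8.6 is a hypothetical compute capability, roughly an RTX 30-series card):

```shell
# Simulate nvidia-smi output for a single GPU with compute capability 8.6.
sample="8.6"
GPU_ARCH=$(printf '%s\n' "$sample" | head -n 1 | tr -d '.')
echo "$GPU_ARCH"   # prints 86
```

The head -n 1 matters on multi-GPU machines: nvidia-smi prints one line per GPU, and we only want a single architecture value.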

Clone Repository

The SSH URL below requires a GitHub account with SSH keys configured; to clone anonymously, use the HTTPS URL https://github.com/ggml-org/llama.cpp.git instead. The remaining steps assume you are inside the repository.

git clone git@github.com:ggml-org/llama.cpp.git
cd llama.cpp

Build Binary

Run the cmake configure step:

cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=$GPU_ARCH
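If you need a build that runs on GPUs of different generations, CMake's CUDA_ARCHITECTURES setting also accepts a semicolon-separated list, so kernels are compiled for each one. For example (86 and 89 are hypothetical values for an Ampere and an Ada card):

```shell
# Configure the build for more than one compute capability at once.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="86;89"
```

Building for several architectures increases compile time and binary size, so only list the ones you actually target.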

Compile

# $(nproc) expands to the number of CPU cores (12 on my machine)
cmake --build build --config Release -j $(nproc)
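Before moving on, a quick smoke test of the freshly built binary is worthwhile; to my knowledge the llama.cpp binaries print their build info when asked for the version (run from the repository root):

```shell
# Print version/build info of the freshly compiled binary.
./build/bin/llama-cli --version
```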

Once the build succeeds, the binaries will be in $ROOT/build/bin, including:

  • llama-cli
  • llama-server

Running a GGUF model from a Hugging Face repository

./build/bin/llama-server -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M \
--host 0.0.0.0 --port 8080 \
-ngl 99 -fa on -c 49152 -b 2048 -ub 1024 \
--cache-type-k q8_0 --cache-type-v q8_0
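Once the server is up, it exposes an OpenAI-compatible HTTP API. A minimal smoke test from another terminal might look like this (the prompt and the max_tokens value are arbitrary):

```shell
# Send a chat completion request to the local llama-server instance.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello, who are you?"}],"max_tokens":64}'
```

The response is a JSON object whose choices[0].message.content field holds the model's reply.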

Thanks!
