Build llama.cpp on Ubuntu with an NVIDIA GPU

In this article we will build llama.cpp from source on Ubuntu with CUDA support.

Installing build essentials and Initial Setup

sudo apt update
sudo apt install -y build-essential libcurl4-openssl-dev cmake git
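Before going further, it can help to confirm the toolchain actually landed on PATH. A small loop like this (the tool list is simply what the build below needs) prints anything that is missing:

```shell
# Report any build tool that did not end up on PATH after the apt install.
for tool in gcc g++ make cmake git; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```

If the loop prints nothing, everything is installed.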

Set up the nvcc path

Point the build at the CUDA compiler; adjust the cuda-13.0 directory to match the CUDA toolkit version installed on your machine.

export CUDACXX=/usr/local/cuda-13.0/bin/nvcc

Find the GPU architecture (make sure the nvidia-smi command works first):

# Queries the GPU for compute capability (e.g., 8.6) and converts it to the format CMake expects (86)
GPU_ARCH=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1 | tr -d '.')
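As a sketch of what that pipeline does, here is the same transformation applied to a hard-coded sample value (8.6 is a hypothetical compute capability, roughly an RTX 30-series card):

```shell
# Simulate nvidia-smi output for a single GPU with compute capability 8.6.
sample="8.6"
GPU_ARCH=$(printf '%s\n' "$sample" | head -n 1 | tr -d '.')
echo "$GPU_ARCH"   # prints 86
```

The head -n 1 matters on multi-GPU machines: nvidia-smi prints one line per GPU, and we only want a single architecture value.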

Clone Repository

The SSH URL below requires a GitHub account with SSH keys configured; to clone anonymously, use the HTTPS URL https://github.com/ggml-org/llama.cpp.git instead. The remaining steps assume you are inside the repository.

git clone git@github.com:ggml-org/llama.cpp.git
cd llama.cpp

Build Binary

Run the cmake configure step:

cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=$GPU_ARCH
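If you need a build that runs on GPUs of different generations, CMake's CUDA_ARCHITECTURES setting also accepts a semicolon-separated list, so kernels are compiled for each one. For example (86 and 89 are hypothetical values for an Ampere and an Ada card):

```shell
# Configure the build for more than one compute capability at once.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="86;89"
```

Building for several architectures increases compile time and binary size, so only list the ones you actually target.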

Compile

# $(nproc) expands to the number of CPU cores (12 on my machine)
cmake --build build --config Release -j $(nproc)
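Before moving on, a quick smoke test of the freshly built binary is worthwhile; to my knowledge the llama.cpp binaries print their build info when asked for the version (run from the repository root):

```shell
# Print version/build info of the freshly compiled binary.
./build/bin/llama-cli --version
```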

Once the build succeeds, the binaries will be in $ROOT/build/bin, including:

  • llama-cli
  • llama-server

Running a GGUF model from a Hugging Face repository

./build/bin/llama-server -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M \
--host 0.0.0.0 --port 8080 \
-ngl 99 -fa on -c 49152 -b 2048 -ub 1024 \
--cache-type-k q8_0 --cache-type-v q8_0
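Once the server is up, it exposes an OpenAI-compatible HTTP API. A minimal smoke test from another terminal might look like this (the prompt and the max_tokens value are arbitrary):

```shell
# Send a chat completion request to the local llama-server instance.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello, who are you?"}],"max_tokens":64}'
```

The response is a JSON object whose choices[0].message.content field holds the model's reply.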

Thanks!
