Build llama.cpp on Ubuntu with an AMD GPU (Ryzen 7 PRO 8840HS)
In this article, we will see how to build llama.cpp on Ubuntu for an AMD CPU with integrated graphics. This is the CPU model from my laptop's `lscpu` output:
AMD Ryzen 7 PRO 8840HS w/ Radeon 780M Graphics
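If you only want that model-name line rather than the full `lscpu` dump, a small filter is enough. This is a minimal sketch: the helper name `cpu_model` is hypothetical, and it assumes `lscpu` prints an English `Model name:` field (other locales translate the label, hence `LC_ALL=C` in the usage comment).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print only the CPU model from `lscpu` output on stdin.
# Assumes English-locale output, where the line reads "Model name:   <model>".
cpu_model() {
  # Strip the label and the padding after it; print only matching lines.
  sed -n 's/^Model name:[[:space:]]*//p'
}

# Usage on a real machine (C locale keeps the label in English):
#   LC_ALL=C lscpu | cpu_model
```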
Building High-Performance llama.cpp for AMD GPUs with Vulkan: A Bash Automation Guide
The landscape of local AI is shifting rapidly, with large language models moving from cloud APIs to local, private instances. One of the most popular open-source tools for this transition is llama.cpp. It supports NVIDIA GPUs via CUDA and also offers backends for AMD GPUs, including HIP (ROCm) and Vulkan. This guide uses the Vulkan backend, which runs on Mesa's RADV driver (packaged in Ubuntu as `mesa-vulkan-drivers`) and therefore works on integrated GPUs such as the Radeon 780M without installing ROCm.
Manual compilation is prone to environment errors, dependency mismatches, and slow build times. The script below automates configuring and compiling a Vulkan-enabled llama.cpp build on Ubuntu systems with AMD GPUs; a quick manual check afterwards confirms that the GPU is detected.
Prerequisites
Before running the script, ensure your system meets the following requirements:
- Operating System: Ubuntu 20.04 or newer (tested on 22.04).
- GPU: An AMD GPU with Vulkan support, either discrete (Radeon RX or Instinct series) or integrated (such as the Radeon 780M in the Ryzen 7 PRO 8840HS).
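- Drivers: A working Vulkan driver. For AMD GPUs on Ubuntu this is normally Mesa's RADV, provided by the `mesa-vulkan-drivers` package. The `amdgpu` kernel driver is already part of the stock kernel, so `ubuntu-drivers autoinstall`, which mainly targets proprietary drivers, is usually not needed.
- System Updates: The script runs `sudo apt update` before installing packages, so package lists are fresh.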
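Before running the build, you can check the Ubuntu version and whether an AMD Vulkan device is visible. The sketch below is not part of llama.cpp or Ubuntu: `version_at_least` and `has_amd_vulkan_gpu` are hypothetical names. The second helper assumes the `vulkaninfo --summary` layout from Ubuntu's `vulkan-tools` package, where each device lists a `vendorID` line; `0x1002` is AMD's PCI vendor ID.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of llama.cpp or Ubuntu).

# Succeed if VERSION_ID in an os-release file is >= the given minimum.
# Usage: version_at_least /etc/os-release 20.04
version_at_least() {
  local file="$1" min="$2" ver
  # os-release is shell-compatible KEY=VALUE text; source it in a subshell.
  ver=$( . "$file" && printf '%s' "${VERSION_ID:-}" ) || return 1
  [ -n "$ver" ] || return 1
  # sort -V orders version strings; if $min sorts first, then $ver >= $min.
  [ "$(printf '%s\n%s\n' "$min" "$ver" | sort -V | head -n 1)" = "$min" ]
}

# Succeed if `vulkaninfo --summary` output on stdin lists an AMD device.
# Assumption: the summary prints lines like "vendorID = 0x1002";
# the exact layout may differ between vulkan-tools versions.
# Usage: vulkaninfo --summary 2>/dev/null | has_amd_vulkan_gpu
has_amd_vulkan_gpu() {
  grep -Eq '^[[:space:]]*vendorID[[:space:]]*=[[:space:]]*0x1002([[:space:]]|$)'
}
```

Sourcing `/etc/os-release` inside `$( )` keeps its variables out of the calling shell.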
The Build Script
The following Bash script automates the entire process. It installs the required packages, removes any previous `build` directory, configures CMake with the Vulkan backend enabled, and compiles with one parallel job per logical CPU to speed up the build.
```bash
#!/bin/bash

# Exit immediately if any command fails
set -e

echo "=================================================="
echo " Starting llama.cpp Vulkan Build Automation "
echo "=================================================="

# Step 1: Refresh package lists and install build dependencies
echo " [1/4] Installing Ubuntu build dependencies..."
sudo apt update
sudo apt install -y libvulkan-dev glslc spirv-headers cmake build-essential

# Step 2: Remove any previous build directory to avoid a stale CMake cache
if [ -d "build" ]; then
  echo " [2/4] Found existing build directory. Purging cache..."
  rm -rf build
else
  echo " [2/4] Clean environment detected. Proceeding..."
fi

# Step 3: Configure build files with the Vulkan backend enabled
echo " [3/4] Configuring CMake with Vulkan backend..."
cmake -S . -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release

# Step 4: Compile the binaries
# nproc prints the number of available logical CPUs (16 on the 8840HS: 8 cores, 16 threads)
THREAD_COUNT=$(nproc)
echo " [4/4] Compiling project using $THREAD_COUNT parallel jobs..."
cmake --build build --config Release -j "$THREAD_COUNT"

echo "=================================================="
echo " Build successful! "
echo " Server binary location: ./build/bin/llama-server"
echo "=================================================="
```
How It Works
- Dependency Sync: The script installs `libvulkan-dev` (Vulkan headers and loader development files), `glslc` (the GLSL-to-SPIR-V compiler used to build the Vulkan backend's compute shaders), `spirv-headers`, `cmake`, and the essential build tools (`build-essential`). Without them, the Vulkan backend cannot be configured or compiled.
- Cache Management: If a previous build attempt failed or the dependencies changed, stale entries in the `build` directory (notably `CMakeCache.txt`) can cause confusing configure or compile errors. The script detects an existing `build` directory and deletes it so every run starts from a clean configuration.
- CMake Configuration: The `-DGGML_VULKAN=ON` flag enables ggml's Vulkan backend. The CPU backend is still built alongside it, so any layers that are not offloaded run on the CPU.
- Parallel Compilation: Instead of compiling one file at a time, the script uses `nproc` to count the available logical CPUs and passes that number to the build tool as the number of parallel jobs (`-j`). This significantly reduces build time on multi-core systems.
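One job per logical CPU is a good default, but on machines with little free RAM, 16 simultaneous C++ compiles can run out of memory. The sketch below is a hypothetical variation, not something the script above does: `job_count` caps the job count by the `MemAvailable` value in `/proc/meminfo`, assuming roughly 1 GiB per compile job, which is a rule-of-thumb assumption rather than a measured figure.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose a -j value for `cmake --build`.
# Uses the logical CPU count, capped so that each job gets at least
# MIB_PER_JOB MiB of available memory (1024 is an assumption).
MIB_PER_JOB=1024

job_count() {
  local meminfo="${1:-/proc/meminfo}" cpus avail_kib mem_jobs
  # nproc reports logical CPUs (threads); fall back to 1 if unavailable.
  cpus=$(nproc 2>/dev/null || echo 1)
  # MemAvailable is reported in kB (KiB) on Linux.
  avail_kib=$(awk '/^MemAvailable:/ {print $2}' "$meminfo" 2>/dev/null || true)
  if [ -z "$avail_kib" ]; then
    echo "$cpus"
    return
  fi
  mem_jobs=$(( avail_kib / 1024 / MIB_PER_JOB ))
  [ "$mem_jobs" -ge 1 ] || mem_jobs=1
  # Use whichever limit is smaller.
  if [ "$mem_jobs" -lt "$cpus" ]; then echo "$mem_jobs"; else echo "$cpus"; fi
}

# Usage inside the build script, in place of THREAD_COUNT=$(nproc):
#   THREAD_COUNT=$(job_count)
```

Taking the minimum of the two limits means a machine with plenty of RAM still builds with every thread, while a low-memory one degrades gracefully instead of failing mid-build.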
How to Run the Build
- Save the file: Copy the script above into a file named `build_llama.sh`.
- Make it executable:
```bash
chmod +x build_llama.sh
```
- Run the script:
```bash
./build_llama.sh
```
Note: Run the script from the root directory of the llama.cpp source tree, because it uses relative paths (`-S .` and `build`). If you have just run `git clone`, change into the cloned `llama.cpp` directory first.
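If you would rather have the script refuse to run from the wrong directory, a guard like the one below could go near its top. This is a sketch, not part of the original script: `is_llama_cpp_root` is a hypothetical name, and checking for a top-level `CMakeLists.txt` plus `include/llama.h` assumes the current layout of the llama.cpp repository.

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed only if DIR looks like the llama.cpp source root.
# Assumes the repository has a top-level CMakeLists.txt and include/llama.h.
is_llama_cpp_root() {
  local dir="${1:-.}"
  [ -f "$dir/CMakeLists.txt" ] && [ -f "$dir/include/llama.h" ]
}

# Usage near the top of build_llama.sh:
#   if ! is_llama_cpp_root .; then
#     echo "Run this script from the llama.cpp source root." >&2
#     exit 1
#   fi
```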
Post-Build Verification
Once the script completes successfully, the binary llama-server will be located in ./build/bin/llama-server.
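Before launching a model, a quick check can confirm that the binary exists and that CMake really configured the Vulkan backend. This is a hedged sketch: `check_vulkan_build` is a hypothetical helper, and it relies on CMake recording cache options as `NAME:TYPE=VALUE` lines in `build/CMakeCache.txt`, so a Vulkan-enabled configure is expected to contain a line such as `GGML_VULKAN:BOOL=ON`.

```bash
#!/usr/bin/env bash
# Hypothetical post-build check (not part of llama.cpp).
# Assumes CMakeCache.txt stores the option as "GGML_VULKAN:<TYPE>=ON".
check_vulkan_build() {
  local build_dir="${1:-build}"
  local bin="$build_dir/bin/llama-server"
  if [ ! -x "$bin" ]; then
    echo "missing or not executable: $bin" >&2
    return 1
  fi
  if ! grep -Eq '^GGML_VULKAN:[A-Z]*=(ON|1|TRUE)$' "$build_dir/CMakeCache.txt" 2>/dev/null; then
    echo "GGML_VULKAN is not ON in $build_dir/CMakeCache.txt" >&2
    return 1
  fi
  echo "OK: $bin (Vulkan backend enabled)"
}

# Usage from the llama.cpp source root:
#   check_vulkan_build build
```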
To verify that Vulkan works and your AMD GPU is being used, run the following command from the repository root. The `-hf` option downloads the model from Hugging Face, and `-ngl 99` offloads all of its layers to the GPU:
```bash
./build/bin/llama-server -hf unsloth/Qwen3.5-4B-MTP-GGUF:UD-Q4_K_XL -ngl 99 \
  -c 8192 -fa on -np 1 --mlock --no-mmap \
  --spec-type draft-mtp --spec-draft-n-max 6
```
You should see log output showing that the Vulkan device was detected:
```text
0.01.004.395 I log_info: verbosity = 3 (adjust with the `-lv N` CLI arg)
0.01.004.401 I device_info:
0.01.004.751 I - Vulkan0 : AMD Radeon Graphics (RADV PHOENIX) (34051 MiB, 33316 MiB free)
0.01.004.769 I - CPU : AMD Ryzen 7 PRO 8840HS w/ Radeon 780M Graphics (59911 MiB, 59911 MiB free)
```
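Rather than scanning the log by eye, you can filter it for Vulkan devices. A minimal sketch: `vulkan_devices` is a hypothetical helper whose pattern is written against the `- Vulkan0 : <name> (<total> MiB, <free> MiB free)` lines in the log above; other llama.cpp versions may format this differently. The usage comment merges stderr into stdout with `2>&1` so the saved log captures the messages whichever stream they are written to.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "VulkanN: <device name>" for each Vulkan device
# line in a llama.cpp log read from stdin; return 1 if none is found.
# The pattern assumes lines like:
#   ... - Vulkan0 : AMD Radeon Graphics (RADV PHOENIX) (34051 MiB, 33316 MiB free)
vulkan_devices() {
  local found
  found=$(sed -n 's/.*- \(Vulkan[0-9]*\) : \(.*\) ([0-9]* MiB, [0-9]* MiB free)$/\1: \2/p')
  [ -n "$found" ] || return 1
  printf '%s\n' "$found"
}

# Usage: keep a copy of the server log, then filter it.
#   ./build/bin/llama-server ... 2>&1 | tee server.log
#   vulkan_devices < server.log
```

A non-zero exit status means no Vulkan device appeared in the log, which usually points to a missing Vulkan driver or a CPU-only build.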
Conclusion
By using this automated Bash script, you can streamline compiling llama.cpp for AMD hardware. It removes the guesswork around dependencies and gives you a clean, Vulkan-enabled Release build ready for local AI inference. Happy coding!