Build llama.cpp on Ubuntu with an AMD GPU (Ryzen 7 PRO 8840HS)

In this article, we will see how to build llama.cpp on Ubuntu for an AMD CPU with an integrated GPU. Here is the CPU model reported by lscpu on my laptop:

AMD Ryzen 7 PRO 8840HS w/ Radeon 780M Graphics
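To read the same field on your own machine, a small filter over lscpu's output is enough. This is a minimal sketch: the function name cpu_model_name is hypothetical, and it assumes the English-locale "Model name:" label, so run lscpu with LC_ALL=C.

```shell
#!/usr/bin/env bash
# cpu_model_name: read lscpu-style text on stdin and print the value of the
# "Model name:" field. Lines such as "BIOS Model name:" are ignored because the
# pattern is anchored at the start of the line.
# Assumes the English label, so call lscpu with LC_ALL=C.
cpu_model_name() {
    sed -n 's/^Model name:[[:space:]]*//p'
}
```

Usage: `LC_ALL=C lscpu | cpu_model_name` prints the CPU model, which on this laptop is the AMD Ryzen 7 PRO 8840HS line shown above.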

Building High-Performance llama.cpp for AMD GPUs with Vulkan: A Bash Automation Guide

The landscape of local AI is shifting rapidly, with large language models moving from cloud APIs to local, private instances. One of the most popular open-source tools for this transition is llama.cpp. It supports NVIDIA GPUs via CUDA and AMD GPUs via either ROCm (HIP) or Vulkan. This guide uses the Vulkan backend, which runs on Mesa's open-source RADV driver, so no ROCm installation is needed, even for integrated GPUs such as the Radeon 780M.

Manual compilation is prone to environment errors, dependency mismatches, and slow single-threaded builds. The script below automates installing the dependencies, configuring, and compiling a Vulkan-enabled llama.cpp build on Ubuntu systems with AMD GPUs. A verification step follows the build.


Prerequisites

Before running the script, ensure your system meets the following requirements:

  1. Operating System: Ubuntu 20.04 or newer (tested on 22.04).
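  2. GPU: An AMD GPU with Vulkan support, either discrete (Radeon RX series) or integrated (such as the Radeon 780M in the Ryzen 7 PRO 8840HS).
  3. Drivers: A working Vulkan driver. On Ubuntu this is normally the in-kernel amdgpu driver plus Mesa's RADV Vulkan driver (package mesa-vulkan-drivers); no proprietary AMD driver is required.
  4. Permissions: The script runs sudo apt update and sudo apt install itself, so you need sudo rights.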
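Before running the build, you can check the OS version and build tools with a short preflight function. This is a hypothetical helper, not part of llama.cpp; it reads VERSION_ID from /etc/os-release, relies on GNU sort -V for the version comparison, and only reports missing tools without installing anything.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of llama.cpp).

# version_ge A B: succeed if version A >= version B (GNU sort -V ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# missing_tools CMD...: print each command that is not on PATH, one per line.
missing_tools() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# preflight: warn about an Ubuntu release older than 20.04 and list missing tools.
# It only reports and always returns 0, so it never aborts a calling script.
preflight() {
    local version_id="" missing
    if [ -r /etc/os-release ]; then
        # Source os-release in a subshell so its variables do not leak out.
        version_id=$(. /etc/os-release && printf '%s' "${VERSION_ID:-}")
    fi
    if [ -n "$version_id" ] && ! version_ge "$version_id" "20.04"; then
        echo "Warning: Ubuntu $version_id is older than 20.04" >&2
    fi
    missing=$(missing_tools git cmake glslc)
    if [ -n "$missing" ]; then
        echo "Missing tools (the build script installs cmake and glslc):" >&2
        printf '%s\n' "$missing" | sed 's/^/  /' >&2
    fi
    return 0
}
```

Usage: run `preflight` in the shell where you will build. To check the GPU side, `vulkaninfo --summary` (from the vulkan-tools package) lists the Vulkan devices the driver exposes.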

The Build Script

The following Bash script automates the entire process. It installs the build dependencies, removes any previous build directory, configures CMake for the Vulkan backend, and compiles with one job per CPU thread to speed up the build.

#!/bin/bash

# Exit immediately if any command fails
set -e

# Safety check: "rm -rf build" below is destructive, so only run from the
# llama.cpp source root (assumes the upstream layout: top-level CMakeLists.txt and ggml/)
if [ ! -f CMakeLists.txt ] || [ ! -d ggml ]; then
    echo "Error: run this script from the root of the llama.cpp repository." >&2
    exit 1
fi

echo "=================================================="
echo " Starting llama.cpp Vulkan Build Automation "
echo "=================================================="

# Step 1: Install (or confirm) the required packages
echo " [1/4] Installing Ubuntu system dependencies..."
sudo apt update
sudo apt install -y libvulkan-dev glslc spirv-headers cmake build-essential

# Step 2: Remove any previous build directory so CMake starts from a clean cache
if [ -d "build" ]; then
    echo " [2/4] Found existing build directory. Removing it..."
    rm -rf build
else
    echo " [2/4] No previous build directory found. Proceeding..."
fi

# Step 3: Configure an optimized (Release) build with the Vulkan backend enabled
echo " [3/4] Configuring CMake with Vulkan backend..."
cmake -S . -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release

# Step 4: Compile the binaries
# nproc prints the number of available processing units (16 on a Ryzen 7 PRO 8840HS: 8 cores, 16 threads)
THREAD_COUNT=$(nproc)
echo " [4/4] Compiling project using $THREAD_COUNT parallel jobs..."
cmake --build build --config Release -j "$THREAD_COUNT"

echo "=================================================="
echo " Build successful! "
echo " Server binary location: ./build/bin/llama-server"
echo "=================================================="

How It Works

  1. Dependency Sync: The script installs libvulkan-dev (Vulkan headers and loader), glslc (the GLSL-to-SPIR-V compiler used to build llama.cpp's Vulkan compute shaders), spirv-headers, cmake, and build-essential (gcc, g++, make). All of these are needed to build the Vulkan backend.
  2. Cache Management: If a build directory already exists (for example from a failed attempt or an earlier configuration), the script deletes it so CMake starts from a fresh CMakeCache.txt. This prevents stale cache entries from causing confusing configuration or compilation errors, at the cost of a full rebuild every time.
  3. CMake Configuration: -DGGML_VULKAN=ON enables the Vulkan backend, which is built alongside the default CPU backend, and -DCMAKE_BUILD_TYPE=Release turns on compiler optimizations.
  4. Parallel Compilation: nproc reports the number of available CPU threads, and the script passes that number to the build with -j so that many compile jobs run at once. This significantly reduces build time on multi-core systems.
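To confirm that the configure step really enabled Vulkan, you can inspect the CMake cache that cmake -B build writes. A minimal sketch: vulkan_enabled is a hypothetical helper, and it assumes the cache stores the option as a line such as GGML_VULKAN:BOOL=ON in build/CMakeCache.txt.

```shell
#!/usr/bin/env bash
# vulkan_enabled [BUILD_DIR]: succeed if BUILD_DIR/CMakeCache.txt (default: build)
# records GGML_VULKAN as ON. Returns 1 if it is off or absent, 2 if there is no cache.
vulkan_enabled() {
    local cache="${1:-build}/CMakeCache.txt"
    if [ ! -f "$cache" ]; then
        echo "No CMake cache found at $cache" >&2
        return 2
    fi
    # Accept any cache type (BOOL, UNINITIALIZED, ...) and CMake's usual true values.
    grep -Eqi '^GGML_VULKAN:[A-Z]+=(ON|1|TRUE|YES|Y)$' "$cache"
}
```

Usage: after step 3 of the script, `vulkan_enabled && echo "Vulkan backend enabled"`. Checking the cache takes a fraction of a second, so a misconfigured build can be caught before the long compile starts.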

How to Run the Build

  1. Save the file: Copy the script above into a file named build_llama.sh.
  2. Make it executable:
    chmod +x build_llama.sh

  3. Run the script:
    ./build_llama.sh
    

Note: Run this script from the root directory of the llama.cpp source repository. If you have just cloned it, first cd into the llama.cpp folder that git clone created.
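Because a failed compile can scroll the first real error off the screen, it can help to keep a copy of the script's output. A minimal sketch, assuming Bash; run_logged is a hypothetical helper, not part of llama.cpp.

```shell
#!/usr/bin/env bash
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND, shows its combined stdout/stderr on the terminal, saves a copy
# to LOGFILE, and returns COMMAND's exit status rather than tee's.
run_logged() {
    local log="$1"
    shift
    (
        set -o pipefail   # pipeline fails if COMMAND fails; scoped to this subshell
        "$@" 2>&1 | tee "$log"
    )
}
```

Usage: `run_logged build.log ./build_llama.sh`; if the build fails, `grep -n -i error build.log` jumps to the first error. The subshell keeps pipefail from changing the behavior of the calling shell.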


Post-Build Verification

Once the script completes successfully, the binary llama-server will be located in ./build/bin/llama-server.

To verify that Vulkan is working and your AMD GPU is being used, run the following command from the repository root. The -hf option downloads the model from Hugging Face on first use, and -ngl 99 offloads up to 99 layers to the GPU, which here means the whole model:

./build/bin/llama-server -hf unsloth/Qwen3.5-4B-MTP-GGUF:UD-Q4_K_XL -ngl 99 \
  -c 8192 -fa on -np 1 --mlock --no-mmap \
  --spec-type draft-mtp --spec-draft-n-max 6

You should see log output listing the integrated GPU as a Vulkan device (RADV is Mesa's open-source Vulkan driver):

0.01.004.395 I log_info: verbosity = 3 (adjust with the `-lv N` CLI arg)
0.01.004.401 I device_info:
0.01.004.751 I   - Vulkan0 : AMD Radeon Graphics (RADV PHOENIX) (34051 MiB, 33316 MiB free)
0.01.004.769 I   - CPU     : AMD Ryzen 7 PRO 8840HS w/ Radeon 780M Graphics (59911 MiB, 59911 MiB free)
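If you save the server's log (for example by appending 2>&1 | tee server.log to the command), you can check for a Vulkan device without reading the whole output. A minimal sketch: vulkan_devices is hypothetical and assumes the "- VulkanN :" line format shown in the log above, which may change between llama.cpp versions.

```shell
#!/usr/bin/env bash
# vulkan_devices: read llama-server log text on stdin and print the device_info
# lines for Vulkan devices (e.g. "- Vulkan0 : AMD Radeon Graphics ...").
# Exits non-zero (grep's status 1) when no Vulkan device line is present.
vulkan_devices() {
    grep -E -e '- Vulkan[0-9]+ :'
}
```

Usage: `vulkan_devices < server.log || echo "No Vulkan device listed; check the Mesa Vulkan driver" >&2`.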

Conclusion

By utilizing this automated Bash script, you can streamline the complex process of compiling llama.cpp for AMD hardware. This approach eliminates manual dependency guessing and ensures a clean, optimized build ready for local AI inference. Happy coding!
