Tuned Profiles for Linux Server Workload Optimisation

16 min read - June 9, 2026

Table of contents

tuned profiles for server workload optimisation
How tuned profiles work
Choosing the right profile for your workload
Installing and applying profiles
Building a custom profile for AI, ML, and high-bandwidth workloads
Managing profiles across a server fleet
Conclusion

How to choose, apply, and customise tuned profiles for GPU, database, and high-bandwidth Linux servers, with examples and Ansible deployment tips.

tuned profiles for server workload optimisation

Linux default settings are tuned for compatibility, not performance. The tuned daemon ships predefined profiles that adjust CPU governors, I/O schedulers, kernel parameters, and network buffers to match a specific workload. This post covers how the profiles work, which one to pick for common server roles, and how to build and deploy custom profiles across a fleet.

How tuned profiles work

A profile is a directory containing a tuned.conf file. From tuned 2.23 onwards, stock profiles live under /usr/lib/tuned/profiles/ and custom ones under /etc/tuned/profiles/; older releases use /usr/lib/tuned/ and /etc/tuned/ directly. The conf file groups parameters by plugin: [cpu], [disk], [sysctl], [vm], [bootloader], and so on. Activate a profile and the tuned daemon applies every parameter in one go, so you don't have to run dozens of sysctl and sysfs commands by hand.
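To see which plugins a given profile touches, it can help to list the section headers in its tuned.conf. The helper below is a small sketch; the function name list_plugin_sections is made up for illustration and is not part of tuned.

```shell
# Print the plugin sections ([main], [cpu], [sysctl], ...) declared in a tuned.conf file.
# list_plugin_sections is a hypothetical helper name, not a tuned command.
list_plugin_sections() {
    # Keep only lines of the form "[name]" and strip the brackets.
    sed -n 's/^\[\([^]]*\)\][[:space:]]*$/\1/p' "$1"
}
```

For example, list_plugin_sections /usr/lib/tuned/profiles/throughput-performance/tuned.conf prints one section name per line, assuming that profile is installed at that path.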

Profiles can inherit from each other with the include directive. The throughput-performance profile, for example, can serve as the base for a custom database profile that overrides only vm.swappiness and the Transparent Huge Pages setting.
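The inheritance described above could look roughly like this. It is a sketch only: the function name write_db_profile, the profile name in the usage note, and the swappiness value are assumptions chosen for illustration, not recommendations from tuned itself.

```shell
# Write a child profile that inherits throughput-performance and overrides two settings.
# write_db_profile is a hypothetical helper; pass the target profile directory as $1.
write_db_profile() {
    profile_dir="$1"
    mkdir -p "$profile_dir"
    cat > "$profile_dir/tuned.conf" <<'EOF'
[main]
summary=Database profile layered on throughput-performance
include=throughput-performance

[sysctl]
# Assumed value for illustration; pick one that suits the database.
vm.swappiness=1

[vm]
transparent_hugepages=never
EOF
}
```

Calling write_db_profile /etc/tuned/profiles/db-custom and then tuned-adm profile db-custom would activate it. Everything not listed in the child profile still comes from throughput-performance.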

tuned runs in two modes. Static tuning applies the profile once at activation and leaves the system alone, which is what you want on production servers where consistency matters more than power savings. Dynamic tuning monitors disk, network, and load usage in real time and adjusts settings on the fly. It is controlled globally by the dynamic_tuning option in /etc/tuned/tuned-main.conf rather than per profile, and RHEL-family distributions ship with it disabled, which avoids the monitoring overhead.
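Dynamic tuning is switched by the dynamic_tuning option in /etc/tuned/tuned-main.conf. A quick way to check which mode a host is in, sketched with a hypothetical helper name:

```shell
# Print the value of dynamic_tuning from tuned's main config (0 = static only, 1 = dynamic).
# dynamic_tuning_setting is a hypothetical helper; $1 overrides the default config path.
dynamic_tuning_setting() {
    conf_file="${1:-/etc/tuned/tuned-main.conf}"
    # Match an uncommented "dynamic_tuning = N" line and print N without whitespace.
    awk -F= '/^[[:space:]]*dynamic_tuning[[:space:]]*=/ {
        gsub(/[[:space:]]/, "", $2)
        print $2
    }' "$conf_file"
}
```

An empty result means the option is not set explicitly in that file, in which case the daemon's built-in default applies.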

Choosing the right profile for your workload

tuned ships a dozen profiles covering most common workloads. Pick the one that matches what the server actually does, rather than leaving the default balanced profile in place.

Workload	Profile	What it does
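GPU training and inference	`accelerator-performance`	Same tuning as throughput-performance, plus locks the CPU into low (shallow) C-states so that wake-up latency stays under 100µs
Databases (Postgres, MySQL, Redis)	`throughput-performance`	Disables power saving and tunes disk readahead, dirty-page writeback, and swappiness. It does not turn off Transparent Huge Pages, so add that override in a custom profile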
High-bandwidth networking (CDN, replication, data pipelines)	`network-throughput`	Enlarges kernel network buffers for sustained high-bandwidth transfers
Latency-sensitive services	`network-latency` or `latency-performance`	Pegs the CPU governor to `performance`, disables deep C-states
HPC and compute clusters	`hpc-compute`	Extends latency-performance with NUMA and memory tuning
VPS instances (guest OS)	`virtual-guest`	Lowers swappiness, increases disk readahead for paravirtualised I/O
KVM hypervisor hosts	`virtual-host`	Tunes dirty page writeback for VM workloads
Mixed or unknown	`balanced`	Default. Trades performance for power efficiency

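For specific database engines, tuned also offers postgresql, mssql, and oracle profiles. They are packaged separately (tuned-profiles-postgresql, tuned-profiles-mssql, and tuned-profiles-oracle on RHEL-family distributions) and go further than throughput-performance by tuning shared memory and kernel scheduler parameters for those engines.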

On multi-socket servers, NUMA topology matters. Remote-node memory access can be two to three times slower than local access. For latency-critical workloads on dual-socket boxes, disable automatic NUMA balancing in the profile and pin processes to specific nodes manually.
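Before deciding whether NUMA pinning is worth the effort, check how many nodes the kernel actually exposes. The sketch below counts them from sysfs; count_numa_nodes is a hypothetical helper name, and the optional argument exists only so the function can be pointed at a different directory.

```shell
# Count the NUMA nodes the kernel exposes under sysfs.
# count_numa_nodes is a hypothetical helper; $1 overrides the default sysfs path.
count_numa_nodes() {
    node_root="${1:-/sys/devices/system/node}"
    count=0
    for node in "$node_root"/node[0-9]*; do
        # Skip the unexpanded glob when no node directories exist.
        [ -d "$node" ] && count=$((count + 1))
    done
    echo "$count"
}
```

A result of 1 means there is nothing to pin. On a box with two or more nodes, a process can be bound to one node's CPUs and memory with numactl --cpunodebind=0 --membind=0 followed by the command to run.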

Installing and applying profiles

Install tuned on RHEL, Rocky, AlmaLinux, or Fedora:

dnf install tuned
systemctl enable --now tuned

On Debian and Ubuntu the package is also called tuned and installs via apt. If power-profiles-daemon is already running, mask it to avoid conflicts:

systemctl mask --now power-profiles-daemon

List available profiles, ask tuned what it recommends for the hardware, apply a profile, and verify it:

tuned-adm list
tuned-adm recommend
tuned-adm profile throughput-performance
tuned-adm verify

The active profile is stored in /etc/tuned/active_profile and persists across reboots. To strip tuning entirely and measure the baseline, run tuned-adm off.
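Because the active profile is recorded in a plain file, a post-reboot check can be scripted without calling the daemon. A minimal sketch, with profile_is_active as a hypothetical helper name:

```shell
# Succeed if the recorded active profile matches the expected name.
# profile_is_active is a hypothetical helper; $2 overrides the default state file.
profile_is_active() {
    expected="$1"
    state_file="${2:-/etc/tuned/active_profile}"
    # The state file holds the profile name on a single line.
    [ "$(cat "$state_file" 2>/dev/null)" = "$expected" ]
}
```

For example, profile_is_active throughput-performance || echo "profile did not persist" flags a host that came back up with something else selected. This only confirms which profile is selected; tuned-adm verify is still what checks that the settings were applied.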

Building a custom profile for AI, ML, and high-bandwidth workloads

When the stock profiles get you 90% of the way there, build a custom profile that inherits from the closest match and overrides the remaining parameters. Start with a directory and a conf file:

# Custom profiles live in /etc/tuned/profiles/; tuned releases before 2.23 use /etc/tuned/ai-gpu instead
mkdir -p /etc/tuned/profiles/ai-gpu
cat > /etc/tuned/profiles/ai-gpu/tuned.conf <<'EOF'
[main]
summary=Custom profile for GPU training with high-bandwidth networking
include=accelerator-performance

[cpu]
governor=performance

[sysctl]
kernel.numa_balancing=0
net.core.rmem_max=268435456
net.core.wmem_max=268435456
net.ipv4.tcp_rmem=4096 87380 268435456
net.ipv4.tcp_wmem=4096 65536 268435456

[vm]
transparent_hugepages=never

[bootloader]
cmdline=hugepagesz=2M hugepages=16384 iommu=pt
EOF

tuned-adm profile ai-gpu

The key choices here:

numa_balancing=0 stops the kernel migrating memory between sockets during training runs, a common source of stall on dual-socket GPU boxes.
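The rmem_max, wmem_max, tcp_rmem, and tcp_wmem values raise the socket buffer ceiling to 256 MiB. On 25G, 40G, or 100G interconnects between training nodes, default buffer sizes cap throughput well below line rate.
transparent_hugepages=never removes the latency jitter THP causes for frameworks like PyTorch and TensorFlow that allocate large tensors.
hugepagesz=2M hugepages=16384 reserves 16384 explicit 2 MiB huge pages (32 GiB) at boot; scale the count to the memory the workload actually maps.
iommu=pt puts the IOMMU in passthrough mode for devices the host drives itself, which cuts DMA translation overhead on bare metal while keeping the IOMMU available for GPUs and NICs passed through to VMs.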
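The buffer and hugepage figures in the profile follow from simple arithmetic, sketched below. The function names are hypothetical, and the 20 ms round-trip time in the example is an assumed value for a long-haul link; measure the real RTT between your nodes before sizing buffers.

```shell
# Bandwidth-delay product: bytes that must be in flight to keep a link full.
# $1 = link speed in Gbit/s, $2 = round-trip time in whole milliseconds.
bdp_bytes() {
    # Gbit/s -> bytes/s is x 125000000; ms -> s is / 1000; combined: x 125000.
    echo $(( $1 * 125000 * $2 ))
}

# Memory reserved by explicit huge pages, in MiB.
# $1 = number of pages, $2 = page size in MiB.
hugepage_reservation_mib() {
    echo $(( $1 * $2 ))
}

bdp_bytes 100 20                   # 250000000 bytes, just under the 268435456 ceiling
hugepage_reservation_mib 16384 2   # 32768 MiB, i.e. 32 GiB
```

A socket buffer smaller than the bandwidth-delay product forces the sender to stall while it waits for acknowledgements, which is why small default buffers fall short of line rate on fast links.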

Anything under [bootloader] requires a reboot. After activating the profile, run tuned-adm verify to confirm runtime parameters applied, and check journalctl -u tuned for errors. Benchmark before and after with iostat -xz, numastat, and the relevant workload tool (iperf3, fio, or the actual training run).
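Since bootloader parameters only take effect after a reboot, it is worth confirming that they reached the running kernel. The sketch below checks the kernel command line for an exact argument; cmdline_has is a hypothetical helper name.

```shell
# Succeed if the kernel command line contains the exact argument given as $1.
# cmdline_has is a hypothetical helper; $2 overrides the default /proc/cmdline.
cmdline_has() {
    wanted="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Split the command line on whitespace and compare whole arguments,
    # so "iommu=pt" does not match "intel_iommu=pt".
    for arg in $(cat "$cmdline_file"); do
        [ "$arg" = "$wanted" ] && return 0
    done
    return 1
}
```

After the reboot, cmdline_has iommu=pt && cmdline_has hugepages=16384 succeeds only if both arguments from the [bootloader] section were applied.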

One trade-off worth being explicit about: CPU security mitigations. Disabling them (for example with mitigations=off on the kernel command line) recovers only around 3-8% on GPU workloads, but as much as 15-30% on workloads with heavy system call patterns, where the mitigations cost the most. Decide based on the threat model for the box. Inside a dedicated training cluster behind a firewall, the maths usually favours disabling them. On a multi-tenant host, leave them on.
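To see what state a host is in before making that call, the kernel reports each known CPU vulnerability under sysfs. The sketch below counts how many currently have a mitigation active; count_active_mitigations is a hypothetical helper name.

```shell
# Count CPU vulnerabilities for which the kernel reports an active mitigation.
# count_active_mitigations is a hypothetical helper; $1 overrides the sysfs directory.
count_active_mitigations() {
    vuln_dir="${1:-/sys/devices/system/cpu/vulnerabilities}"
    # Each file holds one status line such as "Mitigation: ...",
    # "Vulnerable", or "Not affected".
    cat "$vuln_dir"/* 2>/dev/null | grep -c '^Mitigation'
}
```

Running grep . /sys/devices/system/cpu/vulnerabilities/* by hand shows the full status line per vulnerability, which is more useful than the count when deciding what to turn off.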

Managing profiles across a server fleet

Applying tuned by hand stops being viable past a handful of servers. Ansible handles this cleanly: a single playbook installs tuned, renders each custom tuned.conf into its own directory under /etc/tuned/profiles/ via the template module, and applies the right profile per inventory group.

Map profiles to roles in inventory:

GPU and AI nodes: accelerator-performance, or a custom profile that inherits from it
Database servers: throughput-performance or the engine-specific profile
CDN and edge nodes pushing high-bandwidth traffic: network-throughput
API and web servers behind a load balancer: network-latency
VPS and KVM guests: virtual-guest
Hypervisor hosts: virtual-host
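The same mapping can be written down as a lookup that a provisioning script calls. This is a sketch: the role names (gpu, database, cdn, web, guest, hypervisor) are hypothetical labels standing in for whatever your inventory groups are called.

```shell
# Map a server role to the tuned profile listed above.
# profile_for_role and the role names are hypothetical; adapt them to your inventory.
profile_for_role() {
    case "$1" in
        gpu)        echo "accelerator-performance" ;;
        database)   echo "throughput-performance" ;;
        cdn)        echo "network-throughput" ;;
        web)        echo "network-latency" ;;
        guest)      echo "virtual-guest" ;;
        hypervisor) echo "virtual-host" ;;
        # Unknown roles fall back to the default profile.
        *)          echo "balanced" ;;
    esac
}
```

A provisioning step could then run tuned-adm profile "$(profile_for_role database)". Keeping the mapping in one place makes it easier to swap a stock profile for a custom one later.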

Drift is the real operational problem. Manual sysctl changes, package upgrades that ship new defaults, or another config management tool stepping on tuned will all cause settings to diverge from what the profile says. Schedule a job (an Ansible run or a cron entry) that runs tuned-adm active and tuned-adm verify and alerts on failures. When verify reports "Verification failed", /var/log/tuned/tuned.log lists the individual settings that no longer match.
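A drift check along these lines could be as small as the sketch below. It assumes that tuned-adm active prints a line starting with "Current active profile:" and that tuned-adm verify exits non-zero on a mismatch; confirm both against the tuned version you run. The function name audit_tuned and the TUNED_ADM override are hypothetical.

```shell
# Report drift if the active profile or its applied settings differ from what is expected.
# audit_tuned is a hypothetical helper; set TUNED_ADM to substitute another command.
audit_tuned() {
    expected="$1"
    tuned_adm="${TUNED_ADM:-tuned-adm}"
    # Assumed output format: "Current active profile: <name>".
    active=$("$tuned_adm" active | sed -n 's/^Current active profile: //p')
    if [ "$active" != "$expected" ]; then
        echo "DRIFT: active profile is '$active', expected '$expected'"
        return 1
    fi
    # Assumed behaviour: verify exits non-zero when settings do not match.
    if ! "$tuned_adm" verify >/dev/null 2>&1; then
        echo "DRIFT: tuned-adm verify failed for '$expected'"
        return 1
    fi
    echo "OK: $expected"
}
```

Wrapped in a script and run from cron or an Ansible task, the non-zero return code is what the alerting hooks into; the message is there for whoever reads the alert.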

Conclusion

tuned removes most of the guesswork from kernel and sysctl tuning. The defaults are good enough for general use, and the workload-specific profiles like accelerator-performance, throughput-performance, and network-throughput get you most of the way to optimised without writing a single config file.

Pick the closest stock profile, run tuned-adm verify, then benchmark
Build custom profiles by inheriting from a stock profile and overriding only what you need
Be deliberate about NUMA balancing, hugepages, and network buffer sizes on GPU and high-bandwidth boxes
Deploy with Ansible and audit on a schedule to catch drift

Need bare-metal capacity with the bandwidth headroom to actually use these settings? Talk to FDC about dedicated servers built for high-throughput and GPU workloads.
