Linux Memory Management: Swap, OOM Killer & Cgroups
12 min read - May 31, 2026

How Linux swap, the OOM killer, and cgroups work together — with configuration examples for databases, web servers, and multi-tenant VPS hosts.
Linux memory management explained: swap, OOM killer, and cgroups
Linux memory usage is easy to misread. High RAM usage isn't always a problem: the kernel deliberately uses otherwise idle memory as page cache to speed up disk reads. When real memory pressure builds, three mechanisms come into play: swap, the OOM killer, and cgroups. Understanding how each one behaves, and how to configure it, is the difference between a server that degrades gracefully under load and one that crashes without warning.
How Linux manages memory pages
Every process runs in its own virtual address space. On x86-64 with standard four-level paging, user space can span up to 128 TB. The kernel maps these virtual addresses to physical RAM through page tables, and the Translation Lookaside Buffer (TLB) caches recent translations. A TLB hit costs on the order of a nanosecond. A miss that requires a page-table walk costs roughly 20–100 nanoseconds, which adds up in memory-intensive workloads like databases.
Physical memory is managed in pages (4 KB by default on x86-64). For reclaim purposes, the kernel treats them as two kinds:
- File-backed pages — tied to files on disk. The kernel can discard clean ones or flush dirty ones without needing swap.
- Anonymous pages — heap and stack memory with no backing file. These must be written to swap before the kernel can free them.
On servers with high memory demand, a large proportion of anonymous pages means swap gets involved early. Watch the si (swap in) and so (swap out) columns in vmstat 1 — persistent non-zero values are your first warning that the system is under pressure.
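As far as we know, vmstat derives its si and so columns from the pswpin and pswpout page counters in /proc/vmstat. The sketch below computes the same rates from two snapshots, which is handy inside a monitoring script. It assumes the counter names used by current kernels and converts pages to KiB using the system page size. The function names are illustrative.

```python
import os
import time


def parse_vmstat(text):
    """Parse /proc/vmstat ("name value" per line) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, interval_s, page_kib=None):
    """Return (si, so) in KiB/s from two /proc/vmstat snapshots.

    pswpin/pswpout count pages, so the deltas are scaled by the page size.
    """
    if page_kib is None:
        page_kib = os.sysconf("SC_PAGE_SIZE") // 1024
    si = (after["pswpin"] - before["pswpin"]) * page_kib / interval_s
    so = (after["pswpout"] - before["pswpout"]) * page_kib / interval_s
    return si, so


if __name__ == "__main__" and os.path.exists("/proc/vmstat"):
    with open("/proc/vmstat") as f:
        first = parse_vmstat(f.read())
    time.sleep(1)
    with open("/proc/vmstat") as f:
        second = parse_vmstat(f.read())
    print("si=%.0f KiB/s so=%.0f KiB/s" % swap_rates(first, second, 1.0))
```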
For monitoring, use the right tool for the job:
| Tool | Best for | Key metric |
|---|---|---|
| free -h | Quick system-wide snapshot | available column |
| vmstat 1 | Real-time swap and I/O monitoring | si, so |
| htop | Interactive per-process view | Memory bars, process list |
| smem | Accurate per-process usage | USS (Unique Set Size) |
| /proc/meminfo | Kernel-level detail | MemAvailable, Dirty, Slab |
One common mistake: watching the free column in free -h and panicking. The available column is what matters. It includes memory the kernel can reclaim from cache on demand. A server showing only 512 MB free but 5 GB available is not in trouble.
When free memory falls below a zone's low watermark, the kernel's kswapd thread starts reclaiming pages in the background. If allocations outpace kswapd and free memory drops toward the min watermark, the kernel falls back to direct reclaim. In that case the allocating process has to free pages itself before it can continue. This is where latency spikes come from. Set an alert when MemAvailable falls below 10–15% of total RAM so you have time to respond.
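Here is a minimal sketch of that MemAvailable alert. It assumes the standard /proc/meminfo layout ("Field:  value kB"). The 10% default is the low end of the 10–15% range, and the function names are illustrative rather than part of any tool.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo into {field: value}; values are in kB where a unit is shown."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def available_fraction(info):
    """MemAvailable as a fraction of MemTotal (the number worth alerting on)."""
    return info["MemAvailable"] / info["MemTotal"]


def memory_alert(info, threshold=0.10):
    """True when MemAvailable drops below `threshold` of total RAM."""
    return available_fraction(info) < threshold


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"available: {available_fraction(info):.1%}, alert: {memory_alert(info)}")
```

With 512 MB free but 5 GB available out of 8 GB, the available fraction is 62.5% and no alert fires, which matches the free-versus-available point above.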
Configuring swap
Swap is a disk area — either a partition or a file — where the kernel moves inactive anonymous pages when RAM fills up. The speed gap is significant: DDR4 RAM has roughly 100 ns latency, while NVMe SSDs are around 100,000 ns and SATA SSDs closer to 500,000 ns. Swap is a safety buffer, not extra RAM. A server that consistently relies on swap has a memory problem that more swap won't fix.
Use a swap file rather than a partition. It's easier to resize and doesn't require repartitioning.
```bash
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Persist across reboots by adding the swap file to /etc/fstab
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```

The chmod 600 step matters. Swap can hold any data paged out of RAM, so the file must not be readable by other users. swapon also warns about insecure permissions otherwise.
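After swapon, /proc/swaps lists every active swap area. The sketch below parses it to confirm the swap file is live. It assumes the usual five-column layout (Filename, Type, Size, Used, Priority) with sizes in KiB, and it does not unescape paths that contain spaces.

```python
import os


def parse_proc_swaps(text):
    """Parse /proc/swaps into a list of dicts; Size and Used are in KiB."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 5:
            entries.append({
                "filename": fields[0],
                "type": fields[1],
                "size_kib": int(fields[2]),
                "used_kib": int(fields[3]),
                "priority": int(fields[4]),
            })
    return entries


def swap_is_active(entries, path):
    """True if `path` appears as an active swap area."""
    return any(e["filename"] == path for e in entries)


if __name__ == "__main__" and os.path.exists("/proc/swaps"):
    with open("/proc/swaps") as f:
        swaps = parse_proc_swaps(f.read())
    print("/swapfile active:", swap_is_active(swaps, "/swapfile"))
```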
After creating swap, tune vm.swappiness. The default of 60 makes the kernel fairly willing to swap out anonymous pages rather than shrink the page cache. For most hosting workloads you want the kernel to prefer RAM and use swap only as a last resort:
| Server role | vm.swappiness | vm.vfs_cache_pressure |
|---|---|---|
| General web server | 10–20 | 50 |
| Database (MySQL/PostgreSQL) | 1–5 | 50 |
| Default (most distros) | 60 | 100 |
For swap sizing: 1–2 GB is enough for a 2 GB VPS handling occasional traffic spikes. On systems with 8 GB or more, a fixed 2–4 GB swap is generally sufficient. The goal is to give the kernel a pressure valve for cold pages, not to extend total addressable memory.
On RAM-constrained servers with ample CPU, zram creates a compressed swap area in memory, avoiding disk I/O entirely. It's worth considering on multi-tenant VPS hosts where NVMe is shared across tenants. Watch for I/O contention if swap lives on the same device as database files — heavy swapping and high-throughput disk writes don't coexist well.
The OOM killer
When the kernel exhausts RAM and swap and can't reclaim enough memory through other means, the OOM killer steps in. It scores processes using the oom_badness() function:
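```
points = (rss_anon + rss_file + rss_shmem + swapents + pgtables_pages) + (oom_score_adj × totalpages / 1000)
```

All terms are counted in pages. For a global OOM, totalpages is RAM plus swap; for a cgroup OOM, it is the cgroup's limit. A process with oom_score_adj = -1000 is never selected. The process with the highest score gets killed, so the formula favors large memory consumers. To avoid killing several processes in quick succession, the kernel skips a new selection while a previous victim is still exiting.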
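To make the scoring concrete, here is a sketch of the formula in Python. It is a simplified model, not the kernel code: it ignores details such as tasks already marked as OOM victims. The task names and sizes in the example are made up. Sizes are in 4 KiB pages, so 16 GB of RAM with no swap gives totalpages = 4,194,304.

```python
def oom_badness(rss_anon, rss_file, rss_shmem, swapents, pgtables_pages,
                oom_score_adj, totalpages):
    """Approximate the kernel's oom_badness() score; all sizes are in pages.

    Returns None for oom_score_adj == -1000, which the kernel treats as
    "never select this task".
    """
    if oom_score_adj == -1000:
        return None
    points = rss_anon + rss_file + rss_shmem + swapents + pgtables_pages
    # The kernel divides totalpages by 1000 (integer division) before multiplying.
    points += oom_score_adj * (totalpages // 1000)
    return points


def pick_victim(tasks, totalpages):
    """Return the name of the task with the highest score, or None."""
    best_name, best_score = None, None
    for name, t in tasks.items():
        score = oom_badness(t["anon"], t["file"], t["shmem"], t["swap"],
                            t["pgtables"], t["adj"], totalpages)
        if score is not None and (best_score is None or score > best_score):
            best_name, best_score = name, score
    return best_name


TOTALPAGES = 4194304  # 16 GiB of RAM, no swap, 4 KiB pages
TASKS = {  # hypothetical tasks
    "mysqld": {"anon": 262144, "file": 0, "shmem": 0, "swap": 0, "pgtables": 0, "adj": -900},
    "php-fpm": {"anon": 131072, "file": 0, "shmem": 0, "swap": 0, "pgtables": 0, "adj": 0},
    "batch-worker": {"anon": 65536, "file": 0, "shmem": 0, "swap": 0, "pgtables": 0, "adj": 200},
}
```

With these numbers, the -900 adjustment pushes mysqld's score far below zero. The +200 on the batch worker outweighs its small footprint, so pick_victim(TASKS, TOTALPAGES) chooses the batch worker even though mysqld uses four times as much memory.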
Two types of OOM events appear in logs:
- Global OOM: the entire system is out of RAM and swap. Log lines start with "Out of memory:".
- Cgroup OOM: a container or service hit its memory.max limit. Log lines start with "Memory cgroup out of memory:".
To review past OOM events:
```bash
dmesg -T | grep -i "out of memory"
journalctl -k --grep="oom"
```

Pay attention to the order field in the "invoked oom-killer" line. It gives the size of the failed allocation as 2^order contiguous pages. A value above 0 can point to memory fragmentation rather than total exhaustion: the kernel couldn't find enough contiguous pages even though free memory was available.
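The sketch below pulls the useful fields out of that output. It reports which process invoked the OOM killer and with what allocation order, and which process was killed in a global or a cgroup OOM. It assumes the message wording of recent kernels ("Out of memory: Killed process <pid> (<name>)"). Older kernels phrase the kill line differently, so treat the patterns as a starting point.

```python
import re

# "<comm> invoked oom-killer: gfp_mask=..., order=N, oom_score_adj=..."
INVOKED_RE = re.compile(r"(\S+) invoked oom-killer:.*?\border=(-?\d+)")
# "Out of memory: Killed process PID (comm)" or the cgroup variant
KILLED_RE = re.compile(
    r"(Memory cgroup out of memory|Out of memory): Killed process (\d+) \(([^)]*)\)"
)


def parse_oom_log(text):
    """Extract OOM invocations and kills from dmesg/journalctl output."""
    invocations, kills = [], []
    for line in text.splitlines():
        m = INVOKED_RE.search(line)
        if m:
            order = int(m.group(2))
            invocations.append({
                "comm": m.group(1),
                "order": order,
                # order > 0 means a multi-page (contiguous) allocation failed
                "fragmentation_suspect": order > 0,
            })
            continue
        m = KILLED_RE.search(line)
        if m:
            kind = "cgroup" if m.group(1).startswith("Memory cgroup") else "global"
            kills.append({"kind": kind, "pid": int(m.group(2)), "comm": m.group(3)})
    return invocations, kills
```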
You can influence which processes the OOM killer targets by adjusting /proc/<pid>/oom_score_adj. The range is -1000 (never kill) to +1000 (kill first). For systemd-managed services, set this permanently in the unit file:
```ini
[Service]
# -1000 means the OOM killer never selects this service
OOMScoreAdjust=-1000
```

Additional sysctl parameters for tuning OOM behavior:
| Parameter | Value | Effect |
|---|---|---|
| vm.overcommit_memory | 0 | Default heuristic overcommit mode |
| vm.overcommit_memory | 2 | Strict mode: refuses allocations once committed memory would exceed swap + RAM × vm.overcommit_ratio / 100 |
| vm.panic_on_oom | 1 | Panics the kernel on a system-wide OOM instead of killing a process (the machine reboots only if kernel.panic is set) |
| vm.oom_kill_allocating_task | 1 | Kills the process that triggered OOM rather than the largest consumer |
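With vm.overcommit_memory=2 you can compute the commit limit ahead of time. This sketch follows the formula in the kernel's overcommit accounting documentation and assumes vm.overcommit_kbytes is unset. It works in KiB, so it can differ from the kernel's page-based result by a rounding step. Compare the result with the CommitLimit and Committed_AS fields in /proc/meminfo to see how much headroom remains.

```python
def commit_limit_kib(mem_total_kib, swap_total_kib, overcommit_ratio=50, hugetlb_kib=0):
    """CommitLimit under vm.overcommit_memory=2 (vm.overcommit_kbytes unset).

    CommitLimit = swap + (RAM - hugetlb pages) * overcommit_ratio / 100
    """
    return swap_total_kib + (mem_total_kib - hugetlb_kib) * overcommit_ratio // 100


def commit_headroom_kib(limit_kib, committed_as_kib):
    """How much more can be committed before allocations start failing."""
    return limit_kib - committed_as_kib
```

For example, 8 GiB of RAM (8,388,608 KiB) and 2 GiB of swap with the default ratio of 50 give a commit limit of 6,291,456 KiB (6 GiB). Strict mode on a box with little swap can therefore refuse allocations well before RAM is actually full.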
For proactive monitoring, check /proc/pressure/memory (Pressure Stall Information, available since kernel 4.20). Watch the some avg10 value: the share of the last 10 seconds in which at least one task was stalled waiting for memory. As a rule of thumb, below 5% is healthy, and sustained values above 20% suggest an OOM event may be coming. Rising allocstall_* counters in /proc/vmstat are another early signal. They count direct reclaim stalls, which often precede OOM kills. systemd-oomd can act on PSI thresholds, and earlyoom on low available memory and swap, before the kernel's OOM killer fires.
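Here is a small PSI reader, assuming the two-line some/full format of /proc/pressure/memory. The 5% and 20% cut-offs are the rules of thumb above, applied to a single avg10 reading. To decide whether pressure is sustained, sample repeatedly or look at avg60.

```python
import os


def parse_psi(text):
    """Parse a /proc/pressure/* file into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *pairs = line.split()
        stats = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            # avg* are percentages; total is cumulative stall time in microseconds
            stats[key] = int(value) if key == "total" else float(value)
        result[kind] = stats
    return result


def classify_memory_pressure(psi, healthy_below=5.0, critical_above=20.0):
    """Rule-of-thumb classification of the "some avg10" value."""
    avg10 = psi["some"]["avg10"]
    if avg10 < healthy_below:
        return "healthy"
    if avg10 > critical_above:
        return "critical"
    return "elevated"


if __name__ == "__main__" and os.path.exists("/proc/pressure/memory"):
    try:
        with open("/proc/pressure/memory") as f:
            print(classify_memory_pressure(parse_psi(f.read())))
    except OSError:
        print("PSI not readable (it may be disabled, e.g. booted with psi=0)")
```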
Cgroups and memory limits
Control groups (cgroups) let you organize processes into groups and enforce hard resource limits. Introduced in Linux 2.6.24, they're the foundation of container runtimes including Docker, Podman, Kubernetes, and LXC. The kernel tracks memory usage per cgroup, covering anonymous memory, file-backed pages, and kernel objects. If a cgroup hits its limit, the kernel reclaims memory within that group or triggers a cgroup-scoped OOM kill.
Cgroup v1 and v2 differ primarily in how they're structured. V1 mounts each controller (memory, CPU, I/O) separately under /sys/fs/cgroup/<controller>/. A process can therefore sit in different hierarchies for different controllers, which makes resource tracking inconsistent. V2 uses a single unified hierarchy at /sys/fs/cgroup/. Kubernetes' cgroup v2 support reached general availability in version 1.25, and cgroup v1 support was moved to maintenance mode in 1.31.
To check which version your system uses:
```bash
stat -fc %T /sys/fs/cgroup/
```

cgroup2fs means v2; tmpfs typically means v1.
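You can make the same check from a script by reading /proc/self/mounts instead of calling stat. This sketch assumes the common layouts. On v2, cgroup2 is mounted at /sys/fs/cgroup. On v1, a tmpfs sits there instead, and systemd's hybrid mode adds a cgroup2 mount at /sys/fs/cgroup/unified.

```python
import os


def cgroup_mode(mounts_text):
    """Classify the cgroup setup from /proc/self/mounts content.

    Returns "v2", "hybrid", "v1", or "unknown".
    """
    fstypes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            fstypes[fields[1]] = fields[2]  # mount point -> filesystem type
    root = fstypes.get("/sys/fs/cgroup")
    if root == "cgroup2":
        return "v2"
    if root == "tmpfs":
        if fstypes.get("/sys/fs/cgroup/unified") == "cgroup2":
            return "hybrid"
        return "v1"
    return "unknown"


if __name__ == "__main__" and os.path.exists("/proc/self/mounts"):
    with open("/proc/self/mounts") as f:
        print(cgroup_mode(f.read()))
```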
| Feature | Cgroup v1 | Cgroup v2 |
|---|---|---|
| Hierarchy | Multiple, per-controller | Single, unified |
| Hard memory limit | memory.limit_in_bytes | memory.max |
| Soft memory limit | memory.soft_limit_in_bytes | memory.high (throttles) |
| Usage tracking | memory.usage_in_bytes | memory.current |
| Pressure metrics | Limited | PSI integrated |
The key memory controls in cgroup v2:
| Parameter | Type | Description |
|---|---|---|
| memory.max | Hard limit | If reclaim can't keep usage below this, the cgroup's OOM killer runs |
| memory.high | Soft limit | Throttles allocation and triggers reclaim before hitting the hard limit |
| memory.low | Soft protection | Memory below this threshold is reclaimed only when unprotected memory can't be |
| memory.min | Hard protection | Memory below this level is never reclaimed |
| memory.swap.max | Swap limit | Set to 0 to disable swap for this cgroup |
| memory.oom.group | Boolean | If set to 1, an OOM kill takes out all processes in the cgroup together |
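Each of these controls is a plain file in the cgroup's directory, so you can read current usage and limits directly. The sketch assumes a cgroup v2 directory such as /sys/fs/cgroup/system.slice/mysql.service (the unit name is hypothetical). The literal "max" means no limit and is returned as None.

```python
import os


def _read_limit(path):
    """Read a cgroup v2 memory file; "max" means no limit (returned as None)."""
    with open(path) as f:
        value = f.read().strip()
    return None if value == "max" else int(value)


def memory_status(cgroup_dir):
    """Return current usage and limits (bytes) for one cgroup v2 directory."""
    status = {
        "current": _read_limit(os.path.join(cgroup_dir, "memory.current")),
        "high": _read_limit(os.path.join(cgroup_dir, "memory.high")),
        "max": _read_limit(os.path.join(cgroup_dir, "memory.max")),
    }
    if status["max"] is not None:
        # Bytes left before the hard limit forces reclaim or a cgroup OOM
        status["headroom_to_max"] = status["max"] - status["current"]
    return status
```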
A practical rule: set memory.high around 10–20% below memory.max to give the kernel room to reclaim before hitting the hard limit. When sizing memory.max, add 20–30% above the application's peak usage to account for page cache, which counts against cgroup memory totals.
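Here is the sizing rule as arithmetic. This sketch takes the midpoints of the ranges above: a 25% cache margin, and memory.high 15% below memory.max. It uses integer math so the results are exact byte counts. The function name and defaults are illustrative.

```python
def size_cgroup_limits(peak_bytes, cache_margin_pct=25, high_gap_pct=15):
    """Suggest (memory.max, memory.high) in bytes from an observed peak.

    memory.max  = peak + cache margin (text suggests 20-30%)
    memory.high = memory.max minus a gap (text suggests 10-20%)
    """
    memory_max = peak_bytes * (100 + cache_margin_pct) // 100
    memory_high = memory_max * (100 - high_gap_pct) // 100
    return memory_max, memory_high
```

For a service that peaks at 400 MiB, this suggests a memory.max of 500 MiB and a memory.high of 425 MiB.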
Manage cgroups via systemd rather than writing directly to the cgroup filesystem. Use unit file directives like MemoryMax=, MemoryHigh=, and MemoryMin= for persistent limits. For quick tests:
```bash
systemd-run --scope -p MemoryMax=512M <command>
```

For web server worker pools, set memory.oom.group=1 on the pool's cgroup. Then, when the pool exceeds its limit, the OOM killer takes down every process in the group together instead of leaving a partially killed pool with orphaned workers. For database engines, memory.min protects the buffer pool from being reclaimed under system-wide pressure.
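For persistent limits, the directives go in a drop-in file for the unit. This sketch renders one. It assumes sizes are already written with systemd suffixes such as 512M, which systemd treats as base-1024. The function name is illustrative.

```python
def render_memory_dropin(memory_max, memory_high=None, memory_min=None):
    """Render a systemd drop-in that sets cgroup v2 memory limits.

    Values are passed through as strings, e.g. "512M" or "2G".
    """
    lines = ["[Service]"]
    if memory_min is not None:
        lines.append(f"MemoryMin={memory_min}")
    if memory_high is not None:
        lines.append(f"MemoryHigh={memory_high}")
    lines.append(f"MemoryMax={memory_max}")
    return "\n".join(lines) + "\n"
```

Save the output as a file like /etc/systemd/system/php-fpm.service.d/memory.conf (the unit name varies by distribution), then run systemctl daemon-reload and restart the service. systemctl edit <unit> creates and opens such a drop-in for you.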
Memory configuration by server role
The right memory settings depend on what the server is doing. Applying the same configuration to a database and a PHP web server will hurt one of them.
| Server role | vm.swappiness | OOM strategy | Cgroup policy |
|---|---|---|---|
| Database | 1–5 | Protect (OOMScoreAdjust=-900) | Use memory.min to protect buffer pool |
| Web/app server | 10–20 | Default | Cap per worker pool via memory.max |
| Background worker | 60 | Killable (OOMScoreAdjust=+200) | Throttle via memory.high |
| Multi-tenant VPS | 60 (with zram) | Default | Hard isolation per tenant via memory.max |
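For MySQL, set innodb_buffer_pool_size to 50–70% of RAM on a dedicated database server. PostgreSQL's shared_buffers is usually sized lower (its documentation suggests starting around 25% of RAM) because PostgreSQL also relies on the OS page cache. For both, disable Transparent Huge Pages to reduce latency spikes, and protect the process with OOMScoreAdjust=-900 in the systemd unit file.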
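To check whether THP is currently enabled, read /sys/kernel/mm/transparent_hugepage/enabled. The kernel marks the active mode with brackets, as in "always madvise [never]". A minimal parser:

```python
import os

THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def thp_mode(text):
    """Return the bracketed (active) THP mode, or None if none is marked."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__" and os.path.exists(THP_PATH):
    with open(THP_PATH) as f:
        print("THP mode:", thp_mode(f.read()))
```

To disable THP until the next reboot, run echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled. To make the change permanent, add transparent_hugepage=never to the kernel command line.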
For PHP-FPM, size worker pools against actual memory usage. Each worker typically uses 30–100 MB. Divide allocated RAM by average worker size to get a safe pm.max_children value. Use memory.max in cgroups to cap the pool.
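Here is the pm.max_children arithmetic as a sketch. It assumes you have already measured per-worker memory, for example USS from smem, since RSS counts shared pages in every worker. The figures in the example are made up.

```python
import math


def php_fpm_max_children(pool_ram_mb, worker_mem_mb):
    """pm.max_children = RAM allotted to the pool / average worker size.

    worker_mem_mb is a list of measured per-worker memory figures in MB.
    """
    avg = sum(worker_mem_mb) / len(worker_mem_mb)
    return math.floor(pool_ram_mb / avg)
```

With 2 GB (2048 MB) allotted to the pool and workers measured at 60, 70, 62 and 64 MB (an average of 64 MB), the pool can safely run 32 children.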
For write-heavy workloads, set vm.dirty_ratio to around 10% and vm.dirty_background_ratio to 3%. This flushes dirty pages more frequently, avoiding large I/O stalls.
Make kernel tuning persistent by saving parameters to /etc/sysctl.d/90-memory.conf. Settings applied at runtime are lost on reboot.
For a summary of recommended values by role:
| Parameter | Web/app server | Database server |
|---|---|---|
| vm.swappiness | 10–20 | 1–5 |
| vm.vfs_cache_pressure | 50 | 50 |
| vm.dirty_ratio | 15% | 10% |
| vm.min_free_kbytes | 65536 | 65536 |
| OOM protection | Default | OOMScoreAdjust=-900 |
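This sketch renders /etc/sysctl.d/90-memory.conf from the table above. Where the table gives a range, it picks the low end, so adjust to taste. The role names and dictionary layout are illustrative.

```python
# Values from the table; swappiness uses the low end of each range.
ROLE_SETTINGS = {
    "web": {"vm.swappiness": 10, "vm.vfs_cache_pressure": 50,
            "vm.dirty_ratio": 15, "vm.min_free_kbytes": 65536},
    "database": {"vm.swappiness": 1, "vm.vfs_cache_pressure": 50,
                 "vm.dirty_ratio": 10, "vm.min_free_kbytes": 65536},
}


def render_sysctl_conf(role):
    """Render the contents of /etc/sysctl.d/90-memory.conf for a role."""
    lines = [f"# memory tuning for role: {role}"]
    for key, value in ROLE_SETTINGS[role].items():
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"
```

Write the output to /etc/sysctl.d/90-memory.conf and load it with sudo sysctl --system, or with sudo sysctl -p /etc/sysctl.d/90-memory.conf.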
If you're running high-density workloads and need a server with the headroom to apply these policies properly, FDC's dedicated servers are worth a look.
