
ZFS SLOG and Special VDEV: Accelerate Sync Writes and Optimize Metadata


ZFS pools on spinning disks deliver impressive sequential throughput but struggle with synchronous write operations. Every sync write must land on non-volatile storage before ZFS acknowledges the operation, and with HDDs that means waiting for head seeks and platter rotation. SLOG (Separate Intent Log) targets exactly this problem. The Special VDEV addresses a related HDD bottleneck: metadata and small-block I/O.

Understanding the ZFS Intent Log (ZIL)

Before discussing SLOG, we need to understand the ZIL. The ZFS Intent Log is the mechanism that makes synchronous writes durable before they reach their final location in the pool. During a sync write, the following happens:

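  1. Data is written to the ZIL. At the same time, it stays in RAM as part of the current Transaction Group (TXG)
  2. ZFS acknowledges the write to the application
  3. During the next TXG commit, ZFS writes the data from RAM to its final location in the pool
  4. After the TXG commit, the ZIL entry is released. The ZIL is only read back if the system crashes before this point. In that case, ZFS replays it on pool import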

The ZIL always exists — it is an integral part of every ZFS pool. By default, it resides on the pool disks themselves. The problem: when the ZIL sits on the same HDDs as the data, ZIL writes compete with regular I/O operations.

When Are Sync Writes Used?

Not every application uses sync writes. The key scenarios:

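Protocol/Application          | Sync writes?  | Reason
----------------------------- | ------------- | ------
NFS (sync=standard)           | Yes           | NFS stable writes and COMMITs must reach stable storage
iSCSI                         | Depends       | Only initiator cache flushes and FUA writes are sync, unless the zvol uses sync=always
SMB (Durable Handles)         | Partially     | Depends on client and configuration
Databases (PostgreSQL, MySQL) | Yes           | fsync() for transaction safety
VMs on ZFS (zvol)             | Yes           | Guest OS issues cache flushes and expects them to be honored
Local file operations         | No (usually)  | Async by default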
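Which of these cases actually produce sync writes also depends on the ZFS `sync` property of each dataset or zvol. Below is a minimal bash sketch. It assumes you pipe in the tab-separated output of `zfs get -H -o name,value -r sync tank`. The function name `describe_sync` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate each dataset's ZFS "sync" property into plain words.
# Expected input (tab-separated name and value), e.g. from:
#   zfs get -H -o name,value -r sync tank
describe_sync() {
  local name value
  while IFS=$'\t' read -r name value; do
    case "$value" in
      standard) echo "$name: sync only when the application requests it (fsync, O_SYNC, flush)" ;;
      always)   echo "$name: every write is handled as a sync write via the ZIL" ;;
      disabled) echo "$name: sync requests are acknowledged without reaching stable storage (unsafe)" ;;
      *)        echo "$name: unexpected sync value '$value'" ;;
    esac
  done
}

# Usage on a ZFS host:
#   zfs get -H -o name,value -r sync tank | describe_sync
```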

SLOG: The Separate Intent Log

A SLOG is a dedicated device to which the ZIL is offloaded. Instead of landing on slow pool disks, the ZIL sits on a fast NVMe or Optane device.

What a SLOG Needs (and What It Does Not)

SLOG needs:

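  • Extremely high IOPS for small (4K) sync writes at low queue depth
  • Very low latency (< 100 microseconds ideal)
  • Power-Loss Protection (PLP): absolutely critical
  • Moderate capacity (8–32 GB usually sufficient)
  • Adequate endurance: every sync write passes through the SLOG, so it absorbs the pool's entire sync-write volume

SLOG does NOT need:

  • High read performance (in normal operation the SLOG is only written; it is read back only after a crash)
  • Large capacity (the ZIL holds data for only seconds)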

Why So Little Capacity?

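The ZIL only has to hold data until the next TXG commit. By default this happens every 5 seconds, controlled by the zfs_txg_timeout module parameter. At that point ZFS writes the data, which it also keeps in RAM, to its final location and releases the ZIL entry; nothing is copied out of the ZIL. Even under heavy sync-write load, rarely more than a few gigabytes accumulate.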

The rule of thumb:

SLOG capacity ≈ Maximum sync-write throughput × TXG timeout × 2

Example: 500 MB/s sync writes × 5 seconds × 2 (safety buffer) = 5 GB. A 16 GB SLOG is sufficient for the vast majority of workloads.
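The rule of thumb is simple arithmetic, shown here as a minimal bash sketch. It assumes Linux with OpenZFS, where the current TXG timeout is exposed as /sys/module/zfs/parameters/zfs_txg_timeout. Elsewhere it falls back to the 5-second default. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Rule-of-thumb SLOG sizing: throughput (MB/s) x TXG timeout (s) x safety factor.

# Assumption: Linux + OpenZFS exposes the TXG timeout here; otherwise use the 5 s default.
txg_timeout() {
  local param=/sys/module/zfs/parameters/zfs_txg_timeout
  if [ -r "$param" ]; then
    cat "$param"
  else
    echo 5
  fi
}

# slog_size_mb THROUGHPUT_MB_S [TIMEOUT_S] [FACTOR]  -> size in MB
slog_size_mb() {
  local throughput=$1
  local timeout=${2:-$(txg_timeout)}
  local factor=${3:-2}
  echo $(( throughput * timeout * factor ))
}

slog_size_mb 500 5 2   # prints 5000 (MB, i.e. about 5 GB)
```

Passing the timeout explicitly reproduces the worked example. Omitting it uses whatever the running system is configured with.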

Setting Up a SLOG

# Add SLOG device to pool
zpool add tank log /dev/nvme0n1

# SLOG as mirror for redundancy (recommended)
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1

# Check pool status
zpool status tank

Output:

  pool: tank
 state: ONLINE
config:
    NAME                    STATE
    tank                    ONLINE
      raidz2-0              ONLINE
        da0                 ONLINE
        da1                 ONLINE
        da2                 ONLINE
        da3                 ONLINE
        da4                 ONLINE
        da5                 ONLINE
    logs
      mirror-1              ONLINE
        nvme0n1             ONLINE
        nvme1n1             ONLINE

Measuring SLOG Performance

Measure the difference before and after adding the SLOG:

# Test sync-write performance (without SLOG)
fio --name=sync-write --rw=randwrite --bs=4k --size=1G \
    --numjobs=8 --sync=1 --direct=1 --filename=/tank/testfile

# Test again after SLOG installation
# Expected: 5–50x more IOPS for sync writes
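To turn the before/after runs into a single number, the fio job can emit JSON and the write IOPS can be compared. This is a sketch under assumptions: fio and jq are installed, /tank/testfile is a scratch file, and with `--group_reporting` fio reports the eight jobs as one aggregated entry. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare sync-write IOPS before and after adding a SLOG.
# Assumptions: fio and jq are installed; /tank/testfile is a scratch file on the pool.

# sync_write_iops FILE: run the sync-write job from above and print aggregate write IOPS.
# --group_reporting merges the 8 jobs, so the JSON "jobs" array has a single entry.
sync_write_iops() {
  fio --name=sync-write --rw=randwrite --bs=4k --size=1G \
      --numjobs=8 --sync=1 --direct=1 --filename="$1" \
      --group_reporting --output-format=json |
    jq '.jobs[0].write.iops'
}

# speedup BEFORE AFTER: ratio of two IOPS values, one decimal place.
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", after / before }'
}

# Usage on a ZFS host:
#   before=$(sync_write_iops /tank/testfile)   # without SLOG
#   zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1
#   after=$(sync_write_iops /tank/testfile)    # with SLOG
#   echo "speedup: $(speedup "$before" "$after")x"
```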

Power-Loss Protection: Non-Negotiable

A SLOG without PLP is worse than no SLOG at all. The reason: ZFS acknowledges the sync write as soon as data lands in the ZIL (on the SLOG). If the SLOG device loses data during a power failure that has not yet been written to the pool, that data is irrecoverably lost — and ZFS has already told the application the write was safe.

Devices with PLP:

  • Intel Optane (all models)
  • Enterprise NVMe with PLP (e.g., Samsung PM9A3, Micron 7450)
  • Enterprise SATA SSDs (e.g., Intel S4610, Samsung PM893)

Devices WITHOUT PLP (do not use as SLOG):

  • Consumer NVMe (Samsung 990 Pro, WD Black SN850X)
  • Consumer SATA SSDs (Samsung 870 EVO, Crucial MX500)

Special VDEV: Accelerating Metadata

The Special VDEV is a newer ZFS feature (since OpenZFS 0.8) that addresses a different bottleneck: metadata and small blocks. ZFS stores metadata (directory structures, file attributes, deduplication tables) on the pool disks by default. On large pools with millions of files, metadata lookups become the bottleneck.

What the Special VDEV Stores

A Special VDEV handles:

  • Metadata (dnode blocks, directory contents, filesystem metadata)
  • Small data blocks (configurable via the special_small_blocks dataset property)
  • Deduplication tables (DDT — when dedup is enabled)
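The placement rule for metadata and small data blocks can be modeled in a few lines of bash. This is a simplification. It assumes block sizes in bytes and ignores compression, dedup tables, and the fallback to the normal vdevs when the special VDEV is nearly full. `block_class` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Simplified model of where ZFS places a block once a special VDEV exists.
# Assumptions: sizes in bytes; compression, dedup tables and the fallback to
# normal vdevs when the special VDEV is nearly full are ignored.
# block_class KIND SIZE THRESHOLD   (KIND is "metadata" or "data")
block_class() {
  local kind=$1 size=$2 threshold=$3
  if [ "$kind" = metadata ]; then
    echo special            # metadata goes to the special class
  elif [ "$threshold" -gt 0 ] && [ "$size" -le "$threshold" ]; then
    echo special            # data block at or below special_small_blocks
  else
    echo normal             # everything else stays on the regular vdevs
  fi
}

block_class metadata 16384 0      # special
block_class data 4096 65536       # special
block_class data 131072 65536     # normal
```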

Setting Up a Special VDEV

# Add Special VDEV (mirror recommended)
zpool add tank special mirror /dev/nvme2n1 /dev/nvme3n1

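# Set threshold for small blocks (e.g., 64 KB). Keep it below the dataset's
# recordsize (default 128K); otherwise every data block lands on the special VDEV
zfs set special_small_blocks=64k tank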

# Check pool status
zpool status tank

Output:

  pool: tank
 state: ONLINE
config:
    NAME                    STATE
    tank                    ONLINE
      raidz2-0              ONLINE
        da0 ... da5         ONLINE
    logs
      mirror-1              ONLINE
        nvme0n1             ONLINE
        nvme1n1             ONLINE
    special
      mirror-2              ONLINE
        nvme2n1             ONLINE
        nvme3n1             ONLINE
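A special_small_blocks threshold at or above the recordsize sends every data block of a dataset to the special VDEV. For that reason, it is worth checking both properties together. Below is a bash sketch that assumes filesystem datasets; zvols use volblocksize instead of recordsize. `small_blocks_ok` and `check_dataset` are hypothetical names.

```bash
#!/usr/bin/env bash
# small_blocks_ok THRESHOLD RECORDSIZE  (both in bytes)
# Succeeds when the threshold is below the recordsize (0 means the feature is off).
small_blocks_ok() {
  [ "$1" -lt "$2" ]
}

# check_dataset DATASET: read both properties as exact byte values (-p) and report.
check_dataset() {
  local ds=$1 threshold recordsize
  threshold=$(zfs get -Hp -o value special_small_blocks "$ds")
  recordsize=$(zfs get -Hp -o value recordsize "$ds")
  if small_blocks_ok "$threshold" "$recordsize"; then
    echo "$ds: ok (special_small_blocks=$threshold, recordsize=$recordsize)"
  else
    echo "$ds: special_small_blocks >= recordsize, all data blocks go to the special VDEV"
  fi
}

# Usage on a ZFS host:
#   check_dataset tank
```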

Sizing the Special VDEV

The Special VDEV requires significantly more capacity than a SLOG since metadata is stored permanently:

Special VDEV capacity ≈ Pool capacity × metadata ratio + small blocks

Rules of thumb:

  • Without small blocks: 1–5% of pool capacity for pure metadata
  • With special_small_blocks enabled: 5–20% of pool capacity (depending on threshold and workload)

For a 100 TB pool:

  • Pure metadata: 1–5 TB NVMe
  • With small blocks: 5–20 TB NVMe
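The sizing formula and percentage rules translate directly into arithmetic. This minimal bash sketch works in whole gigabytes, using integer arithmetic, so results are rounded down. You have to estimate the metadata ratio and small-block volume for your own workload. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# special_size_gb POOL_GB META_PCT SMALL_BLOCKS_GB
# Special VDEV capacity ~ pool capacity x metadata ratio + small blocks
special_size_gb() {
  local pool_gb=$1 meta_pct=$2 small_gb=$3
  echo $(( pool_gb * meta_pct / 100 + small_gb ))
}

# special_range_gb POOL_GB LOW_PCT HIGH_PCT: the percentage rules of thumb
special_range_gb() {
  local pool_gb=$1 low=$2 high=$3
  echo "$(( pool_gb * low / 100 ))-$(( pool_gb * high / 100 )) GB"
}

special_range_gb 100000 1 5    # 100 TB pool, pure metadata: 1000-5000 GB
special_range_gb 100000 5 20   # 100 TB pool with small blocks: 5000-20000 GB
```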

When Is a Special VDEV Worth It?

Scenario                                     | Benefit
-------------------------------------------- | -------
Millions of small files (email, home shares) | High: metadata lookups become dramatically faster
Few large files (video, backup images)       | Low: minimal metadata load
Dedup enabled                                | High: DDT on NVMe is orders of magnitude faster
Many snapshots                               | Medium: snapshot metadata benefits
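To see which row of this table your data resembles, you can count files and measure how much data sits in files at or below a candidate threshold. This rough bash/awk sketch assumes GNU find (for `-printf`) and treats each non-empty file no larger than the threshold as one small block. It ignores compression, ashift rounding and metadata. /tank/share is a placeholder path, and `small_file_stats` is a hypothetical name.

```bash
#!/usr/bin/env bash
# small_file_stats THRESHOLD_BYTES
# Reads one file size in bytes per line on stdin and prints:
#   files, small_files (0 < size <= threshold), small_bytes, total_bytes
small_file_stats() {
  awk -v t="$1" '
    { n++; total += $1 }
    $1 > 0 && $1 <= t { ns++; small += $1 }
    END {
      printf "files=%d small_files=%d small_bytes=%.0f total_bytes=%.0f\n",
             n, ns, small, total
    }'
}

# Usage (GNU find; placeholder path):
#   find /tank/share -xdev -type f -printf '%s\n' | small_file_stats 65536
```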

Hardware Selection: Intel Optane as the Gold Standard

Intel Optane is based on 3D XPoint technology and offers characteristics ideal for ZFS log devices:

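Property                 | Optane P5800X      | Enterprise NVMe         | Consumer NVMe
------------------------ | ------------------ | ----------------------- | -------------
4K Random Write Latency  | 6 microseconds     | 15–30 microseconds      | 50–200 microseconds
4K Random Write IOPS     | 1.5M               | 200–500K                | 50–100K
Power-Loss Protection    | Yes                | Yes (Enterprise)        | No
Endurance (DWPD)         | 100                | 1–3                     | 0.3–1
Price (2026)             | High (end of life) | Medium                  | Low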
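The latency row matters most for a SLOG. A single thread issuing sync writes one at a time (queue depth 1) can complete at most about 1,000,000 / latency-in-microseconds operations per second. That bound applies before any ZFS or network overhead. This small bash/awk sketch computes it for the table's figures; `qd1_iops` is a hypothetical name.

```bash
#!/usr/bin/env bash
# qd1_iops LATENCY_US: upper bound on sync writes/s for one thread at queue depth 1
qd1_iops() {
  awk -v us="$1" 'BEGIN { printf "%d\n", 1000000 / us }'
}

qd1_iops 6     # Optane P5800X figure: 166666
qd1_iops 30    # upper end of the enterprise NVMe range: 33333
qd1_iops 200   # upper end of the consumer NVMe range: 5000
```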

Optane availability: Intel has discontinued Optane production. Remaining stock is still available, but prices are rising. Alternatives like Samsung PM9A3 or Kioxia FL6 offer good performance with PLP but cannot match Optane’s latency figures.

Optane Models for SLOG and Special VDEV

  • Optane P1600X (118 GB): Ideal as SLOG — compact, affordable, PLP
  • Optane P5800X (400/800 GB): Ideal as Special VDEV — high capacity, extreme performance
  • Optane 905P/900P: Older generation, but still excellent for SLOG

Risks When SLOG Fails

SLOG Failure (Non-Redundant)

If a single SLOG without mirror fails:

  • Pool stays online — ZFS automatically falls back to the pool-internal ZIL
  • Performance drops to the level without SLOG
  • No data loss (as long as no power failure occurs during the transition)

Partial SLOG Mirror Failure

With a SLOG mirror where one device has failed:

  • Pool stays online with full performance
  • Redundancy is gone — replace the failed device promptly

Special VDEV Failure

A failed Special VDEV is critical:

  • The entire pool becomes unreadable if the Special VDEV is lost
  • A Special VDEV must be configured as a mirror
  • Consider a 3-way mirror for critical data

# Special VDEV as 3-way mirror (highest safety)
zpool add tank special mirror /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1
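To confirm from a script that the log and special classes are really mirrored, you can parse the `zpool status` layout. This bash/awk sketch assumes the standard `zpool status` text layout. It only checks the first vdev listed under the given section header. `has_mirrored_class` is a hypothetical name.

```bash
#!/usr/bin/env bash
# has_mirrored_class CLASS  (e.g. "logs" or "special")
# Reads `zpool status` output on stdin; succeeds if the first vdev listed under
# the CLASS section header is a mirror (mirror-N).
has_mirrored_class() {
  awk -v cls="$1" '
    NF == 1 && $1 == cls { insec = 1; next }   # section header line, e.g. "special"
    insec && NF > 0 { found = ($1 ~ /^mirror-/); exit }
    END { exit(found ? 0 : 1) }'
}

# Usage on a ZFS host:
#   zpool status tank | has_mirrored_class special || echo "special VDEV is not mirrored!"
#   zpool status tank | has_mirrored_class logs    || echo "SLOG is not mirrored"
```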

Monitoring

Monitor SLOG and Special VDEV regularly:

# SLOG and Special VDEV status
zpool status tank

# Per-vdev I/O statistics every 5 seconds (log and special devices are listed separately)
zpool iostat -v tank 5

# Check SMART values of NVMe devices
smartctl -a /dev/nvme0n1

Pay special attention to:

  • Wear Level (Percentage Used) — usually uncritical for enterprise NVMe
  • Available Spare — should be above 10%
  • Media Errors — should remain at 0
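These three values can be checked automatically. This bash/awk sketch reads `smartctl -a` output for an NVMe device on stdin. It assumes smartmontools' NVMe field labels ("Available Spare:", "Percentage Used:", "Media and Data Integrity Errors:") and applies the thresholds from the list above. `nvme_health` is a hypothetical name.

```bash
#!/usr/bin/env bash
# nvme_health: reads `smartctl -a /dev/nvmeXn1` output on stdin and prints
# OK or WARN plus the three values (WARN if spare <= 10% or media errors > 0).
nvme_health() {
  awk -F: '
    /^Available Spare:/                 { v = $2; gsub(/[ \t%]/, "", v); spare = v + 0; have_spare = 1 }
    /^Percentage Used:/                 { v = $2; gsub(/[ \t%]/, "", v); used = v + 0 }
    /^Media and Data Integrity Errors:/ { v = $2; gsub(/[ \t,]/, "", v); media = v + 0 }
    END {
      status = "OK"
      if (have_spare && spare <= 10) status = "WARN"
      if (media > 0) status = "WARN"
      printf "%s spare=%d%% used=%d%% media_errors=%d\n", status, spare, used, media
    }'
}

# Usage:
#   smartctl -a /dev/nvme0n1 | nvme_health
```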

Conclusion

SLOG and Special VDEV are not universal solutions but targeted optimizations for specific bottlenecks. A SLOG is worthwhile with high sync-write workloads (NFS, iSCSI, databases), while a Special VDEV benefits metadata-intensive workloads (many small files, dedup). The right hardware choice — especially Power-Loss Protection — is not optional but essential for data integrity.
