Why ZFS RAIDZ1 Made My Photo Library 17x Slower (And How I Fixed It) Mar 31, 2026

My Apple Photos library is about 3 TB — roughly 1.3 million files. It outgrew a single drive a while back, and I had a wish list for the replacement setup:

  • Span multiple drives — consumer SSDs go up to 8 TB now, but they’re outside the price-per-TB sweet spot. I wanted to stay future-proof without paying a premium.
  • Single drive fault tolerance — I don’t want a dead SSD to mean lost photos
  • Bit rot protection — silent data corruption is real and I want checksums
  • APFS volume — Apple Photos requires it. I did try pointing Photos at a ZFS dataset directly. It looked like it worked for a moment, right up until I tried to do anything, at which point it came to a screeching halt. APFS also has instant, zero-cost file clones, which I rely on to keep files both inside the Photos library and in my own folder structure on disk. ZFS doesn’t have that.
  • Direct-attached to my Mac Mini — not a NAS. I’d tried running Apple Photos over a network share before and didn’t enjoy the experience

No single technology does all of this. My esoteric solution: APFS on a ZFS zvol. I built a ZFS pool from three 4 TB SATA SSDs in a RAIDZ1 configuration (ZFS’s equivalent of RAID5), created a zvol, formatted it as APFS, and ticked every box. I did not benchmark the write performance first. This turned out to be a mistake.
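For the record, the build went roughly like this (device identifiers, names, and sizes are placeholders, not my actual shell history):

```shell
# Create a RAIDZ1 pool from three SSDs; ashift=12 aligns
# allocations to 4K sectors, typical for SSDs.
zpool create -o ashift=12 tank raidz1 disk4 disk5 disk6

# Carve out a zvol. volblocksize is fixed at creation time.
zfs create -V 6T -o volblocksize=16K tank/photos

# The zvol appears as a new /dev/diskN device; format it as
# APFS so Apple Photos will accept it.
diskutil eraseDisk APFS Photos /dev/disk7
```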

It was painfully slow. Not “a little slow” — 19 MB/s for file-level operations on SSDs capable of hundreds of megabytes per second. I spent a few weeks running experiments to figure out why, and the answer surprised me.

The original (wrong) hypothesis

The pool was created with 16K volblocksize, and I’d read that small volblocksizes on RAIDZ cause write amplification. So I assumed migrating to 128K blocks would fix things. I created a new 128K zvol and started rsyncing 3 TB to it — without benchmarking first. Again.

The rsync crawled at 3.5 MB/s. Worse than the original.

It turned out the new zvol had sync=standard (the default), which honors every flush and sync request from the layers above immediately instead of letting ZFS batch writes into transaction groups. Setting sync=disabled on the 128K zvol brought it up to 25 MB/s — but 16K with sync=disabled was 280 MB/s. The original blocksize was fine. The original hypothesis was completely wrong.
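The property change itself is one line (pool and volume names are placeholders):

```shell
# Check the current sync policy on the zvol
zfs get sync tank/photos128k

# Stop honoring flush requests synchronously; let the normal
# transaction-group machinery batch writes instead. This trades
# a few seconds of crash-loss risk for throughput.
zfs set sync=disabled tank/photos128k
```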

But 280 MB/s was a cached result (the test only wrote 10 GB into 64 GB of RAM). The real question was: what’s the sustained throughput for real file-level workloads?

The experiments

I ran a series of benchmarks writing 6,000 files of 14 MB each (82 GB total), with the ZFS dirty data buffer reduced to 512 MB to prevent caching from masking the results. Each test ran long enough to reach steady state.
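My actual scripts are on GitHub; a minimal sketch of the many-files benchmark, with illustrative paths and a hypothetical helper name, looks like this:

```shell
#!/bin/bash
# Reduce the ZFS dirty data buffer to 512 MB first so caching
# doesn't mask results (tunable name varies by platform; on
# Linux it's the zfs_dirty_data_max module parameter).

# Write COUNT files of SIZE_MB each into DIR and report MB/s.
bench_files() {
  local dir=$1 count=$2 size_mb=$3
  mkdir -p "$dir"
  local start=$(date +%s)
  for i in $(seq 1 "$count"); do
    dd if=/dev/zero of="$dir/f$i" bs=1048576 count="$size_mb" 2>/dev/null
  done
  sync  # make sure everything actually reached the pool
  local elapsed=$(( $(date +%s) - start ))
  [ "$elapsed" -eq 0 ] && elapsed=1
  echo "$(( count * size_mb / elapsed )) MB/s"
}

# The real run: 6,000 files x 14 MB = 82 GB
# bench_files /Volumes/Photos/bench 6000 14
```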

Here’s what I found:

The two bars in the middle are barely visible. That’s the point.

What each test measured

APFS on bare SSD (234 MB/s): Writing files directly to an APFS volume on a single SATA SSD, no ZFS involved. This is the baseline — how fast the hardware and filesystem can go.

APFS on RAIDZ1 zvol (19 MB/s): The production setup. APFS sitting on a ZFS zvol on the RAIDZ1 pool. 12x slower than bare APFS.

ZFS dataset on RAIDZ1 (15 MB/s): ZFS’s own native filesystem on the same RAIDZ1 pool, no APFS, no zvol. Even slower. This ruled out APFS as the culprit — the double copy-on-write “APFS on ZFS” stack wasn’t the problem.

ZFS stripe (254 MB/s): A ZFS pool with two drives, no parity. Each drive is its own vdev — data is striped across them, but no parity is computed. Actually faster than the single bare SSD because two drives share the load. This was the last experiment I ran, and the one that confirmed the fix.

RAIDZ1 is fine for big files

Here’s the thing that makes this confusing: RAIDZ1 is fast for sequential I/O. Writing a single 40 GB file to the same pool hit 203 MB/s. The hardware path is fine. The problem is specific to creating many files.

A 10x difference on the same hardware, same pool, same drives. The only variable is whether you’re writing one big file or many smaller ones.

Why RAIDZ1 is pathologically slow for file creates

In a ZFS stripe, when you create a file, the data and metadata blocks get written to whichever vdev the allocator picks (roughly round-robin, weighted by free space). One drive does the work, and you’re done.

In RAIDZ1, every write — no matter how small — must carry its own parity. When ZFS needs to write a 16K metadata block (a dnode, an indirect block, a space map entry), it can’t just hand 16K to one drive. The block gets split across the data columns, parity gets computed, and the whole variable-width stripe has to land together — which, on a three-drive pool, means all drives participate in every write.

For sequential writes, this is fine. ZFS fills full stripes efficiently, and the parity cost is amortized across large chunks of data.

But creating a file generates many tiny, scattered metadata writes — directory updates, dnode allocations, indirect blocks, space map entries. Each one triggers a full-width stripe write. What would be a quick single-drive operation in a stripe becomes a synchronized multi-drive operation in RAIDZ1, serialized on the slowest drive.

This isn’t a bug — it’s inherent to how parity-based redundancy works with copy-on-write. Every COW metadata block on RAIDZ1 requires a parity stripe. And a photo library with a million files generates a lot of metadata.
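To put rough numbers on that, here is a back-of-the-envelope calculation (my own illustration assuming 4K sectors, not something measured from the pool): a 16K block on a three-drive RAIDZ1 splits into four data sectors across two data columns, spanning two stripe rows, and each row carries a parity sector:

```shell
# Hypothetical sizes: 3-drive RAIDZ1, 4K sectors, one 16K block.
sector_kb=4
block_kb=16
data_cols=2                                   # 3 drives minus 1 parity column
data_sectors=$(( block_kb / sector_kb ))      # 4 data sectors
rows=$(( (data_sectors + data_cols - 1) / data_cols ))  # 2 stripe rows
parity_sectors=$rows                          # 1 parity sector per row
total_kb=$(( (data_sectors + parity_sectors) * sector_kb ))
echo "${total_kb} KB written for a ${block_kb} KB block"
```

That is 1.5x write amplification on bytes alone, before counting the cost of coordinating all three drives on every one of those tiny writes.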

The stripe benchmark

Once I understood the problem, I tested a ZFS stripe pool (two drives, no parity) to confirm that removing RAIDZ1 parity was the fix:

The stripe held steady at 254 MB/s for the entire 82 GB write with no degradation. The RAIDZ1 line barely registers on the same scale.

The fix

I’m going to destroy the RAIDZ1 pool and rebuild it as a stripe — three 4 TB SSDs as independent vdevs, no parity. I’ll go from ~8 TB usable (RAIDZ1 loses a drive to parity) to ~12 TB, and from 15 MB/s to 250+ MB/s for file-level operations.
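The rebuild is a handful of commands (names and sizes are placeholders; zpool destroy is irreversible, so the backup gets verified first):

```shell
# Destroy the RAIDZ1 pool. There is no undo.
zpool destroy tank

# Recreate with each SSD as its own top-level vdev: omitting the
# raidz1 keyword means data is striped across them with no parity.
zpool create -o ashift=12 tank disk4 disk5 disk6

# New zvol on the stripe, with the settings the experiments favored.
zfs create -V 11T -o volblocksize=16K -o sync=disabled tank/photos
```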

The trade-off is obvious: any single drive failure takes down the whole pool. But I have offsite backup, so a drive failure means a day of restoring, not data loss. For a personal photo library, that’s an acceptable trade. For a production database, it wouldn’t be.

I do lose the bit rot protection from RAIDZ1 parity (a stripe pool can detect corruption via checksums, but can’t repair it without redundancy). I’m keeping ZFS for the checksums — at least I’ll know if something goes wrong — and relying on the offsite backup for recovery.
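Checksums still earn their keep on a stripe: a periodic scrub reads and verifies every block, it just can't rewrite a bad one without redundancy (pool name is a placeholder):

```shell
# Walk the whole pool and verify every checksum. On a stripe,
# corruption shows up in the CKSUM column of zpool status but
# cannot be self-healed the way it can with RAIDZ parity.
zpool scrub tank
zpool status -v tank
```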

Lessons

  1. Benchmark before you migrate. I migrated 3 TB onto RAIDZ1 without testing write performance. Then I almost migrated it again to a 128K zvol based on a theory that turned out to be completely wrong. Measure twice, rsync once.

  2. Cache will lie to you. With 64 GB of RAM and a 3 GB ZFS dirty data buffer, any benchmark under ~10 GB is meaningless on this machine. Early tests showed 280 MB/s that was really 19 MB/s.

  3. RAIDZ is not RAID5. Traditional hardware RAID5 has its own problems, but it doesn’t do copy-on-write. ZFS COW + RAIDZ parity is a specific combination that creates pathological performance for small scattered writes.

  4. “APFS on ZFS” wasn’t the problem. My initial suspicion was that running APFS on a ZFS zvol (double copy-on-write) was causing the slowdown. The experiments showed APFS-on-zvol was actually slightly faster than native ZFS for file writes. APFS is innocent.

  5. This isn’t just an SSD thing. It’s tempting to think the penalty is only visible because SSDs are fast enough to expose it. But the underlying mechanism — every metadata write requiring a full parity stripe — applies to any storage. On spinning disks the absolute numbers are lower and the penalty may not be exactly 17x, but the same fundamental overhead is there. It’s just easier to blame the hardware when everything is already slow.

System: M4 Mac Mini, 64 GB RAM, 3x 4 TB SATA SSDs in OWC ThunderBay 4 mini (Thunderbolt 3), OpenZFS 2.3.0 on macOS. All experiment data and scripts are on GitHub.
