Ceph Erasure Coding Calculator

Plan your Ceph EC profile with confidence. Estimate usable capacity, raw capacity requirements, storage efficiency, overhead, and failure tolerance for profiles like 2+1, 4+2, 8+3, and 12+4.

Capacity & Resilience Inputs

Tip: For host-level fault tolerance, the number of hosts (failure domains) should be at least k+m.

Results

Profile: 4+2
Storage Efficiency: 66.67%
Overhead: 50.00%
Failure Tolerance: Up to 2 chunk failures
Usable Capacity: 800 TB
Raw Capacity: 1200 TB
Minimum OSDs Required: 6
Equivalent 3x Replication Usable: 400 TB

Efficiency: Good
OSD Count: OK
Failure Domains: Check
These estimates are theoretical and do not include BlueStore metadata, reserved space, nearfull thresholds, or temporary recovery overhead.

Ceph Erasure Coding Calculator Guide for Capacity Planning

A Ceph erasure coding calculator helps you predict how much usable storage you can deliver from a given amount of raw disk while preserving fault tolerance. In Ceph, erasure coding is commonly used for object workloads where storage efficiency matters, especially at large scale. Instead of storing multiple complete replicas of each object, erasure coding splits data into k data chunks and adds m coding chunks. The total chunk count is k+m, and any k chunks are enough to reconstruct the object.
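The "any k chunks are enough" property can be illustrated with a toy 2+1 code built from XOR parity. This is only a sketch of the idea: the function names are hypothetical, and Ceph's own erasure code plugins use more general codes rather than plain XOR.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_2_plus_1(data: bytes) -> List[bytes]:
    """Split data into k=2 data chunks and add m=1 XOR parity chunk."""
    if len(data) % 2:
        data += b"\x00"  # pad so both data chunks have the same length
    half = len(data) // 2
    d0, d1 = data[:half], data[half:]
    return [d0, d1, xor_bytes(d0, d1)]


def decode_2_plus_1(chunks: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data from any 2 of the 3 chunks (None = lost)."""
    if sum(c is None for c in chunks) > 1:
        raise ValueError("more than m=1 chunk lost; data is unrecoverable")
    d0, d1, parity = chunks
    if d0 is None:
        d0 = xor_bytes(d1, parity)
    elif d1 is None:
        d1 = xor_bytes(d0, parity)
    # If only the parity chunk is lost, both data chunks are already present.
    return (d0 + d1)[:length]  # strip the padding byte, if any


if __name__ == "__main__":
    obj = b"ceph object"
    stored = encode_2_plus_1(obj)
    for lost in range(3):
        damaged = list(stored)
        damaged[lost] = None  # simulate losing one chunk
        assert decode_2_plus_1(damaged, len(obj)) == obj
    print("object recovered after any single chunk loss")
```

Losing two of the three chunks raises an error, which mirrors the rule that a k+m profile tolerates at most m lost chunks per stripe.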

This gives you two major planning levers: efficiency and resilience. For a fixed m, a higher k increases efficiency; for a fixed k, a higher m increases tolerance for simultaneous failures at the cost of efficiency. A good Ceph design starts with business-level requirements (durability targets, performance profile, and recovery objectives), then maps those goals to the right erasure coding profile and hardware topology.

How the Ceph EC Capacity Formula Works

The calculator above uses standard erasure coding math:
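The formula list itself does not appear in this text. The relations below are the standard ones and agree with the 4+2 results shown earlier (66.67% efficiency, 50.00% overhead, 6 minimum OSDs). The overhead definition m/k is inferred from that 50.00% figure rather than stated by the calculator.

```latex
\begin{aligned}
\text{efficiency}   &= \frac{k}{k+m} \\
\text{overhead}     &= \frac{m}{k} \\
\text{usable}       &= \text{raw} \times \frac{k}{k+m} \\
\text{raw}          &= \text{usable} \times \frac{k+m}{k} \\
\text{minimum OSDs} &= k+m
\end{aligned}
```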

Example: A 4+2 profile has efficiency of 4/6 = 66.67%. That means 1200 TB raw provides roughly 800 TB usable before operational buffers. The same raw pool under 3x replication provides only ~400 TB usable. This efficiency gap is why erasure coding is so attractive for large object archives, backups, and content repositories.
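The arithmetic in this example can be sketched as a small function. The names `ECEstimate` and `estimate_ec` are hypothetical, and the overhead definition (m/k) is an assumption inferred from the 50.00% shown for 4+2; this is not the calculator's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ECEstimate:
    efficiency: float          # k / (k + m)
    overhead: float            # m / k (assumed definition)
    usable_tb: float           # theoretical usable capacity
    min_osds: int              # k + m
    replica3_usable_tb: float  # same raw capacity under 3x replication


def estimate_ec(k: int, m: int, raw_tb: float) -> ECEstimate:
    """Theoretical capacity figures for a k+m profile, before any headroom."""
    if k < 1 or m < 1:
        raise ValueError("k and m must both be at least 1")
    if raw_tb < 0:
        raise ValueError("raw capacity cannot be negative")
    return ECEstimate(
        efficiency=k / (k + m),
        overhead=m / k,
        usable_tb=raw_tb * k / (k + m),
        min_osds=k + m,
        replica3_usable_tb=raw_tb / 3,
    )


if __name__ == "__main__":
    e = estimate_ec(k=4, m=2, raw_tb=1200)
    print(f"efficiency {e.efficiency:.2%}, overhead {e.overhead:.2%}")
    print(f"usable {e.usable_tb:.0f} TB, 3x replication {e.replica3_usable_tb:.0f} TB")
```

Like the calculator, the sketch returns theoretical figures only and leaves operational buffers to a later step.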

Choosing the Right Ceph Erasure Coding Profile

Common profiles include 4+2, 6+3, 8+3, and 12+4. There is no universal “best” profile; the right choice depends on cluster size, failure domain design, and rebuild behavior.

If your workload has strict low-latency random writes, replicated pools may still be preferable for hot data, while erasure-coded pools are excellent for colder object data where capacity economics dominate.

Failure Domains Matter More Than Raw Math

Many planning mistakes happen when teams focus on k+m math but ignore placement topology. If your failure domain is host, you need enough hosts to place every chunk of a stripe on a different host. As a baseline, the number of hosts (or racks, if rack is your failure domain) should be at least k+m. In practice, extra domains improve balancing and reduce corner-case placement constraints during failures and maintenance.
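A minimal sketch of that baseline check follows. The function name and the status labels are hypothetical and are not taken from the calculator.

```python
def check_failure_domains(k: int, m: int, domains: int) -> str:
    """Classify a failure-domain count against the k+m baseline.

    Returns one of three illustrative labels:
      "insufficient" - fewer domains than chunks; chunks cannot all be separated
      "minimum"      - exactly k+m; placement works but there is no spare domain
      "ok"           - at least one spare domain beyond k+m
    """
    if k < 1 or m < 1 or domains < 0:
        raise ValueError("k and m must be >= 1 and domains must be >= 0")
    needed = k + m
    if domains < needed:
        return "insufficient"
    if domains == needed:
        return "minimum"
    return "ok"


if __name__ == "__main__":
    for hosts in (5, 6, 8):
        print(f"4+2 on {hosts} hosts: {check_failure_domains(4, 2, hosts)}")
```

The "minimum" case is worth flagging separately: with exactly k+m domains, a failed domain leaves no other domain to receive its chunks until it returns.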

Also remember that tolerance claims are conditional. “Can tolerate m failures” holds only when no stripe loses more than m chunks and placement remains valid. Real-world outages can be correlated: a rack power event, a top-of-rack (ToR) switch problem, or a maintenance window can reduce effective resilience quickly if the topology is too tight.
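The effect of a correlated outage can be sketched by counting how many chunks of one stripe sit in the failed racks. The layouts and function names below are hypothetical examples, not output from a real cluster.

```python
from typing import Iterable, List, Set


def chunks_lost(chunk_racks: Iterable[str], failed_racks: Set[str]) -> int:
    """Count the chunks of one stripe that live in a failed rack."""
    return sum(1 for rack in chunk_racks if rack in failed_racks)


def stripe_survives(chunk_racks: List[str], failed_racks: Set[str], m: int) -> bool:
    """A stripe stays readable while it has lost no more than m chunks."""
    return chunks_lost(chunk_racks, failed_racks) <= m


if __name__ == "__main__":
    m = 2  # 4+2 profile: six chunks per stripe

    # Six hosts spread over three racks: two chunks per rack.
    three_racks = ["rack1", "rack1", "rack2", "rack2", "rack3", "rack3"]
    # Six hosts squeezed into two racks: three chunks per rack.
    two_racks = ["rack1", "rack1", "rack1", "rack2", "rack2", "rack2"]

    print(stripe_survives(three_racks, {"rack1"}, m))  # loses 2 chunks
    print(stripe_survives(two_racks, {"rack1"}, m))    # loses 3 chunks
```

In the three-rack layout a single rack outage already uses up the full tolerance of m=2, so one more host failure elsewhere would make the stripe unreadable. In the two-rack layout the rack outage alone exceeds it.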

Real-World Capacity vs Theoretical Capacity

A Ceph erasure coding calculator gives theoretical results. Production planning should include operational headroom:

A practical method is to compute theoretical usable, then apply a safety factor (for example, operate at 70% to 80% of theoretical usable depending on SLA strictness). This protects performance and shortens risk windows during rebuild events.
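The safety-factor method can be sketched in both directions: from raw capacity to planned usable, and from a usable target back to the raw capacity to buy. The function names are hypothetical, and the 0.75 factor is just one value from the 70% to 80% range mentioned above.

```python
def planned_usable_tb(raw_tb: float, k: int, m: int, safety_factor: float) -> float:
    """Theoretical usable capacity scaled down by an operating safety factor."""
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    theoretical = raw_tb * k / (k + m)
    return theoretical * safety_factor


def raw_needed_tb(target_usable_tb: float, k: int, m: int, safety_factor: float) -> float:
    """Raw capacity required to deliver a usable target with headroom kept free."""
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    return target_usable_tb * (k + m) / k / safety_factor


if __name__ == "__main__":
    # 4+2 on 1200 TB raw, operated at 75% of theoretical usable.
    print(f"{planned_usable_tb(1200, 4, 2, 0.75):.0f} TB planned usable")
    # Raw capacity needed to deliver 600 TB at the same factor.
    print(f"{raw_needed_tb(600, 4, 2, 0.75):.0f} TB raw needed")
```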

Ceph Erasure Coding vs 3x Replication

Replication is simple and fast for many hot data paths, but capacity efficiency is lower. 3x replication means only about 33.33% efficiency. By comparison, 4+2 provides 66.67% and 8+3 provides 72.73%. At petabyte scale, this difference can translate into large reductions in disk count, rack space, power, and cooling.
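To make the gap concrete, the sketch below computes the raw capacity each scheme needs for the same usable target. The helper names are hypothetical, and the figures are theoretical, with no operating headroom applied.

```python
def raw_for_ec(usable_pb: float, k: int, m: int) -> float:
    """Raw capacity needed for a usable target under a k+m EC profile."""
    return usable_pb * (k + m) / k


def raw_for_replication(usable_pb: float, replicas: int = 3) -> float:
    """Raw capacity needed for a usable target under N-way replication."""
    return usable_pb * replicas


if __name__ == "__main__":
    target_pb = 4.0
    print(f"3x replication: {raw_for_replication(target_pb):.2f} PB raw")
    for k, m in ((4, 2), (8, 3)):
        print(f"{k}+{m}: {raw_for_ec(target_pb, k, m):.2f} PB raw")
```

For a 4 PB usable target, this works out to 12 PB raw under 3x replication, 6 PB under 4+2, and 5.5 PB under 8+3.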

That said, performance characteristics differ. Erasure coding adds encode/decode cost and can penalize small writes unless the pool and workload are tuned for it. A common strategy is to split data by pool type: keep performance-sensitive data on replicated pools and place capacity-heavy data on EC pools.

Performance and Recovery Considerations

When evaluating a Ceph erasure coding profile, capacity is only one axis. You should also evaluate:

Wider profiles improve efficiency but need more failure domains, and rebuild operations may touch more devices and take longer, depending on hardware and utilization. Always test with representative load and failure drills rather than relying only on static formulas.
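A rough way to see why wider profiles touch more devices during rebuild: under a plain Reed-Solomon-style code, reconstructing one lost chunk means reading k surviving chunks of the same size. The sketch below rests on that assumption only. It ignores network and CPU limits, and it does not model plugins designed to reduce recovery reads, so treat the output as an upper-level estimate and not a prediction.

```python
def recovery_read_tb(lost_data_tb: float, k: int) -> float:
    """Estimated data read from surviving OSDs to rebuild lost chunks.

    Assumes each lost chunk is rebuilt by reading k surviving chunks
    of equal size (plain Reed-Solomon-style recovery).
    """
    if k < 1 or lost_data_tb < 0:
        raise ValueError("k must be >= 1 and lost_data_tb must be >= 0")
    return lost_data_tb * k


if __name__ == "__main__":
    lost_tb = 10.0  # data held on the failed OSD (hypothetical figure)
    for k, m in ((4, 2), (8, 3), (12, 4)):
        print(f"{k}+{m}: read about {recovery_read_tb(lost_tb, k):.0f} TB "
              f"to rebuild {lost_tb:.0f} TB")
```

Under this assumption a 12+4 pool reads three times as much as a 4+2 pool to rebuild the same amount of lost data, which is one reason failure drills on real hardware matter.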

Example Ceph EC Planning Scenarios

Profile  Efficiency  Raw Capacity  Theoretical Usable  Chunk Failures Tolerated  Minimum Domains
4+2      66.67%      1.2 PB        0.8 PB              2                         6
6+3      66.67%      3.0 PB        2.0 PB              3                         9
8+3      72.73%      5.5 PB        4.0 PB              3                         11
12+4     75.00%      8.0 PB        6.0 PB              4                         16

Minimum domains refers to an absolute floor for placement. Production designs generally use more than the minimum to maintain flexibility during failures and maintenance.

Best Practices for Ceph Erasure Coding Capacity Planning

FAQ: Ceph Erasure Coding Calculator

What does k+m mean in Ceph erasure coding?
It means k data chunks plus m parity (coding) chunks per stripe. Total chunks are k+m.

How do I calculate usable capacity in Ceph EC?
Multiply raw capacity by k/(k+m). Example: 8+3 with 1.1 PB raw gives ~0.8 PB usable.

Is erasure coding always better than replication?
For capacity efficiency, often yes. For the lowest write latency and simplicity, replicated pools may still be better for hot paths.

How many failures can EC tolerate?
Theoretical stripe-level tolerance is up to m chunk failures, assuming failures affect distinct chunks and placement constraints are met.

Can I use EC for all Ceph workloads?
Not always. Match pool type to workload behavior. Many environments combine replicated and EC pools for balanced outcomes.

Conclusion

A reliable Ceph erasure coding strategy is built on both math and architecture. The calculator gives a fast estimate of efficiency and capacity, but final design should include topology validation, recovery testing, and operating headroom. If you size with realistic failure scenarios and choose a profile aligned to workload behavior, Ceph erasure coding can deliver strong durability with significantly better storage economics than traditional replication.