Complete Guide to Sequencing Coverage (Depth) Calculation
What sequencing coverage means
Sequencing coverage, often called read depth and written as “X” (for example, 30X or 100X), is the average number of times a base in your target region is read by the sequencer. Higher coverage generally increases confidence in variant calls, improves sensitivity for low-frequency events, and makes downstream analyses more robust. In practical terms, 30X whole-genome sequencing means that each genomic base is represented by roughly 30 bases of usable sequence data on average.
Coverage is not the same as breadth of coverage. Depth describes how many times a base is read, while breadth describes what percentage of the target has at least a minimum threshold of coverage (for example, 95% of targets at ≥20X). Most production studies care about both. A project with high average depth can still miss regions due to GC bias, capture inefficiency, repeats, or mappability constraints.
Coverage formula and practical interpretation
The core equation is simple:
Coverage (X) = Effective Bases / Target Size
Where effective bases are your total sequenced bases after accounting for read structure and real-world losses:
Effective Bases = Read Count × Read Length × End Factor × Usable Fraction
Here, End Factor is 2 for paired-end runs and 1 for single-end runs.
If you want to plan the run in reverse (how many reads you need for a target depth), rearrange the equation:
Required Reads = (Target Coverage × Target Size) / (Read Length × End Factor × Usable Fraction)
This calculator implements both directions: it estimates depth from your planned reads and estimates required reads for your target depth. You can also apply a reserve percentage to handle duplicates, off-target reads, and other practical losses that reduce callable depth.
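The two directions of the equation can be sketched in a few lines of Python. This is an illustrative implementation of the formulas above, not the calculator's actual source; the function names and the optional `reserve` parameter are my own choices.

```python
def effective_bases(read_count, read_length, paired, usable_fraction):
    """Effective bases from a planned read count (read_count = fragments/clusters)."""
    end_factor = 2 if paired else 1  # two reads per fragment when paired-end
    return read_count * read_length * end_factor * usable_fraction

def coverage(read_count, read_length, paired, usable_fraction, target_size):
    """Estimated mean depth (X) over the target region."""
    return effective_bases(read_count, read_length, paired, usable_fraction) / target_size

def required_reads(target_coverage, target_size, read_length, paired,
                   usable_fraction, reserve=0.0):
    """Reads needed for a target mean depth, with an optional reserve fraction."""
    end_factor = 2 if paired else 1
    reads = (target_coverage * target_size) / (read_length * end_factor * usable_fraction)
    return reads * (1.0 + reserve)  # reserve buffers duplicates and off-target loss
```

For example, `required_reads(30, 3.1e9, 150, True, 0.85)` returns roughly 364.7 million read pairs, matching the WGS worked example later in this guide.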
How to plan reads for different sequencing applications
Coverage targets vary by study objective, assay design, sample quality, and downstream analysis strategy. For germline whole-genome sequencing, 30X is a common baseline for high-quality short-read variant discovery, while 40X to 60X may be selected for difficult samples or more conservative quality goals. For exome projects, teams frequently target higher nominal depth because capture introduces non-uniformity; average depths around 80X to 150X are common depending on desired on-target performance and minimum-per-base thresholds.
Targeted oncology panels and amplicon workflows often require substantially deeper sequencing because users need sensitivity for low variant allele fractions. Depending on the application, expected VAFs, molecular barcoding, and error-correction strategy, practical depth may range from several hundred to several thousand X. Metagenomics and microbial sequencing strategies may prioritize either breadth across mixed communities or very deep sequencing of smaller genomes, depending on whether the objective is taxonomic profiling, assembly, or rare variant detection.
The right answer is never just a single depth number. Strong planning starts with analysis requirements: what is the minimum VAF you must detect, what false negative rate is acceptable, what fraction of target bases must exceed a threshold like 20X or 100X, and what quality filters will be applied in variant calling.
Real-world factors that change effective depth
Theoretical depth can differ substantially from usable depth. The usable percentage in this calculator captures combined effects from read filtering, mapping quality, duplicates, and assay-specific inefficiencies. In reality, these losses arise from several components:
- Adapter contamination, low-quality tails, and short surviving fragments after trimming.
- Alignment failures in repetitive or low-complexity regions.
- PCR duplicates and optical duplicates reducing unique molecular coverage.
- Off-target capture in exome/panel protocols.
- GC bias and sequence-context artifacts causing uneven depth distribution.
- Sample degradation, FFPE damage, or low-input library complexity.
If your historical pipeline metrics show that only 70% to 85% of sequenced bases remain truly informative for your endpoint, your read planning should reflect that range. Teams that ignore these factors often under-sequence and miss project-level quality targets, leading to re-runs that cost more than conservative front-end planning.
Worked examples for common study designs
Example 1: Human WGS at 30X. Suppose target size is 3.1 Gb, read length is 150 bp, paired-end mode is on, and usable fraction is 85%. Required effective bases are about 93 Gb. Accounting for usable yield, gross bases needed are approximately 109.4 Gb. Dividing by 300 bp per read pair gives around 364.7 million read pairs, before reserve. Adding 10% reserve for duplicates and variability brings planning closer to 401 million pairs.
Example 2: Exome capture project. Assume 180 Mb target, 150 bp paired-end reads, and a lower usable fraction for on-target callable bases due to capture non-uniformity. If practical usable fraction is 60% and target mean depth is 100X, effective bases needed are 18 Gb and gross bases are 30 Gb. That corresponds to roughly 100 million read pairs. If your QC history shows broader dispersion across samples, you may add a larger reserve to reduce under-covered exomes.
Example 3: Small amplicon panel. For a 30 kb panel at 2000X target depth, required effective bases are modest in absolute terms, but assays in liquid biopsy contexts often need very high molecular depth plus strict filtering. With paired 150 bp reads and 50% usable fraction after UMI consensus and stringent error correction, required read count can still become substantial relative to the small target size. This is why assay chemistry and bioinformatics requirements must drive depth planning, not target size alone.
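The arithmetic in Examples 1 and 2 can be reproduced directly; the helper below is a minimal sketch using only the parameters stated in the worked examples.

```python
def required_read_pairs(target_x, target_bp, read_len, usable, reserve=0.0):
    effective = target_x * target_bp      # effective bases for the target mean depth
    gross = effective / usable            # gross bases after dividing out usable fraction
    pairs = gross / (2 * read_len)        # 2 x read_len gross bases per read pair
    return pairs * (1 + reserve)

# Example 1: human WGS, 3.1 Gb target, 30X, 150 bp PE, 85% usable, 10% reserve.
wgs_pairs = required_read_pairs(30, 3.1e9, 150, 0.85, reserve=0.10)

# Example 2: exome, 180 Mb target, 100X, 150 bp PE, 60% usable, no reserve.
exome_pairs = required_read_pairs(100, 180e6, 150, 0.60)
```

Running this gives about 401 million pairs for the WGS case and about 100 million pairs for the exome case, matching the figures above.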
Best practices for accurate coverage forecasting
First, estimate coverage with realistic pipeline-based assumptions instead of idealized numbers. Use empirical historical metrics from your own lab, instrument model, library prep kit, and sample class. Second, plan depth using both average coverage and distribution targets (for example, percentage of target at ≥20X). Third, include reserve capacity in read planning to absorb variability in library quality and lane-to-lane yield.
It is also useful to run sensitivity scenarios before finalizing sequencing orders. Compare best-case, expected-case, and conservative-case assumptions for usable yield. This identifies how fragile the design is and where an incremental increase in planned reads can materially reduce downstream risk. For regulated, clinical-adjacent, or high-stakes translational pipelines, this type of pre-run stress test is often worth the effort.
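A sensitivity scenario can be as simple as re-running the gross-yield calculation under several usable-fraction assumptions. The scenario values below are illustrative placeholders; substitute your own historical QC range.

```python
# Hypothetical usable-yield assumptions; replace with your lab's QC history.
scenarios = {"best": 0.85, "expected": 0.78, "conservative": 0.70}

def gross_gb_needed(target_x, target_bp, usable):
    """Gross sequenced bases (in Gb) needed to reach the target mean depth."""
    return target_x * target_bp / usable / 1e9

for name, usable in scenarios.items():
    print(f"{name:>12}: {gross_gb_needed(30, 3.1e9, usable):.1f} Gb")
```

For 30X over 3.1 Gb, the spread between the best and conservative cases is roughly 109 Gb versus 133 Gb of gross yield, which quantifies how much buffer the order should carry.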
Finally, avoid mixing incompatible depth numbers from different analysis contexts. For example, “raw mapped depth,” “deduplicated depth,” and “callable depth” are not interchangeable. Teams should define a standard depth metric tied directly to the endpoint decision, then plan reads specifically to that metric.
Coverage vs. confidence in variant calling
Coverage alone does not guarantee variant call quality, but it strongly influences statistical confidence. For germline variants at high allele fractions, moderate depth often suffices when quality and mapping are good. For somatic low-frequency variants, confidence depends on both depth and error model assumptions. In those contexts, molecular barcodes, strand-aware filters, and panel-specific error suppression can matter as much as nominal depth.
The practical takeaway is simple: use this calculator for fast depth estimates and read planning, then pair those numbers with assay validation data and platform-specific performance benchmarks. That combination yields planning decisions that are both efficient and defensible.
Sequencing Coverage Calculator FAQ
What is a good coverage target for whole-genome sequencing?
For many short-read human germline WGS projects, 30X is a common baseline. Depending on quality goals, sample type, and analysis pipeline, some teams choose higher depth such as 40X or more.
Should I use genome size or target capture size in the calculator?
Use total genome size for WGS. For exome or targeted assays, use the total intended target region size, then apply a realistic usable percentage to account for off-target and non-uniform coverage.
Why does paired-end sequencing change coverage?
Paired-end libraries produce two reads per fragment, effectively doubling the sequenced bases per fragment relative to single-end mode (assuming equal read length). The calculator includes this factor automatically when paired-end is selected.
What does “usable bases” include?
It is an aggregate efficiency factor covering trimming loss, alignment loss, duplicates, off-target reads, and other filters that reduce bases contributing to final analysis.
Can average coverage hide poorly covered regions?
Yes. Average depth can look strong even when difficult regions remain under-covered. Always review coverage distribution metrics and minimum-depth thresholds across targets.