What Makes SelfDecode WGS Unique
Whole genome sequencing (WGS) by SelfDecode is built on a clinically rigorous pipeline — from sequencing depth and methodology to variant calling, filtering, and AI-powered interpretation. Most consumer genomics providers cut corners at multiple stages; we don't.
Summary: What Sets SelfDecode WGS Apart
| Factor | Many Competitors | SelfDecode |
|---|---|---|
| Sequencing type | Exome, arrays, or imputed low-pass | True whole genome |
| Coverage depth | 10x or inconsistent "up to 30x" | True 30x+ |
| Read type | Single-end (some providers) | Paired-end |
| Variant caller | bcftools or unvalidated proprietary tools | DRAGEN |
| Reference genome | GRCh37 (outdated) | GRCh38 (current) |
| Validation | Self-reported or none | GIAB + PrecisionFDA benchmarking |
| Quality filtering | Minimal or absent | Carefully calibrated, validated |
| Data delivery | Bloated, unfiltered, undocumented | Filtered, annotated, documented |
| AI interpretation | Raw data fed to generic AI | Validated data + purpose-built framework |
What "Whole Genome Sequencing" Actually Means
Not all WGS is equal. Across the industry, the term has become a marketing label that covers several very different products:
- Exome sequencing: covers only ~1–2% of the genome (the protein-coding regions). It misses regulatory regions, many structural variants, and non-coding mutations increasingly linked to disease.
- Genotyping arrays: not sequencing at all. An array tests a predefined list of known variants (the approach used by 23andMe, for example) and cannot detect rare or novel variants by design.
- Low-pass sequencing with imputation: sequences at very shallow depth (0.5x–4x) and statistically "fills in" the rest using reference population data. The result looks like WGS but is largely inferred. Accuracy varies significantly by ancestry and is typically lowest for clients of non-European ancestry.
SelfDecode performs true WGS: the full genome, no shortcuts.
Coverage Depth: Why 30x Is the Standard
Clinical-grade WGS is performed at 30x mean coverage: on average, each position in the genome is read about 30 times by independent DNA fragments. This redundancy is what allows confident, accurate variant calls.
At 10x (commonly marketed as a "budget" option):
- Heterozygous variant accuracy drops meaningfully
- Many clinically important regions receive fewer than 5 reads
- False positive and false negative rates increase significantly
An additional caveat: some providers advertise "30x" but deliver anywhere from 10x to 30x with no guaranteed minimum, which means the data may not be suitable for rare variant detection. We deliver true 30x+ mean coverage across the full genome.
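One rough way to see why depth matters is to model the read count at each position as Poisson-distributed around the mean coverage. The sketch below is an idealized illustration, not SelfDecode's pipeline: real coverage is less uniform than a Poisson model, and the `min_alt_reads=3` threshold is an assumed value chosen for the example.

```python
import math

def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))

def frac_below(min_reads: int, mean_depth: float) -> float:
    """Expected fraction of positions covered by fewer than min_reads reads."""
    return poisson_cdf(min_reads - 1, mean_depth)

def het_detect_prob(depth: int, min_alt_reads: int = 3) -> float:
    """P(at least min_alt_reads reads carry the alt allele) at a heterozygous
    site with exactly `depth` reads, modeled as Binomial(depth, 0.5)."""
    miss = sum(math.comb(depth, i) for i in range(min_alt_reads)) / 2**depth
    return 1.0 - miss

for depth in (10, 30):
    print(f"{depth}x: frac<5 reads={frac_below(5, depth):.2e}, "
          f"het detect prob={het_detect_prob(depth):.6f}")
```

Under these simplified assumptions, about 3% of positions fall below 5 reads at 10x, while the fraction at 30x is negligible. Real 10x data generally looks worse than this model because coverage also varies with GC content and mappability.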
How We Sequence: Paired-End Reads
SelfDecode uses paired-end sequencing, reading both ends of each DNA fragment. This is required for:
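- Reliable PCR duplicate removal (a single-end read exposes only one end of the fragment, so duplicates cannot be reliably told apart from independent fragments that start at the same position)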
- Accurate mapping in repetitive genome regions
- Structural variant detection
Single-end sequencing inflates apparent coverage while reducing true information content and introducing systematic calling errors.
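The duplicate-removal point can be illustrated with a toy example. This is a simplified sketch, not how production tools work in detail: real duplicate markers also consider strand and orientation, and the `Fragment` type and coordinates here are hypothetical.

```python
from collections import Counter
from typing import NamedTuple

class Fragment(NamedTuple):
    chrom: str
    start: int  # aligned start of the first read
    end: int    # aligned end of the fragment, known only from the mate

def count_duplicates(fragments, paired: bool) -> int:
    """Count reads flagged as duplicates: all but one read per identical key.
    Paired-end data keys on both fragment ends; single-end only on the start."""
    keys = Counter(
        (f.chrom, f.start, f.end) if paired else (f.chrom, f.start)
        for f in fragments
    )
    return sum(n - 1 for n in keys.values())

fragments = [
    Fragment("chr1", 100, 450),
    Fragment("chr1", 100, 450),  # true PCR duplicate of the first fragment
    Fragment("chr1", 100, 520),  # independent fragment sharing the same start
]

paired_dups = count_duplicates(fragments, paired=True)
single_dups = count_duplicates(fragments, paired=False)
print(paired_dups, single_dups)
```

With both ends available, only the true duplicate is flagged. With the start coordinate alone, the independent fragment is flagged as well, so real signal is discarded.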
Variant Calling: Where Most Pipelines Fail
Turning raw sequencing reads into a usable variant list is the most consequential step in the pipeline — and the most commonly compromised.
Key issues in the industry:
| Problem | Impact |
|---|---|
| Old reference genome (build 37/hg19) | Missed variants, mislocalized calls, incompatibility with modern databases |
| Weak variant callers (e.g., bcftools-only) | No longer accepted for clinical-grade WGS; lower accuracy than current tools |
| Unvalidated proprietary algorithms | No external benchmarking — no way to verify accuracy |
| Population bias | Pipelines optimized on European-ancestry data; higher error rates for other ancestries |
We use DRAGEN — the industry-leading variant caller used in clinical laboratories worldwide — called against GRCh38 (the current standard reference genome) and benchmarked against truth sets from the Genome in a Bottle Consortium and PrecisionFDA challenges. We know our sensitivity and specificity numbers because we've measured them.
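Benchmarking against a truth set comes down to counting agreements and disagreements. The sketch below assumes variants are compared as exact `(chrom, pos, ref, alt)` tuples, which is a simplification: dedicated benchmarking tools also normalize variant representation and restrict the comparison to confident regions. The variants shown are made-up toy data, not real results. Because nearly every position in a genome is a true negative, comparisons of this kind usually report precision alongside sensitivity.

```python
def benchmark(calls: set, truth: set) -> dict:
    """Compare called variants against a truth set by exact match."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision, "f1": f1}

# Toy data for illustration only.
truth = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
         ("chr2", 500, "G", "A"), ("chr3", 750, "T", "C")}
calls = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
         ("chr2", 500, "G", "A"), ("chr4", 900, "A", "T")}

result = benchmark(calls, truth)
print(result)
```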
Quality Filtering: The Invisible Step
Raw variant calls always contain errors. Proper filtering removes artifacts; improper filtering either lets them through or removes real variants.
Providers who skip this step deliver bloated, unfiltered files that look comprehensive but contain significant noise. Feeding unfiltered data to any AI system — including an excellent one — produces confident-sounding analysis built on artifacts.
SelfDecode delivers:
- Carefully calibrated quality filters, tested across different ancestral backgrounds and genomic contexts
- Full quality metrics retained (read depth, mapping quality, allele balance, genotype quality scores)
- Clean, compressed, annotated output — not a 4–20 GB unfiltered dump
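A per-variant filter built on the quality metrics listed above might look like the following sketch. All thresholds, field names, and the `Call` type are hypothetical values chosen for illustration; they are not SelfDecode's calibrated filters.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only.
MIN_DEPTH = 10
MIN_MAPQ = 30.0
MIN_GQ = 20
HET_ALT_FRACTION = (0.25, 0.75)   # plausible allele balance for a het call
HOM_ALT_MIN_FRACTION = 0.85       # minimum alt fraction for a hom-alt call

@dataclass(frozen=True)
class Call:
    variant_id: str
    genotype: str        # "0/1" (heterozygous) or "1/1" (homozygous alt)
    depth: int           # read depth at the site
    mapq: float          # mean mapping quality of supporting reads
    gq: int              # genotype quality
    alt_fraction: float  # fraction of reads carrying the alt allele

def failure_reasons(call: Call) -> list:
    """Return the names of every filter the call fails (empty list = pass)."""
    reasons = []
    if call.depth < MIN_DEPTH:
        reasons.append("low_depth")
    if call.mapq < MIN_MAPQ:
        reasons.append("low_mapq")
    if call.gq < MIN_GQ:
        reasons.append("low_gq")
    if call.genotype == "0/1":
        low, high = HET_ALT_FRACTION
        if not low <= call.alt_fraction <= high:
            reasons.append("allele_imbalance")
    elif call.genotype == "1/1" and call.alt_fraction < HOM_ALT_MIN_FRACTION:
        reasons.append("allele_imbalance")
    return reasons

calls = [
    Call("chr1:1000:A>G", "0/1", 31, 60.0, 99, 0.48),
    Call("chr2:2000:C>T", "0/1", 6, 60.0, 15, 0.17),
]
passing = [c for c in calls if not failure_reasons(c)]
print([c.variant_id for c in passing])
```

Recording the reason for each failure, rather than silently dropping calls, keeps the quality metrics available for later review.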
QA/QC at Every Stage
We run quality checks at every step: sample receipt, library prep, sequencing, alignment, variant calling, filtering, and annotation. Metrics monitored include mean coverage, coverage uniformity, duplicate rate, contamination estimates, sex concordance, and ancestry-informed checks. Samples that don't meet standards are flagged and reprocessed.
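As an illustration, a sample-level gate over a few of these metrics could be sketched as follows. The thresholds, the function name `qc_flags`, and the use of "fraction of positions at 20x or more" as a uniformity measure are all assumptions made for this example, not SelfDecode's actual criteria.

```python
from statistics import mean

# Hypothetical sample-level thresholds for illustration only.
MIN_MEAN_COVERAGE = 30.0
MIN_FRAC_AT_20X = 0.90
MAX_DUPLICATE_RATE = 0.15
MAX_CONTAMINATION = 0.02

def qc_flags(depths, duplicate_rate, contamination, reported_sex, inferred_sex):
    """Return the list of QC checks a sample fails (empty list = pass)."""
    flags = []
    if mean(depths) < MIN_MEAN_COVERAGE:
        flags.append("mean_coverage")
    frac_at_20x = sum(d >= 20 for d in depths) / len(depths)
    if frac_at_20x < MIN_FRAC_AT_20X:
        flags.append("coverage_uniformity")
    if duplicate_rate > MAX_DUPLICATE_RATE:
        flags.append("duplicate_rate")
    if contamination > MAX_CONTAMINATION:
        flags.append("contamination")
    if reported_sex != inferred_sex:
        flags.append("sex_concordance")
    return flags

# Toy per-position depths; a real sample has billions of positions.
good = qc_flags([31, 33, 30, 34, 29, 32, 35, 30], 0.08, 0.001, "F", "F")
bad = qc_flags([32, 30, 28, 35, 4, 31, 29, 33], 0.08, 0.001, "F", "F")
print(good, bad)
```

A sample with any flags would be held back for review or reprocessing rather than passed on to variant interpretation.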
AI Interpretation — Built on Validated Data
AI analysis is only as good as the data underneath it. An LLM analyzing a variant list full of false positives will produce well-formatted, citation-backed, authoritative-sounding recommendations — built on variants that don't exist.
Our AI operates on data that has already passed through a validated calling and filtering pipeline, annotated with current clinical databases. Our AI framework:
- Enforces evidence standards and flags genuine uncertainty
- Cross-references multiple databases
- Distinguishes established pathogenic variants from speculative associations
- Avoids overstating confidence or building risk assessments on artifacts
The result is a complete, validated chain: sequencing → variant calling → filtering → annotation → interpretation. Each link has been rigorously tested.
Client-Facing Reporting
Accurate genomic data is only useful if it's communicated clearly. Most providers either deliver no report at all, or produce outputs so dense and technical that clients can't act on them.
SelfDecode delivers well-structured, accessible reports that translate validated genomic findings into clear, actionable insights — formatted for clients to review directly, on their own or with their practitioner's help.
Questions? You can reach us at support@selfdecode.com, and we'll be happy to help!