What Makes SelfDecode WGS Unique

Whole genome sequencing (WGS) by SelfDecode is built on a clinically rigorous pipeline — from sequencing depth and methodology to variant calling, filtering, and AI-powered interpretation. Most consumer genomics providers cut corners at multiple stages; we don't.

Summary: What Sets SelfDecode WGS Apart

Factor	Many Competitors	SelfDecode
Sequencing type	Exome, arrays, or imputed low-pass	True whole genome
Coverage depth	10x or inconsistent "up to 30x"	True 30x+
Read type	Single-end (some providers)	Paired-end
Variant caller	bcftools or unvalidated proprietary tools	DRAGEN
Reference genome	GRCh37 (outdated)	GRCh38 (current)
Validation	Self-reported or none	GIAB + PrecisionFDA benchmarking
Quality filtering	Minimal or absent	Carefully calibrated, validated
Data delivery	Bloated, unfiltered, undocumented	Filtered, annotated, documented
AI interpretation	Raw data fed to generic AI	Validated data + purpose-built framework

What "Whole Genome Sequencing" Actually Means

Not all WGS is equal. Across the industry, the term has become a marketing label that covers several very different products:

Exome sequencing — covers only ~1–2% of the genome (protein-coding regions). Misses regulatory regions, structural variants, and non-coding mutations increasingly linked to disease.
Genotyping arrays — not sequencing at all. Tests a predefined list of known variants (e.g., 23andMe). Cannot detect rare or novel variants by design.
Low-pass sequencing with imputation — sequences at very shallow depth (0.5x–4x) and statistically "fills in" the rest using reference population data. The result looks like WGS but is largely inferred. Accuracy varies significantly by ancestry — least reliable for non-European clients.

SelfDecode performs true WGS: the full genome, no shortcuts.

Coverage Depth: Why 30x Is the Standard

Clinical-grade WGS is performed at 30x mean coverage — each position in the genome is independently read ~30 times. This redundancy is what allows confident, accurate variant calls.

At 10x (commonly marketed as a "budget" option):

Heterozygous variant accuracy drops meaningfully
Many clinically important regions receive fewer than 5 reads
False positive and false negative rates increase significantly

An additional caveat: some providers advertise "30x" but deliver as little as 10x–30x with no minimum floor — meaning the data isn't suitable for rare variant detection. We use true 30x+ coverage across the full genome.

How We Sequence: Paired-End Reads

SelfDecode uses paired-end sequencing, reading both ends of each DNA fragment. This is required for:

Reliable PCR duplicate removal (single-end reads cannot support this)
Accurate mapping in repetitive genome regions
Structural variant detection

Single-end sequencing inflates apparent coverage while reducing true information content and introducing systematic calling errors.

Variant Calling: Where Most Pipelines Fail

Turning raw sequencing reads into a usable variant list is the most consequential step in the pipeline — and the most commonly compromised.

Key issues in the industry:

Problem	Impact
Old reference genome (build 37/hg19)	Missed variants, mislocalized calls, incompatibility with modern databases
Weak variant callers (e.g., bcftools-only)	No longer accepted for clinical-grade WGS; lower accuracy than current tools
Unvalidated proprietary algorithms	No external benchmarking — no way to verify accuracy
Population bias	Pipelines optimized on European-ancestry data; higher error rates for other ancestries

We use DRAGEN — the industry-leading variant caller used in clinical laboratories worldwide — called against GRCh38 (the current standard reference genome) and benchmarked against truth sets from the Genome in a Bottle Consortium and PrecisionFDA challenges. We know our sensitivity and specificity numbers because we've measured them.

Quality Filtering: The Invisible Step

Raw variant calls always contain errors. Proper filtering removes artifacts; improper filtering either lets them through or removes real variants.

Providers who skip this step deliver bloated, unfiltered files that look comprehensive but contain significant noise. Feeding unfiltered data to any AI system — including an excellent one — produces confident-sounding analysis built on artifacts.

SelfDecode delivers:

Carefully calibrated quality filters, tested across different ancestral backgrounds and genomic contexts
Full quality metrics retained (read depth, mapping quality, allele balance, genotype quality scores)
Clean, compressed, annotated output — not a 4–20 GB unfiltered dump

QA/QC at Every Stage

We run quality checks at every step: sample receipt, library prep, sequencing, alignment, variant calling, filtering, and annotation. Metrics monitored include mean coverage, coverage uniformity, duplicate rate, contamination estimates, sex concordance, and ancestry-informed checks. Samples that don't meet standards are flagged and reprocessed.

AI Interpretation — Built on Validated Data

AI analysis is only as good as the data underneath it. An LLM analyzing a variant list full of false positives will produce well-formatted, citation-backed, authoritative-sounding recommendations — built on variants that don't exist.

Our AI operates on data that has already passed through a validated calling and filtering pipeline, annotated with current clinical databases. Our AI framework:

Enforces evidence standards and flags genuine uncertainty
Cross-references multiple databases
Distinguishes established pathogenic variants from speculative associations
Avoids overstating confidence or building risk assessments on artifacts

The result is a complete, validated chain: sequencing → variant calling → filtering → annotation → interpretation. Each link has been rigorously tested.

Client-Facing Reporting

Accurate genomic data is only useful if it's communicated clearly. Most providers either deliver no report at all, or produce outputs so dense and technical that clients can't act on them.

SelfDecode delivers well-structured, accessible reports that translate validated genomic findings into clear, actionable insights — formatted for clients to review directly, on their own or with their practitioner's help.

Questions? You can reach us at support@selfdecode.com - we'll be happy to help!