Understanding Genetic Imputation: Filling in the Gaps in Your DNA

Imputation is a process that predicts genetic variants you weren't directly tested for, based on the variants you were tested for. At SelfDecode, we use cutting-edge AI and machine learning to predict single-nucleotide polymorphism (SNP) genotypes, turning the data from your DNA kit into a more complete picture of your genome.


What Is Genetic Imputation? The Quick Explanation

Your DNA contains genetic variants — tiny differences that make you unique. A typical commercial DNA kit tests for about 650,000 variants. That's substantial, but it still leaves out many variants that could affect your health.

Here's where SelfDecode comes in:

  1. We take your ~650,000 tested variants and use AI to predict up to 200 million variants.
  2. We identify which of these variants impact your health.
  3. We calculate a genetic risk score, showing how your risk compares to others.
  4. Finally, we provide personalized lifestyle and health recommendations based on hundreds to thousands of studies, tailored to your DNA.

All of this information is available in your SelfDecode reports.
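Step 3 can be illustrated with the standard additive approach to polygenic risk scores. Multiply each variant's effect weight by how many copies of the effect allele you carry, add the products, then rank the total against a reference group. Imputed results are often expressed as a fractional "dosage" between 0 and 2 rather than a whole number. This is a minimal sketch, not SelfDecode's scoring model. The variant IDs, weights, and reference scores are made up for illustration, and `polygenic_score` and `percentile_rank` are hypothetical helper names.

```python
import bisect


def polygenic_score(dosages: dict[str, float], weights: dict[str, float]) -> float:
    """Additive risk score: sum of effect weight x effect-allele dosage (0 to 2).

    Variants without a dosage are skipped rather than guessed.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


def percentile_rank(score: float, reference_scores: list[float]) -> float:
    """Percentage of reference scores strictly below this score."""
    ranked = sorted(reference_scores)
    return 100.0 * bisect.bisect_left(ranked, score) / len(ranked)


# Hypothetical variants and weights, for illustration only.
weights = {"var_a": 0.20, "var_b": -0.10, "var_c": 0.05}
dosages = {"var_a": 2, "var_b": 1, "var_c": 1.6}  # var_c: an imputed, fractional dosage

score = polygenic_score(dosages, weights)
print(round(score, 2), percentile_rank(score, [0.1, 0.2, 0.3, 0.5, 0.6]))  # 0.38 60.0
```

Here a score of 0.38 sits above three of the five reference scores, so it lands at the 60th percentile.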

How Imputation Works in Detail

The Basic Biology

To understand imputation, let's review some fundamentals:

  • You inherit half of your DNA from each parent, so your genome is a unique mix of their DNA.
  • DNA isn't inherited one base pair at a time. Most of it is passed in larger "chunks" called haplotypes.

Think of your genome as a book: each letter is a base pair, and words are haplotypes (sequences of letters inherited together). Imputation works by using the letters you have to figure out the word they spell, then filling in the missing letters.

For example:

T E C H N O _ O G Y

The missing letter is L. You predicted the full word using the letters you had.

In DNA, the letters are base pairs and the words are haplotypes. Once we identify a haplotype, we can fill in the missing base pairs that weren't directly tested.
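The "find the word, then fill in the letters" idea can be sketched as a lookup against a reference panel of known haplotypes. This is a deliberately simplified, exact-match version on a single haplotype. Real pipelines work with genotypes from both chromosome copies and use statistical or machine-learning models, so treat it as an illustration, not SelfDecode's method. The panel and the `impute_haplotype` helper are hypothetical.

```python
from collections import Counter
from typing import Optional

# Toy reference panel: each string is one haplotype, one allele per variant site.
REFERENCE_PANEL = ["ACGTTA", "ACGTTA", "ATGCTA", "GCGTCA"]


def impute_haplotype(observed: list[Optional[str]], panel: list[str]) -> str:
    """Fill untested sites (None) from reference haplotypes that agree at every tested site."""
    matches = [
        hap for hap in panel
        if all(allele is None or allele == hap[i] for i, allele in enumerate(observed))
    ]
    if not matches:
        raise ValueError("no reference haplotype matches the tested sites")
    filled = []
    for i, allele in enumerate(observed):
        if allele is not None:
            filled.append(allele)  # directly tested: keep as-is
        else:
            # untested: take the most common allele among matching haplotypes
            filled.append(Counter(hap[i] for hap in matches).most_common(1)[0][0])
    return "".join(filled)


print(impute_haplotype(["A", None, "G", None, "T", None], REFERENCE_PANEL))  # ACGTTA
```

Three of the four reference haplotypes agree with the tested sites (A, G, T), and their most common alleles fill in the three gaps.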

Why Computers Are Essential

Sometimes there are far fewer clues:

_ E _ _ N _ _ _ _ Y

Humans would struggle to predict the word from these limited clues. But AI and machine learning can analyze large datasets and predict the missing letters accurately — that's how SelfDecode performs genetic imputation at scale.
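With sparse clues, several reference haplotypes can fit equally well, so a prediction is better expressed as a probability than a single answer. The sketch below pools every reference haplotype and down-weights those that disagree with the tested sites. The weighting rule and the `mismatch_penalty` parameter are a simple heuristic chosen for illustration, not the model SelfDecode uses, and the panel is invented.

```python
from collections import defaultdict
from typing import Optional


def allele_probabilities(
    observed: list[Optional[str]],
    panel: list[str],
    site: int,
    mismatch_penalty: float = 0.1,  # must be > 0 so every haplotype keeps some weight
) -> dict[str, float]:
    """Probability of each allele at an untested site, pooled over all reference haplotypes.

    Each disagreement with a tested site multiplies a haplotype's weight by
    mismatch_penalty, so close matches dominate without excluding the rest.
    """
    totals: dict[str, float] = defaultdict(float)
    for hap in panel:
        mismatches = sum(
            1 for i, allele in enumerate(observed) if allele is not None and allele != hap[i]
        )
        totals[hap[site]] += mismatch_penalty ** mismatches
    norm = sum(totals.values())
    return {allele: weight / norm for allele, weight in totals.items()}


panel = ["ACGTTA", "ATGCTA", "GCGTCA", "ACGCTG"]
sparse = ["A", None, None, None, None, None]  # only the first site was tested
print(allele_probabilities(sparse, panel, site=3))  # roughly 65% C, 35% T
```

With only one tested site, the answer at site 3 stays uncertain; adding more tested sites would narrow the set of close matches.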

How Accurate Is Genetic Imputation?

Genetic imputation is highly accurate when applied appropriately. For common genetic variants, imputation accuracy is typically around 99–99.9%, especially when high-quality reference populations are used.

However, accuracy decreases for rare variants, particularly those with a minor allele frequency (MAF) below 1%. These rare variants are harder to predict because there is less population data available to reliably infer them.

Because of this limitation, genetic imputation should not be relied upon for high-impact or clinically significant variants, such as:

  • APOE
  • BRCA1 / BRCA2
  • Other rare or medically actionable variants
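To make the 1% cutoff concrete, minor allele frequency is the share of chromosome copies in a population that carry the less common allele. With a single heterozygous carrier among 100 people, the minor allele appears on 1 of 200 copies, a MAF of 0.5%. In a reference panel of 1,000 haplotypes, a variant at that frequency shows up only about 5 times, which leaves little data to learn from. A short sketch, assuming genotypes coded as counts of one allele (0, 1, or 2); the function names are hypothetical.

```python
def minor_allele_frequency(genotypes: list[int]) -> float:
    """MAF from genotypes coded as alternate-allele counts (0, 1 or 2), one per person."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    alt_freq = sum(genotypes) / (2 * len(genotypes))  # each person carries two copies
    return min(alt_freq, 1 - alt_freq)


def is_rare(genotypes: list[int], threshold: float = 0.01) -> bool:
    """True when the minor allele frequency falls below the threshold (1% by default)."""
    return minor_allele_frequency(genotypes) < threshold


cohort = [0] * 99 + [1]  # 100 people, a single heterozygous carrier
print(minor_allele_frequency(cohort), is_rare(cohort))  # 0.005 True
```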

How SelfDecode Uses Imputation

SelfDecode uses imputation to improve coverage across the genome, enhance polygenic risk scores, and identify patterns across many common SNPs.

For critical medical genes and high-impact variants, SelfDecode relies on directly tested genetic data, not imputed results. This approach balances broad genetic insight, high accuracy, and responsible interpretation of medical risk.
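The rule of "directly tested data only" for high-impact genes can be expressed as a simple filter. This is a sketch under stated assumptions: the gene set, the `GenotypeCall` record, and the `reportable_calls` helper are hypothetical, and SelfDecode's actual list of critical genes isn't published here.

```python
from dataclasses import dataclass

# Hypothetical example set; not SelfDecode's actual list of critical genes.
DIRECT_ONLY_GENES = {"APOE", "BRCA1", "BRCA2"}


@dataclass
class GenotypeCall:
    variant_id: str
    gene: str
    genotype: str
    imputed: bool  # True if predicted by imputation, False if read directly from the kit


def reportable_calls(calls: list[GenotypeCall]) -> list[GenotypeCall]:
    """Drop imputed calls in high-impact genes; keep everything else."""
    return [c for c in calls if not (c.imputed and c.gene in DIRECT_ONLY_GENES)]


calls = [
    GenotypeCall("example_1", "APOE", "CT", imputed=False),
    GenotypeCall("example_2", "BRCA1", "AG", imputed=True),
    GenotypeCall("example_3", "FTO", "AT", imputed=True),
]
print([c.variant_id for c in reportable_calls(calls)])  # ['example_1', 'example_3']
```

The directly tested APOE call and the imputed call outside the critical set pass; only the imputed BRCA1 call is held back.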

Can I See Which SNPs Are Used for Imputation?

At this time, we're not able to provide a list of specific SNPs used to impute individual genetic variants.

Why This Information Isn't Available

SelfDecode uses an advanced imputation pipeline to infer additional genetic variants from your DNA file. This process relies on machine learning models trained on large reference datasets and evaluates thousands of surrounding variants at once to predict a single genotype.

Because of how imputation works:

  • There is no fixed or static list of SNPs used for each imputed variant
  • Different variants may be influenced by different combinations of neighboring SNPs
  • The models evaluate complex patterns across the genome rather than direct one-to-one mappings

As a result, it isn't possible to isolate or meaningfully list the exact SNPs contributing to the imputation of a specific variant.
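Even a deliberately simple scheme shows why there is no single list. If each untested variant only looked at tested SNPs within a fixed distance, the contributing set would still change from one target to the next. The fixed window, the positions, and the `context_sites` helper are assumptions for illustration; the models described above weigh far larger and more complex patterns than a window like this.

```python
import bisect


def context_sites(typed_positions: list[int], target: int, half_width: int) -> list[int]:
    """Tested positions within half_width base pairs of an untested target.

    typed_positions must be sorted in ascending order.
    """
    lo = bisect.bisect_left(typed_positions, target - half_width)
    hi = bisect.bisect_right(typed_positions, target + half_width)
    return typed_positions[lo:hi]


typed = [100, 250, 400, 900, 1200, 1500]  # hypothetical positions tested by a DNA kit
print(context_sites(typed, target=300, half_width=200))   # [100, 250, 400]
print(context_sites(typed, target=1000, half_width=200))  # [900, 1200]
```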

What You Can Expect

While we can't provide individual SNP lists, we're transparent about the process:

  • Imputation is a well-established method, widely used and validated in peer-reviewed genetics research
  • Our pipeline analyzes over 200 million variants using high-quality reference data
  • The process is continuously refined as models and datasets improve
  • Imputed results are used to enhance accuracy and depth beyond raw genotyping alone

Why This Matters for Your Results

The goal of imputation is to provide more complete and reliable insights, not just more data. By evaluating broad genetic patterns rather than isolated SNPs, we deliver more comprehensive health reports, better-informed risk assessments, and stronger science-backed recommendations.

How Does SelfDecode Verify Imputation Accuracy?

We take accuracy seriously and test our methods regularly. To ensure our imputation process is reliable, we test it against whole genome sequencing (WGS) data from the same person — essentially checking our work against the most complete DNA data available.
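One common way to score this kind of comparison is genotype concordance: the share of variants where the imputed genotype matches the sequenced one. The article doesn't specify which metric underlies its accuracy figure, so treat this as an assumed, simplified version; imputation studies also often report the squared correlation (r²) between imputed dosages and true genotypes. The variant IDs and genotypes below are invented.

```python
def normalize(genotype: str) -> str:
    """Treat unordered genotypes as equal, e.g. 'AG' and 'GA'."""
    return "".join(sorted(genotype))


def concordance(imputed: dict[str, str], wgs: dict[str, str]) -> float:
    """Fraction of variants present in both call sets whose genotypes agree."""
    shared = imputed.keys() & wgs.keys()
    if not shared:
        raise ValueError("no overlapping variants to compare")
    agree = sum(1 for v in shared if normalize(imputed[v]) == normalize(wgs[v]))
    return agree / len(shared)


imputed = {"v1": "AG", "v2": "CC", "v3": "TT", "v4": "AA"}
wgs = {"v1": "GA", "v2": "CC", "v3": "CT", "v4": "AA", "v5": "GG"}
print(concordance(imputed, wgs))  # 0.75
```

Only variants present in both files are compared (v5 is ignored), and v3 is the single mismatch out of four.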

What This Means for You

  • Our imputation accuracy averages ~99.7%
  • This means nearly all imputed genetic variants match what is seen in full genome sequencing
  • Accuracy is highest when using our own DNA kit
  • Results are consistent across admixed populations

You can feel confident that imputed data used for reports and insights is highly reliable.

Factors That Affect Accuracy

  • Chip density: Higher SNP coverage in the original genotype file leads to higher imputation accuracy
  • Ancestry: Genetic ancestry also has a measurable effect on accuracy, though smaller than that of chip density