Understanding Genetic Imputation

Understanding Imputation: Filling in the Gaps in Your DNA

Imputation is a process that predicts genetic variants you weren’t directly tested for, based on the variants you were tested for. At SelfDecode, we use cutting-edge AI and machine learning to predict SNP (single-nucleotide polymorphism) genotypes, turning the data from your DNA kit into a more complete picture of your genome.



What is Genetic Imputation? The One-Minute Explanation

Your DNA is full of genetic variants — tiny differences that make you unique. A typical commercial DNA kit tests for about 650,000 variants. That’s a lot, but it still leaves out many variants that could affect your health.

Here’s where SelfDecode comes in:

  1. We take your ~650,000 tested variants and use AI to predict up to 200 million variants.
  2. We identify which of these variants impact your health.
  3. We calculate a genetic risk score, showing how your risk compares to others.
  4. Finally, we provide personalized lifestyle and health recommendations based on hundreds to thousands of studies, tailored to your DNA.

All of this information is available in your SelfDecode Reports.
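For readers curious about step 3: a common, textbook way to build a genetic risk score is a weighted sum, where each variant's effect size (taken from published studies) is multiplied by how many copies of the effect allele you carry, and the total is then ranked against scores from a reference population. The sketch below is a minimal Python illustration under those assumptions. The variant names, weights, and reference scores are made up, and this is not SelfDecode's actual scoring method, which this article does not describe.

```python
from bisect import bisect_left

def polygenic_score(dosages, weights):
    # Sum of (effect weight x copies of the effect allele) over the scored variants.
    # Imputed dosages may be fractional (e.g. 0.9) rather than exactly 0, 1 or 2.
    return sum(weights[v] * dosages[v] for v in weights)

def percentile(score, reference_scores):
    # Percentage of reference-population scores that fall strictly below `score`.
    ranked = sorted(reference_scores)
    return 100.0 * bisect_left(ranked, score) / len(ranked)

# Hypothetical variants, effect weights and reference scores, for illustration only.
weights = {"variant_a": 0.12, "variant_b": -0.05, "variant_c": 0.30}
dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0.9}
reference_scores = [0.10, 0.25, 0.30, 0.50, 0.70]

score = polygenic_score(dosages, weights)
print(round(score, 2), percentile(score, reference_scores))  # 0.46 60.0
```

Imputed variants can enter a score like this as dosages (expected allele counts between 0 and 2), which is why `variant_c` uses 0.9 rather than a whole number.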


The Long Explanation

How Imputation Works

To understand imputation, let’s review some basic biology:

  • You inherit half of your DNA from each parent, so your genome is a unique mix of their DNA.
  • DNA isn’t inherited one base pair at a time. Most of it is passed in larger “chunks” called haplotypes.

Think of your genome as a book:

  • Each letter is a base pair.
  • Words are haplotypes, sequences of letters inherited together.
  • Imputation works by using the letters you do have to figure out the word they spell, and then filling in the missing letters.

For example:

T E C H N O _ O G Y

The missing letter is L. You just predicted the full word using the letters you had.

In DNA, the letters are base pairs and the words are haplotypes. Once we identify a haplotype, we can fill in the missing base pairs that weren’t directly tested.
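The word analogy can be sketched in a few lines of Python. This toy version assumes a small, made-up reference panel of haplotypes and a single, already-phased haplotype with `?` at untested sites; real DNA data is diploid and usually unphased, and real imputation is far more involved. It fills the gaps only when exactly one reference haplotype agrees with every tested site.

```python
# Hypothetical reference panel: each string is one haplotype, one allele per site.
REFERENCE = ["ACGTTA", "ACGATA", "TCGATG", "ACCTTA"]

def consistent(observed, haplotype):
    # "?" marks a site the DNA kit did not test; every tested site must agree.
    return len(observed) == len(haplotype) and all(
        o == "?" or o == h for o, h in zip(observed, haplotype)
    )

def impute_exact(observed, reference):
    # Fill untested sites only when exactly one reference haplotype fits,
    # like recognising TECHNO_OGY as the only word that fits its letters.
    matches = [h for h in reference if consistent(observed, h)]
    return matches[0] if len(matches) == 1 else None

print(impute_exact("TC?A?G", REFERENCE))  # TCGATG
print(impute_exact("AC?T?A", REFERENCE))  # None (two haplotypes fit)
```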


Why Computers Are Needed

Sometimes the clues are much fewer, like this:

_ E _ _ N _ _ _ _ Y

Humans would struggle to guess the word from these clues. But AI and machine learning can analyze large datasets and predict the missing letters accurately — that’s how SelfDecode does genetic imputation at scale.
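When the clues are sparse, several reference haplotypes can fit, so imputation has to weigh the candidates instead of picking one. A simple way to show this is to count how often each allele appears among the matching haplotypes in a hypothetical reference panel. This is plain frequency counting chosen for illustration; production tools generally rely on statistical models such as hidden Markov models, or on machine learning, and this article does not describe the specific model SelfDecode uses.

```python
from collections import Counter

# Hypothetical panel; a haplotype listed three times is three times as common.
REFERENCE = ["ACGTTA", "ACGTTA", "ACGTTA", "ACCTTA", "TCGATG"]

def impute_probabilistic(observed, reference):
    # Keep every reference haplotype that agrees with all tested sites ("?" = untested).
    matches = [
        h for h in reference
        if len(h) == len(observed)
        and all(o == "?" or o == a for o, a in zip(observed, h))
    ]
    if not matches:
        return {}
    # For each untested site, report how often each allele occurs among the matches.
    result = {}
    for i, o in enumerate(observed):
        if o == "?":
            counts = Counter(h[i] for h in matches)
            result[i] = {allele: n / len(matches) for allele, n in counts.items()}
    return result

print(impute_probabilistic("AC?T?A", REFERENCE))
# {2: {'G': 0.75, 'C': 0.25}, 4: {'T': 1.0}}
```

The result is a probability for each allele rather than a single answer, which is why the size and quality of the reference data matter so much.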


How Accurate Is Genetic Imputation?

Genetic imputation is highly accurate when applied appropriately. For common genetic variants, imputation accuracy is typically around 99–99.9%, especially when high-quality reference populations are used.

However, accuracy decreases for rare variants, particularly those with a minor allele frequency (MAF) below 1%. These rare variants are harder to predict because there is less population data available to reliably infer them.
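Minor allele frequency is straightforward to compute from genotype counts. The sketch below assumes each person's genotype at one site is coded as 0, 1 or 2 copies of an allele, and uses the 1% cut-off mentioned above; the function names are illustrative.

```python
def minor_allele_frequency(genotypes):
    # Each genotype is the number of copies (0, 1 or 2) of one allele in a
    # diploid person, so a sample of n people carries 2n alleles at this site.
    if not genotypes:
        raise ValueError("need at least one genotype")
    freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever allele is less common.
    return min(freq, 1 - freq)

def is_rare(genotypes, threshold=0.01):
    # MAF below 1% is treated as rare, as in the text above.
    return minor_allele_frequency(genotypes) < threshold

# One carrier among 100 people: 1 copy out of 200 alleles.
carriers = [1] + [0] * 99
print(minor_allele_frequency(carriers), is_rare(carriers))  # 0.005 True
```

A variant carried by one person in a hundred appears only once among 200 alleles, so a reference panel offers very few examples of the haplotypes that carry it. That scarcity is the intuition behind the drop in accuracy.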

Because of this limitation, genetic imputation should not be relied upon for high-impact or clinically significant variants, such as:

  • APOE
  • BRCA1 / BRCA2
  • Other rare or medically actionable variants

How SelfDecode Uses Imputation

SelfDecode uses imputation to:

  • Improve coverage across the genome
  • Enhance polygenic risk scores
  • Identify patterns across many common SNPs

For critical medical genes and high-impact variants, SelfDecode relies on directly tested genetic data, not imputed results.

This approach allows us to balance:

  • Broad genetic insight
  • High accuracy
  • Responsible interpretation of medical risk

Imputation enhances genetic analysis — but it does not replace direct testing for clinically important variants.


Can I See Which SNPs Are Used for Imputation?

At this time, we’re not able to provide a list of specific SNPs used to impute individual genetic variants.

Why This Information Isn’t Available

SelfDecode uses an advanced imputation pipeline to infer additional genetic variants from your uploaded DNA file. This process relies on machine learning models trained on large reference datasets and evaluates thousands of surrounding variants at once to predict a single genotype.

Because of how imputation works:

  • There is no fixed or static list of SNPs used for each imputed variant
  • Different variants may be influenced by different combinations of neighboring SNPs
  • The models evaluate complex patterns across the genome rather than direct one-to-one mappings

As a result, it isn’t possible to isolate or meaningfully list the exact SNPs contributing to the imputation of a specific variant.

What You Can Expect

While we can’t provide individual SNP lists, we want to be transparent about the process:

  • Imputation is a widely used, peer-reviewed method in genetics research
  • Our pipeline analyzes over 200 million variants using high-quality reference data
  • The process is continuously refined as models and datasets improve
  • Imputed results are used to enhance accuracy and depth beyond raw genotyping alone

Why This Still Matters for Your Results

The goal of imputation is to provide more complete and reliable insights, not just more data. By evaluating broad genetic patterns rather than isolated SNPs, we’re able to deliver:

  • More comprehensive health reports
  • Better-informed risk assessments
  • Stronger science-backed recommendations

How Does SelfDecode Verify Imputation Accuracy?

We take accuracy very seriously, and we test our methods regularly. To make sure our imputation process is reliable, we test it against whole genome sequencing (WGS) data from the same person.

In simple terms: we check our work against the most complete DNA data available for the same person.
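One common way to score this kind of check is genotype concordance: the share of sites where the imputed genotype equals the WGS genotype for the same person. The sketch assumes genotypes coded as 0, 1 or 2 alternate-allele copies and uses made-up variant IDs; the article does not say which metric the accuracy figures are based on.

```python
def concordance(imputed, truth):
    # Both arguments map a variant ID to a genotype coded as 0, 1 or 2 copies
    # of the alternate allele; only sites present in both are compared.
    shared = imputed.keys() & truth.keys()
    if not shared:
        raise ValueError("no overlapping sites to compare")
    agree = sum(1 for v in shared if imputed[v] == truth[v])
    return agree / len(shared)

# Hypothetical imputed calls and WGS calls for the same person.
imputed = {"v1": 0, "v2": 1, "v3": 2, "v4": 1}
wgs = {"v1": 0, "v2": 1, "v3": 1, "v4": 1, "v5": 0}
print(concordance(imputed, wgs))  # 0.75
```

Because most sites in any one genome carry two copies of the common allele, plain concordance can look high even when rare variants are often wrong, so evaluations frequently also report accuracy by allele-frequency group or a correlation measure such as r².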

What This Means for You

  • Our imputation accuracy averages ~99.7%
  • This means that nearly all imputed genetic variants match what is seen in whole genome sequencing
  • Accuracy is highest when using our own DNA kit
  • Results are consistent across admixed populations

You can feel confident that imputed data used for reports and insights is highly reliable.

Factors That Affect Accuracy

  • Chip density: Higher SNP coverage in the original genotype file leads to higher imputation accuracy
  • Ancestry: Genetic ancestry also affects accuracy, though less than chip density does
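Accuracy is often examined per group as well as on average, because an overall figure is dominated by common variants. This sketch assumes hypothetical evaluation records (one site in one test sample, with its MAF, the chip used, and the imputed and WGS genotypes) and hypothetical MAF bin edges. The same function can group by allele frequency, chip, or ancestry.

```python
from collections import defaultdict

def maf_bin(maf):
    # Hypothetical bin edges; the 1% cut-off matches the rare-variant threshold.
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"

def accuracy_by_group(sites, group_of):
    # Share of sites whose imputed genotype matches the WGS genotype, per group.
    totals, correct = defaultdict(int), defaultdict(int)
    for site in sites:
        group = group_of(site)
        totals[group] += 1
        correct[group] += site["imputed"] == site["truth"]
    return {g: correct[g] / totals[g] for g in totals}

# Hypothetical evaluation records.
sites = [
    {"maf": 0.30, "chip": "dense", "imputed": 1, "truth": 1},
    {"maf": 0.20, "chip": "sparse", "imputed": 0, "truth": 0},
    {"maf": 0.45, "chip": "dense", "imputed": 2, "truth": 2},
    {"maf": 0.03, "chip": "sparse", "imputed": 1, "truth": 0},
    {"maf": 0.005, "chip": "sparse", "imputed": 0, "truth": 1},
    {"maf": 0.008, "chip": "dense", "imputed": 1, "truth": 1},
]
print(accuracy_by_group(sites, lambda s: maf_bin(s["maf"])))
# {'common': 1.0, 'low-frequency': 0.0, 'rare': 0.5}
print(accuracy_by_group(sites, lambda s: s["chip"]))
# {'dense': 1.0, 'sparse': 0.3333333333333333}
```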