How do you verify imputation accuracy?

We regularly assess our imputation accuracy by comparing the imputed data with whole genome sequencing data from the same sample.

The testing process always starts with two files from the same test sample. One file contains genotype data (~700k SNPs), and the other contains whole genome sequencing data (>73M SNPs).

We run imputation over the genotype file using our pipeline, and afterward, we compare the imputed SNPs with the SNPs in the whole genome sequencing file using concordance rate as a metric of evaluation.
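As a rough illustration of the comparison step, the sketch below computes a concordance rate between imputed genotypes and whole genome sequencing genotypes. It is not our pipeline code. It assumes genotypes are held in plain dictionaries keyed by SNP ID, that only SNPs present in both files are compared, and that a SNP counts as concordant when both alleles match regardless of order. The function names and the sample data are hypothetical.

```python
def normalize(genotype):
    """Treat a genotype as an unordered allele pair, so 'A/G' equals 'G/A'."""
    return tuple(sorted(genotype.replace("|", "/").split("/")))


def concordance_rate(imputed, truth):
    """Fraction of shared SNPs whose imputed genotype matches the WGS genotype.

    Assumption: both arguments map SNP ID -> genotype string such as 'A/G'.
    """
    shared = imputed.keys() & truth.keys()
    if not shared:
        raise ValueError("the two files have no SNPs in common")
    matches = sum(normalize(imputed[s]) == normalize(truth[s]) for s in shared)
    return matches / len(shared)


# Hypothetical toy data: three SNPs are shared, two of them agree.
imputed = {"rs1": "A/G", "rs2": "C/C", "rs3": "T/T", "rs4": "G/G"}
wgs = {"rs1": "G/A", "rs2": "C/C", "rs3": "T/C", "rs5": "A/A"}

rate = concordance_rate(imputed, wgs)
print(f"Concordance: {rate:.1%}")
```

In practice the genotypes would be read from the imputed and sequencing files rather than typed in by hand, and the comparison would cover millions of SNPs.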

On average, we achieve an accuracy of 99.7%. This figure depends heavily on the chip density (number of SNPs) of the genotype data used and, to a lesser extent, on the ancestry of the sample tested. The 99.7% accuracy refers to our DNA kit, tested on samples from admixed populations.