Variant filtering

For our QC pipeline, we first read in the .vcf file, split multiallelic sites, and remove sites with more than 6 alleles. The .vcf file contains 29,911,479 variants; after splitting multiallelics and restricting to sites with at most 6 alleles, we have 37,344,246 variants.
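The splitting step can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the `Site` record, the `split_multiallelics` helper, and the example sites are hypothetical. The sketch assumes the allele count includes the reference allele and that the allele-count restriction is applied per site before splitting. It does not handle genotypes or allele trimming.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Site(NamedTuple):
    """Hypothetical, simplified stand-in for one VCF record."""
    chrom: str
    pos: int
    ref: str
    alts: Tuple[str, ...]


MAX_ALLELES = 6  # assumption: the count includes the reference allele


def split_multiallelics(sites: Iterable[Site],
                        max_alleles: int = MAX_ALLELES) -> Iterator[Site]:
    """Drop sites with too many alleles, then emit one record per ALT allele."""
    for site in sites:
        n_alleles = 1 + len(site.alts)  # reference + alternates
        if n_alleles > max_alleles:
            continue  # site has more than `max_alleles` alleles: removed
        for alt in site.alts:
            yield Site(site.chrom, site.pos, site.ref, (alt,))


# Toy input: one biallelic site, one triallelic site, one site with 7 alleles.
sites = [
    Site("1", 100, "A", ("G",)),
    Site("1", 200, "C", ("T", "G")),
    Site("1", 300, "T", ("A", "C", "G", "TA", "TC", "TG")),
]
biallelic = list(split_multiallelics(sites))
for record in biallelic:
    print(record)
```

Because each multiallelic site yields one record per alternate allele, the number of variants can grow after splitting, as it does here from 29,911,479 to 37,344,246.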

| Filter | Variants | % |
|---|---|---|
| Variants with < 7 alleles | 37,344,246 | 100.0 |
| Failing VQSR | 100,742 | 0.3 |
| In LCRs | 1,215,218 | 3.3 |
| Outside padded target interval | 27,119,165 | 72.6 |
| Invariant sites after initial variant and genotype filters | 3,117,961 | 8.3 |
| Invariant sites after sample filters | 1,051,421 | 2.8 |
| Overall variant call rate < 0.97 | 737,072 | 2.0 |
| Overall variant case call rate < 0.97 | 716,709 | 1.9 |
| Overall variant control call rate < 0.97 | 743,659 | 2.0 |
| Difference between case and control variant call rate < 0.02 | 232,341 | 0.6 |
| Variants failing HWE filter | 1,083,479 | 2.9 |
| Variants remaining after all filters | 5,104,759 | 13.7 |
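As a quick check on the table above, the percentage column can be recomputed from the counts, taking the 37,344,246 variants with fewer than 7 alleles as the denominator. The sketch below uses only the numbers reported here; the `pct` helper is a hypothetical name. The per-filter counts sum to more than the number of variants actually removed, which suggests (an inference, not something stated in the source) that the rows are not mutually exclusive and a variant can fail more than one filter.

```python
TOTAL = 37_344_246      # variants with < 7 alleles (first row of the table)
REMAINING = 5_104_759   # variants remaining after all filters

# Counts copied from the variant filtering table.
removed = {
    "Failing VQSR": 100_742,
    "In LCRs": 1_215_218,
    "Outside padded target interval": 27_119_165,
    "Invariant sites after initial variant and genotype filters": 3_117_961,
    "Invariant sites after sample filters": 1_051_421,
    "Overall variant call rate < 0.97": 737_072,
    "Overall variant case call rate < 0.97": 716_709,
    "Overall variant control call rate < 0.97": 743_659,
    "Difference between case and control variant call rate < 0.02": 232_341,
    "Variants failing HWE filter": 1_083_479,
}


def pct(count: int, total: int = TOTAL) -> float:
    """Percentage of the initial variant set, rounded to one decimal place."""
    return round(100 * count / total, 1)


for name, count in removed.items():
    print(f"{name}: {count:,} ({pct(count)}%)")
print(f"Remaining: {REMAINING:,} ({pct(REMAINING)}%)")

# If every variant failed at most one filter, this difference would be zero.
overlap = sum(removed.values()) - (TOTAL - REMAINING)
print(f"Filter failures counted more than once: {overlap:,}")
```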


Sample filtering


| Filter | Samples | Bipolar cases | Controls | % |
|---|---|---|---|---|
| Initial samples in vcf | 39,618 | 16,486 | 17,212 | 100.0 |
| Unable to obtain both phenotype and sequence information | 2 | NA | NA | 0.0 |
| Unknown phenotype | 32 | NA | NA | 0.1 |
| Low coverage or high contamination | 133 | 72 | 54 | 0.3 |
| Sample call rate < 0.93 | 185 | 124 | 53 | 0.5 |
| % FREEMIX contamination > 0.02 | 268 | 146 | 104 | 0.7 |
| % chimeric reads > 0.015 | 152 | 49 | 100 | 0.4 |
| Mean DP < 30 | 20 | 5 | 12 | 0.1 |
| Mean GQ < 55 | 56 | 28 | 25 | 0.1 |
| Samples with sex swap | 238 | 147 | 52 | 0.6 |
| Related samples for removal | 1,716 | 792 | 688 | 4.3 |
| PCA based filters | 2,880 | 1,120 | 1,422 | 7.3 |
| Within batch Ti/Tv ratio outside 3 standard deviations | 100 | 50 | 42 | 0.3 |
| Within batch Het/HomVar ratio outside 3 standard deviations | 150 | 66 | 58 | 0.4 |
| Within batch Insertion/Deletion ratio outside 3 standard deviations | 93 | 31 | 48 | 0.2 |
| Within location n singletons outside 3 standard deviations | 443 | 151 | 236 | 1.1 |
| Samples after final sample filters | 33,527 | 13,933 | 14,422 | 84.6 |
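The "outside 3 standard deviations" filters can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's implementation: it assumes a sample is flagged when its metric lies more than 3 sample standard deviations from the mean of its own group (batch or location), and the `outliers_within_group` helper, the sample IDs, and the metric values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Set


def outliers_within_group(values: Dict[str, float],
                          groups: Dict[str, str],
                          n_sd: float = 3.0) -> Set[str]:
    """Return samples whose metric is more than n_sd SDs from their group mean."""
    by_group = defaultdict(list)
    for sample, value in values.items():
        by_group[groups[sample]].append((sample, value))

    flagged = set()
    for members in by_group.values():
        if len(members) < 2:
            continue  # a standard deviation needs at least two samples
        metric = [v for _, v in members]
        mu, sd = mean(metric), stdev(metric)  # assumption: sample SD (n - 1)
        for sample, v in members:
            if abs(v - mu) > n_sd * sd:
                flagged.add(sample)
    return flagged


# Toy Ti/Tv values: batch A has one clear outlier, batch B has none.
titv = {f"A{i}": 2.0 for i in range(20)}
titv["A_out"] = 3.0
titv.update({"B1": 2.0, "B2": 2.1, "B3": 1.9})
batch = {sample: sample[0] for sample in titv}  # hypothetical batch labels

flagged = outliers_within_group(titv, batch)
print(sorted(flagged))
```

Computing the mean and standard deviation per group rather than across the whole cohort keeps systematic differences between batches or locations from being mistaken for sample-level outliers.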


Summary of sample filtering

| Filter | Samples | Bipolar cases | Controls | % |
|---|---|---|---|---|
| Initial samples in vcf | 39,618 | 16,486 | 17,212 | 100.0 |
| Unable to obtain both phenotype and sequence information | 2 | NA | NA | 0.0 |
| Unknown phenotype | 32 | NA | NA | 0.1 |
| Low coverage or high contamination | 133 | 72 | 54 | 0.3 |
| Below sample metric thresholds | 557 | 276 | 252 | 1.4 |
| Samples with sex swap | 238 | 147 | 52 | 0.6 |
| Related samples for removal | 1,716 | 792 | 688 | 4.3 |
| PCA based filters | 2,880 | 1,120 | 1,422 | 7.3 |
| Outliers in batch-specific sample metrics | 771 | 293 | 374 | 1.9 |
| Samples after final sample filters | 33,527 | 13,933 | 14,422 | 84.6 |
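The "Below sample metric thresholds" row (557) is smaller than the sum of the five per-metric rows in the detailed sample filtering table (185 + 268 + 152 + 20 + 56 = 681). That is what one would expect if a sample failing several thresholds is counted once in the summary. This is an interpretation of the counts, not something the source states. The sketch below shows the difference between summing per-filter counts and counting the union, using hypothetical sample IDs.

```python
# Per-metric counts copied from the detailed sample filtering table.
detail = {
    "Sample call rate < 0.93": 185,
    "% FREEMIX contamination > 0.02": 268,
    "% chimeric reads > 0.015": 152,
    "Mean DP < 30": 20,
    "Mean GQ < 55": 56,
}
summary_count = 557  # "Below sample metric thresholds" in the summary table

print(sum(detail.values()), "summed vs", summary_count, "in the summary")

# Toy illustration with hypothetical sample IDs: S2 and S3 each fail two filters.
failing = {
    "call_rate": {"S1", "S2", "S3"},
    "freemix": {"S2", "S4"},
    "chimeric": {"S3"},
}
n_sum = sum(len(ids) for ids in failing.values())   # counts S2 and S3 twice
n_union = len(set().union(*failing.values()))       # counts each sample once
print(n_sum, "filter failures across", n_union, "distinct samples")
```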