GWAS Forum: Forcing the reference allele in PLINK

Aim: my data showed a region of nearly complete homozygosity in one population (say, pop1). I wanted to see whether the same region (or a shorter part of it) was also nearly fixed in three other populations, and whether the same allele was fixed in each of them.

Method: my first try was to use PLINK to calculate the allele frequencies (--freq) in that region (--chr --from-mb --to-mb) independently in the three populations, using the minor allele in pop1 as the reference allele for the other populations. This can be done in PLINK by adding --reference-allele, which takes a file with two fields: the SNP identifier and the allele to be used as reference. The most obvious way to generate this file was from the previously generated pop1.frq (the output of --freq), which among other fields includes the SNP identifier and the minor and major alleles in pop1 (denoted A1 and A2, respectively).
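As a rough illustration of that step, here is a minimal Python sketch that pulls the SNP and A1 columns out of a .frq file and formats them as a two-field list. It assumes the usual whitespace-delimited .frq layout with a header row (CHR SNP A1 A2 MAF NCHROBS). The function name, the SNP names and the example rows are all made up for the illustration.

```python
def reference_alleles_from_frq(frq_lines):
    """Return (SNP, A1) pairs from the lines of a PLINK .frq file."""
    # Split on any whitespace; .frq columns are padded with spaces.
    rows = [line.split() for line in frq_lines if line.strip()]
    header, records = rows[0], rows[1:]
    snp_col, a1_col = header.index("SNP"), header.index("A1")
    return [(rec[snp_col], rec[a1_col]) for rec in records]


# Made-up example rows in the assumed .frq layout.
example_frq = """\
 CHR   SNP   A1   A2      MAF  NCHROBS
   6  rs_a    A    G   0.0250       80
   6  rs_b    T    C   0.4125       80
""".splitlines()

pairs = reference_alleles_from_frq(example_frq)

# Two fields per line: SNP identifier, then the allele to use as reference.
ref_text = "".join(f"{snp}\t{a1}\n" for snp, a1 in pairs)
print(ref_text, end="")
```

Saving ref_text to a file (any name will do) gives a list in the two-field form described above, which can then be passed to --reference-allele when running --freq on the other populations.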

Problem: in the *.frq file, for SNPs that are fixed (i.e. MAF = 0) there are no counts of the minor allele, so PLINK codes A1 as zero, and one ends up with a list of reference alleles that is uninformative for some markers. Weird things may then happen if this list is used to calculate the derived allele frequencies in the other populations. I do not fully understand why PLINK does not simply extract the actual minor allele from the *.bim file (which is included by default in any PLINK analysis) instead of using the zero coding.
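One possible guard, sketched here in Python, is to set the zero-coded markers aside before building the reference list, so that they can be checked by hand instead of being passed on silently. This is only a suggestion, not something PLINK does itself. It assumes the usual .frq header (CHR SNP A1 A2 MAF NCHROBS), and the function name and the example rows are made up.

```python
def split_by_a1_coding(frq_lines):
    """Separate SNPs with a real A1 allele from those where A1 is coded '0'."""
    rows = [line.split() for line in frq_lines if line.strip()]
    header, records = rows[0], rows[1:]
    snp_col, a1_col = header.index("SNP"), header.index("A1")

    usable, zero_coded = [], []
    for rec in records:
        if rec[a1_col] == "0":
            # Fixed in this population: the .frq file carries no minor allele.
            zero_coded.append(rec[snp_col])
        else:
            usable.append((rec[snp_col], rec[a1_col]))
    return usable, zero_coded


# Made-up example: rs_c is fixed, so its A1 is coded as zero.
example_frq = """\
 CHR   SNP   A1   A2      MAF  NCHROBS
   6  rs_a    A    G   0.0250       80
   6  rs_c    0    C        0       80
""".splitlines()

usable, zero_coded = split_by_a1_coding(example_frq)
print("usable:", usable)
print("needs manual check:", zero_coded)
```

Only the usable pairs would go into the reference-allele file. What to do with the zero-coded SNPs (drop them, or look up the alleles elsewhere) is left open here.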


May 26, 2011
