May 26, 2011

Forcing the reference allele in PLINK

Aim: my data showed a region of nearly complete homozygosity in one population (let's say, pop1). I wanted to see whether the same region (or a shorter part of it) was also nearly fixed in three other populations, and whether it was the same allele that was fixed in each of them.

Method: my first try was to use PLINK to calculate the allelic frequencies (--freq) in that region (--chr --from-mb --to-mb) independently in each of the three populations, using the minor allele in pop1 as the reference allele for the other populations. This can be done in PLINK by adding --reference-allele, which takes a file with two fields: the SNP identifier and the allele to be used as reference. To me, the most obvious way to generate this file was from the previously generated pop1.frq (the output of --freq), which amongst other fields includes the SNP identifier and the minor and major alleles in pop1 (denoted as A1 and A2, respectively).
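As a rough illustration of that step, here is a minimal Python sketch that pulls the SNP and A1 columns out of a *.frq file and writes them as the two-field list that --reference-allele expects. The function names, the toy data and the file names in the comments are made up for the example, and it assumes the usual whitespace-separated *.frq layout with a header line containing SNP and A1.

```python
import io


def reference_alleles_from_frq(frq_lines):
    """Yield (snp_id, a1_allele) pairs from the lines of a PLINK .frq file."""
    header = None
    for line in frq_lines:
        fields = line.split()  # .frq columns are padded with spaces
        if not fields:
            continue
        if header is None:
            # First non-empty line is the header: locate the columns by name.
            header = fields
            snp_col = header.index("SNP")
            a1_col = header.index("A1")
            continue
        yield fields[snp_col], fields[a1_col]


def write_reference_file(pairs, out):
    """Write one 'SNP<tab>allele' line per marker to an open text handle."""
    for snp, allele in pairs:
        out.write(f"{snp}\t{allele}\n")


# Toy stand-in for pop1.frq (hypothetical markers, not real data).
example_frq = """ CHR  SNP   A1   A2    MAF  NCHROBS
   1  rs1    A    G   0.25       40
   1  rs2    C    T   0.10       40
"""

pairs = list(reference_alleles_from_frq(example_frq.splitlines()))
out = io.StringIO()
write_reference_file(pairs, out)
print(out.getvalue(), end="")
```

With real data one would open pop1.frq instead of the toy string, write to a file (say pop1_ref.txt, a hypothetical name) and pass that file to --reference-allele when running --freq on each of the other populations.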

Problem: for SNPs that are fixed in pop1 (i.e. MAF = 0) there are no counts of the minor allele, so PLINK codes A1 as zero in the *.frq file, and one ends up with a list of reference alleles that is uninformative for those markers. Weird things may happen if this list is then used to calculate the derived allelic frequencies in the other populations. I do not fully understand why PLINK does not simply extract the actual minor allele from the *.bim file (which is included by default in any PLINK analysis) instead of just using the zero coding.
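One way to at least see the problem before it causes trouble is to separate the markers whose A1 is coded as zero from the rest. The sketch below is only that, a sketch: the function name and toy data are invented for the example, and it assumes that for a fixed SNP the A2 column still holds the allele that is actually present.

```python
def split_fixed_markers(frq_lines):
    """Split .frq rows into usable (snp, A1) pairs and fixed (snp, A2) pairs.

    A marker is treated as fixed when its A1 column is the string "0".
    """
    usable, fixed = [], []
    header = None
    for line in frq_lines:
        fields = line.split()
        if not fields:
            continue
        if header is None:
            header = fields  # first non-empty line names the columns
            continue
        row = dict(zip(header, fields))
        if row["A1"] == "0":
            # No minor allele was observed: keep the allele that is present.
            fixed.append((row["SNP"], row["A2"]))
        else:
            usable.append((row["SNP"], row["A1"]))
    return usable, fixed


# Toy stand-in for pop1.frq with one fixed marker (hypothetical data).
example_frq_with_fixed = """ CHR  SNP   A1   A2    MAF  NCHROBS
   1  rs1    A    G   0.25       40
   1  rs2    0    T   0.00       40
   1  rs3    C    T   0.10       40
"""

usable, fixed = split_fixed_markers(example_frq_with_fixed.splitlines())
print("usable as reference:", usable)
print("fixed in pop1:", fixed)
```

The usable list can go into the reference allele file as it is. The fixed list shows which markers would otherwise carry a zero, together with the allele that is fixed in pop1, so they can be compared by hand against the other populations.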
