May 11, 2011

A fixed significance threshold for GWAS in humans

Pe'er et al. (2008) aimed to define the testing burden (tb) as the factor by which significance is exaggerated. Phased chromosomes from the Human Haplotype Map (HapMap) ENCODE regions (representing a fraction g = 1/600 of the genome) were used to randomly generate 1,000 cases and 1,000 controls (no association expected). Association statistics and p-values were calculated, and the simulation was repeated N = 10e7 times.

As I understand it, for a given nominal p-value p computed from the theoretical distribution, n(p) was calculated as the number of simulations, out of N, in which the best simulated p-value region-wide (i.e. in the region of size g) exceeded p in significance (i.e. was smaller than p). H(p) = n(p)/(g·N) was defined as the expected number of regions in the genome that have a SNP more significant than p (i.e. the expected number of significant hits in the genome by chance). tb is defined as H(p)/p, and by consensus they set H(p) = 1 so that tb = 1/p. So far so good.

What is not so clear to me is why they set p to be the gN-th element of the list of top single hits from each simulation, sorted from the smallest (most significant) to the largest. Less clear still is how they extrapolate to estimate the number of independent tests (1 million for all ENCODE SNPs in the CEU HapMap population), which is used to set a fixed genome-wide significance threshold of P = 0.05 / 1 million = 5 × 10^-8.
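One possible reading of the gN-th element, offered as a guess rather than as the authors' stated reasoning: if H(p) = n(p)/(g·N) is set to 1, then n(p) = g·N, so p would be the value reached by exactly g·N of the simulated best p-values, i.e. the g·N-th smallest one. The sketch below illustrates that reading on stand-in data (uniform random numbers, not the HapMap simulations); the function and variable names are hypothetical.

```python
import numpy as np

def one_hit_threshold(best_p, g):
    """Return the p at which H(p) = n(p) / (g * N) equals 1.

    H(p) = 1 means n(p) = g * N, so p is the (g * N)-th smallest
    of the per-simulation best p-values.
    """
    best_sorted = np.sort(np.asarray(best_p, dtype=float))
    k = int(round(g * best_sorted.size))   # rank g * N, rounded to an integer
    if k < 1:
        raise ValueError("need at least 1/g simulations")
    return best_sorted[k - 1]              # ranks are 1-based, indices 0-based

rng = np.random.default_rng(1)
g = 1 / 600
N = 6_000
best_p = rng.random(N)                     # stand-in values, NOT real simulations

p_star = one_hit_threshold(best_p, g)
n_at_p_star = int(np.sum(best_p <= p_star))
print("p* =", p_star)
print("n(p*) =", n_at_p_star, "versus g*N =", g * N)
print("tb = 1/p* =", 1 / p_star)

# The Bonferroni-style arithmetic for the fixed threshold:
alpha = 0.05
n_tests = 1_000_000
print("alpha / n_tests =", alpha / n_tests)
```

This only covers the choice of p. It says nothing about how the estimate is extrapolated to all ENCODE SNPs, which remains the part I cannot reconstruct.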

1 comment:

  1. Xavi et al.,
    In relation to your post, but not directly answering any of your questions, Daniel Gianola posted the following guidelines in the AnGenMap list:

    0.01<p<0.05 Surprising
    0.001<p<0.01 Wow!
    0.00001<p<0.001 Miracolo ma non troppo
    0.000001<p<0.0001 Beatificazione subita
    Undetectable p I won the lottery

    and recommended this paper.

    http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002051

    Jules
