The numbers are made up to keep the calculations easy; they only illustrate the problem. A study wants to investigate the relationship between smoking and lung cancer. It randomly surveyed 100 cancer patients in the hospital and 2100 non-cancer patients. There are 39 smokers in the cancer group and 1061 smokers in the non-cancer group.

Table 1: Data
              Smoker  Non-smoker
Cancer            39          61
Non-Cancer      1061        1039
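The crude cancer rates in Table 1 can be checked with a few lines of Python. The sketch below uses exact fractions so the denominators stay visible. The dictionary layout and the helper name `cancer_rate` are illustrative choices, not part of the original notes.

```python
from fractions import Fraction

# Table 1 counts, indexed by (disease status, smoking status)
table1 = {
    ("cancer", "smoker"): 39,
    ("cancer", "non-smoker"): 61,
    ("non-cancer", "smoker"): 1061,
    ("non-cancer", "non-smoker"): 1039,
}

def cancer_rate(table, group):
    """Exact fraction of people in `group` (one column of the table) who have cancer."""
    cases = table[("cancer", group)]
    total = cases + table[("non-cancer", group)]
    return Fraction(cases, total)

smoker_rate = cancer_rate(table1, "smoker")
non_smoker_rate = cancer_rate(table1, "non-smoker")
print(f"smokers: {smoker_rate} = {float(smoker_rate):.2%}")              # 39/1100 = 3.55%
print(f"non-smokers: {non_smoker_rate} = {float(non_smoker_rate):.2%}")  # 61/1100 = 5.55%
```

Both columns contain 1100 people, so the comparison reduces to 39 versus 61 cancer cases.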

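Among the smokers, the rate of getting cancer is 39/1100 (about 3.5%), smaller than the rate among the non-smokers, 61/1100 (about 5.5%). Splitting the same patients by whether a particular gene is present gives Tables 2 and 3.

Table 2: Gene Present
              Smoker  Non-smoker
Cancer            30           1
Non-Cancer       970          99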

Table 3: Gene Absent
              Smoker  Non-smoker
Cancer             9          60
Non-Cancer        91         940
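Tables 2 and 3 split the same 2200 patients by gene status, so adding them cell by cell should recover Table 1. A minimal check, using the same (disease status, smoking status) indexing as an illustrative layout:

```python
# Table 2 (gene present) and Table 3 (gene absent), indexed by (disease status, smoking status)
gene_present = {
    ("cancer", "smoker"): 30, ("cancer", "non-smoker"): 1,
    ("non-cancer", "smoker"): 970, ("non-cancer", "non-smoker"): 99,
}
gene_absent = {
    ("cancer", "smoker"): 9, ("cancer", "non-smoker"): 60,
    ("non-cancer", "smoker"): 91, ("non-cancer", "non-smoker"): 940,
}
# Table 1, the combined data
table1 = {
    ("cancer", "smoker"): 39, ("cancer", "non-smoker"): 61,
    ("non-cancer", "smoker"): 1061, ("non-cancer", "non-smoker"): 1039,
}

# Collapsing over the gene means adding the two strata cell by cell.
combined = {cell: gene_present[cell] + gene_absent[cell] for cell in gene_present}
print(combined == table1)  # True
```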

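When the gene is present, the cancer rate among smokers is 3% (30/1000), bigger than the 1% (1/100) among non-smokers. When the gene is absent, the cancer rate among smokers is 9% (9/100), bigger than the 6% (60/1000) among non-smokers. So smokers have the higher cancer rate within each group, yet the lower rate when the groups are combined. Why? The gene is a confounding variable: it is associated with the rate of cancer (carriers have lower rates than non-carriers, whether or not they smoke), and it is not evenly distributed between the smoker and non-smoker groups (1000 of the 1100 smokers carry it, but only 100 of the 1100 non-smokers).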
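The reversal can be reproduced directly from Tables 2 and 3. This sketch computes the rates within each gene stratum, the combined rates, and how the gene is distributed across the two smoking groups. The `counts` layout and the helpers `rate` and `combined_rate` are illustrative names.

```python
from fractions import Fraction

# (cancer, non-cancer) counts for each (gene status, smoking group), from Tables 2 and 3
counts = {
    ("present", "smoker"): (30, 970),
    ("present", "non-smoker"): (1, 99),
    ("absent", "smoker"): (9, 91),
    ("absent", "non-smoker"): (60, 940),
}

def rate(cancer, non_cancer):
    """Cancer rate as an exact fraction."""
    return Fraction(cancer, cancer + non_cancer)

# Within each gene stratum, smokers have the higher cancer rate.
for gene in ("present", "absent"):
    s = rate(*counts[(gene, "smoker")])
    n = rate(*counts[(gene, "non-smoker")])
    print(f"gene {gene}: smokers {float(s):.0%}, non-smokers {float(n):.0%}")

def combined_rate(group):
    """Cancer rate for a smoking group after pooling both gene strata."""
    cancer = sum(counts[(g, group)][0] for g in ("present", "absent"))
    non_cancer = sum(counts[(g, group)][1] for g in ("present", "absent"))
    return rate(cancer, non_cancer)

# Pooling the strata reverses the comparison.
print(f"combined: smokers {float(combined_rate('smoker')):.1%}, "
      f"non-smokers {float(combined_rate('non-smoker')):.1%}")

# The gene is unevenly distributed: most smokers carry it, few non-smokers do.
for group in ("smoker", "non-smoker"):
    carriers = sum(counts[("present", group)])
    total = carriers + sum(counts[("absent", group)])
    print(f"{group}s carrying the gene: {carriers}/{total}")
```

It prints 3% versus 1% and 9% versus 6% within the strata, but 3.5% versus 5.5% after pooling, with 1000/1100 smokers and 100/1100 non-smokers carrying the gene.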


Background (Nature, 3 April 2008): Three studies identify an association between genetic variation at a location on chromosome 15 and the risk of lung cancer, but they disagree on whether the link is direct or mediated through nicotine dependence.

• Simpson's paradox: the conclusion that holds within each group is reversed when the groups are combined.
• It is dangerous to infer causality from observational studies.
• The gene is the confounding variable in this example.
• Regression: rate of cancer ∼ gene effect + smoking effect.
• Randomized double-blind experiment:
  – treatment and control groups;
  – blind: the evaluators (researchers, doctors) do not know which group each patient is in;
  – double blind: the patients do not know either;
  – placebo.
• Retrospective study and odds ratio. With the cells a, b, c, d of Table 4, the odds ratio computed from the cancer rates equals the odds ratio computed from the smoking rates:

$$
\frac{\dfrac{\text{cancer rate in smokers}}{1-\text{cancer rate in smokers}}}{\dfrac{\text{cancer rate in non-smokers}}{1-\text{cancer rate in non-smokers}}}
= \frac{\dfrac{a/(a+c)}{c/(a+c)}}{\dfrac{b/(b+d)}{d/(b+d)}}
= \frac{ad}{bc}
= \frac{\dfrac{a/(a+b)}{b/(a+b)}}{\dfrac{c/(c+d)}{d/(c+d)}}
= \frac{\dfrac{\text{smoking rate with cancer}}{1-\text{smoking rate with cancer}}}{\dfrac{\text{smoking rate without cancer}}{1-\text{smoking rate without cancer}}}
$$
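A minimal sketch of the regression bullet, assuming a linear probability model: the cancer rate is a linear function of 0/1 gene and smoking indicators, with no interaction term, fitted by weighted least squares to the four cells of Tables 2 and 3 (each cell weighted by its size). The notes do not say which regression model they intend; logistic regression is a common alternative. Exact `fractions.Fraction` arithmetic keeps the coefficients readable, and the helper names `solve` and `wls` are hypothetical.

```python
from fractions import Fraction

# One row per gene-by-smoking cell of Tables 2 and 3: (gene, smoker, cancer cases, cell size)
CELLS = [
    (1, 1, 30, 1000),
    (1, 0, 1, 100),
    (0, 1, 9, 100),
    (0, 0, 60, 1000),
]

def solve(A, y):
    """Solve the square linear system A x = y exactly by Gauss-Jordan elimination."""
    k = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, y)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

def wls(features):
    """Weighted least squares fit of the cell cancer rates on the given regressors.

    `features(gene, smoker)` returns the regressors for one cell, intercept included.
    Weighting each cell by its size gives the same fit as ordinary least squares on the
    individual 0/1 outcomes, because the regressors are constant within a cell.
    """
    k = len(features(0, 0))
    XtWX = [[Fraction(0)] * k for _ in range(k)]
    XtWy = [Fraction(0)] * k
    for gene, smoker, cases, n in CELLS:
        x = features(gene, smoker)
        cell_rate = Fraction(cases, n)
        for i in range(k):
            XtWy[i] += n * x[i] * cell_rate
            for j in range(k):
                XtWX[i][j] += n * x[i] * x[j]
    return solve(XtWX, XtWy)

naive = wls(lambda g, s: [1, s])         # rate ~ smoking (ignores the gene)
adjusted = wls(lambda g, s: [1, g, s])   # rate ~ gene effect + smoking effect

print("naive smoking effect:   ", naive[1])     # -1/50
print("adjusted smoking effect:", adjusted[2])  # 1/40
print("adjusted gene effect:   ", adjusted[1])  # -11/200
```

Ignoring the gene, smoking appears to lower the cancer rate by 2 percentage points; adjusting for the gene, it raises the rate by 2.5 percentage points, consistent with the within-stratum differences of 2 and 3 points.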

Table 4: Data
              Smoker  Non-smoker
Cancer             a           b
Non-Cancer         c           d
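The odds-ratio identity for the retrospective study can be checked numerically. The usual motivation is that a retrospective study fixes the number of cancer and non-cancer patients (100 and 2100 here), so the cancer rates it produces are not population rates, while the smoking rates within each group are; because both routes give the same odds ratio, the odds ratio can still be estimated. The sketch below uses the cell names a, b, c, d of Table 4 and the Table 1 counts; the function names `odds` and `odds_ratios` are illustrative.

```python
from fractions import Fraction

def odds(p):
    """Odds p / (1 - p) of an event with probability p."""
    return p / (1 - p)

def odds_ratios(a, b, c, d):
    """Cells as in Table 4: a, b = smokers / non-smokers with cancer;
    c, d = smokers / non-smokers without cancer."""
    # Prospective view: odds of cancer among smokers vs non-smokers
    prospective = odds(Fraction(a, a + c)) / odds(Fraction(b, b + d))
    # Retrospective view: odds of smoking among cancer vs non-cancer patients
    retrospective = odds(Fraction(a, a + b)) / odds(Fraction(c, c + d))
    # Cross-product ratio
    cross_product = Fraction(a * d, b * c)
    return prospective, retrospective, cross_product

# Table 1 counts
for value in odds_ratios(39, 61, 1061, 1039):
    print(value, f"{float(value):.3f}")
```

All three values are the same fraction, 40521/64721 (about 0.626).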


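Low birth weight paradox. Among low-birth-weight babies, those whose mothers smoke have a smaller mortality rate than those whose mothers do not. Hence, smoking is good for babies! Of course it is not: low birth weight has other causes, such as birth defects, that are far more dangerous than smoking, and a low-birth-weight baby of a non-smoking mother is more likely to owe its low weight to one of them.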

