
‘If causation is not correlation, then what is it? Thanks to Judea Pearl's epoch-making research, we now have a precise answer to this question. If you want to understand how the world works, this engrossing and delightful book is the place to start’ PEDRO DOMINGOS, AUTHOR OF THE MASTER ALGORITHM

‘Illuminating ... it not only delivers a valuable lesson on the history of ideas but provides the conceptual tools needed to judge just what big data can and cannot deliver’ JONATHAN A. KNEE, THE NEW YORK TIMES

‘Pearl elaborates a vision for how truly intelligent machines would think’ THE ATLANTIC

‘Lively and accessible ... Pearl was one of the visionary leaders of the causal revolution, and The Book of Why is his crowning achievement’ JEWISH JOURNAL

Cover design: Richard Green

ISBN 978-0-141-98241-0

JUDEA PEARL & DANA MACKENZIE
THE BOOK OF WHY

‘Wonderful ... Illuminating ... Fun’ Daniel Kahneman


The New Science of Cause and Effect


AN ALLEN LANE BOOK

PENGUIN

Science



PENGUIN BOOKS

The Book of Why

‘Pearl’s accomplishments over the last 30 years have provided the theoretical basis for progress in artificial intelligence . . . and they have redefined the term “thinking machine” ’ Vint Cerf, Google, Inc.

‘Have you ever wondered about the puzzles of correlation and causation? This wonderful book has illuminating answers and it is fun to read’ Daniel Kahneman, author of Thinking, Fast and Slow

‘ “Correlation is not causation”. That scientific refrain has had social consequences – not least, the long debate over links between tobacco and lung cancer. Judea Pearl proposes a radical mathematical solution. A prominent researcher in AI, Pearl teams up with writer Dana Mackenzie to explore the science of causation, now bearing fruit in biology, medicine, social science and AI’ Barbara Kiser, Nature

‘Judea Pearl has been the heart and soul of a revolution in artificial intelligence and in computer science more broadly. He is an Alan Turing of our time’ Eric Horvitz, Managing Director, Microsoft Research

‘Modern applications of AI, such as robotics, self-driving cars, speech recognition, and machine translation deal with uncertainty. Pearl has been instrumental in supplying the rationale and much valuable technology that allow these applications to flourish’ Alfred Spector, Vice President of Research, Google, Inc.



ABOUT THE AUTHORS

Judea Pearl is a world-renowned Israeli-American computer scientist and philosopher, known for his world-leading work in AI and the development of Bayesian networks, as well as his theory of causal and counterfactual inference. In 2011, he won the most prestigious award in computer science, the Alan Turing Award. He has also received the Rumelhart Prize (Cognitive Science Society), the Benjamin Franklin Medal (Franklin Institute) and the Lakatos Award (London School of Economics) and he is the founder and president of the Daniel Pearl Foundation.

Dana Mackenzie, a PhD mathematician turned science writer, has written for such magazines as Science, New Scientist and Discover.



JUDEA PEARL AND DANA MACKENZIE

The Book of Why
The New Science of Cause and Effect

PENGUIN BOOKS



PENGUIN BOOKS

UK | USA | Canada | Ireland | Australia | India | New Zealand | South Africa

Penguin Books is part of the Penguin Random House group of companies whose addresses can be found at global.penguinrandomhouse.com.

First published in the United States of America by Basic Books 2018
First published in Great Britain by Allen Lane 2018
Published in Penguin Books 2019
001

Copyright © Judea Pearl and Dana Mackenzie, 2018
The moral rights of the authors have been asserted

Printed and bound in Great Britain by Clays Ltd, Elcograf S.p.A.

A CIP catalogue record for this book is available from the British Library

ISBN: 978–0–141–98241–0

www.greenpenguin.co.uk

Penguin Random House is committed to a sustainable future for our business, our readers and our planet. This book is made from Forest Stewardship Council® certified paper.



To Ruth



CONTENTS

Preface
Introduction: Mind over Data
CHAPTER 1. The Ladder of Causation
CHAPTER 2. From Buccaneers to Guinea Pigs: The Genesis of Causal Inference
CHAPTER 3. From Evidence to Causes: Reverend Bayes Meets Mr. Holmes
CHAPTER 4. Confounding and Deconfounding: Or, Slaying the Lurking Variable
CHAPTER 5. The Smoke-Filled Debate: Clearing the Air
CHAPTER 6. Paradoxes Galore!
CHAPTER 7. Beyond Adjustment: The Conquest of Mount Intervention
CHAPTER 8. Counterfactuals: Mining Worlds That Could Have Been
CHAPTER 9. Mediation: The Search for a Mechanism
CHAPTER 10. Big Data, Artificial Intelligence, and the Big Questions
Acknowledgments
Notes
Bibliography
Index



PREFACE

Almost two decades ago, when I wrote the preface to my book Causality (2000), I made a rather daring remark that friends advised me to tone down. “Causality has undergone a major transformation,” I wrote, “from a concept shrouded in mystery into a mathematical object with well-defined semantics and well-founded logic. Paradoxes and controversies have been resolved, slippery concepts have been explicated, and practical problems relying on causal information that long were regarded as either metaphysical or unmanageable can now be solved using elementary mathematics. Put simply, causality has been mathematized.”

Reading this passage today, I feel I was somewhat shortsighted. What I described as a “transformation” turned out to be a “revolution” that has changed the thinking in many of the sciences. Many now call it “the Causal Revolution,” and the excitement that it has generated in research circles is spilling over to education and applications. I believe the time is ripe to share it with a broader audience.

This book strives to fulfill a three-pronged mission: first, to lay before you in nonmathematical language the intellectual content of the Causal Revolution and how it is affecting our lives as well as our future; second, to share with you some of the heroic journeys, both successful and failed, that scientists have embarked on when confronted by critical cause-effect questions. Finally, returning the Causal Revolution to its womb in artificial intelligence, I aim to describe to you how robots can be


constructed that learn to communicate in our mother tongue: the language of cause and effect. This new generation of robots should explain to us why things happened, why they responded the way they did, and why nature operates one way and not another. More ambitiously, they should also teach us about ourselves: why our mind clicks the way it does and what it means to think rationally about cause and effect, credit and regret, intent and responsibility.

When I write equations, I have a very clear idea of who my readers are. Not so when I write for the general public, an entirely new adventure for me. Strange, but this new experience has been one of the most rewarding educational trips of my life. The need to shape ideas in your language, to guess your background, your questions, and your reactions, did more to sharpen my understanding of causality than all the equations I have written prior to writing this book.

For this I will forever be grateful to you. I hope you are as excited as I am to see the results.

Judea Pearl
Los Angeles, October 2017



INTRODUCTION: MIND OVER DATA

Every science that has thriven has thriven upon its own symbols.
Augustus de Morgan (1864)

This book tells the story of a science that has changed the way we distinguish facts from fiction and yet has remained under the radar of the general public. The consequences of the new science are already impacting crucial facets of our lives and have the potential to affect more, from the development of new drugs to the control of economic policies, from education and robotics to gun control and global warming. Remarkably, despite the diversity and apparent incommensurability of these problem areas, the new science embraces them all under a unified framework that was practically nonexistent two decades ago.

The new science does not have a fancy name: I call it simply “causal inference,” as do many of my colleagues. Nor is it particularly high-tech. The ideal technology that causal inference strives to emulate resides within our own minds.

Some tens of thousands of years ago, humans began to realize that certain things cause other things and that tinkering with the former can change the latter. No other species grasps this, certainly not to the extent that we do. From this discovery came organized societies, then towns and


cities, and eventually the science- and technology-based civilization we enjoy today. All because we asked a simple question: Why?

Causal inference is all about taking this question seriously. It posits that the human brain is the most advanced tool ever devised for managing causes and effects. Our brains store an incredible amount of causal knowledge which, supplemented by data, we could harness to answer some of the most pressing questions of our time. More ambitiously, once we really understand the logic behind causal thinking, we could emulate it on modern computers and create an “artificial scientist.” This smart robot would discover yet unknown phenomena, find explanations to pending scientific dilemmas, design new experiments, and continually extract more causal knowledge from the environment.

But before we can venture to speculate on such futuristic developments, it is important to understand the achievements that causal inference has tallied thus far. We will explore the way that it has transformed the thinking of scientists in almost every data-informed discipline and how it is about to change our lives.

The new science addresses seemingly straightforward questions like these:

• How effective is a given treatment in preventing a disease?
• Did the new tax law cause our sales to go up, or was it our advertising campaign?
• What is the health-care cost attributable to obesity?
• Can hiring records prove an employer is guilty of a policy of sex discrimination?
• I’m about to quit my job. Should I?

These questions have in common a concern with cause-and-effect relationships, recognizable through words such as “preventing,” “cause,” “attributable to,” “policy,” and “should I.” Such words are common in everyday language, and our society constantly demands answers to such questions. Yet, until very recently, science gave us no means even to articulate, let alone answer, them.

By far the most important contribution of causal inference to mankind has been to turn this scientific neglect into a thing of the


past. The new science has spawned a simple mathematical language to articulate causal relationships that we know as well as those we wish to find out about. The ability to express this information in mathematical form has unleashed a wealth of powerful and principled methods for combining our knowledge with data and answering causal questions like the five above.

I have been lucky to be part of this scientific development for the past quarter century. I have watched its progress take shape in students’ cubicles and research laboratories, and I have heard its breakthroughs resonate in somber scientific conferences, far from the limelight of public attention. Now, as we enter the era of strong artificial intelligence (AI) and many tout the endless possibilities of Big Data and deep learning, I find it timely and exciting to present to the reader some of the most adventurous paths that the new science is taking, how it impacts data science, and the many ways in which it will change our lives in the twenty-first century.

When you hear me describe these achievements as a “new science,” you may be skeptical. You may even ask, Why wasn’t this done a long time ago? Say when Virgil first proclaimed, “Lucky is he who has been able to understand the causes of things” (29 BC). Or when the founders of modern statistics, Francis Galton and Karl Pearson, first discovered that population data can shed light on scientific questions. There is a long tale behind their unfortunate failure to embrace causation at this juncture, which the historical sections of this book will relate. But the most serious impediment, in my opinion, has been the fundamental gap between the vocabulary in which we cast causal questions and the traditional vocabulary in which we communicate scientific theories.

To appreciate the depth of this gap, imagine the difficulties that a scientist would face in trying to express some obvious causal relationships: say, that the barometer reading B tracks the atmospheric pressure P. We can easily write down this relationship in an equation such as B = kP, where k is some constant of proportionality. The rules of algebra now permit us to rewrite this same equation in a wild variety of forms, for example, P = B/k, k = B/P, or B - kP = 0. They all mean the same thing: that if we know any two of the three quantities, the third is determined. None of the


letters k, B, or P is in any mathematical way privileged over any of the others. How then can we express our strong conviction that it is the pressure that causes the barometer to change and not the other way around? And if we cannot express even this, how can we hope to express the many other causal convictions that do not have mathematical formulas, such as that the rooster’s crow does not cause the sun to rise?

My college professors could not do it and never complained. I would be willing to bet that none of yours ever did either. We now understand why: never were they shown a mathematical language of causes; nor were they shown its benefits. It is in fact an indictment of science that it has neglected to develop such a language for so many generations. Everyone knows that flipping a switch will cause a light to turn on or off and that a hot, sultry summer afternoon will cause sales to go up at the local ice-cream parlor. Why then have scientists not captured such obvious facts in formulas, as they did with the basic laws of optics, mechanics, or geometry?
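
The asymmetry in the barometer example can be made concrete with a small sketch. Algebra treats B = kP as a symmetric constraint, whereas an assignment in a program runs in one direction only. The code below is a toy illustration under stated assumptions, not notation taken from this book: the function names, the value k = 2.0, and the idea of holding the needle at a chosen value by hand are all hypothetical choices made for the example.

```python
from typing import Dict, Optional

# Hypothetical sketch: names and the value of K are illustrative assumptions.
K = 2.0  # arbitrary constant of proportionality


def barometer_reading(pressure: float, k: float = K) -> float:
    """One-directional assignment B := k * P (the reading depends on the pressure)."""
    return k * pressure


def world(pressure: float, forced_reading: Optional[float] = None) -> Dict[str, float]:
    """Return the state of a toy world with pressure P and barometer reading B.

    If forced_reading is given, the needle is held at that value by hand;
    the pressure is left exactly as it was.
    """
    if forced_reading is None:
        reading = barometer_reading(pressure)
    else:
        reading = forced_reading
    return {"P": pressure, "B": reading}


before = world(pressure=30.0)
after_pressure_change = world(pressure=25.0)
after_needle_forced = world(pressure=30.0, forced_reading=10.0)

print(before)
print(after_pressure_change)
print(after_needle_forced)
```

In this sketch, changing the pressure moves the reading, but forcing the reading leaves the pressure where it was. The rearranged equation P = B/k, read naively, would instead suggest that holding the needle at 10.0 drops the pressure to 5.0, which is exactly the conviction the algebra alone cannot rule out.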
Why have they allowed these facts to languish in bare intuition, deprived of mathematical tools that have enabled other branches of science to flourish and mature?

Part of the answer is that scientific tools are developed to meet scientific needs. Precisely because we are so good at handling questions about switches, ice cream, and barometers, our need for special mathematical machinery to handle them was not obvious. But as scientific curiosity increased and we began posing causal questions in complex legal, business, medical, and policy-making situations, we found ourselves lacking the tools and principles that mature science should provide.

Belated awakenings of this sort are not uncommon in science. For example, until about four hundred years ago, people were quite happy with their natural ability to manage the uncertainties in daily life, from crossing a street to risking a fistfight. Only after gamblers invented intricate games of chance, sometimes carefully designed to trick us into making bad choices, did mathematicians like Blaise Pascal (1654), Pierre de Fermat (1654), and Christiaan Huygens (1657) find it necessary to develop what we today call


probability theory. Likewise, only when insurance organizations demanded accurate estimates of life annuity did mathematicians like Edmond Halley (1693) and Abraham de Moivre (1725) begin looking at mortality tables to calculate life expectancies. Similarly, astronomers’ demands for accurate predictions of celestial motion led Jacob Bernoulli, Pierre-Simon Laplace, and Carl Friedrich Gauss to develop a theory of errors to help us extract signals from noise. These methods were all predecessors of today’s statistics.

Ironically, the need for a theory of causation began to surface at the same time that statistics came into being. In fact, modern statistics hatched from the causal questions that Galton and Pearson asked about heredity and their ingenious attempts to answer them using cross-generational data. Unfortunately, they failed in this endeavor, and rather than pause to ask why, they declared those questions off limits and turned to developing a thriving, causality-free enterprise called statistics.

This was a critical moment in the history of science. The opportunity to equip causal questions with a language of their own came very close to being realized but was squandered. In the following years, these questions were declared unscientific and went underground. Despite heroic efforts by the geneticist Sewall Wright (1889–1988), causal vocabulary was virtually prohibited for more than half a century. And when you prohibit speech, you prohibit thought and stifle principles, methods, and tools.

Readers do not have to be scientists to witness this prohibition. In Statistics 101, every student learns to chant, “Correlation is not causation.” With good reason! The rooster’s crow is highly correlated with the sunrise; yet it does not cause the sunrise.

Unfortunately, statistics has fetishized this commonsense observation. It tells us that correlation is not causation, but it does not tell us what causation is. In vain will you search the index of a statistics textbook for an entry on “cause.” Students are not allowed to say that X is the cause of Y, only that X and Y are “related” or “associated.”

Because of this prohibition, mathematical tools to manage causal questions were deemed unnecessary, and statistics focused


exclusively on how to summarize data, not on how to interpret it. A shining exception was path analysis, invented by geneticist Sewall Wright in the 1920s and a direct ancestor of the methods we will entertain in this book. However, path analysis was badly underappreciated in statistics and its satellite communities and languished for decades in its embryonic status. What should have been the first step toward causal inference remained the only step until the 1980s. The rest of statistics, including the many disciplines that looked to it for guidance, remained in the Prohibition era, falsely believing that the answers to all scientific questions reside in the data, to be unveiled through clever data-mining tricks.

Much of this data-centric history still haunts us today. We live in an era that presumes Big Data to be the solution to all our problems. Courses in “data science” are proliferating in our universities, and jobs for “data scientists” are lucrative in the companies that participate in the “data economy.” But I hope with this book to convince you that data are profoundly dumb. Data can tell you that the people who took a medicine recovered faster than those who did not take it, but they can’t tell you why. Maybe those who took the medicine did so because they could afford it and would have recovered just as fast without it.

Over and over again, in science and in business, we see situations where mere data aren’t enough. Most big-data enthusiasts, while somewhat aware of these limitations, continue the chase after data-centric intelligence, as if we were still in the Prohibition era.

As I mentioned earlier, things have changed dramatically in the past three decades. Nowadays, thanks to carefully crafted causal models, contemporary scientists can address problems that would have once been considered unsolvable or even beyond the pale of scientific inquiry. For example, only a hundred years ago, the question of whether cigarette smoking causes a health hazard would have been considered unscientific. The mere mention of the words “cause” or “effect” would create a storm of objections in any reputable statistical journal.

Even two decades ago, asking a statistician a question like “Was it the aspirin that stopped my headache?” would have been like


asking if he believed in voodoo. To quote an esteemed colleague of mine, it would be “more of a cocktail conversation topic than a scientific inquiry.” But today, epidemiologists, social scientists, computer scientists, and at least some enlightened economists and statisticians pose such questions routinely and answer them with mathematical precision. To me, this change is nothing short of a revolution. I dare to call it the Causal Revolution, a scientific shakeup that embraces rather than denies our innate cognitive gift of understanding cause and effect.

The Causal Revolution did not happen in a vacuum; it has a mathematical secret behind it which can be best described as a calculus of causation, which answers some of the hardest problems ever asked about cause-effect relationships. I am thrilled to unveil this calculus not only because the turbulent history of its development is intriguing but even more because I expect that its full potential will be developed one day beyond what I can imagine . . . perhaps even by a reader of this book.
The calculus of causation consists of two languages: causal diagrams, to express what we know, and a symbolic language, resembling algebra, to express what we want to know. The causal diagrams are simply dot-and-arrow pictures that summarize our existing scientific knowledge. The dots represent quantities of interest, called “variables,” and the arrows represent known or suspected causal relationships between those variables, namely which variable “listens” to which others. These diagrams are extremely easy to draw, comprehend, and use, and the reader will find dozens of them in the pages of this book. If you can navigate using a map of one-way streets, then you can understand causal diagrams, and you can solve the type of questions posed at the beginning of this introduction.

Though causal diagrams are my tool of choice in this book, as in the last thirty-five years of my research, they are not the only kind of causal model possible. Some scientists (e.g., econometricians) like to work with mathematical equations; others (e.g., hard-core statisticians) prefer a list of assumptions that ostensibly summarizes the structure of the diagram. Regardless of language, the model should depict, however qualitatively, the process that generates the

9780141982410_TheBookOfWhy_TXT.indd 7


THE BOOK OF WHY

data; in other words, the cause-effect forces that operate in the environment and shape the data generated.

Side by side with this diagrammatic “language of knowledge,” we also have a symbolic “language of queries” to express the questions we want answers to. For example, if we are interested in the effect of a drug (D) on lifespan (L), then our query might be written symbolically as: P(L | do(D)). In other words, what is the probability (P) that a typical patient would survive L years if made to take the drug? This question describes what epidemiologists would call an intervention or a treatment and corresponds to what we measure in a clinical trial. In many cases we may also wish to compare P(L | do(D)) with P(L | do(not-D)); the latter describes patients denied treatment, also called the “control” patients. The do-operator signifies that we are dealing with an intervention rather than a passive observation; classical statistics has nothing remotely similar to this operator.
We must invoke an intervention operator do(D) to ensure that the observed change in Lifespan L is due to the drug itself and is not confounded with other factors that tend to shorten or lengthen life. If, instead of intervening, we let the patient himself decide whether to take the drug, those other factors might influence his decision, and lifespan differences between taking and not taking the drug would no longer be solely due to the drug. For example, suppose only those who were terminally ill took the drug. Such persons would surely differ from those who did not take the drug, and a comparison of the two groups would reflect differences in the severity of their disease rather than the effect of the drug. By contrast, forcing patients to take or refrain from taking the drug, regardless of preconditions, would wash away preexisting differences and provide a valid comparison.

Mathematically, we write the observed frequency of Lifespan L among patients who voluntarily take the drug as P(L | D), which is the standard conditional probability used in statistical textbooks. This expression stands for the probability (P) of Lifespan L conditional on seeing the patient take Drug D. Note that P(L | D) may be totally different from P(L | do(D)). This difference between seeing


and doing is fundamental and explains why we do not regard the falling barometer to be a cause of the coming storm. Seeing the barometer fall increases the probability of the storm, while forcing it to fall does not affect this probability.

This confusion between seeing and doing has resulted in a fountain of paradoxes, some of which we will entertain in this book. A world devoid of P(L | do(D)) and governed solely by P(L | D) would be a strange one indeed. For example, patients would avoid going to the doctor to reduce the probability of being seriously ill; cities would dismiss their firefighters to reduce the incidence of fires; doctors would recommend a drug to male and female patients but not to patients with undisclosed gender; and so on. It is hard to believe that less than three decades ago science did operate in such a world: the do-operator did not exist.
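The gap between seeing and doing in the drug example can be made concrete with a small enumeration. The sketch below is only an illustration under stated assumptions: every probability is invented, L is reduced to a binary "survives" indicator, Z stands for "severely ill," and the function names are hypothetical. Seeing conditions on the patient's own choice; doing replaces the mechanism that sets D, so that D no longer listens to Z.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration (not taken from the book).
P_Z = {1: 0.5, 0: 0.5}            # Z = 1: the patient is severely ill
P_D_GIVEN_Z = {1: 0.9, 0: 0.1}    # P(D = 1 | z): sicker patients choose the drug more often
P_ALIVE = {(1, 1): 0.3, (0, 1): 0.2,   # key (d, z) -> P(alive = 1 | d, z)
           (1, 0): 0.9, (0, 0): 0.8}   # the drug adds 0.1 in both strata


def joint(p_d_given_z):
    """Return P(z, d, alive) for all combinations, given a mechanism for D."""
    table = {}
    for z, d, alive in product((0, 1), repeat=3):
        p_d = p_d_given_z[z] if d == 1 else 1 - p_d_given_z[z]
        p_a = P_ALIVE[(d, z)] if alive == 1 else 1 - P_ALIVE[(d, z)]
        table[(z, d, alive)] = P_Z[z] * p_d * p_a
    return table


def p_survive_seeing(d):
    """P(alive | D = d): condition on having observed the patient's choice."""
    t = joint(P_D_GIVEN_Z)
    num = sum(p for (z, dd, a), p in t.items() if dd == d and a == 1)
    den = sum(p for (z, dd, a), p in t.items() if dd == d)
    return num / den


def p_survive_doing(d):
    """P(alive | do(D = d)): force D = d for everyone, regardless of Z."""
    forced = {z: float(d) for z in (0, 1)}
    t = joint(forced)
    return sum(p for (z, dd, a), p in t.items() if a == 1)


for d in (1, 0):
    print(d, round(p_survive_seeing(d), 2), round(p_survive_doing(d), 2))
```

With these made-up numbers, seeing suggests the drug is harmful (survival of about 0.36 among takers versus 0.74 among non-takers), while doing shows it helps (0.60 versus 0.50). The reversal comes entirely from who chooses to take the drug.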
One of the crowning achievements of the Causal Revolution has been to explain how to predict the effects of an intervention without actually enacting it. It would never have been possible if we had not, first of all, defined the do-operator so that we can ask the right question and, second, devised a way to emulate it by noninvasive means.

When the scientific question of interest involves retrospective thinking, we call on another type of expression unique to causal reasoning called a counterfactual. For example, suppose that Joe took Drug D and died a month later; our question of interest is whether the drug might have caused his death. To answer this question, we need to imagine a scenario in which Joe was about to take the drug but changed his mind. Would he have lived?

Again, classical statistics only summarizes data, so it does not provide even a language for asking that question. Causal inference provides a notation and, more importantly, offers a solution. As with predicting the effect of interventions (mentioned above), in many cases we can emulate human retrospective thinking with an algorithm that takes what we know about the observed world and produces an answer about the counterfactual world. This “algorithmization of counterfactuals” is another gem uncovered by the Causal Revolution.
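One way such an algorithm can look is the three-step recipe usually described as abduction, action, and prediction. The sketch below applies it to Joe under loudly stated assumptions: the structural equation, the two background variables U1 and U2, and their probabilities are all invented for illustration, and Joe's decision to take the drug is assumed to be independent of both background variables.

```python
from itertools import product

# Hypothetical structural model, for illustration only.
# U1 = 1: Joe is susceptible to a fatal reaction to the drug.
# U2 = 1: Joe would die within the month from unrelated causes.
P_U1 = 0.3
P_U2 = 0.1


def died(d, u1, u2):
    """Structural equation: death occurs if the drug harms him or other causes strike."""
    return int((d == 1 and u1 == 1) or u2 == 1)


def prob_would_have_lived(observed_d, observed_death):
    """P(Joe lives had he not taken the drug | what was actually observed)."""
    # Step 1 (abduction): keep only the background states (u1, u2) that are
    # consistent with the observed facts, weighted by their prior probability.
    consistent = {}
    for u1, u2 in product((0, 1), repeat=2):
        prior = (P_U1 if u1 else 1 - P_U1) * (P_U2 if u2 else 1 - P_U2)
        if died(observed_d, u1, u2) == observed_death:
            consistent[(u1, u2)] = prior
    total = sum(consistent.values())
    # Step 2 (action): set D = 0 in the model, overriding Joe's actual choice.
    # Step 3 (prediction): recompute the outcome in each remaining state.
    lived = sum(p for (u1, u2), p in consistent.items() if died(0, u1, u2) == 0)
    return lived / total


answer = prob_would_have_lived(observed_d=1, observed_death=1)
print(round(answer, 3))
```

Under these invented numbers the answer is 0.27 / 0.37, roughly 0.73: given that Joe took the drug and died, it is more likely than not that he would have lived without it. A different causal model would give a different answer, which is the point of the passage: the counterfactual is computed from the model, not read off the data.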


Counterfactual reasoning, which deals with what-ifs, might strike some readers as unscientific. Indeed, empirical observation can never confirm or refute the answers to such questions. Yet our minds make very reliable and reproducible judgments all the time about what might be or might have been. We all understand, for instance, that had the rooster been silent this morning, the sun would have risen just as well. This consensus stems from the fact that counterfactuals are not products of whimsy but reflect the very structure of our world model. Two people who share the same causal model will also share all counterfactual judgments.

Counterfactuals are the building blocks of moral behavior as well as scientific thought. The ability to reflect on one’s past actions and envision alternative scenarios is the basis of free will and social responsibility. The algorithmization of counterfactuals invites thinking machines to benefit from this ability and participate in this (until now) uniquely human way of thinking about the world.
My mention of thinking machines in the last paragraph is intentional. I came to this subject as a computer scientist working in the area of artificial intelligence, which entails two points of departure from most of my colleagues in the causal inference arena. First, in the world of AI, you do not really understand a topic until you can teach it to a mechanical robot. That is why you will find me emphasizing and reemphasizing notation, language, vocabulary, and grammar. For example, I obsess over whether we can express a certain claim in a given language and whether one claim follows from others. It is amazing how much one can learn from just following the grammar of scientific utterances. My emphasis on language also comes from a deep conviction that language shapes our thoughts. You cannot answer a question that you cannot ask, and you cannot ask a question that you have no words for. As a student of philosophy and computer science, my attraction to causal inference has largely been triggered by the excitement of seeing an orphaned scientific language making it from birth to maturity.

My background in machine learning has given me yet another incentive for studying causation. In the late 1980s, I realized that machines’ lack of understanding of causal relations was perhaps


the biggest roadblock to giving them human-level intelligence. In the last chapter of this book, I will return to my roots, and together we will explore the implications of the Causal Revolution for artificial intelligence. I believe that strong AI is an achievable goal and one not to be feared precisely because causality is part of the solution. A causal reasoning module will give machines the ability to reflect on their mistakes, to pinpoint weaknesses in their software, to function as moral entities, and to converse naturally with humans about their own choices and intentions.

A BLUEPRINT OF REALITY

In our era, readers have no doubt heard terms like “knowledge,” “information,” “intelligence,” and “data,” and some may feel confused about the differences between them or how they interact. Now I am proposing to throw another term, “causal model,” into the mix, and the reader may justifiably wonder if this will only add to the confusion.

It will not! In fact, it will anchor the elusive notions of science, knowledge, and data in a concrete and meaningful setting, and will enable us to see how the three work together to produce answers to difficult scientific questions. Figure I.1 shows a blueprint for a “causal inference engine” that might handle causal reasoning for a future artificial intelligence. It’s important to realize that this is not only a blueprint for the future but also a guide to how causal models work in scientific applications today and how they interact with data.

The inference engine is a machine that accepts three different kinds of inputs (Assumptions, Queries, and Data) and produces three kinds of outputs. The first of the outputs is a Yes/No decision as to whether the given query can in theory be answered under the existing causal model, assuming perfect and unlimited data. If the answer is Yes, the inference engine next produces an Estimand. This is a mathematical formula that can be thought of as a recipe for generating the answer from any hypothetical data, whenever they are available. Finally, after the inference engine has received


Figure I. How an “inference engine” combines data with causal knowledge to produce answers to queries of interest. The dashed box is not part of the engine but is required for building it. Arrows could also be drawn from boxes 4 and 9 to box 1, but I have opted to keep the diagram simple.

the Data input, it will use the recipe to produce an actual Estimate for the answer, along with statistical estimates of the amount of uncertainty in that estimate. This uncertainty reflects the limited size of the data set as well as possible measurement errors or missing data.

To dig more deeply into the chart, I have labeled the boxes 1 through 9, which I will annotate in the context of the query “What is the effect of Drug D on Lifespan L?”

1. “Knowledge” stands for traces of experience the reasoning agent has had in the past, including past observations, past actions, education, and cultural mores, that are deemed relevant to the query of interest. The dotted box around “Knowledge” indicates that it remains implicit in the mind of the agent and is not explicated formally in the model.

2. Scientific research always requires simplifying assumptions, that is, statements which the researcher deems worthy of making explicit on the basis of the available Knowledge. While most of the researcher’s knowledge remains implicit in his or her brain, only Assumptions see the light of day and


are encapsulated in the model. They can in fact be read from the model, which has led some logicians to conclude that a model is nothing more than a list of assumptions. Computer scientists take exception to this claim, noting that how assumptions are represented can make a profound difference in one’s ability to specify them correctly, draw conclusions from them, and even extend or modify them in light of compelling evidence.

3. Various options exist for causal models: causal diagrams, structural equations, logical statements, and so forth. I am strongly sold on causal diagrams for nearly all applications, primarily due to their transparency but also due to the explicit answers they provide to many of the questions we wish to ask. For the purpose of constructing the diagram, the definition of “causation” is simple, if a little metaphorical: a variable X is a cause of Y if Y “listens” to X and determines its value in response to what it hears. For example, if we suspect that a patient’s Lifespan L “listens” to whether Drug D was taken, then we call D a cause of L and draw an arrow from D to L in a causal diagram. Naturally, the answer to our query about D and L is likely to depend on other variables as well, which must also be represented in the diagram along with their causes and effects. (Here, we will denote them collectively by Z.)

4. The listening pattern prescribed by the paths of the causal model usually results in observable patterns or dependencies in the data. These patterns are called “testable implications” because they can be used for testing the model. These are statements like “There is no path connecting D and L,” which translates to a statistical statement, “D and L are independent,” that is, finding D does not change the likelihood of L. If the data contradict this implication, then we need to revise our model. Such revisions require another engine, which obtains its inputs from boxes 4 and 7 and computes the “degree of fitness,” that is, the degree to which the Data are compatible with the model’s assumptions. For simplicity, I did not show this second engine in Figure I.1.


5. Queries submitted to the inference engine are the scientific questions that we want to answer. They must be formulated in causal vocabulary. For example, what is P(L | do(D))? One of the main accomplishments of the Causal Revolution has been to make this language scientifically transparent as well as mathematically rigorous.

6. “Estimand” comes from Latin, meaning “that which is to be estimated.” This is a statistical quantity to be estimated from the data that, once estimated, can legitimately represent the answer to our query. While written as a probability formula (for example, P(L | D, Z) × P(Z)), it is in fact a recipe for answering the causal query from the type of data we have, once it has been certified by the engine.

   It’s very important to realize that, contrary to traditional estimation in statistics, some queries may not be answerable under the current causal model, even after the collection of any amount of data. For example, if our model shows that both D and L depend on a third variable Z (say, the stage of a disease), and if we do not have any way to measure Z, then the query P(L | do(D)) cannot be answered. In that case it is a waste of time to collect data. Instead we need to go back and refine the model, either by adding new scientific knowledge that might allow us to estimate Z or by making simplifying assumptions (at the risk of being wrong), for example, that the effect of Z on D is negligible.

7. Data are the ingredients that go into the estimand recipe. It is critical to realize that data are profoundly dumb about causal relationships. They tell us about quantities like P(L | D) or P(L | D, Z). It is the job of the estimand to tell us how to bake these statistical quantities into one expression that, based on the model assumptions, is logically equivalent to the causal query, say, P(L | do(D)).

   Notice that the whole notion of estimands and in fact the whole top part of Figure I does not exist in traditional methods of statistical analysis. There, the estimand and the query coincide. For example, if we are interested in the proportion


of people among those with Lifespan L who took the Drug D, we simply write this query as P(D | L). The same quantity would be our estimand. This already specifies what proportions in the data need to be estimated and requires no causal knowledge. For this reason, some statisticians to this day find it extremely hard to understand why some knowledge lies outside the province of statistics and why data alone cannot make up for lack of scientific knowledge.

8. The estimate is what comes out of the oven. However, it is only approximate because of one other real-world fact about data: they are always only a finite sample from a theoretically infinite population. In our running example, the sample consists of the patients we choose to study. Even if we choose them at random, there is always some chance that the proportions measured in the sample are not representative of the proportions in the population at large. Fortunately, the discipline of statistics, nowadays empowered by advanced techniques of machine learning, gives us many, many ways to manage this uncertainty: parametric and semi-parametric models, maximum likelihood methods, and propensity scores are often used to smooth the sparse data.

9. In the end, if our model is correct and our data are sufficient, we get an answer to our causal query, such as “Drug D increases the Lifespan L of diabetic Patients Z by 30 percent, plus or minus 20 percent.” Hooray! The answer will also add to our scientific knowledge (box 1) and, if things did not go the way we expected, might suggest some improvements to our causal model (box 3).

This flowchart may look complicated at first, and you might wonder whether it is really necessary. Indeed, in our ordinary lives, we are somehow able to make causal judgments without consciously going through such a complicated process and certainly without resorting to the mathematics of probabilities and proportions. Our causal intuition alone is usually sufficient for handling the kind of uncertainty we find in household routines or even in our

professional lives. But if we want to teach a dumb robot to think causally, or if we are pushing the frontiers of scientific knowledge, where we do not have intuition to guide us, then a carefully structured procedure like this is mandatory.

I especially want to highlight the role of data in the above process. First, notice that we collect data only after we posit the causal model, after we state the scientific query we wish to answer, and after we derive the estimand. This contrasts with the traditional statistical approach, mentioned above, which does not even have a causal model.

But our present-day scientific world presents a new challenge to sound reasoning about causes and effects. While awareness of the need for a causal model has grown by leaps and bounds among the sciences, many researchers in artificial intelligence would like to skip the hard step of constructing or acquiring a causal model and rely solely on data for all cognitive tasks. The hope (and at present, it is usually a silent one) is that the data themselves will guide us to the right answers whenever causal questions come up.

I am an outspoken skeptic of this trend because I know how profoundly dumb data are about causes and effects. For example, information about the effects of actions or interventions is simply not available in raw data, unless it is collected by controlled experimental manipulation. By contrast, if we are in possession of a causal model, we can often predict the result of an intervention from hands-off, intervention-free data.

The case for causal models becomes even more compelling when we seek to answer counterfactual queries such as “What would have happened had we acted differently?” We will discuss counterfactuals in great detail because they are the most challenging queries for any artificial intelligence. They are also at the core of the cognitive advances that made us human and the imaginative abilities that have made science possible. We will also explain why any query about the mechanism by which causes transmit their effects, the most prototypical “Why?” question, is actually a counterfactual question in disguise. Thus, if we ever want robots to answer “Why?” questions or even understand what they mean,
we must equip them with a causal model and teach them how to answer counterfactual queries, as in Figure I.1.

Another advantage causal models have that data mining and deep learning lack is adaptability. Note that in Figure I.1, the estimand is computed on the basis of the causal model alone, prior to an examination of the specifics of the data. This makes the causal inference engine supremely adaptable, because the estimand computed is good for any data that are compatible with the qualitative model, regardless of the numerical relationships among the variables.

To see why this adaptability is important, compare this engine with a learning agent (in this instance a human, but in other cases perhaps a deep-learning algorithm or maybe a human using a deep-learning algorithm) trying to learn solely from the data. By observing the outcome L of many patients given Drug D, she is able to predict the probability that a patient with characteristics Z will survive L years. Now she is transferred to a different hospital, in a different part of town, where the population characteristics (diet, hygiene, work habits) are different. Even if these new characteristics merely modify the numerical relationships among the variables recorded, she will still have to retrain herself and learn a new prediction function all over again. That’s all that a deep-learning program can do: fit a function to data. On the other hand, if she possessed a model of how the drug operated and its causal structure remained intact in the new location, then the estimand she obtained in training would remain valid. It could be applied to the new data to generate a new population-specific prediction function.

Many scientific questions look different “through a causal lens,” and I have delighted in playing with this lens, which over the last twenty-five years has been increasingly empowered by new insights and new tools. I hope and believe that readers of this book will share in my delight. Therefore, I’d like to close this introduction with a preview of some of the coming attractions in this book.

Chapter 1 assembles the three steps of observation, intervention, and counterfactuals into the Ladder of Causation, the central metaphor of this book. It will also expose you to the basics of
reasoning with causal diagrams, our main modeling tool, and set you well on your way to becoming a proficient causal reasoner. In fact, you will be far ahead of generations of data scientists who attempted to interpret data through a model-blind lens, oblivious to the distinctions that the Ladder of Causation illuminates.

Chapter 2 tells the bizarre story of how the discipline of statistics inflicted causal blindness on itself, with far-reaching effects for all sciences that depend on data. It also tells the story of one of the great heroes of this book, the geneticist Sewall Wright, who in the 1920s drew the first causal diagrams and for many years was one of the few scientists who dared to take causality seriously.

Chapter 3 relates the equally curious story of how I became a convert to causality through my work in AI and particularly on Bayesian networks. These were the first tool that allowed computers to think in “shades of gray,” and for a time I believed they held the key to unlocking AI. Toward the end of the 1980s I became convinced that I was wrong, and this chapter tells of my journey from prophet to apostate. Nevertheless, Bayesian networks remain a very important tool for AI and still encapsulate much of the mathematical foundation of causal diagrams. In addition to a gentle, causality-minded introduction to Bayes’s rule and Bayesian methods of reasoning, Chapter 3 will entertain the reader with examples of real-life applications of Bayesian networks.

Chapter 4 tells about the major contribution of statistics to causal inference: the randomized controlled trial (RCT). From a causal perspective, the RCT is a man-made tool for uncovering the query P(L | do(D)), which is a property of nature. Its main purpose is to disassociate variables of interest (say, D and L) from other variables (Z) that would otherwise affect them both. Disarming the distortions, or “confounding,” produced by such lurking variables has been a century-old problem. This chapter walks the reader through a surprisingly simple solution to the general confounding problem, which you will grasp in ten minutes of playfully tracing paths in a diagram.

Chapter 5 gives an account of a seminal moment in the history of causation and indeed the history of science, when statisticians
struggled with the question of whether smoking causes lung cancer. Unable to use their favorite tool, the randomized controlled trial, they struggled to agree on an answer or even on how to make sense of the question. The smoking debate brings the importance of causality into its sharpest focus. Millions of lives were lost or shortened because scientists did not have an adequate language or methodology for answering causal questions.

Chapter 6 will, I hope, be a welcome diversion for the reader after the serious matters of Chapter 5. This is a chapter of paradoxes: the Monty Hall paradox, Simpson’s paradox, Berkson’s paradox, and others. Classical paradoxes like these can be enjoyed as brainteasers, but they have a serious side too, especially when viewed from a causal perspective. In fact, almost all of them represent clashes with causal intuition and therefore reveal the anatomy of that intuition. They were canaries in the coal mine that should have alerted scientists to the fact that human intuition is grounded in causal, not statistical, logic. I believe that the reader will enjoy this new twist on his or her favorite old paradoxes.

Chapters 7 to 9 finally take readers on a thrilling ascent of the Ladder of Causation. We start in Chapter 7 with questions about intervention and explain how my students and I went through a twenty-year struggle to automate the answers to do-type questions. We succeeded, and this chapter explains the guts of the “causal inference engine,” which produces the yes/no answer and the estimand in Figure I.1. Studying this engine will empower the reader to spot certain patterns in the causal diagram that deliver immediate answers to the causal query. These patterns are called back-door adjustment, front-door adjustment, and instrumental variables, the workhorses of causal inference in practice.

Chapter 8 takes you to the top of the ladder by discussing counterfactuals. These have been seen as a fundamental part of causality at least since 1748, when Scottish philosopher David Hume proposed the following somewhat contorted definition of causation: “We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second. Or, in other words, where, if the first
object had not been, the second never had existed.” David Lewis, a philosopher at Princeton University who died in 2001, pointed out that Hume really gave two definitions, not one, the first of regularity (i.e., the cause is regularly followed by the effect) and the second of the counterfactual (“if the first object had not been . . .”). While philosophers and scientists had mostly paid attention to the regularity definition, Lewis argued that the counterfactual definition aligns more closely with human intuition: “We think of a cause as something that makes a difference, and the difference it makes must be a difference from what would have happened without it.”

Readers will be excited to find out that we can now move past the academic debates and compute an actual value (or probability) for any counterfactual query, no matter how convoluted. Of special interest are questions concerning necessary and sufficient causes of observed events. For example, how likely is it that the defendant’s action was a necessary cause of the claimant’s injury? How likely is it that man-made climate change is a sufficient cause of a heat wave?

Finally, Chapter 9 discusses the topic of mediation. You may have wondered, when we talked about drawing arrows in a causal diagram, whether we should draw an arrow from Drug D to Lifespan L if the drug affects lifespan only by way of its effect on blood pressure Z (a mediator). In other words, is the effect of D on L direct or indirect? And if both, how do we assess their relative importance? Such questions are not only of great scientific interest but also have practical ramifications; if we understand the mechanism through which a drug acts, we might be able to develop other drugs with the same effect that are cheaper or have fewer side effects. The reader will be pleased to discover how this age-old quest for a mediation mechanism has been reduced to an algebraic exercise and how scientists are using the new tools in the causal tool kit to solve such problems.

Chapter 10 brings the book to a close by coming back to the problem that initially led me to causation: the problem of automating human-level intelligence (sometimes called “strong AI”). I believe that causal reasoning is essential for machines to communicate
with us in our own language about policies, experiments, explanations, theories, regret, responsibility, free will, and obligations, and eventually to make their own moral decisions.

If I could sum up the message of this book in one pithy phrase, it would be that you are smarter than your data. Data do not understand causes and effects; humans do. I hope that the new science of causal inference will enable us to better understand how we do it, because there is no better way to understand ourselves than by emulating ourselves. In the age of computers, this new understanding also brings with it the prospect of amplifying our innate abilities so that we can make better sense of data, be it big or small.
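
The distinction drawn earlier in this introduction between an estimand (the recipe) and an estimate (what comes out of the oven) can be made concrete with a small Python sketch. It uses the observational query P(D | L) as the recipe and applies it, unchanged, to two finite samples. The patient records, the two hospitals, and the helper name `estimate_p_d_given_l` are invented for illustration, and the standard-error formula is the usual textbook approximation for a sample proportion, not something taken from the text.

```python
from math import sqrt

def estimate_p_d_given_l(records):
    """Apply the estimand P(D | L) to a finite sample.

    `records` is a list of (took_drug, has_lifespan_l) boolean pairs.
    Returns (estimate, standard_error). The standard error is a rough
    measure of how far a finite sample may stray from the population.
    """
    # Keep only patients with lifespan L, then ask who among them took the drug.
    with_l = [took_drug for took_drug, has_l in records if has_l]
    n = len(with_l)
    if n == 0:
        raise ValueError("no patients with lifespan L in the sample")
    p = sum(with_l) / n
    return p, sqrt(p * (1 - p) / n)

# Two hypothetical samples with made-up counts.
hospital_a = [(True, True)] * 30 + [(False, True)] * 10 + [(True, False)] * 20
hospital_b = [(True, True)] * 12 + [(False, True)] * 28 + [(False, False)] * 5

# The recipe stays the same; only the data, and hence the estimate, change.
for name, sample in [("A", hospital_a), ("B", hospital_b)]:
    p, se = estimate_p_d_given_l(sample)
    print(f"Hospital {name}: P(D | L) is about {p:.2f} (standard error {se:.2f})")
```

The sketch only illustrates the division of labor: the function plays the role of the estimand and is fixed before any data arrive, while each call on a new sample yields a new, approximate estimate.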

1 THE LADDER OF CAUSATION

In the beginning . . .

I was probably six or seven years old when I first read the story of Adam and Eve in the Garden of Eden. My classmates and I were not at all surprised by God’s capricious demands, forbidding them to eat from the Tree of Knowledge. Deities have their reasons, we thought. What we were more intrigued by was the idea that as soon as they ate from the Tree of Knowledge, Adam and Eve became conscious, like us, of their nakedness.

As teenagers, our interest shifted slowly to the more philosophical aspects of the story. (Israeli students read Genesis several times a year.) Of primary concern to us was the notion that the emergence of human knowledge was not a joyful process but a painful one, accompanied by disobedience, guilt, and punishment. Was it worth giving up the carefree life of Eden? some asked. Were the agricultural and scientific revolutions that followed worth the economic hardships, wars, and social injustices that modern life entails?

Don’t get me wrong: we were no creationists; even our teachers were Darwinists at heart. We knew, however, that the author who choreographed the story of Genesis struggled to answer the most pressing philosophical questions of his time. We likewise suspected that this story bore the cultural footprints of the actual process by which Homo sapiens gained dominion over our planet. What, then, was the sequence of steps in this speedy, super-evolutionary process?

My interest in these questions waned in my early career as a professor of engineering but was reignited suddenly in the 1990s, when, while writing my book Causality, I confronted the Ladder of Causation.

As I reread Genesis for the hundredth time, I noticed a nuance that had somehow eluded my attention for all those years. When God finds Adam hiding in the garden, he asks, “Have you eaten from the tree which I forbade you?” And Adam answers, “The woman you gave me for a companion, she gave me fruit from the tree and I ate.” “What is this you have done?” God asks Eve. She replies, “The serpent deceived me, and I ate.”

As we know, this blame game did not work very well on the Almighty, who banished both of them from the garden. But here is the point I had missed before: God asked “what,” and they answered “why.” God asked for the facts, and they replied with explanations. Moreover, both were thoroughly convinced that naming causes would somehow paint their actions in a different light. Where did they get this idea?

For me, these nuances carried three profound implications. First, very early in our evolution, we humans realized that the world is not made up only of dry facts (what we might call data today); rather, these facts are glued together by an intricate web of cause-effect relationships. Second, causal explanations, not dry facts, make up the bulk of our knowledge, and should be the cornerstone of machine intelligence. Finally, our transition from processors of data to makers of explanations was not gradual; it was a leap that required an external push from an uncommon fruit. This matched perfectly with what I had observed theoretically in the Ladder of Causation: No machine can derive explanations from raw data. It needs a push.

If we seek confirmation of these messages from evolutionary science, we won’t find the Tree of Knowledge, of course, but we still see a major unexplained transition. We understand now that humans evolved from apelike ancestors over a period of 5 million to 6 million years and that such gradual evolutionary processes are not uncommon to life on earth. But in roughly the last 50,000 years, something unique happened, which some call the Cognitive
Revolution Revolutionand andothers others(with (withaatouch touchofofirony) irony)call callthe theGreat GreatLeap Leap Forward. Forward.Humans Humansacquired acquiredthe theability abilitytotomodify modifytheir theirenvironenvironment mentand andtheir theirown ownabilities abilitiesatataadramatically dramaticallyfaster fasterrate. rate. For Forexample, example,over overmillions millionsofofyears, years,eagles eaglesand andowls owlshave haveevolved evolved truly trulyamazing amazingeyesight—yet eyesight—yetthey’ve they’venever neverdevised devisedeyeglasses, eyeglasses,mimicroscopes, croscopes,telescopes, telescopes,or ornight-vision night-visiongoggles. goggles.Humans Humanshave haveproproduced ducedthese thesemiracles miraclesininaamatter matterofofcenturies. centuries.I Icall callthis thisphenomenon phenomenon the the“super-evolutionary “super-evolutionaryspeedup.” speedup.”Some Somereaders readersmight mightobject objecttotomy my comparing comparingapples applesand andoranges, oranges,evolution evolutiontotoengineering, engineering,but butthat that isisexactly exactlymy mypoint. point.Evolution Evolutionhas hasendowed endowedus uswith withthe theability abilitytoto engineer engineerour ourlives, lives,aagift giftshe shehas hasnot notbestowed bestowedon oneagles eaglesand andowls, owls, and andthe thequestion, question,again, again,isis“Why?” “Why?”What Whatcomputational computationalfacility facilitydid did humans humanssuddenly suddenlyacquire acquirethat thateagles eaglesdid didnot? not? Many Manytheories theorieshave havebeen beenproposed, proposed,but butone oneisisespecially especiallypertipertinent nenttotothe theidea ideaofofcausation. 
causation.InInhis hisbook bookSapiens, Sapiens,historian historianYuval Yuval Harari Harariposits positsthat thatour ourancestors’ ancestors’capacity capacitytotoimagine imaginenonexistent nonexistent things thingswas wasthe thekey keytotoeverything, everything,for forititallowed allowedthem themtotocommunicommunicate catebetter. better.Before Beforethis thischange, change,they theycould couldonly onlytrust trustpeople peoplefrom from their theirimmediate immediatefamily familyor ortribe. tribe.Afterward Afterwardtheir theirtrust trustextended extendedtoto larger largercommunities, communities,bound boundby bycommon commonfantasies fantasies(for (forexample, example,bebelief liefinininvisible invisibleyet yetimaginable imaginabledeities, deities,ininthe theafterlife, afterlife,and andininthe the divinity divinityofofthe theleader) leader)and andexpectations. expectations.Whether Whetheror ornot notyou youagree agree with withHarari’s Harari’stheory, theory,the theconnection connectionbetween betweenimagining imaginingand andcausal causal relations relationsisisalmost almostself-evident. self-evident.ItItisisuseless uselesstotoask askfor forthe thecauses causesofof things thingsunless unlessyou youcan canimagine imaginetheir theirconsequences. consequences.Conversely, Conversely,you you cannot cannotclaim claimthat thatEve Evecaused causedyou youtotoeat eatfrom fromthe thetree treeunless unlessyou you can canimagine imagineaaworld worldininwhich, which,counter countertotofacts, facts,she shedid didnot nothand hand you youthe theapple. apple. Back Back toto our our Homo Homo sapiens sapiens ancestors: ancestors: their their newly newly acquired acquired causal causalimagination imaginationenabled enabledthem themtotodo domany manythings thingsmore moreeffi efficiently ciently through throughaatricky trickyprocess processwe wecall call“planning.” “planning.”Imagine Imagineaatribe tribeprepreparing paringfor foraamammoth mammothhunt. 
What would it take for them to succeed? My mammoth-hunting skills are rusty, I must admit, but as a student of thinking machines, I have learned one thing: a thinking entity (computer, caveman, or professor) can only accomplish a task of such magnitude by planning things in advance—by deciding how many hunters to recruit; by gauging, given wind conditions, the direction from which to approach the mammoth; in short, by imagining and comparing the consequences of several hunting strategies. To do this, the thinking entity must possess, consult, and manipulate a mental model of its reality.

Figure 1.1. Perceived causes of a successful mammoth hunt.

Figure 1.1 shows how we might draw such a mental model. Each dot in Figure 1.1 represents a cause of success. Note that there are multiple causes and that none of them are deterministic. That is, we cannot be sure that having more hunters will enable success or that rain will prevent it, but these factors do change the probability of success.

The mental model is the arena where imagination takes place. It enables us to experiment with different scenarios by making local alterations to the model. Somewhere in our hunters’ mental model was a subroutine that evaluated the effect of the number of hunters. When they considered adding more, they didn’t have to evaluate every other factor from scratch. They could make a local change to the model, replacing “Hunters = 8” with “Hunters = 9,” and reevaluate the probability of success. This modularity is a key feature of causal models.

I don’t mean to imply, of course, that early humans actually drew a pictorial model like this one. But when we seek to emulate human thought on a computer, or indeed when we try to solve unfamiliar scientific problems, drawing an explicit dots-and-arrows picture is extremely useful. These causal diagrams are the computational core of the “causal inference engine” described in the Introduction.
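The local change described for the mammoth-hunt model, replacing “Hunters = 8” with “Hunters = 9” and reevaluating, can be illustrated with a small Python sketch. Everything in it is an assumption made for the example: the three factors, their numeric effects, and the logistic rule that combines them into a probability are invented, not taken from Figure 1.1.

```python
import math

# Each perceived cause of success gets its own small "subroutine".
# The functional forms and all numbers are invented for illustration.
def hunters_effect(n):
    return 0.4 * (n - 8)              # each extra hunter nudges the odds up

def rain_effect(raining):
    return -1.0 if raining else 0.0   # rain hurts, but does not rule out success

def wind_effect(approach_downwind):
    return 0.5 if approach_downwind else -0.5

EFFECTS = {
    "hunters": hunters_effect,
    "rain": rain_effect,
    "approach_downwind": wind_effect,
}

def success_probability(scenario):
    """Add up the per-factor contributions and squash them into a probability.

    The logistic link is an assumption of this sketch.
    """
    log_odds = sum(EFFECTS[name](value) for name, value in scenario.items())
    return 1.0 / (1.0 + math.exp(-log_odds))

def with_change(scenario, **changes):
    """Return a copy of the scenario with a local alteration applied."""
    altered = dict(scenario)
    altered.update(changes)
    return altered

baseline = {"hunters": 8, "rain": False, "approach_downwind": True}
more_hunters = with_change(baseline, hunters=9)   # "Hunters = 8" -> "Hunters = 9"
```

Calling `success_probability(more_hunters)` reuses the rain and wind subroutines unchanged; only the hunters entry differs from `baseline`, which is the modularity the text describes.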

THE THREE LEVELS OF CAUSATION

So far I may have given the impression that the ability to organize our knowledge of the world into causes and effects was monolithic and acquired all at once. In fact, my research on machine learning has taught me that a causal learner must master at least three distinct levels of cognitive ability: seeing, doing, and imagining.

The first, seeing or observing, entails detection of regularities in our environment and is shared by many animals as well as early humans before the Cognitive Revolution. The second, doing, entails predicting the effect(s) of deliberate alterations of the environment and choosing among these alterations to produce a desired outcome. Only a small handful of species have demonstrated elements of this skill. Use of tools, provided it is intentional and not just accidental or copied from ancestors, could be taken as a sign of reaching this second level. Yet even tool users do not necessarily possess a “theory” of their tool that tells them why it works and what to do when it doesn’t. For that, you need to have achieved a level of understanding that permits imagining. It was primarily this third level that prepared us for further revolutions in agriculture and science and led to a sudden and drastic change in our species’ impact on the planet.

I cannot prove this, but I can prove mathematically that the three levels differ fundamentally, each unleashing capabilities that the ones below it do not. The framework I use to show this goes back to Alan Turing, the pioneer of research in artificial intelligence (AI), who proposed to classify a cognitive system in terms of the queries it can answer. This approach is exceptionally fruitful when we are talking about causality because it bypasses long and unproductive discussions of what exactly causality is and focuses instead on the concrete and answerable question “What can a causal reasoner do?” Or more precisely, what can an organism possessing a causal model compute that one lacking such a model cannot?


Figure 1.2. The Ladder of Causation, with representative organisms at each level. Most animals, as well as present-day learning machines, are on the first rung, learning from association. Tool users, such as early humans, are on the second rung if they act by planning and not merely by imitation. We can also use experiments to learn the effects of interventions, and presumably this is how babies acquire much of their causal knowledge. Counterfactual learners, on the top rung, can imagine worlds that do not exist and infer reasons for observed phenomena. (Source: Drawing by Maayan Harel.)


While Turing was looking for a binary classification—human or nonhuman—ours has three tiers, corresponding to progressively more powerful causal queries. Using these criteria, we can assemble the three levels of queries into one Ladder of Causation (Figure 1.2), a metaphor that we will return to again and again.

Let’s take some time to consider each rung of the ladder in detail. At the first level, association, we are looking for regularities in observations. This is what an owl does when observing how a rat moves and figuring out where the rodent is likely to be a moment later, and it is what a computer Go program does when it studies a database of millions of Go games so that it can figure out which moves are associated with a higher percentage of wins. We say that one event is associated with another if observing one changes the likelihood of observing the other.

The first rung of the ladder calls for predictions based on passive observations. It is characterized by the question “What if I see . . . ?” For instance, imagine a marketing director at a department store who asks, “How likely is a customer who bought toothpaste to also buy dental floss?” Such questions are the bread and butter of statistics, and they are answered, first and foremost, by collecting and analyzing data. In our case, the question can be answered by first taking the data consisting of the shopping behavior of all customers, selecting only those who bought toothpaste, and, focusing on the latter group, computing the proportion who also bought dental floss. This proportion, also known as a “conditional probability,” measures (for large data) the degree of association between “buying toothpaste” and “buying floss.” Symbolically, we can write it as P(floss | toothpaste). The “P” stands for “probability,” and the vertical line means “given that you see.”

Statisticians have developed many elaborate methods to reduce a large body of data and identify associations between variables.
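The toothpaste-and-floss proportion described above can be computed in a few lines. The following Python sketch selects the customers who bought toothpaste and takes the share of that group who also bought floss. The `baskets` list and the helper name `conditional_probability` are hypothetical, made up for the example, and six baskets are far too few to count as the "large data" the text calls for.

```python
# Hypothetical shopping records: each set lists what one customer bought.
baskets = [
    {"toothpaste", "floss"},
    {"toothpaste"},
    {"toothpaste", "floss", "soap"},
    {"soap"},
    {"floss"},
    {"toothpaste", "soap"},
]

def conditional_probability(baskets, target, given):
    """Estimate P(target | given) as a proportion among customers who bought `given`."""
    bought_given = [b for b in baskets if given in b]
    if not bought_given:
        raise ValueError(f"no customer bought {given!r}")
    return sum(target in b for b in bought_given) / len(bought_given)

# P(floss | toothpaste): 2 of the 4 toothpaste buyers also bought floss.
p_floss_given_toothpaste = conditional_probability(baskets, "floss", "toothpaste")
```

Swapping the arguments gives P(toothpaste | floss), a different number in general. Neither value says which purchase, if either, causes the other.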
“Correlation” or “regression,” a typical measure of association mentioned often in this book, involves fitting a line to a collection of data points and taking the slope of that line. Some associations might have obvious causal interpretations; others may not. But statistics alone cannot tell which is the cause and which is the effect, toothpaste or floss. From the point of view of the sales manager, it may not really matter. Good predictions need not have good explanations. The owl can be a good hunter without understanding why the rat always goes from point A to point B.

Some readers may be surprised to see that I have placed present-day learning machines squarely on rung one of the Ladder of Causation, sharing the wisdom of an owl. We hear almost every day, it seems, about rapid advances in machine learning systems—self-driving cars, speech-recognition systems, and, especially in recent years, deep-learning algorithms (or deep neural networks). How could they still be only at level one?

The successes of deep learning have been truly remarkable and have caught many of us by surprise. Nevertheless, deep learning has succeeded primarily by showing that certain questions or tasks we thought were difficult are in fact not. It has not addressed the truly difficult questions that continue to prevent us from achieving humanlike AI. As a result the public believes that “strong AI,” machines that think like humans, is just around the corner or maybe even here already.
In reality, nothing could be farther from the truth. I fully agree with Gary Marcus, a neuroscientist at New York University, who recently wrote in the New York Times that the field of artificial intelligence is “bursting with micro discoveries”—the sort of things that make good press releases—but machines are still disappointingly far from humanlike cognition. My colleague in computer science at the University of California, Los Angeles, Adnan Darwiche, has titled a position paper “Human-Level Intelligence or Animal-Like Abilities?” which I think frames the question in just the right way. The goal of strong AI is to produce machines with humanlike intelligence, able to converse with and guide humans. Deep learning has instead given us machines with truly impressive abilities but no intelligence. The difference is profound and lies in the absence of a model of reality.

Just as they did thirty years ago, machine learning programs (including those with deep neural networks) operate almost entirely in an associational mode.
They are driven by a stream of observations to which they attempt to fit a function, in much the same way that a statistician tries to fit a line to a collection of points. Deep neural networks have added many more layers to the complexity of the fitted function, but raw data still drives the fitting process. They continue to improve in accuracy as more data are fitted, but they do not benefit from the “super-evolutionary speedup.” If, for example, the programmers of a driverless car want it to react differently to new situations, they have to add those new reactions explicitly. The machine will not figure out for itself that a pedestrian with a bottle of whiskey in hand is likely to respond differently to a honking horn. This lack of flexibility and adaptability is inevitable in any system that works at the first level of the Ladder of Causation.

We step up to the next level of causal queries when we begin to change the world. A typical question for this level is “What will happen to our floss sales if we double the price of toothpaste?” This already calls for a new kind of knowledge, absent from the data, which we find at rung two of the Ladder of Causation, intervention.

Intervention ranks higher than association because it involves not just seeing but changing what is.
Seeing smoke tells us a totally different story about the likelihood of fire than making smoke. We cannot answer questions about interventions with passively collected data, no matter how big the data set or how deep the neural network. Many scientists have been quite traumatized to learn that none of the methods they learned in statistics is sufficient even to articulate, let alone answer, a simple question like “What happens if we double the price?” I know this because on many occasions I have helped them climb to the next rung of the ladder.

Why can’t we answer our floss question just by observation? Why not just go into our vast database of previous purchases and see what happened previously when toothpaste cost twice as much? The reason is that on the previous occasions, the price may have been higher for different reasons. For example, the product may have been in short supply, and every other store also had to raise its price. But now you are considering a deliberate intervention that will set a new price regardless of market conditions.
The result might be quite different from when the customer couldn’t find a better deal elsewhere. If you had data on the market conditions that existed on the previous occasions, perhaps you could make a better prediction . . . but what data do you need? And then, how would you figure it out? Those are exactly the questions the science of causal inference allows us to answer.

A very direct way to predict the result of an intervention is to experiment with it under carefully controlled conditions. Big-data companies like Facebook know this and constantly perform experiments to see what happens if items on the screen are arranged differently or the customer gets a different prompt (or even a different price).

More interesting and less widely known—even in Silicon Valley—is that successful predictions of the effects of interventions can sometimes be made even without an experiment. For example, the sales manager could develop a model of consumer behavior that includes market conditions. Even if she doesn’t have data on every factor, she might have data on enough key surrogates to make the prediction. A sufficiently strong and accurate causal model can allow us to use rung-one (observational) data to answer rung-two (interventional) queries.
Without the causal model, we could not go from rung one to rung two. This is why deep-learning systems (as long as they use only rung-one data and do not have a causal model) will never be able to answer questions about interventions, which by definition break the rules of the environment the machine was trained in.

As these examples illustrate, the defining query of the second rung of the Ladder of Causation is “What if we do . . . ?” What will happen if we change the environment? We can write this kind of query as P(floss | do(toothpaste)), which asks about the probability that we will sell floss at a certain price, given that we set the price of toothpaste at another price.

Another popular question at the second level of causation is “How?,” which is a cousin of “What if we do . . . ?” For instance, the manager may tell us that we have too much toothpaste in our warehouse. “How can we sell it?” he asks. That is, what price should we set for it? Again, the question refers to an intervention, which we want to perform mentally before we decide whether and how to do it in real life. That requires a causal model.
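The gap between seeing a high toothpaste price and setting one can be made concrete with a toy calculation in Python. This is a sketch under stated assumptions, not the book's own method. The variable names (`shortage`, `high`, `floss`) and every probability are invented. The assumed causal diagram is that a supply shortage (standing in for "market conditions") influences both the price and the customer's purchase, and that the price also influences the purchase. Under that diagram, and only under it, the standard adjustment formula recovers P(floss | do(high price)) from observational quantities.

```python
from itertools import product

# Hypothetical numbers, invented for illustration only.
P_SHORTAGE = 0.2                        # P(shortage = 1)
P_HIGH_GIVEN_S = {1: 0.9, 0: 0.1}       # P(high price = 1 | shortage)
P_FLOSS_GIVEN_HS = {                    # P(floss bought = 1 | high price, shortage)
    (1, 1): 0.30,   # price is high everywhere, so customers still buy here
    (1, 0): 0.10,   # price is high only here, so customers go elsewhere
    (0, 1): 0.30,
    (0, 0): 0.30,
}

NAMES = ("shortage", "high", "floss")

def bernoulli(p, value):
    """Probability that a 0/1 variable with P(1) = p takes `value`."""
    return p if value == 1 else 1.0 - p

def joint_distribution():
    """Joint P(shortage, high, floss): what unlimited passive observation would reveal."""
    joint = {}
    for s, h, f in product((0, 1), repeat=3):
        joint[(s, h, f)] = (
            bernoulli(P_SHORTAGE, s)
            * bernoulli(P_HIGH_GIVEN_S[s], h)
            * bernoulli(P_FLOSS_GIVEN_HS[(h, s)], f)
        )
    return joint

def prob(joint, **fixed):
    """Sum the joint over all outcomes consistent with the fixed values."""
    return sum(
        p for outcome, p in joint.items()
        if all(outcome[NAMES.index(k)] == v for k, v in fixed.items())
    )

def seeing(joint):
    """Rung one: P(floss = 1 | high = 1), a plain conditional probability."""
    return prob(joint, high=1, floss=1) / prob(joint, high=1)

def doing(joint):
    """Rung two: P(floss = 1 | do(high = 1)), adjusting for shortage.

    Valid only if the assumed causal diagram is correct.
    """
    total = 0.0
    for s in (0, 1):
        p_floss = prob(joint, shortage=s, high=1, floss=1) / prob(joint, shortage=s, high=1)
        total += p_floss * prob(joint, shortage=s)
    return total

joint = joint_distribution()
```

With these invented numbers, `seeing(joint)` comes to roughly 0.238 while `doing(joint)` is 0.14. Past high prices mostly coincided with shortages, when customers had nowhere else to go, so the observational figure overstates what a deliberate price increase would achieve. Both functions read the same rung-one table; `doing` differs only in using the assumed diagram to decide what to adjust for.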
