
Chapter 1 Principles of Probability
1. Combining independent probabilities.
You have applied to three medical schools: University of California at San Francisco (UCSF), Duluth School of Mines (DSM), and Harvard (H). You guess that the probabilities you'll be accepted are: p(UCSF) = 0.10, p(DSM) = 0.30, and p(H) = 0.50. Assume that the acceptance events are independent.
(a) What is the probability that you get in somewhere (at least one acceptance)?
(b) What is the probability that you will be accepted by both Harvard and Duluth?
(a) The simplest way to solve this problem is to recall that when events are independent and you want the probability of events A and B, you can multiply their probabilities. When events are mutually exclusive and you want the probability of event A or B, you can add their probabilities. Therefore we try to structure the problem into an and-and-or problem. We want the probability of getting into H or DSM or UCSF. But this doesn't help directly, because these events are not mutually exclusive (mutually exclusive means that if one happens, the other cannot happen). So we try again. The probability of acceptance somewhere, P(a), is P(a) = 1 − P(r), where P(r) is the probability that you're rejected everywhere. (You're either accepted somewhere or you're not.) But this probability can be put in the above terms: P(r) = the probability that you're rejected at H and at DSM and at UCSF. These events are independent, so we have the answer.
The probability of rejection at H is $p(r_H) = 1 - 0.5 = 0.5$. Rejection at DSM is $p(r_{DSM}) = 1 - 0.3 = 0.7$. Rejection at UCSF is $p(r_{UCSF}) = 1 - 0.1 = 0.9$. Therefore $P(r) = (0.5)(0.7)(0.9) = 0.315$, and the probability of at least one acceptance is $P(a) = 1 - P(r) = 0.685$.
(b) The simple answer is that this is the intersection of two independent events:
$$ p(a_H)\,p(a_{DSM}) = (0.50)(0.30) = 0.15. $$
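A more mechanical approach to either part (a) or this part is to write out all the possible circumstances. Rejection and acceptance at H are mutually exclusive, so their probabilities add to one. The same holds for the other two schools. Therefore all possible circumstances are taken into account by adding the mutually exclusive events together and multiplying independent events (writing $a_H$ for $p(a_H)$, and so on):
$$ 1 = (a_H + r_H)(a_{DSM} + r_{DSM})(a_{UCSF} + r_{UCSF}) = a_H a_{DSM} a_{UCSF} + a_H a_{DSM} r_{UCSF} + a_H r_{DSM} a_{UCSF} + \cdots + r_H r_{DSM} r_{UCSF}, $$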
where the first term is the probability of acceptance at all three, the second term represents acceptance at H and DSM but rejection at UCSF, the third term represents acceptance at H and UCSF but rejection at DSM, etc. These events are mutually exclusive with one another; therefore they are all added. Each individual term represents independent events, for example $a_H$ and $a_{DSM}$ and $a_{UCSF}$. Therefore it is simple to read off the answer in this problem: we want $a_H$ and $a_{DSM}$, but notice we don't care about UCSF. This probability is
$$ a_H a_{DSM} a_{UCSF} + a_H a_{DSM} r_{UCSF} = a_H a_{DSM}\,(a_{UCSF} + r_{UCSF}) = (0.50)(0.30) = 0.15. $$
Note that we could have solved part (a) the same way; it would have required adding up all the appropriate mutually exclusive events. You can check that it gives the same answer as above (but notice how much more tedious it is).
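As a check that both routes agree, the short Python sketch below enumerates all eight accept/reject combinations (the mechanical approach) and compares the result with the complement rule. The dictionary `p_accept` simply restates the guessed probabilities from the problem; the variable names are illustrative only.

```python
from itertools import product

# Guessed acceptance probabilities from the problem statement
p_accept = {"UCSF": 0.10, "DSM": 0.30, "H": 0.50}
schools = list(p_accept)

# Complement rule: P(a) = 1 - P(rejected everywhere)
p_reject_all = 1.0
for p in p_accept.values():
    p_reject_all *= 1.0 - p
p_any_complement = 1.0 - p_reject_all

# Mechanical approach: add the 2^3 mutually exclusive outcomes, each a
# product of independent per-school accept/reject probabilities
p_any_enum = 0.0
p_h_and_dsm = 0.0
for outcome in product([True, False], repeat=len(schools)):
    prob = 1.0
    for school, accepted in zip(schools, outcome):
        prob *= p_accept[school] if accepted else 1.0 - p_accept[school]
    accepted_at = {s for s, a in zip(schools, outcome) if a}
    if accepted_at:                      # at least one acceptance
        p_any_enum += prob
    if {"H", "DSM"} <= accepted_at:      # H and DSM, UCSF either way
        p_h_and_dsm += prob

print(p_any_complement, p_any_enum, p_h_and_dsm)
```

Even with three schools the enumeration needs eight terms, which illustrates why the complement rule is the more economical route for part (a).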
2. Probabilities of sequences.
Assume that the four bases A, C, T, and G occur with equal likelihood in a DNA sequence of nine monomers.
(a) What is the probability of finding the sequence AAATCGAGT through random chance?
(b) What is the probability of finding the sequence AAAAAAAAA through random chance?
(c) What is the probability of finding any sequence that has four A's, two T's, two G's, and one C, such as that in (a)?
(a) Each base occurs with probability 1/4. The probability of an A in position 1 is 1/4, that of an A in position 2 is 1/4, that of an A in position 3 is 1/4, that of a T in position 4 is 1/4, and so on. There are nine bases, so the probability of this specific sequence is $(1/4)^9 = 3.8 \times 10^{-6}$.
(b) Same answer as (a).
(c) Each specific sequence has the probability given above, but in this case there are many possible sequences that satisfy the requirement that we have four A's, two T's, two G's, and one C. How many are there? We start as we have done before, by assuming all nine objects are distinguishable. There are 9! arrangements of nine distinguishable objects in a linear sequence. (The first one can be in any of nine places, the second in any of the remaining eight places, and so on.) But we can't distinguish the four A's, so we have overcounted by a factor of 4!, and must divide this out. We can't distinguish the two T's, so we have overcounted by 2!, and must also divide this out. And so on. So the probability of having this composition is
$$ p = \frac{9!}{4!\,2!\,2!\,1!}\left(\frac{1}{4}\right)^9 = 3780\left(\frac{1}{4}\right)^9 \approx 1.44 \times 10^{-2}. $$
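The counting argument above can be checked numerically. This is a minimal sketch; the `counts` dictionary restates the composition from part (c), and the names are ours.

```python
from math import factorial

# Composition of the 9-mer in part (c): four A, two T, two G, one C
counts = {"A": 4, "T": 2, "G": 2, "C": 1}
n = sum(counts.values())

# Probability of any one specific 9-base sequence, parts (a) and (b)
p_specific = (1 / 4) ** n

# Distinguishable arrangements: 9! divided by the overcounting factors
arrangements = factorial(n)
for c in counts.values():
    arrangements //= factorial(c)

p_composition = arrangements * p_specific
print(arrangements, p_specific, p_composition)
```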
3. The probability of a sequence (given a composition).
A scientist has constructed a secret peptide to carry a message. You know only the composition of the peptide, which is six amino acids long. It contains one serine S, one threonine T, one cysteine C, one arginine R, and two glutamates E. What is the probability that the sequence SECRET will occur by chance?
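The S could be in any one of the six positions with equal likelihood. The probability that it is in position 1 is 1/6. Given that S is in the first position, we have two E's, which could occur in any of the remaining five positions. The probability that one of them is in position 2 is 2/5. Given those two letters in position, the probability that the C is in the next of the four remaining positions is 1/4. The probability for the R is 1/3. For the remaining E, it is 1/2, and for the last T, it is 1/1, so the probability is
$$ p = \left(\frac{1}{6}\right)\left(\frac{2}{5}\right)\left(\frac{1}{4}\right)\left(\frac{1}{3}\right)\left(\frac{1}{2}\right)\left(\frac{1}{1}\right) = \frac{2}{720} = \frac{1}{360} \approx 2.8 \times 10^{-3}. $$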
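A brute-force cross-check, assuming every distinct ordering of the known composition is equally likely: list all orderings of the letters, remove duplicates caused by the two identical E's, and compare with the sequential product above. Exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import permutations

# Distinct orderings of the composition S, E, C, R, E, T.
# The set removes duplicates that differ only by swapping the two E's.
letters = "SECRET"
distinct = set(permutations(letters))

# Each distinct ordering is assumed equally likely
p_secret = Fraction(1, len(distinct))

# Sequential argument from the text: 1/6 * 2/5 * 1/4 * 1/3 * 1/2 * 1/1
p_sequential = (Fraction(1, 6) * Fraction(2, 5) * Fraction(1, 4)
                * Fraction(1, 3) * Fraction(1, 2) * Fraction(1, 1))
print(len(distinct), p_secret, p_sequential)
```

The count of distinct orderings is 6!/2! = 360, which is the same denominator the sequential argument produces.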
4. Combining independent probabilities.
You have a fair six-sided die. You want to roll it enough times to ensure that a 2 occurs at least once. What number of rolls k is required to ensure that the probability is at least 2/3 that at least one 2 will appear?
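The probability that no 2 appears in k rolls is $(5/6)^k$, so we require
$$ 1 - \left(\frac{5}{6}\right)^k \ge \frac{2}{3} \quad\Longrightarrow\quad k \ge \frac{\ln 3}{\ln(6/5)} \approx 6.03. $$
Approximately six rolls are needed. Strictly, six rolls give $P = 1 - (5/6)^6 \approx 0.665$, just short of 2/3, so seven or more rolls are required to ensure $P \ge 2/3$ (seven rolls give $P \approx 0.721$).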
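The short sketch below solves the inequality both ways: the continuous estimate from logarithms and the smallest whole number of rolls that actually reaches the target.

```python
import math

target = 2 / 3          # required probability of at least one 2
p_not_two = 5 / 6       # probability that a single roll is not a 2

# Continuous estimate from solving 1 - (5/6)^k = 2/3 for k
k_estimate = math.log(3) / math.log(6 / 5)

# Smallest integer number of rolls that meets the target
k = 1
while 1 - p_not_two ** k < target:
    k += 1

print(f"k_estimate = {k_estimate:.2f}")
for rolls in (6, 7):
    print(rolls, round(1 - p_not_two ** rolls, 3))
print("smallest k:", k)
```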
5. Predicting compositions of independent events.
Suppose you roll a fair six-sided die three times.
(a) What is the probability of getting a 5 twice from all three rolls of the die?
(b) What is the probability of getting a total of at least two 5's from all three rolls of the die?
The probability of getting x 5's on n rolls of the die is
$$ P(x) = \frac{n!}{x!\,(n-x)!}\left(\frac{1}{6}\right)^x\left(\frac{5}{6}\right)^{n-x}. $$
Note that this is a “2-outcome” problem (getting a 5 or not getting a 5). It is not a “6-outcome” problem.
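(a) So the probability of two 5's on three rolls of the die is
$$ P(2) = \frac{3!}{2!\,1!}\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right) = \frac{15}{216} \approx 0.069. $$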
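(b) The probability of getting at least two 5's is the probability of getting two 5's or three 5's. Since these two situations are mutually exclusive, we seek
$$ P(2) + P(3) = \frac{15}{216} + \frac{1}{216} = \frac{16}{216} \approx 0.074. $$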
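The binomial formula above translates directly into code. In this sketch `binomial_pmf` is our own helper name, and exact fractions keep the results comparable with 15/216 and 16/216.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent two-outcome trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p_five = Fraction(1, 6)   # two-outcome problem: a 5 or not a 5
p_two = binomial_pmf(2, 3, p_five)
p_at_least_two = binomial_pmf(2, 3, p_five) + binomial_pmf(3, 3, p_five)
print(p_two, float(p_two))
print(p_at_least_two, float(p_at_least_two))
```

`Fraction` reduces automatically, so 15/216 prints as 5/72 and 16/216 as 2/27.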
7. Computing the average of a probability distribution.
8. Predicting coincidence.
Your statistical mechanics class has 25 students. What is the probability that at least two classmates have the same birthday?
If you first find the probability q that no two students have the same birthday, then the quantity you want is
$$ p(\text{at least two students share a birthday}) = 1 - q. $$
The probability that a second student does not have the same birthday as the first is (364/365). The probability that the third student has a birthday different from either of the first two is (363/365), and so on. It is like a sequence problem in which each possible birthday is one card drawn out of a barrel. The probability that no two people have the same birthday, out of m people, is
$$ q = \left(\frac{364}{365}\right)\left(\frac{363}{365}\right)\left(\frac{362}{365}\right)\cdots\left(\frac{365-(m-1)}{365}\right). $$
In factorial notation,
$$ q = \frac{N!}{(N-m)!\,N^m}, $$
where N = 365. (Incidentally, this expression is identical to the expression for excluded volume in the Flory–Huggins model of polymer solutions (see Chapter 31).) Using Stirling's approximation $x! \approx (x/e)^x$, we get
$$ q \approx \frac{(N/e)^N}{\left[(N-m)/e\right]^{N-m} N^m}. $$
Collecting together terms in e and dividing the numerator and denominator by $N^N$ gives
$$ q \approx \frac{e^{-m}}{(1 - m/N)^{N-m}}. $$
Substituting m = 25 students and N = 365 gives
$$ q = 0.4163, \quad\text{so}\quad p = 1 - q = 0.5837. $$
There is a better than 50% chance that two students will have the same birthday!
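To see how much the crude Stirling form shifts the answer, the sketch below evaluates both the exact product for q and the approximation $e^{-m}(1 - m/N)^{-(N-m)}$ derived above, for N = 365 and m = 25.

```python
import math

N = 365   # days in a year
m = 25    # students

# Exact product: q = (365/365)(364/365)...((N - m + 1)/N)
q_exact = 1.0
for i in range(m):
    q_exact *= (N - i) / N

# Crude Stirling form from the text: q ~ e^{-m} (1 - m/N)^{-(N - m)}
q_stirling = math.exp(-m) * (1 - m / N) ** (-(N - m))

print(f"exact:    q = {q_exact:.4f}, p = {1 - q_exact:.4f}")
print(f"Stirling: q = {q_stirling:.4f}, p = {1 - q_stirling:.4f}")
```

The exact product gives q ≈ 0.431 (p ≈ 0.569), so the crude approximation overestimates p somewhat; the conclusion that p exceeds 50% is unchanged. Keeping the $\sqrt{2\pi x}$ factor of the full Stirling formula multiplies the estimate of q by $\sqrt{N/(N-m)}$ and closes most of the gap.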
9. The distribution of scores on dice.
Suppose that you have n dice, each a different color, all unbiased and six-sided.
(a) If you roll them all at once, how many distinguishable outcomes are there?
(b) Given two distinguishable dice, what is the most probable sum of their face values on a given throw of the pair? (That is, which sum between 2 and 12 has the greatest number of different ways of occurring?)
(c) What is the probability of the most probable sum?
(a) There are 6 outcomes on one die, 6 × 6 on two dice, . . ., $6^n$ on n dice.
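(b) Number of ways a sum can occur:
Sum:    2   3   4   5   6   7   8   9   10   11   12
Ways:   1   2   3   4   5   6   5   4    3    2    1
The most probable sum is 7, which can occur in six ways.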
When the dice show different numbers, there is a degeneracy of two. When both dice show the same number, the degeneracy equals one.
(c) The probability of 7 is
$$ p(7) = \frac{\text{number of ways of getting 7}}{\text{total number of ways of all outcomes}} = \frac{6}{36} = \frac{1}{6}. $$
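The table of ways and the probability of the most probable sum can be generated by enumerating all 36 outcomes of two distinguishable dice, as in this short sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 6 x 6 outcomes for two distinguishable dice
outcomes = list(product(range(1, 7), repeat=2))
ways = Counter(a + b for a, b in outcomes)

# Sum with the largest number of ways, and its probability
most_probable = max(ways, key=ways.get)
p_most = Fraction(ways[most_probable], len(outcomes))

for total in range(2, 13):
    print(total, ways[total])
print("most probable sum:", most_probable, "with probability", p_most)
```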
10. The probabilities of identical sequences of amino acids.
You are comparing protein amino acid sequences for homology. You have a 20-letter alphabet (20 different amino acids). Each sequence is a string n letters in length. You have one test sequence and s different database sequences. You may find any one of the 20 different amino acids at any position in the sequence, independent of what you find at any other position. Let p represent the probability that there will be a ‘match’ at a given position in the two sequences.
(a) In terms of s, p, and n, how many of the s sequences will be perfect matches (identical residues at every position)?
(b) How many of the s comparisons (of the test sequence against each database sequence) will have exactly one mismatch at any position in the sequences?
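(a) For comparing one sequence, each position being assumed independent, the probability of a perfect match of all n residues is
$$ p^n = \frac{\text{number of matched sequences}}{\text{number of total sequences}} \quad\Longrightarrow\quad \text{number of matches in } s \text{ sequences} = s\,p^n. $$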
(b) n − 1 positions match, so the probability is $p^{n-1}$; one position doesn't match, which has the probability (1 − p); and there are n different positions at which the mismatch could occur; therefore the answer is
$$ s\,p^{n-1}(1-p)\,n. $$
Note that, in general, for k matches,
$$ P(k) = s\,p^k (1-p)^{n-k}\,\frac{n!}{k!\,(n-k)!}. \qquad (1) $$
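Equation (1) can be evaluated directly. In this sketch the values of s and n are hypothetical, chosen only for illustration, and p = 1/20 assumes all 20 amino acids are equally likely at each position; summing over every k should recover all s sequences.

```python
from math import comb

def expected_matches(s, p, n, k):
    """Expected number of the s database sequences that match the test
    sequence at exactly k of n independent positions, Eq. (1)."""
    return s * comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example values, for illustration only
s, n = 10_000, 10
p = 1 / 20   # assumes the 20 amino acids are equally likely at each position

perfect = expected_matches(s, p, n, n)           # part (a): s p^n
one_mismatch = expected_matches(s, p, n, n - 1)  # part (b): s p^(n-1) (1-p) n
total = sum(expected_matches(s, p, n, k) for k in range(n + 1))
print(perfect, one_mismatch, total)
```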
11. The combinatorics of disulfide bond formation.
A protein may contain several cysteines, which may pair together to form disulfide bonds as shown in the figure below. If there is an even number n of cysteines, n/2 disulfide bonds can form. How many different disulfide pairing arrangements are possible?
Number the individual sulfhydryl groups along the chain. The first sulfhydryl along the sequence can bond to any of the other n − 1. This removes two sulfhydryls from consideration. The next unpaired sulfhydryl can then bond to any of the remaining n − 3. Four sulfhydryls are now removed from consideration. The next unpaired one can then bond to any of the remaining n − 5 sulfhydryls, and so on, until all n/2 bonds are formed. Thus the total possible number of arrangements of disulfide bonds is a product of n/2 terms:
$$ W(n) = (n-1)(n-3)(n-5)\cdots(3)(1). $$
Another approach gives an expression that is easier to calculate. Consider placing the sulfhydryls in a sequence. The first place may be occupied by any of n sulfhydryls, the second place by any of n − 1 sulfhydryls, the third by any of n − 2 sulfhydryls, etc. Thus, if each sulfhydryl were distinguishable from every other, there would be n! arrangements. However, each sulfhydryl has a mate from which it cannot be distinguished. We must divide by a factor of 2 (per bond) to correct for the indistinguishability of the two ends of each bond. Finally, since we cannot distinguish any of the n/2 bonds from any other, we must also divide by (n/2)!. Hence the number of arrangements is
$$ W(n) = \frac{n!}{2^{n/2}\,(n/2)!}. $$
Although these two equations were derived in very different ways, they are numerically identical for all even n.
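The equivalence of the two counting formulas can be checked for small even n, together with a direct recursive count of pairings. The function names below are ours, not the text's.

```python
from math import factorial, prod

def pairings_product(n):
    """(n - 1)(n - 3)(n - 5)...(1): the first counting argument."""
    return prod(range(n - 1, 0, -2))

def pairings_factorial(n):
    """n! / (2^(n/2) (n/2)!): the second counting argument."""
    return factorial(n) // (2 ** (n // 2) * factorial(n // 2))

def pairings_brute_force(cysteines):
    """Count pairings recursively: bond the first cysteine to each of the
    others in turn, then count pairings of whatever remains."""
    if not cysteines:
        return 1
    rest = cysteines[1:]
    return sum(pairings_brute_force(rest[:i] + rest[i + 1:]) for i in range(len(rest)))

for n in range(2, 13, 2):
    print(n, pairings_product(n), pairings_factorial(n), pairings_brute_force(list(range(n))))
```

The recursive count mirrors the first argument step by step, so agreement with the factorial formula is a direct numerical check of the claim above.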
12. Predicting combinations of independent events.
If you flip an unbiased green coin and an unbiased red coin five times each, what is the probability of getting four red heads and two green tails?
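The probability of four red heads in five coin flips is
$$ \left(\frac{1}{2}\right)^5 \frac{5!}{4!\,1!} = \frac{5}{32}. $$
The probability of two green tails is
$$ \left(\frac{1}{2}\right)^5 \frac{5!}{2!\,3!} = \frac{10}{32}. $$
Since the green coin flips are independent of the red coin flips, the probability we seek is $(5/32)(10/32) = 50/1024 = 4.88 \times 10^{-2}$.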
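The same calculation in exact arithmetic, as a small sketch; `binomial_prob` is an illustrative helper for a fair coin.

```python
from fractions import Fraction
from math import comb

def binomial_prob(k, n):
    """Probability of exactly k specified faces in n flips of a fair coin."""
    return Fraction(comb(n, k), 2**n)

p_red = binomial_prob(4, 5)     # four red heads in five flips
p_green = binomial_prob(2, 5)   # two green tails in five flips
p_both = p_red * p_green        # independent coins, so multiply
print(p_red, p_green, p_both, float(p_both))
```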
13. A pair of aces.
What is the probability of drawing two aces in two random draws without replacement from a full deck of cards?
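A deck has 52 cards and four aces. The probability of getting an ace on the first draw is 4/52 = 1/13. Since you draw without replacement, the probability of getting one of the remaining three aces on the second draw is 3/51, so the probability of two aces on two draws is
$$ \left(\frac{4}{52}\right)\left(\frac{3}{51}\right) = \frac{1}{221} \approx 4.5 \times 10^{-3}. $$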
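A counting cross-check: the fraction of all unordered two-card hands that consist of two aces should equal the sequential product. The deck encoding (cards 0 to 3 are the aces) is a hypothetical convenience for the sketch.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical deck encoding: cards 0-3 are the aces, 4-51 are all other cards
deck = range(52)
aces = set(range(4))

# Sequential argument from the text: 4/52 on the first draw, then 3/51
p_sequential = Fraction(4, 52) * Fraction(3, 51)

# Counting argument: two-ace hands divided by all unordered two-card hands
hands = list(combinations(deck, 2))
p_counting = Fraction(sum(1 for h in hands if set(h) <= aces), len(hands))
print(p_sequential, p_counting, float(p_sequential))
```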
14. Average of a linear function.
What is the average value of x, given a distribution function q(x) = cx, where x ranges from zero to one, and q(x) is normalized?
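The worked solution is not shown here, but the two steps the question implies (normalize, then average) can be sketched numerically. This midpoint-rule check is our own illustration, not the text's method; it should approach the analytic values c = 2 (from $\int_0^1 cx\,dx = c/2 = 1$) and $\langle x \rangle = \int_0^1 2x^2\,dx = 2/3$.

```python
# Midpoint-rule check of normalization and mean for q(x) = c x on [0, 1]
n_steps = 100_000
dx = 1.0 / n_steps
xs = [(i + 0.5) * dx for i in range(n_steps)]

# Normalization: choose c so that the integral of c x over [0, 1] equals 1
c = 1.0 / sum(x * dx for x in xs)

# Average value: <x> = integral of x q(x) dx over [0, 1]
mean_x = sum(x * c * x * dx for x in xs)
print(c, mean_x)
```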
15. The Maxwell–Boltzmann probability distribution function.
According to the kinetic theory of gases, the energies of molecules moving along the x direction are given by $\varepsilon_x = \tfrac{1}{2} m v_x^2$, where m is mass and $v_x$ is the velocity in the x direction. The distribution of particles over velocities is given by the Boltzmann law, $p(v_x) \propto e^{-m v_x^2/2kT}$. This is the Maxwell–Boltzmann distribution (velocities may range from −∞ to +∞).
(a) Write the probability distribution $p(v_x)$ so that the Maxwell–Boltzmann distribution is correctly normalized.
(b) Compute the average energy $\langle \tfrac{1}{2} m v_x^2 \rangle$.
(c) What is the average velocity $\langle v_x \rangle$?
(d) What is the average momentum $\langle m v_x \rangle$?
(a) To write the probability distribution $p(v_x)\,dv_x$ so that the Maxwell–Boltzmann distribution is correctly normalized, we require
$$ \int_{-\infty}^{\infty} C\, e^{-m v_x^2/2kT}\, dv_x = 1. $$
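Again consulting our table of integrals, we find
$$ C = \left(\frac{m}{2\pi kT}\right)^{1/2}, \qquad p(v_x) = \left(\frac{m}{2\pi kT}\right)^{1/2} e^{-m v_x^2/2kT}. $$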
Aside: Integrals of the form
Note that we have used the result of the integral from part (a) above. Therefore
$$ \left\langle \tfrac{1}{2} m v_x^2 \right\rangle = \tfrac{1}{2} kT. $$
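(c) To find the average velocity $\langle v_x \rangle$, we recall that for functions with odd symmetry ($f(x) = -f(-x)$), the integral under the curve for negative x cancels with that under the curve for positive x. Using the fact that $p(x) = p(-x)$, the integrand $v_x\,p(v_x)$ is odd, so
$$ \langle v_x \rangle = \int_{-\infty}^{\infty} v_x\, p(v_x)\, dv_x = 0. $$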
(d) The average momentum is $\langle m v_x \rangle = m \langle v_x \rangle = 0$.
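A numerical check of all four parts, as a sketch in reduced units (m = 1 and kT = 1 are assumptions made only for illustration): integrate the Boltzmann weight with a midpoint rule on a wide symmetric grid and compare with $\sqrt{2\pi kT/m}$, kT/2, and 0.

```python
import math

# Reduced units, assumed for illustration only: m = 1 and kT = 1
m, kT = 1.0, 1.0

def boltzmann_weight(v):
    """Unnormalized Maxwell-Boltzmann weight exp(-m v^2 / 2kT)."""
    return math.exp(-m * v * v / (2 * kT))

# Midpoint-rule grid on [-L, L]; the weight is negligible beyond |v| = 12 here
L, n_steps = 12.0, 200_000
dv = 2 * L / n_steps
vs = [-L + (i + 0.5) * dv for i in range(n_steps)]
weights = [boltzmann_weight(v) for v in vs]

# Normalizing integral; part (a) predicts sqrt(2 pi kT / m)
Z = sum(w * dv for w in weights)

def average(f):
    """<f(v)> under the normalized distribution p(v) = weight(v) / Z."""
    return sum(f(v) * w * dv for v, w in zip(vs, weights)) / Z

mean_energy = average(lambda v: 0.5 * m * v * v)   # part (b): kT / 2
mean_velocity = average(lambda v: v)               # part (c): 0, odd integrand
mean_momentum = average(lambda v: m * v)           # part (d): 0
print(Z, math.sqrt(2 * math.pi * kT / m), mean_energy, mean_velocity, mean_momentum)
```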
16. Predicting the rate of mutation based on the Poisson probability distribution function.
The evolutionary process of amino acid substitutions in proteins is sometimes described by the Poisson probability distribution function. The probability $p_s(t)$ that exactly s substitutions at a given amino acid position occur over an evolutionary time t is
$$ p_s(t) = \frac{(\lambda t)^s\, e^{-\lambda t}}{s!}, $$
where λ is the rate of amino acid substitutions per site per unit time. Fibrinopeptides evolve rapidly: $\lambda_F = 9.0$ substitutions per site per $10^9$ years. Lysozyme is intermediate: $\lambda_L \approx 1.0$ substitutions per site per $10^9$ years. Histones evolve slowly: $\lambda_H = 0.010$ substitutions per site per $10^9$ years.
(a) What is the probability that a fibrinopeptide has no mutations at a given site in t = 1 billion years?
(b) What is the probability that lysozyme has three mutations per site in 100 million years?
(c) We want to determine the expected number of mutations $\langle s \rangle$ that will occur in time t. We will do this in two steps. First, using the fact that probabilities must sum to one, write $\alpha = \sum_{s=0}^{\infty} (\lambda t)^s/s!$ in a simpler form.
(d) Now write an expression for $\langle s \rangle$. Note that
(e) Using your answer to part (d), determine the ratio of the expected number of mutations in a fibrinopeptide to the expected number of mutations in histone protein, $\langle s \rangle_{fib} / \langle s \rangle_{his}$.
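(a) The probability that a fibrinopeptide has no mutations at a given site in t = 1 billion years is
$$ p_0(t) = e^{-\lambda_F t} = \exp\left[-(9.0 \text{ per } 10^9 \text{ years})(10^9 \text{ years})\right] = e^{-9.0} \approx 1.2 \times 10^{-4}. $$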
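(b) For lysozyme,
$$ \lambda_L t = (1.0 \text{ per } 10^9 \text{ years})(10^8 \text{ years}) = 0.1. $$
The probability that lysozyme has three mutations per site in 100 million years is then
$$ p_3(t) = \frac{(0.1)^3\, e^{-0.1}}{3!} \approx 1.5 \times 10^{-4}. $$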
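(c) Since the probabilities sum to 1,
$$ \sum_{s=0}^{\infty} p_s(t) = e^{-\lambda t}\,\alpha = 1 \quad\Longrightarrow\quad \alpha = e^{\lambda t}. $$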
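The Poisson results can be reproduced with a few lines of Python. This sketch measures time in units of $10^9$ years (so 100 million years is t = 0.1) and truncates the infinite sums at a point where the remaining terms are negligible; it also checks numerically that $\langle s \rangle = \sum_s s\,p_s(t)$ comes out equal to λt.

```python
import math

def poisson_prob(s, lam, t):
    """p_s(t) = (lam t)^s e^{-lam t} / s!"""
    x = lam * t
    return x**s * math.exp(-x) / math.factorial(s)

# Substitution rates per site per 10^9 years, from the text
lam_fib, lam_lys, lam_his = 9.0, 1.0, 0.010

p0_fib = poisson_prob(0, lam_fib, 1.0)   # part (a): t = 1 (10^9 years)
p3_lys = poisson_prob(3, lam_lys, 0.1)   # part (b): t = 10^8 years = 0.1

# Part (c): alpha = sum of (lam t)^s / s!, truncated; should equal e^{lam t}
lam_t = lam_lys * 0.1
alpha = sum(lam_t**s / math.factorial(s) for s in range(50))

# Mean number of substitutions for fibrinopeptide at t = 1, truncated sum
mean_s = sum(s * poisson_prob(s, lam_fib, 1.0) for s in range(100))
print(p0_fib, p3_lys, alpha, math.exp(lam_t), mean_s)
```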
17. Probability in court.
In forensic science, DNA fragments found at the scene of a crime can be compared with DNA fragments from a suspected criminal to determine the probability that a match occurs by chance. Suppose that DNA fragment A is found in 1% of the population, fragment B is found in 4% of the population, and fragment C is found in 2.5% of the population.
(a) If the three fragments contain independent information, what is the probability that a suspect's DNA will match all three of these fragment characteristics by chance?
(b) Some people believe such a fragment analysis is flawed because different DNA fragments do not represent independent properties. As before, suppose that fragment A occurs in 1% of the population. But now suppose that the conditional probability of B, given A, is p(B|A) = 0.40 rather than 0.040, and p(C|A) = 0.25 rather than 0.025. There is no additional information about any relationship between B and C. What is the probability of a match now?
(a) Since the fragments are independent,
$$ p(A)\,p(B)\,p(C) = (0.01)(0.04)(0.025) = 1.0 \times 10^{-5}. $$
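A sketch comparing the two scenarios of this problem. For part (b) it assumes that, lacking any information about how B and C relate, they can be treated as independent once A is known, so the match probability is p(A) p(B|A) p(C|A); that assumption is ours and is labeled in the code.

```python
# Population frequencies of the three fragments, from the problem statement
p_A, p_B, p_C = 0.01, 0.04, 0.025

# (a) Independent fragments: multiply the marginal frequencies
p_match_independent = p_A * p_B * p_C

# (b) Conditional frequencies given A. Assumption: B and C are treated as
# independent of each other once A is known (no other information given).
p_B_given_A, p_C_given_A = 0.40, 0.25
p_match_correlated = p_A * p_B_given_A * p_C_given_A

print(p_match_independent, p_match_correlated,
      p_match_correlated / p_match_independent)
```

Under this assumption the correlated case makes a chance match about 100 times more likely than the independent estimate, which is the concern raised in part (b).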
(b) Compute $\langle x^2 \rangle$.
(c) Compute $\langle x^3 \rangle$.
(d) Compute $\langle x^4 \rangle$.