An Introduction to Biostatistics, 3rd Edition

10 Linear Regression and Correlation

11 Goodness of Fit Tests

A Proofs of Selected Results

C.6 Wilcoxon Signed-Rank Test Cumulative Distribution

C.7 Cumulative

C.8 Critical Values for the Wilcoxon Rank-Sum Test

C.10 Fisher’s

C.11 Correlation Coefficient

C.12 Cumulative Distribution for Kendall’s Test ($\tau$

C.13 Critical Values for the Spearman Rank Correlation Coefficient, $r_s$

C.14 Critical Values for the Kolmogorov-Smirnov Test

C.15 Critical Values for the Lilliefors Test

Our goal in writing this book was to generate an accessible and relatively complete introduction for undergraduates to the use of statistics in the biological sciences. The text is designed for a one quarter or one semester class in introductory statistics for the life sciences. The target audience is sophomore and junior biology, environmental studies, biochemistry, and health sciences majors. The assumed background is some coursework in biology as well as a foundation in algebra but not calculus. Examples are taken from many areas in the life sciences including genetics, physiology, ecology, agriculture, and medicine.

This text emphasizes the relationships among probability, probability distributions, and hypothesis testing. We highlight the expected value of various test statistics under the null and research hypotheses as a way to understand the methodology of hypothesis testing. In addition, we have incorporated nonparametric alternatives to many situations along with the standard parametric analysis. These nonparametric techniques are included because undergraduate student projects often have small sample sizes that preclude parametric analysis and because the development of the nonparametric tests is readily understandable for students with modest math backgrounds. The nonparametric tests can be skipped or skimmed without loss of continuity.

We have tried to include interesting and easily understandable examples with each concept. The problems at the end of each chapter have a range of difficulty and come from a variety of disciplines. Some are real-life examples and most others are realistic in their design and data values. Throughout the text we have included short “Concept Checks” that allow readers to immediately gauge their mastery of the topic presented. Their answers are found at the ends of appropriate chapters. The end-of-chapter problems are randomized within each chapter to require the student to choose the appropriate analysis. Many undergraduate texts present a technique and immediately give all the problems that can be solved with it. This approach prevents students from having to make the real-life decision about the appropriate analysis. We believe this decision making is a critical skill in statistical analysis and have provided a large number of opportunities to practice it.

The material for this text derives principally from a required undergraduate biostatistics course one of us (Glover) taught for more than twenty years and from a second course in nonparametric statistics and field data analysis that the other of us (Mitchell) taught during several term abroad programs to Queensland, Australia. Recent shifts in undergraduate curricula have de-emphasized calculus for biology students and are now highlighting statistical analyses as a fundamental quantitative skill. We hope that our text will make teaching and learning these processes less arduous.

Supplemental Materials

We have provided several different resources to supplement the text in various ways. There is a set of Additional Appendices that are available online at http://waveland.com/Glover-Mitchell/Appendices.pdf. These are coordinated with this text and contain further information on several topics, including:

• additional post-hoc mean comparison techniques beyond those covered in the text for use in the analysis of variance;

• a method for determining confidence intervals for the difference between medians of independent samples based on the Wilcoxon rank-sum test;

• a discussion of the Durbin test for incomplete block designs; and

• a section covering three different field methods.

Also available is a set of 300 additional problems to supplement those in the text. This material is available online for both students and instructors at http://waveland.com/Glover-Mitchell/ExtraProblems.pdf.

An Answer Manual for instructors is available free on CD from the publisher. Included on this CD are

• a PDF file containing both questions and answers for all the problems in the text and another file containing only the answers for all the problems in the text;

• a PDF file containing the supplementary problems mentioned above and another file containing both the questions and the answers to all of the supplementary problems; and

• a PDF file of the Additional Appendices.

The material in this textbook can be supported by a wide variety of statistical packages and calculators. The selection of these support materials is dictated by personal interests and cost considerations. For example, we have successfully used SPSS software (http://www.spss.com/) in the laboratory sessions of our course. This software is easy to use, relatively flexible, and can complete nearly all the statistical techniques presented in our text.

A number of free online statistical tools are also available. Among the best is The R Project for Statistical Computing, which is available at http://www.r-project.org/. We have created a companion guide, Using R: An Introduction to Biostatistics, for our text that is available free online at http://waveland.com/Glover-Mitchell/r-guide.pdf, and is also included on the CD for instructors. In the guide we work through almost every example in the text using R. A useful feature of R is the ability to access and analyze (large) online data files without ever downloading them to your own computer. With this in mind, we have placed all of the data files for the examples and exercises online as text files. You may either download these files to use with R or any other statistical package, or in R you may access the text files from online and do the necessary analysis without ever copying them to your own computer. See http://waveland.com/Glover-Mitchell/data.pdf for more information.

Our students have used Texas Instruments calculators ranging from the TI-30 to the TI-83, TI-84, and TI-89 models. The price range for these calculators is considerable and might be a factor in choosing a required calculator for a particular course. Although calculators such as the TI-30 do less automatically, they sometimes give the student clearer insights into the statistical tests by requiring a few more computational steps. The ease of computation afforded by computer programs or sophisticated calculators sometimes leads to a “black box” mentality about statistics and their calculation.

For both students and instructors we recommend D. J. Hand et al., editors, 1994, A Handbook of Small Data Sets, Chapman & Hall, London. This book contains 510 small data sets ranging from the numbers of Prussian military personnel killed by horse kicks from 1875–1894 (data set #283) to the shape of beadwork on leather goods of Shoshoni Indians (data set #150). The data sets are interesting, manageable, and amenable to statistical analysis using techniques presented in our text. While a number of the data sets from the Handbook were utilized as examples or problems in our text, there are many others that could serve as engaging and useful practice problems.

Acknowledgments

For the preparation of this third edition, thanks are due to the following people: Don Rosso and Dakota West at Waveland Press for their support and guidance; Ann Warner of Hobart and William Smith Colleges, for her meticulous word processing of the early drafts of this manuscript; and the students of Hobart and William Smith Colleges for their many comments and suggestions, particularly Aline Gadue for her careful scrutiny of the first edition.

Thomas J. Glover
Kevin J. Mitchell
Geneva, NY

Introduction to Data Analysis

Concepts in Chapter 1:

• Scientific Method and Statistical Analysis

• Parameters: Descriptive Characteristics of Populations

• Statistics: Descriptive Characteristics of Samples

• Variable Types: Continuous, Discrete, Ranked, and Categorical

• Measures of Central Tendency: Mean, Median, and Mode

• Measures of Dispersion: Range, Variance, Standard Deviation, and Standard Error

• Descriptive Statistics for Frequency Data

• Effects of Coding on Descriptive Statistics

• Tables and Graphs

• Quartiles and Box Plots

• Accuracy, Precision, and the 30–300 Rule

1.1 Introduction

The modern study of the life sciences includes experimentation, data gathering, and interpretation. This text offers an introduction to the methods used to perform these fundamental activities.

The design and evaluation of experiments, known as the scientific method, is utilized in all scientific fields and is often implied rather than explicitly outlined in many investigations. The components of the scientific method include observation, formulation of a potential question or problem, construction of a hypothesis, followed by a prediction, and the design of an experiment to test the prediction. Let’s consider these components briefly.

Observation of a Particular Event

Generally an observation can be classified as either quantitative or qualitative. Quantitative observations are based on some sort of measurement, for example, length, weight, temperature, and pH. Qualitative observations are based on categories reflecting a quality or characteristic of the observed event, for example, male versus female, diseased versus healthy, and mutant versus wild type.

Statement of the Problem

A series of observations often leads to the formulation of a particular problem or unanswered question. This usually takes the form of a “why” question and implies a cause and effect relationship. For example, suppose upon investigating a remote Fijian island community you realized that the vast majority of the adults suffer from hypertension (abnormally elevated blood pressures with the systolic over 165 mmHg and the diastolic over 95 mmHg). Note that the individual observations here are quantitative while the percentage that are hypertensive is based on a qualitative evaluation of the sample. From these preliminary observations one might formulate the question: Why are so many adults in this population hypertensive?

Formulation of a Hypothesis

A hypothesis is a tentative explanation for the observations made. A good hypothesis suggests a cause and effect relationship and is testable.

The Fijian community may demonstrate hypertension because of diet, lifestyle, genetic makeup, or combinations of these factors. Because we’ve noticed extraordinary consumption of octopi in their diet and knowing octopods have a very high cholesterol content, we might hypothesize that the high level of hypertension is caused by diet.

Making a Prediction

If the hypothesis is properly constructed, it can and should be used to make predictions. Predictions are based on deductive reasoning and take the form of an “if-then” statement. For example, a good prediction based on the hypothesis above would be: If the hypertension is caused by a high cholesterol diet, then changing the diet to a low cholesterol one should lower the incidence of hypertension.

The criteria for a valid (properly stated) prediction are:

1. An “if” clause stating the hypothesis.

2. A “then” clause that

(a) suggests altering a causative factor in the hypothesis (change of diet);

(b) predicts the outcome (lower level of hypertension);

Design of the Experiment

The entire purpose and design of an experiment is to accomplish one goal, that is, to test the hypothesis. An experiment tests the hypothesis by testing the correctness or incorrectness of the predictions that came from it. Theoretically, an experiment should alter or test only the factor suggested by the prediction, while all other factors remain constant.

How would you design an experiment to test the diet hypothesis in the hypertensive population?

The best way to test the hypothesis above is by setting up a controlled experiment. This might involve using two randomly chosen groups of adults from the community and treating both identically with the exception of the one factor being tested. The control group represents the “normal” situation, has all factors present, and is used as a standard or basis for comparison. The experimental group represents the “test” situation and includes all factors except the variable that has been altered, in this case the diet. If the group with the low cholesterol diet exhibits significantly lower levels of hypertension, the hypothesis is supported by the data. On the other hand, if the change in diet has no effect on hypertension, then a new or revised hypothesis should be formulated and the experimental procedure redesigned. Finally, the generalizations that are drawn by relating the data to the hypothesis can be stated as conclusions.

While these steps outlined above may seem straightforward, they often require considerable insight and sophistication to apply properly. In our example, how the groups are chosen is not a trivial problem. They must be constructed without bias and must be large enough to give the researcher an acceptable level of confidence in the results. Further, how large a change is significant enough to support the hypothesis? What is statistically significant may not be biologically significant.

A foundation in statistical methods will help you design and interpret experiments properly. The field of statistics is broadly defined as the methods and procedures for collecting, classifying, summarizing, and analyzing data, and utilizing the data to test scientific hypotheses. The term statistics is derived from the Latin for state, and originally referred to information gathered in various censuses that could be numerically summarized to describe aspects of the state, for example, bushels of wheat per year, or number of military-aged men. Over time statistics has come to mean the scientific study of numerical data based on natural phenomena. Statistics applied to the life sciences is often called biostatistics or biometry. The foundations of biostatistics go back several hundred years, but statistical analysis of biological systems began in earnest in the late nineteenth century as biology became more quantitative and experimental.

1.2 Populations and Samples

Today we use statistics as a means of informing the decision-making processes in the face of the uncertainties that most real world problems present. Often we wish to make generalizations about populations that are too large or too difficult to survey completely. In these cases we sample the population and use characteristics of the sample to extrapolate to characteristics of the larger population. See Figure 1.1.

Real-world problems concern large groups or populations about which inferences must be made. (Is there a size difference between two color morphs of the same species of sea star? Are the offspring of a certain cross of fruit flies in a 3:1 ratio of normal to eyeless?) Certain characteristics of the population are of particular interest (systolic blood pressure, weight in grams, resting body temperature). The values of these characteristics will vary from individual to individual within the population. These characteristics are called random variables because they vary in an unpredictable way or in a way that appears or is assumed to depend on chance. The different types of variables are described in Section 1.3.

A descriptive measure associated with a random variable when it is considered over the entire population is called a parameter. Examples are the mean weight of all green turtles, Chelonia mydas, or the variance in clutch size of all tiger snakes, Notechis scutatus. In general, such parameters are difficult, if not impossible, to determine because the population is too large or expensive to study in its entirety. Consequently, one is forced to examine a subset or sample of the population and make inferences about the entire population based on this sample. A descriptive measure associated with a random variable of a sample is called a statistic. The mean weight of 25 female green turtles laying eggs on Heron Island or the variability in clutch size of 50 clutches of tiger snake eggs collected in southeastern Queensland are examples of statistics.

Population(s) have traits called random variables. Summary characteristics of the population random variables are called parameters: $\mu$, $\sigma^2$, $N$.

Random samples of size $n$ of the population(s) generate numerical data: $X_i$’s.

These data can be organized into summary statistics: $\bar{X}$, $s^2$, $n$, graphs, and figures (Chapter 1).

The data can be analyzed using an understanding of basic probability (Chapters 2–4) and various tests of hypotheses (Chapters 5–11).

The analyses lead to conclusions or inferences about the population(s) of interest.

FIGURE 1.1. The general approach to statistical analysis.

While such statistics are not equal to the population parameters, it is hoped that they are sufficiently close to the population parameters to be useful or that the potential error involved can be quantified. Sample statistics along with an understanding of probability form the foundation for inferences about population parameters. See Figure 1.1 for review.

Chapter 1 provides techniques for organizing sample data. Chapters 2 through 4 present the necessary probability concepts, and the remaining chapters outline various techniques to test a wide range of predictions from hypotheses.

Concept Checks. At the end of several of the sections in each chapter we include one or two questions designed as a rapid check of your mastery of a central idea of the section’s content. These questions will be most helpful if you do each as you encounter it in the text. Answers to these questions are given at the end of each chapter just before the exercises.

Concept Check 1.1. Which of the following are populations and which are samples?

(a) The weights of 25 randomly chosen eighth grade boys in the Detroit public school system.

(b) The number of eggs found in each osprey nest on Mt. Desert Island in Maine.

(c) The heights of 15 redwood trees measured in the Muir Woods National Monument, an old growth coast redwood forest.

(d) The lengths of all the blind cave fish, Astyanax mexicanus, in a small cavern system in central Mexico.

1.3 Variables or Data Types

There are several data types that arise in statistics. Each statistical test requires that the data analyzed be of a specified type. Here are the most common types of variables.

1. Quantitative variables fall into two major categories:

(a) Continuous variables or interval data can assume any value in some (possibly unbounded) interval of real numbers. Common examples include length, weight, temperature, volume, and height. They arise from measurement.

(b) Discrete variables assume only isolated values. Examples include clutch size, trees per hectare, arms per sea star, or items per quadrat. They arise from counting.

2. Ranked (ordinal) variables are not measured but nonetheless have a natural ordering. For example, candidates for political office can be ranked by individual voters. Or students can be arranged by height from shortest to tallest and correspondingly ranked without ever being measured. The rank values have no inherent meaning outside the “order” that they provide. That is, a candidate ranked 2 is not twice as preferable as the person ranked 1. (Compare this with measurement variables where a plant 2 feet tall is twice as tall as a plant 1 foot tall. With measurement variables such ratios are meaningful, while with ordinal variables they are not.)

3. Categorical data are qualitative data. Some examples are species, gender, genotype, phenotype, healthy/diseased, and marital status. Unlike with ranked data, there is no “natural” ordering that can be assigned to these categories.

When measurement variables are collected for either a population or a sample, the numerical values have to be abstracted or summarized in some way. The summary descriptive characteristics of a population of objects are called population parameters or just parameters. The calculation of a parameter requires knowledge of the measurement variable’s value for every member of the population. These parameters are usually denoted by Greek letters and do not vary within a population. The summary descriptive characteristics of a sample of objects, that is, a subset of the population, are called statistics. Sample statistics can have different values, depending on how the sample of the population was chosen. Statistics are denoted by various symbols, but (almost) never by Greek letters.

1.4 Measures of Central Tendency: Mean, Median, and Mode

Mean

There are several commonly used measures to describe the location or center of a population or sample. The most widely utilized measure of central tendency is the arithmetic mean or average.

The population mean is the sum of the values of the variable under study divided by the total number of objects in the population. It is denoted by a lowercase $\mu$ (“mu”). Each value is algebraically denoted by an $X$ with a subscript denotation $i$. For example, a small theoretical population whose objects had values 1, 6, 4, 5, 6, 3, 8, 7 would be denoted

$$X_1 = 1,\ X_2 = 6,\ X_3 = 4,\ X_4 = 5,\ X_5 = 6,\ X_6 = 3,\ X_7 = 8,\ X_8 = 7. \tag{1.1}$$

We would denote the population size with a capital $N$. In our theoretical population $N = 8$.

The population mean $\mu$ would be

$$\mu = \frac{1 + 6 + 4 + 5 + 6 + 3 + 8 + 7}{8} = \frac{40}{8} = 5.$$

FORMULA 1.1. The algebraic shorthand formula for a population mean is

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}.$$

The Greek letter $\Sigma$ (“sigma”) indicates summation. The subscript $i = 1$ indicates to start with the first observation and the superscript $N$ means to continue until and including the $N$th observation. The subscript and superscript may represent other starting and stopping points for the summation within the population or sample. For the example above,

$$\sum_{i=2}^{5} X_i$$

would indicate the sum of $X_2 + X_3 + X_4 + X_5$ or $6 + 4 + 5 + 6 = 21$.

Notice also that the displayed form $\sum\limits_{i=1}^{N} X_i$ is written $\sum_{i=1}^{N} X_i$ when the summation symbol is embedded in a sentence. In fact, to further reduce clutter, the summation sign may not be indexed at all, for example $\sum X_i$. It is implied that the operation of addition begins with the first observation and continues through the last observation in a population, that is,

$$\sum X_i = \sum_{i=1}^{N} X_i.$$

If sigma notation is new to you or if you wish a quick review of its properties, read Appendix A.1 before continuing.
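As a small illustration of Formula 1.1 and of indexed sums, the sketch below recomputes the mean of the theoretical population and the partial sum from $i = 2$ to $i = 5$. The variable names are ours, chosen only for this example; note that Python lists are indexed from 0, so $X_2$ through $X_5$ sit at positions 1 through 4.

```python
# Theoretical population from the text: X_1, ..., X_8.
population = [1, 6, 4, 5, 6, 3, 8, 7]

N = len(population)            # population size
mu = sum(population) / N       # Formula 1.1: sum of all X_i divided by N

# Sum from i = 2 to i = 5. The slice [1:5] picks positions 1..4,
# which hold X_2, X_3, X_4, X_5.
partial_sum = sum(population[1:5])

print("N =", N)
print("mu =", mu)
print("sum of X_2..X_5 =", partial_sum)
```

The off-by-one shift between the subscript $i$ and the list position is the only place where the code departs from the notation in the text.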

FORMULA 1.2. The sample mean is defined by

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n},$$

where $n$ is the sample size. The sample mean is usually reported to one more decimal place than the data and always has appropriate units associated with it.

The symbol $\bar{X}$ (read “X bar”) indicates that the observations of a subset of size $n$ from a population have been averaged. $\bar{X}$ is fundamentally different from $\mu$ because samples from a population can have different values for their sample mean, that is, they can vary from sample to sample within the population. The population mean, however, is constant for a given population.

Again consider the small theoretical population 1, 6, 4, 5, 6, 3, 8, 7. A sample of size 3 may consist of 5, 3, 4 with $\bar{X} = 4$ or 6, 8, 4 with $\bar{X} = 6$.


Actually there are 56 possible samples of size 3 that could be drawn from the population in (1.1). Only four samples have a sample mean the same as the population mean, that is, $\bar{X} = \mu$:

Sample    Sum    $\bar{X}$

Each sample mean $\bar{X}$ is an unbiased estimate of $\mu$ but depends on the values included in the sample and sample size for its actual value. We would expect the average of all possible $\bar{X}$’s to be equal to the population parameter, $\mu$. This is, in fact, the definition of an unbiased estimator of the population mean.

If you calculate the sample mean for each of the 56 possible samples with $n = 3$ and then average these sample means, they will give an average value of 5, that is, the population mean, $\mu$. Remember that most real populations are too large or too difficult to census completely, so we must rely on using a single sample to estimate or approximate the population characteristics.
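The claim that the 56 sample means average to $\mu$ can be checked by brute force. The following is a minimal sketch, assuming samples are drawn without replacement and that the two objects with value 6 count as distinct objects; exact fractions are used so that rounding does not blur the comparison.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 6, 4, 5, 6, 3, 8, 7]
mu = Fraction(sum(population), len(population))

# combinations() selects by position, so the two 6s are treated as
# different objects and we get all C(8, 3) samples of size 3.
samples = list(combinations(population, 3))

# Exact sample mean for every possible sample.
sample_means = [Fraction(sum(s), 3) for s in samples]

# Average of all possible sample means.
average_of_means = sum(sample_means) / len(sample_means)

print("number of samples:", len(samples))
print("average of sample means:", average_of_means)
print("population mean:", mu)
```

Individual sample means range widely, yet their average lands exactly on the population mean, which is what “unbiased” means here.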

Median

The second measure of central tendency is the median. The median is the “middle” value of an ordered list of observations. Though this idea is simple enough, it will prove useful to define it in terms of an even simpler notion. The depth of a value is its position relative to the nearest extreme (end) when the data are listed in order from smallest to largest.

EXAMPLE 1.1. The table below gives the circumferences at chest height (CCH) (in cm) and their corresponding depths for 15 sugar maples, Acer saccharum, measured in a forest in southeastern Ohio.

CCH    18  21  22  29  29  36  37  38  56  59  66  70  88  93  120
Depth   1   2   3   4   5   6   7   8   7   6   5   4   3   2    1

The population median $M$ is the observation whose depth is $d = \frac{N+1}{2}$, where $N$ is the population size.

Note that this parameter is not a Greek letter and is seldom computed in practice. Rather a sample median $\tilde{X}$ (read “X tilde”) is the statistic used to approximate or estimate the population median. $\tilde{X}$ is defined as the observation whose depth is $d = \frac{n+1}{2}$, where $n$ is the sample size. In Example 1.1, the sample size is $n = 15$, so the depth of the sample median is $d = 8$. The sample median $\tilde{X} = X_{\frac{n+1}{2}} = X_8 = 38$ cm.

EXAMPLE 1.2. The table below gives CCH (in cm) for 12 cypress pines, Callitris preissii, measured near Brown Lake on North Stradbroke Island.

CCH    17  19  31  39  48  56  68  73  73  75  80  122
Depth   1   2   3   4   5   6   6   5   4   3   2    1

Since $n = 12$, the depth of the median is $\frac{12+1}{2} = 6.5$. Obviously no observation has depth 6.5, so this is interpreted as the average of both observations whose depth is 6 in the list above. So $\tilde{X} = \frac{56+68}{2} = 62$ cm.
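The depth rule for the median translates directly into code. Below is a minimal sketch; `median_by_depth` is a hypothetical helper written for this illustration, and the data lists are the CCH values of Examples 1.1 and 1.2 as read from the tables.

```python
def median_by_depth(values):
    """Return the median using the depth d = (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    depth = (n + 1) / 2
    if depth.is_integer():
        # Odd n: a single observation sits at this depth.
        return ordered[int(depth) - 1]
    # Even n: average the two middle observations, which share the
    # same depth counted from opposite ends of the ordered list.
    lower = ordered[int(depth) - 1]
    upper = ordered[int(depth)]
    return (lower + upper) / 2


maples = [18, 21, 22, 29, 29, 36, 37, 38, 56, 59, 66, 70, 88, 93, 120]
pines = [17, 19, 31, 39, 48, 56, 68, 73, 73, 75, 80, 122]

print("Example 1.1 median:", median_by_depth(maples), "cm")
print("Example 1.2 median:", median_by_depth(pines), "cm")
```

Python's standard library offers `statistics.median`, which gives the same results; the hand-written version is shown only because it mirrors the depth definition step by step.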

Mode

The mode is defined as the most frequently occurring value in a data set. The mode of Example 1.2 would be 73 cm, while Example 1.1 would have a mode of 29 cm. In symmetrical, unimodal distributions the mean, median, and mode are coincident. Bimodal distributions may indicate a mixture of samples from two populations, for example, weights of males and females. While the mode is not often used in biological research, reporting the number of modes, if more than one, can be informative.
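Finding modes by machine is a matter of counting. The sketch below uses `statistics.multimode` from the Python standard library (available since Python 3.8), which returns every value tied for the highest frequency. The third list is a made-up bimodal example, not data from the text.

```python
from statistics import multimode

# CCH values (cm) from Examples 1.1 and 1.2.
maples = [18, 21, 22, 29, 29, 36, 37, 38, 56, 59, 66, 70, 88, 93, 120]
pines = [17, 19, 31, 39, 48, 56, 68, 73, 73, 75, 80, 122]

# Hypothetical values with two equally frequent peaks (illustration only).
two_peaks = [4, 4, 4, 5, 6, 9, 9, 9]

maple_modes = multimode(maples)
pine_modes = multimode(pines)
peak_modes = multimode(two_peaks)

print("Example 1.1 modes:", maple_modes)
print("Example 1.2 modes:", pine_modes)
print("hypothetical bimodal data:", peak_modes)
```

Returning a list rather than a single number makes it easy to report how many modes a sample has.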

Each measure of central tendency has different features. The mean is a purposeful measure only for a quantitative variable, whether it is continuous (for example, height) or discrete (for example, clutch size). The median can be calculated whenever a variable can be ranked (including when the variable is quantitative). Finally, the mode can be calculated for categorical variables, as well as for quantitative and ranked variables.

The sample median expresses less information than the sample mean because it utilizes only the ranks and not the actual values of each measurement. The median, however, is resistant to the effects of outliers. Extreme values or outliers in a sample can drastically affect the sample mean, while having little effect on the median. Consider Example 1.2 with $\bar{X} = 58.4$ cm and $\tilde{X} = 62$ cm. Suppose $X_{12}$ had been mistakenly recorded as 1220 cm instead of 122 cm. The mean $\bar{X}$ would become 149.9 cm while the median $\tilde{X}$ would remain 62 cm.
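The effect of the recording error is easy to reproduce. This sketch uses the Example 1.2 values and the standard library `statistics` module; the list `pines_with_error` is simply the same data with the last value replaced by the mistaken 1220.

```python
from statistics import mean, median

# CCH values (cm) from Example 1.2.
pines = [17, 19, 31, 39, 48, 56, 68, 73, 73, 75, 80, 122]

# Same sample with X_12 mistakenly recorded as 1220 instead of 122.
pines_with_error = pines[:-1] + [1220]

mean_ok, median_ok = mean(pines), median(pines)
mean_bad, median_bad = mean(pines_with_error), median(pines_with_error)

print("correct data:  mean = %.1f cm, median = %.1f cm" % (mean_ok, median_ok))
print("with outlier:  mean = %.1f cm, median = %.1f cm" % (mean_bad, median_bad))
```

Only one of twelve values changed, yet the mean more than doubles while the median does not move, because the median depends on the order of the observations rather than on their magnitudes.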

1.5 Measures of Dispersion and Variability: Range, Variance, Standard Deviation, and Standard Error

EXAMPLE 1.3. The table that follows gives the weights of two samples of albacore tuna, Thunnus alalunga (in kg). How would you characterize the differences in the samples?

Sample 1    Sample 2

SOLUTION. Upon investigation we see that both samples are the same size and have the same mean, $\bar{X}_1 = \bar{X}_2 = 10.11$ kg. In fact, both samples have the same median. To see this, arrange the data sets in rank order as in Table 1.1. We have $n = 9$, so $\tilde{X} = X_{\frac{n+1}{2}} = X_5$, which is 9.9 kg for both samples.

TABLE 1.1. The ordered samples of Thunnus alalunga

Neither of the samples has a mode. So by all the descriptors in Section 1.4 these samples appear to be identical. Clearly they are not. The difference in the samples is reflected in the scatter or spread of the observations. Sample 1 is much more uniform than Sample 2, that is, the observations tend to cluster much nearer the mean in Sample 1 than in Sample 2. We need descriptive measures of this scatter or dispersion that will reflect these differences.

Range

The simplest measure of dispersion or “spread” of the data is the range.

FORMULAS 1.3. The difference between the largest and smallest observations in a group of data is called the range:

Sample range $= X_n - X_1$

Population range $= X_N - X_1$

When the data are ordered from smallest to largest, the values $X_n$ and $X_1$ are called the sample range limits.

In Example 1.3 we have from Table 1.1 Sample 1: range =

The range for each of these two samples reflects some differences in dispersion, but the range is a rather crude estimator of dispersion because it uses only two of the data points and is somewhat dependent on sample size. As sample size increases, we expect largest and smallest observations to become more extreme and, therefore, the sample range to increase even though the population range remains unchanged. It is unlikely that the sample will include the largest and smallest values from the population, so the sample range usually underestimates the population range and is, therefore, a biased estimator.
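The dependence of the range on sample size can be seen in a small simulation. Everything in this sketch is an assumption made for illustration: the population is 1,000 simulated values, not data from the text, and the samples are nested (each larger sample contains the smaller ones), which guarantees that the range cannot shrink as $n$ grows. With independently drawn samples the increase is only a tendency.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000 simulated weights (kg).
population = [rng.gauss(10.0, 3.0) for _ in range(1000)]
population_range = max(population) - min(population)

# Shuffle once, then take nested samples of increasing size.
shuffled = population[:]
rng.shuffle(shuffled)

sample_ranges = {}
for n in (5, 20, 100, 500):
    sample = shuffled[:n]
    sample_ranges[n] = max(sample) - min(sample)
    print("n = %3d  sample range = %.2f" % (n, sample_ranges[n]))

print("population range = %.2f" % population_range)
```

Every sample range is at most the population range, and small samples typically fall well short of it, which is the sense in which the sample range is a biased estimator.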

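Variance

To develop a measure that uses all the data to form an index of dispersion consider the following. Suppose we express each observation as a distance from the mean $x_i = X_i - \bar{X}$. These differences are called deviates and will be sometimes positive ($X_i$ is above the mean) and sometimes negative ($X_i$ is below the mean).

If we try to average the deviates, they always sum to 0. Because the mean is the central tendency or location, the negative deviates will exactly cancel out the positive deviates. Consider a simple numerical example

$$X_1 = 2,\ X_2 = 3,\ X_3 = 1,\ X_4 = 8,\ X_5 = 6.$$

The mean $\bar{X} = 4$, and the deviates are

$$x_1 = -2,\ x_2 = -1,\ x_3 = -3,\ x_4 = 4,\ x_5 = 2.$$

Notice that the negative deviates cancel the positive ones so that $\sum (X_i - \bar{X}) = 0$. Algebraically one can demonstrate the same result more generally,

$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \bar{X}.$$

Since $\bar{X}$ is a constant for any sample,

$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X}.$$

Since $\bar{X} = \frac{\sum X_i}{n}$, then $n\bar{X} = \sum X_i$, so

$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} X_i = 0.$$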
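The cancellation of deviates holds for any data set, which a quick computation makes concrete. As an illustration only, this sketch reuses the eight values of the theoretical population from Section 1.4 and treats them as a sample.

```python
# Values of the theoretical population from Section 1.4, reused here
# purely as an example data set.
values = [1, 6, 4, 5, 6, 3, 8, 7]

mean = sum(values) / len(values)

# Deviate of each observation: its signed distance from the mean.
deviates = [x - mean for x in values]

negative_total = sum(d for d in deviates if d < 0)
positive_total = sum(d for d in deviates if d > 0)
deviate_sum = sum(deviates)

print("deviates:", deviates)
print("negative total:", negative_total)
print("positive total:", positive_total)
print("sum of deviates:", deviate_sum)
```

With data whose mean is not exactly representable in floating point, the computed sum may differ from 0 by a tiny rounding error; the algebraic result is still exactly 0.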

To circumvent this unfortunate property, the widely used measure of dispersion called the sample variance utilizes the squares of the deviates. The quantity

$$\sum_{i=1}^{n} (X_i - \bar{X})^2$$

is the sum of these squared deviates and is referred to as the corrected sum of squares, denoted by CSS. Each observation is corrected or adjusted for its distance from the mean.

FORMULA 1.4. The corrected sum of squares is utilized in the formula for the sample variance,

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}.$$

The sample variance is usually reported to two more decimal places than the data and has units that are the square of the measurement units.

This calculation is not as intuitive as the mean or median, but it is a very good indicator of scatter or dispersion. If the above formula had $n$ instead of $n - 1$ in the denominator, it would be exactly the average squared distance from the mean. Returning to Example 1.3, the variance of Sample 1 is 0.641 kg$^2$ and the variance of Sample 2 is 49.851 kg$^2$, reflecting the larger “spread” in Sample 2.

A sample variance is an unbiased estimator of a parameter called the population variance.

FORMULA 1.5. A population variance is denoted by $\sigma^2$ (“sigma squared”) and is defined by

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}.$$

It really is the average squared deviation from the mean for the population. The $n - 1$ in Formula 1.4 makes it an unbiased estimate of the population parameter. (See Appendix A.2 for a proof.) Remember that “unbiased” means that the average of all possible values of $s^2$ for a certain size sample will be equal to the population value $\sigma^2$.

Formulas 1.4 and 1.5 are theoretical formulas and are rather tedious to apply directly. Computational formulas utilize the fact that most calculators with statistical registers simultaneously calculate $n$, $\sum X_i$, and $\sum X_i^2$.

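FORMULA 1.6. The corrected sum of squares $\sum (X_i - \bar{X})^2$ may be computed more simply as

$$\sum (X_i - \bar{X})^2 = \sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}.$$

$\sum X_i^2$ is the uncorrected sum of squares and $\frac{(\sum X_i)^2}{n}$ is the correction term.

To verify Formula 1.6, using the properties in Appendix A.1 notice that

$$\sum (X_i - \bar{X})^2 = \sum \left(X_i^2 - 2X_i\bar{X} + \bar{X}^2\right) = \sum X_i^2 - 2\bar{X}\sum X_i + n\bar{X}^2.$$

Remember that $\bar{X} = \frac{\sum X_i}{n}$, so $n\bar{X} = \sum X_i$; hence

$$\sum (X_i - \bar{X})^2 = \sum X_i^2 - 2\bar{X}\sum X_i + \bar{X}\sum X_i = \sum X_i^2 - \bar{X}\sum X_i.$$

Substituting $\frac{\sum X_i}{n}$ for $\bar{X}$ yields

$$\sum (X_i - \bar{X})^2 = \sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}.$$

FORMULA 1.7. Use of the computational formula for the corrected sum of squares gives the computational formula for the sample variance

$$s^2 = \frac{\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}}{n - 1}.$$

Returning to Example 1.3, Sample 2, $\sum X_i = 91$, $\sum X_i^2 = 1318.92$, $n = 9$, so

$$s^2 = \frac{1318.92 - \frac{(91)^2}{9}}{9 - 1} = \frac{1318.92 - 920.11}{8} = 49.851\ \text{kg}^2.$$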
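The computational formula needs only the three running totals that a calculator keeps. The sketch below applies it to the summary values quoted for Sample 2 of Example 1.3 ($n = 9$, $\sum X_i = 91$, $\sum X_i^2 = 1318.92$); the function name is ours, introduced only for this illustration.

```python
def variance_from_totals(n, sum_x, sum_x_squared):
    """Sample variance from n, sum of X_i, and sum of X_i squared."""
    correction_term = sum_x ** 2 / n
    corrected_sum_of_squares = sum_x_squared - correction_term
    return corrected_sum_of_squares / (n - 1)


# Summary totals given in the text for Example 1.3, Sample 2.
n = 9
sum_x = 91
sum_x_squared = 1318.92

correction_term = sum_x ** 2 / n
s_squared = variance_from_totals(n, sum_x, sum_x_squared)

print("correction term = %.2f" % correction_term)
print("sample variance = %.3f kg^2" % s_squared)
```

Because the formula subtracts two large, nearly equal numbers, it can lose precision on data with a large mean and small spread; for hand calculation at this scale that is not a concern.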

Remember, the numerator can never be negative because it’s a sum of squared deviations. Because the variance has units that are the square of the measurement units, such as squared kilograms above, these units have no physical interpretation. With a similar derivation, the population variance computational formula can be shown to be

$$\sigma^2 = \frac{\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{N}}{N}.$$

Again, this formula is rarely used since most populations are too large to census directly.

Standard Deviation

FORMULAS 1.8. A more “natural” calculation is the standard deviation, which is the positive square root of the population or sample variance, respectively.

$$\sigma = \sqrt{\sigma^2} \quad \text{and} \quad s = \sqrt{s^2}.$$

These descriptions have the same units as the original observations and are, in a sense, the average deviation of observations from their mean.

Again, consider Example 1.3.

For Sample 1: $s_1^2 = 0.641$ kg$^2$, so $s_1 = 0.80$ kg.

For Sample 2: $s_2^2 = 49.851$ kg$^2$, so $s_2 = 7.06$ kg.

The standard deviation of a sample is relatively easy to interpret and clearly reflects the greater variability in Sample 2 compared to Sample 1. Like the mean, the standard deviation is usually reported to one more decimal place than the data and always has appropriate units associated with it. Both the variance and standard deviation can be used to demonstrate differences in scatter between samples or populations.

Thinking about Sums of Squares

It has been our experience teaching elementary descriptive statistics that students have little problem understanding measures of central tendency such as the mean and median. The sample variance and standard deviation, on the other hand, are often less intuitive to beginning students. So let's step back for a moment to carefully consider what these indices of variability are really measuring.

Suppose a small sample of lengths (in cm) of smallmouth bass is collected.

$X_i$'s:   27   32   30   41   35

These five fish have an average length of 33.0 cm. Some are smaller and others larger than this mean. To get a sense of this variability, let's subtract the average from each data point, $(X_i - 33) = x_i$, generating what is called the deviate for each value. The data when rescaled by subtracting the mean become

$x_i$'s:   -6   -1   -3   8   2

When we add these deviations, their sum is 0, so their mean is also 0. To quantify these deviations and, therefore, the sample's variability, we square these deviates to prevent them from always summing to 0.

The sum of these squared deviates is

$$(-6)^2 + (-1)^2 + (-3)^2 + 8^2 + 2^2 = 36 + 1 + 9 + 64 + 4 = 114.$$

This calculation is called the corrected or rescaled sum of squares (squared deviates). If we averaged these calculations by dividing the corrected sum of squares by the sample size $n = 5$, we would have a measure of the average squared distance of the observations from their mean. This measure is called the sample variance. However, with samples this calculation usually involves division by $n - 1$ rather than $n$. This modification addresses issues of bias that are discussed in Section 1.5 and Appendix A.2.

The positive square root of the sample variance is called the standard deviation. In this context, standard signifies "usual" or "average." So the sample variance and standard deviation are just measuring the average amount that observations vary from their center or mean. They are simply averages of variability rather than averages of observation measurement values like the mean. The fish sample had a mean of 33.0 cm with a standard deviation of 5.3 cm.
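The steps in this discussion (deviates, corrected sum of squares, division by $n - 1$, square root) can be traced in a few lines. This is a small sketch using the five smallmouth bass lengths from the text; the variable names are illustrative only.

```python
import math

lengths_cm = [27, 32, 30, 41, 35]  # smallmouth bass lengths from the text

n = len(lengths_cm)
mean = sum(lengths_cm) / n

# Deviates: each observation rescaled by subtracting the sample mean.
deviates = [x - mean for x in lengths_cm]

# Corrected sum of squares: the sum of the squared deviates.
corrected_ss = sum(d ** 2 for d in deviates)

# Sample variance divides by n - 1 rather than n.
variance = corrected_ss / (n - 1)
std_dev = math.sqrt(variance)

print(mean, deviates, sum(deviates))
print(corrected_ss, variance, round(std_dev, 1))
```

Dividing `corrected_ss` by `n` instead of `n - 1` would give the plain average squared distance described above; the `n - 1` divisor is the usual choice for samples.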

Standard Error

The most important statistic of central tendency is the sample mean. However, the mean varies from sample to sample (see page 7). We now develop a method to measure the variability of the sample mean.

The variance and standard deviation are measures of dispersion or scatter of the values of the $X$'s in a sample or population. Because means utilize a number of $X$'s in their calculation, they tend to be less variable than the individual $X$'s. An extreme value of $X$ (large or small) contributes only one $n$th of its value to the sample mean and is, therefore, somewhat dampened out.

A measure of the variability in $\overline{X}$'s then depends on two factors: the variability in the $X$'s and the number of $X$'s averaged to generate the mean $\overline{X}$. We utilize two statistics to estimate this variability.

FORMULAS 1.9. The variance of the sample mean is defined to be

$$\frac{s^2}{n},$$

and the standard deviation of the sample mean or, more commonly, the standard error is

$$\text{SE} = \frac{s}{\sqrt{n}}.$$

The standard error is the more important of these two statistics. Its utility will become clear in Chapter 4 when the Central Limit Theorem is outlined. The standard error is usually reported to one more decimal place than the data, or if $n$ is large, to two more places.

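EXAMPLE 1.4. Calculate the variance of the sample mean and the standard error for the data sets in Example 1.3.

SOLUTION. The sample sizes are both $n = 9$. For Sample 1, $s^2 = 0.641$ kg$^2$, so the variance of the sample mean is

$$\frac{s^2}{n} = \frac{0.641}{9} = 0.071 \text{ kg}^2$$

and the standard deviation is $s = 0.80$ kg, so the standard error is

$$\text{SE} = \frac{s}{\sqrt{n}} = \frac{0.80}{\sqrt{9}} = 0.27 \text{ kg}.$$

For Sample 2, $s^2 = 49.851$ kg$^2$, so the variance of the sample mean is

$$\frac{s^2}{n} = \frac{49.851}{9} = 5.54 \text{ kg}^2$$

and the standard deviation is $s = 7.06$ kg, so the standard error is

$$\text{SE} = \frac{s}{\sqrt{n}} = \frac{7.06}{\sqrt{9}} = 2.35 \text{ kg}.$$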
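As a quick arithmetic check on Formulas 1.9, the sketch below evaluates $s^2/n$ and $s/\sqrt{n}$ from the summary statistics quoted for the two samples of Example 1.3. The helper names `variance_of_mean` and `standard_error` are illustrative, not taken from the text.

```python
import math


def variance_of_mean(s2, n):
    """Variance of the sample mean: s^2 / n."""
    return s2 / n


def standard_error(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)


n = 9  # both samples in Example 1.3 have nine observations

# Sample 1: s^2 = 0.641 kg^2, s = 0.80 kg
var_mean_1 = variance_of_mean(0.641, n)
se_1 = standard_error(0.80, n)

# Sample 2: s^2 = 49.851 kg^2, s = 7.06 kg
var_mean_2 = variance_of_mean(49.851, n)
se_2 = standard_error(7.06, n)

print(round(var_mean_1, 3), round(se_1, 2))
print(round(var_mean_2, 2), round(se_2, 2))
```

In each case the variance of the sample mean is the square of the standard error, up to the rounding already present in the quoted $s$ and $s^2$.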

Concept Check 1.2. The following data are the carapace (shell) lengths in centimeters of a sample of adult female green turtles, Chelonia mydas, measured while nesting at Heron Island in Australia's Great Barrier Reef. Calculate the following descriptive statistics for this sample: sample mean, sample median, corrected sum of squares, sample variance, standard deviation, standard error, and range. Remember to use the appropriate number of decimal places in these descriptive statistics and to include the correct units with all statistics.

110   105   117   113   95   115   98   97   93   120

1.6 Descriptive Statistics for Frequency Tables

When large data sets are organized into frequency tables or presented as grouped data, there are shortcut methods to calculate the sample statistics: $\overline{X}$, $s^2$, and $s$.

EXAMPLE 1.5. The following table shows the number of sedge plants, Carex flacca, found in 800 sample quadrats in an ecological study of grasses. Each quadrat was 1 m$^2$.

Plants/quadrat ($X_i$)    Frequency ($f_i$)
0                         268
1                         316
2                         135
3                          61
4                          15
5                           3
6                           1
7                           1

To calculate the sample descriptive statistics using Formulas 1.2, 1.7, and 1.8 would be quite arduous, involving sums and sums of squares of 800 numbers. Fortunately, the following formulas limit the drudgery for these calculations.

It is clear that $X_1 = 0$ occurs $f_1 = 268$ times, $X_2 = 1$ occurs $f_2 = 316$ times, etc., and that the sum of observations in the first category is $f_1 X_1$, the sum in the second category is $f_2 X_2$, etc. The sum of all observations is, therefore,

$$\sum_{i=1}^{c} f_i X_i,$$

where $c$ denotes the number of categories. The total number of observations is $n = \sum_{i=1}^{c} f_i$, and as a result:

FORMULA 1.10. The sample mean for a grouped data set is given by

$$\overline{X} = \frac{\sum_{i=1}^{c} f_i X_i}{\sum_{i=1}^{c} f_i}.$$


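Similarly, the computational formula for the sample variance for a grouped data set can be derived directly from

$$s^2 = \frac{\sum_{i=1}^{c} f_i (X_i - \overline{X})^2}{n - 1}.$$

FORMULA 1.11. The sample variance for a grouped data set is given by

$$s^2 = \frac{\sum_{i=1}^{c} f_i X_i^2 - \frac{\left(\sum_{i=1}^{c} f_i X_i\right)^2}{n}}{n - 1},$$

where $n = \sum_{i=1}^{c} f_i$.

To apply Formulas 1.10 and 1.11, we need to calculate only three sums:

• The sample size $n = \sum f_i$

• The sum of observations $\sum f_i X_i$

• The uncorrected sum of squared observations $\sum f_i X_i^2$

Returning to Example 1.5, it is now straightforward to calculate $\overline{X}$, $s^2$, and $s$.

Plants/quadrat ($X_i$)    $f_i$    $f_i X_i$    $f_i X_i^2$
0                         268        0             0
1                         316      316           316
2                         135      270           540
3                          61      183           549
4                          15       60           240
5                           3       15            75
6                           1        6            36
7                           1        7            49

Sum                       800      857          1805

Note that column 4 in the table above is generated by first squaring $X_i$ and then multiplying by $f_i$, not by squaring the values in column 3. In other words, $f_i X_i^2 \neq (f_i X_i)^2$. The sample mean is

$$\overline{X} = \frac{\sum f_i X_i}{n} = \frac{857}{800} = 1.1 \text{ plants/quadrat},$$

the sample variance is

$$s^2 = \frac{1805 - \frac{(857)^2}{800}}{800 - 1} = \frac{1805 - 918.06}{799} = 1.11 \text{ (plants/quadrat)}^2,$$

and the sample standard deviation is

$$s = \sqrt{1.11} = 1.1 \text{ plants/quadrat}.$$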
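The grouped-data formulas reduce the work to three sums over the categories. A minimal sketch for Example 1.5 follows, with the frequency table stored as a dictionary; that representation and the variable names are illustrative choices rather than anything prescribed by the text.

```python
import math

# Frequency table for Example 1.5: plants per quadrat -> number of quadrats.
counts = {0: 268, 1: 316, 2: 135, 3: 61, 4: 15, 5: 3, 6: 1, 7: 1}

n = sum(counts.values())                               # sample size
sum_fx = sum(f * x for x, f in counts.items())         # sum of observations
sum_fx2 = sum(f * x ** 2 for x, f in counts.items())   # square X first, then weight by f

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(n, sum_fx, sum_fx2)
print(round(mean, 1), round(variance, 2), round(std_dev, 1))
```

Note that `f * x ** 2` squares the class value before weighting by its frequency, which is the distinction the text draws between column 4 and the square of column 3.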

Example 1.5 summarized data for a discrete variable taking on whole number values from 0 to 7. Continuous variables can also be presented as grouped data in frequency tables.

EXAMPLE 1.6. The following data were collected by randomly sampling a large population of rainbow trout, Salmo gairdnerii. The variable of interest is weight in pounds.

Rainbow trout have weights that can range from almost 0 to 20 lb or more. Moreover, their weights can take on any value in that interval. For example, a particular trout may weigh 7.3541 lb. When data are grouped as in Example 1.6, intervals are implied for each class. A fish in the 3-lb class weighs somewhere between 2.50 and 3.49 lb and a fish in the 9-lb class weighs between 8.50 and 9.49 lb. Fish were weighed to the nearest pound, allowing analysis of grouped data for a continuous measurement variable. In Example 1.6, the grouped-data formulas give $\overline{X} = 7.1$ lb, $s^2 = 5.75$ lb$^2$, and $s = 2.4$ lb.

Again, consider that calculation time is saved by working with 13 classes instead of 110 individual observations. Whether measuring the rainbow trout to the nearest pound was appropriate will be considered in Section 1.10.

1.7 The Effect of Coding Data

While grouping data can save considerable time and effort, coding data may also offer similar savings. Coding involves conversion of measurements or statistics into easier to work with values by simple arithmetic operations. It is sometimes used to change units or to investigate experimental effects.

Additive Coding

Additive coding involves the addition or subtraction of a constant from each observation in a data set. Suppose the data gathered in Example 1.6 were collected using a scale that weighed the fish 2 lb too low. We could go back to the data and add 2 lb to each observation and recalculate the descriptive statistics. A more efficient tack would be to realize that if a fixed amount $c$ is added or subtracted from each observation in a data set, the sample mean will be increased or decreased by that amount, but the variance will be unchanged.

To see why, if $\overline{X}_c$ is the coded mean, then

$$\overline{X}_c = \frac{\sum (X_i + c)}{n} = \frac{\sum X_i + nc}{n} = \overline{X} + c.$$

If $s_c^2$ is the coded sample variance, then

$$s_c^2 = \frac{\sum \left[(X_i + c) - (\overline{X} + c)\right]^2}{n - 1} = \frac{\sum (X_i - \overline{X})^2}{n - 1} = s^2;$$

therefore, $s_c = s$.

If the scale weighed 2 lb light in Example 1.6, the new, corrected statistics would be $\overline{X}_c = 7.1 + 2.0 = 9.1$ lb, $s_c^2 = 5.75$ (lb)$^2$, and $s_c = 2.4$ lb.
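The additive coding rule is easy to confirm numerically. The sketch below uses a small made-up set of fish weights (hypothetical values, not the data of Example 1.6) and Python's standard `statistics` module to show that adding a constant shifts the mean by that constant and leaves the variance and standard deviation unchanged.

```python
import statistics

# Hypothetical weights in lb, invented for illustration only.
weights_lb = [5.0, 6.5, 7.0, 8.5, 9.0]
c = 2.0  # the scale read 2 lb too low, so add 2 lb to every observation

coded = [x + c for x in weights_lb]

mean_before = statistics.mean(weights_lb)
mean_after = statistics.mean(coded)

var_before = statistics.variance(weights_lb)  # sample variance (n - 1 divisor)
var_after = statistics.variance(coded)

print(mean_before, mean_after)  # the mean shifts by c
print(var_before, var_after)    # the variance is unchanged
```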

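Multiplicative Coding

Multiplicative coding involves multiplying or dividing each observation in a data set by a constant. Suppose the data in Example 1.6 were to be presented at an international conference and, therefore, had to be presented in metric units (kilograms) rather than English units (pounds). Since 1 kg equals 2.20 lb, we could convert the observations to kilograms by multiplying each observation by 1/2.20 or 0.45 kg/lb. Again, the more efficient approach would be to realize the following.

If each of the observations in a data set is multiplied by a fixed quantity $c$, the new mean is $c$ times the old mean because

$$\overline{X}_c = \frac{\sum cX_i}{n} = \frac{c\sum X_i}{n} = c\overline{X}.$$

Further, the new variance is $c^2$ times the old variance because

$$s_c^2 = \frac{\sum (cX_i - c\overline{X})^2}{n - 1} = \frac{c^2 \sum (X_i - \overline{X})^2}{n - 1} = c^2 s^2,$$

and from this it follows that the new standard deviation is $c$ times the old standard deviation, $s_c = cs$. (Remember, too, that division is just multiplication by a fraction.)

To convert the summary statistics of Example 1.6 to metric we simply utilize the formulas above with $c = 0.45$ kg/lb.

$$\overline{X}_c = c\overline{X} = 0.45 \text{ kg/lb} \, (7.1 \text{ lb}) = 3.20 \text{ kg}$$

$$s_c^2 = c^2 s^2 = (0.45 \text{ kg/lb})^2 (5.75 \text{ lb}^2) = 1.164 \text{ kg}^2$$

$$s_c = cs = 0.45 \text{ kg/lb} \, (2.4 \text{ lb}) = 1.08 \text{ kg}$$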
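The unit conversion for Example 1.6 can be checked directly from the summary statistics, without returning to the raw data. The sketch below applies the multiplicative coding rules with the rounded factor 0.45 kg/lb used in the text, then verifies the $c^2$ rule on a small made-up sample (hypothetical values, not the trout data).

```python
import statistics

c = 0.45  # kg per lb, the rounded conversion factor used in the text

# Summary statistics quoted for Example 1.6.
mean_lb, var_lb2, sd_lb = 7.1, 5.75, 2.4

mean_kg = c * mean_lb        # the mean scales by c
var_kg2 = c ** 2 * var_lb2   # the variance scales by c squared
sd_kg = c * sd_lb            # the standard deviation scales by c

print(mean_kg, var_kg2, sd_kg)

# The same rule checked on hypothetical raw data.
sample_lb = [5.0, 6.5, 7.0, 8.5, 9.0]
sample_kg = [c * x for x in sample_lb]
ratio = statistics.variance(sample_kg) / statistics.variance(sample_lb)
print(ratio, c ** 2)  # these agree up to floating-point error
```

For a negative constant the variance would still scale by $c^2$, while the standard deviation would scale by the absolute value of $c$; the conversion factor here is positive, so the distinction does not arise.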

Our understanding of the effects of coding on descriptive statistics can sometimes help determine the nature of experimental manipulations of variables.

EXAMPLE 1.7. Suppose that a particular variety of strawberry yields an average of 50 g of fruit per plant in field conditions without fertilizer. With a high nitrogen fertilizer this variety yields an average of 100 g of fruit per plant. A new "high yield" variety of strawberry yields 150 g of fruit per plant without fertilizer. How much would the yield be expected to increase with the high nitrogen fertilizer?

SOLUTION. We have two choices here: The effect of the fertilizer could be additive, increasing each value by 50 g ($X_i + 50$), or the effect of the fertilizer could be multiplicative, doubling each value ($2X_i$). In the first case we expect the yield of the new variety with fertilizer to be 150 g + 50 g = 200 g. In the second case we expect the yield of the new variety with fertilizer to be 2 × 150 g = 300 g. To differentiate between these possibilities we must look at the variance in yield of the original variety with and without fertilizer. If the effect of fertilizer is additive, the variances with and without fertilizer should be similar because additive coding doesn't affect the variance: $X_i + 50$ yields $s^2$, the original sample variance. If the effect is to double the yield, the variance of yields with fertilizer should be four times the variance without fertilizer because multiplicative coding increases the variance by the square of the constant used in coding: $2X_i$ yields $4s^2$, so doubling the yield increases the sample variance fourfold.

1.8 Tables and Graphs

The data collected in a sample are often organized into a table or graph as a summary representation. The data presented in Example 1.5 were arranged into a frequency table and could be further organized into a relative frequency table by expressing each row as a percentage of the total observations or into a cumulative frequency distribution by accumulating all observations up to and including each row. The cumulative frequency distribution could be manipulated further into a relative cumulative frequency distribution by expressing each row of the cumulative frequency distribution as a percentage of the total. See columns 3–5 in Table 1.2 for the relative frequency, cumulative frequency, and relative cumulative frequency distributions for Example 1.5. (Here $n = \sum f_i$ and $r$ is the row number.)

TABLE 1.2. The relative frequencies, cumulative frequencies, and relative cumulative frequencies for Example 1.5

$X_i$             $f_i$        $\frac{f_i}{n}(100)$    $\sum_{i=1}^{r} f_i$    $\frac{\sum_{i=1}^{r} f_i}{n}(100)$
Plants/quadrat    Frequency    Relative frequency      Cumulative frequency    Relative cumulative frequency
0                 268          33.500                  268                      33.500
1                 316          39.500                  584                      73.000
2                 135          16.875                  719                      89.875
3                  61           7.625                  780                      97.500
4                  15           1.875                  795                      99.375
5                   3           0.375                  798                      99.750
6                   1           0.125                  799                      99.875
7                   1           0.125                  800                     100.000
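The three derived columns of Table 1.2 follow mechanically from the frequency column. As a brief sketch, the code below rebuilds them for Example 1.5 using `itertools.accumulate` from the Python standard library; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies for 0 through 7 plants per quadrat (Example 1.5).
plants = [0, 1, 2, 3, 4, 5, 6, 7]
counts = [268, 316, 135, 61, 15, 3, 1, 1]

n = sum(counts)

# Each row as a percentage of all observations.
relative = [100 * f / n for f in counts]

# Running total of observations up to and including each row.
cumulative = list(accumulate(counts))

# Running total expressed as a percentage of all observations.
relative_cumulative = [100 * cf / n for cf in cumulative]

for row in zip(plants, counts, relative, cumulative, relative_cumulative):
    print("{:>2} {:>4} {:>7.3f} {:>4} {:>8.3f}".format(*row))
```

The last cumulative entry must equal the sample size and the last relative cumulative entry must equal 100, which makes a convenient check on any hand-built table.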

relative frequencies. See Figure 1.2. In a bar graph the bar heights are the relative frequencies. The bars are of equal width and spaced equidistantly along the horizontal axis. Because these data are discrete, that is, because they can only take certain values along the horizontal axis, the bars do not touch each other.

FIGURE 1.2. A bar graph of relative frequencies for Example 1.5.

The data in Example 1.6 can be summarized in a similar fashion with relative frequency, cumulative frequency, and relative cumulative frequency columns. See Table 1.3.

TABLE 1.3. The relative frequencies, cumulative frequencies, and relative cumulative frequencies for Example 1.6

$X_i$    $f_i$    Relative frequency    Cumulative frequency    Relative cumulative frequency
 8        24      21.82                  86                      78.18
 9         7       6.36                  93                      84.55
10         9       8.18                 102                      92.73
11         2       1.82                 104                      94.55
12         4       3.64                 108                      98.18
13         2       1.82                 110                     100.00
$\sum$   110     100.00

Because the data in Example 1.6 are continuous measurement data with each class implying a range of possible values for $X_i$, for example, $X_i = 3$ implies each fish weighed between 2.50 lb and 3.49 lb, the pictorial representation of the data set is a histogram, not a bar graph. Histograms have the observation classes along the horizontal axis. The area of the strip represents the relative frequency. (If the classes of the histogram are of equal width, as they often are, then the heights of the strips will represent the relative frequency, as in a bar graph.) See Figure 1.3. The strips in this case touch each other because each $X$ value corresponds to a range of possible values.

FIGURE 1.3. A histogram for the relative frequencies for Example 1.6.

While the categories in a bar graph are predetermined because the data are discrete, the classes representing ranges of continuous data values must be selected by the investigator. In fact, it is sometimes revealing to create more than one histogram of the same data by employing classes of different widths.

EXAMPLE 1.8. The list below gives snowfall measurements for 50 consecutive years (1951–2000) in Syracuse, NY (in inches per year). The data have been rearranged in order of increasing annual snowfall. Create a histogram using classes of width 30 inches and then create a histogram using narrower classes of width 15 inches. (Source: http://neisa.unh.edu/Climate/IndicatorExcelFiles.zip)

 71.7    73.4    77.8    81.6    84.1    84.1    84.3    86.7    91.3    93.8
 93.9    94.4    97.5    97.6    98.1    99.1    99.9   100.7   101.0   101.9
102.1   102.2   104.8   108.3   108.5   110.2   111.0   113.3   114.2   114.3
116.2   119.2   119.5   122.9   124.0   125.7   126.6   130.1   131.7   133.1
135.3   145.9   148.1   149.2   153.8   160.9   162.6   166.1   172.9   198.7

SOLUTION. Use the same scale for the horizontal axis (inches of annual snowfall) in both histograms. Remember that the area of a strip represents the relative frequency of the associated class. Since the snowfall classes of the second histogram (15 in) are one-half the width of those in the first histogram (30 in), the vertical scale must be multiplied by a factor of 2 so that equal areas in each histogram will represent the same relative frequencies. Thus, a single year in the second histogram will be represented by a strip half as wide but twice as tall as in the first histogram, as indicated in the key in the upper left corner of each diagram.

In this case, the narrower classes of the second histogram provide more information. For example, nearly one-third of all recent winters in Syracuse have produced snowfalls in the 90–105 inch range. There was one year with a very large amount of snowfall of approximately 200 in. While one could garner this same information from the data itself, normally one would use a (single) histogram to summarize data and not list the entire data set.

FIGURE 1.4. Two histograms for the data in Example 1.8. The areas of the strips represent the relative frequencies. The same area represents the same relative frequency in both graphs.
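The area rule can be made concrete by computing strip heights as relative frequency divided by class width. The sketch below does this for the snowfall data with classes of width 30 and 15 inches. It assumes classes of the form [lower, lower + width) starting at 60 inches; that starting point is an assumption chosen so that 90–105 is one of the narrow classes, and the function name `histogram_heights` is hypothetical.

```python
snowfall_in = [
    71.7, 73.4, 77.8, 81.6, 84.1, 84.1, 84.3, 86.7, 91.3, 93.8,
    93.9, 94.4, 97.5, 97.6, 98.1, 99.1, 99.9, 100.7, 101.0, 101.9,
    102.1, 102.2, 104.8, 108.3, 108.5, 110.2, 111.0, 113.3, 114.2, 114.3,
    116.2, 119.2, 119.5, 122.9, 124.0, 125.7, 126.6, 130.1, 131.7, 133.1,
    135.3, 145.9, 148.1, 149.2, 153.8, 160.9, 162.6, 166.1, 172.9, 198.7,
]


def histogram_heights(data, width, start):
    """Map each class lower bound to a strip height.

    Heights are scaled so that height * width equals the class's
    relative frequency, i.e. the strip areas sum to 1.
    """
    n = len(data)
    counts = {}
    for x in data:
        lower = start + width * int((x - start) // width)
        counts[lower] = counts.get(lower, 0) + 1
    return {lower: (count / n) / width for lower, count in sorted(counts.items())}


wide = histogram_heights(snowfall_in, width=30, start=60)    # assumed class start
narrow = histogram_heights(snowfall_in, width=15, start=60)  # assumed class start

# Relative frequency of the 90-105 inch class, recovered as an area.
share_90_105 = narrow[90] * 15
print(share_90_105)
```

With this scaling a class holding a single year is twice as tall in the narrow histogram as in the wide one, and the total area of each histogram is 1, so the two can be compared directly.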

It is worth emphasizing that to make valid comparisons between two histograms, equal areas must represent equal relative frequencies. Since the relative frequencies of all the classes in a histogram sum to 1, this means that the total area under each of the histograms being compared must be the same. Look at Figure 1.4 for an application of this idea.

Histograms are often used as graphical tests of the shape of samples, usually testing whether the data are approximately "bell-shaped" or not. We will discuss the importance of this consideration in future chapters.

1.9 Quartiles and Box Plots

In the previous sections we have used sample variance, standard deviation, and range to obtain measures of the spread or variability. Another quick and useful way to visualize the spread of a data set is by constructing a box plot that makes use of quartiles and the sample range.

Quartiles and Five-Number Summaries

As the name suggests, quartiles divide a distribution in quarters. More precisely, the $p$th percentile of a distribution is the value such that $p$ percent of the observations fall at or below it. For example, the median is just the 50th percentile. Similarly, the lower or first quartile is the 25th percentile and the upper or third quartile is the 75th percentile. Because the second quartile is the same as the median, quartiles are appropriate ways to measure the spread of a distribution when the median is used to measure its center.

Because sample sizes are not always evenly divisible by 4 to form quartiles, we need to agree on how to break a data set up into approximate quarters. Other texts, computer programs, and calculators may use slightly different rules which produce slightly different quartiles.

FORMULA 1.12. To calculate the first and third quartiles, first order the list of observations and locate the median. The first quartile $Q_1$ is the median of the observations falling below the median of the entire sample and the third quartile $Q_3$ is the median of the observations falling above the median of the entire sample. The interquartile range is defined as

$$\text{IQR} = Q_3 - Q_1.$$

The sample IQR describes the spread of the middle 50% of the sample, that is, the difference between the first and third quartiles. As such, it is a measure of variability and is commonly reported with the median.

EXAMPLE 1.9. Find the first and third quartiles and the IQR for the cypress pine data in Example 1.2.

CCH      17   19   31   39   48   56   68   73   73   75   80   122
Depth     1    2    3    4    5    6    6    5    4    3    2     1

SOLUTION. The median depth is $\frac{12 + 1}{2} = 6.5$. So there are six observations below the median. The quartile depth is the median depth of these six observations: $\frac{6 + 1}{2} = 3.5$. So the first quartile is $Q_1 = \frac{31 + 39}{2} = 35$ cm. Similarly, the depth for the third quartile is also 3.5 (from the right), so $Q_3 = \frac{73 + 75}{2} = 74$ cm. Finally, the

$$\text{IQR} = Q_3 - Q_1 = 74 - 35 = 39 \text{ cm}.$$
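The median-of-halves rule in Formula 1.12 translates directly into code. The following is a minimal sketch applied to the cypress pine data; the function names are illustrative. Library routines for quartiles often use different interpolation rules, so their output may differ slightly from this rule on other data sets.

```python
def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3, IQR) using the median-of-halves rule.

    Q1 is the median of the observations positioned below the overall
    median and Q3 the median of those positioned above it. For an odd
    sample size the middle observation belongs to neither half.
    """
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]
    upper_half = data[(n + 1) // 2:]
    q1 = median(lower_half)
    q3 = median(upper_half)
    return q1, q3, q3 - q1


# Cypress pine data from Example 1.9 (cm).
cch_cm = [17, 19, 31, 39, 48, 56, 68, 73, 73, 75, 80, 122]
q1, q3, iqr = quartiles(cch_cm)
print(q1, q3, iqr)
```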