An Introduction to Biostatistics, 3rd Edition

10 Linear Regression and Correlation

11 Goodness of Fit Tests

A Proofs of Selected Results

C.6 Wilcoxon Signed-Rank Test Cumulative Distribution

C.7 Cumulative

C.8 Critical Values for the Wilcoxon Rank-Sum Test

C.10 Fisher’s

C.11 Correlation Coefficient

C.12 Cumulative Distribution for Kendall’s Test (τ

C.13 Critical Values for the Spearman Rank Correlation Coefficient, $r_s$

C.14 Critical Values for the Kolmogorov-Smirnov Test

C.15 Critical Values for the Lilliefors Test

Our goal in writing this book was to generate an accessible and relatively complete introduction for undergraduates to the use of statistics in the biological sciences. The text is designed for a one quarter or one semester class in introductory statistics for the life sciences. The target audience is sophomore and junior biology, environmental studies, biochemistry, and health sciences majors. The assumed background is some coursework in biology as well as a foundation in algebra but not calculus. Examples are taken from many areas in the life sciences including genetics, physiology, ecology, agriculture, and medicine.

This text emphasizes the relationships among probability, probability distributions, and hypothesis testing. We highlight the expected value of various test statistics under the null and research hypotheses as a way to understand the methodology of hypothesis testing. In addition, we have incorporated nonparametric alternatives to many situations along with the standard parametric analysis. These nonparametric techniques are included because undergraduate student projects often have small sample sizes that preclude parametric analysis and because the development of the nonparametric tests is readily understandable for students with modest math backgrounds. The nonparametric tests can be skipped or skimmed without loss of continuity.

We have tried to include interesting and easily understandable examples with each concept. The problems at the end of each chapter have a range of difficulty and come from a variety of disciplines. Some are real-life examples and most others are realistic in their design and data values. Throughout the text we have included short “Concept Checks” that allow readers to immediately gauge their mastery of the topic presented. Their answers are found at the ends of appropriate chapters. The end-of-chapter problems are randomized within each chapter to require the student to choose the appropriate analysis. Many undergraduate texts present a technique and immediately give all the problems that can be solved with it. This approach prevents students from having to make the real-life decision about the appropriate analysis. We believe this decision making is a critical skill in statistical analysis and have provided a large number of opportunities to practice it.

The material for this text derives principally from a required undergraduate biostatistics course one of us (Glover) taught for more than twenty years and from a second course in nonparametric statistics and field data analysis that the other of us (Mitchell) taught during several term abroad programs to Queensland, Australia. Recent shifts in undergraduate curricula have de-emphasized calculus for biology students and are now highlighting statistical analyses as a fundamental quantitative skill. We hope that our text will make teaching and learning these processes less arduous.

Supplemental Materials

We have provided several different resources to supplement the text in various ways. There is a set of Additional Appendices that are available online at http://waveland.com/Glover-Mitchell/Appendices.pdf. These are coordinated with this text and contain further information on several topics, including:

• additional post-hoc mean comparison techniques beyond those covered in the text for use in the analysis of variance;

• a method for determining confidence intervals for the difference between medians of independent samples based on the Wilcoxon rank-sum test;

• a discussion of the Durbin test for incomplete block designs; and

• a section covering three different field methods.

Also available is a set of 300 additional problems to supplement those in the text. This material is available online for both students and instructors at http://waveland.com/Glover-Mitchell/ExtraProblems.pdf.

An Answer Manual for instructors is available free on CD from the publisher. Included on this CD are

• a PDF file containing both questions and answers for all the problems in the text and another file containing only the answers for all the problems in the text;

• a PDF file containing the supplementary problems mentioned above and another file containing both the questions and the answers to all of the supplementary problems; and

• a PDF file of the Additional Appendices.

The material in this textbook can be supported by a wide variety of statistical packages and calculators. The selection of these support materials is dictated by personal interests and cost considerations. For example, we have successfully used SPSS software (http://www.spss.com/) in the laboratory sessions of our course. This software is easy to use, relatively flexible, and can complete nearly all the statistical techniques presented in our text.

A number of free online statistical tools are also available. Among the best is The R Project for Statistical Computing, which is available at http://www.r-project.org/. We have created a companion guide, Using R: An Introduction to Biostatistics, for our text that is available free online at http://waveland.com/Glover-Mitchell/r-guide.pdf, and is also included on the CD for instructors. In the guide we work through almost every example in the text using R. A useful feature of R is the ability to access and analyze (large) online data files without ever downloading them to your own computer. With this in mind, we have placed all of the data files for the examples and exercises online as text files. You may either download these files to use with R or any other statistical package, or in R you may access the text files from online and do the necessary analysis without ever copying them to your own computer. See http://waveland.com/Glover-Mitchell/data.pdf for more information.

Our students have used Texas Instruments calculators ranging from the TI-30 to the TI-83, TI-84, and TI-89 models. The price range for these calculators is considerable and might be a factor in choosing a required calculator for a particular course. Although calculators such as the TI-30 do less automatically, they sometimes give the student clearer insights into the statistical tests by requiring a few more computational steps. The ease of computation afforded by computer programs or sophisticated calculators sometimes leads to a “black box” mentality about statistics and their calculation.

For both students and instructors we recommend D. J. Hand et al., editors, 1994, A Handbook of Small Data Sets, Chapman & Hall, London. This book contains 510 small data sets ranging from the numbers of Prussian military personnel killed by horse kicks from 1875–1894 (data set #283) to the shape of beadwork on leather goods of Shoshoni Indians (data set #150). The data sets are interesting, manageable, and amenable to statistical analysis using techniques presented in our text. While a number of the data sets from the Handbook were utilized as examples or problems in our text, there are many others that could serve as engaging and useful practice problems.

Acknowledgments

For the preparation of this third edition, thanks are due to the following people: Don Rosso and Dakota West at Waveland Press for their support and guidance; Ann Warner of Hobart and William Smith Colleges, for her meticulous word processing of the early drafts of this manuscript; and the students of Hobart and William Smith Colleges for their many comments and suggestions, particularly Aline Gadue for her careful scrutiny of the first edition.

Thomas J. Glover
Kevin J. Mitchell
Geneva, NY

Introduction to Data Analysis

Concepts in Chapter 1:

• Scientific Method and Statistical Analysis

• Parameters: Descriptive Characteristics of Populations

• Statistics: Descriptive Characteristics of Samples

• Variable Types: Continuous, Discrete, Ranked, and Categorical

• Measures of Central Tendency: Mean, Median, and Mode

• Measures of Dispersion: Range, Variance, Standard Deviation, and Standard Error

• Descriptive Statistics for Frequency Data

• Effects of Coding on Descriptive Statistics

• Tables and Graphs

• Quartiles and Box Plots

• Accuracy, Precision, and the 30–300 Rule

1.1 Introduction

The modern study of the life sciences includes experimentation, data gathering, and interpretation. This text offers an introduction to the methods used to perform these fundamental activities.

The design and evaluation of experiments, known as the scientific method, is utilized in all scientific fields and is often implied rather than explicitly outlined in many investigations. The components of the scientific method include observation, formulation of a potential question or problem, construction of a hypothesis, followed by a prediction, and the design of an experiment to test the prediction. Let’s consider these components briefly.

Observation of a Particular Event

Generally an observation can be classified as either quantitative or qualitative. Quantitative observations are based on some sort of measurement, for example, length, weight, temperature, and pH. Qualitative observations are based on categories reflecting a quality or characteristic of the observed event, for example, male versus female, diseased versus healthy, and mutant versus wild type.

Statement of the Problem

A series of observations often leads to the formulation of a particular problem or unanswered question. This usually takes the form of a “why” question and implies a cause and effect relationship. For example, suppose upon investigating a remote Fijian island community you realized that the vast majority of the adults suffer from hypertension (abnormally elevated blood pressures with the systolic over 165 mmHg and the diastolic over 95 mmHg). Note that the individual observations here are quantitative while the percentage that are hypertensive is based on a qualitative evaluation of the sample. From these preliminary observations one might formulate the question: Why are so many adults in this population hypertensive?

Formulation of a Hypothesis

A hypothesis is a tentative explanation for the observations made. A good hypothesis suggests a cause and effect relationship and is testable.

The Fijian community may demonstrate hypertension because of diet, lifestyle, genetic makeup, or combinations of these factors. Because we’ve noticed extraordinary consumption of octopi in their diet and knowing octopods have a very high cholesterol content, we might hypothesize that the high level of hypertension is caused by diet.

Making a Prediction

If the hypothesis is properly constructed, it can and should be used to make predictions. Predictions are based on deductive reasoning and take the form of an “if-then” statement. For example, a good prediction based on the hypothesis above would be: If the hypertension is caused by a high cholesterol diet, then changing the diet to a low cholesterol one should lower the incidence of hypertension.

The criteria for a valid (properly stated) prediction are:

1. An “if” clause stating the hypothesis.

2. A “then” clause that

(a) suggests altering a causative factor in the hypothesis (change of diet);

(b) predicts the outcome (lower level of hypertension);

(c) provides the basis for an experiment.

Design of the Experiment

The entire purpose and design of an experiment is to accomplish one goal, that is, to test the hypothesis. An experiment tests the hypothesis by testing the correctness or incorrectness of the predictions that came from it. Theoretically, an experiment should alter or test only the factor suggested by the prediction, while all other factors remain constant.

How would you design an experiment to test the diet hypothesis in the hypertensive population?

The best way to test the hypothesis above is by setting up a controlled experiment. This might involve using two randomly chosen groups of adults from the community and treating both identically with the exception of the one factor being tested. The control group represents the “normal” situation, has all factors present, and is used as a standard or basis for comparison. The experimental group represents the “test” situation and includes all factors except the variable that has been altered, in this case the diet. If the group with the low cholesterol diet exhibits significantly lower levels of hypertension, the hypothesis is supported by the data. On the other hand, if the change in diet has no effect on hypertension, then a new or revised hypothesis should be formulated and the experimental procedure redesigned. Finally, the generalizations that are drawn by relating the data to the hypothesis can be stated as conclusions.

While these steps outlined above may seem straightforward, they often require considerable insight and sophistication to apply properly. In our example, how the groups are chosen is not a trivial problem. They must be constructed without bias and must be large enough to give the researcher an acceptable level of confidence in the results. Further, how large a change is significant enough to support the hypothesis? What is statistically significant may not be biologically significant.

A foundation in statistical methods will help you design and interpret experiments properly. The field of statistics is broadly defined as the methods and procedures for collecting, classifying, summarizing, and analyzing data, and utilizing the data to test scientific hypotheses. The term statistics is derived from the Latin for state, and originally referred to information gathered in various censuses that could be numerically summarized to describe aspects of the state, for example, bushels of wheat per year, or number of military-aged men. Over time statistics has come to mean the scientific study of numerical data based on natural phenomena. Statistics applied to the life sciences is often called biostatistics or biometry. The foundations of biostatistics go back several hundred years, but statistical analysis of biological systems began in earnest in the late nineteenth century as biology became more quantitative and experimental.

1.2 Populations and Samples

Today we use statistics as a means of informing the decision-making processes in the face of the uncertainties that most real world problems present. Often we wish to make generalizations about populations that are too large or too difficult to survey completely. In these cases we sample the population and use characteristics of the sample to extrapolate to characteristics of the larger population. See Figure 1.1.

Real-world problems concern large groups or populations about which inferences must be made. (Is there a size difference between two color morphs of the same species of sea star? Are the offspring of a certain cross of fruit flies in a 3:1 ratio of normal to eyeless?) Certain characteristics of the population are of particular interest (systolic blood pressure, weight in grams, resting body temperature). The values of these characteristics will vary from individual to individual within the population. These characteristics are called random variables because they vary in an unpredictable way or in a way that appears or is assumed to depend on chance. The different types of variables are described in Section 1.3.

A descriptive measure associated with a random variable when it is considered over the entire population is called a parameter. Examples are the mean weight of all green turtles, Chelonia mydas, or the variance in clutch size of all tiger snakes, Notechis scutatus. In general, such parameters are difficult, if not impossible, to determine because the population is too large or expensive to study in its entirety. Consequently, one is forced to examine a subset or sample of the population and make inferences about the entire population based on this sample. A descriptive measure associated with a random variable of a sample is called a statistic. The mean weight of 25 female green turtles laying eggs on Heron Island or the variability in clutch size of 50 clutches of tiger snake eggs collected in southeastern Queensland are examples of statistics.

1. Population(s) have traits called random variables. Summary characteristics of the population random variables are called parameters: $\mu$, $\sigma^2$, $N$.

2. Random samples of size $n$ of the population(s) generate numerical data: $X_i$’s.

3. These data can be organized into summary statistics: $\bar{X}$, $s^2$, $n$, graphs, and figures (Chapter 1).

4. The data can be analyzed using an understanding of basic probability (Chapters 2–4) and various tests of hypotheses (Chapters 5–11).

5. The analyses lead to conclusions or inferences about the population(s) of interest.

FIGURE 1.1. The general approach to statistical analysis.

While such statistics are not equal to the population parameters, it is hoped that they are sufficiently close to the population parameters to be useful or that the potential error involved can be quantified. Sample statistics along with an understanding of probability form the foundation for inferences about population parameters. See Figure 1.1 for review.

Chapter 1 provides techniques for organizing sample data. Chapters 2 through 4 present the necessary probability concepts, and the remaining chapters outline various techniques to test a wide range of predictions from hypotheses.

Concept Checks. At the end of several of the sections in each chapter we include one or two questions designed as a rapid check of your mastery of a central idea of the section’s content. These questions will be most helpful if you do each as you encounter it in the text. Answers to these questions are given at the end of each chapter just before the exercises.

Concept Check 1.1. Which of the following are populations and which are samples?

(a) The weights of 25 randomly chosen eighth grade boys in the Detroit public school system.

(b) The number of eggs found in each osprey nest on Mt. Desert Island in Maine.

(c) The heights of 15 redwood trees measured in the Muir Woods National Monument, an old growth coast redwood forest.

(d) The lengths of all the blind cave fish, Astyanax mexicanus, in a small cavern system in central Mexico.

1.3 Variables or Data Types

There are several data types that arise in statistics. Each statistical test requires that the data analyzed be of a specified type. Here are the most common types of variables.

1. Quantitative variables fall into two major categories:

(a) Continuous variables or interval data can assume any value in some (possibly unbounded) interval of real numbers. Common examples include length, weight, temperature, volume, and height. They arise from measurement.

(b) Discrete variables assume only isolated values. Examples include clutch size, trees per hectare, arms per sea star, or items per quadrat. They arise from counting.

2. Ranked (ordinal) variables are not measured but nonetheless have a natural ordering. For example, candidates for political office can be ranked by individual voters. Or students can be arranged by height from shortest to tallest and correspondingly ranked without ever being measured. The rank values have no inherent meaning outside the “order” that they provide. That is, a candidate ranked 2 is not twice as preferable as the person ranked 1. (Compare this with measurement variables where a plant 2 feet tall is twice as tall as a plant 1 foot tall. With measurement variables such ratios are meaningful, while with ordinal variables they are not.)

3. Categorical data are qualitative data. Some examples are species, gender, genotype, phenotype, healthy/diseased, and marital status. Unlike with ranked data, there is no “natural” ordering that can be assigned to these categories.

When measurement variables are collected for either a population or a sample, the numerical values have to be abstracted or summarized in some way. The summary descriptive characteristics of a population of objects are called population parameters or just parameters. The calculation of a parameter requires knowledge of the measurement variable’s value for every member of the population. These parameters are usually denoted by Greek letters and do not vary within a population. The summary descriptive characteristics of a sample of objects, that is, a subset of the population, are called statistics. Sample statistics can have different values, depending on how the sample of the population was chosen. Statistics are denoted by various symbols, but (almost) never by Greek letters.

1.4 Measures of Central Tendency: Mean, Median, and Mode

Mean

There are several commonly used measures to describe the location or center of a population or sample. The most widely utilized measure of central tendency is the arithmetic mean or average.

The population mean is the sum of the values of the variable under study divided by the total number of objects in the population. It is denoted by a lowercase $\mu$ (“mu”). Each value is algebraically denoted by an $X$ with a subscript denotation $i$.

For example, a small theoretical population whose objects had values 1, 6, 4, 5, 6, 3, 8, 7 would be denoted

$$X_1 = 1,\quad X_2 = 6,\quad X_3 = 4,\quad X_4 = 5,\quad X_5 = 6,\quad X_6 = 3,\quad X_7 = 8,\quad X_8 = 7. \tag{1.1}$$

We would denote the population size with a capital $N$. In our theoretical population $N = 8$.

The population mean $\mu$ would be

$$\mu = \frac{1 + 6 + 4 + 5 + 6 + 3 + 8 + 7}{8} = \frac{40}{8} = 5.$$

FORMULA 1.1. The algebraic shorthand formula for a population mean is

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}.$$

The Greek letter $\Sigma$ (“sigma”) indicates summation. The subscript $i = 1$ indicates to start with the first observation and the superscript $N$ means to continue until and including the $N$th observation. The subscript and superscript may represent other starting and stopping points for the summation within the population or sample. For the example above, $\sum_{i=2}^{5} X_i$ would indicate the sum of $X_2 + X_3 + X_4 + X_5$ or $6 + 4 + 5 + 6 = 21$.

Notice also that $\sum\limits_{i=1}^{N} X_i$ is written $\sum\nolimits_{i=1}^{N} X_i$ when the summation symbol is embedded in a sentence. In fact, to further reduce clutter, the summation sign may not be indexed at all, for example $\sum X_i$. It is implied that the operation of addition begins with the first observation and continues through the last observation in a population, that is,

$$\sum X_i = \sum_{i=1}^{N} X_i.$$

If sigma notation is new to you or if you wish a quick review of its properties, read Appendix A.1 before continuing.
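As a quick check on the notation, the population mean and an indexed partial sum can be sketched in a few lines of Python. The function names below are illustrative choices, not part of the text; the data are the eight values of the small theoretical population.

```python
# The small theoretical population from the text: X_1, ..., X_8.
population = [1, 6, 4, 5, 6, 3, 8, 7]


def population_mean(values):
    """Formula 1.1: the sum of all values divided by the population size N."""
    return sum(values) / len(values)


def partial_sum(values, start, stop):
    """Sum X_start + ... + X_stop, using the text's 1-based subscripts."""
    return sum(values[start - 1:stop])


mu = population_mean(population)
sum_2_to_5 = partial_sum(population, 2, 5)

print(mu)          # 5.0
print(sum_2_to_5)  # 21
```

The only subtlety is the shift between the text's 1-based subscripts and Python's 0-based list indices, which `partial_sum` handles explicitly.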

FORMULA 1.2. The sample mean is defined by

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n},$$

where $n$ is the sample size. The sample mean is usually reported to one more decimal place than the data and always has appropriate units associated with it.

The symbol $\bar{X}$ (read “X bar”) indicates that the observations of a subset of size $n$ from a population have been averaged. $\bar{X}$ is fundamentally different from $\mu$ because samples from a population can have different values for their sample mean, that is, they can vary from sample to sample within the population. The population mean, however, is constant for a given population.

Again consider the small theoretical population 1, 6, 4, 5, 6, 3, 8, 7. A sample of size 3 may consist of 5, 3, 4 with $\bar{X} = 4$ or 6, 8, 4 with $\bar{X} = 6$.

Actually there are 56 possible samples of size 3 that could be drawn from the population in (1.1). Only four samples have a sample mean the same as the population mean, that is, $\bar{X} = \mu$:

Sample    Sum    $\bar{X}$

Each sample mean $\bar{X}$ is an unbiased estimate of $\mu$ but depends on the values included in the sample and sample size for its actual value. We would expect the average of all possible $\bar{X}$’s to be equal to the population parameter, $\mu$. This is, in fact, the definition of an unbiased estimator of the population mean.

If you calculate the sample mean for each of the 56 possible samples with $n = 3$ and then average these sample means, they will give an average value of 5, that is, the population mean, $\mu$. Remember that most real populations are too large or too difficult to census completely, so we must rely on using a single sample to estimate or approximate the population characteristics.
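The claim that the 56 sample means average to the population mean can be verified by brute force. The following is a minimal sketch, assuming samples are drawn without replacement and that the two 6's count as distinct objects; exact fractions are used to avoid rounding error.

```python
from fractions import Fraction
from itertools import combinations

# The small theoretical population from the text.
population = [1, 6, 4, 5, 6, 3, 8, 7]

# Every possible sample of size 3, drawn without replacement.
# combinations() works by position, so the two 6's are treated as distinct.
samples = list(combinations(population, 3))

# Exact sample mean of each sample.
sample_means = [Fraction(sum(s), 3) for s in samples]

# Average of all the sample means.
mean_of_means = sum(sample_means) / len(sample_means)

print(len(samples))    # 56
print(mean_of_means)   # 5
```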

Median

The second measure of central tendency is the median. The median is the “middle” value of an ordered list of observations. Though this idea is simple enough, it will prove useful to define it in terms of an even simpler notion. The depth of a value is its position relative to the nearest extreme (end) when the data are listed in order from smallest to largest.

EXAMPLE 1.1. The table below gives the circumferences at chest height (CCH) (in cm) and their corresponding depths for 15 sugar maples, Acer saccharum, measured in a forest in southeastern Ohio.

CCH     18   21   22   29   29   36   37   38   56   59   66   70   88   93   120
Depth    1    2    3    4    5    6    7    8    7    6    5    4    3    2     1

The population median $M$ is the observation whose depth is $d = \frac{N+1}{2}$, where $N$ is the population size.

Note that this parameter is not a Greek letter and is seldom computed in practice. Rather a sample median $\tilde{X}$ (read “X tilde”) is the statistic used to approximate or estimate the population median. $\tilde{X}$ is defined as the observation whose depth is $d = \frac{n+1}{2}$, where $n$ is the sample size. In Example 1.1, the sample size is $n = 15$, so the depth of the sample median is $d = 8$. The sample median $\tilde{X} = X_{\frac{n+1}{2}} = X_8 = 38$ cm.

EXAMPLE 1.2. The table below gives CCH (in cm) for 12 cypress pines, Callitris preissii, measured near Brown Lake on North Stradbroke Island.

CCH     17   19   31   39   48   56   68   73   73   75   80   122
Depth    1    2    3    4    5    6    6    5    4    3    2     1

Since $n = 12$, the depth of the median is $\frac{12+1}{2} = 6.5$. Obviously no observation has depth 6.5, so this is interpreted as the average of both observations whose depth is 6 in the list above. So $\tilde{X} = \frac{56 + 68}{2} = 62$ cm.
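The depth-based definition translates directly into code. The sketch below uses helper names of our own choosing (`depths`, `median_by_depth`) and reproduces the depth rows and medians of Examples 1.1 and 1.2.

```python
def depths(n):
    """Depth of each ordered position 1..n: its distance from the nearer end."""
    return [min(i, n + 1 - i) for i in range(1, n + 1)]


def median_by_depth(values):
    """Median as the observation at depth (n + 1) / 2 in the ordered list."""
    ordered = sorted(values)
    n = len(ordered)
    d = (n + 1) / 2
    if d.is_integer():
        # Odd n: exactly one observation has depth d.
        return ordered[int(d) - 1]
    # Even n: average the two middle observations (both have depth n / 2).
    k = int(d)
    return (ordered[k - 1] + ordered[k]) / 2


maples = [18, 21, 22, 29, 29, 36, 37, 38, 56, 59, 66, 70, 88, 93, 120]
pines = [17, 19, 31, 39, 48, 56, 68, 73, 73, 75, 80, 122]

print(depths(len(pines)))       # [1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1]
print(median_by_depth(maples))  # 38
print(median_by_depth(pines))   # 62.0
```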

Mode

The mode is defined as the most frequently occurring value in a data set. The mode of Example 1.2 would be 73 cm, while Example 1.1 would have a mode of 29 cm. In symmetrical distributions the mean, median, and mode are coincident. Bimodal distributions may indicate a mixture of samples from two populations, for example, weights of males and females. While the mode is not often used in biological research, reporting the number of modes, if more than one, can be informative.

Each measure of central tendency has different features. The mean is a purposeful measure only for a quantitative variable, whether it is continuous (for example, height) or discrete (for example, clutch size). The median can be calculated whenever a variable can be ranked (including when the variable is quantitative). Finally, the mode can be calculated for categorical variables, as well as for quantitative and ranked variables.

The sample median expresses less information than the sample mean because it utilizes only the ranks and not the actual values of each measurement. The median, however, is resistant to the effects of outliers. Extreme values or outliers in a sample can drastically affect the sample mean, while having little effect on the median. Consider Example 1.2 with $\bar{X} = 58.4$ cm and $\tilde{X} = 62$ cm. Suppose $X_{12}$ had been mistakenly recorded as 1220 cm instead of 122 cm. The mean $\bar{X}$ would become 149.9 cm while the median $\tilde{X}$ would remain 62 cm.
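The outlier comparison is easy to reproduce with Python's standard `statistics` module (the choice of module is ours; the text itself works by hand or calculator). The sketch computes the three measures for Example 1.2 and then repeats the mean and median with the mis-recorded value.

```python
from statistics import mean, median, multimode

# CCH (cm) for the 12 cypress pines of Example 1.2.
pines = [17, 19, 31, 39, 48, 56, 68, 73, 73, 75, 80, 122]

print(round(mean(pines), 1))   # 58.4
print(median(pines))           # 62.0
print(multimode(pines))        # [73]

# Replace the largest value, 122, with the mis-recorded 1220.
corrupted = pines[:-1] + [1220]

print(round(mean(corrupted), 1))  # 149.9
print(median(corrupted))          # 62.0
```

`multimode` returns a list, so a bimodal sample would show both modes rather than an arbitrary one.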

1.5 Measures of Dispersion and Variability: Range, Variance, Standard Deviation, and Standard Error

EXAMPLE 1.3. The table that follows gives the weights of two samples of albacore tuna, Thunnus alalunga (in kg). How would you characterize the differences in the samples?

Sample 1    Sample 2

SOLUTION. Upon investigation we see that both samples are the same size and have the same mean, $\bar{X}_1 = \bar{X}_2 = 10.11$ kg. In fact, both samples have the same median. To see this, arrange the data sets in rank order as in Table 1.1. We have $n = 9$, so $\tilde{X} = X_{\frac{n+1}{2}} = X_5$, which is 9.9 kg for both samples.

TABLE 1.1. The ordered samples of Thunnus alalunga

Neither of the samples has a mode. So by all the descriptors in Section 1.4 these samples appear to be identical. Clearly they are not. The difference in the samples is reflected in the scatter or spread of the observations. Sample 1 is much more uniform than Sample 2, that is, the observations tend to cluster much nearer the mean in Sample 1 than in Sample 2. We need descriptive measures of this scatter or dispersion that will reflect these differences.

Range

The simplest measure of dispersion or “spread” of the data is the range.

FORMULAS 1.3. The difference between the largest and smallest observations in a group of data is called the range:

Sample range $= X_n - X_1$

Population range $= X_N - X_1$

When the data are ordered from smallest to largest, the values $X_n$ and $X_1$ are called the sample range limits.

In Example 1.3 we have from Table 1.1

Sample 1: range =

The range for each of these two samples reflects some differences in dispersion, but the range is a rather crude estimator of dispersion because it uses only two of the data points and is somewhat dependent on sample size. As sample size increases, we expect largest and smallest observations to become more extreme and, therefore, the sample range to increase even though the population range remains unchanged. It is unlikely that the sample will include the largest and smallest values from the population, so the sample range usually underestimates the population range and is, therefore, a biased estimator.
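The bias of the sample range can be illustrated by enumeration. The sketch below is our own illustration rather than a calculation from the text: it reuses the small theoretical population 1, 6, 4, 5, 6, 3, 8, 7 from Section 1.4 and assumes sampling without replacement.

```python
from itertools import combinations

# The small theoretical population from Section 1.4.
population = [1, 6, 4, 5, 6, 3, 8, 7]
population_range = max(population) - min(population)


def sample_ranges(n):
    """Range of every possible sample of size n (without replacement)."""
    return [max(s) - min(s) for s in combinations(population, n)]


def mean_sample_range(n):
    """Average sample range over all possible samples of size n."""
    ranges = sample_ranges(n)
    return sum(ranges) / len(ranges)


ranges_3 = sample_ranges(3)
# Samples that contain both the smallest and the largest population value.
hits_3 = sum(r == population_range for r in ranges_3)

print(population_range)       # 7
print(len(ranges_3), hits_3)  # 56 6
# The average sample range grows with n but stays below the population range.
print(mean_sample_range(3) < mean_sample_range(5) < population_range)  # True
```

Only 6 of the 56 samples of size 3 reach the full population range; every other sample underestimates it, which is the sense in which the sample range is biased.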

Variance

To develop a measure that uses all the data to form an index of dispersion consider the following. Suppose we express each observation as a distance from the mean, $x_i = X_i - \bar{X}$. These differences are called deviates and will be sometimes positive ($X_i$ is above the mean) and sometimes negative ($X_i$ is below the mean).

If we try to average the deviates, they always sum to 0. Because the mean is the central tendency or location, the negative deviates will exactly cancel out the positive deviates. Consider a simple numerical example

$$X_1 = 2 \quad X_2 = 3 \quad X_3 = 1 \quad X_4 = 8 \quad X_5 = 6.$$

The mean $\bar{X} = 4$, and the deviates are

$$x_1 = -2 \quad x_2 = -1 \quad x_3 = -3 \quad x_4 = 4 \quad x_5 = 2.$$

Notice that the negative deviates cancel the positive ones so that $\sum (X_i - \bar{X}) = 0$. Algebraically one can demonstrate the same result more generally,

$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \bar{X}.$$

Since $\bar{X}$ is a constant for any sample,

$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X}.$$

Since $\bar{X} = \frac{\sum X_i}{n}$, then $n\bar{X} = \sum X_i$, so

$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} X_i = 0.$$

To circumvent this unfortunate property, the widely used measure of dispersion called the sample variance utilizes the squares of the deviates. The quantity

$$\sum_{i=1}^{n} (X_i - \bar{X})^2$$

is the sum of these squared deviates and is referred to as the corrected sum of squares, denoted by CSS. Each observation is corrected or adjusted for its distance from the mean.

FORMULA 1.4. The corrected sum of squares is utilized in the formula for the sample variance,

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}.$$

The sample variance is usually reported to two more decimal places than the data and has units that are the square of the measurement units.

This calculation is not as intuitive as the mean or median, but it is a very good indicator of scatter or dispersion. If the above formula had $n$ instead of $n - 1$ in the denominator, it would be exactly the average squared distance from the mean. Returning to Example 1.3, the variance of Sample 1 is 0.641 kg$^2$ and the variance of Sample 2 is 49.851 kg$^2$, reflecting the larger “spread” in Sample 2.

A sample variance is an unbiased estimator of a parameter called the population variance.

FORMULA 1.5. A population variance is denoted by $\sigma^2$ (“sigma squared”) and is defined by

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}.$$

It really is the average squared deviation from the mean for the population. The $n - 1$ in Formula 1.4 makes it an unbiased estimate of the population parameter. (See Appendix A.2 for a proof.) Remember that “unbiased” means that the average of all possible values of $s^2$ for a certain size sample will be equal to the population value $\sigma^2$.

Formulas 1.4 and 1.5 are theoretical formulas and are rather tedious to apply directly. Computational formulas utilize the fact that most calculators with statistical registers simultaneously calculate $n$, $\sum X_i$, and $\sum X_i^2$.
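The role of the $n - 1$ divisor can be checked numerically. The sketch below is an illustration of ours, not the proof in Appendix A.2. It assumes independent draws, that is, sampling with replacement from the small theoretical population of Section 1.4, and it enumerates every ordered sample of size 3 with exact fractions. The parameter name `ddof` is our own label for the amount subtracted from $n$ in the denominator.

```python
from fractions import Fraction
from itertools import product

# The small theoretical population from Section 1.4.
population = [1, 6, 4, 5, 6, 3, 8, 7]
N = len(population)

# Population mean and population variance (Formula 1.5), kept exact.
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N


def sample_variance(sample, ddof):
    """Corrected sum of squares divided by (n - ddof)."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    css = sum((x - xbar) ** 2 for x in sample)
    return css / (n - ddof)


# Assumption: independent draws, so enumerate all 8**3 ordered samples
# of size 3 taken with replacement.
samples = list(product(population, repeat=3))

avg_unbiased = sum(sample_variance(s, 1) for s in samples) / len(samples)
avg_biased = sum(sample_variance(s, 0) for s in samples) / len(samples)

print(sigma2)        # 9/2
print(avg_unbiased)  # 9/2
print(avg_biased)    # 3
```

Dividing by $n - 1$ recovers $\sigma^2$ on average, while dividing by $n$ systematically underestimates it under these assumptions.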

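FORMULA 1.6. The corrected sum of squares $\sum (X_i - \bar{X})^2$ may be computed more simply as

$$\sum (X_i - \bar{X})^2 = \sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}.$$

$\sum X_i^2$ is the uncorrected sum of squares and $\frac{(\sum X_i)^2}{n}$ is the correction term.

To verify Formula 1.6, using the properties in Appendix A.1 notice that

$$\sum (X_i - \bar{X})^2 = \sum \left(X_i^2 - 2X_i\bar{X} + \bar{X}^2\right) = \sum X_i^2 - 2\bar{X}\sum X_i + n\bar{X}^2.$$

Remember that $\bar{X} = \frac{\sum X_i}{n}$, so $n\bar{X} = \sum X_i$; hence

$$\sum (X_i - \bar{X})^2 = \sum X_i^2 - 2\bar{X}\sum X_i + \bar{X}\sum X_i = \sum X_i^2 - \bar{X}\sum X_i.$$

Substituting $\frac{\sum X_i}{n}$ for $\bar{X}$ yields

$$\sum (X_i - \bar{X})^2 = \sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}.$$

FORMULA 1.7. Use of the computational formula for the corrected sum of squares gives the computational formula for the sample variance

$$s^2 = \frac{\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}}{n - 1}.$$

Returning to Example 1.3, Sample 2, $\sum X_i = 91$, $\sum X_i^2 = 1318.92$, $n = 9$, so

$$s^2 = \frac{1318.92 - \frac{(91)^2}{9}}{9 - 1} = \frac{1318.92 - 920.11}{8} = 49.851 \text{ kg}^2.$$

Remember, the numerator can never be negative because it’s a sum of squared deviations. Because the variance has units that are the square of the measurement units, such as squared kilograms above, it has no physical interpretation. With a similar derivation, the population variance computational formula can be shown to be

$$\sigma^2 = \frac{\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{N}}{N}.$$

Again, this formula is rarely used since most populations are too large to census directly.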
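The computational formula needs only the three running totals a calculator keeps. A minimal sketch, with a function name of our own choosing, applied to the Sample 2 totals quoted above ($n = 9$, $\sum X_i = 91$, $\sum X_i^2 = 1318.92$):

```python
def variance_from_sums(n, sum_x, sum_x2):
    """Sample variance from n, the sum of X_i, and the sum of X_i squared."""
    # Corrected sum of squares: uncorrected sum minus the correction term.
    css = sum_x2 - sum_x ** 2 / n
    return css / (n - 1)


# Totals for Sample 2 of Example 1.3, as given in the text.
s2_sample2 = variance_from_sums(9, 91, 1318.92)

print(round(s2_sample2, 3))  # 49.851
```

Because the two terms in the numerator can be large and nearly equal, this form can lose precision in floating-point arithmetic for some data sets; for hand or calculator work at the scale of the text's examples it is perfectly adequate.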

Standard Deviation

FORMULAS 1.8. A more “natural” calculation is the standard deviation, which is the positive square root of the population or sample variance, respectively.

$$\sigma = \sqrt{\sigma^2} \qquad \text{and} \qquad s = \sqrt{s^2}.$$

These descriptions have the same units as the original observations and are, in a sense, the average deviation of observations from their mean.

Again, consider Example 1.3.

For Sample 1: $s_1^2 = 0.641$ kg$^2$, so $s_1 = 0.80$ kg.

For Sample 2: $s_2^2 = 49.851$ kg$^2$, so $s_2 = 7.06$ kg.

The standard deviation of a sample is relatively easy to interpret and clearly reflects the greater variability in Sample 2 compared to Sample 1. Like the mean, the standard deviation is usually reported to one more decimal place than the data and always has appropriate units associated with it. Both the variance and standard deviation can be used to demonstrate differences in scatter between samples or populations.

ThinkingaboutSumsofSquares

Ithasbeenourexperienceteachingelementarydescriptivestatisticsthatstudentshave littleproblemunderstandingmeasuresofcentraltendencysuchasthemeanandmedian. Thesamplevarianceandstandarddeviation,ontheotherhand,areoftenlessintuitiveto beginningstudents.Solet’sstepbackforamomenttocarefullyconsiderwhattheseindices ofvariabilityarereallymeasuring.

Supposeasmallsampleoflengths(incm)ofsmallmouthbassiscollected.

2732304135 Xi ’s

Thesefivefishhaveanaveragelengthof33.0cm.Somearesmallerandotherslargerthan thismean.Togetasenseofthisvariability,let’ssubtracttheaveragefromeachdatapoint (Xi 33)= xi generatingwhatiscalledthe deviate foreachvalue.Thedatawhenrescaled bysubtractingthemeanbecome

Whenweaddthesedeviations,theirsumis0,sotheirmeanisalso0.Toquantifythese deviationsand,therefore,thesample’svariability,wesquarethesedeviatestopreventthem fromalwayssummingto0.

Thesumofthesesquareddeviatesis

This calculation is called the corrected or rescaled sum of squares (squared deviates). If we averaged these calculations by dividing the corrected sum of squares by the sample size n = 5, we would have a measure of the average squared distance of the observations from their mean. This measure is called the sample variance. However, with samples this calculation usually involves division by n − 1 rather than n. This modification addresses issues of bias that are discussed in Section 1.5 and Appendix A.2.

The positive square root of the sample variance is called the standard deviation. In this context, standard signifies “usual” or “average.” So the sample variance and standard deviation are just measuring the average amount that observations vary from their center or mean. They are simply averages of variability rather than averages of observation measurement values like the mean. The fish sample had a mean of 33.0 cm with a standard deviation of 5.3 cm.
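The steps in this discussion can be traced in a few lines of Python. This is a minimal sketch using the five bass lengths from the text; the variable names are illustrative only.

```python
import math

lengths_cm = [27, 32, 30, 41, 35]              # smallmouth bass lengths from the text

n = len(lengths_cm)
mean = sum(lengths_cm) / n                     # sample mean
deviates = [x - mean for x in lengths_cm]      # x_i = X_i - mean; these sum to 0
corrected_ss = sum(d ** 2 for d in deviates)   # corrected sum of squares
variance = corrected_ss / (n - 1)              # divide by n - 1, not n
std_dev = math.sqrt(variance)                  # positive square root

print(mean, deviates, corrected_ss)
print(f"s^2 = {variance:.1f} cm^2, s = {std_dev:.1f} cm")
```

Dividing `corrected_ss` by `n` instead of `n - 1` would give the plain average squared deviation described above, which is smaller than the sample variance.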

Standard Error

The most important statistic of central tendency is the sample mean. However, the mean varies from sample to sample (see page 7). We now develop a method to measure the variability of the sample mean.

The variance and standard deviation are measures of dispersion or scatter of the values of the X’s in a sample or population. Because means utilize a number of X’s in their calculation, they tend to be less variable than the individual X’s. An extreme value of X (large or small) contributes only one nth of its value to the sample mean and is, therefore, somewhat dampened out.

A measure of the variability in $\overline{X}$’s then depends on two factors: the variability in the X’s and the number of X’s averaged to generate the mean $\overline{X}$. We utilize two statistics to estimate this variability.

FORMULAS 1.9. The variance of the sample mean is defined to be

$$\frac{s^2}{n},$$

and the standard deviation of the sample mean or, more commonly, the standard error is

$$\text{SE} = \frac{s}{\sqrt{n}}.$$

The standard error is the more important of these two statistics. Its utility will become clear in Chapter 4 when the Central Limit Theorem is outlined. The standard error is usually reported to one more decimal place than the data, or if n is large, to two more places.

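EXAMPLE 1.4. Calculate the variance of the sample mean and the standard error for the data sets in Example 1.3.

SOLUTION. The sample sizes are both n = 9. For Sample 1, $s^2 = 0.641\ \text{kg}^2$, so the variance of the sample mean is

$$\frac{s^2}{n} = \frac{0.641}{9} = 0.071\ \text{kg}^2$$

and the standard deviation is s = 0.80 kg, so the standard error is

$$\text{SE} = \frac{s}{\sqrt{n}} = \frac{0.80}{\sqrt{9}} = 0.27\ \text{kg}.$$

For Sample 2, $s^2 = 49.851\ \text{kg}^2$, so the variance of the sample mean is

$$\frac{s^2}{n} = \frac{49.851}{9} = 5.54\ \text{kg}^2$$

and the standard deviation is s = 7.06 kg, so the standard error is

$$\text{SE} = \frac{s}{\sqrt{n}} = \frac{7.06}{\sqrt{9}} = 2.35\ \text{kg}.$$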
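The arithmetic in Example 1.4 can be reproduced with the sketch below. It takes the variances and standard deviations quoted for Example 1.3 as given; the helper names `variance_of_mean` and `standard_error` are illustrative.

```python
import math

def variance_of_mean(s2, n):
    """Variance of the sample mean: s^2 / n."""
    return s2 / n

def standard_error(s, n):
    """Standard error: s / sqrt(n)."""
    return s / math.sqrt(n)

n = 9
# (sample variance in kg^2, standard deviation in kg) as quoted in the text
samples = {"Sample 1": (0.641, 0.80), "Sample 2": (49.851, 7.06)}

for name, (s2, s) in samples.items():
    print(f"{name}: s^2/n = {variance_of_mean(s2, n):.3f} kg^2, "
          f"SE = {standard_error(s, n):.2f} kg")
```

Because both samples have n = 9, each standard error is simply one-third of the corresponding standard deviation.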

Concept Check 1.2. The following data are the carapace (shell) lengths in centimeters of a sample of adult female green turtles, Chelonia mydas, measured while nesting at Heron Island in Australia’s Great Barrier Reef. Calculate the following descriptive statistics for this sample: sample mean, sample median, corrected sum of squares, sample variance, standard deviation, standard error, and range. Remember to use the appropriate number of decimal places in these descriptive statistics and to include the correct units with all statistics.

110   105   117   113   95   115   98   97   93   120

1.6 Descriptive Statistics for Frequency Tables

When large data sets are organized into frequency tables or presented as grouped data, there are shortcut methods to calculate the sample statistics: $\overline{X}$, $s^2$, and $s$.

EXAMPLE 1.5. The following table shows the number of sedge plants, Carex flacca, found in 800 sample quadrats in an ecological study of grasses. Each quadrat was 1 m².

Plants/quadrat (X_i)    Frequency (f_i)
        0                    268
        1                    316
        2                    135
        3                     61
        4                     15
        5                      3
        6                      1
        7                      1

To calculate the sample descriptive statistics using Formulas 1.2, 1.7, and 1.8 would be quite arduous, involving sums and sums of squares of 800 numbers. Fortunately, the following formulas limit the drudgery for these calculations.

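It is clear that $X_1 = 0$ occurs $f_1 = 268$ times, $X_2 = 1$ occurs $f_2 = 316$ times, etc., and that the sum of observations in the first category is $f_1 X_1$, the sum in the second category is $f_2 X_2$, etc. The sum of all observations is, therefore,

$$\sum_{i=1}^{c} f_i X_i,$$

where c denotes the number of categories. The total number of observations is $n = \sum_{i=1}^{c} f_i$, and as a result:

FORMULA 1.10. The sample mean for a grouped data set is given by

$$\overline{X} = \frac{\sum_{i=1}^{c} f_i X_i}{\sum_{i=1}^{c} f_i}.$$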


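Similarly, the computational formula for the sample variance for a grouped data set can be derived directly from

$$s^2 = \frac{\sum_{i=1}^{c} f_i (X_i - \overline{X})^2}{n-1}.$$

FORMULA 1.11. The sample variance for a grouped data set is given by

$$s^2 = \frac{\sum_{i=1}^{c} f_i X_i^2 - \frac{\left(\sum_{i=1}^{c} f_i X_i\right)^2}{n}}{n-1},$$

where $n = \sum_{i=1}^{c} f_i$.

To apply Formulas 1.10 and 1.11, we need to calculate only three sums:

• The sample size $n = \sum f_i$

• The sum of observations $\sum f_i X_i$

• The uncorrected sum of squared observations $\sum f_i X_i^2$

Returning to Example 1.5, it is now straightforward to calculate $\overline{X}$, $s^2$, and $s$.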

Plants/quadrat (X_i)     f_i     f_i X_i     f_i X_i^2
        0                268         0           0
        1                316       316         316
        2                135       270         540
        3                 61       183         549
        4                 15        60         240
        5                  3        15          75
        6                  1         6          36
        7                  1         7          49

      Sum                800       857        1805

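Note that column 4 in the table above is generated by first squaring $X_i$ and then multiplying by $f_i$, not by squaring the values in column 3. In other words, $f_i X_i^2 \neq (f_i X_i)^2$. The sample mean is

$$\overline{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{857}{800} = 1.1\ \text{plants/quadrat},$$

the sample variance is

$$s^2 = \frac{1805 - \frac{(857)^2}{800}}{800 - 1} = 1.11\ (\text{plants/quadrat})^2,$$

and the sample standard deviation is

$$s = \sqrt{1.11} = 1.1\ \text{plants/quadrat}.$$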
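The three sums and the grouped-data statistics for Example 1.5 can be checked with the following sketch. Storing the frequency table as a Python dictionary is an illustrative choice, not part of the text.

```python
import math

# Frequency table from Example 1.5: plants per quadrat -> number of quadrats
freq = {0: 268, 1: 316, 2: 135, 3: 61, 4: 15, 5: 3, 6: 1, 7: 1}

n = sum(freq.values())                               # sample size
sum_fx = sum(f * x for x, f in freq.items())         # sum of observations
sum_fx2 = sum(f * x ** 2 for x, f in freq.items())   # square X first, then weight by f

mean = sum_fx / n                                    # Formula 1.10
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)     # Formula 1.11
std_dev = math.sqrt(variance)

print(n, sum_fx, sum_fx2)
print(f"mean = {mean:.2f}, s^2 = {variance:.2f}, s = {std_dev:.2f}")
```

Note that `f * x ** 2` squares only `x`, matching the warning above that column 4 is not the square of column 3.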

Example 1.5 summarized data for a discrete variable taking on whole number values from 0 to 7. Continuous variables can also be presented as grouped data in frequency tables.

EXAMPLE 1.6. The following data were collected by randomly sampling a large population of rainbow trout, Salmo gairdnerii. The variable of interest is weight in pounds.

Rainbow trout have weights that can range from almost 0 to 20 lb or more. Moreover, their weights can take on any value in that interval. For example, a particular trout may weigh 7.3541 lb. When data are grouped as in Example 1.6, intervals are implied for each class. A fish in the 3-lb class weighs somewhere between 2.50 and 3.49 lb and a fish in the 9-lb class weighs between 8.50 and 9.49 lb. Fish were weighed to the nearest pound, allowing analysis of grouped data for a continuous measurement variable. In Example 1.6, the sample statistics are $\overline{X} = 7.1$ lb, $s^2 = 5.75\ \text{lb}^2$, and $s = 2.4$ lb.

Again, consider that calculation time is saved by working with 13 classes instead of 110 individual observations. Whether measuring the rainbow trout to the nearest pound was appropriate will be considered in Section 1.10.

1.7 The Effect of Coding Data

While grouping data can save considerable time and effort, coding data may also offer similar savings. Coding involves conversion of measurements or statistics into easier-to-work-with values by simple arithmetic operations. It is sometimes used to change units or to investigate experimental effects.

Additive Coding

Additive coding involves the addition or subtraction of a constant from each observation in a data set. Suppose the data gathered in Example 1.6 were collected using a scale that weighed the fish 2 lb too low. We could go back to the data and add 2 lb to each observation and recalculate the descriptive statistics. A more efficient tack would be to realize that if a fixed amount c is added or subtracted from each observation in a data set, the sample mean will be increased or decreased by that amount, but the variance will be unchanged.

To see why, if $\overline{X}_c$ is the coded mean, then

$$\overline{X}_c = \frac{\sum (X_i + c)}{n} = \frac{\sum X_i + nc}{n} = \overline{X} + c.$$

If $s_c^2$ is the coded sample variance, then

$$s_c^2 = \frac{\sum \left[(X_i + c) - (\overline{X} + c)\right]^2}{n-1} = \frac{\sum (X_i - \overline{X})^2}{n-1} = s^2;$$

therefore, $s_c = s$.

If the scale weighed 2 lb light in Example 1.6, the new, corrected statistics would be $\overline{X}_c = 7.1 + 2.0 = 9.1$ lb, $s_c^2 = 5.75\ (\text{lb})^2$, and $s_c = 2.4$ lb.

Multiplicative Coding

Multiplicative coding involves multiplying or dividing each observation in a data set by a constant. Suppose the data in Example 1.6 were to be presented at an international conference and, therefore, had to be presented in metric units (kilograms) rather than English units (pounds). Since 1 kg equals 2.20 lb, we could convert the observations to kilograms by multiplying each observation by 1/2.20 or 0.45 kg/lb. Again, the more efficient approach would be to realize the following.

If each of the observations in a data set is multiplied by a fixed quantity c, the new mean is c times the old mean because

$$\overline{X}_c = \frac{\sum cX_i}{n} = \frac{c \sum X_i}{n} = c\overline{X}.$$

Further, the new variance is $c^2$ times the old variance because

$$s_c^2 = \frac{\sum (cX_i - c\overline{X})^2}{n-1} = \frac{c^2 \sum (X_i - \overline{X})^2}{n-1} = c^2 s^2,$$

and from this it follows that the new standard deviation is c times the old standard deviation, $s_c = cs$. (Remember, too, that division is just multiplication by a fraction.)

To convert the summary statistics of Example 1.6 to metric we simply utilize the formulas above with c = 0.45 kg/lb.

$$\overline{X}_c = c\overline{X} = 0.45\ \text{kg/lb}\,(7.1\ \text{lb}) = 3.20\ \text{kg}$$

$$s_c^2 = c^2 s^2 = (0.45\ \text{kg/lb})^2 (5.75\ \text{lb}^2) = 1.164\ \text{kg}^2$$

$$s_c = cs = 0.45\ \text{kg/lb}\,(2.4\ \text{lb}) = 1.08\ \text{kg}$$
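Both coding rules can be verified numerically. Because the raw trout weights are not listed here, the sketch below reuses the five bass lengths from Section 1.5 purely as stand-in data; the constants `c_add` and `c_mult` are likewise illustrative assumptions.

```python
import math
import statistics

data = [27, 32, 30, 41, 35]   # bass lengths (cm) from Section 1.5, used as stand-in data
c_add = 2.0                   # assumed additive constant
c_mult = 0.45                 # assumed multiplicative constant

added = [x + c_add for x in data]     # additive coding
scaled = [x * c_mult for x in data]   # multiplicative coding

mean, var, sd = statistics.mean(data), statistics.variance(data), statistics.stdev(data)

# Additive coding: mean shifts by c, variance is unchanged
print(math.isclose(statistics.mean(added), mean + c_add))
print(math.isclose(statistics.variance(added), var))

# Multiplicative coding: mean scales by c, variance by c^2, standard deviation by c
print(math.isclose(statistics.mean(scaled), c_mult * mean))
print(math.isclose(statistics.variance(scaled), c_mult ** 2 * var))
print(math.isclose(statistics.stdev(scaled), c_mult * sd))
```

`statistics.variance` and `statistics.stdev` use the n − 1 divisor, so they match the sample formulas in this chapter.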

Our understanding of the effects of coding on descriptive statistics can sometimes help determine the nature of experimental manipulations of variables.

EXAMPLE 1.7. Suppose that a particular variety of strawberry yields an average 50 g of fruit per plant in field conditions without fertilizer. With a high nitrogen fertilizer this variety yields an average of 100 g of fruit per plant. A new “high yield” variety of strawberry yields 150 g of fruit per plant without fertilizer. How much would the yield be expected to increase with the high nitrogen fertilizer?

SOLUTION. We have two choices here: The effect of the fertilizer could be additive, increasing each value by 50 g ($X_i + 50$), or the effect of the fertilizer could be multiplicative, doubling each value ($2X_i$). In the first case we expect the yield of the new variety with fertilizer to be 150 g + 50 g = 200 g. In the second case we expect the yield of the new variety with fertilizer to be 2 × 150 g = 300 g. To differentiate between these possibilities we must look at the variance in yield of the original variety with and without fertilizer. If the effect of fertilizer is additive, the variances with and without fertilizer should be similar because additive coding doesn’t affect the variance: $X_i + 50$ yields $s^2$, the original sample variance. If the effect is to double the yield, the variance of yields with fertilizer should be four times the variance without fertilizer because multiplicative coding increases the variance by the square of the constant used in coding: $2X_i$ yields $4s^2$, so doubling the yield increases the sample variance fourfold.

1.8 Tables and Graphs

The data collected in a sample are often organized into a table or graph as a summary representation. The data presented in Example 1.5 were arranged into a frequency table and could be further organized into a relative frequency table by expressing each row as a percentage of the total observations or into a cumulative frequency distribution by accumulating all observations up to and including each row. The cumulative frequency distribution could be manipulated further into a relative cumulative frequency distribution by expressing each row of the cumulative frequency distribution as a percentage of the total. See columns 3–5 in Table 1.2 for the relative frequency, cumulative frequency, and relative cumulative frequency distributions for Example 1.5. (Here $n = \sum f_i$ and r is the row number.)

TABLE 1.2. The relative frequencies, cumulative frequencies, and relative cumulative frequencies for Example 1.5

X_i                f_i          Relative        Cumulative     Relative cumulative
Plants/quadrat     Frequency    frequency       frequency      frequency
                                (f_i/n)(100)    sum of f_i     (sum of f_i / n)(100)
                                                for i = 1..r   for i = 1..r
      0               268         33.500           268            33.500
      1               316         39.500           584            73.000
      2               135         16.875           719            89.875
      3                61          7.625           780            97.500
      4                15          1.875           795            99.375
      5                 3          0.375           798            99.750
      6                 1          0.125           799            99.875
      7                 1          0.125           800           100.000
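The three derived columns of Table 1.2 follow mechanically from the frequency column. A minimal Python sketch, using the frequencies of Example 1.5 and illustrative variable names:

```python
from itertools import accumulate

x_values = [0, 1, 2, 3, 4, 5, 6, 7]                # plants per quadrat
frequencies = [268, 316, 135, 61, 15, 3, 1, 1]     # f_i from Example 1.5

n = sum(frequencies)
relative = [100 * f / n for f in frequencies]             # relative frequency (%)
cumulative = list(accumulate(frequencies))                # running total through row r
relative_cumulative = [100 * cf / n for cf in cumulative] # cumulative total as % of n

for row in zip(x_values, frequencies, relative, cumulative, relative_cumulative):
    print("{:>2} {:>4} {:>8.3f} {:>4} {:>8.3f}".format(*row))
```

The last cumulative entry always equals n, and the last relative cumulative entry is always 100, which gives a quick check on the arithmetic.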

The discrete data of Example 1.5 can be pictured as a bar graph of the relative frequencies. See Figure 1.2. In a bar graph the bar heights are the relative frequencies. The bars are of equal width and spaced equidistantly along the horizontal axis. Because these data are discrete, that is, because they can only take certain values along the horizontal axis, the bars do not touch each other.

FIGURE 1.2. A bar graph of relative frequencies for Example 1.5.

The data in Example 1.6 can be summarized in a similar fashion with relative frequency, cumulative frequency, and relative cumulative frequency columns. See Table 1.3.

TABLE 1.3. The relative frequencies, cumulative frequencies, and relative cumulative frequencies for Example 1.6

X_i      f_i      Relative      Cumulative     Relative cumulative
                  frequency     frequency      frequency
  8       24        21.82           86              78.18
  9        7         6.36           93              84.55
 10        9         8.18          102              92.73
 11        2         1.82          104              94.55
 12        4         3.64          108              98.18
 13        2         1.82          110             100.00
Sum      110       100.00

Because the data in Example 1.6 are continuous measurement data with each class implying a range of possible values for $X_i$ (for example, $X_i = 3$ implies each fish weighed between 2.50 lb and 3.49 lb), the pictorial representation of the data set is a histogram, not a bar graph. Histograms have the observation classes along the horizontal axis. The area of the strip represents the relative frequency. (If the classes of the histogram are of equal width, as they often are, then the heights of the strips will represent the relative frequency, as in a bar graph.) See Figure 1.3. The strips in this case touch each other because each X value corresponds to a range of possible values.

FIGURE 1.3. A histogram for the relative frequencies for Example 1.6.

While the categories in a bar graph are predetermined because the data are discrete, the classes representing ranges of continuous data values must be selected by the investigator. In fact, it is sometimes revealing to create more than one histogram of the same data by employing classes of different widths.

EXAMPLE 1.8. The list below gives snowfall measurements for 50 consecutive years (1951–2000) in Syracuse, NY (in inches per year). The data have been rearranged in order of increasing annual snowfall. Create a histogram using classes of width 30 inches and then create a histogram using narrower classes of width 15 inches. (Source: http://neisa.unh.edu/Climate/IndicatorExcelFiles.zip)

 71.7   73.4   77.8   81.6   84.1   84.1   84.3   86.7   91.3   93.8
 93.9   94.4   97.5   97.6   98.1   99.1   99.9  100.7  101.0  101.9
102.1  102.2  104.8  108.3  108.5  110.2  111.0  113.3  114.2  114.3
116.2  119.2  119.5  122.9  124.0  125.7  126.6  130.1  131.7  133.1
135.3  145.9  148.1  149.2  153.8  160.9  162.6  166.1  172.9  198.7

SOLUTION. Use the same scale for the horizontal axis (inches of annual snowfall) in both histograms. Remember that the area of a strip represents the relative frequency of the associated class. Since the snowfall classes of the second histogram (15 in) are one-half the width of those of the first histogram (30 in), the vertical scale must be multiplied by a factor of 2 so that equal areas in each histogram will represent the same relative frequencies. Thus, a single year in the second histogram will be represented by a strip half as wide but twice as tall as in the first histogram, as indicated in the key in the upper left corner of each diagram.

In this case, the narrower classes of the second histogram provide more information. For example, nearly one-third of all recent winters in Syracuse have produced snowfalls in the 90–105 inch range. There was one year with a very large amount of snowfall of approximately 200 in. While one could garner this same information from the data itself, normally one would use a (single) histogram to summarize data and not list the entire data set.

FIGURE 1.4. Two histograms for the data in Example 1.8. The areas of the strips represent the relative frequencies. The same area represents the same relative frequency in both graphs.

It is worth emphasizing that to make valid comparisons between two histograms, equal areas must represent equal relative frequencies. Since the relative frequencies of all the classes in a histogram sum to 1, this means that the total area under each of the histograms being compared must be the same. Look at Figure 1.4 for an application of this idea.
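The equal-area rule can be checked numerically with the Syracuse data. In the sketch below the classes are assumed to start at 60 inches (so that 90–105 is one of the narrow classes mentioned in the text); the boundaries actually used in Figure 1.4 are not recoverable here and may differ. The function name `histogram` is illustrative.

```python
snowfall = [
    71.7, 73.4, 77.8, 81.6, 84.1, 84.1, 84.3, 86.7, 91.3, 93.8,
    93.9, 94.4, 97.5, 97.6, 98.1, 99.1, 99.9, 100.7, 101.0, 101.9,
    102.1, 102.2, 104.8, 108.3, 108.5, 110.2, 111.0, 113.3, 114.2, 114.3,
    116.2, 119.2, 119.5, 122.9, 124.0, 125.7, 126.6, 130.1, 131.7, 133.1,
    135.3, 145.9, 148.1, 149.2, 153.8, 160.9, 162.6, 166.1, 172.9, 198.7,
]

def histogram(data, start, width):
    """Return (class lower bound, count, strip height) for each class.

    Height = relative frequency / class width, so that the AREA of each
    strip (height * width) equals the relative frequency of its class.
    """
    n = len(data)
    n_classes = int((max(data) - start) // width) + 1
    counts = [0] * n_classes
    for x in data:
        counts[int((x - start) // width)] += 1
    return [(start + k * width, c, c / n / width) for k, c in enumerate(counts)]

wide = histogram(snowfall, start=60, width=30)     # assumed class boundaries
narrow = histogram(snowfall, start=60, width=15)   # assumed class boundaries

for name, hist, width in (("30 in", wide, 30), ("15 in", narrow, 15)):
    total_area = sum(height * width for _, _, height in hist)
    print(name, [count for _, count, _ in hist], f"total area = {total_area:.3f}")
```

With these assumed boundaries the 90–105 inch class holds 15 of the 50 years, and a strip for a single year is twice as tall in the 15-inch histogram as in the 30-inch one, while both histograms have total area 1.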

Histograms are often used as graphical tests of the shape of samples, usually testing whether the data are approximately “bell-shaped” or not. We will discuss the importance of this consideration in future chapters.

1.9 Quartiles and Box Plots

In the previous sections we have used sample variance, standard deviation, and range to obtain measures of the spread or variability. Another quick and useful way to visualize the spread of a data set is by constructing a box plot that makes use of quartiles and the sample range.

Quartiles and Five-Number Summaries

As the name suggests, quartiles divide a distribution in quarters. More precisely, the pth percentile of a distribution is the value such that p percent of the observations fall at or below it. For example, the median is just the 50th percentile. Similarly, the lower or first quartile is the 25th percentile and the upper or third quartile is the 75th percentile. Because the second quartile is the same as the median, quartiles are appropriate ways to measure the spread of a distribution when the median is used to measure its center.

Because sample sizes are not always evenly divisible by 4 to form quartiles, we need to agree on how to break a data set up into approximate quarters. Other texts, computer programs, and calculators may use slightly different rules which produce slightly different quartiles.

FORMULA 1.12. To calculate the first and third quartiles, first order the list of observations and locate the median. The first quartile $Q_1$ is the median of the observations falling below the median of the entire sample and the third quartile $Q_3$ is the median of the observations falling above the median of the entire sample. The interquartile range is defined as

$$\text{IQR} = Q_3 - Q_1.$$

The sample IQR describes the spread of the middle 50% of the sample, that is, the difference between the first and third quartiles. As such, it is a measure of variability and is commonly reported with the median.

EXAMPLE 1.9. Find the first and third quartiles and the IQR for the cypress pine data in Example 1.2.

CCH      17   19   31   39   48   56   68   73   73   75   80   122
Depth     1    2    3    4    5    6    6    5    4    3    2     1

SOLUTION. The median depth is $\frac{12+1}{2} = 6.5$. So there are six observations below the median. The quartile depth is the median depth of these six observations: $\frac{6+1}{2} = 3.5$. So the first quartile is $Q_1 = \frac{31+39}{2} = 35$ cm. Similarly, the depth for the third quartile is also 3.5 (from the right), so $Q_3 = \frac{73+75}{2} = 74$ cm. Finally, the

$$\text{IQR} = Q_3 - Q_1 = 74 - 35 = 39\ \text{cm}.$$
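The quartile rule of Formula 1.12 can be sketched in Python as follows. The function name `quartiles` is illustrative, and the rule is applied by position in the sorted list: the lower and upper halves each contain the floor of n/2 observations, so the middle observation is left out when n is odd.

```python
import statistics

def quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data."""
    xs = sorted(data)
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    half = len(xs) // 2                    # size of each half; middle value excluded if n is odd
    q1 = statistics.median(xs[:half])      # median of observations below the overall median
    q3 = statistics.median(xs[-half:])     # median of observations above the overall median
    return q1, q3

cch = [17, 19, 31, 39, 48, 56, 68, 73, 73, 75, 80, 122]   # cypress pine data, Example 1.9
q1, q3 = quartiles(cch)
iqr = q3 - q1
print(f"Q1 = {q1} cm, Q3 = {q3} cm, IQR = {iqr} cm")
```

As the text warns, other quartile conventions exist, so library routines that interpolate between observations may return slightly different values for the same data.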
