Data Structure into Java


CS 61B Reader
(Seventh Edition)
Data Structures (Into Java)
Paul N. Hilfinger, University of California, Berkeley

Acknowledgments. Thanks to the following individuals for finding many of the errors in earlier editions: Dan Bonachea, Michael Clancy, Dennis Hall, Joseph Hui, Yina Jin, Zhi Lin, Amy Mok, Barath Raghavan, Yingssu Tsai, Emily Watt, and Zihan Zhou.

Copyright © 2000, 2001, 2002, 2004, 2005, 2006, 2007, 2008, 2009, 2011, 2012, 2013 by Paul N. Hilfinger. All rights reserved.
Contents

1 Algorithmic Complexity
    1.1 Asymptotic complexity analysis and order notation
    1.2 Examples
        1.2.1 Demonstrating “Big-Ohness”
    1.3 Applications to Algorithm Analysis
        1.3.1 Linear search
        1.3.2 Quadratic example
        1.3.3 Explosive example
        1.3.4 Divide and conquer
        1.3.5 Divide and fight to a standstill
    1.4 Amortization
    1.5 Complexity of Problems
    1.6 Some Properties of Logarithms
    1.7 A Note on Notation
2 Data Types in the Abstract
    2.1 Iterators
        2.1.1 The Iterator Interface
        2.1.2 The ListIterator Interface
    2.2 The Java Collection Abstractions
        2.2.1 The Collection Interface
        2.2.2 The Set Interface
        2.2.3 The List Interface
        2.2.4 Ordered Sets
    2.3 The Java Map Abstractions
        2.3.1 The Map Interface
        2.3.2 The SortedMap Interface
    2.4 An Example
    2.5 Managing Partial Implementations: Design Options
3 Meeting a Specification
    3.1 Doing it from Scratch
    3.2 The AbstractCollection Class
    3.3 Implementing the List Interface
        3.3.1 The AbstractList Class
        3.3.2 The AbstractSequentialList Class
    3.4 The AbstractMap Class
    3.5 Performance Predictions
4 Sequences and Their Implementations
    4.1 Array Representation of the List Interface
    4.2 Linking in Sequential Structures
        4.2.1 Singly Linked Lists
        4.2.2 Sentinels
        4.2.3 Doubly Linked Lists
    4.3 Linked Implementation of the List Interface
    4.4 Specialized Lists
        4.4.1 Stacks
        4.4.2 FIFO and Double-Ended Queues
    4.5 Stack, Queue, and Deque Implementation
5 Trees
    5.1 Expression trees
    5.2 Basic tree primitives
    5.3 Representing trees
        5.3.1 Root-down pointer-based binary trees
        5.3.2 Root-down pointer-based ordered trees
        5.3.3 Leaf-up representation
        5.3.4 Array representations of complete trees
        5.3.5 Alternative representations of empty trees
    5.4 Tree traversals
        5.4.1 Generalized visitation
        5.4.2 Visiting empty trees
        5.4.3 Iterators on trees
6 Search Trees
    6.1 Operations on a BST
        6.1.1 Searching a BST
        6.1.2 Inserting into a BST
        6.1.3 Deleting items from a BST
        6.1.4 Operations with parent pointers
        6.1.5 Degeneracy strikes
    6.2 Implementing the SortedSet interface
    6.3 Orthogonal Range Queries
    6.4 Priority queues and heaps
        6.4.1 Heapify Time
    6.5 Game Trees
        6.5.1 Alpha-beta pruning
        6.5.2 A game-tree search algorithm
7 Hashing
    7.1 Chaining
    7.2 Open-address hashing
    7.3 The hash function
    7.4 Performance
8 Sorting and Selecting
    8.1 Basic concepts
    8.2 A Little Notation
    8.3 Insertion sorting
    8.4 Shell’s sort
    8.5 Distribution counting
    8.6 Selection sort
    8.7 Exchange sorting: Quicksort
    8.8 Merge sorting
        8.8.1 Complexity
    8.9 Speed of comparison-based sorting
    8.10 Radix sorting
        8.10.1 LSD-first radix sorting
        8.10.2 MSD-first radix sorting
    8.11 Using the library
    8.12 Selection
9 Balanced Searching
    9.1 Balance by Construction: B-Trees
        9.1.1 B-tree Insertion
        9.1.2 B-tree deletion
        9.1.3 Red-Black Trees: Binary Search Trees as (2,4) Trees
    9.2 Tries
        9.2.1 Tries: basic properties and algorithms
        9.2.2 Tries: Representation
        9.2.3 Table compression
    9.3 Restoring Balance by Rotation
        9.3.1 AVL Trees
    9.4 Splay Trees
        9.4.1 Analyzing splay trees
    9.5 Skip Lists
10 Concurrency and Synchronization
    10.1 Synchronized Data Structures
    10.2 Monitors and Orderly Communication
    10.3 Message Passing
11 Pseudo-Random Sequences
    11.1 Linear congruential generators
    11.2 Additive Generators
    11.3 Other distributions
        11.3.1 Changing the range
        11.3.2 Non-uniform distributions
        11.3.3 Finite distributions
    11.4 Random permutations and combinations
12 Graphs
    12.1 A Programmer’s Specification
    12.2 Representing graphs
        12.2.1 Adjacency Lists
        12.2.2 Edge sets
        12.2.3 Adjacency matrices
    12.3 Graph Algorithms
        12.3.1 Marking
        12.3.2 A general traversal schema
        12.3.3 Generic depth-first and breadth-first traversal
        12.3.4 Topological sorting
        12.3.5 Minimum spanning trees
        12.3.6 Single-source shortest paths
        12.3.7 A* search
        12.3.8 Kruskal’s algorithm for MST

Chapter 1

Algorithmic Complexity

The obvious way to answer the question “How fast does such-and-such a program run?” is to use something like the UNIX time command to find out directly. There are various possible objections to this easy answer. The time required by a program is a function of the input, so presumably we have to time several instances of the command and extrapolate the result. Some programs, however, behave fine for most inputs, but sometimes take a very long time; how do we report (indeed, how can we be sure to notice) such anomalies? What do we do about all the inputs for which we have no measurements? How do we validly apply results gathered on one machine to another machine?

The trouble with measuring raw time is that the information is precise, but limited: the time for this input on this configuration of this machine. On a different machine whose instructions take different absolute or relative times, the numbers don’t necessarily apply. Indeed, suppose we compare two different programs for doing the same thing on the same inputs and the same machine. Program A may turn out faster than program B. This does not imply, however, that program A will be faster than B when they are run on some other input, or on the same input, but some other machine.

In mathematese, we might say that a raw time is the value of a function $C_r(I, P, M)$ for some particular input I, some program P, and some “platform” M (platform here is a catchall term for a combination of machine, operating system, compiler, and runtime library support). I’ve invented the function $C_r$ here to mean “the raw cost of....” We can make the figure a little more informative by summarizing over all inputs of a particular size

$$C_w(N, P, M) = \max_{|I| = N} C_r(I, P, M),$$

where |I| denotes the “size” of input I. How one defines the size depends on the problem: if I is an array to be sorted, for example, |I| might denote I.length. We say that $C_w$ measures worst-case time of a program. Of course, since the number of inputs of a given size could be very large (the number of arrays of 5 ints, for example, is $2^{160} > 10^{48}$), we can’t directly measure $C_w$, but we can perhaps estimate it with the help of some analysis of P. By knowing worst-case times, we can make conservative statements about the running time of a program: if the worst-case time for input of size N is T, then we are guaranteed that P will consume no more than time T for any input of size N.
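As a small illustration of the definition of $C_w$, the sketch below enumerates every 0/1 array of a given (small) length, measures a raw cost for each one, and takes the maximum. The class name, the method names, and the choice of cost measure (the number of element comparisons made by a left-to-right scan for the value 1) are assumptions made for this example only.

```java
public class WorstCaseDemo {
    /** Raw cost of one input: the number of element comparisons a
     *  left-to-right scan makes while looking for the value 1 in A. */
    public static int rawCost(int[] A) {
        int comparisons = 0;
        for (int x : A) {
            comparisons += 1;
            if (x == 1) {
                break;
            }
        }
        return comparisons;
    }

    /** Maximum of rawCost over all 2^n arrays of length n whose
     *  elements are 0 or 1.  Feasible only for small n. */
    public static int worstCase(int n) {
        int worst = 0;
        for (int bits = 0; bits < (1 << n); bits += 1) {
            int[] A = new int[n];
            for (int i = 0; i < n; i += 1) {
                A[i] = (bits >> i) & 1;   // bit i of the pattern becomes A[i]
            }
            worst = Math.max(worst, rawCost(A));
        }
        return worst;
    }
}
```

Even in this toy setting the number of inputs doubles with each extra element, which is why the text falls back on analysis of P rather than direct measurement.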

But of course, it is always possible that our program will work fine on most inputs, but take a really long time on one or two (unlikely) inputs. In such cases, we might claim that $C_w$ is too harsh a summary measure, and we should really look at an average time. Assuming all values of the input, I, are equally likely, the average time is

$$C_a(N, P, M) = \frac{\sum_{|I| = N} C_r(I, P, M)}{N}$$

Fair this may be, but it is usually very hard to compute. In this course, therefore, I will say very little about average cases, leaving that to your next course on algorithms.

We’ve summarized over inputs by considering worst-case times; now let’s consider how we can summarize over machines. Just as summarizing over inputs required that we give up some information (namely, performance on particular inputs), so summarizing over machines requires that we give up information on precise performance on particular machines. Suppose that two different models of computer are running (different translations of) the same program, performing the same steps in the same order. Although they run at different speeds, and possibly execute different numbers of instructions, the speeds at which they perform any particular step tend to differ by some constant factor. By taking the largest and smallest of these constant factors, we can put bounds around the difference in their overall execution times. (The argument is not really this simple, but for our purposes here, it will suffice.) That is, the timings of the same program on any two platforms will tend to differ by no more than some constant factor over all possible inputs. If we can nail down the timing of a program on one platform, we can use it for all others, and our results will “only be off by a constant factor.”

But of course, 1000 is a constant factor, and you would not normally be insensitive to the fact that Brand X program is 1000 times slower than Brand Y. There is, however, an important case in which this sort of characterization is useful: namely, when we are trying to determine or compare the performance of algorithms (idealized procedures for performing some task). The distinction between algorithm and program (a concrete, executable procedure) is somewhat vague. Most higher-level programming languages allow one to write programs that look very much like the algorithms they are supposed to implement. The distinction lies in the level of detail. A procedure that is cast in terms of operations on “sets,” with no specific implementation given for these sets, probably qualifies as an algorithm. When talking about idealized procedures, it doesn’t make a great deal of sense to talk about the number of seconds they take to execute. Rather, we are interested in what I might call the shape of an algorithm’s behavior: such questions as “If we double the size of the input, what happens to the execution time?” Given that kind of question, the particular units of time (or space) used to measure the performance of an algorithm are unimportant: constant factors don’t matter.

If we only care about characterizing the speed of an algorithm to within a constant factor, other simplifications are possible. We need no longer worry about the timing of each little statement in the algorithm, but can measure time using any convenient “marker step.” For example, to do decimal multiplication in the standard way, you multiply each digit of the multiplicand by each digit of the multiplier, and perform roughly one one-digit addition with carry for each of these one-digit multiplications. Counting just the one-digit multiplications, therefore, will give you the time within a constant factor, and these multiplications are very easy to count (the product of the numbers of digits in the operands).

Another characteristic assumption in the study of algorithmic complexity (i.e., the time or memory consumption of an algorithm) is that we are interested in typical behavior of an idealized program over the entire set of possible inputs. Idealized programs, of course, being ideal, can operate on inputs of any possible size, and most “possible sizes” in the ideal world of mathematics are extremely large. Therefore, in this kind of analysis, it is traditional not to be interested in the fact that a particular algorithm does very well for small inputs, but rather to consider its behavior “in the limit” as input gets very large. For example, suppose that one wanted to analyze algorithms for computing π to any given number of decimal places. I can make any algorithm look good for inputs up to, say, 1,000,000 by simply storing the first 1,000,000 digits of π in an array and using that to supply the answer when 1,000,000 or fewer digits are requested. If you paid any attention to how my program performed for inputs up to 1,000,000, you could be seriously misled as to the cleverness of my algorithm. Therefore, when studying algorithms, we look at their asymptotic behavior: how they behave as the input size goes to infinity.

The result of all these considerations is that in considering the time complexity of algorithms, we may choose any particular machine and count any convenient marker step, and we try to find characterizations that are true asymptotically, out to infinity. This implies that our typical complexity measure for algorithms will have the form $C_w(N, A)$, meaning “the worst-case time over all inputs of size N of algorithm A (in some units).” Since the algorithm will be understood in any particular discussion, we will usually just write $C_w(N)$ or something similar. So the first thing we need to describe algorithmic complexity is a way to characterize the asymptotic behavior of functions.

1.1 Asymptotic complexity analysis and order notation

As it happens, there is a convenient notational tool, known collectively as order notation for “order of growth,” for describing the asymptotic behavior of functions. It may be (and is) used for any kind of integer- or real-valued function, not just complexity functions. You’ve probably seen it used in calculus courses, for example.

We write

f(n) ∈ O(g(n))

(aloud, this is “f(n) is in big-Oh of g(n)”) to mean that the function f is eventually bounded by some multiple of |g(n)|. More precisely, f(n) ∈ O(g(n)) iff

|f(n)| ≤ K · |g(n)|, for all n > M,

for some constants K > 0 and M. That is, O(g(n)) is the set of functions that “grow no more quickly than” |g(n)| does as n gets sufficiently large. Somewhat confusingly, f(n) here does not mean “the result of applying f to n,” as it usually does. Rather, it is to be interpreted as the body of a function whose parameter is n. Thus, we often write things like $O(n^2)$ to mean “the set of all functions that grow no more quickly than the square of their argument[1].” Figure 1.1a gives an intuitive idea of what it means to be in O(g(n)).

[1] If we wanted to be formally correct, we’d use lambda notation to represent functions (such as Scheme uses) and write instead $O(\lambda n.\, n^2)$, but I’m sure you can see how such a degree of rigor would become tedious very soon.

[Figure 1.1 (graphs not reproduced): graph (a) plots 2|g(n)|, 0.5|g(n)|, |f(n)|, and |h(n)| against n, with the point n = M marked; graph (b) plots |g′(n)| and |h′(n)| against n.]

Figure 1.1: Illustration of big-Oh notation. In graph (a), we see that |f(n)| ≤ 2|g(n)| for n > M, so that f(n) ∈ O(g(n)) (with K = 2). Likewise, h(n) ∈ O(g(n)), illustrating that g can be a very over-cautious bound. The function f is also bounded below by both g (with, for example, K = 0.5 and M any value larger than 0) and by h. That is, f(n) ∈ Ω(g(n)) and f(n) ∈ Ω(h(n)). Because f is bounded above and below by multiples of g, we say f(n) ∈ Θ(g(n)). On the other hand, h(n) ∉ Ω(g(n)). In fact, assuming that g continues to grow as shown and h to shrink, h(n) ∈ o(g(n)). Graph (b) shows that o(·) is not simply the set complement of Ω(·); h′(n) ∉ Ω(g′(n)), but h′(n) ∉ o(g′(n)), either.

Saying that f(n) ∈ O(g(n)) gives us only an upper bound on the behavior of f. For example, the function h in Figure 1.1a, and for that matter the function that is 0 everywhere, are both in O(g(n)), but certainly don’t grow like g. Accordingly, we define f(n) ∈ Ω(g(n)) iff |f(n)| ≥ K|g(n)| for all n > M, for some constants K > 0 and M. That is, Ω(g(n)) is the set of all functions that “grow at least as fast as” g beyond some point. A little algebra suffices to show the relationship between O(·) and Ω(·):

|f(n)| ≥ K|g(n)| ≡ |g(n)| ≤ (1/K) · |f(n)|

so

f(n) ∈ Ω(g(n)) ⇐⇒ g(n) ∈ O(f(n))

Because of our cavalier treatment of constant factors, it is possible for a function f(n) to be bounded both above and below by another function g(n): f(n) ∈ O(g(n)) and f(n) ∈ Ω(g(n)). For brevity, we write f(n) ∈ Θ(g(n)), so that Θ(g(n)) = O(g(n)) ∩ Ω(g(n)).
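The definitions of O(·) and Ω(·) can be spot-checked numerically. The sketch below tests proposed constants K and M over a finite range of n; this can refute a bad choice of constants and lend support to a good one, but it is evidence rather than a proof, since the definitions quantify over all n > M. The class name, the method names, and the sample function 100n + 15 with its constants are assumptions chosen for the illustration.

```java
import java.util.function.DoubleUnaryOperator;

public class OrderCheck {
    /** True iff |f(n)| <= K * |g(n)| for every integer n with M < n <= limit. */
    public static boolean upperWitness(DoubleUnaryOperator f, DoubleUnaryOperator g,
                                       double K, int M, int limit) {
        for (int n = M + 1; n <= limit; n += 1) {
            if (Math.abs(f.applyAsDouble(n)) > K * Math.abs(g.applyAsDouble(n))) {
                return false;
            }
        }
        return true;
    }

    /** True iff |f(n)| >= K * |g(n)| for every integer n with M < n <= limit. */
    public static boolean lowerWitness(DoubleUnaryOperator f, DoubleUnaryOperator g,
                                       double K, int M, int limit) {
        for (int n = M + 1; n <= limit; n += 1) {
            if (Math.abs(f.applyAsDouble(n)) < K * Math.abs(g.applyAsDouble(n))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator f = n -> 100 * n + 15;
        DoubleUnaryOperator g = n -> n;
        // K = 101, M = 15 supports 100n + 15 in O(n);
        // K = 100, M = 0 supports 100n + 15 in Omega(n).
        System.out.println(upperWitness(f, g, 101, 15, 1_000_000));
        System.out.println(lowerWitness(f, g, 100, 0, 1_000_000));
    }
}
```

When both checks succeed for some pair of constants, the two bounds together are what the text abbreviates as f(n) ∈ Θ(g(n)).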

Just because we know that f(n) ∈ O(g(n)), we don’t necessarily know that f(n) gets much smaller than g(n), or even (as illustrated in Figure 1.1a) that it is ever smaller than g(n). We occasionally do want to say something like “h(n) becomes negligible compared to g(n).” You sometimes see the notation h(n) ≪ g(n), meaning “h(n) is much smaller than g(n),” but this could apply to a situation where h(n) = 0.001g(n). Not being interested in mere constant factors like this, we need something stronger. A traditional notation is “little-oh,” defined as follows.

$$h(n) \in o(g(n)) \iff \lim_{n \to \infty} h(n)/g(n) = 0.$$

It’s easy to see that if h(n) ∈ o(g(n)), then h(n) ∉ Ω(g(n)); no constant K can work in the definition of Ω(·). It is not the case, however, that all functions that are outside of Ω(g(n)) must be in o(g(n)), as illustrated in Figure 1.1b.
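As a short worked example of the little-oh definition (the particular functions are chosen here for illustration), take $g(n) = n^2$ and compare two candidates for h:

```latex
\begin{align*}
h(n) &= n:          & \lim_{n\to\infty} \frac{n}{n^2} &= \lim_{n\to\infty} \frac{1}{n} = 0
      & &\Longrightarrow\; n \in o(n^2),\\
h(n) &= 0.001\,n^2: & \lim_{n\to\infty} \frac{0.001\,n^2}{n^2} &= 0.001 \neq 0
      & &\Longrightarrow\; 0.001\,n^2 \notin o(n^2).
\end{align*}
```

The second function is a mere constant multiple of g, so it is in Θ(g(n)) rather than o(g(n)), which is exactly the situation the ≪ notation fails to rule out.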

1.2 Examples

You may have seen the big-Oh notation already in calculus courses. For example, Taylor’s theorem tells us[2] that (under appropriate conditions)

$$f(x) = \underbrace{\frac{x^n}{n!} f^{[n]}(y)}_{\text{error term}} + \underbrace{\sum_{0 \le k < n} f^{[k]}(0) \frac{x^k}{k!}}_{\text{approximation}}$$

for some y between 0 and x, where $f^{[k]}$ represents the kth derivative of f. Therefore, if g(x) represents the maximum absolute value of $f^{[n]}$ between 0 and x, then we could also write the error term as

$$f(x) - \sum_{0 \le k < n} f^{[k]}(0) \frac{x^k}{k!} \in O\!\left(\frac{x^n}{n!}\, g(x)\right) = O(x^n g(x))$$

[2] Yes, I know it’s a Maclaurin series here, but it’s still Taylor’s theorem.

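| f(n) | Is contained in | Is not contained in |
|---|---|---|
| $1$, $1 + 1/n$ | $O(10000)$, $O(\sqrt{n})$, $O(n)$, $O(n^2)$, $O(\lg n)$, $O(1 - 1/n)$ | $O(1/n)$, $O(e^{-n})$ |
| | $\Omega(1)$, $\Omega(1/n)$, $\Omega(1 - 1/n)$ | $\Omega(n)$, $\Omega(\sqrt{n})$, $\Omega(\lg n)$, $\Omega(n^2)$ |
| | $\Theta(1)$, $\Theta(1 - 1/n)$ | $\Theta(n)$, $\Theta(n^2)$, $\Theta(\lg n)$, $\Theta(\sqrt{n})$ |
| | $o(n)$, $o(\sqrt{n})$, $o(n^2)$ | $o(100 + e^{-n})$, $o(1)$ |
| $\log_k n$, $\lfloor \log_k n \rfloor$, $\lceil \log_k n \rceil$ | $O(n)$, $O(n^\epsilon)$, $O(\sqrt{n})$, $O(\log_{k'} n)$, $O(\lfloor \log_{k'} n \rfloor)$, $O(n / \log_{k'} n)$ | $O(1)$ |
| | $\Omega(1)$, $\Omega(\log_{k'} n)$, $\Omega(\lfloor \log_{k'} n \rfloor)$ | $\Omega(n^\epsilon)$, $\Omega(\sqrt{n})$ |
| | $\Theta(\log_{k'} n)$, $\Theta(\lfloor \log_{k'} n \rfloor)$, $\Theta(\log_{k'} n + 1000)$ | $\Theta(\log^2_{k'} n)$, $\Theta(\log_{k'} n + n)$ |
| | $o(n)$, $o(n^\epsilon)$ | |
| $n$, $100n + 15$ | $O(.0005n - 1000)$, $O(n^2)$, $O(n \lg n)$ | $O(10000)$, $O(\lg n)$, $O(n - n^2/10000)$, $O(\sqrt{n})$ |
| | $\Omega(50n + 1000)$, $\Omega(\sqrt{n})$, $\Omega(n + \lg n)$, $\Omega(1/n)$ | $\Omega(n^2)$, $\Omega(n \lg n)$ |
| | $\Theta(50n + 100)$, $\Theta(n + \lg n)$ | $\Theta(n^2)$, $\Theta(1)$ |
| | $o(n^3)$, $o(n \lg n)$ | $o(1000n)$, $o(n^2 \sin n)$ |
| $n^2$, $10n^2 + n$ | $O(n^2 + 2n + 12)$, $O(n^3)$, $O(n^2 + \sqrt{n})$ | $O(n)$, $O(n \lg n)$, $O(1)$, $o(50n^2 + 1000)$ |
| | $\Omega(n^2 + 2n + 12)$, $\Omega(n)$, $\Omega(1)$, $\Omega(n \lg n)$ | $\Omega(n^3)$, $\Omega(n^2 \lg n)$ |
| | $\Theta(n^2 + 2n + 12)$, $\Theta(n^2 + \lg n)$ | $\Theta(n)$, $\Theta(n \sin n)$ |
| $n^p$ | $O(p^n)$, $O(n^p + 1000n^{p-1})$ | $O(n^{p-1})$, $O(1)$ |
| | $\Omega(n^{p-\epsilon})$ | $\Omega(n^{p+\epsilon})$, $\Omega(p^n)$ |
| | $\Theta(n^p + n^{p-\epsilon})$ | $\Theta(n^{p+\epsilon})$, $\Theta(1)$ |
| | $o(p^n)$, $o(n!)$, $o(n^{p+\epsilon})$ | $o((n + k)^p)$ |
| $2^n$, $2^n + n^p$ | $O(n!)$, $O(2^n - n^p)$, $O(3^n)$, $O(2^{n+p})$ | $O(n^p)$, $O((2 - \delta)^n)$ |
| | $\Omega(n^p)$, $\Omega((2 - \delta)^n)$ | $\Omega((2 + \epsilon)^n)$, $\Omega(n!)$ |
| | $\Theta(2^n + n^p)$ | $\Theta(2^{2n})$ |
| | $o(n 2^n)$, $o(n!)$, $o((2 + \epsilon)^n)$ | $o(2^{n+\epsilon})$ |

Table 1.1: Some examples of order relations. In the above, names other than n represent constants, with ε > 0, 0 ≤ δ ≤ 1, p > 1, and k, k′ > 1.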


The big-Oh form of the error term, for fixed n, is of course a much weaker statement than the original (it allows the error to be much bigger than it really is).

You’ll often see statements like this written with a little algebraic manipulation:

$$f(x) \in \sum_{0 \le k < n} f^{[k]}(0) \frac{x^k}{k!} + O(x^n g(x)).$$

To make sense of this sort of statement, we define addition (and so on) between functions (a, b, etc.) and sets of functions (A, B, etc.):

    a + b = λx. a(x) + b(x)
    A + B = {a + b | a ∈ A, b ∈ B}
    A + b = {a + b | a ∈ A}
    a + B = {a + b | b ∈ B}

Similar definitions apply for multiplication, subtraction, and division. So if a is √x and b is lg x, then a + b is a function whose value is √x + lg x for every (positive) x. O(a(x)) + O(b(x)) (or just O(a) + O(b)) is then the set of functions you can get by adding a member of O(√x) to a member of O(lg x). For example, O(a) contains 5√x + 3 and O(b) contains 0.01 lg x − 16, so O(a) + O(b) contains 5√x + 0.01 lg x − 13, among many others.
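The definition a + b = λx. a(x) + b(x) translates almost directly into code. The following is a minimal sketch using Java’s DoubleUnaryOperator to stand for a real-valued function; the class and method names are assumptions made for this example.

```java
import java.util.function.DoubleUnaryOperator;

public class FunctionSum {
    /** The pointwise sum of a and b: the function x -> a(x) + b(x). */
    public static DoubleUnaryOperator add(DoubleUnaryOperator a, DoubleUnaryOperator b) {
        return x -> a.applyAsDouble(x) + b.applyAsDouble(x);
    }

    /** Base-2 logarithm of x. */
    public static double lg(double x) {
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        // A member of O(sqrt x) plus a member of O(lg x), as in the text.
        DoubleUnaryOperator inOa = x -> 5 * Math.sqrt(x) + 3;
        DoubleUnaryOperator inOb = x -> 0.01 * lg(x) - 16;
        DoubleUnaryOperator sum = add(inOa, inOb);
        // Should agree (up to rounding) with 5 sqrt(x) + 0.01 lg x - 13.
        System.out.println(sum.applyAsDouble(16.0));
    }
}
```

Sets of functions such as O(a) are infinite, so the set-level operations A + B are not something one would enumerate; the code covers only the function-level definition.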

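1.2.1 Demonstrating “Big-Ohness”

Suppose we want to show that $5n^2 + 10\sqrt{n} \in O(n^2)$. That is, we need to find K and M so that

$$|5n^2 + 10\sqrt{n}| \le |Kn^2|, \quad \text{for } n > M.$$

We realize that $n^2$ grows faster than $\sqrt{n}$, so it eventually gets bigger than $10\sqrt{n}$ as well. So perhaps we can take K = 6 and find M > 0 such that

$$5n^2 + 10\sqrt{n} \le 5n^2 + n^2 = 6n^2$$

To get $10\sqrt{n} < n^2$, we need $10 < n^{3/2}$, or $n > 10^{2/3} \approx 4.64$. So choosing M > 5 certainly works.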
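The choice K = 6 can be sanity-checked over a finite range with a few lines of code. The sketch below (class and method names are assumptions) tests the inequality $5n^2 + 10\sqrt{n} \le Kn^2$ for each integer n with M < n ≤ limit; it supports the algebra above but does not replace it.

```java
public class BigOhWitness {
    /** True iff 5n^2 + 10 sqrt(n) <= K n^2 for every integer n
     *  with M < n <= limit. */
    public static boolean holds(double K, int M, int limit) {
        for (int n = M + 1; n <= limit; n += 1) {
            double lhs = 5.0 * n * n + 10.0 * Math.sqrt(n);
            double rhs = K * n * n;
            if (lhs > rhs) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holds(6, 5, 100_000));  // K = 6 with M = 5
        System.out.println(holds(6, 0, 100_000));  // M = 0 is too small: fails at n = 1
        System.out.println(holds(5, 5, 100_000));  // K = 5 can never work
    }
}
```

The third call shows why some slack in K is needed: with K = 5 the extra $10\sqrt{n}$ term always pushes the left side over the right.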

1.3 Applications to Algorithm Analysis

In this course, we will usually deal with integer-valued functions arising from measuring the complexity of algorithms. Table 1.1 gives a few common examples of orders that we deal with and their containment relations, and the sections below give examples of simple algorithmic analyses that use them.

1.3.1 Linear search

Let’s apply all of this to a particular program. Here’s a tail-recursive linear search for seeing if a particular value is in a sorted array:

    /** True iff X is one of A[k]...A[A.length-1].
     *  Assumes A is increasing, k >= 0. */
    static boolean isIn(int[] A, int k, int X) {
        if (k >= A.length)
            return false;
        else if (A[k] > X)
            return false;
        else if (A[k] == X)
            return true;
        else
            return isIn(A, k+1, X);
    }

This is essentially a loop. As a measure of its complexity, let’s define $C_{isIn}(N)$ as the maximum number of instructions it executes for a call with k = 0 and A.length = N. By inspection, you can see that such a call will execute the first if test up to N + 1 times, the second and third up to N times, and the tail-recursive call on isIn up to N times. With one compiler[3], each recursive call of isIn executes at most 14 instructions before returning or tail-recursively calling isIn. The initial call executes 18. That gives a total of at most 14N + 18 instructions. If instead we count the number of comparisons k>=A.length, we get at most N + 1. If we count the number of comparisons against X or the number of fetches of elements of A, we get at most 2N. We could therefore say that the function giving the largest amount of time required to process an input of size N is either in O(14N + 18), O(N + 1), or O(2N). However, these are all the same set, and in fact all are equal to O(N). Therefore, we may throw away all those messy integers and describe $C_{isIn}(N)$ as being in O(N), thus illustrating the simplifying power of ignoring constant factors. This bound is a worst-case time. For all arguments in which X<=A[0], the isIn function runs in constant time. That time bound (the best-case bound) is seldom very useful, especially when it applies to so atypical an input.

Giving an O(·) bound to $C_{isIn}(N)$ doesn’t tell us that isIn must take time proportional to N even in the worst case, only that it takes no more. In this particular case, however, the argument used above shows that the worst case is, in fact, at least proportional to N, so that we may also say that $C_{isIn}(N) \in \Omega(N)$.

Putting the two results together, $C_{isIn}(N) \in \Theta(N)$.

[3] A version of gcc with the -O option, generating SPARC code for a Sun Sparcstation IPC workstation.

In general, then, asymptotic analysis of the space or time required for a given algorithm involves the following.

• Deciding on an appropriate measure for the size of an input (e.g., length of an array or a list).

• Choosing a representative quantity to measure: one that is proportional to the “real” space or time required.

• Coming up with one or more functions that bound the quantity we’ve decided to measure, usually in the worst case.

• Possibly summarizing these functions by giving O(·), Ω(·), or Θ(·) characterizations of them.
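The steps above can be tried out on isIn itself. The sketch below takes the array length as the size, uses the test k >= A.length as the marker step, and counts how often it runs; the counter field, the count helper, and the class name are additions made for this illustration and are not part of the original function.

```java
public class IsInCount {
    /** Number of times the test k >= A.length has been executed. */
    static int lengthTests = 0;

    /** Same logic as isIn in the text, instrumented to count the
     *  marker step. */
    static boolean isIn(int[] A, int k, int X) {
        lengthTests += 1;
        if (k >= A.length)
            return false;
        else if (A[k] > X)
            return false;
        else if (A[k] == X)
            return true;
        else
            return isIn(A, k + 1, X);
    }

    /** Marker-step count for a search for X in A starting at index 0. */
    public static int count(int[] A, int X) {
        lengthTests = 0;
        isIn(A, 0, X);
        return lengthTests;
    }

    public static void main(String[] args) {
        int[] A = {1, 2, 3, 4, 5};
        System.out.println(count(A, 10));  // X larger than everything: N + 1 tests
        System.out.println(count(A, 0));   // X <= A[0]: a single test (best case)
    }
}
```

Searching for a value larger than every element gives the N + 1 worst case from the text, while any X ≤ A[0] stops after one test.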

1.3.2 Quadratic example

Here is a bit of code for sorting integers:

    static void sort(int[] A) {
        for (int i = 1; i < A.length; i += 1) {
            int x = A[i];
            int j;
            for (j = i; j > 0 && x < A[j-1]; j -= 1)
                A[j] = A[j-1];
            A[j] = x;
        }
    }

If we define $C_{sort}(N)$ as the worst-case number of times the comparison x<A[j-1] is executed for N = A.length, we see that for each value of i from 1 to A.length-1, the program executes the comparison in the inner loop (on j) at most i times. Therefore,

$$C_{sort}(N) = 1 + 2 + \cdots + (N - 1) = N(N - 1)/2 \in \Theta(N^2)$$

This is a common pattern for nested loops.
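To see the N(N − 1)/2 figure arise in practice, one can instrument the sort to count executions of the comparison x < A[j-1]. In the sketch below the inner loop is rewritten so that the comparison can be counted separately from the j > 0 test; the class name, the return value, and the descending helper are assumptions added for the illustration.

```java
public class SortCount {
    /** Insertion sort as in the text, returning the number of times
     *  the comparison x < A[j-1] is evaluated. */
    public static long sort(int[] A) {
        long comparisons = 0;
        for (int i = 1; i < A.length; i += 1) {
            int x = A[i];
            int j;
            for (j = i; j > 0; j -= 1) {
                comparisons += 1;          // about to evaluate x < A[j-1]
                if (!(x < A[j-1])) {
                    break;
                }
                A[j] = A[j-1];
            }
            A[j] = x;
        }
        return comparisons;
    }

    /** The array {n, n-1, ..., 1}, which forces the worst case. */
    public static int[] descending(int n) {
        int[] A = new int[n];
        for (int i = 0; i < n; i += 1) {
            A[i] = n - i;
        }
        return A;
    }

    public static void main(String[] args) {
        System.out.println(sort(descending(10)));        // N(N-1)/2 with N = 10
        System.out.println(sort(new int[] {1, 2, 3, 4})); // already sorted: N - 1
    }
}
```

A descending array makes every element travel all the way to the front, giving exactly i comparisons for each i; an already sorted array needs only one comparison per element.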

1.3.3 Explosive example

Consider a function with the following form.

    static int boom(int M, int X) {
        if (M == 0)
            return H(X);
        return boom(M-1, Q(X))
               + boom(M-1, R(X));
    }

and suppose we want to compute $C_{boom}(M)$, the number of times Q is called for a given M in the worst case. If M = 0, this is 0. If M > 0, then Q gets executed once in computing the argument of the first recursive call, and then it gets executed however many times the two inner calls of boom with arguments of M − 1 execute it. In other words,

$$C_{boom}(0) = 0$$

$$C_{boom}(i) = 2C_{boom}(i - 1) + 1$$

A little mathematical massage:

$$\begin{aligned}
C_{boom}(M) &= 2C_{boom}(M - 1) + 1, && \text{for } M \ge 1\\
&= 2(2C_{boom}(M - 2) + 1) + 1, && \text{for } M \ge 2\\
&\;\;\vdots\\
&= \underbrace{2(\cdots(2}_{M} \cdot 0 + \underbrace{1) + 1) \cdots + 1}_{M}\\
&= \sum_{0 \le j \le M - 1} 2^j\\
&= 2^M - 1
\end{aligned}$$

and so $C_{boom}(M) \in \Theta(2^M)$.
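The closed form $2^M - 1$ can be checked by running boom with a counter inside Q. The text leaves H, Q, and R unspecified, so the bodies below are placeholders invented purely so that the example runs; the counter and the countQ helper are likewise assumptions.

```java
public class BoomCount {
    /** Number of calls to Q so far. */
    static long qCalls = 0;

    // H, Q, and R are not defined in the text; these are placeholder
    // bodies chosen only to make the example executable.
    static int H(int X) { return X; }
    static int Q(int X) { qCalls += 1; return X + 1; }
    static int R(int X) { return X - 1; }

    static int boom(int M, int X) {
        if (M == 0)
            return H(X);
        return boom(M - 1, Q(X))
               + boom(M - 1, R(X));
    }

    /** Number of times Q is called during boom(M, 0). */
    public static long countQ(int M) {
        qCalls = 0;
        boom(M, 0);
        return qCalls;
    }

    public static void main(String[] args) {
        for (int M = 0; M <= 10; M += 1) {
            System.out.println(M + ": " + countQ(M) + " calls, 2^M - 1 = " + ((1L << M) - 1));
        }
    }
}
```

Because the count does not depend on X or on what H, Q, and R compute, any placeholder bodies give the same figure.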

1.3.4 Divide and conquer

Things become more interesting when the recursive calls decrease the size of parameters by a multiplicative rather than an additive factor. Consider, for example, binary search.

    /** Returns true iff X is one of
     *  A[L]...A[U]. Assumes A increasing,
     *  L>=0, U-L < A.length. */
    static boolean isInB(int[] A, int L, int U, int X) {
        if (L > U)
            return false;
        else {
            int m = (L+U)/2;
            if (A[m] == X)
                return true;
            else if (A[m] > X)
                return isInB(A, L, m-1, X);
            else
                return isInB(A, m+1, U, X);
        }
    }

The worst-case time here depends on the number of elements of A under consideration, U − L + 1, which we’ll call N. Let’s use the number of times the first line is executed as the cost, since if the rest of the body is executed, the first line also had to have been executed[4]. If N > 1, the cost of executing isInB is 1 comparison of L and U followed by the cost of executing isInB either with ⌊(N − 1)/2⌋ or with ⌈(N − 1)/2⌉ as the new value of N[5]. Either quantity is no more than ⌈(N − 1)/2⌉. If N ≤ 1, there are two comparisons against N in the worst case.

[4] For those of you seeking arcane knowledge, we say that the test L>U dominates all other statements.

Therefore, the following recurrence describes the cost, $C_{isInB}(i)$, of executing this function when U − L + 1 = i.

$$C_{isInB}(1) = 2$$

$$C_{isInB}(i) = 1 + C_{isInB}(\lceil (i - 1)/2 \rceil), \quad i > 1$$

This is a bit hard to deal with, so let’s again make the reasonable assumption that the value of the cost function, whatever it is, must increase as N increases. Then we can compute a cost function, $C'_{isInB}$, that is slightly larger than $C_{isInB}$, but easier to compute.

$$C'_{isInB}(1) = 2$$

$$C'_{isInB}(i) = 1 + C'_{isInB}(i/2), \quad i > 1 \text{ a power of 2.}$$

This is a slight over-estimate of $C_{isInB}$, but that still allows us to compute upper bounds. Furthermore, $C'_{isInB}$ is defined only on powers of two, but since isInB’s cost increases as N increases, we can still bound $C_{isInB}(N)$ conservatively by computing $C'_{isInB}$ of the next higher power of 2. Again with the massage:

$$\begin{aligned}
C'_{isInB}(N) &= 1 + C'_{isInB}(N/2), && N > 1 \text{ a power of 2}\\
&= 1 + 1 + C'_{isInB}(N/4), && N > 2 \text{ a power of 2}\\
&\;\;\vdots\\
&= \underbrace{1 + \cdots + 1}_{\lg N} + 2
\end{aligned}$$

The quantity lg N is the logarithm of N base 2, or roughly “the number of times one can divide N by 2 before reaching 1.” In summary, we can say $C_{isInB}(N) \in O(\lg N)$. Similarly, one can in fact derive that $C_{isInB}(N) \in \Theta(\lg N)$.
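The lg N + 2 bound can be compared against actual counts. The sketch below instruments isInB to count executions of its first line and searches an array of even numbers for every value that could sit at or between its elements, keeping the largest count seen. The counter, the worstCount helper, the class name, and the choice of test array are assumptions made for this example.

```java
public class IsInBCount {
    /** Number of times the test L > U has been executed. */
    static int firstLineTests = 0;

    /** Same logic as isInB in the text, instrumented to count
     *  executions of its first line. */
    static boolean isInB(int[] A, int L, int U, int X) {
        firstLineTests += 1;
        if (L > U)
            return false;
        else {
            int m = (L + U) / 2;
            if (A[m] == X)
                return true;
            else if (A[m] > X)
                return isInB(A, L, m - 1, X);
            else
                return isInB(A, m + 1, U, X);
        }
    }

    /** Largest first-line count over searches of {0, 2, ..., 2(N-1)}
     *  for each X from -1 to 2N - 1 (hits and misses alike). */
    public static int worstCount(int N) {
        int[] A = new int[N];
        for (int i = 0; i < N; i += 1) {
            A[i] = 2 * i;
        }
        int worst = 0;
        for (int X = -1; X <= 2 * N - 1; X += 1) {
            firstLineTests = 0;
            isInB(A, 0, N - 1, X);
            worst = Math.max(worst, firstLineTests);
        }
        return worst;
    }

    public static void main(String[] args) {
        for (int N = 1; N <= 1024; N *= 2) {
            System.out.println("N = " + N + ": worst count " + worstCount(N));
        }
    }
}
```

For N a power of 2 the worst observed count matches lg N + 2, so for this cost measure the over-estimate $C'_{isInB}$ happens to be exact at those sizes.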

1.3.5 Divide and fight to a standstill

Consider now a subprogram that contains two recursive calls.

    static void mung(int[] A, int L, int U) {
        if (L < U) {
            int m = (L+U)/2;
            mung(A, L, m);
            mung(A, m+1, U);
        }
    }

[5] The notation ⌊x⌋ means the result of rounding x down (toward −∞) to an integer, and ⌈x⌉ means the result of rounding x up to an integer.

We can approximate the arguments of both of the internal calls by N/2 as before, ending up with the following approximation, $C_{mung}(N)$, to the cost of calling mung with argument N = U − L + 1 (we are counting the number of times the test in the first line executes).

$$C_{mung}(1) = 3$$

$$C_{mung}(i) = 1 + 2C_{mung}(i/2), \quad i > 1 \text{ a power of 2.}$$

So,

$$\begin{aligned}
C_{mung}(N) &= 1 + 2(1 + 2C_{mung}(N/4)), && N > 2 \text{ a power of 2}\\
&\;\;\vdots\\
&= 1 + 2 + 4 + \cdots + N/2 + N \cdot 3
\end{aligned}$$

This is a sum of a geometric series ($1 + r + r^2 + \cdots + r^m$), with a little extra added on. The general rule for geometric series is

$$\sum_{0 \le k \le m} r^k = (r^{m+1} - 1)/(r - 1)$$

so, taking r = 2,

$$C_{mung}(N) = 4N - 1$$

or $C_{mung}(N) \in \Theta(N)$.
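A recurrence like this is easy to evaluate mechanically, which gives a quick check on the closed form. The sketch below evaluates the recurrence exactly as stated above (it does not instrument mung itself) and compares the result with 4N − 1; the class and method names are assumptions.

```java
public class MungCost {
    /** Evaluates the recurrence C(1) = 3, C(i) = 1 + 2 C(i/2),
     *  for i a power of 2. */
    public static long cost(long i) {
        if (i == 1) {
            return 3;
        }
        return 1 + 2 * cost(i / 2);
    }

    public static void main(String[] args) {
        for (long N = 1; N <= 1024; N *= 2) {
            System.out.println("N = " + N + ": recurrence " + cost(N)
                               + ", 4N - 1 = " + (4 * N - 1));
        }
    }
}
```

The same pattern (code the recurrence, tabulate it against a guessed closed form) is a handy way to catch algebra slips before attempting a proof by induction.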

1.4 Amortization

So far, we have considered the time spent by individual operations, or individual calls on a certain function of interest. Sometimes, however, it is fruitful to consider the cost of a whole sequence of calls, especially when each call affects the cost of later calls.

Consider, for example, a simple binary counter. Incrementing this counter causes it to go through a sequence like this:

    00000
    00001
    00010
    00011
    00100
    ···
    01111
    10000
    ···

Each step consists of flipping a certain number of bits, converting bit b to 1 − b. More precisely, the algorithm for going from one step to another is

Increment: Flip the bits of the counter from right to left, up to and including the first 0-bit encountered (if any).

Clearly, if we are asked to give a worst-case bound on the cost of the increment operation for an N-bit counter (in number of flips), we’d have to say that it is Θ(N): all the bits can be flipped. Using just that bound, we’d then have to say that the cost of performing M increment operations is Θ(M · N).

But the costs of consecutive increment operations are related. For example, if one increment flips more than one bit, the next increment will always flip exactly one (why?). In fact, if you consider the pattern of bit changes, you’ll see that the units (rightmost) bit flips on every increment, the 2’s bit on every second increment, the 4’s bit on every fourth increment, and in general, the $2^k$’s bit on every $(2^k)$th increment. Therefore, over any sequence of M consecutive increments, starting at 0, there will be

$$\begin{aligned}
&\underbrace{M}_{\text{unit's flips}} + \underbrace{\lfloor M/2 \rfloor}_{\text{2's flips}} + \underbrace{\lfloor M/4 \rfloor}_{\text{4's flips}} + \cdots + \underbrace{\lfloor M/2^n \rfloor}_{2^n\text{'s flips}}, \quad \text{where } n = \lfloor \lg M \rfloor\\
&\qquad \le M \left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots + \tfrac{1}{2^n}\right)\\
&\qquad = M \left(2 - 2^{-n}\right)\\
&\qquad < 2M \text{ flips}
\end{aligned}$$

In other words, this is the same result we would get if we performed M increments each of which had a worst-case cost of 2 flips, rather than N. We call 2 flips the amortized cost of an increment. To amortize in the context of algorithms is to treat the cost of each individual operation in a sequence as if it were spread out among all the operations in the sequence[6]. Any particular increment might take up to N flips, but we treat that as N/M flips credited to each increment operation in the sequence (and likewise count each increment that takes only one flip as 1/M flip for each increment operation). The result is that we get a more realistic idea of how much time the entire program will take; simply multiplying the ordinary worst-case time by M gives us a very loose and pessimistic estimate. Nor is amortized cost the same as average cost; it is a stronger measure. If a certain operation has a given average cost, that leaves open the possibility that there is some unlikely sequence of inputs that will make it look bad. A bound on amortized worst-case cost, on the other hand, is guaranteed to hold regardless of input.
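The 2M bound is easy to observe by simulating the counter and tallying flips. The sketch below follows the Increment rule literally on an array of bits; the class name, the method name, and the array representation (index 0 is the units bit) are assumptions made for this example.

```java
public class BinaryCounter {
    /** Performs M increments on an nBits-bit counter that starts at 0
     *  and returns the total number of bit flips. */
    public static long totalFlips(int nBits, long M) {
        boolean[] bits = new boolean[nBits];   // bits[0] is the units bit
        long flips = 0;
        for (long step = 0; step < M; step += 1) {
            // Flip from the right, up to and including the first 0-bit.
            int i = 0;
            while (i < nBits && bits[i]) {
                bits[i] = false;               // a 1-bit flips back to 0
                flips += 1;
                i += 1;
            }
            if (i < nBits) {
                bits[i] = true;                // the first 0-bit flips to 1
                flips += 1;
            }
        }
        return flips;
    }

    public static void main(String[] args) {
        long M = 1000;
        System.out.println(totalFlips(20, M) + " flips for " + M
                           + " increments (bound: " + 2 * M + ")");
    }
}
```

Multiplying the per-operation worst case by M would have predicted up to 20,000 flips for this run of a 20-bit counter; the observed total stays under 2M = 2000.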

Another way to reach the same result uses what is called the potential method[7]. The idea here is that we associate with our data structure (our bit sequence in this case) a non-negative potential that represents work we wish to spread out over several operations. If $c_i$ represents the actual cost of the ith operation on our data structure, we define the amortized cost of the ith operation, $a_i$, so that

$$a_i = c_i + \Phi_{i+1} - \Phi_i, \tag{1.1}$$

where $\Phi_i$ denotes the saved-up potential before the ith operation. That is, we give ourselves the choice of increasing Φ a little on any given operation and charging this increase against $a_i$, causing $a_i > c_i$ when Φ increases. Alternatively, we can also decrease $a_i$ below $c_i$ by having an operation reduce Φ, in effect using up previously saved increases. Assuming we start with $\Phi_0 = 0$, the total cost of n operations is

$$\sum_{0 \le i < n} c_i \le \sum_{0 \le i < n} (a_i + \Phi_i - \Phi_{i+1}) = \Big(\sum_{0 \le i < n} a_i\Big) + \Phi_0 - \Phi_n = \Big(\sum_{0 \le i < n} a_i\Big) - \Phi_n \le \sum_{0 \le i < n} a_i, \tag{1.2}$$

since we require that $\Phi_i \ge 0$. These $a_i$ therefore provide conservative estimates of the cumulative cost of the operations at each point.

For example, with our bit-flipping example, we’ll define $\Phi_i$ as the total number of 1-bits before the ith operation. The cost of the ith increment is always one plus the number of 1-bits that flip back to 0, which, because of how we’ve defined it, can never be more than $\Phi_i$ (which of course is never negative). So defining $a_i = 2$ for every operation satisfies Equation 1.1, again proving that we can bound the amortized cost of an increment by 2 bit-flips.

I fudged a bit here by assuming that our bit counter always starts at 0. If it started instead at $N_0 > 0$, and we stopped after a single increment, then the total cost (in bit flips) could be as much as $1 + \lfloor \lg(N_0 + 1) \rfloor$. Since we want to ensure that the inequality 1.2 holds for any n, we’ll have to do some adjusting to handle this case. A simple trick is to redefine $\Phi_0 = 0$, keep other values of the $\Phi_i$ the same (the number of 1-bits before the ith operation), and finally define $a_0 = c_0 + \Phi_1$. In effect, we charge $a_0$ with the start-up costs of our counting sequence. Of course, this means $a_0$ can be arbitrarily large, but that merely reflects reality; the remaining $a_i$ are still constant.

[6] The word amortize comes from an Old French word meaning “to death.” The original meaning from which the computer-science usage comes (introduced by Sleator and Tarjan), is “to gradually write off the initial cost of something.”

[7] Also due to D. Sleator.
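For a counter that starts at 0, the claim that $a_i = 2$ satisfies Equation 1.1 can be checked directly: compute the actual cost and the potential before and after each increment, and confirm that $c_i + \Phi_{i+1} - \Phi_i$ is always 2. The sketch below does this with a long standing in for the bit sequence, on the assumption that the counter is wide enough never to overflow; the class and method names are invented for the example.

```java
public class PotentialCheck {
    /** True iff c_i + Phi_{i+1} - Phi_i == 2 for each of the first M
     *  increments of a counter starting at 0, where Phi is the number
     *  of 1-bits and c_i is the number of bits the increment flips. */
    public static boolean amortizedCostIsTwo(long M) {
        for (long v = 0; v < M; v += 1) {
            long cost = Long.bitCount(v ^ (v + 1));   // bits that differ = bits flipped
            long phiBefore = Long.bitCount(v);        // potential before the increment
            long phiAfter = Long.bitCount(v + 1);     // potential after the increment
            if (cost + phiAfter - phiBefore != 2) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(amortizedCostIsTwo(1_000_000));
    }
}
```

An increment that turns t trailing 1-bits into 0s costs t + 1 flips and lowers the potential by t − 1, which is why the sum comes out to exactly 2 every time.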

1.5ComplexityofProblems

Sofar,Ihavediscussedonlytheanalysisofanalgorithm’scomplexity.Analgorithm,however,isjustaparticularwayofsolvingsomeproblem.Wemighttherefore consideraskingforcomplexityboundsonthe problem’s complexity.Thatis,canwe boundthecomplexityofthe bestpossible algorithm?Obviously,ifwehaveaparticularalgorithmanditstimecomplexityis O(f (n)),where n isthesizeoftheinput, thenthecomplexityofthebestpossiblealgorithmmustalso be O(f (n)).Wecall f (n),therefore,an upperbound onthe(unknown)complexityofthebest-possible

20
CHAPTER1.ALGORITHMICCOMPLEXITY
a
i = ci +Φi+1 Φi
0≤i<n ci ≤ 0≤i<n (ai +Φi Φi +1) =( 0≤i<n ai)+Φ0 Φn =( 0≤i<n ai) Φn ≤ 0≤i<n ai, (1.2)

1.6.SOMEPROPERTIESOFLOGARITHMS

21

algorithm.Butthistellsusnothingaboutwhetherthebest-possiblealgorithmis any faster thanthis—itputsno lowerbound onthetimerequiredforthebestalgorithm.Forexample,theworst-casetimefor isIn isΘ(N ).However, isInB is muchfaster.Indeed,onecanshowthatiftheonlyknowledgethealgorithmcan haveistheresultofcomparisonsbetween X andelementsofthearray,then isInB hasthebestpossiblebound(itis optimal),sothattheentire problem offindingan elementinanorderedarrayhasworst-casetimeΘ(lg N ).
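To make the gap between the two upper bounds concrete, here is a minimal sketch that counts how many array elements each kind of search examines. The methods linearProbes and binaryProbes are hypothetical stand-ins written for this illustration (they are not the text's isIn and isInB), and counting examined elements is only a rough proxy for running time.

```java
/** Counts the array elements examined by two ways of searching a
 *  sorted array.  Illustrative names only; not from the text. */
class SearchBounds {
    /** Number of elements a left-to-right scan examines while
     *  looking for X in A. */
    static int linearProbes(int[] A, int x) {
        int probes = 0;
        for (int a : A) {
            probes += 1;
            if (a == x) {
                break;
            }
        }
        return probes;
    }

    /** Number of elements a binary search examines while looking
     *  for X in A, which must be sorted in ascending order. */
    static int binaryProbes(int[] A, int x) {
        int probes = 0;
        int lo = 0, hi = A.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            probes += 1;
            if (A[mid] == x) {
                break;
            } else if (A[mid] < x) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return probes;
    }

    public static void main(String[] args) {
        int n = 1 << 20;
        int[] A = new int[n];
        for (int i = 0; i < n; i += 1) {
            A[i] = 2 * i;               // sorted, all even
        }
        int absent = 2 * n + 1;         // larger than every element of A
        System.out.println(linearProbes(A, absent));
        System.out.println(binaryProbes(A, absent));
    }
}
```

For an absent key larger than every element and N = 2^20, the scan examines all 1,048,576 elements, while the binary search examines 21, that is, ⌊lg N⌋ + 1.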

Putting an upper bound on the time required to perform some problem simply involves finding an algorithm for the problem. By contrast, putting a good lower bound on the required time is much harder. We essentially have to prove that no algorithm can have a better execution time than our bound, regardless of how much smarter the algorithm designer is than we are. Trivial lower bounds, of course, are easy: every problem's worst-case time is $\Omega(1)$, and the worst-case time of any problem whose answer depends on all the data is $\Omega(N)$, assuming that one's idealized machine is at all realistic. Better lower bounds than those, however, require quite a bit of work. All the better to keep our theoretical computer scientists employed.

1.6 Some Properties of Logarithms

Logarithms occur frequently in analyses of complexity, so it might be useful to review a few facts about them. In most math courses, you encounter the natural logarithm, $\ln x = \log_e x$, but computer scientists tend to use the base-2 logarithm, $\lg x = \log_2 x$, and in general this is what I mean when I say "logarithm." Of course, all logarithms are related by a constant factor: since by definition $a^{\log_a x} = x = b^{\log_b x}$, it follows that

$$
\log_a x = \log_a b^{\log_b x} = (\log_a b) \log_b x.
$$
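The constant-factor relationship is also how one computes lg in Java, whose Math class supplies natural and base-10 logarithms directly. The sketch below is illustrative: the class name Logs and both method names are my own, and the floating-point version is subject to rounding, which is why an exact integer version of ⌊lg n⌋ is shown as well.

```java
/** Base-2 logarithms in Java.  Illustrative helper; names are not
 *  from the text. */
class Logs {
    /** An approximation of lg X, using lg x = (ln x) / (ln 2). */
    static double lg(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** The exact value of floor(lg N) for N > 0, computed from the
     *  position of the highest 1-bit of N. */
    static int floorLg(long n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return 63 - Long.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        System.out.println(lg(1024.0));
        System.out.println(floorLg(1023));
        // Any two bases differ by a constant factor:
        // log10 x = (log10 2) * lg x.
        System.out.println(Math.log10(1000.0) - Math.log10(2.0) * lg(1000.0));
    }
}
```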

Their connection to the exponential dictates their familiar properties:

$$
\lg xy = \lg x + \lg y
$$

$$
\lg x/y = \lg x - \lg y
$$

$$
\lg x^p = p \lg x
$$

In complexity arguments, we are often interested in inequalities. The logarithm is a very slow-growing function:

$$
\lim_{x \to \infty} \frac{\lg x}{x^p} = 0, \quad \text{for all } p > 0.
$$

It is strictly increasing and strictly concave, meaning that its values lie above any line segment joining points $(x, \lg x)$ and $(z, \lg z)$. To put it algebraically, if $0 < x < y < z$, then

$$
\lg y > \frac{z - y}{z - x} \lg x + \frac{y - x}{z - x} \lg z.
$$

Therefore, if $x$ and $y$ are positive and $x + y \le k$, the value of $\lg x + \lg y$ is maximized when $x = y = k/2$.

1.7 A Note on Notation

Other authors use notation such as $f(n) = O(n^2)$ rather than $f(n) \in O(n^2)$. I don't because I consider it nonsensical. To justify the use of '=', one either has to think of $f(n)$ as a set of functions (which it isn't), or think of $O(n^2)$ as a single function that differs with each separate appearance of $O(n^2)$ (which is bizarre). I can see no disadvantages to using '∈', which makes perfect sense, so that's what I use.

Exercises

1.1. Demonstrate the following, or give counter-examples where indicated. Showing that a certain $O(\cdot)$ formula is true means producing suitable $K$ and $M$ for the definition at the beginning of §1.1. Hint: sometimes it is useful to take the logarithms of two functions you are comparing.

a. $O(\max(|f_0(n)|, |f_1(n)|)) = O(f_0(n)) + O(f_1(n))$.

b. If $f(n)$ is a polynomial in $n$, then $\lg f(n) \in O(\lg n)$.

c. $O(f(n) + g(n)) = O(f(n)) + O(g(n))$. This is a bit of a trick question, really, to make you look at the definitions carefully. Under what conditions is the equation true?

d. There is a function $f(x) > 0$ such that $f(x) \in O(x)$ and $f(x) \in \Omega(x)$.

e. There is a function $f(x)$ such that $f(0) = 0$, $f(1) = 100$, $f(2) = 10000$, $f(3) = 10^6$, but $f(n) \in O(n)$.

f. $n^3 \lg n \in O(n^{3.0001})$.

g. There is no constant $k$ such that $n^3 \lg n \in \Theta(n^k)$.

1.2. Show each of the following false by exhibiting a counterexample. Assume that $f$ and $g$ are any real-valued functions.

a. $O(f(x) \cdot s(x)) = o(f(x))$, assuming $\lim_{x \to \infty} s(x) = 0$.

b. If $f(x) \in O(x^3)$ and $g(x) \in O(x)$ then $f(x)/g(x) \in O(x^2)$.

c. If $f(x) \in \Omega(x)$ and $g(x) \in \Omega(x)$ then $f(x) + g(x) \in \Omega(x)$.

d. If $f(100) = 1000$ and $f(1000) = 1000000$ then $f$ cannot be $O(1)$.

e. If $f_1(x), f_2(x), \ldots$ are a bunch of functions that are all in $\Omega(1)$, then

$$
F(N) = \sum_{1 \le i \le N} |f_i(x)| \in \Omega(N)
$$


Chapter 2

Data Types in the Abstract

Most of the "classical" data structures covered in courses like this represent some sort of collection of data. That is, they contain some set or multiset [1] of values, possibly with some ordering on them. Some of these collections of data are associatively indexed; they are search structures that act like functions mapping certain indexing values (keys) into other data (such as names into street addresses).

[1] A multiset or bag is like a set except that it may contain multiple copies of a particular data value. That is, each member of a multiset has a multiplicity: a number of times that it appears.

We can characterize the situation in the abstract by describing sets of operations that are supported by different data structures, that is, by describing possible abstract data types. From the point of view of a program that needs to represent some kind of collection of data, this set of operations is all that one needs to know.

For each different abstract data type, there are typically several possible implementations. Which you choose depends on how much data your program has to process, how fast it has to process the data, and what constraints it has on such things as memory space. It is a dirty little secret of the trade that for quite a few programs, it hardly matters what implementation you choose. Nevertheless, the well-equipped programmer should be familiar with the available tools.

I expect that many of you will find this chapter frustrating, because it will talk mostly about interfaces to data types without talking very much at all about the implementations behind them. Get used to it. After all, the standard library behind any widely used programming language is presented to you, the programmer, as a set of interfaces: directions for what parameters to pass to each function and some commentary, generally in English, about what it does. As a working programmer, you will in turn spend much of your time producing modules that present the same features to your clients.

2.1 Iterators

If we are to develop some general notion of a collection of data, there is at least one generic question we'll have to answer: how are we going to get items out of such a collection? You are familiar with one kind of collection already: an array. Getting items out of an array is easy; for example, to print the contents of an array, you might write

    for (int i = 0; i < A.length; i += 1)
        System.out.print(A[i] + ", ");

Arrays have a natural notion of an nth element, so such loops are easy. But what about other collections? Which is the "first penny" in a jar of pennies? Even if we do arbitrarily choose to give every item in a collection a number, we will find that the operation "fetch the nth item" may be expensive (consider lists of things such as in Scheme).

The problem with attempting to impose a numbering on every collection of items as a way to extract them is that it forces the implementor of the collection to provide a more specific tool than our problem may require. It's a classic engineering trade-off: satisfying one constraint (that one be able to fetch the nth item) may have other costs (fetching all items one by one may become expensive).

So the problem is to provide the items in a collection without relying on indices, or possibly without relying on order at all. Java provides two conventions, realized as interfaces. The interface java.util.Iterator provides a way to access all the items in a collection in some order. The interface java.util.ListIterator provides a way to access items in a collection in some specific order, but without assigning an index to each item [2].

[2] The library also defines the interface java.util.Enumeration, which is essentially an older version of the same idea. We won't talk about that interface here, since the official position is that Iterator is preferred for new programs.

2.1.1 The Iterator Interface

The Java library defines an interface, java.util.Iterator, shown in Figure 2.1, that captures the general notion of "something that sequences through all items in a collection" without any commitment to order. This is only a Java interface; there is no implementation behind it. In the Java library, the standard way for a class that represents a collection of data items to provide a way to sequence through those items is to define a method such as

    Iterator<SomeType> iterator() { ... }

that allocates and returns an Iterator (Figure 3.3 includes an example). Often the actual type of this iterator will be hidden (even private); all the user of the class needs to know is that the object returned by iterator provides the operations hasNext and next (and sometimes remove). For example, a general way to print all elements of a collection of Strings (analogous to the previous array printer) might be

    for (Iterator<String> i = C.iterator(); i.hasNext(); )
        System.out.print(i.next() + " ");

    package java.util;

    /** An object that delivers each item in some collection of items
     *  each of which is a T. */
    public interface Iterator<T> {

        /** True iff there are more items to deliver. */
        boolean hasNext();

        /** Advance THIS to the next item and return it. */
        T next();

        /** Remove the last item delivered by next() from the collection
         *  being iterated over. Optional operation: may throw
         *  UnsupportedOperationException if removal is not possible. */
        void remove();
    }

Figure 2.1: The java.util.Iterator interface.

The programmer who writes this loop needn't know what gyrations the object i has to go through to produce the requested elements; even a major change in how C represents its collection requires no modification to the loop.

This particular kind of for loop is so common and useful that in Java 2, version 1.5, it has its own "syntactic sugar," known as an enhanced for loop. You can write

    for (String i : C)
        System.out.print(i + " ");

to get the same effect as the previous for loop. Java will insert the missing pieces, turning this into

    for (Iterator<String> ρ = C.iterator(); ρ.hasNext(); ) {
        String i = ρ.next();
        System.out.print(i + " ");
    }

where ρ is some new variable introduced by the compiler and unused elsewhere in the program, and whose type is taken from that of C.iterator(). This enhanced for loop will work for any object C whose type implements the interface java.lang.Iterable, defined simply

    public interface Iterable<T> {
        Iterator<T> iterator();
    }

Thanks to the enhanced for loop, simply by defining an iterator method on a type you define, you provide a very convenient way to sequence through any subparts that objects of that type might contain.

Well, needless to say, having introduced this convenient shorthand for Iterators, Java's designers were suddenly in the position that iterating through the elements of an array was much clumsier than iterating through those of a library class. So they extended the enhanced for statement to encompass arrays. So, for example, these two methods are equivalent:

    /** The sum of the elements of A */
    int sum(int[] A) {
        int S;
        S = 0;
        for (int x : A)
            S += x;
        return S;
    }

        =⇒

    /** The sum of the elements of A */
    int sum(int[] A) {
        int S;
        S = 0;
        for (int κ = 0; κ < A.length; κ++) {
            int x = A[κ];
            S += x;
        }
        return S;
    }

where κ is a new variable introduced by the compiler.
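As a small illustration of defining an iterator method on a type of one's own, here is a sketch of a class whose objects can be used directly in an enhanced for loop. The class Range and its sum method are hypothetical examples invented for this purpose; they are not part of the Java library.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** The integers LOW, LOW+1, ..., HIGH-1, presented as an Iterable.
 *  A hypothetical example class, not part of the Java library. */
class Range implements Iterable<Integer> {
    private final int low, high;

    Range(int low, int high) {
        this.low = low;
        this.high = high;
    }

    @Override
    public Iterator<Integer> iterator() {
        // The actual type of the iterator is hidden from clients.
        return new Iterator<Integer>() {
            private int cursor = low;

            @Override
            public boolean hasNext() {
                return cursor < high;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int result = cursor;
                cursor += 1;
                return result;
            }

            @Override
            public void remove() {
                // Optional operation, not supported here.
                throw new UnsupportedOperationException();
            }
        };
    }

    /** The sum of the items delivered by ITEMS. */
    static int sum(Iterable<Integer> items) {
        int s = 0;
        for (int x : items) {
            s += x;
        }
        return s;
    }

    public static void main(String[] args) {
        for (int k : new Range(1, 5)) {
            System.out.print(k + " ");
        }
        System.out.println();
    }
}
```

Because sum asks only for an Iterable<Integer>, it works equally well on a Range or on a library collection of Integers.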

2.1.2 The ListIterator Interface

Some collections do have a natural notion of ordering, but it may still be expensive to extract an arbitrary item from the collection by index. For example, you may have seen linked lists in the Scheme language: given an item in the list, it requires n operations to find the nth succeeding item (in contrast to a Java array, which requires only one Java operation or a few machine operations to retrieve any item).

The standard Java library contains the interface java.util.ListIterator, which captures the idea of sequencing through an ordered sequence without fetching each item explicitly by number. It is summarized in Figure 2.2. In addition to the "navigational" methods and the remove method of Iterator (which it extends), the ListIterator interface provides operations for inserting new items or replacing items in a collection.
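The following sketch shows the extra operations in use on a java.util.LinkedList, where stepping with an iterator avoids repeated indexed fetches. The class name ListIteratorDemo and the methods tidy and reversed are my own, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

/** Illustrates set, remove, and backward traversal with a
 *  ListIterator.  Names are illustrative only. */
class ListIteratorDemo {
    /** Delete each 0 from L and replace each negative item with its
     *  absolute value, in a single pass and without using indices. */
    static void tidy(List<Integer> L) {
        for (ListIterator<Integer> it = L.listIterator(); it.hasNext(); ) {
            int x = it.next();
            if (x == 0) {
                it.remove();        // removes the item just returned
            } else if (x < 0) {
                it.set(-x);         // replaces the item just returned
            }
        }
    }

    /** A new list containing the items of L in reverse order. */
    static <T> List<T> reversed(List<T> L) {
        List<T> result = new LinkedList<T>();
        // Start with the cursor after the last item and walk backward.
        for (ListIterator<T> it = L.listIterator(L.size()); it.hasPrevious(); ) {
            result.add(it.previous());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> L = new LinkedList<Integer>(Arrays.asList(3, 0, -4, 0, 5));
        tidy(L);
        System.out.println(L);
        System.out.println(reversed(L));
    }
}
```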

2.2 The Java Collection Abstractions

The Java library (beginning with JDK 1.2) provides a hierarchy of interfaces representing various kinds of collection, plus a hierarchy of abstract classes to help programmers provide implementations of these interfaces, as well as a few actual ("concrete") implementations. These classes are all found in the package java.util. Figure 2.4 illustrates the hierarchy of classes and interfaces devoted to collections.

2.2.1 The Collection Interface

The Java library interface java.util.Collection, whose methods are summarized in Figures 2.5 and 2.6, is supposed to describe data structures that contain collections of values, where each value is a reference to some Object (or null). The term "collection" as opposed to "set" is appropriate here, because Collection is supposed to be able to describe multisets (bags) as well as ordinary mathematical sets.

    package java.util;

    /** Abstraction of a position in an ordered collection. At any
     *  given time, THIS represents a position (called its cursor)
     *  that is just after some number of items of type T (0 or more) of
     *  a particular collection, called the underlying collection. */
    public interface ListIterator<T> extends Iterator<T> {

        /* Exceptions: Methods that return items from the collection throw
         * NoSuchElementException if there is no appropriate item. Optional
         * methods throw UnsupportedOperationException if the method is not
         * supported. */

        /* Required methods: */

        /** True unless THIS is past the last item of the collection */
        boolean hasNext();

        /** True unless THIS is before the first item of the collection */
        boolean hasPrevious();

        /** Returns the item immediately after the cursor, and
         *  moves the current position to just after that item.
         *  Throws NoSuchElementException if there is no such item. */
        T next();

        /** Returns the item immediately before the cursor, and
         *  moves the current position to just before that item.
         *  Throws NoSuchElementException if there is no such item. */
        T previous();

        /** The number of items before the cursor */
        int nextIndex();

        /* nextIndex() - 1 */
        int previousIndex();

Figure 2.2: The java.util.ListIterator interface.

        /* Optional methods: */

        /** Insert item X into the underlying collection immediately before
         *  the cursor (X will be returned by previous()). */
        void add(T x);

        /** Remove the item returned by the most recent call to .next()
         *  or .previous(). There must not have been a more recent
         *  call to .add(). */
        void remove();

        /** Replace the item returned by the most recent call to .next()
         *  or .previous() with X in the underlying collection.
         *  There must not have been a more recent call to .add() or .remove(). */
        void set(T x);
    }

Figure 2.2, continued: Optional methods in the ListIterator class.

[Diagram. Types shown: Map, SortedMap (interfaces); AbstractMap (abstract class); HashMap, WeakHashMap, TreeMap (concrete classes).]

Figure 2.3: The Java library's Map-related types (from java.util). Ellipses represent interfaces; dashed boxes are abstract classes, and solid boxes are concrete (non-abstract) classes. Solid arrows indicate extends relationships, and dashed arrows indicate implements relationships. The abstract classes are for use by implementors wishing to add new collection classes; they provide default implementations of some methods. Clients apply new to the concrete classes to get instances, and (at least ideally), use the interfaces as formal parameter types so as to make their methods as widely applicable as possible.

[Diagram. Types shown: Collection, List, Set, SortedSet (interfaces); AbstractCollection, AbstractList, AbstractSequentialList, AbstractSet (abstract classes); ArrayList, Vector, LinkedList, Stack, HashSet, TreeSet (concrete classes).]

Figure 2.4: The Java library's Collection-related types (from java.util). See Figure 2.3 for the notation.

Since this is an interface, the documentation comments describing the operations need not be accurate; an inept or mischievous programmer can write a class that implements Collection in which the add method removes values. Nevertheless, any decent implementor will honor the comments, so that any method that accepts a Collection, C, as an argument can expect that, after executing C.add(x), the value x will be in C.

Not every kind of Collection needs to implement every method (specifically, not the optional methods in Figure 2.6), but may instead choose to raise the standard exception UnsupportedOperationException. See §2.5 for a further discussion of this particular design choice. Classes that implement only the required methods are essentially read-only collections; they can't be modified once they are created.
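One place to see this behavior in the standard library is the wrapper returned by Collections.unmodifiableCollection, whose optional methods all throw UnsupportedOperationException. The sketch below assumes nothing beyond that library method; the class ReadOnlyDemo and the helper refusesAdd are names of my own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

/** Probes whether a Collection supports the optional add method.
 *  Names are illustrative only. */
class ReadOnlyDemo {
    /** True iff C.add(X) is refused with UnsupportedOperationException.
     *  If the call is accepted, X is added to C as a side effect. */
    static <T> boolean refusesAdd(Collection<T> C, T x) {
        try {
            C.add(x);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Collection<String> base = new ArrayList<String>(Arrays.asList("a", "b"));
        Collection<String> readOnly = Collections.unmodifiableCollection(base);
        System.out.println(refusesAdd(readOnly, "c"));  // the wrapper rejects add
        System.out.println(refusesAdd(base, "c"));      // the ArrayList accepts it
        System.out.println(readOnly.contains("c"));     // the wrapper still reflects base
    }
}
```

Note that the wrapper is read-only only through its own methods: changes made directly to the underlying ArrayList remain visible through it.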

The comment concerning constructors in Figure 2.5 is, of course, merely a comment. Java interfaces do not have constructors, since they do not represent specific types of concrete object. Nevertheless, you ultimately need some constructor to create a Collection in the first place, and the purpose of the comment is to suggest some useful uniformity.

At this point, you may well be wondering of what possible use the Collection class might be, inasmuch as it is impossible to create one directly (it is an interface), and you are missing details about what its members do (for example, can a given Collection have two equal elements?). The point is that any function that you can write using just the information provided in the Collection interface will work for all implementations of Collection.

For example, here is a simple method to determine if the elements of one Collection are a subset of another:

    /** True iff C0 is a subset of C1, ignoring repetitions. */
    public static boolean subsetOf(Collection<?> C0, Collection<?> C1) {
        for (Object i : C0)
            if (!C1.contains(i))
                return false;
        // Note: equivalent to
        //   for (Iterator<?> iter = C0.iterator(); iter.hasNext(); ) {
        //       Object i = iter.next();
        //       ...
        return true;
    }

We have no idea what kinds of objects C0 and C1 are (they might be completely different implementations of Collection), in what order their iterators deliver elements, or whether they allow repetitions. This method relies solely on the properties described in the interface and its comments, and therefore always works (assuming, as always, that the programmers who write classes that implement Collection do their jobs). We don't have to rewrite it for each new kind of Collection we implement.
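To see that claim in action, the sketch below repeats subsetOf inside a hypothetical wrapper class, SubsetDemo, and applies it to several unrelated library implementations of Collection.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.TreeSet;

/** Applies subsetOf to different Collection implementations.
 *  The class name is illustrative only. */
class SubsetDemo {
    /** True iff C0 is a subset of C1, ignoring repetitions. */
    static boolean subsetOf(Collection<?> C0, Collection<?> C1) {
        for (Object i : C0) {
            if (!C1.contains(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A list with a repeated item against a hash-based set.
        System.out.println(
            subsetOf(Arrays.asList("a", "b", "a"),
                     new HashSet<String>(Arrays.asList("a", "b", "c"))));
        // A tree-based set against a linked list.
        System.out.println(
            subsetOf(new TreeSet<Integer>(Arrays.asList(1, 2, 9)),
                     new LinkedList<Integer>(Arrays.asList(1, 2, 3))));
    }
}
```

The same method body serves every pairing; only the cost of contains differs from one implementation to another.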

    package java.util;

    /** A collection of values, each an Object reference. */
    public interface Collection<T> extends Iterable<T> {

        /* Constructors. Classes that implement Collection should
         * have at least two constructors:
         *   CLASS(): Constructs an empty CLASS
         *   CLASS(C): Where C is any Collection, constructs a CLASS that
         *             contains the same elements as C. */

        /* Required methods: */

        /** The number of values in THIS. */
        int size();

        /** True iff size() == 0. */
        boolean isEmpty();

        /** True iff THIS contains X: that is, if for some z in
         *  THIS, either z and X are null, or z.equals(X). */
        boolean contains(Object x);

        /** True iff contains(x) for all elements x in C. */
        boolean containsAll(Collection<?> c);

        /** An iterator that yields all the elements of THIS, in some
         *  order. */
        Iterator<T> iterator();

        /** A new array containing all elements of THIS. */
        Object[] toArray();

        /** Assuming ANARRAY has dynamic type T[] (where T is some
         *  reference type), the result is an array of type T[] containing
         *  all elements of THIS. The result is ANARRAY itself, if all of
         *  these elements fit (leftover elements of ANARRAY are set to null).
         *  Otherwise, the result is a new array. It is an error if not
         *  all items in THIS are assignable to T. */
        <T> T[] toArray(T[] anArray);

Figure 2.5: The interface java.util.Collection, required members.

        // Interface java.util.Collection, continued.

        /* Optional methods. Any of these may do nothing except to
         * throw UnsupportedOperationException. */

        /** Cause X to be contained in THIS. Returns true if the Collection
         *  changes as a result. */
        boolean add(T x);

        /** Cause all members of C to be contained in THIS. Returns true
         *  if the object THIS changes as a result. */
        boolean addAll(Collection<? extends T> c);

        /** Remove all members of THIS. */
        void clear();

        /** Remove an Object .equal to X from THIS, if one exists,
         *  returning true iff the object THIS changes as a result. */
        boolean remove(Object X);

        /** Remove all elements, x, such that C.contains(x) (if any
         *  are present), returning true iff there were any
         *  objects removed. */
        boolean removeAll(Collection<?> c);

        /** Intersection: Remove all elements, x, such that C.contains(x)
         *  is false, returning true iff any items were removed. */
        boolean retainAll(Collection<?> c);
    }

Figure 2.6: Optional members of the interface java.util.Collection.

2.2.2 The Set Interface

In mathematics, a set is a collection of values in which there are no duplicates. This is the idea also for the interface java.util.Set. Unfortunately, this provision is not directly expressible in the form of a Java interface. In fact, as far as the Java compiler is concerned, the following serves as a perfectly good definition:

    package java.util;

    public interface Set<T> extends Collection<T> { }

The methods, that is, are all the same. The differences are all in the comments. The one-copy-of-each-element rule is reflected in more specific comments on several methods. The result is shown in Figure 2.7. In this definition, we also include the methods equals and hashCode. These methods are automatically part of any interface, because they are defined in the Java class java.lang.Object, but I included them here because their semantic specification (the comment) is more stringent than for the general Object. The idea, of course, is for equals to denote set equality. We'll return to hashCode in Chapter 7.

2.2.3 The List Interface

As the term is used in the Java libraries, a list is a sequence of items, possibly with repetitions. That is, it is a specialized kind of Collection, one in which there is a sequence to the elements (a first item, a last item, an nth item) and items may be repeated (it can't be considered a Set). As a result, it makes sense to extend the interface (relative to Collection) to include additional methods that make sense for well-ordered sequences. Figure 2.8 displays the interface.

A great deal of functionality here is wrapped up in the listIterator method and the object it returns. As you can see from the interface descriptions, you can insert, add, remove, or sequence through items in a List either by using methods in the List interface itself, or by using listIterator to create a list iterator with which you can do the same. The idea is that using the listIterator to process an entire list (or some part of it) will generally be faster than using get and other methods of List that use numeric indices to denote items of interest.

Views

The subList method is particularly interesting. A call such as L.subList(i, j) is supposed to produce another List (which will generally not be of the same type as L) consisting of the ith through the (j-1)th items of L. Furthermore, it is to do this by providing a view of this part of L; that is, an alternative way of accessing the same data containers. The idea is that modifying the sublist (using methods such as add, remove, and set) is supposed to modify the corresponding portion of L as well. For example, to remove all but the first k items in list L, you might write

    L.subList(k, L.size()).clear();

    package java.util;

    /** A Collection that contains at most one null item and in which no
     *  two distinct non-null items are .equal. The effects of modifying
     *  an item contained in a Set so as to change the value of .equal
     *  on it are undefined. */
    public interface Set<T> extends Collection<T> {

        /* Constructors. Classes that implement Set should
         * have at least two constructors:
         *   CLASS(): Constructs an empty CLASS
         *   CLASS(C): Where C is any Collection, constructs a CLASS that
         *             contains the same elements as C, with duplicates removed. */

        /** Cause X to be contained in THIS. Returns true iff X was
         *  not previously a member. */
        boolean add(T x);

        /** True iff S is a Set (instanceof Set) and is equal to THIS as a
         *  set (size() == S.size() and each item in S is contained in THIS). */
        boolean equals(Object S);

        /** The sum of the values of x.hashCode() for all x in THIS, with
         *  the hashCode of null taken to be 0. */
        int hashCode();

        /* Other methods inherited from Collection:
         * size, isEmpty, contains, containsAll, iterator, toArray,
         * addAll, clear, remove, removeAll, retainAll */
    }

Figure 2.7: The interface java.util.Set. Only methods with comments that are more specific than those of Collection are shown.

    package java.util;

    /** An ordered sequence of items, indexed by numbers 0 .. N-1,
     *  where N is the size() of the List. */
    public interface List<T> extends Collection<T> {

        /* Required methods: */

        /** The Kth element of THIS, where 0 <= K < size(). Throws
         *  IndexOutOfBoundsException if K is out of range. */
        T get(int k);

        /** The first value k such that get(k) is null if X == null,
         *  X.equals(get(k)), otherwise, or -1 if there is no such k. */
        int indexOf(Object x);

        /** The largest value k such that get(k) is null if X == null,
         *  X.equals(get(k)), otherwise, or -1 if there is no such k. */
        int lastIndexOf(Object x);

        /* NOTE: The methods iterator, listIterator, and subList produce
         * views that become invalid if THIS is structurally modified by
         * any other means (see text). */

        /** An iterator that yields all the elements of THIS, in proper
         *  index order. (NOTE: it is always valid for iterator() to
         *  return the same value as would listIterator, below.) */
        Iterator<T> iterator();

        /** A ListIterator that yields the elements K, K+1, ..., size()-1
         *  of THIS, in that order, where 0 <= K <= size(). Throws
         *  IndexOutOfBoundsException if K is out of range. */
        ListIterator<T> listIterator(int k);

        /** Same as listIterator(0) */
        ListIterator<T> listIterator();

        /** A view of THIS consisting of the elements L, L+1, ..., U-1,
         *  in that order. Throws IndexOutOfBoundsException unless
         *  0 <= L <= U <= size(). */
        List<T> subList(int L, int U);

        /* Other methods inherited from Collection:
         * add, addAll, size, isEmpty, contains, containsAll, remove, toArray */

Figure 2.8: Required methods of interface java.util.List, beyond those inherited from Collection.

        /* Optional methods: */

        /** Cause item K of THIS to be X, and items K+1, K+2, ... to contain
         *  the previous values of get(K), get(K+1), .... Throws
         *  IndexOutOfBoundsException unless 0 <= K <= size(). */
        void add(int k, T x);

        /** Same effect as add(size(), x); always returns true. */
        boolean add(T x);

        /** If the elements returned by C.iterator() are x0, x1, ..., in
         *  that order, then perform the equivalent of add(K, x0),
         *  add(K+1, x1), ..., returning true iff there was anything to
         *  insert. IndexOutOfBoundsException unless 0 <= K <= size(). */
        boolean addAll(int k, Collection<? extends T> c);

        /** Same as addAll(size(), c). */
        boolean addAll(Collection<? extends T> c);

        /** Remove item K, moving items K+1, ... down one index position,
         *  and returning the removed item. Throws
         *  IndexOutOfBoundsException if there is no item K. */
        T remove(int k);

        /** Remove the first item equal to X, if any, moving subsequent
         *  elements one index position lower. Return true iff anything
         *  was removed. */
        boolean remove(Object x);

        /** Replace get(K) with X, returning the initial (replaced) value of
         *  get(K). Throws IndexOutOfBoundsException if there is no item K. */
        T set(int k, T x);

        /* Other methods inherited from Collection: removeAll, retainAll */
    }

Figure 2.8, continued: Optional methods of interface java.util.List, beyond those inherited from Collection.

As a result, there are a lot of possible operations on List that don't have to be defined, because they fall out as a natural consequence of operations on sublists. There is no need for a version of remove that deletes items i through j of a list, or for a version of indexOf that starts searching at item k.
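Here is a minimal sketch of how such operations fall out of subList. The class SubListDemo and its three methods are hypothetical helpers written for this illustration; the List interface itself provides none of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Operations derived from subList views.  Illustrative names only. */
class SubListDemo {
    /** Remove all but the first K items of L, where 0 <= K <= L.size(). */
    static void truncate(List<?> L, int k) {
        L.subList(k, L.size()).clear();
    }

    /** Delete items I through J-1 of L, where 0 <= I <= J <= L.size(). */
    static void removeRange(List<?> L, int i, int j) {
        L.subList(i, j).clear();
    }

    /** The index in L of the first item equal to X at or after
     *  position K, or -1 if there is none. */
    static int indexOfFrom(List<?> L, Object x, int k) {
        int i = L.subList(k, L.size()).indexOf(x);
        return i < 0 ? -1 : i + k;   // translate back to an index in L
    }

    public static void main(String[] args) {
        List<Integer> L = new ArrayList<Integer>(Arrays.asList(5, 6, 7, 6, 8));
        System.out.println(indexOfFrom(L, 6, 2));
        removeRange(L, 1, 3);
        System.out.println(L);
        truncate(L, 1);
        System.out.println(L);
    }
}
```

Each helper modifies or queries L only through the view, so it works for any List whose subList supports the operation involved (clear is optional, so a fixed-size list would refuse it).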

Iterators (including ListIterators) provide another example of a view of Collections. Again, you can access or (sometimes) modify the current contents of a Collection through an iterator that its methods supply. For that matter, any Collection is itself a view (the "identity view," if you want).

Whenever there are two possible views of the same entity, there is a possibility that using one of them to modify the entity will interfere with the other view. It's not just that changes in one view are supposed to be seen in other views (as in the example of clearing a sublist, above), but straightforward and fast implementations of some views may malfunction when the entity being viewed is changed by other means. What is supposed to happen when you call remove on an iterator, but the item that is supposed to be removed (according to the specification of Iterator) has already been removed directly (by calling remove on the full Collection)? Or suppose you have a sublist containing items 2 through 4 of some full list. If the full list is cleared, and then 3 items are added to it, what is in the sublist view?

Because of these quandaries, the full specification of many view-producing methods (in the List interface, these are iterator, listIterator, and subList) has a provision that the view becomes invalid if the underlying List is structurally modified (that is, if items are added or removed) through some means other than that view. Thus, the result of L.iterator() becomes invalid if you perform L.add(...), or if you perform remove on some other Iterator or sublist produced from L. By contrast, we will also encounter views, such as those produced by the values method on Map (see Figure 2.12), that are supposed to remain valid even when the underlying object is structurally modified; it is an obligation on the implementors of new kinds of Map that they see that this is so.
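The interface says only that an invalidated view has no guaranteed behavior. As one concrete data point, the library's ArrayList documents its iterators as "fail-fast": on a best-effort basis, they throw ConcurrentModificationException when used after such a modification. The sketch below relies on that documented behavior of ArrayList in a single thread; the class and method names are my own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

/** Shows one library implementation's response to an invalidated
 *  iterator.  Names are illustrative only. */
class InvalidViewDemo {
    /** True iff an ArrayList iterator throws
     *  ConcurrentModificationException when it is used after the list
     *  has been modified directly. */
    static boolean iteratorFailsAfterDirectAdd() {
        List<String> L = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = L.iterator();
        it.next();
        L.add("d");     // structural modification by means other than IT
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** The list [a, b, c] with b removed through the iterator itself,
     *  which leaves that iterator valid. */
    static List<String> removeViaIterator() {
        List<String> L = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        for (Iterator<String> it = L.iterator(); it.hasNext(); ) {
            if (it.next().equals("b")) {
                it.remove();
            }
        }
        return L;
    }

    public static void main(String[] args) {
        System.out.println(iteratorFailsAfterDirectAdd());
        System.out.println(removeViaIterator());
    }
}
```

Other implementations of List are free to respond differently, so a program should not depend on the exception to detect its own mistakes.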

2.2.4 Ordered Sets

The List interface describes data types that describe sequences in which the programmer explicitly determines the order of items in the sequence by the order or place in which they are added to the sequence. By contrast, the SortedSet interface is intended to describe sequences in which the data determine the ordering according to some selected relation. Of course, this immediately raises a question: in Java, how do we represent this "selected relation" so that we can specify it? How do we make an ordering relation a parameter?

Orderings: the Comparable and Comparator Interfaces

There are various ways for functions to define an ordering over some set of objects. One way is to define boolean operations equals, less, greater, etc., with the obvious meanings. Libraries in the C family of languages (which includes Java) tend to combine all of these into a single function that returns an integer whose sign denotes the relation. For example, on the type String, x.compareTo("cat") returns an integer that is zero, negative, or positive, depending on whether x equals "cat", comes before it in lexicographic order, or comes after it. Thus, the ordering x ≤ y on Strings corresponds to the condition x.compareTo(y) <= 0.

    package java.lang;

    /** Describes types that have a natural ordering. */
    public interface Comparable<T> {

        /** Returns
         *    * a negative value iff THIS < Y under the natural ordering
         *    * a positive value iff THIS > Y;
         *    * 0 iff THIS and Y are "equivalent".
         *  Throws ClassCastException if THIS and Y are incomparable. */
        int compareTo(T y);
    }

Figure 2.9: The interface java.lang.Comparable, which marks classes that define a natural ordering.

For the purposes of the SortedSet interface, this ≤ (or ≥) ordering represented by compareTo (or compare, described below) is intended to be a total ordering. That is, it is supposed to be transitive (x ≤ y and y ≤ z implies x ≤ z), reflexive (x ≤ x), and antisymmetric (x ≤ y and y ≤ x implies that x equals y). Also, for all x and y in the function's domain, either x ≤ y or y ≤ x.

Some classes (such as String) define their own standard comparison operation. The standard way to do so is to implement the Comparable interface, shown in Figure 2.9. However, not all classes have such an ordering, nor is the natural ordering necessarily what you want in any given case. For example, one can sort Strings in dictionary order, reverse dictionary order, or case-insensitive order.

In the Scheme language, there is no particular problem: an ordering relation is just a function, and functions are perfectly good values in Scheme. To a certain extent, the same is true in languages like C and Fortran, where functions can be used as arguments to subprograms, but unlike Scheme, have access only to global variables (what are called static fields or class variables in Java). Java does not directly support functions as values, but it turns out that this is not a limitation. The Java standard library defines the Comparator interface (Figure 2.10) to represent things that may be used as ordering relations.

The methods provided by both of these interfaces are supposed to be proper total orderings. However, as usual, none of the conditions can actually be enforced by the Java language; they are just conventions imposed by comment. The programmer who violates these assumptions may cause all kinds of unexpected behavior. Likewise, nothing can keep you from defining a compare operation that is inconsistent with the .equals function. We say that compare (or compareTo) is consistent with equals if x.equals(y) iff C.compare(x, y) == 0. It's generally good practice to maintain this consistency in the absence of a good reason to the contrary.
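The sketch below passes several ordering relations to the same sorting routine. String.CASE_INSENSITIVE_ORDER and Collections.reverseOrder() are library facilities; the class OrderingDemo, the comparator BY_LENGTH, and the helper sorted are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;

/** Sorting the same Strings under different orderings.
 *  Names are illustrative only. */
class OrderingDemo {
    /** Orders Strings by length, breaking ties by their natural
     *  (lexicographic) order. */
    static final Comparator<String> BY_LENGTH = new Comparator<String>() {
        @Override
        public int compare(String x, String y) {
            if (x.length() != y.length()) {
                return x.length() < y.length() ? -1 : 1;
            }
            return x.compareTo(y);
        }
    };

    /** A sorted copy of WORDS, ordered according to ORDER. */
    static String[] sorted(String[] words, Comparator<String> order) {
        String[] result = words.clone();
        Arrays.sort(result, order);
        return result;
    }

    public static void main(String[] args) {
        String[] words = { "pear", "Apple", "fig", "apple" };
        System.out.println(Arrays.toString(sorted(words, BY_LENGTH)));
        System.out.println(Arrays.toString(
            sorted(words, Collections.<String>reverseOrder())));
        System.out.println(Arrays.toString(
            sorted(words, String.CASE_INSENSITIVE_ORDER)));
        // The case-insensitive ordering is not consistent with equals:
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("cat", "CAT") == 0);
        System.out.println("cat".equals("CAT"));
    }
}
```

BY_LENGTH is consistent with equals, since it returns 0 only when compareTo does. The case-insensitive ordering is not, which matters as soon as such a comparator is used to decide membership in a sorted set.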

    package java.util;

    /** An ordering relation on certain pairs of objects. */
    public interface Comparator<T> {

        /** Returns
         *    * a negative value iff X < Y according to THIS ordering;
         *    * a positive value iff X > Y;
         *    * 0 iff X and Y are "equivalent" under the order;
         *  Throws ClassCastException if X and Y are incomparable. */
        int compare(T x, T y);

        /** True if ORD is "same" ordering as THIS. It is legal to return
         *  false (conservatively) even if ORD does define the same ordering,
         *  but should return true only if ORD.compare(X, Y) and
         *  THIS.compare(X, Y) always have the same value. */
        boolean equals(Object ord);
    }

Figure 2.10: The interface java.util.Comparator, which represents ordering relations between Objects.

The SortedSet Interface

The SortedSet interface shown in Figure 2.11 extends the Set interface so that its iterator method delivers an Iterator that sequences through its contents "in order." It also provides additional methods that make sense only when there is such an order. There are intended to be two ways to define this ordering: either the programmer supplies a Comparator when constructing a SortedSet that defines the order, or else the contents of the set must be Comparable, and their natural order is used.

2.3 The Java Map Abstractions

The term map or mapping is used in computer science and elsewhere as a synonym for function in the mathematical sense: a correspondence between items in some set (the domain) and another set (the codomain) in which each item of the domain corresponds to (is mapped to by) a single item of the codomain [3].

It is typical among programmers to take a rather operational view, and say that a map-like data structure "looks up" a given key (domain value) to find the associated value (codomain value). However, from a mathematical point of view, a perfectly good interpretation is that a mapping is a set of pairs, (d, c), where d is a member of the domain, and c of the codomain.

[3] Any number of members of the domain, including zero, may correspond to a given member of the codomain. The subset of the codomain that is mapped to by some member of the domain is called the range of the mapping, or the image of the domain under the mapping.

    package java.util;

    public interface SortedSet<T> extends Set<T> {

        /* Constructors. Classes that implement SortedSet should define
         * at least the constructors
         *   CLASS(): An empty set ordered by natural order (compareTo).
         *   CLASS(CMP): An empty set ordered by the Comparator CMP.
         *   CLASS(C): A set containing the items in Collection C, in
         *             natural order.
         *   CLASS(S): A set containing a copy of SortedSet S, with the
         *             same order.
         */

        /** The comparator used by THIS, or null if natural ordering used. */
        Comparator<? super T> comparator();

        /** The first (smallest) item in THIS according to its ordering */
        T first();

        /** The last (largest) item in THIS according to its ordering */
        T last();

        /* NOTE: The methods headSet, tailSet, and subSet produce
         * views that become invalid if THIS is structurally modified by
         * any other means. */

        /** A view of all items in THIS that are strictly less than X. */
        SortedSet<T> headSet(T x);

        /** A view of all items in THIS that are >= X. */
        SortedSet<T> tailSet(T x);

        /** A view of all items, y, in THIS such that X0 <= y < X1. */
        SortedSet<T> subSet(T X0, T X1);
    }

Figure 2.11: The interface java.util.SortedSet

2.3.1 The Map Interface

The standard Java library uses the java.util.Map interface, displayed in Figures 2.12 and 2.13, to capture these notions of “mapping.” This interface provides both the view of a map as a look-up operation (with the method get) and the view of a map as a set of ordered pairs (with the method entrySet). This in turn requires some representation for “ordered pair,” provided here by the nested interface Map.Entry. A programmer who wishes to introduce a new kind of map therefore defines not only a concrete class to implement the Map interface, but another one to implement Map.Entry.
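The two views of a map can be seen side by side in a short sketch. It uses the library class java.util.HashMap as the concrete Map; the class MapViewsDemo and the methods total and bump are hypothetical names chosen for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class MapViewsDemo {
    /** Sum of all values in M, computed by treating M as a set of pairs. */
    static int total(Map<String, Integer> m) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : m.entrySet())
            sum += e.getValue();
        return sum;
    }

    /** Add DELTA to every value of M through its Map.Entry objects. */
    static void bump(Map<String, Integer> m, int delta) {
        for (Map.Entry<String, Integer> e : m.entrySet())
            e.setValue(e.getValue() + delta);  // the change is reflected in M
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<String, Integer>();
        ages.put("Ann", 30);
        ages.put("Tom", 41);
        System.out.println(ages.get("Ann"));   // look-up view
        bump(ages, 1);
        System.out.println(total(ages));       // set-of-pairs view
    }
}
```

Here get exercises the look-up view, while total and bump work entirely through entrySet and Map.Entry.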

2.3.2 The SortedMap Interface

An object that implements java.util.SortedMap is supposed to be a Map in which the set of keys is ordered. As you might expect, the operations are analogous to those of the interface SortedSet, as shown in Figure 2.15.
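A brief sketch of the ordered operations, using the library class java.util.TreeMap as the SortedMap and natural ordering on String keys. The class name SortedMapDemo and the sample data are made up for the example.

```java
import java.util.SortedMap;
import java.util.TreeMap;

class SortedMapDemo {
    /** A small sample map whose keys are kept in natural (String) order. */
    static SortedMap<String, Integer> sample() {
        SortedMap<String, Integer> m = new TreeMap<String, Integer>();
        m.put("Tom", 3);
        m.put("Ann", 1);
        m.put("John", 2);
        m.put("George", 4);
        return m;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> m = sample();
        System.out.println(m.firstKey() + " " + m.lastKey());
        System.out.println(m.headMap("John").keySet());    // keys < "John"
        System.out.println(m.tailMap("John").keySet());    // keys >= "John"
        System.out.println(m.subMap("B", "K").keySet());   // "B" <= key < "K"
    }
}
```

The bounds passed to headMap, tailMap, and subMap need not themselves be keys of the map, as the subMap call shows.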

2.4 An Example

Consider the problem of reading in a sequence of pairs of names, (n_i, m_i). We wish to create a list of all the first members, n_i, in alphabetical order, and, for each of them, a list of all names m_i that are paired with them, with each m_i appearing once, and listed in the order of first appearance. Thus, the input

    John Mary George Jeff Tom Bert George Paul John Peter
    Tom Jim George Paul Ann Cyril John Mary George Eric

might produce the output

    Ann: Cyril
    George: Jeff Paul Eric
    John: Mary Peter
    Tom: Bert Jim

We can use some kind of SortedMap to handle the n_i and for each, a List to handle the m_i. A possible method (taking a Reader as a source of input and a PrintWriter as a destination for output) is shown in Figure 2.16.

    package java.util;

    public interface Map<Key, Val> {

        /* Constructors: Classes that implement Map should
         * have at least two constructors:
         *   CLASS():  Constructs an empty CLASS
         *   CLASS(M): Where M is any Map, constructs a CLASS that
         *             denotes the same abstract mapping as M. */

        /* Required methods: */

        /** The number of keys in the domain of THIS map. */
        int size();

        /** True iff size() == 0 */
        boolean isEmpty();

        /* NOTE: The methods keySet, values, and entrySet produce views
         * that remain valid even if THIS is structurally modified. */

        /** The domain of THIS. */
        Set<Key> keySet();

        /** The range of THIS. */
        Collection<Val> values();

        /** A view of THIS as the set of all its (key,value) pairs. */
        Set<Map.Entry<Key, Val>> entrySet();

        /** True iff keySet().contains(KEY) */
        boolean containsKey(Object key);

        /** True iff values().contains(VAL). */
        boolean containsValue(Object val);

        /** The value mapped to by KEY, or null if KEY is not
         *  in the domain of THIS. */
        Val get(Object key);

        /** True iff M is a Map and THIS and M represent the same mapping. */
        boolean equals(Object M);

        /** The sum of the hashCode values of all members of entrySet(). */
        int hashCode();

        static interface Entry { ...   // See Figure 2.14
        }

Figure 2.12: Required methods of the interface java.util.Map.

        // Interface java.util.Map, continued

        /* Optional methods: */

        /** Set the domain of THIS to the empty set. */
        void clear();

        /** Cause get(KEY) to yield VAL, without disturbing other values. */
        Val put(Key key, Val val);

        /** Add all members of M.entrySet() to the entrySet() of THIS. */
        void putAll(Map<? extends Key, ? extends Val> M);

        /** Remove KEY from the domain of THIS. */
        Val remove(Object key);
    }

Figure 2.13: Optional methods of the interface java.util.Map.

    /** Represents a (key,value) pair from some Map. In general, an Entry
     *  is associated with a particular underlying Map value. Operations that
     *  change the Entry (specifically setValue) are reflected in that
     *  Map. Once an entry has been removed from a Map as a result of
     *  remove or clear, further operations on it may fail. */
    static interface Entry<Key, Val> {

        /** The key part of THIS. */
        Key getKey();

        /** The value part of THIS. */
        Val getValue();

        /** Cause getValue() to become VAL, returning the previous value. */
        Val setValue(Val val);

        /** True iff E is a Map.Entry and both represent the same (key,value)
         *  pair (i.e., keys are both null, or are .equal, and likewise for
         *  values). */
        boolean equals(Object e);

        /** An integer hash value that depends only on the hashCode values
         *  of getKey() and getValue() according to the formula:
         *      (getKey() == null ? 0 : getKey().hashCode())
         *      ^ (getValue() == null ? 0 : getValue().hashCode()) */
        int hashCode();
    }

Figure 2.14: The nested interface java.util.Map.Entry, which is nested within the java.util.Map interface.

    package java.util;

    public interface SortedMap<Key, Val> extends Map<Key, Val> {

        /* Constructors: Classes that implement SortedMap should
         * have at least four constructors:
         *   CLASS():    An empty map whose keys are ordered by natural order.
         *   CLASS(CMP): An empty map whose keys are ordered by the
         *               Comparator CMP.
         *   CLASS(M):   A map that is a copy of Map M, with keys ordered
         *               in natural order.
         *   CLASS(S):   A map containing a copy of SortedMap S, with
         *               keys obeying the same ordering.
         */

        /** The comparator used by THIS, or null if natural ordering used. */
        Comparator<? super Key> comparator();

        /** The first (smallest) key in the domain of THIS according to
         *  its ordering */
        Key firstKey();

        /** The last (largest) key in the domain of THIS according to
         *  its ordering */
        Key lastKey();

        /* NOTE: The methods headMap, tailMap, and subMap produce views
         * that remain valid even if THIS is structurally modified. */

        /** A view of THIS consisting of the restriction to all keys in the
         *  domain that are strictly less than KEY. */
        SortedMap<Key, Val> headMap(Key key);

        /** A view of THIS consisting of the restriction to all keys in the
         *  domain that are greater than or equal to KEY. */
        SortedMap<Key, Val> tailMap(Key key);

        /** A view of THIS restricted to the domain of all keys, y,
         *  such that KEY0 <= y < KEY1. */
        SortedMap<Key, Val> subMap(Key key0, Key key1);
    }

Figure 2.15: The interface java.util.SortedMap, showing methods not included in Map.

    import java.util.*;
    import java.io.*;

    class Example {

        /** Read (n_i, m_i) pairs from INP, and summarize all
         *  pairings for each n_i in order on OUT. */
        static void correlate(Reader inp, PrintWriter out)
            throws IOException
        {
            Scanner scn = new Scanner(inp);
            SortedMap<String, List<String>> associatesMap
                = new TreeMap<String, List<String>>();
            while (scn.hasNext()) {
                String n = scn.next();
                // Scanner.next never returns null, so a missing second
                // name is detected by checking hasNext instead.
                if (!scn.hasNext())
                    throw new IOException("bad input format");
                String m = scn.next();
                List<String> associates = associatesMap.get(n);
                if (associates == null) {
                    associates = new ArrayList<String>();
                    associatesMap.put(n, associates);
                }
                if (!associates.contains(m))
                    associates.add(m);
            }
            // Keys come out in sorted order; each list is in order of
            // first appearance.
            for (Map.Entry<String, List<String>> e : associatesMap.entrySet()) {
                out.format("%s:", e.getKey());
                for (String s : e.getValue())
                    out.format(" %s", s);
                out.println();
            }
        }
    }

Figure 2.16: An example using SortedMaps and Lists.
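To try the method on the sample input from the beginning of this section, one might wrap it as in the following sketch. The class CorrelateDemo and the helper run are hypothetical; correlate is a condensed copy of the method in Figure 2.16 so that the block stands alone, and it writes "\n" explicitly so that the result does not depend on the platform's line separator.

```java
import java.io.*;
import java.util.*;

class CorrelateDemo {
    /** Condensed copy of correlate from Figure 2.16. */
    static void correlate(Reader inp, PrintWriter out) throws IOException {
        Scanner scn = new Scanner(inp);
        SortedMap<String, List<String>> associatesMap =
            new TreeMap<String, List<String>>();
        while (scn.hasNext()) {
            String n = scn.next();
            if (!scn.hasNext())
                throw new IOException("bad input format");
            String m = scn.next();
            List<String> associates = associatesMap.get(n);
            if (associates == null) {
                associates = new ArrayList<String>();
                associatesMap.put(n, associates);
            }
            if (!associates.contains(m))
                associates.add(m);
        }
        for (Map.Entry<String, List<String>> e : associatesMap.entrySet()) {
            out.print(e.getKey() + ":");
            for (String s : e.getValue())
                out.print(" " + s);
            out.print("\n");
        }
        out.flush();
    }

    /** Hypothetical helper: the output of correlate on input TEXT. */
    static String run(String text) {
        StringWriter result = new StringWriter();
        try {
            correlate(new StringReader(text), new PrintWriter(result));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.print(
            run("John Mary George Jeff Tom Bert George Paul John Peter "
                + "Tom Jim George Paul Ann Cyril John Mary George Eric"));
    }
}
```

Running main should reproduce the four output lines shown earlier, with the first names in alphabetical order and each associate listed once.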

2.5 Managing Partial Implementations: Design Options

Throughout the Collection interfaces, you saw (in comments) that certain operations were “optional.” Their specifications gave the implementor leave to use

    throw new UnsupportedOperationException();

as the body of the operation. This provides an elegant enough way not to implement something, but it raises an important design issue. Throwing an exception is a dynamic action. In general, the compiler will have no comment about the fact that you have written a program that must inevitably throw such an exception; you will discover only upon testing the program that the implementation you have chosen for some data structure is not sufficient.

An alternative design would split the interfaces into smaller pieces, like this:

    public interface ConstantIterator<T> {
        Required methods of Iterator
    }

    public interface Iterator<T> extends ConstantIterator<T> {
        void remove();
    }

    public interface ConstantCollection<T> {
        Required methods of Collection
    }

    public interface Collection<T> extends ConstantCollection<T> {
        Optional methods of Collection
    }

    public interface ConstantSet<T> extends ConstantCollection<T> {
    }

    public interface Set<T> extends ConstantSet<T>, Collection<T> {
    }

    public interface ConstantList<T> extends ConstantCollection<T> {
        Required methods of List
    }

    public interface List<T> extends Collection<T>, ConstantList<T> {
        Optional methods of List
    }

    etc....

With such a design the compiler could catch attempts to call unsupported methods, so that you wouldn’t need testing to discover a gap in your implementation.

However, such a redesign would have its own costs. It’s not quite as simple as the listing above makes it appear. Consider, for example, the subList method in ConstantList. Presumably, this would most sensibly return a ConstantList, since if you are not allowed to alter a list, you cannot be allowed to alter one of its views. That means, however, that the type List would need two subList methods (with differing names), the one inherited from ConstantList, and a new one that produces a List as its result, which would allow modification. Similar considerations apply to the results of the iterator method; there would have to be two, one to return a ConstantIterator, and the other to return Iterator. Furthermore, this proposed redesign would not deal with an implementation of List that allowed one to add items, or clear all items, but not remove individual items. For that, you would either still need the UnsupportedOperationException or an even more complicated nest of classes.

Evidently, the Java designers decided to accept the cost of leaving some problems to be discovered by testing in order to simplify the design of their library. By contrast, the designers of the corresponding standard libraries in C++ opted to distinguish operations that work on any collections from those that work only on “mutable” collections. However, they did not design their library out of interfaces; it is awkward at best to introduce new kinds of collection or map in the C++ library.
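The dynamic nature of the optional operations is easy to observe. In the sketch below, the same call to add compiles for both lists, and only running it reveals that one of them rejects the operation. Collections.unmodifiableList is the library's read-only wrapper; the class OptionalOpDemo and the method addIsUnsupported are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OptionalOpDemo {
    /** True iff adding X to L is rejected with
     *  UnsupportedOperationException. */
    static boolean addIsUnsupported(List<String> L, String x) {
        try {
            L.add(x);        // compiles for every List, supported or not
            return false;
        } catch (UnsupportedOperationException e) {
            return true;     // discovered only when the call executes
        }
    }

    public static void main(String[] args) {
        List<String> plain = new ArrayList<String>();
        List<String> readOnly = Collections.unmodifiableList(plain);
        System.out.println(addIsUnsupported(plain, "a"));
        System.out.println(addIsUnsupported(readOnly, "b"));
    }
}
```

Under the split-interface design sketched above, the second call would presumably be a compile-time type error rather than a run-time exception.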

Chapter 3

Meeting a Specification

In Chapter 2, we saw and exercised a number of abstract interfaces, abstract in the sense that they describe the common features, the method signatures, of whole families of types without saying anything about the internals of those types and without providing a way to create any concrete objects that implement those interfaces.

In this chapter, we get a little closer to concrete representations, by showing one way to fill in the blanks. In one sense, these won’t be serious implementations; they will use “naive,” rather slow data structures. Our purpose, rather, will be one of exercising the machinery of object-oriented programming to illustrate ideas that you can apply elsewhere.

To help implementors who wish to introduce new implementations of the abstract interfaces we’ve covered, the Java standard library provides a parallel collection of abstract classes with some methods filled in. Once you’ve supplied a few key methods that remain unimplemented, you get all the rest “for free”. These partial implementation classes are not intended to be used directly in most ordinary programs, but only as implementation aids for library writers. Here is a list of these classes and the interfaces they partially implement (all from the package java.util):

    Abstract Class            Interfaces
    AbstractCollection        Collection
    AbstractSet               Collection, Set
    AbstractList              Collection, List
    AbstractSequentialList    Collection, List
    AbstractMap               Map

The idea of using partial implementations in this way is an instance of a design pattern called Template Method. The term design pattern in the context of object-oriented programming has come to mean “the core of a solution to a particular commonly occurring problem in program design.”¹ The Abstract... classes are used as templates for real implementations. Using method overriding, the implementor fills in a few methods; everything else in the template uses those methods.² In the sections to follow, we’ll look at how these classes are used and we’ll look at some of their internals for ideas about how to use some of the features of Java classes. But first, let’s have a quick look at the alternative.

¹ The seminal work on the topic is the excellent book by E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995. This group and their book are often referred to as “The Gang of Four.”

    import java.util.*;
    import java.lang.reflect.Array;

    public class ArrayCollection<T> implements Collection<T> {
        private T[] data;

        /** An empty Collection */
        public ArrayCollection() { data = (T[]) new Object[0]; }

        /** A Collection consisting of the elements of C */
        public ArrayCollection(Collection<? extends T> C) {
            data = C.toArray((T[]) new Object[C.size()]);
        }

        /** A Collection consisting of a view of the elements of A. */
        public ArrayCollection(T[] A) { data = A; }

        public int size() { return data.length; }

        public Iterator<T> iterator() {
            return new Iterator<T>() {
                private int k = 0;
                public boolean hasNext() { return k < size(); }
                public T next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    k += 1;
                    return data[k-1];
                }
                public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }

        public boolean isEmpty() { return size() == 0; }

        public boolean contains(Object x) {
            for (T y : this) {
                if (x == null && y == null
                    || x != null && x.equals(y))
                    return true;
            }
            return false;
        }

        public boolean containsAll(Collection<?> c) {
            for (Object x : c)
                if (!contains(x))
                    return false;
            return true;
        }

        public Object[] toArray() { return toArray(new Object[size()]); }

        public <E> E[] toArray(E[] anArray) {
            if (anArray.length < size()) {
                Class<?> typeOfElement = anArray.getClass().getComponentType();
                anArray = (E[]) Array.newInstance(typeOfElement, size());
            }
            // Copy from this collection's data into the result array.
            System.arraycopy(data, 0, anArray, 0, size());
            return anArray;
        }

        private boolean UNSUPPORTED() {
            throw new UnsupportedOperationException();
        }

        public boolean add(T x) { return UNSUPPORTED(); }
        public boolean addAll(Collection<? extends T> c) { return UNSUPPORTED(); }
        public void clear() { UNSUPPORTED(); }
        public boolean remove(Object x) { return UNSUPPORTED(); }
        public boolean removeAll(Collection<?> c) { return UNSUPPORTED(); }
        public boolean retainAll(Collection<?> c) { return UNSUPPORTED(); }
    }

Figure 3.1: Implementation of a new kind of read-only Collection “from scratch.” Since this is a read-only collection, the methods for modifying the collection all throw UnsupportedOperationException, the standard way to signal unsupported features.

3.1 Doing it from Scratch

For comparison, let’s suppose we wanted to introduce a simple implementation that simply allowed us to treat an ordinary array of Objects as a read-only Collection. The direct way to do so is shown in Figure 3.1. Following the specification of Collection, the first two constructors for ArrayCollection provide ways of forming an empty collection (not terribly useful, of course, since you can’t add to it) and a copy of an existing collection. The third constructor is specific to the new class, and provides a view of an array as a Collection; that is, the items in the Collection are the elements of the array, and the operations are those of the Collection interface. Next come the required methods. The Iterator that is returned by iterator has an anonymous type; no user of ArrayCollection can create an object of this type directly. Since this is a read-only collection, the optional methods (which modify collections) are all unsupported.

A Side Excursion on Reflection. The implementation of the second toArray method is rather interesting, in that it uses a fairly exotic feature of the Java language: reflection. This term refers to language features that allow one to manipulate constructs of a programming language within the language itself. In English, we employ reflection when we say something like “The word ‘hit’ is a verb.” The specification of toArray calls for us to produce an array of the same dynamic type as the argument. To do so, we first use the method getClass, which is defined on all Objects, to get a value of the built-in type java.lang.Class that stands for (reflects) the dynamic type of the anArray argument. One of the operations on type Class is getComponentType, which, for an array type, fetches the Class that reflects the type of its elements. Finally, the newInstance method (defined in the class java.lang.reflect.Array) creates a new array object, given its size and the Class for its component type.
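The three reflective steps can be isolated in a few lines. In this sketch, ReflectDemo and sameTypeArray are hypothetical names; getClass, getComponentType, and Array.newInstance are the library operations described above.

```java
import java.lang.reflect.Array;

class ReflectDemo {
    /** A new array of length N having the same dynamic element type
     *  as MODEL. */
    @SuppressWarnings("unchecked")
    static <E> E[] sameTypeArray(E[] model, int n) {
        Class<?> typeOfElement = model.getClass().getComponentType();
        return (E[]) Array.newInstance(typeOfElement, n);
    }

    public static void main(String[] args) {
        // Static type Object[], but dynamic type String[].
        Object[] model = new String[0];
        Object[] fresh = sameTypeArray(model, 3);
        System.out.println(fresh.getClass().getSimpleName()
                           + " " + fresh.length);
    }
}
```

Even though the variable model has static type Object[], the array created is a String[], because the element type is taken from the object at run time rather than from the declaration.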

3.2 The AbstractCollection Class

The implementation of ArrayCollection has an interesting feature: the methods starting with isEmpty make no mention of the private data of ArrayCollection, but instead rely entirely on the other (public) methods. As a result, they could be employed verbatim in the implementation of any Collection class. The standard Java library class AbstractCollection exploits this observation (see Figure 3.2). It is a partially implemented abstract class that new kinds of Collection can extend. At a bare minimum, an implementor can override just the definitions of iterator and size to get a read-only collection class. For example, Figure 3.3 shows an easier re-write of ArrayCollection. If, in addition, the programmer overrides the add method, then AbstractCollection will automatically provide addAll as well. Finally, if the iterator method returns an Iterator that supports the remove method, then AbstractCollection will automatically provide clear, remove, removeAll, and retainAll.

² While the name Template Method may be appropriate for this design pattern, I must admit that it has some unfortunate clashes with other uses of the terminology. First, the library defines whole classes, while the name of the pattern focuses on individual methods within that class. Second, the term template has another meaning within object-oriented programming; in C++ (and apparently in upcoming revisions of Java), it refers to a particular language construct.

In programs, the idea is to use AbstractCollection only in an extends clause. That is, it is simply a utility class for the benefit of implementors creating new kinds of Collection, and should not generally be used to specify the type of a formal parameter, local variable, or field. This, by the way, is the explanation for declaring the constructor for AbstractCollection to be protected; that keyword emphasizes the fact that only extensions of AbstractCollection will call it.

You’ve already seen five examples of how AbstractCollection might work in Figure 3.1: methods isEmpty, contains, containsAll, and the two toArray methods. Once you get the general idea, it is fairly easy to produce such method bodies. The exercises ask you to produce a few more.
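As a further illustration of the “bare minimum” case, here is a sketch of a read-only collection that supplies only size and iterator and inherits everything else from java.util.AbstractCollection. The class IntRange is a made-up example, not part of the library.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical read-only Collection of the integers LOW .. HIGH-1. */
class IntRange extends AbstractCollection<Integer> {
    private final int low, high;

    IntRange(int low, int high) {
        this.low = low;
        this.high = Math.max(low, high);
    }

    @Override
    public int size() { return high - low; }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int k = low;
            public boolean hasNext() { return k < high; }
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                k += 1;
                return k - 1;
            }
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) {
        IntRange r = new IntRange(3, 7);
        System.out.println(r);               // inherited toString
        System.out.println(r.contains(5));   // inherited contains
        System.out.println(r.isEmpty());     // inherited isEmpty
    }
}
```

Because add is not overridden and the iterator does not support remove, all the modifying operations of this class throw UnsupportedOperationException, as described above.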

3.3 Implementing the List Interface

The abstract classes AbstractList and AbstractSequentialList are specialized extensions of the class AbstractCollection provided by the Java standard library to help define classes that implement the List interface. Which you choose depends on the nature of the representation used for the concrete list type being implemented.

3.3.1 The AbstractList Class

The abstract implementation of List, AbstractList, sketched in Figure 3.4 is intended for representations that provide fast (generally constant time) random access to their elements; that is, representations with a fast implementation of get and (if supplied) remove. Figure 3.5 shows how listIterator works, as a partial illustration. There are a number of interesting techniques illustrated by this class.

Protected methods. The method removeRange is not part of the public interface. Since it is declared protected, it may only be called within other classes in the package java.util, and within the bodies of extensions of AbstractList. Such methods are implementation utilities for use in the class and its extensions. In the standard implementation of AbstractList, removeRange is used to implement clear (which might not sound too important until you remember that L.subList(k0,k1).clear() is how one removes an arbitrary section of a List).

    package java.util;

    public abstract class AbstractCollection<T> implements Collection<T> {
        /** The empty Collection. */
        protected AbstractCollection() { }

        /** Unimplemented methods that must be overridden in any
         *  non-abstract class that extends AbstractCollection */

        /** The number of values in THIS. */
        public abstract int size();

        /** An iterator that yields all the elements of THIS, in some
         *  order. If the remove operation is supported on this iterator,
         *  then remove, removeAll, clear, and retainAll on THIS will work. */
        public abstract Iterator<T> iterator();

        /** Override this default implementation to support adding */
        public boolean add(T x) {
            throw new UnsupportedOperationException();
        }

        Default, general-purpose implementations of
            contains(Object x), containsAll(Collection c), isEmpty(),
            toArray(), toArray(Object[] A),
            addAll(Collection c), clear(), remove(Object x),
            removeAll(Collection c), and retainAll(Collection c)

        /** A String representing THIS, consisting of a comma-separated
         *  list of the values in THIS, as returned by its iterator,
         *  surrounded by square brackets ([]). The elements are
         *  converted to Strings by String.valueOf (which returns "null"
         *  for the null pointer and otherwise calls the .toString() method). */
        public String toString() { ... }
    }

Figure 3.2: The abstract class java.util.AbstractCollection, which may be used to help implement new kinds of Collection. All the methods behave as specified in the specification of Collection. Implementors must fill in definitions of iterator and size, and may either override the other methods, or simply use their default implementations (not shown here).

    import java.util.*;

    /** A read-only Collection whose elements are those of an array. */
    public class ArrayCollection<T> extends AbstractCollection<T> {
        private T[] data;

        /** An empty Collection */
        public ArrayCollection() {
            data = (T[]) new Object[0];
        }

        /** A Collection consisting of the elements of C */
        public ArrayCollection(Collection<? extends T> C) {
            data = C.toArray((T[]) new Object[C.size()]);
        }

        /** A Collection consisting of a view of the elements of A. */
        public ArrayCollection(T[] A) {
            data = A;
        }

        public int size() { return data.length; }

        public Iterator<T> iterator() {
            return new Iterator<T>() {
                private int k = 0;
                public boolean hasNext() { return k < size(); }
                public T next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    k += 1;
                    return data[k-1];
                }
                public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }
    }

Figure 3.3: Re-implementation of ArrayCollection, using the default implementations from java.util.AbstractCollection.

The default implementation of removeRange simply calls remove(k) repeatedly and so is not particularly fast. But if a particular List representation allows some better strategy, then the programmer can override removeRange, getting better performance for clear (that’s why the default implementation of the method is not declared final, even though it is written to work for any representation of List).
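From the user's side, the removal idiom mentioned above looks like the following sketch, which uses the library class ArrayList (itself an extension of AbstractList). The names RangeRemovalDemo and removeSection are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RangeRemovalDemo {
    /** Remove items K0 .. K1-1 from L by clearing a sublist view. */
    static <T> void removeSection(List<T> L, int k0, int k1) {
        L.subList(k0, k1).clear();
    }

    public static void main(String[] args) {
        List<String> L =
            new ArrayList<String>(Arrays.asList("a", "b", "c", "d", "e"));
        removeSection(L, 1, 3);     // drops "b" and "c"
        System.out.println(L);
    }
}
```

For a list built on AbstractList, it is this call that ends up in removeRange, so an overriding removeRange is where a faster bulk removal would go.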

Checking for Invalidity. As we discussed in §2.2.3, the iterator, listIterator, and subList methods of the List interface produce views of a list that “become invalid” if the list is structurally changed. Implementors of List are under no particular obligation to do anything sensible for the programmer who ignores this provision; using an invalidated view may produce unpredictable results or throw an unexpected exception, as convenient. Nevertheless, the AbstractList class goes to some trouble to provide a way to explicitly check for this error, and immediately throw a specific exception, ConcurrentModificationException, if it happens. The field modCount (declared protected to indicate it is intended for List implementors, not users) keeps track of the number of structural modifications to an AbstractList. Every call to add or remove (either on the List directly or through a view) is supposed to increment it. Individual views can then keep track of the last value they “saw” for the modCount field of their underlying List and throw an exception if it seems to have changed in the interim. We’ll see an example in Figure 3.5.
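The effect of this check can be observed with the library's ArrayList, which inherits modCount from AbstractList. The sketch below is only an illustration; InvalidViewDemo and detectsModification are hypothetical names, and the library documents this checking as a best-effort aid to debugging rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class InvalidViewDemo {
    /** True iff an iterator over L reports an error after L is
     *  structurally changed behind its back. */
    static boolean detectsModification(List<String> L) {
        Iterator<String> it = L.iterator();
        L.add("intruder");     // structural change not made through IT
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> L = new ArrayList<String>(Arrays.asList("a", "b"));
        System.out.println(detectsModification(L));
    }
}
```

The iterator notices the change on its next use because the list's modification count no longer matches the value the iterator recorded when it was created.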

Helper Classes. The subList method of AbstractList (at least in Sun’s implementation) uses a non-public utility type java.util.SubList to produce its result. Because it is not public, java.util.SubList is in effect private to the java.util package, and is not an official part of the services provided by that package. However, being in the same package, it is allowed to access the non-public fields (modCount) and utility methods (removeRange) of AbstractList. This is an example of Java’s mechanism for allowing “trusted” classes (those in the same package) access to the internals of a class while excluding access from other “untrusted” classes.

3.3.2 The AbstractSequentialList Class

The second abstract implementation of List, AbstractSequentialList (Figure 3.6), is intended for use with representations where random access is relatively slow, but the next operation of the list iterator is still fast.

The reason for having a distinct class for this case becomes clear when you consider the implementations of get and the next methods of the iterators. If we assume a fast get method, then it is easy to implement the iterators to have fast next methods, as was shown in Figure 3.5. If get is slow (specifically, if the only way to retrieve item k of the list is to sequence through the preceding k items), then implementing next as in that figure would be disastrous; it would require Θ(N²) operations to iterate through an N-element list. So using get to implement the iterators is not always a good idea.
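The quadratic bound can be checked with a short calculation. Assume, as in the paragraph above, that get(k) must pass over the k preceding items, so that it costs about c(k+1) for some constant c (a symbol introduced here only for illustration). Iterating by calling get for each index then costs

```latex
\[
  \sum_{k=0}^{N-1} c\,(k+1) \;=\; c\,\frac{N(N+1)}{2} \;\in\; \Theta(N^2),
\]
```

whereas an iterator whose next operation takes constant time c performs the same traversal in cN, which is Θ(N).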

    package java.util;

    public abstract class AbstractList<T>
            extends AbstractCollection<T> implements List<T> {

        /** Construct an empty list. */
        protected AbstractList() { modCount = 0; }

        public abstract T get(int index);
        public abstract int size();

        public T set(int k, T x) { return UNSUPPORTED(); }
        public void add(int k, T x) { UNSUPPORTED(); }
        public T remove(int k) { return UNSUPPORTED(); }

        Default, general-purpose implementations of
            add(x), addAll, clear, equals, hashCode, indexOf, iterator,
            lastIndexOf, listIterator, set, and subList

        /** The number of times THIS has had elements added or removed. */
        protected int modCount;

        /** Remove from THIS all elements with indices in the
         *  range K0 .. K1-1. */
        protected void removeRange(int k0, int k1) {
            ListIterator<T> i = listIterator(k0);
            for (int k = k0; k < k1 && i.hasNext(); k += 1) {
                i.next(); i.remove();
            }
        }

        /** Always throws; declared to return T so that it can be used
         *  in return statements. */
        private T UNSUPPORTED()
        { throw new UnsupportedOperationException(); }
    }

Figure 3.4: The abstract class AbstractList, used as an implementation aid in writing implementations of List that are intended for random access. See Figure 3.5 for the inner class ListIteratorImpl.

    public ListIterator<T> listIterator(int k0) {
        return new ListIteratorImpl(k0);
    }

    /* An inner class: it uses the T of the enclosing AbstractList<T>. */
    private class ListIteratorImpl implements ListIterator<T> {
        ListIteratorImpl(int k0)
        { lastMod = modCount; k = k0; lastIndex = -1; }

        public boolean hasNext() { return k < size(); }
        public boolean hasPrevious() { return k > 0; }

        public T next() {
            check(0, size());
            lastIndex = k; k += 1; return get(lastIndex);
        }

        public T previous() {
            check(1, size() + 1);
            k -= 1; lastIndex = k; return get(k);
        }

        public int nextIndex() { return k; }
        public int previousIndex() { return k - 1; }

        public void add(T x) {
            check(); lastIndex = -1;
            k += 1; AbstractList.this.add(k - 1, x);
            lastMod = modCount;
        }

        public void remove() {
            checkLast(); AbstractList.this.remove(lastIndex);
            // If the removed item preceded the current position, the
            // items after it have shifted down by one.
            if (lastIndex < k) k -= 1;
            lastIndex = -1; lastMod = modCount;
        }

        public void set(T x) {
            checkLast(); AbstractList.this.set(lastIndex, x);
            lastMod = modCount;
        }

        // Class AbstractList.ListIteratorImpl, continued.

        /* Private definitions */

        /** modCount value expected for underlying list. */
        private int lastMod;

        /** Current position. */
        private int k;

        /** Index of last result returned by next or previous. */
        private int lastIndex;

        /** Check that there has been no concurrent modification. Throws
         *  appropriate exception if there has. */
        private void check() {
            if (modCount != lastMod)
                throw new ConcurrentModificationException();
        }

        /** Check that there has been no concurrent modification and that
         *  the current position, k, is in the range K0 <= k < K1. Throws
         *  appropriate exception if either test fails. */
        private void check(int k0, int k1) {
            check();
            if (k < k0 || k >= k1)
                throw new NoSuchElementException();
        }

        /** Check that there has been no concurrent modification and that
         *  there is a valid "last element returned by next or previous".
         *  Throws appropriate exception if either test fails. */
        private void checkLast() {
            check();
            if (lastIndex == -1) throw new IllegalStateException();
        }
    }

Figure 3.5: Part of a possible implementation of AbstractList, showing the inner class providing the value of listIterator, together with the private representation of the ListIterator.

On the other hand, if we were always to implement get(k) by iterating over the preceding k items (that is, use the Iterator’s methods to implement get rather than the reverse), we would obviously lose out on representations where get is fast.

    public abstract class AbstractSequentialList<T> extends AbstractList<T> {
        /** An empty list */
        protected AbstractSequentialList() { }

        public abstract int size();
        public abstract ListIterator<T> listIterator(int k);

        Default implementations of
            add(k,x), addAll(k,c), get, iterator, remove(k), set

        From AbstractList, inherited implementations of
            add(x), clear, equals, hashCode, indexOf, lastIndexOf,
            listIterator(), removeRange, subList

        From AbstractCollection, inherited implementations of
            addAll(), contains, containsAll, isEmpty, remove(), removeAll,
            retainAll, toArray, toString
    }

Figure 3.6: The class AbstractSequentialList.

3.4 The AbstractMap Class

The AbstractMap class shown in Figure 3.7 provides a template implementation for the Map interface. Overriding just the entrySet method to provide a read-only Set gives a read-only Map. Additionally overriding the put method gives an extendable Map, and implementing the remove method for entrySet().iterator() gives a fully modifiable Map.

    package java.util;

    public abstract class AbstractMap<Key, Val> implements Map<Key, Val> {
        /** An empty map. */
        protected AbstractMap() { }

        /** A view of THIS as the set of all its (key,value) pairs.
         *  If the resulting Set's iterator supports remove, then THIS
         *  map will support the remove and clear operations. */
        public abstract Set<Entry<Key, Val>> entrySet();

        /** Cause get(KEY) to yield VAL, without disturbing other values. */
        public Val put(Key key, Val val) {
            throw new UnsupportedOperationException();
        }

        Default implementations of
            clear, containsKey, containsValue, equals, get, hashCode,
            isEmpty, keySet, putAll, remove, size, values

        /** Print a String representation of THIS, in the form
         *      {KEY0=VALUE0, KEY1=VALUE1, ...}
         *  where keys and values are converted using String.valueOf(...). */
        public String toString() { ... }
    }

Figure 3.7: The class AbstractMap.

3.5 Performance Predictions

At the beginning of Chapter 2, I said that there are typically several implementations for a given interface. There are several possible reasons one might need more than one. First, special kinds of stored items, keys, or values might need special handling, either for speed, or because there are extra operations that make sense only for these special kinds of things. Second, some particular Collections or Maps may need a special implementation because they are part of something else, such as the subList or entrySet views. Third, one implementation may perform better than another in some circumstances, but not in others. Finally, there may be time-vs.-space tradeoffs between different implementations, and some applications may have particular need for a compact (space-efficient) representation.
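As one example of a special-purpose implementation of the first kind, the sketch below builds a read-only Map from array indices to array elements on top of AbstractMap (§3.4) by overriding only entrySet. The class ArrayIndexMap is hypothetical; AbstractSet and AbstractMap.SimpleImmutableEntry are library classes used here to keep the example short.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical read-only Map from each index of an array to the
 *  element stored at that index. */
class ArrayIndexMap<Val> extends AbstractMap<Integer, Val> {
    private final Val[] data;

    ArrayIndexMap(Val[] data) { this.data = data; }

    @Override
    public Set<Map.Entry<Integer, Val>> entrySet() {
        return new AbstractSet<Map.Entry<Integer, Val>>() {
            public int size() { return data.length; }
            public Iterator<Map.Entry<Integer, Val>> iterator() {
                return new Iterator<Map.Entry<Integer, Val>>() {
                    private int k = 0;
                    public boolean hasNext() { return k < data.length; }
                    public Map.Entry<Integer, Val> next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        k += 1;
                        return new AbstractMap.SimpleImmutableEntry<Integer, Val>(
                            k - 1, data[k - 1]);
                    }
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }

    public static void main(String[] args) {
        Map<Integer, String> m =
            new ArrayIndexMap<String>(new String[] { "zero", "one", "two" });
        System.out.println(m.get(1));     // inherited get
        System.out.println(m);            // inherited toString
    }
}
```

The inherited get here searches the entry set sequentially, so this representation trades speed for simplicity; a serious version would presumably override get to index the array directly.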

Wecan’tmakespecificclaimsabouttheperformanceofthe Abstract... family ofclassesdescribedherebecausetheyaretemplatesrather thancompleteimplementations.However,wecancharacterizetheirperformanceasafunctionofthe methodsthattheprogrammerfillsin.Here,let’sconsidertwoexamples:theimplementationtemplatesforthe List interface.

AbstractList. Thestrategybehind AbstractLististousethemethods size, get(k), add(k,x), set(k,x),and remove(k) suppliedbytheextendingtypeto implementeverythingelse.The listIterator methodreturnsa ListIterator thatuses get toimplement next and previous, add (on AbstractList)toimplementtheiterator’s add,and remove (on AbstractList)toimplementtheiterator’s remove.Thecostoftheadditionalbookkeepingdonebytheiterator consistsof incrementingordecrementinganintegervariable,andisthereforeasmallconstant. Thus,wecaneasilyrelatethecostsoftheiteratorfunctionsdirectlytothoseofthe suppliedmethods,asshowninthefollowingtable.Tosimplifymatters,wetakethe timecostsofthe size operationandthe equals operationonindividualitemsto beconstant.Thevaluesofthe“plugged-in”methodsaregivennamesoftheform Cα;thesizeof this (the List)is N ,andthesizeoftheotherCollectionargument

c,whichwe’llassumeisthesamekindof List,justtobeabletosay

62
(denoted
more)is M Costsof AbstractList Implementations List ListIterator MethodTimeasΘ(·) MethodTimeasΘ(·) add(k,X) Ca add Ca get(k) Cg remove Cr remove(k) Cr next Cg set Cs previous Cg remove(X) Cr + N · Cg set Cs indexOf N · Cg hasNext1 lastIndexOf N · Cg listIterator(k)1 iterator()1 subList1 size1 isEmpty1 contains N Cg containsAll(c) N · M · Cg addAll(c) M · Cg +(N + M ) · Ca toArray N · Cg

AbstractSequentialList. Let’s now compare the AbstractList implementation with AbstractSequentialList, which is intended to be used with representations that don’t have cheap get operations, but still do have cheap iterators. In this case, the get(k) operation is implemented by creating a ListIterator and performing a next operation on it k times. We get the following table:

Costs of AbstractSequentialList Implementations

    List                                     ListIterator
    Method            Time as Θ(·)           Method     Time as Θ(·)
    add(k,X)          Ca + k·Cn              add        Ca
    get(k)            k·Cn                   remove     Cr
    remove(k)         Cr + k·Cn              next       Cn
    set(k,X)          Cs + k·Cn              previous   Cp
    remove(X)         Cr + N·Cg              set        Cs
    indexOf           N·Cn                   hasNext    1
    lastIndexOf       N·Cp
    listIterator(k)   k·Cn
    iterator()        1
    subList           1
    size              1
    isEmpty           1
    contains          N·Cn
    containsAll(c)    N·M·Cn
    addAll(c)         M·Cn + N·Ca
    toArray           N·Cn

Exercises

3.1. Provide a body for the addAll method of AbstractCollection. It can assume that add will either throw UnsupportedOperationException if adding to the Collection is not supported, or will add an element.

3.2. Provide a body for the removeAll method of AbstractCollection. You may assume that, if removal is supported, the remove operation of the result of iterator works.

3.3. Provide a possible implementation of the java.util.SubList class. This utility class implements List and has one constructor:

    /** A view of items K0 through K1-1 of THELIST. Subsequent
     *  modifications to THIS also modify THELIST. Any structural
     *  modification to THELIST other than through THIS and any
     *  iterators or sublists derived from it renders THIS invalid.
     *  Operations on an invalid SubList throw
     *  ConcurrentModificationException. */
    SubList(AbstractList theList, int k0, int k1) { ... }

3.4. For class AbstractSequentialList, provide possible implementations of add(k,x) and get. Arrange the implementation so that performing a get of an element at or near either end of the list is fast.

3.5. Extend the AbstractMap class to produce a full implementation of Map. Try to leave as much as possible up to AbstractMap, implementing just what you need. For a representation, provide an implementation of Map.Entry and then use the existing implementation of Set provided by the Java library, HashSet. Call the resulting class SimpleMap.

3.6. In §3.5, we did not talk about the performance of operations on the Lists returned by the subList method. Provide these estimates for both AbstractList and AbstractSequentialList. For AbstractSequentialList, the time requirement for the get method on a sublist must depend on the first argument to subList (the starting point). Why is this? What change to the definition of ListIterator could make the performance of get (and other operations) on sublists independent of where in the original list the sublist comes from?


Chapter 4

Sequences and Their Implementations

In Chapters 2 and 3, we saw quite a bit of the List interface and some skeleton implementations. Here, we review the standard representations (concrete implementations) of this interface, and also look at interfaces and implementations of some specialized versions, the queue data structures.

4.1 Array Representation of the List Interface

Most “production” programming languages have some built-in data structure like the Java array: a random-access sequence of variables, indexed by integers. The array data structure has two main performance advantages. First, it is a compact (space-efficient) representation for a sequence of variables, typically taking little more space than the constituent variables themselves. Second, random access to any given variable in the sequence is a fast, constant-time operation. The chief disadvantage is that changing the size of the sequence represented is slow (in the worst case). Nevertheless, with a little care, we will see that the amortized cost of operations on array-backed lists is constant.

One of the built-in Java types is java.util.ArrayList, which has, in part, the implementation shown in Figure 4.1.¹ So, you can create a new ArrayList with its constructors, optionally choosing how much space it initially has. Then you can add items (with add), and the array holding these items will be expanded as needed.

What can we say about the cost of the operations on ArrayList? Obviously, get and size are Θ(1); the interesting one is add. As you can see, the capacity of an ArrayList is always positive. The implementation of add uses ensureCapacity whenever the array pointed to by data needs to expand, and it requests that the capacity of the ArrayList (the size of data) should double whenever it needs to expand. Let’s look into the reason behind this design choice. We’ll consider just the call A.add(x), which calls A.add(A.size(), x).

¹ The Java standard library type java.util.Vector provides essentially the same representation. It predates ArrayList and the introduction of Java’s standard Collection classes, and was “retrofitted” to meet the List interface. As a result, many existing Java programs tend to use Vector, and tend to use its (now redundant) pre-List operations, such as elementAt and removeAllElements (same as get and clear). The Vector class has another difference: it is synchronized, whereas ArrayList is not. See §10.1 for further discussion.

Suppose that we replace the lines

    if (count + 1 > data.length)
        ensureCapacity(data.length * 2);

with the alternative minimal expansion:

    ensureCapacity(count + 1);

In this case, once the initial capacity is exhausted, each add operation will expand the array data. Let’s measure the cost of add in number of assignments to array elements [why is that reasonable?]. In Java, we can take the cost of the expression new Object[K] to be Θ(K). This does not change when we add in the cost of copying elements from the previous array into it (using System.arraycopy). Therefore, the worst-case cost, $C_i$, of executing A.add(x) using our simple increment-size-by-1 scheme is

$$C_i(K, M) = \begin{cases} \alpha_1, & \text{if } M > K; \\ \alpha_2 (K + 1), & \text{if } M = K \end{cases}$$

where K is A.size(), M ≥ K is A’s current capacity (that is, A.data.length), and the $\alpha_i$ are some constants. So, conservatively speaking, we can just say that $C_i(K, M) \in O(K)$.

Now let’s consider the cost, $C_d$, of the implementation as shown in Figure 4.1, where we always double the capacity when it must be increased. This time, we get

$$C_d(K, M) = \begin{cases} \alpha_1, & \text{if } M > K; \\ \alpha_3 (2K + 1), & \text{if } M = K. \end{cases}$$

The worst-case cost looks identical; the factor of two increase in size simply changes the constant factor, and we can still use the same formula as before: $C_d(K, M) \in O(K)$.

So from this naïve worst-case asymptotic point of view, it would appear the two alternative strategies have identical costs. Yet we ought to be suspicious. Consider an entire series of add operations together, rather than just one. With the increment-size-by-1 strategy, we expand every time. By contrast, with the size-doubling strategy, we expand less and less often as the array grows, so that most calls to add complete in constant time. So is it really accurate to characterize them as taking time proportional to K?
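One way to feed that suspicion before doing any analysis is simply to count. The following sketch does not use ArrayList at all; it is a hypothetical model (the names GrowthDemo and copies are made up for illustration) that tallies how many existing elements would have to be copied during a series of appends under each expansion policy.

```java
/** A model of the two expansion policies that counts element copies.
 *  It simulates only the bookkeeping, not an actual array. */
class GrowthDemo {

    /** Total number of elements copied during N appends to an array of
     *  initial capacity CAP0.  When the array is full, it grows to
     *  count+1 if DOUBLING is false, or to twice its capacity otherwise. */
    static long copies(int n, int cap0, boolean doubling) {
        long copied = 0;
        int capacity = cap0;
        for (int count = 0; count < n; count += 1) {
            if (count + 1 > capacity) {
                copied += count;          // existing items move to the new array
                capacity = doubling ? capacity * 2 : count + 1;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        int n = 100000;
        System.out.println("grow by 1: " + copies(n, 8, false));
        System.out.println("doubling:  " + copies(n, 8, true));
    }
}
```

Under this model the doubling policy always copies fewer than 2N elements in total, while the grow-by-1 policy copies a number that grows with the square of N.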

Consider a series of N calls to A.add(x), starting with A an empty ArrayList with initial capacity of $M_0 < N$. With the increment-by-1 strategy, call number $M_0$ (numbering from 0), $M_0 + 1$, $M_0 + 2$, etc. will cost time proportional to $M_0 + 1$, $M_0 + 2$, ..., respectively. Therefore, the total cost, $C_{incr}$, of $N > M_0$ operations beginning with an empty list of initial size $M_0$ will be

$$\begin{aligned} C_{incr} &\in \Theta(M_0 + (M_0 + 1) + \cdots + N) \\ &= \Theta((N + M_0)\, N / 2) \\ &= \Theta(N^2) \end{aligned}$$

    package java.util;

    /** A List with a constant-time get operation. At any given time,
     *  an ArrayList has a capacity, which indicates the maximum
     *  size() for which the one-argument add operation (which adds to
     *  the end) will execute in constant time. The capacity expands
     *  automatically as needed so as to provide constant amortized
     *  time for the one-argument add operation. */
    public class ArrayList extends AbstractList implements Cloneable {

        /** An empty ArrayList whose capacity is initially at least
         *  CAPACITY. */
        public ArrayList(int capacity) {
            data = new Object[Math.max(capacity, 2)];
            count = 0;
        }

        public ArrayList() { this(8); }

        public ArrayList(Collection c) {
            this(c.size());
            addAll(c);
        }

        public int size() { return count; }

        public Object get(int k) {
            check(k, count);
            return data[k];
        }

        public Object remove(int k) {
            Object old = data[k];
            removeRange(k, k + 1);
            return old;
        }

        public Object set(int k, Object x) {
            check(k, count);
            Object old = data[k];
            data[k] = x;
            return old;
        }

        public void add(int k, Object obj) {
            check(k, count + 1);
            if (count + 1 > data.length)
                ensureCapacity(data.length * 2);
            System.arraycopy(data, k, data, k + 1, count - k);
            data[k] = obj;
            count += 1;
        }

        /* Cause the capacity of this ArrayList to be at least N. */
        public void ensureCapacity(int N) {
            if (N <= data.length)
                return;
            Object[] newData = new Object[N];
            System.arraycopy(data, 0, newData, 0, count);
            data = newData;
        }

        /** A copy of THIS (overrides method in Object). */
        public Object clone() { return new ArrayList(this); }

        protected void removeRange(int k0, int k1) {
            if (k0 >= k1)
                return;
            check(k0, count);
            check(k1, count + 1);
            System.arraycopy(data, k1, data, k0, count - k1);
            count -= k1 - k0;
        }

        private void check(int k, int limit) {
            if (k < 0 || k >= limit)
                throw new IndexOutOfBoundsException();
        }

        private int count;        /* Current size */
        private Object[] data;    /* Current contents */
    }

Figure 4.1: Implementation of the class java.util.ArrayList.

for fixed $M_0$. The cost, in other words, is quadratic in the number of items added.

Now consider the doubling strategy. We’ll analyze it using the potential method from §1.4 to show that we can choose a constant value for $a_i$, the amortized cost of the ith operation, by finding a suitable potential Φ ≥ 0 so that (from Equation 1.1),

$$a_i = c_i + \Phi_{i+1} - \Phi_i,$$

where $c_i$ denotes the actual cost of the ith addition. In this case, a suitable potential is

$$\Phi_i = 4i - 2S_i + 2S_0,$$

where $S_i$ is the capacity (the size of the array) before the ith operation. After the first doubling, we always have $2i \ge S_i$, so that $\Phi_i \ge 0$ for all i.

We can take the number of items in the array before the ith addition as i, assuming as usual that we number additions from 0. The actual cost, $c_i$, of the ith addition is either 1 time unit, if $i < S_i$, or else (when $i = S_i$) the cost of allocating a doubled array, copying all the existing items, and then adding one more, which we can take as $2S_i$ time units (with suitable choice of “time unit,” of course). When $i < S_i$, therefore, the capacity does not change ($S_{i+1} = S_i$) and we have

$$\begin{aligned} a_i &= c_i + \Phi_{i+1} - \Phi_i \\ &= 1 + 4(i+1) - 2S_{i+1} + 2S_0 - (4i - 2S_i + 2S_0) \\ &= 1 + 4(i+1) - 2S_i + 2S_0 - (4i - 2S_i + 2S_0) \\ &= 5, \end{aligned}$$

and when $i = S_i$, the capacity doubles ($S_{i+1} = 2S_i$) and we have

$$\begin{aligned} a_i &= c_i + \Phi_{i+1} - \Phi_i \\ &= 2S_i + 4(i+1) - 2S_{i+1} + 2S_0 - (4i - 2S_i + 2S_0) \\ &= 2S_i + 4(i+1) - 4S_i + 2S_0 - (4i - 2S_i + 2S_0) \\ &= 4. \end{aligned}$$

So $a_i \le 5$, showing that the amortized cost of adding to the end of an array under the doubling strategy is indeed constant.

4.2 Linking in Sequential Structures

The term linked structure refers generally to composite, dynamically growable data structures comprising small objects for the individual members connected together by means of pointers (links).

4.2.1 Singly Linked Lists

The Scheme language has one pervasive compound data structure, the pair or cons cell, which can serve to represent just about any data structure one can imagine. Perhaps its most common use is in representing lists of things, as illustrated in Figure 4.2a. Each pair consists of two containers, one of which is used to store a (pointer to) the data item, and the second a pointer to the next pair in the list, or a null pointer at the end. In Java, a rough equivalent to the pair is a class such as the following:

    class Entry {
        Entry(Object head, Entry next) {
            this.head = head; this.next = next;
        }
        Object head;
        Entry next;
    }

We call lists formed from such pairs singly linked, because each pair carries one pointer (link) to another pair.

Changing the structure of a linked list (its set of containers) involves what is colloquially known as “pointer swinging.” Figure 4.2 reviews the basic operations for insertion and deletion on pairs used as lists.

4.2.2 Sentinels

As Figure 4.2 illustrates, the procedure for inserting or deleting at the beginning of a linked list differs from the procedure for middle items, because it is the variable L, rather than a next field, that gets changed:

    L = L.next;                    // Remove first item of linked list pointed to by L
    L = new Entry("aardvark", L);  // Add item to front of list.

We can avoid this recourse to special cases for the beginning of a list by employing a clever trick known as a sentinel node.

The idea behind a sentinel is to use an extra object, one that does not carry one of the items of the collection being stored, to avoid having any special cases. Figure 4.3 illustrates the resulting representation.

Use of sentinels changes some tests. For example, testing to see if linked list L is empty without a sentinel is simply a matter of comparing L to null, whereas the test for a list with a sentinel compares L.next to null.
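To see how the sentinel removes the special case, consider the following minimal sketch. The class SentinelList and its helper before are hypothetical names chosen for this illustration; only the Entry class follows the text. Both add and remove work by finding the entry just ahead of the affected position, and for position 0 that entry is the sentinel itself.

```java
/** A sketch of a singly linked list that uses a sentinel node. */
class SentinelList {
    private static class Entry {
        Object head;
        Entry next;
        Entry(Object head, Entry next) { this.head = head; this.next = next; }
    }

    /** Carries no item; its next field points to item 0, if any. */
    private final Entry sentinel = new Entry(null, null);

    boolean isEmpty() { return sentinel.next == null; }

    /** Insert X so that it becomes item K. */
    void add(int k, Object x) {
        Entry p = before(k);
        p.next = new Entry(x, p.next);
    }

    /** Remove and return item K. */
    Object remove(int k) {
        Entry p = before(k);
        if (p.next == null) throw new IndexOutOfBoundsException();
        Object old = p.next.head;
        p.next = p.next.next;
        return old;
    }

    /** The entry whose next field refers to item K (the sentinel if K is 0). */
    private Entry before(int k) {
        if (k < 0) throw new IndexOutOfBoundsException();
        Entry p = sentinel;
        for (int i = 0; i < k; i += 1) {
            if (p.next == null) throw new IndexOutOfBoundsException();
            p = p.next;
        }
        return p;
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (Entry p = sentinel.next; p != null; p = p.next) {
            sb.append(p.head);
            if (p.next != null) sb.append(", ");
        }
        return sb.append("]").toString();
    }
}
```

Neither add nor remove ever assigns to the variable that holds the list, so the same two statements serve for the front, the middle, and the end.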

4.2.3 Doubly Linked Lists

Singly linked lists are simple to create and manipulate, but they are at a disadvantage for fully implementing the Java List interface. One obvious problem is that the previous operation on list iterators has no fast implementation on singly linked structures. One is pretty much forced to return to the start of the list and follow an appropriate number of next fields to almost but not quite return to the current position, requiring time proportional to the size of the list. A somewhat more subtle annoyance comes in the implementation of the remove operation on the list iterator. To remove an item p from a singly linked list, you need a pointer to the item before p, because it is the next field of that object that must be modified.

[Figure 4.2 diagrams not reproduced. L points to a chain of entries α, β, γ holding ax, bat, and syzygy. Panels: (a) Original list; (b) After removing bat with L.next = L.next.next; (c) After adding balance with L.next = new Entry("balance", L.next); (d) After deleting ax with L = L.next.]

Figure 4.2: Common operations on the singly linked list representation. Starting from an initial list, we remove object β, and then insert in its place a new one. Next we remove the first item in the list. The objects removed become “garbage,” and are no longer reachable via L.

Both problems are easily solved by adding a predecessor link to the objects in our list structure, making both the items before and after a given item in the list equally accessible. As with singly linked structures, the use of front and end sentinels further simplifies operations by removing the special cases of adding to or removing from the beginning or end of a list. A further device in the case of doubly linked structures is to make the entire list circular, that is, to use one sentinel as both the front and back of the list. This cute trick saves the small amount of space otherwise wasted by the prev link of the front sentinel and the next link of the last. Figure 4.4 illustrates the resulting representations and the principal operations upon it.

[Figure 4.3 diagrams not reproduced. Panels: (a) Three-item list L holding ax, bat, and syzygy behind a sentinel; (b) Empty list E consisting of the sentinel alone.]

Figure 4.3: Singly linked lists employing sentinel nodes. The sentinels contain no useful data. They allow all items in the list to be treated identically, with no special case for the first node. The sentinel node is typically never removed or replaced while the list is in use.

[Figure 4.4 diagrams not reproduced. Panels: (a) Initial list holding ax, bat, and syzygy; (b) After deleting item γ (bat); (c) After adding item ε (balance); (d) After removing all items, and removing garbage.]

Figure 4.4: Doubly linked lists employing a single sentinel node to mark both front and back. Shaded item is garbage.

4.3 Linked Implementation of the List Interface

The doubly linked structure supports everything we need to do to implement the Java List interface. The type of the links (LinkedList.Entry) is private to the implementation. A LinkedList object itself contains just a pointer to the list’s sentinel (which never changes, once created) and an integer variable containing the number of items in the list. Technically, of course, the latter is redundant, since one can always count the number of items in the list, but keeping this variable allows size to be a constant-time operation. Figure 4.5 illustrates the three main data structures involved: LinkedList, LinkedList.Entry, and the iterator LinkedList.LinkedIter.

[Figure 4.5 diagram not reproduced. It shows the LinkedList L (size 3) whose sentinel links the entries axolotl, kludge, and xerophyte, and the iterator I with its fields LinkedList.this, lastReturned, here, and nextIndex (1).]

Data structure after executing:

    L = new LinkedList<String>();
    L.add("axolotl");
    L.add("kludge");
    L.add("xerophyte");
    I = L.listIterator();
    I.next();

Figure 4.5: A typical LinkedList (pointed to by L) and a list iterator (pointed to by I). Since the iterator belongs to an inner class of LinkedList, it contains an implicit private pointer (LinkedList.this) that points back to the LinkedList object from which it was created.

    package java.util;

    public class LinkedList<T> extends AbstractSequentialList<T>
            implements Cloneable {

        public LinkedList() {
            sentinel = new Entry<T>();
            size = 0;
        }

        public LinkedList(Collection<? extends T> c) {
            this();
            addAll(c);
        }

        public ListIterator<T> listIterator(int k) {
            if (k < 0 || k > size)
                throw new IndexOutOfBoundsException();
            return new LinkedIter(k);
        }

        public Object clone() {
            return new LinkedList<T>(this);
        }

        public int size() { return size; }

        private static class Entry<E> {
            E data;
            Entry<E> prev, next;
            Entry(E data, Entry<E> prev, Entry<E> next) {
                this.data = data; this.prev = prev; this.next = next;
            }
            /** A sentinel: linked to itself, carrying no data. */
            Entry() { data = null; prev = next = this; }
        }

        private class LinkedIter implements ListIterator<T> {
            See Figure 4.7.
        }

        private final Entry<T> sentinel;
        private int size;
    }

Figure 4.6: The class LinkedList.

    package java.util;

    public class LinkedList<T> extends AbstractSequentialList<T>
            implements Cloneable {

        private class LinkedIter implements ListIterator<T> {
            Entry<T> here, lastReturned;
            int nextIndex;

            /** An iterator whose initial next element is item
             *  K of the containing LinkedList. */
            LinkedIter(int k) {
                if (k > size - k) {   // Closer to the end
                    here = sentinel; nextIndex = size;
                    while (k < nextIndex) previous();
                } else {
                    here = sentinel.next; nextIndex = 0;
                    while (k > nextIndex) next();
                }
                lastReturned = null;
            }

            public boolean hasNext() { return here != sentinel; }
            public boolean hasPrevious() { return here.prev != sentinel; }

            public T next() {
                check(here);
                lastReturned = here;
                here = here.next; nextIndex += 1;
                return lastReturned.data;
            }

            public T previous() {
                check(here.prev);
                lastReturned = here = here.prev;
                nextIndex -= 1;
                return lastReturned.data;
            }

            public void add(T x) {
                lastReturned = null;
                Entry<T> ent = new Entry<T>(x, here.prev, here);
                nextIndex += 1;
                here.prev.next = here.prev = ent;
                size += 1;
            }

            public void set(T x) {
                checkReturned();
                lastReturned.data = x;
            }

            public void remove() {
                checkReturned();
                lastReturned.prev.next = lastReturned.next;
                lastReturned.next.prev = lastReturned.prev;
                if (lastReturned == here)
                    here = lastReturned.next;
                else
                    nextIndex -= 1;
                lastReturned = null;
                size -= 1;
            }

            public int nextIndex() { return nextIndex; }
            public int previousIndex() { return nextIndex - 1; }

            void check(Object p) {
                if (p == sentinel) throw new NoSuchElementException();
            }

            void checkReturned() {
                if (lastReturned == null) throw new IllegalStateException();
            }
        }
        . . .
    }

Figure 4.7: The inner class LinkedList.LinkedIter. This version does not check for concurrent modification of the underlying List.

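[Figure 4.8 diagrams not reproduced. Panels: (a) Stack, with push and pop at one end; (b) (FIFO) Queue, with add at one end and removeFirst at the other; (c) Deque, with addFirst and removeFirst at one end and add and removeLast at the other.]

Figure 4.8: Three varieties of queues, sequential data structures manipulated only at their ends.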

4.4 Specialized Lists

A common use for lists is in representing sequences of items that are manipulated and examined only at one or both ends. Of these, the most familiar are

• The stack (or LIFO queue, for “Last-In First Out”), which supports only adding and deleting items at one end;

• The queue (or FIFO queue, for “First-In First Out”), which supports adding at one end and deletion from the other; and

• The deque or double-ended queue, which supports addition and deletion from either end.

whose operations are illustrated in Figure 4.8.
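The three disciplines are easy to try out with the java.util.Deque interface and its java.util.ArrayDeque implementation from the current Java library, which can be restricted by convention to any of them. This is only an illustrative sketch: the class EndsDemo and its methods are hypothetical, and the library's ArrayDeque is distinct from the interfaces and classes developed later in this chapter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Uses one library class under the three access disciplines. */
class EndsDemo {

    /** Push A, B, C and pop them all: last in, first out. */
    static String lifo() {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("A"); stack.push("B"); stack.push("C");
        return stack.pop() + stack.pop() + stack.pop();
    }

    /** Add A, B, C at the back and remove from the front: first in, first out. */
    static String fifo() {
        Deque<String> queue = new ArrayDeque<>();
        queue.addLast("A"); queue.addLast("B"); queue.addLast("C");
        return queue.removeFirst() + queue.removeFirst() + queue.removeFirst();
    }

    /** Add at both ends, then remove from both ends. */
    static String bothEnds() {
        Deque<String> deque = new ArrayDeque<>();
        deque.addLast("B"); deque.addFirst("A"); deque.addLast("C");
        return deque.removeLast() + deque.removeFirst() + deque.removeFirst();
    }

    public static void main(String[] args) {
        System.out.println(lifo() + " " + fifo() + " " + bothEnds());
    }
}
```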

4.4.1 Stacks

Java provides a type java.util.Stack as an extension of the type java.util.Vector (itself an older variation of ArrayList):

    package java.util;

    public class Stack<T> extends Vector<T> {
        /** An empty Stack. */
        public Stack() { }

        public boolean empty() { return isEmpty(); }
        public T peek() { check(); return get(size() - 1); }
        public T pop() { check(); return remove(size() - 1); }
        public T push(T x) { add(x); return x; }
        public int search(Object x) {
            int r = lastIndexOf(x);
            return r == -1 ? -1 : size() - r;
        }
        private void check() {
            if (empty()) throw new EmptyStackException();
        }
    }

However, because it is one of the older types in the library, java.util.Stack does not fit in as well as it might. In particular, there is no separate interface describing “stackness.” Instead there is just the Stack class, inextricably combining an interface with an implementation. Figure 4.9 shows how a Stack interface (in the Java sense) might be designed.

    package ucb.util;
    import java.util.*;

    /** A LIFO queue of T’s. */
    public interface Stack<T> {
        /** True iff THIS is empty. */
        boolean isEmpty();

        /** Number of items in the stack. */
        int size();

        /** The last item inserted in the stack and not yet removed. */
        T top();

        /** Remove and return the top item. */
        T pop();

        /** Add X as the last item of THIS. */
        void push(T x);

        /** The index of the most-recently inserted item that is .equal to
         *  X, or -1 if it is not present. Item 0 is the least recently
         *  pushed. */
        int lastIndexOf(Object x);
    }

Figure 4.9: A possible definition of the abstract type Stack as a Java interface. This is not part of the Java library, but its method names are more traditional than those of Java’s official java.util.Stack type. It is designed, furthermore, to fit in with implementations of the List interface.

Stacks have numerous uses, in part because of their close relationship to recursion and backtracking search. Consider, for example, a simple-minded strategy for finding an exit to a maze. We assume some Maze class, and a Position class that represents a position in the maze. From any position in the maze, you may be able to move in up to four different directions (represented by numbers 0–3, standing perhaps for the compass points north, east, south, and west). The idea is that we leave bread crumbs to mark each position we’ve already visited. From each position we visit, we try stepping in each of the possible directions and continuing from that point. If we find that we have already visited a position, or we run out of directions to go from some position, we backtrack to the last position we visited before that and continue with the directions we haven’t tried yet from that previous position, stopping when we get to an exit (see Figure 4.10). As a program (using method names that I hope are suggestive), we can write this in two equivalent ways. First, recursively:

    /** Find an exit from M starting from PLACE. */
    void findExit(Maze M, Position place) {
        if (M.isAnExit(place))
            M.exitAt(place);
        if (!M.isMarkedAsVisited(place)) {
            M.markAsVisited(place);
            for (int dir = 0; dir < 4; dir += 1)
                if (M.isLegalToMove(place, dir))
                    findExit(M, place.move(dir));
        }
    }

Second, an iterative version:

    import ucb.util.Stack;
    import ucb.util.ArrayStack;

    /** Find an exit from M starting from PLACE0. */
    void findExit(Maze M, Position place0) {
        Stack<Position> toDo = new ArrayStack<Position>();
        toDo.push(place0);
        while (!toDo.isEmpty()) {
            Position place = toDo.pop();
            if (M.isAnExit(place))
                M.exitAt(place);
            if (!M.isMarkedAsVisited(place)) {
                M.markAsVisited(place);
                for (int dir = 3; dir >= 0; dir -= 1)
                    if (M.isLegalToMove(place, dir))
                        toDo.push(place.move(dir));
            }
        }
    }

where ArrayStack is an implementation of ucb.util.Stack (see §4.5).

The idea behind the iterative version of findExit is that the toDo stack keeps track of the values of place that appear as arguments to findExit in the recursive version. Both versions visit the same positions in the same order (which is why the loop runs backwards in the iterative version). In effect, the toDo stack plays the role of the call stack in the recursive version. Indeed, typical implementations of recursive procedures also use a stack for this purpose, although it is invisible to the programmer.
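The claim that both versions visit the same positions in the same order can be checked on a small example. The sketch below is not the Maze and Position classes assumed in the text; it substitutes a hypothetical maze stored as an array of strings, positions given as row and column numbers, and java.util.ArrayDeque as the stack. For simplicity it ignores exits and just records the order in which positions are first visited.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class MazeSearchDemo {
    /** Hypothetical maze: '#' is a wall, '.' is open floor. */
    static final String[] MAZE = {
        "..#.",
        ".#..",
        "....",
    };
    /** Row and column offsets for directions 0 (up), 1 (right),
     *  2 (down), and 3 (left). */
    static final int[] DROW = { -1, 0, 1, 0 };
    static final int[] DCOL = { 0, 1, 0, -1 };

    static boolean isLegalToMove(int row, int col, int dir) {
        int r = row + DROW[dir], c = col + DCOL[dir];
        return r >= 0 && r < MAZE.length && c >= 0 && c < MAZE[r].length()
            && MAZE[r].charAt(c) != '#';
    }

    /** Visit order found by the recursive strategy, starting at (ROW0, COL0). */
    static List<String> visitRecursively(int row0, int col0) {
        List<String> order = new ArrayList<>();
        explore(row0, col0, new boolean[MAZE.length][MAZE[0].length()], order);
        return order;
    }

    private static void explore(int row, int col, boolean[][] visited,
                                List<String> order) {
        if (!visited[row][col]) {
            visited[row][col] = true;
            order.add(row + "," + col);
            for (int dir = 0; dir < 4; dir += 1)
                if (isLegalToMove(row, col, dir))
                    explore(row + DROW[dir], col + DCOL[dir], visited, order);
        }
    }

    /** Visit order found with an explicit stack, starting at (ROW0, COL0). */
    static List<String> visitIteratively(int row0, int col0) {
        boolean[][] visited = new boolean[MAZE.length][MAZE[0].length()];
        List<String> order = new ArrayList<>();
        Deque<int[]> toDo = new ArrayDeque<>();
        toDo.push(new int[] { row0, col0 });
        while (!toDo.isEmpty()) {
            int[] place = toDo.pop();
            int row = place[0], col = place[1];
            if (!visited[row][col]) {
                visited[row][col] = true;
                order.add(row + "," + col);
                // Push in reverse so that direction 0 is popped first.
                for (int dir = 3; dir >= 0; dir -= 1)
                    if (isLegalToMove(row, col, dir))
                        toDo.push(new int[] { row + DROW[dir], col + DCOL[dir] });
            }
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(visitRecursively(2, 0));
        System.out.println(visitIteratively(2, 0));
    }
}
```

Starting both searches from the lower-left corner, the two lists printed by main should be identical.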

[Figure 4.10 diagram not reproduced: a maze grid whose visited squares are numbered 0 through 16.]

Figure 4.10: Example of searching a maze using backtracking search (the findExit procedure from the text). We start in the lower-left corner. The exit is the dark square on the right. The lightly shaded squares are those visited by the algorithm, assuming that direction 0 is up, 1 is right, 2 is down, and 3 is left. The numbers in the squares show the order in which the algorithm first visits them.

4.4.2 FIFO and Double-Ended Queues

A first-in, first-out queue is what we usually mean by queue in informal English (or line in American English): people or things join a queue at one end, and leave it at the other, so that the first to arrive (or enqueue) are the first to leave (or dequeue).

Queues appear extensively in programs, where they can represent such things as sequences of requests that need servicing. The Java library (as of Java 2, version 1.5) provides a standard FIFO queue interface, but it is intended specifically for uses in which a program might have to wait for an element to get added to the queue. Figure 4.11 shows a more “classic” possible interface.

The deque, which is the most general, double-ended queue, probably sees rather little explicit use in programs. It uses even more of the List interface than does the FIFO queue, and so the need to specialize is not particularly acute. Nevertheless, for completeness, I have included a possible interface in Figure 4.12.

4.5 Stack, Queue, and Deque Implementation

We could implement a concrete stack class for our ucb.util.Stack interface as in Figure 4.13: as an extension of ArrayList just as java.util.Stack is an extension of java.util.Vector. As you can see, the names of the Stack interface methods are such that we can simply inherit implementations of size, isEmpty, and lastIndexOf from ArrayList.

But let’s instead spice up our implementation of ArrayStack with a little generalization. Figure 4.14 illustrates an interesting kind of class known as an adapter or wrapper (another of the design patterns introduced at the beginning of Chapter 3). The class StackAdapter shown there will make any List object look like a stack. The figure also shows an example of using it to make a concrete stack representation out of the ArrayList class.

    package ucb.util;

    /** A FIFO queue */
    public interface Queue<T> {
        /** True iff THIS is empty. */
        boolean isEmpty();

        /** Number of items in the queue. */
        int size();

        /** The first item inserted in the queue and not yet removed.
         *  Requires !isEmpty(). */
        T first();

        /** Remove and return the first item. Requires !isEmpty(). */
        T removeFirst();

        /** Add X as the last item of THIS. */
        void add(T x);

        /** The index of the first (least-recently inserted) item that is
         *  .equal to X, or -1 if it is not present. Item 0 is first. */
        int indexOf(Object x);

        /** The index of the last (most-recently inserted) item that is
         *  .equal to X, or -1 if it is not present. Item 0 is first. */
        int lastIndexOf(Object x);
    }

Figure 4.11: A possible FIFO (First In, First Out) queue interface.

    package ucb.util;

    /** A double-ended queue */
    public interface Deque<T> extends Queue<T> {
        /** The last inserted item in the sequence. Assumes !isEmpty(). */
        T last();

        /** Insert X at the beginning of the sequence. */
        void addFirst(T x);

        /** Remove the last item from the sequence. Assumes !isEmpty(). */
        T removeLast();

        /* Plus inherited definitions of isEmpty, size, first, add,
         * removeFirst, indexOf, and lastIndexOf */
    }

Figure 4.12: A possible Deque (double-ended queue) interface.

    public class ArrayStack<T>
            extends java.util.ArrayList<T> implements Stack<T> {

        /** An empty Stack. */
        public ArrayStack() { }

        public T top() { check(); return get(size() - 1); }

        public T pop() { check(); return remove(size() - 1); }

        public void push(T x) { add(x); }

        private void check() {
            if (isEmpty()) throw new java.util.EmptyStackException();
        }
    }

Figure 4.13: An implementation of ArrayStack as an extension of ArrayList.

    package ucb.util;
    import java.util.*;

    public class StackAdapter<T> implements Stack<T> {
        public StackAdapter(List<T> rep) { this.rep = rep; }

        public boolean isEmpty() { return rep.isEmpty(); }
        public int size() { return rep.size(); }

        public T top() { return rep.get(rep.size() - 1); }

        public T pop() { return rep.remove(rep.size() - 1); }

        public void push(T x) { rep.add(x); }

        public int lastIndexOf(Object x) { return rep.lastIndexOf(x); }

        /** The List that holds the items of this stack. */
        private final List<T> rep;
    }

    public class ArrayStack<T> extends StackAdapter<T> {
        public ArrayStack() { super(new ArrayList<T>()); }
    }

Figure 4.14: An adapter class that makes any List look like a Stack, and an example of using it to create an array-based implementation of the ucb.util.Stack interface.

Likewise, given any implementation of the List interface, we can easily provide implementations of Queue or Deque, but there is a catch. Both array-based and linked-list-based implementations of List will support our Stack interface equally well, giving push and pop methods that operate in constant amortized time. However, using an ArrayList in the same naïve fashion to implement either of the Queue or Deque interface gives very poor performance. The problem is obvious: as we’ve seen, we can add or remove from the end (high index) of an array quickly, but removing from the other (index 0) end requires moving over all the elements of the array, which takes time Θ(N), where N is the size of the queue. Of course, we can simply stick to LinkedLists, which don’t have this problem, but there is also a clever trick that makes it possible to represent general queues efficiently with an array.

Instead of shifting over the items of a queue when we remove the first, let’s instead just change our idea of where in the array the queue starts. We keep two indices into the array, one pointing to the first enqueued item, and one to the last. These two indices “chase each other” around the array, circling back to index 0 when they pass the high-index end, and vice-versa. Such an arrangement is known as a circular buffer. Figure 4.15 illustrates the representation. Figure 4.16 shows part of a possible implementation.
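As a small self-contained illustration of the wrap-around arithmetic, here is a sketch of a circular buffer with a fixed capacity. It is a simplification and not the implementation of Figure 4.16: the class name RingDeque is hypothetical, it never resizes, and instead of a last index it keeps the index of the first item together with the item count, computing the position of the last item from those two.

```java
import java.util.NoSuchElementException;

/** A fixed-capacity circular buffer of T's (a sketch; no resizing). */
class RingDeque<T> {
    private final Object[] data;
    private int first = 0;     // index of the first item when size > 0
    private int size = 0;      // number of items currently stored

    RingDeque(int capacity) { data = new Object[capacity]; }

    int size() { return size; }

    /** Add X after the current last item, wrapping around if needed. */
    void addLast(T x) {
        if (size == data.length) throw new IllegalStateException("full");
        data[(first + size) % data.length] = x;
        size += 1;
    }

    /** Add X before the current first item, wrapping around if needed. */
    void addFirst(T x) {
        if (size == data.length) throw new IllegalStateException("full");
        first = (first + data.length - 1) % data.length;
        data[first] = x;
        size += 1;
    }

    @SuppressWarnings("unchecked")
    T removeFirst() {
        if (size == 0) throw new NoSuchElementException();
        T val = (T) data[first];
        data[first] = null;                    // let the item be collected
        first = (first + 1) % data.length;
        size -= 1;
        return val;
    }

    @SuppressWarnings("unchecked")
    T removeLast() {
        if (size == 0) throw new NoSuchElementException();
        int last = (first + size - 1) % data.length;
        T val = (T) data[last];
        data[last] = null;
        size -= 1;
        return val;
    }
}
```

No operation moves any stored item; only the index and the count change, which is what makes removal from the front a constant-time operation.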

[Figure 4.15 diagrams not reproduced. Array contents in index order, with first and last marking the ends of the deque: (a) empty; (b) B C D E; (c) C D E; (d) F C D E I H G; (e) F I H G; (f) I H G; (g) empty.]

Figure 4.15: Circular-buffer representation of a deque with N==7. Part (a) shows an initial empty deque. In part (b), we’ve inserted four items at the end. Part (c) shows the result of removing the first item. Part (d) shows the full deque resulting from adding four items to the front. Removing the last three items gives (e), and after removing one more we have (f). Finally, removing the rest of the items from the end gives the empty deque shown in (g).

    class ArrayDeque<T> implements Deque<T> {
        /** An empty Deque. */
        public ArrayDeque(int N) {
            first = 0; last = N; size = 0;
        }

        public int size() {
            return size;
        }

        public boolean isEmpty() {
            return size == 0;
        }

        public T first() {
            return data.get(first);
        }

        public T last() {
            return data.get(last);
        }

        public void add(T x) {
            size += 1;
            resize();
            last = (last + 1 == data.size()) ? 0 : last + 1;
            data.set(last, x);
        }

        public void addFirst(T x) {
            size += 1;
            resize();
            first = (first == 0) ? data.size() - 1 : first - 1;
            data.set(first, x);
        }

        public T removeLast() {
            T val = last();
            last = (last == 0) ? data.size() - 1 : last - 1;
            size -= 1;
            return val;
        }

        public T removeFirst() {
            T val = first();
            first = (first + 1 == data.size()) ? 0 : first + 1;
            size -= 1;
            return val;
        }

        private int first, last;
        private final ArrayList<T> data = new ArrayList<T>();
        private int size;

        /** Insure that DATA has at least size elements. */
        private void resize() {
            left to the reader
        }
        etc.
    }

Figure 4.16: Implementation of Deque interface using a circular buffer.

Exercises

4.1. Implement a type Deque, as an extension of java.util.Vector. To the operations required by java.util.AbstractList, add first, last, insertFirst, insertLast, removeFirst, removeLast, and do this in such a way that all the operations on Vector continue to work (e.g., get(0) continues to get the same element as first()), and such that the amortized cost for all these operations remains constant.

4.2. Implement a type of List with the constructor:

    public ConcatList(List<T> L0, List<T> L1) { ... }

that does not support the optional operations for adding and removing objects, but gives a view of the concatenation of L0 and L1. That is, get(i) on such a list gives element i in the concatenation of L0 and L1 at the time of the get operation (that is, changes to the lists referenced by L0 and L1 are reflected in the concatenated list). Be sure also to make iterator and listIterator work.

4.3. A singly linked list structure can be circular. That is, some element in the list can have a tail (next) field that points to an item earlier in the list (not necessarily to the first element in the list). Come up with a way to detect whether there is such a circularity somewhere in a list. Do not, however, use any destructive operations on any data structure. That is, you can’t use additional arrays, lists, Vectors, hash tables, or anything like them to keep track of items in the list. Use just simple list pointers without changing any fields of any list. See CList.java in the hw5 directory.

4.4. The implementations of LinkedList in Figure 4.6 and LinkedList.LinkedIter in Figure 4.7 do not provide checking for concurrent modification of the underlying list. As a result, a code fragment such as

    for (ListIterator<Object> i = L.listIterator(); i.hasNext(); ) {
        if (bad(i.next()))
            L.remove(i.previousIndex());
    }

can have unexpected effects. What is supposed to happen, according to the specification for LinkedList, is that i becomes invalid as soon as you call L.remove, and subsequent calls on methods of i will throw ConcurrentModificationExceptions.

a. For the LinkedList class, what goes wrong with the loop above, and why?

b. Modify our LinkedList implementation to perform the check for concurrent modification (so that the loop above throws ConcurrentModificationException).

4.5. Devise a DequeAdapter class analogous to StackAdapter, that allows one to create deques (or queues) from arbitrary List objects.

4.6. Provide an implementation for the resize method of ArrayDeque (Figure 4.16). Your method should double the size of the ArrayList being used to represent the circular buffer if expansion is needed. Be careful! You have to do more than simply increase the size of the array, or the representation will break.


Chapter5

Trees

Inthischapter,we’lltakeabreakfromthedefinitionofinterfacestolibrariesand lookatoneofthebasicdata-structuringtoolsusedinrepresentingsearchablecollectionsofobjects,expressions,andotherhierarchicalstructures,the tree.Theterm tree referstoseveraldifferentvariantsofwhatwewilllatercallaconnected,acyclic, undirectedgraph.Fornow,though,let’s not callitthatandinsteadconcentrate ontwovarietiesof rootedtree. First,

Definition: an ordered tree consists of

A. A node¹, which may contain a piece of data known as a label. Depending on the application, a node may stand for any number of things and the data labeling it may be arbitrarily elaborate. The node part of a tree is known as its root node or root.

B. A sequence of 0 or more trees, whose root nodes are known as the children of the root. Each node in a tree is the child of at most one node, its parent. The children of any node are siblings of each other².

The number of children of a node is known as the degree of that node. A node with no children is called a leaf (node), external node, or terminal node; all other nodes are called internal or non-terminal nodes.

We usually think of there being connections called edges between each node and its children, and often speak of traversing or following an edge from parent to child or back. Starting at any node, r, there is a unique, non-repeating path or sequence of edges leading from r to any other node, n, in the tree with r as root. All nodes along that path, including r and n, are called descendents of r, and ancestors of n. A descendent of r is a proper descendent if it is not r itself; proper ancestors are defined analogously. Any node in a tree is the root of a subtree of that tree. Again, a proper subtree of a tree is one that is not equal to (and is therefore smaller than) the tree. Any set of disjoint trees (such as the trees rooted at all the children of a node) is called a forest.

¹ The term vertex may also be used, as it is with other graphs, but node is traditional for trees.

² The word father has been used in the past for parent and son for child. Of course, this is no longer considered quite proper, although I don’t believe that the freedom to live one’s life as a tree was ever an official goal of the women’s movement.

The distance from a node, n, to the root, r, of a tree (the number of edges that must be followed to get from n to r) is the level (or depth) of that node in the tree. The maximum level of all nodes in a tree is called the height of the tree. The sum of the levels of all nodes in a tree is the path length of the tree. We also define the internal (external) path length as the sum of the levels of all internal (external) nodes. Figure 5.1 illustrates these definitions. All the levels shown in that figure are relative to node 0. It also makes sense to talk about “the level of node 7 in the tree rooted at node 1,” which would be 2.

[Figure 5.1 diagram omitted: nodes 0 through 9 on levels 0 to 3; height = 3, path length 18, external path length 12, internal path length 6.]
Figure 5.1: An illustrative ordered tree. The leaf nodes are squares. As is traditional, the tree “grows” downward.

If you look closely at the definition of ordered tree, you’ll see that it has to have at least one node, so that there is no such thing as an empty ordered tree. Thus, child number j of a node with k > j children is always non-empty. It is easy enough to change the definition to allow for empty trees:

Definition: A positional tree is either

A. Empty (missing), or

B. A node (generally labeled) and, for every non-negative integer, j, a positional tree: the jth child.

The degree of a node is the number of non-empty children. If all nodes in a tree have children only in positions < k, we say it is a k-ary tree. Leaf nodes are those with no non-empty children; all others are internal nodes.

Perhaps the most important kind of positional tree is the binary tree, in which k = 2. For binary trees, we generally refer to child 0 and child 1 as the left and right children, respectively.

A full k-ary tree is one in which all internal nodes except possibly the rightmost bottom one have degree k. A tree is complete if it is full and all its leaf nodes occur last when read top to bottom, left to right, as in Figure 5.2c. Complete binary trees are of interest because they are, in some sense, maximally “bushy”; for any given number of internal nodes, they minimize the internal path length of the tree, which is interesting because it is proportional to the total time required to perform the operation of moving from the root to an internal node once for each internal node in the tree.
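
The definitions of level, height, and path length given earlier in this chapter translate almost word for word into short recursive routines. The following is only a sketch: OrderedNode and TreeMeasures are hypothetical names invented for this illustration, and the sample tree is made up for the example rather than copied from Figure 5.1.

```java
import java.util.ArrayList;
import java.util.List;

/** A hypothetical ordered-tree node (illustration only). */
class OrderedNode {
    final int label;
    final List<OrderedNode> children = new ArrayList<>();

    OrderedNode(int label, OrderedNode... kids) {
        this.label = label;
        for (OrderedNode k : kids)
            children.add(k);
    }

    boolean isLeaf() { return children.isEmpty(); }
}

class TreeMeasures {
    /** The height of T: the maximum level of any node in T. */
    static int height(OrderedNode t) {
        int h = 0;
        for (OrderedNode c : t.children)
            h = Math.max(h, 1 + height(c));
        return h;
    }

    /** The sum of the levels of all nodes in T, where the root of T
     *  is taken to be at level LEVEL. */
    static int pathLength(OrderedNode t, int level) {
        int sum = level;
        for (OrderedNode c : t.children)
            sum += pathLength(c, level + 1);
        return sum;
    }

    /** Like pathLength, but counting only external (leaf) nodes. */
    static int externalPathLength(OrderedNode t, int level) {
        if (t.isLeaf())
            return level;
        int sum = 0;
        for (OrderedNode c : t.children)
            sum += externalPathLength(c, level + 1);
        return sum;
    }

    /** A made-up sample: 0 has children 1 and 2; 1 has children
     *  3 and 4; 4 has the single child 5. */
    static OrderedNode sample() {
        return new OrderedNode(0,
            new OrderedNode(1, new OrderedNode(3),
                               new OrderedNode(4, new OrderedNode(5))),
            new OrderedNode(2));
    }

    public static void main(String[] args) {
        OrderedNode t = sample();
        int total = pathLength(t, 0);
        int external = externalPathLength(t, 0);
        System.out.println("height = " + height(t));
        System.out.println("path length = " + total);
        System.out.println("external = " + external
                           + ", internal = " + (total - external));
    }
}
```

The internal path length is obtained here by subtraction, since every node is either internal or external.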

5.1 Expression trees

Trees are generally interesting when one is trying to represent a recursively-defined type. A familiar example is the expression tree, which represents an expression recursively defined as

• An identifier or constant, or

• An operator (which stands for some function of k arguments) and k expressions (which are its operands).

Given this definition, expressions are conveniently represented by trees whose internal nodes contain operators and whose external nodes contain identifiers or constants. Figure 5.3 shows a representation of the expression x*(y+3)-z.

[Figure 5.2 diagrams omitted: three binary trees, (a) with nodes 0 to 6, (b) with nodes 0 to 8, and (c) with nodes 0 to 10.]
Figure 5.2: A forest of binary trees: (a) is not full; (b) is full, but not complete; (c) is complete. Tree (c) would still be complete if node 10 were missing, but not if node 9 were missing.

[Figure 5.3 diagram omitted: an expression tree whose nodes are labeled -, *, x, +, y, 3, and z.]
Figure 5.3: An expression tree for x*(y+3)-z.

As an illustration of how one deals with trees, consider the evaluation of expressions. As often happens, the definition of the value denoted by an expression corresponds closely to the structure of an expression:

• The value of a constant is the value it denotes as a numeral. The value of a variable is its currently-defined value.

• The value of an expression consisting of an operator and operand expressions is the result of applying the operator to the values of the operand expressions.

This definition immediately suggests a program.

    /** The value currently denoted by the expression E (given
     *  current values of any variables). Assumes E represents
     *  a valid expression tree, and that all variables
     *  contained in it have values. */
    static int eval(Tree E)
    {
        if (E.isConstant())
            return E.valueOf();
        else if (E.isVar())
            return currentValueOf(E.variableName());
        else
            return perform(E.operator(),
                           eval(E.left()), eval(E.right()));
    }

Here, we posit the existence of a definition of Tree that provides operators for detecting whether E is a leaf representing a constant or variable, for extracting the data (values, variable names, or operator names) stored at E, and for finding the left and right children of E (for internal nodes). We assume also that perform takes an operator name (e.g., "+") and two integer values, and performs the indicated computation on those integers. The correctness of this program follows immediately by using induction on the structure of trees (the trees rooted at a node’s children are always subtrees of the node’s tree), and by observing the match between the definition of the value of an expression and the program.
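
To try eval out, something has to stand in for the posited Tree, perform, and currentValueOf. The sketch below makes those assumptions concrete in one possible way: Expr and ExprEval are hypothetical names, the node type stores its data in plain fields instead of the accessor methods assumed above, variable values come from a Map passed as an extra parameter, and only four operators are handled.

```java
import java.util.HashMap;
import java.util.Map;

/** A hypothetical expression-tree node (illustration only). */
class Expr {
    final String op;       // operator name; non-null only in internal nodes
    final Integer value;   // non-null only in constant leaves
    final String name;     // variable name; non-null only in variable leaves
    final Expr left, right;

    private Expr(String op, Integer value, String name, Expr left, Expr right) {
        this.op = op; this.value = value; this.name = name;
        this.left = left; this.right = right;
    }

    static Expr constant(int v) { return new Expr(null, v, null, null, null); }
    static Expr variable(String n) { return new Expr(null, null, n, null, null); }
    static Expr binary(String op, Expr left, Expr right) {
        return new Expr(op, null, null, left, right);
    }

    boolean isConstant() { return value != null; }
    boolean isVar() { return name != null; }
}

class ExprEval {
    /** The result of applying the operator named OP to X and Y. */
    static int perform(String op, int x, int y) {
        switch (op) {
            case "+": return x + y;
            case "-": return x - y;
            case "*": return x * y;
            case "/": return x / y;
            default:
                throw new IllegalArgumentException("unknown operator " + op);
        }
    }

    /** The value of E, given the variable values in ENV. Assumes every
     *  variable in E has an entry in ENV. */
    static int eval(Expr e, Map<String, Integer> env) {
        if (e.isConstant())
            return e.value;
        else if (e.isVar())
            return env.get(e.name);
        else
            return perform(e.op, eval(e.left, env), eval(e.right, env));
    }

    /** A tree for the expression x*(y+3)-z. */
    static Expr sample() {
        return Expr.binary("-",
            Expr.binary("*", Expr.variable("x"),
                        Expr.binary("+", Expr.variable("y"), Expr.constant(3))),
            Expr.variable("z"));
    }

    public static void main(String[] args) {
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 2); env.put("y", 4); env.put("z", 5);
        System.out.println(eval(sample(), env));   // 2*(4+3)-5
    }
}
```

The body of eval has the same three-way shape as the definition of an expression's value, which is what makes the inductive argument for its correctness so direct.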

5.2 Basic tree primitives

There are a number of possible sets of operations one might define on trees, just as there were for sequences. Figure 5.4 shows one possible class (with labels of a generic type T). Typically, only some of the operations shown would actually be provided in a given application. For binary trees, we can be a bit more specialized, as shown in Figure 5.5. In practice, we don’t often define BinaryTree as an extension of Tree, but I’ve done so here just as an illustration.


    /** A positional tree with labels of type T. The empty tree is null. */
    class Tree<T> {

        /** A leaf node with given LABEL */
        public Tree(T label) ...

        /** An internal node with given label, and K empty children */
        public Tree(T label, int k) ...

        /** The label of this node. */
        public T label() ...

        /** The number of non-empty children of this node. */
        public int degree() ...

        /** Number of children (argument K to the constructor) */
        public int numChildren() ...

        /** Child number K of this. */
        public Tree<T> child(int k) ...

        /** Set child number K of this to C, 0 <= K < numChildren().
         *  C must not already be in this tree, or vice-versa. */
        public void setChild(int k, Tree<T> C) ...
    }

Figure 5.4: A class representing positional tree nodes.

    class BinaryTree<T> extends Tree<T> {
        public BinaryTree(T label,
                          BinaryTree<T> left, BinaryTree<T> right) {
            super(label, 2);
            setChild(0, left); setChild(1, right);
        }

        public BinaryTree<T> left() { return (BinaryTree<T>) child(0); }
        public void setLeft(BinaryTree<T> C) { setChild(0, C); }
        public BinaryTree<T> right() { return (BinaryTree<T>) child(1); }
        public void setRight(BinaryTree<T> C) { setChild(1, C); }
    }

Figure 5.5: A possible class representing binary trees.

The operations so far all assume “root-down” processing of the tree, in which it is necessary to proceed from parent to child. When it is more appropriate to go the other way, the following operations are useful as an addition to (or substitute for) the constructors and child methods of Tree.

    /** The parent of this node, if any (otherwise null). */
    public Tree<T> parent() ...

    /** Sets parent() to P. */
    public void setParent(Tree<T> P);

    /** A leaf node with label L and parent P */
    public Tree(T L, Tree<T> P);

5.3 Representing trees

As usual, the representation one uses for a tree depends in large part upon the uses one has for it.

5.3.1 Root-down pointer-based binary trees

For doing the traversals on binary trees described below (§5.4), a straightforward transcription of the recursive definition is often appropriate, so that the fields are

    T L;                        /* Data stored at node. */
    BinaryTree<T> left, right;  /* Left and right children */

As I said about the sample definition of BinaryTree, this specialized representation is in practice more common than simply re-using the implementation of Tree. If the parent operation is to be supported, of course, we can add an additional pointer:

    BinaryTree<T> parent;  // or Tree, as appropriate

5.3.2 Root-down pointer-based ordered trees

The fields used for BinaryTree are also useful for certain non-binary trees, thanks to the leftmost-child, right-sibling representation. Assume that we are representing an ordered tree in which each internal node may have any number of children. We can have left for any node point to child #0 of the node and have right point to the next sibling of the node (if any), illustrated in Figure 5.6.

[Figure 5.6 diagram omitted.]
Figure 5.6: Using a binary tree representation to represent an ordered tree of arbitrary degree. The tree represented is the one from Figure 5.1. Left (down) links point to the first child of a node, and right links to the next sibling.

A small example might be in order. Consider the problem of computing the sum of all the node values in a tree whose nodes contain integers (for which we’ll use the library class Integer, since our labels have to be Objects). The sum of all nodes in a tree is the sum of the value in the root plus the sum of the values in all children. We can write this as follows:

    /** The sum of the values of all nodes of T, assuming T is an
     *  ordered tree with no missing children. */
    static int treeSum(Tree<Integer> T)
    {
        int S;
        S = T.label();
        for (int i = 0; i < T.degree(); i += 1)
            S += treeSum(T.child(i));
        return S;
    }

(Java’s unboxing operations are silently at work here turning Integer labels into ints.)

An interesting sidelight is that the inductive proof of this program contains no obvious base case. The program above is nearly a direct transcription of “the sum of the value in the root plus the sum of the values in all children.”
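
The same summation can also be written directly against the leftmost-child, right-sibling fields, where "for each child" becomes "start at left and follow right links." The following is a minimal sketch under that assumption; LcrsNode and LcrsDemo are hypothetical names, and the sample tree is made up for the illustration.

```java
/** A hypothetical node in leftmost-child, right-sibling form
 *  (illustration only). */
class LcrsNode {
    final int label;
    LcrsNode left;    // child #0, or null if this node is a leaf
    LcrsNode right;   // next sibling, or null if this is the last child

    LcrsNode(int label) { this.label = label; }

    /** Add CHILD as the new last child of this node; returns this. */
    LcrsNode add(LcrsNode child) {
        if (left == null) {
            left = child;
        } else {
            LcrsNode last = left;
            while (last.right != null)
                last = last.right;
            last.right = child;
        }
        return this;
    }
}

class LcrsDemo {
    /** The number of children of T in the ordered tree it represents. */
    static int degree(LcrsNode t) {
        int n = 0;
        for (LcrsNode c = t.left; c != null; c = c.right)
            n += 1;
        return n;
    }

    /** The sum of the labels of all nodes in the ordered tree rooted at T. */
    static int treeSum(LcrsNode t) {
        int s = t.label;
        for (LcrsNode c = t.left; c != null; c = c.right)
            s += treeSum(c);
        return s;
    }

    /** A made-up sample: 1 has children 2, 3, 4; 3 has children 5, 6. */
    static LcrsNode sample() {
        LcrsNode three = new LcrsNode(3).add(new LcrsNode(5)).add(new LcrsNode(6));
        return new LcrsNode(1).add(new LcrsNode(2)).add(three).add(new LcrsNode(4));
    }

    public static void main(String[] args) {
        LcrsNode t = sample();
        System.out.println("degree = " + degree(t) + ", sum = " + treeSum(t));
    }
}
```

As in the version above, a leaf simply has no children to loop over, so there is still no explicit base case.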

5.3.3 Leaf-up representation

For applications where parent is the important operation, and child is not, a different representation is useful.

    T label;
    Tree<T> parent;  /* Parent of current node */

Here, child is an impossible operation.

The representation has a rather interesting advantage: it uses less space; there is one fewer pointer in each node. If you think about it, this might at first seem odd, since each representation requires one pointer per edge. The difference is that the “parental” representation does not need null pointers in all the external nodes, only in the root node. We will see applications of this representation in later chapters.
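
With only parent pointers available, the natural operations are the ones that walk upward. The sketch below shows three of them; UpNode and UpNodeDemo are hypothetical names chosen for this illustration, and, as in the definitions of §5, a node counts as an ancestor of itself.

```java
/** A hypothetical node in "leaf-up" form (illustration only):
 *  each node knows its parent but not its children. */
class UpNode<T> {
    final T label;
    final UpNode<T> parent;   // null only at the root

    UpNode(T label, UpNode<T> parent) {
        this.label = label;
        this.parent = parent;
    }

    /** The root of the tree containing this node. */
    UpNode<T> root() {
        UpNode<T> n = this;
        while (n.parent != null)
            n = n.parent;
        return n;
    }

    /** The level of this node: the number of edges up to the root. */
    int level() {
        int d = 0;
        for (UpNode<T> n = this; n.parent != null; n = n.parent)
            d += 1;
        return d;
    }

    /** True iff this node is an ancestor of N (every node counts as
     *  an ancestor of itself). */
    boolean isAncestorOf(UpNode<T> n) {
        for (; n != null; n = n.parent)
            if (n == this)
                return true;
        return false;
    }
}

class UpNodeDemo {
    public static void main(String[] args) {
        UpNode<String> r = new UpNode<String>("r", null);
        UpNode<String> a = new UpNode<String>("a", r);
        UpNode<String> b = new UpNode<String>("b", a);
        System.out.println(b.level() + " " + b.root().label
                           + " " + r.isAncestorOf(b));
    }
}
```

Each of these takes time proportional to the level of the node it starts from, and none needs a child operation.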


5.3.4 Array representations of complete trees

When a tree is complete, there is a particularly compact representation using an array. Consider the complete tree in Figure 5.2c. The parent of any node numbered k > 0 in that figure is node number ⌊(k−1)/2⌋ (or in Java, (k-1)/2 or (k-1)>>1); the left child of node k is 2k+1 and the right is 2k+2. Had we numbered the nodes from 1 instead of 0, these formulae would have been even simpler: ⌊k/2⌋ for the parent, 2k for the left child, and 2k+1 for the right. As a result, we can represent such complete trees as arrays containing just the label information, using indices into the array as pointers. Both the parent and child operations become simple. Of course, one must be careful to maintain the completeness property, or gaps will develop in the array (indeed, for certain incomplete trees, it can require an array with 2^h − 1 elements to represent a tree with h nodes).

Unfortunately, the headers needed to effect this representation differ slightly from the ones above, since accessing an element of a tree represented by an array in this way requires three pieces of information (an array, an upper bound, and an index) rather than just a single pointer. In addition, we presumably want routines for allocating space for a new tree, specifying in advance its size. Here is an example, with some of the bodies supplied as well.

    /** A BinaryTree2<T> is an entire binary tree with labels of type T.
     *  The nodes in it are denoted by their breadth-first number in
     *  a complete tree. */
    class BinaryTree2<T> {
        protected T[] label;
        protected int size;

        /** A new BinaryTree2 with room for N labels. */
        public BinaryTree2(int N) {
            label = (T[]) new Object[N]; size = 0;
        }

        public int currentSize() { return size; }
        public int maxSize() { return label.length; }

        /** The label of node K in breadth-first order.
         *  Assumes 0 <= k < size. */
        public T label(int k) { return label[k]; }

        /** Cause label(K) to be VAL. */
        public void setLabel(int k, T val) { label[k] = val; }

        public int left(int k) { return 2*k + 1; }
        public int right(int k) { return 2*k + 2; }
        public int parent(int k) { return (k-1)/2; }

    Continues...

[Figure 5.7 diagram omitted: an array whose cells hold the labels 0 through 10.]
Figure 5.7: The binary tree in Figure 5.2c, represented with an array. The labels of the nodes happen to equal their breadth-first positions. In that figure, nodes 7 and 8 are the left and right children of node 3. Therefore, their labels appear at positions 7 (3 · 2 + 1) and 8 (3 · 2 + 2).

Continuation of BinaryTree2<T>:

        /** Add one more node to the tree, the next in breadth-first
         *  order. Assumes currentSize() < maxSize(). */
        public void extend(T label) {
            this.label[size] = label; size += 1;
        }
    }

We will see this array representation later, when we deal with heap data structures. The representation of the binary tree in Figure 5.2c is illustrated in Figure 5.7.
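
The index arithmetic is easy to check in isolation. Here is a small sketch, independent of BinaryTree2; CompleteTreeIndex and pathToRoot are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CompleteTreeIndex {
    static int left(int k)   { return 2 * k + 1; }
    static int right(int k)  { return 2 * k + 2; }

    /** The parent of node K; meaningful only for K > 0. */
    static int parent(int k) { return (k - 1) / 2; }

    /** The node numbers on the path from node K up to the root
     *  (node 0), starting with K itself. */
    static List<Integer> pathToRoot(int k) {
        List<Integer> path = new ArrayList<>();
        path.add(k);
        while (k > 0) {
            k = parent(k);
            path.add(k);
        }
        return path;
    }

    public static void main(String[] args) {
        // Labels equal to their breadth-first positions, 0 through 10.
        int[] label = new int[11];
        for (int k = 0; k < label.length; k += 1)
            label[k] = k;
        System.out.println("children of 3: "
                           + label[left(3)] + ", " + label[right(3)]);
        System.out.println("path from 10 to the root: " + pathToRoot(10));
    }
}
```

Because each step toward the root roughly halves the index, the path from node k to the root has about lg k edges, which is what makes this layout attractive for heaps.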

5.3.5 Alternative representations of empty trees

In our representations, the empty tree tends to have a special status. For example, we can formulate a method to access the left node of a tree, T, with the syntax T.left(), but we can’t write a method such that T.isEmpty() is true iff T references the empty tree. Instead, we must write T == null. The reason, of course, is that we represent the empty tree with the null pointer, and no instance methods are defined on the null pointer’s dynamic type (more concretely, we get a NullPointerException if we try). If the null tree were represented by an ordinary object pointer, it wouldn’t need a special status.

For example, we could extend our definition of Tree from the beginning of §5.2 as follows:

    class Tree<T> {

        public final Tree<T> EMPTY = new EmptyTree<T>();

        /** True iff THIS is the empty tree. */
        public boolean isEmpty() { return false; }

        private static class EmptyTree<T> extends Tree<T> {

            /** The empty tree */
            private EmptyTree() { }

            public boolean isEmpty() { return true; }
            public int degree() { return 0; }
            public int numChildren() { return 0; }

            /** The kth child (always an error). */
            public Tree<T> child(int k) {
                throw new IndexOutOfBoundsException();
            }

            /** The label of THIS (always an error). */
            public T label() {
                throw new IllegalStateException();
            }
        }
    }

There is only one empty tree (guaranteed because the EmptyTree class is private to the Tree class, an example of the Singleton design pattern), but this tree is a full-fledged object, and we will have less need to make special tests for null to avoid exceptions. We’ll be extending this representation further in the discussions of tree traversals (see §5.4.2).
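
To get a single shared empty tree in compilable Java, the EMPTY object has to be held in a static field, and a static field cannot mention the type parameter T. One way around this, and it is only an assumption of the following sketch rather than something the text prescribes, is to keep the field private with a wildcard type and hand it out through a generic static method. SentinelTree, Empty, and empty() are hypothetical names, and the tree is binary to keep the example short.

```java
/** A hypothetical binary tree whose empty tree is a shared object
 *  rather than null (illustration only). */
class SentinelTree<T> {
    /** The single empty tree, shared by all label types. */
    private static final SentinelTree<?> EMPTY = new Empty<Object>();

    /** The empty tree, viewed as a tree with labels of type T. The cast
     *  is safe because the empty tree never yields a label. */
    @SuppressWarnings("unchecked")
    static <T> SentinelTree<T> empty() {
        return (SentinelTree<T>) EMPTY;
    }

    private final T label;
    private final SentinelTree<T> left, right;

    /** A non-empty tree; pass empty() for a missing child. */
    SentinelTree(T label, SentinelTree<T> left, SentinelTree<T> right) {
        this.label = label;
        this.left = left;
        this.right = right;
    }

    boolean isEmpty() { return false; }

    T label() { return label; }

    /** The number of nodes in this tree. No test for null is needed. */
    int size() { return 1 + left.size() + right.size(); }

    private static class Empty<T> extends SentinelTree<T> {
        Empty() { super(null, null, null); }

        @Override boolean isEmpty() { return true; }

        @Override int size() { return 0; }

        @Override T label() {
            throw new IllegalStateException("the empty tree has no label");
        }
    }
}

class SentinelTreeDemo {
    public static void main(String[] args) {
        SentinelTree<String> e = SentinelTree.empty();
        SentinelTree<String> t =
            new SentinelTree<String>("b", new SentinelTree<String>("a", e, e), e);
        System.out.println(t.size() + " " + t.isEmpty() + " " + e.isEmpty());
    }
}
```

Note how size is written without any special case: the recursion bottoms out in the overriding method of the empty tree's class.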

5.4 Tree traversals

The function eval in §5.1 traverses (or walks) its argument; that is, it processes each node in the tree. Traversals are classified by the order in which they process the nodes of a tree. In the program eval, we first traverse (i.e., evaluate in this case) any children of a node, and then perform some processing on the results of these traversals and other data in the node. The latter processing is known generically as visiting the node. Thus, the pattern for eval is “traverse the children of the node, then visit the node,” an order known as postorder. One could also use postorder traversal for printing out the expression tree in reverse Polish form, where visiting a node means printing its contents (the tree in Figure 5.3 would come out as “x y 3 + * z -”). If the primary processing for each node (the “visitation”) occurs before that of the children, giving the pattern “visit the node, then traverse its children”, we get what is known as preorder traversal. Finally, the nodes in Figures 5.1 and 5.2 are all numbered in level order or breadth-first order, in which all nodes at a given level of the tree are visited before any nodes at the next.

All of the traversal orders so far make sense for any kind of tree we’ve considered. There is one other standard traversal ordering that applies exclusively to binary trees: the inorder or symmetric traversal. Here, the pattern is “traverse the left child of the node, visit the node, and then traverse the right child.” In the case of expression trees, for example, such an order would reproduce the represented expression in infix order. Actually, that’s not quite accurate, because to get the expression properly parenthesized, the precise operation would have to be something like “write a left parenthesis, then traverse the left child, then write the operator, then traverse the right child, then write a right parenthesis,” in which the node seems to be visited several times. However, although such examples have led to at least one attempt to introduce a more general notation for traversals³, we usually just classify them approximately in one of the categories described above and leave it at that.

³ For example, Wulf, Shaw, Hilfinger, and Flon used such a classification scheme in Fundamental Structures of Computer Science (Addison-Wesley, 1980). Under that system, a preorder traversal is NLR (for visit Node, traverse Left, traverse Right), postorder is LRN, inorder is LNR, and the true order for writing parenthesized expressions is NLNRN. This nomenclature seems not to have caught on.

[Figure 5.8 diagrams omitted: three binary trees, labeled Postorder, Preorder, and inorder, each with nodes numbered 0 to 6 in the order visited.]
Figure 5.8: Order of node visitation in postorder, preorder, and inorder traversals.

Figure 5.8 illustrates the nodes of several binary trees numbered in the order they would be visited by preorder, inorder, and postorder tree traversals.

5.4.1 Generalized visitation

I’ve been deliberately vague about what “visiting” means, since tree traversal is a general concept that is not specific to any particular action at the tree nodes. In fact, it is possible to write a general definition of traversal that takes as a parameter the action to be “visited upon” each node of the tree. In languages that, like Scheme, have function closures, we simply make the visitation parameter be a function parameter, as in

    ;; Visit the nodes of TREE, applying VISIT to each in inorder
    (define (inorder-walk tree visit)
      (if (not (null? tree))
          (begin (inorder-walk (left tree) visit)
                 (visit tree)
                 (inorder-walk (right tree) visit))))

so that printing all the nodes of a tree, for example, is just

    (inorder-walk myTree (lambda (x) (display (label x)) (newline)))

Rather than using functions, in Java we use objects (as for the java.util.Comparator interface in §2.2.4). For example, we can define interfaces such as

    public interface TreeVisitor<T> {
        void visit(Tree<T> node);
    }

    public interface BinaryTreeVisitor<T> {
        void visit(BinaryTree<T> node);
    }

The inorder-walk procedure above becomes

    static <T> BinaryTreeVisitor<T>
        inorderWalk(BinaryTree<T> tree,
                    BinaryTreeVisitor<T> visitor)
    {
        if (tree != null) {
            inorderWalk(tree.left(), visitor);
            visitor.visit(tree);
            inorderWalk(tree.right(), visitor);
        }
        return visitor;
    }

and our sample call is

    inorderWalk(myTree, new PrintNode());

where myTree is, let’s say, a BinaryTree<String> and we have defined

    class PrintNode implements BinaryTreeVisitor<String> {
        public void visit(BinaryTree<String> node) {
            System.out.println(node.label());
        }
    }

Clearly, the PrintNode class could also be used with other kinds of traversals. Alternatively, we can leave the visitor anonymous, as it was in the original Scheme program:

    inorderWalk(myTree,
                new BinaryTreeVisitor<String>() {
                    public void visit(BinaryTree<String> node) {
                        System.out.println(node.label());
                    }
                });

The general idea of encapsulating an operation as we’ve done here and then carrying it to each item in a collection is another design pattern known simply as Visitor.


By adding state to a visitor, we can use it to accumulate results:

    /** A TreeVisitor that concatenates the labels of all nodes it
     *  visits. */
    public class ConcatNode implements BinaryTreeVisitor<String> {
        private StringBuffer result = new StringBuffer();
        public void visit(BinaryTree<String> node) {
            if (result.length() > 0)
                result.append(",");
            result.append(node.label());
        }
        public String toString() { return result.toString(); }
    }

With this definition, we can print a comma-separated list of the items in myTree in inorder:

    System.out.println(inorderWalk(myTree, new ConcatNode()));

(This example illustrates why I had inorderWalk return its visitor argument. I suggest that you go through all the details of why this example works.)
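
The accumulating visitor also makes it easy to compare the three depth-first orders on one tree. The following self-contained sketch does so for the expression x*(y+3)-z; BNode, BVisitor, JoinLabels, and Walks are hypothetical stand-ins for BinaryTree, BinaryTreeVisitor, ConcatNode, and the walk routines, invented so that the example can run on its own.

```java
/** A hypothetical stand-in for BinaryTree (illustration only). */
class BNode<T> {
    final T label;
    final BNode<T> left, right;

    BNode(T label, BNode<T> left, BNode<T> right) {
        this.label = label; this.left = left; this.right = right;
    }

    static <T> BNode<T> leaf(T label) {
        return new BNode<T>(label, null, null);
    }
}

/** A hypothetical stand-in for BinaryTreeVisitor. */
interface BVisitor<T> {
    void visit(BNode<T> node);
}

/** A visitor that accumulates the labels it sees, separated by blanks. */
class JoinLabels implements BVisitor<String> {
    private final StringBuilder result = new StringBuilder();

    @Override
    public void visit(BNode<String> node) {
        if (result.length() > 0)
            result.append(' ');
        result.append(node.label);
    }

    @Override
    public String toString() { return result.toString(); }
}

class Walks {
    static <T> BVisitor<T> preorderWalk(BNode<T> tree, BVisitor<T> visitor) {
        if (tree != null) {
            visitor.visit(tree);
            preorderWalk(tree.left, visitor);
            preorderWalk(tree.right, visitor);
        }
        return visitor;
    }

    static <T> BVisitor<T> inorderWalk(BNode<T> tree, BVisitor<T> visitor) {
        if (tree != null) {
            inorderWalk(tree.left, visitor);
            visitor.visit(tree);
            inorderWalk(tree.right, visitor);
        }
        return visitor;
    }

    static <T> BVisitor<T> postorderWalk(BNode<T> tree, BVisitor<T> visitor) {
        if (tree != null) {
            postorderWalk(tree.left, visitor);
            postorderWalk(tree.right, visitor);
            visitor.visit(tree);
        }
        return visitor;
    }

    /** A tree for the expression x*(y+3)-z. */
    static BNode<String> sample() {
        BNode<String> sum =
            new BNode<String>("+", BNode.leaf("y"), BNode.leaf("3"));
        BNode<String> product = new BNode<String>("*", BNode.leaf("x"), sum);
        return new BNode<String>("-", product, BNode.leaf("z"));
    }

    public static void main(String[] args) {
        System.out.println(preorderWalk(sample(), new JoinLabels()));
        System.out.println(inorderWalk(sample(), new JoinLabels()));
        System.out.println(postorderWalk(sample(), new JoinLabels()));
    }
}
```

Only the position of the call to visit differs among the three walks. The postorder line is the reverse Polish form mentioned in §5.4, and the inorder line is the infix form without its parentheses.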

5.4.2 Visiting empty trees

I defined the inorderWalk method of §5.4.1 to be a static (class) method rather than an instance method in part to make the handling of null trees clean. If we use the alternative empty-tree representation of §5.3.5, on the other hand, we can avoid special-casing the null tree and make traversal methods be part of the Tree class. For example, here is a possible preorder-walk method:

    class Tree<T> {

        public TreeVisitor<T> preorderWalk(TreeVisitor<T> visitor) {
            visitor.visit(this);
            for (int i = 0; i < numChildren(); i += 1)
                child(i).preorderWalk(visitor);
            return visitor;
        }
        ···
        private static class EmptyTree<T> extends Tree<T> {

            public TreeVisitor<T> preorderWalk(TreeVisitor<T> visitor) {
                return visitor;
            }
        }
    }

Here you see that there are no explicit tests for the empty tree at all; everything is implicit in which of the two versions of preorderWalk gets called.


5.4.3 Iterators on trees

Recursion fits tree data structures perfectly, since they are themselves recursively defined data structures. The task of providing a non-recursive traversal of a tree using an iterator, on the other hand, is rather more troublesome than was the case for sequences.

One possible approach is to use a stack and simply transform the recursive structure of a traversal in the same manner we showed for the findExit procedure in §4.4.1. We might get an iterator like that for BinaryTrees shown in Figure 5.9.

    import java.util.Iterator;
    import java.util.NoSuchElementException;
    import java.util.Stack;

    public class PreorderIterator<T> implements Iterator<T> {
        private Stack<BinaryTree<T>> toDo = new Stack<BinaryTree<T>>();

        /** An Iterator that returns the labels of TREE in
         *  preorder. */
        public PreorderIterator(BinaryTree<T> tree) {
            if (tree != null)
                toDo.push(tree);
        }

        public boolean hasNext() {
            return !toDo.empty();
        }

        public T next() {
            if (toDo.empty())
                throw new NoSuchElementException();
            BinaryTree<T> node = toDo.pop();
            // Push the right child first so that the left is popped first.
            if (node.right() != null)
                toDo.push(node.right());
            if (node.left() != null)
                toDo.push(node.left());
            return node.label();
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

Figure 5.9: An iterator for preorder binary-tree traversal using a stack to keep track of the recursive structure.

Another alternative is to use a tree data structure with parent links, as shown for binary trees in Figure 5.10. As you can see, this implementation keeps track of the next node to be visited (in postorder) in the field next. It finds the node to visit after next by looking at the parent, and deciding what to do based on whether next is the left or right child of its parent. Since this iterator does postorder traversal, the node after next is next’s parent if next is a right child (or if the parent has no right child), and otherwise it is the first node in postorder of the parent’s right subtree: the leaf reached from the right child by repeatedly moving to the left child when there is one and to the right child otherwise.

Exercises

5.1. Implement an Iterator that enumerates the labels of a tree’s nodes in inorder, using a stack as in Figure 5.9.

5.2. Implement an Iterator that enumerates the labels of a tree’s nodes in inorder, using parent links as in Figure 5.10.

5.3. Implement a preorder Iterator that operates on the general type Tree (rather than BinaryTree).

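    import java.util.Iterator;
    import java.util.NoSuchElementException;

    public class PostorderIterator<T> implements Iterator<T> {
        private BinaryTree<T> next;

        /** An Iterator that returns the labels of TREE in
         *  postorder. */
        public PostorderIterator(BinaryTree<T> tree) {
            next = firstInPostorder(tree);
        }

        public boolean hasNext() {
            return next != null;
        }

        public T next() {
            if (next == null)
                throw new NoSuchElementException();
            T result = next.label();
            BinaryTree<T> p = next.parent();
            if (p == null || p.right() == next || p.right() == null)
                // Have just finished with the last child of p (or with
                // the root, in which case p is null and we are done).
                next = p;
            else
                next = firstInPostorder(p.right());
            return result;
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }

        /** The first node in postorder of the tree rooted at NODE
         *  (null if NODE is null): descend to the left where possible,
         *  otherwise to the right, until reaching a leaf. */
        private BinaryTree<T> firstInPostorder(BinaryTree<T> node) {
            while (node != null) {
                if (node.left() != null)
                    node = node.left();
                else if (node.right() != null)
                    node = node.right();
                else
                    break;
            }
            return node;
        }
    }

Figure 5.10: An iterator for postorder binary-tree traversal using parent links in the tree to keep track of the recursive structure.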

Chapter 6

Search Trees

A rather important use of trees is in searching. The task is to find out whether some target value is present in a data structure that represents a set of data, and possibly to return some auxiliary information associated with that value. In all these searches, we perform a number of steps until we either find the value we’re looking for, or exhaust the possibilities. At each step, we eliminate some part of the remaining set from further consideration. In the case of linear searches (see §1.3.1), we eliminate one item at each step. In the case of binary searches (see §1.3.4), we eliminate half the remaining data at each step.

The problem with binary search is that the set of search items is difficult to change; adding a new item, unless it is larger than all existing data, requires that we move some portion of the array over to make room for the new item. The worst-case cost of this operation rises proportionately with the size of the set. Changing the array to a list solves the insertion problem, but the crucial operation of a binary search (finding the middle of a section of the array) becomes expensive.

Enter the tree. Let’s suppose that we have a set of data values, that we can extract from each data value a key, and that the set of possible keys is totally ordered; that is, we can always say that one key is either less than, greater than, or equal to another. What these mean exactly depends on the kind of data, but the terms are supposed to be suggestive. We can approximate binary search by having these data values serve as the labels of a binary search tree (or BST), which is defined to be a binary tree having the following property:

Binary-Search-Tree Property. For every node, x, of the tree, all nodes in the left subtree of x have keys that are less than or equal to the key of x and all nodes in the right subtree of x have keys that are greater than or equal to the key of x. ✷

Figure 6.1a is an example of a typical BST. In that example, the labels are integers, the keys are the same as the labels, and the terms “less than,” “greater than,” and “equal to” have their usual meanings.

The keys don’t have to be integers. In general, we can organize a set of values into a BST using any total ordering on the keys. A total ordering, let’s call it ‘⪯’, has the following properties:

• Completeness: For any values x and y, either x ⪯ y or y ⪯ x, or both;

• Transitivity: If x ⪯ y and y ⪯ z, then x ⪯ z, and

• Anti-symmetry: If x ⪯ y and y ⪯ x, then x = y.

For example, the keys can be integers, and greater than, etc., can have their usual meanings. Or the data and keys can be strings, with the ordering being dictionary order. Or the data can be pairs, (a, b), and the keys can be the first items of the pairs. A dictionary is like that: it is ordered by the words being defined, regardless of their meanings. This last order is an example where one might expect to have several distinct items in the search tree with equal keys.

An important property of BSTs, which follows immediately from their definition, is that traversing a BST in inorder visits its nodes in ascending order of their labels. This leads to a simple algorithm for sorting known as “tree sort.”

    /** Permute the elements of A into non-decreasing order. Assumes
     *  the elements of A have an order on them. */
    static void sort(SomeType[] A) {
        int i;
        BST T;
        T = null;
        for (i = 0; i < A.length; i += 1) {
            insert A[i] into search tree T.
        }
        i = 0;
        traverse T in inorder, where visiting a node, Q, means
            A[i] = Q.label(); i += 1;
    }

[Figure 6.1 diagrams omitted: (a) a BST containing 42, 19, 16, 25, 30, 60, 50, and 91; (b) a BST containing 16, 19, 25, ...]
Figure 6.1: Two binary search trees. Tree (b) is a right-leaning linear tree.

The array contains elements of type SomeType, by which I intend to denote a type that has less-than and equals operators on it, as required by the definition of a BST.
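
The pseudocode can be made concrete for an array of ints. The following is a minimal sketch, assuming int labels and an unbalanced BST built with the simple insertion scheme used throughout this chapter; TreeSort, Node, and fill are hypothetical names used only here.

```java
class TreeSort {
    /** A bare-bones BST node (illustration only). */
    private static class Node {
        final int label;
        Node left, right;
        Node(int label) { this.label = label; }
    }

    /** Insert L into T, returning the modified tree. Equal labels
     *  go into the right subtree. */
    private static Node insert(Node t, int l) {
        if (t == null)
            return new Node(l);
        if (l < t.label)
            t.left = insert(t.left, l);
        else
            t.right = insert(t.right, l);
        return t;
    }

    /** Copy the labels of T into A in inorder, starting at index I.
     *  Returns the index just past the last element stored. */
    private static int fill(Node t, int[] a, int i) {
        if (t == null)
            return i;
        i = fill(t.left, a, i);
        a[i] = t.label;
        return fill(t.right, a, i + 1);
    }

    /** Permute the elements of A into non-decreasing order. */
    static void sort(int[] a) {
        Node t = null;
        for (int x : a)
            t = insert(t, x);
        fill(t, a, 0);
    }

    public static void main(String[] args) {
        int[] a = { 42, 19, 60, 16, 25, 50, 91, 30, 25 };
        sort(a);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

Here "visiting" a node means storing its label at the next free index, which fill threads through the recursion as a parameter and return value instead of keeping it in a shared variable.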

6.1 Operations on a BST

A BST is simply a binary tree, and therefore we can use the representation from §5.2, giving the class in Figure 6.2. For now, I will use the type int for labels, and we’ll assume that labels are the same as keys.

Since it is possible to have more than one instance of a label in this particular version of binary search tree, I have to specify carefully what it means to remove that label or to find a node that contains it. I have chosen here to use the “highest” node containing the label: the one nearest the root. [Why will this always be unique? That is, why can’t there be two highest nodes containing a label, equally near the root?]

One problematic feature of this particular BST definition is that the data structure is relatively unprotected. As the comment on insert indicates, it is possible to “break” a BST by inserting something injudicious into one of its children, as with BST.insert(T.left(), 42), when T.label() is 20. When we incorporate the representation into a full-fledged implementation of SortedSet (see §6.2), we’ll protect it against such abuse.

6.1.1 Searching a BST

Searching a BST is very similar to binary search in an array, with the root of the tree corresponding to the middle of the array.

    /** The highest node in T that contains the
     *  label L, or null if there is none. */
    public static BST find(BST T, int L)
    {
        if (T == null || L == T.label)
            return T;
        else if (L < T.label)
            return find(T.left, L);
        else
            return find(T.right, L);
    }

6.1.2 Inserting into a BST

As promised, the advantage of using a tree is that it is relatively cheap to add things to it, as in the following routine.


    /** A binary search tree. */
    class BST {
        protected int label;
        protected BST left, right;

        /** A leaf node with given LABEL */
        public BST(int label) { this(label, null, null); }

        /** Fetch the label of this node. */
        public int label() ...

        /** Fetch the left (right) child of this. */
        public BST left() ...
        public BST right() ...

        /** The highest node in T that contains the
         *  label L, or null if there is none. */
        public static BST find(BST T, int L) ...

        /** True iff label L is in T. */
        public static boolean isIn(BST T, int L)
            { return find(T, L) != null; }

        /** Insert the label L into T, returning the modified tree.
         *  The nodes of the original tree may be modified. If
         *  T is a subtree of a larger BST, T', then insertion into
         *  T will render T' invalid due to violation of the binary-
         *  search-tree property if L > T'.label() and T is in
         *  T'.left() or L < T'.label() and T is in T'.right(). */
        public static BST insert(BST T, int L) ...

        /** Delete the instance of label L from T that is closest
         *  to the root and return the modified tree. The nodes of
         *  the original tree may be modified. */
        public static BST remove(BST T, int L) ...

        /* This constructor is private to force all BST creation
         * to be done by the insert method. */
        private BST(int label, BST left, BST right) {
            this.label = label; this.left = left; this.right = right;
        }
    }

Figure 6.2: A BST representation.

    /** Insert the label L into T, returning the modified tree.
     *  The nodes of the original tree may be modified.... */
    static BST insert(BST T, int L)
    {
        if (T == null)
            return new BST(L, null, null);
        if (L < T.label)
            T.left = insert(T.left, L);
        else
            T.right = insert(T.right, L);
        return T;
    }

Because of the particular way that I have written this, when I insert multiple copies of a value into the tree, they always go “to the right” of all existing copies. I will preserve this property in the delete operation.
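
The following self-contained sketch exercises find and insert together and shows where a duplicate label ends up. IntBST is a hypothetical cut-down class written for this illustration (it is not the BST class of Figure 6.2), and the inorder helper exists only so the result can be inspected.

```java
/** A hypothetical cut-down binary search tree with int labels
 *  (illustration only). */
class IntBST {
    int label;
    IntBST left, right;

    IntBST(int label) { this.label = label; }

    /** The highest node in T that contains L, or null if there is none. */
    static IntBST find(IntBST t, int l) {
        if (t == null || l == t.label)
            return t;
        else if (l < t.label)
            return find(t.left, l);
        else
            return find(t.right, l);
    }

    /** Insert L into T, returning the modified tree. */
    static IntBST insert(IntBST t, int l) {
        if (t == null)
            return new IntBST(l);
        if (l < t.label)
            t.left = insert(t.left, l);
        else
            t.right = insert(t.right, l);
        return t;
    }

    /** The labels of T in inorder, separated by blanks. */
    static String inorder(IntBST t) {
        if (t == null)
            return "";
        String l = inorder(t.left), r = inorder(t.right);
        return (l.isEmpty() ? "" : l + " ") + t.label
               + (r.isEmpty() ? "" : " " + r);
    }

    /** A tree with the labels 42, 19, 60, 16, 25, 50, 91, 30,
     *  inserted in that order. */
    static IntBST sample() {
        IntBST t = null;
        for (int x : new int[] { 42, 19, 60, 16, 25, 50, 91, 30 })
            t = insert(t, x);
        return t;
    }

    public static void main(String[] args) {
        IntBST t = sample();
        System.out.println(inorder(t));
        System.out.println(find(t, 30) != null);
        System.out.println(find(t, 31) != null);
        t = insert(t, 42);                     // a second copy of the root's label
        System.out.println(find(t, 42) == t);  // find still reports the root
    }
}
```

The second 42 travels right from the root and then left until it falls off the tree, so it becomes the first node in inorder of the root's right subtree, while find continues to return the copy nearest the root.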

6.1.3 Deleting items from a BST

Deletion is quite a bit more complex, since when one removes an internal node, one can’t just let its children fall off, but must re-attach them somewhere in the tree. Obviously, deletion of an external node is easy; just replace it with the null tree (see Figure 6.3(a)). It’s also easy to remove an internal node that is missing one child: just have the other child commit patricide and move up (Figure 6.3(b)).

When neither child is empty, we can find the successor of the node we want to remove: the first node in the right tree, when it is traversed in inorder. Now that node will contain the smallest key in the right subtree. Furthermore, because it is the first node in inorder, its left child will be null [why?]. Therefore, we can replace that node with its right child and move its key to the node we are removing, as shown in Figure 6.3(c).

A possible set of subprograms for deletion from a BST appears in Figure 6.4. The auxiliary routine swapSmallest is an additional method private to BST, and defined as follows.

[Figure 6.3 diagrams omitted: the trees that result from (a) remove 30, (b) remove 25, and (c) remove 42; in (c), 50 becomes the root.]
Figure 6.3: Three possible deletions, each starting from the tree in Figure 6.1.

    /** Delete the instance of label L from T that is closest
     *  to the root and return the modified tree. The nodes of
     *  the original tree may be modified. */
    public static BST remove(BST T, int L) {
        if (T == null)
            return null;
        if (L < T.label)
            T.left = remove(T.left, L);
        else if (L > T.label)
            T.right = remove(T.right, L);
        // Otherwise, we've found L
        else if (T.left == null)
            return T.right;
        else if (T.right == null)
            return T.left;
        else
            T.right = swapSmallest(T.right, T);
        return T;
    }

    /** Move the label from the first node in T (in an inorder
     *  traversal) to node R (over-writing the current label of R),
     *  remove the first node of T from T, and return the resulting tree.
     */
    private static BST swapSmallest(BST T, BST R) {
        if (T.left == null) {
            R.label = T.label;
            return T.right;
        } else {
            T.left = swapSmallest(T.left, R);
            return T;
        }
    }

Figure 6.4: Removing items from a BST without parent pointers.
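
The three cases of deletion can be traced with a small self-contained program. DelBST below is a hypothetical class written for this sketch; its remove and swapSmallest follow the routines of Figure 6.4, while insert, preorder, and sample are helpers added so that the effect on the tree's shape can be printed.

```java
/** A hypothetical cut-down BST used to trace deletions
 *  (illustration only). */
class DelBST {
    int label;
    DelBST left, right;

    DelBST(int label) { this.label = label; }

    static DelBST insert(DelBST t, int l) {
        if (t == null)
            return new DelBST(l);
        if (l < t.label)
            t.left = insert(t.left, l);
        else
            t.right = insert(t.right, l);
        return t;
    }

    /** Delete the instance of L nearest the root of T, returning
     *  the modified tree. */
    static DelBST remove(DelBST t, int l) {
        if (t == null)
            return null;
        if (l < t.label)
            t.left = remove(t.left, l);
        else if (l > t.label)
            t.right = remove(t.right, l);
        else if (t.left == null)       // found L; at most a right child
            return t.right;
        else if (t.right == null)      // found L; only a left child
            return t.left;
        else                           // found L; two children
            t.right = swapSmallest(t.right, t);
        return t;
    }

    /** Move the label of the first node of T in inorder into R, remove
     *  that first node from T, and return the resulting tree. */
    private static DelBST swapSmallest(DelBST t, DelBST r) {
        if (t.left == null) {
            r.label = t.label;
            return t.right;
        } else {
            t.left = swapSmallest(t.left, r);
            return t;
        }
    }

    /** The labels of T in preorder, separated by blanks. */
    static String preorder(DelBST t) {
        StringBuilder out = new StringBuilder();
        preorder(t, out);
        return out.toString().trim();
    }

    private static void preorder(DelBST t, StringBuilder out) {
        if (t != null) {
            out.append(t.label).append(' ');
            preorder(t.left, out);
            preorder(t.right, out);
        }
    }

    /** A tree with the labels 42, 19, 60, 16, 25, 50, 91, 30,
     *  inserted in that order. */
    static DelBST sample() {
        DelBST t = null;
        for (int x : new int[] { 42, 19, 60, 16, 25, 50, 91, 30 })
            t = insert(t, x);
        return t;
    }

    public static void main(String[] args) {
        System.out.println(preorder(remove(sample(), 30)));  // a leaf
        System.out.println(preorder(remove(sample(), 25)));  // one child
        System.out.println(preorder(remove(sample(), 42)));  // two children
    }
}
```

Preorder is printed instead of inorder because it reveals the shape of the tree: after removing 42, the root holds 50, the smallest label of the old right subtree.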

    static BST insert(BST T, int L) {
        BST newNode;
        if (T == null)
            return new BST(L, null, null);
        if (L < T.label)
            T.left = newNode = insert(T.left, L);
        else
            T.right = newNode = insert(T.right, L);
        newNode.parent = T;
        return T;
    }

6.1.4 Operations with parent pointers

If we revise the BST class to provide a parent operation, and add a corresponding parent field to the representation, the operations become more complex, but provide a bit more flexibility. It is probably wise not to provide a setParent operation for BST, since it is particularly easy to destroy the binary-search-tree property with this operation, and a client of BST would be unlikely to need it in any case, given the existence of insert and remove operations.

The find operation is unaffected, since it ignores parent nodes. When inserting in a BST, on the other hand, life is complicated by the fact that insert must set the parent of any node inserted. Figure 6.5 shows one way. Finally, removal from a BST with parent pointers (shown in Figure 6.6) is trickiest of all, as usual.

6.1.5Degeneracystrikes

Unfortunately,allisnotroses.ThetreeinFigure6.1(b)is theresultofinserting nodesintoatreeinascendingorder(obviously,thesametreecanresultfromappropriatedeletionsfromalargertreeaswell).Youshouldbeabletoseethatdoinga searchorinsertiononthistreeisjustlikedoingasearchor insertiononalinkedlist; it is alinkedlist,butwithextrapointersineachelementthatarealwaysnull.This treeisnot balanced:itcontainssubtreesinwhichleftandrightchildrenhavemuch differentheights.WewillreturntothisquestioninChapter 9,afterdevelopinga bitmoremachinery.

6.2ImplementingtheSortedSetinterface

ThestandardJavalibraryinterface SortedSet (see §2.2.4)providesakindof Collection thatsupports rangequeries. Thatis,aprogramcanusetheinterface tofindallitemsinacollectionthatarewithinacertainrangeofvalues,according tosomeorderingrelation.Searchingforasinglespecificvalueissimplyaspecial caseinwhichtherangecontainsjustonevalue.Itisfairlyeasytoimplementthis

6.2.IMPLEMENTINGTHESORTEDSETINTERFACE 113
Figure6.5:InsertionintoaBSTthathasparentpointers.

/**DeletetheinstanceoflabelLfromTthatisclosestto *totherootandreturnthemodifiedtree.Thenodesof *theoriginaltreemaybemodified.*/ publicstaticBSTremove(BSTT,intL){ if(T==null)

returnnull; BSTnewChild; newChild=null;result=T; if(L<T.label)

T.left=newChild=remove(T.left,L); elseif(L>T.label)

T.right=newChild=remove(T.right,L); //Otherwise,we’vefoundL elseif(T.left==null)

returnT.right; elseif(T.right==null)

returnT.left; else

T.right=newChild=swapSmallest(T.right,T); if(newChild!=null)

newChild.parent=T; returnT;

privatestaticBSTswapSmallest(BSTT,BSTR){ if(T.left==null){ R.label=T.label; returnT.right; }else{

T.left=swapSmallest(T.left,R); if(T.left!=null)

T.left.parent=T; returnT;

114 CHAPTER6.SEARCHTREES
}
} }
Figure6.6:RemovingitemsfromaBSTwithparentpointers.

interfaceusingabinarysearchtreeastherepresentation; we’llcalltheresulta BSTSet.

Let's plan ahead a little. Among the operations we'll have to support are headSet, tailSet, and subSet, which return views of some underlying set that consist of a subrange of that set. The values returned will be full-fledged SortedSets in their own right, modifications to which are supposed to modify the underlying set as well (and vice-versa). Since a full-fledged set can also be thought of as a view of a range in which the bounds are "infinitely small" to "infinitely large," we might look for a representation that supports both sets created "fresh" from a constructor, and those that are views of other sets. This suggests a representation for our set that contains a pointer to the root of a BST, and two bounds indicating the largest and smallest members of the set, with null indicating a missing bound.
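The view behavior that BSTSet must reproduce can be observed with the library's own java.util.TreeSet, whose subSet method returns a set backed by the original. The sketch below is only an illustration of those semantics; the class name ViewDemo and the sample strings are assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

/** Demonstrates that a subSet view and its underlying set share contents. */
class ViewDemo {
    /** A small sample set of strings in natural (lexicographic) order. */
    static SortedSet<String> fauna() {
        return new TreeSet<>(Arrays.asList(
            "axolotl", "dog", "duck", "elephant", "elk", "gnu", "hartebeest"));
    }

    /** Returns true iff changes made through either set are seen by both. */
    static boolean viewsStayInSync() {
        SortedSet<String> all = fauna();
        // The view covers "dog" (inclusive) up to "gnu" (exclusive).
        SortedSet<String> subset = all.subSet("dog", "gnu");
        subset.remove("duck");   // removal through the view
        all.add("eland");        // insertion through the underlying set
        return !all.contains("duck") && subset.contains("eland");
    }
}
```

Here the view initially holds dog, duck, elephant, and elk; the lower bound is inclusive and the upper bound exclusive, as the SortedSet interface specifies.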

We make the root of the BST a (permanent) sentinel node for an important reason. We will use the same tree for all views of the set. If our representation simply pointed at a root of the tree that contained data, then this pointer would have to change whenever that node of the tree was removed. But then, we would have to make sure to update the root pointer in all other views of the set as well, since they are also supposed to reflect changes in the set. By introducing the sentinel node, shared by all views and never deleted, we make the problem of keeping them all up to date trivial. This is a typical example of the old computer-science maxim: Most technical problems can be solved by introducing another level of indirection.

Assuming we use parent pointers, an iterator through a set can consist of a pointer to the next node whose label is to be returned, a pointer to the last node whose label was returned (for implementing remove) and a pointer to the BSTSet being iterated over (conveniently provided in Java by making the iterator an inner class). The iterator will proceed in inorder, skipping over portions of the tree that are outside the bounds on the set. See also Exercise 5.2 concerning iterating using a parent pointer.

Figure 6.8 illustrates a BSTSet, showing the major elements of the representation: the original set, the BST that contains its data, a view of the same set, and an iterator over this view. The sets all contain space for a Comparator (see §2.2.4) to allow the user of the set to specify an ordering; in Figure 6.8, we use the natural ordering, which on strings gives us lexicographical order. Figure 6.7 contains a sketch of the corresponding Java declarations for the representation.

6.3 Orthogonal Range Queries

Binary search trees divide data (ideally) into halves, using a linear ordering on the data. The divide-and-conquer idea, however, does not require that the factor be two. Suppose we are dealing with keys that have more structure. For example, consider a collection of items that have locations on, say, some two-dimensional area. In some cases, we may wish to find items in this collection based on their location; their keys are their locations. While it is possible to impose a linear ordering on such keys, it is not terribly useful. For example, we could use a lexicographic ordering, and define $(x_0, y_0) > (x_1, y_1)$ iff $x_0 > x_1$, or $x_0 = x_1$ and $y_0 > y_1$. However, with

    public class BSTSet<T> extends AbstractSet<T> {

        /** The empty set, using COMP as the ordering. */
        public BSTSet(Comparator<T> comp) {
            comparator = comp;
            low = high = null;
            sent = new BST<T>();
        }

        /** The empty set, using natural ordering. */
        public BSTSet() { this((Comparator<T>) null); }

        /** The set initialized to the contents of C, with natural order. */
        public BSTSet(Collection<? extends T> c) { this(); addAll(c); }

        /** The set initialized to the contents of S, same ordering. */
        public BSTSet(SortedSet<? extends T> s) {
            this(s.comparator()); addAll(s);
        }

        /** Value of comparator(); null if naturally ordered. */
        private Comparator<T> comparator;

        /** Bounds on elements in this class, null if no bounds. */
        private T low, high;

        /** Sentinel of BST containing data. */
        private final BST<T> sent;

Figure 6.7: Java representation for BSTSet class, showing only constructors and instance variables.

        /** Used internally to form views. */
        private BSTSet(BSTSet<T> set, T low, T high) {
            comparator = set.comparator();
            this.low = low; this.high = high;
            this.sent = set.sent;
        }

        /** An iterator over BSTSet. */
        private class BSTIter implements Iterator<T> {

            /** Next node in iteration to yield. Equals the sentinel node
             *  when done. */
            BST<T> next;

            /** Node last returned by next(), or null if none, or if remove()
             *  has intervened. */
            BST<T> last;

            BSTIter() {
                last = null;
                next = first node that is in bounds, or sent if none;
            }
            ···
        }

        /** A node in the BST */
        private static class BST<T> {
            T label;
            BST<T> left, right, parent;

            /** A sentinel node */
            BST() { label = null; parent = null; }

            BST(T label, BST<T> left, BST<T> right) {
                this.label = label; this.left = left; this.right = right;
            }
        }
    }

Figure 6.7, continued: Private nested classes used in implementation.

[Figure 6.8 diagram: the sets fauna and subset, the iterator I (fields BSTSet.this, last, next), and a shared BST under a sentinel node labeled ∞, with nodes hartebeest, dog, axolotl, elk, duck, elephant, and gnu.]

Figure 6.8: A BSTSet, fauna, a view, subset, formed from fauna.subSet("dog", "gnu"), and an iterator, I, over subset. The BST part of the representation is shared between fauna and subset. Triangles represent whole subtrees, and rounded rectangles represent individual nodes. Each set contains a pointer to the root of the BST (a sentinel node, whose label is considered larger than any value in the tree), plus lower and upper bounds on the values (null means unbounded), and a Comparator (in this case, null, indicating natural order). The iterator contains a pointer to subset, which it is iterating over, a pointer (next) to the node containing the next label in sequence ("duck") and another pointer (last) to the node containing the label in the sequence that was last delivered by I.next(). The dashed regions of the BST are skipped entirely by the iterator. The "hartebeest" node is not returned by the iterator, but the iterator does have to pass through it to get down to the nodes it does return.

that definition, the set of all objects between points A and B consists of all those objects whose horizontal position lies between those of A and B, but whose vertical position is arbitrary (a long vertical strip). Half the information is unused.

The term quadtree (or quad tree) refers to a class of search tree structure that better exploits two-dimensional location data. Each step of a search divides the remaining data into four groups, one for each of four quadrants of a rectangle about some interior point. This interior dividing point can be the center (so that the quadrants are equal) giving a PR quadtree (also called a point-region quadtree or just region quadtree), or it can be one of the points that is stored in the tree, giving a point quadtree.

Figure 6.9 illustrates the idea behind the two types of quadtree. Each node of the tree corresponds to a rectangular region (possibly infinite in the case of point quadtrees). Any region may be subdivided into four rectangular subregions to the northwest, northeast, southeast, and southwest of some interior dividing point. These subregions are represented by children of the tree node that corresponds to the dividing point. For PR quadtrees, these dividing points are the centers of rectangles, while for point quadtrees, they are selected from the data points, just as the dividing values in a binary search tree are selected from the data stored in the tree.
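As a rough illustration of the point-quadtree idea, here is a minimal sketch in which each stored point divides its region into four quadrants, and a rectangular range query descends only into quadrants that can intersect the query. The class PointQuadTree, its field names, and the convention that points on a dividing line go east or north are all assumptions made for this example.

```java
/** A point quadtree node (hypothetical sketch, not the text's QuadTree). */
class PointQuadTree {
    final double x, y;               // the stored point, also the dividing point
    PointQuadTree nw, ne, se, sw;    // quadrants relative to (x, y)

    PointQuadTree(double x, double y) { this.x = x; this.y = y; }

    /** Insert (X, Y) into T, ignoring exact duplicates; return the tree. */
    static PointQuadTree insert(PointQuadTree T, double x, double y) {
        if (T == null) return new PointQuadTree(x, y);
        if (x == T.x && y == T.y) return T;
        if (x < T.x) {
            if (y >= T.y) T.nw = insert(T.nw, x, y);
            else T.sw = insert(T.sw, x, y);
        } else {
            if (y >= T.y) T.ne = insert(T.ne, x, y);
            else T.se = insert(T.se, x, y);
        }
        return T;
    }

    /** Number of points in T with xlo <= x <= xhi and ylo <= y <= yhi. */
    static int count(PointQuadTree T, double xlo, double xhi,
                     double ylo, double yhi) {
        if (T == null) return 0;
        int n = (xlo <= T.x && T.x <= xhi && ylo <= T.y && T.y <= yhi) ? 1 : 0;
        if (xlo < T.x) {             // western quadrants hold x < T.x
            if (yhi >= T.y) n += count(T.nw, xlo, xhi, ylo, yhi);
            if (ylo < T.y) n += count(T.sw, xlo, xhi, ylo, yhi);
        }
        if (xhi >= T.x) {            // eastern quadrants hold x >= T.x
            if (yhi >= T.y) n += count(T.ne, xlo, xhi, ylo, yhi);
            if (ylo < T.y) n += count(T.se, xlo, xhi, ylo, yhi);
        }
        return n;
    }
}
```

Unlike the lexicographic ordering discussed earlier, both coordinates are used at every step, so a query confined to a small rectangle can skip whole subtrees in either dimension.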

6.4 Priority queues and heaps

Suppose that we are faced with a different problem. Instead of being able to search quickly for the presence of any element in a set, let us restrict ourselves to searching for the largest (by flipping everything in the following discussion around in the obvious way, we can search for smallest elements instead). Finding the largest in a BST is reasonably easy [how?], but we still have to deal with the imbalance problem described above. By restricting ourselves to the operations of inserting an element, and finding and deleting the largest element, we can avoid the balancing problem easily. A data structure supporting just those operations is called a priority queue, because we remove items from it in the order of their values, regardless of arrival order.

In Java, we could simply make a class that implements SortedSet and that was particularly fast at the operations first and remove(x), when x happens to be the first element of the set. But of course, the user of such a class might be surprised to find how slow it is to iterate through an entire set. Therefore, we might specialize a bit, as shown in Figure 6.10.

A convenient data structure for representing priority queues is the heap (not to be confused with the large area of storage from which new allocates memory, an unfortunate but traditional clash of nomenclature). A heap is simply a positional tree (usually binary) satisfying the following property.

Heap Property. The label at any node in the tree is greater than or equal to the label of any descendant of that node.

[Figure 6.9 diagrams: a PR quadtree (top) and a point quadtree (bottom) over data points A through G, each with its geometry on the left and its tree on the right; the PR quadtree levels are labeled with edge sizes 200, 100, 50, 25.]

Figure 6.9: Illustration of two kinds of quadtree for the same set of data. On top is a four-level PR quadtree, using square regions for simplicity. Below is a corresponding point quadtree (there are many, depending on which points are used to divide the data). In each, the left diagram shows the geometry; the dots represent the positions (the keys) of seven data items at (40, 30), ( 30, 10), (20, 90), (30, 60), (10, 70), (70, 70), and (80, 20). On the right, we see the corresponding tree data structures. For the PR quadtree, each level of the tree contains nodes that represent squares with the same size of edge (shown at the right). For the point quadtree, each point is the root of a subtree that divides a rectangular region into four, generally unequal, regions. The four children of each node represent the upper-left, upper-right, lower-left, and lower-right quadrants of their common parent node, respectively. To simplify the drawing, we have not shown the children of a node when they are all empty.

    interface PriorityQueue<T extends Comparable<T>> {
        /** Insert item L into this queue. */
        public void insert(T L);

        /** True iff this queue is empty. */
        public boolean isEmpty();

        /** The largest element in this queue. Assumes !isEmpty(). */
        public T first();

        /** Remove and return an instance of the largest element (there may
         *  be more than one; removes only one). Assumes !isEmpty(). */
        public T removeFirst();
    }

Figure 6.10: A possible interface to priority queues.

Since the order of the children is immaterial, there is more freedom in how to arrange the values in the heap, making it easy to keep a heap bushy. Accordingly, when we use the unqualified term "heap" in this context, we will mean a complete tree with the heap property. This speeds up all the manipulations of heaps, since the time required to do insertions and deletions is proportional to the height of the heap. Figure 6.11 illustrates a typical heap.

Implementing the operation of finding the largest value is obviously easy. To delete the largest element, while keeping both the heap property and the bushiness of the tree, we first move the "last" item on the bottom level of the heap to the root of the tree, replacing and deleting the largest element, and then "reheapify" to re-establish the heap property. Figure 6.11b–d illustrates the process. It is typical to do this with a binary tree represented as an array, as in the class BinaryTree2 of §5.3. Figure 6.12 gives a possible implementation.

By repeatedly finding the largest element, of course, we can sort an arbitrary set of objects:

    /** Sort the elements of A in ascending order. */
    static void heapSort(Integer[] A) {
        if (A.length <= 1)
            return;
        Heap<Integer> H = new Heap<Integer>(A.length);
        H.setHeap(A, 0, A.length);
        for (int i = A.length-1; i >= 0; i -= 1)
            A[i] = H.removeFirst();
    }

The process is illustrated in Figure 6.13.
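The heapSort routine above depends on the Heap class of Figure 6.12. For experimentation without that class, the following self-contained sketch sorts an int array in place, using the front of the array as the heap and the back as the output. This in-place arrangement and the names ArrayHeapSort and siftDown are assumptions of the sketch; the text's version copies values out of a separate Heap object instead.

```java
/** In-place heapsort on int arrays (illustrative sketch). */
class ArrayHeapSort {
    /** Move the label at NODE down within the heap A[0..size-1] until
     *  it is at least as large as both of its children. */
    static void siftDown(int[] A, int node, int size) {
        int x = A[node];
        while (2 * node + 1 < size) {
            int child = 2 * node + 1;                 // left child
            if (child + 1 < size && A[child + 1] > A[child])
                child += 1;                           // right child is larger
            if (x >= A[child]) break;
            A[node] = A[child];
            node = child;
        }
        A[node] = x;
    }

    /** Sort A into ascending order. */
    static void heapSort(int[] A) {
        // Build a heap, working from the last non-leaf node back to the root.
        for (int i = A.length / 2 - 1; i >= 0; i -= 1)
            siftDown(A, i, A.length);
        // Repeatedly swap the largest label into the output region at the end.
        for (int size = A.length - 1; size > 0; size -= 1) {
            int largest = A[0];
            A[0] = A[size];
            A[size] = largest;
            siftDown(A, 0, size);
        }
    }
}
```

Applied to the array 19, 0, -1, 7, 23, 2, 42, this yields -1, 0, 2, 7, 19, 23, 42.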

[Figure 6.11 heaps, node labels in level order: (a) 91 60 30 42 5 4 2; (b) 2 60 30 42 5 4 −∞; (c) 60 2 30 42 5 4 −∞; (d) 60 42 30 2 5 4 −∞.]

Figure 6.11: Illustrative heap (a). The sequence (b)–(d) shows steps in the deletion of the largest item. The last (bottommost, rightmost) label is first moved up to overwrite that of the root. It is then "sifted down" until the heap property is restored. The shaded nodes show where the heap property is violated during the process.

    class Heap<T extends Comparable<T>>
        extends BinaryTree2<T> implements PriorityQueue<T> {

        /** A heap containing up to N > 0 elements. */
        public Heap(int N) { super(N); }

        /** The minimum label value (written −∞). */
        static final int MIN = Integer.MIN_VALUE;

        /** Insert item L into this queue. */
        public void insert(T L) {
            extend(L);
            reHeapifyUp(currentSize() - 1);
        }

        /** True iff this queue is empty. */
        public boolean isEmpty() { return currentSize() == 0; }

        /** The largest element in this queue. Assumes !isEmpty(). */
        public T first() { return label(0); }

        /** Remove and return an instance of the largest element (there may
         *  be more than one; removes only one). Assumes !isEmpty(). */
        public T removeFirst() {
            T result = label(0);
            setLabel(0, label(currentSize() - 1));
            size -= 1;
            reHeapifyDown(0);
            return result;
        }

Figure 6.12: Implementation of a common kind of priority queue: the heap.

        /** Restore the heap property in this tree, assuming that only
         *  NODE may have a label larger than that of its parent. */
        protected void reHeapifyUp(int node) {
            if (node <= 0)
                return;
            T x = label(node);
            while (node != 0 && label(parent(node)).compareTo(x) < 0) {
                setLabel(node, label(parent(node)));
                node = parent(node);
            }
            setLabel(node, x);
        }

        /** Restore the heap property in this tree, assuming that only
         *  NODE may have a label smaller than those of its children. */
        protected void reHeapifyDown(int node) {
            T x = label(node);
            while (true) {
                if (left(node) >= currentSize())
                    break;
                int largerChild =
                    (right(node) >= currentSize()
                     || label(right(node)).compareTo(label(left(node))) <= 0)
                    ? left(node) : right(node);
                if (x.compareTo(label(largerChild)) >= 0)
                    break;
                setLabel(node, label(largerChild));
                node = largerChild;
            }
            setLabel(node, x);
        }

        /** Set the labels in this Heap to A[off], A[off+1], ...
         *  A[off+len-1]. Assumes that LEN <= maxSize(). */
        public void setHeap(T[] A, int off, int len) {
            for (int i = 0; i < len; i += 1)
                setLabel(i, A[off+i]);
            size = len;
            heapify();
        }

        /** Turn label(0)..label(size-1) into a proper heap. */
        protected void heapify() { ... }
    }

Figure 6.12, continued.

    (a)  19  0 -1  7 23  2 42
    (b)  42 23 19  7  0  2 -1
    (c)  23  7 19 -1  0  2 | 42
    (d)  19  7  2 -1  0 | 23 42
    (e)   7  0  2 -1 | 19 23 42
    (f)   2  0 -1 |  7 19 23 42
    (g)   0 -1 |  2  7 19 23 42
    (h)  -1 |  0  2  7 19 23 42

Figure 6.13: An example of heapsort. The original array is in (a); (b) is the result of setHeap; (c)–(h) are the results of successive iterations. Each shows the active part of the heap array and the portion of the output array that has been set, separated by a gap (marked | here).

We could simply implement heapify like this:

    protected void heapify()
    {
        for (int i = 1; i < size; i += 1)
            reHeapifyUp(i);
    }

Interestingly enough, however, this implementation is not quite as fast as it could be, and it is faster to perform the operation by a different method, in which we work from the leaves back up. That is, in reverse level order, we swap each node with its parent, if it is larger, and then, as for reHeapifyDown, continue moving the parent's value down the tree until heapness is restored. It might seem that this is no different from repeated insertion, but we will see later that it is.

    protected void heapify()
    {
        for (int i = size/2 - 1; i >= 0; i -= 1)
            reHeapifyDown(i);
    }
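The difference between the two heapify strategies can also be observed empirically by counting label comparisons. The sketch below does so on plain int arrays; the class HeapifyCount and its method names are hypothetical, and it counts both child-against-child and parent-against-child comparisons in the downward version, a slightly more generous count than one that considers only parent-against-child comparisons.

```java
/** Counts comparisons made by the two heapify strategies (sketch). */
class HeapifyCount {
    /** Heapify A by repeated sift-up; return the number of comparisons. */
    static long buildUp(int[] A) {
        long comparisons = 0;
        for (int i = 1; i < A.length; i += 1) {
            int node = i, x = A[i];
            while (node != 0) {
                int parent = (node - 1) / 2;
                comparisons += 1;
                if (A[parent] >= x) break;
                A[node] = A[parent];
                node = parent;
            }
            A[node] = x;
        }
        return comparisons;
    }

    /** Heapify A by sift-down from the last non-leaf; return comparisons. */
    static long buildDown(int[] A) {
        long comparisons = 0;
        int size = A.length;
        for (int i = size / 2 - 1; i >= 0; i -= 1) {
            int node = i, x = A[i];
            while (2 * node + 1 < size) {
                int child = 2 * node + 1;
                if (child + 1 < size) {
                    comparisons += 1;
                    if (A[child + 1] > A[child]) child += 1;
                }
                comparisons += 1;
                if (x >= A[child]) break;
                A[node] = A[child];
                node = child;
            }
            A[node] = x;
        }
        return comparisons;
    }

    /** True iff A satisfies the heap property. */
    static boolean isHeap(int[] A) {
        for (int i = 1; i < A.length; i += 1)
            if (A[(i - 1) / 2] < A[i]) return false;
        return true;
    }
}
```

On an array in ascending order, every inserted label rises all the way to the root in buildUp, which is that method's worst case, while buildDown stays within a small constant multiple of the array length.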

6.4.1 Heapify Time

If we measure the time requirements for sorting $N$ items with heapSort, we see that it is the cost of "heapifying" $N$ elements plus the time required to extract $N$ items. The worst-case cost of extracting $N$ items from the heap, $C_e(N)$, is dominated by the cost of reHeapifyDown, starting at the top node of the heap. If we count comparisons of parent labels against child labels, you can see that the worst-case cost here is proportional to the current height of the heap. Suppose the initial height of the heap is $k$ (and that $N = 2^{k+1} - 1$). It stays that way until $2^k$ items have been extracted (removing the whole bottom row), and then becomes $k - 1$. It stays at $k - 1$ for the next $2^{k-1}$ items, and so forth. Thus, the total time spent extracting items is

\[ C_e(N) = C_e(2^{k+1} - 1) = 2^k \cdot k + 2^{k-1} \cdot (k-1) + \cdots + 2^0 \cdot 0. \]

If we write $2^k \cdot k$ as $2^k + \cdots + 2^k$ ($k$ terms) and re-arrange the terms, we get

\begin{align*}
C_e(2^{k+1} - 1) &= 2^k \cdot k + 2^{k-1} \cdot (k-1) + \cdots + 2^0 \cdot 0 \\
 &= 2^1 + 2^2 + \cdots + 2^{k-1} + 2^k \\
 &\quad + 2^2 + \cdots + 2^{k-1} + 2^k \\
 &\quad + \cdots \\
 &\quad + 2^{k-1} + 2^k \\
 &\quad + 2^k \\
 &= (2^{k+1} - 2) + (2^{k+1} - 4) + \cdots + (2^{k+1} - 2^{k-1}) + (2^{k+1} - 2^k) \\
 &= k 2^{k+1} - (2^{k+1} - 2) \\
 &\in \Theta(k 2^{k+1}) = \Theta(N \lg N)
\end{align*}

Now let's consider the cost of heapifying $N$ elements. If we do it by inserting the $N$ elements one by one and performing reHeapifyUp, then we get a cost like that of extracting $N$ elements: for the first insertion, we do 0 label comparisons; for the next 2, we do 1; for the next 4, we do 2; etc., or

\[ C^u_h(2^{k+1} - 1) = 2^0 \cdot 0 + 2^1 \cdot 1 + \cdots + 2^k \cdot k, \]

where $C^u_h(N)$ is the worst-case cost of heapifying $N$ elements by repeated reHeapifyUps. This is the same as the one we just did, giving

\[ C^u_h(N) \in \Theta(N \lg N). \]

But suppose we heapify by performing the second algorithm at the end of §6.4, performing a reHeapifyDown on all the items of the array starting at item $\lfloor N/2 \rfloor - 1$ and going toward item 0. The cost of reHeapifyDown depends on the distance to the deepest level. For the last $2^k$ items in the heap, this cost is 0 (which is why we skip them). For the preceding $2^{k-1}$, the cost is 1, etc. This gives

\[ C^d_h(N) = C^d_h(2^{k+1} - 1) = 2^{k-1} \cdot 1 + 2^{k-2} \cdot 2 + \cdots + 2^0 \cdot k. \]

Using the same trick as before,

\begin{align*}
C^d_h(2^{k+1} - 1) &= 2^{k-1} \cdot 1 + 2^{k-2} \cdot 2 + \cdots + 2^0 \cdot k \\
 &= 2^0 + 2^1 + \cdots + 2^{k-2} + 2^{k-1} \\
 &\quad + 2^0 + 2^1 + \cdots + 2^{k-2} \\
 &\quad + \cdots \\
 &\quad + 2^0 \\
 &= (2^k - 1) + (2^{k-1} - 1) + \cdots + (2^1 - 1) \\
 &= 2^{k+1} - 2 - k \\
 &\in \Theta(N)
\end{align*}

So this second heapification method runs considerably faster (asymptotically) than the obvious repeated-insertion method. Of course, since the cost of extracting $N$ elements is still $\Theta(N \lg N)$ in the worst case, the overall worst-case cost of heapsort is still $\Theta(N \lg N)$. However, this does lead you to expect that for big enough $N$, there will be some constant-factor advantage to using the second form of heapification, and that's an important practical consideration.

6.5 Game Trees

Consider the problem of finding the best move in a two-person game with perfect information (i.e., no element of chance). Naively, you could do this by enumerating all possible moves available to the player whose turn it is from the current position, somehow assign a score to each, and then pick the move with the highest score. For example, you might score a position by counting material, that is, by comparing the number of your pieces against those of your opponent. But such a score would be misleading. A move might give more pieces, but set up a devastating response from your opponent. So, for each move, you should also consider all your opponent's possible moves, assume he picks the best one for him, and use that as the value.

But what if you have a great response to his response? How can we organize this search sensibly?

A typical approach is to think of the space of possible continuations of a game as a tree, appropriately known as a game tree. Each node in the tree is a position in the game; each edge is a move. Figure 6.14 illustrates the kind of thing we mean. Each node is a position; the children of a node are the possible next positions. The numbers on each node are values you guess for the positions (where larger means better for you). The question is how to get these numbers.

Let's consider the problem recursively. Given that it is your move in a certain position, represented by node P, you presumably will choose the move that gives you the best score; that is, you will choose the child of P with the maximum score. Therefore, it is reasonable to assign the score of that child as the score of P itself.

Contrariwise, if node Q represents a position in which it is the opponent's turn to move, the opponent will presumably do best by choosing the child of Q that gives the minimum score (since minimum for you means best for the opponent). Thus, the appropriate value to assign to Q is that of the smallest child. The numbers on the illustrative game tree in Figure 6.14 conform to this rule for assigning scores, which is known as the minimax algorithm. The starred nodes in the diagram indicate which nodes (and therefore moves) you and your opponent would consider to be best given these scores.
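The scoring rule just described can be sketched on an explicit tree as follows. The class GameNode and its representation (a leaf carries a score, an inner node carries a list of children) are assumptions made purely for illustration; they do not appear elsewhere in this text.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** A node of an explicit game tree (hypothetical illustration). */
class GameNode {
    final int value;                 // score of a leaf; unused for inner nodes
    final List<GameNode> children;   // empty for a leaf

    /** A leaf position with the given score. */
    GameNode(int value) {
        this.value = value;
        this.children = Collections.emptyList();
    }

    /** An inner position whose possible next positions are KIDS. */
    GameNode(GameNode... kids) {
        this.value = 0;
        this.children = Arrays.asList(kids);
    }

    /** The minimax score of this position. MAXIMIZING is true when it is
     *  your move here and false when it is the opponent's. */
    int minimax(boolean maximizing) {
        if (children.isEmpty()) return value;
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (GameNode child : children) {
            int v = child.minimax(!maximizing);
            best = maximizing ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }
}
```

For a root (your move) with two children (opponent's move) whose leaves score 3 and 12, and 2 and 8, the children receive the minima 3 and 2, and the root receives the maximum of those, 3.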

This procedure explains how to assign scores to inner nodes, but it doesn't help with the leaves (the base case of the recursion). If our tree is complete in the sense that each leaf node represents a final position in the game, it's easy to assign leaf values. You can choose some positive value for positions in which you have won, some negative value for positions in which your opponent has won, and zero for ties. With such a tree, if you have the first move, then you know that you can force a win if the root node has a positive value (just choose a child with that value), force a tie if the top node has 0 value (likewise), and that you will always suffer defeat (against a perfect opponent) if the top node has a negative value.

[Figure 6.14 diagram: a four-level game tree whose levels are labeled, from the top, Your move, Opponent's move, Your move, Opponent's move.]

Figure 6.14: A game tree. Nodes are positions, edges are moves, and numbers are scores that estimate "goodness" of each position for you. Stars indicate which child would be chosen from the position above it.

However, for most interesting games, the game tree is too big either to store or even to compute, except near the very end of the game. So we cut off computation at some point, even though the resulting leaf positions are not final positions. Typically, we choose a maximum depth, and use some heuristic to compute the value for the leaf based just on the position itself (called a static valuation). A slight variation is to use iterative deepening: repeating the search at increasing depths until we reach some time limit, and taking the best result found so far.

6.5.1 Alpha-beta pruning

As with any tree search, game-tree searches are exponential in the depth of the tree (the number of moves (or ply) one looks ahead). Furthermore, game trees can have fairly substantial branching factors (the term used for the average number of next positions, or children, of a node). It's easy to see that if one has 16 choices for each move, one will not be able to look very many moves ahead. We can mitigate this problem somewhat by pruning the game tree as we search it.

One technique, known as alpha-beta pruning, is based on a simple observation: if I have already calculated that moving to a certain position, P, will get me a score of at least α, and I have partially evaluated some other possible position, Q, to the point that I know its value will be < α, then I can cease any further computation of Q (pruning its unexplored branches), knowing that I will never choose it. Likewise, when computing values for the opponent, if I determine that a certain position will yield a value no more than β (bigger scores are better for me, worse for the opponent), then I can stop computation on any other position for the opponent whose value is known to be > β. This observation leads to the technique of alpha-beta pruning.

[Figure 6.15 diagram: the tree of Figure 6.14 with pruned subtrees omitted; levels are labeled, from the top, Your move, Opponent's move, Your move, Opponent's move, and two nodes carry the bounds ≤ −20 and ≥ 5.]

Figure 6.15: Alpha-beta pruning applied to the game tree from Figure 6.14. Missing subtrees have been pruned.

    /** A legal move for WHO that either has an estimated value >= CUTOFF
     *  or that has the best estimated value for player WHO, starting from
     *  position START, and looking up to DEPTH moves ahead. */
    Move findBestMove(Player who, Position start, int depth, double cutoff)
    {
        if (start is a won position for who) return WON_GAME;   /* Value = ∞ */
        else if (start is a lost position for who) return LOST_GAME; /* Value = −∞ */
        else if (depth == 0) return guessBestMove(who, start, cutoff);

        Move bestSoFar = Move.REALLY_BAD_MOVE;
        for (each legal move, M, for who from position start) {
            Position next = start.makeMove(M);
            /* Negate here and below because best for opponent = worst for WHO */
            Move response = findBestMove(who.opponent(), next,
                                         depth-1, -bestSoFar.value());
            if (-response.value() > bestSoFar.value()) {
                Set M's value to -response.value();
                bestSoFar = M;
                if (M.value() >= cutoff) break;
            }
        }
        return bestSoFar;
    }

    /** Static evaluation function. */
    Move guessBestMove(Player who, Position start, double cutoff)
    {
        Move bestSoFar;
        bestSoFar = Move.REALLY_BAD_MOVE;
        for (each legal move, M, for who from position start) {
            Position next = start.makeMove(M);
            Set M's value to heuristic guess of value to who of next;
            if (M.value() > bestSoFar.value()) {
                bestSoFar = M;
                if (M.value() >= cutoff)
                    break;
            }
        }
        return bestSoFar;
    }

Figure 6.16: Game-tree search with alpha-beta pruning.

For example, consider Figure 6.15. At the '≥ 5' position, I know that the opponent will not choose to move here (since he already has a −5 move). At the '≤ −20' position, my opponent knows that I will never choose to move here (since I already have a −5 move).

Alpha-beta pruning is by no means the only way to speed up search through a game tree. Much more sophisticated search strategies are possible, and are covered in AI courses.

6.5.2 A game-tree search algorithm

The pseudocode in Figure 6.16 summarizes the discussion in this section. If you examine the figure, you'll see that the game tree we've been talking about in this section never actually materializes. Instead, we generate the children of a node as we need them, and throw them away when no longer needed. Indeed, there is no tree data structure present at all; the trees shown in Figures 6.14 and 6.15 are conceptual, or if you prefer, they describe computations rather than data structures.
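For readers who would like something executable, the following is a small sketch of alpha-beta pruning over an explicit tree. Note that it departs from Figure 6.16 in several ways that are assumptions of this example: the tree is materialized as nested Object arrays with Integer leaves, the search keeps separate alpha and beta bounds instead of negating values, and a counter (leavesVisited) is included only to make the pruning visible.

```java
/** Alpha-beta search over an explicit tree (illustrative sketch). */
class AlphaBeta {
    /** Number of leaves evaluated since this was last reset. */
    static int leavesVisited = 0;

    /** The minimax value of NODE, which is either an Integer (a leaf
     *  score) or an Object[] of child nodes. MAXIMIZING tells whose move
     *  it is. ALPHA is a score the maximizer can already guarantee and
     *  BETA one the minimizer can already guarantee. */
    static int search(Object node, boolean maximizing, int alpha, int beta) {
        if (node instanceof Integer) {
            leavesVisited += 1;
            return (Integer) node;
        }
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (Object child : (Object[]) node) {
            int v = search(child, !maximizing, alpha, beta);
            if (maximizing) {
                best = Math.max(best, v);
                alpha = Math.max(alpha, best);
            } else {
                best = Math.min(best, v);
                beta = Math.min(beta, best);
            }
            if (alpha >= beta)
                break;          // the remaining children cannot affect the result
        }
        return best;
    }
}
```

On a root with two opponent nodes holding leaves 3, 12 and 2, 8, the first opponent node yields 3. At the second, the leaf 2 already shows that node is worth at most 2, so the leaf 8 is never examined: three leaves are visited instead of four.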

Exercises

6.1. Fill in a concrete implementation for the type QuadTree that has the following constructor:

    /** An initially empty quadtree that is restricted to contain points
     *  within the W x H rectangle whose center is at (X0, Y0). */
    public QuadTree(double x0, double y0, double w, double h) ...

and no other constructors.

6.2. Fill in a concrete implementation for the type QuadTree that has the following constructor:

    /** An initially empty quadtree. */
    public QuadTree() ...

and no other constructors. This problem is more difficult than the preceding exercise, because there is no a priori limit on the boundaries of the entire region. While you could simply use the maximum and minimum floating-point numbers for these bounds, the result would in general be a wasteful tree structure with many useless levels. Therefore, it makes sense to grow the region covered, as necessary, starting from some arbitrary initial size.

6.3. Suppose that we introduce a new kind of removal operation for BSTs that have parent pointers (see §6.1.4):

    /** Delete the label T.label() from the BST containing T, assuming
     *  that the parent of T is not null. The nodes of the original tree
     *  will be modified. */
    public static BST remove(BST T) { ··· }

Give an implementation of this operation.

6.4. The implementation of BSTSet in §6.2 left out one detail: the implementation of the size method. Indeed, the representation given in Figure 6.7 provides no way to compute it other than to count the nodes in the underlying BST each time. Show how to augment the representation of BSTSet or its nested classes as necessary so as to allow a constant-time implementation of size. Remember that the size of any view of a BSTSet might change when you add or remove elements from any other view of the same BSTSet.

6.5. Assume that we have a heap that is stored with the largest element at the root. To print all elements of this heap that are greater than or equal to some key X, we could perform the removeFirst operation repeatedly until we get something less than X, but this would presumably take worst-case time Θ(k lg N), where N is the number of items in the heap and k is the number of items greater than or equal to X. Furthermore, of course, it changes the heap. Show how to perform this operation in Θ(k) time without modifying the heap.

Chapter 7 Hashing

Sorted arrays and binary search trees all allow fast queries of the form "is there something larger (smaller) than X in here?" Heaps allow the query "what is the largest item in here?" Sometimes, however, we are interested in knowing only whether some item is present; in other words, only in equality.

Consider again the isIn procedure from §1.3.1, a linear search in a sorted array. This algorithm requires an amount of time at least proportional to N, the number of items stored in the array being searched. If we could reduce N, we would speed up the algorithm. One way to reduce N is to divide the set of keys being searched into some number, say M, of disjoint subsets and to then find some fast way of choosing the right subset. By dividing the keys more-or-less evenly among subsets, we can reduce the time required to find something to be proportional, on the average, to N/M. This is what binary search does recursively (isInB from §1.3.4), with M = 2. If we could go even further and choose a value for M that is comparable to N, then the time required to find a key becomes almost constant.

The problem is to find a way, preferably fast, of picking subsets (bins) in which to put keys to be searched for. This method must be consistent, since whenever we are asked to search for something, we must go to the subset we originally selected for it. That is, there must be a function, known as a hashing function, that maps keys to be searched for into the range of values 0 to M − 1.
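One simple way to obtain such a function in Java, sketched here under the assumption that the key's own hashCode is a reasonable starting point, is to reduce the hash code modulo M. The class and method names (Bins, bin) are hypothetical.

```java
/** Maps keys to bin numbers (illustrative sketch). */
class Bins {
    /** A bin number in the range 0 to M-1 for KEY, assuming M > 0.
     *  Equal keys always receive the same bin. */
    static int bin(Object key, int M) {
        // floorMod never yields a negative result for positive M,
        // even when hashCode() itself is negative.
        return (key == null) ? 0 : Math.floorMod(key.hashCode(), M);
    }
}
```

Math.floorMod is used rather than the % operator because % can return a negative value when its left operand is negative, which would not be a valid bin number.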

7.1 Chaining

Once we have this hashing function, we must also have a representation of the set of bins. Perhaps the simplest scheme is to use linked lists to represent the bins, a practice known as chaining in the hash-table literature. The standard Java library class HashSet uses just such a strategy, illustrated in Figure 7.1. More usually, hash tables appear as mappings, such as implementations of the standard Java interface java.util.Map. The representation is the same, except that the entries in the bins carry not only keys, but also the additional information that is supposed to be indexed by those keys. Figure 7.2 shows part of a possible implementation of the standard Java class java.util.HashMap, which is itself an implementation of the Map interface.

The HashMap class shown in Figure 7.2 uses the hashCode method defined for all Java Objects to select a bin number for any key. If this hash function is a good one, the bins will receive roughly equal numbers of items (see §7.3 for more discussion). We can decide on an a priori limit on the average number of items per bin, and then grow the table whenever that limit is exceeded. This is the purpose of the loadFactor field and constructor argument. It's natural to ask whether we might use a "faster" data structure (such as a binary search tree) for the bins. However, if we really do choose reasonable values for the size of the table, so that each bin contains only a few items, that clearly won't gain us much. Growing the bins array when it exceeds our chosen limit is like growing an ArrayList (§4.1). For good asymptotic time performance, we roughly double its size each time it becomes necessary to grow the table. We have to remember, in addition, that the bin numbers of most items will change, so that we'll have to move them.
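The growth policy just described can be tried out on a stripped-down chained set. The sketch below is not the implementation of Figure 7.2: the class ChainedSet, its use of java.util.LinkedList for bins, its simple doubling (rather than moving to a prime size), and its lack of support for null keys are all simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** A set of non-null keys stored in a chained hash table (sketch). */
class ChainedSet<T> {
    private List<LinkedList<T>> bins = new ArrayList<>();
    private int size = 0;
    private final double maxLoad;   // limit on average items per bin

    /** An empty set with INITIALBINS > 0 bins and load limit MAXLOAD. */
    ChainedSet(int initialBins, double maxLoad) {
        this.maxLoad = maxLoad;
        for (int i = 0; i < initialBins; i += 1) bins.add(new LinkedList<>());
    }

    private int hash(Object key) {
        return Math.floorMod(key.hashCode(), bins.size());
    }

    boolean contains(Object key) {
        return bins.get(hash(key)).contains(key);
    }

    /** Add KEY if not already present; return true iff it was added. */
    boolean add(T key) {
        if (contains(key)) return false;
        bins.get(hash(key)).addFirst(key);
        size += 1;
        if (size > bins.size() * maxLoad) grow();
        return true;
    }

    /** Double the number of bins and move every key to its new bin. */
    private void grow() {
        List<LinkedList<T>> old = bins;
        bins = new ArrayList<>();
        for (int i = 0; i < 2 * old.size(); i += 1) bins.add(new LinkedList<>());
        for (LinkedList<T> bin : old)
            for (T key : bin) bins.get(hash(key)).addFirst(key);
    }

    int size() { return size; }
    int binCount() { return bins.size(); }
}
```

Because hash depends on the current number of bins, grow must rehash every key; this is the movement of items mentioned at the end of the paragraph above.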

7.2 Open-address hashing

In the Good Old Days, the overhead of "all those link fields" and the expense of "all those new operations" led people to consider ways of avoiding linked lists for representing the contents of bins. The open-addressing schemes put the entries directly into the bins (one per bin). If a bin is already full, then subsequent entries that have the same hash value overflow into other, unused entries according to some systematic scheme. As a result, the put operation from Figure 7.2 would look something like this:

    public Val put(Key key, Val value) {
        int h = hash(key);
        while (bins.get(h) != null && !bins.get(h).key.equals(key))
            h = nextProbe(h);
        if (bins.get(h) == null) {
            bins.set(h, new entry);
            size += 1;
            if ((float) size / bins.size() > loadFactor)
                resize bins;
            return null;
        } else
            return bins.get(h).setValue(value);
    }

and get would be similarly modified.

The function nextProbe provides another value in the index range of bins for use when it turns out that the table is already occupied at position h (a situation known as a collision).

In the simplest case nextProbe(h) simply returns (h+1)%bins.size(), an instance of what is known as linear probing. More generally, linear probing
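To make the probing loop concrete, here is a small runnable sketch of an open-address set of int keys that uses linear probing. The class ProbingSet, the fixed policy of keeping the table at most half full, and the growth to 2n+1 slots are assumptions chosen to keep the example short; they are not taken from Figure 7.2.

```java
/** A set of ints stored by open addressing with linear probing (sketch). */
class ProbingSet {
    private Integer[] bins;     // null marks an unused slot
    private int size = 0;

    /** An empty set with CAPACITY > 0 slots. */
    ProbingSet(int capacity) { bins = new Integer[capacity]; }

    private int hash(int key) { return Math.floorMod(key, bins.length); }

    private int nextProbe(int h) { return (h + 1) % bins.length; }

    /** The slot holding KEY, or the empty slot that ends its probe
     *  sequence. Terminates because the table is never full. */
    int slot(int key) {
        int h = hash(key);
        while (bins[h] != null && bins[h].intValue() != key)
            h = nextProbe(h);
        return h;
    }

    boolean contains(int key) { return bins[slot(key)] != null; }

    /** Add KEY if not already present; return true iff it was added. */
    boolean add(int key) {
        int h = slot(key);
        if (bins[h] != null) return false;
        bins[h] = key;
        size += 1;
        if (2 * size > bins.length) resize();   // stay at most half full
        return true;
    }

    /** Move all keys into a table of roughly twice the size. */
    private void resize() {
        Integer[] old = bins;
        bins = new Integer[2 * old.length + 1];
        size = 0;
        for (Integer k : old)
            if (k != null) add(k);
    }

    int size() { return size; }
}
```

With 11 slots, the keys 0, 11, and 22 all hash to slot 0; the second and third collide and overflow into slots 1 and 2.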

    nums:   size: 17   loadFactor: 2.0   bins:

    bin   items
     0    0, 22
     1    23
     2
     3
     4    26, 81
     5    5, 82, 38
     6    83, 39
     7    84, 40
     8    63, -3
     9    9, 86
    10    65

Figure 7.1: Illustration of a simple hash table with chaining, pointed to by the variable nums. The table contains 11 bins, each containing a pointer to a linked list of the items (if any) in that bin. This particular table represents the set

    {81, 22, 38, 26, 86, 82, 0, 23, 39, 65, 83, 40, 9, −3, 84, 63, 5}.

The hash function is simply h(x) = x mod 11 on integer keys. (The mathematical operation a mod b is defined to yield a − b⌊a/b⌋ when b ≠ 0. Therefore, it is always non-negative if b > 0.) The current load factor in this set is 17/11 ≈ 1.5, against a maximum of 2.0 (the loadFactor field), although as you can see, the bin sizes vary from 0 to 3.

    package java.util;

    public class HashMap<Key,Val> extends AbstractMap<Key,Val> {

        /** A new, empty mapping using a hash table that initially has
         *  INITIALBINS bins, and maintains a load factor <= LOADFACTOR. */
        public HashMap(int initialBins, float loadFactor) {
            if (initialBins < 1 || loadFactor <= 0.0)
                throw new IllegalArgumentException();
            bins = new ArrayList<Entry<Key,Val>>(initialBins);
            bins.addAll(Collections.nCopies(initialBins, null));
            size = 0; this.loadFactor = loadFactor;
        }

        /** An empty map with INITIALBINS initial bins and load factor 0.75. */
        public HashMap(int initialBins) { this(initialBins, 0.75f); }

        /** An empty map with default initial bins and load factor 0.75. */
        public HashMap() { this(127, 0.75f); }

        /** A mapping that is a copy of M. */
        public HashMap(Map<Key,Val> M) { this(M.size(), 0.75f); putAll(M); }

        public Val get(Object key) {
            Entry<Key,Val> e = find(key, bins.get(hash(key)));
            return (e == null) ? null : e.value;
        }

        /** Cause get(KEY) == VALUE. Returns the previous get(KEY). */
        public Val put(Key key, Val value) {
            int h = hash(key);
            Entry<Key,Val> e = find(key, bins.get(h));
            if (e == null) {
                bins.set(h, new Entry<Key,Val>(key, value, bins.get(h)));
                size += 1;
                if (size > bins.size() * loadFactor) grow();
                return null;
            } else
                return e.setValue(value);
        }

        ···

Figure 7.2: Part of an implementation of class java.util.HashMap, a hash-table-based implementation of the java.util.Map interface.

    private static class Entry<K, V> implements Map.Entry<K, V> {
        K key; V value;
        Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next)
            { this.key = key; this.value = value; this.next = next; }
        public K getKey() { return key; }
        public V getValue() { return value; }
        public V setValue(V x)
            { V old = value; value = x; return old; }
        public int hashCode() { see Figure 2.14 }
        public boolean equals(Object x) { see Figure 2.14 }
    }

    private ArrayList<Entry<Key, Val>> bins;
    private int size;   /** Number of items currently stored */
    private float loadFactor;

    /** Increase number of bins. */
    private void grow() {
        HashMap<Key, Val> newMap
            = new HashMap<Key, Val>(primeAbove(bins.size() * 2), loadFactor);
        newMap.putAll(this); copyFrom(newMap);
    }

    /** Return a value in the range 0 .. bins.size()-1, based on
     *  the hash code of KEY. */
    private int hash(Object key) {
        return (key == null) ? 0
            : (0x7fffffff & key.hashCode()) % bins.size();
    }

    /** Set THIS to the contents of S, destroying the previous
     *  contents of THIS, and invalidating S. */
    private void copyFrom(HashMap<Key, Val> S)
        { size = S.size; bins = S.bins; loadFactor = S.loadFactor; }

    /** The Entry in the list BIN whose key is KEY, or null if none. */
    private Entry<Key, Val> find(Object key, Entry<Key, Val> bin) {
        for (Entry<Key, Val> e = bin; e != null; e = e.next)
            if (key == null ? e.key == null : key.equals(e.key))
                return e;
        return null;
    }

    private int primeAbove(int N) { return a prime number ≥ N; }
}

Figure 7.2, continued: Private declarations for HashMap.

adds a positive constant that is relatively prime to the table size bins.size() [why relatively prime?]. If we take the 17 keys of Figure 7.1:

    {81, 22, 38, 26, 86, 82, 0, 23, 39, 65, 83, 40, 9, -3, 84, 63, 5}

and insert them in this order into an array of size 23 using linear probing with increment 1 and x mod 23 as the hash function, the array of bins will contain the following keys (a dot marks an unoccupied bin):

    Bin:   0   1   2   3   4   5   6   7   8   9  10  11
    Key:   0  23  63  26   .   5   .   .   .   9   .   .

    Bin:  12  13  14  15  16  17  18  19  20  21  22
    Key:  81  82  83  38  39  86  40  65  -3  84  22

As you can see, several keys are displaced from their natural positions. For example, 84 mod 23 = 15 and 63 mod 23 = 17.

There is a clustering phenomenon associated with linear probing. The problem is simple to see with reference to the chaining method. If the sequence of entries examined in searching for some key is, say, b_0, b_1, ..., b_n, and if any other key should hash to one of these b_i, then the sequence of entries examined in searching for it will be part of the same sequence, b_i, b_(i+1), ..., b_n, even when the two keys have different hash values. In effect, what would be two distinct lists under chaining are merged together under linear probing, as much as doubling the effective average size of the bins for those keys. The longest chain for our set of integers (see Figure 7.1) was only 3 long. In the open-address example above, the longest chain is 9 items long (look at 63), even though only two other keys (86 and 40) have the same hash value.

By having nextProbe increment the value by different amounts, depending on the original key (a technique known as double hashing), we can ameliorate this effect.

Deletion from an open-addressed hash table is non-trivial. Simply marking an entry as “unoccupied” can break the chain of colliding entries, and delete more than the desired item from the table [why?]. If deletion is necessary (often, it is not), we have to be more subtle. The interested reader is referred to volume 3 of Knuth, The Art of Computer Programming.

The problem with open-address schemes in general is that keys that would be in separate bins under the chaining scheme can compete with each other. Under the chaining scheme, if all entries are full and we search for a key that is not in the table, the search requires only as many probes (i.e., tests for equality) as there are keys in the table that have the same hashed value. Under any open-addressing scheme, it would require N probes to find that the key is not in the table. In my experience, the cost of the extra link nodes required for chaining is relatively unimportant, and for most purposes, I recommend using chaining rather than open-address schemes.

7.3 The hash function

This leaves the question of what to use for the function hash, used to choose the bin in which to place a key. In order for the map or set we are implementing to work properly, it is first important that our hash function satisfy two constraints:

1. For any key value, K, the value of hash(K) must remain constant while K is in the table during the execution of the program (or the table must be reconstructed if hash(K) changes).

2. If two keys are equal (according to the equals method, or whatever equality test the hash table is using), then their hash values must be equal.

If either condition is violated, a key can effectively disappear from the table. On the other hand, it is not generally necessary for the value of hash to be constant from one execution of a program to the next, nor is it necessary that unequal keys have unequal hash values (although performance will clearly suffer if too many keys have the same hash value).
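The first constraint is easy to violate by accident with mutable keys. The following sketch uses the standard java.util.HashSet with a List as the key, chosen here only because a List's hashCode depends on its contents; the class and method names are illustrative. After the key is modified in place, the table looks in the bin for the new hash value and no longer finds the entry.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MutableKeyDemo {
    /** Returns true iff the set still finds KEY after KEY has been
     *  modified while stored in the set. */
    static boolean foundAfterMutation() {
        Set<List<Integer>> table = new HashSet<>();
        List<Integer> key = new ArrayList<>();
        key.add(1);
        table.add(key);          // stored under the hash code of [1]
        key.add(2);              // hash code is now that of [1, 2]
        return table.contains(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation());
    }
}
```

The entry is still physically in the table (its size remains 1), but a lookup with the modified key fails, which is the "disappearing key" effect described above.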

If the keys are simply non-negative integers, a simple and effective function is to use the remainder modulo the table size:

    hash(X) = X % bins.size();

For integers that might be negative, we have to make some provision. For example

    hash(X) = (X & 0x7fffffff) % bins.size();

has the effect of adding 2^31 to any negative value of X first [why?]. Alternatively, if bins.size() is odd, then

    hash(X) = X % ((bins.size()+1)/2) + bins.size()/2;

will also work [why?].
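A small sketch shows why some provision is needed: in Java the % operator takes the sign of its left operand, so a plain remainder can be negative. The method names maskHash and foldHash are illustrative labels for the two formulas above, and the table size 23 is an arbitrary odd example.

```java
final class IntHashDemo {
    /** Clear the sign bit, then take the remainder. */
    static int maskHash(int x, int size) {
        return (x & 0x7fffffff) % size;
    }

    /** For odd SIZE: x % ((size+1)/2) lies in -(size/2) .. size/2,
     *  so adding size/2 shifts it into 0 .. size-1. */
    static int foldHash(int x, int size) {
        return x % ((size + 1) / 2) + size / 2;
    }

    public static void main(String[] args) {
        int size = 23;
        System.out.println(-3 % size);            // negative: unusable as an index
        System.out.println(maskHash(-3, size));
        System.out.println(foldHash(-3, size));
    }
}
```

Both provisions map every int into the range 0 .. size-1, although they generally send a given negative key to different bins.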

Handling non-numeric key values requires a bit more work. All Java objects have defined on them a hashCode method that we have used to convert Objects into integers (whence we can apply the procedure on integers above). The default implementation of x.equals(y) on Object is x==y, that is, that x and y are references to the same object. Correspondingly, the default implementation of x.hashCode() supplied by Object simply returns an integer value that is derived from the address of the object pointed to by x, that is, by the pointer value x treated as an integer (which is all it really is, behind the scenes). This default implementation is not suitable for cases where we want to consider two different objects to be the same. For example, the two Strings s1 and s2 computed by

    String h = "Hello,";
    String s1 = "Hello, world!", s2 = h + " " + "world!";

will have the property that s1.equals(s2), but s1 != s2 (that is, they are two different String objects that happen to contain the same sequence of characters). Hence, the default hashCode operation is not suitable for String, and therefore the String class overrides the default definition with its own.

For converting to an index into bins, we used the remainder operation. This obviously produces a number in range; what is not so obvious is why we chose the table sizes we did (primes not close to a power of 2). Suffice it to say that other choices of size tend to produce unfortunate results. For example, using a power of 2 means that the high-order bits of X.hashCode() get ignored.

If keys are not simple integers (strings, for example), a workable strategy is to first mash them into integers and then apply the remaindering method above. Here is a representative string-hashing function that does well empirically, taken from a C compiler by P. J. Weinberger¹. It assumes 8-bit characters and 32-bit ints.

    static int hash(String S)
    {
        int h;
        h = 0;
        for (int p = 0; p < S.length(); p += 1) {
            h = (h << 4) + S.charAt(p);
            h = (h ^ ((h & 0xf0000000) >> 24)) & 0x0fffffff;
        }
        return h;
    }

The Java String type has a different function for hashCode, which computes

    Σ_{0 ≤ i < n} c_i · 31^(n−i−1)

using modular int arithmetic to get a result in the range −2^31 to 2^31 − 1. Here, c_i denotes the ith character in the String, and n is its length.
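The sum can be evaluated with Horner's rule, letting int overflow supply the modular arithmetic. The sketch below (the name polyHash is illustrative) recomputes the value and can be compared against the library's String.hashCode; it also checks that two distinct String objects with the same characters hash alike, as the second constraint on hash functions requires.

```java
final class StringHashDemo {
    /** The sum of c_i * 31^(n-i-1) over the characters of S, computed by
     *  Horner's rule; int arithmetic wraps around modulo 2^32. */
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i += 1)
            h = 31 * h + s.charAt(i);
        return h;
    }

    public static void main(String[] args) {
        String s1 = "Hello, world!";
        String s2 = new StringBuilder("Hello,").append(" world!").toString();
        System.out.println(s1.equals(s2));                      // same characters
        System.out.println(s1.hashCode() == s2.hashCode());     // so same hash code
        System.out.println(polyHash(s1) == s1.hashCode());
    }
}
```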

7.4 Performance

Assuming the keys are evenly distributed, a hash table will do retrieval in constant time, regardless of N, the number of items contained. As indicated in the analysis we did in §4.1 about growing ArrayLists, insertion also has constant amortized cost (i.e., cost averaged over all insertions). Of course, if the keys are not evenly distributed, then we can see Θ(N) cost.

If there is a possibility that one hash function will sometimes have bad clustering problems, a technique known as universal hashing can help. Here, you choose a hash function at random from some carefully chosen set. On average over all runs of your program, your hash function will then perform well.
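As an illustration of "choosing a hash function at random," the sketch below draws one member of a commonly described family, h(x) = ((a·x + b) mod P) mod m, where P is a prime larger than any key and a and b are picked at random. This particular family, the prime P = 2^31 − 1, and the class name UniversalHash are assumptions made for the example; the text does not single out a specific set of functions.

```java
import java.util.Random;

/** One randomly chosen member of the family
 *  h(x) = ((a*x + b) mod P) mod m, for keys 0 <= x < P. */
final class UniversalHash {
    private static final long P = 2147483647L;   // the prime 2^31 - 1
    private final long a, b;
    private final int m;

    /** Choose a function at random for a table with NUMBINS bins. */
    UniversalHash(int numBins, Random rnd) {
        m = numBins;
        a = 1 + rnd.nextInt((int) (P - 1));      // 1 <= a <= P-1
        b = rnd.nextInt((int) P);                // 0 <= b <= P-1
    }

    /** Bin for key X, where 0 <= X < P. The product fits in a long. */
    int hash(int x) {
        return (int) (((a * x + b) % P) % m);
    }

    public static void main(String[] args) {
        UniversalHash h = new UniversalHash(23, new Random());
        for (int x : new int[] {81, 22, 38, 26, 86})
            System.out.println(x + " -> bin " + h.hash(x));
    }
}
```

Each run of the program may place the same keys in different bins, so no fixed set of keys is bad for every run.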

Exercises

7.1. Give an implementation for the iterator function over the HashMap representation given in §7.1, and the Iterator class it needs. Since we have chosen rather simple linked lists, you will have to use care in getting the remove operation right.

¹ The version here is adapted from Aho, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986, p. 436.


Chapter 8

Sorting and Selecting

At least at one time, most CPU time and I/O bandwidth was spent sorting (these days, I suspect more may be spent rendering MPEG files). As a result, sorting has been the subject of extensive study and writing. We will hardly scratch the surface here.

8.1 Basic concepts

The purpose of any sort is to permute some set of items that we’ll call records so that they are sorted according to some ordering relation. In general, the ordering relation looks at only part of each record, the key. The records may be sorted according to more than one key, in which case we refer to the primary key and to secondary keys. This distinction is actually realized in the ordering function: record A comes before B iff either A’s primary key comes before B’s, or their primary keys are equal and A’s secondary key comes before B’s. One can extend this definition in an obvious way to hierarchies of multiple keys. For the purposes of this book, I’ll usually assume that records are of some type Record and that there is an ordering relation on the records we are sorting. I’ll write before(A, B) to mean that the key of A comes before that of B in whatever order we are using.

Although conceptually we move around the records we are sorting so as to put them in order, in fact these records may be rather large. Therefore, it is often preferable to keep around pointers to the records and exchange those instead. If necessary, the real data can be physically re-arranged as a last step. In Java, this is very easy of course, since “large” data items are always referred to by pointers.

Stability. A sort is called stable if it preserves the original order of records that have equal keys. Any sort can be made stable by (in effect) adding the original record position as a final secondary key, so that the list of keys (Bob, Mary, Bob, Robert) becomes something like (Bob.1, Mary.2, Bob.3, Robert.4).

Inversions. For some analyses, we need to have an idea of how out-of-order a given sequence of keys is. One useful measure is the number of inversions in the sequence: in a sequence of keys k_0, ..., k_(N−1), this is the number of pairs of integers, (i, j), such that i < j and k_i > k_j. For example, there are two inversions in the sequence of words

    Charlie, Alpha, Bravo

and three inversions in

    Charlie, Bravo, Alpha.

When the keys are already in order, the number of inversions is 0, and when they are in reverse order, so that every pair of keys is in the wrong order, the number of inversions is N(N − 1)/2, which is the number of pairs of keys. When all keys are originally within some distance D of their correct positions in the sorted permutation, we can establish a pessimistic upper bound of DN inversions in the original permutation.
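The definition translates directly into a (quadratic-time) counting loop. This is only a sketch for checking small examples such as the two word sequences above; the class name Inversions is illustrative.

```java
final class Inversions {
    /** Number of pairs (i, j) with i < j and keys[i] > keys[j]. */
    static <T extends Comparable<? super T>> int count(T[] keys) {
        int n = 0;
        for (int i = 0; i < keys.length; i += 1)
            for (int j = i + 1; j < keys.length; j += 1)
                if (keys[i].compareTo(keys[j]) > 0)
                    n += 1;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count(new String[] {"Charlie", "Alpha", "Bravo"}));
        System.out.println(count(new String[] {"Charlie", "Bravo", "Alpha"}));
    }
}
```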

Internal vs. external sorting. A sort that is carried out entirely in primary memory is known as an internal sort. Those that involve auxiliary disks (or, in the old days especially, tapes) to hold intermediate results are called external sorts. The sources of input and output are irrelevant to this classification (one can have internal sorts on data that comes from an external file; it’s just the intermediate files that matter).

8.2 A Little Notation

Many of the algorithms in these notes deal with (or can be thought of as dealing with) arrays. In describing or commenting them, we sometimes need to make assertions about the contents of these arrays. For this purpose, I am going to use a notation used by David Gries to make descriptive comments about my arrays. The notation

      a             b
     +---------------+
     |       P       |
     +---------------+

denotes a section of an array whose elements are indexed from a to b and that satisfies property P. It also asserts that a ≤ b + 1; if a > b, then the segment is empty. I can also write

   c +---------------+ d
     |       P       |
     +---------------+

to describe an array segment in which items c + 1 to d − 1 satisfy P, and that c < d. By putting these segments together, I can describe an entire array. For example,

         0             i             N
    A:  +-------------+-------------+
        |   ordered   |             |
        +-------------+-------------+

is true if the array A has N elements, elements 0 through i − 1 are ordered, and 0 ≤ i ≤ N. A notation such as

       j
     +---+
     | P |
     +---+

denotes a 1-element array segment whose index is j and whose (single) value satisfies P. Finally, I’ll occasionally need to have simultaneous conditions on nested pieces of an array. For example,

        +-------------+
        |      Q      |
        +-------------+-------------+
        |             P             |
        +---------------------------+
         0             i             N

refers to an array segment in which items 0 to N − 1 satisfy P, items 0 to i − 1 satisfy Q, 0 ≤ N, and 0 ≤ i ≤ N.

8.3 Insertion sorting

One very simple sort (and quite adequate for small applications, really) is the straight insertion sort. The name comes from the fact that at each stage, we insert an as-yet-unprocessed record into a (sorted) list of the records processed so far, as illustrated in Figure 8.2. The algorithm is shown in Figure 8.1.

A common way to measure the time required to do a sort is to count the comparisons of keys (for Figure 8.1, the calls to before). The total (worst-case) time required by insertionSort is

    Σ_{0 < i < N} C_IL(i),

where C_IL(m) is the cost of the inner (j) loop when i = m, and N is the size of A. Examination of the inner loop shows that the number of comparisons required is equal to the number of records numbered 0 to i-1 whose keys are larger than that of x, plus one if there is at least one smaller key. Since A[0..i-1] is sorted, it contains no inversions, and therefore, the number of elements after X in the sorted part of A happens to be equal to the number of inversions in the sequence A[0], ..., A[i] (since X is A[i]). When X is inserted correctly, there will be no inversions in the resulting sequence. It is fairly easy to work out from that point that the running time of insertionSort, measured in key comparisons, is bounded by I + N − 1, where I is the total number of inversions in the original argument to insertionSort. Thus, the more sorted an array is to begin with, the faster insertionSort runs.
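The bound I + N − 1 can be checked experimentally. The following sketch is an int-array variant of insertion sort instrumented to count key comparisons, together with a brute-force inversion counter; the class name and the use of int keys in place of Records are assumptions made to keep the example self-contained.

```java
final class InsertionSortCount {
    /** Sort A in place by straight insertion and return the number of
     *  key comparisons performed. */
    static long sortCounting(int[] A) {
        long comparisons = 0;
        for (int i = 1; i < A.length; i += 1) {
            int x = A[i];
            int j;
            for (j = i; j > 0; j -= 1) {
                comparisons += 1;              // plays the role of before(x, A[j-1])
                if (!(x < A[j - 1]))
                    break;
                A[j] = A[j - 1];
            }
            A[j] = x;
        }
        return comparisons;
    }

    /** Number of inversions in A, by brute force. */
    static long inversions(int[] A) {
        long n = 0;
        for (int i = 0; i < A.length; i += 1)
            for (int j = i + 1; j < A.length; j += 1)
                if (A[i] > A[j])
                    n += 1;
        return n;
    }

    public static void main(String[] args) {
        int[] A = {13, 9, 10, 0, 22, 12, 4};
        long I = inversions(A);                // count before sorting
        long C = sortCounting(A);
        System.out.println(C + " comparisons; bound I + N - 1 = " + (I + A.length - 1));
    }
}
```

For an array in reverse order every comparison succeeds, so the count equals the number of inversions, N(N − 1)/2.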

8.4 Shell’s sort

The problem with insertion sort can be seen by examining the worst case, where the array is initially in reverse order. The keys are a great distance from their final resting places, and must be moved one slot at a time until they get there. If keys could be moved great distances in little time, it might speed things up a bit.


    /** Permute the elements of A to be in ascending order. */
    static void insertionSort(Record[] A) {
        int N = A.length;
        for (int i = 1; i < N; i += 1) {
            /* A:  0             i             N
             *    |   ordered   |             |        */
            Record x = A[i];
            int j;
            for (j = i; j > 0 && before(x, A[j-1]); j -= 1) {
                /* A:  0         j         i         N
                 *    |         |   > x   |         |
                 *    ordered except at j               */
                A[j] = A[j-1];
            }
            /* A:  0         j         i         N
             *    |   ≤ x   |   > x   |         |
             *    ordered except at j                   */
            A[j] = x;
        }
    }

Figure 8.1: Program for performing insertion sort on an array. The before function is assumed to embody the desired ordering relation.

    13 |  9  10   0  22  12   4
     9  13 | 10   0  22  12   4
     9  10  13 |  0  22  12   4
     0   9  10  13 | 22  12   4
     0   9  10  13  22 | 12   4
     0   9  10  12  13  22 |  4
     0   4   9  10  12  13  22

Figure 8.2: Example of insertion sort, showing the array before each call of insertElement. The gap at each point (marked | here) separates the portion of the array known to be sorted from the unprocessed portion.

    /** Permute the elements of KEYS, which must be distinct,
     *  into ascending order. */
    static void distributionSort1(int[] keys) {
        int N = keys.length;
        int L = min(keys), U = max(keys);
        java.util.BitSet b = new java.util.BitSet();
        for (int i = 0; i < N; i += 1)
            b.set(keys[i] - L);
        for (int i = L, k = 0; i <= U; i += 1)
            if (b.get(i - L)) {
                keys[k] = i; k += 1;
            }
    }

Figure 8.3: Sorting distinct keys from a reasonably small and dense set. Here, assume that the functions min and max return the minimum and maximum values in an array. Their values are arbitrary if the arrays are empty.

This is the idea behind Shell’s sort¹. We choose a diminishing sequence of strides, s_0 > s_1 > ... > s_(m−1), typically choosing s_(m−1) = 1. Then, for each j, we divide the N records into the s_j interleaved sequences

    R_0, R_(s_j), R_(2 s_j), ...,
    R_1, R_(s_j + 1), R_(2 s_j + 1), ...,
    ···
    R_(s_j − 1), R_(2 s_j − 1), ...

and sort each of these using insertion sort. Figure 8.4 illustrates the process with a vector in reverse order (requiring a total of 49 comparisons as compared with 120 comparisons for straight insertion sort).

A good sequence of s_j turns out to be s_j = ⌊2^(m−j) − 1⌋, where m = ⌊lg N⌋. With this sequence, it can be shown that the number of comparisons required is O(N^1.5), which is considerably better than O(N^2). Intuitively, the advantage of such a sequence, in which the successive s_j are relatively prime, is that on each pass, each position of the vector participates in a sort with a new set of other positions. The sorts get “jumbled” and get more of a chance to improve the number of inversions for later passes.

¹ Also known as “shellsort.” Knuth’s reference: Donald L. Shell, in the Communications of the ACM 2 (July, 1959), pp. 30–32.
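As a rough sketch of the scheme just described, the following int-array version of Shell's sort uses the strides 2^k − 1 (..., 15, 7, 3, 1) and counts key comparisons. The class name, the int keys, and the choice to process the interleaved sequences in a single left-to-right sweep per stride (rather than one sequence at a time) are assumptions of this example; the sweep performs the same comparisons, only in a different order.

```java
final class ShellSortDemo {
    /** Sort A in place with strides 2^k - 1, largest first, and return
     *  the number of key comparisons performed. */
    static long shellSort(int[] A) {
        int N = A.length;
        long comparisons = 0;
        // m = floor(lg N), taking m = 0 for arrays of fewer than 2 items.
        int m = 31 - Integer.numberOfLeadingZeros(Math.max(N, 1));
        for (int k = m; k >= 1; k -= 1) {
            int s = (1 << k) - 1;              // stride: ..., 15, 7, 3, 1
            // Insertion sort on each of the s interleaved sequences.
            for (int i = s; i < N; i += 1) {
                int x = A[i];
                int j;
                for (j = i; j >= s; j -= s) {
                    comparisons += 1;
                    if (!(x < A[j - s]))
                        break;
                    A[j] = A[j - s];
                }
                A[j] = x;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] A = new int[16];
        for (int i = 0; i < A.length; i += 1)
            A[i] = 15 - i;                     // reverse order
        System.out.println(shellSort(A) + " comparisons");
        System.out.println(java.util.Arrays.toString(A));
    }
}
```

The last stride is always 1, so the final pass is an ordinary insertion sort on an array that the earlier passes have left nearly ordered.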

                                                       #I   #C
    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0   120    0
     0 14 13 12 11 10  9  8  7  6  5  4  3  2  1 15    91    1
     0  7  6  5  4  3  2  1 14 13 12 11 10  9  8 15    42    9
     0  1  3  2  4  6  5  7  8 10  9 11 13 12 14 15     4   20
     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15     0   19

Figure 8.4: An illustration of Shell’s sort, starting with a vector in reverse order. The increments are 15, 7, 3, and 1. The column marked #I gives the number of inversions remaining in the array, and the column marked #C gives the number of key comparisons required to obtain each line from its predecessor. The arcs underneath the arrays [not shown] indicate which subsequences of elements are processed at each stage.

8.5 Distribution counting

When the range of keys is restricted, there are a number of optimizations possible. In Column #1 of his book Programming Pearls², Jon Bentley gives a simple algorithm for the problem of sorting N distinct keys, all of which are in a range of integers so limited that the programmer can build a vector of bits indexed by those integers. In the program shown in Figure 8.3, I use a Java BitSet, which is abstractly a set of non-negative integers (implemented as a packed array of 1-bit quantities).

Let’s consider a more general technique we can apply even when there are multiple records with the same key. Assume that the keys of the records to be sorted are in some reasonably small range of integers. Then the function distributionSort2 shown in Figure 8.6 sorts N records stably, moving them from an input array (A) to a different output array (B). It computes the correct final position in B for each record in A. To do so, it uses the fact that the position of any record in B is supposed to be the number of records that either have smaller keys than it has, or that have the same key, but appear before it in A. Figure 8.5 contains an example of the program in operation.

8.6 Selection sort

In insertion sort, we determine an item’s final position piecemeal. Another way to proceed is to place each record in its final position in one move by selecting the smallest (or largest) key at each step. The simplest implementation of this idea is straight selection sorting, as follows.

    static void selectionSort(Record[] A)
    {
        int N = A.length;
        for (int i = 0; i < N-1; i += 1) {
            /* A:  0             i                       N
             *    |   ordered   |   ≥ items 0..i-1      |     */
            int m, j;
            for (j = i+1, m = i; j < N; j += 1)
                if (before(A[j], A[m])) m = j;
            /* Now A[m] is the smallest element in A[i..N-1] */
            swap(A, i, m);
        }
    }

Here, swap(A,i,m) is assumed to swap elements i and m of A. This sort is not stable; the swapping of records prevents stability. On the other hand, the program can be

² Addison-Wesley, 1986. By the way, that column makes a very nice “consciousness-raising” column on the subject of appropriately-engineered solutions. I highly recommend both this book and his More Programming Pearls, Addison-Wesley, 1988.

    A:       3/A  2/B  2/C  1/D  4/E  2/F  3/G

    count1:  0 1 3 2 1
    count2:  0 1 4 6 7

    B0:       .    .    .    .   3/A   .    .      count: 0 1 5 6 7
    B1:       .   2/B   .    .   3/A   .    .      count: 0 2 5 6 7
    B2:       .   2/B  2/C   .   3/A   .    .      count: 0 3 5 6 7
    B3:      1/D  2/B  2/C   .   3/A   .    .      count: 1 3 5 6 7
    B4:      1/D  2/B  2/C   .   3/A   .   4/E     count: 1 3 5 7 7
    B5:      1/D  2/B  2/C  2/F  3/A   .   4/E     count: 1 4 5 7 7
    B6:      1/D  2/B  2/C  2/F  3/A  3/G  4/E     count: 1 4 6 7 7

Figure 8.5: Illustration of the distributionSort2 program. The values to be sorted are shown in the array marked A. The keys are the numbers to the left of the slashes. The data are sorted into the array B, shown at various points in the algorithm (a dot marks a position not yet filled). The labels at the left refer to points in the program in Figure 8.6. Each point Bk indicates the situation at the end of the last loop where i = k. The role of array count changes. First (at count1) count[k-1] contains the number of instances of key k-1. Next (at count2), it contains the number of instances of keys less than k. In the Bi lines, count[k-1] indicates the position (in B) at which to put the next instance of key k. (It’s k-1 in these places, rather than k, because 1 is the smallest key.)

    /** Assuming that A and B are not the same array and are of
     *  the same size, sort the elements of A stably into B. */
    void distributionSort2(Record[] A, Record[] B)
    {
        int N = A.length;
        int L = min(A), U = max(A);

        /* count[i-L] will contain the number of items < i */
        // NOTE: count[U-L+1] is not terribly useful, but is
        // included to avoid having to test for i == U in
        // the first i loop below.
        int[] count = new int[U-L+2];
        // Clear count: Not really needed in Java, but a good habit
        // to get into for other languages (e.g., C, C++).
        for (int j = L; j <= U+1; j += 1)
            count[j-L] = 0;

        for (int i = 0; i < N; i += 1)
            count[key(A[i])-L+1] += 1;
        /* Now count[i-L] == # of records whose key is equal to i-1 */
        // See Figure 8.5, point count1

        for (int j = L+1; j <= U; j += 1)
            count[j-L] += count[j-L-1];
        /* Now count[k-L] == # of records whose key is less than k,
         * for all k, L <= k <= U. */
        // See Figure 8.5, point count2.

        for (int i = 0; i < N; i += 1) {
            /* Now count[k-L] == # of records whose key is less than k,
             * or whose key is k and have already been moved to B. */
            B[count[key(A[i])-L]] = A[i];
            count[key(A[i])-L] += 1;
            // See Figure 8.5, points B0–B6
        }
    }

Figure 8.6: Distribution Sorting. This program assumes that key(R) is an integer.

modified to produce its output in a separate output array, and then it is relatively easy to maintain stability [how?].

It should be clear that the algorithm above is insensitive to the data. Unlike insertion sort, it always takes the same number of key comparisons: N(N − 1)/2. Thus, in this naive form, although it is very simple, it suffers in comparison to insertion sort (at least on a sequential machine).

On the other hand, we have seen another kind of selection sort before: heapsort (from §6.4) is a form of selection sort that (in effect) keeps around information about the results of comparisons from each previous pass, thus speeding up the minimum selection considerably.

8.7 Exchange sorting: Quicksort

One of the most popular methods for internal sorting was developed by C. A. R. Hoare³. Evidently much taken with the technique, he named it “quicksort.” The name is actually quite appropriate. The basic algorithm is as follows.

    static final int K = ...;

    void quickSort(Record A[])
    {
        quickSort(A, 0, A.length-1);
        insertionSort(A);
    }

    /* Permute A[L..U] so that all records are < K away from their */
    /* correct positions in sorted order. Assumes K > 0. */
    void quickSort(Record[] A, int L, int U)
    {
        if (U-L+1 > K) {
            Choose Record T = A[p], where L ≤ p ≤ U;
            P: Set i and permute A[L..U] to establish the partitioning
               condition:

                    0                    i                     N
                   |   key ≤ key(T)     | T |   key ≥ key(T)   |
               ;
            quickSort(A, L, i-1); quickSort(A, i+1, U);
        }
    }

Here, K is a constant value that can be adjusted to tune the speed of the sort. Once the approximate sort gets all records within a distance K-1 of their final locations, the final insertion sort proceeds in O(KN) time. If T can be chosen so that its key is near the median key for the records in A, then we can compute roughly that the time in key comparisons required for performing quicksort on N records is approximated by C(N), defined as follows.

    C(K) = 0
    C(N) = N − 1 + 2C(⌊N/2⌋)

This assumes that we can partition an N-element array in N − 1 comparisons, which we’ll see to be possible. We can get a sense for the solution by considering the case N = 2^m K:

    C(N) = 2^m K − 1 + 2C(2^(m−1) K)
         = 2^m K − 1 + 2^m K − 2 + 4C(2^(m−2) K)
         = (2^m K + ··· + 2^m K)  [m terms]  − 1 − 2 − 4 − ··· − 2^(m−1) + 2^m C(K)
         = m 2^m K − 2^m + 1
         ∈ Θ(m 2^m K) = Θ(N lg N)

(since lg(2^m K) = m + lg K).

Unfortunately, in the worst case (where the partition T has the largest or smallest key), quicksort is essentially a straight selection sort, with running time Θ(N^2). Thus, we must be careful in the choice of the partitioning element. One technique is to choose a random record’s key for T. This is certainly likely to avoid the bad cases. A common choice for T is the median of A[L], A[(L+U)/2], and A[U], which is also unlikely to fail.

Partitioning. This leaves the small loose end of how to partition the array at each stage (step P in the program above). There are many ways to do this. Here is one due to Nico Lomuto (not the fastest, but simple).

    P:
        swap(A, L, p);
        i = L;
        for (int j = L+1; j <= U; j += 1) {
            /* A[L..U]:   L       i         j         U
             *           | T | < T  | ≥ T  |         |     */
            if (before(A[j], T)) {
                i += 1;
                swap(A, j, i);
            }
        }
        /* A[L..U]:   L       i              U
         *           | T | < T  |    ≥ T     |     */
        swap(A, L, i);
        /* A[L..U]:   L         i            U
         *           |   < T   | T |   ≥ T   |     */

³ Knuth’s reference: Computing Journal 5 (1962), pp. 10–15.
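Putting the pieces together, here is a runnable sketch on int arrays: an approximate quicksort with cutoff K, a median-of-three choice of T, Lomuto's partitioning, and a final insertion sort. The value K = 8, the int keys, and the helper names medianOfThree and partition are assumptions made for illustration; they are not prescribed by the outline above.

```java
import java.util.Arrays;

final class QuickSortDemo {
    static final int K = 8;      // cutoff; an arbitrary illustrative value

    static void quickSort(int[] A) {
        quickSort(A, 0, A.length - 1);   // leaves items < K from their places
        insertionSort(A);                // finishes the job
    }

    static void quickSort(int[] A, int L, int U) {
        if (U - L + 1 > K) {
            int p = medianOfThree(A, L, (L + U) >>> 1, U);
            int i = partition(A, L, U, p);
            quickSort(A, L, i - 1);
            quickSort(A, i + 1, U);
        }
    }

    /** Whichever of the indices a, b, c holds the median of the three values. */
    static int medianOfThree(int[] A, int a, int b, int c) {
        if (A[a] < A[b]) {
            if (A[b] < A[c]) return b;
            return (A[a] < A[c]) ? c : a;
        } else {
            if (A[a] < A[c]) return a;
            return (A[b] < A[c]) ? c : b;
        }
    }

    /** Lomuto partitioning of A[L..U] around T = A[p]; returns the final
     *  index i of T, with A[L..i-1] < T and A[i+1..U] >= T. */
    static int partition(int[] A, int L, int U, int p) {
        swap(A, L, p);
        int T = A[L];
        int i = L;
        for (int j = L + 1; j <= U; j += 1) {
            if (A[j] < T) {
                i += 1;
                swap(A, j, i);
            }
        }
        swap(A, L, i);
        return i;
    }

    static void insertionSort(int[] A) {
        for (int i = 1; i < A.length; i += 1) {
            int x = A[i];
            int j;
            for (j = i; j > 0 && x < A[j - 1]; j -= 1)
                A[j] = A[j - 1];
            A[j] = x;
        }
    }

    static void swap(int[] A, int i, int j) {
        int t = A[i]; A[i] = A[j]; A[j] = t;
    }

    public static void main(String[] args) {
        int[] A = {9, 15, 5, 3, 0, 6, 10, -1, 2, 20, 8, 7, 4, 1, 12, 11};
        quickSort(A);
        System.out.println(Arrays.toString(A));
    }
}
```

Each call to partition makes U − L key comparisons, matching the N − 1 assumed in the recurrence for C(N).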

Some authors go to the trouble of developing non-recursive versions of quicksort, evidently under the impression that they are thereby vastly improving its performance. This view of the cost of recursion is widely held, so I suppose I can’t be surprised. However, a quick test using a C version indicated about a 3% improvement using the iterative version. This is hardly worth obscuring one’s code to obtain.

8.8 Merge sorting

Quicksort was a kind of divide-and-conquer algorithm⁴ that we might call “try to divide-and-conquer,” since it is not guaranteed to succeed in dividing the data evenly. An older technique, known as merge sorting, is a form of divide-and-conquer that does guarantee that the data are divided evenly.

At a high level, it goes as follows.

    /** Sort items A[L..U]. */
    static void mergeSort(Record[] A, int L, int U)
    {
        if (L >= U)
            return;
        mergeSort(A, L, (L+U)/2);
        mergeSort(A, (L+U)/2+1, U);
        merge(A, L, (L+U)/2, A, (L+U)/2+1, U, A, L);
    }

The merge program has the following specification:

    /** Assuming V0[L0..U0] and V1[L1..U1] are each sorted in */
    /* ascending order by keys, set V2[L2..U2] to the sorted contents */
    /* of V0[L0..U0], V1[L1..U1]. (U2 = L2+U0+U1-L0-L1+1). */
    void merge(Record[] V0, int L0, int U0, Record[] V1, int L1, int U1,
               Record[] V2, int L2)

Since V0 and V1 are in ascending order already, it is easy to do this in Θ(N) time, where N = U2 − L2 + 1, the combined size of the two arrays. Merging progresses through the arrays from left to right. That makes it well-suited for computers with small memories and lots to sort. The arrays can be on secondary storage devices that are restricted to sequential access, i.e., that require reading or writing the arrays in increasing (or decreasing) order of index⁵.

⁴ The term divide-and-conquer is used to describe algorithms that divide a problem into some number of smaller problems, and then combine the answers to those into a single result.
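One way the specification might be met is sketched below for int arrays. It makes a simplifying assumption that the specification itself does not state: the output range V2[L2..U2] must not overlap the input ranges, so it does not directly support the in-place call made by mergeSort above. The class name MergeDemo and the int keys are likewise illustrative.

```java
final class MergeDemo {
    /** Assuming V0[L0..U0] and V1[L1..U1] are each sorted in ascending
     *  order, set V2[L2..] to their merged, sorted contents.
     *  Assumes the output range does not overlap either input range. */
    static void merge(int[] V0, int L0, int U0, int[] V1, int L1, int U1,
                      int[] V2, int L2) {
        int i = L0, j = L1, k = L2;
        // Repeatedly copy the smaller front item; on ties take from V0
        // first, which keeps the merge stable.
        while (i <= U0 && j <= U1) {
            if (V1[j] < V0[i])
                V2[k++] = V1[j++];
            else
                V2[k++] = V0[i++];
        }
        while (i <= U0) V2[k++] = V0[i++];   // leftovers from V0
        while (j <= U1) V2[k++] = V1[j++];   // leftovers from V1
    }

    public static void main(String[] args) {
        int[] x = {1, 4, 9}, y = {2, 3, 10};
        int[] out = new int[6];
        merge(x, 0, 2, y, 0, 2, out, 0);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Each input is read, and the output written, strictly from left to right, which is the property that suits sequential-access storage.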

The real work is done by the merging process, of course. The pattern of these merges is rather interesting. For simplicity, consider the case where N is a power of two. If you trace the execution of mergeSort, you’ll see the following pattern of calls on merge:

    Call #   V0         V1
    0.       A[0]       A[1]
    1.       A[2]       A[3]
    2.       A[0..1]    A[2..3]
    3.       A[4]       A[5]
    4.       A[6]       A[7]
    5.       A[4..5]    A[6..7]
    6.       A[0..3]    A[4..7]
    7.       A[8]       A[9]
    etc.

We can exploit this pattern to good advantage when trying to do merge sorting on linked lists of elements, where the process of dividing the list in half is not as easy as it is for arrays. Assume that records are linked together into Lists. The program below shows how to perform a merge sort on these lists; Figure 8.7 illustrates the process. The program maintains a binomial comb of sorted sublists, comb[0..M-1], such that the list in comb[i] is either empty or has length 2^i.

    /** Permute the Records in List A so as to be sorted by key. */
    static void mergeSort(List<Record> A)
    {
        int M = a number such that 2^M − 1 ≥ length of A;
        List<Record>[] comb = (List<Record>[]) new List[M];

        for (int i = 0; i < M; i += 1)
            comb[i] = new LinkedList<Record>();
        for (Record R : A)
            addToComb(comb, R);
        A.clear();
        for (List<Record> L : comb)
            mergeLists(A, L);
    }

⁵ A familiar movie cliché of decades past was spinning tape units to indicate that some piece of machinery was a computer (also operators flipping console switches, something one almost never really did during normal operation). When those images came from footage of real computers, the computer was most likely sorting.

At each point, the comb contains sorted lists that are to be merged. We first build up the comb one new item at a time, and then take a final pass through it, merging all its lists. To add one element to the comb, we have

    /** Assuming that each C[i] is a sorted list whose length is either 0
     *  or 2^i elements, adds P to the items in C so as to
     *  maintain this same condition. */
    static void addToComb(List<Record> C[], Record p)
    {
        if (C[0].size() == 0) {
            C[0].add(p);
            return;
        } else if (before(C[0].get(0), p))
            C[0].add(p);
        else
            C[0].add(0, p);
        // Now C[0] contains 2 items
        int i;
        for (i = 1; C[i].size() != 0; i += 1)
            mergeLists(C[i], C[i-1]);
        C[i] = C[i-1]; C[i-1] = new LinkedList<Record>();
    }

I leave to you the mergeLists procedure:

    /** Merge L1 into L0, producing a sorted list containing all the
     *  elements originally in L0 and L1. Assumes that L0 and L1 are
     *  each sorted initially (according to the before ordering).
     *  The result ends up in L0; L1 becomes empty. */
    static void mergeLists(List<Record> L0, List<Record> L1)

8.8.1 Complexity

The optimistic time estimate for quicksort applies in the worst case to merge sorting, because merge sorts really do divide the data in half with each step (and merging of two lists or arrays takes linear time). Thus, merge sorting is a Θ(N lg N) algorithm, with N the number of records. Unlike quicksort or insertion sort, merge sorting as I have described it is generally insensitive to the ordering of the data. This changes somewhat when we consider external sorting, but O(N lg N) comparisons remains an upper bound.

8.9 Speed of comparison-based sorting

I’ve presented a number of algorithms and have claimed that the best of them require Θ(N lg N) comparisons in the worst case. There are several obvious questions to


    Processed   L (remaining)                       comb[0]   comb[1]    comb[2]        comb[3]
     0          (9,15,5,3,0,6,10,-1,2,20,8)         0         0          0              0
     1          (15,5,3,0,6,10,-1,2,20,8)           1 (9)     0          0              0
     2          (5,3,0,6,10,-1,2,20,8)              0         1 (9,15)   0              0
     3          (3,0,6,10,-1,2,20,8)                1 (5)     1 (9,15)   0              0
     4          (0,6,10,-1,2,20,8)                  0         0          1 (3,5,9,15)   0
     6          (10,-1,2,20,8)                      0         1 (0,6)    1 (3,5,9,15)   0
    11          ()                                  1 (8)     1 (2,20)   0              1 (-1,0,3,5,6,9,10,15)

Figure 8.7: Merge sorting of lists, showing the state of the “comb” after various numbers of items from the list L have been processed. The final step is to merge the lists remaining in the comb after all 11 elements from the original list have been added to it. The 0s and 1s in the small boxes are decorations to illustrate the pattern of merges that occurs. Each empty box has a 0 and each non-empty box has a 1. If you read the contents of the four boxes as a single binary number, units bit on top (comb[0], leftmost here), it equals the number of elements processed.

ask about this bound. First, how do “comparisons” translate into “instructions”? Second, can we do better than N lg N?

The point of the first question is that I have been a bit dishonest to suggest that a comparison is a constant-time operation. For example, when comparing strings, the size of the strings matters in the time required for comparison in the worst case. Of course, on the average, one expects not to have to look too far into a string to determine a difference. Still, this means that to correctly translate comparisons into instructions, we should throw in another factor of the length of the key. Suppose that the N records in our set all have distinct keys. This means that the keys themselves have to be Ω(lg N) long. Assuming keys are no longer than necessary, and assuming that comparison time goes up proportionally to the size of a key (in the worst case), this means that sorting really takes Θ(N (lg N)^2) time (assuming that the time required to move one of these records is at worst proportional to the size of the key).

As to the question about whether it is possible to do better than Θ(N lg N), the answer is that if the only information we can obtain about keys is how they compare to each other, then we cannot do better than Θ(N lg N). That is, Θ(N lg N) comparisons is a lower bound on the worst case of all possible sorting algorithms that use comparisons.

The proof of this assertion is instructive. A sorting program can be thought of as first performing a sequence of comparisons, and then deciding how to permute its inputs, based only on the information garnered by the comparisons. The two operations actually get mixed, of course, but we can ignore that fact here. In order for the program to “know” enough to permute two different inputs differently, these inputs must cause different sequences of comparison results. Thus, we can represent this idealized sorting process as a tree in which the leaf nodes are permutations and the internal nodes are comparisons, with each left child containing the comparisons and permutations that are performed when the comparison turns out true and the right child containing those that are performed when the comparison turns out false.

Figure 8.8 illustrates this for the case N = 3. The height of this tree corresponds to the number of comparisons performed. Since the number of possible permutations (and thus leaves) is N!, and the minimal height of a binary tree with M leaves is ⌈lg M⌉, the minimal height of the comparison tree for N records is roughly lg(N!).

Now

    lg N! = lg N + lg(N − 1) + ... + 1
          ≤ lg N + lg N + ... + lg N
          = N lg N
          ∈ O(N lg N)

and also (taking N to be even)

    lg N! ≥ lg N + lg(N − 1) + ... + lg(N/2)
          ≥ (N/2 + 1) lg(N/2)
          ∈ Ω(N lg N)

Figure8.8:Acomparisontreefor N =3.Thethreevaluesbeingsortedare A, B,and C.Eachinternalnodeindicatesatest.Theleftchildrenindicatewhat happenswhenthetestissuccessful(true),andtherightchildrenindicatewhat happensifitisunsuccessful.Theleafnodes(rectangular) indicatetheorderingof thethreevaluesthatisuniquelydeterminedbythecomparisonresultsthatlead downtothem.Weassumeherethat A, B,and C aredistinct.Thistreeisoptimal, demonstratingthatthreecomparisonsareneededintheworstcasetosortthree items.

sothat

lg N ! ∈ Θ(N lg N )

Thus any sortingalgorithmthatusesonly(true/false)keycomparisonstogetinformationabouttheorderofitsinput’skeysrequiresΘ(N lg N )comparisonsinthe worstcasetosort N keys.
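To make the bound concrete, here is a small sketch that computes ⌈lg N!⌉, the minimal height of a comparison tree with N! leaves, for small N. The class and method names are mine, chosen only for illustration, and the method is limited to N ≤ 20 so that N! fits in a long.

```java
/** Illustrative helper (not part of the text): the information-theoretic
 *  lower bound on comparisons needed to sort N distinct keys. */
class ComparisonLowerBound {

    /** Return the least h such that 2^h >= N!, that is, ceil(lg N!).
     *  Assumes 1 <= n <= 20 (20! still fits in a long). */
    static int minComparisons(int n) {
        long factorial = 1;
        for (int i = 2; i <= n; i += 1) {
            factorial *= i;
        }
        int height = 0;
        long leaves = 1;          // a binary tree of height h has at most 2^h leaves
        while (leaves < factorial) {
            leaves *= 2;
            height += 1;
        }
        return height;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 12; n += 1) {
            System.out.println(n + " keys: at least "
                               + minComparisons(n) + " comparisons");
        }
    }
}
```

For N = 3 this gives 3, agreeing with the tree in Figure 8.8.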

8.10 Radix sorting

To get the result in §8.9, we assumed that the only examination of keys available was comparing them for order. Suppose that we are not restricted to simply comparing keys. Can we improve on our O(N lg N) bounds? Interestingly enough, we can, sort of. This is possible by means of a technique known as radix sort.

Most keys are actually sequences of fixed-size pieces (characters or bytes, in particular) with a lexicographic ordering relation. That is, the key k_0 k_1 ··· k_{n−1} is less than k′_0 k′_1 ··· k′_{n−1} if k_0 < k′_0, or k_0 = k′_0 and k_1 ··· k_{n−1} is less than k′_1 ··· k′_{n−1} (we can always treat the keys as having equal length by choosing a suitable padding character for the shorter string). Just as in a search trie we used successive characters in a set of keys to distribute the strings amongst subtrees, we can use successive characters of keys to sort them. There are basically two varieties of algorithm: one that works from least significant to most significant digit (LSD-first) and one that works from most significant to least significant digit (MSD-first). I use "digit" here as a generic term encompassing not only decimal digits, but also alphabetic characters, or whatever is appropriate to the data one is sorting.

Figure 8.8 (tree contents; the first branch under each test is taken when the test is true, the second when it is false):

    A<B
        B<C
            (A,B,C)
            A<C
                (A,C,B)
                (C,A,B)
        A<C
            (B,A,C)
            B<C
                (B,C,A)
                (C,B,A)

8.10.1 LSD-first radix sorting

The idea of the LSD-first algorithm is to first use the least significant character to order all records, then the second-least significant, and so forth. At each stage, we perform a stable sort, so that if the k most significant characters of two records are identical, they will remain sorted by the remaining, least significant, characters. Because characters have a limited range of values, it is easy to sort them in linear time (using, for example, distributionSort2, or, if the records are kept in a linked list, by keeping an array of list headers, one for each possible character value). Figure 8.9 illustrates the process.
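The following is a minimal sketch of the idea, not the text's distributionSort2. It assumes the keys are 7-bit ASCII strings of at most a given width, uses character code 0 as the padding character, and keeps one list (bin) per character value. The names LsdRadixSort, ALPHA, and charAt are mine.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative LSD-first radix sort on short ASCII strings. */
class LsdRadixSort {
    /** Assumed alphabet size: 7-bit ASCII, with 0 serving as padding. */
    static final int ALPHA = 128;

    /** Character K of S, or 0 (padding) if K is off the end of S. */
    static int charAt(String s, int k) {
        return k < s.length() ? s.charAt(k) : 0;
    }

    /** Sort KEYS, whose elements have length <= WIDTH, in place. */
    static void sort(String[] keys, int width) {
        // One stable distribution pass per character position, last first.
        for (int pos = width - 1; pos >= 0; pos -= 1) {
            List<List<String>> bins = new ArrayList<>();
            for (int c = 0; c < ALPHA; c += 1) {
                bins.add(new ArrayList<>());
            }
            // Appending to a bin preserves arrival order, so the pass is stable.
            for (String key : keys) {
                bins.get(charAt(key, pos)).add(key);
            }
            // Concatenate the bins' contents back into the array.
            int next = 0;
            for (List<String> bin : bins) {
                for (String key : bin) {
                    keys[next] = key;
                    next += 1;
                }
            }
        }
    }
}
```

Calling sort on the initial sequence of Figure 8.9 with width 3 goes through the same three passes as the figure.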

LSD-first radix sort is precisely the algorithm used by card sorters. These machines had a series of bins and could be programmed (using plugboards) to drop cards from a feeder into bins depending on what was punched in a particular column. By repeating the process for each column, one ended up with a sorted deck of cards.

Each distribution of a record to a bin takes (about) constant time (assuming we use pointers to avoid moving large amounts of data around). Thus, the total time is proportional to the total amount of key data, which is the total number of bytes in all keys. In other words, radix sorting is O(B) where B is the total number of bytes of key data. If keys are K bytes long, then B = NK, where N is the number of records. Since merge sorting, heap sorting, etc., require O(N lg N) comparisons, each requiring in the worst case K time, we get a total time of O(NK lg N) = O(B lg N) for these sorts. Even if we assume constant comparison time, if keys are no longer than they have to be (in order to provide N different keys we must have K ≥ log_C N, where C is the number of possible characters), then radix sorting is also O(N lg N).

Thus, relaxing the constraint on what we can do to keys yields a fast sorting procedure, at least in principle. As usual, the Devil is in the details. If the keys are considerably longer than log_C N, as they very often are, the passes made on the last characters will typically be largely wasted. One possible improvement, which Knuth credits to M. D. Maclaren, is to use LSD-first radix sort on the first two characters, and then finish with an insertion sort (on the theory that things will almost be in order after the radix sort). We must fudge the definition of "character" for this purpose, allowing characters to grow slightly with N. For example, when N = 100000, Maclaren's optimal procedure is to sort on the first and second 10-bit segments of the key (on an 8-bit machine, this is the first 2.5 characters). Of course, this technique can, in principle, make no guarantees of O(B) performance.

8.10.2 MSD-first radix sorting

Performing radix sort starting at the most significant digit probably seems more natural to most of us. We sort the input by the first (most-significant) character into C (or fewer) subsequences, one for each starting character (that is, the first character of all the keys in any given subsequence is the same). Next, we sort each of the subsequences that has more than one key individually by its second character, yielding another group of subsequences in which all keys in any given subsequence agree in their first two characters. This process continues until all subsequences are of length 1. At each stage, we order the subsequences, so that one subsequence precedes another if all its strings precede all those in the other. When we are done, we simply write out all the subsequences in the proper order.

    Initial: set, cat, cad, con, bat, can, be, let, bet

    Bins for the first pass (third character), each listed in arrival order:
        '⊔': be
        'd': cad
        'n': con, can
        't': set, cat, bat, let, bet
    After first pass: be, cad, con, can, set, cat, bat, let, bet

    Bins for the second pass (second character):
        'a': cad, can, cat, bat
        'e': be, set, let, bet
        'o': con
    After second pass: cad, can, cat, bat, be, set, let, bet, con

    Bins for the final pass (first character):
        'b': bat, be, bet
        'c': cad, can, cat, con
        'l': let
        's': set
    After final pass: bat, be, bet, cad, can, cat, con, let, set

Figure 8.9: An example of a LSD-first radix sort. Each pass sorts by one character, starting with the last. Sorting consists of distributing the records to bins indexed by characters, and then concatenating the bins' contents together. Only non-empty bins are shown.

Figure 8.10: An example of an MSD radix sort on the same data as in Figure 8.9. The first line shows the initial contents of A and the last shows the final contents. Partially-sorted segments that agree in their initial characters are separated by single slash (/) characters. The ⋆ character indicates the segment of the array that is about to be sorted and the posn column shows which character position is about to be used for the sort.

The tricky part is keeping track of all the subsequences so that they can be output in the proper order at the end and so that we can quickly find the next subsequence of length greater than one. Here is a sketch of one technique for sorting an array; it is illustrated in Figure 8.10.

    static final int ALPHA = size of alphabet of digits;

    /** Sort A[L..U] stably, ignoring the first k characters in each key. */
    static void MSDradixSort(Record[] A, int L, int U, int k) {
        int[] countLess = new int[ALPHA+1];

        Sort A[L..U] stably by the kth character of each key, and for each
        digit, c, set countLess[c] to the number of records in A
        whose kth character comes before c in alphabetical order.

        // countLess has ALPHA+1 entries, so i+1 must not exceed ALPHA.
        for (int i = 0; i < ALPHA; i += 1)
            if (countLess[i+1] - countLess[i] > 1)
                MSDradixSort(A, L + countLess[i],
                             L + countLess[i+1] - 1, k+1);
    }

Contents of Figure 8.10:

    A                                                        posn
    ⋆ set, cat, cad, con, bat, can, be, let, bet              0
    ⋆ bat, be, bet / cat, cad, con, can / let / set           1
    bat / ⋆ be, bet / cat, cad, con, can / let / set          2
    bat / be / bet / ⋆ cat, cad, con, can / let / set         1
    bat / be / bet / ⋆ cat, cad, can / con / let / set        2
    bat / be / bet / cad / can / cat / con / let / set
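As one way to fill in the italicized step, here is a runnable sketch that does the stable sort on the kth character by distribution counting. It makes several assumptions that are not in the text: keys are 7-bit ASCII Strings rather than Records, code 0 is the padding character, and a temporary array is used for the distribution. It also skips the padding group when recursing, since keys that have already ended are identical and need no further sorting.

```java
/** Illustrative MSD-first radix sort on ASCII strings. */
class MsdRadixSort {
    /** Assumed alphabet size: 7-bit ASCII, with 0 serving as padding. */
    static final int ALPHA = 128;

    /** Character K of S, or 0 (padding) if K is off the end of S. */
    static int digit(String s, int k) {
        return k < s.length() ? s.charAt(k) : 0;
    }

    /** Sort A[L..U] stably, assuming the keys there agree in their
     *  first K characters. */
    static void sort(String[] A, int L, int U, int k) {
        if (U <= L) {
            return;
        }
        int[] countLess = new int[ALPHA + 1];
        // Count occurrences of each digit, offset by one position...
        for (int i = L; i <= U; i += 1) {
            countLess[digit(A[i], k) + 1] += 1;
        }
        // ...so that prefix sums give, for each c, the number of keys
        // in A[L..U] whose kth character is less than c.
        for (int c = 0; c < ALPHA; c += 1) {
            countLess[c + 1] += countLess[c];
        }
        // Stable distribution into a temporary array.
        String[] tmp = new String[U - L + 1];
        int[] next = countLess.clone();
        for (int i = L; i <= U; i += 1) {
            int d = digit(A[i], k);
            tmp[next[d]] = A[i];
            next[d] += 1;
        }
        System.arraycopy(tmp, 0, A, L, tmp.length);
        // Recur on each group with more than one key.  Group 0 (padding)
        // holds keys that have ended, which are all equal, so skip it.
        for (int c = 1; c < ALPHA; c += 1) {
            if (countLess[c + 1] - countLess[c] > 1) {
                sort(A, L + countLess[c], L + countLess[c + 1] - 1, k + 1);
            }
        }
    }
}
```

A call such as MsdRadixSort.sort(A, 0, A.length - 1, 0) sorts the whole array.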

8.11 Using the library

Notwithstanding all the trouble we've taken in this chapter to look at sorting algorithms, in most programs you shouldn't even think about writing your own sorting subprogram! Good libraries provide them for you. The Java standard library has a class called java.util.Collections, which contains only static definitions of useful utilities related to Collections. For sorting, we have

    /** Sort L stably into ascending order, as defined by C. L must
     *  be modifiable, but need not be expandable. */
    public static <T> void sort(List<T> L, Comparator<? super T> c) { ··· }

    /** Sort L into ascending order, as defined by the natural ordering
     *  of the elements. L must be modifiable, but need not be expandable. */
    public static <T extends Comparable<? super T>> void sort(List<T> L) { ··· }

These two methods use a form of mergesort, guaranteeing O(N lg N) worst-case performance. Given these definitions, you should not generally need to write your own sorting routine unless the sequence to be sorted is extremely large (in particular, if it requires external sorting), if the items to be sorted have primitive types (like int), or you have an application where it is necessary to squeeze every single microsecond out of the algorithm (a rare occurrence).
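As a brief usage sketch of the two methods, the class below sorts a copy of a list by natural ordering and, separately, by length. The class and method names are mine; Collections.sort and Comparator.comparingInt are standard library calls. Because the sort is stable, words of equal length keep their original relative order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Illustrative uses of java.util.Collections.sort. */
class LibrarySortDemo {

    /** A sorted copy of WORDS, using the natural (alphabetical) ordering. */
    static List<String> byNaturalOrder(List<String> words) {
        List<String> result = new ArrayList<>(words);
        Collections.sort(result);
        return result;
    }

    /** A copy of WORDS sorted by length.  The sort is stable, so words
     *  of equal length stay in their original relative order. */
    static List<String> byLength(List<String> words) {
        List<String> result = new ArrayList<>(words);
        Collections.sort(result, Comparator.comparingInt(String::length));
        return result;
    }
}
```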

8.12 Selection

Consider the problem of finding the median value in an array: a value in the array with as many array elements less than it as greater than it. A brute-force method of finding such an element is to sort the array and choose the middle element (or a middle element, if the array has an even number of elements). However, we can do substantially better.

The general problem is selection: given a (generally unsorted) sequence of elements and a number k, find the kth value in the sorted sequence of elements. Finding a median, maximum, or minimum value is a special case of this general problem. Perhaps the easiest efficient method is the following simple adaptation of Hoare's quicksort algorithm.

    /** Assuming 0<=k<N, return a record of A whose key is kth smallest
     *  (k=0 gives the smallest, k=1, the next smallest, etc.). A may
     *  be permuted by the algorithm. */
    Record select(Record[] A, int L, int U, int k) {
        Record T = some member of A[L..U];
        Permute A[L..U] and find p to establish the partitioning
        condition:

            +----------------+---+----------------+
            |  key ≤ key(T)  | T |  key ≥ key(T)  |
            +----------------+---+----------------+
             L                 p                 U

        if (p-L == k)
            return T;
        if (p-L < k)
            return select(A, p+1, U, k - p + L - 1);
        else
            return select(A, L, p-1, k);
    }

The key observation here is that when the array is partitioned as for quicksort, the value T is the (p − L)th smallest element (counting from 0); the p − L smallest record keys will be in A[L..p-1]; and the larger record keys will be in A[p+1..U]. Hence, if k < p − L, the kth smallest key is in the left part of A, and if k > p − L, it must be the (k − p + L − 1)st smallest key in the right part.
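Here is one runnable rendering of this algorithm on an array of ints rather than Records. The choice of the middle element as T and the particular one-pass partitioning loop are my own choices for the sketch; any partitioning method that establishes the condition above would do.

```java
/** Illustrative quicksort-style selection on an int array. */
class Select {

    /** Return the kth smallest value in A[L..U] (k = 0 gives the
     *  smallest).  Assumes 0 <= k <= U - L.  May permute A. */
    static int select(int[] A, int L, int U, int k) {
        // Choose T (here, the middle element) and park it at A[U].
        swap(A, L + (U - L) / 2, U);
        int t = A[U];
        // Move everything smaller than T to the front of A[L..U-1].
        int p = L;
        for (int j = L; j < U; j += 1) {
            if (A[j] < t) {
                swap(A, p, j);
                p += 1;
            }
        }
        // Now A[L..p-1] < T and A[p..U-1] >= T; put T at position p.
        swap(A, p, U);
        if (p - L == k) {
            return t;
        } else if (p - L < k) {
            return select(A, p + 1, U, k - (p - L) - 1);
        } else {
            return select(A, L, p - 1, k);
        }
    }

    private static void swap(int[] A, int i, int j) {
        int tmp = A[i];
        A[i] = A[j];
        A[j] = tmp;
    }
}
```

A median of A is then select(A, 0, A.length - 1, A.length / 2).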

Optimistically, assuming that each partition divides the array in half, the recurrence governing cost here (measured in number of comparisons) is

    C(1) = 0
    C(N) = N + C(⌈N/2⌉)

where N = U − L + 1. The N comes from the cost of partitioning, and the C(⌈N/2⌉) from the recursive call. This differs from the quicksort and mergesort recurrences by the fact that the multiplier of C(···) is 1 rather than 2. For N = 2^m we get

    C(N) = 2^m + C(2^(m−1))
         = 2^m + 2^(m−1) + C(2^(m−2))
         = ...
         = 2^m + 2^(m−1) + ... + 2 + C(1)
         = 2^(m+1) − 2 = 2N − 2
         ∈ Θ(N)

This algorithm is only probabilistically good, just as was quicksort. There are selection algorithms that guarantee linear bounds, but we'll leave them to a course on algorithms.

Exercises

8.1. You are given two sets of keys (i.e., so that neither contains duplicate keys), S0 and S1, both represented as arrays. Assuming that you can compare keys for "greater than or equal to," how would you compute the intersection of S0 and S1, and how long would it take?

8.2. Given a large list of words, how would you quickly find all anagrams in the list? (An anagram here is a word in the list that can be formed from another word on the list by rearranging its letters, as in "dearth" and "thread").

8.3. Suppose that we have an array, D, of N records. Without modifying this array, I would like to compute an N-element array, P, containing a permutation of the integers 0 to N − 1 such that the sequence D[P[0]], D[P[1]], ..., D[P[N − 1]] is sorted stably. Give a general method that works with any sorting algorithm (stable or not) and doesn't require any additional storage (other than that normally used by the sorting algorithm).

8.4. A very simple spelling checker simply removes all ending punctuation from its words and looks up each in a dictionary. Compare ways of doing this from the classes in the Java library: using an ArrayList to store the words in sorted order, a TreeSet, and a HashSet. There's little programming involved, aside from learning to use the Java library.

8.5. I am given a list of ranges of numbers, [x_i, x′_i], each with 0 ≤ x_i < x′_i ≤ 1. I want to know all the ranges of values between 0 and 1 that are not covered by one of these ranges of numbers. So, if the only input is [0.25, 0.5], then the output would be [0.0, 0.25] and [0.5, 1.0] (never mind the end points). Show how to do this quickly.

Chapter 9

Balanced Searching

We've seen that binary search trees have a weakness: a tendency to become unbalanced, so that they are ineffective in dividing the set of data they represent into two substantially smaller parts. Let's consider what we can do about this.

Of course, we could always rebalance an unbalanced tree by simply laying all the keys out in order and then re-inserting them in such a way as to keep the tree balanced. That operation, however, requires time linear in the number of keys in the tree, and it is difficult to see how to avoid having a Θ(N^2) factor creep into the time required to insert N keys. By contrast, only O(N lg N) time is required to make N insertions if the data happen to be presented in an order that keeps the tree bushy. So let's first look at operations to re-balance a tree (or keep it balanced) without taking it apart and reconstructing it.

9.1 Balance by Construction: B-Trees

Another way to keep a search tree balanced is to be careful always to "insert new keys in a good place" so that the tree remains bushy by construction. The database community has long used a data structure that does exactly this: the B-tree¹. We will describe the data structure and operations abstractly here, rather than give code, since in practice there is a whole raft of devices one uses to gain speed. A B-tree of order m is a positional tree with the following properties:

1. All nodes have m or fewer children.

2. All nodes other than the root have at least m/2 children (we can also say that each node other than the root contains at least ⌈m/2⌉ children²).

3. A node with children C_0, C_1, ..., C_{n−1} is labeled with keys K_1, ..., K_{n−1} (think of key K_i as resting "between" C_{i−1} and C_i), with K_1 < K_2 < ··· < K_{n−1}.

4. A B-tree is a search tree: For any node, all keys in the subtree rooted at C_i are strictly less than K_{i+1}, and (for i > 0), strictly greater than K_i.

5. All the empty children occur at the same level of the tree.

¹ D. Knuth's reference: R. Bayer and E. McCreight, Acta Informatica (1972), 173–189, and also unpublished independent work by M. Kaufman.

² The notation ⌈x⌉ means "the smallest integer that is ≥ x."

Figure 9.1 contains an example of an order-4 tree. In real implementations, B-trees tend to be kept on secondary storage (disks and the like), with their nodes being read in as needed. We choose m so as to make the transfer of data from secondary storage as fast as possible. Disks in particular tend to have minimum transfer times for each read operation, so that for a wide range of values of m, there is little difference in the time required to read in a node. Making m too small in that case is an obviously bad idea.

We'll represent the nodes of a B-tree with a structure we'll call a BTreeNode, for which we'll use the following terminology:

    B.child(i)   Child number i of B-tree node B, where 0 ≤ i < m.
    B.key(i)     Key number i of B-tree node B, where 1 ≤ i < m.
    B.parent()   The parent node of B.
    B.index()    The integer, i, such that B == B.parent().child(i).
    B.arity()    The number of children in B.

An entire B-tree, then, would consist of a pointer to the root, with perhaps some extra useful information, such as the current size of the B-tree.

Because of properties (2) and (5), a B-tree containing N keys must have O(log_{m/2} N) levels. Because of property (1), searching a single node's keys takes O(1) time (we assume m is fixed). Therefore, searching a B-tree by the following obvious recursive algorithm is an O(log_m N) = O(lg N) operation:

    boolean search(BTreeNode B, Key X) {
        if (B is the empty tree)
            return false;
        else {
            Find largest c such that B.key(i) ≤ X, for all 1 ≤ i ≤ c.
            if (c > 0 && X.equals(B.key(c)))
                return true;
            else
                return search(B.child(c), X);
        }
    }
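To see the search run, here is a self-contained sketch with integer keys. The representation is an assumption made for the example, not the text's BTreeNode: keys live in a sorted int array (so keys[i-1] plays the role of B.key(i)), children in a parallel array, and null stands for the empty tree. The method figure91 builds the tree of Figure 9.1 by hand.

```java
/** Illustrative B-tree node with integer keys, used only for searching. */
class BTreeNode {
    final int[] keys;             // keys[i-1] plays the role of B.key(i)
    final BTreeNode[] children;   // children[i] is B.child(i); null = empty tree

    BTreeNode(int[] keys, BTreeNode[] children) {
        this.keys = keys;
        this.children = children;
    }

    /** A bottom-level node: all of its children are empty. */
    static BTreeNode leaf(int... keys) {
        return new BTreeNode(keys, new BTreeNode[keys.length + 1]);
    }

    /** A node with the given KEYS and CHILDREN (one more child than keys). */
    static BTreeNode inner(int[] keys, BTreeNode... children) {
        return new BTreeNode(keys, children);
    }

    /** True iff X is a key in the B-tree rooted at B. */
    static boolean search(BTreeNode b, int x) {
        if (b == null) {
            return false;
        }
        // c = number of keys in this node that are <= x.
        int c = 0;
        while (c < b.keys.length && b.keys[c] <= x) {
            c += 1;
        }
        if (c > 0 && b.keys[c - 1] == x) {
            return true;
        }
        return search(b.children[c], x);
    }

    /** The order-4 tree of Figure 9.1. */
    static BTreeNode figure91() {
        BTreeNode left = inner(new int[] {25, 55, 90},
                               leaf(10, 20), leaf(30, 40, 50),
                               leaf(60), leaf(95, 100));
        BTreeNode right = inner(new int[] {125},
                                leaf(120), leaf(130, 140, 150));
        return inner(new int[] {115}, left, right);
    }
}
```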

    [115]
        [25 55 90]
            [10 20]   [30 40 50]   [60]   [95 100]
        [125]
            [120]   [130 140 150]

Figure 9.1: Example of a B-tree of order 4 with integer keys. Circles represent empty nodes, which appear all at the same level. Each node has two to four children, and one to three keys. Each key is greater than all keys in the children to its left and less than all keys in the children to its right. (In the rendering above, each bracketed group is one node and indentation shows parent and child; the empty nodes below the bottom level are not shown.)

9.1.1 B-tree Insertion

Initially, we insert into the bottom of a B-tree, just as for binary search trees. However, we avoid "scrawny" trees by filling our nodes up and splitting them, rather than extending them down. The idea is simple: we find an appropriate place at the bottom of the tree to insert a given key, and perform the insertion (also adding an additional empty child). If this makes the node too big (so that it has m keys and m + 1 (empty) children), we split the node, as in the code in Figure 9.2. Figure 9.3 illustrates the process.

9.1.2 B-tree deletion

Deleting from a B-tree is generally more complicated than insertion, but not too bad. As usual, real, production implementations introduce numerous intricacies for speed. To keep things simple, I'll just describe a straightforward, idealized method. Taking our cue from the way that insertion works, we will first move the key to be deleted down to the bottom of the tree (where deletion is straightforward). Then, if deletion has made the original node too small, we merge it with a sibling, pulling down the key that used to separate the two from the parent. The pseudocode in Figure 9.4 describes the process, which is also illustrated in Figure 9.5.

    /** Split B-tree node B, which has from m+1 to 2m+1
     *  children. */
    void split(BTreeNode B) {
        int k = B.arity() / 2;
        Key X = B.key(k);
        BTreeNode B2 = a new BTree node;
        move B.child(k) through B.child(m) and
             B.key(k+1) through B.key(m) out of B and into B2;
        remove B.key(k) from B;
        if (B was the root) {
            create a new root with children B and B2 and with key X;
        } else {
            BTreeNode P = B.parent();
            int c = B.index();
            insert child B2 at position c+1 in P, and key X
                at position c+1 in P, moving subsequent
                children and keys of P over as needed;
            if (P.arity() > m)
                split(P);
        }
    }

Figure 9.2: Splitting a B-tree node. Figure 9.3 contains illustrations.

    (a) Insert 15:
        [10 20]  becomes  [10 15 20]

    (b) Insert 145:
        [125] over [120], [130 140 150]
        becomes
        [125 140] over [120], [130], [145 150]

    (c) Insert 35 (right-hand subtree of the root not shown):
        [115] over [25 55 90] over [10 15 20], [30 40 50], [60], [95 100]
        becomes
        [35 115] over [25] (over [10 15 20], [30])
                  and [55 90] (over [40 50], [60], [95 100])

Figure 9.3: Inserting into a B-tree. The examples modify the tree in 9.1 by inserting 15, 145, and then 35.

    /** Delete B.key(i) from the BTree containing B. */
    void deleteKey(BTreeNode B, int i) {
        if (B's children are all empty)
            remove B.key(i), moving over remaining keys;
        else {
            int n = B.child(i-1).arity();
            merge(B, i);
            // The key we want to delete is now #n of child #i-1.
            deleteKey(B.child(i-1), n);
        }
        if (B.arity() > m)  // Happens only on recursive calls
            split(B);
        regroup(B);
    }

    /** Move B.key(i) and the contents of B.child(i) into B.child(i-1),
     *  after its existing keys and children. Remove B.key(i) and
     *  B.child(i) from B, moving over the remaining contents.
     *  (Temporarily makes B.child(i-1) too large. The former
     *  B.child(i) becomes garbage). */
    void merge(BTreeNode B, int i) {
        implementation straightforward
    }

    /** If B has too few children, regroup the B-tree to re-establish
     *  the B-tree conditions. */
    void regroup(BTreeNode B) {
        if (B is the root && B.arity() == 1)
            make B.child(0) the new root;
        else if (B is not the root && B.arity() < m/2) {
            if (B.index() == 0)
                merge(B.parent(), 1);
            else
                merge(B.parent(), B.index());
            regroup(B.parent());
        }
    }

Figure 9.4: Deleting from a B-tree node. See Figure 9.5 for an illustration.

    (a) Steps in removing 25:
        [25] over [10 15 20], [30]
        then (after the merge)   [10 15 20 25 30]
        then (after removing 25 and splitting)   [15] over [10], [20 30]

    (b) Removing 10 (right-hand subtree of the root not shown):
        [35 115] over [15] (over [10], [20 30])
                  and [55 90] (over [40 50], [60], [95 100])
        becomes
        [115] over [35 55 90] over [15 20 30], [40 50], [60], [95 100]

Figure 9.5: Deletion from a B-tree. The examples start from the final tree in Figure 9.3c. In (a), we remove 25. First, merge to move 25 to the bottom. Then remove it and split the resulting node (which is too big), at 15. Next, (b) shows deletion of 10 from the tree produced by (a). Deleting 10 from its node at the bottom makes that node too small, so we merge it, moving 15 down from the parent. That in turn makes the parent too small, so we merge it, moving 35 down from the root, giving the final tree.

9.1.3 Red-Black Trees: Binary Search Trees as (2,4) Trees

General B-trees use nodes that are, in effect, ordered arrays of keys. Order 4 B-trees (also known as (2,4) trees) lend themselves to an alternative representation, known as red-black trees. Every (2,4) tree maps onto a particular binary search tree in such a way that each (2,4) node corresponds to a small cluster of 1–3 binary search-tree nodes. As a result, the binary search tree is roughly balanced, in the sense that all paths from root to leaves have lengths that differ by at most a factor of 2. Figure 9.6 shows the representations of each possible (2,4) node. In a full tree, we can indicate the boundaries of the clusters with a boolean quantity that is set only in the root node of each cluster. Traditionally, however, we describe nodes in which this boolean is true (and also the null leaf nodes) as "black" and the other nodes as "red."

By considering Figure 9.6 and the structure of (2,4) trees, you can derive that red-black trees are binary search trees that additionally obey the following constraints (which in standard treatments of red-black trees serve as their definition):

A. The root node and all (null) leaves are black.

B. Every child of a red node is black.

C. Any path from the root to a leaf traverses the same number of black nodes.

Again, properties B and C together imply that red-black trees are "bushy."

Search, of course, is ordinary binary-tree search. Because of the mapping between (2,4) trees and red-black trees shown in the figure, the algorithms for insertion and deletion are derivable from those for order-4 B-trees. The usual procedures for manipulating red-black trees don't use this correspondence directly, but formulate the necessary manipulations as ordinary binary search-tree operations followed by rebalancing rotations (see §9.3) and recolorings of nodes that are guided by the colors of nodes and their neighbors. We won't go into details here.

9.2 Tries

Loosely speaking, balanced (maximally bushy) binary search trees containing N keys require Θ(lg N) time to find a key. This is not entirely accurate, of course, because it neglects the possibility that the time required to compare against a key depends on the key. For example, the time required to compare two strings depends on the length of the shorter string. Therefore, in all the places I've said "Θ(lg N)" before, I really meant "Θ(L lg N)" for L a bound on the number of bytes in the key. In most applications, this doesn't much matter, since L tends to increase very slowly, if at all, as N increases. Nevertheless, we come to an interesting question: we evidently can't get rid of the factor of L too easily (after all, you have to look at the key you're searching for), but can we get rid of the factor of lg N?

    (2,4) node                         Corresponding binary search tree
    [k1], children A B                 k1(A, B)
    [k1 k2], children A B C            k1(A, k2(B, C))  or  k2(k1(A, B), C)
    [k1 k2 k3], children A B C D       k2(k1(A, B), k3(C, D))

Figure 9.6: Representations of (2,4) nodes with "red" and "black" binary tree nodes. On the left are the three possible cases for a single (2,4) node (one to three keys, or two to four children). On the right are the corresponding binary search trees. In each case, the top binary node is colored black and the others are red. (In the rendering above, x(P, Q) denotes a binary node with key x, left subtree P, and right subtree Q.)

9.2.1 Tries: basic properties and algorithms

It turns out that we can avoid the lg N factor, using a data structure known as a trie³. A pure trie is a kind of tree that represents a set of strings from some alphabet of fixed size, say A = {a_0, ..., a_{M−1}}. One of the characters is a special delimiter that appears only at the ends of words, '✷'. For example, A might be the set of printable ASCII characters, with ✷ represented by an unprintable character, such as '\000' (NUL). A trie, T, may be abstractly defined by the following recursive definition⁴: A trie, T, is either

• empty, or

• a leaf node containing a string, or

• an internal node containing M children that are also tries. The edges leading to these children are labeled by the characters of the alphabet, a_i, like this: C_{a_0}, C_{a_1}, ..., C_{a_{M−1}}.

We can think of a trie as a tree whose leaf nodes are strings. We impose one other condition:

• If by starting at the root of a trie and following edges labeled s_0, s_1, ..., s_{h−1}, we reach a string, then that string begins s_0 s_1 ··· s_{h−1}.

Therefore, you can think of every internal node of a trie as standing for some prefix of all the strings in the leaves below it: specifically, an internal node at level k stands for the first k characters of each string below it.

A string S = s_0 s_1 ··· s_{m−1} is in T if by starting at the root of T and following 0 or more edges labeled s_0 ··· s_j, we arrive at the string S. We will pretend that all strings in T end in ✷, which appears only as the last character of a string.

³ How is it pronounced? I have no idea. The word was suggested by E. Fredkin in 1960, who derived it from the word "retrieval." Despite this etymology, I usually pronounce it like "try" to avoid verbal confusion with "tree."

⁴ This version of the trie data structure is described in D. E. Knuth, The Art of Computer Programming, vol. 3, which is the standard reference on sorting and searching. The original data structure, proposed in 1959 by de la Briandais, was slightly different.

    (root)
        a: a
            ✷: a✷
            b: ab
                a: aba
                    s: abas
                        e: abase✷
                        h: abash✷
                    t: abate✷
                b: abbas✷
            x: ax
                e: axe✷
                o: axolotl✷
        f: f
            a: fa
                b: fabric✷
                c: facet✷

Figure 9.7: A trie containing the set of strings {a, abase, abash, abate, abbas, axe, axolotl, fabric, facet}. The internal nodes are labeled to show the string prefixes to which they correspond. (In the rendering above, each line gives an edge label followed by the node it leads to; labels ending in ✷ are leaves.)

Figure 9.8: Result of inserting the strings "bat" and "faceplate" into the trie in Figure 9.7. (The resulting trie is that of Figure 9.7 with two changes: the root gains an edge b leading to the leaf bat✷, and the edge c from node fa now leads to a new internal node fac, whose edge e leads to a new internal node face, which has edges p to faceplate✷ and t to facet✷.)

Figure 9.7 shows a trie that represents a small set of strings. To see if a string is in the set, we start at the root of the trie and follow the edges (links to children) marked with the successive characters in the string we are looking for (including the imaginary ✷ at the end). If we succeed in finding a string somewhere along this path and it equals the string we are searching for, then the string we are searching for is in the trie. If we don't, it is not in the trie. For each word, we need internal nodes only as far down as there are multiple words stored that start with the characters traversed to that point. The convention of ending everything with a special character allows us to distinguish between a situation in which the trie contains two words, one of which is a prefix of the other (like "a" and "abate"), from the situation where the trie contains only one long word.

From a trie user's point of view, it looks like a kind of tree with String labels:

    public abstract class Trie {
        /** The empty Trie. */
        public static final Trie EMPTY = new EmptyTrie();

        /** The label at this node. Defined only on leaves. */
        abstract public String label();

        /** True if X is in this Trie. */
        public boolean isIn(String x) ...

        /** The result of inserting X into this Trie, if it is not
         *  already there, and returning this. This trie is
         *  unchanged if X is in it already. */
        public Trie insert(String x) ...

        /** The result of removing X from this Trie, if it is present.
         *  The trie is unchanged if X is not present. */
        public Trie remove(String x) ...

        /** True if this Trie is a leaf (containing a single String). */
        abstract public boolean isLeaf();

        /** True if this Trie is empty */
        abstract public boolean isEmpty();

        /** The child numbered with character K. Requires that this node
         *  not be empty. Child 0 corresponds to ✷. */
        abstract public Trie child(int k);

        /** Set the child numbered with character K to C. Requires that
         *  this node not be empty. (Intended only for internal use.) */
        abstract protected void child(int k, Trie C);
    }

The following algorithm describes a search through a trie.

    /** True if X is in this Trie. */
    public boolean isIn(String x) {
        Trie P = longestPrefix(x, 0);
        return P.isLeaf() && x.equals(P.label());
    }

    /** The node representing the longest prefix of X.substring(K) that
     *  matches a String in this trie. */
    private Trie longestPrefix(String x, int k) {
        if (isEmpty() || isLeaf())
            return this;
        int c = nth(x, k);
        if (child(c).isEmpty())
            return this;
        else
            return child(c).longestPrefix(x, k+1);
    }

    /** Character K of X, or ✷ if K is off the end of X. */
    static char nth(String x, int k) {
        if (k >= x.length())
            return (char) 0;
        else
            return x.charAt(k);
    }

It should be clear from following this procedure that the time required to find a key is proportional to the length of the key. In fact, the number of levels of the trie that need to be traversed can be considerably less than the length of the key, especially when there are few keys stored. However, if a string is in the trie, you will have to look at all its characters, so isIn has a worst-case time of Θ(x.length).

To insert a key X in a trie, we again find the longest prefix of X in the trie, which corresponds to some node P. Then, if P is a leaf node, we insert enough internal nodes to distinguish X from P.label(). Otherwise, we can insert a leaf for X in the appropriate child of P. Figure 9.8 illustrates the results of adding "bat" and "faceplate" to the trie in Figure 9.7. Adding "bat" simply requires adding a leaf to an existing node. Adding "faceplate" requires inserting two new nodes first. The method insert below performs the trie insertion.

    /** The result of inserting X into this Trie, if it is not
     *  already there, and returning this. This trie is
     *  unchanged if X is in it already. */
    public Trie insert(String X)
    {
        return insert(X, 0);
    }

    /** Assumes this is a level L node in some Trie. Returns the
     *  result of inserting X into this Trie. Has no effect (returns
     *  this) if X is already in this Trie. */
    private Trie insert(String X, int L)
    {
        if (isEmpty())
            return new LeafTrie(X);
        int c = nth(X, L);
        if (isLeaf()) {
            // Use nth rather than charAt on the label, since the label
            // may be shorter than X (as when "a" is a prefix of "abate").
            if (X.equals(label()))
                return this;
            else if (c == nth(label(), L))
                return new InnerTrie(c, insert(X, L+1));
            else {
                Trie newNode = new InnerTrie(c, new LeafTrie(X));
                newNode.child(nth(label(), L), this);
                return newNode;
            }
        } else {
            child(c, child(c).insert(X, L+1));
            return this;
        }
    }

Here, the constructor for InnerTrie(c, T), described later, gives us a Trie for which child(c) is T and all other children are empty.

Deleting from a trie just reverses this process. Whenever a trie node is reduced to containing a single leaf, it may be replaced by that leaf. The following program indicates the process.

    public Trie remove(String x)
    {
        return remove(x, 0);
    }

    /** Remove x from this Trie, which is assumed to be level L, and
     *  return the result. */
    private Trie remove(String x, int L)
    {
        if (isEmpty())
            return this;
        if (isLeaf()) {
            if (x.equals(label()))
                return EMPTY;
            else
                return this;
        }
        int c = nth(x, L);
        child(c, child(c).remove(x, L+1));
        int d = onlyMember();
        if (d >= 0)
            return child(d);
        return this;
    }

    /** If this Trie contains a single string, which is in
     *  child(K), return K. Otherwise returns -1. */
    private int onlyMember() { /* Left to the reader. */ }

9.2.2 Tries: Representation

We are left with the question of how to represent these tries. The main problem of course is that the nodes contain a variable number of children. If the number of children in each node is small, a linked tree representation like those described in §5.2 will work. However, for fast access, it is traditional to use an array to hold the children of a node, indexed by the characters that label the edges. This leads to something like the following:

    class EmptyTrie extends Trie {
        public boolean isEmpty() { return true; }
        public boolean isLeaf() { return false; }
        public String label() { throw new Error(...); }
        public Trie child(int c) { throw new Error(...); }
        protected void child(int c, Trie T) { throw new Error(...); }
    }

    class LeafTrie extends Trie {
        private String L;

        /** A Trie containing just the string S. */
        LeafTrie(String s) { L = s; }

        public boolean isEmpty() { return false; }
        public boolean isLeaf() { return true; }
        public String label() { return L; }
        public Trie child(int c) { return EMPTY; }
        protected void child(int c, Trie T) { throw new Error(...); }
    }

    class InnerTrie extends Trie {
        // ALPHABETSIZE has to be defined somewhere
        private Trie[] kids = new Trie[ALPHABETSIZE];

        /** A Trie with child(K) == T and all other children empty. */
        InnerTrie(int k, Trie T) {
            for (int i = 0; i < kids.length; i += 1)
                kids[i] = EMPTY;
            child(k, T);
        }

        public boolean isEmpty() { return false; }
        public boolean isLeaf() { return false; }
        public String label() { throw new Error(...); }
        public Trie child(int c) { return kids[c]; }
        protected void child(int c, Trie T) { kids[c] = T; }
    }
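Putting the pieces of this section together gives the following compact, runnable sketch. It follows the design above but makes some assumptions of its own: keys are 7-bit ASCII strings (ALPHABETSIZE is 128, with character 0 as ✷), errors are reported with IllegalStateException, and onlyMember, which the text leaves to the reader, is filled in with one possible implementation.

```java
import java.util.Arrays;

abstract class Trie {
    static final int ALPHABETSIZE = 128;  // assumed: 7-bit ASCII, 0 is the end marker
    static final Trie EMPTY = new EmptyTrie();

    abstract boolean isEmpty();
    abstract boolean isLeaf();
    abstract String label();
    abstract Trie child(int c);
    abstract void child(int c, Trie t);

    /** Character K of X, or 0 (the end marker) if K is off the end of X. */
    static char nth(String x, int k) {
        return k >= x.length() ? (char) 0 : x.charAt(k);
    }

    boolean isIn(String x) {
        Trie p = longestPrefix(x, 0);
        return p.isLeaf() && x.equals(p.label());
    }

    private Trie longestPrefix(String x, int k) {
        if (isEmpty() || isLeaf()) return this;
        Trie kid = child(nth(x, k));
        return kid.isEmpty() ? this : kid.longestPrefix(x, k + 1);
    }

    Trie insert(String x) { return insert(x, 0); }

    private Trie insert(String x, int lev) {
        if (isEmpty()) return new LeafTrie(x);
        int c = nth(x, lev);
        if (!isLeaf()) {
            child(c, child(c).insert(x, lev + 1));
            return this;
        }
        if (x.equals(label())) return this;
        int d = nth(label(), lev);
        if (c == d) return new InnerTrie(c, insert(x, lev + 1));
        Trie node = new InnerTrie(c, new LeafTrie(x));
        node.child(d, this);
        return node;
    }

    Trie remove(String x) { return remove(x, 0); }

    private Trie remove(String x, int lev) {
        if (isEmpty()) return this;
        if (isLeaf()) return x.equals(label()) ? EMPTY : this;
        int c = nth(x, lev);
        child(c, child(c).remove(x, lev + 1));
        int d = onlyMember();
        return d >= 0 ? child(d) : this;
    }

    /** The index of my only non-empty child, if it is a leaf; else -1. */
    private int onlyMember() {
        int found = -1;
        for (int c = 0; c < ALPHABETSIZE; c += 1) {
            Trie kid = child(c);
            if (kid.isEmpty()) continue;
            if (found >= 0 || !kid.isLeaf()) return -1;
            found = c;
        }
        return found;
    }
}

class EmptyTrie extends Trie {
    boolean isEmpty() { return true; }
    boolean isLeaf() { return false; }
    String label() { throw new IllegalStateException("empty trie"); }
    Trie child(int c) { throw new IllegalStateException("empty trie"); }
    void child(int c, Trie t) { throw new IllegalStateException("empty trie"); }
}

class LeafTrie extends Trie {
    private final String label;
    LeafTrie(String s) { label = s; }
    boolean isEmpty() { return false; }
    boolean isLeaf() { return true; }
    String label() { return label; }
    Trie child(int c) { return EMPTY; }
    void child(int c, Trie t) { throw new IllegalStateException("leaf"); }
}

class InnerTrie extends Trie {
    private final Trie[] kids = new Trie[ALPHABETSIZE];
    InnerTrie(int k, Trie t) { Arrays.fill(kids, EMPTY); kids[k] = t; }
    boolean isEmpty() { return false; }
    boolean isLeaf() { return false; }
    String label() { throw new IllegalStateException("inner node"); }
    Trie child(int c) { return kids[c]; }
    void child(int c, Trie t) { kids[c] = t; }
}
```

Because insert and remove return the resulting trie, a client always writes t = t.insert(x) or t = t.remove(x), starting from Trie.EMPTY.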

9.2.3 Table compression

Actually, our alphabet is likely to have "holes" in it: stretches of encodings that don't correspond to any character that will appear in the Strings we insert. We could cut down on the size of the inner nodes (the kids arrays) by performing a preliminary mapping of chars into a compressed encoding. For example, if the only characters in our strings are the digits 0–9, then we could re-do InnerTrie as follows:

    class InnerTrie extends Trie {
        private static char[] charMap = new char['9'+1];

        static {
            charMap[0] = 0;
            charMap['0'] = 1; charMap['1'] = 2; ...
        }

        public Trie child(int c) { return kids[charMap[c]]; }
        protected void child(int c, Trie T) { kids[charMap[c]] = T; }
        ...
    }
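The elided initialization of charMap might be completed with a loop, as in the sketch below. The class name DigitCharMap, the constant COMPRESSED_SIZE, and the method index are names I have made up for the example; with this mapping, a kids array needs only 11 entries.

```java
/** Illustrative compressed encoding for keys made up of the digits 0-9. */
class DigitCharMap {
    /** Number of distinct compressed codes: the end marker plus ten digits. */
    static final int COMPRESSED_SIZE = 11;

    /** charMap[c] is the compressed code for character c: 0 for the
     *  end marker (character 0) and 1..10 for '0'..'9'.  Other
     *  characters below '9' also map to 0 by default, so this table is
     *  meaningful only if keys really contain nothing but digits. */
    private static final char[] charMap = new char['9' + 1];

    static {
        charMap[0] = 0;
        for (char c = '0'; c <= '9'; c += 1) {
            charMap[c] = (char) (c - '0' + 1);
        }
    }

    /** The index in a compressed kids array for character C. */
    static int index(char c) {
        return charMap[c];
    }
}
```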

This helps, but even so, arrays that may be indexed by all characters valid in a key are likely to be relatively large (for a tree node), say on the order of M = 60 bytes even for nodes that can contain only digits (assuming 4 bytes per pointer, 4 bytes overhead for every object, 4 bytes for a length field in the array). If there is a total of N characters in all keys, then the space needed is bounded by about NM/2. The bound is reached only in a highly pathological case (where the trie contains only two very long strings that are identical except in their last characters). Nevertheless, the arrays that arise in tries can be quite sparse.

One approach to solving this is to compress the tables. This is especially applicable when there are few insertions once some initial set of strings is accommodated. By the way, the techniques described below are generally applicable to any such sparse array, not just tries.

The basic idea is that sparse arrays (i.e., those that mostly contain empty or "null" entries) can be overlaid on top of each other by making sure that the non-null entries in one fall on top of null entries in the others. We allocate all the arrays in a single large one, and store extra information with each entry so that we can tell which of the overlaid arrays that entry belongs to. Figure 9.9 shows an appropriate alternative data structure.

The idea is that we store everybody's array of kids in one place, and store an edge label that tells us what character is supposed to correspond to each kid. That allows us to distinguish between a slot that contains somebody else's child (which means that I have no child for that character), and a slot that contains one of my children. We arrange that the me field for every node is unique by making sure that the 0th child (corresponding to ✷) is always full.

As an example, Figure 9.10 shows the ten internal nodes of the trie in Figure 9.8 overlaid on top of each other. As the figure illustrates, this representation can be very compact. The number of extra empty entries that are needed on the right (to prevent indexing off the end of the array) is limited to M − 1, so that it becomes negligible when the array is large enough. (Aside: When dealing with a set of arrays that one wishes to compress in this way, it is best to allocate the fullest (least sparse) first.)

Such close packing comes at a price: insertions are expensive. When one adds a new child to an existing node, the necessary slot may already be used by some other array, making it necessary to move the node to a new location by (in effect) first erasing its non-null entries from the packed storage area, finding another spot for it and moving its entries there, and finally updating the pointer to the node being moved in its parent. There are ways to mitigate this, but we won't go into them here.
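Since the technique applies to any sparse array, here is a small sketch of it on plain integer rows rather than trie nodes. Everything in it is an assumption made for illustration: the class PackedRows, the marker NONE for a null entry, the fixed capacity, and the first-fit search for a free position. As in the text, each row must have its entry 0 filled, which makes each row's starting position unique.

```java
import java.util.Arrays;

/** Illustrative overlay of sparse integer rows in shared storage. */
class PackedRows {
    /** Marks a null entry in a row, or an unused slot in the storage. */
    static final int NONE = -1;

    private final int[] value;   // packed entries of all rows
    private final int[] label;   // label[s] = column whose entry is in slot s

    PackedRows(int capacity) {
        value = new int[capacity];
        label = new int[capacity];
        Arrays.fill(value, NONE);
        Arrays.fill(label, NONE);
    }

    /** Store ROW (whose entries are NONE or non-negative, with
     *  ROW[0] != NONE) at the first position where its non-null
     *  entries fall on unused slots.  Returns that position. */
    int add(int[] row) {
        if (row[0] == NONE) {
            throw new IllegalArgumentException("entry 0 must be filled");
        }
        for (int base = 0; base + row.length <= value.length; base += 1) {
            if (fits(row, base)) {
                for (int c = 0; c < row.length; c += 1) {
                    if (row[c] != NONE) {
                        value[base + c] = row[c];
                        label[base + c] = c;
                    }
                }
                return base;
            }
        }
        throw new IllegalStateException("packed storage is full");
    }

    /** True iff every non-null entry of ROW would land on an unused
     *  slot if ROW started at BASE. */
    private boolean fits(int[] row, int base) {
        for (int c = 0; c < row.length; c += 1) {
            if (row[c] != NONE && label[base + c] != NONE) {
                return false;
            }
        }
        return true;
    }

    /** Entry C of the row stored at BASE, or NONE if that entry is
     *  null (the slot is unused or holds another row's entry). */
    int get(int base, int c) {
        return label[base + c] == c ? value[base + c] : NONE;
    }
}
```

The test label[base + c] == c in get plays the same role as the comparison against edgeLabels in the trie representation.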

9.3 Restoring Balance by Rotation

Another approach is to find an operation that changes the balance of a BST (choosing a new root that moves keys from a deep side to a shallow side) while preserving the binary search tree property. The simplest such operations are the rotations of a tree. Figure 9.11 shows two BSTs holding identical sets of keys. Consider the right rotation first (the left is a mirror image). First, the rotation preserves the binary search tree property. In the unrotated tree, the nodes in A are

abstract class Trie {
    static protected Trie[] allKids;
    static protected char[] edgeLabels;
    static final char NOEDGE = /* Some char that isn't used. */;

    static {
        allKids = new Trie[INITIAL_SPACE];
        edgeLabels = new char[INITIAL_SPACE];
        for (int i = 0; i < INITIAL_SPACE; i += 1) {
            allKids[i] = EMPTY; edgeLabels[i] = NOEDGE;
        }
    }
}

...

class InnerTrie extends Trie {

    /* Position of my child 0 in allKids. My kth child, if
     * non-empty, is at allKids[me + k]. If my kth child is
     * not empty, then edgeLabels[me + k] == k. edgeLabels[me]
     * is always 0 (✷). */
    private int me;

    /** A Trie with child(K) == T and all other children empty. */
    InnerTrie(int k, Trie T) {
        // Set me such that edgeLabels[me + k].isEmpty().
        child(0, EMPTY);
        child(k, T);
    }

    public Trie child(int c) {
        if (edgeLabels[me + c] == c)
            return allKids[me + c];
        else
            return EMPTY;
    }

    protected void child(int c, Trie T) {
        if (edgeLabels[me + c] != NOEDGE &&
            edgeLabels[me + c] != c) {
            // Move my kids to a new location, and point me at it.
        }
        allKids[me + c] = T;
        edgeLabels[me + c] = c;
    }
}

Figure 9.9: Data structures used with compressed Tries.

Figure 9.10: A packed version of the trie from Figure 9.8. Each of the trie nodes from that figure is represented as an array of children indexed by character; the character that is the index of a child is stored in the upper row (which corresponds to the array edgeLabels). The pointer to the child itself is in the lower row (which corresponds to the allKids array). Empty boxes on top indicate unused locations (the NOEDGE value). To compress the diagram, I’ve changed the character set encoding so that ✷ is 0, ‘a’ is 1, ‘b’ is 2, etc. The crossed boxes in the lower row indicate empty nodes. There must also be an additional 24 empty entries on the right (not shown) to account for the c–z entries of the rightmost trie node stored. The search algorithm uses edgeLabels to determine when an entry actually belongs to the node it is currently examining. For example, the root node is supposed to contain entries for ‘a’, ‘b’, and ‘f’. And indeed, if you count 1, 2, and 6 over from the “root” box above, you’ll find entries whose edge labels are ‘a’, ‘b’, and ‘f’. If, on the other hand, you count over 3 from the root box, looking for the non-existent ‘c’ edge, you find instead an edge label of ‘e’, telling you that the root node has no ‘c’ edge.

exactly the ones less than B, as they are on the right; D is greater, as on the right; and subtree C is greater, as on the right. You can also assure yourself that the nodes under D in the rotated tree bear the proper relation to it.

Turning to height, let’s use the notation $H_A$, $H_C$, $H_E$, $H_B$, and $H_D$ to denote the heights of subtrees A, C, and E and of the subtrees whose roots are nodes B and D. Any of A, C, or E can be empty; we’ll take their heights in that case to be $-1$. The height of the tree on the left is $1 + \max(H_E, 1 + H_A, 1 + H_C)$. The height of the tree on the right is $1 + \max(H_A, 1 + H_C, 1 + H_E)$. Therefore, as long as $H_A > \max(H_C + 1, H_E)$ (as would happen in a left-leaning tree, for example), the height of the right-hand tree will be less than that of the left-hand tree. One gets a similar situation in the other direction.
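The height comparison can be checked on a small concrete tree. The following is a minimal sketch, assuming a hypothetical Node class and helper names (height, rotateRight, rotateLeft, leftLeaning) that are introduced here for illustration only. The sample tree takes A to be the two-node subtree {0, 1}, C to be empty, and E to be the single node 5, so that $H_A = 1 > \max(H_C + 1, H_E) = 0$.

```java
public class RotationDemo {
    /** A bare-bones BST node (hypothetical; for illustration only). */
    public static class Node {
        public int key;
        public Node left, right;
        public Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    /** Height of T, taking the height of the empty tree to be -1. */
    public static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    /** Rotate D right: its left child B becomes the root, and B's old
     *  right subtree (C) becomes D's left subtree. Returns the new root. */
    public static Node rotateRight(Node d) {
        Node b = d.left;
        d.left = b.right;
        b.right = d;
        return b;
    }

    /** Mirror image of rotateRight. Returns the new root. */
    public static Node rotateLeft(Node b) {
        Node d = b.right;
        b.right = d.left;
        d.left = b;
        return d;
    }

    /** A left-leaning tree with D = 4 at the root and B = 2 as its left
     *  child; A = {0, 1}, C is empty, and E = {5}. */
    public static Node leftLeaning() {
        Node a = new Node(1, new Node(0, null, null), null);
        Node b = new Node(2, a, null);
        return new Node(4, b, new Node(5, null, null));
    }
}
```

Here the unrotated tree has height $1 + \max(0, 1 + 1, 1 + (-1)) = 3$ and the rotated tree has height $1 + \max(1, 1 + (-1), 1 + 0) = 2$, in agreement with the two formulas above; applying rotateLeft to the result restores the original shape.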

In fact, it is possible to convert any BST into any other that contains the same keys by means of rotations. This amounts to showing that by rotation, we can move any node of a BST to the root of the tree while preserving the binary search tree

[Figure 9.11 diagram: on the left, a BST with root D whose left child B has subtrees A and C and whose right subtree is E; on the right, a BST with root B whose left subtree is A and whose right child D has subtrees C and E. D.rotateRight() converts the left tree into the right one, and B.rotateLeft() converts it back.]

Figure 9.11: Rotations in a binary search tree. Triangles represent subtrees and circles represent individual nodes. The binary search tree relation is maintained by both operations, but the levels of various nodes are affected.

property [why is this sufficient?]. The argument is an induction on the structure of trees.

• It is clearly possible for empty or one-element trees.

• Suppose we want to show it for a larger tree, assuming (inductively) that all smaller trees can be rotated to bring any of their nodes to their root. We proceed as follows:

    – If the node we want to make the root is already there, we’re done.

    – If the node we want to make the root is in the left child, rotate the left child to make it the root of the left child (inductive hypothesis). Then perform a right rotation on the whole tree.

    – Similarly if the node we want is in the right child.

9.3.1 AVL Trees

Of course, knowing that it is possible to re-arrange a BST by means of rotation doesn’t tell us which rotations to perform. The AVL tree is an example of a technique for keeping track of the heights of subtrees and performing rotations when they get too far out of line. An AVL tree⁵ is simply a BST that satisfies the

    AVL Property: the heights of the left and right subtrees of every node differ by at most one.

⁵ The name is taken from the names of the two discoverers, Adel’son-Vel’skiĭ and Landis.

Adding or deleting a node at the bottom of such a tree (as happens with the simple BST insertion and deletion algorithms from §6.1) may invalidate the AVL property, but it may be restored by working up toward the root from the point of the insertion or deletion and performing certain selected rotations, depending on the nature of the imbalance that needs to be corrected. In the following diagrams, the expressions in the subtrees indicate their heights. An unbalanced subtree having the form

[Diagram not recovered: an unbalanced subtree whose subtrees are labeled with heights h and h + 1.]

can be rebalanced with a single left rotation, giving an AVL tree of the form:

[Diagram not recovered: the rebalanced subtree.]

Finally, consider an unbalanced tree of the form

[Diagram not recovered: an unbalanced tree on nodes A, B, and C whose subtrees are labeled with heights h, h′, and h′′.]

where at least one of h′ and h′′ is h and the other is either h or h − 1. Here, we can rebalance by performing two rotations, first a right rotation on C, and then a left rotation on A, giving the correct AVL tree

[Diagram not recovered: the rebalanced tree.]

The other possible cases of imbalance that result from adding or removing a single node are mirror images of these.

Thus, if we keep track of the heights of all subtrees, we can always restore the AVL property, starting from the point of insertion or deletion at the base of the tree and proceeding upwards. In fact, it turns out that it isn’t necessary to know the precise heights of all subtrees, but merely to keep track of the three cases at each node: the two subtrees have the same height, the height of the left subtree is greater by 1, and the height of the right subtree is greater by 1.
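To make the rebalancing rules concrete, here is a minimal sketch of AVL insertion, under stated assumptions: the class AvlDemo, its Node type, and the methods insert, rebalance, and isAvl are hypothetical names chosen for illustration, and each node stores its full height rather than the three-way balance indicator mentioned above. The single-rotation and double-rotation branches correspond to the two cases described in the text, together with their mirror images.

```java
public class AvlDemo {
    /** A BST node that records the height of its subtree (hypothetical;
     *  a node with no children has height 0). */
    public static class Node {
        public int key, height;
        public Node left, right;
        Node(int key) { this.key = key; }
    }

    public static int height(Node t) { return t == null ? -1 : t.height; }

    public static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    private static void update(Node t) {
        t.height = 1 + Math.max(height(t.left), height(t.right));
    }

    private static Node rotateRight(Node d) {
        Node b = d.left;
        d.left = b.right;
        b.right = d;
        update(d); update(b);      // d is now below b, so fix it first
        return b;
    }

    private static Node rotateLeft(Node b) {
        Node d = b.right;
        b.right = d.left;
        d.left = b;
        update(b); update(d);      // b is now below d, so fix it first
        return d;
    }

    /** Restore the AVL property at T, assuming that both of its subtrees
     *  already satisfy it and that their heights differ by at most 2. */
    private static Node rebalance(Node t) {
        update(t);
        int diff = height(t.right) - height(t.left);
        if (diff > 1) {                                   // right side too tall
            if (height(t.right.left) > height(t.right.right))
                t.right = rotateRight(t.right);           // double-rotation case
            return rotateLeft(t);
        } else if (diff < -1) {                           // mirror image
            if (height(t.left.right) > height(t.left.left))
                t.left = rotateLeft(t.left);
            return rotateRight(t);
        }
        return t;
    }

    /** Insert KEY into T (if absent), returning the new root. */
    public static Node insert(Node t, int key) {
        if (t == null)
            return new Node(key);
        if (key < t.key)
            t.left = insert(t.left, key);
        else if (key > t.key)
            t.right = insert(t.right, key);
        return rebalance(t);       // fix heights and balance on the way up
    }

    /** True iff every node of T satisfies the AVL property. */
    public static boolean isAvl(Node t) {
        return t == null
            || (Math.abs(height(t.left) - height(t.right)) <= 1
                && isAvl(t.left) && isAvl(t.right));
    }
}
```

Inserting keys in increasing order, which would turn a plain BST into a linked list, is a convenient way to exercise this sketch: the recursion unwinds from the insertion point toward the root, calling rebalance at each node on the path.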

9.4 Splay Trees

Rotations allow us to move any node of a binary search tree as close as we want to the root of the tree, all the while maintaining the binary search tree property. At the very least, therefore, we could use rotations in an unbalanced tree to make commonly searched-for keys quick to find. It turns out we can do better. A splay tree⁶ is a form of self-adjusting binary search tree, one in which even operations that don’t change the content of the tree can nevertheless adjust its structure to speed up subsequent operations. This data structure has the interesting property that some individual operations may take O(N) time (for N items in the tree), but the amortized cost (see §1.4) of each operation in a whole sequence of K operations (including K insertions) is still O(lg K). It is, moreover, a particularly simple modification of the basic (unbalanced) binary search tree.

Unsurprisingly, the defining operation in this tree is splaying, rotating a particular node to the root in a certain way. Splaying a node means applying a sequence of splaying steps so as to bring the node to the top of the tree. There are three types of splaying step:

1. Given a node t and one of its children, y, rotate y around t (that is, rotate t left or right, as appropriate, to bring y to the top). The original paper calls this a “zig” step.

2. Given a node t, one of its children, y, and the child, z, of y that is on the same side of y as y is of t, rotate y around t, and then rotate z around y (a “zig-zig” step).

3. Given a node t, one of its children, y, and the child, z, of y that is on the opposite side of y as y is of t, rotate z around y and then around t (a “zig-zag” step).

Figure 9.12 illustrates these three basic steps.

[Figure 9.12 diagram: three rows showing the “zig”, “zig-zig”, and “zig-zag” steps on nodes t, y, and z with subtrees A through D, followed by a Before/After example of splaying node 3.]

Figure 9.12: The basic splaying steps. There are mirror image cases when y is on the other side of t. The last row illustrates a complete splaying operation (on node 3). Starting at the bottom, we perform a “zig” step, followed by a “zig-zig,” and finally a “zig-zag” all on node 3, ending up with the tree on the right.

The nodes that we subject to this operation are those on the path from the root that we would normally follow to find a given value in a binary search tree. To get some intuition into the motivation behind this particular operation, consider Figure 9.13. The tree on the left of the figure is a typical worst-case binary search tree. After splaying node 0, we get the tree on the right of the figure, which has roughly half the height of the former. It’s true that we have to do 7 rotations to splay this node, but to create the tree on the left, we did 8 constant-time insertions, so that (so far), the amortized costs of all 9 operations (8 insertions plus one splay) are only about 2 each.

The operations of searching, inserting, and deleting from an ordinary binary search tree all involve searching for a node that contains a particular value, or one that is as close as possible to it. In splay trees, after finding this node, we splay it, bringing it to the root of the tree and reorganizing the rest of the tree. For this purpose, it is convenient to modify the usual algorithm for searching for a value in a BST so that it splays at the same time. Figure 9.15 shows one possible implementation.⁷ It operates on trees of the type BST, given in Figure 9.14. This particular type provides operations for rotating and replacing a child that will perform either left or right operations, allowing us to collapse a nest of cases into a few.

⁶ D. D. Sleator and R. E. Tarjan, “Self-Adjusting Binary Search Trees,” Journal of the ACM 32(3), July 1985, pp. 652–686.

The splayFind procedure is a tool that we can use to implement the usual operations on binary search trees, as shown in Figure 9.16 and illustrated in Figure 9.17.

9.4.1 Analyzing splay trees

It is quite easy to create very unbalanced splay trees. Inserting items into a tree in order will do it. So will searching for all items in the tree in order. So the worst-case cost of any particular operation in isolation is Θ(N), if N is the number of nodes (and therefore keys) in the tree. But you never perform a single operation on a large tree; after all, you had to build the tree in the first place, and that certainly had to take time at least proportional to its size. Therefore, we might expect to get different results if we ask for the amortized time of operations over an entire sequence. In this section, we’ll show that in fact the amortized time bound for the operations of search, insertion, and deletion on a splay tree is O(lg N), just like the worst-case

⁷ The splayFind procedure here performs zig-zig and zig-zag steps, after first possibly performing a zig step at the bottom of the search. This is one of many possible variations on splaying. The original paper by Sleator and Tarjan shows how to perform splaying steps from the top down, or from the bottom up, but with the zig at the top of the tree rather than the bottom, or with a simplified version of the zig-zag step. These all result in slightly different trees, but all have essentially the same amortized performance. The version here is not the most efficient (being a linear recursion instead of an iterative process), but I find it convenient for analysis.

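[Figure 9.13 diagram: on the left, a chain with root 7 and successive left children 6, 5, 4, 3, 2, 1, 0; on the right, the tree with root 0 that results from splaying node 0.]

Figure 9.13: Splaying node 0 in a completely unbalanced tree. The resulting tree has about half the height of the original, speeding up subsequent searches.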

public static class BST {
    public BST(int label, BST left, BST right) {
        this.label = label;
        this.left = left; this.right = right;
    }

    public BST left, right;
    public int label;

    /** Rotate CHILD left or right around me, as appropriate,
     *  returning CHILD. CHILD must be one of my children. */
    BST rotate(BST child) {
        if (child == right) {
            right = child.left;
            child.left = this;
        } else {
            left = child.right;
            child.right = this;
        }
        return child;
    }

    /** Replace CHILD with NEWCHILD as one of my children. CHILD
     *  must be either my (initial) left or right child. */
    void replace(BST child, BST newChild) {
        if (child == right)
            right = newChild;
        else
            left = newChild;
    }
}

Figure 9.14: The binary search-tree structure used for our splay trees. This is just an ordinary BST that supplies unified operations for rotating or replacing children.

/** Reorganize T, maintaining the BST property, so that its root is
 *  either V or the next value larger or smaller than V in T. Returns
 *  null only if T is empty. */
private static BST splayFind(BST t, int v)
{
    BST y, z;

    if (t == null || v == t.label)
        return t;

    y = v < t.label ? t.left : t.right;
    if (y == null)
        return t;
    else if (v == y.label)
        return t.rotate(y);                        /* zig */
    else if (v < y.label)
        z = y.left = splayFind(y.left, v);
    else
        z = y.right = splayFind(y.right, v);

    if (z == null)
        return t.rotate(y);                        /* zig */
    else if ((v < t.label) == (v < y.label)) {     /* zig-zig */
        t.rotate(y);
        y.rotate(z);
        return z;
    } else {                                       /* zig-zag */
        t.replace(y, y.rotate(z));
        t.rotate(z);
        return z;
    }
}

Figure 9.15: The splayFind procedure for finding and splaying a node. Used by insertion, deletion, and search.

public class IntSplayTree {
    private BST root = null;

    private static BST splayFind(BST t, int v) { /* See Figure 9.15. */ }

    /** Insert V into me iff not already present. Returns true
     *  iff V was added. */
    public boolean add(int v) {
        root = splayFind(root, v);
        if (root == null)
            root = new BST(v, null, null);
        else if (v == root.label)
            return false;
        else if (v < root.label) {
            root = new BST(v, root.left, root);
            root.right.left = null;
        } else {
            root = new BST(v, root, root.right);
            root.left.right = null;
        }
        return true;
    }

    /** Delete V from me iff present. Returns true iff V was deleted. */
    public boolean remove(int v) {
        root = splayFind(root, v);
        if (root == null || v != root.label)
            return false;
        if (root.left == null)
            root = root.right;
        else {
            BST r = root.right;
            root = splayFind(root.left, v);
            root.right = r;
        }
        return true;
    }

    /** True iff I contain V. */
    public boolean contains(int v) {
        root = splayFind(root, v);
        // splayFind returns null for an empty tree, which contains nothing.
        return root != null && v == root.label;
    }
}

Figure 9.16: Standard collection operations on a splay tree. The interface is in the style of the Java collections classes. Figure 9.17 illustrates these methods.

Figure 9.17: Basic BST operations on a splay tree. (a) is the original tree. (b) is the result of performing a splayFind on either of the values 21 or 24. (c) is the result of adding the value 21 into tree (a); the first step is to create the splayed tree (b). (d) is the result of removing 24 from the original tree (a); again the first step is to create (b), after which we splay the left child of 24 for the value 24, which is guaranteed to be larger than any value in that child.

bounds for other balanced binary trees.

To do so, we first define a potential function on our trees, as described in §1.4, which will keep track of how many cheap (and unbalancing) operations we have performed, and thus indicate how much time we can afford to spend in an expensive operation while still guaranteeing that the total cumulative cost of a sequence of operations stays appropriately bounded. As we did there (Equation 1.1), we define the amortized cost, $a_i$, of the $i$th operation in a sequence to be

$$a_i = c_i + \Phi_{i+1} - \Phi_i,$$

where $c_i$ is the actual cost and $\Phi_k$ is the amount of “stored potential” in the data structure just before the $k$th operation. For our $c_i$, it’s convenient to use the number of rotations performed, or 1 if an operation involves no rotation. That gives us a value for $c_i$ that is proportional to the real amount of work. The challenge is to find a $\Phi$ that allows us to absorb the spikes in $c_i$; when $a_i > c_i$, we save up “operation credit” in $\Phi$ and release it (by causing $\Phi_{i+1} < \Phi_i$) on steps where $c_i$ becomes large. To be suitable, we must make sure that $\Phi_i \ge \Phi_0$ at all times.

For a splay tree containing a set of nodes, $T$, we’ll use as our potential function

$$\Phi = \sum_{x \in T} r(x) = \sum_{x \in T} \lg s(x),$$

where $s(x)$ is the size of (the number of nodes in) the subtree rooted at $x$. The value $r(x) = \lg s(x)$ is called the rank of $x$. Thus, for the completely linear tree on the left in Figure 9.13, $\Phi = \sum_{1 \le i \le 8} \lg i = \lg 8! \approx 15.3$, while the tree on the right of that figure has $\Phi = 4 \lg 1 + \lg 3 + \lg 5 + \lg 7 + \lg 8 \approx 9.7$, indicating that the cost of splaying 0 is largely offset by decreasing the potential.

All but a constant amount of work in each operation (search, insert, or delete) takes place in splayFind, so it will suffice to analyze that. I claim that the amortized cost of finding and splaying a node $x$ in a tree whose root is $t$ is $\le 3(r(t) - r(x)) + 1$. Since $t$ is the root, we know that $r(t) \ge r(x) \ge 0$. Furthermore, since $s(t) = N$, the number of nodes in the tree, proving this claim will prove that the amortized cost of splaying must be $O(\lg N)$, as desired.⁸

⁸ My treatment here is adapted from Lemma 1 and its proof in the Sleator and Tarjan paper.

Let’s let $C(t, x)$ represent the amortized cost of finding and splaying a node $x$ in a tree rooted at $t$. That is,

$$C(t, x) = \max(1, \text{number of rotations performed}) + \text{final potential of tree} - \text{initial potential of tree}.$$

We proceed recursively, following the structure of the program in Figure 9.15, to show that

$$C(t, x) \le 3(r(t) - r(x)) + 1 = 3 \lg(s(t)/s(x)) + 1. \tag{9.1}$$

It’s convenient to use the notation $s'(z)$ to mean “the value of $s(z)$ at the end of a splay step,” and $r'(z)$ to mean “the value of $r(z)$ at the end of a splay step.”
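Before working through the cases, the two potential values quoted above for Figure 9.13 can be checked numerically. The following is a minimal sketch; the class PotentialDemo and its methods are hypothetical names introduced for illustration, and the shape of the right-hand tree is an assumption reconstructed from the subtree sizes that appear in the sum (four leaves and subtrees of sizes 3, 5, 7, and 8).

```java
public class PotentialDemo {
    /** An immutable binary tree node (hypothetical; for illustration). */
    public static class Node {
        public final int label;
        public final Node left, right;
        public Node(int label, Node left, Node right) {
            this.label = label; this.left = left; this.right = right;
        }
    }

    /** s(x): the number of nodes in the subtree rooted at T. */
    public static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    /** Base-2 logarithm. */
    public static double lg(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** The potential of T: the sum of lg s(x) over all nodes x in T. */
    public static double potential(Node t) {
        if (t == null)
            return 0.0;
        return lg(size(t)) + potential(t.left) + potential(t.right);
    }

    /** A completely linear tree on labels 0 .. N-1, with N-1 at the
     *  root and every node having only a left child. */
    public static Node linear(int n) {
        Node t = null;
        for (int k = 0; k < n; k += 1)
            t = new Node(k, t, null);
        return t;
    }

    private static Node leaf(int label) {
        return new Node(label, null, null);
    }

    /** Assumed shape of the tree after splaying 0: root 0 with right
     *  child 6, whose subtrees are rooted at 4 and 7, and so on down. */
    public static Node afterSplay() {
        Node two = new Node(2, leaf(1), leaf(3));
        Node four = new Node(4, two, leaf(5));
        Node six = new Node(6, four, leaf(7));
        return new Node(0, null, six);
    }
}
```

Calling potential(linear(8)) evaluates $\lg 8!$, and potential(afterSplay()) evaluates $\lg 3 + \lg 5 + \lg 7 + \lg 8$ (the four leaves contribute $\lg 1 = 0$ each). The size computation is repeated at every node, which is wasteful but keeps the sketch close to the definition.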

1. When $t$ is the empty tree or $v$ is at its root, there are no rotations, the potential is unchanged, and we take the real cost to be 1. Assertion 9.1 is obviously true in this case.

2. When $x = y$ is a child of $t$ (the “zig” case, shown at the top of Figure 9.12), we perform one rotation, for a total actual cost of 1. To compute the change in potential, we first notice that the only nodes we have to consider are $t$ and $x$, because the ranks of all other nodes do not change, and thus cancel out when we subtract the new potential from the old one. Thus, the change in potential is

$$\begin{aligned}
r'(t) + r'(x) - r(t) - r(x) &= r'(t) - r(x), && \text{since } r'(x) = r(t) \\
&< r(t) - r(x), && \text{since } r'(t) < r(t) \\
&< 3(r(t) - r(x)), && \text{since } r(t) - r(x) > 0
\end{aligned}$$

and therefore, adding in the cost of one rotation, the amortized cost is $< 3(r(t) - r(x)) + 1$.

3. In the zig-zig case, the cost consists of first splaying $x$ up to be a grandchild of $t$ (node $z$ in the second row of Figure 9.12), and then performing two rotations. By assumption, the amortized cost of the first splay step is $C(z, x) \le 3(r(z) - r(x)) + 1$ ($r(z)$ is the rank of $x$ after it is splayed to the former position of $z$, since splaying does not change the rank of the root of a tree. We’ll abuse notation a bit and refer to the node $x$ after this splaying as $z$ so that we can still use $r(x)$ as the original rank of $x$). The cost of the rotations is 2, and the change in the potential caused by these two rotations depends only on the changes it causes in the ranks of $t$, $y$, and $z$. Summing these up, the amortized cost for this case is

$$\begin{aligned}
C(t, x) &= 2 + r'(t) + r'(y) + r'(z) - r(t) - r(y) - r(z) + C(z, x) \\
&= 2 + r'(t) + r'(y) - r(y) - r(z) + C(z, x), && \text{since } r'(z) = r(t) \\
&\le 2 + r'(t) + r'(y) - r(y) - r(z) + 3(r(z) - r(x)) + 1 && \text{by the inductive hypothesis} \\
&= 3(r(t) - r(x)) + 1 + 2 + r'(t) + r'(y) - r(y) + 2r(z) - 3r(t)
\end{aligned}$$

so the result we want follows if

$$2 + r'(t) + r'(y) - r(y) + 2r(z) - 3r(t) \le 0. \tag{9.2}$$

We can show 9.2 as follows:

$$\begin{aligned}
2 + r'(t) + r'(y) - r(y) + 2r(z) - 3r(t) &\le 2 + r'(t) + r(z) - 2r(t) && \text{since } r(y) > r(z) \text{ and } r(t) > r'(y) \\
&= 2 + \lg(s'(t)/s(t)) + \lg(s(z)/s(t)) && \text{by the definition of } r \text{ and properties of } \lg.
\end{aligned}$$

Now if you examine the trees in the zig-zig case of Figure 9.12, you can see that $s'(t) + s(z) + 1 = s(t)$, so that $s'(t)/s(t) + s(z)/s(t) < 1$. Because $\lg$ is a concave, increasing function, this in turn tells us that (as discussed in §1.6),

$$2 + \lg(s'(t)/s(t)) + \lg(s(z)/s(t)) \le 2 + 2\lg(1/2) = 0.$$

4. Finally, in the zig-zag case, we again have that the desired result follows if we can demonstrate the inequality 9.2 above. This time, we have $s'(y) + s'(t) + 1 = s(t)$, so we can proceed

$$\begin{aligned}
2 + r'(t) + r'(y) - r(y) + 2r(z) - 3r(t) &\le 2 + r'(t) + r'(y) - 2r(t) && \text{since } r(y) > r(z) \text{ and } r(t) > r(z) \\
&= 2 + \lg(s'(t)/s(t)) + \lg(s'(y)/s(t))
\end{aligned}$$

and the result follows by the same reasoning as in the zig-zig case.

Thus ends the demonstration.

The operations of insertion and search add a constant time to the time of splaying, and deletion adds a constant and a constant factor of 2 (since it involves two splaying operations). Therefore, all operations on splay trees have O(lg N) amortized time (using the maximum value for N for any given sequence of operations).

This bound is actually pessimistic. Inorder tree traversals, as we’ve seen, take linear time in the size of the tree. Since a splay tree is just a BST, we can get the same bound. If we were to splay each node to the root as we traversed them (which might seem to be natural for splay trees), our amortized bound is O(N lg N) rather than O(N). Not only that, but after the traversal, our tree will have been converted to a “stringy” linked list. Oddly enough, however, it is possible to show that the cost of an inorder traversal of a BST in which each node is splayed as it is traversed is actually O(N) (amortized cost O(1) for each item traversed, in other words). However, the author thinks he has beaten this subject into the ground and will spare you the details.

9.5 Skip Lists

The B-tree was an example of a search tree in which nodes had variable numbers of children, with each child representing some ordered set of keys. It speeds up searches as does a vanilla binary search tree by subdividing the keys at each node into disjoint ranges of keys, and contrives to keep these sequences of comparable length, balancing the tree. Here we look at another structure that does much the same thing, except that it uses randomization to approximately balance the tree and it merely achieves this balance with high probability, rather than with certainty. Consider the same set of integer keys from Figure 9.1, arranged into a search tree where each node has one key and any number of children, and the children of any node all have keys that are at least as large as that of their parent. Figure 9.18

shows a possible arrangement. The maximum heights at which the keys appear are chosen independently according to a rule that gives a probability of $(1 - p)p^k$ of a key having maximum height k (0 being the bottom). That is, 0 < p < 1 is an arbitrary constant that represents the approximate proportion of all nodes at height ≥ e that have height > e. We add a minimal (−∞) key at the left with sufficient height to serve as a root for the whole tree.

Figure 9.18 shows an example, created using p = 0.5. To look for a key, we can scan this tree from left to right starting at any level and working downwards. Starting at the bottom (level 0) just gives us a simple linear search. At higher levels, we search a forest of trees, choosing which tree to examine more closely on the basis of the value of its root node. To see if 127 is a member, for example, we can look at

• the first 15 entries of level 0 (not including −∞) [15 entries]; or

• the first 7 level-1 entries, and then the 2 level-0 items below the key 120 [9 entries]; or

• the first 3 level-2 entries, then the level-1 entry 140, and then the 2 level-0 items below 120 [6 entries]; or

• the level-3 entry 90, then the level-2 entry 120, then the level-1 entry 140, and then the 2 level-0 items below 120 [5 entries].

We can represent this tree as a kind of linear list of nodes in in-order (see Figure 9.19) in which the nodes have random numbers of next links, and the ith next link in each (numbering from 0 as usual) is connected to the next node that has at least i + 1 links. This list-like representation, with some links “skipping” arbitrary numbers of list elements, explains the name given to this data structure: the skip list.⁹

Searching is very simple. If we denote the value at one of these nodes as L.value (here, we’ll use integer keys) and the next pointer at height k as L.next[k], then:

/** True iff X is in the skip list beginning at node L at
 *  a height <= K, where K >= 0. */
static boolean contains(SkipListNode L, int k, int x) {
    if (x == L.next[k].value)
        return true;
    else if (x > L.next[k].value)
        return contains(L.next[k], k, x);
    else if (k > 0)
        return contains(L, k-1, x);
    else
        return false;
}

⁹ William Pugh, “Skip lists: A probabilistic alternative to balanced trees,” Comm. of the ACM 33, 6 (June, 1990), pp. 668–676.

Figure 9.18: An abstract view of a skip list, showing its relationship to a (non-binary) search tree. Each key other than −∞ is duplicated to a random height. We can search this structure beginning at any level. In the best case, to search (unsuccessfully) for the target value 127, we need only look at the keys in the shaded nodes. Darker shaded nodes indicate keys larger than 127 that bound the search.

Figure 9.19: The skip list from Figure 9.18, showing a possible representation. The data structure is an ordered list whose nodes contain random numbers of pointers to later nodes (which allow intervening items in the list to be skipped during a search; hence the name). If a node has at least k pointers, then it contains a pointer to the next node that has at least k pointers. A node for ∞ at the right allows us to avoid tests for null. Again, the nodes looked at during a search for 127 are shaded; the darker shading indicates nodes that limit the search.

Figure 9.20: The skip list from Figure 9.19 after inserting 127 and 126 (in either order), and deleting 20. Here, the 127 node is randomly given a height of 5, and the 126 node a height of 1. The shaded nodes show which previously existing nodes need to change. For the two insertions, the nodes needing change are the same as the light-shaded nodes that were examined to search for 127 (or 126), plus the ±∞ nodes at the ends (if they need to be heightened).

We can start the search at any level k ≥ 0 up to the height of the tree. It turns out that a reasonable place to start for a list containing N nodes is at level $\log_{1/p} N$, as explained below.

To insert or delete into the list, we find the position of the node to be added or deleted by the process above, keeping track of the nodes we traverse to do so. When the item is added or deleted, these are the nodes whose pointers may need to be updated. When we insert nodes, we choose a height for them randomly in such a way that the fraction of nodes whose height is at least k is roughly $p^k$, where p is some probability (typical values for which might be 0.5 or 0.25). That is, if we are shooting for a roughly n-ary search tree, we let p = 1/n. A suitable procedure might look like this:

/** A random integer, h, in the range 0 .. MAX such that
 *  Pr(h ≥ k) = P^k, 0 ≤ k ≤ MAX. */
static int randomHeight(double p, int max, Random r) {
    int h;
    h = 0;
    while (h < max && r.nextDouble() < p)
        h += 1;
    return h;
}

In general, it is pointless to accommodate arbitrarily large heights, so we impose some maximum, generally the logarithm (base 1/p) of the maximum number of keys one expects to need.

Intuitively, any sequence of M inserted nodes each of whose heights is at least k will be randomly broken about every 1/p nodes by one whose height is strictly greater than k. Likewise, for nodes of height at least k + 1, and so forth. So, if our list contains N items, and we start looking at level $\log_{1/p} N$, we’d expect to look at most at roughly $(1/p)\log_{1/p} N$ keys (that is, 1/p keys at each of $\log_{1/p} N$ levels). In other words, Θ(lg N) on average, which is what we want. Admittedly, this analysis is a bit handwavy, but the true bound is not significantly larger. Since inserting and deleting consists of finding the node, plus some insertion or deletion time proportional to the node’s height, we actually have Θ(lg N) expected bounds on search, insertion, and deletion.
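The geometric distribution of heights that this analysis relies on is easy to check empirically. The following is a minimal sketch: randomHeight is the procedure given above, while the class HeightDemo, the method countAtLeast, and its seed parameter are hypothetical additions for illustration.

```java
import java.util.Random;

public class HeightDemo {
    /** A random integer, h, in the range 0 .. MAX such that
     *  Pr(h >= k) = P^k, 0 <= k <= MAX (as in the text). */
    public static int randomHeight(double p, int max, Random r) {
        int h = 0;
        while (h < max && r.nextDouble() < p)
            h += 1;
        return h;
    }

    /** Draw SAMPLES heights using a generator seeded with SEED, and
     *  return an array C in which C[k] is the number of samples whose
     *  height is at least k, for 0 <= k <= MAX. */
    public static int[] countAtLeast(double p, int max, int samples, long seed) {
        Random r = new Random(seed);
        int[] counts = new int[max + 1];
        for (int i = 0; i < samples; i += 1) {
            int h = randomHeight(p, max, r);
            for (int k = 0; k <= h; k += 1)
                counts[k] += 1;
        }
        return counts;
    }
}
```

With p = 0.5, roughly half of the samples should reach height 1, a quarter height 2, and so on, so each level of the skip list has about 1/p times as many nodes as the level above it.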

Exercises

9.1. Fill in the following to agree with its comments:

/** Return a modified version of T containing the same nodes
 *  with the same inorder traversal, but with the node containing
 *  label X at the root. Does not create any new Tree nodes. */
static Tree rotateUp(Tree T, Object X) {
    // FILL IN
}

9.2. What is the maximum height of an order-5 B-tree containing N nodes? What is the minimum height? What sequences of keys give the maximum height (that is, give a general characterization of such sequences)? What sequences of keys give the minimum height?

9.3. The splayFind algorithm given in Figure 9.15 is hardly the most efficient version one could imagine of this procedure. The original paper has an iterative version of the same function that uses constant extra space instead of the linear recursion of our version of splayFind. It keeps track of two trees: L, containing nodes that are less than v, and R, containing nodes greater than v. As it progresses iteratively down the tree from the root, it adds subtrees of the current node to L and R until it reaches the node, x, that it is seeking. At that point, it finishes by attaching the left and right subtrees of x to L and R respectively, and then making L and R its new children. During this process, subtrees get attached to L in order of increasing labels, and to R in order of decreasing labels. Rewrite splayFind to use this strategy.

9.4. Write a non-recursive version of the contains function for skip lists (§9.5).

9.5. Define an implementation of the SortedSet interface that uses a skip list representation.


Chapter 10

Concurrency and Synchronization

An implicit assumption in everything we’ve done so far is that a single program is modifying our data structures. In Java, one can have the effect of multiple programs modifying an object, due to the existence of threads.

Although the language used to describe threads suggests that their purpose is to allow several things to happen simultaneously, this is a somewhat misleading impression. Even the smallest Java application running on Sun’s JDK platform, for example, has five threads, and that’s only if the application has not created any itself, and even if the machine on which the program runs consists of a single processor (which can only execute one instruction at a time). The four additional “system threads” perform a number of tasks (such as “finalizing” objects that are no longer reachable by the program) that are logically independent of the rest of the program. Their actions could usefully occur at any time relative to the rest of the program. Sun’s Java runtime system, in other words, is using threads as an organizational tool for its system. Threads abound in Java programs that use graphical user interfaces (GUIs). One thread draws or redraws the screen. Another responds to events such as the clicking of a mouse button at some point on the screen. These are related, but largely independent activities: objects must be redrawn, for example, whenever a window becomes invisible and uncovers them, which happens independently of any calculations the program is doing.

Threads violate our implicit assumption that a single program operates on our data, so that even an otherwise perfectly implemented data structure, with all of its instance variables private, can become corrupted in rather bizarre ways. The existence of multiple threads operating on the same data objects also raises the general problem of how these threads are to communicate with each other in an orderly fashion.


10.1 Synchronized Data Structures

Consider the ArrayList implementation from §4.1. In the methods ensureCapacity and set, we find

public void ensureCapacity(int N) {
    if (N <= data.length)
        return;
    Object[] newData = new Object[N];
    System.arraycopy(data, 0, newData, 0, count);
    data = newData;
}

public Object set(int k, Object x) {
    check(k, count);
    Object old = data[k];
    data[k] = x;
    return old;
}

Suppose one program executes ensureCapacity while another is executing set on the same ArrayList object. We could see the following interleaving of their actions:

/* Program 1 executes: */ newData = new Object[N];
/* Program 1 executes: */ System.arraycopy(data, 0, newData, 0, count);
/* Program 2 executes: */ data[k] = x;
/* Program 1 executes: */ data = newData;

Thus, we lose the value that Program 2 set, because it puts this value into the old value of data after data’s contents have been copied to the new, expanded array.

To solve the simple problem presented by ArrayList, threads can arrange to access any particular ArrayList in mutual exclusion, that is, in such a way that only one thread at a time operates on the object. Java’s synchronized statements provide mutual exclusion, allowing us to produce synchronized (or thread-safe) data structures. Here is part of an example, showing both the use of the synchronized method modifier and equivalent use of the synchronized statement:

public class SyncArrayList<T> extends ArrayList<T> {
    ...
    public void ensureCapacity(int n) {
        synchronized (this) {
            super.ensureCapacity(n);
        }
    }

    public synchronized T set(int k, T x) {
        return super.set(k, x);
    }
}

The process of providing such wrapper functions for all methods of a List is sufficiently tedious that the standard Java library class java.util.Collections provides the following method:

/** A synchronized (thread-safe) view of the list L, in which only
 *  one thread at a time executes any method. To be effective,
 *  (a) there should be no subsequent direct use of L,
 *  and (b) the returned List must be synchronized upon
 *  during any iteration, as in
 *
 *     List aList = Collections.synchronizedList(new ArrayList());
 *     ...
 *     synchronized (aList) {
 *         for (Iterator i = aList.iterator(); i.hasNext(); )
 *             foo(i.next());
 *     }
 */
public static <T> List<T> synchronizedList(List<T> L) { ... }

Unfortunately, there is a time cost associated with synchronizing on every operation, which is why the Java library designers decided that Collection and most of its subtypes would not be synchronized. On the other hand, StringBuffers and Vectors are synchronized, and cannot be corrupted by simultaneous use.
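As an illustration of the synchronizedList method in use, here is a minimal sketch in which several threads append to one shared list. The class SyncListDemo and its methods fill and sum are hypothetical names introduced for this example; Collections.synchronizedList, Thread.start, and Thread.join are standard library calls. Note that the iteration in sum synchronizes on the list, as the comment on synchronizedList requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SyncListDemo {
    /** Have THREADS threads each append the integers 0 .. PERTHREAD-1
     *  to a single shared synchronized list, and return that list once
     *  all of the threads have finished. */
    public static List<Integer> fill(int threads, int perThread) {
        List<Integer> shared =
            Collections.synchronizedList(new ArrayList<Integer>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t += 1) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i += 1)
                    shared.add(i);      // each add holds the list's lock
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers)
                w.join();               // wait for every worker to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;
    }

    /** The sum of the items in LIST, locking it for the whole iteration. */
    public static long sum(List<Integer> list) {
        long total = 0;
        synchronized (list) {
            for (int x : list)
                total += x;
        }
        return total;
    }
}
```

Because every add is performed under the list's lock, no appended item can be lost to an interleaving like the one shown for ensureCapacity and set, so fill(4, 10000) always yields a list of 40000 items. Replacing the synchronized view by a plain ArrayList would remove that guarantee.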

10.2MonitorsandOrderlyCommunication

Theobjectsreturnedbythe synchronizedList methodareexamplesofthesimplestkindof monitor. Thistermreferstoanobject(ortypeofobject)thatcontrols (“monitors”)concurrentaccesstosomedatastructuresoas tomakeitworkcorrectly.Onefunctionofamonitoristoprovidemutuallyexclusiveaccesstothe operationsofthedatastructure,whereneeded.Anotheristoarrangefor synchronization betweenthreads—sothatonethreadcanwaituntilanobjectis“ready” toprovideitwithsomeservice.

Monitorsareexemplifiedbyoneoftheclassicexamples:the sharedbuffer or mailbox. Asimpleversionofitspublicspecificationlookslikethis:

    /** A container for a single message (an arbitrary Object). At any
     *  time, a SmallMailbox is either empty (containing no message) or
     *  full (containing one message). */
    public class SmallMailbox {

        /** When THIS is empty, set its current message to MESSAGE, making
         *  it full. */
        public synchronized void deposit(Object message)
            throws InterruptedException { ... }

        /** When THIS is full, empty it and return its current message. */
        public synchronized Object receive()
            throws InterruptedException { ... }
    }

Since the specifications suggest that either method might have to wait for a new message to be deposited or an old one to be received, we specify both as possibly throwing an InterruptedException, which is the standard Java way to indicate that while we were waiting, some other thread interrupted us. The SmallMailbox specification illustrates the features of a typical monitor:

• None of the modifiable state variables (i.e., fields) are exposed.

• Accesses from separate threads that make any reference to modifiable state are mutually excluded; only one thread at a time holds a lock on a SmallMailbox object.

• A thread may relinquish a lock temporarily and await notification of some change. But changes in the ownership of a lock occur only at well-defined points in the program.

The internal representation is simple:

    private Object message;
    private boolean amFull;

The implementations make use of the primitive Java features for “waiting until notified”:

    public synchronized void deposit(Object message)
        throws InterruptedException
    {
        while (amFull)
            wait();     // Same as this.wait();
        this.message = message; this.amFull = true;
        notifyAll();    // Same as this.notifyAll()
    }

    public synchronized Object receive()
        throws InterruptedException
    {
        while (!amFull)
            wait();
        amFull = false;
        notifyAll();
        return message;
    }

The methods of SmallMailbox allow other threads in only at carefully controlled points: the calls to wait. For example, the loop in deposit means “If there is still old unreceived mail, wait until some other thread receives it and wakes me up again (with notifyAll) and I have managed to lock this mailbox again.” From the point of view of a thread that is executing deposit or receive, each call to wait has the effect of causing some change to the instance variables of this: some change, that is, that could be effected by other calls to deposit or receive.
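
To see the mailbox at work, the following sketch passes a series of integers from a producer thread to the calling thread through one SmallMailbox. The mailbox class is repeated so that the example is self-contained; MailboxDemo and relay are names invented here, not part of the text.

```java
import java.util.ArrayList;
import java.util.List;

class SmallMailbox {
    private Object message;
    private boolean amFull;

    public synchronized void deposit(Object message)
            throws InterruptedException {
        while (amFull) {
            wait();                   // releases the lock while waiting
        }
        this.message = message;
        this.amFull = true;
        notifyAll();
    }

    public synchronized Object receive() throws InterruptedException {
        while (!amFull) {
            wait();
        }
        amFull = false;
        notifyAll();
        return message;
    }
}

class MailboxDemo {
    /** Sends 0..n-1 through one mailbox from a producer thread and
     *  returns the messages in the order the caller received them. */
    static List<Integer> relay(int n) {
        SmallMailbox box = new SmallMailbox();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i += 1) {
                    box.deposit(i);   // blocks while the box is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        List<Integer> received = new ArrayList<>();
        try {
            for (int i = 0; i < n; i += 1) {
                received.add((Integer) box.receive());  // blocks while empty
            }
            producer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return received;
    }
}
```

Because the mailbox holds at most one message and there is a single producer, the messages arrive in the order they were deposited.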

As long as the threads of a program are careful to protect all their data in monitors in this fashion, they will avoid the sorts of bizarre interaction described at the beginning of §10.1. Of course, there is no such thing as a free lunch; the use of locking can lead to the situation known as deadlock, in which two or more threads wait for each other indefinitely, as in this artificial example:

    class Communicate {
        static SmallMailbox
            box1 = new SmallMailbox(),
            box2 = new SmallMailbox();
    }

    // Thread #1:                       | // Thread #2:
    m1 = Communicate.box1.receive();    | m2 = Communicate.box2.receive();
    Communicate.box2.deposit(msg1);     | Communicate.box1.deposit(msg2);

Since neither thread sends anything before trying to receive a message from its box, both threads wait for each other (the problem could be solved by having one of the two threads reverse the order in which it receives and deposits).
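
The suggested repair can be sketched as follows. As an assumption made only to keep the example short, the two mailboxes are stood in for by one-element ArrayBlockingQueues from java.util.concurrent, whose put and take block in the same way as deposit and receive; ExchangeDemo is a made-up name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ExchangeDemo {
    /** Runs the two threads of the example, with thread #2 depositing
     *  before it receives. Returns { what #1 received, what #2 received }. */
    static String[] exchange() {
        BlockingQueue<String> box1 = new ArrayBlockingQueue<>(1);
        BlockingQueue<String> box2 = new ArrayBlockingQueue<>(1);
        String[] got = new String[2];
        Thread t1 = new Thread(() -> {
            try {
                got[0] = box1.take();      // receive first, as before
                box2.put("msg1");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread t2 = new Thread(() -> {
            try {
                box1.put("msg2");          // reversed: deposit first
                got[1] = box2.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return got;
    }
}
```

If thread #2 also called take before put, both threads would block forever, which is the deadlock described above.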

10.3 Message Passing

Monitors provide a disciplined way for multiple threads to access data without stumbling over each other. Lurking behind the concept of a monitor is a simple idea:

    Thinking about multiple programs executing simultaneously is hard, so don’t do it! Instead, write a bunch of one-thread programs, and have them exchange data with each other.

In the case of general monitors, “exchanging data” means setting variables that each can see. If we take the idea further, we can instead define “exchanging data” as “reading input and writing output.” We get a concurrent programming discipline called message passing.

In the message-passing world, threads are independent sequential programs that send each other messages. They read and write messages using methods that correspond to read on Java Readers, or print on Java PrintStreams. As a result, one thread is affected by another only when it bothers to “read its messages.”

We can get the effect of message passing by writing our threads to perform all interaction with each other by means of mailboxes. That is, the threads share some set of mailboxes, but share no other modifiable objects or variables (unmodifiable objects, like Strings, are fine to share).

Exercises

10.1. Give a possible implementation for the Collections.synchronizedList static method in §10.1.

Chapter 11

Pseudo-Random Sequences

Random sequences of numbers have a number of uses in simulation, game playing, cryptography, and efficient algorithm development. The term “random” is rather difficult to define. For most of our purposes, we really don’t need to answer the deep philosophical questions, since our needs are generally served by sequences that display certain statistical properties. This is a good thing, because truly “random” sequences in the sense of “unpredictable” are difficult to obtain quickly, and programmers generally resort, therefore, to pseudo-random sequences. These are generated by some formula, and are therefore predictable in principle. Nevertheless, for many purposes, such sequences are acceptable, if they have the desired statistics.

We commonly use sequences of integers or floating-point numbers that are uniformly distributed throughout some interval, that is, if one picks a number (truly) at random out of the sequence, the probability that it is in any set of numbers from the interval is proportional to the size of that set. It is relatively easy to arrange that a sequence of integers in some interval has this particular property: simply enumerate a permutation of the integers in that interval over and over. Each integer is enumerated once per repetition, and so the sequence is uniformly distributed. Of course, having described it like this, it becomes even more apparent that the sequence is anything but “random” in the informal sense of this term. Nevertheless, when the interval of integers is large enough, and the permutation “jumbled” enough, it is hard to tell the difference. The rest of this Chapter will deal with generating sequences of this sort.

11.1 Linear congruential generators

Perhaps the most common pseudo-random-number generators use the following recurrence.

$$X_n = (aX_{n-1} + c) \bmod m, \qquad (11.1)$$

where $X_n \ge 0$ is the $n$th integer in the sequence, and $a, m > 0$ and $c \ge 0$ are integers. The seed value, $X_0$, may be any value such that $0 \le X_0 < m$. When $m$ is a power of two, the $X_n$ are particularly easy to compute, as in the following Java class.

    /** A generator of pseudo-random numbers in the range 0 .. 2^31 - 1. */
    class Random1 {
        private int randomState;
        static final int
            a = ...,
            c = ...;

        Random1(int seed) { randomState = seed; }

        int nextInt() {
            randomState = (a * randomState + c) & 0x7fffffff;
            return randomState;
        }
    }

Here, $m$ is $2^{31}$. The ‘&’ operation computes mod $2^{31}$ [why?]. The result can be any non-negative integer. If we change the calculation of randomState to

    randomState = a * randomState + c;

then the computation is implicitly done modulo $2^{32}$, and the results are integers in the range $-2^{31}$ to $2^{31} - 1$.

The question to ask now is how to choose $a$ and $c$ appropriately. Considerable analysis has been devoted to this question¹. Here, I’ll just summarize. I will restrict the discussion to the common case of $m = 2^w$, where $w > 2$ is typically the word size of the machine (as in the Java code above). The following criteria for $a$ and $c$ are desirable.

1. In order to get a sequence that has maximum period, that is, which cycles through all integers between 0 and $m - 1$ (or in our case, $-m/2$ to $m/2 - 1$), it is necessary and sufficient that $c$ and $m$ be relatively prime (have no common factors other than 1), and that $a$ have the form $4k + 1$ for some integer $k$.

2. A very low value of $a$ is easily seen to be undesirable (the resulting sequence will show a sort of sawtooth behavior). It is desirable for $a$ to be reasonably large relative to $m$ (Knuth, for example, suggests a value between $0.01m$ and $0.99m$) and have no “obvious pattern” to its binary digits.

3. It turns out that values of $a$ that display low potency (defined as the minimal value of $s$ such that $(a - 1)^s$ is divisible by $m$) are not good. Since $a - 1$ must be divisible by 4 (see item 1 above), the best we can do is to ensure that $(a - 1)/4$ is not even, that is, $a \bmod 8 = 5$.

4. Under the conditions above, $c = 1$ is a suitable value.

5. Finally, although most arbitrarily-chosen values of $a$ satisfying the above conditions work reasonably well, it is generally preferable to apply various statistical tests (see Knuth) just to make sure.

For example, when $m = 2^{32}$, some good choices for $a$ are 1566083941 (which Knuth credits to Waterman) and 1664525 (credited to Lavaux and Janssens).

¹ For details, see D. E. Knuth, Seminumerical Algorithms (The Art of Computer Programming, volume 2), second edition, Addison-Wesley, 1981.
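
To make the criteria concrete, here is a sketch of Random1 with its constants filled in. The particular choice (a = 1566083941, one of the multipliers mentioned above, and c = 1) is an assumption made for illustration, not the text’s prescription; masking the seed in the constructor is likewise an addition so that the state always starts in range.

```java
class Random1 {
    private int randomState;
    /* Illustrative constants: a mod 8 == 5 and c is odd, so the
     * maximum-period conditions hold for m = 2^31. */
    static final int a = 1566083941, c = 1;

    Random1(int seed) {
        randomState = seed & 0x7fffffff;   // force 0 <= X0 < 2^31
    }

    /** Returns the next value X_n = (a*X_{n-1} + c) mod 2^31. */
    int nextInt() {
        // int arithmetic wraps modulo 2^32; the mask keeps the low 31 bits.
        randomState = (a * randomState + c) & 0x7fffffff;
        return randomState;
    }

    /** True iff A and C meet criteria 1 and 3 for m a power of two. */
    static boolean meetsCriteria(int a, int c) {
        return (a & 7) == 5 && (c & 1) == 1;
    }
}
```

The same values can be computed with long arithmetic and an explicit remainder operation, which is a handy way to check the masking trick.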

There are also bad choices of parameters, of which the most famous is one that was part of the IBM FORTRAN library for some time: RANDU, which had $m = 2^{31}$, $X_0$ odd, $c = 0$, and $a = 65539$. This does not have maximum period, of course (it skips all even numbers). Moreover, if you take the numbers three at a time and consider them as points in space, the set of points is restricted to a relatively few widely-spaced planes: strikingly bad behavior.
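
One way to see where the planes come from, which the text does not spell out: since $a = 2^{16} + 3$, we have $a^2 = 2^{32} + 6 \cdot 2^{16} + 9 \equiv 6a - 9 \pmod{2^{31}}$, so every three consecutive outputs satisfy $X_{n+2} \equiv 6X_{n+1} - 9X_n \pmod{2^{31}}$. The sketch below (class and method names are invented) checks this relation numerically.

```java
class Randu {
    private static final long M = 1L << 31;
    private long x;

    /** A RANDU generator started from the odd seed ODDSEED. */
    Randu(long oddSeed) {
        x = oddSeed;
    }

    /** Returns the next value X_n = 65539 * X_{n-1} mod 2^31. */
    long next() {
        x = (65539L * x) % M;
        return x;
    }

    /** True iff X_{n+2} = 6 X_{n+1} - 9 X_n (mod 2^31) holds for the
     *  first COUNT consecutive triples generated from SEED. */
    static boolean linearRelationHolds(long seed, int count) {
        Randu r = new Randu(seed);
        long x0 = r.next(), x1 = r.next();
        for (int i = 0; i < count; i += 1) {
            long x2 = r.next();
            if (Math.floorMod(6 * x1 - 9 * x0, M) != x2) {
                return false;
            }
            x0 = x1;
            x1 = x2;
        }
        return true;
    }
}
```

A fixed linear relation with such small coefficients among successive values is what confines the triples to a small number of planes.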

The Java library has a class java.util.Random similar to Random1. It takes $m = 2^{48}$, $a = 25214903917$, and $c = 11$ to generate long quantities in the range 0 to $2^{48} - 1$, which doesn’t quite satisfy Knuth’s criterion 2. I haven’t checked to see how good it is. There are two ways to initialize a Random: either with a specific “seed” value, or with the current value of the system timer (which on UNIX systems gives a number of milliseconds since some time in 1970), a fairly common way to get an unpredictable starting value. It’s important to have both: for games or encryption, unpredictability is useful. The first constructor, however, is also important because it makes it possible to reproduce results.

11.2 Additive Generators

One can get very long periods, and avoid multiplications (which can be a little expensive for Java long quantities) by computing each successive output, $X_n$, as a sum of selected previous outputs: $X_{n-k}$ for several fixed values of $k$. Here’s an instance of this scheme that apparently works quite well²:

$$X_n = (X_{n-24} + X_{n-55}) \bmod m, \quad \text{for } n \ge 55 \qquad (11.2)$$

where $m = 2^e$ for some $e$. We initially choose some “random” seed values for $X_0$ to $X_{54}$. This has a large period of $2^f(2^{55} - 1)$ for some $0 \le f < e$. That is, although numbers it produces must repeat before then (since there are only $2^e$ of them, and $e$ is typically something like 32), they won’t repeat in the same pattern.

² Knuth credits this to unpublished work of G. J. Mitchell and D. P. Moore in 1958.

Implementing this scheme gives us another nice opportunity to illustrate the circular buffer (see §4.5). Keep your eye on the array state in the following:

    class Random2 {
        /** state[k] will hold X_k, X_{k+55}, X_{k+110}, ... */
        private int[] state = new int[55];
        /** nm will hold n mod 55 after each call to nextInt.
         *  Initially n = 55. */
        private int nm;

        public Random2(...) {
            initialize state[0..54] to values for X_0 to X_54;
            nm = -1;
        }

        public int nextInt() {
            nm = mod55(nm + 1);
            int k24 = mod55(nm - 24);
            // Now state[nm] is X_{n-55} and state[k24] is X_{n-24}.
            return state[nm] += state[k24];
            // Now state[nm] (just returned) represents X_n.
        }

        private int mod55(int x) {
            return (x >= 55) ? x - 55 : (x < 0) ? x + 55 : x;
        }
    }

Other values than 24 and 55 will also produce pseudo-random streams with good characteristics. See Knuth.
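
A runnable version of Random2 needs some way to fill in the 55 seed values, which the outline leaves open. The sketch below assumes, purely for illustration, that they come from a small linear congruential generator started at a single int seed; with int arithmetic, $m = 2^{32}$ implicitly.

```java
class Random2 {
    /** state[k] holds the most recent X_j with j mod 55 == k. */
    private final int[] state = new int[55];
    /** n mod 55 for the value most recently returned (-1 before any). */
    private int nm;

    /** Seeds X_0 .. X_54 from SEED. The seeding rule is an assumption
     *  of this sketch; any "random" initial values would serve. */
    Random2(int seed) {
        int x = seed;
        for (int k = 0; k < 55; k += 1) {
            x = 1664525 * x + 1;
            state[k] = x;
        }
        nm = -1;
    }

    /** Returns X_n = X_{n-24} + X_{n-55}, computed modulo 2^32. */
    int nextInt() {
        nm = mod55(nm + 1);
        int k24 = mod55(nm - 24);
        // state[nm] still holds X_{n-55}; state[k24] holds X_{n-24}.
        state[nm] += state[k24];
        return state[nm];
    }

    private static int mod55(int x) {
        return (x >= 55) ? x - 55 : (x < 0) ? x + 55 : x;
    }
}
```

The circular buffer means each call touches only two array elements, however long the generator has been running.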

11.3 Other distributions

11.3.1 Changing the range

The linear congruential generators above give us pseudo-random numbers in some fixed range. Typically, we are really interested in some other, smaller, range of numbers instead. Let’s first consider the case where we want a sequence, $Y_i$, of integers uniformly distributed in a range 0 to $m' - 1$ and are given pseudo-random integers, $X_i$, in the range 0 to $m - 1$, with $m > m'$. A possible transformation is

$$Y_i = \left\lfloor \frac{m'}{m} X_i \right\rfloor,$$

which results in numbers that are reasonably evenly distributed as long as $m \gg m'$. From this, it is easy to get a sequence of pseudo-random integers evenly distributed in the range $L \le Y'_i < U$:

$$Y'_i = L + \left\lfloor \frac{U - L}{m} X_i \right\rfloor.$$

It might seem that

$$Y_i = X_i \bmod m' \qquad (11.3)$$

is a more obvious formula for $Y_i$. However, it has problems when $m'$ is a small power of two and we are using a linear congruential generator as in Equation 11.1, with $m$ a power of 2. For such a generator, the last $k$ bits of $X_i$ have a period of $2^k$ [why?], and thus so will $Y_i$. Equation 11.3 works much better if $m'$ is not a power of 2.
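
The short period of the low-order bits is easy to observe. The sketch below measures it for a generator with $m = 2^{32}$ (plain int arithmetic), using the illustrative constants a = 1664525 and c = 1, and also shows the scaling transformation, which uses the high-order bits instead. LowBits and its method names are invented for this example.

```java
class LowBits {
    /** Returns the number of steps before the last K bits of X return
     *  to their starting value, for X_n = 1664525 * X_{n-1} + 1
     *  computed modulo 2^32. Assumes 1 <= K <= 30. */
    static int lowBitsPeriod(int k) {
        int mask = (1 << k) - 1;
        int x = 12345;
        int start = x & mask;
        int steps = 0;
        do {
            x = 1664525 * x + 1;
            steps += 1;
        } while ((x & mask) != start);
        return steps;
    }

    /** Returns floor(mPrime * X / 2^32), treating the bits of X as an
     *  unsigned value in 0 .. 2^32 - 1. Assumes mPrime > 0. */
    static int scale(int x, int mPrime) {
        long unsigned = x & 0xFFFFFFFFL;
        return (int) ((unsigned * mPrime) >>> 32);
    }
}
```

So taking $X_i \bmod 8$ from this generator yields a sequence that repeats every 8 steps, whereas scale(x, 8) depends on the three highest-order bits of x.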

The nextInt method in the class java.util.Random produces its 32-bit result from a 48-bit state by dividing by $2^{16}$ (shifting right by 16 binary places), which gets converted to an int in the range $-2^{31}$ to $2^{31} - 1$. The nextLong method produces a 64-bit result by calling nextInt twice:

    ((long) nextInt() << 32) + nextInt();
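
These pieces can be put together in a few lines. The sketch below follows the algorithm that the documentation of java.util.Random specifies (including the initial scrambling of the seed by exclusive-or with the multiplier); MiniRandom is a made-up name. Note that the first nextInt value must be widened to long before shifting, since shifting an int left by 32 places leaves it unchanged in Java.

```java
class MiniRandom {
    private static final long A = 25214903917L;       // 0x5DEECE66D
    private static final long C = 11L;
    private static final long MASK = (1L << 48) - 1;  // m - 1, m = 2^48

    private long state;

    MiniRandom(long seed) {
        state = (seed ^ A) & MASK;        // initial scramble of the seed
    }

    /** Advances the 48-bit state and returns its upper 32 bits. */
    int nextInt() {
        state = (A * state + C) & MASK;   // X_n = (a X_{n-1} + c) mod 2^48
        return (int) (state >>> 16);
    }

    /** Combines two successive nextInt values into 64 bits. */
    long nextLong() {
        return ((long) nextInt() << 32) + nextInt();
    }
}
```

Given the same seed, this class should reproduce the nextInt and nextLong values of java.util.Random exactly.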

11.3.2 Non-uniform distributions

So far, we have discussed only uniform distributions. Sometimes that isn’t what we want. In general, assume that we want to pick a number $Y$ in some range $u_l$ to $u_h$ so that³

$$\Pr[Y \le y] = P(y),$$

where $P$ is the desired distribution function, that is, it is a non-decreasing function with $P(y) = 0$ for $y < u_l$ and $P(y) = 1$ for $y \ge u_h$. The idea of what we must do is illustrated in Figure 11.1, which shows a graph of a distribution $P$. The key observation is that the desired probability of $Y$ being no greater than $y_0$, $P(y_0)$, is the same as the probability that a uniformly distributed random number $X$ on the interval 0 to 1 is less than or equal to $P(y_0)$. Suppose, therefore, that we had an inverse function $P^{-1}$ so that $P(P^{-1}(x)) = x$. Then,

$$\Pr[P^{-1}(X) \le y] = \Pr[X \le P(y)] = P(y).$$

In other words, we can define

$$Y = P^{-1}(X)$$

as the desired random variable.

All of this is straightforward when $P$ is strictly increasing. However, we have to exercise care when $P$ is not invertible, which happens when $P$ does not strictly increase (i.e., it has “plateaus” where its value does not change). If $P(y)$ has a constant value between $y_0$ and $y_1$, this means that the probability that $Y$ falls between these two values is 0. Therefore, we can uniquely define $P^{-1}(x)$ as the smallest $y$ such that $P(y) \ge x$.

Unfortunately, inverting a continuous distribution (that is, in which $Y$ ranges, at least ideally, over some interval of real numbers) is not always easy to do. There are various tricks; as usual, the interested reader is referred to Knuth for details. In particular, Java uses one of his algorithms (the polar method of Box, Muller, and Marsaglia) to implement the nextGaussian method in java.util.Random, which returns normally distributed values (i.e., the “bell curve” density) with a mean value of 0 and standard deviation of 1.

Figure 11.1: A typical non-uniform distribution, illustrating how to convert a uniformly distributed random variable into one governed by an arbitrary distribution, $P(y)$. The probability that $Y$ is no greater than $y_0$ is the same as the probability that a uniformly distributed random variable on the interval 0 to 1 is less than or equal to $P(y_0)$.

³ The notation $\Pr[E]$ means “the probability that situation $E$ (called an event) is true.”
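
For a distribution whose inverse has a closed form, the recipe $Y = P^{-1}(X)$ is only a line of code. As an illustration not taken from the text, consider the exponential distribution with rate $\lambda > 0$, for which $P(y) = 1 - e^{-\lambda y}$ for $y \ge 0$ and hence $P^{-1}(x) = -\ln(1 - x)/\lambda$. The class below is a sketch under that assumption.

```java
import java.util.Random;

class Exponential {
    /** Returns P^{-1}(x) for P(y) = 1 - exp(-lambda * y), 0 <= x < 1. */
    static double inverse(double x, double lambda) {
        return -Math.log(1.0 - x) / lambda;
    }

    /** Returns a sample Y = P^{-1}(X), with X uniform on [0, 1). */
    static double sample(Random r, double lambda) {
        return inverse(r.nextDouble(), lambda);
    }
}
```

Since nextDouble never returns 1.0, the argument of the logarithm is always positive. The mean of many samples should come out close to $1/\lambda$.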

11.3.3 Finite distributions

There is a simpler common case: that in which $Y$ is to range over a finite set, say the integers from 0 to $u$, inclusive. We are typically given the probabilities $p_i = \Pr[Y = i]$. In the interesting case, the distribution is not uniform, and hence the $p_i$ are not necessarily all $1/(u + 1)$. The relationship between these $p_i$ and $P(i)$ is

$$P(i) = \Pr[Y \le i] = \sum_{0 \le k \le i} p_k.$$

The obvious technique for computing the inverse $P^{-1}$ is to perform a lookup on a table representing the distribution $P$. To compute a random $i$ satisfying the desired conditions, we choose a random $X$ in the range 0–1, and return the first $i$ such that $X \le P(i)$. This works because we return $i$ iff $P(i - 1) < X \le P(i)$ (taking $P(-1) = 0$). The distance between $P(i - 1)$ and $P(i)$ is $p_i$, and since $X$ is uniformly distributed across 0 to 1, the probability of getting a point in this interval is equal to the size of the interval, $p_i$.

For example, if 1/12 of the time we want to return 0, 1/2 the time we want to return 1, 1/3 of the time we want to return 2, and 1/12 of the time we want to return 3, we return the index of the first element of table $PT$ that is greater than or equal to a random $X$ chosen uniformly on the interval 0 to 1, where $PT$ is defined to have $PT[0] = 1/12$, $PT[1] = 7/12$, $PT[2] = 11/12$, and $PT[3] = 1$.
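
The table lookup just described might be sketched as follows, using a simple linear scan; FiniteDist and its method names are invented for the example.

```java
class FiniteDist {
    /** Returns the table PT with PT[i] = p[0] + ... + p[i]. The last
     *  entry is set to exactly 1 to absorb rounding error. */
    static double[] cumulative(double[] p) {
        double[] PT = new double[p.length];
        double sum = 0.0;
        for (int i = 0; i < p.length; i += 1) {
            sum += p[i];
            PT[i] = sum;
        }
        PT[p.length - 1] = 1.0;
        return PT;
    }

    /** Returns the first i such that x <= PT[i], where 0 <= x <= 1
     *  and the last entry of PT is 1. */
    static int lookup(double[] PT, double x) {
        int i = 0;
        while (x > PT[i]) {
            i += 1;
        }
        return i;
    }
}
```

For the probabilities 1/12, 1/2, 1/3, 1/12, the call lookup(PT, 0.5) returns 1, since 1/12 < 0.5 ≤ 7/12. For a large table, a binary search over PT would serve the same purpose.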

Oddly enough, there is a faster way of doing this computation for large $u$, discovered by A. J. Walker⁴. Imagine the numbers between 0 and $u$ as labels on $u + 1$ beakers, each of which can contain $1/(u + 1)$ units of liquid. Imagine further that we have $u + 1$ vials of colored liquids, also numbered 0 to $u$, each of a different color and all immiscible in each other; we’ll use the integer $i$ as the name of the color in vial number $i$. The total amount of liquid in all the vials is 1 unit, but the vials may contain different amounts. These amounts correspond to the desired probabilities of picking the numbers 0 through $u$.

⁴ Knuth’s citations are Electronics Letters 10, 8 (1974), 127–128 and ACM Transactions on Mathematical Software, 3 (1976), 253–256.

Suppose that we can distribute the liquid from the vials to the beakers so that

• Beaker number $i$ contains two colors of liquid (the quantity of one of the colors, however, may be 0), and

• One of the colors of liquid in beaker $i$ is color number $i$.

Then we can pick a number from 0 to $u$ with the desired probabilities by the following procedure.

• Pick a random floating-point number, $X$, uniformly in the range $0 \le X < u + 1$. Let $K$ be the integer part of this number and $F$ the fractional part, so that $K + F = X$, $F < 1$, and $K, F \ge 0$.

• If the amount of liquid of color $K$ in beaker $K$ is greater than or equal to $F$, then return $K$. Otherwise return the number of the other color in beaker $K$.

A little thought should convince you that the probability of picking color $i$ under this scheme is proportional to the amount of liquid of color $i$. The number $K$ represents a randomly-chosen beaker, and $F$ represents a randomly-chosen point along the side of that beaker. We choose the color we find at this randomly chosen point. We can represent this selection process with two tables indexed by $K$: $Y_K$ is the color of the other liquid in beaker $K$ (i.e., besides color $K$ itself), and $H_K$ is the height of the liquid with color $K$ in beaker $K$ (as a fraction of the distance to the top gradation of the beaker).

For example, consider the probabilities given previously; an appropriate distribution of liquid is illustrated in Figure 11.2. The tables corresponding to this figure are $Y = [1, 2, -, 1]$ ($Y_2$ doesn’t matter in this case), and $H = [0.3333, 0.6667, 1.0, 0.3333]$.

The only remaining problem is to perform the distribution of liquids to beakers, for which the following procedure suffices (in outline):

    /** S is a set of integers that are the names of beakers and
     *  vial colors. Assumes that all the beakers named in S are
     *  empty and have equal capacity, and the total contents of the vials
     *  named in S is equal to the total capacity of the beakers in
     *  S. Fills the beakers in S from the vials in S so that
     *  each beaker contains liquid from no more than two vials and the
     *  beaker named s contains liquid of color s. */
    void fillBeakers(SetOfIntegers S)
    {
        if (S is empty)
            return;

        v0 = the color of a vial in S with the least liquid;
        Pour the contents of vial v0 into beaker v0;
        /* The contents must fit in the beaker because, since v0
         * contains the least fluid, it must have no more than the
         * capacity of a single beaker. Vial v0 is now empty. */

        v1 = the color of a vial in S with the most liquid;
        Fill beaker v0 the rest of the way from vial v1;
        /* If |S| = 1 so that v0 = v1, this is the null operation.
         * Otherwise, v0 != v1 and vial v1 must contain at
         * least as much liquid as each beaker can contain. Thus, beaker
         * v0 is filled by this step. (NOTE: |S| is the
         * cardinality of S.) */

        fillBeakers(S − {v0});
    }

The action of “pouring the contents of vial $v_0$ into beaker $v_0$” corresponds to setting $H_{v_0}$ to the ratio between the amount of liquid in vial $v_0$ and the capacity of beaker $v_0$. The action of “filling beaker $v_0$ the rest of the way from vial $v_1$” corresponds to setting $Y_{v_0}$ to $v_1$.

Figure 11.2: An example dividing probabilities (colored liquids) into beakers. Each beaker holds 1/4 unit of liquid. There is 1/12 unit of 0-colored liquid, 1/2 unit of 1-colored liquid, 1/3 unit of 2-colored liquid, and 1/12 unit of 3-colored liquid.
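
The fillBeakers outline and the selection procedure can be combined into a small class. The following is a sketch rather than a tuned implementation: it finds the least-full and most-full vials by scanning, so building the tables takes quadratic time, and the names AliasTable, pick, and probability are invented here.

```java
import java.util.Random;

class AliasTable {
    final int[] Y;       // Y[k]: the other color in beaker k
    final double[] H;    // H[k]: fraction of beaker k holding color k

    /** Builds the tables for probabilities p[0..u], which must sum to 1. */
    AliasTable(double[] p) {
        int n = p.length;
        Y = new int[n];
        H = new double[n];
        double[] vial = p.clone();
        boolean[] filled = new boolean[n];
        double capacity = 1.0 / n;
        for (int round = 0; round < n; round += 1) {
            int v0 = -1, v1 = -1;    // least-full, most-full remaining vials
            for (int k = 0; k < n; k += 1) {
                if (filled[k]) continue;
                if (v0 < 0 || vial[k] < vial[v0]) v0 = k;
                if (v1 < 0 || vial[k] > vial[v1]) v1 = k;
            }
            H[v0] = Math.min(1.0, vial[v0] / capacity);
            Y[v0] = v1;
            vial[v1] -= capacity - vial[v0];   // top up beaker v0 from v1
            filled[v0] = true;                 // i.e., S = S - {v0}
        }
    }

    /** Returns a value in 0..u chosen with the given probabilities. */
    int pick(Random r) {
        double x = r.nextDouble() * Y.length;
        int k = Math.min((int) x, Y.length - 1);   // beaker number
        double f = x - k;                          // point along its side
        return (f <= H[k]) ? k : Y[k];
    }

    /** The probability of color i implied by the tables (for checking). */
    double probability(int i) {
        double total = H[i];
        for (int k = 0; k < Y.length; k += 1) {
            if (Y[k] == i) total += 1.0 - H[k];
        }
        return total / Y.length;
    }
}
```

Once the tables are built, each pick costs one random number, one comparison, and at most two array accesses, regardless of $u$.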

11.4 Random permutations and combinations

Given a set of $N$ values, consider the problem of selecting a random sequence without replacement of length $M$ from the set. That is, we want a random sequence of $M$ values from among these $N$, where each value occurs in the sequence no more than once. By “random sequence” we mean that all possible sequences are equally likely⁵. If we assume that the original values are stored in an array, then the following is a very simple way of obtaining such a sequence.

    /** Permute A so as to randomly select M of its elements,
     *  placing them in A[0] .. A[M-1], using R as a source of
     *  random numbers. */
    static void selectRandomSequence(SomeType[] A, int M, Random1 R)
    {
        int N = A.length;
        for (int i = 0; i < M; i += 1)
            swap(A, i, R.randInt(i, N-1));
    }

Here, we assume swap(V, j, k) exchanges the values of V[j] and V[k], and that R.randInt(j, k) returns a pseudo-random integer uniformly distributed over the range j to k, inclusive.

For example, if DECK[0] is A♣, DECK[1] is 2♣, ..., and DECK[51] is K♠, then

    selectRandomSequence(DECK, 52, new Random());

shuffles the deck of cards.
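
A runnable rendering of this procedure is sketched below. It assumes java.util.Random as the source of random numbers, so a uniform integer in the range i to N-1 is obtained as i + R.nextInt(N - i); the class name Select is made up.

```java
import java.util.Random;

class Select {
    /** Permutes A so as to randomly select M of its elements, placing
     *  them in A[0] .. A[M-1]. Assumes 0 <= M <= A.length. */
    static <T> void selectRandomSequence(T[] A, int M, Random R) {
        int N = A.length;
        for (int i = 0; i < M; i += 1) {
            int j = i + R.nextInt(N - i);   // uniform over i .. N-1
            T tmp = A[i];                   // swap A[i] and A[j]
            A[i] = A[j];
            A[j] = tmp;
        }
    }
}
```

Calling selectRandomSequence(deck, deck.length, new Random()) shuffles the whole array; a smaller M stops as soon as the first M positions are settled.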

This technique works, but if $M \ll N$, it is not a terribly efficient use of space, at least when the contents of the array A is something simple, like the integers between 0 and $N - 1$. For that case, we can better use some algorithms due to Floyd (names of types and functions are meant to make them self-explanatory).

⁵ Here, I’ll assume that the original set contains no duplicate values. If it does, then we have to treat the duplicates as if they were all different. In particular, if there are $k$ duplicates of a value in the original set, it may appear up to $k$ times in the selected sequence.

    /** Returns a random sequence of M distinct integers from 0..N-1,
     *  with all possible sequences equally likely. Assumes 0<=M<=N. */
    static SequenceOfIntegers selectRandomIntegers(int N, int M, Random1 R)
    {
        SequenceOfIntegers S = new SequenceOfIntegers();
        for (int i = N - M; i < N; i += 1) {
            int s = R.randInt(0, i);
            if (s ∈ S)
                insert i into S after s;
            else
                prefix s to the front of S;
        }
        return S;
    }

This procedure produces all possible sequences with equal probability because every possible sequence of values for s generates a distinct value of S, and all such sequences are equally probable.

Sanity check: the number of ways to select a sequence of $M$ objects from a set of $N$ objects is

$$\frac{N!}{(N - M)!}$$

and the number of possible sequences of values for s is equal to the number of possible values of R.randInt(0, N-M) times the number of possible values of R.randInt(0, N-M+1), etc., which is

$$(N - M + 1)(N - M + 2) \cdots N = \frac{N!}{(N - M)!}.$$

By replacing the SequenceOfIntegers with a set of integers, and replacing “prefix” and “insert” with simply adding to a set, we get an algorithm for selecting combinations of $M$ numbers from the first $N$ integers (i.e., where order doesn’t matter).
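
Here is one way Floyd’s sequence algorithm might look with standard library types. The choices of LinkedList for the sequence and java.util.Random for the random source are assumptions of this sketch; searching the list with indexOf keeps the code short at the cost of time proportional to M for each step.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

class Floyd {
    /** Returns a random sequence of M distinct integers from 0..N-1.
     *  Assumes 0 <= M <= N. */
    static List<Integer> selectRandomIntegers(int N, int M, Random R) {
        LinkedList<Integer> S = new LinkedList<>();
        for (int i = N - M; i < N; i += 1) {
            int s = R.nextInt(i + 1);       // uniform over 0 .. i
            int at = S.indexOf(s);          // position of s in S, or -1
            if (at >= 0) {
                S.add(at + 1, i);           // insert i into S after s
            } else {
                S.addFirst(s);              // prefix s to the front of S
            }
        }
        return S;
    }
}
```

The value i itself can never already be in S when it is inserted, because every earlier element is at most i - 1; this is what keeps the elements distinct.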

The Java standard library provides two static methods in the class java.util.Collections for randomly permuting an arbitrary List:

    /** Permute L, using R as a source of randomness. As a result,
     *  calling shuffle twice with values of R that produce identical
     *  sequences will give identical permutations. */
    public static void shuffle(List<?> L, Random R) { ··· }

    /** Same as shuffle(L, D), where D is a default Random value. */
    public static void shuffle(List<?> L) { ··· }

This takes linear time if the list supports fast random access.
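
The reproducibility property mentioned in the comment on shuffle is easy to demonstrate; ShuffleDemo and shuffled are names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ShuffleDemo {
    /** Returns the integers 0..n-1, permuted by Collections.shuffle
     *  using a Random created from SEED. */
    static List<Integer> shuffled(int n, long seed) {
        List<Integer> L = new ArrayList<>();
        for (int i = 0; i < n; i += 1) {
            L.add(i);
        }
        Collections.shuffle(L, new Random(seed));
        return L;
    }
}
```

Two calls with the same seed give equal lists, which is what makes a shuffled test case repeatable.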

Chapter 12

Graphs

When the term is used in computer science, a graph is a data structure that represents a mathematical relation. It consists of a set of vertices (or nodes) and a set of edges, which are pairs of vertices¹. These edge pairs may be unordered, in which case we have an undirected graph, or they may be ordered, in which case we have a directed graph (or digraph) in which each edge leaves, exits, or is out of one vertex and enters or is into the other. For vertices $v$ and $w$ we denote a general edge between $v$ and $w$ as $(v, w)$, or $\{v, w\}$ if we specifically want to indicate an undirected edge, or $[v, w]$ if we specifically want to indicate a directed edge that leaves $v$ and enters $w$. An edge $(v, w)$ is said to be incident on its two ends, $v$ and $w$; if $(v, w)$ is undirected, we say that $v$ and $w$ are adjacent vertices. The degree of a vertex is the number of edges incident on it. For a vertex in a directed graph, the in-degree is the number of edges that enter it and the out-degree is the number that leave. Usually, the ends of an edge will be distinct; that is, there will be no reflexive edges from a vertex to itself.

A subgraph of a graph $G$ is simply a graph whose vertices and edges are subsets of the vertices and edges of $G$.

A path of length $k \ge 0$ in a graph from vertex $v$ to vertex $v'$ is a sequence of vertices $v_0, v_1, \ldots, v_k$ with $v = v_0$, $v' = v_k$, and with all the $(v_i, v_{i+1})$ edges being in the graph. This definition applies both to directed and undirected graphs; in the case of directed graphs, the path has a direction. The path is simple if there are no repetitions of vertices in it. It is a cycle if $k > 0$ and $v = v'$, and a simple cycle if $v_0, \ldots, v_{k-1}$ are distinct; in an undirected graph, a cycle must additionally not follow the same edge twice. A graph with no cycles is called acyclic.

If there is a path from $v$ to $v'$, then $v'$ is said to be reachable from $v$. In an undirected graph, a connected component is a set of vertices from the graph and all edges incident on those vertices such that each vertex is reachable from any given vertex, and no other vertex from the graph is reachable from any vertex in the set. An undirected graph is connected if it contains exactly one connected component (containing all vertices in the graph).

¹ Definitions in this section are taken from Tarjan, Data Structures and Network Algorithms, SIAM, 1983.

In a directed graph, the connected components contain the same sets of vertices that you would get by replacing all directed edges by undirected ones. A subgraph of a directed graph in which every vertex can be reached from every other is called a strongly connected component. Figures 12.1 and 12.2 illustrate these definitions.

A free tree is a connected, undirected, acyclic graph (which implies that there is exactly one simple path from any node to any other). An undirected graph is biconnected if there are at least two simple paths between any two nodes.

For some applications, we associate information with the edges of a graph. For example, if vertices represent cities and edges represent roads, we might wish to associate distances with the edges. Or if vertices represent pumping stations and edges represent pipelines, we might wish to associate capacities with the edges. We’ll call numeric information of this sort weights.

Figure 12.1: An undirected graph. The starred edge is incident on vertices 1 and 2. Vertex 4 has degree 0; 3, 7, 8, and 9 have degree 1; 1, 2 and 6 have degree 2; and 0 and 5 have degree 3. The dashed lines surround the connected components; since there is more than one, the graph is unconnected. The sequence [2,1,0,3] is a path from vertex 2 to vertex 3. The path [2,1,0,2] is a cycle. The only path involving vertex 4 is the 0-length path [4]. The rightmost connected component is acyclic, and is therefore a free tree.

Figure 12.2: A directed graph. The dashed circles show connected components. Nodes 5, 6 and 7 form a strongly connected component. The other strongly connected components are the remaining individual nodes. The left component is acyclic. Nodes 0 and 4 have an in-degree of 0; nodes 1, 2, and 5–8 have an in-degree of 1; and node 3 has an in-degree of 3. Nodes 3 and 8 have out-degrees of 0; 1, 2, 4, 5, and 7 have out-degrees of 1; and 0 and 6 have out-degrees of 2.

12.1 A Programmer’s Specification

There isn’t an obvious single class specification that one might give for programs dealing with graphs, because variations in what various algorithms need can have a profound effect on the appropriate representations and what operations those representations conveniently support. For instructional use, however, Figure 12.3 gives a sample “one-size-fits-all” abstraction for general directed graphs, and Figure 12.4 does the same for undirected graphs. The idea is that vertices and edges are identified by non-negative integers. Any additional data that one wants to associate with a vertex or edge, such as a more informative label or a weight, can be added “on the side” in the form of additional arrays indexed by vertex or edge number.

12.2 Representing graphs

Graphs have numerous representations, all tailored to the operations that are critical to some application.

12.2.1 Adjacency Lists

If the operations succ, pred, leaving, and entering for directed graphs are important to one’s problem (or incident and adjacent for undirected graphs), then it may be convenient to associate a list of predecessors, successors, or neighbors with each vertex: an adjacency list. There are many ways to represent such things, as a linked list, for example. Figure 12.5 shows a method that uses arrays in such a way as to allow a programmer both to sequence easily through the neighbors of a vertex in a directed graph, and to sequence through the set of all edges. I’ve included only a couple of indicative operations to show how the data structure works. It is essentially a set of linked list structures implemented with arrays and integers instead of objects containing pointers. Figure 12.6 shows an example of a particular directed graph and the data structures that would represent it.

Another variation on essentially the same structure is to introduce separate types for vertices and edges. Vertices and Edges would then contain fields such as

    /** A general directed graph. For any given concrete extension of this
     *  class, a different subset of the operations listed will work. For
     *  uniformity, we take all vertices to be numbered with integers
     *  between 0 and N-1. */
    public interface Digraph {
        /** Number of vertices. Vertices are labeled 0 .. numVertices()-1. */
        int numVertices();

        /** Number of edges. Edges are numbered 0 .. numEdges()-1. */
        int numEdges();

        /** The vertices that edge E leaves and enters. */
        int leaves(int e);
        int enters(int e);

        /** True iff [v0,v1] is an edge in this graph. */
        boolean isEdge(int v0, int v1);

        /** The out-degree and in-degree of vertex #V. */
        int outDegree(int v);
        int inDegree(int v);

        /** The number of the Kth edge leaving vertex V, 0<=K<outDegree(V). */
        int leaving(int v, int k);

        /** The number of the Kth edge entering vertex V, 0<=K<inDegree(V). */
        int entering(int v, int k);

        /** The Kth successor of vertex V, 0<=K<outDegree(V). It is intended
         *  that succ(v,k) = enters(leaving(v,k)). */
        int succ(int v, int k);

        /** The Kth predecessor of vertex V, 0<=K<inDegree(V). It is intended
         *  that pred(v,k) = leaves(entering(v,k)). */
        int pred(int v, int k);

        /** Add M initially unconnected vertices to this graph. */
        void addVertices(int M);

        /** Add an edge from V0 to V1. */
        void addEdge(int v0, int v1);

        /** Remove all edges incident on vertex V from this graph. */
        void removeEdges(int v);

        /** Remove edge (v0, v1) from this graph. */
        void removeEdge(int v0, int v1);
    }

Figure 12.3: A sample abstract directed-graph interface in Java.

    /** A general undirected graph. For any given concrete extension of
     *  this class, a different subset of the operations listed will work.
     *  For uniformity, we take all vertices to be numbered with integers
     *  between 0 and N-1. */
    public interface Graph {
        /** Number of vertices. Vertices are labeled 0 .. numVertices()-1. */
        int numVertices();

        /** Number of edges. Edges are numbered 0 .. numEdges()-1. */
        int numEdges();

        /** The vertices on which edge E is incident. node0 is the
         *  smaller-numbered vertex. */
        int node0(int e);
        int node1(int e);

        /** True iff vertices V0 and V1 are adjacent. */
        boolean isEdge(int v0, int v1);

        /** The number of edges incident on vertex #V. */
        int degree(int v);

        /** The number of the Kth edge incident on V, 0<=K<degree(V). */
        int incident(int v, int k);

        /** The Kth node adjacent to V, 0<=K<degree(V). It is
         *  intended that adjacent(v,k) = either node0(incident(v,k))
         *  or node1(incident(v,k)). */
        int adjacent(int v, int k);

        /** Add M initially unconnected vertices to this graph. */
        void addVertices(int M);

        /** Add an (undirected) edge between V0 and V1. */
        void addEdge(int v0, int v1);

        /** Remove all edges involving vertex V from this graph. */
        void removeEdges(int v);

        /** Remove the (undirected) edge (v0, v1) from this graph. */
        void removeEdge(int v0, int v1);
    }

Figure 12.4: A sample abstract undirected-graph interface.

    /** A digraph */
    public class AdjGraph implements Digraph {

        /** A new Digraph with N unconnected vertices */
        public AdjGraph(int N) {
            numVertices = N; numEdges = 0;
            enters = new int[N*N]; leaves = new int[N*N];
            nextOutEdge = new int[N*N]; nextInEdge = new int[N*N];
            edgeOut0 = new int[N]; edgeIn0 = new int[N];
            /* Initially, no vertex has any edges (marked by -1). */
            java.util.Arrays.fill(edgeOut0, -1);
            java.util.Arrays.fill(edgeIn0, -1);
        }

        /** The vertices that edge E leaves and enters. */
        public int leaves(int e) { return leaves[e]; }
        public int enters(int e) { return enters[e]; }

        /** Add an edge from V0 to V1. */
        public void addEdge(int v0, int v1) {
            if (numEdges >= enters.length)
                expandEdges();  // Expand all edge-indexed arrays
            enters[numEdges] = v1; leaves[numEdges] = v0;
            nextInEdge[numEdges] = edgeIn0[v1];
            edgeIn0[v1] = numEdges;
            nextOutEdge[numEdges] = edgeOut0[v0];
            edgeOut0[v0] = numEdges;
            numEdges += 1;
        }

        /** The number of the Kth edge leaving vertex V, 0<=K<outDegree(V). */
        public int leaving(int v, int k) {
            int e;
            for (e = edgeOut0[v]; k > 0; k -= 1)
                e = nextOutEdge[e];
            return e;
        }

        ...

        /* Private section */

        private int numVertices, numEdges;

        /* The following are indexed by edge number */
        private int[]
            enters, leaves,
            nextOutEdge,  /* The # of sibling outgoing edge, or -1 */
            nextInEdge;   /* The # of sibling incoming edge, or -1 */

        /* edgeOut0[v] is # of first edge leaving v, or -1. */
        private int[] edgeOut0;

        /* edgeIn0[v] is # of first edge entering v, or -1. */
        private int[] edgeIn0;
    }

Figure 12.5: Adjacency-list implementation for a directed graph. Only a few representative operations are shown.
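
To watch the linked lists being threaded through the arrays, the following self-contained sketch repeats just the fields and the addEdge operation of the adjacency-list representation and then builds the graph whose arrays are shown in Figure 12.6, with vertices A through H numbered 0 through 7. The order of the edges is read off the enters and leaves arrays of that figure; the class name MiniAdjGraph and the fixed edge capacity are assumptions of this example.

```java
import java.util.Arrays;

class MiniAdjGraph {
    final int[] enters, leaves, nextOutEdge, nextInEdge;  // by edge number
    final int[] edgeOut0, edgeIn0;                        // by vertex number
    int numEdges = 0;

    MiniAdjGraph(int numVertices, int maxEdges) {
        enters = new int[maxEdges];
        leaves = new int[maxEdges];
        nextOutEdge = new int[maxEdges];
        nextInEdge = new int[maxEdges];
        edgeOut0 = new int[numVertices];
        edgeIn0 = new int[numVertices];
        Arrays.fill(edgeOut0, -1);    // -1 marks "no edge"
        Arrays.fill(edgeIn0, -1);
    }

    /** Adds an edge from V0 to V1 at the front of V0's outgoing list
     *  and of V1's incoming list. */
    void addEdge(int v0, int v1) {
        enters[numEdges] = v1;
        leaves[numEdges] = v0;
        nextInEdge[numEdges] = edgeIn0[v1];
        edgeIn0[v1] = numEdges;
        nextOutEdge[numEdges] = edgeOut0[v0];
        edgeOut0[v0] = numEdges;
        numEdges += 1;
    }

    /** Builds the graph of Figure 12.6, adding edges 0..12 in order. */
    static MiniAdjGraph figure() {
        int[][] edges = {
            {0, 1}, {0, 3}, {1, 3}, {1, 5}, {7, 6}, {2, 6}, {3, 2},
            {3, 4}, {5, 3}, {6, 3}, {7, 2}, {3, 7}, {7, 1}
        };
        MiniAdjGraph g = new MiniAdjGraph(8, edges.length);
        for (int[] e : edges) {
            g.addEdge(e[0], e[1]);
        }
        return g;
    }
}
```

Because each new edge is linked in at the front, the lists enumerate edges in the reverse of the order in which they were added: vertex D (number 3) ends up with first outgoing edge 11, followed by edges 7 and 6.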

    Vertex:         A   B   C   D   E   F   G   H
    edgeOut0:       1   3   5  11  -1   8   9  12
    edgeIn0:       -1  12  10   9   7   3   5  11

    Edge:           0   1   2   3   4   5   6   7   8   9  10  11  12
    nextOutEdge:   -1   0  -1   2  -1  -1  -1   6  -1  -1   4   7  10
    nextInEdge:    -1  -1   1  -1  -1   4  -1  -1   2   8   6  -1   0
    enters:         1   3   3   5   6   6   2   4   3   3   2   7   1
    leaves:         0   0   1   1   7   2   3   3   5   6   7   3   7

Figure 12.6: A graph and one form of adjacency list representation. The lists in this case are arrays. The lower four arrays are indexed by edge number, and the first two by vertex number. The array nextOutEdge forms linked lists of out-going edges for each vertex, with roots in edgeOut0. Likewise, nextInEdge and edgeIn0 form linked lists of incoming edges for each vertex. The enters and leaves arrays give the incident vertices for each edge.

    class Vertex {
        private int num;                  /* Number of this vertex */
        private Edge edgeOut0, edgeIn0;   /* First outgoing & incoming edges. */
    }

    class Edge {
        private int num;                  /* Number of this edge */
        private Vertex enters, leaves;
        private Edge nextOutEdge, nextInEdge;
    }

12.2.2 Edge sets

If all we need to do is enumerate the edges and tell what nodes they are incident on, we can simplify the representation in §12.2.1 quite a bit by throwing out fields edgeOut0, edgeIn0, nextOutEdge, and nextInEdge. We will see one algorithm where this is useful.

12.2.3 Adjacency matrices

If one’s graphs are dense (many of the possible edges exist) and if the important operations include “Is there an edge from $v$ to $w$?” or “The weight of the edge between $v$ and $w$,” then we can use an adjacency matrix. We number the vertices 0 to $|V| - 1$ (where $|V|$ is the size of the set $V$ of vertices), and then set up a $|V| \times |V|$ matrix with entry $(i, j)$ equal to 1 if there is an edge from the vertex numbered $i$ to the one numbered $j$ and 0 otherwise. For weighted edges, we can let entry $(i, j)$ be the weight of the edge between $i$ and $j$, or some special value if there is no edge (this would be an extension of the specifications of Figure 12.3). When a graph is undirected, the matrix will be symmetric. Figure 12.7 illustrates two unweighted graphs, directed and undirected, and their corresponding adjacency matrices.

    M =     A B C D E F G H          M′ =    A B C D E F G H
        A   0 1 0 1 0 0 0 0              A   0 1 0 1 0 0 0 0
        B   0 0 0 1 0 1 0 0              B   1 0 0 1 0 1 0 1
        C   0 0 0 0 0 0 1 0              C   0 0 0 1 0 0 1 1
        D   0 0 1 0 1 0 0 1              D   1 1 1 0 1 1 1 1
        E   0 0 0 0 0 0 0 0              E   0 0 0 1 0 0 0 0
        F   0 0 0 1 0 0 0 0              F   0 1 0 1 0 0 0 0
        G   0 0 0 1 0 0 0 0              G   0 0 1 1 0 0 0 1
        H   0 1 1 0 0 0 1 0              H   0 1 1 1 0 0 1 0

Figure 12.7: Top (M): a directed graph and corresponding adjacency matrix. Bottom (M′): an undirected variant of the graph and adjacency matrix.

Adjacency matrices for unweighted graphs have a rather interesting property. Take, for example, the top matrix in Figure 12.7, and consider the result of multiplying this matrix by itself. We define the product of any matrix X with itself as

$$(X \cdot X)_{ij} = \sum_{0 \le k < |V|} X_{ik} \cdot X_{kj}.$$

Translating this, we see that $(M \cdot M)_{ij}$ is equal to the number of vertices, k, such that there is an edge from vertex i to vertex k ($M_{ik} = 1$) and there is also an edge from vertex k to vertex j ($M_{kj} = 1$). For any other vertex, one of $M_{ik}$ or $M_{kj}$ will be 0. It should be easy to see, therefore, that $M^2_{ij}$ is the number of paths following exactly two edges from i to j. Likewise, $M^3_{ij}$ represents the number of paths that are exactly three edges long between i and j. If we use boolean arithmetic instead (where 0+1 = 1+1 = 1), we instead get 1’s in all positions where there is at least one path of length exactly two between two vertices.
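The path-counting property can be checked mechanically. The following is a small sketch, assuming the vertices A through H of the directed graph in Figure 12.7 are numbered 0 through 7; the class name PathCounts and the method multiply are hypothetical names introduced only for this illustration.

```java
/** Counts fixed-length paths by multiplying adjacency matrices.
 *  (Illustrative sketch; names are not from the chapter's code.) */
final class PathCounts {
    /** Adjacency matrix of the directed graph at the top of Figure 12.7,
     *  with A = 0, B = 1, ..., H = 7. */
    static final int[][] M = {
        {0, 1, 0, 1, 0, 0, 0, 0},   // A
        {0, 0, 0, 1, 0, 1, 0, 0},   // B
        {0, 0, 0, 0, 0, 0, 1, 0},   // C
        {0, 0, 1, 0, 1, 0, 0, 1},   // D
        {0, 0, 0, 0, 0, 0, 0, 0},   // E
        {0, 0, 0, 1, 0, 0, 0, 0},   // F
        {0, 0, 0, 1, 0, 0, 0, 0},   // G
        {0, 1, 1, 0, 0, 0, 1, 0},   // H
    };

    /** Returns the product X * Y of two square matrices of the same size. */
    static int[][] multiply(int[][] x, int[][] y) {
        int n = x.length;
        int[][] product = new int[n][n];
        for (int i = 0; i < n; i += 1) {
            for (int j = 0; j < n; j += 1) {
                for (int k = 0; k < n; k += 1) {
                    // Each k with edges i->k and k->j contributes one path.
                    product[i][j] += x[i][k] * y[k][j];
                }
            }
        }
        return product;
    }

    public static void main(String[] args) {
        int[][] m2 = multiply(M, M);
        // Two-edge paths from D (3) to G (6): D->C->G and D->H->G.
        System.out.println(m2[3][6]);   // prints 2
    }
}
```

Multiplying the result by M once more gives the counts of three-edge paths in the same way.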

For the example in question, we get

    M^2 =   A B C D E F G H
        A   0 0 1 1 1 1 0 1
        B   0 0 1 1 1 0 0 1
        C   0 0 0 1 0 0 0 0
        D   0 1 1 0 0 0 2 0
        E   0 0 0 0 0 0 0 0
        F   0 0 1 0 1 0 0 1
        G   0 0 1 0 1 0 0 1
        H   0 0 0 2 0 1 1 0

    M^3 =   A B C D E F G H
        A   0 1 2 1 1 0 2 1
        B   0 1 2 0 1 0 2 1
        C   0 0 1 0 1 0 0 1
        D   0 0 0 3 0 1 1 0
        E   0 0 0 0 0 0 0 0
        F   0 1 1 0 0 0 2 0
        G   0 1 1 0 0 0 2 0
        H   0 0 2 2 2 0 0 2

Adjacency matrices are not good for sparse graphs (those where the number of edges is much smaller than V^2). It should be obvious also that they present problems when one wants to add and subtract vertices dynamically.

12.3 Graph Algorithms

Many interesting graph algorithms involve some sort of traversal of the vertices or edges of a graph. Exactly as for trees, one can traverse a graph in either depth-first or breadth-first fashion (intuitively, walking away from the starting vertex as quickly or as slowly as possible).

12.3.1 Marking.

However, in graphs, unlike trees, one can get back to a vertex by following edges away from it, making it necessary to keep track of what vertices have already been visited, an operation I’ll call marking the vertices. There are several ways to accomplish this.

Mark bits. If vertices are represented by objects, as in the class Vertex illustrated in §12.2.1, we can keep a bit in each vertex that indicates whether the vertex has been visited. These bits must initially all be on (or off) and are then flipped when a vertex is first visited. Similarly, we could do this for edges instead.

Mark counts. A problem with mark bits is that one must be sure they are all set the same way at the beginning of a traversal. If traversals may get cut short, causing mark bits to have arbitrary settings after a traversal, one may be able to use a larger mark instead. Give each traversal a number in increasing sequence (the first traversal is number 1, the second is 2, etc.). To visit a node, set its mark count to the current traversal number. Each new traversal is guaranteed to have a number contained in none of the mark fields (assuming the mark fields are initialized appropriately, say to 0).

Bit vectors. If, as in our abstractions, vertices have numbers, one can keep a bit vector, M, on the side, where M[i] is 1 iff vertex number i has been visited. Bit vectors are easy to reset at the beginning of a traversal.

Ad hoc. Sometimes, the particular traversal being performed provides a way of recognizing a visited vertex. One can’t say anything general about this, of course.
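To make the mark-count idea concrete, here is a minimal sketch that keeps the counts in an array indexed by vertex number (so it borrows the "on the side" layout of the bit-vector approach). The class MarkCounts and its method names are hypothetical, and the sketch assumes fewer traversals than the range of an int, so overflow is ignored.

```java
/** Mark counts for vertices numbered 0 .. numVertices-1.
 *  (Illustrative sketch; names are assumptions, not the chapter's API.) */
final class MarkCounts {
    /** mark[v] is the number of the last traversal that visited v, or 0. */
    private final int[] mark;
    /** Number of the current traversal; 0 means no traversal has started. */
    private int traversal = 0;

    MarkCounts(int numVertices) {
        mark = new int[numVertices];   // all fields initialized to 0
    }

    /** Begin a new traversal. No resetting of the marks is needed, because
     *  the new traversal number appears in none of the mark fields. */
    void startTraversal() {
        traversal += 1;
    }

    /** Mark V as visited by the current traversal. */
    void mark(int v) {
        mark[v] = traversal;
    }

    /** True iff V has been visited by the current traversal.
     *  Call startTraversal() at least once before using this. */
    boolean isMarked(int v) {
        return mark[v] == traversal;
    }
}
```

Starting a traversal costs constant time here, whereas resetting mark bits or a bit vector takes time proportional to the number of vertices.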

12.3.2 A general traversal schema.

Many graph algorithms have the following general form. Italicized capital-letter names must be replaced according to the application.

    /* GENERAL GRAPH-TRAVERSAL SCHEMA */
    COLLECTION_OF_VERTICES fringe;

    fringe = INITIAL_COLLECTION;
    while (!fringe.isEmpty()) {
        Vertex v = fringe.REMOVE_HIGHEST_PRIORITY_ITEM();

        if (!MARKED(v)) {
            MARK(v);
            VISIT(v);
            For each edge (v,w) {
                if (NEEDS_PROCESSING(w))
                    Add w to fringe;
            }
        }
    }

In the following sections, we look at various algorithms that fit this schema.²

² In this context, a schema (plural schemas or schemata) is a template, containing some pieces that must be replaced. Logical systems, for example, often contain axiom schemata such as

    (∀x P(x)) ⊃ P(y),

where P may be replaced by any logical formula with a distinguished free variable (well, roughly).

12.3.3 Generic depth-first and breadth-first traversal

Depth-first traversal in graphs is essentially the same as in trees, with the exception of the check for “already visited.” To implement

    /** Perform the operation VISIT on each vertex reachable from V
     *  in depth-first order. */
    void depthFirstVisit(Vertex v)

we use the general graph-traversal schema with the following replacements.

COLLECTION_OF_VERTICES is a stack type.

INITIAL_COLLECTION is the set {v}.

REMOVE_HIGHEST_PRIORITY_ITEM pops and returns the top.

MARK and MARKED set and check a mark bit (see discussion above).

NEEDS_PROCESSING means “not MARKED.”

Here, as is often the case, we could dispense with NEEDS_PROCESSING (make it always TRUE). The only effect would be to increase the size of the stack somewhat.

Breadth-first search is nearly identical. The only differences are as follows.

COLLECTION_OF_VERTICES is a (FIFO) queue type.

REMOVE_HIGHEST_PRIORITY_ITEM is to remove and return the first (least-recently-added) item in the queue.
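The two sets of replacements above can be sketched in one routine, since they differ only in which end of the fringe is removed. This is an illustration under stated assumptions rather than the chapter's own code: vertices are plain integers, the graph is given as adjacency lists (adj.get(v) lists the successors of v), VISIT simply records the vertex, and the names Traversals and traverse are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Depth-first and breadth-first traversal via the general schema.
 *  (Illustrative sketch; names and graph representation are assumptions.) */
final class Traversals {
    /** Returns the vertices reachable from START in the order visited.
     *  The fringe acts as a stack if DEPTHFIRST, else as a FIFO queue. */
    static List<Integer> traverse(List<List<Integer>> adj, int start,
                                  boolean depthFirst) {
        boolean[] marked = new boolean[adj.size()];   // mark bits
        List<Integer> visited = new ArrayList<>();
        Deque<Integer> fringe = new ArrayDeque<>();   // COLLECTION_OF_VERTICES
        fringe.addLast(start);                        // INITIAL_COLLECTION = {start}
        while (!fringe.isEmpty()) {
            // REMOVE_HIGHEST_PRIORITY_ITEM: newest item for a stack,
            // least-recently-added item for a queue.
            int v = depthFirst ? fringe.removeLast() : fringe.removeFirst();
            if (!marked[v]) {
                marked[v] = true;                     // MARK
                visited.add(v);                       // VISIT
                for (int w : adj.get(v)) {
                    if (!marked[w]) {                 // NEEDS_PROCESSING
                        fringe.addLast(w);
                    }
                }
            }
        }
        return visited;
    }
}
```

For the graph with edges 0→1, 0→2, 1→3, and 2→3, the breadth-first order from 0 is 0, 1, 2, 3, while the stack-based order is 0, 2, 3, 1 (the most recently added successor is explored first).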

12.3.4 Topological sorting.

A topological sort of a directed graph is a listing of its vertices in such an order that if vertex w is reachable from vertex v, then w is listed after v. Thus, if we think of a graph as representing an ordering relation on the vertices, a topological sort is a linear ordering of the vertices that is consistent with that ordering relation. A cyclic directed graph has no topological sort. For example, topological sort is the operation that the UNIX make utility implicitly performs to find an order for executing commands that brings every file up to date before it is needed in a subsequent command.

To perform a topological sort, we associate a count with each vertex of the number of incoming edges from as-yet unprocessed vertices. For the version below, I use an array to keep these counts. The algorithm for topological sort now looks like this.

    /** An array of the vertices in G in topologically sorted order.
     *  Assumes G is acyclic. */
    static int[] topologicalSort(Digraph G)
    {
        int[] count = new int[G.numVertices()];
        int[] result = new int[G.numVertices()];
        int k;

        for (int v = 0; v < G.numVertices(); v += 1)
            count[v] = G.inDegree(v);

        Graph-traversal schema replacement for topological sorting;

        return result;
    }

The schema replacement for topological sorting is as follows.

COLLECTION_OF_VERTICES can be any set, multiset, list, or sequence type for vertices (stacks, queues, etc., etc.).

INITIAL_COLLECTION is the set of all v with count[v] = 0.

REMOVE_HIGHEST_PRIORITY_ITEM can remove any item.

MARKED and MARK can be trivial (i.e., always return FALSE and do nothing, respectively).

VISIT(v) makes v the next non-null element of result and decrements count[w] for each edge (v,w) in G.

NEEDS_PROCESSING is true if count[w] == 0.
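Filling the schema in with these replacements gives something like the following sketch. It assumes a graph given as adjacency lists of integers rather than the Digraph class, uses a FIFO queue for the fringe (any collection would do), and adds a cycle check that the original specification leaves out; the names TopSort and topologicalSort(List) are chosen for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Topological sort using in-degree counts.
 *  (Illustrative sketch; graph representation is an assumption.) */
final class TopSort {
    /** Returns the vertices 0 .. adj.size()-1 in topologically sorted
     *  order, where adj.get(v) lists the successors of v.
     *  Throws IllegalArgumentException if the graph has a cycle
     *  (an extra check, not part of the text's version). */
    static int[] topologicalSort(List<List<Integer>> adj) {
        int n = adj.size();
        int[] count = new int[n];          // incoming edges from unprocessed vertices
        for (List<Integer> successors : adj) {
            for (int w : successors) {
                count[w] += 1;
            }
        }
        Deque<Integer> fringe = new ArrayDeque<>();
        for (int v = 0; v < n; v += 1) {   // INITIAL_COLLECTION
            if (count[v] == 0) {
                fringe.add(v);
            }
        }
        int[] result = new int[n];
        int k = 0;
        while (!fringe.isEmpty()) {
            int v = fringe.remove();
            result[k] = v;                 // VISIT: v is next in the result
            k += 1;
            for (int w : adj.get(v)) {
                count[w] -= 1;
                if (count[w] == 0) {       // NEEDS_PROCESSING
                    fringe.add(w);
                }
            }
        }
        if (k != n) {
            throw new IllegalArgumentException("graph has a cycle");
        }
        return result;
    }
}
```

Because a vertex enters the fringe only at the moment its count reaches 0, each vertex is added exactly once, which is why MARK and MARKED can be trivial.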

Figure 12.8 illustrates the algorithm.

12.3.5 Minimum spanning trees

Consider a connected undirected graph with edge weights. A minimum(-weight) spanning tree (or MST for short) is a tree that is a subgraph of the given graph, contains all the vertices of the given graph, and minimizes the sum of its edge weights. For example, we might have a bunch of cities that we wish to connect up with telephone lines so as to provide a path between any two, all at minimal cost. The cities correspond to vertices and the possible connections between cities correspond to edges.³ Finding a minimal set of possible connections is the same as finding a minimum spanning tree (there can be more than one). To do this, we make use of a useful fact about MSTs.

³ It turns out that to get really lowest costs, you want to introduce strategically placed extra “cities” to serve as connecting points. We’ll ignore that here.

    Input:     ⋆A0   B1  ⋆C0   D2   E3   F1   G1   H1     result: (empty)
    Stage 1:    A   ⋆B0  ⋆C0   D1   E3   F1   G1   H1     result: A
    Stage 2:    A   ⋆B0   C   ⋆D0   E3  ⋆F0   G1   H1     result: A C
    Stage 3:    A   ⋆B0   C   ⋆D0   E2   F   ⋆G0   H1     result: A C F

Figure 12.8: The input to a topological sort (upper left) and three stages in its computation. The shaded nodes are those that have been processed and moved to the result. The starred nodes are the ones in the fringe. Subscripts indicate count fields. A possible final sequence of nodes, given this start, is A, C, F, D, B, E, G, H.

FACT: If the vertices of a connected graph G are divided into two disjoint nonempty sets, V0 and V1, then any MST for G will contain one of the edges running between a vertex in V0 and a vertex in V1 that has minimal weight.

Proof. It’s convenient to use a proof by contradiction. Suppose that some MST, T, doesn’t contain any of the edges between V0 and V1 with minimal weight. Consider the effect of adding to T an edge from V0 to V1, e, that does have minimal weight, thus giving T′ (there must be such an edge, since otherwise T would be unconnected). Since T was a tree, the result of adding this new edge must have a cycle involving e (since it adds a new path between two nodes that already had a path between them in T). This is only possible if the cycle contains another edge from T, e′, that also runs between V0 and V1. By hypothesis, e has weight less than e′. If we remove e′ from T′, we get a tree once again, but since we have substituted e for e′, the sum of the edge weights for this new tree is less than that for T, a contradiction of T’s minimality. Therefore, it was wrong to assume that T contained no minimal-weight edges from V0 to V1. (End of Proof)

We use this fact by taking V0 to be a set of processed (marked) vertices for which we have selected edges that form a tree, and taking V1 to be the set of all other vertices. By the Fact above, we may safely add to the tree any minimal-weight edge from the marked vertices to an unmarked vertex.

This gives what is known as Prim’s algorithm. This time, we introduce two extra pieces of information for each node, dist[v] (a weight value), and parent[v] (a Vertex). At each point in the algorithm, the dist value for an unprocessed vertex (still in the fringe) is the minimal distance (weight) between it and a processed vertex, and the parent value is the processed vertex that achieves this minimal distance.

    /** For all vertices v in G, set PARENT[v] to be the parent of v in
     *  a MST of G. For each v in G, DIST[v] may be altered arbitrarily.
     *  Assumes that G is connected. WEIGHT[e] is the weight of edge e. */
    static void MST(Graph G, int[] weight, int[] parent, int[] dist)
    {
        for (int v = 0; v < G.numVertices(); v += 1) {
            dist[v] = ∞;
            parent[v] = -1;
        }

        Let r be an arbitrary vertex in G;
        dist[r] = 0;

        Graph-traversal schema replacement for MST;
    }

The appropriate “settings” for the graph-traversal schema are as follows.

COLLECTION_OF_VERTICES is a priority queue of vertices ordered by dist values, with smaller dists having higher priorities.

INITIAL_COLLECTION contains all the vertices of G.

REMOVE_HIGHEST_PRIORITY_ITEM removes the first item in the priority queue.

VISIT(v): for each edge (v,w) with weight n, if w is unmarked, and dist[w] > n, set dist[w] to n and set parent[w] to v.

NEEDS_PROCESSING(v) is always false.
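A runnable sketch of these settings follows, with two loudly stated departures from the text. First, java.util.PriorityQueue has no operation for reordering an item whose dist has changed, so instead of starting with every vertex in the fringe, this version adds a fresh entry each time a dist value improves and skips entries for vertices that are already marked. Second, the graph is passed as an array of undirected edges {u, v, weight} instead of a Graph object, and the root r is taken to be vertex 0. The names PrimSketch and mst are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Prim's algorithm with a lazily updated priority queue.
 *  (Illustrative sketch; not the chapter's Graph-based version.) */
final class PrimSketch {
    /** Sets parent[v] to the parent of v in an MST rooted at vertex 0
     *  (parent[0] is -1) and dist[v] to the weight of the edge joining
     *  v to its parent. edges[i] = {u, v, weight} is an undirected edge.
     *  Assumes the graph is connected. */
    static void mst(int numVertices, int[][] edges, int[] parent, int[] dist) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int v = 0; v < numVertices; v += 1) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(new int[] {e[1], e[2]});
            adj.get(e[1]).add(new int[] {e[0], e[2]});
        }
        boolean[] marked = new boolean[numVertices];
        Arrays.fill(dist, Integer.MAX_VALUE);     // stands in for infinity
        Arrays.fill(parent, -1);
        dist[0] = 0;

        // Fringe entries are {vertex, dist value when the entry was added}.
        PriorityQueue<int[]> fringe =
            new PriorityQueue<int[]>((a, b) -> Integer.compare(a[1], b[1]));
        fringe.add(new int[] {0, 0});
        while (!fringe.isEmpty()) {
            int v = fringe.remove()[0];
            if (marked[v]) {
                continue;                         // stale entry
            }
            marked[v] = true;
            for (int[] e : adj.get(v)) {          // VISIT(v)
                int w = e[0], n = e[1];
                if (!marked[w] && dist[w] > n) {
                    dist[w] = n;
                    parent[w] = v;
                    fringe.add(new int[] {w, n}); // takes the place of reordering
                }
            }
        }
    }
}
```

The lazy entries mean the queue can hold up to one item per edge rather than one per vertex, which does not change the O((V + E) lg V) flavor of the bound since lg E is at most 2 lg V.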

Figure 12.9 illustrates this algorithm in action.

12.3.6 Single-source shortest paths

Suppose that we are given a weighted graph (directed or otherwise) and we want to find the shortest paths from some starting node to every reachable node. A succinct presentation of the results of this algorithm is known as a shortest-path tree. This is a (not necessarily minimum) spanning tree for the graph with the desired starting node as the root such that the path from the root to each other node in the tree is also a path of minimal total weight in the full graph.

A common algorithm for doing this, known as Dijkstra’s algorithm, looks almost identical to Prim’s algorithm for MSTs. We have the same PARENT and DIST data as before. However, whereas in Prim’s algorithm, DIST gives the shortest distance from an unmarked vertex to the marked vertices, in Dijkstra’s algorithm it gives the length of the shortest path known so far that leads to it from the starting node.

    /** For all vertices v in G reachable from START, set PARENT[v]
     *  to be the parent of v in a shortest-path tree from START in G. For
     *  all vertices in this tree, DIST[v] is set to the distance from START.
     *  WEIGHT[e] are non-negative edge weights. Assumes that vertex
     *  START is in G. */
    static void shortestPaths(Graph G, int start, int[] weight,
                              int[] parent, double[] dist)
    {
        for (int v = 0; v < G.numVertices(); v += 1) {
            dist[v] = ∞;
            parent[v] = -1;
        }

        dist[start] = 0;

        Graph-traversal schema replacement for shortest-path tree;
    }

where we substitute into the schema as follows:

COLLECTION_OF_VERTICES is a priority queue of vertices ordered by dist values, with smaller dists having higher priorities.

INITIAL_COLLECTION contains all the vertices of G.

REMOVE_HIGHEST_PRIORITY_ITEM removes the first item in the priority queue.

MARKED and MARK can be trivial (return false and do nothing, respectively).

VISIT(v): for each edge (v,w) with weight n, if dist[w] > n + dist[v], set dist[w] to n + dist[v] and set parent[w] to v. Reorder fringe as needed.

NEEDS_PROCESSING(v) is always false.

    dist values in the successive panels of Figure 12.9:
    Panel 1:   A 0   B ∞   C ∞   D ∞   E ∞   F ∞   G ∞   H ∞
    Panel 2:   A 0   B 2   C 5   D 3   E ∞   F ∞   G 7   H ∞
    Panel 3:   A 0   B 2   C 4   D 3   E 3   F ∞   G 7   H ∞
    Panel 4:   A 0   B 2   C 2   D 3   E 3   F 1   G 7   H 2
    Panel 5:   A 0   B 2   C 2   D 3   E 3   F 1   G 7   H 2
    Panel 6:   A 0   B 2   C 2   D 3   E 3   F 1   G 1   H 2
    Panel 7:   A 0   B 2   C 2   D 3   E 3   F 1   G 1   H 2
    Panel 8:   A 0   B 2   C 2   D 3   E 3   F 1   G 1   H 2

Figure 12.9: Prim’s algorithm for minimum spanning tree. Vertex r is A. The numbers in the nodes denote dist values. Dashed edges denote parent values; they form a MST after the last step. Unshaded nodes are in the fringe. The last two steps (which don’t change parent pointers) have been collapsed into one.

Figure 12.10 illustrates Dijkstra’s algorithm in action.

Because of their very similar structure, the asymptotic running times of Dijkstra’s and Prim’s algorithms are similar. We visit each vertex once (removing an item from the priority queue), and reorder the priority queue at most once for each edge. Hence, if V is the number of vertices of G and E is the number of edges, we get an upper bound on the time required by these algorithms of O((V + E) lg V).
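Here is a sketch of Dijkstra’s algorithm under the same assumptions as are common when using java.util.PriorityQueue, which cannot reorder an item in place: whenever a dist value improves, a new entry is added to the fringe, and entries whose recorded distance is out of date are skipped on removal. The edge-array input ({from, to, weight} for directed edges) and the names DijkstraSketch and shortestPaths(int, int[][], ...) are choices made for this illustration, not the Graph-based signature in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Dijkstra's algorithm with a lazily updated priority queue.
 *  (Illustrative sketch; not the chapter's Graph-based version.) */
final class DijkstraSketch {
    /** Sets dist[v] to the length of a shortest path from START to v
     *  (infinity if v is unreachable) and parent[v] to v's parent in a
     *  shortest-path tree (-1 for START and for unreachable vertices).
     *  edges[i] = {from, to, weight} is a directed edge whose weight
     *  must be non-negative. */
    static void shortestPaths(int numVertices, int[][] edges, int start,
                              int[] parent, double[] dist) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int v = 0; v < numVertices; v += 1) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(new int[] {e[1], e[2]});
        }
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        Arrays.fill(parent, -1);
        dist[start] = 0;

        // Fringe entries are {vertex, dist value when the entry was added}.
        PriorityQueue<double[]> fringe =
            new PriorityQueue<double[]>((a, b) -> Double.compare(a[1], b[1]));
        fringe.add(new double[] {start, 0});
        while (!fringe.isEmpty()) {
            double[] item = fringe.remove();
            int v = (int) item[0];
            if (item[1] > dist[v]) {
                continue;                  // stale entry: v was improved later
            }
            for (int[] e : adj.get(v)) {   // VISIT(v)
                int w = e[0];
                double n = e[1];
                if (dist[w] > n + dist[v]) {
                    dist[w] = n + dist[v];
                    parent[w] = v;
                    fringe.add(new double[] {w, dist[w]});
                }
            }
        }
    }
}
```

For the directed edges 0→1 (2), 0→2 (5), 1→2 (1), 2→3 (2), and 1→3 (7), the distances from vertex 0 come out as 0, 2, 3, and 5, with the shortest-path tree 0→1→2→3.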

12.3.7 A* search

Dijkstra’s algorithm efficiently finds all shortest paths from a single starting point (source) in a graph. Suppose, however, that you are only interested in a single shortest path from one source to one destination. We could handle this by modifying the VISIT step in Dijkstra’s algorithm:

VISIT(v): [Single destination] If v is the destination node, exit the algorithm. Otherwise, for each edge (v,w) with weight n, if dist[w] > n + dist[v], set dist[w] to n + dist[v] and set parent[w] to v. Reorder fringe as needed.

This avoids computations of paths farther from the source than is the destination, but Dijkstra’s algorithm can still do a great deal of unnecessary work.

Suppose, for example, that you want to find a shortest path by road from Denver to New York City. True, we are guaranteed that when we select New York from the priority queue, we can stop the algorithm. Unfortunately, before the algorithm considers a single Manhattan street, it will have found the shortest path from Denver to nearly every destination on the west coast (aside from Alaska), Mexico, and the western provinces of Canada, all of which are in the wrong direction!

Intuitively, we might improve the situation by considering nodes in a different order: one biased toward our intended destination. It turns out that the necessary adjustment is easy. The resulting algorithm is called A* search:⁴

    /** For all vertices v in G along a shortest path from START to END,
     *  set PARENT[v] to be the predecessor of v in the path, and set
     *  DIST[v] to the distance from START. WEIGHT[e] are
     *  non-negative edge weights. H[v] is a consistent heuristic
     *  estimate of the distance from v to END. Assumes that vertex START
     *  is in G, and that END is in G and reachable from START. */
    static void shortestPath(Graph G, int start, int end, int[] weight,
                             int[] h, int[] parent, double[] dist)
    {
        for (int v = 0; v < G.numVertices(); v += 1) {
            dist[v] = ∞;
            parent[v] = -1;
        }

        dist[start] = 0;

        Graph-traversal schema replacement for A* search;
    }

⁴ Discovered by Nils Nilsson and Bertram Raphael in 1968. Peter Hart demonstrated optimality.

    dist values in the successive panels of Figure 12.10:
    Panel 1:   A 0   B ∞   C ∞   D ∞   E ∞   F ∞   G ∞   H ∞
    Panel 2:   A 0   B 2   C 5   D 3   E ∞   F ∞   G 7   H ∞
    Panel 3:   A 0   B 2   C 5   D 3   E 5   F ∞   G 7   H ∞
    Panel 4:   A 0   B 2   C 5   D 3   E 5   F ∞   G 6   H 9
    Panel 5:   A 0   B 2   C 5   D 3   E 5   F 7   G 6   H 9
    Panel 6:   A 0   B 2   C 5   D 3   E 5   F 6   G 6   H 7
    Panel 7:   A 0   B 2   C 5   D 3   E 5   F 6   G 6   H 7

Figure 12.10: Dijkstra’s algorithm for shortest paths. The starting node is A. Numbers in nodes represent minimum distance to node A so far found (dist). Dashed arrows represent parent pointers; their final values show the shortest-path tree. The last three steps have been collapsed to one.

The schema for A* search is identical to that for Dijkstra’s algorithm, except that the VISIT step is modified to the Single Destination version above, and we replace the COLLECTION_OF_VERTICES setting with the following.

COLLECTION_OF_VERTICES [A* search] is a priority queue of vertices ordered by the value of dist[v] + h[v], with smaller values having higher priorities.

The difference, in other words, is that we consider nodes in order of our current best estimate of the minimum distance to the destination on a path that goes through the node. Put another way, Dijkstra’s algorithm is essentially the same, but uses h[v] = 0.

For optimal and correct behavior, we need some restrictions on h, the heuristic distance estimate. As indicated in the comment, we require that h be consistent. This means first that it must be admissible: h[v] must not overestimate the actual shortest-path length from v to the destination. Second, we require that if (v,w) is an edge, then

    h[v] ≤ weight[(v,w)] + h[w].

This is a version of the familiar triangle inequality: the length of any side of a triangle must be less than or equal to the sum of the lengths of the other two. Under these conditions, the A* algorithm is optimal in the sense that no other algorithm that uses the same heuristic information (i.e., h) can visit fewer nodes (some qualification is needed if there are multiple paths with the same weight).
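The edge condition is easy to test mechanically. The sketch below checks it for every edge and, as an added assumption of this illustration, also requires h[end] to be 0; with that in place, the edge condition by itself guarantees admissibility, since summing the inequality along any path from v to the destination shows that h[v] is at most the length of that path. The names HeuristicCheck and isConsistent are hypothetical, and edges are given as {from, to, weight} triples.

```java
/** Checks the consistency conditions for an A* heuristic.
 *  (Illustrative sketch; names and input format are assumptions.) */
final class HeuristicCheck {
    /** Returns true iff h[end] == 0 and h[v] <= weight(v,w) + h[w]
     *  for every directed edge {v, w, weight} in EDGES. */
    static boolean isConsistent(int[][] edges, double[] h, int end) {
        if (h[end] != 0) {
            return false;       // the estimate at the destination must be 0
        }
        for (int[] e : edges) {
            int v = e[0], w = e[1], weight = e[2];
            if (h[v] > weight + h[w]) {
                return false;   // triangle inequality violated on (v,w)
            }
        }
        return true;
    }
}
```

For an undirected road network, each road segment would be listed in both directions so that the condition is checked both ways.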

Considering again route-planning from Denver, we can use distance to New York “as the crow flies” as our heuristic, since these distances satisfy the triangle inequality and are no greater than the length of any combination of road segments between two points. In real-life applications, however, the general practice is to do a great deal of preprocessing of the data so that actual queries don’t actually need to do a full search and can thus operate quickly.
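Putting the pieces together, A* search can be sketched as follows. As in any version built on java.util.PriorityQueue, reordering is simulated by adding a new fringe entry whenever a dist value improves and skipping vertices that have already been processed; this skipping is safe only because h is assumed to be consistent. The edge-array input, the return value, and the names AStarSketch and shortestPath are assumptions of this illustration rather than the signature given in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** A* search with a lazily updated priority queue.
 *  (Illustrative sketch; assumes H is a consistent heuristic.) */
final class AStarSketch {
    /** Returns the length of a shortest path from START to END, or
     *  infinity if END is unreachable, setting parent[v] to the
     *  predecessor of each vertex v on the path found (parent[start]
     *  is -1). edges[i] = {from, to, weight} is a directed edge with
     *  non-negative weight; h[v] estimates the distance from v to END. */
    static double shortestPath(int numVertices, int[][] edges, int start,
                               int end, double[] h, int[] parent) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int v = 0; v < numVertices; v += 1) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(new int[] {e[1], e[2]});
        }
        double[] dist = new double[numVertices];
        boolean[] done = new boolean[numVertices];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        Arrays.fill(parent, -1);
        dist[start] = 0;

        // Fringe entries are {vertex, dist + h when the entry was added}.
        PriorityQueue<double[]> fringe =
            new PriorityQueue<double[]>((a, b) -> Double.compare(a[1], b[1]));
        fringe.add(new double[] {start, h[start]});
        while (!fringe.isEmpty()) {
            int v = (int) fringe.remove()[0];
            if (done[v]) {
                continue;                  // stale entry
            }
            done[v] = true;
            if (v == end) {
                return dist[v];            // single-destination exit
            }
            for (int[] e : adj.get(v)) {
                int w = e[0];
                double n = e[1];
                if (dist[w] > n + dist[v]) {
                    dist[w] = n + dist[v];
                    parent[w] = v;
                    fringe.add(new double[] {w, dist[w] + h[w]});
                }
            }
        }
        return Double.POSITIVE_INFINITY;
    }
}
```

Passing an array of zeros for h turns this into the single-destination form of Dijkstra’s algorithm.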


12.3.8 Kruskal’s algorithm for MST

Just so you don’t get the idea that our graph traversal schema is the only possible way to go, we’ll consider a “classical” method for forming a minimum spanning tree, known as Kruskal’s algorithm. This algorithm relies on a union-find structure. At any time, this structure contains a partition of the vertices: a collection of disjoint sets of vertices that includes all of the vertices. Initially, each vertex is alone in its own set. The idea is that we build up an MST one edge at a time. We repeatedly choose an edge of minimum weight that joins vertices in two different sets, add that edge to the MST we are building, and then combine (union) the two sets of vertices into one set. This process continues until all the sets have been combined into one (which must contain all the vertices). At any point, each set is a bunch of vertices that are all reachable from each other via the edges so far added to the MST. When there is only one set, it means that all of the vertices are reachable, and so we have a set of edges that spans the graph. It follows from the Fact in §12.3.5 that if we always add the minimally weighted edge that connects two of the disjoint sets of vertices, that edge can always be part of a MST, so the final result must also be a MST. Figure 12.11 illustrates the idea.

For the program, I’ll assume we have a type, UnionFind, representing sets of sets of vertices. We need two operations on this type: an inquiry S.sameSet(v, w) that tells us whether vertices v and w are in the same set in S, and an operation S.union(v, w) that combines the sets containing vertices v and w into one. I’ll also assume a “set of edges” type, EdgeSet, to contain the result.

    Set numbers in the successive panels of Figure 12.11:
    Panel 1:   A 0   B 1   C 2   D 3   E 4   F 5   G 6   H 7
    Panel 2:   A 0   B 1   C 2   D 3   E 4   F 5   G 6   H 6
    Panel 3:   A 0   B 1   C 2   D 3   E 4   F 4   G 6   H 6
    Panel 4:   A 0   B 0   C 2   D 3   E 4   F 4   G 6   H 6
    Panel 5:   A 0   B 0   C 2   D 3   E 2   F 2   G 6   H 6
    Panel 6:   A 0   B 0   C 2   D 3   E 2   F 2   G 2   H 2
    Panel 7:   A 0   B 0   C 0   D 3   E 0   F 0   G 0   H 0
    Panel 8:   A 0   B 0   C 0   D 0   E 0   F 0   G 0   H 0

Figure 12.11: Kruskal’s algorithm. The numbers in the vertices denote sets: vertices with the same number are in the same set. Dashed edges have been added to the MST. This is different from the MST found in Figure 12.9.

    /** Return a subset of edges of G forming a minimum spanning tree for G.
     *  G must be a connected undirected graph. WEIGHT gives edge weights. */
    EdgeSet MST(Graph G, int[] weight)
    {
        UnionFind S;
        EdgeSet E;
        // Initialize S to { {v} | v is a vertex of G };
        S = new UnionFind(G.numVertices());
        E = new EdgeSet();

        For each edge (v,w) in G in order of increasing weight {
            if (!S.sameSet(v, w)) {
                Add (v,w) to E;
                S.union(v, w);
            }
        }
        return E;
    }

The tricky part is this union-find bit. From what you know, you might well guess that each sameSet operation will require time Θ(N lg N) in the worst case (look in each of up to N sets each of size up to N). Interestingly enough, there is a better way. Let’s assume (as in this problem) that the sets contain integers from 0 to N − 1. At any given time, there will be up to N disjoint sets; we’ll give them names (well, numbers really) by selecting a single representative member of each set and using that member (a number between 0 and N − 1) to identify the set. Then, if we can find the current representative member of the set containing any vertex, we can tell if two vertices are in the same set by seeing if their representative members are the same. One way to do this is to represent each disjoint set as a tree of vertices, but with children pointing at parents (you may recall that I said such a structure would eventually be useful). The root of each tree is the representative member, which we may find by following parent links. For example, we can represent the set of sets

    {{1, 2, 4, 6, 7}, {0, 3, 5}, {8, 9, 10}}

with the forest of trees

    [Diagram: a forest of three trees, one per set; the trees for the first two sets have roots 1 and 3.]

We represent all this with a single integer array, parent, with parent[v] containing the number of the parent node of v, or −1 if v has no parent (i.e., is a representative member). The union operation is quite simple: to compute S.union(v, w), we find the roots of the trees containing v and w (by following the parent chain) and then make one of the two roots the child of the other. So, for example, we could compute S.union(6, 0) by finding the representative member for 6 (which is 1), and for 0 (which is 3) and then making 3 point to 1:

    [Diagram: the trees for {1, 2, 4, 6, 7} and {0, 3, 5} joined into one, with 3 now a child of the root 1.]

For best results, we should make the tree of lesser “rank” (roughly, height) point to the one of larger rank.⁵

However, while we’re at it, let’s throw in a twist. After we traverse the paths from 6 up to 1 and from 0 to 3, we’ll re-organize the tree by having every node in those paths point directly at node 1 (in effect “memoizing” the result of the operation of finding the representative member). Thus, after finding the representative member for 6 and 0 and unioning, we will have the following, much flatter tree:

    [Diagram: the combined tree after re-organization, with the nodes on the two traversed paths pointing directly at the root 1.]

This re-arrangement, which is called path compression, causes subsequent inquiries about vertices 6, 4, and 0 to be considerably faster than before. It turns out that with this trick (and the heuristic of making the shallower tree point at the deeper in a union), any sequence of M union and sameSet operations on a set of sets containing a total of N elements can be performed in time O(α(M,N) M). Here, α(M,N) is an inverse of Ackermann’s function. Specifically, α(M,N) is defined as the minimum i such that A(i, ⌊M/N⌋) > lg N, where

$$\begin{aligned}
A(1,j) &= 2^j, && \text{for } j \ge 1,\\
A(i,1) &= A(i-1,\, 2), && \text{for } i \ge 2,\\
A(i,j) &= A(i-1,\, A(i,\, j-1)), && \text{for } i, j \ge 2.
\end{aligned}$$

⁵ We’re cheating a bit in this section to make the effects of the optimizations we describe a bit clearer. We could not have constructed the “stringy” trees in these examples had we always made the lesser-rank tree point to the greater. So in effect, our examples start from union-find trees that were constructed in haphazard fashion, and we show what happens if we start doing things right from then on.

Well, this is all rather complicated, but suffice it to say that A grows monumentally fast, so that α grows with subglacial slowness, and is for all mortal purposes ≤ 4. In short, the amortized cost of M operations (union and sameSet in any combination) is roughly constant per operation. Thus, the time required for Kruskal’s algorithm is dominated by the sorting time for the edges, and is asymptotically O(E lg E), for E the number of edges. This in turn equals O(E lg V) for a connected graph, where V is the number of vertices.
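The whole of this section can be condensed into a short sketch: a union-find structure over the integers 0 to N − 1 using a parent array (with −1 marking representative members), path compression, and union by rank, together with Kruskal’s algorithm on top of it. Several things here are assumptions made for the illustration: edges are given as an array of {v, w, weight} triples, the result is a list of those triples instead of an EdgeSet, and the names KruskalSketch and find are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Kruskal's algorithm with a union-find structure.
 *  (Illustrative sketch; input and output formats are assumptions.) */
final class KruskalSketch {
    /** A partition of the integers 0 .. n-1 into disjoint sets. */
    static final class UnionFind {
        /** parent[v] is the parent of v, or -1 if v is a representative. */
        private final int[] parent;
        /** rank[r] is an upper bound on the height of the tree rooted at r. */
        private final int[] rank;

        UnionFind(int n) {
            parent = new int[n];
            rank = new int[n];
            Arrays.fill(parent, -1);       // each vertex alone in its own set
        }

        /** Returns the representative member of the set containing V,
         *  compressing the path from V to the root along the way. */
        int find(int v) {
            int root = v;
            while (parent[root] != -1) {
                root = parent[root];
            }
            while (v != root) {            // point every node on the path at root
                int next = parent[v];
                parent[v] = root;
                v = next;
            }
            return root;
        }

        boolean sameSet(int v, int w) {
            return find(v) == find(w);
        }

        /** Combines the sets containing V and W; the root of lesser
         *  rank is made a child of the root of greater rank. */
        void union(int v, int w) {
            int a = find(v), b = find(w);
            if (a == b) {
                return;
            }
            if (rank[a] < rank[b]) {
                int t = a; a = b; b = t;
            }
            parent[b] = a;
            if (rank[a] == rank[b]) {
                rank[a] += 1;
            }
        }
    }

    /** Returns the edges of a minimum spanning tree of a connected
     *  undirected graph, where edges[i] = {v, w, weight}. */
    static List<int[]> mst(int numVertices, int[][] edges) {
        int[][] sorted = edges.clone();    // leave the caller's ordering alone
        Arrays.sort(sorted, (x, y) -> Integer.compare(x[2], y[2]));
        UnionFind s = new UnionFind(numVertices);
        List<int[]> result = new ArrayList<>();
        for (int[] e : sorted) {
            if (!s.sameSet(e[0], e[1])) {
                result.add(e);
                s.union(e[0], e[1]);
            }
        }
        return result;
    }
}
```

As the analysis above suggests, the call to Arrays.sort dominates the running time; the union-find operations contribute only a nearly constant amortized cost per edge.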

Exercises

12.1. A borogove and a snark find themselves in a maze of twisty little passages that connect numerous rooms, one of which is the maze exit. The snark, being a boojum, finds borogoves especially tasty after a long day of causing people to softly and silently vanish away. Unfortunately for the snark (and contrariwise for his prospective snack), borogoves can run twice as fast as snarks and have an uncanny ability of finding the shortest route to the exit. Fortunately for the snark, his preternatural senses tell him precisely where the borogove is at any time, and he knows the maze like the back of his, er, talons. If he can arrive at the exit or in any of the rooms in the borogove’s path before the borogove does (strictly before, not at the same time), he can catch it. The borogove is not particularly intelligent, and will always take the shortest path, even if the snark is waiting on it.

Thus, for example, in the following maze, the snark (starting at ‘S’) will dine in the shaded room, which he reaches in 6 time units, and the borogove (starting at ‘B’) in 7. The numbers on the connecting passages indicate distances (the numbers inside rooms are just labels). The snark travels at 0.5 units/hour, and the borogove at 1 unit/hour.

    [Maze diagram: rooms labeled S, B, E, and 3 through 9, joined by passages labeled with distances.]

Write a program to read in a maze such as the above, and print one of two messages: Snark eats, or Borogove escapes, as appropriate. Place your answer in a class Chase (see the templates in ~cs61b/hw/hw7).

The input is as follows.

• A positive integer N ≥ 3 indicating the number of rooms. You may assume that N < 1024. The rooms are assumed to be numbered from 0 to N − 1. Room 0 is always the exit. Initially room 1 contains the borogove and room 2 contains the snark.

• A sequence of edges, each consisting of two room numbers (the order of the room numbers is immaterial) followed by an integer distance.

Assume that whenever the borogove has a choice between passages to take (i.e., all lead to a shortest path), he chooses the one to the room with the lowest number. For the maze shown, a possible input is as follows.

    10
    2 3 2   2 4 2   3 5 4   3 6 1   4 1 3
    5 6 6   5 8 2   5 9 1   6 7 2   6 9 3
    7 0 1   7 9 8   1 8 1
    8 9 7

Index

A* search, 232

AbstractCollectionclass,50–52

AbstractCollectionmethods add,52 iterator,52 size,52 toString,52

AbstractListclass,51–55,60

AbstractListmethods add,55 get,55 listIterator,55,56 remove,55 removeRange,55 set,55 size,55

AbstractList.ListIteratorImplclass,56,57

AbstractList.modCountfield,55

AbstractList.modCountfields modCount,55

AbstractMapclass,58,59

AbstractMapmethods clear,59 containsKey,59 containsValue,59 entrySet,59 equals,59 get,59 hashCode,59 isEmpty,59 keySet,59 put,59 putAll,59 remove,59 size,59 toString,59 values,59

AbstractSequentialListclass,54–58,61

AbstractSequentialListmethods listIterator,58 size,58 acyclicgraph,215 adapterpattern,79 add(AbstractCollection),52 add(AbstractList),55 add(ArrayList),66 add(Collection),30 add(LinkedIter),75 add(List),34 add(ListIterator),25 add(Queue),80 add(Set),32 addAll(Collection),30 addAll(List),34 addFirst(Deque),80 additivegenerator,207 adjacencylist,217 adjacencymatrix,223 adjacentvertex,215 AdjGraphclass,220 admissibledistanceestimate,234 algorithmiccomplexity,5–20 alpha-betapruning,127–129 amortizedcost,16–18,63 ancestor(oftreenode),89

Arrayclass,50

Arraymethods newInstance,50

ArrayDequeclass,84

ArrayListclass,63–65

ArrayListmethods add,66 check,66 ensureCapacity,66


get,65 remove,65 removeRange,66 set,65 size,65

ArrayStackclass,81

asymptoticcomplexity,7–9

averagetime,6

AVLtree,182–184

B-tree,163–170

backtracking,77

biconnectedgraph,216

Big-Ohnotation definition,7

Big-Omeganotation definition,9

Big-Thetanotation definition,9

bin,131

binarysearchtree(BST),105

binarytree,90,91

binary-search-treeproperty,105

BinaryTree,93

BinaryTreemethods left,93 right,93 setLeft,93

setRight,93

binomialcomb,152

breadth-firsttraversal,98,226

BST, see binarysearchtree deletingfrom,109

searching,107

BSTclass,108

BSTmethods find,107 insert,109,111 remove,110

swapSmallest,110

BSTSetclass,113,114

callstack,78

chainedhashtables,131

check(ArrayList),66

child(Tree),93

children(intree),89 circularbuffer,82 Classclass,50 Classmethods getComponentType,50 clear(AbstractMap),59 clear(Collection),30 clear(Map),41 clone(LinkedList),73 codomain,37

Collectionclass,29,30

Collectionhierarchy,27

Collectioninterface,24–28

Collectionmethods add,30 addAll,30 clear,30 contains,29 containsAll,29 isEmpty,29 iterator,29 remove,30 removeAll,30 retainAll,30 size,29 toArray,29

Collectionsclass,160,200,214

Collectionsmethods shuffle,214 sort,160 synchronizedList,200 collision(inhashtable),132

Comparableclass,36

Comparablemethods compareTo,36 comparator(SortedMap),42 comparator(SortedSet),38

Comparatorclass,37

Comparatormethods compare,37 equals,37 compare(Comparator),37 compareTo(Comparable),36 completetree,90,91 complexity,5–20


compressingtables,179

concave,19

concurrency,199–203

ConcurrentModificationExceptionclass,54 connectedcomponent,215

connectedgraph,215

consistencywith .equals,36

consistentdistanceestimate,234 contains(Collection),29

containsAll(Collection),29 containsKey(AbstractMap),59 containsValue(AbstractMap),59

cycleinagraph,215

deadlock,203

degree(Tree),93

degreeofavertex,215

degreeofnode,89

deletingfromaBST,109

depthoftreenode,90

depth-firsttraversal,225

Dequeclass,80

dequedatastructure,76

Dequemethods

addFirst,80

last,80

removeLast,80

descendent(oftreenode),89

designpattern adapter,79

definition,47

Singleton,98

TemplateMethod,47

Visitor,100

digraph, see directedgraph

Digraphclass,218

Dijkstra’salgorithm,230

directedgraph,215

distributioncountingsort,146

domain,37

doublehashing,136

doublelinking,68–71

double-endedqueue,76

edge,89

edge,inagraph,215

edge-setgraphrepresentation,222 enhancedforloop,23 ensureCapacity(ArrayList),66 Entryclass,73 entrySet(AbstractMap),59 entrySet(Map),40 Enumerationclass,22 .equals,consistentwith,36 equals(AbstractMap),59 equals(Comparator),37 equals(Map),40 equals(Map.Entry),41 equals(Set),32 expressiontree,91 externalnode,89 externalpathlength,90 externalsorting,140

FIFOqueue,76 find(BST),107 findExitprocedure,78 first(PriorityQueue),119 first(Queue),80 first(SortedSet),38 firstKey(SortedMap),42 forloop,enhanced,23 forest,90 freetree,216 fulltree,90,91

gametrees,125–129 Gamma,Erich,47 get(AbstractList),55 get(AbstractMap),59 get(ArrayList),65 get(HashMap),134 get(List),33 get(Map),40 getClass(Object),50 getComponentType(Class),50 getKey(Map.Entry),41 getValue(Map.Entry),41 graph

acyclic,215 biconnected,216 breadth-firsttraversal,226


connected,215

depth-firsttraversal,225 directed,215 path,215

traversal,general,225 undirected,215

Graphclass,219

graphs,215–239

hashCode(AbstractMap),59

hashCode(Map),40

hashCode(Map.Entry),41

hashCode(Object),132,137

hashCode(Set),32

hashCode(String),138

hashing,131–138

hashingfunction,131,136–138

HashMapclass,134

HashMapmethods get,134

put,134

hasNext(LinkedIter),74

hasNext(ListIterator),25

hasPrevious(LinkedIter),74

hasPrevious(ListIterator),25

headMap(SortedMap),42

headSet(SortedSet),38

heap,117–125

heightoftree,90

Helm,Richard,47

image,37

in-degree,215

incidentedge,215

indexOf(List),33

indexOf(Queue),80

indexOf(Stack),77

indexOf(StackAdapter),81

inordertraversal,98

insert(BST),109,111

insert(PriorityQueue),119

insertionsort,141

insertionSort,142

internalnode,89

internalpathlength,90

internalsorting,140

InterruptedExceptionclass,202 inversion,140

isEmpty(AbstractMap),59

isEmpty(Collection),29

isEmpty(Map),40

isEmpty(PriorityQueue),119

isEmpty(Queue),80

isEmpty(Stack),77

isEmpty(StackAdapter),81

Iterableclass,23

Iterablemethods

iterator,23

iterativedeepening,127 iterator,22

iterator(AbstractCollection),52

iterator(Collection),29 iterator(Iterable),23

iterator(List),33

Iteratorinterface,22–24

java.langclasses

Class,50

Comparable,36

InterruptedException,202 Iterable,23

java.lang.reflectclasses

Array,50

java.utilclasses

AbstractCollection,50–52

AbstractList,51–55,60

AbstractList.ListIteratorImpl,56,57

AbstractMap,58,59

AbstractSequentialList,54–58,61

ArrayList,63–65

Collection,29,30

Collections,160,200,214

Comparator,37

ConcurrentModificationException,54

Enumeration,22

HashMap,134

LinkedList,73

List,31,33,34

ListIterator,25

Map,39–41

Map.Entry,41


Random,207,209

Set,31–32

SortedMap,39,42

SortedSet,37,38,111–113

Stack,76

UnsupportedOperationException,28

java.utilinterfaces

Collection,24–28

Iterator,22–24

ListIterator,24

java.util.LinkedListclasses

Entry,73

LinkedIter,73,74

Johnson,Ralph,47

key,105

key,insorting,139

keySet(AbstractMap),59

keySet(Map),40

Kruskal’salgorithm,235

label (Tree), 93

last (Deque), 80

last (SortedSet), 38

lastIndexOf (List), 33

lastIndexOf (Queue), 80

lastKey (SortedMap), 42

leaf node, 89

left (BinaryTree), 93

level of tree node, 90

LIFO queue, 76

linear congruential generator, 205–207

linear probes, 132

link, 67

linked structure, 67

LinkedIter class, 73, 74

LinkedIter methods
    add, 75
    hasNext, 74
    hasPrevious, 74
    next, 74
    nextIndex, 75
    previous, 74
    previousIndex, 75
    remove, 75
    set, 75

LinkedList class, 73

LinkedList methods
    clone, 73
    listIterator, 73

List class, 31, 33, 34

List methods
    add, 34
    addAll, 34
    get, 33
    indexOf, 33
    iterator, 33
    lastIndexOf, 33
    listIterator, 33
    remove, 34
    set, 34
    subList, 33

listIterator (AbstractList), 55, 56

listIterator (AbstractSequentialList), 58

listIterator (LinkedList), 73

listIterator (List), 33

ListIterator class, 25

ListIterator interface, 24

ListIterator methods
    add, 25
    hasNext, 25
    hasPrevious, 25
    next, 25
    nextIndex, 25
    previous, 25
    previousIndex, 25
    remove, 25
    set, 25

Little-oh notation
    definition, 9

logarithm, properties of, 19

Lomuto, Nico, 150

LSD-first radix sorting, 157

Map class, 39–41

Map hierarchy, 26

Map methods
    clear, 41
    entrySet, 40
    equals, 40
    get, 40
    hashCode, 40
    isEmpty, 40
    keySet, 40
    put, 41
    putAll, 41
    remove, 41
    size, 40
    values, 40

Map.Entry class, 41

Map.Entry methods
    equals, 41
    getKey, 41
    getValue, 41
    hashCode, 41
    setValue, 41

mapping, 37

marking vertices, 224

merge sorting, 151

message-passing, 203

minimax algorithm, 126

minimum spanning tree, 227, 235

mod, 133

modCount (field), 55

monitor, 201–203

MSD-first radix sorting, 157

mutual exclusion, 200

natural ordering, 36

newInstance (Array), 50

next (LinkedIter), 74

next (ListIterator), 25

nextIndex (LinkedIter), 75

nextIndex (ListIterator), 25

nextInt (Random), 209

node of tree, 89

node, in a graph, 215

non-terminal node, 89

null tree representation, 97

numChildren (Tree), 93

O( ), see Big-Oh notation

o( ), see Little-oh notation

Object methods
    getClass, 50
    hashCode, 132, 137

Ω( ), see Big-Omega notation

open-address hash table, 132–136

order notation, 7–9

ordered tree, 89, 90

ordering, natural, 36

ordering, total, 36

orthogonal range query, 113

out-degree, 215

parent (Tree), 94

partitioning (for quicksort), 150

path compression, 238

path in a graph, 215

path length in tree, 90

performance
    of AbstractList, 60
    of AbstractSequentialList, 61

point quadtree, 117

point-region quadtree, 117

pop (Stack), 77

pop (StackAdapter), 81

positional tree, 90

postorder traversal, 98

potential method, 17–18

PR quadtree, 117

preorder traversal, 98

previous (LinkedIter), 74

previous (ListIterator), 25

previousIndex (LinkedIter), 75

previousIndex (ListIterator), 25

Prim’s algorithm, 229

primary key, 139

priority queue, 117–125

PriorityQueue class, 119

PriorityQueue methods
    first, 119
    insert, 119
    isEmpty, 119
    removeFirst, 119

proper ancestor, 89

proper descendent, 89

proper subtree, 89

protected constructor, use of, 51

protected methods, use of, 51

pseudo-random number generators, 205–214
    additive, 207
    arbitrary ranges, 208
    linear congruential, 205–207
    non-uniform, 209–212

push (Stack), 77

push (StackAdapter), 81

put (AbstractMap), 59

put (HashMap), 134

put (Map), 41

putAll (AbstractMap), 59

putAll (Map), 41

quadtree, 117

Queue class, 80

queue data type, 76

Queue methods
    add, 80
    first, 80
    indexOf, 80
    isEmpty, 80
    lastIndexOf, 80
    removeFirst, 80
    size, 80

quicksort, 149

radix sorting, 156

“random” number generation, see pseudo-random number generators

random access, 51

Random class, 207, 209

Random methods
    nextInt, 209

random sequences, 212

range, 37

range queries, 111

range query, orthogonal, 113

reachable vertex, 215

record, in sorting, 139

recursion
    and stacks, 77

Red-black tree, 170

reflection, 50

reflexive edge, 215

region quadtree, 117

remove (AbstractList), 55

remove (AbstractMap), 59

remove (ArrayList), 65

remove (BST), 110

remove (Collection), 30

remove (LinkedIter), 75

remove (List), 34

remove (ListIterator), 25

remove (Map), 41

removeAll (Collection), 30

removeFirst (PriorityQueue), 119

removeFirst (Queue), 80

removeLast (Deque), 80

removeRange (AbstractList), 55

removeRange (ArrayList), 66

removing from a BST, 109

retainAll (Collection), 30

right (BinaryTree), 93

root node, 89

rooted tree, 89

rotation of a tree, 179

searching a BST, 107

secondary key, 139

selection, 160–161

selection sort, 146

sentinel node, 68, 70

set (AbstractList), 55

set (ArrayList), 65

set (LinkedIter), 75

set (List), 34

set (ListIterator), 25

Set class, 31–32

Set methods
    add, 32
    equals, 32
    hashCode, 32

setChild (Tree), 93

setLeft (BinaryTree), 93

setParent (Tree), 94

setRight (BinaryTree), 93

setValue (Map.Entry), 41

Shell’s sort (shellsort), 141

shortest path
    single destination, 232
    single-source, all paths, 230
    tree, 230


shuffle (Collections), 214

single linking, 67–68

Singleton pattern, 98

size (AbstractCollection), 52

size (AbstractList), 55

size (AbstractMap), 59

size (AbstractSequentialList), 58

size (ArrayList), 65

size (Collection), 29

size (Map), 40

size (Queue), 80

size (Stack), 77

size (StackAdapter), 81

skip list, 193–196

sort (Collections), 160

SortedMap class, 39, 42

SortedMap methods
    comparator, 42
    firstKey, 42
    headMap, 42
    lastKey, 42
    subMap, 42
    tailMap, 42

SortedSet class, 37, 38, 111–113

SortedSet methods
    comparator, 38
    first, 38
    headSet, 38
    last, 38
    subSet, 38
    tailSet, 38

sorting, 139–160
    distribution counting, 146
    exchange, 149
    insertion, 141
    merge, 151
    quicksort, 149
    radix, 156
    Shell’s, 141
    straight selection, 146

sparse arrays, 179

splay tree, 184–193

splayFind, 188

stable sort, 139

stack
    and recursion, 77

Stack class, 76, 77, 79

stack data type, 76

Stack methods
    indexOf, 77
    isEmpty, 77
    pop, 77
    push, 77
    size, 77
    top, 77

StackAdapter class, 79, 81

StackAdapter methods
    indexOf, 81
    isEmpty, 81
    pop, 81
    push, 81
    size, 81
    top, 81

static valuation, 127

straight insertion sort, 141

straight selection sort, 146

String methods
    hashCode, 138

structural modification, 35

subgraph, 215

subList (List), 33

subMap (SortedMap), 42

subSet (SortedSet), 38

subtree, 89

swapSmallest (BST), 110

symmetric traversal, 98

synchronization, 199–203

synchronized keyword, 200

synchronizedList (Collections), 200

tailMap (SortedMap), 42

tailSet (SortedSet), 38

Template Method pattern, 47

terminal node, 89

Θ(·), see Big-Theta notation

thread-safe, 200

threads, 199–203

toArray (Collection), 29

top (Stack), 77

top (StackAdapter), 81


topological sorting, 226

toString (AbstractCollection), 52

toString (AbstractMap), 59

total ordering, 36

traversal of tree, 98–99

traversing an edge, 89

Tree, 93

tree, 89–103
    array representation, 96–97
    balanced, 163
    binary, 90, 91
    complete, 90, 91
    edge, 89
    free, 216
    full, 90, 91
    height, 90
    leaf-up representation, 95
    node, 89
    ordered, 89, 90
    positional, 90
    root, 89
    root-down representation, 94
    rooted, 89
    rotation, 179
    traversal, 98–99

Tree methods
    child, 93
    degree, 93
    label, 93
    numChildren, 93
    parent, 94
    setChild, 93
    setParent, 94

tree node, 89

Trie, 170–179

ucb.util classes
    AdjGraph, 220
    ArrayDeque, 84
    ArrayStack, 81
    BSTSet, 114
    Deque, 80
    Digraph, 218
    Graph, 219
    Queue, 80
    Stack, 77, 79
    StackAdapter, 79, 81

unbalanced search tree, 111

undirected graph, 215

union-find algorithm, 237–239

UnsupportedOperationException class, 28

values (AbstractMap), 59

values (Map), 40

vertex, in a graph, 215

views, 31

visiting a node, 98

Visitor pattern, 100

Vlissides, John, 47

worst-case time, 5
