
Database Systems

The Complete Book

Second Edition

Garcia-Molina Ullman Widom

Pearson Education Limited

Edinburgh Gate

Harlow

Essex CM20 2JE

England and Associated Companies throughout the world

Visit us on the World Wide Web at: www.pearsoned.co.uk

© Pearson Education Limited 2014

ISBN 10: 1-292-02447-X

ISBN 13: 978-1-292-02447-9

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

Printed in the United States of America

1. The Worlds of Database Systems

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

2. The Relational Model of Data

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

3. Design Theory for Relational Databases

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

4. High-Level Database Models

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

5. Algebraic and Logical Query Languages

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

6. The Database Language SQL

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

7. Constraints and Triggers

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

8. Views and Indexes

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

9. SQL in a Server Environment

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

10. Advanced Topics in Relational Databases

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

11

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

12. Programming Languages for XML

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

13. Secondary Storage Management

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

15. Query Execution

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

16. The Query Compiler

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

17. Coping With System Failures

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

18. Concurrency Control

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

19. More About Transaction Management

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

20. Parallel and Distributed Databases

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

21. Information Integration

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

22. Database Systems and the Internet

Hector Garcia-Molina/Jeffrey Ullman/Jennifer Widom

The Worlds of Database Systems

Databases today are essential to every business. Whenever you visit a major Web site (Google, Yahoo!, Amazon.com, or thousands of smaller sites that provide information), there is a database behind the scenes serving up the information you request. Corporations maintain all their important records in databases. Databases are likewise found at the core of many scientific investigations. They represent the data gathered by astronomers, by investigators of the human genome, and by biochemists exploring properties of proteins, among many other scientific activities.

The power of databases comes from a body of knowledge and technology that has developed over several decades and is embodied in specialized software called a database management system, or DBMS, or more colloquially a “database system.” A DBMS is a powerful tool for creating and managing large amounts of data efficiently and allowing it to persist over long periods of time, safely. These systems are among the most complex types of software available.

1 The Evolution of Database Systems

What is a database? In essence a database is nothing more than a collection of information that exists over a long period of time, often many years. In common parlance, the term database refers to a collection of data that is managed by a DBMS. The DBMS is expected to:

1. Allow users to create new databases and specify their schemas (logical structure of the data), using a specialized data-definition language.

2. Give users the ability to query the data (a “query” is database lingo for a question about the data) and modify the data, using an appropriate language, often called a query language or data-manipulation language.

3. Support the storage of very large amounts of data (many terabytes or more) over a long period of time, allowing efficient access to the data for queries and database modifications.

4. Enable durability, the recovery of the database in the face of failures, errors of many kinds, or intentional misuse.

5. Control access to data from many users at once, without allowing unexpected interactions among users (called isolation) and without actions on the data to be performed partially but not completely (called atomicity).

1.1 Early Database Management Systems

The first commercial database management systems appeared in the late 1960’s. These systems evolved from file systems, which provide some of item (3) above; file systems store data over a long period of time, and they allow the storage of large amounts of data. However, file systems do not generally guarantee that data cannot be lost if it is not backed up, and they don’t support efficient access to data items whose location in a particular file is not known.

Further, file systems do not directly support item (2), a query language for the data in files. Their support for (1), a schema for the data, is limited to the creation of directory structures for files. Item (4) is not always supported by file systems; you can lose data that has not been backed up. Finally, file systems do not satisfy (5). While they allow concurrent access to files by several users or processes, a file system generally will not prevent situations such as two users modifying the same file at about the same time, so the changes made by one user fail to appear in the file.
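The lost-change situation just described can be made concrete with a small simulation. This is only a sketch under stated assumptions: the "file" is a Python dictionary, the two users and the account balance are hypothetical, and the interleaving is written out by hand rather than produced by real concurrent processes.

```python
def lost_update_demo():
    """Simulate two users doing an unsynchronized read-modify-write on one 'file'."""
    shared_file = {"balance": 100}   # hypothetical stand-in for a file's contents

    # Both users read the file at about the same time.
    copy_a = dict(shared_file)
    copy_b = dict(shared_file)

    # Each user edits a private copy.
    copy_a["balance"] += 50          # user A records a deposit of 50
    copy_b["balance"] -= 30          # user B records a withdrawal of 30

    # Each writes the whole file back; the later write overwrites the earlier one.
    shared_file = copy_a
    shared_file = copy_b
    return shared_file["balance"]


result = lost_update_demo()
print(result)   # 70: user A's deposit is lost; running the two edits one after the other gives 120
```

Nothing in the file abstraction itself detects that user A's change was overwritten, which is the gap that item (5) asks a DBMS to close.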

The first important applications of DBMS’s were ones where data was composed of many small items, and many queries or modifications were made. Examples of these applications are:

1. Banking systems: maintaining accounts and making sure that system failures do not cause money to disappear.

2. Airline reservation systems: these, like banking systems, require assurance that data will not be lost, and they must accept very large volumes of small actions by customers.

3. Corporate record keeping: employment and tax records, inventories, sales records, and a great variety of other types of information, much of it critical.

The early DBMS’s required the programmer to visualize data much as it was stored. These database systems used several different data models for describing the structure of the information in a database, chief among them the “hierarchical” or tree-based model and the graph-based “network” model. The latter was standardized in the late 1960’s through a report of CODASYL (Committee on Data Systems and Languages).¹

A problem with these early models and systems was that they did not support high-level query languages. For example, the CODASYL query language had statements that allowed the user to jump from data element to data element, through a graph of pointers among these elements. There was considerable effort needed to write such programs, even for very simple queries.

1.2 Relational Database Systems

Following a famous paper written by Ted Codd in 1970,² database systems changed significantly. Codd proposed that database systems should present the user with a view of data organized as tables called relations. Behind the scenes, there might be a complex data structure that allowed rapid response to a variety of queries. But, unlike the programmers for earlier database systems, the programmer of a relational system would not be concerned with the storage structure. Queries could be expressed in a very high-level language, which greatly increased the efficiency of database programmers. SQL (“Structured Query Language”) is the most important query language based on the relational model.

By 1990, relational database systems were the norm. Yet the database field continues to evolve, and new issues and approaches to the management of data surface regularly. Object-oriented features have infiltrated the relational model. Some of the largest databases are organized rather differently from those using relational methodology. In the balance of this section, we shall consider some of the modern trends in database systems.

1.3 Smaller and Smaller Systems

Originally, DBMS’s were large, expensive software systems running on large computers. The size was necessary, because to store a gigabyte of data required a large computer system. Today, hundreds of gigabytes fit on a single disk, and it is quite feasible to run a DBMS on a personal computer. Thus, database systems based on the relational model have become available for even very small machines, and they are beginning to appear as a common tool for computer applications, much as spreadsheets and word processors did before them.

Another important trend is the use of documents, often tagged using XML (eXtensible Markup Language). Large collections of small documents can serve as a database, and the methods of querying and manipulating them are different from those used in relational systems.

¹ CODASYL Data Base Task Group April 1971 Report, ACM, New York.

² Codd, E. F., “A relational model for large shared data banks,” Comm. ACM, 13:6, pp. 377–387, 1970.

1.4 Bigger and Bigger Systems

On the other hand, a gigabyte is not that much data any more. Corporate databases routinely store terabytes (10^12 bytes). Yet there are many databases that store petabytes (10^15 bytes) of data and serve it all to users. Some important examples:

1. Google holds petabytes of data gleaned from its crawl of the Web. This data is not held in a traditional DBMS, but in specialized structures optimized for search-engine queries.

2. Satellites send down petabytes of information for storage in specialized systems.

3. A picture is actually worth way more than a thousand words. You can store 1000 words in five or six thousand bytes. Storing a picture typically takes much more space. Repositories such as Flickr store millions of pictures and support search of those pictures. Even a database like Amazon’s has millions of pictures of products to serve.

4. And if still pictures consume space, movies consume much more. An hour of video requires at least a gigabyte. Sites such as YouTube hold hundreds of thousands, or millions, of movies and make them available easily.

5. Peer-to-peer file-sharing systems use large networks of conventional computers to store and distribute data of various kinds. Although each node in the network may only store a few hundred gigabytes, together the database they embody is enormous.

1.5 Information Integration

To a great extent, the old problem of building and maintaining databases has become one of information integration: joining the information contained in many related databases into a whole. For example, a large company has many divisions. Each division may have built its own database of products or employee records independently of other divisions. Perhaps some of these divisions used to be independent companies, which naturally had their own way of doing things. These divisions may use different DBMS’s and different structures for information. They may use different terms to mean the same thing or the same term to mean different things. To make matters worse, the existence of legacy applications using each of these databases makes it almost impossible to scrap them, ever.

As a result, it has become necessary with increasing frequency to build structures on top of existing databases, with the goal of integrating the information distributed among them. One popular approach is the creation of data warehouses, where information from many legacy databases is copied periodically, with the appropriate translation, to a central database. Another approach is the implementation of a mediator, or “middleware,” whose function is to support an integrated model of the data of the various databases, while translating between this model and the actual models used by each database.
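The difference between the two approaches can be sketched in a few lines. Everything here is hypothetical and chosen only for illustration: the two division "databases" are Python lists, their field names (emp_name, staff, and so on) stand for the different terms divisions might use for the same thing, and the integrated model is a dictionary with the keys name and salary.

```python
# Two hypothetical division databases that describe employees in different terms.
DIVISION_A = [{"emp_name": "Ann", "salary_usd": 50000}]
DIVISION_B = [{"staff": "Bob", "pay_cents": 6000000}]


def from_a(rec):
    """Translate a division-A record into the integrated model."""
    return {"name": rec["emp_name"], "salary": rec["salary_usd"]}


def from_b(rec):
    """Translate a division-B record (pay in cents) into the integrated model."""
    return {"name": rec["staff"], "salary": rec["pay_cents"] // 100}


SOURCES = [(DIVISION_A, from_a), (DIVISION_B, from_b)]


def mediator_query(min_salary):
    """Mediator style: leave data in place and translate each source at query time."""
    results = []
    for records, translate in SOURCES:
        for rec in records:
            unified = translate(rec)
            if unified["salary"] >= min_salary:
                results.append(unified)
    return results


def build_warehouse():
    """Warehouse style: periodically copy everything, translated, into one central store."""
    return [translate(rec) for records, translate in SOURCES for rec in records]


print(mediator_query(55000))
print(build_warehouse())
```

In both styles the per-source translation functions carry the real work; the choice is whether translation happens when data is copied or when a query is asked.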

2 Overview of a Database Management System

In Fig. 1 we see an outline of a complete DBMS. Single boxes represent system components, while double boxes represent in-memory data structures. The solid lines indicate control and data flow, while dashed lines indicate data flow only. Since the diagram is complicated, we shall consider the details in several stages. First, at the top, we suggest that there are two distinct sources of commands to the DBMS:

1. Conventional users and application programs that ask for data or modify data.

2. A database administrator: a person or persons responsible for the structure or schema of the database.

2.1 Data-Definition Language Commands

The second kind of command is the simpler to process, and we show its trail beginning at the upper right side of Fig. 1. For example, the database administrator, or DBA, for a university registrar’s database might decide that there should be a table or relation with columns for a student, a course the student has taken, and a grade for that student in that course. The DBA might also decide that the only allowable grades are A, B, C, D, and F. This structure and constraint information is all part of the schema of the database. It is shown in Fig. 1 as entered by the DBA, who needs special authority to execute schema-altering commands, since these can have profound effects on the database. These schema-altering data-definition language (DDL) commands are parsed by a DDL processor and passed to the execution engine, which then goes through the index/file/record manager to alter the metadata, that is, the schema information for the database.
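The registrar example can be tried out with a small script. This is a sketch, not the book's own code: it assumes SQLite through Python's standard sqlite3 module, and the table name Enrollment, the column names, and the sample students are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A DDL command: it defines structure (three columns) and a constraint
# (the only allowable grades are A, B, C, D, and F).
conn.execute("""
    CREATE TABLE Enrollment (
        student TEXT NOT NULL,
        course  TEXT NOT NULL,
        grade   TEXT NOT NULL CHECK (grade IN ('A', 'B', 'C', 'D', 'F'))
    )
""")


def try_insert(student, course, grade):
    """Return True if the row is accepted, False if it violates the schema's constraint."""
    try:
        conn.execute("INSERT INTO Enrollment VALUES (?, ?, ?)", (student, course, grade))
        return True
    except sqlite3.IntegrityError:
        return False


# The schema itself is stored as metadata, which SQLite exposes in sqlite_master.
schema_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'Enrollment'"
).fetchone()[0]

print(try_insert("Sally", "CS101", "A"))   # accepted
print(try_insert("Jim", "CS101", "E"))     # rejected: 'E' is not an allowable grade
print(schema_sql)
```

The last query shows the point made above: the structure and the constraint are themselves data that the system keeps and consults.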

2.2 Overview of Query Processing

The great majority of interactions with the DBMS follow the path on the left side of Fig. 1. A user or an application program initiates some action, using the data-manipulation language (DML). This command does not affect the schema of the database, but may affect the content of the database (if the action is a modification command) or will extract data from the database (if the action is a query). DML statements are handled by two separate subsystems, as follows.

[Figure 1: Database management system components. Diagram not reproduced. Components shown: user/application, database administrator, query compiler, DDL compiler, execution engine, index/file/record manager, buffer manager, buffers, storage manager, storage, transaction manager, logging and recovery, concurrency control, lock table. Labeled flows: queries, updates; query plan; index, file, and record requests; page commands; read/write pages; data, metadata, indexes.]

Answering the Query

The query is parsed and optimized by a query compiler. The resulting query plan, or sequence of actions the DBMS will perform to answer the query, is passed to the execution engine. The execution engine issues a sequence of requests for small pieces of data, typically records or tuples of a relation, to a resource manager that knows about data files (holding relations), the format and size of records in those files, and index files, which help find elements of data files quickly.

The requests for data are passed to the buffer manager. The buffer manager’s task is to bring appropriate portions of the data from secondary storage (disk) where it is kept permanently, to the main-memory buffers. Normally, the page or “disk block” is the unit of transfer between buffers and disk.

The buffer manager communicates with a storage manager to get data from disk. The storage manager might involve operating-system commands, but more typically, the DBMS issues commands directly to the disk controller.

Transaction Processing

Queries and other DML actions are grouped into transactions, which are units that must be executed atomically and in isolation from one another. Any query or modification action can be a transaction by itself. In addition, the execution of transactions must be durable, meaning that the effect of any completed transaction must be preserved even if the system fails in some way right after completion of the transaction. We divide the transaction processor into two major parts:

1. A concurrency-control manager, or scheduler, responsible for assuring atomicity and isolation of transactions, and

2. A logging and recovery manager, responsible for the durability of transactions.
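To give a feel for what a logging and recovery manager does, here is a minimal sketch of one possible policy. It is an assumption made for illustration, not a description of how any particular DBMS works: the log format, the function name recover, and the choice to undo the changes of uncommitted transactions are all hypothetical.

```python
def recover(log, disk):
    """Restore a consistent state after a crash by undoing uncommitted changes.

    log:  list of records, either ("update", txn, key, old_value) or ("commit", txn),
          in the order they were written.
    disk: dict holding the database state found on disk after the crash.
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = dict(disk)
    # Scan the log backwards so that the earliest old value is restored last.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old_value = rec
            state[key] = old_value
    return state


log = [
    ("update", "T1", "x", 10),   # T1 changed x; its old value was 10
    ("commit", "T1"),
    ("update", "T2", "y", 20),   # T2 changed y; the system crashed before T2 committed
]
disk_after_crash = {"x": 15, "y": 99}

print(recover(log, disk_after_crash))   # {'x': 15, 'y': 20}
```

The completed transaction T1 keeps its effect, while the half-finished T2 leaves no trace, which is the combination of durability and atomicity described above.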

2.3 Storage and Buffer Management

The data of a database normally resides in secondary storage; in today’s computer systems “secondary storage” generally means magnetic disk. However, to perform any useful operation on data, that data must be in main memory. It is the job of the storage manager to control the placement of data on disk and its movement between disk and main memory.

In a simple database system, the storage manager might be nothing more than the file system of the underlying operating system. However, for efficiency purposes, DBMS’s normally control storage on the disk directly, at least under some circumstances. The storage manager keeps track of the location of files on the disk and obtains the block or blocks containing a file on request from the buffer manager.

The buffer manager is responsible for partitioning the available main memory into buffers, which are page-sized regions into which disk blocks can be transferred. Thus, all DBMS components that need information from the disk will interact with the buffers and the buffer manager, either directly or through the execution engine. The kinds of information that various components may need include:

1. Data: the contents of the database itself.

2. Metadata: the database schema that describes the structure of, and constraints on, the database.

3. Log Records: information about recent changes to the database; these support durability of the database.

4. Statistics: information gathered and stored by the DBMS about data properties such as the sizes of, and values in, various relations or other components of the database.

5. Indexes: data structures that support efficient access to the data.
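The role of the buffer manager can be illustrated with a toy buffer pool. This sketch rests on several assumptions that the text does not make: the "disk" is a dictionary of blocks, the class name BufferManager and its method get_block are hypothetical, the replacement policy is least-recently-used, and writing modified blocks back to disk is left out.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer pool: a fixed number of page-sized buffers with LRU replacement."""

    def __init__(self, disk, num_buffers):
        self.disk = disk                 # dict: block number -> block contents
        self.num_buffers = num_buffers
        self.buffers = OrderedDict()     # block number -> contents, least recently used first
        self.disk_reads = 0

    def get_block(self, block_no):
        """Return the block, reading it from disk only if it is not already buffered."""
        if block_no in self.buffers:
            self.buffers.move_to_end(block_no)   # mark as most recently used
            return self.buffers[block_no]
        if len(self.buffers) >= self.num_buffers:
            self.buffers.popitem(last=False)     # evict the least recently used block
        self.disk_reads += 1
        self.buffers[block_no] = self.disk[block_no]
        return self.buffers[block_no]


disk = {n: f"block-{n}" for n in range(5)}
bm = BufferManager(disk, num_buffers=2)
for n in [0, 1, 0, 2, 0, 1]:
    bm.get_block(n)

print(bm.disk_reads)   # 4: the two repeated requests for block 0 were served from memory
```

Every component that needs data, metadata, log records, statistics, or indexes would go through a call like get_block rather than touching the disk itself.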

2.4 Transaction Processing

It is normal to group one or more database operations into a transaction, which is a unit of work that must be executed atomically and in apparent isolation from other transactions. In addition, a DBMS offers the guarantee of durability: that the work of a completed transaction will never be lost. The transaction manager therefore accepts transaction commands from an application, which tell the transaction manager when transactions begin and end, as well as information about the expectations of the application (some may not wish to require atomicity, for example). The transaction processor performs the following tasks:

1. Logging: In order to assure durability, every change in the database is logged separately on disk. The log manager follows one of several policies designed to assure that no matter when a system failure or “crash” occurs, a recovery manager will be able to examine the log of changes and restore the database to some consistent state. The log manager initially writes the log in buffers and negotiates with the buffer manager to make sure that buffers are written to disk (where data can survive a crash) at appropriate times.

2. Concurrency control: Transactions must appear to execute in isolation. But in most systems, there will in truth be many transactions executing at once. Thus, the scheduler (concurrency-control manager) must assure that the individual actions of multiple transactions are executed in such an order that the net effect is the same as if the transactions had in fact executed in their entirety, one-at-a-time. A typical scheduler does its work by maintaining locks on certain pieces of the database. These locks prevent two transactions from accessing the same piece of data in ways that interact badly. Locks are generally stored in a main-memory lock table, as suggested by Fig. 1. The scheduler affects the execution of queries and other database operations by forbidding the execution engine from accessing locked parts of the database.

3. Deadlock resolution: As transactions compete for resources through the locks that the scheduler grants, they can get into a situation where none can proceed because each needs something another transaction has. The transaction manager has the responsibility to intervene and cancel (“rollback” or “abort”) one or more transactions to let the others proceed.

The ACID Properties of Transactions

Properly implemented transactions are commonly said to meet the “ACID test,” where:

• “A” stands for “atomicity,” the all-or-nothing execution of transactions.

• “I” stands for “isolation,” the fact that each transaction must appear to be executed as if no other transaction is executing at the same time.

• “D” stands for “durability,” the condition that the effect on the database of a transaction must never be lost, once the transaction has completed.

The remaining letter, “C,” stands for “consistency.” That is, all databases have consistency constraints, or expectations about relationships among data elements (e.g., account balances may not be negative after a transaction finishes). Transactions are expected to preserve the consistency of the database.
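Atomicity and consistency can be observed directly in a small experiment. The sketch below assumes SQLite through Python's standard sqlite3 module; the Accounts table, the account names, and the function transfer are hypothetical, and the rule that balances may not be negative follows the example of a consistency constraint given in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Accounts VALUES ('alice', 100)")
conn.execute("INSERT INTO Accounts VALUES ('bob', 50)")
conn.commit()


def transfer(src, dst, amount):
    """Move money as one transaction: both updates take effect, or neither does."""
    try:
        with conn:   # commits if the block succeeds, rolls back if an exception escapes
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM Accounts"))


print(transfer("alice", "bob", 500))   # False: the debit would make alice's balance negative
print(balances())                      # unchanged; the credit to bob was rolled back too
print(transfer("alice", "bob", 30))    # True
print(balances())
```

In the failed transfer the credit to bob had already been applied when the debit was rejected, so the unchanged balances show the rollback undoing a partially executed transaction.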

2.5 The Query Processor

The portion of the DBMS that most affects the performance that the user sees is the query processor. In Fig. 1 the query processor is represented by two components:


1. The query compiler, which translates the query into an internal form called a query plan. The latter is a sequence of operations to be performed on the data. Often the operations in a query plan are implementations of “relational algebra” operations. The query compiler consists of three major units:

(a) A query parser, which builds a tree structure from the textual form of the query.

(b) A query preprocessor, which performs semantic checks on the query (e.g., making sure all relations mentioned by the query actually exist), and performs some tree transformations to turn the parse tree into a tree of algebraic operators representing the initial query plan.

(c) A query optimizer, which transforms the initial query plan into the best available sequence of operations on the actual data.

The query compiler uses metadata and statistics about the data to decide which sequence of operations is likely to be the fastest. For example, the existence of an index, which is a specialized data structure that facilitates access to data, given values for one or more components of that data, can make one plan much faster than another.

2. The execution engine, which has the responsibility for executing each of the steps in the chosen query plan. The execution engine interacts with most of the other components of the DBMS, either directly or through the buffers. It must get the data from the database into buffers in order to manipulate that data. It needs to interact with the scheduler to avoid accessing data that is locked, and with the log manager to make sure that all database changes are properly logged.
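One way to watch a query compiler change its plan when an index appears is to ask a real system for its plan. The sketch below assumes SQLite through Python's standard sqlite3 module and its EXPLAIN QUERY PLAN statement; the small Movies table, the index name idx_genre, and the helper plan are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (title TEXT, year INTEGER, length INTEGER, genre TEXT)")
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?, ?, ?)",
    [
        ("Gone With the Wind", 1939, 231, "drama"),
        ("Star Wars", 1977, 124, "sciFi"),
        ("Wayne's World", 1992, 95, "comedy"),
    ],
)

QUERY = "SELECT title FROM Movies WHERE genre = 'comedy'"


def plan(sql):
    """Return SQLite's description of the plan it chose for the given query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)   # the last column is the readable detail


plan_without_index = plan(QUERY)                 # a scan of the whole table
conn.execute("CREATE INDEX idx_genre ON Movies(genre)")
plan_with_index = plan(QUERY)                    # a search that uses idx_genre

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both cases; only the metadata available to the compiler changed, and with it the chosen sequence of operations.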

3 References

Today, on-line searchable bibliographies cover essentially all recent papers concerning database systems. Thus, we shall not try to be exhaustive in our citations, but rather shall mention only the papers of historical importance and major secondary sources or useful surveys. A searchable index of database research papers was constructed by Michael Ley [5], and has recently been expanded to include references from many fields. Alf-Christian Achilles maintains a searchable directory of many indexes relevant to the database field [3].

While many prototype implementations of database systems contributed to the technology of the field, two of the most widely known are the System R project at IBM Almaden Research Center [4] and the INGRES project at Berkeley [7]. Each was an early relational system and helped establish this type of system as the dominant database technology. Many of the research papers that shaped the database field are found in [6].

The 2003 “Lowell report” [1] is the most recent in a series of reports on database-system research and directions. It also has references to earlier reports of this type.

You can find more about the theory of database systems than is covered here from [2] and [8].

1. S. Abiteboul et al., “The Lowell database research self-assessment,” Comm. ACM 48:5 (2005), pp. 111–118. http://research.microsoft.com/~gray/lowell/LowellDatabaseResearchSelfAssessment.htm

2. S. Abiteboul, R. Hull, and V. Vianu, Foundations of Databases, Addison-Wesley, Reading, MA, 1995.

3. http://liinwww.ira.uka.de/bibliography/Database

4. M. M. Astrahan et al., “System R: a relational approach to database management,” ACM Trans. on Database Systems 1:2, pp. 97–137, 1976.

5. http://www.informatik.uni-trier.de/~ley/db/index.html. A mirror site is found at http://www.acm.org/sigmod/dblp/db/index.html

6. M. Stonebraker and J. M. Hellerstein (eds.), Readings in Database Systems, Morgan-Kaufmann, San Francisco, 1998.

7. M. Stonebraker, E. Wong, P. Kreps, and G. Held, “The design and implementation of INGRES,” ACM Trans. on Database Systems 1:3, pp. 189–222, 1976.

8. J. D. Ullman, Principles of Database and Knowledge-Base Systems, Volumes I and II, Computer Science Press, New York, 1988, 1989.

The Relational Model of Data

This chapter introduces the most important model of data: the two-dimensional table, or “relation.” We begin with an overview of data models in general. We give the basic terminology for relations and show how the model can be used to represent typical forms of data. We then introduce a portion of the language SQL, namely the part used to declare relations and their structure. The chapter closes with an introduction to relational algebra. We see how this notation serves as both a query language (the aspect of a data model that enables us to ask questions about the data) and as a constraint language (the aspect of a data model that lets us restrict the data in the database in various ways).

1 An Overview of Data Models

The notion of a “data model” is one of the most fundamental in the study of database systems. In this brief summary of the concept, we define some basic terminology and mention the most important data models.

1.1 What is a Data Model?

A data model is a notation for describing data or information. The description generally consists of three parts:

1. Structure of the data. You may be familiar with tools in programming languages such as C or Java for describing the structure of the data used by a program: arrays and structures (“structs”) or objects, for example. The data structures used to implement data in the computer are sometimes referred to, in discussions of database systems, as a physical data model, although in fact they are far removed from the gates and electrons that truly serve as the physical implementation of the data. In the database world, data models are at a somewhat higher level than data structures, and are sometimes referred to as a conceptual model to emphasize the difference in level. We shall see examples shortly.

2. Operations on the data. In programming languages, operations on the data are generally anything that can be programmed. In database data models, there is usually a limited set of operations that can be performed. We are generally allowed to perform a limited set of queries (operations that retrieve information) and modifications (operations that change the database). This limitation is not a weakness, but a strength. By limiting operations, it is possible for programmers to describe database operations at a very high level, yet have the database management system implement the operations efficiently. In comparison, it is generally impossible to optimize programs in conventional languages like C, to the extent that an inefficient algorithm (e.g., bubblesort) is replaced by a more efficient one (e.g., quicksort).

3. Constraints on the data. Database data models usually have a way to describe limitations on what the data can be. These constraints can range from the simple (e.g., “a day of the week is an integer between 1 and 7” or “a movie has at most one title”) to some very complex limitations.

1.2 Important Data Models

Today, the two data models of preeminent importance for database systems are:

1. The relational model, including object-relational extensions.

2. The semistructured-data model, including XML and related standards.

The first, which is present in all commercial database management systems, is the subject of this chapter. The semistructured model, of which XML is the primary manifestation, is an added feature of most relational DBMS’s, and appears in a number of other contexts as well.

1.3 The Relational Model in Brief

The relational model is based on tables, of which Fig. 1 is an example. We shall discuss this model beginning in Section 2. This relation, or table, describes movies: their title, the year in which they were made, their length in minutes, and the genre of the movie. We show three particular movies, but you should imagine that there are many more rows to this table (one row for each movie ever made, perhaps).

title                 year    length    genre
Gone With the Wind    1939    231       drama
Star Wars             1977    124       sciFi
Wayne’s World         1992    95        comedy

Figure 1: An example relation

The structure portion of the relational model might appear to resemble an array of structs in C, where the column headers are the field names, and each of the rows represent the values of one struct in the array. However, it must be emphasized that this physical implementation is only one possible way the table could be implemented in physical data structures. In fact, it is not the normal way to represent relations, and a large portion of the study of database systems addresses the right ways to implement such tables. Much of the distinction comes from the scale of relations: they are not normally implemented as main-memory structures, and their proper physical implementation must take into account the need to access relations of very large size that are resident on disk.

The operations normally associated with the relational model form the “relational algebra,” which we discuss beginning in Section 4. These operations are table-oriented. As an example, we can ask for all those rows of a relation that have a certain value in a certain column. For example, we can ask of the table in Fig. 1 for all the rows where the genre is “comedy.”

The constraint portion of the relational data model will be touched upon briefly in Section 5. However, as a brief sample of what kinds of constraints are generally used, we could decide that there is a fixed list of genres for movies, and that the last column of every row must have a value that is on this list. Or we might decide (incorrectly, it turns out) that there could never be two movies with the same title, and constrain the table so that no two rows could have the same string in the first component.
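The three parts of the relational model (structure, operations, constraints) can be seen together in a short script built on the table of Fig. 1. This is a sketch that assumes SQLite through Python's standard sqlite3 module; the particular list of allowed genres and the helper insert_ok are hypothetical choices made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structure: four named columns. Constraint: genre must come from a fixed list.
conn.execute("""
    CREATE TABLE Movies (
        title  TEXT,
        year   INTEGER,
        length INTEGER,
        genre  TEXT CHECK (genre IN ('drama', 'sciFi', 'comedy'))
    )
""")
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?, ?, ?)",
    [
        ("Gone With the Wind", 1939, 231, "drama"),
        ("Star Wars", 1977, 124, "sciFi"),
        ("Wayne's World", 1992, 95, "comedy"),
    ],
)

# Operation: all rows that have a certain value in a certain column.
comedies = conn.execute(
    "SELECT title, year FROM Movies WHERE genre = 'comedy'"
).fetchall()


def insert_ok(row):
    """Return True if the row satisfies the table's constraints, False otherwise."""
    try:
        conn.execute("INSERT INTO Movies VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(comedies)                                              # [("Wayne's World", 1992)]
print(insert_ok(("A Made-Up Title", 2000, 90, "western")))   # False: genre is not on the list
```

Nothing in the script says how the rows are laid out in memory or on disk, which is the point of the warning above about reading the table as an array of structs.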

1.4 The Semistructured Model in Brief

Semistructured data resembles trees or graphs, rather than tables or arrays. The principal manifestation of this viewpoint today is XML, a way to represent data by hierarchically nested tagged elements. The tags, similar to those used in HTML, define the role played by different pieces of data, much as the column headers do in the relational model. For example, the same data as in Fig. 1 might appear in an XML “document” as in Fig. 2.

The operations on semistructured data usually involve following paths in the implied tree from an element to one or more of its nested subelements, then to subelements nested within those, and so on. For example, starting at the outer <Movies> element (the entire document in Fig. 2), we might move to each of its nested <Movie> elements, each delimited by the tag <Movie> and matching </Movie> tag, and from each <Movie> element to its nested <Genre> element, to see which movies belong to the “comedy” genre.
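The path-following just described can be tried with a few lines of code. This is a sketch that assumes Python's standard xml.etree.ElementTree module; the variable names are hypothetical, and the document string repeats the movie data so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

DOC = """
<Movies>
  <Movie title="Gone With the Wind">
    <Year>1939</Year> <Length>231</Length> <Genre>drama</Genre>
  </Movie>
  <Movie title="Star Wars">
    <Year>1977</Year> <Length>124</Length> <Genre>sciFi</Genre>
  </Movie>
  <Movie title="Wayne's World">
    <Year>1992</Year> <Length>95</Length> <Genre>comedy</Genre>
  </Movie>
</Movies>
"""

root = ET.fromstring(DOC.strip())       # the outer <Movies> element

# Follow the path Movies -> Movie -> Genre and keep the movies whose genre is "comedy".
comedies = [
    movie.get("title")                  # the title is an attribute of <Movie>
    for movie in root.findall("Movie")  # each nested <Movie> element
    if movie.findtext("Genre") == "comedy"
]

print(comedies)   # ["Wayne's World"]
```

The query walks the tree one level at a time, in contrast with the relational query, which names a table and a condition and leaves the access path to the system.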

<Movies>
    <Movie title="Gone With the Wind">
        <Year>1939</Year>
        <Length>231</Length>
        <Genre>drama</Genre>
    </Movie>
    <Movie title="Star Wars">
        <Year>1977</Year>
        <Length>124</Length>
        <Genre>sciFi</Genre>
    </Movie>
    <Movie title="Wayne’s World">
        <Year>1992</Year>
        <Length>95</Length>
        <Genre>comedy</Genre>
    </Movie>
</Movies>

Figure 2: Movie data as XML

Constraints on the structure of data in this model often involve the data type of values associated with a tag. For instance, are the values associated with the <Length> tag integers or can they be arbitrary character strings? Other constraints determine which tags can appear nested within which other tags. For example, must each <Movie> element have a <Length> element nested within it? What other tags, besides those shown in Fig. 2 might be used within a <Movie> element? Can there be more than one genre for a movie?

1.5 Other Data Models

There are many other models that are, or have been, associated with DBMS’s. A modern trend is to add object-oriented features to the relational model. There are two effects of object-orientation on relations:

1. Values can have structure, rather than being elementary types such as integer or strings, as they were in Fig. 1.

2. Relations can have associated methods.

In a sense, these extensions, called the object-relational model, are analogous to the way structs in C were extended to objects in C++.

There are even database models of the purely object-oriented kind. In these, the relation is no longer the principal data-structuring concept, but becomes only one option among many structures.

There are several other models that were used in some of the earlier DBMS’s, but that have now fallen out of use. The hierarchical model was, like semistructured data, a tree-oriented model. Its drawback was that unlike more modern models, it really operated at the physical level, which made it impossible for programmers to write code at a conveniently high level. Another such model was the network model, which was a graph-oriented, physical-level model. In truth, both the hierarchical model and today’s semistructured models allow full graph structures, and do not limit us strictly to trees. However, the generality of graphs was built directly into the network model, rather than favoring trees as these other models do.

1.6 Comparison of Modeling Approaches

Even from our brief example, it appears that semistructured models have more flexibility than relations. This difference becomes even more apparent when we discuss, as we shall, how full graph structures are embedded into tree-like, semistructured models. Nevertheless, the relational model is still preferred in DBMS’s, and we should understand why. A brief argument follows.

Because databases are large, efficiency of access to data and efficiency of modifications to that data are of great importance. Also very important is ease of use: the productivity of programmers who use the data. Surprisingly, both goals can be achieved with a model, particularly the relational model, that:

1. Provides a simple, limited approach to structuring data, yet is reasonably versatile, so anything can be modeled.

2. Provides a limited, yet useful, collection of operations on data.

Together, these limitations turn into features. They allow us to implement languages, such as SQL, that enable the programmer to express their wishes at a very high level. A few lines of SQL can do the work of thousands of lines of C, or hundreds of lines of the code that had to be written to access data under earlier models such as network or hierarchical. Yet the short SQL programs, because they use a strongly limited set of operations, can be optimized to run as fast as, or faster than, the code written in alternative languages.
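To make the contrast concrete, here is a small sketch that answers the same question twice: once declaratively in SQL and once with a hand-written loop. It uses Python's built-in sqlite3 module as a stand-in for a DBMS, which is an assumption of this illustration; the query itself (movies longer than 100 minutes, oldest first) is also just an example.

```python
import sqlite3

rows = [("Gone With the Wind", 1939, 231, "drama"),
        ("Star Wars", 1977, 124, "sciFi"),
        ("Wayne's World", 1992, 95, "comedy")]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movies (title TEXT, year INTEGER, length INTEGER, genre TEXT)")
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?, ?)", rows)

# Declarative: state what is wanted and let the system decide how to find it.
declarative = conn.execute(
    "SELECT title FROM Movies WHERE length > 100 ORDER BY year").fetchall()

# Procedural: spell out the scan, the filter, and the sort by hand.
matches = []
for title, year, length, genre in rows:
    if length > 100:
        matches.append((year, title))
matches.sort()
procedural = [(title,) for year, title in matches]
```

Both approaches produce the same answer, but only the SQL version leaves the system free to choose an access strategy, for example by using an index if one exists.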

2 Basics of the Relational Model

The relational model gives us a single way to represent data: as a two-dimensional table called a relation. Figure 1, which we copy here as Fig. 3, is an example of a relation, which we shall call Movies. The rows each represent a movie, and the columns each represent a property of movies. In this section, we shall introduce the most important terminology regarding relations, and illustrate them with the Movies relation.

title               year  length  genre
Gone With the Wind  1939  231     drama
Star Wars           1977  124     sciFi
Wayne’s World       1992  95      comedy

Figure 3: The relation Movies

2.1 Attributes

The columns of a relation are named by attributes; in Fig. 3 the attributes are title, year, length, and genre. Attributes appear at the tops of the columns. Usually, an attribute describes the meaning of entries in the column below. For instance, the column with attribute length holds the length, in minutes, of each movie.

2.2 Schemas

The name of a relation and the set of attributes for a relation is called the schema for that relation. We show the schema for the relation with the relation name followed by a parenthesized list of its attributes. Thus, the schema for relation Movies of Fig. 3 is

Movies(title, year, length, genre)

The attributes in a relation schema are a set, not a list. However, in order to talk about relations we often must specify a “standard” order for the attributes. Thus, whenever we introduce a relation schema with a list of attributes, as above, we shall take this ordering to be the standard order whenever we display the relation or any of its rows.

In the relational model, a database consists of one or more relations. The set of schemas for the relations of a database is called a relational database schema, or just a database schema.

2.3 Tuples

The rows of a relation, other than the header row containing the attribute names, are called tuples. A tuple has one component for each attribute of the relation. For instance, the first of the three tuples in Fig. 3 has the four components Gone With the Wind, 1939, 231, and drama for attributes title, year, length, and genre, respectively.

Conventions for Relations and Attributes

We shall generally follow the convention that relation names begin with a capital letter, and attribute names begin with a lower-case letter. However, later we shall talk of relations in the abstract, where the names of attributes do not matter. In that case, we shall use single capital letters for both relations and attributes, e.g., R(A, B, C) for a generic relation with three attributes.

When we wish to write a tuple in isolation, not as part of a relation, we normally use commas to separate components, and we use parentheses to surround the tuple. For example,

(Gone With the Wind, 1939, 231, drama)

is the first tuple of Fig. 3. Notice that when a tuple appears in isolation, the attributes do not appear, so some indication of the relation to which the tuple belongs must be given. We shall always use the order in which the attributes were listed in the relation schema.
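The convention for writing a tuple in isolation has a direct analogue in most programming languages. As a rough sketch, a bare Python tuple carries no attribute names, so the schema's standard order is what gives each position its meaning. The names MOVIES_SCHEMA and Movie below are illustrative assumptions, not notation from the text.

```python
from collections import namedtuple

# The schema fixes the standard order of the attributes.
MOVIES_SCHEMA = ("title", "year", "length", "genre")

# A tuple written in isolation: components only, no attribute names.
bare = ("Gone With the Wind", 1939, 231, "drama")

# Pairing the tuple with the schema recovers the attribute of each component.
Movie = namedtuple("Movie", MOVIES_SCHEMA)
labeled = Movie(*bare)
by_attribute = dict(zip(MOVIES_SCHEMA, bare))
```

With the schema attached, labeled.year and by_attribute["year"] both refer to the second component; without it, nothing in the bare tuple says that 1939 is a year rather than a length.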

2.4 Domains

The relational model requires that each component of each tuple be atomic; that is, it must be of some elementary type such as integer or string. It is not permitted for a value to be a record structure, set, list, array, or any other type that reasonably can have its values broken into smaller components.

It is further assumed that associated with each attribute of a relation is a domain, that is, a particular elementary type. Each component of any tuple of the relation must hold a value that belongs to the domain of the corresponding column. For example, tuples of the Movies relation of Fig. 3 must have a first component that is a string, second and third components that are integers, and a fourth component whose value is a string.

It is possible to include the domain, or data type, for each attribute in a relation schema. We shall do so by appending a colon and a type after attributes. For example, we could represent the schema for the Movies relation as:

Movies(title:string, year:integer, length:integer, genre:string)
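A schema annotated with domains can be read as a rule that every tuple must satisfy. The sketch below checks a tuple against such a schema; mapping string to Python's str and integer to int, and the helper name conforms, are assumptions made only for this illustration.

```python
# Each attribute is paired with its domain (an elementary type).
MOVIES_DOMAINS = (("title", str), ("year", int), ("length", int), ("genre", str))

def conforms(tup, domains=MOVIES_DOMAINS):
    """True if the tuple has exactly one value of the right type per attribute."""
    if len(tup) != len(domains):
        return False
    # An exact type match is used so that lists, sets, and so on are rejected.
    return all(type(value) is domain for value, (_, domain) in zip(tup, domains))

ok = conforms(("Star Wars", 1977, 124, "sciFi"))
wrong_type = conforms(("Star Wars", "1977", 124, "sciFi"))
not_atomic = conforms(("Star Wars", 1977, 124, ["sciFi", "action"]))
```

The last call fails because a list of genres is not atomic: it can be broken into smaller components, which the relational model does not permit.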

2.5 Equivalent Representations of a Relation

Relations are sets of tuples, not lists of tuples. Thus the order in which the tuples of a relation are presented is immaterial. For example, we can list the three tuples of Fig. 3 in any of their six possible orders, and the relation is “the same” as Fig. 3.

Moreover, we can reorder the attributes of the relation as we choose, without changing the relation. However, when we reorder the relation schema, we must be careful to remember that the attributes are column headers. Thus, when we change the order of the attributes, we also change the order of their columns. When the columns move, the components of tuples change their order as well. The result is that each tuple has its components permuted in the same way as the attributes are permuted.

For example, Fig. 4 shows one of the many relations that could be obtained from Fig. 3 by permuting rows and columns. These two relations are considered “the same.” More precisely, these two tables are different presentations of the same relation.

year  genre   title               length
1977  sciFi   Star Wars           124
1992  comedy  Wayne’s World       95
1939  drama   Gone With the Wind  231
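One way to see why the two presentations denote the same relation is to model a relation as a set of tuples, where each tuple is itself a set of (attribute, value) pairs. This is only a sketch of the idea, and the helper name as_relation is an assumption; under this model neither row order nor column order can matter.

```python
def as_relation(attributes, rows):
    """Turn a presentation (header plus rows) into a set of attribute/value sets."""
    return {frozenset(zip(attributes, row)) for row in rows}

# The presentation of Fig. 3.
fig3 = as_relation(
    ("title", "year", "length", "genre"),
    [("Gone With the Wind", 1939, 231, "drama"),
     ("Star Wars", 1977, 124, "sciFi"),
     ("Wayne's World", 1992, 95, "comedy")])

# The presentation of Fig. 4: rows and columns both permuted.
fig4 = as_relation(
    ("year", "genre", "title", "length"),
    [(1977, "sciFi", "Star Wars", 124),
     (1992, "comedy", "Wayne's World", 95),
     (1939, "drama", "Gone With the Wind", 231)])

same_relation = fig3 == fig4
```

The comparison succeeds only because each tuple's components were permuted in the same way as the attributes; reordering the header without moving the columns would produce a different relation.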

2.6 Relation Instances

A relation about movies is not static; rather, relations change over time. We expect to insert tuples for new movies, as these appear. We also expect changes to existing tuples if we get revised or corrected information about a movie, and perhaps deletion of tuples for movies that are expelled from the database for some reason.

It is less common for the schema of a relation to change. However, there are situations where we might want to add or delete attributes. Schema changes, while possible in commercial database systems, can be very expensive, because each of perhaps millions of tuples needs to be rewritten to add or delete components. Also, if we add an attribute, it may be difficult or even impossible to generate appropriate values for the new component in the existing tuples.

We shall call a set of tuples for a given relation an instance of that relation. For example, the three tuples shown in Fig. 3 form an instance of relation Movies. Presumably, the relation Movies has changed over time and will continue to change over time. For instance, in 1990, Movies did not contain the tuple for Wayne’s World. However, a conventional database system maintains only one version of any relation: the set of tuples that are in the relation “now.” This instance of the relation is called the current instance.¹

¹ Databases that maintain historical versions of data as it existed in past times are called temporal databases.
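The distinction between a relation and its instances can be sketched with an ordinary mutable set. The variable names are illustrative assumptions; the point is only that the schema stays fixed while the set of tuples changes, and that a conventional system keeps just the latest set.

```python
# The relation as it might have stood in 1990: no tuple for Wayne's World yet.
movies = {("Gone With the Wind", 1939, 231, "drama"),
          ("Star Wars", 1977, 124, "sciFi")}

# A snapshot of the old instance, kept here only for comparison.
# A conventional DBMS would not retain it; a temporal database would.
instance_1990 = frozenset(movies)

# Inserting a tuple produces a new instance of the same relation.
movies.add(("Wayne's World", 1992, 95, "comedy"))
current_instance = frozenset(movies)
```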

Figure 4: Another presentation of the relation Movies

2.7 Keys of Relations

There are many constraints on relations that the relational model allows us to place on database schemas. One kind of constraint is so fundamental that we shall introduce it here: key constraints. A set of attributes forms a key for a relation if we do not allow two tuples in a relation instance to have the same values in all the attributes of the key.

Example 1: We can declare that the relation Movies has a key consisting of the two attributes title and year. That is, we don’t believe there could ever be two movies that had both the same title and the same year. Notice that title by itself does not form a key, since sometimes “remakes” of a movie appear. For example, there are three movies named King Kong, each made in a different year. It should also be obvious that year by itself is not a key, since there are usually many movies made in the same year. ✷

We indicate the attribute or attributes that form a key for a relation by underlining the key attribute(s). For instance, the Movies relation could have its schema written as:

Movies(title, year, length, genre)
       -----  ----

Remember that the statement that a set of attributes forms a key for a relation is a statement about all possible instances of the relation, not a statement about a single instance. For example, looking only at the tiny relation of Fig. 3, we might imagine that genre by itself forms a key, since we do not see two tuples that agree on the value of their genre components. However, we can easily imagine that if the relation instance contained more movies, there would be many dramas, many comedies, and so on. Thus, there would be distinct tuples that agreed on the genre component. As a consequence, it would be incorrect to assert that genre is a key for the relation Movies.
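The difference between "no violation in this instance" and "is a key" can be shown with a small checker. This is a sketch under stated assumptions: the function name violates_key is invented for illustration, and the two extra tuples in the larger instance are hypothetical movies made up to force a clash, not real data.

```python
ATTRS = ("title", "year", "length", "genre")

def violates_key(instance, attributes, key):
    """True if two distinct tuples of the instance agree on every key attribute."""
    positions = [attributes.index(a) for a in key]
    seen = set()
    for tup in set(instance):
        key_value = tuple(tup[p] for p in positions)
        if key_value in seen:
            return True
        seen.add(key_value)
    return False

fig3 = [("Gone With the Wind", 1939, 231, "drama"),
        ("Star Wars", 1977, 124, "sciFi"),
        ("Wayne's World", 1992, 95, "comedy")]

# Hypothetical additions: a second drama, and a remake that reuses a title.
larger = fig3 + [("Some Other Drama", 2000, 100, "drama"),
                 ("Star Wars", 2099, 100, "sciFi")]

genre_ok_in_fig3 = not violates_key(fig3, ATTRS, ["genre"])
genre_ok_in_larger = not violates_key(larger, ATTRS, ["genre"])
title_ok_in_larger = not violates_key(larger, ATTRS, ["title"])
title_year_ok_in_larger = not violates_key(larger, ATTRS, ["title", "year"])
```

The checker can only refute a proposed key by finding a clash in some instance. Passing on one instance, as genre does on Fig. 3, proves nothing, because a key is a claim about every instance the relation may ever hold.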

While we might be sure that title and year can serve as a key for Movies, many real-world databases use artificial keys, doubting that it is safe to make any assumption about the values of attributes outside their control. For example, companies generally assign employee ID’s to all employees, and these ID’s are carefully chosen to be unique numbers. One purpose of these ID’s is to make sure that in the company database each employee can be distinguished from all others, even if there are several employees with the same name. Thus, the employee-ID attribute can serve as a key for a relation about employees.

In US corporations, it is normal for every employee to have a Social-Security number. If the database has an attribute that is the Social-Security number, then this attribute can also serve as a key for employees. Note that there is nothing wrong with there being several choices of key, as there would be for employees having both employee ID’s and Social-Security numbers.

The idea of creating an attribute whose purpose is to serve as a key is quite widespread. In addition to employee ID’s, we find student ID’s to distinguish students in a university. We find drivers’ license numbers and automobile registration numbers to distinguish drivers and automobiles, respectively. You undoubtedly can find more examples of attributes created for the primary purpose of serving as keys.

Movies(
    title:string,
    year:integer,
    length:integer,
    genre:string,
    studioName:string,
    producerC#:integer
)

MovieStar(
    name:string,
    address:string,
    gender:char,
    birthdate:date
)

StarsIn(
    movieTitle:string,
    movieYear:integer,
    starName:string
)

MovieExec(
    name:string,
    address:string,
    cert#:integer,
    netWorth:integer
)

Studio(
    name:string,
    address:string,
    presC#:integer
)

Figure 5: Example database schema about movies

2.8 An Example Database Schema

We shall close this section with an example of a complete database schema. The topic is movies, and it builds on the relation Movies that has appeared so far in examples. The database schema is shown in Fig. 5. Here are the things we need to know to understand the intention of this schema.

Movies

This relation is an extension of the example relation we have been discussing so far. Remember that its key is title and year together. We have added two new attributes; studioName tells us the studio that owns the movie, and producerC# is an integer that represents the producer of the movie in a way that we shall discuss when we talk about the relation MovieExec below.

MovieStar

This relation tells us something about stars. The key is name, the name of the movie star. It is not usual to assume names of persons are unique and therefore suitable as a key. However, movie stars are different; one would never take a name that some other movie star had used. Thus, we shall use the convenient fiction that movie-star names are unique. A more conventional approach would be to invent a serial number of some sort, like social-security numbers, so that we could assign each individual a unique number and use that attribute as the key. We take that approach for movie executives, as we shall see. Another interesting point about the MovieStar relation is that we see two new data types. The gender can be a single character, M or F. Also, birthdate is of type “date,” which might be a character string of a special form.

StarsIn

This relation connects movies to the stars of that movie, and likewise connects a star to the movies in which they appeared. Notice that movies are represented by the key for Movies (the title and year), although we have chosen different attribute names to emphasize that attributes movieTitle and movieYear represent the movie. Likewise, stars are represented by the key for MovieStar, with the attribute called starName. Finally, notice that all three attributes are necessary to form a key. It is perfectly reasonable to suppose that relation StarsIn could have two distinct tuples that agree in any two of the three attributes. For instance, a star might appear in two movies in one year, giving rise to two tuples that agreed in movieYear and starName, but disagreed in movieTitle.

MovieExec

This relation tells us about movie executives. It contains their name, address, and net worth as data about the executive. However, for a key we have invented “certificate numbers” for all movie executives, including producers (as appear in the relation Movies) and studio presidents (as appear in the relation Studio, below). These are integers; a different one is assigned to each executive.

acctNo  type      balance
12345   savings   12000
23456   checking  1000
34567   savings   25

The relation Accounts

firstName  lastName  idNo     account
Robbie     Banks     901-222  12345
Lena       Hand      805-333  12345
Lena       Hand      805-333  23456

The relation Customers

Studio

This relation tells about movie studios. We rely on no two studios having the same name, and therefore use name as the key. The other attributes are the address of the studio and the certificate number for the president of the studio. We assume that the studio president is surely a movie executive and therefore appears in MovieExec.

2.9 Exercises for Section 2

Exercise 2.1: In Fig. 6 are instances of two relations that might constitute part of a banking database. Indicate the following:

a) The attributes of each relation.

b) The tuples of each relation.

c) The components of one tuple from each relation.

d) The relation schema for each relation.

e) The database schema.

f) A suitable domain for each attribute.

g) Another equivalent way to present each relation.

Figure 6: Two relations of a banking database

Exercise 2.2: In Section 2.7 we suggested that there are many examples of attributes that are created for the purpose of serving as keys of relations. Give some additional examples.

!! Exercise 2.3: How many different ways (considering orders of tuples and attributes) are there to represent a relation instance if that instance has:

a) Three attributes and three tuples, like the relation Accounts of Fig. 6?

b) Four attributes and five tuples?

c) n attributes and m tuples?

3 Defining a Relation Schema in SQL

SQL (pronounced “sequel”) is the principal language used to describe and manipulate relational databases. There is a current standard for SQL, called SQL-99. Most commercial database management systems implement something similar, but not identical to, the standard. There are two aspects to SQL:

1. The Data-Definition sublanguage for declaring database schemas and

2. The Data-Manipulation sublanguage for querying (asking questions about) databases and for modifying the database.

The distinction between these two sublanguages is found in most languages; e.g., C or Java have portions that declare data and other portions that are executable code. These correspond to data-definition and data-manipulation, respectively.

In this section we shall begin a discussion of the data-definition portion of SQL.

3.1 Relations in SQL

SQL makes a distinction between three kinds of relations:

1. Stored relations, which are called tables. These are the kind of relation we deal with ordinarily: a relation that exists in the database and that can be modified by changing its tuples, as well as queried.

2. Views, which are relations defined by a computation. These relations are not stored, but are constructed, in whole or in part, when needed.

3. Temporary tables, which are constructed by the SQL language processor when it performs its job of executing queries and data modifications. These relations are then thrown away and not stored.

In this section, we shall learn how to declare tables. We do not treat the declaration and definition of views here, and temporary tables are never declared. The SQL CREATE TABLE statement declares the schema for a stored relation. It gives a name for the table, its attributes, and their data types. It also allows us to declare a key, or even several keys, for a relation. There are many other features to the CREATE TABLE statement, including many forms of constraints that can be declared, and the declaration of indexes (data structures that speed up many operations on the table) but we shall leave those for the appropriate time.
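As a preview of what such declarations look like, the sketch below declares two of the relations of Fig. 5 and then tests that the declared key is enforced. It runs against SQLite through Python's sqlite3 module, which is an assumption of this example rather than a system discussed in the text. The string lengths (30, 100, 255) are invented for illustration, and SQLite accepts these type names without enforcing the lengths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MovieStar (
    name      CHAR(30) PRIMARY KEY,
    address   VARCHAR(255),
    gender    CHAR(1),
    birthdate DATE
);
CREATE TABLE StarsIn (
    movieTitle VARCHAR(100),
    movieYear  INT,
    starName   CHAR(30),
    PRIMARY KEY (movieTitle, movieYear, starName)
);
""")

# Two tuples that agree on two of the three key attributes are allowed.
conn.execute("INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Carrie Fisher')")
conn.execute("INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Mark Hamill')")

# A tuple that agrees on all three key attributes is rejected.
try:
    conn.execute("INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Mark Hamill')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM StarsIn").fetchone()[0]
```

Declaring all three attributes of StarsIn as the key matches the earlier discussion: any two of them may repeat, but the full combination may not.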

3.2 Data Types

To begin, let us introduce the primitive data types that are supported by SQL systems. All attributes must have a data type.

1. Character strings of fixed or varying length. The type CHAR(n) denotes a fixed-length string of up to n characters. VARCHAR(n) also denotes a string of up to n characters. The difference is implementation-dependent; typically CHAR implies that short strings are padded to make n characters, while VARCHAR implies that an endmarker or string-length is used. SQL permits reasonable coercions between values of character-string types. Normally, a string is padded by trailing blanks if it becomes the value of a component that is a fixed-length string of greater length. For example, the string 'foo',² if it became the value of a component for an attribute of type CHAR(5), would assume the value 'foo  ' (with two blanks following the second o).

2. Bit strings of fixed or varying length. These strings are analogous to fixed and varying-length character strings, but their values are strings of bits rather than characters. The type BIT(n) denotes bit strings of length n, while BIT VARYING(n) denotes bit strings of length up to n.

3. The type BOOLEAN denotes an attribute whose value is logical. The possible values of such an attribute are TRUE, FALSE, and (although it would surprise George Boole) UNKNOWN.

4. The type INT or INTEGER (these names are synonyms) denotes typical integer values. The type SMALLINT also denotes integers, but the number of bits permitted may be less, depending on the implementation (as with the types int and short int in C).

² Notice that in SQL, strings are surrounded by single-quotes, not double-quotes as in many other programming languages.
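The padding rule for fixed-length strings described in item 1 can be imitated in a few lines. This is only a sketch of the typical behavior: the helper name to_char is an assumption, and raising an error for an over-long string is a choice made here, since real systems differ on whether they reject or truncate.

```python
def to_char(value, n):
    """Coerce a string to CHAR(n) by padding with trailing blanks."""
    if len(value) > n:
        # Assumption: reject over-long strings; actual systems vary.
        raise ValueError("string is longer than CHAR(%d)" % n)
    return value.ljust(n)

padded = to_char("foo", 5)
```

The result has length 5, with two blanks following the second o, which is the value the text describes for a CHAR(5) component.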
