Concepts and Semantics of Programming Languages 1
A Semantical Approach with OCaml and Python
Thérèse Hardin
Mathieu Jaume
François Pessaux
Véronique Viguié Donzeau-Gouge
First published 2021 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd 2021
The rights of Thérèse Hardin, Mathieu Jaume, François Pessaux and Véronique Viguié Donzeau-Gouge to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2021930488
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78630-530-5
Foreword
Preface
Chapter 1. From Hardware to Software
1.1. Computers: a low-level view
1.1.1. Information processing
1.1.2. Memories
1.1.3. CPUs
1.1.4. Peripheral devices
1.2. Computers: a high-level view
1.2.1. Modeling computations
1.2.2. High-level languages
1.2.3. From source code to executable programs
Chapter 2. Introduction to Semantics of Programming Languages
2.1. Environment, memory and state
2.1.1. Evaluation environment
2.1.2. Memory
2.1.3. State
2.2. Evaluation of expressions
2.2.1. Syntax
2.2.2. Values
2.2.3. Evaluation semantics
2.3. Definition and assignment
2.3.1. Defining an identifier
2.3.2. Assignment
2.4. Exercises
Chapter 3. Semantics of Functional Features
3.1. Syntactic aspects
3.1.1. Syntax of a functional kernel
3.1.2. Abstract syntax tree
3.1.3. Reasoning by induction over expressions
3.1.4. Declaration of variables, bound and free variables
3.2. Execution semantics: evaluation functions
3.2.1. Evaluation errors
3.2.2. Values
3.2.3. Interpretation of operators
3.2.4. Closures
3.2.5. Evaluation of expressions
3.3. Execution semantics: operational semantics
3.3.1. Simple expressions
3.3.2. Call-by-value
3.3.3. Recursive and mutually recursive functions
3.3.4. Call-by-name
3.3.5. Call-by-value versus call-by-name
3.4. Evaluation functions versus evaluation relations
3.4.1. Status of the evaluation function
3.4.2. Induction over evaluation trees
3.5. Semantic properties
3.5.1. Equivalent expressions
3.5.2. Equivalent environments
Chapter 4. Semantics of Imperative Features
4.1. Syntax of a kernel of an imperative language
4.2. Evaluation of expressions
4.3. Evaluation of definitions
4.4. Operational semantics
4.4.1. Big-step semantics
4.4.2. Small-step semantics
4.4.3. Expressiveness of operational semantics
4.5. Semantic properties
4.5.1. Equivalent programs
4.5.2. Program termination
4.5.3. Determinism of program execution
4.5.4. Big steps versus small steps
4.6. Procedures
4.6.1. Blocks
4.6.2. Procedures
4.7. Other approaches
4.7.1. Denotational semantics
4.7.2. Axiomatic semantics, Hoare logic
4.8. Exercises
Chapter 5. Types
5.1. Type checking: when and how?
5.1.1. When to verify types?
5.1.2. How to verify types?
5.2. Informal typing of a program Exp2
5.2.1. A first example
5.2.2. Typing a conditional expression
5.2.3. Typing without type constraints
5.2.4. Polymorphism
5.3. Typing rules in Exp2
5.3.1. Types, type schemes and typing environments
5.3.2. Generalization, substitution and instantiation
5.3.3. Typing rules and typing trees
5.4. Type inference algorithm in Exp2
5.4.1. Principal type
5.4.2. Sets of constraints and unification
5.4.3. Type inference algorithm
5.5. Properties
5.5.1. Properties of type checking
5.5.2. Properties of the inference algorithm
5.6. Type checking of imperative constructs
5.6.1. Type algebra
5.6.2. Typing rules
5.6.3. Typing polymorphic definitions
5.7. Subtyping and overloading
5.7.1. Subtyping
5.7.2. Overloading
Chapter 6. Data Types
6.1. Basic types
6.1.1. Booleans
6.1.2. Integers
6.1.3. Characters
6.1.4. Floating point numbers
6.2. Arrays
6.3. Strings
6.4. Type definitions
6.4.1. Type abbreviations
6.4.2. Records
6.4.3. Enumerated types
6.4.4. Sum types
6.5. Generalized conditional
6.5.1. C style switch/case
6.5.2. Pattern matching
6.6. Equality
6.6.1. Physical equality
6.6.2. Structural equality
6.6.3. Equality between functions
Chapter 7. Pointers and Memory Management
7.1. Addresses and pointers
7.2. Endianness
7.3. Pointers and arrays
7.4. Passing parameters by address
7.5. References
7.5.1. References in C++
7.5.2. References in Java
7.6. Memory management
7.6.1. Memory allocation
7.6.2. Freeing memory
7.6.3. Automatic memory management
Chapter 8. Exceptions
8.1. Errors: notification and propagation
8.1.1. Global variable
8.1.2. Record definition
8.1.3. Passing by address
8.1.4. Introducing exceptions
8.2. A simple formalization: ML-style exceptions
8.2.1. Abstract syntax
8.2.2. Values
8.2.3. Type algebra
8.2.4. Operational semantics
8.2.5. Typing
8.3. Exceptions in other languages
8.3.1. Exceptions in OCaml
8.3.2. Exceptions in Python
8.3.3. Exceptions in Java
8.3.4. Exceptions in C++
Foreword
Computer programs have played an increasingly central role in our lives since the 1940s, and the quality of these programs has thus become a crucial question. Writing a high-quality program – a program that performs the required task and is efficient, robust, easy to modify, easy to extend, etc. – is an intellectually challenging task, requiring the use of rigorous development methods. First and foremost, however, the creation of such a program is dependent on an in-depth knowledge of the programming language used, its syntax and, crucially, its semantics, i.e. what happens when a program is executed.
The description of this semantics brings the most fundamental concepts to light, including those of value, reference, exception or object. These concepts are the foundations of programming language theory. Mastering these concepts is what sets experienced programmers apart from beginners. Certain concepts – like that of value – are common to all programming languages; others – such as the notion of functions – operate differently in different languages; finally, other concepts – such as that of objects – only exist in certain languages. Computer scientists often refer to “programming paradigms” to consider sets of concepts shared by a family of languages, which imply a certain programming style: imperative, functional, object-oriented, logical, concurrent, etc. Nevertheless, an understanding of the concepts themselves is essential, as several paradigms may be interwoven within the same language.
Introductory texts on programming in any given language are not difficult to find, and a number of published books address the fundamental concepts of language semantics. Much rarer are those, like the present volume, which establish and examine the links between concepts and their implementation in languages used by programmers on a daily basis, such as C, C++, Ada, Java, OCaml and Python. The authors provide a wealth of examples in these languages, illustrating and giving life to the notions that they present. They propose general models, such as the kit presented in Volume 2, permitting a unified view of different notions; this makes it easier for readers to understand the constructs used in popular programming languages and facilitates comparison. This thorough and detailed work provides readers with an understanding of these notions and, above all, an understanding of the ways of using the latter to create high-quality programs, building a safer and more reliable future in computing.
Gilles DOWEK
Research Director, Inria
Professor at the École normale supérieure, Paris-Saclay

Catherine DUBOIS
Professor at the École nationale supérieure d’informatique pour l’industrie et l’entreprise

January 2021
Preface
This two-volume work relates to the field of programming. First and foremost, it is intended to give readers a solid grounding in the bases of functional or imperative programming, along with a thorough knowledge of the module and class mechanisms involved. In our view, the semantic approach is most appropriate when studying programming, as the impact of interlanguage syntax differences is limited. Practical considerations, determined by the material characteristics of computers and/or “smart” devices, will also be addressed. The same approach will be taken in both volumes, using both mathematical formulas and memory state diagrams. With this book, we hope to help readers understand the meaning of the constructs described in the reference manuals of programming languages and to establish solid foundations for reasoning and assessing the correctness of their own programs through critical review. In short, our aim is to facilitate the development of safe and reliable programs.
Volume 1 begins with a presentation of the computer, in Chapter 1, first at the material level – as an assemblage of components – then as a tool for executing programs. Chapter 2 is an intuitive, step-by-step introduction to language semantics, intended to familiarize readers with this approach to programming. In Chapter 3, we provide a detailed discussion of the subject, with a formal presentation of the execution semantics of functional features. Chapter 4 continues with the same topic, looking at the execution semantics of imperative features. In these two chapters, a clear mathematical framework is used to support our presentation. Also, all of the notions which we introduce in these chapters are implemented in both Python and OCaml to assist readers learning about the semantic concepts in question for the first time. Multiple exercises, with detailed solutions, are provided in both cases. Chapter 5, on the subject of typing, begins by addressing typing rules, which are used to check programs; we then present the algorithm used to infer polymorphic types, along with the associated mathematical notions, all implemented in both languages. Finally, the extension of typing to imperative features is addressed. In Chapter 6, we present the main data types and methods of pattern matching, using a range of examples expressed in different programming languages. Chapter 7 focuses on low-level programming features: endianness, pointers and memory management; these notions are mostly presented using C and C++. Volume 1 ends with a discussion of error processing using exceptions: their semantics is presented in OCaml, and the exception management mechanisms used in Python, Java and C++ are also described (see Chapter 8).
Thus, Volume 1 is intended to give a broad overview of the functional and imperative features of programming, from notions that can be modeled mathematically to notions that are linked to the hardware configuration of computers themselves. Volume 2 focuses on modular and object programming, building on the foundations laid down in Volume 1, since modules, classes and objects are, in essence, the means of organizing functional or imperative constructs. Volume 2 first analyzes the needs of developers in terms of tools for software architecture. Based on this study, an original semantic model, called a kit, is drawn up, jointly presenting all the features of the modules and objects that can meet these needs. The semantics of these kits are defined in a rather informal way, as research in this field has not yet led to a mathematical model of this set of features that remains relatively simple. From this model, we consider a set of emerging questions, the objective of which is to guide the acquisition of a language. This approach is then exemplified by the study of the module systems of Ada, OCaml and C. Finally, the same approach will be used to deduce a semantic model of class and object features, which will serve to present classes in Java, C++, OCaml and Python from a unified perspective.
This work is aimed at a relatively wide audience, from experienced developers – who will find valuable additional information on language semantics – to beginners who have only written short programs. For beginners, we recommend working on the semantic concepts described in Volume 1 using the implementations in OCaml or Python to ease assimilation. All readers may benefit from studying the reference manual of a programming language, while comparing the presentations of constructs given in the manual with those given here, guided by the questions mentioned in Volume 2.
Note that we do not discuss the algorithmic aspect of data processing here. However, choosing the algorithm and the data representation that fit the requirements of the specification is an essential step in program development. Many excellent works have been published on this subject, and we encourage readers to explore the subject further. We also recommend using the standard libraries provided by the chosen programming language. These libraries include tried and tested implementations for many different algorithms, which may generally be assumed to be correct.
Chapter 1. From Hardware to Software

This first chapter provides a brief overview of the components found in all computers, from mainframes to the processing chips in tablets, smartphones and smart objects via desktop or laptop computers. Building on this hardware-centric presentation, we shall then give a more abstract description of the actions carried out by computers, leading to a uniform definition of the terms “program” and “execution”, above and beyond the various characteristics of so-called electronic devices.
1.1. Computers: a low-level view
Computer science is the science of rational processing of information by computers. Computers have the capacity to carry out a variety of processes, depending on the instructions given to them. Each item of information is an element of knowledge that may be transmitted using a signal and encoded using a sequence of symbols in conjunction with a set of rules used to decode them, i.e. to reconstruct the signal from the sequence of symbols. Computers use binary encoding, involving two symbols; these may be referred to as “true”/“false”, “0”/“1” or “high”/“low”; these terms are interchangeable, and all represent the two stable states of the electrical potential of digital electronic circuits.
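Encoding information as a sequence of binary symbols, together with the rules needed to decode it, can be illustrated by a minimal Python sketch. It assumes one particular set of rules (UTF-8 for characters, then 8 binary symbols per byte); `encode` and `decode` are hypothetical helper names chosen for this illustration.

```python
# Assumed rules for this sketch: text -> UTF-8 bytes -> 8 binary symbols per byte.

def encode(text: str) -> str:
    """Turn text into a string of '0'/'1' symbols, 8 symbols per byte."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))

def decode(bits: str) -> str:
    """Rebuild the text from its binary symbols, applying the same rules."""
    if len(bits) % 8 != 0:
        raise ValueError("the number of symbols must be a multiple of 8")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

bits = encode("Hi")
print(bits)          # 0100100001101001
print(decode(bits))  # Hi
```

Without the decoding rules, the sequence `0100100001101001` carries no meaning by itself; the same symbols could just as well be read as a single 16-bit integer.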
1.1.1. Information processing
Schematically, a computer is made up of three families of components as follows:
– memories: store data (information) and executable code (the so-called von Neumann architecture);
– one or more microprocessors, known as CPUs (central processing units), which process information by applying elementary operations;
– peripherals: these enable information to be exchanged between the CPU/memory couple and the outside.
Information processing by a computer – in other terms, the execution of a program – can be summarized as a sequence of three steps: fetching data, computing the results and returning them. Each elementary processing operation corresponds to a configuration of the logical circuits of the CPU, known as a logic function. If the result of this function is solely dependent on input, and if no notion of “time” is involved in the computations, then the function is said to be combinatorial; otherwise, it is said to be sequential.
For example, a binary half-adder, as shown in Figure 1.1, is a circuit that computes the sum of two binary digits (input), along with the possible carry value. It thus implements a combinatorial logic function.

The essential character of a combinatorial function is that, for the same input, the function always produces the same output, no matter what the circumstances. This is not true of sequential logic functions.
For example, a logic function that counts the number of times its input changes relies on a notion of “time” (changes take place in time), and a persistent state between two inputs is required in order to record the previous value of the counter. This state is saved in a memory. For sequential functions, the same input value can result in different output values, as every output depends not only on the input, but also on the state of the memory at the moment of reading the new input.
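The contrast between the two kinds of logic function can be sketched in Python, as a software model rather than a circuit description. `half_adder` is a pure function of its inputs (XOR for the sum bit, AND for the carry bit), while the hypothetical `ChangeCounter` keeps a persistent state between inputs; the sketch assumes that the very first input does not count as a change.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Combinatorial: the output depends only on the current inputs."""
    total = a ^ b   # sum bit (XOR gate)
    carry = a & b   # carry bit (AND gate)
    return total, carry

class ChangeCounter:
    """Sequential: the output also depends on a state kept in memory."""

    def __init__(self) -> None:
        self.previous = None  # last input seen (persistent state)
        self.count = 0        # number of changes observed so far

    def step(self, value: int) -> int:
        if self.previous is not None and value != self.previous:
            self.count += 1
        self.previous = value
        return self.count

for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))

counter = ChangeCounter()
print([counter.step(v) for v in (0, 1, 1, 0, 0)])  # [0, 1, 1, 2, 2]
```

The input 0 yields 0 at the first step but 2 at the fourth: the same input produces different outputs because the stored state differs, whereas `half_adder(1, 1)` always returns `(0, 1)`.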
1.1.2. Memories
Computers use memory to save programs and data. There are several different technologies used in memory components, and a simplified presentation is as follows:
– RAM (Random Access Memory): RAM memory is both readable and writeable. RAM components are generally fast, but also volatile: if the power supply is cut, their content is lost;
Figure 1.1. Binary half-adder
– ROM (Read Only Memory): information stored in a ROM is written at the time of manufacturing, and it is read-only. ROM is slower than RAM, but is non-volatile, like, for example, a burned DVD;
– EPROM (Erasable Programmable Read Only Memory): this memory is non-volatile, but can be written using a specific device, through exposure to ultraviolet light, or by modifying the power voltage, etc. It is slower than RAM, for both reading and writing. EPROM may be considered equivalent to a rewritable DVD.
Computers use the memory components of several technologies. Storage size diminishes as access speed increases, as fast-access memory is more costly. A distinction is generally made between four different types of memory:
– mass storage is measured in terabytes and is made either of mechanical disks (with an access time of ∼10 ms) or – increasingly – of solid-state drive (SSD) blocks. These blocks use an EEPROM variant (electrically erasable) with an access time of ∼0.1–0.3 ms, known as flash memory. Mass storage is non-volatile and is principally used for the file system;
– RAM, which is external to the microprocessor. Recent home computers and smartphones generally possess large RAM capacities (measured in gigabytes). Embedded systems or consumer development electronic boards may have a much lower RAM capacity. The access time is around 40–50 ns;
– the cache is generally included in the CPU of modern machines. This is a small RAM memory of a few kilobytes (or megabytes), with an access time of around 5–10 ns. There are often multiple levels of cache, and the smaller a level is, the shorter its access time. The cache is used to save frequently used and/or consecutive data and/or instructions, reducing the need to access slower RAM by retaining information locally. Cache management is complex: it is important to ensure consistency between the data in the main memory and the cache, between different CPUs or different cores (full, independent processing units within the same CPU) and to decide which data to discard to free up space, etc.;
– registers are the fastest memory units and are located in the center of the microprocessor itself. The microprocessor contains a limited number (a few dozen) of these storage zones, used directly by CPU instructions. Access time is around one processor cycle, i.e. around 1 ns.
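Deciding which data to discard is the job of a cache replacement policy. The Python sketch below assumes one common policy, least recently used (LRU), which the text does not prescribe; the class name `LRUCache` and the dictionary standing in for main memory are illustrative assumptions, not a model of any real CPU cache.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny cache in front of a slower memory; evicts the least recently used line."""

    def __init__(self, capacity: int, backing: dict[int, int]) -> None:
        self.capacity = capacity
        self.backing = backing          # stands in for the slower main memory
        self.lines: OrderedDict[int, int] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> int:
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)   # now the most recently used line
            return self.lines[address]
        self.misses += 1
        value = self.backing[address]         # slow access to main memory
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # discard the least recently used line
        return value

ram = {addr: addr * 10 for addr in range(8)}
cache = LRUCache(capacity=2, backing=ram)
for addr in (0, 1, 0, 2, 1, 0):
    cache.read(addr)
print(cache.hits, cache.misses)  # 1 5
```

With room for only two lines, the access sequence 0, 1, 0, 2, 1, 0 produces a single hit: each newly loaded line evicts the one that was used least recently, so a larger cache or a different access order would change the result.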
1.1.3. CPUs
The CPU, as its name suggests, is the unit responsible for processing information, via the execution of elementary instructions, which can be roughly grouped into five categories:
– data transfer instructions (copy between registers or between memory and registers);
– arithmetic instructions (addition of two integer values contained in two registers, multiplication by a constant, etc.);
– logical instructions (bit-wise and/or/not, shift, rotate, etc.);
– branching operations (conditional, non-conditional, to subroutines, etc.);
– other instructions (halt the processor, reset, interrupt requests, test-and-set, compare-and-swap, etc.).
Instructions are coded by binary words in a format specific to each microprocessor. A program of a few lines in a high-level programming language is translated into tens or even hundreds of elementary instructions, which would be difficult, error-prone and time-consuming to write out manually. This is illustrated in Figure 1.2, where a “Hello World!” program written in C is shown alongside its counterpart in x86-64 instructions, generated by the gcc compiler.
#include <stdio.h>
int main() {
  printf("Hello world!\n");
  return (0);
}

        .section __TEXT
        .globl _main
        .align 4, 0x90
_main:
        .cfi_startproc
## BB#0:
        pushq %rbp
Ltmp0:
        .cfi_def_cfa_offset 16
Ltmp1:
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
Ltmp2:
        .cfi_def_cfa_register %rbp
        subq $16, %rsp
        leaq L_.str(%rip), %rdi
        movl $0, -4(%rbp)
        movb $0, %al
        callq _printf
        xorl %ecx, %ecx
        movl %eax, -8(%rbp)
        movl %ecx, %eax
        addq $16, %rsp
        popq %rbp
        retq
        .cfi_endproc
        .section __TEXT
L_.str:
        .asciz "Hello world!\n"

Figure 1.2. “Hello world!” in C and in x86-64 instructions

Put simply, a microprocessor is split into two parts: a control unit, which decodes and sequences the instructions to execute, and one or more arithmetic and logic units (ALUs), which carry out the operations stipulated by the instructions. The CPU runs permanently through a three-stage cycle:
1) fetching the next instruction to be executed from the memory: every microprocessor contains a special register, the Program Counter (PC), which records the location (address) of this instruction. The PC is then incremented, i.e. the size of the fetched instruction is added to it;
2) decoding of the fetched instruction;
3) execution of this instruction.
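The three-stage cycle can be mimicked by a toy interpreter. In this minimal sketch, the instruction set (`set`, `add`, `halt`), the tuple encoding and the function name `run` are all hypothetical, and every instruction is assumed to occupy a single memory slot, so incrementing the PC means adding 1.

```python
def run(program: list[tuple], registers: dict[str, int]) -> dict[str, int]:
    """Hypothetical toy CPU: repeats fetch, decode, execute until 'halt'."""
    pc = 0                                  # Program Counter
    while True:
        instruction = program[pc]           # 1) fetch the instruction at address PC...
        pc += 1                             #    ...then add its size (1 slot) to PC
        opcode, *operands = instruction     # 2) decode
        if opcode == "set":                 # 3) execute
            reg, value = operands
            registers[reg] = value
        elif opcode == "add":
            dst, src = operands
            registers[dst] += registers[src]
        elif opcode == "halt":
            return registers
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

program = [
    ("set", "reg0", 2),
    ("set", "reg1", 3),
    ("add", "reg0", "reg1"),
    ("halt",),
]
print(run(program, {}))  # {'reg0': 5, 'reg1': 3}
```

Here the instructions always execute in address order; only instructions that write to the PC can change this.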
However, the next instruction is not always the one located next to the current instruction. Consider the function min in Example 1.1, written in C, which returns the smallest of its two arguments.
EXAMPLE 1.1.–

C
int min(int a, int b) {
  if (a < b) return (a);
  else return (b);
}
This function may be translated, intuitively and naively, into elementary instructions, by first placing a and b into registers, then comparing them:
min:    load a, reg0
        load b, reg1
        compare reg0, reg1
Depending on the result of the test – true or false – different continuations are considered. Execution continues using instructions for one or the other of these continuations: we therefore have two possible control paths. In this case, a conditional jump instruction must be used to modify the PC value, when required, to select the first instruction of one of the two possible paths.
        branchgt a_gt_b
        load reg0, reg2
        jump end
a_gt_b: load reg1, reg2
end:    return reg2
If the result of the compare instruction is that reg0 > reg1, the branchgt instruction loads the location of the instruction at label a_gt_b into the PC, so the next instruction is the one found at this address: load reg1, reg2. Otherwise, the PC is left unchanged and the next instruction is the one following branchgt: load reg0, reg2. This is followed by the unconditional jump instruction, jump, which loads the PC with the address of the end label. Thus, whatever the result of the comparison, execution finishes with the instruction return reg2.
Conditional branching requires the use of a specific memory to determine whether certain conditions have been satisfied by the execution of the previous instruction (overflow, positive result, zero result, greater-than, etc.). Every CPU contains a dedicated register, the State Register (SR), in which every bit is assigned to signaling one of these conditions. Executing most instructions may modify all or some of the bits in the register. Conditional instructions (both jumps and more “exotic” variants) use the appropriate bit values for execution. Certain ARM® architectures [ARM10] even permit all instructions to be intrinsically conditional.
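The pseudo-assembly translation of min can be executed on a toy machine that makes the PC and one bit of the SR explicit. This is a sketch under assumptions: the tuple encoding, the label table and the name `run_min` are hypothetical, and `load` is assumed to read from a memory cell (`a`, `b`) or from a register (`reg0`, ...) depending on its first operand.

```python
def run_min(a: int, b: int) -> int:
    # Hypothetical encoding of the pseudo-assembly of min (address: instruction).
    program = [
        ("load", "a", "reg0"),          # 0  min:
        ("load", "b", "reg1"),          # 1
        ("compare", "reg0", "reg1"),    # 2
        ("branchgt", "a_gt_b"),         # 3
        ("load", "reg0", "reg2"),       # 4
        ("jump", "end"),                # 5
        ("load", "reg1", "reg2"),       # 6  a_gt_b:
        ("return", "reg2"),             # 7  end:
    ]
    labels = {"a_gt_b": 6, "end": 7}    # label -> address
    memory = {"a": a, "b": b}
    regs: dict[str, int] = {}
    sr_gt = False                       # the "greater than" bit of the SR
    pc = 0                              # Program Counter
    while True:
        opcode, *args = program[pc]     # fetch and decode
        pc += 1                         # by default, continue with the next address
        if opcode == "load":
            src, dst = args
            # the source is either a register (reg0, ...) or a memory cell (a, b)
            regs[dst] = regs[src] if src.startswith("reg") else memory[src]
        elif opcode == "compare":
            sr_gt = regs[args[0]] > regs[args[1]]   # sets the SR bit
        elif opcode == "branchgt":
            if sr_gt:
                pc = labels[args[0]]    # conditional jump: PC modified only if the bit is set
        elif opcode == "jump":
            pc = labels[args[0]]        # unconditional jump
        elif opcode == "return":
            return regs[args[0]]
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

print(run_min(3, 5), run_min(5, 3))  # 3 3
```

Both calls return 3, but through different control paths: only `run_min(5, 3)` sets the SR bit and takes the conditional jump to `a_gt_b`.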
Every program is made up of functions that can be called at different points in the program, and these calls can be nested. When a function is called, the point where execution should resume once the execution of the function is completed – the return address – must be recorded. Consider a program made up of the functions g() = k() + h() and f() = g() + h(), featuring several function calls, some of which are nested.
g() =
  t11 = k()
  t12 = h()
  return t11 + t12

f() =
  v11 = g()
  v12 = h()
  return v11 + v12
A single register is not sufficient to record the return addresses of the different calls. Calling k from g must be followed by calling h to evaluate t12. But this call of g was made by f, so its return address in f must also be recorded, to allow the subsequent evaluation of v12. The number of return addresses to record increases with the number of nested calls, and decreases as we leave these calls, suggesting very naturally that these addresses should be saved in a stack. Figure 1.3 shows the evolution of a stack structure during successive function calls, demonstrating the need to record multiple return addresses. The state of the stack is shown at every step of the execution, at the moment where the line in the program is being executed.
A dedicated register, the Stack Pointer (SP), always contains the address of the next free slot in the stack (or, alternatively, the address of the last slot used). Thus, in the case of nested calls, the return address is saved at the address indicated by the SP, and the SP is incremented by the size of this address. When the function returns, the PC is loaded with the saved address from the stack, and the SP is decremented accordingly.
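The role of the stack and of the SP during the nested calls of f, g, h and k can be simulated in Python. This sketch assumes, for illustration only, that k() returns 1 and h() returns 2. The instruction names (`call`, `ret_const`, `ret_add`) and the per-call dictionaries of local variables are hypothetical additions; a return address is modeled as a pair (function, index of the next instruction), and SP counts the occupied slots, i.e. it designates the next free slot.

```python
# Hypothetical program: k() returns 1 and h() returns 2 (illustrative values).
CODE = {
    "k": [("ret_const", 1)],
    "h": [("ret_const", 2)],
    "g": [("call", "k", "t11"), ("call", "h", "t12"), ("ret_add", "t11", "t12")],
    "f": [("call", "g", "v11"), ("call", "h", "v12"), ("ret_add", "v11", "v12")],
}

def run(entry: str) -> tuple[int, list[int]]:
    stack: list[tuple[str, int]] = []    # saved return addresses
    sp = 0                               # Stack Pointer: next free slot
    trace: list[int] = []                # SP value after each call or return
    frames: list[dict[str, int]] = [{}]  # local variables of each active call
    pc = (entry, 0)                      # Program Counter: (function, index)
    while True:
        func, index = pc
        op, *args = CODE[func][index]
        if op == "call":
            stack.append((func, index + 1))  # return address saved at SP
            sp += 1                          # SP moves to the next free slot
            frames.append({})
            pc = (args[0], 0)                # jump to the first instruction of the callee
        else:
            local = frames.pop()
            value = args[0] if op == "ret_const" else local[args[0]] + local[args[1]]
            if sp == 0:                      # returning from the entry function
                return value, trace
            sp -= 1
            pc = stack.pop()                 # PC reloaded with the saved address
            caller, resume = pc
            target = CODE[caller][resume - 1][2]  # variable receiving the result
            frames[-1][target] = value
        trace.append(sp)

result, sp_trace = run("f")
print(result, sp_trace)  # 5 [1, 2, 1, 2, 1, 0, 1, 0]
```

The SP reaches 2 exactly when k or h runs inside g, called from f: two return addresses must then be kept at the same time, which a single register could not hold.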
In summary, the internal state of a microprocessor is made up of its general registers, the program counter, the state register and the stack pointer. Note, however, that this is a highly simplified vision. There are many different varieties of microprocessors with different internal architectures and/or instruction sets (for example, some do not possess an integer division instruction). Thus, a program written directly using the instruction set of a microprocessor will not be executable using another model of microprocessor, and it will need to be rewritten. The portability of programs written in the assembly language of a given microprocessor is practically nil. High-level languages respond to this problem by providing syntactic constructs, which are independent of the target microprocessors. The compiler or the interpreter has to translate these constructs into the language used by the microprocessor.
1.1.4. Peripheral devices
As we saw in section 1.1.3, processors execute a constant cycle of fetching, decoding and executing instructions. Computations are carried out using data stored in the memory, either by the program itself or by an input/output mechanism. The results of computations are also stored in the memory, and may be returned to users using this input/output mechanism.
The usefulness of any programmable system is inherently dependent on the input/output capacities through which the system reacts to the external environment and may act on this environment. Even an assembly robot in a car factory, which repeats the same actions again and again, must react to data input from the environment. For example, the pressure of the grip mechanism must stop increasing once it has caught a bolt, and the time it takes to do this will differ depending on the exact position of the bolt.
Input/output systems operate using peripherals, ancillary devices that may be electronic, mechanical or a combination of the two. These allow the microprocessor to acquire external information, and to transmit information to the exterior. Computer mice, screens and keyboards are peripherals used with desktop computers, but other elements such as motors, analog/digital acquisition cards, etc. are also peripherals.
If peripherals are present, the microprocessor needs to devote part of its processing time to data acquisition and to the transmission of computed results. This interaction with peripherals may be directly integrated into programs. But in this case, the programs have to integrate regular checking of input peripherals to see if new information is available. It is technically difficult (if not impossible) to include such monitoring in every program. Furthermore, regular peripheral checks are a waste of time and energy if no new data is available. Finally, there is no guarantee that information would arrive exactly at the moment of checking, as data may be emitted asynchronously.
This problem can be avoided by relying on the hardware to indicate the occurrence of new external events, instead of using software to check for these events. The interrupt mechanism is used to interrupt the execution of the current code and to launch the interrupt handler associated with the external event. This handler is a section of code, which is not explicitly called by the program being executed; it is located at an address known by the microprocessor. As any program may be interrupted at any point, the processor state, and notably the registers, must be saved before processing the interrupt. The code that is executed to process the interrupt will indeed use the registers and modify the SR, SP and PC. Therefore, previous values of registers must be restored in order to resume execution of the interrupted code. This context saving is carried out partially by the hardware and partially by the software.
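Context saving and restoring around an interrupt handler can be sketched as follows. The register names, their values and the handler `keyboard_handler` are hypothetical, and this sketch does in software what real machines share between hardware and software. It only shows that, whatever the handler does to the registers, the interrupted code resumes with its saved context, while the handler's result is left in memory.

```python
def handle_interrupt(cpu: dict[str, int], memory: list[int], handler) -> None:
    saved = dict(cpu)       # context saving: copy every register (PC, SP, SR, ...)
    handler(cpu, memory)    # the handler may use and modify any register
    cpu.clear()
    cpu.update(saved)       # context restoring before resuming the interrupted code

def keyboard_handler(cpu: dict[str, int], memory: list[int]) -> None:
    cpu["PC"] = 0x8000          # hypothetical address of the handler code
    cpu["reg0"] = ord("x")      # hypothetical key code read from the peripheral
    cpu["SR"] = 0               # flags overwritten by the handler's own instructions
    memory.append(cpu["reg0"])  # the result is kept in memory, not in registers

cpu = {"PC": 0x0104, "SP": 0x7FF0, "SR": 0b0100, "reg0": 42}
before = dict(cpu)
input_buffer: list[int] = []
handle_interrupt(cpu, input_buffer, keyboard_handler)
print(cpu == before, input_buffer)  # True [120]
```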
1.2. Computers: a high-level view
The low-level vision of a von Neumann machine presented in section 1.1 provides a good overview of the components of a computer and of program execution, without going into detail concerning the operations of electronic components. However, this view is not particularly helpful in the context of everyday programming activity. Programs in binary code, or even assembly code, are difficult to write as they need to take account of every detail of execution; they are, by nature, long and hard to review, understand and debug. The first “high-level” programming languages emerged very shortly after the first computers. These languages assign names to certain values and addresses in the memory, providing a set of instructions that can be split into low-level machine instructions. In other terms, programming languages offer an abstract vision of the computer, enabling users to ignore low-level details while writing a program. The “hello world” program in Figure 1.2 clearly demonstrates the power of abstraction of C compared to the x86 assembly language.
1.2.1. Modeling computations
Any program is simply a description, in its own programming language, of a series of computations (including reading and writing), which are the only operations that a computer can carry out. An abstract view of a computer requires an abstract view – we call it a model – of the notion of computation. This subject was first addressed well before the emergence of computers, in the late 19th century, by logicians, mathematicians and philosophers, who introduced a range of different approaches to the theory of calculability.
The Turing machine [TUR95] is a mathematical model of computation introduced in 1936. This machine operates on an infinite memory tape divided into cells and has three instructions: move one cell of the tape right or left, write or read a symbol in the cell or compare the contents of two cells. It has been formally proven that any “imperative” programming language, featuring assignment, a conditional instruction and a while loop, has the same power of expression as this Turing machine.
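The idea of a machine driven by a finite table of rules acting on an unbounded tape can be made concrete with a short simulator. Note that this sketch uses the usual transition-table formulation, (state, symbol read) → (symbol to write, head move, next state), rather than the three instructions listed above; the rules, which add 1 to a binary number, and the names `RULES` and `run_turing` are illustrative assumptions.

```python
# Illustrative rules: add 1 to a binary number written on the tape.
# (state, symbol read) -> (symbol to write, head move, next state); "_" is blank.
RULES = {
    ("carry", "1"): ("0", -1, "carry"),  # 1 + 1 = 0, the carry moves left
    ("carry", "0"): ("1", 0, "halt"),    # 0 + 1 = 1, no more carry
    ("carry", "_"): ("1", 0, "halt"),    # past the leftmost digit: new digit 1
}

def run_turing(tape_text: str) -> str:
    tape = dict(enumerate(tape_text))    # sparse tape, unbounded in both directions
    head = len(tape_text) - 1            # start on the rightmost digit
    state = "carry"
    while state != "halt":
        symbol = tape.get(head, "_")     # read
        write, move, state = RULES[(state, symbol)]
        tape[head] = write               # write
        head += move                     # move the head
    cells = range(min(tape), max(tape) + 1)
    return "".join(tape.get(i, "_") for i in cells).strip("_")

print(run_turing("1011"))  # 1100
print(run_turing("111"))   # 1000
```

The simulator itself is written with assignment, a conditional and a while loop, which gives an informal feel for why such an imperative language can express whatever such a machine computes.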
Several other models of the notion of algorithmic computation were introduced over the course of the 20th century, and have been formally proven to be equivalent to the Turing machine. One notable example is Kleene’s recursion theory [KLE52], the basis for the “pure functional” languages, based on the notion of (potentially) recursive functions; hence, these languages also have the same power of expression as the Turing machine. Pure functional and imperative languages have developed in parallel throughout the history of high-level programming, leading to different programming styles.
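The two programming styles can be compared on a single function. In this minimal sketch (the function names are illustrative), the imperative version relies on assignment and a while loop, while the functional version relies only on a recursive definition; both compute the same values.

```python
def factorial_imperative(n: int) -> int:
    # imperative style: the contents of two variables change over time
    result = 1
    while n > 1:
        result = result * n
        n = n - 1
    return result

def factorial_recursive(n: int) -> int:
    # functional style: no assignment, only a recursive equation
    return 1 if n <= 1 else n * factorial_recursive(n - 1)

print([factorial_imperative(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
print([factorial_recursive(n) for n in range(6)])   # [1, 1, 2, 6, 24, 120]
```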
1.2.2. High-level languages
Broadly speaking, the execution of a functional program carries out a series of function calls that lead to the result, with intermediate values stored exclusively in the registers. The execution of an imperative program carries out a sequence of modifications of memory cells named by identifiers, the values in the cells being computed during execution. The most widespread high-level languages include both functional and imperative features, along with various possibilities (modules, object features, etc.) to divide source code into pieces that can be reused.
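Python supports both styles, so the contrast can be sketched in that language. The function names below are our own. Both functions compute 0 + 1 + ... + n. The first uses only recursive function calls and never assigns a variable. The second repeatedly modifies two named memory cells inside a while loop. CPython limits recursion depth, so the recursive version is only suitable for small n.

```python
def sum_to_functional(n):
    """Pure functional style: no assignment, the result comes from recursive calls."""
    return 0 if n == 0 else n + sum_to_functional(n - 1)


def sum_to_imperative(n):
    """Imperative style: the cells named total and i are modified by a while loop."""
    total = 0
    i = 1
    while i <= n:
        total = total + i
        i = i + 1
    return total


print(sum_to_functional(10), sum_to_imperative(10))  # 55 55
```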
Whatever the style of programming used, any program written in a high-level language needs to be translated into binary language to be executed. This translation is carried out either every time the program is executed, in which case the translation program is known as an interpreter, or just once, with the produced binary code being stored, in which case the translator is known as a compiler.
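The difference between the two can be illustrated with a minimal Python sketch over a toy expression language. The sketch assumes a hypothetical encoding of expressions as nested tuples.

- interpret walks the tree again on every execution, as an interpreter does.
- compile_expr translates the tree once into a Python closure, which then plays the role of the stored binary code.

Compiling to closures is only a stand-in for "translate once, run many times". It is not how a real compiler produces machine code.

```python
# Hypothetical toy AST: ("num", n), ("var", name), ("add", e1, e2), ("mul", e1, e2).

def interpret(expr, env):
    """Interpreter: re-examine the tree on every evaluation."""
    tag = expr[0]
    if tag == "num":
        return expr[1]
    if tag == "var":
        return env[expr[1]]
    left, right = interpret(expr[1], env), interpret(expr[2], env)
    return left + right if tag == "add" else left * right  # tag is "add" or "mul"


def compile_expr(expr):
    """Compiler (sketch): translate the tree once into a Python function."""
    tag = expr[0]
    if tag == "num":
        value = expr[1]
        return lambda env: value
    if tag == "var":
        name = expr[1]
        return lambda env: env[name]
    left, right = compile_expr(expr[1]), compile_expr(expr[2])
    if tag == "add":
        return lambda env: left(env) + right(env)
    return lambda env: left(env) * right(env)  # tag is "mul"


program = ("mul", ("add", ("var", "x"), ("num", 1)), ("num", 3))  # (x + 1) * 3
compiled = compile_expr(program)  # the translation is done once, here
for x in range(3):
    print(interpret(program, {"x": x}), compiled({"x": x}))  # 3 3, then 6 6, then 9 9
```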
As we have seen, high-level languages facilitate the coding of algorithms. They ease reviewing of the source code of a program, as the text is more concise than it would be for the same algorithm in assembly code. This does not, however, imply that users gain a better understanding of the way the program works. Precise knowledge of the constructs used (in other words, their semantics: what they do and what they mean) is crucial both for writing a program and for understanding its source code. Bugs are not always the result of algorithm coding errors; they are often caused by an erroneous interpretation of elements of the language. For example, the increment operator ++ in C exists in two forms (i++ and ++i), and understanding it is not as simple as it may seem. Consider the program:
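```c
#include <stdio.h>

int main() {
  int i = 0;
  /* i++ evaluates to the current value of i (here 0), then increments i */
  printf("%d\n", i++);
  return (0);
}
```
This program will print 0, but if i++ is replaced with ++i, the same program will print 1.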
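Python has no ++ operator. The difference between the two C forms can still be stated precisely by modeling memory as a dictionary from names to values. In the following sketch, the helper names post_increment and pre_increment are hypothetical. Both functions update the same memory cell. They differ only in the value they hand back to the enclosing expression, which is exactly what printf receives in the C program.

```python
def post_increment(memory, name):
    """Model of C's name++: increment the cell, but yield its OLD value."""
    old = memory[name]
    memory[name] = old + 1
    return old


def pre_increment(memory, name):
    """Model of C's ++name: increment the cell and yield its NEW value."""
    memory[name] = memory[name] + 1
    return memory[name]


memory = {"i": 0}
print(post_increment(memory, "i"), memory["i"])  # 0 1
memory = {"i": 0}
print(pre_increment(memory, "i"), memory["i"])   # 1 1
```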
There are a number of concepts that are common to all high-level languages: value naming, organization of namespaces, explicit memory management, etc. However, these concepts may be expressed using different syntactic constructs. The field of language semantics covers a set of logico-mathematical theories that describe these concepts and their properties. Constructing the semantics of a program allows formal verification of whether the program possesses all of the required properties.
1.2.3. From source code to executable programs
The transition from the program source to its execution is a multistep process. Some of these steps may differ from one language to another. In this section, we shall give an overview of the main steps involved in analyzing and transforming source code, applicable to most programming languages.
The source code of a program is made up of one or more text files. Indeed, to ease the structuring of software, most languages allow source code to be split across several files, known as compilation units. Each file is processed separately prior to the final phase, in which the results of processing are combined into one single executable file.
1.2.3.1. Lexical analysis
Lexical analysis is the first phase of translation: it converts the sequence of characters that makes up the source file into a sequence of words, assigning each to a category. Comments are generally deleted at this stage. Thus, in the following text, presumed to be written in C:
```c
/* This is a comment. */ if [x == 3 int +) cos($v)
```
lexical analysis will recognize the keyword if, the opening square bracket, the identifier x, the operator ==, the integer constant 3, the type identifier int, etc. No word in standard C can contain the character $, so a lexical error will be reported when $v is encountered.
Lexical analysis may be seen as a form of “spell check”, in which each recognized word is assigned to a category (keyword, constant, identifier, etc.). These words are referred to as tokens.
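The following minimal Python sketch of a lexical analyzer uses the standard re module. It assumes a small, hypothetical subset of C's lexical categories. The category names and patterns are our own simplification, not the lexical grammar of the C standard. Comments and white space are discarded, and every other recognized word becomes a (category, word) token. A character such as $ that fits no category triggers a lexical error.

```python
import re

# Hypothetical, deliberately small subset of C's lexical categories.
# Patterns are tried in order, so keywords and type names come before identifiers.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),
    ("SPACE", r"\s+"),
    ("KEYWORD", r"\b(?:if|else|while|return)\b"),
    ("TYPE", r"\b(?:int|char|float)\b"),
    ("INT_CONST", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"==|[+\-*/=<>]"),
    ("DELIMITER", r"[()\[\]{};,]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC), re.DOTALL
)


class LexicalError(Exception):
    """Raised when no token category matches the input."""


def tokenize(text):
    """Turn source text into (category, word) tokens; comments and spaces are dropped."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise LexicalError(f"lexical error at position {pos}: {text[pos]!r}")
        if match.lastgroup not in ("COMMENT", "SPACE"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("if [x == 3 int +)"))
# [('KEYWORD', 'if'), ('DELIMITER', '['), ('IDENTIFIER', 'x'), ...]
try:
    tokenize("/* This is a comment. */ if [x == 3 int +) cos($v)")
except LexicalError as err:
    print(err)  # reports the '$' character
```

The lexer accepts "if [x == 3 int +)" without complaint. Checking that these tokens form a meaningful phrase is left to syntactic analysis.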
1.2.3.2. Syntactic analysis
Every language follows a grammar. For example, in English, a sentence is generally considered to be correctly formed if it contains a subject, verb and complement in an understandable order. Programming languages are no exception: syntactic analysis verifies that the phrases of a source file conform with the grammar of their language. For example, in C, the keyword if must be followed by a parenthesized expression, an instruction must end with a semicolon, etc. Clearly, the source text used above to illustrate lexical analysis does not respect the syntax of C.
Technically, the syntactic analyzer is in charge of the complete grammatical analysis of the source file. It calls the lexical analyzer every time it requires a token to progress through the analyzed source. Syntactic analysis is thus a form of grammar verification, and it also builds a representation of the source file as a data structure, which is often a tree, called the abstract syntax tree (AST). This data structure will be used by all the following phases of compilation, up to the point of execution by an interpreter or the creation of an executable file.
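The following minimal recursive-descent parser, written in Python, works on a hypothetical toy grammar of sums and products rather than on C. The lexer is a generator, so the parser requests one token at a time, as described above. The parser builds the AST as nested tuples, which is an encoding chosen for illustration. It raises ParseError when the token sequence does not follow the grammar.

```python
import re

# Hypothetical toy grammar (not C's):
#   expr   := term ("+" term)*
#   term   := factor ("*" factor)*
#   factor := number | name | "(" expr ")"
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


class ParseError(Exception):
    """Raised when the token sequence does not follow the grammar."""


def lexer(text):
    """Generator: produce the next token only when the parser asks for it."""
    text, pos = text.rstrip(), 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        number, name, symbol = match.groups()
        pos = match.end()
        if number is not None:
            yield ("num", int(number))
        elif name is not None:
            yield ("var", name)
        else:
            yield ("sym", symbol)
    yield ("eof", None)


class Parser:
    def __init__(self, text):
        self.tokens = lexer(text)
        self.current = next(self.tokens)  # one token of look-ahead

    def advance(self):
        self.current = next(self.tokens)  # call the lexer for the next token

    def expect(self, symbol):
        if self.current != ("sym", symbol):
            raise ParseError(f"expected {symbol!r}, found {self.current[1]!r}")
        self.advance()

    def parse(self):
        tree = self.expr()
        if self.current[0] != "eof":
            raise ParseError(f"unexpected {self.current[1]!r}")
        return tree

    def expr(self):
        tree = self.term()
        while self.current == ("sym", "+"):
            self.advance()
            tree = ("add", tree, self.term())
        return tree

    def term(self):
        tree = self.factor()
        while self.current == ("sym", "*"):
            self.advance()
            tree = ("mul", tree, self.factor())
        return tree

    def factor(self):
        kind, value = self.current
        if kind in ("num", "var"):
            self.advance()
            return (kind, value)
        self.expect("(")
        tree = self.expr()
        self.expect(")")
        return tree


print(Parser("(x + 1) * 3").parse())
# ('mul', ('add', ('var', 'x'), ('num', 1)), ('num', 3))
```

The grammar has one function per rule, and the calls between them encode operator precedence: "1 + 2 * 3" yields an "add" node whose right child is the "mul" node.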
1.2.3.3. Semantic analyses
The first two analysis phases of compilation only concern the textual structure of the source. They do not concern the meaning of the program, i.e. its semantics. Source texts that pass the syntactic analysis phase do not always have meaning. The sentence “the sea eats a differentiable rabbit” is grammatically correct, but is evidently nonsense.
The best-known semantic analysis is type analysis (typing), which prohibits the combination of elements that are incompatible in nature. Thus, in the previous sentence, “differentiable” could be applicable to a function, but certainly not to a “rabbit”.
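The following minimal Python sketch shows a type analysis for a hypothetical toy language of integers, booleans, addition, comparison and conditionals. Its nested-tuple AST encoding is also our own assumption. The checker computes the type of each expression without running it. It rejects programs that are syntactically well formed but combine incompatible values, such as adding an integer to a boolean.

```python
# Hypothetical toy AST: ("num", n), ("bool", b), ("add", e1, e2),
# ("lt", e1, e2), ("if", condition, then_branch, else_branch).

class TypingError(Exception):
    """Raised when a program combines values of incompatible types."""


def type_of(expr):
    """Return "int" or "bool" for a well-typed expression, else raise TypingError."""
    tag = expr[0]
    if tag == "num":
        return "int"
    if tag == "bool":
        return "bool"
    if tag in ("add", "lt"):
        left, right = type_of(expr[1]), type_of(expr[2])
        if (left, right) != ("int", "int"):
            raise TypingError(f"{tag} expects two ints, got {left} and {right}")
        return "int" if tag == "add" else "bool"
    if tag == "if":
        if type_of(expr[1]) != "bool":
            raise TypingError("the condition of an if must be a bool")
        then_type, else_type = type_of(expr[2]), type_of(expr[3])
        if then_type != else_type:
            raise TypingError(f"if branches differ: {then_type} vs {else_type}")
        return then_type
    raise TypingError(f"unknown construct {tag!r}")


ok = ("if", ("lt", ("num", 1), ("num", 2)), ("num", 10), ("num", 20))
bad = ("add", ("num", 1), ("bool", True))  # well formed, but meaningless
print(type_of(ok))  # int
try:
    type_of(bad)
except TypingError as err:
    print("rejected:", err)  # rejected: add expects two ints, got int and bool
```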
Semantic analyses are not limited to type analysis, but they all interpret the constructs of a program according to the semantics of the chosen language. Semantic analyses may be used to eliminate programs that would lead to execution errors. They may also apply some transformations to program code in order to get an executable file (dependency analysis, closure elimination, etc.). These semantic analyses may be carried out during subsequent passes of source code processing, even after the code generation phase described in the following section.
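Type analysis is not the only semantic analysis. The following minimal Python sketch shows another one, a scope analysis, for a hypothetical toy language with let bindings. It computes the free variables of a program: the names that are used but not defined by any enclosing let. It then rejects the program before execution if any remain, because evaluating such a program would fail when it reads an undefined name. The AST encoding and the function names are assumptions made for this illustration.

```python
# Hypothetical toy AST: ("num", n), ("var", name), ("add", e1, e2),
# ("let", name, bound_expr, body)  meaning  let name = bound_expr in body.

class ScopeError(Exception):
    """Raised when a program uses a name that is never defined."""


def free_variables(expr):
    """Names used in expr that no enclosing let defines."""
    tag = expr[0]
    if tag == "num":
        return set()
    if tag == "var":
        return {expr[1]}
    if tag == "add":
        return free_variables(expr[1]) | free_variables(expr[2])
    if tag == "let":
        _, name, bound, body = expr
        # name is visible in body only, not in its own definition
        return free_variables(bound) | (free_variables(body) - {name})
    raise ValueError(f"unknown construct {tag!r}")


def check_closed(program):
    """Reject, before any execution, a program that would read an undefined name."""
    unbound = free_variables(program)
    if unbound:
        raise ScopeError(f"undefined name(s): {', '.join(sorted(unbound))}")
    return program


good = ("let", "x", ("num", 1), ("add", ("var", "x"), ("var", "x")))
bad = ("let", "x", ("num", 1), ("add", ("var", "x"), ("var", "y")))
check_closed(good)  # accepted
try:
    check_closed(bad)
except ScopeError as err:
    print(err)  # undefined name(s): y
```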
1.2.3.4. Code interpretation/generation
Once the abstract syntax tree (or a derived tree) has been created, there are two options. Either the tree may be executed directly via an interpreter, which is a program supplied by the programming language, or the AST is used to generate object code files, with the aim of creating an executable file that can be run independently. Let us first focus on the second approach. The interpretation mechanism will be discussed later.
Compilation uses the AST generated from the source file to produce a sequence of instructions to be executed either by the CPU or by a virtual machine (VM). The compilation is correct if the execution of this sequence of instructions gives a result that conforms to the program’s semantics.
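The following minimal Python sketch illustrates code generation and the correctness criterion just stated. It uses a hypothetical toy AST of numbers, additions and multiplications, and a hypothetical stack machine with three instructions (PUSH, ADD, MUL). The function evaluate plays the role of the reference semantics. The function generate produces an instruction sequence, and execute runs it. The final check compares the two results for one program; it is a test, not a proof of correctness.

```python
# Hypothetical toy AST: ("num", n), ("add", e1, e2), ("mul", e1, e2).

def evaluate(expr):
    """Reference semantics: the value an expression denotes."""
    tag = expr[0]
    if tag == "num":
        return expr[1]
    left, right = evaluate(expr[1]), evaluate(expr[2])
    return left + right if tag == "add" else left * right


def generate(expr):
    """Code generation: AST -> instruction list for a hypothetical stack machine."""
    tag = expr[0]
    if tag == "num":
        return [("PUSH", expr[1])]
    # Post-order: code for both operands first, then the operator itself.
    return generate(expr[1]) + generate(expr[2]) + [("ADD",) if tag == "add" else ("MUL",)]


def execute(code):
    """Run the instructions; the result is left on top of the stack."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "ADD" else left * right)
    return stack.pop()


program = ("mul", ("add", ("num", 2), ("num", 3)), ("num", 4))  # (2 + 3) * 4
code = generate(program)
print(code)  # [('PUSH', 2), ('PUSH', 3), ('ADD',), ('PUSH', 4), ('MUL',)]
assert execute(code) == evaluate(program) == 20  # agrees with the semantics here
```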
Optimization phases may take place during or after object code generation, with the aim of improving its compactness or its execution speed. Modern compilers implement a range of optimizations, whose study lies outside the scope of this book. Certain optimizations are “universal”, while others may be specific to the CPU for which the code is generated.
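Constant folding is one common optimization that does not depend on the target CPU. Operations whose operands are all known at compile time are computed once by the compiler, instead of at every execution. The following minimal Python sketch applies constant folding to a hypothetical toy AST of numbers, variables, additions and multiplications.

```python
# Hypothetical toy AST: ("num", n), ("var", name), ("add", e1, e2), ("mul", e1, e2).

def fold_constants(expr):
    """Return an equivalent tree where operations on literal numbers are precomputed."""
    tag = expr[0]
    if tag in ("num", "var"):
        return expr
    left, right = fold_constants(expr[1]), fold_constants(expr[2])
    if left[0] == "num" and right[0] == "num":
        value = left[1] + right[1] if tag == "add" else left[1] * right[1]
        return ("num", value)
    return (tag, left, right)


# x * (2 + 3) becomes x * 5: the addition is performed once, at compile time.
print(fold_constants(("mul", ("var", "x"), ("add", ("num", 2), ("num", 3)))))
# ('mul', ('var', 'x'), ('num', 5))
```

The transformation must preserve the program's semantics. Here this holds because folding only replaces a subtree with the value that subtree would produce anyway.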
The object code produced by the compiler may be either binary code encoding instructions directly or source text in assembly code. In the latter case, a program known as the assembler must be called to transform this low-level source code into binary code. Generally speaking, assemblers simply produce a mechanical translation of instructions written mnemonically (mov, add, jmp, etc.) into binary representations. However, certain more sophisticated assemblers may also carry out optimization operations at this level.
Assembling mnemonic code into binary code is a very simple operation, which does not alter the structure of the program. The reference manual of the target CPU provides, for each instruction, the meaning of the bits of the corresponding binary word. For example, the reference manual for the MIPS32® architecture [MIP13] describes the 32-bit binary format of the instruction ADD rd, rs, rt (with the effect rd ← rs + rt on the registers) as shown in Figure 1.4.
Figure 1.4. Coding the ADD instruction in MIPS32®
Three fields of 5 bits are reserved for encoding the register numbers (enough to designate each of the 32 general-purpose registers); the other 17 bits of this word are fixed and encode the instruction. The task of the assembler is to generate such bit patterns according to the instructions encountered in the source code.
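The assembler's job for this one instruction can be sketched in Python. The sketch assumes the standard MIPS32 R-type layout for ADD, from the most to the least significant bit:

- the SPECIAL opcode 000000 (6 bits);
- rs, rt and rd (5 bits each);
- five zero bits;
- the function code 100000 (6 bits).

These field values should be checked against [MIP13]. The function name encode_mips_add is hypothetical.

```python
def encode_mips_add(rd, rs, rt):
    """Encode MIPS32 ADD rd, rs, rt as a 32-bit word (assumed R-type layout).

    SPECIAL (6 bits) | rs (5) | rt (5) | rd (5) | 00000 (5) | ADD function (6)
    """
    for reg in (rd, rs, rt):
        if not 0 <= reg < 32:
            raise ValueError(f"register number out of range: {reg}")
    SPECIAL, SHAMT, FUNCT_ADD = 0b000000, 0b00000, 0b100000  # the fixed bits
    return (SPECIAL << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (SHAMT << 6) | FUNCT_ADD


# ADD $8, $9, $10: register 8 receives register 9 + register 10
word = encode_mips_add(rd=8, rs=9, rt=10)
print(f"{word:032b}")   # 00000001001010100100000000100000
print(f"{word:#010x}")  # 0x012a4020
```

Only the three register fields vary from one ADD instruction to another. This is why assembling is a purely mechanical translation.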
1.2.3.5. Linking
A single program may be made up of several source files, compiled separately. Once the object code from each source file has been produced, all these codes must be collected into a single executable file. Each object file includes “holes”, indicating information that was unknown at the moment this object code was produced. It is important to know where to find this missing code when calling functions defined in a different compilation unit, or where to find variables defined in a location outside of the current unit.
The linker has to gather all the object files and fill all the holes. Evidently, for a set of object files to lead to an executable file, all holes must be filled, so the code of every function called in the source must be available. The linking process also has to integrate the needed code if it comes from libraries, whether from the standard language library or a third-party library. There is one final question to answer, concerning the point at which execution should begin. In certain languages (such as C, C++ and Java), the source code must contain one, and only one, special function, often named main, which is called to start the execution. In other languages (such as Python and OCaml), definitions are executed in the order in which they appear, as defined by the file ordering during the linking process. Thus, “executing” the definition of a function does not call the function: instead, the “value” of this function is created and stored to be used later when the function is called. This means that programmers have to insert into the source file a call to the function that they consider to be the “starting point” of the execution. This call is usually the final instruction of the last source file processed by the linker.
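The following Python sketch models linking in a drastically simplified, hypothetical way. Each "object file" is a dictionary listing the symbols it defines and the symbols it references but does not define, which are its holes. Real object formats record relocations at precise addresses, which this sketch ignores. The linker gathers all definitions and checks that every hole can be filled. It also checks that an entry point named main exists, as in C.

```python
class LinkError(Exception):
    """Raised when the object files cannot be combined into an executable."""


def link(object_files, entry="main"):
    """Combine toy object files into a single symbol table.

    Hypothetical format for each object file:
      "defines":    {symbol: code}  symbols this unit provides
      "references": set of symbols  holes, i.e. symbols used but defined elsewhere
    """
    table = {}
    for obj in object_files:
        for symbol, code in obj["defines"].items():
            if symbol in table:
                raise LinkError(f"multiple definition of {symbol!r}")
            table[symbol] = code
    holes = set().union(*(obj["references"] for obj in object_files)) - set(table)
    if holes:
        raise LinkError(f"undefined reference(s): {', '.join(sorted(holes))}")
    if entry not in table:
        raise LinkError(f"no entry point {entry!r}")
    return table


main_o = {"defines": {"main": "<code of main>"}, "references": {"helper", "printf"}}
util_o = {"defines": {"helper": "<code of helper>"}, "references": set()}
libc = {"defines": {"printf": "<library code>"}, "references": set()}  # a library

print(sorted(link([main_o, util_o, libc])))  # ['helper', 'main', 'printf']
try:
    link([main_o, util_o])  # the library was forgotten
except LinkError as err:
    print(err)  # undefined reference(s): printf
```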
A simplified illustration of the different transformation passes involved in source code compilation is shown in Figure 1.5.
Figure 1.5. Compilation process
1.2.3.6. Interpretation and virtual machines
As we have seen, informally speaking, an interpreter “executes” a program directly from the AST. Furthermore, it was said that the code generation process may generate code for a virtual machine. In reality, interpreters rarely work directly on the tree; compilation to a virtual machine is often carried out as an intermediate stage. A virtual machine (VM) may be seen as a pseudo-microprocessor, with one or more stacks, registers and fairly high-level instructions. The code for a VM is often referred to as bytecode. In this case, compilation does not generate a file directly executable by the CPU. Execution is carried out by the virtual machine interpreter, a program supplied by the programming language environment. So, the difference between interpretation and compilation is not clear-cut.
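CPython, the reference implementation of Python, works this way. Its bytecode is an implementation detail, however, and the instruction names vary between versions. The following sketch uses the standard dis module to display the bytecode that CPython produced for a small function. The function average is our own example.

```python
import dis


def average(a, b):
    return (a + b) / 2


# CPython compiled the body to bytecode when the def statement was executed;
# the raw bytes are stored on the function's code object.
print(type(average.__code__.co_code))  # <class 'bytes'>

# Disassemble the bytecode into readable VM instructions (names vary by version).
for instruction in dis.get_instructions(average):
    print(instruction.offset, instruction.opname, instruction.argrepr)

# The bytecode is run by CPython's virtual machine interpreter, not by the CPU.
print(average(3, 5))  # 4.0
```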
There are several advantages to using a VM: the compiler no longer needs to take the specificities of the CPU into account, the code is often more compact and portability is higher. As long as the virtual machine interpreter is available on a computer, programs compiled to bytecode can be run on that computer without being recompiled. The drawback to this approach is that the programs obtained in this way are often slower than programs compiled as “native” machine code.