Short Overview
Foreword xxv
About the Author xxvii
Acknowledgements xxix
Preface xxxi
About the Companion Site xxxv
I Introduction 1
1 The Big Picture with Kondratiev and Kardashev 3
2 The Scientific Method and Data 7
3 Conventions 11
II Starting with R and Elements of Statistics 19
4 The Basics of R 21
5 Lexical Scoping and Environments 81
6 The Implementation of OO 87
7 Tidy R with the Tidyverse 121
8 Elements of Descriptive Statistics 139
9 Visualisation Methods 159
10 Time Series Analysis 197
11 Further Reading 211
32 R Markdown 699
33 knitr and LaTeX 703
34 An Automated Development Cycle 707
35 Writing and Communication Skills 709
36 Interactive Apps 713
VIII Bigger and Faster R 741
37 Parallel Computing 743
38 R and Big Data 761
39 Parallelism for Big Data 767
40 The Need for Speed 793
IX Appendices 819
A Create your own R package 821
B Levels of Measurement 829
C Trademark Notices 833
D Code Not Shown in the Body of the Book 839
E Answers to Selected Questions 845
Bibliography 859
Nomenclature 869
Index 881
Contents
Foreword xxv
About the Author xxvii
Acknowledgements xxix
Preface xxxi
About the Companion Site xxxv
I Introduction 1
1 The Big Picture with Kondratiev and Kardashev 3
2 The Scientific Method and Data 7
3 Conventions 11
II Starting with R and Elements of Statistics 19
4 The Basics of R 21
4.1 Getting Started with R 23
4.2 Variables 26
4.3 Data Types 28
4.3.1 The Elementary Types 28
4.3.2 Vectors 29
4.3.2.1 Creating Vectors 29
4.3.3 Accessing Data from a Vector 29
4.3.3.1 Vector Arithmetic 30
4.3.3.2 Vector Recycling 30
4.3.3.3 Reordering and Sorting 31
4.3.4 Matrices 32
4.3.4.1 Creating Matrices 32
4.3.4.2 Naming Rows and Columns 33
4.3.4.3 Access Subsets of a Matrix 33
4.6.1 Built-in Functions 69
4.6.2 Help with Functions 69
4.6.3 User-defined Functions 70
5 Lexical Scoping and Environments 81
5.1 Environments in R 81
5.2 Lexical Scoping in R 83
6 The Implementation of OO 87
6.1 Base Types 89
6.2 S3 Objects 91
6.2.1 Creating S3 Objects 94
6.2.2 Creating Generic Methods 96
6.2.3 Method Dispatch 97
6.2.4 Group Generic Functions 98
6.3 S4 Objects 100
6.3.1 Creating S4 Objects 100
6.3.2 Using S4 Objects 101
6.3.3 Validation of Input 105
6.3.4 Constructor functions 107
6.3.5 The .Data slot 108
6.3.6 Recognising Objects, Generic Functions, and Methods 108
6.3.7 Creating S4 Generics 110
6.3.8 Method Dispatch 111
6.4 The Reference Class, refclass, RC or R5 Model 113
6.4.1 Creating RC Objects 113
6.4.2 Important Methods and Attributes 117
6.5 Conclusions about the OO Implementation 119
7 Tidy R with the Tidyverse 121
7.1 The Philosophy of the Tidyverse 121
7.2 Packages in the Tidyverse 124
7.2.1 The Core Tidyverse 124
7.2.2 The Non-core Tidyverse 125
7.3 Working with the Tidyverse 127
7.3.1 Tibbles 127
7.3.2 Piping with R 132
7.3.3 Attention Points When Using the Pipe 133
7.3.4 Advanced Piping 134
7.3.4.1 The Dollar Pipe 134
7.3.4.2 The T-Pipe 135
7.3.4.3 The Assignment Pipe 136
7.3.5 Conclusion 137
8 Elements of Descriptive Statistics 139
8.1 Measures of Central Tendency 139
8.1.1 Mean 139
8.1.1.1 The Arithmetic Mean 139
8.1.1.2 Generalised Means 140
8.1.2 The Median 142
8.1.3 The Mode 143
8.2 Measures of Variation or Spread 145
8.3 Measures of Covariation 147
14.2.2 Creating the Database 228
14.2.3 Creating the Tables and Relations 229
14.3 Adding Data to the Database 235
14.4 Querying the Database 239
14.4.1 The Basic Select Query 239
14.4.2 More Complex Queries 240
14.5 Modifying the Database Structure 244
14.6 Selected Features of SQL 249
14.6.1 Changing Data 249
14.6.2 Functions in SQL 249
15 Connecting R to an SQL Database 253
IV Data Wrangling 257
16 Anonymous Data 261
17 Data Wrangling in the tidyverse 265
17.1 Importing the Data 266
17.1.1 Importing from an SQL RDBMS 266
17.1.2 Importing Flat Files in the Tidyverse 267
17.1.2.1 CSV Files 270
17.1.2.2 Making Sense of Fixed-width Files 271
17.2 Tidy Data 275
17.3 Tidying Up Data with tidyr 277
17.3.1 Splitting Tables 278
17.3.2 Convert Headers to Data 281
17.3.3 Spreading One Column Over Many 284
17.3.4 Split One Column into Many 285
17.3.5 Merge Multiple Columns Into One 286
17.3.6 Wrong Data 287
17.4 SQL-like Functionality via dplyr 288
17.4.1 Selecting Columns 288
17.4.2 Filtering Rows 289
17.4.3 Joining 290
17.4.4 Mutating Data 293
17.4.5 Set Operations 296
17.5 String Manipulation in the tidyverse 299
17.5.1 Basic String Manipulation 300
17.5.2 Pattern Matching with Regular Expressions 302
17.5.2.1 The Syntax of Regular Expressions 303
17.5.2.2 Functions Using Regex 308
17.6 Dates with lubridate 314
17.6.1 ISO 8601 Format 315
17.6.2 Time-zones 317
17.6.3 Extract Date and Time Components 318
17.6.4 Calculating with Date-times 319
17.6.4.1 Durations 320
17.6.4.2 Periods 321
17.6.4.3 Intervals 323
17.6.4.4 Rounding 324
17.7 Factors with Forcats 325
18 Dealing with Missing Data 333
19 Data Binning 343
23 Learning Machines 405
23.1 Decision Tree 407
23.1.1 Essential Background 407
23.1.1.1 The Linear Additive Decision Tree 407
23.1.1.2 The CART Method 407
23.1.1.3 Tree Pruning 408
23.1.1.4 Classification Trees 409
23.1.1.5 Binary Classification Trees 411
23.1.2 Important Considerations 412
23.1.2.1 Broadening the Scope 412
23.1.2.2 Selected Issues 413
23.1.3 Growing Trees with the Package rpart 414
23.1.3.1 Getting Started with the Function rpart() 414
23.1.3.2 Example of a Classification Tree with rpart 415
23.1.3.3 Visualising a Decision Tree with rpart.plot 418
23.1.3.4 Example of a Regression Tree with rpart 419
23.1.4 Evaluating the Performance of a Decision Tree 424
23.1.4.1 The Performance of the Regression Tree 424
23.1.4.2 The Performance of the Classification Tree 424
23.2 Random Forest 428
23.3 Artificial Neural Networks (ANNs) 434
23.3.1 The Basics of ANNs in R 434
23.3.2 Neural Networks in R 436
23.3.3 The Work-flow for Fitting a NN 438
23.3.4 Cross Validate the NN 444
23.4 Support Vector Machine 447
23.4.1 Fitting a SVM in R 447
23.4.2 Optimizing the SVM 449
23.5 Unsupervised Learning and Clustering 450
23.5.1 k-Means Clustering 450
23.5.1.1 k-Means Clustering in R 452
23.5.1.2 PCA before Clustering 455
23.5.1.3 On the Relation Between PCA and k-Means 461
23.5.2 Visualizing Clusters in Three Dimensions 462
23.5.3 Fuzzy Clustering 464
23.5.4 Hierarchical Clustering 466
23.5.5 Other Clustering Methods 468
24 Towards a Tidy Modelling Cycle with modelr 469
24.1 Adding Predictions 470
24.2 Adding Residuals 471
24.3 Bootstrapping Data 472
24.4 Other Functions of modelr 474
25 Model Validation 475
25.1 Model Quality Measures 476
25.2 Predictions and Residuals 477
25.3 Bootstrapping 479
25.3.1 Bootstrapping in Base R 479
25.3.2 Bootstrapping in the tidyverse with modelr 481
25.4 Cross-Validation 483
25.4.1 Elementary Cross Validation 483
25.4.2 Monte Carlo Cross Validation 486
25.4.3 k-Fold Cross Validation 488
25.4.4 Comparing Cross Validation Methods 489
25.5 Validation in a Broader Perspective 492
26.1 Financial Analysis with quantmod 495
26.1.1 The Basics of quantmod 495
26.1.2 Types of Data Available in quantmod 496
26.1.3 Plotting with quantmod 497
26.1.4 The quantmod Data Structure 500
26.1.4.1 Sub-setting by Time and Date 500
26.1.6.1 Financial Models in quantmod 504
26.1.6.2 A Simple Model with quantmod 504
26.1.6.3 Testing the Model Robustness 507
27 Multi Criteria Decision Analysis (MCDA) 511
27.1 What and Why 511
27.2 General Work-flow 513
27.3 Identify the Issue at Hand: Steps 1 and 2 516
27.4 Step 3: the Decision Matrix 518
27.4.1 Construct a Decision Matrix 518
27.4.2 Normalize the Decision Matrix 520
27.5 Step 4: Delete Inefficient and Unacceptable Alternatives 521
27.5.1 Unacceptable Alternatives 521
27.5.2 Dominance – Inefficient Alternatives 521
27.6 Plotting Preference Relationships 524
27.7 Step 5: MCDA Methods 526
27.7.1 Examples of Non-compensatory Methods 526
27.7.1.1 The MaxMin Method 526
27.7.1.2 The MaxMax Method 526
27.7.2 The Weighted Sum Method (WSM) 527
27.7.3 Weighted Product Method (WPM) 530
27.7.4 ELECTRE 530
27.7.4.1 ELECTRE I 532
27.7.4.2 ELECTRE II 538
27.7.4.3 Conclusions ELECTRE 539
27.7.5 PROMethEE 540
27.7.5.1 PROMethEE I 543
27.7.5.2 PROMethEE II 549
27.7.6 PCA (Gaia) 553
27.7.7 Outranking Methods 557
27.7.8 Goal Programming 558
27.8 Summary MCDA 561
VI Introduction to Companies 563
28 Financial Accounting (FA) 567
28.1 The Statements of Accounts 568
28.1.1 Income Statement 568
28.1.2 Net Income: The P&L statement 568
28.1.3 Balance Sheet 569
28.2 The Value Chain 571
28.3 Further Terminology 573
28.4 Selected Financial Ratios 575
29 Management Accounting 583
29.1 Introduction 583
29.1.1 Definition of Management Accounting (MA) 583
29.1.2 Management Information Systems (MIS) 584
29.2 Selected Methods in MA 585
29.2.1 Cost Accounting 585
29.2.2 Selected Cost Types 587
29.3 Selected Use Cases of MA 590
29.3.1 Balanced Scorecard 590
29.3.2 Key Performance Indicators (KPIs) 591
29.3.2.1 Lagging Indicators 592
29.3.2.2 Leading Indicators 592
29.3.2.3 Selected Useful KPIs 593
30 Asset Valuation Basics 597
30.1 Time Value of Money 598
30.1.1 Interest Basics 598
30.1.2 Specific Interest Rate Concepts 598
30.1.3 Discounting 600
30.2 Cash 601
30.3 Bonds 602
30.3.1 Features of a Bond 602
30.3.2 Valuation of Bonds 604
30.3.3 Duration 606
30.3.3.1 Macaulay Duration 606
30.3.3.2 Modified Duration 607
30.4 The Capital Asset Pricing Model (CAPM) 610
30.4.1 The CAPM Framework 610
30.4.2 The CAPM and Risk 612
30.4.3 Limitations and Shortcomings of the CAPM 612
30.5 Equities 614
30.5.1 Definition 614
30.5.2 Short History 614
30.5.3 Valuation of Equities 615
30.5.4 Absolute Value Models 616
30.5.4.1 Dividend Discount Model (DDM) 616
30.5.4.2 Free Cash Flow (FCF) 620
30.5.4.3 Discounted Cash Flow Model 622
30.5.4.4 Discounted Abnormal Operating Earnings Model 623
30.5.4.5 Net Asset Value Method or Cost Method 624
30.5.4.6 Excess Earnings Method 625
30.5.5 Relative Value Models 625
30.5.5.1 The Concept of Relative Value Models 625
30.5.5.2 The Price Earnings Ratio (PE) 626
30.5.5.3 Pitfalls when using PE Analysis 627
30.5.5.4 Other Company Value Ratios 627
30.5.6 Selection of Valuation Methods 630
30.5.7 Pitfalls in Company Valuation 631
30.5.7.1 Forecasting Performance 631
30.5.7.2 Results and Sensitivity 631
30.6 Forwards and Futures 638
30.7 Options 640
30.7.1 Definitions 640
30.7.2 Commercial Aspects 642
30.7.3 Short History 643
30.7.4 Valuation of Options at Maturity 644
30.7.4.1 A Long Call at Maturity 644
30.7.4.2 A Short Call at Maturity 645
30.7.4.3 Long and Short Put 646
30.7.4.4 The Put-Call Parity 648
30.7.5 The Black and Scholes Model 649
30.7.5.1 Pricing of Options Before Maturity 649
30.7.5.2 Apply the Black and Scholes Formula 650
30.7.5.3 The Limits of the Black and Scholes Model 653
30.7.6 The Binomial Model 654
30.7.6.1 Risk Neutral Method 655
30.7.6.2 The Equivalent Portfolio Binomial Model 659
30.7.6.3 Summary Binomial Model 660
30.7.7 Dependencies of the Option Price 660
30.7.7.1 Dependencies in a Long Call Option 661
30.7.7.2 Dependencies in a Long Put Option 662
30.7.7.3 Summary of Findings 664
30.7.8 The Greeks 664
30.7.9 Delta Hedging 665
30.7.10 Linear Option Strategies 667
30.7.10.1 Plotting a Portfolio of Options 667
30.7.10.2 Single Option Strategies 670
30.7.10.3 Composite Option Strategies 671
30.7.11 Integrated Option Strategies 674
30.7.11.1 The Covered Call 675
30.7.11.2 The Married Put 676
30.7.11.3 The Collar 677
30.7.12 Exotic Options 678
30.7.13 Capital Protected Structures 680
VII Reporting 683
31 A Grammar of Graphics with ggplot2 687
31.1 The Basics of ggplot2 688
31.2 Over-plotting 692
31.3 Case Study for ggplot2 696
32 R Markdown 699
33 knitr and LaTeX 703
34 An Automated Development Cycle 707
35 Writing and Communication Skills 709
36 Interactive Apps 713
36.1 Shiny 715
36.2 Browser Born Data Visualization 719
36.2.1 HTML-widgets 719
36.2.2 Interactive Maps with leaflet 720
36.2.3 Interactive Data Visualisation with ggvis 721
36.2.3.1 Getting Started in R with ggvis 721
36.2.3.2 Combining the Power of ggvis and Shiny 723
36.2.4 googleVis 723
36.3 Dashboards 725
36.3.1 The Business Case: a Diversity Dashboard 726
36.3.2 A Dashboard with flexdashboard 731
36.3.2.1 A Static Dashboard 731
36.3.2.2 Interactive Dashboards with flexdashboard 736
36.3.3 A Dashboard with shinydashboard 737
VIII Bigger and Faster R 741
37 Parallel Computing 743
37.1 Combine foreach and doParallel 745
37.2 Distribute Calculations over LAN with Snow 748
37.3 Using the GPU 752
37.3.1 Getting Started with gpuR 754
37.3.2 On the Importance of Memory use 757
37.3.3 Conclusions for GPU Programming 759
38 R and Big Data 761
38.1 Use a Powerful Server 763
38.1.1 Use R on a Server 763
38.1.2 Let the Database Server do the Heavy Lifting 763
38.2 Using more Memory than we have RAM 765
39 Parallelism for Big Data 767
39.2.3.1 A User Defined Function on Spark 780
40.2.2 Use Vectorisation where Appropriate 797
40.2.3 Pre-allocating Memory 799
40.2.4 Use the Fastest Function 800
40.2.5 Use the Fastest Package 801
40.2.6 Be Mindful about Details 802
40.2.7 Compile Functions 804
40.2.8 Use C or C++ Code in R 806
40.2.9 Using a C++ Source File in R 809
40.2.10 Call Compiled C++ Functions in R 811
40.3 Profiling Code 812
40.3.1 The Package profr 813
A Create your own R Package 821
A.1 Creating the Package in the R Console 823
A.2 Update the Package Description 825
A.3 Documenting the Functions 826
A.4 Loading the Package 827
A.5 Further Steps 828
B Levels of Measurement 829
B.1 Nominal Scale 829
B.2 Ordinal Scale 830
B.3 Interval Scale 831
B.4 Ratio Scale 832
C Trademark Notices 833
C.1 General Trademark Notices 834
C.2 R-Related Notices 835
C.2.1 Crediting Developers of R Packages 835
C.2.2 The R-packages used in this Book 835
D Code Not Shown in the Body of the Book 839
E Answers to Selected Questions 845
Bibliography 859
Nomenclature 869
Index 881
Preface
The author has written this book based on his experience, which spans roughly three decades in insurance, banking, and asset management. During his career, the author worked in IT, structured and managed highly technical investment portfolios (at some point he oversaw €24 billion in a thousand investment funds), fulfilled many C-level roles (e.g. was CEO of KBC TFI SA [an asset manager in Poland], was CIO and COO for Eperon SA [a fund manager in Ireland], sat on boards of investment funds, and was involved in big-data projects in London), and did quantitative analysis in risk departments of banks. This gave the author a unique and in-depth view of many areas ranging from analytics, big data, databases, business requirements, financial modelling, etc.
In this book, the author presents a structured overview of his knowledge and experience for anyone who works with data, and invites the reader to understand the bigger picture and discover new aspects. This book also demystifies the hype around machine learning and AI by helping the reader to understand the models and program them in R without spending too much time on the theory.
This book aims to be a starting point for quants, data scientists, modellers, etc. It aims to be the book that bridges different disciplines, so that a specialist in one domain can grab this book, understand how his/her discipline fits in the bigger picture, and get enough material to understand the person who is specialized in a related discipline. Therefore, it could be the ideal book to help you make a career move to another discipline, so that in a few years you are the person who understands the whole data chain. In short, the author wants to give you a shortcut to the knowledge that he spent 30 years accumulating.
Another important point is that this book is written by and for practitioners: people who work with data, programming, and mathematics for a living in a corporate environment. So, this book should be most interesting for anyone interested in data science, machine learning, statistical learning, and mathematical modelling, and for whoever wants to convey technical matters in a clear and concise way to non-specialists.
This also means that this book is not necessarily the best book in any of the disciplines that it spans. In every specialisation there are already good contenders.
• More formal introductions to statistics are, for example, in Cyganowski, Kloeden, and Ombach (2001) and Andersen et al. (1987). There are also many books about specific stochastic processes and their applications in financial markets: see e.g. Wolfgang and Baschnagel (1999), Malliaris and Brock (1982), and Mikosch (1998). While knowledge of stochastic processes and their importance in asset pricing is important, this covers only a very narrow spot of applications and theory. This book is more general, gentler on theoretical foundations, and focusses more on the use of data to answer real-life problems in an everyday business environment.
• A comprehensive introduction to statistics or econometrics can be found in Peracchi (2001) or Greene (1997). A general and comprehensive introduction to statistics is also in Neter, Wasserman, and Whitmore (1988).
• This is not simply a book about programming and/or any related techniques. If you just want to learn programming in R, then Grolemund (2014) will get you started faster. Our Part II will also get you started in programming, though it assumes a certain familiarity with programming and mainly zooms in on aspects that will be important in the rest of the book.
• This book is not a comprehensive book about financial modelling. Other books do a better job of listing all types of possible models. No book does a better job here than Bernard Marr’s publication, Marr (2016): “Key Business Analytics, the 60+ business analysis tools every manager needs to know.” That book lists all the words that some managers might use and what they mean, without any of the mathematics or programming behind them. I warmly recommend keeping that book next to ours. Whenever someone comes up with a term like “customer churn analytics”, for example, you can use Bernard’s book to find out what it actually means and then turn to ours to “get your hands dirty” and actually do it.
• If you are only interested in statistical learning and modelling, you will find the following books more focused: Hastie, Tibshirani, and Friedman (2009), or James, Witten, Hastie, and Tibshirani (2013), who also use R.
• A more in-depth introduction to AI can be found in Russell and Norvig (2016).
• Data science is treated more elaborately in Baesens (2014) and in the recent book by Wickham and Grolemund (2016), which provides an excellent introduction to R and data science in general. This last book is a great add-on to this book, as it focusses more on the data aspects (but less on the statistical learning part). We also focus more on the practical aspects and on real data problems in a corporate environment.
A book that comes close to ours in purpose is the book that my friend professor Bart Baesens has compiled, “Analytics in a Big Data World, the Essential guide to data science and its applications”: Baesens (2014). If the mathematics, programming, and R itself scare you in this book, then Bart’s book is for you. Bart’s book covers different methods, but above all, for the reader, it is sufficient to be able to use a spreadsheet to do some basic calculations. Therefore, it will not help you to tackle big data or to program a neural network yourself, but you will understand very well what it means and how things work.
Another book that might work well if the maths in this one is prohibitive for you is Provost and Fawcett (2013). It will give you some insight into what statistical learning is and how it works, but will not prepare you to use it on real data.
Summarizing, I suggest you buy, next to this book, also Marr (2016) and Baesens (2014). This will provide you with a complete chain from business and buzzwords (Bernard’s book), over understanding what modelling is and what practical issues one will encounter (Bart’s book), to implementing this in a corporate setting and solving the practical problems of a data scientist and modeller on sizeable data (this book).
In a nutshell, this book does it all, is gentle on theoretical foundations, and aims to be a one-stop shop to show the big picture, learn all those things, and actually apply them. It aims to serve as a basis when later picking up more advanced books in certain narrow areas. This book will take you on a journey of working with data in a real company, and hence it will also discuss practical problems such as people filling in forms or extracting data from a SQL database.
It should be readable for any person who finished (or is finishing) university-level education in a quantitative field such as physics, civil engineering, mathematics, econometrics, etc. It should also be readable by the senior manager with a technical background who tries to understand what his army of quants, data scientists, and developers are up to, while having fun learning R. After reading this book you will be able to talk to all of them, challenge their work, and make most analyses yourself, or be part of a bigger entity and specialize in one of the steps of modelling or data manipulation.
In some way, this book can also be seen as a celebration of FOSS (Free and Open Source Software). We proudly mention that for this book no commercial software was used at all. The operating system is Linux, the window manager Fluxbox (sometimes LXDE or KDE), Kile and vi helped the editing process, Okular displayed the PDF file, even the database servers and Hadoop/Spark are FOSS... and of course R and LaTeX provided the icing on the cake. FOSS makes this world a more inclusive place, as it makes technology more attainable in poorer places of this world.
Hence, we extend warm thanks to all people who spend so much time contributing to free software.