

Machine Learning for Business Analytics: Concepts, Techniques and Applications with JMP Pro, 2nd Edition, Galit Shmueli


MACHINE LEARNING FOR BUSINESS ANALYTICS

Concepts, Techniques, and Applications with JMP Pro

Second Edition

GALIT SHMUELI
National Tsing Hua University
Taipei, Taiwan

PETER C. BRUCE
statistics.com
Arlington, USA

MIA L. STEPHENS
JMP Statistical Discovery LLC
Cary, USA

MURALIDHARA ANANDAMURTHY
SAS Institute Inc
Mumbai, India

NITIN R. PATEL
Cytel, Inc.
Cambridge, USA

Copyright 2023 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our website at www.wiley.com.

Library of Congress Cataloging-in-Publication Data Applied for:

Hardback: 9781119903833

Cover Design: Wiley

Cover Image: © Adobe Library/Adobe Stock Photos

Set in 10/12pt Times LT Std by Straive, Chennai, India

To our families

Boaz and Noa

Liz, Lisa, and Allison

Michael, Jade Ann, and Audrey L

Seetha and Ananda

Tehmi, Arjun, and in memory of Aneesh

CONTENTS

1.1 What Is Business Analytics?
1.2 What Is Machine Learning?
1.3 Machine Learning, AI, and Related Terms
    Statistical Modeling vs. Machine Learning
1.4 Big Data
1.5 Data Science
1.6 Why Are There So Many Different Methods?
1.7 Terminology and Notation
1.8 Road Maps to This Book
    Order of Topics

2.1 Introduction
2.2 Core Ideas in Machine Learning
    Classification
    Prediction
    Association Rules and Recommendation Systems
    Predictive Analytics
    Data Reduction and Dimension Reduction
    Data Exploration and Visualization
    Supervised and Unsupervised Learning
2.3 The Steps in a Machine Learning Project
2.4 Preliminary Steps
    Organization of Data
    Sampling from a Database
    Oversampling Rare Events in Classification Tasks
    Preprocessing and Cleaning the Data
2.5 Predictive Power and Overfitting
    Overfitting
    Creation and Use of Data Partitions
2.6 Building a Predictive Model with JMP Pro
    Predicting Home Values in a Boston Neighborhood
    Modeling Process
2.7 Using JMP Pro for Machine Learning
2.8 Automating Machine Learning Solutions
    Predicting Power Generator Failure
    Uber’s Michelangelo
2.9 Ethical Practice in Machine Learning
    Machine Learning Software: The State of the Market by Herb Edelstein
Problems

PART II DATA EXPLORATION AND DIMENSION REDUCTION

3 Data Visualization
3.1 Introduction
3.2 Data Examples
    Example 1: Boston Housing Data
    Example 2: Ridership on Amtrak Trains
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots
    Distribution Plots: Boxplots and Histograms
    Heatmaps
3.4 Multidimensional Visualization
    Adding Variables: Color, Hue, Size, Shape, Multiple Panels, Animation
    Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering
    Reference: Trend Line and Labels
    Scaling Up: Large Datasets
    Multivariate Plot: Parallel Coordinates Plot
    Interactive Visualization
3.5 Specialized Visualizations
    Visualizing Networked Data
    Visualizing Hierarchical Data: More on Treemaps
    Visualizing Geographical Data: Maps
3.6 Summary: Major Visualizations and Operations, According to Machine Learning Goal
    Prediction
    Classification
    Time Series Forecasting
    Unsupervised Learning
Problems

4 Dimension Reduction
4.1 Introduction
4.2 Curse of Dimensionality
4.3 Practical Considerations
    Example 1: House Prices in Boston
4.4 Data Summaries
    Summary Statistics
    Tabulating Data
4.5 Correlation Analysis
4.6 Reducing the Number of Categories in Categorical Variables
4.7 Converting a Categorical Variable to a Continuous Variable
4.8 Principal Component Analysis
    Example 2: Breakfast Cereals
    Principal Components
    Standardizing the Data
    Using Principal Components for Classification and Prediction
4.9 Dimension Reduction Using Regression Models
4.10 Dimension Reduction Using Classification and Regression Trees
Problems

PART III PERFORMANCE EVALUATION

5 Evaluating Predictive Performance
5.1 Introduction
5.2 Evaluating Predictive Performance
    Naive Benchmark: The Average
    Prediction Accuracy Measures
    Comparing Training and Validation Performance
5.3 Judging Classifier Performance
    Benchmark: The Naive Rule
    Class Separation
    The Classification (Confusion) Matrix
    Using the Validation Data
    Accuracy Measures
    Propensities and Threshold for Classification
    Performance in Unequal Importance of Classes
    Asymmetric Misclassification Costs
    Generalization to More Than Two Classes
5.4 Judging Ranking Performance
    Lift Curves for Binary Data
    Beyond Two Classes
    Lift Curves Incorporating Costs and Benefits
5.5 Oversampling
    Creating an Over-sampled Training Set
    Evaluating Model Performance Using a Nonoversampled Validation Set
    Evaluating Model Performance If Only Oversampled Validation Set Exists
Problems

PART IV PREDICTION AND CLASSIFICATION METHODS

6 Multiple Linear Regression
6.1 Introduction
6.2 Explanatory vs. Predictive Modeling
6.3 Estimating the Regression Equation and Prediction
    Example: Predicting the Price of Used Toyota Corolla Automobiles
6.4 Variable Selection in Linear Regression
    Reducing the Number of Predictors
    How to Reduce the Number of Predictors
    Manual Variable Selection
    Automated Variable Selection
    Regularization (Shrinkage Models)
Problems

7 k-Nearest Neighbors (k-NN)
7.1 The k-NN Classifier (Categorical Outcome)
    Determining Neighbors
    Classification Rule
    Example: Riding Mowers
    Choosing Parameter k
    Setting the Threshold Value
    Weighted k-NN
    k-NN with More Than Two Classes
    Working with Categorical Predictors
7.2 k-NN for a Numerical Response
7.3 Advantages and Shortcomings of k-NN Algorithms
Problems

8 The Naive Bayes Classifier
8.1 Introduction
    Threshold Probability Method
    Conditional Probability
    Example 1: Predicting Fraudulent Financial Reporting
8.2 Applying the Full (Exact) Bayesian Classifier
    Using the “Assign to the Most Probable Class” Method
    Using the Threshold Probability Method
    Practical Difficulty with the Complete (Exact) Bayes Procedure
8.3 Solution: Naive Bayes
    The Naive Bayes Assumption of Conditional Independence
    Using the Threshold Probability Method
    Example 2: Predicting Fraudulent Financial Reports
    Example 3: Predicting Delayed Flights
    Evaluating the Performance of Naive Bayes Output from JMP
    Working with Continuous Predictors
8.4 Advantages and Shortcomings of the Naive Bayes Classifier
Problems

9 Classification and Regression Trees
9.1 Introduction
    Tree Structure
    Decision Rules
    Classifying a New Record
9.2 Classification Trees
    Recursive Partitioning
    Example 1: Riding Mowers
    Categorical Predictors
    Standardization
9.3 Growing a Tree for Riding Mowers Example
    Choice of First Split
    Choice of Second Split
    Final Tree
    Using a Tree to Classify New Records
9.4 Evaluating the Performance of a Classification Tree
    Example 2: Acceptance of Personal Loan
9.5 Avoiding Overfitting
    Stopping Tree Growth: CHAID
    Growing a Full Tree and Pruning It Back
    How JMP Pro Limits Tree Size
9.6 Classification Rules from Trees
9.7 Classification Trees for More Than Two Classes
9.8 Regression Trees
    Prediction
    Evaluating Performance
9.9 Advantages and Weaknesses of a Single Tree
9.10 Improving Prediction: Random Forests and Boosted Trees
    Random Forests
    Boosted Trees
Problems

10 Logistic Regression
10.1 Introduction
10.2 The Logistic Regression Model
10.3 Example: Acceptance of Personal Loan
    Model with a Single Predictor
    Estimating the Logistic Model from Data: Multiple Predictors
    Interpreting Results in Terms of Odds (for a Profiling Goal)
10.4 Evaluating Classification Performance
10.5 Variable Selection
10.6 Logistic Regression for Multi-class Classification
    Logistic Regression for Nominal Classes
    Logistic Regression for Ordinal Classes
    Example: Accident Data
10.7 Example of Complete Analysis: Predicting Delayed Flights
    Data Preprocessing
    Model Fitting, Estimation, and Interpretation: A Simple Model
    Model Fitting, Estimation, and Interpretation: The Full Model
    Model Performance
Problems

11 Neural Nets
11.1 Introduction
11.2 Concept and Structure of a Neural Network
11.3 Fitting a Network to Data
    Example 1: Tiny Dataset
    Computing Output of Nodes
    Preprocessing the Data
    Training the Model
    Using the Output for Prediction and Classification
    Example 2: Classifying Accident Severity
    Avoiding Overfitting
11.4 User Input in JMP Pro
11.5 Exploring the Relationship Between Predictors and Outcome
11.6 Deep Learning
    Convolutional Neural Networks (CNNs)
    Local Feature Map
    A Hierarchy of Features
    The Learning Process
    Unsupervised Learning
    Conclusion
11.7 Advantages and Weaknesses of Neural Networks
Problems

12 Discriminant Analysis
12.1 Introduction
    Example 1: Riding Mowers
    Example 2: Personal Loan Acceptance
12.2 Distance of an Observation from a Class
12.3 From Distances to Propensities and Classifications
12.4 Classification Performance of Discriminant Analysis
12.5 Prior Probabilities
12.6 Classifying More Than Two Classes
    Example 3: Medical Dispatch to Accident Scenes
12.7 Advantages and Weaknesses
Problems

13 Generating, Comparing, and Combining Multiple Models
13.1 Ensembles
    Why Ensembles Can Improve Predictive Power
    Simple Averaging or Voting
    Bagging
    Boosting
    Stacking
    Advantages and Weaknesses of Ensembles
13.2 Automated Machine Learning (AutoML)
    AutoML: Explore and Clean Data
    AutoML: Determine Machine Learning Task
    AutoML: Choose Features and Machine Learning Methods
    AutoML: Evaluate Model Performance
    AutoML: Model Deployment
    Advantages and Weaknesses of Automated Machine Learning
13.3 Summary
Problems

PART V INTERVENTION AND USER FEEDBACK

14 Interventions: Experiments, Uplift Models, and Reinforcement Learning
14.1 Introduction
14.2 A/B Testing
    Example: Testing a New Feature in a Photo Sharing App
    The Statistical Test for Comparing Two Groups (T-Test)
    Multiple Treatment Groups: A/B/n Tests
    Multiple A/B Tests and the Danger of Multiple Testing
14.3 Uplift (Persuasion) Modeling
    Getting the Data
    A Simple Model
    Modeling Individual Uplift
    Creating Uplift Models in JMP Pro
    Using the Results of an Uplift Model
14.4 Reinforcement Learning
    Explore-Exploit: Multi-armed Bandits
    Markov Decision Process (MDP)
14.5 Summary
Problems

PART VI MINING RELATIONSHIPS AMONG RECORDS

15 Association Rules and Collaborative Filtering
15.1 Association Rules
    Discovering Association Rules in Transaction Databases
    Example 1: Synthetic Data on Purchases of Phone Faceplates
    Data Format
    Generating Candidate Rules
    The Apriori Algorithm
    Selecting Strong Rules
    The Process of Rule Selection
    Interpreting the Results
    Rules and Chance
    Example 2: Rules for Similar Book Purchases
15.2 Collaborative Filtering
    Data Type and Format
    Example 3: Netflix Prize Contest
    User-Based Collaborative Filtering: “People Like You”
    Item-Based Collaborative Filtering
    Evaluating Performance
    Advantages and Weaknesses of Collaborative Filtering
    Collaborative Filtering vs. Association Rules
15.3 Summary
Problems

16 Cluster Analysis
16.1 Introduction
    Example: Public Utilities
16.2 Measuring Distance Between Two Records
    Euclidean Distance
    Standardizing Numerical Measurements
    Other Distance Measures for Numerical Data
    Distance Measures for Categorical Data
    Distance Measures for Mixed Data
16.3 Measuring Distance Between Two Clusters
    Minimum Distance
    Maximum Distance
    Average Distance
    Centroid Distance
16.4 Hierarchical (Agglomerative) Clustering
    Single Linkage
    Complete Linkage
    Average Linkage
    Centroid Linkage
    Ward’s Method
    Dendrograms: Displaying Clustering Process and Results
    Validating Clusters
    Two-Way Clustering
    Limitations of Hierarchical Clustering
16.5 Nonhierarchical Clustering: The k-Means Algorithm
    Choosing the Number of Clusters (k)
Problems

PART VII FORECASTING TIME SERIES

17 Handling Time Series
17.1 Introduction
17.2 Descriptive vs. Predictive Modeling
17.3 Popular Forecasting Methods in Business
    Combining Methods
17.4 Time Series Components
    Example: Ridership on Amtrak Trains
17.5 Data Partitioning and Performance Evaluation
    Benchmark Performance: Naive Forecasts
    Generating Future Forecasts
Problems

18 Regression-Based Forecasting
18.1 A Model with Trend
    Linear Trend
    Exponential Trend
    Polynomial Trend
18.2 A Model with Seasonality
    Additive vs. Multiplicative Seasonality
18.3 A Model with Trend and Seasonality
18.4 Autocorrelation and ARIMA Models
    Computing Autocorrelation
    Improving Forecasts by Integrating Autocorrelation Information
    Fitting AR Models to Residuals
    Evaluating Predictability
Problems

19 Smoothing and Deep Learning Methods for Forecasting
19.1 Introduction
19.2 Moving Average
    Centered Moving Average for Visualization
    Trailing Moving Average for Forecasting
    Choosing Window Width (w)
19.3 Simple Exponential Smoothing
    Choosing Smoothing Parameter α
    Relation Between Moving Average and Simple Exponential Smoothing
19.4 Advanced Exponential Smoothing
    Series With a Trend
    Series With a Trend and Seasonality
19.5 Deep Learning for Forecasting
Problems

PART VIII DATA ANALYTICS

20 Text Mining
20.1 Introduction
20.2 The Tabular Representation of Text: Document–Term Matrix and “Bag-of-Words”
20.3 Bag-of-Words vs. Meaning Extraction at Document Level
20.4 Preprocessing the Text
    Tokenization
    Text Reduction
    Presence/Absence vs. Frequency (Occurrences)
    Term Frequency-Inverse Document Frequency (TF-IDF)
    From Terms to Topics: Latent Semantic Analysis and Topic Analysis
    Extracting Meaning
    From Terms to High Dimensional Word Vectors: Word2Vec
20.5 Implementing Machine Learning Methods
20.6 Example: Online Discussions on Autos and Electronics
    Importing the Records
    Text Preprocessing in JMP
    Using Latent Semantic Analysis and Topic Analysis
    Fitting a Predictive Model
    Prediction
20.7 Example: Sentiment Analysis of Movie Reviews
    Data Preparation
    Latent Semantic Analysis and Fitting a Predictive Model
20.8 Summary
Problems

21 Responsible Data Science
21.1 Introduction
    Example: Predicting Recidivism
21.2 Unintentional Harm
21.3 Legal Considerations
    The General Data Protection Regulation (GDPR)
    Protected Groups
21.4 Principles of Responsible Data Science
    Non-maleficence
    Fairness
    Transparency
    Accountability
    Data Privacy and Security
21.5 A Responsible Data Science Framework
    Justification
    Assembly
    Data Preparation
    Modeling
    Auditing
21.6 Documentation Tools
    Impact Statements
    Model Cards
    Datasheets
    Audit Reports
21.7 Example: Applying the RDS Framework to the COMPAS Example
    Unanticipated Uses
    Ethical Concerns
    Protected Groups
    Data Issues
    Fitting the Model
    Auditing the Model
    Bias Mitigation
21.8 Summary
Problems

PART IX CASES

22 Cases
22.1 Charles Book Club
    The Book Industry
    Database Marketing at Charles
    Machine Learning Techniques
    Assignment
22.2 German Credit
    Background
    Data
    Assignment
22.3 Tayko Software Cataloger
    Background
    The Mailing Experiment
    Data
    Assignment
22.4 Political Persuasion
    Background
    Predictive Analytics Arrives in US Politics
    Political Targeting
    Uplift
    Data
    Assignment
22.5 Taxi Cancellations
    Business Situation
    Assignment
22.6 Segmenting Consumers of Bath Soap
    Business Situation
    Key Problems
    Data
    Measuring Brand Loyalty
    Assignment
22.7 Catalog Cross-Selling
    Background
    Assignment
22.8 Direct-Mail Fundraising
    Background
    Data
    Assignment
22.9 Time Series Case: Forecasting Public Transportation Demand
    Background
    Problem Description
    Available Data
    Assignment Goal
    Assignment
    Tips and Suggested Steps
22.10 Loan Approval
    Background
    Regulatory Requirements
    Getting Started
    Assignment

FOREWORD

When I began my career back in the last century, most corporate computing took place on mainframe computers, data was scarce, organizations were far more hierarchical, and managerial decision-making was often driven by the loudest person in the room or the “golden gut” of an experienced executive. By contrast, today’s business world features a wide variety of digitally connected professionals who interact with their customers and their colleagues through software applications (many of them web- and cloud-based), remarkably powerful personal computers, and always-connected smartphones. Data is everywhere, though useful data is often still elusive. And more and more of the systems that companies and individuals rely upon are utilizing techniques from machine learning to deliver data-driven insights, make predictions, and drive decision-making.

For the past decade, I have been teaching courses in machine learning and predictive analytics to business students at the University of San Francisco. My students have a wide variety of academic backgrounds and professional interests. My goal is to prepare them for careers in this rapidly evolving, digitally enabled, and increasingly data- and algorithmically driven business world.

I was fortunate enough to find Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro several years ago. This book provides a clear roadmap to the fundamentals of machine learning as well as a number of pathways to explore a variety of specific machine learning methods for business analytics, including prediction, classification, and clustering. In addition, this textbook and the JMP Pro software combine to provide a great platform for interactive learning. The textbook utilizes the JMP Pro software to illustrate machine learning fundamentals, exploratory data analysis methods, data visualization concepts, and a broad range of supervised and unsupervised machine learning methods. The book also provides exercises that enable you to utilize JMP Pro to learn and master machine learning techniques.

I was very excited when I learned that the next edition of this book was ready to be released. Now entitled Machine Learning for Business Analytics: Concepts, Techniques, and Applications with JMP Pro, this 2nd edition is based on the most recent version of the JMP Pro software, and both the text and the software have been significantly expanded and updated. This new edition includes all the first edition material (supervised learning, unsupervised methods, visualization, and time series), as well as a number of new topics: recommendation systems, text mining, ethical issues in data science, deep learning, and interventions and reinforcement learning.

Along with the JMP Pro software, this book will provide you with a foundation of knowledge about machine learning. Its lessons and insights will serve you well in today’s dynamic and data-intensive business world. Welcome aboard!

PREFACE

This textbook first appeared in early 2007 and has been used by numerous students and practitioners and in many courses, including in our own teaching of this material, both online and in person, for more than 15 years. The first edition, based on the Excel add-in Analytic Solver Data Mining (previously XLMiner), was followed by two more Analytic Solver editions, a JMP Pro® edition, two R editions, a Python edition, a RapidMiner edition, and now this second JMP Pro edition, with its companion website, www.jmp.com/dataminingbook. JMP Pro is a desktop statistical package from JMP Statistical Discovery that runs natively on Mac and Windows machines.¹

The first JMP Pro edition was the first edition to fully integrate JMP Pro. As in the previous JMP edition, the focus in this new edition is on machine learning concepts and how to implement the associated algorithms in JMP Pro. All examples, special topics boxes, instructions, and exercises presented in this book are based on JMP Pro 17, the professional version of JMP, which has a rich array of built-in tools for interactive data visualization, analysis, and modeling.²

For this new JMP Pro edition, a new co-author, Muralidhara Anandamurthy, comes on board, bringing extensive experience in analytics and data science at Genpact, Target, and Danske, and as a member of the JMP Academic Team.

The new edition provides significant updates both in terms of JMP Pro and in terms of new topics and content. In addition to updating software routines and outputs that have changed or become available since the first edition, this edition also incorporates updates and new material based on feedback from instructors teaching MBA, MS, undergraduate, diploma, and executive courses, and from their students. Importantly, this edition includes several new topics:

∙ A new chapter on Responsible Data Science (Chapter 21) covering topics of fairness, transparency, model cards and datasheets, legal considerations, and more, with an illustrative example.

∙ A dedicated section on deep learning in Chapter 11.

∙ A new chapter on recommendations, covering association rules and collaborative filtering (Chapter 15).

∙ A new chapter on Text Mining covering main approaches to the analysis of text data (Chapter 20).

∙ The Performance Evaluation exposition in Chapter 5 was expanded to include further metrics (precision and recall, F1).

∙ A new chapter on Generating, Comparing, and Combining Multiple Models (Chapter 13) that covers ensembles and AutoML.

∙ A new chapter dedicated to Interventions and User Feedback (Chapter 14) that covers A/B tests, uplift modeling, and reinforcement learning.

∙ A new case (Loan Approval) that touches on regulatory and ethical issues.

¹ JMP Statistical Discovery LLC, 100 SAS Campus Drive, Cary, NC 27513.

² See https://www.jmp.com/pro

A note about the book’s title: The first two editions of the book used the title Data Mining for Business Intelligence. Business intelligence today refers mainly to reporting and data visualization (“what is happening now”), while business analytics has taken over the “advanced analytics,” which include predictive analytics and data mining. Later editions were therefore renamed Data Mining for Business Analytics. However, the recent AI transformation has made the term machine learning more popularly associated with the methods in this textbook. In this new edition, we therefore use the updated terms Machine Learning and Business Analytics.

Since the appearance of the first JMP Pro edition, the landscape of the courses using the textbook has greatly expanded: whereas initially the book was used mainly in semester-long elective MBA-level courses, it is now used in a variety of courses in business analytics degrees and certificate programs, ranging from undergraduate programs to postgraduate and executive education programs. Courses in such programs also vary in their duration and coverage. In many cases, this textbook is used across multiple courses. The book is designed to continue supporting the general “predictive analytics” or “data mining” course as well as supporting a set of courses in dedicated business analytics programs.

A general “business analytics,” “predictive analytics,” or “data mining” course, common in MBA and undergraduate programs as a one-semester elective, would cover Parts I–III, and choose a subset of methods from Parts IV and V. Instructors can choose to use cases as team assignments, class discussions, or projects. For a two-semester course, Part VII might be considered, and we recommend introducing the new Part VIII (Data Analytics).

For a set of courses in a dedicated business analytics program, here are a few courses that have been using our book:

Predictive Analytics - Supervised Learning: In a dedicated business analytics program, the topic of predictive analytics is typically instructed across a set of courses. The first course would cover Parts I–III, and instructors typically choose a subset of methods from Part IV according to the course length. We recommend including “Part VIII: Data Analytics.”

Predictive Analytics - Unsupervised Learning: This course introduces data exploration and visualization, dimension reduction, mining relationships, and clustering (Parts II and VI). If this course follows the Predictive Analytics: Supervised Learning course, then it is useful to examine examples and approaches that integrate unsupervised and supervised learning, such as the new part on “Data Analytics.”

Forecasting Analytics: A dedicated course on time series forecasting would rely on Part VII.

Advanced Analytics: A course that integrates the learnings from predictive analytics (supervised and unsupervised learning) can focus on Part VIII: Data Analytics, where social network analytics and text mining are introduced, and responsible data science is discussed. Such a course might also include Chapter 13, Generating, Comparing, and Combining Multiple Models from Part IV, as well as Part V, which covers experiments, uplift, and reinforcement learning. Some instructors choose to use the cases (Chapter 22) in such a course.

In all courses, we strongly recommend including a project component, where data are either collected by students according to their interest or provided by the instructor (e.g., from the many machine learning competition datasets available). From our experience and other instructors’ experience, such projects enhance the learning and provide students with an excellent opportunity to understand the strengths of machine learning and the challenges that arise in the process.

GALIT SHMUELI, PETER BRUCE, MIA STEPHENS, MURALIDHARA ANANDAMURTHY, AND NITIN PATEL
2022

ACKNOWLEDGMENTS

We thank the many people who assisted us in improving the book from its inception as Data Mining for Business Intelligence in 2006 (using XLMiner, now Analytic Solver), its reincarnation as Data Mining for Business Analytics, and now Machine Learning for Business Analytics, including translations in Chinese and Korean and versions supporting Analytic Solver Data Mining, R, Python, RapidMiner, and JMP.

Anthony Babinec, who has been using earlier editions of this book for years in his data mining courses at Statistics.com, provided us with detailed and expert corrections. Dan Toy and John Elder IV greeted our project with early enthusiasm and provided detailed and useful comments on initial drafts. Ravi Bapna, who used an early draft in a data mining course at the Indian School of Business, and later at University of Minnesota, has provided invaluable comments and helpful suggestions since the book’s start.

Many of the instructors, teaching assistants, and students using earlier editions of the book have contributed invaluable feedback both directly and indirectly, through fruitful discussions, learning journeys, and interesting data mining projects that have helped shape and improve the book. These include MBA students from the University of Maryland, MIT, the Indian School of Business, National Tsing Hua University, and Statistics.com. Instructors from many universities and teaching programs, too numerous to list, have supported and helped improve the book since its inception.

Kuber Deokar, instructional operations supervisor at Statistics.com, has been unstinting in his assistance, support, and detailed attention. We also thank Anuja Kulkarni, Poonam Tribhuwan, and Shweta Jadhav, assistant teachers. Valerie Troiano has shepherded many instructors and students through the Statistics.com courses that have helped nurture the development of these books.

Colleagues and family members have been providing ongoing feedback and assistance with this book project. Vijay Kamble at UIC and Travis Greene at NTHU have provided valuable help with the section on reinforcement learning. Boaz Shmueli and Raquelle Azran gave detailed editorial comments and suggestions on the first two editions; Bruce McCullough and Adam Hughes did the same for the first edition. Noa Shmueli provided careful proofs of the third edition. Ran Shenberger offered design tips. Ken Strasma, founder of the microtargeting firm HaystaqDNA and director of targeting for the 2004 Kerry campaign and the 2008 Obama campaign, provided the scenario and data for the section on uplift modeling.

Marietta Tretter at Texas A&M shared comments and thoughts on the time series chapters, and Stephen Few and Ben Shneiderman provided feedback and suggestions on the data visualization chapter and overall design tips.

Susan Palocsay and Margret Bjarnadottir have provided suggestions and feedback on numerous occasions. We also thank Catherine Plaisant at the University of Maryland’s Human–Computer Interaction Lab, who helped out in a major way by contributing exercises and illustrations to the data visualization chapter. Gregory Piatetsky-Shapiro, founder of KDNuggets.com, was generous with his time and counsel in the early years of this project.

We thank colleagues at the Sloan School of Management at MIT for their support during the formative stage of this book: Dimitris Bertsimas, James Orlin, Robert Freund, Roy Welsch, Gordon Kaufmann, and Gabriel Bitran. As teaching assistants for the data mining course at Sloan, Adam Mersereau gave detailed comments on the notes and cases that were the genesis of this book, Romy Shioda helped with the preparation of several cases and exercises used here, and Mahesh Kumar helped with the material on clustering.

Colleagues at the University of Maryland’s Smith School of Business (Shrivardhan Lele, Wolfgang Jank, and Paul Zantek) provided practical advice and comments. We thank Robert Windle and University of Maryland MBA students Timothy Roach, Pablo Macouzet, and Nathan Birckhead for invaluable datasets. We also thank MBA students Rob Whitener and Daniel Curtis for the heatmap and map charts.

Anand Bodapati provided both data and advice. Jake Hofman from Microsoft Research and Sharad Borle assisted with data access. Suresh Ankolekar and Mayank Shah helped develop several cases and provided valuable pedagogical comments. Vinni Bhandari helped write the Charles Book Club case.

We would like to thank Marvin Zelen, L. J. Wei, and Cyrus Mehta at Harvard, as well as Anil Gore at Pune University, for thought-provoking discussions on the relationship between statistics and data mining. Our thanks to Richard Larson of the Engineering Systems Division, MIT, for sparking many stimulating ideas on the role of data mining in modeling complex systems. Over two decades ago, they helped us develop a balanced philosophical perspective on the emerging field of machine learning.

We thank the folks at Wiley for this successful journey of nearly two decades. Steve Quigley at Wiley showed confidence in this book from the beginning, helped us navigate through the publishing process with great speed, and together with Curt Hinrichs’s encouragement and support helped make this JMP Pro® edition possible. Jon Gurstelle, Kathleen Pagliaro, Allison McGinniss, Sari Friedman, and Katrina Maceda at Wiley, and Shikha Pahuja from Thomson Digital, were all helpful and responsive as we finalized the first JMP Pro edition. Brett Kurzman has taken over the reins and is now shepherding the project. Becky Cowan, Sarah Lemore, and Kavya Ramu greatly assisted us in pushing ahead and finalizing this new JMP Pro edition. We are also especially grateful to Amy Hendrickson, who assisted with typesetting and making this book beautiful.

Finally, we’d like to thank the reviewers of the first JMP Pro edition for their feedback and suggestions, and members of the JMP Documentation, Education and Development teams, for their support, patience, and responsiveness to our endless questions and requests. We thank L. Allison Jones-Farmer, Maria Weese, Ian Cox, Di Michelson, Marie Gaudard, Curt Hinrichs, Rob Carver, Jim Grayson, Brady Brady, Jian Cao, Elizabeth Claassen, Peng Liu, Chris Gotwalt, Russ Wolfinger, and Fang Chen. Most important, we thank John Sall, whose innovation, inspiration, and continued dedication to providing accessible and user-friendly desktop statistical software made JMP, and this book, possible.
