Overall Learnings
• Real-world data gathering and cleaning are challenging
• Need to make decisions without necessarily having outside guidance
• Choice to eliminate all records with errors
• Data exploration may not always yield information about the potential predictive power of variables
• No systematic differences detected between records meeting the success case and those that did not
• Frequently, variable selection comes down to trial and error
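
As a rough illustration of two of the points above (dropping every record that contains an error, then checking whether success and non-success records differ), a minimal sketch might look like the following. The records, the field names (`age`, `income`, `success`), and the validity ranges are all hypothetical and chosen only for the example; they are not taken from the project's data.

```python
from statistics import mean

# Hypothetical records; field names and values are illustrative only.
records = [
    {"age": 34, "income": 52000.0, "success": True},
    {"age": None, "income": 48000.0, "success": False},  # missing value
    {"age": 29, "income": -1.0, "success": True},        # invalid value
    {"age": 41, "income": 61000.0, "success": False},
    {"age": 38, "income": 58000.0, "success": True},
    {"age": 37, "income": 49000.0, "success": False},
]


def is_valid(record):
    """Return True only if both fields are present and inside assumed ranges."""
    age, income = record["age"], record["income"]
    if age is None or income is None:
        return False
    return 0 <= age <= 120 and income >= 0


def group_mean(rows, field, success):
    """Mean of one field over the rows whose success flag matches."""
    return mean(r[field] for r in rows if r["success"] is success)


# Any single error removes the whole record (no imputation or repair).
clean = [r for r in records if is_valid(r)]
dropped = len(records) - len(clean)
print(f"kept {len(clean)} of {len(records)} records, dropped {dropped}")

# Compare each variable across the two outcome groups.
for field in ("age", "income"):
    hit = group_mean(clean, field, True)
    miss = group_mean(clean, field, False)
    print(f"{field}: success mean={hit}, non-success mean={miss}")
```

Group means that look similar, as in the second bullet group above, do not rule out predictive power; they only mean this simple comparison did not reveal it, which is one reason variable selection can end up being trial and error.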