Predicting Corporate Revenue Underperformance Using GAAP Reporting Metrics
Advanced Business Analytics
Emir Beriker / Henry Karongo
3 March 2014
Research Problem
We set out to develop a model that would determine whether companies would miss their revenue estimates based on GAAP metrics
• Using historical data, we experimented with several data mining algorithms and different combinations of input variables derived from GAAP metrics
• The model was intended to detect whether the sum of a company's next-quarter and expected following-quarter revenue would fall below analyst expectations
We used historical data from several publicly traded companies to conduct our experiments
• Data related to companies active in the "Communication Equipment" industry (as classified by Capital IQ)
• Each company had to have exceeded a $1B market cap at some point within the prior five years
• The data covered 10 years of quarterly information for each company in the dataset
• We assumed that the variables as formulated in the dataset would be representative of any new data that would be run through the model in an assumed live setting
• Special thanks to Mark Hamel for furnishing us with the data used in this project
Data Cleaning and Exploration
Our analysis began by examining the original dataset…
• 1,066 records: 120 classified as '1' on the output field and 946 classified as '0' (~11.3% of records met the success case)
• Our analysis focused on 18 fields within the dataset, broadly classified as follows:

Variable Classification                            | Number of Fields
Difference between actual vs. expected revenue     | 6
Change in accounts receivable/revenue              | 2
Change in deferred revenue                         | 1
Difference between actual vs. expected EBIT margin | 2
Change in goodwill/assets                          | 5
Projected vs. lagging revenue growth               | 1
Projected vs. lagging EBIT margin                  | 1
… which we cleaned by eliminating all rows with missing data values
• Rationale: we had no prior expectations or assumptions about typical values for any given field in the dataset
• Aim was to avoid unknowingly biasing our results by imputing a mean/median/mode in place of missing values
• Problems we identified in the raw data:
  • Missing data (i.e., missing variables)
  • Missing values (#DIV/0! errors)
• We handled these using XLMiner's Missing Data Handling tools, which reduced the number of records available for analysis to 586
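The listwise-deletion step described above can be sketched in pandas (a stand-in for XLMiner's Missing Data Handling tools; the column names below are illustrative, not the actual dataset fields):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the raw dataset; field names are hypothetical.
raw = pd.DataFrame({
    "revs_q1": [0.02, np.nan, -0.01, 0.05],
    "ar_qoq":  [0.10, 0.30, "#DIV/0!", -0.20],
    "output":  [0, 1, 0, 1],
})

# Excel error strings such as "#DIV/0!" are coerced to NaN first, then every
# row with any missing value is dropped (listwise deletion).
clean = raw.replace("#DIV/0!", np.nan).dropna()
print(len(clean))  # rows that survive listwise deletion
```

Listwise deletion avoids the biases of imputed values, at the cost of shrinking the sample (here from 1,066 to 586 records).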
Data exploration was our first attempt at understanding the dataset
• The focus of data exploration was:
  1. Understanding the statistical properties of the individual variables
  2. Searching for systematic differences in potential input variables when filtered on the output variable
• Intended to help with variable selection
Only 13.65% of the remaining records were classified in the success case
Most of the variables were approximately normally distributed around a mean of zero; most had a few outliers
Revs Q-1 and Revs Q-2 for the success case tend to have more negative values than when output = 0…
… but the tendency starts to break down for Revs Q-3 and Revs Q-4, which occur further in the past
We detected other differences in how variables were distributed when filtered by the success case
• Few positive outlier values in the success case for accounts receivable/revenue and deferred revenue/revenue
• In general, records in the success case tend to have fewer outliers
Correlations between the variables were generally low, offering little guidance for input variable selection
• Only two pairs of variables had absolute correlations > 0.4
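A correlation screen like the one described can be sketched as follows (synthetic data with one deliberately correlated pair; the real field names and values differ):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-ins for the GAAP-derived inputs; real fields differ.
x = rng.normal(size=200)
df = pd.DataFrame({
    "revs_q1": x,
    "revs_q2": 0.8 * x + rng.normal(scale=0.5, size=200),  # built to correlate
    "ar_qoq":  rng.normal(size=200),
})

corr = df.corr()
# Keep each unordered pair once and flag |r| > 0.4
pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > 0.4
]
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:.2f}")
```

In our dataset only two pairs cleared the 0.4 threshold, which is why the matrix gave so little guidance for variable selection.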
Data Mining Approaches
We experimented with five classification algorithms
• Problem task → classification (the output variable is binary)
• Therefore we used:
  • Classification and regression trees (CART)
  • Naïve Bayes
  • k-nearest neighbors (kNN)
  • Neural networks
  • Logistic regression
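Fitting these five algorithm families can be sketched with scikit-learn (a stand-in for XLMiner; the data here is synthetic and class-imbalanced to roughly mirror the dataset's ~14% success rate):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for the 586-record GAAP dataset.
rng = np.random.default_rng(1)
X = rng.normal(size=(586, 6))
y = (X[:, 0] + rng.normal(size=586) < -1.5).astype(int)  # ~14% "1"s

X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.4, random_state=1, stratify=y)

models = {
    "CART": DecisionTreeClassifier(random_state=1),
    "Naive Bayes": GaussianNB(),
    "kNN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "Neural net": MLPClassifier(max_iter=2000, random_state=1),
    "Logistic regression": LogisticRegression(),
}
# Validation accuracy per model (1 - overall error rate)
scores = {name: m.fit(X_tr, y_tr).score(X_va, y_va) for name, m in models.items()}
print(scores)
```

With an ~86/14 class split, overall accuracy alone is misleading (always predicting '0' scores ~86%), which is why we also tracked lift ratios and error rates on predicted '1's.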
We used an iterative process to select variables for inclusion in our classification models
• The variables didn't speak for themselves, and we had no preconceptions about which were potentially strong predictors of the success case
• Our iterative approach:
  1. Include all dataset variables in a given classification model
  2. Add/remove variables by trial and error, while monitoring error rates and lift ratios
  3. Keep the best results from a model
  4. Use the best combinations of variables in the other algorithms
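One way to mechanize this trial-and-error loop is a greedy backward-elimination sketch; this illustrates the general idea and is not the exact procedure we followed in XLMiner:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; the real inputs were the 18 GAAP-derived fields.
rng = np.random.default_rng(2)
X = rng.normal(size=(586, 6))
y = (X[:, 0] - X[:, 1] + rng.normal(size=586) < -1.8).astype(int)

def greedy_backward(X, y, model):
    """Start from all variables and drop one at a time as long as the
    cross-validated score does not get worse (trial-and-error pruning)."""
    keep = list(range(X.shape[1]))
    best = cross_val_score(model, X[:, keep], y, cv=5).mean()
    improved = True
    while improved and len(keep) > 1:
        improved = False
        for j in list(keep):
            trial = [k for k in keep if k != j]
            score = cross_val_score(model, X[:, trial], y, cv=5).mean()
            if score >= best:
                best, keep, improved = score, trial, True
                break
    return keep, best

keep, best = greedy_backward(X, y, LogisticRegression())
print(keep, best)
```

The scoring criterion here is cross-validated accuracy for simplicity; in our actual runs we monitored error rates and first-decile lift ratios on held-out data.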
Performance Metrics
• Lift ratio: the model's performance in identifying '1's relative to a random-selection baseline
  • We focused on the lift ratio of the 1st decile of records exposed to the model
• Overall error rate: the model's overall success in classifying records as actual '0's and '1's
• Error rate on predicted '1's: the error rate among records the model flags as '1'
  • Tells us how well the algorithm does at predicting cases where companies are anticipated to miss revenue estimates
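The first-decile lift ratio can be computed as follows (a minimal sketch of the metric as defined above):

```python
import numpy as np

def first_decile_lift(y_true, scores):
    """Lift ratio on the top 10% of records ranked by model score:
    success rate in the first decile divided by the base rate."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    order = np.argsort(-scores)            # highest-scored records first
    n_decile = max(1, len(y_true) // 10)
    top = y_true[order[:n_decile]]
    return top.mean() / y_true.mean()

# Toy check: 100 records, 10 true "1"s, 8 of which score highest.
y = np.array([1] * 10 + [0] * 90)
s = np.r_[np.full(8, 0.9), np.full(2, 0.1), np.full(90, 0.5)]
# 80% of the first decile are "1"s vs. a 10% base rate: roughly 8x lift
print(first_decile_lift(y, s))
```

A lift of 1.0x means the model does no better than random selection in its top decile, which is what we later observed for CART on validation and test data.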
Naïve Bayes algorithm
• Ran 9 iterations of the model using different combinations of variables
• The academic version of XLMiner cannot apply Naïve Bayes to variables with more than 100 discrete values
  • We therefore created binned variables of 100 equal intervals, where each value took the mid-value of the bin it was assigned to
• Best run (based on validation lift ratio):
Variables: Revs Q-1, Revs Q-2, AR QOQ, Def YOY, Growth Dif, Binned EBIT% Q-1

                    Training    Validation
Errors (overall)    5.7%        16.7%
Errors ("1"s)       28.9%       85.7%
Lift ratio          7.2x        2.0x

• High error rates on predicting "1"s
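The 100-interval binning used for the Naïve Bayes runs can be sketched in pandas; this assumes simple equal-width bins over the observed range, which may differ in detail from XLMiner's binning:

```python
import pandas as pd

# Assumed implementation of the binning workaround: each value is replaced
# with the midpoint of the equal-width bin it falls into (100 bins).
def bin_to_midpoints(values, n_bins=100):
    s = pd.Series(values, dtype=float)
    bins = pd.cut(s, bins=n_bins)              # 100 equal-width intervals
    return bins.apply(lambda iv: iv.mid).astype(float)

x = pd.Series([0.0, 1.0, 2.5, 99.9, 100.0])
binned = bin_to_midpoints(x)
print(binned.tolist())  # each entry is the mid-value of its bin
```

Binning caps the number of distinct values at 100 while keeping each binned value within half a bin width of the original.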
Classification and regression trees (CART)
• We used our insights from the Naïve Bayes tests to guide variable selection and to benchmark results from the CART tests
• We ran 4 iterations of the model, all of which returned identical results
Variables: Revs Q-1, Revs Q-2, AR QOQ, Def YOY, Growth Dif, EBIT% Q-1

                    Training    Validation    Test
Errors (overall)    0.0%        17.1%         10.3%
Errors ("1"s)       0.0%        100.0%        100.0%
Lift ratio          7.7x        1.0x          1.0x
• Concluded that CART was likely not an effective algorithm for this task
k-nearest neighbors
• Similar approach to CART; variable selection informed by results from other model runs
• Ran 6 iterations of the model with a variety of variable combinations
• Best run (based on validation lift ratio; best k = 3):
Variables: Revs Q-1, AR QOQ, Def YOY, GW Q-1, Growth Dif, EBIT% Dif

                    Training    Validation
Errors (overall)    9.9%        14.5%
Errors ("1"s)       68.9%       80.0%
Lift ratio          4.4x        2.6x

• High error rates on predicting "1"s
Logistic regression
• The correlation matrix did not expose any significant relationships between the variables, so we had to experiment with variable combinations as with the other algorithms
• Ran 7 different model formulations using the algorithm
• Best run (based on validation lift ratio):
Variables: Revs Q-1, AR QOQ, Def YOY, EBIT% Q-1, EBIT% Q-2, GW Q-1, GW Q-2, Growth Dif

                    Training    Validation
Errors (overall)    13.1%       15.4%
Errors ("1"s)       97.8%       97.1%
Lift ratio          3.12x       3.19x

• Very high error rates on predicting "1"s
Neural nets
• Ran 7 different model formulations using the algorithm
• Adding hidden layers and nodes did not significantly improve the predictive power of the model
• Best run (based on validation lift ratio):
Variables: Revs Q-1, AR QOQ, Def YOY, EBIT% Q-1, EBIT% Q-2, GW Q-1, GW Q-2, Growth Dif

                    Training    Validation
Errors (overall)    12.8%       15.0%
Errors ("1"s)       100%        100%
Lift ratio          2.7x        3.8x

• Extremely high error rates on predicting "1"s
On average, logistic regression and kNN produced the best lift ratios on validation datasets*
[Chart: Average Lift Ratio, Training vs. Validation Data]
* On 1st decile of records
All algorithms had overall validation error rates of between 14% and 18%
[Chart: Average Overall Validation Error Rates]
All algorithms were poor at correctly predicting "1"s on validation/test data
[Chart: Average Error Rate, Predicted Success Cases on Validation Data]
Certain variables appeared consistently in the best-performing models across algorithms

                    | Revs Q-1 | Revs Q-2 | AR QOQ | Def YOY | Growth Dif | EBIT% Q-1 | EBIT% Q-2 | GW Q-1 | GW Q-2 | EBIT% Dif
Naïve Bayes         |    ✓     |    ✓     |   ✓    |    ✓    |     ✓      |     ✓     |           |        |        |
CART                |    ✓     |    ✓     |   ✓    |    ✓    |     ✓      |     ✓     |           |        |        |
kNN                 |    ✓     |          |   ✓    |    ✓    |     ✓      |           |           |   ✓    |        |     ✓
Logistic regression |    ✓     |          |   ✓    |    ✓    |     ✓      |     ✓     |     ✓     |   ✓    |   ✓    |
Neural nets         |    ✓     |          |   ✓    |    ✓    |     ✓      |     ✓     |     ✓     |   ✓    |   ✓    |

(✓ = variable included in that algorithm's best run)
Summary Results
• Naïve Bayes and kNN do best at predicting "1"s (lowest error rates on predicting successes on validation data)
• The best validation lift ratio came from a neural net model, which was otherwise poor at predicting successes
• Best predictors for finding "1"s: Revs Q-1, AR QOQ, Def YOY, EBIT% Q-1, GW Q-1, Growth Dif
  • These variables appeared consistently across our best models in each algorithm test
Future work
• Experimenting with a larger dataset with a larger proportion of success cases
  • We used a dataset with 586 records where ~14% met the success case
• Using additional backward-looking data for the variables that performed well across algorithms
  • Example: ∆(accts. rec./revenue), ∆(deferred revenue/revenue) for other prior periods
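One hypothetical way to raise the proportion of success cases without gathering new data is to resample the minority class (shown here on synthetic stand-in data); collecting genuinely new records, as suggested above, would be preferable:

```python
import numpy as np

# Hypothetical illustration: duplicating minority-class rows to raise the
# share of success cases. The class split mirrors our dataset (~14% "1"s).
rng = np.random.default_rng(3)
y = np.array([1] * 80 + [0] * 506)        # ~14% of 586 records
X = rng.normal(size=(586, 6))

ones = np.flatnonzero(y == 1)
extra = rng.choice(ones, size=len(ones), replace=True)  # resample minority rows
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
print(round(y_bal.mean(), 3))             # success-case share roughly doubles
```

Duplicated rows add no new information, so this only rebalances what the algorithms see during training; it does not substitute for a larger dataset.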
Overall Learnings
• Real-world data gathering and cleaning is challenging
  • Decisions must often be made without outside guidance (e.g., our choice to eliminate all records with errors)
• Data exploration may not always yield information about the potential prediction power of variables • No systematic differences detected between records meeting the success case and those that did not
• Frequently, variable selection comes down to trial and error