Predicting Corporate Revenue Underperformance Using GAAP Reporting Metrics
Advanced Business Analytics
Emir Beriker / Henry Karongo
3 March 2014
Research Problem
We set out to develop a model that would determine whether companies would miss their revenue estimates based on GAAP metrics
• Using historical data, we experimented with several data mining algorithms and different combinations of input variables derived from GAAP metrics
• The model was intended to detect whether the sum of a company's next-quarter and expected following-quarter revenue would fall below analyst expectations
We used historical data from several publicly traded companies to conduct our experiments
• Data related to companies active in the "Communication Equipment" industry (as classified by Capital IQ)
• Each company had to have exceeded a $1B market cap at some point within the prior five years
• The data covered 10 years of quarterly information for each company in the dataset
• We assumed that the variables as formulated in the dataset would be representative of any new data that would be run through the model in an assumed live setting
• Special thanks to Mark Hamel for furnishing us with the data used in this project
Data Cleaning and Exploration
Our analysis began by examining the original dataset…
• 1,066 records: 120 classified as '1' on the output field and 946 classified as '0' (~11.3% of records met the success case)
• Our analysis focused on 18 fields within the dataset, broadly classified as follows:

Variable Classification                            | Number of Fields
Difference between actual vs. expected revenue     | 6
Change in accounts receivable/revenue              | 2
Change in deferred revenue                         | 1
Difference between actual vs. expected EBIT margin | 2
Change in goodwill/assets                          | 5
Projected vs. lagging revenue growth               | 1
Projected vs. lagging EBIT margin                  | 1
… which we cleaned by eliminating all rows with missing data values
• Rationale: we had no prior expectations or assumptions about typical values for any given field in the dataset
• Aim was to avoid unknowingly biasing our results by imputing a mean/median/mode in place of missing values
• Problems we identified in the raw data:
  • Missing data (i.e., missing variables)
  • Missing values (#DIV/0! errors)
• We handled these using XLMiner's Missing Data Handling tools, which reduced the number of records available for analysis to 586
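The listwise-deletion step described above can be sketched in pandas (a stand-in for XLMiner's Missing Data Handling tools; the column names below are illustrative, not the actual dataset fields):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the raw dataset; field names are hypothetical.
raw = pd.DataFrame({
    "revs_q1": [0.02, np.nan, -0.01, 0.05],
    "ar_qoq":  [0.10, 0.30, "#DIV/0!", -0.20],
    "output":  [0, 1, 0, 1],
})

# Excel error strings such as "#DIV/0!" are coerced to NaN first, then every
# row with any missing value is dropped (listwise deletion).
clean = raw.replace("#DIV/0!", np.nan).dropna()
print(len(clean))  # rows that survive listwise deletion
```

Listwise deletion avoids the biases of imputed values, at the cost of shrinking the sample (here from 1,066 to 586 records).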
Data exploration was our first attempt at understanding the dataset
• The focus of data exploration was:
  1. Understanding the statistical properties of the individual variables
  2. Searching for systematic differences in potential input variables when filtered on the output variable
• Intended to help with variable selection
Only 13.65% of the remaining records were classified in the success case
Most of the variables were approximately normally distributed around a mean of zero; most had a few outliers
Revs Q-1 and Revs Q-2 for the success case tend to have more negative values than when output = 0…
… but the tendency starts to break down for Revs Q-3 and Revs Q-4, which occur further in the past
We detected other differences in how variables were distributed when filtered by the success case
• Few positive outlier values in the success case for accounts receivable/revenue and deferred revenue/revenue
• In general, records in the success case tend to have fewer outliers
Correlations between the variables were generally low, offering little guidance for input variable selection
• Only two pairs of variables had absolute correlations > 0.4
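A correlation screen like the one described can be sketched as follows (synthetic data with one deliberately correlated pair; the real field names and values differ):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-ins for the GAAP-derived inputs; real fields differ.
x = rng.normal(size=200)
df = pd.DataFrame({
    "revs_q1": x,
    "revs_q2": 0.8 * x + rng.normal(scale=0.5, size=200),  # built to correlate
    "ar_qoq":  rng.normal(size=200),
})

corr = df.corr()
# Keep each unordered pair once and flag |r| > 0.4
pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > 0.4
]
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:.2f}")
```

In our dataset only two pairs cleared the 0.4 threshold, which is why the matrix gave so little guidance for variable selection.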
Data Mining Approaches
We experimented with five classification algorithms
• Problem task → classification (the output variable is binary)
• Therefore we used:
  • Classification and regression trees (CART)
  • Naïve Bayes
  • k-nearest neighbors (kNN)
  • Neural networks
  • Logistic regression
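Fitting these five algorithm families can be sketched with scikit-learn (a stand-in for XLMiner; the data here is synthetic and class-imbalanced to roughly mirror the dataset's ~14% success rate):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for the 586-record GAAP dataset.
rng = np.random.default_rng(1)
X = rng.normal(size=(586, 6))
y = (X[:, 0] + rng.normal(size=586) < -1.5).astype(int)  # ~14% "1"s

X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.4, random_state=1, stratify=y)

models = {
    "CART": DecisionTreeClassifier(random_state=1),
    "Naive Bayes": GaussianNB(),
    "kNN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "Neural net": MLPClassifier(max_iter=2000, random_state=1),
    "Logistic regression": LogisticRegression(),
}
# Validation accuracy per model (1 - overall error rate)
scores = {name: m.fit(X_tr, y_tr).score(X_va, y_va) for name, m in models.items()}
print(scores)
```

With an ~86/14 class split, overall accuracy alone is misleading (always predicting '0' scores ~86%), which is why we also tracked lift ratios and error rates on predicted '1's.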
We used an iterative process to select variables for inclusion in our classification models
• The variables didn't speak for themselves, and we had no preconceptions about which were potentially strong predictors of the success case
• Our iterative approach:
  1. Include all dataset variables in a given classification model
  2. Add/remove variables by trial and error, while monitoring error rates and lift ratios
  3. Keep the best results from a model
  4. Use the best combinations of variables in the other algorithms
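One way to mechanize this trial-and-error loop is a greedy backward-elimination sketch; this illustrates the general idea and is not the exact procedure we followed in XLMiner:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; the real inputs were the 18 GAAP-derived fields.
rng = np.random.default_rng(2)
X = rng.normal(size=(586, 6))
y = (X[:, 0] - X[:, 1] + rng.normal(size=586) < -1.8).astype(int)

def greedy_backward(X, y, model):
    """Start from all variables and drop one at a time as long as the
    cross-validated score does not get worse (trial-and-error pruning)."""
    keep = list(range(X.shape[1]))
    best = cross_val_score(model, X[:, keep], y, cv=5).mean()
    improved = True
    while improved and len(keep) > 1:
        improved = False
        for j in list(keep):
            trial = [k for k in keep if k != j]
            score = cross_val_score(model, X[:, trial], y, cv=5).mean()
            if score >= best:
                best, keep, improved = score, trial, True
                break
    return keep, best

keep, best = greedy_backward(X, y, LogisticRegression())
print(keep, best)
```

The scoring criterion here is cross-validated accuracy for simplicity; in our actual runs we monitored error rates and first-decile lift ratios on held-out data.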
Performance Metrics
• Lift ratio: the model's performance in identifying '1's relative to a random-selection baseline
  • We focused on the lift ratio of the 1st decile of records exposed to the model
• Overall error rate: the model's overall success in classifying records as actual '0's and '1's
• Error rate on predicted '1's: the error rate among records the model flags as '1'
  • Tells us how well the algorithm does at predicting cases where companies are anticipated to miss revenue estimates
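The first-decile lift ratio can be computed as follows (a minimal sketch of the metric as defined above):

```python
import numpy as np

def first_decile_lift(y_true, scores):
    """Lift ratio on the top 10% of records ranked by model score:
    success rate in the first decile divided by the base rate."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    order = np.argsort(-scores)            # highest-scored records first
    n_decile = max(1, len(y_true) // 10)
    top = y_true[order[:n_decile]]
    return top.mean() / y_true.mean()

# Toy check: 100 records, 10 true "1"s, 8 of which score highest.
y = np.array([1] * 10 + [0] * 90)
s = np.r_[np.full(8, 0.9), np.full(2, 0.1), np.full(90, 0.5)]
# 80% of the first decile are "1"s vs. a 10% base rate: roughly 8x lift
print(first_decile_lift(y, s))
```

A lift of 1.0x means the model does no better than random selection in its top decile, which is what we later observed for CART on validation and test data.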
Naïve Bayes algorithm
• Ran 9 iterations of the model using different combinations of variables
• The academic version of XLMiner cannot apply Naïve Bayes to variables with more than 100 discrete values
  • We therefore created binned variables of 100 equal intervals, where each value took the mid-value of the bin it was assigned to
• Best run (based on validation lift ratio):
Variables: Revs Q-1, Revs Q-2, AR QOQ, Def YOY, Growth Dif, Binned EBIT% Q-1

                    Training    Validation
Errors (overall)    5.7%        16.7%
Errors ("1"s)       28.9%       85.7%
Lift ratio          7.2x        2.0x

• High error rates on predicting "1"s
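The 100-interval binning used for the Naïve Bayes runs can be sketched in pandas; this assumes simple equal-width bins over the observed range, which may differ in detail from XLMiner's binning:

```python
import pandas as pd

# Assumed implementation of the binning workaround: each value is replaced
# with the midpoint of the equal-width bin it falls into (100 bins).
def bin_to_midpoints(values, n_bins=100):
    s = pd.Series(values, dtype=float)
    bins = pd.cut(s, bins=n_bins)              # 100 equal-width intervals
    return bins.apply(lambda iv: iv.mid).astype(float)

x = pd.Series([0.0, 1.0, 2.5, 99.9, 100.0])
binned = bin_to_midpoints(x)
print(binned.tolist())  # each entry is the mid-value of its bin
```

Binning caps the number of distinct values at 100 while keeping each binned value within half a bin width of the original.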
Classification and regression trees (CART)
• We used our insights from the Naïve Bayes tests to guide variable selection and to benchmark results from the CART tests
• We ran 4 iterations of the model, all of which returned identical results
Variables: Revs Q-1, Revs Q-2, AR QOQ, Def YOY, Growth Dif, EBIT% Q-1

                    Training    Validation    Test
Errors (overall)    0.0%        17.1%         10.3%
Errors ("1"s)       0.0%        100.0%        100.0%
Lift ratio          7.7x        1.0x          1.0x
• Concluded that CART was likely not an effective algorithm for this task
k-nearest neighbors
• Similar approach to CART; variable selection informed by results from other model runs
• Ran 6 iterations of the model with a variety of variable combinations
• Best run (based on validation lift ratio; best k = 3):
Variables: Revs Q-1, AR QOQ, Def YOY, GW Q-1, Growth Dif, EBIT% Dif

                    Training    Validation
Errors (overall)    9.9%        14.5%
Errors ("1"s)       68.9%       80.0%
Lift ratio          4.4x        2.6x

• High error rates on predicting "1"s
Logistic regression
• The correlation matrix did not expose any significant relationships between the variables, so we had to experiment with variable combinations as with the other algorithms
• Ran 7 different model formulations using the algorithm
• Best run (based on validation lift ratio):
Variables: Revs Q-1, AR QOQ, Def YOY, EBIT% Q-1, EBIT% Q-2, GW Q-1, GW Q-2, Growth Dif

                    Training    Validation
Errors (overall)    13.1%       15.4%
Errors ("1"s)       97.8%       97.1%
Lift ratio          3.12x       3.19x

• Very high error rates on predicting "1"s
Neural nets
• Ran 7 different model formulations using the algorithm
• Adding hidden layers and nodes did not significantly improve the predictive power of the model
• Best run (based on validation lift ratio):
Variables: Revs Q-1, AR QOQ, Def YOY, EBIT% Q-1, EBIT% Q-2, GW Q-1, GW Q-2, Growth Dif

                    Training    Validation
Errors (overall)    12.8%       15.0%
Errors ("1"s)       100%        100%
Lift ratio          2.7x        3.8x

• Extremely high error rates on predicting "1"s
On average, logistic regression and kNN produced the best lift ratios on validation datasets*
[Chart: Average Lift Ratio, Training vs. Validation Data]
* On 1st decile of records
All algorithms had overall validation error rates of between 14% and 18%
[Chart: Average Overall Validation Error Rates]
All algorithms were poor at correctly predicting "1"s on validation/test data
[Chart: Average Error Rate, Predicted Success Cases on Validation Data]
Certain variables appeared consistently in the best-performing models across algorithms

                    | Revs Q-1 | Revs Q-2 | AR QOQ | Def YOY | Growth Dif | EBIT% Q-1 | EBIT% Q-2 | GW Q-1 | GW Q-2 | EBIT% Dif
Naïve Bayes         |    ✓     |    ✓     |   ✓    |    ✓    |     ✓      |     ✓     |           |        |        |
CART                |    ✓     |    ✓     |   ✓    |    ✓    |     ✓      |     ✓     |           |        |        |
kNN                 |    ✓     |          |   ✓    |    ✓    |     ✓      |           |           |   ✓    |        |     ✓
Logistic regression |    ✓     |          |   ✓    |    ✓    |     ✓      |     ✓     |     ✓     |   ✓    |   ✓    |
Neural nets         |    ✓     |          |   ✓    |    ✓    |     ✓      |     ✓     |     ✓     |   ✓    |   ✓    |

(✓ = variable included in that algorithm's best run)
Summary Results
• Naïve Bayes and kNN do best at predicting "1"s (lowest error rates on predicting successes on validation data)
• The best validation lift ratio came from a neural net model, which was otherwise poor at predicting successes
• Best predictors for finding "1"s: Revs Q-1, AR QOQ, Def YOY, EBIT% Q-1, GW Q-1, Growth Dif
  • These variables appeared consistently across our best models in each algorithm test
Future work
• Experimenting with a larger dataset with a larger proportion of success cases
  • We used a dataset with 586 records where ~14% met the success case
• Using additional backward-looking data for the variables that performed well across algorithms
  • Example: ∆(accts. rec./revenue), ∆(deferred revenue/revenue) for other prior periods
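One hypothetical way to raise the proportion of success cases without gathering new data is to resample the minority class (shown here on synthetic stand-in data); collecting genuinely new records, as suggested above, would be preferable:

```python
import numpy as np

# Hypothetical illustration: duplicating minority-class rows to raise the
# share of success cases. The class split mirrors our dataset (~14% "1"s).
rng = np.random.default_rng(3)
y = np.array([1] * 80 + [0] * 506)        # ~14% of 586 records
X = rng.normal(size=(586, 6))

ones = np.flatnonzero(y == 1)
extra = rng.choice(ones, size=len(ones), replace=True)  # resample minority rows
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
print(round(y_bal.mean(), 3))             # success-case share roughly doubles
```

Duplicated rows add no new information, so this only rebalances what the algorithms see during training; it does not substitute for a larger dataset.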
Overall Learnings
• Real-world data gathering and cleaning is challenging
  • Decisions must often be made without outside guidance (e.g., our choice to eliminate all records with errors)
• Data exploration may not always yield information about the potential prediction power of variables • No systematic differences detected between records meeting the success case and those that did not
• Frequently, variable selection comes down to trial and error