
TALKING AI
Spotting corporate trouble before the rot sets in
Recent research shows that a RegTech system homing in on email communications can identify the early signs of a company heading for hot water. Our columnist picks up the trail
ELISABETTA BASILICO Independent consultant
In 2016, Wells Fargo disclosed that it had fired more than 5,000 employees over unlawful sales practices, namely opening fake accounts. These were low-wage workers under heavy pressure from supervisors, who told them they might lose their jobs. The employees opened the accounts to earn bonuses under an incentive programme at the bank that encouraged them to open more accounts. Customers, in turn, were hoodwinked into paying unnecessary fees. A recent paper published in the Journal of Financial Data Science studies a possible early-warning system based on evidence that ‘fraudulent activity among corporate employees tends to occur in social network clusters’. We can think of it as a form of RegTech.
The methodology applies natural language processing to the content of corporate emails and also analyses the network dynamics between senders and receivers. The sample used to search for predictive signals and metrics consists of 113,000 emails sent by 144 Enron employees and 1,300 articles that appeared on PR Newswire from January 2000 to December 2001.
Where did the researchers find this data? The Federal Energy Regulatory Commission made it available during its investigation into Enron’s practices.
It is a unique dataset. Compared with tweets, which are a popular alternative dataset studied in academia, emails can capture longer-term trends. Compared to 10-Ks – another popular alternative dataset – emails have timelier information.
THE RESEARCH QUESTIONS STUDIED The researchers posed four questions:
1) Does the sentiment conveyed by employee communications contain value-relevant information?
2) Is this information conveyed in a timely enough manner to predict subsequent stock returns?
3) Do other structural characteristics of internal employee emails (eg, email length, email volume, or email network characteristics) also contain value-relevant information?
4) Which contains more value-relevant information: the actual verbal content of employee emails or their structural characteristics?
THE METRICS STUDIED The researchers focused on three elements: email content, from which they extracted sentiment using two word-classification dictionaries (the Harvard General Inquirer and the Loughran and McDonald sentiment word lists); email length; and communication networks inside the firm.
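To make the content and length metrics concrete, here is a minimal sketch of dictionary-based scoring, assuming one common definition of net sentiment, (positive - negative) / (positive + negative); the paper's exact formula may differ. The word lists are tiny hypothetical stand-ins, not the Harvard General Inquirer or Loughran and McDonald lists, and the helper names `net_sentiment` and `email_length` are illustrative.

```python
import re

# Hypothetical toy word lists for illustration only; the study used the
# Harvard General Inquirer and Loughran-McDonald lists, which are far larger.
POSITIVE = {"gain", "growth", "profit", "profits", "strong", "success"}
NEGATIVE = {"loss", "losses", "decline", "risk", "weak", "concern"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def net_sentiment(text: str) -> float:
    """(positive - negative) / (positive + negative) for a single email.

    One common net-sentiment definition (an assumption, not necessarily the
    paper's); emails containing no sentiment words score 0.0.
    """
    words = tokenize(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def email_length(text: str) -> int:
    """Structural feature: email length, counted in words."""
    return len(tokenize(text))


email = "Strong profit growth this quarter despite some trading losses."
print(net_sentiment(email), email_length(email))  # prints 0.5 9
```

In practice, per-email scores like these would be averaged over a period (for example, a month) before being related to stock returns.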
THE RESEARCH FINDINGS The authors observed clear trends in both sentiment and email length. Specifically, positive sentiment in both emails and news articles declined into 2001. Likewise, average email length fell by 50% into 2001, from an initial average of 400 words to about 200. As for the predictive power of these two measures for future stock returns, the researchers found that each variable, taken on its own, has predictive power: a one-standard-deviation decrease in net sentiment from emails was associated with a 4.5% decline in the company’s stock.
However, when both are included in the same model, sentiment is no longer statistically significant and email length takes over in explaining future stock returns: for every 20-character decline in email length, future stock returns were 1.17% lower. The intuition is that, as corporate risks and malfeasance increase, employees become less willing to share details in corporate emails. Email length remains the dominant predictor even when the researchers add a further variable: the sentiment of news articles.
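Why sentiment can lose its significance once email length enters the model is easiest to see with a toy regression. The sketch below uses synthetic data (an assumption, not the Enron sample) in which returns depend only on email length and sentiment merely moves with length. Fitting ordinary least squares with numpy shows a large sentiment coefficient on its own that collapses towards zero once length is added. The variable names and coefficients are hypothetical, and the paper's actual specification, controls and standard errors are not reproduced.

```python
import numpy as np


def ols(y: np.ndarray, X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Fit y = X @ beta by ordinary least squares; return (beta, t-stats).

    X is expected to include a column of ones for the intercept.
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance
    se = np.sqrt(np.diag(sigma2 * xtx_inv))   # classical standard errors
    return beta, beta / se


# Synthetic data (an assumption for illustration, not the Enron sample):
# returns depend only on email length; sentiment merely co-moves with length.
rng = np.random.default_rng(0)
n = 500
length = rng.standard_normal(n)                           # standardised email length
sentiment = 0.8 * length + 0.6 * rng.standard_normal(n)   # correlated with length
future_ret = 1.0 * length + 0.5 * rng.standard_normal(n)

ones = np.ones(n)
beta_s, t_s = ols(future_ret, np.column_stack([ones, sentiment]))
beta_b, t_b = ols(future_ret, np.column_stack([ones, sentiment, length]))

print(f"sentiment alone: coef={beta_s[1]:.2f}, t={t_s[1]:.1f}")
print(f"with length: sentiment coef={beta_b[1]:.2f}, t={t_b[1]:.1f}; "
      f"length coef={beta_b[2]:.2f}, t={t_b[2]:.1f}")
```

The sentiment coefficient looks strongly significant in the first regression only because sentiment proxies for length; once length is in the model, it absorbs the explanatory power.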
The researchers went a step further and examined whether the structure of communication among Enron’s top officers, ie their email network, offers revealing insights into management behaviour as a crisis approaches. The data show a marked increase in connectedness from Q4 2000 to Q4 2001: email connectivity among top officers intensified and, consistent with the earlier results, their emails became shorter. But what about the vocabulary used in these emails? Using a technique similar to Google Trends, the researchers found an increase in the use of the word ‘losses’ and a significant decrease in the use of the word ‘profits’.
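Both measures can be sketched in a few lines. This assumes graph density (links observed divided by links possible) as the connectedness measure and a Google-Trends-style index (the share of emails using a term, scaled so the peak period equals 100). Both are illustrative choices rather than the paper's documented method, and the function names and toy data are hypothetical.

```python
import re
from collections import defaultdict


def network_density(messages: list[tuple[str, str]]) -> float:
    """Share of possible pairs of people who exchanged at least one email.

    messages: (sender, receiver) pairs for one period. The graph is treated
    as undirected and self-addressed emails are ignored (both assumptions).
    """
    nodes = {person for pair in messages for person in pair}
    edges = {frozenset(pair) for pair in messages if pair[0] != pair[1]}
    possible = len(nodes) * (len(nodes) - 1) / 2
    return len(edges) / possible if possible else 0.0


def term_trend(emails: list[tuple[str, str]], term: str) -> dict[str, float]:
    """Google-Trends-style index: per-period share of emails using `term`,
    rescaled so that the period with the highest share equals 100.

    emails: (period, body) pairs, e.g. ("2000Q4", "...").
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for period, body in emails:
        totals[period] += 1
        if term in re.findall(r"[a-z]+", body.lower()):
            hits[period] += 1
    shares = {p: hits[p] / totals[p] for p in totals}
    peak = max(shares.values(), default=0.0)
    return {p: (100 * s / peak if peak else 0.0) for p, s in sorted(shares.items())}


# Hypothetical toy data: four officers labelled A to D, two quarters.
q4_2000 = [("A", "B"), ("C", "D"), ("A", "B")]
q4_2001 = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
print(network_density(q4_2000), network_density(q4_2001))

emails = [("2000Q4", "Profits look strong"), ("2000Q4", "Meeting at noon"),
          ("2001Q4", "We expect losses"), ("2001Q4", "More losses reported")]
print(term_trend(emails, "losses"), term_trend(emails, "profits"))
```

In this toy example, density rises from one-third to fully connected, while the ‘losses’ index moves from 0 to 100 and ‘profits’ does the reverse.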
The study shows that it is possible to build a relatively simple risk-management system that detects rising risks at the corporate level, including possible fraudulent behaviour. The findings also indicate that structural characteristics of emails, such as their length, are stronger predictors than the content of the emails themselves.
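As an illustration of how simple such a system could be, here is a hypothetical alert rule that flags any period whose average email length drops more than k standard deviations below its trailing average. The function name, window, threshold and data series are assumptions chosen for the example (the series loosely echoes the fall from about 400 words reported in the study); the paper does not prescribe this rule.

```python
from statistics import mean, stdev


def length_alerts(avg_lengths: list[float], window: int = 4, k: float = 2.0) -> list[int]:
    """Return the indices of periods whose average email length falls more than
    k standard deviations below the mean of the preceding `window` periods.

    window and k are illustrative defaults, not values taken from the paper.
    """
    alerts = []
    for i in range(window, len(avg_lengths)):
        history = avg_lengths[i - window:i]
        mu, sd = mean(history), stdev(history)
        # Skip flat histories (sd == 0), where the deviation test is undefined.
        if sd > 0 and avg_lengths[i] < mu - k * sd:
            alerts.append(i)
    return alerts


# Hypothetical monthly averages of email length (words), not Enron data.
lengths = [400, 410, 395, 405, 390, 300, 250, 210]
print(length_alerts(lengths))  # prints [5, 6]
```

The final drop to 210 is not flagged because the trailing window has already absorbed the earlier falls; this is a known weakness of purely rolling baselines.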
With 10%–15% of staff at financial institutions dedicated to compliance, RegTech (the use of technology to manage regulatory processes within the financial industry) is a promising field within the broader FinTech sector. Indeed, according to a MediciInsights report, RegTech represents a $100 billion market opportunity. In the words of the paper’s authors: ‘Early detection and prevention is better than a cure.’
Elisabetta Basilico is a consultant and academic who specialises in turning research papers into investment strategies. Her areas of expertise are asset allocation and quantitative investing.