
Determining the Pay Period Frequency in a Wage and Hour Case, using Machine Learning (AI)
Kyle Cheek, Ph.D. Economist and Statistician
EmployStats Consulting Partners, LLC
October 30, 2025
Abstract
Real-world data enable analytics that are critical to mediation and litigation strategies. However, real-world data sometimes lack important information, perhaps because it is not captured in the source system, or perhaps because it was not deemed critical when the data were extracted. This case study highlights an innovative Machine Learning solution to one specific variety of missing information that occurs from time to time in employee payroll data: missing pay period information.
Introduction
This case study highlights such a case – a payroll data extract that includes paycheck dates but does not include pay period start and end dates. There is also no external definition of pay periods that can be used to impute pay period start and end dates. Therefore, the only way to recover the pay period information is to somehow reverse-engineer it from the information that is available. If the pay period information can be reverse-engineered, time and pay data can be linked together to enable more detailed analysis of possible pay violations.
One critical bit of information in this reverse engineering is the pay schedule. In order to accurately link time and pay data, it is important to know how many workdays are included in each paycheck, that is, whether pay periods are weekly, bi-weekly, etc. One way to find the pay schedule is to manually inspect the distribution of check dates and hope the patterns are clear enough to reveal the periodicity by inspection. The pay data in this particular case study are not that clean: pay records are sparse for some employees, and there appear to be multiple pay cycles (some weekly, others bi-weekly, plus off-cycle checks, etc.). So rather than iterate blindly until some patterns emerge, we turn to Machine Learning methods to extract pay cycle information algorithmically. Specifically, we are going to use a very well-known technique from the engineering domain, the Fourier Transform, to extract important information about employee pay cycles, and then we are going to apply a simple clustering technique to the output from the Fourier Transforms.
Methodology
Fourier Transforms are a way to identify the simple (“periodic”) waves that can be summed together to closely approximate a more complicated wave (what electrical engineers simply call the “signal”). The simple waves that are identified in the Fourier Transform often tell us about important characteristics of the more complex time-series signal. These transforms are also useful because even a lengthy, complex time series can often be well approximated by only a few of its most prominent simple waves.
In this particular case study, it is speculated that many of the employees are paid weekly, but that there are also some employees paid bi-weekly and semi-monthly, and even some who switched from weekly to bi-weekly within the data. So the goal of the Fourier analysis, then, is to examine the underlying sine waves to identify the employees on a weekly pay cycle versus those paid on a non-weekly schedule. If this can be accomplished, the original reverse engineering problem has been greatly simplified.
In practice, the Fourier Analysis is performed separately for each employee in the payroll data. Each employee’s payroll data is sorted into a list of pay dates. A measure also needs to be assigned to each pay date; in this example the total number of paid hours is used. All of the days in between pay dates are then filled in and assigned a value of 0. This results in a daily time series for each employee, with non-zero measures on days they received a paycheck, and zeroes on all of the other days. These employee-specific time series then serve as the “signal” to which the Fourier Transform will be applied. For each of the individual sine waves it identifies, the Fourier Transform returns a frequency along with a measure of strength (an “amplitude” or “magnitude”). The sine waves can then be sorted by their amplitude, from the strongest to the weakest in the original signal.
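The construction of the daily signal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `daily_signal`, the pay dates, and the hours are all hypothetical, and summing two checks that share a date is a choice made here, not something the case study specifies.

```python
import numpy as np

def daily_signal(pay_dates, paid_hours):
    """Zero-filled daily series running from the first to the last pay date."""
    dates = np.array(pay_dates, dtype="datetime64[D]")
    hours = np.asarray(paid_hours, dtype=float)
    offsets = (dates - dates.min()).astype(int)  # whole days since the first check
    signal = np.zeros(offsets.max() + 1)         # every day starts at 0
    np.add.at(signal, offsets, hours)            # assumption: same-day checks are summed
    return signal

# Hypothetical employee with four weekly checks
signal = daily_signal(
    ["2024-01-05", "2024-01-12", "2024-01-19", "2024-01-26"],
    [40.0, 42.5, 40.0, 38.0],
)
```

The resulting array has one entry per calendar day, with paid hours on check dates and zeroes everywhere else, which is the form the Fourier Transform expects.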
Here's an example of a time series for someone who gets paid weekly for ~40-hour weeks with some occasional overtime:

And here is a graph of the frequencies (horizontal axis) and their associated magnitudes (vertical axis) that are returned by the Fourier transform for this particular employee. The strongest frequency is approximately 0.14 cycles per time period, meaning that about 14/100 of a wave completes in one time period. Since the time periods are days, one complete wave takes about 1/0.14 ≈ 7 days, or once a week, exactly as expected. (The strong frequencies at ~0.28 and ~0.42 correspond to twice-weekly (1/0.28 ≈ 3.5 days) and thrice-weekly (1/0.42 ≈ 2.3 days) undulations in the time series. These are harmonics of the weekly frequency and are more of an artifact of the imputed "0" days than anything. But it turns out they are not throw-away, either.)
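The arithmetic can be checked on a synthetic series. The sketch below is not the employee data from the case study; it assumes a perfectly regular 40-hour check every seventh day for 52 weeks and uses NumPy's real-input FFT, with the mean removed so that the zero-frequency term does not dominate.

```python
import numpy as np

n_days = 364                      # 52 whole weeks, so 1/7 lands exactly on a frequency bin
signal = np.zeros(n_days)
signal[::7] = 40.0                # hypothetical 40-hour paycheck every seventh day

magnitudes = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(n_days, d=1.0)   # frequencies in cycles per day

# Three strongest frequencies, reported in ascending order of frequency
top3 = np.sort(freqs[np.argsort(magnitudes)[-3:]])
periods = 1.0 / top3              # days per complete wave

print(np.round(top3, 3))          # weekly frequency and its two harmonics
print(np.round(periods, 2))
```

The three strongest frequencies are 1/7, 2/7, and 3/7 cycles per day, that is, periods of 7, 3.5, and about 2.33 days, matching the ~0.14, ~0.28, and ~0.42 peaks described above.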

Similar frequency graphs (or “periodograms”) can be generated for as many employees as exist in the payroll data, and each employee’s frequency graph can be inspected to identify the strongest frequency for insight into their pay schedule. Another nice thing about these Fourier Transforms is that they are generally robust against noise, like an occasional off-cycle paycheck, which can muddy the more manual approach of visual inspection. (The occasional noise of an off-cycle check tends to be captured in the very low frequencies. Similarly, even when employees have sparse pay histories, this method tends to capture the right frequency and relegate long idle periods to low frequencies.) Not surprisingly, since most of the employees in this data are paid weekly, most of them return a graph that shows ~0.14 as their strongest frequency.
What about employees that are paid bi-weekly/semi-monthly? Here’s an example of an employee who was paid weekly at first and then shifted to a bi-weekly pay cadence, with a gap of several months that further muddies their time series:

For this time series, the strongest frequency in the decomposition is ~.07. The reciprocal of .07 is ~14-15. So here we see the frequency domain fingerprint of bi-weekly/semi-monthly (14-15 day) pay patterns. Even with the weekly pay periods at the beginning of the series, the decomposition still identifies the bi-weekly cadence as the dominant frequency. And again, there are three frequencies that seem to be disproportionately dominant.

Since the top three frequencies seem to contain the essence of the different pay cadences, the top three frequencies are captured for each employee in descending order of their magnitude. The end result is a dataframe that looks something like this:
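As a rough sketch of how such a table could be assembled, the following uses pandas and two synthetic employees. The employee IDs, the column names `freq_1` through `freq_3`, and the helper names are hypothetical and are not taken from the case data. On perfectly regular synthetic series the harmonics tie in magnitude, so the order among them is arbitrary here; real payroll data would normally break those ties.

```python
import numpy as np
import pandas as pd

def top_frequencies(signal, k=3):
    """Return the k strongest frequencies (cycles/day), ranked by magnitude."""
    signal = np.asarray(signal, dtype=float)
    magnitudes = np.abs(np.fft.rfft(signal - signal.mean()))  # mean removed: bin 0 is ~0
    freqs = np.fft.rfftfreq(len(signal), d=1.0)
    order = np.argsort(magnitudes)[::-1][:k]                  # largest magnitudes first
    return freqs[order]

def synthetic_signal(n_days, period, hours):
    """Hypothetical employee paid `hours` every `period` days."""
    s = np.zeros(n_days)
    s[::period] = hours
    return s

signals = {
    "E001": synthetic_signal(364, 7, 40.0),    # weekly
    "E002": synthetic_signal(364, 14, 80.0),   # bi-weekly
}

rows = []
for employee_id, sig in signals.items():
    f1, f2, f3 = top_frequencies(sig)
    rows.append({"employee_id": employee_id, "freq_1": f1, "freq_2": f2, "freq_3": f3})

features = pd.DataFrame(rows)
print(features)
```

Each row holds one employee and that employee's three strongest frequencies, which is the input the clustering step works from.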
The employees could simply be separated into groups according to their strongest frequency, but there appear to be some interesting groupings that involve the second and third frequencies as well. A simple agglomerative clustering can instead be used to separate the employees into groups based on all three of their top frequencies (see dendrogram below). Again as expected, all of the employees with weekly pay frequencies end up clustered closely together, and all of the employees with bi-weekly/semi-monthly pay patterns end up clustered closely together. Some additional sub-clusters also appear within the bi-weekly/semi-monthly cluster that can be used to further refine employee groupings.
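A minimal sketch of this clustering step, using SciPy's hierarchical clustering routines, might look like the following. The feature values are made up for illustration, and both the Ward linkage and the cut into two clusters are assumptions; the case study does not state which linkage method or cut was used.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical top-three frequencies (cycles/day), one row per employee
features = np.array([
    [0.143, 0.286, 0.429],   # weekly
    [0.142, 0.285, 0.428],   # weekly
    [0.144, 0.287, 0.430],   # weekly
    [0.071, 0.143, 0.214],   # bi-weekly (~14 days)
    [0.070, 0.141, 0.212],   # bi-weekly (~14 days)
    [0.067, 0.133, 0.200],   # semi-monthly (~15 days)
])

# Agglomerative (bottom-up) merge tree; Ward linkage is an assumption
tree = linkage(features, method="ward")

# Cut the tree into two groups and assign a Cluster ID to each employee
cluster_id = fcluster(tree, t=2, criterion="maxclust")
print(cluster_id)
```

Passing the same `tree` to `scipy.cluster.hierarchy.dendrogram` would draw the kind of dendrogram referred to above, and raising `t` would expose finer sub-clusters.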

The clustering algorithm assigns a “Cluster ID” to each employee that identifies the cluster to which they belong. With this final list of employees and their assigned Cluster IDs, each employee’s pay frequency has been “learned” algorithmically by the combined Fourier Transform and clustering algorithms. This final result can then be used as a critical input to our original problem of reverse-engineering the unknown pay period definitions. Specifically, we now know whether to match each employee’s time and pay data based on a weekly pay cycle or on a bi-weekly/semi-monthly pay schedule. Conveniently, the analysis has also identified which employees switch from weekly to bi-weekly pay cycles. And with a little bit of tweaking, the clustering step could almost certainly even separate the bi-weekly and semi-monthly sub-clusters.
Conclusion
It is important to recap what has been gained with this approach. The original problem was to identify the unknown pay cycles of a large group of employees using only information from their payroll data. Without a machine learning approach, an analyst might spend countless hours examining each employee’s pay history. This algorithmic approach can produce a very good estimate of the same information in a matter of seconds. Additionally, the algorithm is not customized to any particular set of data. The exact same code can be used on an entirely different set of payroll data the next time this problem arises. And not only can this exact same code be run in the software in which it was created, it can also be packaged up and made available for use in any of a variety of other software applications that may be used to analyze payroll data.
Future work could expand these AI solutions to other employee data challenges, making mediation and litigation strategies even sharper.