
Big Data Analytics

Although big data analytics is a hot topic in many sectors, companies from banking to pharmaceuticals have only recently started to realise the value locked in their data. The oil and gas industry is no exception, and it is certainly no stranger to large amounts of process data.

For many years, data historians have been collecting measurements from instrumentation around oil and gas processes globally. The data is large and complex, with many interactions and correlations between variables that are not easily interpreted.

What is Big Data?

Put simply, big data is a collection of information, gathered through various means, which is so large and complex that significant benefits can only be extracted by applying computational algorithms. Indeed, the exponential growth in computational processing power and storage is the key driving force behind the wider application of big data analytics.

The Status Quo

Large historical datasets contain valuable information and knowledge that, when coupled with domain expertise, can be used to achieve a variety of benefits, including more efficient maintenance and scheduling, improved performance, reduced downtime and maximised margins.

However, this valuable information is hidden in the sheer quantity of data, a problem compounded by dataset issues such as noise from unrepresentative operation, e.g. process upsets and malfunctioning instrumentation. Hence, to simplify analytics, operators usually focus on time series trending and first order effects.

Upstream production involves many complex processes, and numerous properties are constantly monitored, including temperature, pressure, flow rates and gas-oil ratio (GOR). As there are numerous processes occurring simultaneously, it can be difficult to pinpoint exactly where a problem originates, and this is where data analytics can step in.

To give an example, a client approached Intertek as they were having intermittent problems with their upstream process. The issue was resulting in damage to certain seals, and was often a cause for shutdown. To identify where the problem was occurring, Intertek received data from around the installation. The data contained information from periods of normal operation as well as during process disruptions. Intertek undertook data analysis with two primary objectives:

1. Identify root cause for process disruption

2. Establish a method for more efficient process monitoring

To promote efficiency, it was necessary to automate monitoring of the performance of every process unit through the combinatorial analysis of all key instrumentation measurements. In this example the dataset contained measurements from over 150 process sensors. Sampling the data every 2 minutes over 2 months gives roughly 43,000 time periods and therefore about 6.5 million data points. The rest of this article uses this example to show how the complexity of such a dataset can be reduced to easy-to-visualise trends and correlations that focus on the key parameters and add value to the process.
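The dataset size quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes that "2 months" means 60 days and that there are exactly 150 sensors, neither of which is stated precisely in the case study.

```python
# Assumed figures: 60 days stands in for "2 months", 150 for "over 150 sensors".
SAMPLE_INTERVAL_MIN = 2
DAYS = 60
N_SENSORS = 150

samples_per_day = 24 * 60 // SAMPLE_INTERVAL_MIN  # readings per sensor per day
time_periods = samples_per_day * DAYS             # rows in the dataset
data_points = time_periods * N_SENSORS            # rows x sensors

print(f"{time_periods:,} time periods, {data_points:,} data points")
```

Under these assumptions the result is 43,200 time periods and 6,480,000 data points, consistent with the rounded figures in the text.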

Data Purification

To maximise the value obtained from data, it is key to understand outliers and ‘clean’ the dataset before moving on to any additional analysis such as process modelling or optimisation. Data purification can be a lengthy process and, in our experience, a large portion of all big data analytics activity is spent screening and pre-processing the data to ensure that conclusions are valid.

Intertek uses proprietary software (Interpret) to undertake big data analytics activities. Interpret allows the user to look at the data and visualise features such as missing data, constant values, faulty sensors and questionable values. Interpret plots can be used to identify process outliers by looking at combinations of process variables and latent variables. Any questionable data is highlighted and assessed against underlying trends and effects.
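Interpret is proprietary, so its internals are not shown here. As a rough illustration of the kind of screening described above, the following sketch uses NumPy to flag sensors with a high fraction of missing readings or a constant signal. The function name, the 5% threshold and the sensor tags are all hypothetical, and the data is synthetic.

```python
import numpy as np

def screen_sensors(data, names, max_missing=0.05):
    """Flag sensors with too many missing readings or a constant signal.

    data: 2-D array, one row per time period and one column per sensor,
    with NaN marking a missing reading. The threshold is an assumed value.
    """
    flags = {}
    for j, name in enumerate(names):
        column = data[:, j]
        valid = column[~np.isnan(column)]
        missing_fraction = 1.0 - valid.size / column.size
        if missing_fraction > max_missing:
            flags[name] = "missing data"
        elif valid.size == 0 or valid.max() == valid.min():
            flags[name] = "constant value"
    return flags

# Synthetic example with hypothetical sensor tags.
rng = np.random.default_rng(0)
readings = rng.normal(50.0, 2.0, size=(100, 3))
readings[:, 1] = 20.0       # a sensor stuck at one value
readings[::4, 2] = np.nan   # a sensor that drops every fourth reading

flags = screen_sensors(readings, ["TI-101", "PI-102", "FI-103"])
print(flags)
```

A real screening step would also need rules for faulty or questionable values, which depend on domain knowledge of each instrument.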

Correlating Regions

For any process plant, it is key to have an in-depth understanding of which properties correlate. There will be some properties within the data where you would expect correlation. However, there will be other regions where correlation was not anticipated, and those are the regions that require further exploration.

To continue with our previous example, where our client was looking to understand the root cause of failure, the heat map below shows the correlation of every variable with every other variable within that dataset. The extent of correlation is indicated by a colour scale, where red indicates a strong positive correlation and blue a strong negative correlation.
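The matrix behind such a heat map can be sketched with NumPy. The example below uses synthetic data for four made-up sensors, not the client's measurements, and the 0.8 cut-off for a "strong" correlation is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
load = rng.normal(size=n)  # hidden common driver (synthetic)

# Columns are sensors, rows are time periods.
data = np.column_stack([
    load + 0.1 * rng.normal(size=n),   # sensor 0 follows the driver
    load + 0.1 * rng.normal(size=n),   # sensor 1 follows the driver
    -load + 0.1 * rng.normal(size=n),  # sensor 2 moves the opposite way
    rng.normal(size=n),                # sensor 3 is unrelated
])

# Pearson correlation of every sensor with every other sensor.
corr = np.corrcoef(data, rowvar=False)

# List the strongly correlated pairs (assumed cut-off of 0.8).
n_sensors = corr.shape[0]
strong = [
    (i, j, round(float(corr[i, j]), 2))
    for i in range(n_sensors)
    for j in range(i + 1, n_sensors)
    if abs(corr[i, j]) > 0.8
]
print(strong)
```

Rendering `corr` with a diverging red-to-blue colour map in a plotting library would give a heat map in the style described, with values near +1 in red and values near -1 in blue.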

The heat map showed a number of regions with strong correlation. Some of those correlations could be easily explained as they came from the same unit, for example heat sensors around one lubricating unit. But there were other strong correlations that required further examination, and principal component analysis was undertaken on those regions.

Principal Component Analysis

Principal component analysis (PCA) is a very powerful multivariate statistical procedure that is used to reduce data down to its principal components. Each principal component is a combination of the original variables that describes a share of the variance in the data: the first principal component captures the largest amount of variance within the dataset, the second captures the largest amount of the remaining variance, and so on.
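To make the idea concrete, here is a minimal PCA sketch using a singular value decomposition in NumPy. It is not how Interpret is implemented, which the article does not describe. The data is synthetic, and scaling each sensor to unit variance before the decomposition is an assumed choice.

```python
import numpy as np

def pca(data, n_components=2):
    """PCA via SVD of mean-centred, unit-variance data.

    Returns the scores (data projected onto the components), the loadings
    (one row per component) and the fraction of variance each of the
    components explains, in descending order.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    loadings = vt[:n_components]
    scores = z @ loadings.T
    return scores, loadings, explained

# Synthetic example: four sensors driven by one hidden variable plus noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=300)
data = np.column_stack(
    [latent + 0.2 * rng.normal(size=300) for _ in range(4)]
)

scores, loadings, explained = pca(data)
print(np.round(explained, 3))
```

Because all four synthetic sensors follow the same hidden variable, the first component should account for most of the variance, which is the kind of dimensionality reduction the text describes.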

Using PCA, Interpret is able to break down and understand the underlying trends in the data which cause features such as operating regions and determine the key instruments which have the largest impact. We do this by analysing not just the most obvious (first order) relationships but also other relationships caused by combinations of variables.

For the upstream study we have been discussing, interrogating the data with PCA revealed that a certain pump was the root cause of failure. The pump in question was a multiphase pump, designed to process both gas and liquids at the same time. The pump had a defined operating region, but it was found to be sometimes operating outside of that region, which caused increased vibration within the unit. This increased vibration had a knock-on effect and damaged the seals within. The repeated damage eventually resulted in seal failure, which would trigger a plant shutdown.

PCA enables this information to be expressed visually, as shown in the figure below. As can be seen from the figure, Interpret identified distinct regions where the pump was operating. These processing regions were identified on data interrogation, which allowed for the identification of root causes of failure. These regions could also be harnessed to monitor the performance of the system going forward and prevent future shutdowns.

Identified operating regions for pump:

Green – Good, pump operating within boundaries

Grey – Warning, pump moving outside of boundary, seal damage may ensue

Red – Stop, pump operating outside of boundary, seal failure may be imminent
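The article does not say how the boundaries of these regions were defined. Purely as an illustration of how such regions could drive a monitoring rule, the sketch below classifies a point in a two-component score space by its distance from the centre of normal operation. The function, the centre and both radii are hypothetical.

```python
import math

def classify_region(score, centre=(0.0, 0.0), warn_radius=2.0, stop_radius=3.0):
    """Map a 2-D score to a traffic-light region.

    The circular boundaries and their radii are assumptions made for this
    example, not the boundaries used in the case study.
    """
    distance = math.dist(score, centre)
    if distance <= warn_radius:
        return "green"  # operating within boundaries
    if distance <= stop_radius:
        return "grey"   # moving outside the boundary
    return "red"        # operating outside the boundary

for point in [(0.5, 0.5), (2.0, 1.5), (3.0, 3.0)]:
    print(point, classify_region(point))
```

In practice the boundaries would be drawn from the regions actually observed in historical data, and they need not be circular.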

Benefits of Data Analytics

Advanced data analytics exposes the underlying structure of very large and complicated datasets, allowing them to be broken down and the valuable information held within them to be extracted. By uncovering underlying trends and correlations, the operator can extract maximum value from a dataset.

It is also critical to analyse the dataset as a whole, and not just to focus on the simple first order relationships, but to examine the more detailed effects, such as those caused by combinations of many different variables. Interpret uses advanced mathematical and statistical algorithms to analyse the effect of other variables that do not have obvious or intuitive first order effects on the process.

Ultimately, this gives the operator a much greater understanding of the process and how to maximise performance. Effective use of this data analysis tool can help to mitigate potential future problems that impact on margins, such as shutdowns and deferrals.
