ISSN: 2250–3676

MAYA NAYAK* et al. [IJESAT] INTERNATIONAL JOURNAL OF ENGINEERING SCIENCE & ADVANCED TECHNOLOGY

Volume - 2, Issue - 1, 38 – 48

GPAC-APSO CLUSTERING USING MODIFIED S-TRANSFORM FOR DATA MINING

Maya Nayak(1), Satyabrata Dash(2)

(1) Professor, Department of IT, BPUT, Orissa, India, mayanayak3299@yahoo.com
(2) Asst. Professor, Department of IT, BPUT, Orissa, India, dash_satyabrata@yahoo.co.in

Abstract

This paper presents a new approach to power signal time series data mining using an S-transform based K-means clustering technique. The power signal disturbance time series are first pre-processed through an advanced signal processing tool, the S-transform, and various statistical features are extracted, which are used as inputs to the K-means algorithm for disturbance event detection. A Particle Swarm Optimization (PSO) technique is used to optimize the cluster centers, which can then serve as inputs to a decision tree for pattern classification. The proposed hybrid PSO-K-means clustering technique provides accurate classification rates even under noisy conditions compared with existing techniques, which shows the efficacy and robustness of the proposed algorithm for a time-varying database such as power signal data.

Index Terms: K-means clustering, Time-varying power signal data, Particle swarm optimization, S-transform, Decision tree, Classification

1. INTRODUCTION

Time series clustering has been very effective in providing pertinent and useful information in various application domains as a part of temporal data mining research. Time series data are of interest because of their pervasiveness in areas ranging from science, engineering, business, finance, and economics to healthcare. The goal of clustering is to identify the structure of an unlabelled data set by objectively organizing the data into homogeneous groups in which the within-group object similarity is maximized and the between-group object dissimilarity is maximized. Clustering is required when no labelled data, whether binary, categorical, numerical, textual, interval, spectral, temporal, multimedia, or a mixture of these types, are available. Data are termed static if their feature values do not change with time, or change only negligibly; the bulk of clustering analysis has been applied to static data. Cluster analysis has become an important technique in exploratory data analysis, pattern recognition, machine learning, neural computing, and other engineering system studies, and it aims at identifying and extracting significant groups in the underlying data. Unlike static data, the features of a time series comprise values that change with time. Given unlabelled time series data streams, it is often desirable to determine the groups

of data that belong to a similar time series. The unlabelled time series data could be obtained by monitoring one or more processes over a certain period of time. Just as with static data clustering, time series data clustering requires a clustering algorithm or method to form clusters from a given set of unlabelled data objects, and the algorithm is chosen depending on the type of data available and the area of application [1-7]. Various algorithms have been developed to cluster different types of time series data. In order to apply these clustering algorithms, it is necessary to convert the time series data to static data and to modify the similarity or distance measures used for clustering. This is known as the feature-based approach and is accomplished by processing the raw time series data with a time-frequency transform such as the modified S-transform. The raw time series database comprises power line disturbance data recorded or simulated using MATLAB software. The various power line disturbances include voltage sag, swell, impulsive and oscillatory transients, multiple notches, momentary interruption, harmonics, and voltage flicker. In order to improve the quality of the electrical power supplied to customers, it is customary to continuously record disturbance waveforms using power-monitoring instruments. Unfortunately, most of these recorders rely on visual inspection of the data records, creating an unprecedented volume of data to be inspected by engineers. Thus an automatic recognition or

IJESAT | Jan-Feb 2012 Available online @ http://www.ijesat.org


classification of the voluminous data is required for improving the quality of the electricity supply. Although wavelet multiresolution analysis combined with a large number of neural networks provides efficient classification of power quality (PQ) events, time-domain featured disturbances such as sags and swells may not be easily classified [8], [9]. In addition, if an important disturbance frequency component is not precisely extracted by the wavelet transform, which consists of octave band-pass filters, the classification accuracy may also be limited. This paper therefore presents a generalized discrete S-transform [10] with a modified Gaussian window, derived from the STFT, for the detection and localization of power line disturbance signal time series. The S-transform with a modified Gaussian window, or Modified S-transform (MST), is essentially a variable-window STFT whose window width varies inversely with frequency. The MST produces a time-frequency representation of a time-varying signal by uniquely combining frequency-dependent resolution with simultaneous localization of the real and imaginary spectra. Since the MST provides the local spectrum of a signal, the time average of the local spectrum gives the Fourier spectrum. The MST of a power line disturbance time series provides contours which, unlike those of the wavelet transform, closely resemble the disturbance patterns, and hence the features extracted from it are very suitable for developing a highly efficient and accurate classification scheme. Further, MST analysis of time-varying signal data yields all of the quantifiable parameters for the localization, detection, and quantification of the signals comprising a time series database.
After the features are extracted from the time series data using a fast discrete S-transform, clustering analysis is used to group the data into clusters and thereby identify the class of the data. The well-known Fuzzy C-means algorithm [12] is commonly used for data clustering, but it suffers from the trial-and-error choice of the initial cluster centers and from the noise present in the original time series data. In addition, the traditional Fuzzy C-means algorithm gets stuck in local minima and therefore does not offer robust clustering of the features. To overcome this problem, a simpler hybrid K-means clustering algorithm together with the Particle Swarm Optimization (PSO) technique [13-16] is used for clustering the features into distinct groups that may be used further for classification. PSO is a population-based algorithm that simulates bird flocking or fish schooling behavior to achieve optimized cluster centers using an objective function. Unlike other evolutionary learning algorithms, PSO requires fewer parameters to be tuned; it can be easily implemented and has a stable convergence characteristic with good computational efficiency. However, to further improve the clustering performance of PSO, this paper uses a hybrid GPAC-Adaptive PSO, which is more effective and capable of


solving non-linear optimization problems faster and with better accuracy in detecting the global best solution. The main concept of this development is based on passive congregation and a kind of coordinated aggregation observed in swarms, known as GPAC-PSO, modified further to yield GPAC-Adaptive PSO, or simply GPAC-APSO. This paper is organized as follows: Section 2 presents the modified discrete S-transform used to extract features from raw time-varying power signal data, together with its implementation. Section 3 describes K-means clustering, and Section 4 describes particle swarm optimization and the GPAC enhancement. Section 5 presents the PSO and GPAC-APSO based K-means clustering for time series pattern classification along with the computational results, and the conclusion is given in Section 6.

2. MODIFIED DISCRETE S-TRANSFORM FOR FEATURE EXTRACTION

The short-time Fourier transform of a signal x(t) is given by

STFT(τ, f) = ∫ x(t) w(t − τ) e^{−i2πft} dt    (1)

where τ and f denote time and frequency, x(t) is the signal, and the position of the translating window w is determined by τ. An alternate expression for (1) is obtained through the convolution theorem as

STFT(τ, f) = ∫ X(α + f) W(α) e^{i2πατ} dα    (2)

where W is the Fourier transform of w and the convolution variable α has the same dimension as f. One disadvantage of the STFT is the fixed width and height of the window: signal components with periods longer than the window width are misinterpreted, and a window width that works well for low-frequency components cannot detect the high-frequency components. The S-transform is a time-localized Fourier transform with a window whose width and height vary with frequency. Although the theory of the S-transform has been presented in references [9], [10] and originally by Stockwell et al. [8], some of the equations and computational steps are outlined below. The general window function w(t, f) is chosen as

w(t, f) = (1 / (σ(f) √(2π))) e^{−t² / (2σ(f)²)},  with σ(f) = γ / |f|    (3)


The window is normalized as

∫ w(t, f) dt = 1    (4)

Here γ is normally set to a value of 0.2 for the best overall performance of the S-transform, where the contours exhibit the least edge effects, and is made equal to 5 for computing the highest-frequency components of very short-duration oscillatory transients. The S-transform performs multiresolution analysis on the signal because the width of its window varies inversely with frequency: it gives high time resolution at high frequencies and, conversely, high frequency resolution at low frequencies.

The expression for the S-transform is given by

S(τ, f) = ∫ x(t) (|f| / √(2π)) e^{−f²(τ − t)² / 2} e^{−i2πft} dt    (5)

Here the window function is used as given in (3). In this expression of the S-transform, the scalable Gaussian window (the product of |f| / √(2π) and the real exponential) localizes the complex Fourier sinusoid, giving the S-transform analyzing function (the windowed term in (5)). The CWT provides time resolution by translating its whole analyzing function (the wavelet) along the time axis. The S-transform is different because only the amplitude envelope of the analyzing function (the window) translates; the oscillations are given by the fixed Fourier sinusoid, which does not depend on τ. Since the local oscillatory properties of the analyzing function determine the phase of the local spectrum, the S-transform can be considered as having a "fixed" phase reference, which gives it some advantages over the wavelet transform. The S-transform thus provides multiresolution analysis while retaining the absolute phase of each frequency, which has led to its application for the detection and interpretation of events in time series in a variety of disciplines.

An alternative formulation of the S-transform is given by

S(τ, f) = ∫ X(α + f) e^{−2π²α²/f²} e^{i2πατ} dα    (6)

The inverse S-transform is given by

x(t) = ∫ [ ∫ S(τ, f) dτ ] e^{i2πft} df    (7)

S(τ, f) is complex and is represented as

S(τ, f) = A(τ, f) e^{iφ(τ, f)}    (8)

where A(τ, f) = |S(τ, f)| is the amplitude of the S-spectrum and φ(τ, f) = arctan( Im[S(τ, f)] / Re[S(τ, f)] ) is the phase of the S-spectrum. The phase of the S-spectrum is an improvement on the wavelet transform in that the average of all the local spectra does indeed give the same result as the Fourier transform.

The discrete version of the S-transform of a signal is obtained as

S[j, n] = Σ_{m=0}^{N−1} X[m + n] W[m, n] e^{i2πmj/N},  j, m, n = 0, 1, ..., N−1

where W[m, n] = e^{−2π²m²/n²}, X[m + n] is obtained by shifting the discrete Fourier transform (DFT) of x(k) by n, and X[m] is given by

X[m] = (1/N) Σ_{k=0}^{N−1} x_k e^{−i2πmk/N}

Another version of the discrete S-transform used for computation is

S[m, n] = e^{−i2πnm/N} Σ_{k=−N/2}^{N/2−1} x[m + k] w*(k, n) e^{−i2πnk/N}    (9)

for m = 1, 2, ..., M and n = 0, 1, 2, ..., N/2, where M is the number of data points of the signal x[m], N is the width of the window, and the signal vector x[m] is padded at the beginning or at the end with zeros.

2.1 IMPLEMENTATION OF S-TRANSFORM (ST)

The computation of the S-transform is efficiently implemented using the convolution theorem and the FFT. The following steps are used:

(i) Compute the DFT of the signal x(k) using an FFT software routine and shift the spectrum X[m] to X[m + n].
(ii) Compute the Gaussian window function exp(−2π²m²/n²) for the required frequency n.
(iii) Compute the inverse Fourier transform of the product of the shifted DFT and the Gaussian window function to give the ST matrix.

The output of the S-transform is an n × m matrix whose rows pertain to frequency and whose columns indicate time; each column thus represents the "local spectrum" for that point in time. From the ST matrix we obtain the frequency-time contours of the amplitude spectrum, and these contours can be used to
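The three steps above can be sketched as a minimal Python/NumPy routine; the function name `s_transform`, the unit sampling interval, and the restriction to positive-frequency rows are illustrative assumptions, not part of the paper:

```python
import numpy as np

def s_transform(x):
    """Discrete S-transform of a real signal x via the FFT, following
    steps (i)-(iii): shift the spectrum, multiply by a frequency-domain
    Gaussian window, and inverse-FFT each row of the ST matrix."""
    N = len(x)
    X = np.fft.fft(x)                        # step (i): DFT of the signal
    n_freqs = N // 2                         # rows for frequencies n = 1 .. N/2
    S = np.zeros((n_freqs, N), dtype=complex)
    m = np.fft.fftfreq(N, d=1.0 / N)         # signed frequency shifts m
    for n in range(1, n_freqs + 1):
        W = np.exp(-2.0 * np.pi**2 * m**2 / n**2)  # step (ii): Gaussian window
        Xs = np.roll(X, -n)                        # shifted spectrum X[m + n]
        S[n - 1, :] = np.fft.ifft(Xs * W)          # step (iii): inverse FFT
    return S
```

Rows of `abs(S)` correspond to frequency and columns to time, so each column is the "local spectrum" described above.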


visually classify the nature of the signal, its change of frequency, and the harmonic content, if any, during the distortion. Although several algorithms are available for time series data clustering, the simple and widely used ones are the K-means and Fuzzy C-means approaches. Other algorithms include self-organizing maps, similarity/distance measures, short-time-series distance measures, probability-based methods, and linear predictive coding. The main idea behind the K-means algorithm is the minimization of an objective function, normally chosen to be the total distance of all patterns from their respective cluster centers. The usual approach is to start with arbitrary cluster centers and then proceed iteratively: distributing feature values among clusters and updating the cluster centers are the two main steps of the K-means algorithm, which alternates between them until the value of the objective function reaches a minimum. The objective function of K-means is not convex and hence may contain local minima; consequently, while minimizing the objective function, there is a possibility of getting stuck at local minima (and also at local maxima and saddle points). The performance of the K-means algorithm depends on the initial choice of the cluster centers. Besides, the Euclidean norm is sensitive to noise and outliers, so the K-means algorithm is likewise affected by noise and outliers. To overcome the problems faced by GA and simulated annealing, we provide an evolutionary clustering method based on PSO, which has a higher ability to find near-optimal solutions in the search space. Particle swarm optimization (PSO) is a population-based algorithm that simulates bird flocking or fish schooling behavior to achieve a self-evolving system.
It can automatically search for the optimum solution in the solution space, but the search is not purely random: depending on the problem, the search is guided by the fitness function. Unlike other evolutionary learning algorithms, PSO requires fewer parameters to be tuned; it can be easily implemented and has a stable convergence characteristic with good computational efficiency. This paper presents an evolutionary PSO learning-based method to optimally cluster N data points into K clusters. The main concept of this development is based on the passive-congregation GPAC-PSO and a kind of coordinated aggregation observed in swarms. As with GA, GPAC-PSO also suffers from premature convergence and can thus yield an inaccurate solution.

3. K-MEANS CLUSTERING

In the field of clustering, the K-means algorithm is the most popular algorithm for finding a partition that minimizes the mean square error (MSE) measure. Clustering analysis is a technique which can classify similar sample points into


the same group from a data set. It is a branch of multivariable analysis and of unsupervised learning in pattern recognition. For a space S containing K groups and N points {x1, x2, ..., xN}, the clustering analysis is defined as follows. Among the many clustering techniques, the K-means (generally called hard C-means) algorithm is one of the best-known hard clustering techniques. It allocates each data point xi to the closest cluster center zj using the Euclidean distance:

d = ||xi − zj||,  i ∈ {1, 2, ..., N},  j ∈ {1, 2, ..., K}    (10)

A good method will cluster the data set X = {x1, x2, ..., xN} into K well-formed partitions with 2 ≤ K ≤ N − 1. When the data set is unlabelled, it is very important to define an objective function for the clustering analysis; intuitively, each cluster should be as compact as possible. Thus the objective function of the K-means algorithm is defined with the Euclidean norm as

J_E = Σ_{j=1}^{K} Σ_{xi ∈ Cj} ||xi − zj||²    (11)

where zj is the jth cluster center. The necessary condition for the minimum of J is

zj = (1/Nj) Σ_{xi ∈ Cj} xi,  j = 1, 2, ..., K    (12)

where Nj is the number of points belonging to cluster Cj. The K-means clustering process can be stopped when any one of the following criteria is satisfied: the maximum number of iterations has been exceeded, there is little change in the centroid vectors over a number of iterations, or there are no cluster membership changes.

4. PARTICLE SWARM OPTIMIZATION

Particle swarm optimization (PSO) is a population-based stochastic search process modeled on the social behavior of a bird flock. The algorithm maintains a population of particles, where each particle represents a potential solution to an optimization problem; in this context a swarm refers to a number of such potential solutions. The aim of PSO is to find the particle position that results in the best evaluation of a given fitness (objective) function. Each particle represents a position in an Nd-dimensional space and is "flown" through this multi-dimensional search space, adjusting its position towards both its own best previous position and the best position found by the swarm. For a particle moving in a multidimensional search space, let x_{i,k}(t) and v_{i,k}(t) denote the position and velocity of the ith particle in the kth dimension at time t.
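A minimal sketch of the K-means loop in equations (10)-(12), including the stopping rule on small centroid change, may look as follows; the function name and the random choice of initial centres are illustrative assumptions:

```python
import numpy as np

def kmeans(X, K, max_iter=100, tol=1e-4, seed=0):
    """Plain K-means per equations (10)-(12): assign each point to the
    nearest centre by Euclidean distance (10), then recompute each centre
    as the mean of its members (12); stop on max iterations or when the
    centres barely move."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), K, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # eq (10): Euclidean distance of every point to every centre
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # eq (12): each centre becomes the mean of its cluster members
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(K)])
        if np.linalg.norm(new - centres) < tol:   # stopping criterion
            centres = new
            break
        centres = new
    return centres, labels
```
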


Using the above notation, a particle's position is adjusted according to

v_{i,k}(t+1) = w v_{i,k}(t) + c1 r_{1,k}(t) (y_{i,k}(t) − x_{i,k}(t)) + c2 r_{2,k}(t) (ŷ_k(t) − x_{i,k}(t))    (13)

x_i(t+1) = x_i(t) + v_i(t+1)    (14)

where w is the inertia weight, c1 and c2 are the acceleration constants, r_{1,k}(t), r_{2,k}(t) ~ U(0,1), and k = 1, ..., Nd. The velocity is thus calculated from three contributions: (1) a fraction of the previous velocity, (2) the cognitive component, a function of the distance of the particle from its personal best position, and (3) the social component, a function of the distance of the particle from the best particle found so far (i.e., the best of the personal bests). The personal best position of particle i is calculated as

y_i(t+1) = y_i(t)       if f(x_i(t+1)) ≥ f(y_i(t))
y_i(t+1) = x_i(t+1)   if f(x_i(t+1)) < f(y_i(t))    (15)

4.1 PSO CLUSTERING

In the context of clustering, a single particle represents Nc cluster centroid vectors; that is, each particle xi is constructed as

x_i = (m_{i1}, ..., m_{ij}, ..., m_{iNc})    (16)

where m_{ij} refers to the jth cluster centroid vector of the ith particle, for cluster Cij. A swarm therefore represents a number of candidate clusterings of the current data vectors. The fitness of a particle is easily measured as the quantization error

J_e = [ Σ_{j=1}^{Nc} ( Σ_{z_p ∈ Cij} d(z_p, m_j) / |Cij| ) ] / Nc    (17)

where d is the distance defined in equation (10) and |Cij| is the number of data vectors belonging to cluster Cij.

The next section first presents a standard gbest PSO for clustering data into a given number of clusters and then shows how K-means and the PSO algorithm can be combined to further improve the performance of PSO clustering. The hybrid PSO algorithm consistently performs better than the other two approaches as the number of clusters increases. The K-means algorithm exhibits a faster but premature convergence to a large quantization error, while the PSO algorithms converge more slowly to lower quantization errors. We then develop more effective PSO algorithms by reflecting recent advances in swarm intelligence and by introducing new concepts. Under these conditions, a new hybrid PSO algorithm is proposed which is more effective and capable of solving non-linear optimization problems faster and with better accuracy in detecting the global best solution. The main concept of this development is based on passive congregation and a kind of coordinated aggregation observed in swarms, known as GPAC-PSO, modified further to yield GPAC-Adaptive PSO, or simply GPAC-APSO. Congregation is a swarming driven by social forces, the source of attraction of a particle to others, and is classified into two types: social and passive. Social congregation usually happens when the swarm's fidelity is high, such as a genetic relation, and necessitates active information transfer; for example, ants with a high genetic relation use antennal contacts to transfer information about the location of resources. We do not consider social congregation in this paper because it frequently displays a division of labor. Passive congregation, finally, is an attraction of a particle to other swarm members in which no social behavior is displayed: particles need only monitor their environment and their immediate surroundings, such as the positions and speeds of their neighbors, and such information transfer can be employed in passive congregation. The global variant-based passive congregation PSO (GPAC) is enhanced with the constriction factor approach. The swarm of the enhanced GPAC is manipulated by the velocity update
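Equations (13)-(15) can be sketched as a plain gbest PSO loop; for brevity it minimizes a simple sphere function here rather than the clustering fitness (17), and the function name and search bounds are illustrative assumptions:

```python
import numpy as np

def gbest_pso(f, dim, n_particles=20, iters=200, w=0.72, c1=1.49, c2=1.49, seed=0):
    """gbest PSO per equations (13)-(15): each particle keeps its personal
    best y_i, and the swarm-wide best y_hat drives the social term."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))
    v = np.zeros((n_particles, dim))
    y = x.copy()                              # personal best positions
    fy = np.array([f(p) for p in y])
    g = y[fy.argmin()].copy()                 # global best y_hat
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (y - x) + c2 * r2 * (g - x)   # eq (13)
        x = x + v                                           # eq (14)
        fx = np.array([f(p) for p in x])
        better = fx < fy                                    # eq (15)
        y[better], fy[better] = x[better], fx[better]
        g = y[fy.argmin()].copy()
    return g, fy.min()
```

The inertia weight w = 0.72 and c1 = c2 = 1.49 match the values used in Section 5.1.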

v_i(t+1) = k · [ w(t) v_i(t) + c1 · rand1 · (P_i − S_i(t)) + c2 · rand2 · (P_k − S_i(t)) + c3 · rand3 · (P_r − S_i(t)) ]    (18)

where i = 1, 2, ..., N; c1, c2, and c3 are the cognitive, social, and passive congregation parameters, respectively; rand1, rand2, and rand3 are random numbers uniformly distributed within [0, 1]; P_i is the best previous position of the ith particle; P_k is the global best position ever attained among all particles in the case of the enhanced GPAC; and P_r is the position of the passive congregator (the position of a randomly chosen particle r). The position of the ith particle in the n-dimensional decision space is limited by the minimum and maximum positions, expressed by the vectors S_i^min and S_i^max; these minimum and maximum position vectors express the inequality constraints. The velocities of the ith particle in the n-dimensional decision space are limited to [−V_i^max, V_i^max], where the maximum velocity in the lth dimension of the search space is proposed as
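A single velocity-and-position update under equation (18), with a velocity clamp in the spirit of the limits just described, might be sketched as follows; the helper name `gpac_step`, the array layout, and taking the leader as the global best (the enhanced-GPAC case) are assumptions for illustration:

```python
import numpy as np

def gpac_step(S, V, P, f_vals, k, w, c1, c2, c3, v_max, rng):
    """One update of the enhanced GPAC swarm, eq (18): cognitive pull
    toward the personal best P_i, social pull toward the global best P_k,
    and passive congregation toward a randomly chosen particle P_r;
    velocities are then clamped to [-v_max, v_max]."""
    N, n = S.shape
    gbest = P[f_vals.argmin()]              # P_k: global best (enhanced GPAC)
    Pr = S[rng.integers(0, N, size=N)]      # P_r: one random congregator per particle
    r1, r2, r3 = rng.random((3, N, n))
    V = k * (w * V
             + c1 * r1 * (P - S)
             + c2 * r2 * (gbest - S)
             + c3 * r3 * (Pr - S))
    V = np.clip(V, -v_max, v_max)           # velocity limits
    return S + V, V
```
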


v_{i,l}^max = (s_{i,l}^max − s_{i,l}^min) / N_r,  l = 1, 2, ..., n    (19)

where s_{i,l}^min and s_{i,l}^max are the limits in the lth dimension of the search space. The maximum velocities are constructed over small intervals of the search space for a better balance between exploration and exploitation. N_r is a chosen number of search intervals for the particles and is an important parameter in the enhanced GPAC algorithms: a small N_r facilitates global exploration (searching new areas), while a large one tends to facilitate local exploration (fine-tuning of the current search area). A suitable value of N_r usually balances the global and local exploration abilities and consequently reduces the number of iterations required to locate the optimum solution. The basic steps of the enhanced GPAC are listed below:

Step-1: Generate a swarm of N particles with uniform probability distribution, initial positions S_i(0), and velocities

V_i(0), i = 1, 2, ..., N, and initialize the random parameters. Evaluate each particle i using the objective function f (e.g., f to be minimized).

Step-2: For each particle i, calculate the distance d_ij between its position and the positions of all other particles: d_ij = ||S_i − S_j||, i ≠ j = 1, 2, ..., N, where S_i and S_j are the position vectors of particle i and particle j, respectively.

Step-3: For each particle i, determine the nearest particle, particle k, with a better evaluation than its own, i.e., d_ik = min_j(d_ij) with f_k < f_i, and set it as the leader of particle i. In the case of the enhanced GPAC, set the global best particle as the leader.

Step-4: For each particle i, randomly select a particle r and set it as the passive congregator of particle i.

Step-5: Update the velocities and positions of the particles using equations (18) and (14), respectively.

Step-6: Check that the limits on positions and velocities are enforced; if the limits are violated, replace the values with the respective limits.

Step-7: Evaluate each particle using the objective function f.

Step-8: If the stopping criteria are not satisfied, go to Step-2.

The enhanced GPAC algorithm is terminated when one of the following criteria is satisfied: (1) no improvement of the global best is observed in the last 30 generations, or (2) the maximum number of allowed iterations is reached.

5. PSO AND GPAC-APSO BASED K-MEANS CLUSTERING ALGORITHM FOR TIME SERIES PATTERN CLASSIFICATION

The hybrid K-means PSO algorithm is applied to the time series data obtained from an electricity distribution network, and the feature clusters are found to clearly classify the disturbance events of the time series. After the complex S-matrix is obtained using the procedure outlined earlier, the feature vectors F1, F2, F3, and F4 required for clustering are chosen from the sets of features described in Table 1. These features are used to classify the time series data into several classes using a hybrid clustering technique based on the PSO and K-means algorithms. The K-means algorithm tends to converge faster (after fewer function evaluations) than the PSO, but usually with a less accurate clustering. This section shows that the performance of PSO clustering can be further improved by seeding the initial swarm with the result of the K-means algorithm. The hybrid algorithm first executes the K-means algorithm once; the K-means clustering is terminated when the maximum number of iterations is exceeded or when the average change in the centroid vectors is less than 0.0001 (a user-specified parameter). The result of the K-means algorithm is then used as one of the particles, while the rest of the swarm is initialized randomly. The features extracted from the S-transform are:

1. F1 = max(A) + min(A) − max(B) − min(B)    (20)

where A is the amplitude-versus-time graph from the S-matrix under disturbance and B is the amplitude-versus-time graph of the S-matrix without disturbance.

2. F2 = standard deviation of the magnitude-versus-time spectrum obtained from the ST matrix:

F2 = ( (1/N) Σ_j ( S(j) − (1/N) Σ_j S(j) )² )^{1/2}    (21)

3. F3 = energy of the S-transform output:

F3 = Σ_n Σ_j |S(n, j)|²    (22)

4. F4 = total harmonic distortion (THD):

F4 = sqrt( Σ_{n=2}^{N} V_n² ) / V_1    (23)
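Under the assumption that the ST amplitude matrix has frequency rows and time columns (as stated in Section 2.1), features (20)-(23) can be computed roughly as below. The helper name, the use of the column-wise maximum of |S| as the "amplitude versus time" curve, and returning a row index for F5 (rather than a frequency calibrated in Hz) are illustrative assumptions:

```python
import numpy as np

def st_features(S_dist, S_clean, v_harm, v_fund):
    """Features (20)-(23) from S-transform matrices.
    S_dist: ST matrix of the disturbed signal, S_clean: of the clean
    reference; v_harm: harmonic magnitudes V_2..V_N, v_fund: V_1."""
    A = np.abs(S_dist).max(axis=0)    # amplitude vs. time under disturbance
    B = np.abs(S_clean).max(axis=0)   # amplitude vs. time without disturbance
    F1 = A.max() + A.min() - B.max() - B.min()              # eq (20)
    F2 = A.std()                                            # eq (21)
    F3 = (np.abs(S_dist) ** 2).sum()                        # eq (22): energy
    F4 = np.sqrt((np.asarray(v_harm) ** 2).sum()) / v_fund  # eq (23): THD
    F5 = np.abs(S_dist).max(axis=1).argmax()  # frequency row of max amplitude
    return F1, F2, F3, F4, F5
```
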


where N is the number of points in the FFT and V_n is the value of the nth harmonic component of the FFT.

5. F5 = estimated frequency at the maximum amplitude obtained from the ST matrix.

Table-1 Features extracted from S-transform with noise

Disturbances                 | F1     | F2     | F3    | F4     | F5
Normal                       | .9963  | .001   | .052  | .028   | 50
Sag (60%)                    | .591   | .022   | .039  | .027   | 50
Swell (50%)                  | 1.503  | .012   | .076  | .029   | 50
Momentary interruption (MI)  | .070   | .0387  | .0323 | .044   | 50
Harmonics                    | 1.032  | .050   | .064  | 0.25   | 50
Sag with harmonic (60%)      | 0.601  | .0228  | .0408 | 0.1139 | 50
Swell with harmonic (50%)    | 1.50   | .0219  | .079  | .1155  | 50
Flicker (4%, 5 Hz)           | .998   | .0209  | .027  | 0.1159 | 55
Notch + harmonics            | .940   | .1275  | .0531 | 0.198  | 50
Spike + harmonics            | 1.072  | .141   | .066  | .204   | 50
Transient (low frequency)    | 1.02   | 0.1473 | .0148 | .0566  | 440
Transient (high frequency)   | 1.0384 | .155   | .014  | .068   | 3315

5.1 Computational Results for the PSO-Optimized K-means Algorithm

This section compares the results of the K-means, PSO, and hybrid clustering algorithms on six time series events. The main purpose is to compare the quality of the respective clusterings, where quality is measured according to three criteria: the quantization error; the intra-cluster distances, i.e., the distances between data vectors within a cluster, which are to be minimized; and the inter-cluster distances, i.e., the distances between the centroids of the clusters, which are to be maximized. The latter two objectives correspond, respectively, to compact clusters that are well separated. For all the results reported, an average over 30 simulations is given. All algorithms are run for 1000 function evaluations, and 20 particles are chosen for the PSO-based clustering. The parameters chosen for PSO are w = 0.72 and c1 = c2 = 1.49 to ensure good convergence. Fig.2 shows the plot of feature F1 versus F2, clearly showing the clusters belonging to steady disturbance patterns such as voltage sag (or dip), voltage swell (rise), harmonics, and the normal signal time series, and to transient disturbance patterns such as oscillatory transients, notches, and spikes. From Fig.2 it can be seen that although K-means does form clusters, there may be a number of misclassifications, while GPAC-APSO is found to produce clear cluster centers separating the above-mentioned disturbance patterns in comparison with K-means, as shown in Fig.3. The fitness functions of K-means and the different PSO variants shown in Fig.4 indicate that GPAC-APSO achieves better convergence than the plain K-means algorithm. Similar plots of the features F2 and F4 show clusters relating to the presence or absence of harmonics, and the corresponding fitness values; the fitness function converges rapidly to a low quantization error for both the K-means and GPAC-APSO algorithms. Cluster 1 corresponds to power signal time series data in which the harmonic content is less than 5% of the fundamental component. When the harmonic content is more than 7-10% of the fundamental component, cluster 2 becomes prominent and contains those data points. However, K-means does not produce a very clear boundary between the different patterns, while GPAC-APSO is very effective in separating the clusters with greater clarity than the other PSO variants.

IJESAT | Jan-Feb 2012 Available online @ http://www.ijesat.org


FIGURES

Fig.1 Time series clustering using the feature-based approach (block diagram: Time series → Feature extraction (ST or WT) → Features → Clustering algorithm → Clusters).
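The pipeline of Fig.1 can be sketched as follows. This is a minimal discrete Stockwell (S-) transform implemented via the FFT; the quantities returned by `st_features` are illustrative placeholders of our own choosing, not the paper's actual feature definitions F1-F4:

```python
import numpy as np

def stockwell_transform(x):
    # Discrete S-transform: row n of the result is the Gaussian-windowed,
    # time-localized spectrum of x at frequency index n (0 .. N//2).
    N = len(x)
    X = np.fft.fft(x)
    m = np.fft.fftfreq(N) * N                  # FFT frequency indices
    S = np.zeros((N // 2 + 1, N), dtype=complex)
    S[0] = np.mean(x)                          # zero-frequency row: the signal mean
    for n in range(1, N // 2 + 1):
        gauss = np.exp(-2 * np.pi ** 2 * m ** 2 / n ** 2)  # freq-domain Gaussian window
        S[n] = np.fft.ifft(np.roll(X, -n) * gauss)
    return S

def st_features(x):
    # Illustrative statistical features from the ST magnitude matrix:
    # envelope spread, mean envelope level, and mean spectral energy.
    S = np.abs(stockwell_transform(x))
    amp = S.max(axis=0)                        # time-localized amplitude envelope
    return np.array([amp.std(), amp.mean(), (S ** 2).sum() / S.size])
```

A voltage sag shows up as a dip in the envelope `amp`, so its envelope spread exceeds that of a clean sinusoid; the resulting feature vectors then feed the clustering stage of Fig.1.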

Fig.2 F1-F2 K-means PSO clustering output (Cluster 1: Sag; Cluster 2: Swell; Cluster 3: Normal + harmonic; Cluster 4: Transient + Notch).


Fig.3 F1-F2 GPAC-APSO clustering output (Cluster 1: Sag; Cluster 2: Swell; Cluster 3: Normal + harmonic; Cluster 4: Transient + Notch).


Fig.4 F1-F2 fitness plots for K-means, PSO K-means, GBEST PSO and GPAC-APSO.

6. CONCLUSION
An evolutionary clustering technique has been developed by hybridizing the K-means algorithm with PSO for time-series clustering of power signal disturbance data. It is a viable and efficient heuristic for finding optimal or near-optimal solutions to the clustering problem of allocating N data points to K clusters. The proposed method is efficient and simple to implement when the number of clusters is known a priori. Simulation results on a number of time series data sets arising from power network disturbances with different geometrical properties show that the proposed approach is more robust than traditional clustering algorithms. Several variants of PSO have also been presented in this paper, and the modified GPAC-PSO, known as GPAC-APSO, is found to be more efficient and to converge more quickly than the other PSO variants.
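The hybrid described above can be sketched as follows. This is a plain gbest-PSO/K-means hybrid, not the GPAC-APSO variant itself, using the parameter values reported earlier (20 particles, w = 0.72, c1 = c2 = 1.49); each particle encodes all K centroids, the fitness is the quantization error, and one particle is seeded with a short K-means (Lloyd) run:

```python
import numpy as np

def quantization_error(centroids, data):
    # Mean, over clusters, of the average member-to-centroid distance.
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    errs = [d[labels == k, k].mean()
            for k in range(len(centroids)) if np.any(labels == k)]
    return float(np.mean(errs))

def pso_kmeans(data, K, n_particles=20, iters=200, w=0.72, c1=1.49, c2=1.49, seed=0):
    # Each particle encodes K candidate centroids; particle 0 is seeded with a
    # short Lloyd (K-means) run, the rest with randomly chosen data points.
    rng = np.random.default_rng(seed)
    N, _ = data.shape
    pos = data[rng.integers(0, N, size=(n_particles, K))].astype(float)
    for _ in range(10):                        # Lloyd iterations for the seed particle
        lab = np.linalg.norm(data[:, None, :] - pos[0][None, :, :], axis=2).argmin(axis=1)
        for k in range(K):
            if np.any(lab == k):
                pos[0, k] = data[lab == k].mean(axis=0)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([quantization_error(p, data) for p in pos])
    g = pbest_f.argmin()
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    for _ in range(iters):                     # standard gbest-PSO velocity/position update
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([quantization_error(p, data) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        g = pbest_f.argmin()
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    return gbest, gbest_f
```

On two well-separated blobs the hybrid recovers near-optimal centroids: the K-means seeding gives a good starting point, and the PSO swarm can only improve the global best from there. This is the general hybridization idea; the GPAC-APSO acceleration itself is described in the body of the paper.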





