International Journal of Network Security & Its Applications (IJNSA), Vol.5, No.4, July 2013
AN IMPROVED MULTI-SOM ALGORITHM

Imen Khanchouch1, Khaddouja Boujenfa2 and Mohamed Limam3

1 LARODEC ISG, University of Tunis
kh.imen88@gmail.com
2 LARODEC ISG, University of Tunis
khadouja.Boujenfa@isg.rnu.tn
3 LARODEC, ISG University of Tunis, Dhofar University, Oman
Mohamed.limam@isg.rnu.tn
ABSTRACT
This paper proposes a clustering algorithm based on the Self Organizing Map (SOM) method. To find the optimal number of clusters, our algorithm uses the Davies-Bouldin (DB) index, which has not previously been used in the multi-SOM. The proposed algorithm is compared with three clustering methods on five databases. Results show that our algorithm performs as well as the competing methods.

KEYWORDS
Clustering, SOM, multi-SOM, DB index.
1. INTRODUCTION
Clustering is an unsupervised learning technique that aims to obtain homogeneous partitions of objects while promoting heterogeneity between partitions. The literature describes many categories of clustering methods, such as hierarchical [13], partition-based [5], density-based [1] and neural network (NN) based [6] methods.

Hierarchical methods aim to build a hierarchy of clusters with many levels. There are two types of hierarchical clustering approaches: agglomerative (bottom-up) methods and divisive (top-down) methods. Agglomerative methods start with each data object as its own cluster and successively merge clusters two by two until a single cluster contains all the objects. Divisive methods, in contrast, begin with the whole data sample as one cluster and successively split it until N clusters are obtained, one per object. Hierarchical methods are time consuming on large amounts of data. In that case, the resulting dendrogram is also very large and may include incorrect information.

Partitioning methods divide the data set into disjoint partitions, where each partition represents a cluster. Clusters are formed to optimize an objective partitioning criterion, often called a similarity function, such as distance. Each cluster is represented by a centroid or a representative object. Partitioning methods are sensitive to initialization: an inappropriate initialization may lead to bad results. However, they are faster than hierarchical methods.

Density-based clustering methods aim to discover clusters with different shapes. They are based on the assumption that regions with high density constitute clusters, which are separated by regions with low density. A cluster is thus seen as a cloud of points with higher density, where the neighborhood of a point is defined by a distance threshold or by a number of nearest neighbors.

DOI : 10.5121/ijnsa.2013.5414
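As a rough illustration of the agglomerative (bottom-up) procedure described in the introduction, the following Python sketch starts with every object as its own cluster and repeatedly merges the two closest clusters. It is not the algorithm proposed in this paper. The single-linkage distance, the function names `single_linkage` and `agglomerative`, and the `n_clusters` stopping parameter are assumptions chosen only for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def single_linkage(a, b):
    """Distance between two clusters: the closest pair of points (assumed linkage)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, n_clusters=1):
    """Bottom-up clustering: start from singletons and merge the closest pair.

    Returns the final clusters and the list of merges (a flat dendrogram record).
    """
    clusters = [[p] for p in points]  # every object starts as its own cluster
    merges = []
    while len(clusters) > n_clusters:
        # Scan all pairs of clusters and pick the closest one.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        distance = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (5, 5), (5, 6)]
    clusters, merges = agglomerative(data, n_clusters=2)
    print(clusters)
    print(len(merges), "merges recorded")
```

Each iteration scans every pair of remaining clusters, which gives a feel for why hierarchical methods become slow on large data sets. With the default `n_clusters=1`, the loop runs until a single cluster contains all the objects, and `merges` then records the full merge history.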