# Sample project - genetic algorithm to cluster graphs

Course: Algorithms in Nature (02317), School of Computer Science / Machine Learning Department, Carnegie Mellon University

Professor: Ziv Bar-Joseph

Student: Fereshteh Shahmiri

## A genetic algorithm to cluster graphs

Finding dense modules or clusters in a graph is an important part of many data mining problems. One popular definition of a ‘module’ is a set of nodes that has more within-module connections (i.e. connections between nodes in the same module) and fewer between-module connections (i.e. connections between nodes in different modules) than expected by chance. In 2002, Newman proposed an objective function, called modularity, that characterizes the quality of a clustering $C$ of a graph $G = (V, E)$:

$$
Q(C) = \frac{1}{2m} \sum_{u, v \in V} \left( A_{uv} - \frac{k_u k_v}{2m} \right) x_{uv}
$$

where $A_{uv}$ is 1 if $u$ and $v$ are joined by an edge in $E$ and 0 otherwise; $k_u$ is the degree of node $u$ (i.e. its number of neighbors); $m$ is the total number of edges in the graph; and the variables $x_{uv}$ describe $C$ by indicating which nodes are in the same module. Specifically, for every pair of nodes, $x_{uv} = 1$ if $u$ and $v$ belong to the same module, and $x_{uv} = 0$ otherwise. Notice that a pair of nodes lying in different modules contributes nothing to the modularity score, and that all terms ($A_{uv}$, $k_u$, $k_v$, $m$) are fixed except the $x_{uv}$ terms. The goal is to find the clustering $C$ that maximizes this function. In general, $C$ can have any number of modules (from 1 to $n$, where $n$ is the number of nodes in the graph), but every node must be assigned to exactly one module.

Write a genetic algorithm that clusters an input graph into modules by optimizing the Newman objective function, using at most 5 clusters.

Answer: Here are some sample outputs for different numbers of generations and individuals, with running times in seconds. Each run begins with the prompt "How fast you want the result? Enter a number between 1 and 10, higher value return result faster:", and each bullet below gives the value entered at that prompt.

10 generations:

- Input 2: #modularity=22.7820512821, time 1304.78400016 s
  - Module 1: 11 14 15 17 18 19 22 23 24 26 27 28 29 30 31 32 33
  - Module 2: 0 1 2 3 4 5 6 7 8 9 10 12 13 16 20 21 25

30 generations:

- Input 8: #modularity=30.2628205128, time 1024.56500006 s
  - Module 1: 5 6 16 24 25 27 28 31
  - Module 2: 0 1 2 3 7 10 11 12 13 17 21
  - Module 3: 4 8 9 14 15 18 19 20 22 23 26 29 30 32 33
- Input 2: #modularity=31.8269230769, time 6231.27499986 s
  - Module 1: 0 1 2 3 4 5 6 7 10 12 13 16 17 19 21
  - Module 2: 8 9 11 14 15 18 20 22 23 24 25 26 27 28 29 30 31 32 33
- Input 2: #modularity=31.8782051282
  - Module 1: 8 9 14 15 18 20 22 23 24 25 26 27 29 30 31 32 33
  - Module 2: 0 1 2 3 4 5 6 7 10 11 12 13 16 17 19 21 28

The best answer is the last one, with #modularity=31.8782051282.
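To make the objective concrete, here is a direct, unoptimized Python sketch of the modularity formula, taking $x_{uv} = 1$ when $u$ and $v$ share a module (the reading consistent with the remark that pairs in different modules contribute nothing). Python, the function name `modularity`, and its edge-list and module-list inputs are assumptions for illustration, not the program that produced the outputs above. Because of the $1/(2m)$ factor its result never exceeds 1, so it is on a different scale from the `#modularity` numbers printed above.

```python
from itertools import product


def modularity(n, edges, modules):
    """Newman modularity of a clustering, evaluated pair by pair from the definition.

    n       -- number of nodes, labelled 0..n-1 (assumption: simple undirected graph)
    edges   -- list of distinct (u, v) pairs without self-loops
    modules -- list of modules, each a list of node labels
    """
    # Adjacency matrix A and node degrees k.
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    k = [sum(row) for row in A]
    m = len(edges)

    # Record each node's module; every node must be in exactly one module.
    module_of = {}
    for c, module in enumerate(modules):
        for u in module:
            if u in module_of:
                raise ValueError(f"node {u} appears in more than one module")
            module_of[u] = c
    if set(module_of) != set(range(n)):
        raise ValueError("modules must cover exactly the nodes 0..n-1")

    # Sum over all ordered pairs (u, v), including u == v. x_uv = 1 only when
    # u and v share a module, so pairs in different modules are skipped.
    total = 0.0
    for u, v in product(range(n), repeat=2):
        if module_of[u] == module_of[v]:
            total += A[u][v] - k[u] * k[v] / (2 * m)
    return total / (2 * m)


# Two triangles joined by the edge (2, 3); splitting at that edge gives Q = 5/14.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(6, edges, [[0, 1, 2], [3, 4, 5]]))  # approximately 0.357
```

Putting every node in a single module gives $Q = 0$, since the $A_{uv}$ terms and the $k_u k_v / 2m$ terms then cancel exactly.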

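The task asks for a genetic algorithm that uses at most 5 clusters. Below is a minimal sketch of one way to set that up, not the author's implementation: each individual is a label vector holding one module label in `range(5)` per node, fitness is modularity, and each generation is built by tournament selection, uniform crossover, random-relabeling mutation and elitism. The function names `label_modularity` and `genetic_clustering` and all parameter defaults (`pop_size`, `generations`, `mutation_rate`) are hypothetical choices for illustration.

```python
import random


def label_modularity(labels, edges):
    """Newman modularity of the clustering in which node u belongs to module labels[u].

    Uses the per-module form Q = sum over modules c of e_c/m - (d_c/(2m))**2,
    which is algebraically equal to the pairwise definition.
    """
    m = len(edges)
    inside = {}  # e_c: number of edges with both endpoints in module c
    degree = {}  # d_c: total degree of the nodes in module c
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def genetic_clustering(n_nodes, edges, max_modules=5, pop_size=60,
                       generations=200, mutation_rate=0.05, seed=None):
    """Evolve a label vector (one module label per node) that maximizes modularity.

    Labels are drawn from range(max_modules), so each node is in exactly one
    module and at most max_modules modules can ever be used.
    """
    rng = random.Random(seed)
    population = [[rng.randrange(max_modules) for _ in range(n_nodes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(label_modularity(ind, edges), ind) for ind in population]
        scored.sort(key=lambda pair: pair[0], reverse=True)

        def pick():
            # Tournament selection: the fitter of two random individuals wins.
            (fa, a), (fb, b) = rng.sample(scored, 2)
            return a if fa >= fb else b

        next_population = [scored[0][1]]  # elitism: the best individual survives
        while len(next_population) < pop_size:
            p1, p2 = pick(), pick()
            # Uniform crossover: each node takes its label from one parent at random.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            # Mutation: with small probability, move a node to a random module.
            for u in range(n_nodes):
                if rng.random() < mutation_rate:
                    child[u] = rng.randrange(max_modules)
            next_population.append(child)
        population = next_population

    best = max(population, key=lambda ind: label_modularity(ind, edges))
    return best, label_modularity(best, edges)


# Two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels, score = genetic_clustering(6, edges, seed=1)
print(labels, score)
```

Encoding a clustering as a label vector makes both constraints of the task (every node in exactly one module, at most 5 modules) hold automatically. One known drawback is that labels are arbitrary: two parents describing the same clustering with permuted labels can produce a poor child under uniform crossover, so relabeling parents consistently before crossover is a refinement worth considering.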