Clique Relaxation Models of Clusters in Networks

Outline

1. Introduction
2. Clique Relaxations
3. Sample Numerical Results

Sergiy Butenko
Industrial and Systems Engineering
Texas A&M University
College Station, TX
butenko@tamu.edu

Joint work with B. Balasundaram

July 19, 2006


Social Networks

A social network is described by G = (V, E), where V is the set of "actors" and E is the set of "ties".

Actors are people, and a tie exists if two people know each other.

Actors are wire transfer database records, and a tie exists if two records have the same matching field.

Actors are telephone numbers, and a tie exists if calls were made between them.

Applications: studying terrorist networks, criminal network analysis, detecting money laundering and organized crime, and several clustering and graph-based data mining applications.

Cohesive Subgroups

Cohesive subgroups are "tightly knit groups" in a social network.

Members of a cohesive subgroup are believed to share information and to have homogeneity of thought, identity, beliefs, and behavior, even food habits and illnesses.

The earliest models of cohesive subgroups were cliques.


Cliques and Independent Sets

[Figure: a graph G on vertices 1-5 and its complement.]

In G:
{1,2,5} : maximal clique
{1,4} : maximal independent set
{2,3,4,5} : maximum clique

In the complement of G:
{1,2,5} : maximal independent set
{1,4} : maximal clique
{2,3,4,5} : maximum independent set

Applications

Acquaintance Networks - Criminal network analysis

Wire Transfer Database Networks - Detecting money laundering

Call Networks - Organized crime detection

Protein Interaction Networks - Predicting protein functions

Gene Co-expression Networks - Detecting network motifs

Stock Market Networks - Stock portfolios

Internet Graphs - Information search and retrieval

Clustering and data mining, wireless and telecommunication networks, etc.


Properties of a Cohesive Subgroup (Cluster)

Three desirable properties are:

Familiarity (degree);

Reachability (distance);

Robustness (connectivity).

Cliques are idealized structures for modeling cohesive subgroups. But cliques were criticized for their overly restrictive nature and modeling disadvantages.

k-cliques

Definition: A k-clique is a subset of vertices C such that for every i, j ∈ C, d(i, j) ≤ k.

[Figure: example graph on vertices 1-6.]

{2,3,4} is a 1-clique ... the "regular" clique

{1,2,4,5,6} and {1,2,3,4,5} are 2-cliques

The above sets are also maximal.
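The k-clique definition can be checked mechanically with breadth-first search, with d(i, j) measured in the whole graph G. The sketch below is only an illustration: the slide's figure is not recoverable, so the edge set (a 6-cycle plus the chord 2-4) is an assumption chosen to agree with the three claims above, and all function names are hypothetical.

```python
from collections import deque

# ASSUMPTION: the original figure is lost; this edge set (a 6-cycle plus the
# chord 2-4) is one graph that is consistent with the claims on the slide.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (2, 4)]


def build_adjacency(edges):
    """Return a dict mapping each vertex to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path distances from `source` to every reachable vertex of G."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_k_clique(adj, subset, k):
    """True if every pair of vertices in `subset` is within distance k in G."""
    for i in subset:
        dist = bfs_distances(adj, i)
        if any(dist.get(j, float("inf")) > k for j in subset):
            return False
    return True


def is_maximal_k_clique(adj, subset, k):
    """True if `subset` is a k-clique and no single vertex can be added."""
    subset = set(subset)
    if not is_k_clique(adj, subset, k):
        return False
    return not any(is_k_clique(adj, subset | {v}, k) for v in adj if v not in subset)


ADJ = build_adjacency(EDGES)
print(is_k_clique(ADJ, {2, 3, 4}, 1))
print(is_maximal_k_clique(ADJ, {1, 2, 4, 5, 6}, 2))
print(is_maximal_k_clique(ADJ, {1, 2, 3, 4, 5}, 2))
```

Maximality is easy to test here because the k-clique property is hereditary: if no single vertex can be added, no larger superset can be a k-clique either.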


k-clubs

Definition: A k-club is a subset of vertices D such that diam(G[D]) ≤ k.

[Figure: example graph on vertices 1-6.]

{2,3,4} is a 1-club ... the "regular" clique

{1,2,4,5,6} is a 2-club

{1,2,3,4,5} is NOT a 2-club

Maximality is harder to test.

Protein Interaction Networks

Graphs are the protein-protein interaction maps of the yeast Saccharomyces cerevisiae and a gastric pathogen, Helicobacter pylori.

Table: S. Cerevisiae. Vertices: 2114; Edges: 2203; Connected components: 417.

Order         1     2     3     4     5     6     7     1458
#Components   268   101   25    10    5     3     4     1

Table: H. Pylori. Vertices: 1570; Edges: 1403; Connected components: 858.

Order         1     2     706
#Components   850   7     1

Results

Table: Clique, 2-Clique, and 2-Club numbers of S. Cerevisiae and H. Pylori protein maps.

Network         ω(G)   ω̃_2(G)   ω̄_2(G)
S. Cerevisiae   6      57       57
H. Pylori       3      56       56

Figure: A maximum 2-club and 2-clique of S. Cerevisiae.

Figure: A maximum 2-club and 2-clique of H. Pylori.

Figure: A maximum 3-clique and 3-club of S. Cerevisiae.

Some Drawbacks of k-Cliques and k-Clubs

1. k-cliques may use members outside the group.

2. k-clubs may lack familiarity and robustness.

3. k-cliques and k-clubs do not have meaningful complementary definitions.

[Figures: example graphs on vertices 1-6 and on vertices 1-5; one panel is labeled as a cycle on 5 vertices.]
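The first two drawbacks can be made concrete with a small check. The only difference between a k-clique and a k-club is which vertices a shortest path may pass through: any vertex of G for a k-clique, only members of the group for a k-club. The sketch below assumes a hypothetical example graph (a 6-cycle plus the chord 2-4, chosen to be consistent with the k-clique and k-club claims made earlier in the slides, since the original figures are lost) and a hypothetical star graph; the function names are illustrative.

```python
from collections import deque


def build_adjacency(edges):
    """Return a dict mapping each vertex to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def max_pairwise_distance(adj, group, allowed):
    """Largest shortest-path distance between two members of `group`,
    where paths may only pass through vertices in `allowed`."""
    group, allowed = set(group), set(allowed)
    worst = 0
    for source in group:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u] & allowed:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        worst = max([worst] + [dist.get(t, float("inf")) for t in group])
    return worst


def is_k_clique(adj, group, k):
    """Distances are measured in the whole graph G."""
    return max_pairwise_distance(adj, group, allowed=adj.keys()) <= k


def is_k_club(adj, group, k):
    """Distances are measured in the induced subgraph G[group]."""
    return max_pairwise_distance(adj, group, allowed=group) <= k


# ASSUMPTION: hypothetical graphs, not taken from the original figures.
GRAPH = build_adjacency([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (2, 4)])
STAR = build_adjacency([(0, 1), (0, 2), (0, 3), (0, 4)])

group = {1, 2, 3, 4, 5}
# Drawback 1: the group is a 2-clique only because vertex 6, an outsider,
# provides the short path 1-6-5.
print(is_k_clique(GRAPH, group, 2), is_k_club(GRAPH, group, 2))
# Drawback 2: a star is a 2-club, yet every leaf has degree 1 and removing
# the center disconnects the group.
print(is_k_club(STAR, set(STAR), 2), min(len(STAR[v]) for v in STAR))
```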


k-plex

Definition: A subset of vertices S is said to be a k-plex if the minimum degree in the induced subgraph satisfies δ(G[S]) ≥ |S| − k, i.e., every vertex in G[S] has degree at least |S| − k.

[Figure: example graph on vertices 1-6.]

{3,4,5,6} is a 1-plex ... the "regular" clique

{1,3,4,5,6} is a 2-plex (and NOT a 1-plex)

{1,2,3,4,5,6} is a 3-plex (and NOT a 2-plex)

co-k-plex

Definition: A subset of vertices S is a co-k-plex if the maximum degree in the induced subgraph satisfies ∆(G[S]) ≤ k − 1, i.e., the degree of every vertex in G[S] is at most k − 1.

S is a co-k-plex in G if and only if S is a k-plex in the complement graph Ḡ.

[Figure: a 3-plex and the corresponding co-3-plex on vertices 1-6.]
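Both definitions reduce to a degree count inside the induced subgraph, which the following sketch spells out. The edge set is hypothetical: the slide's figure cannot be recovered, so the graph below was constructed only to satisfy the three k-plex statements listed above. The last lines check the complement relationship on every vertex subset.

```python
from itertools import combinations

# ASSUMPTION: hypothetical edge set, built to agree with the slide's claims:
# a clique on {3,4,5,6}, vertex 1 adjacent to 2,3,4,5, vertex 2 adjacent to 1,3,6.
VERTICES = [1, 2, 3, 4, 5, 6]
EDGES = [(3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6),
         (1, 3), (1, 4), (1, 5), (1, 2), (2, 3), (2, 6)]


def build_adjacency(vertices, edges):
    """Return a dict mapping each vertex to the set of its neighbors."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def complement(adj):
    """Adjacency of the complement graph (no self-loops)."""
    vertices = set(adj)
    return {v: vertices - adj[v] - {v} for v in adj}


def is_k_plex(adj, subset, k):
    """Every vertex of G[subset] has degree at least |subset| - k."""
    subset = set(subset)
    return all(len(adj[v] & subset) >= len(subset) - k for v in subset)


def is_co_k_plex(adj, subset, k):
    """Every vertex of G[subset] has degree at most k - 1."""
    subset = set(subset)
    return all(len(adj[v] & subset) <= k - 1 for v in subset)


ADJ = build_adjacency(VERTICES, EDGES)
CO_ADJ = complement(ADJ)

print(is_k_plex(ADJ, {3, 4, 5, 6}, 1))
print(is_k_plex(ADJ, {1, 3, 4, 5, 6}, 2), is_k_plex(ADJ, {1, 3, 4, 5, 6}, 1))
print(is_k_plex(ADJ, VERTICES, 3), is_k_plex(ADJ, VERTICES, 2))

# S is a co-k-plex in the complement exactly when S is a k-plex in G.
equivalent = all(
    is_co_k_plex(CO_ADJ, s, k) == is_k_plex(ADJ, s, k)
    for k in (1, 2, 3)
    for r in range(1, len(VERTICES) + 1)
    for s in combinations(VERTICES, r)
)
print(equivalent)
```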


Structural Properties of a k-Plex

If G is a k-plex on n vertices, then:

1. Every induced subgraph of G is a k-plex;

2. If k < (n + 2)/2, then diam(G) ≤ 2;

3. κ(G) ≥ n − 2k + 2.

Advantages: for "small" values of k, k-plexes guarantee reachability and connectivity while relaxing familiarity.

k-plexes and co-k-plexes systematically relax cliques and independent sets.

Maximum k-Plex Problem

Given a graph G = (V, E) and a positive integer k, the maximum k-plex problem (MkPP) is defined as the problem of finding a largest k-plex in G.

We denote by ωk(G) the k-plex number of graph G.


IP Formulation of the Maximum k-Plex Problem

Formulation: The k-plex number ωk(G) of a graph G = (V, E) admits the following integer programming formulation:

\[
\omega_k(G) = \max \sum_{i \in V} x_i \tag{1}
\]

subject to:

\[
\sum_{j \in V \setminus N[i]} x_j \le (k - 1)\, x_i + |V \setminus N[i]|\,(1 - x_i) \quad \forall\, i \in V \tag{2}
\]

\[
x_i \in \{0, 1\} \quad \forall\, i \in V \tag{3}
\]

Computational Complexity

Decision version: The k-Plex problem is defined as follows: Given a graph G = (V, E) and positive integers k and c, does there exist a k-plex of size ≥ c in G?

Theorem: The k-Plex problem is NP-complete for any fixed positive integer k.
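Constraint (2) says that a selected vertex i may have at most k − 1 selected non-neighbors, and that the constraint is slack when i is not selected. The sketch below checks this reading by enumerating all 0/1 vectors on a tiny graph and comparing feasibility against the degree-based k-plex definition. It uses no IP solver; the graph and all names are hypothetical.

```python
from itertools import product

# ASSUMPTION: hypothetical six-vertex graph, given as an adjacency dict.
ADJ = {
    1: {2, 3, 4, 5},
    2: {1, 3, 6},
    3: {1, 2, 4, 5, 6},
    4: {1, 3, 5, 6},
    5: {1, 3, 4, 6},
    6: {2, 3, 4, 5},
}


def satisfies_constraint_2(adj, x, k):
    """Check constraint (2) for every vertex i, with x a 0/1 dict over V."""
    for i in adj:
        # V \ N[i]: vertices other than i that are not adjacent to i.
        non_neighbors = [j for j in adj if j != i and j not in adj[i]]
        lhs = sum(x[j] for j in non_neighbors)
        rhs = (k - 1) * x[i] + len(non_neighbors) * (1 - x[i])
        if lhs > rhs:
            return False
    return True


def is_k_plex(adj, subset, k):
    """Degree-based definition: min degree in G[subset] is at least |subset| - k."""
    subset = set(subset)
    return all(len(adj[v] & subset) >= len(subset) - k for v in subset)


def k_plex_number_by_enumeration(adj, k):
    """Maximize objective (1) over all binary vectors satisfying (2)."""
    vertices = sorted(adj)
    best = 0
    for bits in product((0, 1), repeat=len(vertices)):
        x = dict(zip(vertices, bits))
        if satisfies_constraint_2(adj, x, k):
            best = max(best, sum(bits))
    return best


def formulation_matches_definition(adj, k):
    """True if (2) accepts exactly the incidence vectors of k-plexes."""
    vertices = sorted(adj)
    for bits in product((0, 1), repeat=len(vertices)):
        x = dict(zip(vertices, bits))
        chosen = {v for v in vertices if x[v] == 1}
        if satisfies_constraint_2(adj, x, k) != is_k_plex(adj, chosen, k):
            return False
    return True


print([k_plex_number_by_enumeration(ADJ, k) for k in (1, 2, 3)])
print(all(formulation_matches_definition(ADJ, k) for k in (1, 2, 3)))
```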


On the maximum clique problem

The 1-plex formulation:

\[
\omega(G) = \max \sum_{i \in V} x_i \tag{4}
\]

subject to:

\[
\sum_{j \in V \setminus N[i]} x_j \le \bar{d}_i\,(1 - x_i) \quad \forall\, i \in V \tag{5}
\]

\[
x_i \in \{0, 1\} \quad \forall\, i \in V \tag{6}
\]

where $\bar{d}_i = |V \setminus N[i]|$.

Complement-edge vs 1-plex formulation

In the edge formulation, $x_i + x_j \le 1 \ \forall\, (i, j) \in \bar{E}$ replaces $\sum_{j \in V \setminus N[i]} x_j \le \bar{d}_i (1 - x_i) \ \forall\, i \in V$ in the above 1-plex formulation;

The edge constraint can be rewritten as $x_i + x_j \le 1 \ \forall\, i \in V,\ j \in V \setminus N[i]$;

For each i, we can sum $x_i + x_j \le 1$ over $j \in V \setminus N[i]$ to get the 1-plex constraint.

Surrogate constraint ideas (Glover)
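The summation step can be written out explicitly. The derivation below is a worked restatement using the slide's own symbols; it fixes a vertex i and adds the rewritten edge constraint over all of its non-neighbors j.

```latex
\begin{align*}
\sum_{j \in V \setminus N[i]} (x_i + x_j) &\le \sum_{j \in V \setminus N[i]} 1 \\
\bar{d}_i\, x_i + \sum_{j \in V \setminus N[i]} x_j &\le \bar{d}_i \\
\sum_{j \in V \setminus N[i]} x_j &\le \bar{d}_i\,(1 - x_i)
\end{align*}
```

Because the 1-plex constraint is a sum of edge constraints, any point that satisfies all edge constraints, fractional or not, also satisfies it. The linear relaxation of the edge formulation is therefore at least as tight, while the 1-plex formulation uses one constraint per vertex instead of one per complement edge.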


Solving Maximum k-Plex - Overview

Preprocessing techniques based on properties of a k-plex

Branch-and-cut approach embedding maximal independent set inequalities

Heuristic approaches
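The slides do not spell out the preprocessing rules. As one illustration of how a k-plex property can shrink the graph, the sketch below uses a standard degree-based reduction, which is an assumption here and not necessarily what the authors implemented: if a k-plex S has at least `lower_bound` vertices, every vertex of S has degree at least `lower_bound − k` within S and hence within G, so vertices of smaller degree can be deleted repeatedly. The function name and the example graph are hypothetical.

```python
def peel_for_k_plex(adj, k, lower_bound):
    """Repeatedly delete vertices that cannot belong to any k-plex of size
    at least `lower_bound`, i.e. vertices of degree below lower_bound - k.

    Returns the adjacency dict of the reduced graph; the input is not modified.
    """
    threshold = lower_bound - k
    alive = {v: set(nbrs) for v, nbrs in adj.items()}
    stack = [v for v in alive if len(alive[v]) < threshold]
    removed = set()
    while stack:
        v = stack.pop()
        if v in removed:
            continue
        removed.add(v)
        # Deleting v lowers the degree of each surviving neighbor.
        for w in alive.pop(v):
            alive[w].discard(v)
            if len(alive[w]) < threshold:
                stack.append(w)
    return alive


# ASSUMPTION: hypothetical graph, a clique on {1,2,3,4} with a path 4-5-6 attached.
ADJ = {
    1: {2, 3, 4},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {1, 2, 3, 5},
    5: {4, 6},
    6: {5},
}

# Looking for a clique (1-plex) with at least 4 vertices: only the clique survives.
print(sorted(peel_for_k_plex(ADJ, k=1, lower_bound=4)))
```

A lower bound for this rule could come from any feasible k-plex, for example one produced by a heuristic.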


Maximum 1-plex on Sanchis graphs

n      d = 0.4    d = 0.5    d = 0.6    d = 0.7         d = 0.8          d = 0.9
60     0.515      0.14       0.109      0.094           0.171            0.125
80     0.953      0.281      0.188      0.328           0.344            0.625
100    1.656      1.015      0.374      0.718           0.313            0.61
120    4.11       1.781      1.515      1.5             1.046            1.875
140    6.782      2.734      1.891      1.75            1.469            2.032
160    9.391      4.094      2.843      2.375           1.954            5.266
180    14.625     6.562      4.843      3.765           2.735            3.766
200    26.312     11.234     6.703      4.89            4.938            6.907
250    53.609     27.672     15.375     12.563          8.89             29.328
300    130.311    49.828     36.844     49.218          31.578           58.094
350    167.64     69.874     80.093     95.046          91.89            311.734
400    305.576    146.405    123.953    145.296         298.891          3164.52
500    587.809    332.216    827.541    2640.28         2302.89          18802.8
600    1440.16    666.262    1265.4     2445.75         6324.92          28801† (200)
700    2084.06    1313.13    3128.04    1163.8          28801.1‡ (234)   28801.1† (234)
800    4188.82    2407.17    5537.67    9497.54         28202.3          28801.1† (267)
900    7484.71    7293.83    3138.48    28803‡ (300)    28802.5† (300)   28801† (300)
1000   10609.8    12619.5    5454.63    14436.1         28803.1† (334)   28800.9† (334)


Maximum 2-plex on Sanchis graphs

n      d = 0.4    d = 0.5    d = 0.6    d = 0.7    d = 0.8    d = 0.9
60     0.781      0.438      1.031      4.592      125.295    141.635
80     1.422      1.14       1.515      5.592      182.749    28298
100    3.108      2.172      2.329      6.89       249.187    28801†
120    6.375      3.687      4.422      10.687     815.7      28801.1†
140    9.812      6          10         37.938     1775.66    28801†
160    16.265     9.828      21.156     45.843     1623.88    -
180    26.093     22.406     29.14      50.234     8458.37
200    28.453     29.953     57.281     152.749    1687.75
250    66.984     59.219     131.311    709.823    14062.5
300    131.39     138.921    314.373    1204.26    28802.5‡
350    255.012    174.875    638.072    2400.58    28803†
400    321.754    483.749    1085.99    3957.04    -
500    1493.74    1130.23    3728.15    15078.9
600    2594.98    2802.97    7990.93    28820.9‡
700    2451.03    4929.64    28865.5†   28801.9†
800    4571.52    6961.75    28901†     -
900    5643.9     17661.6    28941.7†
1000   15307.2    22162.1    -

Erdős Collaboration Networks

In scientific collaboration networks, the vertices represent scientists, and an edge connects two of them if they co-authored at least one paper.

Two types of Erdős graphs were used in our experiments.

The first type has mathematicians with Erdős number 1.

The second type has mathematicians with Erdős number 1 and 2.


Erdős Networks - k-plex numbers

Table: Erdős networks: The number of vertices, edges, edge density, and the maximum k-plex size for k = 1, . . . , 5.

                                          ωk(G) for k = ...
Graph        |V|    |E|    Edge Density   1    2    3    4    5
ERDOS-97-1   472    1314   0.0118212      7    8    9    11   12
ERDOS-98-1   485    1381   0.0117662      7    8    9    11   12
ERDOS-99-1   492    1417   0.0117315      7    8    9    11   12
ERDOS-97-2   5488   8972   0.0005959      7    8    9    11   12
ERDOS-98-2   5822   9505   0.0005609      7    8    9    11   12
ERDOS-99-2   6100   9939   0.0005343      8    8    9    11   12

Erdős Networks - Runtimes

Table: The run time (in CPU seconds) of the BC algorithm on 1-neighborhood Erdős networks.

k    ERDOS-97-1   ERDOS-98-1   ERDOS-99-1
1    196.246      216.433      235.121
2    242.454      294.349      327.407
3    252.445      262.061      309.323
4    154.000      172.177      191.003
5    164.063      188.906      193.905
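The edge densities reported for the Erdős networks agree with the usual definition for a simple undirected graph, density = 2|E| / (|V|(|V| − 1)). The short check below recomputes them from the vertex and edge counts in the slides; the function name and the dictionary layout are illustrative choices.

```python
def edge_density(num_vertices, num_edges):
    """Fraction of all possible vertex pairs that are joined by an edge."""
    possible_pairs = num_vertices * (num_vertices - 1) // 2
    return num_edges / possible_pairs


# (|V|, |E|, reported density) as listed in the slides.
ERDOS_NETWORKS = {
    "ERDOS-97-1": (472, 1314, 0.0118212),
    "ERDOS-98-1": (485, 1381, 0.0117662),
    "ERDOS-99-1": (492, 1417, 0.0117315),
    "ERDOS-97-2": (5488, 8972, 0.0005959),
    "ERDOS-98-2": (5822, 9505, 0.0005609),
    "ERDOS-99-2": (6100, 9939, 0.0005343),
}

for name, (n, m, reported) in ERDOS_NETWORKS.items():
    print(f"{name}: computed {edge_density(n, m):.7f}, reported {reported:.7f}")
```

At these densities the graphs are very sparse, which is consistent with the small k-plex numbers reported relative to |V|.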


Erdős Networks - Runtimes

Table: The reduced graph sizes and run time (in seconds) for the BC algorithm on reduced graphs.

     ERDOS-97-2              ERDOS-98-2              ERDOS-99-2
k    |V|    |E|    Time      |V|    |E|    Time      |V|    |E|    Time
1    174    1061   8.203     188    1160   11.766    194    1208   12.234
2    174    1061   25.561    188    1160   29.28     194    1208   40.468
3    174    1061   45.999    188    1160   38.781    194    1208   38.874
4    77     510    2.453     105    686    4.562     116    763    10.624
5    77     510    3.063     105    686    4.047     116    763    6.297

References

B. Balasundaram, S. Butenko, and S. Trukhanov. Novel approaches for analyzing biological networks. Journal of Combinatorial Optimization, 10:23–39, 2005.

B. Balasundaram, S. Butenko, I. V. Hicks, and S. Sachdeva. Clique relaxations in social network analysis: The maximum k-plex problem. Submitted to Mathematical Programming, 2006.

http://ie.tamu.edu/people/faculty/butenko/papers


Published on Mar 16, 2012
