Biclustering Expression Data Based on Expanding Localized Substructures

Cesim Erten, Melih Sözdinler

Presentation Overview

Biclustering Definition
Types of Biclusters
Previous work
Our Method
  Graph Preliminaries
  Biclustering method: Localize&Extract
  Experimental Results
Future Work

Melih Sözdinler Işık University

Biclustering Definition

Clustering: grouping “similar” items.
Biclustering: clustering both dimensions (rows and columns) simultaneously.
The problem was introduced by [Hartigan 72].
The problem has been shown to be NP-hard.

[Figure panels: Clustering Rows, Clustering Columns, Biclustering]

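Types of Biclusters

All Constant
$$
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1
\end{bmatrix}
$$

Constant Row (left) and its Multiplicative variant (right)
$$
\begin{bmatrix}
1 & 2 & 3 & 4 \\
1 & 2 & 3 & 4 \\
1 & 2 & 3 & 4 \\
1 & 2 & 3 & 4
\end{bmatrix}
\qquad
\begin{bmatrix}
1 & 2 & 4 & 8 \\
1 & 2 & 4 & 8 \\
1 & 2 & 4 & 8 \\
1 & 2 & 4 & 8
\end{bmatrix}
$$

Constant Column (left) and its Multiplicative variant (right)
$$
\begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 2 & 2 & 2 \\
3 & 3 & 3 & 3 \\
4 & 4 & 4 & 4
\end{bmatrix}
\qquad
\begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 2 & 2 & 2 \\
4 & 4 & 4 & 4 \\
8 & 8 & 8 & 8
\end{bmatrix}
$$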
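As a quick illustration of the three constant types, here is a small Python sketch (not part of LEB) that tests a submatrix against each pattern. It follows the slides' naming: a Constant Row bicluster repeats one row vector (every row reads 1 2 3 4), and a Constant Column bicluster repeats one column vector. The function names and the tolerance `tol` are illustrative assumptions.

```python
from typing import Sequence

# Hypothetical helpers for illustration; tol is an assumed float tolerance.
Matrix = Sequence[Sequence[float]]


def is_all_constant(m: Matrix, tol: float = 1e-9) -> bool:
    """Every entry equals the top-left entry (within tol)."""
    v = m[0][0]
    return all(abs(x - v) <= tol for row in m for x in row)


def is_constant_row(m: Matrix, tol: float = 1e-9) -> bool:
    """Slide convention: every row repeats the first row, so each column is constant."""
    first = m[0]
    return all(abs(x - y) <= tol for row in m for x, y in zip(row, first))


def is_constant_column(m: Matrix, tol: float = 1e-9) -> bool:
    """Slide convention: every column repeats the first column, so each row is constant."""
    return all(abs(x - row[0]) <= tol for row in m for x in row)


if __name__ == "__main__":
    all_const = [[1, 1, 1, 1]] * 4
    const_row = [[1, 2, 4, 8]] * 4
    const_col = [[1] * 4, [2] * 4, [4] * 4, [8] * 4]
    print(is_all_constant(all_const), is_constant_row(const_row), is_constant_column(const_col))
    # -> True True True
    print(is_constant_row(const_col), is_constant_column(const_row))
    # -> False False
```

The multiplicative variants pass the same tests as their plain counterparts, since they differ only in the values inside the repeated row or column.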

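Types of Biclusters (cont.)

Coherent Additive Model
$$
\begin{bmatrix}
1.0 & 2.0 & 4.0 & 5.0 \\
2.0 & 3.0 & 5.0 & 6.0 \\
4.0 & 5.0 & 7.0 & 8.0 \\
5.0 & 6.0 & 8.0 & 9.0
\end{bmatrix}
$$

Coherent Multiplicative Model
$$
\begin{bmatrix}
1.0 & 2.0 & 4.0 & 5.0 \\
2.0 & 4.0 & 8.0 & 10.0 \\
0.4 & 0.8 & 1.6 & 2.0 \\
0.8 & 1.6 & 3.2 & 4.0
\end{bmatrix}
$$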
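The two coherent models can be checked entry by entry. The sketch below assumes the usual formulations a_ij = mu + r_i + c_j (additive) and a_ij = r_i * c_j (multiplicative); the helper names and `tol` are hypothetical, and the multiplicative test assumes the top-left entry is nonzero.

```python
from typing import Sequence

# Hypothetical helpers; tol is an assumed float tolerance.
Matrix = Sequence[Sequence[float]]


def is_additive(m: Matrix, tol: float = 1e-9) -> bool:
    """a_ij = mu + r_i + c_j for all i, j  <=>  a_ij - a_i0 - a_0j + a_00 == 0."""
    return all(
        abs(m[i][j] - m[i][0] - m[0][j] + m[0][0]) <= tol
        for i in range(len(m))
        for j in range(len(m[0]))
    )


def is_multiplicative(m: Matrix, tol: float = 1e-9) -> bool:
    """a_ij = r_i * c_j for all i, j  <=>  a_ij * a_00 == a_i0 * a_0j (assumes a_00 != 0)."""
    return all(
        abs(m[i][j] * m[0][0] - m[i][0] * m[0][j]) <= tol
        for i in range(len(m))
        for j in range(len(m[0]))
    )


if __name__ == "__main__":
    additive = [[1.0, 2.0, 4.0, 5.0], [2.0, 3.0, 5.0, 6.0],
                [4.0, 5.0, 7.0, 8.0], [5.0, 6.0, 8.0, 9.0]]
    multiplicative = [[1.0, 2.0, 4.0, 5.0], [2.0, 4.0, 8.0, 10.0],
                      [0.4, 0.8, 1.6, 2.0], [0.8, 1.6, 3.2, 4.0]]
    print(is_additive(additive), is_multiplicative(additive))
    # -> True False
    print(is_additive(multiplicative), is_multiplicative(multiplicative))
    # -> False True
```

Anchoring every comparison on the first row and first column keeps each check linear in the number of entries.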


Previous work

Proposed Algorithms
Cheng and Church’s Algorithm (CC) [Cheng et al’00]
Order-Preserving Submatrix (OPSM) [Ben-Dor et al’02]
Conserved Gene Expression Motifs (xMOTIFs) [Murali et al’03]
Iterative Signature Algorithm (ISA) [Bergmann et al’03]
Statistical-Algorithmic Method for Bicluster Analysis (SAMBA) [Tanay et al’02, Sharan et al’03]
Bimax [Prelic et al’06]


Previous work (cont.)

Proposed Tools
Biclustering Analysis Toolbox (BicAT) [Barkow et al’06]
Click and Expander [Sharan et al’03]
BicOverlapper [Santamaria et al’08]


Our Method

Graph Preliminaries
Biclustering method: Localize&Extract
Experimental Results


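Graph Preliminaries

Gene Expression Matrix & Bipartite Graph
Biclustering & Biclique
Bicliques & Crossing Minimization
CM generalized with weights: WOLF [Çakıroğlu et al]

[Figure: binary expression matrix, axes labeled Genes and Conditions, containing two all-ones blocks]
$$
\begin{bmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1
\end{bmatrix}
$$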
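A minimal sketch of the correspondence above, assuming genes form one layer and conditions the other: every nonzero entry becomes a weighted edge, a bicluster of ones is a biclique, and a two-layer drawing is scored by its weighted crossing count. WOLF itself is not reproduced here; the cost w1 * w2 for two crossing edges is an assumption about the weighted variant, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (gene index, condition index, weight)


def bipartite_edges(m: Sequence[Sequence[float]]) -> List[Edge]:
    """One edge gene_i -- condition_j per nonzero entry, weighted by that entry."""
    return [(i, j, w) for i, row in enumerate(m) for j, w in enumerate(row) if w != 0]


def is_biclique(m: Sequence[Sequence[float]], genes, conds) -> bool:
    """genes x conds induces a complete bipartite subgraph iff every entry there is nonzero."""
    return all(m[i][j] != 0 for i in genes for j in conds)


def positions(order: Sequence[int]) -> Dict[int, int]:
    """Map each vertex to its position in a left-to-right layer ordering."""
    return {v: p for p, v in enumerate(order)}


def weighted_crossings(edges: List[Edge], gene_pos: Dict[int, int],
                       cond_pos: Dict[int, int]) -> float:
    """Two edges cross when their endpoints appear in opposite orders on the two layers.
    Assumption: a crossing between edges of weights w1 and w2 costs w1 * w2."""
    total = 0.0
    for (g1, c1, w1), (g2, c2, w2) in combinations(edges, 2):
        if (gene_pos[g1] - gene_pos[g2]) * (cond_pos[c1] - cond_pos[c2]) < 0:
            total += w1 * w2
    return total


if __name__ == "__main__":
    # The 8 x 8 example: two 4 x 4 all-ones blocks on the diagonal.
    m = [[1] * 4 + [0] * 4 for _ in range(4)] + [[0] * 4 + [1] * 4 for _ in range(4)]
    edges = bipartite_edges(m)
    print(is_biclique(m, range(4), range(4)), is_biclique(m, range(5), range(4)))
    # -> True False
    conds = positions(range(8))
    print(weighted_crossings(edges, positions(range(8)), conds))
    # -> 72.0
    print(weighted_crossings(edges, positions([0, 4, 1, 5, 2, 6, 3, 7]), conds))
    # -> 168.0
```

Even the best ordering keeps the crossings inside each biclique (36 per 4 × 4 block); what crossing minimization removes are the crossings between blocks, which is what pulls each biclique into a contiguous region of the drawing.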

Biclustering method: Localize&Extract

Phase-1: Localize
Phase 1.1 Initial Placement: run WOLF on alternating layers.
Phase 1.2 Adaptive Noise Hiding: iterate, running WOLF on alternating layers.

[Figure: Layer A and Layer B]
A gene-condition pair that should not be treated as noise and should not be hidden.
A gene-condition pair that could be noise and should be hidden to help localization.
Like Conway's Game of Life: “whoever is alone should be hidden”.

Thursday, March 26, 2009
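The localize phase can be sketched as follows, with two substitutions stated up front: the one-sided placement step uses the classic barycenter heuristic instead of WOLF, and the noise-hiding rule (hide a nonzero cell with fewer than `min_neighbours` nonzero neighbours in the current ordering) is a hypothetical reading of the Game of Life analogy, not the authors' adaptive rule. In the method described above, the two steps would be repeated.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def barycenter(weights: Sequence[float], pos: Sequence[int]) -> float:
    """Weighted mean position of a vertex's neighbours; vertices with no edges go last."""
    total = sum(weights)
    if total == 0:
        return float("inf")
    return sum(w * pos[k] for k, w in enumerate(weights)) / total


def alternate_layers(m: Matrix, rounds: int = 4) -> Tuple[List[int], List[int]]:
    """Stand-in for 'run WOLF on alternating layers' (assumption: barycenter heuristic).
    Fix the condition layer and reorder genes, then fix genes and reorder conditions."""
    n_rows, n_cols = len(m), len(m[0])
    row_order, col_order = list(range(n_rows)), list(range(n_cols))
    for _ in range(rounds):
        col_pos = [0] * n_cols
        for p, j in enumerate(col_order):
            col_pos[j] = p
        row_order = sorted(range(n_rows), key=lambda i: barycenter(m[i], col_pos))
        row_pos = [0] * n_rows
        for p, i in enumerate(row_order):
            row_pos[i] = p
        col_order = sorted(
            range(n_cols),
            key=lambda j: barycenter([m[i][j] for i in range(n_rows)], row_pos),
        )
    return row_order, col_order


def hide_isolated(m: Matrix, row_order, col_order, min_neighbours: int = 2) -> List[List[float]]:
    """Hypothetical Game-of-Life-style rule on the reordered matrix: a nonzero cell with
    fewer than min_neighbours nonzero cells among its 8 neighbours is hidden (set to 0).
    All cells are judged on the same snapshot, as in Conway's Game of Life."""
    r = [[m[i][j] for j in col_order] for i in row_order]
    n, k = len(r), len(r[0])
    out = [row[:] for row in r]
    for a in range(n):
        for b in range(k):
            if r[a][b] == 0:
                continue
            alive = sum(
                1
                for da in (-1, 0, 1)
                for db in (-1, 0, 1)
                if (da or db) and 0 <= a + da < n and 0 <= b + db < k and r[a + da][b + db] != 0
            )
            if alive < min_neighbours:
                out[a][b] = 0
    return out


if __name__ == "__main__":
    scrambled = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
    rows, cols = alternate_layers(scrambled)
    print(rows, cols)
    # -> [0, 2, 1, 3] [0, 2, 1, 3]
    noisy = [[1, 1, 1, 0, 0], [1, 1, 1, 0, 0], [1, 1, 1, 0, 0], [0, 0, 0, 0, 1]]
    print(hide_isolated(noisy, range(4), range(5))[3])
    # -> [0, 0, 0, 0, 0]
```

The scrambled matrix becomes block diagonal after reordering, and the lone 1 in the last row of `noisy` is hidden while the 3 × 3 block survives.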

Biclustering method: Localize&Extract (cont.)

Phase-2: Extraction

It is generic and adaptable.

Constant
  All Constant
  Constant Rows
  Constant Columns
Coherent

Biclustering method: Localize&Extract (cont.)

Phase-2: Extraction

Constant

For all-constant biclusters: collect the ones with the same weight.
For constant-row or constant-column biclusters: collect the ones with the same weight on each row or column.
A small error rate is tolerated, bounded by a threshold.
The ones with the same weight represent a constant bicluster.

[Figure axes: x, y]
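A sketch of the constant extraction test, assuming the error-rate threshold means that the fraction of entries disagreeing with the collected weight may not exceed `max_error`. `max_error`, `tol`, and the function names are hypothetical. With `per_row=True` each row must carry a single weight (the slides' Constant Column pattern); `per_row=False` checks columns (the Constant Row pattern).

```python
from collections import Counter
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def dominant_value(values: Sequence[float]) -> float:
    """Most frequent weight (exact equality; ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]


def mismatch_count(values: Sequence[float], target: float, tol: float) -> int:
    """Number of entries that differ from the target weight by more than tol."""
    return sum(abs(v - target) > tol for v in values)


def is_all_constant_bicluster(sub: Matrix, max_error: float = 0.1, tol: float = 1e-9) -> bool:
    """Collect entries sharing one weight; tolerate a fraction max_error (assumed) of others."""
    flat = [v for row in sub for v in row]
    return mismatch_count(flat, dominant_value(flat), tol) / len(flat) <= max_error


def is_line_constant_bicluster(sub: Matrix, per_row: bool, max_error: float = 0.1,
                               tol: float = 1e-9) -> bool:
    """Same test line by line: each row (per_row=True) or each column has its own weight."""
    lines = [list(r) for r in sub] if per_row else [list(c) for c in zip(*sub)]
    errors = sum(mismatch_count(line, dominant_value(line), tol) for line in lines)
    return errors / sum(len(line) for line in lines) <= max_error


if __name__ == "__main__":
    sub = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 2, 1], [1, 1, 1, 1]]
    # One mismatch out of 16 entries is an error rate of 0.0625.
    print(is_all_constant_bicluster(sub), is_all_constant_bicluster(sub, max_error=0.05))
    # -> True False
    rows_const = [[1, 1, 1, 1], [2, 2, 2, 2]]
    print(is_line_constant_bicluster(rows_const, per_row=True),
          is_line_constant_bicluster(rows_const, per_row=False))
    # -> True False
```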

Our Method (cont.)

Phase-2: Extraction

Coherent

H-value: calculate the H-value of each submatrix S_i.
For each S_i:
  Mark S_i.
  Collect the ones on the same x, y alignment that have a similar H-value.
  Expand S_i if the H-value difference is small.

[Figure axes: x, y; highlighted submatrix S_i]
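Assuming the H-value is Cheng and Church's mean squared residue, H(I,J) = (1/(|I||J|)) Σ (a_ij - a_iJ - a_Ij + a_IJ)², the H-value computation and one expansion step could look like the sketch below. Only column expansion is shown; the threshold `max_diff` and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def h_value(m: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Mean squared residue of rows x cols (assumed to be the slides' H-value)."""
    sub = [[m[i][j] for j in cols] for i in rows]
    n, k = len(sub), len(sub[0])
    row_mean = [sum(r) / k for r in sub]
    col_mean = [sum(sub[a][b] for a in range(n)) / n for b in range(k)]
    all_mean = sum(row_mean) / n
    return sum(
        (sub[a][b] - row_mean[a] - col_mean[b] + all_mean) ** 2
        for a in range(n)
        for b in range(k)
    ) / (n * k)


def try_expand(m: Matrix, rows: Sequence[int], cols: List[int], new_col: int,
               max_diff: float) -> Tuple[List[int], float]:
    """Add new_col to S_i = rows x cols only if the H-value changes by at most max_diff
    (hypothetical threshold); otherwise keep S_i unchanged."""
    before = h_value(m, rows, cols)
    after = h_value(m, rows, cols + [new_col])
    if abs(after - before) <= max_diff:
        return cols + [new_col], after
    return cols, before


if __name__ == "__main__":
    # Columns 0-3: the coherent additive example; column 4 follows the same
    # row offsets; column 5 does not.
    m = [[1, 2, 4, 5, 7, 3],
         [2, 3, 5, 6, 8, 9],
         [4, 5, 7, 8, 10, 1],
         [5, 6, 8, 9, 11, 7]]
    rows, cols = [0, 1, 2, 3], [0, 1, 2, 3]
    print(h_value(m, rows, cols))
    # -> 0.0
    print(try_expand(m, rows, cols, 5, max_diff=0.1))
    # -> ([0, 1, 2, 3], 0.0)
    print(try_expand(m, rows, cols, 4, max_diff=0.1)[0])
    # -> [0, 1, 2, 3, 4]
```

A perfect additive bicluster has H = 0; adding column 5 would raise H to 2.0, so that expansion is rejected.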

Experimental Results

Experiments on Two Real Datasets

Yeast Cell Cycle (Saccharomyces cerevisiae): 2884 genes, 17 conditions [Cheng et al’00]
Arabidopsis thaliana: 734 genes, 69 conditions [http://arabidopsis.info/]

LEB parameters: α = 2 and η = 10 for the Arabidopsis thaliana dataset; α = 4 and η = 100 for the Yeast dataset.


Experimental Results (cont.)

Arabidopsis thaliana Experiment

H-Value Experiment: we are the second best in terms of minimum average H-value.


Experimental Results (cont.)

Yeast Experiments

Enrichment for each functional category.
Proportions of biclusters enriched for each GO Biological Process category, using FuncAssociate [Berriz et al’03].
Protein-protein interaction test.
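The enrichment proportions rest on an over-representation test per bicluster and category. The sketch below uses a plain one-sided hypergeometric test (equivalently, a one-sided Fisher exact test); FuncAssociate's actual procedure, including how it handles testing many categories, is not reproduced, and `alpha`, the gene names, and the function names are hypothetical.

```python
from math import comb
from typing import Iterable, Set


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the category, n drawn):
    the one-sided Fisher exact p-value for over-representation."""
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def enriched_fraction(biclusters: Iterable[Set[str]], category: Set[str], N: int,
                      alpha: float = 0.05) -> float:
    """Proportion of biclusters whose gene set is over-represented in `category`.
    Assumption: no multiple-testing correction is applied."""
    clusters = list(biclusters)
    hits = sum(
        hypergeom_upper_tail(len(genes & category), N, len(category), len(genes)) <= alpha
        for genes in clusters
    )
    return hits / len(clusters)


if __name__ == "__main__":
    category = {f"g{i}" for i in range(5)}      # hypothetical GO category, 5 of 10 genes
    biclusters = [{"g0", "g1", "g2"}, {"g5", "g6", "g7"}]
    print(hypergeom_upper_tail(3, 10, 5, 3))    # 10 / 120
    print(enriched_fraction(biclusters, category, N=10, alpha=0.1))
    # -> 0.5
```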


Experiment 1: We are better in 8 functional categories.


Experiment 2: The biclusters of LEB are enriched in better proportions.


[Figure: sample yeast protein-protein interaction (PPI) network, from http://www.bordalierinstitute.com/images/yeastProteinInteractionNetwork.jpg]

Experiment 3: The biclustering results of LEB give the best hit ratio for the PPI network.
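The slides do not define the hit ratio. One plausible reading, used here purely as an assumption, is the fraction of gene pairs that share a bicluster and also interact in the PPI network; the gene names and the function name are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple


def ppi_hit_ratio(biclusters: Iterable[Set[str]], ppi_edges: Iterable[Tuple[str, str]]) -> float:
    """Hypothetical definition: fraction of within-bicluster gene pairs that are PPI edges.
    Interactions are treated as undirected, so (a, b) and (b, a) are the same edge."""
    interactions = {frozenset(e) for e in ppi_edges}
    pairs = hits = 0
    for genes in biclusters:
        for a, b in combinations(sorted(genes), 2):
            pairs += 1
            hits += frozenset((a, b)) in interactions
    return hits / pairs if pairs else 0.0


if __name__ == "__main__":
    # Pairs A-B, A-C, B-C; two of them (A-B and B-C) are interactions.
    print(ppi_hit_ratio([{"A", "B", "C"}], [("A", "B"), ("C", "B")]))
```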


Future Work

Experiments on other datasets.
Evaluating biclusters using biological metrics, for example identifying cancer-related genes in sample human cancer data.
Formulating the mathematical relation between "weighted crossing minimization" and various bicluster scoring functions.


Acknowledgements 

Thanks to TÜBİTAK-BIDEB: 

Monthly Payments

Funding for visit


Thank you! Any questions?


Bicob 2009 Presentation

Cesim Erten and Melih Sözdinler