Drawing, clustering and visualization of biological pathways.
Fabien Jourdan, LIRMM, Montpellier France. fjourdan@lirmm.fr
Visualization Enhances Data analysis.
From data extraction to visualization
Data Extraction
Visualization
Visualization Process • Import Data • Clearly separate data from representation • Organize data according to future visualization in a separate process
• Drawing • Follow drawing conventions or propose new representations • Provide drawing algorithms
• Link Data and Drawing • Make sure that data can be accessed through the representation (drawing)
• Navigation • Provide direct access to data (multiple views) • Provide synthetic views of data (clustering) • Enhance data discovery through navigation
Visualization loop
[Figure: Spence diagram linking Content, Browse, Model, Browsing Strategy, Formulate a Browsing Strategy, Internal Model, Interpret and Interpretation]
Visualization is not a linear process!
Metabolic Pathway visualization • Boehringer posters • KEGG • EcoCyc, MetaCyc • And many other tools …
Importing Data • Information is merged in the visualization, not in the databases • Data is organized in a format that is easy to use and to exchange (e.g. XML)
[Figure: databases DB1, DB2, DB3 and DB4, and a query engine]
Importing Data • Information is merged in visualization not in databases • Data is organized using a standard exchange format (XML)
KEGG pathways database: C1+E1->C2, C2+E3->C3, C4+E2->C3, …
Map000100: C1->C2, C2->C3, …
Map000100: C1 X=10 Y=30, C2 X=5 Y=2, C3 X=45 Y=99, …
KGML : an XML description for each metabolic pathway
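To make the import step concrete, here is a minimal sketch of reading compound coordinates and reactions from a KGML-like document with Python's standard `xml.etree.ElementTree`. The embedded sample is a hypothetical, heavily simplified fragment built from the toy identifiers on the slide (C1, C2, C3). The element names (`entry`, `graphics`, `reaction`, `substrate`, `product`) follow the KGML layout as I understand it and should be checked against the KEGG documentation. The function name `parse_kgml` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified KGML-like fragment (toy ids taken from the slide).
KGML_SAMPLE = """
<pathway name="path:toy" title="Toy pathway">
  <entry id="1" name="cpd:C1" type="compound">
    <graphics name="C1" x="10" y="30"/>
  </entry>
  <entry id="2" name="cpd:C2" type="compound">
    <graphics name="C2" x="5" y="2"/>
  </entry>
  <entry id="3" name="cpd:C3" type="compound">
    <graphics name="C3" x="45" y="99"/>
  </entry>
  <reaction id="4" name="rn:R1" type="irreversible">
    <substrate id="1" name="cpd:C1"/>
    <product id="2" name="cpd:C2"/>
  </reaction>
  <reaction id="5" name="rn:R2" type="irreversible">
    <substrate id="2" name="cpd:C2"/>
    <product id="3" name="cpd:C3"/>
  </reaction>
</pathway>
"""


def parse_kgml(text):
    """Return (positions, reactions) extracted from a KGML-like string.

    positions maps an entry name to its (x, y) drawing coordinates;
    reactions is a list of (substrate names, product names) pairs.
    """
    root = ET.fromstring(text)

    positions = {}
    for entry in root.findall("entry"):
        graphics = entry.find("graphics")
        # Skip entries that carry no drawing information.
        if graphics is None or graphics.get("x") is None:
            continue
        positions[entry.get("name")] = (int(graphics.get("x")),
                                        int(graphics.get("y")))

    reactions = []
    for reaction in root.findall("reaction"):
        substrates = [s.get("name") for s in reaction.findall("substrate")]
        products = [p.get("name") for p in reaction.findall("product")]
        reactions.append((substrates, products))

    return positions, reactions


positions, reactions = parse_kgml(KGML_SAMPLE)
```

Keeping the result as plain dictionaries and lists leaves the data separate from any particular drawing, in line with the "separate data from representation" principle stated earlier.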
Main steps in visualization • Importing Data • Finding relevant sources • Organizing data according to future visualization
• Drawing • Following drawing conventions or proposing new representations • Providing drawing algorithms
• Linking Data and Drawing • Ensuring that data can be accessed through the representation (drawing)
• Navigation • Providing synthetic views of data (clustering) • Enhancing data discovery through navigation
Drawing • Providing new representations • Using deeply rooted drawing conventions in Metabolic Pathway representations
ViMac
• Rojas et al. / EcoCyc
Drawing Algorithms • Detect strongly connected components; collapsing each one into a single node yields a DAG • Draw the DAG with a DAG placement algorithm • Draw each component with force-directed placement
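The first step, detecting strongly connected components and collapsing them into a DAG, can be sketched as follows. This is only an illustration using Tarjan's algorithm in plain Python, not necessarily the implementation behind the slides. The function names `strongly_connected_components` and `condense` and the adjacency-dictionary input format are assumptions. The two layout steps (DAG placement and force-directed placement) are not covered.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps a node to an iterable of successors."""
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        # v is the root of a component: pop the stack down to v.
        if low[v] == index[v]:
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(frozenset(component))

    # Include nodes that only appear as successors.
    nodes = set(graph) | {w for ws in graph.values() for w in ws}
    for v in sorted(nodes):  # sorted only for reproducibility; needs comparable nodes
        if v not in index:
            visit(v)
    return components


def condense(graph):
    """Collapse each component into one node; the resulting graph is acyclic."""
    components = strongly_connected_components(graph)
    owner = {v: i for i, comp in enumerate(components) for v in comp}
    dag = {i: set() for i in range(len(components))}
    for v, successors in graph.items():
        for w in successors:
            if owner[v] != owner[w]:
                dag[owner[v]].add(owner[w])
    return components, dag


# A cycle a -> b -> c -> a with an exit c -> d.
example = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
components, dag = condense(example)
```

The recursive form is fine for pathway-sized graphs. A very large network would need an iterative variant to avoid Python's recursion limit.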
Drawing • Providing new representations • Using deeply rooted drawing conventions in Metabolic Pathway representations
[Figures: KEGG, BIOTAG]
Interacting with metabolic pathways
KEGG
BIOTAG
Drawing • Our method : – Use KGML files – The implicit data structure does not match the KEGG drawing of the network • Data structure transformation
– Place elements according to KGML coordinates – Compute edge routes
Drawing
The network described in KGML is not the one we want to draw
Drawing Algorithms
• From KGML data our aim is to compute this representation
Drawing Algorithms
• Graphical information given in KGML files
Drawing Algorithms
• Compute barycenter of enzymes
Drawing Algorithms
• Route the edge according to the three defined coordinates.
Drawing Algorithms • Using KEGG coordinates provided in KGML files • Routing Edges on a grid.
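The steps above (take KGML coordinates, compute the barycenter of the enzymes, route the edge on a grid) can be sketched as follows. This rests on several assumptions that the slides do not state: the three coordinates are taken to be the substrate, the enzyme barycenter and the product; routing uses horizontal-then-vertical segments; and the grid `step` is arbitrary. The function names are hypothetical.

```python
def barycenter(points):
    """Arithmetic mean of a non-empty list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def snap(point, step):
    """Move a point to the nearest grid intersection."""
    return (round(point[0] / step) * step, round(point[1] / step) * step)


def route_edge(substrate, enzymes, product, step=5):
    """Return an axis-aligned polyline from substrate to product.

    The route passes through the barycenter of the enzyme positions
    (assumed to be the middle one of the "three defined coordinates").
    """
    waypoints = [snap(p, step)
                 for p in (substrate, barycenter(enzymes), product)]
    path = [waypoints[0]]
    for x, y in waypoints[1:]:
        px, py = path[-1]
        # Add a bend so that every segment is horizontal or vertical.
        if px != x and py != y:
            path.append((x, py))
        if (x, y) != path[-1]:
            path.append((x, y))
    return path


# Substrate at (10, 30), two enzymes, product at (45, 100).
path = route_edge((10, 30), [(20, 30), (30, 50)], (45, 100))
```

A production router would also have to avoid overlapping nodes and other edges, which this sketch ignores.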
Linking Data and Drawing
[Figure: DATA, Visualization, BIOTAG, User]
Navigation : Clustering
A. J. Enright PNAS 2002
Small World Networks • Short path between each pair of elements • Each element neighbourhood is densely connected
• Metabolic pathways • Protein-protein interaction networks • Social networks • Software component networks • Hypermedia networks • ….
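The two small-world properties listed above are commonly quantified by the average shortest path length and the average clustering coefficient. The sketch below computes both for a small undirected graph in plain Python. The function names and the edge-list input are assumptions made for illustration, and the graph is assumed to be connected.

```python
from collections import deque
from itertools import combinations


def to_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def average_path_length(adj):
    """Mean shortest-path distance over all reachable ordered pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def average_clustering(adj):
    """Mean fraction of neighbour pairs that are themselves connected."""
    coefficients = []
    for v, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            coefficients.append(0.0)
            continue
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        coefficients.append(links / (k * (k - 1) / 2))
    return sum(coefficients) / len(coefficients)


# A triangle a-b-c with a pendant node d attached to c.
adj = to_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
length = average_path_length(adj)
clustering = average_clustering(adj)
```

A network is usually called small-world when the first value stays low while the second is much higher than in a random graph of the same size and density.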
Navigation : Clustering • Giving a synthetic view of data – According to their values – According to their organisation (structure)
• Grouping elements • Manually • Automatically
Multiscale Visualization of Small World Networks InfoVis 03.
Navigation : Clustering
Software component capture using graph clustering IWPC 03.
Navigation : keeping context • When looking closer at an element, keeping the contextual information • An overview frame • A Fisheye + Semantic Zooming
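One way to keep context while magnifying the focus is a fisheye distortion. The sketch below uses the Sarkar and Brown transfer function g(t) = (d + 1) t / (d t + 1), applied independently on each axis of a unit square. This is a swapped-in, well-known formulation chosen for illustration. The slides do not say which fisheye variant is used, and the function names and the distortion factor `d` are assumptions.

```python
def fisheye_1d(x, focus, d, lo=0.0, hi=1.0):
    """Distort coordinate x around focus within the interval [lo, hi].

    d = 0 leaves x unchanged; larger d magnifies the area near the focus
    and compresses the periphery. The focus and both bounds stay fixed.
    """
    if x == focus:
        return x
    # Distance from the focus to the boundary on the side where x lies.
    d_max = (hi - focus) if x > focus else (lo - focus)
    if d_max == 0:
        return x
    t = (x - focus) / d_max          # normalized distance in [0, 1]
    g = (d + 1) * t / (d * t + 1)    # Sarkar-Brown transfer function
    return focus + g * d_max


def fisheye_2d(point, focus, d):
    """Apply the distortion independently on the x and y axes."""
    return (fisheye_1d(point[0], focus[0], d),
            fisheye_1d(point[1], focus[1], d))


# Points near the focus (0.5, 0.5) are pushed outwards, i.e. magnified.
moved = fisheye_2d((0.75, 0.25), (0.5, 0.5), d=3)
```

Semantic zooming would additionally change what is drawn for each element depending on its magnification. That part is not shown here.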
Conclusion • Visualization: a tool to support data analysis – Analysis of post-genomic data through metabolic pathway visualization (Biotag) – Exploratory analysis (Protein-protein / Small World)
• Ongoing work – Full implementation of fisheye techniques – Validation of metric-based clustering
Acknowledgements • Transcriptome team : – Jacques Marti (Montpellier UM2) – Oliver Clement (Montpellier UM2) – David Piquemal (Montpellier UM2)
• Computer Science team : – Guy Melançon (Montpellier LIRMM) – Isabelle Mougenot (Montpellier LIRMM) – David Auber (Bordeaux Labri) – Yves Chiricota (Chicoutimi UQAM)
Thank you for your attention
Strength Metric on edges
[Figure: edge e = (u, v) with the sets W_uv, M_u = N_u \ N_v and M_v = N_v \ N_u]
$$\gamma_3(e) = \frac{|W_{uv}|}{|M_u| + |M_v| + |W_{uv}|}$$
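Strength Metric on edges
With the same sets $W_{uv}$, $M_u = N_u \setminus N_v$ and $M_v = N_v \setminus N_u$ around the edge $e = (u, v)$:
$$\gamma_4(e) = s(M_u, W_{uv}) + s(M_v, W_{uv}) + s(M_u, M_v) + s(W_{uv}, W_{uv})$$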
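The two terms can be computed on a small undirected graph as sketched below. The slides define neither W_uv nor s, so the code makes these assumptions: W_uv is the set of common neighbours of u and v; the endpoints u and v are excluded from M_u and M_v; and s(A, B) is the number of edges between A and B divided by the number of possible ones (for s(A, A), the possible pairs inside A). All function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def density(adj, A, B):
    """Assumed definition of s(A, B): existing edges / possible edges."""
    if A == B:
        possible = len(A) * (len(A) - 1) // 2
        actual = sum(1 for a, b in combinations(A, 2) if b in adj[a])
    else:  # A and B are disjoint by construction in strength_terms
        possible = len(A) * len(B)
        actual = sum(1 for a in A for b in B if b in adj[a])
    return actual / possible if possible else 0.0


def strength_terms(adj, u, v):
    """Return (gamma3, gamma4) for the edge (u, v)."""
    W = adj[u] & adj[v]             # assumed: common neighbours
    Mu = adj[u] - adj[v] - {v}      # neighbours of u only, endpoint removed
    Mv = adj[v] - adj[u] - {u}      # neighbours of v only, endpoint removed
    total = len(Mu) + len(Mv) + len(W)
    gamma3 = len(W) / total if total else 0.0
    gamma4 = (density(adj, Mu, W) + density(adj, Mv, W)
              + density(adj, Mu, Mv) + density(adj, W, W))
    return gamma3, gamma4


# Edge u-v with common neighbour w, private neighbours a and b, and a-w linked.
adj = build_adjacency([("u", "v"), ("u", "w"), ("v", "w"),
                       ("u", "a"), ("v", "b"), ("a", "w")])
gamma3, gamma4 = strength_terms(adj, "u", "v")
```

Under these assumptions, edges inside a densely connected neighbourhood score high on both terms, while edges bridging two groups score low, which is what makes such a metric usable for clustering.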