FROM GENOTYPE TO PHENOTYPE: FUTURE PERSPECTIVES ON DATA AND SERVICE INTEGRATION



 
 
 
Advanced Topics in Informatics Engineering: Bioinformatics
Doctoral Programme in Informatics Engineering, 2008-2009
 
 
 
 
 
 
Pedro Lopes | pedrolopes@ua.pt

 



TABLE OF CONTENTS
Introduction – The GEN2PHEN Project
Integration Scenarios and Related Work
  Semantic Web
  Social Environments
  Integration
  Summary
Our Ongoing Developments
  DynamicFlow
  DiseaseCard
  Summary
Future Perspectives
  Cloud-computing
  Information Integration
  Data Visualization
  Summary
Conclusion
References



INTRODUCTION – THE GEN2PHEN PROJECT  
Bioinformatics is emerging as one of the fastest-growing scientific areas of computer science. Recent hardware and software developments show an evolution faster than Moore's Law predicts. This development began with the Human Genome Project¹, which succeeded in decoding the complete human genetic code. This generated a tremendous amount of readily available information, and the scientific community rapidly started designing applications, increasing the amount of resources needed in this area. Following the Human Genome Project came the Human Variome Project², which aims to collect information about genome variations and their influence on human health. Along with the latter, the European Community is also sponsoring a bioinformatics project in its Seventh Framework Programme: Genotype to Phenotype Databases: a Holistic Solution (GEN2PHEN)³.
 
The GEN2PHEN Project is a collaborative project with 19 partners, most of them from European institutions with relevant work in bioinformatics. GEN2PHEN is an ambitious project that aims to unify human and model organism genetic variation databases, allowing the creation of a central genome browser able to blend GEN2PHEN data with medical data. The overall goal is to create a complete biomedical knowledge environment. The strategy and objectives of this project may be divided into several research areas:
• Analyze the genotype-to-phenotype field and investigate current needs and practices in order to obtain complete knowledge of other ongoing projects with similar objectives. The active biology community must be consulted in order to develop an accurate state-of-the-art document that describes the general process in the field and identifies, as precisely as possible, what this particular area is lacking and which models and technologies are being used effectively.

• Develop standards for the genotype-to-phenotype field of research in order to speed up the standardization process with new data models, nomenclature and technology standards.

• Create generic database components, services and integration infrastructures for the genotype-to-phenotype domain. These solutions will be mostly web applications that apply new interface usability standards and are customized to their end users. Solutions for genetic and genomic databases will be developed. This particular objective aims to create a central GEN2PHEN database spanning all the research areas, and a simpler application that can be deployed by any research group.

• Create data search and presentation solutions for genotype-to-phenotype knowledge. The applications designed to fulfil the previous objective won't be complete without proper search mechanisms that encompass information distributed across different applications and architecture layers. The applications must also have an effective interface layer designed to meet the community's requests.

• Facilitate the population of research and diagnostic genotype-to-phenotype databases by developing new tools and promoting them in the scientific community. The newly developed applications will also support more efficient methods for data insertion, allowing anyone to collaborate in this project.

• Build a major genotype-to-phenotype Internet portal, a GEN2PHEN knowledge centre. This portal will contain all GEN2PHEN-related information, ranging from calendars to databases, from publications to discussion forums.

• Deploy the developed solutions to the community in order to increase researchers' interest and participation. Several resources will be devoted to advertising, explaining and training researchers in using the developed solutions.

¹ Human Genome Project: http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
² Human Variome Project: http://www.humanvariomeproject.org
³ GEN2PHEN: http://www.gen2phen.org


The project's main focus is on developing and promoting a new generation of applications that will aid different types of researchers in their scientific work and, at the same time, gather and integrate information from different sources, which will be shared with the community. GEN2PHEN applications have to be state-of-the-art web applications. It is important to research and study the most popular Web2.0 (and upcoming Web3.0) applications in order to improve developers' knowledge about what captivates users, increasing the general biomedical community's interest. This research should focus mainly on user interaction issues such as usability, interfaces, "quality of service" and overall user satisfaction. This new wave of applications has to address issues like semantic data integration, user collaboration, information sharing and improvements to search engine algorithms.


Fig. 1 – GEN2PHEN strategy

Developing a simple Rich Internet Application is, by now, a somewhat trivial process that does not require great software engineering and programming knowledge. However, bioinformatics and biomedicine don't depend only on good-looking interfaces. What matters, and this is the difficult part, is what's under the hood. Going deeper into the application's composition, several issues arise, such as data integration, service integration, service orchestration, workflow composition, distributed processing, query expansion or object ontologies.

This report intends to give an overview of the GEN2PHEN project, with special emphasis on these next-generation web application problems. It covers some solutions under ongoing development elsewhere, as well as systems being developed in our workgroup, and discusses how both can help in assessing GEN2PHEN application design.





INTEGRATION SCENARIOS AND RELATED WORK 

First of all, it is necessary to understand for whom these new application paradigms will be important and why these generic GEN2PHEN goals are so significant. The biological and biomedical scientific community is witnessing an exponential increase in the information available. This growth leads, in turn, to growth in the number of applications (web or desktop) that solve the same specific problems. Along with these new applications come new data sources and new services, and the heterogeneity among them is huge. The main issue one finds when doing scientific research is where to find information. A few years ago this was a problem because of the lack of applications and databases. Now it is a problem because of the excessive amount of information available on every corner of the web.



Fig. 2 – Web2.0 integration

From the users' perspective, we believe they are looking for a central, unifying portal, customized to their personal status, where they can easily find all the information they need. This is the added value GEN2PHEN solutions may have. Currently, there are innumerable ongoing works focusing on this problem. However, there isn't a universal solution to all the heterogeneity problems raised by data and service integration. And the problems don't boil down to this; there are also the novel functionalities made possible by the semantic web [1] and the great developments made in information mining. Following the work of Goble and Stevens [2], one can conclude that not all is well in the kingdom of data integration in bioinformatics and that data integration has a long way to go before it completely satisfies its initial goals.

The applications that should be studied may be divided into three main areas that are closely connected and foster integration. There are developments in the semantic web and its application to biology, including how the bridge between generic ontologies and biological ones can be built. Other groups are working on collaboration tools for the community, which offer better information sharing and productivity tools. The largest group is the integration one, which encompasses data integration, service integration, service orchestration, workflow composition and mashup applications.
 
SEMANTIC WEB
Semantic web developments have the main purpose of describing, with a pre-defined ontology, all the information existing on the web. The key semantic web components are RDF⁴, OWL⁵ and SPARQL⁶. RDF stands for Resource Description Framework and is a generic metadata model for describing online information and content. OWL is the Web Ontology Language, the ontology-authoring language usually associated with RDF Schema. SPARQL is a recursive acronym for SPARQL Protocol and RDF Query Language; it is a query language, with a syntax reminiscent of SQL, for retrieving information stored in the RDF format. Implementing semantic web architectures is not a trivial task [3] for any kind of data. However, it is important to introduce these metadata structures and algorithms in bioinformatics, as they will become part of Web3.0.
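
To make the roles of RDF and SPARQL more concrete, the following is a minimal sketch in plain Python. It is not RDF or SPARQL tooling: it only imitates the idea of subject-predicate-object triples and of a query that matches triple patterns containing variables. The identifiers (`ex:BRCA1`, `ex:associatedWith`, and so on) and the helpers `match` and `select` are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# A toy graph: each fact is a (subject, predicate, object) triple.
TRIPLES: List[Triple] = [
    ("ex:BRCA1", "rdf:type", "ex:Gene"),
    ("ex:BRCA1", "ex:associatedWith", "ex:BreastCancer"),
    ("ex:TP53", "rdf:type", "ex:Gene"),
    ("ex:TP53", "ex:associatedWith", "ex:LiFraumeniSyndrome"),
    ("ex:BreastCancer", "rdf:type", "ex:Disease"),
]


def match(pattern: Triple, triples: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; other terms must match exactly.
    """
    for triple in triples:
        bindings: Dict[str, str] = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                if bindings.setdefault(want, got) != got:
                    break  # same variable bound to two different values
            elif want != got:
                break
        else:
            yield bindings


def select(patterns: List[Triple], triples: List[Triple]) -> List[Dict[str, str]]:
    """Join several patterns on their shared variables."""
    solutions: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        joined = []
        for solution in solutions:
            # Substitute already-bound variables before matching.
            bound = tuple(solution.get(term, term) for term in pattern)
            for bindings in match(bound, triples):
                joined.append({**solution, **bindings})
        solutions = joined
    return solutions


# Roughly: SELECT ?gene ?disease
#          WHERE { ?gene rdf:type ex:Gene . ?gene ex:associatedWith ?disease }
results = select(
    [("?gene", "rdf:type", "ex:Gene"), ("?gene", "ex:associatedWith", "?disease")],
    TRIPLES,
)
```

Because every source is reduced to the same triple shape, one query mechanism can span data that originally lived in different databases, which is the property that makes this model attractive for integration.
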

By applying semantic web concepts and technologies in bioinformatics, one can access, in a unified manner, several biological documents described with RDF. These concepts also enable process automation and improved machine-to-machine data exchange. Belleau et al. propose Bio2RDF [4], a preliminary approach to creating an engine that provides RDF access to biological data distributed across several databases such as KEGG or NCBI. Bio2RDF⁷ makes all the data available on its website, using only the URL to locate the resources. Splendiani [5] also has a proposal to bring the semantic web to biology, but the implementation isn't as advanced as Bio2RDF. These are the most recent implementations, but biology and medicine are very difficult scientific areas due to the complexity of defining a proper ontology that covers all the life sciences concepts and terms.

⁴ Resource Description Framework: http://www.w3.org/RDF
⁵ Web Ontology Language: http://www.w3.org/2004/OWL
⁶ SPARQL Query Language for RDF: http://www.w3.org/TR/rdf-sparql-query
 
SOCIAL ENVIRONMENTS
Social networks and collaboration environments are some of the most popular Web2.0 applications. These applications connect users and allow them to share personal information, music, videos or any other type of data. Additionally, several small applications are developed to integrate information about different users or entertainment areas. For instance, a movies application would allow every user to describe their personal movie tastes; when used in a large-scale environment, it would give the developers important information about cinema that could be used to improve the advertisements shown to the user: a user who likes horror movies would have a greater probability of seeing horror movie ads than one who likes comedies. Facebook⁸ is one of the most widely used social web applications worldwide, with over 120 million users. Using personal connections, personal preferences and other specific applications, Facebook's owners obtain valuable market information. Like Facebook, MySpace⁹ and Google's Orkut¹⁰ provide almost the same functionalities to users.

Experiencing sustained growth is myExperiment¹¹, by Carole Anne Goble et al. [6], the first bioinformatics social network application, where one can connect with others, share files (with a focus on Taverna workflows, detailed later in this report) and create scientific communities. Despite the focus on Taverna, myExperiment provides a rich scientific ecosystem, offering the community a wide range of tools essential in any social collaborative environment. myExperiment also offers access to its services through RESTful programming interfaces; thus, it is possible to build new applications on the framework or to use myExperiment data and tools to improve existing ones.

⁷ Bio2RDF: http://www.bio2rdf.org
⁸ Facebook: http://www.facebook.com
⁹ MySpace: http://www.myspace.com
¹⁰ Orkut: http://www.orkut.com
¹¹ myExperiment: http://www.myexperiment.org
 
INTEGRATION
Integration is one of the areas of bioinformatics that attracts the most groups and the most ongoing work. It is a research area that includes the aforementioned semantic web and social networking tools, besides other fields such as mashups and workflows. A workflow is a simple sequence of logical steps or activities that are executed independently of each other [7]. Applying this generic concept to bioinformatics, one may regard a workflow as an organized information flow that connects distinct services and/or data sources in order to solve a problem in a modular manner. The most widely used solution for workflow building and execution is Taverna [8, 9]. Taverna is a Java-based desktop application offering a simple interface for workflow composition and execution. It can access several types of services, such as BioMoby [10] or generic WSDL web services. The major drawback is that, to integrate services, one must define an XML integration component to assist in piping information from the output of service A to the input of service B. Taverna can also be used from within other applications, allowing access to the results of previously saved workflows or the execution of workflows in real time. One of myExperiment's functionalities is workflow sharing: one may access a large workflow storage system and find solutions developed by others, or share one's own workflows and important development information. Currently, Taverna's greatest flaw is being desktop-based, as we are witnessing a shift in the computational paradigm, with web applications coming to dominate over desktop ones.

Alongside workflows there are mashups. Mashups began in the music industry: they were simple mixes of several songs into a single song. With Web2.0, this idea crossed over to web applications. Mashups are web applications that combine information from a predefined collection of data sources or services in a single interface. We can consider a mashup to be a meta-application: it basically creates a new application by using functionalities provided by other applications.
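
As a rough illustration of the meta-application idea, the sketch below queries a predefined collection of sources and merges their answers into a single view. Both sources are hypothetical stand-ins written as local functions; a real mashup would call remote services instead.

```python
from typing import Callable, Dict, List

# Each source is a function from a query term to a list of result strings.
Source = Callable[[str], List[str]]


def literature_source(term: str) -> List[str]:
    """Hypothetical stand-in for a literature search service."""
    return [f"Article mentioning {term}"]


def variation_source(term: str) -> List[str]:
    """Hypothetical stand-in for a genetic variation database."""
    return [f"Variant 1 in {term}", f"Variant 2 in {term}"]


# The mashup's predefined collection of sources.
SOURCES: Dict[str, Source] = {
    "literature": literature_source,
    "variations": variation_source,
}


def mashup(term: str, sources: Dict[str, Source]) -> Dict[str, List[str]]:
    """Query every source and merge the answers into one view, keyed by source."""
    return {name: source(term) for name, source in sources.items()}


page = mashup("BRCA1", SOURCES)
```

Note that the set of sources is fixed in code: adding a new one means editing `SOURCES`, which mirrors the dependency on developers that predefined mashups imply.
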

Online, there are several workflow/mashup building frameworks. Yahoo! Pipes¹² and Microsoft Popfly¹³ are worth mentioning because they have remarkable interfaces and pre-built components to access the World Wide Web's most popular websites. Bioinformaticians can use these tools with data from different data sources to develop new applications. Cheung et al. [11] pursued this approach to create a biomedical mashup application. However, these tools weren't specifically designed for the life sciences. Therefore, several researchers are working on service integration frameworks: de Knikker et al. [12] present a basic web service choreography scenario; Bio-jETI, from Margaria et al. [13], is a similar solution that uses the same principles as de Knikker. These tools share a common integration problem: the heterogeneity of information sources doesn't allow a fully automated integration solution. Each service stores and offers data in its own model, which makes concept mapping and information exchange more difficult. There isn't yet an automated tool that offers a simple integration interface allowing the use of components from any arbitrary service.

¹² Yahoo! Pipes: http://pipes.yahoo.com/pipes
¹³ Microsoft Popfly: http://www.popfly.com

BioMoby [10] is an initiative to create an ontology and a central repository of bioinformatics resources. With this semantic framework, one can share or use online services created by others in an almost automated fashion [14]. The BioMoby¹⁴ central repository faces typical resource discovery problems such as validation and duplication. Anyone can add services, and the description provided or the service's functionality may not be scientifically valid and may mislead users. Duplication of services is also a problem: there can be any number of services doing the same task, so it is difficult to choose which one best fits the desired requirements.
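
The validation and duplication problems of an open registry can be illustrated with a small sketch. It assumes, purely for illustration, that each registry entry records an input type, an output type and a curation flag; these fields, the service names and the helper `group_duplicates` are hypothetical and do not describe BioMoby's actual data model. Sharing a signature is also only a hint of duplication, not proof that two services perform the same task.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    input_type: str
    output_type: str
    validated: bool = False  # has a curator checked the description?


def group_duplicates(entries: List[ServiceEntry]) -> Dict[Tuple[str, str], List[str]]:
    """Group service names sharing the same (input, output) signature.

    Only signatures claimed by more than one service are returned.
    """
    by_signature: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for entry in entries:
        by_signature[(entry.input_type, entry.output_type)].append(entry.name)
    return {sig: names for sig, names in by_signature.items() if len(names) > 1}


registry = [
    ServiceEntry("blastA", "DNASequence", "AlignmentReport", validated=True),
    ServiceEntry("blastB", "DNASequence", "AlignmentReport"),
    ServiceEntry("translate", "DNASequence", "ProteinSequence", validated=True),
]

duplicates = group_duplicates(registry)
unvalidated = [entry.name for entry in registry if not entry.validated]
```
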



Fig. 3 – Existing developments categories

SUMMARY
Fully automated and dynamic integration is the panacea that developers haven't yet reached. Workflow and mashup solutions are the most popular ways to integrate services and data sources. However, both imply hard-coding several functionalities, increasing the dependency on developers to add new ones. Applying a semantic web approach to bioinformatics will empower developers to create more independent applications. Describing services and information semantically will allow automated communication between heterogeneous applications. This will enhance existing workflow and mashup applications: it will be easier for users to add new services to existing applications, so that they become developers of new meta-applications adjusted to their needs.

¹⁴ BioMoby: http://www.biomoby.org





OUR ONGOING DEVELOPMENTS 

Our bioinformatics group is, like others, developing software solutions to solve problems associated with this specific area. Until now, this work didn't focus on integration or the semantic web; it was mostly focused on aiding microarray laboratory research. ANACONDA [15] is a tool to study gene primary structure. The Microarray Information Database (MIND) [16] is a web application that helps researchers analyze microarray experiment results. More abstract than MIND is GeneBrowser [17], a tool for gene expression studies that starts from the gene lists produced by microarray experiments. However, web trends and our association with projects like GEN2PHEN or ALERT¹⁵ brought the need to expand our group's application range. DynamicFlow [18, 19] is a web-based workflow management application providing Web2.0 semi-autonomous service integration. DiseaseCard [20] is an older application; however, it already implements basic collaboration and integration functionalities that later became famous with Web2.0. Further developments are being studied to implement semantic web engines, mashup applications and novel information visualization techniques.
 
 DYNAMICFLOW
DynamicFlow is a framework for dynamic integration of heterogeneous information sources. The main goal when developing this framework was to create a novel and agile interface for service integration. The application should have a usable, easy and intuitive interface for solving problems using a "divide and conquer" strategy: the main problem is divided into smaller tasks, each of which can be solved with a certain web service; the tasks are then combined, using the workflow metaphor, creating an information flow from task to task until the final solution is reached. This modular approach could be useful for researchers because it is closer to the plan they follow when solving problems in the wet lab: structuring the problem and then solving it iteratively, using simple tasks in a web application running in their browser.

¹⁵ ALERT Project: http://www.alert-project.org



Fig. 4 – DynamicFlow framework model


One of DynamicFlow's key elements is its innovative model. The three-layered model (Fig. 4) divides the application into: access, the bottom layer, containing the databases and the external services; design, the top layer, where user interactions such as workflow building occur, using AJAX technology and drag-and-drop metaphors; and core, the processing layer, which encompasses server-side processing on the application's web server and client-side processing in the client's browser. This division of the processing layer into two separate components is one of the framework's main features. The web server processes client requests and connects to the authentication server and the framework's DBMS, but service–application communication and data piping between tasks are processed on the client side, reducing server load and improving the application's efficiency and response time. This semi-autonomous process of maintaining a valid information flow from one service to the next is possible thanks to a previously defined service definition standard. The standard follows a simple ontology and provides an easy way of editing the available services. Using it, the application can validate workflow consistency, execute the workflow and display intermediate results, all using the browser's resources. It is a primitive version of semantics in an information integration application.
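
A minimal sketch of how a service definition can support workflow consistency checking is shown below. It assumes a drastically simplified definition (a name plus one typed input and one typed output); the class `ServiceDefinition`, the type names and the example services are hypothetical and are not the DynamicFlow standard itself. The sketch is in Python for brevity, whereas the text describes this check as running in the browser.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceDefinition:
    """Hypothetical, simplified service description: a name plus typed I/O."""
    name: str
    input_type: str
    output_type: str


def validate_workflow(steps: List[ServiceDefinition]) -> List[str]:
    """Return one message per broken link; an empty list means consistent."""
    problems = []
    for producer, consumer in zip(steps, steps[1:]):
        if producer.output_type != consumer.input_type:
            problems.append(
                f"{producer.name} outputs {producer.output_type}, "
                f"but {consumer.name} expects {consumer.input_type}"
            )
    return problems


gene_lookup = ServiceDefinition("gene_lookup", "GeneSymbol", "GeneId")
sequence_fetch = ServiceDefinition("sequence_fetch", "GeneId", "DNASequence")
pathway_search = ServiceDefinition("pathway_search", "ProteinId", "PathwayList")

ok = validate_workflow([gene_lookup, sequence_fetch])
broken = validate_workflow([gene_lookup, sequence_fetch, pathway_search])
```

Because the check only needs the definitions, not the services, a workflow can be validated before any remote call is made.
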

The work conducted resulted in a web application prototype that is available for testing and open to new developments. These new developments will cover five main topics: perfecting the service definition standard, including semantic web technologies (RDF), improving the interface, adding new user interactions and widening the service range.
 





DISEASECARD
 
The DiseaseCard¹⁶ project began in 2003 with the objective of creating a rare disease link aggregator, integrating information from distributed and heterogeneous medical and genomic databases. The links were gathered by a web crawling engine and grouped into nodes representing concepts (Fig. 5). For instance, for the Peters anomaly¹⁷ disease, the node References contains all the reference sections of the NCBI OMIM¹⁸ database that refer to this disease, and the node Pathology contains Orphanet¹⁹ information about this disease. Along with the external information, each disease also has a forum entry, where any registered user can share their personal experience. A tree, similar to the one in Windows Explorer, shows all the nodes and their collections of links, displaying, in a unified interface, information from the genotype to the phenotype. As we want to gather as much information as possible, rare diseases are the main target due to their strong association between genotype and phenotype. It is important to mention that no database information is replicated: DiseaseCard only saves link information for shared data. Modern concepts like integration (heterogeneous link gathering) and collaboration (public disease forums) were already considered when developing the system.



 




























































¹⁶ DiseaseCard: http://www.diseasecard.org
¹⁷ Peters Anomaly disease card: http://diseasecard.org/evaluateCard.do?diseaseid=604229
¹⁸ OMIM Home: http://www.ncbi.nlm.nih.gov/omim
¹⁹ Orphanet: http://www.orpha.net/consor/cgi-bin/index.php



Fig. 5 – DiseaseCard concept map


As the application got older, it lost quality: the web crawling engine doesn't automatically adapt to link changes and so, for several concepts, the resulting nodes were empty. In a preliminary analysis of the GEN2PHEN goals and how they can be achieved, we concluded that DiseaseCard was the most adequate solution and should be put under development again. After a careful analysis and the definition of an action plan, its operability was restored, the crawler was corrected, the interface got a new look and DiseaseCard is back on track.

As far as GEN2PHEN is concerned, DiseaseCard will be a simple way to achieve some of the initially proposed goals. In the future, adding GEN2PHEN-related databases and web portals is a priority to complete the application. The inclusion of semantics in DiseaseCard and in the portals it crawls will ease the crawling process and improve the precision of the results obtained. Information mining features are also being researched: even if it only stores links, DiseaseCard contains valuable information in those links, which can be useful in new types of queries.
 
 SUMMARY
Both DynamicFlow and DiseaseCard are ongoing projects that will be developed within the GEN2PHEN perspective. The next section details new functionalities, interfaces and user interactions that can be implemented in either of these applications in order to improve their quality.






FUTURE PERSPECTIVES 

Web2.0 changed the Internet forever. Developers no longer care only about what the application does, but also about what the users want it to do. Users are now the most important part of the Internet. They produce content, they have their own web footprint, and they are part of a new online community. If Web2.0 is the social web, Web3.0 may be the intelligent web. Although it may sound like science fiction, Web3.0 is nearer than one may think. Different platforms can communicate with each other automatically; "cloud-computing" is taking over the web; the web is getting intelligent with new semantics; distributed applications are being integrated. These facts, which were mere dreams a few years ago, are empowering the Internet with new solutions and establishing it as the platform for everything: productivity, entertainment, research, leisure…
 
 CLOUD‐COMPUTING
New computing paradigms are changing the Internet at the architecture level. GRID [21] architectures are the new solution for distributed computing. Virtualization improvements [22] make virtual machines almost as powerful as real ones. "Cloud-computing" [23] uses the best of both to offer an online development environment. Microsoft with the Azure Services Platform²⁰, Amazon with the Elastic Compute Cloud²¹ and Google with its App Engine²² offer access to virtual machines where anyone can deploy applications that use distributed resources to guarantee real-time scalability, flexibility and availability.

Following the same trend, new web applications and web application suites are replacing traditional desktop apps. For instance, Microsoft's Live²³ suite offers almost all the Office suite tools online, and Google²⁴ also has the essential productivity tools online, in the "cloud".

²⁰ Azure Services Platform: http://www.microsoft.com/azure/default.mspx
²¹ Amazon Elastic Compute Cloud: http://aws.amazon.com/ec2
²² Google App Engine: http://code.google.com/appengine
²³ Microsoft Live: http://www.live.com
²⁴ Google Apps: http://www.google.com/apps



INFORMATION INTEGRATION
When considering information integration tools, one can explore mashups and web desktops. Popular mashup applications are personal, customizable web portals made of gadgets that access almost any web application. Netvibes²⁵ is arguably the most complete personal portal on the Web; the most famous, however, is Google's iGoogle²⁶. Both offer, in a simple interface, the ability to customize a page with any gadgets we want. Available gadgets include e-mail access, calendars, to-do lists, newsreaders and almost any tool worth including in a single portal.




Fig. 6 - iGoogle gadget interface stub


Web desktops are web applications that simulate the traditional desktop environment, with wallpaper, icons to launch applications, a trash bin, a task bar and application menus. eyeOS²⁷ is a cloud computing operating system that lets any user work online with a large set of applications. It is also an open source development platform: users can create their own applications and install them on their web desktop.

25 Netvibes: http://www.netvibes.com
26 iGoogle: http://www.google.com/ig
27 eyeOS: http://eyeos.org




DATA VISUALIZATION
Another interesting area is data visualization. Traditionally, search results are listed with a simple description. However, new search engines like Viewzi²⁸ or Searchme²⁹ present results through different, much more visually appealing interfaces. Depending on the view, screenshots are taken from the pages and shown in grids or lists; results are ordered by date to form a chronological sequence; information is gathered from distinct search engines to rank the results better; or context relations are established among results to create a visual relational tree. Distinct visualizations of the same results are important because each can offer different insights into the same data. Aiming at improved user interaction and greater satisfaction, these tools rely on AJAX, Flash or Silverlight to create captivating and usable interfaces.
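Gathering results from distinct search engines to rank them better can be illustrated with reciprocal rank fusion, one simple and well-known way to merge ranked lists. The sketch below is an illustration only: it is not claimed to be what Viewzi or Searchme actually do, and the function name, the constant k and the three result lists are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists into one ranking.

    rankings: list of lists, each ordered best-first by one engine.
    k: damping constant (an assumed default; larger values flatten
       the advantage of top positions).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, item in enumerate(ranking, start=1):
            # Items near the top of any list contribute more.
            scores[item] += 1.0 / (k + position)
    # Highest combined score first; ties broken by name for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Made-up result lists from three hypothetical engines for one query.
engine_a = ["GEN2PHEN home", "Ensembl", "UniProt"]
engine_b = ["Ensembl", "GEN2PHEN home", "NCBI"]
engine_c = ["GEN2PHEN home", "NCBI", "Ensembl"]

merged = reciprocal_rank_fusion([engine_a, engine_b, engine_c])
print(merged)
```

A result that several engines place near the top ends up ahead of one that a single engine ranks highly, which is the intuition behind combining sources before presenting the results visually.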



Fig. 7 - Viewzi result grid for gen2phen search

28 Viewzi: http://www.viewzi.com
29 Searchme: http://www.searchme.com




SUMMARY
All the applications and interfaces presented are new solutions being considered in several thematic fields. They represent a first step towards the next generation of web applications and open the door to a new level of user interaction.

This new wave of web applications will have repercussions on bioinformatics. New applications like iBioinformatics and BioDesktop, or new result visualization tools, could leave their mark on the bioinformatics world.

Following the iGoogle and Netvibes examples, one could develop a similar portal that integrates gadgets and applications in a single interface. Such a portal, iBioinformatics or BioVibes, would represent a leap forward in integration and personalization. With a large range of services available in the gadget repository, any researcher could customize the application according to their needs, thus creating their own personal meta-application.
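The gadget repository idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a design for iBioinformatics or BioVibes: the class names, the gadget names and the gadgets themselves are hypothetical placeholders, and a real gadget would call a remote bioinformatics service instead of returning a fixed string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a gadget is modelled as a function from a query to text.
Gadget = Callable[[str], str]


@dataclass
class GadgetRepository:
    """Shared catalogue of available gadgets, keyed by name."""
    gadgets: Dict[str, Gadget] = field(default_factory=dict)

    def register(self, name: str, gadget: Gadget) -> None:
        self.gadgets[name] = gadget


@dataclass
class Portal:
    """One researcher's personal page: an ordered choice of gadgets."""
    repository: GadgetRepository
    selected: List[str] = field(default_factory=list)

    def add(self, name: str) -> None:
        if name not in self.repository.gadgets:
            raise KeyError(f"unknown gadget: {name}")
        if name not in self.selected:
            self.selected.append(name)

    def render(self, query: str) -> Dict[str, str]:
        # Run every selected gadget on the same query, in display order.
        return {name: self.repository.gadgets[name](query)
                for name in self.selected}


# Placeholder gadgets (hypothetical); real ones would query remote services.
repo = GadgetRepository()
repo.register("gene-summary", lambda q: f"summary for {q}")
repo.register("pathway-preview", lambda q: f"pathways involving {q}")
repo.register("literature", lambda q: f"recent papers on {q}")

portal = Portal(repo)
portal.add("gene-summary")
portal.add("literature")
page = portal.render("BRCA2")
print(page)
```

Keeping the repository separate from each portal means new services only need to be registered once, while every researcher keeps an individual selection.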
BioDesktop or BiOS could be an eyeOS-based bioinformatics and biomedical web desktop. Following the desktop metaphor, one could create a web desktop implementation containing applications and tools useful for researchers. Any user could then have a personal desktop online, customized according to their own needs and taste.

Integration plays a large role in the future of bioinformatics, but data visualization is also important. Web screenshots are useful to preview the page being searched for. This idea could be applied to bioinformatics search results, showing pathway or protein structure previews. By arranging results in grids or lists and using technologies like AJAX, Flash or Silverlight to create new interfaces, one could develop interesting and useful applications.
 





CONCLUSION 
Bioinformatics applications are evolving. Evolution is not a simple process and choosing the right path is not a trivial task. This evolution is usually sustained by large projects, such as the Human Genome Project a few years ago or the European GEN2PHEN project now.

As bioinformatics evolves, so do other software applications. The trend is to move software to the web and make it freely available to the entire world. This process may be complex but, in the end, the benefits outweigh the tradeoffs that have to be made.

For bioinformatics, keeping pace with state-of-the-art web technologies is a tremendous task. The life sciences are one of the areas where the amount of data is largest and where the differences between applications and services are most noticeable. This makes integrating heterogeneous information sources enormously complex.

Despite these difficulties, several groups are working on integration problems, following different approaches. Semantic web concepts for better machine-to-machine exchanges and "proprietary" integration frameworks using hard-coded concept mapping are solutions currently under development. However, there is no perfect solution for these problems: fully automatic and dynamic information integration has not yet been achieved and is still science fiction.

Hopefully, applying the perspectives presented here, together with concepts from success cases in other areas like entertainment or CRM, will enhance current bioinformatics web applications and empower developers with tools to design new ones.
 





REFERENCES 
1. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Sci Am 284 (2001) 34-43
2. Goble, C., Stevens, R.: State of the nation in data integration for bioinformatics. Journal of Biomedical Informatics 41 (2008) 687-693
3. Fielding, R.: Semantic Web Services Challenge: Architectural Styles and the Design of Network-based Software Architectures. Semantic Web Services Challenge: Challenge on Automating Web Services Mediation, Choreography and Discovery: 2006; Stanford University, USA (2000)
4. Belleau, F., Nolin, M.-A., Tourigny, N., Rigault, P., Morissette, J.: Bio2RDF: Towards a mashup to build bioinformatics knowledge systems. Journal of Biomedical Informatics 41 (2008) 706-716
5. Splendiani, A.: RDFScape: Semantic Web meets Systems Biology. BMC Bioinformatics 9 (2008) S6
6. Goble, C.A., De Roure, D.C.: myExperiment: social networking for workflow-using e-scientists. Proceedings of the 2nd workshop on Workflows in support of large-scale science. ACM, Monterey, California, USA (2007)
7. Cardoso, J., Sheth, A.: Semantic E-Workflow Composition. Journal of Intelligent Information Systems (2003)
8. Ludäscher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger, E., Jones, M., Lee, E.A., Tao, J., Zhao, Y.: Scientific Workflow Management and the Kepler System. Concurrency and Computation: Practice & Experience 18 (2006) 1039-1065
9. Oinn, T., Addis, M., Ferris, J., Marvin, D., Senger, M., Greenwood, M., Carver, T., Glover, K., Pocock, M.R., Wipat, A., Li, P.: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 20 (2004) 3045-3054
10. Wilkinson, M., Links, M.: BioMoby: An open source biological web services proposal. Brief Bioinform 3 (2002) 331-341
11. Cheung, K.-H., Yip, K.Y., Townsend, J.P., Scotch, M.: HCLS 2.0/3.0: Health care and life sciences data mashup using Web 2.0/3.0. Journal of Biomedical Informatics 41 (2008) 694-705





12. de Knikker, R., Guo, Y., Li, J.-l., Kwan, A., Yip, K., Cheung, D., Cheung, K.-H.: A web services choreography scenario for interoperating bioinformatics applications. BMC Bioinformatics 5 (2004) 25
13. Margaria, T., Kubczak, C., Steffen, B.: Bio-jETI: a service integration, design, and provisioning platform for orchestrated bioinformatics processes. BMC Bioinformatics 9 (2008) S12
14. DiBernardo, M., Pottinger, R., Wilkinson, M.: Semi-automatic web service composition for the life sciences using the BioMoby semantic web framework. Journal of Biomedical Informatics 41 (2008) 837-847
15. Pinheiro, M., Afreixo, V., Moura, G., Freitas, A., Santos, M.A.S., Oliveira, J.L.: Statistical, computational and visualization methodologies to unveil gene primary structure features. Vol. 45, n.º 2 (2006) 163-168
16. Arrais, J., Carreto, L., Santos, M.A.S., Oliveira, J.L.: Collaborative work on microarrays using MAGE-ML. MGED 9: The meeting of the Microarray Gene Expression Data Society
17. Arrais, J., Santos, B., Fernandes, J., Carreto, L., Santos, M.A.S., Oliveira, J.L.: GeneBrowser: an approach for integration and functional classification of genomic data. Vol. 4, n.º 3 (2007)
18. Lopes, P.: Service Integration for Knowledge Extraction. Electronics, Telecommunications and Informatics Department, Vol. Master of Science. University of Aveiro, Aveiro (2008)
19. Lopes, P., Arrais, J., Oliveira, J.L.: Dynamic Service Integration using Web-based Workflows. In: Society, A.C. (ed.): 10th International Conference on Information Integration and Web Applications & Services. Association for Computing Machinery, Linz, Austria (2008) 622-625
20. Oliveira, J.L., Dias, G.M.S., Oliveira, I.F.C., Rocha, P.D.N.S.d., Hermosilla, I., Vicente, J., Spiteri, I., Martin-Sánchez, F., Pereira, A.M.M.d.S.: DISEASECARD: A Web-based Tool for the Collaborative Integration of Genetic and Medical Information. 5th International Symposium, ISBMDA 2004: Biological and Medical Data Analysis (2004) 409-417
21. Nadeem, F., Yousaf, M.M., Ali, M.: Grid Performance Prediction: Requirements, Framework, and Models. Emerging Technologies, 2006. ICET '06. International Conference on (2006) 695-702
22. Chen, W., Lu, H., Shen, L., Wang, Z., Xiao, N., Chen, D.: A Novel Hardware Assisted Full Virtualization Technique. Young Computer Scientists, 2008. ICYCS 2008. The 9th International Conference for (2008) 1292-1297





23. Vouk, M.A.: Cloud computing - Issues, research and implementations. Information Technology Interfaces, 2008. ITI 2008. 30th International Conference on (2008) 31-40
 




From Genotype to Phenotype - Future Perspectives on Data and Services Integration  

GEN2PHEN internal report created for bioinformatics course. Provides a general overview of current Web2.0 applications and how can they be u...