


State of the problem  

A tool for building phylogenetic profiles of a gene against a set of archaea, bacteria and eukaryotes

Phylogenetic profiling creates a binary vector with one entry per genome, where 1 denotes the presence of the gene in that genome and 0 denotes its absence.
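As a small illustration of the binary vector, here is a minimal sketch. The genome labels and the `build_profile` helper are hypothetical names chosen for the example, not part of the tool.

```python
def build_profile(genomes, genomes_with_gene):
    """Return a presence/absence vector ordered like `genomes`."""
    present = set(genomes_with_gene)
    return [1 if g in present else 0 for g in genomes]


# Hypothetical genome labels, used only for illustration.
genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]
profile = build_profile(genomes, ["genome_B", "genome_D"])
print(profile)  # [0, 1, 0, 1]
```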

Proposed extensions

Rebuild phylogenetic profiles using more genomes than were previously available

Given a protein sequence, show its profile “on the fly” and predict its function

Implementation

I focused on the second extension: processing the sequence “on the fly”

Using the NCBI online BLASTP tool, I aligned the sequence against other genes from the system

I crawled the results page and kept only the good hits, those with query cover above 70%

The genes are identified by accession number or GI; from either ID I looked up the taxonomy number

The taxonomy number represents the genome

Profiling: 1 for the taxonomies where a matching gene was found, 0 for all the others
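The filtering and profiling steps above could be sketched roughly as follows. This assumes the hits have already been parsed and their taxonomy numbers looked up; the `Hit` record, the function names and the sample accessions are hypothetical, and only the 70% query cover threshold comes from the slide.

```python
from dataclasses import dataclass

MIN_QUERY_COVER = 70.0  # keep hits with query cover above 70%


@dataclass
class Hit:
    accession: str      # accession number (or GI) of the matched gene
    tax_id: int         # taxonomy number looked up for that accession
    query_cover: float  # query cover in percent


def good_hits(hits, min_cover=MIN_QUERY_COVER):
    """Keep only hits whose query cover exceeds the threshold."""
    return [h for h in hits if h.query_cover > min_cover]


def profile_from_hits(hits, all_tax_ids):
    """1 for taxonomies with at least one good hit, 0 for all the others."""
    found = {h.tax_id for h in good_hits(hits)}
    return {t: (1 if t in found else 0) for t in all_tax_ids}


# Made-up hits; the accessions are placeholders.
hits = [
    Hit("ACC_1", 9606, 95.0),
    Hit("ACC_2", 562, 40.0),   # dropped: query cover too low
    Hit("ACC_3", 2190, 71.5),
]
profile = profile_from_hits(hits, [562, 2190, 4932, 9606])
print(profile)  # {562: 0, 2190: 1, 4932: 0, 9606: 1}
```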

I saved all the taxonomies and their mapping to a domain {EUK, ARCH, BACT} in a database

The new profile will not be saved in the database
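A minimal sketch of such a taxonomy table, assuming SQLite and a two-column schema. The database engine, table name and column names are my assumptions for illustration; the slides do not specify them. The profile itself stays in memory and is never written to the table.

```python
import sqlite3


def create_taxonomy_db(rows):
    """Create an in-memory table mapping taxonomy number to domain."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxonomy ("
        "tax_id INTEGER PRIMARY KEY, "
        "domain TEXT NOT NULL CHECK (domain IN ('EUK', 'ARCH', 'BACT')))"
    )
    conn.executemany("INSERT INTO taxonomy (tax_id, domain) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def domain_of(conn, tax_id):
    """Return the domain for a taxonomy number, or None if it is unknown."""
    row = conn.execute(
        "SELECT domain FROM taxonomy WHERE tax_id = ?", (tax_id,)
    ).fetchone()
    return row[0] if row else None


# A few well-known taxonomy numbers, chosen only for illustration.
conn = create_taxonomy_db([(9606, "EUK"), (562, "BACT"), (2190, "ARCH")])
print(domain_of(conn, 2190))  # ARCH
```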

Demo with a Giardia gene as input


This Giardia gene actually matches genes from all three domains
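One way to check which domains a profile touches is to count the 1 entries per domain. The sketch below uses a made-up profile and mapping, not the actual demo output, and `domains_hit` is a hypothetical helper.

```python
from collections import Counter


def domains_hit(profile, domain_by_tax_id):
    """Count, per domain, the taxonomies whose profile entry is 1."""
    return Counter(
        domain_by_tax_id[t] for t, present in profile.items() if present == 1
    )


# Illustrative mapping and profile; these are not the demo's real results.
domain_by_tax_id = {9606: "EUK", 4932: "EUK", 562: "BACT", 2190: "ARCH"}
profile = {9606: 1, 4932: 0, 562: 1, 2190: 1}

counts = domains_hit(profile, domain_by_tax_id)
print(set(counts) == {"EUK", "ARCH", "BACT"})  # True
```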

To be continued 

Improve the speed by running local Blast instead of online

Functional prediction: the function might be assumed identical to that of the best-matching gene
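A rough sketch of this best-hit idea, assuming each hit carries a score and a functional annotation. Ranking by bit score is my assumption (the slide does not say how the best match is chosen), and the annotations below are invented placeholders.

```python
def predict_function(hits):
    """Return the annotation of the top-scoring hit, or None if there are no hits.

    `hits` is a list of (accession, bit_score, annotation) tuples.
    """
    if not hits:
        return None
    best = max(hits, key=lambda h: h[1])  # highest bit score wins
    return best[2]


# Placeholder hits with invented annotations.
hits = [
    ("ACC_1", 120.0, "annotation A"),
    ("ACC_2", 310.5, "annotation B"),
    ("ACC_3", 88.2, "annotation C"),
]
print(predict_function(hits))  # annotation B
```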

Re-build the previous phylogenetic profiles using more genomes & local BLAST
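For the local BLAST steps, a possible starting point is sketched below. It assumes BLAST+ is installed and that the local protein database was built with taxonomy information, so that the `staxids` column is filled in. The file names and helper functions are hypothetical. The code only builds the command and parses tabular output; actually running it would need something like `subprocess.run(cmd, check=True)`.

```python
import csv
import io

# Tabular output with: subject accession, subject taxonomy ids,
# query coverage per subject, e-value, bit score.
OUTFMT = "6 sacc staxids qcovs evalue bitscore"


def blastp_command(query_fasta, db_name, out_path):
    """Build the argument list for a local blastp run."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db_name,
        "-outfmt", OUTFMT,
        "-out", out_path,
    ]


def parse_tax_ids(tabular_text, min_cover=70.0):
    """Collect taxonomy numbers of hits with query cover above `min_cover`."""
    tax_ids = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        if float(row[2]) > min_cover:
            # staxids may hold several ids separated by ';', or "N/A".
            tax_ids.update(int(t) for t in row[1].split(";") if t.isdigit())
    return tax_ids


cmd = blastp_command("query.fasta", "local_proteins", "hits.tsv")

# Made-up output lines in the format requested above.
sample = (
    "ACC_1\t9606\t95\t1e-50\t200\n"
    "ACC_2\t562\t40\t1e-5\t50\n"
    "ACC_3\t2190;2287\t80\t1e-20\t120\n"
)
print(sorted(parse_tax_ids(sample)))  # [2190, 2287, 9606]
```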


Phylogenetic profiling  