Bioinformation Discovery

Data to Knowledge in Biology

Second Edition

Pandjassarame Kangueane Pondicherry, India

ISBN 978-3-319-95326-7 ISBN 978-3-319-95327-4 (eBook) https://doi.org/10.1007/978-3-319-95327-4

Library of Congress Control Number: 2018949608

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer Nature Switzerland AG

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Dedicated to the creator of life on earth and to humanity that ponders its universal existence.

Preface

The purpose of the book titled Bioinformation Discovery: Data to Knowledge in Biology is to illustrate the power of biological data in knowledge discovery. The book consists of 10 chapters spanning approximately 200 pages. It describes biological data types and representations with examples for creating a workflow in bioinformation discovery. The concepts in biological knowledge discovery from data are illustrated using line diagrams. This book helps graduate students entering research in biology to design experiments and formulate hypotheses.

The principles and concepts in biological knowledge discovery are used for the identification of prevalent rules toward the development of prediction models. Simulations of biological reactions using prediction models will further help in the design of their components. Advanced topics in molecular evolution in the context of cellular and molecular biology are addressed using bioinformation gleaned through knowledge discovery from data. The salient features of the book are (1) bioinformation discovery as a new domain in biology, (2) biological data representation, (3) biological dataset creation from databases, (4) biological knowledge extraction from data, (5) examples of knowledge discovery, and (6) exercises for practice. The exercise problems are designed to help students expand their problem-solving skills in bioinformation discovery.

Pondicherry, India

Pandjassarame Kangueane

Acknowledgments

I wish to express my sincere appreciation to all members of Biomedical Informatics (P) Ltd. for many discussions on the subject of this book. I also thank all my colleagues (Dr. S. Subbiah, Dr. Meena Sakharkar, Dr. Venkatarajan Mathura, Dr. P. Gautam, Dr. B. S. Lakshmi), associates (Dr. Tan Tin Wee, Dr. P. R. Kolatkar, Dr. E. C. Ren), collaborators (Dr. Paul Shapshak, Dr. Francesco Chiappelli, Dr. Kannan Gunasekaran), staff (Ms. R. Kayathri, Ms. N. Dandona, Ms. C. Iti, Mr. Lee Pern Chern), and students (Dr. Zhao Bing, Dr. Yu Yiting, Dr. Lei Li, Dr. Cui Zhanhua, Ms. Lim Yun Ping, Dr. A. Mohanapriya, Dr. Sajitha Lulu, Dr. M. Jayanthi, Dr. G. Sowmya, Dr. Abishek Suresh, Dr. V. Karthikraja, Ms. A. Vaishnavi, Ms. G. Shamini, Ms. S. Anita, Ms. Ilakya, Mr. G. Kalaivani) in my professional life, especially during 1993–2018, without whom this edition of the book would not have materialized. I would like to thank the authors of several bioinformatics tools, techniques, and databases made available in the public domain through open access and open source publishing models. I am also thankful to Ms. K. Uma and Ms. C. Nilofer for help with the development of this book.

Pandjassarame Kangueane

1.26 Discovery Environment

1.27 Sequence, Structure Alignment, and Evolutionary Inferences

1.27.1 Sequence Alignment

1.28.1 Protein Modeling

1.28.2 Methods of Protein Modeling

1.28.3 Popular Force Fields for Molecular Mechanics

1.28.4 Prediction of Protein Structure

1.28.5 Caveats on Homology Modeling

3.7 FASTA

3.8 INSIGHT II

3.9 GENSCAN

3.10 GROMOS

3.11 HBPLUS

3.12 LALIGN/PLALIGN

3.13 LIGPLOT

3.14 LOOK

3.15 MODELLER

3.16 NACCESS

3.17 PHYLIP

3.18 PROTPARAM

3.19 PROTORP

3.20 PSAP

3.21

5.8

6.1 Gene Fusion

6.2 Operons in Prokaryotes as Human Fusion Proteins

6.3 Multiple Functions in Fusion Proteins

6.4 Alternative Splicing in Fusion Genes

6.5 Protein Subunit Interaction and Fusion Proteins

6.6 Mechanism of Gene Fusion

6.7 Hypothesis of Gene Fusion

6.8 Structural Importance of Fusion Proteins

6.8.1 Fusion Protein IGPS Function

6.8.2 Fusion Protein IGPS Structure

6.8.3 IGPS Sequence, Structure, and Properties

6.8.4 Interface Area in IGPS

6.8.5

7.10.8 Note on T-EPITOPE

7.11 HLA Supertypes

7.11.1 Grouping of HLA Alleles by Several Research Groups

7.11.2 Perplexing Issues with HLA Supertypes

7.11.3 Structural Basis for HLA Supertypes

7.11.4 Predictive Grouping of HLA Supertypes

7.11.5 Grouping Using Electrostatic Distribution Maps

7.11.6 Remarks on HLA Supertypes

7.12 Exercises

Exercises

List of Figures

Fig. 1.1 Relevance of bioinformatics in agriculture, healthcare, and biotechnology

Fig. 1.2 Evolution of bioinformatics and bioinformation

Fig. 1.3 Drug discovery pipeline. IND investigational new drug, NDA new drug application

Fig. 1.4 Skills for bioinformatics

Fig. 1.5 Types of data distribution are shown

Fig. 1.6 Description of energy function and energy minimization using first-order differentiation

Fig. 1.7 Illustration of regression analysis and determination of Pearson correlation coefficient (r) as shown

Fig. 1.8 Data warehousing in a discovery environment

Fig. 1.9 Bioinformatics components

Fig. 1.10 Biological knowledge discovery flowchart

Fig. 1.11 Bioinformatics variables

Fig. 1.12 Bioinformatics principle

Fig. 1.13 Bioinformatics challenges

Fig. 1.14 Biological database and their associations

Fig. 1.15 Data exchange between NCBI (USA), EBI (Europe), and CIB (Japan). Please refer to Table 1.4 for description on NCBI, EBI, and CIB

Fig. 1.16 Data explosion in biological domain

Fig. 1.17 Genetic data growth in GenBank

Fig. 1.18 Divisions in GenBank. BCT bacteria, FUN functional, HUM human, INV invertebrate, MAM mammals, ORG organelle, PHG phage, PLN plant, PRI primate, PRO prokaryote, ROD rodent, SYN synthetic, VRL viral, VRT vertebrate, PAT patent, EST expressed sequence tags, STS sequence-tagged sites, GSS genome survey sequences, HTG high-throughput genomic, HTC high-throughput cDNA, CON contigs

Fig. 1.19 Structural and classifications

Fig. 1.20 Protein structure and its components

Fig. 1.21 SCOP classification and folds

Fig. 1.22 CATH and classification

Fig. 1.23 An example pathway (glucose metabolism) is shown. This pathway consists of two sections (glycolysis and citric acid cycle). Glucose, glucose-6-phosphate, pyruvate, lactate, acetyl-co-A, citric acid, α-ketoglutarate, and oxaloacetate are small molecule metabolites. In this example pyruvate dehydrogenase is the catalyzing protein enzyme

Fig. 1.24 Major bioinformatics development based on category

Fig. 1.25 Tools and concepts in bioinformatics

Fig. 1.26 Issues in a biological discovery environment

Fig. 1.27 Types of molecular interactions

Fig. 1.28 Sequence and structure alignment relation

Fig. 1.29 Illustration of sequence alignment by global and local alignment is shown

Fig. 1.30 Protein modeling principles. Force field equation (top), force field terms (middle), unfolded to folded (bottom)

Fig. 1.31 Protein structure prediction is illustrated. The steps involved in the prediction of a protein structure are shown

Fig. 1.32 Schematic diagram illustrating the docking of a small molecule ligand to a protein target to produce a target-ligand complex

Fig. 1.33 Structure of the target Candida rugosa lipase (CRL) is shown

Fig. 1.34 Molecular docking of Candida rugosa lipase (CRL) with the isomers of ibuprofen is shown. H-bonds formed between S(+) ibuprofen and target stabilizes binding. This is not true for R(−) ibuprofen with the docked target

Fig. 1.35 Phylogenetic analysis. (a) Properties of phylogenetic trees. Leaves (vertices) represent species or sequences compared. Nodes (vertices) represent bifurcations, speciation events, and hypothetical ancestor sequences. Branches (edges) represent sequence diversity. Branch lengths represent sequence variation over time and rate of change. The root (vertice) represents the hypothetical ancestor. (b) Relationships of apes and humans are shown using a sample phylogenetic tree

Fig. 1.36 Different types of phylogenetic analysis methods are illustrated

Fig. 2.1 Creating biological datasets for knowledge discovery. PDB Protein Data Bank, DDBJ DNA Data Bank of Japan, EMBL European Molecular Biology Laboratory, RCSB Research Collaboratory for Structural Bioinformatics

Fig. 2.2 MHC-peptide binding at the binding groove is shown

Fig. 2.3 An example heterodimer structure complex of succinyl co-A synthetase (α) and succinyl co-A synthetase (β) is shown

Fig. 2.4 Sequence alignment (using EMBOSS needle) between succinyl co-A synthetase (α) and succinyl co-A synthetase (β) is shown with percentage similarity and identity

Fig. 2.5 An example homodimer structure complex of aspartate aminotransferase A and B subunits is shown

Fig. 2.6 Sequence alignment (using EMBOSS needle) between aspartate aminotransferase A and B subunits is shown with percentage similarity and identity

Fig. 2.7 Homodimer folding and binding mechanism is shown. 2S two state, 3SDI three state with dimer intermediate, 3SMI three state with monomer intermediate

Fig. 2.8 The gene structure of SEG and MEG is illustrated

Fig. 2.9 GenBank FEATURES and CDS annotation (bottom horizontal arrow) for a genomic DNA (top horizontal arrow)

Fig. 2.10 CDS annotation for direct, complement, and partial intronless genes

Fig. 2.11 Different CDS representations for intron-containing multiple exon genes in eukaryotes are illustrated

Fig. 2.12 Fusion protein scenario for imidazole glycerol phosphate synthetase (IGPS) in yeast and bacteria

Fig. 2.13 A structural model of a cholera toxin (CT) is shown. CT is a hetero-hexameric complex (AB5) consisting of CTA (cleaved into 194 residues A1 and 46 residues A2) and CTB (103 residues) pentamer with D, E, F, G, and H chains

Fig. 2.14 Creation of a dataset for CTA and CTB sequences from GenBank. A sequence dataset of CTA and CTB was derived from GenBank (release 177) using KEYWORD search as illustrated in the flowchart. The KEYWORD search “cholera toxin” resulted in 1257 hits. This set consists of 27 CTA sequences, 165 CTB sequences according to GenBank description and available annotations. The remaining 1065 sequences with descriptions such as secretion protein, cholera toxin transcriptional activator, ADP-ribosylation factor, GNAS complex, dopamine receptor, Pertussis toxin, Shiga-like toxin, and the like are eliminated from the dataset. Thus, a CT sequence dataset of 192 sequences consisting of 27 CTA and 165 CTB was created. The CTA and CTB sequences are included in the dataset as available in the GenBank. The biased availability on the amount of CTA and CTB sequences in GenBank is attributed to the likely observation of frequent mutations in CTB

Fig. 2.15 Superposition of electron microscopy (EM) structures (PDB ID, 5FUU (4.19 Å resolution) and 5U1F (6.8 Å resolution)) of HIV-1/GP160 (GP120/GP40) trimer spike protein complex. This is a trimer of three GP160 structures. Each GP160 is made of cleaved GP120 and GP40. Thus, (GP120/GP40)3 forms the viral spike protein complex. GP glycoprotein

Fig. 2.16 Biological knowledge pipeline from data is illustrated

Fig. 2.17 A graphical abstract of different datasets described in this chapter

Fig. 3.1 An example for global and local alignment is illustrated using ALIGN

Fig. 3.2 An example for BIMAS HLA-peptide-binding prediction is shown

Fig. 3.3 An example for BLAST analysis is shown

Fig. 3.4 An example for multiple sequence alignment is shown

Fig. 3.5 DeepView download page

Fig. 3.6 FASTA download page

Fig. 3.7 An example for GENSCAN output is shown

Fig. 3.8 A schematic representation of a hydrogen bond is illustrated

Fig. 3.9 Download page for HBPLUS

Fig. 3.10 An example of LALIGN/PLALIGN input/output

Fig. 3.11 An example for inhibitor-enzyme interaction is shown using LIGPLOT

Fig. 3.12 Binding of HLA A*0201 with mHag peptides HA-1H and HA-1R is modeled and shown using the LOOK interface

Fig. 3.13 The download page for MODELLER

Fig. 3.14 The download page for NACCESS

Fig. 3.15 The download page for PHYLIP is shown

Fig. 3.16 An example of PROTPARAM input/output is shown

Fig. 3.17 The web interface for PSAP is shown

Fig. 3.18 An example input/output for InterPro is shown

Fig. 3.19 Download page for PYMOL is shown

Fig. 3.20 Download page for RASMOL is shown

Fig. 3.21 The web interface for ROSETTA is shown

Fig. 3.22 The download page for SURFNET is shown

Fig. 3.23 An example of input/output for T-EPITOPE designer is shown

Fig. 4.1 Interface shape complementarity between interacting subunits

Fig. 4.2 The correspondence between interface residues in one dimension and three dimensions is illustrated

Fig. 4.3 A hydrophobic residue interface is illustrated using 1M4U (PDB ID) showing P35-I86 as well as I33-P74 interaction

Fig. 4.4 An illustration of interface residues at the protein-protein interface using CPK representation is illustrated. This shows that a stable interface is critical for protein-protein binding. This figure is adapted from Nilofer et al. (2017) under the open access creative commons attribution license

Fig. 4.5 Relationship between interface size and interface area is shown. It is clear that interface area increases with interface size. This is adapted from Sowmya et al. (2011) under the open access creative commons attribution license

Fig. 4.6 An illustration of large, medium, and small interfaces is shown with corresponding homodimer complexes shown. 2S 2-state, 3SMI 3-state monomer intermediate, 3SDI 3-state dimer intermediate. This figure is adapted with permission from Kangueane and Nilofer (2018)

Fig. 4.7 Fractional distribution of interface residues is shown. Hydrophobic residues are dominant in homodimer interfaces. This figure is adapted from Zhanhua et al. (2005) under the open access creative commons attribution license

Fig. 4.8 The distribution of amino acid residues as a ratio of interface to surface and interior is shown for heterodimer and homodimer protein complexes. The ratio of charged residues at the interface to interior is high for heterodimer protein complexes. This trend is true for hydrophobic residues in homodimer protein complexes. This figure is adapted from Zhanhua et al. (2005) under the open access creative commons attribution license

Fig. 4.9 The contribution by hydrogen bond energy at the interface of obligatory, nonobligatory, and immune complexes is shown. It is noted that hydrogen bond energy contributes to about 15 ± 6.5% at the interface of protein-protein complexes. This figure is adapted with permission from Kangueane and Nilofer (2018)

Fig. 4.10 Relationship between interface size and hydrogen bond energy at the interface of obligatory, nonobligatory, and immune complexes is shown. This image is adapted from Nilofer et al. (2017) under the open access creative commons attribution license

Fig. 4.11 The contribution by electrostatic energy at the interface of obligatory, nonobligatory, and immune complexes is shown. It is noted that electrostatic energy contributes to about 11.3 ± 8.7% at the interface of protein-protein complexes. This figure is adapted with permission from Kangueane and Nilofer (2018)

Fig. 4.12 Relationship between interface size and electrostatic energy at the interface of obligatory, nonobligatory, and immune complexes is shown. This image is adapted from Nilofer et al. (2017) under the open access creative commons attribution license

Fig. 4.13 Distribution of sidechain-sidechain interaction (S1S1I) at the interface is shown as a function of distance x (Å). Two atoms are considered to be interacting if the distance between them is within the sum of their vdW radii plus x distance. This image is adapted from Li et al. (2006) under the open access creative commons attribution license

Fig. 4.14 An interface hot spot is shown. The interaction of residue K15 (PDB ID: BPTI, Chain I) to residues S190, S195, and V213 (Trypsin, Chain E) is shown (PDB ID: 2PTC). K15 has three interacting sidechain atoms (CB, CD, and NZ). It should be noted that these three atoms are involved in favorable contacts and only CB participates in unfavorable contacts. This image is adapted from Li et al. (2006) under the open access creative commons attribution license

Fig. 5.1 Homodimer folding and binding mechanism are illustrated

Fig. 5.2 Distribution of 2S, 3SMI, and 3SDI proteins in relation to size and interface area is demonstrated

Fig. 5.3 An example of a 2S protein is illustrated with binding mode

Fig. 5.4 An example of a 3SMI protein is illustrated with binding mode

Fig. 5.5 An example of a 3SDI protein is illustrated with binding mode

Fig. 5.6 The distribution of interface to total residues is shown for 2S, 3SMI, 3SDI proteins in Table 2.9. Ψ = 3SDI

Fig. 5.7 An illustration of large, medium, and small interfaces is shown among homodimers. This is adapted from Karthikraja et al. (2009) under the open access creative commons attribution license

Fig. 6.1 A fusion protein is illustrated

Fig. 6.2 A fusion protein mimicking operon-like structure is shown.

Fig. 6.3 A fusion protein with multiple functions is illustrated

Fig. 6.4 A fusion protein mimicking protein subunit interaction is illustrated

Fig. 6.5 IGPS structure in bacteria (not fused) and yeast (fused)

Fig. 6.6 A fusion scenario for IGPS between bacteria and yeast is shown

Fig. 6.7 IGPS in bacteria and yeast before and after molecular dynamics simulation

Fig. 6.8 Interface area in IGPS from bacteria and yeast

Fig. 6.9 Gap volume for IGPS from bacteria and yeast

Fig. 6.10 Gap index for IGPS from bacteria and yeast

Fig. 6.11 Rg for IGPS from bacteria and yeast

Fig. 7.1 MHC gene loci

Fig. 7.2 HLA sequence growth at IMGT/HLA database

Fig. 7.3 MHC and its function in T-cell immunity

Fig. 7.4 Structure of class I MHC molecule. The structure consists of a peptide binding alpha subunit and supporting beta-2m. The alpha subunit consists of domain 1, 2, and 3. The peptide binding domain is 1 and 2

Fig. 7.5 Peptide binding domains of class I MHC molecules with bound peptide. The polymorphic residues are often centered at the peptide binding groove comprising of alpha 1 and alpha 2 domains

Fig. 7.6 Structural and sequence alignment of the α chain with highly polymorphic residues clustered in the peptide binding groove is shown

Fig. 7.7 Class II MHC molecule HLA-DC1 with the bound peptide is shown. The groove is formed by α and β chains

Fig. 7.8 Sequence anchors in HLA-A*0201 binding peptide motif are shown

Fig. 7.9 Structural alignment of human class II MHC specific peptides with known sequence and structural anchors. BOLD letters represent binding structural anchors

Fig. 7.10 Structural alignment of (a) class I and (b) class II HLA alleles with bound peptides. The binding groove of class I is formed by alpha 1 and alpha 2 domains. The binding groove of class II is formed by α chain and β chain. The peptides bound to class II molecules have extended conformation from the groove unlike class I molecules

Fig. 7.11 An illustration of the steps involved in the development of a prediction model for T-EPITOPE Designer. The model is based on information gleaned from MHC-peptide structures. HERP highly essential residue positions

Fig. 7.12 An illustration of the user interface for T-EPITOPE Designer. The web interface contains (a) Overview, (b) Service, (c) Model, (d) Designer, (e) Links, and (f) Team

Fig. 7.13 Definition of HLA supertypes

Fig. 7.14 Multiple sequence alignment of HLA alleles

Fig. 7.15 Critical residues in class I HLA structures for peptide binding. (a) Distribution in the dataset (see Table 2.4 in Chap. 2 for dataset), (b) mean, and (c) standard deviation about the mean are given

Fig. 7.16 Pockets (A–F) for peptide binding in class I HLA molecules are shown. Illustrated pockets are based on the pocket definition of Bjorkman et al. (1987)

Fig. 8.1 The cholera toxin (CT) hetero-hexameric complex (AB5) consisting of CTA (cleaved into A1 and A2) and CTB pentamer with D, E, F, G, and H subunits is shown. This image is adapted from Shamini et al. (2011) under the open access creative commons attribution license

Fig. 8.2 Protein-protein interfaces in CTB are shown. The interaction between subunits D and E and D and H is illustrated. Subunit D interacts with subunits E and H on either side having two different interfaces. This image is adapted from Shamini et al. (2011) under the open access creative commons attribution license

Fig. 8.3 Interface residue positions in CTA and CTB are shown using delta ASA as a measure of interface area. Residue positions with mutations in CTA and CTB are mapped to interfaces. This image is adapted from Shamini et al. (2011) under the open access creative commons attribution license

Fig. 8.4 Common mutations in CTA and CTB are illustrated using CPK residue models. These mutations are identified compared to the wild CT type sequence in a dataset of sequences summarized in Table 2.12. This image is adapted from Shamini et al. (2011) under the open access creative commons attribution license

Fig. 8.5 Known mutations mapped to the interface of CTA/CTB and within CTB are shown using CPK residue models in three dimensions. Please see Fig. 8.3 for comparison. This image is adapted from Shamini et al. (2011) under the open access creative commons attribution license

Fig. 9.1 Schematic illustration of HIV-1 GP160 (GP120/GP40) trimer spike protein is shown with bound membrane. This complex is made of three GP120/GP40 assemblies. Protein-protein interfaces between GP120-GP120, GP40-GP40, and GP120-GP40 are realized. This image is adapted with permission from Nilofer et al. (2017)

Fig. 9.2 Superimposed electron microscopy (EM) structure of HIV-1 GP160 (GP120/GP40) trimer spike protein is shown. Each GP160 unit is made of GP120 at the top and GP40 at the bottom. This image is adapted with permission from Kangueane and Nilofer (2018)

Fig. 9.3 Structure of GP120 and GP40 in monomer and trimer form is shown. V, variable region; C, constant region. This image is adapted from Sowmya et al. (2011) under the open access creative commons attribution license

Fig. 9.4 Protein-protein interface between GP120/GP120 and GP40/GP40 is shown. This image is adapted with permission from Nilofer et al. (2017)

Fig. 9.5 Number of known HIV-1 GP160 ENV sequences in LANL database over a period of two decades. This image is adapted with permission from Nilofer et al. (2017)

Fig. 9.6 Glycosylated structure of HIV-1 GP160 trimer ENV protein is shown. Sugar moieties expanded as NAG (N-acetyl-d-glucosamine), BMA (beta-d-mannose), MAN (alpha-d-mannose), GAL (beta-d-galactose), and FUC (alpha-l-fucose). This image is adapted with permission from Nilofer et al. (2017)

Fig. 10.1 The gene structure for SEG and MEG is illustrated

Fig. 10.2 Mechanism of retro-transposition is illustrated with the formation of pseudogene, active/inactive retro-genes

Fig. 10.3 Definition of intron phases (a) illustrated with an example (b)

Fig. 10.4 Alternative splicing by exon skipping illustrated

Fig. 10.5 Distribution of intron (thick black bars) positions along homologous protein sequence (scaled to length) for gamma tubulins in different species

Fig. 10.6 Trends in modern molecular biology

List of Tables

Table

Table 1.2 List of useful UNIX commands

Table 1.3 Standard codon usage table arranged based on frequency of codon for specific amino acid residues

Table 1.4 Major institutions worldwide for storing genetic and biological data

Table 1.5 Major databases for genetic and biological data

Table 1.6 List of popular force fields for molecular mechanics

Table 2.1 HLA class I-specific peptides with known IC50 binding values collected from literature

Table 2.2 Grouping of peptides based on IC50 binding values given in Table 2.1

Table 2.3

Table 2.6

Table 2.7 Heterodimer

Table 2.8

Table 2.9 Dataset of homodimers divided into three groups based on unfolding pathways

Table 2.10 CDS (coding sequence) patterns used for SEG annotation

Table 2.11 List of examples exhibiting fusion phenomenon

Table 2.12 CTA and CTB sequences from various serogroups

Table 2.13 GP120a structural dataset from PDB

Table 2.14 GP40 structural dataset from PDB

Table 6.1 Residue conservation at the interface of IGPS in TT and SC

Table 6.2 Structural properties of IGPS in TT and SC are given for initial and final structures

Table 7.1 Definition of HLA supertypes

Table 10.1 Eukaryotic genomes and their constituents

About the Author

Pandjassarame Kangueane (born on November 9, 1974) attended Petit Seminaire Higher Secondary School, Pondicherry (a French colony in pre-independence India), India. He received the merit award for 1991 Matriculation Top Ranker (science subject) from the Education Department, Government of Pondicherry. He graduated with a B. Tech degree in industrial biotechnology with first class (distinction) from the Centre for Biotechnology (CBT), Anna University, in 1997. During 1993–1997, he worked on lipase production, assay, and its application in ibuprofen biotransformation. He was awarded a PhD in 2001 by the National University of Singapore (NUS) for his work on short peptide vaccine design and GvHD related to bone marrow transplantation (BMT). He served as scientist (2001) at S*BIO Pte Ltd., Singapore; visiting scientist for Technology Transfer (2001) at Chiron Corporation, Emeryville, Bay Area, California, USA; assistant professor of bioinformatics (2002–2006) at Nanyang Technological University, Singapore; visiting professor (2007–2009) at VIT University, Vellore, India; and professor of biotechnology (2009–2011) at AIMST University, Malaysia. He currently serves as director at Biomedical Informatics (P) Ltd. and chief editor of Bioinformation, an open access journal for beyond bioinformatics. He serves as an associate editor of BMC Bioinformatics (a UK-based BioMed Central publication since 2005). He has advised several students toward PhD degrees by research in the field and has authored numerous research articles and book chapters. He is also an author of several books published by Springer, USA (2008, 2009, 2018); NOVA, USA (2011); and LAP, Germany (2011). He has been an ambassador for peace of the Universal Peace Federation since 2008. He was conferred the Vishal Bharathi award with the title Bharath Jothi by GOPIO on April 8, 2012. He was also awarded the Indian Leadership award for Industrial Development by All India Achievers Foundation (AIAF) on August 9, 2012.
He has enthusiastically farmed several crops, including sugarcane, peanut, black gram, green gram, vegetables, coconut, lemon, rice, and chrysanthemum (chamomile), since 1990. He is a scientist, an author of scholarly materials, teacher of higher education, professor, educationalist, editor, journalist, entrepreneur, social reformer, and a farmer.

Abbreviations

2S 2 state homodimer

3S 3 state homodimer

3SDI 3S with dimer intermediate

3SMI 3S with monomer intermediate

ANN Artificial neural networks

ASA Accessible surface area

ATP Adenosine triphosphate

BCT Bacteria

BLAST Basic Local Alignment Search Tool

CDS Coding sequence

cDNA Complementary DNA

CGI Common gateway interface

CIB Center for Information Biology

CPFRP Critical polymorphic functional residue positions

CON Contigs

CATH Class-architecture-topology-homologous superfamily

DDBJ DNA Data Bank of Japan

DNA Deoxyribonucleic acids

EBI European Bioinformatics Institute

EMBL European Molecular Biology Laboratory

EST Expressed sequence tags

ERP Essential residue positions

EXINT Exon-intron

FTP File transfer protocol

FUN Functional

γ-GK γ-glutamate-5-kinase

GPCR G-protein-coupled receptor

GSA Glutamic γ-semialdehyde

GSS Genome survey sequences

HLA Human leukocyte antigen

HTC High-throughput cDNA

HTG High-throughput genomic

HTML Hypertext markup language

HUM Human

IAR Interface amino acid residue

IBS Independent binding of side chains

IC50 Inhibitory concentration 50

IGPS Imidazole glycerol phosphate synthetase

IHF Integration host factor

IMGT International ImMunoGeneTics

IND Investigational new drug

INV Invertebrate

IPR Intellectual property rights

KEGG Kyoto Encyclopedia of Genes and Genomes

MAM Mammals

MEG Multiple exon gene

MHC Major histocompatibility complex

NAD(P) Nicotinamide adenine dinucleotide phosphate

NCBI National Center for Biotechnology Information

NDA New drug application

NMR Nuclear magnetic resonance

ORG Organelle

PAT Patent

PDB Protein data bank

PERL Practical Extraction and Report Language

PLN Plant

PHG Phage

PRO Prokaryote

PRI Primate

PSMA Prostate-specific membrane antigen

RCSB Research Collaboratory for Structural Bioinformatics

RDBMS Relational database management system

Rg Radius of gyration

RNA Ribonucleic acids

ROD Rodent

RXR Retinoid X receptor

SC S. cerevisiae

SCOP Structural classification of proteins

SEG Single exon gene

SEGE Single exon gene in eukaryotes

SNP Single nucleotide polymorphism

SLL Squared loop length

STS Sequence-tagged sites

SYN Synthetic

TCR T-cell receptor

TT T. thermophilus

U-Genome Unicellular eukaryotic genome

URL Uniform Resource Locator

VRL Viral

VRT Vertebrate

WWW World Wide Web

Chapter 1 Bioinformatics for Bioinformation

Abstract This chapter introduces concepts in bioinformatics and describes its application in agriculture, healthcare, and biotechnology. The principles and components of bioinformatics are discussed in detail. A sound knowledge of the basic concepts in bioinformatics is the foundation for bioinformation discovery from data. The significance of bioinformation discovery in defining targets for developing drugs is discussed. The issues in a discovery environment within pharmaceutical or biotechnological research and development are illustrated using block diagrams. The importance of biological data (sequence, structure, and function) in discovery is highlighted. The chapter also contains sufficient exercise problems for practice.

Keywords Bioinformatics · Bioinformation · Concepts · Techniques · Tools · Databases · Drug discovery · Pipeline · Sequence · Structure · Model · Interactions · Design

1.1 Bioinformatics

The bioinformatics discipline evolved in the late twentieth century using concepts and techniques from multiple other disciplines with a vision to study issues in biology by comparing observable phenotypes with molecular genetic data. This has immense application and relevance in healthcare, agriculture, and biotechnology (Fig. 1.1).

The key here is data related to the molecular genetics of living cells and organisms generated using advanced techniques and tools from engineering. Thus, the discipline borrows concepts and techniques either directly or indirectly from other established disciplines such as mathematics, physics, chemistry, zoology, botany, genetics, biochemistry, molecular biology, chemical engineering, biochemical reaction engineering, biotechnology, computer engineering, and information science. The idea is to store, retrieve, curate, and use molecular genetics data in databases for the simulation of molecular phenomena in cells and organisms by applying mathematical models. It should be noted that data is generated by the analysis of samples from living cells, tissues, organs, and organisms using techniques, methods,
