Bioinformation Discovery: Data to Knowledge in Biology
Pandjassarame Kangueane
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper
This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Dedicated to the creator of life on earth and to humanity that ponders its universal existence.
Preface
The purpose of the book titled Bioinformation Discovery: Data to Knowledge in Biology is to illustrate the power of biological data in knowledge discovery. The book consists of 10 chapters spanning approximately 200 pages. It describes biological data types and representations, with examples, for creating a workflow in bioinformation discovery. The concepts in biological knowledge discovery from data are illustrated using line diagrams. This book is intended to help graduate students entering biological research to design experiments and to formulate hypotheses.
The principles and concepts in biological knowledge discovery are used to identify prevalent rules toward the development of prediction models. Simulations of biological reactions using prediction models will further help in the design of their components. Advanced topics in molecular evolution in the context of cellular and molecular biology are addressed using bioinformation gleaned through knowledge discovery from data. The salient features of the book are (1) bioinformation discovery as a new domain in biology, (2) biological data representation, (3) biological dataset creation from databases, (4) biological knowledge extraction from data, (5) examples of knowledge discovery, and (6) exercises for practice. The exercise problems are designed to help students expand their problem-solving skills in bioinformation discovery.
Pondicherry, India
Pandjassarame Kangueane
Acknowledgments
I wish to express my sincere appreciation to all members of Biomedical Informatics (P) Ltd. for many discussions on the subject of this book. I also thank all my colleagues (Dr. S. Subbiah, Dr. Meena Sakharkar, Dr. Venkatarajan Mathura, Dr. P. Gautam, Dr. B. S. Lakshmi), associates (Dr. Tan Tin Wee, Dr. P. R. Kolatkar, Dr. E. C. Ren), collaborators (Dr. Paul Shapshak, Dr. Francesco Chiappelli, Dr. Kannan Gunasekaran), staff (Ms. R. Kayathri, Ms. N. Dandona, Ms. C. Iti, Mr. Lee Pern Chern), and students (Dr. Zhao Bing, Dr. Yu Yiting, Dr. Lei Li, Dr. Cui Zhanhua, Ms. Lim Yun Ping, Dr. A. Mohanapriya, Dr. Sajitha Lulu, Dr. M. Jayanthi, Dr. G. Sowmya, Dr. Abishek Suresh, Dr. V. Karthikraja, Ms. A. Vaishnavi, Ms. G. Shamini, Ms. S. Anita, Ms. Ilakya, Mr. G. Kalaivani) in my professional life, especially during 1993–2018, without whom this edition of the book would not have materialized. I would like to thank the authors of several bioinformatics tools, techniques, and databases made available in the public domain through open access and open source publishing models. I am also thankful to Ms. K. Uma and Ms. C. Nilofer for help with the development of this book.
Pandjassarame Kangueane
1.26 Discovery Environment
1.27 Sequence, Structure Alignment, and Evolutionary Inferences
1.27.1 Sequence Alignment
1.28.1 Protein Modeling
1.28.2 Methods of Protein Modeling
1.28.3 Popular Force Fields for Molecular Mechanics
1.28.4 Prediction of Protein Structure
1.28.5 Caveats on Homology Modeling
3.7 FASTA
3.8 INSIGHT II
3.9 GENSCAN
3.10 GROMOS
3.11 HBPLUS
3.12 LALIGN/PLALIGN
3.13 LIGPLOT
3.14 LOOK
3.15 MODELLER
3.16 NACCESS
3.17 PHYLIP
3.18 PROTPARAM
3.19 PROTORP
3.20 PSAP
6.1 Gene Fusion
6.2 Operons in Prokaryotes as Human Fusion Proteins
6.3 Multiple Functions in Fusion Proteins
6.4 Alternative Splicing in Fusion Genes
6.5 Protein Subunit Interaction and Fusion Proteins
6.6 Mechanism of Gene Fusion
6.7 Hypothesis of Gene Fusion
6.8 Structural Importance of Fusion Proteins
6.8.1 Fusion Protein IGPS Function
6.8.2 Fusion Protein IGPS Structure
6.8.3 IGPS Sequence, Structure, and Properties
6.8.4 Interface Area in IGPS
7.10.8 Note on T-EPITOPE
7.11 HLA Supertypes
7.11.1 Grouping of HLA Alleles by Several Research Groups
7.11.2 Perplexing Issues with HLA Supertypes
7.11.3 Structural Basis for HLA Supertypes
7.11.4 Predictive Grouping of HLA Supertypes
7.11.5 Grouping Using Electrostatic Distribution Maps
7.11.6 Remarks on HLA Supertypes
7.12 Exercises
Exercises
List of Figures
Fig. 1.1 Relevance of bioinformatics in agriculture, healthcare, and biotechnology
Fig. 1.2 Evolution of bioinformatics and bioinformation
Fig. 1.3 Drug discovery pipeline. IND investigational new drug, NDA new drug application
Fig. 1.4 Skills for bioinformatics
Fig. 1.5 Types of data distribution are shown
Fig. 1.6 Description of energy function and energy minimization using first-order differentiation
Fig. 1.7 Illustration of regression analysis and determination of the Pearson correlation coefficient (r)
Fig. 1.8 Data warehousing in a discovery environment
Fig. 1.14 Biological databases and their associations
Fig. 1.15 Data exchange between NCBI (USA), EBI (Europe), and CIB (Japan). Please refer to Table 1.4 for a description of NCBI, EBI, and CIB
Fig. 1.16 Data explosion in the biological domain
Fig. 1.17 Genetic data growth in GenBank
Fig. 1.18 Divisions in GenBank. BCT bacteria, FUN fungi, HUM human, INV invertebrate, MAM mammals, ORG organelle, PHG phage, PLN plant, PRI primate, PRO prokaryote, ROD rodent, SYN synthetic, VRL viral, VRT vertebrate, PAT patent, EST expressed sequence tags, STS sequence-tagged sites, GSS genome survey sequences, HTG high-throughput genomic, HTC high-throughput cDNA, CON contigs
Fig. 1.19 Structural and classifications
Fig. 1.20 Protein structure and its components
Fig. 1.21 SCOP classification and folds
Fig. 1.22 CATH and classification
Fig. 1.23 An example pathway (glucose metabolism) is shown. This pathway consists of two sections (glycolysis and citric acid cycle). Glucose, glucose-6-phosphate, pyruvate, lactate, acetyl-CoA, citric acid, α-ketoglutarate, and oxaloacetate are small molecule metabolites. In this example, pyruvate dehydrogenase is the catalyzing enzyme
Fig. 1.24 Major bioinformatics developments by category
Fig. 1.25 Tools and concepts in bioinformatics
Fig. 1.26 Issues in a biological discovery environment
Fig. 1.27 Types of molecular interactions
Fig. 1.28 Sequence and structure alignment relation
Fig. 1.29 Sequence alignment by global and local alignment methods is illustrated
Fig. 1.30 Protein modeling principles. Force field equation (top), force field terms (middle), unfolded to folded (bottom)
Fig. 1.31 Protein structure prediction is illustrated. The steps involved in the prediction of a protein structure are shown
Fig. 1.32 Schematic diagram illustrating the docking of a small molecule ligand to a protein target to produce a target-ligand complex
Fig. 1.33 Structure of the target Candida rugosa lipase (CRL) is shown
Fig. 1.34 Molecular docking of Candida rugosa lipase (CRL) with the isomers of ibuprofen is shown. H-bonds formed between S(+) ibuprofen and the target stabilize binding. This is not observed for R(−) ibuprofen docked to the target
Fig. 1.35 Phylogenetic analysis. (a) Properties of phylogenetic trees. Leaves (vertices) represent species or sequences compared. Nodes (vertices) represent bifurcations, speciation events, and hypothetical ancestor sequences. Branches (edges) represent sequence diversity. Branch lengths represent sequence variation over time and rate of change. The root (vertex) represents the hypothetical ancestor. (b) Relationships of apes and humans are shown using a sample phylogenetic tree
Fig. 1.36 Different types of phylogenetic analysis methods are illustrated
Fig. 2.1 Creating biological datasets for knowledge discovery. PDB Protein Data Bank, DDBJ DNA Data Bank of Japan, EMBL European Molecular Biology Laboratory, RCSB Research Collaboratory for Structural Bioinformatics
Fig. 2.2 MHC-peptide binding at the binding groove is shown
Fig. 2.3 An example heterodimer structure complex of succinyl-CoA synthetase (α) and succinyl-CoA synthetase (β) is shown
Fig. 2.4 Sequence alignment (using EMBOSS needle) between succinyl-CoA synthetase (α) and succinyl-CoA synthetase (β) is shown with percentage similarity and identity
Fig. 2.5 An example homodimer structure complex of aspartate aminotransferase A and B subunits is shown
Fig. 2.6 Sequence alignment (using EMBOSS needle) between aspartate aminotransferase A and B subunits is shown with percentage similarity and identity
Fig. 2.7 Homodimer folding and binding mechanism is shown. 2S two state, 3SDI three state with dimer intermediate, 3SMI three state with monomer intermediate
Fig. 2.8 The gene structure of SEG and MEG is illustrated
Fig. 2.9 GenBank FEATURES and CDS annotation (bottom horizontal arrow) for a genomic DNA (top horizontal arrow)
Fig. 2.10 CDS annotation for direct, complement, and partial intronless genes
Fig. 2.11 Different CDS representations for intron-containing multiple exon genes in eukaryotes are illustrated
Fig. 2.12 Fusion protein scenario for imidazole glycerol phosphate synthetase (IGPS) in yeast and bacteria
Fig. 2.13 A structural model of cholera toxin (CT) is shown. CT is a hetero-hexameric complex (AB5) consisting of CTA (cleaved into A1 of 194 residues and A2 of 46 residues) and a CTB (103 residues) pentamer with D, E, F, G, and H chains
Fig. 2.14 Creation of a dataset for CTA and CTB sequences from GenBank. A sequence dataset of CTA and CTB was derived from GenBank (release 177) using KEYWORD search as illustrated in the flowchart. The KEYWORD search “cholera toxin” resulted in 1257 hits. This set consists of 27 CTA and 165 CTB sequences according to GenBank descriptions and available annotations. The remaining 1065 sequences, with descriptions such as secretion protein, cholera toxin transcriptional activator, ADP-ribosylation factor, GNAS complex, dopamine receptor, Pertussis toxin, Shiga-like toxin, and the like, were eliminated from the dataset. Thus, a CT sequence dataset of 192 sequences consisting of 27 CTA and 165 CTB was created. The CTA and CTB sequences are included in the dataset as available in GenBank. The imbalance between the numbers of CTA and CTB sequences in GenBank is attributed to the likely frequent occurrence of mutations in CTB
Fig. 2.15 Superposition of electron microscopy (EM) structures (PDB ID, 5FUU (4.19 Å resolution) and 5U1F (6.8 Å resolution)) of HIV-1/GP160 (GP120/GP40) trimer spike protein complex. This is a trimer of three GP160 structures. Each GP160 is made of cleaved GP120 and GP40. Thus, (GP120/GP40)3 forms the viral spike protein complex. GP glycoprotein
Fig. 2.16 Biological knowledge pipeline from data is illustrated
Fig. 2.17 A graphical abstract of different datasets described in this chapter
Fig. 3.1 An example of global and local alignment is illustrated using ALIGN
Fig. 3.2 An example of BIMAS HLA-peptide-binding prediction is shown
Fig. 3.3 An example of BLAST analysis is shown
Fig. 3.4 An example of multiple sequence alignment is shown
Fig. 3.5 DeepView download page
Fig. 3.6 FASTA download page
Fig. 3.7 An example of GENSCAN output is shown
Fig. 3.8 A schematic representation of a hydrogen bond is illustrated
Fig. 3.9 Download page for HBPLUS
Fig. 3.10 An example of LALIGN/PLALIGN input/output
Fig. 3.11 An example of inhibitor-enzyme interaction is shown using LIGPLOT
Fig. 3.12 Binding of HLA A*0201 with mHag peptides HA-1H and HA-1R is modeled and shown using the LOOK interface
Fig. 3.13 The download page for MODELLER
Fig. 3.14 The download page for NACCESS
Fig. 3.15 The download page for PHYLIP is shown
Fig. 3.16 An example of PROTPARAM input/output is shown
Fig. 3.17 The web interface for PSAP is shown
Fig. 3.18 An example input/output for InterPro is shown
Fig. 3.19 Download page for PYMOL is shown
Fig. 3.20 Download page for RASMOL is shown
Fig. 3.21 The web interface for ROSETTA is shown
Fig. 3.22 The download page for SURFNET is shown
Fig. 3.23 An example of input/output for T-EPITOPE designer is shown
Fig. 4.1 Interface shape complementarity between interacting subunits
Fig. 4.2 The correspondence between interface residues in one dimension and three dimensions is illustrated
Fig. 4.3 A hydrophobic residue interface is illustrated using 1M4U (PDB ID), showing P35-I86 as well as I33-P74 interactions
Fig. 4.4 Interface residues at the protein-protein interface are illustrated using CPK representation. This shows that a stable interface is critical for protein-protein binding. This figure is adapted from Nilofer et al. (2017) under the open access creative commons attribution license
Fig. 4.5 Relationship between interface size and interface area is shown. Interface area increases with interface size. This is adapted from Sowmya et al. (2011) under the open access creative commons attribution license
Fig. 4.6 An illustration of large, medium, and small interfaces is shown with corresponding homodimer complexes shown. 2S 2-state, 3SMI 3-state monomer intermediate, 3SDI 3-state dimer intermediate. This figure is adapted with permission from Kangueane and Nilofer (2018)
Fig. 4.7 Fractional distribution of interface residues is shown. Hydrophobic residues are dominant in homodimer interfaces. This figure is adapted from Zhanhua et al. (2005) under the open access creative commons attribution license
Fig. 4.8 The distribution of amino acid residues as a ratio of interface to surface and interior is shown for heterodimer and homodimer protein complexes. The ratio of charged residues at the interface to interior is high for heterodimer protein complexes. This trend is true for hydrophobic residues in homodimer protein complexes. This figure is adapted from Zhanhua et al. (2005) under the open access creative commons attribution license
Fig. 4.9 The contribution of hydrogen bond energy at the interface of obligatory, nonobligatory, and immune complexes is shown. Hydrogen bond energy contributes about 15 ± 6.5% at the interface of protein-protein complexes. This figure is adapted with permission from Kangueane and Nilofer (2018)
Fig. 4.10 Relationship between interface size and hydrogen bond energy at the interface of obligatory, nonobligatory, and immune complexes is shown. This image is adapted from Nilofer et al. (2017) under the open access creative commons attribution license
Fig. 4.11 The contribution of electrostatic energy at the interface of obligatory, nonobligatory, and immune complexes is shown. Electrostatic energy contributes about 11.3 ± 8.7% at the interface of protein-protein complexes. This figure is adapted with permission from Kangueane and Nilofer (2018)
Fig. 4.12 Relationship between interface size and electrostatic energy at the interface of obligatory, nonobligatory, and immune complexes is shown. This image is adapted from Nilofer et al. (2017) under the open access creative commons attribution license
Fig. 4.13 Distribution of sidechain-sidechain interactions (S1S1I) at the interface is shown as a function of distance x (Å). Two atoms are considered to be interacting if the distance between them is within the sum of their van der Waals (vdW) radii plus x. This image is adapted from Li et al. (2006) under the open access creative commons attribution license
Fig. 4.14 An interface hot spot is shown. The interaction of residue K15 (BPTI, chain I) with residues S190, S195, and V213 (trypsin, chain E) is shown (PDB ID: 2PTC). K15 has three interacting sidechain atoms (CB, CD, and NZ). These three atoms are involved in favorable contacts, and only CB participates in unfavorable contacts. This image is adapted from Li et al. (2006) under the open access creative commons attribution license
Fig. 5.1 Homodimer folding and binding mechanisms are illustrated
Fig. 5.2 Distribution of 2S, 3SMI, and 3SDI proteins in relation to size and interface area is shown
Fig. 5.3 An example of a 2S protein is illustrated with binding mode
Fig. 5.4 An example of a 3SMI protein is illustrated with binding mode
Fig. 5.5 An example of a 3SDI protein is illustrated with binding mode
Fig. 5.6 The distribution of the ratio of interface to total residues is shown for the 2S, 3SMI, and 3SDI proteins in Table 2.9. Ψ = 3SDI
Fig. 5.7 An illustration of large, medium, and small interfaces is shown among homodimers. This is adapted from Karthikraja et al. (2009) under the open access creative commons attribution license
Fig. 6.1 A fusion protein is illustrated
Fig. 6.2 A fusion protein mimicking an operon-like structure is shown
Fig. 6.3 A fusion protein with multiple functions is illustrated
Fig. 6.4 A fusion protein mimicking protein subunit interaction is illustrated
Fig. 6.5 IGPS structure in bacteria (not fused) and yeast (fused)
Fig. 6.6 A fusion scenario for IGPS between bacteria and yeast is shown
Fig. 6.7 IGPS in bacteria and yeast before and after molecular dynamics simulation
Fig. 6.8 Interface area in IGPS from bacteria and yeast
Fig. 6.9 Gap volume for IGPS from bacteria and yeast
Fig. 6.10 Gap index for IGPS from bacteria and yeast
Fig. 6.11 Rg for IGPS from bacteria and yeast
Fig. 7.1 MHC gene loci
Fig. 7.2 HLA sequence growth at IMGT/HLA database
Fig. 7.3 MHC and its function in T-cell immunity
Fig. 7.4 Structure of class I MHC molecule. The structure consists of a peptide-binding alpha subunit and supporting beta-2m. The alpha subunit consists of domains 1, 2, and 3; domains 1 and 2 form the peptide-binding region
Fig. 7.5 Peptide binding domains of class I MHC molecules with bound peptide. The polymorphic residues are often centered at the peptide binding groove comprising the alpha 1 and alpha 2 domains
Fig. 7.6 Structural and sequence alignment of the α chain with highly polymorphic residues clustered in the peptide binding groove is shown
Fig. 7.7 Class II MHC molecule HLA-DC1 with the bound peptide is shown. The groove is formed by α and β chains
Fig. 7.8 Sequence anchors in HLA-A*0201 binding peptide motif are shown
Fig. 7.9 Structural alignment of human class II MHC-specific peptides with known sequence and structural anchors. Bold letters represent structural binding anchors
Fig. 7.10 Structural alignment of (a) class I and (b) class II HLA alleles with bound peptides. The binding groove of class I is formed by the alpha 1 and alpha 2 domains. The binding groove of class II is formed by the α chain and β chain. Peptides bound to class II molecules adopt an extended conformation that protrudes from the groove, unlike those bound to class I molecules
Fig. 7.11 An illustration of the steps involved in the development of a prediction model for T-EPITOPE Designer. The model is based on information gleaned from MHC-peptide structures. HERP highly essential residue positions
Fig. 7.12 An illustration of the user interface for T-EPITOPE Designer. The web interface contains (a) Overview, (b) Service, (c) Model, (d) Designer, (e) Links, and (f) Team
Fig. 7.13 Definition of HLA supertypes
Fig. 7.14 Multiple sequence alignment of HLA alleles
Fig. 7.15 Critical residues in class I HLA structures for peptide binding. (a) Distribution in the dataset (see Table 2.4 in Chap. 2 for dataset), (b) mean, and (c) standard deviation about the mean are given
Fig. 7.16 Pockets (A–F) for peptide binding in class I HLA molecules are shown. The illustrated pockets are based on the pocket definition of Bjorkman et al. (1987)
Fig. 8.1 The cholera toxin (CT) hetero-hexameric complex (AB5) consisting of CTA (cleaved into A1 and A2) and CTB pentamer with D, E, F, G, and H subunits is shown. This image is adapted from Shamini et al. (2011) under the open access creative commons attribution license
Fig. 8.2 Protein-protein interfaces in CTB are shown. The interaction between subunits D and E and D and H is illustrated. Subunit D interacts with subunits E and H on either side having two different interfaces. This image is adapted from Shamini et al. (2011) under the open access creative commons attribution license
Fig. 8.3 Interface residue positions in CTA and CTB are shown using delta ASA as a measure of interface area. Residue positions with mutations in CTA and CTB are mapped to interfaces. This image is adapted from Shamini et al. (2011) under the open access creative commons attribution license
Fig. 8.4 Common mutations in CTA and CTB are illustrated using CPK residue models. These mutations are identified relative to the wild-type CT sequence in a dataset of sequences summarized in Table 2.12. This image is adapted from Shamini et al. (2011) under the open access creative commons attribution license
Fig. 8.5 Known mutations mapped to the interface of CTA/CTB and within CTB are shown using CPK residue models in three dimensions. Please see Fig. 8.3 for comparison. This image is adapted from Shamini et al. (2011) under the open access creative commons attribution license
Fig. 9.1 Schematic illustration of the HIV-1 GP160 (GP120/GP40) trimer spike protein is shown with the bound membrane. This complex is made of three GP120/GP40 assemblies. Protein-protein interfaces are formed between GP120-GP120, GP40-GP40, and GP120-GP40. This image is adapted with permission from Nilofer et al. (2017)
Fig. 9.2 Superimposed electron microscopy (EM) structure of HIV-1 GP160 (GP120/GP40) trimer spike protein is shown. Each GP160 unit is made of GP120 at the top and GP40 at the bottom. This image is adapted with permission from Kangueane and Nilofer (2018)
Fig. 9.3 Structure of GP120 and GP40 in monomer and trimer form is shown. V, variable region; C, constant region. This image is adapted from Sowmya et al. (2011) under the open access creative commons attribution license
Fig. 9.4 Protein-protein interface between GP120/GP120 and GP40/GP40 is shown. This image is adapted with permission from Nilofer et al. (2017)
Fig. 9.5 Number of known HIV-1 GP160 ENV sequences in the LANL database over a period of two decades. This image is adapted with permission from Nilofer et al. (2017)
Fig. 9.6 Glycosylated structure of HIV-1 GP160 trimer ENV protein is shown. Sugar moieties are abbreviated as NAG (N-acetyl-d-glucosamine), BMA (beta-d-mannose), MAN (alpha-d-mannose), GAL (beta-d-galactose), and FUC (alpha-l-fucose). This image is adapted with permission from Nilofer et al. (2017)
Fig. 10.1 The gene structure for SEG and MEG is illustrated
Fig. 10.2 Mechanism of retro-transposition is illustrated with the formation of pseudogenes and active/inactive retro-genes
Fig. 10.3 Definition of intron phases (a), illustrated with an example (b)
Fig. 10.4 Alternative splicing by exon skipping is illustrated
Fig. 10.5 Distribution of intron (thick black bars) positions along homologous protein sequences (scaled to length) for gamma tubulins in different species
Fig. 10.6 Trends in modern molecular biology
List of Tables
Table 1.2 List of useful UNIX commands
Table 1.3 Standard codon usage table arranged based on frequency of codon for specific amino acid residues
Table 1.4 Major institutions worldwide for storing genetic and biological data
Table 1.5 Major databases for genetic and biological data
Table 1.6 List of popular force fields for molecular mechanics
Table 2.1 HLA class I-specific peptides with known IC50 binding values collected from literature
Table 2.2 Grouping of peptides based on IC50 binding values given in Table 2.1
Table 2.3
Table 2.6
Table 2.7 Heterodimer
Table 2.8
Table 2.9 Dataset of homodimers divided into three groups based on unfolding pathways
Table 2.10 CDS (coding sequence) patterns used for SEG annotation
Table 2.11 List of examples exhibiting fusion phenomenon
Table 2.12 CTA and CTB sequences from various serogroups
Table 2.13 GP120a structural dataset from PDB
Table 2.14 GP40 structural dataset from PDB
Table 6.1 Residue conservation at the interface of IGPS in TT and SC
Table 6.2 Structural properties of IGPS in TT and SC are given for initial and final structures
Table 7.1 Definition of HLA supertypes
Table 10.1 Eukaryotic genomes and their constituents
About the Author
Pandjassarame Kangueane (born on November 9, 1974) attended Petit Seminaire Higher Secondary School, Pondicherry (a French colony in pre-independence India), India. He received the merit award as the 1991 Matriculation Top Ranker (science subject) from the Education Department, Government of Pondicherry. He graduated with a B.Tech. degree in industrial biotechnology with first class (distinction) from the Centre for Biotechnology (CBT), Anna University, in 1997. During 1993–1997, he worked on lipase production, assay, and its application in ibuprofen biotransformation. He was awarded a PhD in 2001 by the National University of Singapore (NUS) for his work on short peptide vaccine design and GvHD related to bone marrow transplantation (BMT). He served as scientist (2001) at S*BIO Pte Ltd., Singapore, visiting scientist for Technology Transfer (2001) at Chiron Corporation, Emeryville, Bay Area, California, USA, assistant professor of bioinformatics (2002–2006) at Nanyang Technological University, Singapore, visiting professor (2007–2009) at VIT University, Vellore, India, and professor of biotechnology (2009–2011) at AIMST University, Malaysia. He currently serves as director at Biomedical Informatics (P) Ltd. and chief editor of Bioinformation, an open access journal for beyond bioinformatics. He serves as an associate editor of BMC Bioinformatics (a UK-based BioMed Central publication since 2005). He has supervised several students to PhD degrees by research in the field and has authored numerous research articles and book chapters. He is also the author of several books published by Springer, USA (2008, 2009, 2018), NOVA, USA (2011), and LAP, Germany (2011). He has been an Ambassador for Peace with the Universal Peace Federation since 2008. He was conferred the Vishal Bharathi award with the title Bharath Jothi by GOPIO on April 8, 2012. He was also awarded the Indian Leadership award for Industrial Development by All India Achievers Foundation (AIAF) on August 9, 2012.
He has enjoyed farming several crops, including sugarcane, peanut, black gram, green gram, vegetables, coconut, lemon, rice, and chrysanthemum (chamomile), since 1990. He is a scientist, an author of scholarly materials, a teacher of higher education, professor, educationalist, editor, journalist, entrepreneur, social reformer, and farmer.
Abbreviations
NCBI National Center for Biotechnology Information
NDA New drug application
NMR Nuclear magnetic resonance
ORG Organelle
PAT Patent
PDB Protein Data Bank
PERL Practical Extraction and Report Language
PLN Plant
PHG Phage
PRO Prokaryote
PRI Primate
PSMA Prostate-specific membrane antigen
RCSB Research Collaboratory for Structural Bioinformatics
RDBMS Relational database management system
Rg Radius of gyration
RNA Ribonucleic acid
ROD Rodent
RXR Retinoid X receptor
SC S. cerevisiae
SCOP Structural classification of proteins
SEG Single exon gene
SEGE Single exon gene in eukaryotes
SNP Single nucleotide polymorphism
SLL Squared loop length
STS Sequence-tagged sites
SYN Synthetic
TCR T-cell receptor
TT T. thermophilus
U-Genome Unicellular eukaryotic genome
URL Uniform Resource Locator
VRL Viral
VRT Vertebrate
WWW World Wide Web
Chapter 1 Bioinformatics for Bioinformation
Abstract This chapter introduces concepts in bioinformatics and describes its application in agriculture, healthcare, and biotechnology. The principles and components of bioinformatics are discussed in detail. A sound knowledge of the basic concepts in bioinformatics is the foundation for bioinformation discovery from data. The significance of bioinformation discovery in defining targets for drug development is discussed. Issues in discovery environments within pharmaceutical or biotechnological research and development are illustrated using block diagrams. The importance of biological data (sequence, structure, and function) in discovery is highlighted. The chapter also contains exercise problems for practice.
Bioinformatics evolved in the late twentieth century by drawing on concepts and techniques from multiple other disciplines, with the aim of studying questions in biology by comparing observable phenotypes with molecular genetic data. The discipline has immense application and relevance in healthcare, agriculture, and biotechnology (Fig. 1.1).
Central to the discipline are data on the molecular genetics of living cells and organisms, generated using advanced techniques and tools from engineering. Thus, the discipline borrows concepts and techniques, either directly or indirectly, from other established disciplines such as mathematics, physics, chemistry, zoology, botany, genetics, biochemistry, molecular biology, chemical engineering, biochemical reaction engineering, biotechnology, computer engineering, and information science. The idea is to store, retrieve, curate, and use molecular genetics data in databases to simulate molecular phenomena in cells and organisms by applying mathematical models. It should be noted that data is generated by the analysis of samples from living cells, tissues, organs, and organisms using techniques, methods,
1.E. Unless you have removed all references to Project Gutenberg:
1.E.1. The following sentence, with active links to, or other immediate access to, the full Project Gutenberg™ License must appear prominently whenever any copy of a Project Gutenberg™ work (any work on which the phrase “Project Gutenberg” appears, or with which the phrase “Project
Gutenberg” is associated) is accessed, displayed, performed, viewed, copied or distributed:
This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook.
1.E.2. If an individual Project Gutenberg™ electronic work is derived from texts not protected by U.S. copyright law (does not contain a notice indicating that it is posted with permission of the copyright holder), the work can be copied and distributed to anyone in the United States without paying any fees or charges. If you are redistributing or providing access to a work with the phrase “Project Gutenberg” associated with or appearing on the work, you must comply either with the requirements of paragraphs 1.E.1 through 1.E.7 or obtain permission for the use of the work and the Project Gutenberg™ trademark as set forth in paragraphs 1.E.8 or 1.E.9.
1.E.3. If an individual Project Gutenberg™ electronic work is posted with the permission of the copyright holder, your use and distribution must comply with both paragraphs 1.E.1 through 1.E.7 and any additional terms imposed by the copyright holder. Additional terms will be linked to the Project Gutenberg™ License for all works posted with the permission of the copyright holder found at the beginning of this work.
1.E.4. Do not unlink or detach or remove the full Project Gutenberg™ License terms from this work, or any files containing a part of this work or any other work associated with Project Gutenberg™.
1.E.5. Do not copy, display, perform, distribute or redistribute this electronic work, or any part of this electronic work, without prominently displaying the sentence set forth in paragraph 1.E.1 with active links or immediate access to the full terms of the Project Gutenberg™ License.
1.E.6. You may convert to and distribute this work in any binary, compressed, marked up, nonproprietary or proprietary form, including any word processing or hypertext form. However, if you provide access to or distribute copies of a Project Gutenberg™ work in a format other than “Plain Vanilla ASCII” or other format used in the official version posted on the official Project Gutenberg™ website (www.gutenberg.org), you must, at no additional cost, fee or expense to the user, provide a copy, a means of exporting a copy, or a means of obtaining a copy upon request, of the work in its original “Plain Vanilla ASCII” or other form. Any alternate format must include the full Project Gutenberg™ License as specified in paragraph 1.E.1.
1.E.7. Do not charge a fee for access to, viewing, displaying, performing, copying or distributing any Project Gutenberg™ works unless you comply with paragraph 1.E.8 or 1.E.9.
1.E.8. You may charge a reasonable fee for copies of or providing access to or distributing Project Gutenberg™ electronic works provided that:
• You pay a royalty fee of 20% of the gross profits you derive from the use of Project Gutenberg™ works calculated using the method you already use to calculate your applicable taxes. The fee is owed to the owner of the Project Gutenberg™ trademark, but he has agreed to donate royalties under this paragraph to the Project Gutenberg Literary Archive Foundation. Royalty payments must be paid within 60 days following each date on which you prepare (or are legally required to prepare) your periodic tax returns. Royalty payments should be clearly marked as such and sent to the Project Gutenberg Literary Archive Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive Foundation.”
• You provide a full refund of any money paid by a user who notifies you in writing (or by e-mail) within 30 days of receipt that s/he does not agree to the terms of the full Project Gutenberg™ License. You must require such a user to return or destroy all copies of the works possessed in a physical medium and discontinue all use of and all access to other copies of Project Gutenberg™ works.
• You provide, in accordance with paragraph 1.F.3, a full refund of any money paid for a work or a replacement copy, if a defect in the electronic work is discovered and reported to you within 90 days of receipt of the work.
• You comply with all other terms of this agreement for free distribution of Project Gutenberg™ works.
1.E.9. If you wish to charge a fee or distribute a Project Gutenberg™ electronic work or group of works on different terms than are set forth in this agreement, you must obtain permission in writing from the Project Gutenberg Literary Archive Foundation, the manager of the Project Gutenberg™ trademark. Contact the Foundation as set forth in Section 3 below.
1.F.
1.F.1. Project Gutenberg volunteers and employees expend considerable effort to identify, do copyright research on, transcribe and proofread works not protected by U.S. copyright law in creating the Project Gutenberg™ collection. Despite these efforts, Project Gutenberg™ electronic works, and the medium on which they may be stored, may contain “Defects,” such as, but not limited to, incomplete, inaccurate or corrupt data, transcription errors, a copyright or other intellectual property infringement, a defective or damaged disk or other
medium, a computer virus, or computer codes that damage or cannot be read by your equipment.
1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGESExcept for the “Right of Replacement or Refund” described in paragraph 1.F.3, the Project Gutenberg Literary Archive Foundation, the owner of the Project Gutenberg™ trademark, and any other party distributing a Project Gutenberg™ electronic work under this agreement, disclaim all liability to you for damages, costs and expenses, including legal fees. YOU AGREE THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH
1.F.3. YOU AGREE THAT THE FOUNDATION, THE TRADEMARK OWNER, AND ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.
1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you discover a defect in this electronic work within 90 days of receiving it, you can receive a refund of the money (if any) you paid for it by sending a written explanation to the person you received the work from. If you received the work on a physical medium, you must return the medium with your written explanation. The person or entity that provided you with the defective work may elect to provide a replacement copy in lieu of a refund. If you received the work electronically, the person or entity providing it to you may choose to give you a second opportunity to receive the work electronically in lieu of a refund. If the second copy is also defective, you may demand a refund in writing without further opportunities to fix the problem.
1.F.4. Except for the limited right of replacement or refund set forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.
1.F.5. Some states do not allow disclaimers of certain implied warranties or the exclusion or limitation of certain types of damages. If any disclaimer or limitation set forth in this agreement violates the law of the state applicable to this agreement, the agreement shall be interpreted to make the maximum disclaimer or limitation permitted by the applicable state law. The invalidity or unenforceability of any provision of this agreement shall not void the remaining provisions.
1.F.6. INDEMNITY - You agree to indemnify and hold the Foundation, the trademark owner, any agent or employee of the Foundation, anyone providing copies of Project Gutenberg™ electronic works in accordance with this agreement, and any volunteers associated with the production, promotion and distribution of Project Gutenberg™ electronic works, harmless from all liability, costs and expenses, including legal fees, that arise directly or indirectly from any of the following which you do or cause to occur: (a) distribution of this or any Project Gutenberg™ work, (b) alteration, modification, or additions or deletions to any Project Gutenberg™ work, and (c) any Defect you cause.
Section 2. Information about the Mission of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of electronic works in formats readable by the widest variety of computers including obsolete, old, middle-aged and new computers. It exists because of the efforts of hundreds of volunteers and donations from people in all walks of life.
Volunteers and financial support to provide volunteers with the assistance they need are critical to reaching Project
Gutenberg™’s goals and ensuring that the Project Gutenberg™ collection will remain freely available for generations to come. In 2001, the Project Gutenberg Literary Archive Foundation was created to provide a secure and permanent future for Project Gutenberg™ and future generations. To learn more about the Project Gutenberg Literary Archive Foundation and how your efforts and donations can help, see Sections 3 and 4 and the Foundation information page at www.gutenberg.org.
Section 3. Information about the Project Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a nonprofit 501(c)(3) educational corporation organized under the laws of the state of Mississippi and granted tax exempt status by the Internal Revenue Service. The Foundation’s EIN or federal tax identification number is 64-6221541. Contributions to the Project Gutenberg Literary Archive Foundation are tax deductible to the full extent permitted by U.S. federal laws and your state’s laws.
The Foundation’s business office is located at 809 North 1500 West, Salt Lake City, UT 84116, (801) 596-1887. Email contact links and up to date contact information can be found at the Foundation’s website and official page at www.gutenberg.org/contact
Section 4. Information about Donations to the Project Gutenberg Literary Archive Foundation
Project Gutenberg™ depends upon and cannot survive without widespread public support and donations to carry out its mission of increasing the number of public domain and licensed works that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated equipment. Many small donations ($1 to $5,000) are particularly important to maintaining tax exempt status with the IRS.
The Foundation is committed to complying with the laws regulating charities and charitable donations in all 50 states of the United States. Compliance requirements are not uniform and it takes a considerable effort, much paperwork and many fees to meet and keep up with these requirements. We do not solicit donations in locations where we have not received written confirmation of compliance. To SEND DONATIONS or determine the status of compliance for any particular state visit www.gutenberg.org/donate.
While we cannot and do not solicit contributions from states where we have not met the solicitation requirements, we know of no prohibition against accepting unsolicited donations from donors in such states who approach us with offers to donate.
International donations are gratefully accepted, but we cannot make any statements concerning tax treatment of donations received from outside the United States. U.S. laws alone swamp our small staff.
Please check the Project Gutenberg web pages for current donation methods and addresses. Donations are accepted in a number of other ways including checks, online payments and credit card donations. To donate, please visit: www.gutenberg.org/donate.
Section 5. General Information About Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project Gutenberg™ concept of a library of electronic works that could be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose network of volunteer support.
Project Gutenberg™ eBooks are often created from several printed editions, all of which are confirmed as not protected by copyright in the U.S. unless a copyright notice is included. Thus, we do not necessarily keep eBooks in compliance with any particular paper edition.
Most people start at our website which has the main PG search facility: www.gutenberg.org.
This website includes information about Project Gutenberg™, including how to make donations to the Project Gutenberg Literary Archive Foundation, how to help produce our new eBooks, and how to subscribe to our email newsletter to hear about new eBooks.