
International Journal of Antimicrobial Agents 39 (2012) 346–351


International Journal of Antimicrobial Agents journal homepage: http://www.elsevier.com/locate/ijantimicag

YADAMP: yet another database of antimicrobial peptides

Stefano P. Piotto (a, ∗), Lucia Sessa (a), Simona Concilio (b), Pio Iannelli (a)

(a) Department of Pharmaceutical and Biomedical Sciences, University of Salerno, Via Ponte don Melillo, 84084 Fisciano, Salerno, Italy
(b) Department of Industrial Engineering, University of Salerno, Via Ponte don Melillo, 84084 Fisciano, Salerno, Italy

Article info

Article history: Received 31 October 2011; accepted 6 December 2011
Keywords: Antimicrobial peptides; Web database; QSAR

Abstract

This work presents an antimicrobial peptide database (YADAMP) based on an extensive literature search. This database is focused primarily on bacteria, with detailed information for 2133 peptides active against bacteria. YADAMP was created to facilitate access to critical information on antimicrobial peptides (AMPs). The main difference between YADAMP and other web databases of AMPs is the explicit presence of antimicrobial activity against the most common bacterial strains. YADAMP allows complex queries, easily accessible through a web interface. Peptide information can be retrieved based on peptide name, number of amino acids, net charge, hydrophobic percentage, sequence motif, structure and activity against bacteria. YADAMP is suitable for reviewing information on AMPs and for structure–function analyses of peptides. The database can be accessed via a web-based browser at http://www.yadamp.unisa.it. © 2012 Elsevier B.V. and the International Society of Chemotherapy. All rights reserved.

1. Introduction

A large number of peptide sequences with experimentally proven antibacterial activity have been collected since Zasloff [1] discovered the antimicrobial peptide (AMP) magainin in frog exudate. These AMPs have a broad spectrum of activity, mainly against bacteria but also against fungi, viruses or cancer cells. Some AMPs bind to protein receptors, whereas others appear to act directly on cell membranes, as indicated by the fact that some AMPs retain their activity when L-amino acids are replaced by D-amino acids [2]. Several models of action have been proposed [3–6], and some characteristics have been indicated as necessary for antibiotic activity, the most common being helicity, flexibility and a cationic nature. For a long time, a common denominator was sought amongst AMPs. It soon became apparent that basic features such as charge and amphipathicity [7] are far too vague and are not shared by the majority of peptides. Moreover, these features are shared with other groups of polypeptides, such as histones, which often exhibit antimicrobial activity. AMPs can be grouped into classes with different three-dimensional structures. According to Boman [8], most scientific research has focused on three classes: (i) linear peptides free of cysteines, often with an α-helical, amphipathic solution structure; (ii) peptides with disulphide bonds giving a flat dimeric β-sheet structure; and (iii) peptides with a

∗ Corresponding author. Tel.: +39 320 423 0068. E-mail address: piotto@unisa.it (S.P. Piotto).

particular prevalence of certain amino acids, such as proline, arginine, tryptophan or histidine. In recent years, many novel AMPs have been discovered and characterised, and most of these data have been included in web databases [9–17]. Unfortunately, several noteworthy features of AMPs, such as their minimum inhibitory concentrations (MICs) or their spectrum of activity, are not always included in public databases. Furthermore, even the web database richest in sequences and experimental data is far from including all AMPs that have appeared in the scientific literature [18]. After 30 years of intensive research on AMPs, an accepted, universal model of action is still lacking. This is not due to a lack of scientific research, but rather to the simple fact that a single comprehensive model does not exist: AMPs use a wide variety of mechanisms, such as altering the membrane equilibrium, creating pores, disrupting the membrane or binding to a protein receptor. The first requirement for a quantitative structure–activity relationship (QSAR) study on a series of peptides is therefore to cluster those expected to act in the same way. A correct QSAR investigation is simply impossible if the same analysis set includes peptides acting by different mechanisms, such as membrane disruption and enzyme inhibition. The need for a more extensive collection of AMPs, together with the need for physical-chemical parameters, motivated us to create this new web database. We created YADAMP to collect data from existing AMP databases and literature searches and to provide fundamental structural information for further bioinformatics analysis. YADAMP contains more quantitative data than any other database and provides open access to information on peptides that is not available anywhere else. The information is integrated in

0924-8579/$ – see front matter © 2012 Elsevier B.V. and the International Society of Chemotherapy. All rights reserved. doi:10.1016/j.ijantimicag.2011.12.003



plain HTML and complemented with some fundamental theoretical information. With YADAMP, it is possible to create a uniform subset of AMPs that is still large enough to allow meaningful statistical analysis.

2. Methods

2.1. Data collection

YADAMP is a web database dedicated to AMPs, with detailed information about their activities and structural features. The AMPs come from all biological sources, ranging from bacteria and plants to animals, including humans. Sequences of active AMPs were mainly extracted from the scientific literature and compared with data in public databases (UniProtKB/Swiss-Prot [19], APD [20], CAMP [18]). Our aim was to create a resource for QSAR investigations on AMPs: YADAMP brings together data on AMPs scattered across scientific papers and web databases, and provides structural data and information on antimicrobial activities. We collected relevant information on 2133 active sequences. Collecting and annotating large numbers of sequences is error-prone: it can propagate errors in the original papers that may have been corrected later, and manual annotation can introduce new errors. For this reason, YADAMP is frequently updated, and we encourage users to submit corrections for any mistakes. In YADAMP, a user can obtain the peptide name, amino acid sequence, length, presence of disulphide bridges and date of discovery. In addition, the most relevant chemical–physical properties are calculated, such as charge, hydrophobic moment, helicity, flexibility, isoelectric point, Boman index and instability index. More importantly, YADAMP focuses on peptide activities. In microbiology, the MIC is the lowest concentration of an antimicrobial that inhibits visible growth of a microorganism. MIC values are needed to perform statistical analyses on peptides, and YADAMP permits the selection of AMPs with the lowest MIC values.
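Selecting peptides by their lowest MIC is only meaningful once all values share a unit. The following Python sketch illustrates the μg/mL to μM conversion described in the next paragraph and a lowest-MIC lookup. It assumes the molar mass of the peptide (g/mol) is known; the function names and the organism-to-MIC dictionary are hypothetical and are not part of YADAMP.

```python
from typing import Dict, Tuple


def mic_ugml_to_um(mic_ug_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Convert a MIC from ug/mL to uM (umol/L).

    1 ug/mL = 1 mg/L; (mg/L) / (g/mol) = mmol/L; multiply by 1000 for umol/L.
    """
    if molar_mass_g_per_mol <= 0:
        raise ValueError("molar mass must be positive")
    return mic_ug_per_ml * 1000.0 / molar_mass_g_per_mol


def lowest_mic(mic_by_organism_um: Dict[str, float]) -> Tuple[str, float]:
    """Return (organism, MIC in uM) for the most susceptible organism."""
    organism = min(mic_by_organism_um, key=mic_by_organism_um.get)
    return organism, mic_by_organism_um[organism]


# Hypothetical peptide of 2500 g/mol tested against two organisms.
mics = {
    "E. coli": mic_ugml_to_um(10.0, 2500.0),    # 4.0 uM
    "S. aureus": mic_ugml_to_um(40.0, 2500.0),  # 16.0 uM
}
print(lowest_mic(mics))
```

For this hypothetical peptide, a MIC of 10 μg/mL corresponds to 10 × 1000 / 2500 = 4 μM, so E. coli is reported as the most susceptible organism.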
Experimental MIC values (expressed in μM) were manually extracted through careful reading of the original papers; MIC values expressed in μg/mL were converted to μM to allow quick comparison. Unsurprisingly, the most intensively studied organisms are Escherichia coli, Pseudomonas aeruginosa and Salmonella enterica serotype Typhimurium amongst Gram-negatives and Staphylococcus aureus and Micrococcus luteus amongst Gram-positives. These organisms are also primary sources of infection in humans. For these reasons, the YADAMP fields for these MIC values are shown by default on the Result page. There are also two fields for activity against Bacillus subtilis and the fungus Candida albicans, owing to their high frequency. MIC values for all other organisms were entered in the fields Other Gram−, Other Gram+ and, for fungi and yeasts, Other. Some peptides have no available MIC values. In these cases, we entered in the Comment field the values of median lethal dose (LD50; dose required to kill one-half of the members of a tested population after a specified test duration), minimum effective concentration (MEC; minimum amount of drug that will produce the desired therapeutic effect), lethal concentration (LC; lowest concentration that can just inhibit colony formation on thin agar plates) and median infective dose (ID50; number of organisms that will cause infection in 50% of susceptible test animals under defined conditions). Furthermore, YADAMP provides links to the original papers for further reading. A QSAR analysis requires a large number of sequences with experimental data; however, this by itself does not


guarantee accuracy. For an accurate analysis, it is essential to group peptides sharing some features, such as similar secondary structure, flexibility or charge. For this reason, YADAMP complements the experimental data with some theoretical information. AMPs can act under very different pH conditions, depending on the tissue in which the bacteria grow. Assuming the ionisable residues to be independent of each other, we calculated the charge of each peptide by the formula:

$$
\mathrm{Charge} = \sum_i N_i \, \frac{10^{\mathrm{p}K_{a,i}}}{10^{\mathrm{p}K_{a,i}} + 10^{\mathrm{pH}}} \; - \; \sum_j N_j \, \frac{10^{\mathrm{pH}}}{10^{\mathrm{pH}} + 10^{\mathrm{p}K_{a,j}}}
$$

where N_i is the number of groups of type i, namely the N-terminus and the side chains of arginine, lysine and histidine, and the index j refers to the C-terminus and the side chains of aspartic acid, glutamic acid, cysteine and tyrosine. pKa,i and pKa,j are the pKa values of the groups labelled with indices i and j. The charge is calculated at three pH values (pH 5, 7 and 9). A quick inspection of the database reveals that, mainly because of the wide variation in lysine content, the charge of certain peptides can vary considerably with pH. This parameter can be decisive for simulations of peptides in specific tissues. The isoelectric point (pI) is the pH at which a protein has no net electrical charge: below the pI, proteins carry a net positive charge, and above it, a net negative charge. pI values are calculated according to Bjellqvist et al. [21]. As Boman pointed out [8], most authors have agreed that a potential AMP should possess a positive net charge to facilitate binding to bacterial phospholipids, as well as a certain degree of amphipathicity to allow the molecule to adapt to a bacterial membrane. These criteria are too broad, however, as they are also met by other groups of polypeptides, such as histones and angiotensins, which often have antibacterial activity. The index introduced by Boman discriminates to some degree between membrane-interacting and protein-interacting peptides. It is the sum of the free energies of transfer of the respective side chains from cyclohexane to water, taken from Radzeka and Wolfenden [22], divided by the total number of residues. In YADAMP, this so-called Boman index is computed for all sequences. Hydrophobicity is another critical characteristic of amino acid residues: it determines protein folding, protein subunit interactions, binding to receptors, and the interactions of proteins and peptides with biological membranes.
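The charge formula above, and the idea of locating the pI as the pH of zero net charge, can be sketched in Python as follows. This is an illustration under stated assumptions, not YADAMP's implementation: the pKa values are an illustrative set chosen for the example (the text does not list the values YADAMP uses), both termini are treated as free and every Cys as a free thiol, and the pI is found by simple bisection on this charge function, whereas YADAMP follows Bjellqvist et al. [21].

```python
# ASSUMPTION: illustrative pKa values, not those used by YADAMP (the text does
# not list them). Index i: groups that are positive when protonated;
# index j: groups that are negative when deprotonated.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge from the formula above, treating all groups as independent.

    Simplifications: one free N-terminus and one free C-terminus, and every
    Cys counted as a free thiol (disulphide-bonded Cys are not excluded).
    """
    seq = seq.upper()
    n_pos = {"Nterm": 1, **{aa: seq.count(aa) for aa in "KRH"}}
    n_neg = {"Cterm": 1, **{aa: seq.count(aa) for aa in "DECY"}}
    # sum_i N_i * 10^pKa_i / (10^pKa_i + 10^pH)
    positive = sum(n * 10 ** PKA_POSITIVE[g] / (10 ** PKA_POSITIVE[g] + 10 ** ph)
                   for g, n in n_pos.items())
    # sum_j N_j * 10^pH / (10^pH + 10^pKa_j)
    negative = sum(n * 10 ** ph / (10 ** ph + 10 ** PKA_NEGATIVE[g])
                   for g, n in n_neg.items())
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """pH where net_charge crosses zero, located by bisection.

    Bisection works because net_charge decreases monotonically with pH.
    This is NOT the Bjellqvist et al. [21] procedure used by YADAMP.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0.0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid  # zero or negative: the pI lies at this or a lower pH
    return (lo + hi) / 2.0


if __name__ == "__main__":
    peptide = "KLAKLAKKLAKLAK"  # arbitrary example sequence
    for ph in (5.0, 7.0, 9.0):
        print(f"pH {ph}: charge {net_charge(peptide, ph):+.2f}")
    print(f"estimated pI: {isoelectric_point(peptide):.2f}")
```

Evaluating net_charge at pH 5, 7 and 9 mirrors the three values YADAMP reports. Bisection is a safe choice here because every positive term decreases and every negative term increases with pH, so the net charge falls monotonically and crosses zero exactly once.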
The average hydrophobicity is the total hydrophobicity (the sum of all residue hydrophobicity indices) [23] divided by the number of residues. The distribution of hydrophobic residues in amphipathic peptides is revealed by the hydrophobic moment, which is the vector sum of all the residue hydrophobicity indices, divided by the number of residues. Because the calculation of the hydrophobic moment depends on the spatial conformation of the peptide, for amphipathic peptides the hydrophobic moment is meaningful only for α-helical conformations. The secondary structure of a peptide is therefore crucial for the investigation. The amphipathicity is calculated according to three different hydrophobicity scales: CCS [24], Kyte–Doolittle [25] and Eisenberg [23]. In YADAMP, secondary structure prediction is based on the DSC algorithm of King and Sternberg [26], implemented via a Perl script in Discovery Studio (Accelrys). Structure prediction is essential when the structure of a peptide has not been determined experimentally; the overall per-residue three-state accuracy of the prediction is ca. 70%. Flexibility can strongly affect the secondary structure of a peptide. Predicting the secondary structure of an AMP is a hard task, because a peptide adopts different conformations in different chemical environments: moving from bulk water into the membrane, the structure of a peptide varies considerably. In YADAMP, the flexibility of α-helical AMPs is computed according to a




Fig. 1. Screenshot of YADAMP web database.

conformational flexibility scale for amino acids in peptides [27], which provides an absolute measure of the time scale of conformational changes in short unstructured peptides as a function of amino acid type. A peptide cannot be active if it is not sufficiently resistant to proteases. The instability index provides an estimate of the stability of a protein in vitro [28]. Statistical analysis of 12 unstable and 32 stable proteins revealed patterns in the occurrence of certain dipeptides: some dipeptides are particularly frequent in stable proteins, whereas others are common in unstable proteins. The instability index predicts the in vivo half-life of the peptide; it is included in YADAMP to investigate the correlation between activity and in vivo stability.

2.2. Database content

YADAMP contains a large number of manually annotated AMP sequences. To date, it contains 2133 sequences of active peptides with lengths between 5 and 96 amino acids; amongst them, 447 sequences have one or more disulphide bridges. In cases

in which the active structure of a peptide is not known experimentally, helicity was estimated by an automated script. According to these estimates, 1098 sequences have a helicity index >5. The mechanism of action of many peptides involves interaction with the membrane; therefore, the hydrophobic moment is a key parameter for characterising AMPs. In our collection, 507 sequences have mean hydrophobic moments >3.0, calculated using the CCS scale. As an example, α-helical amphipathic peptides can be selected by choosing a helicity value >5.0 and a hydrophobic moment >3.0; for this query, YADAMP returns 456 sequences. YADAMP has the largest collection of AMP MIC values. It contains 671 sequences with experimentally verified activity against E. coli (the bacterium with the largest data set). Other bacteria well represented in YADAMP are S. aureus (587 sequences), P. aeruginosa (260 sequences), B. subtilis (173 sequences), S. Typhimurium (125 sequences) and M. luteus (121 sequences). Furthermore, 160 sequences are active against other Gram-negative bacteria and 261 against other Gram-positive bacteria. Fungi are represented by C. albicans (248 sequences), but there are also 76




Fig. 2. Query result page.

sequences active against other fungi or yeasts. A given peptide may be active against several organisms, so it can be counted two or more times.

3. Results

3.1. Database architecture

YADAMP is built on Microsoft IIS v6.0 with Microsoft SQL Server 2005 Express Edition (driven by MinCalc API) and .NET Framework 2.0 as the back-end and pure HTML pages as the front-end. IIS, SQL Server and ASP.NET technology were preferred because they are natively compatible with the original YADAMP source file (a manually annotated Excel file) and with xAlp asynchronous calls. As a result, YADAMP can also serve as a data source accessible to third-party applications.

3.2. Web database design: an overview

The web interface of YADAMP is designed to offer simple but practical use of the database. The database can be queried by length, sequence, name or discovery period, and a query can combine as many fields as the user likes with AND and OR operators. Since YADAMP is intended to support quantitative structure–activity analysis, we have provided the option to search structural parameters such as disulphide bridges, charge, hydrophobic moment or propensity for α-helical conformation. More importantly, it is possible to select AMPs with proven activity against five organisms, chosen because of their importance and because they are common targets of AMPs. In this way, it is easy to build sets of AMPs with characteristics suitable for further investigation. As an example of use, we can simulate a search for structural elements in peptides effective against E. coli. To develop a model of action against E. coli, it is necessary to choose sequences sharing the same target (for example, peptides acting on the cell membrane or on the cell wall), and it is critical to quantify the effectiveness of this action. As shown in Fig. 1, a query can select peptides with a similar secondary structure (in this case, helicity >5) that are amphipathic (hydrophobic moment >2) and active against E. coli (MIC <40 μM). Selecting a MIC against E. coli of <40 μM retrieves 235 sequences for further investigation (Fig. 2). A more complete description of each peptide is also available (Fig. 3). In contrast to other databases, information about peptide activity is not limited to yes or no, but is extracted manually from the corresponding paper. On the YADAMP site, the Theory section provides a synopsis of the theoretical terms (Fig. 1). Finally, because of the extraordinary interest in AMPs, reflected by the large number of publications in this field, YADAMP provides a page dedicated to literature monitoring, which automatically displays the most recent papers through RSS syndication.

3.3. Comparison with existing databases and prediction tools

At the time of writing, there are several web databases of AMPs. Most of them are dedicated to specific classes of AMPs and are, therefore, extremely useful when searching for AMPs of specific classes. Examples are AMSdb [29], PhytAMP [11], BACTIBASE [12], PenBase [10], RAPD [13] and SAPD [15]. The Defensins knowledgebase [14] and Peptaibol Database [17] deal with defensins and peptaibols, respectively. Two AMP databases have a broader scope. The older is APD [20], a well-known database available since 2003. Currently, APD contains 1356 sequences active against bacteria, but information about MICs is not directly accessible. The second, CAMP [18], is a more recent database containing 1650 active sequences, but only a fraction of them are annotated with MIC values. The data in CAMP are divided into three data sets (experimentally validated, patents and predicted). YADAMP contains the highest number of active sequences and, more importantly, it easily allows the creation of homogeneous subsets with similar activities and chemical–physical characteristics. The database can be queried with a combination of keywords using the Boolean operators ‘and/or/not’.
A comparison of some general AMP databases with YADAMP is given in Supplementary Table S1.
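The subset-building workflow of Section 3.2 can be mimicked offline once per-peptide descriptors are available. The Python sketch below computes the average hydrophobicity and the mean hydrophobic moment as defined in Section 2.1, placing residue n at n × 100° (an ideal α-helix), and combines record filters with and/or/not in the manner of the example query (helicity >5, hydrophobic moment above a threshold, MIC against E. coli <40 μM). Assumptions: it uses the Kyte–Doolittle scale [25] rather than the CCS scale behind YADAMP's stored moments, so its values are not comparable with YADAMP's thresholds; the record layout, function names, toy records and thresholds are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

# Kyte-Doolittle hydropathy scale [25]. Used here only for illustration;
# YADAMP's stored hydrophobic moments are based on the CCS scale [24].
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq: str, scale: dict = KYTE_DOOLITTLE) -> float:
    """Sum of residue hydrophobicities divided by the number of residues."""
    return sum(scale[aa] for aa in seq) / len(seq)


def mean_hydrophobic_moment(seq: str, scale: dict = KYTE_DOOLITTLE,
                            angle_deg: float = 100.0) -> float:
    """Length of the vector sum of residue hydrophobicities, divided by length.

    Residue n points at angle n * angle_deg; 100 degrees per residue
    corresponds to an ideal alpha-helix (3.6 residues per turn).
    """
    delta = math.radians(angle_deg)
    x = sum(scale[aa] * math.cos(n * delta) for n, aa in enumerate(seq))
    y = sum(scale[aa] * math.sin(n * delta) for n, aa in enumerate(seq))
    return math.hypot(x, y) / len(seq)


@dataclass
class PeptideRecord:
    """Hypothetical record layout, not YADAMP's actual schema."""
    name: str
    sequence: str
    helicity: float                # predicted helicity index
    mic_ecoli_um: Optional[float]  # None when no MIC against E. coli is reported


Predicate = Callable[[PeptideRecord], bool]


def all_of(*preds: Predicate) -> Predicate:
    """Boolean AND of several record filters."""
    return lambda rec: all(p(rec) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Boolean OR of several record filters."""
    return lambda rec: any(p(rec) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """Boolean NOT of a record filter."""
    return lambda rec: not pred(rec)


def example_query(min_moment: float, max_mic_um: float = 40.0) -> Predicate:
    """helicity > 5 AND moment > min_moment AND MIC(E. coli) < max_mic_um."""
    return all_of(
        lambda rec: rec.helicity > 5.0,
        lambda rec: mean_hydrophobic_moment(rec.sequence) > min_moment,
        lambda rec: rec.mic_ecoli_um is not None and rec.mic_ecoli_um < max_mic_um,
    )


# Hypothetical toy records, for illustration only.
records = [
    PeptideRecord("toy1", "A", 10.0, 5.0),
    PeptideRecord("toy2", "A" * 18, 10.0, 5.0),  # 18 x 100 deg = 5 full turns: moment ~ 0
    PeptideRecord("toy3", "A", 10.0, None),
    PeptideRecord("toy4", "A", 2.0, 5.0),
]
selected = [rec.name for rec in records if example_query(min_moment=1.0)(rec)]
```

Building the query from small predicates keeps each criterion testable on its own and mirrors how the web interface lets users combine fields freely.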




Fig. 3. Experimental and predicted data of a particular antimicrobial peptide.

4. Conclusion

The importance of AMPs in therapeutics is widely recognised. Understanding how the sequence of an AMP determines its activity is important for its rational design as a drug. However, it is still unclear which chemical–physical features of a peptide are most important for antimicrobial activity. The accuracy of prediction algorithms for AMPs depends heavily on the correctness and extent of the information in the training data sets. YADAMP has therefore been created to help researchers understand the role of AMP composition and how it determines antimicrobial activity. Data available in YADAMP allow researchers to perform QSAR analyses on peptide sets that likely share the same mechanism and to correlate the structures with MIC values. YADAMP has three main advantages: it contains the largest number of sequences with proven antimicrobial activity; it contains the largest number of sequences with reported activities; and it permits the creation of homogeneous subsets of peptides. These characteristics are all very useful for further bioinformatic or chemometric analysis. YADAMP will be updated monthly to provide a valuable resource for research on AMPs.

5. Availability and requirements

YADAMP is freely accessible at http://www.yadamp.unisa.it.

Acknowledgements

The authors thank Luigi Di Biasi, Giuseppe Cattaneo and Aniello Castiglione (Department of Informatics, University of Salerno, Salerno, Italy) for their continued support in making the

database accessible to users via the Internet, as well as Erminia Bianchino, Federica Campana, Giacomo Di Lorenzo and Edlira Graceni (Department of Pharmaceutical Sciences and Biomedical Sciences, University of Salerno) for collecting the information for some peptides. The authors are thankful to Bruno Maresca and Amalia Porta (Department of Pharmaceutical and Biomedical Sciences, University of Salerno) for useful discussions on antimicrobial peptides. The authors are also thankful to the anonymous reviewers who helped improve the quality of the work.

Funding: University of Salerno (Salerno, Italy) and MIUR (Italy).

Competing interests: None declared.

Ethical approval: Not required.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ijantimicag.2011.12.003.

References

[1] Zasloff M. Magainins, a class of antimicrobial peptides from Xenopus skin: isolation, characterization of two active forms, and partial cDNA sequence of a precursor. Proc Natl Acad Sci USA 1987;84:5449–53.
[2] Merrifield R, Juvvadi P, Andreu D, Ubach J, Boman A, Boman H. Retro and retro-enantio analogs of cecropin–melittin hybrids. Proc Natl Acad Sci USA 1995;92:3449–53.
[3] Brogden KA. Antimicrobial peptides: pore formers or metabolic inhibitors in bacteria? Nat Rev Microbiol 2005;3:238–50.
[4] Zasloff M. Antimicrobial peptides of multicellular organisms. Nature 2002;415:389–95.
[5] Epand RM, Vogel HJ. Diversity of antimicrobial peptides and their mechanisms of action. Biochim Biophys Acta 1999;1462:11–28.
[6] Shai Y, Oren Z. From “carpet” mechanism to de-novo designed diastereomeric cell-selective antimicrobial peptides. Peptides 2001;10:1629–41.


[7] Piotto S, Concilio S, Iannelli P, Mavelli F. Computer simulations of natural and synthetic polymers in confined systems. Macromol Symp 2009;286:25–33.
[8] Boman HG. Antibacterial peptides: basic facts and emerging concepts. J Intern Med 2003;254:197–215.
[9] de Jong A, van Hijum SA, Bijlsma JJ, Kok J, Kuipers OP. BAGEL: a web-based bacteriocin genome mining tool. Nucleic Acids Res 2006;34(Web server issue):W273–9.
[10] Gueguen Y, Garnier J, Robert L, Lefranc MP, Mougenot I, de Lorgeril J, et al. PenBase, the shrimp antimicrobial peptide penaeidin database: sequence-based classification and recommended nomenclature. Dev Comp Immunol 2006;30:283–8.
[11] Hammami R, Ben Hamida J, Vergoten G, Fliss I. PhytAMP: a database dedicated to antimicrobial plant peptides. Nucleic Acids Res 2009;37(Database issue):D963–8.
[12] Hammami R, Zouhir A, Ben Hamida J, Fliss I. BACTIBASE: a new web-accessible database for bacteriocin characterization. BMC Microbiol 2007;7:89.
[13] Li Y, Chen Z. RAPD: a database of recombinantly-produced antimicrobial peptides. FEMS Microbiol Lett 2008;289:126–9.
[14] Seebah S, Suresh A, Zhuo S, Choong YH, Chua H, Chuon D, et al. Defensins knowledgebase: a manually curated database and information source focused on the defensins family of antimicrobial peptides. Nucleic Acids Res 2007;35(Database issue):D265–8.
[15] Wade D, Englund J. Synthetic antibiotic peptides database. Protein Pept Lett 2002;9:53–7.
[16] Wang Z, Wang G. APD: the antimicrobial peptide database. Nucleic Acids Res 2004;32(Database issue):D590–2.
[17] Whitmore L, Wallace BA. The peptaibol database: a database for sequences and structures of naturally occurring peptaibols. Nucleic Acids Res 2004;32(Database issue):D593–4.
[18] Thomas S, Karnik S, Shankar Barai R, Jayaraman VK, Idicula-Thomas S. CAMP: a useful resource for research on antimicrobial peptides. Nucleic Acids Res 2010;38(Database issue):D774–80.
[19] UniProtKB/Swiss-Prot. http://www.uniprot.org/uniprot [accessed 12 January 2012].
[20] Wang G, Li X, Wang Z. APD2: the updated antimicrobial peptide database and its application in peptide design. Nucleic Acids Res 2008;37(Database issue):D933–7.
[21] Bjellqvist B, Basse B, Olsen E, Celis JE. Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions. Electrophoresis 1994;15:529–39.
[22] Radzeka A, Wolfenden R. Comparing the polarities of amino acids: side-chain distribution coefficients between vapor phase, cyclohexane, 1-octanol and neutral aqueous solution. Biochemistry 1988;27:1664–70.
[23] Eisenberg D, Weiss RM, Terwilliger TC, Wilcox W. Hydrophobic moments and protein structure. Faraday Symp Chem Soc 1982;17:109–20.
[24] Tossi A, Sandri L, Giangaspero A. New consensus hydrophobicity scale extended to non-proteinogenic amino acids. In: Peptides 2002. Proceedings of the Twenty-Seventh European Peptide Symposium. Napoli, Italy: Edizioni Ziino; 2002. p. 416–7.
[25] Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982;157:105–32.
[26] King RD, Sternberg MJE. Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Sci 1996;5:2298–310.
[27] Huang F, Nau WM. A conformational flexibility scale for amino acids in peptides. Angew Chem Int Ed Engl 2003;42:2269–72.
[28] Guruprasad K, Reddy BVB, Pandit MW. Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng 1990;4:155–61.
[29] Antimicrobial Sequences Database. http://www.bbcm.univ.trieste.it/~tossi/amsdb.html [accessed 12 January 2012].
