Bioinformatic Identification of Putative Crassostrea virginica Defensin Genes Using a Custom Profile Hidden Markov Model
Jade Drawec and Maureen Krause
Department of Biology, Hofstra University, Hempstead, NY 11549
Methods
Introduction
Antimicrobial peptides (AMPs) are essential components of immune defense in all eukaryotes, particularly in invertebrates, which lack adaptive immunity. Specifically, molluscan hemocytes and epithelial cells secrete antimicrobial peptides known as defensins as part of their complex innate immune response.6 Most molluscan defensins belong to the cis-defensin superfamily, whose members are cationic, small (~4-5 kDa), and generally have 6-8 cysteine residues that form 3-4 disulfide bonds between the α-helix and a β-sheet, where the directionality of the amino acid sequences is parallel (cis).1, 6, 8 Their cysteine pattern should make them a prime target for bioinformatic identification; however, several factors make defensins difficult to predict from genome sequences, including that of the American oyster, Crassostrea virginica. The National Center for Biotechnology Information (NCBI) genome annotation pipelines set cutoffs for open reading frame (ORF) lengths that make discovering short polypeptides difficult.9 Aside from the cysteine motif, the amino acid sequences of molluscan cis-defensins are highly divergent, and extensive post-transcriptional and post-translational modifications may occur, along with the frequent insertions and deletions common to cysteine-rich peptides.2, 4, 6, 8, 11, 12 To date, no cis-defensin genes have been identified in the C. virginica reference genome, although one defensin has been identified through proteomics. Oysters support large global fisheries and aquaculture industries and provide essential ecosystem services but are challenged by numerous microbial diseases.10 Annotation of oyster defensins can contribute to a greater understanding of oyster immunity and support the potential use of these AMPs as therapeutic drugs. The purpose of this study was to implement a bioinformatic approach that uses amino acid sequences from known molluscan defensins to characterize putative defensin genes in existing C. virginica transcriptomes and to verify a manually identified multigene defensin cluster.
Figure 1. Structure of constitutively expressed Crassostrea gigas (Pacific oyster) cis-mantle defensin Cg-Defm (PDB: 2B58) from Schmitt et al. 2012, Figure 3 (left). Model of two cis-oriented disulfide bonds (yellow) formed between an α-helix (red) and a β-sheet (blue) (right).
Acknowledgements
We are very grateful to the Hofstra University Honors College Undergraduate Research Assistantship Program for its support of this project. We would also like to thank James Kuldell, Juliette Gorson, Anthony Ricigliano, and Gregory Quintanilla for their help throughout this learning process.
Table 2. BLASTN results for transcripts returned by hmmsearch of the digestive gland transcriptome of C. virginica exposed to crude oil (GHJJ00000000.1) using the custom Molluscan defensin pHMM.
NCBI Description | Max Score | Total Score | Query Cover | E value | Per. ident | Acc. Len | Accession
PREDICTED: Crassostrea virginica uncharacterized LOC111110325 (LOC111110325), ncRNA | 268 | 268 | 82% | 8.00E-71 | 87.79 | 325 | XR_002635799.1
PREDICTED: Crassostrea virginica uncharacterized LOC111106012 (LOC111106012), ncRNA | 178 | 313 | 97% | 1.00E-43 | 85.62 | 793 | XR_002635153.1
PREDICTED: Crassostrea virginica uncharacterized LOC111109893 (LOC111109893), transcript variant X1, ncRNA | 164 | 164 | 93% | 2.00E-39 | 74.8 | 370 | XR_002635718.1
PREDICTED: Crassostrea virginica uncharacterized LOC111109893 (LOC111109893), transcript variant X2, ncRNA | 148 | 148 | 85% | 2.00E-34 | 74.45 | 503 | XR_002635719.1
PREDICTED: Crassostrea virginica uncharacterized LOC111109894 (LOC111109894), ncRNA | 101 | 101 | 44% | 2.00E-20 | 80 | 292 | XR_002635720.1
PREDICTED: Crassostrea virginica uncharacterized LOC111106655 (LOC111106655), ncRNA | 96.9 | 96.9 | 38% | 3.00E-19 | 81.63 | 1057 | XR_002635244.1
PREDICTED: Crassostrea virginica uncharacterized LOC111128490 (LOC111128490), ncRNA | 96.9 | 96.9 | 44% | 3.00E-19 | 79.13 | 346 | XR_002638731.1
PREDICTED: Crassostrea virginica uncharacterized LOC111106020 (LOC111106020), ncRNA | 93.3 | 93.3 | 97% | 3.00E-18 | 69.26 | 355 | XR_002635155.1
PREDICTED: Crassostrea virginica uncharacterized LOC111109895 (LOC111109895), ncRNA | 87.8 | 87.8 | 40% | 1.00E-16 | 79.05 | 340 | XR_002635721.1
Table 3. Comparison of the number of significant transcripts and the number of BLASTN hits obtained with the two pHMMs (custom Molluscan, PF01097) for each of the transcriptomes.
Transcriptome | pHMM used for hmmsearch | Number of Transcripts (E-value < 1e-3) | Number of BLASTN Hits
McDowell et al. 2014 | Custom Molluscan | 13 | 2
McDowell et al. 2014 | PF01097 | 8 | 2
GGQE00000000.1 | Custom Molluscan | 66 | 9
GGQE00000000.1 | PF01097 | 43 | 0
GHJJ00000000.1 | Custom Molluscan | 34 | 9
GHJJ00000000.1 | PF01097 | 29 | 9
Figure 2. Bioinformatics pipeline for the identification of putative Crassostrea virginica cis-defensin-like genes. A custom profile HMM was created and, together with a previously made Arthropod Defensin pHMM (PF01097) (blue), used to search existing C. virginica transcriptomes (GGQE00000000.1, GHJJ00000000.1, McDowell et al. 2014) (green) for sequences that resemble defensins using HMMER (v3.3.2) (pink). BLASTN of the transcripts returned by hmmsearch yielded potential C. virginica defensin genes (yellow). The software or source used is indicated below the corresponding step.
Figure 3. Multigene C. virginica defensin cluster identified on chromosome 8 and one defensin gene identified on chromosome 4. Genes are represented by arrows; double lines denote a gap that includes a Gigasin-like gene. American Oyster Defensin (AOD)7 is believed to coincide with LOC111106012.
Results
Table 1. Transcript list resulting from hmmsearch of the transcriptome of bacteria-challenged C. virginica (McDowell et al. 2014) using the custom Molluscan pHMM. Grey highlighted transcripts were found in both pHMM searches of this transcriptome.
Transcript | Full Sequence E-value | Full Sequence Score | Full Sequence Bias | Best 1 Domain E-value | Best 1 Domain Score | Best 1 Domain Bias
comp11934_c0_seq1.p1 | 1.1e-15 | 62.1 | 9.1 | 1.3e-15 | 61.9 | 9.1
comp59630_c0_seq1.p1 | 3.6e-15 | 60.5 | 3.5 | 3.9e-15 | 60.4 | 3.5
comp16300_c0_seq1.p1 | 4.7e-11 | 47.3 | 3.6 | 5.1e-11 | 47.2 | 3.6
comp17621_c0_seq1.p1 | 7.3e-11 | 46.7 | 4.8 | 1.1e-10 | 46.1 | 4.8
comp35752_c0_seq1.p1 | 1.8e-08 | 39.1 | 4.5 | 2.3e-08 | 38.7 | 4.5
comp5314_c0_seq1.p1 | 1.9e-08 | 39.0 | 5.0 | 2.6e-08 | 38.5 | 5.0
comp22141_c0_seq1.p1 | 2e-08 | 38.9 | 7.9 | 2.7e-08 | 38.5 | 7.9
comp17515_c0_seq1.p1 | 2.2e-08 | 38.7 | 4.5 | 2.8e-08 | 38.4 | 4.5
comp17515_c0_seq2.p1 | 2.2e-08 | 38.7 | 4.5 | 2.8e-08 | 38.4 | 4.5
comp1155_c0_seq1.p1 | 4.6e-07 | 34.5 | 3.4 | 6.4e-07 | 34.1 | 3.4
comp4040_c0_seq1.p1 | 1.5e-06 | 32.8 | 9.6 | 2.4e-06 | 32.2 | 9.6
comp16636_c0_seq1.p3 | 7.2e-05 | 27.5 | 3.4 | 7.9e-05 | 27.4 | 3.4
comp567_c0_seq1.p1 | 2.3e-04 | 25.9 | 10.6 | 3.2e-04 | 25.4 | 10.6
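The columns in Table 1 (full-sequence and best-domain E-value, score, and bias) match the leading fields of HMMER's tabular output, so the E-value < 1e-3 cutoff reported in Table 3 could be applied programmatically. The following is a minimal Python sketch, assuming the hmmsearch results were saved with the `--tblout` option; the poster does not state how the filtering was actually done. The query name `molluscan_defensin`, the third example row, and the trailing placeholder columns are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    target: str
    full_evalue: float
    full_score: float
    full_bias: float
    dom_evalue: float
    dom_score: float
    dom_bias: float


def parse_tblout(lines: Iterable[str], max_evalue: float = 1e-3) -> List[Hit]:
    """Return hits whose full-sequence E-value is below max_evalue.

    Assumes the whitespace-delimited layout of `hmmsearch --tblout`:
    target name, target accession, query name, query accession, then
    full-sequence E-value/score/bias and best-domain E-value/score/bias.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        hit = Hit(fields[0], *(float(x) for x in fields[4:10]))
        if hit.full_evalue < max_evalue:
            hits.append(hit)
    return hits


# First two rows reuse values from Table 1; the query name, the third row,
# and the trailing columns are hypothetical placeholders.
example = [
    "# target name  accession  query name  accession  E-value  score  bias  ...",
    "comp11934_c0_seq1.p1 - molluscan_defensin - 1.1e-15 62.1 9.1 1.3e-15 61.9 9.1 1.0 1 0 0 1 1 1 1 -",
    "comp567_c0_seq1.p1 - molluscan_defensin - 2.3e-04 25.9 10.6 3.2e-04 25.4 10.6 1.0 1 0 0 1 1 1 1 -",
    "hypothetical_weak_hit.p1 - molluscan_defensin - 0.5 10.0 1.0 0.6 9.8 1.0 1.0 1 0 0 1 1 1 1 -",
]

hits = parse_tblout(example)
for hit in hits:
    print(hit.target, hit.full_evalue, hit.full_score)
```

Filtering on the full-sequence E-value rather than the best-domain E-value is a choice made for this sketch; either column could serve as the cutoff.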
Figure 4. Multiple sequence alignment of predicted amino acid sequences of putative C. virginica defensin genes. Red highlight denotes predicted signal peptide sequence. Yellow highlight identifies conserved cysteine pattern. ORFs predicted using Expasy web server, alignment performed using Clustal O (1.2.4), signal peptide prediction done using SignalP (5.0). Cysteine motif pattern present: C-X6-C-X3-C-X9-C-X7/8-C-X-C.
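The cysteine motif reported in Figure 4 can be written as a regular expression and used to screen predicted ORFs. The Python sketch below is illustrative only and is not part of the pipeline described here. It assumes that X stands for any single residue and that "X7/8" means a spacer of seven or eight residues. The test sequences are synthetic, not real defensins.

```python
import re
from typing import Optional, Tuple

# C-X6-C-X3-C-X9-C-X7/8-C-X-C, with X taken to mean any one-letter residue.
CYS_MOTIF = re.compile(r"C[A-Z]{6}C[A-Z]{3}C[A-Z]{9}C[A-Z]{7,8}C[A-Z]C")


def find_cys_motif(protein: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span of the first motif match, or None."""
    match = CYS_MOTIF.search(protein.upper())
    return match.span() if match else None


def synthetic_peptide(spacers) -> str:
    """Build a synthetic test peptide: cysteines separated by alanine runs."""
    core = "C" + "".join("A" * n + "C" for n in spacers)
    return "M" + core + "K"


with_x7 = synthetic_peptide([6, 3, 9, 7, 1])   # satisfies the motif
with_x8 = synthetic_peptide([6, 3, 9, 8, 1])   # satisfies the motif
broken = synthetic_peptide([5, 3, 9, 7, 1])    # first spacer too short

print(find_cys_motif(with_x7))
print(find_cys_motif(with_x8))
print(find_cys_motif(broken))
```

A screen like this finds only sequences that keep the exact spacing, so it shares the similarity bias of the pHMM approach and would miss defensins with insertions or deletions between the cysteines.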
Conclusions
• The multigene cluster of putative C. virginica defensin genes on chromosome 8, originally discovered by hand, was validated.
• Two additional C. virginica cis-defensin-like genes were identified (LOC11110984, LOC111128490), including a gene on chromosome 4 that is not part of the multigene cluster and the gene that might code for AOD, which was previously identified using proteomics.7
• The Arthropod defensin pHMM (PF01097) was a very good alternative to the custom Molluscan defensin pHMM: the transcript lists generated by hmmsearch with each model yielded the same BLASTN hits, except for transcriptome GGQE00000000.1, for which the PF01097 transcript list yielded 0 hits.
• Although this pipeline produced BLASTN hits, it still relies on sequence similarity, so it may find only defensins that closely resemble those already discovered. Used alone, such methods perpetuate the bias in phylogenetic comparisons among the known defensin families because they exclude defensins that lack sequence similarity to those previously discovered.
Future Directions
• Update the NCBI annotation of the loci presently described as ncRNAs, including analysis of the full reference transcriptome.
• Characterize tissue-specific expression of these genes through qPCR.
• Verify antimicrobial activity and specificity through functional studies of the isolated or recombinant peptides.
• Compare Crassostrea gigas and other oyster defensins to those found using this pipeline to gain insight into the evolution of this gene family.
• Investigate the possibility of defensin copy number variation, which is correlated with immune defense ability in some taxa.3
• The potential exists for selective breeding of oysters for disease tolerance based on defensin variants, or defensins could be used directly as probiotics or antibiotics.
References
1. Bachère, E. et al. (2015). The new insights into the oyster antimicrobial defense: Cellular, molecular and genetic view. Fish & Shellfish Immunology, 46(1), 50–64.
2. Garrett, S., & Rosenthal, J. J. C. (2012). RNA Editing Underlies Temperature Adaptation in K+ Channels from Polar Octopuses. Science, 335(6070), 848–851.
3. Machado, L. R., & Ottolini, B. (2015). An Evolutionary History of Defensins: A Role for Copy Number Variation in Maximizing Host Innate and Adaptive Immune Responses. Frontiers in Immunology, 6. https://doi.org/10.3389/fimmu.2015.00115
4. McDowell, I. C., Nikapitiya, C., Aguiar, D., Lane, C. E., Istrail, S., & Gomez-Chiarri, M. (2014). Transcriptome of American Oysters, Crassostrea virginica, in Response to Bacterial Challenge: Insights into Potential Mechanisms of Disease Resistance. PLoS ONE, 9(8), e105097.
5. Mitchell, M. L., Shafee, T., Papenfuss, A. T., & Norton, R. S. (2019). Evolution of cnidarian trans-defensins: Sequence, structure and exploration of chemical space. Proteins: Structure, Function, and Bioinformatics, 87(7), 551–560.
6. Schmitt, P., Rosa, R. D., Duperthuy, M., de Lorgeril, J., Bachère, E., & Destoumieux-Garzón, D. (2012). The Antimicrobial Defense of the Pacific Oyster, Crassostrea gigas. How Diversity may Compensate for Scarcity in the Regulation of Resident/Pathogenic Microflora. Frontiers in Microbiology, 3.
7. Seo, J.-K., Crawford, J. M., Stone, K. L., & Noga, E. J. (2005). Purification of a novel arthropod defensin from the American oyster, Crassostrea virginica. Biochemical and Biophysical Research Communications, 338(4), 1998–2004. https://doi.org/10.1016/j.bbrc.2005.11.013
8. Shafee, T. M. A. et al. (2017). Convergent evolution of defensin sequence, structure and function. Cellular and Molecular Life Sciences, 74(4), 663–682.
9. Thibaud-Nissen, F. et al. Eukaryotic Genome Annotation Pipeline. 2013 Nov 14. In: The NCBI Handbook [Internet]. 2nd edition. Bethesda (MD): National Center for Biotechnology Information (US); 2013-2021.
10. Zannella, C., Mosca, F., Mariani, F., Franci, G., Folliero, V., Galdiero, M., Tiscar, P. G., & Galdiero, M. (2017). Microbial Diseases of Bivalve Mollusks: Infections, Immunology and Antimicrobial Defense. Marine Drugs, 15(6), 182. https://doi.org/10.3390/md15060182
11. Zeng, X.-C., Luo, F., & Li, W.-X. (2006). Characterization of a novel cDNA encoding a short venom peptide derived from venom gland of scorpion Buthus martensii Karsch: Trans-splicing may play an important role in the diversification of scorpion venom peptides. Peptides, 27(4), 675–681.
12. Zielezinski, A., Vinga, S., Almeida, J., & Karlowski, W. M. (2017). Alignment-free sequence comparison: Benefits, applications, and tools. Genome Biology, 18(1), 186.