
Genomic Structure and Variation

164 | CovidReference.com

Hemagglutinin-Esterase Protein (HE)

An additional structural protein, the hemagglutinin-esterase (HE) protein, is present in a subset of Betacoronaviruses, such as murine CoVs and the human coronavirus HKU1. The HE protein is also located within the viral envelope, where it forms dimers that appear as short projections of 5-10 nm. It functions as a cofactor for the S protein, assisting with viral attachment and with transit through extracellular mucosa (Knipe 2013). It is, however, not a feature of SARS-CoV-2.

The CoV genome is composed of nonsegmented, single-stranded, positive-sense RNA ranging from 26 to 32 kilobases (kb) in length. CoV genomes are exceptionally large compared with those of other RNA viruses, which are generally less than 10 kb. Structurally, they share the same basic features: a 5’ cap structure, a 3’ polyadenylated tail, 5’ and 3’ untranslated regions (UTRs), a conserved set of structural and non-structural genes, and strain-specific accessory genes. The invariant gene order of all CoVs, from 5’ to 3’, is: ORF1a/b (the replicase gene), spike (S), envelope (E), membrane (M), and nucleocapsid (N), with accessory genes interspersed among the structural genes at the 3’ end (Perlman 2020, Chen Y 2020). Between the genes downstream of ORF1a/b lie short motifs of 6-49 nucleotides called transcription regulatory sequences (TRS). TRS play a role in the production of subgenomic RNA (sgRNA), which is discussed later in the chapter (O’Leary 2020).

Genomic sequencing of lower respiratory tract samples from index patients in Wuhan, China, identified SARS-CoV-2 as a novel coronavirus, and the CSG therefore placed it within the Coronaviridae family [Lu 2020]. Phylogenetic analysis of the relationship between SARS-CoV-2 and other CoVs clustered it in the Betacoronavirus genus, Sarbecovirus subgenus [Tan W 2020, Zhu N 2020]. Notably, it shares 94.4% homology with SARS-CoV in the seven conserved replicase domains in ORF1ab, forming a distinct clade within the species Severe acute respiratory syndrome-related coronavirus (SARSr-CoV).
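Similarity figures such as the 94.4% above come from comparing aligned sequences. The exact method behind that figure is not described here; as a minimal sketch, assuming two sequences that have already been aligned to equal length with "-" marking gaps, a simple percent-identity calculation might look like the Python below. The function name `percent_identity` and the toy sequences are hypothetical, and published studies differ in how they treat gaps and choose the denominator.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption (one common convention, not necessarily the one used in the
    cited studies): columns where BOTH sequences have a gap ('-') are skipped,
    and a column with a gap in only one sequence counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0  # number of alignment columns actually scored
    matches = 0   # number of scored columns with identical residues
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # gap in both sequences: ignore this column
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy, made-up protein fragments (not real CoV sequences): 7 of 8 match.
    print(percent_identity("MKTAYIAK", "MKTAYLAK"))  # 87.5
    # Toy nucleotide fragments with a single-sided gap: 4 of 5 columns match.
    print(percent_identity("AC-GT", "ACAGT"))  # 80.0
```

A real comparison would first align the sequences with a dedicated tool; the sketch only covers the final counting step.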

Virology | 165

Table 2. The functions of the 16 non-structural proteins (NSPs) translated from the CoV genome. NSP1-10 are translated from ORF1a, whereas NSP11-16 are translated from ORF1b.

Gene | NSP | Function
ORF1a | NSP1 | Inhibits interferon signaling; inhibits host protein synthesis; cellular mRNA degradation
ORF1a | NSP2 | Unknown
ORF1a | NSP3 | Tethers genome to the RTC, allowing initiation of RNA synthesis; papain-like protease (PLP) for polypeptide cleaving; inhibits host innate immune response; promotes cytokine expression
ORF1a | NSP4 | Transmembrane helices that anchor the RTC to intracellular membranes; double-membrane vesicle formation
ORF1a | NSP5 | Main protease (Mpro), also called chymotrypsin-like protease (3CLpro), for polypeptide cleaving; inhibits interferon signaling
ORF1a | NSP6 | Transmembrane helices that anchor the RTC to intracellular membranes; double-membrane vesicle formation
ORF1a | NSP7 | Essential small protein; hexadecameric complex (cofactor with NSP8 and NSP12)
ORF1a | NSP8 | Essential small protein; hexadecameric complex (cofactor with NSP7 and NSP12); primase
ORF1a | NSP9 | Essential small protein; RNA-binding protein; dimerization
ORF1a | NSP10 | Essential small protein; zinc-binding domain (ZBD) cofactor for 2’-O-methyltransferase (2’-O-MTase); scaffold protein for NSP14 and NSP16
ORF1b | NSP11 | Unknown
ORF1b | NSP12 | RNA-dependent RNA polymerase (RdRp)
ORF1b | NSP13 | Zinc-binding domain (ZBD); RNA 5’ triphosphatase (synthesis of the 5’ terminal cap structure of mRNA); RNA helicase (unwinds RNA duplexes with 5’-3’ polarity)
ORF1b | NSP14 | 3’-5’ exonuclease (proofreading activity that is unique to CoVs); N7-methyltransferase
ORF1b | NSP15 | Endoribonuclease
ORF1b | NSP16 | 2’-O-methyltransferase (2’-O-MTase); downregulation of host innate immunity


The SARSr-CoV species comprises hundreds of known viruses, predominantly isolated from humans and diverse bats. The reference to “SARS” in the species name can understandably be misleading, as SARS-CoV-2 and other SARSr-CoVs do not necessarily cause SARS-like clinical disease. SARS-CoV was the prototype of a new viral species, so its name was assigned to the species in line with established viral taxonomic practice. Accordingly, the nomenclature does not imply SARS-like disease; it refers to phylogenetic grouping within the species of the founding virus (CSG ICTV 2020, Wu Y 2020).

Each NSP has a specific role in coronavirus replication, although the functions of some are unknown or not well understood. Their functions are summarized in Table 2 (Knipe 2013, Chen Y 2020).

Downstream of the replicase gene, on the remaining one-third of the genome near the 3’ end, are the ORFs coding for the structural and accessory proteins. As previously mentioned, these genes are not translated directly from the viral genomic RNA but from subgenomic mRNA. Accessory genes are intercalated between the structural genes and can number from one to as many as eight. Occasionally, an accessory ORF may be embedded in or overlap with another gene, which makes bioinformatic analysis challenging. The accessory proteins vary in location and size among the different CoV sub-groups. Their functions remain elusive; however, they are thought to be involved in pathogenicity in the natural host (Michel 2020). Studies involving mutational knockouts or deletions of these accessory genes indicate that they are not essential for viral replication (Knipe 2013).
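The combination of a fixed 5’ to 3’ order for the conserved genes and a variable set of accessory genes slotted in among them can be expressed as a simple validation rule. The Python sketch below assumes a genome annotation already reduced to an ordered list of gene names; `check_gene_order` and the accessory names `acc1` and `acc2` are hypothetical placeholders, not real annotations or an established tool.

```python
from typing import List

# Invariant 5' -> 3' order of the conserved CoV genes, as described in the text.
CORE_ORDER: List[str] = ["ORF1ab", "S", "E", "M", "N"]


def check_gene_order(genes: List[str]) -> List[str]:
    """Validate a 5' -> 3' list of gene names and return the accessory genes.

    Assumptions for this sketch: each conserved gene appears exactly once,
    and every name not in CORE_ORDER is treated as an accessory gene.
    Raises ValueError if the conserved genes are missing, duplicated or out
    of order, or if anything precedes the replicase gene (ORF1ab), since
    accessory genes are expected only downstream of it.
    """
    core = [g for g in genes if g in CORE_ORDER]
    if core != CORE_ORDER:
        raise ValueError(f"conserved genes out of order or incomplete: {core}")
    if genes[0] != "ORF1ab":
        raise ValueError("expected ORF1ab to be the first gene at the 5' end")
    # Everything else is an accessory gene, reported in 5' -> 3' order.
    return [g for g in genes if g not in CORE_ORDER]


if __name__ == "__main__":
    # Hypothetical layout with two accessory genes among the structural genes.
    print(check_gene_order(["ORF1ab", "S", "acc1", "E", "M", "acc2", "N"]))
    # -> ['acc1', 'acc2']
```

Because accessory genes are simply filtered out before the order check, the rule tolerates anywhere from zero to many of them, mirroring the variability described above, while still rejecting a layout such as E placed before S.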

Figure 4. Organization of the Coronavirus Genome.

The SARS-CoV-2 genome is typical of a CoV, comprising 29,903 nucleotides of single-stranded, positive-sense RNA. Replicase and structural genes are pre-