Genetics is the study of genes and heredity; it explains how certain traits are passed from parents to their offspring as a result of changes in the DNA sequence. A gene is a segment of DNA that contains the instructions for building molecules and for how they should work. A genome, on the other hand, is all the genetic material of an organism, including its genes and the other elements that control the activity of those genes. Genomics combines recombinant DNA, DNA sequencing methods and bioinformatics to sequence, assemble and analyze the structure and function of genomes. It also covers intragenomic phenomena such as epistasis, pleiotropy and heterosis. In the 1970s and 1980s, Fred Sanger's group established techniques for sequencing, genome mapping, bioinformatic analysis and data storage.
Human coronaviruses were first recognized in the 1960s, and seven types have since been identified. Coronaviruses are classified by protein sequence into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus and Deltacoronavirus. The Alpha- and Betacoronaviruses infect mammals, while the Gamma- and Deltacoronaviruses infect birds (Wu et al., 2020). RNA viruses must encode certain proteins in order to take control of their hosts. They are also well known for their high mutation rates, which allow them to adapt easily to their hosts. The genomic structure differs among the human coronavirus genera.
The seven coronaviruses known to infect humans are HCoV-229E, HCoV-HKU1, HCoV-NL63, HCoV-OC43, SARS-CoV, MERS-CoV and SARS-CoV-2 (Arabi et al., 2020). HCoV-229E and HCoV-NL63 belong to the Alphacoronavirus group, and each uses a different host protein as its receptor: HCoV-229E binds human aminopeptidase N (hAPN), whereas HCoV-NL63 binds angiotensin-converting enzyme 2 (ACE2). The five Betacoronaviruses also bind different host receptors: SARS-CoV and SARS-CoV-2 bind ACE2, HCoV-OC43 and HCoV-HKU1 use 9-O-acetylated sialic acids, and MERS-CoV uses dipeptidyl peptidase-4 (DPP4) (Li et al., 2019). HCoV-229E, HCoV-NL63, HCoV-OC43 and HCoV-HKU1 cause common-cold symptoms such as sneezing, fever, dry cough and sore throat. MERS-CoV, SARS-CoV and SARS-CoV-2 are highly pathogenic and can cause respiratory failure, particularly in patients with cardiopulmonary disease and in immunocompromised patients (Seah and Agrawal, 2020).
COMMON GENOMIC FEATURES AND CHARACTERIZATIONS OF CORONAVIRUS:
The coronavirus genome ranks among the largest of all RNA viruses. It is a single-stranded, positive-sense RNA of 26 kb to 32 kb in length, with a cap at the 5' end and a poly-A tail at the 3' end (Chan et al., 2020). ORF1ab is the largest gene in the coronavirus genome and covers almost two-thirds of the entire genome. The number of ORFs in the coronavirus genome ranges from 6 to 15 (Song et al., 2019). ORF1a and ORF1b encode the nonstructural proteins (NSPs) that form the replication-transcription complex, which is responsible for the synthesis and transcription of subgenomic RNA (sgRNA) (Gorbalenya et al., 2006). Transcription regulatory sequences located between the ORFs mediate sgRNA synthesis (Sawicki et al., 2007). A ribosomal frameshift between ORF1a and ORF1b results in the synthesis of two polypeptides, which are processed into 16 NSPs by the chymotrypsin-like main protease and the papain-like proteases (Masters, 2006). The remaining one-third of the genome encodes at least four structural proteins, namely the spike protein (S), nucleocapsid protein (N), envelope protein (E) and membrane protein (M), as well as accessory proteins such as 3a/b, 4a/b and the hemagglutinin-esterase protein (Hussain et al., 2005). Sequence alignment of CoV genomes shows 43% identity in the structural protein-coding regions and 58% identity in the nonstructural protein-coding regions, with about 54% identity across the entire genome. This suggests that the structural proteins are more diverse than the nonstructural proteins.
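The idea of an open reading frame can be illustrated with a minimal sketch that scans the three forward reading frames of a sequence for a start codon followed by an in-frame stop codon. The function name `find_orfs`, the `min_codons` threshold and the toy sequence are assumptions made for illustration; the sequence is not taken from any coronavirus genome. Real coronavirus annotation also has to account for the ORF1a/ORF1b frameshift and the transcription regulatory sequences, which this sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based, end is exclusive and includes the stop codon.
    An ORF here is the first ATG in a frame up to the next in-frame stop.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it and keep scanning this frame
    return sorted(orfs)

# Hypothetical toy sequence with two short ORFs in different frames.
toy = "CCATGAAATAGGGATGCCCTTTTGA"
for start, end, frame in find_orfs(toy):
    print(frame, toy[start:end])
```

On the toy input the scan reports one ORF in each of two frames, which is the same overlapping-frame situation that makes a frameshift between ORF1a and ORF1b possible.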
GENOME COMPOSITION AND ORIGIN OF SARS-CoV-2 BASED ON GENE SEQUENCING
The SARS-CoV-2 genome contains 14 ORFs encoding 27 proteins (Srinivasan et al., 2020). At the 5' terminus of the genome, ORF1ab and ORF1a encode the polypeptides pp1ab and pp1a, respectively. The 3' terminus encodes four structural proteins (spike, envelope, membrane and nucleocapsid) and the accessory proteins 3a, 3b, 6, 7a, 7b, 8b, 9a, 9b and ORF10. The SARS-CoV-2 genome is about 29,903 nucleotides long. The highly mutable spike protein of the virus is probably related to the increased human-to-human transmission rate through its interaction with the host's ACE2 receptor [20]. Ugurel et al. reported the C14408T variant in Nsp12 and the A23403G variant in the spike protein, both of which are associated with significant changes in virus variants worldwide [66].
Bats remain the prime suspected origin of SARS-CoV-2, because the whole-genome identity between SARS-CoV-2 and the bat coronavirus RaTG13, isolated from Rhinolophus affinis, is 96%. An important genomic difference between SARS-CoV-2 and RaTG13 is a furin cleavage site, formed by additional amino acid residues, near the junction of the spike protein subunits S1 and S2 (Coutard et al., 2020). When researchers compared SARS-CoV-2 and SARS-CoV at the amino acid level, they found that SARS-CoV-2 was quite similar to SARS-CoV, but with noticeable differences in the 8a, 8b and 3b proteins [14]. When researchers compared SARS-CoV-2 with MERS-CoV, they found that the two were only distantly related. The phylogenetic tree based on whole genomes shows that SARS-CoV-2 is parallel to the SARS-like bat CoVs, while SARS-CoV has descended from the SARS-like bat CoV lineage, indicating that SARS-CoV-2 is much closer to the SARS-like bat CoVs than to SARS-CoV on the basis of whole-genome sequencing. Analysis of the genomes from nine patient samples also confirmed that SARS-CoV-2 was more similar to two SARS-like bat CoVs from Zhoushan in eastern China, bat-SL-CoVZC45 and bat-SL-CoVZXC21, than to SARS-CoV and MERS-CoV [17]. At the whole-genome level, SARS-CoV-2 shares 87.99% sequence identity with bat-SL-CoVZC45 and 87.23% with bat-SL-CoVZXC21, and is less similar to SARS-CoV (about 79%) and MERS-CoV (about 50%) [17]. At the protein level, the lengths of most of the proteins encoded by SARS-CoV-2, bat-SL-CoVZC45 and bat-SL-CoVZXC21 were very similar, with only a few minor insertions or deletions [17].
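Sequence identity figures such as those quoted above are computed from an alignment of the two sequences. The following is a minimal sketch of one common definition, assuming the sequences have already been aligned to equal length with `-` marking gaps; the function name `percent_identity` and the short example strings are illustrative assumptions, and published studies may use a different denominator (for example, excluding gapped columns), so the numbers are not directly comparable to the cited values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences.

    Columns where both sequences have a gap are skipped; a column with a
    gap in only one sequence counts as a mismatch (one convention of several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # uninformative column
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

# Hypothetical toy alignments, not real viral sequences.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # one mismatch in ten columns
print(percent_identity("ACG-T", "ACGAT"))            # one gap counted as a mismatch
```

Producing the alignment itself is a separate step that is normally delegated to a dedicated alignment tool; the sketch only covers the counting.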
Although SARS-CoV-2 was closer to bat-SL-CoVZC45 and bat-SL-CoVZXC21 at the whole-genome level on the phylogenetic tree, the receptor-binding domain of SARS-CoV-2, located in lineage B, was closer to that of SARS-CoV [17]. It was also reported that 27 of the first 41 infected patients had been exposed to the Huanan Seafood Market [18]. Thus, it was believed that the new coronavirus originated from the Huanan Seafood Market in Wuhan and spread from animal hosts to humans during the trade, transportation and slaughter of wildlife. Bats harbor the greatest variety of coronaviruses and host many kinds of them, such as SARS-CoV and MERS-CoV [19]. SARS-CoV and MERS-CoV are considered highly pathogenic, and it is very likely that SARS-CoV was transmitted from bats to palm civets, and MERS-CoV from bats to dromedary camels, before finally reaching humans [20, 21].
DIFFERENT VARIANTS AND STRAINS:
Alteration of the SARS-CoV-2 genome through mutation and recombination can lead to changes in the viral life cycle, including transmissibility, cellular tropism and severity of disease. The different clinical outcomes seen in COVID-19 patients may result from SARS-CoV-2 genome mutations. Single-stranded RNA viruses mutate much faster than the human genome, at rates of about 10⁻⁶ to 10⁻⁴ and 10⁻⁸, respectively [76, 77]. This produces numerous quasi-species in each infected individual, which may explain the observed differences in symptoms and disease severity [78]. Altered ACE2 binding interactions or shifted tissue tropism might arise from a mutation among the viral progeny and cause aggressive and extensive infection [20]. Preliminary studies at the beginning of the outbreak identified two major genotypes of SARS-CoV-2 in a Chinese population, type I and type II [18]. The prevalence of the aggressive form decreased in the early months because of the start of treatment, and the mild form became the common variant [24].
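To give a feel for what these rates mean at the scale of a viral genome, the sketch below multiplies a per-site rate by a genome length. It rests on several assumptions that the text does not state: the rates are treated as per nucleotide per replication cycle, the genome length is rounded to 30,000 nucleotides (within the 26 kb to 32 kb range typical of coronaviruses), and mutations are modelled as independent Poisson events. The function names are hypothetical.

```python
import math

def expected_mutations(rate_per_site, genome_length):
    """Expected number of new mutations in one copy of the genome."""
    return rate_per_site * genome_length

def prob_at_least_one(rate_per_site, genome_length):
    """Probability that a copy carries at least one new mutation (Poisson model)."""
    return 1.0 - math.exp(-expected_mutations(rate_per_site, genome_length))

GENOME_LENGTH = 30_000  # assumed round figure for a coronavirus genome

# Lower and upper RNA-virus rates from the text, plus the human rate for scale.
for label, rate in [("RNA virus (low)", 1e-6),
                    ("RNA virus (high)", 1e-4),
                    ("human rate, same length", 1e-8)]:
    print(label,
          round(expected_mutations(rate, GENOME_LENGTH), 4),
          round(prob_at_least_one(rate, GENOME_LENGTH), 4))
```

Under these assumptions the expected number of new mutations per genome copy runs from about 0.03 to about 3, which is why a single infected host can accumulate a cloud of closely related variants.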
Further studies identified three major variant types of SARS-CoV-2 (A, B and C) based on amino acid changes [22]. Forster et al. also confirmed these three types by phylogenetic analysis of 160 viral genomes [32]. Type A is the ancestral type; type B viruses prevailed in East Asia, while both type A and type C viruses have been dominant in the Americas and Europe. Type B arose from type A through two mutations, the synonymous mutation T8782C and the non-synonymous mutation C28144T, in which serine was replaced with leucine. Type C was in turn derived from type B by the non-synonymous mutation G26144T, in which valine replaces glycine [32, 80, 81].
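Labels such as T8782C or G26144T follow the pattern reference base, 1-based position, new base, and whether a change is synonymous depends on the codon it falls in. A minimal sketch of that logic is given below, assuming a coding sequence that starts at position 1 and is read in frame; the helper names and the 12-nucleotide toy sequence are hypothetical, and the toy positions are not genomic coordinates. The toy was chosen so that its changes mirror the serine-to-leucine and glycine-to-valine replacements described above.

```python
import re

# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

def parse_mutation(label):
    """Split a label like 'C5T' into (reference base, 1-based position, new base)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label.upper())
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)

def classify_mutation(cds, label):
    """Return (old amino acid, new amino acid, kind) for a point mutation.

    Assumes `cds` is an in-frame coding sequence whose first base is position 1.
    """
    ref, pos, alt = parse_mutation(label)
    cds = cds.upper()
    index = pos - 1
    if not 0 <= index < len(cds) - len(cds) % 3:
        raise ValueError("position lies outside the complete codons")
    if cds[index] != ref:
        raise ValueError("reference base does not match the sequence")
    codon_start = index - index % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = index % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind

toy_cds = "ATGTCAGGTTAA"  # hypothetical: Met-Ser-Gly-stop
for label in ("C5T", "G8T", "A6G"):
    print(label, classify_mutation(toy_cds, label))
```

Applying the same idea to real genomic coordinates would additionally require the start position and reading frame of the gene that contains the mutation, which is why annotation of the reference genome matters.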
Recent studies from around the world identified eight strains of SARS-CoV-2, although their sequences are highly similar [50]. Liu et al. also recognized four distinct groups of common single-nucleotide variants (SNVs) in more than 28,000 high-quality, high-coverage complete SARS-CoV-2 genome sequences, indicating different viral strains [46]. These reports were consistent with the findings of two studies in Italy and the United States, in which about 4 to 10 stable non-synonymous mutations were reported in the SARS-CoV-2 genome [11, 50]. Likewise, one of the mutations in the spike protein (D614G) has been observed repeatedly in Europe and the United States since the beginning of the pandemic, apparently because it dramatically increases the transmissibility of SARS-CoV-2. It has thus become the most common variant [41, 56].
Although these SARS-CoV-2 mutations appear to be very stable, continuous monitoring of the virus's mutations remains very important. A large study by Poterico and colleagues characterized two novel mutations in the S region across 691 complete SARS-CoV-2 genomes from around the world. They highlighted that the virus had acquired about 27 mutations and that the strains from most South American countries are closely related to European isolates [57]. Meanwhile, a unique mutation, 24351C>T (A930V), in the spike surface glycoprotein was reported in one of the Wuhan strains isolated in India [24]. In a study conducted in Singapore, attenuation of SARS-CoV-2 was attributed to a 382-nucleotide deletion in ORF8 of the viral genome [23]. A survey conducted in Mexico in mid-March reported evidence of local transmission of strains carrying an H49Y mutation in the spike protein [63]. According to Castello et al., the first three cases were classified as S type on the basis of the ORF8 amino acid change at position 28,144, and the fourth case as G type on the basis of position 23,403 [29].
THE EFFECTS OF VARIANTS ON THE VIRAL TRANSMISSION:
Some mutations facilitate the transmission of SARS-CoV-2 between animal species and humans. The G-U transversion excess might also play a role in bat-to-human transmission [51]. In addition, a specific motif in the S1/S2 junction region can enable viral exchange between species [44]. With regard to transmission between humans, some fundamental facts are noteworthy. Rapid viral replication may cause rapid morbidity and mortality and thereby hinder passage of the virus to healthy individuals. Viruses that replicate more slowly and cause asymptomatic or mild disease can, by contrast, be transmitted over a more extended period [27].