A CONCISE IN SILICO PREDICTION REPORT OF A POTENTIAL PRION-LIKE DOMAIN IN SARS-COV-2 POLYPROTEIN

COVID-19 has shown higher virulence than the previous coronavirus epidemics and has been shown to damage the nervous system. In the present study, the PrionW web server was used to predict prion-like domains (PrLDs) in 15 structural and non-structural proteins of SARS-CoV, MERS-CoV and SARS-CoV-2. Among all of these proteins, the results demonstrated one PrLD with the sequence (951)EDDYQGKPLEFGATSAALQPEEEQEEDWLDDDSQQTVGQQDGSEDNQTTTIQTIVEVQPQL(1012), having an amyloid-core of (988)GQQDGSEDNQTTTIQTIVEVQ(1009), in the non-structural protein of SARS-CoV-2 with a pWALTZ_Score of 59.9936. The sequence of the SARS-CoV-2 polyprotein was further investigated with the FoldIndex© tool, and a negative fold index was found at the site of the predicted prion-like domain. Multiple sequence alignment of this region with the non-structural proteins of SARS-CoV and MERS-CoV showed no sequence similarity between the predicted region and the corresponding regions of the two other viruses. Considering the high similarity between the polyproteins of SARS-CoV-2 and SARS-CoV, and their ability to affect the nervous system, it could be suggested that a potential PrLD might have been added to the SARS-CoV-2 polyprotein.


INTRODUCTION
Changes in human behavior and environmental factors have led to the emergence of more than 30 new infectious diseases in recent decades (Nkengasong 2020). In 2002 and 2003, severe acute respiratory syndrome coronavirus (SARS-CoV) emerged in humans. The second coronavirus epidemic, caused by Middle East respiratory syndrome coronavirus (MERS-CoV), appeared in 2012 (Weston and Frieman 2020). In 2019-20, the COVID-19 pandemic was caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Ahmed et al. 2020). Like the coronaviruses that caused the previous epidemics, SARS-CoV-2 is a member of the β-coronavirus genus (Lu et al. 2020). Coronaviruses are characterized by a large single-stranded, positive-sense RNA genome. The first two-thirds of the genome encodes two large polyproteins that are cleaved into the non-structural proteins necessary for the production of viral particles. The rest of the genome encodes the structural proteins, including the envelope, spike, membrane, and nucleocapsid proteins. Recent information indicates that neurotropism is a common feature of coronaviruses (Glass et al. 2004). It has been reported that COVID-19 infection can affect the brains of patients and animal models (Florindo et al. 2020), which would explain why COVID-19 patients also present neurological symptoms involving the central nervous system (CNS). Previously, it was reported that in mice infected with MERS-CoV, viral particles were found only in the brain and not in the lung, which suggests that infection of the CNS could be more critical for the high mortality rate in the infected mice (Li et al. 2020). Consequently, it seemed necessary to predict the prion-like domains (PrLDs) in various proteins of SARS-CoV, MERS-CoV, and SARS-CoV-2 and to look for a possible relationship between the neural damage capability of these viruses and the presence of PrLDs.
Prion disorders are infectious neurodegenerative conditions that occur in both humans and animals and can cause morbidity and mortality (Aguzzi et al. 2008). Prion-like domains are low-complexity sequences found in RNA-binding proteins. Moreover, PrLDs have been implicated in mediating gene regulation through liquid-phase transitions that drive ribonucleoprotein granule assembly (Hennig et al. 2015). It has previously been indicated that some viral proteins, such as the baculovirus-encoded protein LEF-10, show prion-like behavior (Nan et al. 2019); it is therefore possible that viruses which affect the nervous system, such as the virus causing COVID-19, express proteins with such properties. The main event in prion-related disorders is the aggregation of misfolded prion proteins into large amyloid plaques and fibrous structures that can induce neurodegeneration. The importance of prion-like domains in eukaryotic viruses, including those associated with human disorders, has been reported by earlier research (Tetz and Tetz 2018). Moreover, recent findings suggest that, in addition to neurological disorders, proteins with prion-like domains might also be involved in other pathologies, such as cancers, and in the viral infection process. Bioinformatics is now widely used in research on viruses. Accordingly, in the present work a bioinformatics approach was used to investigate the possible presence of a prion-like domain in SARS-CoV-2 that could influence different aspects of pathogenesis and of the immune response to this pandemic infection.

Data collection
This study is based on the premise that bioinformatics prediction of prion propensity is feasible and can help identify potential prion-like domain sequences (Castilla and Requena 2015). First, the amino acid sequences of the structural and non-structural proteins of SARS-CoV-2, SARS-CoV and MERS-CoV were retrieved from NCBI (https://www.ncbi.nlm.nih.gov/) (Table 1).

Prion structure prediction
In the next step, PrionW (http://bioinf.uab.cat/prionw/), a web-based program that scans sequences to identify proteins containing PrLDs (Zambrano et al. 2015; Tetz and Tetz 2018b), was applied to the 15 above-mentioned proteins. This server predicts proteins containing Q/N-rich prion-like domains (PrLDs) and recognizes the 21-residue amyloid-cores that nucleate their amyloid assembly. It has previously been reported that the potency of the amyloid-core is associated with the prionic propensity of Q/N-rich peptides. In this study, a Q+N richness ≥ 20 and a pWaltz cut-off of 50 were used. Since one of the properties of a prion is its lack of proper folding (Sabate et al. 2015), the sequence of YP_009724389.1, which was shown to have a prion-like domain, was further investigated with the FoldIndex© tool, using a window size of 51 (Prilusky et al. 2005).
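The two sequence-level quantities used in this step can be illustrated with a short Python sketch. It is not the code of PrionW or FoldIndex©, and it does not compute the pWaltz score. It assumes that the Q+N richness threshold is read as a percentage and that the fold index follows the published form I = 2.785<H> - |<R>| - 1.151, with <H> the mean Kyte-Doolittle hydropathy rescaled to the 0-1 range and <R> the mean net charge. The function names are hypothetical.

```python
# Illustrative sketch only; not the PrionW or FoldIndex implementation.
# Kyte-Doolittle hydropathy values for the 20 standard residues.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Sequences as reported in this study.
DOMAIN = "EDDYQGKPLEFGATSAALQPEEEQEEDWLDDDSQQTVGQQDGSEDNQTTTIQTIVEVQPQL"
CORE = "GQQDGSEDNQTTTIQTIVEVQ"


def qn_richness(seq):
    """Percentage of residues in seq that are glutamine (Q) or asparagine (N)."""
    return 100.0 * (seq.count("Q") + seq.count("N")) / len(seq)


def fold_index(seq):
    """Fold index of a segment; negative values suggest an unfolded region."""
    n = len(seq)
    # Mean hydropathy, rescaled from [-4.5, 4.5] to [0, 1].
    mean_h = sum((KD[aa] + 4.5) / 9.0 for aa in seq) / n
    # Mean net charge: (K + R) minus (D + E), per residue.
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    mean_r = (positive - negative) / n
    return 2.785 * mean_h - abs(mean_r) - 1.151


def fold_index_profile(seq, window=51):
    """Fold index of every full-length window along seq (empty if seq is shorter)."""
    return [fold_index(seq[i:i + window]) for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    print("Q+N richness of the amyloid-core (%):", round(qn_richness(CORE), 2))
    print("Fold index of the predicted domain:", round(fold_index(DOMAIN), 3))
```

Applied to the full polyprotein with window=51, fold_index_profile would give one value per window position; the server's handling of sequence ends and of non-standard residues may differ from this sketch.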

Sequence alignment
To identify regions of similarity between any two protein sequences, the Lalign program was used (Huang and Miller 1991). Lalign implements the algorithm of Huang and Miller, a time-efficient algorithm that computes the k best "non-intersecting" local alignments for any selected k. The advantage of this algorithm is that it requires only O(M + N + K) space, where M and N are the lengths of the two sequences and K is the total length of the computed alignments. For multiple sequence alignment of proteins, the Multalin server (http://multalin.toulouse.inra.fr/multalin/) was used (Corpet 1988). This server first aligns the closest sequences, which creates groups of aligned sequences. The closest groups are then aligned with each other until all sequences are aligned in a single group. Finally, to find the amino acid sequences closest to the region predicted as a prion-like domain in the SARS-CoV-2 polyprotein, a BLAST search (https://blast.ncbi.nlm.nih.gov) was carried out.
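The idea of a local alignment can be illustrated with a minimal Smith-Waterman sketch in Python. This is a swapped-in textbook method, not the Huang and Miller algorithm used by Lalign: it returns only the single best local alignment, uses O(MN) space rather than O(M + N + K), and applies a simple match/mismatch/linear-gap scheme whose values are assumptions chosen for illustration instead of a substitution matrix.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b (Smith-Waterman, linear gap penalty).

    Returns (score, aligned_a, aligned_b). The scoring values are illustrative.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Score matrix; row 0 and column 0 stay at zero.
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a new local alignment
                score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy example: a short motif embedded in a longer sequence.
    print(local_align("GQQDG", "AAGQQDGAA"))
```

For sequences as long as the coronavirus polyproteins, the full score matrix kept by this sketch becomes large, which is the practical reason a linear-space algorithm such as that of Huang and Miller is preferable.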

Molecular Modelling
Tertiary structure prediction for the amyloid-core was carried out with the Quark modeler program (https://zhanglab.ccmb.med.umich.edu/QUARK/). This server provides ab initio protein structure prediction and peptide folding, aiming to build the appropriate tertiary model of a protein from its amino acid sequence. Models provided by this server are assembled from small fragments (1-20 residues) via replica-exchange Monte Carlo simulation guided by an atomic-level knowledge-based force field (Xu and Zhang 2013). Ramachandran plot and Z-score analysis were used to inspect the predicted model.

Phylogenetic investigation
To analyze the evolutionary relationship between the selected sequences, phylogenetic tree analysis was carried out using the Neighbor-Joining method (Saitou and Nei 1987). A bootstrap test with one thousand replicates was used for the phylogenetic study (Felsenstein 1985). The evolutionary distances were computed using the Poisson correction method (Zuckerkandl and Pauling 1965) and are in units of the number of amino acid substitutions per site. This analysis covered the 15 amino acid sequences selected for the study. All ambiguous positions were removed for each sequence pair (pairwise deletion option). There were a total of 7358 sites in the final dataset. Evolutionary analyses were carried out using the MEGA X program (Kumar et al. 2018).
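The Poisson correction converts the observed proportion p of differing sites between two aligned sequences into an estimated number of substitutions per site, d = -ln(1 - p). The Python sketch below illustrates this together with pairwise deletion. It is not MEGA X code; the function name and the rule for what counts as an ambiguous position (any character outside the 20 standard residues, including gaps) are assumptions made for the example.

```python
import math

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two aligned protein sequences.

    Positions where either sequence has a gap or a non-standard symbol are
    skipped for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a, seq_b):
        if x not in STANDARD_RESIDUES or y not in STANDARD_RESIDUES:
            continue  # pairwise deletion of gapped or ambiguous sites
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differing / compared
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Toy alignment: 4 comparable sites, 1 difference, so p = 0.25.
    print(round(poisson_distance("ACDE-", "ACDFK"), 4))
```

A matrix of such pairwise distances is the input that a Neighbor-Joining procedure would use to build the tree.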

Prion like domain and amyloid-core investigation
Analysis of the amino acid sequences with the PrionW tool indicated that, among all the structural and non-structural proteins of these coronaviruses, the only protein containing a prion-like domain was the polyprotein of SARS-CoV-2. The predicted amino acid sequence of the prion-like domain was 951 EDDYQGKPLEFGATSAALQPEEEQEEDWLDDDSQQTVGQQDGSEDNQTTTIQTIVEVQPQL 1012, with an amyloid-core of 988 GQQDGSEDNQTTTIQTIVEVQ 1009, located in the non-structural protein 3 (Nsp3) region (YP_009725299 locus), which has 1945 amino acids. Further analysis of the SARS-CoV-2 polyprotein using FoldIndex© indicated that the sequence has a negative fold index in the region predicted by PrionW, which further supports the possibility of prion-like potential for this sequence (Fig. 1).

Sequences alignment
Pairwise sequence alignment of the polyproteins showed limited similarity between the polyproteins of SARS-CoV-2 and MERS-CoV (72.0%), whereas the similarity between the polyproteins of SARS-CoV-2 and SARS-CoV was 95.9% (Table 2). Multiple sequence alignment of the region predicted to be a prion-like domain in the SARS-CoV-2 polyprotein with the corresponding regions in SARS-CoV and MERS-CoV demonstrated that, apart from the position of one residue, there was no sequence similarity between these regions. It could be concluded that the region might have been added to the SARS-CoV-2 polyprotein. The BLAST results showed that the amino acid sequence closest to the predicted amyloid-core of the prion-like domain was found in the orf1ab polyprotein of bat (Rhinolophus affinis) coronavirus RaTG13 (GenBank accession number QHR63299.1) (Fig. 2).
The molecular weight calculated from the sequences was 6825.97 g/mol for the domain and 2291.34 g/mol for the amyloid-core. According to the predicted model, the amyloid-core was located inside a beta sheet, which is in agreement with the structure proposed for amyloid-cores in prion-like domains. Fig. 3 shows the 3D structure of the predicted amyloid-core model from the Quark ab initio modeler (Fig. 3a), the topology of the predicted amyloid-core model as given by PDBsum (Fig. 3b), the Ramachandran analysis using PROCHECK (Fig. 3c), and the Z-plot of the predicted model from the ProSA web server (Fig. 3d). According to the Z-plot, the z-score of the predicted model was -0.34, and the Ramachandran plot indicated that none of the residues were in the disallowed region.
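The sequence-based molecular weights can be approximated by summing average residue masses and adding one water molecule for the free termini. The Python sketch below illustrates that calculation; the tool actually used in this study is not stated, so the mass table (standard average residue masses) and the function name are assumptions, and small differences in the last decimals are to be expected.

```python
# Average residue masses in g/mol (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini

# Sequences as reported in this study.
DOMAIN = "EDDYQGKPLEFGATSAALQPEEEQEEDWLDDDSQQTVGQQDGSEDNQTTTIQTIVEVQPQL"
CORE = "GQQDGSEDNQTTTIQTIVEVQ"


def molecular_weight(seq):
    """Average molecular weight (g/mol) of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


if __name__ == "__main__":
    print("Domain:", round(molecular_weight(DOMAIN), 2), "g/mol")
    print("Amyloid-core:", round(molecular_weight(CORE), 2), "g/mol")
```

Under these assumptions the sketch gives values within about 0.1 g/mol of the figures reported above.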

Figure 3
(a) The 3D predicted structure of the amyloid-core obtained with the Quark server (b) Topology of the predicted amyloid-core structure (c) Ramachandran analysis of the predicted amyloid-core model (d) Z-plot analysis of the predicted amyloid-core model

Phylogenic tree analysis
The results of the bootstrap-based phylogenetic tree analysis are provided in Fig. 4, which shows the optimal tree with a sum of branch lengths of 7.24424446. The percentages of replicate trees in which the associated taxa clustered together in the bootstrap test are shown beside the branches. The tree is drawn to scale, with branch lengths in the same units as the evolutionary distances used to infer it.

DISCUSSION
In 2002, SARS-CoV infected a large human population. The virus originated in bats and was transmitted through palm civets to humans in the Guangdong province of China, eventually affecting more than eight thousand individuals. A year later, SARS-CoV infections ceased, and no further epidemic has been reported (Zimmermann and Curtis 2020). MERS-CoV, the second epidemic coronavirus, appeared in 2012; like SARS-CoV, it initially presented as pneumonia, originated in bats, and reached humans via camels. However, MERS-CoV showed more restricted human-to-human transmission than SARS-CoV. Since 2012, there have been only about 2,500 cases, most of them in the Middle East (Ahmed et al. 2020). SARS-CoV and SARS-CoV-2 are similar in many respects: they share a common receptor, angiotensin converting enzyme 2 (ACE2) (Aguzzi and Heikenwalder 2006), and several of their proteins show similarities. It has been indicated that CoVs are not always restricted to the respiratory system and can also affect the CNS, resulting in neurological disorders. Such a neuroinvasive tendency has been reported in several CoVs, most of them β-CoVs, such as SARS-CoV, MERS-CoV, HCoV-229E, HCoV-OC43, mouse hepatitis virus, and porcine hemagglutinating encephalomyelitis coronavirus (HEV) (Seeger et al. 2005). Because of the high resemblance between SARS-CoV and SARS-CoV-2, it is possible that SARS-CoV-2 has similar properties (Li et al. 2020). The emergence of bioinformatics as a science and its accelerated progress over the last decade have reduced the cost and financial burden of laboratory work by providing useful predictions in different areas of the life sciences.
In silico assays provide a platform for analyzing many molecular interactions that could be of importance in various studies, such as characterizing and sorting biological macromolecules or studying the efficacy of designed therapeutic agents. Recent evidence has demonstrated that prion-like proteins could be present in some viruses that affect plants, insects, mammals, and humans. Additionally, some prion domains have been identified in proteins of viruses that influence human health, including human Herpes viruses and Hepatitis B and C. Earlier studies have reported that human Herpes virus could be associated with some neurological conditions, such as Alzheimer's disease, and the presence of HSV1 antigens in the cerebrospinal fluid of Alzheimer's patients has been reported (Itzhaki 2014). The presence of multiple PrDs in the HCV1 protein has been suggested to correlate with the disordered condition of patients (Tetz and Tetz 2018). COVID-19 has spread more quickly than SARS-CoV, MERS-CoV and any other coronavirus. One possibility is that SARS-CoV-2 is better at entering human cells because it contains a region that resembles a PrLD (Tetz and Tetz 2018). A recent bioinformatics study by Tetz et al. predicted the presence of a prion-like domain in the spike protein (receptor-binding domain of the S1 subunit) of SARS-CoV-2 using the PLAAC algorithm (Tetz and Tetz 2020). Bioinformatics methods only provide predictions, not all in silico predictions are confirmed, and different algorithms can give different results. Nevertheless, such predictions can be considered a warning sign for those who use proteins with potential prion-like domains in in vivo studies and clinical trials, and more in-depth and conclusive laboratory assays regarding the presence of these domains need to be carried out.
In the current study, the presence of a prion-like domain in the polyprotein of SARS-CoV-2 was investigated using the PrionW tool, which predicts proteins containing Q/N-rich prion domains. The predicted region of the potential prion-like domain was located within Nsp3, which in coronaviruses is a multifunctional protein comprising different domains and regions. Nsp3 has been shown to bind viral RNA, the nucleocapsid protein, and other viral proteins, and it participates in polyprotein processing. Although some structural data are available for Nsp3, it remains poorly characterized in coronaviruses (Lei et al. 2018). Since Nsp3 associates with RNA, and PrLDs are found in RNA-binding proteins, it is possible that the prediction of a PrLD in Nsp3 of SARS-CoV-2 will be confirmed in laboratory assays. Moreover, because of the connection of Nsp3 with replication-transcription complexes (RTCs), this protein has been suggested as a potential drug target, and any structural information about it is valuable (Khan et al. 2020).

Figure 4
Circle presentation of the phylogenetic tree constructed from the 15 sequences selected for studying the relation between protein structures

For prion-like domains, the structure of the amyloid-core is of critical importance, because the amyloid-core is what gives the domain its prion characteristics. The structure of amyloid-cores is still being defined in the literature, and they are thought to adopt a cross-β-sheet conformation (Sant'Anna et al. 2016). Although this study presents the prediction of a proposed prion-like domain and its amyloid-core, more in-depth laboratory study of this issue is highly necessary. Further investigation is required to ascertain the importance of such properties and to confirm the suggested role. Any information in this regard will be useful for preventing possible future viral outbreaks with similar properties and might help control the current COVID-19 pandemic.

CONCLUSION
In this study, no PrLD was predicted in any of the 15 proteins studied, except for the polyprotein of SARS-CoV-2. Overall, this report suggests the potential impact of a PrLD in SARS-CoV-2, which probably makes COVID-19 more dangerous than the diseases caused by MERS-CoV and SARS-CoV.