Selection and T-cell antigenicity of synthetic long peptides derived from SARS-CoV-2

The pandemic caused by SARS-CoV-2 has led to the successful development of effective vaccines; however, the prospect of SARS-CoV-2 variants and future coronavirus outbreaks necessitates the investigation of other vaccine strategies capable of broadening vaccine-mediated T-cell responses and potentially providing cross-immunity. In this study the SARS-CoV-2 proteome was assessed for clusters of immunogenic epitopes restricted to diverse human leucocyte antigens (HLA). These regions were then assessed for their conservation amongst other coronaviruses representative of different alpha and beta coronavirus genera. Sixteen highly conserved peptides containing numerous HLA class I and II restricted epitopes were synthesized from these regions and assessed in vitro for their antigenicity against T-cells from individuals with previous SARS-CoV-2 infection. Monocyte-derived dendritic cells were generated from the peripheral blood mononuclear cells (PBMC) of these individuals, loaded with SARS-CoV-2 peptides, and used to induce autologous CD4+ and CD8+ T cell activation. The SARS-CoV-2 peptides demonstrated antigenicity against the T-cells from individuals with previous SARS-CoV-2 infection, indicating that this approach holds promise as a method to activate anti-SARS-CoV-2 T-cell responses from conserved regions of the virus which are not included in vaccines utilising the Spike protein.

or novel coronavirus outbreaks. Additional problems related to vaccination include cost, logistics and the duration of protection afforded by neutralising antibody [4].
T-cell responses to SARS-CoV-1 demonstrate greater durability than those of neutralising antibody [5] and are associated with protection against SARS-CoV-2 [6], particularly in the context of waning antibody titres [7]. This indicates that T-cell mediated immunity may offer durable immune protection which may limit the severity of disease and potentially offer immune responses that are cross-reactive to variant SARS-CoV-2 and other coronaviruses [6,8,9], similar to those observed for different influenza viruses [10]. T-cell immune responses are generated by vaccination with the SARS-CoV-2 spike protein; however, these responses are thought to represent approximately 50% of the total anti-SARS-CoV-2 CD4+ T cell response and 25% of CD8+ T cell responses [11]. Therefore, spike-based vaccines will likely induce sub-optimal anti-SARS-CoV-2 T-cell responses and alternative methods of inducing T-cell immunity need to be explored. Here we provide a rationale for the selection of antigenic regions from SARS-CoV-2 proteins, including the nucleoprotein, membrane protein, envelope protein, ORF3, ORF7a and the non-structural proteins, intended to provide broad T-cell activation, and assess these synthetic long peptides for immunogenicity in vitro.

Coronavirus sequence conservation
Protein sequences from SARS-CoV-2 and other coronaviruses, taken from proteins harbouring epitope-rich regions, were analysed using FASTA sequences deposited in the NCBI database (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/). Sequences were aligned using Clustal Omega with default settings (https://www.ebi.ac.uk/Tools/msa/clustalo/) and conservation was analysed using Microsoft Excel. Accession numbers of the coronavirus strains can be found in the supplementary material file.
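The per-residue conservation calculation described above can be illustrated with a minimal Python sketch. This is not the authors' workflow (they report using Microsoft Excel); it assumes that conservation is measured as the percentage of aligned sequences sharing the reference residue at each position, and the sequences shown are invented toy strings rather than coronavirus data. The function names are hypothetical.

```python
def per_residue_conservation(reference, aligned):
    """Percentage of aligned sequences matching the reference at each position.

    Assumption: `reference` and every entry of `aligned` are rows taken from
    the same multiple sequence alignment, so they share one length.
    """
    if any(len(seq) != len(reference) for seq in aligned):
        raise ValueError("all sequences must have the alignment length")
    percentages = []
    for i, ref_residue in enumerate(reference):
        if ref_residue == "-":
            continue  # skip alignment columns where the reference has a gap
        matches = sum(1 for seq in aligned if seq[i] == ref_residue)
        percentages.append(100.0 * matches / len(aligned))
    return percentages


def mean_conservation(percentages):
    """Average conservation across all residues of a peptide."""
    return sum(percentages) / len(percentages)


# Toy alignment (invented sequences, for illustration only).
reference = "MKTLLV"
aligned = ["MKTLLV", "MKTLIV", "MKTLLV", "MRTLLV"]
per_site = per_residue_conservation(reference, aligned)
average = mean_conservation(per_site)
```

Averaging the per-site values gives a single conservation figure per peptide of the kind reported in the results.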

In silico prediction of T-cell epitopes
SARS-CoV-2 HLA-restricted epitopes were identified from experimentally validated epitopes deposited in the Immune Epitope Database (IEDB; https://www.iedb.org/) and by using the NetMHCpan EL 4.1 (2020.09) prediction algorithm against an HLA allele reference set.

Selection and synthesis of synthetic long peptides
Conserved immunogenic regions of between 15-30 amino acids in length (synthetic long peptides) were selected by assessing their suitability for synthesis, based upon the physicochemical properties of the amino acids in the sequence, and their potential as cell penetrating peptides (CPP), defined by a net positive charge. Selected peptides were synthesized (Genscript, Netherlands).
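As an illustration of the net positive charge criterion, the following Python sketch estimates peptide charge by counting basic and acidic residues. The counting rule (lysine and arginine as +1, aspartate and glutamate as -1, histidine and the termini ignored) is a common approximation at neutral pH and is an assumption here; the text does not state how charge was calculated. The function names and the example sequences are hypothetical.

```python
POSITIVE = set("KR")  # lysine, arginine: treated as +1 at neutral pH
NEGATIVE = set("DE")  # aspartate, glutamate: treated as -1 at neutral pH


def net_charge(peptide):
    """Approximate net charge: (K + R) - (D + E); ignores His and termini."""
    seq = peptide.upper()
    positive = sum(1 for aa in seq if aa in POSITIVE)
    negative = sum(1 for aa in seq if aa in NEGATIVE)
    return positive - negative


def is_candidate_slp(peptide, min_len=15, max_len=30):
    """True if the peptide is 15-30 residues long and carries a net positive charge."""
    return min_len <= len(peptide) <= max_len and net_charge(peptide) > 0


# Invented toy sequences, not peptides from this study.
basic_peptide = "KK" + "A" * 8 + "D" + "L" * 6 + "R"   # 18 residues
acidic_peptide = "DDEE" + "A" * 12 + "K"               # 17 residues
```

Here `is_candidate_slp(basic_peptide)` passes both the length and charge checks, whereas `acidic_peptide` fails on charge.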

Peripheral blood mononuclear cells
Peripheral blood mononuclear cells (PBMC) were purchased from the national blood service (NHS, UK) prior to the distribution of SARS-CoV-2 vaccines. PBMC demonstrating responses to a SARS-CoV-2 consensus peptide pool and serum antibody responses to the SARS-CoV-2 spike protein were defined as having been previously infected with SARS-CoV-2 whilst PBMC and sera lacking detectable responses were defined as SARS-CoV-2 naïve.

Isolation of PBMC and generation of monocyte-derived DC
The generation of mDDC (monocyte-derived dendritic cells) was performed using established protocols. CD14+ cells were isolated by positive selection using anti-CD14 conjugated magnetic beads (Miltenyi, Germany). The CD14+ cells were cultured for 6 days in RPMI (Sigma, UK) containing 10% foetal calf serum (Sigma, UK) and 5% streptomycin/penicillin (Sigma, UK), supplemented with 10 ng ml⁻¹ IL-4 and 50 ng ml⁻¹ GM-CSF (Miltenyi, Germany). These mDDC were subsequently co-cultured with T-cells enriched from the CD14− fraction of PBMC using anti-CD3 magnetic beads (Miltenyi, Germany). Co-cultures for antigenicity assays were performed in the presence of SARS-CoV-2 peptides, individually and in groups, or control peptides (1 µg ml⁻¹ for each peptide). The controls included CEF (Cytomegalovirus/Epstein Barr virus/influenza), CEFT (Cytomegalovirus/Epstein Barr virus/influenza/tetanus) and a SARS-CoV-2 reference group (including overlapping peptides from the spike, nucleocapsid and membrane proteins).

ELISPOT
For IFN-γ ELISPOT (enzyme-linked immunosorbent spot) assays, peptide-pulsed mDDC-T-cell co-cultures (2×10⁴ mDDC : 2×10⁵ T-cells) were incubated in IFN-γ ELISPOT plates (CTL ltd, Germany) for 48 h in order to assess antigen-specific T-cell activation. After 48 h the ELISPOT plates were processed according to the manufacturer's protocol. Briefly, plates were washed ×3 in PBS prior to incubation with detection anti-IFN-γ antibody for 2 h. Plates were washed ×3 in PBS wash buffer and Strep-Biotin reagent was added for 1 h. Plates were washed ×3 prior to the addition of substrate solution. Spot formation was observed and the plates were washed once in distilled H₂O and left to dry before enumeration using a CTL Immunospot entry ELISPOT plate reader. Positive responses were defined as a >1.5-fold change compared to the no peptide control together with statistical significance using Student's t-test.
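The positivity criterion above (a >1.5-fold change over the no peptide control plus significance by Student's t-test) can be sketched in Python as follows. Several details are assumptions rather than statements from the text: the use of replicate wells, the pooled-variance (equal variance) form of the t-test, and the critical value of 2.776, which is the two-tailed 5% threshold for 4 degrees of freedom (two groups of three wells). The spot counts are invented. In practice a library routine such as `scipy.stats.ttest_ind` would return the p-value directly.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(test_counts, control_counts):
    """Ratio of mean spot counts in peptide wells to no peptide control wells."""
    control_mean = mean(control_counts)
    if control_mean == 0:
        return float("inf") if mean(test_counts) > 0 else 1.0
    return mean(test_counts) / control_mean


def student_t(test_counts, control_counts):
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(test_counts), len(control_counts)
    pooled = ((n1 - 1) * variance(test_counts)
              + (n2 - 1) * variance(control_counts)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    difference = mean(test_counts) - mean(control_counts)
    if standard_error == 0:
        return float("inf") if difference > 0 else 0.0
    return difference / standard_error


def is_positive(test_counts, control_counts, min_fold=1.5, t_critical=2.776):
    """Positive if the fold change exceeds min_fold and t exceeds t_critical.

    t_critical is an assumed threshold (two-tailed 5%, 4 degrees of freedom).
    """
    return (fold_change(test_counts, control_counts) > min_fold
            and student_t(test_counts, control_counts) > t_critical)


# Invented triplicate spot counts, for illustration only.
peptide_wells = [30, 34, 32]
control_wells = [10, 12, 11]
responder = is_positive(peptide_wells, control_wells)
```

Requiring both conditions guards against calling a response on a large fold change that arises from noisy wells with very low background counts.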

Flow cytometry
Multiparameter flow cytometry was used to measure CD8+ and CD4+ T cell activation using the activation induced marker (AIM) assay. A total of 1×10⁵ mDDC were incubated with SARS-CoV-2 or control peptides and co-cultured with 1×10⁶ of the T-cells enriched from the CD14− fraction of PBMC for 24 h in flat-bottomed 96-well plates prior to staining. V450 anti-CD3, AF488 anti-CD8, APC-fire anti-CD4, V670 anti-CD45RA and V710 anti-CCR7 were used as T-cell subset markers. APC anti-CD69, PerCP anti-CD137 and PE-dazzle anti-OX40, Alexa Fluor 700-OX were used as activation induced markers. Responses to the peptides were defined as a 1.5-fold increase relative to the no peptide control.

Identification of antigenic regions within the SARS-CoV-2 proteome
In order to identify amino acid sequences within the SARS-CoV-2 proteome that contain multiple class I and II restricted epitopes, peptides from conserved regions of SARS-CoV-2 proteins were assessed using the IEDB epitope prediction tool [12]. Clusters of epitopes previously validated for HLA binding or T-cell activation and deposited within the IEDB database were also identified. This resulted in the identification of 25 peptide regions harbouring multiple predicted or experimentally validated epitopes. Five of these peptides were identified within the Spike protein and were not investigated further since T-cell responses to these regions may be raised by existing vaccines. The remaining 20 peptides were derived from the nucleoprotein, envelope protein, membrane protein, ORF3a, ORF7a and the ORF1ab polyprotein (Table 1). These peptides were assessed for conservation across between 500-2500 SARS-CoV-2 protein sequences deposited in the NCBI virus database, including sequences derived from different geographic locations and belonging to variants of concern. The amino acids in each peptide were highly conserved, with typically between 98-100% conservation for each amino acid residue within each peptide (Table S1). Some peptides demonstrated 100% conservation whilst the average conservation across all 20 peptides was 99.4% (Table 1). The limited variation was often between similar amino acids (Table S1). Early analysis of available protein sequences from the Omicron variant of SARS-CoV-2 also demonstrated 99% conservation with the peptides identified here.
Peptides intended to induce broad antigen-specific T-cell responses need to contain epitopes restricted to the most common HLA alleles in human populations. The collective HLA restriction of the experimentally validated epitopes identified in the IEDB database in each of the 20 peptides was determined (Table 2). The 20 peptides included a total of 144 experimentally validated epitopes, 125 restricted to HLA class I and 19 restricted to HLA class II. Next the presence of predicted, but untested, epitopes was determined (Table 3), defined as being within the top 0.1% of predicted binders for each HLA allele. Ninety-five predicted epitopes were identified, 93 restricted to HLA class I and 2 to HLA class II. In total the epitopes identified within the peptides were restricted to the most common HLA class I alleles including, but not limited to, HLA-A*01:01, HLA-A*02:01, HLA-A*03:01, HLA-A*11:01, HLA-A*24:02, HLA-A*68:01, HLA-A*68:02, HLA-B*07:02, HLA-B*08:01, HLA-B*15:02, HLA-B*35:01 and HLA-B*40:01.
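The selection of predicted epitopes per HLA allele can be sketched in Python as below. This is an illustration only: it assumes that "top 0.1% of predicted binders" corresponds to a percentile rank of 0.1 or lower in the prediction output, and the record layout (`peptide`, `allele`, `percentile_rank` keys) is hypothetical rather than the actual NetMHCpan or IEDB output format. The peptide strings and rank values are invented.

```python
from collections import defaultdict


def select_predicted_epitopes(predictions, max_rank=0.1):
    """Group peptides by HLA allele, keeping those at or below max_rank.

    Assumption: a lower percentile rank means a stronger predicted binder,
    and max_rank=0.1 stands in for the "top 0.1%" threshold.
    """
    by_allele = defaultdict(list)
    for row in predictions:
        if row["percentile_rank"] <= max_rank:
            by_allele[row["allele"]].append(row["peptide"])
    return dict(by_allele)


# Invented prediction records, for illustration only.
predictions = [
    {"peptide": "AAAKLMNPQ", "allele": "HLA-A*02:01", "percentile_rank": 0.05},
    {"peptide": "KLMNPQRST", "allele": "HLA-A*02:01", "percentile_rank": 0.80},
    {"peptide": "LMNPQRSTA", "allele": "HLA-B*07:02", "percentile_rank": 0.10},
]
selected = select_predicted_epitopes(predictions)
```

Counting the entries per allele in `selected` gives the kind of per-peptide HLA restriction summary reported in Table 3.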
Significant variation exists between bat coronaviruses related to SARS-CoV-2 and between other coronaviruses known to infect humans. Conservation of the 20 SARS-CoV-2 peptides with 93 other coronaviruses was assessed (Table 4). High sequence conservation between SARS-CoV-2 and other Sarbecoviruses, including SARS-CoV-1 and bat-derived SARS-like viruses, was demonstrated. Peptides derived from ORF1ab demonstrated greater conservation between viruses compared to peptides derived from the structural proteins such as the nucleoprotein. High conservation was also observed within the peptides between SARS-CoV-2 and Merbecoviruses, including MERS-CoV, which is responsible for pathogenic human infection. Again, higher levels of conservation were observed for the ORF1ab peptides. Some conservation was seen for coronaviruses more distantly related to SARS-CoV-2, such as the Embecovirus, Duvinacovirus and Setracovirus genera including the viruses OC43, NL63, 229E and HKU1 which infect humans, consistent with recent studies detecting T-cell responses against SARS-CoV-2 in uninfected individuals [13][14][15][16][17]. The conservation in the peptides between SARS-CoV-2 and 93 other coronaviruses was then compared to conservation within regions of the Spike protein known to be targets for neutralising antibody (Table 4). The receptor binding domain (RBD) and the N-terminal domain (NTD) of the spike protein demonstrated greater variation between SARS-CoV-2 and the 93 other coronaviruses relative to the SARS-CoV-2 peptides. For example, the majority of Sarbecoviruses demonstrated 100% homology with SARS-CoV-2 in the NSP16 6821-452 peptide. In contrast the same viruses demonstrated approximately 47% homology to the SARS-CoV-2 NTD and between 60-70% to the SARS-CoV-2 RBD.
These data indicate that immune responses raised against the SARS-CoV-2 peptides identified here could mediate cross immunity against diverse coronavirus strains, including those containing spike proteins with limited homology to SARS-CoV-2.
Table 3. Predicted SARS-CoV-2 epitopes present in the selected peptides. Predicted SARS-CoV-2 epitopes and their HLA restriction present in each peptide based upon the NetMHCpan EL4.1 HLA binding prediction (Reynisson et al., 2020). Positive binding was defined as the top 0.1% scoring epitopes.
Most HLA class I restricted epitopes consist of 8-10mer amino acid sequences. Whilst high levels of conservation in the peptides were demonstrated between coronaviruses (Table 4), a relatively small amount of variation can significantly alter recognition by either T-cell or antibody-based immune responses, as demonstrated by observations that amino acid substitutions allow immune escape from neutralising antibody. However, variation of one or two amino acids within the epitope sequence may still allow for T-cell recognition, albeit sometimes with altered TCR avidity for the peptide/MHC complex. This is particularly true for conservative amino acid substitutions such as between isoleucine and leucine. For this reason, the conservation of the 125 validated, HLA class I restricted epitopes identified in the SARS-CoV-2 peptides was determined in each of the 20 peptides from each of the 93 coronaviruses used previously (Table 5). All 93 viruses had ten or more epitopes with homology to the SARS-CoV-2 epitopes within the peptides, including at least one identical epitope. Merbecoviruses contained between 16-35 epitopes, including between 1-11 identical epitopes. There was a high degree of epitope conservation within the Sarbecoviruses most closely related to SARS-CoV-2. These data add further support for the potential cross-reactivity of immune responses to the peptides between diverse coronaviruses.
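The epitope comparison described above, counting epitopes that are identical or differ by one or two amino acids in another virus, can be sketched in Python as follows. This is a simplified illustration and not the authors' procedure: it assumes an ungapped sliding-window comparison against the target protein sequence rather than a comparison within a multiple sequence alignment, and the protein and epitope strings are invented. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def fewest_substitutions(epitope, protein):
    """Smallest substitution count between the epitope and any protein window."""
    k = len(epitope)
    if len(protein) < k:
        return None
    return min(hamming(epitope, protein[i:i + k])
               for i in range(len(protein) - k + 1))


def classify_epitopes(epitopes, protein, max_substitutions=2):
    """Count epitopes found with 0, 1 or 2 substitutions in the target protein."""
    counts = {n: 0 for n in range(max_substitutions + 1)}
    for epitope in epitopes:
        distance = fewest_substitutions(epitope, protein)
        if distance is not None and distance <= max_substitutions:
            counts[distance] += 1
    return counts


# Invented toy sequences, for illustration only.
other_virus_protein = "AAAKLMNPQRSTAAA"
reference_epitopes = ["KLMNPQRST", "KLMNPQRSV", "KIMNPQRSV", "WWWWWWWWW"]
conserved = classify_epitopes(reference_epitopes, other_virus_protein)
```

A refinement would be to score conservative substitutions such as isoleucine for leucine separately, since these are the changes most likely to preserve T-cell recognition.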

Selection and T-cell reactivity of synthetic long peptides from SARS-CoV-2
Next, each peptide region was assessed for its suitability for synthesis as a synthetic long peptide (SLP), since the physicochemical properties of the peptides may make them unsuitable for synthesis or for targeting toward antigen presenting cells, or homology between the peptides and human proteins may make them unsuitable for vaccination. Amino acid modifications were made outside of epitope-containing regions in order to improve synthesis, stability and internalisation. Each of the peptides was differently conserved between other coronaviruses and contained a different number of epitopes restricted to different HLA. A total of 16 of these candidate SLP were selected as an immunogenic pool for in vitro assessment. These peptides are all water soluble and positively charged, which aids their internalisation into antigen presenting cells. The sequence length of between 21-30 amino acids allows for the presence of negatively charged or hydrophobic amino acids, and the epitopes containing them, without resulting in an overall negative charge or loss of solubility of the peptides.
The selected peptides were assessed for their ability to activate T-cell responses. PBMC from seven individuals with previous SARS-CoV-2 infection and four individuals seronegative for SARS-CoV-2, without selection for specific HLA expression, were used to generate monocyte-derived dendritic cells (Fig. 1a), which were loaded with peptides and combined in IFN-γ ELISPOT plates with autologous T-cells (Fig. 1b-e). The SARS-CoV-2 peptides were assessed as peptide groups including a nucleoprotein group, an ORF1ab group and an 11-peptide group which combined peptides from different regions based upon their expression of class II epitopes and the most highly conserved class I epitopes (identified in Table 5; details in Table S2). These were compared to a SARS-CoV-2 reference group of overlapping peptides to the Spike, nucleoprotein and membrane protein and to CEF and CEFT positive control peptides. The SARS-CoV-2 groups induced IFN-γ from mDDC-T-cell co-cultures derived from individuals with previous SARS-CoV-2 infection but not from seronegative individuals (Fig. 1c, d). Each peptide group induced the expression of IFN-γ from the DC-T-cell co-cultures of between 3-5 donor PBMC (Fig. 1c), providing support for the feasibility of grouping the peptides. The 16 SARS-CoV-2 peptides were individually tested and induced IFN-γ expression from the T-cells of between 1-5 individuals with previous SARS-CoV-2 infection (Fig. 1e, Table S2) but not from the T-cells of seronegative donors (data not shown). IFN-γ responses were observed against between 2-9 peptides (average of five) derived from between 2-6 different viral proteins (average of 3.7) from each of the seven donors.
Next a flow cytometry-based activation induced marker (AIM) assay was used to gain a greater insight into the T-cell responses raised by the peptides (Figs 2 and 3, Table S3). In this assay individuals with prior SARS-CoV-2 infection demonstrated responses to each of the peptides through the expression of paired markers including CD137, CD69, CD40L or OX40. Both CD4+ (Fig. 2a, b) and CD8+ (Fig. 3a, b) T-cell responses indicative of HLA-restricted T-cell activation were observed, with at least one response observed for each peptide with the exception of NSP 4895-915, which demonstrated a CD4+ T cell response from the PBMC of one individual but no CD8+ T cell responses. This pattern of responses was consistent with other reports studying anti-SARS T-cell responses [13,17]. Interestingly, the greatest responses were observed amongst CD45RA+CCR7− TEMRA CD4+ T cells (Fig. 4a, b) and effector and central memory CD8+ T cells (Fig. 4c, d), consistent with recent studies investigating the phenotype of SARS-CoV-2 reactive T-cells [18,19].
Table 5. Conservation of epitopes between SARS-CoV-2 and other coronaviruses. Experimentally validated epitopes deposited in IEDB and present within the 21 peptides chosen in this study were identified and their presence and variability assessed in 93 different coronaviruses. The number of identical epitopes or those harbouring one or two amino acid substitutions was determined.
Taken together these data demonstrate the in vitro T-cell antigenicity of SARS-CoV-2 derived SLP containing epitopes restricted to multiple HLA and conserved between SARS-CoV-2 variants and other coronaviruses.

DISCUSSION
SARS-CoV-2 vaccines based upon the Spike protein have demonstrated between 60-95% efficacy in phase three trials and are now in widespread use. Although these vaccines are highly efficacious, numerous issues remain unresolved. These include supply, cost and the requirement of some vaccines for a cold chain. From an immunological perspective there remains concern that variation in the Spike protein may evolve against which antibodies raised by vaccination are less effective. This has been demonstrated to some degree by the Gamma [20] and Delta [21] variants of SARS-CoV-2, in addition to a report detailing extensive but incomplete escape of vaccine-elicited neutralization by the Omicron variant of SARS-CoV-2. Related issues involve the significant decline of antibody titres within weeks of vaccination in some people [22] and the observation that vaccinated individuals may still become infectious, indicating that regular SARS-CoV-2 vaccination is likely required. The possibility of generating antibody dependent enhancement (ADE) [23] to novel SARS-CoV-2 variants or other Sarbecoviruses, a phenomenon demonstrated for SARS-CoV-1 [24], and the prospect of future zoonosis with novel coronaviruses, to which the existing spike-based vaccines may be ineffective, are also of concern.
The broad therapeutic activation of SARS-specific T-cell immune responses may resolve or ameliorate a number of these issues. T-cell responses are more stable than humoral responses [15] whilst patients with agammaglobulinemia can recover from COVID-19, supporting a protective role for T-cell immunity [25]. Early induction [26] and antigenic diversity [6] of the SARS-CoV-2-specific T-cell response are associated with mild COVID-19, and cross-reactivity with CD4+ T cell responses to other human coronaviruses is associated with mild infection [13][14][15][16]. These observations are consistent with studies showing that a lower frequency [27] and functionality [28,29] of T-cells is positively correlated with in-hospital death or severity of illness, whilst lower counts of total, CD8+, or CD4+ T cells are associated with poorer patient survival [30]. The Spike protein includes a number of T-cell immunogenic regions but taken together only represents a fraction of the potential anti-SARS-CoV-2 T-cell response, which also targets other SARS-CoV-2 proteins including the nucleoprotein, membrane protein and non-structural proteins of ORF1ab [17,19,31-33]. These studies show that significant variation exists in the T-cell antigenic targets of SARS-CoV-2, which may lower the efficacy of spike-based vaccines in patients who demonstrate limited anti-spike T-cell activation. Although infection offers an opportunity to gain immune protection to diverse SARS-CoV-2 antigens, and some studies have identified strong T-cell responses from individuals with asymptomatic or mild COVID-19, other studies suggest that asymptomatic infection may not provide sufficient antigenic stimulation to activate protective, long-lasting anti-SARS-CoV-2 T-cell responses [29,33]. This is supported by observations that CD8+ T cell responses could not be detected in 30% of convalescent individuals [12].
In this study, immunogenic regions from SARS-CoV-2 proteins other than the spike were identified and their conservation amongst selected alpha and beta coronaviruses was assessed. The selected peptides contain multiple epitopes which are restricted to the most common HLA class I molecules and which have previously demonstrated induction of T-cell activation in response to SARS-CoV-2 (Table 2). Importantly, these peptides were highly conserved between different coronaviruses, particularly the SARS-like Sarbecoviruses, compared to the receptor binding domain of the spike protein, the major antigenic site of neutralising antibodies (Table 4). Each of the peptides induced T-cell responses from the T-cells of at least one individual with previous SARS-CoV-2 infection (Figs 1-3); however, future work is warranted in order to extend the limited observations made here. This could define the nature of T-cell responses raised to the peptides in greater detail, along with their ability to contribute to protection from SARS-CoV-2 challenge and to induce responses from naïve donors, which were not observed in the present study, likely due to the small number of experiments performed using cells from naïve donors. Nevertheless, these data indicate that T-cell responses raised against these peptides may cross-react with future SARS-CoV-2 variants, which may evolve to escape neutralising antibody responses, and against future emerging coronaviruses. This is supported by studies screening SARS-CoV-2 epitopes in COVID-19 and uninfected patients which have observed SARS-CoV-2 epitope-specific CD4+ and CD8+ T cell responses in SARS-CoV-2 uninfected individuals, directed at epitopes which share homology with epitopes in other human coronaviruses [6,8-10,31]. The SARS-CoV-2 peptides studied here include 125 epitopes identified by these epitope screening studies of SARS-CoV-2 patients [6,8,9,19,31,32].
A recent study screening epitopes in 16 COVID-19 patients identified 122 epitopes reactive to T-cells in these individuals [33]. The peptides detailed in the present study share 17 epitopes with this study. Future epitope screening studies may reveal further SARS-CoV-2-specific epitopes which, if part of epitope-rich clusters, may identify new regions suitable for the generation of synthetic long peptides of the kind studied here.
The peptides are derived from the immunodominant viral proteins other than the spike (Table 1) so could complement existing spike-based vaccination and contribute to the induction of broad T-cell reactivity associated with improved anti-viral immunity. For example, antibody titres in COVID-19 patients correlate with CD4+ T cell immune responses not just to the spike protein, but also to the nucleoprotein and membrane protein [14,34], and the peptides studied here include multiple class II restricted epitopes from the nucleoprotein and membrane protein not present in existing spike-based vaccines. Harnessing CD4+ T cell epitopes from other SARS-CoV-2 antigens represents a strategy for improving the response or longevity of protection afforded by existing spike-based vaccines, particularly given observations that the diversity [6], functionality [18,35] and quality [36] of CD4+ T cell activation supports the generation of cellular and humoral immune responses associated with protection. These observations are supported by reports that hospitalised patients with robust B-cell responses yet suffering from severe COVID-19 demonstrate limited activation of circulating CD4+ follicular T-cells, indicative of the importance of these cells to effective humoral immunity [37]. Vaccine approaches including the envelope protein and nucleoprotein are under investigation [38,39], consistent with this approach.
A recent study showed that CD8+ T cell responses against SARS-CoV-2 were raised against approximately 17 epitopes derived from between 1-6 viral proteins (average 2.7) [17]. In our in vitro experiments DC-T-cell co-cultures generated from the PBMC of individuals with previous SARS-CoV-2 infection responded with IFN-γ expression to an average of five peptides derived from an average of four proteins. Previous studies have demonstrated that broad T-cell responses against multiple epitopes are more effective than narrow responses targeting fewer epitopes [40][41][42] and less susceptible to exhaustion [43] indicating that broadening the anti-SARS-CoV-2 T-cell response from vaccination is desirable.
Analysis of T-cell responses to the ChAdOx1 spike-based vaccine showed that nearly 30% of unique TCRs raised by the vaccine mapped to a single region of the spike protein which is mutated in the Beta variant of SARS-CoV-2 [44]. This may contribute to the failure of ChAdOx1 to protect against mild-to-moderate COVID-19 [20]. These studies suggest that variation in SARS-CoV-2 has the potential to reduce vaccine efficacy and support the use of SARS-CoV-2 antigen derived from non-spike proteins.
Synthetic long peptides of the kind studied here have been used in numerous therapeutic vaccines for both infectious disease and cancer and demonstrated an ability to induce efficacious T-cell responses [45][46][47]. Peptide based vaccines are inherently safe, can be easily manufactured, combined with different adjuvants, including those selected for therapeutic properties such as trained innate immunity, and do not have the same requirements for cold chains as other vaccines. They may be useful alternatives to other vaccine designs for the generation of broad T-cell responses since they exclude non-immunogenic regions and avoid the generation of non-neutralising antibody responses which may be induced by whole virus vaccines [48] and are associated with ADE or toxicity. Alternatively, the peptide regions identified here could also be incorporated into mRNA-based vaccines.
Currently, vaccination with whole, killed SARS-CoV-2 virions, which have the potential to induce T-cell responses against each viral protein, has demonstrated lower efficacy compared to mRNA or viral vector-based vaccines, indicating that other methods of broadening the antigenic repertoire of SARS-CoV-2 vaccines are needed. The peptides studied here are candidate SARS-CoV-2 immunogens with the potential to increase the breadth and cross-reactivity of T-cell activation relative to existing SARS-CoV-2 vaccines.

Funding information
This work received no specific grant from any funding agency.