Dear Editor,

A recent study showed that around 80% of coronavirus disease 2019 (COVID-19) patients are moderate cases who will recover with or without conventional treatment, while the remaining 20% develop severe disease requiring intensive care.1 Early and accurate screening of new COVID-19 patients to identify those who will develop severe disease would facilitate decision-making on appropriate treatment regimens and reasonable allocation of limited healthcare resources. Therefore, novel predictive factors for disease progression from moderate to severe are urgently needed.

As proteomic profiles of patient sera are informative during disease progression, we hypothesized that some proteins would be significantly altered in the sera of patients who go on to develop severe COVID-19 after admission, and that these proteins could help predict disease progression.2,3 Herein, we initiated a study by recruiting two groups of COVID-19 patients: the development group (group I) with 23 patients (15 severe and 8 moderate) (Supplementary Table S1), and the validation group (group II) with 50 patients (29 severe and 21 moderate) (Supplementary Table S2) (Supplementary Fig. 1a). In addition, 10 healthy individuals who tested negative in the SARS-CoV-2 nucleic acid test were recruited as a control group. To characterize the proteomic signatures of COVID-19 patients, we performed a quantitative liquid chromatography-tandem mass spectrometry (LC-MS/MS) proteomic analysis on the sera of the patients from group I and the control group. Overall, the abundance of 988 proteins across these 33 individuals exhibited high consistency (Supplementary Fig. 1b; Supplementary Data S1). Based on the treatment records, the 15 patients in group I who required a ventilator were classified as severe cases, and the rest were classified as moderate (Supplementary Table S1). Principal component analysis (PCA) showed that, although most of the moderate patients were similar to healthy controls, the severe patients clearly segregated from the rest (Fig. 1a), suggesting proteomic features unique to the severe cases. Differential expression (DE) analysis between patients and healthy controls identified 43 significantly upregulated and 47 significantly downregulated proteins (Fig. 1b, c; Supplementary Table S3). The upregulated proteins include classical inflammatory response proteins, such as protein S100-A8 (S100A8), protein S100-A9 (S100A9), serum amyloid A-1 protein (SAA1), serum amyloid A-2 protein (SAA2), and alpha-1-antichymotrypsin (SERPINA3).
We performed systematic Gene Ontology (GO) enrichment analysis, and the results show that the upregulated proteins are significantly enriched (Bonferroni-corrected P < 0.05) in biological processes related to acute inflammatory response, neutrophil activation, and activation of the innate immune response (Fig. 1d; Supplementary Table S4). Our results also indicate that some of the upregulated proteins are significantly involved in response to hypoxia and cytokine secretion (Fig. 1d). The significance of some of these differentially expressed proteins was independently validated by parallel reaction monitoring (PRM) (Supplementary Figs. 2, 3). These results provide an initial proteomic characterization of the 23 confirmed COVID-19 patients. DE analysis between the severe and moderate cases shows that 20 proteins are significantly upregulated in the severe cases (Fig. 1e, f; Supplementary Table S5). Ten of them, including S100A8 and S100A9, were also identified as upregulated in the previous comparison between the patients and the healthy controls. The upregulation of these 10 proteins in both comparisons suggests that the severe cases might have stronger inflammatory responses than the moderate patients. Upregulation of the other 10 proteins, namely actin, cytoplasmic 1 (ACTB), calpain small subunit 1 (CAPNS1), collagen alpha-3(VI) chain (COL6A3), coagulation factor VIII (F8), glutathione S-transferase omega-1 (GSTO1), myoglobin (MB), out at first protein homolog (OAF), superoxide dismutase [Mn], mitochondrial (SOD2), thymosin beta-4 (TMSB4X), and tumor necrosis factor receptor superfamily member 17 (TNFRSF17), is found only in severe patients, whereas in moderate patients the expression of these 10 proteins is either not significantly different from or lower than that in controls (Supplementary Fig. 4).
Functional enrichment analysis of the 20 proteins upregulated in the patients who progressed to severe disease shows that, in addition to immune response-related biological processes, some of these proteins (ACTB, GSTO1, S100A9, OAF, and SOD2) are significantly involved in cellular detoxification during disease progression (Fig. 1g; Supplementary Table S6).
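For readers unfamiliar with enrichment statistics, the kind of Bonferroni-corrected GO over-representation analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions: the letter does not name the enrichment tool or test that was used, so a one-sided hypergeometric test is assumed here, and the protein identifiers (P0 to P19) and term names are hypothetical toy data rather than study results.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N proteins are drawn from M, of which n carry the term."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def enrich(hits, background, term_to_proteins):
    """Return (term, overlap, raw P, Bonferroni P) tuples sorted by raw P.

    Assumption: one hypergeometric test per term, corrected by multiplying
    the raw P value by the number of terms tested (Bonferroni).
    """
    bg = set(background)
    hits = set(hits) & bg
    n_tests = len(term_to_proteins)
    results = []
    for term, proteins in term_to_proteins.items():
        annotated = set(proteins) & bg
        k = len(hits & annotated)
        p = hypergeom_sf(k, len(bg), len(annotated), len(hits))
        results.append((term, k, p, min(1.0, p * n_tests)))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 quantified proteins, 4 of them "upregulated".
background = [f"P{i}" for i in range(20)]
upregulated = ["P0", "P1", "P2", "P3"]
terms = {
    "toy_term_A": ["P0", "P1", "P2", "P3", "P4"],
    "toy_term_B": ["P10", "P11", "P12", "P13", "P14"],
}
enrichment = enrich(upregulated, background, terms)
for term, k, p, p_adj in enrichment:
    print(term, k, round(p, 5), round(p_adj, 5))
```

A term would be reported as enriched when its corrected P value falls below 0.05, matching the threshold quoted in the text.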

Fig. 1

Identification of proteomic predictive factors for severe COVID-19 patient screening. Serum samples from two independent cohorts were collected. Samples from the Development group were proteomically analyzed by LC-MS/MS and PRM. Machine-learning analysis of the proteomics data identified three predictive factors for screening severe COVID-19 patients. Samples from the Validation group were analyzed by ELISA to validate the proteomic predictive factors. a PCA plot of the proteomics data (988 quantified proteins). Red, green and blue data points are control, moderate and severe COVID-19 patients, respectively. The subject ID numbers are also shown. The centroid of each sample group is indicated by a larger solid red circle, green triangle and blue diamond, respectively. b Volcano plot showing DE protein identification between all patients (Development group) and controls. Selected DE proteins are indicated. c Heatmap visualization of the 90 significantly DE proteins between patients and controls. Upregulated and downregulated are with respect to patients vs. controls. The triangle indicates the only patient (Patient 22) who was classified as severe on hospital admission. Gene names are shown on the right. d Dotplot visualization of selected GO biological process terms for the upregulated proteins in COVID-19 patients. Dot sizes are proportional to the number of proteins annotated to the corresponding GO term. The color scale is proportional to −log10(adjusted P value). e Volcano plot showing DE protein identification between all severe and moderate patients in the Development group, with selected DE proteins indicated. f Heatmap visualization of the 24 significantly DE proteins between severe and moderate patients. Upregulated and downregulated are with respect to severe vs. moderate. The triangle indicates the only patient who was classified as severe on hospital admission. Gene names are shown on the right.
g Dotplot visualization of selected GO biological process terms for the upregulated proteins in severe patients. Dot sizes are proportional to the number of proteins annotated to the corresponding GO term. Color-scale is proportional to −log10(adjusted P value). h Boxplots of PRM validation intensity of DDT, OAF, and MB in severe and moderate patients. i ROC of the random forest model built with the three protein features shown in j. The AUC is quoted with the 95% confidence interval values in brackets. j Importance ranking of OAF, MB and DDT in random forest models. k Boxplots of ELISA detection values of DDT, MB, and OAF between moderate (n = 21) and severe (n = 29) COVID-19 patients in Validation group. *P < 0.05. P values were calculated by Wilcoxon Rank Sum test. S Severe, M Moderate

To confirm the specificity of the proteins upregulated in the severe patients, we performed additional PRM validation. The validation results indicate that 8 (S100A8, OAF, 40S ribosomal protein S28 (RPS28), SOD2, MB, GSTO1, D-dopachrome decarboxylase (DDT) and CAPNS1) out of the 10 upregulated proteins were detected at significantly higher levels by PRM in the samples from patients who progressed to severe disease than in those from moderate patients (P < 0.05, Wilcoxon rank sum test; Fig. 1h; Supplementary Figs. 5, 6, and 7). Next, we used random forest machine learning to test whether any of these differentially expressed proteins can be used for early detection of potentially severe COVID-19 patients. The results show that the combination of all eight proteins achieves an area under the curve (AUC) of only ~0.80 (Supplementary Fig. 8).

To further identify proteins predictive of severe COVID-19, we designed an iterative model-building approach on the proteomics data to search for an optimal combination of proteins for predicting severe COVID-19 (Supplementary Fig. 9a). All 255 (2^8 − 1) non-empty combinations of the eight PRM-validated protein features were exhaustively tested. The overall results indicate that the combination of OAF, MB, and DDT is the best predictor for screening severe patients, with AUC = 0.907 (CI: 0.87–0.93) (Fig. 1i, j; Supplementary Fig. 9b–d; Supplementary Table S7).
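The exhaustive search over feature combinations can be illustrated with a short sketch. It only shows how the 255 non-empty subsets of eight features arise and how a best subset would be selected. The function names and the `score_fn` argument are hypothetical; in the study the score would presumably be the AUC of a random forest model, which is not reimplemented here.

```python
from itertools import combinations

# The eight PRM-validated proteins named in the text.
PROTEINS = ["S100A8", "OAF", "RPS28", "SOD2", "MB", "GSTO1", "DDT", "CAPNS1"]


def all_feature_subsets(features):
    """Yield every non-empty subset of `features` (2**n - 1 subsets in total)."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def best_subset(features, score_fn):
    """Return the subset that maximizes a caller-supplied scoring function.

    `score_fn` is a placeholder for whatever model evaluation is used,
    e.g. a cross-validated AUC (an assumption, not the authors' code).
    """
    return max(all_feature_subsets(features), key=score_fn)


subsets = list(all_feature_subsets(PROTEINS))
print(len(subsets))  # 255
```

Because the number of subsets doubles with every added feature, exhaustive testing is practical for eight candidates but would quickly become expensive for larger panels.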

To validate the predictive performance of DDT, OAF, and MB, we further determined their protein levels in serum samples from patients in group II by enzyme-linked immunosorbent assay (ELISA), which is more clinically and financially practical than LC-MS/MS and PRM. The results show that DDT, MB, and OAF all exhibited significantly (P < 0.05) higher concentrations by ELISA in severe patients than in moderate patients (Fig. 1k; Supplementary Fig. 10a). In addition, random forest analysis of the combined ELISA data for these three proteins also indicates strong predictive power (AUC = 0.904, CI: 0.89–0.91) for severe patient detection (Supplementary Fig. 10b, c).
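As a reminder of what the reported AUC and its confidence interval measure, the sketch below computes an AUC as the probability that a randomly chosen severe patient receives a higher model score than a randomly chosen moderate patient, and attaches a percentile bootstrap interval. The letter does not state how its intervals were derived, so the bootstrap is an assumption, and the scores are synthetic numbers chosen for illustration, not study data.

```python
import random


def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_auc_ci(scores_pos, scores_neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        pos = rng.choices(scores_pos, k=len(scores_pos))
        neg = rng.choices(scores_neg, k=len(scores_neg))
        stats.append(auc(pos, neg))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Synthetic model scores (hypothetical): higher means "more likely severe".
severe_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
moderate_scores = [0.5, 0.3, 0.2, 0.1]

point_estimate = auc(severe_scores, moderate_scores)
ci_low, ci_high = bootstrap_auc_ci(severe_scores, moderate_scores)
print(point_estimate)
```

This pairwise definition of the AUC is equivalent to the Mann-Whitney U statistic divided by the number of pairs, which is the same rank-based idea underlying the Wilcoxon rank sum test used elsewhere in the letter.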

In addition to its potential to flag disease progression, the identified panel of predictive factors may also shed light on the molecular mechanisms underlying COVID-19 progression. DDT, also known as MIF-2, may play a pivotal upstream role in the inflammatory cascade by promoting the secretion of other inflammatory mediators, including TNF-α and IL-6, which are considered major cytokines involved in cytokine storm syndrome in COVID-19 patients.4 Rhabdomyolysis with a significant increase in myoglobin has been reported in COVID-19, suggesting that kidney injury/dysfunction is another fatal risk in patients with severe COVID-19.5 It would be interesting to investigate the role of OAF (out at first protein homolog) in the progression of COVID-19, which remains largely unknown. Overall, our study supports the concept that severe COVID-19 is a systemic dysfunction rather than solely a severe viral pneumonia with respiratory failure.

In conclusion, we performed statistical analysis on the combined proteomic signatures of COVID-19 patients and identified DDT, OAF, and MB as a panel of predictive factors for the potential deterioration of patients with moderate COVID-19 at admission, before severe symptoms manifest. Our proposed panel of predictive factors would be helpful for the early identification of potentially severe COVID-19 patients, which is not only critical for saving the lives of patients with COVID-19 but also beneficial for relieving the burden of the COVID-19 pandemic in most countries.