All semi-supervised NMF approaches tested by Gaujoux et al

[…] fractions leveraging a signature matrix describing the cell-type-specific expression profiles (Fig. 1b). In this review, we describe state-of-the-art computational methods that quantify immune cells from expression data of cell mixtures, either using marker genes coupled with GSEA or other scoring approaches, or leveraging deconvolution algorithms and immune cell expression signatures (Table 1). Finally, we discuss the issues and open challenges that must be resolved to accurately quantify immune infiltrates from bulk tumor RNA-seq data.

Table 1. Features of the computational tools for the quantification of tumor-infiltrating immune cells from transcriptomics data considered in this review: tool or function name, algorithm type (M = marker genes, P = partial deconvolution, C = complete deconvolution), main method, cell types quantified using the embedded gene sets or signature profiles, code availability, name of the method in the CellMix package [9], reference publication.

[…] values (0.25, 0.5, and 0.75), and the solution providing the lowest root-mean-square error (RMSE) between the true expression and the estimated expression is selected. In this approach too, the coefficients are forced to non-negative values and normalized to sum to one. Validated on microarray data of cell mixtures derived from blood and from lymph node biopsies, CIBERSORT proved highly accurate in the simultaneous deconvolution of nine and three immune cell subsets, respectively, whereas it was less accurate in the quantification of gamma-delta T cells [19]. Tested on simulated mixtures of four malignant immune cell types, it also proved robust to various levels of noise and unknown tumor content. CIBERSORT was applied to about 18,000 microarray data sets across 39 solid and hematological cancers (results available at https://precog.stanford.edu/) [39].

Li et al.
developed a multi-step computational approach, TIMER, to estimate the abundances of six immune cell types in 32 cancer types, leveraging a list of immune-specific markers derived from the IRIS database and immune cell expression signatures extracted from the HPCA microarray data [20]. Each cancer expression matrix under investigation, derived from RNA-seq or microarray data, is merged with the immune cell expression matrix and normalized with ComBat [40] to remove batch effects. Signature genes are identified separately for each malignancy type by selecting, from the immune cell markers, the genes that are negatively associated with tumor purity. Finally, for each malignancy type, the signature matrix is built from the normalized immune cell profiles, restricted to the selected immune cell markers. TIMER performs deconvolution using the linear least-squares regression approach proposed in [15] and sets all negative estimates to zero. The estimation is usually repeated several times with a progressively smaller set of T-cell markers to reduce the correlation between the estimated CD8+ and CD4+ T cell proportions. Unlike CIBERSORT, the final estimates are not normalized to sum to one and, thus, can be neither interpreted directly as cell fractions [41] nor compared across different immune cell types and data sets [20]. TIMER was validated on simulated mixtures, as well as on TCGA samples, considering as ground truth quantized neutrophil abundances estimated from images of hematoxylin and eosin (H&E)-stained tissue slides and lymphocytic infiltration scores computed from DNA methylation data [20]. TIMER was applied to more than 10,000 samples across 32 cancer types of TCGA (results available at https://cistrome.shinyapps.io/timer/) [42].

Racle et al. recently developed a tool to Estimate the Proportion of Immune and Cancer cells (EPIC) [21].
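The regression step described above for TIMER, and the two ways of post-processing the coefficients, can be illustrated with a minimal sketch. It is not TIMER's or CIBERSORT's actual code: the toy signature matrix, the function names, and the use of plain ordinary least squares (rather than the regression of [15] or the support-vector regression of CIBERSORT) are all assumptions made for illustration, and only the Python standard library is used.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(signature, mixture):
    """Ordinary least squares via the normal equations S^T S x = S^T m.

    signature: list of rows (genes), each with one value per cell type.
    mixture:   one expression value per gene for the bulk sample.
    """
    n_types = len(signature[0])
    sts = [[sum(row[i] * row[j] for row in signature) for j in range(n_types)]
           for i in range(n_types)]
    stm = [sum(row[i] * y for row, y in zip(signature, mixture))
           for i in range(n_types)]
    return solve_linear_system(sts, stm)


def clip_negative(coefs):
    """TIMER-like post-processing: negative estimates are set to zero."""
    return [max(0.0, c) for c in coefs]


def normalize_to_one(coefs):
    """CIBERSORT-like post-processing: clip negatives, then rescale to sum to one."""
    clipped = clip_negative(coefs)
    total = sum(clipped)
    return [c / total for c in clipped] if total > 0 else clipped


# Hypothetical toy data: 4 genes x 3 cell types.
SIGNATURE = [[10.0, 1.0, 1.0],
             [1.0, 8.0, 1.0],
             [1.0, 1.0, 6.0],
             [2.0, 2.0, 2.0]]
TRUE_FRACTIONS = [0.5, 0.3, 0.2]
MIXTURE = [sum(s * f for s, f in zip(row, TRUE_FRACTIONS)) for row in SIGNATURE]

estimate = least_squares(SIGNATURE, MIXTURE)

# Hypothetical raw coefficients with one negative value, to contrast the two conventions.
raw = [0.4, -0.1, 0.2]
timer_like = clip_negative(raw)
cibersort_like = normalize_to_one(raw)

print("least-squares estimate:", estimate)
print("clipped only:", timer_like)
print("clipped and normalized:", cibersort_like)
```

In the noise-free toy case the regression recovers the mixing proportions. The last two lines show why clipped-only scores need not sum to one, whereas clipped and rescaled coefficients always do.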
EPIC uses constrained least-squares regression to explicitly incorporate the non-negativity constraint into the deconvolution problem and to impose that the sum of all cell fractions in each sample does not exceed one. The difference between one (i.e., 100% of the cells in the mixture) and the sum of the […]
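The two constraints named above (non-negative fractions whose sum does not exceed one) can be built into the fit itself rather than applied afterwards. The sketch below does this with projected gradient descent, which is a stand-in chosen for brevity and not the solver used by EPIC. The toy signature matrix, the function names, and the iteration count are likewise assumptions, and only the Python standard library is used.

```python
def project_nonneg_sum_le_one(x):
    """Euclidean projection onto the set {x >= 0, sum(x) <= 1}."""
    clipped = [max(0.0, v) for v in x]
    if sum(clipped) <= 1.0:
        return clipped
    # Otherwise the projection lies on the simplex {x >= 0, sum(x) = 1}.
    ordered = sorted(x, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate
    return [max(0.0, v - theta) for v in x]


def constrained_least_squares(signature, mixture, n_iter=5000):
    """Minimize ||S x - m||^2 subject to x >= 0 and sum(x) <= 1."""
    n_types = len(signature[0])
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so its inverse is a safe (conservative) step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    x = [0.0] * n_types
    for _ in range(n_iter):
        residual = [sum(s * xi for s, xi in zip(row, x)) - y
                    for row, y in zip(signature, mixture)]
        grad = [sum(row[k] * r for row, r in zip(signature, residual))
                for k in range(n_types)]
        x = project_nonneg_sum_le_one([xi - step * g for xi, g in zip(x, grad)])
    return x


def mix(signature, fractions):
    """Build a noise-free bulk profile from known fractions (toy helper)."""
    return [sum(s * f for s, f in zip(row, fractions)) for row in signature]


# Hypothetical toy data: 4 genes x 3 cell types.
SIGNATURE = [[10.0, 1.0, 1.0],
             [1.0, 8.0, 1.0],
             [1.0, 1.0, 6.0],
             [2.0, 2.0, 2.0]]

# Case 1: the reference cell types account for 80% of the mixture.
KNOWN = [0.5, 0.2, 0.1]
fractions = constrained_least_squares(SIGNATURE, mix(SIGNATURE, KNOWN))
remainder = 1.0 - sum(fractions)

# Case 2: an inconsistent profile whose unconstrained fit would sum above one.
over = constrained_least_squares(SIGNATURE, mix(SIGNATURE, [0.9, 0.5, 0.0]))

print("fractions:", fractions, "remainder:", remainder)
print("constrained fit of inconsistent profile:", over, "sum:", sum(over))
```

Because the constraints are enforced at every iteration, the returned coefficients are always non-negative and never sum to more than one, so one minus their sum is itself a non-negative quantity.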