MiCML: a causal machine learning cloud platform for the analysis of treatment effects using microbiome profiles

Abstract

Background

Treatment effects are heterogeneous across patients because of differences in their microbiomes, which in turn implies that we can enhance the treatment effect by manipulating the patient’s microbiome profile. Accordingly, the coadministration of microbiome-based dietary supplements/therapeutics along with the primary treatment has been the subject of intensive investigation. However, for this, we first need to comprehend which microbes help (or hinder) the treatment in curing the patient’s disease.

Results

In this paper, we introduce a cloud platform, named microbiome causal machine learning (MiCML), for the analysis of treatment effects using microbiome profiles in a user-friendly web environment. MiCML is distinguished by two up-to-date features: (i) batch effect correction to mitigate systematic variation in collective large-scale microbiome data due to the differences in their underlying batches, and (ii) causal machine learning to estimate treatment effects consistently and then discern microbial taxa that enhance (or lower) the efficacy of the primary treatment. We also stress that MiCML can handle data from either randomized controlled trials or observational studies.

Conclusion

We describe MiCML as a useful analytic tool for microbiome-based personalized medicine. MiCML is freely available on our web server (http://micml.micloud.kr). MiCML can also be implemented locally on the user’s computer through our GitHub repository (https://github.com/hk1785/micml).

Background

Recent advances in high-throughput metagenomic sequencing (i.e., targeted 16S rRNA gene [1, 2] or internal transcribed spacer [3] amplicon sequencing as well as shotgun metagenomic sequencing [4]) have enabled cost-effective microbiome profiling. Consequently, the microbiome has been the subject of intensive investigation, spanning academia and industry, with the aim of promoting the treatment and prevention of human diseases [5,6,7,8,9,10,11].

Notably, there has also been pre-clinical evidence that treatment effects are heterogeneous across patients due to the differences in their microbiomes. For example, Matson et al. [12] showed through a series of well-designed randomized controlled trials that the gut microbiome can influence (double or halve) the anti-carcinogenic responses of cancer immunotherapies (i.e., immune checkpoint inhibitors such as anti-PD-1 and anti-CTLA-4). There have also been many related observational studies in human subjects on the roles of the gut microbiome in the efficacy of cancer immunotherapies [13,14,15,16,17,18,19]. The key underlying mechanisms lie in the various functions and operations of the microbes on their host’s immune system [19]. More specifically, the gut microbes reprogram the tumor microenvironment through the channels of innate and/or adaptive immune cells [19]. Cancers and immunotherapies are unlikely to be the only cases. Similar immunologic mechanisms can pertain to various immune diseases (e.g., psoriasis [20], allergy [21], inflammatory disease [22, 23], metabolic disorder [22], obesity [24,25,26], diabetes [26]) and their related therapies.

Notably, such heterogeneous treatment effects also imply that we can enhance the treatment effect by manipulating the patient’s microbiome profile. Accordingly, investigators have sought new ways to augment the efficacy of the primary treatment (e.g., immunotherapy) through the coadministration of microbiome-based dietary supplements (e.g., prebiotics, probiotics, dietary fiber) or therapeutics (e.g., antibiotics, pharmabiotics, phage therapy, microbiota transplantation) [19,20,21,22,23,24,25,26,27]. However, for this, we first need to comprehend which microbes help (or hinder) the primary treatment in curing the patient’s disease. In principle, this is a matter of interaction effects between treatment and microbiome on the patient’s recovery. That is, both treatment and microbiome can influence the patient’s health or disease status, but their total effect is not necessarily additive. They can act synergistically or antagonistically. In particular, investigators are interested in how, and in which direction, the effect of the primary treatment on the patient’s health or disease status is modified by the human microbiome.

In this paper, we introduce a cloud platform, named microbiome causal machine learning (MiCML), for the analysis of treatment effects using microbiome profiles. Our motivations and rationale are as follows. First, we noted that recent advances in web-based analytics have made the analysis of highly complex (i.e., high-dimensional, compositional, zero-inflated, and over-dispersed) microbiome data accessible to people in various disciplines (e.g., clinicians, public health practitioners, biologists, bioinformaticians, statisticians, computer scientists) [28,29,30,31,32,33,34,35]. However, no web-based analytic platform existed to examine the interaction effects between treatment and microbiome on the patient’s recovery. Therefore, we designed MiCML to streamline all related data processing and analytical procedures within a user-friendly web environment. Second, we accommodated two up-to-date non-parametric analytic methods: (i) conditional quantile regression (ConQuR) [36] for robust batch effect correction in collective microbiome data and (ii) causal machine learning [37,38,39] for patient-level treatment effect estimation and its subsequent analytics to discern microbial taxa that enhance (or lower) the efficacy of the primary treatment. In particular, causal machine learning [37,38,39] is promising for microbiome-based personalized medicine because of its stronger causal grounding and data-driven adaptivity [38]. However, it remains underused in the microbiome field, possibly because no carefully planned and automated software has been available that pipelines a series of causal machine learning approaches for microbiome data. Therefore, we equipped MiCML with microbiome data analysis modules tailored for causal machine learning [37,38,39] along with clear, publication-ready visualizations that are easy to interpret.

In the following Methods section, we begin by outlining the underlying web server architecture and implementation facilities, along with the gut microbiome data on cancer immunotherapy efficacy [12,13,14, 16, 17], which we use to demonstrate MiCML. We also describe the methodological concepts and consequences of ConQuR [36] and causal machine learning [37,38,39]. Then, in the Results section, we (i) describe all the data processing and analytic modules, and (ii) demonstrate their uses through the analysis of the gut microbiome data on the efficacy of cancer immunotherapy and interpret the results [12,13,14, 16, 17]. Finally, in the Discussion and Conclusion section, we summarize the work, describe the implications of MiCML in microbiome-based personalized medicine, and discuss the limitations encountered in this study.

Methods

Web server architecture and implementation facilities

MiCML is open-source software written in R, based mainly on the following R libraries: ConQuR (https://github.com/wdl2459/ConQuR) [36] for batch effect correction, and rpart (https://cran.r-project.org/web/packages/rpart), randomForest (https://cran.r-project.org/web/packages/randomForest) and grf (https://cran.r-project.org/web/packages/grf) [37,38,39] for causal machine learning. We constructed the application architecture using Shiny (https://www.rstudio.com/products/shiny), and then deployed it onto the web server using ShinyProxy (https://www.shinyproxy.io) and Apache2 (https://httpd.apache.org). Our web server runs on an Intel Core i9-12900 (16-core) processor with 64 GB of DDR4 memory, operating on Ubuntu Server 20.04 (https://ubuntu.com). MiCML can be used freely on our web server (http://micml.micloud.kr) with no log-in or registration required. MiCML can also be run locally on the user’s computer through our GitHub repository (https://github.com/hk1785/micml), for example when the server is busy.

Real microbiome data: gut microbiome data on immunotherapy

In the Results section, we demonstrate each data processing/analytic module using real microbiome data (see Application note) [12,13,14, 16, 17]. The data that we use are the gut microbiome data originally used for the meta-analysis in Limeta et al. [17]. Limeta et al. collected gut microbiome data from four related studies - Frankel et al., 2017 [13] (sample size 38), Gopalakrishnan et al., 2018 [14] (sample size 25), Matson et al., 2018 [12] (sample size 39), and Peters et al., 2019 [16] (sample size 26) - on the efficacy of cancer immunotherapies (i.e., immune checkpoint inhibitors such as anti-PD-1 and anti-CTLA-4) for metastatic melanoma patients.

In particular, we survey which microbial taxa enhance (or lower) the efficacy of the combined anti-PD-1 and anti-CTLA-4 therapy over anti-PD-1 only for 128 metastatic melanoma patients in total. The treatment variable is coded as 0 for anti-PD-1 only (92 patients) and 1 for anti-PD-1 and anti-CTLA-4 (36 patients), and the response variable is coded as 0 for non-responders (NR) (64 patients) and 1 for responders (R) (64 patients) to immunotherapies.

The patients’ fecal samples were collected, and their gut microbiomes were profiled by shotgun metagenomic sequencing, before the patients started immunotherapy. All the detailed raw sequence data processing procedures can be found in the original article [17]. The final processed data are also freely available as example data in the Data Input module so that users can easily confirm compatible data formats.

Batch effect correction

Investigators often compile microbiome data from different batches (e.g., labs, studies, populations, states, locations, times) to create large-scale data that enable more robust and powerful analyses. However, such collective large-scale microbiome data are subject to the batch effect problem - systematic variation in the data due to the differences in their underlying batches - that leads to excessive false positives and spurious findings [36]. Ling et al. have recently developed a non-parametric batch effect correction method, called ConQuR, based on a two-part quantile regression model to estimate batch effects while maintaining the variability in the microbiome data due to other (primary or nuisance) sources [36]. The first part of ConQuR [36], a logistic regression, models the presence/absence of the microbes to address the zero-inflated nature of the data. The second part, a quantile regression, models various percentiles of the remaining non-zero counts to account for the full spectrum of the underlying (possibly over-dispersed) data distribution, whereas other existing methods [40] make restrictive distributional assumptions, for example, on the mean and variance only. ConQuR [36] finally produces batch effect-free count data that are balanced across batches while preserving the signals from other key (primary and nuisance) variables. More specifically, ConQuR [36] removes the batch effects relative to a reference batch, for which MiCML selects the batch with the largest sample size. As a result, ConQuR [36] can address the high complexity (e.g., zero-inflation, over-dispersion) of the microbiome data robustly, and the final batch effect-free count data are convenient to use for any subsequent data analytics (e.g., data transformation, association testing, prediction modeling).
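The reference-batch rule described above can be sketched in a few lines. This is only an illustration in Python, not MiCML's or ConQuR's actual code: the function name `pick_reference_batch` and the tie-breaking rule (the first batch encountered wins) are assumptions, and the batch labels F, G, M and P stand for the four studies with the sample sizes reported in this paper.

```python
from collections import Counter


def pick_reference_batch(batch_ids):
    """Return the batch label with the most samples (ties: first seen)."""
    counts = Counter(batch_ids)
    # max() keeps the first maximal key; Counter preserves insertion order.
    return max(counts, key=counts.get)


# One label per patient: Frankel (38), Gopalakrishnan (25), Matson (39), Peters (26).
batch_ids = ["F"] * 38 + ["G"] * 25 + ["M"] * 39 + ["P"] * 26
reference = pick_reference_batch(batch_ids)
print(reference)
```

With these sample sizes the rule selects the Matson et al. batch, which is the reference batch reported in the Application note.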

Causal machine learning

First, we suppose that there are n patients (i = 1, …, n), and let \(Y_i\) denote the binary (e.g., 0: unrecovered, 1: recovered) or continuous (e.g., post-treatment body mass index) response on the patient’s recovery, \(T_i\) denote the binary treatment status (e.g., 0: control, 1: treatment; 0: placebo, 1: treatment; or 0: old treatment, 1: new treatment), and \(M_i\) denote the patient’s microbiome profile in relative abundance.

A traditional approach for the analysis of interaction effects is to survey the effect of the product term between treatment and microbiome on the patient’s recovery using a parametric generalized linear model [41], that is, \(\beta_3\) in Eq. (1).

$$g(\mu_i) = \beta_1 T_i + \beta_2 M_i + \beta_3 T_i \cdot M_i$$
(1)

where \(g(\cdot)\) is a canonical link function and \(\mu_i = E(Y_i \mid T_i, M_i)\). This model is easy to understand, and its final results are also easily interpreted, enabling a breadth of statistical inferences (e.g., parameter estimation and significance testing). However, it relies on parametric and linearity assumptions that may not be adequately suited to highly skewed microbiome data [42].
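To make the role of the interaction coefficient concrete, here is a minimal Python sketch of Eq. (1) with a logit link for a single taxon. The coefficient values are hypothetical and chosen only for illustration, and the function names are not part of MiCML. Under this model the treatment effect on the log-odds scale at microbiome value m is \(\beta_1 + \beta_3 m\), so the sign of \(\beta_3\) tells whether a higher abundance enhances or lowers the effect.

```python
import math


def fitted_response(t, m, beta1, beta2, beta3):
    """Fitted response probability under Eq. (1) with a logit link."""
    eta = beta1 * t + beta2 * m + beta3 * t * m  # linear predictor
    return 1.0 / (1.0 + math.exp(-eta))


def log_odds_treatment_effect(m, beta1, beta3):
    """Log-odds difference between T = 1 and T = 0 at microbiome value m."""
    return beta1 + beta3 * m


# Hypothetical coefficients, for illustration only.
beta1, beta2, beta3 = 0.8, 0.1, -0.5
for m in (-1.0, 0.0, 2.0):
    effect = log_odds_treatment_effect(m, beta1, beta3)
    treated = fitted_response(1, m, beta1, beta2, beta3)
    control = fitted_response(0, m, beta1, beta2, beta3)
    print(f"m={m:+.1f}  log-odds effect={effect:+.2f}  "
          f"P(Y=1|T=1)={treated:.3f}  P(Y=1|T=0)={control:.3f}")
```

With a negative \(\beta_3\), as assumed here, the treatment effect shrinks and eventually changes sign as the taxon becomes more abundant.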

The causal machine learning approaches [37,38,39] based on the potential outcomes framework [43, 44] have recently been spotlighted for better causality and data-driven adaptivity. We can first define the patient-level treatment effect based on the potential outcomes framework [43, 44] as \(\tau(m)_i\) in Eq. (2).

$$\tau(m)_i = E\left(Y_i^{(1)} - Y_i^{(0)} \mid M_i = m\right)$$
(2)

where \(Y_i^{(1)}\) is the potential outcome of the i-th patient under treatment, and \(Y_i^{(0)}\) is the potential outcome of the same patient without treatment. However, the same patient cannot both receive and not receive the treatment at the same time. Hence, one of the potential outcomes, \(Y_i^{(1)}\) or \(Y_i^{(0)}\), is always missing, and the patient-level treatment effect \(\tau(m)_i\) cannot be directly measured [44].
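The missing-outcome problem can be seen in a toy example. The sketch below is in Python, and its potential outcomes are invented purely for illustration (in real data both columns are never available together). It shows that the observed response reveals only one of the two potential outcomes per patient, so the individual differences cannot be computed from observed data alone.

```python
# Hypothetical potential outcomes (Y1, Y0) for four patients; in practice
# only one entry of each pair can ever be observed.
potential_outcomes = [(1, 0), (1, 1), (0, 0), (1, 0)]
treatment = [1, 0, 1, 0]  # 1: treated, 0: not treated

# The observed response is Y1 for treated patients and Y0 otherwise.
observed = [y1 if t == 1 else y0
            for (y1, y0), t in zip(potential_outcomes, treatment)]

# The individual effects Y1 - Y0 are known here only because the data are made up.
true_effects = [y1 - y0 for y1, y0 in potential_outcomes]

for i, (t, y, effect) in enumerate(zip(treatment, observed, true_effects), start=1):
    hidden = "Y0" if t == 1 else "Y1"
    print(f"patient {i}: T={t}, observed Y={y}, missing {hidden}, "
          f"unobservable effect={effect}")
```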

However, a recent non-parametric method based on Breiman’s random forest algorithm [45, 46], called causal forest [38], enables consistent estimation of patient-level treatment effects for both randomized controlled trials and observational studies [38]. That is, for any \(\epsilon > 0\),

$$\lim_{n \to \infty} P\left(\left|\hat{\tau}(m)_i - \tau(m)_i\right| > \epsilon\right) = 0$$
(3)

where \(\hat{\tau}(m)_i\) and \(\tau(m)_i\) are the estimated and true treatment effects, respectively, for the i-th patient whose microbiome profile is m. This indicates that, as the sample size (n) increases, the patient-level effect estimates of the primary treatment (e.g., immunotherapy) converge in probability to the true effects, giving asymptotically valid estimates [38]. The key underlying assumption, known as unconfoundedness [47], is well satisfied in a randomized controlled trial, where the treatment assignments are randomized regardless of the patients’ characteristics. In contrast, in observational studies, the treatment assignments can be affected by patients’ characteristics. As a remedy, we can estimate the propensity of receiving treatment based on patients’ characteristics and apply propensity weighting to make unconfoundedness more plausible [38, 47, 48]. To clarify, the two different versions of causal forest [38] are (i) the double-sample tree for randomized controlled trials and (ii) the propensity tree for observational studies [38].
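The idea of propensity weighting can be illustrated with a much simpler estimator than the propensity tree. The Python sketch below computes an inverse-propensity-weighted estimate of the average (not patient-level) treatment effect. It is not the causal forest algorithm, the propensity scores are assumed to be known, and the data are hypothetical.

```python
def ipw_average_treatment_effect(y, t, e):
    """Inverse-propensity-weighted estimate of the average treatment effect.

    y: responses, t: treatment indicators (0/1), e: propensity scores P(T=1 | X).
    """
    if not (len(y) == len(t) == len(e)) or not y:
        raise ValueError("y, t and e must be non-empty and of equal length")
    if any(not 0.0 < ei < 1.0 for ei in e):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    n = len(y)
    # Treated patients are up-weighted by 1/e, controls by 1/(1 - e).
    treated = sum(yi * ti / ei for yi, ti, ei in zip(y, t, e))
    control = sum(yi * (1 - ti) / (1 - ei) for yi, ti, ei in zip(y, t, e))
    return (treated - control) / n


# Hypothetical data with a constant propensity of 0.5, as in a balanced trial.
y = [1, 0, 1, 1, 0, 0]
t = [1, 1, 0, 1, 0, 0]
e = [0.5] * 6
ate = ipw_average_treatment_effect(y, t, e)
print(round(ate, 4))
```

With a constant propensity of 0.5 and equal group sizes, the estimate reduces to the difference in mean response between the treated and untreated patients (2/3 minus 1/3 in this example).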

However, the causal forest [38] alone does not yield readily interpretable results. Thus, we use the estimated patient-level treatment effects as the output and the patients’ microbiome profiles as the inputs in subsequent analyses to generate interpretable results as follows. First, we can perform subgroup identification - fitting a regression tree [37, 39, 45] - to find the groups of patients that have enhanced (or lowered) treatment effects based on their microbiome compositions [37,38,39]. Here, the regression tree [37, 39, 45] is used to partition the input (i.e., microbiome) space into multiple distinct and non-overlapping subgroups of patients [37,38,39, 45], where the treatment effect for each subgroup is estimated by the mean treatment effect of its member patients [38, 39, 45]. Second, we can also perform treatment effect prediction - fitting a random forest [46] instead of a regression tree [37, 39, 45] - to find the microbial taxa that play crucial roles in enhancing (or lowering) the treatment effects. Here, we employ variable importance plots to rank microbial taxa by influence, measured as the decrease in mean squared error, as well as partial dependence plots to describe the patterns of their relationships (i.e., positive, negative, linear, or non-linear relationships) to the treatment effects. To summarize, the two subsequent analyses are (i) fitting a regression tree [45] for subgroup identification [37,38,39, 45] and (ii) fitting a random forest [46] to rank microbial taxa and describe the patterns of their relationships to the treatment effects.
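The core step of subgroup identification, a single regression-tree split, can be sketched as follows. This Python example searches one taxon for the threshold that minimizes the within-subgroup squared error of the estimated treatment effects, then reports the mean effect on each side. The abundances and effects are hypothetical, and a real regression tree (such as one fitted with rpart) repeats this search over all taxa and recursively within each subgroup.

```python
def best_split(x, tau):
    """Find the threshold on x that minimizes within-group squared error of tau.

    Returns (loss, threshold, mean effect below, mean effect at or above),
    or None when x has fewer than two distinct values.
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    distinct = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [tv for xv, tv in zip(x, tau) if xv < threshold]
        right = [tv for xv, tv in zip(x, tau) if xv >= threshold]
        loss = sse(left) + sse(right)
        if best is None or loss < best[0]:
            best = (loss, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    return best


# Hypothetical CLR abundances of one taxon and estimated treatment effects.
abundance = [0.2, 0.5, 0.9, 1.1, 1.4, 2.0]
effects = [0.06, 0.04, 0.05, -0.15, -0.13, -0.14]
loss, threshold, mean_below, mean_above = best_split(abundance, effects)
print(round(threshold, 3), round(mean_below, 3), round(mean_above, 3))
```

Each leaf of the fitted tree is then summarized by the mean estimated effect of the patients it contains, as described above.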

It is well known that the random forest [46] yields a smaller variance than the regression tree [45] at the cost of a mild loss in interpretability. In this sense, the use of random forest [46] in the subsequent analysis can produce more reliable results. However, we do not discourage the use of the regression tree in the subsequent analysis because of its unique feature of subgroup identification with easy interpretation. We compared the generalized linear model and causal machine learning in Table 1. We also outlined the overall workflow for causal machine learning in Fig. 1. Again, our main recommended method is causal machine learning with the subsequent analysis of random forest [46] because of its robustness with fewer underlying assumptions and its reliability with a small variance. However, we also included the other methods in the MiCML analytic modules as user options. Each method has its own unique features with distinct strengths and weaknesses.

Table 1 The comparison between generalized linear model and causal machine learning
Fig. 1
The overall workflow of causal machine learning

Results

Data processing

The Data Processing module consists of three sub-modules: (i) Data Input, (ii) Batch Effect Correction & Quality Control and (iii) Data Transformation.

First, the purpose of the Data Input module is to upload microbiome data using a unified phyloseq [49] format file (.rdata, .rds) or individual data files (.txt, .csv, .biom, .tre, .nwk).

Second, the purpose of the Batch Effect Correction & Quality Control module is to perform batch effect correction using ConQuR [36] and quality controls with respect to the kingdom of interest (default: Kingdom), minimum library size (default: 3,000), minimum mean proportion (default: 0.02%), and to remove errors in taxonomic names [28,29,30]. Batch effect correction using ConQuR [36] is not always required; hence, users first choose whether to perform it (yes) or not (no). If yes, users need to select a batch ID variable for the batches (e.g., labs, studies, populations, states, locations, times). Then, users need to select a primary variable (required) and can select other nuisance variables (optional) to maintain the variability in the microbiome data due to these variables. Note again that the batch effect correction balances the microbiome data across the batches while preserving the signals from the primary and nuisance variables. Again, MiCML selects the batch with the largest sample size as the reference batch. MiCML displays the sample size as well as the numbers of features (e.g., operational taxonomic units (OTUs) or amplicon sequence variants (ASVs)), phyla, classes, orders, families, genera and species. MiCML also visualizes the library sizes across patients and the mean proportions across features using interactive histograms and box plots. If the batch effect correction was conducted, MiCML visualizes the disparities in microbiome composition across batches using principal coordinate analysis (PCoA) [50] plots before and after the batch effect correction.

Finally, the purpose of the Data Transformation module is to transform the microbiome data into four widely used data formats: (i) centered log ratio (CLR) [51], (ii) rarefied count [52], (iii) proportion, and (iv) arcsine-root for each taxonomic rank (phylum, class, order, family, genus, species).
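Three of these transformations are simple enough to sketch directly (rarefaction is omitted because it involves random subsampling). The Python functions below are illustrative only and are not MiCML's code. In particular, the pseudocount of 0.5 added to every count before taking logarithms in `clr` is an assumption made here to handle zeros, and MiCML may treat zeros differently.

```python
import math


def proportion(counts):
    """Relative abundances: each count divided by the library size."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log abundance minus the mean log abundance.

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def arcsine_root(counts):
    """Arcsine of the square root of each proportion."""
    return [math.asin(math.sqrt(p)) for p in proportion(counts)]


# Hypothetical read counts of four taxa for one patient.
counts = [25, 0, 50, 25]
print([round(p, 3) for p in proportion(counts)])
print([round(v, 3) for v in clr(counts)])
print([round(v, 3) for v in arcsine_root(counts)])
```

By construction, the CLR values of a sample sum to zero and the proportions sum to one.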

Application note

We first uploaded the gut microbiome data collected from four related studies [12,13,14, 16] on the efficacy of cancer immunotherapies for metastatic melanoma patients. Again, the data are freely available as example data in the Data Input module.

We then performed batch effect correction using ConQuR [36] and quality controls using default settings. For the batch effect correction using ConQuR [36], we selected the ‘Lab’ variable, which distinguishes the four underlying studies [12,13,14, 16], as the batch ID variable; accordingly, Matson et al., 2018, which has the largest sample size of 39, was selected as the reference batch. We then selected ‘Response’ and ‘Treatment’ as the primary and nuisance variables, respectively. This is to balance the microbiome data across the four studies while preserving the signals from the key (response and treatment) variables. Finally, we retained 294 features, which are classified into 7 phyla, 14 classes, 20 orders, 30 families, 58 genera and 134 species, for 128 metastatic melanoma patients [Fig. 2A]. We can see that the microbiome data have varying library sizes [Fig. 2B] and excessive zeros [Fig. 2C]. We can also see that the disparities in microbiome composition across the four studies have been much reduced after the batch effect correction according to the Bray-Curtis dissimilarity [53], proportional distance, arcsine distance and Aitchison distance [51] [Fig. 3].
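Two of the dissimilarity measures named above have standard textbook definitions that are easy to sketch. The Python functions below compute the Bray-Curtis dissimilarity and the Aitchison distance (the Euclidean distance between CLR-transformed profiles) for a pair of samples. The abundance vectors are hypothetical, the pseudocount used for zeros is an assumption of this sketch, and MiCML's own computation may differ in such details.

```python
import math


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum |x - y| divided by sum (x + y)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


def aitchison(x, y, pseudocount=0.5):
    """Euclidean distance between CLR-transformed abundance vectors."""
    def clr(values):
        logs = [math.log(v + pseudocount) for v in values]
        mean_log = sum(logs) / len(logs)
        return [value - mean_log for value in logs]

    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Hypothetical abundances of three taxa in two samples.
sample_a = [10, 0, 30]
sample_b = [20, 10, 10]
print(bray_curtis(sample_a, sample_b))
print(round(aitchison(sample_a, sample_b), 3))
```

A PCoA plot is then obtained by embedding the full sample-by-sample matrix of such distances in a few dimensions.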

Fig. 2
The microbiome data after the batch effect correction and quality controls. (A) The sample size as well as the numbers of features, phyla, classes, orders, families, genera and species in the microbiome data after the quality controls. (B) The histogram and box plot for library sizes (i.e., total read counts) across patients. (C) The histogram and box plot for proportions across features

Fig. 3
The PCoA plots before and after the batch effect correction using ConQuR. F, G, M and P represent Frankel et al., 2017, Gopalakrishnan et al., 2018, Matson et al., 2018 and Peters et al., 2019, respectively. Bray-Curtis, Proportion, Arcsine and Aitchison represent the Bray-Curtis dissimilarity, proportional distance, arcsine distance, and Aitchison distance, respectively

Finally, in the Data Transformation module, we transformed the microbiome data into four widely used data formats: CLR [51], rarefied count [52], proportion, and arcsine-root for each taxonomic rank (phylum, class, order, family, genus, species).

Data analysis

The Data Analysis module consists of three sub-modules: (i) Descriptive Analysis, (ii) Generalized Linear Models and (iii) Causal Machine Learning.

First, the purpose of the Descriptive Analysis module is to perform basic descriptive analysis to survey the association between treatment and response, without involving microbiome data. If the response variable is binary (e.g., 0: unrecovered, 1: recovered), Fisher’s exact test [54] (default) or Pearson’s Chi-squared test [55] can be employed, while if it is continuous (e.g., post-treatment body mass index), the Mann-Whitney test [56] (default) or Welch’s t-test [57] can be employed. MiCML visualizes the results using bar plots, box plots and/or kernel density plots [58] depending on whether the response variable is binary or continuous. Users can also adjust the legend locations in the plots.

Second, the purpose of the Generalized Linear Models module is to perform significance testing on the interaction effects between treatment and each microbial taxon on the patient’s recovery using generalized linear models [41]. For this, users first need to select a data format: CLR (default) [51], rarefied count [52], proportion, or arcsine-root. If the response variable is binary (e.g., 0: unrecovered, 1: recovered), the logistic regression model is employed, while if it is continuous (e.g., post-treatment body mass index), the ordinary linear regression model is employed. Users can add covariates to the model to perform covariate-adjusted analysis. Users can also survey the taxonomic ranks from phylum to genus, which is typical for 16S rRNA amplicon sequencing [1, 2], or from phylum to species, which is typical for shotgun metagenomic sequencing [4]. MiCML applies the Benjamini-Hochberg (BH) step-up procedure [59] to control the false discovery rate (FDR) per taxonomic rank and produces Q-values (i.e., FDR-adjusted P-values). MiCML visualizes the results using line plots for the fitted response values stratified by treatment status. Users can also adjust the number of taxa to be displayed (default: Q-value < 0.05) and the legend locations in the plots.
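The BH adjustment mentioned above is short enough to write out. The Python sketch below implements the standard step-up adjustment for one taxonomic rank; it is an illustration rather than MiCML's code, and the P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values (Q-values) in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so that Q-values stay monotone.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical P-values for four taxa at one taxonomic rank.
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)
print([round(q, 4) for q in q_values])
```

Taxa whose Q-value falls below the chosen threshold (0.05 by default in MiCML) are then reported as significant.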

Finally, the purpose of the Causal Machine Learning module is to estimate the patient-level treatment effects using causal forest [38] and then discern microbial taxa that enhance (or lower) the efficacy of the treatment in subsequent analyses: (i) subgroup identification based on microbiome profiles and (ii) ranking microbial taxa and describing their relationship patterns. For this, users first need to select a data format: CLR (default) [51], rarefied count [52], proportion, or arcsine-root. For a randomized controlled trial, the double-sample tree is employed [38], while for an observational study, the propensity tree is employed [38]. For the propensity tree, users can specify covariates (i.e., potential confounders such as age and gender) to estimate propensity scores of receiving the treatment based on the covariates and microbiome profiles [38, 47, 48]. Otherwise, the propensity scores are estimated based on the microbiome profiles only. Then, MiCML automatically performs all the subsequent analytics to discern microbial taxa that enhance (or lower) the efficacy of the treatment. First, MiCML performs subgroup identification - fitting a regression tree [37, 39, 45] - and displays the results as a top-down tree structure. Second, MiCML performs treatment effect prediction - using the random forest algorithm [46] - to survey the microbial taxa that play crucial roles in enhancing (or lowering) the treatment effects, and displays the results using variable importance and partial dependence plots. Users can also adjust the maximum number of taxa to be displayed per taxonomic rank in the plots (default: 20).

Application note: We first performed basic descriptive analysis to examine the association between the treatment (0: anti-PD-1 only, 1: anti-PD-1 and anti-CTLA-4) and response to the treatment (0: non-responder (NR), 1: responder (R)). We observed that there are more responders (22, 61.1%) than non-responders (14, 38.9%) among the patients who received both anti-PD-1 and anti-CTLA-4, while there are more non-responders (50, 54.3%) than responders (42, 45.7%) among the patients who received anti-PD-1 only [Fig. 4]. This suggests some efficacy of the combined anti-PD-1 and anti-CTLA-4 therapy over anti-PD-1 only. However, the signal was not significant at the level of 0.05 by Fisher’s exact test [54] (P-value: 0.168) [Fig. 4].
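For readers who wish to check this test by hand, the following Python sketch computes a two-sided Fisher's exact P-value for the 2x2 table of counts given above. It uses the common convention of summing the probabilities of all tables that are no more likely than the observed one; whether this matches every detail of the routine used by MiCML is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum over all tables that are no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


# Rows: combined therapy, anti-PD-1 only; columns: responders, non-responders.
p_value = fisher_exact_two_sided(22, 14, 42, 50)
print(round(p_value, 3))
```

The printed value can be compared with the P-value of 0.168 reported above.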

Fig. 4
The results from descriptive analysis. The bar plots display the counts (on the left panel) and proportions (on the right panel) of non-responders (NR) and responders (R) stratified by the treatment status (Con: anti-PD-1 only, Trt: anti-PD-1 and anti-CTLA-4). The P-value was calculated by the Fisher’s exact test on the association between treatment and response

We then performed significance testing on the interaction effects between the treatment (0: anti-PD-1 only, 1: anti-PD-1 and anti-CTLA-4) and each microbial taxon in CLR [51] on the patient’s response to the treatment (0: non-responder (NR), 1: responder (R)) using logistic regression models. We found significant interaction effects for two microbial species, Clostridium sp. CAG:492 (Q-value: 0.001) and Enterococcus faecium [C] (Q-value: 0.032). We also estimated that the response rate increases as the CLR-transformed abundance of Clostridium sp. CAG:492 and Enterococcus faecium [C] decreases, indicating enhanced treatment effects at lower abundances [Fig. 5]. Hence, Clostridium sp. CAG:492 and Enterococcus faecium [C] might be harmful species for the administration of the combined anti-PD-1 and anti-CTLA-4 therapy.

Fig. 5
The results from Generalized Linear Models at the species level. The line plots display the fitted response rates (i.e., predicted probabilities of being a responder (R)) stratified by treatment status (Con: anti-PD-1 only, Trt: anti-PD-1 and anti-CTLA-4). Q-value represents FDR-adjusted P-value

Finally, we performed causal machine learning [37,38,39] to discern microbial taxa in CLR [51] that enhance (or lower) the efficacy of the combined anti-PD-1 and anti-CTLA-4 therapy. Since the underlying study design was observational [12,13,14, 16, 17], we used the propensity tree [38, 47, 48]. First, from the subgroup identification, we found that the metastatic melanoma patients with Clostridium sp. CAG:492 < 0.998 and Faecalibacterium prausnitzii ≥ 1.43 in CLR-transformed abundance showed an enhanced treatment effect (5.2%) [Fig. 6]. Hence, the combined anti-PD-1 and anti-CTLA-4 therapy might be effective for the patients that belong to this subgroup. It might also be possible to modulate a patient’s microbiome profile - using dietary supplements or therapeutics - so that the patient falls into this subgroup, and thereby to enhance the efficacy of the combined anti-PD-1 and anti-CTLA-4 therapy. In contrast, the metastatic melanoma patients with Clostridium sp. CAG:492 ≥ 0.998 in CLR-transformed abundance showed a lowered treatment effect (-14.2%) [Fig. 6]. Hence, the combined anti-PD-1 and anti-CTLA-4 therapy may not be effective for the patients that belong to this subgroup. Second, from the treatment effect prediction for ranking microbial taxa and describing the patterns of their relationships, we found Clostridium sp. CAG:492 and Enterococcus faecium [C] to be the two most influential microbial species on the efficacy of the combined anti-PD-1 and anti-CTLA-4 therapy, as shown in the variable importance plot [Fig. 7], while the other 18 microbial species showed comparatively little influence [Fig. 7].
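The subgroup rule read off the fitted tree can be written as a small decision function. The Python sketch below encodes only the two leaves whose thresholds and effects are reported in the text; the function name is hypothetical, and the remaining leaf (Clostridium sp. CAG:492 < 0.998 with Faecalibacterium prausnitzii < 1.43) is returned without an effect because its value is not stated here.

```python
def assign_subgroup(clostridium_clr, faecalibacterium_clr):
    """Map CLR abundances to the subgroups reported for the regression tree.

    Returns (label, estimated treatment effect); the effect is None for the
    leaf whose value is not reported in the text.
    """
    if clostridium_clr >= 0.998:
        return ("lowered", -0.142)
    if faecalibacterium_clr >= 1.43:
        return ("enhanced", 0.052)
    return ("not reported", None)


# Hypothetical patients: (Clostridium sp. CAG:492, Faecalibacterium prausnitzii).
for profile in [(0.5, 2.0), (1.2, 2.0), (0.5, 0.3)]:
    print(profile, assign_subgroup(*profile))
```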

Fig. 6
The results from Causal Machine Learning for the subgroup identification based on the fitted regression tree at the species level. The leaves are the identified subgroups of patients, and the values on the leaves are their estimated treatment effects. s8: Clostridium sp. CAG:492; s12: Faecalibacterium prausnitzii

Fig. 7
The results from Causal Machine Learning for the variable importance in treatment effect prediction at the species level. s8: Clostridium sp. CAG:492; s65: Enterococcus faecium [C]; s72: Desulfovibrio sp. [C 3_1_syn3/6_1_46AFAA]; s12: Faecalibacterium prausnitzii; s2: Holdemania filiformis; s52: Anaerostipes caccae [C]; s33: [Clostridium] scindens; s100: Parabacteroides goldsteinii [C]; s1: Staphylococcus sp. CAG:324; s59: Veillonella atypica [C]; s76: Collinsella intestinalis; s49: Eubacterium sp. CAG:156; s83: Prevotella copri; s4: [Clostridium] innocuum [C Clostridioides difficile]; s93: Bacteroides eggerthii; s39: Roseburia sp. 40_7; s67: Klebsiella sp. [C quasipneumoniae/ pneumoniae/ varicola]; s69: Haemophilus parainfluenzae; s60: Dialister invisus; s71: Sutterella wadsworthensis

We also estimated that the response rate decreases as the CLR-transformed abundance of Clostridium sp. CAG:492 or Enterococcus faecium [C] increases, indicating lowered treatment effects [Fig. 8]. Hence, these species might be harmful in the administration of the combined anti-PD-1 and anti-CTLA-4 therapy. In particular, the relationship is considerably non-linear for both species: the response rate drops sharply for Clostridium sp. CAG:492 > 1 and Enterococcus faecium [C] > 0.5 [Fig. 8]. In a clinical sense, this indicates that lowering the levels of Clostridium sp. CAG:492 and Enterococcus faecium [C] in patients who are rich in them might help to enhance the efficacy of the combined anti-PD-1 and anti-CTLA-4 therapy, whereas the same may not hold for patients who are already depleted in them [Fig. 8].
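The partial dependence curves in Fig. 8 average a model's predictions over the data while one taxon is held fixed at each value on a grid. The following Python sketch illustrates that computation under stated assumptions: `toy_effect` is a hypothetical stand-in for a fitted treatment effect predictor (it is not MiCML's model), and the rows and grid are made-up CLR values.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)       # copy so the original data are untouched
            modified[feature] = value  # fix the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def toy_effect(row):
    # Hypothetical predictor: the effect drops once feature 0 exceeds 1.
    base = 0.05 if row[0] <= 1.0 else -0.15
    return base + 0.01 * row[1]

rows = [[0.2, 1.0], [1.5, 2.0], [0.8, 3.0]]  # made-up CLR values for two taxa
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
curve = partial_dependence(toy_effect, rows, feature=0, grid=grid)
```

With this toy predictor the curve is flat up to a value of 1 and then falls, which is the kind of threshold-like, non-linear pattern described above.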

Fig. 8
The results from Causal Machine Learning for the partial dependences to treatment effects at the species level. s8: Clostridium sp. CAG:492; s65: Enterococcus faecium [C]; s72: Desulfovibrio sp. [C 3_1_syn3/6_1_46AFAA]; s12: Faecalibacterium prausnitzii; s2: Holdemania filiformis; s52: Anaerostipes caccae [C]; s33: [Clostridium] scindens; s100: Parabacteroides goldsteinii [C]; s1: Staphylococcus sp. CAG:324; s59: Veillonella atypica [C]; s76: Collinsella intestinalis; s49: Eubacterium sp. CAG:156; s83: Prevotella copri; s4: [Clostridium] innocuum [C Clostridioides difficile]; s93: Bacteroides eggerthii; s39: Roseburia sp. 40_7; s67: Klebsiella sp. [C quasipneumoniae/ pneumoniae/ varicola]; s69: Haemophilus parainfluenzae; s60: Dialister invisus; s71: Sutterella wadsworthensis

We reviewed the literature on the microbial species that we identified as harmful (i.e., Clostridium sp. CAG:492 and Enterococcus faecium) and beneficial (i.e., Faecalibacterium prausnitzii) in the administration of the combined anti-PD-1 and anti-CTLA-4 therapy. First, Clostridium is a well-known genus that includes pathogenic species, such as Clostridium botulinum, which can cause botulism [60], and Clostridium perfringens, which is associated with cellulitis, fasciitis, necrotic enteritis and gas gangrene [61]. However, Clostridium sp. CAG:492 is not a well-characterized species and does not yet have an officially designated name. Second, Enterococcus faecium is known for its antibiotic resistance and its role in causing enterococcal infections in hospitalized patients [62]. Third, Faecalibacterium prausnitzii is one of the most prevalent species in the gut, known for its commensal role in promoting gut health and its potential to discriminate ulcerative colitis and Crohn’s disease [63]. Although these studies do not specifically investigate the combined anti-PD-1 and anti-CTLA-4 therapy, they reveal consistent trends of harmful or beneficial effects on human health, which lends support to the reliability of our findings.

Discussion and conclusion

In this paper, we introduced a web cloud computing platform, MiCML, for comprehensive analysis of treatment effects using microbiome profiles. MiCML is distinguished by its facilities for (i) batch effect correction and (ii) causal machine learning. Specifically, first, for the batch effect correction, MiCML employs ConQuR [36], which robustly addresses the high complexity (e.g., zero-inflation, over-dispersion) of microbiome data and, for convenience, returns batch effect-corrected count data that can be used in any kind of subsequent data analysis. Second, for the causal machine learning, MiCML employs the causal forest [38] for consistent estimation of patient-level treatment effects, and then performs subgroup identification [37,38,39, 45] and treatment effect prediction to discern microbial taxa that enhance (or lower) the efficacy of the primary treatment.
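To make the subgroup identification step more concrete, the Python sketch below searches for the single threshold on one taxon that best separates patients by their estimated treatment effects, as one split of a regression tree [45] would. It is a simplified illustration and not MiCML's implementation: the function name `best_split` and the toy abundances and patient-level effects are hypothetical, and the effects are assumed to have been estimated beforehand (for example, by a causal forest).

```python
def best_split(values, effects):
    """Find the threshold on one taxon that best separates estimated effects.

    Returns (threshold, mean effect below, mean effect at or above), chosen to
    minimise the within-group sum of squared errors, or None if the taxon has
    only one unique value.
    """
    def sse(group):
        mean = sum(group) / len(group)
        return sum((e - mean) ** 2 for e in group)

    best = None
    # Skip the smallest value so that both sides of the split are non-empty.
    for threshold in sorted(set(values))[1:]:
        left = [e for v, e in zip(values, effects) if v < threshold]
        right = [e for v, e in zip(values, effects) if v >= threshold]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    return best[1:] if best else None

# Hypothetical CLR abundances of one taxon and pre-estimated patient-level effects.
abundance = [0.1, 0.4, 0.7, 1.1, 1.3, 1.6]
effect = [0.06, 0.05, 0.04, -0.12, -0.15, -0.16]
threshold, low_mean, high_mean = best_split(abundance, effect)
```

A full regression tree repeats this search over all taxa and recursively within each resulting subgroup; the leaves then correspond to subgroups such as those shown in Fig. 6.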

We illustrated the use of MiCML through an example analysis of gut microbiome data on the efficacy of cancer immunotherapy [12,13,14, 16, 17]. To increase the sample size, we combined gut microbiome data from four related studies [12,13,14, 16, 17] and then applied the batch effect correction using ConQuR [36]. However, the sample size of the combined data was 128, which is still modest. Hence, for better reliability, larger-scale microbiome data are needed. We found Clostridium sp. CAG:492 and Enterococcus faecium to be harmful species and Faecalibacterium prausnitzii to be a beneficial species in the administration of the combined anti-PD-1 and anti-CTLA-4 therapy. There was no major contradiction between our findings from the generalized linear model and those from causal machine learning, which supports their reproducibility and reliability. The prior studies [60,61,62,63] also supported our findings.

However, different methods can yield varying results, which is a common and expected occurrence. For novel findings, directly related prior studies may also be unavailable; this does not necessarily indicate that the findings are inaccurate. When discrepancies between methods are difficult to interpret, we recommend clearly documenting the methods and software used to facilitate reproducibility in future studies. For user convenience, MiCML also provides a comprehensive list of reference papers for the methods employed.

Cancer immunotherapy is a form of cancer treatment that uses the power of the human immune system to prevent, control and eliminate cancer, and it is widely recognized as a new paradigm of cancer treatment [64, 65]. However, it does not work for all patients or for all kinds of cancer, which indicates that its treatment effects are heterogeneous across patients. Moreover, it is highly expensive, and there are different kinds of cancer immunotherapy whose treatment effects can all vary. Notably, many recent studies [12,13,14,15,16,17,18,19] have reported pre-clinical evidence that the microbiome can influence the efficacy of cancer immunotherapy. This in turn implies that we can augment the treatment effect by manipulating the patient’s microbiome profile in advance using dietary supplements (e.g., prebiotics, probiotics, dietary fiber) or therapeutics (e.g., antibiotics, pharmabiotics, phage therapy, microbiota transplantation). Conversely, we can profile the patient’s microbiome first and then employ the treatment that works for that patient. Yet, such microbiome-based personalized medicine requires a carefully planned and automated software facility that streamlines all related data processing and analytic procedures, and MiCML can serve as a user-friendly analytic tool for this purpose for people in various disciplines. Finally, cancers and immunotherapies are not the only cases to be considered. MiCML is a general analytic tool for the interaction effects between treatment and microbiome on the patient’s response; its use is not restricted to cancer immunotherapy as the treatment or to recovery from cancer as the response.

Data availability

We used public microbiome data, for which the raw sequence data are deposited at the European Nucleotide Archive with accession IDs PRJEB22893, PRJNA399742, PRJNA397906, PRJEB22863 and PRJNA541981, at NCBI BioProject with accession ID PRJEB22863, and at the European Genome-Phenome Archive with accession ID EGAS00001002698. The final processed data are freely available as example data in the Data Input module (http://micml.micloud.kr) and can also be obtained from the GitHub repository (https://github.com/angelolimeta/gut-microbiome-immunotherapy). MiCML is freely available on our web server (http://micml.micloud.kr) and can alternatively be run on a user’s local computer (https://github.com/hk1785/micml).

References

  1. Woese CR, Fox GE. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci. 1977;74(11):5088–90.

  2. Woese CR, Kandler O, Wheelis ML. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci. 1990;87(12):4576–9.

  3. Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JL, Levesque CA, Chen W, Fungal Barcoding Consortium. Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proc Natl Acad Sci. 2012;109(16):6241–6.

  4. Thomas T, Gilbert J, Meyer F. Metagenomics-a guide from sampling to data analysis. Microb Inf Exp. 2012;2(3):1–12.

  5. Ridaura VK, Faith JJ, Rey FE, Cheng J, Duncan AE, Kau AL, Griffin NW, Lombard V, Henrissat B, Bain JR, Muehlbauer MJ. Gut microbiota from twins discordant for obesity modulate metabolism in mice. Science. 2013;341(6150):1241214.

  6. Sampson TR, Debelius JW, Thron T, Janssen S, Shastri GG, Ilhan ZE, Challis C, Schretter CE, Rocha S, Gradinaru V, et al. Gut microbiota regulate motor deficits and neuroinflammation in a model of Parkinson’s disease. Cell. 2016;167(6):1469–80.

  7. De Palma G, Lynch MD, Lu J, Dang VT, Deng Y, Jury J, Umeh G, Miranda PM, Pigrau Pastor M, Sidani S, Pinto-Sanchez MI. Transplantation of fecal microbiota from patients with irritable bowel syndrome alters gut function and behavior in recipient mice. Sci Transl Med. 2017;9(379):eaaf6397.

  8. Kang DW, Adams JB, Gregory AC, Borody T, Chittick L, Fasano A, Khoruts A, Geis E, Maldonado J, McDonough-Means S, et al. Microbiota transfer therapy alters gut ecosystem and improves gastrointestinal and autism symptoms: an open-label study. Microbiome. 2017;5(10):1–16.

  9. Touw K, Ringus DL, Hubert N, Wang Y, Leone VA, Nadimpalli A, Theriault BR, Huang YE, Tune JD, Herring PB, et al. Mutual reinforcement of pathophysiological host-microbe interactions in intestinal stasis models. Physiol Rep. 2017;5(6):e13182.

  10. Johnsen PH, Hilpüsch F, Cavanagh JP, Leikanger IS, Kolstad C, Valle PC, Goll R. Faecal microbiota transplantation versus placebo for moderate-to-severe irritable bowel syndrome: a double-blind, randomised, placebo-controlled, parallel-group, single-centre trial. Lancet Gastroenterol Hepatol. 2018;3(1):17–24.

  11. Zhang XS, Li J, Krautkramer KA, Badri M, Battaglia T, Borbet TC, Koh H, Ng S, Sibley RA, Li Y, et al. Antibiotic-induced acceleration of type 1 diabetes alters maturation of innate intestinal immunity. Elife. 2018;7:e37816.

  12. Matson V, Fessler J, Bao R, Chongsuwat T, Zha Y, Alegre ML, Luke JJ, Gajewski TF. The commensal microbiome is associated with anti–PD-1 efficacy in metastatic melanoma patients. Science. 2018;359(6371):104–8.

  13. Frankel AE, Coughlin LA, Kim J, Froehlich TW, Xie Y, Frenkel EP, Koh AY. Metagenomic shotgun sequencing and unbiased metabolomic profiling identify specific human gut microbiota and metabolites associated with immune checkpoint therapy efficacy in melanoma patients. Neoplasia. 2017;19(10):848–55.

  14. Gopalakrishnan V, Spencer CN, Nezi L, Reuben A, Andrews MC, Karpinets T, Prieto PA, Vicente D, Hoffman K, Wei SC, et al. Gut microbiome modulates response to anti–PD-1 immunotherapy in melanoma patients. Science. 2018;359(6371):97–103.

  15. Routy B, Le Chatelier E, Derosa L, Duong CP, Alou MT, Daillère R, Fluckiger A, Messaoudene M, Rauber C, Roberti MP, et al. Gut microbiome influences efficacy of PD-1–based immunotherapy against epithelial tumors. Science. 2018;359(6371):91–7.

  16. Peters BA, Wilson M, Moran U, Pavlick A, Izsak A, Wechter T, Weber JS, Osman I, Ahn J. Relating the gut metagenome and metatranscriptome to immunotherapy responses in melanoma patients. Genome Med. 2019;11(61):1–14.

  17. Limeta A, Ji B, Levin M, Gatto F, Nielsen J. Meta-analysis of the gut microbiota in predicting response to cancer immunotherapy in metastatic melanoma. JCI Insight. 2020;5(23):e140940.

  18. Mager LF, Burkhard R, Pett N, Cooke NC, Brown K, Ramay H, Paik S, Stagg J, Groves RA, Gallo M, et al. Microbiome-derived inosine modulates response to checkpoint inhibitor immunotherapy. Science. 2020;369(6510):1481–9.

  19. Li X, Zhang S, Guo G, Han J, Yu J. Gut microbiome in modulating immune checkpoint inhibitors. EBioMedicine. 2022;82(104163):1–11.

  20. Chen L, Li J, Zhu W, Kuang Y, Liu T, Zhang W, Chen X, Peng C. Skin and gut microbiome in psoriasis: gaining insight into the pathophysiology of it and finding novel therapeutic strategies. Front Microbiol. 2020;11:589726.

  21. An SB, Yang BG, Jang G, Kim DY, Kim J, Oh SM, Oh N, Lee S, Moon JY, Kim JA, et al. Combined IgE neutralization and Bifidobacterium longum supplementation reduces the allergic response in models of food allergy. Nat Commun. 2022;13:5669.

  22. Yang J, Shin TS, Kim JS, Jee YK, Kim YK. A new horizon of precision medicine: combination of the microbiome and extracellular vesicles. Exp Mol Med. 2022;54:466–82.

  23. Chang ZY, Liu HM, Leu YL, Hsu CH, Lee TY. Modulation of gut microbiota combined with upregulation of intestinal tight junction explains anti-inflammatory effect of corylin on colitis-associated cancer in mice. Int J Mol Sci. 2022;23(5):2667.

  24. Deehan EC, Zhang Z, Riva A, Armet AM, Perez-Muñoz ME, Nguyen NK, Krysa JA, Seethaler B, Zhao YY, Cole J, et al. Elucidating the role of the gut microbiota in the physiological effects of dietary fiber. Microbiome. 2022;10(77):1–22.

  25. Liu C, Jiang W, Yang F, Cheng Y, Guo Y, Yao W, Zhao Y, Qian H. The combination of microbiome and metabolome to analyze the cross-cooperation mechanism of Echinacea purpurea polysaccharide with the gut microbiota in vitro and in vivo. Food Funct. 2022;13:10069–82.

  26. Pearson JA, Ding H, Hu C, Peng J, Galuppo B, Wong FS, Caprio S, Santoro N, Wen L. IgM-associated gut bacteria in obesity and type 2 diabetes in C57BL/6 mice and humans. Diabetologia. 2022;65(8):1398–411.

  27. Jones-Nelson O, Tovchigrechko A, Glover MS, Fernandes F, Rangaswamy U, Liu H, Tabor DE, Boyd J, Warrener P, Martinez J, et al. Antibacterial monoclonal antibodies do not disrupt the intestinal microbiome or its function. Antimicrob Agents Chemother. 2020;64(5):e02347–19.

  28. Jang H, Koh H, Gu W, Kang B. Integrative web cloud computing and analytics using MiPair for design-based comparative analysis with paired microbiome data. Sci Rep. 2022;12(1):20465.

  29. Jang H, Park S, Koh H. Comprehensive microbiome causal mediation analysis using MiMed on user-friendly web interfaces. Biology Methods Protocols. 2023;8(1):bpad023.

  30. Kim J, Koh H. A unified web cloud analytic platform for user-friendly and interpretable microbiome data mining using tree-based methods. Microorganisms. 2023;11(11):2816.

  31. Dhariwal A, Chong J, Habib S, King IL, Agellon LB, Xia J. MicrobiomeAnalyst: a web-based tool for comprehensive statistical, visual and meta-analysis of microbiome data. Nucleic Acids Res. 2017;45(W1):W180–8.

  32. Weber N, Liou D, Dommer J, MacMenamin M, Quiñones M, Misner I, et al. Nephele: a cloud platform for simplified, standardized and reproducible microbiome data analysis. Bioinformatics. 2017;34(8):1411–3.

  33. Gonzalez A, Navas-Molina JA, Kosciolek T, McDonald D, Vázquez-Baeza Y, Ackermann G, et al. Qiita: rapid, web-enabled microbiome meta-analysis. Nat Methods. 2018;15:796–8.

  34. Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME2. Nat Biotechnol. 2019;37(8):852–7.

  35. Mitchell K, Ronas J, Dao C, Freise AC, Mangul S, Shapiro C, et al. PUMAA: a platform for accessible microbiome analysis in the undergraduate classroom. Front Microbiol. 2020;11:584699.

  36. Ling W, Lu J, Zhao N, Lulla A, Plantinga AM, Fu W, Zhang A, Liu H, Song H, Li Z, et al. Batch effects removal for microbiome data via conditional quantile regression. Nat Commun. 2022;13:5418.

  37. Foster JC, Taylor JM, Ruberg SJ. Subgroup identification from randomized clinical trial data. Stat Med. 2011;30(24):2867–80.

  38. Wager S, Athey S. Estimation and inference of heterogeneous treatment effects using random forests. J Am Stat Assoc. 2018;113(523):1228–42.

  39. Koh H. Subgroup identification using virtual twins for human microbiome studies. IEEE/ACM Trans Comput Biol Bioinform. 2023;20(6):3800–8.

  40. Zhang Y, Parmigiani G, Johnson WE. ComBat-Seq: batch effect adjustment for RNA-Seq count data. NAR Genom Bioinform. 2020;2(3):lqaa078.

  41. Nelder J, Wedderburn R. Generalized linear models. J R Stat Soc Ser A. 1972;135(3):370–84.

  42. Koh H, Zhao N. A powerful microbial group association test based on the higher criticism analysis for sparse microbial association signals. Microbiome. 2020;8:63.

  43. Neyman J. Sur les applications de la théorie des probabilités aux experiences agricoles: Essai Des principes. Roczniki Nauk Rolniczych. 1923;10:1–51.

  44. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688–701.

  45. Breiman L, Friedman JH, Stone CJ, Olshen RA. Classification and regression trees. CRC; 1984.

  46. Breiman L. Random forests. Mach Learn. 2001;45:5–32.

  47. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.

  48. Hirano K, Imbens GW, Ridder G. Efficient estimation of Average Treatment effects using the estimated propensity score. Econometrica. 2003;71(4):1161–89.

  49. McMurdie PJ, Holmes S. Phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE. 2013;8(4):e61217.

  50. Torgerson WS. Multidimensional scaling: I. Theory and method. Psychometrika. 1952;17(4):401–19.

  51. Aitchison J. The statistical analysis of compositional data. J R Stat Soc Ser B. 1982;44(2):139–77.

  52. Sanders HL. Marine benthic diversity: a comparative study. Am Nat. 1968;102(925):243–82.

  53. Bray JR, Curtis JT. An ordination of the upland forest communities of southern Wisconsin. Ecol Monogr. 1957;27(4):325–49.

  54. Fisher RA. On the interpretation of Chi-square from contingency tables, and the calculation of P. J R Stat Soc. 1922;85(1):87–94.

  55. Pearson K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos Mag Ser 5. 1900;50(302):157–75.

  56. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat. 1947;18(1):50–60.

  57. Welch BL. The generalization of Student’s problem when several different population variances are involved. Biometrika. 1947;34(1/2):28–35.

  58. Rosenblatt M. Remarks on some nonparametric estimates of a density function. Ann Math Stat. 1956;27(3):832–7.

  59. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995;57(1):289–300.

  60. Erbguth FJ. Historical notes on botulism, Clostridium botulinum, botulinum toxin, and the idea of the therapeutic use of the toxin. Mov Disord. 2004;Suppl 8:S2–6.

  61. Kiu R, Hall LJ. An update on the human and animal enteric pathogen Clostridium perfringens. Emerg Microbes Infect. 2018;7(1):141.

  62. Zhou X, Willems RJL, Friedrich AW, et al. Enterococcus faecium: from microbiological insights to practical recommendations for infection control and diagnostics. Antimicrob Resist Infect Control. 2020;9:130.

  63. Lopez-Siles M, Duncan S, Garcia-Gil L, et al. Faecalibacterium prausnitzii: from microbiology to diagnostics and prognostics. ISME J. 2017;11:841–52.

  64. Arbour KC, Riely GJ. Systemic therapy for locally advanced and metastatic non–small cell lung cancer: a review. J Am Med Assoc. 2019;322(8):764–74.

  65. Morad G, Helmink BA, Sharma P, Wargo JA. Hallmarks of response, resistance, and toxicity to immune checkpoint blockade. Cell. 2021;184(21):5309–37.

Acknowledgements

The authors are grateful to the reviewers for their insightful observations and comments.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (2021R1C1C1013861) and the Individual Faculty Accounts (IFA) funded by the State University of New York, Korea (SUNY Korea).

Author information

Contributions

HK conceptualized and initiated the study, curated the data, contributed to the methodological aspects, developed the overall architecture and design, wrote the programs, developed the Shiny application, web server and local GitHub repository, performed data analysis, and wrote the manuscript. JK and HJ wrote the programs and developed the Shiny application, web server and local GitHub repository. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Hyunwook Koh.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Koh, H., Kim, J. & Jang, H. MiCML: a causal machine learning cloud platform for the analysis of treatment effects using microbiome profiles. BioData Mining 18, 10 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13040-025-00422-3

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13040-025-00422-3

Keywords