MicrobeStudio
From raw amplicon data to publication-ready insights — no coding required
Measure alpha diversity (Shannon, Simpson, Chao1), compare groups with Kruskal-Wallis tests, and assess sampling adequacy with rarefaction curves
Visualize taxonomic composition with bar plots, heatmaps, and dendrograms at any rank from Phylum to Species
Run PERMANOVA, PERMDISP, and ordination (NMDS, PCoA, t-SNE) with significance testing and downloadable results
RDA, Mantel tests, regression analysis, and SHAP-based indicator species identification linking taxa to metadata
Drag & drop phyloseq .rds, BIOM, or CSV files
Remove rare taxa, apply rarefaction or TSS
Explore modules, download plots as PDF
Upload Data: Load .rds, BIOM, or CSV files
Filter: Remove taxa, apply TSS or rarefaction
Rarefaction: Assess sampling depth adequacy
Abundance: Bar plots, heatmaps, line charts
Alpha Diversity: Shannon, Simpson, Chao1 with statistics
Dendrogram: Hierarchical clustering of samples
Ordination: NMDS, PCoA, t-SNE with stress info
PERMANOVA: Statistical testing of group differences
Metadata: RDA, correlations, Mantel tests
Regression: Taxa vs. environmental variables
Indicator Species: SHAP-based biomarker identification
MicrobeStudio is an interactive Shiny application for exploring microbiome data through powerful visual analytics and statistical tools. Designed for streamlined phyloseq analysis, this guide walks you through each feature and provides detailed information on input formats to ensure a smooth experience.
Choose one of two supported formats:
CSV files: upload three separate tables:
| File Type | Description | Format |
|---|---|---|
| Count Table | Abundance data | Rows = ASVs/Taxa, Columns = Samples |
| Taxonomy Table | Taxonomic classifications | Rows = ASVs/Taxa, Columns = Taxonomic ranks |
| Metadata Table | Sample information | Rows = Samples, Columns = Metadata variables |
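Before uploading, it can help to confirm that the three tables refer to the same taxa and samples. The following is a minimal base R sketch, not part of MicrobeStudio itself; `check_tables()` is a hypothetical helper and the toy values are illustrative only.

```r
# Toy tables mirroring the three-file layout (illustrative values)
counts <- matrix(c(10, 12, 3, 15, 7, 14, 20, 0, 24), nrow = 3,
                 dimnames = list(c("Taxa1", "Taxa2", "Taxa3"),
                                 c("Sample1", "Sample2", "Sample3")))
taxonomy <- data.frame(Phylum = c("Proteobacteria", "Actinobacteria", "Firmicutes"),
                       row.names = c("Taxa1", "Taxa2", "Taxa3"))
metadata <- data.frame(pH = c(7.2, 6.8, 7.0),
                       row.names = c("Sample1", "Sample2", "Sample3"))

# check_tables() is a hypothetical helper: it lists IDs found in one table but not its partner
check_tables <- function(counts, taxonomy, metadata) {
  list(
    taxa_missing_taxonomy    = setdiff(rownames(counts), rownames(taxonomy)),
    taxonomy_missing_counts  = setdiff(rownames(taxonomy), rownames(counts)),
    samples_missing_metadata = setdiff(colnames(counts), rownames(metadata)),
    metadata_missing_counts  = setdiff(rownames(metadata), colnames(counts))
  )
}

problems <- check_tables(counts, taxonomy, metadata)
all_ok <- all(lengths(problems) == 0)   # TRUE when every ID has a partner
```

Any non-empty element of `problems` names the IDs that need fixing in the corresponding file.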
Phyloseq object (.rds): a pre-assembled R object containing all components in a single file.
💡 Pro Tip: For phylogeny-based analyses, your phyloseq object must include a phylogenetic tree. See the Phyloseq creation guide below for instructions.
Clean and prepare your data for analysis: remove rare taxa and apply rarefaction or total-sum scaling (TSS).
🔄 Important: Once filtering is complete, the cleaned dataset is used across all analysis modules automatically.
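To make the filtering idea concrete, here is a rough base R sketch of removing rare taxa and applying TSS. The thresholds `min_total` and `min_prevalence` are assumptions chosen for illustration; the app's own options and defaults may differ.

```r
# Toy count matrix: rows = taxa, columns = samples (illustrative values)
counts <- matrix(c(10, 12, 3, 0,
                   15, 7, 14, 0,
                   20, 0, 24, 1), nrow = 4,
                 dimnames = list(paste0("Taxa", 1:4), paste0("Sample", 1:3)))

min_total <- 5        # assumed threshold: minimum total reads per taxon
min_prevalence <- 2   # assumed threshold: minimum number of samples containing the taxon

# Keep taxa that pass both the abundance and the prevalence threshold
keep <- rowSums(counts) >= min_total & rowSums(counts > 0) >= min_prevalence
filtered <- counts[keep, , drop = FALSE]

# TSS: divide each sample (column) by its total so every column sums to 1
tss <- sweep(filtered, 2, colSums(filtered), "/")
```

Here `Taxa4` is dropped because it has a single read in a single sample, and the remaining counts are converted to relative abundances.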
Purpose: Assess sequencing depth adequacy
Features:
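As an illustration of what a rarefaction curve measures, the sketch below subsamples one sample's reads at increasing depths and records the average number of taxa observed. It is a simplified base R approximation under assumed toy counts, not the app's implementation; `observed_at_depth()` is a hypothetical helper.

```r
set.seed(42)

# Read counts for a single sample (illustrative values, 200 reads in total)
sample_counts <- c(Taxa1 = 120, Taxa2 = 60, Taxa3 = 15, Taxa4 = 4, Taxa5 = 1)

# Expand counts into one entry per read so reads can be drawn without replacement
reads <- rep(names(sample_counts), times = sample_counts)
depths <- c(10, 50, 100, 200)

# Hypothetical helper: mean number of distinct taxa seen across repeated subsamples
observed_at_depth <- function(reads, depth, n_iter = 50) {
  mean(replicate(n_iter, length(unique(sample(reads, depth)))))
}

curve <- vapply(depths, function(d) observed_at_depth(reads, d), numeric(1))
names(curve) <- depths
```

A curve that flattens before the full depth suggests that additional sequencing would reveal few new taxa.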
Purpose: Explore taxonomic composition
Plot Types: bar plots, heatmaps, line charts
Customization Options:
Purpose: Analyze sample relationships through clustering
Features:
Purpose: Measure within-sample diversity
Available Metrics: Shannon, Simpson, Chao1
Customization:
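For readers who want to see how these indices are defined, here is a small base R sketch. It assumes the Gini-Simpson form (1 minus the sum of squared proportions) and the bias-corrected Chao1 estimator; the app may use other variants.

```r
# Shannon index: -sum(p * log(p)) over taxa with non-zero counts
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Simpson index in its Gini-Simpson form: 1 - sum(p^2)
simpson <- function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
}

# Chao1 (bias-corrected): observed richness plus a term based on singletons and doubletons
chao1 <- function(x) {
  s_obs <- sum(x > 0)
  f1 <- sum(x == 1)   # taxa seen exactly once
  f2 <- sum(x == 2)   # taxa seen exactly twice
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# Apply to each sample (column) of a toy count matrix
counts <- matrix(c(10, 12, 3, 15, 7, 14, 20, 0, 24), nrow = 3,
                 dimnames = list(paste0("Taxa", 1:3), paste0("Sample", 1:3)))
alpha <- data.frame(Shannon = apply(counts, 2, shannon),
                    Simpson = apply(counts, 2, simpson),
                    Chao1   = apply(counts, 2, chao1))
```

Group differences in any of these columns could then be compared with base R's `kruskal.test()`.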
Purpose: Examine between-sample diversity patterns
Ordination Methods: NMDS, PCoA, t-SNE
Distance Measures:
Statistical Testing: PERMANOVA, PERMDISP
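The sketch below shows, in base R only, how a PCoA can be derived from a distance matrix. It assumes Bray-Curtis dissimilarity on relative abundances as the distance measure, which is an illustrative choice; `bray_curtis()` is a hypothetical helper, and `cmdscale()` (classical multidimensional scaling) is used as the PCoA step.

```r
# Toy count matrix: rows = taxa, columns = samples (illustrative values)
counts <- matrix(c(10, 12, 3,  15, 7, 14,  20, 0, 24,  30, 2, 1), nrow = 3,
                 dimnames = list(paste0("Taxa", 1:3), paste0("Sample", 1:4)))
rel <- sweep(counts, 2, colSums(counts), "/")   # relative abundances per sample

# Hypothetical helper: Bray-Curtis dissimilarity between two abundance vectors
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Fill a symmetric sample-by-sample dissimilarity matrix
n <- ncol(rel)
d <- matrix(0, n, n, dimnames = list(colnames(rel), colnames(rel)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray_curtis(rel[, i], rel[, j])
  }
}

# PCoA via classical multidimensional scaling on the dissimilarities
pcoa <- cmdscale(as.dist(d), k = 2, eig = TRUE)
coords <- pcoa$points   # one row of coordinates per sample
```

Plotting the columns of `coords` against each other gives the familiar ordination scatter plot, with nearby points representing samples of similar composition.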
Purpose: Correlate microbial communities with numerical environmental variables
Variables: pH, temperature, age, BMI, etc.
Analysis Steps:
⚠️ Critical: Only select columns with numeric data. Text or factor data will cause errors.
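A quick way to see which metadata columns are safe to select is to test each column's type before uploading. This base R sketch uses an assumed toy table in which `Temperature` was stored with a unit suffix, which makes R read it as text.

```r
# Toy metadata: Temperature carries a unit suffix, so it is stored as text
metadata <- data.frame(
  Location    = c("Lake_North", "Lake_South", "Lake_East"),
  pH          = c(7.2, 6.8, 7.0),
  Temperature = c("15.3C", "18.1C", "14.7C"),
  row.names   = c("Sample1", "Sample2", "Sample3")
)

# Columns that are numeric as loaded
numeric_before <- names(metadata)[vapply(metadata, is.numeric, logical(1))]

# Strip everything except digits, decimal points, and minus signs, then convert
metadata$Temperature <- as.numeric(gsub("[^0-9.-]", "", metadata$Temperature))

numeric_after <- names(metadata)[vapply(metadata, is.numeric, logical(1))]
```

After cleaning, `Temperature` joins `pH` as a usable numeric variable, while `Location` remains categorical.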
Purpose: Model relationships between specific taxa and environmental variables
Workflow:
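As a rough picture of what relating one taxon to one environmental variable involves, the following base R sketch fits a simple linear model. The values are invented for illustration, and the model type used by the app is not stated here, so ordinary least squares via `lm()` is an assumption.

```r
# Illustrative data: relative abundance of one taxon across five samples, and their pH
rel_abund <- c(0.40, 0.31, 0.22, 0.15, 0.09)
pH <- c(6.0, 6.5, 7.0, 7.5, 8.0)

# Ordinary least-squares fit of abundance on pH
fit <- lm(rel_abund ~ pH)

slope <- coef(fit)[["pH"]]            # change in relative abundance per pH unit
r_squared <- summary(fit)$r.squared   # proportion of variance explained
```

A negative `slope` here indicates that this toy taxon becomes less abundant as pH increases.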
For optimal performance with large datasets, run MicrobeStudio on your local machine.
git clone https://github.com/shanptom/MicrobeStudio.git ~/Path/to/your/folder
Run this command in R to install required packages:
source("Path/to/install_dep.R")
Then launch the app from R:
library(shiny)
runApp('/Path/to/MicrobeStudio')
For High-Performance Computing environments, load required modules first:
Example for Ohio Supercomputer Center (OSC):
ml gcc/12.3.0
ml R/4.4.0
ml gdal/3.7.3
ml proj/9.2.1
ml geos/3.12.0
R
🖥️ Note: Modify module commands based on your specific HPC environment.
The phyloseq package in R is a powerful and widely used tool for analyzing and visualizing microbial community data from metabarcoding studies. It integrates various types of data into a single structured object, enabling efficient and reproducible analysis.
A phyloseq object typically consists of the following components:
The OTU/ASV count table contains the number of times each taxon (e.g., OTU or ASV) was observed in each sample. Rows represent taxa and columns represent samples.
Example:
| Taxa | Sample1 | Sample2 | Sample3 |
|---|---|---|---|
| Taxa1 | 10 | 15 | 20 |
| Taxa2 | 12 | 7 | 0 |
| Taxa3 | 3 | 14 | 24 |
The taxonomy table describes the taxonomic classification of each taxon. Each row corresponds to a taxon, and columns represent taxonomic levels such as Kingdom, Phylum, Class, Order, Family, Genus, and Species.
Example:
| Taxa | Kingdom | Phylum | Class | Order | Family | Genus | Species |
|---|---|---|---|---|---|---|---|
| Taxa1 | Bacteria | Proteobacteria | Alphaproteobacteria | Rhizobiales | Rhizobiaceae | Rhizobium | R. leguminosarum |
| Taxa2 | Bacteria | Actinobacteria | Actinobacteria | Actinomycetales | Micrococcaceae | Arthrobacter | A. globiformis |
| Taxa3 | Bacteria | Firmicutes | Bacilli | Bacillales | Bacillaceae | Bacillus | B. subtilis |
The sample metadata table includes descriptive information about each sample, such as sampling location, time point, environmental variables, or experimental conditions.
Example:
| SampleID | Location | pH | Temperature (°C) | Season | Treatment |
|---|---|---|---|---|---|
| Sample1 | Lake_North | 7.2 | 15.3 | Spring | Control |
| Sample2 | Lake_South | 6.8 | 18.1 | Summer | Treated |
| Sample3 | Lake_East | 7.0 | 14.7 | Fall | Control |
Reference sequences are the actual DNA sequences (usually ASVs or OTUs) representing each taxon, often in FASTA format. This component is useful for downstream functional or phylogenetic analysis.
A phylogenetic tree inferred from the reference sequences, often using tools such as DECIPHER, phangorn, or external software (e.g., FastTree). It helps in calculating phylogeny-aware metrics like UniFrac distances.
Expected Input Files
| File Name | Description |
|---|---|
| count_table.csv | ASV/OTU count table (taxa x samples) |
| taxonomy_table.csv | Taxonomic classification per ASV |
| metadata.csv | Sample metadata (samples x variables) |
Open R, install the latest version of phyloseq (available from Bioconductor), and load it:
library(phyloseq)
The taxonomy table should have taxa/ASVs as row names and taxonomic ranks as columns.
Code:
tax <- read.csv("taxonomy_table.csv", header = TRUE, row.names = 1)
tax <- tax_table(as.matrix(tax))
The count table should have ASVs as row names and samples as column names.
Code:
# check.names = FALSE keeps sample names exactly as written in the file
asv_counts <- read.csv("count_table.csv", header = TRUE, row.names = 1, check.names = FALSE)
asv_counts <- otu_table(as.matrix(asv_counts), taxa_are_rows = TRUE)
The metadata should have sample names as row names.
Code:
meta <- read.csv("metadata.csv", header = TRUE, row.names = 1)
meta <- sample_data(meta)
Now combine all components into a single phyloseq object:
physeq <- phyloseq(asv_counts, tax, meta)
You can now use this physeq object for downstream analysis and visualization in phyloseq.
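To upload the object to MicrobeStudio as an .rds file, it first has to be written to disk. The sketch below shows the save and reload round trip with base R's `saveRDS()` and `readRDS()`. So that it runs without phyloseq installed, it falls back to a placeholder list when no `physeq` object exists; the placeholder and the temporary file path are assumptions for illustration.

```r
# Placeholder so the sketch runs on its own; in practice `physeq` is the object built above
if (!exists("physeq")) {
  physeq <- list(note = "placeholder for a phyloseq object")
}

# Write the object to an .rds file (a temporary path is used here for illustration)
rds_path <- file.path(tempdir(), "physeq.rds")
saveRDS(physeq, rds_path)

# Reload it to confirm the file contains the same object
restored <- readRDS(rds_path)
round_trip_ok <- identical(restored, physeq)
```

In a real session you would replace `rds_path` with a location of your choice and upload that file.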
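A complete phyloseq object provides a coherent structure for integrating and analyzing the following components: the count table, the taxonomy table, the sample metadata, the reference sequences, and the phylogenetic tree.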
Together, these elements enable robust exploration of microbial community composition, structure, and relation to metadata.
| Function | Description |
|---|---|
| read.csv() | Load all data tables as CSV files |
| as.matrix() | Convert to matrix for phyloseq input |
| tax_table() | Create taxonomic data |
| otu_table() | Create ASV count data |
| sample_data() | Create sample metadata |
| phyloseq() | Combine into a single phyloseq object |
Phylogenetic trees are essential for calculating phylogeny-aware diversity metrics like UniFrac and for conducting null model analyses. Below is a detailed step-by-step guide to build a phylogenetic tree from your reference sequences, starting from a phyloseq object or an external FASTA file.
You can extract reference sequences directly from a phyloseq object or load them from an external FASTA file. Ensure the sequence names (headers) exactly match the taxon/ASV names in your count table.
From a phyloseq object (here, ps is your phyloseq object):
refseq <- ps@refseq
From an external FASTA file:
library(Biostrings)
refseq <- readDNAStringSet("refseqs.fasta")
⚠️ Important: Taxon/ASV names in the FASTA headers must exactly match the taxa names in your OTU/ASV table.
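One way to check this requirement before aligning is to compare the FASTA headers with the taxa names. The base R sketch below writes a small demo FASTA file to a temporary folder and compares its headers against an assumed vector of taxa names; in practice that vector would come from phyloseq's `taxa_names(ps)`.

```r
# Demo FASTA file written to a temporary location (illustrative sequences)
fasta_path <- file.path(tempdir(), "refseqs_demo.fasta")
writeLines(c(">Taxa1", "ACGTACGT",
             ">Taxa2", "ACGTTCGT",
             ">TaxaX", "ACGAACGT"), fasta_path)

# Stand-in for taxa_names(ps): the taxa names in the count table
taxa_in_table <- c("Taxa1", "Taxa2", "Taxa3")

# Header lines start with ">"; drop that marker to get the sequence names
fasta_lines <- readLines(fasta_path)
headers <- sub("^>", "", fasta_lines[startsWith(fasta_lines, ">")])

missing_in_fasta <- setdiff(taxa_in_table, headers)   # taxa without a sequence
extra_in_fasta   <- setdiff(headers, taxa_in_table)   # sequences without a taxon
```

Both vectors should be empty before the tree is built; here `Taxa3` lacks a sequence and `TaxaX` has no matching taxon.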
Use the DECIPHER package to align sequences. This step is compute-intensive; request sufficient resources if running on an HPC cluster (e.g., OSC).
library(DECIPHER)
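# Set processors to the number of cores you requested; processors = NULL auto-detects all available cores
alignment <- AlignSeqs(refseq, anchor = NA, processors = 48)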
Convert the aligned sequences into a format suitable for distance and tree calculations.
library(phangorn)
phang.align <- phyDat(as(alignment, "matrix"), type = "DNA")
Compute maximum-likelihood distances from the alignment, build a neighbor-joining (NJ) starting tree, and fit an initial likelihood model to it.
dm <- dist.ml(phang.align)
treeNJ <- NJ(dm)
fit <- pml(treeNJ, data = phang.align)
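This step refines the model by adding gamma-distributed rate categories (k) and a proportion of invariant sites (inv). It is computationally intensive, especially with large datasets, so use update() to change parameters incrementally and avoid crashes. Note that update() only sets these values; fitting and optimizing a GTR model would additionally require phangorn's optim.pml() with model = "GTR".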
fitGTR <- update(fit, k = 4, inv = 0.2)
🧠 Tip:
If update() crashes or takes too long, start with lower values like k = 2 and inv = 0.
Increase k (number of gamma rate categories) and inv (proportion of invariant sites) gradually.
You can use AIC(fitGTR) to evaluate model fit.
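The incremental approach can be sketched as a small loop that raises k step by step and records the AIC each time. This example assumes phangorn is installed and uses its bundled `Laurasiatherian` alignment in place of your own data; if the package is missing, the sketch simply skips the computation.

```r
aic_values <- NULL

if (requireNamespace("phangorn", quietly = TRUE)) {
  library(phangorn)
  data(Laurasiatherian)   # example alignment shipped with phangorn, used as a stand-in

  # Starting tree and initial model, as in the steps above
  dm <- dist.ml(Laurasiatherian)
  fit <- pml(NJ(dm), data = Laurasiatherian)

  # Raise the number of gamma rate categories gradually and record AIC at each step
  aic_values <- c()
  for (k_val in c(1, 2, 4)) {
    fit_k <- update(fit, k = k_val)
    aic_values[paste0("k=", k_val)] <- AIC(fit_k)
  }
}
```

Lower AIC values indicate a better trade-off between fit and model complexity, so the setting with the smallest value is a reasonable candidate to carry forward.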
saveRDS(fitGTR, file = "fitGTR.rds")
Use this tree in downstream diversity analyses, like UniFrac, or integrate it into your phyloseq object using phy_tree().
ps <- merge_phyloseq(ps, phy_tree(fitGTR$tree))
| Step | Description |
|---|---|
| 1. | Extract reference sequences (from phyloseq or FASTA) |
| 2. | Align sequences with DECIPHER::AlignSeqs |
| 3. | Convert alignment to phyDat |
| 4. | Compute distance matrix with phangorn::dist.ml |
| 5. | Build initial NJ tree |
| 6. | Fit model with pml() |
| 7. | Refine the model with update() (gamma categories k, invariant sites inv) |
| 8. | Save tree object for later use |
Uploaded files are stored temporarily on the server.