Microbiome Analysis Platform

MicrobeStudio

From raw amplicon data to publication-ready insights — no coding required

10+ Analysis Modules

PDF Export All Plots

4 Input Formats

Statistical Tests

What You Can Do

Diversity Analysis

Measure alpha diversity (Shannon, Simpson, Chao1), compare groups with Kruskal-Wallis tests, and assess sampling adequacy with rarefaction curves

Community Profiling

Visualize taxonomic composition with bar plots, heatmaps, and dendrograms at any rank from Phylum to Species

Statistical Testing

Run PERMANOVA, PERMDISP, and ordination (NMDS, PCoA, t-SNE) with significance testing and downloadable results

Environmental Correlations

RDA, Mantel tests, regression analysis, and SHAP-based indicator species identification linking taxa to metadata

How It Works

1. Upload: Drag & drop phyloseq .rds, BIOM, or CSV files

2. Filter & Normalize: Remove rare taxa, apply rarefaction or TSS

3. Analyze & Export: Explore modules, download plots as PDF

Analysis Modules

Upload Data

Load .rds, BIOM, or CSV files

Filter & Normalize

Remove taxa, apply TSS or rarefaction

Rarefaction Curves

Assess sampling depth adequacy

Abundance Plots

Bar plots, heatmaps, line charts

Alpha Diversity

Shannon, Simpson, Chao1 with statistics

Dendrogram

Hierarchical clustering of samples

Ordination

NMDS, PCoA, t-SNE with stress info

PERMANOVA

Statistical testing of group differences

Metadata Analysis

RDA, correlations, Mantel tests

Regression

Taxa vs. environmental variables

Indicator Species

SHAP-based biomarker identification

Powered by

phyloseq vegan microeco ggplot2 XGBoost phylosmith

MicrobeStudio User Manual

MicrobeStudio is an interactive Shiny application for exploring microbiome data with visual analytics and statistical tools, built around phyloseq objects. This guide walks through each feature and describes the accepted input formats in detail.

1. Getting Started

Quick Start Workflow

Upload your data (CSV files or phyloseq object)
Filter unwanted taxa and normalize data
Explore using various analysis modules
Customize visualizations and export results

2. Data Upload

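Choose one of the supported formats. CSV files and phyloseq objects are described here; the upload panel also accepts BIOM files (v1 JSON or v2 HDF5).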

Option A: CSV Files (Recommended for beginners)

Upload three separate tables:

File Type	Description	Format
Count Table	Abundance data	Rows = ASVs/Taxa, Columns = Samples
Taxonomy Table	Taxonomic classifications	Rows = ASVs/Taxa, Columns = Taxonomic ranks
Metadata Table	Sample information	Rows = Samples, Columns = Metadata variables
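
The three tables only fit together if their identifiers agree. As a minimal sketch in base R, the checks below can be run before uploading. The helper name check_tables and the toy data are illustrative and not part of MicrobeStudio.

```r
# Hypothetical helper: report mismatches between the three input tables.
check_tables <- function(counts, taxonomy, metadata) {
  problems <- character(0)

  # Every taxon in the count table needs a row in the taxonomy table
  missing_tax <- setdiff(rownames(counts), rownames(taxonomy))
  if (length(missing_tax) > 0) {
    problems <- c(problems,
                  paste("Taxa without taxonomy:", paste(missing_tax, collapse = ", ")))
  }

  # Every sample (count-table column) needs a row in the metadata table
  missing_meta <- setdiff(colnames(counts), rownames(metadata))
  if (length(missing_meta) > 0) {
    problems <- c(problems,
                  paste("Samples without metadata:", paste(missing_meta, collapse = ", ")))
  }

  # Counts must be numeric
  if (!all(vapply(counts, is.numeric, logical(1)))) {
    problems <- c(problems, "Count table contains non-numeric columns")
  }
  problems
}

# Toy tables shaped like the CSV files described above
counts <- data.frame(Sample1 = c(10, 12, 3), Sample2 = c(15, 7, 14),
                     Sample3 = c(20, 0, 24),
                     row.names = c("Taxa1", "Taxa2", "Taxa3"))
taxonomy <- data.frame(Kingdom = rep("Bacteria", 3),
                       Genus = c("Rhizobium", "Arthrobacter", "Bacillus"),
                       row.names = c("Taxa1", "Taxa2", "Taxa3"))
metadata <- data.frame(pH = c(7.2, 6.8, 7.0),
                       Season = c("Spring", "Summer", "Fall"),
                       row.names = c("Sample1", "Sample2", "Sample3"))

issues <- check_tables(counts, taxonomy, metadata)
print(issues)  # an empty character vector means no mismatches were found
```

With real files, the tables would typically be loaded with read.csv(..., row.names = 1, check.names = FALSE) so that sample names are not altered on import.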

Option B: phyloseq Object (.rds)

A pre-assembled R object containing all components in a single file.

💡 Pro Tip: For phylogeny-based analyses, your phyloseq object must include a phylogenetic tree. See the Phyloseq creation guide below for instructions.

3. Data Filtering

Clean and prepare your data for analysis:

Available Filters

Remove Unwanted Taxa: Filter out contaminants (e.g., Chloroplast, Mitochondria)
Rarefaction: Normalize sequencing depth across samples (optional but recommended)

How to Apply Filters

Specify taxa to remove by name
Choose rarefaction depth (if desired)
Click Apply Filtering
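
For readers who want to reproduce these steps outside the app, the following is a rough phyloseq sketch of the same operations: removing unwanted taxa, rarefying, and TSS normalization. It is an assumption about what the filtering does, not MicrobeStudio's actual implementation, and the toy data are invented for illustration.

```r
library(phyloseq)

# Toy data (illustrative only): 4 taxa x 3 samples
counts <- matrix(c(50, 30, 20, 10,
                   40, 35, 25, 15,
                   60, 20, 10, 5),
                 nrow = 4,
                 dimnames = list(paste0("Taxa", 1:4), paste0("Sample", 1:3)))
tax <- matrix(c("Bacteria", "Bacteria", "Bacteria", "Bacteria",
                "Rhizobiales", "Chloroplast", "Bacillales", "Rickettsiales",
                "Rhizobiaceae", NA, "Bacillaceae", "Mitochondria"),
              nrow = 4,
              dimnames = list(paste0("Taxa", 1:4), c("Kingdom", "Order", "Family")))
ps <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), tax_table(tax))

# 1. Remove taxa whose classification contains an unwanted name at any rank
unwanted <- c("Chloroplast", "Mitochondria")
tax_mat <- as(tax_table(ps), "matrix")
is_unwanted <- apply(tax_mat, 1, function(ranks) any(ranks %in% unwanted))
ps_clean <- prune_taxa(names(is_unwanted)[!is_unwanted], ps)

# 2a. Rarefy every sample to the smallest library size (seed fixed for reproducibility)
ps_rare <- rarefy_even_depth(ps_clean,
                             sample.size = min(sample_sums(ps_clean)),
                             rngseed = 42, replace = FALSE, verbose = FALSE)

# 2b. Alternative: total-sum scaling (TSS), i.e. convert counts to proportions
ps_tss <- transform_sample_counts(ps_clean, function(x) x / sum(x))

sample_sums(ps_rare)
sample_sums(ps_tss)
```

Matching with %in% rather than != keeps taxa whose rank is NA, which a plain inequality test would silently drop.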

🔄 Important: Once filtering is complete, the cleaned dataset is used across all analysis modules automatically.

4. Analysis Modules

4.1 Rarefaction Plot

Purpose: Assess sequencing depth adequacy

Features:

Visualize depth distribution across samples
Color-code and organize by metadata categories
Identify samples with insufficient coverage
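
As an illustration of what a rarefaction analysis computes, here is a small sketch using vegan, one of the packages listed under "Powered by". Whether the app calls these exact functions is an assumption, and the community matrix is invented.

```r
library(vegan)

# vegan expects samples as rows; transpose a taxa x samples table with t() first
comm <- rbind(Sample1 = c(120, 80, 40, 10, 5, 1, 0, 0),
              Sample2 = c(60, 60, 50, 30, 20, 10, 5, 2),
              Sample3 = c(200, 5, 3, 0, 0, 0, 0, 0))
colnames(comm) <- paste0("Taxa", 1:8)

# Shallowest sample defines the common depth
depth <- min(rowSums(comm))

# Expected number of taxa per sample when subsampled to that depth
expected_richness <- rarefy(comm, sample = depth)
print(expected_richness)

# Rarefaction curves: a curve that is still rising at its right end
# suggests the sample was not sequenced deeply enough
rarecurve(comm, step = 5, sample = depth, label = TRUE)
```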

4.2 Abundance Analysis

Purpose: Explore taxonomic composition

Plot Types:

Bar Plots: Stacked relative abundance
Line Plots: Abundance trends over samples/conditions
Heatmaps: Color-coded abundance matrices

Customization Options:

Select taxonomic rank (Genus, Family, etc.)
Focus on top N most abundant taxa
Arrange samples by metadata or manually
Flip axes for temporal/stratigraphic data

4.3 Dendrogram

Purpose: Analyze sample relationships through clustering

Features:

Multiple distance metrics available
Group samples by metadata categories
Adjustable label appearance

4.4 Alpha Diversity

Purpose: Measure within-sample diversity

Available Metrics:

Shannon diversity
Simpson diversity
Other standard indices (Observed, Chao1, ACE, InvSimpson, Fisher)

Customization:

Group by metadata categories
Adjust sample ordering and colors
Modify axis orientation and labels
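
To make the indices concrete, the sketch below computes Shannon and Simpson diversity by hand in base R and compares two groups with a Kruskal-Wallis test. The definitions follow the common conventions (natural-log Shannon, Simpson as 1 minus the sum of squared proportions); that the app uses the same conventions is an assumption, and the data are invented.

```r
# Shannon index: H = -sum(p * log(p)) over taxa with non-zero abundance
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Simpson index (Gini-Simpson form): 1 - sum(p^2)
simpson <- function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
}

# Toy count table: taxa as rows, samples as columns
counts <- cbind(S1 = c(40, 30, 20, 10), S2 = c(35, 35, 20, 10),
                S3 = c(30, 30, 25, 15), S4 = c(90, 5, 3, 2),
                S5 = c(85, 10, 5, 0),   S6 = c(95, 3, 1, 1))

alpha <- data.frame(Sample  = colnames(counts),
                    Shannon = apply(counts, 2, shannon),
                    Simpson = apply(counts, 2, simpson),
                    Group   = rep(c("Control", "Treated"), each = 3))
print(alpha)

# Non-parametric comparison of Shannon diversity between groups
kw <- kruskal.test(Shannon ~ Group, data = alpha)
print(kw)
```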

4.5 Beta Diversity & Ordination

Purpose: Examine between-sample diversity patterns

Ordination Methods:

NMDS (Non-metric Multidimensional Scaling)
PCoA (Principal Coordinates Analysis)
t-SNE (t-distributed Stochastic Neighbor Embedding)
Additional methods available

Distance Measures:

Bray-Curtis (recommended for abundance data)
Jaccard (presence/absence)
UniFrac (requires phylogenetic tree)

Statistical Testing:

PERMANOVA: Test for significant group differences
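
The workflow behind this module can be sketched with vegan: compute a Bray-Curtis distance matrix, ordinate it (PCoA here), test group differences with PERMANOVA (adonis2), and check dispersion with betadisper. This is a sketch on simulated data; the object names are illustrative and the app's internal calls may differ.

```r
library(vegan)
set.seed(1)

# Simulated community: 10 samples x 6 taxa, two groups with opposite abundance profiles
comm <- rbind(
  matrix(rpois(30, lambda = rep(c(30, 20, 10, 5, 2, 1), each = 5)), nrow = 5),
  matrix(rpois(30, lambda = rep(c(1, 2, 5, 10, 20, 30), each = 5)), nrow = 5)
)
rownames(comm) <- paste0("Sample", 1:10)
colnames(comm) <- paste0("Taxa", 1:6)
meta <- data.frame(Group = factor(rep(c("A", "B"), each = 5)),
                   row.names = rownames(comm))

# Between-sample distances
bray <- vegdist(comm, method = "bray")

# PCoA (classical multidimensional scaling) on the distance matrix
pcoa <- cmdscale(bray, k = 2, eig = TRUE)
plot(pcoa$points, col = as.integer(meta$Group), pch = 19,
     xlab = "PCoA 1", ylab = "PCoA 2")

# PERMANOVA: does community composition differ between groups?
perm <- adonis2(bray ~ Group, data = meta, permutations = 999)
print(perm)

# PERMDISP: are within-group dispersions comparable?
disp <- betadisper(bray, meta$Group)
disp_test <- permutest(disp, permutations = 999)
print(disp_test)
```

A significant PERMANOVA combined with a significant dispersion test should be read with care, since the group difference may then reflect spread rather than location.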

4.6 Metadata Analysis

Purpose: Correlate microbial communities with numerical environmental variables

Variables: pH, temperature, age, BMI, etc.

Analysis Steps:

Select Numeric Columns: Choose numerical variables.
Create Environment Dataset: Click “Create trans_env”
Choose Analysis Type:
- RDA (Redundancy Analysis): Quantify variation explained by metadata
- Correlation: Direct relationships between genera and variables
- Mantel Test: Distance-based correlations

⚠️ Critical: Only select columns with numeric data. Text or factor data will cause errors.
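
A quick way to see which metadata columns qualify is to test each column's type before uploading. The base R sketch below uses invented metadata in which one column carries a unit suffix and is therefore read as text.

```r
# Toy metadata: Temperature was typed with a unit suffix, so R reads it as text
meta <- data.frame(
  Location    = c("Lake_North", "Lake_South", "Lake_East"),
  pH          = c(7.2, 6.8, 7.0),
  Temperature = c("15.3C", "18.1C", "14.7C"),
  stringsAsFactors = FALSE
)

# Columns that are safe to select as numeric variables
numeric_cols <- names(meta)[vapply(meta, is.numeric, logical(1))]
print(numeric_cols)

# Repair: strip everything except digits, decimal point, and minus sign
meta$Temperature <- as.numeric(gsub("[^0-9.-]", "", meta$Temperature))

numeric_cols_fixed <- names(meta)[vapply(meta, is.numeric, logical(1))]
print(numeric_cols_fixed)
```

Keeping units in the column name rather than in the cell values avoids the problem at the source.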

4.7 Regression Analysis

Purpose: Model relationships between specific taxa and environmental variables

Workflow:

Select target taxon (e.g., specific genus)
Choose numeric environmental variable
Optionally group by metadata categories
Generate linear model plots
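
The underlying model is an ordinary linear regression of a taxon's abundance on one environmental variable. A minimal base R sketch with invented values (the variable names are illustrative):

```r
# Toy data: relative abundance of one genus and the pH of each sample
df <- data.frame(
  Abundance = c(0.05, 0.08, 0.12, 0.15, 0.21, 0.24),
  pH        = c(5.5, 6.0, 6.5, 7.0, 7.5, 8.0)
)

# Fit abundance as a linear function of pH
fit <- lm(Abundance ~ pH, data = df)
print(summary(fit))

# Scatter plot with the fitted line
plot(df$pH, df$Abundance, pch = 19, xlab = "pH", ylab = "Relative abundance")
abline(fit)
```

Grouping by a metadata category corresponds to fitting the same model separately within each group.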

5. Tips & Best Practices

General Usage

Tooltips: Hover over inputs for contextual help
Manual Ordering: Use comma-separated values for custom sample order
Dynamic Updates: Plots refresh automatically with new selections

Performance Optimization

Large Datasets: Consider running locally for better performance
Demo Files: Practice with sample datasets included in this repository (see data/)

Common Issues & Solutions

Slow Performance: Use local installation for large datasets
Analysis Errors: Ensure metadata columns are properly formatted (numeric vs. categorical)
Missing Results: Check that filtering step was completed successfully

6. Running MicrobeStudio Locally

For optimal performance with large datasets, run MicrobeStudio on your local machine.

Prerequisites

R (version 4.0 or higher)
RStudio (recommended)
Git (for cloning repository)

Installation Steps

Step 1: Clone Repository

git clone https://github.com/shanptom/MicrobeStudio.git ~/Path/to/your/folder

Step 2: Install Dependencies

Run this command in R to install required packages:

source("Path/to/install_dep.R")
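
After the install script finishes, it can be useful to confirm that the main packages load. The package list below is an assumption taken from the "Powered by" line plus shiny; install_dep.R remains the authoritative list.

```r
# Assumed package names (from the "Powered by" list); adjust to match install_dep.R
pkgs <- c("shiny", "phyloseq", "vegan", "microeco", "ggplot2", "xgboost", "phylosmith")

# requireNamespace() returns FALSE instead of erroring when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are available.")
}
```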

Step 3: Launch Application

library(shiny)
runApp('/Path/to/MicrobeStudio')

HPC Cluster Setup (Optional)

For High-Performance Computing environments, load required modules first:

Example for Ohio Supercomputer Center (OSC):

ml gcc/12.3.0
ml R/4.4.0
ml gdal/3.7.3
ml proj/9.2.1
ml geos/3.12.0
R

🖥️ Note: Modify module commands based on your specific HPC environment.

Getting Help

GitHub Issues: Report bugs or request features
Documentation: Additional guides available in the repository

Creating a Phyloseq Object

The phyloseq package in R is a powerful and widely used tool for analyzing and visualizing microbial community data from metabarcoding studies. It integrates various types of data into a single structured object, enabling efficient and reproducible analysis.

A phyloseq object typically consists of the following components:

1. Count Table (OTU/ASV Table)

This table contains the number of times each taxon (e.g., OTU or ASV) was observed in each sample. Rows represent taxa and columns represent samples.

Example:

Taxa	Sample1	Sample2	Sample3
Taxa1	10	15	20
Taxa2	12	7	0
Taxa3	3	14	24

2. Taxonomy Table (Taxa Table)

This table describes the taxonomic classification of each taxon. Each row corresponds to a taxon, and columns represent taxonomic levels such as Kingdom, Phylum, Class, Order, Family, Genus, and Species.

Example:

Taxa	Kingdom	Phylum	Class	Order	Family	Genus	Species
Taxa1	Bacteria	Proteobacteria	Alphaproteobacteria	Rhizobiales	Rhizobiaceae	Rhizobium	R. leguminosarum
Taxa2	Bacteria	Actinobacteria	Actinobacteria	Actinomycetales	Micrococcaceae	Arthrobacter	A. globiformis
Taxa3	Bacteria	Firmicutes	Bacilli	Bacillales	Bacillaceae	Bacillus	B. subtilis

3. Sample Metadata

This table includes descriptive information about each sample, such as sampling location, time point, environmental variables, or experimental conditions.

Example:

SampleID	Location	pH	Temperature_C	Season	Treatment
Sample1	Lake_North	7.2	15.3	Spring	Control
Sample2	Lake_South	6.8	18.1	Summer	Treated
Sample3	Lake_East	7.0	14.7	Fall	Control

4. Reference Sequences (Optional)

These are the actual DNA sequences (usually ASVs or OTUs) representing each taxon, often in FASTA format. This component is useful for downstream functional or phylogenetic analysis.

5. Phylogenetic Tree (Optional)

A phylogenetic tree inferred from the reference sequences, often using tools such as DECIPHER, phangorn, or external software (e.g., FastTree). It helps in calculating phylogeny-aware metrics like UniFrac distances.

6. Creating a Phyloseq Object from CSV Files

Expected Input Files

File Name	Description
`count_table.csv`	ASV/OTU count table (taxa x samples)
`taxonomy_table.csv`	Taxonomic classification per ASV
`metadata.csv`	Sample metadata (samples x variables)

Required Package

Open R, install the latest version of phyloseq (distributed through Bioconductor), and load it:

library(phyloseq)

1. Load the Taxonomy Table

The taxonomy table should have taxa/ASVs as row names and taxonomic ranks as columns.

Code:

tax <- read.csv("taxonomy_table.csv", header = TRUE, row.names = 1)
tax <- tax_table(as.matrix(tax))

2. Load the ASV/OTU Count Table

The count table should have ASVs as row names and samples as column names.

Code:

asv_counts <- read.csv("count_table.csv", header = TRUE, row.names = 1)
asv_counts <- otu_table(as.matrix(asv_counts), taxa_are_rows = TRUE)

3. Load the Sample Metadata

The metadata should have sample names as row names.

Code:

meta <- read.csv("metadata.csv", header = TRUE, row.names = 1)
meta <- sample_data(meta)

4. Combine into a Phyloseq Object

Now combine all components into a single phyloseq object:

physeq <- phyloseq(asv_counts, tax, meta)

You can now use this physeq object for downstream analysis and visualization in phyloseq.
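
The same steps can be tried end to end without any files. The sketch below builds the object from in-memory tables that mirror the examples above, checks that identifiers line up, and saves the result as an .rds file of the kind the upload step accepts. The output path is a temporary file chosen for illustration.

```r
library(phyloseq)

# Count table: taxa as rows, samples as columns (values from the example above)
counts <- matrix(c(10, 12, 3, 15, 7, 14, 20, 0, 24), nrow = 3,
                 dimnames = list(paste0("Taxa", 1:3), paste0("Sample", 1:3)))

# Taxonomy table: same row names as the count table
tax <- matrix(c("Bacteria", "Bacteria", "Bacteria",
                "Proteobacteria", "Actinobacteria", "Firmicutes",
                "Rhizobium", "Arthrobacter", "Bacillus"),
              nrow = 3,
              dimnames = list(paste0("Taxa", 1:3), c("Kingdom", "Phylum", "Genus")))

# Metadata: row names must match the sample names in the count table
meta <- data.frame(Location  = c("Lake_North", "Lake_South", "Lake_East"),
                   pH        = c(7.2, 6.8, 7.0),
                   Treatment = c("Control", "Treated", "Control"),
                   row.names = paste0("Sample", 1:3))

# Fail early if identifiers do not line up
stopifnot(identical(rownames(counts), rownames(tax)),
          setequal(colnames(counts), rownames(meta)))

physeq <- phyloseq(otu_table(counts, taxa_are_rows = TRUE),
                   tax_table(tax),
                   sample_data(meta))
print(physeq)

# Save as a single .rds file and read it back to confirm it round-trips
rds_path <- file.path(tempdir(), "physeq.rds")
saveRDS(physeq, rds_path)
reloaded <- readRDS(rds_path)
```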

Summary

A complete phyloseq object provides a coherent structure for integrating and analyzing the following components:

Count Data → What taxa are found in each sample and how abundant?
Taxonomic Classification → What are these taxa?
Sample Metadata → What conditions or contexts do the samples represent?
(Optional) DNA Sequences & Phylogenies → What are the evolutionary relationships between taxa?

Together, these elements enable robust exploration of microbial community composition, structure, and relation to metadata.

Function	Description
`read.csv()`	Load all data tables as CSV files
`as.matrix()`	Convert to matrix for phyloseq input
`tax_table()`	Create taxonomic data
`otu_table()`	Create ASV count data
`sample_data()`	Create sample metadata
`phyloseq()`	Combine into a single phyloseq object

Constructing a Phylogenetic Tree from Reference Sequences in a Phyloseq Workflow

Phylogenetic trees are essential for calculating phylogeny-aware diversity metrics like UniFrac and for conducting null model analyses. Below is a detailed step-by-step guide to build a phylogenetic tree from your reference sequences, starting from a phyloseq object or an external FASTA file.

1. Extract Reference Sequences

You can extract reference sequences directly from a phyloseq object or load them from an external FASTA file. Ensure the sequence names (headers) exactly match the taxon/ASV names in your count table.

Option A: From Phyloseq Object

refseq <- refseq(ps)

Here ps is your phyloseq object; refseq() is the phyloseq accessor for its reference-sequence slot.

Option B: From FASTA File

library(Biostrings)
refseq <- readDNAStringSet("refseqs.fasta")

⚠️ Important: Taxon/ASV names in the FASTA headers must exactly match the taxa names in your OTU/ASV table.
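
A name mismatch is easy to detect before running the alignment. The sketch below compares sequence names with taxon names using invented data; in practice taxa_ids would come from taxa_names(ps) and refseq from the FASTA file.

```r
library(Biostrings)

# Stand-in for taxa_names(ps)
taxa_ids <- c("Taxa1", "Taxa2", "Taxa3")

# Stand-in for readDNAStringSet("refseqs.fasta"); note the lowercase "taxa3"
refseq <- DNAStringSet(c(Taxa1 = "ACGTACGT", Taxa2 = "ACGTTCGT", taxa3 = "ACGAACGT"))

# FASTA names are the full header line; keep only the ID before the first space
names(refseq) <- sub(" .*", "", names(refseq))

missing_seqs <- setdiff(taxa_ids, names(refseq))  # taxa with no sequence
extra_seqs   <- setdiff(names(refseq), taxa_ids)  # sequences with no taxon

print(missing_seqs)
print(extra_seqs)
```

Both vectors should be empty before continuing; here the case difference in one header is reported in each.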

2. Multiple Sequence Alignment

Use the DECIPHER package to align sequences. This step is compute-intensive; request sufficient resources if running on an HPC cluster (e.g., OSC).

library(DECIPHER)
alignment <- AlignSeqs(refseq, anchor = NA, processors = 48)  # set processors to the number of cores you requested

3. Convert Alignment to PhyDat Format

Convert the aligned sequences into a format suitable for distance and tree calculations.

library(phangorn)
phang.align <- phyDat(as(alignment, "matrix"), type = "DNA")

4. Calculate Distance Matrix

Use maximum likelihood distance based on the alignment.

dm <- dist.ml(phang.align)

5. Construct Initial Tree Using Neighbor Joining

treeNJ <- NJ(dm)

6. Fit Tree Using the pml Function

fit <- pml(treeNJ, data = phang.align)

7. Update Tree Using GTR Model (with inv/gamma options)

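This step sets the rate-heterogeneity parameters on the fitted object: k is the number of gamma rate categories and inv is the proportion of invariant sites. Note that update() only sets these values; in phangorn the GTR parameters are actually optimized by optim.pml() (for example with model = "GTR"), and that optimization is the computationally intensive part, especially with large datasets. Change parameters incrementally to avoid crashes.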

fitGTR <- update(fit, k = 4, inv = 0.2)

🧠 Tip:

If update() crashes or takes too long, start with lower values like k = 2 and inv = 0.

Increase k (number of gamma rate categories) and inv (proportion of invariant sites) gradually.

You can use AIC(fitGTR) to evaluate model fit.
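
The optimization itself is not shown in the steps above. As a sketch of how it is commonly done in phangorn, the example below runs the whole fit on the Laurasiatherian alignment that ships with the package, then optimizes a GTR model with optim.pml(). The dataset and the choice of rearrangement = "none" (fastest, keeps the NJ topology) are assumptions made so the example runs quickly; they are not the author's settings.

```r
library(phangorn)

# Example alignment bundled with phangorn (stands in for phang.align)
data(Laurasiatherian)

dm     <- dist.ml(Laurasiatherian)           # pairwise ML distances
treeNJ <- NJ(dm)                             # starting tree
fit    <- pml(treeNJ, data = Laurasiatherian) # initial likelihood object

# Set 4 gamma categories and a starting proportion of invariant sites
fitGTR <- update(fit, k = 4, inv = 0.2)

# Optimize GTR rates, base frequencies, gamma shape, invariant sites, and edge lengths.
# rearrangement = "NNI" or "stochastic" also searches tree topologies but is much slower.
fitGTR <- optim.pml(fitGTR, model = "GTR",
                    optInv = TRUE, optGamma = TRUE,
                    rearrangement = "none",
                    control = pml.control(trace = 0))

# Lower AIC indicates a better trade-off between fit and model complexity
AIC(fit)
AIC(fitGTR)
```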

8. Save the Fitted Tree Object

saveRDS(fitGTR, file = "fitGTR.rds")

9. Combine with `Phyloseq` object

Use this tree in downstream diversity analyses, like UniFrac, or integrate it into your phyloseq object using phy_tree().

ps <- merge_phyloseq(ps, phy_tree(fitGTR$tree))

Notes on Runtime

Tree construction is a time-consuming process.
For ~90 samples and 48 cores on HPC, this step can take 6-8 hours.
Always request computational resources based on your dataset size.

Summary

Step	Description
1.	Extract reference sequences (from phyloseq or FASTA)
2.	Align sequences with `DECIPHER::AlignSeqs`
3.	Convert alignment to `phyDat`
4.	Compute distance matrix with `phangorn::dist.ml`
5.	Build initial NJ tree
6.	Fit model with `pml()`
7.	Optimize with `update()` using GTR
8.	Save tree object for later use
9.	Add the tree to the phyloseq object with `phy_tree()`

Data Input

BIOM v1 (JSON) or v2 (HDF5). Optionally add metadata CSV above.

Load Demo Data

Select Demo Dataset:

Uploaded files are stored temporarily on the server.

Data Filtering

Normalization Options

Apply rarefaction

Normalize by TSS

Taxa Filters

Dataset Summary

Download Filtered Phyloseq (.rds)

Plot Controls

Axis Text Size

Sample Label Size

Show Sample Labels

Download Plot (PDF)

Plot Settings

Plot Type

Number of Top Taxa

Axis Text Size

Flip axes (horizontal plot)

Download Plot (PDF)

Diversity Indices

Select Diversity Index

Observed

Chao1

ACE

Shannon

Simpson

InvSimpson

Fisher

Flip axes (horizontal plot)

Text Label Size

Download Plot (PDF) Download Table (CSV)

Download Plot (PDF)

Ordination Info:

Analysis Summary:

Download Plot (PDF)

About PERMANOVA: Tests for differences in community composition between groups using permutations of a distance matrix (adonis2). R² indicates effect size (proportion of variance explained). Consider checking homogeneity of dispersion (PERMDISP) to validate assumptions.

PERMDISP:

Download CSV
