Microbiome Analysis Platform

MicrobeStudio

From raw amplicon data to publication-ready insights — no coding required

  • 10+ Analysis Modules
  • PDF Export All Plots
  • 4 Input Formats
  • Statistical Tests

What You Can Do

Diversity Analysis

Measure alpha diversity (Shannon, Simpson, Chao1), compare groups with Kruskal-Wallis tests, and assess sampling adequacy with rarefaction curves

Community Profiling

Visualize taxonomic composition with bar plots, heatmaps, and dendrograms at any rank from Phylum to Species

Statistical Testing

Run PERMANOVA, PERMDISP, and ordination (NMDS, PCoA, t-SNE) with significance testing and downloadable results

Environmental Correlations

RDA, Mantel tests, regression analysis, and SHAP-based indicator species identification linking taxa to metadata

How It Works

  1. Upload: Drag & drop phyloseq .rds, BIOM, or CSV files
  2. Filter & Normalize: Remove rare taxa, apply rarefaction or TSS
  3. Analyze & Export: Explore modules, download plots as PDF

Analysis Modules

  • Upload Data: Load .rds, BIOM, or CSV files
  • Filter & Normalize: Remove taxa, apply TSS or rarefaction
  • Rarefaction Curves: Assess sampling depth adequacy
  • Abundance Plots: Bar plots, heatmaps, line charts
  • Alpha Diversity: Shannon, Simpson, Chao1 with statistics
  • Dendrogram: Hierarchical clustering of samples
  • Ordination: NMDS, PCoA, t-SNE with stress info
  • PERMANOVA: Statistical testing of group differences
  • Regression: Taxa vs. environmental variables
  • Indicator Species: SHAP-based biomarker identification

Powered by: phyloseq, vegan, microeco, ggplot2, XGBoost, phylosmith

MicrobeStudio User Manual

MicrobeStudio is an interactive Shiny application for exploring microbiome data with visual analytics and statistical tools, built around phyloseq objects. This guide walks you through each feature and describes the supported input formats in detail.


1. Getting Started

Quick Start Workflow

  1. Upload your data (CSV files or phyloseq object)
  2. Filter unwanted taxa and normalize data
  3. Explore using various analysis modules
  4. Customize visualizations and export results

2. Data Upload

Choose one of two supported formats:

Option A: CSV Files

Upload three separate tables:

| File Type | Description | Format |
|---|---|---|
| Count Table | Abundance data | Rows = ASVs/Taxa, Columns = Samples |
| Taxonomy Table | Taxonomic classifications | Rows = ASVs/Taxa, Columns = Taxonomic ranks |
| Metadata Table | Sample information | Rows = Samples, Columns = Metadata variables |

Option B: phyloseq Object (.rds)

A pre-assembled R object containing all components in a single file.

💡 Pro Tip: For phylogeny-based analyses, your phyloseq object must include a phylogenetic tree. See the Phyloseq creation guide below for instructions.


3. Data Filtering

Clean and prepare your data for analysis:

Available Filters

  • Remove Unwanted Taxa: Filter out contaminants (e.g., Chloroplast, Mitochondria)
  • Rarefaction: Normalize sequencing depth across samples (optional but recommended)

How to Apply Filters

  1. Specify taxa to remove by name
  2. Choose rarefaction depth (if desired)
  3. Click Apply Filtering

🔄 Important: Once filtering is complete, the cleaned dataset is used across all analysis modules automatically.
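
As a rough illustration of what these filters do, the base-R sketch below drops taxa whose taxonomy mentions an unwanted name, rarefies every sample to a common depth, and also shows total-sum scaling (TSS). It is not MicrobeStudio's actual implementation: the toy `counts` and `taxonomy` objects and the helper names `remove_taxa()` and `rarefy_counts()` are hypothetical.

```r
# Toy inputs (made up for illustration): taxa in rows, samples in columns.
counts <- matrix(c(10, 15, 20,
                   12,  7,  5,
                    3, 14, 24,
                   30,  2,  9),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("Taxa", 1:4), paste0("Sample", 1:3)))
taxonomy <- data.frame(
  Order  = c("Rhizobiales", "Chloroplast", "Bacillales", "Rickettsiales"),
  Family = c("Rhizobiaceae", NA, "Bacillaceae", "Mitochondria"),
  row.names = rownames(counts)
)

# Drop every taxon whose taxonomy contains an unwanted name at any rank.
remove_taxa <- function(counts, taxonomy, unwanted) {
  hit <- apply(taxonomy, 1, function(r) any(r %in% unwanted))
  counts[!hit, , drop = FALSE]
}

# Rarefy: subsample each sample without replacement to the same depth.
rarefy_counts <- function(counts, depth) {
  stopifnot(all(colSums(counts) >= depth))
  out <- apply(counts, 2, function(x) {
    reads <- rep(seq_along(x), times = x)            # one entry per read
    tabulate(reads[sample.int(length(reads), depth)], nbins = length(x))
  })
  dimnames(out) <- dimnames(counts)
  out
}

set.seed(1)
filtered <- remove_taxa(counts, taxonomy, c("Chloroplast", "Mitochondria"))
rarefied <- rarefy_counts(filtered, depth = min(colSums(filtered)))

# TSS alternative: convert counts to per-sample proportions.
tss <- sweep(filtered, 2, colSums(filtered), "/")
```

After rarefaction every sample has the same total read count; after TSS every sample sums to 1.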


4. Analysis Modules

4.1 Rarefaction Plot

Purpose: Assess sequencing depth adequacy

Features:

  • Visualize depth distribution across samples
  • Color-code and organize by metadata categories
  • Identify samples with insufficient coverage
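
The idea behind a rarefaction curve can be sketched in base R: subsample one sample's reads at increasing depths and count how many taxa are observed. The counts and the helper `richness_at_depth()` below are hypothetical and only illustrate the concept, not the app's plotting code.

```r
# Mean observed richness (taxa with >= 1 read) after subsampling one
# sample's counts to a given depth, averaged over a few random draws.
richness_at_depth <- function(x, depth, n_draws = 20) {
  reads <- rep(seq_along(x), times = x)
  mean(replicate(n_draws, {
    length(unique(reads[sample.int(length(reads), depth)]))
  }))
}

set.seed(42)
sample_counts <- c(120, 60, 30, 10, 5, 2, 1, 1)   # made-up counts for one sample
depths <- c(10, 50, 100, sum(sample_counts))
rich <- vapply(depths, function(d) richness_at_depth(sample_counts, d), numeric(1))
data.frame(depth = depths, richness = rich)
```

A curve that flattens well before the sample's total read count suggests the sample was sequenced deeply enough; one that is still climbing suggests insufficient coverage.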

4.2 Abundance Analysis

Purpose: Explore taxonomic composition

Plot Types:

  • Bar Plots: Stacked relative abundance
  • Line Plots: Abundance trends over samples/conditions
  • Heatmaps: Color-coded abundance matrices

Customization Options:

  • Select taxonomic rank (Genus, Family, etc.)
  • Focus on top N most abundant taxa
  • Arrange samples by metadata or manually
  • Flip axes for temporal/stratigraphic data
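
A minimal sketch of the "rank plus top N" logic, assuming a count matrix with taxa in rows and a matching vector of genus labels (both made up here): counts are summed per genus, converted to relative abundance, and everything outside the top N is pooled into "Other".

```r
# Toy data: five ASVs in two samples, with a genus label per ASV.
counts <- matrix(c(40, 10,
                   20, 30,
                   25, 35,
                   10, 20,
                    5,  5),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("ASV", 1:5), c("Sample1", "Sample2")))
genus <- c("Rhizobium", "Rhizobium", "Bacillus", "Arthrobacter", "Bacillus")

# Aggregate ASV counts to the chosen rank, then scale each sample to sum to 1.
genus_counts <- rowsum(counts, group = genus)
rel <- sweep(genus_counts, 2, colSums(genus_counts), "/")

# Keep the top N genera by mean relative abundance; pool the rest as "Other".
top_n <- 2
keep  <- names(sort(rowMeans(rel), decreasing = TRUE))[seq_len(top_n)]
top   <- rel[keep, , drop = FALSE]
top   <- rbind(top, Other = 1 - colSums(top))
top
```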

4.3 Dendrogram

Purpose: Analyze sample relationships through clustering

Features:

  • Multiple distance metrics available
  • Group samples by metadata categories
  • Adjustable label appearance
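
Hierarchical clustering of samples can be sketched with base R's `dist()` and `hclust()`. The relative abundances below are invented, and Euclidean distance with average linkage is an assumption chosen for the example; the app may offer other metrics.

```r
# Made-up relative abundances: taxa in rows, samples in columns.
rel <- cbind(A1 = c(0.60, 0.30, 0.10),
             A2 = c(0.55, 0.35, 0.10),
             B1 = c(0.10, 0.20, 0.70),
             B2 = c(0.15, 0.15, 0.70))
rownames(rel) <- c("Rhizobium", "Bacillus", "Arthrobacter")

# dist() works on rows, so transpose to get sample-to-sample distances.
d  <- dist(t(rel), method = "euclidean")
hc <- hclust(d, method = "average")

# Cut the tree into two clusters to see which samples group together.
clusters <- cutree(hc, k = 2)
clusters
```

Calling `plot(hc)` draws the corresponding dendrogram.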

4.4 Alpha Diversity

Purpose: Measure within-sample diversity

Available Metrics:

  • Shannon diversity
  • Simpson diversity
  • Other standard indices

Customization:

  • Group by metadata categories
  • Adjust sample ordering and colors
  • Modify axis orientation and labels
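
For reference, the two named indices can be written in a few lines of base R. This sketch assumes the Shannon index with natural logarithms and the Simpson index in its 1 − D form; the counts and group labels are made up, and the Kruskal-Wallis call shows how groups might be compared.

```r
# Shannon: H = -sum(p * ln p) over taxa with non-zero counts.
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Simpson (1 - D form): probability that two random reads are different taxa.
simpson <- function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
}

# Made-up counts: taxa in rows, six samples in columns.
counts <- cbind(C1 = c(10, 12, 3, 0), C2 = c(15, 7, 14, 1), C3 = c(20, 0, 24, 5),
                T1 = c(40, 1, 1, 0),  T2 = c(35, 2, 0, 1),  T3 = c(50, 0, 3, 0))

alpha <- data.frame(
  Group   = rep(c("Control", "Treated"), each = 3),
  Shannon = apply(counts, 2, shannon),
  Simpson = apply(counts, 2, simpson)
)

# Non-parametric comparison of Shannon diversity between groups.
kw <- kruskal.test(Shannon ~ Group, data = alpha)
alpha
kw$p.value
```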

4.5 Beta Diversity & Ordination

Purpose: Examine between-sample diversity patterns

Ordination Methods:

  • NMDS (Non-metric Multidimensional Scaling)
  • PCoA (Principal Coordinates Analysis)
  • t-SNE (t-distributed Stochastic Neighbor Embedding)
  • Additional methods available

Distance Measures:

  • Bray-Curtis (recommended for abundance data)
  • Jaccard (presence/absence)
  • UniFrac (requires phylogenetic tree)

Statistical Testing:

  • PERMANOVA: Test for significant group differences
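
To make the distance and ordination steps concrete, here is a base-R sketch that computes Bray-Curtis dissimilarity by hand and runs a PCoA with `cmdscale()`. The counts and the helper `bray_curtis()` are illustrative assumptions; MicrobeStudio itself relies on the packages listed above.

```r
# Bray-Curtis dissimilarity between two count vectors.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Made-up counts: taxa in rows, samples in columns.
counts <- cbind(S1 = c(10, 12, 3, 0),
                S2 = c(15, 7, 14, 1),
                S3 = c(20, 0, 24, 5),
                S4 = c(2, 30, 1, 8))
samples <- colnames(counts)

# Pairwise sample-by-sample dissimilarity matrix.
d <- sapply(samples, function(i) {
  sapply(samples, function(j) bray_curtis(counts[, i], counts[, j]))
})

# PCoA (classical multidimensional scaling) on the dissimilarities.
pcoa <- cmdscale(as.dist(d), k = 2, eig = TRUE)
pcoa$points   # sample coordinates on the first two axes
```

A value of 0 means two samples have identical counts, and 1 means they share no taxa.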

4.6 Metadata Analysis

Purpose: Correlate microbial communities with numerical environmental variables

Variables: pH, temperature, age, BMI, etc.

Analysis Steps:

  1. Select Numeric Columns: Choose numerical variables.
  2. Create Environment Dataset: Click “Create trans_env”
  3. Choose Analysis Type:
    • RDA (Redundancy Analysis): Quantify variation explained by metadata
    • Correlation: Direct relationships between genera and variables
    • Mantel Test: Distance-based correlations

⚠️ Critical: Only select columns with numeric data. Text or factor data will cause errors.
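
Before uploading, it can help to check which metadata columns R actually treats as numeric. The sketch below uses a made-up metadata table in which a unit suffix has turned the temperature column into text, then shows one way to repair it.

```r
# Made-up metadata: Temperature carries a unit suffix, so R reads it as text.
meta <- data.frame(
  Location    = c("Lake_North", "Lake_South", "Lake_East"),
  pH          = c(7.2, 6.8, 7.0),
  Temperature = c("15.3 C", "18.1 C", "14.7 C"),
  row.names   = c("Sample1", "Sample2", "Sample3")
)

# Which columns are safe to select as numeric variables?
is_num <- vapply(meta, is.numeric, logical(1))
names(meta)[is_num]

# Strip the unit and convert, so the column becomes usable.
meta$Temperature <- as.numeric(sub(" C", "", meta$Temperature, fixed = TRUE))
vapply(meta, is.numeric, logical(1))
```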

4.7 Regression Analysis

Purpose: Model relationships between specific taxa and environmental variables

Workflow:

  1. Select target taxon (e.g., specific genus)
  2. Choose numeric environmental variable
  3. Optionally group by metadata categories
  4. Generate linear model plots
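
The underlying model is an ordinary linear regression. A minimal sketch with invented values, assuming the response is a genus's relative abundance and the predictor is pH:

```r
# Made-up data: relative abundance of one genus against sample pH.
df <- data.frame(
  pH        = c(6.5, 6.8, 7.0, 7.2, 7.5, 7.8),
  rel_abund = c(0.05, 0.08, 0.10, 0.12, 0.16, 0.18)
)

# Fit abundance as a linear function of pH.
fit <- lm(rel_abund ~ pH, data = df)

coef(fit)                 # intercept and slope
summary(fit)$r.squared    # proportion of variance explained
```

Grouping by a metadata category amounts to fitting the same model separately within each group.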

5. Tips & Best Practices

General Usage

  • Tooltips: Hover over inputs for contextual help
  • Manual Ordering: Use comma-separated values for custom sample order
  • Dynamic Updates: Plots refresh automatically with new selections

Performance Optimization

  • Large Datasets: Consider running locally for better performance
  • Demo Files: Practice with sample datasets included in this repository (see data/)

Common Issues & Solutions

  • Slow Performance: Use local installation for large datasets
  • Analysis Errors: Ensure metadata columns are properly formatted (numeric vs. categorical)
  • Missing Results: Check that filtering step was completed successfully

6. Running MicrobeStudio Locally

For optimal performance with large datasets, run MicrobeStudio on your local machine.

Prerequisites

  • R (version 4.0 or higher)
  • RStudio (recommended)
  • Git (for cloning repository)

Installation Steps

Step 1: Clone Repository

git clone https://github.com/shanptom/MicrobeStudio.git ~/Path/to/your/folder

Step 2: Install Dependencies

Run this command in R to install required packages:

source("Path/to/install_dep.R")

Step 3: Launch Application

library(shiny)
runApp('/Path/to/MicrobeStudio')

HPC Cluster Setup (Optional)

For High-Performance Computing environments, load required modules first:

Example for Ohio Supercomputer Center (OSC):

ml gcc/12.3.0
ml R/4.4.0
ml gdal/3.7.3
ml proj/9.2.1
ml geos/3.12.0
R

🖥️ Note: Modify module commands based on your specific HPC environment.


Getting Help

  • GitHub Issues: Report bugs or request features
  • Documentation: Additional guides available in the repository


Creating a Phyloseq Object

The phyloseq package in R is a powerful and widely used tool for analyzing and visualizing microbial community data from metabarcoding studies. It integrates various types of data into a single structured object, enabling efficient and reproducible analysis.

A phyloseq object typically consists of the following components:


1. Count Table (OTU/ASV Table)

This table contains the number of times each taxon (e.g., OTU or ASV) was observed in each sample. Rows represent taxa and columns represent samples.

Example:

| Taxa | Sample1 | Sample2 | Sample3 |
|---|---|---|---|
| Taxa1 | 10 | 15 | 20 |
| Taxa2 | 12 | 7 | 0 |
| Taxa3 | 3 | 14 | 24 |

2. Taxonomy Table (Taxa Table)

This table describes the taxonomic classification of each taxon. Each row corresponds to a taxon, and columns represent taxonomic levels such as Kingdom, Phylum, Class, Order, Family, Genus, and Species.

Example:

| Taxa | Kingdom | Phylum | Class | Order | Family | Genus | Species |
|---|---|---|---|---|---|---|---|
| Taxa1 | Bacteria | Proteobacteria | Alphaproteobacteria | Rhizobiales | Rhizobiaceae | Rhizobium | R. leguminosarum |
| Taxa2 | Bacteria | Actinobacteria | Actinobacteria | Actinomycetales | Micrococcaceae | Arthrobacter | A. globiformis |
| Taxa3 | Bacteria | Firmicutes | Bacilli | Bacillales | Bacillaceae | Bacillus | B. subtilis |

3. Sample Metadata

This table includes descriptive information about each sample, such as sampling location, time point, environmental variables, or experimental conditions.

Example:

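| SampleID | Location | pH | Temperature_C | Season | Treatment |
|---|---|---|---|---|---|
| Sample1 | Lake_North | 7.2 | 15.3 | Spring | Control |
| Sample2 | Lake_South | 6.8 | 18.1 | Summer | Treated |
| Sample3 | Lake_East | 7.0 | 14.7 | Fall | Control |

Keep units in the column name rather than in the cells, so that numeric variables are read as numbers.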

4. Reference Sequences (Optional)

These are the actual DNA sequences (usually ASVs or OTUs) representing each taxon, often in FASTA format. This component is useful for downstream functional or phylogenetic analysis.


5. Phylogenetic Tree (Optional)

A phylogenetic tree inferred from the reference sequences, often using tools such as DECIPHER, phangorn, or external software (e.g., FastTree). It helps in calculating phylogeny-aware metrics like UniFrac distances.


6. Creating a Phyloseq Object from CSV Files

Expected Input Files

| File Name | Description |
|---|---|
| count_table.csv | ASV/OTU count table (taxa x samples) |
| taxonomy_table.csv | Taxonomic classification per ASV |
| metadata.csv | Sample metadata (samples x variables) |

Required Package

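Open R and install the latest version of phyloseq from Bioconductor, then load it:

# One-time install: install.packages("BiocManager"); BiocManager::install("phyloseq")
library(phyloseq)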

1. Load the Taxonomy Table

The taxonomy table should have taxa/ASVs as row names and taxonomic ranks as columns.

Code:

tax <- read.csv("taxonomy_table.csv", header = TRUE, row.names = 1)
tax <- tax_table(as.matrix(tax))

2. Load the ASV/OTU Count Table

The count table should have ASVs as row names and samples as column names.

Code:

asv_counts <- read.csv("count_table.csv", header = TRUE, row.names = 1)
asv_counts <- otu_table(as.matrix(asv_counts), taxa_are_rows = TRUE)

3. Load the Sample Metadata

The metadata should have sample names as row names.

Code:

meta <- read.csv("metadata.csv", header = TRUE, row.names = 1)
meta <- sample_data(meta)

4. Combine into a Phyloseq Object

Now combine all components into a single phyloseq object:

physeq <- phyloseq(asv_counts, tax, meta)

You can now use this physeq object for downstream analysis and visualization in phyloseq, or save it with saveRDS(physeq, "physeq.rds") and upload the .rds file to MicrobeStudio.
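
A common reason the three tables fail to combine is that their IDs do not line up. The base-R sketch below, using a hypothetical helper `check_tables()` and the small example tables from this guide, compares taxa and sample IDs before the phyloseq step. One frequent cause of mismatches is that `read.csv()` by default rewrites column names that are not valid R names (for example, names starting with a digit or containing hyphens); passing `check.names = FALSE` keeps them unchanged.

```r
# Report ID mismatches between the three input tables.
check_tables <- function(counts, taxonomy, metadata) {
  problems <- character(0)
  if (!setequal(rownames(counts), rownames(taxonomy))) {
    problems <- c(problems, "taxa IDs differ between count and taxonomy tables")
  }
  if (!setequal(colnames(counts), rownames(metadata))) {
    problems <- c(problems, "sample IDs differ between count table and metadata")
  }
  problems
}

# Small in-memory versions of the example tables.
counts <- data.frame(Sample1 = c(10, 12, 3), Sample2 = c(15, 7, 14),
                     Sample3 = c(20, 0, 24),
                     row.names = c("Taxa1", "Taxa2", "Taxa3"))
taxonomy <- data.frame(Genus = c("Rhizobium", "Arthrobacter", "Bacillus"),
                       row.names = c("Taxa1", "Taxa2", "Taxa3"))
metadata <- data.frame(pH = c(7.2, 6.8, 7.0),
                       row.names = c("Sample1", "Sample2", "Sample3"))

check_tables(counts, taxonomy, metadata)   # empty result means the IDs match

# A metadata table with differently spelled sample IDs is flagged.
bad_metadata <- metadata
rownames(bad_metadata) <- c("sample-1", "sample-2", "sample-3")
check_tables(counts, taxonomy, bad_metadata)
```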

Summary

A complete phyloseq object provides a coherent structure for integrating and analyzing the following components:

  • Count Data → What taxa are found in each sample and how abundant?
  • Taxonomic Classification → What are these taxa?
  • Sample Metadata → What conditions or contexts do the samples represent?
  • (Optional) DNA Sequences & Phylogenies → What are the evolutionary relationships between taxa?

Together, these elements enable robust exploration of microbial community composition, structure, and relation to metadata.

| Function | Description |
|---|---|
| read.csv() | Load all data tables as CSV files |
| as.matrix() | Convert to matrix for phyloseq input |
| tax_table() | Create taxonomic data |
| otu_table() | Create ASV count data |
| sample_data() | Create sample metadata |
| phyloseq() | Combine into a single phyloseq object |

Constructing a Phylogenetic Tree from Reference Sequences in a Phyloseq Workflow

Phylogenetic trees are essential for calculating phylogeny-aware diversity metrics like UniFrac and for conducting null model analyses. Below is a detailed step-by-step guide to build a phylogenetic tree from your reference sequences, starting from a phyloseq object or an external FASTA file.


1. Extract Reference Sequences

You can extract reference sequences directly from a phyloseq object or load them from an external FASTA file. Ensure the sequence names (headers) exactly match the taxon/ASV names in your count table.

Option A: From Phyloseq Object

refseq <- ps@refseq

where ps is your phyloseq object

Option B: From FASTA File

library(Biostrings)
refseq <- readDNAStringSet("refseqs.fasta")

⚠️ Important: Taxon/ASV names in the FASTA headers must exactly match the taxa names in your OTU/ASV table.


2. Multiple Sequence Alignment

Use the DECIPHER package to align sequences. This step is compute-intensive; request sufficient resources if running on an HPC cluster (e.g., OSC).

library(DECIPHER)
# Set processors to the number of cores you actually have available or requested.
alignment <- AlignSeqs(refseq, anchor = NA, processors = 48)

3. Convert Alignment to PhyDat Format

Convert the aligned sequences into a format suitable for distance and tree calculations.

library(phangorn)
phang.align <- phyDat(as(alignment, "matrix"), type = "DNA")

4. Calculate Distance Matrix

Use maximum likelihood distance based on the alignment.

dm <- dist.ml(phang.align)

5. Construct Initial Tree Using Neighbor Joining

treeNJ <- NJ(dm)

6. Fit Tree Using the pml Function

fit <- pml(treeNJ, data = phang.align)

7. Update Tree Using GTR Model (with inv/gamma options)

This step improves the model fit. However, it is computationally intensive, especially with large datasets. Use update() to test parameters incrementally to avoid crashes.

fitGTR <- update(fit, k = 4, inv = 0.2)

🧠 Tip:

  • If update() crashes or takes too long, start with lower values like k = 2 and inv = 0.

  • Increase k (number of gamma rate categories) and inv (proportion of invariant sites) gradually.

  • You can use AIC(fitGTR) to evaluate model fit.
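
In phangorn, update() changes the model settings but does not by itself optimize them; optimization is usually done with optim.pml(). The sketch below, which follows the general pattern in the phangorn documentation rather than anything specific to this guide, shows how a GTR model with gamma and invariant-site parameters might be optimized. It uses the `Laurasiatherian` example alignment shipped with phangorn as a stand-in for your own `phang.align`, and NNI rearrangements are an assumption chosen to keep runtime modest.

```r
library(phangorn)

# Example alignment bundled with phangorn (stand-in for phang.align).
data(Laurasiatherian)

dm     <- dist.ml(Laurasiatherian)            # ML distances
treeNJ <- NJ(dm)                              # starting tree
fit    <- pml(treeNJ, data = Laurasiatherian) # likelihood of the starting tree

# Set starting values for gamma categories and invariant sites.
fitGTR <- update(fit, k = 4, inv = 0.2)

# Optimize tree and model parameters under GTR + gamma + invariant sites.
fitGTR <- optim.pml(fitGTR, model = "GTR", optInv = TRUE, optGamma = TRUE,
                    rearrangement = "NNI",
                    control = pml.control(trace = 0))

# Lower AIC indicates a better trade-off between fit and complexity.
AIC(fit)
AIC(fitGTR)
```

The optimized tree is then available as `fitGTR$tree`.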


8. Save the Fitted Tree Object

saveRDS(fitGTR, file = "fitGTR.rds")

9. Combine with Phyloseq object

Use this tree in downstream diversity analyses, like UniFrac, or integrate it into your phyloseq object using phy_tree().

ps <- merge_phyloseq(ps, phy_tree(fitGTR$tree))

Notes on Runtime

  • Tree construction is a time-consuming process.
  • For ~90 samples and 48 cores on HPC, this step can take 6-8 hours.
  • Always request computational resources based on your dataset size.

Summary

| Step | Description |
|---|---|
| 1 | Extract reference sequences (from phyloseq or FASTA) |
| 2 | Align sequences with DECIPHER::AlignSeqs |
| 3 | Convert alignment to phyDat |
| 4 | Compute distance matrix with phangorn::dist.ml |
| 5 | Build initial NJ tree |
| 6 | Fit model with pml() |
| 7 | Optimize with update() using GTR |
| 8 | Save tree object for later use |
| 9 | Add the tree to the phyloseq object with phy_tree() |

About PERMANOVA: Tests for differences in community composition between groups using permutations of a distance matrix (adonis2). R² indicates effect size (proportion of variance explained). Consider checking homogeneity of dispersion (PERMDISP) to validate assumptions.
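
Outside the app, the same pair of tests can be sketched with the vegan package. This example uses vegan's bundled `dune` and `dune.env` datasets as stand-ins for your own community table and metadata; the choice of Bray-Curtis distance and 999 permutations is an assumption, not necessarily the app's default.

```r
library(vegan)

# Example community data and sample metadata shipped with vegan.
data(dune)
data(dune.env)

# Bray-Curtis dissimilarity between samples (rows of dune).
d <- vegdist(dune, method = "bray")

set.seed(123)
# PERMANOVA: does community composition differ between Management groups?
perm <- adonis2(d ~ Management, data = dune.env, permutations = 999)
perm

# PERMDISP: are within-group dispersions homogeneous?
disp <- betadisper(d, dune.env$Management)
permutest(disp, permutations = 999)
```

A significant PERMANOVA combined with a significant PERMDISP result suggests that differences in spread, and not only in location, may be driving the group effect.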