Microbial Profiles
Get microbial profiles with MetaPhlAn and HUMAnN
Comprehensive taxonomic and functional profiling of microbial communities using MetaPhlAn and HUMAnN.
Description
The microbial_profiles workflow profiles microbial communities at two levels, taxonomic and functional:
- Taxonomic profiling - Species-level identification and abundance using MetaPhlAn
- Functional profiling - Gene family and pathway analysis using HUMAnN
Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| --input | File path | Input .csv file for microbial profiles |
Global Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| --outdir | No | results | Output directory |
| --help | No | - | Display help information |
| --debug | No | false | Enable debug mode |
| --config | No | - | Custom configuration file |
Syntax
metagear microbial_profiles --input INPUT_FILE [GLOBAL_OPTIONS]
Examples
Basic Usage
# Run microbial profiling with default settings
metagear microbial_profiles --input samples.csv
# Run with custom output directory
metagear microbial_profiles --input samples.csv --outdir profiles_output
# Enable debug mode for troubleshooting
metagear microbial_profiles --input samples.csv --debug
Preview Mode
# Generate script without executing
metagear microbial_profiles --input samples.csv --preview
This will create a metagear_microbial_profiles.sh script that can be reviewed and executed manually.
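A quick sanity check on the generated script before running it could look like the sketch below. It assumes the script is written to the current directory under the name given above; `check_preview_script` is a hypothetical helper for illustration and is not part of metagear. `bash -n` only parses the file for syntax errors and does not execute any of its commands.

```shell
# Hypothetical helper, not part of metagear: sanity-check a generated script.
check_preview_script() {
    cps_script="${1:-metagear_microbial_profiles.sh}"
    # The script must exist and be non-empty.
    if [ ! -s "$cps_script" ]; then
        echo "missing or empty: $cps_script" >&2
        return 1
    fi
    # Parse only: -n makes bash read the commands without running them.
    bash -n "$cps_script"
}
```

If the check passes, the script can be read with a pager and then started manually with `bash metagear_microbial_profiles.sh`.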
Input Format
The input CSV file should contain sample information with the following columns:
sample,fastq_1,fastq_2
SAMPLE-01,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz
SAMPLE-02,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz
SAMPLE-03,/path/to/sample3_R1.fastq.gz,/path/to/sample3_R2.fastq.gz
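Before launching a run, the layout above can be checked with a few lines of awk. This is a minimal sketch, not metagear's own validation: `validate_samplesheet` is a hypothetical helper, it assumes a plain CSV without quoted fields or embedded commas, and it tolerates an empty fastq_2 field on the assumption that this is how single-end samples would be listed.

```shell
# Hypothetical helper, not part of metagear: check a samplesheet's layout.
# Assumes simple CSV with no quoted fields or embedded commas.
validate_samplesheet() {
    awk -F',' '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 {
            if ($0 != "sample,fastq_1,fastq_2") {
                print "unexpected header: " $0 > "/dev/stderr"; bad = 1
            }
            next
        }
        /^[[:space:]]*$/ { next }               # ignore blank lines
        NF != 3 || $1 == "" || $2 == "" {
            print "line " NR ": expected sample,fastq_1,fastq_2" > "/dev/stderr"; bad = 1
        }
        seen[$1]++ {                            # true from the second occurrence on
            print "line " NR ": duplicate sample " $1 > "/dev/stderr"; bad = 1
        }
        END { if (NR == 0) bad = 1; exit bad }  # an empty file is also an error
    ' "$1"
}
```

For example, `validate_samplesheet samples.csv && metagear microbial_profiles --input samples.csv` starts the workflow only when the sheet passes. The sketch does not check that the FASTQ files exist.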
Input Data Requirements
- Quality-controlled reads: Input reads should already be quality-filtered and decontaminated
- Sufficient depth: Adequate sequencing depth for reliable taxonomic and functional profiling
- Paired-end preferred: While single-end reads are supported, paired-end provides better resolution
Output
The workflow generates comprehensive profiling results in the following directory structure:
outdir/
├── metaphlan/ # MetaPhlAn taxonomic profiling results
│   ├── {sample}_microbial_profile.txt # Individual taxonomic profiles
│   ├── {sample}.biom # BIOM format profiles
│   ├── {sample}.bowtie2out.txt # Alignment files (optional)
│   └── merged_microbial_profiles.txt # Combined abundance table across samples
├── humann/ # HUMAnN functional profiling results
│   ├── {sample}/ # Per-sample HUMAnN output directories
│   │   ├── {sample}_qc_genefamilies_cpm.tsv # Gene family abundances (CPM normalized)
│   │   ├── {sample}_qc_pathabundance_cpm.tsv # Pathway abundances (CPM normalized)
│   │   └── {sample}_qc_pathcoverage.tsv # Pathway coverage
│   ├── gene_families_merged_profiles.tsv # Merged gene family profiles
│   └── path_abundances_merged_profiles.tsv # Merged pathway abundance profiles
├── multiqc/ # Quality control and summary reports
│   ├── multiqc_report.html # Consolidated analysis report
│   ├── multiqc_data/ # Parsed statistics and data
│   └── multiqc_plots/ # Static plot images
└── pipeline_info/ # Pipeline execution metadata
    ├── execution_report.html # Nextflow execution report
    ├── execution_timeline.html # Processing timeline
    └── execution_trace.txt # Resource usage tracking
Key Output Files
Taxonomic profiling results:
- metaphlan/merged_microbial_profiles.txt - Combined species abundance table across all samples
- metaphlan/{sample}_microbial_profile.txt - Individual taxonomic profiles per sample
- metaphlan/{sample}.biom - BIOM format profiles for downstream analysis tools
Functional profiling results:
- humann/gene_families_merged_profiles.tsv - Combined gene family abundance matrix
- humann/path_abundances_merged_profiles.tsv - Combined pathway abundance matrix
- humann/{sample}/{sample}_qc_genefamilies_cpm.tsv - Per-sample gene family abundances (CPM normalized)
- humann/{sample}/{sample}_qc_pathabundance_cpm.tsv - Per-sample pathway abundances (CPM normalized)
- humann/{sample}/{sample}_qc_pathcoverage.tsv - Per-sample pathway coverage metrics
Analysis summary:
- multiqc/multiqc_report.html - Comprehensive quality control and profiling summary report
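After a run, the presence of these key files can be confirmed with a short shell function. The sketch below only tests that each file exists and is non-empty; `check_outputs` is a hypothetical helper, and the paths are taken from the listing above.

```shell
# Hypothetical helper, not part of metagear: report missing key outputs.
# Usage: check_outputs OUTDIR SAMPLE [SAMPLE...]
check_outputs() {
    co_outdir="$1"; shift
    co_missing=0
    # Run-level files.
    for co_f in \
        metaphlan/merged_microbial_profiles.txt \
        humann/gene_families_merged_profiles.tsv \
        humann/path_abundances_merged_profiles.tsv \
        multiqc/multiqc_report.html
    do
        [ -s "$co_outdir/$co_f" ] || { echo "missing: $co_f" >&2; co_missing=1; }
    done
    # Per-sample files, one set for each sample name given.
    for co_s in "$@"; do
        for co_f in \
            "metaphlan/${co_s}_microbial_profile.txt" \
            "humann/${co_s}/${co_s}_qc_genefamilies_cpm.tsv" \
            "humann/${co_s}/${co_s}_qc_pathabundance_cpm.tsv"
        do
            [ -s "$co_outdir/$co_f" ] || { echo "missing: $co_f" >&2; co_missing=1; }
        done
    done
    return "$co_missing"
}
```

With the sample names from the input example, a call would be `check_outputs results SAMPLE-01 SAMPLE-02 SAMPLE-03`; every missing path is printed to standard error and the function returns a non-zero status.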
Prerequisites
Before running this workflow:
- Install databases: Run metagear download_databases first
- Quality control: Run qc_dna (for DNA data) or qc_rna (for RNA data) on raw data first
- Sufficient computational resources: Profiling can be computationally intensive
- Adequate disk space: Profile generation requires substantial storage
Recommended Workflow Order
# 1. Download databases (run once)
metagear download_databases
# 2. Quality control
metagear qc_dna --input raw_samples.csv
# 3. Microbial profiling (using QC'd data)
metagear microbial_profiles --input qc_samples.csv
Analysis Features
Taxonomic Profiling (MetaPhlAn)
- Species-level taxonomic assignment
- Relative abundance estimation
- Novel clade detection
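To work with species-level abundances only, the merged table can be filtered on the lineage column. The sketch below assumes the usual MetaPhlAn layout: tab-separated, optional comment lines starting with `#`, a header row whose first cell is `clade_name`, and lineages written as `k__...|p__...|s__...`. Whether metagear's merged file matches this exactly has not been verified here, and `species_rows` is a hypothetical helper.

```shell
# Hypothetical helper: keep the header and rows whose lineage ends at species rank.
species_rows() {
    awk -F'\t' '
        /^#/ { next }                        # skip comment lines
        $1 == "clade_name" { print; next }   # keep the column header
        $1 ~ /(^|[|])s__[^|]*$/ { print }    # last lineage element is s__...
    ' "$1"
}
```

Rows that continue past the species rank (for example a trailing `t__` element) are excluded because the pattern requires `s__` to be the final element.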
Functional Profiling (HUMAnN)
- Gene family abundance quantification
- Metabolic pathway reconstruction
- Pathway coverage and abundance
- Species-stratified functional profiles
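In HUMAnN tables, species-stratified rows carry a `|` in the feature column (for example `UniRef90_X|g__Genus.s__Species`), while community totals do not. Assuming the first line of the file is the header, the two views can be separated as sketched below. Both function names are hypothetical; HUMAnN's own `humann_split_stratified_table` utility covers the same task.

```shell
# Hypothetical helpers: split a HUMAnN table on the "|" in the feature column.
# The first line is assumed to be the header and is kept in both outputs.
unstratified_rows() {
    awk -F'\t' 'NR == 1 || index($1, "|") == 0' "$1"
}

stratified_rows() {
    awk -F'\t' 'NR == 1 || index($1, "|") > 0' "$1"
}
```

A typical use would be `unstratified_rows humann/gene_families_merged_profiles.tsv > gene_families_unstratified.tsv`, where the output file name is only an example.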
Performance Considerations
- Memory requirements: HUMAnN requires substantial RAM (8-16GB recommended)
- Processing time: Can take several hours for large datasets
- Database size: MetaPhlAn and HUMAnN databases are large (>10GB)
- I/O intensive: Frequent disk access during processing
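Given the database and storage footprint, it can help to check free disk space before downloading databases or starting a run. The sketch below reads the available space from POSIX `df -Pk`; `has_free_gb` is a hypothetical helper, and any threshold passed to it is the caller's own estimate rather than a documented metagear requirement.

```shell
# Hypothetical helper: succeed if DIR has at least MIN_GB gigabytes free.
# Usage: has_free_gb DIR MIN_GB
has_free_gb() {
    # Column 4 of the second line of `df -Pk` is the available space in KiB.
    hf_avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$hf_avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}
```

For instance, `has_free_gb . 20 || echo "less than 20 GB free" >&2` warns when the current directory's filesystem is short on space; the 20 GB figure is illustrative only.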
Troubleshooting
Common issues and solutions:
- Database not found: Ensure download_databases completed successfully
- Memory errors: Increase available RAM or adjust configuration
- Low profiling rate: May indicate poor quality input or low microbial content
- Missing species: Some novel species may not be in reference databases
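One way to put a number on a low profiling rate is to read the UNMAPPED row of a per-sample, CPM-normalized HUMAnN table. The sketch below assumes that this row is kept after normalization and that values are expressed per million, so dividing by 1,000,000 gives the unmapped share. Both points are assumptions about the pipeline's output, and `unmapped_fraction` is a hypothetical helper.

```shell
# Hypothetical helper: print the UNMAPPED share (0 to 1) of a CPM-normalized table.
# Reads the first sample column; fails if the table has no UNMAPPED row.
unmapped_fraction() {
    awk -F'\t' '
        $1 == "UNMAPPED" { printf "%.4f\n", $2 / 1000000; found = 1 }
        END { exit !found }
    ' "$1"
}
```

A value close to 1 means that most reads did not map to any gene family, which is consistent with poor-quality input or low microbial content.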