knitr::opts_chunk$set(message = FALSE,
               warning = FALSE,
               echo    = TRUE,
               include = TRUE,
               eval    = FALSE,
               comment = "")

library(tidyverse)
library(conflicted)
library(bslib)
library(htmltools)

Introduction

Some Background

Microbiome data analysis has rapidly evolved into a cornerstone of biological and ecological research, offering insights into how microbial communities influence everything from human health to environmental ecosystems. However, this type of analysis often involves multiple complex steps: data normalization, diversity calculations, community composition comparisons, and advanced visualizations.

For more information on some of the statistical tests I often use/recommend, see the tutorial in this directory called Data_Notes.

MicroEco

The microeco R package provides an elegant and comprehensive solution by integrating many of the most current and popular microbiome analysis approaches into a unified framework. This package simplifies workflows, making it easy to prepare datasets, calculate metrics, and create publication-quality visualizations. Importantly, microeco is designed to work seamlessly with ggplot2 and other widely used R packages, offering flexibility for customization and compatibility with established workflows. If you click the link above, you will find a very comprehensive tutorial presenting the full array of analysis options.

In this workflow, we will prepare our local hard drive and working directory for use with microeco by properly installing the package and all its dependencies.

Install MicroEco

Main Dependencies

I am following the linux/mac instructions. See the microeco tutorial for windows-specific code.

# If devtools package is not installed, first install it
install.packages("devtools")

Tax4Fun2 Dependencies

Microeco leverages Tax4Fun2 to predict the functional potential of microbial communities based on 16S rRNA gene data on the KEGG database. Tax4Fun translates taxonomic profiles into functional profiles by mapping taxa to KEGG Orthologs (KOs) using pre-trained databases. This integration allows researchers to estimate the metabolic capabilities of microbial communities, such as energy production, nutrient cycling, and ecological interactions.

I will provide a series of steps below to set this up for your first run, but keep in mind that this sometimes works differently based on operating systems. I also recommend you set up your directory structure to follow mine if you are working from a clone of the github repository. Unlike other dependencies, I do not load these to the repo, as they take up too much unecessary space.

I also recommend you check on the instructions in the microeco tutorial, where they provide alternative methods for these steps.

Why Tax4Fun2? Tax4Fun2 is efficient and user-friendly, making it ideal for functional predictions when shotgun metagenomics data isn’t available. However, setting up Tax4Fun2 and its dependencies (e.g., reference databases) can be challenging but crucial for reliable results.

Download and extract reference database

  • Click here to download the database
  • Transfer the downloaded, zipped directory to the location where you want to extract everything.
  • Run the chunk below to extract all files.
tar -xvzf Tax4Fun2_ReferenceData_v2.tar.gz

Download the Blast+ Tools

This code only works for newer MacOS systems

Run the following code in your terminal from the same directory as the last step:

curl -O ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.16.0+-aarch64-macosx.tar.gz

tar -xvzf ncbi-blast-2.16.0+-aarch64-macosx.tar.gz

Make Sure the Extracted Downloads are Executable

  • Run the following chunk from the same directory again:
chmod 755 ncbi-blast-2.16.0+/bin/*
  
chmod -R 755 Tax4Fun2_ReferenceData_v2/*

Futher Troubleshooting

If you are on a mac and still get an error about executing blastn after running the Tax4Fun2 commands, then you may still need to go to your security preferences and select “Open anyway” where it says something about blocking Blastn.

You can also run the chunk below from the R console to make sure R is able to read and write to the directory.

dir_path <- "setup/microbiome/Tax4Fun2_ReferenceData_v2/Ref99NR/"
if (file.access(dir_path, mode = 4) == 0) {
  print("Directory is readable.")
} else {
  print("Directory is not accessible.")
}

MecoDev Dependencies

MicroEco’s tutorial excerpt above guided us through installation of several dependencies, including BEEM-static, but to work with true longitudinal data, we want to use the original BEEM package instead. This is where microeco’s mecodev package comes in handy. Mecodev provides a set of extended classes based on the microeco package, including trans_ts, a class designed for handling time-series data.

Install mecodev and dependencies

mecodev
devtools::install_github("ChiLiubio/mecodev")
Dependencies
# For linux or mac
install.packages("doMC")
# Then install the following packages
install.packages("lokern")
install.packages("monomvn")
install.packages("pspline")
install.packages("seqinr", dependencies = TRUE)
devtools::install_github('csb5/beem')

Bioconductor Packages

install.packages("file2meco", repos = BiocManager::repositories())
install.packages("MicrobiomeStat", repos = BiocManager::repositories())
install.packages("WGCNA", repos = BiocManager::repositories())
BiocManager::install("ggtree")
BiocManager::install("metagenomeSeq")
BiocManager::install("ALDEx2")
BiocManager::install("ANCOMBC")

mecodev Package

# Then install the following packages
install.packages("lokern")
install.packages("monomvn")
install.packages("pspline")

mecoturn Package

install.packages("mecoturn")
# check and install dependent packages
packages <- c("agricolae")
for(x in packages){
    if(!require(x, character.only = TRUE)){
        install.packages(x)
    }
}