This repository contains the scripts, tutorials, and templates for the Rich Lab’s mainstream bioinformatic workflows. You can find some of the current recommended tutorial versions of these workflows linked below.

Tutorial Options

metadata_setup

  • The purpose of a metadata file is to organize the potential independent (predictor) variables for your analysis into a single table with one row per SampleID. When you later produce a set of potential dependent (outcome) variables, you organize those into the same one-row-per-SampleID structure, so that matching predictor variables to outcome variables becomes a simple join by SampleID.

  • It’s good practice to keep as much of your information in one tidy table as possible so that you can keep pulling from that source to further filter, wrangle and analyze without losing track of different versions and datasets over time.

  • That means you should brainstorm as many predictor variables as possible that you might use in downstream analysis and organize them into one tidy table where each SampleID is matched to a value for every variable. You will end up ignoring most of these variables as you construct individual tests and visuals later, so treat this as a rough draft of your information of interest.

  • This tutorial uses the Pygmy Loris dataset as an example. You may have a very different set of variables to organize for your own project.
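The matching step described above can be sketched in a few lines of base R (the column names here are hypothetical, not the ones the tutorial produces): a predictor table and an outcome table that both carry a SampleID column are joined row-wise.

```r
# Hypothetical predictor table: one row per SampleID
predictors <- data.frame(
  SampleID = c("S001", "S002", "S003"),
  subject  = c("Culi", "Warble", "Culi"),
  diet     = c("baseline", "trial1", "trial1")
)

# Hypothetical outcome table with the same SampleID structure
outcomes <- data.frame(
  SampleID = c("S001", "S002", "S003"),
  shannon  = c(2.1, 1.8, 2.4)
)

# Match outcome values to predictor variables by SampleID
analysis_table <- merge(predictors, outcomes, by = "SampleID")
```

With both tables keyed on SampleID, any subset of predictors can be pulled from the same source table later without creating divergent copies of the data.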

Files Needed

  1. compilation_loris.tsv - produced by the sample_inventory script
  2. All other files depend on your specific variables of interest
  • Mine mostly focus on diet trials and environmental context for the Pygmy Loris subjects

Files Produced

  1. samples_metadata.tsv - This file is carried forward into several other scripts in this repository to help you build consistent predictor variables across datasets.

microbiome_alignments.html

  • Follow this tutorial to take new raw read alignment data produced by Epi2ME Lab’s wf-16s pipeline and convert it into fully formatted, tidy tables that you can normalize and analyze using microeco, phyloseq or a similar package.

Files Needed

  1. samples_metadata.tsv - produced by the metadata_setup script
  2. *_abundance_table_species.tsv - produced by the wf-16s pipeline in readprocessing_multiplex_16s
  3. *_wf-16s-report.html - produced by the wf-16s pipeline in readprocessing_multiplex_16s
  • The tutorial provides instructions and code to download one .csv file per demultiplexed sample from this .html file for a given sequencing run and to merge those into a single table of raw alignment data.
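The merge step can be sketched in base R: read every per-sample .csv in a download directory and stack the results into one table. The file names and columns below are toy stand-ins for the wf-16s downloads, not the tutorial's actual output.

```r
# Two toy per-sample CSVs standing in for the downloaded wf-16s files
dir <- file.path(tempdir(), "downloads")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(taxon = "A", reads = 10), file.path(dir, "s1.csv"), row.names = FALSE)
write.csv(data.frame(taxon = "B", reads = 5),  file.path(dir, "s2.csv"), row.names = FALSE)

# Read every per-sample file and stack them into one raw-alignment table
csvs <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
alignments <- do.call(rbind, lapply(csvs, read.csv))
```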

Files Produced

  1. taxonomy_table.tsv - This table is required for microeco and several other metagenomic packages. It must contain exactly one row per taxon in your dataset.
  2. otu_table.tsv - This table is required for microeco and several other metagenomic packages. It must contain exactly one row per taxon and one column per sample in your dataset. The numeric values represent the raw count of reads mapped to a given taxon within a given sample. We will want to normalize these before analyzing/interpreting them.
  3. sample_table.tsv - This table is required for microeco and several other metagenomic packages. It must contain exactly one row per sample in your dataset. The columns contain metadata values that we may use as independent/predictor variables for interpreting our results.
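As a minimal sketch of the normalization mentioned for otu_table.tsv (with a toy matrix, not real data), raw counts can be converted to per-sample relative abundances by dividing each column by its total:

```r
# Toy OTU table: rows = taxa, columns = samples, values = raw read counts
otu <- matrix(c(10,  0,  5,
                90, 50, 45),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Taxon_A", "Taxon_B"), c("S1", "S2", "S3")))

# Divide each column by its read total so every sample sums to 1
rel_abund <- sweep(otu, 2, colSums(otu), "/")
```

The tutorials use dedicated packages for normalization (e.g. SRS rarefaction via the settings in methods_16s), but the principle is the same: raw counts are not comparable across samples until sequencing depth is accounted for.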

microbiome_references.html

  • Follow this tutorial to take the NCBI accessions available in the read alignments you created in the previous script, use those to fetch one representative sequence per OTU in your dataset, and then build a matching phylogenetic tree.
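The first step, reducing the alignment table to one representative accession per OTU and writing those accessions out for a batch fetch, can be sketched like this (column names are hypothetical; the actual fetch against NCBI happens in the tutorial itself):

```r
# Toy alignment rows: each read maps to an OTU via an NCBI accession
alignments <- data.frame(
  otu       = c("otu1", "otu1", "otu2", "otu3", "otu3"),
  accession = c("NR_001.1", "NR_001.1", "NR_002.1", "NR_003.1", "NR_004.1")
)

# Keep one representative accession per OTU (here, simply the first seen)
reps <- alignments[!duplicated(alignments$otu), ]

# Write them one per line, e.g. as input for a batch reference fetch
writeLines(reps$accession, tempfile(fileext = ".txt"))
```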

Files Needed

  1. alignments.tsv - *raw alignment data gathered from the wf-16s output and organized into one table in the previous microbiome_alignments script*

Files Produced

  1. refs_aligned_mafft.fasta - This contains one GenBank reference sequence for each taxon in your dataset that we can use to substitute for ASVs used in many approaches designed for short read 16S data.
  2. refs_tree.treefile - This is a phylogenetic tree produced from the previous fasta multiple sequence alignment.

microbiome_preprocess.html

  • Follow this tutorial to construct, filter, and clean a microbiome dataset that will finally be ready for use in calculating basic summary stats/metrics.
    • Keep in mind that the OTU table still stores count data, not normalized abundance or diversity values.
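The kind of filtering this tutorial applies can be sketched with a toy matrix: drop taxa whose mean relative abundance or prevalence falls below a threshold, in the spirit of the min_abund and min_freq settings shown later in methods_16s (the threshold values and taxon names here are illustrative only).

```r
# Toy OTU counts: rows = taxa, columns = samples
otu <- matrix(c( 1,  0,  0,
                50, 60, 40,
                30, 20, 10),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("Taxon_", 1:3), paste0("S", 1:3)))
rel <- sweep(otu, 2, colSums(otu), "/")

min_abund <- 0.01  # minimum mean relative abundance (illustrative)
min_freq  <- 0.5   # minimum fraction of samples with nonzero counts (illustrative)

keep <- rowMeans(rel) >= min_abund & rowMeans(otu > 0) >= min_freq
otu_filtered <- otu[keep, , drop = FALSE]
```

Note that the filtered table still holds raw counts; normalization comes afterward.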

Files Needed

  1. refs_aligned_mafft.fasta - This contains one GenBank reference sequence for each taxon in your dataset that we can use to substitute for ASVs used in many approaches designed for short read 16S data.
  2. refs_tree.treefile - This is a phylogenetic tree produced from the previous fasta multiple sequence alignment.

Files Produced

  1. phyloseq_genus.RData - Microbiome dataset stored as a phyloseq object with counts summed and trees collapsed to the Genus level.
  • Note that the script also contains code to replicate this and all other files at higher taxonomic levels (Family, Order, Class, Phylum)
  2. tree_genus.RData - Only the phylogenetic tree from the phyloseq object.

  3. phyloseq_melt_genus.RData - Long (melted) tibble version of the phyloseq dataset.
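The genus-level outputs above can be sketched with phyloseq itself; this assumes you already have a phyloseq object (called ps here, a hypothetical name) built from the otu, taxonomy, and sample tables plus the reference tree:

```r
library(phyloseq)

# Sum counts and collapse the tree to the Genus level
ps_genus <- tax_glom(ps, taxrank = "Genus")
save(ps_genus, file = "phyloseq_genus.RData")

# Extract and save only the phylogenetic tree
tree_genus <- phy_tree(ps_genus)
save(tree_genus, file = "tree_genus.RData")

# Melt to a long data frame (one row per taxon-sample combination)
phyloseq_melt_genus <- psmelt(ps_genus)
save(phyloseq_melt_genus, file = "phyloseq_melt_genus.RData")
```

Repeating tax_glom() with taxrank = "Family", "Order", and so on reproduces the higher-level versions the script also generates.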

General Script Advice

On Your First Use of a Workflow

I created a secondary repository called workflows_first_use. This is where I will store some scripts/tutorials with specific instructions for setting up your working directory and environment for specialized packages (especially for microeco). You can find the tutorial version of those scripts under the First Use option in the menu bar above.

params option in the R Markdown Header

Here is what my default yaml header on most R Markdown documents in this repository looks like:

---
output:
  html_document:
    theme:
      bslib: true
    css: journal.css
    toc: true
    toc_float: true
    df_print: paged
params:
  sampleset: "loris"
  seqrun: "hdz18"
                     
---

I use the params option to streamline some automation across different scripts. You can use the sampleset setting under params in the header of this script to select which sampleset you will be working with. Anywhere params$sampleset appears in a chunk of code, it will be replaced with whichever string you set as your params value in the header.
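For example, a chunk can build sampleset-specific file paths from the header value. In a knitted document the params list comes from the YAML header; it is defined directly here only so the sketch is self-contained (the file layout is illustrative):

```r
# In a knitted R Markdown document, `params` is supplied by the YAML header;
# defined inline here so this sketch runs on its own
params <- list(sampleset = "loris", seqrun = "hdz18")

# Any chunk can then resolve sampleset-specific paths
metadata_file <- file.path("metadata", params$sampleset, "samples_metadata.tsv")
```

Changing sampleset to "marmoset" in the header would redirect every such path in the document without editing any chunks.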

Sample Lists

I also created a simple script containing lists of samplesets, subjects, and sequencing runs, to help with iterative programming functions.

subjects <- list(
  marmoset = list(
    HAM  = "Hamlet",
    HER  = "Hera",
    JAR  = "JarJar BINKS",
    OPH  = "Ophelia",
    KUB  = "Kubo",
    KOR  = "Korra",
    WOL  = "Wolverine",
    IRI  = "Iris",
    GOO  = "Goose",
    LAM  = "Lambchop",
    FRA  = "Franc",
    IVY  = "Ivy",
    CHA  = "Charles",
    PAD  = "Padme",
    BUB  = "Bubblegum",
    GRO  = "Grogu",
    MAR  = "Marshmallow",
    BUD  = "Buddy",
    JOA  = "Joans",
    HEN  = "Henry",
    GIN  = "Ginger"
  ),
  loris = list(
    WARB = "Warble",
    CULI = "Culi"
  ),
  bats = list(
    UNKN = "Unknown",
    EPFU = "EPFU",
    MYLE = "MYLE",
    LABO = "LABO",
    MYCI = "MYCI",
    LANO = "LANO",
    NYHU = "NYHU"
  ),
  envir = list(
    UNK = "Unknown"
  ),
  isolates = list(
    UNK = "Unknown"
  )
)


seqruns <- list(
  loris     = as.list(c(paste0("hdz", 1:19))),
  marmoset  = as.list(sprintf("cm%03d", 1:10)),
  isolates  = as.list(paste0("salci", 1)),
  bats      = as.list(paste0("nwr", 1))
)

File Paths and the config Package

I also use config for streamlining and organization. If you want to do the same, you should create your own config.yml file in your project working directory. You can expand the content below to see what my example config.yml file looks like as well as the R script I use to integrate this with the params values for consistent file sourcing and writing across datasets.

default:
  setup: "setup/global/setup.R"
  conflicts: "setup/conflicted.R"
  functions: "setup/global/functions.R"
  packages: "setup/global/packages.R"
  inputs: "setup/default_inputs.R"
  fonts: "setup/global/fonts/FA6_Free-Solid-900.otf"
  test_prediction: "setup/microbiome/test_prediction/"
  visuals: "visuals/"
  tmp_tsv: "tmp/tmp_table.tsv"
  tmp_downloads: "tmp/downloads/"
  tmp_fetch: "tmp/fetch_references.txt"
  tmp_fasta3: "tmp/tmp3.fasta"
  tmp_fasta4: "tmp/tmp4.fasta"
  git_url: "https://rich-molecular-health-lab.github.io/"
  
isolates:
  samplesets: "salci" 
  minQC: 10
  
microbiome:
  micro_scripts: !expr list("setup/microbiome/functions.R", "setup/microbiome/inputs.R")
  setup_dir: "setup/microbiome"
  inputs: "setup/microbiome/inputs.R"
  packages: "setup/microbiome/packages.R"
  first_use_packages: "setup/microbiome/packages_first_use.R"
  functions: "setup/microbiome/functions.R"
  tax4fun: "setup/microbiome/Tax4Fun2_ReferenceData_v2/"
  Ref99NR: "setup/microbiome/Tax4Fun2_ReferenceData_v2/Ref99NR/"
  blast: "setup/microbiome/ncbi-blast-2.16.0+/bin"
  
swan:
  functions: "setup/read_processing/functions.R"
  loris_tax: "/work/richlab/aliciarich/bioinformatics_stats/data/loris/taxonomy/"
  uid_file: "/work/richlab/aliciarich/bioinformatics_stats/tmp/fetch_references.txt"
  logs: "/work/richlab/aliciarich/bioinformatics_stats/logs"
  ont_reads: "/work/richlab/aliciarich/ont_reads/"
  dorado_model: "/work/richlab/aliciarich/ont_reads/dna_r10.4.1_e8.2_400bps_sup@v5.0.0/"
  bioinformatics_stats: "/work/richlab/aliciarich/bioinformatics_stats"
  samplesheets: "/work/richlab/aliciarich/bioinformatics_stats/dataframes/sample_sheet/loris"
  scripts: "/work/richlab/aliciarich/bioinformatics_stats/batch_scripts/"
  accessions: "/work/richlab/aliciarich/bioinformatics_stats/tmp/accessions.txt"
  tmp_fasta1: "/work/richlab/aliciarich/bioinformatics_stats/tmp/tmp1.fasta"
  tmp_fasta2: "/work/richlab/aliciarich/bioinformatics_stats/tmp/tmp2.fasta"
  tmp_fasta3: "/work/richlab/aliciarich/bioinformatics_stats/tmp/tmp3.fasta"
  tmp_fasta4: "/work/richlab/aliciarich/bioinformatics_stats/tmp/tmp4.fasta"
  raw_loris_mb: "/work/richlab/aliciarich/ont_reads/loris_microbiome/hdz_raw"
  basecalled_loris_mb: "/work/richlab/aliciarich/ont_reads/loris_microbiome/basecalled"
  trimmed_loris_mb: "/work/richlab/aliciarich/ont_reads/loris_microbiome/trimmed"
  filtered_loris_mb: "/work/richlab/aliciarich/ont_reads/loris_microbiome/filtered"
  loris_mb_aligned: "/work/richlab/aliciarich/bioinformatics_stats/data/loris/taxonomy/refseqs_aligned.fasta"
  loris_mb_tree: "/work/richlab/aliciarich/bioinformatics_stats/data/loris/taxonomy/refseqs_tree.newick"

bats:
  sequencing:
    coverage: "visuals/bats_16s_depth_summary.html"
    depth_plot: "visuals/bats_16s_depth_hist.html"
  metadata:
    summary: "metadata/bats/samples_metadata.tsv"
    key: "metadata/bats/metadata_key.R"
    factors: "metadata/bats/factors.R"
    scripts: !expr list("metadata/bats/colors.R", "metadata/bats/metadata_key.R", "metadata/bats/factors.R")
  inventories:
    all_stages: "../read_processing/samples/inventories/bats/compilation_bats.tsv"
    collection: "../read_processing/samples/inventories/bats/samples_bats.csv"
    extraction: "../read_processing/samples/inventories/bats/extracts_bats.csv"
    libraries: "../read_processing/samples/inventories/bats/libraries_bats.csv"
    seqruns: "../read_processing/samples/inventories/bats/seqruns_bats.csv"
  barcodes_output: "../read_processing/samples/barcode_alignments/bats/"
  read_alignments: "data/bats/read_alignments"
  taxa_reps:
    aligned: "data/bats/taxonomy/refseqs_aligned.fasta"
    tree: "data/bats/taxonomy/refseqs_tree.newick"
    table: "data/bats/taxonomy/tax_table.tsv"

loris:
  day1: "2023-10-26"
  last: "2024-10-25"
  visuals:
    demographics: "visuals/loris/demography"
    coverage: "visuals/loris_depth_summary.html"
    depth_plot: "visuals/loris_depth_hist.html"
  AZAstudbooks:
    btp: "metadata/loris/Studbook/BTP25.R"
    btp_current: "metadata/loris/Studbook/BTP25.tsv"
    lifetable: "metadata/loris/Studbook/lifetable25.tsv"
    lifetabStatic: "metadata/loris/Studbook/lifetable25_static.tsv"
    load_data: "metadata/loris/Studbook/LoadStudbookData.R"
    functions: "metadata/loris/studbookFunctions.R"
    reactables: "metadata/loris/Studbook/reactables.R"
    living25: "metadata/loris/Studbook/LivingPop_2025.csv"
    living21: "metadata/loris/Studbook/LivingPop_2021.csv"
    historic21: "metadata/loris/Studbook/HistoricPop_2021.csv"
    institutions: 
      current25: "metadata/loris/Studbook/CurrentInstitutions_2025.csv"
      current21: "metadata/loris/Studbook/CurrentInstitutions_2021.csv"
      historic21: "metadata/loris/Studbook/HistoricInstitutions_2021.csv"
    working: "metadata/loris/Studbook/working_studbook_loris25.tsv"
    reactable_ready: "metadata/loris/Studbook/studbook_loris_reactableReady.tsv"
    timeline: "metadata/loris/Studbook/working_timeline_loris25.tsv"
  metadata: 
    scripts: !expr list("metadata/loris/colors.R", "metadata/loris/metadata_key.R", "metadata/loris/diet_schedule.R")
    bristol: "metadata/loris/bristols.tsv"
    studbook: "metadata/loris/subjects_loris.csv"
    summary: "metadata/loris/samples_metadata.tsv"
    key: "metadata/loris/metadata_key.R"
    factors: "metadata/loris/factors.R"
    foods: "metadata/loris/foods.tsv"
    proteins: "metadata/loris/proteins.tsv"
    fats: "metadata/loris/fats.tsv"
    CHOs: "metadata/loris/CHOs.tsv"
    Ash: "metadata/loris/Ash.tsv"
    vitamins: "metadata/loris/vitamins.tsv"
    reactable: "metadata/loris/loris_metadata_summary.html"
    sample_table: 
      identifiers: "metadata/loris/identifier_key.tsv"
      main: "metadata/loris/sample_table.tsv"
      merged: "metadata/loris/sample_table_merged.tsv"
  inventories:
    all_stages: "../read_processing/samples/inventories/loris/compilation_loris.tsv"
    collection: "../read_processing/samples/inventories/loris/samples_loris.csv"
    extraction: "../read_processing/samples/inventories/loris/extracts_loris.csv"
    libraries: "../read_processing/samples/inventories/loris/libraries_loris.csv"
    seqruns: "../read_processing/samples/inventories/loris/seqruns_loris.csv"
  outputs_wf16s: "data/loris/outputs_wf16s/"
  barcodes_output: "dataframes/barcodes/loris/"
  read_alignments: "data/loris/read_alignments"
  taxa_reps:
    aligned: "data/loris/taxonomy/refseqs_aligned.fasta"
    tree: "data/loris/taxonomy/refseqs_tree.newick"
    table: "data/loris/taxonomy/tax_table.tsv"
  abundance_wf16s: "data/loris/wf16s_abundance/"
  microeco: 
    dataset:
      main:
        keg: "microeco/loris/datasets/main/keg"
        njc: "microeco/loris/datasets/main/njc"
        fpt: "microeco/loris/datasets/main/fpt"
        tax: "microeco/loris/datasets/main"
      culi:
        keg: "microeco/loris/datasets/culi/keg"
        njc: "microeco/loris/datasets/culi/njc"
        fpt: "microeco/loris/datasets/culi/fpt"
        tax: "microeco/loris/datasets/culi"
      warb:
        keg: "microeco/loris/datasets/warble/keg"
        njc: "microeco/loris/datasets/warble/njc"
        fpt: "microeco/loris/datasets/warble/fpt"
        tax: "microeco/loris/datasets/warble"
    abund:
      main:
        keg: "microeco/loris/abundance/main/keg"
        fpt: "microeco/loris/abundance/main/fpt"
        njc: "microeco/loris/abundance/main/njc"
        tax: "microeco/loris/abundance/main"
      culi:
        keg: "microeco/loris/abundance/culi/keg"
        fpt: "microeco/loris/abundance/culi/fpt"
        njc: "microeco/loris/abundance/culi/njc"
        tax: "microeco/loris/abundance/culi"
      warb:
        keg: "microeco/loris/abundance/warble/keg"
        fpt: "microeco/loris/abundance/warble/fpt"
        njc: "microeco/loris/abundance/warble/njc"
        tax: "microeco/loris/abundance/warble"
    alpha:
      main: "microeco/loris/alphadiversity/main"
      culi: "microeco/loris/alphadiversity/culi"
      warb: "microeco/loris/alphadiversity/warble"
    beta:
      main:
        kegg: "microeco/loris/betadiversity/main/keg"
        fpt: "microeco/loris/betadiversity/main/fpt"
        njc: "microeco/loris/betadiversity/main/njc"
        tax: "microeco/loris/betadiversity/main"
      culi:
        kegg: "microeco/loris/betadiversity/culi/keg"
        fpt:  "microeco/loris/betadiversity/culi/fpt"
        njc:  "microeco/loris/betadiversity/culi/njc"
        tax: "microeco/loris/betadiversity/culi"
      warb:
        kegg: "microeco/loris/betadiversity/warble/keg"
        fpt:  "microeco/loris/betadiversity/warble/fpt"
        njc:  "microeco/loris/betadiversity/warble/njc"
        tax: "microeco/loris/betadiversity/warble"
    data:
      main:
        feature: "microeco/loris/datasets/main/feature_table.tsv"
        tree:    "microeco/loris/datasets/main/phylo_tree.tre"
        fasta:   "microeco/loris/datasets/main/rep_fasta.fasta"
        samples: "microeco/loris/datasets/main/sample_table.tsv"
        taxa:    "microeco/loris/datasets/main/tax_table.tsv"
      culi: 
        feature: "microeco/loris/datasets/culi/feature_table.tsv"
        tree:    "microeco/loris/datasets/culi/phylo_tree.tre"
        fasta:   "microeco/loris/datasets/culi/rep_fasta.fasta"
        samples: "microeco/loris/datasets/culi/sample_table.tsv"
        taxa:    "microeco/loris/datasets/culi/tax_table.tsv"
      warb:
        feature: "microeco/loris/datasets/warb/feature_table.tsv"
        tree:    "microeco/loris/datasets/warb/phylo_tree.tre"
        fasta:   "microeco/loris/datasets/warb/rep_fasta.fasta"
        samples: "microeco/loris/datasets/warb/sample_table.tsv"
        taxa:    "microeco/loris/datasets/warb/tax_table.tsv"

sample_sheets:
  compilations:
    bats:    "../read_processing/samples/sample_sheets/bats/nwr_combined_sample_sheet.csv"
    loris:    "../read_processing/samples/sample_sheets/loris/hdz_combined_sample_sheet.csv"
  nwr1: "../read_processing/samples/sample_sheets/bats/nwr1_sample_sheet.csv"
  hdz1:  "../read_processing/samples/sample_sheets/loris/hdz1_sample_sheet.csv"
  hdz2:  "../read_processing/samples/sample_sheets/loris/hdz2_sample_sheet.csv"
  hdz3:  "../read_processing/samples/sample_sheets/loris/hdz3_sample_sheet.csv"
  hdz4:  "../read_processing/samples/sample_sheets/loris/hdz4_sample_sheet.csv"
  hdz5:  "../read_processing/samples/sample_sheets/loris/hdz5_sample_sheet.csv"
  hdz6:  "../read_processing/samples/sample_sheets/loris/hdz6_sample_sheet.csv"
  hdz7:  "../read_processing/samples/sample_sheets/loris/hdz7_sample_sheet.csv"
  hdz8:  "../read_processing/samples/sample_sheets/loris/hdz8_sample_sheet.csv"
  hdz9:  "../read_processing/samples/sample_sheets/loris/hdz9_sample_sheet.csv"
  hdz10: "../read_processing/samples/sample_sheets/loris/hdz10_sample_sheet.csv"
  hdz11: "../read_processing/samples/sample_sheets/loris/hdz11_sample_sheet.csv"
  hdz12: "../read_processing/samples/sample_sheets/loris/hdz12_sample_sheet.csv"
  hdz13: "../read_processing/samples/sample_sheets/loris/hdz13_sample_sheet.csv"
  hdz14: "../read_processing/samples/sample_sheets/loris/hdz14_sample_sheet.csv"
  hdz15: "../read_processing/samples/sample_sheets/loris/hdz15_sample_sheet.csv"
  hdz16: "../read_processing/samples/sample_sheets/loris/hdz16_sample_sheet.csv"
  hdz17: "../read_processing/samples/sample_sheets/loris/hdz17_sample_sheet.csv"
  hdz18: "../read_processing/samples/sample_sheets/loris/hdz18_sample_sheet.csv"

barcode_alignments:
  compilations:
    loris:    "../read_processing/samples/barcode_alignments/loris/hdz_combined_barcode_alignment.tsv"
    bats:    "../read_processing/samples/barcode_alignments/bats/nwr_combined_barcode_alignment.tsv"
  nwr1: "../read_processing/samples/barcode_alignments/bats/nwr1_barcode_alignment.tsv"
  hdz1:  "../read_processing/samples/barcode_alignments/loris/hdz1_barcode_alignment.tsv"
  hdz2:  "../read_processing/samples/barcode_alignments/loris/hdz2_barcode_alignment.tsv"
  hdz3:  "../read_processing/samples/barcode_alignments/loris/hdz3_barcode_alignment.tsv"
  hdz4:  "../read_processing/samples/barcode_alignments/loris/hdz4_barcode_alignment.tsv"
  hdz5:  "../read_processing/samples/barcode_alignments/loris/hdz5_barcode_alignment.tsv"
  hdz6:  "../read_processing/samples/barcode_alignments/loris/hdz6_barcode_alignment.tsv"
  hdz7:  "../read_processing/samples/barcode_alignments/loris/hdz7_barcode_alignment.tsv"
  hdz8:  "../read_processing/samples/barcode_alignments/loris/hdz8_barcode_alignment.tsv"
  hdz9:  "../read_processing/samples/barcode_alignments/loris/hdz9_barcode_alignment.tsv"
  hdz10: "../read_processing/samples/barcode_alignments/loris/hdz10_barcode_alignment.tsv"
  hdz11: "../read_processing/samples/barcode_alignments/loris/hdz11_barcode_alignment.tsv"
  hdz12: "../read_processing/samples/barcode_alignments/loris/hdz12_barcode_alignment.tsv"
  hdz13: "../read_processing/samples/barcode_alignments/loris/hdz13_barcode_alignment.tsv"
  hdz14: "../read_processing/samples/barcode_alignments/loris/hdz14_barcode_alignment.tsv"
  hdz15: "../read_processing/samples/barcode_alignments/loris/hdz15_barcode_alignment.tsv"
  hdz16: "../read_processing/samples/barcode_alignments/loris/hdz16_barcode_alignment.tsv"
  hdz17: "../read_processing/samples/barcode_alignments/loris/hdz17_barcode_alignment.tsv"
  hdz18: "../read_processing/samples/barcode_alignments/loris/hdz18_barcode_alignment.tsv"

abund_wf16s_files:
  hdz1:  "data/loris/wf16s_abundance/hdz1_abundance_table_species.tsv"
  hdz2:  "data/loris/wf16s_abundance/hdz2_abundance_table_species.tsv"
  hdz3:  "data/loris/wf16s_abundance/hdz3_abundance_table_species.tsv"
  hdz4:  "data/loris/wf16s_abundance/hdz4_abundance_table_species.tsv"
  hdz5:  "data/loris/wf16s_abundance/hdz5_abundance_table_species.tsv"
  hdz6:  "data/loris/wf16s_abundance/hdz6_abundance_table_species.tsv"
  hdz7:  "data/loris/wf16s_abundance/hdz7_abundance_table_species.tsv"
  hdz8:  "data/loris/wf16s_abundance/hdz8_abundance_table_species.tsv"
  hdz9:  "data/loris/wf16s_abundance/hdz9_abundance_table_species.tsv"
  hdz10: "data/loris/wf16s_abundance/hdz10_abundance_table_species.tsv"
  hdz11: "data/loris/wf16s_abundance/hdz11_abundance_table_species.tsv"
  hdz12: "data/loris/wf16s_abundance/hdz12_abundance_table_species.tsv"
  hdz13: "data/loris/wf16s_abundance/hdz13_abundance_table_species.tsv"
  hdz14: "data/loris/wf16s_abundance/hdz14_abundance_table_species.tsv"
  hdz15: "data/loris/wf16s_abundance/hdz15_abundance_table_species.tsv"
  hdz16: "data/loris/wf16s_abundance/hdz16_abundance_table_species.tsv"
  hdz17: "data/loris/wf16s_abundance/hdz17_abundance_table_species.tsv"
  hdz18: "data/loris/wf16s_abundance/hdz18_abundance_table_species.tsv"

methods_16s:
  libprep_workflow: "'rapid16s'"
  dorado_model: "'dna_r10.4.1_e8.2_400bps_sup@v5.0.0'"
  min_length: 1000
  max_length: 2000
  min_qual: 7
  min_id: 85
  min_cov: 80
  kit_name: "'SQK-16S114-24'"
  tax_rank: "S"
  n_taxa_barplot: 12
  abund_threshold: 0
  loris:
    rarefy: 3500
    norm: "SRS"
    min_abund: 0.00001
    min_freq: 0.01
    include_lowest: TRUE
    unifrac: TRUE
    betadiv: "aitchison"
    alpha_pd: TRUE
    tax4fun_db: "Ref99NR"
    loris_rarefy: 3500
    keg_minID: 97
    
global            <- config::get(config = "default")
swan              <- config::get(config = "swan")
micro             <- config::get(config = "microbiome")
loris             <- config::get(config = "loris")
marmoset          <- config::get(config = "marmoset")
isolates          <- config::get(config = "isolates")
bats              <- config::get(config = "bats")
methods_16s       <- config::get(config = "methods_16s")
sample_sheets     <- config::get(config = "sample_sheets")
abund_wf16s_files <- config::get(config = "abund_wf16s_files")
barcode_alignments <- config::get(config = "barcode_alignments")

seqruns      <- seqruns %>% keep_at(params$sampleset)       %>% list_flatten(name_spec = "")
subject_list <- keep_at(subjects, paste0(params$sampleset)) %>% list_flatten(name_spec = "{inner}")
path         <- config::get(config = paste0(params$sampleset))
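Each config::get() call returns a nested list, so values are then pulled with $ indexing. A minimal sketch, using a hand-built list in place of config::get() so it runs without a config.yml file:

```r
# config::get(config = "loris") would return a nested list like this one
path <- list(
  metadata  = list(summary = "metadata/loris/samples_metadata.tsv"),
  taxa_reps = list(tree    = "data/loris/taxonomy/refseqs_tree.newick")
)

# Scripts then reference files by key rather than hard-coded paths
metadata_path <- path$metadata$summary
```

Because path is reassigned from params$sampleset, the same script can read and write the correct files for any sampleset without edits.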

Terminal and Cat Engines

I use two custom language engines in some scripts, named terminal and cat. If you see a chunk with {terminal} or {cat} where you would usually see {r} at the top, then running the chunk will not send the code to your R console as ordinary R code.

  • terminal chunks will echo the code in raw text format for easy copying and pasting into the terminal console. This just makes it easier for me to interact with other servers like Swan from my R Studio window. There are ways to set R Studio up to run code through multiple servers, but I find this the simplest way to switch back and forth while still keeping a record of the code I have used and the changes I have made to it.
Expand to see the code I use to create the custom terminal language engine at the start of each script.
knitr::knit_engines$set(terminal = function(options) {
  code <- paste(options$code, collapse = "\n")
  
  options$warning <- FALSE
  knitr::engine_output(options, code, out = code)
})
  • cat chunks will write the text in that chunk directly into a file in this repository. I use this to streamline my batch scripts that I submit as jobs to the Swan cluster. This way, you only need to tweak the code in one location and rerun the chunk to ensure that update applies to all locations with working copies of the code.
    • the engine.opts=list(file='directory/script.sh') option tells the chunk which directory and file to write the text to.
    • adding append=TRUE to that option tells the chunk to add the text to the bottom of the existing file. If it is not included, the text will overwrite anything already in the file at that location. Here is a complete example of what such a chunk looks like:
{cat, engine.opts=list(file='subdirectory/script.sh')}
This would create a new file named `script.sh` with this text inside. If the file already exists, it will be replaced.

{cat, engine.opts=list(file='subdirectory/script.sh', append=TRUE)}
This chunk would add this text to the bottom of the file we just created.

R Studio and R Markdown Basics

Once you gain fluency in R and GitHub, it becomes very easy to adapt just about any project, document, or task to an automated, easily transferable format with streamlined version control through R Studio and R Markdown. The following two online textbooks are the best place to start if you are new to R Markdown (I still refer back to these constantly for different tips and tricks).

Markdown Syntax

I also recommend downloading the following cheatsheet from The Markdown Guide for handy inline syntax when composing R Markdown files like those used in my tutorials.

Other Cheatsheets

This page also contains a list of very helpful cheatsheets for other packages or purposes in R Studio. Refer back to this often, as you may find some of these more useful the more fluent you become in R.