This repository contains the scripts, tutorials, and templates for the Rich Lab’s mainstream bioinformatic workflows. Before you begin working with any data or samples, you should first make sure you understand the DataInventory and Metadata workflows.
To view the webpage version of this repository, click here: Rich Lab Bioinformatics/Stats
To view the base repository, click here: Rich Lab Bioinformatics/Stats
If you are working with any molecular samples in the lab, you should begin with the following workflows (in order):
If you are working with microbiome data, then you should follow these tutorials with the MicroEco Setup:
The raw scripts, which contain chunks of code that you can run directly (assuming you download the entire repository with the necessary dependencies), are in .Rmd format. I also use the knitr package to create “prettier” (but read-only) HTML versions of those files, which are easier to read and study on their own.
Once you look through these, you should work on your own MetadataSetup script for the hypotheses you are trying to test. You can download the tutorial as a template and edit it from there.
I will add more tutorials and guides later, including details on how to best use GitHub and RStudio to push and pull your own repositories to this site.
Main R Markdown (.Rmd) files to start from (each is also available as knitted HTML):
R Markdown File | Knitted HTML Link | Purpose |
---|---|---|
SampleInventory.Rmd | SampleInventory.html | Maintain records of all Rich Lab samples and export samplesheets for Dorado. |
MetadataSetup.Rmd | MetadataSetup.html | Create predictor variables, code them, and then match to samples. |
MinIONReadProcessing.Rmd | MinIONReadProcessing.html | Basecall, demultiplex, filter, clean, and organize raw ONT sequence data. |
microbiome_new_data.Rmd | microbiome_new_data.html | Prepare aligned reads and other wf-16s outputs for analysis using MicroEco. |
Data_Notes.Rmd | Data_Notes.html | Review basic statistical options available for some of our main datasets. |
Files not specific to R:
filename | description |
---|---|
README.md | Text file (markdown format) description of the project. |
config.yml | Directory paths and other parameters to ensure reproducible code pipelines. |
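
As a rough illustration of how a config.yml can keep paths consistent across scripts, here is a minimal sketch in R. It assumes the yaml package is installed. The keys (`sampleset`, `paths`) and their values are hypothetical and are not taken from this repository's actual config.yml, so check the real file before relying on any of these names.

```r
library(yaml)

# Hypothetical config contents for illustration only.
# The real config.yml in this repository may use different keys.
config_text <- "
sampleset: loris
paths:
  data: data
  metadata: metadata
  microeco: microeco
"

# Write the example to a temporary file so the sketch is self-contained.
config_file <- tempfile(fileext = ".yml")
writeLines(config_text, config_file)

# Read the YAML file into a nested list.
config <- yaml::read_yaml(config_file)

# Build a sampleset-specific path from the configured pieces
# instead of hard-coding it in each script.
data_dir <- file.path(config$paths$data, config$sampleset)
print(data_dir)
```

Reading paths from one file like this means that changing a directory name only requires editing the config, not every script that uses it.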
Subdirectories:
directory name | purpose |
---|---|
data/ | Intermediate or raw-stage datasets in table form. Subdirectories organized by sampleset. |
dataframes/ | Data tables produced and used by other Rmd scripts in this repository. |
metadata/ | Data table files and R scripts to generate tibbles/dataframes with metadata. |
microeco/ | Datasets and results produced and used directly by the microeco package. |
setup/ | Modularized .R scripts with all parameters, functions, packages, and other dependencies. |
Each sampleset below may appear as a subdirectory within the directories listed above, so that files stay organized by project. These are the main samplesets in use.
sampleset shorthand | description |
---|---|
loris | Pygmy loris genetic, microbial, and behavioral data collected with Henry Doorly Zoo. |
marmoset | Gut microbiome data gathered from the UNO Research Colony for Shayda Azadmanesh’s thesis. |
bats | Genetic and gut microbiome data gathered by collaborators at trapping sites across North America. |
environmental | Samples gathered from opportunistic environmental sources for Thomas Raad’s thesis. |
isolates | Purified DNA from bacterial isolates grown in the Ayayeye lab for whole genome sequencing in the Rich Lab. |
These links take you to full-page summary tables or graphics compiled for some of the in-progress analyses.
Here are some of the online resources for the packages or platforms that I use the most for different workflows and scripts in the lab.