library(conflicted)
library(tidyverse)
library(downloadthis)
library(fs)
source("setup/conflicted.R")
source("setup/knit_engines_simple.R")
knitr::opts_chunk$set(message = FALSE,
                      warning = FALSE,
                      echo    = TRUE,
                      include = TRUE,
                      eval    = TRUE,
                      comment = "")
Background
Read more about this workflow here: PRONAME:
a user-friendly pipeline to process long-read nanopore metabarcoding
data by generating high-quality consensus sequences.
Integrating Dorado for trimming & demux
We can continue to use the Dorado basecalling → demux steps, then
hand off the demultiplexed FASTQ files to PRONAME. Here’s how it
works:
Dorado does the heavy lifting
GPU-accelerated SUP basecalling
Barcode demultiplexing (with your ONT 16S kit’s barcodes)
Outputs clean, per-sample FASTQs
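The two Dorado stages listed above can be sketched as a dry-run helper that only prints the commands it would run, so nothing is executed until you copy them into a batch script. The model name matches the one used in the download script in this document; the kit name SQK-16S114-24, the pod5 input directory, and the basecalled output directory are assumptions for illustration, so substitute your own.

```shell
#!/bin/bash
# Sketch: print the Dorado basecalling and demultiplexing commands (dry run).
# The paths and kit name passed in below are illustrative assumptions.

dorado_commands() {
  local model="$1"     # path to a downloaded model directory
  local pod5_dir="$2"  # directory of raw POD5 files
  local kit="$3"       # barcoding kit name
  local out_dir="$4"   # where the BAM and per-barcode FASTQs should go

  # Basecall and classify barcodes in one pass; Dorado writes BAM to stdout.
  echo "dorado basecaller ${model} ${pod5_dir} --kit-name ${kit} > ${out_dir}/calls.bam"
  # Split the already-classified BAM into per-barcode FASTQ files.
  echo "dorado demux --output-dir ${out_dir}/demuxed --no-classify --emit-fastq ${out_dir}/calls.bam"
}

dorado_commands "dorado_models/dna_r10.4.1_e8.2_400bps_sup@v5.2.0" "pod5" "SQK-16S114-24" "basecalled"
```

Printing the commands first makes it easy to check the model path and kit name before spending GPU time on a run.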
PRONAME’s 4-step Pipeline:
proname_import
We will skip the optional trimming, because Dorado already handles
it automatically.
We use the --duplex 'yes' argument to optimize for V14
kit chemistry.
PRONAME provides length-vs-quality scatterplots that we can use to
inform our QC parameters.
proname_filter
Set the optimal filtering thresholds based on results in the
previous step and visualize the filtering impact on results.
proname_refine
Now PRONAME polishes reads via medaka, performs read
clustering, removes chimeric sequences, and generates
error-corrected consensus sequences.
Files may be exported directly to QIIME2 to adapt
standard Illumina-type workflows.
proname_taxonomy
The files generated while gathering the consensus sequences and the
standard reference databases are used to perform a taxonomic
analysis.
This step produces a taxonomy file and a taxa barplot.
Post-PRONAME
After this, we can use standard QIIME2 pipelines that
easily integrate into R packages like phyloseq.
For example, we can use qiime phylogeny to produce a
phylogenetic diversity analysis (no more need to fetch references
directly from GenBank ourselves!).
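As one possible form of the phylogeny step, QIIME2's qiime phylogeny align-to-tree-mafft-fasttree action builds a rooted tree de novo from a set of representative sequences. The sketch below only assembles and prints that command. The rep_seqs.qza input and all output file names are hypothetical placeholders for whatever PRONAME exported in your run.

```shell
#!/bin/bash
# Sketch: assemble (but do not run) a QIIME2 de novo phylogeny command.
# Input and output file names are hypothetical placeholders.

build_phylogeny_cmd() {
  local rep_seqs="$1"  # representative sequences artifact (.qza)
  local out_dir="$2"   # directory for the alignment and tree artifacts

  printf 'qiime phylogeny align-to-tree-mafft-fasttree'
  printf ' --i-sequences %s' "$rep_seqs"
  printf ' --o-alignment %s/aligned_rep_seqs.qza' "$out_dir"
  printf ' --o-masked-alignment %s/masked_aligned_rep_seqs.qza' "$out_dir"
  printf ' --o-tree %s/unrooted_tree.qza' "$out_dir"
  printf ' --o-rooted-tree %s/rooted_tree.qza' "$out_dir"
  printf '\n'
}

build_phylogeny_cmd "rep_seqs.qza" "phylogeny"
```

The rooted tree artifact is the piece that phylogenetic diversity metrics need downstream.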
Setting up your Swan Workspace
Dorado Usage
You can skip these steps if you already have dorado up
and running on Swan, including the most up-to-date SUP basecalling
model.
SUP Model Download
Look over this page to get a basic understanding of which models
Dorado provides for basecalling ONT reads.
I use the config package or parameters in the yaml header to track and
source the models that I am using. You will need to report details like
these in the methods section of any paper produced from your results.
We will almost always choose the newest SUP model available
on the HCC for the 10.4.1 kit chemistry.
For some reason, dorado’s automatic sourcing and use of models does
not seem to work from the GPU nodes on the HCC, so we will download a
stable version of our current model options.
The downloaded model needs to be at the path below within your working
directory so that the code I have written in other scripts can locate
it automatically.
You should at least download the newest sup model
available, but you may also download the newest hac and
fast models if you would like. The code in other scripts
will search this directory for whichever of these three models you
specify at that time.
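A directory search like the one described above could look roughly like the following. This is a minimal sketch, not the code used in the other scripts: the function name find_model is hypothetical, and it assumes model directories keep Dorado's naming pattern (for example dna_r10.4.1_e8.2_400bps_sup@v5.2.0) and that a version-aware sort (sort -V) is available.

```shell
#!/bin/bash
# Sketch: pick the newest downloaded model of a given tier (sup, hac, or fast).
# find_model is a hypothetical helper, not part of Dorado or the project scripts.

find_model() {
  local models_dir="$1"  # e.g. dorado_models
  local tier="$2"        # sup, hac, or fast

  # List matching model directories, sort by version, and keep the last one.
  for d in "${models_dir}"/dna_r10.4.1_*_"${tier}"@v*; do
    [ -d "$d" ] && printf '%s\n' "$d"
  done | sort -V | tail -n 1
}

# Prints the newest sup model path, or nothing if none has been downloaded.
find_model "dorado_models" "sup"
```

An empty result is a useful signal that the requested tier still needs to be downloaded.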
Run the chunk below after replacing the model name with your model of
choice (and the directory paths with your own), then transfer the file
batch_scripts/dorado_setup.sh to your repo mirror on Swan
work.
Download Script
#!/bin/bash
# batch_scripts/dorado_setup.sh
#SBATCH --job-name=dorado_model
#SBATCH --output=/work/richlab/aliciarich/read_processing/logs/dorado_model.%j.out
#SBATCH --error=/work/richlab/aliciarich/read_processing/logs/dorado_model.%j.err
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --constraint='gpu_v100|gpu_t4'
#SBATCH --partition=gpu,guest_gpu
#SBATCH --gres=gpu:2
#SBATCH --mem=20GB
module load apptainer
cd /work/richlab/aliciarich/read_processing
apptainer exec docker://nanoporetech/dorado:latest dorado download --model dna_r10.4.1_e8.2_400bps_sup@v5.2.0 --directory dorado_models
Once you have confirmed that this script has transferred to the
read_processing/batch_scripts path in your HCC directory,
run the code below to submit the job.
cd read_processing
sbatch batch_scripts/dorado_setup.sh
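After the job finishes, a quick check that the model actually landed where later scripts expect it can save a failed basecalling run. The sketch below assumes Dorado unpacks each model into a directory named after the model inside dorado_models; the helper name model_is_downloaded is hypothetical.

```shell
#!/bin/bash
# Sketch: confirm a model directory exists and is not empty.
# model_is_downloaded is a hypothetical helper name.

model_is_downloaded() {
  local models_dir="$1"  # e.g. dorado_models
  local model="$2"       # e.g. dna_r10.4.1_e8.2_400bps_sup@v5.2.0

  # Succeeds only if the directory is present and contains at least one entry.
  [ -d "${models_dir}/${model}" ] && [ -n "$(ls -A "${models_dir}/${model}")" ]
}

if model_is_downloaded "dorado_models" "dna_r10.4.1_e8.2_400bps_sup@v5.2.0"; then
  echo "model ready"
else
  echo "model missing: check the dorado_model.*.err file in your logs directory"
fi
```

Run it from the read_processing directory so that the relative dorado_models path resolves.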
