3 Primer Removal

There are multiple ways in which you can remove primers sequences from your fastq files. CutAdapt and trimmomatic are two widely used tools for short-read data. DADA2 package has its own removePrimers sequence function which is recommended for PacBio data.

3.1 CutAdapt

The first step that everyone performs before doing an analysis is data cleaning. Data cleaning can mean multiple things in this context: primer removal, quality trimming, removing very short sequences etc. Remember inconsistent and incorrect data leads to false conclusions. In short, garbage in, garbage out applies to all data.

The first step that we will do with amplicon data is check which 16s rRNA gene region was sequenced, find the primer that were used and remove them. For this purpose there are a few software available like, cutAdapt, Trimmomatic, skewer. For the purpose of this workshop we will use cutAdapt.

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads. It also alows you to perform quality trimming on your reads.

3.1.1 Primer removal for paired end reads

cutadapt -a ADAPTER_FWD -A ADAPTER_REV -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq

Here, the input reads are in reads.1.fastq and reads.2.fastq, and the result will be written to out.1.fastq and out.2.fastq.

In paired-end mode, the options -a, -b, -g and -u that also exist in single-end mode are applied to the forward reads only. To modify the reverse read, these options have uppercase versions -A, -B, -G and -U that work just like their counterparts. In the example above, ADAPTER_FWD will therefore be trimmed from the forward reads and ADAPTER_REV from the reverse reads.

Single-end/R1 option Corresponding option for R2
–adapter, -a -A
–front, -g -G
–anywhere, -b -B
–cut, -u -U
–output, -o –paired-output, -p

In paired-end mode, Cutadapt checks whether the input files are properly paired. An error is raised if one of the files contains more reads than the other or if the read names in the two files do not match. The read name comparison ignores a trailing /1 or /2 to allow processing some old Illumina paired-end files.

Cutadapt supports compressed input and output files. Whether an input file needs to be decompressed or an output file needs to be compressed is detected automatically by inspecting the file name: For example, if it ends in .gz, then gzip compression is assumed

3.1.2 Quality trimming

cutadapt -q 15,10 -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq

-q 15,10

3.1.3 Call cutAdapt from R

Follow this link to see how you can call cutAdapt from within R https://benjjneb.github.io/dada2/ITS_workflow.html