Raw Read QC: Don't Skip This Critical First Step

Using fastp, FastQC, and MultiQC to ensure your sequencing data is ready for downstream analysis


Before you can discover variants, assemble genomes, or quantify expression, you need to know one thing: is your data any good? Quality control isn’t glamorous, but it’s the foundation that every reliable analysis is built on. In this first post of our NGS analysis series, we’ll explore how to evaluate your raw sequencing reads using three essential tools: fastp, FastQC, and MultiQC. Let’s make sure we’re starting with data we can trust.

Setting Up and Testing fastp

I started by creating a YAML file for the fastp conda environment, then built the environment using:

conda env create -f workflow/envs/fastp.yaml 
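The post doesn't show what goes in `workflow/envs/fastp.yaml`. A minimal version might look like the sketch below, which writes the file with a shell heredoc. The environment name matches the `conda activate fastp-env` step that follows. The channel list and the `fastp=0.20.0` pin are my assumptions, not the author's file. I chose the pin to match the version that `fastp -v` reports in this post.

```bash
#!/usr/bin/env bash
# Sketch of a conda environment file for fastp (assumed contents, not the original).
mkdir -p workflow/envs

# Channels are listed in priority order: conda-forge first, then bioconda,
# which is where fastp is packaged. The version pin is an assumption.
cat > workflow/envs/fastp.yaml <<'EOF'
name: fastp-env
channels:
  - conda-forge
  - bioconda
dependencies:
  - fastp=0.20.0
EOF

cat workflow/envs/fastp.yaml
```

Pinning the version keeps reruns reproducible. Drop the pin if you would rather pick up the latest release.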

Once the environment was created, I activated it:

conda activate fastp-env

To verify the installation, I checked the fastp version:

$ fastp -v
fastp 0.20.0
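In a scripted workflow it can help to fail early when the environment isn't active. The helper below is a hypothetical addition; the name `require_fastp` is mine. It uses the shell builtin `command -v` to check that `fastp` is on `PATH` and then prints the version. It merges stderr into stdout in case the version string is written there.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm fastp is available before running any QC step.
require_fastp() {
  if ! command -v fastp >/dev/null 2>&1; then
    echo "fastp not found on PATH; did you run 'conda activate fastp-env'?" >&2
    return 1
  fi
  # Merge stderr into stdout so the version string can be captured or logged.
  fastp -v 2>&1
}
```

Calling `require_fastp || exit 1` at the top of a QC script stops the run before any reports are attempted.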

First Run Issue

I tested fastp with a basic command on paired-end reads:

fastp -i sample_1.fq.gz \
     -I sample_2.fq.gz  \
     -h sample.html \
     -j sample.json 

Problem encountered: fastp ran without errors, but generated empty HTML and JSON reports.
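This section doesn't say what caused the empty reports, so the script below is not the author's fix. It is a generic first-pass check that assumes the same file names as the command above. For each input, it confirms the file is non-empty, valid gzip, and a whole number of four-line FASTQ records. It also checks that the two mates have the same read count. Finally, it reports which output files came out empty. The function names `check_fastq` and `check_report` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical first-pass diagnostics for a fastp run that produced empty reports.

# Print the read count of a gzipped FASTQ on stdout; return 1 if it looks unusable.
check_fastq() {
  local fq="$1" lines
  if [ ! -s "$fq" ]; then
    echo "missing or empty input: $fq" >&2
    return 1
  fi
  if ! gzip -t "$fq" 2>/dev/null; then
    echo "not a valid gzip file: $fq" >&2
    return 1
  fi
  lines=$(gzip -dc "$fq" | wc -l)
  # FASTQ records are four lines each; a remainder suggests truncation.
  if [ $((lines % 4)) -ne 0 ]; then
    echo "line count not a multiple of 4 (truncated?): $fq" >&2
  fi
  echo "$((lines / 4))"
}

# Report whether an output file exists and has content.
check_report() {
  if [ -s "$1" ]; then
    echo "ok: $1 ($(wc -c < "$1") bytes)"
  else
    echo "empty or missing: $1"
    return 1
  fi
}

# Run only when given R1, R2, HTML report, and JSON report paths, e.g.
#   bash check_fastp.sh sample_1.fq.gz sample_2.fq.gz sample.html sample.json
if [ "$#" -eq 4 ]; then
  r1=$(check_fastq "$1") && echo "$1: $r1 reads"
  r2=$(check_fastq "$2") && echo "$2: $r2 reads"
  if [ -n "${r1:-}" ] && [ -n "${r2:-}" ] && [ "$r1" != "$r2" ]; then
    echo "R1/R2 read counts differ; the pair may be mismatched or truncated" >&2
  fi
  check_report "$3"
  check_report "$4"
fi
```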
