From Command Line to Script

Stop typing the same commands over and over—save them as scripts instead.

You’ve mastered loops. You can process dozens of files with a single command. But here’s the thing: every time you close your terminal, those beautiful loops disappear. What if you could save them? What if you could run them again tomorrow, or share them with a colleague, or build a library of your most useful commands?

That’s exactly what bash scripts do. They’re your loops and commands, saved in a file, ready to run whenever you need them.

Setting Up Our Workspace

Before we dive into scripting, let’s create a proper workspace with some practice files. We’ll build a project structure that mirrors real bioinformatics work.

First, create a data directory for this lesson and move into it:

$ mkdir data
$ cd data

Now let’s create some mock FASTQ files to work with using the for loop that we learned in the previous lesson.

Remember to run the loop with echo first, to check that it produces the filenames we want.

$ for sample in 01 02 03
> do
>    echo yeast_${sample}_R1.fastq yeast_${sample}_R2.fastq
> done

Now swap echo for touch to generate the files. Rather than retyping the whole command, press the up arrow key to recall the previous one and change echo to touch:

$ for sample in 01 02 03; do touch yeast_${sample}_R1.fastq yeast_${sample}_R2.fastq; done

Let’s verify what we created:

$ ls
yeast_01_R1.fastq
yeast_01_R2.fastq
yeast_02_R1.fastq
yeast_02_R2.fastq
yeast_03_R1.fastq
yeast_03_R2.fastq

Perfect! We have three yeast samples, each with paired-end reads (R1 and R2). This mimics a real RNA-seq or genomics dataset.

Your First Script: Saving a Loop

Remember that loop we built in the previous lesson? Let’s save it as a script. Using your favorite text editor (nano, vi, or even a graphical editor), create a new file called echo.sh:

$ vi echo.sh

Type the following into the file:

for file in *.fastq
do
	echo ${file}
done

Save and close the file (in vi: press Esc, then type :wq and press Enter).

Now comes the magic moment—executing your script:

$ bash echo.sh
yeast_01_R1.fastq
yeast_01_R2.fastq
yeast_02_R1.fastq
yeast_02_R2.fastq
yeast_03_R1.fastq
yeast_03_R2.fastq

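Congratulations! You just ran your first bash script. The .sh extension is a convention that tells you, and anyone you share the file with, that this is a shell script: a plain text file containing commands. Bash itself does not check the extension; bash echo.sh simply runs whatever commands the file contains.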

Making Scripts More Useful: Word Count Example

Echoing filenames is nice for testing, but let’s do something more practical. Create a new script called wordcount.sh:

$ vi wordcount.sh

This time, we’ll replace echo with wc (word count) to get some statistics about our files:

for file in *.fastq
do
	wc ${file}
done

Run it:

$ bash wordcount.sh
0 0 0 yeast_01_R1.fastq
0 0 0 yeast_01_R2.fastq
0 0 0 yeast_02_R1.fastq
0 0 0 yeast_02_R2.fastq
0 0 0 yeast_03_R1.fastq
0 0 0 yeast_03_R2.fastq

Since our files are empty (just placeholders), we get zeros. But you can see how this would work with real data—it would show lines, words, and bytes for each file.
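To see non-zero numbers, here is a minimal sketch using a made-up two-read file. The filename demo.fastq and its contents are invented for illustration, and the read count assumes the common layout of exactly four lines per FASTQ record.

```shell
# Work in a throwaway directory so nothing in your project is touched.
workdir=$(mktemp -d)
cd "$workdir"

# A made-up FASTQ file holding two reads (four lines per read).
printf '%s\n' \
    '@read1' 'ACGTACGT' '+' 'IIIIIIII' \
    '@read2' 'TTGACCAA' '+' 'IIIIIIII' > demo.fastq

# wc -l prints only the line count; reading from stdin leaves out the filename.
# Wrapping it in $(( ... )) drops any padding spaces some wc versions add.
lines=$(( $(wc -l < demo.fastq) ))

# Assumption: every record is exactly four lines long.
reads=$(( lines / 4 ))

echo "demo.fastq: ${lines} lines, ${reads} reads"
```

This should print "demo.fastq: 8 lines, 2 reads". Dividing the line count by four is a quick sanity check that works only when sequences are not wrapped across several lines.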

Adding Flexibility with Variables

Here’s the problem with our script: it’s hardcoded to look for *.fastq. What if you want to run it on just the R1 files? Or just one sample? We need flexibility.

Enter the ${@} variable—it represents “all the arguments passed to the script.” Let’s modify our wordcount script:

for file in ${@}
do
	wc ${file}
done

Now when we run the script, we can specify exactly which files to process:

$ bash wordcount.sh *R1.fastq
0 0 0 yeast_01_R1.fastq
0 0 0 yeast_02_R1.fastq
0 0 0 yeast_03_R1.fastq

See the difference? We only processed the R1 files because that’s what we specified. Same script, different inputs. That’s the power of ${@}.
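One caveat worth knowing: written without quotes, ${@} is split at spaces, so a filename containing a space is treated as several separate words. The sketch below compares the unquoted and quoted forms. The script name args.sh and the filename with spaces are made up for this illustration.

```shell
# Work in a throwaway directory so nothing in your project is touched.
workdir=$(mktemp -d)
cd "$workdir"
touch 'yeast 01 R1.fastq' yeast_02_R1.fastq

# args.sh is a hypothetical helper that counts how many items each loop sees.
cat > args.sh <<'EOF'
unquoted=0
for file in ${@}; do unquoted=$((unquoted + 1)); done

quoted=0
for file in "${@}"; do quoted=$((quoted + 1)); done

echo "unquoted=${unquoted} quoted=${quoted}"
EOF

# Two files are passed, but one of them has spaces in its name.
result=$(bash args.sh *.fastq)
echo "$result"
```

This should print "unquoted=4 quoted=2": only the quoted form sees the two files as two items. The filenames in this lesson contain no spaces, so both forms behave the same here, but writing "${@}" is the safer habit.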

Saving Output to Files

Looking at results in your terminal is fine, but often you want to save them for later analysis or record-keeping. Let’s modify the script to append results to a file:

for file in ${@}
do
	wc ${file} >> wordcount.txt
done

That >> redirects the output into a file called wordcount.txt, creating it if it doesn’t exist and appending to it if it does.

$ bash wordcount.sh *R1.fastq
$ ls
echo.sh
wordcount.sh
wordcount.txt
yeast_01_R1.fastq
yeast_01_R2.fastq
yeast_02_R1.fastq
yeast_02_R2.fastq
yeast_03_R1.fastq
yeast_03_R2.fastq

Let’s check what’s in our results file:

$ cat wordcount.txt
0 0 0 yeast_01_R1.fastq
0 0 0 yeast_02_R1.fastq
0 0 0 yeast_03_R1.fastq
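Because >> appends, running the script a second time would add three more lines to wordcount.txt rather than replacing the old ones. The sketch below contrasts the two redirection operators; the filenames appended.txt and overwritten.txt are made up for this illustration.

```shell
# Work in a throwaway directory so nothing in your project is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Two "runs" that append: the second run adds to what the first left behind.
echo "run 1" >> appended.txt
echo "run 2" >> appended.txt

# Two "runs" that overwrite: only the most recent output survives.
echo "run 1" > overwritten.txt
echo "run 2" > overwritten.txt

# $(( ... )) drops any padding spaces some wc versions add.
appended_lines=$(( $(wc -l < appended.txt) ))
overwritten_lines=$(( $(wc -l < overwritten.txt) ))

echo "lines in appended.txt: ${appended_lines}"
echo "lines in overwritten.txt: ${overwritten_lines}"
```

This should report 2 lines in appended.txt and 1 in overwritten.txt. If you want a fresh wordcount.txt on every run, one option is to delete the old file before rerunning the script.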

Great! But now we have scripts and results mixed in with our data. This is getting messy. Time to get organized.

Professional Project Organization

Here’s a secret that will make your bioinformatics life infinitely better: keep your data, scripts, and results in separate directories.

This isn’t just about being neat—it’s about reproducibility, collaboration, and your own sanity three months from now when you’re trying to figure out what you did.

The standard structure looks like this:

project/
├── data/      # Raw and processed data files
├── bin/       # Scripts and executable code
└── results/   # Analysis outputs and results

Let’s build this structure. First, move up out of data into the directory that contains it (for example, your-favourite-path/bash_scripting_tutorial):

$ cd ..

Next, create a new project folder. Let’s name our project my-first-project.

$ mkdir my-first-project

Move the data folder into the project folder.

$ mv data/ my-first-project/

Now step into the project folder and create the bin and results directories there:

$ cd my-first-project/ # Make sure you are in the project directory
$ mkdir bin results

Move your scripts to bin:

$ cd data
$ mv echo.sh wordcount.sh ../bin/

Let’s see what we have:

$ cd ..
$ ls
bin
data
results

Beautiful! Now let’s look inside each directory:

$ ls data/
wordcount.txt
yeast_01_R1.fastq
yeast_01_R2.fastq
yeast_02_R1.fastq
yeast_02_R2.fastq
yeast_03_R1.fastq
yeast_03_R2.fastq
$ ls bin/
echo.sh
wordcount.sh
$ ls results/
(empty for now)
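For future projects, the same layout can be created in a single command. This is a minimal sketch; the name project is a placeholder, and the throwaway directory is only there so the example does not touch your real files.

```shell
# Build the layout inside a throwaway directory; "project" is a placeholder name.
workdir=$(mktemp -d)
cd "$workdir"

# -p creates any missing parent directories (here, project itself)
# and does not complain if a directory already exists.
mkdir -p project/data project/bin project/results

ls project
```

The listing should show bin, data, and results. Because of -p, running the same command again is harmless.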

Adapting Scripts for New Directory Structure

Now that our files are organized, we need to update our scripts to know where things are. Navigate to the bin directory:

$ cd bin

Edit wordcount.sh to include relative paths:

for file in ../data/${@}
do
	wc ${file} >> ../results/wordcount.txt
done

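The ../data/ tells the script to look in the data directory (one level up, then into data). Similarly, ../results/ sends output to the results directory. Note that ../data/ is added only in front of the first argument, so this version expects a single filename or pattern.

Now run it from the bin directory:

$ bash wordcount.sh yeast_*_R1.fastq

This works because nothing in bin matches yeast_*_R1.fastq, so bash hands the pattern to the script unchanged, and the script expands it once ../data/ is in front.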

Check the results:

$ cat ../results/wordcount.txt
0 0 0 ../data/yeast_01_R1.fastq
0 0 0 ../data/yeast_02_R1.fastq
0 0 0 ../data/yeast_03_R1.fastq

Perfect! Now your data stays in data, your scripts stay in bin, and your results go to results. Everything has its place.
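If you want to pass several filenames or patterns at once, each one needs its own ../data/ prefix. Here is one possible sketch, not the only way to do it. The script name wordcount_multi.sh is hypothetical, and the approach assumes filenames without spaces because the pattern is deliberately left unquoted so that wildcards expand.

```shell
# Recreate a small project in a throwaway directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/data" "$workdir/bin" "$workdir/results"
cd "$workdir/data"
touch yeast_01_R1.fastq yeast_01_R2.fastq yeast_02_R1.fastq yeast_02_R2.fastq

cd "$workdir/bin"

# wordcount_multi.sh is a hypothetical variant of wordcount.sh.
cat > wordcount_multi.sh <<'EOF'
# Outer loop: one pass per argument.
for pattern in "${@}"
do
    # Inner loop: expand the pattern inside ../data/.
    # ${pattern} is unquoted on purpose so that wildcards expand.
    for file in ../data/${pattern}
    do
        wc "${file}" >> ../results/wordcount.txt
    done
done
EOF

# Quote the patterns so they reach the script unexpanded.
bash wordcount_multi.sh 'yeast_01_*.fastq' 'yeast_02_R1.fastq'

# $(( ... )) drops any padding spaces some wc versions add.
counted=$(( $(wc -l < ../results/wordcount.txt) ))
echo "files counted: ${counted}"
```

This should print "files counted: 3": two files from the first pattern and one from the second.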

Creating Sample-Specific Output Files

Having one big results file is okay, but often you want separate output for each sample. We can use the basename command to strip the directory and the .fastq extension from each filename, then build a custom output name from what is left.

Here’s the updated script:

for file in ../data/${@}
do
	output=$(basename ${file} .fastq)-wordcount.txt 
	wc ${file} >> ../results/${output}
done

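Let’s break down that basename line:

basename ${file} .fastq strips the leading directory (../data/) and the trailing .fastq, leaving just the sample name, such as yeast_01_R1.

$( ... ) captures that output so it can be used as a value.

-wordcount.txt is added to the end, so output becomes yeast_01_R1-wordcount.txt.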

Run it:

$ bash wordcount.sh yeast_*_R1.fastq
$ ls ../results/
wordcount.txt
yeast_01_R1-wordcount.txt
yeast_02_R1-wordcount.txt
yeast_03_R1-wordcount.txt

Now each sample has its own results file. Much more organized!

Check one of them:

$ cat ../results/yeast_01_R1-wordcount.txt
0 0 0 ../data/yeast_01_R1.fastq
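The basename step can also be tried on its own at the prompt. This small sketch uses one of the paths from this lesson; the file does not need to exist, because basename only works on the text of the path.

```shell
file=../data/yeast_01_R1.fastq

# With one argument, basename strips the directory part.
name_only=$(basename "${file}")

# A second argument is a suffix to strip as well.
sample=$(basename "${file}" .fastq)

# Text written right after $( ... ) is glued onto the captured output.
output=$(basename "${file}" .fastq)-wordcount.txt

echo "${name_only}"
echo "${sample}"
echo "${output}"
```

This should print yeast_01_R1.fastq, yeast_01_R1, and yeast_01_R1-wordcount.txt, one per line.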

Best Practices for Writing Scripts

As you develop your scripting skills, keep these principles in mind:

1. Start with a working command. Always test your commands interactively first. Once they work, then move them into a script.

2. Use meaningful variable names. output=$(basename ${file} .fastq)-results.txt is much clearer than out=$(basename ${f} .fastq)-r.txt.

3. Comment your code. Add comments explaining what complex sections do. Future you will be grateful:

# Extract sequence name and create output filename
output=$(basename ${file} .fastq)-overrepresented.txt

4. Organize your project. Always use the data/bin/results structure. It’s not overkill; it’s professional.

5. Test with a subset. Before running a script on 500 files, test it on 2-3 files first.

6. Use > vs >> deliberately. > overwrites (good for fresh analyses), while >> appends (good for combining results).

Challenge: Build Your Own Pipeline

Try these exercises to solidify your skills:

1. Create a file size reporter. Write a script that uses ls -lh to show the size of each file and saves it to a results file.

2. Add logging. Modify a script to print “Processing [filename]…” before each file and “Complete!” after.

3. Create a backup script. Write a script that copies all FASTQ files from data/ to a new backup/ directory with today’s date as a prefix.

The Power of Reusability

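After working through these examples, you’ve crossed an important threshold. You’re no longer just typing commands; you’re building tools. These scripts can be run again tomorrow, shared with a colleague, and collected into a library of your most useful commands.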

That simple wordcount.sh script? It’s now part of your bioinformatics toolkit forever. You’ll modify it, improve it, and eventually have a whole library of custom scripts that do exactly what you need.

The real magic happens when you realize you can solve today’s problem by slightly modifying a script you wrote last month. That’s when scripting transforms from a task into a superpower.


Bash scripts turn your command-line knowledge into reusable tools. Start simple, stay organized, and watch your productivity soar.
