From Command Line to Script
Stop typing the same commands over and over—save them as scripts instead.
You’ve mastered loops. You can process dozens of files with a single command. But here’s the thing: every time you close your terminal, those beautiful loops disappear. What if you could save them? What if you could run them again tomorrow, or share them with a colleague, or build a library of your most useful commands?
That’s exactly what bash scripts do. They’re your loops and commands, saved in a file, ready to run whenever you need them.
Setting Up Our Workspace
Before we dive into scripting, let’s create a proper workspace with some practice files. We’ll build a project structure that mirrors real bioinformatics work.
First, create a new directory for this lesson and some dummy FASTQ files:
$ mkdir data
$ cd data
Now let’s create some mock FASTQ files using the for loop we learned in the previous lesson.
Remember to run the loop with echo first, to check that it produces the file names we want.
$ for sample in 01 02 03
> do
> echo yeast_${sample}_R1.fastq yeast_${sample}_R2.fastq
> done
Now let’s swap echo for touch to generate the files. Rather than retyping the loop, press the up arrow key to recall the previous command and change echo to touch:
$ for sample in 01 02 03; do touch yeast_${sample}_R1.fastq yeast_${sample}_R2.fastq; done
Let’s verify what we created:
$ ls
yeast_01_R1.fastq
yeast_01_R2.fastq
yeast_02_R1.fastq
yeast_02_R2.fastq
yeast_03_R1.fastq
yeast_03_R2.fastq
Perfect! We have three yeast samples, each with paired-end reads (R1 and R2). This mimics a real RNA-seq or genomics dataset.
Your First Script: Saving a Loop
Remember that loop we built in the previous lesson? Let’s save it as a script. Using your favorite text editor (nano, vi, or even a graphical editor), create a new file called echo.sh:
$ vi echo.sh
Type the following into the file:
for file in *.fastq
do
echo ${file}
done
Save and close the file (in vi: press Esc, then type :wq and press Enter).
Now comes the magic moment—executing your script:
$ bash echo.sh
yeast_01_R1.fastq
yeast_01_R2.fastq
yeast_02_R1.fastq
yeast_02_R2.fastq
yeast_03_R1.fastq
yeast_03_R2.fastq
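Congratulations! You just ran your first bash script. The .sh extension is a naming convention that tells you (and anyone you share the file with) that this is a shell script: a plain text file containing commands. Bash itself does not need the extension; the script runs because we handed the file to bash.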
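If you would rather run a script by name instead of typing bash in front of it, a common approach is to add a "shebang" line and mark the file as executable. Here is a minimal sketch of that idea, run in a throwaway directory created with mktemp so it cannot touch your real files. The #!/usr/bin/env bash line and the chmod step are additions for illustration; nothing else in this lesson depends on them:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"
touch yeast_01_R1.fastq yeast_01_R2.fastq

# Write the script; the first line names the interpreter to use.
cat > echo.sh <<'EOF'
#!/usr/bin/env bash
for file in *.fastq
do
    echo "${file}"
done
EOF

chmod +x echo.sh   # mark the file as executable
./echo.sh          # run it directly, without typing "bash" first
```

Running it with bash echo.sh still works exactly as before; the shebang only matters when the file is executed directly.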
Making Scripts More Useful: Word Count Example
Echoing filenames is nice for testing, but let’s do something more practical. Create a new script called wordcount.sh:
$ vi wordcount.sh
This time, we’ll replace echo with wc (word count) to get some statistics about our files:
for file in *.fastq
do
wc ${file}
done
Run it:
$ bash wordcount.sh
0 0 0 yeast_01_R1.fastq
0 0 0 yeast_01_R2.fastq
0 0 0 yeast_02_R1.fastq
0 0 0 yeast_02_R2.fastq
0 0 0 yeast_03_R1.fastq
0 0 0 yeast_03_R2.fastq
Since our files are empty (just placeholders), we get zeros. But you can see how this would work with real data—it would show lines, words, and bytes for each file.
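To see non-zero numbers, here is a small sketch using a made-up FASTQ file with two reads. The file name tiny.fastq and its contents are invented for illustration, and the read count assumes the usual layout of four lines per read:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# A made-up FASTQ file: two reads, four lines each.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > tiny.fastq

wc tiny.fastq                  # prints lines, words and bytes

# Count only the lines, then divide by four to get the number of reads.
lines=$(wc -l < tiny.fastq)
reads=$(( lines / 4 ))
echo "tiny.fastq: ${reads} reads"
```

Reading the file through < makes wc -l print just the number, without the file name, which is easier to reuse in arithmetic.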
Adding Flexibility with Variables
Here’s the problem with our script: it’s hardcoded to look for *.fastq. What if you want to run it on just the R1 files? Or just one sample? We need flexibility.
Enter the ${@} variable—it represents “all the arguments passed to the script.” Let’s modify our wordcount script:
# The quotes keep file names that contain spaces in one piece
for file in "${@}"
do
wc "${file}"
done
Now when we run the script, we can specify exactly which files to process:
$ bash wordcount.sh *R1.fastq
0 0 0 yeast_01_R1.fastq
0 0 0 yeast_02_R1.fastq
0 0 0 yeast_03_R1.fastq
See the difference? We only processed the R1 files because that’s what we specified. Same script, different inputs. That’s the power of ${@}.
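It helps to know that the shell expands the wildcard before the script starts, so the script only ever sees a list of file names. The sketch below makes that visible with a hypothetical helper script, showargs.sh, which is not part of this lesson’s project. It prints how many arguments arrived (${#}) and then each one on its own line:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"
touch yeast_01_R1.fastq yeast_02_R1.fastq "my sample_R1.fastq"

# Hypothetical helper: report what the script actually receives.
cat > showargs.sh <<'EOF'
echo "received ${#} argument(s)"
for arg in "${@}"
do
    echo "arg: ${arg}"
done
EOF

bash showargs.sh *R1.fastq
```

The script reports three arguments, one per matching file. Because ${@} is written inside double quotes, the file name containing a space stays in one piece.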
Saving Output to Files
Looking at results in your terminal is fine, but often you want to save them for later analysis or record-keeping. Let’s modify the script to append results to a file:
for file in "${@}"
do
wc "${file}" >> wordcount.txt
done
That >> redirects the output into a file called wordcount.txt, creating it if it doesn’t exist and appending to it if it does.
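Because >> appends, running the same script twice leaves two sets of lines in the file, whereas a single > starts from an empty file each time. A minimal sketch of the difference, using a throwaway directory and two made-up output names (appended.txt and overwritten.txt):

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"
touch yeast_01_R1.fastq yeast_02_R1.fastq

# Run the same loop twice, once with each kind of redirection.
for run in 1 2
do
    for file in *R1.fastq
    do
        wc "${file}"
    done >> appended.txt       # keeps the lines from earlier runs

    for file in *R1.fastq
    do
        wc "${file}"
    done > overwritten.txt     # starts from an empty file on every run
done

wc -l appended.txt overwritten.txt
```

After two runs, appended.txt holds four lines and overwritten.txt holds two. If you rerun an analysis and want fresh results, delete the old output file first or use > instead.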
$ bash wordcount.sh *R1.fastq
$ ls
echo.sh
wordcount.sh
wordcount.txt
yeast_01_R1.fastq
yeast_01_R2.fastq
yeast_02_R1.fastq
yeast_02_R2.fastq
yeast_03_R1.fastq
yeast_03_R2.fastq
Let’s check what’s in our results file:
$ cat wordcount.txt
0 0 0 yeast_01_R1.fastq
0 0 0 yeast_02_R1.fastq
0 0 0 yeast_03_R1.fastq
Great! But now we have scripts and results mixed in with our data. This is getting messy. Time to get organized.
Professional Project Organization
Here’s a secret that will make your bioinformatics life infinitely better: keep your data, scripts, and results in separate directories.
This isn’t just about being neat—it’s about reproducibility, collaboration, and your own sanity three months from now when you’re trying to figure out what you did.
The standard structure looks like this:
project/
├── data/ # Raw and processed data files
├── bin/ # Scripts and executable code
└── results/ # Analysis outputs and results
Let’s build this structure. First, move up from data to the directory that contains it (for example, your-favourite-path/bash_scripting_tutorial):
$ cd ..
Next, create a new project folder. Let’s name our project my-first-project.
$ mkdir my-first-project
Then move the data folder into the project folder.
$ mv data/ my-first-project/
Now move into the project directory and create the bin and results directories there:
$ cd my-first-project/
$ mkdir bin results
Move your scripts to bin:
$ cd data
$ mv echo.sh wordcount.sh ../bin/
Let’s see what we have:
$ cd ..
$ ls
bin
data
results
Beautiful! Now let’s look inside each directory:
$ ls data/
wordcount.txt
yeast_01_R1.fastq
yeast_01_R2.fastq
yeast_02_R1.fastq
yeast_02_R2.fastq
yeast_03_R1.fastq
yeast_03_R2.fastq
$ ls bin/
echo.sh
wordcount.sh
$ ls results/
(empty for now)
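For future projects, the same layout can be created in one step. A minimal sketch, assuming a hypothetical project name demo-project and a throwaway parent directory:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# -p creates missing parent directories and does not complain
# if a directory already exists.
mkdir -p demo-project/data demo-project/bin demo-project/results

ls demo-project
```

Since -p is harmless when the directories are already there, the command is safe to run again.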
Adapting Scripts for New Directory Structure
Now that our files are organized, we need to update our scripts to know where things are. Navigate to the bin directory:
$ cd bin
Edit wordcount.sh to include relative paths:
for file in ../data/${@}
do
wc ${file} >> ../results/wordcount.txt
done
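The ../data/ prefix tells the script to look in the data directory (one level up, then into data). Similarly, ../results/ sends output to the results directory. The prefix is attached only to the first argument, so this version expects a single file name or pattern.
Now run it from the bin directory. No file in bin matches yeast_*_R1.fastq, so the shell passes the pattern to the script unchanged, and the wildcard is expanded inside the script once ../data/ has been put in front of it: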
$ bash wordcount.sh yeast_*_R1.fastq
Check the results:
$ cat ../results/wordcount.txt
0 0 0 ../data/yeast_01_R1.fastq
0 0 0 ../data/yeast_02_R1.fastq
0 0 0 ../data/yeast_03_R1.fastq
Perfect! Now your data stays in data, your scripts stay in bin, and your results go to results. Everything has its place.
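The script above attaches ../data/ only to its first argument. One possible way to accept several file names or patterns is to loop over the arguments and add the prefix to each one. The sketch below does this with a hypothetical script name, wordcount_multi.sh, inside a throwaway project directory. The patterns are quoted on the command line so that they reach the script unexpanded:

```shell
# Build a throwaway project with the data/bin/results layout.
workdir=$(mktemp -d)
mkdir -p "$workdir/data" "$workdir/bin" "$workdir/results"
for sample in 01 02
do
    touch "$workdir/data/yeast_${sample}_R1.fastq" "$workdir/data/yeast_${sample}_R2.fastq"
done

# Hypothetical variant of wordcount.sh that accepts several patterns.
cat > "$workdir/bin/wordcount_multi.sh" <<'EOF'
for pattern in "${@}"
do
    # Left unquoted on purpose: the shell expands the wildcard
    # after ../data/ has been put in front of it.
    for file in ../data/${pattern}
    do
        wc "${file}" >> ../results/wordcount.txt
    done
done
EOF

cd "$workdir/bin"
bash wordcount_multi.sh 'yeast_01_*.fastq' 'yeast_02_R1.fastq'
cat ../results/wordcount.txt
```

Here the results file ends up with three lines: both reads for sample 01 and only R1 for sample 02. Because the pattern is left unquoted inside the script, this sketch assumes file names without spaces.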
Creating Sample-Specific Output Files
Having one big results file is okay, but often you want separate output for each sample. We can use the basename command to strip the directory and the extension from each file name, then build a custom output name from what is left.
Here’s the updated script:
for file in ../data/${@}
do
output=$(basename ${file} .fastq)-wordcount.txt
wc ${file} >> ../results/${output}
done
Let’s break down that basename line:
- basename ${file} .fastq takes the full path like ../data/yeast_01_R1.fastq
- Removes the directory path: yeast_01_R1.fastq
- Strips the .fastq extension: yeast_01_R1
- We add -wordcount.txt to get: yeast_01_R1-wordcount.txt
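You can try these steps one at a time at the prompt. basename only works on the text of the path, so the file does not need to exist. In this sketch the intermediate variable stem is introduced purely for illustration:

```shell
file=../data/yeast_01_R1.fastq

basename "${file}"                   # prints yeast_01_R1.fastq

# Giving a suffix as the second argument strips it as well.
stem=$(basename "${file}" .fastq)    # stem is now yeast_01_R1

output=${stem}-wordcount.txt
echo "${output}"                     # prints yeast_01_R1-wordcount.txt
```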
Run it:
$ bash wordcount.sh yeast_*_R1.fastq
$ ls ../results/
wordcount.txt
yeast_01_R1-wordcount.txt
yeast_02_R1-wordcount.txt
yeast_03_R1-wordcount.txt
Now each sample has its own results file. Much more organized!
Check one of them:
$ cat ../results/yeast_01_R1-wordcount.txt
0 0 0 ../data/yeast_01_R1.fastq
Best Practices for Writing Scripts
As you develop your scripting skills, keep these principles in mind:
1. Start with a working command
Always test your commands interactively first. Once they work, move them into a script.
2. Use meaningful variable names
output=$(basename ${file} .fastq)-results.txt is much clearer than out=$(basename ${f} .fastq)-r.txt
3. Comment your code
Add comments explaining what complex sections do. Future you will be grateful:
# Extract sequence name and create output filename
output=$(basename ${file} .fastq)-overrepresented.txt
4. Organize your project
Always use the data/bin/results structure. It’s not overkill; it’s professional.
5. Test with a subset
Before running a script on 500 files, test it on 2-3 files first.
6. Use > vs >> deliberately
> overwrites (good for fresh analyses), >> appends (good for combining results).
Challenge: Build Your Own Pipeline
Try these exercises to solidify your skills:
1. Create a file size reporter
Write a script that uses ls -lh to show the size of each file and saves it to a results file.
2. Add logging
Modify a script to print “Processing [filename]…” before each file and “Complete!” after.
3. Create a backup script
Write a script that copies all FASTQ files from data/ to a new backup/ directory with today’s date as a prefix.
The Power of Reusability
After working through these examples, you’ve crossed an important threshold. You’re no longer just typing commands—you’re building tools. These scripts can be:
- Run again tomorrow with different data
- Shared with collaborators
- Modified for new analyses
- Combined into larger pipelines
- Version controlled with git
That simple wordcount.sh script? It’s now part of your bioinformatics toolkit forever. You’ll modify it, improve it, and eventually have a whole library of custom scripts that do exactly what you need.
The real magic happens when you realize you can solve today’s problem by slightly modifying a script you wrote last month. That’s when scripting transforms from a task into a superpower.
Bash scripts turn your command-line knowledge into reusable tools. Start simple, stay organized, and watch your productivity soar.
