Reading and writing files with Python

In this video, we will talk about how to read and write files in Python.

Hello, bioinformatics enthusiasts!

Welcome back to another installment of our “Python for Bioinformatics” series. In this post, we’ll explore the essential concepts of reading and writing files in Python—a skill every programmer must master.

What Are Files in Python?

In Python, you work with a file through a file object, which the built-in open() function returns. Much like a physical notebook (or a floppy disk from the past), a file stores data you can come back to later. Python gives it a specific structure and a set of methods, such as read(), readline(), readlines(), and write(), for retrieving and storing that data. Let's dive in!
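To make the idea of a file object concrete, here is a minimal sketch. It assumes a small file named example_object.txt, a hypothetical name that the code creates itself. The object returned by open() records the file's name, the mode it was opened in, and whether it is still open.

```python
# Create a small toy file so the example is self-contained
with open("example_object.txt", "w") as f:
    f.write("ATGC\n")

file = open("example_object.txt", "r")
print(type(file).__name__)                # TextIOWrapper (a text-mode file object)
print(file.name, file.mode, file.closed)  # example_object.txt r False
file.close()
print(file.closed)                        # True
```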

Step 1: Getting a File

For this example, we’ll work with sequence data from the well-known tumor suppressor gene TP53 (which encodes the p53 protein), frequently studied in cancer research.

The sequence file is available as a download on GitHub.

Step 2: Reading Files in Python

Using the open() Function

Here’s the basic way to read a file:

# Open the file in read mode
data = open("seqdump.txt", "r")

# Loop through each line in the file
for line in data:
    print(line, end="")  # each line already ends with "\n"

data.close()  # Don’t forget to close the file!
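Each line you get by looping over a file keeps its trailing newline character ("\n"). When you need just the text of the line, for example to join sequence lines together, you can strip the newline with str.rstrip(). The sketch below assumes a toy file named example_lines.txt with made-up sequence lines, which the code creates itself.

```python
# Create a small toy file (made-up data, not real P53 sequence)
with open("example_lines.txt", "w") as f:
    f.write(">toy_seq\nATGCGT\nTTAGCA\n")

lines = []
data = open("example_lines.txt", "r")
for line in data:
    # Each line still ends with "\n"; remove it before storing the text
    lines.append(line.rstrip("\n"))
data.close()

print(lines)  # ['>toy_seq', 'ATGCGT', 'TTAGCA']
```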

Using the with Statement

A better and safer approach to handling files is using with:

with open("seqdump.txt", "r") as file:
    for line in file:
        print(line, end="")  # each line already ends with "\n"

This method ensures the file is closed automatically when the block ends, even if an error occurs inside it.
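Roughly speaking, a with block behaves like an explicit try/finally pattern: the file gets closed whether or not the code inside raises an error. The minimal sketch below compares the two forms on a toy file named example_with.txt, a hypothetical name that the code creates itself.

```python
# Create a toy file for the demonstration
with open("example_with.txt", "w") as f:
    f.write("ATGC\nGGCC\n")

# Explicit version: close the file in a finally clause
file = open("example_with.txt", "r")
try:
    lines_manual = file.readlines()
finally:
    file.close()  # runs even if readlines() raised an exception

# with-statement version: Python closes the file when the block exits
with open("example_with.txt", "r") as file2:
    lines_with = file2.readlines()

print(lines_manual == lines_with)  # True
print(file.closed, file2.closed)   # True True
```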

Reading Specific Data

To read specific parts of the file:

By characters:

with open("seqdump.txt", "r") as file:
    print(file.read(10))  # Reads the first 10 characters (text mode)

By lines:

with open("seqdump.txt", "r") as file:
    print(file.readlines())  # Reads all lines into a list
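These methods can be combined. The sketch below uses readline() to read a single line and then calls read(10) in a loop to process the rest of the file in chunks of up to 10 characters. The loop stops when read() returns an empty string, which means the end of the file. Opening the same file in binary mode ("rb") makes read(10) return up to 10 bytes as a bytes object instead. The file example_specific.txt and its toy contents are hypothetical and are created by the code.

```python
# Toy file with made-up sequence data (not real P53 data)
with open("example_specific.txt", "w") as f:
    f.write(">toy_header\nATGCATGCATGGCCTTAAGCTTA\n")

with open("example_specific.txt", "r") as file:
    header = file.readline()   # just the first line, "\n" included
    chunks = []
    while True:
        chunk = file.read(10)  # up to 10 characters in text mode
        if chunk == "":        # empty string signals end of file
            break
        chunks.append(chunk)

print(repr(header))  # '>toy_header\n'
print(chunks)        # ['ATGCATGCAT', 'GGCCTTAAGC', 'TTA\n']

# Binary mode: read(10) returns up to 10 bytes
with open("example_specific.txt", "rb") as file:
    first_bytes = file.read(10)
print(first_bytes)   # b'>toy_heade'
```

Reading in chunks like this keeps memory use bounded, which can matter for large sequence files. By contrast, readlines() loads every line at once.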

Step 3: Writing Files in Python

Writing Data

To write data to a file, use the "w" mode. Note that this overwrites the file: an existing file is emptied before writing, and a missing file is created.

with open("output.txt", "w") as file:
    file.write("Bioinformatics is awesome!\n")
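The following sketch shows the overwrite behavior of "w" mode in practice. It uses a hypothetical file named example_output.txt, so the tutorial's output.txt is left alone. The file is written twice, and only the second text survives. It also shows that write() returns the number of characters written.

```python
# First write creates the file (or empties an existing one)
with open("example_output.txt", "w") as file:
    file.write("first run\n")

# Opening again in "w" mode truncates the file before writing
with open("example_output.txt", "w") as file:
    written = file.write("second run\n")  # write() returns the character count

with open("example_output.txt", "r") as file:
    content = file.read()

print(written)        # 11
print(repr(content))  # 'second run\n'
```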

Appending Data

To append data to the end of a file, use the "a" mode. Existing content is kept, and the file is created if it does not exist:

with open("output.txt", "a") as file:
    file.write("Adding more content!\n")
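Here is a minimal sketch of "a" mode preserving what was already in the file. It uses a hypothetical file named example_append.txt and also shows writelines(), which writes each string in a list in order. Note that writelines() does not add newlines for you.

```python
# Start with a fresh file
with open("example_append.txt", "w") as file:
    file.write("Bioinformatics is awesome!\n")

# Append mode adds to the end without erasing existing content
with open("example_append.txt", "a") as file:
    file.write("Adding more content!\n")
    file.writelines(["line 3\n", "line 4\n"])  # newlines must be included

with open("example_append.txt", "r") as file:
    lines = file.read().splitlines()

print(lines)
# ['Bioinformatics is awesome!', 'Adding more content!', 'line 3', 'line 4']
```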

Step 4: Working with FASTA Files

For bioinformatics tasks, handling FASTA files is crucial. Let’s write a function to read sequences from a FASTA file:

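def read_fasta(file_name):
    """Return a list of FASTA records, each as one header + sequence string."""
    with open(file_name, "r") as file:
        # Split the whole file at ">" so each piece is one record
        records = file.read().split(">")
    # Drop empty pieces (e.g. the text before the first ">") and trim whitespace
    return [record.strip() for record in records if record.strip()]

sequences = read_fasta("seqdump.txt")
print(sequences[0])  # First record: header (without ">") followed by its sequence lines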

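This function splits the file contents at each > symbol, which marks the start of a record's header line in FASTA format. Each item in the returned list therefore holds one record: the header text (without the >) followed by its sequence lines, still separated by newlines.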
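Because read_fasta returns each record as a single string that combines the header and the sequence lines, you still have to separate them before working with the sequence itself. The sketch below shows one possible alternative, not the post's method. It reads the file line by line and builds a dictionary that maps each header to its joined sequence. The helper name read_fasta_dict, the file example.fasta, and its made-up sequences are all hypothetical. For real projects, a dedicated parser such as Biopython's Bio.SeqIO is the more robust choice.

```python
def read_fasta_dict(file_name):
    """Parse a FASTA file into {header: sequence} (hypothetical helper)."""
    records = {}
    header = None
    with open(file_name, "r") as file:
        for line in file:
            line = line.strip()
            if not line:
                continue                  # skip blank lines
            if line.startswith(">"):
                header = line[1:]         # header text without the ">"
                records[header] = []
            elif header is not None:
                records[header].append(line)  # sequence line for current record
    # Join each record's sequence lines into one string
    return {h: "".join(parts) for h, parts in records.items()}


# Toy FASTA file with made-up sequences
with open("example.fasta", "w") as f:
    f.write(">seq1 toy example\nATGCGT\nACGT\n>seq2 another toy\nGGCC\n")

fasta = read_fasta_dict("example.fasta")
print(fasta)  # {'seq1 toy example': 'ATGCGTACGT', 'seq2 another toy': 'GGCC'}
```

One design trade-off: because the dictionary is keyed by header, a repeated header would overwrite the earlier record. If your file may contain duplicate headers, a list of (header, sequence) pairs is safer.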

Best Practices for File Handling
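Several widely used practices follow from the steps above, and the sketch below illustrates three of them. It opens files with with, passes an explicit encoding so results don't depend on the platform's default, and handles a missing file instead of letting the program crash. The helper count_lines and the file names are hypothetical.

```python
def count_lines(file_name):
    """Return the number of lines in a text file, or None if it is missing."""
    try:
        with open(file_name, "r", encoding="utf-8") as file:
            return sum(1 for _ in file)  # iterate lazily instead of readlines()
    except FileNotFoundError:
        print(f"File not found: {file_name}")
        return None


# Toy file for the demonstration
with open("example_practices.txt", "w", encoding="utf-8") as f:
    f.write(">toy\nATGC\n")

print(count_lines("example_practices.txt"))  # 2
print(count_lines("no_such_file.txt"))       # prints a message, then None
```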

What’s Next?

Join us in the next post, List Comprehensions in Python Part I, where we’ll explore powerful techniques to write more concise and efficient code for data manipulation. We’ll cover the basics of list comprehensions and their applications in bioinformatics data processing.
