Posts

2024

Visualizing read alignment data with ggplot

  • 8 min read

Read coverage plots are a readily interpretable way to visualize genomic or epigenomic profiles (RNA-seq, ChIP-seq, ATAC-seq, WGS, etc.) across many samples mapped to the same reference. See some examples here, here, and here. A common tool for visualizing genomic data this way is IGV, which, while versatile, can be difficult to customize for publication-ready figures. This tutorial shows how to use ggplot and R to gain much finer control over the appearance of genomic coverage figures. The structure of each intermediate object is shown so that it can be easily replicated with custom data.

Read More

Reliably transferring large amounts of data using rsync and pattern-matching

  • 3 min read

A very common bioinformatic task is transferring files spread across different directories from one computer to another. Often it's not as simple as running the exact same command each time: slight modifications are needed to make sure the correct files get transferred. The fundamentals, however, are simple and largely consistent. Here's an explainer on how to use screen, rsync, and pattern matching to make sure that specific files and directories get transferred reliably and efficiently.

Read More

Speeding up local BLAST using GNU parallel

  • 5 min read

BLAST can be parallelized to greatly improve runtime. This may be needed if you are BLASTing a large set of query sequences against a very large database. The specific problem I needed to solve was identifying contaminant sequences (of non-eukaryotic origin), accessioned in NCBI's nt database, in a eukaryotic genome assembly stored locally. The procedure can, however, be easily modified for any use case (such as different BLAST flavours, parameters, or databases) by changing the provided script.

Read More
