Michael Gerth
  • News & Blog
  • About
  • Research
    • Experimental evolution of Spiroplasma after host shifts
    • Wolbachia phylogenomics
    • Rickettsial symbionts in green lacewings
    • The role of Wolbachia in quill mites
    • Methods in molecular phylogenetics
  • Publications
  • Resources

How to efficiently bulk download NGS data from sequence read databases

25/2/2018

10 Comments

 
This blog post deals with the various ways of downloading large amounts of sequencing data (e.g., from NCBI’s SRA database). When I needed to bulk download short read for a recent project, it took me some time to figure out how to achieve this efficiently, and I am sharing my experience here in the hope it might be useful. 

The problem: you want to download lots of sequencing data (typically in form of Illumina generated reads), e.g., to reproduce a published experiment. The amount of data makes it impossible to click+download through a browser interface. There are two potential solutions: 1) download via NCBI’s SRA toolkit, and 2) access ftp servers directly.  

Read More
10 Comments

Two new scripts for creating simple GC-coverage plots from SPAdes assemblies and analysing PhyloBayes trace files

18/12/2017

1 Comment

 
Before writing about science again (new post is in the making), I have uploaded two scripts that I find useful for my work:
​

gc_cov.pl 

This script is basically a very simplified version of the blobplots function from the BlobTools package. It creates GC-coverage plots directly from SPAdes assembly files, without the need for mapping the reads back to the assembly. Since I use mainly SPAdes anyway, this has been quite handy. The script will also annotate the plot when a taxonomy file is provided, which can be generated, e.g., from blast outputs. Below is an example for the plots that can be generated with the script.
Picture

Read More
1 Comment

A script to automate SRA downloads.

18/7/2017

2 Comments

 
I have written a small script that automates the download of fastq files from the European Nucleotide Archive (ENA). This was created because I was annoyed with the speed NCBI's sra-tools. It takes NCBI SRA accession numbers as input and downloads the fastq files directly from the ENA using wget. The download speed is thus basically only limited by your bandwidth. Any feedback is very welcome!

Find it on the resources page or on github.
2 Comments

    Welcome!

    This is the website of Michael Gerth. I am a biologist with an interest in insects and the microbes within them. Click here to learn more.

    Archives

    June 2022
    February 2022
    June 2020
    October 2019
    March 2019
    January 2019
    February 2018
    December 2017
    July 2017
    December 2016
    April 2016
    March 2016

    Categories

    All
    Announcements
    Bees
    General
    Gene Repertoire Analysis
    Genomics
    Howto
    Long Branch Attraction
    Phylogenetics
    Post-publication Review
    Publications
    Scripts
    Wolbachia

    RSS Feed


    Tweets by gerth_micha
  • News & Blog
  • About
  • Research
    • Experimental evolution of Spiroplasma after host shifts
    • Wolbachia phylogenomics
    • Rickettsial symbionts in green lacewings
    • The role of Wolbachia in quill mites
    • Methods in molecular phylogenetics
  • Publications
  • Resources