Why I stopped C++

I have the feeling that bioinformaticians generally use two languages: one for small scripts, rapid analysis, and prototyping (usually an interpreted language), and another for when we have performance needs (usually a compiled language). Until recently thes…

5 minute read 12 November 2018

How to reduce the impact of your PAF file on your disk by 95%

Last week Shaun Jackman posted this tweet: I have a 1.2 TB PAF.gz file of minimap2 all-vs-all alignments of 18 flowcells of Oxford Nanopore reads. Yipes. I believe that's my first file to exceed a terabyte. Is there a better way? Perhaps removing th…

9 minute read 11 October 2018

State-of-the-art long reads overlapper-compare

Introduction In 2017, Chu et al. wrote a review 1 to present and compare 5 long-read overlapping tools on 4 datasets (including 2 synthetic ones). This paper is very cool and clear. The authors compare overlappers with respect to peak memory, wall clock …

6 minute read 13 April 2018