Misassemblies in noisy assemblies

I think that everyone who has ever done a genome assembly has one day said: "OK, my assembly is cool, but how can I be sure that it's the best and that it doesn't contain a lot of errors?" We have many techniques to evaluate the quality of assemb…

14 minute read 19 September 2019

Why I stopped C++

I have the feeling that bioinformaticians generally use two languages: one for small scripts, rapid analysis, and prototyping (usually an interpreted language) and another for when we have performance needs (usually a compiled language). Until recently thes…

6 minute read 12 November 2018

How to reduce the impact of your PAF file on your disk by 95%

Last week Shaun Jackman posted this tweet: I have a 1.2 TB PAF.gz file of minimap2 all-vs-all alignments of 18 flowcells of Oxford Nanopore reads. Yipes. I believe that's my first file to exceed a terabyte. Is there a better way? Perhaps removing th…

10 minute read 11 October 2018

State-of-the-art long reads overlapper-compare

Introduction In 2017, Chu et al. wrote a review 1 to present and compare 5 long-read overlapping tools on 4 datasets (including 2 synthetic ones). This paper is very cool and clear. The authors compare overlappers with respect to peak memory, wall clock …

7 minute read 13 April 2018