The study of social insects has traditionally relied on approaches such as behavioural observation and taxonomic sampling, with genetic analyses becoming more common since the mid-2000s. A pleasant surprise at the conference was the recent rise in genome-wide molecular approaches, in which whole or partial genome or transcriptome sequences are obtained from many individuals to make specific comparisons within species, or sometimes between species.
A major challenge for small research labs now wielding large genomic datasets is that it is easy to make a small mistake with high costs (see the articles by Greg Miller, M. Gallego et al. and Steve Horvath).
In light of this, as part of a workshop on genomics approaches organised with Tim Linksvayer and Alex Mikheyev, I gave an overview of some of the lessons we can transfer from the worlds of “other” data sciences to our expanding world of social insect genomics. These include:
benefits of peer-reviewing code, and of peer-coding sessions;
using specific tools that increase productivity while decreasing risks (rmarkdown, fat machines, snakemake/nextflow);
benefits of visualising data in many different ways. When people learn to fit basic linear models, they typically learn the importance of visually inspecting diagnostic plots (e.g. Q-Q plots, residual plots). But when we end up performing tens of thousands of such analyses (e.g. one per gene or one per SNP), many people skip this step.
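Inspecting every one of tens of thousands of diagnostic plots is rarely practical, but a cheap automated screen can at least point to the fits that most need a look. The sketch below is one possible heuristic, not a method from the workshop: it assumes a simple per-gene linear model (expression against a single covariate, such as a caste or treatment coded 0/1) and flags genes where a single sample contributes more than half of the residual sum of squares. The function names, the 0.5 threshold and the example data are all hypothetical.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("covariate is constant; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def worst_sample_share(x, y):
    """Fraction of the residual sum of squares contributed by the single worst sample.

    Values near 1 mean one sample dominates the fit (a likely outlier);
    values near 1/n mean the residuals are spread evenly across samples.
    """
    intercept, slope = fit_line(x, y)
    squared = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    rss = math.fsum(squared)
    return 0.0 if rss == 0 else max(squared) / rss


def flag_for_inspection(covariate, expression, threshold=0.5):
    """Return {gene: share} for genes whose fit is dominated by one sample.

    `expression` maps a gene ID to its values, ordered like `covariate`.
    The threshold is an arbitrary, hypothetical default.
    """
    flagged = {}
    for gene, values in expression.items():
        share = worst_sample_share(covariate, values)
        if share > threshold:
            flagged[gene] = share
    return flagged


if __name__ == "__main__":
    # Hypothetical toy data: 8 individuals, covariate = caste (0 or 1).
    caste = [0, 0, 0, 0, 1, 1, 1, 1]
    expression = {
        "gene_clean": [1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 1.9, 2.0],
        "gene_outlier": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 6.0],
    }
    print(flag_for_inspection(caste, expression))  # {'gene_outlier': 0.75}
```

The point of such a screen is to narrow the list of genes whose Q-Q and residual plots you actually draw and look at, not to replace looking. Other summaries (for example, the histogram of per-gene p-values) can be checked in the same spirit.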