I recently attended the Bioconductor 2019 conference in New York City, where I was lucky enough to give a workshop on my Bioconductor package plyranges and present some new ideas I’m working on for range-based summarisation and visualisation. After some discussion with both Bioconductor veterans and newcomers, there was general agreement that it is hard to find good resources, or even a beginner’s guide, for learning S4. This blog post is an attempt to rectify that.
In this assignment, the focus was on practising data cleaning. Students suggested questions to build a class survey, as a way of getting to know the interests of other class members, and then completed the resulting survey. After cleaning the data, they made a few summary plots of interesting aspects of the data. There are some common mistakes that rookies often make when constructing data plots: packing too much into a single graphic, leaving categorical variables unordered, reversing the norms for response and explanatory variables, conditioning in the wrong order, plotting counts when proportions should be the focus, not normalizing by counts, and using a boxplot for a small sample size.
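Two of these mistakes, plotting counts when proportions matter and leaving categories unordered, are easy to show with a small sketch. The survey responses below are invented for illustration and do not come from the class survey; the helper names `proportion_yes` and `order_by_value` are likewise hypothetical.

```python
from collections import Counter

# Hypothetical survey responses as (group, answer) pairs; invented data.
responses = [
    ("undergrad", "yes"), ("undergrad", "no"), ("undergrad", "no"),
    ("undergrad", "no"), ("undergrad", "yes"), ("undergrad", "no"),
    ("undergrad", "no"), ("undergrad", "no"),
    ("postgrad", "yes"), ("postgrad", "yes"),
]


def proportion_yes(pairs):
    """Return the share of 'yes' answers within each group."""
    totals = Counter(group for group, _ in pairs)
    yes = Counter(group for group, answer in pairs if answer == "yes")
    return {group: yes[group] / totals[group] for group in totals}


def order_by_value(summary):
    """Sort categories from largest to smallest value, ready for plotting."""
    return sorted(summary.items(), key=lambda item: item[1], reverse=True)


# Raw counts of 'yes' are identical across groups, which hides the difference.
yes_counts = Counter(group for group, answer in responses if answer == "yes")

# Normalizing by group size reveals it, and ordering makes the ranking readable.
proportions = proportion_yes(responses)
ordered = order_by_value(proportions)

print(dict(yes_counts))
print(ordered)
```

Here both groups have the same number of "yes" answers, so a bar chart of counts would suggest no difference, while the proportions differ sharply because the groups are of unequal size.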