A better look at big data
The old adage that “bigger is better” may or may not be true when it comes to data. Nevertheless, big data has become a fact of life that affects all areas of society, including scientific research. Paul McNicholas, Canada Research Chair in Computational Statistics, is developing new, computationally intensive statistical methods to gain better insight into big, and otherwise complex, data.
Some data sets have many measurements taken for each observation. While statisticians and computer scientists routinely deal with data that can contain hundreds or thousands of variables, modern data sets often have upwards of ten thousand variables. Unfortunately, there is a lack of effective methodology for this so-called “ultra high-dimensional” data.
McNicholas is combining expertise in computing and statistics to develop computational statistics approaches for this type of data. He will focus on clustering methods—methods that find subgroups of similar observations—which are applicable in any setting where ultra high-dimensional data arise, from management science to disease diagnostics and bioinformatics.
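To make the idea of finding subgroups in ultra high-dimensional data concrete, here is a minimal, generic sketch (not McNicholas’ own methodology) of clustering simulated data with 10,000 variables per observation, using a simple two-group k-means procedure in Python with NumPy; the data set, group means, and deterministic starting rule are all illustrative assumptions:

```python
import numpy as np

# Simulate an "ultra high-dimensional" data set: 200 observations,
# 10,000 variables each, drawn from two groups with shifted means.
rng = np.random.default_rng(1)
n, p = 100, 10_000
X = np.vstack([rng.normal(0.0, 1.0, (n, p)),   # group A
               rng.normal(0.5, 1.0, (n, p))])  # group B

def two_means(X, n_iter=20):
    """Minimal 2-means clustering with a deterministic start:
    seed one centroid at the first observation and the other at
    the observation farthest from it, then alternate the usual
    assignment and centroid-update steps."""
    c = np.stack([X[0],
                  X[np.argmax(np.linalg.norm(X - X[0], axis=1))]])
    for _ in range(n_iter):
        # distance from every observation to each centroid
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        c = np.stack([X[labels == j].mean(axis=0) for j in range(2)])
    return labels

labels = two_means(X)
```

Even though the per-variable separation between the two groups is small, accumulating evidence across thousands of variables makes the subgroups recoverable — which is part of what makes high-dimensional data valuable despite its computational burden.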
McNicholas is also developing computational statistics methods that will allow users to make sense of massive data sets with measurements of different types. Similar to his work on ultra high-dimensional data, this work promises to simplify and facilitate data analytics in many fields of social, economic, and scientific endeavour.
McNicholas’ research will help create statistical approaches to data that can: look for subtypes of certain cancers; help identify candidate genes for modification so that food crops can grow in developing countries; and combine genetic, fitness, and other health data to study the relationships among obesity, genes, nutrition, and exercise.