Research summary
Statisticians and computer scientists routinely deal with data sets that can have upwards of ten thousand variables. Data clustering is a technique they can use to divide such data sets into groups of observations that are similar to one another on two or more variables. As Canada Research Chair in Computational Statistics, Dr. Paul McNicholas is exploring a computational statistics approach to data clustering called mixture model-based clustering.
McNicholas and his team are developing model-based clustering approaches that are effective for three-way data with outliers, and other approaches that are effective for higher-order data. They are also working on high-performance implementations of these techniques. They hope to release their new approaches as freely available packages (that is, bundles of modules, tests and documentation) for the R or Julia programming languages. Ultimately, their research could simplify and facilitate data analytics in social, economic and scientific endeavours.