Announcement: Statistical Challenges and Opportunities for the Analysis of Microbiome Data


May 10, 08:30 - May 11, 16:30, 2017
Apotex Building - University of Manitoba 750 McDermot Winnipeg Manitoba R3E0T6
Team: Events, Training & News
Posted on February 03, 2017


Overview & Objectives

Microbiome composition is increasingly recognized as a crucial component of human and animal health. While the microbiome performs maintenance functions, such as vitamin production in the gut, recent research suggests that dysbiosis of the microbiome is associated with complex and persistent health conditions such as inflammatory bowel disease and depression. Accordingly, the microbiome is an expanding topic of research in a diverse range of disciplines.

Analyses of microbiome data are complex. The data are inherently multi-dimensional because thousands of species of microorganisms may be present in any given sample. Multiple testing is common, as researchers aim to compare the abundance of different taxa between populations. Some species are detected in only a few samples, so the counts are sparse and noisy and may require zero-inflated models. Microbiome data are also susceptible to undersampling of low-abundance species and genes, potentially omitting rare but important components and distorting diversity and relative abundance measures. Changes in microbiome composition may also be associated with time-varying characteristics of the patient. Finally, microbiome composition is difficult to summarize rigorously and concisely, owing to its high dimensionality and non-Gaussian structure.

The volume of research about statistical techniques for microbiome data is growing rapidly. However, there is still substantial uncertainty about the performance of competing statistical models and practices. Novel modeling approaches will need to be developed and disseminated.

The workshop objectives are therefore: (1) bring together international, national, and local researchers and trainees to share knowledge about methods for the analysis of microbiome data; and (2) stimulate discussion between applied researchers from animal and human health disciplines and statisticians about the analytic challenges and opportunities for microbiome data.



  • Dr. Shyamal Peddada, National Institutes of Health
  • Dr. Jun Chen, Mayo Clinic
  • Dr. Tonya Ward, University of Minnesota
  • Dr. Gregory Gloor, University of Western Ontario
  • Dr. Charles Bernstein, Rady Faculty of Health Sciences, University of Manitoba
  • Dr. Gary Van Domselaar, Public Health Agency of Canada’s National Microbiology Laboratory and University of Manitoba
  • Dr. Natalie Knox, Public Health Agency of Canada’s National Microbiology Laboratory
  • Dr. Ehsan Khafipour, Faculty of Agriculture, University of Manitoba
  • Michael Hall, Dalhousie University



Apotex Centre

University of Manitoba

750 McDermot Avenue

Winnipeg, MB R3E 0T5 CANADA




Part I: Introduction to Concepts and Topics in Metagenomics and the Microbiome (morning: May 10, 2017)

  • Dr. Natalie Knox, Public Health Agency of Canada’s National Microbiology Laboratory & Dr. Ehsan Khafipour, University of Manitoba

This section is for applied researchers and graduate students interested in learning more about the background of the microbiome, sampling and processing pipelines, available software packages, and common statistical analyses.

Morning Overview Session - Slides


Part II: Keynote Presentation and Setting the Stage (afternoon: May 10, 2017)

  • Dr. Shyamal Peddada, National Institutes of Health - Slides
  • Dr. Ehsan Khafipour, University of Manitoba
  • Dr. Gary Van Domselaar, Public Health Agency of Canada, National Microbiology Laboratory - Slides
  • Mr. Michael Hall, PhD candidate, Dalhousie University - Slides
  • Dr. Charles Bernstein, University of Manitoba - Slides

The afternoon keynote lecture is intended for those who have some experience in microbiome analysis and are interested in learning more about the current state of the art in statistical modeling approaches. The mathematical properties of microbiome data will be emphasized. Novel techniques and future opportunities will also be discussed.


Part III: Further Exploration and Discussion (May 11, 2017)

  • Dr. Jun Chen, Mayo Clinic
  • Dr. Tonya Ward, University of Minnesota - Slides
  • Dr. Gregory Gloor, The University of Western Ontario
  • Roundtable Discussions

The last day of the workshop is intended to bring together statisticians and experienced applied researchers in a collaborative environment. In addition to technical presentations, there will be several roundtable discussions emphasizing inter-disciplinary research opportunities for the advanced modeling of microbiome data.


Speaker Info & Abstracts

    • Shyamal Peddada

      Title: Some challenges in the analysis of microbiome data

      Abstract: Over the past couple of decades, researchers have been interested in studying the effect of gene by (external) environment interactions on human health. Lately, however, there is considerable interest in studying the role of the internal microbial environment in human health. Numerous studies are routinely conducted to understand the association between the microbiome and various health outcomes. The 16S rRNA data generated from such studies are high-dimensional count data containing a large number of zeros. Using these microbial count data, researchers are often interested in a wide range of problems, such as comparing various experimental groups and classifying subjects into groups (e.g. healthy and sick). Because of the intrinsic structure of these data, standard methods of analysis are not necessarily appropriate. A goal of this talk is to introduce some statistical issues relating to the analysis of these count data. For example, we shall discuss normalization, comparison of experimental groups, classification of samples, etc. We shall use some recently published data to illustrate various methods described in this talk.

      Bio: Received PhD in Statistics from University of Pittsburgh in 1983 under the supervision of Professor C. R. Rao. Was a full professor in the department of statistics at the University of Virginia before joining the National Institute of Environmental Health Sciences (NIEHS) in 2000. Currently a Senior Investigator and Acting Chief of the Biostatistics and Computational Biology Branch (NIEHS). Fellow of ASA, Elected Member of International Statistical Institute, Recipient of ASA’s Outstanding Statistical Application Award (1998) for the work on tracking ice in the north pole and for bootstrap confidence regions for estimating the motion of tectonic plates.

    • Gary Van Domselaar

      Title: One sample, three labs - a multi-centre study of variation in the analysis of the IBD-associated microbial population from common donors

      Abstract: The microbiome has been found to play an important role in health and disease. New studies emerge nearly every day demonstrating the influence of the microbiome on the immunological and metabolic activities of the host. However, the findings of individual studies have been difficult to replicate, due in part to the sensitivity of the microbial population composition to many external factors that can be difficult or impossible to control. Increasingly, researchers are questioning the degree to which different laboratory preparation methods and data analysis pipelines may contribute to this observed variation. In this study, we attempt to assess the influence of technical variation on the observed microbial population. Twenty biopsy samples were collected in triplicate from the gastrointestinal tract mucosa of twenty patients, flash frozen, and sent to three independent labs for library preparation, sequencing, and analysis. The purified nucleic acid from each lab was also sent to a single lab for sequencing and variation. The results of these studies will be used to assess the degree to which DNA extraction, library preparation, sequencing, and DNA analysis can introduce variation when performed by different labs.

      Bio: Gary Van Domselaar, PhD, is the Chief of Bioinformatics and an international authority on microbial bioinformatics with wide-ranging expertise in bacterial and viral genomics, including metagenomics, genomic epidemiology, pathogenomics, advanced molecular detection, molecular surveillance, and vaccine development. Dr. Van Domselaar has several active investigations in metagenomics, including the use of metagenomics approaches in culture-independent diagnostic testing. He is also studying the link between the microbiome and immune-mediated inflammatory disorders such as multiple sclerosis, Crohn's disease, ulcerative colitis, and rheumatoid arthritis.

    • Jun Chen

      Title: A Robust and Powerful Statistical Framework for Microbiome Data Analysis

      Abstract: Next generation sequencing technologies have enabled a culture-independent study of the microbiome using direct DNA sequencing. One strategy sequences the bacterial 16S rRNA gene for studying the bacterial component of the microbiome. The sequenced 16S tags are usually clustered into Operational Taxonomic Units (OTUs), and downstream statistical analyses are performed based on the OTU data. Analysis of such OTU data raises many statistical challenges, including modeling excessive zeros and overdispersion and taking into account the phylogenetic relationship among OTUs. In this presentation, I will talk about several statistical methods for robust and powerful analysis of microbiome data. To test for the overall association between the microbiome composition and an outcome, I will introduce a powerful small-sample kernel-machine test implemented in our recently published MiRKAT software. To identify specific taxa associated with an outcome, I will present a fully generalized regression model based on a zero-inflated negative binomial model, which allows covariate-dependent dispersion to account for sample heterogeneity. A powerful omnibus test is designed to detect any difference in prevalence, abundance, and dispersion of the count distribution. In the framework, a new normalization method for zero-inflated sequencing data will also be introduced. To further improve the power after differential abundance testing, I will talk about a false discovery rate control procedure that integrates prior structure information such as the phylogenetic tree for microbiome data. The method is based on a hierarchical model, where the structure information is encoded in a structure-based prior. I will illustrate these methods using simulations as well as real data sets.

      Bio: Dr. Chen is an assistant professor of biostatistics and a senior associate consultant in the Division of Biomedical Statistics and Informatics at Mayo Clinic, where he has been involved in both collaborative and methodological research. Previously he received his PhD in genomics and computational biology from the University of Pennsylvania, and completed his postdoctoral training at Harvard University. He is a recipient of the Saul Winegrad Award for outstanding dissertation and the Gerstner Family Career Development Award from Mayo Clinic. Dr. Chen's current research focuses on efficient modeling of microbiome sequencing data by taking into account its inherent structure and large variability.

    • Charles Bernstein

      Title: Searching for the cause of IBD with microbiome-related studies spanning epidemiology, to ecology to human microbiology

      Abstract: Inflammatory bowel diseases include Crohn’s disease and ulcerative colitis. Affected individuals have inflammatory changes in their gastrointestinal tract that primarily lead to diarrhea, gastrointestinal bleeding and abdominal pain. In genetically predisposed individuals, the intestinal and systemic immune systems respond aberrantly to some environmental trigger. Currently, the favored hypothesis is that the gut microbiome either is harboring triggering microbes or is lacking protective microbes, either of which facilitates the aberrant immune response. While IBD has been well established for seventy years in the developed world, it has only been emerging in the past twenty-five years in the developing world. However, for the most part IBD is phenotypically quite similar in peoples of developed and developing nations. Hence, any etiologic hypothesis must account for the presence of disease worldwide that clinically looks quite similar. In this lecture I will discuss the studies we have undertaken to advance our understanding of the potential for the gut microbiome to harbor the etiologic answer to IBD and other chronic immune-mediated inflammatory diseases.

      Bio: Dr. Charles Bernstein is Bingham Chair in Gastroenterology and Director, University of Manitoba Inflammatory Bowel Disease Clinical and Research Centre. He developed one of the largest validated population-based databases of inflammatory bowel disease (IBD). His main research interests relate to IBD: optimizing management approaches, exploring predictors of clinical outcomes, and disease etiology. More recently, he has been actively involved in exploring the biological and clinical intersection between different chronic immune-mediated inflammatory diseases. He has been elected into the Canadian Academy of Health Sciences (2008) and the Royal Society of Canada - Life Sciences Division of the Academy of Science (2012).

    • Gregory Gloor

      Title: Microbiome datasets are compositional: and this is not optional

      Abstract: Datasets collected by high throughput sequencing (HTS) from 16S rRNA gene sequencing or from metagenomic sequencing are commonplace and are being used to study human disease states, ecological differences between sites, and the built environment. There is increasing awareness that microbiome datasets generated by HTS are compositional: that is, they have a constant or irrelevant total. However, many investigators are unaware of this, treat their data as conditionally compositional, or make specific assumptions about the properties of the data. I want to alert investigators to the dangers inherent in these approaches, and point out that HTS datasets derived from microbiome studies can and should be treated as compositions at all stages of analysis. I start with a brief introduction to the nature of compositional data, illustrate the pathologies that occur when compositional data are analyzed inappropriately, and finally give a worked example from a 1000-sample cross-sectional cohort showing how compositional data analysis can be adapted to microbiome datasets.

      Bio: Dr. Gregory Gloor is a professor of Biochemistry at the University of Western Ontario, where he also completed his PhD in Biochemistry. His work focuses on computational biology and bioinformatics, with an interest in the use of high-throughput sequencing methods to study bacterial populations and a second interest in molecular evolution. He also actively develops methods to characterize microbiota samples using emerging sequencing platforms. His lab developed a novel statistical method for meta-RNA-seq that uses Bayesian techniques coupled with the centred log-ratio transformation to conduct a consistency check on gene expression levels. With this, gene expression changes that are not linked to organism abundance can be identified. This methodology provides a robust statistical framework that allows us to interrogate mixed microbial communities and find out what the constituent organisms are doing.

    • Tonya Ward

      Title: BugBase: a tool to predict organism-level microbiome phenotypes

      Abstract: Microbiome studies increasingly focus on identifying functional mechanisms linked to disease or experimental conditions. Shotgun metagenomics and marker gene amplicon sequencing can be used to measure directly or predict the functional repertoire of the microbiota en masse, but current methods do not readily estimate the functional capability of individual organisms within the microbiome. BugBase addresses these challenges as an algorithm that predicts organism-level coverage of functional pathways within complex microbiomes using either whole-genome shotgun or marker gene sequencing data. We find organism-level pathway coverage predictions for inferred phenotypes from BugBase to be statistically higher powered than current ‘bag-of-genes’ approaches for discerning functional changes in both host-associated and environmental microbiomes. In addition to predicting the presence of user-defined pathways in microbiome samples, BugBase also predicts biologically interpretable organism-level phenotypes such as oxygen tolerance, Gram staining and pathogenic potential by utilizing databases of experimentally-annotated bacterial phenotypes. BugBase enables novel biological insights and generation of new mechanistic hypotheses across a broad range of microbiome types with potential applications in medical, agricultural, industrial, and environmental research.

      Bio: Dr. Tonya Ward completed her PhD in Biochemistry at the University of Ottawa in 2014. Her research focused on immune-modulatory components of human milk, such as innate immune proteins, the human milk microbiome and immune-modulatory DNA within human milk. Currently, Dr. Ward is a postdoctoral associate working with Dr. Dan Knights within the Biotechnology Institute at the University of Minnesota. Her research projects include developing novel microbiome analysis tools, characterizing the infant fungal microbiome (mycobiome) and determining its relation to allergy development, determining microbial changes in the microbiome of patients with irritable bowel syndrome, and elucidating the effect of probiotics and antibiotics on mammalian and avian microbiomes.

    • Natalie Knox

      Title: An introduction to microbiome and metagenomics data analysis

      Abstract: The goals of this introductory session are to provide attendees with a broad overview of microbiome and metagenomics data analysis. Basic principles, advantages and current limitations will be presented in this dynamic area of research. The session will also highlight statistical challenges associated with these complex datasets and offer potential solutions to mitigate them. To conclude, case studies will be examined to demonstrate the theoretical principles in an applied approach using published research. At the end of the session, attendees should have a greater understanding of the fundamental aspects of microbiome and metagenomics data analysis and the ability to interpret research in these areas.

      Bio: Natalie Knox, Ph.D., is the head of Bacterial Genomics at the National Microbiology Laboratory, the federal center for infectious disease research at the Public Health Agency of Canada. Her research interests include genomic epidemiology of foodborne diseases, development of large-scale comparative genomic methods, and the application of metagenomics for pathogen detection. She currently leads several large-scale next generation sequencing projects to improve the applicability of ‘–omics’ data for use in foodborne surveillance and outbreak response in Canada.

    • Michael Hall

      Title: Breaking down the OTU

      Abstract: In targeted metagenomics projects, the resulting amplicon gene sequences are frequently clustered into operational taxonomic units (OTUs) based on shared sequence identity. We will explore the practical reasons for this approach and contrast commonly used algorithms. Recent work suggests that the OTU clustering approach does not fully leverage the accuracy of modern sequencers. Alternatives such as oligotyping and sequence denoising have been shown to partition the data at a finer resolution. We will describe these approaches with a focus on DADA2, the default sequence clustering algorithm in the newest version of the QIIME package.

      Bio: Michael is a PhD student in Dalhousie University’s Interdisciplinary PhD program, supervised by Dr. Robert Beiko. He obtained a BMath degree in Computational Mathematics from the University of Waterloo, and an MSc in Computational Biology and Bioinformatics from Dalhousie. He has worked on a range of microbial community surveys, spanning environments such as the human oral cavity and gut, mosquito guts, agricultural soil, and bioreactors.



The itinerary for the two-day workshop can be viewed here.


Registration & Travel Awards


Registration is free, and offered in three separate parts. Attendees can register for as many sections as they wish, but since space is limited we ask that you select only those you are able to attend, and which also align with your research goals.

Travel awards of up to $750 are available to post-docs and trainees at Canadian institutions. Those interested in applying should submit a summary of their academic background, relevant research interests, and a budget that includes expenses and anticipated sources of funding to by March 17, 2017. Applications should be no longer than 750 words, excluding the budget.

Questions about the workshop should be submitted to


Organizing Committee

      • Lisa Lix, George & Fay Yee Centre for Healthcare Innovation
      • Elif Acar, Department of Statistics, University of Manitoba
      • Brenden Dufault, George & Fay Yee Centre for Healthcare Innovation
      • Ehsan Khafipour, Department of Animal Science, University of Manitoba
      • Gary Van Domselaar, National Microbiology Laboratory



The Apotex Centre is located near the CanadInns Health Sciences hotel, which is near downtown Winnipeg and within walking distance of the workshop facilities. Hotel amenities include two restaurants, a fitness centre, and a Starbucks Coffee.

