Modern Statistics and Machine Learning for Population Health in Africa

A hands-on course for students and researchers at the intersection of statistics, probabilistic programming, and population health

24th - 28th March 2025
Location: AIMS Cape Town, South Africa
Organised by: Department of Mathematics, Imperial College London; the Machine Learning and Global Health Network; and the African Institute for Mathematical Sciences

Overview

One of the groundbreaking advances in machine learning research over the past decade has been the emergence of increasingly sophisticated, robust, and easy-to-use probabilistic programming languages. These tools, including Stan and NumPyro, hide tedious calculations involving automatic differentiation and gradient-based optimization from the end user, making modern statistical methods widely available to data scientists in Africa who wish to address some of the most urgent challenges on the continent, ranging from habitat degradation, air pollution and extreme weather events to disease outbreaks and population health in general.

This one-week course will cover how to combine modern statistical techniques with the Stan probabilistic programming language to address a broad range of applications involving epidemiological, genomic and spatial data. We hope the course will equip you with the statistical tools to drive your own evidence-based discoveries in global health or other application areas and, more broadly, increase your fluency in artificial intelligence and modern statistics.

Content covered/What attendees will learn

  • Bayesian workflow with probabilistic programming (Stan)
  • Core regression models for hierarchical data
  • Gaussian process regression with Stan
  • State-of-the-art GP approximations for scalable inference
  • Infectious disease modelling with probabilistic programming
  • Pathogen phylogenetics with Stan

  • Practical real-world examples with applications in malaria modelling, HIV epidemiology, ecology and environmental health
  • Varied datasets, including spatial, genomic and epidemiological data
  • Stan templates and Python code for implementing the methods covered

Learning styles/course structure

  • Lectures
  • Individual labs
  • Group project
  • Presenting findings

Who should attend and pre-requisites

  • Students and researchers interested in advanced statistical methods and probabilistic programming with applications in global health, including analysis of clinical trials and studies, infectious disease epidemiology and modelling outbreaks, and handling large genomic datasets for the surveillance of pathogens.
  • Attendees should have a good knowledge of Python and pandas to participate fully in the practical components. Previous experience with a probabilistic programming language (e.g. Stan, NumPyro, PyMC, Turing.jl) is advantageous but not essential.
  • Attendees should be familiar with Git for reproducible analyses and collaborative coding.