Deep Dive workshop

Statistical perspectives on machine learning for health applications

Wednesday 27 August 9.00

Organizer: Line Clemmensen, University of Copenhagen

Our session aims to bring a statistical perspective to data science with applications in healthcare. Healthcare is considered a high-risk application under the EU's AI Act, and statistical rigor is therefore of utmost importance. The use of machine learning in health science research is rapidly increasing, and AI tools are also entering clinical practice. This makes it important to understand the specific challenges embedded in health data and how they interact with machine learning approaches. In this session, we introduce a number of cases and topics where statistical thinking may provide new insights into ML for health applications.

We will bring together researchers and practitioners to discuss methods and best practices for accurate modeling and evaluation of ML/AI in health applications. The methodology and discussions are supported by examples from computer vision for skin cancer detection, cardiovascular disease risk prediction, and medical image analysis.

Program

The session will be organized as a mix of invited (3 x 30 minutes) and contributed (3 x 15 minutes) talks, as well as facilitated discussions at tables (2 x 20 minutes). The invited talks provide introductory overviews to general themes, while the contributed talks can zoom in on a more specific topic or piece of research. The table discussions will focus on how the talks relate to the participants’ own data science practice and experiences.

The invited speakers are:

  • Nyalleng Moorosi, Senior Researcher, Distributed AI Research (DAIR) Institute
    Appropriate use of Long-Tailed Learning Methods – Case Study: Skin Cancer Identification

  • Michael Sachs, Associate Professor, Pioneer Centre for SMARTbiomed & Section of Biostatistics, Dept. of Public Health, University of Copenhagen
    Concepts and challenges in causal prediction for medical risk prediction. Application example: Cardiovascular disease

  • Line Clemmensen, Professor of Computational Statistics, University of Copenhagen, Technical University of Denmark
    Data Representativity in Machine Learning. Application example: Health economics vs the patient’s rights to best treatment

 

The contributed speakers are:

  • Tanja Bugajski, Cand.Polyt. Student in Mathematical Engineering, Aalborg University:
    Forecasting Intra-Day Glucose Patterns in Individuals with Type 2 Diabetes: A Transparent Benchmark for Personalized Prediction

  • Harald Vilhelm Skat-Rørdam, Research Assistant, Statistics and Data Analysis, Department of Applied Mathematics and Computer Science, DTU:
    Measuring What Matters: Temporal Tolerance for Stress Event Detection While Preserving Ground Truth

  • Christoffer Sejling, PhD Student, Section of Biostatistics, Department of Public Health, University of Copenhagen:
    Novel Approach for Hierarchical Family Selection of an Ambient Air Pollutant Mixture with Application to Childhood Asthma

Target audience and size

We expect an audience of around 50-70 people in the room. The target audience is any participant with an interest in or working with data science for healthcare applications, including but not limited to those attending the statistical evaluation of AI Summer School. Both conference participants who apply data science methods in healthcare and those who do research in trustworthy AI and related fields may find this session useful.

Workshop outcome

We expect participants to gain insight into methodological challenges when working with health data, along with an overview of existing knowledge and approaches from statistics that may help address such challenges. Through the table discussions, participants will reflect on how the topics presented by the invited speakers relate to their own work. We will capture the learnings on posters with colored sticky notes.

Level

Introductory

Organizers

The organizing team is multidisciplinary and multi-institutional, bringing forth experience from theoretical and applied machine learning and statistics.

  • Line H. Clemmensen, Professor (lkhc@math.ku.dk), KU (MATH) – Lead organizer
    Her research focuses on machine learning models, low-resource applications, transfer learning, and the evaluation of artificial intelligence (AI). Her current work investigates trustworthy AI and applications in psychiatry. She earned her PhD in 2010 from DTU.

  • Anne Helby, Assistant Professor (ahpe@sund.ku.dk), KU (SUND)
    Her research focuses on causal discovery, spanning from methodological research to translational and applied research on observational health data. She is particularly interested in applicability in cohort studies, including life course epidemiology. She obtained a PhD in Biostatistics from the University of Copenhagen in 2022.

  • Ahcène Boubekki, Postdoc (ahcene.boubekki@ptb.de), Physikalisch-Technische Bundesanstalt (Berlin)
    He is a postdoctoral researcher in the Machine Learning and Uncertainty group at PTB. Prior to joining PTB, he was a postdoctoral researcher in the Machine Learning group at the University of Tromsø, Norway. His research interests lie in explainable AI, unsupervised learning, and representation learning.

  • Sneha Das, Assistant Professor (sned@dtu.dk), DTU Compute (STAT)
    Her research is on AI safety and low-resource ML. She focuses on applications within high-stakes domains such as multi-sensory AI for (mental) health and education, and is active in the AI regulation sphere. She received her PhD in Speech and Language Technology from Aalto University, Finland, in December 2021, with work on speech coding, enhancement, and privacy for distributed speech processing.

  • Dan Witzner, Professor (Witzner@itu.dk), ITU (ML)
    His focus is on robust methods for extracting information from the eyes. He investigates how this information can be employed as part of the next generation of contactless and continuous biometric identification modalities, and how it can be used by the general public in everyday, out-of-laboratory settings.