Deep Dive workshop

Big Scandinavian Data and LLMs

Wednesday 27 August 9.00

Organizer: Johannes Bjerva, Aalborg University

Program

Intro including instructions for the mingling activity (5 min)

Keynote (20 min + 10 min Q&A)
Ditte Laursen, Ph.D., Royal Danish Library

Keynote (20 min + 10 min Q&A)
Danila Petrelli, Senior Data Lead, AI Sweden

Breakout groups with structured discussions (25 min)

Break (15 min)

Panel (25 min)
Ditte Laursen and Danila Petrelli. Moderator: Stella Frank

Keynote (20 min + 10 min Q&A)
Heather Lent, Ph.D., Aalborg University

Mingling activity (30 min)
Find your note-of-interest match

Wrap-up (5 min)

Mingling Activity

This activity encourages attendees to talk to each other and meet new people. Before the break, each participant writes a brief Note of Interest (NoI) stating their main area of interest. The notes are collected during the break. In the “Mingling activity”, each participant receives a randomly chosen NoI and has two tasks: find the person who wrote that NoI, and find the person holding their own NoI.

Speakers’ list

The role of libraries in the language modelling revolution
Ditte Laursen, Ph.D., Senior Researcher, Royal Danish Library

Areas of expertise include collection management, IT governance, and research & development. Special interests include digital cultural heritage, digital humanities, and digital research infrastructures, including AI’s use of digital collections as data.

What’s in the Data? Lessons from Building Scandinavian LLM Foundations
Danila Petrelli, Senior Data Lead, AI Sweden

Danila works at AI Sweden, where, as part of the Natural Language Understanding team, she leads work on data governance and dataset development for large language models. She’s especially interested in public sector use cases, multilingual data, and the challenges of building European LLMs with open and legally sound data. Her recent work includes the TrustLLM project, national data infrastructure efforts, and initiatives to connect legal, technical, and societal perspectives on data.

Ethics in Multilingual NLP Security // How NLP Security Affects Scandinavian Languages
Heather Lent, Ph.D., Aalborg University

Heather is a postdoc at AAU working on security in LLMs. She has a Ph.D. from the University of Copenhagen. Her research interests include multilingual NLP, with a focus on small and underserved languages.

Target audience & size

The target audience is Ph.D. students, postdocs, faculty, and industry practitioners. We expect attendance from those with an interest in NLP, LLMs, textual data, and how current developments affect society (e.g. cultural effects, security challenges). We have no preferred maximum audience size. However, we strongly prefer a room where tables are arranged in groups for the whole workshop, to facilitate our breakout group discussions. This may limit the maximum number of participants.

Workshop outcome

  • Cutting-edge knowledge about data and models in multilingual NLP
  • Consideration of ethical and legal issues surrounding data collection for LLM training
  • A larger professional network for participants in the Danish NLP scene

Level

Intermediate

Organizers