Big Scandinavian Data and LLMs

Deep Dive workshop

Wednesday 27 August 9.00

Organizer: Johannes Bjerva, Aalborg University

Program

Intro including instructions for the mingling activity (5 min)

Keynote (20 min + 10 min Q&A)
Ditte Laursen, Ph.D., Royal Danish Library

Keynote (20 min + 10 min Q&A)
Danila Petrelli, Senior Data Lead, AI Sweden

Breakout groups with structured discussions (25 min)

Break (15 min)

Panel (25 min)
Ditte Laursen and Danila Petrelli. Moderator: Stella Frank

Keynote (20 min + 10 min Q&A)
Heather Lent, Ph.D., Aalborg University

Mingling activity (30 min)
Find your note-of-interest match

Wrap-up (5 min)

Mingling Activity

This activity aims to encourage attendees to talk to each other and meet new people. Before the break, participants will write a brief Note of Interest (NoI) indicating their main area of interest. During the mingling activity, each participant will be given a random NoI from those collected during the break. Their task is to find the person who wrote that NoI, and also to find the person holding their own NoI.

Speakers’ list

The role of libraries in the language modelling revolution
Ditte Laursen, Ph.D., Senior Researcher, Royal Danish Library

Her areas of expertise include collection management, IT governance, and research & development. Her special interests include digital cultural heritage, digital humanities, and digital research infrastructures, including AI’s use of digital collections as data.

What’s in the Data? Lessons from Building Scandinavian LLM Foundations 
Danila Petrelli, Senior Data Lead, AI Sweden

Danila works at AI Sweden, where, as part of the Natural Language Understanding team, she leads work on data governance and dataset development for large language models. She is especially interested in public sector use cases, multilingual data, and the challenges of building European LLMs with open and legally sound data. Her recent work includes the TrustLLM project, national data infrastructure efforts, and initiatives to connect legal, technical, and societal perspectives on data.

Ethics in Multilingual NLP Security // How NLP Security Affects Scandinavian Languages
Heather Lent, Ph.D., Aalborg University

Heather is a postdoc at AAU working on security in LLMs. She holds a Ph.D. from the University of Copenhagen. Her research interests include multilingual NLP, with a focus on small and underserved languages.

Target audience & size

The target audience is Ph.D. students, postdocs, faculty, and industry practitioners. We expect attendance from those with an interest in NLP, LLMs, textual data, and how current developments affect society (e.g. cultural effects, security challenges). We have no preferred maximum audience size. However, we strongly prefer a room where tables are arranged in groups for the whole workshop, to facilitate the breakout group discussions. This may limit the maximum number of participants.

Workshop outcome

Cutting-edge knowledge about data and models in multilingual NLP
Consideration of ethical and legal issues surrounding data collection for LLM training
An expanded professional network in the Danish NLP scene

Level

Intermediate

Organizers

Johannes Bjerva (lead), Full Professor at AAU, jbjerva@cs.aau.dk
Stella Frank, Postdoc at UCPH/Pioneer Centre for AI, stfr@diku.dk
Danae Sanchez, Postdoc at UCPH, davi@di.ku.dk
Alice Schiavone, Research Assistant at UCPH, alisch@di.ku.dk
Ernests Lavrinovics, Ph.D. student at AAU, elav@cs.aau.dk
Hannah Claus, Ph.D. student at the University of Cambridge, hmc78@cam.ac.uk

Address

Hotel Nyborg Strand
Østerøvej 2
5800 Nyborg

Contact

info@direc.dk