Proceedings

Workshop and conference papers can be found in the proceedings:

  • Volume 1: Long and Short Papers
  • Volume 2: Workshops

The proceedings will also be archived in the ACL Anthology.

Keynotes

Michael Hahn: Opening the Black Box of Language Models via Theory and Interpretability

Wednesday, 14:00-15:00 (BC.L.1.31)

Recent progress in LLMs has rapidly outpaced our ability to understand their inner workings. This talk describes our work towards such an understanding. First, we mechanistically reverse-engineer transformers’ solutions to tasks such as arithmetic and in-context learning. Second, we develop rigorous results describing the abilities (and limitations) of transformers and other architectures in performing reasoning. We show that this can help us understand the abilities and limitations of LLMs on practically relevant tasks, and even point to possible improvements. I will close with directions for future research.

Michael Hahn is a Tenure-Track Professor (W2) at the Saarland Informatics Campus of Saarland University, where he directs the Language, Computation, and Cognition Lab (LaCoCo). He is affiliated with the Departments of Language Science and Technology and Computer Science. He received his PhD from Stanford University in 2022, advised by Judith Degen and Dan Jurafsky.

Sina Zarrieß: What does it take to raise a BabyLM?

Thursday, 13:30-14:30 (BC.L.1.31)

The current landscape of CL research is dominated by language models that are too large to be designed and built from scratch by most researchers. This radically limits the possibilities for scientific experimentation and thus our understanding of how these models work. In this talk, I will argue for research on small language models, trained from scratch on orders of magnitude less data than LLMs. I will present evidence that basic linguistic abilities emerge in small language models in much the same way as in far larger models. I will also show how experimenting with modelling choices in small language models can help us understand the limitations of LLMs.

Sina Zarrieß is a professor of Computational Linguistics at Bielefeld University. Previously, she held a junior professorship for Digital Humanities, Language Technology and Machine Learning at the University of Jena. She obtained her PhD at the Institute for Natural Language Processing at Stuttgart University and spent her postdoc at Bielefeld University as a member of the Excellence Cluster for Cognitive Interaction Technology. Her research focuses on computational models of language use in text and dialogue, with applications in natural language generation, dialogue systems, and language & vision.

Timo Freiesleben: Dear XAI Community, We Need to Talk! Fundamental Misconceptions in Current XAI Research

Friday, 9:00-10:00 (BC.L.1.31)

Despite progress in the field, significant parts of current XAI research are still not on solid conceptual, ethical, or methodological grounds. Unfortunately, these unfounded parts are not on the decline but continue to grow. Many explanation techniques are still proposed without clarifying their purpose. Instead, they are advertised with ever more fancy-looking heatmaps or only seemingly relevant benchmarks. Moreover, explanation techniques are motivated with questionable goals, such as building trust, or rely on strong assumptions about the ‘concepts’ that deep learning algorithms learn. In this talk, I will highlight and discuss these and other misconceptions in current XAI research. Moreover, I will suggest steps to make XAI a more substantive area of research.

Timo Freiesleben is a postdoctoral fellow at the Cluster of Excellence Machine Learning for Science at the University of Tübingen. His research explores how concepts from the philosophy of science, such as explanation, representation, and robustness, can inform and enhance both theoretical and practical aspects of machine learning. His work focuses particularly on how machine learning can contribute to generating new scientific insights. Prior to his position in Tübingen, he completed his PhD at the Munich Center for Mathematical Philosophy at LMU Munich, where he investigated the question of what explainable artificial intelligence actually explains.

Public Lecture

Thursday, 12:30-13:30 (BC.L.1.31)

On Thursday, September 11th, we will hold a public lecture during the lunch break that is open not only to all KONVENS participants but also to all interested employees, students, and visitors of the University of Hildesheim.

Torsten Zesch: Smarte Technologie, bessere Bildung? KI in der Hochschule (Smart Technology, Better Education? AI in Higher Education)

The talk examines the promising opportunities and the critical challenges of using AI technologies in teaching. From personalized learning platforms and automated assessment systems to intelligent tutors, AI promises more individualized support, more efficient processes, and new ways of learning. But what does this mean for the role of teachers? Do learning outcomes actually improve? And what technical questions arise in handling data and algorithms in education? Using concrete examples from the real-world laboratory (Reallabor) of the FernUniversität in Hagen, we discuss the opportunities and risks of the digital transformation at universities. One thing becomes clear: technology alone does not make education better; what matters is how we design and deploy it. An outlook on possible future scenarios rounds off the talk and invites critical reflection on the education of tomorrow.

Torsten Zesch is a full professor of Computational Linguistics at CATALPA (Center of Advanced Technology for Assisted Learning and Predictive Analytics), FernUniversität in Hagen, Germany. He holds a doctoral degree in computer science from Technische Universität Darmstadt and was president of the German Society for Computational Linguistics and Language Technology (GSCL) from 2017 to 2023. His main research interests are in educational natural language processing, in particular the ways in which teaching and learning processes can be supported by language technology. For this purpose, he develops methods for the automatic analysis of textual and multimodal language data, with a focus on robust and explainable models.

Oral Sessions

1. Methods

Wednesday, 15:00-16:00 (BC.L.1.31)

  • Surprisal in Action: A Comparative Study of LDA and LSA for Keyword Extraction
    J. Nathanael Philipp, Max Kölbl, Michael Richter
  • Learn to pick the winner: Black-box ensembling for textual and visual question answering
    Yuxi Xia, Klim Zaporojets, Benjamin Roth
  • LRMs are not thinking straight: Unreliability of thinking trajectories
    Jhouben Cuesta-Ramirez, Samuel Beaussant, Mehdi Mounsif
2. Applications

Thursday, 9:00-10:00 (BC.L.1.31)

  • Adaption and Evaluation of Generative Large Language Models for German Medical Information Extraction
    Sören Spiegel, Seid Muhie Yimam, Philipp Breitfeld, Frank Ückert
  • ZEFYS2025: A German Historical Newspaper Dataset for Named Entity Recognition and Entity Linking
    Sophie Schneider, Ulrike Förstel, Kai Labusch, Jörg Lehmann, Clemens Neudecker
  • Generating Search-Engine-Optimized Headlines for Sports News
    Frank Zalkow, Benedikt Schäfer, Thomas Moissl, Jonas Bücherl, Kerstin Markl, Sebastian Bothe, Francois Duchateau, Julia Dollase, Patric Kabus, Daniel Steinigen, Oliver Schmitt, Fabian Küch
3. New Resources

Thursday, 11:30-12:30 (BC.L.1.31)

  • Predicting Functional Content Zones in German Source-Dependent Argumentative Essays: Experiments on a Novel Dataset
    Xiaoyu Bai, Manfred Stede
  • SocCor: A Multimodal-based Multilingual Soccer Corpus for Text Data Analytics
    Paul Löhr, Jannik Strötgen
  • A Survey of Idiom Datasets for Psycholinguistic and Computational Research
    Michael Flor, Xinyi Liu, Anna Feldman
4. Discourse and Semantics

Thursday, 14:30-15:30 (BC.L.1.31)

  • Function Words as Stable Features for German Opinion Articles Classification
    Amelie Schmidt-Colberg, Simon Burkard, Anne Grohnert, Michael John
  • LLM-based Classification of Grounding Acts in German
    Milena Belosevic, Hendrik Buschmeier
  • Efficient and Effective Coreference Resolution for German
    Fynn Petersen-Frey, Hans Ole Hatzel, Chris Biemann
5. Hate Speech

Friday, 11:30-12:30 (BC.L.1.31)

  • FASCIST-O-METER: Classifier for Neo-fascist Discourse Online
    Rudy Alexandro Garrido Veliz, Martin Semmann, Chris Biemann, Seid Muhie Yimam
  • Conditioning Large Language Models on Legal Systems? Detecting Punishable Hate Speech
    Florian Ludwig, Frederike Zufall, Torsten Zesch
  • HICC: A Dataset for German Hate Speech in Conversational Context
    Lars Schmid, Pius von Däniken, Patrick Giedemann, Don Tuggener, Judith Bühler, Maria Kamenowski, Katja Girschick, Dirk Baier, Mark Cieliebak
6. Translation and Multilinguality

Friday, 11:30-12:30 (BC.L.0.67)

  • Evaluating the Feasibility of Using ChatGPT for Cross-cultural Survey Translation
    Danielly Sorato, Diana Zavala-Rojas
  • Information Divergence in Translation and Interpreting: Findings from Same-Source Texts
    Maria Kunilovskaya, Sharid Loáiciga, Ekaterina Lapshinova-Koltunski
  • SEAS: Sentence Extraction and Alignment from Subtitles
    Josh Stephenson, Libby Barak

Poster Sessions

Poster 1

Thursday, 10:00-11:30 (BC.LN.0.03)

  • German Aspect-based Sentiment Analysis in the Wild: B2B Dataset Creation and Cross-Domain Evaluation
    Jakob Fehle, Niklas Donhauser, Udo Kruschwitz, Nils Constantin Hellwig, Christian Wolff
  • Vague, Incomplete, Subjective, and Uncertain Information in Art Provenance
    Fabio Mariani
  • Automatic Creation of Marginalia
    Aaron Lang, Robin Jegan, Andreas Henrich
  • Localization of English Affective Narrative Generation to German
    Johannes Schäfer, Sabine Weber, Roman Klinger
  • Multimodal Docker Unified UIMA Interface: New Horizons for Distributed Microservice-Oriented Processing of Corpora using UIMA
    Daniel Bundan, Giuseppe Abrami, Alexander Mehler
  • Systematic Review of Linguistic Characteristics in Profiling and Automated Detection of Autistic Speech
    Charlotte Bellinghausen, Andreas Riedel
  • More than the Sum of Their Words: Generating and Contrasting Large Linguistic Networks
    Hanna Schmück
  • Towards a Cross-Dialectal Dictionary for Low German (Low Saxon)
    Christian Chiarcos, Janine Siewert, Tabea Gröger, Christian Fäth
  • Rapid Text Segmentation: Crowd-sourcing Lay Intuition about Text Structure in the Browser
    Florian Frenken
  • Applying an Information-theoretic Approach for Automatic Identification of German Multi-word Expressions
    Sergei Bagdasarov, Elke Teich
Poster 2

Friday, 10:00-11:30 (BC.LN.0.03)

  • Hit or Be Hit: Tests of (Pre)Compositional Abilities in Vision and Language Models
    Mădălina Zgreabăn, Albert Gatt, Pablo Mosteiro
  • When AI Gets It Wrong: Exploring the Educational Value of Flawed Transcriptions in Language Pedagogy
    Anna Malin Gerke
  • Hybrid Feature-Embedding Models for Robust AI Text Detection
    Kasper Thomas Gartside Knudsen, Christian Hardmeier
  • Advancing German Language Modelling - Transparent Models and Comprehensive Benchmarks
    Jan Pfister, Julia Wunderle, Anton Ehrmanntraut, Fotis Jannidis, Andreas Hotho
  • Using LLMs for experimental stimulus pretests in linguistics. Evidence from semantic associations between words and social gender
    Christian Lang, Franziska Kretzschmar, Sandra Hansen
  • Developmentally plausible pretraining, now also auf Deutsch: a BabyLM Dataset for German
    Bastian Bunzeck, Daniel Duran, Sina Zarrieß
  • PETapter: Leveraging PET-style classification heads for modular few-shot parameter-efficient fine-tuning
    Jonas Rieger, Mattes Ruckdeschel, Gregor Wiedemann
  • Large Language Model Data Generation for Enhanced Intent Recognition in German Speech
    Theresa Pekarek Rosin, Burak Can Kaplan, Stefan Wermter
  • Detecting Sexism and Its Severity in German Online Comments: Modeling Annotation Subjectivity with BERT and mBERT
    Melanie Woodrow, Margot Mieskes

Workshops

5th Workshop on Computational Linguistics for the Political and Social Sciences (CPSS)

The main goal of the workshop is to bring together researchers and ideas from computational linguistics/NLP and the text-as-data community from political and social science, in order to foster collaboration and catalyze further interdisciplinary research efforts between these communities.

The different submission types (archival/non-archival) are intended to meet the needs of researchers from different communities, allowing them to come together and exchange ideas in a “get to know each other” environment, with the goal of fostering interdisciplinary collaborations.

See https://cpss-sig.github.io/CPSS-2025/ for more information.

KlarText Workshop: German Text Simplification & Readability Assessment

This is the first edition of the KlarText workshop on German Text Simplification & Readability Assessment. The KlarText Workshop aims to bring together researchers, practitioners, and industry experts to discuss state-of-the-art methods, share resources, and identify future research directions in German text simplification and readability assessment. We especially want to highlight the different simplification goals and the variety of simplified language forms in German, such as Einfache Sprache and Leichte Sprache, and encourage researchers to address the challenges of German text simplification. A key focus of the workshop is the evaluation of, and resources for, text simplification methods. By fostering interdisciplinary exchange, we aim to advance research on making information more accessible.

See https://klar-text.github.io/ for more information.

Workshop on NLP for Sustainability (NLP4Sustain)

With this workshop, we want to provide an interdisciplinary forum for discussing research, progress, and challenges in the context of NLP and sustainability. We invite submissions about NLP-based analyses of sustainability-related texts, sustainable NLP models and evaluation practices in general, as well as other related topics. Authors and other participants will engage with each other in a poster session and there will be an interdisciplinary invited talk with an ensuing discussion. The results of the SustainEval GermEval Shared Task will also be presented at the workshop.

GermEval

The following tasks have been accepted for GermEval 2025:

  • Understanding Sustainability Reports

With this shared task, we aim to fuel research on automatic analysis and detection of greenwashing by challenging participants to build systems that categorize excerpts from German sustainability reports for (A) content class and (B) statement verifiability rating. We also invite submission of original papers about analyzing sustainability texts with NLP and other aspects of sustainability and NLP. Papers should describe original, unpublished work and can be technical contributions of an empirical or theoretical nature, literature surveys, or opinion pieces. For more information, submission formats, and trial data, please refer to the shared task homepage: https://sustaineval.github.io/

  • Harmful Content Detection in Social Media

This shared task focuses on detecting harmful content in German social media posts, addressing three key challenges: identifying calls to action, detecting attacks on the free democratic basic order (DBO), and recognizing disturbingly positive statements towards violence. These forms of content pose risks, such as inciting violence or undermining democratic structures.

Task website: https://www.codabench.org/competitions/4963

  • Flausch-Erkennung

The task is to identify expressions of candy speech (“Flausch”) in online posts (YouTube comments). We define candy speech as an expression of positive attitudes in social media toward individuals or their output (videos, comments, etc.). The purpose of candy speech is to encourage, cheer up, support and empower others. It can be viewed as the counterpart to hate speech, as it also aims to influence the self-image of the target person or group, but in a positive way.

We offer the following two subtasks:

Subtask 1: Coarse-Grained Classification. The goal of this subtask is to identify whether the given comment contains candy speech (“Flausch”) or not.

Subtask 2: Fine-Grained Classification. The goal of this subtask is to identify the span of each candy speech expression in a given text and classify it into one of the predefined categories, such as “positive feedback”, “compliment”, or “group membership”.

More details on the subtasks (including examples) can be found on the website of the shared task: https://yuliacl.github.io/GermEval2025-Flausch-Erkennung/ (a minimal baseline sketch for Subtask 1 is given after the task list below).

  • LLMs4Subjects

Building on the strong community engagement of its first iteration, the second edition of LLMs4Subjects continues to challenge researchers to develop cutting-edge LLM-based solutions for subject tagging of technical records from Leibniz University’s Technical Library (TIBKAT). Participants must leverage LLMs to tag records using the GND taxonomy, the standard across all German libraries. The task requires bilingual language modeling, as systems must process technical documents in both German and English. Successful solutions may be integrated into TIB’s operational workflows at the Leibniz Information Centre for Science and Technology.

Task website: https://sites.google.com/view/llms4subjects-germeval/
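
As a flavour of what an LLMs4Subjects system might involve, here is a minimal prompting sketch using the OpenAI Python client. The model name, the candidate subject labels, and the record are invented for illustration; real systems must work with the official TIBKAT records and the full GND taxonomy.

    # Hypothetical LLM-based subject-tagging sketch (not an official baseline).
    # Model name, candidate labels, and the record are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    record = "Titel: Einführung in die Regelungstechnik. Abstract: ..."
    candidates = ["Regelungstechnik", "Automatisierungstechnik", "Mathematik"]

    prompt = (
        "Assign the most appropriate subject tags from the candidate list "
        f"to the following library record.\nCandidates: {', '.join(candidates)}\n"
        f"Record: {record}\nAnswer with a comma-separated list of tags."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any instruction-tuned LLM could be used
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)

For Subtask 1 of the Flausch-Erkennung task, the baseline sketch referenced above: fine-tuning a German encoder for binary classification with the Hugging Face transformers library. The model choice and the toy training examples are assumptions; a real submission would train on the official task data.

    # Minimal binary candy-speech classifier sketch (Subtask 1).
    # Model name and the toy examples are assumptions, not task data.
    import torch
    from torch.optim import AdamW
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "bert-base-german-cased"  # assumption: any German encoder works
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Toy stand-in for the training split: (comment, label), 1 = candy speech.
    train_data = [
        ("Tolles Video, du machst das super!", 1),
        ("Das Thema wurde schon oft behandelt.", 0),
    ]

    optimizer = AdamW(model.parameters(), lr=2e-5)
    model.train()
    for epoch in range(3):
        for text, label in train_data:
            batch = tokenizer(text, return_tensors="pt", truncation=True)
            loss = model(**batch, labels=torch.tensor([label])).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    # Classify an unseen comment.
    model.eval()
    with torch.no_grad():
        batch = tokenizer("Danke dir, das hat mir sehr geholfen!", return_tensors="pt")
        predicted = model(**batch).logits.argmax(-1).item()
    print("candy speech" if predicted == 1 else "no candy speech")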


Tutorials

On Tuesday, September 9th, the following tutorials will be offered.

FlexiConc: Reading Concordances with Algorithms

Stephanie Evert and Alexander Piperski (FAU Erlangen-Nürnberg)

Concordance analysis is a central technique in corpus linguistics, computational lexicography, discourse analysis, digital humanities, and other fields. In this tutorial, we show how concordance reading can be supported with well-established as well as innovative computational algorithms. We present FlexiConc, a Python library developed specifically for this purpose, which works with concordance data from various corpus tools (including CWB, Sketch Engine, KorAP, and CLiC). Following a theoretical introduction to the principles and five key strategies of concordance reading, we introduce a general mathematical framework for algorithms organised around the five strategies, as well as our approach to comprehensive research documentation in terms of analysis trees. We also discuss the practical implementation of FlexiConc, its integration with host apps, and the challenges we have faced. The last part of the tutorial is a hands-on session showing how to use FlexiConc in a Jupyter Notebook environment, which enables a tight integration of quantitative and qualitative approaches. Drawing on worked examples from multiple subfields, including lexicography and literary stylistics, we demonstrate how reproducible concordance analysis can inform and enrich linguistic research. Participants are encouraged to bring their own laptops and install FlexiConc, but can also work with Google Colab notebooks.

Download Workshop Materials here
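
To give a flavour of what "reading concordances with algorithms" means, here is a self-contained toy sketch of one basic operation that FlexiConc systematizes: building a KWIC (key word in context) view and ordering the lines by their right-hand collocate. Nothing below is FlexiConc's actual API; its real algorithms, host-app integrations, and analysis trees go far beyond this.

    # Toy KWIC concordance with one ordering strategy; purely illustrative.
    import re

    text = (
        "The corpus shows the word in context. The word appears often, "
        "and the word matters for concordance reading."
    )
    tokens = re.findall(r"\w+", text.lower())
    node, window = "word", 3

    # Each concordance line: (left context, node, right context).
    lines = [
        (tokens[max(i - window, 0):i], tokens[i], tokens[i + 1:i + 1 + window])
        for i, tok in enumerate(tokens) if tok == node
    ]

    # Ordering strategy: sort lines by the first token to the right of the node.
    for left, node_tok, right in sorted(lines, key=lambda line: line[2][:1]):
        print(f"{' '.join(left):>25} [{node_tok}] {' '.join(right)}")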

The 101 introduction explaining how to produce, publish, and use machine-readable scientific knowledge

Markus Stocker (TIB Hannover)

The tutorial introduces participants to ORKG reborn (https://reborn.orkg.org) and the underlying method for the systematic production and publication of machine-readable scientific knowledge. The tutorial first briefly overviews the state of the art in scientific knowledge graphs and then describes the “reborn article” approach by means of a few illustrative examples. It motivates the developments and contrasts the approach with more classical knowledge extraction from scientific articles. Finally, the tutorial showcases how machine-assisted use of scientific knowledge can support advanced knowledge presentation (visualization) and knowledge synthesis in research.
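
To illustrate what "machine-readable scientific knowledge" means in practice, here is a tiny sketch expressing one reported result as RDF triples with the rdflib library. The vocabulary (the ex: namespace and its properties) is invented for this illustration and is not the ORKG data model.

    # Purely illustrative: one scientific claim as RDF triples.
    from rdflib import RDF, Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/")
    g = Graph()
    paper = URIRef("http://example.org/paper/123")
    result = URIRef("http://example.org/result/1")

    g.add((paper, RDF.type, EX.ScholarlyArticle))  # invented class
    g.add((paper, EX.reportsResult, result))       # invented property
    g.add((result, EX.evaluatesMetric, Literal("F1")))
    g.add((result, EX.hasValue, Literal(0.87)))

    print(g.serialize(format="turtle"))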

Fusing Vision and Language: A Tutorial on Vision-Language Models for Multimodal Content Analysis

Eric Müller-Budack and Sushil Awale (TIB Hannover)

The increasing availability of multimodal data, including images and videos, has led to a surge of interest in multimodal models that combine visual and textual information. This tutorial will provide an in-depth introduction to the latest advances in multimodal models, with a focus on large vision-language models. Through a combination of theoretical explanations, code demonstrations, and hands-on exercises, participants will learn how to apply these models to a range of image and video analysis tasks, including image captioning, visual concept detection, and image retrieval. By the end of the tutorial, attendees will have a solid understanding of the strengths and limitations of these models, enabling them to implement their own multimodal applications.
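
The hands-on materials themselves are not reproduced here; as a taste of the kind of exercise involved, below is a minimal image-captioning sketch using the Hugging Face transformers library. The checkpoint (Salesforce/blip-image-captioning-base) and the local image path are assumptions, not necessarily what the tutorial uses.

    # Minimal image-captioning sketch with a pretrained vision-language model.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    checkpoint = "Salesforce/blip-image-captioning-base"  # assumed example model
    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForConditionalGeneration.from_pretrained(checkpoint)

    # Load any local image; "example.jpg" is a placeholder path.
    image = Image.open("example.jpg").convert("RGB")

    # Encode the image and generate a short caption.
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    print(processor.decode(output_ids[0], skip_special_tokens=True))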