We release the EMMA-500 Llama 3/3.1 models and the MaLA bilingual translation corpus covering 2,500+ language pairs 🌐
June 2025
We release a series of CPT models for studying data mixing in continual pre-training 🤗
April 2025
We release a preview of GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models
April 2025
Big congrats to Zihao Li on starting PhD research!
March 1, 2025 @Helsinki
Call for Participation: SemEval-2025 Task 3, Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes
Deadline: January 31, 2025 @SemEval-2025
Thrilled to have Doan Nam Long Vu join the team – welcome aboard!
January 2, 2025 @Darmstadt
Check out our latest survey paper on LLMs for graph learning
January 2, 2025
One paper on multilingual instruction fine-tuning accepted at COLING 2025. Check out the Lucky52 models and paper
November 29, 2024
Invited talk on NLP for mental health at the GeMTeX Large Language Model Workshop
November 18, 2024 @TU Munich
I moved to TU Darmstadt as an independent research group leader
October 1, 2024 @Darmstadt
Releasing EMMA-500, a multilingual model continually pre-trained on 546 languages
September 26, 2024 @Helsinki
One paper on LM vs. MT accepted at EMNLP 2024
September 20, 2024