Our Mission

We focus on omni-language AI research: advancing human language technologies, pushing forward multimodal learning capabilities, and fostering societal impact through outreach. Our work bridges technical sophistication with real-world applications, exploring fundamental research questions in natural language processing and multimodal AI while building large language models for societal impact.

Our dynamic group consists of researchers, students and visitors from ELLIS Institute Finland, University of Turku (TurkuNLP group), and other collaborative institutions.

Group Members

Researchers

Shaoxiong Ji
Group Leader
ELLIS Institute Finland & University of Turku
NLP & AI for Health
Zihao Li
Doctoral Researcher
University of Helsinki (co-supervised with Jörg Tiedemann)
Multilingual NLP
Renhao Pei
Doctoral Researcher
ELLIS Institute Finland & University of Turku
Multilingual NLP
Mingyuan Li
Visiting Researcher
University of Turku
NLP
Md Mohsinul Kabir
ELLIS PhD
University of Manchester (co-advised with Sophia Ananiadou)
NLP

Students

Md Mehrab Hossain
MSc Student
University of Turku
Multilingual NLP
Abdulaziz Mahmoud
MSc Student
University of Jyväskylä
Multilingual NLP
Miikael Mändmets
MSc Student
University of Turku
NLP

Join Us

Are you passionate about advancing NLP research and contributing to impactful societal applications? Although we currently do not have funded positions available, we welcome highly motivated Master’s students and visiting researchers to collaborate with us through external funding or self-supported programs.

Alumni

PhD Candidate 2025
PhD Candidate at TU Darmstadt
Role: Doctoral Researcher
Institution: Technical University of Darmstadt, Germany
Project: DYNAMIC (Dynamic Network Approach of Mental Health to Stimulate Innovations for Change)

Publications:
• Roleplaying with Structure: Synthetic Therapist-Client Conversation Generation from Questionnaires (arXiv 2025)
Jaakko Paavola
Research Assistant 2024
Quantitative Analyst at OP Financial Group
Role: Research Assistant
Institution: University of Helsinki, Finland
Project: MaLA-LM (Massive Language Adaptation of Large Language Models)

Publications:
• EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models (arXiv 2024)
• Massively Multilingual Adaptation of Large Language Models Using Bilingual Translation Data (arXiv 2025)
Henna Roinisto
MSc Student 2024
Data Scientist at Metsä Group
Role: MSc Thesis Student
Institution: University of Helsinki, Finland (jointly with Metsä Group)
Thesis: Integrating Open-Source Retrieval-Augmented Generation with Large Language Models for Business, Market and Responsibility Insights
Ya Gao
MSc Student 2022-23
PhD Candidate at Aalto University
Role: MSc Thesis Student
Institution: Aalto University, Finland
Thesis: Joint entity and relation extraction via contrastive learning on knowledge-augmented graph embeddings

Publications:
• Knowledge-augmented Graph Neural Networks with Concept-aware Attention for Adverse Drug Event Detection (LREC-COLING 2024)
• Contextualized Graph Embeddings for Adverse Drug Event Detection (ECML-PKDD 2022)
Tuulia Denti
MSc Student 2022
Data Analyst at HUS
Role: MSc Thesis Student
Institution: Aalto University, Finland (jointly with HUS)
Thesis: Natural Language Processing with Topic Models for Clinical Texts of Prostate Cancer Patients

Publications:
• Weak Supervision and Clustering-Based Sample Selection for Clinical Named Entity Recognition (ECML-PKDD 2023)
Wei Sun
MSc Student 2021-22
PhD Candidate at KU Leuven, Belgium
Role: MSc Thesis Student
Institution: Aalto University, Finland (jointly with HUS)
Thesis: Extracting Medical Entities from Radiology Reports with Ontology-based Distant Supervision

Publications:
• A Unified Review of Deep Learning for Automated Medical Coding (ACM Computing Surveys 2024)
• Weak Supervision and Clustering-Based Sample Selection for Clinical Named Entity Recognition (ECML-PKDD 2023)
• Multitask Balanced and Recalibrated Network for Medical Code Prediction (TIST 2022)
• Multitask Recalibrated Aggregation Network for Medical Code Prediction (ECML-PKDD 2021)