Our Mission

The research group focuses on advancing human language technologies and fostering societal impact through outreach efforts. Our work bridges technical sophistication with real-world applications, exploring fundamental research questions in natural language processing (NLP) while building large language models.

We are dedicated to creating multilingual and multimodal NLP systems that extend beyond the boundaries of individual languages and modalities. Our research contributes to societal domains such as mental healthcare and diagnostic support, where advanced NLP methods can make a difference. More broadly, our outreach aims to extend help, knowledge, and resources to communities, fostering inclusion, education, and positive social impact.

Group Members

Researchers

Shaoxiong Ji, Group Leader

  • Independent research group leader at TU Darmstadt
  • PhD in computer science from Aalto University, Finland, 2023
  • Areas: health informatics, affective computing, and multilingual LLMs

Doan Nam Long Vu, Doctoral Researcher

  • MSc and BSc from TU Darmstadt
  • Areas: machine translation, differential privacy, and mental health applications
  • Projects: DYNAMIC for Mental Healthcare

Students

Zihao Li, MSc student at University of Helsinki

  • Research assistant on the MaLA project, now working on an MSc thesis
  • Areas: multilingual NLP, LLMs, and machine translation
  • Thesis: continual training of LLMs

Ongoing Projects

Project A: DYNAMIC for Mental Healthcare

The DYNAMIC project (Dynamic Network Approach of Mental Health to Stimulate Innovations for Change) aims to leverage advanced technologies, particularly natural language processing (NLP) and knowledge discovery, to foster innovation in mental healthcare. The initiative covers several research topics, including the application of large language models (LLMs) for clinical purposes and the analysis of multimodal clinical data.

Join our Discord server for further information and collaboration as we work to improve mental healthcare with AI technology.

Project B: MaLA for Massive Language Adaptation of LLMs

MaLA focuses on adapting large language models (LLMs) to better understand and generate human language across diverse contexts and applications in massively multilingual settings. It aims to enhance the capabilities of existing LLMs by integrating massive datasets and employing advanced techniques to improve their performance on a range of linguistic tasks. The initiative recognizes the importance of adapting these models to different languages and dialects so that they can effectively serve a broader audience.

To join the conversation or learn more about the project, connect with the community on Discord: MaLA-LM Discord.

Alumni

University of Helsinki

Jaakko Paavola. Research assistant at University of Helsinki, 2024. Now Quantitative Analyst at OP Financial Group

Henna Roinisto. MSc thesis, University of Helsinki, jointly with Metsä Group
Thesis: Integrating Open-Source Retrieval-Augmented Generation with Large Language Models for Business, Market and Responsibility Insights, 2024.

Aalto University

Ya Gao. MSc thesis, Aalto University. Now PhD candidate at Aalto University
Thesis: Joint entity and relation extraction via contrastive learning on knowledge-augmented graph embeddings, 2023.

Tuulia Denti. MSc thesis, Aalto University, jointly with HUS. Now Data Analyst at HUS
Thesis: Natural Language Processing with Topic Models for Clinical Texts of Prostate Cancer Patients, 2022.

Wei Sun. MSc thesis, Aalto University, jointly with HUS. Now PhD candidate at KU Leuven, Belgium
Thesis: Extracting Medical Entities from Radiology Reports with Ontology-based Distant Supervision, 2022.

Join Us

Are you passionate about advancing NLP research and contributing to impactful societal applications? Although we currently do not have funded positions available, we welcome highly motivated Master’s students and visiting researchers to collaborate with us through external funding or self-supported programs.