MSc Thesis Topics | Shaoxiong Ji

We welcome Master’s students seeking thesis opportunities in natural language processing and related fields. If you are enthusiastic about solving real-world challenges with NLP, particularly in multilingual and multimodal systems, or about exploring societal impacts such as mental healthcare applications, consider joining our team.

What We Offer:

Supervision and guidance from leading experts in NLP and AI.
Access to state-of-the-art research tools, datasets, and computational resources.
Opportunities to publish in high-impact venues and present at conferences.

Eligibility Criteria:

Must be enrolled in a Master’s program at TU Darmstadt.
Strong academic background in computer science, machine learning, natural language processing, or a related field.
Experience or coursework in natural language processing is preferred.

How to Apply:

A brief description of your research interests and how they align with our work.
Your CV and academic transcript.

Please send your application, including the materials listed above, to shaoxiong.ji@tu-darmstadt.de

Previous MSc Thesis Topics

Henna Roinisto (MSc, University of Helsinki, jointly with Metsä Group)
Integrating Open-Source Retrieval-Augmented Generation with Large Language Models for Business, Market and Responsibility Insights, 2024.
Ya Gao (MSc, Aalto University, now PhD candidate at Aalto University)
Joint entity and relation extraction via contrastive learning on knowledge-augmented graph embeddings, 2023.
Tuulia Denti (MSc, Aalto University, jointly with HUS, now Data Analyst at HUS)
Natural Language Processing with Topic Models for Clinical Texts of Prostate Cancer Patients, 2022.
Wei Sun (MSc, Aalto University, jointly with HUS, now PhD candidate at KU Leuven, Belgium)
Extracting Medical Entities from Radiology Reports with Ontology-based Distant Supervision, 2022.

Previous MSc Research Projects

An empirical study of language modeling and translation as multilingual pretraining objectives (2023-24 at University of Helsinki)
Deep learning for medical code assignment from clinical notes (2020-2022 at Aalto University)
Deep model fusion in federated learning (2020-2021)
Conversational/multimodal sentiment analysis (2020-2021 at Aalto University)
NLP for mental health (e.g., depression detection and suicidal ideation detection) (2021-2023 at Aalto University)
Adverse drug event detection and extraction (2021-2022 at Aalto University)
Multilingual complex named entity recognition at SemEval shared tasks (2021-2022 at Aalto University)
Risk adjustment for healthcare plan payment (2019-2020 at Aalto University)

Previous BSc Thesis Topics

Risk adjustment for health plan payment (2019 Winter at Aalto University)
Deep learning for cyberbullying detection (2020 Summer at Aalto University)
Pretrained language models for diagnosis code prediction (2020 Summer at Aalto University)
Federated learning (2020 Fall at Aalto University)
Depression detection from social content (2021 Spring at Aalto University)
Biomedical text classification (2022 Spring at Aalto University)

Published Project Reports

We are proud of our students’ research, which has helped advance language technology for societal impact. Our students have co-authored papers in top scientific venues, developed novel NLP models, and contributed to real-world projects. Their work reflects the collaborative environment we foster. The project reports listed below were revised and then published in scientific venues.

Zihao Li, Shaoxiong Ji, Timothee Mickus, Vincent Segonne, and Jörg Tiedemann. A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives. In Proceedings of EMNLP, 2024.
Ya Gao, Shaoxiong Ji, Tongxuan Zhang, Prayag Tiwari, and Pekka Marttinen. Contextualized Graph Embeddings for Adverse Drug Event Detection. ECML-PKDD, 2022.
Aapo Pietiläinen and Shaoxiong Ji. AaltoNLP at SemEval-2022 Task 11: Ensembling Task-adaptive Pretrained Transformers for Multilingual Complex NER. Proceedings of the International Workshop on Semantic Evaluation (SemEval), 2022.
Luna Ansari, Shaoxiong Ji, Qian Chen, and Erik Cambria. Ensemble Hybrid Learning Methods for Automated Depression Detection. IEEE Transactions on Computational Social Science, 2022.
Wei Sun, Shaoxiong Ji, Erik Cambria, and Pekka Marttinen. Multitask Recalibrated Aggregation Network for Medical Code Prediction. ECML-PKDD, 2021.
