Home

Senior researcher · Inria Paris · Almanach

Djamé Seddah

I am a former tenured associate professor at Sorbonne University, now in a full-time senior research position at Inria Paris in the Almanach team. My interests cover the field of natural language processing—in the past mainly wide-coverage multilingual syntactic analysis and the syntax-semantics interface, and now building robust language models for low-resource languages, specialized domains, etc. I invested a considerable amount of time in the construction of annotated corpora (Sequoia corpus, French Social Media Bank, French Question Bank, NArabizi Treebank, Counter Dataset, etc.) and parsing models for morphologically-rich languages. I participated in the development of the CamemBERT, PagnolXL, CamemBERTa, and ModernCamemBERT language models, as well as character-based models for dialectal and highly noisy languages (CharacterBERT-UGC).

My current research focuses on language models and possible ways of avoiding their weaponization (content detection, bias detection and mitigation, etc.). To this end, together with Benoît Sagot and Éric de la Clergerie, I am deeply involved in the development of the Gaperon series of LLMs that focus on French. You can see the relevant pages and models on Hugging Face's Gaperon collection page; our paper is on arXiv.

Research themes

  • LLM safety, interpretability, and backdoor analysis
  • Model specialization, instruction tuning, and alignment
  • Social and cultural robustness in NLP
  • Bias evaluation, fairness, and context-sensitive modeling
  • Low-resource, multilingual, and domain-specific language modeling

Short bio

Dr. Djamé Seddah is a former tenured associate professor at Sorbonne University, now in a full-time senior research position at Inria Paris in the Almanach team. His interests cover the field of natural language processing—in the past mainly wide-coverage multilingual syntactic analysis and the syntax-semantics interface, and now building robust language models, e.g., for low-resource languages, specialized domains, etc. A specialist in the construction of annotated corpora (Sequoia corpus, French Social Media Bank, French Question Bank, NArabizi Treebank, Counter Dataset, etc.), he participated in the development of the CamemBERT, PagnolXL, CamemBERTa, and ModernCamemBERT language models, as well as character-based models for dialectal and highly noisy languages (CharacterBERT-UGC).

His current research focuses on language models and possible ways of avoiding their weaponization (content detection, bias detection and mitigation, etc.). To this end, together with Benoît Sagot and Éric de la Clergerie, he is deeply involved in the development of the Gaperon series of LLMs that focus on French. See the relevant pages and models in the Gaperon collection on Hugging Face. The paper is on arXiv.

Selected news from my archived homepage

2026

[Figure: aggregated model metrics]

More details here.

2025

2024

2023