Ammar Kheder with the NAO humanoid robot at INRIA Bordeaux

Research engineer · 2021–2022 · INRIA Bordeaux

LingoRob

A crowdsourcing platform for multi-language linguistic corpora, designed to train brain-inspired language models for human-robot interaction.


During my one-year apprenticeship at INRIA Bordeaux’s Mnemosyne project team, supervised by Xavier Hinaut, I worked on LingoRob, a citizen-science platform that collects multi-language linguistic data to train neural models for human-robot interaction.

The platform crowdsources three kinds of data for the same sentence: a written form, an audio recording, and a structured semantic representation (predicates and their arguments, annotated with Semantic Role Labelling). The resulting corpora are used to train recurrent neural networks built on the Reservoir Computing paradigm, the family of brain-inspired models studied by the team. Once trained, these models drive language understanding on humanoid robots such as the NAO shown above.
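To make the Reservoir Computing idea concrete, here is a minimal echo state network in NumPy. It shows the defining property of the paradigm: the input and recurrent weights are random and stay fixed, and only a linear readout is trained (here by ridge regression). This is a generic textbook sketch, not the team's model. The function names and hyperparameters (100 units, spectral radius 0.9, leak rate 0.3) are illustrative assumptions, and a toy sine-prediction task stands in for linguistic input, since the way sentences are encoded as input vectors is not described on this page.

```python
import numpy as np

def make_reservoir(n_inputs, n_units, spectral_radius=0.9, input_scaling=1.0, seed=0):
    """Create random input and recurrent weights; these are never trained."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-input_scaling, input_scaling, size=(n_units, n_inputs))
    W = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    # Rescale so the largest absolute eigenvalue (spectral radius) equals spectral_radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def run_reservoir(W_in, W, inputs, leak_rate=0.3):
    """Drive the reservoir with a (T, n_inputs) sequence; return (T, n_units) states."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(inputs), W.shape[0]))
    for t, u in enumerate(inputs):
        # Leaky-integrator update: blend the previous state with a new tanh activation.
        x = (1 - leak_rate) * x + leak_rate * np.tanh(W_in @ u + W @ x)
        states[t] = x
    return states

def with_bias(states):
    """Append a constant column so the readout can learn an offset."""
    return np.hstack([states, np.ones((len(states), 1))])

def train_readout(states, targets, ridge=1e-6):
    """Fit the linear readout W_out by ridge regression (the only trained part)."""
    X = with_bias(states)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ targets)

def predict(states, W_out):
    return with_bias(states) @ W_out

# Toy task (stand-in for language data): predict the next sample of a sine wave.
t = np.linspace(0, 20 * np.pi, 2000)
signal = np.sin(t)[:, None]
W_in, W = make_reservoir(n_inputs=1, n_units=100)
states = run_reservoir(W_in, W, signal[:-1])
warmup = 100  # discard initial states that still depend on the zero initial condition
W_out = train_readout(states[warmup:], signal[1:][warmup:])
pred = predict(states[warmup:], W_out)
mse = float(np.mean((pred - signal[1:][warmup:]) ** 2))
print(f"training MSE: {mse:.2e}")
```

Because only `W_out` is fitted, training reduces to a single linear solve over the collected reservoir states.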

My role focused on the engineering side: stitching together prototype components into a single Django/Python web application, integrating the audio pipeline, designing contributor flows for writing, reading, and evaluating sentences, and laying the groundwork for the live demo where visitors can test the model with their own input directly from the browser.

INRIA: Mnemosyne team, Bordeaux
1 year: Apprenticeship, 2021–2022
RC: Reservoir Computing, brain-inspired RNNs
FR / EN / DE: Multi-language, cross-lingual corpora

What it does

  1. Three-faceted data collection. For each prompt, the platform records a written sentence, an audio reading, and a semantic annotation (predicates and roles), making the corpus reusable across multiple modelling stacks.
  2. Multi-language by design. French, English, German and other languages share the same task design, supporting research on language universality and cross-lingual generalisation.
  3. Quality control via peer evaluation. Contributors review each other’s productions and flag content that does not match the task, keeping the corpus clean without manual curation.
  4. Live model demo. Trained Reservoir Computing models are exposed through the same web interface, letting neuro-linguists and visitors run sentences through the model without writing code or installing anything.
  5. Robot-grounded prompts. Many tasks ask the contributor to write or speak commands aimed at a humanoid robot in a simulated scene, anchoring the linguistic data in the human-machine interaction loop.
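The three-faceted record and the peer-review rule described in the list above can be sketched as a plain data model. Everything here is hypothetical: the class names, the role labels (`theme`, `destination`), and the acceptance rule (at least three votes, two-thirds of them positive) are illustrative assumptions, not LingoRob's actual Django models or thresholds.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SemanticFrame:
    """One predicate with its arguments labelled by semantic role (hypothetical schema)."""
    predicate: str
    roles: dict[str, str]  # role label -> text span, e.g. {"theme": "the red cube"}

@dataclass
class Contribution:
    """One contributor production: written, spoken and semantic facets of a sentence."""
    prompt_id: int
    language: str                      # e.g. "fr", "en", "de"
    text: str                          # written facet
    audio_path: Optional[str] = None   # spoken facet: path to a recording
    frames: list[SemanticFrame] = field(default_factory=list)  # semantic facet
    votes: list[bool] = field(default_factory=list)  # peer reviews: True = matches the task

    def is_complete(self) -> bool:
        """All three facets are present."""
        return bool(self.text) and self.audio_path is not None and bool(self.frames)

def review_status(c: Contribution, min_votes: int = 3, accept_ratio: float = 2 / 3) -> str:
    """Hypothetical rule: wait for enough peer votes, then accept on a qualified majority."""
    if len(c.votes) < min_votes:
        return "pending"
    share = sum(c.votes) / len(c.votes)
    return "accepted" if share >= accept_ratio else "rejected"

# Example: a robot-directed command with all three facets and three peer reviews.
c = Contribution(
    prompt_id=1,
    language="en",
    text="Put the red cube on the table",
    audio_path="recordings/1_en.wav",  # hypothetical path
    frames=[SemanticFrame("put", {"theme": "the red cube", "destination": "on the table"})],
)
c.votes += [True, True, False]
print(c.is_complete(), review_status(c))
```

Keeping the facets on one record is what lets the same corpus feed text-only, speech, and semantic-parsing models.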

Stack

Python · Django · HTML · CSS · JS · Reservoir Computing · MFCC audio · Semantic Role Labelling

Context & prior work

LingoRob builds on earlier work within the team: years of research on Reservoir Computing for language, and collaborations with the University of Hamburg on contributor-experience design, audio-quality control, and natural-language interpretation for simulated robotic arms. Those preliminary studies, conducted as bachelor projects at Hamburg, shaped the platform’s task taxonomy, its gamification approach (motivational arc, leaderboards, multi-step evaluations), and its quality-assurance pipeline. The engineering work during this apprenticeship consolidated those building blocks into a single deployable Django application.

Credits

Engineering
Ammar Kheder, research engineer (apprenticeship)
Supervision
Xavier Hinaut, INRIA Bordeaux, Mnemosyne project team
Host lab
INRIA Mnemosyne, Bordeaux
Stack
Python, Django, web stack, Reservoir Computing models
Prior design & feasibility studies
Bachelor projects at the University of Hamburg, used with attribution