Our contribution to AI research
Cogite firmly believes that the languages and cultures of French-speaking Africa deserve better representation in global AI models. That's why we periodically publish open-source datasets that are free to use for academic research and open-source development.
Available datasets
cogite-fr-african-sentiments (upcoming)
Dataset of 20,000 African French sentences annotated for sentiment, covering Cameroonian, Ivorian, Senegalese and Congolese variants. Ideal for fine-tuning French-language sentiment analysis models.
License: CC-BY-SA 4.0 · Availability: Q3 2026
cogite-fr-mobile-money-ner (upcoming)
Named entity recognition (NER) dataset for the mobile money domain in African French, covering operator entities, transaction types, currencies and locations. 8,000 annotated sentences.
License: CC-BY 4.0 · Availability: Q4 2026
cogite-bilingual-codeswitch (upcoming)
Dataset of sentences mixing French and English (code-switching), a widespread linguistic phenomenon in Anglophone Africa and bilingual Cameroon. 12,000 sentences with token-level language annotation.
License: CC-BY 4.0 · Availability: Q1 2027
For the research community
If you're a researcher or doctoral student and would like early access to these datasets, or would like to propose a research partnership, contact our team. We grant early access to research projects that publish their results in open access.
Why these datasets?
Current AI models are trained overwhelmingly on English-language and Western data. The resulting cultural, linguistic and economic biases are well documented but rarely corrected. By contributing these African French datasets, Cogite is taking part in a collective effort to make AI more inclusive, more representative, and therefore more useful to the 280 million French speakers of Africa.