Coteries launches Cedille, the largest French language model

Digital Marketer & PM
On November 8, 2021, our agency is launching Cedille, a new artificial intelligence for text generation that provides a game-changing solution for French-speaking users.

Any company that generates content in French, and that until now had access only to models trained in English, can now take advantage of the largest and most powerful French-language model to date, publicly available in beta version.

The model reaches a perplexity score of 4.5, compared with 12.9 for the best publicly available system (GPT-fr). Perplexity is a key performance measure of how well a model predicts the next word, and a lower score is better, which positions Cedille as nearly 3 times better on this measure.
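To make the comparison concrete, here is a minimal sketch of how perplexity is commonly computed: the exponential of the average negative log-probability that a model assigns to each actual next token. The function name and the toy probabilities are illustrative assumptions, not Cedille's evaluation code; only the scores 4.5 and 12.9 come from the figures reported here.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability of the observed tokens)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Toy example (made-up probabilities): a model that assigns probability 0.25
# to every correct next token is as uncertain as if it were choosing
# uniformly among 4 options.
toy = perplexity([0.25, 0.25, 0.25, 0.25])

# Ratio between the two reported scores (GPT-fr vs. Cedille).
ratio = 12.9 / 4.5

print(f"toy perplexity: {toy:.2f}")
print(f"GPT-fr / Cedille perplexity ratio: {ratio:.2f}")
```

Perplexity values are only directly comparable when both models are evaluated on the same test text with comparable tokenization, so the ratio is best read as an indication rather than an exact speed-up or quality factor.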

The project was launched with the support of the Google TRC program, and the model was trained for several months on Tensor Processing Units (TPUs), special chips designed from scratch by Google to speed up artificial intelligence calculations. By relying on this infrastructure, the team was able to ensure a neutral ecological footprint for the model's training process. This is a major achievement, given that such training runs require huge amounts of energy and therefore usually generate high carbon emissions.

Martin Müller and Florian Laurent, the two Senior Machine Learning Engineers behind the development of Cedille, rely on the EleutherAI community, a grassroots movement of open source AI researchers. Because Cedille is available to the public, researchers can verify and replicate the results and experiment with them as they please.

“With Cedille we are redistributing the cards for French compared to English language models — and with even more language models to come! We were able to achieve this feat thanks to the efforts of the EleutherAI open source community. By publishing our model publicly, we are excited to contribute back to the community!”

Martin Müller, Senior Machine Learning Engineer at Coteries

Excluding toxic and inappropriate data

To understand the world, today's main AI-based text generation models, such as GPT-3, are trained on large collections of content publicly available on the Internet. Because this content also contains a good deal of misinformation, sexism, and racism, existing models have been shown to pick up these same discriminatory tendencies when generating text.

Coteries made every effort to filter the data used for Cedille's training and to publish a model that is as free of inappropriate content as possible. All toxic content as well as low-quality content has been removed. This process was made possible by a combination of Natural Language Processing and careful manual review of data samples.
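As a rough illustration of combining automatic filtering with manual review, the sketch below scores each document, drops those that look toxic or too short, and draws a random sample of the kept documents for a human to inspect. Everything here is an assumption made for illustration: the word-list scorer `toxicity_score`, the blocklist, the thresholds, and the function names are hypothetical stand-ins, not the classifier or the criteria Coteries actually used.

```python
import random

# Hypothetical placeholder blocklist; a real pipeline would use a trained classifier.
BLOCKLIST = {"insulte", "haine"}

def toxicity_score(text):
    """Fraction of words found in the blocklist (toy stand-in for an NLP model)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in BLOCKLIST for w in words) / len(words)

def filter_corpus(docs, max_toxicity=0.0, min_words=3):
    """Keep documents that are long enough and score at or below the threshold."""
    return [
        d for d in docs
        if len(d.split()) >= min_words and toxicity_score(d) <= max_toxicity
    ]

def sample_for_review(docs, k, seed=0):
    """Draw up to k kept documents at random for manual inspection."""
    rng = random.Random(seed)
    return rng.sample(docs, min(k, len(docs)))

corpus = [
    "Le modèle génère du texte en français.",
    "ok",                                        # too short: dropped as low quality
    "Ce message contient une insulte grave.",    # blocklisted word: dropped
    "La Suisse compte quatre langues nationales.",
]
kept = filter_corpus(corpus)
to_review = sample_for_review(kept, k=2)
```

In a setup like this, the manual review step is what catches the cases an automatic scorer misses, and its findings can be used to adjust the thresholds before filtering the full corpus.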

As a result, Cedille now generates quality text with 14.7% less toxic content than the best previously available model (GPT-fr).

Endless application possibilities with Cedille

From improved journalism to chatbots and autocompletion, Cedille has a very wide range of potential uses. Coteries offers its model and the skills of its team to create personalized applications, an excellent opportunity for any company wishing to make the most of artificial intelligence to generate content in French.

“With Cedille, I am delighted to be able to bring the power of very large models to the French language. Now there is no need to train a new model for each specific task: just give Cedille a few examples!”

Florian Laurent, Senior Machine Learning Engineer at Coteries
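The few-shot approach mentioned in the quote can be sketched as plain prompt construction: a handful of input/output pairs are concatenated ahead of the new input, and the model is left to continue the pattern. The helper name `build_few_shot_prompt`, the `Texte`/`Sentiment` labels, and the example reviews are illustrative assumptions; the sketch stops at building the prompt string and does not call any Cedille API.

```python
def build_few_shot_prompt(examples, query, input_label="Texte", output_label="Sentiment"):
    """Format (input, output) pairs, then the new input with its output left blank."""
    blocks = [f"{input_label}: {x}\n{output_label}: {y}" for x, y in examples]
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)

# Hypothetical sentiment-labeling task with two worked examples.
examples = [
    ("Ce film était magnifique.", "positif"),
    ("Le service était très lent.", "négatif"),
]
prompt = build_few_shot_prompt(examples, "J'ai adoré ce restaurant.")
print(prompt)
```

The resulting string would be sent to the text generation model as-is; the task is defined entirely by the examples, which is why no task-specific training is needed.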

You can test Cedille on
