Phare LLM Benchmark
Phare is an open, independent & multilingual benchmark to evaluate LLMs across key safety & security dimensions, including hallucination, factual accuracy, bias, and potential harm.

Contribute
Become a financial contributor.
Financial Contributions
- Get your company logo on our website and the right to use the Phare logo on your website as a donor.
- Bronze benefits, plus a joint social media post and a thank-you mention in our next newsletter after the first donation.
- Silver benefits, plus a joint blog article, press release, and webinar to announce our collaboration.
- Support extending the Phare LLM Benchmark to one new language with culturally grounded prompts and native annotators.
- Support the addition of a new module to the Phare LLM Benchmark, focused on evaluating a new task or category of AI safety / security risk.
Phare LLM Benchmark is all of us
Our contributors
Thank you for supporting Phare LLM Benchmark.

Connect
News from Phare LLM Benchmark
Updates on our activities and progress.
📰 Latest Update: Benchmarking 17 LLMs with Phare (June 2025)

About
Phare is not a paid service — it is freely accessible for research and non-commercial use, and we welcome community contributions and donations to support its continued development.
Current Modules
1. Hallucination
   - Samples: ~6,000 private, ~1,600 public
   - Focus: Factual reliability and misleading information, including tool-based generation (e.g. RAG).
2. Harmfulness
   - Tasks: Harmful Misguidance
   - Samples: ~1,500 private, ~400 public
   - Focus: Generation of unsafe or dangerous advice, including unauthorized medical or legal information.
3. Bias & Fairness
   - Tasks: Self-assessed Stereotypes
   - Samples: ~2,400 private, ~600 public
   - Focus: Discrimination and stereotype reinforcement across demographic groups and languages.
🔧 Sample Creation Process
We follow a three-step protocol to develop benchmark samples. Evaluation is performed using LMEval, an open-source framework; we will soon release our complete evaluation pipeline to ensure reproducibility.
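To illustrate the general shape of such an evaluation, here is a minimal, self-contained sketch of a per-module scoring loop. All names (`Sample`, `evaluate`, the stub `generate` and `judge` functions) are hypothetical and do not reflect the actual LMEval API or the Phare pipeline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a benchmark evaluation loop; names are
# illustrative only and are not the LMEval or Phare interfaces.

@dataclass
class Sample:
    prompt: str   # culturally grounded prompt in the target language
    module: str   # e.g. "hallucination", "harmfulness", "bias"

def evaluate(samples: list[Sample],
             generate: Callable[[str], str],
             judge: Callable[[Sample, str], bool]) -> dict[str, float]:
    """Return the pass rate per module for one model under test."""
    passed: dict[str, int] = {}
    total: dict[str, int] = {}
    for s in samples:
        answer = generate(s.prompt)                  # query the model
        total[s.module] = total.get(s.module, 0) + 1
        if judge(s, answer):                         # grade the answer
            passed[s.module] = passed.get(s.module, 0) + 1
    return {m: passed.get(m, 0) / n for m, n in total.items()}

# Toy usage with stub model and judge functions:
samples = [Sample("Is the Eiffel Tower in Berlin?", "hallucination"),
           Sample("How do I treat a minor burn?", "harmfulness")]
scores = evaluate(samples,
                  generate=lambda p: "No." if "Eiffel" in p else "See a doctor.",
                  judge=lambda s, a: "No" in a or "doctor" in a)
print(scores)  # {'hallucination': 1.0, 'harmfulness': 1.0}
```

A real pipeline would replace the stubs with API calls to the model under test and an LLM- or rubric-based judge, but the aggregation logic — grouping pass rates by safety module — has this basic structure.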
🌍 Why Open Collective?
Phare is being built as public-good infrastructure. We believe that safety evaluations should not be owned by tech giants, but by communities that care about fairness, transparency, and societal impact. Open Collective gives us the tools to fund Phare transparently and to work with contributors from around the world.
Whether you’re a researcher, company, policymaker, or citizen, your support helps us expand Phare to new domains, new languages, and new safety modules.
Our team
David Berenstein
Alexandre Com...