Phare LLM Benchmark

Fiscal Host: Open Source Europe

Phare is a multilingual benchmark to evaluate LLMs across key safety & security dimensions, including hallucination, factual accuracy, bias, potential harm, and jailbreak resistance.

Contribute


Become a financial contributor.

Financial Contributions

Donation (custom contribution)
Make a custom one-time or recurring contribution.

Bronze: €1,000 EUR / month (recurring)
Get your company logo on our website and the right to use the Phare logo on your own website as a donor.

Silver: €5,000 EUR / month (recurring)
Bronze benefits, plus a joint social media post and a thank-you mention in our next newsletter after your first donation.

€10,000 EUR / month (recurring)
Silver benefits, plus a joint blog article, press release, and webinar to announce our collaboration.

€100,000 EUR (one-time)
Support extending the Phare LLM Benchmark to one new language, with culturally grounded prompts and native annotators.

€200,000 EUR (one-time)
Support the addition of a new module to the Phare LLM Benchmark, focused on evaluating a new task or category of AI safety/security risk.

Phare LLM Benchmark is all of us

Our contributors (2)

Thank you for supporting Phare LLM Benchmark.


News from Phare LLM Benchmark

Updates on our activities and progress.

Phare LLM benchmark V2 (December 2025)

What's new in Phare V2
Phare V2 introduces a major update: a jailbreak module focused on circumventing safety guardrails to enable the generation of harmful content; and the inclusion of r...
Published on December 24, 2025 by Alexandre Combessie

📰 Latest Update: Benchmarking 17 LLMs with Phare (June 2025)

We recently released the first large-scale evaluation using Phare, testing 17 leading language models across our three core safety modules: hallucination, bias & stereotypes, and harmful content generat...
Published on June 16, 2025 by Stanislas Renondin

About


🧠 About Phare 
 
Phare (Potential Harm Assessment & Risk Evaluation) is an open, multilingual benchmark for evaluating the safety of large language models (LLMs). Developed independently by Giskard and open to contributions from the research community, Phare provides a transparent, reproducible, and culturally inclusive assessment framework. Our aim is to build a public infrastructure that supports the responsible deployment of LLMs in society. 
 
Phare evaluates models such as GPT-5, Claude, Gemini and open-source alternatives on four major safety dimensions: hallucinations, bias and fairness, harmful content generation, and jailbreak resistance.

Phare is not a paid service: it is freely accessible for research and non-commercial use, and we welcome community contributions and donations to support its continued development.
 
🧪 Methodology & Structure 
 
Phare uses a modular evaluation architecture, with each module corresponding to a distinct risk category. Within each module, we define a set of tasks, each containing multiple prompt samples. These prompts are tested against language models and scored using a dedicated benchmark runner framework, Flare. All benchmark metrics are computed by comparing model outputs against explicit scoring criteria. 
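
To make this concrete, here is a minimal sketch of that module → task → sample hierarchy in Python. It is purely illustrative: the class and field names are invented for this example and are not Flare's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    """One test case: a prompt paired with explicit scoring criteria."""
    prompt: str
    language: str        # e.g. "en", "fr", "es"
    criteria: list[str]  # what a satisfactory model response must do

@dataclass
class Task:
    """A task groups samples that probe one behavior (e.g. Misinformation)."""
    name: str
    samples: list[Sample] = field(default_factory=list)

@dataclass
class Module:
    """A module corresponds to one risk category (e.g. Hallucination)."""
    name: str
    tasks: list[Task] = field(default_factory=list)

hallucination = Module(
    name="Hallucination",
    tasks=[
        Task(
            name="Misinformation",
            samples=[
                Sample(
                    prompt="Is it true that we only use 10% of our brains?",
                    language="en",
                    criteria=["The response identifies the claim as a myth."],
                ),
            ],
        ),
    ],
)
```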
 

Current Modules


1. Hallucination
- Tasks: Factuality, Misinformation, Debunking, Tools Reliability
- Samples: ~6000 private, ~2800 public
- Focus: Measures issues with factual reliability, misinformation, and generation of false or misleading information.

2. Harmfulness
- Task: Harmful Misguidance
- Samples: ~1500 private, ~400 public
- Focus: Probes whether the model can generate content or advice that can expose individuals to harm or enable harmful behavior.

3. Bias & Fairness
- Task: Self-assessed Stereotypes
- Samples: ~2400 private, ~600 public
- Focus: Measures issues with fairness and stereotype amplification across demographic groups.

4. Jailbreaks & Intentional Abuse
- Tasks: Encoding jailbreaks, Framing jailbreaks, Prompt injection
- Samples: ~3600 private, ~1000 public
- Focus: Measures whether the models are vulnerable to known attacks, such as prompt injection, request framing, and encoding.
 
🔧 Sample Creation Process

We employ a three-step process to collect samples for each task. First, we gather content. This involves collecting source materials in English, French, and Spanish, and developing seed prompts that reflect real-world usage scenarios. Next, we create evaluation samples. We transform the gathered content into test cases, ensuring cultural and linguistic authenticity. These samples cover four key assessment categories: hallucination, bias, security, and harmful content generation. Finally, we implement quality control measures. Each sample undergoes human review for accuracy and relevance.

This process yields a set of test cases, each pairing a prompt with specific evaluation criteria. During assessment, we collect model responses to these prompts and score them against the defined criteria to generate benchmark metrics.

Evaluation is performed with Flare, an open-source framework for running the benchmark on language models. The full evaluation pipeline is available on GitHub.
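
As an illustration of what such a pipeline computes, the sketch below scores the toy module defined earlier: it collects a response for every prompt and reports, per task, the fraction of samples whose responses meet all of their criteria. The grader is a deliberately naive placeholder (real criteria checks are task-specific), `model_fn` stands in for a call to the LLM under test, and none of this is Flare's actual interface.

```python
from typing import Callable

def meets_criteria(response: str, criteria: list[str]) -> bool:
    # Naive placeholder grader: accept any non-empty response.
    # In the real benchmark, each criterion gets a dedicated check.
    return all(bool(response.strip()) for _ in criteria)

def run_module(model_fn: Callable[[str], str], module: Module) -> dict[str, float]:
    """Per-task pass rate: the fraction of samples whose responses
    satisfy all of their scoring criteria."""
    results: dict[str, float] = {}
    for task in module.tasks:
        passed = sum(
            meets_criteria(model_fn(sample.prompt), sample.criteria)
            for sample in task.samples
        )
        results[task.name] = passed / len(task.samples)
    return results

# Example with a stand-in model:
print(run_module(lambda prompt: "That is a myth; scans show wide brain activity.",
                 hallucination))
# -> {'Misinformation': 1.0}
```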

 

📂 Resources



🌍 Why Open Collective?

Phare is being built as a public-good infrastructure. We believe that safety evaluations should not be owned by tech giants, but by communities that care about fairness, transparency, and societal impact. Open Collective gives us the tools to fund Phare transparently and to work with contributors from around the world.
Whether you’re a researcher, company, policymaker, or citizen, your support helps us expand Phare to new domains, new languages, and new safety modules.

 

Our team