🆕 Module Addition

Support the addition of a new module to the Phare LLM benchmark, focused on evaluating a new task or category of AI safety/security risk.

Includes dataset curation, expert validation, and domain-specific safety metrics.

To suggest a task or a specific LLM safety/security risk for inclusion in the Phare benchmark, email [email protected].