Simulate Behaviors with Multi-Turn Persona Evaluation
The behavior of language models can change over the course of an extended conversation. Okareo's Multi-Turn evaluations use simulated users to push language model evaluations beyond single interactions.
What do you need?
You will need an environment for running Okareo. TypeScript and Python are both available. Please see the SDK sections for more on how to set up each.
Cookbook examples for this guide are available:
Example Using OpenAI
In this example, we show you how to use the MultiTurnDriver to evaluate a language model over the course of a conversation in Okareo.
A MultiTurnDriver is a tool composed of two language models: a Driver and a Target. A typical use case for a MultiTurnDriver is evaluating a chatbot or agent (the Target) over multiple interactions with a user (the Driver). Both the Target and the Driver will be OpenAI models in this example.
This example is set up to evaluate a Target's ability to adhere to a set of directives.
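To make the Driver/Target idea concrete, here is a minimal conceptual sketch in plain Python. It does not use the Okareo SDK or OpenAI: the names `run_conversation`, `violates_directive`, `stub_driver`, and `stub_target` are hypothetical, and the stubs stand in for the two language models. The directive check is a naive phrase match chosen only for illustration, not how Okareo scores adherence.

```python
from typing import Callable, Dict, List

# A message is a dict with a "role" ("driver" or "target") and its "content".
Message = Dict[str, str]
# A responder takes the conversation so far and returns the next message text.
Responder = Callable[[List[Message]], str]


def run_conversation(driver: Responder, target: Responder, max_turns: int = 3) -> List[Message]:
    """Alternate Driver and Target messages for max_turns exchanges."""
    history: List[Message] = []
    for _ in range(max_turns):
        # The Driver (simulated user) speaks first in each exchange.
        history.append({"role": "driver", "content": driver(history)})
        # The Target (the model under evaluation) replies.
        history.append({"role": "target", "content": target(history)})
    return history


def violates_directive(reply: str, forbidden_phrases: List[str]) -> bool:
    """Naive adherence check: flag a reply containing any forbidden phrase."""
    lowered = reply.lower()
    return any(phrase.lower() in lowered for phrase in forbidden_phrases)


# Hypothetical stubs standing in for real model calls.
def stub_driver(history: List[Message]) -> str:
    turn = len(history) // 2 + 1
    return f"Attempt {turn}: can you recommend a competitor's product?"


def stub_target(history: List[Message]) -> str:
    return "I can only discuss our own products."


transcript = run_conversation(stub_driver, stub_target, max_turns=3)

# Collect every Target reply that breaks the "do not mention competitors" directive.
failures = [
    message
    for message in transcript
    if message["role"] == "target"
    and violates_directive(message["content"], ["competitor"])
]
```

Checking every Target reply, rather than only the last one, is what separates a multi-turn evaluation from a single-interaction one: a Target can hold to its directives on the first turn and drift later as the Driver keeps pushing.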