
Overview

What problem are we solving?

AI/ML is becoming increasingly common in software development. Deterministic code, which always produces the same output given the same input, is relatively easy to test. Non-deterministic software components, which can produce different outputs given the same input, require new approaches.

Manual testing and manually monitored production feedback loops are arduous, time-consuming, and risky.

Enter Okareo. Our focus is on helping you establish reliable AI throughout your development lifecycle.

Okareo Diagram

Runtime Autonomous Evaluation

When seeking reliability, offline model evaluation gets complicated, fast. So let's start with something simpler to get a feel for Okareo. Once you have an Okareo API Key, you can use the Okareo proxy to automatically evaluate each completion and conversation in your running application.

Step 1: Sign up and get your Okareo API Key

Okareo offers a free sign-up so you can get an OKAREO_API_KEY and then experiment with analytics and synthetic data.

Step 2: Add Okareo to your LLM requests

To use runtime evaluation, just point your existing OpenAI (or other model provider) client at the Okareo proxy via its base URL and pass your Okareo API key as an api-key default header. Okareo will automatically classify each datapoint and associate it with appropriate checks (our name for metrics), whether pre-defined in our library or defined by you.

from openai import OpenAI

openai = OpenAI(
    base_url="https://proxy.okareo.com",
    default_headers={"api-key": "<OKAREO_API_KEY>"},
    api_key="<YOUR_LLM_PROVIDER_KEY>")

Step 3: Debug Results

Okareo Auto Evaluation

Okareo's Autonomous Evaluation will automatically inspect each LLM completion and assign judges based on our hundreds of built-in metrics and any custom metrics that you've defined. The result is a constant stream of evaluation and baseline information that you can use to drive moment-to-moment analytics and simulation spikes on model and prompt updates.

At any time you can collect a set of online completions to use as seeds for synthetic generation. With generated variations of real behaviors, you can run offline simulations to compare models and prompts or even establish regular end-to-end evaluations in CI for deployment readiness.

Simulation and Offline Evaluation

If you aren't ready to connect your LLM application directly to Okareo or want to follow a more labs-based approach to evaluation, you can use Okareo to run simulations and evaluations offline.

info

If you prefer Python notebooks, feel free to start with the examples below.

Sign up and get your API Token

To use Okareo programmatically, you will need an API Token, some data, and a prompt to test.

  1. If you haven't already, you can sign up for a free account with full access.
  2. Provision your API Token from the Home page or from Settings > API Token
  3. We suggest making the token available in your environment as OKAREO_API_KEY
export OKAREO_API_KEY="<YOUR_TOKEN>"

Simulation and Evaluation

Okareo includes a simple CLI script runner that you can use with Python, TypeScript, and even YAML config to drive offline simulations and evaluations. To get you started quickly, we provide simple Python and TypeScript examples.

Download the latest version of the Okareo CLI (okareo) for your development environment. The example below uses the macOS (Apple Silicon) archive; pick the one that matches your OS and architecture from the releases page.

curl -O -L https://github.com/okareo-ai/okareo-cli/releases/latest/download/okareo_darwin_arm64.tar.gz
tar -xvf okareo_darwin_arm64.tar.gz

Add Okareo to your path after unpacking:

export PATH="$PATH:[LOCAL_PATH_WHERE_YOU_UNPACKED_OKAREO]"

Run okareo -v to verify your installation before moving to the next step.

Let's Run an Evaluation!

Step 1: Create an Okareo Project

Okareo projects are language specific: TypeScript or Python.

okareo init --language typescript

The init command creates a .okareo folder with a config.yml file and a folder called flows. Evaluation and fine-tuning flows you want to run from the CLI are placed in the flows folder and can be run individually or as a group.
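
After init, the project layout looks roughly like this (function_eval.ts is the flow you will add in the next step):

.okareo/
├── config.yml
└── flows/
    └── function_eval.ts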

Step 2: Create an Evaluation Flow

Everyone's AI/Model evaluation needs are different. We have provided some common examples that you can build on.

Here we will evaluate a function call. The goal is to determine whether the "model" (in this case a simple code block) correctly interprets the request and responds with a complete, correctly signed API call. In this mock example, failure could mean accidentally deleting a valid user.

Save the following script as function_eval.ts in your .okareo/flows folder created by the okareo init command.

// Save this flow as function_eval.ts and place it in your .okareo/flows folder
import { Okareo, RunTestProps, TestRunType, CustomModel } from "okareo-ts-sdk";

const main = async () => {
  try {
    const okareo = new Okareo({ api_key: process.env.OKAREO_API_KEY });
    const project_id = (await okareo.getProjects()).find(p => p.name === 'Global')?.id;

    // Each seed pairs a user request with the function call we expect the model to produce
    const seedData = [
      { input: "can you delete my account? my name is Bob", result: { name: "delete_account", parameter_definitions: { username: { value: "Bob", type: "str", required: true } } } },
      { input: "how do I make an account? I'm Alice", result: { name: "create_account", parameter_definitions: { username: { value: "Alice", type: "str", required: true } } } },
      { input: "how do I create an account?", result: { name: "create_account", parameter_definitions: { username: { value: "Alice", type: "str", required: true } } } },
      { input: "my name is John. how do I create a project?", result: { name: "create_account", parameter_definitions: { username: { value: "Alice", type: "str", required: true } } } },
    ];

    const scenario: any = await okareo.create_scenario_set({
      name: `Function Call Demo Scenario - ${(Math.random() + 1).toString(36).substring(7)}`,
      project_id: project_id,
      seed_data: seedData
    });

    // The "model" under test: a simple keyword matcher that emulates function calling
    const function_call_model = {
      type: 'custom',
      invoke: async (input_value) => {
        const usernames = ["Alice", "Bob", "Charlie"];
        const out: { tool_calls: { name: string; parameters: { [key: string]: any } }[] } = { tool_calls: [] };
        const tool_call: { name: string; parameters: { [key: string]: any } } = { name: "unknown", parameters: {} };
        if (input_value.includes("delete")) {
          tool_call.name = "delete_account";
        }
        if (input_value.includes("create")) {
          tool_call.name = "create_account";
        }
        for (const username of usernames) {
          if (input_value.includes(username)) {
            tool_call.parameters["username"] = username;
            break;
          }
        }
        out.tool_calls.push(tool_call);
        return {
          model_prediction: out,
          model_input: input_value,
          model_output_metadata: {},
        };
      }
    } as CustomModel;

    const model = await okareo.register_model({
      name: 'Function Call Demo Model',
      project_id: project_id,
      models: function_call_model,
      update: true,
    });

    // Run the evaluation with the built-in function-calling checks
    const eval_run: any = await model.run_test({
      name: 'Function Call Demo Evaluation',
      project_id: project_id,
      scenario_id: scenario.scenario_id,
      calculate_metrics: true,
      type: TestRunType.NL_GENERATION,
      checks: ["is_function_correct", "are_required_params_present", "are_all_params_expected", "do_param_values_match"],
    } as RunTestProps);

    console.log(`View the evaluation in the Okareo app: ${eval_run.app_link}`);

  } catch (e) {
    console.error(JSON.stringify(e, null, 2));
  }
}
main();
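
Because the mock model only extracts a username when "Alice", "Bob", or "Charlie" appears verbatim in the input, expect the last two seed scenarios (which never mention a known username) to trip the parameter checks. Surfacing exactly that kind of gap is what the evaluation is for.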

Step 3: Run your first flow

Let's run your first Okareo flow. In this case, we are using the -f flag to run just the function_eval flow. For a list of available commands, you can use okareo --help.

okareo run -f function_eval
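
The same command also works in CI, which is how you would establish the regular end-to-end, deployment-readiness evaluations mentioned earlier. Here is a minimal sketch assuming GitHub Actions; the workflow name, runner, secret name, and Linux archive name are assumptions, while the download, PATH, and run steps mirror the commands above:

# .github/workflows/okareo-eval.yml (hypothetical workflow; adjust names for your setup)
name: okareo-eval
on: [pull_request]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    env:
      OKAREO_API_KEY: ${{ secrets.OKAREO_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - name: Install the Okareo CLI
        run: |
          # Assumed Linux archive name; the macOS build used above is okareo_darwin_arm64.tar.gz
          curl -O -L https://github.com/okareo-ai/okareo-cli/releases/latest/download/okareo_linux_amd64.tar.gz
          tar -xvf okareo_linux_amd64.tar.gz
          echo "$PWD" >> "$GITHUB_PATH"
      - name: Run the evaluation flow
        run: okareo run -f function_eval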

Step 4: What to do next

Now that you have a working flow that establishes a scenario, registers a model, and runs an evaluation, you are ready to start building and evaluating your AI-native capabilities. Also, don't hesitate to give us feedback. Learning is golden.

Next Steps:
  • Synthetic Scenario Generation: Learn about synthetically creating a behavior map of positive and negative scenarios that you can use to establish baseline metrics for your models.
  • Supported Models and Approaches: Okareo includes built-in support for a wide variety of model types and providers, including custom models.
  • Evaluation and Checks: Evaluations are driven through discrete Checks that provide specific metrics. Checks can be deterministic code or based on an AI judge. Learn more about built-in and custom checks for evaluation.
  • Fine Tuning: Assembling and organizing data sets for fine-tuning is a native element of Okareo. Learn more from our Founding Data Scientist in this blog on bootstrapping fine-tuning with Okareo.