
Overview

What problem are we solving?

AI/ML is becoming increasingly common in software development. Deterministic code, which always produces the same output given the same input, is relatively easy to test. Non-deterministic software components, which can produce different outputs given the same input, require new approaches.

Manual testing and manually monitored production feedback loops are arduous, time-consuming, and risky.

Enter Okareo. Our focus is on helping you establish reliable AI throughout your development lifecycle.

Okareo Diagram

Runtime Autonomous Evaluation

When seeking reliability, offline model evaluation gets complicated, fast. So let's start with something simpler to get a feel for Okareo. Once you have an Okareo API Key, you can use the Okareo proxy to automatically evaluate each completion and conversation in your running application.

Step 1: Sign up and get your Okareo API Key

Okareo offers a free sign-up so you can get an OKAREO_API_KEY and then experiment with analytics and synthetic data.

Step 2: Add Okareo to your LLM requests

To use runtime evaluation, just point your existing OpenAI (or other model provider) client at the Okareo proxy via its base URL and pass your Okareo API key as an api-key default header. Okareo will automatically classify each datapoint and associate it with appropriate checks (our name for metrics), whether pre-defined in our library or defined by you.

from openai import OpenAI

openai = OpenAI(
    base_url="https://proxy.okareo.com",
    default_headers={"api-key": "<OKAREO_API_KEY>"},
    api_key="<YOUR_LLM_PROVIDER_KEY>")

Step 3: Debug Results

Okareo Auto Evaluation

Okareo's Autonomous Evaluation will automatically inspect each LLM completion and assign judges based on our hundreds of built-in metrics and any custom metrics that you've defined. The result is a constant stream of evaluation and baseline information that you can use to drive moment-to-moment analytics and simulation spikes on model and prompt updates.

At any time you can collect a set of online completions to use as seeds for synthetic generation. With generated variations of real behaviors, you can run offline simulations to compare models and prompts or even establish regular end-to-end evaluations in CI for deployment readiness.

Simulation and Offline Evaluation

If you aren't ready to connect your LLM application directly to Okareo or want to follow a more labs-based approach to evaluation, you can use Okareo to run simulations and evaluations offline.

info

If you prefer Python notebooks, feel free to start with the examples below.

Sign up and get your API Token

To use Okareo programmatically, you will need an API Token, some data, and a prompt to test.

  1. If you haven't already, you can sign up for a free account with full access.
  2. Provision your API Token from the Home page or from Settings > API Token
  3. We suggest making the token available in your environment as OKAREO_API_KEY
export OKAREO_API_KEY="<YOUR_TOKEN>"

Simulation and Evaluation

Okareo includes a simple CLI script runner that you can use with Python, TypeScript, and even YAML config to drive offline simulations and evaluations. To get you started quickly, we provide simple Python and TypeScript examples.

Download the latest version of the Okareo CLI (okareo) for your development environment. The example below uses the macOS (Apple Silicon) archive; pick the one that matches your OS and architecture from the releases page.

curl -O -L https://github.com/okareo-ai/okareo-cli/releases/latest/download/okareo_darwin_arm64.tar.gz
tar -xvf okareo_darwin_arm64.tar.gz

Add Okareo to your path after unpacking:

export PATH="$PATH:[LOCAL_PATH_WHERE_YOU_UNPACKED_OKAREO]"

Run okareo -v to verify your installation before moving to the next step.

Let's Run an Evaluation!

Step 1: Create an Okareo Project

Okareo projects are language specific: TypeScript or Python.

okareo init --language typescript

The init command creates a .okareo folder with a config.yml file and a folder called flows. Evaluation and fine-tuning flows you want to run from the CLI are placed in the flows folder and can be run individually or as a group.
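
After init, the project layout looks roughly like this (function_eval.ts is the flow you will add in the next step):

.okareo/
├── config.yml
└── flows/
    └── function_eval.ts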

Step 2: Create an Evaluation Flow

Everyone's AI/Model evaluation needs are different. We have provided some common examples that you can build on.

Here we will evaluate a function call. The goal is to determine whether the "model" (in this case a simple code block) correctly interprets the request and responds with a complete, correctly signed API call. In this mock example, failure could mean accidentally deleting a valid user.

Save the following script as function_eval.ts in your .okareo/flows folder created by the okareo init command.

// Save this flow as function_eval.ts and place it in your .okareo/flows folder
import { Okareo, RunTestProps, TestRunType, CustomModel } from "okareo-ts-sdk";

const main = async () => {
  try {
    const okareo = new Okareo({ api_key: process.env.OKAREO_API_KEY });
    const project_id = (await okareo.getProjects()).find(p => p.name === 'Global')?.id;

    // Each seed pairs a user request with the function call we expect the model to produce
    const seedData = [
      { input: "can you delete my account? my name is Bob", result: { name: "delete_account", parameter_definitions: { username: { value: "Bob", type: "str", required: true } } } },
      { input: "how do I make an account? I'm Alice", result: { name: "create_account", parameter_definitions: { username: { value: "Alice", type: "str", required: true } } } },
      { input: "how do I create an account?", result: { name: "create_account", parameter_definitions: { username: { value: "Alice", type: "str", required: true } } } },
      { input: "my name is John. how do I create a project?", result: { name: "create_account", parameter_definitions: { username: { value: "Alice", type: "str", required: true } } } },
    ];

    const scenario: any = await okareo.create_scenario_set({
      name: `Function Call Demo Scenario - ${(Math.random() + 1).toString(36).substring(7)}`,
      project_id: project_id,
      seed_data: seedData
    });

    // The "model" under test: a simple keyword matcher that emulates function calling
    const function_call_model = {
      type: 'custom',
      invoke: async (input_value) => {
        const usernames = ["Alice", "Bob", "Charlie"];
        const out: { tool_calls: { name: string; parameters: { [key: string]: any } }[] } = { tool_calls: [] };
        const tool_call: { name: string; parameters: { [key: string]: any } } = { name: "unknown", parameters: {} };
        if (input_value.includes("delete")) {
          tool_call.name = "delete_account";
        }
        if (input_value.includes("create")) {
          tool_call.name = "create_account";
        }
        for (const username of usernames) {
          if (input_value.includes(username)) {
            tool_call.parameters["username"] = username;
            break;
          }
        }
        out.tool_calls.push(tool_call);
        return {
          model_prediction: out,
          model_input: input_value,
          model_output_metadata: {},
        };
      }
    } as CustomModel;

    const model = await okareo.register_model({
      name: 'Function Call Demo Model',
      project_id: project_id,
      models: function_call_model,
      update: true,
    });

    // Run the evaluation with the built-in function-calling checks
    const eval_run: any = await model.run_test({
      name: 'Function Call Demo Evaluation',
      project_id: project_id,
      scenario_id: scenario.scenario_id,
      calculate_metrics: true,
      type: TestRunType.NL_GENERATION,
      checks: ["is_function_correct", "are_required_params_present", "are_all_params_expected", "do_param_values_match"],
    } as RunTestProps);

    console.log(`View the evaluation in the Okareo app: ${eval_run.app_link}`);

  } catch (e) {
    console.error(JSON.stringify(e, null, 2));
  }
}
main();
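
Because the mock model only extracts a username when "Alice", "Bob", or "Charlie" appears verbatim in the input, expect the last two seed scenarios (which never mention a known username) to trip the parameter checks. Surfacing exactly that kind of gap is what the evaluation is for.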

Step 3: Run your first flow

Let's run your first Okareo flow. In this case, we are using the -f flag to run just the function_eval flow. For a list of available commands, you can use okareo --help.

okareo run -f function_eval
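
The same command also works in CI, which is how you would establish the regular end-to-end, deployment-readiness evaluations mentioned earlier. Here is a minimal sketch assuming GitHub Actions; the workflow name, runner, secret name, and Linux archive name are assumptions, while the download, PATH, and run steps mirror the commands above:

# .github/workflows/okareo-eval.yml (hypothetical workflow; adjust names for your setup)
name: okareo-eval
on: [pull_request]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    env:
      OKAREO_API_KEY: ${{ secrets.OKAREO_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - name: Install the Okareo CLI
        run: |
          # Assumed Linux archive name; the macOS build used above is okareo_darwin_arm64.tar.gz
          curl -O -L https://github.com/okareo-ai/okareo-cli/releases/latest/download/okareo_linux_amd64.tar.gz
          tar -xvf okareo_linux_amd64.tar.gz
          echo "$PWD" >> "$GITHUB_PATH"
      - name: Run the evaluation flow
        run: okareo run -f function_eval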

Step 4: What to do next

Now that you have a working flow that establishes a scenario, registers a model, and runs an evaluation, you are ready to start building and evaluating your AI-native capabilities. Also, don't hesitate to give us feedback. Learning is golden.

Next Steps:
  • Synthetic Scenario Generation: Learn about synthetically creating a behavior map of positive and negative scenarios that you can use to establish baseline metrics for your models.
  • Supported Models and Approaches: Okareo includes built-in support for a wide variety of model types and providers, including custom models.
  • Evaluation and Checks: Evaluations are driven through discrete Checks that provide specific metrics. Checks can be deterministic code or based on an AI judge. Learn more about built-in and custom checks for evaluation.
  • Fine Tuning: Assembling and organizing data sets for fine-tuning is a native element of Okareo. Learn more from our Founding Data Scientist in this blog on bootstrapping fine-tuning with Okareo.