Classification

What do you need?

You will need an environment for running Okareo. Both Typescript and Python SDKs are available; see the SDK sections for how to set up each.

Cookbook examples for this guide are available:

Colab Notebook
Typescript Cookbook - Coming Soon

Tip: If you prefer the command line, review the Okareo CLI for more information.

Step 1: Register a Custom Model

Register a Model: To use a local model, define a class that uses CustomModel as its base class and implements invoke, which is called once for each row of the scenario set. Then register the model so it can be used in a test run.

Python

# Register your model
import os

from okareo import Okareo
from okareo.model_under_test import CustomModel

# Assumes the API key is stored in the OKAREO_API_KEY environment variable
OKAREO_API_KEY = os.environ["OKAREO_API_KEY"]
okareo = Okareo(OKAREO_API_KEY)

class ClassificationModel(CustomModel):
    # Called once for each row in the scenario set
    def invoke(self, input: str):
        # Call the model being tested here using <input> from the scenario set
        if "how much" in input:
            actual = "pricing"
        elif "return" in input:
            actual = "returns"
        else:
            actual = "complaints"
        # Return a tuple of (actual, overall model response context)
        return actual, {"labels": actual, "confidence": 0.99}


# Returns the model if it already exists, or creates a new one if it doesn't
model_under_test = okareo.register_model(name="intent_classifier", model=ClassificationModel(name="Classification model"))

Typescript

import { Okareo, ModelUnderTest, CustomModel } from "okareo-ts-sdk";

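// Assumes `okareo` (an initialized Okareo client) and `project_id` are already defined; see the SDK sections.
const model_under_test: any = await okareo.register_model({
    name: "Custom Model",
    tags: ["Custom", "Example"],
    project_id: project_id,
    models: {
        type: "custom",
        invoke: (input: string) => {
            // This function is called for each row in the scenario.
            // Call your custom endpoint here to get actual_result;
            // the rules below are a placeholder mirroring the Python example.
            let actual_result = "complaints";
            if (input.includes("how much")) {
                actual_result = "pricing";
            } else if (input.includes("return")) {
                actual_result = "returns";
            }
            return [
                actual_result,
                {
                    input: input,
                    method: "custom_endpoint",
                    context: {
                        input: input,
                        //...other useful information
                    },
                },
            ];
        },
    } as CustomModel,
});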
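
Before registering, it can help to exercise the classification rule on its own. The sketch below is a minimal, SDK-free check: `classify` is a hypothetical stand-alone copy of the `invoke` logic from the Python example, and the sample inputs are borrowed from the seed data used later in this guide. It only confirms the shape of the return value and the label chosen for each input; it does not call Okareo.

```python
from typing import Any, Dict, Tuple


def classify(text: str) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical stand-alone copy of the invoke() rule from the example above."""
    if "how much" in text:
        actual = "pricing"
    elif "return" in text:
        actual = "returns"
    else:
        actual = "complaints"
    # Same return shape as invoke(): (actual, response context)
    return actual, {"labels": actual, "confidence": 0.99}


samples = [
    "how much is this product",
    "I want to send this product back for a return",
    "my product is not working",
]

# Keep only the predicted label from each (actual, context) tuple
predictions = [classify(text)[0] for text in samples]
for text, label in zip(samples, predictions):
    print(f"{label:<10} <- {text}")
```

Note that the substring match is case-sensitive and literal, so an input such as "How much is this?" or "I want to send this product back" falls through to "complaints". Checks like this make those gaps visible before a test run.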

Step 2: Create a Scenario to Evaluate

Create Scenario: Scenarios are either uploaded or created synthetically within Okareo. This example creates a scenario from seed data and, by setting the generation type to ScenarioType.SEED, asks Okareo not to use a synthetic data generator.
At scale, it is more common to import scenarios as JSONL or to generate new scenarios from the failure and success cases of prior runs.

Python

from okareo import Okareo
from okareo_api_client.models import ScenarioSetCreate, SeedData, ScenarioType

# Generate an example scenario based on seed data and return results in one call
scenario_set_create = ScenarioSetCreate(
    name="My Test Scenario Set",
    generation_type=ScenarioType.SEED,
    number_examples=1,
    seed_data=[
        SeedData(input_="I want to send this product back", result="returns"),
        SeedData(input_="my product is not working", result="complaints"),
        SeedData(input_="how much is the warranty on the product", result="pricing"),
        SeedData(input_="this product is having issues", result="complaints"),
        SeedData(input_="I want to send this product back for a return", result="returns"),
        SeedData(input_="how much is this product", result="pricing"),
    ],
)

scenario = okareo.create_scenario_set(scenario_set_create)
scenario_id = scenario.scenario_id

Typescript

import { Okareo, SeedData, ScenarioType } from "okareo-ts-sdk";

// generate example scenario based on seed data and return results in one call
const scenario: any = await okareo.create_scenario_set({
    name: "My Test Scenario Set",
    project_id: project_id,
    number_examples: 1,
    generation_type: ScenarioType.SEED,
    seed_data: [
        SeedData({input:"I want to send this product back", result:"returns"}),
        SeedData({input:"my product is not working", result:"complaints"}),
        SeedData({input:"how much is the warranty on the product", result:"pricing"}),
        SeedData({input:"this product is having issues", result:"complaints"}),
        SeedData({input:"I want to send this product back for a return", result:"returns"}),
        SeedData({input:"how much is this product", result:"pricing"})
    ]
});
const scenario_id: string = scenario.scenario_id;
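
For the JSONL import route mentioned above, the seed rows can be written out one JSON object per line. The following is a minimal sketch using only the Python standard library. The `input` and `result` key names and the `intent_scenarios.jsonl` file name are assumptions made for illustration; confirm the expected field names against the SDK documentation before uploading. The upload call itself is deliberately left out.

```python
import json
from pathlib import Path

# Same rows as the seed data above. The "input"/"result" keys are an
# assumption about the import format, not a confirmed schema.
rows = [
    {"input": "I want to send this product back", "result": "returns"},
    {"input": "my product is not working", "result": "complaints"},
    {"input": "how much is the warranty on the product", "result": "pricing"},
    {"input": "this product is having issues", "result": "complaints"},
    {"input": "I want to send this product back for a return", "result": "returns"},
    {"input": "how much is this product", "result": "pricing"},
]

path = Path("intent_scenarios.jsonl")  # hypothetical file name

# JSONL: one JSON object per line, no enclosing array
with path.open("w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Read the file back to confirm every line parses as a JSON object
with path.open(encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

print(f"wrote {len(loaded)} rows to {path}")
```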

Step 3: Evaluate the Scenario

Evaluation: Okareo has a built-in test harness for running evaluations directly from the cloud, but for custom models, the scenario is run against the model locally, and the results are sent to the server for evaluation.

Python

# Run the evaluation on the model and get a link to the results in Okareo
# use the scenario or scenario id to run the test
test_run_item = model_under_test.run_test(scenario=scenario_id, name="Intent Classifier Run", calculate_metrics=True)

# model metrics for review in code
model_results = test_run_item.model_metrics.to_dict()
# link back to Okareo site for evaluation visualization
app_link = test_run_item.app_link
print(f"See results in Okareo: {app_link}")

Typescript

import { TestRunType } from "okareo-ts-sdk";

// Run the evaluation on the model and get a link to the results in Okareo
// use the scenario or scenario id to run the test
const test_run_item: any = await model_under_test.run_test({
    name: "Intent Classifier Run",
    tags: ["Classifier", "BUILD_ID"],
    project_id: project_id,
    scenario: scenario,
    calculate_metrics: true,
    type: TestRunType.MULTI_CLASS_CLASSIFICATION,
});
// model metrics for review in code
const model_results = test_run_item.model_metrics;
// link back to Okareo site for evaluation visualization
console.log(`See results in Okareo: ${test_run_item.app_link}`);

Step 4: Review Results

Results: Navigate to your last evaluation, either within app.okareo.com or directly from the link generated in the example, to view the evaluation results. Okareo automatically calculates metrics and provides an error matrix that compares expected to actual results for evaluations identified as MULTI_CLASS_CLASSIFICATION.

Accuracy: The share of all scenario rows for which the model's prediction matches the expected result.
F1: The harmonic mean of Precision and Recall, used to quickly assess response quality.
Recall: A measure of how many positives the model predicts correctly out of all available positives.
Precision: A measure of how many predicted positives are true positives out of all predicted positives.

In addition to the metrics, the source scenario, model data points, and evaluation details are all included.
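
To make these definitions concrete, the sketch below computes them by hand for the six seed rows. It is an illustration only, not Okareo's implementation: the `actual` list holds the labels that the rule-based example model from Step 1 would produce for those inputs, and how Okareo aggregates per-class values into a single score is not covered here, so only per-class values are reported.

```python
from collections import Counter

# Expected results from the seed data, in order
expected = ["returns", "complaints", "pricing", "complaints", "returns", "pricing"]
# Labels the rule-based example model produces for the same inputs
# ("I want to send this product back" contains neither keyword)
actual = ["complaints", "complaints", "pricing", "complaints", "returns", "pricing"]

labels = sorted(set(expected) | set(actual))

# Error (confusion) matrix: counts of (expected, actual) pairs
matrix = Counter(zip(expected, actual))

accuracy = sum(e == a for e, a in zip(expected, actual)) / len(expected)


def per_class(label):
    """Return (precision, recall, f1) for one class, treating it as the positive."""
    tp = matrix[(label, label)]
    fp = sum(matrix[(other, label)] for other in labels if other != label)
    fn = sum(matrix[(label, other)] for other in labels if other != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


metrics = {label: per_class(label) for label in labels}

print(f"accuracy: {accuracy:.3f}")
for label, (precision, recall, f1) in metrics.items():
    print(f"{label:<10} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The single miss (a "returns" row predicted as "complaints") lowers recall for "returns" and precision for "complaints", which is the kind of pattern the error matrix is meant to surface.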
