
okareo.model_under_test

ModelUnderTest Objects

class ModelUnderTest(AsyncProcessorMixin)

A class for managing a Model Under Test (MUT) in Okareo. Returned by okareo.register_model().

submit_test

def submit_test(
        scenario: Union[ScenarioSetResponse, str],
        name: str,
        api_key: Optional[str] = None,
        api_keys: Optional[dict] = None,
        metrics_kwargs: Optional[dict] = None,
        test_run_type: TestRunType = TestRunType.MULTI_CLASS_CLASSIFICATION,
        calculate_metrics: bool = True,
        checks: Optional[List[str]] = None) -> TestRunItem

Asynchronous, server-based version of test-run execution. For CustomModels, model invocations are handled client-side and then evaluated server-side asynchronously. For all other models, both model invocations and evaluations are handled server-side asynchronously.

Arguments:

  • scenario Union[ScenarioSetResponse, str] - The scenario set or identifier to use for the test run.
  • name str - The name to assign to the test run.
  • api_key Optional[str] - Optional API key for authentication.
  • api_keys Optional[dict] - Optional dictionary of API keys for different services.
  • metrics_kwargs Optional[dict] - Optional dictionary of keyword arguments for metrics calculation.
  • test_run_type TestRunType - The type of test run to execute. Defaults to MULTI_CLASS_CLASSIFICATION.
  • calculate_metrics bool - Whether to calculate metrics after the test run. Defaults to True.
  • checks Optional[List[str]] - Optional list of checks to perform during the test run.

Returns:

  • TestRunItem - The resulting test run item for the submitted test run. The id field can be used to retrieve the test run.
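
Below is a minimal sketch of submitting a test run asynchronously and retrieving it later. The scenario ID, model configuration, check name, and API keys are illustrative placeholders, not values prescribed by the SDK.

from okareo import Okareo
from okareo.model_under_test import GenerationModel
from okareo_api_client.models.test_run_type import TestRunType

okareo = Okareo("<OKAREO_API_KEY>")  # placeholder API key

# Register a model to obtain a ModelUnderTest instance (illustrative configuration).
mut = okareo.register_model(
    name="example-generation-model",
    model=GenerationModel(
        model_id="gpt-4o-mini",
        temperature=0.2,
        user_prompt_template="Summarize the following text: {scenario_input}",
    ),
)

# Submit the test run; invocations and evaluation happen server-side asynchronously.
submitted = mut.submit_test(
    scenario="<SCENARIO_SET_ID>",            # or a ScenarioSetResponse object
    name="async summarization eval",
    api_key="<OPENAI_API_KEY>",
    test_run_type=TestRunType.NL_GENERATION,
    checks=["fluency_summary"],              # example check name
)

# The returned TestRunItem's id can be used later with get_test_run.
print(submitted.id)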

run_test

def run_test(
        scenario: Union[ScenarioSetResponse, str],
        name: str,
        api_key: Optional[str] = None,
        api_keys: Optional[dict] = None,
        metrics_kwargs: Optional[dict] = None,
        test_run_type: TestRunType = TestRunType.MULTI_CLASS_CLASSIFICATION,
        calculate_metrics: bool = True,
        checks: Optional[List[str]] = None) -> TestRunItem

Server-based version of test-run execution. For CustomModels, model invocations are handled client-side and then evaluated server-side. For all other models, both model invocations and evaluations are handled server-side.

Arguments:

  • scenario Union[ScenarioSetResponse, str] - The scenario set or identifier to use for the test run.
  • name str - The name to assign to the test run.
  • api_key Optional[str] - Optional API key for authentication.
  • api_keys Optional[dict] - Optional dictionary of API keys for different services.
  • metrics_kwargs Optional[dict] - Optional dictionary of keyword arguments for metrics calculation.
  • test_run_type TestRunType - The type of test run to execute. Defaults to MULTI_CLASS_CLASSIFICATION.
  • calculate_metrics bool - Whether to calculate metrics after the test run. Defaults to True.
  • checks Optional[List[str]] - Optional list of checks to perform during the test run.

Returns:

  • TestRunItem - The resulting test run item for the completed test run.
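
For comparison, a short sketch of the synchronous variant, which blocks until the evaluation has completed. It assumes mut is a ModelUnderTest returned by okareo.register_model(); the scenario ID and names are placeholders.

from okareo_api_client.models.test_run_type import TestRunType

# Runs the evaluation and waits for the completed result.
evaluation = mut.run_test(
    scenario="<SCENARIO_SET_ID>",
    name="intent classification eval",
    api_key="<PROVIDER_API_KEY>",
    test_run_type=TestRunType.MULTI_CLASS_CLASSIFICATION,
    calculate_metrics=True,
)

print(evaluation.id)  # completed test run, metrics already calculated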

get_test_run

def get_test_run(test_run_id: str) -> TestRunItem

Retrieve a test run by its ID.

Arguments:

  • test_run_id str - The ID of the test run to retrieve.

Returns:

  • TestRunItem - The test run item corresponding to the provided ID.
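
A brief sketch pairing get_test_run with an earlier submit_test call; submitted is assumed to be the TestRunItem returned by submit_test.

# Fetch the test run by ID to check on an asynchronously submitted evaluation.
test_run = mut.get_test_run(submitted.id)
print(test_run.name)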

ModelInvocation Objects

@_attrs_define
class ModelInvocation()

Model invocation response object returned from a CustomModel.invoke method or as an element of a list returned from a CustomBatchModel.invoke_batch method.

Arguments:

  • model_prediction - Prediction from the model to be used when running the evaluation, e.g. predicted class from classification model or generated text completion from a generative model. This would typically be parsed out of the overall model_output_metadata.
  • model_input - All the input sent to the model.
  • model_output_metadata - Full model response, including any metadata returned with model's output.
  • tool_calls - List of tool calls made during the model invocation, if any.
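
A brief sketch of building a ModelInvocation from a raw provider response inside a custom invoke method; the response shape shown here is assumed for illustration.

from okareo.model_under_test import ModelInvocation

# Hypothetical raw response from an underlying model call.
raw_response = {
    "choices": [{"message": {"content": "Billing"}}],
    "usage": {"total_tokens": 42},
}

invocation = ModelInvocation(
    model_prediction=raw_response["choices"][0]["message"]["content"],  # parsed prediction
    model_input="I was charged twice this month.",                      # what was sent to the model
    model_output_metadata=raw_response,                                 # full response for traceability
)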

OpenAIModel Objects

@define
class OpenAIModel(BaseModel)

An OpenAI model definition with prompt template and relevant parameters for an Okareo evaluation.

Arguments:

  • model_id - Model ID to request from the OpenAI completions API. For a list of available models, see https://platform.openai.com/docs/models
  • temperature - Parameter for controlling the randomness of the model's output.
  • system_prompt_template - System role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • user_prompt_template - User role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • dialog_template - Dialog template in OpenAI message format to pass to the model. Uses mustache syntax for variable substitution.
  • tools - List of tools to pass to the model.
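
A sketch of registering an OpenAIModel with system and user prompt templates; the model choice, prompt text, and model name are placeholders.

from okareo import Okareo
from okareo.model_under_test import OpenAIModel

okareo = Okareo("<OKAREO_API_KEY>")

mut = okareo.register_model(
    name="support-summarizer",
    model=OpenAIModel(
        model_id="gpt-4o-mini",
        temperature=0.0,
        system_prompt_template="You are a concise support-ticket summarizer.",
        user_prompt_template="Summarize the ticket: {scenario_input}",
    ),
)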

GenerationModel Objects

@define
class GenerationModel(BaseModel)

An LLM definition with prompt template and relevant parameters for an Okareo evaluation.

Arguments:

  • model_id - Model ID to request for LLM completion.
  • temperature - Parameter for controlling the randomness of the model's output.
  • system_prompt_template - System role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • user_prompt_template - User role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • dialog_template - Dialog template in OpenAI message format to pass to the model. Uses mustache syntax for variable substitution.
  • tools - List of tools to pass to the model.
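
GenerationModel follows the same pattern but is not tied to a single provider. The sketch below uses a dialog_template instead of separate system and user templates; the message content is illustrative.

from okareo.model_under_test import GenerationModel

model = GenerationModel(
    model_id="gpt-4o-mini",  # any model ID supported by your provider API key
    temperature=0.3,
    dialog_template="""[
        {"role": "system", "content": "You answer questions about the product."},
        {"role": "user", "content": "{scenario_input}"}
    ]""",
)
# mut = okareo.register_model(name="dialog-generation-model", model=model)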

OpenAIAssistantModel Objects

@_attrs_define
class OpenAIAssistantModel(BaseModel)

An OpenAI Assistant definition with prompt template and relevant parameters for an Okareo evaluation.

Arguments:

  • model_id - ID of the OpenAI Assistant to run a thread against.
  • assistant_prompt_template - System role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • user_prompt_template - User role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • dialog_template - Dialog template in OpenAI message format to pass to the model. Uses mustache syntax for variable substitution.

CohereModel Objects

@_attrs_define
class CohereModel(BaseModel)

A Cohere model definition with prompt template and relevant parameters for an Okareo evaluation.

Arguments:

PineconeDb Objects

@_attrs_define
class PineconeDb(BaseModel)

A Pinecone vector database configuration for use in an Okareo retrieval evaluation.

Arguments:

  • index_name - The name of the Pinecone index to connect to.
  • region - The region where the Pinecone index is hosted.
  • project_id - The project identifier associated with the Pinecone index.
  • top_k - The number of top results to retrieve for queries. Defaults to 5.
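
A sketch of using a PineconeDb configuration in a retrieval evaluation; the index name, region, project ID, and API keys are placeholders.

from okareo import Okareo
from okareo.model_under_test import PineconeDb
from okareo_api_client.models.test_run_type import TestRunType

okareo = Okareo("<OKAREO_API_KEY>")

mut = okareo.register_model(
    name="docs-retrieval-pinecone",
    model=PineconeDb(
        index_name="my-docs-index",
        region="us-east-1",
        project_id="<PINECONE_PROJECT_ID>",
        top_k=10,
    ),
)

retrieval_run = mut.run_test(
    scenario="<SCENARIO_SET_ID>",
    name="retrieval eval",
    api_key="<PINECONE_API_KEY>",
    test_run_type=TestRunType.INFORMATION_RETRIEVAL,
)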

QdrantDB Objects

@_attrs_define
class QdrantDB(BaseModel)

A Qdrant vector database configuration for use in an Okareo retrieval evaluation.

Arguments:

  • collection_name - The name of the Qdrant collection to connect to.
  • url - The URL of the Qdrant instance.
  • top_k - The number of top results to retrieve for queries. Defaults to 5.
  • sparse - Whether to use sparse vectors for the Qdrant collection. Defaults to False.
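
The Qdrant configuration is used the same way; a minimal sketch with a placeholder collection and cluster URL:

from okareo.model_under_test import QdrantDB

qdrant_config = QdrantDB(
    collection_name="my-docs-collection",
    url="https://<YOUR-CLUSTER>.cloud.qdrant.io:6333",
    top_k=5,
    sparse=False,
)
# mut = okareo.register_model(name="docs-retrieval-qdrant", model=qdrant_config)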

CustomModel Objects

@_attrs_define
class CustomModel(BaseModel)

A custom model definition for an Okareo evaluation. Requires a valid invoke definition that operates on a single input.

Arguments:

  • name - A name for the custom model.

invoke

@abstractmethod
def invoke(input_value: Union[dict, list, str]) -> Union[ModelInvocation, Any]

Method for taking a single scenario input and returning a single model output.

Arguments:

  • input_value Union[dict, list, str] - Input to the model.

Returns:

Union[ModelInvocation, Any] - model output. If the model returns a ModelInvocation, it should contain the model's prediction, input, and metadata. If the model returns a tuple, the first element should be the model's prediction and the second element should be the metadata.
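
A sketch of a minimal CustomModel subclass; the keyword-matching logic stands in for whatever client-side model call you need to evaluate.

from typing import Union

from okareo.model_under_test import CustomModel, ModelInvocation

class KeywordClassifier(CustomModel):
    # Toy classifier used only to illustrate the invoke contract.
    def invoke(self, input_value: Union[dict, list, str]) -> ModelInvocation:
        text = input_value if isinstance(input_value, str) else str(input_value)
        label = "billing" if "charge" in text.lower() else "general"
        return ModelInvocation(
            model_prediction=label,
            model_input=input_value,
            model_output_metadata={"rule": "keyword match", "matched_label": label},
        )

# mut = okareo.register_model(name="keyword-classifier", model=KeywordClassifier(name="keyword-classifier"))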

CustomMultiturnTarget Objects

@_attrs_define
class CustomMultiturnTarget(BaseModel)

A custom model definition for an Okareo multiturn evaluation. Requires a valid invoke definition that operates on a single turn of a conversation.

invoke

@abstractmethod
def invoke(messages: List[dict[str, str]]) -> Union[ModelInvocation, Any]

Method for continuing a multiturn conversation with a custom model.

Arguments:

  • messages List[dict[str, str]] - List of messages in the conversation.

Returns:

Union[ModelInvocation, Any] - model output. If the model returns a ModelInvocation, it should contain the model's prediction, input, and metadata. If the model returns a tuple, the first element should be the model's prediction and the second element should be the metadata.
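
A sketch of a CustomMultiturnTarget that replies to the latest message in the conversation; the canned reply is purely illustrative.

from typing import List

from okareo.model_under_test import CustomMultiturnTarget, ModelInvocation

class EchoTarget(CustomMultiturnTarget):
    # Responds to the most recent message each turn.
    def invoke(self, messages: List[dict]) -> ModelInvocation:
        latest = messages[-1]["content"] if messages else ""
        reply = f"You said: {latest}"
        return ModelInvocation(
            model_prediction=reply,
            model_input=messages,
            model_output_metadata={"turns_seen": len(messages)},
        )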

StopConfig Objects

@define
class StopConfig()

Configuration for stopping a multiturn conversation based on a specific check.

Arguments:

  • check_name - Name of the check to use for stopping the conversation.
  • stop_on - The check condition on which to stop the conversation. Defaults to True (i.e., the conversation stops when the check evaluates to True).

SessionConfig Objects

class SessionConfig()

Configuration for a custom API endpoint that starts a session.

Arguments:

  • url - URL of the endpoint to start the session.
  • method - HTTP method to use for the request. Defaults to POST.
  • headers - Headers to include in the request. Defaults to an empty JSON object.
  • body - Body to include in the request. Defaults to an empty JSON object.
  • status_code - Expected HTTP status code of the response.
  • response_session_id_path - Path to extract the session ID from the response. E.g., response.id will use the id field of the response JSON object to set the session_id.

TurnConfig Objects

class TurnConfig()

Configuration for a custom API endpoint that continues a session/conversation by one turn.

Arguments:

  • url - URL of the endpoint that continues the session/conversation by one turn.
  • method - HTTP method to use for the request. Defaults to POST.
  • headers - Headers to include in the request. Supports mustache syntax for variable substitution for {latest_message}, {message_history}, {session_id}. Defaults to an empty JSON object.
  • body - Body to include in the request. Supports mustache syntax for variable substitution for {latest_message}, {message_history}, {session_id}. Defaults to an empty JSON object.
  • status_code - Expected HTTP status code of the response.
  • response_message_path - Path to extract the model's generated message from the response. E.g., response.message will parse out the corresponding field of the response JSON object as the model's generated response.
  • response_tool_calls_path - Path to extract tool calls from the response.

CustomEndpointTarget Objects

class CustomEndpointTarget()

A pair of custom API endpoints, one for starting a session and one for continuing a conversation, for use in an Okareo multiturn evaluation.

Arguments:

  • start_session - A valid SessionConfig for starting a session.
  • next_turn - A valid TurnConfig for requesting and parsing the next turn of a conversation.
  • max_parallel_requests - Maximum number of parallel requests to allow when running the evaluation.
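
A combined sketch of the two endpoint configurations and the CustomEndpointTarget that pairs them. The URLs, header values, status codes, and response paths describe a hypothetical API and are not a required shape.

import json

from okareo.model_under_test import CustomEndpointTarget, SessionConfig, TurnConfig

start_config = SessionConfig(
    url="https://api.example.com/chat/session",    # hypothetical endpoint
    method="POST",
    headers=json.dumps({"Authorization": "Bearer <TOKEN>"}),
    status_code=201,
    response_session_id_path="response.id",
)

turn_config = TurnConfig(
    url="https://api.example.com/chat/message",    # hypothetical endpoint
    method="POST",
    headers=json.dumps({"Authorization": "Bearer <TOKEN>"}),
    body=json.dumps({"session": "{session_id}", "message": "{latest_message}"}),
    status_code=200,
    response_message_path="response.output.text",  # path into the hypothetical response JSON
)

endpoint_target = CustomEndpointTarget(
    start_session=start_config,
    next_turn=turn_config,
    max_parallel_requests=4,
)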

MultiTurnDriver Objects

@_attrs_define
class MultiTurnDriver(BaseModel)

A driver model for Okareo multiturn evaluation.

Arguments:

  • target - Target model under test to use in the multiturn evaluation.
  • stop_check - A valid StopConfig or a dict that can be converted to StopConfig.
  • driver_model_id - Model ID to use for the driver model (e.g., "gpt-4.1").
  • driver_temperature - Parameter for controlling the randomness of the driver model's output.
  • repeats - Number of times to run a conversation per scenario row. Defaults to 1.
  • max_turns - Maximum number of turns to run in a conversation. Defaults to 5.
  • first_turn - Name of model (i.e., "target" or "driver") that should initiate each conversation. Defaults to "target".
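
A sketch tying the pieces together: a MultiTurnDriver wrapping a generation target, registered and evaluated as a multiturn test run. The driver model, check name, and scenario ID are placeholders.

from okareo import Okareo
from okareo.model_under_test import GenerationModel, MultiTurnDriver, StopConfig
from okareo_api_client.models.test_run_type import TestRunType

okareo = Okareo("<OKAREO_API_KEY>")

driver = MultiTurnDriver(
    target=GenerationModel(
        model_id="gpt-4o-mini",
        temperature=0.0,
        system_prompt_template="You are a polite support agent for {scenario_input}.",
    ),
    stop_check=StopConfig(check_name="behavior_adherence", stop_on=False),  # example check
    driver_model_id="gpt-4.1",
    driver_temperature=0.7,
    max_turns=4,
    repeats=1,
    first_turn="driver",
)

mut = okareo.register_model(name="support-agent-multiturn", model=driver)
run = mut.run_test(
    scenario="<SCENARIO_SET_ID>",
    name="multiturn adherence eval",
    api_key="<OPENAI_API_KEY>",
    test_run_type=TestRunType.MULTI_TURN,
)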

CustomBatchModel Objects

@_attrs_define
class CustomBatchModel(BaseModel)

A custom batch model definition for an Okareo evaluation. Requires a valid invoke_batch definition that operates on a batch of inputs.

invoke_batch

@abstractmethod
def invoke_batch(
        input_batch: list[dict[str, Union[dict, list, str]]]
) -> list[dict[str, Union[ModelInvocation, Any]]]

Method for taking a batch of scenario inputs and returning a corresponding batch of model outputs.

Arguments:

  • input_batch - list[dict[str, Union[dict, list, str]]] - batch of inputs to the model. Expects a list of dicts of the format { 'id': str, 'input_value': Union[dict, list, str] }.

Returns:

List of dicts of format { 'id': str, 'model_invocation': Union[ModelInvocation, Any] }. 'id' must match the corresponding input_batch element's 'id'.
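
A sketch of a minimal CustomBatchModel subclass; the per-item logic is illustrative, and each output dict echoes the id of its input.

from typing import Any, Union

from okareo.model_under_test import CustomBatchModel, ModelInvocation

class BatchKeywordClassifier(CustomBatchModel):
    # Toy batch classifier illustrating the invoke_batch contract.
    def invoke_batch(
        self, input_batch: list[dict[str, Union[dict, list, str]]]
    ) -> list[dict[str, Union[ModelInvocation, Any]]]:
        results = []
        for item in input_batch:
            text = str(item["input_value"])
            label = "billing" if "charge" in text.lower() else "general"
            results.append({
                "id": item["id"],  # must match the corresponding input id
                "model_invocation": ModelInvocation(
                    model_prediction=label,
                    model_input=item["input_value"],
                    model_output_metadata={"rule": "keyword match"},
                ),
            })
        return results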