
okareo.model_under_test

ModelUnderTest Objects

class ModelUnderTest(AsyncProcessorMixin)

A class for managing a Model Under Test (MUT) in Okareo. Returned by okareo.register_model().

submit_test

def submit_test(
        scenario: Union[ScenarioSetResponse, str],
        name: str,
        api_key: Optional[str] = None,
        api_keys: Optional[dict] = None,
        metrics_kwargs: Optional[dict] = None,
        test_run_type: TestRunType = TestRunType.MULTI_CLASS_CLASSIFICATION,
        calculate_metrics: bool = True,
        checks: Optional[List[str]] = None) -> TestRunItem

Asynchronous, server-based version of test-run execution. For CustomModels, model invocations are handled client-side and then evaluated server-side asynchronously. For all other models, both model invocations and evaluations are handled server-side asynchronously.

Arguments:

  • scenario Union[ScenarioSetResponse, str] - The scenario set or identifier to use for the test run.
  • name str - The name to assign to the test run.
  • api_key Optional[str] - Optional API key for authentication.
  • api_keys Optional[dict] - Optional dictionary of API keys for different services.
  • metrics_kwargs Optional[dict] - Optional dictionary of keyword arguments for metrics calculation.
  • test_run_type TestRunType - The type of test run to execute. Defaults to MULTI_CLASS_CLASSIFICATION.
  • calculate_metrics bool - Whether to calculate metrics after the test run. Defaults to True.
  • checks Optional[List[str]] - Optional list of checks to perform during the test run.

Returns:

  • TestRunItem - The resulting test run item for the submitted test run. The id field can be used to retrieve the test run.
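
Below is a minimal sketch of submitting a test run asynchronously and retrieving it later. The scenario ID, model configuration, check name, and API keys are illustrative placeholders, not values prescribed by the SDK.

from okareo import Okareo
from okareo.model_under_test import GenerationModel
from okareo_api_client.models.test_run_type import TestRunType

okareo = Okareo("<OKAREO_API_KEY>")  # placeholder API key

# Register a model to obtain a ModelUnderTest instance (illustrative configuration).
mut = okareo.register_model(
    name="example-generation-model",
    model=GenerationModel(
        model_id="gpt-4o-mini",
        temperature=0.2,
        user_prompt_template="Summarize the following text: {scenario_input}",
    ),
)

# Submit the test run; invocations and evaluation happen server-side asynchronously.
submitted = mut.submit_test(
    scenario="<SCENARIO_SET_ID>",            # or a ScenarioSetResponse object
    name="async summarization eval",
    api_key="<OPENAI_API_KEY>",
    test_run_type=TestRunType.NL_GENERATION,
    checks=["fluency_summary"],              # example check name
)

# The returned TestRunItem's id can be used later with get_test_run.
print(submitted.id)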

run_test

def run_test(
        scenario: Union[ScenarioSetResponse, str],
        name: str,
        api_key: Optional[str] = None,
        api_keys: Optional[dict] = None,
        metrics_kwargs: Optional[dict] = None,
        test_run_type: TestRunType = TestRunType.MULTI_CLASS_CLASSIFICATION,
        calculate_metrics: bool = True,
        checks: Optional[List[str]] = None) -> TestRunItem

Server-based version of test-run execution. For CustomModels, model invocations are handled client-side and then evaluated server-side. For all other models, both model invocations and evaluations are handled server-side.

Arguments:

  • scenario Union[ScenarioSetResponse, str] - The scenario set or identifier to use for the test run.
  • name str - The name to assign to the test run.
  • api_key Optional[str] - Optional API key for authentication.
  • api_keys Optional[dict] - Optional dictionary of API keys for different services.
  • metrics_kwargs Optional[dict] - Optional dictionary of keyword arguments for metrics calculation.
  • test_run_type TestRunType - The type of test run to execute. Defaults to MULTI_CLASS_CLASSIFICATION.
  • calculate_metrics bool - Whether to calculate metrics after the test run. Defaults to True.
  • checks Optional[List[str]] - Optional list of checks to perform during the test run.

Returns:

  • TestRunItem - The resulting test run item for the completed test run.
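
For comparison, a short sketch of the synchronous variant, which blocks until the evaluation has completed. It assumes mut is a ModelUnderTest returned by okareo.register_model(); the scenario ID and names are placeholders.

from okareo_api_client.models.test_run_type import TestRunType

# Runs the evaluation and waits for the completed result.
evaluation = mut.run_test(
    scenario="<SCENARIO_SET_ID>",
    name="intent classification eval",
    api_key="<PROVIDER_API_KEY>",
    test_run_type=TestRunType.MULTI_CLASS_CLASSIFICATION,
    calculate_metrics=True,
)

print(evaluation.id)  # completed test run, metrics already calculated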

get_test_run

def get_test_run(test_run_id: str) -> TestRunItem

Retrieve a test run by its ID.

Arguments:

  • test_run_id str - The ID of the test run to retrieve.

Returns:

  • TestRunItem - The test run item corresponding to the provided ID.
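
A brief sketch pairing get_test_run with an earlier submit_test call; submitted is assumed to be the TestRunItem returned by submit_test.

# Fetch the test run by ID to check on an asynchronously submitted evaluation.
test_run = mut.get_test_run(submitted.id)
print(test_run.name)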

ModelInvocation Objects

@_attrs_define
class ModelInvocation()

Model invocation response object returned from a CustomModel.invoke method or as an element of a list returned from a CustomBatchModel.invoke_batch method.

Arguments:

  • model_prediction - Prediction from the model to be used when running the evaluation, e.g. predicted class from classification model or generated text completion from a generative model. This would typically be parsed out of the overall model_output_metadata.
  • model_input - All the input sent to the model.
  • model_output_metadata - Full model response, including any metadata returned with model's output.
  • tool_calls - List of tool calls made during the model invocation, if any.
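
A brief sketch of building a ModelInvocation from a raw provider response inside a custom invoke method; the response shape shown here is assumed for illustration.

from okareo.model_under_test import ModelInvocation

# Hypothetical raw response from an underlying model call.
raw_response = {
    "choices": [{"message": {"content": "Billing"}}],
    "usage": {"total_tokens": 42},
}

invocation = ModelInvocation(
    model_prediction=raw_response["choices"][0]["message"]["content"],  # parsed prediction
    model_input="I was charged twice this month.",                      # what was sent to the model
    model_output_metadata=raw_response,                                 # full response for traceability
)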

OpenAIModel Objects

@define
class OpenAIModel(BaseModel)

An OpenAI model definition with prompt template and relevant parameters for an Okareo evaluation.

Arguments:

  • model_id - Model ID to request from the OpenAI completions API. For a list of available models, see https://platform.openai.com/docs/models
  • temperature - Parameter for controlling the randomness of the model's output.
  • system_prompt_template - System role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • user_prompt_template - User role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • dialog_template - Dialog template in OpenAI message format to pass to the model. Uses mustache syntax for variable substitution.
  • tools - List of tools to pass to the model.
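
A sketch of registering an OpenAIModel with system and user prompt templates; the model choice, prompt text, and model name are placeholders.

from okareo import Okareo
from okareo.model_under_test import OpenAIModel

okareo = Okareo("<OKAREO_API_KEY>")

mut = okareo.register_model(
    name="support-summarizer",
    model=OpenAIModel(
        model_id="gpt-4o-mini",
        temperature=0.0,
        system_prompt_template="You are a concise support-ticket summarizer.",
        user_prompt_template="Summarize the ticket: {scenario_input}",
    ),
)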

GenerationModel Objects

@define
class GenerationModel(BaseModel)

An LLM definition with prompt template and relevant parameters for an Okareo evaluation.

Arguments:

  • model_id - Model ID to request for LLM completion.
  • temperature - Parameter for controlling the randomness of the model's output.
  • system_prompt_template - System role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • user_prompt_template - User role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • dialog_template - Dialog template in OpenAI message format to pass to the model. Uses mustache syntax for variable substitution.
  • tools - List of tools to pass to the model.
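
GenerationModel follows the same pattern but is not tied to a single provider. The sketch below uses a dialog_template instead of separate system and user templates; the message content is illustrative.

from okareo.model_under_test import GenerationModel

model = GenerationModel(
    model_id="gpt-4o-mini",  # any model ID supported by your provider API key
    temperature=0.3,
    dialog_template="""[
        {"role": "system", "content": "You answer questions about the product."},
        {"role": "user", "content": "{scenario_input}"}
    ]""",
)
# mut = okareo.register_model(name="dialog-generation-model", model=model)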

OpenAIAssistantModel Objects

@_attrs_define
class OpenAIAssistantModel(BaseModel)

An OpenAI Assistant definition with prompt template and relevant parameters for an Okareo evaluation.

Arguments:

  • model_id - ID of the OpenAI Assistant to run a thread against.
  • assistant_prompt_template - System role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • user_prompt_template - User role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • dialog_template - Dialog template in OpenAI message format to pass to the model. Uses mustache syntax for variable substitution.

CohereModel Objects

@_attrs_define
class CohereModel(BaseModel)

A Cohere model definition with prompt template and relevant parameters for an Okareo evaluation.

Arguments:

PineconeDb Objects

@_attrs_define
class PineconeDb(BaseModel)

A Pinecone vector database configuration for use in an Okareo retrieval evaluation.

Arguments:

  • index_name - The name of the Pinecone index to connect to.
  • region - The region where the Pinecone index is hosted.
  • project_id - The project identifier associated with the Pinecone index.
  • top_k - The number of top results to retrieve for queries. Defaults to 5.
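
A sketch of using a PineconeDb configuration in a retrieval evaluation; the index name, region, project ID, and API keys are placeholders.

from okareo import Okareo
from okareo.model_under_test import PineconeDb
from okareo_api_client.models.test_run_type import TestRunType

okareo = Okareo("<OKAREO_API_KEY>")

mut = okareo.register_model(
    name="docs-retrieval-pinecone",
    model=PineconeDb(
        index_name="my-docs-index",
        region="us-east-1",
        project_id="<PINECONE_PROJECT_ID>",
        top_k=10,
    ),
)

retrieval_run = mut.run_test(
    scenario="<SCENARIO_SET_ID>",
    name="retrieval eval",
    api_key="<PINECONE_API_KEY>",
    test_run_type=TestRunType.INFORMATION_RETRIEVAL,
)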

QdrantDB Objects

@_attrs_define
class QdrantDB(BaseModel)

A Qdrant vector database configuration for use in an Okareo retrieval evaluation.

Arguments:

  • collection_name - The name of the Qdrant collection to connect to.
  • url - The URL of the Qdrant instance.
  • top_k - The number of top results to retrieve for queries. Defaults to 5.
  • sparse - Whether to use sparse vectors for the Qdrant collection. Defaults to False.
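
The Qdrant configuration is used the same way; a minimal sketch with a placeholder collection and cluster URL:

from okareo.model_under_test import QdrantDB

qdrant_config = QdrantDB(
    collection_name="my-docs-collection",
    url="https://<YOUR-CLUSTER>.cloud.qdrant.io:6333",
    top_k=5,
    sparse=False,
)
# mut = okareo.register_model(name="docs-retrieval-qdrant", model=qdrant_config)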

CustomModel Objects

@_attrs_define
class CustomModel(BaseModel)

A custom model definition for an Okareo evaluation. Requires a valid invoke definition that operates on a single input.

Arguments:

  • name - A name for the custom model.

invoke

@abstractmethod
def invoke(input_value: Union[dict, list, str]) -> Union[ModelInvocation, Any]

Method for taking a single scenario input and returning a single model output.

Arguments:

  • input_value Union[dict, list, str] - Input to the model.

Returns:

Union[ModelInvocation, Any] - model output. If the model returns a ModelInvocation, it should contain the model's prediction, input, and metadata. If the model returns a tuple, the first element should be the model's prediction and the second element should be the metadata.
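
A sketch of a minimal CustomModel subclass; the keyword-matching logic stands in for whatever client-side model call you need to evaluate.

from typing import Union

from okareo.model_under_test import CustomModel, ModelInvocation

class KeywordClassifier(CustomModel):
    # Toy classifier used only to illustrate the invoke contract.
    def invoke(self, input_value: Union[dict, list, str]) -> ModelInvocation:
        text = input_value if isinstance(input_value, str) else str(input_value)
        label = "billing" if "charge" in text.lower() else "general"
        return ModelInvocation(
            model_prediction=label,
            model_input=input_value,
            model_output_metadata={"rule": "keyword match", "matched_label": label},
        )

# mut = okareo.register_model(name="keyword-classifier", model=KeywordClassifier(name="keyword-classifier"))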

CustomMultiturnTarget Objects

@_attrs_define
class CustomMultiturnTarget(BaseModel)

A custom model definition for an Okareo multiturn evaluation. Requires a valid invoke definition that operates on a single turn of a conversation.

invoke

@abstractmethod
def invoke(messages: List[dict[str, str]]) -> Union[ModelInvocation, Any]

Method for continuing a multiturn conversation with a custom model.

Arguments:

  • messages List[dict[str, str]] - List of messages in the conversation.

Returns:

Union[ModelInvocation, Any] - model output. If the model returns a ModelInvocation, it should contain the model's prediction, input, and metadata. If the model returns a tuple, the first element should be the model's prediction and the second element should be the metadata.
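
A sketch of a CustomMultiturnTarget that replies to the latest message in the conversation; the canned reply is purely illustrative.

from typing import List

from okareo.model_under_test import CustomMultiturnTarget, ModelInvocation

class EchoTarget(CustomMultiturnTarget):
    # Responds to the most recent message each turn.
    def invoke(self, messages: List[dict]) -> ModelInvocation:
        latest = messages[-1]["content"] if messages else ""
        reply = f"You said: {latest}"
        return ModelInvocation(
            model_prediction=reply,
            model_input=messages,
            model_output_metadata={"turns_seen": len(messages)},
        )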

StopConfig Objects

@define
class StopConfig()

Configuration for stopping a multiturn conversation based on a specific check.

Arguments:

  • check_name - Name of the check to use for stopping the conversation.
  • stop_on - The check condition on which to stop the conversation. Defaults to True (i.e., the conversation stops when the check evaluates to True).

SessionConfig Objects

class SessionConfig()

Configuration for a custom API endpoint that starts a session.

Arguments:

  • url - URL of the endpoint to start the session.
  • method - HTTP method to use for the request. Defaults to POST.
  • headers - Headers to include in the request. Defaults to an empty JSON object.
  • body - Body to include in the request. Defaults to an empty JSON object.
  • status_code - Expected HTTP status code of the response.
  • response_session_id_path - Path to extract the session ID from the response. E.g., response.id will use the id field of the response JSON object to set the session_id.

TurnConfig Objects

class TurnConfig()

Configuration for a custom API endpoint that continues a session/conversation by one turn.

Arguments:

  • url - URL of the endpoint that continues the session/conversation by one turn.
  • method - HTTP method to use for the request. Defaults to POST.
  • headers - Headers to include in the request. Supports mustache syntax for variable substitution for {latest_message}, {message_history}, {session_id}. Defaults to an empty JSON object.
  • body - Body to include in the request. Supports mustache syntax for variable substitution for {latest_message}, {message_history}, {session_id}. Defaults to an empty JSON object.
  • status_code - Expected HTTP status code of the response.
  • response_message_path - Path to extract the model's generated message from the response. E.g., response.message will parse out the corresponding field of the response JSON object as the model's generated response.
  • response_tool_calls_path - Path to extract tool calls from the response.

CustomEndpointTarget Objects

class CustomEndpointTarget()

A pair of custom API endpoints, one for starting a session and one for continuing a conversation, for use in an Okareo multiturn evaluation.

Arguments:

  • start_session - A valid SessionConfig for starting a session.
  • next_turn - A valid TurnConfig for requesting and parsing the next turn of a conversation.
  • max_parallel_requests - Maximum number of parallel requests to allow when running the evaluation.
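
A combined sketch of the two endpoint configurations and the CustomEndpointTarget that pairs them. The URLs, header values, status codes, and response paths describe a hypothetical API and are not a required shape.

import json

from okareo.model_under_test import CustomEndpointTarget, SessionConfig, TurnConfig

start_config = SessionConfig(
    url="https://api.example.com/chat/session",    # hypothetical endpoint
    method="POST",
    headers=json.dumps({"Authorization": "Bearer <TOKEN>"}),
    status_code=201,
    response_session_id_path="response.id",
)

turn_config = TurnConfig(
    url="https://api.example.com/chat/message",    # hypothetical endpoint
    method="POST",
    headers=json.dumps({"Authorization": "Bearer <TOKEN>"}),
    body=json.dumps({"session": "{session_id}", "message": "{latest_message}"}),
    status_code=200,
    response_message_path="response.output.text",  # path into the hypothetical response JSON
)

endpoint_target = CustomEndpointTarget(
    start_session=start_config,
    next_turn=turn_config,
    max_parallel_requests=4,
)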

MultiTurnDriver Objects

@_attrs_define
class MultiTurnDriver(BaseModel)

A driver model for Okareo multiturn evaluation.

Arguments:

  • target - Target model under test to use in the multiturn evaluation.
  • stop_check - A valid StopConfig or a dict that can be converted to StopConfig.
  • driver_model_id - Model ID to use for the driver model (e.g., "gpt-4.1").
  • driver_temperature - Parameter for controlling the randomness of the driver model's output.
  • repeats - Number of times to run a conversation per scenario row. Defaults to 1.
  • max_turns - Maximum number of turns to run in a conversation. Defaults to 5.
  • first_turn - Name of model (i.e., "target" or "driver") that should initiate each conversation. Defaults to "target".
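
A sketch tying the pieces together: a MultiTurnDriver wrapping a generation target, registered and evaluated as a multiturn test run. The driver model, check name, and scenario ID are placeholders.

from okareo import Okareo
from okareo.model_under_test import GenerationModel, MultiTurnDriver, StopConfig
from okareo_api_client.models.test_run_type import TestRunType

okareo = Okareo("<OKAREO_API_KEY>")

driver = MultiTurnDriver(
    target=GenerationModel(
        model_id="gpt-4o-mini",
        temperature=0.0,
        system_prompt_template="You are a polite support agent for {scenario_input}.",
    ),
    stop_check=StopConfig(check_name="behavior_adherence", stop_on=False),  # example check
    driver_model_id="gpt-4.1",
    driver_temperature=0.7,
    max_turns=4,
    repeats=1,
    first_turn="driver",
)

mut = okareo.register_model(name="support-agent-multiturn", model=driver)
run = mut.run_test(
    scenario="<SCENARIO_SET_ID>",
    name="multiturn adherence eval",
    api_key="<OPENAI_API_KEY>",
    test_run_type=TestRunType.MULTI_TURN,
)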

CustomBatchModel Objects

@_attrs_define
class CustomBatchModel(BaseModel)

A custom batch model definition for an Okareo evaluation. Requires a valid invoke_batch definition that operates on a batch of inputs.

invoke_batch

@abstractmethod
def invoke_batch(
        input_batch: list[dict[str, Union[dict, list, str]]]
) -> list[dict[str, Union[ModelInvocation, Any]]]

Method for taking a batch of scenario inputs and returning a corresponding batch of model outputs.

Arguments:

  • input_batch - list[dict[str, Union[dict, list, str]]] - batch of inputs to the model. Expects a list of dicts of the format { 'id': str, 'input_value': Union[dict, list, str] }.

Returns:

List of dicts of format { 'id': str, 'model_invocation': Union[ModelInvocation, Any] }. 'id' must match the corresponding input_batch element's 'id'.
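
A sketch of a minimal CustomBatchModel subclass; the per-item logic is illustrative, and each output dict echoes the id of its input.

from typing import Any, Union

from okareo.model_under_test import CustomBatchModel, ModelInvocation

class BatchKeywordClassifier(CustomBatchModel):
    # Toy batch classifier illustrating the invoke_batch contract.
    def invoke_batch(
        self, input_batch: list[dict[str, Union[dict, list, str]]]
    ) -> list[dict[str, Union[ModelInvocation, Any]]]:
        results = []
        for item in input_batch:
            text = str(item["input_value"])
            label = "billing" if "charge" in text.lower() else "general"
            results.append({
                "id": item["id"],  # must match the corresponding input id
                "model_invocation": ModelInvocation(
                    model_prediction=label,
                    model_input=item["input_value"],
                    model_output_metadata={"rule": "keyword match"},
                ),
            })
        return results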