okareo.model_under_test
ModelUnderTest Objects
class ModelUnderTest(AsyncProcessorMixin)
A class for managing a Model Under Test (MUT) in Okareo. Returned by okareo.register_model().
submit_test
def submit_test(
scenario: Union[ScenarioSetResponse, str],
name: str,
api_key: Optional[str] = None,
api_keys: Optional[dict] = None,
metrics_kwargs: Optional[dict] = None,
test_run_type: TestRunType = TestRunType.MULTI_CLASS_CLASSIFICATION,
calculate_metrics: bool = True,
checks: Optional[List[str]] = None) -> TestRunItem
Asynchronous, server-based version of test-run execution. For CustomModels, model invocations are handled client-side and then evaluated server-side asynchronously. For other models, both model invocations and evaluations are handled server-side asynchronously.
Arguments:
- scenario (Union[ScenarioSetResponse, str]) - The scenario set or identifier to use for the test run.
- name (str) - The name to assign to the test run.
- api_key (Optional[str]) - Optional API key for authentication.
- api_keys (Optional[dict]) - Optional dictionary of API keys for different services.
- metrics_kwargs (Optional[dict]) - Optional dictionary of keyword arguments for metrics calculation.
- test_run_type (TestRunType) - The type of test run to execute. Defaults to MULTI_CLASS_CLASSIFICATION.
- calculate_metrics (bool) - Whether to calculate metrics after the test run. Defaults to True.
- checks (Optional[List[str]]) - Optional list of checks to perform during the test run.
Returns:
TestRunItem - The resulting test run item for the submitted test run. Its id field can be used to retrieve the test run later via get_test_run.
run_test
def run_test(
scenario: Union[ScenarioSetResponse, str],
name: str,
api_key: Optional[str] = None,
api_keys: Optional[dict] = None,
metrics_kwargs: Optional[dict] = None,
test_run_type: TestRunType = TestRunType.MULTI_CLASS_CLASSIFICATION,
calculate_metrics: bool = True,
checks: Optional[List[str]] = None) -> TestRunItem
Server-based version of test-run execution. For CustomModels, model invocations are handled client-side and then evaluated server-side. For other models, both model invocations and evaluations are handled server-side.
Arguments:
- scenario (Union[ScenarioSetResponse, str]) - The scenario set or identifier to use for the test run.
- name (str) - The name to assign to the test run.
- api_key (Optional[str]) - Optional API key for authentication.
- api_keys (Optional[dict]) - Optional dictionary of API keys for different services.
- metrics_kwargs (Optional[dict]) - Optional dictionary of keyword arguments for metrics calculation.
- test_run_type (TestRunType) - The type of test run to execute. Defaults to MULTI_CLASS_CLASSIFICATION.
- calculate_metrics (bool) - Whether to calculate metrics after the test run. Defaults to True.
- checks (Optional[List[str]]) - Optional list of checks to perform during the test run.
Returns:
TestRunItem - The resulting test run item for the completed test run.
get_test_run
def get_test_run(test_run_id: str) -> TestRunItem
Retrieve a test run by its ID.
Arguments:
- test_run_id (str) - The ID of the test run to retrieve.
Returns:
TestRunItem - The test run item corresponding to the provided ID.
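submit_test pairs naturally with get_test_run: submit a run, keep its id, and fetch the result later. A minimal sketch under stated assumptions (a valid API key, an already-registered model, and an existing scenario set; the names below are placeholders, and the model definition passed to register_model is omitted):

```python
from okareo import Okareo

okareo = Okareo("<OKAREO_API_KEY>")  # placeholder key
mut = okareo.register_model(name="my-classifier", model=my_model)  # model definition omitted

submitted = mut.submit_test(
    scenario="<scenario-set-id>",  # placeholder scenario set identifier
    name="nightly-classification-run",
)
# ...later, retrieve the (possibly still running) test run by its id:
test_run = mut.get_test_run(submitted.id)
```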
ModelInvocation Objects
@_attrs_define
class ModelInvocation()
Model invocation response object returned from a CustomModel.invoke method or as an element of a list returned from a CustomBatchModel.invoke_batch method.
Arguments:
- model_prediction - Prediction from the model to be used when running the evaluation, e.g., the predicted class from a classification model or the generated text completion from a generative model. This would typically be parsed out of the overall model_output_metadata.
- model_input - All of the input sent to the model.
- model_output_metadata - Full model response, including any metadata returned with the model's output.
- tool_calls - List of tool calls made during the model invocation, if any.
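As an illustration, a CustomModel.invoke implementation (see CustomModel below) might package its result like this; the raw response dict is a stand-in for a real model's output:

```python
from okareo.model_under_test import ModelInvocation

def invoke(self, input_value):
    raw = {"label": "positive", "score": 0.93}  # stand-in model response
    return ModelInvocation(
        model_prediction=raw["label"],  # parsed out of the full response
        model_input=input_value,        # everything sent to the model
        model_output_metadata=raw,      # full response, including metadata
    )
```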
OpenAIModel Objects
@define
class OpenAIModel(BaseModel)
An OpenAI model definition with prompt template and relevant parameters for an Okareo evaluation.
Arguments:
- model_id - Model ID to request from the OpenAI completion API. For a list of available models, see https://platform.openai.com/docs/models
- temperature - Parameter for controlling the randomness of the model's output.
- system_prompt_template - System-role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g., {scenario_input}.
- user_prompt_template - User-role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g., {scenario_input}.
- dialog_template - Dialog template in OpenAI message format to pass to the model. Uses mustache syntax for variable substitution.
- tools - List of tools to pass to the model.
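A typical definition might look like the following sketch; the model ID, temperature, and templates are examples, not requirements:

```python
from okareo.model_under_test import OpenAIModel

model = OpenAIModel(
    model_id="gpt-4o-mini",       # example OpenAI model ID
    temperature=0.2,
    system_prompt_template="Classify the user's intent in one word.",
    user_prompt_template="{scenario_input}",  # filled from the scenario row
)
```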
GenerationModel Objects
@define
class GenerationModel(BaseModel)
An LLM definition with prompt template and relevant parameters for an Okareo evaluation.
Arguments:
- model_id - Model ID to request for LLM completion.
- temperature - Parameter for controlling the randomness of the model's output.
- system_prompt_template - System-role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g., {scenario_input}.
- user_prompt_template - User-role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g., {scenario_input}.
- dialog_template - Dialog template in OpenAI message format to pass to the model. Uses mustache syntax for variable substitution.
- tools - List of tools to pass to the model.
OpenAIAssistantModel Objects
@_attrs_define
class OpenAIAssistantModel(BaseModel)
An OpenAI Assistant definition with prompt template and relevant parameters for an Okareo evaluation.
Arguments:
- model_id - Assistant ID of the OpenAI Assistant to run a thread against.
- assistant_prompt_template - System-role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g., {scenario_input}.
- user_prompt_template - User-role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g., {scenario_input}.
- dialog_template - Dialog template in OpenAI message format to pass to the model. Uses mustache syntax for variable substitution.
CohereModel Objects
@_attrs_define
class CohereModel(BaseModel)
A Cohere model definition with prompt template and relevant parameters for an Okareo evaluation.
Arguments:
- model_id - Model ID to request for the Cohere completion. For a full list of available models, see https://docs.cohere.com/v2/docs/models
- model_type - Type of application for the Cohere model. Currently, we support 'classify' and 'embed'.
- input_type - Input type for the Cohere embedding model. For more details, see https://docs.cohere.com/v2/docs/embeddings#the-input_type-parameter
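For example, an embedding configuration for a retrieval evaluation might look like this sketch (the model ID and input_type are standard Cohere values, but verify them against the Cohere docs linked above):

```python
from okareo.model_under_test import CohereModel

embed_model = CohereModel(
    model_id="embed-english-v3.0",   # example Cohere embedding model
    model_type="embed",
    input_type="search_document",    # embeddings intended for indexing
)
```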
PineconeDb Objects
@_attrs_define
class PineconeDb(BaseModel)
A Pinecone vector database configuration for use in an Okareo retrieval evaluation.
Arguments:
- index_name - The name of the Pinecone index to connect to.
- region - The region where the Pinecone index is hosted.
- project_id - The project identifier associated with the Pinecone index.
- top_k - The number of top results to retrieve for queries. Defaults to 5.
QdrantDB Objects
@_attrs_define
class QdrantDB(BaseModel)
A Qdrant vector database configuration for use in an Okareo retrieval evaluation.
Arguments:
- collection_name - The name of the Qdrant collection to connect to.
- url - The URL of the Qdrant instance.
- top_k - The number of top results to retrieve for queries. Defaults to 5.
- sparse - Whether to use sparse vectors for the Qdrant collection. Defaults to False.
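A configuration sketch with placeholder collection name and URL:

```python
from okareo.model_under_test import QdrantDB

vector_db = QdrantDB(
    collection_name="docs",                   # placeholder collection
    url="https://my-cluster.qdrant.io:6333",  # placeholder instance URL
    top_k=10,                                 # retrieve 10 results per query
)
```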
CustomModel Objects
@_attrs_define
class CustomModel(BaseModel)
A custom model definition for an Okareo evaluation.
Requires a valid invoke definition that operates on a single input.
Arguments:
- name - A name for the custom model.
invoke
@abstractmethod
def invoke(input_value: Union[dict, list, str]) -> Union[ModelInvocation, Any]
Method for taking a single scenario input and returning a single model output.
Arguments:
- input_value (Union[dict, list, str]) - Input to the model.
Returns:
Union[ModelInvocation, Any] - model output. If the model returns a ModelInvocation, it should contain the model's prediction, input, and metadata. If the model returns a tuple, the first element should be the model's prediction and the second element should be the metadata.
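The tuple-return contract can be sketched without the SDK. The function below stands in for an invoke body, with a toy keyword rule in place of a real model; the first tuple element is the prediction and the second is the metadata:

```python
def invoke(input_value):
    """Toy invoke body illustrating the tuple-return contract:
    returns (prediction, metadata)."""
    text = str(input_value)
    # stand-in "model": a keyword rule, not a real classifier
    prediction = "positive" if "good" in text.lower() else "negative"
    metadata = {"input_chars": len(text)}
    return prediction, metadata

pred, meta = invoke("This movie was good")
```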
CustomMultiturnTarget Objects
@_attrs_define
class CustomMultiturnTarget(BaseModel)
A custom model definition for an Okareo multiturn evaluation.
Requires a valid invoke definition that operates on a single turn of a conversation.
invoke
@abstractmethod
def invoke(messages: List[dict[str, str]]) -> Union[ModelInvocation, Any]
Method for continuing a multiturn conversation with a custom model.
Arguments:
- messages (List[dict[str, str]]) - List of messages in the conversation.
Returns:
Union[ModelInvocation, Any] - model output. If the model returns a ModelInvocation, it should contain the model's prediction, input, and metadata. If the model returns a tuple, the first element should be the model's prediction and the second element should be the metadata.
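The messages argument arrives as a list of role/content dicts, oldest first. A stand-in invoke body that always answers the latest message, using the tuple form of the return contract:

```python
def invoke(messages):
    # messages: [{"role": "user" | "assistant", "content": str}, ...]
    latest = messages[-1]["content"]
    reply = f"You said: {latest}"          # stand-in for a real model reply
    metadata = {"turns_seen": len(messages)}
    return reply, metadata                 # (prediction, metadata)

history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Cancel my order"},
]
reply, meta = invoke(history)
```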
StopConfig Objects
@define
class StopConfig()
Configuration for stopping a multiturn conversation based on a specific check.
Arguments:
- check_name - Name of the check to use for stopping the conversation.
- stop_on - The check condition on which to stop the conversation. Defaults to True (i.e., the conversation stops when the check evaluates to True).
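A minimal sketch; the check name here is hypothetical and would need to match a check registered in your Okareo project:

```python
from okareo.model_under_test import StopConfig

# stop the conversation once the (hypothetical) "task_completed" check passes
stop = StopConfig(check_name="task_completed", stop_on=True)
```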
SessionConfig Objects
class SessionConfig()
Configuration for a custom API endpoint that starts a session.
Arguments:
- url - URL of the endpoint to start the session.
- method - HTTP method to use for the request. Defaults to POST.
- headers - Headers to include in the request. Defaults to an empty JSON object.
- body - Body to include in the request. Defaults to an empty JSON object.
- status_code - Expected HTTP status code of the response.
- response_session_id_path - Path to extract the session ID from the response. E.g., response.id will use the id field of the response JSON object to set the session_id.
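The dotted-path semantics of response_session_id_path can be illustrated with a small lookup helper (this helper is illustrative, not part of the SDK): the leading "response" segment refers to the response body itself, and each following segment drills into a JSON field.

```python
def extract_path(response_json, path):
    """Resolve a dotted path like 'response.id' against a response body.
    The leading 'response' segment refers to the body itself."""
    value = response_json
    for key in path.split(".")[1:]:  # skip the leading 'response' segment
        value = value[key]
    return value

body = {"id": "sess-42", "status": "created"}
session_id = extract_path(body, "response.id")
```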
TurnConfig Objects
class TurnConfig()
Configuration for a custom API endpoint that continues a session/conversation by one turn.
Arguments:
- url - URL of the endpoint that continues the session by one turn.
- method - HTTP method to use for the request. Defaults to POST.
- headers - Headers to include in the request. Supports mustache syntax for variable substitution for {latest_message}, {message_history}, and {session_id}. Defaults to an empty JSON object.
- body - Body to include in the request. Supports mustache syntax for variable substitution for {latest_message}, {message_history}, and {session_id}. Defaults to an empty JSON object.
- status_code - Expected HTTP status code of the response.
- response_message_path - Path to extract the model's generated message from the response; the corresponding field of the response JSON object is parsed out as the model's generated response.
- response_tool_calls_path - Path to extract tool calls from the response.
CustomEndpointTarget Objects
class CustomEndpointTarget()
A pair of custom API endpoints for starting a session and continuing a conversation to use in Okareo multiturn evaluation.
Arguments:
- start_session - A valid SessionConfig for starting a session.
- next_turn - A valid TurnConfig for requesting and parsing the next turn of a conversation.
- max_parallel_requests - Maximum number of parallel requests to allow when running the evaluation.
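A sketch wiring the two configs together; the endpoint URLs are placeholders, and only fields documented above are used:

```python
from okareo.model_under_test import CustomEndpointTarget, SessionConfig, TurnConfig

start = SessionConfig(
    url="https://api.example.com/session",  # placeholder endpoint
    method="POST",
    status_code=201,
    response_session_id_path="response.id",  # read session ID from the "id" field
)
turn = TurnConfig(
    url="https://api.example.com/turn",      # placeholder endpoint
    method="POST",
    # mustache substitution fills in the session ID and latest driver message
    body='{"session_id": "{session_id}", "message": "{latest_message}"}',
)
target = CustomEndpointTarget(start_session=start, next_turn=turn)
```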
MultiTurnDriver Objects
@_attrs_define
class MultiTurnDriver(BaseModel)
A driver model for Okareo multiturn evaluation.
Arguments:
- target - Target model under test to use in the multiturn evaluation.
- stop_check - A valid StopConfig or a dict that can be converted to a StopConfig.
- driver_model_id - Model ID to use for the driver model (e.g., "gpt-4.1").
- driver_temperature - Parameter for controlling the randomness of the driver model's output.
- repeats - Number of times to run a conversation per scenario row. Defaults to 1.
- max_turns - Maximum number of turns to run in a conversation. Defaults to 5.
- first_turn - Name of the model (i.e., "target" or "driver") that should initiate each conversation. Defaults to "target".
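A driver configuration sketch; my_target stands in for any of the target types above (e.g., a CustomMultiturnTarget or CustomEndpointTarget), and the check name is hypothetical:

```python
from okareo.model_under_test import MultiTurnDriver, StopConfig

driver = MultiTurnDriver(
    target=my_target,  # a CustomMultiturnTarget, CustomEndpointTarget, etc.
    stop_check=StopConfig(check_name="task_completed"),  # hypothetical check
    driver_model_id="gpt-4.1",
    driver_temperature=0.7,
    max_turns=8,
    first_turn="driver",  # the driver opens each conversation
)
```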
CustomBatchModel Objects
@_attrs_define
class CustomBatchModel(BaseModel)
A custom batch model definition for an Okareo evaluation.
Requires a valid invoke_batch
definition that operates on a batch of inputs.
invoke_batch
@abstractmethod
def invoke_batch(
input_batch: list[dict[str, Union[dict, list, str]]]
) -> list[dict[str, Union[ModelInvocation, Any]]]
Method for taking a batch of scenario inputs and returning a corresponding batch of model outputs.
Arguments:
- input_batch (list[dict[str, Union[dict, list, str]]]) - Batch of inputs to the model. Expects a list of dicts of the format { 'id': str, 'input_value': Union[dict, list, str] }.
Returns:
List of dicts of the format { 'id': str, 'model_invocation': Union[ModelInvocation, Any] }. Each 'id' must match the corresponding input_batch element's 'id'.
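The id-matching contract can be sketched with a stand-in batch implementation (uppercasing the input stands in for a real model; the tuple form of the invocation contract is used):

```python
def invoke_batch(input_batch):
    """Return one result per input, echoing each input's 'id' so results
    can be joined back to their scenario rows."""
    results = []
    for item in input_batch:
        text = str(item["input_value"])
        prediction = text.upper()  # toy "model"
        results.append({
            "id": item["id"],  # must match the input element's 'id'
            "model_invocation": (prediction, {"length": len(text)}),
        })
    return results

batch = [
    {"id": "row-1", "input_value": "hello"},
    {"id": "row-2", "input_value": "world"},
]
outputs = invoke_batch(batch)
```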