Jenkins
Okareo is easy to use within the Jenkins ecosystem. You can use it for CI/CD including synthetic scenario generation, evaluation, and deployment readiness.
You can drive Okareo from your Jenkins build pipelines with our dedicated CLI or as part of a unit test framework. Okareo supports Python and Typescript/JavaScript unit test frameworks such as Pytest and Jest. To learn more, please refer to the CLI, Python, or Typescript documentation.
The SDK requires an Okareo API key. Refer to the Okareo API Key guide for more information.
Usage
To use Jenkins, it is best to keep the tests in a repo that you persist in your preferred SCM (GitHub, GitLab, Bitbucket, Perforce, SVN, etc.).
The Okareo CLI provides easy access to the Okareo SDKs and APIs. It does not presume a specific language or version. As a result, make sure to set up either Typescript or Python as part of your build steps.
Setting up Jenkins
Regardless of the language you decide to use, there are several key concerns you should manage as part of your Jenkins install.
- Manage the OKAREO_API_KEY as a secret. You can create as many API keys in Okareo as needed to isolate build concerns. Do not share them or commit them to a repo.
- We strongly suggest maintaining a repo with the build tests, scripts, and configurations in it. Although this is not strictly required, it is a best practice in CI. A repo that persists all the necessary configuration can save you days of work if you ever have to rebuild your pipeline.
- Driving builds from Jenkins is a great idea. But don't overload Jenkins with partial analytics. Use the Okareo interface to debug and analyze results.
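As a minimal sketch of the first point, a Python build step can read the key from the environment and stop early when it is missing. This assumes Jenkins has already exposed the secret to the step as the OKAREO_API_KEY environment variable. The helper names `load_okareo_api_key` and `mask_secret` are hypothetical and are not part of the Okareo SDK.

```python
import os


def load_okareo_api_key(env=None):
    """Return the Okareo API key from the environment, or raise if it is missing."""
    # Accept an explicit mapping so the helper can be tested without touching os.environ.
    env = os.environ if env is None else env
    key = env.get("OKAREO_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OKAREO_API_KEY is not set; expose it to this build step as a secret."
        )
    return key


def mask_secret(value, visible=4):
    """Mask a secret for log output, keeping only the last few characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

Failing fast keeps a misconfigured job from running a partial evaluation, and masking avoids printing the full key in the Jenkins console if you need to log which key was used.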
We are seeking feedback on Evaluations in CI. Don't be shy, reach out. We would love to hear from you and help solve any challenges you are facing with AI in CI. Connect and tell us what you think.
Working with Okareo
Because Okareo supports a wide range of usage models for Jenkins, the primary work to be done is in test design.
We suggest starting with Model Unit Testing (MUT). The goal of MUT is to isolate a model and/or LLM+Context to establish a performance threshold for that component. In Typescript, Okareo provides a reporting function that you can use to set thresholds for build failure. Model Unit Tests should be triggered whenever relevant code or modifications to the model are promoted.
Beyond Model Unit Testing, there are End-to-End Model tests. These integration tests, as they are often called, typically use API entry points that span multiple services and models. In the case of RAG or agentic systems, a single integration test may touch 3-4 models or more before completing. The Okareo CustomModel mechanism enables you to baseline these multi-service and multi-model interactions just as you would a single model under test.
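The threshold idea can be sketched in Python as a plain Pytest-style test that fails the build when a metric drops below its minimum. This is an illustration only: it does not use the Okareo reporting function mentioned above, and the metric names, scores, and the helper `find_threshold_failures` are hypothetical. In a real pipeline the scores would come from an evaluation run.

```python
def find_threshold_failures(metrics, minimums):
    """Return a message for every metric that is missing or below its minimum."""
    failures = []
    for name, minimum in sorted(minimums.items()):
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing from evaluation results")
        elif value < minimum:
            failures.append(f"{name}: {value:.2f} is below the minimum {minimum:.2f}")
    return failures


def test_model_meets_thresholds():
    # Hypothetical scores; replace with the metrics returned by your evaluation.
    metrics = {"accuracy": 0.91, "f1": 0.88}
    # Hypothetical minimums that define the performance threshold for the build.
    minimums = {"accuracy": 0.90, "f1": 0.85}
    failures = find_threshold_failures(metrics, minimums)
    # A failed assertion makes the test run exit non-zero, which fails the Jenkins stage.
    assert not failures, "; ".join(failures)
```

Collecting every failure before asserting means a single console message lists all metrics that regressed, rather than only the first one.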
Viewing Results in Okareo
You can view the results of your synthetic scenario generations and model evaluations at https://app.okareo.com/. The JSON response always includes direct links to newly generated entities (scenarios, models, evaluations) in a format like:
https://app.okareo.com/project/<project UUID>/scenario/<scenario UUID>
or for evaluations:
https://app.okareo.com/project/<project UUID>/eval/<evaluation UUID>
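If you only have the UUIDs at hand, for example from a build log, the two link formats above can be reproduced with a small helper. This is a sketch under the assumption that the formats are exactly as shown. The function name `entity_url` is hypothetical, and only the scenario and evaluation paths are covered because those are the only ones given here.

```python
import uuid

APP_BASE_URL = "https://app.okareo.com"

# Path segments taken from the two link formats shown above.
ENTITY_PATHS = {"scenario": "scenario", "evaluation": "eval"}


def entity_url(project_id, kind, entity_id):
    """Build a link to a scenario or evaluation in the Okareo app."""
    if kind not in ENTITY_PATHS:
        raise ValueError(f"unsupported entity kind: {kind!r}")
    # uuid.UUID raises ValueError on malformed input and normalizes the text form.
    project = uuid.UUID(str(project_id))
    entity = uuid.UUID(str(entity_id))
    return f"{APP_BASE_URL}/project/{project}/{ENTITY_PATHS[kind]}/{entity}"
```

Printing such a link at the end of a Jenkins stage gives reviewers a direct path from the console output to the detailed results in Okareo.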
Reporters (Typescript)
The Typescript SDK includes reporters that write evaluation results directly to the CI console. This gives you high-level visibility into evaluation results within Jenkins.
Example Summarization console output for a failing test:
Example Retrieval Report
Learn more about reporters in the Typescript SDK.