New models
We have added support for the following new models:
- gcp/gemini-2.0-flash-exp
- groq/llama-3.3-70b-versatile
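For example, selecting one of the new models from the Python SDK could look like the sketch below. This is a minimal sketch that assumes opper.call accepts a model argument for model selection; check the SDK documentation for your version.

from opperai import Opper

# Assumes the API key is configured for the SDK (e.g. via the environment).
opper = Opper()

# Assumption: the model is selected with a `model` argument on the call.
opper.call(
    name="my-function",
    input="Hello, world!",
    model="gcp/gemini-2.0-flash-exp",
)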
OpperCLI now supports showing usage information
The OpperCLI now supports showing usage information for your account. This can be used to get an overview of your usage, optionally grouped by your custom call tags.
The basic usage, showing total_tokens, looks like this:
➜ opper usage list --fields=total_tokens
Usage Events:
Time Bucket: 2024-12-03T00:00:00Z
Cost: 0.029731
Count: 25
total_tokens: 4806
Time Bucket: 2024-12-04T00:00:00Z
Cost: 0.025908
Count: 13
total_tokens: 4155
Time Bucket: 2024-12-06T00:00:00Z
Cost: 0.017290
Count: 7
total_tokens: 2689
More usage information can be found by running the command:
➜ opper usage
Manage usage information
Usage:
opper usage [command]
Examples:
# List usage information
opper usage list
# List usage with time range and granularity
opper usage list --from-date=2024-01-01T00:00:00Z --to-date=2024-12-31T23:59:59Z --granularity=day
# List usage with specific fields and grouping
opper usage list --fields=completion_tokens,total_tokens --group-by=model,project.name
# Show count over time as ASCII graph (default)
opper usage list --graph
# Show cost over time as ASCII graph
opper usage list --graph=cost
# Show count over time by model
opper usage list --group-by model --graph
# Export usage as CSV
opper usage list --out csv
Tracking calls using a customer tag looks like this. First, include the customer tag in the call:

from opperai import Opper

opper = Opper()

opper.call(
    name="my-function",
    input="Hello, world!",
    tags={"customer": "mycustomer"},
)
Then run the opper usage list --group-by=customer command to see the usage information grouped by the customer tag:
➜ opper usage list --fields=total_tokens --group-by=customer
Usage Events:
Time Bucket: 2024-12-06T00:00:00Z
Cost: 0.025908
Count: 13
customer: <nil>
total_tokens: 4155
Time Bucket: 2024-12-06T00:00:00Z
Cost: 0.000007
Count: 1
customer: mycustomer
total_tokens: 23
New feature: Run evaluations on alternative models and prompts
Opper now supports running ad hoc evaluations with different models, instructions, and function configurations. It works by running through a function's dataset entries and evaluating the results. This allows you to test how a function performs with its current configuration or an alternative one.
See our documentation on Offline Evals for more information.
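To make the mechanics concrete, here is a rough sketch of what such an evaluation does conceptually. This is not the built-in evaluation API: the run_function stand-in, the exact_match scorer, and the entry schema are illustrative assumptions, and the platform's offline evals handle all of this for you.

# Conceptual sketch of an offline evaluation loop (illustrative only).

def run_function(input_text: str, model: str) -> str:
    """Hypothetical stand-in for calling the function with an alternative model.
    In practice this would be an Opper call configured with the model under test."""
    return "Hej, världen!"  # canned output so the sketch runs end to end

def exact_match(output: str, expected: str) -> float:
    """Hypothetical scorer: 1.0 when the output matches the expected value."""
    return 1.0 if output.strip() == expected.strip() else 0.0

# Hypothetical dataset entries pairing an input with an expected value.
entries = [
    {"input": "Hello, world!", "expected": "Hej, världen!"},
]

scores = [
    exact_match(run_function(e["input"], "groq/llama-3.3-70b-versatile"), e["expected"])
    for e in entries
]
print(f"average score: {sum(scores) / len(scores):.2f}")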
Updates to managing datasets
We have improved handling of datasets to make it easier to populate them:
- Dataset entries now include an expected field that is used in evaluations and in few-shot configuration.
- Dataset entries can be populated from any trace, by uploading a JSON file (a sketch follows below), or through the SDKs.
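As an illustration, a JSON file used to populate a dataset could pair each input with its expected output. Only the expected field is named in this release; the input key and the overall shape below are assumptions made for the sake of the example.

import json

# Hypothetical entries: only the "expected" field name comes from this release;
# the "input" key and the list-of-objects shape are illustrative assumptions.
entries = [
    {"input": "Hello, world!", "expected": "Hej, världen!"},
    {"input": "Good morning", "expected": "God morgon"},
]

# Write the entries to a file that could then be uploaded to a dataset.
with open("dataset_entries.json", "w") as f:
    json.dump(entries, f, indent=2)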
See our documentation on Datasets for more information.