Inference Objective¶
Alpha since v1.0.0
The InferenceObjective resource is alpha and may have breaking changes in
future releases of the API.
Background¶
The InferenceObjective API defines a set of serving objectives of the specific request it is associated with. This CRD currently houses only Priority but will be expanded to include fields such as SLO attainment.
Usage¶
To associate a request to the InferencePool with a specific InferenceObjective, the system uses a specific header: x-gateway-inference-objective with the value of the header set to the InferenceObjective metadata name. So the calling client must set the header key/value on the request to associate the selected InferenceObjective. If no InferenceObjective is selected, default values are used.
Spec¶
The full spec of the InferenceObjective is defined here.