caikit.runtime.model_management.batcher

The Batcher transparently aggregates individual inference calls into unified batches to call the run_batch implementation of the wrapped model.

Attributes

log

Classes

Batcher

Module Contents

caikit.runtime.model_management.batcher.log

class caikit.runtime.model_management.batcher.Batcher(model_name: str, model: caikit.core.ModuleBase, batch_size: int, batch_collect_delay_s: float | None = None)

The Batcher transparently aggregates individual inference calls into unified batches to call the run_batch implementation of the wrapped model.

_model_name

_model

_batch_size

_batch_collect_delay_s = None

_input_q

_finished_tasks

_req_num = 0

_id_lock

_ready_event

_stop_event

_batch_thread_start_lock

_batch_thread = None

_model_run_defaults

__del__(): Shut down the internal thread.

run(**kwargs) → caikit.core.data_model.base.DataBase

This run function provides a facade over the underlying model’s run function. It is implemented by running batches of individual requests through the model’s run_batch method.

NOTE: Only kwargs are accepted. This simplifies batching across inconsistent sets of kwargs, and only kwargs are used in the predict servicer.
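
To illustrate why a kwargs-only interface makes batching simpler, here is a minimal sketch of collating per-request kwargs into one set of per-argument lists. The helper `collate_kwargs` and the `defaults` mapping are hypothetical names introduced for this example. How the Batcher actually merges arguments is not described on this page.

```python
from typing import Any, Dict, List


def collate_kwargs(
    requests: List[Dict[str, Any]], defaults: Dict[str, Any]
) -> Dict[str, List[Any]]:
    """Merge per-request kwargs into one dict of per-argument lists.

    Hypothetical helper: `defaults` maps every accepted argument name to the
    value used when a request omits that argument.
    """
    batch: Dict[str, List[Any]] = {name: [] for name in defaults}
    for req in requests:
        # Reject arguments the model does not accept
        unknown = set(req) - set(defaults)
        if unknown:
            raise TypeError(f"unexpected arguments: {sorted(unknown)}")
        # Every list gets exactly one entry per request, in request order
        for name, default in defaults.items():
            batch[name].append(req.get(name, default))
    return batch


batch = collate_kwargs(
    [{"text": "a"}, {"text": "b", "top_k": 3}],
    {"text": None, "top_k": 1},
)
```

Because every request is a plain mapping of names to values, requests that pass different subsets of arguments can still be aligned position by position.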

stop(): Stop this batcher’s run thread (cannot be undone).

_ensure_batch_thread(): The run thread will stop itself if there’s no work to do, so this function is called to ensure that it’s up and running.
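
The pattern of restarting a worker thread on demand can be sketched as follows. This is an illustration under assumptions, not the caikit code: the class `LazyWorker`, its `ensure_running` method, and the `starts` counter are hypothetical names. The lock plays the role that an attribute like _batch_thread_start_lock suggests, preventing two callers from starting the thread at once.

```python
import threading


class LazyWorker:
    """Hypothetical sketch: start a worker thread only when one is not alive."""

    def __init__(self, target):
        self._target = target
        self._thread = None
        self._start_lock = threading.Lock()
        self.starts = 0  # counts how many times a thread was started

    def ensure_running(self):
        # The lock makes the check-then-start sequence atomic across callers
        with self._start_lock:
            if self._thread is None or not self._thread.is_alive():
                self._thread = threading.Thread(target=self._target, daemon=True)
                self._thread.start()
                self.starts += 1


release = threading.Event()
worker = LazyWorker(release.wait)
worker.ensure_running()
worker.ensure_running()  # thread is still alive, so nothing new is started
starts_while_alive = worker.starts
release.set()
worker._thread.join()
worker.ensure_running()  # previous thread has exited, so a new one is started
```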

_next_req_id(): Make a unique ID for this request.
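
One common way to produce unique request IDs across threads is a counter guarded by a lock. The sketch below assumes that approach, which the _req_num and _id_lock attributes hint at but this page does not confirm. The class name `RequestIdCounter` is hypothetical.

```python
import threading


class RequestIdCounter:
    """Hypothetical sketch of a thread-safe, monotonically increasing ID."""

    def __init__(self):
        self._req_num = 0
        self._id_lock = threading.Lock()

    def next_req_id(self) -> int:
        # Increment and read under the lock so no two callers share an ID
        with self._id_lock:
            self._req_num += 1
            return self._req_num


counter = RequestIdCounter()
ids = []


def grab_ids():
    for _ in range(100):
        ids.append(counter.next_req_id())


threads = [threading.Thread(target=grab_ids) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```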

_batch_thread_run(): This function runs in an independent thread and manages pulling requests from the input queue, running the batch, and returning the completed results into _finished_tasks.
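
The general shape of such a batching loop can be sketched with the standard library alone. This is a simplified illustration, not the caikit implementation: `ToyBatcher` and `_collect_batch` are hypothetical names, the worker thread stays alive instead of stopping itself when idle, and results are delivered through a per-request event rather than a shared _finished_tasks structure.

```python
import queue
import threading
import time


class ToyBatcher:
    """Illustrative only: groups single requests into calls to run_batch."""

    def __init__(self, run_batch, batch_size, batch_collect_delay_s=0.05):
        self._run_batch = run_batch  # callable: list of inputs -> list of outputs
        self._batch_size = batch_size
        self._batch_collect_delay_s = batch_collect_delay_s
        self._input_q = queue.Queue()
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._batch_thread_run, daemon=True)
        self._thread.start()

    def run(self, value):
        # Each request carries its own completion event and result slots
        task = {"value": value, "done": threading.Event(), "result": None, "error": None}
        self._input_q.put(task)
        task["done"].wait()
        if task["error"] is not None:
            raise task["error"]
        return task["result"]

    def stop(self):
        self._stop_event.set()
        self._thread.join()

    def _collect_batch(self):
        # Wait briefly for the first request so a stop request is noticed
        try:
            batch = [self._input_q.get(timeout=0.1)]
        except queue.Empty:
            return []
        # Then gather more requests until the batch is full or the delay expires
        deadline = time.monotonic() + self._batch_collect_delay_s
        while len(batch) < self._batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self._input_q.get(timeout=remaining))
            except queue.Empty:
                break
        return batch

    def _batch_thread_run(self):
        while not self._stop_event.is_set():
            batch = self._collect_batch()
            if not batch:
                continue
            try:
                results = self._run_batch([task["value"] for task in batch])
                for task, result in zip(batch, results):
                    task["result"] = result
            except Exception as err:
                # A failed batch fails every request that was part of it
                for task in batch:
                    task["error"] = err
            finally:
                for task in batch:
                    task["done"].set()
```

The collect delay trades latency for throughput: a longer delay gives concurrent callers more time to land in the same batch, while a shorter one returns results sooner. A production version would also need to release callers whose requests are still queued when the thread is stopped.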