caikit.runtime.protobufs.model_runtime_pb2_grpc

Client and server classes corresponding to protobufs-defined services.

Classes

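`ModelRuntimeStub`	Internal "sidecar" API for interfacing with a colocated model runtime container (client stub).
`ModelRuntimeServicer`	Internal "sidecar" API for interfacing with a colocated model runtime container (server-side base class).
`ModelRuntime`	Internal "sidecar" API for interfacing with a colocated model runtime container (static call helpers).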

class caikit.runtime.protobufs.model_runtime_pb2_grpc.ModelRuntimeStub(channel)[source]

Bases: object

Client stub for the internal “sidecar” API for interfacing with a colocated model runtime container. It is constructed from a gRPC channel.

class caikit.runtime.protobufs.model_runtime_pb2_grpc.ModelRuntimeServicer[source]

Bases: object

Server-side base class for the internal “sidecar” API for interfacing with a colocated model runtime container. Implementations override the methods below.

loadModel(request, context)[source]: Load a model and return only once it is fully loaded. Include the size of the loaded model in the response if it can be determined at no additional cost. If no attempt to load the model was made, return a gRPC error code of FAILED_PRECONDITION or INVALID_ARGUMENT so that the caller can be sure no space remains in use. Note that model-mesh may cancel the RPC before it completes, in which case it immediately sends an unloadModel call for the same model. To avoid state inconsistency and “leaking” memory, implementors should ensure that this case is handled properly, i.e. that the model does not remain loaded after that unloadModel call returns successfully.

unloadModel(request, context)[source]: Unload a previously loaded (or failed) model. Return when the model is fully unloaded, or immediately if it is not found or not loaded.
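The interaction between a cancelled loadModel and the unloadModel that follows it can be sketched with plain in-memory bookkeeping. This is a minimal illustration, not the caikit implementation: `ModelRegistry`, `load`, `unload`, `is_loaded` and the `loader` callable are hypothetical names, and gRPC request/response handling is left out. A stricter implementation might also make `unload` block until the in-flight load has finished.

```python
import threading


class ModelRegistry:
    """Hypothetical bookkeeping a ModelRuntimeServicer implementation could delegate to."""

    def __init__(self):
        self._lock = threading.Lock()
        self._models = {}        # model_id -> loaded model object
        self._loading = set()    # ids whose load is currently in flight
        self._cancelled = set()  # ids unloaded while their load was in flight

    def load(self, model_id, loader):
        """Load via loader(); return the model, or None if it was unloaded mid-load."""
        with self._lock:
            if model_id in self._models:
                return self._models[model_id]
            self._loading.add(model_id)
        try:
            model = loader()  # potentially slow, so it runs outside the lock
        except Exception:
            with self._lock:
                self._loading.discard(model_id)
                self._cancelled.discard(model_id)
            raise
        with self._lock:
            self._loading.discard(model_id)
            if model_id in self._cancelled:
                # An unload arrived while loading: do not keep the model.
                self._cancelled.discard(model_id)
                return None
            self._models[model_id] = model
            return model

    def unload(self, model_id):
        """Drop the model; return True if it had been fully loaded."""
        with self._lock:
            if model_id in self._loading:
                self._cancelled.add(model_id)
            was_loaded = model_id in self._models
            self._models.pop(model_id, None)
            return was_loaded

    def is_loaded(self, model_id):
        with self._lock:
            return model_id in self._models
```

Calling `unload` for an unknown id returns immediately, which matches the "immediately if not found" behaviour described above.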

predictModelSize(request, context)[source]: Predict the size of a not-yet-loaded model. This must return almost immediately, so it should not perform expensive computation or remote lookups. The result should be a conservative estimate.
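One cheap way to produce such an estimate is to scale the model's local on-disk footprint. The sketch below is only an assumption about how an implementation might do this: the function name, the `overhead_factor` multiplier and the `default_bytes` fallback are all hypothetical, and "conservative" is interpreted here as erring on the high side.

```python
import os


def predict_model_size_bytes(model_path, overhead_factor=1.5,
                             default_bytes=512 * 1024 * 1024):
    """Estimate memory use from local file sizes only (no remote lookups)."""
    try:
        if os.path.isfile(model_path):
            on_disk = os.path.getsize(model_path)
        else:
            # os.walk yields nothing for a missing directory, giving on_disk == 0.
            on_disk = sum(
                os.path.getsize(os.path.join(root, name))
                for root, _dirs, files in os.walk(model_path)
                for name in files
            )
    except OSError:
        return default_bytes
    if on_disk == 0:
        # Nothing readable locally: fall back to the (hypothetical) default.
        return default_bytes
    return int(on_disk * overhead_factor)
```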

modelSize(request, context)[source]: Calculate the size (memory consumption) of a currently loaded model.

runtimeStatus(request, context)[source]: Provide basic runtime status and parameters; called only during startup. Before returning a READY status, implementations should check for and purge all currently loaded models. Because this is called only during startup, there should very rarely be any; if there are, it implies that the model-mesh container restarted unexpectedly, and the purge is required to keep state consistent and avoid over-committing resources.
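The purge-before-READY rule can be sketched as follows. This is a simplified illustration under stated assumptions: status values are represented as plain strings, only READY is named in the description above, and `NOT_READY`, `runtime_status`, `loaded_models` and `unload` are hypothetical names rather than part of the generated API.

```python
def runtime_status(loaded_models, unload):
    """Purge any leftover models, then report whether the runtime is ready.

    loaded_models: mutable mapping of model_id -> model (hypothetical bookkeeping)
    unload: callable taking a model_id and removing it from loaded_models
    """
    # Iterate over a snapshot because unload() mutates the mapping.
    for model_id in list(loaded_models):
        unload(model_id)
    if loaded_models:
        # Something could not be purged; do not claim readiness.
        return "NOT_READY"
    return "READY"
```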

caikit.runtime.protobufs.model_runtime_pb2_grpc.add_ModelRuntimeServicer_to_server(servicer, server)[source]
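A typical use of this function is to register a servicer instance on a `grpc.Server` before starting it. The sketch below assumes the `grpcio` package is installed; `build_server`, the port number and the worker count are illustrative choices, not values taken from caikit.

```python
from concurrent import futures


def build_server(servicer, port=8085, max_workers=4):
    """Create a gRPC server with the given ModelRuntimeServicer registered on it."""
    # Imported lazily so this sketch can be loaded without grpc or caikit installed.
    import grpc
    from caikit.runtime.protobufs import model_runtime_pb2_grpc

    server = grpc.server(futures.ThreadPoolExecutor(max_workers=max_workers))
    model_runtime_pb2_grpc.add_ModelRuntimeServicer_to_server(servicer, server)
    # Insecure port for illustration only; the port number is an arbitrary example.
    server.add_insecure_port(f"[::]:{port}")
    return server
```

The caller would then invoke `server.start()` and `server.wait_for_termination()` on the returned object.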

class caikit.runtime.protobufs.model_runtime_pb2_grpc.ModelRuntime[source]

Bases: object

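Static call helpers for the internal “sidecar” API for interfacing with a colocated model runtime container. Each static method issues a single RPC to `target` without requiring a pre-built channel or stub.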

static loadModel(request, target, options=(), channel_credentials=None, call_credentials=None, insecure=False, compression=None, wait_for_ready=None, timeout=None, metadata=None)[source]

static unloadModel(request, target, options=(), channel_credentials=None, call_credentials=None, insecure=False, compression=None, wait_for_ready=None, timeout=None, metadata=None)[source]

static predictModelSize(request, target, options=(), channel_credentials=None, call_credentials=None, insecure=False, compression=None, wait_for_ready=None, timeout=None, metadata=None)[source]

static modelSize(request, target, options=(), channel_credentials=None, call_credentials=None, insecure=False, compression=None, wait_for_ready=None, timeout=None, metadata=None)[source]

static runtimeStatus(request, target, options=(), channel_credentials=None, call_credentials=None, insecure=False, compression=None, wait_for_ready=None, timeout=None, metadata=None)[source]