caikit.core.modules

Classes

ModuleBase

Abstract base class from which all modules should inherit.

ModuleConfig

Config object used by all modules for config loading, saving, etc.

ModuleLoader

ModuleSaver

A module saver that provides common functionality used for saving modules and also a context manager that cleans up in case an error is encountered during the save process.

Functions

module([id, name, version, task, tasks, backend_type, ...])

Apply this decorator to any class that should be treated as a caikit module

Package Contents

class caikit.core.modules.ModuleBase[source]

Abstract base class from which all modules should inherit.

_metadata
_load_backend = None
property metadata: Dict[str, Any]

This module’s metadata.

Returns:

Dict[str, Any]: A dictionary of this module’s metadata

TODO: Can this be a ModuleConfig object instead? (or aconfig.Config)?

property module_metadata: Dict[str, Any]

Helper property to return metadata about a Module. This function is separate from metadata as this is specific for the class module. This function also requires a flat metadata structure without nested dictionaries.

NOTE: This should be a @classmethod but using @property/@classmethod together has been deprecated

Returns:

Dict[str, str]: A dictionary of this ModuleBase’s metadata

property public_model_info: Dict[str, Any]

Helper property to return public metadata about a specific Model. This function is separate from metadata, as that contains the entire ModelConfig, which might not be suitable to share or expose.

Returns:

Dict[str, str]: A dictionary of this model’s public metadata

set_load_backend(load_backend)[source]

Method used by the model manager to indicate the load backend that was used to load this module

classmethod get_inference_signature(input_streaming: bool, output_streaming: bool, task: Type[caikit.core.TaskBase] = None) caikit.core.signature_parsing.CaikitMethodSignature | None[source]

Returns the inference method signature that is capable of running the module’s task for the given flavors of input and output streaming

classmethod get_inference_signatures(task: Type[caikit.core.TaskBase]) List[Tuple[bool, bool, caikit.core.signature_parsing.CaikitMethodSignature]][source]

Returns inference method signatures for all supported flavors of input and output streaming for a given task

property load_backend

Get the backend instance used to load this module. This can be used in module implementations that require use of a specific backend at inference time.

classmethod bootstrap(*args, **kwargs)[source]

Bootstrap a module. This method can be used to initialize the module from artifacts created outside of a particular caikit library

classmethod load(model_path: str | caikit.core.modules.config.ModuleConfig, *args, **kwargs) ModuleBase[source]

Load a new instance of this module from a given model_path

Args:

model_path (Union[str, ModuleConfig]): Path to saved model or in-memory ModuleConfig

Returns:

model (ModuleBase): A new instance of this module class

classmethod _load(module_loader, *args, **kwargs)[source]

Load a model.

classmethod timed_load(*args, **kwargs)[source]

Time a model load call.

Args:

*args (list): Will be passed to self.load.

**kwargs (dict): Will be passed to self.load; this is the only way to pass arbitrary arguments to self.load from this function.

Returns:

int, caikit.core._ModuleBase: The first return value is the total time spent in the self.load call. The second return value is the loaded model.

Notes:

You can pass everything that should go to the load function normally using args/kwargs. Example: model.timed_load(“/model/path/dir”)

validate_loaded_model(*args)[source]

Validate a loaded model.

save(model_path: str, *args, **kwargs)[source]

Save a model.

Args:

model_path (str): Path on disk to export the model to.

as_file_like_object(*args, **kwargs) io.BytesIO[source]

Produces a file-like object corresponding to a zip archive affiliated with a given model. This method is functionally similar to .save(): it saves a model into a temporary directory and produces a zip archive, then loads the result as an io.BytesIO object. The result of this function is also compatible with .load(), and cleanup is handled automatically.

Args:

*args, **kwargs (dict): Optional keyword arguments for saving.

Returns:

io.BytesIO: File-like object holding an exported model in memory as an io.BytesIO object.

as_bytes(*args, **kwargs) bytes[source]

Produces a bytes object corresponding to a zip archive affiliated with a given model. This method is functionally similar to .save(): it saves a model into a temporary directory and produces a zip archive, then loads the result as a bytes object. The result of this function is also compatible with .load(), and cleanup is handled automatically.

Args:

*args, **kwargs (dict): Optional keyword arguments for saving.

Returns:

bytes: bytes object holding an exported model in memory.
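The save-then-zip flow described for as_file_like_object and as_bytes can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not caikit’s implementation: export_as_bytes and demo_save are hypothetical names, and demo_save stands in for a module’s save method.

```python
import io
import os
import shutil
import tempfile
import zipfile


def export_as_bytes(save_fn) -> bytes:
    """Save into a temporary directory, zip it, and return the archive bytes."""
    # The temporary directory is removed automatically when the block exits.
    with tempfile.TemporaryDirectory() as work_dir:
        model_dir = os.path.join(work_dir, "model")
        os.makedirs(model_dir)
        save_fn(model_dir)  # hypothetical stand-in for module.save(model_path)
        archive_path = shutil.make_archive(
            os.path.join(work_dir, "archive"), "zip", model_dir
        )
        with open(archive_path, "rb") as handle:
            return handle.read()


def demo_save(model_path: str) -> None:
    # Hypothetical save routine that writes a single config file.
    with open(os.path.join(model_path, "config.yml"), "w") as handle:
        handle.write("name: demo\n")


archive_bytes = export_as_bytes(demo_save)
file_like = io.BytesIO(archive_bytes)  # in-memory, file-like view of the zip
names = zipfile.ZipFile(file_like).namelist()
```

Wrapping the same bytes in io.BytesIO gives the file-like variant, so both exports can share one code path.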

run(*args, **kwargs)[source]

Run a model - this typically makes a single prediction and returns an object from the data model.

run_batch(*args, **kwargs)[source]

Run a model in batch mode - this typically ingests an iterable of inputs that can be applied to run & returns a list of data model objects that run ordinarily returns. A module may override this method to provide faster evaluation capabilities, e.g., by leveraging vectorization during prediction.

All provided args and kwargs that should be expanded with the batch should be provided as prebatched iterables. If a provided arg/kwarg is not provided as an iterable, it will be passed as is to every underlying run call, which may be needed in some rare cases, such as runtime explainability enablement.

This function is intentionally kept as simple as possible. In order to maintain its simplicity, all argument iterables must be the same length, where the length of every provided iterable is presumed to be the batch size. If an iterable must be passed as arg to each run call, batch run must be called by wrapping it in another iterable and duplicating the iterable arg to match the size, or (ideally) overridden in the subclass as necessary.

Args:

*args: Variable length argument list to be passed directly to run().

**kwargs: Arbitrary keyword arguments to be passed directly to run().

Returns:

tuple: Iterable of prediction outputs, run as a batch.
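The batching rules above (prebatched iterables are indexed per call, everything else is repeated) can be sketched as follows. This is a simplified illustration, not caikit’s code: run here is a hypothetical stand-in for a module’s run method, and is_expandable treats any list or tuple as expandable, which is looser than the check described for _is_expandable_iterable.

```python
def run(text, lowercase=False):
    """Hypothetical stand-in for a module's run(): one input, one prediction."""
    return text.lower() if lowercase else text


def is_expandable(arg):
    # Simplified assumption: lists and tuples count as prebatched inputs.
    return isinstance(arg, (list, tuple))


def run_batch(*args, **kwargs):
    """Call run() once per batch element, repeating non-iterable arguments."""
    sizes = {len(v) for v in list(args) + list(kwargs.values()) if is_expandable(v)}
    if len(sizes) != 1:
        raise ValueError("expected at least one iterable, all of the same length")
    batch_size = sizes.pop()
    results = []
    for idx in range(batch_size):
        # Index into expandable values; pass everything else through unchanged.
        call_args = [a[idx] if is_expandable(a) else a for a in args]
        call_kwargs = {k: v[idx] if is_expandable(v) else v for k, v in kwargs.items()}
        results.append(run(*call_args, **call_kwargs))
    return tuple(results)


outputs = run_batch(["Hello", "World"], lowercase=True)
```

A subclass that can vectorize its predictions would replace the per-element loop rather than reuse it.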

timed_run(*args, num_seconds=None, num_iterations=None, **kwargs)[source]

Time a number of runs over set seconds or iterations.

Args:

*args (list): Will be passed to self.run.

num_seconds (int): Minimum number of seconds to run timed_run over. Will most likely be more than this value because it waits for each call to self.run to finish.

num_iterations (int): Minimum number of iterations to run timed_run over. Will run exactly this many times.

**kwargs (dict): Will be passed to self.run.

Returns:

int, int, caikit.core.data_model.DataBase: The first return value is the total time spent in the self.run loop. The second return value is the total number of calls to self.run that were made. The third return value is the output of the module’s run method.

Notes:

You can pass everything that should go to the run function normally using args/kwargs. Example: model.timed_run(“some example text”, num_seconds=60)

By default it will run for greater than or equal to 120 seconds.
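A timing loop of this kind might look like the sketch below. It is an assumption-laden illustration rather than caikit’s implementation: run_fn is a hypothetical stand-in for self.run, the loop stops at whichever limit is reached first when both are given, and elapsed time is returned as a float.

```python
import time


def timed_run(run_fn, *args, num_seconds=None, num_iterations=None, **kwargs):
    """Call run_fn repeatedly; return (elapsed_seconds, call_count, last_output)."""
    if num_seconds is None and num_iterations is None:
        num_seconds = 120  # default duration mentioned in the notes above
    start = time.perf_counter()
    calls = 0
    output = None
    while True:
        if num_iterations is not None and calls >= num_iterations:
            break
        if num_seconds is not None and time.perf_counter() - start >= num_seconds:
            break
        # Each call runs to completion, so the total can exceed num_seconds.
        output = run_fn(*args, **kwargs)
        calls += 1
    return time.perf_counter() - start, calls, output


elapsed, calls, last_output = timed_run(str.upper, "some example text", num_iterations=3)
```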

stream(data_stream, *args, **kwargs)[source]

Lazily evaluate a run() on a given model by constructing a new data stream generator from the results. Note that we do not allow datastreams in args/kwargs. In rare cases, this may mean that stream() is not available, e.g., for keywords extraction. In these cases, stream() should be overridden in the subclass (module implementation) to allow and expand along multiple data streams.

Args:

data_stream (caikit.core.data_model.DataStream): Datastream to be lazily sequentially processed by the module under consideration.

*args: Variable length argument list to be passed directly to run().

**kwargs: Arbitrary keyword arguments to be passed directly to run().

Returns:

protobufs: A DataBase object.
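Lazy evaluation over a stream can be illustrated with a plain Python generator. This sketch does not use caikit’s DataStream class: run and source are hypothetical stand-ins, and the calls list exists only to show that nothing is consumed until the result is iterated.

```python
def run(text, suffix=""):
    """Hypothetical stand-in for a module's run() method."""
    return text.upper() + suffix


def stream(data_stream, *args, **kwargs):
    """Lazily apply run() to each item; no work happens until iteration."""
    return (run(item, *args, **kwargs) for item in data_stream)


calls = []


def source():
    # Records each item as it is pulled, to make the laziness observable.
    for item in ["a", "b"]:
        calls.append(item)
        yield item


results = stream(source(), suffix="!")
pulled_before_iteration = list(calls)  # still empty at this point
collected = list(results)
```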

classmethod train(*args, **kwargs)[source]

Train a model.

classmethod validate_training_data(training_data: str | caikit.core.data_model.DataStream, limit: int = -1) List[caikit.core.exceptions.validation_error.DataValidationError][source]

Validate a set of training data, passed as a filename or as a data stream. Return up to limit number of DataValidationErrors
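The “up to limit errors” behaviour can be sketched as below. Everything here is hypothetical: the record layout, the validation rule, the use of ValueError in place of DataValidationError, and the reading of limit=-1 as “no limit” are assumptions made for illustration only.

```python
def validate_training_data(records, limit=-1):
    """Collect validation errors, stopping once `limit` errors are found."""
    errors = []
    for index, record in enumerate(records):
        # Assumption: a negative limit means "report every error".
        if limit >= 0 and len(errors) >= limit:
            break
        # Hypothetical rule: every record needs a non-empty string under "text".
        if not isinstance(record.get("text"), str) or not record["text"]:
            errors.append(ValueError(f"record {index}: missing or empty text"))
    return errors


records = [{"text": "ok"}, {"text": ""}, {}, {"text": None}]
all_errors = validate_training_data(records)
first_two = validate_training_data(records, limit=2)
```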

evaluation_type = None
evaluator = None
static find_label_func(*_args, **_kwargs)[source]
Abstractmethod:

Function used to extract “label” from a prediction/result of a module’s .run method. Define if you wish to have more specific evaluation metrics. Implemented in subclass.

static find_label_data_func(*_args, **_kwargs)[source]
Abstractmethod:

Function used to extract data belonging to class “label” from a prediction/result of a module’s .run method. Define if you wish to have more specific evaluation metrics. Implemented in subclass.

evaluate_quality(dataset_path, preprocess_func=None, detailed_metrics=False, labels=None, partial_match_metrics=False, max_hierarchy_levels=3, **kwargs)[source]

Run quality evaluation for instance of module

Args:

dataset_path (str): Path to where the input “gold set” dataset lives. Most often this is a .json file.

preprocess_func (method): Function used as proxy for any preliminary steps that need to be taken to run the model on the input text. This helper function ultimately leads to the input to this module and may involve executing other modules.

detailed_metrics: boolean (Optional, defaults to False)

Only for ‘keywords’. Include partial scores and scores over every text in document.

labels: list (Optional, defaults to None)

Optional list of class labels to evaluate quality on. By default evaluation is done over all class labels. Using this, you can explicitly mention only a subset of labels to include in the quality evaluation.

partial_match_metrics: boolean (Optional, defaults to False)

Include partial match micro avg F1.

max_hierarchy_levels (int): Used in hierarchical multilabel multiclass evaluation only. The number of levels in the hierarchy to run model evaluation on, in addition to complete matches.

*args, **kwargs: Optional arguments which can be used by goldset/prediction set extraction.

Nonekeyword arguments:

block_level: str

For any module that has pre processing steps in the middle of raw text and actual module input, use the input from gold standard labels instead of a pre-process function. Useful for measuring quality for the ‘block’ alone (instead of the module + pre_process pipeline)

Returns:

dict: Dictionary of results provided by the self.evaluator.run function, depending on the associated evaluation_type. Reports things like precision, recall, and f1.
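For readers unfamiliar with the reported metrics, the sketch below computes micro-averaged precision, recall, and F1 over label sets. It is a generic illustration of those metrics, not the evaluator that caikit uses; precision_recall_f1 and the sample labels are hypothetical.

```python
def precision_recall_f1(gold_sets, pred_sets):
    """Micro-averaged precision, recall, and F1 over per-document label sets."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, pred_sets):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)  # labels predicted and present in the gold set
        fp += len(pred - gold)  # labels predicted but absent from the gold set
        fn += len(gold - pred)  # gold labels the prediction missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = [["PER", "LOC"], ["ORG"]]
pred = [["PER"], ["ORG", "LOC"]]
report = precision_recall_f1(gold, pred)
```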

static _is_expandable_iterable(arg)[source]

Check to see if something is a list / tuple of data model objects or strings. If it is, we consider it “expandable”, meaning that one element of the iterable is passed to each run call. In contrast, if something is not expandable, it will be passed as is to each call.

Args:

arg (any): Argument to run_batch being considered.

Returns:

bool: True if the argument is a compatible iterable, False otherwise.

_validate_and_extract_batch_size(*args, **kwargs)[source]

Check to ensure that there’s at least one iterable whose length is well defined, i.e., no generators, and that if multiple iterable arg/kwarg values are provided, they are all the same length.

Args:

*args: Variable length argument list to be passed directly to run().

**kwargs: Arbitrary keyword arguments to be passed directly to run().

Returns:

int: Inferred batch size based on expandable iterables.

_validate_arg_and_verify_batch_size(val, current_batch_size)[source]

Check an arg value from args/kwargs. If we find that it’s an expandable iterable, see if it conflicts with what we know about the inferred batch size so far.

Args:

val (any): Argument / keyword argument value being inspected.

current_batch_size (None | int): Current inferred batch size from previous args/kwargs, or None if no inferences have been made on other expandable iterables yet.

Returns:

None | inferred batch size.
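The incremental batch-size inference described by these two helpers can be sketched as follows. The function names and the list/tuple check are simplifications chosen for illustration; the real helpers are private methods whose exact behaviour is not shown here.

```python
def is_expandable(arg):
    # Simplified assumption: lists and tuples count as prebatched inputs.
    return isinstance(arg, (list, tuple))


def verify_batch_size(val, current_batch_size):
    """Return the batch size known after inspecting val (None if still unknown)."""
    if not is_expandable(val):
        return current_batch_size
    if current_batch_size is not None and len(val) != current_batch_size:
        raise ValueError(
            f"iterable of length {len(val)} conflicts with batch size {current_batch_size}"
        )
    return len(val)


def extract_batch_size(*args, **kwargs):
    """Infer one batch size from all expandable positional and keyword values."""
    batch_size = None
    for val in (*args, *kwargs.values()):
        batch_size = verify_batch_size(val, batch_size)
    if batch_size is None:
        raise ValueError("at least one list or tuple argument is required")
    return batch_size


size = extract_batch_size(["a", "b", "c"], "shared", labels=(1, 2, 3))
```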

static _build_args_for_default_run_with_batch(fixed_args, expanded_args, idx)[source]

Build the non-keyword arguments for run_batch’s default implementation by expanding iterable args where possible, and grouping them with repeated noniterable arguments. The index corresponds to the current document under consideration.

Args:

fixed_args (dict): Noniterable args - common across all documents.

expanded_args (dict): Iterable args - we’ll need to index into this to get our doc arg.

idx (int): Index of the document being considered.

Returns:

list: Args to be run for document [idx].

static _build_kwargs_for_default_run_with_batch(fixed_kwargs, expanded_kwargs, idx)[source]

Similar to the previous function, but for kwargs. Note that we can just clone our fixed kwargs instead of cycling through them, because order doesn’t matter here.

Args:

fixed_kwargs (dict): Noniterable valued kwargs - common across all documents.

expanded_kwargs (dict): Iterable valued kwargs - we’ll need to index into these to get our doc kwarg.

Returns:

dict: Kwargs to be run for document [idx].

_extract_gold_set(dataset)[source]

Method for extracting gold set from dataset. Implemented in subclass.

Args:

dataset (object): In-memory version of whatever is loaded from on-disk. May be json, txt, etc.

Returns:

list: List of labels in the format of the module_type that is being called.

_extract_pred_set(dataset, preprocess_func=None, **kwargs)[source]

Method for extracting pred set from dataset. Implemented in subclass.

Args:

dataset (object): In-memory version of whatever is loaded from on-disk. May be json, txt, etc.

preprocess_func (method): Function used as proxy for any preliminary steps that need to be taken to run the model on the input text. This helper function ultimately leads to the input to this module and may involve executing other modules.

**kwargs (dict): Optional keyword arguments for prediction set extraction.

Returns:

list: List of labels in the format of the module_type that is being called.

static _load_evaluation_dataset(dataset_path)[source]

Helper specifically for dataset loading.

Args:

dataset_path (str): Path to where the input ‘gold set’ dataset lives. Most often this is a .json file.

Returns:

object: list, dict, or other python object, depending on the input dataset_path extension. Currently only supports .json and uses fileio from toolkit.

static _extract_gold_annotations(gold_set)[source]

Extract the core list of annotations that is needed for quality evaluation

Args:

gold_set (list)

Returns:

gold_annotations: list

static _extract_pred_annotations(pred_set)[source]

Extract the core list of predictions that is needed for quality evaluation

Args:

pred_set (list)

Returns:

pred_annotations: list

static _generate_report(report, gold_set)[source]

Generate the quality report output.

Args:

report (dict)

gold_set (list(dict))

class caikit.core.modules.ModuleConfig(config_dict)[source]

Bases: aconfig.Config

Config object used by all modules for config loading, saving, etc.

reserved_keys = ['model_path']
classmethod load(model_path: str | ModuleConfig) ModuleConfig[source]

Load a new module configuration from a directory on disk.

Args:

model_path (Union[str, ModuleConfig]): Path to model directory. At the top level of directory is config.yml which holds info about the model. Note that the model_path here is assumed to be operating system correct as a consequence of the way this method is invoked by the model manager.

Returns:

model_config (ModuleConfig): Instantiated ModuleConfig for model given model_path.

save(model_path)[source]

Save this module configuration to a top-level config.yml file in the specified model path.

Args:

model_path (str): Path to model directory. The config.yml file will be written to this location.

Notes:

model_path must already exist! This means you must create the directory outside of this routine.

caikit.core.modules.module(id=None, name=None, version=None, task: Type[caikit.core.task.TaskBase] = None, tasks: List[Type[caikit.core.task.TaskBase]] | None = None, backend_type='LOCAL', base_module: str | Type[caikit.core.modules.base.ModuleBase] = None, backend_config_override: Dict | None = None)[source]
Apply this decorator to any class that should be treated as a caikit module (i.e., extends caikit.core.ModuleBase) and registered with caikit.core so that the library “knows” the class is a caikit module and is capable of loading instances of the module.

Args:

id: str

A UUID to use when registering this module with caikit.core. Not required if based on another caikit module using base_module.

name: str

A human-readable name for the module. Not required if based on another caikit module using base_module.

version: str

A SemVer for the module. Not required if based on another caikit module using base_module.

task: Type[TaskBase]

An ML task class that this module is an implementation for. Not required if based on another caikit module using base_module, or if multiple tasks are specified using tasks.

tasks: Optional[List[Type[TaskBase]]]

List of ML task classes that this module implements.

backend_type: backend_type

Associated backend type for the module. Default: LOCAL

base_module: str | ModuleBase

If this module is based on a different caikit module, provide name of the base module. Default: None

backend_config_override: Dict

Dictionary containing configuration required for the specific backend. Default: None

Returns:

A decorated version of the class to which it was applied, after registering the class as a valid module with caikit.core
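The registration idea behind the decorator can be shown with a toy registry. This is a self-contained sketch, not caikit’s decorator: MODULE_REGISTRY, the attribute names set on the class, and the validation rules are all assumptions, and only the id, name, and version arguments are modelled.

```python
import uuid

# Hypothetical registry mapping module id to the registered class.
MODULE_REGISTRY = {}


def module(id=None, name=None, version=None):
    """Toy decorator: attach metadata to a class and register it by id."""

    def decorator(cls):
        if id is None or name is None or version is None:
            raise ValueError("id, name, and version are required in this sketch")
        uuid.UUID(id)  # raises ValueError if id is not a valid UUID string
        if id in MODULE_REGISTRY:
            raise ValueError(f"module id {id} is already registered")
        cls.MODULE_ID = id
        cls.MODULE_NAME = name
        cls.MODULE_VERSION = version
        MODULE_REGISTRY[id] = cls
        return cls  # the same class, now registered

    return decorator


@module(id="00110203-0405-0607-0809-0a0b02dd0e0f", name="Demo module", version="0.0.1")
class DemoModule:
    def run(self, text):
        return text[::-1]


registered = MODULE_REGISTRY["00110203-0405-0607-0809-0a0b02dd0e0f"]
```

Returning the class unchanged apart from the added attributes keeps the decorated class usable exactly as before.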

class caikit.core.modules.ModuleLoader(model_path: str | caikit.core.modules.config.ModuleConfig)[source]
MODULE_PATHS_KEY = 'module_paths'
config
model_path
load_arg(arg)[source]

Extract arg value from the loaded model’s config

load_args(*args)[source]

Extract values from the loaded model’s config

load_module(module_paths_key, load_singleton=False)[source]

Load a CaikitCore module from a module config.module_paths specification.

Args:

module_paths_key (str): key in config.module_paths looked at to load a module

load_singleton (bool): singleton load flag to pass to individual module loads

load_module_list(module_paths_key)[source]

Load a list of CaikitCore modules from a workflow config.module_paths specification.

Args:

module_paths_key (str): key in config.module_paths looked at to load a list of modules

Returns:

list: list of loaded modules

class caikit.core.modules.ModuleSaver(module: caikit.core.modules.base.ModuleBase, model_path, exist_ok=True)[source]

A module saver that provides common functionality used for saving modules and also a context manager that cleans up in case an error is encountered during the save process for a model_path that did not already exist.

SAVED_KEY_NAME = 'saved'
CREATED_KEY_NAME = 'created'
TRACKING_KEY_NAME = 'tracking_id'
MODULE_VERSION_KEY_NAME = 'version'
MODULE_ID_KEY_NAME = 'module_id'
MODULE_CLASS_KEY_NAME = 'module_class'
model_path = b'.'
exist_ok = True
config
add_dir(relative_path, base_relative_path='')[source]

Create a directory inside the model_path for this saver.

Args:

relative_path (str): A path relative to this saver’s model_path denoting the directory to create.

base_relative_path (str): A path, relative to this saver’s model_path, in which relative_path will be created.

Returns:

str, str: A tuple containing both the relative_path and absolute_path to the directory created.

Examples:
>>> # `module` is assumed to be an existing ModuleBase instance
>>> with ModuleSaver(module, '/path/to/model') as saver:
...     rel_path, abs_path = saver.add_dir('word_embeddings', 'model_data')
>>> print(rel_path)
model_data/word_embeddings
>>> print(abs_path)
/path/to/model/model_data/word_embeddings
copy_file(file_path, relative_path='')[source]

Copy an external file into a subdirectory of the model_path for this saver.

Args:

file_path (str): Absolute path to the external file to copy.

relative_path (str): The relative path inside of model_path where the file will be copied to. If set to the empty string (default) then the file will be placed directly in the model_path directory.

Returns:

str, str: A tuple containing both the relative_path and absolute_path to the copied file.

save_object(obj, filename, serializer, relative_path='')[source]

Save a Python object using the provided ObjectSerializer.

Args:

obj (any): The Python object to save

filename (str): The filename to use for the saved object

serializer (ObjectSerializer): An ObjectSerializer instance (e.g., YAMLSerializer) that should be used to serialize the object

relative_path (str): The relative path inside of model_path where the object will be saved

update_config(additional_config)[source]

Add items to this saver’s config dictionary.

Args:

additional_config (dict): A dictionary of config options to add to this saver’s configuration.

Notes:

The behavior of this method matches dict.update and is equivalent to calling saver.config.update. The saver.config dictionary may be accessed directly for more sophisticated manipulation of the configuration.

save_module(module, relative_path, **kwargs)[source]

Save a CaikitCore module within a workflow artifact and add a reference to the config.

Args:

module (caikit.core.ModuleBase): The CaikitCore module to save as part of this workflow

relative_path (str): The relative path inside of model_path where the module will be saved

**kwargs: dict

key-value pair of parameters to be passed to module.save

save_module_list(modules, config_key, **kwargs)[source]

Save a list of CaikitCore modules within a workflow artifact and add a reference to the config.

Args:

modules (dict{str -> caikit.core.ModuleBase}): A dict with module relative path as key and a CaikitCore module as value to save as part of this workflow

config_key (str): The config key inside of model_path where the modules’ relative paths will be referenced

**kwargs: dict

key-value pair of parameters to be passed to module.save

Returns:

list_of_rel_path: list(str)

List of relative paths where the modules are saved

list_of_abs_path: list(str)

List of absolute paths where the modules are saved

__enter__()[source]

Enter the module saver context. This creates the model_path directory. If this context successfully exits, then the model configuration and all files it contains will be written and saved to disk inside the model_path directory.

If exist_ok is False, an exception will be raised before touching existing model_path files.

If any uncaught exceptions are thrown inside this context, and exist_ok is False, then this new model_path will be removed. If exist_ok is True, the files will be kept and may include incomplete updates.

__exit__(exc_type, exc_val, exc_tb)[source]

Exit the module saver context. If this context successfully exits, then the model configuration and all files it contains will be written and saved to disk inside the model_path directory.

If any uncaught exceptions are thrown inside this context, and exist_ok is False, then this new model_path will be removed. If exist_ok is True, the files will be kept and may include incomplete updates.
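The enter/exit behaviour described above follows a common context-manager pattern, sketched here with the standard library. CleanupOnErrorSaver is a hypothetical class, not ModuleSaver: it takes no module argument, and it writes the config as flat `key: value` lines instead of using a YAML library.

```python
import os
import shutil
import tempfile


class CleanupOnErrorSaver:
    """Toy saver: writes config on success, removes a new directory on failure."""

    def __init__(self, model_path, exist_ok=True):
        self.model_path = model_path
        self.exist_ok = exist_ok
        self.config = {}

    def __enter__(self):
        # With exist_ok=False this raises FileExistsError before touching files.
        os.makedirs(self.model_path, exist_ok=self.exist_ok)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None:
            if not self.exist_ok:
                # The directory was created by this saver, so remove it.
                shutil.rmtree(self.model_path, ignore_errors=True)
            return False  # let the exception propagate
        with open(os.path.join(self.model_path, "config.yml"), "w") as handle:
            for key, value in self.config.items():
                handle.write(f"{key}: {value}\n")
        return False


base = tempfile.mkdtemp()
ok_path = os.path.join(base, "ok_model")
with CleanupOnErrorSaver(ok_path, exist_ok=False) as saver:
    saver.config.update({"name": "demo"})

failed_path = os.path.join(base, "failed_model")
try:
    with CleanupOnErrorSaver(failed_path, exist_ok=False):
        raise RuntimeError("simulated failure during save")
except RuntimeError:
    pass

ok_saved = os.path.exists(os.path.join(ok_path, "config.yml"))
failed_removed = not os.path.exists(failed_path)
shutil.rmtree(base)
```

Removing the directory only when exist_ok is False avoids deleting files that existed before the save began.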