caikit.core
Caikit Core AI Framework library. This is the base framework for core AI/ML libraries.
Submodules
Attributes
Exceptions
This error is used for data validation problems during training
Classes
A DataObject is a data model class that is backed by a @dataclass.
Manage the models or resources for library.
Interface for creating configuration setup for backends
Interface for creating configuration setup for backends
Abstract base class from which all modules should inherit.
Config object used by all modules for config loading, saving, etc.
A module saver that provides common functionality used for saving modules and also a context manager that cleans up in case an error is encountered during the save process for a model_path that did not already exist.
The TaskBase defines the interface for an abstract AI task
Enum that contains set of all possible evaluation types.
Class that holds all evaluation logic for now. May eventually be broken up into
Abstract class for serializing an object to disk.
An ObjectSerializer for serializing to a JSON file.
An ObjectSerializer for serializing a python list to a text file.
An ObjectSerializer for serializing to a YAML file.
An ObjectSerializer for serializing to a CSV file.
An ObjectSerializer for pickling arbitrary Python objects.
Functions
Get a dictionary mapping all module IDs to the string names of the implementing classes.
Apply this decorator to any class that should be treated as a caikit module
The decorator for AI Task classes.
Load a string from a file with utf8 encoding.
Load a list of files from a text file with utf8 encoding
Write a string to a text file with utf8 encoding.
Load a binary string from a file.
Write a binary buffer to a file.
Load a csv into a list-of-lists.
Write a list-of-lists to a csv file.
Load a csv into a list-of-dicts.
Write a list of dicts to a csv file.
Load a json file into a dictionary.
Save a dictionary into a json file.
Load a yaml file into a dictionary.
Save a dictionary into a yaml file.
Load an object from a pickle file.
Save an object to a pickle file.
Write the given raw string content to output file.
Compress a given folder recursively to an archive with a given extension format
Package Contents
- class caikit.core.DataObjectBase[source]
Bases:
caikit.core.data_model.base.DataBase
A DataObject is a data model class that is backed by a @dataclass.
Data model classes that use the @dataobject decorator must derive from this base class.
- exception caikit.core.DataValidationError(reason: str, item_number: int | None = None)[source]
Bases:
Exception
This error is used for data validation problems during training
- _reason
- _item_number = None
- property reason: str
The reason given for this data validation error
- property item_number: int | None
The index of the training data item that failed validation. Probably zero indexed
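As a rough illustration of the shape described above, the following stand-in class mirrors the documented constructor arguments and the two read-only properties. It is a sketch written without importing caikit; the class name `DataValidationErrorSketch` and the message format are assumptions, not the library's implementation.

```python
from typing import Optional


class DataValidationErrorSketch(Exception):
    """Hypothetical stand-in that mirrors the documented attributes."""

    def __init__(self, reason: str, item_number: Optional[int] = None):
        # The message format below is an assumption made for this sketch
        message = f"Training data validation failed: {reason}"
        if item_number is not None:
            message += f" (item {item_number})"
        super().__init__(message)
        self._reason = reason
        self._item_number = item_number

    @property
    def reason(self) -> str:
        """The reason given for this data validation error."""
        return self._reason

    @property
    def item_number(self) -> Optional[int]:
        """Index of the training data item that failed, if known."""
        return self._item_number


error = DataValidationErrorSketch("missing label", item_number=3)
print(error.reason, error.item_number)
```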
- caikit.core.get_valid_module_ids()[source]
Get a dictionary mapping all module IDs to the string names of the implementing classes.
- class caikit.core.ModelManager[source]
Manage the models or resources for library.
- _singleton_module_cache
- _trainers
- _finders
- _job_predictors
- _initializers
- __singleton_lock
- initialize_components()[source]
Proactively initialize all configured trainer/finder/initializer component instances. This is a separate call to enable explicit config.
- train(module: Type[caikit.core.modules.base.ModuleBase] | str, *args, trainer: str | caikit.core.model_management.ModelTrainerBase = 'default', save_path: str | caikit.interfaces.common.data_model.stream_sources.S3Path | None = None, save_with_id: bool = False, model_name: str | None = None, wait: bool = False, **kwargs) caikit.core.model_management.ModelTrainerFutureBase[source]
Train an instance of the given module with the given args and kwargs using the given trainer.
Each module’s train function encapsulates the code needed to perform the training locally. This top-level train function provides the wrapper functionality to delegate the execution of the module’s train function to an alternate framework using a ModelTrainerBase. It also allows training to be launched asynchronously.
- Args:
- module (Union[Type[ModuleBase], str]): The module class or guid for the module to train
- *args: Additional positional args to pass through to the module’s train function
- Kwargs:
- trainer (Union[str, ModelTrainerBase]): The trainer to use. If given as a string, this is a key in the global config at model_management.trainers.
- save_path (Optional[Union[str, S3Path]]): Base path where the model should be saved (may be relative to a remote trainer’s filesystem, or link to S3 storage)
- save_with_id (bool): Inject the training ID into the save path for the output model
- model_name (Optional[str]): Name of model that will be appended to the end of the save_path
- wait (bool): Wait for training to complete before returning
- **kwargs: Additional keyword arguments to pass through to the module’s train function
- Returns:
- model_future (ModelTrainerFutureBase): The future handle to the model which holds the status of the in-flight training.
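The delegation pattern described for train (hand the module's train function to a trainer, get a future back, optionally block with wait) can be sketched with the standard library alone. This is only an analogy: `train_sketch`, `toy_train`, and the thread pool are hypothetical stand-ins, and the real trainers and futures are the ModelTrainerBase and ModelTrainerFutureBase types named above.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for a configured trainer; a single background worker
_executor = ThreadPoolExecutor(max_workers=1)


def train_sketch(train_fn, *args, wait: bool = False, **kwargs) -> Future:
    """Submit train_fn to the background worker and return a future handle."""
    future = _executor.submit(train_fn, *args, **kwargs)
    if wait:
        # Block until training completes; re-raises any training exception
        future.result()
    return future


def toy_train(values, scale=1):
    """Hypothetical 'training' function that just aggregates its input."""
    return sum(values) * scale


future = train_sketch(toy_train, [1, 2, 3], scale=2, wait=True)
print(future.done(), future.result())
```

With wait=False the call returns immediately and the caller polls or blocks on the returned handle later.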
- start_prediction_job(model: caikit.core.modules.base.ModuleBase, prediction_func_name: str, *args, predictor: str | caikit.core.model_management.JobPredictorBase = 'default', wait: bool = False, **kwargs) caikit.core.model_management.JobPredictorFutureBase[source]
Start a prediction job using a job_predictor.
- Args:
- model (ModuleBase): Loaded model to run prediction on
- prediction_func_name (str): String reference to name of function to run
- predictor (Union[str, JobPredictorBase], optional): Which job_predictor to use. Defaults to “default”.
- wait (bool, optional): Whether to wait for job to finish. Defaults to False.
- Returns:
- JobPredictorFutureBase: Future to track job result
- get_model_future(training_id: str) caikit.core.model_management.ModelTrainerFutureBase[source]
Get the future handle to an in-progress training
- Args:
- training_id (str): The ID string from the original training
submission’s ModelFuture
- Returns:
- model_future (ModelTrainerFutureBase): The future handle
to the model which holds the status of the in-flight training.
- get_prediction_future(prediction_id: str) caikit.core.model_management.JobPredictorFutureBase[source]
Get the future handle to an in-progress prediction job
- Args:
- prediction_id (str): The ID string from the original prediction
submission’s ModelFuture
- Returns:
- prediction_future (JobPredictorFutureBase): The future handle
to the job which holds the status of the in-flight prediction.
- load(module_path: str | io.BytesIO | bytes, *, load_singleton: bool = False, finder: str | caikit.core.model_management.ModelFinderBase = 'default', initializer: str | caikit.core.model_management.ModelInitializerBase = 'default', **kwargs)[source]
Load a model and return an instantiated object on which we can run inference.
- Args:
- module_path (str | BytesIO | bytes): A module path (identifier) to one of the following:
1. A directory containing a yaml config file in the top level.
2. A zip archive containing either a yaml config file in the top level when extracted, or a directory containing a yaml config file in the top level.
3. A BytesIO object corresponding to a zip archive containing either a yaml config file in the top level when extracted, or a directory containing a yaml config file in the top level.
4. A bytes object corresponding to a zip archive containing either a yaml config file in the top level when extracted, or a directory containing a yaml config file in the top level.
5. A string that is understood by the configured finder/initializer
- Kwargs:
- load_singleton (bool): Load this model as a singleton
- finder (Union[str, ModelFinderBase]): Finder to use when loading this model. If passed as a string, this names the finder in the global config model_management.finders section.
- initializer (Union[str, ModelInitializerBase]): Loader to use when initializing this model. If passed as a string, this is the name of the initializer in the global config model_management.initializers section.
- Returns:
- model (ModuleBase): Model object that is loaded, configured, and ready for prediction.
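The five accepted forms of module_path listed above suggest a type-and-filesystem dispatch. The helper below is a hypothetical sketch of such a classification using only the standard library; the function name and the returned labels are assumptions, and the actual dispatch inside load may differ.

```python
import io
import os
import zipfile


def classify_module_path(module_path) -> str:
    """Label which of the documented input forms a module_path matches."""
    if isinstance(module_path, bytes):
        return "bytes"
    if isinstance(module_path, io.BytesIO):
        return "BytesIO"
    if isinstance(module_path, str):
        if os.path.isdir(module_path):
            return "directory"
        if os.path.isfile(module_path) and zipfile.is_zipfile(module_path):
            return "zip archive"
        # Anything else is left for the configured finder/initializer
        return "finder/initializer string"
    raise TypeError(f"Unsupported module_path type: {type(module_path)!r}")


print(classify_module_path(b"raw archive bytes"))
print(classify_module_path(io.BytesIO(b"raw archive bytes")))
```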
- extract(zip_path: str, model_path: str, force_overwrite: bool = False) str[source]
Method to extract a downloaded archive to a specified directory.
- Args:
- zip_path (str): Location of .zip file to extract.
- model_path (str): Model directory where the archive should be unzipped.
- force_overwrite (bool): Force an overwrite to model_path, even if the folder exists. Defaults to False.
- Returns:
- str: Output path where the model archive is unzipped.
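A minimal sketch of the extraction behavior described above, using only the standard library. The name `extract_sketch` is hypothetical, and the handling of an existing folder when force_overwrite is False (skip extraction and return the existing path) is an assumption, since the reference text does not state it.

```python
import os
import shutil
import zipfile


def extract_sketch(zip_path: str, model_path: str, force_overwrite: bool = False) -> str:
    """Unzip zip_path into model_path and return the output directory."""
    if os.path.exists(model_path):
        if not force_overwrite:
            # Assumption: leave an existing folder untouched and reuse it
            return model_path
        # Overwrite requested: remove the old contents before extracting
        shutil.rmtree(model_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(model_path)
    return model_path
```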
- resolve_and_load(path_or_name_or_model_reference: str | caikit.core.modules.base.ModuleBase, **kwargs)[source]
Try our best to load a model, given a path or a name. Simply returns any loaded model passed in. This exists to ease the burden on workflow developers who need to accept individual modules in their API, where users may have references to custom models or may only have the ability to give the name of a stock model.
- Args:
- path_or_name_or_model_reference (str, ModuleBase): Either a
Path to a model on disk
Name of a model that the catalog knows about
Loaded module
- **kwargs: Any keyword arguments to pass along to ModelManager.load()
or ModelManager.download()
e.g. parent_dir
- Returns:
A loaded module
- Examples:
>>> stock_syntax_model = manager.resolve_and_load('syntax_izumo_en_stock')
>>> local_categories_model = manager.resolve_and_load('path/to/categories/model')
>>> some_custom_model = manager.resolve_and_load(some_custom_model)
- get_singleton_model_cache_info()[source]
Returns information about the singleton cache in {hash: module type} format
- Returns:
Dict[str, type]: A dictionary of model hashes to model types
- clear_singleton_cache()[source]
Clears the cache of singleton models. Useful to release references of models, as long as you know that they are no longer held elsewhere and you won’t be loading them again.
- Returns:
None
- get_trainer(trainer: str | caikit.core.model_management.ModelTrainerBase) caikit.core.model_management.ModelTrainerBase[source]
Get the configured model trainer or the one passed by value
- get_finder(finder: str | caikit.core.model_management.ModelFinderBase) caikit.core.model_management.ModelFinderBase[source]
Get the configured model finder or the one passed by value
- get_initializer(initializer: str | caikit.core.model_management.ModelInitializerBase) caikit.core.model_management.ModelInitializerBase[source]
Get the configured model initializer or the one passed by value
- get_predictor(inferencer: str | caikit.core.model_management.JobPredictorBase) caikit.core.model_management.JobPredictorBase[source]
Get the configured job predictor or the one passed by value
- get_module_backends(initialize: bool = True) List[caikit.core.module_backends.base.BackendBase][source]
Convenience method to get access to the configured module backends if any have been configured
- Args:
initialize (bool): Initialize the components from config
- Returns:
- backends (List[BackendBase]): The list of backend instances that
have been configured
- _do_load(module_path, load_singleton, finder, initializer, **kwargs)[source]
Load a model from a directory.
- Args:
- module_path (str): Path to directory. At the top level of directory is config.yml which holds info about the model.
- load_singleton (bool): Load this model as a singleton
- finder (Union[str, ModelFinderBase]): Finder to use when loading this model. If passed as a string, this names the finder in the global config model_management.finders section.
- initializer (Union[str, ModelInitializerBase]): Loader to use when loading this model. If passed as a string, this is the name of the initializer in the global config model_management.initializers section.
- Returns:
- subclass of caikit.core.modules.ModuleBase: Model object that is
loaded, configured, and ready for prediction.
- _load_from_zipfile(module_path, load_singleton, finder, initializer, **kwargs)[source]
Load a model from a zip archive.
- Args:
- module_path (str): Path to directory. At the top level of directory is config.yml which holds info about the model.
- load_singleton (bool): Load this model as a singleton
- finder (Union[str, ModelFinderBase]): Finder to use when loading this model. If passed as a string, this names the finder in the global config model_management.finders section.
- initializer (Union[str, ModelInitializerBase]): Loader to use when loading this model. If passed as a string, this is the name of the initializer in the global config model_management.initializers section.
- Returns:
- subclass of caikit.core.modules.ModuleBase: Model object that is
loaded, configured, and ready for prediction.
- _singleton_lock(load_singleton: bool)[source]
Helper contextmanager that will only lock the singleton cache if this load is a singleton load
- static _get_component(component: str | caikit.core.toolkit.factory.FactoryConstructible, component_dict: Dict[str, caikit.core.toolkit.factory.FactoryConstructible], component_factory: caikit.core.toolkit.factory.Factory, component_name: str, component_cfg: dict, component_type: type) caikit.core.toolkit.factory.FactoryConstructible[source]
Common logic for resolving components from config
- NOTE: This is done lazily to avoid relying on import order and to allow
for dynamic config changes
- class caikit.core.BackendBase(config: aconfig.Config | None = None)[source]
Bases:
abc.ABC
Interface for creating configuration setup for backends
- config
- _started = False
- _start_lock
- class property backend_type
- Abstractmethod:
Property storing type of the backend
- property is_started
- abstract register_config(config)[source]
Function to allow dynamic merging of configs. This can be useful if particular implementations (modules) need to register explicit configurations before starting the backend.
- abstract start()[source]
Function to start a distributed backend. This function should set self._started variable to True
- abstract stop()[source]
Function to stop a distributed backend. This function should set self._started variable to False
- handle_runtime_context(model_id: str, runtime_context: caikit.core.data_model.runtime_context.RuntimeServerContextType)[source]
Update backend state for the given model based on a runtime request.
Some backends may need to handle runtime context information for the target model in order to correctly configure the backend before finding and loading the model. By default, this is a No-Op.
- Args:
- model_id (str): The unique ID of the model that is referenced by the
runtime context
- runtime_context (RuntimeServerContextType): The context for the
given runtime request
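The contract described above (abstract register_config, start, and stop, with start and stop toggling self._started) can be illustrated with a small stand-in hierarchy. `BackendSketch` and `ToyBackend` are hypothetical classes built on the standard library; they do not import or subclass the real BackendBase, and a plain dict stands in for aconfig.Config.

```python
import threading
from abc import ABC, abstractmethod
from typing import Optional


class BackendSketch(ABC):
    """Hypothetical stand-in mirroring the documented backend interface."""

    def __init__(self, config: Optional[dict] = None):
        self.config = dict(config or {})
        self._started = False
        self._start_lock = threading.Lock()

    @property
    def is_started(self) -> bool:
        return self._started

    @abstractmethod
    def register_config(self, config: dict) -> None:
        """Merge extra configuration before the backend starts."""

    @abstractmethod
    def start(self) -> None:
        """Start the backend and set self._started to True."""

    @abstractmethod
    def stop(self) -> None:
        """Stop the backend and set self._started to False."""


class ToyBackend(BackendSketch):
    backend_type = "TOY"

    def register_config(self, config: dict) -> None:
        self.config.update(config)

    def start(self) -> None:
        with self._start_lock:
            self._started = True

    def stop(self) -> None:
        with self._start_lock:
            self._started = False


backend = ToyBackend({"workers": 1})
backend.register_config({"timeout": 30})
backend.start()
print(backend.is_started, backend.config)
```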
- class caikit.core.LocalBackend(config: aconfig.Config | None = None)[source]
Bases:
caikit.core.module_backends.base.BackendBase
Interface for creating configuration setup for backends
- backend_type = 'LOCAL'
Property storing type of the backend
- class caikit.core.ModuleBase[source]
Abstract base class from which all modules should inherit.
- _metadata
- _load_backend = None
- property metadata: Dict[str, Any]
This module’s metadata.
- Returns:
Dict[str, Any]: A dictionary of this module’s metadata
TODO: Can this be a ModuleConfig object instead? (or aconfig.Config)?
- property module_metadata: Dict[str, Any]
Helper property to return metadata about a Module. This function is separate from metadata as this is specific for the class module. This function also requires a flat metadata structure without nested dictionaries.
NOTE: This should be a @classmethod but using @property/@classmethod together has been deprecated
- Returns:
Dict[str, str]: A dictionary of this ModuleBase’s metadata
- property public_model_info: Dict[str, Any]
Helper property to return public metadata about a specific Model. This function is separate from metadata as that contains the entire ModelConfig which might not want to be shared/exposed.
- Returns:
Dict[str, str]: A dictionary of this model’s public metadata
- set_load_backend(load_backend)[source]
Method used by the model manager to indicate the load backend that was used to load this module
- classmethod get_inference_signature(input_streaming: bool, output_streaming: bool, task: Type[caikit.core.TaskBase] = None) caikit.core.signature_parsing.CaikitMethodSignature | None[source]
Returns the inference method signature that is capable of running the module’s task for the given flavors of input and output streaming
- classmethod get_inference_signatures(task: Type[caikit.core.TaskBase]) List[Tuple[bool, bool, caikit.core.signature_parsing.CaikitMethodSignature]][source]
Returns inference method signatures for all supported flavors of input and output streaming for a given task
- property load_backend
Get the backend instance used to load this module. This can be used in module implementations that require use of a specific backend at inference time.
- classmethod bootstrap(*args, **kwargs)[source]
Bootstrap a module. This method can be used to initialize the module from artifacts created outside of a particular caikit library
- classmethod load(model_path: str | caikit.core.modules.config.ModuleConfig, *args, **kwargs) ModuleBase[source]
Load a new instance of workflow from a given model_path
- Args:
- model_path (Union[str, ModuleConfig]): Path to saved model or
in-memory ModuleConfig
- Returns:
model (ModuleBase): A new instance of this module class
- classmethod timed_load(*args, **kwargs)[source]
Time a model load call.
- Args:
- *args (list): Will be passed to self.load.
- **kwargs (dict): Will be passed to self.load; this is the only way to pass arbitrary arguments to self.load from this function.
- Returns:
- int, caikit.core._ModuleBase: The first return value is the total time spent in the self.load call. The second return value is the loaded model.
- Notes:
You can pass everything that should go to the load function normally using args/kwargs. Example: model.timed_load("/model/path/dir")
- save(model_path: str, *args, **kwargs)[source]
Save a model.
- Args:
model_path (str): Path on disk to export the model to.
- as_file_like_object(*args, **kwargs) io.BytesIO[source]
Produces a file-like object corresponding to a zip archive affiliated with a given model. This method is functionally similar to .save(): it saves a model into a temporary directory and produces a zip archive, then loads the result as an io.BytesIO object. The result of this function is also compatible with .load(), and cleanup is handled automatically.
- as_bytes(*args, **kwargs) bytes[source]
Produces a bytes object corresponding to a zip archive affiliated with a given model. This method is functionally similar to .save(): it saves a model into a temporary directory and produces a zip archive, then loads the result as a bytes object. The result of this function is also compatible with .load(), and cleanup is handled automatically.
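The save-to-temporary-directory, zip, and read-back sequence described for as_file_like_object and as_bytes can be sketched with the standard library. The helper `as_bytes_sketch` and the `toy_save` callback are hypothetical; a real module would call its own save method in place of the callback.

```python
import io
import os
import shutil
import tempfile
import zipfile


def as_bytes_sketch(save_fn) -> bytes:
    """Save into a temp dir, zip it, and return the archive as bytes."""
    with tempfile.TemporaryDirectory() as work_dir:
        model_dir = os.path.join(work_dir, "model")
        os.makedirs(model_dir)
        save_fn(model_dir)
        # make_archive appends ".zip" and returns the full archive path
        archive_path = shutil.make_archive(
            os.path.join(work_dir, "model_archive"), "zip", root_dir=model_dir
        )
        with open(archive_path, "rb") as handle:
            return handle.read()
    # The temporary directory is removed automatically on exit


def toy_save(model_path: str) -> None:
    """Hypothetical save function that writes a minimal config file."""
    with open(os.path.join(model_path, "config.yml"), "w", encoding="utf-8") as handle:
        handle.write("module_id: toy\n")


payload = as_bytes_sketch(toy_save)
file_like = io.BytesIO(payload)  # the file-like variant wraps the same bytes
names = zipfile.ZipFile(file_like).namelist()
print(names)
```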
- run(*args, **kwargs)[source]
Run a model - this typically makes a single prediction and returns an object from the data model.
- run_batch(*args, **kwargs)[source]
Run a model in batch mode - this typically ingests an iterable of inputs that can be applied to run & returns a list of data model objects that run ordinarily returns. A module may override this method to provide faster evaluation capabilities, e.g., by leveraging vectorization during prediction.
All provided args and kwargs that should be expanded with the batch should be provided as prebatched iterables. If a provided arg/kwarg is not provided as an iterable, it will be passed as is to all self contained run calls, which may be the case in some rare cases, such as runtime explainability enablement.
This function is intentionally kept as simple as possible. In order to maintain its simplicity, all argument iterables must be the same length, where the length of every provided iterable is presumed to be the batch size. If an iterable must be passed as arg to each run call, batch run must be called by wrapping it in another iterable and duplicating the iterable arg to match the size, or (ideally) overridden in the subclass as necessary.
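The expansion rule described above (batched iterables are indexed per call, everything else is repeated, and all batched iterables must share one length) can be sketched as below. This is a simplified stand-in: `run_batch_sketch`, `is_expandable`, and `toy_run` are hypothetical names, and treating any non-empty list or tuple as batched is an assumption of the sketch.

```python
def is_expandable(arg) -> bool:
    """Assumption: any non-empty list or tuple counts as a batched input."""
    return isinstance(arg, (list, tuple)) and len(arg) > 0


def run_batch_sketch(run_fn, *args, **kwargs):
    """Call run_fn once per batch element, repeating non-batched values."""
    candidates = list(args) + list(kwargs.values())
    lengths = {len(value) for value in candidates if is_expandable(value)}
    if not lengths:
        raise ValueError("At least one batched iterable is required")
    if len(lengths) > 1:
        raise ValueError("All batched iterables must have the same length")
    batch_size = lengths.pop()
    results = []
    for idx in range(batch_size):
        call_args = [a[idx] if is_expandable(a) else a for a in args]
        call_kwargs = {
            key: (value[idx] if is_expandable(value) else value)
            for key, value in kwargs.items()
        }
        results.append(run_fn(*call_args, **call_kwargs))
    return results


def toy_run(text, suffix="", upper=False):
    """Hypothetical single-input run function."""
    out = text + suffix
    return out.upper() if upper else out


outputs = run_batch_sketch(toy_run, ["a", "b"], suffix=["1", "2"], upper=True)
print(outputs)
```

Here `upper=True` is not a list, so it is passed unchanged to both calls, while the two lists are consumed element by element.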
- timed_run(*args, num_seconds=None, num_iterations=None, **kwargs)[source]
Time a number of runs over set seconds or iterations.
- Args:
- *args (list): Will be passed to self.run.
- num_seconds (int): Minimum number of seconds to run timed_run over. The actual duration will most likely exceed this value because the loop waits for each call to self.run to finish.
- num_iterations (int): Number of iterations to run timed_run over. Will run exactly this many times.
- **kwargs (dict): Will be passed to self.run.
- Returns:
- int, int, caikit.core.data_model.DataBase: The first return value is the total time spent in the self.run loop. The second return value is the total number of calls made to self.run. The third return value is the output of the module’s run method.
- Notes:
You can pass everything that should go to the run function normally using args/kwargs. Example: model.timed_run("some example text", num_seconds=60)
By default it will run for greater than or equal to 120 seconds.
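A timing loop of the kind described above might look like the following sketch. `timed_run_sketch` is a hypothetical helper; the 120-second fallback comes from the note above, while stopping at whichever limit is reached first when both are supplied is an assumption.

```python
import time


def timed_run_sketch(run_fn, *args, num_seconds=None, num_iterations=None, **kwargs):
    """Repeat run_fn until a time or iteration limit is reached."""
    if num_seconds is None and num_iterations is None:
        num_seconds = 120  # default duration mentioned in the notes
    start = time.perf_counter()
    iterations = 0
    while True:
        result = run_fn(*args, **kwargs)
        iterations += 1
        elapsed = time.perf_counter() - start
        # Each call finishes before the limits are checked, so the total
        # time can exceed num_seconds
        if num_iterations is not None and iterations >= num_iterations:
            break
        if num_seconds is not None and elapsed >= num_seconds:
            break
    return elapsed, iterations, result


elapsed, count, last_result = timed_run_sketch(len, "some example text", num_iterations=5)
print(count, last_result)
```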
- stream(data_stream, *args, **kwargs)[source]
Lazily evaluate a run() on a given model by constructing a new data stream generator from the results. Note that we do not allow datastreams in args/kwargs. In rare cases, this may mean that stream() is not available, e.g., for keywords extraction. In these cases, stream() should be overridden in the subclass (module implementation) to allow and expand along multiple data streams.
- Args:
- data_stream (caikit.core.data_model.DataStream): Datastream to be
lazily sequentially processed by the module under consideration.
- *args: Variable length argument list to be passed directly to run().
- **kwargs: Arbitrary keyword arguments to be passed directly to run().
- Returns:
protobufs: A DataBase object.
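Lazy evaluation over a stream can be illustrated with a plain generator: nothing runs until the result is iterated. The names `stream_sketch` and `toy_run` are hypothetical, and an ordinary Python iterator stands in for the DataStream type.

```python
def stream_sketch(run_fn, data_stream, *args, **kwargs):
    """Yield run_fn(item, ...) for each item, only as results are consumed."""
    for item in data_stream:
        yield run_fn(item, *args, **kwargs)


calls = []


def toy_run(text, repeat=1):
    """Hypothetical run function that records each invocation."""
    calls.append(text)
    return text * repeat


lazy_results = stream_sketch(toy_run, iter(["a", "b"]), repeat=2)
calls_before = list(calls)  # still empty: no item has been processed yet
results = list(lazy_results)
print(calls_before, results)
```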
- classmethod validate_training_data(training_data: str | caikit.core.data_model.DataStream, limit: int = -1) List[caikit.core.exceptions.validation_error.DataValidationError][source]
Validate a set of training data, passed as a filename or as a data stream. Return up to limit number of DataValidationErrors
- evaluation_type = None
- evaluator = None
- static find_label_func(*_args, **_kwargs)[source]
- Abstractmethod:
Function used to extract “label” from a prediction/result of a module’s .run method. Define if you wish to have more specific evaluation metrics. Implemented in subclass.
- static find_label_data_func(*_args, **_kwargs)[source]
- Abstractmethod:
Function used to extract data belonging to class “label” from a prediction/result of a module’s .run method. Define if you wish to have more specific evaluation metrics. Implemented in subclass.
- evaluate_quality(dataset_path, preprocess_func=None, detailed_metrics=False, labels=None, partial_match_metrics=False, max_hierarchy_levels=3, **kwargs)[source]
Run quality evaluation for instance of module
- Args:
- dataset_path (str): Path to where the input “gold set” dataset
lives. Most often this is a .json file.
- preprocess_func (method): Function used as proxy for any preliminary
steps that need to be taken to run the model on the input text. This helper function ultimately leads to the input to this module and may involve executing other modules.
- detailed_metrics: boolean (Optional, defaults to False)
Only for ‘keywords’. Include partial scores and scores over every text in document.
- labels: list (Optional, defaults to None)
Optional list of class labels to evaluate quality on. By default evaluation is done over all class labels. Using this, you can explicitly mention only a subset of labels to include in the quality evaluation.
- partial_match_metrics: boolean (Optional, defaults to False)
Include partial match micro avg F1.
- max_hierarchy_levels (int): Used in hierarchical multilabel
multiclass evaluation only. The number of levels in the hierarchy to run model evaluation on, in addition to complete matches.
- *args, **kwargs: Optional arguments which can be used by goldset/prediction set extraction. Keyword arguments:
- block_level (str): For any module that has preprocessing steps between the raw text and the actual module input, use the input from gold standard labels instead of a pre-process function. Useful for measuring quality for the ‘block’ alone (instead of the module + pre_process pipeline)
- Returns:
- dict: Dictionary of results provided by the self.evaluator.run
function, depending on the associated evaluation_type. Reports things like precision, recall, and f1.
- static _is_expandable_iterable(arg)[source]
Check to see if something is a list / tuple of data model objects or strings. If it is, we consider it “expandable”, meaning that each element of the iterable is passed to one run call. In contrast, if something is not expandable, it will be passed as is to each call.
- Args:
arg (any): Argument to run_batch being considered.
- Returns:
- bool: True if the argument is a compatible iterable, False
otherwise.
- _validate_and_extract_batch_size(*args, **kwargs)[source]
Check to ensure that there’s at least one iterable whose length is well defined, i.e., no generators, and that if multiple iterable arg/kwarg values are provided, they are all the same length.
- _validate_arg_and_verify_batch_size(val, current_batch_size)[source]
Check an arg value from args/kwargs. If we find that it’s an expandable iterable, see if it conflicts with what we know about the inferred batch size so far.
- Args:
- val (any): Argument / keyword argument value being inspected.
- current_batch_size (None | int): Current inferred batch size from previous args/kwargs, or None if no inferences have been made on other expandable iterables yet.
- Returns:
None | inferred batch size.
- static _build_args_for_default_run_with_batch(fixed_args, expanded_args, idx)[source]
Build the non keyword arguments for run_batch’s default implementation by expanding iterable args where possible, and grouping them with repeated noniterable arguments. The index corresponds to the current document under consideration.
- Args:
- fixed_args (dict): Noniterable args - common across all documents.
- expanded_args (dict): Iterable args - we’ll need to index into this to get our doc arg.
- idx (int): Index of the document being considered.
- Returns:
list: Args to be run for document [idx].
- static _build_kwargs_for_default_run_with_batch(fixed_kwargs, expanded_kwargs, idx)[source]
Similar to the previous function, but for kwargs. Note that we can just clone our fixed kwargs instead of cycling through them, because order doesn’t matter here.
- Args:
- fixed_kwargs (dict): Noniterable valued kwargs - common across all documents.
- expanded_kwargs (dict): Iterable valued kwargs - we’ll need to index into these to get our doc kwarg.
- Returns:
dict: Kwargs to be run for document [idx].
- _extract_gold_set(dataset)[source]
Method for extracting gold set from dataset. Implemented in subclass.
- Args:
- dataset (object): In-memory version of whatever is loaded from on-
disk. May be json, txt, etc.
- Returns:
- list: List of labels in the format of the module_type that is being
called.
- _extract_pred_set(dataset, preprocess_func=None, **kwargs)[source]
Method for extracting pred set from dataset. Implemented in subclass.
- Args:
- dataset (object): In-memory version of whatever is loaded from on-
disk. May be json, txt, etc.
- preprocess_func (method): Function used as proxy for any preliminary
steps that need to be taken to run the model on the input text. This helper function ultimately leads to the input to this module and may involve executing other modules.
**kwargs (dict): Optional keyword arguments for prediction set extraction.
- Returns:
- list: List of labels in the format of the module_type that is being
called.
- static _load_evaluation_dataset(dataset_path)[source]
Helper specifically for dataset loading.
- Args:
- dataset_path (str): Path to where the input ‘gold set’ dataset
lives. Most often this is a .json file.
- Returns:
- object: list, dict, or other python object, depending on the input
dataset_path extension. Currently only supports .json and uses fileio from toolkit.
- static _extract_gold_annotations(gold_set)[source]
Extract the core list of annotations that is needed for quality evaluation
- Args:
gold_set (list)
- Returns:
gold_annotations: list
- class caikit.core.ModuleConfig(config_dict)[source]
Bases:
aconfig.Config
Config object used by all modules for config loading, saving, etc.
- reserved_keys = ['model_path']
- classmethod load(model_path: str | ModuleConfig) ModuleConfig[source]
Load a new module configuration from a directory on disk.
- Args:
- model_path (Union[str, ModuleConfig]): Path to model directory. At
the top level of directory is config.yml which holds info about the model. Note that the model_path here is assumed to be operating system correct as a consequence of the way this method is invoked by the model manager.
- Returns:
- model_config (ModuleConfig): Instantiated ModuleConfig for model
given model_path.
- save(model_path)[source]
Save this module configuration to a top-level config.yml file in the specified model path.
- Args:
- model_path (str): Path to model directory. The config.yml file will be written to this location.
- Notes:
model_path must already exist! This means you must create the directory outside of this routine.
- class caikit.core.ModuleLoader(model_path: str | caikit.core.modules.config.ModuleConfig)[source]
- MODULE_PATHS_KEY = 'module_paths'
- config
- model_path
- class caikit.core.ModuleSaver(module: caikit.core.modules.base.ModuleBase, model_path, exist_ok=True)[source]
A module saver that provides common functionality used for saving modules and also a context manager that cleans up in case an error is encountered during the save process for a model_path that did not already exist.
- SAVED_KEY_NAME = 'saved'
- CREATED_KEY_NAME = 'created'
- TRACKING_KEY_NAME = 'tracking_id'
- MODULE_VERSION_KEY_NAME = 'version'
- MODULE_ID_KEY_NAME = 'module_id'
- MODULE_CLASS_KEY_NAME = 'module_class'
- model_path = b'.'
- exist_ok = True
- config
- add_dir(relative_path, base_relative_path='')[source]
Create a directory inside the model_path for this saver.
- Args:
- relative_path (str): A path relative to this saver’s model_path
denoting the directory to create.
- base_relative_path (str): A path, relative to this saver’s
model_path, in which relative_path will be created.
- Returns:
- str, str: A tuple containing both the relative_path and
absolute_path to the directory created.
- Examples:
>>> with ModelSaver('/path/to/model') as saver:
>>>     rel_path, abs_path = saver.add_dir('word_embeddings', 'model_data')
>>>     print(rel_path)
model_data/word_embeddings
>>>     print(abs_path)
/path/to/model/model_data/word_embeddings
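The path composition shown in the example above can be reproduced with a small standalone helper. `add_dir_sketch` is a hypothetical function that takes the model path explicitly instead of reading it from a saver instance; it only illustrates how the relative and absolute paths relate.

```python
import os
import tempfile


def add_dir_sketch(model_path: str, relative_path: str, base_relative_path: str = ""):
    """Create base_relative_path/relative_path under model_path."""
    rel_path = os.path.join(base_relative_path, relative_path)
    abs_path = os.path.join(model_path, rel_path)
    os.makedirs(abs_path, exist_ok=True)
    return rel_path, abs_path


with tempfile.TemporaryDirectory() as model_path:
    rel_path, abs_path = add_dir_sketch(model_path, "word_embeddings", "model_data")
    created = os.path.isdir(abs_path)
print(rel_path, created)
```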
- copy_file(file_path, relative_path='')[source]
Copy an external file into a subdirectory of the model_path for this saver.
- Args:
- file_path (str): Absolute path to the external file to copy.
- relative_path (str): The relative path inside of model_path where the file will be copied to. If set to the empty string (default) then the file will be placed directly in the model_path directory.
- Returns:
- str, str: A tuple containing both the relative_path and
absolute_path to the copied file.
- save_object(obj, filename, serializer, relative_path='')[source]
Save a Python object using the provided ObjectSerializer.
- Args:
- obj (any): The Python object to save
- filename (str): The filename to use for the saved object
- serializer (ObjectSerializer): An ObjectSerializer instance (e.g., YAMLSerializer) that should be used to serialize the object
- relative_path (str): The relative path inside of model_path where the object will be saved
- update_config(additional_config)[source]
Add items to this saver’s config dictionary.
- Args:
- additional_config (dict): A dictionary of config options to add to
this saver’s configuration.
- Notes:
The behavior of this method matches dict.update and is equivalent to calling saver.config.update. The saver.config dictionary may be accessed directly for more sophisticated manipulation of the configuration.
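Since the note above ties update_config to dict.update, the effect can be shown on a plain dictionary. The keys and values here are made up for illustration, and the dict is only a stand-in for saver.config.

```python
# A plain dict stands in for saver.config (hypothetical values).
config = {"module_id": "example-id", "version": "0.1.0"}

# dict.update adds new keys and overwrites existing ones in place.
config.update({"version": "0.2.0", "tokenizer": "whitespace"})
```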
- save_module(module, relative_path, **kwargs)[source]
Save a CaikitCore module within a workflow artifact and add a reference to the config.
- Args:
- module (caikit.core.ModuleBase): The CaikitCore module to save as
part of this workflow
- relative_path (str): The relative path inside of model_path where
the module will be saved
- **kwargs: dict
key-value pair of parameters to be passed to module.save
- save_module_list(modules, config_key, **kwargs)[source]
Save a list of CaikitCore modules within a workflow artifact and add a reference to the config.
- Args:
- modules (dict{str -> caikit.core.ModuleBase}): A dict with module
relative path as key and a CaikitCore module as value to save as part of this workflow
- config_key (str): The config key inside of model_path where the
modules’ relative paths will be referenced
- **kwargs: dict
key-value pair of parameters to be passed to module.save
- Returns:
- list_of_rel_path: list(str)
List of relative paths where the modules are saved
- list_of_abs_path: list(str)
List of absolute paths where the modules are saved
- __enter__()[source]
Enter the module saver context. This creates the model_path directory. If this context successfully exits, then the model configuration and all files it contains will be written and saved to disk inside the model_path directory.
If exist_ok is False, an exception will be raised before touching existing model_path files.
If any uncaught exceptions are thrown inside this context, and exist_ok is False, then this new model_path will be removed. If exist_ok is True, the files will be kept and may include incomplete updates.
- __exit__(exc_type, exc_val, exc_tb)[source]
Exit the module saver context. If this context successfully exits, then the model configuration and all files it contains will be written and saved to disk inside the model_path directory.
If any uncaught exceptions are thrown inside this context, and exist_ok is False, then this new model_path will be removed. If exist_ok is True, the files will be kept and may include incomplete updates.
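The cleanup rules described for __enter__ and __exit__ can be sketched as a small context manager. `SaverSketch` is a hypothetical class written for this illustration. It only models directory creation and removal, and it omits writing the configuration on a successful exit.

```python
import os
import shutil
import tempfile


class SaverSketch:
    """Hypothetical context manager mirroring the documented cleanup rules."""

    def __init__(self, model_path, exist_ok=True):
        self.model_path = model_path
        self.exist_ok = exist_ok

    def __enter__(self):
        # With exist_ok=False this raises FileExistsError before touching
        # anything that is already in model_path.
        os.makedirs(self.model_path, exist_ok=self.exist_ok)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None and not self.exist_ok:
            # The directory was created by this context, so remove it.
            shutil.rmtree(self.model_path, ignore_errors=True)
        # Returning False lets any exception propagate to the caller.
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model")

    try:
        with SaverSketch(path, exist_ok=False):
            raise RuntimeError("simulated failure while saving")
    except RuntimeError:
        pass
    removed_after_failure = not os.path.exists(path)

    try:
        with SaverSketch(path, exist_ok=True):
            raise RuntimeError("simulated failure while saving")
    except RuntimeError:
        pass
    kept_after_failure = os.path.isdir(path)
```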
- caikit.core.module(id=None, name=None, version=None, task: Type[caikit.core.task.TaskBase] = None, tasks: List[Type[caikit.core.task.TaskBase]] | None = None, backend_type='LOCAL', base_module: str | Type[caikit.core.modules.base.ModuleBase] = None, backend_config_override: Dict | None = None)[source]
- Apply this decorator to any class that should be treated as a caikit module
(i.e., extends caikit.core.ModuleBase) and registered with caikit.core so that the library “knows” the class is a caikit module and is capable of loading instances of the module.
- Args:
- id: str
A UUID to use when registering this module with caikit.core. Not required if based on another caikit module using base_module.
- name: str
A human-readable name for the module. Not required if based on another caikit module using base_module.
- version: str
A SemVer for the module. Not required if based on another caikit module using base_module.
- task: Type[TaskBase]
An ML task class that this module is an implementation for. Not required if based on another caikit module using base_module, or if multiple tasks are specified using tasks.
- tasks: Optional[List[Type[TaskBase]]]
List of ML task classes that this module implements.
- backend_type: backend_type
Associated backend type for the module. Default: LOCAL
- base_module: str | ModuleBase
If this module is based on a different caikit module, provide name of the base module. Default: None
- backend_config_override: Dict
Dictionary containing configuration required for the specific backend. Default: None
- Returns:
A decorated version of the class to which it was applied, after registering the class as a valid module with caikit.core
- class caikit.core.TaskBase[source]
The TaskBase defines the interface for an abstract AI task
An AI task is a logical function signature which, when implemented, performs a task in some AI domain. The key property of a task is that the set of required input argument types and the output value type are consistent across all implementations of the task.
- class InferenceMethodPtr[source]
Little container class that holds a method name and its flavor of streaming. i.e. the args to a @TaskClass.taskmethod decoration.
- method_name: str
- input_streaming: bool
- output_streaming: bool
- context_arg: str | None
- classmethod taskmethod(input_streaming: bool = False, output_streaming: bool = False, context_arg: str | None = None) Callable[[_InferenceMethodBaseT], _InferenceMethodBaseT][source]
Decorates a module instancemethod and indicates whether the inputs and outputs should be handled as streams. This will trigger validation that the signature of this method is compatible with the task’s definition of input and output types.
The actual handling of validating the method and registering it is deferred until after the module class is created, which happens outside the context of this decoration.
- classmethod deferred_method_decoration(module: Type)[source]
Runs the actual decoration logic that taskmethod would have run if the module class existed during its lifetime.
Validates that all decorated methods match the task’s API expectations, and stores the signatures on the module class for access later.
- classmethod has_inference_method_decorators(module_class: Type) bool[source]
Utility that returns true iff a module has any @TaskClass.taskmethod decorations
- classmethod validate_run_signature(signature: caikit.core.signature_parsing.CaikitMethodSignature, input_streaming: bool, output_streaming: bool) None[source]
Validates that the provided method signature meets the api constraints defined in this task, for the given streaming flavors.
- Raises:
- ValueError: If no type annotations were provided on the method.
- TypeError: If the type annotations do not meet the task’s API constraints.
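The kind of check described above can be sketched with the standard inspect module. This is a simplified illustration under stated assumptions, not caikit's validation logic: `validate_signature_sketch` and the three sample functions are hypothetical, types are compared by identity, and streaming flavors are ignored.

```python
import inspect


def validate_signature_sketch(method, required_parameters, output_type):
    """Hypothetical check; the real validation in caikit is more involved."""
    signature = inspect.signature(method)
    annotations = {
        name: param.annotation
        for name, param in signature.parameters.items()
        if param.annotation is not inspect.Parameter.empty
    }
    if not annotations and signature.return_annotation is inspect.Signature.empty:
        raise ValueError("no type annotations were provided on the method")
    # Every required parameter must be present with the expected type.
    for name, expected in required_parameters.items():
        if annotations.get(name) is not expected:
            raise TypeError(f"parameter {name!r} must be annotated as {expected.__name__}")
    if signature.return_annotation is not output_type:
        raise TypeError(f"return type must be {output_type.__name__}")


def good_run(text: str, mode: str = "fast") -> float:
    return 0.0


def untyped_run(text):
    return 0.0


def wrong_run(text: int) -> float:
    return 0.0


def outcome(method):
    """Report which error, if any, the sketch raises for a method."""
    try:
        validate_signature_sketch(method, {"text": str}, float)
        return "ok"
    except (ValueError, TypeError) as err:
        return type(err).__name__


outcomes = [outcome(good_run), outcome(untyped_run), outcome(wrong_run)]
```

Note that `good_run` passes even though it has an extra `mode` argument, which matches the rule that a method may accept more than the task's minimal required inputs.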
- classmethod get_required_parameters(input_streaming: bool) Dict[str, ValidInputTypes | Type[Iterable[ValidInputTypes]]][source]
Get the set of input types required by this task
- classmethod get_output_type(output_streaming: bool) Type[caikit.core.data_model.base.DataBase][source]
Get the output type for this task
- NOTE: This method is automatically configured by the @task decorator
and should not be overwritten by child classes.
- classmethod get_visibility() bool[source]
Get the visibility for this task.
NOTE: defaults to True even if visibility wasn’t provided
- classmethod get_metadata() Dict[str, Any][source]
Get any metadata defined for this task
NOTE: defaults to an empty dict if one wasn’t provided
- caikit.core.task(unary_parameters: Dict[str, ValidInputTypes] = None, streaming_parameters: Dict[str, Type[Iterable[ValidInputTypes]]] = None, unary_output_type: Type[caikit.core.data_model.base.DataBase] = None, streaming_output_type: Type[Iterable[Type[caikit.core.data_model.base.DataBase]]] = None, visible: bool = True, metadata: Dict[str, Any] | None = None, **kwargs) Callable[[Type[TaskBase]], Type[TaskBase]][source]
The decorator for AI Task classes.
This defines an output data model type for the task, and a minimal set of required inputs that all public models implementing this task must accept.
As an example, the caikit.interfaces.nlp.SentimentTask might look like:
@task(
    unary_parameters={"raw_document": caikit.interfaces.nlp.RawDocument},
    streaming_parameters={"raw_documents": Iterable[caikit.interfaces.nlp.RawDocument]},
    unary_output_type=caikit.interfaces.nlp.SentimentPrediction,
    streaming_output_type=Iterable[caikit.interfaces.nlp.SentimentPrediction],
)
class SentimentTask(caikit.TaskBase):
    pass
and a module that implements this task might have methods like:
@module(id="b9d98408-84c2-488c-8385-9d698effe60b", task=SentimentTask)
class MyModule(ModuleBase):

    @SentimentTask.taskmethod()
    def run(self, raw_document: caikit.interfaces.nlp.RawDocument, inference_mode: str = "fast") -> caikit.interfaces.nlp.SentimentPrediction:
        # impl
        ...

    @SentimentTask.taskmethod(input_streaming=True, output_streaming=True)
    def run_bidi_stream(self, raw_documents: DataStream[caikit.interfaces.nlp.RawDocument]) -> DataStream[caikit.interfaces.nlp.SentimentPrediction]:
        # impl
        ...
Note the run function may include other arguments beyond the minimal required inputs for the task.
- Args:
- unary_parameters (Dict[str, ValidInputTypes]): The required parameters that all modules’
unary-input inference methods must contain. A dictionary of parameter name to parameter type, where the types can be in the set of:
Python primitives
Caikit data models
Iterable containers of the above
Caikit model references (maybe?)
- streaming_parameters: The same as unary_parameters, but for streaming-input inference
methods. All types must be in the form Iterable[T]
- unary_output_type (Type[DataBase]): The unary output type of the task, which all modules’
unary-output inference methods must return. This must be a caikit data model type.
- streaming_output_type (Type[Iterable[Type[DataBase]]]): The streaming output type of the
task, which all modules’ streaming-output inference methods must return. This must be in the form Iterable[T].
- visible (bool): If this task should be exposed to the end user in documentation or if
it should only be used internally
- metadata (Optional[Dict[str, Any]]): Any additional metadata that should
be included in the documentation for this task
- Returns:
- A decorator function for the task class, registering it with caikit’s core registry of
tasks.
- class caikit.core.EvalTypes(*args, **kwds)[source]
Bases: enum.Enum
Enum that contains the set of all possible evaluation types.
- SINGLELABEL_MULTICLASS = 1
- MULTILABEL_MULTICLASS = 2
- MULTILABEL_MULTICLASS_HIERARCHICAL = 3
- class caikit.core.F1Metrics[source]
- true_positive: int | None = None
- false_positive: int | None = None
- false_negative: int | None = None
- precision: float | None = None
- recall: float | None = None
- f1: float | None = None
- class caikit.core.QualityEvaluator(gold, pred)[source]
Class that holds all evaluation logic for now. May eventually be broken up into subclasses.
- gold
- pred
- run(evaluation_type, find_label_func=None, find_label_data_func=None, detailed_metrics=False, labels=None, partial_match_metrics=False, max_hierarchy_levels=3)[source]
Main entry point for evaluation.
- Args:
- evaluation_type (str): Which type of evaluation to run. Only a few
are currently supported.
- find_label_func: function to fetch labels from any one prediction, used in
multiclass multilabel evaluation, e.g., if a prediction is of the form (token, label), this function should tell us how to extract the class label from the prediction; in this case, it returns the second element of the tuple.
- find_label_data_func: function to fetch the predictions that belong to a certain label,
used only in the multiclass multilabel eval type, e.g., if the predictions for a data example look like [(tok1, labX), (tok2, labY), (tok3, labX)], then for labX the function should return [(tok1, labX), (tok3, labX)].
- detailed_metrics: flag to indicate whether or not you want detailed metrics
(currently only for the multiclass multilabel eval type). Detailed metrics give us metrics for every example, and metrics using a custom partial match function.
- labels: list (Optional, defaults to None)
Optional list of class labels to evaluate quality on. By default evaluation is done over all class labels. Using this, you can explicitly mention only a subset of labels to include in the quality evaluation.
- partial_match_metrics: flag to indicate whether or not you want partial match
micro avg metrics. (currently only for multiclass multilabel eval type)
- max_hierarchy_levels (int): Used in hierarchical multilabel
multiclass evaluation only. The number of levels in the hierarchy to run model evaluation on, in addition to complete matches.
- Returns:
dict: Full results from evaluation on dataset and model.
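The two callables described in the arguments above might look like the following for (token, label) predictions. This is a sketch only. The argument order of `find_label_data_func` (predictions first, then label) is an assumption, since the documentation does not spell out the calling convention.

```python
def find_label_func(prediction):
    """Return the class label of one (token, label) prediction."""
    return prediction[1]


def find_label_data_func(predictions, label):
    """Return every prediction in one example that carries the given label.

    The (predictions, label) argument order is assumed for illustration.
    """
    return [p for p in predictions if find_label_func(p) == label]


predictions = [("tok1", "labX"), ("tok2", "labY"), ("tok3", "labX")]
lab_x = find_label_data_func(predictions, "labX")
```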
- singlelabel_multiclass_evaluation(labels=None) dict[source]
Obtain results of evaluation for a single-label, multi-class model.
- Args:
Note: the class should be initialized with gold and pred in the following format:
- self.gold (list): List of gold set labels for every example, where each example
can have only one label, e.g., ['label1', 'label2', 'label3', 'label4']
- self.pred (list): Predicted-by-the-model set labels for every example.
- labels: list (Optional, defaults to None)
Optional list of class labels to evaluate quality on. By default evaluation is done over all class labels. Using this, you can explicitly mention only a subset of labels to include in the quality evaluation.
- Returns:
- dict: Dictionary looks like:
{
    'per_class_confusion_matrix': {'entity_type': {'true_positive': int …}},
    'macro_precision': 0 <= float <= 1,
    'macro_recall': 0 <= float <= 1,
    'macro_f1': 0 <= float <= 1,
    'micro_precision': 0 <= float <= 1,
    'micro_recall': 0 <= float <= 1,
    'micro_f1': 0 <= float <= 1,
    'overall_tp': int,
    'overall_fp': int,
    'overall_fn': int
}
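The per-class confusion matrix in the returned dictionary can be illustrated with a small counting routine. This is a sketch of the usual single-label bookkeeping, not the library's code. `confusion_counts` is a hypothetical helper, and it assumes each wrong prediction counts as a false negative for the gold class and a false positive for the predicted class.

```python
from collections import defaultdict


def confusion_counts(gold, pred):
    """Count true positives, false positives and false negatives per class."""
    counts = defaultdict(
        lambda: {"true_positive": 0, "false_positive": 0, "false_negative": 0}
    )
    for gold_label, pred_label in zip(gold, pred):
        if gold_label == pred_label:
            counts[gold_label]["true_positive"] += 1
        else:
            # A wrong prediction is a miss for the gold class and a
            # false alarm for the predicted class.
            counts[gold_label]["false_negative"] += 1
            counts[pred_label]["false_positive"] += 1
    return dict(counts)


gold = ["label1", "label2", "label3", "label1"]
pred = ["label1", "label3", "label3", "label2"]
matrix = confusion_counts(gold, pred)
```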
- multilabel_multiclass_evaluation(find_label_func, find_label_data_func, labels=None, detailed_metrics=False, partial_match_metrics=False, use_labels_for_matching=False) dict[source]
Obtain results of evaluation for a multi-label, multi-class model.
- Args:
Note: the class should be initialized with gold and pred in the following format:
- self.gold (list(list)): List of gold set labels for every example, e.g.,
[['label1', 'label2'], ['label1', 'label4']]
- self.pred (list(list)): Predicted-by-the-model set labels for every example.
- find_label_func: function to fetch labels from any one prediction
- find_label_data_func: function to fetch data that belongs to a certain class
- labels: list (Optional, defaults to None)
Optional list of class labels to evaluate quality on. By default evaluation is done over all class labels. Using this, you can explicitly mention only a subset of labels to include in the quality evaluation.
- detailed_metrics: flag to indicate whether or not you want detailed metrics
Detailed metrics give us metrics for every example, and metrics using a custom partial match function
- partial_match_metrics: flag to indicate whether or not you want partial match
micro avg metrics.
- use_labels_for_matching (bool): Indicates whether or not we should
use the output of find_label_func for metric computations, or the raw data tuples.
- Returns:
- dict: Dictionary looks like:
{
    'per_class_confusion_matrix': {'entity_type': {'true_positive': int …}},
    'macro_precision': 0 <= float <= 1,
    'macro_recall': 0 <= float <= 1,
    'macro_f1': 0 <= float <= 1,
    'micro_precision': micro_precision,
    'micro_recall': micro_recall,
    'micro_f1': micro_f1,
    'detailed_metrics': {'exact_match_precision'.., 'partial_match_precision'},
    'micro_precision_partial_match': 0 <= float <= 1,
    'micro_recall_partial_match': 0 <= float <= 1,
    'micro_f1_partial_match': 0 <= float <= 1
}
- multilabel_multiclass_hierarchical_evaluation(find_label_func_builder, find_label_data_func_builder, max_hierarchy_levels=3) dict[source]
Evaluate multilabel/multiclass over a hierarchy, e.g., for ESA categories. This method evaluates over a set number of hierarchy levels.
Because each level in the hierarchy needs to be able to compare and extract differently, we use builder funcs that create the appropriate functions for a given level of the hierarchy.
- Args:
- find_label_func_builder (function): A function that takes in a level
number (or None if full hierarchy) and returns a find_label_func for this level that can be passed to the multilabel multiclass evaluator.
- find_label_data_func_builder (function): A function that takes in a
level number (or None if full hierarchy) and returns a find_label_data_func for this level that can be passed to the multilabel multiclass evaluator.
- max_hierarchy_levels (int): The number of levels to run in the
hierarchy, in addition to complete match.
- Returns:
- dict: Dictionary, where each key is a level number, or ‘FULL’, and
maps to the dict returned by multilabel_multiclass_evaluation for that level of the hierarchy.
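A builder of the kind described above can be sketched as a closure over the level number. The details here are assumptions made for illustration: predictions are (item, label) tuples, hierarchical labels are slash-separated strings, and a level of None means the full label.

```python
def find_label_func_builder(level):
    """Build a find_label_func that keeps the first `level` parts of a label.

    Assumes slash-separated labels such as 'science/physics/optics'.
    """

    def find_label_func(prediction):
        label = prediction[1]
        if level is None:
            # Full hierarchy: compare the complete label.
            return label
        return "/".join(label.split("/")[:level])

    return find_label_func


prediction = ("doc1", "science/physics/optics")
level_1 = find_label_func_builder(1)(prediction)
level_2 = find_label_func_builder(2)(prediction)
full = find_label_func_builder(None)(prediction)
```

Each level gets its own function, so the same multilabel evaluator can be reused unchanged while the comparison granularity varies.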
- static calc_f1_score(gold, pred, match_fun=None)[source]
Calculates F1 score.
- Args:
- gold (list): List of gold annotations.
- pred (list): List of predictions.
- match_fun: Function that finds the matches and returns a tuple of matched gold, preds.
- Returns:
tuple: Precision, Recall, F1 score
- static find_partial_matches(groundtruth, prediction)[source]
- Function to find partial matches between predicted phrases and the ground truth.
A partial match means a complete predicted phrase is a part of any ground truth phrase, or a complete ground truth phrase is a part of any predicted phrase. Overlaps are not considered.
- Args:
- groundtruth (list): Groundtruth data.
- prediction (list): Predictions returned by the model.
- Returns:
- tuple: (gold_matched: set, pred_matched: set), where gold_matched holds the gold annotations that
were matched and pred_matched holds the predictions that partially or fully matched the groundtruth.
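The containment rule above can be sketched for plain strings as follows. `find_partial_matches_sketch` is a hypothetical function, not the caikit implementation. It assumes phrases are Python strings and uses character-level substring containment, so short phrases can match inside longer words.

```python
def find_partial_matches_sketch(groundtruth, prediction):
    """Hypothetical substring-based matcher for string phrases."""
    gold_matched, pred_matched = set(), set()
    for gold_phrase in groundtruth:
        for pred_phrase in prediction:
            # A match requires one complete phrase to be contained in the other.
            if pred_phrase in gold_phrase or gold_phrase in pred_phrase:
                gold_matched.add(gold_phrase)
                pred_matched.add(pred_phrase)
    return gold_matched, pred_matched


gold_matched, pred_matched = find_partial_matches_sketch(
    ["New York City", "IBM"],
    ["New York", "IBM Research", "Boston"],
)
```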
- static calc_metrics_from_confusion_matrix(per_class_confusion_matrix: Dict[str, F1Metrics]) F1MetricsContainer[source]
- Function to calculate precision, recall, F1 metrics using a confusion matrix containing
statistics per class label.
- Args:
- per_class_confusion_matrix (Dict[str, F1Metrics]): Dictionary of
statistics per class label. Class labels are keys for the dictionary. For each class label, there should be an F1Metrics object with the values true_positive, false_positive, and false_negative representing the count of these per class. The dictionary looks like: per_class_confusion_matrix[label] = F1Metrics(true_positive=val1, false_positive=val2, false_negative=val3)
- Returns:
metrics_summary (F1MetricsContainer): An instance of the F1MetricsContainer dataclass containing a summary of F1 metrics.
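How macro and micro averages follow from per-class counts can be sketched as below. This is an illustration under stated assumptions rather than the library's code: `Counts` is a hypothetical stand-in for the count fields of F1Metrics, macro F1 is taken as the mean of the per-class F1 scores, and any ratio with a zero denominator is treated as 0.0.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical stand-in for the count fields of F1Metrics."""

    true_positive: int = 0
    false_positive: int = 0
    false_negative: int = 0


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 on empty denominators)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def summarize(per_class):
    """Macro averages per-class scores; micro pools the counts first."""
    if not per_class:
        raise ValueError("per_class must contain at least one label")
    scores = [
        prf(c.true_positive, c.false_positive, c.false_negative)
        for c in per_class.values()
    ]
    macro = [sum(s[i] for s in scores) / len(scores) for i in range(3)]
    tp = sum(c.true_positive for c in per_class.values())
    fp = sum(c.false_positive for c in per_class.values())
    fn = sum(c.false_negative for c in per_class.values())
    micro = prf(tp, fp, fn)
    return {
        "macro_precision": macro[0], "macro_recall": macro[1], "macro_f1": macro[2],
        "micro_precision": micro[0], "micro_recall": micro[1], "micro_f1": micro[2],
        "overall_tp": tp, "overall_fp": fp, "overall_fn": fn,
    }


summary = summarize({"A": Counts(1, 1, 1), "B": Counts(3, 1, 1)})
```

Macro averaging weights every class equally, while micro averaging weights every example equally, so the two diverge when classes are imbalanced.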
- convert_F1MetricsContainer_to_dict() dict[source]
- Args:
- metrics_summary (F1MetricsContainer): An object of dataclass
F1MetricsContainer
- Returns:
- dict: Dictionary looks like:
{
    'per_class_confusion_matrix': {'entity_type': {'true_positive': int …}},
    'macro_precision': 0 <= float <= 1,
    'macro_recall': 0 <= float <= 1,
    'macro_f1': 0 <= float <= 1,
    'micro_precision': 0 <= float <= 1,
    'micro_recall': 0 <= float <= 1,
    'micro_f1': 0 <= float <= 1,
    'overall_tp': int,
    'overall_fp': int,
    'overall_fn': int
}
- caikit.core.load_txt_lines(filename)[source]
Load a list of files from a text file with utf8 encoding
- caikit.core.save_txt(text, filename, mode='w')[source]
Write a string to a text file with utf8 encoding.
- caikit.core.save_dict_csv(dict_list, filename, mode='w')[source]
Write a list of dicts to a csv file.
- caikit.core.save_raw(save_content, filename, mode='w')[source]
Write the given raw string content to output file.
- caikit.core.compress(dir_path, output_path=None, extension='zip')[source]
Compress a given folder recursively to an archive with a given extension format
- Args:
- dir_path (str): Path of directory to compress.
- output_path: (Optional) str
Output path where the archive is created. Defaults to current path + 'archive' + format extension.
>>> compress('.', 'my/path', 'tar')
>>> # saves to 'my/path/archive.tar'
- extension: (Optional) (one of: zip/tar/gztar/bztar/xztar depending on module availability)
Defaults to zip.
- Returns:
str: Path to created archive
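The listed extension values coincide with the format names accepted by the standard library's shutil.make_archive, so the documented behavior can be approximated with it. This is a sketch under that assumption. `compress_sketch` is a hypothetical helper, and whether caikit itself is built on shutil is not stated here.

```python
import os
import shutil
import tempfile


def compress_sketch(dir_path, output_path=None, extension="zip"):
    """Hypothetical stand-in for compress, built on shutil.make_archive."""
    base_dir = output_path if output_path is not None else os.getcwd()
    base_name = os.path.join(base_dir, "archive")
    # make_archive appends the suffix for the format and returns the path.
    return shutil.make_archive(base_name, extension, root_dir=dir_path)


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
    with open(os.path.join(src, "config.yml"), "w", encoding="utf-8") as handle:
        handle.write("module_id: example\n")
    archive_path = compress_sketch(src, out, "tar")
    archive_name = os.path.basename(archive_path)
    archive_exists = os.path.isfile(archive_path)
```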
- class caikit.core.ObjectSerializer[source]
Bases: abc.ABC
Abstract class for serializing an object to disk.
- class caikit.core.JSONSerializer[source]
Bases: ObjectSerializer
An ObjectSerializer for serializing to a JSON file.
- class caikit.core.TextSerializer[source]
Bases: ObjectSerializer
An ObjectSerializer for serializing a Python list to a text file.
- class caikit.core.YAMLSerializer[source]
Bases: ObjectSerializer
An ObjectSerializer for serializing to a YAML file.
- class caikit.core.CSVSerializer[source]
Bases: ObjectSerializer
An ObjectSerializer for serializing to a CSV file.
- class caikit.core.PickleSerializer[source]
Bases: ObjectSerializer
An ObjectSerializer for pickling arbitrary Python objects.
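The serializer classes above share one abstract base, a pattern that can be sketched with abc. The class names below are hypothetical, and the method name and signature `serialize(obj, file_path)` are assumptions for illustration, since this page does not list the methods of ObjectSerializer.

```python
import abc
import json
import os
import tempfile


class SerializerSketch(abc.ABC):
    """Hypothetical abstract base: subclasses decide the on-disk format."""

    @abc.abstractmethod
    def serialize(self, obj, file_path):
        """Write obj to file_path (assumed method name and signature)."""


class JSONSerializerSketch(SerializerSketch):
    """Hypothetical JSON implementation of the sketch interface."""

    def serialize(self, obj, file_path):
        with open(file_path, "w", encoding="utf-8") as handle:
            json.dump(obj, handle)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labels.json")
    JSONSerializerSketch().serialize({"labels": ["pos", "neg"]}, path)
    with open(path, encoding="utf-8") as handle:
        restored = json.load(handle)
```

Passing a serializer instance to a method such as save_object keeps the saver independent of any particular file format.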
- caikit.core.MODEL_MANAGER
- caikit.core.extract
- caikit.core.load
- caikit.core.resolve_and_load
- caikit.core.train
- caikit.core.start_prediction_job
- caikit.core.get_model_future
- caikit.core.get_prediction_future