caikit.core.modules.base
========================

.. py:module:: caikit.core.modules.base

.. autoapi-nested-parse::

   Shared functionality and interfaces used by *all* modules.


Attributes
----------

.. autoapisummary::

   caikit.core.modules.base.log
   caikit.core.modules.base.error


Classes
-------

.. autoapisummary::

   caikit.core.modules.base.ModuleBase


Module Contents
---------------

.. py:data:: log

.. py:data:: error

.. py:class:: ModuleBase

   Abstract base class from which all modules should inherit.


   .. py:attribute:: _metadata


   .. py:attribute:: _load_backend
      :value: None


   .. py:property:: metadata
      :type: Dict[str, Any]


      This module's metadata.

      Returns:
          Dict[str, Any]: A dictionary of this module's metadata

          TODO: Can this be a `ModuleConfig` object instead? (or aconfig.Config)?


   .. py:property:: module_metadata
      :type: Dict[str, Any]


      Helper property to return metadata about a Module. This property
      is separate from `metadata` because it is specific to the module class. This
      property also requires a flat metadata structure without nested dictionaries.

      NOTE: This should be a @classmethod but using @property/@classmethod together has
      been deprecated.

      Returns:
          Dict[str, str]: A dictionary of this ModuleBase's metadata


   .. py:property:: public_model_info
      :type: Dict[str, Any]


      Helper property to return public metadata about a specific Model. This
      property is separate from `metadata` as that contains the entire ModelConfig,
      which may not be appropriate to share or expose.

      Returns:
          Dict[str, str]: A dictionary of this model's public metadata


   .. py:method:: set_load_backend(load_backend)

      Method used by the model manager to indicate the load backend that
      was used to load this module


   .. py:method:: get_inference_signature(input_streaming: bool, output_streaming: bool, task: Type[caikit.core.TaskBase] = None) -> Optional[caikit.core.signature_parsing.CaikitMethodSignature]
      :classmethod:


      Returns the inference method signature that is capable of running the module's task
      for the given flavors of input and output streaming


   .. py:method:: get_inference_signatures(task: Type[caikit.core.TaskBase]) -> List[Tuple[bool, bool, caikit.core.signature_parsing.CaikitMethodSignature]]
      :classmethod:


      Returns inference method signatures for all supported flavors
      of input and output streaming for a given task


   .. py:property:: load_backend

      Get the backend instance used to load this module. This can be used
      in module implementations that require use of a specific backend at
      inference time.


   .. py:method:: bootstrap(*args, **kwargs)
      :classmethod:


      Bootstrap a module. This method can be used to initialize the module
      from artifacts created outside of a particular caikit library


   .. py:method:: load(model_path: Union[str, caikit.core.modules.config.ModuleConfig], *args, **kwargs) -> ModuleBase
      :classmethod:


      Load a new instance of this module from a given model_path

      Args:
          model_path (Union[str, ModuleConfig]): Path to saved model or
              in-memory ModuleConfig
      Returns:
          model (ModuleBase): A new instance of this module class


   .. py:method:: _load(module_loader, *args, **kwargs)
      :classmethod:


      Load a model.


   .. py:method:: timed_load(*args, **kwargs)
      :classmethod:


      Time a model `load` call.

      Args:
          *args (list): Will be passed to `self.load`.
          **kwargs (dict): Will be passed to `self.load` -- the only way to
              pass arbitrary arguments to `self.load` from this function.

      Returns:
          int, caikit.core._ModuleBase: The first return value is the total
              time spent in the `self.load` call. The second return value is
              the loaded model.

      Notes:
          You can pass everything that should go to the load function normally using args/kwargs.
          Example: `model.timed_load("/model/path/dir")`
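
      The timing behaviour described above can be approximated with a small wrapper.
      This is a minimal sketch, not caikit's implementation: `FakeModule` and its
      `load` are hypothetical stand-ins, and the use of `time.perf_counter` is an
      assumption.

      .. code-block:: python

         import time


         class FakeModule:
             """Hypothetical stand-in for a ModuleBase subclass."""

             def __init__(self, path):
                 self.path = path

             @classmethod
             def load(cls, model_path, *args, **kwargs):
                 # Pretend to read a model from disk.
                 return cls(model_path)

             @classmethod
             def timed_load(cls, *args, **kwargs):
                 # Forward everything to load() and measure the wall-clock time.
                 start = time.perf_counter()
                 model = cls.load(*args, **kwargs)
                 return time.perf_counter() - start, model


         elapsed, model = FakeModule.timed_load("/model/path/dir")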


   .. py:method:: validate_loaded_model(*args)

      Validate a loaded model.


   .. py:method:: save(model_path: str, *args, **kwargs)

      Save a model.

      Args:
          model_path (str): Path on disk to export the model to.


   .. py:method:: as_file_like_object(*args, **kwargs) -> io.BytesIO

      Produces a file-like object corresponding to a zip archive affiliated with a given
      model. This method is functionally similar to .save() - it saves a model into
      a temporary directory and produces a zip archive, then loads the result as an io.BytesIO
      object. The result of this function is also compatible with .load(), and cleanup is
      handled automatically.

      Args:
          *args, **kwargs (dict): Optional keyword arguments for saving.
      Returns:
          io.BytesIO: File-like object holding an exported model in memory.


   .. py:method:: as_bytes(*args, **kwargs) -> bytes

      Produces a bytes object corresponding to a zip archive affiliated with a given
      model. This method is functionally similar to .save() - it saves a model into
      a temporary directory and produces a zip archive, then loads the result as a bytes
      object. The result of this function is also compatible with .load(), and cleanup is
      handled automatically.

      Args:
          *args, **kwargs (dict): Optional keyword arguments for saving.
      Returns:
          bytes: bytes object holding an exported model in memory.
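
      The save, zip, and read-back flow shared by `as_bytes` and `as_file_like_object`
      could look roughly like the sketch below. It uses only the standard library;
      the `save` function and the `config.yml` file it writes are hypothetical
      placeholders rather than caikit's actual export format.

      .. code-block:: python

         import io
         import os
         import shutil
         import tempfile
         import zipfile


         def save(model_path):
             """Hypothetical save(): writes a single config file into model_path."""
             os.makedirs(model_path, exist_ok=True)
             with open(os.path.join(model_path, "config.yml"), "w") as handle:
                 handle.write("module_id: example\n")


         def as_bytes():
             # Save into a temporary directory, zip it, and read the archive back.
             # The temporary directory is removed when the with-block exits.
             with tempfile.TemporaryDirectory() as work_dir:
                 model_dir = os.path.join(work_dir, "model")
                 save(model_dir)
                 base_name = os.path.join(work_dir, "export")
                 archive_path = shutil.make_archive(base_name, "zip", model_dir)
                 with open(archive_path, "rb") as handle:
                     return handle.read()


         def as_file_like_object():
             # Same payload, wrapped in an in-memory file object.
             return io.BytesIO(as_bytes())


         with zipfile.ZipFile(as_file_like_object()) as archive:
             names = archive.namelist()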


   .. py:method:: run(*args, **kwargs)

      Run a model - this typically makes a single prediction and returns an object from the
      data model.


   .. py:method:: run_batch(*args, **kwargs)

      Run a model in batch mode - this typically ingests an iterable of inputs that can be
      applied to run & returns a list of data model objects that run ordinarily returns. A module
      may override this method to provide faster evaluation capabilities, e.g., by leveraging
      vectorization during prediction.

      All provided args and kwargs that should be expanded with the batch should be provided as
      prebatched iterables. If an arg/kwarg is not provided as an iterable, it will be passed
      as is to every individual run call, which may be needed in some rare cases, such as
      runtime explainability enablement.

      This function is intentionally kept as simple as possible. In order to maintain its
      simplicity, all argument iterables must be the same length, where the length of every
      provided iterable is presumed to be the batch size. If an iterable must be passed as
      arg to each run call, batch run must be called by wrapping it in another iterable and
      duplicating the iterable arg to match the size, or (ideally) overridden in the subclass
      as necessary.

      Args:
          *args: Variable length argument list to be passed directly to run().
          **kwargs: Arbitrary keyword arguments to be passed directly to run().
      Returns:
          tuple: Iterable of prediction outputs, run as a batch.
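
      The batching rules above can be illustrated with a toy module. This is a sketch
      under stated assumptions, not the library's code: `EchoModule` is hypothetical,
      and only lists and tuples are treated as expandable here.

      .. code-block:: python

         class EchoModule:
             """Hypothetical module whose run() just formats its inputs."""

             def run(self, text, lang="en"):
                 return f"{lang}:{text}"

             def run_batch(self, *args, **kwargs):
                 def expandable(value):
                     # Assumption: only lists and tuples are expanded across the batch.
                     return isinstance(value, (list, tuple))

                 values = list(args) + list(kwargs.values())
                 sizes = {len(v) for v in values if expandable(v)}
                 if len(sizes) != 1:
                     raise ValueError("need iterables of one common length")
                 batch_size = sizes.pop()

                 results = []
                 for idx in range(batch_size):
                     # Index into iterables; repeat everything else unchanged.
                     call_args = [a[idx] if expandable(a) else a for a in args]
                     call_kwargs = {
                         k: (v[idx] if expandable(v) else v) for k, v in kwargs.items()
                     }
                     results.append(self.run(*call_args, **call_kwargs))
                 return tuple(results)


         outputs = EchoModule().run_batch(["hello", "bonjour"], lang="xx")

      Here `lang="xx"` is not an iterable, so it is repeated for both calls, while the
      list of texts sets the batch size to two.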


   .. py:method:: timed_run(*args, num_seconds=None, num_iterations=None, **kwargs)

      Time a number of runs over set seconds or iterations.

      Args:
          *args (list): Will be passed to `self.run`.
          num_seconds (int): Minimum number of seconds to run timed_run over.
              Will most likely be more than this value because it waits for
              each call to `self.run` to finish.
          num_iterations (int): Minimum number of iterations to run timed_run
              over. Will run exactly this many times.
          **kwargs (dict): Will be passed to `self.run`.

      Returns:
          int, int, caikit.core.data_model.DataBase: The first return value is
              the total time spent in the `self.run` loop. The second return
              value is the total number of calls made to `self.run`. The third
              return value is the output of the module's run method.

      Notes:
          You can pass everything that should go to the run function normally using args/kwargs.
          Example: `model.timed_run("some example text", num_seconds=60)`

      By default it will run for greater than or equal to 120 seconds.
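
      A timing loop with these semantics might be sketched as follows. The free
      function `timed_run` below is a hypothetical stand-in that takes the run
      callable explicitly. It assumes that an iteration count takes precedence over a
      time budget, which the description above does not state.

      .. code-block:: python

         import time


         def timed_run(run, *args, num_seconds=None, num_iterations=None, **kwargs):
             # Assumption: 120 seconds is used when neither limit is given.
             if num_iterations is None and num_seconds is None:
                 num_seconds = 120
             start = time.perf_counter()
             calls = 0
             result = None
             while True:
                 if num_iterations is not None:
                     if calls >= num_iterations:
                         break
                 elif time.perf_counter() - start >= num_seconds:
                     break
                 # A call is never interrupted, so the total time can overshoot.
                 result = run(*args, **kwargs)
                 calls += 1
             return time.perf_counter() - start, calls, result


         elapsed, calls, result = timed_run(
             str.upper, "some example text", num_iterations=3
         )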


   .. py:method:: stream(data_stream, *args, **kwargs)

      Lazily evaluate a run() on a given model by constructing a new data stream generator
      from the results. Note that we do not allow datastreams in args/kwargs. In rare cases,
      this may mean that stream() is not available, e.g., for keywords extraction. In these
      cases, stream() should be overridden in the subclass (module implementation) to allow
      and expand along multiple data streams.

      Args:
          data_stream (caikit.core.data_model.DataStream): Datastream to be
              lazily sequentially processed by the module under consideration.
          *args: Variable length argument list to be passed directly to run().
          **kwargs: Arbitrary keyword arguments to be passed directly to run().
      Returns:
          protobufs: A DataBase object.
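
      Lazy evaluation of this kind is commonly expressed with a generator. The sketch
      below uses a plain Python iterator in place of a `DataStream` and a hypothetical
      `run` function, so it only illustrates the idea that nothing is computed until
      the result is consumed.

      .. code-block:: python

         def stream(run, data_stream, *args, **kwargs):
             # A generator: run() is only invoked when the consumer asks for an item.
             for item in data_stream:
                 yield run(item, *args, **kwargs)


         calls = []


         def run(text, suffix=""):
             calls.append(text)
             return text.upper() + suffix


         lazy = stream(run, iter(["a", "b"]), suffix="!")
         calls_before = list(calls)  # still empty: nothing has been evaluated yet
         results = list(lazy)        # consuming the generator triggers the runs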


   .. py:method:: train(*args, **kwargs)
      :classmethod:


      Train a model.


   .. py:method:: validate_training_data(training_data: Union[str, caikit.core.data_model.DataStream], limit: int = -1) -> List[caikit.core.exceptions.validation_error.DataValidationError]
      :classmethod:


      Validate a set of training data, passed as a filename or as a data stream.
      Return up to `limit` number of DataValidationErrors


   .. py:attribute:: evaluation_type
      :value: None


   .. py:attribute:: evaluator
      :value: None


   .. py:method:: find_label_func(*_args, **_kwargs)
      :staticmethod:
      :abstractmethod:


      Function used to extract "label" from a prediction/result of a module's .run method.
      Define if you wish to have more specific evaluation metrics. Implemented in subclass.


   .. py:method:: find_label_data_func(*_args, **_kwargs)
      :staticmethod:
      :abstractmethod:


      Function used to extract data belonging to class "label" from a prediction/result
      of a module's .run method. Define if you wish to have more specific evaluation metrics.
      Implemented in subclass.


   .. py:method:: evaluate_quality(dataset_path, preprocess_func=None, detailed_metrics=False, labels=None, partial_match_metrics=False, max_hierarchy_levels=3, **kwargs)

      Run quality evaluation for instance of module

      Args:
          dataset_path (str): Path to where the input "gold set" dataset
              lives. Most often this is a .json file.
          preprocess_func (method): Function used as proxy for any preliminary
              steps that need to be taken to run the model on the input text.
              This helper function ultimately leads to the input to this
              module and may involve executing other modules.
          detailed_metrics: boolean (Optional, defaults to False)
              Only for 'keywords'. Include partial scores and scores over every text in document.
          labels: list (Optional, defaults to None)
              Optional list of class labels to evaluate quality on. By default evaluation is done
              over all class labels. Using this, you can explicitly mention only a subset of
              labels to include in the quality evaluation.
          partial_match_metrics: boolean (Optional, defaults to False)
              Include partial match micro avg F1.
          max_hierarchy_levels (int): Used in hierarchical multilabel
              multiclass evaluation only. The number of levels in the
              hierarchy to run model evaluation on, in addition to complete
              matches.
          *args, **kwargs: Optional arguments which can be used by goldset/prediction
              set extraction.
              Keyword arguments: `block_level`: str
                  For any module that has pre processing steps in the
                  middle of raw text and actual module input, use the input from gold standard
                  labels instead of a pre-process function. Useful for measuring quality for the
                  'block' alone (instead of the module + pre_process pipeline)
      Returns:
          dict: Dictionary of results provided by the `self.evaluator.run`
              function, depending on the associated `evaluation_type`. Reports
              things like precision, recall, and f1.
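
      As an illustration of the kind of metrics such a report contains, the sketch
      below computes micro-averaged precision, recall, and F1 over per-document label
      sets. The function `micro_prf` and its input format are hypothetical. The
      actual evaluator and report layout depend on `evaluation_type`.

      .. code-block:: python

         def micro_prf(gold, pred):
             # gold and pred each hold one set of labels per document.
             tp = sum(len(g & p) for g, p in zip(gold, pred))
             fp = sum(len(p - g) for g, p in zip(gold, pred))
             fn = sum(len(g - p) for g, p in zip(gold, pred))
             precision = tp / (tp + fp) if tp + fp else 0.0
             recall = tp / (tp + fn) if tp + fn else 0.0
             denom = precision + recall
             f1 = 2 * precision * recall / denom if denom else 0.0
             return {"precision": precision, "recall": recall, "f1": f1}


         report = micro_prf([{"a", "b"}, {"c"}], [{"a"}, {"c", "d"}])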


   .. py:method:: _is_expandable_iterable(arg)
      :staticmethod:


      Check to see if something is a list / tuple of data model objects or strings. If it is,
      we consider it "expandable", meaning that each element of the iterable is passed to one
      run call. In contrast, if something is not expandable, it will be passed as is to each call.

      Args:
          arg (any): Argument to run_batch being considered.
      Returns:
          bool: True if the argument is a compatible iterable, False
              otherwise.


   .. py:method:: _validate_and_extract_batch_size(*args, **kwargs)

      Check to ensure that there's at least one iterable whose length is well defined,
      i.e., no generators, and that if multiple iterable arg/kwarg values are provided,
      they are all the same length.

      Args:
          *args: Variable length argument list to be passed directly to run().
          **kwargs: Arbitrary keyword arguments to be passed directly to run().
      Returns:
          int: Inferred batch size based on expandable iterables.


   .. py:method:: _validate_arg_and_verify_batch_size(val, current_batch_size)

      Check an arg value from args/kwargs. If we find that it's an expandable iterable, see
      if it conflicts with what we know about the inferred batch size so far.

      Args:
          val (any): Argument / keyword argument value being inspected.
          current_batch_size (None | int): Current inferred batch size from
              previous args/kwargs, or None if no inferences have been made on
              other expandable iterables yet.
      Returns:
          None | inferred batch size.
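
      The two batch size checks can be sketched together as below. The function names
      are hypothetical free-function versions of the methods, and the rule that only
      lists and tuples count as expandable is a simplifying assumption.

      .. code-block:: python

         def verify_batch_size(val, current_batch_size):
             # Assumption: only lists and tuples count as expandable iterables.
             if not isinstance(val, (list, tuple)):
                 return current_batch_size
             if current_batch_size is not None and len(val) != current_batch_size:
                 raise ValueError(
                     f"length {len(val)} conflicts with batch size {current_batch_size}"
                 )
             return len(val)


         def extract_batch_size(*args, **kwargs):
             # Fold every arg and kwarg value through the per-value check.
             batch_size = None
             for val in list(args) + list(kwargs.values()):
                 batch_size = verify_batch_size(val, batch_size)
             if batch_size is None:
                 raise ValueError("at least one expandable iterable is required")
             return batch_size


         size = extract_batch_size(["a", "b"], "fixed", flags=[True, False])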


   .. py:method:: _build_args_for_default_run_with_batch(fixed_args, expanded_args, idx)
      :staticmethod:


      Build the non keyword arguments for run_batch's default implementation by expanding
      iterable args where possible, and grouping them with repeated noniterable arguments. The
      index corresponds to the current document under consideration.

      Args:
          fixed_args (dict): Noniterable args - common across all documents.
          expanded_args (dict): Iterable args - we'll need to index into this
              to get our doc arg.
          idx (int): Index of the document being considered.
      Returns:
          list: Args to be run for document [idx].
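
      One way to rebuild the positional arguments is shown below. It assumes that
      both dictionaries are keyed by each argument's original position, which the
      description above does not confirm. `build_args` is a hypothetical name.

      .. code-block:: python

         def build_args(fixed_args, expanded_args, idx):
             # Assumption: both dicts map original position -> value(s).
             merged = dict(fixed_args)
             merged.update({pos: vals[idx] for pos, vals in expanded_args.items()})
             # Positional order matters, so emit the values sorted by position.
             return [merged[pos] for pos in sorted(merged)]


         doc_args = build_args({1: "en"}, {0: ["a", "b"], 2: [True, False]}, 1)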


   .. py:method:: _build_kwargs_for_default_run_with_batch(fixed_kwargs, expanded_kwargs, idx)
      :staticmethod:


      Similar to the previous function, but for kwargs. Note that we can just clone our fixed
      kwargs instead of cycling through them, because order doesn't matter here.

      Args:
          fixed_kwargs (dict): Noniterable valued kwargs - common across all
              documents.
          expanded_kwargs (dict): Iterable valued kwargs - we'll need to index
              into these to get our doc kwarg.
          idx (int): Index of the document being considered.
      Returns:
          dict: Kwargs to be run for document [idx].
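
      Because keyword order is irrelevant, the kwargs case reduces to copying the
      fixed kwargs and overlaying the indexed values. A minimal sketch, with
      `build_kwargs` as a hypothetical name:

      .. code-block:: python

         def build_kwargs(fixed_kwargs, expanded_kwargs, idx):
             # Clone the shared kwargs so the caller's dict is never mutated.
             call_kwargs = dict(fixed_kwargs)
             # Overlay the idx-th element of every iterable-valued kwarg.
             call_kwargs.update(
                 {key: values[idx] for key, values in expanded_kwargs.items()}
             )
             return call_kwargs


         fixed = {"lang": "en"}
         expanded = {"text": ["first", "second"]}
         second = build_kwargs(fixed, expanded, 1)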


   .. py:method:: _extract_gold_set(dataset)

      Method for extracting gold set from dataset. Implemented in subclass.

      Args:
          dataset (object): In-memory version of whatever is loaded from
              disk. May be json, txt, etc.

      Returns:
          list: List of labels in the format of the module_type that is being
              called.


   .. py:method:: _extract_pred_set(dataset, preprocess_func=None, **kwargs)

      Method for extracting pred set from dataset. Implemented in subclass.

      Args:
          dataset (object): In-memory version of whatever is loaded from
              disk. May be json, txt, etc.
          preprocess_func (method): Function used as proxy for any preliminary
              steps that need to be taken to run the model on the input text.
              This helper function ultimately leads to the input to this
              module and may involve executing other modules.
          **kwargs (dict): Optional keyword arguments for prediction set extraction.
      Returns:
          list: List of labels in the format of the module_type that is being
              called.


   .. py:method:: _load_evaluation_dataset(dataset_path)
      :staticmethod:


      Helper specifically for dataset loading.

      Args:
          dataset_path (str): Path to where the input 'gold set' dataset
              lives. Most often this is a .json file.

      Returns:
          object: list, dict, or other python object, depending on the input
              dataset_path extension. Currently only supports `.json` and uses
              fileio from toolkit.
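
      A loader restricted to `.json` could be sketched with the standard library as
      shown below. Note that this uses Python's `json` module in place of the toolkit
      fileio helper mentioned above, and `load_evaluation_dataset` is a hypothetical
      free-function version of the method.

      .. code-block:: python

         import json
         import os
         import tempfile


         def load_evaluation_dataset(dataset_path):
             # Only .json is handled, mirroring the limitation described above.
             if os.path.splitext(dataset_path)[1].lower() != ".json":
                 raise ValueError(f"unsupported dataset extension: {dataset_path}")
             with open(dataset_path, encoding="utf-8") as handle:
                 return json.load(handle)


         with tempfile.TemporaryDirectory() as tmp_dir:
             path = os.path.join(tmp_dir, "gold.json")
             with open(path, "w", encoding="utf-8") as handle:
                 json.dump([{"text": "hi", "label": "greeting"}], handle)
             dataset = load_evaluation_dataset(path)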


   .. py:method:: _extract_gold_annotations(gold_set)
      :staticmethod:


      Extract the core list of annotations that is needed for quality evaluation

      Args:
          gold_set (list)
      Returns:
          gold_annotations: list


   .. py:method:: _extract_pred_annotations(pred_set)
      :staticmethod:


      Extract the core list of predictions that is needed for quality evaluation

      Args:
          pred_set (list)
      Returns:
          pred_annotations: list


   .. py:method:: _generate_report(report, gold_set)
      :staticmethod:


      Generate the quality report output

      Args:
          report (dict)
          gold_set (list(dict))