caikit.interfaces.nlp.data_model
================================

.. py:module:: caikit.interfaces.nlp.data_model


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/caikit/interfaces/nlp/data_model/classification/index
   /autoapi/caikit/interfaces/nlp/data_model/embedding_vectors/index
   /autoapi/caikit/interfaces/nlp/data_model/package/index
   /autoapi/caikit/interfaces/nlp/data_model/reranker/index
   /autoapi/caikit/interfaces/nlp/data_model/sentence_similarity/index
   /autoapi/caikit/interfaces/nlp/data_model/text/index
   /autoapi/caikit/interfaces/nlp/data_model/text_generation/index


Attributes
----------

.. autoapisummary::

   caikit.interfaces.nlp.data_model.NLP_PACKAGE


Classes
-------

.. autoapisummary::

   caikit.interfaces.nlp.data_model.ClassificationResult
   caikit.interfaces.nlp.data_model.ClassificationResults
   caikit.interfaces.nlp.data_model.ClassificationTrainRecord
   caikit.interfaces.nlp.data_model.ClassifiedGeneratedTextResult
   caikit.interfaces.nlp.data_model.ClassifiedGeneratedTextStreamResult
   caikit.interfaces.nlp.data_model.TokenClassificationResult
   caikit.interfaces.nlp.data_model.TokenClassificationResults
   caikit.interfaces.nlp.data_model.TokenClassificationStreamResult
   caikit.interfaces.nlp.data_model.EmbeddingResult
   caikit.interfaces.nlp.data_model.EmbeddingResults
   caikit.interfaces.nlp.data_model.RerankResult
   caikit.interfaces.nlp.data_model.RerankResults
   caikit.interfaces.nlp.data_model.RerankScore
   caikit.interfaces.nlp.data_model.RerankScores
   caikit.interfaces.nlp.data_model.SentenceSimilarityResult
   caikit.interfaces.nlp.data_model.SentenceSimilarityResults
   caikit.interfaces.nlp.data_model.SentenceSimilarityScores
   caikit.interfaces.nlp.data_model.ChunkerTokenizationStreamResult
   caikit.interfaces.nlp.data_model.Token
   caikit.interfaces.nlp.data_model.TokenizationResults
   caikit.interfaces.nlp.data_model.TokenizationStreamResult
   caikit.interfaces.nlp.data_model.FinishReason
   caikit.interfaces.nlp.data_model.GeneratedTextResult
   caikit.interfaces.nlp.data_model.GeneratedTextStreamResult
   caikit.interfaces.nlp.data_model.GeneratedToken
   caikit.interfaces.nlp.data_model.TokenStreamDetails


Package Contents
----------------

.. py:class:: ClassificationResult

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A single classification prediction.

   .. py:attribute:: label
      :type: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]

   .. py:attribute:: score
      :type: py_to_proto.dataclass_to_proto.Annotated[float, FieldNumber(2)]


.. py:class:: ClassificationResults

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Classification results generated from a text and consisting of multiple classes.

   .. py:attribute:: results
      :type: py_to_proto.dataclass_to_proto.Annotated[List[ClassificationResult], FieldNumber(1)]


.. py:class:: ClassificationTrainRecord

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A classification training record consisting of a single training instance.

   .. py:attribute:: text
      :type: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]

   .. py:attribute:: labels
      :type: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(2)]


.. py:class:: ClassifiedGeneratedTextResult

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Classification result on text produced by a text generation model. It contains
   information from the original text generation output as well as the result of
   classification on the generated text.

   .. py:class:: TextGenTokenClassificationResults

      Bases: :py:obj:`caikit.core.DataObjectBase`

      A DataObject is a data model class that is backed by a @dataclass.
      Data model classes that use the @dataobject decorator must derive from
      this base class.

      .. py:attribute:: input
         :type: py_to_proto.dataclass_to_proto.Annotated[Optional[List[TokenClassificationResult]], FieldNumber(10)]

      .. py:attribute:: output
         :type: py_to_proto.dataclass_to_proto.Annotated[Optional[List[TokenClassificationResult]], FieldNumber(20)]

   .. py:attribute:: generated_text
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[str], FieldNumber(1)]

   .. py:attribute:: token_classification_results
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[ClassifiedGeneratedTextResult.TextGenTokenClassificationResults], FieldNumber(2)]

   .. py:attribute:: finish_reason
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[caikit.interfaces.nlp.data_model.text_generation.FinishReason], FieldNumber(3)]

   .. py:attribute:: generated_token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(4)]

   .. py:attribute:: seed
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[numpy.uint64], FieldNumber(5)]

   .. py:attribute:: input_token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(6)]

   .. py:attribute:: warnings
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[List[InputWarning]], FieldNumber(9)]

   .. py:attribute:: tokens
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[List[caikit.interfaces.nlp.data_model.text_generation.GeneratedToken]], FieldNumber(10)]

   .. py:attribute:: input_tokens
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[List[caikit.interfaces.nlp.data_model.text_generation.GeneratedToken]], FieldNumber(11)]


.. py:class:: ClassifiedGeneratedTextStreamResult

   Bases: :py:obj:`ClassifiedGeneratedTextResult`

   Streaming classification result on generated text that indicates how far into
   the stream processing has reached.

   .. py:attribute:: processed_index
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(7)]

   .. py:attribute:: start_index
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(8)]


.. py:class:: TokenClassificationResult

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A single token classification prediction.

   .. py:attribute:: start
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(1)]

   .. py:attribute:: end
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]

   .. py:attribute:: word
      :type: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(3)]

   .. py:attribute:: entity
      :type: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(4)]

   .. py:attribute:: entity_group
      :type: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(5)]

   .. py:attribute:: score
      :type: py_to_proto.dataclass_to_proto.Annotated[float, FieldNumber(6)]

   .. py:attribute:: token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(7)]


.. py:class:: TokenClassificationResults

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Token classification results generated from a text and consisting of multiple classes.

   .. py:attribute:: results
      :type: py_to_proto.dataclass_to_proto.Annotated[List[TokenClassificationResult], FieldNumber(1)]


.. py:class:: TokenClassificationStreamResult

   Bases: :py:obj:`TokenClassificationResults`

   Streaming token classification results that indicate how far into the stream
   processing has reached.

   .. py:attribute:: processed_index
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]

   .. py:attribute:: start_index
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(3)]


.. py:class:: EmbeddingResult

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Result from a text embedding task.

   .. py:attribute:: result
      :type: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.Vector1D, FieldNumber(1)]

   .. py:attribute:: producer_id
      :type: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]

   .. py:attribute:: input_token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(3)]


.. py:class:: EmbeddingResults

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Results from a text embeddings task.

   .. py:attribute:: results
      :type: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ListOfVector1D, FieldNumber(1)]

   .. py:attribute:: producer_id
      :type: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]

   .. py:attribute:: input_token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(3)]


.. py:data:: NLP_PACKAGE
   :value: 'caikit_data_model.nlp'


.. py:class:: RerankResult

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Result for one query in a rerank task. The result holds n RerankScore entries,
   where n is determined by top_n, and each score indicates the relevance of that
   document to the query. Results are ordered most-relevant first.

   .. py:attribute:: result
      :type: py_to_proto.dataclass_to_proto.Annotated[RerankScores, FieldNumber(1)]

   .. py:attribute:: producer_id
      :type: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]

   .. py:attribute:: input_token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(3)]


.. py:class:: RerankResults

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Results list for rerank tasks (supporting multiple queries). For multiple
   queries, each query has a RerankScores entry that ranks the documents for that
   query.

   .. py:attribute:: results
      :type: py_to_proto.dataclass_to_proto.Annotated[List[RerankScores], FieldNumber(1)]

   .. py:attribute:: producer_id
      :type: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]

   .. py:attribute:: input_token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(3)]


.. py:class:: RerankScore

   Bases: :py:obj:`caikit.core.DataObjectBase`

   The score for one document against one query.

   .. py:attribute:: document
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[caikit.core.data_model.json_dict.JsonDict], FieldNumber(1)]

   .. py:attribute:: index
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]

   .. py:attribute:: score
      :type: py_to_proto.dataclass_to_proto.Annotated[float, FieldNumber(3)]

   .. py:attribute:: text
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[str], FieldNumber(4)]


.. py:class:: RerankScores

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Scores for a query in a rerank task. This is a list of n RerankScore entries,
   where n is determined by top_n, and each score indicates the relevance of that
   document to the query. Results are ordered most-relevant first.

   .. py:attribute:: query
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[str], FieldNumber(1)]

   .. py:attribute:: scores
      :type: py_to_proto.dataclass_to_proto.Annotated[List[RerankScore], FieldNumber(2)]


.. py:class:: SentenceSimilarityResult

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Result for a sentence similarity task.

   .. py:attribute:: result
      :type: py_to_proto.dataclass_to_proto.Annotated[SentenceSimilarityScores, FieldNumber(1)]

   .. py:attribute:: producer_id
      :type: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]

   .. py:attribute:: input_token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(3)]


.. py:class:: SentenceSimilarityResults

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Results list for sentence similarity tasks.

   .. py:attribute:: results
      :type: py_to_proto.dataclass_to_proto.Annotated[List[SentenceSimilarityScores], FieldNumber(1)]

   .. py:attribute:: producer_id
      :type: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]

   .. py:attribute:: input_token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(3)]


.. py:class:: SentenceSimilarityScores

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Scores for a sentence similarity task.

   .. py:attribute:: scores
      :type: py_to_proto.dataclass_to_proto.Annotated[List[float], FieldNumber(1)]


.. py:class:: ChunkerTokenizationStreamResult

   Bases: :py:obj:`TokenizationStreamResult`

   Streaming tokenization result that provides pointers to the input chunk that
   was processed.

   .. py:attribute:: input_start_index
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(20)]

   .. py:attribute:: input_end_index
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(21)]


.. py:class:: Token

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Tokens here are the basic units of text. Tokens can be characters, words,
   sub-words, or other segments of text or code, depending on the method of
   tokenization chosen or the task being implemented.

   .. py:attribute:: start
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(1)]

   .. py:attribute:: end
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]

   .. py:attribute:: text
      :type: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(3)]


.. py:class:: TokenizationResults

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Tokenization result generated from a text.

   .. py:attribute:: results
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[List[Token]], FieldNumber(1)]

   .. py:attribute:: token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(4)]


.. py:class:: TokenizationStreamResult

   Bases: :py:obj:`TokenizationResults`

   Streaming tokenization result that indicates how far into the stream
   processing has reached.

   .. py:attribute:: processed_index
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]

   .. py:attribute:: start_index
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(3)]


.. py:class:: FinishReason(*args, **kwds)

   Bases: :py:obj:`enum.Enum`

   Create a collection of name/value pairs.

   Example enumeration:

   >>> class Color(Enum):
   ...     RED = 1
   ...     BLUE = 2
   ...     GREEN = 3

   Access them by:

   - attribute access:

   >>> Color.RED
   <Color.RED: 1>

   - value lookup:

   >>> Color(1)
   <Color.RED: 1>

   - name lookup:

   >>> Color['RED']
   <Color.RED: 1>

   Enumerations can be iterated over, and know how many members they have:

   >>> len(Color)
   3

   >>> list(Color)
   [<Color.RED: 1>, <Color.BLUE: 2>, <Color.GREEN: 3>]

   Methods can be added to enumerations, and members can have their own
   attributes -- see the documentation for details.

   .. py:attribute:: NOT_FINISHED
      :value: 0

   .. py:attribute:: MAX_TOKENS
      :value: 1

   .. py:attribute:: EOS_TOKEN
      :value: 2

   .. py:attribute:: CANCELLED
      :value: 3

   .. py:attribute:: TIME_LIMIT
      :value: 4

   .. py:attribute:: STOP_SEQUENCE
      :value: 5

   .. py:attribute:: TOKEN_LIMIT
      :value: 6

   .. py:attribute:: ERROR
      :value: 7


.. py:class:: GeneratedTextResult

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A DataObject is a data model class that is backed by a @dataclass.
   Data model classes that use the @dataobject decorator must derive from
   this base class.

   .. py:attribute:: generated_text
      :type: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]

   .. py:attribute:: generated_tokens
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]

   .. py:attribute:: finish_reason
      :type: py_to_proto.dataclass_to_proto.Annotated[FinishReason, FieldNumber(3)]

   .. py:attribute:: producer_id
      :type: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(4)]

   .. py:attribute:: input_token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(5)]

   .. py:attribute:: seed
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[numpy.uint64], FieldNumber(6)]

   .. py:attribute:: tokens
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[List[GeneratedToken]], FieldNumber(7)]

   .. py:attribute:: input_tokens
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[List[GeneratedToken]], FieldNumber(8)]


.. py:class:: GeneratedTextStreamResult

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A DataObject is a data model class that is backed by a @dataclass.
   Data model classes that use the @dataobject decorator must derive from
   this base class.

   .. py:attribute:: generated_text
      :type: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]

   .. py:attribute:: tokens
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[List[GeneratedToken]], FieldNumber(2)]

   .. py:attribute:: details
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[TokenStreamDetails], FieldNumber(3)]

   .. py:attribute:: producer_id
      :type: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(4)]

   .. py:attribute:: input_tokens
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[List[GeneratedToken]], FieldNumber(5)]


.. py:class:: GeneratedToken

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A DataObject is a data model class that is backed by a @dataclass.
   Data model classes that use the @dataobject decorator must derive from
   this base class.

   .. py:attribute:: text
      :type: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]

   .. py:attribute:: logprob
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[float], FieldNumber(3)]

   .. py:attribute:: rank
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(4)]


.. py:class:: TokenStreamDetails

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A DataObject is a data model class that is backed by a @dataclass.
   Data model classes that use the @dataobject decorator must derive from
   this base class.

   .. py:attribute:: finish_reason
      :type: py_to_proto.dataclass_to_proto.Annotated[FinishReason, FieldNumber(1)]

   .. py:attribute:: generated_tokens
      :type: py_to_proto.dataclass_to_proto.Annotated[numpy.uint32, FieldNumber(2)]

   .. py:attribute:: seed
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[numpy.uint64], FieldNumber(3)]

   .. py:attribute:: input_token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(4)]
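
ClassificationResults wraps a list of ClassificationResult entries, each pairing a ``label`` with a ``score``. The sketch below picks the highest-scoring prediction from such a list. It is illustrative only and does not import caikit. ``ClassificationResultStub``, ``ClassificationResultsStub``, and ``top_label`` are hypothetical plain dataclasses and helpers that merely mirror the documented field names.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import List, Optional


   # Hypothetical stand-ins mirroring the fields of ClassificationResult and
   # ClassificationResults; these are NOT the caikit data model classes.
   @dataclass
   class ClassificationResultStub:
       label: str
       score: float


   @dataclass
   class ClassificationResultsStub:
       results: List[ClassificationResultStub] = field(default_factory=list)


   def top_label(
       output: ClassificationResultsStub,
   ) -> Optional[ClassificationResultStub]:
       """Return the highest-scoring prediction, or None if there are none."""
       if not output.results:
           return None
       return max(output.results, key=lambda r: r.score)


   preds = ClassificationResultsStub(
       results=[
           ClassificationResultStub(label="positive", score=0.82),
           ClassificationResultStub(label="negative", score=0.18),
       ]
   )
   best = top_label(preds)
   print(best.label, best.score)  # positive 0.82

The real classes are dataclass-backed, so the same attribute access (``results``, ``label``, ``score``) may carry over. Check the caikit API before relying on this.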
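
Token records a span of input text through ``start``, ``end``, and ``text``, and TokenizationResults adds a ``token_count``. The sketch below assumes that ``start`` and ``end`` are character offsets with ``end`` exclusive, so that ``text == source[start:end]``. That convention is an assumption and is not stated above. The whitespace tokenizer and the ``*Stub`` classes are hypothetical stand-ins, not caikit code.

.. code-block:: python

   import re
   from dataclasses import dataclass
   from typing import List, Optional


   # Hypothetical stand-ins mirroring Token and TokenizationResults.
   @dataclass
   class TokenStub:
       start: int  # assumed: character offset of the first character
       end: int  # assumed: exclusive character offset
       text: str


   @dataclass
   class TokenizationResultsStub:
       results: Optional[List[TokenStub]]
       token_count: Optional[int]


   def whitespace_tokenize(source: str) -> TokenizationResultsStub:
       """Split on whitespace, recording character offsets for each token."""
       tokens = [
           TokenStub(start=m.start(), end=m.end(), text=m.group())
           for m in re.finditer(r"\S+", source)
       ]
       return TokenizationResultsStub(results=tokens, token_count=len(tokens))


   source = "Hello caikit world"
   out = whitespace_tokenize(source)
   for tok in out.results:
       # With exclusive end offsets, slicing the input recovers the token text.
       assert source[tok.start:tok.end] == tok.text
   print(out.token_count)  # 3

A real model tokenizer would produce sub-word tokens rather than whitespace-separated words. The offset bookkeeping is the part this sketch is meant to show.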
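
RerankScores holds the scores for one query, ordered most-relevant first, and each RerankScore carries an ``index``. The sketch below assumes that ``index`` is the position of the document in the list originally sent to the reranker, and uses it to look up the original documents for the top_n results. ``RerankScoreStub``, ``RerankScoresStub``, and ``ranked_documents`` are hypothetical. The stubs use a plain ``dict`` where caikit uses ``JsonDict``.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import Any, Dict, List, Optional, Tuple


   # Hypothetical stand-ins mirroring RerankScore and RerankScores.
   @dataclass
   class RerankScoreStub:
       index: int  # assumed: position of the document in the input list
       score: float
       document: Optional[Dict[str, Any]] = None
       text: Optional[str] = None


   @dataclass
   class RerankScoresStub:
       query: Optional[str]
       scores: List[RerankScoreStub] = field(default_factory=list)


   def ranked_documents(
       ranked: RerankScoresStub, documents: List[Dict[str, Any]], top_n: int
   ) -> List[Tuple[float, Dict[str, Any]]]:
       """Return (score, original document) pairs for the top_n results."""
       # Scores are documented as most-relevant first, so slicing keeps the best.
       return [(s.score, documents[s.index]) for s in ranked.scores[:top_n]]


   docs = [{"text": "cats purr"}, {"text": "dogs bark"}, {"text": "cats nap"}]
   ranked = RerankScoresStub(
       query="cats",
       scores=[
           RerankScoreStub(index=2, score=0.91),
           RerankScoreStub(index=0, score=0.87),
           RerankScoreStub(index=1, score=0.05),
       ],
   )
   print(ranked_documents(ranked, docs, top_n=2))
   # [(0.91, {'text': 'cats nap'}), (0.87, {'text': 'cats purr'})]

Looking documents up by ``index`` avoids depending on the optional ``document`` and ``text`` fields being populated in the response.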
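
GeneratedTextStreamResult carries ``generated_text`` plus optional TokenStreamDetails, whose ``finish_reason`` is a FinishReason. The sketch below reassembles a stream. It assumes that each chunk's ``generated_text`` holds only the newly generated text (a delta) and that the last chunk reporting ``details`` carries the final finish reason. Both are assumptions about server behaviour, not guarantees stated above. ``FinishReasonStub`` copies the member values listed for FinishReason. The other ``*Stub`` classes and ``collect_stream`` are hypothetical.

.. code-block:: python

   from dataclasses import dataclass
   from enum import Enum
   from typing import Iterable, Optional, Tuple


   class FinishReasonStub(Enum):
       # Member values copied from FinishReason above.
       NOT_FINISHED = 0
       MAX_TOKENS = 1
       EOS_TOKEN = 2
       CANCELLED = 3
       TIME_LIMIT = 4
       STOP_SEQUENCE = 5
       TOKEN_LIMIT = 6
       ERROR = 7


   # Hypothetical stand-ins mirroring TokenStreamDetails and
   # GeneratedTextStreamResult (only the fields used here).
   @dataclass
   class TokenStreamDetailsStub:
       finish_reason: FinishReasonStub
       generated_tokens: int
       input_token_count: int
       seed: Optional[int] = None


   @dataclass
   class GeneratedTextStreamResultStub:
       generated_text: str
       details: Optional[TokenStreamDetailsStub] = None


   def collect_stream(
       chunks: Iterable[GeneratedTextStreamResultStub],
   ) -> Tuple[str, FinishReasonStub]:
       """Join streamed text deltas and return the last reported finish reason."""
       parts = []
       reason = FinishReasonStub.NOT_FINISHED
       for chunk in chunks:
           parts.append(chunk.generated_text)  # assumed: delta, not cumulative
           if chunk.details is not None:
               reason = chunk.details.finish_reason
       return "".join(parts), reason


   final = TokenStreamDetailsStub(
       FinishReasonStub.EOS_TOKEN, generated_tokens=3, input_token_count=4
   )
   stream = [
       GeneratedTextStreamResultStub("Hello"),
       GeneratedTextStreamResultStub(", world"),
       GeneratedTextStreamResultStub("!", final),
   ]
   text, reason = collect_stream(stream)
   print(text, reason.name)  # Hello, world! EOS_TOKEN

If a server instead sends cumulative text in each chunk, keep only the last chunk's ``generated_text`` rather than concatenating.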