caikit.interfaces.nlp.data_model
Submodules
- caikit.interfaces.nlp.data_model.classification
- caikit.interfaces.nlp.data_model.embedding_vectors
- caikit.interfaces.nlp.data_model.package
- caikit.interfaces.nlp.data_model.reranker
- caikit.interfaces.nlp.data_model.sentence_similarity
- caikit.interfaces.nlp.data_model.text
- caikit.interfaces.nlp.data_model.text_generation
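Attributes
- NLP_PACKAGE
Classes
- ClassificationResult: A single classification prediction.
- ClassificationResults: Classification results generated from a text and consisting of multiple classes.
- ClassificationTrainRecord: A classification training record consisting of a single train instance.
- ClassifiedGeneratedTextResult: Classification result on text produced by a text generation model. It contains information from the original text generation output as well as the result of classification on the generated text.
- ClassifiedGeneratedTextStreamResult: Streaming classification result on generated text that indicates how far into the stream has been processed.
- TokenClassificationResult: A single token classification prediction.
- TokenClassificationResults: Token classification results generated from a text and consisting of multiple classes.
- TokenClassificationStreamResult: Streaming token classification results that indicate how far into the stream has been processed.
- EmbeddingResult: Result from a text embedding task.
- EmbeddingResults: Results from a text embeddings task.
- RerankResult: Result for one query in a rerank task.
- RerankResults: Results list for rerank tasks (supporting multiple queries).
- RerankScore: The score for one document (one query).
- RerankScores: Scores for a query in a rerank task.
- SentenceSimilarityResult: Result for a sentence similarity task.
- SentenceSimilarityResults: Results list for sentence similarity tasks.
- SentenceSimilarityScores: Scores for a sentence similarity task.
- ChunkerTokenizationStreamResult: Streaming tokenization result that provides a pointer to the processed input chunk.
- Token: Tokens here are the basic units of text. Tokens can be characters, words, sub-words, or other segments of text or code, depending on the method of tokenization chosen or the task being implemented.
- TokenizationResults: Tokenization result generated from a text.
- TokenizationStreamResult: Streaming tokenization result that indicates how far into the stream has been processed.
- FinishReason: Create a collection of name/value pairs. (Docstring inherited from enum.Enum.)
- GeneratedTextResult: A DataObject is a data model class that is backed by a @dataclass. (Docstring inherited from caikit.core.DataObjectBase.)
- GeneratedTextStreamResult: A DataObject is a data model class that is backed by a @dataclass. (Docstring inherited from caikit.core.DataObjectBase.)
- GeneratedToken: A DataObject is a data model class that is backed by a @dataclass. (Docstring inherited from caikit.core.DataObjectBase.)
- TokenStreamDetails: A DataObject is a data model class that is backed by a @dataclass. (Docstring inherited from caikit.core.DataObjectBase.)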
Package Contents
- class caikit.interfaces.nlp.data_model.ClassificationResult
Bases: caikit.core.DataObjectBase
A single classification prediction.
- label: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]
- score: py_to_proto.dataclass_to_proto.Annotated[float, FieldNumber(2)]
- class caikit.interfaces.nlp.data_model.ClassificationResults
Bases: caikit.core.DataObjectBase
Classification results generated from a text and consisting of multiple classes.
- results: py_to_proto.dataclass_to_proto.Annotated[List[ClassificationResult], FieldNumber(1)]
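As a rough illustration of how ClassificationResult nests inside ClassificationResults, the sketch below uses plain dataclasses as stand-ins that mirror the documented fields. It does not import caikit, and the helper `top_label` is hypothetical, not part of the library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClassificationResult:
    """Stand-in mirroring the documented fields: label and score."""
    label: str
    score: float


@dataclass
class ClassificationResults:
    """Stand-in mirroring the documented `results` list."""
    results: List[ClassificationResult] = field(default_factory=list)


def top_label(preds: ClassificationResults) -> Optional[str]:
    """Hypothetical helper: return the highest-scoring label, or None if empty."""
    if not preds.results:
        return None
    return max(preds.results, key=lambda r: r.score).label


preds = ClassificationResults(
    results=[
        ClassificationResult(label="positive", score=0.91),
        ClassificationResult(label="negative", score=0.09),
    ]
)
best = top_label(preds)
```

The same pattern should carry over to the real classes, assuming they are constructed with the keyword fields listed above.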
- class caikit.interfaces.nlp.data_model.ClassificationTrainRecord
Bases: caikit.core.DataObjectBase
A classification training record consisting of a single train instance.
- text: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]
- labels: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(2)]
- class caikit.interfaces.nlp.data_model.ClassifiedGeneratedTextResult
Bases: caikit.core.DataObjectBase
Classification result on text produced by a text generation model. It contains information from the original text generation output as well as the result of classification on the generated text.
- class TextGenTokenClassificationResults
Bases: caikit.core.DataObjectBase
A DataObject is a data model class that is backed by a @dataclass.
Data model classes that use the @dataobject decorator must derive from this base class.
- input: py_to_proto.dataclass_to_proto.Annotated[List[TokenClassificationResult] | None, FieldNumber(10)]
- output: py_to_proto.dataclass_to_proto.Annotated[List[TokenClassificationResult] | None, FieldNumber(20)]
- generated_text: py_to_proto.dataclass_to_proto.Annotated[str | None, FieldNumber(1)]
- token_classification_results: py_to_proto.dataclass_to_proto.Annotated[ClassifiedGeneratedTextResult.TextGenTokenClassificationResults | None, FieldNumber(2)]
- finish_reason: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.nlp.data_model.text_generation.FinishReason | None, FieldNumber(3)]
- generated_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(4)]
- seed: py_to_proto.dataclass_to_proto.Annotated[numpy.uint64 | None, FieldNumber(5)]
- input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(6)]
- warnings: py_to_proto.dataclass_to_proto.Annotated[List[InputWarning] | None, FieldNumber(9)]
- tokens: py_to_proto.dataclass_to_proto.Annotated[List[caikit.interfaces.nlp.data_model.text_generation.GeneratedToken] | None, FieldNumber(10)]
- input_tokens: py_to_proto.dataclass_to_proto.Annotated[List[caikit.interfaces.nlp.data_model.text_generation.GeneratedToken] | None, FieldNumber(11)]
- class caikit.interfaces.nlp.data_model.ClassifiedGeneratedTextStreamResult
Bases: ClassifiedGeneratedTextResult
Streaming classification result on generated text that indicates how far into the stream has been processed.
- processed_index: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(7)]
- start_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(8)]
- class caikit.interfaces.nlp.data_model.TokenClassificationResult
Bases: caikit.core.DataObjectBase
A single token classification prediction.
- start: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(1)]
- end: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
- word: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(3)]
- entity: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(4)]
- entity_group: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(5)]
- score: py_to_proto.dataclass_to_proto.Annotated[float, FieldNumber(6)]
- token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(7)]
- class caikit.interfaces.nlp.data_model.TokenClassificationResults
Bases: caikit.core.DataObjectBase
Token classification results generated from a text and consisting of multiple classes.
- results: py_to_proto.dataclass_to_proto.Annotated[List[TokenClassificationResult], FieldNumber(1)]
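To make the span fields more concrete, here is a minimal sketch with a plain dataclass stand-in for TokenClassificationResult (caikit is not imported). It assumes that `start` and `end` are character offsets into the input text with an exclusive `end`; the source does not state this, so treat it as an assumption. The helpers `spans_match_text` and `group_by_entity` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TokenClassificationResult:
    """Stand-in mirroring the documented required fields."""
    start: int
    end: int
    word: str
    entity: str
    entity_group: str
    score: float


def spans_match_text(text: str, results: List[TokenClassificationResult]) -> bool:
    """Check that each span, read as text[start:end], equals its `word`."""
    return all(text[r.start:r.end] == r.word for r in results)


def group_by_entity(results: List[TokenClassificationResult]) -> Dict[str, List[str]]:
    """Collect the predicted words under their entity_group."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for r in results:
        grouped[r.entity_group].append(r.word)
    return dict(grouped)


text = "Alice visited Paris"
results = [
    TokenClassificationResult(0, 5, "Alice", "B-PER", "PER", 0.98),
    TokenClassificationResult(14, 19, "Paris", "B-LOC", "LOC", 0.95),
]
consistent = spans_match_text(text, results)
groups = group_by_entity(results)
```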
- class caikit.interfaces.nlp.data_model.TokenClassificationStreamResult
Bases: TokenClassificationResults
Streaming token classification results that indicate how far into the stream has been processed.
- processed_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
- start_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(3)]
- class caikit.interfaces.nlp.data_model.EmbeddingResult
Bases: caikit.core.DataObjectBase
Result from a text embedding task.
- result: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.Vector1D, FieldNumber(1)]
- producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]
- input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(3)]
- class caikit.interfaces.nlp.data_model.EmbeddingResults
Bases: caikit.core.DataObjectBase
Results from a text embeddings task.
- results: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ListOfVector1D, FieldNumber(1)]
- producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]
- input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(3)]
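Embedding vectors are commonly compared with cosine similarity. The sketch below shows that comparison on plain Python lists, which stand in for the values held by Vector1D and ListOfVector1D; how to extract the raw floats from those caikit types is not covered by this page, so that step is left out. The function name `cosine_similarity` is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Plain lists standing in for the vectors of an EmbeddingResults-like object.
vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
orthogonal = cosine_similarity(vectors[0], vectors[1])
parallel = cosine_similarity(vectors[0], vectors[2])
```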
- caikit.interfaces.nlp.data_model.NLP_PACKAGE = 'caikit_data_model.nlp'
- class caikit.interfaces.nlp.data_model.RerankResult
Bases: caikit.core.DataObjectBase
Result for one query in a rerank task. The result holds n RerankScore entries, where n is based on top_n documents, and each score indicates the relevance of that document for this query. Results are ordered most-relevant first.
- result: py_to_proto.dataclass_to_proto.Annotated[RerankScores, FieldNumber(1)]
- producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]
- input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(3)]
- class caikit.interfaces.nlp.data_model.RerankResults
Bases: caikit.core.DataObjectBase
Results list for rerank tasks (supporting multiple queries). For multiple queries, each one has its own RerankScores entry (ranking the documents for that query).
- results: py_to_proto.dataclass_to_proto.Annotated[List[RerankScores], FieldNumber(1)]
- producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]
- input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(3)]
- class caikit.interfaces.nlp.data_model.RerankScore
Bases: caikit.core.DataObjectBase
The score for one document (one query).
- document: py_to_proto.dataclass_to_proto.Annotated[caikit.core.data_model.json_dict.JsonDict | None, FieldNumber(1)]
- index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
- score: py_to_proto.dataclass_to_proto.Annotated[float, FieldNumber(3)]
- text: py_to_proto.dataclass_to_proto.Annotated[str | None, FieldNumber(4)]
- class caikit.interfaces.nlp.data_model.RerankScores
Bases: caikit.core.DataObjectBase
Scores for a query in a rerank task. This is a list of n RerankScore entries, where n is based on top_n documents, and each score indicates the relevance of that document for this query. Results are ordered most-relevant first.
- query: py_to_proto.dataclass_to_proto.Annotated[str | None, FieldNumber(1)]
- scores: py_to_proto.dataclass_to_proto.Annotated[List[RerankScore], FieldNumber(2)]
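The ordering described above (top_n entries, most relevant first) can be sketched with plain dataclass stand-ins for RerankScore and RerankScores. This does not import caikit. It assumes that `index` refers to the document's position in the input list, which the source does not state explicitly, and `rank_documents` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RerankScore:
    """Stand-in with a subset of the documented fields."""
    index: int
    score: float
    text: Optional[str] = None


@dataclass
class RerankScores:
    """Stand-in mirroring the documented query and scores fields."""
    query: Optional[str]
    scores: List[RerankScore]


def rank_documents(
    query: str, documents: List[str], raw_scores: List[float], top_n: int
) -> RerankScores:
    """Keep the top_n documents, ordered most-relevant first."""
    ranked = sorted(enumerate(raw_scores), key=lambda pair: pair[1], reverse=True)
    return RerankScores(
        query=query,
        scores=[
            RerankScore(index=i, score=s, text=documents[i])
            for i, s in ranked[:top_n]
        ],
    )


docs = ["cats purr", "dogs bark", "birds sing"]
ranked = rank_documents("which animal barks?", docs, [0.2, 0.9, 0.5], top_n=2)
order = [s.index for s in ranked.scores]
```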
- class caikit.interfaces.nlp.data_model.SentenceSimilarityResult
Bases: caikit.core.DataObjectBase
Result for a sentence similarity task.
- result: py_to_proto.dataclass_to_proto.Annotated[SentenceSimilarityScores, FieldNumber(1)]
- producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]
- input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(3)]
- class caikit.interfaces.nlp.data_model.SentenceSimilarityResults
Bases: caikit.core.DataObjectBase
Results list for sentence similarity tasks.
- results: py_to_proto.dataclass_to_proto.Annotated[List[SentenceSimilarityScores], FieldNumber(1)]
- producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]
- input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(3)]
- class caikit.interfaces.nlp.data_model.SentenceSimilarityScores
Bases: caikit.core.DataObjectBase
Scores for a sentence similarity task.
- scores: py_to_proto.dataclass_to_proto.Annotated[List[float], FieldNumber(1)]
- class caikit.interfaces.nlp.data_model.ChunkerTokenizationStreamResult
Bases: TokenizationStreamResult
Streaming tokenization result that provides a pointer to the processed input chunk.
- input_start_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(20)]
- input_end_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(21)]
- class caikit.interfaces.nlp.data_model.Token
Bases: caikit.core.DataObjectBase
Tokens here are the basic units of text. Tokens can be characters, words, sub-words, or other segments of text or code, depending on the method of tokenization chosen or the task being implemented.
- start: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(1)]
- end: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
- text: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(3)]
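As a small illustration of the start, end, and text fields, the sketch below builds Token stand-ins from a whitespace split. It uses a plain dataclass rather than the caikit class, and it assumes that `start` and `end` are character offsets with an exclusive `end`, which this page does not state. The function `whitespace_tokenize` is hypothetical; real tokenizers may split very differently.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Stand-in mirroring the documented fields: start, end, text."""
    start: int
    end: int
    text: str


def whitespace_tokenize(text: str) -> List[Token]:
    """Emit one Token per run of non-whitespace characters."""
    return [
        Token(start=m.start(), end=m.end(), text=m.group())
        for m in re.finditer(r"\S+", text)
    ]


source = "hello big world"
tokens = whitespace_tokenize(source)
# Under the offset assumption, slicing the source recovers each token's text.
round_trip = all(source[t.start:t.end] == t.text for t in tokens)
```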
- class caikit.interfaces.nlp.data_model.TokenizationResults
Bases: caikit.core.DataObjectBase
Tokenization result generated from a text.
- token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(4)]
- class caikit.interfaces.nlp.data_model.TokenizationStreamResult
Bases: TokenizationResults
Streaming tokenization result that indicates how far into the stream has been processed.
- processed_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
- start_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(3)]
- class caikit.interfaces.nlp.data_model.FinishReason(*args, **kwds)
Bases: enum.Enum
Create a collection of name/value pairs.
Example enumeration:
>>> class Color(Enum):
...     RED = 1
...     BLUE = 2
...     GREEN = 3
Access them by:
attribute access:
>>> Color.RED
<Color.RED: 1>
value lookup:
>>> Color(1)
<Color.RED: 1>
name lookup:
>>> Color['RED']
<Color.RED: 1>
Enumerations can be iterated over, and know how many members they have:
>>> len(Color)
3
>>> list(Color)
[<Color.RED: 1>, <Color.BLUE: 2>, <Color.GREEN: 3>]
Methods can be added to enumerations, and members can have their own attributes; see the documentation for details.
- NOT_FINISHED = 0
- MAX_TOKENS = 1
- EOS_TOKEN = 2
- CANCELLED = 3
- TIME_LIMIT = 4
- STOP_SEQUENCE = 5
- TOKEN_LIMIT = 6
- ERROR = 7
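The generic Enum access patterns apply to these members as well. The sketch below redefines a stand-in enum with the same names and values as listed above (it does not import caikit) and shows lookup by value and by name. The helper `is_finished` is hypothetical and simply treats every member other than NOT_FINISHED as a completed generation.

```python
from enum import Enum


class FinishReason(Enum):
    """Stand-in with the member names and values listed on this page."""
    NOT_FINISHED = 0
    MAX_TOKENS = 1
    EOS_TOKEN = 2
    CANCELLED = 3
    TIME_LIMIT = 4
    STOP_SEQUENCE = 5
    TOKEN_LIMIT = 6
    ERROR = 7


def is_finished(reason: FinishReason) -> bool:
    """Hypothetical helper: anything other than NOT_FINISHED counts as finished."""
    return reason is not FinishReason.NOT_FINISHED


by_value = FinishReason(2)           # value lookup
by_name = FinishReason["MAX_TOKENS"]  # name lookup
member_count = len(FinishReason)
```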
- class caikit.interfaces.nlp.data_model.GeneratedTextResult
Bases: caikit.core.DataObjectBase
A DataObject is a data model class that is backed by a @dataclass.
Data model classes that use the @dataobject decorator must derive from this base class.
- generated_text: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]
- generated_tokens: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
- finish_reason: py_to_proto.dataclass_to_proto.Annotated[FinishReason, FieldNumber(3)]
- producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(4)]
- input_token_count: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(5)]
- seed: py_to_proto.dataclass_to_proto.Annotated[numpy.uint64 | None, FieldNumber(6)]
- tokens: py_to_proto.dataclass_to_proto.Annotated[List[GeneratedToken] | None, FieldNumber(7)]
- input_tokens: py_to_proto.dataclass_to_proto.Annotated[List[GeneratedToken] | None, FieldNumber(8)]
- class caikit.interfaces.nlp.data_model.GeneratedTextStreamResult
Bases: caikit.core.DataObjectBase
A DataObject is a data model class that is backed by a @dataclass.
Data model classes that use the @dataobject decorator must derive from this base class.
- generated_text: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]
- tokens: py_to_proto.dataclass_to_proto.Annotated[List[GeneratedToken] | None, FieldNumber(2)]
- details: py_to_proto.dataclass_to_proto.Annotated[TokenStreamDetails | None, FieldNumber(3)]
- producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(4)]
- input_tokens: py_to_proto.dataclass_to_proto.Annotated[List[GeneratedToken] | None, FieldNumber(5)]
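One way a consumer might handle a sequence of stream results is sketched below with plain dataclass stand-ins (caikit is not imported). It assumes that each chunk's `generated_text` carries only the newly generated piece, so that concatenation yields the full output; this page does not confirm that behavior, so verify it against the producing module. `collect_stream` is a hypothetical helper, and the stand-in enum includes only three of the documented members.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Tuple


class FinishReason(Enum):
    """Stand-in with a subset of the documented members."""
    NOT_FINISHED = 0
    MAX_TOKENS = 1
    EOS_TOKEN = 2


@dataclass
class TokenStreamDetails:
    """Stand-in with two of the documented fields."""
    finish_reason: FinishReason
    generated_tokens: int


@dataclass
class GeneratedTextStreamResult:
    """Stand-in with two of the documented fields."""
    generated_text: str
    details: Optional[TokenStreamDetails] = None


def collect_stream(
    chunks: Iterable[GeneratedTextStreamResult],
) -> Tuple[str, Optional[FinishReason]]:
    """Join chunk texts and remember the last finish reason seen."""
    parts = []
    last_reason = None
    for chunk in chunks:
        parts.append(chunk.generated_text)
        if chunk.details is not None:
            last_reason = chunk.details.finish_reason
    return "".join(parts), last_reason


chunks = [
    GeneratedTextStreamResult("Hello"),
    GeneratedTextStreamResult(", ", TokenStreamDetails(FinishReason.NOT_FINISHED, 2)),
    GeneratedTextStreamResult("world", TokenStreamDetails(FinishReason.EOS_TOKEN, 3)),
]
full_text, reason = collect_stream(chunks)
```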
- class caikit.interfaces.nlp.data_model.GeneratedToken
Bases: caikit.core.DataObjectBase
A DataObject is a data model class that is backed by a @dataclass.
Data model classes that use the @dataobject decorator must derive from this base class.
- text: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]
- logprob: py_to_proto.dataclass_to_proto.Annotated[float | None, FieldNumber(3)]
- rank: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(4)]
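Per-token log probabilities are often summed to score a whole sequence. The sketch below does this over plain dataclass stand-ins for GeneratedToken (caikit is not imported). It assumes `logprob` is a natural-log probability, which this page does not specify, and it skips tokens whose `logprob` is None. `sequence_logprob` is a hypothetical helper.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneratedToken:
    """Stand-in mirroring the documented fields: text, logprob, rank."""
    text: str
    logprob: Optional[float] = None
    rank: Optional[int] = None


def sequence_logprob(tokens: List[GeneratedToken]) -> float:
    """Sum the log probabilities that are present, ignoring missing ones."""
    return sum(t.logprob for t in tokens if t.logprob is not None)


tokens = [
    GeneratedToken("Hi", logprob=math.log(0.5), rank=1),
    GeneratedToken("!", logprob=math.log(0.25), rank=2),
    GeneratedToken("</s>"),  # no logprob reported for this token
]
total = sequence_logprob(tokens)
# Under the natural-log assumption, exp(total) is the joint probability.
probability = math.exp(total)
```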
- class caikit.interfaces.nlp.data_model.TokenStreamDetails
Bases: caikit.core.DataObjectBase
A DataObject is a data model class that is backed by a @dataclass.
Data model classes that use the @dataobject decorator must derive from this base class.
- finish_reason: py_to_proto.dataclass_to_proto.Annotated[FinishReason, FieldNumber(1)]
- generated_tokens: py_to_proto.dataclass_to_proto.Annotated[numpy.uint32, FieldNumber(2)]
- seed: py_to_proto.dataclass_to_proto.Annotated[numpy.uint64 | None, FieldNumber(3)]
- input_token_count: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(4)]