caikit.interfaces.nlp.data_model

Submodules

Attributes

NLP_PACKAGE

Classes

ClassificationResult

A single classification prediction.

ClassificationResults

Classification results generated from a text and consisting of multiple classes.

ClassificationTrainRecord

A classification training record consisting of a single train instance.

ClassifiedGeneratedTextResult

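Classification result on text produced by a text generation model, containing information from the original text generation output as well as the result of classification on the generated text.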

ClassifiedGeneratedTextStreamResult

Streaming classification result on generated text that indicates how far into the stream has been processed.

TokenClassificationResult

A single token classification prediction.

TokenClassificationResults

Token classification results generated from a text and consisting of multiple classes.

TokenClassificationStreamResult

Streaming token classification results that indicate how far into the stream has been processed.

EmbeddingResult

Result from a text embedding task.

EmbeddingResults

Results from a text embeddings task.

RerankResult

Result for one query in a rerank task.

RerankResults

Results list for rerank tasks (supporting multiple queries).

RerankScore

The score for one document against one query.

RerankScores

Scores for a query in a rerank task.

SentenceSimilarityResult

Result for a sentence similarity task.

SentenceSimilarityResults

Results list for sentence similarity tasks.

SentenceSimilarityScores

Scores for a sentence similarity task.

ChunkerTokenizationStreamResult

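Streaming tokenization result that provides a pointer to the input chunk processed.

Token

Tokens here are the basic units of text. Tokens can be characters, words, sub-words, or other segments of text or code, depending on the method of tokenization chosen or the task being implemented.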

TokenizationResults

Tokenization result generated from a text.

TokenizationStreamResult

Streaming tokenization result that indicates how far into the stream has been processed.

FinishReason

Reason a text generation request stopped.

GeneratedTextResult

Result of a text generation request.

GeneratedTextStreamResult

One item of a streamed text generation response.

GeneratedToken

A single token with its text and, optionally, its log probability and rank.

TokenStreamDetails

Details of a streamed text generation response, such as the finish reason and token counts.

Package Contents

class caikit.interfaces.nlp.data_model.ClassificationResult

Bases: caikit.core.DataObjectBase

A single classification prediction.

label: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]
score: py_to_proto.dataclass_to_proto.Annotated[float, FieldNumber(2)]
class caikit.interfaces.nlp.data_model.ClassificationResults

Bases: caikit.core.DataObjectBase

Classification results generated from a text and consisting of multiple classes.

results: py_to_proto.dataclass_to_proto.Annotated[List[ClassificationResult], FieldNumber(1)]
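ClassificationResults holds a list of label/score pairs. The sketch below picks the single best prediction from that list. It uses plain dataclasses as hypothetical stand-ins that mirror the documented field names; they are not the caikit classes. The reference does not say whether `results` is sorted, so the sketch takes the maximum explicitly rather than assuming the first entry is best.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins mirroring the documented fields; these are plain
# dataclasses, not the caikit classes.
@dataclass
class ClassificationResult:
    label: str
    score: float


@dataclass
class ClassificationResults:
    results: List[ClassificationResult] = field(default_factory=list)


def top_prediction(res: ClassificationResults) -> Optional[ClassificationResult]:
    """Return the highest-scoring prediction, or None when there are none."""
    if not res.results:
        return None
    # The ordering of `results` is not documented, so take the max explicitly.
    return max(res.results, key=lambda r: r.score)


preds = ClassificationResults(results=[
    ClassificationResult(label="negative", score=0.18),
    ClassificationResult(label="positive", score=0.82),
])
print(top_prediction(preds).label)  # positive
```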
class caikit.interfaces.nlp.data_model.ClassificationTrainRecord

Bases: caikit.core.DataObjectBase

A classification training record consisting of a single train instance.

text: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]
labels: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(2)]
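Because `labels` is a list of strings, a single training record can carry several labels. One common way a trainer might consume such records is a multi-hot encoding over a label vocabulary, sketched below. This is an assumption for illustration, not how caikit itself prepares data. The dataclass is a hypothetical stand-in mirroring the documented fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ClassificationTrainRecord:
    # Hypothetical stand-in mirroring the documented fields.
    text: str
    labels: List[str]


def multi_hot(records: List[ClassificationTrainRecord]) -> Tuple[List[str], List[List[int]]]:
    """Build a sorted label vocabulary and one 0/1 row per record."""
    vocab = sorted({label for rec in records for label in rec.labels})
    index: Dict[str, int] = {label: i for i, label in enumerate(vocab)}
    rows = []
    for rec in records:
        row = [0] * len(vocab)
        for label in rec.labels:
            row[index[label]] = 1
        rows.append(row)
    return vocab, rows


records = [
    ClassificationTrainRecord(text="Great battery life", labels=["battery", "positive"]),
    ClassificationTrainRecord(text="Screen cracked", labels=["negative", "screen"]),
]
vocab, rows = multi_hot(records)
print(vocab, rows)
```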
class caikit.interfaces.nlp.data_model.ClassifiedGeneratedTextResult

Bases: caikit.core.DataObjectBase

Classification result on text produced by a text generation model, contains information from the original text generation output as well as the result of classification on the generated text.

class ClassifiedGeneratedTextResult.TextGenTokenClassificationResults

Bases: caikit.core.DataObjectBase

Token classification results for a text generation request, split into results on the input text and results on the generated output text.

input: py_to_proto.dataclass_to_proto.Annotated[List[TokenClassificationResult] | None, FieldNumber(10)]
output: py_to_proto.dataclass_to_proto.Annotated[List[TokenClassificationResult] | None, FieldNumber(20)]
generated_text: py_to_proto.dataclass_to_proto.Annotated[str | None, FieldNumber(1)]
token_classification_results: py_to_proto.dataclass_to_proto.Annotated[ClassifiedGeneratedTextResult.TextGenTokenClassificationResults | None, FieldNumber(2)]
finish_reason: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.nlp.data_model.text_generation.FinishReason | None, FieldNumber(3)]
generated_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(4)]
seed: py_to_proto.dataclass_to_proto.Annotated[numpy.uint64 | None, FieldNumber(5)]
input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(6)]
warnings: py_to_proto.dataclass_to_proto.Annotated[List[InputWarning] | None, FieldNumber(9)]
tokens: py_to_proto.dataclass_to_proto.Annotated[List[caikit.interfaces.nlp.data_model.text_generation.GeneratedToken] | None, FieldNumber(10)]
input_tokens: py_to_proto.dataclass_to_proto.Annotated[List[caikit.interfaces.nlp.data_model.text_generation.GeneratedToken] | None, FieldNumber(11)]
class caikit.interfaces.nlp.data_model.ClassifiedGeneratedTextStreamResult

Bases: ClassifiedGeneratedTextResult

Streaming classification result on generated text that indicates how far into the stream has been processed.

processed_index: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(7)]
start_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(8)]
class caikit.interfaces.nlp.data_model.TokenClassificationResult

Bases: caikit.core.DataObjectBase

A single token classification prediction.

start: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(1)]
end: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
word: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(3)]
entity: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(4)]
entity_group: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(5)]
score: py_to_proto.dataclass_to_proto.Annotated[float, FieldNumber(6)]
token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(7)]
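Each TokenClassificationResult locates a span of the input through `start` and `end`. The sketch below recovers the span text and filters by `score`. It assumes the offsets are character positions into the original text with `end` exclusive (Python slice convention); the reference does not state this. The dataclass and the example entity labels are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TokenClassificationResult:
    # Hypothetical stand-in mirroring the documented fields.
    start: int
    end: int
    word: str
    entity: str
    entity_group: str
    score: float
    token_count: Optional[int] = None


def extract_spans(text: str, results: List[TokenClassificationResult],
                  min_score: float = 0.5) -> List[Tuple[str, str]]:
    """Return (entity_group, span text) pairs, assuming `end` is exclusive."""
    return [(r.entity_group, text[r.start:r.end])
            for r in results if r.score >= min_score]


text = "Ada Lovelace lived in London."
results = [
    TokenClassificationResult(0, 12, "Ada Lovelace", "B-PER", "PER", 0.98),
    TokenClassificationResult(22, 28, "London", "B-LOC", "LOC", 0.95),
]
print(extract_spans(text, results))  # [('PER', 'Ada Lovelace'), ('LOC', 'London')]
```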
class caikit.interfaces.nlp.data_model.TokenClassificationResults

Bases: caikit.core.DataObjectBase

Token classification results generated from a text and consisting of multiple classes.

results: py_to_proto.dataclass_to_proto.Annotated[List[TokenClassificationResult], FieldNumber(1)]
class caikit.interfaces.nlp.data_model.TokenClassificationStreamResult

Bases: TokenClassificationResults

Streaming token classification results that indicate how far into the stream has been processed.

processed_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
start_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(3)]
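A client consuming a stream of TokenClassificationStreamResult items typically merges the per-item `results` and tracks progress. The sketch below does that. It assumes `processed_index` is an offset into the input stream that grows as processing advances. It carries `start_index` without interpreting it, since the reference does not define its meaning beyond the field name. Both dataclasses are hypothetical stand-ins, and the result type keeps only a subset of the documented fields.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class TokenClassificationResult:
    # Hypothetical stand-in with a subset of the documented fields.
    start: int
    end: int
    word: str
    entity_group: str
    score: float


@dataclass
class TokenClassificationStreamResult:
    # Hypothetical stand-in mirroring the documented stream fields.
    results: List[TokenClassificationResult] = field(default_factory=list)
    processed_index: int = 0
    start_index: int = 0


def collect(stream: Iterable[TokenClassificationStreamResult]) -> Tuple[List[TokenClassificationResult], int]:
    """Merge results from every streamed item and track the furthest processed index."""
    merged: List[TokenClassificationResult] = []
    processed = 0
    for item in stream:
        merged.extend(item.results)
        processed = max(processed, item.processed_index)
    return merged, processed


stream = [
    TokenClassificationStreamResult([TokenClassificationResult(0, 3, "Ada", "PER", 0.9)],
                                    processed_index=10, start_index=0),
    TokenClassificationStreamResult([], processed_index=18, start_index=10),
]
merged, processed = collect(stream)
print([r.word for r in merged], processed)
```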
class caikit.interfaces.nlp.data_model.EmbeddingResult

Bases: caikit.core.DataObjectBase

Result from a text embedding task.

result: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.Vector1D, FieldNumber(1)]
producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]
input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(3)]
class caikit.interfaces.nlp.data_model.EmbeddingResults

Bases: caikit.core.DataObjectBase

Results from a text embeddings task.

results: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ListOfVector1D, FieldNumber(1)]
producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]
input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(3)]
caikit.interfaces.nlp.data_model.NLP_PACKAGE = 'caikit_data_model.nlp'
class caikit.interfaces.nlp.data_model.RerankResult

Bases: caikit.core.DataObjectBase

Result for one query in a rerank task. Its result field is a RerankScores holding n RerankScore entries, where n is based on top_n documents and each score indicates the relevance of that document for this query. Results are ordered most-relevant first.

result: py_to_proto.dataclass_to_proto.Annotated[RerankScores, FieldNumber(1)]
producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]
input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(3)]
class caikit.interfaces.nlp.data_model.RerankResults

Bases: caikit.core.DataObjectBase

Results list for rerank tasks (supporting multiple queries). For multiple queries, each query has its own RerankScores entry ranking the documents for that query.

results: py_to_proto.dataclass_to_proto.Annotated[List[RerankScores], FieldNumber(1)]
producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]
input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(3)]
class caikit.interfaces.nlp.data_model.RerankScore

Bases: caikit.core.DataObjectBase

The score for one document against one query.

document: py_to_proto.dataclass_to_proto.Annotated[caikit.core.data_model.json_dict.JsonDict | None, FieldNumber(1)]
index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
score: py_to_proto.dataclass_to_proto.Annotated[float, FieldNumber(3)]
text: py_to_proto.dataclass_to_proto.Annotated[str | None, FieldNumber(4)]
class caikit.interfaces.nlp.data_model.RerankScores

Bases: caikit.core.DataObjectBase

Scores for a query in a rerank task. This is a list of n RerankScore entries, where n is based on top_n documents and each score indicates the relevance of that document for this query. Results are ordered most-relevant first.

query: py_to_proto.dataclass_to_proto.Annotated[str | None, FieldNumber(1)]
scores: py_to_proto.dataclass_to_proto.Annotated[List[RerankScore], FieldNumber(2)]
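RerankScores keeps the top_n documents, most relevant first. The sketch below shows how such a list could be assembled from raw relevance scores. It assumes each RerankScore's `index` records the document's position in the submitted list, so callers can map results back. The `raw_scores` input and the `build_rerank_scores` helper are hypothetical; a real reranker would produce the scores with a model. The dataclasses are stand-ins, with `document` typed as a plain dict instead of JsonDict.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RerankScore:
    # Hypothetical stand-in; `document` is a plain dict here instead of JsonDict.
    index: int
    score: float
    document: Optional[dict] = None
    text: Optional[str] = None


@dataclass
class RerankScores:
    query: Optional[str] = None
    scores: List[RerankScore] = field(default_factory=list)


def build_rerank_scores(query: str, documents: List[str],
                        raw_scores: List[float], top_n: int) -> RerankScores:
    """Keep the top_n documents, most relevant first, remembering original positions."""
    order = sorted(range(len(documents)), key=lambda i: raw_scores[i], reverse=True)
    return RerankScores(
        query=query,
        scores=[RerankScore(index=i, score=raw_scores[i], text=documents[i])
                for i in order[:top_n]],
    )


docs = ["cats purr", "dogs bark", "cats and dogs"]
ranked = build_rerank_scores("cats", docs, raw_scores=[0.9, 0.1, 0.6], top_n=2)
print([s.index for s in ranked.scores])  # [0, 2]
```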
class caikit.interfaces.nlp.data_model.SentenceSimilarityResult

Bases: caikit.core.DataObjectBase

Result for a sentence similarity task.

result: py_to_proto.dataclass_to_proto.Annotated[SentenceSimilarityScores, FieldNumber(1)]
producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]
input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(3)]
class caikit.interfaces.nlp.data_model.SentenceSimilarityResults

Bases: caikit.core.DataObjectBase

Results list for sentence similarity tasks

results: py_to_proto.dataclass_to_proto.Annotated[List[SentenceSimilarityScores], FieldNumber(1)]
producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(2)]
input_token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(3)]
class caikit.interfaces.nlp.data_model.SentenceSimilarityScores

Bases: caikit.core.DataObjectBase

Scores for a sentence similarity task.

scores: py_to_proto.dataclass_to_proto.Annotated[List[float], FieldNumber(1)]
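SentenceSimilarityScores is a flat list of floats, one per compared sentence. The reference does not name the similarity metric. The sketch below assumes cosine similarity between embedding vectors, which is a common choice, and produces one score per comparison vector in input order. The dataclass and helpers are hypothetical stand-ins, and the vectors are toy values rather than real embeddings.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SentenceSimilarityScores:
    # Hypothetical stand-in mirroring the documented field.
    scores: List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_scores(source: Sequence[float],
                      others: List[Sequence[float]]) -> SentenceSimilarityScores:
    """One score per comparison sentence, in input order."""
    return SentenceSimilarityScores(scores=[cosine(source, v) for v in others])


result = similarity_scores([1.0, 0.0], [[1.0, 0.0], [0.0, 2.0], [-3.0, 0.0]])
print(result.scores)  # [1.0, 0.0, -1.0]
```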
class caikit.interfaces.nlp.data_model.ChunkerTokenizationStreamResult

Bases: TokenizationStreamResult

Streaming tokenization result that provides a pointer to the input chunk processed.

input_start_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(20)]
input_end_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(21)]
class caikit.interfaces.nlp.data_model.Token

Bases: caikit.core.DataObjectBase

Tokens here are the basic units of text. Tokens can be characters, words, sub-words, or other segments of text or code, depending on the method of tokenization chosen or the task being implemented.

start: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(1)]
end: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
text: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(3)]
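Since the tokenization method is left open, the simplest illustration is a whitespace tokenizer that records each token's character offsets. The sketch below fills Token and TokenizationResults stand-ins this way. It assumes `end` is exclusive and `token_count` is simply the number of tokens; the reference confirms neither. The dataclasses and `whitespace_tokenize` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    # Hypothetical stand-in mirroring the documented fields.
    start: int
    end: int
    text: str


@dataclass
class TokenizationResults:
    # Hypothetical stand-in mirroring the documented fields.
    results: Optional[List[Token]] = None
    token_count: Optional[int] = None


def whitespace_tokenize(text: str) -> TokenizationResults:
    """Split on runs of whitespace, recording [start, end) character offsets."""
    tokens = [Token(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]
    return TokenizationResults(results=tokens, token_count=len(tokens))


out = whitespace_tokenize("Hello  world!")
print([(t.start, t.end, t.text) for t in out.results])  # [(0, 5, 'Hello'), (7, 13, 'world!')]
```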
class caikit.interfaces.nlp.data_model.TokenizationResults

Bases: caikit.core.DataObjectBase

Tokenization result generated from a text.

results: py_to_proto.dataclass_to_proto.Annotated[List[Token] | None, FieldNumber(1)]
token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(4)]
class caikit.interfaces.nlp.data_model.TokenizationStreamResult

Bases: TokenizationResults

Streaming tokenization result that indicates how far into the stream has been processed.

processed_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
start_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(3)]
class caikit.interfaces.nlp.data_model.FinishReason(*args, **kwds)

Bases: enum.Enum

Reason a text generation request stopped, or NOT_FINISHED if it has not finished yet. This is the type of the finish_reason fields on GeneratedTextResult, TokenStreamDetails, and ClassifiedGeneratedTextResult.

NOT_FINISHED = 0
MAX_TOKENS = 1
EOS_TOKEN = 2
CANCELLED = 3
TIME_LIMIT = 4
STOP_SEQUENCE = 5
TOKEN_LIMIT = 6
ERROR = 7
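FinishReason is documented as a standard `enum.Enum`, so the usual lookups by value (for example, an integer read from a serialized message) and by name apply. The sketch below uses a hypothetical stand-in enum that copies the documented members and values, so it runs without caikit. The `is_finished` helper is also hypothetical; it treats every member other than NOT_FINISHED as a terminal reason.

```python
from enum import Enum


class FinishReason(Enum):
    # Hypothetical stand-in copying the documented members and values.
    NOT_FINISHED = 0
    MAX_TOKENS = 1
    EOS_TOKEN = 2
    CANCELLED = 3
    TIME_LIMIT = 4
    STOP_SEQUENCE = 5
    TOKEN_LIMIT = 6
    ERROR = 7


def is_finished(reason: FinishReason) -> bool:
    """Every member except NOT_FINISHED signals that generation has stopped."""
    return reason is not FinishReason.NOT_FINISHED


reason = FinishReason(2)  # value lookup, e.g. from a serialized integer
print(reason.name, is_finished(reason))  # EOS_TOKEN True
```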
class caikit.interfaces.nlp.data_model.GeneratedTextResult

Bases: caikit.core.DataObjectBase

Result of a text generation request: the generated text, the number of generated tokens, the finish reason, the producer, and the input token count, plus an optional seed and optional per-token details for the generated and input tokens.

generated_text: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]
generated_tokens: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
finish_reason: py_to_proto.dataclass_to_proto.Annotated[FinishReason, FieldNumber(3)]
producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(4)]
input_token_count: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(5)]
seed: py_to_proto.dataclass_to_proto.Annotated[numpy.uint64 | None, FieldNumber(6)]
tokens: py_to_proto.dataclass_to_proto.Annotated[List[GeneratedToken] | None, FieldNumber(7)]
input_tokens: py_to_proto.dataclass_to_proto.Annotated[List[GeneratedToken] | None, FieldNumber(8)]
class caikit.interfaces.nlp.data_model.GeneratedTextStreamResult

Bases: caikit.core.DataObjectBase

One item of a streamed text generation response, carrying generated text, optional per-token details, and optional TokenStreamDetails.

generated_text: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]
tokens: py_to_proto.dataclass_to_proto.Annotated[List[GeneratedToken] | None, FieldNumber(2)]
details: py_to_proto.dataclass_to_proto.Annotated[TokenStreamDetails | None, FieldNumber(3)]
producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.common.data_model.ProducerId, FieldNumber(4)]
input_tokens: py_to_proto.dataclass_to_proto.Annotated[List[GeneratedToken] | None, FieldNumber(5)]
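A client reading a stream of GeneratedTextStreamResult items usually rebuilds the full text and reads the final status. The sketch below does both. It assumes each item's `generated_text` holds only the newly generated text, and that the most recent non-empty `details` carries the final finish reason. Neither behavior is stated in the reference, so check it against a real server. All classes are hypothetical stand-ins; the enum is a subset of the documented members, and the numpy integer types are replaced with plain ints.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Tuple


class FinishReason(Enum):
    # Hypothetical stand-in with a subset of the documented members.
    NOT_FINISHED = 0
    MAX_TOKENS = 1
    EOS_TOKEN = 2


@dataclass
class TokenStreamDetails:
    # Hypothetical stand-in mirroring the documented fields (plain ints for numpy types).
    finish_reason: FinishReason
    generated_tokens: int
    seed: Optional[int] = None
    input_token_count: int = 0


@dataclass
class GeneratedTextStreamResult:
    # Hypothetical stand-in with a subset of the documented fields.
    generated_text: str
    details: Optional[TokenStreamDetails] = None


def accumulate(stream: Iterable[GeneratedTextStreamResult]) -> Tuple[str, Optional[TokenStreamDetails]]:
    """Join per-item text and keep the most recent details seen."""
    parts: List[str] = []
    last_details: Optional[TokenStreamDetails] = None
    for chunk in stream:
        parts.append(chunk.generated_text)
        if chunk.details is not None:
            last_details = chunk.details
    return "".join(parts), last_details


chunks = [
    GeneratedTextStreamResult("Hello", TokenStreamDetails(FinishReason.NOT_FINISHED, 1, input_token_count=4)),
    GeneratedTextStreamResult(", world", TokenStreamDetails(FinishReason.EOS_TOKEN, 3, input_token_count=4)),
]
text, details = accumulate(chunks)
print(text, details.finish_reason.name)  # Hello, world EOS_TOKEN
```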
class caikit.interfaces.nlp.data_model.GeneratedToken

Bases: caikit.core.DataObjectBase

A single token with its text and, optionally, its log probability and rank.

text: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(1)]
logprob: py_to_proto.dataclass_to_proto.Annotated[float | None, FieldNumber(3)]
rank: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(4)]
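GeneratedToken's `logprob` is optional, so consumers need to handle its absence. The sketch below converts a log probability back to a probability and sums log probabilities over a sequence, skipping tokens that lack one. It assumes `logprob` is a natural-log probability, which is the usual convention but is not stated in the reference. The dataclass is a hypothetical stand-in.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneratedToken:
    # Hypothetical stand-in mirroring the documented fields.
    text: str
    logprob: Optional[float] = None
    rank: Optional[int] = None


def token_probability(token: GeneratedToken) -> Optional[float]:
    """Convert a natural-log probability back to a probability, if present."""
    return None if token.logprob is None else math.exp(token.logprob)


def sequence_logprob(tokens: List[GeneratedToken]) -> float:
    """Sum the available log probabilities; tokens without one are skipped."""
    return sum(t.logprob for t in tokens if t.logprob is not None)


toks = [GeneratedToken("Hi", math.log(0.5), rank=1),
        GeneratedToken("!", math.log(0.5), rank=1),
        GeneratedToken(" ")]
print(token_probability(toks[0]), math.exp(sequence_logprob(toks)))
```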
class caikit.interfaces.nlp.data_model.TokenStreamDetails

Bases: caikit.core.DataObjectBase

Details of a streamed text generation response: the finish reason, the number of generated tokens, an optional seed, and the input token count.

finish_reason: py_to_proto.dataclass_to_proto.Annotated[FinishReason, FieldNumber(1)]
generated_tokens: py_to_proto.dataclass_to_proto.Annotated[numpy.uint32, FieldNumber(2)]
seed: py_to_proto.dataclass_to_proto.Annotated[numpy.uint64 | None, FieldNumber(3)]
input_token_count: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(4)]