caikit.interfaces.ts.data_model
Data model definitions for structures in the time series domain
Submodules
- caikit.interfaces.ts.data_model._single_timeseries
- caikit.interfaces.ts.data_model.backends
- caikit.interfaces.ts.data_model.package
- caikit.interfaces.ts.data_model.time_types
- caikit.interfaces.ts.data_model.timeseries
- caikit.interfaces.ts.data_model.timeseries_evaluation
- caikit.interfaces.ts.data_model.toolkit
Attributes
- TS_PACKAGE
Classes
- PeriodicTimeSequence: A PeriodicTimeSequence represents an indefinite time sequence where ticks occur at a regular period.
- PointTimeSequence: A PointTimeSequence represents a finite sequence of time points that may or may not be evenly distributed in time.
- Seconds: A nanosecond value that can be interpreted as either a datetime or a timedelta.
- TimeDuration: The core data model object for a TimeDuration.
- TimePoint: The core data model object for a TimePoint.
- ValueSequence: A ValueSequence is a finite list of contiguous values, each representing the value of a given attribute for a specific observation within a TimeSeries.
- SingleTimeSeries: The TimeSeries object is the central data container for the library.
- TimeSeries: A DataObject is a data model class that is backed by a @dataclass.
- Id: A single instance of Id.
- EvaluationRecord: A single EvaluationRecord for EvaluationResult.
- EvaluationResult: EvaluationResult containing the evaluation results.
Package Contents
- caikit.interfaces.ts.data_model.TS_PACKAGE = 'caikit_data_model.timeseries'
- class caikit.interfaces.ts.data_model.PeriodicTimeSequence
Bases: caikit.core.DataObjectBase
A PeriodicTimeSequence represents an indefinite time sequence where ticks occur at a regular period.
- period_length: py_to_proto.dataclass_to_proto.Annotated[TimeDuration, FieldNumber(2)]
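As an illustration of "ticks occur at a regular period", the following standard-library sketch generates such a sequence. It does not use caikit: the function `periodic_ticks` and its `start` argument are hypothetical names chosen for this example, and `datetime`/`timedelta` stand in for the TimePoint and TimeDuration types.

```python
from datetime import datetime, timedelta, timezone
from itertools import islice
from typing import Iterator


def periodic_ticks(start: datetime, period: timedelta) -> Iterator[datetime]:
    """Yield start, start + period, start + 2 * period, ... without end."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    tick = start
    while True:
        yield tick
        tick += period


# The sequence is indefinite, so a caller takes only as many ticks as needed.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
first_three = list(islice(periodic_ticks(start, timedelta(hours=1)), 3))
```

Because the generator never terminates, it has to be consumed lazily (here with `islice`) rather than materialized in full.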
- class caikit.interfaces.ts.data_model.PointTimeSequence
Bases: caikit.core.DataObjectBase
A PointTimeSequence represents a finite sequence of time points that may or may not be evenly distributed in time.
- class caikit.interfaces.ts.data_model.Seconds
Bases: caikit.core.DataObjectBase
A nanosecond value that can be interpreted as either a datetime or a timedelta.
- seconds: py_to_proto.dataclass_to_proto.Annotated[float, FieldNumber(1)]
- as_datetime() -> datetime.datetime
Return a Python datetime object. The returned object will have timezone.utc set as its timezone info.
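The two interpretations of a float number of seconds can be sketched with the standard library alone. This is not caikit's implementation: the helper names are hypothetical, and treating the value as seconds since the Unix epoch is an assumption made for the example.

```python
from datetime import datetime, timedelta, timezone


def seconds_as_datetime(seconds: float) -> datetime:
    """Read the value as seconds since the Unix epoch (assumption), in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def seconds_as_timedelta(seconds: float) -> timedelta:
    """Read the same value as a duration instead of a point in time."""
    return timedelta(seconds=seconds)


stamp = seconds_as_datetime(86400.0)   # one day after the epoch, timezone-aware
span = seconds_as_timedelta(86400.0)   # a duration of exactly one day
```

Passing `tz=timezone.utc` is what makes the result timezone-aware; without it, `fromtimestamp` returns a naive datetime in local time.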
- class caikit.interfaces.ts.data_model.TimeDuration
Bases: caikit.core.DataObjectBase
The core data model object for a TimeDuration.
- time: py_to_proto.dataclass_to_proto.Annotated[int, OneofField('dt_int'), FieldNumber(1)] | py_to_proto.dataclass_to_proto.Annotated[float, OneofField('dt_float'), FieldNumber(2)] | py_to_proto.dataclass_to_proto.Annotated[str, OneofField('dt_str'), FieldNumber(3)] | py_to_proto.dataclass_to_proto.Annotated[Seconds, OneofField('dt_sec'), FieldNumber(4)]
- class caikit.interfaces.ts.data_model.TimePoint
Bases: caikit.core.DataObjectBase
The core data model object for a TimePoint.
- class caikit.interfaces.ts.data_model.ValueSequence
Bases: caikit.core.DataObjectBase
A ValueSequence is a finite list of contiguous values, each representing the value of a given attribute for a specific observation within a TimeSeries.
- class IntValueSequence
Bases: caikit.core.DataObjectBase
Nested value sequence of integers.
- values: py_to_proto.dataclass_to_proto.Annotated[List[int], FieldNumber(1)]
- class FloatValueSequence
Bases: caikit.core.DataObjectBase
Nested value sequence of floats.
- values: py_to_proto.dataclass_to_proto.Annotated[List[float], FieldNumber(1)]
- class StrValueSequence
Bases: caikit.core.DataObjectBase
Nested value sequence of strings.
- values: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(1)]
- class VectorValueSequence
Bases: caikit.core.DataObjectBase
Nested value sequence of vectors.
- class TimePointSequence
Bases: caikit.core.DataObjectBase
Nested value sequence of TimePoints.
- values: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(1)]
- class AnyValueSequence
Bases: caikit.core.DataObjectBase
Nested value sequence of Any objects.
- values: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(1)]
- classmethod decode_values(values: Tuple[str])
Cached class method to enable caching of decoded representations.
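A cached class method that takes a tuple can be sketched as below. The class `AnyValues` is a hypothetical stand-in, and the use of JSON as the string encoding is an assumption for illustration only; the source does not state how values are encoded. The sketch shows one likely reason for a tuple argument: tuples are hashable and can therefore serve as cache keys.

```python
import json
from functools import lru_cache
from typing import Any, List, Tuple


class AnyValues:
    """Hypothetical stand-in for a sequence of serialized values."""

    def __init__(self, values: List[str]):
        self.values = values

    @classmethod
    @lru_cache(maxsize=128)
    def decode_values(cls, values: Tuple[str, ...]) -> Tuple[Any, ...]:
        # A tuple is hashable, so it can be a cache key; a list cannot.
        # JSON decoding is an assumption made for this sketch.
        return tuple(json.loads(v) for v in values)

    def decoded(self) -> Tuple[Any, ...]:
        # Convert the stored list to a tuple before hitting the cache.
        return self.decode_values(tuple(self.values))


seq = AnyValues(['{"a": 1}', "[1, 2]", '"x"'])
first = seq.decoded()   # decodes and stores the result in the cache
second = seq.decoded()  # served from the cache, no re-parsing
```

One design consequence worth noting: cache hits return the same object, so callers should treat the decoded values as read-only.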
- sequence: py_to_proto.dataclass_to_proto.Annotated[ValueSequence.IntValueSequence, OneofField('val_int'), FieldNumber(1)] | py_to_proto.dataclass_to_proto.Annotated[ValueSequence.FloatValueSequence, OneofField('val_float'), FieldNumber(2)] | py_to_proto.dataclass_to_proto.Annotated[ValueSequence.StrValueSequence, OneofField('val_str'), FieldNumber(3)] | py_to_proto.dataclass_to_proto.Annotated[ValueSequence.TimePointSequence, OneofField('val_timepoint'), FieldNumber(4)] | py_to_proto.dataclass_to_proto.Annotated[ValueSequence.AnyValueSequence, OneofField('val_any'), FieldNumber(5)] | py_to_proto.dataclass_to_proto.Annotated[ValueSequence.VectorValueSequence, OneofField('val_vector'), FieldNumber(6)]
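The `sequence` field is a oneof: exactly one of the nested sequence types is populated. The sketch below shows how a caller might choose between the scalar variants `val_int`, `val_float`, and `val_str` from the Python type of the values. The function `pick_variant` is hypothetical and is not part of caikit; the remaining variants (`val_timepoint`, `val_any`, `val_vector`) are left out for brevity.

```python
from typing import List, Sequence, Tuple, Union

Scalar = Union[int, float, str]


def pick_variant(values: Sequence[Scalar]) -> Tuple[str, List[Scalar]]:
    """Return (oneof_name, values) for a homogeneous column of scalars."""
    if not values:
        raise ValueError("cannot infer a variant from an empty column")
    # bool is a subclass of int in Python, so reject it explicitly.
    if any(isinstance(v, bool) for v in values):
        raise TypeError("bool values are not covered by this sketch")
    if all(isinstance(v, int) for v in values):
        return "val_int", list(values)
    # A mix of ints and floats is widened to floats.
    if all(isinstance(v, (int, float)) for v in values):
        return "val_float", [float(v) for v in values]
    if all(isinstance(v, str) for v in values):
        return "val_str", list(values)
    raise TypeError("mixed or unsupported value types")
```

For example, `pick_variant([1, 2.5])` selects `val_float` and widens the integer to `1.0`.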
- class caikit.interfaces.ts.data_model.SingleTimeSeries(*args, **kwargs)
Bases: caikit.core.DataObjectBase
The TimeSeries object is the central data container for the library. At present it wraps either a pandas.DataFrame or a pyspark.sql.DataFrame to bind into the caikit data model.
- class StringIDSequence
Bases: caikit.core.DataObjectBase
Nested value sequence of strings.
- values: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(1)]
- class IntIDSequence
Bases: caikit.core.DataObjectBase
Nested value sequence of ints.
- values: py_to_proto.dataclass_to_proto.Annotated[List[int], FieldNumber(1)]
- time_sequence: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.ts.data_model.time_types.PeriodicTimeSequence, OneofField('time_period'), FieldNumber(10)] | py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.ts.data_model.time_types.PointTimeSequence, OneofField('time_points'), FieldNumber(20)]
- values: py_to_proto.dataclass_to_proto.Annotated[List[caikit.interfaces.ts.data_model.time_types.ValueSequence], FieldNumber(1)]
- timestamp_label: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(2)]
- value_labels: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(3)]
- ids: py_to_proto.dataclass_to_proto.Annotated[SingleTimeSeries.IntIDSequence, OneofField('id_int'), FieldNumber(30)] | py_to_proto.dataclass_to_proto.Annotated[SingleTimeSeries.StringIDSequence, OneofField('id_str'), FieldNumber(40)]
- _DEFAULT_TS_COL = 'timestamp'
- _get_pd_df() -> Tuple[pandas.DataFrame, str, Iterable[str]]
Convert the data to a pandas DataFrame, efficiently if possible.
- __eq__(other: SingleTimeSeries) -> bool
Equivalence operator for SingleTimeSeries objects.
Orders the data by timestamp_label before checking for equivalence. Relies on the underlying pandas equivalence testing function pd.testing.assert_frame_equal.
- Args:
other (SingleTimeSeries): SingleTimeSeries to test against.
- Returns:
bool: True if the SingleTimeSeries are equivalent.
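The "sort by timestamp, then compare" idea can be sketched without pandas, using lists of row dictionaries as a stand-in for DataFrames. The function `equal_after_sorting` is hypothetical and is not caikit's implementation. Note also that pd.testing.assert_frame_equal raises AssertionError on a mismatch rather than returning a value, so an operator built on it presumably converts that exception into False.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def equal_after_sorting(
    left: Sequence[Row], right: Sequence[Row], timestamp_label: str
) -> bool:
    """Compare two row collections after ordering both by timestamp_label."""

    def ordered(rows: Sequence[Row]) -> List[Row]:
        return sorted(rows, key=lambda row: row[timestamp_label])

    return ordered(left) == ordered(right)


shuffled = [
    {"timestamp": 2, "value": 20.0},
    {"timestamp": 1, "value": 10.0},
]
in_order = [
    {"timestamp": 1, "value": 10.0},
    {"timestamp": 2, "value": 20.0},
]
same = equal_after_sorting(shuffled, in_order, "timestamp")
```

Sorting first makes the comparison independent of row order, which is the property the docstring describes. Rows with duplicate timestamps would need a secondary sort key to be compared reliably.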
- _as_pandas_ops(adf, include_timestamps: None | bool = False)
Operate on a pandas-like object instead of strictly a pandas DataFrame.
- as_pandas(include_timestamps: bool | None = None) -> pandas.DataFrame
Get the view of this timeseries as a pandas DataFrame.
- Args:
include_timestamps (bool, optional): Control the addition or removal of timestamps. True will include timestamps, generating them if needed, while False will remove timestamps. Use None to return what is available, leaving it unchanged. Defaults to None.
- Returns:
pd.DataFrame: The view of the data as a pandas DataFrame
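The three-way behavior of include_timestamps can be sketched as follows. To stay dependency-free, a dict of column lists stands in for a DataFrame; `apply_include_timestamps` is a hypothetical helper, and generating timestamps as the integers 0..n-1 is an assumption, since the source does not say how missing timestamps are generated.

```python
from typing import Dict, List, Optional

Columns = Dict[str, List[object]]


def apply_include_timestamps(
    columns: Columns,
    include_timestamps: Optional[bool] = None,
    timestamp_label: str = "timestamp",
) -> Columns:
    """Add, remove, or leave alone the timestamp column of a column mapping."""
    result = dict(columns)  # shallow copy so the input mapping is not modified
    if include_timestamps is None:
        return result  # return what is available, unchanged
    if include_timestamps:
        if timestamp_label not in result:
            n_rows = len(next(iter(result.values()), []))
            # Assumption: generated timestamps are simply 0..n-1.
            result = {timestamp_label: list(range(n_rows)), **result}
        return result
    result.pop(timestamp_label, None)  # False: drop timestamps if present
    return result


data = {"value": [1.5, 2.5, 3.5]}
with_ts = apply_include_timestamps(data, include_timestamps=True)
without_ts = apply_include_timestamps(with_ts, include_timestamps=False)
```

The default column name `"timestamp"` mirrors the `_DEFAULT_TS_COL` value listed for this class.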
- as_spark(include_timestamps: bool | None = None) -> caikit.interfaces.ts.data_model.toolkit.optional_dependencies.pyspark.sql.DataFrame
Get the view of this timeseries as a Spark DataFrame.
- Args:
include_timestamps (bool, optional): Control the addition or removal of timestamps. True will include timestamps, generating them if needed, while False will remove timestamps. Use None to return what is available, leaving it unchanged. Defaults to None.
- Returns:
pyspark.sql.DataFrame: The view of the data as a spark DataFrame
- class caikit.interfaces.ts.data_model.TimeSeries(*args, **kwargs)
Bases: caikit.core.DataObjectBase
A DataObject is a data model class that is backed by a @dataclass.
Data model classes that use the @dataobject decorator must derive from this base class.
- timeseries: List[caikit.interfaces.ts.data_model._single_timeseries.SingleTimeSeries]
- id_labels: List[str]
- producer_id: caikit.core.data_model.ProducerId
- _DEFAULT_ID_COL = '_TS_RESERVED'
- _DEFAULT_TS_COL = 'timestamp'
- __eq__(other: TimeSeries) -> bool
Equivalence operator for TimeSeries objects.
- Args:
other (TimeSeries): TimeSeries to test against.
- Returns:
bool: True if the TimeSeries are equivalent.
- _get_pd_df() -> Tuple[pandas.DataFrame, Iterable[str], str, Iterable[str]]
Convert the data to a pandas DataFrame, efficiently if possible.
- as_pandas(include_timestamps: bool | None = None, is_multi: bool | None = None) -> pandas.DataFrame
Get the view of this timeseries as a pandas DataFrame.
- Args:
include_timestamps (bool, optional): Control the addition or removal of timestamps. True will include timestamps, generating them if needed, while False will remove timestamps. Use None to return what is available, leaving it unchanged. Defaults to None.
is_multi (bool, optional): Controls how id_labels are handled in the output. If id_labels are specified in the data model, they are always returned. If no id_labels are specified, setting is_multi to True will add a new column with generated id labels (0), while False or None will not add any id_labels.
- Returns:
pd.DataFrame: The view of the data as a pandas DataFrame
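The is_multi rules can be sketched in the same dependency-free style, with a dict of column lists standing in for a DataFrame. `apply_is_multi` is a hypothetical helper, and using `_TS_RESERVED` (the `_DEFAULT_ID_COL` value listed above) as the name of the generated column is an assumption.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Columns = Dict[str, List[object]]


def apply_is_multi(
    columns: Columns,
    id_labels: Sequence[str],
    is_multi: Optional[bool] = None,
    default_id_col: str = "_TS_RESERVED",  # assumption: name of generated column
) -> Tuple[Columns, List[str]]:
    """Return (columns, id_labels) following the is_multi rules."""
    result = dict(columns)  # shallow copy so the input mapping is not modified
    if id_labels:
        # Existing id columns are always kept, whatever is_multi says.
        return result, list(id_labels)
    if is_multi:
        n_rows = len(next(iter(result.values()), []))
        # Every row receives the single generated id 0.
        result = {default_id_col: [0] * n_rows, **result}
        return result, [default_id_col]
    return result, []  # False or None: no id column is added


data = {"timestamp": [0, 1], "value": [1.5, 2.5]}
multi_cols, multi_ids = apply_is_multi(data, id_labels=[], is_multi=True)
```

With `is_multi=True` and no id labels, the result gains one id column whose value is 0 on every row, matching the "generated id labels (0)" wording.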
- as_spark(include_timestamps: bool | None = None, is_multi: bool | None = None) -> caikit.interfaces.ts.data_model.toolkit.optional_dependencies.pyspark.sql.DataFrame
Get the view of this timeseries as a Spark DataFrame.
- Args:
include_timestamps (bool, optional): Control the addition or removal of timestamps. True will include timestamps, generating them if needed, while False will remove timestamps. Use None to return what is available, leaving it unchanged. Defaults to None.
is_multi (bool, optional): Controls how id_labels are handled in the output. If id_labels are specified in the data model, they are always returned. If no id_labels are specified, setting is_multi to True will add a new column with generated id labels (0), while False or None will not add any id_labels.
- Returns:
pyspark.sql.DataFrame: The view of the data as a spark DataFrame
- class caikit.interfaces.ts.data_model.Id
Bases: caikit.core.DataObjectBase
A single instance of Id.
Representation of ids that can be either text or index. Customized this way to be able to work with repeated
- value: py_to_proto.dataclass_to_proto.Annotated[str, OneofField('text'), FieldNumber(1)] | py_to_proto.dataclass_to_proto.Annotated[int, OneofField('index'), FieldNumber(2)]
- class caikit.interfaces.ts.data_model.EvaluationRecord(id_values=None, metric_values=None, offset=None)
Bases: caikit.core.DataObjectBase
A single EvaluationRecord for EvaluationResult.
Representation of EvaluationRecord for each row in the dataframe, for example:
EvaluationRecord{id_values=["A", "B"], metric_values=[0.234, 0.568, 0.417], offset="overall"}
- metric_values: py_to_proto.dataclass_to_proto.Annotated[List[float], FieldNumber(2)]
- class caikit.interfaces.ts.data_model.EvaluationResult(records=None, id_cols=None, metric_cols=None, offset_col=None, df=None, producer_id=None)
Bases: caikit.core.DataObjectBase
EvaluationResult containing the evaluation results.
The representation stores the rows of the dataframe as a list of records, together with string lists that keep track of the id and metric columns.
- records: py_to_proto.dataclass_to_proto.Annotated[List[EvaluationRecord], FieldNumber(1)]
- id_cols: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(2)]
- metric_cols: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(3)]
- offset_col: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(4)]
- producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.core.data_model.ProducerId, FieldNumber(5)]
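The relationship between dataframe rows, `id_cols`, `metric_cols`, `offset_col`, and the resulting records can be sketched as below. The `Record` dataclass and `rows_to_records` function are hypothetical stand-ins rather than caikit's classes, rows are plain dictionaries instead of a DataFrame, and the column names in the usage lines are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Record:
    """Hypothetical stand-in for one evaluation record (one dataframe row)."""

    id_values: List[str]
    metric_values: List[float]
    offset: Optional[str] = None


def rows_to_records(
    rows: Sequence[Dict[str, object]],
    id_cols: Sequence[str],
    metric_cols: Sequence[str],
    offset_col: Optional[str] = None,
) -> List[Record]:
    """Split each row into id values, metric values, and an optional offset."""
    return [
        Record(
            id_values=[str(row[col]) for col in id_cols],
            metric_values=[float(row[col]) for col in metric_cols],
            offset=None if offset_col is None else str(row[offset_col]),
        )
        for row in rows
    ]


# Column names below are invented for illustration.
rows = [
    {"id1": "A", "id2": "B", "m1": 0.234, "m2": 0.568, "m3": 0.417, "offset": "overall"},
]
records = rows_to_records(rows, ["id1", "id2"], ["m1", "m2", "m3"], "offset")
```

Keeping the column-name lists next to the records is what allows the positional `id_values` and `metric_values` to be mapped back to named columns later.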