caikit.interfaces.ts.data_model.backends.dfcache

Utilities related to managing Spark DataFrame caching

Functions

ensure_spark_cached(→ pyspark.sql.DataFrame)

Will ensure that a given dataframe is cached.

Module Contents

caikit.interfaces.ts.data_model.backends.dfcache.ensure_spark_cached(dataframe: pyspark.sql.DataFrame) → pyspark.sql.DataFrame

Will ensure that a given dataframe is cached. If the dataframe is already cached, this does nothing. If it is not cached, it is cached on entry and uncached again when the ensure_spark_cached context goes out of scope. Callers must use it in a with statement.

Example:

```python
from caikit.interfaces.ts.data_model.backends.dfcache import ensure_spark_cached

# df is an existing pyspark.sql.DataFrame
with ensure_spark_cached(df) as _:
    # do dataframey sorts of things on df
    # it's guaranteed to be cached
    # inside this block
    ...

# that's it, you're done.
# df remains cached if it already was
# or it's no longer cached if it wasn't
# before entering the with block above.
```
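The cache-on-entry, restore-on-exit behavior described above can be illustrated with a minimal sketch. This is not caikit's actual implementation: `cached_scope` and `FakeDataFrame` are hypothetical names introduced here, and the sketch assumes only that the dataframe exposes `is_cached`, `cache()` and `unpersist()`, as `pyspark.sql.DataFrame` does. The stand-in class lets the sketch run without a Spark session.

```python
from contextlib import contextmanager


@contextmanager
def cached_scope(dataframe):
    """Hypothetical sketch: cache on entry, restore the prior state on exit."""
    # Remember whether the caller had already cached the dataframe.
    was_cached = dataframe.is_cached
    if not was_cached:
        dataframe.cache()
    try:
        yield dataframe
    finally:
        # Only undo caching that this block introduced.
        if not was_cached:
            dataframe.unpersist()


class FakeDataFrame:
    """Stand-in with the same three members, so the sketch runs without Spark."""

    def __init__(self, is_cached=False):
        self.is_cached = is_cached

    def cache(self):
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self


df = FakeDataFrame()
with cached_scope(df):
    inside = df.is_cached  # cached while inside the block
after = df.is_cached       # uncached again, since it was not cached before
```

Recording the prior state before caching is what lets an already-cached dataframe pass through untouched, and the `try`/`finally` means the restore step still runs if the body of the with block raises.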