caikit.interfaces.ts.data_model.toolkit.sparkconf
=================================================

.. py:module:: caikit.interfaces.ts.data_model.toolkit.sparkconf

.. autoapi-nested-parse::

   Defines functions for obtaining Spark configurations.


Attributes
----------

.. autoapisummary::

   caikit.interfaces.ts.data_model.toolkit.sparkconf.WE_HAVE_PYSPARK


Functions
---------

.. autoapisummary::

   caikit.interfaces.ts.data_model.toolkit.sparkconf.sparkconf_local
   caikit.interfaces.ts.data_model.toolkit.sparkconf.sparkconf_k8s


Module Contents
---------------

.. py:data:: WE_HAVE_PYSPARK
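
   Judging by its name, this is presumably a boolean flag recording whether ``pyspark`` could be
   imported. A common way to compute such a flag is sketched below; this is an assumed pattern
   for illustration, not necessarily this module's code.

   .. code-block:: python

      # Sketch of an optional-dependency guard (assumed pattern, see above).
      try:
          import pyspark  # noqa: F401  (imported only to probe availability)

          WE_HAVE_PYSPARK = True
      except ImportError:
          WE_HAVE_PYSPARK = False

      # Callers can branch on the flag instead of failing at import time.
      if not WE_HAVE_PYSPARK:
          print("pyspark is not installed; Spark-backed features are unavailable")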

.. py:function:: sparkconf_local(master: str = 'local[2]', executor_memory: str = '2g', driver_memory: str = '2g', app_name: str = 'unnamed', **kwargs)

   Returns a SparkConf object configured for local Spark operation.

   Args:
       master (str, optional): Spark master URL. Defaults to "local[2]".
       executor_memory (str, optional): Executor memory. Defaults to "2g".
       driver_memory (str, optional): Driver memory. Defaults to "2g".
       app_name (str, optional): Spark application name. Defaults to "unnamed".
       kwargs: Pass-through key/value arguments that will be added to the Spark configuration.

   Returns:
       SparkConf: A Spark configuration object.
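
   The sketch below illustrates how these arguments might translate into Spark properties. It
   is a stand-in that builds a plain dictionary rather than a real ``SparkConf``: the helper
   name ``local_settings`` is hypothetical, and the mapping onto the standard properties
   ``spark.master``, ``spark.executor.memory``, ``spark.driver.memory``, and ``spark.app.name``
   is an assumption, not this function's confirmed implementation.

   .. code-block:: python

      def local_settings(master="local[2]", executor_memory="2g",
                         driver_memory="2g", app_name="unnamed", **kwargs):
          """Hypothetical stand-in: collect the settings in a plain dict."""
          settings = {
              "spark.master": master,
              "spark.executor.memory": executor_memory,
              "spark.driver.memory": driver_memory,
              "spark.app.name": app_name,
          }
          # Pass-through entries are added last (assumed to be used verbatim as keys).
          settings.update(kwargs)
          return settings


      # Keys containing dots are not valid keyword names, so unpack a dictionary.
      settings = local_settings(app_name="demo", **{"spark.sql.shuffle.partitions": "8"})

   With ``pyspark`` installed, a dictionary like this could be applied to a real configuration
   using ``SparkConf().setAll(list(settings.items()))``.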


.. py:function:: sparkconf_k8s(app_name: str, namespace: str, executor_image: str, driver_image: str, master: str = 'k8s://https://kubernetes.default.svc:443', num_executors: str = '2', executor_memory: str = '1g', executor_cores: str = '2', driver_memory: str = '1g', driver_cores: str = '2', pvc_mount_path: Union[str, None] = None, pvc_claim_name: Union[str, None] = None, python_path: Union[str, None] = None, k8s_service_account: Union[str, None] = None, **kwargs)

   Returns a Spark configuration object for use on a Kubernetes cluster. For more information on
   what some of these parameters do, see
   https://spark.apache.org/docs/latest/running-on-kubernetes.html

   NOTE: If you are simply running a local Spark job, we advise using the sparkconf_local function
   instead, as it has fewer parameters and more defaults to get you going more quickly.

   Args:
       app_name (str): The application name (useful for keeping track of jobs on a multiuser
           cluster).
       namespace (str): The k8s namespace in which this job will run (e.g., "default").
       executor_image (str): The container image to use for Spark executors.
       driver_image (str): The Spark driver image to use (typically the same as the executor image).
       master (str, optional): The master specification. Defaults to
           "k8s://https://kubernetes.default.svc:443".
       num_executors (str, optional): The number of executors to run. Defaults to "2".
       executor_memory (str, optional): The maximum memory allocated to each executor (use g or M
           notation). Defaults to "1g".
       executor_cores (str, optional): The maximum number of cores per executor. Defaults to "2".
       driver_memory (str, optional): The maximum memory allocated to the driver. Defaults to
           "1g".
       driver_cores (str, optional): The maximum number of cores allocated to the driver. Defaults
           to "2".
       pvc_mount_path (str | None, optional): The PVC mount path for executors and driver to mount
           (this usually has to be rwX). Defaults to None.
       pvc_claim_name (str | None, optional): The PVC claim name associated with the PVC mount.
           Defaults to None.
       python_path (str | None, optional): The python path to use in python jobs in executor and
           driver python processes. Defaults to None.
       k8s_service_account (str | None, optional): The k8s service account to use. Defaults to
           None.
       kwargs: Pass-through key/value arguments that will be added to the Spark configuration.

   Returns:
       SparkConf: A spark configuration that has been defined in a way that makes it compatible
           with time series use cases and intended for use with a k8s cluster.
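
   The sketch below shows how these arguments could map onto the properties described in the
   Spark "Running on Kubernetes" guide. It builds a plain dictionary rather than a real
   ``SparkConf``. The helper name ``k8s_settings``, the volume name ``data``, the image and
   claim names, and the exact choice of property keys are assumptions for illustration; the
   keys this function actually sets are not confirmed here.

   .. code-block:: python

      def k8s_settings(app_name, namespace, executor_image, driver_image,
                       master="k8s://https://kubernetes.default.svc:443",
                       num_executors="2", executor_memory="1g", executor_cores="2",
                       driver_memory="1g", driver_cores="2",
                       pvc_mount_path=None, pvc_claim_name=None,
                       python_path=None, k8s_service_account=None, **kwargs):
          """Hypothetical stand-in: collect Kubernetes-related settings in a dict."""
          settings = {
              "spark.master": master,
              "spark.app.name": app_name,
              "spark.kubernetes.namespace": namespace,
              "spark.kubernetes.executor.container.image": executor_image,
              "spark.kubernetes.driver.container.image": driver_image,
              "spark.executor.instances": num_executors,
              "spark.executor.memory": executor_memory,
              "spark.executor.cores": executor_cores,
              "spark.driver.memory": driver_memory,
              "spark.driver.cores": driver_cores,
          }
          # Mount the same claim on driver and executors, only if both parts are given.
          if pvc_mount_path is not None and pvc_claim_name is not None:
              for role in ("driver", "executor"):
                  # "data" is an arbitrary volume name chosen for this sketch.
                  prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.data"
                  settings[f"{prefix}.mount.path"] = pvc_mount_path
                  settings[f"{prefix}.options.claimName"] = pvc_claim_name
          if python_path is not None:
              settings["spark.executorEnv.PYTHONPATH"] = python_path
              settings["spark.kubernetes.driverEnv.PYTHONPATH"] = python_path
          if k8s_service_account is not None:
              settings["spark.kubernetes.authenticate.driver.serviceAccountName"] = (
                  k8s_service_account
              )
          # Pass-through entries are added last (assumed to be used verbatim as keys).
          settings.update(kwargs)
          return settings


      # Image and claim names below are placeholders.
      settings = k8s_settings(
          app_name="ts-forecast",
          namespace="default",
          executor_image="example.com/spark-py:latest",
          driver_image="example.com/spark-py:latest",
          pvc_mount_path="/data",
          pvc_claim_name="ts-data-claim",
      )

   Optional arguments left as ``None`` contribute no entries in this sketch, so the resulting
   dictionary contains only what was explicitly requested plus the defaults.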