caikit.interfaces.ts.data_model.toolkit.sparkconf
=================================================

.. py:module:: caikit.interfaces.ts.data_model.toolkit.sparkconf

.. autoapi-nested-parse::

   Defines functions for obtaining Spark configurations.


Attributes
----------

.. autoapisummary::

   caikit.interfaces.ts.data_model.toolkit.sparkconf.WE_HAVE_PYSPARK


Functions
---------

.. autoapisummary::

   caikit.interfaces.ts.data_model.toolkit.sparkconf.sparkconf_local
   caikit.interfaces.ts.data_model.toolkit.sparkconf.sparkconf_k8s


Module Contents
---------------

.. py:data:: WE_HAVE_PYSPARK

.. py:function:: sparkconf_local(master: str = 'local[2]', executor_memory: str = '2g', driver_memory: str = '2g', app_name: str = 'unnamed', **kwargs)

   Returns a SparkConf object configured for spark-local operation.

   Args:
       master (str, optional): The Spark master specification. Defaults to "local[2]".
       executor_memory (str, optional): Executor memory. Defaults to "2g".
       driver_memory (str, optional): Driver memory. Defaults to "2g".
       app_name (str, optional): Spark application name. Defaults to "unnamed".
       kwargs: Pass-through key/value arguments that will be added to the Spark configuration.

   Returns:
       SparkConf: A Spark configuration object.


.. py:function:: sparkconf_k8s(app_name: str, namespace: str, executor_image: str, driver_image: str, master: str = 'k8s://https://kubernetes.default.svc:443', num_executors: str = '2', executor_memory: str = '1g', executor_cores: str = '2', driver_memory: str = '1g', driver_cores: str = '2', pvc_mount_path: Union[str, None] = None, pvc_claim_name: Union[str, None] = None, python_path: Union[str, None] = None, k8s_service_account: Union[str, None] = None, **kwargs)

   Returns a Spark configuration object for use on a Kubernetes cluster.
   For more information on what some of these parameters are for, see
   https://spark.apache.org/docs/latest/running-on-kubernetes.html

   NOTE: if you are simply running a local Spark job, we advise you to use
   sparkconf_local instead, as it has fewer parameters and more defaults
   to get you going more quickly.

   Args:
       app_name (str): The application name (useful for keeping track of jobs on a multiuser cluster).
       namespace (str): The k8s namespace in which this job will run (e.g., "default").
       executor_image (str): The container image to use for Spark executors.
       driver_image (str): The Spark driver image to use (typically the same as the executor image).
       master (str, optional): The master specification. Defaults to "k8s://https://kubernetes.default.svc:443".
       num_executors (str, optional): The number of executors to run. Defaults to "2".
       executor_memory (str, optional): The maximum memory allocated to each executor (use g or M notation). Defaults to "1g".
       executor_cores (str, optional): The maximum number of cores per executor. Defaults to "2".
       driver_memory (str, optional): The maximum memory allocated to the driver. Defaults to "1g".
       driver_cores (str, optional): The maximum number of cores allocated to the driver. Defaults to "2".
       pvc_mount_path (str | None, optional): The PVC mount path for executors and driver to mount (this usually has to be RWX). Defaults to None.
       pvc_claim_name (str | None, optional): The PVC claim name associated with the PVC mount. Defaults to None.
       python_path (str | None, optional): The Python path to use in executor and driver Python processes. Defaults to None.
       k8s_service_account (str | None, optional): The k8s service account to use. Defaults to None.
       kwargs: Pass-through key/value arguments that will be added to the Spark configuration.

   Returns:
       SparkConf: A Spark configuration defined to be compatible with time series use cases and intended for use with a k8s cluster.
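
To make the ``sparkconf_k8s`` parameter list more concrete, the following is a minimal sketch of how such arguments could map onto standard Spark properties. It imports neither caikit nor pyspark and is not the module's implementation: the helper ``sketch_k8s_properties``, the volume name ``data``, the image name, and the claim name are all hypothetical, and which properties ``sparkconf_k8s`` actually sets is an assumption. The property names themselves are taken from the Spark configuration and "Running on Kubernetes" documentation.

```python
from typing import Dict, Optional


def sketch_k8s_properties(
    app_name: str,
    namespace: str,
    executor_image: str,
    driver_image: str,
    master: str = "k8s://https://kubernetes.default.svc:443",
    num_executors: str = "2",
    executor_memory: str = "1g",
    executor_cores: str = "2",
    driver_memory: str = "1g",
    driver_cores: str = "2",
    pvc_mount_path: Optional[str] = None,
    pvc_claim_name: Optional[str] = None,
    python_path: Optional[str] = None,
    k8s_service_account: Optional[str] = None,
    **kwargs: str,
) -> Dict[str, str]:
    """Hypothetical helper: build a plain dict of Spark properties.

    This mirrors the documented signature only; the real function
    returns a SparkConf and may set different properties.
    """
    props = {
        "spark.app.name": app_name,
        "spark.master": master,
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.executor.container.image": executor_image,
        "spark.kubernetes.driver.container.image": driver_image,
        "spark.executor.instances": num_executors,
        "spark.executor.memory": executor_memory,
        "spark.executor.cores": executor_cores,
        "spark.driver.memory": driver_memory,
        "spark.driver.cores": driver_cores,
    }
    if k8s_service_account is not None:
        key = "spark.kubernetes.authenticate.driver.serviceAccountName"
        props[key] = k8s_service_account
    if python_path is not None:
        # Environment variables for executor and driver processes.
        props["spark.executorEnv.PYTHONPATH"] = python_path
        props["spark.kubernetes.driverEnv.PYTHONPATH"] = python_path
    if pvc_mount_path is not None and pvc_claim_name is not None:
        # "data" is an arbitrary (assumed) volume name.
        for role in ("driver", "executor"):
            prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.data"
            props[f"{prefix}.mount.path"] = pvc_mount_path
            props[f"{prefix}.options.claimName"] = pvc_claim_name
    # Pass-through entries are applied last, so they can override the above.
    props.update(kwargs)
    return props


props = sketch_k8s_properties(
    app_name="ts-demo",
    namespace="default",
    executor_image="example.com/spark-py:latest",  # placeholder image
    driver_image="example.com/spark-py:latest",
    pvc_mount_path="/data",
    pvc_claim_name="ts-data-claim",  # placeholder claim name
    **{"spark.sql.shuffle.partitions": "8"},
)
for key in sorted(props):
    print(f"{key}={props[key]}")
```

Where pyspark is installed, pairs like these could be applied to a configuration object with ``SparkConf().setAll(list(props.items()))``. Dotted keys are passed through ``**{...}`` because they are not valid Python keyword names.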