From bc3566c6d9d2d791f30b1432b307e0eabfe11483 Mon Sep 17 00:00:00 2001
From: Kevin Bates
Date: Tue, 10 Dec 2019 10:16:37 -0800
Subject: [PATCH 1/2] [DRAFT] Parameterized Kernel Launch

---
 parameterized-launch/parameterized-launch.md | 84 ++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 parameterized-launch/parameterized-launch.md

diff --git a/parameterized-launch/parameterized-launch.md b/parameterized-launch/parameterized-launch.md
new file mode 100644
index 00000000..e22b5337
--- /dev/null
+++ b/parameterized-launch/parameterized-launch.md
@@ -0,0 +1,84 @@
+# Parameterized Kernel Launch

+This proposal is rooted in the [jupyter_kernel_mgmt](https://github.com/takluyver/jupyter_kernel_mgmt) repo because it relies on the Kernel Provider model introduced in this library. As a result, it is dependent upon the acceptance of [JEP #45](https://github.com/jupyter/enhancement-proposals/pull/45). In addition, the proposal (optionally) affects other repositories, namely [jupyter_server](https://github.com/jupyter/jupyter_server), [jupyterlab](https://github.com/jupyterlab/jupyterlab), [notebook](https://github.com/jupyter/notebook), [voila](https://github.com/voila-dashboards/voila) and any other client-facing applications that launch kernels once jupyter_server is adopted as the primary backend server.

+This proposal formalizes the [changes that introduced launch parameters](https://github.com/takluyver/jupyter_kernel_mgmt/pull/22) by defining kernel launch parameter metadata and how it is to be returned from kernel providers and interpreted by client applications. This feature is known as _Parameterized Kernel Launch_ (a.k.a _Parameterized Kernels_). It includes 'launch' because many of the parameters really apply to the _context_ in which the kernel will run and are not actual parameters to the kernel. Things like memory, cpus, and gpus are examples of "contextual" parameters. This proposal was [originally posted as an issue](https://github.com/takluyver/jupyter_kernel_mgmt/issues/38) in jupyter_kernel_mgmt, but has since been transitioned to this enhancement proposal. Please note that I have added some content since the original issue was posted.

+## Launch Parameter Schema
+The set of available launch parameters for a given kernel will be conveyed from the server to the client application via the _kernel type_ information (formerly known as the kernelspec) as JSON returned from the `/api/kernelspecs` REST endpoint. When available, launch parameter metadata will be included within the existing `metadata` stanza under `launch_parameter_schema`, and will consist of JSON schema that describes each available parameter. Because this is pure JSON schema, this information can convey required values, default values, choice lists, etc. and be easily consumed by applications. (Although I'd prefer to avoid this, we could introduce a custom schema if we find the generic schema metadata is not sufficient.)

```json
"metadata": {
  "launch_parameter_schema": {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Available parameters for kernel type 'Spark - Scala (Kubernetes)'",
    "properties": {
      "cpus": {"type": "number", "minimum": 0.5, "maximum": 8.0, "default": 4.0, "description": "The number of CPUs to use for this kernel"},
      "memory": {"type": "integer", "minimum": 2, "maximum": 1024, "default": 8, "description": "The number of GB to reserve for memory for this kernel"}
    },
    "required": ["cpus"]
  }
}
```
+Because the population of the `metadata.launch_parameter_schema` entry is a function of the _kernel provider_[1], how the provider determines what to include is an implementation detail. The requirement is that `metadata.launch_parameter_schema` contain valid JSON schema. However, since nearly 100% of kernels today are based on kernelspec information located in kernel.json, this proposal will also address how the `KernelSpecProvider` goes about composing `metadata.launch_parameter_schema` and acting on the returned parameter values.

+## KernelSpecProvider Schema Population
+I believe we should support two forms of population, referential and embedded, both of which can be used simultaneously.
+### Referential Schema Population
+Referential schema population is intended for launch parameters that are shared across kernel configurations, typically the aforementioned "contextual" parameters. When the `KernelSpecProvider` loads the kernel.json file, it will look for a key under `metadata` named `launch_parameter_schema_file`. If the key exists and its value is an existing file, that file's contents will be loaded into a dictionary object.
+### Embedded Schema Population
+Once the referential population step has taken place, the `KernelSpecProvider` will check if `metadata.launch_parameter_schema` exists and contains a value. If so, the KernelSpecProvider will load that value, then update the dictionary resulting from the referential population step. This allows _per-kernel_ parameter information to override the shared parameter information. For example, some kernel types may require more cpus than are generally available to all kernel types.

+`KernelSpecProvider` will then use the merged dictionaries from the two population steps as the value for `metadata.launch_parameter_schema` that is returned from its `find_kernels()` method and, ultimately, the `/api/kernelspecs` REST API. Any entry for `metadata.launch_parameter_schema_file` will not appear in the returned payload.

+## Client Applications
+_Parameter-aware_ applications that retrieve kernel type information from `/api/kernelspecs` will recognize the existence of any `metadata.launch_parameter_schema` values. When a kernel type is selected and contains launch parameter schema information, the application should construct a dialog from the schema that prompts for parameter values. Required values should be noted and default values should be pre-filled. (We will need to emphasize that all required values have reasonable defaults, but how that is handled is more a function of the kernel provider.)

+Once the application has obtained the desired set of parameters, it will create an entry in the JSON body of the `/api/kernels` POST request that is a dictionary of name/value pairs. The key under which this set of pairs resides will be named `launch_params`. The kernels handler will then pass this dictionary to the framework, where the kernel provider launch method will act on it.

```json
"launch_params": {
  "cpus": 4,
  "memory": 512
}
```

+Note that applications that are unaware of `launch_parameter_schema` will still behave in a reasonable manner provided the kernel provider applies reasonable default values to any required parameters.

+In addition, it would be beneficial if the set of parameter name/value pairs could be added into the notebook metadata so that subsequent launch attempts could use _those_ values in the pre-filled dialog.

+## Kernel Provider Launch
+Once the kernel provider launch method is called, the provider should validate the parameters and their values against the schema. Any validation errors should result in a failure to launch - although the decision to fail the launch will be a function of the kernel provider. The provider will need to differentiate between "contextual" parameters and actual kernel parameters and apply the values appropriately. `jupyter_kernel_mgmt` will likely provide a helper method for validation.

+Note: Since KernelSpecProvider will be the primary provider, at least initially, applications that wish to take advantage of kernel launch parameters may want to create their own providers. Fortunately, we've provided a mechanism whereby KernelSpecProvider can be extended such that much of the discovery and launch machinery can be reused. In these cases, the kernel.json file would need to be prefixed with the new provider id so that `KernelSpecProvider` doesn't include those same kernel types in its set.

+## Environment Variables
+A common mechanism in use today to vary a kernel's launch behavior utilizes environment variables. These variables are conveyed to the launch mechanism and set into the kernel's environment when launched. Since environment variables are commonly used in containerized contexts, we may want to support the ability for their specification within this mechanism. There are a few options to distinguish these kinds of parameters from "contextual" and kernel-specific parameters, if at all (option 4).

+1. Use a custom schema that defines an `is_env` meta-property. Schema entries with `is_env=True` will be set into the kernel's environment. I'd prefer to avoid a custom schema since it would require access to its definition and introduces more deployment/configuration issues.
+2. Create an explicit sub-section in `launch_parameter_schema` named `env_variables` that defines the metadata corresponding to environment variables. The payload on the subsequent POST request (to start a kernel) would then also include an `env_variables` sub-section consisting of the name/value pairs that the kernel provider ensures are placed into the target kernel's environment.
+3. Have an implicit rule that parameter names that are completely capitalized (with underscores: `[A-Z][A-Z_]*`) are treated as environment variables.
+4. Do nothing. The kernel provider will know which parameters correspond to environment variables, "contextual" variables or kernel-specific parameters. This approach implies that the client-facing application doesn't need to expose a given parameter as an environment variable - which, technically, is just an implementation detail anyway.

+Note: where option 4 breaks down is for kernel providers that are generically written. For example `KernelSpecProvider` will be the provider for 90% of kernels. Should _it_ be the entity that knows a given parameter should be interpreted as an environment variable?
As a result, a more explicit mechanism as proposed in the first 3 options is probably warranted.

+## Virtual Kernel Types
+One of the advantages of kernel launch parameters is that one could conceivably have a single kernel configured, yet allow for a plethora of configuration options based on the parameter values - as @rgbkrk points out [here](https://github.com/takluyver/jupyter_kernel_mgmt/issues/9#issuecomment-496434455) - since this facility essentially _fabricates_ kernel types that, today, would require a separate type for each set of options.

+## Backwards Compatibility
+Parameter-aware applications that are receiving results from the `/api/kernelspecs` REST API must be able to tolerate the _non-existence_ of `metadata.launch_parameter_schema` within the kernelspec results. Likewise, parameter-unaware applications will need to ignore the parameter stanza - which is the case today.

+Kernel providers that support parameterized launches must also handle required parameters by providing reasonable defaults. In addition, they must not assume that the application will provide those defaults - despite the fact that the schema for those required parameters defines default values - since the application that is requesting the kernel start (via the POST on `/api/kernels`) may be unaware of this parameter mechanism.

+## References
+https://github.com/takluyver/jupyter_kernel_mgmt/pull/22
+https://github.com/jupyter/jupyter_client/issues/434
+https://github.com/jupyter/enterprise_gateway/issues/640
+https://paper.dropbox.com/doc/Day-1-Kernels-jupyter_client-IPython-Notebook-server--ApyJEjYtqrjfoPg1QpbxZfcpAg-MyS7d8X4wkkhRQy7wClXY
+https://github.com/takluyver/jupyter_kernel_mgmt/issues/9

+[1]: _Kernel Provider_ is a term introduced by the proposed [Jupyter Kernel Management](https://github.com/jupyter/enhancement-proposals/pull/45) package which enables the ability for third-party applications to bring their own kernel management (and discovery) mechanisms that can co-exist with other third-party applications doing the same thing. Previously, exactly one override of `KernelManager` (and for discovery, `KernelSpecManager`) could be supported at a time.

From 4c0727fbe03653aba789de67c388e2d28cda4cb9 Mon Sep 17 00:00:00 2001
From: Kevin Bates
Date: Sun, 15 Aug 2021 17:58:50 -0700
Subject: [PATCH 2/2] Updates to reflect Kernel Provisioners

---
 parameterized-launch/parameterized-launch.md | 217 ++++++++++++++-----
 1 file changed, 165 insertions(+), 52 deletions(-)

diff --git a/parameterized-launch/parameterized-launch.md b/parameterized-launch/parameterized-launch.md
index e22b5337..bb9adf81 100644
--- a/parameterized-launch/parameterized-launch.md
+++ b/parameterized-launch/parameterized-launch.md
@@ -1,84 +1,197 @@
# Parameterized Kernel Launch
-This proposal is rooted in the [jupyter_kernel_mgmt](https://github.com/takluyver/jupyter_kernel_mgmt) repo because it relies on the Kernel Provider model introduced in this library. As a result, it is dependent upon the acceptance of [JEP #45](https://github.com/jupyter/enhancement-proposals/pull/45). In addition, the proposal (optionally) affects other repositories, namely [jupyter_server](https://github.com/jupyter/jupyter_server), [jupyterlab](https://github.com/jupyterlab/jupyterlab), [notebook](https://github.com/jupyter/notebook), [voila](https://github.com/voila-dashboards/voila) and any other client-facing applications that launch kernels once jupyter_server is adopted as the primary backend server.
- -This proposal formalizes the [changes that introduced launch parameters](https://github.com/takluyver/jupyter_kernel_mgmt/pull/22) by defining kernel launch parameter metadata and how it is to be returned from kernel providers and interpreted by client applications. This feature is known as _Parameterized Kernel Launch_ (a.k.a _Parameterized Kernels_). It includes 'launch' because many of the parameters really apply to the _context_ in which the kernel will run and are not actual parameters to the kernel. Things like memory, cpus, and gpus are examples of "contextual" parameters. This proposal was [originally posted as an issue](https://github.com/takluyver/jupyter_kernel_mgmt/issues/38) in jupyter_kernel_mgmt, but has since been trasitioned to this enhancement proposal. Please note that I have added some content since the original issue was posted. - - - -## Launch Parameter Schema -The set of available launch parameters for a given kernel will be conveyed from the server to the client application via the _kernel type_ information (formerly known as the kernelspec) as JSON returned from the `/api/kernelspecs` REST endpoint. When available, launch parameter metadata will be included within the existing `metadata` stanza under `launch_parameter_schema`, and will consist of JSON schema that describes each available parameter. Because this is pure JSON schema, this information can convey required values, default values, choice lists, etc. and be easily consumed by applications. (Although I'd prefer to avoid this, we could introduce a custom schema if we find the generic schema metadata is not sufficient.) - -```json - "metadata": { - "launch_parameter_schema": { - "$schema": "http://json-schema.org/draft-07/schema#", - "title": "Available parameters for kernel type 'Spark - Scala (Kubernetes)'", - "properties": { - "cpus": {"type": "number", "minimum": 0.5, "maximum": 8.0, "default": 4.0, "description": "The number of CPUs to use for this kernel"}, - "memory": {"type": "integer", "minimum": 2, "maximum": 1024, "default": 8, "description": "The number of GB to reserve for memory for this kernel"} - }, - "required": ["cpus"] - } +With the introduction of [_Kernel Provisioners_](https://jupyter-client.readthedocs.io/en/latest/provisioning.html) in the upcoming jupyter_client 7.0 release, we now have a means by which a kernel's runtime environment can be easily configured. This is because the kernel provisioner is the entity most knowledgeable about _where_ and _how_ a given kernel will run. + +This feature is known as _Parameterized Kernel Launch_ (a.k.a _Parameterized Kernels_). It includes 'launch' because many of the parameters really apply to the _context_ in which the kernel will run and are not actual parameters to the kernel. Things like memory, cpus, and gpus are examples of "provisioner" parameters. + +This proposal formalizes how kernel parameterization is expressed via the kernel specification, how it is derived from the kernel provisioner, and acted upon by client applications. + +Because kernel provisioners are essentially kernel-agnostic, there are _two_ sets of parameters we should address in this proposal: _provisioner parameters_ and _kernel parameters_. Provisioner parameters are known to the kernel provisioner and influence the kernel's runtime environment, while kernel parameters are specific to the kernel and typically influence its behavior. 

+## Provisioner and Kernel Parameter Schemas
+Both sets of parameters (provisioner and kernel) relative to a given kernel will be conveyed from the server to the client application via the kernel specification (a.k.a. the kernelspec). This information will be expressed as JSON schema returned from the `/api/kernelspecs` REST endpoint or from the `KernelSpecManager` directly (for command-line based applications).

+When available, parameter metadata will be included within the existing `metadata` stanza of the kernelspec (`kernel.json`) file. The provisioner parameter schema will be located within the `kernel_provisioner` stanza in `provisioner_parameter_schema`, while kernel-specific parameters will be located within a `kernel_parameter_schema` stanza directly within the `metadata` stanza. These stanzas will consist of JSON schema that describes each available parameter. Because this is pure JSON schema, this information can convey required values, default values, choice lists, etc. and be easily consumed by applications. (Although I'd prefer to avoid this, we _could_ introduce a custom schema if we find the generic schema metadata is not sufficient.)

+Here's an example of a possible `kernel.json`'s metadata stanza...
```JSON
"metadata": {
  "kernel_provisioner": {
    "name": "spark-provisioner",
    "config": {
      "host_endpoint": "https://acme.com/spark-cluster:7777"
    },
    "provisioner_parameter_schema": {
      "title": "Spark Provisioner Parameters",
      "properties": {
        "provisioner_parameters": {
          "type": "object",
          "properties": {
            "cpus": {"type": "number", "minimum": 0.5, "maximum": 8.0, "default": 4.0, "description": "The number of CPUs to use for this kernel"},
            "memory": {"type": "integer", "minimum": 2, "maximum": 1024, "default": 8, "description": "The number of GB to reserve for memory for this kernel"}
          },
          "required": ["cpus"]
        }
      }
    }
  },
  "kernel_parameter_schema": {
    "title": "IPyKernel Parameters",
    "type": "object",
    "properties": {
      "kernel_parameters": {
        "type": "object",
        "properties": {
          "cache_size": {"type": "integer", "description": "Set the size of the output cache", "default": 1000, "minimum": 0, "maximum": 50000},
          "matplotlib": {"type": "string", "default": "auto", "enum": ["auto", "agg", "gtk", "gtk3", "inline", "ipympl", "nbagg", "notebook", "osx", "pdf", "ps", "qt", "qt4", "qt5", "svg", "tk", "widget", "wx"], "description": "Configure matplotlib for interactive use with the default matplotlib backend"}
        }
      }
    }
  }
}
```
-Because the population of the `metadata.launch_parameter_schema` entry is a function of the _kernel provider_[1], how the provider determines what to include is an implementation detail. The requirement is that `metadata.launch_parameter_schema` contain valid JSON schema. However, since nearly 100% of kernels today are based on kernelspec information located in kernel.json, this proposal will also address how the `KernelSpecProvider` goes about composing `metadata.launch_parameter_schema` and acting on the returned parameter values.
-## KernelSpecProvider Schema Population
-I believe we should support two forms of population, referential and embedded, both of which can be used simultaneously.
-### Referential Schema Population
-Referential schema population is intended for launch parameters that are shared across kernel configurations, typically the aforementioned "contextual" parameters. When the `KernelSpecProvider` loads the kernel.json file, it will look for a key under `metadata` named `launch_parameter_schema_file`.
If the key exists and its value is an existing file, that file's contents will be loaded into a dictionary object.
-### Embedded Schema Population
-Once the referential population step has taken place, the `KernelSpecProvider` will check if `metadata.launch_parameter_schema` exists and contains a value. If so, the KernelSpecProvider will load that value, then update the dictionary resulting from the referential population step. This allows _per-kernel_ parameter information to override the shared parameter information. For example, some kernel types may require more cpus than are generally available to all kernel types.
-`KernelSpecProvider` will then use the merged dictionaries from the two population steps as the value for `metadata.launch_parameter_schema` that is returned from its `find_kernels()` method and, ultimately, the `/api/kernelspecs` REST API. Any entry for `metadata.launch_parameter_schema_file` will not appear in the returned payload.

+### Provisioner Parameter Schema
+Because the population of the `metadata.kernel_provisioner.provisioner_parameter_schema` entry is a function of the _kernel provisioner_, how the provisioner determines what to include as its parameter schema is specific to that provisioner. The requirement is that `metadata.kernel_provisioner.provisioner_parameter_schema` contain valid JSON schema. However, since virtually all kernels today are based on kernelspec information located in `kernel.json`, this proposal will also address how the `KernelSpecManager` goes about composing `metadata.kernel_provisioner.provisioner_parameter_schema` and acting on the returned parameter values.

+#### KernelSpec Schema Population
+It's important that parameter definitions be both easy to use and flexible to configure. As a result, there should be multiple sources of parameter schema with differing orders of precedence. This proposal introduces three sources of schema population, not all of which are necessary.

+The KernelSpecManager will coordinate the accumulation of provisioner parameters and populate the returned kernelspec with the result, taking orders of precedence into account.

+##### Embedded Schema
+Embedded schema population is when the parameter schema is included directly in the `kernel.json` file. For example, the contents of the `kernel.json` would closely resemble the example above. With embedded schema, the administrator has the ability to influence default values, enumerations, etc. Schema defined at this location will always take precedence over other sources of parameter schema.

+##### Referential Schema
+A second form of population that could be supported is referential schema population, in which a reference to a file is provided and that file is then used as a basis on which the embedded schema is applied. The file reference would be conveyed as a sibling attribute to `provisioner_parameter_schema` named `provisioner_parameter_schema_file`, whose value is a path (absolute or relative) to a file containing the parameter schema.

+An advantage of referential schema population is that it could serve to define which parameters should be displayed by the client application - assuming that packaged population (see next) is not used.

+A second advantage is that a site administrator could configure the parameters across a number of kernel specifications that utilize the same provisioner, with overrides defined via embedded population.
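
+To make the referential form concrete, a `kernel.json` that combines a referenced schema file with an embedded override might look something like the following sketch; the schema file path and the overridden `cpus` default are hypothetical values chosen only to show the shape of such a configuration, not to prescribe names or locations.
```JSON
"metadata": {
  "kernel_provisioner": {
    "name": "spark-provisioner",
    "config": {
      "host_endpoint": "https://acme.com/spark-cluster:7777"
    },
    "provisioner_parameter_schema_file": "/usr/local/share/jupyter/provisioners/spark-provisioner-parameters.json",
    "provisioner_parameter_schema": {
      "properties": {
        "provisioner_parameters": {
          "type": "object",
          "properties": {
            "cpus": {"default": 2.0}
          }
        }
      }
    }
  }
}
```
+Under this assumption, the referenced file would supply the complete provisioner parameter schema, while the embedded stanza would override only the entries an administrator cares to change (here, the default number of CPUs), in keeping with the precedence described above.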
+##### Packaged Schema
+Packaged schema population is when the parameter schema is obtained directly from the Kernel Provisioner itself. This form of population requires that the provisioner implement a static function `get_parameter_schema()` which returns a dictionary of schema entries defining each parameter. These parameter definitions are essentially "factory settings" on which the provisioner was implemented. The `KernelSpecManager` will be responsible for retrieving this information and merging it with any embedded schema that might exist. Since embedded and referential population forms take precedence, the output of `get_parameter_schema()` will serve as the basis, and the other populations will be applied to it.

+Some observations of the above population schemes:
+1. If _packaged population_ is to be used, we should probably define the ability to hide parameters since it represents the complete set of supported parameters. This will likely introduce a schema _meta-property_ like `is_hidden` - which then implies kernel provisioning may want its own schema definition. If we decide to have our own schema definition, any additions should be defined in and exposed via `KernelProvisionerFactory`.
+2. A provisioner can have its own configuration in which an administrator can define its set of _exposed parameters_, thereby removing the need for an `is_hidden` meta-property. The result of calling `get_parameter_schema()` would utilize such a configuration setting, returning the schema of only exposed parameters.
+3. We should probably have each provisioner implement a form of `get_parameter_schema()` regardless of whether packaged population is used, solely for tooling and as a _source of truth_.

+### Kernel Parameter Schema
+Variants for how kernel parameter schema is populated are far fewer, consisting solely of _embedded population_. (Note: we _could_ use a referential form here as well - probably worth discussing.) In addition, the manner in which these parameters are used must be generic across all kernels since provisioners are kernel-agnostic.

+Today, kernel-specific parameters must be conveyed as templated variables in the `kernel.json` `argv:` stanza. These values are substituted when the `KernelManager`, now via the `KernelProvisioner`, formats the command. As a result, the kernel provisioner will be responsible for taking the applicable kernel-specific parameter values and applying them to the argument vector. Since any kernel-specific parameters not reflected as templated values in the `argv:` stanza will be ignored, it is the administrator's responsibility to ensure the `argv:` stanza is properly templated.

+### Environment Variables
+A common mechanism in use today to vary a kernel's launch behavior utilizes environment variables. These variables are conveyed to the launch mechanism and set into the kernel's environment when launched. Since environment variables are commonly used in containerized contexts, we should support the ability for their specification within this framework.

+This proposal will adopt the convention that environment variables can be specific to both kernels and provisioners. However, because both are applied in the same manner, the provisioner will be responsible for gathering environment variables and ensuring their _deployment_ into the kernel's environment.
Rather than intersperse environment variables amongst parameters, each parameter schema will define an object-valued property named `environment_variables` that specifies the recognized environment variables for the kernel or provisioner. In addition, this schema should allow for additional properties (i.e., additional environment variables) since other integrations with which the kernel and provisioner interact typically require their own environment variables.

+Here's an example of such a `kernel.json` file in which environment variable schemas are specified...
```JSON
"metadata": {
  "kernel_provisioner": {
    "name": "spark-provisioner",
    "config": {
      "host_endpoint": "https://acme.com/spark-cluster:7777"
    },
    "provisioner_parameter_schema": {
      "title": "Spark Provisioner Parameters",
      "properties": {
        "provisioner_parameters": {
          "type": "object",
          "properties": {
            "cpus": {"type": "number", "minimum": 0.5, "maximum": 8.0, "default": 4.0, "description": "The number of CPUs to use for this kernel"},
            "memory": {"type": "integer", "minimum": 2, "maximum": 1024, "default": 8, "description": "The number of GB to reserve for memory for this kernel"}
          },
          "environment_variables": {
            "type": "object",
            "properties": {
              "PROVISIONER_ENV_A": {"type": "string"},
              "PROVISIONER_ENV_B": {"type": "string"},
              "PROVISIONER_ENV_C": {"type": "string"}
            }
          },
          "required": ["cpus"]
        }
      }
    }
  },
  "kernel_parameter_schema": {
    "title": "IPyKernel Parameters",
    "type": "object",
    "properties": {
      "kernel_parameters": {
        "type": "object",
        "properties": {
          "cache_size": {"type": "integer", "description": "Set the size of the output cache", "default": 1000, "minimum": 0, "maximum": 50000},
          "matplotlib": {"type": "string", "default": "auto", "enum": ["auto", "agg", "gtk", "gtk3", "inline", "ipympl", "nbagg", "notebook", "osx", "pdf", "ps", "qt", "qt4", "qt5", "svg", "tk", "widget", "wx"], "description": "Configure matplotlib for interactive use with the default matplotlib backend"}
        },
        "environment_variables": {
          "type": "object",
          "properties": {
            "KERNEL_ENV_A": {"type": "string"},
            "KERNEL_ENV_B": {"type": "string"},
            "KERNEL_ENV_C": {"type": "string"}
          }
        }
      }
    }
  }
}
```

## Client Applications
-_Parameter-aware_ applications that retrieve kernel type information from `/api/kernelspecs` will recognize the existence of any `metadata.launch_parameter_schema` values. When a kernel type is selected and contains launch parameter schema information, the application should construct a dialog from the schema that prompts for parameter values. Required values should be noted and default values should be pre-filled. (We will need to emphasize that all required values have reasonable defaults, but how that is handled is more a function of the kernel provider.)
+_Parameter-aware_ applications that retrieve kernel specifications from `/api/kernelspecs` will need to recognize the existence of any `kernel_provisioner.provisioner_parameter_schema` and `kernel_parameter_schema` values within the specification's `metadata` stanza. When a kernel specification is selected and contains parameter schema information, the application should construct a dialog from the schema that prompts for parameter values. Required values should be noted and default values should be pre-filled. Command-line applications that cannot construct parameter inputs will need to rely on the provisioner using reasonable default values for any required parameters.
(We will need to emphasize that all required values have reasonable defaults, but how that is handled is more a function of the kernel provisioner.)
-Once the application has obtained the desired set of parameters, it will create an entry in the JSON body of the `/api/kernels` POST request that is a dictionary of name/value pairs. The key under which this set of pairs resides will be named `launch_params`. The kernels handler will then pass this dictionary to the framework, where the kernel provider launch method will act on it.
+Once the application has obtained the desired set of parameters, it will create an entry in the JSON body of the `/api/kernels` POST request that is a dictionary of two dictionaries, each consisting of name/value pairs. The key under which this pair of dictionaries resides will be named `parameters`. Provisioner parameters will reside under the key `provisioner_parameters`, while kernel parameters will be noted within `kernel_parameters`. Each parameter-based dictionary can also include a dictionary named `environment_variables` corresponding to the encapsulating parameters dictionary (provisioner or kernel). The kernels handler will then pass this dictionary to the framework, where the kernel launch method will act on it.

+Here's an example of such a JSON body entry consisting of various parameters and their values...
```json
"parameters": {
  "provisioner_parameters": {
    "cpus": 4,
    "memory": 512,
    "environment_variables": {
      "PROVISIONER_ENV_A": "research"
    }
  },
  "kernel_parameters": {
    "cache_size": 4,
    "environment_variables": {
      "KERNEL_ENV_A": "science"
    }
  }
}
```

-Note that applications that are unaware of `launch_parameter_schema` will still behave in a reasonable manner provided the kernel provider applies reasonable default values to any required parameters.
+Note that applications that are unaware of parameterization will still behave in a reasonable manner provided the kernel provisioner applies reasonable default values to any required parameters and the administrator does the same for kernel-specific parameters.

-In addition, it would be beneficial if the set of parameter name/value pairs could be added into the notebook metadata so that subsequent launch attempts could use _those_ values in the pre-filled dialog.
-## Kernel Provider Launch
-Once the kernel provider launch method is called, the provider should validate the parameters and their values against the schema. Any validation errors should result in a failure to launch - although the decision to fail the launch will be a function of the kernel provider. The provider will need to differentiate between "contextual" parameters and actual kernel parameters and apply the values appropriately. `jupyter_kernel_mgmt` will likely provide a helper method for validation.
-Note: Since KernelSpecProvider will be the primary provider, at least initially, applications that wish to take advantage of kernel launch parameters may want to create their own providers. Fortunately, we've provided a mechanism whereby KernelSpecProvider can be extended such that much of the discovery and launch machinery can be reused. In these cases, the kernel.json file would need to be prefixed with the new provider id so that `KernelSpecProvider` doesn't include those same kernel types in its set.

+In addition, it would be beneficial if the set of parameter values (i.e., the `parameters` dictionary) could be added into the notebook metadata (along with any other necessary information) so that subsequent launch attempts could use _those_ values in the pre-filled dialog.

-## Environment Variables
-A common mechanism in use today to vary a kernel's launch behavior utilizes environment variables.
These variables are conveyed to the launch mechanism and set into the kernel's environment when launched. Since environment variables are commonly used in containerized contexts, we may want to support the ability for their specification within this mechanism. There are a few options to distinguish these kinds of parameters from "contextual" and kernel-specific parameters, if at all (option 4).
-1. Use a custom schema that defines an `is_env` meta-property. Schema entries with `is_env=True` will be set into the kernel's environment. I'd prefer to avoid a custom schema since it would require access to its definition and introduces more deployment/configuration issues.
-2. Create an explicit sub-section in `launch_parameter_schema` named `env_variables` that defines the metadata corresponding to environment variables. The payload on the subsequent POST request (to start a kernel) would then also include an `env_variables` sub-section consisting of the name/value pairs that the kernel provider ensures are placed into the target kernel's environment.
-3. Have an implicit rule that parameter names that are completely capitalized (with underscores: `[A-Z][A-Z_]*`) are treated as environment variables.
-4. Do nothing. The kernel provider will know which parameters correspond to environment variables, "contextual" variables or kernel-specific parameters. This approach implies that the client-facing application doesn't need to expose a given parameter as an environment variable - which, technically, is just an implementation detail anyway.
-Note: where option 4 breaks down is for kernel providers that are generically written. For example `KernelSpecProvider` will be the provider for 90% of kernels. Should _it_ be the entity that knows a given parameter should be interpreted as an environment variable? As a result, a more explicit mechanism as proposed in the first 3 options is probably warranted.

+## Kernel Launch
+When a kernel is started, any parameters included in the keyword arguments should be validated prior to the kernel's launch (i.e., the startup of its hosting process). The provisioner's `pre_launch()` method is meant for this purpose, although the actual location for validation is up to the provisioner's author, provided validation occurs prior to the `KernelManager`'s return from its `start_kernel()` method.

+Upon successful parameter validation, the kernel provisioner will be responsible for consuming its provisioner-specific parameters and applying any kernel-specific parameters to the templated `argv:` list.

+Environment variables will be conveyed to the kernel process's environment; how that is accomplished is, again, the responsibility of the kernel provisioner.

## Virtual Kernel Types
-One of the advantages of kernel launch parameters is that one could conceivably have a single kernel configured, yet allow for a plethora of configuration options based on the parameter values - as @rgbkrk points out [here](https://github.com/takluyver/jupyter_kernel_mgmt/issues/9#issuecomment-496434455) - since this facility essentially _fabricates_ kernel types that, today, would require a separate type for each set of options.
+One of the advantages of kernel launch parameters is that one could conceivably have a single kernel configured, yet allow for a plethora of configuration options based on the parameter values - as @rgbkrk points out [here](https://github.com/takluyver/jupyter_kernel_mgmt/issues/9#issuecomment-496434455) - since this facility essentially _fabricates_ kernel types that, today, would require a separate specification for each set of options.

## Backwards Compatibility
Parameter-aware applications that are receiving results from the `/api/kernelspecs` REST API must be able to tolerate the _non-existence_ of `metadata.launch_parameter_schema` within the kernelspec results. Likewise, parameter-unaware applications will need to ignore the parameter stanza - which is the case today.

-Kernel providers that support parameterized launches must also handle required parameters by providing reasonable defaults. In addition, they must not assume that the application will provide those defaults - despite the fact that the schema for those required parameters defines default values - since the application that is requesting the kernel start (via the POST on `/api/kernels`) may be unaware of this parameter mechanism.
+Kernel provisioners that support parameterized launches must also handle required parameters by providing reasonable defaults. In addition, they must not assume that the application will provide those defaults - despite the fact that the schema for those required parameters defines default values - since the application that is requesting the kernel start (via the POST on `/api/kernels`) may be unaware of this parameter mechanism.

## References
+https://jupyter-client.readthedocs.io/en/latest/provisioning.html
+https://github.com/jupyter/jupyter_client/issues/608
+https://github.com/jupyter/jupyter_client/pull/612
https://github.com/takluyver/jupyter_kernel_mgmt/pull/22
https://github.com/jupyter/jupyter_client/issues/434
https://github.com/jupyter/enterprise_gateway/issues/640
https://paper.dropbox.com/doc/Day-1-Kernels-jupyter_client-IPython-Notebook-server--ApyJEjYtqrjfoPg1QpbxZfcpAg-MyS7d8X4wkkhRQy7wClXY
https://github.com/takluyver/jupyter_kernel_mgmt/issues/9

-[1]: _Kernel Provider_ is a term introduced by the proposed [Jupyter Kernel Management](https://github.com/jupyter/enhancement-proposals/pull/45) package which enables the ability for third-party applications to bring their own kernel management (and discovery) mechanisms that can co-exist with other third-party applications doing the same thing. Previously, exactly one override of `KernelManager` (and for discovery, `KernelSpecManager`) could be supported at a time.