Implementing a Kernel Specification#
If you find yourself implementing a kernel launcher, you’ll need a way to make that kernel and kernel launcher available to applications. This is accomplished via the kernel specification or kernelspec.
Kernelspecs reside in well-known directories. For multi-tenant installations, where users may share kernel
specifications, we generally recommend they reside in /usr/local/share/jupyter/kernels
where each entry in this
directory is a directory representing the name of the kernel. The kernel specification is represented by the
file kernel.json
, the contents of which essentially indicate what environment variables should be present in
the kernel process (via the env
stanza) and which command (and arguments) should be issued to start the kernel
process (via the argv
stanza). The JSON also includes a metadata
stanza that contains the kernel provisioner
configuration, along with which provisioner to instantiate to help manage the kernel process’s lifecycle.
One approach the generated Gateway Provisioners kernel specifications take is to include a shell script that actually
issues the spark-submit
request. It is this shell script (typically named run.sh
) that is referenced in the
argv
stanza.
Here’s an example from a generated kernel specification for Spark running in a Hadoop YARN cluster:
{
"argv": [
"/usr/local/share/jupyter/kernels/yarn_spark_python/bin/run.sh",
"--kernel-id",
"{kernel_id}",
"--port-range",
"{port_range}",
"--response-address",
"{response_address}",
"--public-key",
"{public_key}",
"--spark-context-initialization-mode",
"lazy",
"--kernel-class-name",
"ipykernel.ipkernel.IPythonKernel"
],
"env": {
"SPARK_HOME": "/opt/spark",
"PYSPARK_PYTHON": "/opt/miniconda3/envs/provisioners/bin/python",
"PYTHONPATH": "${HOME}/.local/lib/python3.7/site-packages:/opt/spark/python",
"SPARK_OPTS": "--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/home/${KERNEL_USERNAME}/.local --conf spark.yarn.appMasterEnv.PYTHONPATH=${HOME}/.local/lib/python3.7/site-packages:/opt/spark/python --conf spark.yarn.appMasterEnv.PATH=/opt/miniconda3/envs/provisioners/bin:$PATH ${KERNEL_EXTRA_SPARK_OPTS}",
"LAUNCH_OPTS": ""
},
"display_name": "Spark Python (YARN Cluster)",
"language": "python",
"interrupt_mode": "signal",
"metadata": {
"kernel_provisioner": {
"provisioner_name": "yarn-provisioner",
"config": {
"launch_timeout": 30
}
}
}
}
where run.sh
issues spark-submit
specifying the kernel launcher as the “application”:
eval exec \
"${SPARK_HOME}/bin/spark-submit" \
"${SPARK_OPTS}" \
"${IMPERSONATION_OPTS}" \
"${PROG_HOME}/scripts/launch_ipykernel.py" \
"${LAUNCH_OPTS}" \
"$@"
For container-based environments, the argv
may instead reference a script that is meant to create the container pod
(for Kubernetes). For these, we use a template file
that operators can adjust to meet the needs of their environment. Here’s how that kernel.json
looks:
{
"argv": [
"python",
"/usr/local/share/jupyter/kernels/k8s_python/scripts/launch_kubernetes.py",
"--kernel-id",
"{kernel_id}",
"--port-range",
"{port_range}",
"--response-address",
"{response_address}",
"--public-key",
"{public_key}",
"--kernel-class-name",
"ipykernel.ipkernel.IPythonKernel"
],
"env": {},
"display_name": "Kubernetes Python",
"language": "python",
"interrupt_mode": "signal",
"metadata": {
"debugger": true,
"kernel_provisioner": {
"provisioner_name": "kubernetes-provisioner",
"config": {
"image_name": "elyra/kernel-py:dev",
"launch_timeout": 30
}
}
}
}
When using the launch_ipykernel
launcher (aka the Python kernel launcher), subclasses of ipykernel.kernelbase.Kernel
can be launched. By default, this launcher uses the classname "ipykernel.ipkernel.IPythonKernel"
, but other
subclasses of ipykernel.kernelbase.Kernel
can be specified by adding a --kernel-class-name
parameter to the argv
stanza. See Invoking subclasses of ipykernel.kernelbase.Kernel
for more information.
As should be evident, kernel specifications are highly tuned to the runtime environment so your needs may be different, but should resemble the approaches we’ve taken so far.