vdk-jupyter: add iPython extension #1482

Closed
wants to merge 13 commits into from
@@ -2,7 +2,7 @@ project.ext {
versions = [
// include only libraries which are NOT in the Spring Boot BOM: org.springframework:spring-boot-dependencies
'com.google.guava:guava' : 'com.google.guava:guava:31.1-jre',
'com.nimbusds:nimbus-jose-jwt' : 'com.nimbusds:nimbus-jose-jwt:9.25.6',
'com.nimbusds:nimbus-jose-jwt' : 'com.nimbusds:nimbus-jose-jwt:9.28',
// Update the ph-client to latest.
'io.micrometer:micrometer-registry-prometheus' : 'io.micrometer:micrometer-registry-prometheus:1.10.2',
'org.openapitools:jackson-databind-nullable' : 'org.openapitools:jackson-databind-nullable:0.2.4',
@@ -15,8 +15,8 @@ project.ext {
'org.junit.jupiter:junit-jupiter-engine' : 'org.junit.jupiter:junit-jupiter-engine:5.9.1',
'org.junit.platform:junit-platform-suite-api' : 'org.junit.platform:junit-platform-suite-api:1.9.1',
'com.mmnaseri.utils:spring-data-mock' : 'com.mmnaseri.utils:spring-data-mock:2.2.0',
'org.mockito:mockito-core' : 'org.mockito:mockito-core:4.10.0',
'net.bytebuddy:byte-buddy' : 'net.bytebuddy:byte-buddy:1.12.19',
'org.mockito:mockito-core' : 'org.mockito:mockito-core:4.11.0',
'net.bytebuddy:byte-buddy' : 'net.bytebuddy:byte-buddy:1.12.20',
'com.fasterxml.jackson.core:jackson-databind' : 'com.fasterxml.jackson.core:jackson-databind:2.14.1',
'com.fasterxml.jackson.datatype:jackson-datatype-jsr310' : 'com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.14.1',
'org.json:json' : 'org.json:json:20220924',
@@ -40,7 +40,7 @@ project.ext {
'org.apache.commons:commons-lang3' : 'org.apache.commons:commons-lang3:3.12.0',
'org.apache.commons:commons-text' : 'org.apache.commons:commons-text:1.10.0',
'com.github.tomakehurst:wiremock' : 'com.github.tomakehurst:wiremock:2.27.2',
'com.graphql-java:graphql-java-extended-scalars' : 'com.graphql-java:graphql-java-extended-scalars:19.1',
'com.graphql-java:graphql-java-extended-scalars' : 'com.graphql-java:graphql-java-extended-scalars:20.0',
'org.springframework.retry:spring-retry' : 'org.springframework.retry:spring-retry:2.0.0',
'org.apache.tika:tika-core' : 'org.apache.tika:tika-core:2.6.0',

35 changes: 35 additions & 0 deletions projects/vdk-plugins/vdk-ipython-ext/.plugin-ci.yml
@@ -0,0 +1,35 @@
# Copyright 2021 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0

image: "python:3.7"

.build-vdk-ipython-ext:
  variables:
    PLUGIN_NAME: vdk-ipython-ext
  extends: .build-plugin

build-py37-vdk-ipython-ext:
  extends: .build-vdk-ipython-ext
  image: "python:3.7"


build-py38-vdk-ipython-ext:
  extends: .build-vdk-ipython-ext
  image: "python:3.8"

build-py39-vdk-ipython-ext:
  extends: .build-vdk-ipython-ext
  image: "python:3.9"

build-py310-vdk-ipython-ext:
  extends: .build-vdk-ipython-ext
  image: "python:3.10"

build-py311-vdk-ipython-ext:
  extends: .build-vdk-ipython-ext
  image: "python:3.11"

release-vdk-ipython-ext:
  variables:
    PLUGIN_NAME: vdk-ipython-ext
  extends: .release-plugin
53 changes: 53 additions & 0 deletions projects/vdk-plugins/vdk-ipython-ext/README.md
@@ -0,0 +1,53 @@
# ipython-ext

IPython extension for VDK

This extension introduces a magic command for Jupyter.
The command lets users load job_input for their current data job and use it freely while working in Jupyter.

See more about magic commands: https://ipython.readthedocs.io/en/stable/interactive/magics.html


## Usage
To use the extension, first install it with pip as a Python package.
Then load the extension in Jupyter with:
```
%reload_ext vdk_ipython_ext
Collaborator

would it not look a bit nicer if it was just %reload_ext vdk

Collaborator Author

That way I get some errors with the imports, and I think it would be good to have a little difference, since it is not loading the whole vdk but only a small part of it.

```
And to load the job_input:
```
%reload_job_input
```
The %reload_job_input magic accepts arguments: --path passes the job's path,
--name gives the job a new name, and --arguments and --template are also available.

### Example
The output of this example is "myjob":
```
%reload_ext vdk_ipython_ext

%reload_job_input --name=myjob

job_input.get_name()
```
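
How the options on the magic line are turned into values can be sketched with the standard library alone. This is a minimal approximation, not the extension's code: IPython's `magic_arguments` helpers are built on `argparse`, and here `shlex.split` stands in for IPython's own argument splitting. The helper names `build_parser` and `parse_magic_line` and the path `/tmp/jobs/myjob` are made up for illustration.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the four options declared with @argument in vdk_ipython_ext.py.
    parser = argparse.ArgumentParser(prog="%reload_job_input", add_help=False)
    parser.add_argument("-p", "--path", type=str, default=None)
    parser.add_argument("-n", "--name", type=str, default=None)
    parser.add_argument("-a", "--arguments", type=str, default=None)
    parser.add_argument("-t", "--template", type=str, default=None)
    return parser


def parse_magic_line(line: str) -> argparse.Namespace:
    # Everything after the magic name arrives as one string; split it shell-style.
    return build_parser().parse_args(shlex.split(line))


# Hypothetical magic line; the path is only an example value.
args = parse_magic_line("--name=myjob --path /tmp/jobs/myjob")
print(args.name, args.path)  # myjob /tmp/jobs/myjob
```

Options that are not given stay `None`, which is why the extension can fall back to the current working directory when `--path` is omitted.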

### Build and testing

```
pip install -r requirements.txt
pip install -e .
pytest
```

In the VDK repo, the [../build-plugin.sh](https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins/build-plugin.sh) script can also be used.


#### Note about the CI/CD:

.plugin-ci.yml is needed only for plugins that are part of the [Versatile Data Kit Plugin repo](https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins).

The CI/CD is separated into two stages: a build stage and a release stage.
The build stage is made up of a few jobs, all of which inherit from the same
job configuration and differ only in the Python version they use (3.7, 3.8, 3.9, 3.10 and 3.11).
They run according to rules, which are ordered so that changes to a
plugin's directory trigger that plugin's CI, but changes to a different plugin do not.
8 changes: 8 additions & 0 deletions projects/vdk-plugins/vdk-ipython-ext/requirements.txt
@@ -0,0 +1,8 @@
# this file is used to provide testing requirements
# for requirements (dependencies) needed during and after installation of the plugin see (and update) setup.py install_requires section

IPython

pytest
vdk-core
vdk-test-utils
31 changes: 31 additions & 0 deletions projects/vdk-plugins/vdk-ipython-ext/setup.py
@@ -0,0 +1,31 @@
# Copyright 2021 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
import pathlib

import setuptools


__version__ = "0.1.0"

setuptools.setup(
    name="vdk-ipython-ext",
    version=__version__,
    url="https://github.com/vmware/versatile-data-kit",
    description="Ipython extension for VDK",
    long_description=pathlib.Path("README.md").read_text(),
    long_description_content_type="text/markdown",
    install_requires=["vdk-core"],
    package_dir={"": "src"},
    packages=setuptools.find_namespace_packages(where="src"),
    entry_points={"vdk.plugin.run": ["ipython-ext = vdk_ipython_ext"]},
    classifiers=[
        "Development Status :: 2 - Pre-Alpha",
        "License :: OSI Approved :: Apache Software License",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
        "Framework :: IPython",
    ],
)
45 changes: 45 additions & 0 deletions projects/vdk-plugins/vdk-ipython-ext/src/vdk_ipython_ext.py
@@ -0,0 +1,45 @@
# Copyright 2021 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
import os
import pathlib

from IPython import get_ipython
from IPython.core.magic_arguments import argument
from IPython.core.magic_arguments import magic_arguments
from IPython.core.magic_arguments import parse_argstring
from vdk.internal.builtin_plugins.run.standalone_data_job import (
    StandaloneDataJobFactory,
)


def load_ipython_extension(ipython):
    """
    IPython will look for this function specifically.
    See https://ipython.readthedocs.io/en/stable/config/extensions/index.html
    """
    ipython.register_magic_function(magic_load_job, magic_name="reload_job_input")


@magic_arguments()
@argument("-p", "--path", type=str, default=None)
@argument("-n", "--name", type=str, default=None)
@argument("-a", "--arguments", type=str, default=None)
@argument("-t", "--template", type=str, default=None)
def magic_load_job(line: str):
    """
    You can use the %reload_job_input line magic within your Notebook to reload the job_input variable
    for your current job
    See more for line magic: https://ipython.readthedocs.io/en/stable/interactive/magics.html
    """
    args = parse_argstring(magic_load_job, line)
    load_job(args.path, args.name, args.arguments, args.template)


def load_job(
    path: str = None, name: str = None, arguments: str = None, template: str = None
):
    # Fall back to the current working directory when no --path is given.
    path = pathlib.Path(path) if path else pathlib.Path(os.getcwd())
    with StandaloneDataJobFactory.create(
        data_job_directory=path, name=name, job_args=arguments, template_name=template
    ) as job_input:
        # Expose job_input as a variable in the notebook's user namespace.
        get_ipython().push(variables={"job_input": job_input})
38 changes: 38 additions & 0 deletions projects/vdk-plugins/vdk-ipython-ext/tests/test_plugin.py
@@ -0,0 +1,38 @@
# Copyright 2021 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
import pytest
from IPython.core.error import UsageError
from IPython.testing.globalipapp import start_ipython
from vdk.api.job_input import IJobInput


@pytest.fixture(scope="session")
def session_ip():
    yield start_ipython()


@pytest.fixture(scope="function")
def ip(session_ip):
    session_ip.run_line_magic(magic_name="load_ext", line="vdk_ipython_ext")
    yield session_ip
    session_ip.run_line_magic(magic_name="reset", line="-f")


def test_load_job_input_with_no_arguments(ip):
    ip.run_line_magic(magic_name="reload_job_input", line="")
    assert ip.user_global_ns["job_input"] is not None
    assert isinstance(ip.user_global_ns["job_input"], IJobInput)


def test_load_job_input_with_valid_argument(ip):
    ip.run_line_magic(magic_name="reload_job_input", line="--name=test")
    assert ip.user_global_ns["job_input"] is not None
    assert isinstance(ip.user_global_ns["job_input"], IJobInput)
    assert ip.user_global_ns["job_input"].get_name() == "test"


def test_load_job_input_with_invalid_argument(ip):
    with pytest.raises(
        UsageError, match=r"unrecognized arguments: --invalid_arg=dummy"
    ):
        ip.run_line_magic(magic_name="reload_job_input", line="--invalid_arg=dummy")
20 changes: 10 additions & 10 deletions projects/vdk-plugins/vdk-jobs-troubleshooting/.plugin-ci.yml
@@ -30,13 +30,13 @@ build-py311-vdk-jobs-troubleshooting:
  extends: .build-vdk-jobs-troubleshooting
  image: "python:3.11"

#build-vdk-jobs-troubleshooting-on-vdk-core-release:
# variables:
# PLUGIN_NAME: vdk-jobs-troubleshooting
# extends: .build-plugin-on-vdk-core-release
# image: "python:3.9"

#release-vdk-jobs-troubleshooting:
# variables:
# PLUGIN_NAME: vdk-jobs-troubleshooting
# extends: .release-plugin
build-vdk-jobs-troubleshooting-on-vdk-core-release:
  variables:
    PLUGIN_NAME: vdk-jobs-troubleshooting
  extends: .build-plugin-on-vdk-core-release
  image: "python:3.9"

release-vdk-jobs-troubleshooting:
  variables:
    PLUGIN_NAME: vdk-jobs-troubleshooting
  extends: .release-plugin
@@ -8,18 +8,65 @@

from vdk.api.plugin.hook_markers import hookimpl
from vdk.api.plugin.plugin_registry import IPluginRegistry
from vdk.internal.builtin_plugins.run.job_context import JobContext
from vdk.internal.core.config import ConfigurationBuilder
from vdk.plugin.jobs_troubleshoot.api.troubleshoot_utility import ITroubleshootUtility
from vdk.plugin.jobs_troubleshoot.troubleshoot_configuration import add_definitions
from vdk.plugin.jobs_troubleshoot.troubleshoot_utilities.utilities_registry import (
    get_utilities_to_use,
)

log = logging.getLogger(__name__)


class JobTroubleshootingPlugin:
    """
    Entrypoint for the Data Jobs Troubleshooting plugin - it provides the means to initialize and configure
    troubleshooting utilities, based on the configured environment variables.

    Example:
        To start the thread dump utility, configure the following environment variables:
            VDK_TROUBLESHOOT_UTILITIES_TO_USE="thread-dump"
            VDK_PORT_TO_USE=8783
    """

    def __init__(self):
        self.troubleshooting_utils: List[ITroubleshootUtility] = []

    @staticmethod
    @hookimpl
    def vdk_configure(config_builder: ConfigurationBuilder) -> None:
        add_definitions(config_builder=config_builder)

    @hookimpl
    def initialize_job(self, context: JobContext) -> None:
        self.troubleshooting_utils = get_utilities_to_use(
            job_config=context.core_context.configuration
        )
        try:
            for util in self.troubleshooting_utils:
                util.start()
        except Exception as e:
            log.info(
                f"""
                An exception occurred while processing a troubleshooting
                utility. The error was: {e}
                """
            )

    @hookimpl
    def finalize_job(self, context: JobContext) -> None:
        try:
            for util in self.troubleshooting_utils:
                util.stop()
        except Exception as e:
            log.info(
                f"""
                An exception occurred while processing a troubleshooting
                utility. The error was: {e}
                """
            )


@hookimpl
def vdk_start(plugin_registry: IPluginRegistry, command_line_args: List) -> None:
@@ -0,0 +1,45 @@
# Copyright 2021 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
import logging
from http.server import HTTPServer
from threading import Thread
from typing import Any

log = logging.getLogger(__name__)


class HealthCheckServer:
    """
    A class for creating an HTTP server for serving requests.
    """

    def __init__(self, port: int, handler: Any = None):
        """
        Initializes a new instance of the HealthCheckServer class.

        Parameters:
            port (int): The port number on which the server will listen for requests.
            handler (Any, optional): The request handler class. If omitted, no server is created.
        """
        # Keep both attributes defined so start() and stop() are safe without a handler.
        self._server = None
        self._thread = None
        if handler:
            self._server = HTTPServer(("", port), handler)
            self._thread = Thread(target=self._server.serve_forever)
            # The server only begins serving once start() is called.
            log.info(f"Troubleshooting utility server created on port {port}.")
        else:
            log.error(
                "Troubleshooting utility handler not specified. Will not start the server."
            )

    def start(self):
        """
        Starts the server, if one was created.
        """
        if self._thread:
            self._thread.start()

    def stop(self):
        """
        Stops the server, if one was created.
        """
        if self._server:
            self._server.shutdown()
            self._server.server_close()
            self._thread.join()
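
The start/stop lifecycle above can be exercised end to end with the standard library alone. The sketch below is an illustration, not part of the PR: `PingHandler` and `ping_once` are hypothetical names, and port 0 (let the OS pick a free port) replaces the configured port the plugin would pass.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


class PingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler (not part of the PR): answers every GET with 'ok'."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging for the example.
        pass


def ping_once() -> Tuple[int, bytes]:
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), PingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        resp = conn.getresponse()
        result = (resp.status, resp.read())
        conn.close()
        return result
    finally:
        # Same shutdown order as HealthCheckServer.stop().
        server.shutdown()
        server.server_close()
        thread.join()


status, body = ping_once()
print(status, body)
```

Running `serve_forever` on a separate thread is what lets `shutdown()` be called from the main thread; calling it from the serving thread itself would deadlock.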