Skip to content

Commit

Permalink
vdk-structlog: create structured logging plugin
Browse files Browse the repository at this point in the history
Why?

As part of VEP-2448, we'd like to make structured logging pluggable

What?

Create a vdk-structlog plugin. The plugin allows users to configure
logging metadata and logging format. It also works with bound loggers.

How was this tested?

Installed the plugin and ran a data job locally with
different configurations

White kind of change is this?

feature/non-breaking

Signed-off-by: Dilyan Marinov <[email protected]>
  • Loading branch information
Dilyan Marinov authored and DeltaMichael committed Oct 23, 2023
1 parent cfd5aa8 commit d6f9715
Show file tree
Hide file tree
Showing 10 changed files with 674 additions and 0 deletions.
22 changes: 22 additions & 0 deletions projects/vdk-plugins/vdk-structlog/.plugin-ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Copyright 2021-2023 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0

image: "python:3.7"

.build-vdk-structlog:
variables:
PLUGIN_NAME: vdk-structlog
extends: .build-plugin

build-py37-vdk-structlog:
extends: .build-vdk-structlog
image: "python:3.7"

build-py311-vdk-structlog:
extends: .build-vdk-structlog
image: "python:3.11"

release-vdk-structlog:
variables:
PLUGIN_NAME: vdk-structlog
extends: .release-plugin
121 changes: 121 additions & 0 deletions projects/vdk-plugins/vdk-structlog/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
# Structured Logging For VDK

This plugin allows users to:
- select the log output format
- configure the logging metadata
- display metadata added by bound loggers

## Usage

```
pip install vdk-structlog
```

### Configuration

(`vdk config-help` is useful command to browse all config options of your installation of vdk)

| Name | Description | (example) Value |
| ---------------- | --------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| logging_metadata | Configure the metadata that will be output along with the log message | "timestamp, level, logger_name, file_name, line_number, function_name, vdk_job_name, vdk_step_name, vdk_step_type" |
| logging_format | Configure the logging output format. Available formats: json, console | "console" |

### Example: Configure metadata

Create a data job and add the following config options

```
[vdk]
logging_metadata=timestamp,level,file_name,vdk_job_name
logging_format=console
```

Then run the data job. You should see just the configured tags where relevant.
For example, you won't see the vdk_job_name outside of log statements directly
related to job execution.

```
2023-10-17 11:20:59,202 [VDK] [INFO ] managed_cursor.py ingest-from-db-example-job - Executing query SUCCEEDED. Query duration 00h:00m:00s
2023-10-17 11:20:59,202 [VDK] [INFO ] managed_connection_b ingest-from-db-example-job - Fetching query result...
2023-10-17 11:20:59,202 [VDK] [INFO ] managed_cursor.py ingest-from-db-example-job - Fetching all results from query ...
2023-10-17 11:20:59,202 [VDK] [INFO ] managed_cursor.py ingest-from-db-example-job - Fetching all results from query SUCCEEDED.
2023-10-17 11:20:59,202 [VDK] [INFO ] managed_cursor.py ingest-from-db-example-job - Closing DB cursor ...
2023-10-17 11:20:59,202 [VDK] [INFO ] managed_cursor.py ingest-from-db-example-job - Closing DB cursor SUCCEEDED.
2023-10-17 11:20:59,203 [VDK] [INFO ] file_based_step.py ingest-from-db-example-job - Entering 30_ingest_to_table.py#run(...) ...
2023-10-17 11:20:59,203 [VDK] [INFO ] ingester_router.py ingest-from-db-example-job - Sending tabular data for ingestion with method: sqlite and target: None
2023-10-17 11:20:59,204 [VDK] [INFO ] file_based_step.py ingest-from-db-example-job - Exiting 30_ingest_to_table.py#run(...) SUCCESS
```

Now, let's remove the timestamp from the configuration and add the line number

```
logging_metadata=level,file_name,line_number,vdk_job_name
logging_format=console
```

And run the job again

```
[INFO ] managed_cursor.py :97 ingest-from-db-example-job - Executing query SUCCEEDED. Query duration 00h:00m:00s
[INFO ] managed_connection_b :133 ingest-from-db-example-job - Fetching query result...
[INFO ] managed_cursor.py :193 ingest-from-db-example-job - Fetching all results from query ...
[INFO ] managed_cursor.py :196 ingest-from-db-example-job - Fetching all results from query SUCCEEDED.
[INFO ] managed_cursor.py :203 ingest-from-db-example-job - Closing DB cursor ...
[INFO ] managed_cursor.py :205 ingest-from-db-example-job - Closing DB cursor SUCCEEDED.
[INFO ] file_based_step.py :103 ingest-from-db-example-job - Entering 30_ingest_to_table.py#run(...) ...
[INFO ] ingester_router.py :106 ingest-from-db-example-job - Sending tabular data for ingestion with method: sqlite and target: None
[INFO ] file_based_step.py :109 ingest-from-db-example-job - Exiting 30_ingest_to_table.py#run(...) SUCCESS
```

### Example: Configure logging format

Create a data job and add the following config options

```
[vdk]
logging_metadata=timestamp,level,file_name,vdk_job_name
logging_format=json
```

And you should see json-formatted logs.

```
{"filename": "managed_cursor.py", "lineno": 97, "vdk_job_name": "ingest-from-db-example-job", "message": "Executing query SUCCEEDED. Query duration 00h:00m:00s", "vdk_step_name": "20_create_table.sql", "vdk_step_type": "sql", "timestamp": 1697532608.968082, "level": "INFO"}
{"filename": "managed_connection_base.py", "lineno": 133, "vdk_job_name": "ingest-from-db-example-job", "message": "Fetching query result...", "vdk_step_name": "20_create_table.sql", "vdk_step_type": "sql", "timestamp": 1697532608.9681149, "level": "INFO"}
{"filename": "managed_cursor.py", "lineno": 193, "vdk_job_name": "ingest-from-db-example-job", "message": "Fetching all results from query ...", "vdk_step_name": "20_create_table.sql", "vdk_step_type": "sql", "timestamp": 1697532608.968137, "level": "INFO"}
{"filename": "managed_cursor.py", "lineno": 196, "vdk_job_name": "ingest-from-db-example-job", "message": "Fetching all results from query SUCCEEDED.", "vdk_step_name": "20_create_table.sql", "vdk_step_type": "sql", "timestamp": 1697532608.9681568, "level": "INFO"}
{"filename": "managed_cursor.py", "lineno": 203, "vdk_job_name": "ingest-from-db-example-job", "message": "Closing DB cursor ...", "vdk_step_name": "20_create_table.sql", "vdk_step_type": "sql", "timestamp": 1697532608.968405, "level": "INFO"}
{"filename": "managed_cursor.py", "lineno": 205, "vdk_job_name": "ingest-from-db-example-job", "message": "Closing DB cursor SUCCEEDED.", "vdk_step_name": "20_create_table.sql", "vdk_step_type": "sql", "timestamp": 1697532608.9684658, "level": "INFO"}
{"filename": "file_based_step.py", "lineno": 103, "vdk_job_name": "ingest-from-db-example-job", "message": "Entering 30_ingest_to_table.py#run(...) ...", "vdk_step_name": "30_ingest_to_table.py", "vdk_step_type": "python", "timestamp": 1697532608.9734771, "level": "INFO"}
{"filename": "ingester_router.py", "lineno": 106, "vdk_job_name": "ingest-from-db-example-job", "message": "Sending tabular data for ingestion with method: sqlite and target: None", "vdk_step_name": "30_ingest_to_table.py", "vdk_step_type": "python", "timestamp": 1697532608.975029, "level": "INFO"}
{"filename": "file_based_step.py", "lineno": 109, "vdk_job_name": "ingest-from-db-example-job", "message": "Exiting 30_ingest_to_table.py#run(...) SUCCESS", "vdk_step_name": "30_ingest_to_table.py", "vdk_step_type": "python", "timestamp": 1697532608.976068, "level": "INFO"}
```

### Example: Bound loggers

TODO: Add an example once bound loggers are part of vdk-core

### Example: Passing custom metadata fields with extra_params

TODO: Add an example

### Build and test

```
pip install -r requirements.txt
pip install -e .
pytest
```

In VDK repo [../build-plugin.sh](https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins/build-plugin.sh) script can be used also.


#### Note about the CICD:

.plugin-ci.yaml is needed only for plugins part of [Versatile Data Kit Plugin repo](https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins).

The CI/CD is separated in two stages, a build stage and a release stage.
The build stage is made up of a few jobs, all which inherit from the same
job configuration and only differ in the Python version they use (3.7, 3.8, 3.9 and 3.10).
They run according to rules, which are ordered in a way such that changes to a
plugin's directory trigger the plugin CI, but changes to a different plugin does not.
6 changes: 6 additions & 0 deletions projects/vdk-plugins/vdk-structlog/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# this file is used to provide testing requirements
# for requirements (dependencies) needed during and after installation of the plugin see (and update) setup.py install_requires section

pytest
vdk-core
vdk-test-utils
42 changes: 42 additions & 0 deletions projects/vdk-plugins/vdk-structlog/setup.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
# Copyright 2021-2023 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
import pathlib

import setuptools

"""
Builds a package with the help of setuptools in order for this package to be imported in other projects
"""

__version__ = "0.1.0"

setuptools.setup(
name="vdk-structlog",
version=__version__,
url="https://github.com/vmware/versatile-data-kit",
description="Structured logging for versatile data kit",
long_description=pathlib.Path("README.md").read_text(),
long_description_content_type="text/markdown",
install_requires=["vdk-core", "python-json-logger"],
package_dir={"": "src"},
packages=setuptools.find_namespace_packages(where="src"),
# This is the only vdk plugin specifc part
# Define entry point called "vdk.plugin.run" with name of plugin and module to act as entry point.
entry_points={
"vdk.plugin.run": ["vdk-structlog = vdk.plugin.structlog.structlog_plugin"]
},
classifiers=[
"Development Status :: 2 - Pre-Alpha",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
],
project_urls={
"Documentation": "https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins/vdk-structlog",
"Source Code": "https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins/vdk-structlog",
"Bug Tracker": "https://github.com/vmware/versatile-data-kit/issues/new/choose",
},
)
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
# Copyright 2021-2023 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
import logging

STRUCTLOG_LOGGING_METADATA_KEY = "logging_metadata"
STRUCTLOG_LOGGING_FORMAT_KEY = "logging_format"

STRUCTLOG_LOGGING_FORMAT_POSSIBLE_VALUES = ["console", "json"]
STRUCTLOG_LOGGING_FORMAT_DEFAULT = "console"

STRUCTLOG_LOGGING_METADATA_JOB = {
"vdk_job_name": "%(vdk_job_name)s",
"vdk_step_name": "%(vdk_step_name)s",
"vdk_step_type": "%(vdk_step_type)s",
}

STRUCTLOG_METADATA_FIELDS = {
"timestamp": "created",
"level": "levelname",
"logger_name": "name",
"file_name": "filename",
"line_number": "lineno",
"function_name": "funcName",
}

JSON_STRUCTLOG_LOGGING_METADATA_DEFAULT = {
"timestamp": "%(timestamp)s",
"level": "%(level)s",
"logger_name": "%(name)s",
"file_name": "%(filename)s",
"line_number": "%(lineno)s",
"function_name": "%(funcName)s",
}

CONSOLE_STRUCTLOG_LOGGING_METADATA_DEFAULT = {
"timestamp": "%(asctime)s [VDK]",
"level": "[%(levelname)-5.5s]",
"logger_name": "%(name)-30.30s",
"file_name": "%(filename)20.20s",
"line_number": ":%(lineno)-4.4s",
"function_name": "%(funcName)-16.16s",
}

CONSOLE_STRUCTLOG_LOGGING_METADATA_JOB = {
"vdk_job_name": "%(vdk_job_name)s",
"vdk_step_name": "%(vdk_step_name)s",
"vdk_step_type": "%(vdk_step_type)s",
}

CONSOLE_STRUCTLOG_LOGGING_METADATA_ALL = {
**CONSOLE_STRUCTLOG_LOGGING_METADATA_DEFAULT,
**CONSOLE_STRUCTLOG_LOGGING_METADATA_JOB,
}

STRUCTLOG_LOGGING_METADATA_ALL = {
**JSON_STRUCTLOG_LOGGING_METADATA_DEFAULT,
**STRUCTLOG_LOGGING_METADATA_JOB,
}

STRUCTLOG_LOGGING_METADATA_ALL_KEYS = set(
JSON_STRUCTLOG_LOGGING_METADATA_DEFAULT.keys()
) | set(STRUCTLOG_LOGGING_METADATA_JOB.keys())

RECORD_DEFAULT_FIELDS = set(vars(logging.LogRecord("", "", "", "", "", "", "")))
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
# Copyright 2021-2023 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
from logging import Filter
from logging import LogRecord
from typing import Set

from vdk.plugin.structlog.constants import RECORD_DEFAULT_FIELDS


class JsonMetadataFilter(Filter):
"""
Modifies log records when json formatting is used.
Removes the metadata fields that are not part of the metadata config.
The json formatter displays all custom fields by default, so if they're not
part of the metadata config, they should be removed in order for filtering
to work.
"""

def __init__(self, key_set: Set):
super().__init__()
self._key_set = key_set

def filter(self, record: LogRecord) -> LogRecord:
fields = vars(record)
for field in fields:
if field not in self._key_set and field not in RECORD_DEFAULT_FIELDS:
# not sure if we want to delete, e.g. delattr(record, field) or just set to empty
setattr(record, field, "")
return record


class ConsoleMetadataFilter(Filter):
"""
Modifies log records when console formatting is used
Sets metadata fields that are in the config set but not in the log record to empty string.
This prevents the formatter from breaking for different records, e.g. some records have vdk_job_name and some don't
and records that don't have vdk_job_name should still have vdk_job_name in order not to break the formatter.
"""

def __init__(self, key_set: Set):
super().__init__()
temp = [
RECORD_DEFAULT_FIELDS[x] if x in RECORD_DEFAULT_FIELDS else x
for x in key_set
]
self._key_set = set(temp)

def filter(self, record: LogRecord) -> LogRecord:
fields = vars(record)
for field in self._key_set:
if field not in fields and field not in RECORD_DEFAULT_FIELDS:
setattr(record, field, "")
return record


class AttributeAdder(Filter):
"""
Adds attributes to log records.
"""

def __init__(self, attr_key: str, attr_value: str):
super().__init__()
self._attr_key = attr_key
self._attr_value = attr_value

def filter(self, record: LogRecord) -> LogRecord:
setattr(record, self._attr_key, self._attr_value)
return record
Loading

0 comments on commit d6f9715

Please sign in to comment.