-
Notifications
You must be signed in to change notification settings - Fork 59
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
vdk-structlog: create structured logging plugin
Why? As part of VEP-2448, we'd like to make structured logging pluggable What? Create a vdk-structlog plugin. The plugin allows users to configure logging metadata and logging format. It also works with bound loggers. How was this tested? Installed the plugin and ran a data job locally with different configurations White kind of change is this? feature/non-breaking Signed-off-by: Dilyan Marinov <[email protected]>
- Loading branch information
Dilyan Marinov
committed
Oct 20, 2023
1 parent
e9af4fa
commit d58f224
Showing
10 changed files
with
683 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# Copyright 2021-2023 VMware, Inc. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
image: "python:3.7" | ||
|
||
.build-vdk-structlog: | ||
variables: | ||
PLUGIN_NAME: vdk-structlog | ||
extends: .build-plugin | ||
|
||
build-py37-vdk-structlog: | ||
extends: .build-vdk-structlog | ||
image: "python:3.7" | ||
|
||
build-py311-vdk-structlog: | ||
extends: .build-vdk-structlog | ||
image: "python:3.11" | ||
|
||
release-vdk-structlog: | ||
variables: | ||
PLUGIN_NAME: vdk-structlog | ||
extends: .release-plugin |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
# Structured Logging For VDK | ||
|
||
This plugin allows users to: | ||
- select the log output format | ||
- configure the logging metadata | ||
- display metadata added by bound loggers | ||
|
||
## Usage | ||
|
||
``` | ||
pip install vdk-structlog | ||
``` | ||
|
||
### Configuration | ||
|
||
(`vdk config-help` is useful command to browse all config options of your installation of vdk) | ||
|
||
| Name | Description | (example) Value | | ||
| ---------------- | --------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | | ||
| logging_metadata | Configure the metadata that will be output along with the log message | "timestamp, level, logger_name, file_name, line_number, function_name, vdk_job_name, vdk_step_name, vdk_step_type" | | ||
| logging_format | Configure the logging output format. Available formats: json, console | "console" | | ||
|
||
### Example: Configure metadata | ||
|
||
Create a data job and add the following config options | ||
|
||
``` | ||
[vdk] | ||
logging_metadata=timestamp,level,file_name,vdk_job_name | ||
logging_format=console | ||
``` | ||
|
||
Then run the data job. You should see just the configured tags where relevant. | ||
For example, you won't see the vdk_job_name outside of log statements directly | ||
related to job execution. | ||
|
||
<details> | ||
<summary>Click to expand</summary> | ||
``` | ||
2023-10-17 11:20:59,202 [VDK] [INFO ] managed_cursor.py ingest-from-db-example-job - Executing query SUCCEEDED. Query duration 00h:00m:00s | ||
2023-10-17 11:20:59,202 [VDK] [INFO ] managed_connection_b ingest-from-db-example-job - Fetching query result... | ||
2023-10-17 11:20:59,202 [VDK] [INFO ] managed_cursor.py ingest-from-db-example-job - Fetching all results from query ... | ||
2023-10-17 11:20:59,202 [VDK] [INFO ] managed_cursor.py ingest-from-db-example-job - Fetching all results from query SUCCEEDED. | ||
2023-10-17 11:20:59,202 [VDK] [INFO ] managed_cursor.py ingest-from-db-example-job - Closing DB cursor ... | ||
2023-10-17 11:20:59,202 [VDK] [INFO ] managed_cursor.py ingest-from-db-example-job - Closing DB cursor SUCCEEDED. | ||
2023-10-17 11:20:59,203 [VDK] [INFO ] file_based_step.py ingest-from-db-example-job - Entering 30_ingest_to_table.py#run(...) ... | ||
2023-10-17 11:20:59,203 [VDK] [INFO ] ingester_router.py ingest-from-db-example-job - Sending tabular data for ingestion with method: sqlite and target: None | ||
2023-10-17 11:20:59,204 [VDK] [INFO ] file_based_step.py ingest-from-db-example-job - Exiting 30_ingest_to_table.py#run(...) SUCCESS | ||
``` | ||
</details> | ||
|
||
Now, let's remove the timestamp from the configuration and add the line number | ||
|
||
``` | ||
logging_metadata=level,file_name,line_number,vdk_job_name | ||
logging_format=console | ||
``` | ||
|
||
And run the job again | ||
|
||
<details> | ||
<summary>Click to expand</summary> | ||
``` | ||
[INFO ] managed_cursor.py :97 ingest-from-db-example-job - Executing query SUCCEEDED. Query duration 00h:00m:00s | ||
[INFO ] managed_connection_b :133 ingest-from-db-example-job - Fetching query result... | ||
[INFO ] managed_cursor.py :193 ingest-from-db-example-job - Fetching all results from query ... | ||
[INFO ] managed_cursor.py :196 ingest-from-db-example-job - Fetching all results from query SUCCEEDED. | ||
[INFO ] managed_cursor.py :203 ingest-from-db-example-job - Closing DB cursor ... | ||
[INFO ] managed_cursor.py :205 ingest-from-db-example-job - Closing DB cursor SUCCEEDED. | ||
[INFO ] file_based_step.py :103 ingest-from-db-example-job - Entering 30_ingest_to_table.py#run(...) ... | ||
[INFO ] ingester_router.py :106 ingest-from-db-example-job - Sending tabular data for ingestion with method: sqlite and target: None | ||
[INFO ] file_based_step.py :109 ingest-from-db-example-job - Exiting 30_ingest_to_table.py#run(...) SUCCESS | ||
``` | ||
</details> | ||
|
||
### Example: Configure logging format | ||
|
||
Create a data job and add the following config options | ||
|
||
``` | ||
[vdk] | ||
logging_metadata=timestamp,level,file_name,vdk_job_name | ||
logging_format=json | ||
``` | ||
|
||
And you should see json-formatted logs. | ||
|
||
<details> | ||
<summary>Click to expand</summary> | ||
``` | ||
{"filename": "managed_cursor.py", "lineno": 97, "vdk_job_name": "ingest-from-db-example-job", "message": "Executing query SUCCEEDED. Query duration 00h:00m:00s", "vdk_step_name": "20_create_table.sql", "vdk_step_type": "sql", "timestamp": 1697532608.968082, "level": "INFO"} | ||
{"filename": "managed_connection_base.py", "lineno": 133, "vdk_job_name": "ingest-from-db-example-job", "message": "Fetching query result...", "vdk_step_name": "20_create_table.sql", "vdk_step_type": "sql", "timestamp": 1697532608.9681149, "level": "INFO"} | ||
{"filename": "managed_cursor.py", "lineno": 193, "vdk_job_name": "ingest-from-db-example-job", "message": "Fetching all results from query ...", "vdk_step_name": "20_create_table.sql", "vdk_step_type": "sql", "timestamp": 1697532608.968137, "level": "INFO"} | ||
{"filename": "managed_cursor.py", "lineno": 196, "vdk_job_name": "ingest-from-db-example-job", "message": "Fetching all results from query SUCCEEDED.", "vdk_step_name": "20_create_table.sql", "vdk_step_type": "sql", "timestamp": 1697532608.9681568, "level": "INFO"} | ||
{"filename": "managed_cursor.py", "lineno": 203, "vdk_job_name": "ingest-from-db-example-job", "message": "Closing DB cursor ...", "vdk_step_name": "20_create_table.sql", "vdk_step_type": "sql", "timestamp": 1697532608.968405, "level": "INFO"} | ||
{"filename": "managed_cursor.py", "lineno": 205, "vdk_job_name": "ingest-from-db-example-job", "message": "Closing DB cursor SUCCEEDED.", "vdk_step_name": "20_create_table.sql", "vdk_step_type": "sql", "timestamp": 1697532608.9684658, "level": "INFO"} | ||
{"filename": "file_based_step.py", "lineno": 103, "vdk_job_name": "ingest-from-db-example-job", "message": "Entering 30_ingest_to_table.py#run(...) ...", "vdk_step_name": "30_ingest_to_table.py", "vdk_step_type": "python", "timestamp": 1697532608.9734771, "level": "INFO"} | ||
{"filename": "ingester_router.py", "lineno": 106, "vdk_job_name": "ingest-from-db-example-job", "message": "Sending tabular data for ingestion with method: sqlite and target: None", "vdk_step_name": "30_ingest_to_table.py", "vdk_step_type": "python", "timestamp": 1697532608.975029, "level": "INFO"} | ||
{"filename": "file_based_step.py", "lineno": 109, "vdk_job_name": "ingest-from-db-example-job", "message": "Exiting 30_ingest_to_table.py#run(...) SUCCESS", "vdk_step_name": "30_ingest_to_table.py", "vdk_step_type": "python", "timestamp": 1697532608.976068, "level": "INFO"} | ||
``` | ||
</details> | ||
|
||
### Example: Bound loggers | ||
|
||
TODO: Add an example once bound loggers are part of vdk-core | ||
|
||
### Example: Passing custom metadata fields with extra_params | ||
|
||
TODO: Add an example | ||
|
||
### Build and test | ||
|
||
``` | ||
pip install -r requirements.txt | ||
pip install -e . | ||
pytest | ||
``` | ||
|
||
In VDK repo [../build-plugin.sh](https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins/build-plugin.sh) script can be used also. | ||
|
||
|
||
#### Note about the CICD: | ||
|
||
.plugin-ci.yaml is needed only for plugins part of [Versatile Data Kit Plugin repo](https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins). | ||
|
||
The CI/CD is separated in two stages, a build stage and a release stage. | ||
The build stage is made up of a few jobs, all which inherit from the same | ||
job configuration and only differ in the Python version they use (3.7, 3.8, 3.9 and 3.10). | ||
They run according to rules, which are ordered in a way such that changes to a | ||
plugin's directory trigger the plugin CI, but changes to a different plugin does not. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# this file is used to provide testing requirements | ||
# for requirements (dependencies) needed during and after installation of the plugin see (and update) setup.py install_requires section | ||
|
||
pytest | ||
vdk-core | ||
vdk-test-utils |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
# Copyright 2021-2023 VMware, Inc. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
import pathlib | ||
|
||
import setuptools | ||
|
||
""" | ||
Builds a package with the help of setuptools in order for this package to be imported in other projects | ||
""" | ||
|
||
__version__ = "0.1.0" | ||
|
||
setuptools.setup( | ||
name="vdk-structlog", | ||
version=__version__, | ||
url="https://github.com/vmware/versatile-data-kit", | ||
description="Structured logging for versatile data kit", | ||
long_description=pathlib.Path("README.md").read_text(), | ||
long_description_content_type="text/markdown", | ||
install_requires=["vdk-core", "python-json-logger"], | ||
package_dir={"": "src"}, | ||
packages=setuptools.find_namespace_packages(where="src"), | ||
# This is the only vdk plugin specifc part | ||
# Define entry point called "vdk.plugin.run" with name of plugin and module to act as entry point. | ||
entry_points={ | ||
"vdk.plugin.run": ["vdk-structlog = vdk.plugin.structlog.structlog_plugin"] | ||
}, | ||
classifiers=[ | ||
"Development Status :: 2 - Pre-Alpha", | ||
"License :: OSI Approved :: Apache Software License", | ||
"Programming Language :: Python :: 3.7", | ||
"Programming Language :: Python :: 3.8", | ||
"Programming Language :: Python :: 3.9", | ||
"Programming Language :: Python :: 3.10", | ||
"Programming Language :: Python :: 3.11", | ||
], | ||
project_urls={ | ||
"Documentation": "https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins/vdk-structlog", | ||
"Source Code": "https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins/vdk-structlog", | ||
"Bug Tracker": "https://github.com/vmware/versatile-data-kit/issues/new/choose", | ||
}, | ||
) |
64 changes: 64 additions & 0 deletions
64
projects/vdk-plugins/vdk-structlog/src/vdk/plugin/structlog/constants.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
# Copyright 2021-2023 VMware, Inc. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
import logging | ||
|
||
STRUCTLOG_LOGGING_METADATA_KEY = "logging_metadata" | ||
STRUCTLOG_LOGGING_FORMAT_KEY = "logging_format" | ||
|
||
STRUCTLOG_LOGGING_FORMAT_POSSIBLE_VALUES = ["console", "json"] | ||
STRUCTLOG_LOGGING_FORMAT_DEFAULT = "console" | ||
|
||
STRUCTLOG_LOGGING_METADATA_JOB = { | ||
"vdk_job_name": "%(vdk_job_name)s", | ||
"vdk_step_name": "%(vdk_step_name)s", | ||
"vdk_step_type": "%(vdk_step_type)s", | ||
} | ||
|
||
STRUCTLOG_METADATA_FIELDS = { | ||
"timestamp": "created", | ||
"level": "levelname", | ||
"logger_name": "name", | ||
"file_name": "filename", | ||
"line_number": "lineno", | ||
"function_name": "funcName", | ||
} | ||
|
||
JSON_STRUCTLOG_LOGGING_METADATA_DEFAULT = { | ||
"timestamp": "%(timestamp)s", | ||
"level": "%(level)s", | ||
"logger_name": "%(name)s", | ||
"file_name": "%(filename)s", | ||
"line_number": "%(lineno)s", | ||
"function_name": "%(funcName)s", | ||
} | ||
|
||
CONSOLE_STRUCTLOG_LOGGING_METADATA_DEFAULT = { | ||
"timestamp": "%(asctime)s [VDK]", | ||
"level": "[%(levelname)-5.5s]", | ||
"logger_name": "%(name)-30.30s", | ||
"file_name": "%(filename)20.20s", | ||
"line_number": ":%(lineno)-4.4s", | ||
"function_name": "%(funcName)-16.16s", | ||
} | ||
|
||
CONSOLE_STRUCTLOG_LOGGING_METADATA_JOB = { | ||
"vdk_job_name": "%(vdk_job_name)s", | ||
"vdk_step_name": "%(vdk_step_name)s", | ||
"vdk_step_type": "%(vdk_step_type)s", | ||
} | ||
|
||
CONSOLE_STRUCTLOG_LOGGING_METADATA_ALL = { | ||
**CONSOLE_STRUCTLOG_LOGGING_METADATA_DEFAULT, | ||
**CONSOLE_STRUCTLOG_LOGGING_METADATA_JOB, | ||
} | ||
|
||
STRUCTLOG_LOGGING_METADATA_ALL = { | ||
**JSON_STRUCTLOG_LOGGING_METADATA_DEFAULT, | ||
**STRUCTLOG_LOGGING_METADATA_JOB, | ||
} | ||
|
||
STRUCTLOG_LOGGING_METADATA_ALL_KEYS = set( | ||
JSON_STRUCTLOG_LOGGING_METADATA_DEFAULT.keys() | ||
) | set(STRUCTLOG_LOGGING_METADATA_JOB.keys()) | ||
|
||
RECORD_DEFAULT_FIELDS = set(vars(logging.LogRecord("", "", "", "", "", "", ""))) |
68 changes: 68 additions & 0 deletions
68
projects/vdk-plugins/vdk-structlog/src/vdk/plugin/structlog/filters.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
# Copyright 2021-2023 VMware, Inc. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
from logging import Filter | ||
from logging import LogRecord | ||
from typing import Set | ||
|
||
from vdk.plugin.structlog.constants import RECORD_DEFAULT_FIELDS | ||
|
||
|
||
class JsonMetadataFilter(Filter): | ||
""" | ||
Modifies log records when json formatting is used. | ||
Removes the metadata fields that are not part of the metadata config. | ||
The json formatter displays all custom fields by default, so if they're not | ||
part of the metadata config, they should be removed in order for filtering | ||
to work. | ||
""" | ||
|
||
def __init__(self, key_set: Set): | ||
super().__init__() | ||
self._key_set = key_set | ||
|
||
def filter(self, record: LogRecord) -> LogRecord: | ||
fields = vars(record) | ||
for field in fields: | ||
if field not in self._key_set and field not in RECORD_DEFAULT_FIELDS: | ||
# not sure if we want to delete, e.g. delattr(record, field) or just set to empty | ||
setattr(record, field, "") | ||
return record | ||
|
||
|
||
class ConsoleMetadataFilter(Filter): | ||
""" | ||
Modifies log records when console formatting is used | ||
Sets metadata fields that are in the config set but not in the log record to empty string. | ||
This prevents the formatter from breaking for different records, e.g. some records have vdk_job_name and some don't | ||
and records that don't have vdk_job_name should still have vdk_job_name in order not to break the formatter. | ||
""" | ||
|
||
def __init__(self, key_set: Set): | ||
super().__init__() | ||
temp = [ | ||
RECORD_DEFAULT_FIELDS[x] if x in RECORD_DEFAULT_FIELDS else x | ||
for x in key_set | ||
] | ||
self._key_set = set(temp) | ||
|
||
def filter(self, record: LogRecord) -> LogRecord: | ||
fields = vars(record) | ||
for field in self._key_set: | ||
if field not in fields and field not in RECORD_DEFAULT_FIELDS: | ||
setattr(record, field, "") | ||
return record | ||
|
||
|
||
class AttributeAdder(Filter): | ||
""" | ||
Adds attributes to log records. | ||
""" | ||
|
||
def __init__(self, attr_key: str, attr_value: str): | ||
super().__init__() | ||
self._attr_key = attr_key | ||
self._attr_value = attr_value | ||
|
||
def filter(self, record: LogRecord) -> LogRecord: | ||
setattr(record, self._attr_key, self._attr_value) | ||
return record |
Oops, something went wrong.