Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vdk-structlog: default formatter #2936

Merged
merged 21 commits into from
Dec 4, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 30 additions & 4 deletions projects/vdk-plugins/vdk-structlog/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,36 @@ pip install vdk-structlog

(`vdk config-help` is useful command to browse all config options of your installation of vdk)

| Name | Description | Example Value | Possible Values |
| ---------------- | --------------------------------------------------------------------- |---------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|
| logging_metadata | Configure the metadata that will be output along with the log message | "timestamp, level, logger_name, file_name, vdk_job_name | "timestamp, level, logger_name, file_name, line_number, function_name, vdk_job_name, vdk_step_name, vdk_step_type" |
| logging_format | Configure the logging output format. Available formats: json, console | "console" | "console", "json" |
| Name | Description | Example Value | Possible Values |
|----------------------------|---------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|
| logging_metadata | Configure the metadata that will be output along with the log message | "timestamp, level, logger_name, file_name, vdk_job_name | "timestamp, level, logger_name, file_name, line_number, function_name, vdk_job_name, vdk_step_name, vdk_step_type" |
| logging_format | Configure the logging output format. Available formats: json, console | "console" | "console", "json" |
| custom_console_log_pattern | Custom format string for console logging, applied only when`logging_format` is 'console' and overrides `logging_metadata` | "%(asctime)s %(name)-12s %(levelname)-8s %(message)s" | Any valid Python logging format string |

### Example: Configure Custom Console Format

If you wish to apply a specific format to your console logs, you can define a custom format using the `custom_console_log_pattern` configuration. This custom format string will be used only when the `logging_format` is set to 'console'.

For example, add the following to your data job configuration:

```
[vdk]
custom_console_log_pattern=%(asctime)s %(name)-12s %(levelname)-8s %(message)s
```
When you run your data job, regardless of other logging settings, your logs will strictly follow this custom format, displaying the timestamp, logger's name, log level, and the log message as per the format string specified.

```
2023-10-17 11:20:59,202 managed_cursor INFO ingest-from-db-example-job - Executing query SUCCEEDED. Query duration 00h:00m:00s
2023-10-17 11:20:59,202 managed_connectio INFO ingest-from-db-example-job - Fetching query result...
2023-10-17 11:20:59,202 managed_cursor INFO ingest-from-db-example-job - Fetching all results from query ...
2023-10-17 11:20:59,202 managed_cursor INFO ingest-from-db-example-job - Fetching all results from query SUCCEEDED.
2023-10-17 11:20:59,202 managed_cursor INFO ingest-from-db-example-job - Closing DB cursor ...
2023-10-17 11:20:59,202 managed_cursor INFO ingest-from-db-example-job - Closing DB cursor SUCCEEDED.
2023-10-17 11:20:59,203 file_based_step INFO ingest-from-db-example-job - Entering 30_ingest_to_table.py#run(...) ...
2023-10-17 11:20:59,203 ingester_router INFO ingest-from-db-example-job - Sending tabular data for ingestion with method: sqlite and target: None
2023-10-17 11:20:59,204 file_based_step INFO ingest-from-db-example-job - Exiting 30_ingest_to_table.py#run(...) SUCCESS
```


### Example: Configure metadata

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@

STRUCTLOG_LOGGING_METADATA_KEY = "logging_metadata"
STRUCTLOG_LOGGING_FORMAT_KEY = "logging_format"
STRUCTLOG_CONSOLE_LOG_PATTERN = "logging_custom_format"

STRUCTLOG_LOGGING_FORMAT_POSSIBLE_VALUES = ["console", "json"]
STRUCTLOG_LOGGING_FORMAT_DEFAULT = "console"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,9 @@ def build_ltsv_format(self) -> str:
return ltsv_format


def create_formatter(logging_format: str, metadata_keys: str) -> [Formatter, Filter]:
def create_formatter(
logging_format: str, metadata_keys: str, custom_console_format: str = None
) -> [Formatter, Filter]:
"""
Creates a formatter and a filter based on the logging format configuration
and metadata_keys configuration that are passed. The formatter takes care
Expand All @@ -152,9 +154,13 @@ def create_formatter(logging_format: str, metadata_keys: str) -> [Formatter, Fil
fmt=StructlogMetadataBuilder(metadata_keys).build_ltsv_format()
)
custom_key_filter = ConsoleMetadataFilter(key_set)
else:
formatter = ConsoleFormatter(
fmt=StructlogMetadataBuilder(metadata_keys).build_console_format()
elif logging_format == "console":
format_string = (
custom_console_format
if custom_console_format
else StructlogMetadataBuilder(metadata_keys).build_console_format()
)
formatter = ConsoleFormatter(fmt=format_string)
custom_key_filter = ConsoleMetadataFilter(key_set)

return formatter, custom_key_filter
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
from vdk.internal.core.config import ConfigurationBuilder
from vdk.internal.core.context import CoreContext
from vdk.plugin.structlog.constants import JSON_STRUCTLOG_LOGGING_METADATA_DEFAULT
from vdk.plugin.structlog.constants import STRUCTLOG_CONSOLE_LOG_PATTERN
from vdk.plugin.structlog.constants import STRUCTLOG_LOGGING_FORMAT_DEFAULT
from vdk.plugin.structlog.constants import STRUCTLOG_LOGGING_FORMAT_KEY
from vdk.plugin.structlog.constants import STRUCTLOG_LOGGING_FORMAT_POSSIBLE_VALUES
Expand Down Expand Up @@ -79,6 +80,12 @@ def vdk_configure(self, config_builder: ConfigurationBuilder):
),
)

config_builder.add(
key=STRUCTLOG_CONSOLE_LOG_PATTERN,
default_value="",
description="Custom format string for console logging. Leave empty for default format.",
)

config_builder.add(
key=STRUCTLOG_LOGGING_FORMAT_KEY,
default_value=STRUCTLOG_LOGGING_FORMAT_DEFAULT,
Expand All @@ -93,8 +100,13 @@ def vdk_initialize(self, context: CoreContext):
logging_formatter = context.configuration.get_value(
STRUCTLOG_LOGGING_FORMAT_KEY
)
custom_format_string = context.configuration.get_value(
STRUCTLOG_CONSOLE_LOG_PATTERN
)

formatter, metadata_filter = create_formatter(logging_formatter, metadata_keys)
formatter, metadata_filter = create_formatter(
logging_formatter, metadata_keys, custom_format_string
)

root_logger = logging.getLogger()
root_logger.removeHandler(root_logger.handlers[0])
Expand All @@ -107,14 +119,19 @@ def vdk_initialize(self, context: CoreContext):

@hookimpl(hookwrapper=True)
def initialize_job(self, context: JobContext) -> None:
metadata_keys = context.core_context.configuration.get_value(
STRUCTLOG_LOGGING_METADATA_KEY
)
logging_formatter = context.core_context.configuration.get_value(
STRUCTLOG_LOGGING_FORMAT_KEY
)
metadata_keys = context.core_context.configuration.get_value(
STRUCTLOG_LOGGING_METADATA_KEY
custom_format_string = context.core_context.configuration.get_value(
STRUCTLOG_CONSOLE_LOG_PATTERN
)

formatter, metadata_filter = create_formatter(logging_formatter, metadata_keys)
formatter, metadata_filter = create_formatter(
logging_formatter, metadata_keys, custom_format_string
)
job_name_adder = AttributeAdder("vdk_job_name", context.name)

root_logger = logging.getLogger()
Expand All @@ -134,14 +151,19 @@ def initialize_job(self, context: JobContext) -> None:

@hookimpl(hookwrapper=True)
def run_job(self, context: JobContext) -> Optional[ExecutionResult]:
metadata_keys = context.core_context.configuration.get_value(
STRUCTLOG_LOGGING_METADATA_KEY
)
logging_formatter = context.core_context.configuration.get_value(
STRUCTLOG_LOGGING_FORMAT_KEY
)
metadata_keys = context.core_context.configuration.get_value(
STRUCTLOG_LOGGING_METADATA_KEY
custom_format_string = context.core_context.configuration.get_value(
STRUCTLOG_CONSOLE_LOG_PATTERN
)

formatter, metadata_filter = create_formatter(logging_formatter, metadata_keys)
formatter, metadata_filter = create_formatter(
logging_formatter, metadata_keys, custom_format_string
)
job_name_adder = AttributeAdder("vdk_job_name", context.name)

root_logger = logging.getLogger()
Expand Down
52 changes: 52 additions & 0 deletions projects/vdk-plugins/vdk-structlog/tests/test_structlog.py
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,58 @@ def test_stock_fields_removal(log_format):
assert re.search(stock_field_reps[shown_field], test_log) is not None


@pytest.mark.parametrize("log_format", ["console"])
def test_custom_format_applied(log_format):
custom_format_string = "%(asctime)s %(name)-12s %(levelname)-8s %(message)s"

with mock.patch.dict(
os.environ,
{
"VDK_LOGGING_FORMAT": log_format,
"VDK_CUSTOM_CONSOLE_LOG_PATTERN": custom_format_string,
},
):
logs = _run_job_and_get_logs()

for log in logs:
if "Log statement with no bound context" in log:
assert _matches_custom_format(log)
break
else:
pytest.fail("Log statement with no bound context not found in logs")


@pytest.mark.parametrize("log_format", ["json", "ltsv"])
def test_custom_format_not_applied_for_non_console_formats(log_format):
custom_format_string = "%(asctime)s %(name)-12s %(levelname)-8s %(message)s"

with mock.patch.dict(
os.environ,
{
"VDK_LOGGING_METADATA": "timestamp,level,file_name,vdk_job_name",
"VDK_LOGGING_FORMAT": log_format,
"VDK_CUSTOM_CONSOLE_LOG_PATTERN": custom_format_string,
},
):
logs = _run_job_and_get_logs()

for log in logs:
if "Log statement with no bound context" in log:
assert not _matches_custom_format(
log
), f"Custom format was incorrectly applied for {log_format} format. Log: {log}"
break
else:
pytest.fail("Log statement with no bound context not found in logs")


def _matches_custom_format(log):
pattern = re.compile(
r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \S{1,12} \S{1,8} .+"
)
return bool(pattern.search(log))


def _run_job_and_get_logs():
"""
Runs the necessary job and returns the output logs.
Expand Down