sklearn instrumentation #151

crflynn · 2020-11-06T02:10:27Z

Description

Provides an OpenTelemetry instrumentation package for sklearn models that creates internal spans at the estimator level. The motivation is to provide observability into machine learning models that serve real-time predictive applications built from many complex transformers and predictors.

The instrumentor adds spans to sklearn estimators based on a default set of estimator methods (namely fit, predict, predict_proba and transform) and on other configuration parameters that determine how spans are applied through the model hierarchy. The default configuration also handles Pipeline and FeatureUnion hierarchies. Because sklearn's API is easy to extend, the configuration parameters also support custom model-hierarchy traversal, so spans can be added to custom estimators as well.
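The behaviour described above can be sketched without sklearn or the OpenTelemetry SDK. This is a minimal sketch under stated assumptions, not the package's API: `start_span` is a stand-in that only records span names, `instrument_instance` is a hypothetical helper that wraps the default methods and recurses into Pipeline-style `steps`, and the estimators are duck-typed toys.

```python
from contextlib import contextmanager
from functools import wraps

# Stand-in for tracer.start_as_current_span(): records span names in call order.
RECORDED_SPANS = []


@contextmanager
def start_span(name):
    RECORDED_SPANS.append(name)
    yield


# The default methods named in the PR description.
DEFAULT_METHODS = ("fit", "predict", "predict_proba", "transform")


def _wrap(func, name):
    @wraps(func)
    def wrapper(*args, **kwargs):
        with start_span(name):
            return func(*args, **kwargs)
    return wrapper


def instrument_instance(estimator, methods=DEFAULT_METHODS):
    """Hypothetical helper: wrap the default methods of one estimator instance,
    then recurse into Pipeline-style (name, estimator) steps."""
    cls_name = type(estimator).__name__
    for method_name in methods:
        func = getattr(estimator, method_name, None)
        if callable(func):
            span_name = "{cls}.{func}".format(cls=cls_name, func=method_name)
            # Shadow the bound method with a wrapped instance attribute.
            setattr(estimator, method_name, _wrap(func, span_name))
    for _, step in getattr(estimator, "steps", []):
        instrument_instance(step, methods)
    return estimator


# Toy, duck-typed stand-ins for sklearn estimators.
class Doubler:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [x * 2 for x in X]


class Threshold:
    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return [x > 2 for x in X]


class ToyPipeline:
    def __init__(self, steps):
        self.steps = steps

    def predict(self, X):
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1][1].predict(X)


pipe = instrument_instance(ToyPipeline([("double", Doubler()), ("clf", Threshold())]))
print(pipe.predict([1, 2]))  # [False, True]
print(RECORDED_SPANS)  # ['ToyPipeline.predict', 'Doubler.transform', 'Threshold.predict']
```

The sketch hard-codes traversal through a `steps` attribute, which is what sklearn's `Pipeline` exposes (`FeatureUnion` keeps its children in `transformer_list`); in the real instrumentor, the configuration parameters decide which methods are wrapped and how the hierarchy is walked.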

Type of change

New feature (non-breaking change which adds functionality)
This change requires a documentation update

How Has This Been Tested?

Tests cover enabling and disabling instrumentation, span attributes, and the configuration arguments.

Checklist:

Followed the style guidelines of this project
Changelogs have been updated
Unit tests have been added
Documentation has been updated

Originally open-telemetry/opentelemetry-python#1054

toumorokoshi

Thanks! Almost LGTM, just missing license headers in a couple of files.

toumorokoshi · 2020-11-06T05:14:17Z

.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py

+    if isclass(estimator):
+        name = estimator.__name__
+    else:
+        name = estimator.__class__.__name__


Is it worth guarding and raising an exception in this case? It looks like this assumes a BaseEstimator object.

crflynn

Whether a class or an object is passed, we want the class name for naming the span. This handles both.

toumorokoshi

Yep! That's clear to me. I was just wondering whether you need additional constraints on whether the object or class is actually an estimator.
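The name resolution discussed in this thread, plus the kind of optional guard the reviewer raised, can be sketched as below. The `require_fit` flag and the duck-typed `fit` check are assumptions for illustration, not what the PR implements.

```python
from inspect import isclass


def get_estimator_name(estimator, require_fit=False):
    """Return the class name used for span naming, given a class or an instance."""
    cls = estimator if isclass(estimator) else estimator.__class__
    # Optional guard (hypothetical): sklearn leans on duck typing, so check
    # for a callable `fit` rather than requiring a BaseEstimator subclass.
    if require_fit and not callable(getattr(cls, "fit", None)):
        raise TypeError("{} does not look like an estimator".format(cls.__name__))
    return cls.__name__


class MyTransformer:
    def fit(self, X, y=None):
        return self


print(get_estimator_name(MyTransformer))    # MyTransformer
print(get_estimator_name(MyTransformer()))  # MyTransformer
```

A stricter check such as `issubclass(cls, sklearn.base.BaseEstimator)` is also possible, but it could reject third-party estimators that follow sklearn's conventions without inheriting from that base class.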

toumorokoshi · 2020-11-06T05:16:49Z

.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py

+    def implement_spans_fn(func: Callable):
+        @wraps(func)
+        def wrapper(*args, **kwargs):
+            with get_tracer(__name__, __version__).start_as_current_span(


nitpick: this code looks like it could be refactored and shared with the wrapper code above.
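One way to address this nitpick, sketched under assumptions: a single hypothetical `wrap_in_span` factory that every wrapping site calls, so the span-opening closure exists in one place. `start_as_current_span` below is a stand-in context manager, not the OpenTelemetry tracer method of the same name.

```python
from contextlib import contextmanager
from functools import wraps

SPANS = []


@contextmanager
def start_as_current_span(name):
    # Stand-in for tracer.start_as_current_span(name); records names only.
    SPANS.append(name)
    yield


def wrap_in_span(func, name):
    """Shared factory: every wrapping site builds its closure here."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        with start_as_current_span(name):
            return func(*args, **kwargs)
    return wrapper


# Site 1: a decorator factory in the spirit of implement_spans_fn.
def implement_spans(name):
    def decorator(func):
        return wrap_in_span(func, name)
    return decorator


@implement_spans("helpers.add_one")
def add_one(x):
    return x + 1


# Site 2: patching an existing method in place.
class Model:
    def predict(self, X):
        return list(X)


Model.predict = wrap_in_span(Model.predict, "Model.predict")
```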

toumorokoshi · 2020-11-06T06:13:15Z

.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py

+            name="{cls}.{func}".format(cls=name, func=func.__name__),
+        ) as span:
+            if span.is_recording():
+                for key, val in attributes.items():


Tracing the call tree, it looks like this facility only supports static attributes for annotating spans.

I wonder if it would be valuable to add a callable variant, which would be passed the object starting the operation so that additional information can be extracted (e.g. some parameterization of the model)?

This could be added later, though; just a thought.
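The callable variant suggested here could look roughly like this sketch. `RecordingSpan` is a stand-in that exposes only `is_recording()` and `set_attribute()` (both exist on OpenTelemetry spans); `resolve_attributes`, `annotate`, `ToyForest`, and the attribute keys are hypothetical names for illustration.

```python
class RecordingSpan:
    """Stand-in exposing the two span methods the sketch uses."""

    def __init__(self):
        self.attributes = {}

    def is_recording(self):
        return True

    def set_attribute(self, key, value):
        self.attributes[key] = value


def resolve_attributes(estimator, attributes):
    """Static values pass through; callables are called with the estimator."""
    return {
        key: (val(estimator) if callable(val) else val)
        for key, val in attributes.items()
    }


def annotate(span, estimator, attributes):
    if span.is_recording():
        for key, val in resolve_attributes(estimator, attributes).items():
            span.set_attribute(key, val)


class ToyForest:
    def __init__(self, n_estimators=100):
        self.n_estimators = n_estimators


span = RecordingSpan()
annotate(
    span,
    ToyForest(n_estimators=50),
    {
        "model.team": "ranking",  # static, as in the current code
        "model.n_estimators": lambda est: est.n_estimators,  # callable
    },
)
print(span.attributes)  # {'model.team': 'ranking', 'model.n_estimators': 50}
```

One trade-off of this design: any callable value is invoked, so a static value that happens to be callable would be misread; a separate parameter for callables would avoid the ambiguity. Resolved values still need to be valid OpenTelemetry attribute types (str, bool, int, float, or sequences of these).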

toumorokoshi · 2020-11-06T06:14:24Z

instrumentation/opentelemetry-instrumentation-sklearn/tests/fixtures.py

@@ -0,0 +1,40 @@
+import numpy as np


Test code also requires the license header. Could you add that?

toumorokoshi · 2020-11-06T06:14:35Z

instrumentation/opentelemetry-instrumentation-sklearn/tests/test_sklearn.py

@@ -0,0 +1,175 @@
+from sklearn.ensemble import RandomForestClassifier


please add license.

toumorokoshi

LGTM! Thanks. It's really cool to have instrumentation even for machine learning systems.

toumorokoshi · 2020-11-07T20:52:08Z

instrumentation/opentelemetry-instrumentation-sklearn/CHANGELOG.md

+
+## Unreleased
+
+- Initial release


would it be worth adding the PR link here?

instrumentation/opentelemetry-instrumentation-sklearn/setup.cfg

lzchen

Nice!

Co-authored-by: Leighton Chen <[email protected]>

sklearn instrumentation

623d2ec

crflynn requested review from a team, toumorokoshi and aabmass and removed request for a team November 6, 2020 02:10

crflynn and others added 2 commits November 5, 2020 21:10

Merge branch 'master' into opentelemetry-instrumentation-sklearn

e420c18

missing test

369dba4

toumorokoshi suggested changes Nov 6, 2020


licenses and refactor span wrapper

b5f8973

crflynn force-pushed the opentelemetry-instrumentation-sklearn branch from c28e9f4 to b5f8973 November 7, 2020 18:32

Merge branch 'master' into opentelemetry-instrumentation-sklearn

f262a4e

toumorokoshi approved these changes Nov 7, 2020


add pr link

3f8eab3

lzchen reviewed Nov 9, 2020


instrumentation/opentelemetry-instrumentation-sklearn/setup.cfg (outdated, resolved)

lzchen approved these changes Nov 9, 2020


crflynn and others added 3 commits November 9, 2020 11:28

Update instrumentation/opentelemetry-instrumentation-sklearn/setup.cfg

a91956e

Co-authored-by: Leighton Chen <[email protected]>

Merge branch 'master' into opentelemetry-instrumentation-sklearn

4ee69ac

Merge branch 'master' into opentelemetry-instrumentation-sklearn

ed3559a

lzchen merged commit dad5f5b into open-telemetry:master Nov 10, 2020

matiasdahl mentioned this pull request Feb 5, 2023

docs: Comparison with similar tools (if any?) composable-logs/composable-logs#122

Open
