Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sklearn instrumentation #151

Merged

Conversation

crflynn
Copy link
Contributor

@crflynn crflynn commented Nov 6, 2020

Description

Provides an opentelemetry instrumentation package for sklearn models, instrumenting internal spans at the estimator level. The motivation is to provide observability into machine learning models that run for realtime predictive applications that have many complex transformers and predictors.

The instrumentor adds spans to sklearn estimators according to a set of default estimator methods (namely fit, predict, predict_proba and transform) and other configuration parameters that determine how spans are implemented through the model hierarchy. The default configuration also handles Pipeline and FeatureUnion hierarchies. Since sklearn's API is easily extended, the configuration parameters allow for custom model hierarchy traversal, allowing spans to be implemented in custom estimators as well.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

There are multiple tests for instrumentation on/off, span attributes, and configuration args.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

Originally open-telemetry/opentelemetry-python#1054

@crflynn crflynn requested review from a team, toumorokoshi and aabmass and removed request for a team November 6, 2020 02:10
Copy link
Member

@toumorokoshi toumorokoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Almost LGTM, just missing a couple licenses in files.

if isclass(estimator):
name = estimator.__name__
else:
name = estimator.__class__.__name__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth guarding and raising and exception in this case? it looks like this is assuming that it's a BaseEstimator object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether a class or object is passed, we want the name of the class for naming the span. This handles both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! That's clear to me. I was just wondering if you need additional constraints on whether the object / class is an estimator.

def implement_spans_fn(func: Callable):
@wraps(func)
def wrapper(*args, **kwargs):
with get_tracer(__name__, __version__).start_as_current_span(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: this code looks like it could be refactored and shared with the wrapper code above.

name="{cls}.{func}".format(cls=name, func=func.__name__),
) as span:
if span.is_recording():
for key, val in attributes.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracing the call tree, it looks like this facility only supports static attributes to annotate an instrumentation.

I would wonder if it's valuable to add a callable edition, where it will pass in the object that is starting the operation, so additional information can be extracted (e.g. some parameterization of the model)?

It's not like this couldn't be added later, just a thought.

@@ -0,0 +1,40 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test code also requires the license in the header. could you add that?

@@ -0,0 +1,175 @@
from sklearn.ensemble import RandomForestClassifier
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add license.

@crflynn crflynn force-pushed the opentelemetry-instrumentation-sklearn branch from c28e9f4 to b5f8973 Compare November 7, 2020 18:32
Copy link
Member

@toumorokoshi toumorokoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks. It's really cool to have instrumentation even for machine learning systems.


## Unreleased

- Initial release
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be worth adding the PR link here?

Copy link
Contributor

@lzchen lzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants