sklearn instrumentation #151
Thanks! Almost LGTM, just missing a couple licenses in files.
if isclass(estimator):
    name = estimator.__name__
else:
    name = estimator.__class__.__name__
Is it worth adding a guard and raising an exception in this case? It looks like this assumes that it's a BaseEstimator object.
Whether a class or object is passed, we want the name of the class for naming the span. This handles both.
Yep! That's clear to me. I was just wondering if you need additional constraints on whether the object / class is an estimator.
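For reference, a minimal sketch of what such a guard might look like. It is not the code in this PR: `estimator_name` and `MyScaler` are hypothetical names, and the local `BaseEstimator` is only a stand-in for `sklearn.base.BaseEstimator` so the snippet runs without sklearn installed.

```python
from inspect import isclass


class BaseEstimator:
    """Stand-in for sklearn.base.BaseEstimator (assumption, keeps the sketch dependency-free)."""


def estimator_name(estimator) -> str:
    """Return the class name for an estimator class or instance.

    Raises TypeError if the argument is neither a BaseEstimator subclass
    nor an instance of one.
    """
    # Normalise to a class first, so classes and instances share one check.
    cls = estimator if isclass(estimator) else type(estimator)
    if not issubclass(cls, BaseEstimator):
        raise TypeError(
            f"expected an estimator class or instance, got {cls.__name__}"
        )
    return cls.__name__


class MyScaler(BaseEstimator):
    """Hypothetical estimator used only for this demonstration."""


print(estimator_name(MyScaler))    # class passed in
print(estimator_name(MyScaler()))  # instance passed in
```

Whether to raise or to fall back to the plain class name is a design choice; raising surfaces misconfiguration early, while falling back keeps instrumentation from ever breaking user code.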
def implement_spans_fn(func: Callable):
    @wraps(func)
    def wrapper(*args, **kwargs):
        with get_tracer(__name__, __version__).start_as_current_span(
nitpick: this code looks like it could be refactored and shared with the wrapper code above.
            name="{cls}.{func}".format(cls=name, func=func.__name__),
        ) as span:
            if span.is_recording():
                for key, val in attributes.items():
Tracing the call tree, it looks like this facility only supports static attributes for annotating spans.
Would it be valuable to add a callable variant that receives the object starting the operation, so additional information can be extracted (e.g. some parameterization of the model)?
This could always be added later; just a thought.
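To make the suggestion concrete, here is a rough sketch of how static and callable attribute sources could be handled by one code path. Everything here is illustrative: `resolve_attributes`, `annotate`, `RecordingSpan`, and `TinyModel` are hypothetical names, and `RecordingSpan` is a stand-in that only mimics the `is_recording()` and `set_attribute()` methods of a real span.

```python
from typing import Any, Callable, Dict, Union

Attributes = Dict[str, Any]
# Either a fixed dict, or a function that derives attributes from the estimator.
AttributeSource = Union[Attributes, Callable[[Any], Attributes]]


class RecordingSpan:
    """Stand-in for a tracing span; it only stores attributes (assumption)."""

    def __init__(self) -> None:
        self.attributes: Attributes = {}

    def is_recording(self) -> bool:
        return True

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def resolve_attributes(source: AttributeSource, estimator: Any) -> Attributes:
    """Use static dicts as-is; call callables with the estimator."""
    return dict(source(estimator)) if callable(source) else dict(source)


def annotate(span, source: AttributeSource, estimator: Any) -> None:
    """Copy resolved attributes onto the span, skipping work if it is not recording."""
    if span.is_recording():
        for key, val in resolve_attributes(source, estimator).items():
            span.set_attribute(key, val)


class TinyModel:
    """Hypothetical estimator with a single hyperparameter."""

    def __init__(self, depth: int) -> None:
        self.depth = depth


model = TinyModel(depth=3)
static_span, dynamic_span = RecordingSpan(), RecordingSpan()
annotate(static_span, {"team": "ml"}, model)
annotate(dynamic_span, lambda est: {"model.depth": est.depth}, model)
print(static_span.attributes)
print(dynamic_span.attributes)
```

Resolving the callable inside the `is_recording()` check means no attribute extraction cost is paid when the span is not being recorded.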
@@ -0,0 +1,40 @@
import numpy as np
Test code also requires the license in the header. Could you add that?
@@ -0,0 +1,175 @@
from sklearn.ensemble import RandomForestClassifier
please add license.
c28e9f4 to b5f8973
LGTM! Thanks. It's really cool to have instrumentation even for machine learning systems.
## Unreleased

- Initial release
would it be worth adding the PR link here?
Nice!
Description
Provides an opentelemetry instrumentation package for sklearn models, instrumenting internal spans at the estimator level. The motivation is to provide observability into machine learning models that run for realtime predictive applications that have many complex transformers and predictors.
The instrumentor adds spans to sklearn estimators according to a set of default estimator methods (namely fit, predict, predict_proba and transform) and other configuration parameters that determine how spans are implemented through the model hierarchy. The default configuration also handles Pipeline and FeatureUnion hierarchies. Since sklearn's API is easily extended, the configuration parameters allow for custom model hierarchy traversal, allowing spans to be implemented in custom estimators as well.
Type of change
How Has This Been Tested?
There are multiple tests for instrumentation on/off, span attributes, and configuration args.
Checklist:
Originally open-telemetry/opentelemetry-python#1054
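As a rough illustration of the traversal outlined in the Description, the sketch below wraps the default methods on an estimator and recurses into child estimators. It is a simplified assumption of how this could work, not the package's implementation: `instrument`, `wrap_method`, `span_log`, `Scale`, and `MiniPipeline` are hypothetical, a plain list stands in for the tracer, and the only sklearn facts relied on are that Pipeline keeps its children in `steps` and FeatureUnion in `transformer_list`, both as (name, estimator) pairs.

```python
from functools import wraps

DEFAULT_METHODS = ("fit", "predict", "predict_proba", "transform")
# Attributes holding (name, estimator) pairs in sklearn's Pipeline and FeatureUnion.
CHILD_ATTRIBUTES = ("steps", "transformer_list")

span_log = []  # stand-in for a tracer: records span names in call order


def wrap_method(estimator, method_name):
    """Replace one bound method on this instance with a logging wrapper."""
    original = getattr(estimator, method_name)
    span_name = f"{type(estimator).__name__}.{method_name}"

    @wraps(original)
    def wrapper(*args, **kwargs):
        span_log.append(span_name)  # a real instrumentor would open a span here
        return original(*args, **kwargs)

    setattr(estimator, method_name, wrapper)


def instrument(estimator):
    """Wrap the default methods, then recurse into any child estimators."""
    for method_name in DEFAULT_METHODS:
        if callable(getattr(estimator, method_name, None)):
            wrap_method(estimator, method_name)
    for attr in CHILD_ATTRIBUTES:
        for _name, child in getattr(estimator, attr, None) or []:
            instrument(child)


class Scale:
    """Hypothetical transformer that doubles every value."""

    def fit(self, X):
        return self

    def transform(self, X):
        return [x * 2 for x in X]


class MiniPipeline:
    """Hypothetical stand-in for a Pipeline: applies each step's transform in order."""

    def __init__(self, steps):
        self.steps = steps

    def transform(self, X):
        for _name, step in self.steps:
            X = step.transform(X)
        return X


pipe = MiniPipeline([("scale", Scale())])
instrument(pipe)
result = pipe.transform([1, 2])
print(result)
print(span_log)
```

Custom hierarchies could be supported in this style by letting callers extend `CHILD_ATTRIBUTES`, which is one way to read the "custom model hierarchy traversal" mentioned in the Description.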