Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add usage tracking (with segment) #174

Merged
merged 7 commits into from
Sep 9, 2020
Merged

Conversation

cgardens
Copy link
Contributor

@cgardens cgardens commented Sep 4, 2020

What

  • Add TrackingStrategy to the env.
  • Add customerId to the workspace config so that we can group the actions for an instance.
    • this comes with a hack to make sure it is set in time. I'll add a comment below that people can use to leave feedback.
  • Add TrackingClients for tracking usage. (iface, SegmentTrackingClient, and LoggingTrackingTrackingClient)
  • Use tracking client as a singleton (so that we do not need to dependency inject it everywhere). This may be controversial and would like to hear if anyone doesn't like this choice. I'll add a comment below that people can use to leave feedback.
  • Added an example usage of how to use the tracking client (but have not added all of the track events that we want yet). Example usages are in ServerApp.java and SchedulerHandler.java

Checklist (going to hold off on this until I get some initial feedback that people are happy with the approach)

  • Set this up in segment and make sure it reports properly
  • Add env variable for tracking to docker configs
  • Unit tests
  • Add track invocations for everything we actually want to track

Recommended reading order

  1. TrackingClient.java, SegmentTrackingClient.java, LoggingTrackingTrackingClient.java, TrackingClientSingleton.java
  2. EnvConfigs.java
  3. SeverApp.java
  4. the rest


private static TrackingClient trackingClient;

public static TrackingClient get() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid having to dependency inject the tracker everywhere. Would be curious if people are okay with this approach for tracking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the production tracker being a singleton but it seems like it needs to be injectable if we're going to test our metrics? And if we don't test our metrics we can't really trust them 100%.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per our conversation offline, keeping this as a singleton is valuable so that we don't have to dependency inject everywhere. but in cases where we do need to test the behavior, we might dependency inject more locally into a class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed for the singleton.

We could have a way to change it with a setter so you can actually set it to a mock and test interactions in the test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest having it set by default to Logging and have a way to set it up with env that you call in the main. Something like a TrackingClientSingleton.configure(opts).

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michel-tricot - i took a hack at this. not quite what you described. lmk if this works for you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk about it offline. I am not fan that you are creating a configpersistence that could be different from the rest of the application.

We shouldn't be afraid to have a static initialializer that needs to be called in the main (so you can inject options). Having the initializer would probably also solve the TrackingIdentiy supplier piece

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -59,6 +69,12 @@ public ServerApp(final Path configRoot) {
}

public void start() throws Exception {
// hack: upon installation we need to assign a random customerId so that when
// tracking we can associate all action with the correct anonymous id.
setCustomerIdIfNotSet(configRoot);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really not great. I like it better than the other option I could come up with though. The alternative I think is to have an initialize script in our docker setup that finds the default workspace json file and mutates it to inject a customerId. I don't like it as much as what I have here for two reasons:

  1. Don't want to add initialize scripts in docker right now since we don't do this for anything else.
  2. It's a hack but at least it's pretty easy to find and understand when it's in the java code as opposed to some random bash script.

Copy link
Contributor

@jrhizor jrhizor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments.

Additionally, we should change customerId to instanceId/installationId/similar since most of our users will not actually be customers at first.

@@ -83,6 +86,22 @@ public String getDockerNetwork() {
return DEFAULT_NETWORK;
}

@Override
public TrackingStrategy getTrackingStrategy() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the enum's valueOf with a fallback to the default?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 +1


public class SegmentTrackingClient implements TrackingClient {

private static final String SEGMENT_WRITE_KEY = "123";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be hardcoded with the actual key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was planning to. it's not considered a secret. i don't see alot of value in making it an env variable. will have to dependency inject it so that i can test this class but planning to keep it hardcoded.

public class SegmentTrackingClient implements TrackingClient {

private static final String SEGMENT_WRITE_KEY = "123";
private static final String EMAIL_KEY = "email";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd be fine inlining this. Two adjacent *_KEY definitions with different meanings makes you do a double take.


private static TrackingClient trackingClient;

public static TrackingClient get() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the production tracker being a singleton but it seems like it needs to be injectable if we're going to test our metrics? And if we don't test our metrics we can't really trust them 100%.

case SEGMENT:
trackingClient = new SegmentTrackingClient(identitySupplier);
break;
case LOGGING:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would people want to configure logging metrics instead of segment metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any sort of local development.


private static TrackingClient trackingClient;

public static TrackingClient get() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed for the singleton.

We could have a way to change it with a setter so you can actually set it to a mock and test interactions in the test.


private static TrackingClient trackingClient;

public static TrackingClient get() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest having it set by default to Logging and have a way to set it up with env that you call in the main. Something like a TrackingClientSingleton.configure(opts).

WDYT?

final Configs configs = new EnvConfigs();
final Path configRoot = configs.getConfigRoot();

final StandardWorkspace workspace;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put the decl with the intialization?

public static TrackingClient get() {
if (trackingClient == null) {
final TrackingIdentity trackingIdentity = getTrackingIdentity();
final Supplier<TrackingIdentity> identitySupplier = () -> trackingIdentity;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to pass a supplier instead of the identity directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the TrackingIdentity change between the time of install and after they entered their email? Maybe the supplier should just be getTrackingIdentity

@@ -37,6 +37,9 @@
public static final String WORKSPACE_DOCKER_MOUNT = "WORKSPACE_DOCKER_MOUNT";
public static final String CONFIG_ROOT = "CONFIG_ROOT";
public static final String DOCKER_NETWORK = "DOCKER_NETWORK";
public static final String TRACKING_STRATEGY = "TRACKING_STRATEGY";
public static final String TRACKING_STRATEGY_SEGMENT = "segment";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These finals are not necessary you should be able to use the enum directly (with value of and potentially a case change)

@@ -83,6 +86,22 @@ public String getDockerNetwork() {
return DEFAULT_NETWORK;
}

@Override
public TrackingStrategy getTrackingStrategy() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 +1

@@ -115,6 +131,32 @@ public void configure() {
server.join();
}

private void setCustomerIdIfNotSet(final Path configRoot) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that code live in TrackingClientSingleton ?

Copy link
Contributor Author

@cgardens cgardens Sep 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting thought. if we only ever want to use customerId in the tracker, then i'd be fine with that. in the current iteration of the PR, i've added customerId as part of the API interface in the WorkspaceRead struct. If we think customerId is useful outside of just tracking and should stay in the API interface, then I think it's better for it to be in ServerApp than for it to be in TrackingClientSingleton, because then the ServerApp and its data model would have an implicit dependency TrackingClientSingleton being initialized. If we want to confine customerId just to tracking and remove it from the API, then yeah, happy to stick this in TrackingClientSingleton.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok let's keep that as a hack then.

import io.dataline.config.persistence.DefaultConfigPersistence;
import java.nio.file.Path;

public class TrackingClientSingleton {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw this singleton is not thread safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay i'll take a look


private static TrackingClient trackingClient;

public static TrackingClient get() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk about it offline. I am not fan that you are creating a configpersistence that could be different from the rest of the application.

We shouldn't be afraid to have a static initialializer that needs to be called in the main (so you can inject options). Having the initializer would probably also solve the TrackingIdentiy supplier piece

public TrackingIdentity get() {
try {
final StandardWorkspace workspace = configPersistence.getConfig(
ConfigSchema.STANDARD_WORKSPACE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random question: how is it going to work when we have multiple workspaces?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It almost feels like the workspace should be an attribute of the tracking instead of being the key. and the user/email/anonymous id should tied to the application itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we cross that bridge when we have multiple workspaces. it's unclear at this point if that's even a thing we'll end up needing.

email = workspace.getEmail();
}
return new TrackingIdentity(workspace.getCustomerId(), email);
} catch (ConfigNotFoundException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can collapse these exceptions since the confignotfound already mentions a similar message

@michel-tricot
Copy link
Contributor

I don't want to block the merge either so go ahead, we can always refactor later

@cgardens
Copy link
Contributor Author

cgardens commented Sep 9, 2020

i'll take another look at this today and address the issues you mentioned.

fix dep syntax

valueOf, envconfigs test

inline

add segment write key

remove supplier

tracking identity supplier test

add TrackingClientSingletonTest
@cgardens cgardens force-pushed the cgardens/add_usage_tracking branch from e07c40b to f931491 Compare September 9, 2020 17:05
@cgardens cgardens merged commit 38e84f1 into master Sep 9, 2020
@cgardens cgardens deleted the cgardens/add_usage_tracking branch September 9, 2020 23:36
davydov-d added a commit that referenced this pull request Aug 23, 2022
davydov-d added a commit that referenced this pull request Aug 23, 2022
davydov-d added a commit that referenced this pull request Aug 23, 2022
* #174 source file: fix csv schema discovery

* #174 source file: upd changelog

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <[email protected]>
rodireich pushed a commit that referenced this pull request Aug 25, 2022
* #174 source file: fix csv schema discovery

* #174 source file: upd changelog

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants