Add support for a paste operator in join.as() prefix specifications. #444

Closed

jonseymour
Contributor

Per the discussion in #435, the join.as() method now:

  • is optional if the field names of all input streams are unique
  • supports a trailing '#' paste operator in prefix specifications
  • supports duplicate prefixes, provided that the resulting field names are unique

An output column derived from field k of input stream i is now named as follows:

if the ith prefix is empty or not specified:
    k
if the ith prefix contains a trailing '#':
    prefix[i][0:len(prefix[i])-1]+k
otherwise:
    prefix[i]+"."+k
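
The naming rules above could be sketched in Go roughly as follows. This is only an illustration, not the code in this PR: `prefixFor` and this `outName` signature are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixFor is a hypothetical helper: it returns the prefix configured for
// input i, or "" when fewer prefixes than inputs were passed to as().
func prefixFor(prefixes []string, i int) string {
	if i < len(prefixes) {
		return prefixes[i]
	}
	return ""
}

// outName is a hypothetical helper that applies the three naming rules to
// field k, given the prefix of the input stream it came from.
func outName(prefix, k string) string {
	switch {
	case prefix == "":
		// Empty or unspecified prefix: the field name is used as-is.
		return k
	case strings.HasSuffix(prefix, "#"):
		// Trailing paste operator: drop the '#' and join without a delimiter.
		return strings.TrimSuffix(prefix, "#") + k
	default:
		// Ordinary prefix: join with the "." delimiter.
		return prefix + "." + k
	}
}

func main() {
	prefixes := []string{"out1"} // as('out1') on a two-input join
	fmt.Println(outName(prefixFor(prefixes, 0), "metric1")) // out1.metric1
	fmt.Println(outName(prefixFor(prefixes, 1), "metric2")) // metric2
	fmt.Println(outName("i1_#", "metric2"))                 // i1_metric2
}
```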

An error is detected at runtime if the output of a join node constructed using these rules would result in points with duplicate field names.

All existing, valid TICKscripts should continue to behave as they do today. Some TICKscripts that were previously invalid are now valid.

@jonseymour
Contributor Author

Note this PR currently includes implementations of #443 and #331, assuming that these PRs will be reviewed and merged in that order. I will re-roll this PR once the other PRs are merged or otherwise resolved.

@jonseymour
Contributor Author

The tests here demonstrate the use of the new as() implementation.

All of these are now valid constructions:

metric1
    |join(metric2)
    |log() // output fields are metric1, metric2

metric1
    |join(metric2).as('out1', 'out2')
    |log() // output fields are out1.metric1, out2.metric2

metric1
    |join(metric2).as('out1')
    |log() // output fields are out1.metric1, metric2

metric1
    |join(metric2).as('#')
    |log() // output fields are metric1, metric2

metric1
    |join(metric2).as('#', '#')
    |log() // output fields are metric1, metric2

metric2
    |join(metric2a).as('i1_#', 'i2_#')
    |log() // output fields are i1_metric2, i2_metric2

The following is an example of a join that causes a runtime error because of duplicate names in the output stream.

var metric2 = stream
    |from()
        .database('sampledb')
        .retentionPolicy('default')
        .measurement('sample')
        .groupBy('node')
    |window()
        .align()
        .period(5m)
        .every(5m)
    |shift(-5m)
    |mean('metric2').as('metric2')

var metric2a = stream
    |from()
        .database('sampledb')
        .retentionPolicy('default')
        .measurement('sample')
        .groupBy('node')
    |window()
        .align()
        .period(5m)
        .every(5m)
    |shift(-5m)
    |mean('metric2').as('metric2')

metric2
    |join(metric2a)
    |log()
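
To illustrate why this join fails, here is a minimal Go sketch of the duplicate-name check, loosely modelled on the `claims` map that appears in the diff. `checkUnique` and its input shape are hypothetical and exist only for this example.

```go
package main

import "fmt"

// checkUnique is a hypothetical helper. names[i] lists the output field names
// produced for input i. It records which input first claimed each name and
// returns an error when a different input produces the same name.
func checkUnique(names [][]string) error {
	claims := make(map[string]int)
	for i, fields := range names {
		for _, n := range fields {
			if claim, ok := claims[n]; ok && claim != i {
				return fmt.Errorf("field %s of input %d conflicts with input %d", n, i, claim)
			}
			claims[n] = i
		}
	}
	return nil
}

func main() {
	// Both inputs emit a field called metric2 and no prefixes are given.
	fmt.Println(checkUnique([][]string{{"metric2"}, {"metric2"}}))
	// With prefixes 'i1_#' and 'i2_#' the names no longer collide.
	fmt.Println(checkUnique([][]string{{"i1_metric2"}, {"i2_metric2"}}))
}
```

The first call reports a conflict between input 1 and input 0; the second returns no error.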

@jonseymour jonseymour force-pushed the jss-435-join-as-paste-operator branch from 385bf32 to bac2dbf Compare April 9, 2016 06:47
jonseymour added a commit to jonseymour/kapacitor that referenced this pull request Apr 9, 2016
@jonseymour jonseymour force-pushed the jss-435-join-as-paste-operator branch from bac2dbf to 63673e0 Compare April 9, 2016 09:33
@jonseymour
Contributor Author

I wonder if there is really a good reason to prevent "." appearing in a specified prefix?

@nathanielc
Contributor

> I wonder if there is really a good reason to prevent "." appearing in a specified prefix?

prefix1: 'abc'
prefix2: 'ab'

field1 'd'
field2 'cd'

With the . delimiter the resulting fields are 'abc.d' and 'ab.cd', but without a delimiter you get a conflict on 'abcd'.

Some delimiter is needed to resolve cases where one prefix is itself a prefix of another. As long as the delimiter is disallowed in the prefix, such conflicts are impossible.

This case is obviously not likely, but I'd rather just prevent it.
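
The ambiguity can be checked directly. The snippet below is a small Go illustration using the values from this comment; `joinName` is a hypothetical helper, not part of Kapacitor.

```go
package main

import "fmt"

// joinName is a hypothetical helper that builds an output field name from a
// prefix, a delimiter (possibly empty) and a field name.
func joinName(prefix, delim, field string) string {
	return prefix + delim + field
}

func main() {
	// Without a delimiter the two names are indistinguishable.
	fmt.Println(joinName("abc", "", "d") == joinName("ab", "", "cd")) // true: both are "abcd"

	// With "." as the delimiter the names stay distinct.
	fmt.Println(joinName("abc", ".", "d") == joinName("ab", ".", "cd")) // false: "abc.d" vs "ab.cd"
}
```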

@jonseymour jonseymour force-pushed the jss-435-join-as-paste-operator branch from f089852 to d41c867 Compare April 13, 2016 12:48
jonseymour added a commit to jonseymour/kapacitor that referenced this pull request Apr 13, 2016
@jonseymour jonseymour force-pushed the jss-435-join-as-paste-operator branch from d41c867 to 7bfdcc1 Compare May 8, 2016 09:33
@jonseymour
Contributor Author

@nathanielc I've rebased this change onto master. Ready for your review.

    }
    n := prefix + k
    if claim, ok := js.claims[n]; ok && claim != i {
        panic(fmt.Errorf("field %s of input %d conflicts with input %d", k, i, claim))
Contributor

Could we remove the panic here and return an error instead?

The calling functions JoinInto* can then log the error and return models.Point/Batch{}, false, so that the join is skipped.
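
A rough Go sketch of that suggestion follows. Everything in it is an assumption made for illustration: `Point` merely stands in for models.Point, and `outName` and `joinIntoPoint` are simplified, hypothetical versions of the functions touched by this PR.

```go
package main

import (
	"fmt"
	"log"
)

// Point stands in for models.Point; the real type has more fields.
type Point struct {
	Fields map[string]float64
}

// outName is a hypothetical error-returning variant of the function in the
// diff: it reports a conflict to the caller instead of panicking.
func outName(claims map[string]int, prefix, k string, i int) (string, error) {
	n := prefix + k
	if claim, ok := claims[n]; ok && claim != i {
		return "", fmt.Errorf("field %s of input %d conflicts with input %d", k, i, claim)
	}
	claims[n] = i
	return n, nil
}

// joinIntoPoint is a hypothetical caller: on a naming conflict it logs the
// error and returns an empty Point with false, so the join is skipped.
func joinIntoPoint(inputs []map[string]float64) (Point, bool) {
	claims := make(map[string]int)
	out := Point{Fields: make(map[string]float64)}
	for i, fields := range inputs {
		for k, v := range fields {
			n, err := outName(claims, "", k, i)
			if err != nil {
				log.Println("skipping join:", err)
				return Point{}, false
			}
			out.Fields[n] = v
		}
	}
	return out, true
}

func main() {
	_, ok := joinIntoPoint([]map[string]float64{{"metric2": 1}, {"metric2": 2}})
	fmt.Println(ok) // false: the join is skipped
}
```

The cost of this approach is the extra error check at every call site of `outName`.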

Contributor Author

@jonseymour jonseymour May 10, 2016

I used panic because returning an error complicates every caller of outName, each of which would then have to check for that error.

Update: I had a quick look at removing the panic, and it really does make things quite messy. However, if you really want me to do that, let me know and I will push such a change.

@nathanielc
Contributor

@jonseymour I have been thinking about the user experience here and would like some feedback:

First, to be clear: is the entire purpose of the # char to remove the . from the prefix name? And, as a result, is the onus now on the user to choose unique prefixes?

To keep it simple, I see two ways of going about the .as property:

  1. Leave it how it is currently, which prevents the user from creating name conflicts, but gives less control over the names.
  2. Give user full control and explicitly check for name conflicts and handle them.

There may be some middle ground, but it's not clear to me what that would be.

I think that if full control over the field names is something we want, then we go with option 2. Based on your experience and that of others, I believe we do need the control.

Also, by adding runtime checking for name conflicts we can remove the requirement to have an as property. The default will be to assume that the field names are unique; if that assumption fails, an error occurs and is handled appropriately.

Thoughts?

@jonseymour
Contributor Author

Yes, I think we do need to give the user more control and one of the implications of this change is that .as() becomes optional unless there is a conflict to resolve.

@nathanielc
Contributor

Closing in favor of a simpler solution, #698.

The .as property is still required but that can be changed later as well.

@nathanielc nathanielc closed this Jul 5, 2016