-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Support for IP Address Function #9501
Conversation
Context - One of our observability users wanted subnet-contains and other similar functionality for IP addresses stored as STRINGs. To be used in SQL WHERE clause mostly |
f35676b
to
03b6b18
Compare
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/IpAddressFunctions.java
Show resolved
Hide resolved
Can we add some e2e query execution tests where we use this in query execution (WHERE clause) similar to how you added other transform functions in the past ? Let's add some tests with meaningful IP addresses and then test out the functionality for both IPv4 and IPv6. |
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/IpAddressFunctions.java
Outdated
Show resolved
Hide resolved
return true; | ||
} | ||
|
||
int shift; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a standard / known way of writing this logic ? We need to come up with comprehensive tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no Java native library functions that handle both IPv4 and IPv6 prefix. I added comments on the alg details.
Presto uses a similar approach to compute the min and max for a given IP prefix.
max: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/scalar/IpPrefixFunctions.java#L89
min: https://github.com/prestodb/presto/blob/8a488efc63c94273c7961c87b6ab9adf141c83bb/presto-main/src/main/java/com/facebook/presto/type/IpPrefixOperators.java#L152
...roker/src/test/java/org/apache/pinot/broker/requesthandler/LiteralOnlyBrokerRequestTest.java
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/IpAddressFunctions.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/IpAddressFunctions.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/IpAddressFunctions.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/IpAddressFunctions.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/IpAddressFunctions.java
Outdated
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## master #9501 +/- ##
=============================================
+ Coverage 24.62% 63.91% +39.29%
- Complexity 53 4910 +4857
=============================================
Files 1935 1893 -42
Lines 103815 101871 -1944
Branches 15758 15529 -229
=============================================
+ Hits 25560 65107 +39547
+ Misses 75635 31997 -43638
- Partials 2620 4767 +2147
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
We've been using https://github.com/seancfoley/IPAddress for exactly this type of address parsing. It's very well tested, and seems robust. Wondering why you wouldn't want to use it for this functionality (and eventually other IP address functions supported by Presto, such as |
Nit - PR description says the function name is |
One more comment - I think Presto handles either VARCHAR or IPADDRESS for the ip address. For one of our use cases, we've got IPv4 stored as longs in the Pinot table, which would be more efficient to query as-is versus using a custom function to convert back to a string. We could extend this support in the future, wondering if there would be a reason we shouldn't plan on being able to do that. |
ff7acbe
to
ee975cd
Compare
@walterddr - please take a look as well |
d3ca232
to
74a7203
Compare
pinot-common/src/main/java/org/apache/pinot/common/utils/request/RequestUtils.java
Outdated
Show resolved
Hide resolved
Based on @kkrugler 's suggestion to explore library, I had asked @SabrinaZhaozyf to do some investigation along the following lines.
@SabrinaZhaozyf , you may want to share any of your findings here for reference / discussion. @Jackie-Jiang @walterddr @xiangfu0 - both approaches are implemented here. Can we get consensus on how we move forward ? One thing is that we will have to overload this for different types. Have you seen a similar requirement ? |
I have summarized my research on IP processing functions in other systems and the library in this doc: https://docs.google.com/document/d/1SS04jQCojcvCrrIGJX_aHocUEXXy8rpX0uUhwAren4k/edit?usp=sharing Please feel free to provide feedback / comments and let's get consensus on how we want to proceed from here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if we can use a specific network address type for the storage. i can imagine using no-dict columns but storing it as fixed-size data type might be desirable
see: https://www.postgresql.org/docs/current/datatype-net-types.html
public static boolean isSubnetOfV1(String ipPrefix, String ipAddress) | ||
throws UnknownHostException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need a v1 ? and why is it throwing exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR, there are 2 paths of implementations as discussed here.
The v1 is the in-house implementation without calling the library.
I was about to suggest the same as Rong. Given that we are going toward s postgres with joins, let's emulate the functions as well. |
ea213bf
to
6c1ddf9
Compare
6c1ddf9
to
9994381
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with @walterddr. We agree that we can implement proper IP type and optimal storage support.
In this PR, we are just implementing the basic function for subnet contains / isSubnetOf
that takes a STRING and returns BOOLEAN output.
With the orthogonal work going on function registry, later we should be able to add function that takes network address type or whatever it evolves to in long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm but one question. we can also follow up
IPAddress prefix = getPrefix(ipPrefix); | ||
IPAddress ip = getAddress(ipAddress); | ||
if (ip.isPrefixed()) { | ||
throw new IllegalArgumentException("IP Address " + ipAddress + " should not be prefixed."); | ||
} | ||
return prefix.contains(ip); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we allow string input then we need to handle this (e.g. malformed string should be handled. throwing here seems a bit risky)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @walterddr , thanks for the comment! Could you please elaborate a bit more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually no worries. we can follow up.
- upon checking some other systems, very little has IPADDRESS built in
- PostgreSQL throws when IP varchar cast is a malform string
- MySQL has a is IP address function to check if a varchar is a IP address format.
so IMO,
- for now this impl is good.
- we can provide a safety net like MySQL later:
CASE WHEN is_ipaddress(col1) AND is_ipaddress(col2) THEN isSubnetOf(col1, col2) ELSE false END
Label =
feature
This PR adds a scalar function
isSubnetOff(IP_PREFIX, IP_ADDRESS)
which checks ifIP_ADDRESS
is in the subnet ofIP_PREFIX
(this function handles both IPv4 and IPv6).Both
IP_PREFIX
andIP_ADDRESS
areSTRING
type.Use Cases
SELECT isSubnetOf('1.2.3.128/26', '1.2.3.129') FROM FOO
--> returns trueSELECT isSubnetOf('64:fa9b::17/64', '64:ffff::17') FROM FOO
--> returns false