-
Notifications
You must be signed in to change notification settings - Fork 548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Length Row Level Results #465
Feature: Length Row Level Results #465
Conversation
@@ -249,6 +255,12 @@ case class NumMatchesAndCount(numMatches: Long, count: Long, override val fullCo | |||
} | |||
} | |||
|
|||
case class AnalyzerOptions(convertNull: Boolean) { | |||
def getConvertNull(): Boolean = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have this be an enum or other option, and also named more clearly since we put it in Analyzer.
I see two problems with it:
- The name isn't clear. I don't know when nulls will be converted
- The result isn't clear. I don't know (and can't control) what nulls will be converted to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to use an enum NullBehavior
that gives users 3 options
- Ignore (Default behavior)
- Empty (Treat as 0 or empty string)
- Fail (Always fail nulls)
private def criterion(convertNull: Boolean): Column = { | ||
if (convertNull) { | ||
val colLengths: Column = length(conditionalSelection(column, where)).cast(DoubleType) | ||
conditionalSelectionForLength(colLengths, Option(s"${column} IS NULL"), Double.MinValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we replacing after the length has been computed? I would expect to replace all null strings with ""
and then compute the length. Is this approach better?
Also, min value is going to be negative, is that what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now with the NullBehavior
enum, users can choose to replace with "" or treat as false (fixed to Double.MaxValue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left some minor comments.
@@ -453,6 +471,20 @@ private[deequ] object Analyzers { | |||
conditionalSelection(col(selection), where) | |||
} | |||
|
|||
def conditionalSelection(selection: Column, where: Option[String], replaceWith: Double): Column = { | |||
val conditionColumn = where.map { expression => expr(expression) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: where.map(expr)
} | ||
|
||
def conditionalSelection(selection: Column, where: Option[String], replaceWith: String): Column = { | ||
val conditionColumn = where.map { expression => expr(expression) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: where.map(expr)
def getNullBehavior(): NullBehavior = { | ||
nullBehavior | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not required.
scala> case class Options(iLikeNulls: Boolean)
defined class Options
scala> val opt = Options(iLikeNulls = false)
opt: Options = Options(false)
scala> opt.iLikeNulls
res2: Boolean = false
|
||
object NullBehavior extends Enumeration { | ||
type NullBehavior = Value | ||
val Ignore, Empty, Fail = Value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does Empty
mean for numbers? If it is only applicable for Strings, should we call it EmptyString
?
case _ => length(conditionalSelection(column, where)).cast(DoubleType) | ||
} | ||
} | ||
private def getNullBehavior(): NullBehavior = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The ()
can be removed. In idiomatic Scala, ()
, called an empty paren(theses) method, denotes that function performs some side effect. Like calling a service or even just logging. Here, we are just extracting data from an object, and therefore, we can remove this ()
from the method.
@@ -41,4 +48,22 @@ case class MinLength(column: String, where: Option[String] = None) | |||
} | |||
|
|||
override def filterCondition: Option[String] = where | |||
|
|||
@VisibleForTesting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed if we are using package private modifier and have the test in the appropriate package?
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Excess whitespace
- Add row level results for MinLength in addition to MaxLength - Add AnalzyerOptions case class with options to change null behavior (Ignore, EmptyString, Fail)
- Add row level results for MinLength in addition to MaxLength - Add AnalzyerOptions case class with options to change null behavior (Ignore, EmptyString, Fail)
- Add row level results for MinLength in addition to MaxLength - Add AnalzyerOptions case class with options to change null behavior (Ignore, EmptyString, Fail)
- Add row level results for MinLength in addition to MaxLength - Add AnalzyerOptions case class with options to change null behavior (Ignore, EmptyString, Fail)
- Add row level results for MinLength in addition to MaxLength - Add AnalzyerOptions case class with options to change null behavior (Ignore, EmptyString, Fail)
Issue #, if available:
N/A
Description of changes:
Building on this PR to add more analyzers that support row level results.
This PR address this for
MinLength
, asMaxLength
was completed in the PR mentioned above.In addition, this PR adds an
AnalzyerOptions
case class that gives an option to convert null values to result to failing the check. This option is not set by default, which means that for aMinLength
analyzer, the check would succeed even if null values were present.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.