Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature-wip](array) remove array config and check array nested depth #13428

Merged
merged 9 commits into from
Oct 20, 2022

Conversation

morningman
Copy link
Contributor

@morningman morningman commented Oct 17, 2022

Proposed changes

Issue Number: close #xxx

Problem summary

  1. remove FE config enable_array_type
  2. limit the nested depth of array in FE side.
  3. Fix bug that when loading array from parquet, the decimal type is treated as bigint
  4. Fix loading array from csv(vec-engine), handle null and "null"
  5. Change the csv array loading behavior, if the array string format is invalid in csv, it will be converted to null.
  6. Remove check_array_format(), because it's logic is wrong and meaningless
  7. Add stream load csv test cases and more parquet broker load tests

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added area/planner Issues or PRs related to the query planner area/vectorization kind/docs Categorizes issue or PR as related to documentation. labels Oct 17, 2022
remove array config and limit nested depth of array in FE

fix array sort key bug

support parquet array

final
@github-actions github-actions bot added the area/sql/function Issues or PRs related to the SQL functions label Oct 18, 2022
@morningman morningman added area/array-type Issues or PRs related to array type kind/behavior-changed labels Oct 19, 2022
if (*rb.position() != '[') {
return Status::InvalidArgument("Array does not start with '[' character, found '{}'",
*rb.position());
}
if (*(rb.end() - 1) != ']') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will overflow if rb is empty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is checked in ConvertImplGenericFromString in function_cast.h:

 332                 if (val.size == 0) {
 333                     col_to->insert_default();
 334                     continue;
 335                 }
 336                 ReadBuffer read_buffer((char*)(val.data), val.size);
 337                 Status st = data_type_to->from_string(read_buffer, col_to);
 338                 // if parsing failed, will return null
 339                 (*vec_null_map_to)[i] = !st.ok();
 340                 if (!st.ok()) {
 341                     col_to->insert_default();
 342                 }

I will add a DCHECK here

}
block.replace_by_position(result, std::move(col_to));
// block.replace_by_position(result, std::move(col_to));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// nested types) at a nesting depth between 200 and 300 (200 worked, 300 crashed).
public static int MAX_NESTING_DEPTH = 2;
// Currently only support Array type with max 9 depths.
public static int MAX_NESTING_DEPTH = 9;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is 9?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a hard limit on BE side. See TypeInfo* get_array_type_info in be/src/olap/types.cpp:

DCHECK(iterations <= 8) << "the depth of nested array type should not be larger than 8";

if (!check_array_format(_split_values)) {
return Status::OK();
}
// This check is meaningless, should be removed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove this, may be we should also covert the invalid array value in this row to null. Keep the same logic as cast function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done in ConvertImplGenericFromString in function_cast.h.
Just as you recommended, all malformat array string will return null.

@morningman morningman requested review from cambyzju and xy720 October 19, 2022 14:24
Comment on lines +220 to +222
auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
nested_null_col.get_nested_column().insert_default();
nested_null_col.get_null_map_data().push_back(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check whether the nested column is nullable here. In future, we may enable the not_null syntax to create an array with the elements which are all not nullable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. data_type_array.cpp:185 has DCHECK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently Doris only support "nullable" element inside array.
I will add a DCHECK here.
After we support "not_null" element, I will rethink this logic

Copy link
Contributor

@adonis0147 adonis0147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman morningman merged commit 32b1456 into apache:master Oct 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/array-type Issues or PRs related to array type area/planner Issues or PRs related to the query planner area/sql/function Issues or PRs related to the SQL functions area/vectorization kind/behavior-changed kind/docs Categorizes issue or PR as related to documentation. kind/test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants