[Feature][transform] transform support explode #7928

Merged
merged 31 commits into from
Nov 9, 2024

Conversation

CosmosNi
Contributor

#7926
support explode transform function

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@Hisoka-X
Member

cc @corgy-w

@Hisoka-X
Member

cc @YuriyGavrilov as well.

* @param row the data need be transformed.
* @return transformed data.
*/
List<T> flatMap(T row);
Member

@CosmosNi
Contributor Author

CosmosNi commented Nov 1, 2024

@Hisoka-X @liunaijie @corgy-w please cc

SELECT * FROM fake
LATERAL VIEW EXPLODE (SPLIT ( NAME, ',' )) AS NAME
LATERAL VIEW EXPLODE (SPLIT ( pk_id, ';' )) AS pk_id
LATERAL VIEW OUTER EXPLODE ( age ) AS age
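
For readers unfamiliar with the semantics of the query above, here is a minimal Java sketch of what EXPLODE and OUTER EXPLODE do to a single row. It is not the code from this PR: `ExplodeSketch`, `explode`, and the use of plain `Object[]` rows are hypothetical, and treating a non-array field as empty is an assumption made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not SeaTunnel's implementation: rows are plain Object[] here.
final class ExplodeSketch {

    /**
     * Expands the array stored at fieldIndex into one output row per element.
     * With outer == true, a null or empty array still yields one row whose
     * field is null; with outer == false, such a row is dropped.
     * Assumption: a non-array value is treated like an empty array.
     */
    static List<Object[]> explode(Object[] row, int fieldIndex, boolean outer) {
        List<Object[]> out = new ArrayList<>();
        Object field = row[fieldIndex];
        Object[] elements = field instanceof Object[] ? (Object[]) field : null;
        if (elements == null || elements.length == 0) {
            if (outer) {
                Object[] copy = row.clone();
                copy[fieldIndex] = null;
                out.add(copy);
            }
            return out;
        }
        for (Object element : elements) {
            // Copy the row and replace the array with a single element.
            Object[] copy = row.clone();
            copy[fieldIndex] = element;
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical row: id, names (already split), ages (empty array).
        Object[] row = {1, "a,b".split(","), new Object[0]};
        for (Object[] byName : explode(row, 1, false)) {
            // OUTER keeps the row even though the ages array is empty.
            for (Object[] byAge : explode(byName, 2, true)) {
                System.out.println(Arrays.toString(byAge));
            }
        }
    }
}
```

Chaining the calls mirrors stacking several LATERAL VIEW clauses: each clause multiplies the row count by the length of the exploded array. In this example two rows are printed, each with `null` in the last field.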

@YuriyGavrilov

Hi guys, I can't cover the whole review; I just want to note that when we rotate data by 90 degrees, we should be able to limit the row count, because otherwise it can cause problems in the next step. In my case we will take 10 or 20 rows per table and send this data to an LLM. The output will have a fixed or known length: columns as rows, with a description as an option (maybe), and the LLM output (perhaps as JSON).

Many thanks 🙏 for making all of this happen. It is an amazing job.

@corgy-w
Contributor

corgy-w commented Nov 7, 2024

> Hi guys, I can't cover the whole review; I just want to note that when we rotate data by 90 degrees, we should be able to limit the row count, because otherwise it can cause problems in the next step. In my case we will take 10 or 20 rows per table and send this data to an LLM. The output will have a fixed or known length: columns as rows, with a description as an option (maybe), and the LLM output (perhaps as JSON).
>
> Many thanks 🙏 for making all of this happen. It is an amazing job.

@YuriyGavrilov This PR does not include what you mentioned; your request is still in progress.

@github-actions github-actions bot removed the paimon label Nov 8, 2024

select ARRAY('test1','test2','test3') as arrays

### SPLIT
Member

return;
}
for (Object fieldValue : (Object[]) splitFieldValue) {
Object value = keepValueType ? fieldValue : String.valueOf(fieldValue);
Member

If fieldValue is null, String.valueOf(fieldValue) converts it to the string "null", so the resulting value will be wrong.
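
One way to address this, shown as a minimal sketch rather than the fix that was actually committed: check for null before converting. The class `NullSafeConvertSketch` and the method `convert` are hypothetical names; only `fieldValue` and `keepValueType` come from the snippet under review.

```java
// Hypothetical helper illustrating the null check; not the code merged in this PR.
final class NullSafeConvertSketch {

    /** Returns null for null input; otherwise keeps the value or stringifies it. */
    static Object convert(Object fieldValue, boolean keepValueType) {
        if (fieldValue == null) {
            // String.valueOf((Object) null) would return the 4-character string "null".
            return null;
        }
        return keepValueType ? fieldValue : String.valueOf(fieldValue);
    }

    public static void main(String[] args) {
        Object missing = null;
        System.out.println(String.valueOf(missing).length()); // the unguarded call yields "null"
        System.out.println(convert(missing, false) == null);  // the guarded call keeps null
        System.out.println(convert(42, false));               // non-null values are stringified
    }
}
```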

}
}

private SeaTunnelRow copySeaTunnelRow(
Member

Suggested change
private SeaTunnelRow copySeaTunnelRow(
private SeaTunnelRow copySeaTunnelRowWithNewValue(

Hisoka-X
Hisoka-X previously approved these changes Nov 8, 2024
Member

@Hisoka-X Hisoka-X left a comment

LGTM. Thanks @CosmosNi and @corgy-w @liunaijie @YuriyGavrilov for review!

@Hisoka-X Hisoka-X added the don't merge label Nov 8, 2024
@Hisoka-X
Member

Hisoka-X commented Nov 8, 2024

> Hi guys, I can't cover the whole review; I just want to note that when we rotate data by 90 degrees, we should be able to limit the row count, because otherwise it can cause problems in the next step. In my case we will take 10 or 20 rows per table and send this data to an LLM. The output will have a fixed or known length: columns as rows, with a description as an option (maybe), and the LLM output (perhaps as JSON).
>
> Many thanks 🙏 for making all of this happen. It is an amazing job.

We need a SQL function named slice to slice arrays; that belongs in another PR. cc @corgy-w @CosmosNi

@CosmosNi
Contributor Author

CosmosNi commented Nov 8, 2024

like slice(field,start,end)?

@Hisoka-X
Member

Hisoka-X commented Nov 8, 2024

> like slice(field,start,end)?

yes.
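
To make the proposed `slice(field,start,end)` concrete, here is a rough Java sketch. The thread only fixes the argument order; everything else is an assumption for illustration: zero-based indexing, inclusive `start`, exclusive `end`, and out-of-range bounds clamped rather than rejected. `SliceSketch` is a hypothetical class name.

```java
import java.util.Arrays;

// Hypothetical sketch of the slice function discussed above; the real semantics
// (index base, inclusive or exclusive end, error handling) are not settled here.
final class SliceSketch {

    /** Returns elements [start, end) of field, clamping both bounds to the array. */
    static Object[] slice(Object[] field, int start, int end) {
        if (field == null) {
            return null;
        }
        int from = Math.max(0, Math.min(start, field.length));
        int to = Math.max(from, Math.min(end, field.length));
        return Arrays.copyOfRange(field, from, to);
    }

    public static void main(String[] args) {
        Object[] values = {"test1", "test2", "test3"};
        // Keep at most the first two elements before exploding.
        System.out.println(Arrays.toString(slice(values, 0, 2)));
    }
}
```

Applied before EXPLODE, a function like this would cap how many rows each array can produce, which is the row-count limit requested earlier in the thread.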

row_rules = [
{
rule_type = MAX_ROW
rule_value = 24
Contributor Author

@CosmosNi CosmosNi Nov 8, 2024

name × pk_id × age × array: 2×2×2×2 + 2×1×1×2 + 1×2×1×2 = 24
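
The arithmetic can be checked with a small Java sketch, reading each term as one source row and each factor as the number of elements that row contributes for name, pk_id, age, and array. `RowCountSketch` and `explodedRowCount` are hypothetical names used only for this check.

```java
// Hypothetical check of the expected MAX_ROW value; not part of the PR's test code.
final class RowCountSketch {

    /** Sums, over all source rows, the product of the exploded array lengths. */
    static long explodedRowCount(int[][] lengthsPerRow) {
        long total = 0;
        for (int[] lengths : lengthsPerRow) {
            long product = 1;
            for (int n : lengths) {
                product *= n;
            }
            total += product;
        }
        return total;
    }

    public static void main(String[] args) {
        // Factors per source row, in the order name, pk_id, age, array.
        int[][] lengths = {{2, 2, 2, 2}, {2, 1, 1, 2}, {1, 2, 1, 2}};
        System.out.println(explodedRowCount(lengths)); // prints 24
    }
}
```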

@Hisoka-X Hisoka-X removed the don't merge label Nov 8, 2024
@hailin0 hailin0 merged commit 132278c into apache:dev Nov 9, 2024
8 checks passed
6 participants