feat: filter only the columns that are provided in the schema #1562

Merged
gventuri merged 1 commit into main on Jan 30, 2025

Conversation

gventuri
Collaborator

@gventuri gventuri commented Jan 30, 2025

Important

Add _filter_columns method to DatasetLoader to filter DataFrame columns based on schema and integrate it into the load method, with corresponding tests.

  • Behavior:
    • Add _filter_columns method in DatasetLoader to filter DataFrame columns based on schema.
    • Integrate _filter_columns into load() method in loader.py.
  • Tests:
    • Add test_filter_columns_with_schema_columns to verify filtering with specified schema columns.
    • Add test_filter_columns_without_schema_columns to verify no filtering when schema columns are not specified.
    • Add test_filter_columns_with_non_matching_columns to verify behavior when schema columns don't match DataFrame columns.
    • Add test_filter_columns_without_schema to verify no filtering when no schema is set.
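
The four cases in the test list can be sketched as below. This is an illustration only, not the code merged in this PR: `Schema`, `Column`, and `columns_to_keep` are hypothetical stand-ins, and the sketch works on column names so that it runs without pandas. It also assumes that schema columns missing from the DataFrame are silently ignored, which the description above does not confirm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    # Hypothetical stand-in for one schema column definition.
    name: str


@dataclass
class Schema:
    # Hypothetical stand-in; columns is None when the schema lists none.
    columns: Optional[List[Column]] = None


def columns_to_keep(df_columns: List[str], schema: Optional[Schema]) -> List[str]:
    """Return the DataFrame column names that survive schema filtering."""
    # No schema, or a schema without columns: nothing is filtered.
    if schema is None or not schema.columns:
        return list(df_columns)

    wanted = {col.name for col in schema.columns}
    # Assumption: schema columns absent from the DataFrame are ignored.
    # Iterating over df_columns keeps the DataFrame's original column order.
    return [name for name in df_columns if name in wanted]


schema = Schema(columns=[Column("email"), Column("age"), Column("country")])
kept = columns_to_keep(["age", "email", "phone"], schema)
```

With a real DataFrame, the resulting list could then be used as `df[kept]` to select the surviving columns.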

This description was created by Ellipsis for 221a6f9. It will automatically update as commits are pushed.

@gventuri gventuri requested a review from ArslanSaleem January 30, 2025 10:21

codecov bot commented Jan 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.53%. Comparing base (4a093c6) to head (221a6f9).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1562      +/-   ##
==========================================
+ Coverage   82.47%   82.53%   +0.05%     
==========================================
  Files          64       64              
  Lines        2408     2416       +8     
==========================================
+ Hits         1986     1994       +8     
  Misses        422      422              
Flag Coverage Δ
unittests 82.53% <100.00%> (+0.05%) ⬆️


Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Reviewed everything up to 221a6f9 in 1 minute and 35 seconds

More details
  • Looked at 97 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. pandasai/data_loader/loader.py:220
  • Draft comment:
    Simplify column filtering by using set intersection:
        columns_to_keep = list(set(df_columns) & set(schema_columns))
  • Reason this comment was not posted:
    Confidence changes required: 33%
    The _filter_columns method is correctly implemented, but the logic for filtering columns can be simplified by using the intersection of sets.
2. pandasai/data_loader/loader.py:203
  • Draft comment:
                f"Failed to execute query for '{source_type}' with query: {formatted_query}"
  • Reason this comment was not posted:
    Comment was on unchanged code.
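
One thing to weigh when considering the set-intersection suggestion in draft comment 1: a Python `set` has no guaranteed iteration order, so `list(set(a) & set(b))` selects the right columns but may reorder them. The sketch below contrasts it with an order-preserving alternative. The column names are made up for illustration, and neither variant is claimed to be the code in `loader.py`.

```python
# Illustrative column names only; not taken from the PR.
df_columns = ["email", "first_name", "last_name", "age"]
schema_columns = ["first_name", "email", "country"]

# Set intersection, as in the draft comment: correct membership,
# but the order of the resulting list is not guaranteed.
unordered = list(set(df_columns) & set(schema_columns))

# Order-preserving alternative: walk the DataFrame columns in their
# original order and keep the ones the schema names.
schema_set = set(schema_columns)
ordered = [name for name in df_columns if name in schema_set]
```

Both variants drop `country`, which the schema names but the DataFrame lacks. Only the second is certain to return the columns in DataFrame order.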

Workflow ID: wflow_9xrDsgm3ZhRS821G

@gventuri gventuri merged commit a0b5878 into main Jan 30, 2025
15 checks passed