Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sql_connector): add support for sql connector #1543

Merged
merged 9 commits into from
Jan 27, 2025

Conversation

ArslanSaleem
Copy link
Collaborator

@ArslanSaleem ArslanSaleem commented Jan 23, 2025

Important

Add support for SQL connectors in pandasai, enabling connections to MySQL, PostgreSQL, and SQLite, with updated documentation and tests.

  • Behavior:
    • Add support for SQL connectors in create() function in pandasai/__init__.py, allowing connection to MySQL, PostgreSQL, and SQLite.
    • Introduce SQLConnectionConfig and SqliteConnectionConfig in semantic_layer_schema.py for connection configurations.
    • Update DatasetLoader in loader.py to handle SQL sources and execute queries.
  • Functions:
    • Implement load_from_mysql, load_from_postgres, load_from_sqlite, and load_from_cockroachdb in pandasai_sql/__init__.py.
  • Tests:
    • Add unit tests in test_sql.py, test_loader.py, and test_pandasai_init.py to verify SQL connector functionality.
    • Ensure create() handles both DataFrame and connector inputs correctly.
  • Documentation:
    • Update semantic-layer.mdx to include examples and descriptions for using SQL connectors.

This description was created by Ellipsis for 28e9510. It will automatically update as commits are pushed.

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jan 23, 2025
@ArslanSaleem ArslanSaleem marked this pull request as draft January 23, 2025 09:01
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 783aa43 in 2 minutes and 11 seconds

More details
  • Looked at 357 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 drafted comments based on config settings.
1. extensions/connectors/sql/pandasai_sql/__init__.py:19
  • Draft comment:
    Consider using a context manager (with statement) to ensure that the database connection is properly closed after the query execution. This applies to all database connection functions in this file.
  • Reason this comment was not posted:
    Comment was on unchanged code.
2. pandasai/__init__.py:43
  • Draft comment:
    Consider using a structured object like SQLConnectionConfig or SqliteConnectionConfig for the connector parameter instead of a dictionary. This will provide better type safety and clarity.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The suggestion seems reasonable since it would provide better type safety and IDE support. However, the code already has validation for the connector type through SQL_SOURCE_TYPES check. The change would be a refactor rather than fixing a clear issue. The comment doesn't provide strong evidence that this change is necessary.
    The suggestion could improve code quality, but it's more of an optional enhancement than a required fix. The current implementation with dict validation works fine.
    While using structured objects would be nice, the current implementation is functional and validated. This suggestion falls into the category of optional refactoring rather than a necessary change.
    Delete the comment as it suggests an optional enhancement rather than pointing out a clear issue that needs to be fixed.
3. pandasai/data_loader/loader.py:60
  • Draft comment:
    Ensure self.schema is not None before accessing self.schema.source.type to avoid potential AttributeError.
  • Reason this comment was not posted:
    Comment did not seem useful.
4. extensions/connectors/sql/pandasai_sql/__init__.py:9
  • Draft comment:
    New functionality for SQL connectors is introduced, but there are no corresponding unit tests. Please add unit tests to ensure this new functionality is tested.
  • Reason this comment was not posted:
    Comment was on unchanged code.
5. extensions/connectors/sql/pandasai_sql/__init__.py:42
  • Draft comment:
    The function name load_from_cockroachdb is inconsistent with the naming pattern of other functions like load_from_mysql and load_from_postgres. Consider renaming it to load_from_cockroach for consistency.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is actually suggesting to revert a deliberate change made in this PR. The new name load_from_cockroachdb is more precise and matches the official product name "CockroachDB". The old name was less accurate. The change appears intentional and makes the code better, not worse.
    Maybe there's a good reason for shorter names in function signatures? Maybe there's a team naming convention I'm not aware of?
    Even if there was a convention, using the full, official product name is clearer and more professional. The change makes the code better.
    The comment should be deleted as it's suggesting to revert an intentional improvement to the function name.
6. pandasai/__init__.py:40
  • Draft comment:
    New functionality for SQL connectors is introduced, but there are no corresponding unit tests. Please add unit tests to ensure this new functionality is tested.
  • Reason this comment was not posted:
    Marked as duplicate.
7. pandasai/data_loader/loader.py:29
  • Draft comment:
    New functionality for SQL connectors is introduced, but there are no corresponding unit tests. Please add unit tests to ensure this new functionality is tested.
  • Reason this comment was not posted:
    Marked as duplicate.
8. pandasai/data_loader/semantic_layer_schema.py:22
  • Draft comment:
    New functionality for SQL connectors is introduced, but there are no corresponding unit tests. Please add unit tests to ensure this new functionality is tested.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_geewBNZVJ8tnNvhH


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

codecov bot commented Jan 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.63%. Comparing base (63e8c26) to head (28e9510).
Report is 148 commits behind head on release/v3.

Additional details and impacted files
@@              Coverage Diff               @@
##           release/v3    #1543      +/-   ##
==============================================
+ Coverage       79.09%   79.63%   +0.53%     
==============================================
  Files             149       59      -90     
  Lines            6013     2013    -4000     
==============================================
- Hits             4756     1603    -3153     
+ Misses           1257      410     -847     
Flag Coverage Δ
unittests 79.63% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ArslanSaleem ArslanSaleem marked this pull request as ready for review January 24, 2025 08:10
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jan 24, 2025
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 148d585 in 1 minute and 37 seconds

More details
  • Looked at 1582 lines of code in 11 files
  • Skipped 0 files when reviewing.
  • Skipped posting 7 drafted comments based on config settings.
1. extensions/connectors/sql/pandasai_sql/__init__.py:38
  • Draft comment:
    Ensure connection_info is an instance of SqliteConnectionConfig for clarity when using sqlite3.connect(connection_info.file_path).
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In the load_from_sqlite function, the connection is established using sqlite3.connect(connection_info.file_path). This is correct, but the comment should mention that connection_info is an instance of SqliteConnectionConfig for clarity.
2. extensions/connectors/sql/pandasai_sql/__init__.py:42
  • Draft comment:
    Ensure the function name load_from_cockroachdb matches the loader function name in the __all__ list to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In the load_from_cockroachdb function, the connection is established using psycopg2.connect. The function name should match the loader function name in the __all__ list to avoid confusion.
3. pandasai/__init__.py:59
  • Draft comment:
    The error message should specify df must be a PandasAI DataFrame for consistency with the class name.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In the create function, the check for df being a PandaAI DataFrame is correct, but the error message should specify PandasAI DataFrame for consistency with the class name.
4. pandasai/data_loader/semantic_layer_schema.py:101
  • Draft comment:
    Clarify the comment indicating that SQLite is a special case for local sources to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In the SemanticLayerSchema class, the validate_type_and_fields method has a comment indicating that SQLite is a special case for local sources. This should be clarified in the code to avoid confusion.
5. docs/v3/semantic-layer.mdx:138
  • Draft comment:
    The type for the connector field is incorrectly mentioned as DataFrame. It should be a dictionary specifying connection details. Please update the documentation accordingly.
  • Reason this comment was not posted:
    Marked as duplicate.
6. pandasai/__init__.py:59
  • Draft comment:
        raise ValueError("df must be a PandaAI DataFrame")
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
7. pandasai/__init__.py:58
  • Draft comment:
    raise ValueError("df must be a PandaAI DataFrame")
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_QarLLfonNjse9mY2


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

docs/v3/semantic-layer.mdx Outdated Show resolved Hide resolved
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 6610d38 in 25 seconds

More details
  • Looked at 56 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 drafted comments based on config settings.
1. pandasai/data_loader/loader.py:4
  • Draft comment:
    Unused import 'datetime' and 'timedelta' removed. Good cleanup.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The import statement for 'datetime' and 'timedelta' is removed but not used in the file. This is a good cleanup.
2. pandasai/dataframe/base.py:242
  • Draft comment:
    'data_loader' parameter added to 'pull' method. Consistent with changes in 'virtual_dataframe.py'.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The 'pull' method in 'base.py' and 'virtual_dataframe.py' has been updated to include 'data_loader' as a parameter. This change is consistent across both files and aligns with the intent of adding SQL connector support.
3. pandasai/dataframe/virtual_dataframe.py:39
  • Draft comment:
    'name' and 'description' are now popped from kwargs to prevent unintended passing to superclass.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The 'name' and 'description' parameters are now being popped from kwargs in 'VirtualDataFrame'. This change ensures that these parameters are correctly handled and not passed to the superclass unintentionally.
4. pandasai/dataframe/virtual_dataframe.py:37
  • Draft comment:
    Add a period at the end of the error message for consistency.
            raise VirtualizationError("Schema is required for virtualization!")
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The error message in line 37 is missing a period at the end. This is a minor grammar issue but should be corrected for consistency.
5. pandasai/dataframe/base.py:239
  • Draft comment:
    The reinitialization of the DataFrame in the pull method is repeated in virtual_dataframe.py. Consider refactoring to avoid code duplication.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The function pull in base.py and virtual_dataframe.py have similar code for reinitializing the DataFrame. This violates the DRY principle.
6. pandasai/dataframe/virtual_dataframe.py:48
  • Draft comment:
    The reinitialization of the DataFrame in the pull method is repeated in base.py. Consider refactoring to avoid code duplication.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The function pull in base.py and virtual_dataframe.py have similar code for reinitializing the DataFrame. This violates the DRY principle.

Workflow ID: wflow_IhG6yIOFkpnLao9d


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 52b4863 in 22 seconds

More details
  • Looked at 19 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. docs/v3/semantic-layer.mdx:135
  • Draft comment:
    The change from DataFrame to Dict is inconsistent with the rest of the document, which still references DataFrame. Ensure consistency across the documentation.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. docs/v3/semantic-layer.mdx:135
  • Draft comment:
    Ensure that the documentation is updated to reflect the change from using a DataFrame to a SQL connector source dict. This includes updating any examples, descriptions, and ensuring that tests are updated accordingly.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_89EW0oTbzuGlbl1X


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@scaliseraoul-sinaptik scaliseraoul-sinaptik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good to me 👍🏻

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 28e9510 in 27 seconds

More details
  • Looked at 12 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. pandasai/__init__.py:112
  • Draft comment:
    The removal of schema = df.schema seems incorrect. schema is used later and should be initialized with df.schema when df is not None. This could lead to a NameError.
  • Reason this comment was not posted:
    Comment was not on a valid diff hunk.
2. pandasai/__init__.py:112
  • Draft comment:
    The removal of schema = df.schema seems incorrect as schema is used later in the code. Consider restoring this line.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_sXrC019EUxW7iOa0


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@gventuri gventuri merged commit 61ababc into release/v3 Jan 27, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants