Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search by key word #60

Merged
merged 4 commits into from
Feb 10, 2025
Merged

Search by key word #60

merged 4 commits into from
Feb 10, 2025

Conversation

selvaebi
Copy link
Contributor

@selvaebi selvaebi commented Feb 9, 2025

Summary by CodeRabbit

  • New Features

    • Introduced an enhanced CLI search command that now requires a keyword and supports additional options for filtering, pagination, and custom sorting of project results.
  • Documentation

    • Updated usage instructions to incorporate the new search functionality and improved clarity by correcting minor textual errors.

Copy link
Contributor

coderabbitai bot commented Feb 9, 2025

Walkthrough

This pull request integrates a new command, search-projects-by-keywords-and-filters, into both the CLI documentation and implementation. The README now includes usage examples and corrected header text, while the pridepy module has expanded the search function’s options by making keyword input required and adding filter, pagination, and sorting parameters. Additionally, a minor formatting correction has been applied to the category options for one download function.

Changes

File(s) Change Summary
README.md Added new command (search-projects-by-keywords-and-filters) with usage example; corrected header typo ("Streamming metadata" → "Streaming metadata").
pridepy/pridepy.py Updated search_projects_by_keywords_and_filters: made keyword required; added options (-f/--filter, -ps/--page_size, -p/--page, -sd/--sort_direction, -sf/--sort_fields) with default values; adjusted internal sort fields concatenation; fixed spacing in category option for download_all_public_category_files.

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant C as CLI Interface
    participant P as CLI Parser
    participant S as search_projects_by_keywords-and-filters
    participant E as Search Engine

    U->>C: Enter search-projects-by-keywords-and-filters command with options
    C->>P: Parse command-line arguments (-k, -f, -ps, -p, -sd, -sf)
    P->>S: Invoke search_projects_by_keywords-and-filters with parameters
    S->>E: Build query & call search method with filters and sorting
    E-->>S: Return search results
    S-->>C: Return formatted results
    C-->>U: Display search results
Loading

Possibly related PRs

  • fix: removed dataegap and test searchby keyword #56: The changes in the main PR are directly related to the modifications made to the search_projects_by_keywords-and-filters function in the retrieved PR, as both PRs involve updates to the same function's parameters and functionality.
  • major refactoring #59: The changes in the main PR, specifically the introduction and modification of the search_projects_by_keywords-and-filters command in pridepy.py, are directly related to the adjustments made to the same command in the retrieved PR, which also involves search capabilities.

Poem

I'm a little rabbit, hopping with delight,
New commands bloom like carrots, shiny and bright.
I search with keywords and filters galore,
Sorting and paging—oh, there's so much more!
With each little hop, I celebrate the code spree,
Bounding through changes as happy as can be!
🐇🥕 Celebrate the tweaks with glee!

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
pridepy/pridepy.py (1)

361-378: Add validation for sort_direction value.

The function should validate that the sort_direction value is either "ASC" or "DESC" before passing it to the search method.

Apply this diff to add validation:

def search_projects_by_keywords_and_filters(
    keyword, filter, page_size, page, sort_direction, sort_fields
):
    project = Project()
+   if sort_direction not in ["ASC", "DESC"]:
+       raise ValueError("sort_direction must be either 'ASC' or 'DESC'")
    sf = ', '.join(sort_fields)
    logging.info(
        project.search_by_keywords_and_filters(
            keyword, filter, page_size, page, sort_direction, sf
        )
    )
README.md (1)

160-166: Improve the command example documentation.

  1. The command example should show its output to comply with markdown best practices.
  2. Consider adding expected output to help users understand the command's behavior.

Apply this diff to improve the documentation:

## Search projects by keywords and filters

Get the Project metadata by keywords and filters

```bash
$  python -m pridepy.pridepy search-projects-by-keywords-and-filters -f projectTags==Proteometools,organismsPart==Pancreas -k human -sd DESC -sf accession -sf submissionDate
+# Example output:
+# {
+#   "accession": "PXD000001",
+#   "title": "Human Proteome Project",
+#   "projectTags": ["Proteometools"],
+#   "submissionDate": "2023-01-01",
+#   ...
+# }

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

165-165: Dollar signs used before commands without showing output
null

(MD014, commands-show-output)

</details>

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used: CodeRabbit UI**
**Review profile: CHILL**
**Plan: Pro**

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between e57c89ab9dd2a084ea9f55a9d1f5b820ae74e8a1 and 0c2301178a8f825fe0d6defe8550604d5c5d5dd9.

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `README.md` (3 hunks)
* `pridepy/pridepy.py` (2 hunks)

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

<details>
<summary>README.md</summary>

165-165: Dollar signs used before commands without showing output
null

(MD014, commands-show-output)

</details>

</details>
<details>
<summary>🪛 GitHub Actions: Python package</summary>

<details>
<summary>pridepy/pridepy.py</summary>

[warning] 13-13: expected 2 blank lines, found 1

---

[warning] 231-231: line too long (143 > 127 characters)

---

[warning] 274-274: expected 2 blank lines, found 1

---

[warning] 311-311: expected 2 blank lines, found 1

---

[warning] 383-383: too many blank lines (3)

</details>

</details>

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms (1)</summary>

* GitHub Check: build

</details>

<details>
<summary>🔇 Additional comments (4)</summary><blockquote>

<details>
<summary>pridepy/pridepy.py (2)</summary>

`129-129`: **LGTM!**

The formatting of category choices has been improved by removing unnecessary spaces.

---

`315-318`: **LGTM!**

Making the keyword parameter required is a good choice as it ensures more focused search results.

</details>
<details>
<summary>README.md (2)</summary>

`61-61`: **LGTM!**

The new command has been correctly added to the list of available commands.

---

`139-139`: **LGTM!**

The typo in the section header has been correctly fixed.

</details>

</blockquote></details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment on lines 319 to 359
@click.option(
"-f",
"--filter",
required=False,
help="Parameters to filter the search results. The structure of the "
"filter is: field1==value1, field2==value2. Example "
"accession==PRD000001",
)
@click.option(
"-ps",
"--page_size",
required=False,
default=100,
help="Number of results to fetch in a page",
)
@click.option(
"-p",
"--page",
required=False,
default=0,
help="Identifies which page of results to fetch",
)
@click.option(
"-sd",
"--sort_direction",
required=False,
default="DESC",
help="Sorting direction: ASC or DESC",
)
@click.option(
"-sf",
"--sort_fields",
required=False,
default="submission_date",
multiple=True,
help="Field(s) for sorting the results on. Default for this "
"request is submission_date. More fields can be separated by "
"comma and passed. Example: submissionDate,accession",
type=click.Choice("accession,submissionDate,diseases,organismsPart,organisms,instruments,softwares,"
"avgDownloadsPerFile,downloadCount,publicationDate".split(",")),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add type validation for sort_direction and fix inconsistency in sort_fields.

  1. The sort_direction option should validate input using type=click.Choice(["ASC", "DESC"]).
  2. There's an inconsistency in the sort fields: the help text mentions "submissionDate" but the choice uses "submission_date".

Apply this diff to improve the options:

@click.option(
    "-sd",
    "--sort_direction",
    required=False,
    default="DESC",
-   help="Sorting direction: ASC or DESC",
+   help="Sorting direction",
+   type=click.Choice(["ASC", "DESC"]),
)
@click.option(
    "-sf",
    "--sort_fields",
    required=False,
-   default="submission_date",
+   default="submissionDate",
    multiple=True,
    help="Field(s) for sorting the results on. Default for this "
    "request is submission_date. More fields can be separated by "
    "comma and passed. Example: submissionDate,accession",
    type=click.Choice("accession,submissionDate,diseases,organismsPart,organisms,instruments,softwares,"
                     "avgDownloadsPerFile,downloadCount,publicationDate".split(",")),
)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pridepy/pridepy.py (1)

363-372: Enhance docstring with more detailed parameter descriptions.

The docstring should provide more information about:

  1. Valid filter formats and examples
  2. Valid range for page_size
  3. Valid values for sort_direction
  4. List of available sort_fields
    """
    Search all projects by keywords and filters
    Parameters:
-       keyword (str): keyword to search in entire project.
-       filter (str): filter the search results. field1==value1
-       page_size (int): no of records or projects per page
-       page (int): Page number
-       sort_direction (str): sort direction of the results based on sortfield
-       sort_fields (str): field to sort the results by.
+       keyword (str): Keyword to search across all project fields.
+       filter (str): Filter string in format "field1==value1, field2==value2".
+                     Example: "accession==PRD000001, organisms==Human"
+       page_size (int): Number of results per page (1-1000, default: 100)
+       page (int): Zero-based page number (default: 0)
+       sort_direction (str): Sort direction, either "ASC" or "DESC" (default: "DESC")
+       sort_fields (str): Fields to sort by, comma-separated.
+                         Available fields: accession, submissionDate, diseases,
+                         organismsPart, organisms, instruments, softwares,
+                         avgDownloadsPerFile, downloadCount, publicationDate
    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c23011 and 38fbb73.

📒 Files selected for processing (1)
  • pridepy/pridepy.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build
🔇 Additional comments (2)
pridepy/pridepy.py (2)

129-129: LGTM!

The formatting fix ensures consistent category choices without extra spaces.


341-359: Add type validation for sort_direction and fix inconsistency in sort_fields.

  1. The sort_direction option should validate input using type=click.Choice(["ASC", "DESC"]).
  2. There's an inconsistency in the sort fields: the help text mentions "submissionDate" but the choice uses "submission_date".

pridepy/pridepy.py Outdated Show resolved Hide resolved
Comment on lines +319 to +326
@click.option(
"-f",
"--filter",
required=False,
help="Parameters to filter the search results. The structure of the "
"filter is: field1==value1, field2==value2. Example "
"accession==PRD000001",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Clarify filter parsing logic.

The help text suggests using comma-separated field-value pairs (e.g., "field1==value1, field2==value2"), but the implementation doesn't show how these are parsed. Please verify:

  1. Is the filter string parsed correctly?
  2. Are multiple filters supported?

🏁 Script executed:

#!/bin/bash
# Description: Check if filter parsing logic exists in the Project class

# Search for filter parsing logic in the Project class
ast-grep --pattern $'class Project {
  $$$
  search_by_keywords_and_filters($$$) {
    $$$
  }
  $$$
}'

# Search for any filter-related code
rg -A 5 "filter.*==.*"

Length of output: 1141


Clarify Filter Parsing Implementation

The current implementation only shows a help text suggesting the use of comma‑separated field–value pairs, but no explicit code was found that splits or otherwise processes the filter string. In particular, a search for filter‐related parsing (e.g. splitting on commas or "==") within the Project class (and specifically in the search_by_keywords_and_filters function) didn’t reveal any parsing logic. This raises two concerns:

• The filter string does not appear to be explicitly parsed to convert something like "field1==value1, field2==value2" into structured filters.
• There is no clear evidence that multiple filters are handled, even though the documentation suggests that they should be.

It would be advisable to verify whether filter processing is handled elsewhere or to add explicit parsing (with corresponding inline comments and tests) so that the behavior matches the help text.

@selvaebi selvaebi requested a review from ypriverol February 9, 2025 22:11
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
pridepy/pridepy.py (2)

313-318: Improve keyword parameter help text.

The current help text could be clearer about what fields are searched and how the keyword is used.

     help="The entered word will be searched among the fields to fetch "
-    "matching pride."
+    "matching PRIDE projects. The keyword will be searched across project metadata "
+    "including titles, descriptions, and other relevant fields."

376-376: Consider handling empty sort_fields.

The current implementation joins sort fields without checking if the tuple is empty. While Click's default value ensures it won't be empty, it's good practice to handle edge cases.

-    sf = ', '.join(sort_fields)
+    sf = ', '.join(sort_fields) if sort_fields else 'submissionDate'
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 38fbb73 and 1d95677.

📒 Files selected for processing (1)
  • pridepy/pridepy.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build
🔇 Additional comments (4)
pridepy/pridepy.py (4)

129-129: LGTM! Improved formatting of category choices.

The formatting of category choices has been improved by removing unnecessary spaces.


343-349: Add type validation for sort_direction.

As mentioned in the previous review, the sort_direction option should validate input using type=click.Choice(["ASC", "DESC"]).

     required=False,
     default="DESC",
-    help="Sorting direction: ASC or DESC",
+    help="Sorting direction",
+    type=click.Choice(["ASC", "DESC"]),

351-361: Fix inconsistency in sort_fields default value.

As mentioned in the previous review, there's an inconsistency between the default value "submission_date" and the choice "submissionDate".

     required=False,
-    default=["submission_date"],
+    default=["submissionDate"],
     multiple=True,

319-326: Clarify filter parsing implementation.

As mentioned in the previous review, the filter parsing logic needs clarification. The help text suggests using comma-separated field-value pairs, but the implementation details are missing. Please verify:

  1. Is the filter string parsed correctly?
  2. Are multiple filters supported?
#!/bin/bash
# Description: Check if filter parsing logic exists in the Project class

# Search for filter parsing logic in the Project class
ast-grep --pattern $'class Project {
  $$$
  search_by_keywords_and_filters($$$) {
    $$$
  }
  $$$
}'

# Search for any filter-related code
rg -A 5 "filter.*==.*"

@ypriverol ypriverol merged commit 0d8b565 into master Feb 10, 2025
8 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants