Relying on pySBD is not feasible: package hasn't been updated in 3 years (sentence segmentation) #1736

Closed
diegodebrito opened this issue Dec 6, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@diegodebrito

Describe the Feature
One of RAGAS's dependencies is the pySBD package. It has several bugs and appears to be largely unmaintained: the last release was 3 years ago. This exposes RAGAS to bugs, security vulnerabilities, and more.

Why is the feature important for you?
I found a bug related to pySBD and opened an issue there, then noticed that the repo no longer seems to have any activity. In my opinion, it's an extremely bad idea to keep it as one of the main dependencies.

Additional context
Switching to a better-known, better-maintained package will save RAGAS maintainers headaches in the future.

@diegodebrito diegodebrito added the enhancement New feature or request label Dec 6, 2024
@shahules786
Member

Hey @diegodebrito, thanks for reporting this. Is there an alternative package you can recommend for just segmentation?

@diegodebrito
Author

Hi @shahules786, sorry for taking so long to answer. I'm not a specialist in this area, but maybe standard spaCy would work for that purpose?

Something along these lines:

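import spacy

# Assumption: the small English model is installed
# (python -m spacy download en_core_web_sm); any pipeline that sets sentence boundaries works.
nlp = spacy.load("en_core_web_sm")

split_sentences = []
for answer in answers:  # answers: a list of answer strings
    doc = nlp(answer)
    # Map each sentence's position in the answer to its text
    sentences_with_index = {i: sent.text for i, sent in enumerate(doc.sents)}
    split_sentences.append(sentences_with_index)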
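
For a version that runs end to end without downloading a model, here is a rough sketch of the same idea. It assumes spaCy v3 and uses a blank English pipeline with the rule-based `sentencizer` component instead of a trained model. The helper names `make_segmenter` and `index_sentences` are made up for illustration and are not part of RAGAS or spaCy. The regex fallback is only a naive stand-in for when spaCy is not installed, not a real replacement for pySBD.

```python
import re

try:
    import spacy  # optional dependency; this sketch assumes spaCy v3
except ImportError:
    spacy = None


def make_segmenter():
    """Return a function that maps a text to a list of sentence strings.

    Hypothetical helper: hides the choice of library behind one callable,
    so the segmentation backend can be swapped in a single place.
    """
    if spacy is not None:
        # Blank pipeline plus the rule-based sentencizer: no model download needed.
        nlp = spacy.blank("en")
        nlp.add_pipe("sentencizer")
        return lambda text: [sent.text for sent in nlp(text).sents]

    # Naive fallback (an assumption, not from the thread): split on whitespace
    # that follows sentence-final punctuation.
    pattern = re.compile(r"(?<=[.!?])\s+")
    return lambda text: [s for s in pattern.split(text.strip()) if s]


def index_sentences(answers, segment):
    """Build one {sentence_index: sentence_text} dict per answer."""
    return [dict(enumerate(segment(answer))) for answer in answers]


segment = make_segmenter()
answers = ["The sky is blue. Grass is green.", "Is it raining?"]
split_sentences = index_sentences(answers, segment)
print(split_sentences)
```

Keeping the backend behind a single function means the calling code would not need to change if the dependency is replaced again later.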

What was the reason you went with pySBD in the first place?

@shahules786
Member

Hey @diegodebrito, we have removed the need for segmentation from ragas with the latest PR. Thank you.
