Indexing of files is not currently supported #453
Comments
@Weilin37 You need to add Ideally, file type identification by checking the file header should be used, but it requires special
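Header-based identification, as opposed to trusting the file suffix, could look roughly like the sketch below. It is not taken from Haystack: the function name `sniff_file_type` and the `MAGIC_SIGNATURES` table are assumptions made for illustration, and only two well-known signatures are listed. Plain text files carry no signature, so they come back as `None` and would still need a suffix check.

```python
from pathlib import Path
from typing import Optional

# Well-known leading byte signatures; illustrative, not exhaustive.
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",  # .docx files are ZIP containers
}


def sniff_file_type(path: Path) -> Optional[str]:
    """Guess a file type from its first bytes; return None if unrecognized."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    for signature, kind in MAGIC_SIGNATURES.items():
        if head.startswith(signature):
            return kind
    return None
```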
Hi @lalitpagaria, I checked my directory and the ".txt" suffix is indeed present. Here is one of the file names: PMC7462872.txt, which contains the following title and abstract text from public scientific literature:
I checked the other files, which were auto-generated in the same format, and they all have the ".txt" suffix as well. I also tried putting in only one of the "txt" files to test whether a single file would work, but it gave me the same error. Could it possibly be due to something inside the file?
@Weilin37 I am able to reproduce this issue. It happens when the OS creates a few internal files with extension(s) not supported by the converter.
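The failure mode described above can be illustrated with a minimal sketch, which is not Haystack's actual implementation. It walks a directory and skips any file whose suffix is not in an allowed set, so OS-generated files such as `.DS_Store` are ignored instead of raising an error. The function name `collect_supported_files` and the contents of `SUPPORTED_SUFFIXES` are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed set of suffixes for illustration; the real converter list may differ.
SUPPORTED_SUFFIXES = {".txt", ".pdf", ".docx"}


def collect_supported_files(
    dir_path: str, suffixes: Iterable[str] = SUPPORTED_SUFFIXES
) -> List[Path]:
    """Return files under dir_path whose suffix is supported, skipping the rest."""
    allowed = {s.lower() for s in suffixes}
    selected = []
    for path in sorted(Path(dir_path).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in allowed:
            # Hidden or OS-generated files (e.g. .DS_Store) land here.
            print(f"Skipping unsupported file: {path.name}")
            continue
        selected.append(path)
    return selected
```

Skipping with a message, rather than raising, lets one stray file pass without aborting the indexing of a whole directory.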
@lalitpagaria A PR would be great. Thx!
* Skip file converter if file type is not supported. Refer #453
* Fixing issue reported by mypy
* Addressing review comments
Thanks! Just to be thorough: I generated the txt files in Python and then manually moved them to a new folder, so I'm not entirely sure any hidden files are there, but we'll have to see!
@Weilin37 Can you please try the latest changes now? See whether the fix that was merged for this works in your case or whether you are getting some other issue.
It works!
Awesome! Great to see the community helping each other :)
@lalitpagaria With 0.4.9 this problem appears again. When I upgrade from master (0.5.0), I get a different issue with the following code:
@Weilin37 The DPR issue on 0.5.0 is caused by the changes in #527. Regarding your original issue, I am not sure why you are getting it, as Haystack already has a test to catch it. Can you please try the change suggested above on 0.5.0 and then share the stack trace or error?
@Weilin37 I guess with 0.5.0 you refer to the FARM version? Please always make sure that your Haystack and FARM versions are compatible. For example, with the latest Haystack, we expect FARM 0.5.0 (as specified in requirements.txt). As @lalitpagaria already mentioned, the signature of DPR has changed in #527. You can also see an updated example reflecting the changes in our Tutorial 6.
@tholor Thanks, changing from max_seq_len to max_seq_len_passage worked!
Describe the bug
I am trying to follow the tutorial notebook for DPR and replace the GoT text files with my own text files. My data is a simple batch of 2k text files, each consisting of a title and an abstract (a paragraph or two long) separated by a newline.
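For reference, the file layout described above (title on the first line, abstract on the remaining lines) can be read into a dictionary with a small helper like the one below. This is a sketch under stated assumptions, not the tutorial's conversion code: the helper name `text_file_to_dict` is hypothetical, and the `"text"` plus `"meta"` layout of the returned dict is assumed for illustration.

```python
from pathlib import Path
from typing import Dict


def text_file_to_dict(path: Path) -> Dict:
    """Read a file whose first line is the title and the rest the abstract."""
    raw = path.read_text(encoding="utf-8")
    # Split only on the first newline so multi-line abstracts stay intact.
    title, _, abstract = raw.partition("\n")
    return {
        "text": abstract.strip(),
        # Assumed metadata layout: file name plus the extracted title.
        "meta": {"name": path.name, "title": title.strip()},
    }
```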
Error message
in this section of the code:
I get an error message: Indexing of files is not currently supported:
Expected behavior
I was expecting the code to simply convert the files to dicts in the same way it does for the GoT text files.
To Reproduce
https://github.com/deepset-ai/haystack/blob/master/tutorials/Tutorial6_Better_Retrieval_via_DPR.ipynb
System: