Limit lines #101
…is idiomatic and faster
… evaluated in order to limit the duration of the validation process
I'm going to leave this for now. Most of the time we'll want to analyse all of the CSV, as some errors like
Thanks for the PR though, great that people are finding this useful! 😄 👍
@pezholio What's the harm in adding an option to limit the number of lines? If the option is not set, then the whole file is analyzed. While it's possible for a CSV to have all its bad rows in the last 10% of the file, bad rows in real CSVs tend to be spread throughout a file.
I've finally got to looking at this, and it's a great pile of stuff, thanks! Yes, adding a limit is perfectly fine as an option, as it won't affect existing users unless they choose to use it. Only thing is, the I think we should:
I will pull across the changes from your branch and open separate PRs for those, then get it all merged.
Ah, my mistake, I see the speed fixes are already in, but obviously this branch just needs a merge from master.
I'll merge this, and then change to a separate option.
Thanks @Floppy. Sorry for having the speed improvement already in my branch. I initially planned to have only the limit change in my branch, but I was already working with the speed improvement and noticed it only after the PR had been sent.
No worries, sorry I got confused - I hadn't had coffee yet :)
Currently csvlint.rb reads every line of a CSV file, which can take a very long time for large files, even though most issues will be discovered in the first 10,000 or 100,000 lines (in files with potentially millions of records).
The proposed change adds a supported parameter (limitLines) to the configuration dialect that caps the number of lines evaluated, making it possible to control how long validation runs. (This could even be useful for the web instance csvlint.io, where most users will think that no output is coming.)
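To illustrate the idea, here is a minimal sketch of a line-limited check written against Ruby's standard `csv` library. It is not csvlint.rb's actual code: the `ragged_rows` helper and its `limit_lines` keyword are hypothetical names chosen for this example, and the only check performed is a column-count comparison against the first row.

```ruby
require "csv"

# Hypothetical helper (not part of csvlint.rb): returns the row numbers whose
# column count differs from the first row. When `limit_lines` is given, reading
# stops after that many rows; when it is nil, the whole input is read.
# Note: rows are counted as parsed CSV records, so a quoted field containing
# a newline still counts as a single row.
def ragged_rows(io, limit_lines: nil)
  errors = []
  expected = nil

  CSV.new(io).each.with_index(1) do |row, row_no|
    # Stop early once the limit is exceeded; later rows are never parsed.
    break if limit_lines && row_no > limit_lines

    expected ||= row.size
    errors << row_no if row.size != expected
  end

  errors
end

data = "a,b\n1,2\n3\n4,5\n6\n"
ragged_rows(data)                 # checks all five rows
ragged_rows(data, limit_lines: 3) # checks only the first three rows
```

Because the limit defaults to `nil`, callers that do not pass it keep the existing behaviour of checking the entire file. The trade-off is the one raised in the discussion: problems that appear only after the limit go unreported.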