Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSVW-based validation! #142

Merged
merged 35 commits into from
Sep 23, 2015
Merged

CSVW-based validation! #142

merged 35 commits into from
Sep 23, 2015

Conversation

JeniT
Copy link

@JeniT JeniT commented Sep 2, 2015

There's a lot here, and it could do with some code review before merging please!

When the cucumber tests are first run, a script will create tests based on the latest version of the CSV on the Web test suite, including creating a local cache of the test files. This requires an internet connection and some patience while they download.

fixes #141

@pezholio pezholio self-assigned this Sep 8, 2015

require 'csvlint/error_message'
require 'csvlint/error_collector'
require 'csvlint/validate'
require 'csvlint/wrapped_io'
require 'csvlint/field'
require 'csvlint/csvw_metadata_error'
require 'csvlint/csvw_number_format'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to put all the csv on the web folders in a seperate folder, rather than with the csvw prefix, and also namespace them, so CsvwColumn for example would be in lib/csvlint/csvw/column and namespaced as Csvlint::Csvw::Column. Just keeps things neater 😄

@pezholio
Copy link
Contributor

pezholio commented Sep 8, 2015

Just a few minor(ish) style points, so far, but looking good so far 👍

I am a bit worried that some of the classes are looking a bit big, this could probably be fixed by putting the constants in separate files. Happy to have a bit of a stab at a refactor if you want?

I've got two failing tests locally, so will try and fix those.

@JeniT
Copy link
Author

JeniT commented Sep 8, 2015

@pezholio more than happy for you to do some refactoring!

@pezholio
Copy link
Contributor

pezholio commented Sep 9, 2015

Hmmmm. We now have two failing tests (which were failing before the refactor too). Unless I'm misunderstanding something, the files have been munged somehow (probably at the W3C end). Test 122 says If the metadata file found at this location does not explicitly include a reference to the requested tabular data file then it MUST be ignored., but there is a file referenced in tables:

{
  "@context": "http://www.w3.org/ns/csvw",
  "rdfs:label": "This metadata is used",
  "rdfs:comment": "If the metadata file found at this location does not explicitly include a reference to the requested tabular data file then it MUST be ignored.",
  "tables": [{
    "url": "test122.csv",
    "tableSchema": {
      "columns": [{
        "titles": "countryCode"
      }, {
        "name": "latitude",
        "titles": "latitude",
        "datatype": "number"
      }, {
        "name": "longitude",
        "titles": "longitude",
        "datatype": "number"
      }, {
        "name": "name",
        "titles": "name",
        "datatype": "string"
      }]
    }
  }]
}

And a similar issue occurs in test123/csv-metadata:

{
  "@context": "http://www.w3.org/ns/csvw",
  "rdfs:label": "This metadata is used",
  "rdfs:comment": "If the metadata file found at this location does not explicitly include a reference to the requested tabular data file then it MUST be ignored.",
  "tables": [{
    "url": "action.csv",
    "tableSchema": {
      "columns": [{
        "titles": "countryCode"
      }, {
        "name": "latitude",
        "titles": "latitude",
        "datatype": "number"
      }, {
        "name": "longitude",
        "titles": "longitude",
        "datatype": "number"
      }, {
        "name": "name",
        "titles": "name",
        "datatype": "string"
      }]
    }
  }]
}

Any ideas @JeniT, or am I misunderstanding something?

@pezholio
Copy link
Contributor

Right. OK, I've got this sorted now. Both of the failing tests said And there should not be warnings (but there should be) when the tests were passing, and this has now changed to And there should be warnings. I also noticed two lines that were commented out, which should have been generating schema warnings. I removed the comments and the tests pass as expected. Does this seem legit @JeniT? If so, then I'm happy to merge!

@JeniT
Copy link
Author

JeniT commented Sep 10, 2015

@pezholio depends on whether the warnings that are generated by those tests match the warnings that are expected by those tests! Run the command line and look at the warnings. (I'll try to find some time over the weekend if you want to pass it on to me.)

@pezholio
Copy link
Contributor

Oh yeah, sorry, I checked and they're both generating schema warnings as expected. Would be nice if we could test for warning types, but I understand that it might not be possible using this automated approach from the W3C test suite.

@JeniT
Copy link
Author

JeniT commented Sep 10, 2015

Can you double check whether the warn_if_unsuccessful variable is needed any more? there were some inconsistencies about when warnings were generated which I think were fixed with changes to those tests, which I think might mean that warn_if_unsuccessful is now not a good way of monitoring when warnings need to occur.

@pezholio
Copy link
Contributor

Just had a look, and relying on that variable means that we won't get a warning if the schema is gleaned using another method (which I guess is what the tests are looking for). In the case of test 122, we don't get a schema from the link header, but we do get a schema from guessing via that paths, so https://github.com/theodi/csvlint.rb/blob/feature-csvw-validation/lib/csvlint/validate.rb#L338 returns from the method (before we get to the code that builds the warnings).

I can get around this in one of two ways - either by removing the warn_if_successful variables and building the warnings in place as we are currently, or setting warn_if_successful and removing the explicit return when setting the schema

@pezholio
Copy link
Contributor

I'm happy with this, so I'm going to merge! 👍

pezholio added a commit that referenced this pull request Sep 23, 2015
@pezholio pezholio merged commit 50047ff into master Sep 23, 2015
@pezholio pezholio deleted the feature-csvw-validation branch September 23, 2015 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CSV on the web support
2 participants