Add list of notice names to the dataset response #429

Closed
emmambd opened this issue May 14, 2024 · 1 comment

emmambd commented May 14, 2024

User Experience

API users should be able to query a dataset and see the validation notices it includes, grouped by severity. This makes it possible for consumers to:

  • Screen or pre-process a dataset based on its expected data quality issues: what types of issues should I expect from this data?
  • Run analytics on the quality of a feed over time (compare quality between a feed's datasets): is the quality of this feed getting better over time? Does something look unusual in this new feed release?
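
As a rough illustration of the second use case, the sketch below compares the notice codes of two releases of the same feed. The `diff_notices` helper, the severity keys, and the sample inputs are all hypothetical; they only assume that each dataset can be reduced to a list of notice codes per severity.

```python
from typing import Dict, List

# Hypothetical shape: severity name -> list of notice codes for one dataset.
NoticesBySeverity = Dict[str, List[str]]


def diff_notices(old: NoticesBySeverity, new: NoticesBySeverity) -> Dict[str, Dict[str, List[str]]]:
    """Return, per severity, the notice codes added and resolved between two releases."""
    result: Dict[str, Dict[str, List[str]]] = {}
    for severity in sorted(set(old) | set(new)):
        before = set(old.get(severity, []))
        after = set(new.get(severity, []))
        result[severity] = {
            "added": sorted(after - before),      # present only in the newer dataset
            "resolved": sorted(before - after),   # present only in the older dataset
        }
    return result


# Made-up inputs for illustration; not the real notices of any dataset.
previous = {
    "errors": ["decreasing_or_equal_stop_time_distance"],
    "warnings": ["route_color_contrast"],
}
latest = {
    "errors": [],
    "warnings": ["missing_feed_info_date", "route_color_contrast"],
    "info": ["unknown_file"],
}
changes = diff_notices(previous, latest)
```

A consumer could flag a release whose `added` list under errors is non-empty, which is one way to answer "does something look unusual in this new feed release?".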

Expected Result

Using dataset id mdb-1210-202402121801 as an example:

Option 1:

      "validation_report": {
          "validated_at": "2023-07-10T22:06:00Z",
          "features": [
              "Fares_V1"
          ],
          "validator_version": "4.2.0",
          "total_unique_error": 1,
          "total_unique_warning": 2,
          "total_unique_info": 3,
          "errors": [
              "decreasing_or_equal_stop_time_distance"
          ],
          "warnings": [
              "equal_shape_distance_same_coordinates",
              "fast_travel_between_consecutive_stops",
              "leading_or_trailing_whitespaces",
              "missing_feed_info_date",
              "missing_recommended_field",
              "mixed_case_recommended_field",
              "route_color_contrast",
              "route_short_name_too_long",
              "same_name_and_description_for_route",
              "stop_too_far_from_shape_using_user_distance",
              "stop_without_stop_time",
              "trip_distance_exceeds_shape_distance_below_threshold"
          ],
          "info": [
              "unknown_file"
          ]
      }

Option 2:

      "validation_report": {
          "validated_at": "2023-07-10T22:06:00Z",
          "features": [
              "Fares_V1"
          ],
          "validator_version": "4.2.0",
          "total_unique_error": 1,
          "total_unique_warning": 2,
          "total_unique_info": 3,
          "notices": [
              {
                  "code": "decreasing_or_equal_stop_time_distance",
                  "severity": "ERROR"
              },
              {
                  "code": "equal_shape_distance_same_coordinates",
                  "severity": "WARNING"
              }
          ]
      }
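
The two options carry the same information, so a client could derive the per-severity grouping of Option 1 from the flat list of Option 2. A minimal sketch, assuming the `notices` entries look exactly like the Option 2 example; `group_by_severity` is a hypothetical helper name, not part of any existing API:

```python
from collections import defaultdict
from typing import Dict, List


def group_by_severity(notices: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group a flat list of {"code", "severity"} entries into severity -> sorted codes."""
    grouped = defaultdict(list)
    for notice in notices:
        grouped[notice["severity"]].append(notice["code"])
    # Sort codes so the output is stable regardless of input order.
    return {severity: sorted(codes) for severity, codes in grouped.items()}


# The two entries shown in the Option 2 example.
notices = [
    {"code": "decreasing_or_equal_stop_time_distance", "severity": "ERROR"},
    {"code": "equal_shape_distance_same_coordinates", "severity": "WARNING"},
]
grouped = group_by_severity(notices)
```

Because the flat list converts to the grouped form in a few lines, Option 2 may be the easier shape to extend later with extra per-notice fields.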

Considerations

  • If we want to model the full validation report as an API endpoint in the future, does the response schema need to look different to scale (e.g., to show the number of occurrences of each notice)?

emmambd commented May 27, 2024

Decided this may not be useful for consumers, so closing for now. The reasoning:

  • Consumers invested in the data quality report will download and parse the validator's JSON report themselves
  • Consumers who aren't invested don't want a long validation report in the response
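
For the first group, parsing the report directly could look like the sketch below. It assumes the validator's JSON report has a top-level `notices` array whose entries carry `code`, `severity`, and `totalNotices`; that layout is an assumption to verify against the report produced by the validator version in use. `summarize_report` and the sample payload are hypothetical.

```python
import json
from typing import Dict


def summarize_report(report_text: str) -> Dict[str, Dict[str, int]]:
    """Map severity -> {notice code: occurrence count} from a validator JSON report."""
    report = json.loads(report_text)
    summary: Dict[str, Dict[str, int]] = {}
    # Assumed layout: {"notices": [{"code": ..., "severity": ..., "totalNotices": ...}]}
    for notice in report.get("notices", []):
        severity = notice.get("severity", "UNKNOWN")
        summary.setdefault(severity, {})[notice["code"]] = notice.get("totalNotices", 0)
    return summary


# Made-up payload for illustration; the counts are not from a real report.
sample_report = json.dumps({
    "notices": [
        {"code": "unknown_file", "severity": "INFO", "totalNotices": 1},
        {"code": "route_color_contrast", "severity": "WARNING", "totalNotices": 4},
        {"code": "missing_feed_info_date", "severity": "WARNING", "totalNotices": 1},
    ]
})
summary = summarize_report(sample_report)
```

In practice `report_text` would come from the downloaded report file rather than an inline string.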

emmambd closed this as completed on May 27, 2024
emmambd closed this as not planned on May 27, 2024