[DS-4301] Added Content Reports section and Filtered Collections report therein #202
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,149 @@ | ||
# Displaying the filtered collections report | ||
[Back to the list of all defined endpoints](endpoints.md) | ||
|
||
## Statistics for the whole repository | ||
**GET /api/contentreports/filteredcollections** | ||
|
||
**POST /api/contentreports/filteredcollections** | ||
|
||
This endpoint provides aggregated statistics about the number of items per collection according to selected filters. | ||
|
||
For each collection, the basic report consists of: | ||
* name (label) and handle of the collection | ||
* name (label) and handle of the parent community | ||
* total number of items | ||
* number of items matching all selected filters | ||
|
||
In addition, a `summary` element provides the total number of items and the total number of items matching all filters | ||
for the whole repository. | ||
|
||
An example JSON response document to `/api/contentreports/filteredcollections`: | ||
```json | ||
{ | ||
"id": "filteredcollections", | ||
"collections": [ | ||
{ | ||
"label": "Collection 1", | ||
"handle": "100/1", | ||
"values": { | ||
"is_discoverable": 23, | ||
"has_multiple_originals": 3, | ||
"has_pdf_original": 14 | ||
}, | ||
"community_label": "Community 1", | ||
"community_handle": "20.500.11794/1", | ||
"nb_total_items": 23, | ||
"all_filters_value": 3 | ||
}, | ||
{ | ||
"label": "Collection 2", | ||
"handle": "100/2", | ||
"values": { | ||
"is_discoverable": 1, | ||
"has_multiple_originals": 0, | ||
"has_pdf_original": 0 | ||
}, | ||
"community_label": "Community 1", | ||
"community_handle": "20.500.11794/1", | ||
"nb_total_items": 1, | ||
"all_filters_value": 0 | ||
}, | ||
{ | ||
"label": "Collection 3", | ||
"handle": "100/3", | ||
"values": { | ||
"is_discoverable": 1, | ||
"has_multiple_originals": 0, | ||
"has_pdf_original": 1 | ||
}, | ||
"community_label": "Community 1", | ||
"community_handle": "20.500.11794/1", | ||
"nb_total_items": 1, | ||
"all_filters_value": 0 | ||
} | ||
], | ||
"summary": { | ||
"label": null, | ||
"handle": null, | ||
"values": { | ||
"is_discoverable": 25, | ||
"has_multiple_originals": 3, | ||
"has_pdf_original": 15 | ||
}, | ||
"community_label": null, | ||
"community_handle": null, | ||
"nb_total_items": 25, | ||
"all_filters_value": 3 | ||
}, | ||
"type": "filtered-collections", | ||
"_links": { | ||
"self": { | ||
"href": "http://localhost:8080/dspace-server/api/contentreports/filteredcollections" | ||
} | ||
} | ||
} | ||
``` | ||
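In the example above, the `summary` element appears to be the element-wise sum of the per-collection entries. The Python sketch below recomputes it from a trimmed copy of the example response. The helper name `summarize` is hypothetical, and the idea that the server derives the summary by plain summation is an assumption inferred from the sample values only.

```python
from collections import Counter

# Trimmed copy of the example response above (labels and handles omitted).
collections = [
    {"values": {"is_discoverable": 23, "has_multiple_originals": 3, "has_pdf_original": 14},
     "nb_total_items": 23, "all_filters_value": 3},
    {"values": {"is_discoverable": 1, "has_multiple_originals": 0, "has_pdf_original": 0},
     "nb_total_items": 1, "all_filters_value": 0},
    {"values": {"is_discoverable": 1, "has_multiple_originals": 0, "has_pdf_original": 1},
     "nb_total_items": 1, "all_filters_value": 0},
]


def summarize(entries):
    """Sum per-collection counts into a summary-shaped dict (assumed semantics)."""
    values = Counter()
    for entry in entries:
        # Counter.update adds the counts of each filter to the running totals.
        values.update(entry["values"])
    return {
        "values": dict(values),
        "nb_total_items": sum(e["nb_total_items"] for e in entries),
        "all_filters_value": sum(e["all_filters_value"] for e in entries),
    }


summary = summarize(collections)
print(summary)
```

For the sample data this reproduces the totals shown in the `summary` element (25 items in total, 3 matching all filters).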
|
||
The request can be parameterized with a series of filters to add to the basic report. | ||
|
||
In GET mode, the filters are passed in a `filters` query parameter whose value is a comma-separated list of filter names, | ||
like the following: | ||
``` | ||
?filters=is_discoverable,has_multiple_originals,has_pdf_original | ||
``` | ||
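As an illustration, a client could assemble this query string as follows. This is a minimal sketch: the function name `build_get_url` is hypothetical, and the base URL is taken from the `_links` example in this document.

```python
from urllib.parse import urlencode

ENDPOINT = "/api/contentreports/filteredcollections"


def build_get_url(base_url, filter_names):
    """Return the GET URL with a comma-separated `filters` parameter."""
    if not filter_names:
        # No filters selected: request the basic report only.
        return base_url + ENDPOINT
    # safe="," keeps the commas literal instead of encoding them as %2C.
    query = urlencode({"filters": ",".join(filter_names)}, safe=",")
    return f"{base_url}{ENDPOINT}?{query}"


url = build_get_url(
    "http://localhost:8080/dspace-server",
    ["is_discoverable", "has_multiple_originals", "has_pdf_original"],
)
print(url)
```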
|
||
In POST mode, the filters are defined in a JSON request body like this: | ||
```json | ||
{ | ||
"filters": { | ||
"is_discoverable": true, | ||
"has_multiple_originals": true, | ||
"has_pdf_original": true | ||
} | ||
} | ||
``` | ||
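The two modes carry the same information, so a list of filter names can be converted to the POST body and back. The sketch below shows one way to do it. The helper names are hypothetical, and treating a filter set to `false` the same as an omitted filter is an assumption, not something this document states.

```python
import json


def filters_to_post_body(filter_names):
    """Build the POST body: each selected filter maps to true."""
    return {"filters": {name: True for name in filter_names}}


def post_body_to_filter_names(body):
    """Inverse mapping: keep only the filters whose flag is true (assumed semantics)."""
    return [name for name, enabled in body.get("filters", {}).items() if enabled]


names = ["is_discoverable", "has_multiple_originals", "has_pdf_original"]
body = filters_to_post_body(names)
print(json.dumps(body, indent=2))
```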
|
||
The available filters are as follows: | ||
|
||
* Item Property Filters | ||
* `is_item`: Is Item - always true | ||
* `is_withdrawn`: Withdrawn Items | ||
* `is_not_withdrawn`: Available Items - Not Withdrawn | ||
* `is_discoverable`: Discoverable Items - Not Private | ||
* `is_not_discoverable`: Not Discoverable - Private Item | ||
* Basic Bitstream Filters | ||
* `has_multiple_originals`: Item has Multiple Original Bitstreams | ||
* `has_no_originals`: Item has No Original Bitstreams | ||
* `has_one_original`: Item has One Original Bitstream | ||
* Bitstream Filters by MIME Type | ||
* `has_doc_original`: Item has a Doc Original Bitstream (PDF, Office, Text, HTML, XML, etc) | ||
* `has_image_original`: Item has an Image Original Bitstream | ||
* `has_unsupp_type`: Has Other Bitstream Types (not Doc or Image) | ||
* `has_mixed_original`: Item has multiple types of Original Bitstreams (Doc, Image, Other) | ||
* `has_pdf_original`: Item has a PDF Original Bitstream | ||
* `has_jpg_original`: Item has JPG Original Bitstream | ||
* `has_small_pdf`: Has unusually small PDF | ||
* `has_large_pdf`: Has unusually large PDF | ||
* `has_doc_without_text`: Has document bitstream without TEXT item | ||
* Supported MIME Type Filters | ||
* `has_only_supp_image_type`: Item Image Bitstreams are Supported | ||
* `has_unsupp_image_type`: Item has Image Bitstream that is Unsupported | ||
* `has_only_supp_doc_type`: Item Document Bitstreams are Supported | ||
* `has_unsupp_doc_type`: Item has Document Bitstream that is Unsupported | ||
* Bitstream Bundle Filters | ||
* `has_unsupported_bundle`: Has bitstream in an unsupported bundle | ||
* `has_small_thumbnail`: Has unusually small thumbnail | ||
* `has_original_without_thumbnail`: Has original bitstream without thumbnail | ||
* `has_invalid_thumbnail_name`: Has invalid thumbnail name (assumes one thumbnail for each original) | ||
* `has_non_generated_thumb`: Has non-generated thumbnail | ||
* `no_license`: Doesn't have a license | ||
* `has_license_documentation`: Has documentation in the license bundle | ||
* Permission Filters | ||
* `has_restricted_original`: Item has Restricted Original Bitstream | ||
* `has_restricted_thumbnail`: Item has Restricted Thumbnail | ||
* `has_restricted_metadata`: Item has Restricted Metadata | ||
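
A client may want to check filter names before sending a request. The sketch below encodes the list above as a Python dictionary and reports names that are not in it. The helper names are hypothetical, and how the server itself reacts to an unknown filter name is not specified here.

```python
# Filter names copied from the list above, grouped by category.
FILTERS_BY_CATEGORY = {
    "Item Property Filters": [
        "is_item", "is_withdrawn", "is_not_withdrawn",
        "is_discoverable", "is_not_discoverable",
    ],
    "Basic Bitstream Filters": [
        "has_multiple_originals", "has_no_originals", "has_one_original",
    ],
    "Bitstream Filters by MIME Type": [
        "has_doc_original", "has_image_original", "has_unsupp_type",
        "has_mixed_original", "has_pdf_original", "has_jpg_original",
        "has_small_pdf", "has_large_pdf", "has_doc_without_text",
    ],
    "Supported MIME Type Filters": [
        "has_only_supp_image_type", "has_unsupp_image_type",
        "has_only_supp_doc_type", "has_unsupp_doc_type",
    ],
    "Bitstream Bundle Filters": [
        "has_unsupported_bundle", "has_small_thumbnail",
        "has_original_without_thumbnail", "has_invalid_thumbnail_name",
        "has_non_generated_thumb", "no_license", "has_license_documentation",
    ],
    "Permission Filters": [
        "has_restricted_original", "has_restricted_thumbnail",
        "has_restricted_metadata",
    ],
}

KNOWN_FILTERS = {name for names in FILTERS_BY_CATEGORY.values() for name in names}


def unknown_filters(requested):
    """Return the requested names that are not documented filters."""
    return [name for name in requested if name not in KNOWN_FILTERS]


def category_of(name):
    """Return the category a filter belongs to, or None if it is unknown."""
    for category, names in FILTERS_BY_CATEGORY.items():
        if name in names:
            return category
    return None


print(unknown_filters(["is_discoverable", "has_pdf"]))
```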
|
||
Possible response status | ||
|
||
* 200 OK - The specific report data was found, and the data has been properly returned. | ||
* 403 Forbidden - if a valid CSRF token is missing when issuing a POST request. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,129 @@ | ||
# Displaying the filtered items report | ||
[Back to the list of all defined endpoints](endpoints.md) | ||
|
||
## Statistics for the whole repository | ||
**POST /api/contentreports/filtereditems** | ||
Review comment: Why is this a POST instead of a GET? I notice that the "statistics" endpoints always use GET except when they are adding data to the statistics. See https://github.com/DSpace/RestContract/blob/main/statistics-reports.md and https://github.com/DSpace/RestContract/blob/main/statistics-viewevents.md

Could we better describe why we need to use POST for these endpoints? It appears they are read-only, which implies they might be switched to GET.

Review comment: That's true, I thought about it. My concern with GET is, as you suggested above, the limited length of the parameters passed as part of the URL. There should be no problems with the Filtered Collections report (only Boolean filters). Parameterization of the Filtered Items report, however, is much more complex and can easily become long enough to exceed any limit enforced by application servers for URL query strings. This is why I implemented this report as a POST endpoint. For (a bit of) uniformity, I also added POST support to the Filtered Collections report.

Besides, while the HTTP spec clearly states that GET should be used for read-only requests, I saw nothing stating that POST should be used only for data-changing requests.

If you feel that everything should be switched to GET anyway, I can do it. In this case, the Filtered Items report will need to be thoroughly tested with lots of parameters selected to make sure that nothing goes wrong due to too long a query string.

About Filtered Items: The parameters could be organized into a GET query string, although such a string might become quite long. Another concern I had while designing the API for this report is the "query predicates" part: these are structured parameters (a query predicate is a (field, operator, value) tuple). This is another reason why I didn't include a GET version of this report.

Review comment: Please check the https://github.com/DSpace/RestContract/blob/main/search-endpoint.md solution. A query string solution is used there for filtering results:
where a {
"query":"my query",
"scope":"9076bd16-e69a-48d6-9e41-0238cb40d863",
"appliedFilters": [
{
"filter" : "title",
"operator" : "notcontains",
"value" : "abcd",
"label" : "abcd"
},

Review comment: I think this discussion stems from a disagreement between HTTP and REST about what POST is for. RFC 9110 says that creating a new resource is only one possible use for POST. "The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics." The description of POST here is quite a bit narrower. One might say that REST is a reuse of HTTP syntax with different semantics. So it can be argued that REST is not a very good fit for this operation, but it is what we have.

Would it be a violation of the "spirit of REST" to consider such a POST to be storing a report description, which is consumed in the act of generating the report?

Reports may take some time to create. Suppose that one POSTs a document describing the desired report and receives a token in the response. The report generator runs in the background. When finished, the token can be presented (using GET) to receive the report, and the report description is then destroyed.
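
To make the token-based flow suggested in the comment above concrete, here is a purely illustrative in-memory sketch. Nothing in it is part of this PR or of DSpace: the class `ReportQueue`, its methods, and the `generate` callback are hypothetical stand-ins for a POST handler, a background worker, and a GET handler.

```python
import uuid


class ReportQueue:
    """In-memory stand-in for the suggested submit / generate / fetch flow."""

    def __init__(self, generate):
        self._generate = generate  # callable: report description -> report
        self._jobs = {}            # token -> {"description", "report", "done"}

    def submit(self, description):
        """POST analogue: store the report description and return a token."""
        token = str(uuid.uuid4())
        self._jobs[token] = {"description": description, "report": None, "done": False}
        return token

    def run_pending(self):
        """Stand-in for the background report generator."""
        for job in self._jobs.values():
            if not job["done"]:
                job["report"] = self._generate(job["description"])
                job["done"] = True

    def fetch(self, token):
        """GET analogue: return the finished report once, then destroy the job."""
        job = self._jobs.get(token)
        if job is None or not job["done"]:
            return None
        del self._jobs[token]
        return job["report"]


queue = ReportQueue(generate=lambda description: {"echo": description})
token = queue.submit({"filters": {"is_discoverable": True}})
early = queue.fetch(token)   # None: the report has not been generated yet
queue.run_pending()
report = queue.fetch(token)  # the generated report
late = queue.fetch(token)    # None again: the stored description is gone
```

In this sketch the POST only stores a description and the read happens through a separate fetch by token, which is the separation the comment proposes.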
||
|
||
This endpoint provides a custom query API to select items from existing collections. | ||
|
||
An example JSON response document to `/api/contentreports/filtereditems` (metadata removed for brevity): | ||
```json | ||
{ | ||
"id": "filtereditems", | ||
"items": [ | ||
{ | ||
"id": "07e388ff-f22b-4d4f-8275-acab5c3edacc", | ||
"uuid": "07e388ff-f22b-4d4f-8275-acab5c3edacc", | ||
"name": "Enhancing the lubricity of an environmentally friendly Swedish diesel fuel MK1", | ||
"handle": "20.500.11794/42", | ||
"metadata": { | ||
"dc.contributor.author": [ | ||
{ | ||
"value": "Smith, John", | ||
"language": null, | ||
"authority": "6eee383a-f126-4705-9ffb-b4aa4832070e", | ||
"confidence": 600, | ||
"place": 0 | ||
} | ||
], | ||
"dc.publisher": [ | ||
{ | ||
"value": "Elsevier", | ||
"language": "fr_CA", | ||
"authority": null, | ||
"confidence": -1, | ||
"place": 0 | ||
} | ||
] | ||
}, | ||
"inArchive": true, | ||
"discoverable": true, | ||
"withdrawn": false, | ||
"lastModified": "2015-11-23T17:30:21.463+00:00", | ||
"entityType": "Publication", | ||
"owningCollection": { | ||
"id": "d98a828c-45c2-43d9-9861-6b9800bf14f5", | ||
"uuid": "d98a828c-45c2-43d9-9861-6b9800bf14f5", | ||
"name": "Articles publiés dans des revues avec comité de lecture", | ||
"handle": "100/1", | ||
"metadata": { | ||
"dc.identifier.uri": [ | ||
{ | ||
"value": "http://localhost:4000/handle/100/1", | ||
"language": null, | ||
"authority": null, | ||
"confidence": -1, | ||
"place": 0 | ||
} | ||
], | ||
"dspace.entity.type": [ | ||
{ | ||
"value": "Publication", | ||
"language": null, | ||
"authority": null, | ||
"confidence": -1, | ||
"place": 0 | ||
} | ||
] | ||
}, | ||
"type": "collection" | ||
}, | ||
"type": "item" | ||
}, | ||
{ | ||
... | ||
} | ||
], | ||
"itemCount": 40, | ||
"type": "filtereditemsreport", | ||
"_links": { | ||
"self": { | ||
"href": "http://localhost:8080/dspace-server/api/contentreports/filtereditems" | ||
} | ||
} | ||
} | ||
``` | ||
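A client will typically flatten each returned item into a report row. The sketch below does this for a trimmed copy of the first item in the example above. The helper names `metadata_values` and `flatten_item` are hypothetical, and joining multiple values with `"; "` is an arbitrary choice made for the illustration.

```python
# Trimmed copy of the first item in the example response above.
response = {
    "id": "filtereditems",
    "items": [
        {
            "name": "Enhancing the lubricity of an environmentally friendly Swedish diesel fuel MK1",
            "handle": "20.500.11794/42",
            "metadata": {
                "dc.contributor.author": [{"value": "Smith, John", "language": None}],
                "dc.publisher": [{"value": "Elsevier", "language": "fr_CA"}],
            },
            "owningCollection": {
                "name": "Articles publiés dans des revues avec comité de lecture",
                "handle": "100/1",
            },
        }
    ],
    "itemCount": 40,
    "type": "filtereditemsreport",
}


def metadata_values(item, field):
    """Return all values of one metadata field, or an empty list if absent."""
    return [entry["value"] for entry in item.get("metadata", {}).get(field, [])]


def flatten_item(item, fields):
    """Turn one item into a flat row: basic columns plus the requested fields."""
    row = {
        "name": item["name"],
        "handle": item["handle"],
        "collection": item["owningCollection"]["name"],
    }
    for field in fields:
        row[field] = "; ".join(metadata_values(item, field))
    return row


rows = [
    flatten_item(item, ["dc.contributor.author", "dc.contributor.advisor"])
    for item in response["items"]
]
print(rows[0])
```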
|
||
The request is defined as a JSON document like this: | ||
```json | ||
{ | ||
  "collections": [ | ||
    "" | ||
  ], | ||
  "presetQuery": "new", | ||
  "queryPredicates": [ | ||
    { | ||
      "field": "*", | ||
      "operator": null, | ||
      "value": null | ||
    } | ||
  ], | ||
  "pageLimit": "100", | ||
  "filters": { | ||
    "is_discoverable": true, | ||
    "has_multiple_originals": true, | ||
    "has_pdf_original": true | ||
  }, | ||
  "additionalFields": [ | ||
    "dc.contributor.advisor" | ||
  ] | ||
} | ||
``` | ||
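One way to assemble such a request document on the client side is with small data classes, as sketched below. The class names `QueryPredicate` and `FilteredItemsRequest` are hypothetical. The field names and the string-valued `pageLimit` mirror the example above, and the only predicate used is the one shown there, since the set of supported operators is not listed in this document.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryPredicate:
    """A (field, operator, value) tuple, as in the example request."""
    field: str
    operator: Optional[str] = None
    value: Optional[str] = None


@dataclass
class FilteredItemsRequest:
    """Client-side model of the request document (attribute names match the JSON keys)."""
    collections: list = dataclasses.field(default_factory=list)
    presetQuery: str = "new"
    queryPredicates: list = dataclasses.field(default_factory=list)
    pageLimit: str = "100"  # sent as a string, as in the example above
    filters: dict = dataclasses.field(default_factory=dict)
    additionalFields: list = dataclasses.field(default_factory=list)

    def to_json(self):
        # asdict() also converts the nested QueryPredicate instances to dicts.
        return json.dumps(dataclasses.asdict(self))


request = FilteredItemsRequest(
    queryPredicates=[QueryPredicate(field="*")],
    filters={"is_discoverable": True, "has_pdf_original": True},
    additionalFields=["dc.contributor.advisor"],
)
payload = json.loads(request.to_json())
print(payload)
```

Leaving `collections` empty here relies on the rule stated in the parameter list: if no collections are provided, the whole repository is searched.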
|
||
The parameters are specified as follows: | ||
|
||
* collections: The collections in which to search for items. If none are provided, the whole repository is searched. | ||
* presetQuery: This parameter is not used on the REST API side. It identifies a predefined set of query predicates | ||
defined in the Angular layer. | ||
* queryPredicates: Predicates used to filter matching items. They can be predefined (see presetQuery above) | ||
or defined specifically by the user. | ||
* pageLimit: Maximum number of items per page. | ||
* filters: Supplementary filters; these are the same as those available in the Filtered Collections report. | ||
Please see [/api/contentreports/filteredcollections](contentreports-filteredcollections.md) for details. | ||
* additionalFields: Fields to add to the basic report for each item included in the report. | ||
|
||
Possible response status | ||
|
||
* 200 OK - The specific report data was found, and the data has been properly returned. | ||
* 403 Forbidden - if a valid CSRF token is missing when issuing a POST request. |
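
The sketch below prepares a POST request that carries a CSRF token and maps the two documented status codes to messages, without sending anything over the network. The helper names are hypothetical, and the header name `X-XSRF-TOKEN` is an assumption that is not stated in this document and should be checked against the deployment's CSRF configuration.

```python
import json
import urllib.request

ENDPOINT = "/api/contentreports/filtereditems"


def build_post_request(base_url, body, csrf_token):
    """Prepare (but do not send) a POST request carrying the CSRF token."""
    return urllib.request.Request(
        base_url + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed header name; not confirmed by this document.
            "X-XSRF-TOKEN": csrf_token,
        },
        method="POST",
    )


def explain_status(status):
    """Map the documented status codes to short client-side messages."""
    if status == 200:
        return "report returned"
    if status == 403:
        return "valid CSRF token missing"
    return f"unexpected status {status}"


req = build_post_request(
    "http://localhost:8080/dspace-server",
    {"filters": {"is_discoverable": True}},
    csrf_token="example-token",  # placeholder value
)
print(req.get_method(), req.full_url)
```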
Review comment: I'd prefer if we describe GET and POST mode separately. I'm having a hard time understanding the way this is documented. When would someone use GET and when would they use POST? It's unclear if everything below this point in the docs is ONLY for POST or if it also applies to GET? Could we give some more examples here as to what the differences are?

Review comment: I reorganized report parameterization in the Filtered Collections report documentation. I also fixed a few mistakes and added some info I realised was missing (e.g., definition of "basic report" in Filtered Items).

About usage of GET vs. POST, please see my other comments below.