IIIF dragsource #1570

Open
thomasstjerne opened this issue Jun 28, 2021 · 26 comments
@thomasstjerne
Contributor

If a IIIF manifest is given in a multimedia extension, show a draggable IIIF logo on the Occurrence page.

@thomasstjerne thomasstjerne self-assigned this Jun 28, 2021
thomasstjerne added a commit that referenced this issue Jun 28, 2021
@rogerhyam

rogerhyam commented Jul 1, 2021

I've added IIIF manifests to our DwC Archive feed here

https://data.rbge.org.uk/service/dwca/data/darwin_core.zip

This is this dataset within the portal

https://www.gbif.org/dataset/bf2a4bf0-5f31-11de-b67e-b8a03c50a862

Unfortunately it is now just over 100 MB, so the validator won't check it, but I believe it is OK.

The manifests have been added to the end of the simple GBIF multimedia extension. (It starts with jpg images linked in this extension.) There are 491k of them.

For the manifests I have set the dc:type to InteractiveResource and the dc:format to application/ld+json. I have also added another property from Audubon Core, ac:serviceExpectation, and set it to IIIF. These are somewhat arbitrary decisions. My first thought was to put "IIIF" or "IIIF Manifest" in the dc:type, but then I saw that dc:type should be limited to what Dublin Core expects, and I thought it would be more interoperable this way. But I didn't want you to have to guess that this was a IIIF manifest from just the combination of InteractiveResource and the MIME type. In that case the only way to be sure it was a IIIF manifest would be to fetch and parse the JSON-LD, so it is better to have a hint in the ac:serviceExpectation field.

I have put the actual URI of the IIIF Manifest in the dc:identifier field and I have put a link to our Mirador instance with the manifest URI in it in the dc:references field. I'll discuss possible uses of this in the interface in a separate comment.
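
A minimal sketch of how a consumer might recognise such a row from the three hints together. The dictionary keys simply mirror the term names used in this comment, and `is_iiif_manifest` is a hypothetical helper, not something GBIF is known to run.

```python
def is_iiif_manifest(row: dict) -> bool:
    """Return True when a multimedia row carries all three IIIF hints."""
    # Assumption: rows are keyed by the prefixed term names used in this thread.
    expectation = (row.get("ac:serviceExpectation") or "").strip().upper()
    return (
        row.get("dc:type") == "InteractiveResource"
        and row.get("dc:format") == "application/ld+json"
        and expectation == "IIIF"
    )


manifest_row = {
    "dc:type": "InteractiveResource",
    "dc:format": "application/ld+json",
    "ac:serviceExpectation": "IIIF",
    "dc:identifier": "https://iiif.rbge.org.uk/herb/iiif/E00008781/manifest",
}
image_row = {"dc:type": "StillImage", "dc:format": "image/jpeg"}

print(is_iiif_manifest(manifest_row))  # True
print(is_iiif_manifest(image_row))     # False
```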

I'm not attached to this approach and happy to change it if you think something else would be better. It would take just a few minutes.

If you could harvest this then there would be IIIF manifests to display in the interface.

@rogerhyam

How to display the IIIF Icon on the Occurrence Page?

The Icons are pretty simple

<a target="_blank" href="mirador/?manifest=https://iiif.rbge.org.uk/herb/iiif/E00008781/manifest">
  <img 
      src="logo-iiif.png" 
      alt="IIIF Manifest"
      title="Click to use IIIF viewer or drag'n'drop to add to open viewer"
      draggable="true"
   >
</a>

There are some examples on this page:

https://iiif.rbge.org.uk/viewers/

Or our query results page here:

https://data.rbge.org.uk/search/herbarium/?family=&genus=&species=&coll_name=&coll_num=&barcode=&country_name=&region=&major_taxon=&hasimage=1&cfg=vherb.cfg&keywords=

The href contains a link to a viewer with the manifest URI as the manifest param in the query string. This means a click on the icon loads a viewer in a new tab with that manifest loaded. If the link is dragged over another viewer then it parses out the manifest from the query string and adds it to the current workspace.
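
What a receiving viewer does with the dropped link can be sketched roughly as follows. This only illustrates the query-string parsing step, using Python's standard library; it is not Mirador's actual code, and `manifest_from_dropped_url` is a made-up name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def manifest_from_dropped_url(dropped_url: str) -> Optional[str]:
    """Pull the manifest URI out of a dropped viewer link, if present."""
    query = parse_qs(urlparse(dropped_url).query)
    values = query.get("manifest")
    return values[0] if values else None


link = (
    "https://iiif.rbge.org.uk/viewers/mirador/"
    "?manifest=https://iiif.rbge.org.uk/herb/iiif/E00008781/manifest"
)
print(manifest_from_dropped_url(link))
```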

The decision that would need to be made is whether GBIF supply their own viewer or not. Three options are:

  1. Take the URI from the dc:identifier field and add it to a query string that will load an instance of Mirador, UV or another viewer hosted by GBIF with that manifest. This may be more robust, in that GBIF can keep the viewer up to date and there are fewer components in the user experience, but it adds the overhead of maintaining a viewer.

  2. Take the URI from the dc:references field and use that in the href. Clicking on the icon will load the manifest in the viewer of the data supplier. This is easier to implement, but if the supplier's viewer breaks then the user experience breaks.

  3. Link to the IIIF Manifest directly. Clicking the icon would display the JSON file. Currently dragging the icon to Mirador wouldn't work, and I'm not sure about UV, but it may become standard. This is conceptually cleaner but may scare users.
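
The three options differ only in how the href is built, which can be sketched as below. The `GBIF_VIEWER` address is purely hypothetical (no GBIF-hosted viewer URL is given in this thread), and the `identifier` / `references` keys are assumed to hold the dc:identifier and dc:references values.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical address of a GBIF-hosted viewer (option 1); not a real endpoint.
GBIF_VIEWER = "https://viewer.example.org/mirador/"


def icon_href(media: dict, option: int) -> str:
    """Build the href for the IIIF icon under one of the three options."""
    if option == 1:
        # GBIF-hosted viewer, manifest URI passed in the query string.
        return GBIF_VIEWER + "?" + urlencode({"manifest": media["identifier"]})
    if option == 2:
        # The supplier's own viewer link, taken as-is.
        return media["references"]
    if option == 3:
        # The bare manifest URI.
        return media["identifier"]
    raise ValueError("option must be 1, 2 or 3")


media = {
    "identifier": "https://iiif.rbge.org.uk/herb/iiif/E00010016/manifest",
    "references": "https://iiif.rbge.org.uk/viewers/mirador/"
    "?manifest=https://iiif.rbge.org.uk/herb/iiif/E00010016/manifest",
}
for option in (1, 2, 3):
    print(option, icon_href(media, option))
```

Note that `urlencode` percent-encodes the manifest URI in option 1; a viewer that decodes the query string in the usual way gets the same URI back.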

Gotchas: The version of Mirador installed by npm (and available via content delivery system) at the moment doesn't support drag and drop, so it needs to be built from the master branch (which is pretty simple). This may change with the next release. I'm not sure about UV. I think it supports it out of the box. I'll check.

IIIF are working on a Content State specification which should standardise this behaviour (https://iiif.io/api/content-state/0.9/). Any changes are likely to be trivial though.

Sorry - too many words!

thomasstjerne added a commit that referenced this issue Jul 1, 2021
@thomasstjerne
Contributor Author

Thanks @rogerhyam
There was an unknown term in the meta.xml, as it was using simple Multimedia rather than Audubon Core. @MattBlissett made an updated meta.xml, available here.
Can you update the source archive with that one?

@rogerhyam

Thanks @thomasstjerne & @MattBlissett

I've swapped to that metadata file and regenerated the archives. You should be able to attempt harvest again now.

@thomasstjerne
Contributor Author

@rogerhyam the dataset is now re-ingested, and you should see IIIF icons on occurrence pages:
Screenshot at Jul 02 12-37-18

@rogerhyam

Brilliant stuff. Thank you.

Is it OK if I start "marketing" this way of publishing IIIF to GBIF? There are a few collections out there who could implement it quite quickly and others who will be inspired.

@MattBlissett
Member

I'd like a quick check we're using Audubon Core appropriately, ideally with @baskaufs.

We've set these terms:

ac:accessURI = https://iiif.rbge.org.uk/herb/iiif/E00719041/manifest
dc:format = application/json
ac:serviceExpectation = IIIF

Can an IIIF manifest represent multiple, non-equivalent images? ("Equivalent" meaning different resolutions/formats of the same image).


But people can implement this, or be inspired by it, knowing we might ask them to tweak their implementations if it turns out there's a better way.

@rogerhyam

rogerhyam commented Jul 2, 2021

@MattBlissett The relationship is between the IIIF Manifest and the object (a specimen in our case, but it could be a book etc.). The manifest is supplied by the Presentation API and might contain multiple images of the same object and also non-image content like annotations and transcriptions. The images are painted onto canvases in the manifest and are supplied by the IIIF Image API. There are various levels of implementation of the Image API, but mainly it is a tiling and scaling service.
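
To make the manifest/canvas relationship concrete, here is a small sketch that lists the canvases in an already-parsed manifest. It assumes the usual layouts (`sequences[].canvases[]` with `@id` in Presentation 2.x, top-level `items[]` with `id` in Presentation 3.0); the sample manifests are invented, with example.org identifiers.

```python
def canvas_ids(manifest: dict) -> list:
    """List canvas identifiers from a Presentation 2.x or 3.0 manifest."""
    if "items" in manifest:
        # Presentation 3.0: canvases are the manifest's items.
        return [canvas.get("id") for canvas in manifest["items"]]
    ids = []
    # Presentation 2.x: canvases are grouped into sequences.
    for sequence in manifest.get("sequences", []):
        ids.extend(canvas.get("@id") for canvas in sequence.get("canvases", []))
    return ids


# Invented sample data, trimmed to the parts the function reads.
v2_manifest = {
    "@type": "sc:Manifest",
    "sequences": [
        {
            "canvases": [
                {"@id": "https://example.org/iiif/specimen/canvas/1"},
                {"@id": "https://example.org/iiif/specimen/canvas/2"},
            ]
        }
    ],
}
v3_manifest = {
    "type": "Manifest",
    "items": [{"id": "https://example.org/iiif/specimen/canvas/1", "type": "Canvas"}],
}

print(len(canvas_ids(v2_manifest)))  # 2
print(len(canvas_ids(v3_manifest)))  # 1
```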

Here is one of our specimens with 9 images of it

https://iiif.rbge.org.uk/viewers/mirador/?manifest=https://iiif.rbge.org.uk/herb/iiif/E00010016/manifest

We should really add an annotation layer to the manifest that transcribes this handwriting!

And of course here it is through GBIF now

https://www.gbif.org/occurrence/574685369

This example from Yale illustrates well different views of same object. I don't think we will be X-raying herbarium specimens soon but the zoologists might be into this.

https://iiif.rbge.org.uk/viewers/mirador/?manifest=https://manifests.collections.yale.edu/ycba/obj/5005

So any one specimen should only need one IIIF Manifest associated with it.

@baskaufs

baskaufs commented Jul 5, 2021

@MattBlissett @rogerhyam Although we've talked about IIIF in the AC maintenance group, I don't think there is a clear notion about how to handle situations like this. The concept of a "media item" is pretty broad (see https://tdwg.github.io/ac/subtype/ for examples), but I'm not sure that there is a clear idea of how to handle a composite entity like in this example. AC is supposed to be designed to also describe "collections" of media items, which might apply to the set of images described in a manifest, but I don't think anyone had something like an IIIF manifest in mind when AC was ratified. So I think we are somewhat in uncharted territory here.

@rogerhyam

IIIF does not allow for domain specific metadata. It only permits metadata that describes how the annotations are linked to a set of canvases. Domain specific data should always be via a seeAlso link. IIIF is a pure media object in the sense that any image, movie, audio service would be. So I don't think it breaks anything semantically.

@rogerhyam

Should we do something about the "No media identifier provided" message? Could we replace it with a repeat of the IIIF Logo, perhaps with an instruction that the icon can be clicked or dragged to a suitable viewer?

image

@thomasstjerne
Contributor Author

Should we do something about the "No media identifier provided" message? Could we replace it with a repeat of the IIIF Logo, perhaps with an instruction that the icon can be clicked or dragged to a suitable viewer?

Yes - that is the intention. But it requires a little change in the GBIF API: https://api.gbif.org/v1/enumeration/basic/MediaType.
In the occurrence response below, you will see that type and format are missing for the entry published as "InteractiveResource" and "application/ld+json":

    "media": [
      {
        "type": "StillImage",
        "format": "image/jpeg",
        "description": "Image of herbarium specimen E00622961 by Specimen Digitisation Pipeline",
        "creator": "Specimen Digitisation Pipeline",
        "license": "http://creativecommons.org/publicdomain/zero/1.0/",
        "rightsHolder": "Royal Botanic Garden Edinburgh",
        "identifier": "http://repo.rbge.org.uk/image_server.php?kind=1500&path_base64=L2hlcmJhcml1bV9zcGVjaW1lbl9zY2Fucy9FMDAvNjIyLzk2MS85Mzk2MTkuanBn"
      },
      {
        "description": "IIIF Manifest for specimen E00622961",
        "creator": "Royal Botanic Garden Edinburgh",
        "license": "http://creativecommons.org/publicdomain/zero/1.0/",
        "rightsHolder": "Royal Botanic Garden Edinburgh",
        "identifier": "https://iiif.rbge.org.uk/herb/iiif/E00622961/manifest"
      }
    ]
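
Until the API exposes type and format for these entries, a client can only tell that something is missing. A rough sketch of that check, assuming a `media` list shaped like the response above (trimmed here to the relevant keys); `untyped_media` is a hypothetical helper.

```python
def untyped_media(media: list) -> list:
    """Return the media entries that lack a type or a format."""
    return [item for item in media if not item.get("type") or not item.get("format")]


media = [
    {
        "type": "StillImage",
        "format": "image/jpeg",
        "description": "Image of herbarium specimen E00622961 by Specimen Digitisation Pipeline",
    },
    {
        "description": "IIIF Manifest for specimen E00622961",
        "identifier": "https://iiif.rbge.org.uk/herb/iiif/E00622961/manifest",
    },
]

for item in untyped_media(media):
    print(item["identifier"])
```

This only shows why the UI cannot yet label the entry as IIIF: an entry without type and format could be anything.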

@jholetschek

Hi everyone,

Sorry for jumping in a bit late; I've been on vacation.

BGBM also serves the images as IIIF. We could easily add the manifest as an additional MultimediaObject with format=application/ld+json. Would it be possible to index this in a similar way to the AudubonCore extension?

Cheers, Jörg

@baskaufs

I've been pondering how IIIF manifests might fit into Audubon Core. I'm thinking that it might be good to just add a property that could be used to store a manifest for an AC resource, such as an individual media item. It could be a bit complicated for manifests that describe more complex objects with multiple canvases containing multiple media items. But at least in theory, AC is supposed to be able to describe "collections". I think the idea there was institutional collections, but I don't see necessarily why it couldn't be any sort of collection. In that case the multi-image resource could be a collection with a manifest property.

As a point of reference, Wikidata has a property "IIIF manifest" (https://www.wikidata.org/wiki/Property:P6108) whose value is a URL linking to a manifest for a media item. When the IIIF manifest addon (https://www.wikidata.org/wiki/User:Btwashburn/iiif-mirador.js) is installed, an embedded Mirador IIIF viewer appears for any item page that has a value for the IIIF manifest property. For example:

image

Having such a property in Audubon Core would at least in theory allow data users to make use of any manifest data included in an AC record. If there is interest in adding this term, please feel free to open an issue in the Audubon Core issues tracker suggesting such a term addition and the maintenance group would be happy to handle it as a potential addition.

@rogerhyam

I'm always keen to avoid property inflation/multiplication. Analogous media types might be PDF, HTML or multipage TIFF. Should we then have new properties for each of these?

We already have a mechanism to indicate what the media is. Adding a new property would make the ac:serviceExpectation and dc:format redundant. It is a good mechanism in that we don't have to add a new property every time a new media type arrives. Who knows what's around the corner!

@jholetschek

I agree with Roger - IIIF is just another media type which can be provided using the existing mechanisms, so I would prefer to use the existing terms.

Concerning ABCD: I'll be away the coming week but will be back on August 2nd, so happy to work on that/comment after that day.

@baskaufs

I agree that we should avoid adding unnecessary new properties if the existing ones will work. If the design pattern you are working out works, I would like to document it in the Audubon Core examples pages that we are trying to put together. For images, it's this page.

There are a couple practical details that I would like to work out relating to using dc:format and ac:serviceExpectation to identify IIIF manifests.

Currently, JSON-LD is not part of the controlled vocabulary for format. It could be added, but it would have to be considered generic JSON-LD and not be assumed to be a IIIF manifest since JSON-LD can be used to describe anything. The other issue is that although the recommended file extension for JSON-LD is .jsonld, people typically don't use it and just use .json. So there would basically have to be two sets of format values added to the controlled values list:

application/ld+json for MIME type with jsonld as the file extension for JSON-LD and

application/json for MIME type with json as the file extension for vanilla JSON.

Neither of these would unambiguously indicate an IIIF manifest, so this brings us to designating a controlled value for serviceExpectation. The AC Maintenance Group has been on a campaign to formalize controlled vocabularies for terms that need them and so far we haven't done that for serviceExpectation. But maybe this is the time for that. The current values recommended in the notes are online, authenticate, and published(non digital). In the other controlled vocabularies that have been developed so far in TDWG, for controlled value strings we've been trying to follow the convention of using lowerCamelCase with no special characters to minimize the number of variants that people dream up to use. So following that convention, the values would be online, authenticate, and publishedNonDigital. If we added IIIF to the mix, following the pattern would be iiif rather than IIIF as @rogerhyam had given in his example.
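
The lowerCamelCase convention described above can be sketched as a small normaliser. This is only an illustration of the pattern; `to_controlled_value` is a hypothetical helper, not part of any TDWG tooling.

```python
import re

# A "word" is an all-caps run, a (possibly capitalised) lowercase run, or digits.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def to_controlled_value(label: str) -> str:
    """Convert a free-text label to lowerCamelCase with no special characters."""
    words = _WORD.findall(label)
    if not words:
        return ""
    head, *tail = words
    return head.lower() + "".join(w[:1].upper() + w[1:].lower() for w in tail)


for label in ("online", "authenticate", "published(non digital)", "IIIF"):
    print(label, "->", to_controlled_value(label))
```

Applied to the values mentioned here, it leaves `online` and `authenticate` unchanged, turns `published(non digital)` into `publishedNonDigital`, and turns `IIIF` into `iiif`.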

The other complication with serviceExpectation is that for better or worse, the pattern in Audubon Core has been to use xLiteral as the local name for terms whose values are strings and x as the local name for terms whose values are IRIs. Currently, there is no ac:serviceExpectationLiteral but if the existing patterns are to be followed, we should mint that term and create a controlled vocabulary document listing standard value IRIs and controlled value strings as we have for other terms with controlled values. @MattBlissett, how frequently is ac:serviceExpectation currently used (i.e. how much would we break if we followed the AC pattern and switched the controlled string version of the term to ac:serviceExpectationLiteral)?

It would be best to settle all of this sooner rather than later while y'all are working this out since we don't want to break stuff later. We do not have any maintenance group meetings scheduled at the moment but I can schedule one within the next month to get the ball rolling on this.

@jholetschek

@MattBlissett: I can provide a sample ABCD archive with IIIF manifest provided in a similar manner as in Roger's DwC archive - namely, an additional MultimediaObject with format="application/ld+json". Would this end up being harvested in a similar way? That is, with the manifest shown as the IIIF icon?

@MattBlissett
Member

Jörg, I suspect we'd need to add some extra fields for the ABCD→DWC mapping. If you can provide an example (either an archive, or a change to a test dataset you control on gbif-uat.org) we can investigate what we need to do.

@jholetschek

Hi Matt,

I've updated the archive for our herbarium dataset with an additional MultimediaObject for the manifest:
https://www.gbif.org/dataset/85714c48-f762-11e1-a439-00145eb45e9a

I've also talked to the other consumer of this dataset, the OpenUp aggregator for Europeana. Europeana also supports IIIF and the suggested way of providing the manifests seems to be suitable for them as well.

@MortenHofft
Member

MortenHofft commented Sep 20, 2021

I do not know much about iiif. But I miss having a way to link existing images (jpg and similar) to their iiif equivalents in the UI. And ideally without having to parse the manifest.

I imagine something like an identifier that I can use to go to the corresponding canvas/image in e.g. Mirador. So that clicking an image on https://www.gbif.org/occurrence/574685369 takes you to the correct image in Mirador.

A similar thought might be the reason why the same IIIF manifest is included 9 times in https://www.gbif.org/occurrence/574685369?


Secondly, these examples are using the Presentation API (I think). Would it make sense to support the Image API as well (as a simpler version)? E.g. just having https://iiif.rbge.org.uk/herb/iiif/E00010016_f/info.json as the identifier/accessURI.

I ask because OpenLayers and many other libraries support showing IIIF images, but not presentations.
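
For reference, an Image API endpoint identified by its info.json maps to image requests of the form `{base}/{region}/{size}/{rotation}/{quality}.{format}`. A minimal sketch of that mapping follows; it assumes the server accepts the `max` size keyword, which has not been checked against the RBGE service, and `image_url_from_info` is a made-up helper.

```python
def image_url_from_info(
    info_url: str,
    region: str = "full",
    size: str = "max",
    rotation: str = "0",
    quality: str = "default",
    fmt: str = "jpg",
) -> str:
    """Derive an Image API request URL from an info.json URL."""
    suffix = "/info.json"
    if not info_url.endswith(suffix):
        raise ValueError("expected a URL ending in /info.json")
    base = info_url[: -len(suffix)]
    return f"{base}/{region}/{size}/{rotation}/{quality}.{fmt}"


print(image_url_from_info("https://iiif.rbge.org.uk/herb/iiif/E00010016_f/info.json"))
```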

@rogerhyam

@MortenHofft I think you need to parse the manifest. That is the whole point of being able to bind images together. i.e. if you have abaxial and adaxial views of the same specimen where would you specify that but in the manifest?

If you linked to the Image API end point which image would you link to and where would you specify what the image was of and would you include a plain link to the JPEG as well?

None of this precludes having a link to a JPG (or other image) in the "old fashioned" way of linking to a simple image. In fact that link could be to an image served by the IIIF Image API.

The example you give with 9 images illustrates this well. The IIIF manifest is not displayed 8 times. There are 8 images of the specimen that are linked the old way. There is a single IIIF end point we have just added in that represents the whole lot. In the future we'd ideally publish a summary image as a jpeg and the whole object as a IIIF manifest.

@MortenHofft
Member

Hi, thank you for taking time to respond

My aim here is to present the occurrence in the best possible way to the end user.

That is the whole point of being able to bind images together. i.e. if you have abaxial and adaxial views of the same specimen where would you specify that but in the manifest?

I'm not saying that the presentation isn't useful as a way to group, describe and present the resources. Simply that the presentation manifest and the regular images are decoupled as if they had no relation. That makes it more difficult to present the data as an API consumer.

As an example: a natural way to present and interact with that occurrence page is to look at the images, and click on one you find interesting and be taken to that image in e.g. Mirador.

It is in theory perfectly possible to link to a specific image in the presentation (currently it is hardcoded to go to image number 3 as you can see by clicking the iiif link). But since we have no information on the relation then we cannot provide that linking from the jpg to the presentation. It could be as simple as a fixed order (fragile) or a property that defined it. You might as an insider recognise the iiif logo, but most users will scroll to the images and interact with them - not the iiifPresentation.

I'm NOT too keen to parse the manifest, fetch the images and compare them to the provided JPGs to figure out what the correspondence is. Nor am I keen to parse the manifest to extract the thumbnails (if I am lucky and they are present) and show those in combination with the already provided stillImages (most likely leading to duplicated images).

So I'm not suggesting to not use the presentation, but since you ask "if you have abaxial and adaxial views of the same specimen where would you specify that but in the manifest" - couldn't subjectOrientation in AudubonCore be used for that?

The example you give with 9 images illustrates this well. The IIIF manifest is not displayed 8 times. There are 8 images of the specimen that are linked the old way. There is a single IIIF end point we have just added in that represents the whole lot. In the future we'd ideally publish a summary image as a jpeg and the whole object as a IIIF manifest.

There are 9 regular JPG images and then there are 9 IIIF manifests.
https://api.gbif.org/v1/occurrence/574685369/fragment
The manifest is indeed linked 9 times. But I agree it doesn't make sense. The UI has sensibly decided only to show one of them.
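
Collapsing the repeated manifest entries is straightforward on the client side. A minimal sketch, assuming media entries shaped like the API response (the actual UI code is not shown in this thread, and `dedupe_by_identifier` is a hypothetical name):

```python
def dedupe_by_identifier(media: list) -> list:
    """Keep the first entry for each identifier, preserving order."""
    seen = set()
    unique = []
    for item in media:
        identifier = item.get("identifier")
        # Entries without an identifier are kept, since they cannot be compared.
        if identifier is not None:
            if identifier in seen:
                continue
            seen.add(identifier)
        unique.append(item)
    return unique


manifest = {"identifier": "https://iiif.rbge.org.uk/herb/iiif/E00010016/manifest"}
media = [dict(manifest) for _ in range(9)]
print(len(dedupe_by_identifier(media)))  # 1
```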

My last comment was simply whether I should be able to share images linking to the Image API alone (not using the presentation manifest). Not as a replacement, but as an alternative. It was more of a question really, in case it was simpler for some publishers. I guess this is also a way to say that serviceExpectation: IIIF is equating IIIF with the Presentation API. Would it not be more flexible to use serviceExpectation: IIIF/presentation and serviceExpectation: IIIF/image, or similar?

I see that my answer got quite long - I'm certain that a native speaker could have been more concise.

@baskaufs

I do not have any answer about where one would specify multiple views of the same specimen. However, I am keenly interested in determining the answer to that question. The Views Controlled Vocabularies Task Group has finished draft controlled vocabularies for ac:subjectOrientation and ac:subjectPart and is looking for test implementations where we can gather user experience data. This seems like a really important use case (how you would use a IIIF manifest and accompanying Audubon Core records to link related views of a specimen), and we would like to capture what is learned from this effort and incorporate it in a user experience report prior to submitting the vocabularies for public comment.

I have been slow in the cleanup of our draft and now that the TDWG meeting is over, I need to get it in presentable form on the Audubon Core website. However, some draft Google Sheets tables are here. In particular, subjectOrientation_cv, subjectPart_cv, part_collection_join, and skos_collections contain the raw term metadata. The connections between these tables may not be apparent, but I'm going to convert them to structured JSON-LD, which may make the connections more apparent.

@MattBlissett
Member

MattBlissett commented Mar 14, 2022

This will be a task on pipelines / gbif-api, but I'll keep the discussion here until there's a decision.

The next thing we need, prompted by Roger, is to identify the existence of IIIF manifests in the GBIF occurrence API response.

Normal images look like this: https://api.gbif.org/v1/occurrence/574685369

    {
      "type": "StillImage",
      "format": "image/jpeg",
      "description": "Image of herbarium specimen E00010016 by Specimen Digitisation Pipeline",
      "creator": "Specimen Digitisation Pipeline",
      "license": "http://creativecommons.org/publicdomain/zero/1.0/",
      "rightsHolder": "Royal Botanic Garden Edinburgh",
      "identifier": "http://repo.rbge.org.uk/image_server.php?kind=1500&path_base64=L2hlcmJhcml1bV9zcGVjaW1lbl9zY2Fucy9FMDAvMDEwLzAxNi81ODQ0Mi5qcGc="
    },

i.e. type = StillImage and format = image/*.

The IIIF manifests don't show either type or format:

    {
      "description": "IIIF Manifest for specimen E00010016",
      "creator": "Royal Botanic Garden Edinburgh",
      "license": "http://creativecommons.org/publicdomain/zero/1.0/",
      "rightsHolder": "Royal Botanic Garden Edinburgh",
      "identifier": "https://iiif.rbge.org.uk/herb/iiif/E00010016/manifest"
    },
  1. We could add an additional MediaType from the dcmi terms list. The RBGE example is using InteractiveResource, although I'm not certain that's best ("Examples include forms on Web pages, applets, multimedia learning objects, chat services, or virtual reality environments"), probably dcmi-type:Collection is more appropriate -- it seems IIIF can support text, audio and video too ("A collection is described as a group; its parts may also be separately described."). It could also be Service.

  2. We could also expand the format to include a profile/schema, as shown on IIIF content negotiation, so we'd have type=Collection + format=application/ld+json;profile="http://iiif.io/api/image/3/context.json".

  3. Or we could put that profile in a new term, which would be like ac:serviceExpectation but with a schema rather than a vocabulary.

We'd then have something like this:

    {
      "type": "Collection",
      "format": "application/ld+json;profile=\"http://iiif.io/api/image/3/context.json\"",
      "description": "IIIF Manifest for specimen E00010016",
      "creator": "Royal Botanic Garden Edinburgh",
      "license": "http://creativecommons.org/publicdomain/zero/1.0/",
      "rightsHolder": "Royal Botanic Garden Edinburgh",
      "identifier": "https://iiif.rbge.org.uk/herb/iiif/E00010016/manifest"
    },
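
If the format did carry a profile like this, a client would need to split the media type from its parameters. A simple sketch of that split is below; it is deliberately naive (it does not handle `;` or escaped quotes inside quoted values), and `split_format` is a hypothetical helper.

```python
def split_format(value: str) -> tuple:
    """Split a format string into (media type, parameters)."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        name, _, param_value = raw.partition("=")
        # Drop optional surrounding double quotes from the parameter value.
        params[name.strip().lower()] = param_value.strip().strip('"')
    return media_type.lower(), params


media_type, params = split_format(
    'application/ld+json;profile="http://iiif.io/api/image/3/context.json"'
)
print(media_type)         # application/ld+json
print(params["profile"])  # http://iiif.io/api/image/3/context.json
```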

@rogerhyam

rogerhyam commented Mar 14, 2022

Thanks Matt.

  1. My life has become easier since I've become semantically promiscuous so I'm not too bothered about which particular term is used but ... When I see a Collection of objects I assume they are of one kind or have something in common: a list of things rather than a complex object as described in a IIIF manifest. A collection of Nodes in a DOM is an example. The IIIF Presentation API describes itself in the standard as: "The objective of the IIIF Presentation API is to provide the information necessary to allow a rich, online viewing environment for compound digital objects to be presented to a human user, often in conjunction with the IIIF Image API." Which sounds like an InteractiveResource to me. A rich online viewing environment could be an applet or single page React.js application to me. But again, I'm easy on semantics if someone wants to pick a word.
  2. I don't think we should expand the schema to include the profile. The example of content negotiation is how a client may call an endpoint if it only understands one version of the API, and it occurs after the client has found the API end point. It should be possible for a data provider to add/remove different protocol versions to their server without having to update all their records in GBIF. We don't want to wrap up the location/identity of the end point with the particular version of the protocol it serves. The example you cite is for the Image API. I don't think we should be linking from GBIF to a IIIF Image API end point. Links should always go to a IIIF Manifest document (which may be trivially simple). As above, the provider should be able to swap out their image server and not have to update all their records in GBIF. Also the Manifest is what binds the labels, metadata and other images together. An Image API on its own isn't very exciting.
  3. At the moment we just put "IIIF" in service expectation but we could put something more descriptive.

I guess this comes to how much data to keep in the index and how much the client is expected to work out by calling the URI and parsing the Manifest. We don't want to have to update the index if the content of the Manifest changes in any way that is unrelated to discovery of the resource.

My preference is that we need the minimum here for the client to know that it has to call a URI to get some JSON-LD and a hint that it contains a IIIF Manifest of some version.

dcterms:MediaType => InteractiveResource
dc:format => application/ld+json
ac:serviceExpectation => IIIF

That does it, but some other combination would also work. It would be better to have a working solution than to spend too long on semantics.
