Index Nextcloud content in an external indexation platform
For a ready-to-use Full Text Search feature with Nextcloud based on Elasticsearch, see https://github.com/nextcloud/fulltextsearch_elasticsearch/wiki
This documentation explains how to index the content and metadata of documents stored on Nextcloud in a third-party indexation platform.
The Full Text Search app maintains a collection (list) of the documents that have changed since the last indexation.
The following APIs can be used by a third-party app to:
- query the collection to find out which documents have changed,
- retrieve the content and metadata of a document (which can then be indexed in any indexation platform),
- update the collection to record which documents have been indexed.
Create one collection for each script that indexes the content.
To list all collections:
./occ fulltextsearch:collection:list
To create a new collection:
./occ fulltextsearch:collection:init <collectionName>
To destroy a collection:
./occ fulltextsearch:collection:delete <collectionName>
Using the OCS API requires admin rights on the account used.
Get the list of documents that need to be indexed (using test as the collection name):
curl -X GET "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/index?format=json&length=50" -H "OCS-APIRequest: true" -u "admin:password"
{
"ocs": {
"meta": {
"status": "ok",
"statuscode": 200,
"message": "OK"
},
"data": [
{
"url": "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/document/files/597996",
"status": 28
}
]
}
}
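The same request can be issued from a script. The following is a minimal Python sketch using only the standard library. The helper names (build_request, parse_index_listing, list_pending) are hypothetical and not part of the app, and BASE_URL and the credentials are the placeholders from the curl example above.

```python
import base64
import json
import urllib.request

# Placeholder host taken from the curl example; replace with your own instance.
BASE_URL = "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch"


def build_request(url, user, password, method="GET"):
    """Build a request carrying the OCS header and basic auth, as in the curl example."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method=method,
        headers={"OCS-APIRequest": "true", "Authorization": f"Basic {token}"},
    )


def parse_index_listing(body):
    """Return (url, status) pairs from the JSON body of the index listing."""
    payload = json.loads(body)
    return [(item["url"], item["status"]) for item in payload["ocs"]["data"]]


def list_pending(collection, user, password, length=50):
    """Ask the collection which documents need indexing (performs a network call)."""
    url = f"{BASE_URL}/collection/{collection}/index?format=json&length={length}"
    with urllib.request.urlopen(build_request(url, user, password)) as resp:
        return parse_index_listing(resp.read())
```

Calling list_pending("test", "admin", "password") against a real instance would return a list such as [(document URL, 28)], mirroring the response shown above.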
- url is the link to the document,
- status is a bitflag based on this list:
  - 4 => meta has been modified,
  - 8 => content has been modified,
  - 16 => parts have been modified,
  - 32 => document has been removed.
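Because status is a bitflag, several values can be combined: the 28 in the example response is 4 + 8 + 16. A small Python sketch of how a script might decode it follows; the labels and the decode_status helper are illustrative names, and only the four values documented above are handled.

```python
# Flag values copied from the list above; the labels are illustrative only.
STATUS_FLAGS = {
    4: "meta modified",
    8: "content modified",
    16: "parts modified",
    32: "document removed",
}


def decode_status(status):
    """Return the labels of every documented flag set in the status bitfield."""
    return [label for bit, label in STATUS_FLAGS.items() if status & bit]


print(decode_status(28))  # ['meta modified', 'content modified', 'parts modified']
```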
Get the data and metadata of a document:
curl -X GET "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/document/files/597996" -H "OCS-APIRequest: true" -u "admin:password"
{
"ocs": {
"meta": {
"status": "ok",
"statuscode": 200,
"message": "OK"
},
"data": {
"id": "597996",
"providerId": "files",
"access": {
"ownerId": "cult",
"viewerId": "",
"users": ["test1", "test2"],
"groups": [],
"circles": [],
"links": []
},
"index": {
"ownerId": "cult",
"providerId": "files",
"collection": "test",
"source": "files_local",
"documentId": "597996",
"lastIndex": 0,
"errors": [],
"errorCount": 0,
"status": 28,
"options": []
},
"title": "640-240-max.png",
"link": "http://nc23.local/index.php/f/597996",
"parts": {
"comments": "<test1> This is a comment !"
},
"content": "VGhlIHF1aWNrIGJyb3duIGZveApqdW1wcyBvdmVyCnRoZSBsYXp5IGRvZy4=",
"isContentEncoded": 1
}
}
}
content is encoded with base64. For a text file, the text itself is available as the encoded content. For an Office document, the whole content of the file is available this way. For an image, the content is the OCR output; this is the case for the file used in the current example:
$ php -r "echo base64_decode('VGhlIHF1aWNrIGJyb3duIGZveApqdW1wcyBvdmVyCnRoZSBsYXp5IGRvZy4=');"
The quick brown fox
jumps over
the lazy dog.
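An indexing script would typically do the same decoding in its own language. The Python sketch below assumes that the decoded content is UTF-8 text and that a falsy isContentEncoded means the content is already plain text; both are assumptions, and document_text is a hypothetical helper name.

```python
import base64


def document_text(document):
    """Return the document content as text, decoding base64 when it is flagged."""
    content = document.get("content", "")
    if document.get("isContentEncoded"):
        # Assumption: the decoded bytes are UTF-8 text.
        return base64.b64decode(content).decode("utf-8", errors="replace")
    return content


# The "data" object from the example response, reduced to the relevant fields.
doc = {
    "content": "VGhlIHF1aWNrIGJyb3duIGZveApqdW1wcyBvdmVyCnRoZSBsYXp5IGRvZy4=",
    "isContentEncoded": 1,
}
print(document_text(doc))
```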
Mark the document as indexed:
curl -X POST "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/document/files/597996/done" -H "OCS-APIRequest: true" -u "admin:password"
{
"ocs": {
"meta": {
"status": "ok",
"statuscode": 200,
"message": "OK"
},
"data": []
}
}
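Put together, the three calls form one indexing pass: list the changed documents, fetch each one, then mark it as done. The Python sketch below shows one possible shape for such a pass. All names are hypothetical: send(method, url) is a caller-supplied function that performs the authenticated OCS request and returns the parsed JSON, and index_document / remove_document are hooks into the third-party platform. Skipping the fetch for removed documents and still calling /done on them is an assumption, not documented behaviour.

```python
def run_indexing_pass(collection_url, send, index_document, remove_document, length=50):
    """Process one batch of changed documents and return how many were handled.

    collection_url: e.g. ".../apps/fulltextsearch/collection/test"
    send(method, url): hypothetical transport returning the parsed JSON response
    index_document(data), remove_document(url): hooks into the external index
    """
    listing = send("GET", f"{collection_url}/index?format=json&length={length}")
    processed = 0
    for entry in listing["ocs"]["data"]:
        if entry["status"] & 32:  # 32 => document has been removed
            remove_document(entry["url"])
        else:
            document = send("GET", entry["url"])["ocs"]["data"]
            index_document(document)
        # Tell Nextcloud that this document has been handled.
        send("POST", entry["url"] + "/done")
        processed += 1
    return processed
```

Keeping the transport as a parameter lets the loop be exercised with a fake send function before pointing it at a real instance.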