[ADMIN ] Starting user script with executable='/sist2-admin/scripts/test/run.sh', index_path='/sist2-admin/scan-TEST_v2-2023-12-11 15:27:05.254099.sist2', extra_args=''
[INFO ] Instantiating the Index...
[INFO ] Iterating through the documents...
[INFO ] Could not decode to UTF-8 column 'json_data' with text '{"extension":"pdf","name":"DOC NAME","path":"PATH/TO/DOC
[INFO ] [ERROR] Something went wrong with the doc loop!
[INFO ] Finished Processing 5040 documents.
User Script:
import sys

# Sist2Index is provided by the sist2-python package
from sist2 import Sist2Index

print("Instantiating the Index...")
index = Sist2Index(sys.argv[1])
print("Iterating through the documents...")
docs = 0
try:
    # Count every document the index yields
    for doc in index.document_iter():
        docs += 1
except Exception as error:
    print(error)
    print("[ERROR] Something went wrong with the doc loop!")
print("Finished Processing %d documents." % docs)
I am not sure where the non-UTF-8 data is coming from. The document identified is a PDF, and it includes pages that were OCR'd during the scan, so maybe it came in through that?
I am unsure how to identify the character that is causing the issue. The sqlite reader I have used seems to handle it gracefully.
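One way to narrow this down might be to read the column as raw bytes and let Python report where decoding fails. The following is only a sketch under assumptions: it treats the `.sist2` index as a plain SQLite file, and the table name `document` is a guess (only the `json_data` column name appears in the log above). The helper names `first_invalid_utf8` and `scan_column` are made up for illustration.

```python
import sqlite3


def first_invalid_utf8(raw: bytes, context: int = 20):
    """Return (offset, surrounding bytes) for the first undecodable byte, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        lo = max(err.start - context, 0)
        return err.start, raw[lo:err.end + context]
    return None


def scan_column(db_path: str, table: str = "document", column: str = "json_data"):
    """Yield (rowid, offset, context) for each row whose column is not valid UTF-8.

    The default table name is an assumption; adjust it to the actual schema.
    """
    conn = sqlite3.connect(db_path)
    # Hand TEXT values back as raw bytes so sqlite3 does not try to decode them.
    conn.text_factory = bytes
    try:
        query = f"SELECT rowid, {column} FROM {table}"
        for rowid, value in conn.execute(query):
            if isinstance(value, bytes):
                hit = first_invalid_utf8(value)
                if hit is not None:
                    yield (rowid, *hit)
    finally:
        conn.close()
```

Calling `list(scan_column(path_to_index))` would then give the row, the byte offset, and a few bytes of context around each bad value, which should be enough to see which character in the OCR'd text is responsible.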
I am getting the above error message.