
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte #8

Closed
ghost opened this issue Mar 30, 2020 · 12 comments


ghost commented Mar 30, 2020

@fengxinjie
When running predict.py I get the error below.
I am using IC15.pth and the image below, along with resnet101.pth from https://download.pytorch.org/models/resnet101-5d3b4d8f.pth

(mben) home@home-desktop:~/p13/Transformer-OCR$ python predict.py 
/home/home/p13/Transformer-OCR/model.py:255: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  nn.init.xavier_uniform(p)
Traceback (most recent call last):
  File "predict.py", line 81, in <module>
    do_folder('./images/1.jpg')
  File "predict.py", line 66, in do_folder
    for line in open(root).readlines():
  File "/home/home/anaconda3/envs/mben/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

The image I am trying to predict: [attached image, 1.jpg]

@decoder746

From where did you get the "IC15.pth" file?

ghost commented Mar 30, 2020

@decoder746 were you able to train a new model?

decoder746 commented Mar 30, 2020

As for your problem: do_folder(root) requires as input a text file with one image path and label per line, not an image. Passing './images/1.jpg' makes Python try to read the JPEG bytes as UTF-8 text, which is why it fails on the 0xff byte (the first byte of a JPEG file). To get a prediction for a single image, you can use the following function instead of do_folder:

def do_image(img): 
    img = cv2.imread(img)
    img = resize(img) / 255.
    img = np.transpose(img, (2, 0, 1))
    img = torch.from_numpy(img).float().unsqueeze(0).cuda()
    pred = greedy_decode(img)
    print(pred)
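
For example, with the model and helpers from predict.py already set up, you can call it on the image from your traceback (a minimal usage sketch):

do_image('./images/1.jpg')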

@decoder746

@deepseek I thought it was the pretrained model given by the author

ghost commented Mar 30, 2020

It is.
Download the repo:
https://github.com/fengxinjie/Transformer-OCR/tree/76c321fb89be51c1718b98f5c5c446633614f97b

then

cd checkpoints && cat IC1500 IC1501 > IC15.zip && unzip IC15.zip
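
As a quick sanity check that the reassembled checkpoint is valid, you can try loading it into the model (a minimal sketch, assuming the unzipped file is named IC15.pth and using make_model/char2token from this repo, as in predict.py below):

import torch
from model import make_model
from dataset import char2token

# Build the model with the repo's vocabulary size and load the reassembled weights on CPU.
model = make_model(len(char2token))
state = torch.load('IC15.pth', map_location='cpu')
model.load_state_dict(state)
print('checkpoint loaded, total parameters:', sum(p.numel() for p in model.parameters()))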

ghost commented Mar 30, 2020

@decoder746
Please upload your modified predict.py so I can run it.
Also, from where did you download resnet101.pth?

@decoder746

I downloaded resnet101.pth from https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
My predict.py is below, but its predictions seem to be completely off (this might be the wrong way to do it).

import torch
from torch.autograd import Variable
import numpy as np
from model import make_model
from dataset import vocab, char2token, token2char
from dataset import subsequent_mask
import cv2
import sys, os

# Build the model with the repo's vocabulary and load the IC15 checkpoint.
model = make_model(len(char2token))
model.load_state_dict(torch.load('IC15.pth'))
model.cuda()
model.eval()
# All-ones source mask of shape [1, 1, 36] (np.bool is deprecated; plain bool works).
src_mask = Variable(torch.from_numpy(np.ones([1, 1, 36], dtype=bool)).cuda())
SIZE = 96  # images are resized and padded to SIZE x SIZE in resize() below

def greedy_decode(src, max_len=36, start_symbol=1):
    global model
    global src_mask
    # Encode the image once, then decode one character at a time.
    memory = model.encode(src, src_mask)
    ys = torch.ones(1, 1).fill_(start_symbol).long().cuda()  # start token
    for i in range(max_len-1):
        out = model.decode(memory, src_mask,
                           Variable(ys),
                           Variable(subsequent_mask(ys.size(1))
                                    .long().cuda()))
        prob = model.generator(out[:, -1])
        _, next_word = torch.max(prob, dim=1)  # greedy: take the most likely token
        next_word = next_word.data[0]
        ys = torch.cat([ys,
                        torch.ones(1, 1).long().cuda().fill_(next_word)], dim=1)
        if token2char[next_word.item()] == '>':  # '>' marks end of sequence
            break
    ret = ys.cpu().numpy()[0]
    out = [token2char[i] for i in ret]
    out = "".join(out[1:-1])  # drop the start and end tokens
    return out

def resize(img):
    # Resize keeping the aspect ratio, then zero-pad to a SIZE x SIZE square.
    h, w, c = img.shape
    if w > h:
        nw, nh = SIZE, int(h * SIZE/w)
        if nh < 10 : nh = 10
        #print(h, w, nh, nw)
        img = cv2.resize(img, (nw, nh))
        a1 = int((SIZE-nh)/2)
        a2= SIZE-nh-a1
        pad1 = np.zeros((a1, SIZE, c), dtype=np.uint8)
        pad2 = np.zeros((a2, SIZE, c), dtype=np.uint8)
        img = np.concatenate((pad1, img, pad2), axis=0)
    else:
        nw, nh = int(w * SIZE/h), SIZE
        if nw < 10 : nw = 10
        #print(h, w, nh, nw)
        img = cv2.resize(img, (nw, nh))
        a1 = int((SIZE-nw)/2)
        a2= SIZE-nw-a1
        pad1 = np.zeros((SIZE, a1, c), dtype=np.uint8)
        pad2 = np.zeros((SIZE, a2, c), dtype=np.uint8)
        img = np.concatenate((pad1, img, pad2), axis=1)
    return img

def do_folder(root):
    # root is a text file where each line is "image_path<TAB>label".
    hit = 0  # counts mismatches, so the printed ratio is the error rate
    all = 0
    for line in open(root).readlines():
        all += 1
        imp, label = line.strip('\n').split('\t')
        img = cv2.imread(imp)
        img = resize(img) / 255.
        img = np.transpose(img, (2, 0, 1))  # HWC -> CHW
        img = torch.from_numpy(img).float().unsqueeze(0).cuda()
        pred = greedy_decode(img)
        if pred != label:
            hit += 1
            print('imp:', imp, 'label:', label, 'pred:', pred, hit, all, hit/all)
    print(hit, all, hit/all)

def do_image(img):
    # Predict the text in a single cropped image file.
    img = cv2.imread(img)
    img = resize(img) / 255.
    img = np.transpose(img, (2, 0, 1))
    img = torch.from_numpy(img).float().unsqueeze(0).cuda()
    pred = greedy_decode(img)
    print(pred)

if __name__ == '__main__':
    do_image("img88.jpg")
    # do_folder('your-test-lines')

@ghost
Copy link
Author

ghost commented Mar 30, 2020

Hmmmm...
What about training?
For train.py, what should be the structure of the list for your-train-lines?

@decoder746

The your-train-lines file consists of lines where each line has the structure image_path \t label \n, as can be seen in the comments in dataset.py under __getitem__; a sketch of the format follows below. If there are one or more such files, you have to pass them as a list of files. I don't know how to handle the case where one image has more than one label (some text written on the top and some on the bottom).
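
A minimal sketch of what such a training-lines file and its parsing could look like (hypothetical file name, paths, and labels; the real parsing is in dataset.py under __getitem__):

# your-train-lines: one "image_path\tlabel" pair per line, for example
#   ./crops/word_001.jpg\tHELLO
#   ./crops/word_002.jpg\tWORLD
with open('your-train-lines', encoding='utf-8') as f:
    for line in f:
        imp, label = line.strip('\n').split('\t')
        print(imp, label)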

ghost commented Mar 31, 2020

@decoder746 that's why I am asking, since it's not clear.
But according to the developer, it seems he cropped the words and then trained with the image \t label structure.
If you can give it a try, that would be great. Though I am still sceptical, since the developer "stated" having a very high accuracy rate, and he is neither replying back nor disclosing any sort of training documentation.

ghost closed this as completed Mar 31, 2020
@gussmith

@deepseek

cd checkpoints && cat IC1500 IC1501 > IC15.zip && unzip IC15.zip

How did you know to do this? (It works, by the way, and the checkpoint loads without complaining.)
Is splitting the model across multiple files some kind of convention to work around file size limits on some sites?

@gussmith

@fengxinjie
When running predict.py I get the error below.
I am using IC15.pth and the image below, along with resnet101.pth from https://download.pytorch.org/models/resnet101-5d3b4d8f.pth

Use this to strip the hidden symbol (the byte-order mark) at the start of the file:

            lines = open(f,encoding='utf-8-sig').readlines()
            self.lines += [i for i in lines if not illegal(i.strip('\n').split(', ')[1].strip('"'))]
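
For context, here is a small self-contained demonstration of that hidden symbol being stripped: a file written with a UTF-8 byte-order mark keeps an invisible '\ufeff' prefix when read as plain utf-8, while utf-8-sig removes it (a sketch with a made-up file name and label):

import codecs

# Write a label file that starts with a UTF-8 BOM, as some editors do.
with open('labels_with_bom.txt', 'wb') as f:
    f.write(codecs.BOM_UTF8 + b'img88.jpg\tHELLO\n')

# Plain utf-8 keeps the BOM as an invisible '\ufeff' on the first line...
print(repr(open('labels_with_bom.txt', encoding='utf-8').readline()))
# -> '\ufeffimg88.jpg\tHELLO\n'

# ...while utf-8-sig strips it, so the first path parses cleanly.
print(repr(open('labels_with_bom.txt', encoding='utf-8-sig').readline()))
# -> 'img88.jpg\tHELLO\n'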
