Retrain model #9

1653100 · 2020-02-13T08:58:47Z

Hello,
I have used your package and it is very useful. However, my data is in Vietnamese, which uses Unicode characters, and the model does not work well on it. Can I use your code to retrain a new model on my own Vietnamese data? If so, could you please help me? Thank you very much.
For example, in the Unicode address "Số nhà 25, ngõ 294 Kim Mã, Phường Kim Mã, Quận Ba Đình, Thành phố Hà Nội", "street" is now "ngõ", "state" is now "Quận", ...
Sorry for my bad English,
Looking forward to hearing from you soon.

jasonrig · 2020-02-21T04:17:10Z

Your English is completely fine, don't worry!

This model is trained only on Australian address data, so it will not work at all for Vietnamese addresses, and it will probably have a lot of problems with addresses from any other country.

The model itself is quite simple, so you can retrain it. You can see from my answer in issue #10 that the model produces one class per character. Since you are using Unicode characters for the Vietnamese language, there are many more possible characters than in the standard English alphabet (e.g. ă, â, đ, ê, ô, ơ). So, you have a choice:

- expand the number of possible characters ("vocabulary") to be bigger
- find a method to reduce the characters with accent marks back to their base character, e.g. ă, â -> a
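
As a rough illustration of the two options above, here is a minimal Python sketch. It is not taken from this package: `build_vocab` and `strip_accents` are hypothetical helper names, and reserving index 0 for padding or unknown characters is an assumption. The accent stripping relies on the standard library's `unicodedata` module. Note that "đ" has no canonical decomposition, so it is mapped by hand.

```python
import unicodedata


def build_vocab(addresses):
    """Option 1: collect every character seen in the data into a vocabulary.

    Index 0 is left free for padding/unknown characters (an assumption,
    not necessarily how the original model encodes its input).
    """
    chars = sorted({ch for addr in addresses for ch in addr})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def strip_accents(text):
    """Option 2: reduce accented characters to their base character."""
    # "đ"/"Đ" do not decompose under NFD, so replace them explicitly.
    text = text.replace("\u0111", "d").replace("\u0110", "D")
    # NFD splits e.g. "ă" into "a" plus a combining breve.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_accents("Quận Ba Đình, Thành phố Hà Nội"))
```

Stripping accents keeps the vocabulary small but discards information that may distinguish Vietnamese words, so which option works better is likely to depend on the data.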

Once you have decided how you will approach the problem, you need to find a structured database of addresses. You can use this to automatically generate labelled training data.
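
To make the idea of generating labelled data concrete, here is a small Python sketch under stated assumptions. The field names in `FIELDS`, the label numbering, and the `label_address` helper are all hypothetical and are not this package's actual data pipeline. It joins the fields of one structured record into an address string and emits one class per character, with class 0 for separator characters.

```python
# Hypothetical field names; a real database will have its own schema.
FIELDS = ["house_number", "street", "ward", "district", "city"]
# Class 0 is reserved for separators between fields (an assumption).
LABELS = {name: i + 1 for i, name in enumerate(FIELDS)}


def label_address(record, separator=", "):
    """Build (text, labels) from one structured record.

    `labels` has exactly one integer class per character of `text`.
    Fields that are missing or empty in the record are skipped.
    """
    parts, labels = [], []
    for field in FIELDS:
        value = record.get(field)
        if not value:
            continue
        if parts:
            # Separator characters between fields get class 0.
            parts.append(separator)
            labels.extend([0] * len(separator))
        parts.append(value)
        labels.extend([LABELS[field]] * len(value))
    return "".join(parts), labels


if __name__ == "__main__":
    text, labels = label_address({"house_number": "25", "street": "Kim Ma"})
    print(text)
    print(labels)
```

In practice it may also help to vary the separator, field order, and abbreviations when generating examples, so that the model sees the kinds of variation found in real input.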

1653100 · 2020-03-14T16:42:44Z

Thank you so much. Your answer helped me a lot.
I have rebuilt the model using Keras, and it runs well. However, it doesn't work as well as yours: the predictions still mislabel some characters.
By the way, could I have an outline of your model, such as the order of layers, the number of layers, and so on?
Once again, thank you a lot. ^^
Have a nice day.

jasonrig added the question Further information is requested label Feb 21, 2020
