This repository was archived by the owner on Feb 24, 2025. It is now read-only.

Retrain model #9

Open
1653100 opened this issue Feb 13, 2020 · 2 comments
Labels
question Further information is requested

Comments

@1653100

1653100 commented Feb 13, 2020

Hello,
I have used your package and it is very useful. However, my data is in Vietnamese, which uses Unicode characters, and the package does not work well on it. Can I use your code to retrain a new model on my own Vietnamese data? If so, could you please help me? Thank you very much.
For example: "Số nhà 25, ngõ 294 Kim Mã, Phường Kim Mã, Quận Ba Đình, Thành phố Hà Nội". Here "street" is "ngõ", "state" is "Quận", ...
Sorry for my bad English,
Looking forward to hearing from you soon.

@jasonrig
Owner

Your English is completely fine, don't worry!

This model is trained only on Australian address data, so it will not work at all for Vietnamese addresses, and it will probably have a lot of problems with addresses from any other country.

The model itself is quite simple, so you can retrain it. You can see from my answer in issue #10 that the model produces one class per character. Since Vietnamese is written with Unicode characters, there are many more possible characters than in the standard English alphabet (e.g. ă, â, đ, ê, ô, ơ). So, you have a choice:

  1. expand the number of possible characters ("vocabulary") to be bigger
  2. find a method to reduce the characters with accent marks back to their base character, e.g. ă, â -> a
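
As a rough illustration of the two options, here is a minimal sketch using only the Python standard library. The function names `strip_accents` and `build_vocabulary` are hypothetical and are not part of this package. Note that "đ" has no Unicode decomposition, so it needs an explicit mapping.

```python
import unicodedata

# "đ"/"Đ" are standalone letters with no canonical decomposition,
# so Unicode normalization alone will not turn them into "d"/"D".
_EXTRA = str.maketrans({"đ": "d", "Đ": "D"})


def strip_accents(text: str) -> str:
    """Option 2: reduce accented characters to their base character."""
    # NFD splits e.g. "ố" into "o" plus two combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base characters.
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.translate(_EXTRA)


def build_vocabulary(addresses):
    """Option 1: build a larger character vocabulary from the data itself."""
    # NFC makes sure each accented letter is a single code point.
    chars = sorted(
        {ch for address in addresses for ch in unicodedata.normalize("NFC", address)}
    )
    # Index 0 is left free here on the assumption that it is used for padding.
    return {ch: index + 1 for index, ch in enumerate(chars)}
```

Option 2 keeps the vocabulary small but discards information that may distinguish Vietnamese words, so which option works better is something to check on your own data.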

Once you have decided how you will approach the problem, you need to find a structured database of addresses. You can use this to automatically generate labelled training data.
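
To make the idea of generating labelled data concrete, here is a minimal sketch that assumes each database record is already split into named fields. The function name `label_characters`, the field names, and the `"separator"` class are all hypothetical, and this is not necessarily how the original training data was produced. It joins the fields into one address string and emits one label per character, matching the one-class-per-character output described above.

```python
def label_characters(fields, separator=", "):
    """Join (field_name, value) pairs into one string with a label per character.

    `fields` is a list of (field_name, value) tuples in the order they
    should appear in the address. Characters belonging to the separator
    get the hypothetical class name "separator".
    """
    text_parts = []
    labels = []
    for position, (name, value) in enumerate(fields):
        if position > 0:
            text_parts.append(separator)
            labels.extend(["separator"] * len(separator))
        text_parts.append(value)
        # Every character of the value is labelled with its field name.
        labels.extend([name] * len(value))
    return "".join(text_parts), labels


# Usage with made-up field names:
record = [("street", "ngõ 294 Kim Mã"), ("district", "Quận Ba Đình")]
text, labels = label_characters(record)
```

Varying the separator, field order, and abbreviations when generating examples should help the model cope with the different ways people write addresses.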

jasonrig added the question label Feb 21, 2020
@1653100
Author

1653100 commented Mar 14, 2020

Thank you so much. Your answer helped me a lot.
I have rebuilt the model using Keras, and it runs well. However, it does not work as well as yours: the predictions still mislabel some characters.
By the way, could I have your model outline, such as the number and order of the layers?
Once again, thank you a lot. ^^
Have a nice day.
