spaCy model trained on a Norwegian corpus converted from OBT to Universal Dependencies.

ohenrik/nb_dep_ud_sm

Experimental Norwegian (Bokmål) language model for spaCy

This model is based on the Norwegian Bokmål Universal Dependencies dataset, which can be found here:

https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal

Command used to train the model:

batch_from=16 batch_to=64 python -m spacy train nb model_out no_bokmaal-ud-train.json no_bokmaal-ud-dev.json -n 30
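
In the command above, `batch_from` and `batch_to` are set as environment variables rather than passed as command-line flags. They appear to be read by the spaCy 2.x training script to control a batch size that starts small and grows during training. The sketch below only illustrates that idea in plain Python. The helper name `compounding_batch_sizes` and the growth factor `1.001` are assumptions made for illustration and are not taken from this repository.

```python
from itertools import islice


def compounding_batch_sizes(start, stop, compound):
    """Yield batch sizes that grow geometrically from start and are capped at stop.

    Hypothetical helper for illustration; not part of spaCy's API.
    """
    size = float(start)
    while True:
        yield int(size)
        # Grow the size by the compound factor, but never beyond the cap.
        size = min(size * compound, float(stop))


# 16 and 64 come from batch_from / batch_to in the training command;
# the growth factor 1.001 is an assumed value.
schedule = compounding_batch_sizes(16, 64, 1.001)
first_sizes = list(islice(schedule, 2000))
print(first_sizes[0], first_sizes[-1])
```

With these values the schedule starts at 16 and levels off at 64, so early updates use small batches and later ones use larger batches.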

There is probably much room for improvement in this model. However, its tagging performance seems to be fairly good.

Iteration 7 seemed to work best (it has the highest tagging accuracy in the training results, 95.139%), so this is the one packaged here.

To get the same results as shown here, please use the updated Norwegian language package for spaCy. It should now be part of the master branch, but the pull request can be found here: explosion/spaCy#1882

Installation

To install the package, use this command:

pip install https://github.com/ohenrik/nb_dep_ud_sm/raw/master/nb_dep_ud_sm-0.0.1/dist/nb_dep_ud_sm-0.0.1.tar.gz
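
To confirm that the install worked before loading the model, a minimal check such as the following can help. It assumes the archive installs an importable top-level package named `nb_dep_ud_sm`, which is how packaged spaCy models are normally laid out. The helper name `is_installed` is made up for this example.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


# "nb_dep_ud_sm" is assumed to be the importable package name.
if is_installed("nb_dep_ud_sm"):
    print("nb_dep_ud_sm is installed")
else:
    print("nb_dep_ud_sm is not installed; run the pip command first")
```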

Usage

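import spacy
nb = spacy.load("nb_dep_ud_sm")  # load the installed package by name

doc = nb("Det er kaldt på vinteren i Norge.")
# Print each token with its part-of-speech tag and dependency label
for token in doc:
    print(token.text, token.pos_, token.dep_)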

Training results:

Itn. P.Loss N.Loss UAS NER P. NER R. NER F. Tag % Token % na na
0 500.962 0.000 83.67 0.000 0.000 0.000 93.269 100.000 3542.9 0.0
1 86.554 0.000 86.38 0.000 0.000 0.000 94.396 100.000 3767.6 0.0
2 35.351 0.000 87.07 0.000 0.000 0.000 94.762 100.000 3611.1 0.0
3 21.769 0.000 87.99 0.000 0.000 0.000 94.839 100.000 3779.8 0.0
4 19.490 0.000 88.26 0.000 0.000 0.000 95.02 100.000 3565.9 0.0
5 17.730 0.000 88.48 0.000 0.000 0.000 95.084 100.000 3421.0 0.0
6 16.141 0.000 88.77 0.000 0.000 0.000 95.042 100.000 3533.3 0.0
7 14.906 0.000 88.72 0.000 0.000 0.000 95.139 100.000 3572.3 0.0
8 13.644 0.000 88.76 0.000 0.000 0.000 95.042 100.000 3585.8 0.0
9 12.909 0.000 88.72 0.000 0.000 0.000 95.125 100.000 3694.2 0.0
10 12.194 0.000 88.72 0.000 0.000 0.000 95.075 100.000 3618.3 0.0
11 11.435 0.000 88.65 0.000 0.000 0.000 95.042 100.000 3738.2 0.0
12 10.950 0.000 88.67 0.000 0.000 0.000 94.754 100.000 3909.9 0.0
13 10.325 0.000 88.85 0.000 0.000 0.000 47.879 100.000 3673.9 0.0
14 9.793 0.000 88.88 0.000 0.000 0.000 42.063 100.000 3758.4 0.0
15 9.456 0.000 88.77 0.000 0.000 0.000 43.68 100.000 3497.1 0.0
16 8.967 0.000 88.69 0.000 0.000 0.000 45.06 100.000 3514.9 0.0
17 8.493 0.000 88.88 0.000 0.000 0.000 46.537 100.000 3632.7 0.0
18 8.109 0.000 88.76 0.000 0.000 0.000 47.249 100.000 3837.6 0.0
19 7.795 0.000 88.73 0.000 0.000 0.000 47.485 100.000 3473.2 0.0
20 7.573 0.000 88.81 0.000 0.000 0.000 47.579 100.000 3482.8 0.0
21 7.131 0.000 88.82 0.000 0.000 0.000 47.282 100.000 3327.1 0.0
22 7.053 0.000 88.87 0.000 0.000 0.000 46.916 100.000 3576.0 0.0
23 6.736 0.000 88.61 0.000 0.000 0.000 46.394 100.000 3223.6 0.0
24 6.459 0.000 88.83 0.000 0.000 0.000 45.841 100.000 3523.7 0.0
25 6.364 0.000 88.67 0.000 0.000 0.000 45.423 100.000 3163.7 0.0
26 6.080 0.000 88.80 0.000 0.000 0.000 44.959 100.000 3497.2 0.0
27 5.984 0.000 88.77 0.000 0.000 0.000 44.56 100.000 3642.3 0.0
28 5.724 0.000 88.99 0.000 0.000 0.000 44.249 100.000 3467.4 0.0
29 5.620 0.000 88.97 0.000 0.000 0.000 43.895 100.000 3628.4 0.0
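
One way to pick an iteration from a log like this is to parse the rows and take the one with the highest Tag % value. The sketch below does that for three rows from the training results. The helper `parse_row` and the constant `TAG_COLUMN` are names invented for this example, and the helper also accepts decimal commas in case the log was produced under a locale that uses them.

```python
def parse_row(line):
    """Split one whitespace-separated log row into (iteration, list of values)."""
    fields = line.split()
    # Accept both "95.139" and "95,139" as the same number.
    values = [float(field.replace(",", ".")) for field in fields[1:]]
    return int(fields[0]), values


# Three rows from the training results (iterations 6 to 8).
rows = """
6 16.141 0.000 88.77 0.000 0.000 0.000 95.042 100.000 3533.3 0.0
7 14.906 0.000 88.72 0.000 0.000 0.000 95.139 100.000 3572.3 0.0
8 13.644 0.000 88.76 0.000 0.000 0.000 95.042 100.000 3585.8 0.0
""".strip().splitlines()

TAG_COLUMN = 6  # position of "Tag %" among the values after the iteration number

parsed = [parse_row(line) for line in rows]
best_iteration, best_values = max(parsed, key=lambda row: row[1][TAG_COLUMN])
print(best_iteration, best_values[TAG_COLUMN])
```

Selecting on Tag % is only one possible criterion. Ranking by UAS instead would favour later iterations, so the choice depends on whether tagging or parsing matters more for the intended use.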

Not an official model

This is not yet an official spaCy model.
