Add Porter stemmer for Dutch (NaturalNode#423)
Hugo-ter-Doest authored Apr 7, 2018
1 parent 39cf012 commit 261fcd5
Showing 8 changed files with 46,190 additions and 25 deletions.
33 changes: 17 additions & 16 deletions README.md
@@ -297,7 +297,7 @@ The following

## Stemmers

-Currently stemming is supported via the [Porter](http://tartarus.org/martin/PorterStemmer/index.html) and [Lancaster](http://www.comp.lancs.ac.uk/computing/research/stemming/) (Paice/Husk) algorithms.
+Currently stemming is supported via the [Porter](http://tartarus.org/martin/PorterStemmer/index.html) and [Lancaster](http://www.comp.lancs.ac.uk/computing/research/stemming/) (Paice/Husk) algorithms. The Indonesian and Japanese stemmers do not follow a known algorithm.

```javascript
var natural = require('natural');
@@ -323,20 +323,21 @@ console.log(natural.PorterStemmerEs.stem("jugaría"));

The following stemmers are available:

-* `PorterStemmer`
-* `LancasterStemmer`
-* `PorterStemmerFa`
-* `PorterStemmerFr`
-* `PorterStemmerRu`
-* `PorterStemmerEs`
-* `PorterStemmerIt`
-* `PorterStemmerNo`
-* `PorterStemmerSv`
-* `PorterStemmerPt`
-* `StemmerFr`
-* `StemmerPl`
-* `StemmerJa`
-* `StemmerId`
+| Language | Porter | Lancaster | Other | Module name(s) | Unit test |
+| ------------- |:-----------:|:---------:|:---------:|----------------|:---------:|
+| Dutch | X | | | `PorterStemmerNl` | X |
+| English | X | X | | `PorterStemmer`, `LancasterStemmer` | X |
+| Farsi (in progress) | X | | | `PorterStemmerFa` | |
+| French | X | | | `PorterStemmerFr` | X |
+| Indonesian | | | X | `StemmerId` | |
+| Italian | X | | | `PorterStemmerIt` | X |
+| Japanese | | | X | `StemmerJa` | X |
+| Norwegian | X | | | `PorterStemmerNo` | X |
+| Portuguese | X | | | `PorterStemmerPt` | X |
+| Russian | X | | | `PorterStemmerRu` | X |
+| Swedish | X | | | `PorterStemmerSv` | X |

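The table maps each language to an export of `natural`. As a rough illustration of how a caller might choose one by language, here is a self-contained sketch. The `stemmerNameFor` helper and its ISO 639-1 keys are hypothetical; the export names come from the table, from the `PorterStemmerEs` example line in the README, and from the `index.js` comment stating that no Polish stemmer is available.

```javascript
// Hypothetical lookup: ISO 639-1 code -> export name in natural.
// Export names are taken from the README table and examples; the helper
// and its keys are illustrative assumptions, not part of the library.
const STEMMER_BY_LANG = Object.freeze({
  nl: 'PorterStemmerNl',
  en: 'PorterStemmer', // LancasterStemmer is the other English option
  es: 'PorterStemmerEs',
  fa: 'PorterStemmerFa', // listed as "in progress"
  fr: 'PorterStemmerFr', // use this rather than StemmerFr
  id: 'StemmerId',
  it: 'PorterStemmerIt',
  ja: 'StemmerJa',
  no: 'PorterStemmerNo',
  pt: 'PorterStemmerPt',
  ru: 'PorterStemmerRu',
  sv: 'PorterStemmerSv',
});

// Returns the export name for a language code, or throws if none is listed
// (for example 'pl', since index.js notes that no Polish stemmer exists).
function stemmerNameFor(lang) {
  // Own-property check so inherited keys such as 'toString' are rejected.
  if (!Object.prototype.hasOwnProperty.call(STEMMER_BY_LANG, lang)) {
    throw new Error(`No stemmer listed for language "${lang}"`);
  }
  return STEMMER_BY_LANG[lang];
}

console.log(stemmerNameFor('nl')); // PorterStemmerNl
```

With `natural` installed, the returned name could then be used to index the module, e.g. `natural[stemmerNameFor('nl')].stem(word)`.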

`attach()` patches `stem()` and `tokenizeAndStem()` onto `String` as a shortcut to
`PorterStemmer.stem(token)`. `tokenizeAndStem()` breaks text up into single words
@@ -348,7 +349,7 @@ console.log("i am waking up to the sounds of chainsaws".tokenizeAndStem());
console.log("chainsaws".stem());
```
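The `attach()` paragraph describes patching methods onto `String`. A minimal sketch of that mechanism follows. `toyStemmer` and its strip-a-trailing-"s" rule are placeholders assumed for illustration, not the Porter or Lancaster algorithm, and the tokenizer is a plain split on non-letters; the sketch does not try to reproduce the library's own tokenizer or any stop-word handling.

```javascript
// Hypothetical attach()-style shortcut: adds stem() and tokenizeAndStem()
// to String.prototype, delegating to a stemmer object.
const toyStemmer = {
  // Placeholder rule, NOT Porter: drop a trailing "s" from words longer than 3 chars.
  stem(token) {
    const lower = token.toLowerCase();
    return lower.length > 3 && lower.endsWith('s') ? lower.slice(0, -1) : lower;
  },
  attach() {
    const stemmer = this;
    // Regular functions (not arrows) so `this` is the string receiver.
    String.prototype.stem = function () {
      return stemmer.stem(String(this));
    };
    String.prototype.tokenizeAndStem = function () {
      // Split on runs of non-letters, drop empty pieces, stem each token.
      return String(this)
        .split(/[^A-Za-z]+/)
        .filter((t) => t.length > 0)
        .map((t) => stemmer.stem(t));
    };
  },
};

toyStemmer.attach();
console.log('chainsaws'.stem()); // chainsaw
console.log('i am waking up to the sounds of chainsaws'.tokenizeAndStem());
```

Because the methods live on `String.prototype`, every string in the process gains them. That is convenient for quick scripts but can collide with other code that patches the same names.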

-the same thing can be done with a Lancaster stemmer:
+The same thing can be done with a Lancaster stemmer:

```javascript
natural.LancasterStemmer.attach();
7 changes: 5 additions & 2 deletions lib/natural/index.js
@@ -33,9 +33,12 @@ exports.PorterStemmerIt = require('./stemmers/porter_stemmer_it');
exports.PorterStemmerNo = require('./stemmers/porter_stemmer_no');
exports.PorterStemmerSv = require('./stemmers/porter_stemmer_sv');
exports.PorterStemmerPt = require('./stemmers/porter_stemmer_pt');
+exports.PorterStemmerNl = require('./stemmers/porter_stemmer_nl');
exports.LancasterStemmer = require('./stemmers/lancaster_stemmer');
-exports.StemmerFr = require('./stemmers/stemmer_fr');
-exports.StemmerPl = require('./stemmers/stemmer_pl');
+// StemmerFr and StemmerPl are not stemmers. A Polish stemmer is
+// not available, and for French PorterStemmerFr should be used.
+//exports.StemmerFr = require('./stemmers/stemmer_fr');
+//exports.StemmerPl = require('./stemmers/stemmer_pl');
exports.StemmerJa = require('./stemmers/stemmer_ja');
exports.StemmerId = require('./stemmers/indonesian/stemmer_id');
exports.AggressiveTokenizerNl = require('./tokenizers/aggressive_tokenizer_nl');
2 changes: 1 addition & 1 deletion lib/natural/stemmers/porter_stemmer_fr.js
@@ -374,4 +374,4 @@ function isVowel(letter) {
function endsin(token, suffix) {
if (token.length < suffix.length) return false;
return (token.slice(-suffix.length) == suffix);
-};
+};
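The `endsin` helper in the `porter_stemmer_fr.js` hunk can be checked in isolation. The copy below uses `===` instead of `==` (both operands are strings, so the result is the same), and the sample words are chosen for illustration. One detail worth knowing: for an empty suffix, `slice(-0)` is `slice(0)`, so `endsin` returns `false` for any non-empty token, whereas `String.prototype.endsWith('')` returns `true`.

```javascript
// Copy of endsin() from porter_stemmer_fr.js, using === for comparison.
function endsin(token, suffix) {
  // A suffix longer than the token can never match.
  if (token.length < suffix.length) return false;
  // Compare the last suffix.length characters of the token.
  return token.slice(-suffix.length) === suffix;
}

console.log(endsin('mangerions', 'ions')); // true
console.log(endsin('ions', 'mangerions')); // false
console.log(endsin('manger', ''));         // false, because slice(-0) is slice(0)
console.log('manger'.endsWith(''));        // true
```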