OPENNLP-1566 - Array writing error in code example #605

Merged
merged 1 commit into from
Jun 18, 2024
10 changes: 5 additions & 5 deletions opennlp-docs/src/docbkx/chunker.xml
@@ -98,26 +98,26 @@ ChunkerME chunker = new ChunkerME(model);]]>
The following code shows how to determine the most likely chunk tag sequence for a sentence.
<programlisting language="java">
<![CDATA[
-String sent[] = new String[] { "Rockwell", "International", "Corp.", "'s",
+String[] sent = new String[] { "Rockwell", "International", "Corp.", "'s",
"Tulsa", "unit", "said", "it", "signed", "a", "tentative", "agreement",
"extending", "its", "contract", "with", "Boeing", "Co.", "to",
"provide", "structural", "parts", "for", "Boeing", "'s", "747",
"jetliners", "." };

-String pos[] = new String[] { "NNP", "NNP", "NNP", "POS", "NNP", "NN",
+String[] pos = new String[] { "NNP", "NNP", "NNP", "POS", "NNP", "NN",
"VBD", "PRP", "VBD", "DT", "JJ", "NN", "VBG", "PRP$", "NN", "IN",
"NNP", "NNP", "TO", "VB", "JJ", "NNS", "IN", "NNP", "POS", "CD", "NNS",
"." };

-String tag[] = chunker.chunk(sent, pos);]]>
+String[] tag = chunker.chunk(sent, pos);]]>
</programlisting>
The tags array contains one chunk tag for each token in the input array. The corresponding
tag can be found at the same index as the token has in the input array.
The confidence scores for the returned tags can be easily retrieved from
a ChunkerME with the following method call:
<programlisting language="java">
<![CDATA[
-double probs[] = chunker.probs();]]>
+double[] probs = chunker.probs();]]>
</programlisting>
The call to probs is stateful and will always return the probabilities of the last
tagged sentence. The probs method should only be called when the tag method
@@ -130,7 +130,7 @@ double probs[] = chunker.probs();]]>
It can be called in a similar way as chunk.
<programlisting language="java">
<![CDATA[
-Sequence topSequences[] = chunk.topKSequences(sent, pos);]]>
+Sequence[] topSequences = chunk.topKSequences(sent, pos);]]>
</programlisting>
Each Sequence object contains one sequence. The sequence can be retrieved
via Sequence.getOutcomes() which returns a tags array
2 changes: 1 addition & 1 deletion opennlp-docs/src/docbkx/introduction.xml
@@ -81,7 +81,7 @@ ToolName toolName = new ToolName(model);]]>
and the input is a String or an array of String.
<programlisting language="java">
<![CDATA[
-String output[] = toolName.executeTask("This is a sample text.");]]>
+String[] output = toolName.executeTask("This is a sample text.");]]>
</programlisting>
</para>
</section>
4 changes: 2 additions & 2 deletions opennlp-docs/src/docbkx/namefinder.xml
@@ -130,7 +130,7 @@ for (String document[][] : documents) {
the following snippet shows a call to find
<programlisting language="java">
<![CDATA[
-String sentence[] = new String[]{
+String[] sentence = new String[]{
"Pierre",
"Vinken",
"is",
@@ -140,7 +140,7 @@ String sentence[] = new String[]{
"."
};

-Span nameSpans[] = nameFinder.find(sentence);]]>
+Span[] nameSpans = nameFinder.find(sentence);]]>
</programlisting>
The nameSpans array now contains exactly one Span which marks the name Pierre Vinken.
The elements between the start and end offsets are the name tokens. In this case the start
2 changes: 1 addition & 1 deletion opennlp-docs/src/docbkx/parser.xml
@@ -111,7 +111,7 @@ Parser parser = ParserFactory.create(model);]]>
<programlisting language="java">
<![CDATA[
String sentence = "The quick brown fox jumps over the lazy dog .";
-Parse topParses[] = ParserTool.parseLine(sentence, parser, 1);]]>
+Parse[] topParses = ParserTool.parseLine(sentence, parser, 1);]]>
</programlisting>

The topParses array only contains one parse because the number of parses is set to 1.
8 changes: 4 additions & 4 deletions opennlp-docs/src/docbkx/postagger.xml
@@ -86,17 +86,17 @@ POSTaggerME tagger = new POSTaggerME(model);]]>
The following code shows how to determine the most likely pos tag sequence for a sentence.
<programlisting language="java">
<![CDATA[
-String sent[] = new String[]{"Most", "large", "cities", "in", "the", "US", "had",
+String[] sent = new String[]{"Most", "large", "cities", "in", "the", "US", "had",
"morning", "and", "afternoon", "newspapers", "."};
-String tags[] = tagger.tag(sent);]]>
+String[] tags = tagger.tag(sent);]]>
</programlisting>
The tags array contains one part-of-speech tag for each token in the input array. The corresponding
tag can be found at the same index as the token has in the input array.
The confidence scores for the returned tags can be easily retrieved from
a POSTaggerME with the following method call:
<programlisting language="java">
<![CDATA[
-double probs[] = tagger.probs();]]>
+double[] probs = tagger.probs();]]>
</programlisting>
The call to probs is stateful and will always return the probabilities of the last
tagged sentence. The probs method should only be called when the tag method
@@ -109,7 +109,7 @@ double probs[] = tagger.probs();]]>
It can be called in a similar way as tag.
<programlisting language="java">
<![CDATA[
-Sequence topSequences[] = tagger.topKSequences(sent);]]>
+Sequence[] topSequences = tagger.topKSequences(sent);]]>
</programlisting>
Each Sequence object contains one sequence. The sequence can be retrieved
via Sequence.getOutcomes() which returns a tags array
4 changes: 2 additions & 2 deletions opennlp-docs/src/docbkx/sentdetect.xml
@@ -94,14 +94,14 @@ SentenceDetectorME sentenceDetector = new SentenceDetectorME(model);]]>
The Sentence Detector can output an array of Strings, where each String is one sentence.
<programlisting language="java">
<![CDATA[
-String sentences[] = sentenceDetector.sentDetect(" First sentence. Second sentence. ");]]>
+String[] sentences = sentenceDetector.sentDetect(" First sentence. Second sentence. ");]]>
</programlisting>
The result array now contains two entries. The first String is "First sentence." and the
second String is "Second sentence." The whitespace before, between and after the input String is removed.
The API also offers a method which simply returns the span of the sentence in the input string.
<programlisting language="java">
<![CDATA[
-Span sentences[] = sentenceDetector.sentPosDetect(" First sentence. Second sentence. ");]]>
+Span[] sentences = sentenceDetector.sentPosDetect(" First sentence. Second sentence. ");]]>
</programlisting>
The result array again contains two entries. The first span begins at index 2 and ends at
17. The second span begins at 18 and ends at 34. The utility method Span.getCoveredText can be used to create a substring which only covers the chars in the span.
8 changes: 4 additions & 4 deletions opennlp-docs/src/docbkx/tokenizer.xml
@@ -171,7 +171,7 @@ Tokenizer tokenizer = new TokenizerME(model);]]>
Strings, where each String is one token.
<programlisting language="java">
<![CDATA[
-String tokens[] = tokenizer.tokenize("An input sample sentence.");]]>
+String[] tokens = tokenizer.tokenize("An input sample sentence.");]]>
</programlisting>
The output will be an array with these tokens.
<programlisting>
@@ -183,7 +183,7 @@ String tokens[] = tokenizer.tokenize("An input sample sentence.");]]>
String.
<programlisting language="java">
<![CDATA[
-Span tokenSpans[] = tokenizer.tokenizePos("An input sample sentence.");]]>
+Span[] tokenSpans = tokenizer.tokenizePos("An input sample sentence.");]]>
</programlisting>
The tokenSpans array now contains 5 elements. To get the text for one
span call Span.getCoveredText which takes a span and the input text.
@@ -195,8 +195,8 @@ Span tokenSpans[] = tokenizer.tokenizePos("An input sample sentence.");]]>
<![CDATA[
TokenizerME tokenizer = ...

-String tokens[] = tokenizer.tokenize(...);
-double tokenProbs[] = tokenizer.getTokenProbabilities();]]>
+String[] tokens = tokenizer.tokenize(...);
+double[] tokenProbs = tokenizer.getTokenProbabilities();]]>
</programlisting>
The tokenProbs array now contains one double value per token, the
value is between 0 and 1, where 1 is the highest possible probability
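For readers wondering whether the change affects behavior: as far as the Java language is concerned, `String sent[]` and `String[] sent` declare the same type, so the edits above are stylistic. The sketch below is not part of the PR; the class and method names are hypothetical and it uses only the JDK, not OpenNLP. It also shows the one place the two styles diverge, declarations that introduce several variables at once, which is a common argument for putting the brackets on the type.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of OpenNLP or of this PR.
public class ArrayDeclarationStyles {

    // Both declarator styles yield the same runtime type and contents.
    static boolean sameTypeAndContents() {
        String cStyle[] = new String[] { "NNP", "POS" };    // style removed by this change
        String[] javaStyle = new String[] { "NNP", "POS" }; // style the docs now use
        return cStyle.getClass() == javaStyle.getClass()
                && Arrays.equals(cStyle, javaStyle);
    }

    // With brackets on the variable, they bind to that variable only;
    // with brackets on the type, they apply to every variable declared.
    static boolean mixedDeclaratorsDiffer() {
        String a[], b = "scalar"; // a is String[], b is a plain String
        String[] c, d;            // c and d are both String[]
        a = new String[0];
        c = new String[0];
        d = new String[0];
        return a.getClass() == String[].class
                && b.getClass() == String.class
                && c.getClass() == d.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameTypeAndContents());
        System.out.println(mixedDeclaratorsDiffer());
    }
}
```

Both methods return `true`, so the documentation snippets compile to the same thing before and after the change.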