Java – extracting all nouns and adjectives from a text with the Stanford Parser
I am trying to extract all nouns and adjectives from a given text using the Stanford Parser.
My current attempt is to apply pattern matching to the output of getChildrenAsList() on the Tree object in order to locate items such as:
(NN paper), (NN algorithm), (NN information), ...
and save them in an array.
Input sentence:
In this paper we present an algorithm that extracts semantic information from an arbitrary text.
Result (string):
[(S (PP (IN In) (NP (DT this) (NN paper))) (NP (PRP we)) (VP (VBP present) (NP (NP (DT an) (NN algorithm)) (SBAR (WHNP (WDT that)) (S (VP (VBD extracts) (NP (JJ semantic) (NN information)) (PP (IN from) (NP (DT an) (ADJP (JJ arbitrary)) (NN text)))))))) (. .))]
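For illustration only, the pattern-matching idea described above could be sketched roughly as follows. The class name BracketedTreeScan and the method nounsAndAdjectives are hypothetical and not part of the Stanford Parser; the sketch assumes the tree is available as a bracketed string like the one shown and scans it with a plain java.util.regex pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: scans a bracketed parse string for noun and adjective leaves.
public class BracketedTreeScan {
    // Matches leaves such as "(NN paper)" or "(JJ semantic)":
    // a tag starting with NN or JJ, whitespace, then a word without parentheses or spaces.
    private static final Pattern LEAF =
            Pattern.compile("\\(((?:NN|JJ)[A-Z]*)\\s+([^()\\s]+)\\)");

    static List<String> nounsAndAdjectives(String bracketedTree) {
        List<String> found = new ArrayList<>();
        Matcher m = LEAF.matcher(bracketedTree);
        while (m.find()) {
            // group(1) is the tag, group(2) is the word
            found.add(m.group(2) + "/" + m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String tree = "(NP (DT an) (ADJP (JJ arbitrary)) (NN text))";
        System.out.println(nounsAndAdjectives(tree)); // prints [arbitrary/JJ, text/NN]
    }
}
```

This works on the printed form of the tree only, so it breaks as soon as the output format changes.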
I resorted to pattern matching because I could not find a method in the Stanford Parser that returns all words of a given word class, such as nouns.
Is there a better way to extract these words, or does the parser provide specific methods for this?
public static void main(String[] args) {
    String str = "In this paper we present an algorithm that extracts semantic information from an arbitrary text.";
    LexicalizedParser lp = new LexicalizedParser("englishPCFG.ser.gz");
    Tree parseS = (Tree) lp.apply(str);
    System.out.println("tr.getChildrenAsList().toString()" + parseS.getChildrenAsList().toString());
}
}
Solution
By the way, if all you want is parts of speech such as nouns and verbs, you should use a part-of-speech tagger, such as the Stanford POS Tagger. It will run several orders of magnitude faster and be at least as accurate.
But you can also do it with the parser. The method you want is taggedYield(), which returns a List<TaggedWord>. So you have:
List<TaggedWord> taggedWords = ((Tree) lp.apply(str)).taggedYield();
for (TaggedWord tw : taggedWords) {
    // Keep tokens whose tag starts with "N" (nouns) or "J" (adjectives)
    if (tw.tag().startsWith("N") || tw.tag().startsWith("J")) {
        System.out.printf("%s/%s%n", tw.word(), tw.tag());
    }
}
(This approach cuts a corner by relying on the fact that, in the Penn Treebank tag set, all and only the adjective and noun tags begin with J or N. More generally, you can check membership in a set of tags.)
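A minimal sketch of the more general set-membership check might look like this. The class name TagSetFilter and the "word/TAG" string representation are assumptions made to keep the example self-contained; with the parser you would test WANTED_TAGS.contains(tw.tag()) on each TaggedWord instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: filters "word/TAG" tokens by an explicit set of tags.
public class TagSetFilter {
    // Penn Treebank noun tags (NN, NNS, NNP, NNPS) and adjective tags (JJ, JJR, JJS).
    static final Set<String> WANTED_TAGS = new HashSet<>(Arrays.asList(
            "NN", "NNS", "NNP", "NNPS", "JJ", "JJR", "JJS"));

    // Keeps only the tokens whose tag (the part after the last '/') is in WANTED_TAGS.
    static List<String> keepNounsAndAdjectives(List<String> wordSlashTag) {
        List<String> kept = new ArrayList<>();
        for (String token : wordSlashTag) {
            int slash = token.lastIndexOf('/');
            if (slash < 0) {
                continue; // skip tokens without a tag
            }
            if (WANTED_TAGS.contains(token.substring(slash + 1))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tagged = Arrays.asList("an/DT", "arbitrary/JJ", "text/NN", "./.");
        System.out.println(keepNounsAndAdjectives(tagged)); // prints [arbitrary/JJ, text/NN]
    }
}
```

An explicit set makes it easy to narrow the selection later, for example to common nouns only by dropping NNP and NNPS.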
P.S. The stanford-nlp tag is the best one to use for questions about Stanford NLP tools on Stack Overflow.