Java – extract all nouns and adjectives from text with the Stanford Parser

I am trying to extract all nouns and adjectives from a given text using the Stanford Parser.

My current attempt is to use pattern matching on the output of getChildrenAsList() of the Tree object to locate items like these:

(NN paper),(NN algorithm),(NN information),...

and then save them in an array.

Input sentence: In this paper we present an algorithm that extracts semantic information from an arbitrary text.

Resulting string:

[(S (PP (IN In) (NP (DT this) (NN paper))) (NP (PRP we)) (VP (VBP present) (NP (NP (DT an) (NN algorithm)) (SBAR (WHNP (WDT that)) (S (VP (VBD extracts) (NP (JJ semantic) (NN information)) (PP (IN from) (NP (DT an) (ADJP (JJ arbitrary)) (NN text)))))))) (. .))]

I am using pattern matching because I can't find a method in the Stanford Parser that returns all words of a given class, such as nouns.
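For comparison, the pattern-matching idea can be sketched directly on the bracketed string shown above: every preterminal node has the form `(TAG word)`, so a regular expression can pick those out and keep the ones whose tag starts with N or J. This is a hypothetical helper (`TreeStringExtractor` is not part of the Stanford API); it assumes no word contains spaces or parentheses, and it depends on the exact string format of the tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: regex-based extraction from a bracketed parse string.
class TreeStringExtractor {
    // A preterminal looks like "(TAG word)": a tag, one space, a word, and a
    // closing parenthesis, with no spaces or parentheses inside tag or word.
    // Phrasal nodes such as "(NP (DT ...)" never match, because the character
    // after the space is "(".
    private static final Pattern PRETERMINAL =
            Pattern.compile("\\(([^\\s()]+) ([^\\s()]+)\\)");

    // Returns "word/TAG" for every preterminal whose tag starts with N or J.
    static List<String> nounsAndAdjectives(String tree) {
        List<String> result = new ArrayList<>();
        Matcher m = PRETERMINAL.matcher(tree);
        while (m.find()) {
            String tag = m.group(1);
            String word = m.group(2);
            if (tag.startsWith("N") || tag.startsWith("J")) {
                result.add(word + "/" + tag);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String tree = "(S (PP (IN In) (NP (DT this) (NN paper))) (NP (PRP we)) (VP (VBP present) (NP (NP (DT an) (NN algorithm)) (SBAR (WHNP (WDT that)) (S (VP (VBD extracts) (NP (JJ semantic) (NN information)) (PP (IN from) (NP (DT an) (ADJP (JJ arbitrary)) (NN text)))))))) (. .))";
        // prints [paper/NN, algorithm/NN, semantic/JJ, information/NN, arbitrary/JJ, text/NN]
        System.out.println(nounsAndAdjectives(tree));
    }
}
```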

Is there a better way to extract these words, or does the parser provide specific methods?

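import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

public class ExtractNouns {
    public static void main(String[] args) {
        String str = "In this paper we present an algorithm that extracts semantic information from an arbitrary text.";
        // Load the English PCFG model (older releases used new LexicalizedParser("englishPCFG.ser.gz"))
        LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
        Tree parseS = lp.parse(str); // tokenize and parse the sentence
        System.out.println("parseS.getChildrenAsList(): " + parseS.getChildrenAsList());
    }
}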

Solution

By the way, if all you want are parts of speech such as nouns and verbs, you should just use a part-of-speech tagger, such as the Stanford POS Tagger. It will run orders of magnitude faster and be at least as accurate.
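If you switch to the tagger, its `MaxentTagger.tagString(String)` method returns the tagged sentence as a single string. The sketch below assumes that string uses the default format of space-separated `word_TAG` tokens and filters it for nouns and adjectives. Loading the tagger itself needs the Stanford jar and a model file, so it is left out here; `TaggerOutputFilter` is an illustrative name, not a library class.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: filters tagger output, assuming "word_TAG" tokens.
class TaggerOutputFilter {
    static List<String> nounsAndAdjectives(String tagged) {
        List<String> result = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            // Split at the LAST '_' so words that contain '_' stay intact.
            int sep = token.lastIndexOf('_');
            if (sep <= 0) {
                continue; // skip empty or malformed tokens
            }
            String word = token.substring(0, sep);
            String tag = token.substring(sep + 1);
            if (tag.startsWith("N") || tag.startsWith("J")) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Tags taken from the question's parse, written in word_TAG form
        String tagged = "In_IN this_DT paper_NN we_PRP present_VBP an_DT algorithm_NN";
        System.out.println(nounsAndAdjectives(tagged)); // [paper, algorithm]
    }
}
```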

But you can do it with the parser. The method you want is taggedYield(), which returns a List<TaggedWord>. So you have:

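// parseS is the Tree from the question's code (needs import edu.stanford.nlp.ling.TaggedWord)
List<TaggedWord> taggedWords = parseS.taggedYield();
for (TaggedWord tw : taggedWords) {
  // Penn Treebank noun tags start with N (NN, NNS, NNP, NNPS), adjective tags with J (JJ, JJR, JJS)
  if (tw.tag().startsWith("N") || tw.tag().startsWith("J")) {
    System.out.printf("%s/%s%n", tw.word(), tw.tag());
  }
}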

(This cuts a corner by relying on the fact that, in the Penn Treebank tag set, all and only the adjective and noun tags begin with J or N. More generally, you could check membership in a set of tags.)
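A minimal sketch of the more general check: keep an explicit set of Penn Treebank tags, test membership, and collect the matching words into an array as the question wanted. `WordTag` is a hypothetical stand-in for `TaggedWord` so the code runs without the Stanford jar; with the real class you would read `tw.word()` and `tw.tag()` instead of the fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for edu.stanford.nlp.ling.TaggedWord.
final class WordTag {
    final String word;
    final String tag;

    WordTag(String word, String tag) {
        this.word = word;
        this.tag = tag;
    }
}

class TagSetFilter {
    // Penn Treebank noun and adjective tags, listed explicitly
    // instead of relying on the N/J prefix.
    static final Set<String> NOUN_ADJ_TAGS = new HashSet<>(Arrays.asList(
            "NN", "NNS", "NNP", "NNPS", "JJ", "JJR", "JJS"));

    // Returns, in order, the words whose tag is in the wanted set.
    static String[] select(List<WordTag> tagged, Set<String> wanted) {
        List<String> words = new ArrayList<>();
        for (WordTag wt : tagged) {
            if (wanted.contains(wt.tag)) {
                words.add(wt.word);
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<WordTag> tagged = Arrays.asList(
                new WordTag("semantic", "JJ"),
                new WordTag("information", "NN"),
                new WordTag("from", "IN"));
        System.out.println(Arrays.toString(select(tagged, NOUN_ADJ_TAGS))); // [semantic, information]
    }
}
```

Passing a different set (for example only `NN` and `NNS`) narrows the selection without touching the loop.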

P.S. On Stack Overflow, the stanford-nlp tag is the best one to use for questions about the Stanford NLP tools.
