How to get the logical part of a sentence in Java?
Suppose there is a sentence:
On March 1, he was born.
Change it to
He was born on March 1.
The sentence is still valid and its meaning is unchanged. Reordering the words in any other way produces odd or invalid sentences.
So basically, I'm talking about the parts of a sentence that make the information more specific, but that can be moved or deleted without destroying the whole sentence.
Are there any NLP libraries that can identify these parts?
Solution
Constituents
It sounds like you want to identify the constituents of a sentence, which are groups of words that function as a single unit according to the grammar of the language.
In fact, when linguists try to discover the grammar of a language, they do so in part by looking at movement. In your example, "On March 1" is a group of words that can be moved to a different position in the sentence while the sentence still retains its meaning.
Constituents can be single words, phrases, or even larger groups, such as an entire clause. Within a sentence, they form a nested hierarchy. For example, the first example sentence you give can be analyzed as:
(S (PP (IN On) (NP (NNP March) (CD 1))) (NP (PRP he)) (VP (VBD was) (VP (VBN born))))
The whole sentence (S) consists of a prepositional phrase (PP), followed by a noun phrase (NP), followed by a verb phrase (VP). The prepositional phrase can be further decomposed into the single word "On" followed by a noun phrase.
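To make the nesting concrete, here is a minimal sketch in plain Java that reads the bracketed parse shown above and lists every phrase together with the words it covers. It assumes the input is a well-formed bracketed string in which every word is wrapped in its tag, as in the example; the class and method names (ConstituentLister, parse, collect) are made up for illustration and do not come from any parser library.

```java
import java.util.ArrayList;
import java.util.List;

class ConstituentLister {

    /** A node in a bracketed parse: a label plus either one word or child nodes. */
    static final class Node {
        final String label;
        final String word;                         // non-null only for tag nodes such as (IN On)
        final List<Node> children = new ArrayList<>();

        Node(String label, String word) {
            this.label = label;
            this.word = word;
        }

        /** The words covered by this node, joined with single spaces. */
        String text() {
            if (word != null) return word;
            StringBuilder sb = new StringBuilder();
            for (Node c : children) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(c.text());
            }
            return sb.toString();
        }
    }

    private final String[] tokens;
    private int pos = 0;

    private ConstituentLister(String bracketed) {
        // Pad parentheses with spaces so that a whitespace split yields clean tokens.
        tokens = bracketed.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    /** Parses a well-formed bracketed tree; malformed input is not handled. */
    static Node parse(String bracketed) {
        return new ConstituentLister(bracketed).readNode();
    }

    private Node readNode() {
        pos++;                                     // skip "("
        String label = tokens[pos++];
        Node node;
        if (!tokens[pos].equals("(")) {
            node = new Node(label, tokens[pos++]); // tag node: (TAG word)
        } else {
            node = new Node(label, null);
            while (tokens[pos].equals("(")) node.children.add(readNode());
        }
        pos++;                                     // skip ")"
        return node;
    }

    /** Adds "LABEL: words" for every phrase-level node, outermost first. */
    static void collect(Node n, List<String> out) {
        if (n.word != null) return;                // single-word tag nodes are skipped
        out.add(n.label + ": " + n.text());
        for (Node c : n.children) collect(c, out);
    }

    public static void main(String[] args) {
        Node root = parse("(S (PP (IN On) (NP (NNP March) (CD 1))) "
                + "(NP (PRP he)) (VP (VBD was) (VP (VBN born))))");
        List<String> out = new ArrayList<>();
        collect(root, out);
        out.forEach(System.out::println);
    }
}
```

Each printed line is one candidate unit that can be moved or inspected as a whole, for example the PP covering "On March 1". A real parser library will normally hand you a tree object directly, so the string parsing here is only needed when you start from the bracketed text.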
Phrase structure parser
To find constituents automatically, you will need a phrase structure parser. Many such parsers are available as open source, including:
- Stanford parser (Java)
- Berkeley parser (Java)
- BLLIP (Charniak-Johnson) parser (C++)
- Bikel parser (a reimplementation and improved version of the Collins parser, written in Java)
- Collins parser (C)
- OpenNLP parser (Java)
- SharpNLP parser (C#)
The Stanford and Berkeley parsers are probably the easiest to install and use. As shown by Cer et al. (2010), the most accurate parsers are Berkeley and Charniak. The Bikel parser is slower and less accurate than the other parsers.
Online demonstration
There is an online demonstration of the Stanford parser. I used that demonstration to generate the parse of the example sentence given above.
Notes on deletion
Each constituent has a head word. For example, take the noun phrase:
(NP (DT the) (JJ large) (JJ blue) (NN ball))
The head word here is the noun "ball", which is modified by the adjectives "large" and "blue". If the noun phrase is embedded in a sentence, you can delete those modifiers and the sentence keeps essentially the same meaning as the original, only with less specific content.
In noun phrases, you can usually delete adjectives, non-head nouns, and nested prepositional phrases.
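The noun phrase rule can be sketched in plain Java as follows. This is only an illustration under stated assumptions: it drops adjective children (tags starting with JJ) and prepositional phrase children (PP) of every NP, and it leaves non-head nouns alone because finding them requires head rules that are not shown here. The class name ModifierPruner, its methods, and the longer example sentence are invented for this sketch and are not part of any parser library.

```java
import java.util.ArrayList;
import java.util.List;

class ModifierPruner {

    /** A node in a bracketed parse: a label plus either one word or child nodes. */
    static final class Node {
        final String label;
        final String word;                         // non-null only for tag nodes such as (JJ blue)
        final List<Node> children = new ArrayList<>();

        Node(String label, String word) {
            this.label = label;
            this.word = word;
        }
    }

    private final String[] tokens;
    private int pos = 0;

    private ModifierPruner(String bracketed) {
        // Pad parentheses with spaces so that a whitespace split yields clean tokens.
        tokens = bracketed.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    /** Parses a well-formed bracketed tree; malformed input is not handled. */
    static Node parse(String bracketed) {
        return new ModifierPruner(bracketed).readNode();
    }

    private Node readNode() {
        pos++;                                     // skip "("
        String label = tokens[pos++];
        Node node;
        if (!tokens[pos].equals("(")) {
            node = new Node(label, tokens[pos++]); // tag node: (TAG word)
        } else {
            node = new Node(label, null);
            while (tokens[pos].equals("(")) node.children.add(readNode());
        }
        pos++;                                     // skip ")"
        return node;
    }

    /** Removes adjective (JJ*) and PP children of every NP, modifying the tree in place. */
    static void prune(Node n) {
        if (n.label.equals("NP")) {
            n.children.removeIf(c -> c.label.startsWith("JJ") || c.label.equals("PP"));
        }
        for (Node c : n.children) prune(c);
    }

    /** Joins the remaining words of the tree with single spaces. */
    static String words(Node n) {
        if (n.word != null) return n.word;
        StringBuilder sb = new StringBuilder();
        for (Node c : n.children) {
            String w = words(c);
            if (w.isEmpty()) continue;
            if (sb.length() > 0) sb.append(' ');
            sb.append(w);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sentence that embeds the noun phrase from the text.
        Node root = parse("(S (NP (PRP He)) (VP (VBD kicked) (NP (NP (DT the) (JJ large) "
                + "(JJ blue) (NN ball)) (PP (IN in) (NP (DT the) (NN yard))))))");
        prune(root);
        System.out.println(words(root));
    }
}
```

The pruning is deliberately restricted to children of NP nodes. Applying the same deletion inside verb phrases is not safe in general, because a phrase attached to the verb may be one of its arguments.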
In verb phrases and full clauses, things get trickier, because deleting a constituent that serves as an argument of the verb can completely change the interpretation of the sentence. For example, deleting "the book" from "He sold Jim the book" leaves "He sold Jim".