Getting started with simple natural language processing in Java
Unfortunately, I haven't found any complete tutorial on using the API. They all skip some of the general steps, and I need a tutorial that starts from the ground up. I see a lot of downloads on the website, but I don't know how to use them. Do I need to train something? Here is what I want to know:
How to install and set up an NLP system that can:
> parse an English sentence
> identify the different parts of speech
Solution
You say you need to "parse" each sentence. You may already know this, but just to be clear: in NLP, the term "parsing" usually means recovering some hierarchical syntactic structure. The most common types are constituency structures (for example, via a context-free grammar) and dependency structures.
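To make the distinction concrete, here is a minimal Java sketch (Java 16+ for records) of the two kinds of output for the sentence "The dog barks". The names `ParseStructures`, `Node`, and `Arc` are made up for illustration and do not come from any particular parser. The tags and relation labels follow common Penn Treebank and Stanford-style conventions, but a real parser may label things differently.

```java
import java.util.List;

/** Toy data structures contrasting the two kinds of parse output (hypothetical names). */
final class ParseStructures {

    /** A constituency node: a label plus children; leaves carry a POS tag and a word. */
    record Node(String label, String word, List<Node> children) {
        static Node leaf(String posTag, String word) {
            return new Node(posTag, word, List.of());
        }

        static Node phrase(String label, Node... children) {
            return new Node(label, null, List.of(children));
        }

        /** Renders the tree in the usual bracketed notation. */
        String bracketed() {
            if (children.isEmpty()) {
                return "(" + label + " " + word + ")";
            }
            StringBuilder sb = new StringBuilder("(" + label);
            for (Node child : children) {
                sb.append(' ').append(child.bracketed());
            }
            return sb.append(')').toString();
        }
    }

    /** A dependency arc: a head word governs a dependent word via a labeled relation. */
    record Arc(String head, String relation, String dependent) {}

    /** Phrase structure: words are grouped into nested phrases (NP, VP, S). */
    static Node constituencyExample() {
        return Node.phrase("S",
                Node.phrase("NP", Node.leaf("DT", "The"), Node.leaf("NN", "dog")),
                Node.phrase("VP", Node.leaf("VBZ", "barks")));
    }

    /** Dependency structure: every word is attached directly to a head word. */
    static List<Arc> dependencyExample() {
        return List.of(
                new Arc("ROOT", "root", "barks"),
                new Arc("barks", "nsubj", "dog"),
                new Arc("dog", "det", "The"));
    }

    public static void main(String[] args) {
        // Prints: (S (NP (DT The) (NN dog)) (VP (VBZ barks)))
        System.out.println(constituencyExample().bracketed());
        for (Arc arc : dependencyExample()) {
            System.out.println(arc.relation() + "(" + arc.head() + ", " + arc.dependent() + ")");
        }
    }
}
```

The constituency tree groups words into phrases, while the dependency arcs link each word directly to its head. Which one you want depends on what you plan to do with the analysis.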
If you need hierarchical structure, I suggest you consider starting with a parser. Most parsers I know of perform POS tagging during parsing and may provide more accurate tags than a finite-state POS tagger (caveat: I'm more familiar with constituency parsers than with dependency parsers; some or most dependency parsers may require POS tags as input).
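For readers unfamiliar with the term, one common kind of finite-state tagger is a hidden Markov model decoded with the Viterbi algorithm. The sketch below is only an illustration of that idea, not the implementation of any tagger mentioned here: the class name `ToyHmmTagger`, the simplified three-tag set, and every probability in the tables are invented for the example. A real tagger estimates these values from an annotated corpus.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy bigram HMM tagger with Viterbi decoding; all probabilities are made up. */
final class ToyHmmTagger {
    private static final String START = "<s>";
    private static final double FLOOR = 1e-6; // probability assigned to unseen events
    private static final List<String> TAGS = List.of("DT", "NN", "VB");

    // P(tag | previous tag), hand-picked for illustration only
    private static final Map<String, Map<String, Double>> TRANSITION = Map.of(
            START, Map.of("DT", 0.6, "NN", 0.3, "VB", 0.1),
            "DT", Map.of("DT", 0.05, "NN", 0.9, "VB", 0.05),
            "NN", Map.of("DT", 0.1, "NN", 0.3, "VB", 0.6),
            "VB", Map.of("DT", 0.5, "NN", 0.4, "VB", 0.1));

    // P(word | tag), hand-picked; note that "barks" is ambiguous between NN and VB
    private static final Map<String, Map<String, Double>> EMISSION = Map.of(
            "DT", Map.of("the", 0.7, "a", 0.3),
            "NN", Map.of("dog", 0.4, "cat", 0.4, "barks", 0.2),
            "VB", Map.of("barks", 0.6, "sleeps", 0.4));

    private static double logProb(Map<String, Map<String, Double>> table,
                                  String given, String outcome) {
        return Math.log(table.get(given).getOrDefault(outcome, FLOOR));
    }

    /** Returns the most probable tag sequence under the toy model. */
    static List<String> tag(List<String> words) {
        int n = words.size();
        if (n == 0) {
            return List.of();
        }
        int k = TAGS.size();
        double[][] best = new double[n][k]; // best log score of a path ending in tag t at word i
        int[][] back = new int[n][k];       // previous tag on that best path

        String first = words.get(0).toLowerCase(Locale.ROOT);
        for (int t = 0; t < k; t++) {
            best[0][t] = logProb(TRANSITION, START, TAGS.get(t))
                    + logProb(EMISSION, TAGS.get(t), first);
        }
        for (int i = 1; i < n; i++) {
            String word = words.get(i).toLowerCase(Locale.ROOT);
            for (int t = 0; t < k; t++) {
                best[i][t] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double score = best[i - 1][p] + logProb(TRANSITION, TAGS.get(p), TAGS.get(t));
                    if (score > best[i][t]) {
                        best[i][t] = score;
                        back[i][t] = p;
                    }
                }
                best[i][t] += logProb(EMISSION, TAGS.get(t), word);
            }
        }
        // Pick the best final tag, then follow the back-pointers to the start.
        int last = 0;
        for (int t = 1; t < k; t++) {
            if (best[n - 1][t] > best[n - 1][last]) {
                last = t;
            }
        }
        String[] result = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            result[i] = TAGS.get(last);
            last = back[i][last];
        }
        return List.of(result);
    }

    public static void main(String[] args) {
        System.out.println(tag(List.of("The", "dog", "barks"))); // prints [DT, NN, VB]
    }
}
```

The decoder does a fixed amount of work per word (proportional to the square of the number of tags), so its running time grows linearly with sentence length. It also produces only a flat tag sequence, with no hierarchy.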
The big disadvantage of parsing is its time complexity. Finite-state POS taggers often run at thousands of words per second. Even greedy dependency parsers are considerably slower, and constituency parsers usually run at 1-5 sentences per second. So if you don't need hierarchical structure, you may want to stick with a finite-state POS tagger for efficiency.
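To get a feel for what those rates mean in practice, here is a back-of-the-envelope estimate in Java. The corpus size (100,000 sentences), the average sentence length (20 words), and the tagger speed of 2,000 words per second (a stand-in for "thousands") are assumptions chosen for illustration; only the 1-5 sentences per second range comes from the figures above. The class name `ThroughputEstimate` is hypothetical.

```java
/** Rough runtime estimate for tagging vs. parsing; all corpus figures are assumptions. */
final class ThroughputEstimate {

    /** Seconds needed by a tagger whose speed is given in words per second. */
    static double taggerSeconds(long sentences, double wordsPerSentence, double wordsPerSecond) {
        return sentences * wordsPerSentence / wordsPerSecond;
    }

    /** Seconds needed by a parser whose speed is given in sentences per second. */
    static double parserSeconds(long sentences, double sentencesPerSecond) {
        return sentences / sentencesPerSecond;
    }

    public static void main(String[] args) {
        long sentences = 100_000;       // assumed corpus size
        double wordsPerSentence = 20.0; // assumed average sentence length
        double taggerSpeed = 2_000.0;   // assumed value for "thousands of words per second"

        double tagger = taggerSeconds(sentences, wordsPerSentence, taggerSpeed);
        double parserFast = parserSeconds(sentences, 5.0); // upper end of the 1-5 range
        double parserSlow = parserSeconds(sentences, 1.0); // lower end of the 1-5 range

        System.out.printf("tagger:       %.2f hours%n", tagger / 3600.0);
        System.out.printf("parser, fast: %.2f hours%n", parserFast / 3600.0);
        System.out.printf("parser, slow: %.2f hours%n", parserSlow / 3600.0);
    }
}
```

Under these assumptions the tagger finishes in roughly 17 minutes, while the constituency parser needs somewhere between about 5.6 and 28 hours. Your own numbers will differ, but the gap is typically large.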
If you do decide that you need parse structure, here are a few recommendations:
I think the Stanford parser that @aab suggested includes both a constituency parser and a dependency parser.
The Berkeley Parser (http://code.google.com/p/berkeleyparser/) is a well-known PCFG constituency parser that achieves state-of-the-art accuracy (equal to or better than the Stanford parser, I believe) and comparable efficiency (3-5 sentences per second).
The BUBS Parser (http://code.google.com/p/bubs-parser/) can also run the high-accuracy Berkeley grammar, and improves efficiency to around 15-20 sentences per second. Full disclosure: I'm one of the primary researchers working on this parser.
Warning: both of these parsers are research code, with all the problems that entails. But I'd love to see people actually use BUBS, so if it's useful to you, please give it a try and contact me with questions, comments, suggestions, etc.
And a couple of Wikipedia references for background, if needed:
> Context-free grammar
> Dependency grammar: http://en.wikipedia.org/wiki/Dependency_grammar