Java: splitting a paragraph into sentences. Am I covering all the bases here?

I'm trying to split a string containing multiple sentences into a String array of single sentences.

This is what I have so far:

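String input = "Hello World. "
             + "Today in the U.S.A.,it is a nice day! "
             + "Hurrah! "   // trailing space needed: the pattern only splits on whitespace
             + "Here it comes... "
             + "Party time!";
// Split on whitespace that follows '.', '?' or '!' and is followed by any character.
String[] array = input.split("(?<=[.?!])\\s+(?=[\\D\\d])");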

This code works well. I get:

Hello World.
Today in the U.S.A.,it is a nice day!
Hurrah!
Here it comes...
Party time!

I use a lookbehind to check whether sentence-ending punctuation precedes one or more whitespace characters. If so, I split there.

However, this regular expression does not cover some exceptions. For example, "The U.S. is a great country" is wrongly split into "The U.S." and "is a great country".
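To make the failure concrete, here is a minimal sketch that applies the same pattern to a sentence containing an abbreviation. The class name RegexSplitDemo and the helper split are hypothetical names chosen for illustration; only the pattern comes from the code above.

```java
import java.util.Arrays;

final class RegexSplitDemo {
    // Same pattern as in the question: split on whitespace that follows '.', '?' or '!'.
    static final String SENTENCE_GAP = "(?<=[.?!])\\s+(?=[\\D\\d])";

    // Hypothetical helper: splits the text with the pattern above.
    static String[] split(String text) {
        return text.split(SENTENCE_GAP);
    }

    public static void main(String[] args) {
        String[] parts = split("The U.S. is a great country. Party time!");
        // "U.S." ends in a period followed by a space, so the lookbehind
        // treats it as the end of a sentence.
        System.out.println(Arrays.toString(parts));
        // prints [The U.S., is a great country., Party time!]
    }
}
```

The pattern only sees punctuation followed by whitespace, so it cannot tell an abbreviation from the end of a sentence.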

Any ideas on how to solve this problem?

And did I miss any edge cases here?

Solution

If you don't have to use regular expressions, you can use Java's built-in BreakIterator (java.text.BreakIterator).

The following code shows an example of parsing sentences, but BreakIterator also supports other kinds of boundary analysis (word, line, etc.). If you work with other languages, you can pass in a different locale. This example uses the default locale.

import java.text.BreakIterator;

String input = "Hello World. "
    + "Today in the U.S.A.,it is a nice day! "
    + "Hurrah!"
    + "The U.S. is a great country. "
    + "Here it comes... "
    + "Party time!";
// Sentence iterator for the default locale.
BreakIterator iterator = BreakIterator.getSentenceInstance();
iterator.setText(input);
// Walk consecutive boundaries; each [start, end) range is one sentence.
int start = iterator.first();
for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
    System.out.println(input.substring(start, end));
}

This produces the following output:

Hello World. 
Today in the U.S.A.,it is a nice day! 
Hurrah!
The U.S. is a great country. 
Here it comes... 
Party time!
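
Since the question asks for a String array, the loop above could be wrapped in a small helper that collects the sentences instead of printing them. This is only a sketch: the class name SentenceSplitter and the method split are hypothetical, and trimming each sentence and passing an explicit Locale are assumptions about what the caller wants, not part of the original answer.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SentenceSplitter {
    // Hypothetical helper: returns the sentences of text as a trimmed String array.
    static String[] split(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            // Assumption: trailing whitespace is not wanted in the result.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "Hello World. The U.S. is a great country. Party time!";
        for (String sentence : split(input, Locale.US)) {
            System.out.println(sentence);
        }
    }
}
```

Passing the Locale explicitly keeps the result independent of the machine's default locale. For text in another language, a different Locale can be supplied.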