Zero-length matches in Java regex
My code:
Pattern pattern = Pattern.compile("a?");
Matcher matcher = pattern.matcher("ababa");
while (matcher.find()) {
    System.out.println(matcher.start() + "[" + matcher.group() + "]" + matcher.end());
}
Output:
0[a]1
1[]1
2[a]3
3[]3
4[a]5
5[]5
I know:
> "a?" represents zero or one occurrence of the character 'a'.
The Java API says:
> matcher.start() returns the start index of the previous match.
> matcher.end() returns the offset after the last character matched.
> matcher.group() returns the input subsequence matched by the previous match. For a matcher m with input sequence s, the expressions m.group() and s.substring(m.start(), m.end()) are equivalent. Note that some patterns, for example a*, match the empty string. This method will return the empty string when the pattern successfully matches the empty string in the input.
What I want to know:
> In which situations does the regex engine encounter zero occurrences of the given character, here the character 'a'?
> In those situations, what values are actually returned by the matcher's start(), end() and group() methods?
I have already quoted what the Java API says, but I am unclear about what happens in a practical case like the one above.
Solution
The "?" is a greedy quantifier, so it tries to match 1 occurrence before trying 0 occurrences. In your string:
> It starts at the first character, 'a', and tries to match 1 occurrence. The 'a' matches, so it returns the first result you see.
> Then it moves forward and finds a 'b'. The 'b' does not match 1 occurrence of your regex, so the engine backtracks and tries to match 0 occurrences. The result is that the empty string is matched, which gives you the second result.
> Then it moves past the 'b', since no more matches are possible there, and starts again with your second 'a'.
> And so on; you get the idea.
It's a little more complicated than that, but that's the main idea: when 1 occurrence cannot be matched, the engine tries 0 occurrences.
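To see the "try 1, then fall back to 0" behaviour position by position, here is a minimal sketch. The class and method names (GreedyOptionalDemo, matchLengthAtEachIndex) are made up for illustration. It anchors the pattern at every index using the standard Matcher.region and Matcher.lookingAt calls and records how many characters were consumed there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class GreedyOptionalDemo {

    // For each index i of the input (including input.length()), anchor the
    // pattern at i and record how many characters it consumes there.
    // -1 means the pattern cannot match at that index at all.
    static List<Integer> matchLengthAtEachIndex(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> lengths = new ArrayList<>();
        for (int i = 0; i <= input.length(); i++) {
            m.region(i, input.length());   // restrict matching to [i, end)
            // lookingAt() only succeeds if a match starts exactly at i
            lengths.add(m.lookingAt() ? m.end() - m.start() : -1);
        }
        return lengths;
    }

    public static void main(String[] args) {
        // "a?" consumes one char where there is an 'a', zero chars elsewhere
        System.out.println(matchLengthAtEachIndex("a?", "ababa"));
        // prints [1, 0, 1, 0, 1, 0]
    }
}
```

Note that "a?" never fails: at every index it matches either one 'a' or the empty string, which is why the original loop also reports a final empty match at index 5, the end of the input.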
As for the values of start, end and group: start() and end() are the positions where the match starts and ends, and group() is the text that was matched. So for the first 0-occurrence match in your string, you get 1, 1 and the empty string. I'm not sure this fully answers your question.
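As a rough check of those values, the sketch below labels each match found by find() as zero-length or non-empty and verifies the documented equivalence between group() and substring(start(), end()). The class and method names (MatchOffsetsDemo, describeMatches) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class MatchOffsetsDemo {

    // Returns one line per match in the form "start[group]end kind".
    static List<String> describeMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // group() is the slice of the input between start() and end()
            if (!m.group().equals(input.substring(m.start(), m.end()))) {
                throw new IllegalStateException("group() differs from substring");
            }
            // A zero-length match has start() == end() and an empty group()
            String kind = (m.start() == m.end()) ? "zero-length" : "non-empty";
            out.add(m.start() + "[" + m.group() + "]" + m.end() + " " + kind);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describeMatches("a?", "ababa")) {
            System.out.println(line);
        }
        // prints:
        // 0[a]1 non-empty
        // 1[]1 zero-length
        // 2[a]3 non-empty
        // 3[]3 zero-length
        // 4[a]5 non-empty
        // 5[]5 zero-length
    }
}
```

For every zero-length match, start() and end() are equal and group() is "", which matches the 1, 1 and empty string described above.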