Java: regular expressions and escaped vs. unescaped separators
Problem
I have a string:
a\;b\\;c;d
which looks like this in Java source:
String s = "a\\;b\\\\;c;d";
I need to split it on semicolons according to the following rules:
- If a semicolon is preceded by a backslash, it should not be treated as a separator (as between a and b).
- If the backslash itself is escaped, and therefore does not escape the semicolon, the semicolon should be treated as a separator (as between b and c).
In other words, a semicolon is a separator if it is preceded by zero or an even number of backslashes.
For the example above, I want to get the following three strings (shown as raw text; in Java source, each backslash would be doubled):
a\;b\\
c
d
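To make the even/odd backslash rule concrete, here is a minimal sketch of a helper that checks a single semicolon. The class name SeparatorRule and the method isSeparator are hypothetical, introduced only for illustration; this is not the regex solution itself.

```java
final class SeparatorRule {
    /**
     * Returns true if s.charAt(index) is a ';' that is preceded by zero
     * or an even number of consecutive backslashes.
     */
    static boolean isSeparator(String s, int index) {
        if (s.charAt(index) != ';') {
            return false;
        }
        int backslashes = 0;
        // Count the run of backslashes immediately before the semicolon.
        for (int i = index - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 0;
    }

    public static void main(String[] args) {
        String s = "a\\;b\\\\;c;d"; // raw text: a\;b\\;c;d
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ';') {
                System.out.println("index " + i + ": separator = " + isSeparator(s, i));
            }
        }
    }
}
```

For the sample string, the semicolon at index 2 is reported as escaped, while those at indices 6 and 8 are reported as separators.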
Solution
You can use the regular expression
(?:\\.|[^;\\]++)*
to match all the text between unescaped semicolons:
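import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// subjectString is the input to split, e.g. "a\\;b\\\\;c;d"
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("(?:\\\\.|[^;\\\\]++)*");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // Note: the pattern can match the empty string, so find() also reports
    // an empty match at each separator and at the end of the input.
    matchList.add(regexMatcher.group());
}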
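Because the pattern can match the empty string, a plain find() loop also yields empty matches at each separator and at the end of the input. One possible way around this, sketched below, is to match one field at a time with lookingAt() and step over the separator by hand. The class EscapedSplitter and its split method are hypothetical names, and the sketch assumes every backslash in the input escapes a character that the pattern's dot accepts (so no dangling backslash at the end of the input).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapedSplitter {
    // Same pattern as above: escaped characters or runs of ordinary characters.
    private static final Pattern FIELD = Pattern.compile("(?:\\\\.|[^;\\\\]++)*");

    static List<String> split(String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        int pos = 0;
        while (true) {
            // Match exactly one (possibly empty) field starting at pos.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                // Not expected: the pattern always matches at least the empty string.
                throw new IllegalStateException("no match at index " + pos);
            }
            fields.add(m.group());
            pos = m.end();
            if (pos == input.length()) {
                return fields;
            }
            if (input.charAt(pos) != ';') {
                // Assumption violated: a backslash that does not escape anything.
                throw new IllegalArgumentException("unexpected backslash at index " + pos);
            }
            pos++; // step over the unescaped separator
        }
    }

    public static void main(String[] args) {
        System.out.println(split("a\\;b\\\\;c;d")); // prints [a\;b\\, c, d]
    }
}
```

With this variant, empty fields in the input (for example in "a;;b") are kept as empty strings rather than being confused with the zero-length matches that find() produces.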
Explanation:
(?:         # Match either...
 \\.        # any escaped character
|           # or...
 [^;\\]++   # any character(s) except semicolon or backslash; possessive match
)*          # Repeat any number of times.
Because the quantifiers are nested, the possessive quantifier (++) is important: it prevents catastrophic backtracking.
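The effect of the possessive quantifier can be observed with a small experiment. The sketch below is an assumption-laden illustration: it appends a trailing ; to both patterns (which the pattern above does not have) so that a match can fail and force the engine to backtrack, and the class name PossessiveDemo is hypothetical. Timings depend on the JVM, and some JDK versions may mitigate this kind of backtracking, so no particular numbers should be expected.

```java
import java.util.regex.Pattern;

final class PossessiveDemo {
    // Greedy inner quantifier: nested + and * can retry many ways to split a run.
    static final Pattern GREEDY = Pattern.compile("(?:\\\\.|[^;\\\\]+)*;");
    // Possessive inner quantifier: a matched run is never given back.
    static final Pattern POSSESSIVE = Pattern.compile("(?:\\\\.|[^;\\\\]++)*;");

    /** Runs matches() once and returns the elapsed time in milliseconds. */
    static long timeMillis(Pattern p, String input) {
        long start = System.nanoTime();
        p.matcher(input).matches();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // 22 ordinary characters and no semicolon, so both patterns must fail.
        String input = "aaaaaaaaaaaaaaaaaaaaaa";
        System.out.println("greedy:     " + timeMillis(GREEDY, input) + " ms");
        System.out.println("possessive: " + timeMillis(POSSESSIVE, input) + " ms");
    }
}
```

Both patterns reject the input; the difference, where it shows up, is only in how much work the engine does before giving up.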