Java – regular expression to retrieve a quoted string and its quote character
I have a language that defines a string as being delimited by either single or double quotes, where the delimiter is escaped inside the string by doubling it. For example, all of the following are legal strings:
'This isn''t easy to parse.'
'Then John said,"Hello Tim!"'
"This isn't easy to parse."
"Then John said,""Hello Tim!"""
I have a collection of strings (as defined above) separated by text that does not contain quotes. I am trying to use a regular expression to parse each string in the list. For example, here is an input:
The regular expression used to determine whether a string has this form is trivial:
^(?:"(?:[^"]|"")*"|'(?:[^']|'')*')(?:\s+[^"'\s]+\s+(?:"(?:[^"]|"")*"|'(?:[^']|'')*'))*
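For reference, here is a minimal sketch of how that validation check might look in Java. The class name QuotedListValidator, the method isValid and the sample inputs are made up for illustration. The pattern is assembled from a shared sub-expression so the quoted-string part is written only once, and matches() is used so that the whole input has to fit the pattern.

```java
import java.util.regex.Pattern;

class QuotedListValidator {
    // One quoted string: double- or single-quoted, delimiter escaped by doubling.
    private static final String QUOTED = "(?:\"(?:[^\"]|\"\")*\"|'(?:[^']|'')*')";

    // A quoted string followed by zero or more "separator, quoted string" pairs.
    // The separator is a single run of characters that are neither quotes nor whitespace.
    private static final Pattern LIST =
            Pattern.compile(QUOTED + "(?:\\s+[^\"'\\s]+\\s+" + QUOTED + ")*");

    static boolean isValid(String input) {
        // matches() requires the pattern to cover the entire input.
        return LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Hypothetical inputs, not taken from the question.
        System.out.println(isValid("\"Some String #1\" OR 'Some ''String'' #6'"));
        System.out.println(isValid("\"unterminated OR 'x'"));
    }
}
```

Note that, as written, the separator pattern accepts only one whitespace-free token between two strings.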
After running the above expression to test that the input has this form, I need another regular expression to extract each delimited string from the input. I plan to do it like this:
    Pattern pattern = Pattern.compile("What REGEX goes here?");
    Matcher matcher = pattern.matcher(inputString);
    int startIndex = 0;
    while (matcher.find(startIndex)) {
        String quote = matcher.group(1);
        String quotedString = matcher.group(2);
        ...
        startIndex = matcher.end();
    }
I want a regular expression that captures the quote character in group #1 and the text inside the quotes in group #2 (I am using Java regex). So, for the above input, I am looking for a regular expression that produces the following output in each loop iteration:
Loop 1: matcher.group(1) = "   matcher.group(2) = Some String #1
Loop 2: matcher.group(1) = '   matcher.group(2) = Some String #2
Loop 3: matcher.group(1) = "   matcher.group(2) = Some 'String' #3
Loop 4: matcher.group(1) = '   matcher.group(2) = Some "String" #4
Loop 5: matcher.group(1) = "   matcher.group(2) = Some ""String"" #5
Loop 6: matcher.group(1) = '   matcher.group(2) = Some ''String'' #6
Patterns I have tried so far (each shown unescaped, followed by the version escaped for Java source code):
(["'])((?:[^\1]|\1\1)*)\1
"([\"'])((?:[^\\1]|\\1\\1)*)\\1"

(?<quot>")(?<val>(?:[^"]|"")*)"|(?<quot>')(?<val>(?:[^']|'')*)'
"(?<quot>\")(?<val>(?:[^\"]|\"\")*)\"|(?<quot>')(?<val>(?:[^']|'')*)'"
Both fail when I try to compile the pattern.
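A likely reason for the second failure, as far as I know, is that java.util.regex does not allow the same group name to be defined twice in one pattern, even in different branches of an alternation. The first attempt puts \1 inside a character class, where a back-reference has no meaning. The sketch below only checks the named-group case; the helper compiles and the alternative names quot2 and val2 are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class PatternCompileCheck {
    // Returns true if java.util.regex accepts the pattern, false if it throws.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The named-group attempt from the question: quot and val each appear twice.
        String duplicateNames =
                "(?<quot>\")(?<val>(?:[^\"]|\"\")*)\"|(?<quot>')(?<val>(?:[^']|'')*)'";
        // Same pattern with hypothetical distinct names in the second branch.
        String distinctNames =
                "(?<quot>\")(?<val>(?:[^\"]|\"\")*)\"|(?<quot2>')(?<val2>(?:[^']|'')*)'";
        System.out.println(compiles(duplicateNames));
        System.out.println(compiles(distinctNames));
    }
}
```

With distinct names the pattern compiles, but the caller then has to check which pair of groups actually took part in the match.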
Is such a regular expression possible?
Solution
Create a utility class that does the matching for you:
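    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class test {
        // In both patterns, group 1 is the quote character and group 2 is the text between the quotes.
        private static Pattern pd = Pattern.compile("(\")((?:[^\"]|\"\")*)\"");
        private static Pattern ps = Pattern.compile("(')((?:[^']|'')*)'");

        // Expects s to be exactly one quoted string.
        public static Matcher match(String s) {
            Matcher md = pd.matcher(s);
            if (md.matches())
                return md;
            // Not matched yet: the caller must call matches() on this before reading groups.
            return ps.matcher(s);
        }
    }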
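If a single pattern is preferred for the find() loop from the question, one possible approach (not part of the answer above) is to replace the character class with a negative lookahead on the back-reference, so that "any character except the opening quote" is written as (?!\1). instead of [^\1]. The following is only a sketch: the class name QuotedStringScanner, the method scan and the sample input are invented for illustration, and the DOTALL flag assumes that strings may span lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedStringScanner {
    // Group 1: the opening quote character.
    // Group 2: the body, where each item is either a doubled quote (\1\1)
    //          or any single character that is not the quote ((?!\1).).
    // DOTALL lets '.' match line terminators (an assumption about the language).
    private static final Pattern QUOTED =
            Pattern.compile("([\"'])((?:\\1\\1|(?!\\1).)*)\\1", Pattern.DOTALL);

    // Returns {quote, body} pairs in order of appearance.
    static List<String[]> scan(String input) {
        List<String[]> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) { // find() resumes after the end of the previous match
            result.add(new String[] { m.group(1), m.group(2) });
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input; the OR/AND separators are placeholders.
        String input =
                "\"Some String #1\" OR 'Some \"String\" #4' AND \"Some \"\"String\"\" #5\"";
        for (String[] pair : scan(input)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Group 2 keeps the doubled quotes exactly as they appear in the input, which is what the expected output in the question shows. To get the unescaped value, something like body.replace(quote + quote, quote) could be applied afterwards.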