Java – regular expression to split a German address into its parts
Good evening,
I am trying to split a German address string into its parts in Java. Does anyone know a regular expression or library that does this? The split should look like this:
Namederstraße 25a 88489 Teststadt to Namederstraße | 25a | 88489 | Teststadt
or
Teststr. 3 88489 Beispielort (Großer Kreis) to Teststr. | 3 | 88489 | Beispielort (Großer Kreis)
It would be perfect if the system / regular expression still worked when parts such as the zip code or the city are missing.
Is there a regular expression or library I can use to achieve this?
Edit: German address rules:
Street: characters, numbers and spaces
House number: digits and any other characters (or spaces) up to a run of digits (the zip code), at least in these examples
Zip code: 5 digits
Place or city: the rest; it may also contain spaces, commas or parentheses
Solution
I ran into a similar problem, slightly adjusted the solution provided here, and found that this version also works but is (IMO) a bit easier to understand and extend:
/^([a-zäöüß\s\d.,-]+?)\s*([\d\s]+(?:\s?[-|+/]\s?\d+)?\s*[a-z]?)?\s*(\d{5})\s*(.+)?$/i
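Since the question is about Java, here is a minimal sketch of how this pattern could be used with java.util.regex. The class name AddressSplit and the split helper are hypothetical. The trailing /i flag becomes Pattern.CASE_INSENSITIVE, and Pattern.UNICODE_CASE is added (an assumption on my part) so that the umlaut class also matches Ä, Ö and Ü. Each group is trimmed because the house-number group can end with a space: for the second example it captures "3 " rather than "3".

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressSplit {

    // The regex from above with backslashes doubled for a Java string literal.
    // CASE_INSENSITIVE replaces the /i flag; UNICODE_CASE makes it apply to umlauts too.
    private static final Pattern ADDRESS = Pattern.compile(
            "^([a-zäöüß\\s\\d.,-]+?)\\s*([\\d\\s]+(?:\\s?[-|+/]\\s?\\d+)?\\s*[a-z]?)?\\s*(\\d{5})\\s*(.+)?$",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    /**
     * Returns {street, houseNumber, zip, city}. Optional parts that are missing are null.
     * Returns null if the whole string does not match (e.g. no 5-digit zip code).
     */
    static String[] split(String address) {
        Matcher m = ADDRESS.matcher(address.trim());
        if (!m.matches()) {
            return null;
        }
        String[] parts = new String[4];
        for (int i = 0; i < parts.length; i++) {
            String g = m.group(i + 1);
            // A group that did not participate is null; others may carry trailing spaces.
            parts[i] = (g == null) ? null : g.trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("Namederstraße 25a 88489 Teststadt")));
        System.out.println(Arrays.toString(split("Teststr. 3 88489 Beispielort (Großer Kreis)")));
    }
}
```

For the two addresses from the question this prints [Namederstraße, 25a, 88489, Teststadt] and [Teststr., 3, 88489, Beispielort (Großer Kreis)]. Note that the street keeps the abbreviation dot, because "." is part of the street character class.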
Here are some example matches.
It can also handle missing street numbers, and it can easily be extended by adding special characters to the character classes.
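To check the claim about house numbers, here is a small Java sketch that only extracts group 2. The class name HouseNumberDemo and the sample addresses are hypothetical. The optional group simply does not participate when there is no number, and the [-|+/] class is what lets ranges such as "12-14" through.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HouseNumberDemo {

    // The pattern from the answer, with backslashes doubled for a Java string literal.
    private static final Pattern ADDRESS = Pattern.compile(
            "^([a-zäöüß\\s\\d.,-]+?)\\s*([\\d\\s]+(?:\\s?[-|+/]\\s?\\d+)?\\s*[a-z]?)?\\s*(\\d{5})\\s*(.+)?$",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    /** Returns the trimmed house number (group 2), or null if it is absent or nothing matched. */
    static String houseNumber(String address) {
        Matcher m = ADDRESS.matcher(address.trim());
        if (!m.matches() || m.group(2) == null) {
            return null;
        }
        return m.group(2).trim();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Teststraße 88489 Teststadt",         // no house number at all
            "Teststraße 7b 88489 Teststadt",      // number plus letter suffix
            "Teststraße 12-14 88489 Teststadt",   // range, via the separator class
            "Teststraße 12 / 14 88489 Teststadt"  // separator with spaces around it
        };
        for (String s : samples) {
            System.out.println(s + " -> " + houseNumber(s));
        }
    }
}
```

This should print null, 7b, 12-14 and 12 / 14 for the four samples. Supporting another separator is a matter of adding it to [-|+/].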
[a-zäöüß\s\d,.-]+?                         # Street name (lazy)
([\d\s]+(?:\s?[-|+/]\s?\d+)?\s*[a-z]?)?     # Street number (optional)
After that comes the zip code, which is mandatory because it is the only constant part. Everything after the zip code is treated as the city name.
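The part-by-part breakdown can also be kept in the Java source. Below is a sketch (class and method names are hypothetical) that uses Pattern.COMMENTS, where unescaped whitespace is ignored and "#" starts a comment running to the end of the line, together with named groups. The character classes use \s and never a literal space, so free-spacing mode cannot change their meaning. It also shows the consequence of the zip code being the anchor: a missing city just leaves the city group null, while an address without a 5-digit zip code does not match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnnotatedAddressPattern {

    // COMMENTS mode: whitespace outside escapes is ignored and "#" comments run to "\n".
    // Named groups (Java 7+) replace the numbered groups of the original pattern.
    private static final Pattern ADDRESS = Pattern.compile(
            "^\n"
            + "(?<street>[a-zäöüß\\s\\d.,-]+?)                       # street name (lazy)\n"
            + "\\s*\n"
            + "(?<number>[\\d\\s]+(?:\\s?[-|+/]\\s?\\d+)?\\s*[a-z]?)?  # street number (optional)\n"
            + "\\s*\n"
            + "(?<zip>\\d{5})                                        # zip code (required)\n"
            + "\\s*\n"
            + "(?<city>.+)?                                          # city: everything after the zip\n"
            + "$",
            Pattern.COMMENTS | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    /** Trimmed value of a named group, or null if the group did not take part in the match. */
    private static String group(Matcher m, String name) {
        String g = m.group(name);
        return g == null ? null : g.trim();
    }

    /** Describes the parts, or returns "no match" when there is no 5-digit zip code. */
    static String describe(String address) {
        Matcher m = ADDRESS.matcher(address.trim());
        if (!m.matches()) {
            return "no match";
        }
        return "street=" + group(m, "street")
                + ", number=" + group(m, "number")
                + ", zip=" + group(m, "zip")
                + ", city=" + group(m, "city");
    }

    public static void main(String[] args) {
        System.out.println(describe("Teststraße 1 88489"));     // city missing
        System.out.println(describe("Teststraße 1 Teststadt")); // zip code missing
    }
}
```

This prints "street=Teststraße, number=1, zip=88489, city=null" followed by "no match". So a missing city is tolerated, but a missing zip code (one of the cases asked about in the question) would need a separate fallback.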