Java: regular expressions are slow. How can I quickly check whether a string contains only word characters?
I have a function that checks whether a string (most of the strings contain only a single CJK character) consists only of word characters. It is called a very large number of times, so its cost is unacceptable, but I don't know how to optimize it. Any suggestions?
```java
import java.util.regex.Pattern;

/*
 * \w is equivalent to the character class [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}].
 * For more details see Unicode TR-18, and bear in mind that the set of
 * characters in each class can vary between Unicode releases.
 */
private static final Pattern sOnlyWordChars = Pattern.compile("\\w+");

private boolean isOnlyWordChars(String s) {
    return sOnlyWordChars.matcher(s).matches();
}
```
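The comment above appears to describe the Unicode-aware `\w` documented for Android's ICU-backed `Pattern`. On a standard desktop JDK, `\w` is ASCII-only (`[a-zA-Z_0-9]`) by default, so a CJK character would not match; the `Pattern.UNICODE_CHARACTER_CLASS` flag (Java 7+) switches `\w` to Unicode semantics. A minimal sketch comparing the two, assuming a standard JDK rather than Android (the class and method names are illustrative):

```java
import java.util.regex.Pattern;

public class WordCharRegexDemo {
    // Default JDK semantics: \w is ASCII-only, i.e. [a-zA-Z_0-9].
    private static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    // UNICODE_CHARACTER_CLASS makes \w Unicode-aware: alphabetic characters
    // (including CJK ideographs), Unicode digits, connector punctuation such
    // as '_', and combining marks all count as word characters.
    private static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean isOnlyAsciiWordChars(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    static boolean isOnlyUnicodeWordChars(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"3G", "go_url", "hao123", "\u4e2d", "a-b"};
        for (String s : samples) {
            System.out.println(s + " ascii=" + isOnlyAsciiWordChars(s)
                    + " unicode=" + isOnlyUnicodeWordChars(s));
        }
    }
}
```

With the default flags, `"\u4e2d"` (中) is rejected; with `UNICODE_CHARACTER_CLASS` it is accepted, while `"3G"`, `"go_url"` and `"hao123"` pass either way.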
When s is "3G", "go_url" or "hao123", isOnlyWordChars(s) should return true.
Solution
```java
private boolean isOnlyWordChars(String s) {
    char[] chars = s.toCharArray(); // copies the string into a new array
    for (char c : chars) {
        if (!Character.isLetter(c)) {
            return false;
        }
    }
    return true;
}
```
Better implementation
```java
public static boolean isAlpha(String str) {
    if (str == null) {
        return false;
    }
    int sz = str.length();
    // charAt() reads the characters in place, without copying the string
    for (int i = 0; i < sz; i++) {
        if (Character.isLetter(str.charAt(i)) == false) {
            return false;
        }
    }
    return true;
}
```
Or, if you are using Apache Commons Lang, use StringUtils.isAlpha(). The second implementation above is actually taken from the source code of isAlpha.
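Both loops above test `Character.isLetter`, so they return `false` for digits and `'_'`; that means `"3G"`, `"go_url"` and `"hao123"` would be rejected even though the question requires `true` for them. They also inspect one `char` at a time, so a CJK ideograph outside the Basic Multilingual Plane (stored as a surrogate pair) is rejected too. A minimal sketch of a loop closer to the regex, assuming "word character" means letter, digit or underscore (the name `isOnlyWordCharsLoop` is hypothetical, and this is an approximation of `\w`, not an exact equivalent):

```java
public class WordCharLoop {
    // Hypothetical helper: true if s is non-empty and every code point is a
    // letter, a digit, or '_'. Approximates \w; it does not accept combining
    // marks, which Unicode-mode \w does.
    static boolean isOnlyWordCharsLoop(String s) {
        if (s == null || s.isEmpty()) {
            return false; // mirrors \w+, which needs at least one character
        }
        int i = 0;
        int len = s.length();
        while (i < len) {
            int cp = s.codePointAt(i);
            if (!Character.isLetterOrDigit(cp) && cp != '_') {
                return false;
            }
            i += Character.charCount(cp); // step over both halves of a surrogate pair
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isOnlyWordCharsLoop("3G"));           // true
        System.out.println(isOnlyWordCharsLoop("go_url"));       // true
        System.out.println(isOnlyWordCharsLoop("hao123"));       // true
        System.out.println(isOnlyWordCharsLoop("\u4e2d"));       // true (U+4E2D)
        System.out.println(isOnlyWordCharsLoop("\uD840\uDC00")); // true (U+20000)
        System.out.println(isOnlyWordCharsLoop("a-b"));          // false
    }
}
```

Like the `charAt()` version, this reads the string in place instead of copying it with `toCharArray()`.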
UPDATE
Sorry for the late reply. I wasn't sure about the speed, although I have read in several places that loops are faster than regular expressions. To make sure, I ran the following code on Ideone; the results are as follows:
5,000,000 iterations
Your code: 4.99 seconds (followed by a runtime error, so it doesn't hold up for large data)
My first code: 2.71 seconds
My second code: 2.52 seconds
500,000 iterations
Your code: 1.07 seconds
My first code: 0.36 seconds
My second code: 0.33 seconds
Here is the sample code I used.
Note: there may be minor errors, but you can use it to test different scenarios. Regarding Jan's comments, I think details such as whether the method is private or public are minor, but adding the condition check is a good idea.
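A rough sketch of how such a regex-versus-loop comparison can be set up, assuming a desktop JDK (class names, method names and sample inputs are illustrative, not the code used for the timings above). The loop mirrors the default ASCII-only `\w+` exactly, so both sides do the same work; the harness first checks that they agree, then times each with `System.nanoTime()`. Hand-written timing loops like this are sensitive to JIT warm-up and dead-code elimination, so a dedicated harness such as JMH gives more trustworthy numbers.

```java
import java.util.regex.Pattern;

public class WordCharBenchmark {
    private static final Pattern WORD = Pattern.compile("\\w+");

    static boolean regexCheck(String s) {
        return WORD.matcher(s).matches();
    }

    // ASCII-only loop equivalent to the default (non-Unicode) \w+ above.
    static boolean loopCheck(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean word = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '_';
            if (!word) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] inputs = {"3G", "go_url", "hao123", "a-b", ""};
        // Sanity check: both implementations must agree before timing them.
        for (String s : inputs) {
            if (regexCheck(s) != loopCheck(s)) {
                throw new IllegalStateException("mismatch for \"" + s + "\"");
            }
        }
        int iterations = 5_000_000;
        int hits = 0; // printed below so the JIT cannot discard the calls

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (regexCheck(inputs[i % inputs.length])) hits++;
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (loopCheck(inputs[i % inputs.length])) hits++;
        }
        long t2 = System.nanoTime();

        System.out.printf("regex: %.2f s%n", (t1 - t0) / 1e9);
        System.out.printf("loop:  %.2f s%n", (t2 - t1) / 1e9);
        System.out.println("hits = " + hits);
    }
}
```

For CJK input, the regex side would need `Pattern.UNICODE_CHARACTER_CLASS` and the loop side a code-point-aware check, so that both still implement the same definition of a word character.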