Java – Dynamically parsing a logical filter expression with AND, OR, and nested conditions
I have an incoming record filter stored as a logical clause, as shown below:
Acct1 = 'Y' AND Acct2 = 'N' AND Acct3 = 'N' AND Acct4 = 'N' AND Acct5 = 'N' AND ((Acct6 = 'N' OR Acct7 = 'N' AND Acct1 = 'Y') AND Formatted= 'N' AND Acct9 = 'N' AND (Acct10 = 'N' AND Acct11 = 'N') AND EditableField= 'N' )
The data this clause will be applied to comes from a CSV file, as shown below:
Country,Type,Usage,Acct1,Acct2,Acct3,Acct4,Acct5,Acct6,Acct7,Formatted,Acct9,Acct10,Acct11,EditableField
USA,Premium,Corporate,Y,N,Mexico,USA,
I will have to filter the records in the file according to the conditions defined in the clause. This is an example of a simple clause, but there will be more nested conditions than this, the clause can be changed whenever the user needs, and a record must pass 10 such clauses in order.
So I'm looking for a way to dynamically interpret this clause and apply it to incoming records. Please share your suggestions on how to design this, along with examples (if any).
Solution
This is a complete solution that does not rely on third-party libraries such as ANTLR or JavaCC. Note that although it is extensible, its functionality is still limited. If you need more complex expressions, you would be better off using a parser generator.
First, let's write a tokenizer to split the input string into tokens. These are the token types:
private static enum TokenType {
    WHITESPACE, AND, OR, EQUALS, LEFT_PAREN, RIGHT_PAREN, IDENTIFIER, LITERAL, EOF
}
The Token class itself:
private static class Token {
    final TokenType type;
    final int start; // start position in input (for error reporting)
    final String data; // payload

    public Token(TokenType type, int start, String data) {
        this.type = type;
        this.start = start;
        this.data = data;
    }

    @Override
    public String toString() {
        return type + "[" + data + "]";
    }
}
To simplify tokenization, we create a regexp that reads the next token from the input string:
private static final Pattern TOKENS = Pattern.compile("(\\s+)|(AND)|(OR)|(=)|(\\()|(\\))|(\\w+)|\'([^\']+)\'");
Note that it has one capturing group per token type, in the same order (first WHITESPACE, then AND, and so on). Finally, the tokenizer method:
private static TokenStream tokenize(String input) throws ParseException {
    Matcher matcher = TOKENS.matcher(input);
    List<Token> tokens = new ArrayList<>();
    int offset = 0;
    TokenType[] types = TokenType.values();
    while (offset != input.length()) {
        if (!matcher.find() || matcher.start() != offset) {
            throw new ParseException("Unexpected token at " + offset, offset);
        }
        for (int i = 0; i < types.length; i++) {
            if (matcher.group(i + 1) != null) {
                if (types[i] != TokenType.WHITESPACE)
                    tokens.add(new Token(types[i], offset, matcher.group(i + 1)));
                break;
            }
        }
        offset = matcher.end();
    }
    tokens.add(new Token(TokenType.EOF, input.length(), ""));
    return new TokenStream(tokens);
}
I'm using java.text.ParseException here. We apply the regular expression matcher repeatedly until the end of the input. If it does not match at the current position, we throw an exception. Otherwise, we find the matching group and create a token from it, skipping WHITESPACE tokens. Finally, we add an EOF token to mark the end of the input. The result is returned as a special TokenStream object. This is the TokenStream class, which will help us during parsing:
private static class TokenStream {
    final List<Token> tokens;
    int offset = 0;

    public TokenStream(List<Token> tokens) {
        this.tokens = tokens;
    }

    // consume next token of given type (throw exception if type differs)
    public Token consume(TokenType type) throws ParseException {
        Token token = tokens.get(offset++);
        if (token.type != type) {
            throw new ParseException("Unexpected token at " + token.start + ": " + token
                    + " (was looking for " + type + ")", token.start);
        }
        return token;
    }

    // consume token of given type (return null and don't advance if type differs)
    public Token consumeIf(TokenType type) {
        Token token = tokens.get(offset);
        if (token.type == type) {
            offset++;
            return token;
        }
        return null;
    }

    @Override
    public String toString() {
        return tokens.toString();
    }
}
So we have a tokenizer, hoorah. You can test it right now with System.out.println(tokenize("Acct1 = 'Y' AND (Acct2 = 'N' OR Acct3 = 'N')"));
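As an aside, the "one capturing group per alternative" trick used by the tokenizer can be demonstrated in isolation. The sketch below is standalone (the class name and the classify helper are mine, not part of the solution above): it matches a single token against the same alternation and reports which group fired.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Same alternation as TOKENS (minus EOF, which is synthetic):
    // one capturing group per token type, in declaration order.
    private static final Pattern P =
            Pattern.compile("(\\s+)|(AND)|(OR)|(=)|(\\()|(\\))|(\\w+)|'([^']+)'");
    private static final String[] NAMES = {
            "WHITESPACE", "AND", "OR", "EQUALS",
            "LEFT_PAREN", "RIGHT_PAREN", "IDENTIFIER", "LITERAL"
    };

    // Returns the name of the first group that matched the whole input.
    static String classify(String token) {
        Matcher m = P.matcher(token);
        if (!m.matches()) return "NO_MATCH";
        for (int i = 0; i < NAMES.length; i++) {
            if (m.group(i + 1) != null) return NAMES[i];
        }
        return "NO_MATCH";
    }

    public static void main(String[] args) {
        System.out.println(classify("AND"));   // AND
        System.out.println(classify("Acct1")); // IDENTIFIER
        System.out.println(classify("'Y'"));   // LITERAL
    }
}
```

Note that AND is reported by its own group rather than the later \w+ group because regex alternation tries alternatives left to right.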
Now let's write a parser that creates a tree representation of the expression. First, the Expr interface for all tree nodes:
public interface Expr {
    public boolean evaluate(Map<String, String> data);
}
Its only method evaluates the expression against a given dataset and returns true if the dataset matches.
The most basic expression is EqualsExpr, which matches something like Acct1 = 'Y' or 'Y' = Acct1:
private static class EqualsExpr implements Expr {
    private final String identifier, literal;

    public EqualsExpr(TokenStream stream) throws ParseException {
        Token token = stream.consumeIf(TokenType.IDENTIFIER);
        if (token != null) {
            this.identifier = token.data;
            stream.consume(TokenType.EQUALS);
            this.literal = stream.consume(TokenType.LITERAL).data;
        } else {
            this.literal = stream.consume(TokenType.LITERAL).data;
            stream.consume(TokenType.EQUALS);
            this.identifier = stream.consume(TokenType.IDENTIFIER).data;
        }
    }

    @Override
    public String toString() {
        return identifier + "='" + literal + "'";
    }

    @Override
    public boolean evaluate(Map<String, String> data) {
        return literal.equals(data.get(identifier));
    }
}
The toString() method is for debugging only; you can remove it.
Next, we define the SubExpr class, which is either an EqualsExpr or something more complex in parentheses (if we see a parenthesis):
private static class SubExpr implements Expr {
    private final Expr child;

    public SubExpr(TokenStream stream) throws ParseException {
        if (stream.consumeIf(TokenType.LEFT_PAREN) != null) {
            child = new OrExpr(stream);
            stream.consume(TokenType.RIGHT_PAREN);
        } else {
            child = new EqualsExpr(stream);
        }
    }

    @Override
    public String toString() {
        return "(" + child + ")";
    }

    @Override
    public boolean evaluate(Map<String, String> data) {
        return child.evaluate(data);
    }
}
Next is AndExpr, which is a set of SubExpr expressions connected by the AND operator:
private static class AndExpr implements Expr {
    private final List<Expr> children = new ArrayList<>();

    public AndExpr(TokenStream stream) throws ParseException {
        do {
            children.add(new SubExpr(stream));
        } while (stream.consumeIf(TokenType.AND) != null);
    }

    @Override
    public String toString() {
        return children.stream().map(Object::toString).collect(Collectors.joining(" AND "));
    }

    @Override
    public boolean evaluate(Map<String, String> data) {
        for (Expr child : children) {
            if (!child.evaluate(data)) return false;
        }
        return true;
    }
}
For brevity, I use the Java 8 Stream API in toString(). If you cannot use Java 8, rewrite it with a for loop or remove toString() entirely.
Finally, we define OrExpr, which is a set of AndExprs connected by OR (conventionally, OR has lower precedence than AND). It is very similar to AndExpr:
private static class OrExpr implements Expr {
    private final List<Expr> children = new ArrayList<>();

    public OrExpr(TokenStream stream) throws ParseException {
        do {
            children.add(new AndExpr(stream));
        } while (stream.consumeIf(TokenType.OR) != null);
    }

    @Override
    public String toString() {
        return children.stream().map(Object::toString).collect(Collectors.joining(" OR "));
    }

    @Override
    public boolean evaluate(Map<String, String> data) {
        for (Expr child : children) {
            if (child.evaluate(data)) return true;
        }
        return false;
    }
}
Finally, the parsing method:
public static Expr parse(TokenStream stream) throws ParseException {
    OrExpr expr = new OrExpr(stream);
    stream.consume(TokenType.EOF); // ensure that we parsed the whole input
    return expr;
}
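The question mentions that a record must pass about 10 such clauses in order. Once each clause string has been parsed into an Expr, chaining them is just a short-circuiting loop. Here is a standalone sketch (passesAll is a hypothetical helper of mine; it uses Predicate as a stand-in for parsed Expr objects so the snippet compiles on its own — in the real code each Predicate would be parse(tokenize(clauseString))::evaluate):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class ChainDemo {
    // Applies the clauses in order; fails fast on the first mismatch.
    public static boolean passesAll(Map<String, String> row,
                                    List<Predicate<Map<String, String>>> clauses) {
        for (Predicate<Map<String, String>> clause : clauses) {
            if (!clause.test(row)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("Acct1", "Y", "Acct2", "N");
        List<Predicate<Map<String, String>>> clauses = List.of(
                r -> "Y".equals(r.get("Acct1")),
                r -> "N".equals(r.get("Acct2")));
        System.out.println(passesAll(row, clauses)); // true
    }
}
```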
Now you can parse an expression to get an Expr object and evaluate it against the rows of the CSV file. I assume you can parse CSV rows into a Map<String,String>. Here is an example of usage:
Map<String, String> data = new HashMap<>();
data.put("Acct1", "Y");
data.put("Acct2", "N");
data.put("Acct3", "Y");
data.put("Acct4", "N");

Expr expr = parse(tokenize("Acct1 = 'Y' AND (Acct2 = 'Y' OR Acct3 = 'Y')"));
System.out.println(expr.evaluate(data)); // true

expr = parse(tokenize("Acct1 = 'N' OR 'Y' = Acct2 AND Acct3 = 'Y'"));
System.out.println(expr.evaluate(data)); // false
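The CSV-row-to-Map step is assumed above; if you do not already have a CSV library in place, a minimal sketch could look like the following (toMap is a hypothetical helper of mine; it does a naive split and will not handle quoted fields or embedded commas):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvRowToMap {
    // Pairs a header line with one data line to build the Map<String,String>
    // that Expr.evaluate expects. Missing trailing values become "".
    static Map<String, String> toMap(String headerLine, String dataLine) {
        String[] headers = headerLine.split(",", -1);
        String[] values = dataLine.split(",", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < headers.length; i++) {
            row.put(headers[i].trim(), i < values.length ? values[i].trim() : "");
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row = toMap("Country,Type,Acct1,Acct2", "USA,Premium,Y,N");
        System.out.println(row.get("Acct1")); // Y
    }
}
```

For real-world CSV data (quoting, escaping), a proper parser such as Apache Commons CSV or OpenCSV would be a safer choice.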