Java – EBNF / parboiled: how to translate a regexp into PEG?

This is a question specific to the parboiled parser framework, and to BNF / PEG in general.

Suppose I have a fairly simple regular expression:

^\\s*([A-Za-z_][A-Za-z_0-9]*)\\s*=\\s*(\\S+)\\s*$

It corresponds to the following pseudo-EBNF:

<line>               ::= <ws>? <identifier> <ws>? '=' <ws>? <nonwhitespace> <ws>?
<ws>                 ::= (' ' | '\t' | {other whitespace characters})+
<identifier>         ::= <identifier-head> <identifier-tail>
<identifier-head>    ::= <letter> | '_'    
<identifier-tail>    ::= (<letter> | <digit> | '_')*
<letter>             ::= ('A'..'Z') | ('a'..'z')
<digit>              ::= '0'..'9'
<nonwhitespace>      ::= ___________

How do I define nonwhitespace (one or more characters that are not whitespace) in EBNF?

For those who are familiar with the Java parboiled library, how would you implement a rule that defines nonwhitespace?

Solution

You are stuck with the conventions of your lexer generator for specifying character ranges and operations on character ranges.

Many lexer generators accept hexadecimal values (something like 0xNN) to represent characters, so you can write either of:

'0'..'9'
0x30..0x39

for digits.

For nonwhitespace, you need to know which character set you are using. For 7-bit ASCII, nonwhitespace is conceptually all the printing characters:

0x21..0x7E

For ISO 8859-1:

( 0x21..0x7E | 0x80..0xFF )

You can decide for yourself whether the character codes above 0x80 count as spaces (is a non-breaking space a space?). You also get to decide the status of the control characters 0x0..0x1F. Is tab (0x9) a whitespace character? How about CR (0xD) and LF (0xA)? How about the ETB control character?
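To see how one existing classification answers these questions, the sketch below prints what the JDK's own Character methods say about the borderline characters mentioned above. The isGrammarWhitespace policy is a hypothetical choice made for illustration (JDK whitespace plus the non-breaking space); your grammar may reasonably decide otherwise.

```java
final class BorderlineCharacters {
    // Hypothetical policy: whatever the JDK calls whitespace, plus non-breaking space (0xA0).
    static boolean isGrammarWhitespace(int cp) {
        return Character.isWhitespace(cp) || cp == 0xA0;
    }

    // Summarizes how the JDK classifies a code point.
    static String classify(int cp) {
        return String.format("U+%04X whitespace=%b spaceChar=%b control=%b",
                cp,
                Character.isWhitespace(cp),
                Character.isSpaceChar(cp),
                Character.isISOControl(cp));
    }

    public static void main(String[] args) {
        // tab, LF, CR, ETB, non-breaking space
        int[] candidates = {0x09, 0x0A, 0x0D, 0x17, 0xA0};
        for (int cp : candidates) {
            System.out.println(classify(cp) + " grammarWhitespace=" + isGrammarWhitespace(cp));
        }
    }
}
```

In the JDK, tab, CR and LF are whitespace, ETB is a control character but not whitespace, and the non-breaking space is a space character that Character.isWhitespace deliberately excludes.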

Unicode is harder because it is a huge set, and your list becomes big and messy. Such is life. Our DMS Software Reengineering Toolkit is used to build parsers for a wide variety of languages, and has to support lexers for ASCII, ISO 8859-z for many values of z, and Unicode. Rather than writing complicated "additive" regular expression ranges, DMS allows subtraction of regular expressions, so we can write:

<UniCodeLegalCharacters>-<UniCodeWhiteSpace>

This is much easier to understand, and easier to get right on the first attempt.
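The same subtractive idea can be sketched in plain Java: take every defined code point and subtract whitespace. In PEG notation this is usually written with a negative lookahead, roughly (!ws .)+. The recognizer below is a hand-written illustration of the <line> rule from the question, not DMS or parboiled code; it assumes Character.isWhitespace as the definition of <ws>, allows optional whitespace after '=' as the regular expression does, and all names are hypothetical.

```java
import java.util.function.IntPredicate;

final class LineRecognizer {
    static boolean isWs(int cp) {
        return Character.isWhitespace(cp);
    }

    // Subtractive definition: any defined code point that is not whitespace.
    static boolean isNonWhitespace(int cp) {
        return Character.isDefined(cp) && !isWs(cp);
    }

    static boolean isLetter(int cp) {
        return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
    }

    static boolean isDigit(int cp) {
        return cp >= '0' && cp <= '9';
    }

    // Advances past code points that satisfy the predicate and returns the new index.
    static int skipWhile(String s, int i, IntPredicate p) {
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!p.test(cp)) {
                break;
            }
            i += Character.charCount(cp);
        }
        return i;
    }

    // Returns {identifier, value} if the whole line matches, otherwise null.
    static String[] parseLine(String s) {
        int i = skipWhile(s, 0, LineRecognizer::isWs);           // <ws>?
        int idStart = i;
        if (i >= s.length()) {
            return null;
        }
        int head = s.codePointAt(i);
        if (!(isLetter(head) || head == '_')) {                  // <identifier-head>
            return null;
        }
        i += Character.charCount(head);
        i = skipWhile(s, i, cp -> isLetter(cp) || isDigit(cp) || cp == '_'); // <identifier-tail>
        String id = s.substring(idStart, i);
        i = skipWhile(s, i, LineRecognizer::isWs);               // <ws>?
        if (i >= s.length() || s.charAt(i) != '=') {             // '='
            return null;
        }
        i++;
        i = skipWhile(s, i, LineRecognizer::isWs);               // <ws>?
        int valueStart = i;
        i = skipWhile(s, i, LineRecognizer::isNonWhitespace);    // <nonwhitespace>
        if (i == valueStart) {
            return null;                                         // needs one or more characters
        }
        String value = s.substring(valueStart, i);
        i = skipWhile(s, i, LineRecognizer::isWs);               // <ws>?
        return i == s.length() ? new String[] {id, value} : null;
    }

    public static void main(String[] args) {
        String[] r = parseLine("  name_1 = value  ");
        System.out.println(r[0] + " -> " + r[1]); // name_1 -> value
    }
}
```

The predicate isNonWhitespace never enumerates ranges, so it stays correct as long as the whitespace definition is correct, which is the point of the subtractive form.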
