Rolling my own parser: a C# generator of LALR(1) syntax parsers and miniDFA lexical analyzers
Following the input format of lex and yacc, and the algorithms of the Tiger book (Modern Compiler Implementation in C), I implemented a C# generator of LALR(1) syntax parsers and miniDFA lexical analyzers (tentatively named bitParser). It depends on no third-party libraries and is heavily optimized.
The complete generated parser code for ANSI C, GLSL 4.60.8, and various test cases can be downloaded at (https://gitee.com/bitzhuwei/bitParser-demos).
(https://www.cnblogs.com/bitzhuwei/p/18544785) lists all the algorithms used to implement the lexer generator and the parser generator.
Syntax-tree construction process for 1234+567+89+0+0 ☟
Syntax-tree construction process for (1234+567)/(89-0) ☟
More syntax-tree construction processes can be viewed at (https://www.cnblogs.com/bitzhuwei/p/18679529). (If you would like to see the construction process for your own expression, please leave a comment.)
The lexical analyzer generator
[*]Generates lexer code for both the DFA and the minimized DFA (hereafter miniDFA): state-transition tables, reserved words, Token types, etc.
[*]Supports the full Unicode range, the / trailing-context suffix, the <'Vt'> prefix, and state signals <signal1, signal2, ..>, which makes it easy to recognize cases like id = refId, struct type_name, <Comment>[^*\n]*, and subroutine(type_name_list). Similar to lex, but not identical.
[*]No need to explicitly write Token types, state signals, or reserved words — i.e., no need for lex's %s NUM ID .., %x Comment End Text .., or [(] { return ('('); }.
[*]Thorough comments. Every conditional branch of every lexical state documents in comments its regular expression, the Token type (and signal) it leads to, etc.
[*]Generates state diagrams (mermaid format) of the ε-NFA, NFA, DFA, and miniDFA, plus a state diagram (mermaid format) for each Token type, for study and debugging.
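The ε-NFA → DFA step behind the generated diagrams is the classic subset construction. A minimal sketch of the idea, in Python rather than bitParser's C# (the `nfa` dictionary encoding of edges is invented for this illustration):

```python
# Subset construction: DFA states are epsilon-closed sets of NFA states.
# Edges: nfa[(state, char)] = list of target states; char None = epsilon edge.

def eps_closure(nfa, states):
    """All states reachable from `states` via epsilon edges only."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, None), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def subset_construction(nfa, start, alphabet):
    """Build DFA transitions by closing over 'move' sets."""
    start_set = eps_closure(nfa, {start})
    dfa, todo = {}, [start_set]
    while todo:
        cur = todo.pop()
        if cur in dfa:
            continue
        dfa[cur] = {}
        for a in alphabet:
            move = {t for s in cur for t in nfa.get((s, a), ())}
            if move:
                dfa[cur][a] = eps_closure(nfa, move)
                todo.append(dfa[cur][a])
    return start_set, dfa

# epsilon-NFA for 'a+': 0 --a--> 1, 1 --eps--> 0
nfa = {(0, 'a'): [1], (1, None): [0]}
start, dfa = subset_construction(nfa, 0, ['a'])
```

The miniDFA step would then merge equivalent DFA states (e.g. Hopcroft-style partition refinement), which is omitted here.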
The syntax parser generator
[*]Generates parser code for each of LL(1), LR(0), SLR(1), LALR(1), and LR(1): parse tables, rule lists, syntax-tree node types, etc.
[*]Supports the precedence directives %nonassoc, %left, %right, and %prec; resolves Shift/Reduce and Reduce/Reduce conflicts automatically and lists them in the comments of the parse-table code.
[*]Thorough comments, listing the LR items and lookaheads of every syntax state, and the number of conflicts (total, resolved, unresolved), etc.
[*]Generates state diagrams (mermaid format) and state tables (markdown format) for LL(1), LR(0), SLR(1), LALR(1), and LR(1), for study and debugging.
[*]Generates documentation of nullable, the FIRST sets, and the FOLLOW sets.
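The nullable/FIRST documentation comes from the standard fixpoint iteration described in the Tiger book. A Python sketch of that iteration, run on the Calc grammar used later in this post (illustrative only, not the generated code):

```python
# Fixpoint computation of nullable and FIRST.
# Terminals are quoted ('+', 'number'); FIRST(terminal) = {terminal}.
grammar = {
    "Exp":    [["Exp", "'+'", "Term"], ["Exp", "'-'", "Term"], ["Term"]],
    "Term":   [["Term", "'*'", "Factor"], ["Term", "'/'", "Factor"], ["Factor"]],
    "Factor": [["'('", "Exp", "')'"], ["'number'"]],
}

def first_sets(grammar):
    nullable = set()
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until nothing changes any more
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                all_nullable = True
                for sym in prod:
                    sym_first = first.get(sym, {sym})  # terminal: itself
                    if not sym_first <= first[nt]:
                        first[nt] |= sym_first
                        changed = True
                    if sym not in nullable:  # terminals are never nullable
                        all_nullable = False
                        break
                if all_nullable and nt not in nullable:
                    nullable.add(nt)
                    changed = True
    return nullable, first

nullable, first = first_sets(grammar)
```

The result matches the generated document shown below: nothing is nullable, and FIRST(Exp) = FIRST(Term) = FIRST(Factor) = { '(' 'number' }.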
Other
[*]No need for lex/yacc's %option, %union, %define, %{, %}, %parse-param, %lex-param, %pure-parser, %expect, %token, or %type. The result is a class library, invoked simply like this:
var compiler = new CompilerXxx();
var sourceCode = File.ReadAllText("input.st");
var tokens = compiler.Analyze(sourceCode);
var syntaxTree = compiler.Parse(tokens);
var extractedObj = compiler.Extract(syntaxTree, tokens, sourceCode);
// use extractedObj for user-defined business ..
[*]Supports the block-comment directive %blockComment on/off and the line-comment directive %inlineComment on/off. The default formats are C's /**/ and //, and both can be customized. For example, when parsing VRML files, the line comment can be defined as running from # to the end of the line:
%%#[^\r\n]*%% 'inlineComment'
[*]Generates a framework that traverses the syntax tree to extract semantic information, plus a source-code formatting algorithm applicable to any language. These support formatting, intermediate-code generation, and other downstream business logic.
[*]Heavily optimized: generating all parser code + documentation takes only 3 seconds for ANSI C and only 9 seconds for GLSL 4.60.8.
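As a sketch of what the custom %inlineComment rule above amounts to: the VRML-style rule %%#[^\r\n]*%% 'inlineComment' is essentially the regular expression #[^\r\n]*. A hand-written Python equivalent for illustration (`strip_comments` is a made-up helper, not part of bitParser, which instead emits such comments as skippable tokens):

```python
import re

# '#' up to (but not including) the end of the line, i.e. #[^\r\n]*
inline_comment = re.compile(r"#[^\r\n]*")

def strip_comments(src):
    """Delete every '#'-to-end-of-line comment, keeping the newlines."""
    return inline_comment.sub("", src)
```

Newlines survive, so line numbers in later error messages stay correct.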
Click to view: other features
[*]Supports the scope directive %validScopeChars and the global directive %validGlobalChars. Both default to [\u0001-\uFFFF] (all Unicode characters except '\0') and can be customized.
[*]Supports the %omit directive to specify which whitespace characters to skip. Defaults are ' ', '\t', '\n', '\r', and '\0'.
[*]Supports %start to specify the start symbol of the grammar.
Example - Calc.st
The input file Calc.st
A parser for addition, subtraction, multiplication, division, and parentheses; its grammar is as follows:
// input file Calc.st
Exp : Exp '+' Term | Exp '-' Term | Term ;
Term : Term '*' Factor | Term '/' Factor | Factor ;
Factor : '(' Exp ')' | 'number' ;
%%[0-9]+%% 'number' // this example handles non-negative integers only
// no need to write %%[+]%% '+' and the like
From this grammar, the generator produces the following.
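The layering Exp → Term → Factor is what encodes operator precedence and left associativity in this grammar. For intuition, here is a hand-written recursive-descent equivalent that evaluates directly — a Python sketch for illustration only (bitParser instead generates a table-driven LALR(1) parser from the grammar; integer division stands in for '/'):

```python
import re

def tokenize(src):
    """Split the input into numbers, operators and parentheses."""
    return re.findall(r"\d+|[-+*/()]", src)

def parse(tokens):
    """One function per grammar level: exp (+,-), term (*,/), factor."""
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def eat():
        nonlocal pos
        tok = tokens[pos]; pos += 1
        return tok
    def factor():             # Factor : '(' Exp ')' | 'number' ;
        if peek() == '(':
            eat(); v = exp(); assert eat() == ')'
            return v
        return int(eat())
    def term():               # Term : Term '*'|'/' Factor | Factor ;
        v = factor()
        while peek() in ('*', '/'):
            v = v * factor() if eat() == '*' else v // factor()
        return v
    def exp():                # Exp : Exp '+'|'-' Term | Term ;
        v = term()
        while peek() in ('+', '-'):
            v = v + term() if eat() == '+' else v - term()
        return v
    return exp()
```

The two loops (`while peek() in ...`) are the hand-written counterpart of the left-recursive rules, so 1-2-3 groups as (1-2)-3.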
The generated lexical analyzer
The state diagram of the generated ε-NFA lexer:
The state diagram of the generated miniDFA lexer:
Click to view: generated code for terminals (Vt) and non-terminals (Vn)
// remove this array if you don't need it
public static readonly IReadOnlyList<string> stArray = new string[] {
    "'¥'",      // @终 = 0;
    "'+'",      // @Plus符 = 1; // '+'
    "'-'",      // @Dash符 = 2; // '-'
    "'*'",      // @Asterisk符 = 3; // '*'
    "'/'",      // @Slash符 = 4; // '/'
    "'('",      // @LeftParenthesis符 = 5; // '('
    "')'",      // @RightParenthesis符 = 6; // ')'
    "'number'", // @number = 7; // 'number'
    // end of 1 + 7 Vt
    "Exp",      // Exp枝 = 8; // Exp
    "Term",     // Term枝 = 9; // Term
    "Factor",   // Factor枝 = 10; // Factor
    // end of 3 Vn
};
/// <summary>
/// Vt types are used both for lexical-analyze and syntax-parse.
/// <para>Vn types are only for syntax-parse.</para>
/// <para>Vt is quoted in ''.</para>
/// <para>Vn is not quoted in ''.</para>
/// </summary>
public static class st {
    // Vt
    /// <summary>Something wrong within the source code.</summary>
    public const int Error错 = -1; // "'×'"
    /// <summary>end of token list.</summary>
    public const int @终 = 0; // "'¥'"
    /// <summary>'+'</summary>
    public const int @Plus符 = 1; // "'+'"
    /// <summary>'-'</summary>
    public const int @Dash符 = 2; // "'-'"
    /// <summary>'*'</summary>
    public const int @Asterisk符 = 3; // "'*'"
    /// <summary>'/'</summary>
    public const int @Slash符 = 4; // "'/'"
    /// <summary>'('</summary>
    public const int @LeftParenthesis符 = 5; // "'('"
    /// <summary>')'</summary>
    public const int @RightParenthesis符 = 6; // "')'"
    /// <summary>'number'</summary>
    public const int @number = 7; // "'number'"
    /// <summary>count of ('¥' + user-defined Vt)</summary>
    public const int VtCount = 8;
    // Vn
    /// <summary>Exp</summary>
    public const int Exp枝 = 8; // "Exp"
    /// <summary>Term</summary>
    public const int Term枝 = 9; // "Term"
    /// <summary>Factor</summary>
    public const int Factor枝 = 10; // "Factor"
}
Click to view: generated code for reserved words (i.e. the keywords of a language)
public static class reservedWord {
    /// <summary>+</summary>
    public const string @Plus符 = "+";
    /// <summary>-</summary>
    public const string @Dash符 = "-";
    /// <summary>*</summary>
    public const string @Asterisk符 = "*";
    /// <summary>/</summary>
    public const string @Slash符 = "/";
    /// <summary>(</summary>
    public const string @LeftParenthesis符 = "(";
    /// <summary>)</summary>
    public const string @RightParenthesis符 = ")";
}
/// <summary>
/// if <paramref name="token"/> is a reserved word, assign correspond type and return true.
/// <para>otherwise, return false.</para>
/// </summary>
/// <param name="token"></param>
/// <returns></returns>
private static bool CheckReservedWord(AnalyzingToken token) {
    bool isReservedWord = true;
    switch (token.value) {
        case reservedWord.@Plus符: token.type = st.@Plus符; break;
        case reservedWord.@Dash符: token.type = st.@Dash符; break;
        case reservedWord.@Asterisk符: token.type = st.@Asterisk符; break;
        case reservedWord.@Slash符: token.type = st.@Slash符; break;
        case reservedWord.@LeftParenthesis符: token.type = st.@LeftParenthesis符; break;
        case reservedWord.@RightParenthesis符: token.type = st.@RightParenthesis符; break;
        default: isReservedWord = false; break;
    }
    return isReservedWord;
}
Below is the lexical state-transition table, implemented as 8 Action<LexicalContext, char, CurrentStateWrapper> delegates.
Click to view: generated code for lexi-state 0
// lexicalState0
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState0 =
static (context, c, wrapper) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* */
    else if (/* possible Vt : 'number' */ '0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
        BeginToken(context); ExtendToken(context);
        wrapper.currentState = lexicalState1;
    }
    /* \) */
    else if (/* possible Vt : ')' */ c == ')'/*'\u0029'(41)*/) {
        BeginToken(context); ExtendToken(context);
        wrapper.currentState = lexicalState2;
    }
    /* \( */
    else if (/* possible Vt : '(' */ c == '('/*'\u0028'(40)*/) {
        BeginToken(context); ExtendToken(context);
        wrapper.currentState = lexicalState3;
    }
    /* \/ */
    else if (/* possible Vt : '/' */ c == '/'/*'\u002F'(47)*/) {
        BeginToken(context); ExtendToken(context);
        wrapper.currentState = lexicalState4;
    }
    /* \* */
    else if (/* possible Vt : '*' */ c == '*'/*'\u002A'(42)*/) {
        BeginToken(context); ExtendToken(context);
        wrapper.currentState = lexicalState5;
    }
    /* - */
    else if (/* possible Vt : '-' */ c == '-'/*'\u002D'(45)*/) {
        BeginToken(context); ExtendToken(context);
        wrapper.currentState = lexicalState6;
    }
    /* \+ */
    else if (/* possible Vt : '+' */ c == '+'/*'\u002B'(43)*/) {
        BeginToken(context); ExtendToken(context);
        wrapper.currentState = lexicalState7;
    }
    /* deal with everything else. */
    else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {
        wrapper.currentState = lexicalState0; // skip them.
    }
    else { // unexpected char.
        BeginToken(context); ExtendToken(context);
        AcceptToken(st.Error错, context);
        wrapper.currentState = lexicalState0;
    }
};
Click to view: generated code for lexi-state 1
// lexicalState1
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState1 =
static (context, c, wrapper) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* */
    else if (/* possible Vt : 'number' */ '0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
        ExtendToken(context);
        wrapper.currentState = lexicalState1;
    }
    /* deal with everything else. */
    else {
        AcceptToken(st.@number, context);
        wrapper.currentState = lexicalState0;
    }
};
Click to view: generated code for lexi-state 2
// lexicalState2
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState2 =
static (context, c, wrapper) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* deal with everything else. */
    else {
        AcceptToken(st.@RightParenthesis符, context);
        wrapper.currentState = lexicalState0;
    }
};
Click to view: generated code for lexi-state 3
// lexicalState3
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState3 =
static (context, c, wrapper) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* deal with everything else. */
    else {
        AcceptToken(st.@LeftParenthesis符, context);
        wrapper.currentState = lexicalState0;
    }
};
Click to view: generated code for lexi-state 4
// lexicalState4
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState4 =
static (context, c, wrapper) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* deal with everything else. */
    else {
        AcceptToken(st.@Slash符, context);
        wrapper.currentState = lexicalState0;
    }
};
Click to view: generated code for lexi-state 5
// lexicalState5
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState5 =
static (context, c, wrapper) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* deal with everything else. */
    else {
        AcceptToken(st.@Asterisk符, context);
        wrapper.currentState = lexicalState0;
    }
};
Click to view: generated code for lexi-state 6
// lexicalState6
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState6 =
static (context, c, wrapper) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* deal with everything else. */
    else {
        AcceptToken(st.@Dash符, context);
        wrapper.currentState = lexicalState0;
    }
};
Click to view: generated code for lexi-state 7
// lexicalState7
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState7 =
static (context, c, wrapper) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* deal with everything else. */
    else {
        AcceptToken(st.@Plus符, context);
        wrapper.currentState = lexicalState0;
    }
};
These delegates are invoked as follows:
// analyze the specified sourceCode and return a list of Token.
public TokenList Analyze(string sourceCode) {
    var context = new LexicalContext(sourceCode);
    var wrapper = new CurrentStateWrapper(this.initialState);
    do {
        char currentChar = context.MoveForward();
        wrapper.currentState(context, currentChar, wrapper); // wrapper.currentState will be set to the next lexi-state.
    } while (!context.EOF);
    return context.result;
}
Below is the same lexical state-transition table implemented as a two-dimensional array ElseIf[][]; it occupies less space and also runs somewhat faster.
private static readonly ElseIf[][] lexiStates = new ElseIf[8][];
static void InitializeLexiTable() {
    lexiStates[0] = new ElseIf[] {
        // possible Vt: '('
        /*0*/new('('/*'\u0028'(40)*/, Acts.Begin | Acts.Extend, 3),
        // possible Vt: ')'
        /*1*/new(')'/*'\u0029'(41)*/, Acts.Begin | Acts.Extend, 2),
        // possible Vt: '*'
        /*2*/new('*'/*'\u002A'(42)*/, Acts.Begin | Acts.Extend, 5),
        // possible Vt: '+'
        /*3*/new('+'/*'\u002B'(43)*/, Acts.Begin | Acts.Extend, 7),
        // possible Vt: '-'
        /*4*/new('-'/*'\u002D'(45)*/, Acts.Begin | Acts.Extend, 6),
        // possible Vt: '/'
        /*5*/new('/'/*'\u002F'(47)*/, Acts.Begin | Acts.Extend, 4),
        // possible Vt: 'number'
        /*6*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 1),
    };
    lexiStates[1] = new ElseIf[] {
        // possible Vt: 'number'
        /*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 1),
        // possible Vt: 'number'
        /*1*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@number),
    };
    lexiStates[2] = new ElseIf[] {
        // possible Vt: ')'
        /*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@RightParenthesis符),
    };
    lexiStates[3] = new ElseIf[] {
        // possible Vt: '('
        /*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@LeftParenthesis符),
    };
    lexiStates[4] = new ElseIf[] {
        // possible Vt: '/'
        /*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@Slash符),
    };
    lexiStates[5] = new ElseIf[] {
        // possible Vt: '*'
        /*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@Asterisk符),
    };
    lexiStates[6] = new ElseIf[] {
        // possible Vt: '-'
        /*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@Dash符),
    };
    lexiStates[7] = new ElseIf[] {
        // possible Vt: '+'
        /*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@Plus符),
    };
}
As the name suggests, one ElseIf plays the same role as one else if ('0' <= c && c <= '9') { .. } branch in the delegate version. Thus one ElseIf[] replaces one delegate, and when the table is consulted, the matching ElseIf can be located quickly by binary search.
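The binary-search idea can be sketched as follows (Python, for illustration only): each lexical state becomes a sorted array of disjoint (min, max, action) character ranges, mirroring one ElseIf[]:

```python
def find_branch(ranges, ch):
    """Binary search for the range containing ch; ranges are disjoint
    (min, max, action) triples sorted by min. Returns action or None."""
    lo, hi = 0, len(ranges) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        r_min, r_max, action = ranges[mid]
        if ch < r_min:
            hi = mid - 1
        elif ch > r_max:
            lo = mid + 1
        else:
            return action
    return None  # like falling through to the error/default branch

# a toy stand-in for lexiStates[0] of the Calc example
state0 = [('(', '(', 'lparen'), ('+', '+', 'plus'), ('0', '9', 'number')]
```

With N branches the lookup is O(log N) instead of walking an if/else-if chain, which is where the speedup over the delegate version comes from.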
Click to view: code that drives the array-based lexical analyzer
// skip '\0' at lexi-state 0
private static readonly ElseIf skipZero = new(
    char.MinValue, char.MaxValue, Acts.None, nextState: 0);
// construct an error token at lexi-state 0
private static readonly ElseIf unexpectedChar = new(
    char.MinValue, char.MaxValue, Acts.Begin | Acts.Extend | Acts.Accept, nextState: 0, -1); // -1 means error("'×'")
// construct an error token at other lexi-states
private static readonly ElseIf errorToken = new(
    char.MinValue, char.MaxValue, Acts.Extend | Acts.Accept, nextState: 0, -1); // -1 means error("'×'")

// analyze the specified sourceCode and return a list of Token.
public TokenList Analyze(string sourceCode) {
    var context = new LexicalContext(sourceCode);
    var currentStateId = 0;
    do {
        // read current char,
        char currentChar = context.MoveForward();
        ElseIf[] lexiState = lexiStates[currentStateId];
        // binary search for the segment( else if (left <= c && c <= right) { ... } )
        var segment = BinarySearch(currentChar, lexiState, currentStateId != 0);
        if (segment is null) {
            if (currentStateId == 0) { // the initial state of lexical analyze.
                segment = BinarySearch(currentChar, this.omitChars, currentStateId != 0);
                if (segment is null) {
                    // '\0' must be skipped.
                    if (currentChar == '\0') { segment = skipZero; }
                    else { segment = unexpectedChar; }
                }
            }
            else { // token with error type
                segment = errorToken;
            }
        }
        // construct the next token,
        var scripts = segment.scripts;
        if (scripts != 0) { // it is 0 in most cases.
            if ((scripts & Acts.Begin) != 0) { this.beginToken(context); }
            if ((scripts & Acts.Extend) != 0) { this.extendToken(context); }
            if ((scripts & Acts.Accept) != 0) { this.acceptToken(context, segment.Vt); }
            else if ((scripts & Acts.Accept2) != 0) { this.acceptToken2(context, segment.ifVts); }
        }
        // point to next state.
        currentStateId = segment.nextStateId;
    } while (!context.EOF);
    return context.result;
}

private ElseIf? BinarySearch(char currentChar, ElseIf[] lexiState, bool specialLast) {
    var left = 0;
    var right = lexiState.Length - (specialLast ? 2 : 1);
    if (right < 0) { }
    else {
        var result = -1;
        while (left < right) {
            var mid = (left + right) / 2;
            var current = lexiState[mid];
            if (currentChar < current.min) { result = -1; }
            else if (current.max < currentChar) { result = 1; }
            else { return current; }
            if (result < 0) { right = mid; }
            else { left = mid + 1; }
        }
        {
            var current = lexiState[left];
            if (current.min <= currentChar && currentChar <= current.max) { return current; }
        }
    }
    if (specialLast) {
        var last = lexiState[lexiState.Length - 1];
        return last; // no need to compare, because it's ['\0', '\uFFFF']
        //if (last.min <= currentChar && currentChar <= last.max) {
        //    return last;
        //}
    }
    return null;
}
The generated syntax parser
Click to view: nullable, FIRST sets, FOLLOW sets
nullable:
: nullable( Exp' ) = False
: nullable( Exp ) = False
: nullable( Term ) = False
: nullable( Factor ) = False
: nullable( '¥' ) = False
: nullable( '+' ) = False
: nullable( '-' ) = False
: nullable( '*' ) = False
: nullable( '/' ) = False
: nullable( '(' ) = False
: nullable( ')' ) = False
: nullable( 'number' ) = False
FIRST sets:
: FIRST( Exp' ) = { '(' 'number' }
: FIRST( Exp ) = { '(' 'number' }
: FIRST( Term ) = { '(' 'number' }
: FIRST( Factor ) = { '(' 'number' }
: FIRST( '¥' ) = { '¥' }
: FIRST( '+' ) = { '+' }
: FIRST( '-' ) = { '-' }
: FIRST( '*' ) = { '*' }
: FIRST( '/' ) = { '/' }
: FIRST( '(' ) = { '(' }
: FIRST( ')' ) = { ')' }
: FIRST( 'number' ) = { 'number' }
: FIRST( Exp '+' Term ) = { '(' 'number' }
: FIRST( Exp '-' Term ) = { '(' 'number' }
: FIRST( Term '*' Factor ) = { '(' 'number' }
: FIRST( Term '/' Factor ) = { '(' 'number' }
: FIRST( '(' Exp ')' ) = { '(' }
FOLLOW sets:
: FOLLOW( Exp' ) = { '¥' }
: FOLLOW( Exp ) = { '-' ')' '+' '¥' }
: FOLLOW( Term ) = { '-' ')' '*' '/' '+' '¥' }
: FOLLOW( Factor ) = { '-' ')' '*' '/' '+' '¥' }
Click to view: generated reduction-rule code
public static IReadOnlyList<Regulation> Regulations => regulations;
private static readonly Regulation[] regulations = new Regulation[] {
    // [0]=Exp : Exp '+' Term ;
    new(0, st.Exp枝, st.Exp枝, st.@Plus符, st.Term枝),
    // [1]=Exp : Exp '-' Term ;
    new(1, st.Exp枝, st.Exp枝, st.@Dash符, st.Term枝),
    // [2]=Exp : Term ;
    new(2, st.Exp枝, st.Term枝),
    // [3]=Term : Term '*' Factor ;
    new(3, st.Term枝, st.Term枝, st.@Asterisk符, st.Factor枝),
    // [4]=Term : Term '/' Factor ;
    new(4, st.Term枝, st.Term枝, st.@Slash符, st.Factor枝),
    // [5]=Term : Factor ;
    new(5, st.Term枝, st.Factor枝),
    // [6]=Factor : '(' Exp ')' ;
    new(6, st.Factor枝, st.@LeftParenthesis符, st.Exp枝, st.@RightParenthesis符),
    // [7]=Factor : 'number' ;
    new(7, st.Factor枝, st.@number),
};
Click to view: generated LALR(1) parse-table code
const int syntaxStateCount = 16;
// LALR(1) syntax parse table
private static readonly Dictionary<int/*LRNode.type*/, LRParseAction>[] syntaxStates = new Dictionary<int, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
    var states = CompilerExp.syntaxStates;
    // 78 actions
    // conflicts(0)=not solved(0)+solved(0)(0 warnings)
    #region create objects of syntax states
    states[0] = new(capacity: 5);
    states[1] = new(capacity: 3);
    states[2] = new(capacity: 6);
    states[3] = new(capacity: 6);
    states[4] = new(capacity: 5);
    states[5] = new(capacity: 6);
    states[6] = new(capacity: 4);
    states[7] = new(capacity: 4);
    states[8] = new(capacity: 3);
    states[9] = new(capacity: 3);
    states[10] = new(capacity: 3);
    states[11] = new(capacity: 6);
    states[12] = new(capacity: 6);
    states[13] = new(capacity: 6);
    states[14] = new(capacity: 6);
    states[15] = new(capacity: 6);
    #endregion create objects of syntax states
    #region re-used actions
    LRParseAction aGoto2 = new(LRParseAction.Kind.Goto, states[2]);  // refered 2 times
    LRParseAction aGoto3 = new(LRParseAction.Kind.Goto, states[3]);  // refered 4 times
    LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times
    LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times
    LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 2 times
    LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 2 times
    LRParseAction aShift8 = new(LRParseAction.Kind.Shift, states[8]);// refered 3 times
    LRParseAction aShift9 = new(LRParseAction.Kind.Shift, states[9]);// refered 3 times
    LRParseAction aReduce2 = new(regulations[2]);// refered 4 times
    LRParseAction aReduce5 = new(regulations[5]);// refered 6 times
    LRParseAction aReduce7 = new(regulations[7]);// refered 6 times
    LRParseAction aReduce0 = new(regulations[0]);// refered 4 times
    LRParseAction aReduce1 = new(regulations[1]);// refered 4 times
    LRParseAction aReduce3 = new(regulations[3]);// refered 6 times
    LRParseAction aReduce4 = new(regulations[4]);// refered 6 times
    LRParseAction aReduce6 = new(regulations[6]);// refered 6 times
    #endregion re-used actions
    // 78 actions
    // conflicts(0)=not solved(0)+solved(0)(0 warnings)
    #region init actions of syntax states
    // syntaxStates[0]:
    // [-1] Exp' : ⏳ Exp ;☕ '¥'
    // Exp : ⏳ Exp '+' Term ;☕ '-' '+' '¥'
    // Exp : ⏳ Exp '-' Term ;☕ '-' '+' '¥'
    // Exp : ⏳ Term ;☕ '-' '+' '¥'
    // Term : ⏳ Term '*' Factor ;☕ '-' '*' '/' '+' '¥'
    // Term : ⏳ Term '/' Factor ;☕ '-' '*' '/' '+' '¥'
    // Term : ⏳ Factor ;☕ '-' '*' '/' '+' '¥'
    // Factor : ⏳ '(' Exp ')' ;☕ '-' '*' '/' '+' '¥'
    // Factor : ⏳ 'number' ;☕ '-' '*' '/' '+' '¥'
    /*0*/states[0].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[1]));
    /*1*/states[0].Add(st.Term枝, aGoto2);
    /*2*/states[0].Add(st.Factor枝, aGoto3);
    /*3*/states[0].Add(st.@LeftParenthesis符, aShift4);
    /*4*/states[0].Add(st.@number, aShift5);
    // syntaxStates[1]:
    // [-1] Exp' : Exp ⏳ ;☕ '¥'
    // Exp : Exp ⏳ '+' Term ;☕ '-' '+' '¥'
    // Exp : Exp ⏳ '-' Term ;☕ '-' '+' '¥'
    /*5*/states[1].Add(st.@Plus符, aShift6);
    /*6*/states[1].Add(st.@Dash符, aShift7);
    /*7*/states[1].Add(st.@终, LRParseAction.accept);
    // syntaxStates[2]:
    // Exp : Term ⏳ ;☕ '-' ')' '+' '¥'
    // Term : Term ⏳ '*' Factor ;☕ '-' ')' '*' '/' '+' '¥'
    // Term : Term ⏳ '/' Factor ;☕ '-' ')' '*' '/' '+' '¥'
    /*8*/states[2].Add(st.@Asterisk符, aShift8);
    /*9*/states[2].Add(st.@Slash符, aShift9);
    /*10*/states[2].Add(st.@Dash符, aReduce2);
    /*11*/states[2].Add(st.@RightParenthesis符, aReduce2);
    /*12*/states[2].Add(st.@Plus符, aReduce2);
    /*13*/states[2].Add(st.@终, aReduce2);
    // syntaxStates[3]:
    // Term : Factor ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
    /*14*/states[3].Add(st.@Dash符, aReduce5);
    /*15*/states[3].Add(st.@RightParenthesis符, aReduce5);
    /*16*/states[3].Add(st.@Asterisk符, aReduce5);
    /*17*/states[3].Add(st.@Slash符, aReduce5);
    /*18*/states[3].Add(st.@Plus符, aReduce5);
    /*19*/states[3].Add(st.@终, aReduce5);
    // syntaxStates[4]:
    // Factor : '(' ⏳ Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
    // Exp : ⏳ Exp '+' Term ;☕ '-' ')' '+'
    // Exp : ⏳ Exp '-' Term ;☕ '-' ')' '+'
    // Exp : ⏳ Term ;☕ '-' ')' '+'
    // Term : ⏳ Term '*' Factor ;☕ '-' ')' '*' '/' '+'
    // Term : ⏳ Term '/' Factor ;☕ '-' ')' '*' '/' '+'
    // Term : ⏳ Factor ;☕ '-' ')' '*' '/' '+'
    // Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+'
    // Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+'
    /*20*/states[4].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[10]));
    /*21*/states[4].Add(st.Term枝, aGoto2);
    /*22*/states[4].Add(st.Factor枝, aGoto3);
    /*23*/states[4].Add(st.@LeftParenthesis符, aShift4);
    /*24*/states[4].Add(st.@number, aShift5);
    // syntaxStates[5]:
    // Factor : 'number' ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
    /*25*/states[5].Add(st.@Dash符, aReduce7);
    /*26*/states[5].Add(st.@RightParenthesis符, aReduce7);
    /*27*/states[5].Add(st.@Asterisk符, aReduce7);
    /*28*/states[5].Add(st.@Slash符, aReduce7);
    /*29*/states[5].Add(st.@Plus符, aReduce7);
    /*30*/states[5].Add(st.@终, aReduce7);
    // syntaxStates[6]:
    // Exp : Exp '+' ⏳ Term ;☕ '-' ')' '+' '¥'
    // Term : ⏳ Term '*' Factor ;☕ '-' ')' '*' '/' '+' '¥'
    // Term : ⏳ Term '/' Factor ;☕ '-' ')' '*' '/' '+' '¥'
    // Term : ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥'
    // Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
    // Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
    /*31*/states[6].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[11]));
    /*32*/states[6].Add(st.Factor枝, aGoto3);
    /*33*/states[6].Add(st.@LeftParenthesis符, aShift4);
    /*34*/states[6].Add(st.@number, aShift5);
    // syntaxStates[7]:
    // Exp : Exp '-' ⏳ Term ;☕ '-' ')' '+' '¥'
    // Term : ⏳ Term '*' Factor ;☕ '-' ')' '*' '/' '+' '¥'
    // Term : ⏳ Term '/' Factor ;☕ '-' ')' '*' '/' '+' '¥'
    // Term : ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥'
    // Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
    // Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
    /*35*/states[7].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[12]));
    /*36*/states[7].Add(st.Factor枝, aGoto3);
    /*37*/states[7].Add(st.@LeftParenthesis符, aShift4);
    /*38*/states[7].Add(st.@number, aShift5);
    // syntaxStates[8]:
    // Term : Term '*' ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥'
    // Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
    // Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
    /*39*/states[8].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[13]));
    /*40*/states[8].Add(st.@LeftParenthesis符, aShift4);
    /*41*/states[8].Add(st.@number, aShift5);
    // syntaxStates[9]:
    // Term : Term '/' ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥'
    // Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
    // Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
    /*42*/states[9].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[14]));
    /*43*/states[9].Add(st.@LeftParenthesis符, aShift4);
    /*44*/states[9].Add(st.@number, aShift5);
    // syntaxStates[10]:
    // Factor : '(' Exp ⏳ ')' ;☕ '-' ')' '*' '/' '+' '¥'
    // Exp : Exp ⏳ '+' Term ;☕ '-' ')' '+'
    // Exp : Exp ⏳ '-' Term ;☕ '-' ')' '+'
    /*45*/states[10].Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[15]));
    /*46*/states[10].Add(st.@Plus符, aShift6);
    /*47*/states[10].Add(st.@Dash符, aShift7);
    // syntaxStates[11]:
    // Exp : Exp '+' Term ⏳ ;☕ '-' ')' '+' '¥'
    // Term : Term ⏳ '*' Factor ;☕ '-' ')' '*' '/' '+' '¥'
    // Term : Term ⏳ '/' Factor ;☕ '-' ')' '*' '/' '+' '¥'
    /*48*/states[11].Add(st.@Asterisk符, aShift8);
    /*49*/states[11].Add(st.@Slash符, aShift9);
    /*50*/states[11].Add(st.@Dash符, aReduce0);
    /*51*/states[11].Add(st.@RightParenthesis符, aReduce0);
    /*52*/states[11].Add(st.@Plus符, aReduce0);
    /*53*/states[11].Add(st.@终, aReduce0);
    // syntaxStates[12]:
    // Exp : Exp '-' Term ⏳ ;☕ '-' ')' '+' '¥'
    // Term : Term ⏳ '*' Factor ;☕ '-' ')' '*' '/' '+' '¥'
    // Term : Term ⏳ '/' Factor ;☕ '-' ')' '*' '/' '+' '¥'
    /*54*/states[12].Add(st.@Asterisk符, aShift8);
    /*55*/states[12].Add(st.@Slash符, aShift9);
    /*56*/states[12].Add(st.@Dash符, aReduce1);
    /*57*/states[12].Add(st.@RightParenthesis符, aReduce1);
    /*58*/states[12].Add(st.@Plus符, aReduce1);
    /*59*/states[12].Add(st.@终, aReduce1);
    // syntaxStates[13]:
    // Term : Term '*' Factor ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
    /*60*/states[13].Add(st.@Dash符, aReduce3);
    /*61*/states[13].Add(st.@RightParenthesis符, aReduce3);
    /*62*/states[13].Add(st.@Asterisk符, aReduce3);
    /*63*/states[13].Add(st.@Slash符, aReduce3);
    /*64*/states[13].Add(st.@Plus符, aReduce3);
    /*65*/states[13].Add(st.@终, aReduce3);
    // syntaxStates[14]:
    // Term : Term '/' Factor ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
    /*66*/states[14].Add(st.@Dash符, aReduce4);
    /*67*/states[14].Add(st.@RightParenthesis符, aReduce4);
    /*68*/states[14].Add(st.@Asterisk符, aReduce4);
    /*69*/states[14].Add(st.@Slash符, aReduce4);
    /*70*/states[14].Add(st.@Plus符, aReduce4);
    /*71*/states[14].Add(st.@终, aReduce4);
    // syntaxStates[15]:
    // Factor : '(' Exp ')' ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
    /*72*/states[15].Add(st.@Dash符, aReduce6);
    /*73*/states[15].Add(st.@RightParenthesis符, aReduce6);
    /*74*/states[15].Add(st.@Asterisk符, aReduce6);
    /*75*/states[15].Add(st.@Slash符, aReduce6);
    /*76*/states[15].Add(st.@Plus符, aReduce6);
    /*77*/states[15].Add(st.@终, aReduce6);
    #endregion init actions of syntax states
}
The state diagram and state table of the generated LALR(1) parser are shown below:
Because mermaid can only render a limited number of characters, the full set of syntax states with all their LR items + lookaheads often cannot be displayed for a real language (e.g. C). By default, the generated state diagram therefore shows only the first 3 LR items + lookaheads of each syntax state. The complete LR items + lookaheads can be found in the generated parse-table code.
| State | '+' | '-' | '*' | '/' | '(' | ')' | 'number' | '¥' | Exp | Term | Factor |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  | S4 |  | S5 |  | G1 | G2 | G3 |
| 1 | S6 | S7 |  |  |  |  |  | ✅ |  |  |  |
| 2 | R2 | R2 | S8 | S9 |  | R2 |  | R2 |  |  |  |
| 3 | R5 | R5 | R5 | R5 |  | R5 |  | R5 |  |  |  |
| 4 |  |  |  |  | S4 |  | S5 |  | G10 | G2 | G3 |
| 5 | R7 | R7 | R7 | R7 |  | R7 |  | R7 |  |  |  |
| 6 |  |  |  |  | S4 |  | S5 |  |  | G11 | G3 |
| 7 |  |  |  |  | S4 |  | S5 |  |  | G12 | G3 |
| 8 |  |  |  |  | S4 |  | S5 |  |  |  | G13 |
| 9 |  |  |  |  | S4 |  | S5 |  |  |  | G14 |
| 10 | S6 | S7 |  |  |  | S15 |  |  |  |  |  |
| 11 | R0 | R0 | S8 | S9 |  | R0 |  | R0 |  |  |  |
| 12 | R1 | R1 | S8 | S9 |  | R1 |  | R1 |  |  |  |
| 13 | R3 | R3 | R3 | R3 |  | R3 |  | R3 |  |  |  |
| 14 | R4 | R4 | R4 | R4 |  | R4 |  | R4 |  |  |  |
| 15 | R6 | R6 | R6 | R6 |  | R6 |  | R6 |  |  |  |
In the table above:
[*]S6 means shift and go to state 6;
[*]R2 means reduce by regulations[2] (= Exp : Term ;), and likewise for the other Rn;
[*]G1 means go to state 1;
[*]✅ means Accept: parsing is finished and the syntax tree was built successfully;
[*]a blank cell means a syntax error was encountered.
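The table above is consumed by a driver loop of the usual shift/reduce/goto/accept shape. A Python sketch on a deliberately tiny grammar (0: E'→E, 1: E→E '+' n, 2: E→n), with a hand-built toy table rather than the generated one:

```python
# Toy SLR table for: 0: E'->E  1: E->E + n  2: E->n  ('$' = end marker)
ACTION = {
    (0, 'n'): ('s', 3), (1, '+'): ('s', 2), (1, '$'): ('acc',),
    (2, 'n'): ('s', 4),
    (3, '+'): ('r', 2), (3, '$'): ('r', 2),
    (4, '+'): ('r', 1), (4, '$'): ('r', 1),
}
GOTO = {(0, 'E'): 1}
RULES = {1: ('E', 3), 2: ('E', 1)}  # rule -> (left side, rhs length)

def lr_parse(tokens):
    """Return the sequence of rule numbers used, in reduction order."""
    stack = [0]                      # state stack
    tokens = tokens + ['$']
    i, reductions = 0, []
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            raise SyntaxError(tokens[i])
        if act[0] == 's':            # shift: push state, consume token
            stack.append(act[1]); i += 1
        elif act[0] == 'r':          # reduce: pop rhs, then goto on lhs
            left, n = RULES[act[1]]
            del stack[-n:]
            stack.append(GOTO[(stack[-1], left)])
            reductions.append(act[1])
        else:                        # accept
            return reductions
```

This mirrors the generated Parse()/LRParseAction.Execute() shown below, minus tree building and comment skipping.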
Click to view: code that drives the syntax parse table
public SyntaxTree Parse(TokenList tokenList) {
    var context = new LRSyntaxContext(tokenList, this.initialState, this.EOT, this.stArray);
    var accept = false;
    do {
        Token token = context.CurrentToken;
        int nodeType = token.type;
        while (nodeType == blockComment || nodeType == inlineComment) {
            // skip comment token
            context.cursor++;
            token = context.CurrentToken;
            nodeType = token.type;
        }
        Dictionary<int/*Node.type*/, LRParseAction> currentState = context.stateStack.Peek();
        if (currentState.TryGetValue(nodeType, out var parseAction)) {
            parseAction.Execute(context);
            accept = parseAction.kind == LRParseAction.Kind.Accept;
        }
        else { // syntax error happened.
            return new SyntaxTree(context);
        }
    } while (!accept);
    var root = context.root;
    Debug.Assert(root != null);
    return new SyntaxTree(root);
}
public class LRParseAction {
    public enum Kind { Error, Shift, Reduce, Goto, Accept, }
    public readonly Kind kind;
    struct Union {
        public Dictionary<int/*Node.type*/, LRParseAction> nextState;
        public Regulation regulation;
    }
    private Union union;
    // execute this parse action.
    public void Execute(LRSyntaxContext context) {
        switch (this.kind) {
            case Kind.Error: { throw new NotImplementedException(); } //break;
            case Kind.Shift: {
                var token = context.CurrentToken;
                var leaf = new LRNode(token);
                context.nodeStack.Push(leaf);
                var nextState = this.union.nextState;
                context.stateStack.Push(nextState);
                // prepare for next loop.
                context.cursor++;
            }
            break;
            case Kind.Reduce: {
                var regulation = this.union.regulation;
                int count = regulation.right.Length;
                var children = new LRNode[count];
                var start = Token.empty;
                LRNode? lastNode = null;
                var first = true;
                for (int i = 0; i < count; i++) {
                    var state = context.stateStack.Pop(); // popped and discarded.
                    var node = context.nodeStack.Pop();
                    children[count - 1 - i] = node;
                    if (node.start.index >= 0) { // this node includes token
                        if (first) { lastNode = node; first = false; }
                        start = node.start;
                    }
                }
                int tokenCount = 0;
                if (lastNode is not null) {
                    tokenCount = lastNode.start.index // comment tokens inside of parent are included.
                        - start.index                 // comment tokens before parent are excluded.
                        + lastNode.tokenCount;        // comment tokens after parent are excluded.
                }
                var parent = new LRNode(regulation, start, tokenCount, children);
                for (var i = 0; i < count; i++) { children[i].parent = parent; }
                context.nodeStack.Push(parent);
                // goto next syntax-state
                Dictionary<int/*Node.type*/, LRParseAction> currentState = context.stateStack.Peek();
                var nodeType = regulation.left;
                if (currentState.TryGetValue(nodeType, out var parseAction)) {
                    parseAction.Execute(context); // parseAction is supposed to be a Goto action
                }
                Debug.Assert(parseAction != null && parseAction.kind == Kind.Goto);
            }
            break;
            case Kind.Goto: {
                var nextState = this.union.nextState;
                context.stateStack.Push(nextState);
            }
            break;
            case Kind.Accept: {
                var state = context.stateStack.Pop();
                context.root = context.nodeStack.Pop();
                context.Finish(context.root, 40, context.stArray);
            }
            break;
            default: { throw new NotImplementedException(); }
        }
    }
}
The generated semantic extractor
For example, for the input 1234+567+89+0+0, parsing with Calc.st produces the syntax tree below (its step-by-step construction is shown at the top of this article):
R=Exp : Exp '+' Term ;⛪T
├─R=Exp : Exp '+' Term ;⛪T
│├─R=Exp : Exp '+' Term ;⛪T
││├─R=Exp : Exp '+' Term ;⛪T
│││├─R=Exp : Term ;⛪T
││││└─R=Term : Factor ;⛪T
││││ └─R=Factor : 'number' ;⛪T
││││  └─T='number' 1234
│││├─T='+' +
│││└─R=Term : Factor ;⛪T
│││ └─R=Factor : 'number' ;⛪T
│││  └─T='number' 567
││├─T='+' +
││└─R=Term : Factor ;⛪T
││ └─R=Factor : 'number' ;⛪T
││  └─T='number' 89
│├─T='+' +
│└─R=Term : Factor ;⛪T
│ └─R=Factor : 'number' ;⛪T
│  └─T='number' 0
├─T='+' +
└─R=Term : Factor ;⛪T
 └─R=Factor : 'number' ;⛪T
  └─T='number' 0
Reading from top-left downward, the four consecutive R nodes markedly increase the depth of the tree.
The semantic extractor that bitParser generates automatically performs a post-order traversal of this syntax tree and extracts node information.
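The stack-based post-order traversal (no recursion) can be sketched like this in Python (illustrative only; trees here are (label, children) tuples rather than LRNode):

```python
def post_order(tree):
    """Visit children before parents, using an explicit stack.
    tree: (label, [child trees]). Returns labels in visit order."""
    out, stack = [], [(tree, False)]
    while stack:
        node, expanded = stack.pop()
        label, children = node
        if expanded or not children:
            out.append(label)          # Visit(node)
        else:
            stack.append((node, True))  # revisit after the children
            for child in reversed(children):
                stack.append((child, False))
    return out

# the reduction Exp : Exp '+' Term for "1234 + 567", as a toy tree
t = ('+', [('1234', []), ('567', [])])
```

Leaves (tokens) are therefore pushed before their parent rule fires, which is exactly why the generated extractor can pop its operands off objStack when handling a Vn node.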
Click to view: generic syntax-tree traversal code
// Extract some data structure from syntax tree.
// <para>post-order traverse <paramref name="root"/> with stack(without recursion).</para>
public T? Extract(LRNode root, TokenList tokens, string sourceCode) {
    var context = new TContext<T>(root, tokens, sourceCode);
    var nodeStack = new Stack<LRNode>();
    var indexStack = new Stack<int>();
    // init stack.
    {
        // push nextLeft and its next pending children.
        var nextLeft = root;
        var index = 0;
        nodeStack.Push(nextLeft);
        indexStack.Push(index);
        while (nextLeft.children.Count > 0) {
            nextLeft = nextLeft.children[0];
            nodeStack.Push(nextLeft);
            indexStack.Push(0);
        }
    }
    while (nodeStack.Count > 0) {
        var current = nodeStack.Pop();
        var index = indexStack.Pop() + 1;
        if (index < current.children.Count) {
            // push this node back again.
            nodeStack.Push(current);
            indexStack.Push(index);
            // push nextLeft and its next pending children.
            var nextLeft = current.children[index];
            nodeStack.Push(nextLeft);
            indexStack.Push(0);
            while (nextLeft.children.Count > 0) {
                nextLeft = nextLeft.children[0];
                nodeStack.Push(nextLeft);
                indexStack.Push(0);
            }
        }
        else {
            // Visit(current);
            if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {
                action(current, context);
            }
        }
    }
    {
        var current = this.EOT;
        // Visit(current);
        if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {
            action(current, context);
        }
    }
    return context.result;
}
Click to view: generated code that extracts node information during traversal
private static readonly Dictionary<int/*LRNode.type*/, Action<LRNode, TContext<Exp>>> @expExtractorDict = new();
private static readonly Action<LRNode, TContext<Exp>> VtHandler =
(node, context) => {
    var token = node.start;
    context.objStack.Push(token);
};
// initialize dict for extractor.
private static void InitializeExtractorDict() {
    var extractorDict = @expExtractorDict;
    extractorDict.Add(st.@Plus符, VtHandler);
    extractorDict.Add(st.@Dash符, VtHandler);
    extractorDict.Add(st.@Asterisk符, VtHandler);
    extractorDict.Add(st.@Slash符, VtHandler);
    extractorDict.Add(st.@LeftParenthesis符, VtHandler);
    extractorDict.Add(st.@RightParenthesis符, VtHandler);
    extractorDict.Add(st.@number, VtHandler);
    extractorDict.Add(st.@终, static (node, context) => {
        // [-1]=Exp' : Exp ;
        // dumped by ExternalExtractor
        var @final = (VnExp?)context.objStack.Pop();
        var left = new Exp(@final);
        context.result = left; // final step, no need to push into stack.
    }); // end of extractorDict.Add(st.@终, (node, context) => { ... });
    extractorDict.Add(st.Exp枝, static (node, context) => {
        switch (node.regulation.index) {
            case 0: {
                // =Exp : Exp '+' Term ;
                // dumped by ListExtractor 2
                var r0 = (VnTerm?)context.objStack.Pop();
                var r1 = (Token?)context.objStack.Pop();
                var r2 = (VnExp?)context.objStack.Pop();
                var left = r2;
                left.Add(r1, r0);
                context.objStack.Push(left);
            } break;
            case 1: {
                // =Exp : Exp '-' Term ;
                // dumped by ListExtractor 2
                var r0 = (VnTerm?)context.objStack.Pop();
                var r1 = (Token?)context.objStack.Pop();
                var r2 = (VnExp?)context.objStack.Pop();
                var left = r2;
                left.Add(r1, r0);
                context.objStack.Push(left);
            } break;
            case 2: {
                // =Exp : Term ;
                // dumped by ListExtractor 1
                var r0 = (VnTerm?)context.objStack.Pop();
                var left = new VnExp(r0);
                context.objStack.Push(left);
            } break;
            default: throw new NotImplementedException();
        }
    }); // end of extractorDict.Add(st.Exp枝, (node, context) => { ... });
    extractorDict.Add(st.Term枝, static (node, context) => {
        switch (node.regulation.index) {
            case 3: {
                // =Term : Term '*' Factor ;
                // dumped by ListExtractor 2
                var r0 = (VnFactor?)context.objStack.Pop();
                var r1 = (Token?)context.objStack.Pop();
                var r2 = (VnTerm?)context.objStack.Pop();
                var left = r2;
                left.Add(r1, r0);
                context.objStack.Push(left);
            } break;
            case 4: {
                // =Term : Term '/' Factor ;
                // dumped by ListExtractor 2
                var r0 = (VnFactor?)context.objStack.Pop();
                var r1 = (Token?)context.objStack.Pop();
                var r2 = (VnTerm?)context.objStack.Pop();
                var left = r2;
                left.Add(r1, r0);
                context.objStack.Push(left);
            } break;
            case 5: {
                // =Term : Factor ;
                // dumped by ListExtractor 1
                var r0 = (VnFactor?)context.objStack.Pop();
                var left = new VnTerm(r0);
                context.objStack.Push(left);
            } break;
            default: throw new NotImplementedException();
        }
    }); // end of extractorDict.Add(st.Term枝, (node, context) => { ... });
    extractorDict.Add(st.Factor枝, static (node, context) => {
        switch (node.regulation.index) {
            case 6: {
                // =Factor : '(' Exp ')' ;
                // dumped by DefaultExtractor
                var r0 = (Token?)context.objStack.Pop();
                var r1 = (VnExp?)context.objStack.Pop();
                var r2 = (Token?)context.objStack.Pop();
                var left = new VnFactor(r2, r1, r0);
                context.objStack.Push(left);
            } break;
            case 7: {
                // =Factor : 'number' ;
                // dumped by DefaultExtractor
                var r0 = (Token?)context.objStack.Pop();
                var left = new VnFactor(r0);
                context.objStack.Push(left);
            } break;
            default: throw new NotImplementedException();
        }
    }); // end of extractorDict.Add(st.Factor枝, (node, context) => { ... });
}
Displayed, the extracted node information looks like this:
```text
VnExp⛪T
├─VnTerm⛪T
│ └─VnFactor⛪T
│   └─T='number' 1234
└─List T
  ├─Item T
  │ ├─T='+' +
  │ └─VnTerm⛪T
  │   └─VnFactor⛪T
  │     └─T='number' 567
  ├─Item T
  │ ├─T='+' +
  │ └─VnTerm⛪T
  │   └─VnFactor⛪T
  │     └─T='number' 89
  ├─Item T
  │ ├─T='+' +
  │ └─VnTerm⛪T
  │   └─VnFactor⛪T
  │     └─T='number' 0
  └─Item T
    ├─T='+' +
    └─VnTerm⛪T
      └─VnFactor⛪T
        └─T='number' 0
```

Reduction rules such as `Exp : Exp '+' Term ;` make the syntax tree unnecessarily deep. The point of this semantic-extraction step is to "flatten" such tree structures. With the flattened semantic information, formatting the source code becomes straightforward.
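The two-stack, non-recursive post-order traversal used by `Extract` can be sketched language-neutrally. The following Python sketch is illustrative only (the `Node` class and `visit` callback are assumptions, not bitParser's actual C# types); it uses the same technique: descend to the leftmost leaf, and revisit a node only after all of its children have been visited.

```python
# A minimal sketch of the two-stack, non-recursive post-order traversal
# that Extract() uses. Node and visit() are illustrative names only.
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def post_order(root, visit):
    node_stack = [root]
    index_stack = [0]
    # descend to the leftmost leaf first
    while node_stack[-1].children:
        node_stack.append(node_stack[-1].children[0])
        index_stack.append(0)
    while node_stack:
        current = node_stack.pop()
        index = index_stack.pop() + 1
        if index < len(current.children):
            # come back to this node after its next child subtree
            node_stack.append(current)
            index_stack.append(index)
            node_stack.append(current.children[index])
            index_stack.append(0)
            while node_stack[-1].children:
                node_stack.append(node_stack[-1].children[0])
                index_stack.append(0)
        else:
            visit(current)  # all children done: post-order visit

# An 'Exp : Exp + Term'-shaped tree: children are visited before the parent.
tree = Node("Exp", [Node("Exp", [Node("Term")]), Node("+"), Node("Term2")])
order = []
post_order(tree, lambda n: order.append(n.label))
print(order)  # ['Term', 'Exp', '+', 'Term2', 'Exp']
```

Because the visit happens only when a node's children are exhausted, the per-node handlers can pop their children's results from an object stack and push their own, exactly as the generated extractor code above does.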
点击查看 VnExp结点的格式化 代码/// <summary>/// Correspond to the Vn node Exp in the grammar(Exp)./// </summary>internal partial class VnExp : IFullFormatter { // =Exp : Exp '+' Term ; // =Exp : Exp '-' Term ; // =Exp : Term ; private readonly VnTerm first0; public class PostItem : IFullFormatter { private readonly Token r1; private readonly VnTerm r0; public PostItem(Token r1, VnTerm r0) { this.r1 = r1; this.r0 = r0; this._scope = new TokenRange(r1, r0); } private readonly TokenRange _scope; public TokenRange Scope => _scope; public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) { context.PrintBlanksAnd(this.r1, preConfig, writer); // '+'或'-'与其左右两边的Token间隔1个空格 var config = new BlankConfig(inlineBlank: 1, forceNewline: false); context.PrintCommentsBetween(this.r1, this.r0, config, writer); context.PrintBlanksAnd(this.r0, config, writer); } } public class PostItemList : IFullFormatter { private readonly List<PostItem> list = new(); public PostItemList(PostItem item) { this.list.Add(item); this._scope = new TokenRange(item); } public void Add(PostItem item) { this.list.Add(item); this._scope.end = item.Scope.end; } private readonly TokenRange _scope; public TokenRange Scope => _scope; public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) { // '+'或'-'与其左右两边的Token间隔1个空格 var config = new BlankConfig(inlineBlank: 1, forceNewline: false); for (int i = 0; i < list.Count; i++) { if (i == 0) { context.PrintBlanksAnd(list, preConfig, writer); } else { context.PrintCommentsBetween(list, list, config, writer); context.PrintBlanksAnd(list, config, writer); } } } } private PostItemList? 
list; private readonly TokenRange _scope; public TokenRange Scope => _scope; internal VnExp(VnTerm first0) { this.first0 = first0; this._scope = new TokenRange(first0); } // =Exp : Exp '+' Term ; // =Exp : Exp '-' Term ; internal void Add(Token r1, VnTerm r0) { if (this.list == null) { this.list = new PostItemList(new(r1, r0)); } else { this.list.Add(new(r1, r0)); } this._scope.end = this.list.Scope.end; } public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) { context.PrintBlanksAnd(this.first0, preConfig, writer); if (this.list != null) { // '+'或'-'与其左右两边的Token间隔1个空格 var config = new BlankConfig(inlineBlank: 1, forceNewline: false); context.PrintCommentsBetween(this.first0, this.list, config, writer); context.PrintBlanksAnd(this.list, config, writer); } }}点击查看 VnTerm结点的格式化 代码/// <summary>/// Correspond to the Vn node Term in the grammar(Exp)./// </summary>internal partial class VnTerm : IFullFormatter { // =Term : Term '*' Factor ; // =Term : Term '/' Factor ; // =Term : Factor ; private readonly VnFactor first0; public class PostItem : IFullFormatter { private readonly Token r1; private readonly VnFactor r0; public PostItem(Token r1, VnFactor r0) { this.r1 = r1; this.r0 = r0; this._scope = new TokenRange(r1, r0); } private readonly TokenRange _scope; public TokenRange Scope => _scope; public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) { context.PrintBlanksAnd(this.r1, preConfig, writer); // '+'或'-'与其左右两边的Token间隔1个空格 var config = new BlankConfig(inlineBlank: 1, forceNewline: false); context.PrintCommentsBetween(this.r1, this.r0, config, writer); context.PrintBlanksAnd(this.r0, config, writer); } } public class PostItemList : IFullFormatter { private readonly List<PostItem> list = new(); public PostItemList(PostItem item) { this.list.Add(item); this._scope = new TokenRange(item); } public void Add(PostItem item) { this.list.Add(item); this._scope.end = item.Scope.end; } private readonly 
TokenRange _scope; public TokenRange Scope => _scope; public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) { // '*'或'/'与其左右两边的Token间隔1个空格 var config = new BlankConfig(inlineBlank: 1, forceNewline: false); for (int i = 0; i < list.Count; i++) { if (i == 0) { context.PrintBlanksAnd(list, preConfig, writer); } else { context.PrintCommentsBetween(list, list, config, writer); context.PrintBlanksAnd(list, config, writer); } } } } private PostItemList? list; private readonly TokenRange _scope; public TokenRange Scope => _scope; // =Term : Factor ; internal VnTerm(VnFactor first0) { this.first0 = first0; this._scope = new TokenRange(first0); } // =Term : Term '*' Factor ; // =Term : Term '/' Factor ; internal void Add(Token r1, VnFactor r0) { if (this.list == null) { this.list = new PostItemList(new(r1, r0)); } else { this.list.Add(new(r1, r0)); } this._scope.end = this.list.Scope.end; } public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) { context.PrintBlanksAnd(this.first0, preConfig, writer); if (this.list != null) { // '*'或'/'与其左右两边的Token间隔1个空格 var config = new BlankConfig(inlineBlank: 1, forceNewline: false); context.PrintCommentsBetween(this.first0, this.list, config, writer); context.PrintBlanksAnd(this.list, config, writer); } }}点击查看 VnFactor结点的格式化 代码/// <summary>/// Correspond to the Vn node Factor in the grammar(Exp)./// </summary>internal abstract partial class VnFactor : IFullFormatter { // =Factor : '(' Exp ')' ; // =Factor : 'number' ; public class C0 : VnFactor { // =Factor : '(' Exp ')' ; public C0(Token r2, VnExp r1, Token r0) { this.r2 = r2; this.r1 = r1; this.r0 = r0; this._scope = new TokenRange(r2, r0); } private readonly Token r2; private readonly VnExp r1; private readonly Token r0; private readonly TokenRange _scope; public override TokenRange Scope => _scope; public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) { 
context.PrintBlanksAnd(this.r2, preConfig, writer);
        // no blanks between ( Exp )
        var config = new BlankConfig(inlineBlank: 0, forceNewline: false);
        context.PrintCommentsBetween(this.r2, this.r1, config, writer);
        context.PrintBlanksAnd(this.r1, config, writer);
        context.PrintCommentsBetween(this.r1, this.r0, config, writer);
        context.PrintBlanksAnd(this.r0, config, writer);
    } }
    public class C1 : VnFactor {
        // =Factor : 'number' ;
        public C1(Token r0) { this.r0 = r0; this._scope = new TokenRange(r0); }
        private readonly Token r0;
        private readonly TokenRange _scope;
        public override TokenRange Scope => _scope;
        public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
            // print its single token according to the preConfig set by the parent
            context.PrintBlanksAnd(this.r0, preConfig, writer);
        }
    }
    public abstract TokenRange Scope { get; }
    public abstract void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context);
}

Finally, 1234+567+89+0+0 is formatted as follows (one space is inserted on each side of every operator):
1234 + 567 + 89 + 0 + 0

More examples:
```text
// format 1-2*3
1 - 2 * 3
// format (1+2)/3
(1 + 2) / 3
// format (1+2)*(3-4)
(1 + 2) * (3 - 4)
```

Click to view: GLSL formatting example 1

```glsl
// example 1, before formatting
void main() { int a=0; a++++; ++++a;}
// example 1, after formatting
void main() {
    int a = 0;
    a++ ++;
    ++ ++a;
}
```

Click to view: GLSL formatting example 2

```glsl
// example 2, before formatting
in vec3passNormal;in vec2passTexCoord;uniform sampler2D textureMap;uniform vec3 lihtDirection=vec3(1,1,1);uniform vec3 diffuseColor;uniform bool transparent=false;out vec4 outColor;void main(){ if (transparent) { if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;} if (passTexCoord==vec2(-1,-1)){ // when texture coordinate not exists.. float diffuse=max(dot(normalize(lihtDirection),normalize(passNormal)),0); outColor = vec4(diffuseColor * diffuse, 1.0); } else { outColor = texture(textureMap, passTexCoord);}}
// example 2, after formatting
in vec3 passNormal;
in vec2 passTexCoord;
uniform sampler2D textureMap;
uniform vec3 lihtDirection = vec3(1, 1, 1);
uniform vec3 diffuseColor;
uniform bool transparent = false;
out vec4 outColor;
void main() {
    if (transparent) {
        if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;
    }
    if (passTexCoord == vec2(-1, -1)) {
        // when texture coordinate not exists..
        float diffuse = max(dot(normalize(lihtDirection), normalize(passNormal)), 0);
        outColor = vec4(diffuseColor * diffuse, 1.0);
    }
    else {
        outColor = texture(textureMap, passTexCoord);
    }
}
```

For a detailed introduction to this formatting algorithm, see my other article (GLSL Shader的格式化算法(LALR解析器)).
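The "one space around operators, none inside parentheses" behaviour shown above can be imitated by a toy token re-formatter. This Python sketch is illustrative only: it re-tokenizes with a regex rather than walking the extracted VnExp/VnTerm/VnFactor objects as the real generated formatter does, and the function name `format_exp` is an assumption.

```python
import re

# Toy re-formatter for the Calc grammar: one blank around + - * /
# (inlineBlank=1), no blanks just inside parentheses (inlineBlank=0).
# Illustrative only -- the real formatter walks the extracted objects.
def format_exp(source):
    tokens = re.findall(r"\d+|[-+*/()]", source)
    out = []
    for i, tok in enumerate(tokens):
        if i > 0:
            prev = tokens[i - 1]
            # insert one blank when an operator touches this boundary,
            # except right after '(' or right before ')'
            if (prev in "+-*/" or tok in "+-*/") and prev != "(" and tok != ")":
                out.append(" ")
        out.append(tok)
    return "".join(out)

print(format_exp("1234+567+89+0+0"))  # 1234 + 567 + 89 + 0 + 0
print(format_exp("(1+2)*(3-4)"))      # (1 + 2) * (3 - 4)
```

The real formatter achieves the same effect by passing a `BlankConfig` down the flattened node list, which also lets it preserve comments between tokens.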
Example: automatically resolving Shift/Reduce conflicts
Both C and GLSL have the dangling-else problem, which is caused by the following productions:
```text
selection_statement : 'if' '(' expression ')' selection_rest_statement ;
selection_rest_statement : statement 'else' statement | statement ;
```

Now, when the parser reads an 'else' token, should it shift the 'else', or should it reduce by `selection_rest_statement : statement ;`? This is a Shift/Reduce conflict.
bitParser automatically chooses Shift and comments out the entry that would have handled it as a Reduce, as the following code shows:
```csharp
const int syntaxStateCount = 477;
// LALR(1) syntax parse table
private static readonly Dictionary<string/*LRNode.type*/, LRParseAction>[] syntaxStates = new Dictionary<string, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
    var states = CompilerGLSL.syntaxStates;
    ...
    // 30814 actions
    // conflicts(1)=not solved(0)+solved(1)(1 warnings)
    #region init actions of syntax states
    ...
    // syntaxStates:
    // selection_rest_statement : statement ⏳ 'else' statement ;☕ '--' '-' ';' '!' '(' '{' '}' '+' ..
    // selection_rest_statement : statement ⏳ ;☕ '--' '-' ';' '!' '(' '{' '}' '+' ..
    // 'else' repeated 2 times
    states/*28145*/.Add(st.@else, new(LRParseAction.Kind.Shift, states));
    // ⚔ PreferShiftToReduce
    //states/*28146*/.Add(st.@else, new(regulations));
    states/*28147*/.Add(st.@Dash符Dash符, new(regulations));
    ...
    #endregion init actions of syntax states
}
```

Example: the precedence directives %nonassoc, %left, %right, %prec
If we write Calc.st in the most intuitive way, it might look like this:
```text
Exp : Exp '+' Exp | Exp '-' Exp | Exp '*' Exp | Exp '/' Exp | '(' Exp ')' | 'number' ;
%%+%% 'number'
```

Click to view: the LALR(1) parse-table code for this grammar

const int syntaxStateCount = 14;
/// <summary>
/// LALR(1) syntax parse table
/// </summary>
private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates = new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
    var states = CompilerExp.syntaxStates;
    // 80 actions
    // conflicts(16)=not solved(0)+solved(16)(16 warnings)
    #region create objects of syntax states
    states[0] = new(capacity: 3); states[1] = new(capacity: 5); states[2] = new(capacity: 3); states[3] = new(capacity: 6);
    states[4] = new(capacity: 3); states[5] = new(capacity: 3); states[6] = new(capacity: 3); states[7] = new(capacity: 3);
    states[8] = new(capacity: 5); states[9] = new(capacity: 6); states[10] = new(capacity: 6); states[11] = new(capacity: 6);
    states[12] = new(capacity: 6); states[13] = new(capacity: 6);
    #endregion create objects of syntax states
    #region re-used actions
    LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// refered 6 times
    LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// refered 6 times
    LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times
    LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times
    LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 6 times
    LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 6 times
    LRParseAction aReduce5 = new(regulations[5]);// refered 6 times
    LRParseAction aReduce0 = new(regulations[0]);// refered 6 times
    LRParseAction aReduce1 = new(regulations[1]);// refered 6 times
    LRParseAction aReduce2 = new(regulations[2]);// refered 6 times
    LRParseAction aReduce3 = new(regulations[3]);// refered 6 times
    LRParseAction aReduce4 = new(regulations[4]);// refered 6 times
    #endregion re-used actions
    // 80 actions
    // conflicts(16)=not solved(0)+solved(16)(16 warnings)
    #region init actions of syntax states
    // syntaxStates: //
[-1] Exp' : ⏳ Exp ;☕ '¥' // Exp : ⏳ Exp '+' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : ⏳ Exp '-' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : ⏳ Exp '*' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : ⏳ Exp '/' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : ⏳ '(' Exp ')' ;☕ '-' '*' '/' '+' '¥' // Exp : ⏳ 'number' ;☕ '-' '*' '/' '+' '¥' states/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states)); states/*1*/.Add(st.@LeftParenthesis符, aShift2); states/*2*/.Add(st.@number, aShift3); // syntaxStates: // [-1] Exp' : Exp ⏳ ;☕ '¥' // Exp : Exp ⏳ '+' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : Exp ⏳ '-' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : Exp ⏳ '*' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : Exp ⏳ '/' Exp ;☕ '-' '*' '/' '+' '¥' states/*3*/.Add(st.@Plus符, aShift4); states/*4*/.Add(st.@Dash符, aShift5); states/*5*/.Add(st.@Asterisk符, aShift6); states/*6*/.Add(st.@Slash符, aShift7); states/*7*/.Add(st.@终, LRParseAction.accept); // syntaxStates: // Exp : '(' ⏳ Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' // Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' // Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' // Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' // Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' // Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' states/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states)); states/*9*/.Add(st.@LeftParenthesis符, aShift2); states/*10*/.Add(st.@number, aShift3); // syntaxStates: // Exp : 'number' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' states/*11*/.Add(st.@Dash符, aReduce5); states/*12*/.Add(st.@RightParenthesis符, aReduce5); states/*13*/.Add(st.@Asterisk符, aReduce5); states/*14*/.Add(st.@Slash符, aReduce5); states/*15*/.Add(st.@Plus符, aReduce5); states/*16*/.Add(st.@终, aReduce5); // syntaxStates: // Exp : Exp '+' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
// Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' states/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states)); states/*18*/.Add(st.@LeftParenthesis符, aShift2); states/*19*/.Add(st.@number, aShift3); // syntaxStates: // Exp : Exp '-' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' states/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states)); states/*21*/.Add(st.@LeftParenthesis符, aShift2); states/*22*/.Add(st.@number, aShift3); // syntaxStates: // Exp : Exp '*' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' states/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states)); states/*24*/.Add(st.@LeftParenthesis符, aShift2); states/*25*/.Add(st.@number, aShift3); // syntaxStates: // Exp : Exp '/' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' states/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states)); states/*27*/.Add(st.@LeftParenthesis符, aShift2); states/*28*/.Add(st.@number, aShift3); // syntaxStates: // Exp : '(' Exp ⏳ ')' ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' // Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' // Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' // Exp : Exp ⏳ '/' Exp ;☕ '-' ')' 
'*' '/' '+' states/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states)); states/*30*/.Add(st.@Plus符, aShift4); states/*31*/.Add(st.@Dash符, aShift5); states/*32*/.Add(st.@Asterisk符, aShift6); states/*33*/.Add(st.@Slash符, aShift7); // syntaxStates: // Exp : Exp '+' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // '+' repeated 2 times states/*34*/.Add(st.@Plus符, aShift4); // ⚔ PreferShiftToReduce states/*35*/.Add(st.@Plus符, aReduce0); // '-' repeated 2 times states/*36*/.Add(st.@Dash符, aShift5); // ⚔ PreferShiftToReduce states/*37*/.Add(st.@Dash符, aReduce0); // '*' repeated 2 times states/*38*/.Add(st.@Asterisk符, aShift6); // ⚔ PreferShiftToReduce states/*39*/.Add(st.@Asterisk符, aReduce0); // '/' repeated 2 times states/*40*/.Add(st.@Slash符, aShift7); // ⚔ PreferShiftToReduce states/*41*/.Add(st.@Slash符, aReduce0); states/*42*/.Add(st.@RightParenthesis符, aReduce0); states/*43*/.Add(st.@终, aReduce0); // syntaxStates: // Exp : Exp '-' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // '+' repeated 2 times states/*44*/.Add(st.@Plus符, aShift4); // ⚔ PreferShiftToReduce states/*45*/.Add(st.@Plus符, aReduce1); // '-' repeated 2 times states/*46*/.Add(st.@Dash符, aShift5); // ⚔ PreferShiftToReduce states/*47*/.Add(st.@Dash符, aReduce1); // '*' repeated 2 times states/*48*/.Add(st.@Asterisk符, aShift6); // ⚔ PreferShiftToReduce states/*49*/.Add(st.@Asterisk符, aReduce1); // '/' repeated 2 times states/*50*/.Add(st.@Slash符, aShift7); // ⚔ PreferShiftToReduce states/*51*/.Add(st.@Slash符, aReduce1); states/*52*/.Add(st.@RightParenthesis符, aReduce1); states/*53*/.Add(st.@终, aReduce1); // syntaxStates: 
// Exp : Exp '*' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // '+' repeated 2 times states/*54*/.Add(st.@Plus符, aShift4); // ⚔ PreferShiftToReduce states/*55*/.Add(st.@Plus符, aReduce2); // '-' repeated 2 times states/*56*/.Add(st.@Dash符, aShift5); // ⚔ PreferShiftToReduce states/*57*/.Add(st.@Dash符, aReduce2); // '*' repeated 2 times states/*58*/.Add(st.@Asterisk符, aShift6); // ⚔ PreferShiftToReduce states/*59*/.Add(st.@Asterisk符, aReduce2); // '/' repeated 2 times states/*60*/.Add(st.@Slash符, aShift7); // ⚔ PreferShiftToReduce states/*61*/.Add(st.@Slash符, aReduce2); states/*62*/.Add(st.@RightParenthesis符, aReduce2); states/*63*/.Add(st.@终, aReduce2); // syntaxStates: // Exp : Exp '/' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // '+' repeated 2 times states/*64*/.Add(st.@Plus符, aShift4); // ⚔ PreferShiftToReduce states/*65*/.Add(st.@Plus符, aReduce3); // '-' repeated 2 times states/*66*/.Add(st.@Dash符, aShift5); // ⚔ PreferShiftToReduce states/*67*/.Add(st.@Dash符, aReduce3); // '*' repeated 2 times states/*68*/.Add(st.@Asterisk符, aShift6); // ⚔ PreferShiftToReduce states/*69*/.Add(st.@Asterisk符, aReduce3); // '/' repeated 2 times states/*70*/.Add(st.@Slash符, aShift7); // ⚔ PreferShiftToReduce states/*71*/.Add(st.@Slash符, aReduce3); states/*72*/.Add(st.@RightParenthesis符, aReduce3); states/*73*/.Add(st.@终, aReduce3); // syntaxStates: // Exp : '(' Exp ')' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' states/*74*/.Add(st.@Dash符, aReduce4); states/*75*/.Add(st.@RightParenthesis符, aReduce4); states/*76*/.Add(st.@Asterisk符, aReduce4); states/*77*/.Add(st.@Slash符, aReduce4); states/*78*/.Add(st.@Plus符, aReduce4); 
states/*79*/.Add(st.@终, aReduce4); #endregion init actions of syntax states}

When the parser is in such a syntax state and meets a '+' token, it ought to Reduce, but bitParser, following the default "prefer Shift over Reduce" rule, chooses Shift. Obviously, this cannot handle the precedence relation between addition/subtraction and multiplication/division correctly.
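Why "always prefer Shift" gets associativity wrong can be seen with a toy model. This Python sketch is illustrative only (the function names and the tuple-based trees are assumptions, not bitParser code): always shifting on an operator groups `10 - 2 - 3` to the right, while reducing first groups it to the left.

```python
# Toy illustration: always shifting on the next operator yields a
# right-nested tree a - (b - c); reducing first yields (a - b) - c.
def parse_always_shift(tokens):
    left = tokens[0]
    if len(tokens) == 1:
        return left
    # keep shifting: the rest of the input becomes the right operand
    return (left, tokens[1], parse_always_shift(tokens[2:]))

def parse_prefer_reduce(tokens):
    left = tokens[0]
    i = 1
    while i < len(tokens):
        # reduce first: fold what we have into the left operand
        left = (left, tokens[i], tokens[i + 1])
        i += 2
    return left

def evaluate(t):
    if isinstance(t, str):
        return int(t)
    a, op, b = t
    x, y = evaluate(a), evaluate(b)
    return x - y if op == "-" else x + y

toks = ["10", "-", "2", "-", "3"]
print(evaluate(parse_always_shift(toks)))   # 11, i.e. 10 - (2 - 3): wrong
print(evaluate(parse_prefer_reduce(toks)))  # 5,  i.e. (10 - 2) - 3: correct
```

This is exactly why %left (prefer Reduce on equal precedence) is needed for '-' and '/': left associativity requires reducing the completed left operand before shifting the next operator.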
If we do not want to rewrite the grammar into the style shown at the beginning of this article, we can use precedence directives to declare the precedence of the four operators and still obtain a correct parse table:
```text
Exp : Exp '+' Exp | Exp '-' Exp | Exp '*' Exp | Exp '/' Exp | '(' Exp ')' | 'number' ;
%%+%% 'number'
%left '+' '-' // '+' and '-' share the same precedence and lean towards Reduce
%left '*' '/' // '*' and '/' share the same precedence, higher than '+' '-', and lean towards Reduce
```

Click to view: the LALR(1) parse-table code generated with the precedence directives

const int syntaxStateCount = 14;
/// <summary>
/// LALR(1) syntax parse table
/// </summary>
private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates = new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
    var states = CompilerExp.syntaxStates;
    // 80 actions
    // conflicts(16)=not solved(0)+solved(16)(0 warnings)
    #region create objects of syntax states
    states[0] = new(capacity: 3); states[1] = new(capacity: 5); states[2] = new(capacity: 3); states[3] = new(capacity: 6);
    states[4] = new(capacity: 3); states[5] = new(capacity: 3); states[6] = new(capacity: 3); states[7] = new(capacity: 3);
    states[8] = new(capacity: 5); states[9] = new(capacity: 6); states[10] = new(capacity: 6); states[11] = new(capacity: 6);
    states[12] = new(capacity: 6); states[13] = new(capacity: 6);
    #endregion create objects of syntax states
    #region re-used actions
    LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// refered 6 times
    LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// refered 6 times
    LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times
    LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times
    LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 6 times
    LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 6 times
    LRParseAction aReduce5 = new(regulations[5]);// refered 6 times
    LRParseAction aReduce0 = new(regulations[0]);// refered 6 times
    LRParseAction aReduce1 = new(regulations[1]);// refered 6 times
    LRParseAction aReduce2 = new(regulations[2]);// refered 6 times
    LRParseAction aReduce3 = new(regulations[3]);// refered 6 times
    LRParseAction aReduce4 = new(regulations[4]);// refered 6 times
    #endregion re-used actions
    // 80 actions
    //
conflicts(16)=not sovled(0)+solved(16)(0 warnings) #region init actions of syntax states // syntaxStates: // [-1] Exp' : ⏳ Exp ;☕ '¥' // Exp : ⏳ Exp '+' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : ⏳ Exp '-' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : ⏳ Exp '*' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : ⏳ Exp '/' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : ⏳ '(' Exp ')' ;☕ '-' '*' '/' '+' '¥' // Exp : ⏳ 'number' ;☕ '-' '*' '/' '+' '¥' states/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states)); states/*1*/.Add(st.@LeftParenthesis符, aShift2); states/*2*/.Add(st.@number, aShift3); // syntaxStates: // [-1] Exp' : Exp ⏳ ;☕ '¥' // Exp : Exp ⏳ '+' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : Exp ⏳ '-' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : Exp ⏳ '*' Exp ;☕ '-' '*' '/' '+' '¥' // Exp : Exp ⏳ '/' Exp ;☕ '-' '*' '/' '+' '¥' states/*3*/.Add(st.@Plus符, aShift4); states/*4*/.Add(st.@Dash符, aShift5); states/*5*/.Add(st.@Asterisk符, aShift6); states/*6*/.Add(st.@Slash符, aShift7); states/*7*/.Add(st.@终, LRParseAction.accept); // syntaxStates: // Exp : '(' ⏳ Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' // Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' // Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' // Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' // Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' // Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' states/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states)); states/*9*/.Add(st.@LeftParenthesis符, aShift2); states/*10*/.Add(st.@number, aShift3); // syntaxStates: // Exp : 'number' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' states/*11*/.Add(st.@Dash符, aReduce5); states/*12*/.Add(st.@RightParenthesis符, aReduce5); states/*13*/.Add(st.@Asterisk符, aReduce5); states/*14*/.Add(st.@Slash符, aReduce5); states/*15*/.Add(st.@Plus符, aReduce5); states/*16*/.Add(st.@终, aReduce5); // syntaxStates: // Exp : Exp '+' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' 
'+' '¥' // Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' states/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states)); states/*18*/.Add(st.@LeftParenthesis符, aShift2); states/*19*/.Add(st.@number, aShift3); // syntaxStates: // Exp : Exp '-' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' states/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states)); states/*21*/.Add(st.@LeftParenthesis符, aShift2); states/*22*/.Add(st.@number, aShift3); // syntaxStates: // Exp : Exp '*' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' states/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states)); states/*24*/.Add(st.@LeftParenthesis符, aShift2); states/*25*/.Add(st.@number, aShift3); // syntaxStates: // Exp : Exp '/' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' // Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' states/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states)); states/*27*/.Add(st.@LeftParenthesis符, aShift2); states/*28*/.Add(st.@number, aShift3); // syntaxStates: // Exp : '(' Exp ⏳ ')' ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' // Exp : Exp ⏳ '-' 
Exp ;☕ '-' ')' '*' '/' '+' // Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' // Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' states/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states)); states/*30*/.Add(st.@Plus符, aShift4); states/*31*/.Add(st.@Dash符, aShift5); states/*32*/.Add(st.@Asterisk符, aShift6); states/*33*/.Add(st.@Slash符, aShift7); // syntaxStates: // Exp : Exp '+' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // '+' repeated 2 times // ⚔ LeftShouldReduce states/*34*/.Add(st.@Plus符, aShift4); states/*35*/.Add(st.@Plus符, aReduce0); // '-' repeated 2 times // ⚔ LeftShouldReduce states/*36*/.Add(st.@Dash符, aShift5); states/*37*/.Add(st.@Dash符, aReduce0); // '*' repeated 2 times states/*38*/.Add(st.@Asterisk符, aShift6); // ⚔ LowPrecedence states/*39*/.Add(st.@Asterisk符, aReduce0); // '/' repeated 2 times states/*40*/.Add(st.@Slash符, aShift7); // ⚔ LowPrecedence states/*41*/.Add(st.@Slash符, aReduce0); states/*42*/.Add(st.@RightParenthesis符, aReduce0); states/*43*/.Add(st.@终, aReduce0); // syntaxStates: // Exp : Exp '-' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // '+' repeated 2 times // ⚔ LeftShouldReduce states/*44*/.Add(st.@Plus符, aShift4); states/*45*/.Add(st.@Plus符, aReduce1); // '-' repeated 2 times // ⚔ LeftShouldReduce states/*46*/.Add(st.@Dash符, aShift5); states/*47*/.Add(st.@Dash符, aReduce1); // '*' repeated 2 times states/*48*/.Add(st.@Asterisk符, aShift6); // ⚔ LowPrecedence states/*49*/.Add(st.@Asterisk符, aReduce1); // '/' repeated 2 times states/*50*/.Add(st.@Slash符, aShift7); // ⚔ LowPrecedence states/*51*/.Add(st.@Slash符, aReduce1); 
states/*52*/.Add(st.@RightParenthesis符, aReduce1); states/*53*/.Add(st.@终, aReduce1); // syntaxStates: // Exp : Exp '*' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // '+' repeated 2 times // ⚔ LowPrecedence states/*54*/.Add(st.@Plus符, aShift4); states/*55*/.Add(st.@Plus符, aReduce2); // '-' repeated 2 times // ⚔ LowPrecedence states/*56*/.Add(st.@Dash符, aShift5); states/*57*/.Add(st.@Dash符, aReduce2); // '*' repeated 2 times // ⚔ LeftShouldReduce states/*58*/.Add(st.@Asterisk符, aShift6); states/*59*/.Add(st.@Asterisk符, aReduce2); // '/' repeated 2 times // ⚔ LeftShouldReduce states/*60*/.Add(st.@Slash符, aShift7); states/*61*/.Add(st.@Slash符, aReduce2); states/*62*/.Add(st.@RightParenthesis符, aReduce2); states/*63*/.Add(st.@终, aReduce2); // syntaxStates: // Exp : Exp '/' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' // Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' // '+' repeated 2 times // ⚔ LowPrecedence states/*64*/.Add(st.@Plus符, aShift4); states/*65*/.Add(st.@Plus符, aReduce3); // '-' repeated 2 times // ⚔ LowPrecedence states/*66*/.Add(st.@Dash符, aShift5); states/*67*/.Add(st.@Dash符, aReduce3); // '*' repeated 2 times // ⚔ LeftShouldReduce states/*68*/.Add(st.@Asterisk符, aShift6); states/*69*/.Add(st.@Asterisk符, aReduce3); // '/' repeated 2 times // ⚔ LeftShouldReduce states/*70*/.Add(st.@Slash符, aShift7); states/*71*/.Add(st.@Slash符, aReduce3); states/*72*/.Add(st.@RightParenthesis符, aReduce3); states/*73*/.Add(st.@终, aReduce3); // syntaxStates: // Exp : '(' Exp ')' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' states/*74*/.Add(st.@Dash符, aReduce4); states/*75*/.Add(st.@RightParenthesis符, aReduce4); states/*76*/.Add(st.@Asterisk符, aReduce4); 
states/*77*/.Add(st.@Slash符, aReduce4); states/*78*/.Add(st.@Plus符, aReduce4); states/*79*/.Add(st.@终, aReduce4); #endregion init actions of syntax states}

bitParser's precedence directives %nonassoc, %left, %right and %prec work the same way as yacc's:
[*]Directives written later have higher precedence;
[*]%left leans towards Reduce;
[*]%right leans towards Shift;
[*]%nonassoc means a syntax error;
[*]%prec names a specific token type whose precedence is to be used, instead of the default one (that of the rightmost Vt in the reduction rule). The named Vt may even be one that does not appear in the grammar (i.e. a pure placeholder).
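The four rules above amount to a small decision procedure. This Python sketch is a simplified model of the yacc-style resolution (the names `prec` and `resolve` are illustrative, not bitParser's implementation): compare the precedence level of the rule's rightmost terminal (or its %prec token) with that of the lookahead, and fall back to associativity on a tie.

```python
# Simplified yacc-style shift/reduce resolution. prec maps a terminal to
# (level, assoc); later declarations get higher levels. Illustrative only.
prec = {
    "+": (0, "left"), "-": (0, "left"),   # %left '+' '-'
    "*": (1, "left"), "/": (1, "left"),   # %left '*' '/'
}

def resolve(rule_prec_vt, lookahead):
    """Return 'shift', 'reduce' or 'error' for a shift/reduce conflict.
    rule_prec_vt: rightmost terminal of the rule (or its %prec token)."""
    if rule_prec_vt not in prec or lookahead not in prec:
        return "shift"  # no declared precedence: prefer shift by default
    rule_level, _ = prec[rule_prec_vt]
    look_level, assoc = prec[lookahead]
    if rule_level > look_level:
        return "reduce"
    if rule_level < look_level:
        return "shift"
    # equal precedence: associativity decides
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[assoc]

# 'Exp + Exp . + ...': %left makes '+' left-associative -> reduce first
print(resolve("+", "+"))  # reduce
# 'Exp + Exp . * ...': '*' binds tighter -> shift
print(resolve("+", "*"))  # shift
# 'Exp * Exp . + ...': the rule has higher precedence -> reduce
print(resolve("*", "+"))  # reduce
```

The ⚔ LeftShouldReduce and ⚔ LowPrecedence comments in the generated table above correspond to the tie/level branches of this procedure.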
Example: the / postfix
In a Step-format file, the 1 in 1=2 should be recognized as an 'entityId', while the 2 should be recognized as a 'refEntity', i.e. a reference to "entity 2".
How do we tell them apart? When a number is followed by =, it is an 'entityId'; otherwise it is a 'refEntity'. This is what the / postfix is for, as shown below:
```text
// postfix.st
// regulations:
Items : Items Item | Item ;
Item : 'entityId' '=' 'refEntity' ;
// lexi statements:
%%+/[ \t]*=%% 'entityId' // digits followed by blanks and '=' are an 'entityId'
%%+%% 'refEntity'
```

Click to view: the generated code for lexi state 0

```csharp
// lexicalState0
private static readonly Action<LexicalContext, char> lexicalState0 =
static (context, c) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* = */
    else if (/* possible Vt : '=' */ c == '='/*'\u003D'(61)*/) {
        BeginToken(context); ExtendToken(context);
        context.currentState = lexicalState2;
    }
    /* */
    else if (/* possible Vt : 'entityId' 'refEntity' */ '0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
        BeginToken(context); ExtendToken(context);
        context.currentState = lexicalState3;
    }
    /* deal with everything else. */
    else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {
        context.currentState = lexicalState0; // skip them.
    }
    else { // unexpected char.
        BeginToken(context); ExtendToken(context);
        AcceptToken(st.Error, context);
        context.currentState = lexicalState0;
    }
};
```

Click to view: the generated code for lexi state 1

```csharp
// lexicalState1
private static readonly Action<LexicalContext, char> lexicalState1 =
static (context, c) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* = */
    else if (/* possible Vt : 'entityId' */ c == '='/*'\u003D'(61)*/) {
        context.currentState = lexicalState4;
    }
    /* [ \t] */
    else if (/* possible Vt : 'entityId' */ (c == ' '/*'\u0020'(32)*/) || (c == '\t'/*'\u0009'(9)*/)) {
        context.currentState = lexicalState1;
    }
    /* deal with everything else. */
    else { // token with error type
        ExtendToken(context);
        AcceptToken(st.Error, context);
        context.currentState = lexicalState0;
    }
};
```

Click to view: the generated code for lexi state 2

```csharp
// lexicalState2
private static readonly Action<LexicalContext, char> lexicalState2 =
static (context, c) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* deal with everything else. */
    else {
        AcceptToken(st.@Equal符, context);
        context.currentState = lexicalState0;
    }
};
```

Click to view: the generated code for lexi state 3

```csharp
// lexicalState3
private static readonly Action<LexicalContext, char> lexicalState3 =
static (context, c) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* = */
    else if (/* possible Vt : 'entityId' */ c == '='/*'\u003D'(61)*/) {
        context.currentState = lexicalState4;
    }
    /* [ \t] */
    else if (/* possible Vt : 'entityId' */ (c == ' '/*'\u0020'(32)*/) || (c == '\t'/*'\u0009'(9)*/)) {
        context.currentState = lexicalState1;
    }
    /* */
    else if (/* possible Vt : 'entityId' 'refEntity' */ '0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
        ExtendToken(context);
        context.currentState = lexicalState3;
    }
    /* deal with everything else. */
    else {
        AcceptToken(st.@refEntity, context);
        context.currentState = lexicalState0;
    }
};
```

Click to view: the generated code for lexi state 4

```csharp
// lexicalState4
private static readonly Action<LexicalContext, char> lexicalState4 =
static (context, c) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* deal with everything else. */
    else {
        AcceptToken(st.@entityId, context);
        context.currentState = lexicalState0;
    }
};
```

Below is the state-transition table as a two-dimensional ElseIf[][] array. It reuses some ElseIf objects, which further reduces the memory footprint.
Generated code for the two-dimensional state transition table:

```csharp
private static readonly ElseIf[][] lexiStates = new ElseIf[5][];

static void InitializeLexiTable() {
    ElseIf s9_9_0_1 = new('\t'/*'\u0009'(9)*/, Acts.None, 1); // refered 2 times
    ElseIf s32_32_0_1 = new(' '/*'\u0020'(32)*/, Acts.None, 1); // refered 2 times
    ElseIf s61_61_0_4 = new('='/*'\u003D'(61)*/, Acts.None, 4); // refered 2 times
    lexiStates[0] = new ElseIf[] {
        // possible Vt: 'entityId' 'refEntity'
        /*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 3),
        // possible Vt: '='
        /*1*/new('='/*'\u003D'(61)*/, Acts.Begin | Acts.Extend, 2),
    };
    lexiStates[1] = new ElseIf[] {
        // possible Vt: 'entityId'
        /*0*///new('\t'/*'\u0009'(9)*/, Acts.None, 1),
        /*0*/s9_9_0_1,
        // possible Vt: 'entityId'
        /*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),
        /*1*/s32_32_0_1,
        // possible Vt: 'entityId'
        /*2*///new('='/*'\u003D'(61)*/, Acts.None, 4),
        /*2*/s61_61_0_4,
    };
    lexiStates[2] = new ElseIf[] {
        // possible Vt: '='
        /*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@Equal符),
    };
    lexiStates[3] = new ElseIf[] {
        // possible Vt: 'entityId'
        /*0*///new('\t'/*'\u0009'(9)*/, Acts.None, 1),
        /*0*/s9_9_0_1,
        // possible Vt: 'entityId'
        /*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),
        /*1*/s32_32_0_1,
        // possible Vt: 'entityId' 'refEntity'
        /*2*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 3),
        // possible Vt: 'entityId'
        /*3*///new('='/*'\u003D'(61)*/, Acts.None, 4),
        /*3*/s61_61_0_4,
        // possible Vt: 'refEntity'
        /*4*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@refEntity),
    };
    lexiStates[4] = new ElseIf[] {
        // possible Vt: 'entityId'
        /*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@entityId),
    };
}
```

Its miniDFA state diagram is shown below:
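As an aside, the way such a range table drives a lexer can be sketched as follows (a hypothetical tuple-based table standing in for the generated ElseIf type): scan the current state's entries in order and take the first range that contains the input char.

```csharp
// Hypothetical sketch of driving a lexer with a range table: each entry is
// (min, max, next-state); the Acts flags are omitted for brevity.
(char min, char max, int next)[][] table =
{
    new[] { ('0', '9', 3), ('=', '=', 2) }, // state 0 of the example above
};

int Step(int state, char c)
{
    foreach (var (min, max, next) in table[state])
        if (min <= c && c <= max)
            return next; // a real lexer would also run Begin/Extend/Accept here
    return 0; // no range matched: fall back to the initial state
}
```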
Combining the code with the state diagram, it is easy to see:
[*]When in state miniDFA3, if the lexer meets a space, \t or =, the next token type must be 'entityId' (or the error type 'Error错'); otherwise it is 'refEntity' (and the lexer returns to state miniDFA0). This matches our expectation.
[*]When in state miniDFA0 (the initial state), if the lexer meets a digit, the next token may be either an 'entityId' or a 'refEntity'; its type cannot yet be determined uniquely.
In other words, this example also contains the NFA-to-DFA conversion problem, because its two token types 'entityId' and 'refEntity' begin with the same pattern [0-9]+.
Let us compare its NFA and DFA.
Below is its NFA state diagram:
Below is its DFA state diagram:
We can see:
[*]In the NFA state diagram, state NFA0-0 has two choices when it meets a digit: state NFA2-1 and state NFA3-1. In the corresponding DFA state diagram, NFA2-1 and NFA3-1 are merged into a single DFA state (DFA2). This is achieved by the subset construction.
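The subset construction can be sketched in a few lines (a toy ε-NFA invented for illustration, not bitParser's internal representation): a DFA state is the ε-closure of a set of NFA states, and the two digit branches land in one DFA state.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy ε-NFA: from state 0, ε-moves fan out to one branch (1) and another (3);
// both branches then consume the same symbol 'd'.
var eps = new Dictionary<int, int[]> { [0] = new[] { 1, 3 } };
var moves = new Dictionary<(int state, char c), int>
{
    [(1, 'd')] = 2,
    [(3, 'd')] = 4,
};

// ε-closure: all states reachable from the given set via ε-moves alone.
SortedSet<int> Closure(IEnumerable<int> states)
{
    var closure = new SortedSet<int>(states);
    var stack = new Stack<int>(closure);
    while (stack.Count > 0)
        if (eps.TryGetValue(stack.Pop(), out var targets))
            foreach (var t in targets)
                if (closure.Add(t)) stack.Push(t);
    return closure;
}

// One DFA transition: consume c from every state in the set, then close.
SortedSet<int> Move(SortedSet<int> from, char c) =>
    Closure(from.Where(s => moves.ContainsKey((s, c)))
                .Select(s => moves[(s, c)]));

var dfaStart = Closure(new[] { 0 });   // {0, 1, 3}
var dfaNext = Move(dfaStart, 'd');     // {2, 4}: both NFA branches merged
```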
Here is also an example of constructing a miniDFA:

```
// regulations:
Left : 'min' ;
// lexical statements
%%[a-zA-Z_][a-zA-Z0-9_]*/,%% 'min' // an identifier followed by ',' is a 'min'
```

The DFA of this 'min' is shown below:
The miniDFA of this 'min' is shown below:
We can see that the miniDFA merges the DFA's states DFA1 and DFA3. The two can be merged because, on any given char, both of them jump to the same next state. This is achieved by Hopcroft's algorithm.
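A simplified partition-refinement sketch in the spirit of Hopcroft's algorithm (the real algorithm picks its splitters more cleverly to reach O(n log n)); the toy DFA below is invented so that states 1 and 3 behave identically, mirroring the DFA1/DFA3 merge:

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy DFA: states 1 and 3 are both accepting and, on every char, jump to the
// same next state, so minimization should merge them.
var alphabet = new[] { 'a', 'b' };
var delta = new Dictionary<(int, char), int>
{
    [(0, 'a')] = 1, [(0, 'b')] = 3,
    [(1, 'a')] = 2, [(1, 'b')] = 0,
    [(3, 'a')] = 2, [(3, 'b')] = 0,
    [(2, 'a')] = 2, [(2, 'b')] = 2,
};
var accepting = new HashSet<int> { 1, 3 };
var states = new[] { 0, 1, 2, 3 };

// Start from {accepting, non-accepting}; split a block whenever two of its
// states go to different blocks on some char; repeat until stable.
var partition = new List<HashSet<int>>
{
    new(states.Where(accepting.Contains)),
    new(states.Where(s => !accepting.Contains(s))),
};
bool changed = true;
while (changed)
{
    changed = false;
    foreach (var block in partition.ToList())
    {
        var groups = block
            .GroupBy(s => string.Join(",",
                alphabet.Select(c => partition.FindIndex(b => b.Contains(delta[(s, c)])))))
            .Select(g => new HashSet<int>(g))
            .ToList();
        if (groups.Count > 1)
        {
            partition.Remove(block);
            partition.AddRange(groups);
            changed = true;
        }
    }
}
// partition is now {1,3}, {0}, {2}: states 1 and 3 are merged into one block.
```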
Example: the prefix <'Vt'>
In GLSL, to simplify syntax analysis, I need the Point in struct Point { float x; float y; } to be recognized as a token of type 'type_name' rather than of type 'identifier'. This can be achieved with a prefix.

```
// ..
struct_specifier : 'struct' 'type_name' '{' struct_declaration_list '}' ;
// ..
// lexical statements
%%<'struct'>[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name' // a token right after a 'struct' gets type 'type_name'
%%[a-zA-Z_][a-zA-Z0-9_]*%% 'identifier' // otherwise it gets type 'identifier'
```

Adding the prefix <'struct'> does not change the states of the lexer; it only makes the lexer check, when assigning the token type, whether the previous token has type 'struct': if so, the new token is typed 'type_name'; otherwise, 'identifier'.
In addition, an extra array is needed to record every 'type_name' recognized so far, so that when one of them appears again it can also be typed 'type_name'.
Generated code for the part of the state machine affected by the prefix:

```csharp
// lexicalState1
private static readonly Action<LexicalContext, char> lexicalState1 =
static (context, c) => {
    if (false) { /* for simpler code generation purpose. */ }
    /* [a-zA-Z0-9_] */
    else if (/* possible Vt : 'type_name' 'identifier' */
        ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
        || ('0' <= c && c <= '9') || (c == '_')) {
        ExtendToken(context);
        context.currentState = lexicalState3;
    }
    /* deal with everything else. */
    else {
        AcceptToken2(context
            // if the last token is a 'struct', the new token is a 'type_name'
            , new(/*<'Vt'>*/st.@struct, st.@type_name)
            // otherwise the new token is an 'identifier'
            , new(/*default preVt*/string.Empty, st.@identifier));
        context.currentState = lexicalState0;
    }
};
```

The matching AcceptToken2(context, ifVts); function is accordingly a bit more complex:
Code of void AcceptToken2(LexicalContext context, params IfVt[] ifVts);:

```csharp
struct IfVt {
    public readonly string preVt;
    public readonly string Vt;
    public IfVt(string preVt, string Vt) { this.preVt = preVt; this.Vt = Vt; }
}

private static void AcceptToken2(LexicalContext context, params IfVt[] ifVts) {
    var startIndex = context.analyzingToken.start.index;
    var endIndex = context.analyzingToken.end.index;
    context.analyzingToken.value = context.sourceCode.Substring(
        startIndex, endIndex - startIndex + 1);
    var typeSet = false;
    const string key = "type_name";
    var hadThisTypeName = false;
    if (!typeSet) {
        if (context.tagDict.TryGetValue(key, out var type_nameList)) {
            var list = type_nameList as List<string>;
            if (list.Contains(context.analyzingToken.value)) {
                // an already-recognized type_name
                context.analyzingToken.type = st.type_name;
                typeSet = true;
                hadThisTypeName = true;
            }
        }
    }
    if (!typeSet) {
        var lastType = string.Empty;
        if (context.lastSyntaxValidToken != null) {
            lastType = context.lastSyntaxValidToken.type;
        }
        for (var i = 0; i < ifVts.Length; i++) {
            var ifVt = ifVts[i];
            if (ifVt.preVt == string.Empty // no prefix by default
                || ifVt.preVt == lastType) { // the prefix <'Vt'> matches
                context.analyzingToken.type = ifVt.Vt;
                typeSet = true;
                break;
            }
        }
    }
    if (!typeSet) {
        // we failed to assign type according to lexi statements.
        // this indicates token error in source code or inappropriate lexi statements.
        context.analyzingToken.type = st.Error错;
        context.signalCondition = LexicalContext.defaultSignal;
    }
    // cancel forward steps for post-regex
    var backStep = context.cursor.index - context.analyzingToken.end.index;
    if (backStep > 0) { context.MoveBack(backStep); }
    // next operation:
    context.MoveForward();
    var token = context.analyzingToken.Dump();
    context.result.Add(token);
    // skip comments
    if (context.analyzingToken.type != st.blockComment
        && context.analyzingToken.type != st.inlineComment) {
        context.lastSyntaxValidToken = token;
    }
    if (!hadThisTypeName && context.analyzingToken.type == st.type_name) {
        // add the newly recognized type_name to the list
        if (!context.tagDict.TryGetValue(key, out var type_nameList)) {
            type_nameList = new List<string>();
            context.tagDict.Add(key, type_nameList);
        }
        var list = type_nameList as List<string>;
        list.Add(context.analyzingToken.value);
    }
}
```

Note that syntax analysis does not need blockComment and inlineComment tokens (tokens of "comment" type), so we skip comments when recording the type of the last token.
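Stripped of the bookkeeping, the two rules that the function above implements (prefix match plus memoized type_names) boil down to something like this hypothetical condensed helper:

```csharp
using System.Collections.Generic;

// Hypothetical condensed version of the two rules: a name is a 'type_name'
// if it was already recorded as one, or if it directly follows 'struct';
// in the latter case it is recorded for future occurrences.
var knownTypeNames = new HashSet<string>();
string ClassifyName(string name, string lastTokenType)
{
    if (knownTypeNames.Contains(name)) return "type_name";
    if (lastTokenType == "struct") { knownTypeNames.Add(name); return "type_name"; }
    return "identifier";
}
```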
Example: state signals <signal1, signal2, ..>
In GLSL, to simplify syntax analysis, I need the r1 and r2 in subroutine ( r1, r2 ) to be recognized as tokens of type 'type_name' rather than of type 'identifier'. This cannot be done with a prefix, but it can be done with the state signal LexicalContext.signal.
The state signal works as follows:
when a token of type 'subroutine' is read, LexicalContext.signal is set to subroutine0;
when a token of type '(' is read, if LexicalContext.signal is subroutine0, it is set to subroutine1;
when an identifier is read, if LexicalContext.signal is subroutine1 (which means the lexer has just read 'subroutine' '(' in a row), the identifier is recognized as a token of type 'type_name', otherwise as one of type 'identifier';
when a token of type ')' is read, if LexicalContext.signal is subroutine1, it is set back to default, i.e. state signals no longer apply.
```
storage_qualifier : | 'subroutine' '(' type_name_list ')' ;
// lexical statements
%%subroutine%% 'subroutine' subroutine0
<subroutine0>%%[(]%% '(' subroutine1
<subroutine1>%%[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name'
<subroutine1>%%[,]%% ','
<subroutine1>%%[)]%% ')' default
```

A state signal is a second state machine attached on top of the lexer's own state machine, so it is relatively complex to apply and easy to get wrong; it should be used as sparingly as possible.
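The four signal rules above can be simulated over a token stream in a handful of lines (hypothetical strings standing in for bitParser's generated types):

```csharp
using System.Collections.Generic;

// Simulate the signal rules over a whitespace-separated toy token stream:
// identifiers read while the signal is "subroutine1" become 'type_name'.
var signal = "default";
var result = new List<string>();
foreach (var word in "subroutine ( r1 , r2 ) foo".Split(' '))
{
    switch (word)
    {
        case "subroutine": signal = "subroutine0"; result.Add("'subroutine'"); break;
        case "(":
            if (signal == "subroutine0") signal = "subroutine1";
            result.Add("'('"); break;
        case ")":
            if (signal == "subroutine1") signal = "default";
            result.Add("')'"); break;
        case ",": result.Add("','"); break;
        default:
            result.Add(signal == "subroutine1" ? "'type_name'" : "'identifier'");
            break;
    }
}
// r1 and r2 come out as 'type_name'; foo, read after ')', is an 'identifier'.
```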
Example: Chinese characters
If I want to pick out all the Chinese characters in a text file (assuming Chinese characters lie between \u4E00 and \u9FFF), I can write:

```
Items : Items Item | Item ;
Item : 'chineseChar' | 'other' ;
%%[\u4E00-\u9FFF]%% 'chineseChar'
%%[^\u4E00-\u9FFF]%% 'other'
```
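For comparison, the same per-char classification done directly in C# (under the same assumption that Chinese characters lie in \u4E00–\u9FFF):

```csharp
// Classify a single char the way the two lexi statements above do.
string TokenType(char c) =>
    '\u4E00' <= c && c <= '\u9FFF' ? "chineseChar" : "other";
```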