artefaktur
software engineer &        architecture

 
 
 
 

AAL parser

| Intentional Programming | ACI Concept | AAL definition | AAL syntax | AAL parser | Object Representation | OpCode | Reference Links |

Design of the AAL extensible Parser.



Content of this chapter:

   Overview
   Scanner
   Symbol Table, Type Library and Type Scope
     SymbolTable
     Unresolved TypeDefinitions
     Correlation between SymbolTable and Interpreter Env
   Parser
     BNF
     Sample: Lisp Core
     Problem left recursion
     Terminals
     Regular Expression
     Classloading of Rules
     Embedding Code in Syntax
   Interpreter and Generator
     Interpreter
       Interpreter and SymbolTable
     OpCode Generator
       OpCode Details



 Overview

  • Parser should be extensible at runtime.
  • Parser will be defined by Code-Fragments. These Code-Fragments defines the syntax-grammar, the connection to the SymbolTable, the execution in a interpreter environment, the translation to intermediate code.
    The Code implementation can be reused with a different Grammar.
    For example, the standard implementation of AdditiveExpr expects 3 sub Code-Fragments, with first argument as Expression, operator (+ or -) and second argument as Expression. The same Code instance can be used for interpretation and intermediate code generation with different language syntax. So most code for a new language can be reused.

 Scanner

  • Lookahead, stacked
  • Scanner is not independ, but part of the parser (ParseNode)

 Symbol Table, Type Library and Type Scope

 SymbolTable

For each declarative element (types, functions) an element will be inserted into the Symbol Table.

// global static dictionary
namespace acdk { 
  // acdk.foo_0
  void foo()
  {
    // dictionary acdk.foo_0.X
    class X {}
    
  }
}


Variables may be declared in the same SymbolTable:
void foo(int i)
{
  int j = i;
}
may be:
SymbolTable.foo_i[0].i
SymbolTable.foo_i[1].j

Sub-Blocks will have a link to the parent scope
void foo()
{
  int i = i;
  {
    int j = i;
    {
      int i = j;
    }
  }
}

Basic Structure of SymbolTable:
class TypeDefinition { RString name; RDClazzInfo clazzInfo };
class VarDefinition { RString name; RTypeDefinition typeDef; };

class SymbolTable
{
  // String -> TypeDefinition | VarDefinition 
  HashMap _typeMap;
  HashMap _varMap;
  RSymbolTableArray _seealso;
};

Dynamic dictionary may also be a static Dictionary link.
namespace acdk {
  class X {}
}
namespace other {
  // implicit block, which add 
  // an entry in the SeeAlso list of the block
  use acdk;
}

 Unresolved TypeDefinitions

To enable soft typing of VarDefintions, an TypeDefinition may not resolved. This means, in current compiler state, the TypeDefinition is currently not known.

 Correlation between SymbolTable and Interpreter Env

Each Code can have an associated SymbolTable, which will setup in the postParse() step of compilation.

The EvalScope Holds Vars, which contains a VarDefinition and a ScriptVar which holds the Value.

 Parser

Using a LL(n) parser:
Pro:
  • Extending the parser at runtime probably difficult for an LR parser.
  • Rules for an LR parser are more difficult to understand.
Con:
  • Left recursion is not allowed.
  • Less Performance.


 BNF


The acdk::aci::parser::SyntaxParser uses BNF sytax:
# Rulename is the NodeName of the SyntaxParser, the right size of the
# rule defintion is the syntax of the rule
RuleName : RuleDef

# RuleDef is true if 'text' (terminal token) can find in input stream
RuleName: 'text'

# RuleDef is true if RulDef1 and following RulDef2 is true
RuleDef: RulDef1 RulDef2

# RuleDef is true if RulDef1 or RulDef2 is true
# if first RuleDef1 is true, RuleDef2 will not be 
# evaluated
RuleDef: RuleDef1 | RuleDef2


# Rules can be grouped
RuleDef: ( RulDef1 )

 
# Rules in [ ] are optional
RuleDef: RuleDef1 [ RulDef1 ]


# Rules in ( )* can occours 0 - n times
RuleDef: RuleDef1 ( RulDef1 )*


# Rules in ( )* can occours 1 - n times
RuleDef: RuleDef1 ( RulDef1 )+

 Sample: Lisp Core

LispText : ( List | Element )
List : '(' ( Element )* ')'
Element : [ '\'' ] ( List | Atom )
Atom : Identifier | Constant
Some extension controls generating the Syntax Tree:

# the $ saves the RuleDef as node
# note RuleDef will only saved if RuleDef2 matches
RuleDef: RuleDef1 | RuleDef2 $
# If you want to save the rule if 
RuleDef: RuleDef1 $ | RuleDef2 $ 
#save if RuleDef1 or RuleDef2 matches or alternativ:
RuleDef: ( RuleDef1 | RuleDef2) $

# % saves RuleDef if last evaluated group (here RuleDef2)
# will return a Node
RuleDef: RuleDef1 [ RuleDef2 ] %


# Same as above
RuleDef: RuleDef1 ( RuleDef2 )* %

 Problem left recursion

Left recursion results in not terminating loops:
A: A [ '+' B ] // not working

A: B [ '+' A ] // terminates

 Terminals

To enable lookahead in the syntax, the rules which represents terminals should have a rule name (nodeName) with only capital letters should be derived from acdk::aci::parser::TerminalParseNode.

 Regular Expression

A syntax can also use regular expression for matching
Rule : /matchExpr/
What should the parsed content of the Rule?

 Classloading of Rules

If a rule cannot be found, the compiler tries to load the Rule via ClassLoader.
Rule : acdk.aci.parser.IdentifierParseNode
Currently not implemented

 Embedding Code in Syntax

The text quoted with back ticks will be evaluated with the CfgScript language.
Rule: RuleA !{ CfgScript Code }!
Inside the code following variables are defined:
  • compiler: acdk::aci::Compiler, the current compiler
  • thisnode: acdk::aci::parser::ParseNode, the current parse node.
  • ...
Note: The code "CfgScript Code" will only be executed, if RuleA is already parsed successfully. This is not implemented yet.

 Interpreter and Generator

The interpreter executes the Code at evaluation time. The Generator(s) generates OpCodes for the given Code. The interpreter branches through the Code tree multiple times (loop, function calls). The Generator only iterate once throgh all Code nodes. The OpCodes will be added to the Code node. Maybe it makes sense, that there is no special interpreter step (Code.execute()) but a Code generates its OpCode (if not already done) and executes simply this OpCode.

 Interpreter

To enable evaluation of Code at compile time (and extentions of grammar at compile time) the mayor Code nodes has to implement the execute method. After the ParseNode's has created the AST (ParseNode.parse()) as Code nodes, the SymbolTable will be build in in the Code.postParse() step. After then the Code.execute() function of the root node will be called.

 Interpreter and SymbolTable

The first call of a Code with associated SymbolTable should register all user defined classes into the ACDK DMI. Every call to a Code with associated SymbolTable should create the local variables on the stack.

 OpCode Generator

The OpCode should regard following issues:
  • Can be executed directly by AAL Virtual Machine.
  • Can be stored into a file (including meta information/SymbolTable)
  • Can be transformed into:
    • Java Byte Code
    • .NET ILASM
    • other ?

 OpCode Details

The OpCode operates only on the stack. Open Issues:
  • The definitions of Metainfo also as OpCode (ILASM like) or as associated ressource (JavaVM).
  • Resolution of method invoke, polymorphic functions, parameter matching.
 
Last modified 2005-05-08 22:35 by SYSTEM By Artefaktur, Ing. Bureau Kommer