The Parser Service Class
The service class Parser analyses character strings in a known computer language, according to
the rules of that language. At present this class processes two language types: Basic, and the
class of contemporary OOP languages referred to here simply as Java.
The field commun
Prototype
The
commun field contains the current statement being parsed.
The field fileType
Prototype
The
fileType field specifies the type of file being processed.
The field icol
Prototype
The
icol field specifies the current character column.
The field ierc
Prototype
The
ierc field specifies the starting character column of the last token.
The field iorec
Prototype
The
iorec field specifies the current record number within the source file.
The field iostate
Prototype
The
iostate field specifies the record number at which the current statement starts.
The field lcol
Prototype
The
lcol field specifies the last position in the current input record.
The field token
Prototype
The
token field contains the current input token in character form. The maximum
length of a token is 8190 characters.
The field tokenType
Prototype
The
tokenType field specifies the type of the current input token.
The field Parser_scriptType
Prototype
The
Parser_scriptType field specifies the dialect of the language currently being processed.
It is initialized with the value of
LNG_BASIC.
The method Parser_CompareBounded
Prototype
int Parser_CompareBounded(CONST char* oper,int length);
The
Parser_CompareBounded method compares the current input to a bounded symbol.
Given a particular bounded symbol -- i.e., any sequence of nonblank characters with an
explicit length specified -- this method checks the next sequence of nonblank characters at
the current location in the coded record to determine if those characters match the symbol.
If the
SkipWhiteSpace attribute is on, then any whitespace characters encountered in
the input record are skipped. All alphabetic characters in the symbol must be specified in
lower case. The criteria for a match vary depending upon the setting of the flag
WhiteSpaceBoundary. If this flag is off, a match does not occur with a symbol ending
in an identifier character if the immediately following character in the coded record is also
an identifier character. If there is an
AbbreviationSymbol defined and used in the
symbol being checked for, then a match up to that point is sufficient. Its parameters are:
Parameter | Description
|
oper | Contains the symbol being checked for. It need not be null-terminated.
|
length | Specifies the length of the symbol.
|
If a match is found, then this method returns the length of the symbol and updates the
current location in the coded record so that it points to the first character beyond the end
of the match. If no match is found, then a zero is returned and the current location is left
unchanged. If a match is found but additional identifier characters follow it immediately,
then a minus one is returned.
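The matching rules above can be sketched as follows. This is a minimal illustration, not the actual implementation: the `SketchParser` structure, the `is_ident` classifier, and the name `sketch_compare_bounded` are hypothetical. The sketch assumes that the input is folded to lower case before comparison, that the WhiteSpaceBoundary flag is off, that no abbreviation symbol is in use, and that the cursor is left unchanged when minus one is returned.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser state: a coded record and a cursor. */
typedef struct {
    const char *record;   /* null-terminated coded record */
    int icol;             /* current character column (0-based in this sketch) */
    int skipWhiteSpace;   /* nonzero: skip whitespace before comparing */
} SketchParser;

/* Assumption: identifier characters are letters, digits, and underscore. */
static int is_ident(int c)
{
    return isalnum(c) || c == '_';
}

/* Returns length on a match, 0 on no match, -1 when the match runs into
   further identifier characters (WhiteSpaceBoundary assumed off). */
static int sketch_compare_bounded(SketchParser *p, const char *oper, int length)
{
    int pos;
    int i;

    assert(p != NULL && p->record != NULL && oper != NULL);
    if (length <= 0) return 0;
    pos = p->icol;
    if (p->skipWhiteSpace) {
        while (isspace((unsigned char)p->record[pos])) pos++;
    }
    for (i = 0; i < length; i++) {
        int c = (unsigned char)p->record[pos + i];
        /* The symbol is lower case, so fold the input before comparing. */
        if (c == '\0' || tolower(c) != (unsigned char)oper[i]) return 0;
    }
    if (is_ident((unsigned char)oper[length - 1]) &&
        is_ident((unsigned char)p->record[pos + length])) {
        return -1;   /* e.g. "dim" found at the start of "dimension" */
    }
    p->icol = pos + length;   /* first character beyond the match */
    return length;
}
```

With the record `"  Dim x"` and the symbol `"dim"`, the sketch returns 3 and moves the cursor to column 5; with the record `"dimension"` it returns -1 and leaves the cursor alone.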
The method Parser_CompareSymbol
Prototype
int Parser_CompareSymbol(CONST char* oper);
The
Parser_CompareSymbol method compares the current input to a null-terminated
symbol. Given a particular symbol -- i.e., any sequence of nonblank characters -- this method
checks the next sequence of nonblank characters at the current location in the coded record
to determine if those characters match the symbol. All alphabetic characters in the symbol
must be specified in lower case. This method is a simplified interface to the method
Parser_CompareBounded. See the discussion there for the criteria for a match. Its
parameter is:
Parameter | Description
|
oper | Contains the symbol being checked for in null-terminated form.
|
If a match is found, then this method returns the length of the symbol and updates the
current location in the coded record so that it points to the first character beyond the end
of the match. If no match is found, then a zero is returned and the current location is left
unchanged.
The method Parser_Expression
Prototype
int Parser_Expression(int* parents,int expType,int isSet,int level);
The
Parser_Expression method is the primary control method for compiling expressions
in the source languages. It uses a generalized table-driven recursive descent algorithm
driven by an internal operation information table.
        Token                                                   Precedence
lexeme  VB    JS    opcode   subcode         type         args  VB  JS
------  ----  ----  -------  --------------  -----------  ----  --  --
MIN     -     -     OPC_NEG  BIN_Arithmetic  TYP_VOID     1     10  11
NOT     not   !     OPC_NOT  BIN_Arithmetic  TYP_VOID     1      3  11
POW     ^           OPC_EXP  BIN_Arithmetic  TYP_VOID     2     11   0
MUL     *     *     OPC_MUL  BIN_Arithmetic  TYP_VOID     2      9  10
DIV     /     /     OPC_DIV  BIN_Arithmetic  TYP_VOID     2      9  10
IDV     \           OPC_IDV  BIN_Arithmetic  TYP_VOID     2      8   0
MOD     mod   %     OPC_MOD  BIN_Arithmetic  TYP_VOID     2      7  10
ADD     +     +     OPC_ADD  BIN_Arithmetic  TYP_VOID     2      6   9
SUB     -     -     OPC_SUB  BIN_Arithmetic  TYP_VOID     2      6   9
CAT     &     +     OPC_CAT  BIN_Arithmetic  TYP_VOID     2      5   9
NEQ     <>    !=    OPC_NEQ  BIN_Arithmetic  TYP_BOOLEAN  2      4   6
GTE     >=    >=    OPC_GTE  BIN_Arithmetic  TYP_BOOLEAN  2      4   7
LTE     <=    <=    OPC_LTE  BIN_Arithmetic  TYP_BOOLEAN  2      4   7
EQL     =     ==    OPC_EQL  BIN_Arithmetic  TYP_BOOLEAN  2      4   6
GTR     >     >     OPC_GTR  BIN_Arithmetic  TYP_BOOLEAN  2      4   7
LTH     <     <     OPC_LTH  BIN_Arithmetic  TYP_BOOLEAN  2      4   7
IOR     or    ||    OPC_IOR  BIN_Arithmetic  TYP_VOID     2      2   1
AND     and   &&    OPC_AND  BIN_Arithmetic  TYP_VOID     2      2   2
LIKE    like  ===   OPC_LIK  BIN_Arithmetic  TYP_VOID     2      4   6
ISA     is    i-of  OPC_ISA  BIN_Arithmetic  TYP_VOID     2      4   7
XOR     xor   ^     OPC_XOR  BIN_Arithmetic  TYP_VOID     2      2   4
EQV     equ         OPC_XOR  BIT_Equiv       TYP_VOID     2      2   0
IMP     imp         OPC_XOR  BIT_Implies     TYP_VOID     2      2   0
BWA           &     OPC_AND  BIN_BitWise     TYP_VOID     2      0   5
BWO           |     OPC_IOR  BIN_BitWise     TYP_VOID     2      0   3
Though hardwired here, this table could easily be specified and stored in a language file.
Each row corresponds to an operator symbol defined via its lexeme code. The table assumes
that unary minus has the lowest lexeme code and that the codes for the other operators follow
it sequentially. For each operator the table contains the opcode and subcode to be emitted;
the type of its result (relational operators produce a boolean result, while other result
types depend upon the types of the arguments); the number of operator arguments, one for
unary or two for binary; and its precedence order in each language.
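One common way to drive expression parsing from such a table is precedence climbing, sketched below. This is an illustration under stated assumptions, not the algorithm used by Parser_Expression: the `SketchOp` structure and the function names are hypothetical, only five rows are reproduced (with their VB precedences), operands are unsigned decimal integers, and the sketch evaluates the expression directly rather than emitting opcodes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* One row of a hypothetical operator table (a subset of the full table). */
typedef struct {
    char symbol;      /* single-character VB spelling */
    int  args;        /* 1 = unary, 2 = binary */
    int  precedence;  /* VB precedence column */
} SketchOp;

static const SketchOp sketchOps[] = {
    { '-', 1, 10 },   /* MIN */
    { '^', 2, 11 },   /* POW */
    { '*', 2,  9 },   /* MUL */
    { '+', 2,  6 },   /* ADD */
    { '-', 2,  6 },   /* SUB */
};

/* Look an operator up by spelling and argument count; NULL if absent. */
static const SketchOp *find_op(char c, int args)
{
    size_t i;
    for (i = 0; i < sizeof sketchOps / sizeof sketchOps[0]; i++) {
        if (sketchOps[i].symbol == c && sketchOps[i].args == args) return &sketchOps[i];
    }
    return NULL;
}

static void skip_blanks(const char **s)
{
    while (**s == ' ') (*s)++;
}

static long apply(char op, long a, long b)
{
    long r = 1;
    switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:                       /* '^' with a non-negative exponent */
        while (b-- > 0) r *= a;
        return r;
    }
}

/* Parse and evaluate, consuming only operators whose precedence is at
   least minPrec. External callers pass 0. */
static long sketch_expr(const char **s, int minPrec)
{
    long left = 0;
    long rhs;
    const SketchOp *op;

    assert(s != NULL && *s != NULL);
    skip_blanks(s);
    if (**s == '-') {              /* unary minus binds its operand first */
        op = find_op('-', 1);
        (*s)++;
        left = -sketch_expr(s, op->precedence);
    } else {
        while (isdigit((unsigned char)**s)) left = left * 10 + (*(*s)++ - '0');
    }
    for (;;) {
        skip_blanks(s);
        op = find_op(**s, 2);
        if (op == NULL || op->precedence < minPrec) break;
        (*s)++;
        rhs = sketch_expr(s, op->precedence + 1);   /* left-associative */
        left = apply(op->symbol, left, rhs);
    }
    return left;
}
```

Because unary minus has precedence 10 and POW has 11 in the VB column, the sketch evaluates `-2 ^ 2` as -(2 ^ 2), and `1 + 2 * 3` as 1 + (2 * 3).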
The parameters of the method are:
Parameter | Description
|
parents | contains the number of parents controlling the current expression and the root
offsets of those parents.
|
expType | specifies the expected binary type of the expression.
|
isSet | specifies the language context of the current expression
|
level | specifies the current hierarchy level within the expression. It controls the
recursive descent of this method and must always be set to -1 when called
externally.
|
The method returns the status of the factor and of the operator either preceding or following
it when the end of the expression was reached:
Code | Meaning
|
+i | The operation code of a unary operator encountered immediately, or the operation code of
a binary operator immediately following a processed quantity expression.
|
0 | the expression consisted of a simple l-value not followed by an operator.
|
-1 | the expression consisted of a constant not followed by an operator.
|
-2 | the expression consisted of a complex "parenthetical" expression not followed by an
operator.
|
The method Parser_ExtractToken
Prototype
int Parser_ExtractToken(char** Position,UBYTE* Lexeme);
The
Parser_ExtractToken method extracts the next lexeme from a character string and
returns its token. A
lexeme is a string of connected characters that is the lowest
level syntactic unit in a programming language, and a
token is a syntactic category
that can encompass many different lexemes but often defines only a single one. Lexemes are
the words and the punctuation of the programming language. There is a set of tokens that are
identified generically, based primarily on the character types defined within the
Character class. Languages are distinguished by assigning different characters
different types using methods like
Character_SetIdent,
Character_SetQuote, etc.
The generic tokens are defined in the
ParserToken enumeration and are as follows:
Token | Description of meaning
|
EndOfRecord | The end of the current record was encountered -- i.e., a null-byte was
encountered. In this case the lexeme-length is set to zero.
|
Identifier | The lexeme read was a valid identifier or keyword -- i.e., it begins with an
identifier character and continues until a character that is neither an
identifier nor a digit is encountered.
|
Integer | The lexeme read was an integer constant.
|
Float | The lexeme read was a floating-point constant. At this time
ANSI-C format is assumed for floating point constants.
|
Quoted | The lexeme was a quoted string.
|
Special | The lexeme was some character that could not be
classified as part of one of the above lexemes. In this case
the lexeme-length is one and the lexeme-value is the
character value.
|
Other tokens are defined via the two standard lists
ParserReservedWords and
ParserReservedSymbols. The parameters of the method are:
Parameter | Description
|
Position | Contains a pointer to the pointer to the current position in the character string
being parsed. When this method returns this parameter is updated to point to the
character position immediately after the end of the lexeme, or to the null-byte
in the case where the end-of-record is encountered.
|
Lexeme | Receives the actual character form of the lexeme encountered in n-string
form -- i.e., Lexeme[0] receives the length and Lexeme[1..] receive the characters
making up the lexeme. To ensure compatibility the character sequence is then
null-terminated as well. Lexemes longer than 255 characters are returned with a
length of 255. In this case it is up to the caller to compute the actual length
of the lexeme.
|
The method returns the token value of the lexeme which might either be one of the generic
values or might be a value taken from one of the two lists.
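The generic classification and the n-string convention can be sketched as follows. This is an illustration only: the `SketchToken` codes and the function name are hypothetical, the character classes are approximated with the standard `<ctype.h>` functions rather than the Character class, quoted strings and the two reserved lists are omitted, floating-point constants are limited to the form digits, point, digits, and leading blanks are assumed to be skipped.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; the real values are in the ParserToken enumeration. */
enum SketchToken { SK_EndOfRecord, SK_Identifier, SK_Integer, SK_Float, SK_Special };

/* lexeme needs room for 257 bytes: length byte, up to 255 characters, null. */
static int sketch_extract_token(const char **position, unsigned char *lexeme)
{
    const char *start;
    const char *p;
    size_t len;
    int token;

    assert(position != NULL && *position != NULL && lexeme != NULL);
    p = *position;
    while (*p == ' ' || *p == '\t') p++;      /* assumption: blanks are skipped */
    start = p;
    if (*p == '\0') {
        token = SK_EndOfRecord;               /* cursor stays on the null byte */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        token = SK_Identifier;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
        token = SK_Integer;
        if (*p == '.' && isdigit((unsigned char)p[1])) {
            p++;
            while (isdigit((unsigned char)*p)) p++;
            token = SK_Float;
        }
    } else {
        p++;                                  /* any other single character */
        token = SK_Special;
    }
    len = (size_t)(p - start);
    if (len > 255) len = 255;                 /* length byte is capped at 255 */
    lexeme[0] = (unsigned char)len;
    memcpy(lexeme + 1, start, len);
    lexeme[1 + len] = '\0';                   /* null-terminated as well */
    *position = p;                            /* just past the lexeme */
    return token;
}
```

Calling the sketch repeatedly on `"count = 3.5"` yields an identifier of length 5, a special character `=`, a floating-point constant, and finally the end-of-record token with a lexeme length of zero.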
The method Parser_FindSymbol
Prototype
int Parser_FindSymbol(UBYTE* oper);
The
Parser_FindSymbol method finds the current input in a
StandardList of
symbols. Often the next symbol in the coded record may be one of many symbols which are
specified in a standard list representation. This method checks the next sequence of
characters at the current location in the coded record to determine if those characters
match one of a list of symbols. This method is a simplified interface to the method
Parser_CompareBounded. See the discussion there for the criteria for a match. Its
parameter is:
Parameter | Description
|
oper | Contains the list of symbols in the StandardList form.
|
If a match is found, then this method returns the offset in the symbol list of the start of
the symbol information associated with the matched symbol and updates the current location in
the coded record so that it points to the first character beyond the end of the match. If no
match is found, then a zero is returned and the current location is left unchanged.
The method Parser_GetReservedWords
Prototype
UBYTE* Parser_GetReservedWords(int active);
The
Parser_GetReservedWords method returns the handle to the standard list of reserved
words associated with a specified language type. Its parameter is:
Parameter | Description
|
active | Specifies the language. A setting of LNG_BASIC specifies VB6, Visual
Basic; a setting of zero specifies none; and any other setting specifies
the generic OOP, Java here.
|
The method Parser_GetSymbol
Prototype
int Parser_GetSymbol(char* Record);
The
Parser_GetSymbol method gets the next symbol from a character string. Once the
caller has found the start of a symbol in a character string, this method can be used to find
its extent. In this context a
symbol is defined as a sequence of non-null characters,
none of which is defined as a whitespace character. Its parameter is:
Parameter | Description
|
Record | Contains a null-terminated string which is assumed to begin with a
non-whitespace character.
|
The method returns the offset in the string of the first character in the string which is
null or which has the whitespace attribute. A return value of zero indicates that the string
does not begin with a valid non-whitespace character.
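The scan described above amounts to a short loop. In this sketch the name `sketch_get_symbol` is hypothetical and the whitespace attribute is approximated by the standard `isspace` function, which is an assumption; the real attribute is set through the Character class.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the offset of the first null or whitespace character in record. */
static int sketch_get_symbol(const char *record)
{
    int n = 0;

    assert(record != NULL);
    while (record[n] != '\0' && !isspace((unsigned char)record[n])) n++;
    return n;
}
```

For the string `">= 10"` the sketch returns 2, the extent of the symbol `>=`. For a string that starts with a blank, or for an empty string, it returns zero.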
The method Parser_GetIdentifier
Prototype
int Parser_GetIdentifier(UBYTE* Record,int nRecord);
The
Parser_GetIdentifier method isolates an identifier in a character string. Once the
caller has found the start of an identifier in a string, this method can be used to find its
extent. In this context an
identifier is defined as a sequence of characters all of
which are classified as being identifier characters. Its parameters are:
Parameter | Description
|
Record | Contains a string which may begin with an identifier character.
|
nRecord | Specifies the length of the string.
|
The method returns the offset in the string of the first character in the string
which does not have the identifier attribute. A return value of zero indicates that the
string does not begin with an identifier character.
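A length-bounded version of the same kind of scan is sketched below. The name `sketch_get_identifier` is hypothetical, and the sketch assumes that only letters and the underscore carry the identifier attribute; in the real class that attribute is assigned per language, so the set of characters may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the offset of the first character without the identifier attribute,
   never reading beyond nRecord characters. */
static int sketch_get_identifier(const unsigned char *record, int nRecord)
{
    int n = 0;

    assert(record != NULL);
    /* Assumption: letters and underscore are the identifier characters. */
    while (n < nRecord && (isalpha(record[n]) || record[n] == '_')) n++;
    return n;
}
```

Because the length is passed explicitly, the string does not have to be null-terminated: with `"total=5"` and a length of 7 the sketch returns 5, and with the same string and a length of 3 it returns 3.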
The method Parser_GetToken
Prototype
int Parser_GetToken(void);
The
Parser_GetToken method gets the next token from the current statement and stores
its value in the global token value buffer. The method returns the type code of the token.
This method is simply a specialized access point to the
Parser_ExtractToken method
which does the bulk of the processing.
The method Parser_LookAhead
Prototype
int Parser_LookAhead(void);
The
Parser_LookAhead method gets the following token in the current statement without
actually changing the cursor position within the statement. The value of the following
token is stored in the global token value buffer. The method returns the type code of that
token. This method is simply a specialized access point to the
Parser_GetToken method.
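The relationship between getting a token and looking ahead can be sketched as a save and restore of the cursor around an ordinary token read. Everything here is hypothetical: the `SketchState` structure stands in for the global fields, tokens are simply blank-separated words, and the token length is used in place of a real type code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the global statement, cursor, and token buffer. */
typedef struct {
    const char *stmt;   /* current statement, null-terminated */
    int next;           /* starting position of the next token */
    char token[64];     /* current token in character form */
} SketchState;

/* Reads the next blank-separated word; returns its length, 0 at the end. */
static int sketch_get_token(SketchState *st)
{
    int n = 0;

    assert(st != NULL && st->stmt != NULL);
    while (st->stmt[st->next] == ' ') st->next++;
    while (st->stmt[st->next] != '\0' && st->stmt[st->next] != ' ' && n < 63) {
        st->token[n++] = st->stmt[st->next++];
    }
    st->token[n] = '\0';
    return n;
}

/* Same result as sketch_get_token, but the cursor is put back afterwards. */
static int sketch_look_ahead(SketchState *st)
{
    int saved;
    int type;

    assert(st != NULL);
    saved = st->next;
    type = sketch_get_token(st);   /* fills the token buffer */
    st->next = saved;              /* cursor position is unchanged */
    return type;
}
```

After a look-ahead the token buffer holds the following token, but a subsequent call to the ordinary get-token function returns that same token again because the cursor did not move.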
The method Parser_GetBuffer
Prototype
UBYTE* Parser_GetBuffer(void);
The
Parser_GetBuffer method returns the value of the global field
token, which
contains the current input token in character form. It is primarily intended for use by
gmSL and
gmNI which do not have access to the global fields. The method has
no parameters.
The method Parser_ResetInput
Prototype
int Parser_ResetInput(void);
The
Parser_ResetInput method resets the starting position of the next token to the
starting position of the current token. It is primarily intended for use by
gmSL and
gmNI which do not have access to the global fields. The method has no parameters and
returns the new starting position.
The method Parser_SetDoubleQuotes
Prototype
void Parser_SetDoubleQuotes(int status);
The
Parser_SetDoubleQuotes method sets the double quote status for the statements
currently being processed. Its parameter is:
Parameter | Description
|
status | Specifies if the status is to be on or off. A zero value, or LBC_False,
sets the status off, while a nonzero value, or LBC_True sets it on.
|
The method Parser_SetExternalLanguage
Prototype
void Parser_SetExternalLanguage(int status);
The
Parser_SetExternalLanguage method sets the external language status for the
statements currently being processed. Its parameter is:
Parameter | Description
|
status | Specifies if the status is to be on or off. A zero value, or LBC_False,
sets the status off, while a nonzero value, or LBC_True sets it on.
|
The method Parser_SetNumericIdentifiers
Prototype
void Parser_SetNumericIdentifiers(int status);
The
Parser_SetNumericIdentifiers method sets the numeric identifiers status for the
statements currently being processed. Its parameter is:
Parameter | Description
|
status | Specifies if the status is to be on or off. A zero value, or LBC_False,
sets the status off, while a nonzero value, or LBC_True sets it on.
|
The method Parser_SetInput
Prototype
void Parser_SetInput(int position);
The
Parser_SetInput method sets the starting positions of the current and next token
to a specified value. It is primarily intended for use by
gmSL and
gmNI which
do not have access to the global fields. Its parameter is:
Parameter | Description
|
position | Specifies the starting position for the current and next input token.
|
The method Parser_SetReserved
Prototype
void Parser_SetReserved(int active);
The
Parser_SetReserved method sets the tokens to be associated with the reserved words
and symbols in the particular language being parsed. Most contemporary languages contain
reserved words and symbols which have special unique meaning to the language and which may
not be used for any other purpose. Recognizing these when the language statements are
initially parsed simplifies the later work of the parser. Its parameter is:
Parameter | Description
|
active | Specifies the type of language being processed. A setting of LNG_BASIC
specifies VB6, Visual Basic; a setting of zero specifies none; and any
other setting specifies the generic OOP, Java here.
|
Each of the two language types has two lists associated with it: words and symbols.
The
words list contains the reserved words to be recognized. A word must begin with an
identifier character and may contain only identifier and digit characters. In this
implementation reserved words are not case sensitive. The
symbols list contains the
reserved symbols to be recognized. A symbol must begin with a non-identifier character. There
are no other restrictions, but symbols are case sensitive if they contain alphabetic characters
(which most do not).
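The two lookup rules, case-insensitive words and case-sensitive symbols, can be sketched as follows. The tables, token values, and function names are hypothetical, and plain arrays are used in place of the standard list representation, whose layout is not described here. Letters and underscore are assumed to be the identifier characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *text; int token; } SketchEntry;

/* Hypothetical entries and token values, for illustration only. */
static const SketchEntry sketchWords[]   = { { "dim", 101 }, { "then", 102 }, { "if", 103 } };
static const SketchEntry sketchSymbols[] = { { "<>", 201 }, { ">=", 202 } };

/* Returns nonzero when a and b are equal apart from letter case. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b;   /* true only when both strings ended together */
}

/* Returns the token for a reserved word or symbol, or 0 if not reserved. */
static int sketch_reserved_token(const char *lexeme)
{
    size_t i;

    assert(lexeme != NULL);
    if (isalpha((unsigned char)lexeme[0]) || lexeme[0] == '_') {
        /* Words begin with an identifier character and ignore case. */
        for (i = 0; i < sizeof sketchWords / sizeof sketchWords[0]; i++) {
            if (equal_ignore_case(lexeme, sketchWords[i].text)) return sketchWords[i].token;
        }
    } else {
        /* Symbols begin with a non-identifier character and must match exactly. */
        for (i = 0; i < sizeof sketchSymbols / sizeof sketchSymbols[0]; i++) {
            if (strcmp(lexeme, sketchSymbols[i].text) == 0) return sketchSymbols[i].token;
        }
    }
    return 0;
}
```

In this sketch `Dim`, `DIM`, and `dim` all map to the same token, while `dimx` and the symbol `=>` are not reserved and map to zero.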
The method Parser_SetStatement
Prototype
void Parser_SetStatement(char* record);
The
Parser_SetStatement method sets the content of the global communications buffer.
It is primarily intended for use by
gmSL and
gmNI which do not have access to
the global fields. Its parameter is:
Parameter | Description
|
record | Contains the string in null-terminated form to be copied into the communications
buffer.
|
The method Parser_SetWhiteSpaceBoundary
Prototype
void Parser_SetWhiteSpaceBoundary(int status);
The
Parser_SetWhiteSpaceBoundary method sets the whitespace boundary status for the
statements currently being processed. Its parameter is:
Parameter | Description
|
status | Specifies if the status is to be on or off. A zero value, or LBC_False,
sets the status off, while a nonzero value, or LBC_True sets it on.
|
The method Parser_SetSkipWhiteSpace
Prototype
void Parser_SetSkipWhiteSpace(int status);
The
Parser_SetSkipWhiteSpace method sets the skip whitespace status for the statements
currently being processed. Its parameter is:
Parameter | Description
|
status | Specifies if the status is to be on or off. A zero value, or LBC_False,
sets the status off, while a nonzero value, or LBC_True sets it on.
|
The method Parser_SetAbbreviationSymbol
Prototype
void Parser_SetAbbreviationSymbol(char symbol);
The
Parser_SetAbbreviationSymbol method sets the abbreviation symbol for the
statements currently being processed. Its parameter is:
Parameter | Description
|
symbol | Specifies the abbreviation symbol value.
|
The method Parser_StringExpression
Prototype
void Parser_StringExpression(int* Parents,int exp,char* strValue,int langType);
The
Parser_StringExpression method is used to process nested expressions within other complex
contexts like default value specifications or
gmPL attribute values. Its parameters
are:
Parameter | Description
|
parents | contains the number of parents controlling the current expression and the root
offsets of those parents.
|
exp | specifies the expected binary type of the expression.
|
strValue | contains the actual expression to be parsed.
|
langType | specifies the dialect of the statements currently being compiled.
|