The translation process includes an authoring step that uses a "String Machine" to generate the target code. The string machine resembles a runtime execution engine (for example, the .NET CLR, which executes the IL operation streams stored in assemblies). But there is a critical difference: instead of performing the operations, the string machine authors the code described by those operations. This is what we mean by semantic translation as opposed to syntax translation. It may help you to think of gmBasic as a sophisticated VB6/ASP/COM to .NET compiler-decompiler.
The form of the target code for each operation is defined in terms of patterns associated with operations in the metalanguage files. There are many details in how information in the operation stream may be emitted during the expression of target code. Some of these details may be controlled by special codes and escape sequences in the pattern strings. For convenience, an excerpt from the gmBasic Language reference page that describes these special codes and escape sequences in more detail is included below.
Content of pattern strings
Within the pattern strings there are three types of specifications. First, there are special operation parameters which consist of a backslash followed by a letter. These parameters trigger special conversions. Second, there are operand conversion parameters which consist of a percent sign, followed by a numeric digit, followed by a conversion code. The numeric digit specifies which operand is to be entered at this point in the string, and the conversion code specifies any special operation to be performed. Third, there are simple character specifications which are any characters not forming one of the two specifications above. Simple characters are entered into result strings exactly as entered.
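The three kinds of specification can be illustrated with a small tokenizer. This is only a sketch in Python, not gmBasic's own scanner: the function name `tokenize`, the token labels, and the rule that two-letter codes (the `k` constants and the Qcontext codes from the table that follows) take precedence over one-letter codes are all assumptions made for illustration. Optional integers after codes such as `t`, `m`, or `E` are not handled.

```python
import re

# Assumption: a two-letter code (\ki, \Qo, ...) wins over a one-letter code,
# so "\Qo" is read as Qo rather than Q followed by a literal "o".
TOKEN_RE = re.compile(
    r"\\(?P<special>k[ilrspc]|Q[ore]|.)"    # backslash + special operation code
    r"|%(?P<operand>\d)(?P<conv>[A-Za-z])"  # percent + digit + conversion code
)

def tokenize(pattern):
    """Split a pattern string into (kind, value) tuples.

    kind is "special", "operand", or "text" (simple characters).
    """
    tokens = []
    pos = 0
    for m in TOKEN_RE.finditer(pattern):
        if m.start() > pos:
            # Anything between two specifications is simple text.
            tokens.append(("text", pattern[pos:m.start()]))
        if m.group("special") is not None:
            tokens.append(("special", m.group("special")))
        else:
            tokens.append(("operand", (int(m.group("operand")), m.group("conv"))))
        pos = m.end()
    if pos < len(pattern):
        tokens.append(("text", pattern[pos:]))
    return tokens

# A raw string keeps the backslashes exactly as they appear in the pattern.
for token in tokenize(r"%1d.Load(%2d)\n%1d.HasChildNodes"):
    print(token)
```

Under these assumptions the pattern splits into operand 1, the text `.Load(`, operand 2, the text `)`, the special code `n`, operand 1 again, and the text `.HasChildNodes`.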
The special operation parameter characters are as follows:
Char | Description |
c | A statement has been completed. Write it to the current output text buffer and continue processing the pattern string. If the syntax of the dialect being authored requires an end of statement like ";" then add it. These end of statement characters should not be entered into the pattern strings directly. |
e | The operation code is followed by the storage offset of an enumeration entry in an external library or language file. Obtain the offset and enter its library-style identifier into the current statement. |
ki | The operation code is followed by a short integer value. Convert the value to string and enter it into the current statement. |
kl | The operation is followed by the offset of the string representation of a long or exact representation constant. Retrieve it, do any dialect specific editing that is needed, and enter it into the current statement. |
kr | The operation is followed by the offset of the string representation of a real constant. Retrieve it, do any dialect specific editing that is needed, and enter it into the current statement. |
ks | The operation is followed by the offset of a character string. Retrieve it and enter it into the current statement surrounded by quotes. |
kp | The operation is followed by the offset of a character string. Retrieve it and enter it as is into the current statement -- i.e., without quotes. |
kc | The operation is followed by the offset of a single-character string. Retrieve it and enter it into the current record surrounded by single or double quotes, depending upon the requirements for character constants in the target dialect. |
l | The operation is followed by the root offset of a component in an external library. Display the identifier of this component either as a fully qualified identifier (library.class.component) or as a simple identifier depending upon the context of its use in the intermediate code. |
v | The operation is followed by the root offset of a component in the user code. In cases where a qualified identifier is needed, simplify it depending upon the location of the reference. |
V | The operation is followed by the root offset of a user code component that is in a different project than the reference. The qualifications that have to be associated with the reference may have to be fully specified. |
p | The operation increases the logical nesting of the authored code. Write the current record and then increase the margin setting for the following records by the indentation margin width. |
q | The operation decreases the logical nesting of the authored code. Write the current record and then decrease the margin setting for the following records by the indentation margin width. |
n | Simply write the current record without associating any language specific end-of-statement characters. |
f | Flush the string stack -- enter all active entries on the string stack into the output record. |
m | Enter without margin -- write the current record without a margin. The "m" pattern opcodes have a 0, 1, or 2 to distinguish #if, #else, and #end; however, that is not used in this implementation. |
w | Write the current record without a new line. This means that the following record written will be concatenated with this record in the final written record. |
t | Tab to an indicated position -- the "t" pattern code is optionally followed by an integer constant which indicates the absolute character position to which the current record should be tabbed. A value of 0 or no value, simply inserts a tab into the current record. |
s | Enter a single double quote character into the current record. These pattern codes are tracked in pairs by the surface pattern author. If this is the second one of a pair, the characters after the first one are scanned for double quotes that need to be escaped, and any that are found are escaped using the language specific conventions. |
S | Enter a single or double quote into the output record depending upon the presence of <% in the current record. This is a special operator used for writing the ends of attribute values for HTML statements within ASP code. It switches between single and double quotes when the language level changes. |
R | The operation is followed by the offset of a stored resource. The label of that resource is entered into the current record. |
B | The operation is followed by the root offset of a control whose code needs to be written. First write the current record and then author the code associated with the control. |
C | The current record contains a comment that needs to be authored as such using the conventions of the target language and the various Select attributes that control the form and spacing of comments. |
D | The operation code is followed by the root offset of a component whose declaration needs to be authored. Do so and associate any inline comment that follows it with the declaration -- if possible. |
E | The operation ends the authoring of a control code started by the "B" pattern opcode. If it is followed by a 1 decrement the current indentation nesting level. |
H | This operation is a specialty operator for writing the arguments to be associated with a direct call to an event handler. The operation scans the code looking for the event handler being called and then determines from its description what the appropriate event arguments are in the target language. |
Q | This operation simply enters two double quotes -- typically indicating an empty string -- into the output record. |
Qo | One of the problems faced by the author involves authoring strings with embedded quotes in quoted form. The term for this is "Quote-enclosure". The Qcontext characters are used to control the process, which uses special characters called "hard quotes" to distinguish quotes that have to be escaped from quotes that do not. The basic issue is that once the quotes in part of an output string have been escaped, they must not be escaped again. The terminal hard quote of a string is therefore a blocking quote. The "Qo" operation opens a string to be quoted by entering a blocking hard quote. |
This Qcontext operation simply enters a hard quote at the possible end of a string. | |
Qr | This Qcontext operation removes a terminating hard quote. It is needed for multiple concatenations where the termination expected by the preceding pattern is now delayed to the end of the current pattern. |
Qe | This Qcontext operation ends a string to be quoted. This is where all the work is done. The entire current output record starting at the last blocking quote is searched for soft quotes which are then escaped. When the record is actually written, all hard quotes are displayed as quotes. |
T | The operation code is followed by the root offset of a component whose binary type display needs to be entered into the current record. |
, | This specialty operator, replace with comma, is used when possibly indexed assignments must be authored as calls to a Set operation. The current record is checked for an ending right-parenthesis and if present it is removed before a comma is entered into the record. |
X | This specialty operator, author asp Xml comment, checks the current record to see if it is an ASP include. If not it uses the target language appropriate annotations to make it a comment. If it is an include it then authors using the appropriate markup conventions. |
b | This operator enters a blank into the current record if it is needed to create a token break -- if the current ending character in the record is an identifier character. |
N | Return the previous argument string to the string stack so it may be used again in authoring the next operation pattern. For example, given VB6\COM code using MSComCtlLib.TabStrip: Set objTab = Me.TabStrip1.Tabs.Add(1, "key1", "Browser") and the Refactor command: <Migrate id="ITabs.Add" migPattern="%1d\Nnew TabPage(%5d); %1d.Name=%4d; %2d.Add(%1d)" nPram="6" /> the output is: objTab = new TabPage("Browser"); objTab.Name="key1"; this.TabStrip1.TabPages.Add(objTab); |
\\ | This operator simply enters a backslash into the current record. |
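Several of the codes in this table are simple edits of the current record, and their descriptions translate almost directly into code. The following Python sketch models the `b`, `,`, `Q`, and `\\` codes as functions on a record string. The function names are hypothetical, and the definition of an "identifier character" (letters, digits, underscore) is an assumption, since the table does not spell it out.

```python
def is_identifier_char(ch):
    # Assumption: identifier characters are letters, digits, and underscore.
    return ch.isalnum() or ch == "_"

def apply_b(record):
    # "b": enter a blank only when the record currently ends in an
    # identifier character, so that the next token does not run into it.
    if record and is_identifier_char(record[-1]):
        return record + " "
    return record

def apply_comma(record):
    # ",": if the record ends in a right parenthesis, remove it,
    # then enter a comma.
    if record.endswith(")"):
        record = record[:-1]
    return record + ","

def apply_Q(record):
    # "Q": enter two double quotes (typically an empty string).
    return record + '""'

def apply_backslash(record):
    # "\\": enter a single backslash.
    return record + "\\"

print(apply_b("Dim"))              # blank added after an identifier
print(apply_b("x("))               # no blank needed after "("
print(apply_comma("obj.Item(3)"))  # closing parenthesis replaced by a comma
print(apply_Q("s = "))
```

The `,` case shows why the code exists: an indexed assignment such as `obj.Item(3) = v` can be reopened as an argument list, leaving room for the assigned value to follow as a further argument.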
Each conversion code has an argument string associated with it that was removed from the string stack. They all also compare the hierarchy levels as described above to determine if parentheses have to be entered. The conversion codes differ in what changes they make in the argument string before they enter it into the current record. The conversion code characters are as follows:
Char | Description |
d | This code simply enters the argument with no editing. |
i | This code assumes that the argument is a class name whose corresponding interface name should be displayed instead. |
U | The standard way that transforms convert complex identifiers of the form id1.id2.id3 into simple identifiers is by changing the periods to underscores. This code here performs this operation on the argument string before it is entered. |
o | The argument string is the representation of an optional argument being passed to a subprogram. If it is empty and the current record ends in a comma then remove the comma. |
q | The argument from the stack is to be enclosed in quotes. |
Q | Here the string argument is to be entered into a context which will be subject to quote-enclosure. However, any quotes within this string should not be escaped. This display type converts any soft quotes it finds in the argument into hard quotes. See the Qcontext discussion above. |
u | If the argument is enclosed in quotes, then remove them before entering it. |
D | The argument is to be decremented by one. If it is a valid numeric constant, compute its value, decrement it, and then redisplay it. If it is not a valid numeric constant, append "-1" to it. |
H | This is a deprecated specialty operator which takes the argument as a hexadecimal constant which must be broken into a three part comma-delimited string. |
P | This is a specialty operator. The argument is an identifier with the possible form name1.name. Only the name1 part is desired. |
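The effect of several conversion codes on an argument string can be sketched as small Python functions. These are illustrative only: the function names are hypothetical, the `D` sketch treats only integers as valid numeric constants, and the hierarchy-level comparison that decides whether parentheses are needed is not modeled.

```python
def conv_d(arg):
    # "d": enter the argument with no editing.
    return arg

def conv_U(arg):
    # "U": id1.id2.id3 becomes id1_id2_id3.
    return arg.replace(".", "_")

def conv_q(arg):
    # "q": enclose the argument in quotes.
    return '"' + arg + '"'

def conv_u(arg):
    # "u": remove enclosing quotes if they are present.
    if len(arg) >= 2 and arg[0] == '"' and arg[-1] == '"':
        return arg[1:-1]
    return arg

def conv_D(arg):
    # "D": decrement a numeric constant, otherwise append "-1".
    # Assumption: only integer constants are treated as numeric here.
    try:
        return str(int(arg) - 1)
    except ValueError:
        return arg + "-1"

def conv_P(arg):
    # "P": keep only the part before the first period.
    return arg.split(".", 1)[0]

def conv_o(record, arg):
    # "o": an empty optional argument removes a dangling comma from the
    # record; otherwise the argument is entered as usual.
    if arg == "" and record.endswith(","):
        return record[:-1]
    return record + arg

print(conv_U("id1.id2.id3"))
print(conv_D("5"), conv_D("n"))
print(conv_o("Foo(a,", ""))
```

Note that `conv_o` takes the current record as well as the argument, because the `o` code is the one conversion in this sketch that may edit what has already been entered.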
Limitations of pattern strings
Generally speaking, pattern strings relate individual elemental operations in the source code to corresponding individual elemental operations in the target code. But in some cases, more complex target forms may be expressed. It can be tricky to do this in general, but if the operations follow consistent conventions, it will work out for those cases. Consider, for example, the following excerpt from the MSXML upgrade rules. This is not a general pattern, but it works well some of the time.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ScanTool\proj_csh\usr\msxml6.dll.Refactor.xml
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<Migrate id="IXMLDOMDocument.load" nPram="2" cshPattern="try{\p%1d.Load(%2d);\q}catch{}\n%1d.HasChildNodes" vbnPattern="%1d.Load(%2d)\n%1d.HasChildNodes" />
<Migrate id="IXMLDOMDocument.loadXML" nPram="2" cshPattern="try{%1d.LoadXml(%2d);}catch{}\n%1d.HasChildNodes" vbnPattern="%1d.LoadXml(%2d)\n%1d.HasChildNodes" />
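To see how the first cshPattern plays out, the following Python sketch expands it for two sample operands. It is a toy model, not the string machine itself: it handles only `%Nd`, `\p`, `\q`, and `\n`, it assumes operands are numbered from 1, and the four-space indentation margin, the function name `expand`, and the operand values `doc` and `path` are made up for illustration.

```python
import re

INDENT = "    "  # assumed indentation margin width

def expand(pattern, operands):
    """Expand a pattern that uses only %Nd, \\p, \\q, and \\n.

    Returns (written_records, pending_record). Any other backslash or
    percent sequence is silently skipped by this toy model.
    """
    written = []
    record = ""
    margin = 0

    def write():
        # Write the current record at the current margin and start a new one.
        nonlocal record
        written.append(INDENT * margin + record)
        record = ""

    for m in re.finditer(r"\\([pqn])|%(\d)d|([^\\%]+)", pattern):
        code, operand, text = m.groups()
        if text is not None:
            record += text                         # simple characters
        elif operand is not None:
            record += operands[int(operand) - 1]   # %Nd: no editing
        elif code == "p":
            write()                                # write, then indent
            margin += 1
        elif code == "q":
            write()                                # write, then outdent
            margin = max(0, margin - 1)
        else:
            write()                                # "n": write only
    return written, record

pattern = r"try{\p%1d.Load(%2d);\q}catch{}\n%1d.HasChildNodes"
written, pending = expand(pattern, ["doc", "path"])
for line in written:
    print(line)
print("pending:", pending)
```

In this model three records are written (`try{`, the indented `doc.Load(path);`, and `}catch{}`), while `doc.HasChildNodes` stays in the current record. Presumably that remainder stands in for the value of the original call in whatever statement contained it, which is why the pattern only works when the call is used in a compatible position.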
If more complex or dynamic transformations are required, you will typically have to use dynamic translation rules built with gmSL or gmAPI, as discussed elsewhere on this site.