gmniCodeStyle
- Mark Juras
Implementing Target Coding Standards
The default translations produced by gmBasic are generic and designed to be compilable even in situations where the target code is not fully mature. These translations are useful for most applications; however, they may not meet the desired coding standards. gmCodeStyle.exe is a Custom Translation Engine distributed with gmStudio that demonstrates how to produce translations that follow alternative coding standards. It is a .NET assembly implemented in C# using the gmAPI framework.
Contact us if you would like to see gmCodeStyle in action.
The transformations performed by CodeStyle take particular advantage of the following features of the tool:
- There are two identifiers maintained for each symbol -- source and target. By default the target identifier is set equal to the source, but it can be changed at will. This makes changing the naming conventions relatively simple.
- Code to be made available to the tool is linked into a dynamic-link-library that is then executed by the tool when certain events occur. A key event is the FinishAnalyser event that is triggered when the underlying code has been completed, but before it is passed to the author for surface-form formulation.
- The tool has built in code for authoring declarations; however, there is an AuthorDeclaration event which can be used to override the default declaration.
- When the tool authors the final target code, it does not write it directly to a file; instead, it enters it into a stored text buffer. There is an EditTranslation event and an extensive text-editing service that can be used to change the content of the text buffer before it is finally sent to the output file.
CodeStyle.std.xml
gmCodeStyle.exe uses a specification file that directs the coding style conventions. The specification file is implemented with XML script and placed in the target location (workspace\usr folder) of the migration project. Its full name is %targetLocation%\CodeStyle.%sysId%.xml. A sample, CodeStyle.std.xml, is installed with gmStudio and may be activated in your project using the gmStudio Configuration form displayed by clicking Settings on the toolbar. The initial record for this file must be <CodeStyle> and it must end with </CodeStyle>. Between those two tags are the various code style specification commands.
The Messages Command
The Messages command specifies the syntax to be used for warning messages needed during processing. The Messages command has a set of Entry subcommands with the following attributes:
Attribute | Description |
id | Specifies the identifier of the message. The only current one is RENAME which is issued when the process attempts to introduce a new name in the target symbol table which might cause a name clash. |
name | Specifies the actual message to be issued. |
<Messages>
   <Entry id="Rename" name="UPGRADE_TODO: identifier [$1d] for $2d already defined for $3d" />
</Messages>
The Indent Command
The gmBasic tool keeps track of the indentation level as it authors the target code. The Indent command specifies how much white space is associated with each indentation level. The only attribute of this command is Value, which specifies a value greater than or equal to zero. A value of zero indicates that a tab should be used for each indentation level, while a nonzero value of n specifies that n spaces should be used for each level. Thus, the following CodeStyle file
<CodeStyle>
   ...
   <Indent value="4" />
   ...
</CodeStyle>
will produce well-indented code with 4 spaces allocated for each indentation level.
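The rule for the Indent value can be sketched in a few lines of Python. This is only an illustration of the rule described above, not code from gmBasic; the function names `indent_unit` and `indent_line` are hypothetical.

```python
def indent_unit(value: int) -> str:
    """Return the white space for one indentation level.

    Assumption for this sketch: `value` is the integer read from the
    Indent command. Zero means one tab; n > 0 means n spaces.
    """
    if value < 0:
        raise ValueError("Indent value must be greater than or equal to zero")
    return "\t" if value == 0 else " " * value


def indent_line(text: str, level: int, value: int) -> str:
    """Prefix `text` with `level` indentation units."""
    return indent_unit(value) * level + text
```

With a value of 4, a line at indentation level 2 is prefixed with eight spaces; with a value of 0, the same line is prefixed with two tabs.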
The Hungarian Command
The Hungarian command deals with the issue that some VB6 code bases use Hungarian prefix notation to indicate the binary type of quantity symbols. The goal is to remove these prefixes from the target code and then possibly to use other conventions to name the target symbols. Renaming is triggered by the presence of a list of source code prefixes within the Hungarian command; however, much more machinery is required if the target code is to remain compilable.
The Rename Subcommand
The Rename subcommand can appear anywhere within the Hungarian command. It changes the authored name of a symbol and blocks the application of any of the renaming algorithms specified to that name. The attributes of the Rename statement are as follows:
Attribute | Description |
Identifier | This required attribute specifies the component to be renamed. It is specified relative to the root of the symbol table -- i.e., it is a fully qualified identifier. It is expected that the same CodeStyle script will be used by multiple code sets. If an undefined identifier is encountered, it is simply assumed to apply to a different code set and is skipped. |
Content | This required attribute specifies the name to be used for the component in the target code. |
The Rename subcommand is applied as the Hungarian command is being read which means that it applies before any of the code style specific algorithms are applied. Note that refactoring Rename commands may be entered in the translation scripts themselves and cause the same blocking of the code style algorithms for individual identifiers.
The SourcePrefixes Subcommand
The SourcePrefixes subcommand specifies the binary type Hungarian prefixes. Only variables are assumed to have type prefixes. It is the presence of a SourcePrefixes subcommand that triggers steps 3 through 7 of the renaming algorithm. The command itself introduces a series of Entry subcommands each of which has two required attributes:
Attribute | Description |
Type | Specifies the binary type that has a certain prefix. The possible binary type identifiers are listed below. |
Value | Specifies the actual Hungarian prefix in case sensitive form. If a variable of the type indicated by the Type attribute has this prefix then that prefix is stripped. |
VB6 | .NET Equivalent C#, VB.NET |
Byte | byte, Byte |
Short | short, Short |
Integer | int, Integer |
Long | long, Long |
Currency | decimal, Decimal |
Single | float, Single |
Double | double, Double |
String | string, String |
Boolean | bool, Boolean |
Date | DateTime |
Variant | object, Object |
Object | object, Object |
User | object, Object |
Control | System.Windows.Forms.Control |
Second, there are the special processing types used by gmBasic to deal with various special circumstances:
Vb6Special | .NET Equivalent C#, VB.NET |
Icon | System.Drawing.Icon |
FrxPicture | System.Drawing.Image |
Any | object, Object |
TwipsX | int, Integer |
TwipsY | int, Integer |
UnsInteger | uint, Integer |
WinPanel | System.Windows.Forms.GroupBox |
VarArray | Object[], Object() |
StringPtr | System.Text.StringBuilder, String |
CallHwnd4 | MigrationSupport.Vb7_Callback.Hwnd4 |
ControlCollection | System.Windows.Forms.Control.ControlCollection |
CheckedListBox | System.Windows.Forms.ListBox |
Exception | System.Exception |
SafeArray | System.Array |
SecurityManager | UserSecurityManager |
Dynamic | dynamic |
ValueType | object, Object |
Vb6Class | .NET Equivalent C#, VB.NET |
PictureBox | System.Windows.Forms.PictureBox |
Label | System.Windows.Forms.Label |
TextBox | System.Windows.Forms.TextBox |
Frame | System.Windows.Forms.GroupBox |
CommandButton | System.Windows.Forms.Button |
CheckBox | System.Windows.Forms.CheckBox |
OptionButton | System.Windows.Forms.RadioButton |
ComboBox | System.Windows.Forms.ComboBox |
ListBox | System.Windows.Forms.ListBox |
HScrollBar | System.Windows.Forms.HScrollBar |
VScrollBar | System.Windows.Forms.VScrollBar |
Timer | System.Windows.Forms.Timer |
Printer | MigrationSupport.Printer |
Form | System.Windows.Forms.Form |
DriveListBox | Microsoft.VisualBasic.Compatibility.VB6.DriveListBox |
DirListBox | Microsoft.VisualBasic.Compatibility.VB6.DirListBox |
FileListBox | Microsoft.VisualBasic.Compatibility.VB6.FileListBox |
Menu | System.Windows.Forms.ToolStripMenuItem |
MDIForm | System.Windows.Forms.Form |
Shape | System.Windows.Forms.Label |
Line | System.Windows.Forms.Label |
Image | System.Windows.Forms.PictureBox |
Data | MigrationSupport.DataControl.DataControl |
PropertyPage | MigrationSupport.PropertyBag |
TabControl | System.Windows.Forms.TabControl |
ErrObject | VBNET.ErrObject, ErrObject |
Vb6Enumeration | .NET Equivalent C#, VB.NET |
SimpleBorderStyle | System.Windows.Forms.BorderStyle |
KeyCodeConstants | System.Windows.Forms.Keys |
LogEventTypeConstants | System.Diagnostics.EventLogEntryType |
DrawStyle | MigrationSupport.Utils.DrawStyle |
DrawMode | MigrationSupport.Utils.DrawMode |
MousePointerConstants | System.Windows.Forms.Cursor |
WindowStyle | VBNET.AppWinStyle, AppWinStyle |
OpenMode | VBNET.OpenMode, OpenMode |
vbTristate | VBNET.TriState, TriState |
ScaleType | MigrationSupport.Utils.ScaleType |
VbCompareMethod | VBNET.CompareMethod, CompareMethod |
VbFileAttribute | VBNET.FileAttribute, FileAttribute |
MsgBoxResult | VBNET.MsgBoxResult, MsgBoxResult |
VbMsgBoxStyle | VBNET.MsgBoxStyle, MsgBoxStyle |
VariableType | VBNET.VariantType, VariantType |
ButtonAppearanceStyle | System.Windows.Forms.Appearance |
ApplicationStartMode | MigrationSupport.Utils.StartMode |
MouseButtonConstants | System.Windows.Forms.MouseButtons |
ResourceType | MigrationSupport.Utils.ResourceType |
FirstDayOfWeek | VBNET.FirstDayOfWeek, FirstDayOfWeek |
FirstDayOfYear | VBNET.FirstDayOfYear, FirstDayOfYear |
DueDate | VBNET.DueDate, DueDate |
AlignConstants | MigrationSupport.Utils.AlignConstants |
CheckboxConstants | System.Windows.Forms.CheckState |
AlignmentConstants | System.Drawing.ContentAlignment |
BorderStyle | System.Windows.Forms.FormBorderStyle |
ComboBoxStyle | System.Windows.Forms.ComboBoxStyle |
ColorConstants | System.Drawing.Color |
LayoutArrangement | MdiLayout |
RLDirection | System.Windows.Forms.RightToLeft |
ShiftConstants | MigrationSupport.Utils.ShiftConstants |
BackStyle | MigrationSupport.Utils.BackStyleConstants |
QueryUnloadConstants | MigrationSupport.Utils.QueryUnloadConstants |
ClipboardConstants | MigrationSupport.Utils.ClipboardConstants |
The type designator Object refers to any external type and the type designator User refers to any user defined type. Below is a simple SourcePrefixes specification.
<SourcePrefixes>
   <Entry type="Boolean" value="bln" />
   <Entry type="String" value="str" />
   <Entry type="Integer" value="lng" />
   <Entry type="User" value="obj" />
   <Entry type="Object" value="dic" />
</SourcePrefixes>
The ExcludedSuffixes Subcommand
One of the potential problems with stripping identifiers of their Hungarian prefixes is that there will be symbols whose identifiers are distinguished only by their prefixes. The ExcludedSuffixes command specifies these symbols. Any identifier that ends in one of the excluded suffixes is excluded from the renaming algorithm. The comparison is case insensitive. The actual list of symbols can be entered as a single semicolon delimited list using a single value attribute. This might look as follows
<ExcludedSuffixes value="Data;Connection;ErrorMessage;Table;Field;IndexName" />
Alternatively, each suffix can be entered using its own Entry subcommand:
<ExcludedSuffixes>
   <Entry value="Data" />
   <Entry value="Connection" />
   <Entry value="ErrorMessage" />
   <Entry value="Table" />
   <Entry value="Field" />
   <Entry value="IndexName" />
</ExcludedSuffixes>
The StatusPrefixes Subcommand
In addition to the binary type Hungarian prefixes, there are sometimes also various types of status Hungarian prefixes which must be stripped before the actual type prefixes can be examined. The StatusPrefixes command specifies these prefixes. Any identifier that begins with one of these prefixes has that prefix stripped off. The comparison is case insensitive. The actual list of prefixes can be entered as a single semicolon delimited list using a single value attribute. This might look as follows
<StatusPrefixes value="m_;i_;o_;io_;l_" />
Alternatively, each prefix can be entered using its own Entry subcommand:
<StatusPrefixes>
   <Entry value="m_" />
   <Entry value="i_" />
   <Entry value="o_" />
   <Entry value="io_" />
   <Entry value="l_" />
</StatusPrefixes>
The GlobalPrefixes Subcommand
Nonlocal variables often also have a prefix indicating that they are not local, which precedes the type prefix. For example, an identifier like "gblnReadAll" might name a global Boolean variable, so these prefixes are assumed to combine with the Hungarian type prefix. They need to be checked for and stripped as well. They are specified via the GlobalPrefixes command. The comparison is case insensitive. The actual list of prefixes can be entered as a single semicolon delimited list using a single value attribute. This might look as follows
<GlobalPrefixes value="g;m" />
Alternatively, each prefix can be entered using its own Entry subcommand:
<GlobalPrefixes>
   <Entry value="g" />
   <Entry value="m" />
</GlobalPrefixes>
The NamingStyle Subcommand
The naming style algorithm is made possible by the fact that the modern target languages are case sensitive while the historical source languages are case insensitive. This allows modern naming styles to distinguish different symbol types based solely on the case pattern of their identifiers. The key notion here is CamelCase, which is the practice of writing compound names such that each word or abbreviation within the name begins with a capital letter. Camel case may start with a capital or lowercase letter. As an example, consider the identifier CamelCase itself beside its possible alternative camelCase. In general the naming style algorithm recognizes four case styles:
Style | Description |
lowercase | All the alphabetic characters in the identifier are lowercase as in "lowercase" |
uppercase | All the alphabetic characters in the identifier are uppercase as in "UPPERCASE" |
lowercamel | Each word in the identifier begins with an uppercase character followed by lowercase characters, except that the first character of the identifier is lowercase, as in "lowerCamel" |
uppercamel | Each word in the identifier begins with an uppercase character followed by lowercase characters, as in "UpperCamel" |
The NamingStyle subcommand itself specifies the naming style to be associated with symbol names. This command has the following attributes:
Attribute | Description |
Style | Specifies the naming style to be used. It has 5 possible entries -- Original, LowerCase, UpperCase, LowerCamel, and UpperCamel. The Original style resets the name to its original form as of the end of the renaming algorithm. The other styles are discussed above. |
Object | Specifies the object type of the symbol. It has the following possible entries -- Subprogram, Variable, Constant, Property, Declaration, Structure, Enumeration, EnumeratedEntry, StatementLabel, Event, Vb_Name. |
Access | Specifies the access type of the symbol. It has the following possible entries -- Local, Public, Private. |
Type | Specifies a binary type. The possible binary type identifiers are discussed under the SourcePrefixes subcommand. |
Prefix | In addition to the case style of the name a prefix can be added to the front of the name as well. This attribute specifies that prefix. Note that combining these prefixes with types allows the reintroduction of Hungarian notation in the target names, if that is desired. |
Here is a sample set of NamingStyle entries.
<NamingStyle>
   <Entry style="Original" object="Vb_name" />
   <Entry style="lowerCamel" access="local" />
   <Entry style="lowerCamel" access="Private" object="Variable" prefix="_" />
</NamingStyle>
The SpecialNames Subcommand
There are some special names specified in the gmBasic language files, such as arguments to event handlers, that are also referenced by micro-code in the language files. These cannot be changed via this set of specifications. The ones generated by the client code translations must be listed as SpecialNames so that they are not changed. The comparison is case insensitive. The actual list of special names can be entered as a single semicolon delimited list using a single value attribute. This might look as follows
<SpecialNames value="Cancel;UnloadMode" />
Alternatively, each name can be entered using its own Entry subcommand:
<SpecialNames>
   <Entry value="Cancel" />
   <Entry value="UnloadMode" />
</SpecialNames>
The Acronyms Subcommand
The NamingStyle algorithm has no way of locating words within compound names, because it does not know what the words are. There is one exception to this -- acronyms like "SQL" or "XML". The Acronyms command specifies a list of acronyms, or simply words, which should be entered in a particular style in the target name. The individual entries are specified in their desired target language form. The algorithm does a case insensitive search of each name for the entry and, if found, substitutes the target form for the original form. The actual list of acronyms can be entered as a single semicolon delimited list using a single value attribute. This might look as follows
<Acronyms Value="Xml;Sql" />
Alternatively, each acronym can be entered using its own Entry subcommand:
<Acronyms>
   <Entry Value="Xml" />
   <Entry Value="Sql" />
</Acronyms>
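The case insensitive search and substitution described for acronyms can be illustrated with a short Python sketch. It is not the tool's implementation; the function name `apply_acronyms` is hypothetical, and the sketch assumes every occurrence of an entry inside a name is replaced.

```python
import re


def apply_acronyms(name: str, acronyms: list[str]) -> str:
    """Replace each case-insensitive occurrence of an acronym in `name`
    with the target form given in `acronyms` (for example "Xml")."""
    for target in acronyms:
        # re.escape guards against regex metacharacters in the entry;
        # the lambda returns the target form verbatim.
        name = re.sub(re.escape(target), lambda _m, t=target: t,
                      name, flags=re.IGNORECASE)
    return name


# The semicolon delimited form of the list can be split into entries.
entries = "Xml;Sql".split(";")
print(apply_acronyms("LoadXMLFile", entries))
```

Here the name LoadXMLFile becomes LoadXmlFile because the entry Xml matches XML regardless of case.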
The ReservedWords Subcommand
The NamingStyle algorithm can form reserved words like default or in. These can be repaired by changing their case. The ReservedWords command specifies the list of reserved words in the form in which they can be used as identifiers in the target code. The algorithm does a case insensitive search of each name for the entry and, if found, substitutes the target form for the original form. The actual list of reserved words can be entered as a single semicolon delimited list using a single value attribute. This might look as follows
<ReservedWords value="Default;String;In" />
Alternatively, each reserved word can be entered using its own Entry subcommand:
<ReservedWords>
   <Entry Value="Default" />
   <Entry Value="String" />
   <Entry Value="In" />
</ReservedWords>
The LoopVariables Subcommand
The LoopVariables command changes the names of loop variables. There is a common convention in code bases to use simple identifiers like i or j for loop variables. These simple identifiers can be difficult to find and/or trace in the target code. This command changes these identifiers to something more readable like "index". The LoopVariables command has a set of Entry subcommands with the following attributes:
Attribute | Description |
id | Specifies the identifier in the source code of a loop variable to be renamed. Comparison is case sensitive. |
name | Specifies the identifier in the target code to be used for the loop variable. |
The change only applies to variables that are explicitly used as a counter in a For loop. A possible LoopVariables specification might be as follows.
<LoopVariables>
   <Entry id="i" name="loopIndex" />
</LoopVariables>
If the new name clashes with an identifier that is already in scope, the C# compiler reports an error such as the following:
sample.cs(1666,14): error CS0136: A local variable named 'loopIndex' cannot be declared in this scope because it would give a different meaning to 'loopIndex', which is already used in a 'parent or current' scope to denote something else [C:\temp\Sample.csproj]
Algorithm to Strip Source Identifiers
The renaming algorithm is the first algorithm applied to the target code. It is applied after all code in a given code unit has been compiled and analyzed. At this point, in addition to the compiled code, there is also a symbol table. Though additional renaming can occur during a later code scan, the bulk of the renaming process is done through a scan of the symbol table. It begins by applying the source specifications so that a root identifier is formed which can then be used to form a target identifier. It proceeds as follows:
- The symbol table is scanned looking for any symbols that are a subprogram, variable, constant, property, declare, structure, enumeration, enumeration entry, statement label, event, or class name. These are the types of symbols that can be renamed here. The following steps apply to each one of these symbols separately. Note that any symbol that already has a target name associated with it via a Rename command is skipped.
- The access type of the symbol is determined -- local, public, or private.
- If the source code used Hungarian notation, then the source Hungarian prefixes can be removed. The specification commands include a Hungarian command which supplies the prefix used for each binary and access type combination. The presence of this specification triggers the prefix removal steps.
- In source codes there will often be symbols whose identifiers are only distinguished by their Hungarian prefixes. A list of these symbols is supplied via an ExcludedSuffixes command.
- In source codes there are often symbol status codes that precede the actual Hungarian type prefix. These must be checked for first and stripped from the identifier. They are specified via a StatusPrefixes command.
- Non local variables often also have a prefix used to indicate that they are not local. These must be checked for as well and be stripped. They are specified via the GlobalPrefixes command.
- Finally the actual Hungarian prefixes can be stripped.
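The stripping steps above can be sketched in Python as follows. This is a minimal illustration and not the gmCodeStyle implementation: the function name `strip_hungarian`, its parameters, and the rule that a global prefix is removed only when the type prefix follows it are assumptions made for this example.

```python
from typing import Optional, Sequence


def strip_hungarian(name: str,
                    type_prefix: Optional[str],
                    status_prefixes: Sequence[str],
                    global_prefixes: Sequence[str],
                    excluded_suffixes: Sequence[str]) -> str:
    """Return the root identifier for `name`.

    `type_prefix` is the SourcePrefixes value registered for the symbol's
    binary type (hypothetical lookup done by the caller), or None.
    """
    lowered = name.lower()
    # Excluded suffixes are compared case insensitively; such names are skipped.
    if any(lowered.endswith(s.lower()) for s in excluded_suffixes):
        return name
    root = name
    # Status prefixes are compared case insensitively and stripped first.
    for prefix in status_prefixes:
        if root.lower().startswith(prefix.lower()):
            root = root[len(prefix):]
            break
    if type_prefix:
        # The type prefix itself is compared in case sensitive form.
        if root.startswith(type_prefix) and len(root) > len(type_prefix):
            return root[len(type_prefix):]
        # Assumption: a global prefix only counts when the type prefix follows.
        for g in global_prefixes:
            rest = root[len(g):]
            if (root[:len(g)].lower() == g.lower()
                    and rest.startswith(type_prefix)
                    and len(rest) > len(type_prefix)):
                return rest[len(type_prefix):]
    return root
```

For example, with the type prefix bln, the global prefixes g and m, and the status prefix m_, the name gblnReadAll yields ReadAll, while a name ending in an excluded suffix such as Data is returned unchanged.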
Algorithm to Form Target Identifiers
Once the source symbol has been stripped of its Hungarian annotations, the target language naming styles can be applied. The actual application of this algorithm depends upon the presence of a NamingStyle subcommand within the Hungarian command. Any identifiers skipped because they were explicitly renamed are not changed by this algorithm. Also, before the names can be changed into one of the style forms, they first need to be changed into a standard form from which the other styles can be derived. That standard form is UpperCamel. The problem is that the algorithm here has no way of breaking possibly compound names into their component words. Fortunately, many code bases use the underscore character in symbol names to separate their words. At this point, then, the algorithm looks for names like "KEY_QUERY_VALUE" and changes them to "KeyQueryValue". Any name that does not have this form is simply changed by making its first character uppercase. Some typical name changes at this point might be as follows:
Original | Changed |
KEY_ALL_ACCESS | KeyAllAccess |
READ_CONTROL | ReadControl |
STANDARD_RIGHTS_READ | StandardRightsRead |
SYNCHRONIZE | Synchronize |
KEY_READ | KeyRead |
dwType | DwType |
szData | SzData |
cbData | CbData |
ctlReadyToGenerate | CtlReadyToGenerate |
enumOperationMode | EnumOperationMode |
ctlSelectDatasource | CtlSelectDatasource |
When the algorithm applies, the binary, component, and access types of the symbol underlying the identifier are all known. The algorithm itself proceeds as follows:
- Exclude any SpecialNames that are referenced by the micro-code in the language files
- Convert the names into uppercamel form when the word boundaries can be detected.
- Apply the specifications in the NamingStyle command
- Repair any ReservedWords that may have been formed
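The conversion to the standard UpperCamel form and the application of a case style can be sketched as follows. The sketch is illustrative only; the function names are hypothetical, and it assumes that "names like KEY_QUERY_VALUE" means names whose letters are all uppercase, which is consistent with the table above (SYNCHRONIZE becomes Synchronize).

```python
def to_upper_camel(name: str) -> str:
    """Convert a root identifier to the standard UpperCamel form."""
    if name.isupper():
        # All-uppercase names are split on underscores and each word
        # is capitalized: KEY_QUERY_VALUE -> KeyQueryValue.
        return "".join(word.capitalize() for word in name.split("_") if word)
    # Any other name only has its first character made uppercase.
    return name[:1].upper() + name[1:]


def apply_style(name: str, style: str, prefix: str = "") -> str:
    """Apply one of the NamingStyle case styles plus an optional prefix."""
    key = style.lower()
    if key == "original":
        return name  # assumption: the Original style leaves the name untouched
    camel = to_upper_camel(name)
    if key == "lowercase":
        result = camel.lower()
    elif key == "uppercase":
        result = camel.upper()
    elif key == "lowercamel":
        result = camel[:1].lower() + camel[1:]
    elif key == "uppercamel":
        result = camel
    else:
        raise ValueError(f"unknown naming style: {style}")
    return prefix + result
```

For instance, a private variable whose root identifier is ReadAll, styled as lowerCamel with the prefix "_", would be named _readAll in this sketch.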
The DoNotInitialize Command
The DoNotInitialize command removes default initializations of variables and fields that are not necessary to avoid using an uninitialized value. The default translations produced are generic and designed to be compilable even in situations where the target code is not fully mature. By default, all variable and field declarations have an initialization value specified regardless of need.
The Fields Subcommand
The Fields subcommand requests that fields that have a public access type not be supplied with a default initialization value. The subcommand is a singleton with no attributes. It appears as follows.
<Fields />
The Variables Subcommand
The Variables subcommand requests that local variables that are assigned a value within the code not also be assigned a default value. Simply being assigned a value somewhere is too weak a test. The actual test used here traces all references to the variable to make certain that no nested use of the variable is on a possibly unassigned path through the code. The subcommand is a singleton with no attributes. It appears as follows.
<Variables />
The OutParameters Subcommand
The OutParameters subcommand examines all parameters that are being passed ByRef to determine if their values are being changed before they are being used. If so, then they can be reclassified as being ByOut. The actual command is a singleton with no attributes. It looks like this
<OutParameters />
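If a parameter reclassified in this way is not assigned on every path through the method, the C# compiler reports an error such as the following:
sample.cs(669,6): error CS0177: The out parameter 'script' must be assigned to before control leaves the current method [C:\temp\Sample.csproj]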
The SimpleProperty Command
The SimpleProperty command checks for simple getter/setter properties whose operation codes match a code pattern and then reauthors them using a specified .NET surface form pattern. As an example, consider a VB6 property source pattern that always includes On Error GoTo error handling code. In the .NET implementation this error handling code is to be removed and the properties are to be authored using an internal declaration. Here is a sample of the source code:
Private fieldValue As ValueType
Friend Property Get PropValue() As ValueType
   On Error GoTo ErrorHandler
   PropValue = fieldValue
   Exit Property
ErrorHandler:
   ...
End Property
Friend Property Let PropValue(ByVal myValue As ValueType)
   On Error GoTo ErrorHandler
   fieldValue = myValue
   Exit Property
ErrorHandler:
   ...
End Property
The default translation of this code is as follows:
private ValueType fieldValue = "";
public ValueType PropValue
{
   get
   {
      ValueType PropValue = "";
      try
      {
         PropValue = fieldValue;
         return PropValue;
      }
      catch(Exception exc)
      {
         ...
      }
      return PropValue;
   }
   set
   {
      try
      {
         fieldValue = value;
         return;
      }
      catch(Exception exc)
      {
         ...
      }
   }
}
The desired simplified form is as follows:
private ValueType fieldValue;
internal ValueType PropValue
{
   get { return fieldValue; }
   set { fieldValue = value; }
}
Actual csh Codeblock Associated with Get:
Opcode | Operation support information
------ | -----------------------------
NEW | 25 On Error GoTo ErrorHandler
NEW | 27 PropValue = fieldValue
ERR | Try
LEV | Nest0
LDA | Variable:fieldValue:610921
ARG | ValueType
LDA | Property:PropValue:610968
STR | AssignValue
NEW | 29 Exit Property
LDA | Property:PropValue:610968
EXI | Function
ERR | Catch1
...
ERR | Catch3
Actual csh Codeblock Associated with Let:
Opcode | Operation support information
------ | -----------------------------
NEW | 36 On Error GoTo ErrorHandler
NEW | 38 fieldValue = myValue
ERR | Try
LEV | Nest0
SPV | Value
ARG | String
LDA | Variable:fieldValue:610921
STR | AssignValue
NEW | 40 Exit Property
EXI | Property
ERR | Catch1
...
ERR | Catch3
The Getter Subcommand
The Getter subcommand specifies a set of code patterns that a given property getter must match if it is to be authored in a simpler way. It has a series of Entry subcommands that specify the actual code patterns. Here is the specification for the above example along with a second pattern for a getter that has no try-catch.
<Getter>
   <Entry value="NEW,NEW,ERR.Try,LEV,LDA,ARG,LDA,STR.AssignValue,NEW,LDA,EXI,ERR.Catch1,...,ERR.Catch3" />
   <Entry value="NEW,Argument,EXI.Function" />
</Getter>
The Setter Subcommand
The Setter subcommand specifies a set of code patterns that a given property letter or setter must match if it is to be authored in a simpler way. It has a series of Entry subcommands that specify the actual code patterns. Here is the specification for the above example along with a second pattern that has no try-catch.
<Setter>
   <Entry value="NEW,NEW,ERR.Try,LEV,SPV.Value,ARG,LDA,STR.AssignValue,NEW,EXI,ERR.Catch1,...,ERR.Catch3" />
   <Entry value="Argument,LDA,STR.AssignValue" />
</Setter>
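One way to picture how such a pattern could be compared with a code block is the Python sketch below. It is a guess at the matching semantics, not the tool's algorithm: it assumes each pattern token is an opcode with an optional ".subcode" that must equal the operation's support information, and that "..." matches any run of operations. Tokens such as Argument are not handled. The operations elided in the listing above are represented by a single hypothetical placeholder.

```python
def matches(pattern: str, ops: list[tuple[str, str]]) -> bool:
    """Return True if the (opcode, info) sequence fits the pattern."""
    tokens = [t.strip() for t in pattern.split(",")]

    def step(ti: int, oi: int) -> bool:
        if ti == len(tokens):
            return oi == len(ops)
        token = tokens[ti]
        if token == "...":
            # Assumed wildcard: skip zero or more operations.
            return any(step(ti + 1, k) for k in range(oi, len(ops) + 1))
        if oi == len(ops):
            return False
        opcode, _, subcode = token.partition(".")
        op, info = ops[oi]
        if op != opcode or (subcode and info != subcode):
            return False
        return step(ti + 1, oi + 1)

    return step(0, 0)


GETTER_PATTERN = ("NEW,NEW,ERR.Try,LEV,LDA,ARG,LDA,STR.AssignValue,"
                  "NEW,LDA,EXI,ERR.Catch1,...,ERR.Catch3")
GETTER_OPS = [
    ("NEW", "25 On Error GoTo ErrorHandler"),
    ("NEW", "27 PropValue = fieldValue"),
    ("ERR", "Try"), ("LEV", "Nest0"),
    ("LDA", "Variable:fieldValue:610921"), ("ARG", "ValueType"),
    ("LDA", "Property:PropValue:610968"), ("STR", "AssignValue"),
    ("NEW", "29 Exit Property"),
    ("LDA", "Property:PropValue:610968"), ("EXI", "Function"),
    ("ERR", "Catch1"),
    ("???", "placeholder for the elided operations"),  # hypothetical
    ("ERR", "Catch3"),
]
print(matches(GETTER_PATTERN, GETTER_OPS))
```

Under these assumptions the getter code block fits the first Getter entry, while the same block does not fit the first Setter entry because its fifth operation is LDA rather than SPV.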
The AuthorSame Subcommand
The AuthorSame subcommand contains the patterned text block that specifies how the simplified property is to be authored when the type of the property and the type of the value are the same. Note the manner in which the text is surrounded by CDATA directives. These are required in the form shown. Also, the dollar sign, as opposed to the percent sign, is used to mark the locations of the variable strings in the pattern.
<AuthorSame><![CDATA[
   private $1d $3d;
   $5d $4d $2d
   {
      get { return $3d; }
      set { $3d = value; }
   }
]]></AuthorSame>
The variable strings in the pattern are as follows:
- $1d is the .NET identifier of the value type.
- $2d is the target form of the property identifier. This may well be the output of the renaming algorithms.
- $3d is the target form of the field identifier. This may well be the output of the renaming algorithms.
- $4d is the .NET identifier of the property type.
- $5d is the .NET scope specification. If the property was Public or Friend then it is "public" else it is "internal".
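The way the `$Nd` markers in a pattern are filled can be illustrated with a small Python sketch. This is not the tool's authoring code; the function name `fill_pattern` and the sample argument values are hypothetical.

```python
import re


def fill_pattern(pattern: str, *args: str) -> str:
    """Replace each $Nd marker with the N-th argument (1-based)."""
    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        if not 1 <= index <= len(args):
            raise IndexError(f"no value supplied for ${index}d")
        return args[index - 1]

    return re.sub(r"\$(\d+)d", substitute, pattern)


PATTERN = ("private $1d $3d; $5d $4d $2d "
           "{ get { return $3d; } set { $3d = value; } }")

# Hypothetical values: value type, property name, field name,
# property type, scope.
print(fill_pattern(PATTERN, "string", "Name", "name", "string", "internal"))
```

Because the markers are numbered, the same value (here the field identifier, marker 3) can appear several times in one pattern.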
The AuthorDifferent Subcommand
The AuthorDifferent subcommand contains the patterned text block that specifies how the simplified property is to be authored when the type of the property and the type of the value are different.
<AuthorDifferent><![CDATA[
   private $1d $3d;
   $5d $4d $2d
   {
      get { return ($4d)$3d; }
      set { $3d = ($1d)value; }
   }
]]></AuthorDifferent>
The PublicFields Subcommand
Many code style standards forbid the use of global fields. They prefer global properties. In .NET there are auto-properties that can be used to define what were simply global fields in VB6. For example, the VB6 declaration
Public GlobalField As fieldType
can be authored as
public static fieldType GlobalField { get; set; }
using the following specification:
<PublicFields><![CDATA[
   public static $1d $2d { get; set; }
]]></PublicFields>
The GetEnumerator Subcommand
The VB6 NewEnum property getters are replaced by .NET GetEnumerator() methods. These methods should not contain any initialization code and must almost always be rewritten as part of a migration. By default, then, the tool strips away all code from the getter and simply authors it using this gmSL method.
void AuthorGetEnumerator(int iHost)
{
   if(Select.Dialect == Dialects.csh)
   {
      #TextStart
      public IEnumerator GetEnumerator()
      {
         return (%= Store.GetName(iHost) %).GetEnumerator();
      }
      #TextEnd
   }
   else
   {
      #TextStart
      Public Function GetEnumerator() As IEnumerator
         GetEnumerator = (%= Store.GetName(iHost) %).GetEnumerator();
      End Function
      #TextEnd
   }
}
Set NewEnum = mcolTables.[_NewEnum]
<csh role="property" narg="1" code="MigrationSupport.Utils.NewEnum(%1d)" />
<GetEnumerator entry="%1d.GetEnumerator()" />
The CodeScan Operations
When the initial scan of the symbol table via the FinishAnalyser event handler encounters a property or a variable, it invokes the operations of this command. If the symbol is a variable with a Public access type and if a special declaration was specified via the PublicFields subcommand, then the information vector for that symbol is marked so that it can be declared later as specified.
Authoring the Declarations
When the AuthorDeclaration event handler is called for a field whose information structure is marked as a property, then the property is authored in the way specified by the command. The needed strings are formed from the information in the symbol table. Then the appropriate form is used depending upon whether the types are the same or different.
The ChangeIntroduced Command
The ChangeIntroduced command changes introduced variables so that they follow the same naming conventions that other variables follow. The primary source of introduced variables is the need to create a variable when a constant, an expression, or an object instance of the wrong class serves as an argument to a ByRef or ByOut parameter. The need to create these variables pervades the VB6 to .NET migration process. The tool carefully analyses user code parameters to change them to ByVal whenever possible, but it has no control over the status of parameters in external libraries, which are often needlessly ByRef. To make these introduced temporaries easy to find in the target code, the tool uses a standard naming convention, argTemp(n), to name them. Here is an example.
object argTemp1 = MigrationSupport.Utils.VarPtr(SwapEndian) + 3;
object argTemp2 = dw;
CopyMemory(ref argTemp1,ref argTemp2,1);
object argTemp3 = MigrationSupport.Utils.VarPtr(SwapEndian) + 2;
object argTemp4 = MigrationSupport.Utils.VarPtr(dw) + 1;
CopyMemory(ref argTemp3,ref argTemp4,1);
object argTemp5 = MigrationSupport.Utils.VarPtr(SwapEndian) + 1;
object argTemp6 = MigrationSupport.Utils.VarPtr(dw) + 2;
CopyMemory(ref argTemp5,ref argTemp6,1);
object argTemp7 = SwapEndian;
object argTemp8 = MigrationSupport.Utils.VarPtr(dw) + 3;
CopyMemory(ref argTemp7,ref argTemp8,1);

object lpvSource = null;
object lpvDest = null;
lpvDest = MigrationSupport.Utils.VarPtr(SwapEndian) + 3;
lpvSource = dw;
CopyMemory(ref lpvDest,ref lpvSource,1);
lpvDest = MigrationSupport.Utils.VarPtr(SwapEndian) + 2;
lpvSource = MigrationSupport.Utils.VarPtr(dw) + 1;
CopyMemory(ref lpvDest,ref lpvSource,1);
lpvDest = MigrationSupport.Utils.VarPtr(SwapEndian) + 1;
lpvSource = MigrationSupport.Utils.VarPtr(dw) + 2;
CopyMemory(ref lpvDest,ref lpvSource,1);
lpvDest = SwapEndian;
lpvSource = MigrationSupport.Utils.VarPtr(dw) + 3;
CopyMemory(ref lpvDest,ref lpvSource,1);
Attribute | Description |
---|---|
id | Specifies the generated identifier of an introduced variable in case sensitive form. |
name | Specifies the name to be used instead of the generated identifier. |
<ChangeIntroduced>
   <Entry id="index" name="indexPram" />
</ChangeIntroduced>
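The reason introduced variables are needed at all can be seen in a small C# sketch. The `Scale` method and the variable names are hypothetical and are not part of gmStudio or its runtime; the sketch only relies on the C# rule that a `ref` argument must be an assignable variable, so an expression has to be copied into a temporary before the call.

```csharp
using System;

public static class IntroducedVariableSketch
{
    // Hypothetical library routine that takes its argument by reference.
    public static void Scale(ref int value, int factor)
    {
        value = value * factor;
    }

    public static int Run()
    {
        int width = 4;

        // "Scale(ref width + 1, 3);" would not compile: a ref argument must be a variable.
        // Default style: a generated temporary holds the value of the expression.
        int argTemp1 = width + 1;
        Scale(ref argTemp1, 3);

        // Renamed style: the same temporary under a project-chosen name (assumed here).
        int scaledWidth = width + 1;
        Scale(ref scaledWidth, 3);

        return argTemp1 + scaledWidth;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 30
    }
}
```

Both calls behave identically; only the name of the temporary differs, which is the aspect of the code the command addresses.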
The OperationCode Command
The OperationCode command contains requests to introduce code styles that require changing the operation code. Its subcommands are implemented during the final pass of the operation code via the FinishAnalyser event handler.
The OptimizeFunctions Subcommand
The OptimizeFunctions subcommand is a singleton command with no attributes. Its basic role is to replace sequences like the following in the target code
static bool myFunction()
{
   bool myFunction;
   myFunction = false;
   return myFunction;
}
with the simpler form
static bool myFunction()
{
   return false;
}
<OptimizeFunctions />
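As a minimal, self-contained illustration of the two forms, the sketch below places them side by side in one class. The class and method names are hypothetical; the point is only that the staged form (a local carrying the function's name, assigned and then returned) and the direct form return the same value.

```csharp
using System;

public static class OptimizeFunctionsSketch
{
    // Staged form: mirrors the VB6 habit of assigning the result to the function name.
    public static bool IsReadyStaged()
    {
        bool IsReadyStaged;
        IsReadyStaged = false;
        return IsReadyStaged;
    }

    // Direct form: the local and its assignment are folded into the return.
    public static bool IsReady()
    {
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(IsReadyStaged() == IsReady()); // prints True
    }
}
```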
The PostIncrement Subcommand
The PostIncrement subcommand is a singleton command with no attributes. It requests that assignments to variables that simply add one be replaced by the ++ post increment operation. It may appear anywhere within the scope of the OperationCode command.
<PostIncrement />
The RemoveReturns Subcommand
The RemoveReturns subcommand is a singleton command with no attributes. It requests that additional checks be made for unneeded explicit return statements in the target code. An example would be a return at the bottom of a try block whose catch block immediately precedes the end. It may appear anywhere within the scope of the OperationCode command.
<RemoveReturns />
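The try/catch case mentioned above might look like the following sketch. The method names and the use of a StringBuilder are illustrative assumptions, not output of the tool; the first method keeps the explicit return, the second shows the same logic once it is dropped.

```csharp
using System;
using System.Text;

public static class RemoveReturnsSketch
{
    // With the explicit return: nothing follows the catch block, so the return is redundant.
    public static void AppendWithReturn(StringBuilder log, string text)
    {
        try
        {
            log.Append(text);
            return;
        }
        catch (ArgumentOutOfRangeException)
        {
            log.Clear();
        }
    }

    // Without it: control reaches the end of the method in exactly the same way.
    public static void Append(StringBuilder log, string text)
    {
        try
        {
            log.Append(text);
        }
        catch (ArgumentOutOfRangeException)
        {
            log.Clear();
        }
    }

    public static void Main()
    {
        var a = new StringBuilder();
        var b = new StringBuilder();
        AppendWithReturn(a, "saved");
        Append(b, "saved");
        Console.WriteLine(a.ToString() == b.ToString()); // prints True
    }
}
```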
The SimpleCasts Subcommand
The SimpleCasts subcommand is a singleton command with no attributes. It requests that casts within the target code of the form (type)(value or instance) be replaced with the form (type)value or instance. This subcommand is implemented by replacing the CNV.CastType operation with CNV.CastSimple. It may appear anywhere within the scope of the OperationCode command.
<SimpleCasts />
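The sketch below contrasts the two cast forms on a simple operand, where they are equivalent. It also shows, as general C# behaviour rather than a statement about how the tool treats such cases, why the parentheses cannot be dropped around a compound expression. All names are hypothetical.

```csharp
using System;

public static class SimpleCastsSketch
{
    // Parenthesized operand, as in the default translation.
    public static int CastTypeForm(double value) { return (int)(value); }

    // Simple form: identical result when the operand is a single value.
    public static int CastSimpleForm(double value) { return (int)value; }

    // With a compound operand the parentheses change the meaning:
    // here the sum is truncated ...
    public static int CastOfSum(double a, double b) { return (int)(a + b); }

    // ... while here only 'a' is truncated before the addition.
    public static double CastThenAdd(double a, double b) { return (int)a + b; }

    public static void Main()
    {
        Console.WriteLine(CastTypeForm(3.9) == CastSimpleForm(3.9)); // prints True
        Console.WriteLine(CastOfSum(1.5, 1.5));                      // prints 3
        Console.WriteLine(CastThenAdd(1.5, 1.5) == 2.5);             // prints True
    }
}
```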
The StandardFunctions Subcommand
The StandardFunctions subcommand replaces references to the standard VB6 functions with alternative operations that give different target code for them. The subcommand has Entry subcommands that specify the individual functions and their desired target code surface pattern. The Entry subcommand has two attributes as follows:
Attribute | Description |
---|---|
id | The VB6 source code identifier of the function |
name | The desired target code surface pattern |
<StandardFunctions>
   <Entry id="Trim" name="%1d.Trim()" />
   <Entry id="Left" name="%1d.Substring(0,%2d)" />
   <Entry id="InStr" name="%2d.IndexOf(%3d,%1o)" />
   <Entry id="Right" name="%1d.Substring(%1d.Length - %2h)" />
   <Entry id="Len" name="%1d.Length" />
</StandardFunctions>
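To give a feel for the kind of target code these patterns lead to, the sketch below writes out the .NET string calls named in the Trim, Left, Right and Len entries. It assumes that a pattern field such as %1d stands for the first argument of the VB6 call; the exact meaning of the field suffixes is not covered here, and the wrapper method names are hypothetical.

```csharp
using System;

public static class StandardFunctionsSketch
{
    // Shape suggested by the Left entry: first argument, then Substring(0, second argument).
    public static string LeftStyle(string s, int n) { return s.Substring(0, n); }

    // Shape suggested by the Right entry: Substring starting n characters before the end.
    public static string RightStyle(string s, int n) { return s.Substring(s.Length - n); }

    public static void Main()
    {
        Console.WriteLine("  gmStudio  ".Trim());     // prints gmStudio
        Console.WriteLine(LeftStyle("gmStudio", 2));  // prints gm
        Console.WriteLine(RightStyle("gmStudio", 6)); // prints Studio
        Console.WriteLine("gmStudio".Length);         // prints 8
    }
}
```

One design point worth checking before adopting such patterns: VB6 Left and Right return the whole string when the requested length exceeds the string length, whereas Substring throws an ArgumentOutOfRangeException in that situation.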
The OptionalArguments Command
Starting with the March 2023 release, OptionalArguments="on" is set by default in the standard translation template script.
<OptionalArguments />
The TargetCode Command
The TargetCode command contains requests to introduce code styles that require changing the target code directly. When the tool actually authors the final target code, rather than simply writing it to a file, it enters it into a stored text buffer. There is an EditTranslation event and an extensive text-editing service that can be used to change the content of this text buffer before it is finally sent to the output file.
The AddSpaces Command
By default the tool does not add a space after each comma in lists in the target code, because the target output lines are often very long. The AddSpaces command adds these spaces. The command itself is a singleton with one attribute, Vertical. Setting Vertical="on" adds a blank line after a mainline right brace, which gives an additional line of separation after all "complex" component declarations that use braces, not just methods.
<AddSpaces Vertical="on" />
The AllowBlankLines Command
By default the translator passes all blank lines and empty comment lines in the source through to the target code. In addition, the translator moves declarations when necessary to resolve nesting scope errors in the target. These moves can sometimes leave blocks of blank lines behind, which come through to the target code. The AllowBlankLines subcommand removes sequences of blank lines from the translations.
<AllowBlankLines Limit="n" />
This subcommand allows no more than "n" consecutive blank lines in authored code. The default does not check for blank lines so there is no limit. Setting the limit to zero will remove all blank lines from the target code.
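The effect of the limit can be sketched as a small filter over the lines of the text buffer. This is an illustration of the rule as described, not the tool's implementation; the class and method names are hypothetical, and treating whitespace-only lines as blank is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class AllowBlankLinesSketch
{
    // Keeps at most 'limit' consecutive blank lines; a limit of zero drops them all.
    public static List<string> LimitBlankLines(IEnumerable<string> lines, int limit)
    {
        var result = new List<string>();
        int run = 0; // length of the current run of blank lines
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                run++;
                if (run > limit) continue; // over the limit: skip this blank line
            }
            else
            {
                run = 0;
            }
            result.Add(line);
        }
        return result;
    }

    public static void Main()
    {
        var code = new[] { "int a = 1;", "", "", "", "int b = 2;" };
        Console.WriteLine(LimitBlankLines(code, 1).Count); // prints 3
        Console.WriteLine(LimitBlankLines(code, 0).Count); // prints 2
    }
}
```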
The ReduceBraces Command
The ReduceBraces subcommand removes the braces from if/while/for statements when they control a single statement. A structure like the following
conditional { statement }
becomes
conditional statement
if (_disposed) { return; } else { Class_Terminate(); } _disposed = true;
<ReduceBraces statement="off" />
if (_disposed) return; else Class_Terminate(); _disposed = true;
<ReduceBraces statement="on" />
if (_disposed) return; else Class_Terminate(); _disposed = true;
if (enumTableStatus == basGlobal.DefinedEnum.DefDeleted) { // We don't need to add any more information to display } else ..
if (enumTableStatus == basGlobal.DefinedEnum.DefDeleted) // We don't need to add any more information to display else ..
The RemoveUsing Command
The RemoveUsing subcommand removes specified using statements from the target code text buffer, unless that buffer contains one of a list of substrings.
<RemoveUsing>
   <Entry id="System.Drawing;" />
   <Entry id="System.Collections;" name=" IEnumerator "/>
   <Entry id="System.ComponentModel;" />
   <Entry id="System.Runtime.InteropServices;" name="[Dllimport" />
   <Entry id="System.Data;" />
   <Entry id="Microsoft.VisualBasic.CompilerServices;" />
   <Entry id="System.Linq;" name=".ToArray<" />
   <Entry id="System.Collections.Generic;" name="List<;Dictionary<;HashSet<" />
   <Entry id="VBNET = Microsoft.VisualBasic;" name="VBNET." />
</RemoveUsing>
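The rule can be pictured with the following sketch, which works on a plain string standing in for the text buffer. It is not the tool's code: the class and method names are hypothetical, and it assumes that the id attribute is the text following the using keyword and that the name attribute is a semicolon-separated list of substrings whose presence keeps the statement.

```csharp
using System;
using System.Linq;

public static class RemoveUsingSketch
{
    // id:     text that follows "using ", including the trailing semicolon.
    // keepIf: optional semicolon-separated substrings; if any occurs in the
    //         buffer, the using statement is left in place (assumed format).
    public static string Apply(string buffer, string id, string keepIf)
    {
        if (!string.IsNullOrEmpty(keepIf))
        {
            foreach (string marker in keepIf.Split(';'))
            {
                if (marker.Length > 0 && buffer.Contains(marker)) return buffer;
            }
        }
        string target = "using " + id;
        var kept = buffer.Split('\n').Where(line => line.Trim() != target);
        return string.Join("\n", kept);
    }

    public static void Main()
    {
        string buffer = "using System.Linq;\nusing System.Data;\nvar a = list.ToArray<int>();";
        buffer = Apply(buffer, "System.Data;", null);        // removed unconditionally
        buffer = Apply(buffer, "System.Linq;", ".ToArray<"); // kept: the marker is present
        Console.WriteLine(buffer);
    }
}
```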
The Replacements Command
The Replacements subcommand scans the target code text buffer for specified substrings and either replaces them with a second substring or simply removes them. The subcommand has a set of Entry subcommands, each of which defines an individual substring. Each Entry has two attributes:
Attribute | Description |
---|---|
id | A substring to be replaced or removed. A simple case insensitive text search is performed for the string -- i.e., substring boundaries are not considered. |
name | This optional substring is the replacement string to be used. |
<Replacements>
   <Entry id="String.Empty" name="string.Empty" />
   <Entry id="System.Int32" name="int" />
   <Entry id="System.Windows.Forms." />
   <Entry id="this." />
   <Entry id="VBNET.Constants.vbNullString" name="null" />
</Replacements>
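A rough model of this behaviour, using a plain string in place of the text buffer, is sketched below. The class and method names are hypothetical and the use of a regular expression is simply one convenient way to get a case insensitive literal search; it is not a claim about how the tool performs the scan.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplacementsSketch
{
    // Replaces every case-insensitive occurrence of 'id' with 'name';
    // a missing name removes the occurrences instead.
    public static string Apply(string buffer, string id, string name)
    {
        string replacement = name ?? string.Empty;
        // Regex.Escape makes the search literal; the evaluator avoids
        // special handling of '$' in the replacement text.
        return Regex.Replace(buffer, Regex.Escape(id), m => replacement, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string buffer = "this.Text = String.Empty; System.Int32 n = 0;";
        buffer = Apply(buffer, "String.Empty", "string.Empty");
        buffer = Apply(buffer, "System.Int32", "int");
        buffer = Apply(buffer, "this.", null);
        Console.WriteLine(buffer); // prints Text = string.Empty; int n = 0;
    }
}
```

Because substring boundaries are not considered, an id should be specific enough that it cannot match inside a longer identifier that ought to be left alone.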
The VerticalList Command
The VerticalList command reformats long target code statements into readable form by converting them from horizontal form into vertical lists. The sorts of statements that typically need this reformatting are as follows:
- Calling / declaring methods with "many" parameters;
- Initializing arrays with many elements;
- Complex formulas/conditionals with a series of similar repeating factors;
- Building strings with a series of many concatenations.
All of these scenarios are reformatted by this command, but there is an important caveat. Not all target code can be processed by this class: only C# code produced by the surface code patterns specified in the metalanguage files can be processed. There are plans to also support both VB.NET and XML/HTML target code produced by the same surface codes, but these are not yet implemented.
The VerticalList statement is a section level statement with one optional attribute:
Attribute | Description |
---|---|
MinLineLength | The MinLineLength attribute is used to define the phrase "long" as applied to statements, expressions, and lists. These are considered to be "long" and thus are broken into a vertical list if their number of characters exceeds MinLineLength. Its default value is 60. |
In addition, the VerticalList command may have zero or more <Breaker> elements, each containing a user-defined character string called a breaker:
<VerticalList MinLineWidth="160" >
   <Breaker>+ ", " +</Breaker>
   <Breaker>+ "'" + "\r\n" +</Breaker>
   <Breaker>+ "\r\n" +</Breaker>
</VerticalList>
In practice the longest statements are often multi-part string concatenations, which can be broken into more meaningful chunks at user-defined character sequences, the "breakers". These user-defined breakers take precedence over the default set of single-token breakers (e.g. arithmetic and logical operators and commas). Among themselves, their precedence follows the order of the <Breaker> elements in the VerticalList section.
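The idea of breaking at user-defined strings in order of precedence can be sketched as follows. This is an illustrative approximation only: the class and method names are hypothetical, the default token-based breakers are left out, and breaking immediately after each occurrence of the breaker is an assumption about layout rather than documented behaviour.

```csharp
using System;
using System.Collections.Generic;

public static class VerticalListSketch
{
    // Splits a "long" statement after each occurrence of the first breaker
    // (in list order) that occurs in it; shorter statements are left alone.
    public static List<string> Break(string statement, int minLineLength, IList<string> breakers)
    {
        var pieces = new List<string>();
        if (statement.Length > minLineLength)
        {
            foreach (string breaker in breakers)
            {
                if (breaker.Length == 0) continue; // ignore empty breakers
                int start = 0;
                int pos = statement.IndexOf(breaker, start, StringComparison.Ordinal);
                while (pos >= 0)
                {
                    int end = pos + breaker.Length;
                    pieces.Add(statement.Substring(start, end - start).Trim());
                    start = end;
                    pos = statement.IndexOf(breaker, start, StringComparison.Ordinal);
                }
                if (start > 0)
                {
                    // This breaker matched: emit the remainder and stop.
                    if (start < statement.Length) pieces.Add(statement.Substring(start).Trim());
                    return pieces;
                }
            }
        }
        pieces.Add(statement);
        return pieces;
    }

    public static void Main()
    {
        string statement = "sql = \"SELECT a\" + \", \" + \"b\" + \", \" + \"c\";";
        var breakers = new List<string> { "+ \", \" +", "+ \"\\r\\n\" +" };
        foreach (string piece in Break(statement, 20, breakers))
        {
            Console.WriteLine(piece);
        }
    }
}
```

With the sample statement, the first breaker in the list matches, so the statement is split into three lines and the second breaker is never consulted.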