gmniCodeStyle

Implementing Target Coding Standards

The default translations produced by gmBasic are generic and designed to be compilable even in situations where the target code is not fully mature. These translations are useful for most applications; however, they may not meet the desired coding standards. gmCodeStyle.exe is a Custom Translation Engine distributed with gmStudio that demonstrates how to produce translations that follow alternative coding standards. It is a .NET assembly implemented in C# using the gmAPI framework.

Contact us if you would like to see gmCodeStyle in action.

The transformations performed by CodeStyle take particular advantage of the following features of the tool:

  1. There are two identifiers maintained for each symbol -- source and target. By default the target identifier is set equal to the source, but it can be changed at will. This makes changing the naming conventions relatively simple.
  2. Code to be made available to the tool is linked into a dynamic-link-library that is then executed by the tool when certain events occur. A key event is the FinishAnalyser event that is triggered when the underlying code has been completed, but before it is passed to the author for surface-form formulation.
  3. The tool has built-in code for authoring declarations; however, there is an AuthorDeclaration event which can be used to override the default declaration.
  4. When the tool authors the final target code, it does not write it directly to a file; it enters it into a stored text buffer. There is an EditTranslation event and an extensive text-editing service that can be used to change the content of the text buffer before it is finally sent to the output file.

CodeStyle.std.xml

gmCodeStyle.exe uses a specification file that directs the coding style conventions. The specification file is written in XML and placed in the target location (workspace\usr folder) of the migration project. Its full name is %targetLocation%\CodeStyle.%sysId%.xml. A sample, CodeStyle.std.xml, is installed with gmStudio and may be activated in your project using the gmStudio Configuration form displayed by clicking Settings on the toolbar. The first record of this file must be <CodeStyle> and the last must be </CodeStyle>. Between those two tags are the various code style specification commands.

The Messages Command

The Messages command specifies the syntax to be used for warning messages needed during processing. The Messages command has a set of Entry subcommands with the following attributes:


Attribute | Description
--------- | -----------
id        | Specifies the identifier of the message. The only current one is RENAME which is issued when the process attempts to introduce a new name in the target symbol table which might cause a name clash.
name      | Specifies the actual message to be issued.


The sample below uses a format that is compatible with similar messages produced by the tool.


<Messages>
   <Entry id="Rename" name="UPGRADE_TODO: identifier [$1d] for $2d already defined for $3d" />
</Messages>
For RENAME argument $1d is the created identifier that is causing the clash; argument $2d is the fully-qualified identifier of the component that was to receive the identifier; and argument $3d is the fully-qualified identifier of the component that already has the identifier.

The Indent Command

The gmBasic tool keeps track of indentation level as it authors the target code. The Indent command can be used to specify how much white space is to be associated with each indentation level. The only attribute of this command is Value which specifies a value greater than or equal to zero. A value of zero indicates that a tab should be used for each indentation level; while a nonzero value of n specifies that n spaces should be associated with each level. Thus, the following CodeStyle file


<CodeStyle>
    ...
   <Indent value="4" />
    ...
</CodeStyle>
produces well-indented code with 4 spaces allocated for each indentation level.

Note that the <Select indent="width" /> command in the translation script may also be used to set the indent. If it is inserted in that script immediately before the Author command, it overrides the CodeStyle entry here, since the indentation value is set when the command is read via the StartPass2 event handler.
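The mapping from the Indent value to white space can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the tool's implementation; the function names indent_unit and indent_line are hypothetical.

```python
def indent_unit(value: int) -> str:
    """Return the white space used for one indentation level."""
    if value < 0:
        raise ValueError("Indent value must be greater than or equal to zero")
    # Zero selects one tab per level; n > 0 selects n spaces per level.
    return "\t" if value == 0 else " " * value


def indent_line(text: str, level: int, value: int) -> str:
    """Prefix a line of authored code with `level` indentation units."""
    return indent_unit(value) * level + text


print(repr(indent_line("return x;", 2, 4)))  # two levels of four spaces
print(repr(indent_line("return x;", 2, 0)))  # two tabs
```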

The Hungarian Command

The Hungarian command deals with the issue that some VB6 code bases use Hungarian prefix notation to indicate the binary type of quantity symbols. The goal is to remove these prefixes from the target code and then possibly to use other conventions to name the target symbols. Renaming is triggered by the presence of a list of source code prefixes within the Hungarian command; however, much more machinery is required to keep the target code compilable.


The Hungarian command has a set of subcommands that organize the needed information. The following subtopics describe the subcommands themselves, and then describe the algorithms that apply them.

The Rename Subcommand

The Rename subcommand can appear anywhere within the Hungarian command. It changes the authored name of a symbol and blocks the application of any of the renaming algorithms specified to that name. The attributes of the Rename statement are as follows:


Attribute  | Description
---------- | -----------
Identifier | This required attribute specifies the component to be renamed. It is specified relative to the root of the symbol table -- i.e., it is a fully qualified identifier. It is expected that the same CodeStyle script will be used by multiple code sets. If an undefined identifier is encountered, it is simply assumed to apply to a different code set and is skipped.
Content    | This required attribute specifies the name to be used for the component in the target code.

The Rename subcommand is applied as the Hungarian command is being read which means that it applies before any of the code style specific algorithms are applied. Note that refactoring Rename commands may be entered in the translation scripts themselves and cause the same blocking of the code style algorithms for individual identifiers.

The SourcePrefixes Subcommand

The SourcePrefixes subcommand specifies the binary type Hungarian prefixes. Only variables are assumed to have type prefixes. It is the presence of a SourcePrefixes subcommand that triggers steps 3 through 7 of the renaming algorithm. The command itself introduces a series of Entry subcommands each of which has two required attributes:


Attribute | Description
--------- | -----------
Type      | Specifies the binary type that has a certain prefix. The possible binary type identifiers are listed below.
Value     | Specifies the actual Hungarian prefix in case sensitive form. If a variable of the type indicated by the Type attribute has this prefix then that prefix is stripped.


These types are as follows:


VB6      | .NET Equivalent (C#, VB.NET)
-------- | ----------------------------
Byte     | byte, Byte
Short    | short, Short
Integer  | int, Integer
Long     | long, Long
Currency | decimal, Decimal
Single   | float, Single
Double   | double, Double
String   | string, String
Boolean  | bool, Boolean
Date     | DateTime
Variant  | object, Object
Object   | object, Object
User     | object, Object
Control  | System.Windows.Forms.Control

Second are the special processing types used by gmBasic to deal with various special circumstances:


Vb6Special        | .NET Equivalent (C#, VB.NET)
----------------- | ----------------------------
Icon              | System.Drawing.Icon
FrxPicture        | System.Drawing.Image
Any               | object, Object
TwipsX            | int, Integer
TwipsY            | int, Integer
UnsInteger        | uint, Integer
WinPanel          | System.Windows.Forms.GroupBox
VarArray          | Object[], Object()
StringPtr         | System.Text.StringBuilder, String
CallHwnd4         | MigrationSupport.Vb7_Callback.Hwnd4
ControlCollection | System.Windows.Forms.Control.ControlCollection
CheckedListBox    | System.Windows.Forms.ListBox
Exception         | System.Exception
SafeArray         | System.Array
SecurityManager   | UserSecurityManager
Dynamic           | dynamic
ValueType         | object, Object


Third are the VB6 classes:


Vb6Class      | .NET Equivalent (C#, VB.NET)
------------- | ----------------------------
PictureBox    | System.Windows.Forms.PictureBox
Label         | System.Windows.Forms.Label
TextBox       | System.Windows.Forms.TextBox
Frame         | System.Windows.Forms.GroupBox
CommandButton | System.Windows.Forms.Button
CheckBox      | System.Windows.Forms.CheckBox
OptionButton  | System.Windows.Forms.RadioButton
ComboBox      | System.Windows.Forms.ComboBox
ListBox       | System.Windows.Forms.ListBox
HScrollBar    | System.Windows.Forms.HScrollBar
VScrollBar    | System.Windows.Forms.VScrollBar
Timer         | System.Windows.Forms.Timer
Printer       | MigrationSupport.Printer
Form          | System.Windows.Forms.Form
DriveListBox  | Microsoft.VisualBasic.Compatibility.VB6.DriveListBox
DirListBox    | Microsoft.VisualBasic.Compatibility.VB6.DirListBox
FileListBox   | Microsoft.VisualBasic.Compatibility.VB6.FileListBox
Menu          | System.Windows.Forms.ToolStripMenuItem
MDIForm       | System.Windows.Forms.Form
Shape         | System.Windows.Forms.Label
Line          | System.Windows.Forms.Label
Image         | System.Windows.Forms.PictureBox
Data          | MigrationSupport.DataControl.DataControl
PropertyPage  | MigrationSupport.PropertyBag
TabControl    | System.Windows.Forms.TabControl
ErrObject     | VBNET.ErrObject, ErrObject


Fourth are the VB6 enumerations:


Vb6Enumeration        | .NET Equivalent (C#, VB.NET)
--------------------- | ----------------------------
SimpleBorderStyle     | System.Windows.Forms.BorderStyle
KeyCodeConstants      | System.Windows.Forms.Keys
LogEventTypeConstants | System.Diagnostics.EventLogEntryType
DrawStyle             | MigrationSupport.Utils.DrawStyle
DrawMode              | MigrationSupport.Utils.DrawMode
MousePointerConstants | System.Windows.Forms.Cursor
WindowStyle           | VBNET.AppWinStyle, AppWinStyle
OpenMode              | VBNET.OpenMode, OpenMode
vbTristate            | VBNET.TriState, TriState
ScaleType             | MigrationSupport.Utils.ScaleType
VbCompareMethod       | VBNET.CompareMethod, CompareMethod
VbFileAttribute       | VBNET.FileAttribute, FileAttribute
MsgBoxResult          | VBNET.MsgBoxResult, MsgBoxResult
VbMsgBoxStyle         | VBNET.MsgBoxStyle, MsgBoxStyle
VariableType          | VBNET.VariantType, VariantType
ButtonAppearanceStyle | System.Windows.Forms.Appearance
ApplicationStartMode  | MigrationSupport.Utils.StartMode
MouseButtonConstants  | System.Windows.Forms.MouseButtons
ResourceType          | MigrationSupport.Utils.ResourceType
FirstDayOfWeek        | VBNET.FirstDayOfWeek, FirstDayOfWeek
FirstDayOfYear        | VBNET.FirstDayOfYear, FirstDayOfYear
DueDate               | VBNET.DueDate, DueDate
AlignConstants        | MigrationSupport.Utils.AlignConstants
CheckboxConstants     | System.Windows.Forms.CheckState
AlignmentConstants    | System.Drawing.ContentAlignment
BorderStyle           | System.Windows.Forms.FormBorderStyle
ComboBoxStyle         | System.Windows.Forms.ComboBoxStyle
ColorConstants        | System.Drawing.Color
LayoutArrangement     | MdiLayout
RLDirection           | System.Windows.Forms.RightToLeft
ShiftConstants        | MigrationSupport.Utils.ShiftConstants
BackStyle             | MigrationSupport.Utils.BackStyleConstants
QueryUnloadConstants  | MigrationSupport.Utils.QueryUnloadConstants
ClipboardConstants    | MigrationSupport.Utils.ClipboardConstants

The type designator Object refers to any external type and the type designator User refers to any user defined type. Below is a simple SourcePrefixes specification.


 <SourcePrefixes >
    <Entry type="Boolean" value="bln" />
    <Entry type="String"  value="str" />
    <Entry type="Integer" value="lng" />
    <Entry type="User"    value="obj" />
    <Entry type="Object"  value="dic" />
 </SourcePrefixes>
It is permissible to have multiple types with the same prefix or multiple prefixes with the same type. The entries are searched in the order in which they were specified until one with a matching type and prefix is encountered.
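The ordered, case sensitive search over SourcePrefixes entries can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the tool's code: the function names are hypothetical, and the guard that refuses to strip a prefix when nothing would remain of the name is an assumption added for safety.

```python
# Entries in the order they appear in the SourcePrefixes sample above.
SOURCE_PREFIXES = [
    ("Boolean", "bln"),
    ("String", "str"),
    ("Integer", "lng"),
    ("User", "obj"),
    ("Object", "dic"),
]


def find_type_prefix(name, binary_type, entries=SOURCE_PREFIXES):
    """Return the first prefix whose type and case-sensitive text both match."""
    for entry_type, prefix in entries:
        # Assumption: a prefix is not stripped if it is the entire name.
        if entry_type == binary_type and name.startswith(prefix) and len(name) > len(prefix):
            return prefix
    return None


def strip_type_prefix(name, binary_type, entries=SOURCE_PREFIXES):
    """Strip the matching Hungarian type prefix, if there is one."""
    prefix = find_type_prefix(name, binary_type, entries)
    return name if prefix is None else name[len(prefix):]


print(strip_type_prefix("blnReadAll", "Boolean"))
print(strip_type_prefix("strName", "Boolean"))  # type does not match, unchanged
```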

The ExcludedSuffixes Subcommand

One of the potential problems with stripping identifiers of their Hungarian prefixes is that there will be symbols whose identifiers are distinguished only by their prefixes. The ExcludedSuffixes command specifies these symbols. Any identifier that ends in one of the excluded suffixes is excluded from the renaming algorithm. The comparison is case insensitive. The actual list of symbols can be entered as a single semicolon delimited list using a single value attribute. This might look as follows


 <ExcludedSuffixes value="Data;Connection;ErrorMessage;Table;Field;IndexName" />
Alternatively a group of Entry subcommands, each with a value attribute, can be used to specify the suffixes individually. This might look as follows


 <ExcludedSuffixes >
    <Entry value="Data" />
    <Entry value="Connection" />
    <Entry value="ErrorMessage" />
    <Entry value="Table" />
    <Entry value="Field" />
    <Entry value="IndexName" />
 </ExcludedSuffixes >
The two forms above would produce the same result.
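To make the equivalence of the two forms concrete, here is a small Python sketch that reads either form with the standard xml.etree.ElementTree module and applies the case insensitive suffix test. The helper names read_values and is_excluded are hypothetical, and the sketch only looks at attributes spelled "value" in lowercase; how gmCodeStyle itself reads the file is not shown here.

```python
import xml.etree.ElementTree as ET


def read_values(xml_text):
    """Collect values from the value attribute and from Entry children."""
    node = ET.fromstring(xml_text)
    values = [v for v in node.get("value", "").split(";") if v]
    values += [e.get("value") for e in node.findall("Entry") if e.get("value")]
    return values


def is_excluded(name, suffixes):
    """Case-insensitive test for an excluded suffix."""
    lowered = name.lower()
    return any(lowered.endswith(s.lower()) for s in suffixes)


compact = '<ExcludedSuffixes value="Data;Connection;ErrorMessage" />'
expanded = """<ExcludedSuffixes >
   <Entry value="Data" />
   <Entry value="Connection" />
   <Entry value="ErrorMessage" />
</ExcludedSuffixes >"""

print(read_values(compact) == read_values(expanded))
print(is_excluded("strERRORMESSAGE", read_values(compact)))
```

The same reader would apply to the other list-style subcommands, since they share the two forms.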

The StatusPrefixes Subcommand

In addition to the binary type Hungarian prefixes there are sometimes also various types of status Hungarian prefixes which must be stripped before the actual type prefixes can be examined. The StatusPrefixes command specifies these prefixes. Any identifier that begins with one of these prefixes has that prefix stripped off. The comparison is case insensitive. The actual list of prefixes can be entered as a single semicolon delimited list using a single value attribute. This might look as follows


 <StatusPrefixes value="m_;i_;o_;io_;l_" />
Alternatively a group of Entry subcommands, each with a value attribute, can be used to specify the prefixes individually. This might look as follows


 <StatusPrefixes >
    <Entry value="m_" />
    <Entry value="i_" />
    <Entry value="o_" />
    <Entry value="io_" />
    <Entry value="l_" />
 </StatusPrefixes >
The two forms above would produce the same result.

The GlobalPrefixes Subcommand

Non-local variables often have an additional prefix, preceding the type prefix, that indicates that they are not local. For example, an identifier like "gblnReadAll" names a global Boolean variable; such prefixes are assumed to combine with the Hungarian type prefix. They must be checked for and stripped as well. They are specified via the GlobalPrefixes command. The comparison is case insensitive. The actual list of prefixes can be entered as a single semicolon delimited list using a single value attribute. This might look as follows


<GlobalPrefixes value="g;m" />
Alternatively a group of Entry subcommands, each with a value attribute, can be used to specify the prefixes individually. This might look as follows


<GlobalPrefixes >
   <Entry value="g" />
   <Entry value="m" />
</GlobalPrefixes >
The two forms above would produce the same result.

The NamingStyle Subcommand

The naming style algorithm is made possible by the fact that the modern target languages are case sensitive while the historical source languages are case insensitive. This allows modern naming styles to distinguish different symbol types based solely on the case pattern of their identifiers. The key notion here is CamelCase which is the practice of writing compound names such that each word or abbreviation within the name begins with a capital letter. Camel case may start with a capital or lowercase letter. As an example consider the identifier CamelCase itself beside its possible alternative camelCase. In general the naming style algorithm recognizes four case styles:


Style      | Description
---------- | -----------
lowercase  | All the alphabetic characters in the identifier are lowercase, as in "lowercase"
uppercase  | All the alphabetic characters in the identifier are uppercase, as in "UPPERCASE"
lowercamel | Each word in the identifier begins with an uppercase character followed by lowercase characters, except that the first character of the identifier is lowercase, as in "lowerCamel"
uppercamel | Each word in the identifier begins with an uppercase character followed by lowercase characters, as in "UpperCamel"
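The four case styles can be illustrated with a short Python sketch. It assumes the name it receives already has its word boundaries marked in UpperCamel form; the function name apply_case_style is hypothetical and this is not the tool's own code.

```python
def apply_case_style(upper_camel_name, style):
    """Derive one of the four case styles from an UpperCamel name."""
    key = style.lower()  # style names are compared without regard to case here
    if key == "lowercase":
        return upper_camel_name.lower()
    if key == "uppercase":
        return upper_camel_name.upper()
    if key == "lowercamel":
        # Only the first character changes; inner word capitals are kept.
        return upper_camel_name[:1].lower() + upper_camel_name[1:]
    if key == "uppercamel":
        return upper_camel_name[:1].upper() + upper_camel_name[1:]
    raise ValueError(f"unknown style: {style}")


for style in ("LowerCase", "UpperCase", "LowerCamel", "UpperCamel"):
    print(style, apply_case_style("ReadAll", style))
```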

The NamingStyle subcommand itself specifies the naming style to be associated with symbols names. This command has the following attributes:


Attribute | Description
--------- | -----------
Style     | Specifies the naming style to be used. It has 5 possible entries -- Original, LowerCase, UpperCase, LowerCamel, and UpperCamel. The Original style resets the name to its original form as of the end of the renaming algorithm. The other styles are discussed above.
Object    | Specifies the object type of the symbol. It has the following possible entries -- Subprogram, Variable, Constant, Property, Declaration, Structure, Enumeration, EnumeratedEntry, StatementLabel, Event, Vb_Name.
Access    | Specifies the access type of the symbol. It has the following possible entries -- Local, Public, Private.
Type      | Specifies a binary type. The possible binary type identifiers are discussed under the SourcePrefixes subcommand.
Prefix    | In addition to the case style of the name a prefix can be added to the front of the name as well. This attribute specifies that prefix. Note that combining these prefixes with types allows the reintroduction of Hungarian notation in the target names, if that is desired.

Here is a sample set of NamingStyle entries.


 <NamingStyle>
    <Entry style="Original" object="Vb_name" />
    <Entry style="lowerCamel" access="local" />
    <Entry style="lowerCamel" access="Private" object="Variable" prefix="_" />
 </NamingStyle >

The SpecialNames Subcommand

There are some special names specified in the gmBasic language files, such as arguments to event handlers, that are also referenced by micro-code in the language files. These cannot be changed via this set of specifications. The ones generated by the client code translations must be listed as SpecialNames so that they are not changed. The comparison is case insensitive. The actual list of special names can be entered as a single semicolon delimited list using a single value attribute. This might look as follows


<SpecialNames value="Cancel;UnloadMode" />
Alternatively a group of Entry subcommands, each with a value attribute, can be used to specify the special names individually. This might look as follows


<SpecialNames >
   <Entry value="Cancel" />
   <Entry value="UnloadMode" />
</SpecialNames >
The two forms above would produce the same result.

The Acronyms Subcommand

The NamingStyle algorithm has no way of locating words within compound names, because it does not know what the names are. There is one exception to this -- acronyms like "SQL" or "XML". The Acronyms command specifies a list of acronyms or simply words which should be entered in a particular style in the target name. The individual entries are specified in their desired target language form. The algorithm does a case insensitive search of each name for the entry and, if found, substitutes the target form for the original form. The actual list of acronyms can be entered as a single semicolon delimited list using a single value attribute. This might look as follows


<Acronyms Value="Xml;Sql" />
Alternatively a group of Entry subcommands, each with a value attribute, can be used to specify the acronyms individually. This might look as follows


<Acronyms >
   <Entry Value="Xml" />
   <Entry Value="Sql" />
</Acronyms >
The two forms above would produce the same result.
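The case insensitive search-and-substitute described for acronyms can be sketched with Python's re module. The function name apply_acronyms is hypothetical. Note that a plain substring search like this one also matches inside longer words; whether the tool guards against that is not stated here.

```python
import re


def apply_acronyms(name, acronyms):
    """Replace each case-insensitive occurrence of an entry with its target form."""
    for target_form in acronyms:
        # A function replacement avoids any escape processing of the target text.
        name = re.sub(
            re.escape(target_form),
            lambda match, text=target_form: text,
            name,
            flags=re.IGNORECASE,
        )
    return name


print(apply_acronyms("LoadXMLFromSQL", ["Xml", "Sql"]))
```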

The ReservedWords Subcommand

The NamingStyle algorithm can form reserved words like default or in. These can be repaired by changing their case. The ReservedWords command specifies the list of reserved words in the form in which they can be used as identifiers in the target code. The algorithm does a case insensitive search of each name for the entry and, if found, substitutes the target form for the original form. The actual list of reserved words can be entered as a single semicolon delimited list using a single value attribute. This might look as follows


<ReservedWords value="Default;String;In" />
Alternatively a group of Entry subcommands, each with a value attribute, can be used to specify the reserved words individually. This might look as follows


<ReservedWords >
   <Entry Value="Default" />
   <Entry Value="String" />
   <Entry Value="In" />
</ReservedWords >
The two forms above would produce the same result.

The LoopVariables Subcommand

The LoopVariables command changes the names of loop variables. There is a common convention in code bases to use simple identifiers like i or j for loop variables. These simple identifiers can be difficult to find and/or trace in the target code. This command changes these identifiers to something more readable like "index". The LoopVariables command has a set of Entry subcommands with the following attributes:


Attribute | Description
--------- | -----------
id        | Specifies the identifier in the source code of a loop variable to be renamed. Comparison is case sensitive.
name      | Specifies the identifier in the target code to be used for the loop variable.

The change only applies to variables that are explicitly used as a counter in a For loop. A possible LoopVariables specification might be as follows.


<LoopVariables >
   <Entry id="i" name="loopIndex" />
</LoopVariables>
Whenever new identifiers are introduced, name clashes can occur. As a result of using the above, the following error might occur when the target code is compiled.


sample.cs(1666,14): error CS0136: A local variable named 'loopIndex' cannot be declared in this scope
   because it would give a different meaning to 'loopIndex', which is already used in a 'parent or current'
   scope to denote something else [C:\temp\Sample.csproj]
Notice that the scope rules can be complicated. The .NET languages do not support strictly hierarchical symbol scope. The only solution is to use a different new name.
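The renaming rule and the kind of clash shown above can be sketched in Python. The clash check is a hypothetical pre-check added for illustration only; in the scenario described above the clash surfaces as a compiler error. The names rename_loop_variable and names_in_scope are assumptions.

```python
# id -> name, as in the LoopVariables sample above.
LOOP_VARIABLES = {"i": "loopIndex"}


def rename_loop_variable(source_id, names_in_scope, mapping=LOOP_VARIABLES):
    """Return (target_name, clash) for the counter of a For loop."""
    target = mapping.get(source_id)  # dictionary lookup is case sensitive
    if target is None:
        return source_id, False
    # A clash exists when the new name is already used in an enclosing scope.
    return target, target in names_in_scope


print(rename_loop_variable("i", {"count"}))
print(rename_loop_variable("i", {"loopIndex"}))
```

When the second element of the result is true, the remedy described above applies: choose a different new name.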

Algorithm to Strip Source Identifiers

The renaming algorithm is the first algorithm applied to the target code. It is applied after all code in a given code unit has been compiled and analyzed. At this point in time, in addition to the compiled code there is also a symbol table. Though additional renaming can occur during a later code scan, the bulk of the renaming process is done through a scan of the symbol table. It begins by applying the source specifications so that a root identifier is formed which can then be used to form a target identifier. It proceeds as follows:
  1. The symbol table is scanned looking for any symbols that are a subprogram, variable, constant, property, declare, structure, enumeration, enumeration entry, statement label, event, or class name. These are the types of symbols that can be renamed here. The following steps apply to each one of these symbols separately. Note that any symbol that already has a target name associated with it via a Rename command is skipped as well.
  2. The access type of the symbol is determined -- local, public, or private.
  3. If the source code used Hungarian notation, then the source Hungarian prefixes can be removed. The specification commands include a Hungarian command which supplies the prefix used for each binary and access type combination. The presence of this specification triggers the prefix removal steps.
  4. In source codes there will often be symbols whose identifiers are only distinguished by their Hungarian prefixes. A list of these symbols is supplied via an ExcludedSuffixes command.
  5. In source codes there are often symbol status codes that precede the actual Hungarian type prefix. These must be checked for first and stripped from the identifier. They are specified via a StatusPrefixes command.
  6. Non local variables often also have a prefix used to indicate that they are not local. These must be checked for as well and be stripped. They are specified via the GlobalPrefixes command.
  7. Finally the actual Hungarian prefixes can be stripped.
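Steps 4 through 7 above can be sketched as one Python function. This is an illustration under stated assumptions, not the tool's algorithm: the names are hypothetical, at most one status prefix is stripped, and a global prefix is stripped only when a type prefix follows it (one reading of the statement that global prefixes combine with Hungarian).

```python
def _type_prefix(name, binary_type, source_prefixes):
    """Find the first case-sensitive type prefix matching this binary type."""
    for entry_type, prefix in source_prefixes:
        if entry_type == binary_type and name.startswith(prefix) and len(name) > len(prefix):
            return prefix
    return None


def strip_identifier(name, binary_type, cfg):
    lowered = name.lower()
    # Step 4: identifiers ending in an excluded suffix are left alone.
    if any(lowered.endswith(s.lower()) for s in cfg["excluded"]):
        return name
    rest = name
    # Step 5: strip a leading status prefix (case insensitive).
    for status in cfg["status"]:
        if rest.lower().startswith(status.lower()) and len(rest) > len(status):
            rest = rest[len(status):]
            break
    # Step 6: strip a global prefix, assumed valid only before a type prefix.
    for glob in cfg["global"]:
        if rest.lower().startswith(glob.lower()):
            candidate = rest[len(glob):]
            if _type_prefix(candidate, binary_type, cfg["source"]):
                rest = candidate
                break
    # Step 7: strip the Hungarian type prefix itself.
    prefix = _type_prefix(rest, binary_type, cfg["source"])
    return rest[len(prefix):] if prefix else rest


CFG = {
    "excluded": ["Data", "Connection"],
    "status": ["m_", "i_", "o_", "io_", "l_"],
    "global": ["g", "m"],
    "source": [("Boolean", "bln"), ("String", "str")],
}
for sample, kind in [("gblnReadAll", "Boolean"), ("m_strName", "String"), ("strData", "String")]:
    print(sample, "->", strip_identifier(sample, kind, CFG))
```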

Algorithm to Form Target Identifiers

Once the source symbol has been stripped of its Hungarian annotations, the target language naming styles can be applied. The actual application of this algorithm depends upon the presence of a NamingStyle subcommand within the Hungarian command. Any identifiers skipped because they were explicitly renamed are not changed by this algorithm. Also, before the names can be changed into one of the style forms they first need to be changed into a standard form from which the other styles can be derived. That standard form is UpperCamel. The problem is that the algorithm here has no way of breaking possibly compound names into their component words. Fortunately, many code bases use the underscore character in symbol names to separate their words. At this point then the algorithm looks for names like "KEY_QUERY_VALUE" and changes them to "KeyQueryValue". Any name that does not have this form is simply changed by making its first character upper case. Some typical name changes at this point might be as follows:


Original             | Changed
-------------------- | -------
KEY_ALL_ACCESS       | KeyAllAccess
READ_CONTROL         | ReadControl
STANDARD_RIGHTS_READ | StandardRightsRead
SYNCHRONIZE          | Synchronize
KEY_READ             | KeyRead
dwType               | DwType
szData               | SzData
cbData               | CbData
ctlReadyToGenerate   | CtlReadyToGenerate
enumOperationMode    | EnumOperationMode
ctlSelectDatasource  | CtlSelectDatasource
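The conversion to the UpperCamel standard form can be sketched in Python so that it reproduces the table above. The rule for recognizing the "KEY_QUERY_VALUE" form (every letter upper case, words separated by underscores) is an assumption inferred from the table, and to_upper_camel is a hypothetical name.

```python
def to_upper_camel(name):
    """Convert a stripped identifier to the UpperCamel standard form."""
    if name.isupper():
        # Names such as KEY_ALL_ACCESS or SYNCHRONIZE: split on underscores
        # and capitalize each word.
        words = [word for word in name.split("_") if word]
        return "".join(word[:1] + word[1:].lower() for word in words)
    # Any other name only has its first character made upper case.
    return name[:1].upper() + name[1:]


for original in ("KEY_ALL_ACCESS", "SYNCHRONIZE", "dwType", "ctlReadyToGenerate"):
    print(original, "->", to_upper_camel(original))
```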

When the algorithm applies, the binary, component, and access types of the symbol underlying the identifier are all known. The algorithm itself proceeds as follows:
  1. Exclude any SpecialNames that are referenced by the micro-code in the language files
  2. Convert the names into uppercamel form when the word boundaries can be detected.
  3. Apply the specifications in the NamingStyle command
  4. Repair any ReservedWords that may have been formed
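The four steps above can be sketched together in Python. This is a simplified illustration with hypothetical names: step 2 is reduced to upper-casing the first character, the NamingStyle entry is passed in directly as a style and prefix rather than being selected by object, access, and type, and the reserved word repair compares whole names, which is an assumption.

```python
STYLES = {
    "lowercase": str.lower,
    "uppercase": str.upper,
    "lowercamel": lambda n: n[:1].lower() + n[1:],
    "uppercamel": lambda n: n[:1].upper() + n[1:],
}


def form_target_identifier(name, style, special_names=(), reserved_words=(), prefix=""):
    # Step 1: special names referenced by the micro-code are never changed.
    if name.lower() in {s.lower() for s in special_names}:
        return name
    # Step 2 (simplified): bring the name into the UpperCamel standard form.
    standard = name[:1].upper() + name[1:]
    # Step 3: apply the case style and optional prefix of the NamingStyle entry.
    target = prefix + STYLES[style.lower()](standard)
    # Step 4: repair a reserved word by substituting its listed form.
    for word in reserved_words:
        if target.lower() == word.lower():
            target = word
    return target


SPECIAL = ["Cancel", "UnloadMode"]
RESERVED = ["Default", "String", "In"]
print(form_target_identifier("Cancel", "lowerCamel", SPECIAL, RESERVED))
print(form_target_identifier("Default", "lowerCamel", SPECIAL, RESERVED))
print(form_target_identifier("ReadAll", "lowerCamel", SPECIAL, RESERVED, prefix="_"))
```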

The DoNotInitialize Command

The DoNotInitialize command removes default initializations of variables and fields that are not necessary to avoid using an uninitialized value. The default translations produced are generic and designed to be compilable even in situations where the target code is not fully mature. By default, all variable and field declarations have an initialization value specified regardless of need.

The Fields Subcommand

The Fields subcommand requests that fields that have a public access type not be supplied with a default initialization value. The subcommand is a singleton with no attributes. It appears as follows.


<Fields />

The Variables Subcommand

The Variables subcommand requests that local variables that are assigned a value within the code, not also be assigned a default value. Simply being assigned a value is too weak. The actual test used here traces all references to the variable to make certain that no nested use of the variable is on a possibly unassigned path through the code. The subcommand is a singleton with no attributes. It appears as follows.


<Variables />

The OutParameters Subcommand

The OutParameters subcommand examines all parameters that are being passed ByRef to determine if their values are being changed before they are being used. If so, then they can be reclassified as being ByOut. The actual command is a singleton with no attributes. It looks like this


<OutParameters />
The tool already removes ByRef specifications from parameters that are not changed by their subprograms, making them ByVal; therefore, the only additional check needed is to verify that the first change precedes the first use. This check is equivalent to the one done by the Variables subcommand, and is the first reason why this operation is included under the DoNotInitialize command. Making this additional check in the code and changing the reference status of the parameter is straightforward. The problem is that out parameters have an additional requirement. Making the simple change causes errors of the following sort.


sample.cs(669,6): error CS0177: The out parameter 'script' must be assigned to before control
leaves the current method [C:\temp\Sample.csproj]
Out parameters must be assigned along ALL PATHS before control leaves the method. In cases where there are branching or conditional statements, the parameters may not get assigned on all paths through the code. The simple solution would be to add initialization code to the start of the methods for all out parameters, but this would often lead to redundant initializations. Again the same algorithm that tests variables can be used here.
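The "all paths" requirement can be illustrated with a small definite-assignment check in Python. The statement representation (tuples for assign, return, and if) is entirely hypothetical and much simpler than the tool's operation code; loops and other constructs are not handled. The sketch only shows why a conditional assignment forces an initialization.

```python
ASSIGNED, OPEN, ESCAPES = "assigned", "open", "escapes"


def scan(block, name):
    """Classify a block: assigned on every path, still open, or left unassigned."""
    for stmt in block:
        kind = stmt[0]
        if kind == "assign" and stmt[1] == name:
            return ASSIGNED
        if kind == "return":
            return ESCAPES  # control leaves the method before any assignment
        if kind == "if":
            results = {scan(stmt[1], name), scan(stmt[2], name)}
            if ESCAPES in results:
                return ESCAPES
            if results == {ASSIGNED}:
                return ASSIGNED
            # At least one branch falls through unassigned: keep scanning.
    return OPEN


def needs_initialization(body, name):
    """True when some path leaves the method without assigning `name`."""
    return scan(body, name) != ASSIGNED


only_then = [("if", [("assign", "script")], [])]
both_branches = [("if", [("assign", "script")], [("assign", "script")])]
early_return = [("if", [("return",)], []), ("assign", "script")]
print(needs_initialization(only_then, "script"))
print(needs_initialization(both_branches, "script"))
print(needs_initialization(early_return, "script"))
```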


The transformation converts many ByRefs to ByOuts, but not all. The code inserts initialization code only if absolutely needed. Even assignments to variables are removed if those variables are then directly passed ByOut.


The ByRef to ByOut status change is made during the first pass through the operation code. Of course, if a parameter is ByOut, then any arguments passed to it elsewhere in the code must be annotated with out as opposed to ref. The repair of these annotations is then done during the second pass through the operation code.

The SimpleProperty Command

The SimpleProperty command checks for simple getter/setter properties whose operation codes match a code pattern and then reauthors them using a specified .NET surface form pattern. As an example consider a VB6 property source pattern that always includes On Error GoTo error handling code. In the .NET implementation this error handling code is to be removed and the properties are to be authored using an internal declaration. Here is a sample code


Private fieldValue As ValueType
Friend Property Get PropValue() As ValueType
   On Error GoTo ErrorHandler
   PropValue = fieldValue
   Exit Property
ErrorHandler:
   ...
End Property
Friend Property Let PropValue(ByVal myValue As ValueType)
   On Error GoTo ErrorHandler
   fieldValue = myValue
   Exit Property
ErrorHandler:
   ...
End Property
The default translation for this property as produced by the tool is


private ValueType fieldValue = "";
public ValueType PropValue
{
   get
   {
      ValueType PropValue = "";
      try
      {
         PropValue = fieldValue;
         return PropValue;
      }
      catch(Exception exc)
      {
         ...
      }
      return PropValue;
   }
   set
   {
      try
      {
         fieldValue = value;
         return;
      }
      catch(Exception exc)
      {
         ...
      }
   }
}
The desired translation for these properties is


private ValueType fieldValue;
internal ValueType PropValue
{
   get { return fieldValue; }
   set { fieldValue = value; }
}
Notice first of all, that the tool converts the error handling into try-catch. Also most code style transformations will involve renaming symbols so the target names for the symbols will possibly change as will the identifier for the .NET implementation of ValueType. It might be possible to write editing code that looks for patterns in the actual authored code, but it would be very difficult. The operation code, however, for these is very patterned.


Actual csh Codeblock Associated with Get:
Opcode | Operation support information
------ | -----------------------------
NEW    | 25 On Error GoTo ErrorHandler
NEW    | 27 PropValue = fieldValue
ERR    | Try
LEV    | Nest0
LDA    | Variable:fieldValue:610921
ARG    | ValueType
LDA    | Property:PropValue:610968
STR    | AssignValue
NEW    | 29 Exit Property
LDA    | Property:PropValue:610968
EXI    | Function
ERR    | Catch1
    ...
ERR    | Catch3

Actual csh Codeblock Associated with Let:
Opcode | Operation support information
------ | -----------------------------
NEW    | 36 On Error GoTo ErrorHandler
NEW    | 38 fieldValue = myValue
ERR    | Try
LEV    | Nest0
SPV    | Value
ARG    | String
LDA    | Variable:fieldValue:610921
STR    | AssignValue
NEW    | 40 Exit Property
EXI    | Property
ERR    | Catch1
  ...
ERR    | Catch3
Note that the above requires that the type of the property and the type of the variable are the same. This is not true or necessary in general. The only requirement is that the two types can be cast to each other.


The SimpleProperty subcommands specify the code patterns for the getters and setters or letters that qualify them for simplification and the actual syntax of the simplified target code. These must all be specified via this command. There are also two optional commands that deal with public field properties and enumerators.

The Getter Subcommand

The Getter subcommand specifies a set of code patterns that a given property getter must match if it is to be authored in a simpler way. It has a series of Entry subcommands that specify the actual code patterns. Here is the specification for the above example along with a second pattern for a getter that has no try-catch.


<Getter>
   <Entry value="NEW,NEW,ERR.Try,LEV,LDA,ARG,LDA,STR.AssignValue,NEW,LDA,EXI,ERR.Catch1,...,ERR.Catch3" />
   <Entry value="NEW,Argument,EXI.Function" />
</Getter>
Note that the initial LDA operation is assumed to specify the field that contains the value. There can be multiple code patterns specified, if needed.

The Setter Subcommand

The Setter subcommand specifies a set of code patterns that a given property letter or setter must match if it is to be authored in a simpler way. It has a series of Entry subcommands that specify the actual code patterns. Here is the specification for the above example along with a second pattern that has no try-catch.


<Setter>
   <Entry value="NEW,NEW,ERR.Try,LEV,SPV.Value,ARG,LDA,STR.AssignValue,NEW,EXI,ERR.Catch1,...,ERR.Catch3" />
   <Entry value="Argument,LDA,STR.AssignValue" />
</Setter>
There can be multiple code patterns specified, if needed. To be eligible for a simplification the getter and setter codes must match at least one of their specified patterns.
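The eligibility test described above can be pictured with a short Python sketch. It is only an illustration, not the gmCodeStyle implementation. It assumes that each operation is an (opcode, info) pair, that a token such as ERR.Try must match both parts while a bare token such as LDA matches the opcode alone, and that "..." stands for any run of operations. The function names and the placeholder standing in for the elided operations are hypothetical.

```python
def token_matches(token, op):
    """Return True if one pattern token matches one (opcode, info) pair."""
    opcode, info = op
    if "." in token:
        want_op, want_info = token.split(".", 1)
        return opcode == want_op and info == want_info
    return opcode == token


def matches_entry(entry, ops):
    """Match a comma-separated Entry value against a list of operations.

    Assumption: "..." stands for any run of zero or more operations.
    """
    tokens = entry.split(",")

    def walk(ti, oi):
        if ti == len(tokens):
            return oi == len(ops)
        if tokens[ti] == "...":
            # Try every possible length for the skipped run.
            return any(walk(ti + 1, k) for k in range(oi, len(ops) + 1))
        return (oi < len(ops)
                and token_matches(tokens[ti], ops[oi])
                and walk(ti + 1, oi + 1))

    return walk(0, 0)


def is_eligible(getter_ops, setter_ops, getter_entries, setter_entries):
    """Both code blocks must match at least one of their own entries."""
    return (any(matches_entry(e, getter_ops) for e in getter_entries)
            and any(matches_entry(e, setter_ops) for e in setter_entries))


# The Let code block listed earlier, as (opcode, info) pairs. The operations
# elided in the listing are represented by a single placeholder pair.
let_ops = [
    ("NEW", "36 On Error GoTo ErrorHandler"),
    ("NEW", "38 fieldValue = myValue"),
    ("ERR", "Try"), ("LEV", "Nest0"), ("SPV", "Value"), ("ARG", "String"),
    ("LDA", "Variable:fieldValue:610921"), ("STR", "AssignValue"),
    ("NEW", "40 Exit Property"), ("EXI", "Property"),
    ("ERR", "Catch1"), ("?", "elided"), ("ERR", "Catch3"),
]
setter_entry = ("NEW,NEW,ERR.Try,LEV,SPV.Value,ARG,LDA,STR.AssignValue,"
                "NEW,EXI,ERR.Catch1,...,ERR.Catch3")
print(matches_entry(setter_entry, let_ops))
```

Under these assumptions the first Setter entry matches the Let code block, while a shorter pattern such as "NEW,EXI" does not, because every operation must be accounted for.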

The AuthorSame Subcommand

The AuthorSame subcommand contains the patterned text block that specifies how the simplified property is to be authored when the type of the property and the type of the value are the same. Note the manner in which the text is surrounded by CDATA directives. These are required in the form shown. Also the dollar sign, as opposed to the percent sign, is used to mark the locations of the variable strings in the pattern.


<AuthorSame><![CDATA[
private $1d $3d;
$5d $4d $2d
{
   get { return $3d; }
   set { $3d = value; }
}
]]></AuthorSame>
The patterns assume variable strings as follows:
  1. $1d is the .NET identifier of the value type.
  2. $2d is the target form of the property identifier. This may well be the output of the renaming algorithms.
  3. $3d is the target form of the field identifier. This may well be the output of the renaming algorithms.
  4. $4d is the .NET identifier of the property type.
  5. $5d is the .NET scope specification. If the property was Public or Friend then it is "public" else it is "internal".
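The way the markers are filled in can be sketched in a few lines of Python. This is an illustration only: the helper name fill_pattern and the sample values are hypothetical, and the real tool forms the strings from its symbol table.

```python
import re

AUTHOR_SAME = """private $1d $3d;
$5d $4d $2d
{
   get { return $3d; }
   set { $3d = value; }
}"""


def fill_pattern(pattern, strings):
    """Replace each $Nd marker with the N-th string (1-based)."""
    return re.sub(r"\$(\d)d", lambda m: strings[int(m.group(1)) - 1], pattern)


# Hypothetical values: value type, property name, field name,
# property type, scope.
strings = ["string", "PropValue", "fieldValue", "string", "public"]
print(fill_pattern(AUTHOR_SAME, strings))
```

With these sample values the second line of the pattern becomes `public string PropValue` and the getter becomes `get { return fieldValue; }`.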

The AuthorDifferent Subcommand

The AuthorDifferent subcommand contains the patterned text block that specifies how the simplified property is to be authored when the type of the property and the type of the value are different.


<AuthorDifferent><![CDATA[
private $1d $3d;
$5d $4d $2d
{
   get { return ($4d)$3d; }
   set { $3d = ($1d)value; }
}
]]></AuthorDifferent>
The required low-level syntax and the string values are as specified above.

The PublicFields Subcommand

Many code style standards forbid the use of global fields. They prefer global properties. In .NET there are auto-properties that can be used to define what were simply global fields in VB6.


Public GlobalField As fieldType
can be authored in C# as


 public static fieldType GlobalField { get; set;}
The PublicFields subcommand specifies that global fields be authored differently than their default. It simply specifies the text block to be used to do the authoring. To reproduce the above


<PublicFields><![CDATA[
public static $1d $2d { get; set; }
]]></PublicFields>
where the $1d parameter refers to the type of the field and the $2d parameter refers to the name of the field.


During the initial symbol scan of the FinishAnalyser event global fields are marked so that the AuthorDeclaration event can be used to override their default declaration.


Some caution should be used, because code that passes a public field ByRef will fail to compile if that field is declared as an auto-property. Since the passing might occur outside of the compilation unit, it would be difficult to know of in advance.

The GetEnumerator Subcommand

The VB6 NewEnum property getters are replaced by .NET GetEnumerator() methods. These methods should not contain any initialization code and must almost always be rewritten as part of a migration. By default, then, the tool strips away all code from the getter and simply authors it using this gmSL method.


void AuthorGetEnumerator(int iHost)
{
   if(Select.Dialect == Dialects.csh)
   {
      #TextStart
      public IEnumerator GetEnumerator()
      {
          return (%= Store.GetName(iHost) %).GetEnumerator();
      }
      #TextEnd
   }
   else
   {
      #TextStart
      Public Function GetEnumerator() As IEnumerator
         GetEnumerator = (%= Store.GetName(iHost) %).GetEnumerator()
      End Function
      #TextEnd
   }
}
The GetEnumerator subcommand instructs the tool to retain the original version of the code in the NewEnum getter. The only problem is what to do with VB6 code like


Set NewEnum = mcolTables.[_NewEnum]
whose surface pattern in the target code is by default


 <csh role="property" narg="1" code="MigrationSupport.Utils.NewEnum(%1d)" />
The command is a singleton with a single optional attribute Entry. If specified this attribute supplies an alternative for the surface pattern. For example


<GetEnumerator entry="%1d.GetEnumerator()" />

The CodeScan Operations

When the initial scan of the symbol table via the FinishAnalyser event handler encounters a property or a variable, it invokes the operations of this command. If the symbol is a variable with a Public access type and a special declaration was specified via the PublicFields subcommand, then the information vector for that symbol is marked so that it can be declared later as specified.


If the symbol is the NewEnum property with a getter, and if the GetEnumerator subcommand was specified, then the information vector of the getter is marked so that the tool will not override its code and if specified the COL.NewEnum operations are replaced with the desired one.


Finally, the actual code patterns of the property getter and setter or letter are compared with the code patterns specified in their subcommand. If both match one of the specified pattern entries, then the information structure of the field backing up the value of the property is marked so that it can be authored later via the AuthorDeclaration event handler.

Authoring the Declarations

When the AuthorDeclaration event handler is called for a field whose information structure is marked as a property, the property is authored in the way specified by the command. The needed strings are formed from the information in the symbol table, and then the appropriate form is used depending upon whether the types are the same or different.

The ChangeIntroduced Command

The ChangeIntroduced command renames introduced variables so that they follow the same naming conventions as other variables. The primary source of introduced variables is the need to create a variable when a constant, an expression, or an object instance of the wrong class serves as an argument to a ByRef or ByOut parameter. The need to create these variables pervades the VB6 to .NET migration process. The tool carefully analyses user code parameters and changes them to ByVal whenever possible, but it has no control over the status of parameters in external libraries, which are often needlessly ByRef. The tool names these introduced temporaries with a standard convention, argTemp(n), which makes them easy to find in the target code. Here is an example.


 object argTemp1 = MigrationSupport.Utils.VarPtr(SwapEndian) + 3;
 object argTemp2 = dw;
 CopyMemory(ref argTemp1,ref argTemp2,1);
 object argTemp3 = MigrationSupport.Utils.VarPtr(SwapEndian) + 2;
 object argTemp4 = MigrationSupport.Utils.VarPtr(dw) + 1;
 CopyMemory(ref argTemp3,ref argTemp4,1);
 object argTemp5 = MigrationSupport.Utils.VarPtr(SwapEndian) + 1;
 object argTemp6 = MigrationSupport.Utils.VarPtr(dw) + 2;
 CopyMemory(ref argTemp5,ref argTemp6,1);
 object argTemp7 = SwapEndian;
 object argTemp8 = MigrationSupport.Utils.VarPtr(dw) + 3;
 CopyMemory(ref argTemp7,ref argTemp8,1);
Though easy to find, these names make the target code ugly and, for many readers, difficult to read.


The alternative supported by the ChangeIntroduced command is to use the identifier of the original parameter, which obviously follows the naming conventions, to form the names of the introduced variables. The above then becomes this.


  object lpvSource = null;
  object lpvDest = null;
  lpvDest = MigrationSupport.Utils.VarPtr(SwapEndian) + 3;
  lpvSource = dw;
  CopyMemory(ref lpvDest,ref lpvSource,1);
  lpvDest = MigrationSupport.Utils.VarPtr(SwapEndian) + 2;
  lpvSource = MigrationSupport.Utils.VarPtr(dw) + 1;
  CopyMemory(ref lpvDest,ref lpvSource,1);
  lpvDest = MigrationSupport.Utils.VarPtr(SwapEndian) + 1;
  lpvSource = MigrationSupport.Utils.VarPtr(dw) + 2;
  CopyMemory(ref lpvDest,ref lpvSource,1);
  lpvDest = SwapEndian;
  lpvSource = MigrationSupport.Utils.VarPtr(dw) + 3;
  CopyMemory(ref lpvDest,ref lpvSource,1);
The identifiers of parameters can be very simple, so there is always a possibility of a name clash. If there are conflicts among the newly introduced names, the name of the method is appended as well. This removes the argTemps from the code, replacing them with "conventional names". There may be unintended clashes as well. To avoid these, the ChangeIntroduced command has a set of Entry subcommands with the following attributes:


Attribute | Description
--------- | -----------
id        | Specifies the generated identifier of an introduced variable in case sensitive form.
name      | Specifies the name to be used instead of the generated identifier.


An actual command might look as follows


<ChangeIntroduced>
   <Entry id="index" name="indexPram" />
</ChangeIntroduced>
and may appear anywhere in the CodeStyle Script. The logic for this command has been added to the operation code scan. It locates argTemps being generated and replaces them with an identifier for the parameter receiving the argument.
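The naming rules above can be sketched in Python as follows. This is a hedged illustration rather than the tool's logic: the function name rename_introduced is hypothetical, and both the exact clash test and the choice to append the method name as a suffix are assumptions.

```python
def rename_introduced(temps, method_name, names_in_use, overrides=None):
    """Map argTemp names to names derived from the receiving parameters.

    temps: list of (argTemp name, receiving parameter name) pairs.
    names_in_use: identifiers already declared in the calling method.
    overrides: generated identifier -> replacement, as in the Entry list.
    """
    overrides = overrides or {}
    taken = set(names_in_use)
    by_param = {}
    mapping = {}
    for temp, param in temps:
        if param not in by_param:
            name = overrides.get(param, param)
            if name in taken:
                # Assumed clash rule: append the method name.
                name = name + method_name
            taken.add(name)
            by_param[param] = name
        # Temporaries bound to the same parameter share one variable.
        mapping[temp] = by_param[param]
    return mapping


temps = [("argTemp1", "lpvDest"), ("argTemp2", "lpvSource"),
         ("argTemp3", "lpvDest"), ("argTemp4", "lpvSource")]
print(rename_introduced(temps, "SwapEndian", {"dw"}))
```

As in the CopyMemory example, every temporary passed to the same parameter maps to one shared variable, and an override such as id="index" name="indexPram" replaces the generated name before the clash check.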

The OperationCode Command

The OperationCode command contains requests to introduce code styles that require changing the operation code. Its subcommands are implemented during the final pass of the operation code via the FinishAnalyser event handler.

The OptimizeFunctions Subcommand

The OptimizeFunctions subcommand is a singleton command with no attributes. Its basic role is to replace sequences like the following in the target code


 static bool myFunction()
 {
    bool myFunction;

    myFunction = false;
    return myFunction;
 }
with the simpler


 static bool myFunction()
 {
    return false;
 }
Note that the declaration of the internal function variable is removed only if there are no other references to it. The optimization itself applies to both assignment statements and set statements. It may appear anywhere within the scope of the OperationCode command.


<OptimizeFunctions />

The PostIncrement Subcommand

The PostIncrement subcommand is a singleton command with no attributes. It requests that assignments that simply add one to a variable be replaced by the ++ post-increment operation. It may appear anywhere within the scope of the OperationCode command.


<PostIncrement />
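
The effect of this subcommand can be illustrated with a small Python sketch that works on authored text. Note that this swaps in a different technique: the tool makes the change in the operation code, whereas the sketch uses a regular expression on one line of C# text. The function name post_increment is hypothetical.

```python
import re

# Matches "x = x + 1;" where both sides name the same simple identifier.
# The lookbehind rejects member accesses such as "obj.x = x + 1;".
_INCREMENT = re.compile(r"(?<![\w.])([A-Za-z_]\w*)\s*=\s*\1\s*\+\s*1\s*;")


def post_increment(line):
    """Rewrite 'x = x + 1;' as 'x++;' on a single line of C# text."""
    return _INCREMENT.sub(r"\1++;", line)


print(post_increment("   count = count + 1;"))
print(post_increment("   total = count + 1;"))
```

Only an assignment whose left and right sides name the same variable is rewritten; the second line is left unchanged.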

The RemoveReturns Subcommand

The RemoveReturns subcommand is a singleton command with no attributes. It requests that additional checks be made for unneeded explicit return statements in the target code. An example would be a return at the bottom of a try block whose catch block immediately precedes the end of the method. It may appear anywhere within the scope of the OperationCode command.


<RemoveReturns />

The SimpleCasts Subcommand

The SimpleCasts subcommand is a singleton command with no attributes. It requests that casts within the target code of the form (type)(value or instance) be replaced with the form (type)value or instance. This subcommand is implemented by replacing the CNV.CastType operation with CNV.CastSimple. It may appear anywhere within the scope of the OperationCode command.


<SimpleCasts />

The StandardFunctions Subcommand

The StandardFunctions subcommand replaces references to the standard VB6 functions with alternative operations that give different target code for them. The subcommand has Entry subcommands that specify the individual functions and their desired target code surface pattern. The Entry subcommand has two attributes as follows:


Attribute | Description
--------- | -----------
id        | The VB6 source code identifier of the function
name      | The desired target code surface pattern


Here is a sample of this command


<StandardFunctions>
   <Entry id="Trim"   name="%1d.Trim()" />
   <Entry id="Left"   name="%1d.Substring(0,%2d)" />
   <Entry id="InStr"  name="%2d.IndexOf(%3d,%1o)" />
   <Entry id="Right"  name="%1d.Substring(%1d.Length - %2h)" />
   <Entry id="Len"    name="%1d.Length" />
</StandardFunctions>
As can be seen, it is common to replace the commonly used function(arguments) notation with postfix notation.

The OptionalArguments Command


Starting with the March 2023 release, OptionalArguments="on" is set by default in the standard translation template script.  


Visual C# 2010 introduced optional arguments. The definition of a method, constructor, indexer, or delegate can specify that its parameters are required or that they are optional. Any call must provide arguments for all required parameters, but can omit arguments for optional parameters. Each optional parameter has a default value as part of its definition. If no argument is sent for that parameter, the default value is used. A default value must be a constant expression and it must be ByVal. Optional parameters are defined at the end of the parameter list, after any required parameters. If the caller provides an argument for any one of a succession of optional parameters, it must provide arguments for all preceding optional parameters. Comma-separated gaps in the argument list are not supported.


The default translations into C# do not use optional arguments; rather, they supply the VB6 default values in the calls. The OptionalArguments command tells the tool to use them. It is a singleton command with no attributes.


<OptionalArguments />
The implementation of the command must precede the actual compilation of the code as it is the compiler that does the default value insertion in calls; therefore, the command is executed as part of the StartPass2 event handler. The symbol table is scanned looking for optional VB6 parameters. These are marked with the context flag OverLoad which blocks the default value insertion, and the migration status flag Overloads which authors the initialization value in the method declaration. Note that all parameters so marked are also forced to be ByVal.

The TargetCode Command

The TargetCode command contains requests to introduce code styles that require changing the target code directly. When the tool actually authors the final target code, rather than simply writing it to a file, it enters it into a stored text buffer. There is an EditTranslation event and an extensive text-editing service that can be used to change the content of this text buffer before it is finally sent to the output file.

The AddSpaces Command

By default the target code does not add a space after each comma in lists, because the target output lines are often very long. The AddSpaces command adds these spaces. The command is a singleton with one attribute, Vertical. Setting Vertical="on" adds a blank line after a mainline right brace. This adds an additional line of separation after all "complex" component declarations that use braces, not just methods.


<AddSpaces Vertical="on" />

The AllowBlankLines Command

By default the translator passes all blank lines and empty comment lines in the source through to the target code. In addition, the translator moves declarations when necessary to resolve nesting scope errors in the target. These moves can sometimes leave blocks of blank lines behind, which come through to the target code. The AllowBlankLines subcommand removes sequences of blank lines from the translations.
<AllowBlankLines Limit="n" />

This subcommand allows no more than "n" consecutive blank lines in authored code. The default does not check for blank lines so there is no limit. Setting the limit to zero will remove all blank lines from the target code.
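
The Limit rule can be sketched in Python as follows. This is an illustration of the behavior described above, not the tool's editing code, and the function name limit_blank_lines is hypothetical.

```python
def limit_blank_lines(lines, limit):
    """Keep at most `limit` consecutive blank lines."""
    result = []
    run = 0  # length of the current run of blank lines
    for line in lines:
        if line.strip() == "":
            run += 1
            if run > limit:
                continue  # drop blank lines beyond the limit
        else:
            run = 0
        result.append(line)
    return result


code = ["int a;", "", "", "", "int b;"]
print(limit_blank_lines(code, 1))
print(limit_blank_lines(code, 0))
```

With a limit of 1 the three blank lines collapse to one; with a limit of 0 they are all removed.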

The ReduceBraces Command

The ReduceBraces subcommand removes the braces from if/while/for statements, when they are controlling a single statement. A structure like the following


conditional
{
   statement
}
is reduced by removing the two braces and then optionally converting it to a compound statement.


conditional statement.
ReduceBraces is a singleton command with one attribute, Statement. If this attribute is on, a compound statement is formed; if it is off, no compound statement is formed. Consider the following target code


if (_disposed)
{
   return;
}
else
{
   Class_Terminate();
}
_disposed = true;
The specification


<ReduceBraces statement="off" />
produces this target code.


 if (_disposed)
    return;
 else
    Class_Terminate();
 _disposed = true;
While the specification


<ReduceBraces statement="on" />
produces this target code.


 if (_disposed) return;
 else Class_Terminate();
 _disposed = true;
The command looks for the specified pattern in the target text buffer, and when found makes the specified change. There is one special case that is checked for


if (enumTableStatus == basGlobal.DefinedEnum.DefDeleted)
{
   //  We don't need to add any more information to display
}
else ..
is not equivalent to


if (enumTableStatus == basGlobal.DefinedEnum.DefDeleted) //  We don't need to add any more information to display
else ..
A special check for this had to be put into the editing code.


In addition to removing braces this command also removes any blank lines immediately following an opening left brace.
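
The brace reduction, including the comment-only special case, can be sketched in Python on a list of text lines. This is a simplified illustration under stated assumptions: it only recognizes a header on its own line followed by a brace, one statement ending in a semicolon, and a closing brace, each on its own line. It does not remove blank lines after an opening brace, and the function name reduce_braces is hypothetical.

```python
import re

# A control header alone on its line: if/while/for/foreach (...) or else.
_HEADER = re.compile(r"^\s*(if|while|for|foreach)\b.*\)\s*$|^\s*else\s*$")


def reduce_braces(lines, statement=False):
    """Drop braces around single-statement bodies of if/else/while/for."""
    out = []
    i = 0
    while i < len(lines):
        if (i + 3 < len(lines)
                and _HEADER.match(lines[i])
                and lines[i + 1].strip() == "{"
                and lines[i + 3].strip() == "}"):
            body = lines[i + 2].strip()
            # Special case: a comment-only body is not a statement, so the
            # braces must stay.
            if body.endswith(";") and not body.startswith("//"):
                if statement:
                    out.append(lines[i].rstrip() + " " + body)
                else:
                    out.append(lines[i])
                    out.append(lines[i + 2])
                i += 4
                continue
        out.append(lines[i])
        i += 1
    return out


source = ["if (_disposed)", "{", "   return;", "}",
          "else", "{", "   Class_Terminate();", "}",
          "_disposed = true;"]
print("\n".join(reduce_braces(source, statement=True)))
```

Applied to the example above with statement=True, the sketch yields the three-line form shown earlier; a block whose only content is a comment keeps its braces.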

The RemoveUsing Command

The RemoveUsing subcommand removes specified using statements from the target code text buffer, unless that buffer contains one of a list of substrings.


<RemoveUsing>
   <Entry id="System.Drawing;" />
   <Entry id="System.Collections;" name=" IEnumerator "/>
   <Entry id="System.ComponentModel;" />
   <Entry id="System.Runtime.InteropServices;" name="[Dllimport" />
   <Entry id="System.Data;" />
   <Entry id="Microsoft.VisualBasic.CompilerServices;" />
   <Entry id="System.Linq;" name=".ToArray<" />
   <Entry id="System.Collections.Generic;" name="List<;Dictionary<;HashSet<" />
   <Entry id="VBNET = Microsoft.VisualBasic;" name="VBNET." />
</RemoveUsing>
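
One way to read these entries is sketched below in Python. It assumes that id is the text following the using keyword, that name is a list of substrings separated by semicolons, and that the search is case sensitive; none of these details are confirmed here, and the function name remove_usings is hypothetical.

```python
def remove_usings(lines, entries):
    """Drop 'using <id>' lines unless a keep-substring occurs in the buffer.

    entries: list of (id, name) pairs; name is None or a ';'-separated
    list of substrings (assumed format).
    """
    buffer = "\n".join(lines)
    drop = set()
    for ident, name in entries:
        keepers = [s for s in (name or "").split(";") if s]
        if not any(k in buffer for k in keepers):
            drop.add("using " + ident)
    return [ln for ln in lines if ln.strip() not in drop]


code = ["using System.Drawing;",
        "using System.Linq;",
        "var a = list.ToArray<int>();"]
entries = [("System.Drawing;", None), ("System.Linq;", ".ToArray<")]
print(remove_usings(code, entries))
```

Here the System.Drawing line is removed because it has no protecting substrings, while System.Linq stays because the buffer contains ".ToArray<".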

The Replacements Command

The Replacements subcommand scans the target code text buffer for specified substrings and either replaces them with a second substring or simply removes them. The subcommand has a set of Entry subcommands, each of which defines an individual substring. Each Entry has two attributes:


Attribute | Description
--------- | -----------
id        | A substring to be replaced or removed. A simple case insensitive text search is performed for the string -- i.e., substring boundaries are not considered.
name      | This optional substring is the replacement string to be used.


A typical set of entries might be


<Replacements>
   <Entry id="String.Empty" name="string.Empty" />
   <Entry id="System.Int32" name="int" />
   <Entry id="System.Windows.Forms." />
   <Entry id="this." />
   <Entry id="VBNET.Constants.vbNullString" name="null" />
</Replacements>
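
The search-and-replace behavior described in the table can be sketched in Python. This is an illustration only, with a hypothetical function name; entries are applied in order, and a missing name means the substring is removed.

```python
import re


def apply_replacements(text, entries):
    """Apply (id, name) entries in order; name=None removes the substring."""
    for ident, name in entries:
        replacement = name or ""
        # re.escape makes the id a literal; IGNORECASE mirrors the
        # case insensitive search described in the table.
        text = re.sub(re.escape(ident), lambda m, r=replacement: r,
                      text, flags=re.IGNORECASE)
    return text


entries = [("String.Empty", "string.Empty"), ("this.", None)]
print(apply_replacements("this.Text = String.Empty;", entries))
```

Because substring boundaries are not considered, an entry such as "this." would also alter an identifier like hashThis.Value, so entries need to be chosen with some care.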

The VerticalList Command

The class VerticalList reformats long target code statements into readable form by converting them from horizontal form into vertical lists. The sorts of statements that are typically in need of this sort of reformatting are as follows:

  1. Calling / declaring methods with "many" parameters;
  2. Initializing arrays with many elements;
  3. Complex formulas/conditionals with a series of similar repeating factors;
  4. Building strings with a series of many concatenations.

All of these scenarios are reformatted by this command, but there is an important caveat: not all target code can be processed by this class. Only C# code produced by the surface code patterns specified in the metalanguage files can be processed. There are plans to also support both VB.NET and XML/HTML target code produced by the same surface code patterns, but these are not yet implemented.

The VerticalList statement is a section level statement with one optional attribute:

Attribute     | Description
------------- | -----------
MinLineLength | Defines what "long" means for statements, expressions, and lists. These are considered "long", and are therefore broken into a vertical list, if their number of characters exceeds MinLineLength. Its default value is 60.

In addition, the VerticalList command may have zero or more <Breaker> elements. Each contains a user-defined character string called a breaker:

 <VerticalList MinLineWidth="160" >
    <Breaker>+ ", " +</Breaker>
    <Breaker>+ "'" + "\r\n" +</Breaker>
    <Breaker>+ "\r\n" +</Breaker>
 </VerticalList>

These user-defined strings follow from the observation that, in practice, the longest statements are often multi-part string concatenations that can be broken into more meaningful chunks at user-defined character sequences called "breakers". User-defined breakers take precedence over the default set of single-token breakers (e.g., arithmetic and logical operators and commas). Their precedence among themselves follows the order of the <Breaker> elements in the VerticalList section.
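
The role of breakers and of the length threshold can be sketched in Python. Several details here are assumptions rather than documented behavior: the break is placed after the breaker text, continuation lines are indented by three spaces, and only the first breaker found in the line is used. The function name break_at_breakers is hypothetical, and the default token-based breakers are not modeled.

```python
def break_at_breakers(line, breakers, min_line_length=60):
    """Split a long line after each occurrence of the first breaker found."""
    if len(line) <= min_line_length:
        return [line]  # not "long", so leave it alone
    for breaker in breakers:  # earlier breakers take precedence
        if breaker in line:
            parts = line.split(breaker)
            pieces = [p + breaker for p in parts[:-1]] + [parts[-1]]
            return ([pieces[0].rstrip()]
                    + ["   " + p.strip() for p in pieces[1:]])
    return [line]


statement = 's = a + ", " + b + ", " + c;'
for piece in break_at_breakers(statement, ['+ ", " +'], min_line_length=10):
    print(piece)
```

With the threshold lowered to 10 for the demonstration, the concatenation is split into three lines, each ending at a breaker; with the default of 60 the same short statement is left on one line.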
