Implementing Target Coding Standards

The default translations produced by gmBasic are generic and designed to be compilable even in situations where the target code is not fully mature. These translations are useful for most applications; however, they may not meet the desired coding standards. gmCodeStyle.exe is a Custom Translation Engine distributed with gmStudio that demonstrates how to produce translations that follow alternative coding standards. It is a .NET assembly implemented in C# using the gmAPI framework.

Contact us if you would like to see gmCodeStyle in action.

The transformations performed by CodeStyle take particular advantage of the following features of the tool:

There are two identifiers maintained for each symbol -- source and target. By default the target identifier is set equal to the source, but it can be changed at will. This makes changing the naming conventions relatively simple.
Code to be made available to the tool is linked into a dynamic-link-library that is then executed by the tool when certain events occur. A key event is the FinishAnalyser event that is triggered when the underlying code has been completed, but before it is passed to the author for surface-form formulation.
The tool has built in code for authoring declarations; however, there is an AuthorDeclaration event which can be used to override the default declaration.
When the tool authors the final target code, it does not write it directly to a file; instead, it enters it into a stored text buffer. There is an EditTranslation event and an extensive text-editing service that can be used to change the content of the text buffer before it is finally sent to the output file.

CodeStyle.std.xml

gmCodeStyle.exe uses a specification file that directs the coding style conventions. The specification file is written as an XML script and placed in the target location (workspace\usr folder) of the migration project. Its full name is %targetLocation%\CodeStyle.%sysId%.xml. A sample, CodeStyle.std.xml, is installed with gmStudio and may be activated in your project using the gmStudio Configuration form displayed by clicking Settings on the toolbar. The initial record for this file must be <CodeStyle> and it must end with </CodeStyle>. Between those two tags are the various code style specification commands.

The Messages Command

The Messages command specifies the syntax to be used for warning messages needed during processing. The Messages command has a set of Entry subcommands with the following attributes:

Attribute	Description
id	Specifies the identifier of the message. The only current one is RENAME which is issued when the process attempts to introduce a new name in the target symbol table which might cause a name clash.
name	Specifies the actual message to be issued.

The sample below uses a format that is compatible with similar messages produced by the tool.

<Messages>
   <Entry id="Rename" name="UPGRADE_TODO: identifier [$1d] for $2d already defined for $3d" />
</Messages>

For RENAME argument $1d is the created identifier that is causing the clash; argument $2d is the fully-qualified identifier of the component that was to receive the identifier; and argument $3d is the fully-qualified identifier of the component that already has the identifier.
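The substitution of the three arguments can be pictured with a small Python sketch. It is only an illustration of the placeholder convention described above, not the tool's own implementation; the function name format_message and the sample identifiers are hypothetical.

```python
import re

def format_message(template: str, args: list[str]) -> str:
    """Replace each $Nd placeholder with the N-th argument (1-based)."""
    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1
        # Assumption: a placeholder with no matching argument is left as-is.
        return args[index] if 0 <= index < len(args) else match.group(0)
    return re.sub(r"\$(\d+)d", substitute, template)

template = "UPGRADE_TODO: identifier [$1d] for $2d already defined for $3d"
# Hypothetical identifiers: the clashing name, the component that was to
# receive it, and the component that already has it.
message = format_message(
    template, ["Data", "Sample.modMain.strData", "Sample.modMain.objData"]
)
print(message)
```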

The Indent Command

The gmBasic tool keeps track of indentation level as it authors the target code. The Indent command can be used to specify how much white space is to be associated with each indentation level. The only attribute of this command is Value which specifies a value greater than or equal to zero. A value of zero indicates that a tab should be used for each indentation level; while a nonzero value of n specifies that n spaces should be associated with each level. Thus, the following CodeStyle file

<CodeStyle>
    ...
   <Indent value="4" />
    ...
</CodeStyle>
produces well-indented code with 4 spaces allocated for each indentation level.
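The rule for the value attribute can be summarized in a few lines of Python. This is a sketch of the rule as stated above (zero means one tab per level, n means n spaces per level); the helper names are hypothetical and are not part of gmBasic.

```python
def indent_unit(value: int) -> str:
    """Return the white space used for one indentation level."""
    if value < 0:
        raise ValueError("the Indent value must be greater than or equal to zero")
    # Zero selects a tab; any other value selects that many spaces.
    return "\t" if value == 0 else " " * value

def indent_line(text: str, level: int, value: int) -> str:
    """Prefix a line of authored code with 'level' indentation units."""
    return indent_unit(value) * level + text

print(repr(indent_line("return fieldValue;", 2, 4)))
print(repr(indent_line("return fieldValue;", 2, 0)))
```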

Note that the <Select indent="width"> command in the translation script may also be used to set the indent. If it is inserted in that script immediately before the Author command, it overrides the CodeStyle entry here, since the indentation value is set when the command is read via the StartPass2 event handler.

The Hungarian Command

The Hungarian command addresses the fact that some VB6 code uses Hungarian prefix notation to indicate the binary type of quantity symbols. The goal is to remove these prefixes from the target code and then, possibly, to apply other conventions when naming the target symbols. Renaming is triggered by the presence of a list of source code prefixes within the Hungarian command; however, considerably more machinery is required to keep the target code compilable.

The Hungarian command has a set of subcommands that organize the needed information. The following subtopics describe the subcommands themselves, and then describe the algorithms that apply them.

The Rename Subcommand

The Rename subcommand can appear anywhere within the Hungarian command. It changes the authored name of a symbol and blocks the application of any of the renaming algorithms specified to that name. The attributes of the Rename statement are as follows:

Attribute	Description
Identifier	This required identifier attribute specifies the component to be renamed. It is specified relative to the root of the symbol table -- i.e., it is a fully qualified identifier. It is expected that the same CodeStyle script will be used by multiple code sets. If an undefined identifier is encountered, it is simply assumed to apply to a different code set and is skipped.
Content	This required attribute specifies the name to be used for the component in the target code.

The Rename subcommand is applied as the Hungarian command is being read which means that it applies before any of the code style specific algorithms are applied. Note that refactoring Rename commands may be entered in the translation scripts themselves and cause the same blocking of the code style algorithms for individual identifiers.

The SourcePrefixes Subcommand

The SourcePrefixes subcommand specifies the binary type Hungarian prefixes. Only variables are assumed to have type prefixes. It is the presence of a SourcePrefixes subcommand that triggers steps 3 through 7 of the renaming algorithm. The command itself introduces a series of Entry subcommands each of which has two required attributes:

Attribute	Description
Type	Specifies the binary type that has a certain prefix. The possible binary type identifiers are listed below.
Value	Specifies the actual Hungarian prefix in case sensitive form. If a variable of the type indicated by the Type attribute has this prefix then that prefix is stripped.

The types fall into four groups. First are the basic VB6 types:

VB6	.NET Equivalent C#, VB.NET
Byte	byte, Byte
Short	short, Short
Integer	int, Integer
Long	long, Long
Currency	decimal, Decimal
Single	float, Single
Double	double, Double
String	string, String
Boolean	bool, Boolean
Date	DateTime
Variant	object, Object
Object	object, Object
User	object, Object
Control	System.Windows.Forms.Control

Second, there are the special processing types used by gmBasic to deal with various special circumstances:

Vb6Special	.NET Equivalent C#, VB.NET
Icon	System.Drawing.Icon
FrxPicture	System.Drawing.Image
Any	object, Object
TwipsX	int, Integer
TwipsY	int, Integer
UnsInteger	uint, Integer
WinPanel	System.Windows.Forms.GroupBox
VarArray	Object[], Object()
StringPtr	System.Text.StringBuilder, String
CallHwnd4	MigrationSupport.Vb7_Callback.Hwnd4
ControlCollection	System.Windows.Forms.Control.ControlCollection
CheckedListBox	System.Windows.Forms.ListBox
Exception	System.Exception
SafeArray	System.Array
SecurityManager	UserSecurityManager
Dynamic	dynamic
ValueType	object, Object

Third are the VB6 classes:

Vb6Class	.NET Equivalent C#, VB.NET
PictureBox	System.Windows.Forms.PictureBox
Label	System.Windows.Forms.Label
TextBox	System.Windows.Forms.TextBox
Frame	System.Windows.Forms.GroupBox
CommandButton	System.Windows.Forms.Button
CheckBox	System.Windows.Forms.CheckBox
OptionButton	System.Windows.Forms.RadioButton
ComboBox	System.Windows.Forms.ComboBox
ListBox	System.Windows.Forms.ListBox
HScrollBar	System.Windows.Forms.HScrollBar
VScrollBar	System.Windows.Forms.VScrollBar
Timer	System.Windows.Forms.Timer
Printer	MigrationSupport.Printer
Form	System.Windows.Forms.Form
DriveListBox	Microsoft.VisualBasic.Compatibility.VB6.DriveListBox
DirListBox	Microsoft.VisualBasic.Compatibility.VB6.DirListBox
FileListBox	Microsoft.VisualBasic.Compatibility.VB6.FileListBox
Menu	System.Windows.Forms.ToolStripMenuItem
MDIForm	System.Windows.Forms.Form
Shape	System.Windows.Forms.Label
Line	System.Windows.Forms.Label
Image	System.Windows.Forms.PictureBox
Data	MigrationSupport.DataControl.DataControl
PropertyPage	MigrationSupport.PropertyBag
TabControl	System.Windows.Forms.TabControl
ErrObject	VBNET.ErrObject, ErrObject

Fourth are the VB6 enumerations:

Vb6Enumeration	.NET Equivalent C#, VB.NET
SimpleBorderStyle	System.Windows.Forms.BorderStyle
KeyCodeConstants	System.Windows.Forms.Keys
LogEventTypeConstants	System.Diagnostics.EventLogEntryType
DrawStyle	MigrationSupport.Utils.DrawStyle
DrawMode	MigrationSupport.Utils.DrawMode
MousePointerConstants	System.Windows.Forms.Cursor
WindowStyle	VBNET.AppWinStyle, AppWinStyle
OpenMode	VBNET.OpenMode, OpenMode
vbTristate	VBNET.TriState, TriState
ScaleType	MigrationSupport.Utils.ScaleType
VbCompareMethod	VBNET.CompareMethod, CompareMethod
VbFileAttribute	VBNET.FileAttribute, FileAttribute
MsgBoxResult	VBNET.MsgBoxResult, MsgBoxResult
VbMsgBoxStyle	VBNET.MsgBoxStyle, MsgBoxStyle
VariableType	VBNET.VariantType, VariantType
ButtonAppearanceStyle	System.Windows.Forms.Appearance
ApplicationStartMode	MigrationSupport.Utils.StartMode
MouseButtonConstants	System.Windows.Forms.MouseButtons
ResourceType	MigrationSupport.Utils.ResourceType
FirstDayOfWeek	VBNET.FirstDayOfWeek, FirstDayOfWeek
FirstDayOfYear	VBNET.FirstDayOfYear, FirstDayOfYear
DueDate	VBNET.DueDate, DueDate
AlignConstants	MigrationSupport.Utils.AlignConstants
CheckboxConstants	System.Windows.Forms.CheckState
AlignmentConstants	System.Drawing.ContentAlignment
BorderStyle	System.Windows.Forms.FormBorderStyle
ComboBoxStyle	System.Windows.Forms.ComboBoxStyle
ColorConstants	System.Drawing.Color
LayoutArrangement	MdiLayout
RLDirection	System.Windows.Forms.RightToLeft
ShiftConstants	MigrationSupport.Utils.ShiftConstants
BackStyle	MigrationSupport.Utils.BackStyleConstants
QueryUnloadConstants	MigrationSupport.Utils.QueryUnloadConstants
ClipboardConstants	MigrationSupport.Utils.ClipboardConstants

The type designator Object refers to any external type and the type designator User refers to any user defined type. Below is a simple SourcePrefixes specification.

 <SourcePrefixes >
    <Entry type="Boolean" value="bln" />
    <Entry type="String"  value="str" />
    <Entry type="Integer" value="lng" />
    <Entry type="User"    value="obj" />
    <Entry type="Object"  value="dic" />
 </SourcePrefixes>

It is permissible to have multiple types with the same prefix or multiple prefixes with the same type. The search for a prefix to strip proceeds in the order in which the entries were specified and stops at the first entry whose type and prefix both match.

The ExcludedSuffixes Subcommand

One of the potential problems with stripping identifiers of their Hungarian prefixes is that there will be symbols whose identifiers are distinguished only by their prefixes. The ExcludedSuffixes command specifies these symbols. Any identifier that ends in one of the excluded suffixes is excluded from the renaming algorithm. The comparison is case insensitive. The actual list of symbols can be entered as a single semicolon delimited list using a single value attribute. This might look as follows

 <ExcludedSuffixes value="Data;Connection;ErrorMessage;Table;Field;IndexName" />

Alternatively a group of Entry subcommands, each with a value attribute, can be used to specify the suffixes individually. This might look as follows

 <ExcludedSuffixes >
    <Entry value="Data" />
    <Entry value="Connection" />
    <Entry value="ErrorMessage" />
    <Entry value="Table" />
    <Entry value="Field" />
    <Entry value="IndexName" />
 </ExcludedSuffixes >

The two forms above would produce the same result.
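The equivalence of the two forms, and the case-insensitive suffix test, can be checked with a short Python sketch using the standard xml.etree.ElementTree parser. It only illustrates the behavior described above; read_values and is_excluded are hypothetical helper names, not functions of gmCodeStyle.

```python
import xml.etree.ElementTree as ET

def read_values(xml_text: str) -> list[str]:
    """Collect values from a semicolon list and/or Entry subcommands."""
    element = ET.fromstring(xml_text)
    values = [v for v in (element.get("value") or "").split(";") if v]
    for entry in element.findall("Entry"):
        if entry.get("value"):
            values.append(entry.get("value"))
    return values

def is_excluded(name: str, suffixes: list[str]) -> bool:
    """Case-insensitive test for an excluded suffix."""
    lowered = name.lower()
    return any(lowered.endswith(suffix.lower()) for suffix in suffixes)

compact = '<ExcludedSuffixes value="Data;Connection;ErrorMessage;Table;Field;IndexName" />'
expanded = """<ExcludedSuffixes>
   <Entry value="Data" />
   <Entry value="Connection" />
   <Entry value="ErrorMessage" />
   <Entry value="Table" />
   <Entry value="Field" />
   <Entry value="IndexName" />
</ExcludedSuffixes>"""

suffixes = read_values(compact)
print(suffixes == read_values(expanded))
print(is_excluded("strErrorMessage", suffixes), is_excluded("blnReadAll", suffixes))
```

The same reading logic would apply to the other list-style subcommands that accept either a single value attribute or a group of Entry subcommands.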

The StatusPrefixes Subcommand

In addition to the binary type Hungarian prefixes there are sometimes also various types of status Hungarian prefixes which must be stripped before the actual type prefixes can be examined. The StatusPrefixes command specifies these prefixes. Any identifier that begins with one of these prefixes has that prefix stripped off. The comparison is case insensitive. The actual list of prefixes can be entered as a single semicolon delimited list using a single value attribute. This might look as follows

 <StatusPrefixes value="m_;i_;o_;io_;l_" />

Alternatively a group of Entry subcommands, each with a value attribute, can be used to specify the prefixes individually. This might look as follows

 <StatusPrefixes >
    <Entry value="m_" />
    <Entry value="i_" />
    <Entry value="o_" />
    <Entry value="io_" />
    <Entry value="l_" />
 </StatusPrefixes >

The two forms above would produce the same result.

The GlobalPrefixes Subcommand

Nonlocal variables often carry an additional prefix that marks them as nonlocal and that precedes the type prefix. For example, a global Boolean variable might be named "gblnReadAll", so these prefixes are assumed to combine with the Hungarian type prefix. They need to be checked for and stripped as well. They are specified via the GlobalPrefixes command. The comparison is case insensitive. The actual list of prefixes can be entered as a single semicolon delimited list using a single value attribute. This might look as follows

<GlobalPrefixes value="g;m" />

Alternatively a group of Entry subcommands, each with a value attribute, can be used to specify the prefixes individually. This might look as follows

<GlobalPrefixes >
   <Entry value="g" />
   <Entry value="m" />
</GlobalPrefixes >

The two forms above would produce the same result.

The NamingStyle Subcommand

The naming style algorithm is made possible by the fact that the modern target languages are case sensitive while the historical source languages are case insensitive. This allows modern naming styles to distinguish different symbol types based solely on the case pattern of their identifiers. The key notion here is CamelCase, which is the practice of writing compound names such that each word or abbreviation within the name begins with a capital letter. Camel case may start with a capital or lowercase letter. As an example, consider the identifier CamelCase itself beside its possible alternative camelCase. In general, the naming style algorithm recognizes four case styles:

Style	Description
lowercase	All the alphabetic characters in the identifier are lowercase as in "lowercase"
uppercase	All the alphabetic characters in the identifier are uppercase as in "UPPERCASE"
lowercamel	Each word in the identifier begins with an uppercase character followed by lowercase characters, except that the first character of the identifier is lowercase, as in "lowerCamel"
uppercamel	Each word in the identifier begins with an uppercase character followed by lowercase characters, as in "UpperCamel"

The NamingStyle subcommand itself specifies the naming style to be associated with symbol names. This command has the following attributes:

Attribute	Description
Style	Specifies the naming style to be used. It has 5 possible entries -- Original, LowerCase, UpperCase, LowerCamel, and UpperCamel. The Original style resets the name to its original form as of the end of the renaming algorithm. The other styles are discussed above.
Object	Specifies the object type of the symbol. It has the following possible entries -- Subprogram, Variable, Constant, Property, Declaration, Structure, Enumeration, EnumeratedEntry, StatementLabel, Event, Vb_Name.
Access	Specifies the access type of the symbol. It has the following possible entries -- Local, Public, Private.
Type	Specifies a binary type. The possible binary type identifiers are discussed under the SourcePrefixes subcommand.
Prefix	In addition to the case style of the name a prefix can be added to the front of the name as well. This attribute specifies that prefix. Note that combining these prefixes with types allows the reintroduction of Hungarian notation in the target names, if that is desired.

Here is a sample set of NamingStyle entries.

 <NamingStyle>
    <Entry style="Original" object="Vb_name" />
    <Entry style="lowerCamel" access="local" />
    <Entry style="lowerCamel" access="Private" object="Variable" prefix="_" />
 </NamingStyle >
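As a rough illustration of what a single Entry does to a name, the Python sketch below applies one of the four case styles and an optional prefix to a name that is already in UpperCamel form. It is a simplification under stated assumptions: the Original style and the selection of entries by object, access, and type are not modeled, and apply_style is a hypothetical name.

```python
def apply_style(upper_camel_name: str, style: str, prefix: str = "") -> str:
    """Apply a case style and an optional prefix to an UpperCamel name."""
    key = style.lower()
    if key == "lowercase":
        styled = upper_camel_name.lower()
    elif key == "uppercase":
        styled = upper_camel_name.upper()
    elif key == "lowercamel":
        # Only the first character changes; the word boundaries are kept.
        styled = upper_camel_name[:1].lower() + upper_camel_name[1:]
    elif key == "uppercamel":
        styled = upper_camel_name
    else:
        raise ValueError(f"style not modeled in this sketch: {style}")
    return prefix + styled

# Corresponds to style="lowerCamel" access="Private" object="Variable" prefix="_"
print(apply_style("ReadAll", "lowerCamel", prefix="_"))
```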

The SpecialNames Subcommand

There are some special names specified in the gmBasic language files, such as arguments to event handlers, that are also referenced by micro-code in the language files. These cannot be changed via this set of specifications. The ones generated by the client code translations must be listed as SpecialNames so that they are not changed. The comparison is case insensitive. The actual list of special names can be entered as a single semicolon delimited list using a single value attribute. This might look as follows

<SpecialNames value="Cancel;UnloadMode" />

Alternatively a group of Entry subcommands, each with a value attribute, can be used to specify the special names individually. This might look as follows

<SpecialNames >
   <Entry value="Cancel" />
   <Entry value="UnloadMode" />
</SpecialNames >

The two forms above would produce the same result.

The Acronyms Subcommand

The NamingStyle algorithm has no way of locating words within compound names, because it does not know what the names are. There is one exception to this -- acronyms like "SQL" or "XML". The Acronyms command specifies a list of acronyms or simply words which should be entered in a particular style in the target name. The individual entries are specified in their desired target language form. The algorithm does a case insensitive search of each name for the entry and, if found, substitutes the target form for the original form. The actual list of acronyms can be entered as a single semicolon delimited list using a single value attribute. This might look as follows

<Acronyms Value="Xml;Sql" />

Alternatively a group of Entry subcommands, each with a value attribute, can be used to specify the acronyms individually. This might look as follows

<Acronyms >
   <Entry Value="Xml" />
   <Entry Value="Sql" />
</Acronyms >

The two forms above would produce the same result.
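The case-insensitive search and substitution can be sketched in Python as follows. This illustrates the behavior described above and is not the tool's implementation; apply_acronyms is a hypothetical name, and the sketch matches the entry anywhere in the name, which is an assumption.

```python
import re

def apply_acronyms(name: str, acronyms: list[str]) -> str:
    """Replace each case-insensitive occurrence of an entry with its target form."""
    for acronym in acronyms:
        # A function is used as the replacement so the target form is inserted literally.
        name = re.sub(re.escape(acronym), lambda match: acronym, name, flags=re.IGNORECASE)
    return name

print(apply_acronyms("LoadXMLFromSQL", ["Xml", "Sql"]))
```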

The ReservedWords Subcommand

The NamingStyle algorithm can form reserved words like default or in. These can be repaired by changing their case. The ReservedWords command specifies the list of reserved words in the form in which they can be used as identifiers in the target code. The algorithm does a case insensitive search of each name for the entry and, if found, substitutes the target form for the original form. The actual list of reserved words can be entered as a single semicolon delimited list using a single value attribute. This might look as follows

<ReservedWords value="Default;String;In" />

Alternatively a group of Entry subcommands, each with a value attribute, can be used to specify the reserved words individually. This might look as follows

<ReservedWords >
   <Entry Value="Default" />
   <Entry Value="String" />
   <Entry Value="In" />
</ReservedWords >

The two forms above would produce the same result.

The LoopVariables Subcommand

The LoopVariables command changes the names of loop variables. There is a common convention in code bases to use simple identifiers like i or j for loop variables. These simple identifiers can be difficult to find and/or trace in the target code. This command changes these identifiers to something more readable like "index". The LoopVariables command has a set of Entry subcommands with the following attributes:

Attribute	Description
id	Specifies the identifier in the source code of a loop variable to be renamed. Comparison is case sensitive.
name	Specifies the identifier in the target code to be used for the loop variable.

The change only applies to variables that are explicitly used as a counter in a For loop. A possible LoopVariables specification might be as follows.

<LoopVariables >
   <Entry id="i" name="loopIndex" />
</LoopVariables>

Whenever new identifiers are introduced, name clashes can occur. As a result of using the above, the following error might occur when the target code is compiled.

sample.cs(1666,14): error CS0136: A local variable named 'loopIndex' cannot be declared in this scope
   because it would give a different meaning to 'loopIndex', which is already used in a 'parent or current'
   scope to denote something else [C:\temp\Sample.csproj]

Notice that the scope rules can be complicated. The .NET languages do not support strictly hierarchical symbol scope. The only solution is to use a different new name.

Algorithm to Strip Source Identifiers

The renaming algorithm is the first algorithm applied to the target code. It is applied after all code in a given code unit has been compiled and analyzed. At this point in time, in addition to the compiled code there is also a symbol table. Though additional renaming can occur during a later code scan, the bulk of the renaming process is done through a scan of the symbol table. It begins by applying the source specifications so that a root identifier is formed, which can then be used to form a target identifier. It proceeds as follows:

1. The symbol table is scanned looking for any symbols that are a subprogram, variable, constant, property, declare, structure, enumeration, enumeration entry, statement label, event, or class name. These are the types of symbols that can be renamed here. The following steps apply to each one of these symbols separately. Note that any symbol that already has a target name associated with it via a Rename command is skipped as well.
2. The access type of the symbol is determined -- local, public, or private.
3. If the source code used Hungarian notation, then the source Hungarian prefixes can be removed. The specification commands include a Hungarian command which supplies the prefix used for each binary and access type combination. The presence of this specification triggers the prefix removal steps.
4. In source codes there will often be symbols whose identifiers are only distinguished by their Hungarian prefixes. A list of these symbols is supplied via an ExcludedSuffixes command.
5. In source codes there are often symbol status codes that precede the actual Hungarian type prefix. These must be checked for first and stripped from the identifier. They are specified via a StatusPrefixes command.
6. Nonlocal variables often also have a prefix used to indicate that they are not local. These must be checked for as well and be stripped. They are specified via the GlobalPrefixes command.
7. Finally the actual Hungarian prefixes can be stripped.
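The prefix removal steps can be pictured with the Python sketch below. It is a minimal illustration under several assumptions that the text above does not settle: a global prefix is only stripped when a matching type prefix follows it, a prefix is never stripped if that would leave an empty name, and a name with no matching type prefix keeps only its status prefix removed. The function name and the sample data are hypothetical.

```python
def strip_identifier(name, binary_type, type_prefixes,
                     status_prefixes=(), global_prefixes=(), excluded_suffixes=()):
    """Return the root identifier left after removing the source prefixes."""
    lowered = name.lower()
    # Excluded suffixes (case insensitive) block the renaming entirely.
    if any(lowered.endswith(suffix.lower()) for suffix in excluded_suffixes):
        return name
    # Status prefixes (case insensitive) are checked first and stripped.
    rest = name
    for prefix in status_prefixes:
        if rest.lower().startswith(prefix.lower()):
            rest = rest[len(prefix):]
            break
    # Global prefixes (case insensitive) may precede the type prefix.
    candidates = [rest]
    for prefix in global_prefixes:
        if rest.lower().startswith(prefix.lower()):
            candidates.append(rest[len(prefix):])
    # Type prefixes are case sensitive and searched in the order specified.
    for candidate in candidates:
        for entry_type, prefix in type_prefixes:
            if (entry_type == binary_type and candidate.startswith(prefix)
                    and len(candidate) > len(prefix)):
                return candidate[len(prefix):]
    return rest

spec = dict(
    type_prefixes=[("Boolean", "bln"), ("String", "str"), ("Integer", "lng")],
    status_prefixes=["m_", "i_", "o_", "io_", "l_"],
    global_prefixes=["g", "m"],
    excluded_suffixes=["Data", "Connection", "ErrorMessage"],
)
print(strip_identifier("gblnReadAll", "Boolean", **spec))
print(strip_identifier("m_strName", "String", **spec))
```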

Algorithm to Form Target Identifiers

Once the source symbol has been stripped of its Hungarian annotations, the target language naming styles can be applied. The actual application of this algorithm depends upon the presence of a NamingStyle subcommand within the Hungarian command. Any identifiers skipped because they were explicitly renamed are not changed by this algorithm. Also, before the names can be changed into one of the style forms they first need to be changed into a standard form from which the other styles can be derived. That standard form is UpperCamel. The problem is that the algorithm here has no way of breaking possibly compound names into their component words. Fortunately, many code bases use the underscore character in symbol names to separate their words. At this point then the algorithm looks for names like "KEY_QUERY_VALUE" and changes them to "KeyQueryValue". Any name that does not have this form is simply changed by making its first character upper case. Some typical name changes at this point might be as follows:

Original	Changed
KEY_ALL_ACCESS	KeyAllAccess
READ_CONTROL	ReadControl
STANDARD_RIGHTS_READ	StandardRightsRead
SYNCHRONIZE	Synchronize
KEY_READ	KeyRead
dwType	DwType
szData	SzData
cbData	CbData
ctlReadyToGenerate	CtlReadyToGenerate
enumOperationMode	EnumOperationMode
ctlSelectDatasource	CtlSelectDatasource

When the algorithm applies, the binary, component, and access types of the symbol underlying the identifier are all known. The algorithm itself proceeds as follows:

Exclude any SpecialNames that are referenced by the micro-code in the language files
Convert the names into uppercamel form when the word boundaries can be detected.
Apply the specifications in the NamingStyle command
Repair any ReservedWords that may have been formed

The DoNotInitialize Command

The DoNotInitialize command removes default initializations of variables and fields when they are not needed to avoid using an uninitialized value. The default translations produced are generic and designed to be compilable even in situations where the target code is not fully mature. By default, all variable and field declarations have an initialization value specified regardless of need.

The Fields Subcommand

The Fields subcommand requests that fields that have a public access type not be supplied with a default initialization value. The subcommand is a singleton with no attributes. It appears as follows.

<Fields />

The Variables Subcommand

The Variables subcommand requests that local variables that are assigned a value within the code, not also be assigned a default value. Simply being assigned a value is too weak. The actual test used here traces all references to the variable to make certain that no nested use of the variable is on a possibly unassigned path through the code. The subcommand is a singleton with no attributes. It appears as follows.

<Variables />

The OutParameters Subcommand

The OutParameters subcommand examines all parameters that are being passed ByRef to determine if their values are being changed before they are being used. If so, then they can be reclassified as being ByOut. The actual command is a singleton with no attributes. It looks like this

<OutParameters />

The tool already removes ByRef specifications from parameters that are not changed by their subprograms, making them ByVal; therefore, the only additional check needed is to verify that the first change precedes the first use. This check is equivalent to the one done by the Variables subcommand, and is the first reason why this operation is included under the DoNotInitialize command. Making this additional check in the code and changing the reference status of the parameter is straightforward. The problem is that Out parameters have an additional requirement. Making the simple change causes errors of the following sort.

sample.cs(669,6): error CS0177: The out parameter 'script' must be assigned to before control
leaves the current method [C:\temp\Sample.csproj]

Out parameters must be assigned along ALL PATHS before control leaves the method. In cases where there are branching or conditional statements, the parameters may not get assigned on all paths through the code. The simple solution would be to add initialization code to the start of the methods for all out parameters, but this would often lead to redundant initializations. Again the same algorithm that tests variables can be used here.

The transformation converts many ByRefs to ByOuts, but not all. The code inserts initialization code only if absolutely needed. Even code assignments to variables are removed if those variables are then directly passed ByOut.

The ByRef to ByOut status change is made during the first pass through the operation code. Of course, if a parameter is ByOut, then any arguments passed to it elsewhere in the code must be annotated with out as opposed to ref. The repair of these annotations is then done during the second pass through the operation code.

The SimpleProperty Command

The SimpleProperty command checks for simple getter/setter properties whose operation codes match a code pattern and then reauthors them using a specified .NET surface form pattern. As an example consider a VB6 property source pattern that always includes On Error GoTo error handling code. In the .NET implementation this error handling code is to be removed and the properties are to be authored using an internal declaration. Here is a sample code

Private fieldValue As ValueType
Friend Property Get PropValue() As ValueType
   On Error GoTo ErrorHandler
   PropValue = fieldValue
   Exit Property
ErrorHandler:
   ...
End Property
Friend Property Let PropValue(ByVal myValue As ValueType)
   On Error GoTo ErrorHandler
   fieldValue = myValue
   Exit Property
ErrorHandler:
   ...
End Property

The default translation for this property as produced by the tool is

private ValueType fieldValue = "";
public ValueType PropValue
{
   get
   {
      ValueType PropValue = "";
      try
      {
         PropValue = fieldValue;
         return PropValue;
      }
      catch(Exception exc)
      {
         ...
      }
      return PropValue;
   }
   set
   {
      try
      {
         fieldValue = value;
         return;
      }
      catch(Exception exc)
      {
         ...
      }
   }
}

The desired translation for these properties is

private ValueType fieldValue;
internal ValueType PropValue
{
   get { return fieldValue; }
   set { fieldValue = value; }
}

Notice first of all, that the tool converts the error handling into try-catch. Also most code style transformations will involve renaming symbols so the target names for the symbols will possibly change as will the identifier for the .NET implementation of ValueType. It might be possible to write editing code that looks for patterns in the actual authored code, but it would be very difficult. The operation code, however, for these is very patterned.

Actual csh Codeblock Associated with Get:
Opcode | Operation support information
------ | -----------------------------
NEW    | 25 On Error GoTo ErrorHandler
NEW    | 27 PropValue = fieldValue
ERR    | Try
LEV    | Nest0
LDA    | Variable:fieldValue:610921
ARG    | ValueType
LDA    | Property:PropValue:610968
STR    | AssignValue
NEW    | 29 Exit Property
LDA    | Property:PropValue:610968
EXI    | Function
ERR    | Catch1
    ...
ERR    | Catch3

Actual csh Codeblock Associated with Let:
Opcode | Operation support information
------ | -----------------------------
NEW    | 36 On Error GoTo ErrorHandler
NEW    | 38 fieldValue = myValue
ERR    | Try
LEV    | Nest0
SPV    | Value
ARG    | String
LDA    | Variable:fieldValue:610921
STR    | AssignValue
NEW    | 40 Exit Property
EXI    | Property
ERR    | Catch1
  ...
ERR    | Catch3

Note that the above requires that the type of the property and the type of the variable are the same. This is not true or necessary in general. The only requirement is that the two types can be cast to each other.

The SimpleProperty subcommands specify the code patterns for the getters and setters or letters that qualify them for simplification and the actual syntax of the simplified target code. These must all be specified via this command. There are also two optional commands that deal with public field properties and enumerators.

The Getter Subcommand

The Getter subcommand specifies a set of code patterns that a given property getter must match if it is to be authored in a simpler way. It has a series of Entry subcommands that specify the actual code patterns. Here is the specification for the above example along with a second pattern for a getter that has no try-catch.

<Getter>
   <Entry value="NEW,NEW,ERR.Try,LEV,LDA,ARG,LDA,STR.AssignValue,NEW,LDA,EXI,ERR.Catch1,...,ERR.Catch3" />
   <Entry value="NEW,Argument,EXI.Function" />
</Getter>

Note that the initial LDA operation is assumed to specify the field that contains the value. There can be multiple code patterns specified, if needed.

The Setter Subcommand

The Setter subcommand specifies a set of code patterns that a given property letter or setter must match if it is to be authored in a simpler way. It has a series of Entry subcommands that specify the actual code patterns. Here is the specification for the above example along with a second pattern that has no try-catch.

<Setter>
   <Entry value="NEW,NEW,ERR.Try,LEV,SPV.Value,ARG,LDA,STR.AssignValue,NEW,EXI,ERR.Catch1,...,ERR.Catch3" />
   <Entry value="Argument,LDA,STR.AssignValue" />
</Setter>

There can be multiple code patterns specified, if needed. To be eligible for a simplification the getter and setter codes must match at least one of their specified patterns.
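
As an illustration of how an Entry pattern might be compared against a codeblock, here is a minimal Python sketch. It is not the gmCodeStyle implementation. It assumes that each operation is written as OPCODE.Subcode (for example ERR.Try), that a pattern token without a subcode matches any operation with that opcode, and that "..." matches any run of operations. The function name matches_entry is hypothetical, the operations elided between Catch1 and Catch3 in the listing are simply left out, and tokens such as Argument in the shorter patterns are not modeled.

```python
def matches_entry(pattern, operations):
    """Return True if the operation list fits the comma-separated pattern."""
    tokens = pattern.split(",")

    def token_matches(token, operation):
        # "LDA" matches "LDA.Variable"; "ERR.Try" must match exactly.
        return operation == token or operation.split(".")[0] == token

    def walk(t, o):
        if t == len(tokens):
            return o == len(operations)
        if tokens[t] == "...":
            # Assumed wildcard: skip zero or more operations.
            return any(walk(t + 1, k) for k in range(o, len(operations) + 1))
        return (o < len(operations)
                and token_matches(tokens[t], operations[o])
                and walk(t + 1, o + 1))

    return walk(0, 0)


GETTER = ("NEW,NEW,ERR.Try,LEV,LDA,ARG,LDA,STR.AssignValue,"
          "NEW,LDA,EXI,ERR.Catch1,...,ERR.Catch3")

# The Get codeblock shown earlier, written in the assumed OPCODE.Subcode form.
get_ops = ["NEW", "NEW", "ERR.Try", "LEV.Nest0", "LDA.Variable",
           "ARG.ValueType", "LDA.Property", "STR.AssignValue", "NEW",
           "LDA.Property", "EXI.Function", "ERR.Catch1", "ERR.Catch3"]

print(matches_entry(GETTER, get_ops))
```

A property would qualify only when both its getter and its setter or letter pass such a check against at least one of their entries.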

The AuthorSame Subcommand

The AuthorSame subcommand contains the patterned text block that specifies how the simplified property is to be authored when the type of the property and the type of the value are the same. Note the manner in which the text is surrounded by CDATA directives. These are required in the form shown. Also the dollar sign, as opposed to the percent sign, is used to mark the locations of the variable strings in the pattern.

<AuthorSame><![CDATA[
private $1d $3d;
$5d $4d $2d
{
   get { return $3d; }
   set { $3d = value; }
}
]]></AuthorSame>

The patterns assume five variable strings, as follows:

$1d is the .NET identifier of the value type.
$2d is the target form of the property identifier. This may well be the output of the renaming algorithms.
$3d is the target form of the field identifier. This may well be the output of the renaming algorithms.
$4d is the .NET identifier of the property type.
$5d is the .NET scope specification. If the property was Public or Friend then it is "public" else it is "internal".

The AuthorDifferent Subcommand

The AuthorDifferent subcommand contains the patterned text block that specifies how the simplified property is to be authored when the type of the property and the type of the value are different.

<AuthorDifferent><![CDATA[
private $1d $3d;
$5d $4d $2d
{
   get { return ($4d)$3d; }
   set { $3d = ($1d)value; }
}
]]></AuthorDifferent>

The low level required syntax and string values are as specified above.
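
The way the two text blocks are filled in can be sketched in a few lines of Python. This is an illustration only, not the tool's code: the function name author_property and the sample values are hypothetical, and the sketch assumes that the $1d through $5d markers are replaced positionally by the value type, property name, field name, property type, and scope.

```python
import re

AUTHOR_SAME = """private $1d $3d;
$5d $4d $2d
{
   get { return $3d; }
   set { $3d = value; }
}"""

AUTHOR_DIFFERENT = """private $1d $3d;
$5d $4d $2d
{
   get { return ($4d)$3d; }
   set { $3d = ($1d)value; }
}"""


def author_property(value_type, prop_name, field_name, prop_type, scope):
    """Fill the $1d..$5d markers of the appropriate template."""
    strings = [value_type, prop_name, field_name, prop_type, scope]
    template = AUTHOR_SAME if value_type == prop_type else AUTHOR_DIFFERENT
    # "$3d" -> strings[2], and so on; the markers are one-based.
    return re.sub(r"\$(\d)d", lambda m: strings[int(m.group(1)) - 1], template)


# Hypothetical sample values, not taken from a real translation.
print(author_property("string", "PropValue", "fieldValue", "string", "public"))
```

When the two types differ, the same call selects the AuthorDifferent block, so the casts appear only where they are needed.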

The PublicFields Subcommand

Many code style standards forbid the use of global fields. They prefer global properties. In .NET there are auto-properties that can be used to define what were simply global fields in VB6.

Public GlobalField As fieldType

can be authored in C# as

 public static fieldType GlobalField { get; set;}

The PublicFields subcommand specifies that global fields be authored differently from their default. It simply specifies the text block to be used for the authoring. To reproduce the above:

<PublicFields><![CDATA[
public static $1d $2d { get; set; }
]]></PublicFields>

where the $1d parameter refers to the type of the field and the $2d parameter refers to the name of the field.

During the initial symbol scan of the FinishAnalyser event global fields are marked so that the AuthorDeclaration event can be used to override their default declaration.

Use this subcommand with caution: code that passes a public field ByRef will fail to compile if that field is declared as an auto-property, because C# does not allow a property to be passed as a ref argument. Since the ByRef call might be outside of the compilation unit, it is difficult to know about it in advance.

The GetEnumerator Subcommand

The VB6 NewEnum property getters are replaced by .NET GetEnumerator() methods. These methods should not contain any initialization code and must almost always be rewritten as part of a migration. By default, then, the tool strips away all code from the getter and simply authors it using this gmSL method.

void AuthorGetEnumerator(int iHost)
{
   if(Select.Dialect == Dialects.csh)
   {
      #TextStart
      public IEnumerator GetEnumerator()
      {
          return (%= Store.GetName(iHost) %).GetEnumerator();
      }
      #TextEnd
   }
   else
   {
      #TextStart
      Public Function GetEnumerator() As IEnumerator
         GetEnumerator = (%= Store.GetName(iHost) %).GetEnumerator();
      End Function
      #TextEnd
   }
}

The GetEnumerator subcommand instructs the tool to retain the original version of the code in the NewEnum getter. The only problem is what to do with VB6 code like

Set NewEnum = mcolTables.[_NewEnum]

whose surface pattern in the target code is by default

 <csh role="property" narg="1" code="MigrationSupport.Utils.NewEnum(%1d)" />

The command is a singleton with a single optional attribute Entry. If specified this attribute supplies an alternative for the surface pattern. For example

<GetEnumerator entry="%1d.GetEnumerator()" />

The CodeScan Operations

When the initial scan of the symbol table via the FinishAnalyser event handler encounters a property or a variable, it invokes the operations of this command. If the symbol is a variable with a Public access type and a special declaration was specified via the PublicFields subcommand, then the information vector for that symbol is marked so that it can be declared later as specified.

If the symbol is the NewEnum property with a getter, and the GetEnumerator subcommand was specified, then the information vector of the getter is marked so that the tool will not override its code and, if an alternative surface pattern was specified, the COL.NewEnum operations are replaced with the desired one.

Finally, the actual code patterns of the property getter and setter or letter are compared with the code patterns specified in their subcommand. If both match one of the specified pattern entries, then the information structure of the field backing up the value of the property is marked so that it can be authored later via the AuthorDeclaration event handler.

Authoring the Declarations

When the AuthorDeclaration event handler is called for a field whose information structure is marked with a property, the property is authored in the way specified by the command. The needed strings are formed from the information in the symbol table, and then the appropriate form is used depending upon whether the types are the same or different.

The ChangeIntroduced Command

The ChangeIntroduced command changes introduced variables so that they follow the same naming conventions as other variables. The primary source of introduced variables is the need to create a variable when a constant, an expression, or an object instance of the wrong class serves as an argument to a ByRef or ByOut parameter. The need to create these variables pervades the VB6 to .NET migration process. The tool carefully analyses user code parameters to change them to ByVal whenever possible, but it has no control over the status of parameters in external libraries, which are often needlessly ByRef. By default the tool names these introduced temporaries using a standard convention, argTemp(n), which makes them easy to find in the target code. Here is an example.

 object argTemp1 = MigrationSupport.Utils.VarPtr(SwapEndian) + 3;
 object argTemp2 = dw;
 CopyMemory(ref argTemp1,ref argTemp2,1);
 object argTemp3 = MigrationSupport.Utils.VarPtr(SwapEndian) + 2;
 object argTemp4 = MigrationSupport.Utils.VarPtr(dw) + 1;
 CopyMemory(ref argTemp3,ref argTemp4,1);
 object argTemp5 = MigrationSupport.Utils.VarPtr(SwapEndian) + 1;
 object argTemp6 = MigrationSupport.Utils.VarPtr(dw) + 2;
 CopyMemory(ref argTemp5,ref argTemp6,1);
 object argTemp7 = SwapEndian;
 object argTemp8 = MigrationSupport.Utils.VarPtr(dw) + 3;
 CopyMemory(ref argTemp7,ref argTemp8,1);

Though easy to find, these names make the target code ugly and, for many, difficult to read.

The alternative supported by the ChangeIntroduced command is to use the identifier of the original parameter, which obviously follows the naming conventions, to form the names of the introduced variables. The above then becomes this.

  object lpvSource = null;
  object lpvDest = null;
  lpvDest = MigrationSupport.Utils.VarPtr(SwapEndian) + 3;
  lpvSource = dw;
  CopyMemory(ref lpvDest,ref lpvSource,1);
  lpvDest = MigrationSupport.Utils.VarPtr(SwapEndian) + 2;
  lpvSource = MigrationSupport.Utils.VarPtr(dw) + 1;
  CopyMemory(ref lpvDest,ref lpvSource,1);
  lpvDest = MigrationSupport.Utils.VarPtr(SwapEndian) + 1;
  lpvSource = MigrationSupport.Utils.VarPtr(dw) + 2;
  CopyMemory(ref lpvDest,ref lpvSource,1);
  lpvDest = SwapEndian;
  lpvSource = MigrationSupport.Utils.VarPtr(dw) + 3;
  CopyMemory(ref lpvDest,ref lpvSource,1);

The identifiers of parameters can be very simple, so there is always a possibility of a name clash. If the new names conflict with each other, the name of the method is appended as well. This removes the argTemps from the code, replacing them with "conventional names". There may still be unintended clashes. To avoid these, the ChangeIntroduced command has a set of Entry subcommands with the following attributes:

Attribute	Description
id	Specifies the generated identifier of an introduced variable in case sensitive form.
name	Specifies the name to be used instead of the generated identifier.

An actual command might look as follows

<ChangeIntroduced>
   <Entry id="index" name="indexPram" />
</ChangeIntroduced>

and may appear anywhere in the CodeStyle Script. The logic for this command has been added to the operation code scan. It locates argTemps being generated and replaces them with an identifier for the parameter receiving the argument.
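
The naming rule described above can be sketched as follows. This Python fragment is only an illustration under stated assumptions: the function introduced_name is hypothetical, the source does not say exactly how the method name is appended on a clash (plain concatenation is assumed here), and it is assumed that an Entry override is looked up by the parameter-derived identifier before any clash handling.

```python
def introduced_name(param_name, method_name, names_in_use, overrides=None):
    """Pick a name for an introduced temporary that feeds a ByRef parameter."""
    overrides = overrides or {}
    # An <Entry id=... name=...> override wins; ids are case sensitive.
    if param_name in overrides:
        return overrides[param_name]
    if param_name not in names_in_use:
        return param_name
    # Assumed clash format: the source only says the method name is appended.
    return param_name + method_name


print(introduced_name("lpvDest", "CopyMemory", set()))
print(introduced_name("index", "Item", {"index"}, {"index": "indexPram"}))
```

The override table corresponds to the Entry subcommands, so a clash that the automatic rule would resolve awkwardly can be given an explicit name instead.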

The OperationCode Command

The OperationCode command contains requests to introduce code styles that require changing the operation code. Its subcommands are implemented during the final pass of the operation code via the FinishAnalyser event handler.

The OptimizeFunctions Subcommand

The OptimizeFunctions subcommand is a singleton command with no attributes. Its basic role is to replace sequences like the following in the target code

 static bool myFunction()
 {
    bool myFunction;

    myFunction = false;
    return myFunction;
 }

with the simpler

 static bool myFunction()
 {
    return false;
 }

Note that the declaration of the internal function variable is removed only if there are no other references to it. The optimization itself applies to both assignment statements and set statements. It may appear anywhere within the scope of the OperationCode command.

<OptimizeFunctions />

The PostIncrement Subcommand

The PostIncrement subcommand is a singleton command with no attributes. It requests that assignments that simply add one to a variable be replaced by the ++ post increment operation. It may appear anywhere within the scope of the OperationCode command.

<PostIncrement />
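
The effect of this subcommand on the authored C# can be illustrated with a small text-level sketch in Python. The tool itself makes the change in the operation code, not with a regular expression; the function post_increment and the exact statement spacing it expects are assumptions made for the illustration.

```python
import re

# Matches "name = name + 1;" where both names are the same plain identifier.
# The lookbehind keeps member accesses such as "x.count" from matching.
_ADD_ONE = re.compile(r"(?<![\w.])([A-Za-z_]\w*) = \1 \+ 1;")


def post_increment(line):
    """Rewrite a simple add-one assignment as a ++ post increment."""
    return _ADD_ONE.sub(r"\1++;", line)


print(post_increment("count = count + 1;"))
print(post_increment("total = count + 1;"))
```

Only an assignment whose left side and first operand are the same variable is rewritten; anything else is passed through unchanged.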

The RemoveReturns Subcommand

The RemoveReturns subcommand is a singleton command with no attributes. It requests that additional checks be made for unneeded explicit return statements in the target code. An example would be a return at the bottom of a try block whose catch block immediately precedes the end. It may appear anywhere within the scope of the OperationCode command.

<RemoveReturns />

The SimpleCasts Subcommand

The SimpleCasts subcommand is a singleton command with no attributes. It requests that casts within the target code of the form (type)(value or instance) be replaced with the form (type)value or instance. This subcommand is implemented by replacing the CNV.CastType operation with CNV.CastSimple. It may appear anywhere within the scope of the OperationCode command.

<SimpleCasts />

The StandardFunctions Subcommand

The StandardFunctions subcommand replaces references to the standard VB6 functions with alternative operations that give different target code for them. The subcommand has Entry subcommands that specify the individual functions and their desired target code surface pattern. The Entry subcommand has two attributes as follows:

Attribute	Description
id	The VB6 source code identifier of the function
name	The desired target code surface pattern

Here is a sample of this command

<StandardFunctions>
   <Entry id="Trim"   name="%1d.Trim()" />
   <Entry id="Left"   name="%1d.Substring(0,%2d)" />
   <Entry id="InStr"  name="%2d.IndexOf(%3d,%1o)" />
   <Entry id="Right"  name="%1d.Substring(%1d.Length - %2h)" />
   <Entry id="Len"    name="%1d.Length" />
</StandardFunctions>

As can be seen, it is common to replace the commonly used function(arguments) notation with postfix notation.
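
To make the surface patterns concrete, the following Python sketch expands a pattern for a given argument list. It is an illustration only: the function expand_surface_pattern is hypothetical, it assumes that %Nd stands for the text of the Nth argument, and it does not handle the other marker forms that appear in the sample (such as %1o and %2h).

```python
import re


def expand_surface_pattern(pattern, args):
    """Substitute one-based %Nd markers with the corresponding argument text."""
    return re.sub(r"%(\d)d", lambda m: args[int(m.group(1)) - 1], pattern)


# Left(custName, 3) authored with the "Left" entry from the sample above.
print(expand_surface_pattern("%1d.Substring(0,%2d)", ["custName", "3"]))
print(expand_surface_pattern("%1d.Trim()", ["custName"]))
```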

The OptionalArguments Command

Starting with the March 2023 release, OptionalArguments="on" is set by default in the standard translation template script.

Visual C# 2010 introduced optional arguments. The definition of a method, constructor, indexer, or delegate can specify that its parameters are required or that they are optional. Any call must provide arguments for all required parameters, but can omit arguments for optional parameters. Each optional parameter has a default value as part of its definition. If no argument is sent for that parameter, the default value is used. A default value must be a constant expression and it must be ByVal. Optional parameters are defined at the end of the parameter list, after any required parameters. If the caller provides an argument for any one of a succession of optional parameters, it must provide arguments for all preceding optional parameters. Comma-separated gaps in the argument list are not supported.

The default translation into C# does not use optional arguments; rather, it supplies the VB6 default values in the calls. The OptionalArguments command tells the tool to use them. It is a singleton command with no attributes.

<OptionalArguments />

The implementation of the command must precede the actual compilation of the code as it is the compiler that does the default value insertion in calls; therefore, the command is executed as part of the StartPass2 event handler. The symbol table is scanned looking for optional VB6 parameters. These are marked with the context flag OverLoad which blocks the default value insertion, and the migration status flag Overloads which authors the initialization value in the method declaration. Note that all parameters so marked are also forced to be ByVal.

The TargetCode Command

The TargetCode command contains requests to introduce code styles that require changing the target code directly. When the tool actually authors the final target code, rather than simply writing it to a file, it enters it into a stored text buffer. There is an EditTranslation event and an extensive text-editing service that can be used to change the content of this text buffer before it is finally sent to the output file.

The AddSpaces Command

By default the target code does not add a space after each comma in lists, because the target output lines are often very long. The AddSpaces command adds these spaces. The command itself is a singleton with one attribute, Vertical. Setting Vertical="on" adds a blank line after a mainline right brace. This adds a line of separation after all "complex" component declarations that use braces, not just methods.

<AddSpaces Vertical="on" />

The AllowBlankLines Command

By default the translator passes all blank lines and empty comment lines in the source through to the target code. In addition, the translator moves declarations when necessary to resolve nesting scope errors in the target. These moves can sometimes leave blocks of blank lines behind, which come through to the target code. The AllowBlankLines subcommand removes sequences of blank lines from the translations.

<AllowBlankLines Limit="n" />

This subcommand allows no more than "n" consecutive blank lines in authored code. The default does not check for blank lines so there is no limit. Setting the limit to zero will remove all blank lines from the target code.
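
The limit rule can be sketched in Python as follows. The function limit_blank_lines is a hypothetical stand-in for the tool's text-buffer edit, and it assumes that a line containing only whitespace counts as blank.

```python
def limit_blank_lines(lines, limit):
    """Keep at most `limit` consecutive blank lines."""
    kept = []
    run = 0  # length of the current run of blank lines
    for line in lines:
        if line.strip() == "":
            run += 1
            if run > limit:
                continue  # drop blank lines beyond the limit
        else:
            run = 0
        kept.append(line)
    return kept


print(limit_blank_lines(["int a;", "", "", "", "int b;"], 1))
```

With a limit of zero every blank line is dropped, which matches the behavior described above.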

The ReduceBraces Command

The ReduceBraces subcommand removes the braces from if/while/for statements, when they are controlling a single statement. A structure like the following

conditional
{
   statement
}

is reduced by removing the two braces and then optionally converting it to a compound statement.

conditional statement.

The actual ReduceBraces command is a singleton with one attribute, Statement. If this attribute is on, a compound statement is formed; if it is off, no compound statement is formed. Consider the following target code

if (_disposed)
{
   return;
}
else
{
   Class_Terminate();
}
_disposed = true;

The specification

<ReduceBraces statement="off" />

produces this target code.

 if (_disposed)
    return;
 else
    Class_Terminate();
 _disposed = true;

While the specification

<ReduceBraces statement="on" />

produces this target code.

 if (_disposed) return;
 else Class_Terminate();
 _disposed = true;

The command looks for the specified pattern in the target text buffer and, when it is found, makes the specified change. One special case is checked for: a block that contains only a comment.

if (enumTableStatus == basGlobal.DefinedEnum.DefDeleted)
{
   //  We don't need to add any more information to display
}
else ..

is not equivalent to

if (enumTableStatus == basGlobal.DefinedEnum.DefDeleted) //  We don't need to add any more information to display
else ..

A special check for this had to be put into the editing code.

In addition to removing braces this command also removes any blank lines immediately following an opening left brace.
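
A line-based Python sketch of the brace reduction is given below. It is a simplified illustration, not the tool's editing code: the function reduce_braces is hypothetical, it assumes one token group per line as in the examples above, it leaves comment-only blocks and nested control statements alone, and it does not model the removal of blank lines after an opening brace.

```python
import re

_HEADER = re.compile(r"^\s*(if|else|while|for)\b")


def reduce_braces(lines, statement=False):
    """Drop braces around a single controlled statement."""
    out = []
    i = 0
    while i < len(lines):
        head = lines[i]
        block = [s.strip() for s in lines[i + 1:i + 4]]
        if (_HEADER.match(head) and len(block) == 3
                and block[0] == "{" and block[2] == "}"
                and block[1].endswith(";")
                and not block[1].startswith("//")   # comment-only special case
                and not _HEADER.match(block[1])):   # avoid dangling-else changes
            if statement:
                out.append(head.rstrip() + " " + block[1])
            else:
                indent = head[:len(head) - len(head.lstrip())]
                out.append(head)
                out.append(indent + "   " + block[1])
            i += 4
        else:
            out.append(head)
            i += 1
    return out


source = ["if (_disposed)", "{", "   return;", "}",
          "else", "{", "   Class_Terminate();", "}", "_disposed = true;"]
print("\n".join(reduce_braces(source, statement=True)))
```

The statement flag plays the role of the Statement attribute: off keeps the controlled statement on its own indented line, on joins it to the controlling line.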

The RemoveUsing Command

The RemoveUsing subcommand removes specified using statements from the target code text buffer, unless that buffer contains one of a list of substrings.

<RemoveUsing>
   <Entry id="System.Drawing;" />
   <Entry id="System.Collections;" name=" IEnumerator "/>
   <Entry id="System.ComponentModel;" />
   <Entry id="System.Runtime.InteropServices;" name="[Dllimport" />
   <Entry id="System.Data;" />
   <Entry id="Microsoft.VisualBasic.CompilerServices;" />
   <Entry id="System.Linq;" name=".ToArray<" />
   <Entry id="System.Collections.Generic;" name="List<;Dictionary<;HashSet<" />
   <Entry id="VBNET = Microsoft.VisualBasic;" name="VBNET." />
</RemoveUsing>
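
One plausible reading of these entries is sketched below in Python. The interpretation is inferred from the sample and is an assumption: id is taken to be the text that follows the using keyword, name is taken to be a semicolon-separated list of substrings that keep the statement when any of them occurs elsewhere in the buffer, and an entry without name is always removed. The function remove_usings is hypothetical, and the search is case sensitive here, which the source does not specify.

```python
def remove_usings(lines, entries):
    """entries: list of (id, name) pairs; name may be None."""
    result = list(lines)
    for using_id, name in entries:
        target = "using " + using_id
        # Look for guard substrings everywhere except the using line itself.
        others = "\n".join(l for l in result if l.strip() != target)
        guards = [g for g in name.split(";") if g] if name else []
        if any(g in others for g in guards):
            continue  # the namespace still appears to be needed
        result = [l for l in result if l.strip() != target]
    return result


code = ["using System.Drawing;",
        "using System.Collections.Generic;",
        "class C",
        "{",
        "   List<int> items;",
        "}"]
entries = [("System.Drawing;", None),
           ("System.Collections.Generic;", "List<;Dictionary<;HashSet<")]
print("\n".join(remove_usings(code, entries)))
```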

The Replacements Command

The Replacements subcommand scans the target code text buffer for specified substrings and either replaces each with a second substring or simply removes it. The subcommand has a set of Entry subcommands, each of which defines an individual substring. Each Entry has two attributes:

Attribute	Description
id	A substring to be replaced or removed. A simple case insensitive text search is performed for the string -- i.e., substring boundaries are not considered.
name	This optional substring is the replacement string to be used.

A typical set of entries might be

<Replacements>
   <Entry id="String.Empty" name="string.Empty" />
   <Entry id="System.Int32" name="int" />
   <Entry id="System.Windows.Forms." />
   <Entry id="this." />
   <Entry id="VBNET.Constants.vbNullString" name="null" />
</Replacements>
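
The search-and-replace behavior described in the attribute table can be sketched in Python as follows. The function apply_replacements is hypothetical, and applying the entries in their listed order is an assumption; the case insensitive, boundary-free matching follows the description above.

```python
import re


def apply_replacements(text, entries):
    """entries: list of (id, name) pairs; a missing name removes the match."""
    for old, new in entries:
        # re.escape makes the id a literal substring, not a pattern.
        text = re.sub(re.escape(old), lambda m, new=new: new or "", text,
                      flags=re.IGNORECASE)
    return text


entries = [("String.Empty", "string.Empty"), ("this.", None)]
print(apply_replacements("this.name = String.Empty;", entries))
# Substring boundaries are not considered, so this also changes:
print(apply_replacements("x = mythis.y;", entries))
```

The second call shows why short ids such as "this." deserve some care: any identifier that happens to end in the same characters is edited as well.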

The VerticalList Command

The class VerticalList reformats long target code statements into readable form by converting them from horizontal form into vertical lists. The sorts of statements that are typically in need of this sort of reformatting are as follows:

Calling / declaring methods with "many" parameters;
Initializing arrays with many elements;
Complex formulas/conditionals with a series of similar repeating factors;
Building strings with a series of many concatenations.

All of these scenarios are reformatted by this command, but there is an important caveat. Not all target code can be processed by this class: only C# code produced by the surface code patterns specified in the metalanguage files can be processed. There are plans to also support both VB.NET and XML/HTML target code produced by the same surface code patterns, but these are not yet implemented.

The VerticalList statement is a section level statement with one optional attribute:

Attribute	Description
MinLineLength	The MinLineLength attribute is used to define the phrase "long" as applied to statements, expressions, and lists. These are considered to be "long" and thus are broken into a vertical list if their number of characters exceeds MinLineLength. Its default value is 60.

In addition, the VerticalList command may have zero or more <Breaker> elements. Each contains a user-defined character string called a breaker:

 <VerticalList MinLineWidth="160" >
    <Breaker>+ ", " +</Breaker>
    <Breaker>+ "'" + "\r\n" +</Breaker>
    <Breaker>+ "\r\n" +</Breaker>
 </VerticalList>

In practice the longest statements are often multi-part string concatenations, which can be broken into more meaningful chunks at user-defined character sequences called "breakers". These user-defined breakers take precedence over the default set of single token-based breakers (e.g., arithmetic and logical operators and commas). Their precedence is based on the order of the <Breaker> elements in the VerticalList section.
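
The role of user-defined breakers can be illustrated with a short Python sketch. It makes several assumptions that the source does not state: the function break_statement is hypothetical, only the first breaker that occurs in the statement is used, the breaker text stays at the end of each line, and continuation lines are indented by four spaces. The default token-based breakers are not modeled.

```python
def break_statement(statement, breakers, min_line_length=60):
    """Split a long statement at the first user-defined breaker that occurs."""
    if len(statement) <= min_line_length:
        return [statement]  # not "long", so leave it alone
    for breaker in breakers:  # earlier breakers take precedence
        if breaker in statement:
            pieces = statement.split(breaker)
            # Assumed layout: the breaker stays at the end of each line.
            lines = [pieces[0].rstrip() + " " + breaker]
            lines += ["    " + p.strip() + " " + breaker for p in pieces[1:-1]]
            lines.append("    " + pieces[-1].strip())
            return lines
    return [statement]


stmt = 'sql = first + ", " + second + ", " + third;'
print("\n".join(break_statement(stmt, ['+ ", " +'], min_line_length=20)))
```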
