Translation Tuning Concepts

Overview

This page introduces concepts of translation tuning.


Rewrite by tool or by hand?

One of the most frequently asked questions we hear in our migration services work is: "how much will we have to do by hand to finish the job?" It is an important question, and one that can only be answered by asking and answering a few more questions:

  • What do you mean when you say "the job"?  
  • What do you mean when you say "by hand"?

The reason for the first question is to help you understand and articulate your target architecture requirements. These requirements define how you will use .NET and will have a big impact on the migration effort.

The reason for the second question is to help you understand that doing things "by hand" with gmStudio can mean one of two things:

  • Modifying the translation configuration to produce "better" translations or  
  • Modifying the translations by hand to make them "better".

"Better" means "more correct" in terms of reproducing the functions of the legacy application and "more conformant" in terms of following design and coding standards for the target platform. In many cases, you will find that it is more efficient to invest in configuring the translator to produce "better" code before spending time fixing translations by hand. This is particularly true if you have to migrate a large and active codebase.

Manual Design + Automated Implementation

Doing things by tool requires the same technical design work that is required when doing things by hand. Furthermore, you will typically fine-tune the .NET code by hand in your favorite .NET IDE, where you have access to features like IntelliSense. Once you have the details worked out, you should implement the design rules in the translation configuration so they can be applied in a repeatable and systematic manner across the codebase.

The Claim

The claim is made that the process of rehosting VB6/ASP/COM source code in .NET via translation to C# or VB.NET is refactoring. Martin Fowler, in "Refactoring: Improving the Design of Existing Code", says: "Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure". First, it is clear that translating source code to a new language and then moving it to a new operating system constitutes "changing a software system" and, if done correctly, "improves its internal structure". The problem, of course, is "does not alter the external behavior". This is hard to achieve when old applications are rehosted under .NET: the user interface components have changed, error handling has changed, security has changed, data access has changed, and so on. There does, however, seem to be a mutually agreed upon notion of "Functional Equivalence", whose meaning will not be expanded here. When you have it you know it, but never stop testing -- enough said.

Different refactoring operations fall on a spectrum from "Shallow" to "Deep":

  • Shallow refactoring is applied to the surface or syntactic form of the code. Shallow changes can be applied both to the source code before it is translated and to the target code after it is produced but before it is published. Such changes are easy to visualize and to specify. Unfortunately, many of the changes needed cannot be specified in this way.
     
  • Deep refactoring involves manipulating the underlying semantic operations that are being performed. The representation of these operations is referred to as "deep structure". The ability to create and then manipulate the deep structure of the system being migrated is what sets the Great Migrations technology apart from all others.

gmStudio facilitates the entire spectrum of user-defined refactoring and propagates refactoring changes correctly and consistently across the appropriate scope of code.

Translation Tuning Techniques

This topic introduces the techniques for customizing the translation process so it generates code that is more correct and more conformant to your coding standards.

The simplest form of user-defined refactoring is adding translation options to the Translation Script. The translation options are simple statements (e.g., one-liners). There are three types of translation option statements:

  • Select: set the value of a translation setting  
  • Reference: explicitly load an IDF file  
  • Registry: specify a wide variety of custom translation behaviors by registering various name-value pairs in the gmBasic registry

Translation options can be used to direct many aspects of the translator's behavior:

  • Logging Levels: control logging and error handling  
  • Processing Conventions: control how code is transformed at a global level  
  • Configuration Locations: control the directories that the tool uses for its Configuration Files  
  • Deployment Locations: control where target files and binaries will be located and related to each other  
  • Type Inference: control how the translator determines missing or ambiguous variable types  
  • Interface Description File Handling: control how external COM libraries are interpreted and how interface description files are authored and accessed.

Surface Edits: Search and Replace

In general, the Great Migrations translators operate at the semantic level; however, the system also includes an "editor" that works directly on the surface form of the source and target code. The editor is a command-driven, multi-line search and replace facility. The rationale for the editor is discussed here. All edits must be explicitly defined in the Translation Scripts.

Source Edits - the Pre-processor

Sometimes a block of VB6 code is "too creative", archaic, or just plain wrong and must be changed in order to facilitate a clean translation. Source Edits allow you to do this in a repeatable, documented, automated manner. Source Edits can search and replace code, comment code out, or delete code or controls altogether; if you delete something, be sure you also remove all references to any deleted identifiers (see also refactor/remove). Source Edits are applied after all the code is read in from an original source file, and they do not modify that original file.

Pre-Edits in the Compile Block

...
   <Compile Project="%SrcPath%">
   <Fix host="Project1" name="Pre-Edit">
    <Replace status="active" name="remove unusual use of &">
    <OldBlock><![CDATA[Variant = &0)]]></OldBlock>
    <NewBlock><![CDATA[Variant = 0)]]></NewBlock>
    </Replace>
    </Fix>
...
    </Compile>

Pre-Edits in a GlobalSettings File

<GlobalImports>
<Storage Action="Create" Identifier="%UserFolder%\GlobalSettings" />
<Registry type="EditFile" Source="%VirtualRoot%\INCLUDES\companyUsers\companyUserPreProc.asp">
<Fix name="Pre-Edit">
    <Replace status="active" name="remove unusual use of &">
    <OldBlock><![CDATA[Variant = &0)]]></OldBlock>
    <NewBlock><![CDATA[Variant = 0)]]></NewBlock>
    </Replace>
</Fix>
</Registry>
...
</GlobalImports>
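
To make the effect of the Pre-Edit shown above concrete, here is a minimal Python sketch of a literal search and replace applied to an in-memory copy of the source. It is a conceptual illustration only, not how gmBasic is implemented: the `Replace` class, the `apply_source_edits` function, and the sample VB6 line are all hypothetical, and skipping every edit whose status is not "active" is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Replace:
    """Hypothetical stand-in for one <Replace> element."""
    name: str
    old_block: str
    new_block: str
    status: str = "active"


def apply_source_edits(source_text, edits):
    """Return an edited copy of source_text and the names of edits that matched nothing."""
    unmatched = []
    for edit in edits:
        if edit.status != "active":
            continue  # assumption: only active edits are applied in this sketch
        if edit.old_block not in source_text:
            unmatched.append(edit.name)  # record the miss so it can be reported
            continue
        source_text = source_text.replace(edit.old_block, edit.new_block)
    return source_text, unmatched


# Hypothetical VB6 line containing the text targeted by the Pre-Edit.
original = "Sub Init(Optional v As Variant = &0)\nEnd Sub\n"
edits = [Replace("remove unusual use of &", "Variant = &0)", "Variant = 0)")]

edited, unmatched = apply_source_edits(original, edits)
print(edited)
print("unmatched edits:", unmatched)
```

The edit works on the text held in memory, so `original` is left unchanged, which mirrors the point that Source Edits do not modify the original file. Collecting the names of unmatched edits is a design choice of the sketch; it makes stale edits easy to spot as the source code evolves.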


Target Edits - the Post-Processor

Target Edits are search and replace operations applied to the target code before it is authored. You can modify, add, or remove blocks of target code almost anywhere in the output code stream, including designer code and project files. There is a variation of the replace command that allows you to replace an entire file with a hand-written version.

Post-Edits in the Author Command

<Author...>
 
   <Author name="%MigName%">
   <Fix host="[%VirtualRoot%\includes\theFunctions.asp]">
   <Replace name="add forTemp to avoid naming conflict">
   <OldBlock><![CDATA[
          foreach(string value in arTransfer)
          {
   ]]></OldBlock>   
   <NewBlock><![CDATA[
          foreach(string forTemp in arTransfer)
          {
             value = forTemp;
   ]]></NewBlock>
   </Replace>
   </Fix>
...
</Author>

The Post-Edits are applied to the .NET code stored in the system model immediately before it is published in a bundle file. If an edit does not match, either the text does not exist in the system model or you are not specifying the Replace command properly. You can usually check for a match first using the Search panel on Target code.

The csproj file is a special case because, unlike other files, it is not generated from your source code. It is generated from information in the VBP and rules in authortext.gmsl, one of the system scripts. Edits on the csproj file should have no host attribute on the Fix element and should specify lang="csproj" on the Replace element. For example:

<Author>
   <Fix name="PostEdit"> <!-- notice there is no host attribute -->
   <Replace name="Update App csproj files to reference gmRTL Assembly" lang="csproj">
   <OldBlock><![CDATA[
         <Reference Include="gmRTL.Core">
            <Name>gmRTL.Core</Name>
            <HintPath>%DeployFolder%\gmRTL.Core\bin\gmRTL.Core.dll</HintPath>
         </Reference>
   ]]></OldBlock>
   <NewBlock><![CDATA[
         <Reference Include="gmRTL.Core">
            <Name>gmRTL.Core</Name>
            <HintPath>%DeployFolder%\gmRTL.Core\bin\Debug\gmRTL.Core.dll</HintPath>
         </Reference>
   ]]></NewBlock>
   </Replace>
   </Fix>
</Author>


Post-Edits using Regular Expressions (Replace status="regex")

Most edits are processed by the translation engine, gmBasic.exe. These edits do not support regular expressions, but they have other special properties for white-space handling. The edits done by gmBasic.exe run as long as the Replace element has no status attribute or has status="active". gmBasic edits operate on the generated code in memory before it is written to disk by the Author command. Sometimes you may find that you want to use a regular expression to modify the generated code. If you find yourself fixing something with a regex replacement, it may indicate a matter that would be handled better by other techniques, and you should contact us for assistance. However, as a short-term workaround, we offer a special type of post-edit using status="regex". These edits are performed by the gmStudio IDE, which edits the generated code bundle file after it is written by gmBasic.

An example of a regex fix is shown below:

<Author ...> 
... 
<Fix host="" name="Post-Edit">
<Replace name="correct include tag malformation" status="regex">
<OldBlock><![CDATA[runat="server" />\r\n; %>]]></OldBlock>
<NewBlock><![CDATA[runat="server" />
]]></NewBlock>
</Replace>
<Replace name="correct include tag malformation" status="regex">
<OldBlock><![CDATA[;\r\n<inc:]]></OldBlock>
<NewBlock><![CDATA[;
%>
<inc:]]></NewBlock>
</Replace>
</Fix>
</Author>
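
The two regex replacements above can be tried outside gmStudio before they are added to a script. The following Python sketch applies the same two patterns to a small sample string. It assumes the patterns behave the same way in Python's `re` module as in the engine used by the gmStudio IDE (they contain only literal characters and `\r\n`), and it assumes the bundle file uses `\r\n` line endings. The `apply_regex_edits` function and the sample markup are hypothetical.

```python
import re

# Patterns copied from the two <OldBlock> elements above; the replacement
# text mirrors the <NewBlock> elements. Writing the line breaks as "\r\n"
# is an assumption about the bundle file's line endings.
EDITS = [
    (r'runat="server" />\r\n; %>', 'runat="server" />\r\n'),
    (r';\r\n<inc:', ';\r\n%>\r\n<inc:'),
]


def apply_regex_edits(text, edits):
    """Apply each (pattern, replacement) pair in order and return the new text."""
    for pattern, replacement in edits:
        # A function replacement keeps the replacement text literal,
        # so backslashes in it are never interpreted as group references.
        text = re.sub(pattern, lambda _match, r=replacement: r, text)
    return text


# Hypothetical malformed output: an include tag stranded inside a code block.
sample = '<% x = 1;\r\n<inc:header runat="server" />\r\n; %>\r\n<p>body</p>'
result = apply_regex_edits(sample, EDITS)
print(result)
```

In the result, the code block is closed with `%>` before the include tag and the stray `; %>` after the tag is removed.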

Refactoring Commands

Within gmStudio many of the changes made to the code as it moves from its source form into its ultimate .NET target code form can be most easily formulated as refactoring operations. These include the following types of transformations:

  • Renaming symbols either to avoid clashes or to make them easier to maintain.  
  • Re-authoring a subprogram because its original approach is difficult to maintain or is incompatible with the .NET environment  
  • Changing the type of a symbol which was either undefined or specified too weakly (or even incorrectly)  
  • Changing the status of a symbol to change its scope or behavior  
  • Changing the structure of a symbol to clarify its size and dimensionality  
  • Removing a symbol or block of code because it is not needed or wanted.

Note that refactoring operations involve manipulation of the symbol table and of the semantic pseudocode produced. They do not directly reference the source or target code. To reemphasize this point, refactoring tends to affect all occurrences of a specific "type of" symbol across the entire codebase. Edits make individual changes based on the specific instances of source or target symbols in the code.

The rule of thumb is to use a refactoring operation rather than an edit operation when the intended recipient of the change is a type of symbol rather than a particular line or block of code. Both techniques are useful, and the choice between them will vary by individual and application.
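
The difference between the two approaches can be illustrated with a toy model. In the Python sketch below, the "pseudocode" refers to symbols by id rather than by name, so renaming a symbol once in the symbol table changes every line that is later authored from it. This is a simplified illustration under assumed data structures; the `symbols` table, the `ops` list, and the `author` function are hypothetical and do not represent gmStudio's internal model.

```python
# Toy symbol table: symbol id -> name. The operations refer to ids, not names.
symbols = {1: "value", 2: "arTransfer"}
ops = [("assign", 1, 2), ("print", 1)]


def author(symbols, ops):
    """Emit target code lines by looking up each symbol name at authoring time."""
    lines = []
    for kind, *ids in ops:
        names = [symbols[i] for i in ids]
        if kind == "assign":
            lines.append(f"{names[0]} = {names[1]};")
        elif kind == "print":
            lines.append(f"Console.WriteLine({names[0]});")
    return lines


before = author(symbols, ops)

# Refactoring: rename the symbol once; every reference follows automatically.
symbols[1] = "transferValue"
after = author(symbols, ops)

print(before)
print(after)
```

An edit, by contrast, would be a text replacement on one of the emitted lines, and it would have to be repeated for each occurrence that needs to change.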

Custom COM Replacement

gmStudio allows you to translate code that uses COM types to code that uses .NET types. The following rewriting operations can be automated:

  • Rewriting references to use .NET assemblies instead of the COM Interop components
  • Rewriting declarations to use .NET classes, interfaces, and enums
  • Rewriting code to use .NET enum entries
  • Rewriting code to call .NET properties and methods, such as changing the name, return type, and arguments
  • Rewriting event declarations, attachment, and handlers
  • Rewriting UI designer code to set up and use a .NET control or component instead of an ActiveX control

This facility can be applied to both local and external COM components using a special type of Configuration File called a RefactorLibrary. RefactorLibrary techniques can also be used directly within Translation Scripts.

More details on this process are described in Custom COM Replacement.

Custom Language Replacement

gmStudio allows you to customize how VB6 statements and projects are expressed in .NET. The general process for doing this is:

  1. Copy the necessary metalanguage files to your project workspace and modify them to your specifications
  2. Select project-specific configuration on the [settings\configuration files] dialog
  3. Edit the metalanguage script to refer to your modified metalanguage files
  4. Rebuild the language configuration file

More details on this process are described in Custom VB6 Language Replacement.

Advanced Transformation -- VB6 forms to WPF 

Examine the WPF Sample to see how custom language replacement, COM replacement, and gmSL scripts can be used for extremely advanced transformation: VB6 Forms to XAML and WPF