Support Statement: Unused Symbol Analysis and Reporting

Overview

Large, mature systems often contain symbols that are no longer used. These may be methods that are no longer called, or variables and data structures that are no longer referenced by active logic. Such unused symbols are often referred to as "dead code", "dead wood", or "cruft". Dead code makes a system "smell bad" and is a form of technical debt; it can be difficult to identify and risky to remove.

An Upgrade Project is an optimal time to deal with dead code because gmStudio can help you identify and remove it. Removing dead code helps the overall Upgrade effort proceed more efficiently and can produce more maintainable results. You can also test for problems associated with removing dead code while you are regression testing for the platform upgrade. Removing dead code is a Structural Upgrade feature that teams can integrate with other Custom Upgrade efforts.

Identifying Dead Code

gmStudio's Refactor/Remove command directs the removal of symbol declarations and, optionally, symbol references as well. The Refactor/Remove command requires you to specify the identifier of the symbol to remove. In some cases the Upgrade Team may already know the identifiers of some dead symbols, but typically much of the dead code is not known in advance. Teams can benefit from an automated analysis that identifies dead code. This document describes how gmStudio can help automate the process of identifying and removing dead code.

Be Careful Not to Inadvertently Remove Late-bound Symbols

CAUTION: The Unused Symbol analysis depends on the detailed symbol reference information gathered by gmBasic during the translation process: unreferenced symbols are treated as unused symbols. However, this approach cannot identify references made through late calls and may incorrectly report late-called symbols as unused. You should complete type inference optimization and other techniques that reduce late calls before removing symbols identified by the Unused Analysis. You should also plan to take advantage of the LateCalls Report and DoNotRemove features to fine-tune the Unused Analysis. Always take time to consider these limitations and to review the removal scripts generated by gmGlobal, as well as the impact of those scripts on your translations.

Unused Symbols Analysis should be performed after Shared Files Consolidation.

Types of Symbols assumed used by default

Certain categories of symbols are automatically excluded from the Unused Analysis and will not be reported as unused even if they are not referenced. These include event handlers (e.g. Form and Control event handlers, Class_Terminate, Class_Initialize, Sub Main, etc.) and symbols declaring or implementing interfaces. Event handlers may be called from external code, and interface declarations and implementations are needed even if they are not explicitly used.

Unused Symbols Report

Unused symbols analysis results may be reported from the Report/Analytics menu. Select the translated migration project tasks to include in the analysis, then select the report from the menu. gmStudio will prepare a reporting script, run the analysis, and produce the report. For example, the default script to report Unused Symbols is shown below:

<gmGlobal>
<!--
Description: gmGlobal Script for Unused Symbols Analysis
Learn more here: https://portal.greatmigrations.com/display/GMG/Support+Statement%3A+Unused+Symbol+Analysis+and+Reporting
-->
<Select Progress="1" />
<InformationFiles>
%VbiList%
</InformationFiles>

<Output Status="New" FileName="..\report\RemoveUnused.xml" />
<RemoveUnused Include="method,property,declare" />
<Output Status="Close" />

<Output Status="New" FileName="%ReportPath%" syntax="Tabbed" />
<ReportRemovals />
<Output Status="Close" />

</gmGlobal>

The script above produces two outputs:

  • report\RemoveUnused.xml: translation rules that direct gmStudio to remove unused symbols from the translations. These rules may be incorporated into your upgrade solution as desired.
  • report\<MigName>-Unused.tab: a tab-delimited report of the symbols identified as unused.

A Sample RemoveUnused Script

This section demonstrates the Unused Symbol Analysis and Reporting with a small demo project. The project has default translations set up for two related VBPs:

  • projUnusedDll.vbp
  • UnUsedTestEXE.vbp

The first example script is RemoveUnused1.xml:

<gmGlobal>
   <Select Progress="1" />
   <InformationFiles>
      <Load id="UnUsedTest-projUnusedDll-std-csh.vbi" />
      <Load id="UnUsedTest-UnUsedTestEXE-std-csh.vbi" />
   </InformationFiles>
   <Output Status="New" FileName="..\report\RemoveUnused.xml" />
   <RemoveUnused />
   <Output Status="Close" />
</gmGlobal>

This script directs gmGlobal.exe to load the VBIs for two VBP translations and generate a RemoveUnused script. The script is placed in the bundle file associated with the task.

The InformationFiles statement supplies the list of information files to be processed by downstream statements such as FindCallByName or RemoveUnused. The list is initiated either by the statement <InformationFiles site="folder"> or by a statement <InformationFiles> with no site attribute. When "site" is specified, the list consists of all files with the "vbi" extension in the specified folder. Otherwise, each file is introduced by a statement <Load id="pathname">.
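The two forms of the InformationFiles statement can be illustrated with a small Python sketch. This is only a model of the rule described above, not gmGlobal's implementation: the function name resolve_information_files is hypothetical, and the sorted ordering of files found through "site" is an assumption made for the sake of a deterministic result.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def resolve_information_files(statement_xml: str) -> list[str]:
    """Return the information files named by one InformationFiles statement.

    Hypothetical helper: it models the documented rule only.
    """
    element = ET.fromstring(statement_xml)
    site = element.get("site")
    if site is not None:
        # "site" given: every file with the vbi extension in that folder.
        # Sorting is an assumption; the real processing order is not documented here.
        return sorted(str(path) for path in Path(site).glob("*.vbi"))
    # No "site": each file is introduced by its own Load statement.
    return [load.get("id") for load in element.findall("Load")]


explicit = """
<InformationFiles>
   <Load id="UnUsedTest-projUnusedDll-std-csh.vbi" />
   <Load id="UnUsedTest-UnUsedTestEXE-std-csh.vbi" />
</InformationFiles>
"""
for name in resolve_information_files(explicit):
    print(name)
```

The explicit form gives full control over which translations take part in the analysis, while the "site" form is convenient when every VBI in a folder should be included.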

The RemoveUnused statement scans a set of loaded information files to find the members that are not used. The RemoveUnused algorithm proceeds in two phases: Local and Global. Once a set of members is removed, any other members that were referenced only from that set also become unused. These members then form the next set of removals, and so on. However, "identifying members referenced only by the removed members" cannot be done directly. Instead, all references made by unremoved members must be recomputed. This core operation is performed by the gmAPI Runtime.References() service method, which is at the heart of the algorithm. The loop of identifying unused members and recomputing the references with Runtime.References() until no new unused members are found is performed by the method RescanForUnusedReferences(). This method is called by both the Local and Global removal phases. Once all unused members have been identified, the RemoveUnused statement authors a set of Registry.RefactorFile statements. These can be used by the same translation scripts that produced the information files to author target code that does not include the unused members.
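The rescan loop described above can be sketched in Python as a fixed-point iteration. This is a simplified model under stated assumptions, not the gmAPI code: members are represented as a plain dictionary mapping each member name to the names it references, the functions references_from_unremoved and rescan_for_unused are hypothetical stand-ins for Runtime.References() and RescanForUnusedReferences(), and the Local and Global phases are not modeled separately.

```python
def references_from_unremoved(members, removed):
    """Collect every name referenced by a member that has not been removed.

    Self-references are ignored (an assumption of this sketch) so that a
    recursive member does not keep itself alive.
    """
    referenced = set()
    for name, refs in members.items():
        if name not in removed:
            referenced |= set(refs) - {name}
    return referenced


def rescan_for_unused(members, assumed_used):
    """Repeat the unused scan until a pass finds no new unused members.

    Returns one sorted list of newly unused member names per productive pass.
    """
    removed = set()
    passes = []
    while True:
        referenced = references_from_unremoved(members, removed)
        newly_unused = sorted(
            name for name in members
            if name not in removed
            and name not in assumed_used
            and name not in referenced
        )
        if not newly_unused:
            return passes
        passes.append(newly_unused)
        removed.update(newly_unused)


# Hypothetical members: Main is assumed used, like the default exclusions.
members = {
    "Main": {"used_sub"},
    "used_sub": set(),
    "not_used_sub": {"helper"},
    "helper": {"deep_helper"},
    "deep_helper": set(),
}
passes = rescan_for_unused(members, assumed_used={"Main"})
for number, names in enumerate(passes, start=1):
    print(f"pass {number}: {names}")
```

In this example each pass exposes one more layer: removing not_used_sub leaves helper unreferenced, and removing helper leaves deep_helper unreferenced. Note that this sketch, as written, does not detect members that are dead but reference each other in a cycle.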

CAUTION: The RemoveUnused algorithm changes the attributes of the members in the information files. Once it has completed, those information files cannot be used to repeat the operation, for example with different member types or a different DoNotRemove list. After gmGlobal has processed them, the source information files must be refreshed by rerunning the scripts that produced them.

The reporting process also produces a log of its operations with verbosity corresponding to the Select.Progress setting; for example:

Select.Progress=1

gmGlobal V40.34x86(09/22/22) System Build(09/22/22 14:48:10)
The InformationFiles list contains 2 files.
The Local Removal Scan of <UnUsedTest-projUnusedDll-std-csh.vbi> required 2 passes.
The Local Removal Scan of <UnUsedTest-UnUsedTestEXE-std-csh.vbi> required 3 passes.
Performing Global Removal pass 1
Performing Global Removal pass 2
Performing Global Removal pass 3
Performing Global Removal pass 4

Select.Progress=2

gmGlobal V40.34x86(09/22/22) System Build(09/22/22 14:48:10)
The InformationFiles list contains 2 files as follows:
   Information File(1): UnUsedTest-projUnusedDll-std-csh.vbi
   Information File(2): UnUsedTest-UnUsedTestEXE-std-csh.vbi
The Local Removal Scan of <UnUsedTest-projUnusedDll-std-csh.vbi> required 2 passes.
The Local Removal Scan of <UnUsedTest-UnUsedTestEXE-std-csh.vbi> required 3 passes.
Performing Global Removal pass 1
The information file <UnUsedTest-projUnusedDll-std-csh.vbi> has 3 unUsed members
   projUnusedDll.Class1.DLLexposedUsedbyClient:42265 has 0 global references
   projUnusedDll.Class1.dllPropUsedbyClient:42327 has 0 global references
   projUnusedDll.Class1.dllPropNotUsedbyClient:42584 has 0 global references
Performing Global Removal pass 2
The information file <UnUsedTest-projUnusedDll-std-csh.vbi> has 1 unUsed members
   projUnusedDll.Class1.DLLexposedNotUsedbyClient:42202 has 0 global references
Performing Global Removal pass 3
The information file <UnUsedTest-projUnusedDll-std-csh.vbi> has 1 unUsed members
   projUnusedDll.Class1.pubOnlyUsedinDLL:42140 has 0 global references
Performing Global Removal pass 4

The run produces the following script of refactoring commands:

<Registry type="RefactorFile" Source="...\UnusedTestDLL.vbp">
<Refactor errorStatus="warn">
   <Remove identifier="projUnusedDll.Class1.privNotUsedinDLL"/>
   <Remove identifier="projUnusedDll.Class1.pubUsedOnlyFromPrivate"/>
   <Remove identifier="projUnusedDll.Class1.dllPropUsedbyClient.Let.val"/>
   <Remove identifier="projUnusedDll.Class1.dllOnlyGetUsedbyClient.Let.val"/>
</Refactor>
</Registry>
<Registry type="RefactorFile" Source="...\UnUsedTest.vbp">
<Refactor errorStatus="warn">
   <Remove identifier="UnUsedTestEXE.modUnUsedTest.NotUsedDecl"/>
   <Remove identifier="UnUsedTestEXE.modUnUsedTest.UnreachableDecl"/>
   <Remove identifier="UnUsedTestEXE.notUsedSub"/>
   <Remove identifier="UnUsedTestEXE.notUsedFunc"/>
   <Remove identifier="UnUsedTestEXE.UnreachableSub"/>
   <Remove identifier="UnUsedTestEXE.UnreachableFunc"/>
</Refactor>
</Registry>

The commands above may be used as a starting point for implementing rules that will remove unused code from your translations. The commands would be integrated with the translation process using a GlobalSettings script.
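When reviewing a generated script before integrating it, it can help to list the removals per source project. The Python sketch below is one way to do that, under two assumptions: the generated file holds several top-level Registry elements (as in the listing above), so the text is wrapped in a synthetic root element before parsing, and the file has no XML declaration. The function name removals_by_source is hypothetical, and the sample is an abbreviated copy of the listing above with its elided paths left as they appear.

```python
import xml.etree.ElementTree as ET


def removals_by_source(script_text):
    """Map each Registry Source to the identifiers its Remove commands name."""
    # The script has several top-level elements, so wrap it to make it parseable.
    root = ET.fromstring(f"<root>{script_text}</root>")
    result = {}
    for registry in root.iter("Registry"):
        if registry.get("type") != "RefactorFile":
            continue
        identifiers = [remove.get("identifier") for remove in registry.iter("Remove")]
        result.setdefault(registry.get("Source"), []).extend(identifiers)
    return result


sample = r"""
<Registry type="RefactorFile" Source="...\UnusedTestDLL.vbp">
<Refactor errorStatus="warn">
   <Remove identifier="projUnusedDll.Class1.privNotUsedinDLL"/>
   <Remove identifier="projUnusedDll.Class1.pubUsedOnlyFromPrivate"/>
</Refactor>
</Registry>
<Registry type="RefactorFile" Source="...\UnUsedTest.vbp">
<Refactor errorStatus="warn">
   <Remove identifier="UnUsedTestEXE.notUsedSub"/>
</Refactor>
</Registry>
"""
for source, identifiers in removals_by_source(sample).items():
    print(source, len(identifiers))
```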

Customizing Global Analysis

If you want to customize the behavior of either of these reports, use the Settings Screen to activate a copy of rpt.RemoveUnused.xml or rpt.UnusedSymbols.xml Template and modify it to meet your needs as described below.

Controlling the Types of symbols examined by the Unused Analysis

The RemoveUnused command itself has two mutually exclusive attributes: Include and Exclude. These specify what types of members – methods, fields, constants, properties, declarations, events, enumerations, or structures – are eligible for removal. By default, all of these types are eligible for removal. Automatically removing all of these unused member types from the application code may be going too far in many cases. The user may be mainly looking for help with identifying and removing unused subprograms since they typically reference each other creating a complex web of dependencies that are tedious to unravel. Or a user might want to retain all events rather than singling out particular ones via a DoNotRemove list.

The types of symbols included in the Unused Symbols analysis may be set with the Include attribute of the RemoveUnused command; for example:

<RemoveUnused Include="method, declare, property" /> 

Alternatively, the types of symbols excluded from the Unused Symbols analysis may be set with the Exclude attribute of the RemoveUnused command; for example:

<RemoveUnused Exclude="field, constant, structure, enumeration, event" /> 

The following type names are recognized in the Include and Exclude attributes:

      method
      field
      constant
      property
      declare
      structure
      enumeration
      event
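The relationship between Include and Exclude can be modeled with a short Python sketch. It follows only the rules stated above (the two attributes are mutually exclusive, and all types are eligible by default); the function eligible_types is hypothetical, and whether gmGlobal itself rejects unknown type names or treats names case-insensitively is not assumed here.

```python
MEMBER_TYPES = (
    "method", "field", "constant", "property",
    "declare", "structure", "enumeration", "event",
)


def parse_type_list(value):
    """Split a comma-separated attribute value and check every type name."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in MEMBER_TYPES]
    if unknown:
        raise ValueError(f"unrecognized member types: {unknown}")
    return set(names)


def eligible_types(include=None, exclude=None):
    """Return the member types eligible for removal, in MEMBER_TYPES order."""
    if include is not None and exclude is not None:
        raise ValueError("Include and Exclude are mutually exclusive")
    if include is not None:
        chosen = parse_type_list(include)
    elif exclude is not None:
        chosen = set(MEMBER_TYPES) - parse_type_list(exclude)
    else:
        chosen = set(MEMBER_TYPES)  # default: every type is eligible
    return [name for name in MEMBER_TYPES if name in chosen]


print(eligible_types(include="method, declare, property"))
print(eligible_types(exclude="field, constant, structure, enumeration, event"))
```

Under this model, the Include and Exclude examples shown above select the same three member types, so either form can be used to limit the analysis to subprograms, properties, and declarations.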

Preventing symbols from being marked as Unused

In addition to its Include and Exclude attributes, the RemoveUnused statement supports a set of DoNotRemove subcommands. These subcommands store a series of DoNotRemove identifiers classified by scope; the loaded information files are then scanned to locate and mark any members specified in the list. There are three scope levels: Global = 0, Library = 1, and Member = 2 or more. The level is computed simply by counting the periods in the identifier. Each subcommand has a single "id" attribute. The example below uses a DoNotRemove element to suppress removal of a specific symbol.

<gmGlobal>
   <InformationFiles>
      <Load id="UnUsedTest-projUnusedDll-std-csh.vbi" />
      <Load id="UnUsedTest-UnUsedTestEXE-std-csh.vbi" />
   </InformationFiles>
   <Output Status="New" FileName="..\report\RemoveUnused.xml" />
   <RemoveUnused>
      <DoNotRemove id="projUnusedDll.Class1.privNotUsedinDLL"/>
   </RemoveUnused>
   <Output Status="Close" />
</gmGlobal>
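The scope classification described above (count the periods in the identifier) is simple enough to show in a few lines of Python. The function names are hypothetical, and the last two identifiers in the example are made up for illustration; only the first comes from the script above.

```python
def scope_level(identifier):
    """Scope level of a DoNotRemove identifier: the number of periods in it."""
    return identifier.count(".")


def scope_name(identifier):
    """Name the scope level: Global = 0, Library = 1, Member = 2 or more."""
    level = scope_level(identifier)
    if level == 0:
        return "Global"
    if level == 1:
        return "Library"
    return "Member"


for identifier in (
    "projUnusedDll.Class1.privNotUsedinDLL",  # from the script above
    "SomeLibrary.SomeName",                   # hypothetical
    "SomeName",                               # hypothetical
):
    print(identifier, scope_level(identifier), scope_name(identifier))
```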

Identifying Late Called Symbols

As mentioned above, the Unused Analysis risks marking late-called symbols as unused. Users must carefully review the reported removals for symbols known to be accessed by a late call and make exceptions for them using DoNotRemove commands. gmGlobal provides the FindCallByName command to assist with finding late calls, as illustrated below:

gmAPI_CallByName.xml

<gmGlobal>
   <Storage Action="Create" Identifier="gmAPI_CallByName" />
   <LoadEnvironment />
   <Select MaxOutputWidth="2048" />
   <Output Status="New" Filename="..\report\gmAPI_CallByName.out" />
   <InformationFiles>
      <Load id="UnUsedTest-projUnusedDll-std-csh.vbi" />
      <Load id="UnUsedTest-UnUsedTestEXE-std-csh.vbi" />
   </InformationFiles>
   <FindCallByName ShowDetails="on" />
   <Storage Action="Close" />
</gmGlobal>

The C# gmAPI has not yet been integrated with the RemoveUnused logic, but this may be done in a future release.
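Until that integration exists, the cross-check between reported removals and known late-called names has to be done by the reviewer. The Python sketch below shows one possible way to support that review; it is not a gmStudio feature. It assumes the reviewer has already collected the late-called member names by hand (the format of the FindCallByName report is not assumed here), and it flags any removal whose identifier contains one of those names as a segment. Names are compared case-insensitively because VB6 identifiers are case-insensitive. The function name do_not_remove_lines is hypothetical.

```python
from xml.sax.saxutils import quoteattr


def do_not_remove_lines(removal_ids, late_called_names):
    """Build DoNotRemove elements for removals that match a late-called name."""
    late = {name.lower() for name in late_called_names}
    lines = []
    for identifier in removal_ids:
        segments = identifier.lower().split(".")
        if any(segment in late for segment in segments):
            # quoteattr adds the surrounding quotes and escapes special characters.
            lines.append(f"<DoNotRemove id={quoteattr(identifier)}/>")
    return lines


# Identifiers taken from the sample removal script; the late-called name
# is a hypothetical reviewer finding.
removals = ["UnUsedTestEXE.notUsedSub", "UnUsedTestEXE.notUsedFunc"]
for line in do_not_remove_lines(removals, late_called_names={"NotUsedSub"}):
    print(line)
```

The resulting lines can be pasted inside a RemoveUnused statement. Matching on names alone is deliberately conservative: it may protect a symbol that merely shares a name with a late-called member, which is safer than removing one that is actually called.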

One of the major strategies for removing unwanted CallByNames is to provide interfaces which can be used to either type the host of the CallByName or to box the CallByName itself. Before actual CallByName refactoring instructions can be authored to use these interfaces the interfaces themselves have to be defined. The optional "Interfaces" attribute generates an Interface Description File (IDF) based on the unresolved CallByNames encountered; for example:

<FindCallByName ShowDetails="on" interfaces="LateCallInterfaces" />

This IDF file can then be referenced by a FindCallByName script that does not specify the Interfaces attribute. In this form a set of refactoring commands are authored using the IDF generated by FindCallByName@interfaces.

Using gmGlobal Analysis as an Upgrade Project Task 

The gmGlobal Tool

The Unused Analysis and Late Calls Reports are produced by gmGlobal.exe, a tool distributed with gmStudio. gmGlobal is a console application that takes a gmPL script on its command line. The script tells gmGlobal what code to analyse, how to analyse it, and how to report the results. As discussed above, gmGlobal recognizes and internally processes the following gmPL commands:

  • InformationFiles
  • FindCallByName
  • RemoveUnused
  • ReportRemovals

gmGlobal can analyse a collection of inter-related components. It identifies symbols that are not referenced at all as well as symbols that are referenced only from dead code.

You may integrate gmGlobal into your gmStudio project as a special gmGlobal task. Typically this task is inserted into the task list so that it runs after the VBPs have been translated and the VBI files to be analysed have been created. When you Translate this special task, gmStudio will prepare an actual script and run gmGlobal.exe, passing the actual script as the first argument.

A gmGlobal task has its Source File set to the gmGlobal script template. The task must use the GMTOOL: notation to specify the location of gmGlobal.exe; for example:

GMTOOL:C:\Program Files (x86)\GreatMigrations\gmStudio\gmGlobal.exe

Here are some attributes of a sample gmGlobal task:

Source Name              = [RemoveUnused1]
Source Location          = [C:\gmSpec\Util\UnUsedTest\proj\usr]
Source File Name         = [RemoveUnused1.xml]
Translation Script       = [GMTOOL:C:\Program Files (x86)\GreatMigrations\gmStudio\gmGlobal.exe] 
Task Command Script      = [UserCmds.cmd]
Code Bundle Path         = [C:\gmSpec\Util\UnUsedTest\proj\log\UnUsedTest-RemoveUnused1-std-csh.bnd]