Support Statement: Improving Type Inference for MSHTML
Overview
We have this VB6 code:

    Dim wb As WebBrowser
    Set wb = New WebBrowser
    Dim v As Variant
    v = wb.Document.All("ID").innerText

The initial C# translation of the complex expression is:

    v = ((object)((dynamic)wb.Document).All("ID").innerText());
The presence of dynamic here indicates the tool was not able to fully resolve the expression. The code builds, but it may lead to runtime problems during testing.
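To illustrate why a dynamic cast defers errors to runtime, here is a minimal sketch. It assumes a hypothetical plain .NET class, FakeElement, standing in for an HTML element. It is not an MSHTML type, and real COM objects bound through IDispatch may behave differently. The method-style call compiles, but it fails only when that line executes.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical stand-in for an HTML element; not part of MSHTML.
public class FakeElement
{
    public string innerText { get; set; } = "hello";
}

public static class DynamicDemo
{
    public static void Main()
    {
        dynamic el = new FakeElement();

        // Property access is bound at runtime and succeeds.
        string text = el.innerText;
        Console.WriteLine(text);

        try
        {
            // Compiles because el is dynamic, but innerText is a property,
            // so invoking it like a method fails when this line executes.
            object v = el.innerText();
            Console.WriteLine(v);
        }
        catch (RuntimeBinderException ex)
        {
            Console.WriteLine("Runtime binding failed: " + ex.Message);
        }
    }
}
```

With a statically typed expression, the same mistake would be reported by the compiler instead of surfacing during testing.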
Analysis and Migration Rules
The first weak link in the expression is WebBrowser.Document. This property is weakly typed in COM:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\gmSpec\COM\MSHTMLTest\proj\idf\FromIdl\shdocvw.dll.xml
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<class id="IWebBrowser" parent="IDispatch">
   <property id="Document" type="Object" status="Out"/>
At runtime, I believe this will have the interface described by MSHTML.HTMLDocument.
I can declare this typing using a RefactorLibrary:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\gmSpec\COM\MSHTMLTest\proj\usr\shdocvw.dll.Refactor.xml
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<RefactorLibrary>
   <Refactor id="[shdocvw.dll]" errorStatus="warn">
      <reference id="mshtml.tlb" />
      <Migrate id="IWebBrowser.Document" type="MSHTML.HTMLDocument" />
   </Refactor>
</RefactorLibrary>

Note that the Refactor command also loads mshtml.tlb, through its reference element, so the HTMLDocument type is available.
This gets us to this point:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
\gmSpec\COM\MSHTMLTest\proj\deploy\MSHTMLTest\modMSHTMLTest.cs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
public static void Test()
{
   SHDocVw.WebBrowser wb = null;

   wb = new SHDocVw.WebBrowser();

   object v = null;

   v = ((object)((dynamic)wb.Document.all.item("ID")).innerText);

Notice that all and item are now lower case, and innerText is a property rather than a method.
This is all consistent with the declarations in MSHTML.
Also the stub framework generated for WebBrowser reflects the type change:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
\gmSpec\COM\MSHTMLTest\proj\deploy\MSHTMLTest\externs\SHDocVw.cs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
using System;
namespace SHDocVw
{
...
   public class WebBrowser : InternetExplorer
   {
      public MSHTML.HTMLDocument Document
Note that MSHTML is a very large, complex API with many scars from the browser wars. There are numerous versions of similar interfaces and liberal use of late binding. I suspect this complexity arose so that the API could maintain some backward compatibility as Microsoft rolled out version after version over the last 30 years.
Starting from HTMLDocument, I trace my way through the MSHTML object model to find the most likely parent of innerText. There are many candidates to choose from; I choose DispHTMLGenericElement.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\gmSpec\COM\MSHTMLTest\proj\idf\FromIdl\mshtml.tlb.xml
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<coclass id="HTMLDocument" uuid="25336920-03F9-11CF-8FD0-00AA00686F13">
   <subclass id="DispHTMLDocument"/>
   <subclass id="HTMLDocumentEvents"/>

<class id="DispHTMLDocument" parent="None">
   <property id="all" type="HTMLElementCollection" status="Out"/>

<coclass id="HTMLElementCollection" uuid="3050F4CB-98B5-11CF-BB82-00AA00BDCE0B" creatable="off">
   <subclass id="DispHTMLElementCollection"/>

<class id="DispHTMLElementCollection" parent="None" default="item">
   <method id="item" type="stdole.IDispatch">
      <argument id="name" type="Variant" status="ByVal" optional="Default"/>
      <argument id="index" type="Variant" status="ByVal" optional="Default"/>
   </method>

<class id="DispHTMLGenericElement" parent="None">
   <property id="innerText" type="String" status="InOut"/>
I can reflect this decision in another RefactorLibrary:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\gmSpec\COM\MSHTMLTest\proj\usr\mshtml.tlb.Refactor.xml
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<RefactorLibrary>
   <Refactor id="[mshtml.tlb]" errorStatus="warn">
      <Migrate id="DispHTMLElementCollection.Item" type="DispHTMLGenericElement" />
   </Refactor>
</RefactorLibrary>
The resulting translation now looks like this:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
\gmSpec\COM\MSHTMLTest\proj\deploy\MSHTMLTest\modMSHTMLTest.cs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
public static void Test()
{
   SHDocVw.WebBrowser wb = null;

   wb = new SHDocVw.WebBrowser();

   string v = "";

   v = wb.Document.all.item("ID").innerText;
Note also that the type of innerText has been used to infer the type of variant v as string.
The translation also picks up a stub object model for the MSHTML namespace.
The resulting translation is clean and builds in .NET.
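As a rough illustration of why the typed chain compiles without dynamic, the sketch below uses hand-written stand-ins for the stub classes. These are assumptions for illustration only: they are not the stubs gmStudio generates, the member bodies are invented, and the real WebBrowser stub derives from InternetExplorer, which is omitted here.

```csharp
using System;

// Hypothetical, hand-written stand-ins for the generated stubs.
// The real stub object model contains many more classes and members.
namespace MSHTML
{
    public class DispHTMLGenericElement
    {
        public string innerText { get; set; } = "";
    }

    public class HTMLElementCollection
    {
        // Typed return value, mirroring the Migrate rule applied to item.
        public DispHTMLGenericElement item(object name = null, object index = null)
        {
            // Invented body so the sketch has something to print.
            return new DispHTMLGenericElement { innerText = "stub text for " + name };
        }
    }

    public class HTMLDocument
    {
        public HTMLElementCollection all { get; } = new HTMLElementCollection();
    }
}

namespace SHDocVw
{
    public class WebBrowser
    {
        // Typed property, mirroring the Migrate rule on IWebBrowser.Document.
        public MSHTML.HTMLDocument Document { get; } = new MSHTML.HTMLDocument();
    }
}

public static class StubDemo
{
    public static void Main()
    {
        var wb = new SHDocVw.WebBrowser();

        // Every link in the chain is statically typed, so the compiler
        // can check the expression and infer string for the result.
        string v = wb.Document.all.item("ID").innerText;
        Console.WriteLine(v);
    }
}
```

Because each member in the chain has a declared type, a misspelled member or a method-style call on innerText would be a compile error here rather than a runtime failure.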
Inline Method
The approach above used two RefactorLibrary files and registered them with commands in the translation script:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\gmSpec\COM\MSHTMLTest\proj\usr\MSHTMLTest_std.tran.xml
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
<Registry type="MigFile" Source="shdocvw.dll" Target="shdocvw.dll.Refactor" />
<Registry type="MigFile" Source="mshtml.tlb" Target="mshtml.tlb.Refactor" />

<Compile Project="%SrcPath%">
It is also possible to inline the RefactorLibraries directly:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\gmSpec\COM\MSHTMLTest\proj\usr\MSHTMLTest_std2.tran.xml
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
<Compile Project="%SrcPath%">
   <Refactor id="[shdocvw.dll]" errorStatus="warn">
      <reference id="mshtml.tlb" />
      <Migrate id="IWebBrowser.Document" type="MSHTML.HTMLDocument" />
   </Refactor>
   <Refactor id="[mshtml.tlb]" errorStatus="warn">
      <Migrate id="DispHTMLElementCollection.Item" type="DispHTMLGenericElement" />
   </Refactor>
</Compile>

The RefactorLibrary approach is a bit more work, but it is recommended because it gives the upgrade solution more structure, and RefactorLibrary files can be reused and extended over time.
Note that gmStudio will create or open the RefactorLibrary for a COM file when you click "Edit RefactorLibrary" from the References context menu.