WordWalkingStick 10-21-2010 10-42-07 AM

There are a happy few people on this planet who expressed concern about the next version of CleanXHTML. So today finds me pasting this Blog post into WordPress from my next version of CleanXHTML: WordWalkingStick. This project will be released on CodePlex.com (along with my other projects) ‘soon,’ under the same license Eric White uses for PowerTools for Open XML, the Ms-PL. This is done out of respect for the portions of WordWalkingStick that depend on PowerTools for Open XML.

Here are some random points about WordWalkingStick, written just before midnight:

  • WordWalkingStick is based on .NET 4, using MEF, WPF and PowerTools for Open XML.

  • This application has a scope larger than CleanXHTML as it provides a small framework for rolling up all of my Office Word customizations. Previously, my practice depended on customizing Normal.dot.

  • This move puts me out of the commercial software business (based on a business model from the early 21st century).

Using my “Swiss Army Knife” (my stick) for Office Word is supposed to make customizing faster and easier to migrate to future versions of Office (until it moves entirely to the cloud or VSTO is discontinued as we now know it).

The shot below shows a Word 2010 Rich Text Content Control, nesting a Plain Text Content Control:

This idea of nesting content controls comes to me from Eric White’s “Using Nested Content Controls for Data and Content Extraction from Open XML WordprocessingML Documents.” Eric mentions this very important bit:

Important note: In order to nest content controls, the containing content control must be a rich-text content control.  You create one of these using the upper-left button in the Controls section of the Developer tab.  Thanks, Darin.

Another important bit: you cannot nest content controls in Word 2007 or earlier! This new feature in Word 2010 effectively replaces the functionality of “Custom XML” that has been removed by a court order from Word 2010. I daresay nested content controls are not as conceptually embarrassing as some critics of Microsoft have claimed. The Content Control does not require the use of an external schema file (which was technically entertaining to me—but not to many, many others).
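For the curious, here is roughly what that nesting looks like at the file-format level. This is only a sketch: it uses Python's standard library rather than the .NET Open XML SDK, and the markup fragment and the tag names ("Author", "LastName") are hypothetical, hand-written stand-ins for what Word would save. A content control is a `w:sdt` element, and a nested control simply shows up inside the outer control's `w:sdtContent`.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical, hand-written fragment: a rich-text control tagged "Author"
# that contains a plain-text control tagged "LastName".
SAMPLE = f"""
<w:body xmlns:w="{W}">
  <w:sdt>
    <w:sdtPr><w:tag w:val="Author"/></w:sdtPr>
    <w:sdtContent>
      <w:p>
        <w:sdt>
          <w:sdtPr><w:tag w:val="LastName"/><w:text/></w:sdtPr>
          <w:sdtContent><w:r><w:t>White</w:t></w:r></w:sdtContent>
        </w:sdt>
      </w:p>
    </w:sdtContent>
  </w:sdt>
</w:body>
"""


def control_tag(sdt):
    """Return the w:tag value of a content control, or None if it has none."""
    tag = sdt.find("w:sdtPr/w:tag", NS)
    return tag.get(f"{{{W}}}val") if tag is not None else None


def walk_controls(element, depth=0):
    """Yield (depth, tag, text) for every content control under element."""
    for child in element:
        if child.tag == f"{{{W}}}sdt":
            content = child.find("w:sdtContent", NS)
            text = ""
            if content is not None:
                # Collect all run text inside the control, nested controls included.
                text = "".join(t.text or "" for t in content.iter(f"{{{W}}}t"))
            yield depth, control_tag(child), text
            if content is not None:
                # Nested controls live inside the outer control's content.
                yield from walk_controls(content, depth + 1)
        else:
            yield from walk_controls(child, depth)


controls = list(walk_controls(ET.fromstring(SAMPLE)))
for depth, tag, text in controls:
    print("  " * depth + f"{tag}: {text}")
```

The depth value is what makes the nesting useful for extraction: the outer control names the record and the inner controls name its fields.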

It is very, very important (to me) to see nested content controls in Design View (above). However, most writing about this subject shows them in print/layout view (below):

Without the news in Eric’s article, I would be essentially doomed. Yes, ‘doomed’ is a strong word so let the research of Peter Sefton help me be a bit more articulate. He has a 2008 article entitled “Embedding metadata and other semantics in word processing documents” and the title speaks clearly to me. Modern word processing file formats need a standard way to store metadata. And, no, there is no quiet, elegant Open Source program out there that saves the day. Anyone out there who considers their documents first-class entities for any data management system cannot dismiss Word 2010 with a bunch of Microsoft player-hating. I keep trying to get rid of Word and I keep going back.

BTW: In case you can’t get that Peter Sefton article, try the slide deck “Embedding Metadata In Word Processing Documents” (or the PDF).

My limited research informs me that Eric White has gone the longest way toward consistently (almost daily at times) and explicitly applying contemporary .NET technologies with Microsoft Office. Surely Eric would suggest that he deals in Microsoft Office file formats—not Office itself (VSTO). Moreover Eric might say that he had very little to do with the Document Reflector—arguably the most important tool written for the VSTO world seen through the lens of Open XML.

The Document Reflector is part of version 2.0 of the Open XML SDK bundled in a GUI application called “Open XML SDK Development Productivity Tools.” BTW: since I am unknown for “beating up” on Microsoft’s Brian Jones, it must be said that Brian Jones mentions Document Reflector more than Eric White, according to my last search.

Hey, Eric is a busy guy—he’s been writing or pointing us to articles like:

Transforming Flat OPC Format to Open XML Documents: Even though Eric White makes no mention of VSTO in this article, this is the one that suggests (to me) how to use the full power of the Open XML SDK inside of Microsoft Word. The official priority, by the way, appears to be that Open XML tools are written for processing documents outside of Word (for massive, long-awaited, server-based solutions).
The Flat OPC Format: “Note that the Flat OPC format is not the same as the ‘Word 2003 XML Document’ format.  Those documents have a schema that is very different from the Flat OPC format.”
Using Open XML to Improve Automation Performance in Word 2010 for Large Amounts of Data

“The Range.WordXml object returns a Flat OPC XML document for that range as a string. You use this to prepare an in-memory package so that your code can access necessary parts such as the main document part, the styles part, and the numbering part.”

The Word object model is a moving target with regard to Open XML. The Range.WordXml object has been replaced by the Range.WordOpenXML Property.
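To make the “in-memory package” idea a little more concrete, here is a minimal sketch of splitting a Flat OPC string into its parts. It is written in Python with the standard library only, not with the Open XML SDK the article uses. The `FLAT_OPC` string is a hypothetical, trimmed-down stand-in for what Word would hand back, and `split_parts` is my own illustrative helper. What I am relying on is the Flat OPC layout: a `pkg:package` root holding `pkg:part` elements, each with a `pkg:name` attribute and its XML inside `pkg:xmlData`.

```python
import xml.etree.ElementTree as ET

PKG = "http://schemas.microsoft.com/office/2006/xmlPackage"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, trimmed-down stand-in for the Flat OPC string Word returns.
FLAT_OPC = f"""<pkg:package xmlns:pkg="{PKG}">
  <pkg:part pkg:name="/word/document.xml"
            pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">
    <pkg:xmlData>
      <w:document xmlns:w="{W}">
        <w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>
      </w:document>
    </pkg:xmlData>
  </pkg:part>
  <pkg:part pkg:name="/word/styles.xml"
            pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml">
    <pkg:xmlData>
      <w:styles xmlns:w="{W}"/>
    </pkg:xmlData>
  </pkg:part>
</pkg:package>"""


def split_parts(flat_opc):
    """Map each part name to the root element stored in its pkg:xmlData.

    Parts stored as pkg:binaryData (images and the like) are skipped.
    """
    parts = {}
    for part in ET.fromstring(flat_opc).iter(f"{{{PKG}}}part"):
        xml_data = part.find(f"{{{PKG}}}xmlData")
        if xml_data is not None and len(xml_data):
            parts[part.get(f"{{{PKG}}}name")] = xml_data[0]
    return parts


parts = split_parts(FLAT_OPC)
main = parts["/word/document.xml"]
text = "".join(t.text or "" for t in main.iter(f"{{{W}}}t"))
print(sorted(parts), text)
```

Once the parts are in a dictionary like this, the main document part and the styles part can be read side by side without going back to Word.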

Transforming Open XML WordprocessingML to XHtml: A “map” listing 18 articles on the subject of Open XML and XHTML. Truly groundbreaking for Microsoft!

Open XML for Word 2010 VSTO links:

Open XML, the “Custom XML” litigation and Content Controls

So, I’ve talked about what appears to be my “Custom XML” problem earlier. It may be the right place and time to add a few flippant remarks.

Microsoft’s recognition of this Texan ruling lies in “Utility to manage custom XML markup feature availability for customers outside the United States and its territories”; the title speaks for itself. Articles like “Associating Data with Content Controls” from the TechNet world (Gray Knowlton) go deeper into this “Custom XML” issue (and back to Eric White).

I’m sure I was wearing headphones with the sound going directly into my ears while Paul Thurrott in some episode of Windows Weekly mentioned in passing that “Microsoft complies with court, strips Word of custom XML.” It was a jury in Texas that decided that my digital life should be intimately disrupted as “Microsoft has issued updates for Word 2007 and Word 2003 that strip those applications of a feature that infringes on the patent of a tiny Canadian software company, i4i.” And I’m flippantly sure that Paul Thurrott said that this change will have an “insignificant” impact on whatever he continually says “whatever” about… so, speaking of bad comedy, here’s a picture from a previous post showing just how much I’m into “custom XML”:

One important finding of mine disagrees with the use of the word “strip” in sentences like:

So what do you do if you have custom XML in your Word documents? If you don’t use the custom XML, then there’s no problem, just open the files and Word will strip it out, leaving you the rest of the document. Same if your use can be switched to using another feature. You will lose your existing markers but otherwise can continue.

What’s actually happening (according to my copy of Word 2010) is that Word is not altering the contents of my documents simply because they contain “custom XML.” This apparently “illegal” content is not displayed in Word 2010. The XML defining the “custom XML” is still stored in the document.

What this suggests (after many hours curled up on the floor sobbing, Why me!) is that the Open XML SDK can be used to reach those fragments of “custom XML”—once there one could:

  • Brutally copy the contents of the document (with a VSTO add-in) and paste it back into Word. This might coerce the “custom XML” tags to show again because (according to my copy of Word 2010) the commands and tools related to “custom XML” work as expected—you simply can’t display your work in a future editing session.
  • Stop using “custom XML” and use the Content Control instead. In “What is ‘Custom XML?’ … and the impact of the i4i judgment on Word,” this suggestion is made. The first subtle problem here is that Content Control visuals don’t appear in draft mode—which is my favorite mode to work in Word.
  • Assume that Microsoft will not let some judge in Texas and some company in Canada stop them from “innovating” with Word. It may take them years but they’ll come out with some kind of “embrace and extend” trick.
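Before picking any of these options, it helps to confirm that the “custom XML” really is still sitting in the file. The following is a minimal sketch of that check, done with Python's standard library instead of the Open XML SDK. The sample markup, the `urn:example:invoice` namespace and the helper names are all hypothetical, and the in-memory zip is only a stand-in for a real .docx (it lacks the other parts a real package carries). The part I am relying on is that custom XML markup is stored as `w:customXml` elements, with `w:uri` and `w:element` attributes, in `word/document.xml`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical document body with one block of custom XML markup.
DOCUMENT_XML = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:customXml w:uri="urn:example:invoice" w:element="customer">
      <w:p><w:r><w:t>Contoso</w:t></w:r></w:p>
    </w:customXml>
    <w:p><w:r><w:t>Plain paragraph</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def make_package(document_xml):
    """Build an in-memory zip holding only the main document part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", document_xml)
    buffer.seek(0)
    return buffer


def find_custom_xml(docx_file):
    """Return (uri, element, text) for each w:customXml in the main part.

    docx_file may be a path to a .docx or any file-like object.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    found = []
    for node in root.iter(f"{{{W}}}customXml"):
        text = "".join(t.text or "" for t in node.iter(f"{{{W}}}t"))
        found.append((node.get(f"{{{W}}}uri"), node.get(f"{{{W}}}element"), text))
    return found


markers = find_custom_xml(make_package(DOCUMENT_XML))
print(markers)
```

A non-empty result means the markers survived the trip through Word, which is the precondition for migrating them to content controls.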

In the summer of 2009, Mary Jo Foley reported that Microsoft appealed the decision. Since I’m writing this very, very late to the party, clearly the appeal failed. In fact, in the winter of 2009 we find Tim Bray saying:

I see that Microsoft lost an appeal in the “Custom XML” litigation, and may be forced to disable that functionality in Microsoft Office. This is a short backgrounder explaining what “Custom XML” is about, and why nobody should care.

Hey, let’s drive this issue into the ground (deeper) with Stéphane Rodriguez (in 2008):

It’s interesting that Microsoft bloggers don’t even seem to be [embarrassed] by ridiculous expressions such as “Custom XML”. Custom XML is indeed just as silly as “Office Open XML” : the reason is X in XML already means Custom.