2. WordprocessingML Reference Material

From OOXML-Wiki

[edit] Throughout

Prefixed element names are used throughout without any indication of whether these prefixes are bound to XML namespace URIs. Alexbrn 16:35, 1 May 2007 (BST)
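To illustrate why the missing bindings matter, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree` parser. The namespace URI is the WordprocessingML main namespace; the helper name `is_well_formed` is invented for this illustration.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as namespace-well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# An example as it might be printed in the text: the w: prefix is never bound.
unbound = "<w:p><w:r><w:t>text</w:t></w:r></w:p>"
# The same example with the prefix bound to a namespace URI.
bound = f'<w:p xmlns:w="{W_NS}"><w:r><w:t>text</w:t></w:r></w:p>'

print(is_well_formed(unbound))  # False: unbound prefix
print(is_well_formed(bound))    # True
```

A namespace-aware parser rejects the first form outright, so a reader who copies an example verbatim cannot process it without guessing the binding.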

[edit] Errors in examples

There are approximately 2300 examples in the WordprocessingML section of the specification. These examples were tested for well-formedness and validity against the schema, using custom software. Approximately 300 of the examples are in error, which is more than 10%. The list is available at http://surguy.net/articles/ooxml-validation-and-technical-review.xml (I'll move the files here once upload of files to the wiki is possible --Inigo). The examples in error haven't all been checked manually, but a random selection has been checked, and all of those proved to be correctly identified as errors, which gives high confidence that the majority of the remainder are also genuine errors.
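The custom software is not reproduced here. As a rough sketch of the well-formedness half of such a check (schema validation is not covered, since the Python standard library has no XSD validator), one could wrap each fragment in a root element that binds the `w` prefix and try to parse it. The function name `failing_examples`, the wrapper element and the sample fragments are all made up for illustration.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def failing_examples(fragments):
    """Return (index, parser message) for each fragment that is not well-formed."""
    failures = []
    for i, fragment in enumerate(fragments):
        # Wrap the fragment so that the w: prefix is bound and a single root exists.
        wrapped = f'<wrapper xmlns:w="{W_NS}">{fragment}</wrapper>'
        try:
            ET.fromstring(wrapped)
        except ET.ParseError as exc:
            failures.append((i, str(exc)))
    return failures

samples = [
    "<w:p><w:r><w:t>ok</w:t></w:r></w:p>",
    "<w:p><w:r><w:t>unclosed</w:r></w:p>",  # w:t is never closed
    '<w:sz w:val="24"/>',
]
bad = failing_examples(samples)
print(f"{len(bad)} of {len(samples)} sample fragments failed")
print(f"rate reported above: {300 / 2300:.1%}")
```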

(Opinion: While a certain number of errors is understandable in any large specification, the sheer volume of errors indicates that the specification has not been through a rigorous technical review before becoming an Ecma standard, and therefore may not be suitable for the fast-track process. Let's discuss this further. --Inigo)

Please read the comments on the discussion page for this topic for more information on the automated validation. User:Inigo.surguy

The use of xml:space='preserve' is inconsistent in examples, which is confusing because it is not clear when and how it should be used. For example on page 989 one of the w:t elements has this attribute, the others do not. This should be corrected here and in all other examples. [Robin]

[edit] Interoperability between ODF and OOXML

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following features:

[edit] Absolute image positioning within a frame

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: image can be positioned absolutely within a frame

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] An option to rotate the text by 90 or 270 degrees

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: an option to rotate the text by 90 or 270 degrees.

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Any number of rows can be selected for repeating Heading

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: any number of rows can be selected for repeating Heading

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Allow 8192 table columns rather than OOXML's 63

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: allow 8192 table columns rather than OOXML's 63

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Background Image in Tables

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Background Image in Tables -- background image can be defined for an entire table, a row or an individual cell. This image is automatically resized when modifying the table.

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Evenly distribute contents in a multi-column section

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Contents in a multi-column section can be evenly distributed resulting in balanced columns

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Ability to set arbitrary Text background color

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: ability to set arbitrary Text background color

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Before/After text around foot notes references

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Before/After text around foot notes references

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Copy Heading while splitting Table

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Copy Heading while splitting Table

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Table Shadowing Style

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Table Shadowing Style

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Vertical numbering in list items

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: vertical numbering in list items

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] 'Leading' line spacing in a paragraph

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: 'Leading' line spacing in a paragraph

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] 'May Break Between Rows' attribute so as not to split a table

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: a 'May Break Between Rows' attribute so as not to split a table

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] An option to specify "Numbers of lines" for widow or orphans lines

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: an option to specify "Numbers of lines" for widow or orphans lines

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] 'Manual' and 'From left' alignment in tables

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: 'Manual' and 'From left' alignment in tables

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Last line alignment in justified paragraph

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Last line alignment in justified paragraph (provision that we can change the last line of the paragraph as Left, Center and Justify)

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Allow entire sections to be marked as hidden

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: allow entire sections to be marked as hidden

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Tabs fill character of a paragraph

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Tabs fill character of a paragraph

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] 'Title' and 'lowercase' style options

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: 'Title' and 'lowercase' style options

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Table can have 'keep with next paragraph' set

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: table can have 'keep with next paragraph' set

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Ability to set each image border with different properties

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: ability to set each image border with different properties

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Font weights beyond just 'normal' and 'bold'.

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: font weights beyond just 'normal' and 'bold'.

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Table of content protection against manual changes

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Table of content protection against manual changes

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Background opacity

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Background opacity

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] auto page break option

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: 'auto' option when application decides if it should insert a page break

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Shadow distance, and a color of shadow other than black

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: shadow distance, and a color of shadow other than black

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Table cell protection

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Table cell protection

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Text blinking

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Text blinking

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Column separator attributes

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Column separator attributes : width, color, height, vertical-align.

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Text-box can define the vertical text alignment

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: text-box can define the vertical alignment of text (top, center, bottom)

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Notes embedded in text-boxes

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Notes embedded in text-boxes

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Assign different page colors in a document

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: ability to assign different page colors throughout the document

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Columns for frames/text-boxes

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Columns for frames/text-boxes

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.

[edit] Keep ratio feature for frames

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the following feature: Keep ratio feature for frames

Proposed change: Include support for this feature from ISO ODF in order to improve interoperability between the two formats.


[edit] Other issues

[edit] 2.1 [36 on p12]

Replace "Pansose-1" with "Panose-1"

[edit] 2.2 [p26, 24 and 27]

These lines define the contents of an OOXML document of type Wordprocessing in terms that are not compatible with the definition of OOXML documents given in Part 1, Section 4. Definitions, page 7, lines 1 to 3. Note that Section 2.2 as a whole is affected by that inconsistency.

Proposed change: Rewrite or remove Section 2.2. Consider explaining what an OOXML story would be in terms of document renditions by applications.

[edit] 2.2 [p26, 27 and 28]

The definition of 'story' is inappropriate. We shouldn't be defining a markup standard in application terms. We should be defining markup in markup terms. Where the user can type is immaterial.

Proposed change: Clarify the definition of 'story'.

[edit] 2.2.1 [p27]

The child elements of the background element are defined to be in the VML namespace. VML is "a legacy format... included for backwards compatibility reasons". WordprocessingML should not be dependent on a deprecated legacy format.

Proposed change: Remove this reference to VML. The child elements of the background element should be defined in DrawingML.

[edit] 2.2.1 [p27, 1 and 2]

Assuming that 'background' refers to the background of the document defined by one of its enclosing elements, and assuming that the notions of document page and of displaying are properly defined and match the commonly accepted ones, the sentence 'This background shall be displayed on all pages of the document, behind all other document content.' leaves it unclear whether the whole surface of a page must be filled with the background or, if not, how the relevant part of that surface is to be determined.

Proposed change: Clarify the definition of 'background'.

[edit] 2.2.1 [p27, 8 and 21]

Contradicting use of accent3 and accent5 – the text says one thing, but the example says another.

Proposed change: Fix the contradiction.

[edit] 2.2.1 [p28, 0]

The reference to the urn:schemas-microsoft-com:vml namespace references VML, which is considered deprecated (Part 4, page 4343, lines 11 and 12). A new standard should not contain deprecated parts.

Proposed change: Remove all references to VML from the OOXML text, hence remove the reference to the urn:schemas-microsoft-com:vml namespace here.

[edit] 2.2.1 [p28, 0]

Child elements of background are described using deprecated features only. Accordingly, the background element should either be described in terms of current OOXML elements or deprecated.

Proposed change: Describe the background element in terms of non-deprecated elements or remove it.

[edit] 2.2.1 [p28, 1]

The sentence 'or auto to allow a consumer to automatically determine the background color as appropriate.' does not define the appropriate behavior of the consumer, whereas the definition of the corresponding simple type, found in Part 4, page 1737, explicitly states that 'This value shall be used to specify an automatically determined color value, the meaning of which is interpreted based on the context of the parent XML element.'

Proposed change: Define the characteristics of the auto value for the color attribute of the background element properly.

[edit] 2.2.1 [p29, 0]

There are several instances of the word 'border' that are meaningless in this context (the text is supposed to describe the 'background' element at that location and no “border” has been defined).

Proposed change: Clarify which border the text refers to (if any notion of border must be introduced here) or else rewrite the text so that it makes sense.

[edit] 2.3.1 [28]

Replace "... a designating specifying ..." with "... an attribute specifying ..."

(The same phrase is also used on p4 in 2.4.1 [36] of the Primer.)

[edit] 2.3.1.8 [p59]

This element uses a bitmask to specify various paragraph conditional formatting properties. The use of bitmasks rather than a set of boolean types makes this data almost impossible to process with standard XML tools like XSLT, which lack bit-level operations.

Proposed change: Rewrite this subclause to express the feature using XML constructs rather than bitmasks.
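To make the objection concrete, here is a sketch of what a consumer has to do before it can ask a simple question such as "is the first-row formatting on?". It assumes the mask is written as a string of twelve binary digits; the flag names and their order are illustrative assumptions, not taken from the subclause.

```python
# Illustrative flag names (assumed); the subclause defines the real bit meanings.
FLAG_NAMES = [
    "firstRow", "lastRow", "firstColumn", "lastColumn",
    "oddVBand", "evenVBand", "oddHBand", "evenHBand",
    "firstRowFirstColumn", "firstRowLastColumn",
    "lastRowFirstColumn", "lastRowLastColumn",
]

def decode_bitmask(val: str) -> dict:
    """Turn a string of '0'/'1' digits into a name -> bool mapping."""
    if len(val) != len(FLAG_NAMES) or set(val) - {"0", "1"}:
        raise ValueError(f"expected {len(FLAG_NAMES)} binary digits, got {val!r}")
    return {name: digit == "1" for name, digit in zip(FLAG_NAMES, val)}

flags = decode_bitmask("100000000000")
print([name for name, on in flags.items() if on])
```

A general-purpose language does this in a few lines, but an XSLT 1.0 stylesheet has to fall back on positional substring tests for every flag.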

[edit] 2.3.2 [9 on p159]

Run properties are not represented by the r element.

Proposed change: Change the order of the sentence to relate the r element more directly to the run: "The next level of the document hierarchy is the run, which defines a region of text with a common set of properties. A run is represented by an r element, which allows the producer..."

[edit] 2.3.2.36 [13 on p229]

This section should make it clear that it applies only to non-complex script characters. Compare with "2.3.2.37 szCs (Complex Script Font Size)".

Proposed change: Change the title to read "2.3.2.36 sz (Non-Complex Font Size)"

[edit] 2.3.2.36 [14 on p229]

This section should make it clear that font sizes are expressed in half-point values.

Proposed change: Change line 14 to read "This element specifies the font size, in half-point values, which shall be applied to all non-complex script characters in the contents of this run when displayed."
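For readers unfamiliar with the unit, a half-point value is simply twice the point size. A tiny sketch (the function name is invented here):

```python
def half_points_to_points(val: str) -> float:
    """Convert a half-point attribute value such as "24" to a size in points."""
    return int(val) / 2

print(half_points_to_points("24"))  # 12.0
print(half_points_to_points("21"))  # 10.5
```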

Why are half-point values needed anyway? Any need for half-points that stems from the old binary format should be confined to the implementation that converts the standard back to that format. For future documents and evolution, the font size should be specified in a properly standard way.

Why are different units of measurement used elsewhere? There should be a standard unit of measurement throughout, or some unit from existing standards such as in the @font-face font description of [CSS2] and the element of [SVG].

[edit] 2.3.2.36 [16 on p229]

Line 16 has confusing structure and use of "value".

Proposed change: Change this to "If this element is not present, the default is to leave the font size as the value applied at the previous level in the style hierarchy."

[edit] 2.3.2.36 [16 on p229]

The font size value used ultimately depends on the font being available. If the font is not available, and font substitution is turned on, then a different font (and size) may be displayed. (See 2.15.3.46 subFontBySize (Increase Priority Of Font Size During Font Substitution))

Font substitution is the process by which an application determines which font to use in place of a font that is referenced by a document, but is not available to the application trying to display the document.

Typically, applications may perform font substitution using any mechanism available. This element, when present with a val attribute value of true (or equivalent), specifies that finding a font with a similar font size shall have increased precedence when doing font substitution for this document.

Proposed change: Change this section and section 2.7.2 Style Hierarchy to clarify the effect of missing fonts and font substitution on the text style and font size used (with a link to 2.8 Fonts).


[edit] 2.3.3.6 [26]

It is not clear why the day is shown in the example as 12 rather than Wednesday.

The use of US-style dates in these examples is misleading (do Canadians use the US format or the UK format?). Dates should be printed in the unambiguous ISO 8601 format in international standards.
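The ambiguity is easy to demonstrate. In this sketch the date string is made up for illustration; the same text yields two different dates depending on which convention the reader assumes, whereas the ISO 8601 form has only one reading.

```python
from datetime import datetime

raw = "05/01/2007"  # made-up example string

as_us = datetime.strptime(raw, "%m/%d/%Y").date()  # month first
as_uk = datetime.strptime(raw, "%d/%m/%Y").date()  # day first

print(as_us.isoformat())  # 2007-05-01
print(as_uk.isoformat())  # 2007-01-05
```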

Response: The dayShort element exists for legacy reasons as stated in the specification: [Note: The date block is a legacy construct used for compatibility with older word processors, and should not be produced unless it was consumed while reading a document – it is recommended that the DATE field is used in its place. end note]

It is recommended that, if present, the dayShort element be displayed using the primary editing language of the consuming application. This is legacy behavior found in some legacy documents, which is why it was included in the specification. Clearly, the correct approach is the one described for the DATE field.

[edit] 2.3.3.14 [p256]

The text says, “This element specifies the language which shall be for this phonetic guide.” This sentence is missing a verb. Is it "used"?

Proposed change: Insert the missing verb

[edit] 2.3.3.19 [p261]

This says that “The layout properties of this embedded object are specified using the VML syntax”. However, Part 1, Section 8.2.6 says, “VML should be considered a deprecated format included in Office Open XML for legacy reasons only and new applications that need a file format for drawings are strongly encouraged to use preferentially DrawingML”. Certainly a new document creating an OLE embedding should not be using VML. Otherwise, all OOXML consumers will need to support VML, even where legacy documents are not present.

Proposed change: Define layout properties of embedded objects using DrawingML rather than VML

[edit] 2.3.3.30 [7 on p276]

The attribute is named simply "space", which implies w:space and conflicts with the description, which is of xml:space.

Proposed change: For clarity, the name should be shown as xml:space.

Does the use of the xml:space attribute on w:t require the xml namespace to be declared in “/[Content_Types].xml” or in some other part of a WordprocessingML document? If so, clarify the impact.
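On the namespace question, the Namespaces in XML recommendation binds the `xml` prefix by definition, so a namespace-aware parser should accept xml:space without any declaration. A minimal check with Python's standard-library parser, using a made-up fragment that declares only the `w` prefix:

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Only w: is declared; the xml: prefix is never declared in this fragment.
doc = f'<w:t xmlns:w="{W_NS}" xml:space="preserve">   text   </w:t>'
t = ET.fromstring(doc)

print(t.get(f"{{{XML_NS}}}space"))  # preserve
print(repr(t.text))                 # '   text   '
```

The parser itself leaves the text content untouched either way; any trimming is done by the consuming application.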

The space attribute description does not define what the default white-space processing mode means for WordprocessingML.

Proposed change: State that default white-space processing depends on the application implementing the standard, and state what the default mode is for OOXML.

The example states "Although there are three spaces on each side of the text content in the run,..." In the example, the three spaces are at the end of the text content.

Proposed change: Change the example to match the description, giving a more complete example of removing both leading and trailing white space.

<w:t>   significant whitespace   </w:t>

The default surely depends on the application implementing the standard. Therefore the example cannot say that the white space is removed if the space attribute is not specified. (That may be the default for the MS implementation, but not for other standard implementations.)

Proposed change: Qualify example default white-space processing.
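As a sketch of the behavior the example describes, a consumer that trims by default and honors xml:space="preserve" might look like the following. The trimming default is an assumption about one possible consumer, which is exactly the point of the comment above, and the function name `rendered_text` is invented.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def rendered_text(t: ET.Element) -> str:
    """Text of a w:t element as this hypothetical consumer would show it."""
    text = t.text or ""
    if t.get(XML_SPACE) == "preserve":
        return text
    # Assumed default mode: drop leading and trailing white space.
    return text.strip(" \t\r\n")

plain = ET.fromstring(
    f'<w:t xmlns:w="{W_NS}">   significant whitespace   </w:t>'
)
kept = ET.fromstring(
    f'<w:t xmlns:w="{W_NS}" xml:space="preserve">   significant whitespace   </w:t>'
)

print(repr(rendered_text(plain)))
print(repr(rendered_text(kept)))
```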

The namespace referenced does not define the possible values. They are defined in the separate XML 1.0 specification, (currently) linked from the namespace reference.

Proposed change: Include definition of supported values or provide dependable link to external definition.

[edit] 2.4.1.16 [7]

Reword from "does this cell ... " to "whether this cell continues the horizontal merge or starts a new merged group of cells". Alexbrn 16:36, 1 May 2007 (BST)

[edit] 2.4.7 [p302]

This element uses bitmasks to specify various table cell formatting properties. The use of bitmasks rather than a set of boolean types makes this data almost impossible to process with standard XML tools like XSLT, which lack bit-level operations.

Proposed change: Rewrite this subclause to express the feature using XML constructs rather than bitmasks.
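One possible shape for such a rewrite, sketched in Python, is one boolean attribute per property. The element name `cellFormatting` and the attribute names are hypothetical and only show the idea.

```python
import xml.etree.ElementTree as ET

def flags_to_element(tag: str, flags: dict) -> ET.Element:
    """Build an element carrying one boolean attribute per flag."""
    elem = ET.Element(tag)
    for name, on in flags.items():
        elem.set(name, "true" if on else "false")
    return elem

# Hypothetical element and attribute names.
elem = flags_to_element("cellFormatting", {"firstRow": True, "lastRow": False})
print(ET.tostring(elem, encoding="unicode"))

# A stylesheet or XPath expression can now test @firstRow = 'true' directly.
print(elem.get("firstRow") == "true")
```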

[edit] 2.4.8 [p303]

This element uses bitmasks to specify various table row formatting properties. The use of bitmasks rather than a set of boolean types makes this data almost impossible to process with standard XML tools like XSLT, which lack bit-level operations.

Proposed change: Rewrite this subclause to express the feature using XML constructs rather than bitmasks.

[edit] 2.4.46 [p421]

It is desired to have improved interoperability between ODF and OOXML. However, OOXML lacks the ability to specify a multi-row header that repeats across pages, where ODF does.

Proposed change: Include in this section the ability to specify that the first N rows of a table can be selected as a header.

[edit] 2.4.51 [p428-429]

This element uses bitmasks to specify various table style formatting properties. The use of bitmasks rather than a set of boolean types makes this data almost impossible to process with standard XML tools like XSLT, which lack bit-level operations.

Proposed change: Rewrite this subclause to express the feature using XML constructs rather than bitmasks.

[edit] 2.4.52 [p430-431]

This element uses bitmasks to specify various table style formatting properties. The use of bitmasks rather than a set of boolean types makes this data almost impossible to process with standard XML tools like XSLT, which lack bit-level operations.

Proposed change: Rewrite this subclause to express the feature using XML constructs rather than bitmasks.

[edit] 2.5.2.8 [7]

The date in the w:fullDate attribute should be specified in an ISO 8601 format (yyyy-mm-dd) so that it can be processed by XML processors as an XSD date.

This attribute is inconsistent with respect to the w:date attribute, which requires the use of ISO 8601 format dates.

Response: The w:fullDate attribute represents a date in the standard XML Schema dateTime syntax. Therefore it will be processed by XML processors as an xsd date anyway. XML Schema is the W3C standard, which closely follows ISO 8601; hence this is not a contradiction.
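As a rough illustration of the difference, Python's `datetime.fromisoformat` accepts the basic `YYYY-MM-DDThh:mm:ss` form used by xsd:dateTime and rejects a slash-separated date. It handles only a subset of the xsd:dateTime lexical space, so treat this as a sketch; the sample values and the helper name are made up.

```python
from datetime import datetime

def parse_xsd_datetime(value: str):
    """Return a datetime for a basic xsd:dateTime string, or None if it does not parse."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None

print(parse_xsd_datetime("2006-01-01T10:00:00"))
print(parse_xsd_datetime("01/01/2006"))  # None: not in ISO 8601 form
```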

[edit] 2.7.3 [16]

See 2.7.3.17 [6 on p690] below. Replace "w:priority" with "w:uiPriority"

[edit] 2.7.3.17 [6 on p690]

It appears that the child w:priority of w:style has been changed to w:uiPriority in the schema and in the tables in the Language Reference but has not been changed in any of the examples throughout the documents I've seen so far.

Replace "w:priority" with "w:uiPriority"
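Anyone who has already copied the affected examples can apply the same replacement mechanically. A sketch using the standard-library parser follows; the sample style fragment and the function name are made up.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def rename_priority(root: ET.Element) -> int:
    """Rename every w:priority element under root to w:uiPriority; return the count."""
    old = f"{{{W_NS}}}priority"
    new = f"{{{W_NS}}}uiPriority"
    matches = list(root.iter(old))  # collect first, then modify
    for elem in matches:
        elem.tag = new
    return len(matches)

# Made-up fragment in the style of the affected examples.
style = ET.fromstring(
    f'<w:style xmlns:w="{W_NS}" w:styleId="Heading1">'
    '<w:priority w:val="9"/></w:style>'
)
renamed = rename_priority(style)
print(renamed)
```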

[edit] 2.7.3.17 [27 on p690]

See 2.7.3.17 [6 on p690] above. Replace "w:priority" with "w:uiPriority"

[edit] 2.7.5 [4 on p709]

See 2.7.3.17 [6 on p690] above. Replace "w:priority" with "w:uiPriority"

[edit] 2.7.6 [23]

See 2.7.3.17 [6 on p690] above. Replace "w:priority" with "w:uiPriority"

[edit] 2.7.8 [7 on p734]

See 2.7.3.17 [6 on p690] above. Replace "w:priority" with "w:uiPriority"

[edit] 2.8.2.2 [p740, 0xEE]

This value is said to signify “an Eastern European character set”. There is no such thing. First, “Eastern Europe” is not unambiguously delineated. Second, this region uses many character scripts, including Roman, Cyrillic, Arabic, Armenian, etc.

Proposed change: Explain what is meant by “an Eastern European character set”.

[edit] 2.8.2.2 [p740, 2]

The default character set is said to be “the ANSI character set”. But ANSI has standards for many character sets. Do you mean ANSI 209-1992 “Matrix Character Set for OCR”? Probably not. So a normative reference to a specific standard is required.

Proposed change: Provide normative reference for “the ANSI character set”.
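The practical consequence of an unspecified character set is that the same byte decodes to different characters. In Windows usage "ANSI" commonly means the locale's Windows code page, and the value 0xEE matches the Windows EASTEUROPE_CHARSET constant, which is conventionally associated with code page 1250. Whether the specification intends either of these is an assumption. The sketch below decodes one byte under three Windows code pages.

```python
raw = bytes([0xA3])

# Three Windows code pages, each of which could be called "the ANSI character set".
decoded = {codec: raw.decode(codec) for codec in ("cp1252", "cp1250", "cp1251")}

for codec, char in decoded.items():
    print(codec, char, f"U+{ord(char):04X}")
```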

[edit] 2.8.2.13 [5]

Replace "Pansose-1" with "Panose-1"

[edit] 2.8.2.16 [p758-763]

This element uses a set of bitmasks to specify which code pages a given font supports. The use of bitmasks rather than an XML Schema derived type makes this data almost impossible to process with standard XML tools like XSLT, which lack bit-level operations. One of the advantages of XML is that we don't need to encode data like this any more.

Proposed change: Rewrite this subclause to express the feature using XML constructs rather than bitmasks.
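For comparison, here is what extracting the set bits from a hexadecimal mask looks like in a language that does have bit-level operations. The sample value is made up, and the sketch deliberately assigns no meaning to the bit positions.

```python
def set_bits(hex_mask: str) -> list:
    """Return the positions (0 = least significant) of the bits set in a hex string."""
    value = int(hex_mask, 16)
    return [i for i in range(value.bit_length()) if (value >> i) & 1]

print(set_bits("0000019F"))  # [0, 1, 2, 3, 4, 7, 8]
```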

[edit] 2.11 [6]

typo: "storied"

[edit] 2.13.4 [20]

The w:date attribute is (correctly) shown in ISO 8601 format. This attribute is, therefore, inconsistent with the w:fullDate attribute defined in 2.5.2.8. Inconsistent use of value types in attributes that serve similar purposes will only serve to confuse developers and will inherently lead to errors in coding of applications.

Response: 2.5.2.8 does not define a date attribute. Perhaps this comment is based on the example in 2.5.2.8 which shows a use case involving the dateFormat element which uses the attribute fullDate (both defined in section 2.5.2.7). As can be seen in this section, the fullDate attribute uses the ST_DateTime simple type defined in 2.18.15.

Similarly, the referenced use of w:date in section 2.13.4 is part of an example. The actual definition of that attribute comes in section 2.13.4.2 where the w:date attribute is also defined as being of type ST_DateTime.

ST_DateTime uses the W3C's XML schema dateTime type which follows ISO 8601's conventions for dates.

[edit] 2.13.5.26 [p988]

The restriction applied here has no place in a standard. The presence of a move without proper from and to locations is an error and should not be treated as an insertion. This proviso makes it harder for the reader and easier for a writer, but an interchange standard should always make it easier for a reader, whose task is more difficult in any case. This and other similar provisos should be removed from the specification.

[edit] 2.15.1.14 [p1126]

Heading and schema refer to "bordersDoNotSurroundFooter", but text and examples refer to "bordersDontSurroundFooter". Replace "bordersDontSurroundFooter" with "bordersDoNotSurroundFooter" throughout.

[edit] 2.15.1.15 [p1127]

Heading and schema refer to "bordersDoNotSurroundHeader", but text and examples refer to "bordersDontSurroundHeader". Replace "bordersDontSurroundHeader" with "bordersDoNotSurroundHeader" throughout.

[edit] 2.15.1.28 [p1158, 7]

This says that document protection “shall be enforced”. “Shall” indicates required behavior. But then a few sentences later it says that document protection “may be ignored”.

Proposed change: Clarify this contradiction.

[edit] 2.15.1.28 [p1158]

A hash algorithm is provided, likely based on a legacy algorithm used in Word. This legacy algorithm is known to be weak and has effectively been cracked. One could argue that no hash algorithm would be effective in OOXML, since a user could simply unzip the document and hand edit the XML to remove the hash or to set it to some known value. However, some application types such as online editing via Google Docs, or other similar applications, can secure physical access to the document via other means. Editing access to the document does not necessarily presuppose physical access to the document's XML. So there is a need for a secure and interoperable hash algorithm, such as SHA-256, for document protection.

Proposed change: Use a standard, FIPS-180 compliant hash algorithm as the default. Legacy hash algorithms should be supported via the described extension mechanism.
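As a sketch of what a standard-hash alternative could look like (this is not the algorithm in the specification), a salted and iterated SHA-256 digest can be computed with Python's `hashlib`. The UTF-8 encoding, the salt length, the iteration scheme and the name `protection_hash` are all assumptions made for illustration.

```python
import hashlib
import os

def protection_hash(password: str, salt: bytes, spin_count: int = 100_000) -> bytes:
    """Salted, iterated SHA-256 digest of a password (illustrative scheme only)."""
    # Assumption: the password is encoded as UTF-8 before hashing.
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    for i in range(spin_count):
        # Mix in the iteration counter with an explicit byte order.
        digest = hashlib.sha256(digest + i.to_bytes(4, "little")).digest()
    return digest

salt = os.urandom(16)
stored = protection_hash("secret", salt, spin_count=1_000)

print(len(stored))                                              # 32
print(protection_hash("secret", salt, spin_count=1_000) == stored)
print(protection_hash("Secret", salt, spin_count=1_000) == stored)
```

Because every input to the digest is a byte string with a stated encoding and byte order, the result is the same on any platform.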

[edit] 2.15.1.28 [p1158, 13...]

This algorithm description fails to specify the encoding of the input password. Presumably it is Unicode, but in what encoding? UTF-16BE? UTF-16LE? UTF-16 with a BOM (Byte Order Mark)? The described algorithms make use of byte-level manipulations which depend on the machine architecture (big endian versus little endian). So it is necessary that all byte ordering assumptions be made explicit.

Proposed change: Make the byte ordering assumptions explicit, both for the input password and the processing steps, so as to allow cross-platform interoperability. Keep in mind that the hash may be calculated on a different machine architecture than the password was entered with.
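
To illustrate why the encoding matters, here is a minimal Python sketch that encodes one password under three UTF-16 variants. The password value is hypothetical, and the sketch does not reproduce the algorithm in the specification; it only shows that the bytes handed to any byte-level hash would differ.

```python
# Hypothetical password; the specification does not say how it is encoded.
password = "Example"

# The same characters yield different byte sequences under each encoding.
utf16_le = password.encode("utf-16-le")  # little-endian, no BOM
utf16_be = password.encode("utf-16-be")  # big-endian, no BOM
utf16_bom = password.encode("utf-16")    # BOM followed by the platform's byte order

for label, data in [("UTF-16LE", utf16_le),
                    ("UTF-16BE", utf16_be),
                    ("UTF-16+BOM", utf16_bom)]:
    print(f"{label:11} {data.hex()}")
```

Any hash computed over these byte sequences will generally differ, so two implementations that guess differently cannot verify each other's passwords.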

[edit] 2.15.1.28 [p1158, 16]

What if the entered password is shorter than 15 characters? Do we truncate to the actual length? Or fill with 0's? Or something else?

Proposed change: Clarify this processing step.

[edit] 2.15.1.28 [p1159, 6-9]

The described processing steps are ambiguous. In particular, SHR and SHL give different results on different machines and with signed and unsigned values.

Proposed change: Describe the hash algorithm in a platform independent manner.
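
As a sketch of the ambiguity, the Python below shifts the same bit pattern right by one, first as an unsigned value and then as a signed one. The 16-bit width and the sample value are assumptions chosen for illustration; they are not taken from the specification.

```python
MASK16 = 0xFFFF  # assumed 16-bit working width, for illustration only


def shr_logical(value: int, bits: int) -> int:
    """Right shift treating the 16-bit value as unsigned (zero fill)."""
    return (value & MASK16) >> bits


def shr_arithmetic(value: int, bits: int) -> int:
    """Right shift treating the 16-bit value as signed (sign fill)."""
    value &= MASK16
    if value & 0x8000:       # top bit set: reinterpret as a negative number
        value -= 0x10000
    return (value >> bits) & MASK16


sample = 0x8001  # hypothetical intermediate value with the top bit set
print(hex(shr_logical(sample, 1)))     # 0x4000
print(hex(shr_arithmetic(sample, 1)))  # 0xc000
```

A platform-independent description would state the operand width and whether each shift fills with zeros or with the sign bit.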

[edit] 2.15.1.29 [p1172]

This element allows the classification of the document into one of three types: “letter”, “email” or “general”. Although the description says that this feature can be used by, “hosting applications to facilitate customized user interface and/or automatic formatting behaviors based on the 'type' of a given WordprocessingML document”, the taxonomy provided is so weak as to be practically useless.

Proposed change: Either provide a reasonable document type taxonomy, or loosen the type to an xsd:string to allow applications to provide their own.

[edit] 2.15.1.86 [p1251]

This element uses a bitmask to specify a style display filter. Using a bitmask rather than a set of boolean values makes this data almost impossible to process with standard XML tools such as XSLT, which lack bit-level operations.

Proposed change: Rewrite this subclause to express the feature using XML constructs rather than bitmasks.
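
The following Python sketch shows the kind of bit-level decoding a consumer has to perform before the data becomes usable as booleans. The flag names and bit positions are hypothetical placeholders, not the ones defined in the specification.

```python
# Hypothetical flag names and bit positions, for illustration only.
HYPOTHETICAL_FLAGS = {
    "flagA": 0x0001,
    "flagB": 0x0002,
    "flagC": 0x0004,
}


def decode_bitmask(hex_value: str) -> dict:
    """Expand a hexadecimal bitmask string into named boolean values."""
    mask = int(hex_value, 16)
    return {name: bool(mask & bit) for name, bit in HYPOTHETICAL_FLAGS.items()}


print(decode_bitmask("0005"))
```

If each flag were instead expressed as its own boolean attribute or element, no such decoding step would be required and the values could be tested directly in XSLT or XPath.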

[edit] 2.15.1.87 [p1253]

This element uses a bitmask to specify style display sorting parameters. Using a bitmask rather than a set of boolean values makes this data almost impossible to process with standard XML tools such as XSLT, which lack bit-level operations.

Proposed change: Rewrite this subclause to express the feature using XML constructs rather than bitmasks.

[edit] 2.15.2.32 [p1337]

This feature has been defined in a way which ignores the existence of current browsers other than Internet Explorer. What about Firefox? What about Safari? What about Opera? None of these can be set as target browsers. This section requires that “all settings which are not compatible with the target web browser shall be disabled.” But what if I want my application to produce standards-compliant output? So yes to PNG, no to VML, yes to MathML and SVG? I can't seem to specify this.

Proposed change: Ecma should rethink the entire optimizeForBrowser subclause. It looks very much like it is mapping directly to the arbitrary choices of a single vendor's application. This clause should be rewritten to express this feature in an application- and platform-neutral way.

[edit] 2.15.3 [p1368]

These “compatibility” settings solve no general problem. They are merely a museum of settings from previous versions of Microsoft Word. No allowance has been made for legacy settings from other applications. Better to have these be application-specific settings using the existing extensibility mechanisms of OOXML.

Proposed change: Remove the compatibility settings from OOXML.

[edit] 2.15.3.6 [p1378]

The “autoSpaceLikeWord95” element is defined in terms of mimicking a legacy application's behavior. The standard contains insufficient detail on how to replicate this behavior.

Proposed change: Define the intended behavior.

[edit] 2.15.3.26 [p1416]

The “footnoteLayoutLikeWW8” element is defined in terms of mimicking a legacy application's behavior. The standard contains insufficient detail on how to replicate this behavior.

Proposed change: Define the intended behavior.

[edit] 2.15.3.31 [p1426]

The “lineWrapLikeWord6” element is defined in terms of mimicking a legacy application's behavior. The standard contains insufficient detail on how to replicate this behavior.

Proposed change: Define the intended behavior.

[edit] 2.15.3.32 [p1427]

The “mwSmallCaps” element is defined in terms of mimicking a legacy application's behavior. The standard contains insufficient detail on how to replicate this behavior.

Proposed change: Define the intended behavior.

[edit] 2.15.3.41 [p1442]

The “shapeLayoutLikeWW8” element is defined in terms of mimicking a legacy application's behavior. The standard contains insufficient detail on how to replicate this behavior.

Proposed change: Define the intended behavior.

[edit] 2.15.3.51 [p1462]

The “suppressTopSpacingWP” element is defined in terms of mimicking a legacy application's behavior. The standard contains insufficient detail on how to replicate this behavior.

Proposed change: Define the intended behavior.

[edit] 2.15.3.53 [p1467]

The “truncateFontHeightsLikeWP6” element is defined in terms of mimicking a legacy application's behavior. The standard contains insufficient detail on how to replicate this behavior.

Proposed change: Define the intended behavior.

[edit] 2.15.3.54 [p1469]

The “uiCompat97To2003” element is defined as: “Disable UI functionality that is not compatible with Word97-2003”. But what use is this if I am using OOXML in OpenOffice or WordPerfect Office? What if I want to disable UI functionality that is not compatible with OpenOffice 1.5? Or WordPerfect 8? Or any other application? Where is the ability for other implementations to specify their preferences?

Proposed change: Define this in an application-neutral way. If it is truly a Word-only feature, then remove it from OOXML and express it as an application-defined extension.

[edit] 2.15.3.63 [p1481]

The “useWord2002TableStyleRules” element is defined in terms of mimicking a legacy application's behavior. The standard contains insufficient detail on how to replicate this behavior.

Proposed change: Define the intended behavior.

[edit] 2.15.3.64 [p1482]

The “useWord97LineBreakRules” element is defined in terms of mimicking a legacy application's behavior. The standard contains insufficient detail on how to replicate this behavior.

Proposed change: Define the intended behavior.

[edit] 2.15.3.65 [p1483]

The “wpJustification” element is defined in terms of mimicking a legacy application's behavior. The standard contains insufficient detail on how to replicate this behavior.

Proposed change: Define the intended behavior.

[edit] 2.15.3.66 [p1485]

The “wpSpaceWidth” element is defined in terms of mimicking a legacy application's behavior. The standard contains insufficient detail on how to replicate this behavior.

Proposed change: Define the intended behavior.

[edit] 2.16.1 [p1487]

The production rule for field-switch-character is defined as: “field-switch-character: ! one or two Latin letters”. However, “Latin letters” is not defined in this specification. Are we to take this literally as allowing only the letters used in Latin, i.e., capital letters A-Z excluding J, U and W? Or is ISO 8859-1, the Latin-1 character set, meant? Or is something else meant?

Proposed change: Provide a precise definition for this production rule.
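
To show how far the candidate readings diverge, this Python sketch builds three possible sets of “Latin letters” and tests a few characters against each. The three interpretations are those suggested in the comment above plus plain A-Z/a-z; none of them is confirmed by the specification.

```python
import string

# Interpretation 1: the classical Latin alphabet (capitals, no J, U or W).
classical = set(string.ascii_uppercase) - set("JUW")

# Interpretation 2: the basic Latin letters A-Z and a-z.
basic = set(string.ascii_letters)

# Interpretation 3: every alphabetic character in ISO 8859-1 (Latin-1).
latin1 = {chr(cp) for cp in range(256) if chr(cp).isalpha()}

for candidate in ["A", "J", "j", "é"]:
    print(candidate,
          candidate in classical,
          candidate in basic,
          candidate in latin1)
```

A field switch such as one using "j" or "é" would be accepted or rejected depending on which reading an implementer picks.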

[edit] 2.16.3 [p1581]

Incorrect capitalization. Substitute "fldChar" for "fldchar" in example. Typo in example. Substitute "separate" for "seperate" in example.

[edit] 2.16.4 [p 1497, day names in different languages]

The instruction ddd "Formats the day of the week or month in its abbreviated form according to the language specified by the lang element (§2.3.2.18) on the run containing the field instructions." There is a similar instruction for month names. There is no list of these names referenced in the specification.

Proposed change: Provide a normative reference to the day and month names to be used.

[edit] 2.16.4.3 [p1501]

The definition for BAHTTEXT references 'the given Thai format', which makes no sense in the context of that definition. What “given Thai format”?

Proposed change: Clarify the definition of 'BAHTTEXT'.

[edit] 2.16.5.1 [p 1509, address formats in different countries]

The switch \d "Specifies that the address is to be formatted according to the country/region of the recipient." There is no list of address formats in the specification.

Proposed change: Provide a normative reference to the address formats to be used.

[edit] 2.16.5.3 [p 1511, missing default for ASK response]

The switch \d specifies a default response. "If no default response is specified, the most recent response is used". No behaviour is specified if there is no prior response.

Proposed change: Document the behaviour if there is no default specified, and no prior response.

[edit] 2.16.5.5 [p1512, 11-12]

According to the text, the AUTONUM field is deprecated. A new standard should not contain deprecated parts.

Proposed change: Remove all references to AUTONUM from the OOXML text.

[I disagree with the previous reviewer's proposed change. Instead, I recommend: ]

Proposed change: Mark AUTONUM as deprecated in normative text. Clarify the behaviour associated with deprecation in the Fundamentals section.

[edit] 2.16.5.6 [p1513]

AUTONUMLGL is deprecated - see the comments on AUTONUM.

[edit] 2.16.5.7 [p1514]

AUTONUMOUT is deprecated - see the comments on AUTONUM.

[edit] 2.16.5.8 [p1515]

AUTOTEXT "Inserts the AutoText entry whose name is specified by text in field-argument. Regarding XML generation, the field result is the value of the autotext.". What does this mean? Does it mean that it is possible to use AUTOTEXT to generate OOXML elements that should be parsed? What is the behaviour if that generated OOXML contains further AUTOTEXT elements? How can a document containing the AUTOTEXT element be validated if it can self-modify?

Proposed change: Clarify this text. Add an example of XML generation. Describe the behaviour if AUTOTEXT is present recursively. Describe the implications for validation.

[edit] 2.16.5.8 [p1515]

The AUTOTEXT entry "can be arbitrarily complex and involve VML". VML is deprecated elsewhere in the specification. Non-deprecated components should not depend on deprecated components.

Proposed change: Either deprecate AUTOTEXT, or replace the use of VML within it with the use of DrawingML.

[edit] 2.16.5.10 [p1516]

BARCODE produces barcodes in FIM or POSTNET format. There is no reference to a definition of these formats.

Proposed change: Provide a reference to FIM and POSTNET formats.

[edit] 2.16.5.12 [p1518]

BIDIOUTLINE replicates the AUTONUMOUT command, except for differences in Arabic/Hebrew numbering. AUTONUMOUT is deprecated. BIDIOUTLINE should likewise be deprecated if it replicates a deprecated command.

Proposed change: Deprecate BIDIOUTLINE in normative text.

[edit] 2.16.5.16 [p1524 definition of calendars]

Various calendars (the Saka Era, the Gregorian calendar, and the Lunar/Hijri calendar) are mentioned in the specification without a reference to a definition of those calendars. This is particularly serious for the Gregorian calendar here, because the specification doesn't make clear what sort of Gregorian calendar it is (and calendars such as GregorianXLitEnglish are mentioned elsewhere in the specification).

Proposed change: Provide a reference to the formal definition of those calendars used within the specification.

[edit] 2.16.5.18 [p1524]

The \l switch specifies "If no date-and-time-formatting-switch is used, the date shall use the date format last used by the hosting application when inserting a new DATE field". There is no behaviour specified when a date format has not previously been used.

Proposed change: Specify the behaviour when \l is used and there is no previous date format.

[edit] 2.16.5.22 [p1527]

The EQ command allows equations to be inserted. This is using a different mechanism from the one specified in the Math section of the SharedML. There should not be multiple non-deprecated methods for defining equations defined within the specification.

Proposed change: Deprecate the EQ command in favour of the Math markup language.

[edit] 2.16.5.24 [p1532]

FILESIZE can "round to the nearest kilobyte". The example specifies a file of 4660736 bytes and gives the result of rounding to the nearest kilobyte as 4661. This is incorrect: since there are 1024 bytes in a kilobyte, the result should be 4552 (4660736 / 1024 = 4551.5).

Proposed change: Correct the example to match the specification.
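
The arithmetic can be checked with a short Python sketch. The quotient with a 1024-byte kilobyte is exactly 4551.5, a tie, so the result also depends on the tie-breaking rule; Python's round() rounds ties to the nearest even integer, which here agrees with rounding half up.

```python
size_bytes = 4660736  # file size from the example in the specification

# Quotients under the two possible definitions of a kilobyte.
binary_kb = size_bytes / 1024   # 4551.5
decimal_kb = size_bytes / 1000  # 4660.736

print(round(binary_kb))   # 4552
print(round(decimal_kb))  # 4661
```

The value 4661 printed in the specification therefore matches a 1000-byte kilobyte, which suggests the specification should also state which definition of kilobyte is intended.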

[edit] 2.16.5.33 [p1537]

This field says that it merely retrieves the picture contained in the named document. Is nothing else to be done with the picture? For example, should it be displayed?

Proposed change: Define what is to be done with the picture once it is retrieved.

[edit] 2.16.5.33 [p1537]

This does not define how a picture is named (field-argument). Is it by a URI? By a local file system path? Either? The example given has a DOS file path, a construct which is not portable.

Proposed change: Define how pictures are named.

[edit] 2.16.5.33 [p1537]

This subclause defines an INCLUDEPICTURE field which “Retrieves the picture contained in the document named”. However, no mention is made of what formats are permissible for the picture.

Proposed change: There should be specified at least a small set of interoperable formats.

[edit] 2.16.5.34 [p1537-1538]

This does not define how a document is named (field-argument-1). Is it by a URI? By a local file system path? Either? The example given has a DOS file path, a construct which is not portable.

Proposed change: Define how documents are named.

[edit] 2.16.5.34 [p1538]

This subclause defines an INCLUDETEXT field which “Inserts all or part of the text and graphics contained in the document named”. However, no mention is made of what formats are permissible for the retrieved text.

Proposed change: There should be specified at least a small set of interoperable formats.

[edit] 2.16.5.34 [p1538]

The \t flag will apply a named XSLT transform to the input XML file and insert the resulting output. However, no proper reference is given to XSLT, so we do not know what version XSLT transform is permitted here.

Proposed change: Provide a proper external normative reference for the XSLT which is allowed here.

[edit] 2.16.5.35 [p1539]

"The text in this switch's field-argument is specifies the language ID"

Proposed change: Remove "is"

[edit] 2.16.5.40 [p1543, 12-13]

The definition for 'LISTNUM' is built upon the concepts of 'current' or 'specific' or 'next series', which are not defined in this context (a backward search on 'series' sheds no light on this). Those concepts should be defined in the text, and their definition should either be copied or referenced in the context of the definition for 'LISTNUM'.

Proposed change: Expand or reference the definition for 'series', and/or clarify the definition for 'LISTNUM' by any appropriate means.

[edit] 2.16.5.41 [p1545]

This describes a “MACROBUTTON” field which can run a designated macro or command. But there is no mention of what programming language or APIs are allowed for such a designated macro or command.

Proposed change: Describe this feature to a level where cross-platform, cross-application interoperability is possible.

[edit] 2.16.5.53 [p1552]

The PRINT field specifies PostScript strings to be sent to the printer. Presumably this is only on printing of the document, but the specification does not make this explicit.

Proposed change: Make it explicit that PostScript within a PRINT field should only be sent to the printer when the document is printed.

[edit] 2.16.5.77 [p1570]

The example in the USERINITIALS section shows USERNAME instead.

Proposed change: Correct the example.

[edit] 2.16.19 [p1591]

Apparently the fldData element contains custom data which must be preserved, but the contents of which are undefined. In other words, this is a place where the legacy format is not being specified. I was contacted by a developer of an OpenXML converter, who highlighted this as something they had been unable to decode (many other ambiguities can apparently be deciphered by using Office 2007). It should be deleted or explained. Similarly 2.16.20.

Response: No particular semantics are defined for the "fldData" element, and therefore this field may be used as desired to store additional application-specific data. Section 2.16.19 states: "This element specifies custom field data which shall be associated with the parent field. No information or semantics are applied to the contents of this data by this Office Open XML Standard, and therefore this field may be used as desired to store additional application-specific data with the field. However, applications should not lose the contents of this custom data if they do not understand or utilize it". That is, application-specific data can be stored in this field, and the format in which it is stored is specific to the producing or consuming application rather than predefined. Section 2.6, "Interoperability Guidelines", states: "For the guidelines to be meaningful, a software application should be accompanied by publicly available documentation that describes what subset of this Standard it supports." That is, for interoperability the application should provide documentation of its application-specific data.

[edit] 2.16.20 [p1591/2]

Same comment as 2.16.19.

[edit] 2.16.25 [p1604]

Incorrect capitalization. Substitute "fldChar" for "fldchar" in example. Typo in example. Substitute "separate" for "seperate" in example.

[edit] 2.18.4 [p1631...]

The artwork provided here is of poor quality and indicates neither the intended scale, spacing, nor color depth. A small example diagram is an insufficient definition. For example, are the dimensions of the borders absolute? Or do they scale with page size? Also, some of the images, such as 'apples' or 'balloons3Colors', show copying artifacts, where extraneous lines or blotches appear.

Proposed change: Provide full normative definitions for these graphical elements. Also, for informative purposes, these graphics may be provided in standalone file form, preferably in a scalable graphics format like SVG.

[edit] 2.18.4 [p1631...]

No mechanism for expanding the set of art borders is provided. Since the specified art borders are heavily Western-oriented, it would be good to provide a way for an application to supplement these styles with graphics that provide more regional flavor.

Proposed change: Provide an interoperable extensibility mechanism for a document author or application to specify their own art border graphics.

[edit] 2.18.7 [p1690]

Various calendars are mentioned in the specification (Saka Era, Taiwan, Gregorian, GregorianXLitEnglish, etc.) without a reference to a definition of those calendars.

Proposed change: Provide a reference to the formal definition of those calendars used within the specification.

[edit] 2.18.45 [p1737]

Length is said to be “exactly 3 characters”. This is inconsistent with the example given which has a length of 6 characters.

Proposed change: Clarify the definition. In particular, note that xsd:hexBinary measures length in octets, not characters.
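
The distinction between octets and characters can be seen in a two-line Python check. The value below is a hypothetical six-character hexadecimal string, not the one printed in the specification.

```python
value = "FF0000"  # hypothetical six-character hexadecimal value

# Each pair of hexadecimal digits encodes one octet.
octets = bytes.fromhex(value)

print(len(value))   # 6 characters in the lexical form
print(len(octets))  # 3 octets in the decoded value
```

A length of 3 is therefore consistent with a six-character example only if the length is counted in octets.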

[edit] 2.18.51 [p1747]

The use of 255 enumerated language codes, in addition to ISO 639-1 codes, adds no expressive value and only increases the work required of any application that would process an OOXML document.

Proposed change: Drop the use of the redundant ST_LangCode.

[edit] 2.18.51 [p1747, 22]

Double quotes used incorrectly, with two sets of close quotes.

Proposed change: XML examples should be given using straight quotes.

[edit] 2.18.52 [p1748]

This type is defined as containing “a two digit hexadecimal language code”. It is further stated that “This simple type's contents must have a length of exactly 2 characters”. However, two hex digits can only count up to 255, and the values enumerated in this clause go far beyond that.

Proposed change: Reconcile the description of the type with the enumerated values.
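
A quick Python check of the range: the largest value two hexadecimal digits can hold is 255, so any four-digit code falls outside the stated type. The four-digit code used here is a hypothetical sample, not quoted from the enumeration in the specification.

```python
# Largest value representable in two hexadecimal digits.
max_two_digits = int("FF", 16)
print(max_two_digits)  # 255

# Hypothetical four-digit code, used only to show the mismatch.
sample_code = "0409"
print(len(sample_code))                        # 4
print(int(sample_code, 16) > max_two_digits)   # True
```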

[edit] 2.18.57 [p1759]

The description of this type says it contains four hexadecimal digits, four hexadecimal octets and exactly four characters. These definitions are not compatible. A hexadecimal octet is two hexadecimal digits.

Proposed change: Clarify the definition. In particular, note that xsd:hexBinary measures length in octets, not characters.

[edit] 2.18.66 [p1771]

The formatting system described here is not comprehensive, lacking, for example, support for Armenian, Tamil, Greek alphabetic, Ethiopic and Khmer numerations, all in use today, as well as the various historical systems still used by scholars.

Proposed change: Use a more flexible, extensible, generative approach to numeration, such as that used by the W3C's XSLT standard in its xsl:number support.

[edit] 2.18.66 [p1771]

There is nothing in this section which is normatively defined except some enumeration values. No normative meanings to these values are given. An informative example is insufficient in all but the most trivial cases. For example, where is “Korean Legal Counting System” defined?

Proposed change: Give explicit definitions of these numbering styles or proper external normative references.

[edit] 2.18.66 "chicago" [p1772]

Format is defined in reference to the “Chicago Manual of Style”, but no edition or page reference is provided.

Proposed change: Either include the entire definition in the standard, or provide a proper external reference.

[edit] 2.18.66 “decimalEnclosedFullstop” [p1772]

The example given does not show enclosed characters and so contradicts the normative text.

Proposed change: Reconcile the text and the example.

[edit] 2.18.66 “decimalFullwidth”, etc. [p1773]

There are several mentions of double-byte and single-byte Arabic numbering schemes. Since the conformance clause for OOXML requires the use of Unicode in UTF8 or ITF