Conversion specifications

Original documents

Only docx files are accepted as conversion sources. Files in the doc format saved by older versions of Microsoft Word are not converted.

Version of destination HTML

By default, tags that conform to the HTML specifications are output.

HTML specification reference

If you specify “-xhtml” parameter as a conversion option, XHTML 1.0 compliant tags will be output.

The tag samples in the conversion specifications below show output that conforms to the HTML specification.

Root, head and meta-information

Conversion source Conversion destination (HTML tag) Remarks
Root <!DOCTYPE html> <html lang="">

Japanese ver.: lang="ja"

English ver.: lang="en"

See Note 1 for language judgment.

The language can also be specified by a parameter in the conversion options.

Character encoding <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> UTF-8 is the basic format. In addition, Shift_JIS and UTF-16 can be specified as conversion option parameters.
Info: Title <head> <title>~</title> </head> Get the title information from the contents of the property "Title" on the Word "Info" tab.
<head> <meta name="author" content=""> <meta name="description" content=""> <meta name="keywords" content="">

Converts the property items in the Word "Info" tab to name attribute values and their settings to content attribute values. The correspondence between the name attribute values and the Word property items is as follows:

author: Author

description: Comment

keywords: Tag

CSS link <link href="xxx.css" rel="stylesheet" type="text/css" media="print"> xxx.css is the specified CSS file name. The media attribute is optional.
Default style <head> <style>CSS style</style> </head>

Sets the default CSS to be applied to the entire HTML. The two settings are as follows:

(1) Paragraph text alignment (see 5.9)

(2) border attribute of table, vertical position of td/th (vertical-align).

However, it is not output when linking external CSS.

JavaScript specification <head> <script src="xx/yy.js"></script> </head> xx/yy.js is the JavaScript path

Note 1 Language judgement

The language is estimated from the percentage of full-width characters in the Word document and the language setting of the default style. The estimate may be incorrect.

In such cases, specify the language in the conversion options when running from the command line. See the "-lang" parameter in the table of conversion options for details.
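
The estimate described in Note 1 could be approximated as in the following minimal sketch. It is not the converter's actual algorithm: the function name guess_lang, the 30% threshold, and the use of the Unicode East Asian Width property to detect full-width characters are all assumptions made for illustration, and the default style language setting is not considered.

```python
import unicodedata


def guess_lang(text: str, threshold: float = 0.3) -> str:
    """Return "ja" or "en" from the share of full-width characters.

    Hypothetical helper: the threshold is an assumed value, not one
    documented for the converter.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return "en"
    # "W" (wide) and "F" (full-width) cover kana, kanji and full-width forms.
    wide = sum(1 for c in chars if unicodedata.east_asian_width(c) in ("W", "F"))
    return "ja" if wide / len(chars) >= threshold else "en"
```

A document that mixes both scripts can land on either side of such a threshold, which is why an explicit "-lang" value is the safer choice when the result matters.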

Block elements

Conversion source Conversion destination (HTML tag) Remarks
Body text <body>~</body>
Title style When outline level 1 is set for the title style. <h1>~</h1> Some of the title styles registered in Word's Style Gallery have outline level 1 set, while others do not.
When the title style does not have an outline level set. <p>~</p>
Paragraph <p>content</p>

By default, lines with only line breaks are ignored.

If the "-emptyP" parameter is specified in the conversion options, lines with only line breaks are output as empty <p></p>.

Forced line break <br>
Forced page break and column break Ignored.
Section When an <h> start tag is the first heading, or is preceded only by <h> tags of lower rank, a <section> start tag is output before the <h> start tag. When an <h> tag of higher rank precedes it, a </section> end tag is output.

The <section> tags output before the <h> tags form a tree structure.

If the "-xhtml" parameter is specified in the conversion options, <section> tags are output as <div class="section-area"> tags.
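
The nesting of <section> tags around headings can be sketched as follows. This is an illustrative reading of the rule above, not the converter's code: the function name wrap_sections is hypothetical, and closing the open section when a heading of the same rank appears is an assumption, as is the </div> end tag used with "-xhtml".

```python
def wrap_sections(headings, xhtml=False):
    """Build a flat tag list from (rank, text) pairs, rank 1 being the highest."""
    open_tag = '<div class="section-area">' if xhtml else "<section>"
    close_tag = "</div>" if xhtml else "</section>"
    out = []
    open_ranks = []  # ranks of the sections that are currently open
    for rank, text in headings:
        # Close every open section whose heading has the same or a deeper rank
        # (same-rank closing is an assumption).
        while open_ranks and open_ranks[-1] >= rank:
            out.append(close_tag)
            open_ranks.pop()
        out.append(open_tag)
        open_ranks.append(rank)
        out.append(f"<h{rank}>{text}</h{rank}>")
    # Close whatever is still open at the end of the document.
    out.extend(close_tag for _ in open_ranks)
    return out
```

With the headings (1, "A"), (2, "B"), (1, "C"), the section for "B" is nested inside the section for "A", and both are closed before the section for "C" opens.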

Heading styles and outline levels

Conversion source Conversion destination (HTML tag) Remarks
Heading 1 to Heading 6 (Heading style) <h1>~<h6> Set the heading style outline level to the heading rank tag.
Heading 7 to Heading 9 (Heading style)

<p class="l7">~

<p class="l9">

Heading style outline levels 7 to 9 are set as class attributes of a paragraph.
Paragraph outline levels 1 to 6 <h1>~<h6> Set the paragraph outline level to the heading rank tag.
Paragraph outline levels 7 to 9

<p class="l7">~

<p class="l9">

Paragraph outline levels 7 to 9 are set as class attributes of a paragraph.
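
The mapping in the table above can be summarized in a short sketch. The function name heading_markup is hypothetical, and the fallback to a plain <p> for values outside 1 to 9 is an assumption.

```python
def heading_markup(outline_level: int, text: str) -> str:
    """Map a Word outline level to the tag described in the table."""
    if 1 <= outline_level <= 6:
        # Levels 1 to 6 become heading rank tags.
        return f"<h{outline_level}>{text}</h{outline_level}>"
    if 7 <= outline_level <= 9:
        # Levels 7 to 9 become paragraphs with class "l7" to "l9".
        return f'<p class="l{outline_level}">{text}</p>'
    # Assumption: anything else is treated as body text.
    return f"<p>{text}</p>"
```

Because HTML has no heading rank below <h6>, the deeper levels are only distinguishable through the class attribute, so any styling for them has to come from CSS.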

Heading outline numbers

When an outline number is added to a paragraph that has a heading style, the outline number is enclosed in a <span> tag with the class attribute value "number" and output as part of the content of the <h> tag. If a space separates the outline number from the heading text, it is output as a single-byte space; if a tab separates them, the tab is deleted and a single-byte space is inserted instead.
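
As a rough illustration of the resulting markup, the sketch below builds a numbered heading. The function name numbered_heading and its parameters are hypothetical, and the case with no separator at all is an assumption.

```python
def numbered_heading(rank: int, number: str, separator: str, title: str) -> str:
    """Wrap the outline number in <span class="number"> inside the heading tag."""
    # Any separator (space or tab) is normalized to one single-byte space;
    # an empty separator stays empty (assumption).
    sep = " " if separator else ""
    return f'<h{rank}><span class="number">{number}</span>{sep}{title}</h{rank}>'
```

For example, a level 2 heading numbered "1.1" and separated from its text by a tab would come out as <h2><span class="number">1.1</span> Scope</h2>.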

Lists

Paragraphs with Word lists are converted to HTML lists (unordered lists) (<ul>/<li>). At this time, the bullet symbols in Word paragraphs are removed.

Paragraph numbering and ordered lists

Paragraphs that have been numbered at the beginning of a paragraph using Word's paragraph numbering feature (numbered paragraphs) are converted as follows:

  1. When a numbered paragraph is preceded or followed by an unnumbered paragraph or line break, the numbered paragraph is output as an HTML paragraph (<p> tag). In this case, the paragraph number is enclosed in a <span> tag with the class attribute value specified as number, and then output as normal text.
  2. When two or more numbered paragraphs are consecutive, they are output to an HTML ordered list (<ol>/<li> tags):
    1. If numbered paragraphs are arranged in a hierarchy and the first and next paragraphs are adjacent to each other, even if they are at different levels, they are considered to be consecutive.
    2. Sets the type of numbering specified in the Word document as the value of the class attribute of the <ol> tag.

The start number is specified in the start attribute when the start number is 2 or more.
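
The two cases above can be sketched as follows. This is a simplified illustration, not the converter's implementation: the function name convert_paragraphs is hypothetical, hierarchy levels and the class attribute of <ol> are left out, and the rendering of an isolated number as "n." followed by a space is an assumption.

```python
from itertools import groupby


def convert_paragraphs(paragraphs):
    """Convert (number, text) pairs; number is None for unnumbered paragraphs."""
    out = []
    # Group the input into runs of numbered and unnumbered paragraphs.
    for numbered, run in groupby(paragraphs, key=lambda p: p[0] is not None):
        run = list(run)
        if not numbered:
            out.extend(f"<p>{text}</p>" for _, text in run)
        elif len(run) == 1:
            # An isolated numbered paragraph stays a <p> with the number as text.
            number, text = run[0]
            out.append(f'<p><span class="number">{number}.</span> {text}</p>')
        else:
            # Two or more consecutive numbered paragraphs become an ordered list.
            first = run[0][0]
            start = f' start="{first}"' if first >= 2 else ""
            out.append(f"<ol{start}>")
            out.extend(f"<li>{text}</li>" for _, text in run)
            out.append("</ol>")
    return out
```

A run that begins at 3, for instance, opens with <ol start="3">, while a run that begins at 1 opens with a plain <ol>.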

Paragraph style name (optional)

By default, paragraph style names are not output.

If you specify “-pstyle” parameter as a conversion option, the name of the paragraph style is output as the value of the class attribute of the <p> or <h> tag when a paragraph style is specified in a Word paragraph. When paragraph formatting is specified without using the paragraph style feature, the value of the class attribute is not set.

Figure and figure arrangements

Output folder and file name for illustrations

  1. Illustrations inserted into the docx document are extracted from it, and their paths are set as the value of the src attribute of the <img> tag in HTML. The default folder for extracted illustration files is "image". If the "-fileimages" parameter of the conversion option is specified, a folder named "destination_file_name.images" is created for each output HTML file. The file names are automatically generated with sequential numbers.
  2. For illustrations linked to a docx document, the path of the linked file is set as the value of the src attribute of the <img> tag in HTML. Linked illustration files are not copied or moved. The illustration paths are converted to paths relative to the output HTML file. If the original docx document or the folder of the linked illustrations has been moved, the relative path may not be set correctly. Note that if the "-embedimg" parameter is specified, the images will be embedded in the HTML file.

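
The conversion of a linked illustration path into a path relative to the output HTML file can be illustrated with the standard library. This is a sketch under assumed conditions: the function name relative_src and the example paths are hypothetical, and POSIX-style paths are used so that the result does not depend on the operating system.

```python
import posixpath


def relative_src(html_path: str, image_path: str) -> str:
    """Return image_path relative to the folder that contains html_path."""
    return posixpath.relpath(image_path, start=posixpath.dirname(html_path))
```

For an HTML file at /docs/out/index.html and a linked image at /docs/img/fig1.png, this gives ../img/fig1.png. If the image folder is later moved, a path computed this way no longer resolves, which matches the caveat in item 2.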

Image and shape formats

By default, images are converted to PNG or JPEG format, and AutoShape, line shapes inserted in Word, and shape files in EMF and WMF formats are converted to SVG format for output.

If you specify the “-throughimg” parameter in the conversion option, images and shapes inserted into Word in GIF, EMF or WMF formats are saved to the illustration output folder in their original formats without file format conversion.

Layout Options

Saves the layout option type specified in Word format as the <img> class attribute.

Conversion source (Layout Options) class attribute
In Line with Text class="inline"
With Text Wrapping (common to all types) class="block"
Square class="block square"
Tight class="block tight"
Through class="block through"
Top and Bottom class="block top-bottom"
Behind Text class="block behind"
In Front of Text class="block front"

CAUTION:

In CSS, the display property specifies whether the figure layout is inline or block. Since the default value of the display property is inline, even if you set “With Text Wrapping” in the Layout Options in Word, it may be displayed as “In Line with Text” in the browser. In such a case, specify as follows in CSS:

img.block {
  display: block;
}
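
For reference, the class values of the Layout Options table can be expressed as a lookup. The names LAYOUT_CLASSES and img_class are hypothetical; the key "Behind Text" is Word's name for the option that produces class="block behind".

```python
# Word layout option -> value of the <img> class attribute, per the table above.
LAYOUT_CLASSES = {
    "In Line with Text": "inline",
    "Square": "block square",
    "Tight": "block tight",
    "Through": "block through",
    "Top and Bottom": "block top-bottom",
    "Behind Text": "block behind",
    "In Front of Text": "block front",
}


def img_class(layout_option: str) -> str:
    """Return the class attribute value for a Word layout option."""
    return LAYOUT_CLASSES[layout_option]
```

Every wrapping option shares the "block" class, so the single img.block rule shown above applies to all of them, and the second class name is available for finer styling.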

Position to output the figure with “With Text Wrapping” specified

The <img> tag for an illustration with text wrapping specified is output after the end tag of the block that holds its anchor, for headings and paragraphs. In bulleted items, however, it is output just before the end tag. For details, refer to “6.4 Layout of shapes”.

Alternative text for figures

Outputs the alt attribute to the <img> tag in HTML, where the value of the alt attribute is the string entered to the alternate text for the figure inserted in the Word document. If no string is set, "Please enter alt text." is output.
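
The fallback behavior can be sketched as follows. The function name img_with_alt is hypothetical, and escaping the attribute values with html.escape is an assumption about how special characters would be handled.

```python
from html import escape

DEFAULT_ALT = "Please enter alt text."


def img_with_alt(src: str, alt_text: str = "") -> str:
    """Build an <img> tag, falling back to the default alt string when none is set."""
    alt = alt_text if alt_text else DEFAULT_ALT
    # Escaping is an assumption; it keeps quotes in the text from breaking the attribute.
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}">'
```

Searching the output HTML for the default string is a simple way to find figures whose alternative text was never entered in Word.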

Formula

Formulas edited in Word's formula editor are output as SVG format files using <img> tags by default.

Depending on the conversion option parameters, formulas can instead be converted to external files in MathML format, converted to MathML markup, or output as Office Math markup, which is Word's own representation of formulas in Office Open XML.

Parameter Output format
Unspecified Output formulas to <img> tags as SVG format files.
-math Output formulas to <img> tags as MathML format files.
-xmath Output formulas as MathML format markup.
-omath Output formulas in Word's own Office Math format.

Tables

Conversion source HTML element Example
Table
<table>
<tbody>
<tr>
<td>

The value set in the "Table Styles" property: Name in the Word ribbon "Table Design" will be output as the class attribute of the <table> tag.

Style names other than single-byte alphanumeric characters and some single-byte symbols are not output as the value of the class attribute.

Cell merge (horizontal) <td colspan="n"> “n” is the number of horizontally merged cells.
Cell merge (vertical) <td rowspan="n"> “n” is the number of vertically merged cells.
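
A merged cell could be emitted along the lines of the following sketch. The function name td is hypothetical, and omitting the attribute when a cell is not merged is an assumption.

```python
def td(content: str, colspan: int = 1, rowspan: int = 1) -> str:
    """Build a <td> tag, adding colspan/rowspan only for merged cells."""
    attrs = ""
    if colspan > 1:
        attrs += f' colspan="{colspan}"'
    if rowspan > 1:
        attrs += f' rowspan="{rowspan}"'
    return f"<td{attrs}>{content}</td>"
```

A cell that spans three columns would then read <td colspan="3">…</td>, and the two cells it absorbed are not output in that row.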

Table header row

To output the table header tag (table header: thead), set either of the following in the first row of the table.

  1. Select the first row of the table and turn on "Repeat Header Rows" in "Table Tools: Layout" on the Word ribbon.
  2. Check only "Header Row" in "Table Style Options" in "Table Tools: Table Design" on the Word ribbon.
Conversion source HTML element Description

“Table Tools: Layout”

<thead><tr><td>…</td></tr></thead>

The first row of the table is enclosed with <thead>.

If you turn on “Repeat Header Rows”, the header rows are repeated on each page when the table spans pages. To avoid this, turn off "Repeat Header Rows" and check "Header Row" in “Table Style Options” in “Table Design”.

“Table Tools: Table Design”: “Table Style Options”

Table header column

Select the first column of the table and check only "First Column" in "Table Style Options" in "Table Tools: Table Design" on the Word ribbon to set the cell of the first column as the header cell.

Conversion source HTML element Description

“Table Tools: Table Design”: “Table Style Options”

<tr><th>…</th></tr> The cells in the first column of the table are marked up with the header cell tags.

Cell alignment

When the alignment in a cell is specified in "Alignment" of the Word ribbon "Table Tools: Layout" or in the table style property cell, the class attribute is output to the <td>/<th> tag for the vertical alignment, and the style is defined in the <head> of the HTML with the <style> tag. However, if external CSS is linked or "-defstyle" is specified in the conversion option, the style definition is not output.

Conversion source HTML element Description

Table Tools: Alignment Options in Layout

Output within <head> <style>html{text-align:justify;}table,td,th{border:solid 1px;}td,th{vertical-align:top;}td.center,th.center{vertical-align:middle;}td.bottom,th.bottom{vertical-align:bottom;}</style> The relevant styles are the vertical-align declarations for td and th.
Align Top No output due to default value.
Align Center (vertical) <td class="center">/<th class="center">
Align Bottom <td class="bottom">/<th class="bottom">
CAUTION:

The horizontal alignment is output as a class attribute in the paragraph <p> tag within the <td>/<th> tag.

Justified: class="start"

Center: class="center"

Right: class="end"

Inline elements

Font group

Font group HTML element Example
Bold strong If the "-hstrong" parameter is specified in the conversion options, the bold set in the heading style is ignored.
Italic

Ignored by default. Depending on the conversion options, output with the <i> tag or the following CSS style specification:

<span style="font-style:italic;">

Underline

Ignored by default. Depending on the conversion options, output with the <u> tag or the following CSS style specification:

<span style="text-decoration-line:underline;">

Note that the anchor text of the link is not underlined.

Strikethrough

Ignored by default. Depending on the conversion options, output with the <del> tag or the following CSS style specification:

<span style="text-decoration-line:line-through;">

Subscript sub
Superscript sup
Text Effects and Typography Ignored.
Text Highlight Color Ignored.
Font Color

Ignored by default. Depending on the conversion options, output with the following CSS style specification:

<span style="color:color value;">

<span style="color:red;">text color red</span>, <span style="color:#00B050;">text color green</span>
Character Shading Ignored.
Enclose Characters Ignored.
Font Ignored.
Font Size Ignored.
Case Ignored.
Phonetic Guide ruby rp rt <ruby>漢<rp>(</rp><rt>かん</rt><rp>)</rp>字<rp>(</rp><rt>じ</rt><rp>)</rp></ruby>
Character Border Ignored.

Links and cross-references

References HTML element Example
Link (external URL) <a href="Link URL">label</a> “Link” on the “Insert” tab on the ribbon.
Link (id) <a href="#id value">label</a>
Cross-reference <a href="#id value">label</a> References in Word documents by "Cross-references" in the "References" tab on the ribbon.
<span id="">
id value <span id="id value"></span> Link to bookmark "here"

Paragraph text alignment

The paragraph alignment set for the “Normal” style in the style gallery on the “Home” tab of the Microsoft Word ribbon is output to the <style> element in <head>. However, when left alignment is set in the "Normal" style, no alignment is output, because text-align:start is the CSS default and does not need to be specified.

Note that <style> in <head> is not output if “-defstyle” parameter is specified in the conversion option (see "3.2 Conversion options").

Paragraph alignment Elements and class attributes Example
Alignment of "Normal" style Align Left No settings. <style></style>
Center text-align:center <style>html{text-align:center;}</style>
Align Right text-align:end <style>html{text-align:end;}</style>
Justify text-align:justify <style>html{text-align:justify;}</style>
Distributed text-align:justify;text-justify:auto; <style>html{text-align:justify;text-justify:auto;}</style>
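
The alignment part of the default style can be reproduced with a small lookup. The names NORMAL_ALIGNMENT_CSS and default_style are hypothetical, and the sketch covers only the text alignment, not the table-related defaults that the converter also writes into <style>.

```python
# Alignment of the "Normal" style -> CSS declarations, per the table above.
NORMAL_ALIGNMENT_CSS = {
    "Align Left": "",
    "Center": "text-align:center;",
    "Align Right": "text-align:end;",
    "Justify": "text-align:justify;",
    "Distributed": "text-align:justify;text-justify:auto;",
}


def default_style(alignment: str) -> str:
    """Return the <style> element for the alignment of the "Normal" style."""
    css = NORMAL_ALIGNMENT_CSS[alignment]
    # Left alignment needs no rule because text-align:start is the CSS default.
    return f"<style>html{{{css}}}</style>" if css else "<style></style>"
```

The rule is attached to the html element, so it is inherited by every paragraph unless a class attribute on an individual tag overrides it.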

If you specify the paragraph alignment other than “Normal” in the "Paragraph” group on the "Home" tab of the ribbon, the following class attributes will be set in the heading rank tag (h1 to h6) or p tag.

Paragraph alignment Elements and class attributes Example
Align Left class="start" <p class="start">…</p>
Center class="center" <p class="center">…</p>
Align Right class="end" <p class="end">…</p>
Justify class="justify" <p class="justify">…</p>
Distributed class="distribute" <p class="distribute">…</p>

Text Box

Endnote

An anchor tag is set to an endnote symbol indicating the location of the endnote in the body text, and the id of the endnote is set to the value of the href attribute of the anchor tag.

The text of the endnote is output at the end of the document, at the same level as the paragraphs at the end of the document except for the endnote. The number of the endnote is set to id="endnote-n" (n is a number).

Table of contents output

The table of contents section created using Word's table of contents function is output to an HTML file with a link to the heading section in the table of contents item. The table of contents is output as follows:

CAUTION:
  • Only tables of contents created from paragraphs with outline levels set are supported.
  • If there are multiple tables of contents, only one will be treated as a table of contents.
    In this case, the table of contents created with the "Built-In" feature of Word's table of contents function will be given priority.
    Otherwise, the first table of contents that appears in the document is treated as the table of contents.
  • Table of contents for charts and tables is excluded.
HTML element Description
① <a id="mobile-side-btn" href="javascript:;"><span class="mobile-side-btn-icon" id="mobile-side-btn-icon"></span></a>
② <nav class="toc-wrap">

The table of contents sections ④ and ⑤ are enclosed in the ② <nav> and ③ <div> tags and output.

If the "-split" + "-tocout" parameters are specified in the conversion options, ③ to ⑤ are output as a separate HTML file, "inc-toc.html".

③ <div id="toc">
④ <p class="toc-heading">[Table of contents heading]</p>

The paragraph style name (blanks are converted to "-") set for the paragraph of the table of contents heading will be output.

For a table of contents inserted using Word's "Built-In" table of contents function, <p class="toc-heading"> will be output by default.

⑤ <p class="toc-[n]"><a href="[Link to the corresponding heading id]">[Heading name]</a></p>

The paragraph style name (blanks are replaced with "-") set for the paragraph of each item in the table of contents will be output.

For a table of contents inserted using Word's "Built-In" table of contents function, <p class="toc-[n]"> will be output by default. ([n] is a number from 1 to 6.)

The link to the corresponding heading id will output a URL starting with "#_Toc".

If the HTML file is split and output by specifying the "-split 1|2|3" parameter in the conversion options, the link consists of the file name of the split HTML file followed by the id. (e.g. index-1.html#_TocXXX)
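
A table of contents item of the kind described above might be assembled as in this sketch. The function name toc_item and its parameters are hypothetical, and the id values in the usage are placeholders rather than ids produced by the converter.

```python
def toc_item(level: int, heading: str, anchor_id: str, page_file: str = "") -> str:
    """Build one table of contents paragraph linking to a heading id."""
    # With split output the link is prefixed with the file name of the target page.
    href = f"{page_file}#{anchor_id}"
    return f'<p class="toc-{level}"><a href="{href}">{heading}</a></p>'
```

Without splitting, the link is a plain fragment such as #_Toc123; with "-split", the same item points to something like index-1.html#_Toc123.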

Table of contents for split output

If the "-split 1|2|3" parameter is specified in the conversion options and the output HTML file is split according to the Word outline level, the table of contents section will be output as follows:
Specified parameter Output Note
Only -split 1|2|3

Items ① to ⑤ of the table in Section 5.12 "Table of contents output" are output immediately after the <body> tag in every split HTML file.

In this case, "active" is output as the class attribute of the <p> tag of the table of contents item (the highest hierarchical level in the page) that corresponds to the current HTML file.

-split 1|2|3 -tocout

Items ③ to ⑤ of the table in "5.12 Table of contents output" are output as a separate HTML file (inc-toc.html).

In addition, ① and ② are output immediately after the <body> tag in all HTML files to be output separately.

inc-toc.html can be used to load into a split-output HTML file using JavaScript or to load into other HTML files.

For this reason, inc-toc.html contains no tags other than ③ to ⑤, such as <html>, <head>, or <body>.

Please refer to the following web page for an example of loading a table of contents section using JavaScript.

www.antennahouse.com/html-on-word-samples

Split output

When the "-split 1|2|3" parameter is specified in the conversion options, the HTML file will be split and output according to the outline level of the paragraphs specified in the Word document. The outline levels that can be specified are 1 to 3.

The contents of the splitting are as follows:

Item Content Note
Splitting point Within the outline level of a paragraph in Word (the value following the specified -split), split just before the next paragraph of the same level. If the value is specified as 2 or 3, they are also divided immediately before the higher level, respectively.
Output file name The split output files are named by inserting "-" (hyphen) and a sequential number before the extension (.html) of the specified file name. The first page uses the specified output file name as is.

Example of specifying index.html as the output file name.

index.html, index-1.html, index-2.html, index-3.html, …

Output HTML

<html>, <meta>, <style>, <link> (CSS), <script> (JavaScript) and <body> tags are common to all pages.

The <title> tag is set to [outline level 1 label] - [outline level 2 label] - [outline level 3 label] - [title set in the Word document information] for the relevant page.

Table of contents The table of contents is output at the top of all split HTML files (immediately after the <body> tag).
Page link If the "-pagenavi" parameter is specified when the "-split 1|2|3" parameter is specified in the conversion options, links are output that go to the previous and next pages of the HTML file being displayed. See "Page link output" for details.
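
The file naming and <title> rules in the table above can be sketched as follows. The function names split_file_names and page_title are hypothetical, and leaving out labels for outline levels that a page does not have is an assumption.

```python
import os


def split_file_names(output_name: str, pages: int) -> list:
    """Return the file names for a document split into the given number of pages."""
    stem, ext = os.path.splitext(output_name)
    # The first page keeps the specified name; later pages get "-1", "-2", ...
    return [output_name] + [f"{stem}-{i}{ext}" for i in range(1, pages)]


def page_title(outline_labels: list, document_title: str) -> str:
    """Join the outline labels of a page and the document title with " - "."""
    return " - ".join([*outline_labels, document_title])
```

For an output name of index.html and four pages, the first function gives index.html, index-1.html, index-2.html, and index-3.html, matching the example in the table.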

Page link output

When the "-split 1|2|3" parameter is specified in the conversion options and the "-pagenavi" parameter is specified, links are output at the top (immediately after the table of contents, if any) and bottom (immediately before the </body> tag) of the split HTML file, based on the sequential number of the HTML file name to be output.

The link labels can be output in Japanese or English by specifying the value following the parameter:

Value Link label Note
ja "前へ" and "次へ" in Japanese. If there is no previous or next page, "前へ" or "次へ" links are not output.
If you specify anything other than "ja" or omit it. “Prev” and “Next” in English. If there is no previous or next page, "Prev" or "Next" links are not output.

Output HTML elements

If the value following the "-pagenavi" parameter is anything other than "ja", or is omitted, the output is as follows. (The example shows the HTML source of index-1.html, one of the split HTML files, when the output file name is index.html.)

Tags output at the top

<nav>
<div class="pagenavi-wrap-top">
<div class="pagenavi-prev">
<a href="index.html">Prev</a></div>
<div class="pagenavi-next">
<a href="index-2.html">Next</a></div>
</div>
</nav>

Tags output at the bottom

<nav>
<div class="pagenavi-wrap-bottom">
<div class="pagenavi-prev">
<a href="index.html">Prev</a></div>
<div class="pagenavi-next">
<a href="index-2.html">Next</a></div>
</div></nav>
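
The page link markup shown above could be generated along these lines. This is an illustrative sketch: the function name page_navi and its parameters are hypothetical, and omitting the whole <div> when there is no previous or next page is an assumption (the specification only states that the link itself is not output).

```python
def page_navi(prev_href, next_href, position="top", lang="en"):
    """Build the <nav> block for the top or bottom of a split HTML page."""
    prev_label, next_label = ("前へ", "次へ") if lang == "ja" else ("Prev", "Next")
    lines = ["<nav>", f'<div class="pagenavi-wrap-{position}">']
    if prev_href:
        lines += ['<div class="pagenavi-prev">',
                  f'<a href="{prev_href}">{prev_label}</a></div>']
    if next_href:
        lines += ['<div class="pagenavi-next">',
                  f'<a href="{next_href}">{next_label}</a></div>']
    lines += ["</div>", "</nav>"]
    return "\n".join(lines)
```

Calling page_navi("index.html", "index-2.html") reproduces the top block for index-1.html, and passing position="bottom" changes only the wrapper class.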