Technical Notes

Formatting HTML

AH Formatter V7.1 can format HTML designed for the Web (except for HTML that uses a frame). However, there may be few HTML documents that achieve a good result without needing adjustment after formatting. The reasons are as follows:

  • The HTML document was designed especially for the browser and paginated media was not taken into consideration.
  • The HTML document does not follow the HTML specification.
  • The CSS used in the HTML document may not be used exactly according to the CSS specification.

For example, if the HTML can be printed from a Web browser without overflowing the right-hand side of the page, then formatting with AH Formatter V7.1 will produce a reasonable result. However, in order to achieve a better result, the HTML must be designed both for the browser and for printing. The CSS for printing may be precisely defined using rules such as:

@media print { ... }
@page { ... }

Moreover, the CSS implementations of current Web browsers differ considerably. If the HTML contains markup errors because it was written for a particular browser, or if it uses incorrect CSS, a good result is unlikely.

Many (X)HTML documents on the Web use only generic fonts. (This is desirable considering the characteristics of the Web.) Since the font settings for every script in the Option Setting File always apply in AH Formatter V7.1 GUI on Windows, suitable fonts will be used. However, this applies only to AH Formatter V7.1 GUI and only on Windows. When using the Command-line Interface, set appropriate <script-font> values in the Option Setting File and specify the Option Setting File when formatting.

Cascading Order of CSS

The cascading order of the CSS is defined in the CSS2 Specification as follows:

  1. user agent declarations
  2. user normal declarations
  3. author normal declarations
  4. author important declarations
  5. user important declarations

AH Formatter V7.1 maps these to the following:

  • user agent declarations

    This is html.css. See also Default CSS for HTML.

  • user declarations

    These can be specified by <usercss> in the Option Setting File and by -css or -s on the command line. (The .NET interface, Java interface, etc. have equivalents to the corresponding command-line options.) They are applied in the following order:

    1. CSS specified by the Option Setting File and by -css, in order of appearance.
    2. CSS specified by -s.

    In the GUI, only the Option Setting File is applied. Settings made on the CSS page of the Format Option Setting dialog are reflected in the Option Setting File.

  • author declarations

    These can be specified by <link> or <style> inside the HTML, or by the <?xml-stylesheet ...?> processing instruction. They are applied in the following order:

    1. Processing instructions in the XML, in order of appearance. (XML or XHTML)
    2. <link> or <style> inside the HTML, in order of appearance. (XML or XHTML)
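
The application order described above can be sketched as a simple sort. This is an illustrative model only, not AH Formatter's implementation: the source labels, the Stylesheet class and the position field (standing in for "order of appearance") are hypothetical names, and the sketch does not model !important declarations.

```python
from dataclasses import dataclass

# Hypothetical labels for where a stylesheet comes from. The ranks follow
# the order described in the text; the label names are illustrative only.
SOURCE_RANK = {
    "user-agent": 0,           # html.css
    "option-setting-file": 1,  # <usercss> in the Option Setting File
    "-css": 1,                 # same step as the Option Setting File
    "-s": 2,                   # applied after the step above
    "xml-stylesheet": 3,       # <?xml-stylesheet ...?> processing instructions
    "link-or-style": 4,        # <link> or <style> inside the document
}


@dataclass(frozen=True)
class Stylesheet:
    name: str
    source: str
    position: int  # assumed order of appearance within its step


def application_order(sheets):
    """Sort stylesheets by step, then by order of appearance."""
    return sorted(sheets, key=lambda s: (SOURCE_RANK[s.source], s.position))


if __name__ == "__main__":
    sheets = [
        Stylesheet("page.css", "link-or-style", 0),
        Stylesheet("extra.css", "-s", 0),
        Stylesheet("html.css", "user-agent", 0),
        Stylesheet("house.css", "option-setting-file", 0),
    ]
    for sheet in application_order(sheets):
        print(sheet.source, sheet.name)
```

A stylesheet applied later can override an earlier one of equal specificity, which is why the author's <link> and <style> rules come last here.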

Default CSS for HTML

Default CSS for HTML is used as the first stylesheet (user agent declarations) when formatting (X)HTML. It is html.css, located in the directory indicated by the environment variable AHF71_DEFAULT_HTML_CSS or AHF71_64_DEFAULT_HTML_CSS. (If html.css does not exist, all elements are formatted as inline.)

This stylesheet is based on how Web browsers display HTML, on the styles specified by CSS, and so on. However, some specifications may not display well depending on the environment, and preferences also differ. Users should optimize the default CSS for their own environment. Some examples are shown below.

  • <q>

    The default CSS specifies it as follows:

    q::before { content: open-quote }
    q::after  { content: close-quote }
    

    In AH Formatter V7.1, the default values of quotes are "\201C" "\201D" "\2018" "\2019". The following specification may be preferable.

    q:lang(en) { quotes: '"' '"' "'" "'" }
    q:lang(no) { quotes: "«" "»" '"' '"' }
    

  • footnote

    By default, the footnote number is placed in the left margin of the page. If you do not want it to overflow into the margin, specify padding-left or list-style-position:inside on @footnote. decimal is specified for numbering. To use super-decimal instead, change it as follows:

    ::footnote-call {
      content: counter(footnote, super-decimal);
    }
    ::footnote-marker {
      content: counter(footnote, super-decimal);
      -ah-margin-end: 0.5em;
      text-indent: 0;
    }
    

  • ::marker

    The symbol used for the list marker is specified by <list-style-type> in the Option Setting File. Because the glyph is font dependent, different fonts will show different markers. Since ::marker has no font setting of its own, the font used depends on the context. Specify a font for ::marker explicitly if necessary.

Detection of Formatting Type

When formatting starts with the formatting type set to be detected automatically, the type is determined by the following procedure:

  1. When MIME is specified, AH Formatter V7.1 will follow its settings. That is, if text/html is specified, it will be detected as HTML. When application/xhtml+xml is specified, it will be detected as XHTML.
  2. When auto-formatter-type="html" is specified in the Option Setting File and the extension of the input document is known, AH Formatter V7.1 will follow its setting. That is, when the extension is for HTML such as .htm or .html, it will be detected as HTML. If the extension is for XHTML, such as .xht or .xhtml, it will be detected as XHTML.
  3. When there is no XML declaration and DOCTYPE is for HTML, it will be detected as HTML. If it is for XHTML, it will be detected as XHTML.
  4. When auto-formatter-type="xhtml" is specified in the Option Setting File and the name space is for XHTML, it will be detected as XHTML.
  5. When there is no XML declaration, no name space exists, and the root element is <HTML> (case-insensitive), it will be detected as HTML.
  6. When a stylesheet that is CSS rather than XSLT is specified (inside or outside the document), it will be detected as XML+CSS.
  7. When the name space is for XSL-FO, it will be detected as XSL-FO.
  8. Anything else will be detected as XML+CSS.

For HTML formatting, the document does not need to be XML. For every other formatting type, the document must be well-formed XML.
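
The procedure above can be read as a chain of checks evaluated in order. The following is a minimal sketch of that reading, not AH Formatter's actual code: the function name and all parameters are hypothetical, and DOCTYPE and name space are assumed to be pre-classified into simple labels. It also assumes that, in step 3, an XHTML DOCTYPE is accepted whether or not an XML declaration is present; the text leaves this open.

```python
def detect_formatting_type(mime=None, auto_formatter_type=None, extension=None,
                           has_xml_declaration=False, doctype=None,
                           namespace=None, root_element=None, has_css=False):
    """Return "HTML", "XHTML", "XML+CSS" or "XSL-FO".

    doctype and namespace are hypothetical pre-classified labels
    ("html"/"xhtml" and "xhtml"/"xsl-fo"), or None when absent.
    """
    # Step 1: an explicit MIME type wins.
    if mime == "text/html":
        return "HTML"
    if mime == "application/xhtml+xml":
        return "XHTML"
    # Step 2: known extension, only with auto-formatter-type="html".
    if auto_formatter_type == "html" and extension is not None:
        ext = extension.lower()
        if ext in (".htm", ".html"):
            return "HTML"
        if ext in (".xht", ".xhtml"):
            return "XHTML"
    # Step 3: DOCTYPE (assumption: XHTML does not require a missing XML declaration).
    if not has_xml_declaration and doctype == "html":
        return "HTML"
    if doctype == "xhtml":
        return "XHTML"
    # Step 4: XHTML name space, only with auto-formatter-type="xhtml".
    if auto_formatter_type == "xhtml" and namespace == "xhtml":
        return "XHTML"
    # Step 5: bare <html> root, compared case-insensitively.
    if (not has_xml_declaration and namespace is None
            and root_element is not None and root_element.lower() == "html"):
        return "HTML"
    # Step 6: a CSS stylesheet is specified.
    if has_css:
        return "XML+CSS"
    # Step 7: XSL-FO name space.
    if namespace == "xsl-fo":
        return "XSL-FO"
    # Step 8: everything else.
    return "XML+CSS"


if __name__ == "__main__":
    print(detect_formatting_type(root_element="HTML"))
    print(detect_formatting_type(has_xml_declaration=True, namespace="xsl-fo"))
```

Because step 6 comes before step 7 in the list, this sketch classifies an XSL-FO document that also specifies CSS as XML+CSS.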

Changes from AH Formatter V7.0

There are some differences in formatting between AH Formatter V7.1 and AH Formatter V7.0 as listed below.

  • Specifying Image Size in %

    In (X)HTML, when <img src="xxx" width="50%"> is specified, % is based on the size of the image. When img { width:50% } is specified in CSS, the size of the parent element is the base. However, up to and including AH Formatter V7.0, percentages for the following properties were calculated based on the size of the image even when specified in CSS:

    • width
    • max-width
    • min-width
    • height
    • max-height
    • min-height

    AH Formatter V7.1 corrects this error and calculates % based on the size of the parent element. If you want to calculate % the same as before, specify fix-css-img-percentage="false" in the Option Setting File.
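
As a small worked example of the difference, the sketch below computes a CSS percentage width under both behaviors. The function name and the pixel figures are hypothetical; only the choice of base (parent element versus image) comes from the description above.

```python
def css_img_width(percent, parent_width, image_width, fix_css_img_percentage=True):
    """Resolve a CSS percentage width for an image.

    With fix_css_img_percentage=True (the V7.1 behavior) the base is the
    parent element; with False (the behavior up to V7.0) it is the image.
    """
    base = parent_width if fix_css_img_percentage else image_width
    return base * percent / 100.0


if __name__ == "__main__":
    # Hypothetical sizes: a 200px-wide image inside a 600px-wide parent.
    print(css_img_width(50, parent_width=600, image_width=200))
    print(css_img_width(50, parent_width=600, image_width=200,
                        fix_css_img_percentage=False))
```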

Changes from AH Formatter V6.6

There are some differences in formatting between AH Formatter V7.0 and AH Formatter V6.6 as listed below.

  • Line Breaking

    Line breaking processing has been improved. See Line Breaking. In addition, the line break position may change due to changes in the Unicode specifications.

  • axf:suppress-if-first-on-page

    The behavior of axf:suppress-if-first-on-page has been improved.

  • ::first-letter

    When the character is enlarged using the ::first-letter pseudo-element in CSS, the line spacing may be expanded unless line-height is specified. In AH Formatter V7.0, the line spacing is not unnecessarily expanded when line-height is not specified in ::first-letter.

  • zwsp-mode

    The default value for zwsp-mode has been changed to 6.

  • hyphenation-keep-mode

    The processing method when the word at the end of the page (column) is hyphenated by hyphenation-keep="page", etc. has been improved. See hyphenation-keep-mode in the Option Setting File. Specify hyphenation-keep-mode="line" when you want to make it the same as V6.

  • white-space-collapse-mode

    white-space-collapse is implemented to apply across <fo:inline>, but until AH Formatter V6.6, it was applied unconditionally, so even in some cases where there is a border in <fo:inline>, it was collapsed. AH Formatter V7.0 no longer collapses under the following conditions:

    • When there is border or padding at the border of <fo:inline>
    • When there is <fo:inline> with width settings in between

    Specify white-space-collapse-mode="6" in the Option Setting File when you want to make it the same as V6.

  • font-stretch-mode

    The behavior of font-stretch-mode="6" has been corrected. When a keyword such as condensed is specified in font-stretch, that information is used for the font selection. For example, when extra-condensed is specified and a condensed font exists, the compression is performed based on that font. The compression ratio in that case is 62.5/75 = 83.3%. However, if there is no normal font in the selection candidates, the compression ratio will be 62.5%.

  • Poster image when embedding multimedia

    In AH Formatter V7.0, if axf:poster-image is specified when embedding multimedia, the size of that image is used. In order to get the same result as AH Formatter V6.6 or earlier, specify scaling="non-uniform" content-width="scale-to-fit" content-height="scale-to-fit" to <fo:external-graphic>.

  • BIDI

    Revision 41 of UAX #9: Unicode Bidirectional Algorithm is now supported. If you want to process with the same algorithm as V6, set a value less than 37 in unicode-bidi-rev. A value greater than or equal to 37 is treated as 41. See also BIDI Algorithm Implementation Restrictions.
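
The compression ratio quoted in the font-stretch-mode item above (62.5/75 = 83.3%) can be reproduced with a short calculation. This is a sketch only: the width percentages are the OpenType values for each font-stretch keyword, the function name is hypothetical, and applying the same division to other keyword pairs is an assumption rather than documented behavior.

```python
# Width percentages for each font-stretch keyword (OpenType values).
WIDTH_PERCENT = {
    "ultra-condensed": 50.0,
    "extra-condensed": 62.5,
    "condensed": 75.0,
    "semi-condensed": 87.5,
    "normal": 100.0,
    "semi-expanded": 112.5,
    "expanded": 125.0,
    "extra-expanded": 150.0,
    "ultra-expanded": 200.0,
}


def compression_ratio(requested, base):
    """Ratio by which a font of width `base` is scaled to reach `requested`."""
    return WIDTH_PERCENT[requested] / WIDTH_PERCENT[base]


if __name__ == "__main__":
    # extra-condensed requested, condensed font available: 62.5 / 75
    print(f"{compression_ratio('extra-condensed', 'condensed'):.1%}")
    # extra-condensed requested, scaled from the normal font: 62.5 / 100
    print(f"{compression_ratio('extra-condensed', 'normal'):.1%}")
```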

Changes from AH Formatter V6.5

There are some differences in formatting between AH Formatter V6.6 and AH Formatter V6.5 as listed below.

  • html.css

    There are changes in the description of Default CSS for HTML (html.css).

    • li::marker { ... } has been changed to ::marker {...}. With this change, this style also applies to markers of list-items other than <li>. As a result, -ah-margin-end:0.5em; will cause a different margin from the past. Specify -ah-margin-end:0;, etc. if necessary.
    • *[hidden] { visibility: hidden } has been added.

  • -ah-force-page-count

    In CSS, if you write the following:

    @page {
      -ah-force-page-count: document 4;
    }

    then in AH Formatter V6.6, -ah-force-page-count takes effect at every page switch. To make it take effect only on the last page, as in AH Formatter V6.5 or earlier, specify -ah-force-page-count on the @page where the last used page selector exists.

  • Suppression of ligatures

    Ligatures are suppressed by inserting the following characters:

    • U+200B
    • U+200C
    • U+2060
    • U+FEFF

  • MathML

    When the STIX version 2.0 font is installed, it is adopted by default.

    If an OpenType font has a MATH table, it will be referenced. You can control this in detail using enableOpenTypeMATH and exceptOpenTypeMATHVariants in the Option Setting File.

    The line of <mlongdiv> is drawn with the thickness specified by mslinethickness.

    When line breaking occurs, each line takes its own height. To give every line the same height, as in AH Formatter V6.5 or earlier, specify linebreakingHeightAdjust="false" in the Option Setting File.

Changes from AH Formatter V6.3

There are some differences in formatting between AH Formatter V6.4 and AH Formatter V6.3 as listed below.

  • MathML

    In the MathML settings in the Option Setting File, the default value for the alignment of subscripts and superscripts has been modified slightly. To get the same setting as AH Formatter V6.3, specify mathmlSettingsMode="6.3".

Changes from AH Formatter V6.2

There are some differences in formatting between AH Formatter V6.3 and AH Formatter V6.2 as listed below.

  • keep-footnote-anchor

    With AH Formatter V6.2, the block containing the anchor was sent to the following page due to conditions such as orphans, and as a result, the footnote itself was sometimes arranged in the previous page. On the other hand, AH Formatter V6.3 will try to fit the dividable block after the anchor in the previous page. You can also get the same result by specifying axf:footnote-keep="always" in the original block. In order to get the same result as AH Formatter V6.2 or earlier, specify keep-footnote-anchor="false" in the Option Setting File.

  • list-style-type

    The implemented list-style-type has been changed to use Predefined Counter Styles. The following style names were included in the previous list-style-type but are not included in the Predefined Counter Styles:

    • cjk-ideographic
    • japanese-formal-obsolete
    • urdu
    • lower-norwegian
    • upper-norwegian
    • hangul
    • hangul-consonant
    • halfwidth-katakana
    • halfwidth-katakana-iroha

    Even where a style name is the same as in the previous list-style-type, the implementation in the Predefined Counter Styles may differ. If you still want to use the style names of the previous list-style-type, specify, for instance, axf:number-transform="'lower-alpha'" rather than axf:number-transform="lower-alpha".

Changes from AH Formatter V6.1

There are some differences in formatting between AH Formatter V6.2 and AH Formatter V6.1 as listed below.

  • latin-ligature / pair-kerning

    The default values of latin-ligature and pair-kerning in the Option Setting File have been changed. Up to AH Formatter V6.1, these values were false. In AH Formatter V6.2, they are true. This is intended to give a better formatting result by default. When axf:ligature-mode and axf:kerning-mode are specified explicitly in the FO, these defaults do not influence the formatting result. These settings affect the formatting speed.

  • Splitting blocks

    In CSS, when a block with auto height breaks at the end of a page, for example, up to AH Formatter V6.1 the block height ended at the break point as is. In AH Formatter V6.2, the height is adjusted to the end of the page. The difference is noticeable when a background or border is specified on the block. The same applies at the end of a column. See also 5.3. Splitting Boxes.
    This behavior is not applied to FO.
    By specifying splitting-blocks-space="true" in the Option Setting File, you can return to the original V6.1 behavior.

  • Text wrapping with before float

    Up to AH Formatter V6.1, when a float on the before side filled the width of the region and left no room for wrapping text, the text was displaced by the float but the block itself overlapped the float. This can be checked by adding a background or a border to the block. When intrusion-displace="block" was specified, the block itself was displaced by the float. In AH Formatter V6.2, the block itself is displaced by the float regardless of the setting of intrusion-displace.

  • Splitting footnotes

    Up to AH Formatter V6.1, a page (column) break did not occur within footnote-body. In AH Formatter V6.2, it is possible to break pages (columns) within footnote-body. A footnote breaks by the setting of axf:footnote-max-height and it occurs by default. For this reason, the formatted result may differ from AH Formatter V6.1. In order to avoid the automatic break, specify auto-break-footnote="false" in the Option Setting File.

  • BIDI

    Up to AH Formatter V6.1, there was a known issue in the BIDI processing. With AH Formatter V6.2, BIDI processing was corrected. Therefore, the formatted result may differ from V6.1.

Changes from AH Formatter V6.0

There are some differences in formatting between AH Formatter V6.1 and AH Formatter V6.0 as listed below.

  • normalize

    In AH Formatter V6.1, Unicode normalization (UAX #15: Unicode Normalization Forms) can be applied to the input text. See also axf:normalize. Normalization may affect the formatting speed to some extent. If you don't want normalization to be performed by default, specify normalize="none" in the Option Setting File.

  • font-stretch-mode

    In AH Formatter V6.1, when a family name is specified in font-family, a condensed font can be chosen, if one actually exists, using information such as font-stretch="condensed". To enable this, specify font-stretch-mode="6" in the Option Setting File. The differences in behavior between font-stretch-mode="5" and "6" are as follows:

    • font-stretch-mode="5"

      The behavior is the same as AH Formatter V5. The information on font-stretch is not used for the font selection. That is, even if a condensed font exists in the family, it is not chosen. In order to choose a condensed font, it is necessary to specify the font name. When fonts called Foo-Regular.otf and Foo-Condensed.otf exist with the family name of Foo, Foo-Condensed.otf is not chosen even if <fo:block font-family="Foo" font-stretch="condensed"> is specified. It is necessary to specify <fo:block font-family="Foo-Condensed">.

      When <fo:block font-family="Foo" font-stretch="condensed"> is specified, Foo-Regular.otf is compressed and displayed. The compression ratio at that time is somewhat smaller (larger when expanding) than the value defined in the OpenType specification.

    • font-stretch-mode="6"

      The information on font-stretch is used for the font selection. In the example above, when <fo:block font-family="Foo" font-stretch="condensed"> is specified, Foo-Condensed.otf will be chosen. When a numerical value is specified as font-stretch, a condensed font is not searched for. When <fo:block font-family="Foo" font-stretch="extra-condensed"> is specified and there is no extra-condensed font, a condensed font is not necessarily compressed; instead, the normal font is compressed.

      When there is no condensed font, the compression ratio is the following value given in the OpenType specification.

      ultra-condensed 50%
      extra-condensed 62.5%
      condensed 75%
      semi-condensed 87.5%
      normal 100%
      semi-expanded 112.5%
      expanded 125%
      extra-expanded 150%
      ultra-expanded 200%

    The behavior has been corrected in AH Formatter V7.0. See font-stretch-mode under Changes from AH Formatter V6.6.

  • baseline-mode

    Although the position of the baseline was improved in AH Formatter V5, a problem remained: when alphanumeric characters of European languages were rendered upright in vertical writing mode, their center position was not aligned. AH Formatter V6.1 improves this. Specify baseline-mode="5" in the Option Setting File when you want to make it the same as V5.

  • viewport-length-units-mode

    The interpretation of the vw and vh units has been changed. Formerly the units were based on the entire page size including page margins. In AH Formatter V6.1, they are based on the size excluding the page margins. In addition, the pvw and pvh units, based on the entire page size, have been added. Specify viewport-length-units-mode="5" in the Option Setting File when you want to make it the same as V5. In this case, the units behave as vw=pvw, vh=pvh, vmin=pvmin and vmax=pvmax.

  • letter-spacing / word-spacing

    Previously, when letter-spacing and word-spacing were specified on the text, the settings of axf:punctuation-trim, axf:text-autospace, etc. had no effect. AH Formatter V6.1 removed this restriction.

  • Treatment of ideographic space

    With AH Formatter V6.1, the treatment of the ideographic space (U+3000) has been somewhat changed.
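
To make the viewport-length-units-mode item above concrete, the sketch below computes the length of 1vw, 1vh, 1pvw and 1pvh for a page. It is a sketch under stated assumptions: the function name, the legacy_v5 flag (standing in for viewport-length-units-mode="5") and the page dimensions are hypothetical, and each unit is taken as 1% of its reference size, as in CSS.

```python
def viewport_units(page_width, page_height, margin_left, margin_right,
                   margin_top, margin_bottom, legacy_v5=False):
    """Return the length of 1vw, 1vh, 1pvw and 1pvh in the input unit.

    pvw/pvh are based on the entire page. vw/vh exclude the page margins,
    unless legacy_v5 is set, in which case vw=pvw and vh=pvh.
    """
    pvw = page_width / 100.0
    pvh = page_height / 100.0
    if legacy_v5:
        vw, vh = pvw, pvh
    else:
        vw = (page_width - margin_left - margin_right) / 100.0
        vh = (page_height - margin_top - margin_bottom) / 100.0
    return {"vw": vw, "vh": vh, "pvw": pvw, "pvh": pvh}


if __name__ == "__main__":
    # Hypothetical page: 200mm x 300mm with 25mm margins on every side.
    print(viewport_units(200, 300, 25, 25, 25, 25))
    print(viewport_units(200, 300, 25, 25, 25, 25, legacy_v5=True))
```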

Changes from AH Formatter V5

There are some differences in formatting between AH Formatter V6.0 and AH Formatter V5 as listed below.

  • span

    In AH Formatter V6.0, the behavior of span="all" differs from that in AH Formatter V5.

    • In AH Formatter V5, a span specified inside a nested FO that generates a reference area, such as <fo:block-container>, is also effective. However, in AH Formatter V6.0, a span specified in an FO nested inside an FO that generates a reference area is invalid. For instance,

      <fo:block-container>
      <fo:block span="all">
      <fo:block>ABC</fo:block>
      </fo:block>
      </fo:block-container>

      In V5, span="all" was effective with <fo:block>ABC</fo:block>. However, it is invalid in AH Formatter V6.0. In addition, when span="all" is specified on <fo:block> in a column of a <fo:block-container> that uses axf:column-count, the span is considered to be specified for the columns of that <fo:block-container>. In order to keep the same result as V5, specify span="all" on the parent <fo:block-container>.

    • Although a forced page break between an empty block at the beginning of the document and a block with span="all" was disregarded in V5, in AH Formatter V6.0 the forced page break is effective and a blank page is produced. In order to keep the same result as V5, do one of the following:

      • Do not place an empty block before the block with break-before="page" specified, or
      • Do not specify break-before="page" (as it is the beginning of <fo:flow>, it's not necessary), or specify it on the empty block.

    • In one-column format, span="all" was not effective in V5. In AH Formatter V6.0, a reference area is generated even in one-column format. This causes the following differences, for example:

      <fo:block>AAA</fo:block>
      <fo:block space-before="1cm" span="all">BBB</fo:block>

      In one-column format, the space was generated between AAA and BBB in V5, but it is not generated in AH Formatter V6.0. This is because a reference area is generated by span even in one-column format, and a space without space-before.conditionality="retain" is discarded at the beginning of a reference area. In order to keep the same result as V5, do not specify span="all" in one-column format.

  • text-underline-mode

    In AH Formatter V5, there were the following problems with the position of underline, overline and line-through.

    • axf:vertical-underline-side doesn't work when axf:text-underline-position is specified.
    • It is always interpreted as an offset from base-line when the numerical value is specified to axf:text-underline-position.
    • Even if the position of the underline is changed by axf:vertical-underline-side, the position of the overline is not changed.
    • In CSS, the position of the underline and overline differs depending on whether -ah-line-stacking-strategy:line-height or -ah-line-stacking-strategy:max-height is specified.
    • When the underline etc. are drawn in horizontal writing mode, the line becomes uneven when there are baseline-shift="super", etc., though the line is aligned in a straight line in vertical writing mode.
    • When the strikethrough is drawn, the line becomes uneven when the font size differs or there are baseline-shift="super", etc.
    • The line width depends on the font size when axf:text-line-width="auto" is specified. However in CSS, it depends on the height of the line area.

    In AH Formatter V6.0, these are improved as follows:

    • axf:vertical-underline-side is effective even if axf:text-underline-position is specified.
    • The standard position can also be described in the numerical value specified for axf:text-underline-position.
    • The overline is always positioned on the opposite side of the underline.
    • In CSS, the line is drawn at the same position without depending on the value of -ah-line-stacking-strategy.
    • When the underline etc. are drawn in horizontal writing mode, it is aligned in a straight line even if there are baseline-shift="super", etc.
    • When the strikethrough is drawn, it is aligned in a straight line even if the font size differs or there are baseline-shift="super", etc.
    • The line width depends on the height of the line area when axf:text-line-width="auto" is specified.

    Specify text-underline-mode="5" in the Option Setting File when you want to make it the same as V5.

  • intrusion-displace-mode

    In AH Formatter V6.0, the behavior of the intrusion-displace is revised and different from that of AH Formatter V5.

    • text-indent no longer disappears when intrusion-displace="line" or "auto".
    • intrusion-displace="indent" ensures that relative indents by start-indent and end-indent are preserved. In AH Formatter V5, only text-indent was preserved when intrusion-displace="indent".

    Specify intrusion-displace-mode="5" in the Option Setting File when you want to make it the same as V5.

  • vertical-block-width-mode

    The behavior of the auto value of the width of vertical-text block within horizontal-text flow (or the height of horizontal-text block within vertical-text flow) is changed with AH Formatter V6.0.

    In AH Formatter V5, the width of vertical-text block was given by the width of the outer area. In AH Formatter V6.0, the auto width of vertical-text block shrinks to fit the content. If you don't want this behavior you should specify the width explicitly such as width="100%". Also the same behavior will be applied to the height of horizontal-text block within vertical-text flow.

    Specify vertical-block-width-mode="5" in the Option Setting File when you want to make it the same as V5.

  • zwsp-mode

    There is an ambiguous portion of the specification in the operation of ZERO WIDTH SPACE (U+200B). In AH Formatter V5, ZERO WIDTH SPACE is also a target for text-align="justify" and this portion becomes larger than others. In addition, since leading and trailing ZERO WIDTH SPACE in the block are not exceptions, they spread also. AH Formatter V6.0 can format as follows:

    • Remove ZERO WIDTH SPACE from the target of justify.
    • Delete leading and trailing ZERO WIDTH SPACE of a block.

    This will avoid the effect of having a one-line space in a block such as <fo:block>&#x200B;</fo:block>. Specify zwsp-mode in the Option Setting File.

Changes from XSL Formatter V4

There are some differences in formatting between AH Formatter V5 and XSL Formatter V4 as listed below.

  • capitalize

    For example, V4 formats the following

    <fo:block text-transform="capitalize">
    HELLO world!
    </fo:block>
    

    as follows:

    Hello World!

    AH Formatter V5 formats as follows:

    HELLO World!

    That is, although V4 changes the letters except the initial letter into lower case, AH Formatter V5 does nothing. In order to make it the same as V4, specify as follows:

    <fo:block text-transform="capitalize-lowercase">
    

    See also text-transform.

  • text-justify-mode

    AH Formatter V5 improves the processing of trimming a line of text. Although this enhancement gives finer control through axf:text-justify-trim, the number of characters that fit on one line may differ from XSL Formatter V4. To get the same result as V4 for FO that does not use axf:text-justify-trim, specify text-justify-mode="4" in the Option Setting File.

  • baseline-mode

    AH Formatter V5 improves the processing when putting fonts with different baselines like a mixture of Western and Japanese text. For example,

    <fo:block>Latin漢字</fo:block>
    <fo:block>漢字Latin</fo:block>
    <fo:block>Latin</fo:block>
    <fo:block>漢字</fo:block>
    

    In a case like the above, you may specify font-family="'Times New Roman', 'MS Mincho'" so that Japanese fonts are not applied to Latin text. In XSL Formatter V4, the first font specified in font-family determines the baseline, so a difference may arise in the height of a line. Since AH Formatter V5 selects the font in font-family according to the script or the language specification, a suitable baseline will be applied by specifying language="jpn" in the example above. When you want to make it the same as V4, specify baseline-mode="4" in the Option Setting File.

  • Font selection

    font-selection-strategy="character-by-character" is supported from AH Formatter V5. In addition, auto-fallback-font in the Option Setting File makes it possible to control the fallback. See also Font Selection.
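
The capitalize difference described at the start of this list can be sketched in a few lines. The two functions below are illustrative stand-ins with hypothetical names, not the formatter's implementation, and they simplify a "word" to a run of ASCII letters.

```python
import re

_WORD = re.compile(r"[A-Za-z]+")  # simplification: a word is a run of ASCII letters


def capitalize_v5(text):
    """Uppercase the initial letter of each word and leave the rest untouched."""
    return _WORD.sub(lambda m: m.group(0)[0].upper() + m.group(0)[1:], text)


def capitalize_lowercase(text):
    """Uppercase the initial letter and lowercase the rest, as V4 did."""
    return _WORD.sub(lambda m: m.group(0)[0].upper() + m.group(0)[1:].lower(), text)


if __name__ == "__main__":
    print(capitalize_v5("HELLO world!"))         # HELLO World!
    print(capitalize_lowercase("HELLO world!"))  # Hello World!
```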

Changes from XSL 1.0 to XSL 1.1

XSL 1.1 makes some changes that are incompatible with XSL 1.0.

  • from-page-master-region()

    In XSL 1.1, even if writing-mode or reference-orientation are specified to <fo:region-*>, these are ignored and not effective. In order to make these specifications effective in XSL 1.1, it is necessary to specify the following to <fo:page-sequence>.

    writing-mode="from-page-master-region()"
    reference-orientation="from-page-master-region()"
    

    In order to evaluate it in the same way as XSL 1.0 without making any changes to the FO, specify default-from-page-master-region="true" in the Option Setting File.

  • fo:table

    In XSL 1.0, fo:table is supposed to generate a reference area (see 5.6 in XSL 1.0). However, XSL 1.1 corrected this as an error. The difference arises mainly when margin-* specified on fo:table is converted to start-indent and end-indent. For example:

    <fo:block margin-left="10pt">
      <fo:table margin-left="0pt">
      ...
    

    In a table like the above, the left margin may differ between XSL 1.0 and XSL 1.1. If start-indent etc. are used instead of margin-*, this incompatibility does not arise.

    In order to evaluate it in the same way as XSL 1.0 without making any changes to the FO, specify table-is-reference-area="true" in the Option Setting File.

Shorthand

Since shorthand properties in XSL take over their definitions from CSS, their values are evaluated as in CSS. That is,

margin="0pt -10pt"

is evaluated as two values instead of one formula. However, when it's not a shorthand, this is evaluated as one formula. For example, the following is one formula.

margin-left="0pt -10pt"

AH Formatter V7.1 processes such ambiguous expressions in a shorthand as follows:

  • If the expression cannot be one formula, like "0pt 10pt", it is counted as two values.
  • If the sign is attached to the numerical value, like "0pt -10pt", it is counted as two values.
  • If white space is included between the sign and the numerical value, like "0pt - 10pt", it is counted as one formula.
  • "0pt-10pt" is an error. (See 5.9.5 Numerics in the XSL specification)

In FO, to use a formula in a shorthand, enclose it in parentheses, etc.

With CSS, when calc() is written as calc(10pt-5pt), “-” is evaluated as an operator.
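
The four FO shorthand rules listed above can be sketched as a small classifier. This is an illustration, not the formatter's parser: the function name and return labels are hypothetical, and the sketch handles only two operands with an optional minus sign, which is the case the rules discuss.

```python
import re

# A number with an optional unit, e.g. "0pt", "10pt", "1.5em".
_NUM = r"\d+(?:\.\d+)?[a-z%]*"


def classify_shorthand(expr):
    """Classify a two-operand FO shorthand expression per the rules above."""
    s = expr.strip()
    # White space on both sides of the sign: one formula ("0pt - 10pt").
    if re.fullmatch(rf"{_NUM}\s+-\s+{_NUM}", s):
        return "one formula"
    # No sign, or a sign attached to the number: two values
    # ("0pt 10pt", "0pt -10pt").
    if re.fullmatch(rf"{_NUM}\s+-?{_NUM}", s):
        return "two values"
    # No white space at all around the sign: error ("0pt-10pt").
    if re.fullmatch(rf"{_NUM}-{_NUM}", s):
        return "error"
    # Anything else is outside what the rules above describe.
    return "not covered"


if __name__ == "__main__":
    for expr in ["0pt 10pt", "0pt -10pt", "0pt - 10pt", "0pt-10pt"]:
        print(f"{expr!r}: {classify_shorthand(expr)}")
```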

Property Value Syntax

We briefly explain a part of property value syntax in the XSL/CSS Extensions. This notation conforms to that in CSS. For more details, see also Value Definition Syntax.

  • Component value combinators
    • Values that are simply juxtaposed must all appear, in the given order.
    • Values separated by a double ampersand “&&” must all appear, in any order.
    • One or more of the values separated by a double bar “||” must appear, in any order.
    • Exactly one of the values separated by a bar “|” must appear.
    • Brackets “[ ]” group the content.
  • Component value multipliers
    • An asterisk “*” indicates that the content appears zero or more times.
    • A plus “+” indicates that the content appears one or more times.
    • A question mark “?” indicates that the content appears zero or one time.
    • “{N}” indicates that the content appears exactly N times.
    • “{N,}” indicates that the content appears N or more times.
    • “{N,M}” indicates that the content appears at least N and at most M times.
    • A hash mark “#” indicates that the content appears one or more times, separated by commas.
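As a rough illustration of the multipliers listed above, the following Python sketch maps each multiplier to a (minimum, maximum) repetition count, with None standing for "no upper limit". The function name multiplier_bounds is an assumption for this example; it is not part of AH Formatter.

```python
import re

def multiplier_bounds(mult):
    """Return (min, max) repetitions for a multiplier; max None means unbounded."""
    simple = {
        "": (1, 1),      # no multiplier: exactly once
        "*": (0, None),  # zero or more
        "+": (1, None),  # one or more
        "?": (0, 1),     # zero or one
        "#": (1, None),  # one or more, comma-separated
    }
    if mult in simple:
        return simple[mult]
    m = re.fullmatch(r"\{(\d+)(,(\d*))?\}", mult)
    if not m:
        raise ValueError("unknown multiplier: " + mult)
    low = int(m.group(1))
    if m.group(2) is None:        # {N}
        return (low, low)
    high = m.group(3)             # {N,} or {N,M}
    return (low, int(high) if high else None)
```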

Unicode

AH Formatter V7.1 supports Unicode 13.0. Newly added characters may not be handled correctly. In addition, characters of unsupported scripts cannot be handled correctly (see Scripts and Languages). See unicode-bidi-rev in the Option Setting File for the BIDI control characters.

BIDI Algorithm Implementation Restrictions

When an algorithm that is not compatible with V6.6 is selected in unicode-bidi-rev, the BIDI level may not be resolved as specified.

As the following example shows, the text of multiple elements must be combined to determine the BIDI level when the character before the break is one whose BIDI level depends on the subsequent text, such as a space, or one whose BIDI level depends on the presence of a corresponding character, such as a parenthesis or an “isolate”. In that case, the BIDI level is not evaluated correctly if the range contains an element whose text is converted according to the evaluation result of a property, because the BIDI level is obtained from the text before the property is applied.

<fo:block>aaaa <fo:inline property="xxxx">bbbbb</fo:inline> ...</fo:block>
<fo:block>(aaaa<fo:inline property="xxxx">bbbbb</fo:inline> ...)</fo:block>

As shown in the following example, when an element is backward referenced, the text to be obtained from the evaluation is first assumed to be “Neutral” and the BIDI level is resolved once. After the text is obtained, BIDI processing is performed on that text alone.

<p>xxxx<span ref="#yyy" style="content:target-text(attr(ref, url))"></span>zzzz</p>
<p><span id="yyy">ref</span></p>

If you want to place the page number at the left edge of the page by changing the writing direction of the index leader, we recommend that you place a space in front of the leader and change the writing direction with unicode-bidi as shown in the following example, rather than adding control characters to “content”.

a::before {
	content: leader(dotted) " " target-counter(attr(ref, url),page);
	unicode-bidi: embed;
	direction: rtl;
}
<toc>حول xxxx <a ref="#yyy"></a></toc>

Unicode Range

To express the Unicode Range as a property value in the Font Configuration File, Option Setting File, etc., use the following format:

[ <urange> | <string> ]# | all

<urange> is a hexadecimal number with the preceding U+ and one of the following. Hexadecimal is case insensitive. (In the Unicode specification, the code point must be 4 to 6 digits, but here it is allowed to represent less than 4 digits for notation.)

  • a single code point (e.g. U+416)
  • an interval value range (e.g. U+400-4FF)
  • a range where trailing “?” character implies “any digit value” (e.g. U+4??)

U+4?? is equivalent to U+400-4FF. U+??? is equivalent to U+000-FFF. Unicode up to U+10FFFF is effective. Even if a range greater than U+10FFFF is specified, it is disregarded.

<string> is any string enclosed with quotation marks. For example, U+0028-0029 can be written as '()'.

all is treated as if U+0-10FFFF were specified.
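The notation above can be illustrated with a small Python parser that turns one item of the list into code point ranges. This is a sketch under stated assumptions, not the formatter's implementation: the name parse_urange is hypothetical, and "disregarded" is interpreted here as dropping the part of a range beyond U+10FFFF.

```python
import re

MAX_CP = 0x10FFFF

def parse_urange(token):
    """Parse one <urange>, <string>, or 'all' into a list of (start, end) pairs."""
    token = token.strip()
    if token == "all":
        return [(0, MAX_CP)]
    # <string>: every character inside the quotation marks is a single code point.
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return [(ord(ch), ord(ch)) for ch in token[1:-1]]
    m = re.fullmatch(r"[Uu]\+([0-9A-Fa-f]*\?+|[0-9A-Fa-f]+)(?:-([0-9A-Fa-f]+))?", token)
    if not m:
        raise ValueError("not a Unicode range: " + token)
    body, upper = m.group(1), m.group(2)
    if "?" in body:
        if upper is not None:
            raise ValueError("wildcard and interval cannot be combined: " + token)
        start = int(body.replace("?", "0"), 16)   # U+4?? starts at U+400
        end = int(body.replace("?", "F"), 16)     # and ends at U+4FF
    else:
        start = int(body, 16)
        end = int(upper, 16) if upper else start
    if start > MAX_CP:
        return []                                  # assumption: entirely out of range
    return [(start, min(end, MAX_CP))]
```

With this sketch, "U+4??" and "U+400-4FF" produce the same range, and "'()'" produces the two code points U+0028 and U+0029.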

URI

In the XSL specification, <uri-specification> is a character string that conforms to the IRI (RFC3987) specification, written inside url(). For convenience, this document refers to an IRI as a URI.

Schemes which can actually be specified in AH Formatter V7.1 are as follows:

  • http:
  • https: (Websites cannot be accessed if they have any problem with their certificates)
  • file:
  • data: (RFC2397)
  • jar: (JarURLConnection)

It's possible to specify a correct absolute URI that includes the scheme name without using url(). For example, the following two are the same.

<fo:external-graphic src="url('http://localhost/image.png')"/>
<fo:external-graphic src="http://localhost/image.png"/>

Moreover, it's possible to specify a relative URI without specifying the scheme name.

<fo:external-graphic src="url('image.png')"/>
<fo:external-graphic src="image.png"/>

AH Formatter V7.1 allows a file name on the local file system to be specified instead of a URI for the user's convenience. However, URIs and local file names are generally not compatible. For example, white space is not allowed in a URI but may be allowed in a local file name. Moreover, since “%” may be used literally in a local file name, a character string such as foo%20bar.png points to different resources depending on whether it is evaluated as a URI or as a local file name.

AH Formatter V7.1 solves this problem as follows:

  • When the scheme is specified, it is adopted as is.
  • When the scheme is not specified and surrounded by url(), it is processed as follows:
    1. If URI is correct, it will be adopted as is.
    2. If URI is incorrect, % escape processing is done.
  • When the scheme is not specified explicitly and not surrounded by url(), it is processed as follows:
    1. In the Windows environment, “\” is changed into “/”.
    2. % escape processing is done.

The relative URI is combined with base-uri and transformed into the absolute URI. All local file names are transformed into a file scheme at this time. For example, in the Windows environment, when base-uri is C:\home\, it is transformed as follows:

foobar.png                →  file:///C:/home/foobar.png
url('foobar.png')         →  file:///C:/home/foobar.png
url('url(foobar.png)')    →  file:///C:/home/url(foobar.png)
subdir\foobar.png         →  file:///C:/home/subdir/foobar.png
url('subdir\foobar.png')  →  file:///C:/home/subdir%5Cfoobar.png
url('subdir/foobar.png')  →  file:///C:/home/subdir/foobar.png
foo bar.png               →  file:///C:/home/foo%20bar.png
url('foo bar.png')        →  file:///C:/home/foo%20bar.png
foo%20bar.png             →  file:///C:/home/foo%2520bar.png
url('foo%20bar.png')      →  file:///C:/home/foo%20bar.png
foo%%20bar.png            →  file:///C:/home/foo%25%2520bar.png
url('foo%%20bar.png')     →  file:///C:/home/foo%25%2520bar.png
foo#bar.png               →  file:///C:/home/foo#bar.png
url('foo#bar.png')        →  file:///C:/home/foo#bar.png
foo%23bar.png             →  file:///C:/home/foo%2523bar.png
url('foo%23bar.png')      →  file:///C:/home/foo%23bar.png

A local file name cannot be written directly into url(). For example:

url('C:\My Document\foobar.png')

The string above will not work as expected. Specify a local file name without enclosing it in url().

“#” is a fragment separator. In file:///C:/home/foo#bar.png, the resource actually accessed is file:///C:/home/foo. Specify url('foo%23bar.png') to access a resource called foo#bar.png.
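The escaping rules and the transformation examples above can be approximated with the following Python sketch. It is an illustration only: the function resolve, the validity check VALID_URI (ASCII only), the set of characters left unescaped, and the simple string concatenation with the base are all assumptions made for this example and do not describe the formatter's internal code. The base is assumed to be already converted to file:///C:/home/.

```python
import re
from urllib.parse import quote

# Two or more characters before ":" so that a drive letter such as "C:" is not a scheme.
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]+:")
# Rough check for a correct URI reference: allowed ASCII characters or %XX escapes.
VALID_URI = re.compile(r"(?:[A-Za-z0-9._~!$&'()*+,;=:@/?#-]|%[0-9A-Fa-f]{2})*")
URL_FUNC = re.compile(r"""url\(\s*(['"]?)(.*)\1\s*\)""")
SAFE = "/#:?@!$&'()*+,;="  # left as is by % escape processing in this sketch

def resolve(spec, base="file:///C:/home/", windows=True):
    """Turn a src-like value into an absolute URI following the rules above."""
    m = URL_FUNC.fullmatch(spec)
    inner = m.group(2) if m else spec
    if SCHEME.match(inner):
        return inner                      # scheme specified: adopted as is
    if m:
        # Inside url(): keep a correct URI, otherwise apply % escaping.
        rel = inner if VALID_URI.fullmatch(inner) else quote(inner, safe=SAFE)
    else:
        # Bare value: treated as a local file name.
        if windows:
            inner = inner.replace("\\", "/")
        rel = quote(inner, safe=SAFE)
    return base + rel                     # simplistic join for simple relative names
```

For instance, resolve("foo%20bar.png") gives file:///C:/home/foo%2520bar.png, while resolve("url('foo%20bar.png')") keeps the escape and gives file:///C:/home/foo%20bar.png.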

A UNC (Universal Naming Convention) path in Windows, for example \\host\My Document\foobar.png, is transformed into file://host/My%20Document/foobar.png. Also, //host/My Document/foobar.png is transformed into http://host/My%20Document/foobar.png when the base-uri uses the http: scheme. (The same applies to https:.) In non-Windows environments, file://host/... is not supported.

The format of the data scheme defined in RFC2397 is:

"data:" [ mediatype ] [ ";base64" ] "," data

Note that a semicolon “;” is required when specifying base64, and a data delimiter is a comma “,”.
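As a small illustration of this format, the following Python sketch builds a base64 data URI from raw bytes. The function name make_data_uri is an assumption for this example; only the standard base64 module is used.

```python
import base64

def make_data_uri(data, mediatype=""):
    """Build "data:[mediatype];base64,<data>" from raw bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    # ";base64" precedes the comma that separates the header from the data.
    return "data:" + mediatype + ";base64," + encoded
```

For example, make_data_uri(b"hi", "text/plain") yields data:text/plain;base64,aGk=.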

The jar scheme defined in JarURLConnection can be specified. It applies to JAR and ZIP archives and makes it possible to specify an entry inside the archive.

jar:http://www.foo.com/bar/baz.jar!/COM/foo/Quux.png

Everything after the first separator “!/” is regarded as the entry specification. Nested JAR or ZIP archives are not supported.
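The split at the first “!/” can be sketched as follows. The helper name split_jar_uri is hypothetical and the sketch only separates the two parts; it does not fetch or open the archive.

```python
def split_jar_uri(uri):
    """Split "jar:<archive-uri>!/<entry>" into (archive_uri, entry)."""
    if not uri.startswith("jar:"):
        raise ValueError("not a jar: URI")
    # partition() splits at the first occurrence of the separator only.
    archive, sep, entry = uri[len("jar:"):].partition("!/")
    if not sep:
        raise ValueError("missing '!/' separator")
    return archive, entry
```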

When accessing HTTP or HTTPS via a proxy in non-Windows environments, it's necessary to specify the proxy address with the HTTP_PROXY or HTTPS_PROXY environment variable.

When the root certificate is necessary in non-Windows environments, it's necessary to specify the file of the root certificate with the SSL_CERT environment variable.

Multi-domain certificates are supported.

Table Auto Layout

The table (<fo:table>) accepts table-layout="fixed" and table-layout="auto". The former specifies the fixed layout, in which the column widths are fixed, and the latter specifies the automatic layout, in which the column widths are calculated automatically. When the value is omitted, the default is table-layout="auto". In the XSL specification, the automatic layout is implementation-dependent. This section explains the implementation in AH Formatter V7.1.

An automatic layout can take a lot of time for calculating the width of columns. Specify table-layout="fixed" if high-speed formatting is desired.

In AH Formatter V7.1, the way a table is processed depends on both the table-layout setting and whether a width is specified for <fo:table>. When the widths of all columns are specified, the table is treated as table-layout="fixed" even if table-layout="auto" is specified. Moreover, according to the XSL specification, proportional-column-width() can be specified only with table-layout="fixed". In AH Formatter V7.1, when columns with proportional-column-width() and columns without a width specification are mixed, column-width="proportional-column-width(1)" is assumed for the columns without a width specification, and the table is processed as if table-layout="fixed" were specified. That is, in such a case all columns have a width specification.
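The decision described in this paragraph can be summarized by the following Python sketch. It is an illustration of the stated rules, not the formatter's code; the function name effective_layout and the representation of column widths as strings (or None when no width is specified) are assumptions for this example.

```python
def effective_layout(table_layout, column_widths):
    """Return (layout, widths) after applying the rules described above.

    column_widths holds one entry per column: a width string, or None
    when the column has no width specification.
    """
    widths = list(column_widths)
    has_proportional = any(
        w is not None and w.startswith("proportional-column-width(") for w in widths
    )
    if has_proportional:
        # Columns without a width are assumed to be proportional-column-width(1).
        widths = [w if w is not None else "proportional-column-width(1)" for w in widths]
    if widths and all(w is not None for w in widths):
        # All column widths are known: processed as a fixed layout.
        return "fixed", widths
    return table_layout, widths
```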

table-layout | Width of fo:table | Processing Method
fixed | Yes | The width is divided equally among the columns whose width is not specified. When the content exceeds the width, it overflows.
fixed | No | The table width becomes 100%. The width is divided equally among the columns whose width is not specified. When the content exceeds the width, it overflows.
auto | Yes | The content of the columns is measured and a width is assigned to each column whose width is not specified. When the table exceeds its specified width even if the minimum width of each column is adopted, the table width expands to the exceeded width.
auto | No | The content of the columns is measured and a width is assigned to each column whose width is not specified. When the table does not fill 100% even if the maximum width of each column is adopted, that width becomes the table width. When the table exceeds 100% even if the minimum width of each column is adopted, that width becomes the table width. Otherwise, the table width becomes 100%.

When table-layout="auto" is specified, the content of the columns whose width is not specified is examined. A better column width could be determined by examining all rows, but that takes too much time for a big table. By default, AH Formatter V7.1 examines the content of at most 100 rows to determine the column widths. This number of rows can be changed with table-auto-layout-limit in the Option Setting File.
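The effect of limiting the number of examined rows can be illustrated with a toy Python sketch. Measuring a cell by its character count and the name estimate_column_widths are simplifying assumptions for this example; the real formatter measures formatted content.

```python
def estimate_column_widths(rows, limit=100):
    """Estimate column widths from at most `limit` leading rows.

    Each row is a list of cell strings; width is approximated by len().
    """
    widths = []
    for row in rows[:limit]:          # rows beyond the limit are not examined
        for i, cell in enumerate(row):
            if i >= len(widths):
                widths.append(0)
            widths[i] = max(widths[i], len(cell))
    return widths
```

A wide cell that first appears after the limit does not influence the result, which is the trade-off between speed and accuracy described above.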

When table-layout="fixed" is specified, since the contents of the column are not investigated, the processing speed is always high.

Line Breaking

AH Formatter V7.1 performs two types of line breaking. One breaks each line at an appropriate point near the end of the line so that it fits the line width; the other follows the Knuth-Plass line breaking algorithm described in “Breaking Paragraphs into Lines” (hereinafter referred to as BPIL). BPIL determines the break positions by considering the balance of the whole block.

Candidates for line breaking positions are determined by the processing of UAX#14: Line Breaking Properties. The UAX#14 processing is somewhat different from the Unicode specification as follows:

  • Nonstarter Japanese characters defined in JIS X 4051:2004 can be controlled by axf:line-break.

  • Although LB30 in UAX#14 prohibits line breaking before an open parenthesis and after a close parenthesis, AH Formatter V7.1 permits line breaking for full-width parentheses. The target characters are the full-width open parenthesis, the full-width close parenthesis, and the full-width punctuation indicated in axf:punctuation-trim.

  • The line breaking class AI in a CJK script is processed as ID. However, U+2015 (HORIZONTAL BAR) is processed as IN since it is a non-breaking character in JIS X 4051:2004.

  • The line breaking class of half-width kana is AL, so, as with alphabetic text, no line break occurs unless there is a space between words. AH Formatter V7.1 treats half-width kana as full-width kana when processing line breaking.

  • UAX#14 allows a line break immediately after U+002F (SOLIDUS), so a line break can occur within abbreviations such as km/h and w/o. UAX#14 clearly states that such breaks are undesirable. AH Formatter V7.1 makes it possible to control breaking within such abbreviations with axf:abbreviation-character-count.
  • The ideographic space (U+3000) is treated as a non-starter character. If you don't want to treat it as a non-starter character, specify non-starter-ideographic-space="false" in the Option Setting File.

  • U+200C and U+200D are processed as follows:
    1. Line breaking will not be done before and after U+200D.
    2. Line breaking will be considered to be available before and after U+200C.

BPIL is applied to the following blocks:

The language of the block is specified by xml:lang or the language property, or specified by default-lang in the Option Setting File. However, BPIL is not always applied to all situations. In the following cases, BPIL is not applied, but the line breaking is performed at the end of every line.

  • Blocks that contain leaders (such as <fo:leader>)
  • Blocks that contain floats or blocks with complicated adjustment of the spacing (BPIL may be applied for simple adjustment of the spacing)
  • Blocks that contain form fields
  • Blocks that contain ruby
  • Blocks that require BIDI processing
  • Blocks that contain axf:indent-here
  • Blocks that contain <axf:tab> or tab characters with axf:tab-treatment="preserve"
  • Blocks that contain overflowed lines where the line breaking is not restricted by wrap-option="no-wrap", etc.
  • Blocks whose font-size is 0 or line-height is 0 or less
  • Narrow areas (The minimum line width is specified by bpil-minimum-line-width in the Option Setting File)
  • Large blocks (Limited number of characters is specified by bpil-limit-chars in the Option Setting File)
  • Blocks that have page masters changing to different page sizes

The following are restrictions.

  • For blocks that contain <fo:initial-property-set> or ::first-line, BPIL is applied to the second and subsequent lines.
  • If the block spans three or more pages (columns), BPIL may not be applied to the second and subsequent pages (columns).

Quotation Mark

Quotation marks are characters that belong to the character class QU in UAX#14: Line Breaking Properties. Quotation marks generally have an open and close direction, but QU does not. Therefore, if nothing is done, it will have undesired results when breaking lines. Unicode, on the other hand, says that if language information is available, it can be used to determine which character is used as the open or close quotation marks and treat them as OP or CL.

AH Formatter V7.1 treats the following characters as quotation marks (including some non-QU characters in UAX#14.) QU/OP/CL shown here indicates in which direction AH Formatter V7.1 treats the character (not the character class in UAX#14.)

U+0022 | QU | QUOTATION MARK
U+0027 | QU | APOSTROPHE
U+00AB | OP | LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
U+00BB | CL | RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
U+2018 | OP | LEFT SINGLE QUOTATION MARK
U+2019 | CL | RIGHT SINGLE QUOTATION MARK
U+201A | OP | SINGLE LOW-9 QUOTATION MARK
U+201B | OP | SINGLE HIGH-REVERSED-9 QUOTATION MARK
U+201C | OP | LEFT DOUBLE QUOTATION MARK
U+201D | CL | RIGHT DOUBLE QUOTATION MARK
U+201E | OP | DOUBLE LOW-9 QUOTATION MARK
U+201F | OP | DOUBLE HIGH-REVERSED-9 QUOTATION MARK
U+2039 | OP | SINGLE LEFT-POINTING ANGLE QUOTATION MARK
U+203A | CL | SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
U+275B | QU | HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT
U+275C | QU | HEAVY SINGLE COMMA QUOTATION MARK ORNAMENT
U+275D | QU | HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT
U+275E | QU | HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT
U+275F | QU | HEAVY LOW SINGLE COMMA QUOTATION MARK ORNAMENT
U+2760 | QU | HEAVY LOW DOUBLE COMMA QUOTATION MARK ORNAMENT
U+2E00 | QU | RIGHT ANGLE SUBSTITUTION MARKER
U+2E01 | QU | RIGHT ANGLE DOTTED SUBSTITUTION MARKER
U+2E02 | OP | LEFT SUBSTITUTION BRACKET
U+2E03 | CL | RIGHT SUBSTITUTION BRACKET
U+2E04 | OP | LEFT DOTTED SUBSTITUTION BRACKET
U+2E05 | CL | RIGHT DOTTED SUBSTITUTION BRACKET
U+2E06 | QU | RAISED INTERPOLATION MARKER
U+2E07 | QU | RAISED DOTTED INTERPOLATION MARKER
U+2E08 | QU | DOTTED TRANSPOSITION MARKER
U+2E09 | OP | LEFT TRANSPOSITION BRACKET
U+2E0A | CL | RIGHT TRANSPOSITION BRACKET
U+2E0B | QU | RAISED SQUARE
U+2E0C | OP | LEFT RAISED OMISSION BRACKET
U+2E0D | CL | RIGHT RAISED OMISSION BRACKET
U+2E1C | OP | LEFT LOW PARAPHRASE BRACKET
U+2E1D | CL | RIGHT LOW PARAPHRASE BRACKET
U+2E20 | OP | LEFT VERTICAL BAR WITH QUILL
U+2E21 | CL | RIGHT VERTICAL BAR WITH QUILL
U+301D | OP | REVERSED DOUBLE PRIME QUOTATION MARK
U+301E | CL | DOUBLE PRIME QUOTATION MARK
U+301F | CL | LOW DOUBLE PRIME QUOTATION MARK

Quotation marks have different directions, mainly in European languages. For example, in French it will be «Guillemets» and in German it will be »Guillemets«. AH Formatter V7.1 uses the above settings by default, but adjusts them for some languages as follows. Blank cells and characters not listed here are the same as the default.

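language code | language | U+00AB | U+00BB | U+2018 | U+2019 | U+201C | U+201D | U+2039 | U+203A
default |  | OP | CL | OP | CL | OP | CL | OP | CL
az aze | Azerbaijani |  |  | CL |  | CL |  |  |
bs bos | Bosnian | CL | OP |  | QU |  | QU | CL | OP
bg bul | Bulgarian |  |  | CL |  | CL |  |  |
cs ces | Czech | CL | OP | CL |  | CL |  | CL | OP
da dan | Danish | CL | OP | CL |  | CL |  | CL | OP
de deu | German | CL | OP | CL |  | CL |  | CL | OP
de-CH deu-CH | Switzerland |  |  | CL |  | CL |  |  |
et est | Estonian |  |  | CL |  | CL |  |  |
fi fin | Finnish |  | QU |  | QU |  | QU |  | QU
hr hrv | Croatian | CL | OP |  |  |  |  | CL | OP
hu hun | Hungarian | CL | OP |  |  |  |  | CL | OP
is isl | Icelandic |  |  | CL |  | CL |  |  |
ka kat | Armenian |  |  | CL |  | CL |  |  |
lt lit | Lithuanian |  |  | CL |  | CL |  |  |
mk mkd | Macedonian |  |  | CL |  | CL |  |  |
no nor | Norwegian |  | QU |  | QU |  | QU |  | QU
pl pol | Polish | CL | OP |  |  |  |  | CL | OP
ru rus | Russian |  |  | CL |  | CL |  |  |
sk slk | Slovak | CL | OP | CL |  | CL |  | CL | OP
sl slv | Slovene | CL | OP | CL |  | CL |  | CL | OP
sq sqi | Albanian |  |  | CL |  | CL |  |  |
sv swe | Swedish |  | QU |  | QU |  | QU |  | QU
uz uzb | Uzbek |  |  | CL |  | CL |  |  |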

You can change the direction of the quotation marks with quotationmark in the Option Setting File. You can also specify the direction of the quotation marks with axf:quotetype.

Characters in quotes used in CSS open-quote/close-quote are forced to OP or CL regardless of these settings.

OP is a quotation mark that is treated like an open parenthesis and CL is a quotation mark that is treated like a close parenthesis. QU is a non-directional quotation mark. For characters that are QU, AH Formatter V7.1 processes them as follows:

  • QU at the beginning of the string is considered OP.
  • QU at the end of the string is considered CL.
  • QU within the string is considered OP if there is no white space immediately after it and there is a white space immediately before it.
  • QU within the string is considered CL if there is no white space immediately before it and there is a white space immediately after it.
  • Otherwise, it remains QU.
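The rules above for a non-directional QU character can be sketched in Python as follows. The function name resolve_qu is hypothetical, and treating a one-character string as "beginning of the string" is an assumption of this sketch.

```python
def resolve_qu(text, index):
    """Return "OP", "CL", or "QU" for the QU character at text[index]."""
    if index == 0:
        return "OP"                       # at the beginning of the string
    if index == len(text) - 1:
        return "CL"                       # at the end of the string
    space_before = text[index - 1].isspace()
    space_after = text[index + 1].isspace()
    if space_before and not space_after:
        return "OP"                       # white space before, none after
    if space_after and not space_before:
        return "CL"                       # white space after, none before
    return "QU"                           # otherwise left as QU
```

In the string say "hi" now, the first quotation mark resolves to OP and the second to CL, while the apostrophe in don't stays QU.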

Hyphenation

This section explains the behavior of the page (or column) break when hyphenation-keep="page" (or "column") is specified. Suppose there is the following sentence with hyphenation-keep="page" specified.

xxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxx
xxxxxxxxxxx abc-
def xxxxxxx ghi-
jkl mnopqr.

When the page break occurs at the last line, ghi is pushed to the next page, resulting in the following:

xxxxxxxxxxxxxxxx
xxxxxxxxxxx abc-
def xxxxxxx
---------------- page break
ghijkl mnopqr.

When widows="2" is specified, another 1 line will be pushed to the next page and results in the following:

xxxxxxxxxxxxxxxx
xxxxxxxxxxx abc-
---------------- page break
def xxxxxxx ghi-
jkl mnopqr.

But this conflicts with hyphenation-keep="page", because the page now ends with the hyphenated abc-. AH Formatter V7.1 cannot push only abc, so one more line is pushed to the next page.

xxxxxxxxxxxxxxxx
---------------- page break
xxxxxxxxxxx abc-
def xxxxxxx ghi-
jkl mnopqr.

When the preceding line also ends with a hyphen, lines are pushed one after another. It is therefore advisable to use hyphenation-keep together with hyphenation-ladder-count.

In a slightly different case, lines may increase when ghi is pushed as follows:

xxxxxxxxxxxxxxxx
xxxxxxxxxxx xxxx
xxx xxxxxxx
---------------- page break
ghijkl xxxx mno-
pqr.

When widows="3" is specified, one more line will be pushed. At this time, lines may decrease as follows:

xxxxxxxxxxxxxxxx
xxxxxxxxxxx xxxx
---------------- page break
xxx xxxxxxx ghi-
jkl xxxx mnopqr.

AH Formatter V7.1 cannot resolve the widows="3" violation caused by this side effect. This is a limitation of AH Formatter V7.1. widows="2" never causes such a scenario.

Variation Sequence

AH Formatter V7.1 supports Unicode Variation Sequences. When an OpenType font supports Variation Sequences (cmap Format14), they are processed appropriately. For example, Variation Sequences can be expressed as follows:

葛&#xE0100;城市
葛城市
葛&#xE0101;飾区
葛飾区

Even when it is applied to a CID font which does not have the capability of Variation Sequence, CID is selected according to the following IVD (UTS#37: Ideographic Variation Database).

  • /ivd/data/2007-12-14 Combined registration of the Adobe-Japan1 collection and of sequences in that collection

&#xE0100;, etc. are disregarded when the font does not support Variation Sequences, when there is no corresponding variant character, or when the specified Variation Sequence is out of range. This means that even with the same settings, the displayed glyph may differ depending on which Variation Sequences the font supports.
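To make the structure of such a sequence concrete, the following Python sketch builds the first example string and shows what "disregarding" the selector amounts to at the text level. The helper strip_variation_selectors is an assumption for illustration; it covers only the standard ranges U+FE00-FE0F and U+E0100-E01EF and says nothing about how the formatter handles glyphs internally.

```python
def is_variation_selector(ch):
    """True for VS1-VS16 (U+FE00-FE0F) and VS17-VS256 (U+E0100-E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF

def strip_variation_selectors(text):
    """Remove variation selectors, leaving only the base characters."""
    return "".join(ch for ch in text if not is_variation_selector(ch))

# U+845B followed by the selector U+E0100, then U+57CE U+5E02.
sample = "\u845B\U000E0100\u57CE\u5E02"
```

Here sample contains four code points, and strip_variation_selectors(sample) leaves the three base characters.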

Font Selection

Fonts in FO or CSS are specified by the font-family property. The specification can take various forms: several candidate fonts may be enumerated, as in font-family="'Courier New', serif", or font-family may not be specified at all. AH Formatter V7.1 determines which font to apply to a character string as follows:

  1. The character strings in the region are divided into runs of the same script, based on the script information that Unicode defines for each character, the language specified in FO or CSS, the specified script information, and so on, and the script of each run is determined. This determination is complicated because Unicode contains characters for which it is ambiguous whether they are full-width, and because the language cannot be determined from a character string that consists only of kanji.

  2. When font-selection-mode="6" is specified in the Option Setting File, each character of the string is checked in order to see whether a font-family specified in FO or CSS has its glyph, and the first font found to have the glyph is adopted. Otherwise, each font-family specified in FO or CSS is checked in order to see whether it has the glyph and whether it supports the Unicode Range or script, and the first supporting font found is adopted. When no font-family is specified, the generic font family set as the default font family is considered to be specified.

In XSL or CSS, the following five can be used as the generic font family.

  • serif
  • sans-serif
  • cursive
  • fantasy
  • monospace

AH Formatter V7.1 holds the information on which actual font corresponds to each of these for every script. A generic font that does not belong to any script can also be defined. These can be specified in the Font Setting page of the Option Setting dialog in the Graphical User Interface, or with <script-font> in the Option Setting File.

  1. When a generic font is specified for the script of the target character string, it is checked to see whether it supports the character string.

  2. When no generic font is specified for that script, the generic font that does not belong to any script is checked.

  3. When auto-fallback-font="true" is specified in the Option Setting File and none of the fonts specified in font-family supports the target character string, the following fallback processing is performed.

    1. The font specified as the fallback for the corresponding script is checked.
    2. The font specified as the fallback of the generic font is checked.
    3. If still no font supports the target character string, the following fonts are checked in order.
      • Windows version
        1. Lucida Sans Unicode
        2. Microsoft Sans Serif
        3. IPAGothic
        4. Code2000
        5. MS PGothic
        6. Arial Unicode MS
      • Non-Windows version
        1. Helvetica
        2. IPAGothic
        3. Code2000

  4. If no font that supports the target character string is found even then, it is an error.
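The overall "first font that supports the character wins, then fall back" idea can be sketched in Python as below. This is a simplified model, not the formatter's algorithm: the function select_font, the coverage dictionary that lists which characters each font can display, and the sample data in the usage note are all hypothetical, and script and Unicode Range checks are omitted.

```python
def select_font(char, font_family, coverage,
                script_fallback=(), generic_fallback=(), builtin_fallback=()):
    """Return the first candidate font whose coverage includes `char`.

    Candidates are tried in this order: the specified font-family list, the
    fallback for the script, the fallback of the generic font, and finally
    a built-in list.
    """
    candidates = (*font_family, *script_fallback, *generic_fallback, *builtin_fallback)
    for font in candidates:
        if char in coverage.get(font, ""):
            return font
    raise LookupError("no font supports U+%04X" % ord(char))
```

With a hypothetical coverage such as {"Courier New": "abc", "IPAGothic": "abc漢"}, the character 漢 is not found in Courier New and is picked up from IPAGothic when that font appears in one of the fallback lists.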

The settings in the Option Setting dialog are reflected in the Option Setting File. For example, they are written like

<script-font script="Hans" serif="SimSun" sans-serif="SimHei" monospace="SimSun"/>

Since cursive is not specified here, the cursive setting of the generic font is used for Hans. When <script-font script="Hans"/> itself is not specified, as is the case immediately after installation, the default group is considered to be specified. The following default group is set up with the Windows version. Only the scripts listed here are set up. Moreover, a font is not set up when it does not actually exist.

Script | serif | sans-serif | cursive | fantasy | monospace
default | Times New Roman | Arial | Segeo Script or Comic Sans MS or Monotype Corsiva | Impact | Courier New
Jpan | MS Mincho | MS Gothic | MS Mincho or MS Gothic | MS Mincho or MS Gothic | MS Gothic or MS Mincho
Hans | SimSun or MS Song | SimHei or MS Hei or MS Song | SimSun or MS Song | SimSun or MS Song | SimHei or MS Hei or MS Song
Hant | MingLiU
Hang | Batang or BatangChe | Gulim or BatangChe | Batang or BatangChe | Batang or BatangChe | BatangChe
Armn V7.1 no-LT | Arian AMU Serif or Arian AMU | Arian AMU | Arian AMU | Arian AMU | Arian AMU Mono or Arian AMU
Geor V7.1 no-LT | Sylfaen
Ethi no-LT | Nyala
Arab | Arabic Typesetting
Syrc no-LT | Estrangelo Edessa
Hebr | FrankRuehl
Deva | Mangal
Beng no-LT | Vrinda
Guru no-LT | Raavi
Gujr no-LT | Shruti
Taml no-LT | Latha
Telu no-LT | Gautami
Knda no-LT | Tunga
Mlym no-LT | Kartika
Sinh no-LT | Iskoola Pota
Thai | Angsana New
Khmr no-LT | DaunPenh
Laoo no-LT | DokChampa
Mymr no-LT | Myanmar Text

The following default group is set up with the Macintosh version.

Script | serif | sans-serif | cursive | fantasy | monospace
default | Times or Times New Roman | Helvetica or Arial | Monaco or Chalkboard | Monaco or Chalkboard | Courier
Jpan | HiraMinPro W3 | HiraKakuPro W3 | HiraMaruPro W3 or HiraKakuPro W3 | HiraMaruPro W3 or HiraKakuPro W3 | HiraKakuPro W3
Hans | STXihei or STSong | STSong | STXihei or STSong | STXihei or STSong | STSong
Hant | LiHeiPro or LiSongPro | LiSongPro | LiHeiPro or LiSongPro | LiHeiPro or LiSongPro | LiSongPro
Hang | AppleMyungjo | AppleGothic | AppleMyungjo | AppleMyungjo | AppleGothic
Arab | Geeza Pro
Hebr | NewPeninimMT
Deva | DevanagariMT
Thai | Thonburi

The following default group is set up with the Linux version.

Script | serif | sans-serif | cursive | fantasy | monospace
default | Times | Helvetica | Times | Times | Courier

Glyphs in Vertical Text

There are basically three types of text orientation in Japanese or Chinese documents, as follows:

In horizontal writing | In vertical writing (SVO) | In vertical writing (MVO)

The orientation of text in vertical writing mode is expressed with U or R. U is a character displayed upright on the paper. R is a character rotated 90 degrees clockwise on the paper. The text orientation in vertical writing mode is then as follows:

  • Japanese characters like "漢字" are U.
  • Brackets are R.
  • Punctuation marks are U once the glyph for vertical writing is used.
  • European characters like "Abc" are U in SVO, R in MVO.

UAX#50: Unicode Vertical Text Layout discusses which characters should be upright and which should be rotated 90 degrees. It currently describes only MVO (Mixed Vertical Orientation), but a description of SVO (Stacked Vertical Orientation) was also included in the past (tr50-6.html). AH Formatter V7.1 implements axf:text-orientation="mixed" in accordance with MVO and axf:text-orientation="upright" in accordance with SVO, using the data with some modifications (tr50-x.Orientation.txt). This data can be modified arbitrarily in the Option Setting File. See also UAX50.

Usually, a font that supports vertical writing mode has glyphs for vertical writing for some characters, because some characters cannot be used in vertical writing simply by rotating the glyph for horizontal writing mode. Examples are small kana, punctuation marks, and the long vowel mark. In vertical writing mode, if a character has a glyph for vertical writing, that glyph is used.

The orientation of text (U or R) is expressed relative to the orientation of the glyph for horizontal writing mode. However, some glyphs for vertical writing mode differ from those for horizontal writing mode. The example below shows the glyphs of U+3083, U+FF08, and U+2190. U+FF08 and U+2190 have different orientations in vertical and horizontal writing modes.

Glyph for horizontal writing | Glyph for vertical writing

Although “brackets are R” as mentioned above, they actually have to be displayed as U using the glyph for vertical writing mode. That is, there is a tacit assumption that the glyph for vertical writing mode is designed with an orientation different from that of the glyph for horizontal writing mode. Whether a font has a glyph for vertical writing mode, and whether its orientation is the same as that of the glyph for horizontal writing mode, depends on the font. The difference between fonts is particularly noticeable in the orientation of symbols such as arrows. Since there is no way to know in which orientation a glyph was designed, this problem cannot be solved in general. Therefore, AH Formatter V7.1 controls the orientation of characters according to the major implementations.

Formatting Large Document

When outputting PDF, AH Formatter V7.1 discards pages that have already been formatted, so AH Formatter V7.1 consumes just the memory required for one page when outputting PDF for, for example, a simple FO without <fo:page-number-citation>, no matter how huge the document is (except when formatting from the GUI). However, if a page contains an <fo:page-number-citation> that refers to a following page, we cannot know the page number of the referenced page until that page is actually formatted. For that reason, if a page containing an unresolved <fo:page-number-citation> appears, AH Formatter V7.1 will suspend its output and store the result in memory while continuing formatting. When a document has a table of contents at the start, the table of contents will not be output until all the page numbers appearing in it are resolved. Because of the high memory consumption, there is a limit to the number of formatted pages, so it is not possible to format extremely large documents.

To solve this problem, AH Formatter V7.1 makes it possible to process the document in two formatting passes. In the first pass, the formatting is processed only for resolving <fo:page-number-citation>, and all the required page number information is collected. In the second pass, formatting starts again from the first page. Since all <fo:page-number-citation> are already resolved, AH Formatter V7.1 can discard formatted pages when outputting the document. Although the formatting processing time is increased, the formatting consumes less memory and it is possible to format extremely large documents. But this has no effect on the memory consumption needed for the output.
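The idea of the two passes can be illustrated with a toy Python model. It is purely conceptual: the page dictionaries with "ids" and "cites" keys and the function two_pass_format are hypothetical stand-ins for formatted pages and <fo:page-number-citation> references, and nothing here corresponds to an actual AH Formatter interface.

```python
def two_pass_format(pages):
    """Yield (page_number, resolved_citations) one page at a time.

    Each page is a dict with "ids" (identifiers defined on the page) and
    "cites" (identifiers whose page numbers are cited on the page).
    """
    # Pass 1: only collect the page number of every identifier.
    page_of = {}
    for number, page in enumerate(pages, start=1):
        for ident in page["ids"]:
            page_of[ident] = number
    # Pass 2: every citation is already resolvable, so each page can be
    # emitted (and then discarded) as soon as it is produced.
    for number, page in enumerate(pages, start=1):
        yield number, {ref: page_of[ref] for ref in page["cites"]}
```

In this model, a table of contents on page 1 that cites chapters on later pages can be emitted first in the second pass, because the page numbers were gathered in the first pass.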

The following shows how to perform 2-pass formatting:

Temporary File

AH Formatter V7.1 does not create a temporary working file if it can be avoided. AH Formatter V7.1 creates a temporary working file in the following cases.

  • With the COM Interface, PDF of a formatted result is saved to a temporary file when outputting PDF to a Web browser directly.

  • An XML document passed by using DOM with the COM Interface is processed using a temporary file. However, when FO is specified as the formatting type, the temporary file is not generated because DOM is processed directly.

  • When outputting a file while printing, a temporary file is generated.

  • When a file interface is required in the XSLT transformation using external XSLT, a temporary file is generated.

  • When the transformation from XML+XSL is required in the render method of a Java Interface, the result FO is generated as a temporary file.

  • In the Windows version, when embedding an image that is not embeddable in PDF, a temporary file is generated in the conversion process.

  • A temporary file is generated when converting EPS to PDF using Distiller or Ghostscript.

  • When processing EPS using Distiller, if joboptions is not specified, a default joboption will be generated as a temporary file.

  • A temporary file is generated when outputting to an XPS file.

  • In the GUI of the Windows version, temporary files are generated as needed by the Windows system.