AH Formatter V7.0 can format HTML designed for the Web (except for HTML that uses frames). However, few HTML documents are likely to give a good result without adjustment after formatting. The reasons are as follows:
For example, if the HTML can be printed from a Web browser without overflowing the right-hand side of the page, then formatting with AH Formatter V7.0 will produce a reasonable result. However, in order to achieve a better result, the HTML must be designed both for the browser and for printing. The CSS for printing may be precisely defined using rules such as:
@media print { ... }
@page { ... }
Moreover, the CSS implementations of current Web browsers differ greatly. If the HTML contains syntax errors because it was designed for a particular browser, or if it uses incorrect CSS, a good result is unlikely.
Many (X)HTML documents on the Web use only generic fonts. (This is desirable considering the characteristics of the Web.) Since the font settings for every script in the Option Setting File always apply in AH Formatter V7.0 GUI on Windows, suitable fonts will be used. However, this applies only to AH Formatter V7.0 GUI and only on Windows. When using the Command-line Interface, set appropriate <script-font> values in the Option Setting File and specify the Option Setting File when formatting.
CAUTION: Since AH Formatter V7.0 formats the document for print media, @media screen is ignored even when viewing documents on-screen with the GUI.
The cascading order of the CSS is defined in the CSS2 Specification as follows:
AH Formatter V7.0 corresponds to the following.
It is html.css. See also Default CSS for HTML.
This can be specified with <usercss> in the Option Setting File and with the -css or -s command-line option. (The .NET, Java, and other interfaces provide equivalents of the corresponding command-line option.) These are applied in the following order.
Only the Option Setting File is applied in GUI. What is specified on the CSS page of the Format Option Setting dialog will be reflected in the Option Setting File.
This can be specified with <link> or <style> inside the HTML, or with the <?xml-stylesheet ...?> processing instruction. These are applied in the following order.
Default CSS for HTML is used as the first stylesheet (user agent declarations) when formatting (X)HTML. This is html.css, which is placed in the directory indicated by the environment variable AHF70_DEFAULT_HTML_CSS or AHF70_64_DEFAULT_HTML_CSS. (When html.css does not exist, the document is formatted as if all elements were inline.)
This stylesheet is based on how web browsers display HTML, the styles specified by CSS, and so on. However, some of its specifications may not display well in every environment, and preferences also differ. Users should adjust the default CSS to suit their own environment. Some examples are shown below.
The default CSS specifies the following.
q::before { content: open-quote }
q::after { content: close-quote }
In AH Formatter V7.0, the default values of quotes are "\201C" "\201D" "\2018" "\2019". The following specification may be preferable.
q:lang(en) { quotes: '"' '"' "'" "'" }
q:lang(no) { quotes: "«" "»" '"' '"' }
The footnote number is set to be placed in the margin on the left of the page. If you don't want it to overflow into the margin, specify padding-left, or specify list-style-position:inside for @footnote. decimal is specified for the numbering. If you want to use super-decimal, change the rules as follows.
::footnote-call { content: counter(footnote, super-decimal); }
::footnote-marker { content: counter(footnote, super-decimal); -ah-margin-end: 0.5em; text-indent: 0; }
The symbol used for the marker of the list is specified by <list-style-type> in the Option Setting File. Because that glyph is font dependent, different fonts will show different markers. Since ::marker has no specific font setting, the font used depends on the context at that time. It is good to specify a specific font for the ::marker if necessary.
When formatting starts with automatic detection of the formatting type, the type is determined by the following procedure.
For HTML formatting, the document does not need to be XML. For any other formatting type, the document must be well-formed XML.
There are some differences in formatting between AH Formatter V7.0 and AH Formatter V6.6 as listed below.
Line breaking processing has been improved. See Line Breaking. In addition, the line break position may change due to changes in the Unicode specifications.
The behavior of axf:suppress-if-first-on-page has been improved.
When the character is enlarged using the ::first-letter pseudo-element in CSS, the line spacing may be expanded unless line-height is specified. In AH Formatter V7.0, the line spacing is not unnecessarily expanded when line-height is not specified in ::first-letter.
The default value for zwsp-mode has been changed to 6.
The processing method when the word at the end of the page (column) is hyphenated by hyphenation-keep="page", etc. has been improved. See hyphenation-keep-mode in the Option Setting File. Specify hyphenation-keep-mode="line" when you want to make it the same as V6.
white-space-collapse is implemented to apply across <fo:inline>, but until AH Formatter V6.6, it was applied unconditionally, so even in some cases where there is a border in <fo:inline>, it was collapsed. AH Formatter V7.0 no longer collapses under the following conditions:
Specify white-space-collapse-mode="6" in the Option Setting File when you want to make it the same as V6.
The behavior of font-stretch-mode="6" has been corrected. When a keyword such as condensed is specified in font-stretch, that information is used for the font selection. For example, when extra-condensed is specified and a condensed font exists, the compression is based on that font. The compression ratio in that case is 62.5/75 = 83.3%. However, if there is no normal font in the selection candidates, the compression rate will be 62.5%.
In AH Formatter V7.0, if axf:poster-image is specified when embedding multimedia, the size of that image is used. In order to get the same result as AH Formatter V6.6 or earlier, specify scaling="non-uniform" content-width="scale-to-fit" content-height="scale-to-fit" to <fo:external-graphic>.
Revision 41 in UAX #9: Unicode Bidirectional Algorithm is now supported. If you want to process with the same algorithm as V6, set a value less than 37 in unicode-bidi-rev. A value greater than or equal to 37 is considered 41. ☞ BIDI Algorithm Implementation Restrictions
There are some differences in formatting between AH Formatter V6.6 and AH Formatter V6.5 as listed below.
There are changes in the description of Default CSS for HTML (html.css).
In CSS,
@page { -ah-force-page-count: document 4; }
With a rule like the one above, in AH Formatter V6.6, -ah-force-page-count takes effect whenever the pages are switched. To make it work only on the last page, as in AH Formatter V6.5 or earlier, specify -ah-force-page-count in the @page rule whose page selector is the last one used.
Ligatures are suppressed by inserting the following characters:
When STIX version 2.0 font is installed by default, it will be adopted.
If OpenType has a MATH table, it will be referenced. You can control it finely using enableOpenTypeMATH, exceptOpenTypeMATHVariants in the Option Setting File.
The line of <mlongdiv> is drawn with the thickness given by mslinethickness.
When lines are broken, each line now takes its own height. To give every line the same height, as in AH Formatter V6.5 or earlier, specify linebreakingHeightAdjust="false" in the Option Setting File.
There are some differences in formatting between AH Formatter V6.4 and AH Formatter V6.3 as listed below.
In the MathML settings in the Option Setting File, the default value for the alignment of subscript/superscript has somewhat been modified. In order to make it the same setting as AH Formatter V6.3, specify mathmlSettingsMode="6.3".
There are some differences in formatting between AH Formatter V6.3 and AH Formatter V6.2 as listed below.
With AH Formatter V6.2, the block containing the anchor was sent to the following page due to conditions such as orphans, and as a result, the footnote itself was sometimes arranged in the previous page. On the other hand, AH Formatter V6.3 will try to fit the dividable block after the anchor in the previous page. You can also get the same result by specifying axf:footnote-keep="always" in the original block. In order to get the same result as AH Formatter V6.2 or earlier, specify keep-footnote-anchor="false" in the Option Setting File.
The implemented list-style-type has been changed to use Predefined Counter Styles. The following style names were included in the previous list-style-type but are not included in the Predefined Counter Styles:
Even when a style name is the same as one in the previous list-style-type, the implementation in the Predefined Counter Styles may differ in some respects. If you still want to use a style name from the previous list-style-type, specify it as a quoted string, for instance axf:number-transform="'lower-roman'" rather than axf:number-transform="lower-roman".
There are some differences in formatting between AH Formatter V6.2 and AH Formatter V6.1 as listed below.
The default values of latin-ligature and pair-kerning in the Option Setting File have been changed. Up to AH Formatter V6.1, these values were false. In AH Formatter V6.2, they are true. The intent is to give a better formatting result by default. When axf:ligature-mode and axf:kerning-mode are specified explicitly in the FO, this change does not influence the formatting result. These settings influence the formatting speed.
In CSS, when a block with auto height breaks at the end of a page, for example, up to AH Formatter V6.1 the block height ended at the break point. In AH Formatter V6.2, the height is extended to the end of the page.
The difference is remarkable when the background or the border is specified to the block. The same is applied to the end of column.
☞ 5.3. Splitting Boxes
This behavior is not applied to FO.
By specifying splitting-blocks-space="true" in the Option Setting File, you can return to the original V6.1 behavior.
When the float width on the before side fills up the region and there is no room for wrapping text, although the text is positioned aside by the float, the block itself has overlapped with the float. This can be checked by adding a background or a border to the block. When intrusion-displace="block" is specified, the block itself is positioned aside by the float. In AH Formatter V6.2, regardless of the setting of intrusion-displace, the block itself is positioned aside by the float.
Up to AH Formatter V6.1, a page (column) break did not occur within footnote-body. In AH Formatter V6.2, it is possible to break pages (columns) within footnote-body. A footnote breaks by the setting of axf:footnote-max-height and it occurs by default. For this reason, the formatted result may differ from AH Formatter V6.1. In order to avoid the automatic break, specify auto-break-footnote="false" in the Option Setting File.
Up to AH Formatter V6.1, there was a known issue in the BIDI processing. With AH Formatter V6.2, BIDI processing was corrected. Therefore, the formatted result may differ from V6.1.
There are some differences in formatting between AH Formatter V6.1 and AH Formatter V6.0 as listed below.
In AH Formatter V6.1, Unicode normalization (UAX#15: Unicode Normalization Forms) can be applied to the input text. See also axf:normalize. The normalization may affect the formatting speed to some degree. If you don't want normalization to be performed by default, specify normalize="none" in the Option Setting File.
In AH Formatter V6.1, when a family name is specified in font-family, a condensed font can be chosen, if one actually exists, by using information such as font-stretch="condensed". To enable this, specify font-stretch-mode="6" in the Option Setting File. The operating differences between font-stretch-mode="5" and "6" are as follows:
The behavior is the same as AH Formatter V5. The information on font-stretch is not used for the font selection. That is, even if a condensed font exists in the family, it is not chosen. In order to choose a condensed font, it is necessary to specify the font name. When fonts called Foo-Regular.otf and Foo-Condensed.otf exist with the family name of Foo, Foo-Condensed.otf is not chosen even if <fo:block font-family="Foo" font-stretch="condensed"> is specified. It is necessary to specify <fo:block font-family="Foo-Condensed">.
When <fo:block font-family="Foo" font-stretch="condensed"> is specified, Foo-Regular.otf is compressed and displayed. The compression ratio at that time is somewhat smaller (larger when expanding) than the value defined in the OpenType specification.
The information on font-stretch is used for the font selection. In the example above, when <fo:block font-family="Foo" font-stretch="condensed"> is specified, Foo-Condensed.otf is chosen. When a numerical value is specified as font-stretch, a condensed font is not searched for. When <fo:block font-family="Foo" font-stretch="extra-condensed"> is specified and there is no extra-condensed font, a condensed font is not necessarily the one that is compressed; the normal font will be compressed.
When there is no condensed font, the compression ratio is one of the following values given in the OpenType specification.
ultra-condensed | 50% |
extra-condensed | 62.5% |
condensed | 75% |
semi-condensed | 87.5% |
normal | 100% |
semi-expanded | 112.5% |
expanded | 125% |
extra-expanded | 150% |
ultra-expanded | 200% |
The behavior has been corrected in AH Formatter V7.0. ☞ font-stretch-mode
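The relationship between the table above and the V7.0 example (62.5/75 = 83.3%) can be sketched in Python. This is only an illustration, not AH Formatter code: the function name compression_ratio is hypothetical, and the sketch assumes that the ratio is the requested percentage divided by the percentage of the font that was actually selected.

```python
# Width percentages for the font-stretch keywords, as listed in the table above.
STRETCH_PERCENT = {
    "ultra-condensed": 50.0,
    "extra-condensed": 62.5,
    "condensed": 75.0,
    "semi-condensed": 87.5,
    "normal": 100.0,
    "semi-expanded": 112.5,
    "expanded": 125.0,
    "extra-expanded": 150.0,
    "ultra-expanded": 200.0,
}


def compression_ratio(requested, selected="normal"):
    """Scale factor applied to the selected font to reach the requested stretch.

    Assumption: the factor is requested% / selected%, as in the 62.5/75 example.
    """
    return STRETCH_PERCENT[requested] / STRETCH_PERCENT[selected]


# extra-condensed requested, a condensed font was selected: about 0.833
ratio_from_condensed = compression_ratio("extra-condensed", "condensed")
# extra-condensed requested, only the normal font is available: 0.625
ratio_from_normal = compression_ratio("extra-condensed")
```

With a condensed font available, the font is scaled to about 83.3% of its width; with only the normal font, it is scaled to 62.5%.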
Although the position of the baseline was improved in AH Formatter V5, a problem remained: when alphanumeric characters of European languages were rendered upright in vertical writing mode, the center position was not aligned. This has been improved in AH Formatter V6.1. Specify baseline-mode="5" in the Option Setting File when you want to make it the same as V5.
The interpretation of the vw and vh units has been changed. Formerly the units were based on the entire page size including page margins. In AH Formatter V6.1, they are based on the size excluding the page margins. In addition, the pvw and pvh units based on the entire page size have been added. Specify viewport-length-units-mode="5" in the Option Setting File when you want to make it the same as V5. In this case, the units behave as vw=pvw, vh=pvh, vmin=pvmin and vmax=pvmax.
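The difference between the two unit families can be illustrated with a small Python calculation. This is a sketch only: the function name viewport_units and the page dimensions are hypothetical, and it assumes that each unit is 1% of its reference length, as with vw and vh in CSS.

```python
def viewport_units(page_width, page_height,
                   margin_left=0.0, margin_right=0.0,
                   margin_top=0.0, margin_bottom=0.0):
    """Return the length of 1vw, 1vh, 1pvw and 1pvh in the unit of the inputs."""
    return {
        # V6.1 and later: based on the page size excluding the page margins
        "vw": (page_width - margin_left - margin_right) / 100,
        "vh": (page_height - margin_top - margin_bottom) / 100,
        # based on the entire page size including the page margins
        "pvw": page_width / 100,
        "pvh": page_height / 100,
    }


# Hypothetical A4 page (210mm x 297mm) with 20mm margins on every side.
units = viewport_units(210, 297, 20, 20, 20, 20)
```

For this page, 1vw is 1.7mm while 1pvw is 2.1mm, so a length such as 50vw is narrower in V6.1 and later than it was in V5.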
When letter-spacing and word-spacing are specified to the text, the settings of axf:punctuation-trim, axf:text-autospace, etc. were invalid. AH Formatter V6.1 removed this restriction.
With AH Formatter V6.1, the treatment of the ideographic space (U+3000) has been somewhat changed.
There are some differences in formatting between AH Formatter V6.0 and AH Formatter V5 as listed below.
In AH Formatter V6.0, the behavior of span="all" differs from that in AH Formatter V5.
In AH Formatter V5, a span specified inside a nested FO that generates a reference area, such as <fo:block-container>, is also effective. However, in AH Formatter V6.0, a span specified on an FO nested inside an FO that generates a reference area is invalid. For instance,
<fo:block-container>
<fo:block span="all">
<fo:block>ABC</fo:block>
</fo:block>
</fo:block-container>
In V5, span="all" was effective for <fo:block>ABC</fo:block>, but it is invalid in AH Formatter V6.0. In addition, when span="all" is specified on an <fo:block> inside the columns of an <fo:block-container> that uses axf:column-count, the span is treated as applying to the columns of that <fo:block-container>. In order to keep the same result as V5, specify span="all" on the parent <fo:block-container>.
Although a forced page break between an empty block at the beginning of the document and a block with span="all" was disregarded in V5, in AH Formatter V6.0 the forced page break is effective and a blank page is produced. In order to keep the same result as V5, specify as follows:
In one-column format, span="all" was not effective in V5. In AH Formatter V6.0, a reference area is generated even in one-column format. This causes the following differences, for example:
<fo:block>AAA</fo:block>
<fo:block space-before="1cm" span="all">BBB</fo:block>
In case of one-column format, the space was generated between AAA and BBB in V5, but it's not generated in AH Formatter V6.0. It is because a reference area is generated by span even in one-column format, then the space without the specification of space-before.conditionality="retain" will be deleted at the beginning of the reference area. In order to keep the same result as V5, do not specify span="all" in one-column format.
In AH Formatter V5, there were the following problems with the position of underline, overline and line-through.
In AH Formatter V6.0, these are improved as follows:
Specify text-underline-mode="5" in the Option Setting File when you want to make it the same as V5.
In AH Formatter V6.0, the behavior of the intrusion-displace is revised and different from that of AH Formatter V5.
Specify intrusion-displace-mode="5" in the Option Setting File when you want to make it the same as V5.
The behavior of the auto value of the width of vertical-text block within horizontal-text flow (or the height of horizontal-text block within vertical-text flow) is changed with AH Formatter V6.0.
In AH Formatter V5, the width of vertical-text block was given by the width of the outer area. In AH Formatter V6.0, the auto width of vertical-text block shrinks to fit the content. If you don't want this behavior you should specify the width explicitly such as width="100%". Also the same behavior will be applied to the height of horizontal-text block within vertical-text flow.
Specify vertical-block-width-mode="5" in the Option Setting File when you want to make it the same as V5.
The specification is ambiguous about the behavior of ZERO WIDTH SPACE (U+200B). In AH Formatter V5, ZERO WIDTH SPACE is also subject to text-align="justify", so the gap at that position becomes wider than the others. Leading and trailing ZERO WIDTH SPACE characters in a block are no exception and are stretched as well. AH Formatter V6.0 can format as follows:
This will avoid the effect of having a one-line space in a block such as <fo:block>&#x200B;</fo:block>. Specify zwsp-mode in the Option Setting File.
There are some differences in formatting between AH Formatter V5 and XSL Formatter V4 as listed below.
For example, V4 formats the following
<fo:block text-transform="capitalize"> HELLO world! </fo:block>
as follows:
Hello World!
AH Formatter V5 formats as follows:
HELLO World!
That is, although V4 changes the letters except the initial letter into lower case, AH Formatter V5 does nothing. In order to make it the same as V4, specify as follows:
<fo:block text-transform="capitalize-lowercase">
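The two behaviors can be mimicked in Python as follows. This is a minimal sketch, not the formatter's implementation: the function names are hypothetical, and words are assumed to be separated by single spaces.

```python
def capitalize(text):
    """AH Formatter V5 behavior: uppercase the first letter of each word
    and leave the remaining letters unchanged."""
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


def capitalize_lowercase(text):
    """XSL Formatter V4 behavior (capitalize-lowercase in V5): uppercase the
    first letter of each word and lowercase the remaining letters."""
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split(" "))


v5_result = capitalize("HELLO world!")            # "HELLO World!"
v4_result = capitalize_lowercase("HELLO world!")  # "Hello World!"
```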
AH Formatter V5 improves the processing of trimming a line of text. Although finer control was attained by axf:text-justify-trim with this enhancement, a difference may arise in the number of characters included in one line with XSL Formatter V4. When you want to make it the same as V4 by FO which does not use axf:text-justify-trim, specify text-justify-mode="4" in the Option Setting File.
AH Formatter V5 improves the processing when putting fonts with different baselines like a mixture of Western and Japanese text. For example,
<fo:block>Latin漢字</fo:block>
<fo:block>漢字Latin</fo:block>
<fo:block>Latin</fo:block>
<fo:block>漢字</fo:block>
In a case like the above, you may specify font-family="'Times New Roman', 'MS Mincho'" so that Japanese fonts are not applied to Latin text. In XSL Formatter V4, the first font specified in font-family determines the baseline, so the height of a line may differ. Since AH Formatter V5 selects the font from font-family according to the script or the language specification, a suitable baseline is applied by specifying language="jpn" in the example above. When you want to make it the same as V4, specify baseline-mode="4" in the Option Setting File.
font-selection-strategy="character-by-character" is supported from AH Formatter V5. In addition, auto-fallback-font in the Option Setting File makes it possible to control the fallback. See also Font Selection.
Some incompatible changes from XSL 1.0 are made to XSL 1.1.
In XSL 1.1, even if writing-mode or reference-orientation are specified to <fo:region-*>, these are ignored and not effective. In order to make these specifications effective in XSL 1.1, it is necessary to specify the following to <fo:page-sequence>.
writing-mode="from-page-master-region()" reference-orientation="from-page-master-region()"
In order to have it evaluated in the same way as in XSL 1.0 without making any changes to the FO, specify default-from-page-master-region="true" in the Option Setting File.
In XSL 1.0, fo:table is supposed to generate a reference area (see 5.6 in XSL 1.0). In XSL 1.1, however, this was corrected as an error. The difference arises mainly when margin-* specified on fo:table is converted to start-indent and end-indent. For example:
<fo:block margin-left="10pt"> <fo:table margin-left="0pt"> ...
In the table like above, left margins may differ between XSL 1.0 and XSL 1.1. If start-indent etc. are used instead of margin-*, such incompatibility will not be generated.
In order to have it evaluated in the same way as in XSL 1.0 without making any changes to the FO, specify table-is-reference-area="true" in the Option Setting File.
Since the shorthand in the property of XSL has succeeded the definition of CSS, the value is evaluated like CSS. That is,
margin="0pt -10pt"
is evaluated as two values instead of one formula. However, when it's not a shorthand, this is evaluated as one formula. For example, the following is one formula.
margin-left="0pt -10pt"
AH Formatter V7.0 processes such an ambiguous expression by the shorthand as follows:
In FO, when using a formula in the shorthand, it can be enclosed with parentheses, etc.
With CSS, when a calc() function is written as calc(10pt-5pt), “-” is evaluated as an operator.
We briefly explain a part of the property value syntax used in the XSL/CSS Extensions. This notation conforms to that in CSS. For more details, see also Value Definition Syntax.
AH Formatter V7.0 supports Unicode 10.0. Newly added characters may not be treated correctly. In addition, it's impossible to treat the character of unsupported script correctly (☞ Scripts and Languages). See unicode-bidi-rev in the Option Setting File for the BIDI control characters.
When an algorithm that is not compatible with V6.6 is selected in unicode-bidi-rev, the BIDI level may not be resolved as specified.
As the following example shows, the text in multiple elements must be combined to determine the BIDI level when the character before the break is one whose BIDI level depends on the subsequent text (such as a space), or one whose BIDI level depends on the presence of a corresponding character (such as a parenthesis or an “isolate”). In that case, the BIDI level is not evaluated correctly if the range contains an element whose text is converted according to the evaluation result of a property, because the BIDI level is obtained for the text before the property is applied.
<fo:block>aaaa <fo:inline property="xxxx">bbbbb</fo:inline> ...</fo:block>
<fo:block>(aaaa<fo:inline property="xxxx">bbbbb</fo:inline> ...)</fo:block>
As shown in the following example, when the element is backward referenced, the text obtained in the evaluation result is assumed to be “Neutral”, the BIDI level is obtained once, and after the text is obtained, the BIDI processing is performed only with that text.
<p>xxxx<span ref="#yyy" style="content:target-text(attr(ref, url))"></span>zzzz</p>
<p><span id="yyy">ref</span></p>
If you want to place the page number at the left edge of the page by changing the writing direction of an index leader, we recommend placing a space in front of the leader and changing the writing direction with unicode-bidi, as in the following example, rather than putting control characters in “content”.
a::before { content: leader(dotted) " " target-counter(attr(ref, url),page); unicode-bidi: embed; direction: rtl; }
<toc>حول xxxx <a ref="#yyy"></a></toc>
To express the Unicode Range as a property value in the Font Configuration File, Option Setting File, etc., use the following format:
[ <urange> | <string> ]# | all
<urange> is a hexadecimal number preceded by U+ and takes one of the following forms. Hexadecimal digits are case insensitive. (In the Unicode specification, a code point must have 4 to 6 digits, but this notation also allows fewer than 4 digits.)
U+4?? is equivalent to U+400-4FF. U+??? is equivalent to U+000-FFF. Unicode up to U+10FFFF is effective. Even if a range greater than U+10FFFF is specified, it is disregarded.
<string> is any string enclosed with quotation marks. For example, U+0028-0029 can be written as '()'.
all is equivalent to specifying U+0-10FFFF.
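The notation can be read with a small parser such as the following Python sketch. It is not AH Formatter code. The function names are hypothetical, the three <urange> forms (single code point, range, and ? wildcard) are inferred from the examples in the text, and clamping a range that extends past U+10FFFF is one possible reading of "disregarded".

```python
import re

MAX_CODE_POINT = 0x10FFFF


def parse_urange(token):
    """Parse one <urange> such as U+41, U+400-4FF or U+4?? into (start, end)."""
    if token[:2].upper() != "U+":
        raise ValueError("not a <urange>: " + token)
    body = token[2:]
    if "-" in body:                      # explicit range: U+400-4FF
        low, high = body.split("-", 1)
        start, end = int(low, 16), int(high, 16)
    elif "?" in body:                    # wildcard: U+4?? means U+400-4FF
        start = int(body.replace("?", "0"), 16)
        end = int(body.replace("?", "F"), 16)
    else:                                # single code point: U+41
        start = end = int(body, 16)
    if start > MAX_CODE_POINT:
        return None                      # entirely out of range (assumption: ignored)
    return (start, min(end, MAX_CODE_POINT))


def parse_unicode_range(value):
    """Parse '[ <urange> | <string> ]# | all' into a list of (start, end) pairs."""
    value = value.strip()
    if value == "all":
        return [(0, MAX_CODE_POINT)]
    ranges = []
    # Tokens are quoted strings or comma-separated <urange> items.
    for token in re.findall(r"'[^']*'|\"[^\"]*\"|[^,\s]+", value):
        if token[0] in "'\"":
            # A <string> contributes each of its characters as a code point.
            ranges.extend((ord(ch), ord(ch)) for ch in token[1:-1])
        else:
            parsed = parse_urange(token)
            if parsed is not None:
                ranges.append(parsed)
    return ranges
```

For example, parse_unicode_range("U+4??, '()'") yields the range U+400-4FF followed by the two code points U+0028 and U+0029.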
<uri-specification> in the XSL specification is supposed to specify the character string which fulfills IRI (RFC3987) specification in url(). IRI is called URI for convenience in this document.
CAUTION: In the XSL specification, it is not necessary to enclose a string in url() with quotation marks. However, it may be difficult to accurately recognize the “)” corresponding to “url(”, such as when the string contains parentheses. We strongly recommend enclosing a string in url() with quotation marks.
Schemes which can actually be specified in AH Formatter V7.0 are as follows:
It's possible to specify a correct absolute URI that includes the scheme name without using url(). For example, the following two are the same.
<fo:external-graphic src="url('http://localhost/image.png')"/>
<fo:external-graphic src="http://localhost/image.png"/>
Moreover, it's possible to specify a relative URI without specifying the scheme name.
<fo:external-graphic src="url('image.png')"/>
<fo:external-graphic src="image.png"/>
For the user's convenience, AH Formatter V7.0 allows a file name on the local file system to be specified instead of a URI. However, URIs and local file names are generally not compatible. For example, white space is not allowed in a URI but may be used in a local file name. Moreover, because % may be used directly in a local file name, the string foo%20bar.png points to a different resource depending on whether it is evaluated as a URI or as a local file name.
AH Formatter V7.0 solves this problem as follows:
The relative URI is combined with base-uri and transformed into the absolute URI. All local file names are transformed into a file scheme at this time. For example, in the Windows environment, when base-uri is C:\home\, it is transformed as follows:
foobar.png | file:///C:/home/foobar.png |
url('foobar.png') | file:///C:/home/foobar.png |
url('url(foobar.png)') | file:///C:/home/url(foobar.png) |
subdir\foobar.png | file:///C:/home/subdir/foobar.png |
url('subdir\foobar.png') | file:///C:/home/subdir%5Cfoobar.png |
url('subdir/foobar.png') | file:///C:/home/subdir/foobar.png |
foo bar.png | file:///C:/home/foo%20bar.png |
url('foo bar.png') | file:///C:/home/foo%20bar.png |
foo%20bar.png | file:///C:/home/foo%2520bar.png |
url('foo%20bar.png') | file:///C:/home/foo%20bar.png |
foo%%20bar.png | file:///C:/home/foo%25%2520bar.png |
url('foo%%20bar.png') | file:///C:/home/foo%25%2520bar.png |
foo#bar.png | file:///C:/home/foo#bar.png |
url('foo#bar.png') | file:///C:/home/foo#bar.png |
foo%23bar.png | file:///C:/home/foo%2523bar.png |
url('foo%23bar.png') | file:///C:/home/foo%23bar.png |
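The rows of the table above can be reproduced with the following Python sketch. It is reverse-engineered from those rows only and is not the formatter's actual algorithm. The function name to_absolute_uri is hypothetical, and the rules in the comments are assumptions chosen to match the table: a bare value is treated as a local file name, and a value in url() is treated as a URI unless it contains a "%" that does not start a valid %HH escape.

```python
import re
from urllib.parse import quote

BASE_URI = "file:///C:/home/"  # base-uri C:\home\ from the example above


def to_absolute_uri(spec, base=BASE_URI):
    """Mimic the table: resolve a relative value against a file: base URI."""
    match = re.fullmatch(r"url\('(.*)'\)", spec)
    if match:
        text = match.group(1)
        # Assumption: inside url(), "%" is kept only when every "%" starts
        # a valid %HH escape; otherwise "%" itself is escaped as %25.
        valid_escapes = re.search(r"%(?![0-9A-Fa-f]{2})", text) is None
        safe = "/#()%" if valid_escapes else "/#()"
    else:
        # Assumption: a bare value is a local file name, so "\" is a path
        # separator and "%" is a literal character.
        text = spec.replace("\\", "/")
        safe = "/#()"
    return base + quote(text, safe=safe)


local_name = to_absolute_uri("foo%20bar.png")       # "%" is escaped as %25
as_uri = to_absolute_uri("url('foo%20bar.png')")    # "%20" is kept as written
```

The same string therefore resolves to file:///C:/home/foo%2520bar.png as a local file name and to file:///C:/home/foo%20bar.png as a URI.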
A local file name cannot be written directly into url(). For example:
url('C:\My Document\foobar.png')
The string above will not operate as expected. Specify a local file name without enclosing it in url().
“#” is a fragment separator. In file:///C:/home/foo#bar.png, the resource actually accessed is file:///C:/home/foo. Specify url('foo%23bar.png') to access a resource called foo#bar.png.
A UNC (Universal Naming Convention) path in Windows, for example \\host\My Document\foobar.png, is transformed into file://host/My%20Document/foobar.png. Also, //host/My Document/foobar.png is transformed into http://host/My%20Document/foobar.png when base-uri is http:. (The same applies to https:.) In non-Windows environments, file://host/... is not supported.
The format of the data scheme defined in RFC2397 is:
"data:" [ mediatype ] [ ";base64" ] "," data
Note that a semicolon “;” is required when specifying base64, and a data delimiter is a comma “,”.
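A data URI in this format can be assembled with the Python standard library, as in the following sketch. The function name make_data_uri and the sample bytes are hypothetical.

```python
import base64


def make_data_uri(data, mediatype="image/png"):
    """Build 'data:<mediatype>;base64,<data>' from raw bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    # ";" introduces the base64 marker and "," separates the header from the data.
    return "data:" + mediatype + ";base64," + encoded


uri = make_data_uri(b"hi", "text/plain")  # "data:text/plain;base64,aGk="
```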
The jar scheme defined in JarURLConnection can be specified. It applies to JAR and ZIP files and makes it possible to specify an entry inside the archive.
jar:http://www.foo.com/bar/baz.jar!/COM/foo/Quux.png
Everything after the first separator “!/” is treated as the entry specification. Nested JAR or ZIP files are not supported.
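The split at the first “!/” can be sketched in Python as follows. The function name split_jar_uri is hypothetical, and the sketch only separates the two parts; it does not fetch or open the archive.

```python
def split_jar_uri(uri):
    """Split 'jar:<archive-uri>!/<entry>' into (archive_uri, entry)."""
    if not uri.startswith("jar:"):
        raise ValueError("not a jar: URI")
    # partition() splits at the first occurrence of the separator only.
    archive, separator, entry = uri[len("jar:"):].partition("!/")
    if not separator:
        raise ValueError("missing '!/' separator")
    return archive, entry


archive, entry = split_jar_uri("jar:http://www.foo.com/bar/baz.jar!/COM/foo/Quux.png")
# archive is "http://www.foo.com/bar/baz.jar", entry is "COM/foo/Quux.png"
```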
When accessing HTTP or HTTPS via a proxy in non-Windows environments, it's necessary to specify the proxy address with the HTTP_PROXY or HTTPS_PROXY environment variable.
When the root certificate is necessary in non-Windows environments, it's necessary to specify the file of the root certificate with the SSL_CERT environment variable.
Supports Multi-domain Certificates.
The table (<fo:table>) accepts table-layout="fixed" and table-layout="auto". The former specifies a fixed layout with fixed column widths, and the latter specifies an automatic layout in which the column widths are calculated automatically. When the value is omitted, the default is table-layout="auto". In the XSL specification, the automatic layout is implementation-dependent. This document explains the implementation in AH Formatter V7.0.
An automatic layout can take a lot of time for calculating the width of columns. Specify table-layout="fixed" if high-speed formatting is desired.
In AH Formatter V7.0, how a table is processed depends on both the table-layout value and whether a width is specified for <fo:table>. When the width of every column is specified, the table is treated as table-layout="fixed" even if table-layout="auto" is specified. Moreover, according to the XSL specification, proportional-column-width() may be specified only with table-layout="fixed". In AH Formatter V7.0, when columns with proportional-column-width() are mixed with columns that have no width specification, the latter are treated as if column-width="proportional-column-width(1)" were specified, and the table is processed as if table-layout="fixed" were specified. That is, in such a case all columns have a width specification.
table-layout | Width of fo:table | Processing Method |
---|---|---|
fixed | Yes | The width is divided equally and assigned to the columns whose width is not specified. When the content exceeds the width, it will overflow. |
fixed | No | The table width becomes 100%. The width is divided equally and assigned to the columns whose width is not specified. When the content exceeds the width, it will overflow. |
auto | Yes | The content of the columns is measured and widths are assigned to the columns whose width is not specified. When the table still exceeds its specified width with the minimum width adopted for the columns, the table width expands to that larger width. |
auto | No | The content of the columns is measured and widths are assigned to the columns whose width is not specified. When the table does not reach 100% even with the maximum width adopted for the columns, that width becomes the table width. When the table exceeds 100% even with the minimum width adopted for the columns, that width becomes the table width. Otherwise, the table width becomes 100%. |
When table-layout="auto" is specified, the content of each column whose width is not specified is examined. A better column width could be determined by examining all rows, but that takes too long for a large table. AH Formatter V7.0 normally examines at most 100 rows to determine the column widths. This number of rows can be changed with table-auto-layout-limit in the Option Setting File.
When table-layout="fixed" is specified, the contents of the columns are not examined, so processing is always fast.
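A hedged sketch of raising the row limit in the Option Setting File follows. The attribute name comes from the text above; placing it on a <formatter-settings> element inside <formatter-config> is an assumption to be checked against the Option Setting File reference, and the value 500 is arbitrary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the element that carries the attribute is an assumption -->
<formatter-config>
  <!-- Examine up to 500 rows (instead of 100) when calculating
       column widths for table-layout="auto" -->
  <formatter-settings table-auto-layout-limit="500"/>
</formatter-config>
```

A larger limit trades formatting speed for column widths that reflect more of the table.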
CAUTION: | The column width of the table cell whose contents are generated by <fo:retrieve-table-marker> is not automatically calculated. |
---|
AH Formatter V7.0 supports two types of line breaking. One breaks each line at an appropriate point as the line fills up, line by line; the other follows the line breaking algorithm of Knuth and Plass's “Breaking Paragraphs into Lines” (hereinafter referred to as BPIL). BPIL determines break positions by considering the balance of the whole block.
Candidates for line breaking positions are determined by processing according to UAX#14: Line Breaking Properties. The UAX#14 processing in AH Formatter V7.0 differs from the Unicode specification in the following respects:
Nonstarter Japanese characters defined in JIS X 4051:2004 can be controlled by axf:line-break.
Although LB30 in UAX#14 prohibits line breaks before an opening parenthesis and after a closing parenthesis, AH Formatter V7.0 permits line breaking for full-width parentheses. The target objects are the full-width opening parenthesis, the full-width closing parenthesis, and the full-width punctuation listed in axf:punctuation-trim.
The line breaking class AI in a CJK script is processed as ID. However, U+2015 (HORIZONTAL BAR) is processed as IN, since it is a non-breaking character in JIS X 4051:2004.
In UAX#14, the line breaking class of half-width kana is AL, so, as with alphabetic text, a line break occurs only where there is a space between words. AH Formatter V7.0 treats half-width kana as full-width kana for line breaking.
The ideographic space (U+3000) is treated as a non-starter character. If you don't want to treat it as a non-starter character, specify non-starter-ideographic-space="false" in the Option Setting File.
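The setting mentioned in the last item might be written as follows. This is a sketch: the attribute name and value come from the text above, while its placement on <formatter-settings> inside <formatter-config> is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the element that carries the attribute is an assumption -->
<formatter-config>
  <!-- Do not treat U+3000 (ideographic space) as a non-starter character -->
  <formatter-settings non-starter-ideographic-space="false"/>
</formatter-config>
```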
BPIL is applied to the following blocks:
The language of the block is specified by xml:lang or the language property, or by default-lang in the Option Setting File. However, BPIL is not applied in every situation. In the following cases, BPIL is not applied and lines are instead broken one at a time at the end of each line.
The following are restrictions.
This section explains the behavior of the page (or column) break when hyphenation-keep="page" (or "column") is specified. Suppose there is the following sentence with hyphenation-keep="page" specified.
    xxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxx
    xxxxxxxxxxx abc-
    def xxxxxxx ghi-
    jkl mnopqr.
When the page break occurs at the last line, ghi is pushed to the next page, with the following result:

    xxxxxxxxxxxxxxxx
    xxxxxxxxxxx abc-
    def xxxxxxx
    ---------------- page break
    ghijkl mnopqr.
When widows="2" is specified, one more line is pushed to the next page, with the following result:

    xxxxxxxxxxxxxxxx
    xxxxxxxxxxx abc-
    ---------------- page break
    def xxxxxxx ghi-
    jkl mnopqr.
However, this conflicts with hyphenation-keep="page", because the page now ends with the hyphenated abc-. AH Formatter V7.0 cannot push abc alone, so one more whole line is pushed to the next page:

    xxxxxxxxxxxxxxxx
    ---------------- page break
    xxxxxxxxxxx abc-
    def xxxxxxx ghi-
    jkl mnopqr.

Whenever the preceding line also ends with a hyphen, lines continue to be pushed one after another in this way. It is therefore advisable to use hyphenation-keep together with hyphenation-ladder-count.
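As a sketch of that combination, the block below sets the properties discussed in this section. The property names are standard XSL-FO; the language, the ladder count of 1, and the placeholder text are illustrative assumptions, not recommended values.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          xml:lang="en"
          hyphenate="true"
          hyphenation-keep="page"
          hyphenation-ladder-count="1"
          widows="2">
  <!-- hyphenation-keep="page": both parts of a hyphenated word
       stay on the same page -->
  <!-- hyphenation-ladder-count="1": no two consecutive lines end with a
       hyphen, which limits the cascade of pushed lines shown above -->
  <!-- widows="2": at least two lines are carried to the next page -->
  Placeholder paragraph text, long enough to be hyphenated and to run
  across a page break.
</fo:block>
```

Limiting the ladder count means the line before a hyphenated line cannot itself end with a hyphen, so at most one extra line needs to be pushed.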
In a slightly different case, the number of lines may increase when ghi is pushed, as follows:

    xxxxxxxxxxxxxxxx
    xxxxxxxxxxx xxxx
    xxx xxxxxxx
    ---------------- page break
    ghijkl xxxx mno-
    pqr.
When widows="3" is specified, one more line is pushed. This time, the number of lines may decrease, as follows:

    xxxxxxxxxxxxxxxx
    xxxxxxxxxxx xxxx
    ---------------- page break
    xxx xxxxxxx ghi-
    jkl xxxx mnopqr.

AH Formatter V7.0 cannot resolve the widows="3" violation caused by this side effect. This is a limitation of AH Formatter V7.0. With widows="2", this situation never arises.
AH Formatter V7.0 supports Unicode Variation Sequences. When an OpenType font supports Variation Sequences (cmap format 14), they are processed appropriately. For example, Variation Sequences can be expressed as follows:
葛󠄀城市 | ![]() |
葛󠄁飾区 | ![]() |
Even when a Variation Sequence is applied to a CID font that does not support Variation Sequences, the CID is selected according to the following IVD (UTS#37: Ideographic Variation Database).
An ideographic variation selector is ignored when the font does not support Variation Sequences, when there is no corresponding variant glyph, or when the specified Variation Sequence is out of range. This means that, even with the same settings, the displayed glyph may differ depending on which Variation Sequences the font supports.
CAUTION: | Variation Sequences other than Ideographic are not supported. |
---|
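In FO source, a Variation Sequence can be written with a numeric character reference so that the otherwise invisible selector is explicit. The sketch below assumes that the selectors in the example above are U+E0100 and U+E0101, and uses the generic serif family as a stand-in for whatever font supporting Variation Sequences is actually installed.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          xml:lang="ja" font-family="serif">
  <!-- Base character followed by variation selector U+E0100 (assumed) -->
  <fo:inline>葛&#xE0100;城市</fo:inline>
  <!-- Same base character followed by variation selector U+E0101 (assumed) -->
  <fo:inline>葛&#xE0101;飾区</fo:inline>
</fo:block>
```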
Fonts in FO or CSS are specified by the font-family property. Whether several candidate fonts are listed, as in font-family="'Courier New', serif", or font-family is not specified at all, AH Formatter V7.0 determines which font to apply to a character string as follows:
The character string in the region is divided into runs of characters that share the same script, using the script information that Unicode defines for each character, the language specified in FO or CSS, and so on, and the script of each run is determined. This determination is complicated, because Unicode contains characters for which it is ambiguous whether they are full width, and because the language cannot be determined from a string that consists only of kanji.
When font-selection-mode="6" is specified in the Option Setting File, each character of the run is examined in turn to see whether a font-family specified by FO or CSS has its glyph, and the first font found to have the glyph is adopted. Otherwise, each font-family specified by FO or CSS is examined in order to see whether it has the glyphs and supports the Unicode range or script, and the first font found to do so is adopted. When no font-family is specified, the generic font family set as the default font family is assumed to be specified.
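To illustrate the candidate list, here is a small sketch of a block that mixes two scripts. The font names are assumptions about what is installed; the comments describe the outcome one would expect from the procedure above rather than a guaranteed result.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          font-family="'Courier New', 'MS Gothic', monospace">
  <!-- Latin run: expected to resolve to the first candidate, Courier New -->
  printf
  <!-- Japanese run: Courier New is not expected to support kana, so the
       next candidate, MS Gothic, would be examined; the generic monospace
       family is the last resort -->
  こんにちは
</fo:block>
```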
In XSL or CSS, five generic font families can be used: serif, sans-serif, cursive, fantasy, and monospace.
AH Formatter V7.0 holds, for every script, the information about which actual font corresponds to each of these. A generic font that does not belong to any script can also be defined. These can be specified on the Font Setting page of the Option Setting dialog in the Graphical User Interface, or with <script-font> in the Option Setting File.
When a script-specific generic font is specified for the script of the target character string, it is examined to see whether it supports the character string.
When no script-specific generic font is specified for that script, the generic font that does not belong to any script is examined.
When auto-fallback-font="true" is specified in the Option Setting File and none of the fonts specified in font-family supports the target character string, the following fallback processing is performed.
If no font that supports the target character string is found even then, it is an error.
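The two font-related options named in this section might be set as in the sketch below. The attribute names and values come from the text; carrying them on a <font-settings> element inside <formatter-config> is an assumption to be verified against the Option Setting File reference.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the element that carries the attributes is an assumption -->
<formatter-config>
  <!-- font-selection-mode="6": pick, character by character, the first
       listed font that has the glyph -->
  <!-- auto-fallback-font="true": fall back automatically when no listed
       font supports the string -->
  <font-settings font-selection-mode="6" auto-fallback-font="true"/>
</formatter-config>
```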
The settings in the Option Setting dialog are reflected in the Option Setting File. For example, an entry looks like this:
<script-font script="Hans" serif="SimSun" sans-serif="SimHei" monospace="SimSun"/>
Since cursive is not specified here, the cursive setting of the generic font is used for Hans. When <script-font script="Hans"/> itself is not specified, as is the case immediately after installation, the default group is assumed. The Windows version sets up the following default group. Only the scripts listed here are set up, and an entry is not set up when the font does not actually exist.
Script | serif | sans-serif | cursive | fantasy | monospace |
---|---|---|---|---|---|
default | Times New Roman | Arial | Segoe Script or Comic Sans MS or Monotype Corsiva | Impact | Courier New |
Jpan | MS Mincho | MS Gothic | MS Mincho or MS Gothic | MS Mincho or MS Gothic | MS Gothic or MS Mincho |
Hans | SimSun or MS Song | SimHei or MS Hei or MS Song | SimSun or MS Song | SimSun or MS Song | SimHei or MS Hei or MS Song |
Hant | MingLiU | ← | ← | ← | ← |
Hang | Batang or BatangChe | Gulim or BatangChe | Batang or BatangChe | Batang or BatangChe | BatangChe |
Ethi no-LT | Nyala | ← | ← | ← | ← |
Arab | Arabic Typesetting | ← | ← | ← | ← |
Syrc no-LT | Estrangelo Edessa | ← | ← | ← | ← |
Hebr | FrankRuehl | ← | ← | ← | ← |
Deva | Mangal | ← | ← | ← | ← |
Beng no-LT | Vrinda | ← | ← | ← | ← |
Guru no-LT | Raavi | ← | ← | ← | ← |
Gujr no-LT | Shruti | ← | ← | ← | ← |
Taml no-LT | Latha | ← | ← | ← | ← |
Telu no-LT | Gautami | ← | ← | ← | ← |
Knda no-LT | Tunga | ← | ← | ← | ← |
Mlym no-LT | Kartika | ← | ← | ← | ← |
Sinh no-LT | Iskoola Pota | ← | ← | ← | ← |
Thai | Angsana New | ← | ← | ← | ← |
Khmr no-LT | DaunPenh | ← | ← | ← | ← |
Laoo no-LT | DokChampa | ← | ← | ← | ← |
Mymr no-LT | Myanmar Text | ← | ← | ← | ← |
The following default group is set up with the Macintosh version.
Script | serif | sans-serif | cursive | fantasy | monospace |
---|---|---|---|---|---|
default | Times or Times New Roman | Helvetica or Arial | Monaco or Chalkboard | Monaco or Chalkboard | Courier |
Jpan | HiraMinPro W3 | HiraKakuPro W3 | HiraMaruPro W3 or HiraKakuPro W3 | HiraMaruPro W3 or HiraKakuPro W3 | HiraKakuPro W3 |
Hans | STXihei or STSong | STSong | STXihei or STSong | STXihei or STSong | STSong |
Hant | LiHeiPro or LiSongPro | LiSongPro | LiHeiPro or LiSongPro | LiHeiPro or LiSongPro | LiSongPro |
Hang | AppleMyungjo | AppleGothic | AppleMyungjo | AppleMyungjo | AppleGothic |
Arab | Geeza Pro | ← | ← | ← | ← |
Hebr | NewPeninimMT | ← | ← | ← | ← |
Deva | DevanagariMT | ← | ← | ← | ← |
Thai | Thonburi | ← | ← | ← | ← |
The following default group is set up with the Linux version.
Script | serif | sans-serif | cursive | fantasy | monospace |
---|---|---|---|---|---|
default | Times | Helvetica | Times | Times | Courier |
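Where the default group does not cover a script, as on Linux, an entry can be added with <script-font>. The following is a sketch only: the font names are hypothetical stand-ins for fonts that happen to be installed, and nesting <script-font> under <font-settings> in <formatter-config> is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: wrapper elements and font names are assumptions -->
<formatter-config>
  <font-settings>
    <!-- cursive and fantasy are omitted here, so the settings of the
         generic font would be used for them -->
    <script-font script="Jpan" serif="IPAMincho" sans-serif="IPAGothic"
                 monospace="IPAGothic"/>
  </font-settings>
</formatter-config>
```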
There are basically three types of the orientation of text in Japanese or Chinese documents as follows:
In horizontal writing | In vertical writing (SVO) | In vertical writing (MVO) |
---|---|---|
(figure) | (figure) | (figure) |
The orientation of text in vertical writing mode is expressed with U or R. U is a character displayed upright on the paper; R is a character rotated 90 degrees clockwise on the paper. The text orientation in vertical writing mode is then as follows:
UAX#50: Unicode Vertical Text Layout discusses which characters should be upright and which should be rotated 90 degrees. It currently describes only MVO (Mixed Vertical Orientation), although earlier revisions (tr50-6.html) also described SVO (Stacked Vertical Orientation). AH Formatter V7.0 implements axf:text-orientation="mixed" in conformance with MVO and axf:text-orientation="upright" in conformance with SVO, using the UAX#50 data with some modifications (☞ tr50-x.Orientation.txt). This data can be modified freely in the Option Setting File. See also UAX50.
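The two values can be compared side by side with a sketch like the following. The sample text is arbitrary, and applying the property to <fo:block> is an assumption about where it is most conveniently set.

```xml
<fo:block-container xmlns:fo="http://www.w3.org/1999/XSL/Format"
                    xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions"
                    writing-mode="tb-rl">
  <!-- MVO: Latin letters and digits are expected to be rotated (R) -->
  <fo:block axf:text-orientation="mixed">縦書きの中の ABC 123</fo:block>
  <!-- SVO: Latin letters and digits are expected to stay upright (U) -->
  <fo:block axf:text-orientation="upright">縦書きの中の ABC 123</fo:block>
</fo:block-container>
```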
Usually, a font that supports vertical writing mode has dedicated vertical glyphs for some characters, because some characters cannot be used in vertical writing simply by rotating the glyph for horizontal writing mode. These include small kana, punctuation marks, and the long vowel mark. In vertical writing mode, if a character has a glyph for vertical writing, that glyph is used.
The orientation of text (U or R) is decided and expressed relative to the orientation of the glyph for horizontal writing mode. However, some glyphs for vertical writing mode differ from those for horizontal writing mode. The example below shows the glyphs of U+3083, U+FF08, and U+2190. U+FF08 and U+2190 have different orientations in vertical and horizontal writing mode.
Glyph for horizontal writing | Glyph for vertical writing |
---|---|
(figure: U+3083, U+FF08, U+2190) | (figure: U+3083, U+FF08, U+2190) |
Although “brackets are R” as mentioned above, they actually have to be displayed as U using the glyph for vertical writing mode. That is, there is a tacit assumption that the glyph for vertical writing mode is designed with a different orientation from the glyph for horizontal writing mode. Whether a font has a glyph for vertical writing mode, and whether its orientation is the same as that of the glyph for horizontal writing mode, depends on the font. The difference between fonts is particularly marked in the orientation of symbols such as arrows. Since there is no way to know for which orientation a glyph was designed, this problem cannot be solved in general. AH Formatter V7.0 therefore controls the orientation of characters in line with the major implementations.
When outputting PDF, AH Formatter V7.0 discards pages that have already been formatted. For a simple FO without <fo:page-number-citation>, for example, it therefore consumes only the memory required for one page, no matter how large the document is (except when formatting from the GUI). However, if a page contains an <fo:page-number-citation> that refers to a following page, the page number of the referenced page cannot be known until that page is actually formatted. For that reason, when a page containing an unresolved <fo:page-number-citation> appears, AH Formatter V7.0 suspends its output and stores the result in memory while formatting continues. When a document has a table of contents at the start, the table of contents is not output until all the page numbers appearing in it are resolved. Because of the high memory consumption, there is a limit to the number of formatted pages, so it is not possible to format extremely large documents.
To solve this problem, AH Formatter V7.0 makes it possible to process the document in two formatting passes. In the first pass, the formatting is processed only for resolving <fo:page-number-citation>, and all the required page number information is collected. In the second pass, formatting starts again from the first page. Since all <fo:page-number-citation> are already resolved, AH Formatter V7.0 can discard formatted pages when outputting the document. Although the formatting processing time is increased, the formatting consumes less memory and it is possible to format extremely large documents. But this has no effect on the memory consumption needed for the output.
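The forward reference that triggers this buffering can be sketched as follows. The id value and the text are illustrative; the point is only that the citation appears on an earlier page than the block it refers to.

```xml
<fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format"
         flow-name="xsl-region-body">
  <!-- Table of contents entry: the page number of "chapter-1" is not
       known yet when this page is formatted -->
  <fo:block text-align-last="justify">
    Chapter 1
    <fo:leader leader-pattern="dots"/>
    <fo:page-number-citation ref-id="chapter-1"/>
  </fo:block>
  <!-- The referenced block starts on a later page -->
  <fo:block id="chapter-1" break-before="page">Chapter 1</fo:block>
</fo:flow>
```

In a single pass, the page holding the table of contents has to stay in memory until the referenced block has been formatted; with two passes, the page number is already known when output begins.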
The following shows how to perform 2-pass formatting:
CAUTION: | Two-pass formatting is not available with CSS formatting. |
---|
CAUTION: | Two-pass formatting cannot be performed from the Graphical User Interface. |
---|
CAUTION: | Two-pass formatting is not available with AH Formatter V7.0 Lite. |
---|
AH Formatter V7.0 does not create a temporary working file if it can be avoided. A temporary file is created in the following cases:
With the COM Interface, PDF of a formatted result is saved to a temporary file when outputting PDF to a Web browser directly.
An XML document passed by using DOM with the COM Interface is processed using a temporary file. However, when FO is specified as the formatting type, the temporary file is not generated because DOM is processed directly.
When outputting a file while printing, a temporary file is generated.
When a file interface is required in the XSLT transformation using external XSLT, a temporary file is generated.
When the transformation from XML+XSL is required in the render method of a Java Interface, the result FO is generated as a temporary file.
In the Windows version, when embedding an image that cannot be embedded in PDF as it is, a temporary file is generated in the conversion process.
A temporary file is generated when converting EPS to PDF using Distiller or Ghostscript.
When processing EPS using Distiller, if joboptions is not specified, a default joboption will be generated as a temporary file.
A temporary file is generated when outputting to an XPS file.
In the GUI of the Windows version, Windows itself generates temporary files as needed.