PDF Output

PDF output is a standard function of Office Server Document Converter V10.0. The PDF versions that can be output are as follows:

Office Server Document Converter V10.0 outputs PDFs with the following features:

PDF/A

PDF/A is defined by ISO 19005 and is a specification intended for the long-term preservation of electronic documents. Office Server Document Converter V10.0 can output the following versions of PDF/A.

The following shows the main features for PDF/A:

Feature                                         PDF/A-1a  PDF/A-1b  PDF/A-2a  PDF/A-2b  PDF/A-2u  PDF/A-3a  PDF/A-3b  PDF/A-3u
All fonts must be embedded                      yes       yes       yes       yes       yes       yes       yes       yes
Files must be tagged                            yes       no        yes       no        no        yes       no        no
Files must include XMP compliant metadata       yes       yes       yes       yes       yes       yes       yes       yes
Files may include encryption                    no        no        no        no        no        no        no        no
Files may include LZW compression               no        no        no        no        no        no        no        no
Files may include transparent images            no        no        yes       yes       yes       yes       yes       yes
Files may refer to external content             no        no        no        no        no        no        no        no
Files may include JavaScript                    no        no        no        no        no        no        no        no
All text must be convertible to Unicode         yes       no        yes       no        yes       yes       no        yes
PDF/A files can be attached                     no        no        yes       yes       yes       yes       yes       yes
Any file other than PDF/A can also be attached  no        no        no        no        no        yes       yes       yes

PDF/A requires that all fonts be embedded; if a font cannot be embedded because of security restrictions or other issues, the PDF/A file is not generated. PDF/A also requires an embedded ICC profile, so when the output intent is specified, only specifying the ICC profile by URL is effective.

Most settings (including the embedding of fonts) are fixed to the values that PDF/A requires, and the corresponding user settings are ignored.

XMP metadata is generated automatically from the PDF document information. Embedding additional information is not available in Office Server Document Converter V10.0.

Because fonts cannot be embedded in PDF forms, PDF/A cannot be generated from a document that contains forms.

Linearized PDF

A linearized PDF file makes viewing of the generated PDF on the web faster. Features of linearized PDF include the following:

A PDF described as "optimized for fast web view" is a linearized PDF.

CAUTION: Some viewers have been observed to report a linearized PDF as not optimized when its file size is 4 KB or less.
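One rough way to check whether a generated file is linearized is to look for the linearization parameter dictionary, which the PDF specification places within the first 1024 bytes of the file. The following Python sketch is an illustration only: the function names are hypothetical and not part of Office Server Document Converter, and a viewer may apply stricter checks than this one.

```python
def looks_linearized(data: bytes) -> bool:
    """Return True if the start of a PDF contains a /Linearized key.

    Heuristic only: it inspects the first 1024 bytes and does not
    validate the hint tables or the object layout.
    """
    head = data[:1024]
    return head.startswith(b"%PDF-") and b"/Linearized" in head


def file_looks_linearized(path: str) -> bool:
    # Only the head of the file is needed for this check.
    with open(path, "rb") as f:
        return looks_linearized(f.read(1024))
```

Calling `file_looks_linearized` on a converter output file gives a quick first indication before opening the file in a viewer.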

Font Output

Adobe Type 1 fonts (including the Adobe Standard 14 Fonts), TrueType fonts (including OpenType fonts with TrueType outlines), OpenType fonts (PostScript outlines), and Macintosh TrueType data fork suitcase fonts are supported for PDF output. Other font formats are not supported. For more details, see “Fonts”.

To use the fonts specified in a document correctly, Office Server Document Converter V10.0 requires that they be installed on your system. For Windows versions, see Windows Help or follow the installation instructions supplied with the fonts. In Windows versions, a font placed outside the font folder can also be output to PDF if you specify the necessary settings in the Font Configuration File. However, such a font cannot be displayed in the GUI.

These 14 Adobe Type 1 fonts are called Standard 14 Fonts in PDF.

It is not necessary to prepare an AFM (Adobe Font Metrics) file, even when using an Adobe Type 1 font (except for these Standard 14 Fonts). The glyph names of Adobe Type 1 fonts are mapped to the character codes (Unicode) of the formatting data according to the AGL (Adobe Glyph List) specification. A glyph whose name is not defined in the AGL is not output. For more details about AGL, see also “Unicode and glyph mapping using the .AFM file”.
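The glyph-name mapping described above can be sketched in Python as follows. This follows the published AGL rules (drop any suffix after the first period, split ligature names on underscores, then resolve each component through the glyph list or the `uniXXXX` / `uXXXXXX` forms). It is an illustration under assumptions, not the converter's actual code: `AGL_EXCERPT` holds only four sample entries from the real list, and the function names are hypothetical.

```python
import re

# Tiny excerpt of the Adobe Glyph List; the real list has thousands of entries.
AGL_EXCERPT = {"A": "\u0041", "space": "\u0020", "Euro": "\u20AC", "fi": "\uFB01"}

_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")   # uni + groups of 4 uppercase hex digits
_U = re.compile(r"u([0-9A-F]{4,6})")          # u + 4 to 6 uppercase hex digits


def _valid_scalar(cp: int) -> bool:
    # Surrogates and values above U+10FFFF are not valid code points here.
    return cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def component_to_unicode(comp: str) -> str:
    """Map one glyph-name component to a string, or "" if it is unmapped."""
    if comp in AGL_EXCERPT:
        return AGL_EXCERPT[comp]
    m = _UNI.fullmatch(comp)
    if m:
        digits = m.group(1)
        cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if all(_valid_scalar(cp) for cp in cps):
            return "".join(chr(cp) for cp in cps)
        return ""
    m = _U.fullmatch(comp)
    if m:
        cp = int(m.group(1), 16)
        return chr(cp) if _valid_scalar(cp) else ""
    return ""


def glyph_name_to_unicode(name: str) -> str:
    """Map a full glyph name (for example "A.swash" or "f_i") to Unicode."""
    base = name.split(".", 1)[0]          # drop the suffix after the first period
    return "".join(component_to_unicode(c) for c in base.split("_"))
```

A name that resolves to the empty string corresponds to the case above in which the glyph is not output.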

CAUTION: When a PDF that includes a transparent image is displayed in Adobe Acrobat or Reader, characters might appear somewhat bolder. This is a known problem in Adobe Acrobat and Reader.

Character Sets, Encoding

The following character sets are supported:

  • Adobe Standard Latin character set
  • Symbol character set
  • ZapfDingbats character set
  • Japanese character set (Adobe-Japan1-Supplement2)
  • Simplified Chinese character set (Adobe-GB1-Supplement2)
  • Traditional Chinese character set (Adobe-CNS1-Supplement0)
  • Korean character set (Adobe-Korea1-Supplement1)

All characters are processed as Unicode within Office Server Document Converter V10.0. For Chinese, Japanese, and Korean (CJK), Office Server Document Converter V10.0 maps Unicode to the glyphs in each CJK character set by using the following CMaps:

  • Japanese : UniJIS-UCS2-H(V) UniJIS-UCS2-HW-H(V)
  • Simplified Chinese : UniGB-UCS2-H(V)
  • Traditional Chinese : UniCNS-UCS2-H(V)
  • Korean : UniKS-UCS2-H(V)

Characters that do not belong to the above character sets are embedded in the PDF by taking their glyphs from the font files. This is done only for TrueType and OpenType fonts.
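In the CMap list above, "H(V)" is shorthand for two CMaps, one for horizontal writing (suffix -H) and one for vertical writing (suffix -V). A minimal Python sketch of that naming scheme follows; the language keys and the function name are hypothetical labels chosen for illustration, not identifiers used by the converter.

```python
# Base CMap names taken from the list above; the dictionary keys are
# illustrative labels, not converter settings.
CMAP_BASE = {
    "japanese": ["UniJIS-UCS2", "UniJIS-UCS2-HW"],
    "simplified-chinese": ["UniGB-UCS2"],
    "traditional-chinese": ["UniCNS-UCS2"],
    "korean": ["UniKS-UCS2"],
}


def cmap_names(language: str, vertical: bool = False) -> list:
    """Return the CMap names for a language and writing direction."""
    suffix = "V" if vertical else "H"
    return [base + "-" + suffix for base in CMAP_BASE[language]]
```

For example, `cmap_names("korean", vertical=True)` yields the single name `UniKS-UCS2-V`.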

Font Embedding

Embedding fonts makes it possible to display PDF files even in environments where the fonts are not installed.

By default, for TrueType fonts only the outlines of glyphs that are not defined by the CMap are embedded. If embedding a TrueType font is prohibited by the font vendor, an error occurs and processing stops. This error can be avoided by replacing the affected characters with white space and outputting the PDF. You can also specify an option to embed all glyphs of a font, regardless of whether the characters are defined by the CMap.

By default, for Adobe Type 1 fonts only the outlines of fonts that have font-specific encoding are embedded. You can also specify an option to embed all glyphs of a font, regardless of whether the font has standard or font-specific encoding.
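This manual does not state how the converter detects that a vendor prohibits embedding. In TrueType and OpenType fonts, vendors conventionally record embedding permissions in the fsType field of the OS/2 table, so the following Python sketch reads that field. It is an assumption-laden illustration with hypothetical function names; it handles a single sfnt font only, not font collections.

```python
import struct

PERMISSION_MASK = 0x000F     # fsType bits 0-3 hold the usage permissions
RESTRICTED_LICENSE = 0x0002  # "Restricted License embedding": must not be embedded


def read_fs_type(font: bytes):
    """Return the OS/2 fsType value of an sfnt font, or None without an OS/2 table."""
    num_tables = struct.unpack_from(">H", font, 4)[0]
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        tag, _checksum, offset, _length = struct.unpack_from(">4sIII", font, 12 + 16 * i)
        if tag == b"OS/2":
            # fsType is the fifth 16-bit field of the OS/2 table (byte offset 8).
            return struct.unpack_from(">H", font, offset + 8)[0]
    return None


def embedding_prohibited(font: bytes) -> bool:
    """True when the only permission recorded is Restricted License embedding."""
    fs_type = read_fs_type(font)
    return fs_type is not None and (fs_type & PERMISSION_MASK) == RESTRICTED_LICENSE
```

A check like this could be run over a font file before conversion to anticipate the embedding error described above.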

Image Output

For more information about supported graphic images, see “Graphics”.

Vector Images

The following vector images are output to PDF as vector primitives; that is, they are replaced with PDF operators:

Raster Images

Raster images in an MS Office document are stored in various formats. When an image is in a format that can be embedded directly into PDF, Office Server Document Converter V10.0 embeds the image as extracted from the original document. Otherwise, the image is embedded after being converted to JPEG or another format. Images in formats that cannot be converted are ignored.

The raster images which can be embedded directly in a PDF are as follows:

The following restrictions apply:

  • Progressive JPEG and interlaced GIF images are converted into regular JPEG or GIF images.
  • 16-bit color in PNG or TIFF is reduced to 8-bit color.
  • When a PNG or TIFF image has an alpha channel, the alpha channel is separated from the image.
  • Some TIFF formats are not supported.
  • JPEG 2000 is embedded directly into the PDF only when the output is PDF 1.5 or later. For earlier versions, it is embedded after being converted to JPEG or another format.
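To anticipate the first restriction, you can check in advance whether a JPEG file is progressive. The Python sketch below scans the JPEG header segments for the SOF2 marker (0xFFC2), which identifies a Huffman-coded progressive frame. It is a simplified illustration with a hypothetical function name; it ignores the rarer arithmetic-coded progressive variants and does not reflect how the converter itself inspects images.

```python
import struct

SOF2_PROGRESSIVE = 0xC2   # start-of-frame marker for progressive DCT (Huffman)
SOS = 0xDA                # start of scan: entropy-coded data follows
# Markers that carry no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def is_progressive_jpeg(data: bytes) -> bool:
    """Scan the JPEG header segments and report whether a SOF2 frame is present."""
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:              # fill byte before a marker
            pos += 1
        elif marker == SOF2_PROGRESSIVE:
            return True
        elif marker == SOS:
            return False                # the frame header precedes the first scan
        elif marker in STANDALONE:
            pos += 2
        else:
            if pos + 4 > len(data):
                break                   # truncated segment header
            (length,) = struct.unpack_from(">H", data, pos + 2)
            pos += 2 + length           # the length counts its own two bytes
    return False
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `is_progressive_jpeg` is enough; only the header segments are examined.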

RGB Output

When a file that includes a transparent object is converted to PDF, colors may look dull when the result is displayed in Adobe Reader or a similar viewer. This can be avoided by changing the transparency color space to RGB. Specify the following in the Option Setting File:

<?xml version="1.0"?>
<formatter-config>
  <pdf-settings transparency-color-space="DeviceRGB"/>
</formatter-config>

This setting is effective with V4.0 MR3 or later. From V5 onward, the color model is output as RGB by default.
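If the Option Setting File is produced by a script, the fragment above can be generated with Python's standard library. This is a minimal sketch: the function name `build_option_settings` is hypothetical, and only the element and attribute names shown above are used. How the resulting file is passed to the converter is outside the scope of this example.

```python
import xml.etree.ElementTree as ET


def build_option_settings(color_space: str = "DeviceRGB") -> str:
    """Build an Option Setting File that sets the transparency color space."""
    root = ET.Element("formatter-config")
    ET.SubElement(root, "pdf-settings", {"transparency-color-space": color_space})
    # ET.tostring with encoding="unicode" returns text without an XML declaration,
    # so the declaration is prepended by hand.
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_option_settings())
```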