Extracts the text from the input PDF file and writes it to a text file.
The following examples (Windows, then Linux) extract the text of pages 0 and 2-4 of test.pdf, sorted by coordinate, and write it to out.txt.
AHPDFToolCmd80.exe -extractText C:\sav\out.txt -pageNo 0,2-4 -sort -d C:\test\test.pdf
AHPDFToolCmd80 -extractText /home/antenna/sav/out.txt -pageNo 0,2-4 -sort -d /home/antenna/test/test.pdf
You can perform batch processing by passing an input folder to the -d parameter.
If a folder is specified, text is extracted from each PDF file in that folder. In this case, specify the output folder with the <outTextFilePath> parameter.
Each output file is written to the specified folder under the input file name, with the extension changed to ".txt".
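The batch naming rule can be sketched in Python. This is only an illustration of the rule described above, not part of the tool: the function name `batch_output_path` is hypothetical, and the sketch assumes POSIX-style paths and that only the final extension is replaced.

```python
from pathlib import PurePosixPath


def batch_output_path(input_pdf: str, output_folder: str) -> str:
    """Return the text file path expected for one input PDF in batch mode.

    Assumption: the output keeps the input file name and only the last
    extension is replaced with ".txt".
    """
    # Take the file name of the input PDF and swap its extension.
    txt_name = PurePosixPath(input_pdf).with_suffix(".txt").name
    # Place the renamed file in the output folder.
    return str(PurePosixPath(output_folder) / txt_name)


print(batch_output_path("/home/antenna/test/test.pdf", "/home/antenna/sav"))
```

Under this rule, a batch run over /home/antenna/test with /home/antenna/sav as the output folder would be expected to write /home/antenna/sav/test.txt for test.pdf.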
| Parameter | Content |
|---|---|
| <outTextFilePath> | [required] Path of the output text file, or of the output folder when a folder is passed to -d. If multiple pages are extracted, "pageX" is output on the first line. |
| -pageNo <Val> | Sets the page numbers to extract text from. Can be omitted. Page numbers are zero-based, so the first page is "0". To specify multiple pages, separate them with commas. (Example) "0,2-4" |
| -sort | Sorts text by coordinate. |
| -rect <left> <bottom> <right> <top> | Can be omitted. If -sort is specified, text is sorted within the specified range. |
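To make the -pageNo syntax concrete, the following Python sketch expands a value such as "0,2-4" into the individual zero-based page numbers. It is not taken from the tool: the function name `expand_page_spec` is hypothetical, and the sketch assumes that a range like "2-4" includes both end pages.

```python
from typing import List


def expand_page_spec(spec: str) -> List[int]:
    """Expand a -pageNo style value into a list of zero-based page numbers.

    Assumption: "a-b" denotes an inclusive range from page a to page b.
    """
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range entry such as "2-4".
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
            if first > last:
                raise ValueError(f"invalid page range: {part!r}")
            pages.extend(range(first, last + 1))
        else:
            # A single page entry such as "0".
            pages.append(int(part))
    return pages


print(expand_page_spec("0,2-4"))
```

With this reading, "0,2-4" selects the first page and the third through fifth pages of the document.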