PDF Splitter
Description
PDF Splitter is an application that divides a PDF document into smaller PDF documents according to specified rules. It is mainly useful for documents generated by a reporting tool, where it is not possible to know in advance on which page a new document starts. So that documents do not have to be split one by one, the application takes as input a directory containing the documents to split. The resulting parts are stored in the destination directory, and their names are generated from the specified format.
The user interface is clear and lets the user save and load settings and test the specified pattern.
Searching for text
Text is found by comparing the actual text with a search pattern. The pattern is a regular expression (regex). The patterns for separators, and for the texts from which file names are generated, are therefore regular expressions that define what the text should look like. Many articles on regular expressions and how to write them are available on the Internet.
To check whether a pattern is written correctly, enter the actual text in the “Test text” field and press “Test”. It is recommended to enter a single line of the original text in “Test text”, because the search works line by line and a regular expression can, among other things, also define the beginning and/or end of the text.
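The line-by-line behaviour described above can be tried outside the application with a minimal Python sketch. It assumes Python's re module, whose regex dialect may differ in details from the one PDF Splitter uses; the function name find_matching_lines and the sample text are hypothetical.

```python
import re


def find_matching_lines(pattern, text):
    """Return (line number, matched text) for every line that contains a match."""
    regex = re.compile(pattern)
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = regex.search(line)  # search anywhere within this single line
        if match:
            hits.append((number, match.group(0)))
    return hits


# Hypothetical extracted text of one page, three lines long.
page = "Report 2008\nTotal for 2008: 15\n2008 summary"

# Each line is searched on its own, so ^ and $ refer to the
# beginning and end of the individual line.
print(find_matching_lines(r"^\d{4}", page))
print(find_matching_lines(r"\d{4}$", page))
```

Because every line is searched separately, a pattern anchored with ^ only matches text at the start of a line, which is why testing with a single line of the original text gives the most reliable result.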
Determining the position of text on the page
Sometimes it is difficult or even impossible to specify a suitable pattern, because text matching the pattern appears many times in the document and the document should not be split at every occurrence.
To make the search as accurate as possible in such cases, define, in addition to the pattern, the place on the page where the separator, or the text that defines the name of the partial document, is expected. You can choose from the following positions on the page:
- Header / first line
- Specific line
- Footer / last line
- Anywhere on the page
Attention: Unlike PDF viewers, the application cannot reliably extract the header, footer and page content from a PDF document in the correct order, so you cannot always rely on where they will appear in the extracted text. Therefore, if the separator or the data needed to generate the file name always appears in the same place, it is recommended to choose a specific line and enter its number. The correct line number can be determined using the document analysis available from the toolbar.
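How a position setting narrows the search can be sketched as follows. This is only an illustration of the idea, not the application's actual code: the function candidate_lines, the position keywords and the 1-based line numbering are assumptions made for the example.

```python
def candidate_lines(page_lines, position, line_number=None):
    """Return the lines of one page's extracted text that should be searched.

    position is one of the hypothetical keywords "first", "line",
    "last" or "anywhere"; line_number is 1-based and only used for "line".
    """
    if not page_lines:
        return []
    if position == "first":      # header / first line
        return page_lines[:1]
    if position == "last":       # footer / last line
        return page_lines[-1:]
    if position == "line":       # specific line
        if line_number is None or not 1 <= line_number <= len(page_lines):
            return []
        return [page_lines[line_number - 1]]
    if position == "anywhere":   # anywhere on the page
        return list(page_lines)
    raise ValueError("unknown position: " + repr(position))


# Hypothetical extracted text of one page.
lines = ["HR Report", "2008.Január", "Body text", "Page 1"]
print(candidate_lines(lines, "line", 2))
```

Restricting the search to one line means that other occurrences of the pattern on the page are ignored, which is the reason a specific line is the recommended choice when the text always appears in the same place.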
- This is the text. This is 1234.other text
- This is the text. This is 1234. other text
- This is the text. This is 123.other text (fewer than 4 digits)
- This is the text. This is 12345.other text (more than 4 digits)
- This is the text. This is 123 other text (dot missing)
- This is the text. This is 1234./other text (‘/’ is not alphanumeric)
Parts of the pattern can be enclosed in parentheses to define groups. In the example above, two groups are defined:
(\d{4})\.\s*(\w+)
Groups are numbered from left to right, starting at 1. With nested groups, the outer group comes before the groups nested inside it. In the file name format, the value matched by a group is referenced by the group number, a positive integer enclosed in braces.
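The group numbering can be checked with a short Python sketch. It assumes Python's re module, which numbers groups in the same left-to-right order by opening parenthesis; the nested pattern is a hypothetical example and does not come from the application.

```python
import re

# The two-group pattern: four digits, a dot, optional whitespace, a word.
match = re.search(r"(\d{4})\.\s*(\w+)", "This is the text. This is 1234. other text")
print(match.group(1))  # 1234
print(match.group(2))  # other

# Hypothetical nested pattern: the outer group is number 1,
# the two groups nested inside it are numbers 2 and 3.
nested = re.search(r"((\d{2})(\d{2}))\.", "1234.")
print(nested.groups())  # ('1234', '12', '34')
```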
Example
This example uses the pattern defined earlier:
(\d{4})\.\s*(\w+)
and the following file name format for the partial documents:
HR Report {1}-{2}.pdf
Splitting the document generates partial documents with the following file names (matched text → file name):
- 2008.Január → HR Report 2008-Január.pdf
- 2002.Júl → HR Report 2002-Júl.pdf
- 1998. Marec03 → HR Report 1998-Marec03.pdf
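The way a file name is built from the groups can be sketched in Python as follows. This is an illustration under assumptions, not the application's implementation: the function build_file_name is hypothetical, and Python's \w matches accented letters such as “á”, which may not hold for every regex dialect.

```python
import re

PATTERN = r"(\d{4})\.\s*(\w+)"
NAME_FORMAT = "HR Report {1}-{2}.pdf"


def build_file_name(pattern, name_format, text):
    """Fill the {n} placeholders of name_format with the groups matched in text.

    Returns None when the pattern does not match. A placeholder that refers
    to a group the pattern does not define raises IndexError.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    # Replace every {n} with the text captured by group n.
    return re.sub(r"\{(\d+)\}", lambda ref: match.group(int(ref.group(1))), name_format)


print(build_file_name(PATTERN, NAME_FORMAT, "2008.Január"))    # HR Report 2008-Január.pdf
print(build_file_name(PATTERN, NAME_FORMAT, "1998. Marec03"))  # HR Report 1998-Marec03.pdf
```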
Running the application
PDF Splitter has one custom parameter:
- –conf: path to the XML configuration file
If the file does not exist (or is not specified), the application tries to find the config.xml file in the directory from which it was launched. If that file is not found either, the application opens a blank window.
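The fallback order described above can be sketched in Python. The function resolve_config and its arguments are hypothetical and only illustrate the lookup order; how the application actually parses its parameter is not shown here.

```python
from pathlib import Path


def resolve_config(conf_path, launch_dir):
    """Return the configuration file to load, or None for a blank window."""
    # 1. Use the file given on the command line, if it exists.
    if conf_path is not None:
        given = Path(conf_path)
        if given.is_file():
            return given
    # 2. Otherwise look for config.xml in the launch directory.
    fallback = Path(launch_dir) / "config.xml"
    if fallback.is_file():
        return fallback
    # 3. Nothing found: start with a blank window.
    return None


# No explicit path given, so only the current directory is checked.
print(resolve_config(None, Path.cwd()))
```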