
PDF Splitter

Description

PDF Splitter is an application that divides a PDF document into smaller PDF documents based on defined rules. It is mainly useful for documents generated by a reporting tool, where it is not possible to know in advance on which page a new document starts. So that documents do not have to be processed one by one, the application takes as input a directory containing the documents to split. The resulting parts are stored in the destination directory, and their names are generated according to the specified format.

The user interface is straightforward and lets the user save and load settings and test the specified pattern.

Searching text

Text is found by comparing the actual text with a search pattern. The pattern is a regular expression (regex). Both the separator patterns and the patterns for the text used to generate file names are therefore regular expressions that define what the text should look like. Many articles on regular expressions and how to write them are available on the Internet.

To check whether a pattern is written correctly, enter sample text in the “Test text” field and press “Test”. It is recommended to enter a single line of the original text in “Test text”, because the search proceeds line by line and a regular expression can, among other things, anchor to the beginning and/or end of the text.
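The line-by-line behaviour can be imitated outside the application with a minimal sketch like the one below. It uses Python's re module as a stand-in, so the application's own regex engine may differ in details, and the function name search_lines is hypothetical. Because each line is searched separately, the anchors ^ and $ refer to the start and end of a single line.

```python
import re

def search_lines(pattern, text):
    """Search each line separately; return (line number, matched text) pairs."""
    regex = re.compile(pattern)
    return [
        (number, match.group(0))
        for number, line in enumerate(text.splitlines(), start=1)
        if (match := regex.search(line))
    ]

sample = "Report 2008\nTotal: 5\nSubtotal: 3"
# "^Total" only matches where a line starts with "Total", so "Subtotal" is ignored.
hits = search_lines(r"^Total", sample)
```

Here hits contains a single entry for line 2; without the ^ anchor, line 3 would be reported as well.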

Determining the position of text on the page

Sometimes it is difficult or even impossible to specify a suitable pattern, because text matching the pattern appears many times in the document and the document should not be split at every occurrence.

To make the search as accurate as possible in such cases, specify, in addition to the pattern, the place on the page where the separator, or the text that defines the sub-document name, is expected. You can choose from the following positions on the page:

Header / first line
Specific line
Footer / last line
Anywhere on the page

Attention: Unlike PDF viewers, the application cannot reliably extract the header, footer and page content from a PDF document in the correct order, so you cannot always rely on where a given piece of text will appear in the extracted text. Therefore, if the separator or the data needed to generate the file name always appears in the same place, it is recommended to choose “Specific line” and enter the line number. The correct line number can be determined with the document analysis available from the toolbar.
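The four position settings can be pictured as a filter that decides which extracted lines are searched at all. The sketch below is only an illustration: the setting names ("first", "line", "last", "anywhere"), the function names and the 1-based line numbering are assumptions, not the application's actual configuration values.

```python
import re

def candidate_lines(page_lines, position, line_number=None):
    """Return the lines of one page that are searched for a given position setting."""
    if not page_lines:
        return []
    if position == "first":      # header / first line
        return page_lines[:1]
    if position == "last":       # footer / last line
        return page_lines[-1:]
    if position == "line":       # specific line, counted from 1 (assumption)
        if line_number is None or not 1 <= line_number <= len(page_lines):
            return []
        return [page_lines[line_number - 1]]
    if position == "anywhere":
        return list(page_lines)
    raise ValueError(f"unknown position: {position!r}")

def find_on_page(pattern, page_lines, position, line_number=None):
    """Return the first regex match among the candidate lines, or None."""
    regex = re.compile(pattern)
    for line in candidate_lines(page_lines, position, line_number):
        match = regex.search(line)
        if match:
            return match
    return None

page = ["HR Report", "Period: 2008.Január", "Page 1"]
found = find_on_page(r"(\d{4})\.\s*(\w+)", page, "line", 2)
```

Restricting the search to line 2 finds the period, while the same pattern with the "first" setting finds nothing on this page.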

Generation of document names

When a separator is found on a page, the application searches the same page for text that matches the file name pattern, that is, the text containing the data needed to name the new sub-document. If no such text is found, an error message is displayed and the following pages are skipped until the next separator is found. Otherwise, a file name is generated according to the file name format.

The file name format consists of optional fixed parts and at least one variable part (each generated document must get a different file name in order to be saved). Variable parts are written as a positive integer in braces (e.g. {2}).
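The separator and naming rules described here can be sketched as a loop over already extracted pages. This is a simplified illustration under stated assumptions, not the application's implementation: the function plan_split and its parameters are hypothetical, pages are plain lists of text lines, and pages that come before the first separator are simply ignored.

```python
import re

def plan_split(pages, separator_pattern, name_pattern, make_name):
    """Group page indexes into sub-documents.

    pages     -- list of pages, each a list of extracted text lines
    make_name -- callable that turns a name match into a file name
    Returns (documents, errors); documents is a list of (file name, page indexes).
    """
    separator = re.compile(separator_pattern)
    name = re.compile(name_pattern)
    documents, errors = [], []
    current = None  # pages of the sub-document being collected; None while skipping

    for index, lines in enumerate(pages):
        if any(separator.search(line) for line in lines):
            # A separator starts a new sub-document; its name must be on the same page.
            match = next((m for m in map(name.search, lines) if m), None)
            if match is None:
                errors.append(f"page {index + 1}: separator found, but no file name text")
                current = None  # skip pages until the next separator
                continue
            current = []
            documents.append((make_name(match), current))
        if current is not None:
            current.append(index)
    return documents, errors

pages = [
    ["HR Report", "2008.Január"],
    ["details"],
    ["HR Report", "no period here"],
    ["orphan page"],
    ["HR Report", "1998. Marec03"],
]
documents, errors = plan_split(
    pages,
    r"^HR Report$",
    r"(\d{4})\.\s*(\w+)",
    lambda m: f"HR Report {m.group(1)}-{m.group(2)}.pdf",
)
```

In this run the third page has a separator but no name text, so it and the page after it are skipped and one error is recorded.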

Example of a pattern written as a regular expression:

(\d{4})\.\s*(\w+)

This pattern matches text anywhere on a line that consists of four digits followed by a dot, then any amount of whitespace (including none), then at least one word character (a letter, digit or underscore). The following texts match the pattern:

This is the text. This is 1234.other text
This is the text. This is 1234. other text

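These texts do not match the given pattern:

This is the text. This is 123.other text (fewer than 4 digits)
This is the text. This is 12345.other text (more than 4 digits, see the note below)
This is the text. This is 123 other text (the dot is missing)
This is the text. This is 1234./other text (‘/’ is not alphanumeric)

Note on the second text: a regex search that may start anywhere in the line can still find 2345.other inside 12345.other. If such text must be rejected, start the pattern with a word boundary: \b(\d{4})\.\s*(\w+)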
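The sample texts can be checked with a short script. The sketch below uses Python's re module as a stand-in for the application's regex engine, which is an assumption; the helper matched_text is hypothetical. It also shows that an unanchored search finds 2345.other inside 12345.other, and that a leading \b (word boundary) excludes that case.

```python
import re

PATTERN = re.compile(r"(\d{4})\.\s*(\w+)")
# Variant with a leading word boundary, so the four digits must start a "word".
STRICT = re.compile(r"\b(\d{4})\.\s*(\w+)")

def matched_text(line, pattern=PATTERN):
    """Return the part of the line that matches the pattern, or None."""
    match = pattern.search(line)
    return match.group(0) if match else None

matching = [
    "This is the text. This is 1234.other text",
    "This is the text. This is 1234. other text",
]
not_matching = [
    "This is the text. This is 123.other text",
    "This is the text. This is 123 other text",
    "This is the text. This is 1234./other text",
]
five_digits = "This is the text. This is 12345.other text"

results = [matched_text(line) for line in matching + not_matching]
unanchored = matched_text(five_digits)       # finds the last four digits
bounded = matched_text(five_digits, STRICT)  # no match with the word boundary
```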

Parts of the pattern can be enclosed in parentheses to define groups. The example above defines two groups:

(\d{4})\.\s*(\w+)

Groups are numbered from left to right, starting at 1. With nested groups, the outer group is numbered before the groups nested inside it. In the file name format, the actual value of a group is referenced by its number enclosed in braces.
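Group numbering with nesting can be illustrated with a variant of the pattern wrapped in one extra pair of parentheses. The nested pattern below is a hypothetical example, not one taken from the application, and Python's re module is used as a stand-in for the application's regex engine.

```python
import re

# Group 1 is the outer group (the whole matched text),
# group 2 the four digits and group 3 the word after the dot.
nested = re.compile(r"((\d{4})\.\s*(\w+))")

match = nested.search("This is the text. This is 1234. other text")
groups = {number: match.group(number) for number in range(1, nested.groups + 1)}
```

The outer group receives number 1 because its opening parenthesis comes first; the groups inside it follow as 2 and 3.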

Example

In this example we use the pattern defined above:

(\d{4})\.\s*(\w+)

and the following file name format for the sub-documents:

HR Report {1}-{2}.pdf

Splitting the document then generates sub-documents with the following file names (matched text -> file name):

2008.Január -> HR Report 2008-Január.pdf
2002.Júl -> HR Report 2002-Júl.pdf
1998. Marec03 -> HR Report 1998-Marec03.pdf
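The substitution of {1} and {2} can be reproduced with a few lines of Python. This is a minimal sketch under assumptions: the helper build_file_name is hypothetical, Python's re module stands in for the application's regex engine, and the two checks (at least one variable part, group number must exist) are one possible reading of the rules described for the file name format.

```python
import re

PLACEHOLDER = re.compile(r"\{(\d+)\}")

def build_file_name(name_format, match):
    """Replace each {n} in the format with the text of group n of the match."""
    if not PLACEHOLDER.search(name_format):
        raise ValueError("the format needs at least one variable part such as {1}")

    def substitute(reference):
        index = int(reference.group(1))
        if not 1 <= index <= match.re.groups:
            raise ValueError(f"the pattern has no group {index}")
        return match.group(index) or ""

    return PLACEHOLDER.sub(substitute, name_format)

pattern = re.compile(r"(\d{4})\.\s*(\w+)")
texts = ["2008.Január", "2002.Júl", "1998. Marec03"]
names = [build_file_name("HR Report {1}-{2}.pdf", pattern.search(text)) for text in texts]
```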

Running the application

PDF Splitter accepts one parameter:

–conf path to the XML configuration file

If the file does not exist (or the parameter is not specified), the application tries to find a config.xml file in the directory from which it was launched. If no such file is found either, the application opens with a blank window.
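The lookup order for the configuration file can be summarised in a small sketch. The function resolve_config is hypothetical and only mirrors the rules stated here; how the application itself determines the launch directory is not specified, so it is passed in explicitly.

```python
from pathlib import Path

def resolve_config(conf_argument, launch_dir):
    """Return the configuration file to load, or None for a blank window."""
    if conf_argument:
        candidate = Path(conf_argument)
        if candidate.is_file():
            return candidate
    # Fall back to config.xml in the directory the application was launched from.
    fallback = Path(launch_dir) / "config.xml"
    return fallback if fallback.is_file() else None
```

A result of None corresponds to the case where the application starts with a blank window.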