TextPreParser

Class name

com.ebd.hub.datawizard.parser.TextPreParser

Description

This preparser works much like the PDFPreParser: it extracts individual details from an unstructured text file that would be difficult or impossible to extract with a 'normal' profile. For technical details, see the PDFPreParser documentation and the example below. In the configuration file, the root element is TextPreParser instead of PDFPreParser.

Example

Assume the following input file, in which the data is somewhat scattered.

************************************************************************************************
Note.............:
Load. Unit.......:  S90     2 Swap bodies 7.45
Booked GM3.......:     80   Muster contact:   Max Muster
Weight KG........:  24000   Tel.........:
Ord. LM..........:      0   Fax.........:   max@muster.com
Loading:    180516 20:30                  1 of 1  Unloading:  180517 04:00
------------------------------------------------------------------------------------------------
053-DT-1                                          147-STO-1
Max Muster AG                   
Muster-Straße 1                              Büttnerstraße 21
30165     Hannover                                  30165     Hannover
DE GERMANY                                        DE GERMANY
+4918049200         Fax:+491804926600             +4918044240         Fax:+491803424232
Sender Ref No:
Consignment              GM3 Euro/Muster/Half  Loading unit ID      Pack Receiver
------------------------------------------------------------------------------------------------
053-DT-180514147546     46.6    0    0    0
------------------------------------------------------------------------------------------------
================================================================================================
B/L No...........:                                Car Ref:
Shipment ID......:  003-DSO-S3543110    (Shipment ID to be entered on the freight invoice)
INCOTERMS........:  DDU CONSIGNEE
Transport Agreement Reference: 2125-COM-807-30
Agreed Price
FREI              560  EUR
--------------------------
Seller...........:  Example GmbH          Hauptstraße 21          GERMANY
                    30165 Hannover            4951167222202             Fax: 51167491222
Buyer on Invoice.:  Muster AG            Musterstraße 15             GERMANY
                    30165 Hannover             +49421588535600              Fax: +41588535601
Invoice Receiver.:  Muster AG                   Musterstraße 15                 GERMANY
                    30165 Hannover             +49147681000               Fax: +492347615732
*** END OF DOCUMENT ***

To extract data from this file, we use the following preparser configuration in a profile named TextPreParserProfile.

<?xml version="1.0" encoding="UTF-8"?>
<TextPreParser>
	<Profile>
		<Name>TextPreParserProfile</Name>
		<LineFrom>1</LineFrom>
		<LineTo>100</LineTo>
		<Tag>
			<Name>LoadUnit</Name>
			<BeginsAfter>Load. Unit.......:</BeginsAfter>
			<Words>1</Words>
		</Tag>
		<Tag>
			<Name>SwapBodies</Name>
			<BeginsAfter>2 Swap bodies</BeginsAfter>
			<Words>1</Words>
		</Tag>
		<Tag>
			<Name>ShipmentID</Name>
			<BeginsAfter>Shipment ID......:</BeginsAfter>
			<Words>1</Words>
		</Tag>
		<Tag>
			<Name>Address</Name>
			<BeginsAfter>053-DT-1 </BeginsAfter>
			<EndsBefore>Sender Ref No:</EndsBefore>
		</Tag>
		<Tag>
			<Name>Name</Name>
			<LinesAfter Tag="Address">1</LinesAfter>
		</Tag>
		<Tag>
			<Name>Street</Name>
			<LinesAfter Tag="Address">2</LinesAfter>
		</Tag>
		<Tag>
			<Name>City</Name>
			<LinesAfter Tag="Address">3</LinesAfter>
		</Tag>
		<Tag>
			<Name>Country</Name>
			<LinesAfter Tag="Address">4</LinesAfter>
		</Tag>
		<Tag>
			<Name>AdressID</Name>
			<BeginsAfter>053-DT-</BeginsAfter>
			<Words>1</Words>
		</Tag>
	</Profile>
</TextPreParser>
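To make the configuration easier to follow, here is a minimal Python sketch that mimics what each Tag in the profile above is meant to pick out of the input file. It is not the TextPreParser implementation, and the function names are hypothetical. The matching rules it uses are assumptions: the first match wins, Words counts whitespace-separated tokens, LinesAfter counts lines from the line on which the referenced tag begins, and LineFrom/LineTo bound the searched lines. SAMPLE is an excerpt of the example input file.

```python
# Hypothetical sketch, not the TextPreParser implementation: approximates
# what each <Tag> in the example profile is assumed to extract.

# Excerpt of the example input file (only the lines the tags refer to).
SAMPLE = """\
Load. Unit.......:  S90     2 Swap bodies 7.45
053-DT-1                                          147-STO-1
Max Muster AG
Muster-Straße 1                              Büttnerstraße 21
30165     Hannover                                  30165     Hannover
DE GERMANY                                        DE GERMANY
Sender Ref No:
053-DT-180514147546     46.6    0    0    0
Shipment ID......:  003-DSO-S3543110    (Shipment ID to be entered on the freight invoice)
"""


def window(lines, line_from, line_to):
    """LineFrom/LineTo (assumed): keep lines line_from..line_to, 1-based, inclusive."""
    return lines[line_from - 1:line_to]


def begins_after_words(lines, marker, words):
    """BeginsAfter + Words (assumed): the first `words` whitespace-separated
    tokens after the first occurrence of `marker`."""
    for line in lines:
        pos = line.find(marker)
        if pos != -1:
            tokens = line[pos + len(marker):].split()
            return " ".join(tokens[:words])
    return None


def begins_after_ends_before(lines, begin, end):
    """BeginsAfter + EndsBefore (assumed): the text between the two markers,
    together with the 0-based index of the line where the block starts."""
    text = "\n".join(lines)
    start = text.find(begin)
    if start == -1:
        return None, None
    start += len(begin)
    stop = text.find(end, start)
    block = text[start:] if stop == -1 else text[start:stop]
    # The number of line breaks before the block is its starting line index.
    start_line = text.count("\n", 0, start)
    return block, start_line


def lines_after(lines, start_line, n):
    """LinesAfter Tag="..." (assumed): the n-th line after the line on
    which the referenced tag starts, with surrounding blanks removed."""
    return lines[start_line + n].strip()


lines = window(SAMPLE.splitlines(), 1, 100)

result = {
    "LoadUnit": begins_after_words(lines, "Load. Unit.......:", 1),
    "SwapBodies": begins_after_words(lines, "2 Swap bodies", 1),
    "ShipmentID": begins_after_words(lines, "Shipment ID......:", 1),
    "AdressID": begins_after_words(lines, "053-DT-", 1),
}

address, start_line = begins_after_ends_before(lines, "053-DT-1 ", "Sender Ref No:")
result["Address"] = address
# Name, Street, City and Country are taken relative to the Address block.
for n, name in enumerate(["Name", "Street", "City", "Country"], start=1):
    result[name] = lines_after(lines, start_line, n)

for key, value in result.items():
    print(f"{key}: {value!r}")
```

Under these assumptions, the sketch yields S90 for LoadUnit, 7.45 for SwapBodies, 003-DSO-S3543110 for ShipmentID and 1 for AdressID, because the first line containing 053-DT- is the 053-DT-1 header rather than the consignment line further down. Street, City and Country each come back as the whole line, which still contains the receiver's column on the right; this is the kind of value you would refine further in the target structure.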

Download

You can download and import the profile TextPreParserProfile.pak as an example. The extracted data can then be refined as needed in the target structure.

  • TextPreParser.xml (already included in the profile, but provided here separately as well).

  • input.txt (already included in the profile as test data, but provided here separately for manual upload).