BOB: Business Objects Board
Not endorsed by or affiliated with SAP




Reading Structured Data from PDF


 
oops2001
Forum Member



Joined: 22 Mar 2016

Posts: 1



Posted: Wed Mar 30, 2016 2:31 am
Post subject: Reading Structured Data from PDF

Hello,

We have data in a PDF, as shown in the attached file 'sample'. We need to extract the data and load it in structured form into a target database, so that a universe can then be built on it for analysis.

I tried to extract the data using text entity extraction, but only a few port names are identified. The converted_text field contains the whole dump in a single field, which is still unstructured (see sample 2).

In sample 3, too, the whole dump arrives in a single field. I am attaching a screenshot for more clarity.

Please suggest how to proceed to fulfill this requirement.

Many thanks
CLS69
Principal Member



Joined: 11 Jun 2009

Posts: 215
Location: Italy


Posted: Thu Apr 07, 2016 7:31 am
Post subject: Re: Reading Structured Data from PDF

One idea:
1. Read the text file into a three-column table: column 1 is a sequence (ID field), column 2 holds the whole line from the file (VAL field), and column 3 is initially NULL (TYPE field). Your sample has 11 rows in 6 columns, so the table should have 11 x 6 = 66 rows in total.

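2. Update the TYPE field 6 times, setting it to "Col_x" for the rows whose ID lies between ((x - 1) * TotRows/6 + 1) and (x * TotRows/6), with x running from 1 to 6. With TotRows = 66, each column covers 11 consecutive IDs (1 to 11 for Col_1, 12 to 22 for Col_2, and so on).
You should now have something like the following: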

ID - VAL - TYPE
1 Rank Col_1
2 1 Col_1
....
11 10 Col_1
12 Port Col_2
......

3. Pivot the table on the TYPE column and drop the ID field, keying each output row by the line's position within its column block. You should now have the layout shown in sample 3.png.

4. Move everything into your final table, excluding the first row (which contains the column names).
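The four steps above can be sketched in Python to check the logic outside Data Services. This is a minimal sketch, not a Data Services job: it assumes converted_text contains one cell value per line, listed column by column (all of column 1 from its header down, then all of column 2, and so on), with no page headers, blank cells, or wrapped values, so every column has the same number of lines. The function name `lines_to_rows` and the sample values are hypothetical.

```python
def lines_to_rows(converted_text: str, n_cols: int) -> list[dict[str, str]]:
    """Turn a column-major text dump into one dict per data row (hypothetical helper)."""
    # Step 1: give every non-empty line (VAL) a sequence ID starting at 1.
    vals = [line.strip() for line in converted_text.splitlines() if line.strip()]
    staged = {i: v for i, v in enumerate(vals, start=1)}
    tot_rows = len(staged)
    if tot_rows == 0 or tot_rows % n_cols != 0:
        raise ValueError(f"{tot_rows} lines cannot be split evenly into {n_cols} columns")
    rows_per_col = tot_rows // n_cols

    # Step 2: TYPE = "Col_x" for IDs (x - 1) * rows_per_col + 1 .. x * rows_per_col.
    types: dict[int, str] = {}
    for x in range(1, n_cols + 1):
        for i in range((x - 1) * rows_per_col + 1, x * rows_per_col + 1):
            types[i] = f"Col_{x}"

    # Step 3: pivot on TYPE; the row key is the line's position inside its column block.
    grid: list[dict[str, str]] = [{} for _ in range(rows_per_col)]
    for i, val in staged.items():
        grid[(i - 1) % rows_per_col][types[i]] = val

    # Step 4: the first pivoted row holds the column names; use them as keys and skip it.
    header = grid[0]
    return [{header[col]: row[col] for col in header} for row in grid[1:]]


if __name__ == "__main__":
    # Hypothetical dump: 3 columns, each with a header plus 2 values.
    sample = "Rank\n1\n2\nPort\nPort A\nPort B\nTons\n100\n200\n"
    for record in lines_to_rows(sample, n_cols=3):
        print(record)
    # {'Rank': '1', 'Port': 'Port A', 'Tons': '100'}
    # {'Rank': '2', 'Port': 'Port B', 'Tons': '200'}
```

Because both the TYPE and the row position are derived from the ID, the sketch raises an error when the line count is not a multiple of the column count; with real PDF extractions that usually points to a missing or wrapped cell that needs cleaning before the pivot.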

All times are GMT - 5 Hours