BOB: Business Objects Board
Not endorsed by or affiliated with SAP

Register | Login 

Follow BOB on Twitter! 
Follow BOB on Twitter! (Opens a new window)  

General Notice: BOB is going to retire...please see details here.
General Notice: No events within the next 45 days.

Performance Tuning of Dataflows


 
Search this topic... | Search DI: Performance and Tuning... | Search Box
Register or Login to Post    Forum Index -> Data Integrator -> DI: Performance and Tuning  Previous TopicPrint TopicNext Topic
Author Message
didev
Senior Member
Senior Member



Joined: 05 May 2006

Posts: 77



PostPosted: Fri May 19, 2006 2:07 pm 
Post subject: Performance Tuning of Dataflows

Hello all,

I'm using DI XI with Oracle 9i as source & target database.

A DI job which used to complete in 4.5 hours, now running for 7 - 9 hours with marginal increase in input records. (From 1.6 lakh to 1.9 lakh)

On the whole it is two dataflows consuming most of the time. These flows consists no special transformation except Query transformation.

SOURCE --> QT1 --> QT2 --> QT3 --> TARGET

The Query transformations have some complex ifthenelse and lookup functions for some of the column mappings. I found the query being submitted by DI include only basic column selection and doesn't include any complex logic, resulting in more run time.

Introducing outer join instead of lookups is not possible as, the lookups are made to time dimension table to obtain time keys for each of the date columns in the query.

I'm not sure if replacing ifthenelse with decode will help DI pushing the SQL to database server.

Please let me know any ideas for performance tuning of the above said dataflows.

I wish to know if DI has the capability for pushing down the SQL in 2nd/3rd Query transform to database? If not, is there any workaround available for improving performance of the ifthenelse, lookup functions in the query transformations.

Thanks in advance.
Back to top
SantoshNirmala
Forum Associate
Forum Associate



Joined: 15 Mar 2006

Posts: 513
Location: Hartford, CT


flag
PostPosted: Fri May 19, 2006 5:09 pm 
Post subject: Re: Performance Tuning of Dataflows

Hi,

I think decode() is function which implements ifthenelse conveniently but when coming to performance I hope it does the same as ifthenelse.
for tuning, if you are not populating entire columns of the souce table and if you are trying to filter the records in the first transform itself...it would be better if we use the SQL transform(implementing in form of a sql query), this would decrease the execution time but behaves differently with different source databases.

Regards.
Santosh.
Back to top
SantoshNirmala
Forum Associate
Forum Associate



Joined: 15 Mar 2006

Posts: 513
Location: Hartford, CT


flag
PostPosted: Fri May 19, 2006 5:11 pm 
Post subject: Re: Performance Tuning of Dataflows

Hi,

I think decode() is function which implements ifthenelse conveniently but when coming to performance I hope it does the same as ifthenelse.
for tuning, if you are not populating entire columns of the souce table and if you are trying to filter the records in the first transform itself...it would be better if we use the SQL transform(implementing in form of a sql query), this would decrease the execution time but behaves differently with different source databases.

Regards.
Santosh.
Back to top
Werner Daehn
Forum Devotee
Forum Devotee



Joined: 17 Dec 2004

speaker.gif*6
Posts: 10593
Location: Germany


flag
PostPosted: Sat May 20, 2006 1:45 am 
Post subject: Re: Performance Tuning of Dataflows

Yes, if the upstream query can be pushed down entirely, the downstream query can be pushed down as well (or partially pushed down).

Example1: S --- Q with ifthenelse() --- Q with more ifthenelse ---

will become one SQL select


Example2: S --- Q with lookup ---- Q with group by ----

lookups are functions that cannot be pushed down so lookup and group by will be executed in the engine. If parts of the first query can be pushed down, e.g. the where clause and/or other mappings, they will.


Decode is a metafunction and gets converted to ifthenelse internally.


To answer your question: What is the resource characteristic when executing? Is CPU the limit? You might want to try to to set the "Degree of Parallelism" property of the DF to e.g. 4 and see if there is a positive gain.

_________________
Werner Daehn
ex Product Manager Data Services at SAP SE

www.rtdi.io
Back to top
didev
Senior Member
Senior Member



Joined: 05 May 2006

Posts: 77



PostPosted: Sat May 20, 2006 6:18 pm 
Post subject: Re: Performance Tuning of Dataflows

Thanks Santosh and werner!

I tested both ifthenelse and decode functions on a simple dataflow extracting from an oracle source table and inserting to an oracle target table.

I observed in the optimized dataflow in both the cases, the function is not being pushed down to database server but being computed by DI internally. I read from DI reference guide that decode function will be pushed down to database server when possible. But, I'm not sure in what scenario it is possible?

I observed that CPU is not a restriction and trying to analyze the database / network activity.

Werner, please let me know if it is fine to set "Degree of Parallelism" to 4 while having only 1 CPU on the Job Server Computer. Also, what other factors are to be considered while going for parallelism.

Thanks!
Back to top
Werner Daehn
Forum Devotee
Forum Devotee



Joined: 17 Dec 2004

speaker.gif*6
Posts: 10593
Location: Germany


flag
PostPosted: Sun May 21, 2006 7:49 am 
Post subject: Re: Performance Tuning of Dataflows

If the CPU is not at 100% it does make sense to use DOP.

Not a guarantee that everything will be faster but it should not get worse.


As an example, lets use a S --> Q --> T flow and in the query there is a function that takes 1 second, e.g. the sleep() function.

If you set the DOP to four you will execute the function 4 times parallel with 1/4 of the rows each thus be 4 times faster.

On the other hand, if this function is e.g. a lookup and your source system gives you 1 row per second regardeless of the number of requests, it does not matter if you process 4 lookups sequentially thus taking 4 seconds, or 4 functions calls in parallel each waiting 4 secs. It will take 4 secs in both cases.

_________________
Werner Daehn
ex Product Manager Data Services at SAP SE

www.rtdi.io
Back to top
eepjr24
Forum Enthusiast
Forum Enthusiast



Joined: 16 Sep 2005
ASUG Icon
Posts: 1815
Location: Orlando, FL


flag
PostPosted: Mon May 22, 2006 8:27 am 
Post subject: Re: Performance Tuning of Dataflows

And a few general tuning tips:

1. Make a basic dataflow with just S -> Q -> T. Start with all records and then modify this until you get what you want pushed down (IE, filters, group bys, and DB functions) in the Validation SQL. Then add Queries to handle lookups and other portions that cannot be pushed down.

2. Always eliminate records as early as possible. The more times you process a record that will not come out to the target, the more poorly you will perform.

3. Make sure you are using or can add appropriate indices to the pushed down SQL. Get your local DBA to assist if needed.

4. If even one or two of your lookups can be implemented as joins, you will save work. It is often worth it to create cross reference tables for this purpose if you have to.

- Ernie
Back to top
dnewton
Forum Addict
Forum Addict



Joined: 30 Jan 2004
ASUG Icon
Posts: 3458
Location: Massachusetts, USA


flag
PostPosted: Mon May 22, 2006 12:45 pm 
Post subject: Re: Performance Tuning of Dataflows

One other general tip: If you are doing a combination of aggregations and derivations, put the Aggregation step first, if you can. Then derive in a later Query step. That way the database server does the aggregation work.
Back to top
didev
Senior Member
Senior Member



Joined: 05 May 2006

Posts: 77



PostPosted: Tue May 23, 2006 4:26 pm 
Post subject: Re: Performance Tuning of Dataflows

Thanks for all the suggestions.

As Ernie pointed out, making a basic dataflow with just S -> Q -> T and modifying this until the desired SQL is pushed down is a good idea while developing.

Changing the degrees of parallelism to 4 hasn't increased performance except for a few seconds. I'm trying to analyze on this to find out where exactly, time is being consumed.

Thanks again.
Back to top
Display posts from previous:   
Register or Login to Post    Forum Index -> Data Integrator -> DI: Performance and Tuning  Previous TopicPrint TopicNext Topic
Page 1 of 1 All times are GMT - 5 Hours
 
Jump to:  

Index | About | FAQ | RAG | Privacy | Search |  Register |  Login 

Get community updates via Twitter:

Not endorsed by or affiliated with SAP
Powered by phpBB © phpBB Group
Generated in 0.0282 seconds using 17 queries. (SQL 0.0022 Parse 0.0009 Other 0.0251)
CCBot/2.0 (https://commoncrawl.org/faq/)
Hosted by ForumTopics.com | Terms of Service
phpBB Customizations by the phpBBDoctor.com
Shameless plug for MomentsOfLight.com Moments of Light Logo