
| General Notice: Upcoming Events: BI show Toulouse: Oct 18. There are 50 ASUG sponsored events in the next 45 days, see calendar for details. |
| Author |
Message |
wdaehn Forum Groupie


Joined: 17 Dec 2004
    
*6 Posts: 7635 Location: Germany

|
Posted: Mon Sep 15, 2008 9:03 am Post subject: What volumes of data do you move? |
|
|
I would like to know
* how many MB of data you load daily
* what the size your target database is in GB
* how many rows the biggest dataflows load
* what the overall load time is for the daily run
Some values are easy to get, others might not, just answer what you have at hand. This should not be a competition on who has the most, it should help us to understand where we are going. Should we focus more on performance, is usability more important as the volumes are so small for the majority of the customers etc... _________________ Werner Daehn
Product Manager Data Integrator
SAP Business Objects
PLEASE SUPPORT THIS: Run one ATL job and tell us the benchmark result. Details to be found here. |
|
| Back to top |
|
 |
dnewton Forum Addict


Joined: 30 Jan 2004
       Posts: 3194 Location: Massachusetts, USA

|
Posted: Tue Sep 16, 2008 6:56 am Post subject: Re: What volumes of data do you move? |
|
|
1. Not sure of the Mb - looking at the Loader statistics, we load on average 65M rows per night. This number includes re-aggregating certain tables etc., but does not include Data Transfer loading or staging of data.
2. For the core ETL process: Two targets (central DWH and spun-off Marts), totaling ~650Gb. 390 tables across both databases.
3. Largest regularly nightly load is 5M rows. Largest single table is 100M rows.
4. Takes 5 hours to load these 65M rows. Would be faster but we have poor change-data information on several sources.
I'd vote for programmer productivity/usability features, followed closely by 64-bit performance and driver support.  |
|
| Back to top |
|
 |
dbeers Senior Member


Joined: 19 Aug 2002
        Posts: 42 Location: Commerce Township, MI, USA

|
Posted: Tue Sep 23, 2008 12:22 pm Post subject: Re: What volumes of data do you move? |
|
|
We are using DI 11.7.2 to move data near real-time by looping DI batch job flows, so I will try to answer your query from that perspective.
I suspect we move about 5-10GB/day into our 4 datamarts via 4 DI jobs. The largest datamart is currently around 30GB, and the source data warehouse database is about 160GB - all reside on SQL Servers, the source being on a 2000 box and the datamart targets are all on a 2005 box.
Occasionally we need to do a full refresh of the datamarts, and when we do, we are moving about 100M rows into the 4 datamarts. The longest refresh (time-wise) is into the SalesMart - primarily due to transformations. Its largest table has 4.1M rows and the smallest only has 1 row. The total record count for the datamart is in the vicinity of 6M rows. To do this refresh generally takes around 6 hours.
The largest table that gets refreshed is about 24.3M rows (9GB) and takes about 4 hours to refresh.
What I would like to see most is DI support for AS400. When we purchased DI a year ago, we replaced the old ETL tool only for our SQL to SQL needs (building out the datamarts from the warehourse), leaving our existing tool to run between the ERP system on the AS400 and the SQL Server based data warehouse. At that time, we were told that a DI AS400 interface was in the works and was going to be available Q4 2007. Now we are told that there is no AS400 interface in the works, and since we are moving off our old ETL tool completely sometime in the next 6-12 months (due to rising licensing costs with no value add for the existing product), we are now looking at implementing yet another ETL tool for the AS400 to SQL piece. _________________ Doug Beers
HoMedics, Inc. |
|
| Back to top |
|
 |
wdaehn Forum Groupie


Joined: 17 Dec 2004
    
*6 Posts: 7635 Location: Germany

|
Posted: Tue Sep 23, 2008 1:29 pm Post subject: Re: What volumes of data do you move? |
|
|
Can you clarify the AS/400 piece?
We have customers using DB2 connect to read from AS/400, others generate files and read them in via DI Cobol Copybooks and some use the Attunity Mainframe connector for that.
What do you need, what method are you using today, what's missing? _________________ Werner Daehn
Product Manager Data Integrator
SAP Business Objects
PLEASE SUPPORT THIS: Run one ATL job and tell us the benchmark result. Details to be found here. |
|
| Back to top |
|
 |
dbeers Senior Member


Joined: 19 Aug 2002
        Posts: 42 Location: Commerce Township, MI, USA

|
Posted: Wed Sep 24, 2008 7:03 am Post subject: Re: What volumes of data do you move? |
|
|
Specifically what we are looking for is CDC support to the JDE tables on our AS400 to maintain our real-time data warehouse. We have been told that DI does not support that at this time. We currently use DataMirror (now IBM) Transformation Server to handle that functionality. _________________ Doug Beers
HoMedics, Inc. |
|
| Back to top |
|
 |
1010 Principal Member


Joined: 19 Aug 2006
    Posts: 453 Location: Somewhere EveryWhere!

|
Posted: Fri Jan 08, 2010 1:33 pm Post subject: Re: What volumes of data do you move? |
|
|
Hi,
We are planing to move around >= 500Gb of data from Informix Dynamic Server 9.52.FC5W4X3 source to Oracle 10g DB target, which will be our new ODS with 137 tables, and then from ODS to our new Dataware house, consisting of 38 dimensions all of Type-1 and 15 facts tables.
We are using DI 11.7 on Win server 2003.
The largest table we have in our Source has around 300 mill rows.
I will let you know, once we move it, from Informix to Oracle, hopefully pretty soon!
Thanks, _________________ @Data is Delicious
++++++++++++++++
Where you see risk,
I see opportunity! |
|
| Back to top |
|
 |
David Coutts Forum Member


Joined: 03 Jul 2009
 Posts: 21 Location: Scotland

|
Posted: Tue Jan 12, 2010 4:01 am Post subject: Re: What volumes of data do you move? |
|
|
On a nightly basis we will be moving about 250Mb of data and once a week 8.5Gb of data (at the moment)
The target databse is currently around 20Gb
The biggest dataflows process approxiamatley 7000 rows on a nightly basis but are expected to load 4 Million Rows on a weekly basis.
The Nightly Load takes approximatly 1.5-2 hours whereas the weekly refresh takes approximatly 22 hours. It takes so long as we have to do some pretty silly things with the data from flat files to get it into a usable format.
Like dnewton i agree that Usability is the key for us here |
|
| Back to top |
|
 |
k7nixen Senior Member


Joined: 07 Oct 2006
   Posts: 88

|
Posted: Wed Jan 13, 2010 3:24 pm Post subject: Re: What volumes of data do you move? |
|
|
Approximately 50 million records are staged from flat files. After data validation, transformation and aggregation the data is moved to the target. Especially transformation part is very complex since the data is coming from different sources. Based on customer's needs we created a common data architecture model. All of the common data is translated to new business terminalogy based on the predefined rules etc. Currently the biggest target table has more than 20 billion records (it is a GSM -telecommunication- company). Data retention is 400 days so on 401st day, the 1st partition is dropped automatically which means the final table won't grow significantly as long as daily volume doesn't increase dramatically.
Daily load takes 50-55 minutes.
HP-UX Itanium (16 CPUs & 48GM RAM)
DI 11.5
Oracle 10g
Z. |
|
| Back to top |
|
 |
1010 Principal Member


Joined: 19 Aug 2006
    Posts: 453 Location: Somewhere EveryWhere!

|
Posted: Mon Jun 21, 2010 11:03 am Post subject: Re: What volumes of data do you move? |
|
|
Alright Folks...
We just moved around 75 GB (including Indexes) of data from our Legacy Sources to Oracle 10g target in 55 hrs. we loaded around 120 ODS tables.
The Primary sources was IBM Informix Dynamic Server Version 11.50.FC5W4X3, Oracle 10g DB, Flat Files ...
This was not a direct dump, the task was to construct new ODS from these Sources. Some of the ETL code was fairly complex with lots of look_up and complex decode statements.
One of the ETL transfered 23 mill rows in just 1:33 hrs, pretty neat huhhhh..It had 5 look-ups and 3 decode statements, one of them 5 levels deep.
Our Machine Config:
Windows Server 2003 R2 Enterprise x64 edition, 2 Quad CPU @ 3 GHZ, with 32 GB of Ram.
Oracle 10g target DB on UNIX and BODI server is on different machine.
This is just 1/8 of data that we migrated, just for testing purpose!
Any recommendations, we want to minimize our time.
Thanks, _________________ @Data is Delicious
++++++++++++++++
Where you see risk,
I see opportunity! |
|
| Back to top |
|
 |
k7nixen Senior Member


Joined: 07 Oct 2006
   Posts: 88

|
Posted: Mon Jun 21, 2010 8:46 pm Post subject: Re: What volumes of data do you move? |
|
|
| SSV wrote: | Alright Folks...
We just moved around 75 GB (including Indexes) of data from our Legacy Sources to Oracle 10g target in 55 hrs. we loaded around 120 ODS tables.
The Primary sources was IBM Informix Dynamic Server Version 11.50.FC5W4X3, Oracle 10g DB, Flat Files ...
This was not a direct dump, the task was to construct new ODS from these Sources. Some of the ETL code was fairly complex with lots of look_up and complex decode statements.
One of the ETL transfered 23 mill rows in just 1:33 hrs, pretty neat huhhhh..It had 5 look-ups and 3 decode statements, one of them 5 levels deep.
Our Machine Config:
Windows Server 2003 R2 Enterprise x64 edition, 2 Quad CPU @ 3 GHZ, with 32 GB of Ram.
Oracle 10g target DB on UNIX and BODI server is on different machine.
This is just 1/8 of data that we migrated, just for testing purpose!
Any recommendations, we want to minimize our time.
Thanks, |
Given the server configuration, the numbers look pretty impressive. I wonder if you guys did any performance tuning at the database level though, just curiosity... You might get a much better result as I was loading 50 million records in less than an hour(the whole ETL job not only one dataflow, including staging, transformation and dwh population). |
|
| Back to top |
|
 |
1010 Principal Member


Joined: 19 Aug 2006
    Posts: 453 Location: Somewhere EveryWhere!

|
Posted: Tue Jun 22, 2010 10:40 pm Post subject: Re: What volumes of data do you move? |
|
|
k7nixen,
Just curious, when you say 50 mill < 1 hour, I am amazed.
Could you shed some light on your Architecture, i.e., BODI server, Source and Target DB etc....
Some finer point of any DF, that loads large data in target table.
I believe, we did optimize our DB for this load process. But, If there is any other setting that I am missing, I will be very glad to know the same. _________________ @Data is Delicious
++++++++++++++++
Where you see risk,
I see opportunity! |
|
| Back to top |
|
 |
wdaehn Forum Groupie


Joined: 17 Dec 2004
    
*6 Posts: 7635 Location: Germany

|
Posted: Wed Jun 23, 2010 4:16 am Post subject: Re: What volumes of data do you move? |
|
|
One hour for 50 million is 14'000rows/sec. Not bad, but not exceptional. In my tests I came close to 200'000rows/sec API bulkloading for a DWH fact table and comparable hardware.
To make the numbers comparable we need to sync up on
* API bulkload or regular load
* DoP, Number of Loaders, Partitioning
* width of the table loaded
* complexity of the transformations
Both of you should run the customer benchmark so we can compare! _________________ Werner Daehn
Product Manager Data Integrator
SAP Business Objects
PLEASE SUPPORT THIS: Run one ATL job and tell us the benchmark result. Details to be found here. |
|
| Back to top |
|
 |
1010 Principal Member


Joined: 19 Aug 2006
    Posts: 453 Location: Somewhere EveryWhere!

|
Posted: Wed Jun 23, 2010 11:36 am Post subject: Re: What volumes of data do you move? |
|
|
I think , I am replying in wrong section here...But it make sense to me, to put it here.
Source DB: Oracle 10g on Unix
Target DB : Oracle 10g on Unix
BODI 11.7.3.2 Server: Windows Server 2003 R2 Enterprise x64 edition, 2 Quad CPU @ 3 GHZ, with 32 GB of Ram.
Network, Target/Source to DI : Gigabit Etehrnet
All of the above are on different Machine.
Here are the numbers from the Test Environment, I ran this Job 5 times, and the number is from last run.
6104 7028 DATAFLOW 6/23/2010 12:17:57 PM Process to execute data flow <DF_Benchmark_read> is completed.
5808 6540 PRINTFN 6/23/2010 12:17:57 PM Time______________________________ 89 secounds
5180 1416 DATAFLOW 6/23/2010 12:22:49 PM Process to execute data flow <DF_Benchmark_API_bulkloader> is completed.
5808 6540 PRINTFN 6/23/2010 12:22:49 PM Time______________________________ 291 secounds
6896 6468 DATAFLOW 6/23/2010 12:29:29 PM Process to execute data flow <DF_Benchmark_regular_load> is completed.
5808 6540 PRINTFN 6/23/2010 12:29:29 PM Time______________________________ 388 secounds
4432 4600 DATAFLOW 6/23/2010 12:29:58 PM Process to execute data flow <DF_Benchmark_single_thread> is completed.
5808 6540 PRINTFN 6/23/2010 12:29:58 PM Time______________________________ 15 secounds
5040 5000 DATAFLOW 6/23/2010 12:30:26 PM Process to execute data flow <DF_Benchmark_lookup_DOP1> is completed.
5808 6540 PRINTFN 6/23/2010 12:30:26 PM Time______________________________ 27 secounds
5008 6900 DATAFLOW 6/23/2010 12:30:42 PM Process to execute data flow <DF_Benchmark_lookup_DOP10> is completed.
5808 6540 PRINTFN 6/23/2010 12:30:42 PM Time______________________________ 15 secounds
Thanks,
Any Suggestion! _________________ @Data is Delicious
++++++++++++++++
Where you see risk,
I see opportunity! |
|
| Back to top |
|
 |
1010 Principal Member


Joined: 19 Aug 2006
    Posts: 453 Location: Somewhere EveryWhere!

|
Posted: Mon Jul 12, 2010 11:13 am Post subject: Re: What volumes of data do you move? |
|
|
Werner,
After doing some optimization for one of my DF, I was able to get it under 25 mins to process 20 mill rows. i.e., 13334 rows /sec using API bulk load method, if I use Parallel loader it ran for 56 mins. Initially, it use to took like 2 hrs to load.
Still, I am not happy with this number, when I see ours (200'000 rows/sec ), would you mind posting this DF of ours, so that I can peek into it. I have to load around 250 Mill rows, when we go-live!
In your Test, where the Source/Target + BODI server on same Machine?; How many look_ups that Fact table had?
Thanks, _________________ @Data is Delicious
++++++++++++++++
Where you see risk,
I see opportunity! |
|
| Back to top |
|
 |
|
|