Top 20 Informatica Performance Tuning Tips

Saturday, May 21, 2016
Here are a few points to get you started with Informatica PowerCenter performance tuning. Some of these tips are general in nature; please consult your project team before implementing them in your project.

1) Optimize the input query if the source is relational (e.g. an Oracle table):
  a. Reduce the number of rows queried by using WHERE conditions instead of Filter/Router transformations later in the mapping. Since you start with fewer rows, your mapping will run faster.
  b. Make sure appropriate indexes are defined on the necessary columns and are analyzed. Indexes are especially needed on the columns in the WHERE clause of your input query.
  c. Eliminate columns that you do not need for your transformations.
  d. Use optimizer hints as necessary.
  e. Use a sort order based on need.
Note: For item (a), results will vary based on how the table columns are indexed and queried.

2) Use the Filter transformation as close to the Source Qualifier (SQ) transformation as possible.
3) Use sorted input data for Aggregator or Joiner transformations as necessary.
4) Eliminate unused columns and redundant code in all the necessary transformations.
5) Use local variables as necessary to improve performance.
6) Reduce the amount of data cached in the Aggregator.
7) Use parameterized input queries and files for flexibility.
8) Change memory-related settings at the workflow/session level as necessary.
9) When using multiple condition columns in a Joiner/Lookup transformation, make sure to use a numeric column as the first condition.
10) Use a persistent cache in Lookup transformations if possible.
11) Go through the session logs closely to find any issues and change accordingly.
12) Use override queries in Lookup transformations to reduce the amount of data cached.
13) Make sure data types and sizes are consistent throughout the mapping as much as possible.
14) For target loads, use bulk load whenever possible.
15) For target loads, use SQL*Loader with the DIRECT and UNRECOVERABLE options for large-volume data loads.
16) Use partitioning options whenever possible. This is true for both Informatica and Oracle. For Oracle, a rule of thumb is to have around 10 million rows per partition.
17) Make sure that there are no indexes on the tables hit by any "Pre/Post DELETE SQL" used in the mappings/workflows.
18) Use datatype conversions wherever they help:
  a. Store ZIPCODE as an integer instead of a character; this speeds up Lookup transformation comparisons.
  b. Use port-to-port datatype conversions to improve session performance.

19) Use operators instead of functions, e.g., for concatenation use the "||" operator instead of CONCAT().
20) Reduce the amount of data written to the session log for each transformation by setting the tracing level to Terse, but only when necessary.
21) Use reusable transformations and mapplets wherever possible.
22) In Joiner transformations, use the source with fewer rows as the master.
23) Perform joins in the database rather than using a Joiner transformation wherever possible; a sample source qualifier override is sketched below.
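As a hedged illustration of tips 1, 3 and 23 (the table, column, and hint names are hypothetical and should be adapted to your own schema), a source qualifier SQL override might push the filtering, the join, and the sort into the database:

SELECT /*+ PARALLEL(o, 4) */                 -- optional optimizer hint (tip 1.d)
       o.ORDER_ID,
       o.ORDER_DATE,
       o.CUSTOMER_ID,
       c.CUSTOMER_NAME                       -- only the columns the mapping actually needs (tip 1.c)
FROM   ORDERS o
JOIN   CUSTOMERS c ON c.CUSTOMER_ID = o.CUSTOMER_ID          -- join in the database instead of a Joiner (tip 23)
WHERE  o.ORDER_DATE >= TO_DATE('2016-01-01', 'YYYY-MM-DD')   -- filter rows at the source instead of a Filter transformation (tip 1.a)
ORDER  BY o.CUSTOMER_ID                      -- sorted input for a downstream Aggregator or Joiner (tip 3)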

Dynamic Partitioning


Dynamic Partitioning to Increase Parallelism Based on Resource Availability

Informatica PowerCenter session partitioning can be used to process data in parallel and achieve faster data delivery. Using the Dynamic Session Partitioning capability, PowerCenter can dynamically decide the degree of parallelism. The Integration Service scales the number of session partitions at run time based on factors such as source database partitions or the number of CPUs on the node, resulting in a significant performance improvement.

Dynamic Partitioning Methods


The Integration Service can decide the number of session partitions at run time based on different factors:
  • Based on source partitioning : Determines the number of partitions using database partition information. The number of partitions is the maximum of the number of partitions at the source.
  • Based on number of CPUs : Sets the number of partitions equal to the number of CPUs on the node that prepares the session. If the session is configured to run on a grid, dynamic partitioning sets the number of partitions equal to the number of CPUs on the node that prepares the session multiplied by the number of nodes in the grid.
  • Based on number of nodes in grid : Sets the partitions to the number of nodes in the grid running the session. If you configure this option for sessions that do not run on a grid, the session runs in one partition and logs a message in the session log.
  • Based on number of partitions : Sets the partitions to a number that you define in the Number of Partitions attribute. Use the $DynamicPartitionCount session parameter, or enter a number greater than 1.
Note: Do not configure dynamic partitioning for a session that contains manual partitions. If you set dynamic partitioning and you manually partition the session, the session will be invalid.

Session Configuration With Dynamic Partitioning

A session can be set up for dynamic partitioning from the Config Object tab in the session properties. You can choose the available partitioning options from the drop-down list, as shown in the image below.
[Image: Informatica PowerCenter Dynamic Partitioning]
If dynamic partitioning is set to "Based on number of partitions", the "Number of Partitions" property can be set using the session parameter $DynamicPartitionCount, or you can enter a number greater than 1, as shown in the image below.
[Image: Informatica PowerCenter Dynamic Partitioning]
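A minimal sketch of how the $DynamicPartitionCount session parameter could be supplied through a workflow parameter file (the folder, workflow, and session names below are hypothetical):

[ETL_Folder.WF:wf_load_orders.ST:s_m_load_orders]
$DynamicPartitionCount=4

With the "Based on number of partitions" option selected, the session above would run with four partitions, and the degree of parallelism can be changed per environment just by editing the parameter file.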

Dynamic Partitioning with Different Partition Types

When using different partition types with dynamic partitioning, the following rules must be considered.

Pass-through partitioning : If you change the number of partitions at a partition point, the number of partitions in each pipeline stage changes. If you use pass-through partitioning with a relational source, the session runs in one partition in the stage.

Key range partitioning : You must define a closed range of numbers or date keys to use dynamic partitioning. The keys must be numeric or date datatypes. Dynamic partitioning does not scale partitions with key range partitioning on relational targets.

Database partitioning : When you use database partitioning, the Integration Service creates session partitions based on the source database partitions (see the sketch after these rules). This can be used only with Oracle and IBM DB2 sources.

Hash auto-keys, Hash user keys, Round robin : Use hash user keys, hash auto-keys, and round-robin partition types to distribute rows with dynamic partitioning. Use hash user keys and hash auto-keys partitioning when you want the Integration Service to distribute rows to the partitions by group. Use round-robin partitioning when you want the Integration Service to distribute rows evenly to partitions.
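As a hedged illustration of database partitioning (the table and partition names are hypothetical), a range-partitioned Oracle source such as the one below lets the Integration Service create one session partition per table partition when the session uses the "Based on source partitioning" method:

CREATE TABLE ORDERS (
    ORDER_ID   NUMBER,
    ORDER_DATE DATE,
    AMOUNT     NUMBER(12,2)
)
PARTITION BY RANGE (ORDER_DATE) (
    PARTITION ORDERS_2015 VALUES LESS THAN (TO_DATE('2016-01-01', 'YYYY-MM-DD')),
    PARTITION ORDERS_2016 VALUES LESS THAN (TO_DATE('2017-01-01', 'YYYY-MM-DD')),
    PARTITION ORDERS_MAX  VALUES LESS THAN (MAXVALUE)
);
-- Three table partitions, so the session can scale up to three partitions at run time.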

Rules and Guidelines for Dynamic Partitioning

Use the following rules and guidelines with dynamic partitioning.
  • Dynamic partitioning uses the same connection for each partition. 
  • You cannot use dynamic partitioning with XML sources and targets.
  • Sessions that use SFTP fail if you enable dynamic partitioning.
  • When you set dynamic partitioning to a value other than disabled, and you manually partition the session on the Mapping tab, you invalidate the session.

    Hope you enjoyed this article and it is informative. Please leave us your comments and feedback.

    Partitioning for Parallel Processing


    In addition to a good ETL design, the session must be optimized, with no bottlenecks, to get the best session performance. After optimizing the session, we can further improve performance by exploiting under-utilized hardware. This is parallel processing, and in Informatica PowerCenter we achieve it by partitioning sessions.

    What is Session Partitioning

    Partition Tutorial Series
    Part I : Partition Introduction. 
    Part II : Partition Implementation. 
    Part III : Dynamic Partition.
    The Informatica PowerCenter Partitioning Option increases the performance of PowerCenter through parallel data processing. The Partitioning Option lets you split a large data set into smaller subsets that can be processed in parallel for better session performance.

    Partitioning Terminology

    Let's understand some partitioning terminology before we get into more details.
    • Partition : A partition is a subset of the data that executes in a single thread.
    • Number of partitions : We can divide the data set into smaller subsets by increasing the number of partitions. When we add partitions, we increase the number of processing threads, which can improve session performance.
    • Stage : A stage is the portion of a pipeline that is implemented at run time as a thread.
    • Partition point : This is the boundary between two stages; partition points divide the pipeline into stages. A partition point is always associated with a transformation.
    • Partition type : An algorithm for distributing data among partitions, always associated with a partition point. The partition type controls how the Integration Service distributes data among partitions at partition points.
    The image below shows the points discussed above: the demo session has three partitions and three partition points.
    [Image: Informatica PowerCenter Partitioning for Parallel Processing and Faster Delivery]

    Type of Session Partitions

    Different types of partition algorithms are available:
    • Database partitioning : The Integration Service queries the database system for table partition information. It reads partitioned data from the corresponding nodes in the database.
    • Round-Robin Partitioning  : Using this partitioning algorithm, the Integration service distributes data evenly among all partitions. Use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions.
    • Hash Auto-Keys Partitioning : The PowerCenter Server uses a hash function to group rows of data among partitions. When hash auto-key partition is used, the Integration Service uses all grouped or sorted ports as a compound partition key. You can use hash auto-keys partitioning at or before Rank, Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
    • Hash User-Keys Partitioning : The Integration Service uses a hash function to group rows of data among partitions based on a user-defined partition key. You choose the ports that define the partition key.
    • Key Range Partitioning : With this type of partitioning, you specify one or more ports to form a compound partition key for a source or target. The Integration Service then passes data to each partition depending on the ranges you specify for each port.
    • Pass-through Partitioning  : In this type of partitioning, the Integration Service passes all rows at one partition point to the next partition point without redistributing them.

    Setting Up Session Partitions

    Let's see what is required to set up a session with partitioning enabled.

    We can invoke the user interface for session partitioning from the session using the menu Mapping -> Partitions, as shown in the image below.
    [Image: Informatica PowerCenter Partitioning for Parallel Processing and Faster Delivery]
    The interface lets you add or modify partitions and partition points and choose the partition algorithm type. Choose any transformation from the mapping and the "Add Partition Point" button will let you add additional partition points.
    [Image: Informatica PowerCenter Partitioning for Parallel Processing and Faster Delivery]
    Choose any transformation from the mapping and the "Delete Partition Point" or "Edit Partition Point" button will let you modify partition points.
    [Image: Informatica PowerCenter Partitioning for Parallel Processing and Faster Delivery]
    "Add/Delete/Edit Partition Point" opens an additional window that lets you modify the partition and choose the partition algorithm type, as shown in the image below.
    [Image: Informatica PowerCenter Partitioning for Parallel Processing and Faster Delivery]

    Hope this article is informative and useful for your projects. Please leave your comments and feedback.

    Advanced Push Down


    Informatica PowerCenter Pushdown Optimization 


    The Informatica Pushdown Optimization Option increases performance by providing the flexibility to push transformation processing to the most appropriate processing resource. Using pushdown optimization, data transformation logic can be pushed to the source database, the target database, or processed by the PowerCenter server. This gives the ETL architect the option to choose the best of the available resources for data processing.

    What is Pushdown Optimization

    The Pushdown Optimization Option enables data transformation processing to be pushed down into a relational database to make the best use of database processing power. It converts the transformation logic into SQL statements, which can execute directly on the database. This minimizes the need to move data between servers and utilizes the power of the database engine.

    How Pushdown Optimization Works

    When you run a session configured for pushdown optimization, the Integration Service analyzes the mapping and transformations to determine the transformation logic it can push to the database. If the mapping contains a mapplet, the Integration Service expands the mapplet and treats the transformations in the mapplet as part of the parent mapping. The Integration Service converts the transformation logic into SQL statements and sends them to the source or the target database to perform the data transformation. The amount of transformation logic you can push to the database depends on the database, the transformation logic, and the mapping and session configuration.
    [Image: Informatica PowerCenter Full Pushdown Optimization]

    Different Types of Pushdown Optimization

    You can configure pushdown optimization in the following ways.
    1. Source-side pushdown optimization
    2. Target-side pushdown optimization 
    3. Full pushdown optimization 

    Source-side pushdown optimization

    When you run a session configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the database. 

    The Integration Service generates a SELECT statement based on the transformation logic for each transformation it can push to the database. When you run the session, the Integration Service pushes all transformation logic that is valid to push to the database by executing the generated SQL statement. Then, it reads the results of this SQL statement and continues to run the session. 
    [Image: Informatica PowerCenter Source Pushdown Optimization]
    If you run a session that contains an SQL override or lookup override, the Integration Service generates a view based on the override. It then generates a SELECT statement and runs the SELECT statement against this view. When the session completes, the Integration Service drops the view from the database.
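    To make this concrete, here is a purely illustrative, hedged sketch (the table and port names are hypothetical, and the SQL the Integration Service actually generates will differ): a Filter transformation with the condition AMOUNT > 0 followed by an Expression transformation applying UPPER(CUSTOMER_NAME) could be collapsed into a single pushed-down SELECT along these lines:

    SELECT ORDERS.ORDER_ID,
           UPPER(ORDERS.CUSTOMER_NAME),   -- Expression transformation logic folded into the SELECT list
           ORDERS.AMOUNT
    FROM   ORDERS
    WHERE  ORDERS.AMOUNT > 0              -- Filter transformation logic folded into the WHERE clause

    The Integration Service then reads the result set of this statement and processes whatever logic could not be pushed down.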

    Target-side pushdown optimization

    When you run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the database.
    [Image: Informatica PowerCenter Target Pushdown Optimization]
    The Integration Service generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the database, starting with the first transformation in the pipeline it can push to the database. The Integration Service processes the transformation logic up to the point that it can push the transformation logic to the target database. Then, it executes the generated SQL.

    Full pushdown optimization

    The Integration Service pushes as much transformation logic as possible to both source and target databases. If you configure a session for full pushdown optimization, and the Integration Service cannot push all the transformation logic to the database, it performs partial pushdown optimization instead.
    [Image: Informatica PowerCenter Full Pushdown Optimization]
    To use full pushdown optimization, the source and target must be on the same database. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping starting with the source and analyzes each transformation in the pipeline until it analyzes the target. It generates SQL statements that are executed against the source and target database based on the transformation logic it can push to the database. If the session contains an SQL override or lookup override, the Integration Service generates a view and runs a SELECT statement against this view.

    Configuring Session for Pushdown Optimization

    A session can be configured to use pushdown optimization from the Informatica PowerCenter Workflow Manager. Open the session and choose Source, Target, or Full pushdown optimization, as shown in the image below.
    [Image: Informatica PowerCenter Pushdown Optimization setting]
    You can additionally choose a few options that control how the Integration Service pushes data transformation logic into SQL statements. The screenshot below shows the available options.

    • Allow Temporary View for Pushdown. Allows the Integration Service to create temporary view objects in the database when it pushes the session to the database. 
    • Allow Temporary Sequence for Pushdown. Allows the Integration Service to create temporary sequence objects in the database. 
    • Allow Pushdown for User Incompatible Connections. Indicates that the database user of the active database has read permission on the idle databases. 
    [Image: Informatica PowerCenter Pushdown Optimization setting]

    Using Pushdown Optimization Viewer

    Use the Pushdown Optimization Viewer to examine the transformations that can be pushed to the database. Select a pushdown option or pushdown group in the Pushdown Optimization Viewer to view the corresponding SQL statement that is generated for the specified selections. 

    You can invoke the viewer from the highlighted 'Pushdown Optimization' option, as shown in the image below.
    [Image: Informatica PowerCenter Pushdown Optimization viewer]
    The Pushdown Optimization Viewer pops up in a new window and shows how the Integration Service converts the data transformation logic into SQL statements for a particular mapping. When you select a pushdown option or pushdown group in the viewer, you do not change the pushdown configuration; to change the configuration, you must update the pushdown option in the session properties.
    [Image: Informatica PowerCenter Pushdown Optimization viewer]

    Things to Consider before Using Pushdown Optimization

    When you run a session with full pushdown optimization and the session contains a large quantity of data, the database must run a long transaction. Consider the following database performance issues when you generate a long transaction.
    • A long transaction uses more database resources.
    • A long transaction locks the database for longer periods of time, and thereby reduces the database concurrency and increases the likelihood of deadlock.
    • A long transaction can increase the likelihood that an unexpected event may occur.
    Hope you enjoyed this article and it is informative. Please leave us your comments and feedback.

    Push Down Optimization


    You can push transformation logic to the source or target database using pushdown optimization. When you run a session configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The source or target database executes the SQL queries to process the transformations.

    The amount of transformation logic you can push to the database depends on the database, transformation logic, and mapping and session configuration. The Integration Service processes all transformation logic that it cannot push to a database.
    Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization.
    The following figure shows a mapping containing transformation logic that can be pushed to the source database:
    [Image: mapping with transformation logic that can be pushed to the source database]
    This mapping contains an Expression transformation that creates an item ID based on the store number 5419 and the item ID from the source. To push the transformation logic to the database, the Integration Service generates the following SQL statement:
    INSERT INTO T_ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC)
    SELECT CAST((CASE WHEN 5419 IS NULL THEN '' ELSE 5419 END) + '_' +
                (CASE WHEN ITEMS.ITEM_ID IS NULL THEN '' ELSE ITEMS.ITEM_ID END) AS INTEGER),
           ITEMS.ITEM_NAME,
           ITEMS.ITEM_DESC
    FROM ITEMS2 ITEMS
    The Integration Service generates an INSERT SELECT statement to retrieve the ID, name, and description values from the source table, create new item IDs, and insert the values into the ITEM_ID, ITEM_NAME, and ITEM_DESC columns in the target table. It concatenates the store number 5419, an underscore, and the original ITEM ID to get the new item ID.
    Pushdown Optimization Types
    You can configure the following types of pushdown optimization:
    • Source-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the source database.
    • Target-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the target database.
    • Full pushdown optimization. The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.
    Running Source-Side Pushdown Optimization Sessions
    When you run a session configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the source database.
    The Integration Service generates and executes a SELECT statement based on the transformation logic for each transformation it can push to the database. Then, it reads the results of this SQL query and processes the remaining transformations.
    Running Target-Side Pushdown Optimization Sessions
    When you run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the target database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the target database. The Integration Service processes the transformation logic up to the point that it can push the transformation logic to the database. Then, it executes the generated SQL on the Target database.
    Running Full Pushdown Optimization Sessions
    To use full pushdown optimization, the source and target databases must be in the same relational database management system. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the target database. It generates and executes SQL statements against the source or target based on the transformation logic it can push to the database.
    When you run a session with large quantities of data and full pushdown optimization, the database server must run a long transaction. Consider the following database performance issues when you generate a long transaction:
    • A long transaction uses more database resources.
    • A long transaction locks the database for longer periods of time. This reduces database concurrency and increases the likelihood of deadlock.
    • A long transaction increases the likelihood of an unexpected event. To minimize database performance issues for long transactions, consider using source-side or target-side pushdown optimization.
    Rules and Guidelines for Functions in Pushdown Optimization
    Use the following rules and guidelines when pushing functions to a database:
    • If you use ADD_TO_DATE in transformation logic to change days, hours, minutes, or seconds, you cannot push the function to a Teradata database.
    • When you push LAST_DAY() to Oracle, Oracle returns the date up to the second. If the input date contains subseconds, Oracle trims the date to the second.
    • When you push LTRIM, RTRIM, or SOUNDEX to a database, the database treats the argument (' ') as NULL, but the Integration Service treats the argument (' ') as spaces.
    • An IBM DB2 database and the Integration Service produce different results for STDDEV and VARIANCE. IBM DB2 uses a different algorithm than other databases to calculate STDDEV and VARIANCE.
    • When you push SYSDATE or SYSTIMESTAMP to the database, the database server returns the timestamp in the time zone of the database server, not the Integration Service.
    • If you push SYSTIMESTAMP to an IBM DB2 or a Sybase database, and you specify the format for SYSTIMESTAMP, the database ignores the format and returns the complete timestamp.
    • You can push SYSTIMESTAMP('SS') to a Netezza database, but not SYSTIMESTAMP('MS') or SYSTIMESTAMP('US').
    • When you push TO_CHAR(DATE) or TO_DATE() to Netezza, dates with subsecond precision must be in the YYYY-MM-DD HH24:MI:SS.US format. If the format is different, the Integration Service does not push the function to Netezza.

    Optimizing the Bottlenecks


    1. If the source is a flat file, ensure that the flat file is local to the Informatica server. If the source is a relational table, try not to use synonyms or aliases.
    2. If the source is a flat file, reduce the number of bytes the Informatica server reads per line (1,024 bytes per line by default) by decreasing the Line Sequential Buffer Length setting in the session properties.
    3. If possible, add a conditional query in the source qualifier so that records are filtered off as early as possible in the process.
    4. In the source qualifier, if the query has an ORDER BY or GROUP BY, create an index on the source table and order by the indexed field of the source table (a hedged sketch follows this list).
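    A minimal sketch of item 4, using a hypothetical table and column (adapt to your own schema):

    CREATE INDEX IDX_ORDERS_CUSTOMER_ID ON ORDERS (CUSTOMER_ID);

    SELECT   CUSTOMER_ID, SUM(AMOUNT)
    FROM     ORDERS
    GROUP BY CUSTOMER_ID
    ORDER BY CUSTOMER_ID;   -- order by the indexed field used in the GROUP BY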
    PERFORMANCE TUNING OF TARGETS
    If the target is a flat file, ensure that the flat file is local to the Informatica server. If the target is a relational table, try not to use synonyms or aliases.
    1. Use bulk load whenever possible.
    2. Increase the commit level.
    3. Drop constraints and indexes on the target table before loading and rebuild them after the load (a hedged pre-/post-session SQL sketch follows this list).
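    A hedged pre-/post-session SQL sketch for item 3, assuming a hypothetical Oracle target table and index; in practice these statements would go into the session's pre-session and post-session SQL (or command) properties:

    -- Pre-session SQL: drop the secondary index so the load does not have to maintain it
    DROP INDEX IDX_T_ORDERS_CUSTOMER_ID;

    -- Post-session SQL: rebuild the index after the load completes
    CREATE INDEX IDX_T_ORDERS_CUSTOMER_ID ON T_ORDERS (CUSTOMER_ID);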
    PERFORMANCE TUNING OF MAPPINGS
    A mapping channels the flow of data from source to target through all the transformations in between. The mapping is the skeleton of the Informatica loading process.
    1. Avoid executing major SQL queries from mapplets or mappings.
    2. Use optimized queries when we do use them.
    3. Reduce the number of transformations in the mapping. Active transformations like Rank, Joiner, Filter, Aggregator, etc. should be used as little as possible.
    4. Remove all unnecessary links between the transformations in the mapping.
    5. If a single mapping contains many targets, dividing them into separate mappings can improve performance.
    6. If we need to use a single source more than once in a mapping, keep only one source and source qualifier in the mapping, and create different data flows as required into different targets or the same target.
    7. If a session joins many source tables in one source qualifier, an optimized query will improve performance.
    8. In the SQL query that Informatica generates, an ORDER BY will be present. Remove the ORDER BY clause if it is not needed, or at least reduce the number of columns in that list. For better performance it is best to order by the indexed field of the table.
    9. Combine mappings that use the same set of source data.
    10. In a mapping, fields carrying the same information should be given the same type and length throughout. Otherwise time will be spent on field conversions.
    11. Instead of doing complex calculations in the query, use an Expression transformation and do the calculation in the mapping.
    12. If data passes through multiple staging areas, removing a staging area will increase performance.
    13. Stored procedures reduce performance. Try to keep the stored procedures in the mappings simple.
    14. Unnecessary data type conversions should be avoided since they impact performance.
    15. Transformation errors result in performance degradation. Try running the mapping after removing all transformations; if it takes significantly less time than with the transformations, we have to fine-tune the transformations.
    16. Keep database interactions as few as possible.
    PERFORMANCE TUNING OF SESSIONS
    A session specifies the location from where the data is to be taken, where the transformations are done and where the data is to be loaded. It has various properties that help us to schedule and run the job in the way we want.
    1. Partition the session: This creates multiple connections to the source and target and loads data in parallel pipelines, each pipeline independent of the others. However, session performance will not improve if the number of records is small, or if the session mainly performs updates and deletes. So session partitioning should be used only if the volume of data is huge and the job is mainly insertion of data.
    2. Run the sessions in parallel rather than serially to gain time, if they are independent of each other.
    3. Drop constraints and indexes before running the session and rebuild them after the session run completes. Dropping can be done in a pre-session script and rebuilding in a post-session script. But if there is too much data, dropping indexes and then rebuilding them may not be practical. In such cases, stage all the data, pre-create the index, use a transportable tablespace, and then load it into the database.
    4. Use bulk loading, external loading, etc. Bulk loading can be used only if the table does not have an index.
    5. In a session we have the option to treat rows as 'Data Driven', 'Insert', 'Update', or 'Delete'. If update strategies are used, we have to keep it as 'Data Driven'. But when the session only inserts rows into the target table, keep it as 'Insert' to improve performance.
    6. Increase the database commit interval (the point at which the Informatica server commits data to the target table); for example, the commit interval can be set at every 50,000 records.
    7. By avoiding built-in functions as much as possible, we can improve performance. For example, for concatenation the operator '||' is faster than the function CONCAT(). So use operators instead of functions, where possible. Functions like IS_SPACES(), IS_NUMBER(), IIF(), and DECODE() reduce performance to a large extent, in this order; preference should be in the opposite order.
    8. String functions like SUBSTR, LTRIM, and RTRIM reduce performance. In the sources, use delimited strings for source flat files, or use the varchar data type.
    9. Manipulating high-precision data types slows down the Informatica server, so disable 'High Precision' if it is not required.
    10. Localize all source and target tables, stored procedures, views, sequences, etc. Try not to connect across synonyms; synonyms and aliases slow down performance.
    DATABASE OPTIMISATION
    To gain the best Informatica performance, the database tables, stored procedures and queries used in Informatica should be tuned well.
    1. If the source and target are flat files, then they should be present in the system in which the Informatica server is present.
    2. Increase the network packet size.
    3. The performance of the Informatica server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so avoid network connections where possible.
    4. Optimize target databases.

    Identification of Bottlenecks

    Performance of Informatica is dependent on the performance of its several components, such as the database, network, transformations, mappings, sessions, etc. To tune the performance of Informatica, we have to identify the bottleneck first.
    The bottleneck may be present in the source, target, transformations, mapping, session, database, or network. It is best to identify performance issues in the components in the order source, target, transformations, mapping, and session. After identifying the bottleneck, apply the tuning mechanisms in whichever way they are applicable to the project.

    Identify bottleneck in Source
    If the source is a relational table, put a Filter transformation in the mapping just after the source qualifier and set the filter condition to FALSE, so all records are filtered off and none proceed to the other parts of the mapping. In the original case, without the test filter, the total time taken is as follows:
    Total Time = time taken by (source + transformations + target load)
    Now, because of the filter, Total Time = time taken by source
    So if the source is fine, the run with the test filter should take much less time. If the session still takes nearly the same time as before, there is a source bottleneck.

    Identify bottleneck in Target
    If the target is a relational table, then substitute it with a flat file and run the session. If the time taken now is very much less than the time taken for the session to load to table, then the target table is the bottleneck.

    Identify bottleneck in Transformation
    Remove the transformation from the mapping and run it, noting the time taken. Then put the transformation back and run the mapping again. If the time taken now is significantly more than the previous time, the transformation is the bottleneck.
    But removing a transformation for testing can be a pain for the developer, since that might require further changes for the session to get back into working order.
    So we can instead put a filter with a FALSE condition just after the transformation and run the session. If the session run takes roughly equal time with and without this test filter, the transformation is the bottleneck.

    Identify bottleneck in sessions
    We can use the session log to identify whether the source, target, or transformations are the performance bottleneck. Session logs contain thread summary records like the following:
    MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.703201] secs, Total Idle Time = [9.560945] secs, Busy Percentage = [18.304876].
    MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.764368] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
    If busy percentage is 100, then that part is the bottleneck.
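    For example, from the READER thread record above: Busy Percentage = (Total Run Time - Total Idle Time) / Total Run Time = (11.703201 - 9.560945) / 11.703201 ≈ 18.3%, which matches the logged value. The TRANSF thread shows 100% busy with zero idle time, so in this run the transformation stage is the likelier bottleneck.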
    Basically, we have to rely on thread statistics to identify the cause of performance issues. Once the 'Collect Performance Data' option (in the session 'Properties' tab) is enabled, all the performance-related information appears in the log created by the session.