News Ticker

Menu

Aggregator Transformation

Aggregator Transformation in Informatica

Aggregator transformation is an active and Connected transformation used to perform calculations such as sums, averages, counts on groups of data. The integration service stores the data group and row data in aggregate cache. The Aggregator Transformation provides more advantages than the SQL, you can use conditional clauses to filter rows.

Components of the Aggregator Transformation
  1. Aggregate expression
  2. Group by port
  3. Sorted Input
  4. Aggregate cache
1) Aggregate Expressions
  • Entered in an output port.
  • Can include non-aggregate expressions and conditional clauses.
The transformation language includes the following aggregate functions:
  • AVG, COUNT, MAX, MIN, SUM
  • FIRST, LAST
  • MEDIAN, PERCENTILE, STDDEV, VARIANCE
Single Level Aggregate Function: MAX(SAL)
Nested Aggregate Function: MAX( COUNT( ITEM ))
Nested Aggregate Functions
  • In Aggregator transformation, there can be multiple single level functions or multiple nested functions.
  • An Aggregator transformation cannot have both types of functions together.
  • MAX( COUNT( ITEM )) is correct.
  • MIN(MAX( COUNT( ITEM ))) is not correct. It can also include one aggregate function nested within another aggregate function
Conditional Clauses
We can use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
  • SUM( COMMISSION, COMMISSION > QUOTA )
Non-Aggregate Functions
We can also use non-aggregate functions in the aggregate expression.
  • IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0))
2) Group By Ports
  • Indicates how to create groups.
  • When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
The Aggregator transformation allows us to define groups for aggregations, rather than performing the aggregation across all input data.
For example, we can find Maximum Salary for every Department.
  • In Aggregator Transformation, Open Ports tab and select Group By as needed.
3) Using Sorted Input
  • Use to improve session performance.
  • To use sorted input, we must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.
  • When we use this option, we tell Aggregator that data coming to it is already sorted.
  • We check the Sorted Input Option in Properties Tab of the transformation.
  • If the option is checked but we are not passing sorted data to the transformation, then the session fails.
4) Aggregator Caches
  • The Power Center Server stores data in the aggregate cache until it completes Aggregate calculations.
  • It stores group values in an index cache and row data in the data cache. If the Power Center Server requires more space, it stores overflow values in cache files.
Note: The Power Center Server uses memory to process an Aggregator transformation with sorted ports. It does not use cache memory. We do not need to configure cache memory for Aggregator transformations that use sorted ports.
1) Aggregator Index Cache:
The index cache holds group information from the group by ports. If we are using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.
  • All Group By Columns are in AGGREGATOR INDEX CACHE. Ex. DEPTNO
2) Aggregator Data Cache:
DATA CACHE is generally larger than the AGGREGATOR INDEX CACHE.
Columns in Data Cache:
  • Variable ports if any
  • Non group by input/output ports.
  • Non group by input ports used in non-aggregate output expression.
  • Port containing aggregate function

clip_image030
1) Example: To calculate MAX, MIN, AVG and SUM of salary of EMP table.
  • EMP will be source table.
  • Create a target table EMP_AGG_EXAMPLE in target designer. Table should contain DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL
  • Create the shortcuts in your folder.
Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give mapping name. Ex: m_agg_example
4. Drag EMP from source in mapping.
5. Click Transformation -> Create -> Select AGGREGATOR from list. Give name and click Create. Now click done.
6. Pass SAL and DEPTNO only from SQ_EMP to AGGREGATOR Transformation.
7. Edit AGGREGATOR Transformation. Go to Ports Tab
8. Create 4 output ports: OUT_MAX_SAL, OUT_MIN_SAL, OUT_AVG_SAL,
OUT_SUM_SAL
9. Open Expression Editor one by one for all output ports and give the
calculations. Ex: MAX(SAL), MIN(SAL), AVG(SAL),SUM(SAL)
10. Click Apply -> Ok.
11. Drag target table now.
12. Connect the output ports from Rank to target table.
13. Click Mapping -> Validate
14. Repository -> Save
  • Create Session and Workflow as described earlier. Run the Workflow and see the data in target table.
  • Make sure to give connection information for all tables.

Creating an Aggregator Transformation:

Follow the below steps to create an aggregator transformation
  • Go to the Mapping Designer, click on transformation in the toolbar -> create.
  • Select the Aggregator transformation, enter the name and click create. Then click Done. This will create an aggregator transformation without ports.
  • To create ports, you can either drag the ports to the aggregator transformation or create in the ports tab of the aggregator.

 Configuring the aggregator transformation:

You can configure the following components in aggregator transformation
  • Aggregate Cache: The integration service stores the group values in the index cache and row data in the data cache.
  • Aggregate Expression: You can enter expressions in the output port or variable port.
  • Group by Port: This tells the integration service how to create groups. You can configure input, input/output or variable ports for the group.
  • Sorted Input: This option can be used to improve the session performance. You can use this option only when the input to the aggregator transformation in sorted on group by ports.

Properties of Aggregator Transformation:

The below table illustrates the properties of aggregator transformation
PropertyDescription
Cache DirectoryThe Integration Service creates the index and data cache files.
Tracing LevelAmount of detail displayed in the session log for this transformation.
Sorted InputIndicates input data is already sorted by groups. Select this option only if the input to the Aggregator transformation is sorted.
Aggregator Data Cache SizeDefault cache size is 2,000,000 bytes. Data cache stores row data.
Aggregator Index Cache SizeDefault cache size is 1,000,000 bytes. Index cache stores group by ports data
Transformation ScopeSpecifies how the Integration Service applies the transformation logic to incoming data

Group By Ports:

The integration service performs aggregate calculations and produces one row for each group. If you do not specify any group by ports, the integration service returns one row for all input rows. By default, the integration service returns the last row received for each group along with the result of aggregation. By using the FIRST function, you can specify the integration service to return the first row of the group.

Aggregate Expressions:

You can create the aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions. You can use the following aggregate functions in the Aggregator transformation,
AVG 
COUNT 
FIRST 
LAST 
MAX 
MEDIAN 
MIN 
PERCENTILE 
STDDEV 
SUM 
VARIANCE

Examples: SUM(sales), AVG(salary)

Nested Aggregate Functions:

You can nest one aggregate function within another aggregate function. You can either use single-level aggregate functions or multiple nested functions in an aggregate transformation. You cannot use both single-level and nested aggregate functions in an aggregator transformation. The Mapping designer marks the mapping as invalid if an aggregator transformation contains both single-level and nested aggregate functions. If you want to create both single-level and nested aggregate functions, create separate aggregate transformations.
Examples: MAX(SUM(sales))

Conditional clauses:

You can reduce the number of rows processed in the aggregation by specifying a conditional clause.
Example: SUM(salary, slaray>1000)

This will include only the salaries which are greater than 1000 in the SUM calculation.

Non Conditional clauses:

You can also use non-aggregate functions in aggregator transformation.
Example: IIF( SUM(sales) <20000, SUM(sales),0)

Note: By default, the Integration Service treats null values as NULL in aggregate functions. You can change this by configuring the integration service. 

Incremental Aggregation: 

After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally. 

Sorted Input:

You can improve the performance of aggregator transformation by specifying the sorted input. The Integration Service assumes all the data is sorted by group and it performs aggregate calculations as it reads rows for a group. If you specify the sorted input option without actually sorting the data, then integration service fails the session.

Share This:

Post Tags:

Tharun Katanguru

I'm Tharun Katanguru. A full time web designer and Data Engineer. I enjoy to make modern template. I love create blogger template and write about Web Design, Data Warehousing and Data Management. Now I'm working as a Jr Etl Developer.

No Comment to " Aggregator Transformation "

  • To add an Emoticons Show Icons
  • To add code Use [pre]code here[/pre]
  • To add an Image Use [img]IMAGE-URL-HERE[/img]
  • To add Youtube video just paste a video link like http://www.youtube.com/watch?v=0x_gnfpL3RM