Thursday, February 9, 2012

Concurrent Development in Informatica Environment

This is something that came up for discussion at the Informatica Forum today. Here is what I think on this. Do add your thoughts.


Make sure the detailed ETL design is complete before you start concurrent development.
  • Create folders for the different areas of the warehouse, such as Staging, Summary, Dimensions, Facts, Shared Components, etc.
  • Within each folder you can further expand concurrent development by dividing up the activities. Ex: one resource makes sure all the sources for mappings are imported/created, another resource focuses on all the targets, and one resource makes sure all the shared components are in one place (this activity needs to be done multiple times).
  • Once all the above steps are done, developers can start developing mappings concurrently. So the initial responsibility of each developer remains, and on top of it they take on the added responsibility of developing mappings for some dimension loads, fact loads, loads to staging tables, etc.
For us, the multiple-folder approach, along with some division of work at the beginning, helped us move ahead.
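To make the folder setup concrete, here is a minimal sketch that prints pmrep commands for a layout like the one above. The folder names are illustrative and the exact pmrep options vary by PowerCenter version, so treat this as a starting point, not a recipe.

# Sketch: print pmrep CreateFolder commands for the layout described above.
# Assumes pmrep is on the PATH and you have already connected to the
# repository (pmrep connect ...); folder names here are illustrative only.
FOLDERS = [
    "STAGING",            # loads into staging tables
    "DIMENSIONS",         # dimension load mappings
    "FACTS",              # fact load mappings
    "SUMMARY",            # summary/aggregate mappings
    "SHARED_COMPONENTS",  # shared sources, targets, and reusable transformations
]

for folder in FOLDERS:
    # CreateFolder makes a folder in the connected repository; check your
    # version's pmrep command reference for the full option list.
    print(f'pmrep CreateFolder -n {folder} -d "{folder} area"')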

Monday, August 15, 2011

Critique this model

We are building a prototype model. If it works well, we will expand individual dimensions and add more dimensions as needed by the business. At this point in time we have added the most important dimensions, including the multi-value ones. So if you see that some of the dimensions have just the basic structure, don't worry; it's intentional. What I want from you is your feedback on the multi-value dimensions and the way we have handled them, along with any other feedback you would like to share with us. Once again, thanks for all your time.
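For context on the multi-value dimensions: the standard Kimball pattern for handling them is a bridge table between the fact and the dimension. Whether or not you agree with that pattern here, a minimal runnable sketch of it (with illustrative table and column names, not our actual model) may help frame the discussion.

import sqlite3

# Bridge-table sketch for a multi-valued dimension (e.g. one account
# group relating to many customers). All names are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bridge_account_customer (
    account_group_key INTEGER,
    customer_key      INTEGER REFERENCES dim_customer(customer_key),
    weighting_factor  REAL     -- weights per group should sum to 1.0
);
CREATE TABLE fact_balance (
    account_group_key INTEGER,  -- joins to the bridge, not the dimension
    balance           REAL
);
""")
con.close()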
 

Wednesday, July 14, 2010

Dimensional Modeling and the terms associated with it

Dimensional modeling (DM) is the name of a logical design technique often used for Data Warehouses (a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of the decision-making process) or Data Marts (in its most simplistic form, a data mart represents data from a single business process). It is considered to be different from entity-relationship (ER) modeling. Dimensional Modeling does not necessarily involve a relational database; the same modeling approach, at the logical level, can be used for any physical form, such as a multidimensional database or even flat files. According to Dr. Kimball, DM is a design technique for databases intended to support end-user queries in a data warehouse. It is oriented around understandability, not around ease of database administration. According to him, although transaction-oriented ER modeling is very useful for transaction capture, it should be avoided for end-user delivery. DM is very useful when retrieving data is more important than storing or inserting data.
Dimensional modeling always uses the concepts of FACTS (measures) and DIMENSIONS (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts.

Here I would like to introduce a new term called Star Schema. A DM is called a star schema if all dimension tables can be joined directly to the fact table. The following diagram (Diagram1) shows a classic star schema.


Example: In Diagram1, dollars sold, dollars cost, and units sold are all part of the fact, whereas items like Clerk Name and Product Description are parts of different dimensions.
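To make Diagram1 concrete, here is a minimal runnable sketch of that star in DDL form (via Python's sqlite3 so it runs anywhere); the keys and column names follow the example above.

import sqlite3

# Star schema sketch matching Diagram1: the fact table joins directly
# to every dimension table through surrogate keys.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                          product_description TEXT);
CREATE TABLE dim_clerk   (clerk_key INTEGER PRIMARY KEY,
                          clerk_name TEXT);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    clerk_key    INTEGER REFERENCES dim_clerk(clerk_key),
    dollars_sold REAL,   -- measures: numeric and aggregatable
    dollars_cost REAL,
    units_sold   REAL
);
""")
con.close()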




Dimensional models are built by business process area, e.g. store sales, inventory, claims, finance, etc. Because the different business process areas share some but not all dimensions, efficiency in design, operation, and consistency is achieved by using conformed dimensions, i.e. using one copy of the shared dimension across subject areas. The term "conformed dimensions" was originated by Ralph Kimball.
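As a sketch of what conforming buys you, here two fact tables from different process areas share the same date dimension, so "by month" means the same thing in both marts (table names are illustrative).

import sqlite3

# Conformed dimension sketch: store sales and inventory facts share one
# copy of dim_date instead of each mart keeping its own date table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT,
                       month_name TEXT, year INTEGER);
CREATE TABLE fact_store_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    dollars_sold REAL
);
CREATE TABLE fact_inventory (
    date_key         INTEGER REFERENCES dim_date(date_key),
    quantity_on_hand REAL
);
""")
con.close()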


Diagram1: Sample Dimensional Model showing a Star Schema

Steps to build a dimensional model:
The dimensional model is built on a star-like schema, with dimensions surrounding the fact table. To build the schema, the following design steps are used:
  • Choose the business process: e.g. sales, inventory, etc.
  • Declare the grain: this determines how much data you are going to store.
  • Identify the dimensions: descriptive fields, which can't be aggregated.
  • Identify the facts: measures, which typically can be aggregated.
Associated exercise:
Let's take a real-life example from everyone's daily life. We all juggle our monthly budgets, but still we mostly find at the end of the month that the budget could not satisfy all our needs. At that point we need to answer questions like how much we spent where, so as to plan our monthly budgets.
-In this case, what's the name of the process? Let's call it monthly budget, or say monthly expenses.
-What is the granularity of the data? We are going to capture data every day, so the lowest granularity is "Daily".
-What are the dimensions involved? Time, Product, Services, WhoSpent, etc.
-What are the facts? Dollars spent, units bought, dollars earned, etc.
Now prepare a DM for this scenario; one possible sketch follows below.
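If you want to check your answer, here is one possible shape of that DM as a runnable sketch: a daily-grain expenses star with illustrative names. Your design may reasonably differ, e.g. in how you model Services.

import sqlite3

# One possible answer to the exercise: a daily-grain expenses star schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date      (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE dim_product   (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE dim_who_spent (who_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_expense (
    date_key      INTEGER REFERENCES dim_date(date_key),
    product_key   INTEGER REFERENCES dim_product(product_key),
    who_key       INTEGER REFERENCES dim_who_spent(who_key),
    dollars_spent REAL,   -- measures, aggregatable across any dimension
    units_bought  REAL
);
""")
con.close()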

Let me introduce a new term today. It's called "Snowflaking".
We have to understand multiple things related to a snowflake schema: first of all, what it is; then, how it is different from a regular star schema; and finally, what its purpose is.

A snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake in shape. Closely related to the star schema, the snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. In the snowflake schema, however, dimensions are normalized into multiple related tables, whereas the star schema's dimensions are denormalized, with each dimension being represented by a single table. Remember that the "snowflaking" effect only affects the dimension tables and not the fact tables.

Diagram2: Dimensional Model showing a Snowflake Schema
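To make the normalization concrete, here is a sketch of a snowflaked product dimension (illustrative names): the category attributes move out of dim_product into their own table, and only the dimension side changes; the fact table is untouched.

import sqlite3

# Snowflake sketch: dim_product is normalized into product -> category,
# so the fact joins dim_product, which in turn joins dim_category.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY,
                           category_name TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY,
                           product_description TEXT,
                           category_key INTEGER
                               REFERENCES dim_category(category_key));
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    dollars_sold REAL    -- the fact table itself is unchanged
);
""")
con.close()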

Tuesday, July 13, 2010

Informatica Cloud now with address validation

Address Validation is now available and can be used to strengthen the integrity of sales, marketing, and any other mission-critical data. This cloud-based data cleansing service is an easy, low-maintenance, low-cost alternative to on-premise solutions.

Here is a link to a brief online demonstration of Address Validation:
http://www.youtube.com/watch?v=shoNiVq77Ys&feature=digest