Wednesday, July 14, 2010

Dimensional Modeling and terms associated with it .

Dimensional modeling (DM) is the name of a logical design technique often used for Data Warehouses(A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of decision-making
process)  or Data Marts (in its most simplistic form a data mart represent data from a single business process) . It is considered to be different from entity-relationship modeling (ER). Dimensional Modeling does not necessarily involve a relational database. The same modeling approach, at the logical level, can be used for any physical form, such as multidimensional database or even flat files. According to Dr. Kimball DM is a design technique for databases intended to support end-user queries in a data warehouse. It is oriented around understandability, contrary to database administration. According to him, although transaction-oriented ER(Entity Relationship) is very useful for the transaction capture, it should be avoided for end-user delivery.DM is very useful when retrieving data is very important compared to storing or inserting data.
Dimensional modeling always uses the concepts of FACTS (measures), and DIMENSIONS (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts .

Here I would like to introduce a new term called Star Schema . A DM is called a star schema if all dimension tables can be joined directly to the fact table. The following diagram1 shows a classic star schema


Example : In Diagram1 Dollars sold, dollars cost, units sold are all part of the fact where as items alike Clerk Name ,Product description are all part of different dimensions .




Dimensional models are built by business process area, e.g. store sales, inventory, claims,finance etc. Because the different business process areas share some but not all dimensions, efficiency in design, operation, and consistency, is achieved using conformed dimensions, i.e. using one copy of the shared dimension across subject areas. The term "conformed dimensions" was originated by Ralph Kimball.


Diagram1 : Sample Dimensional Model showing a Star Schema

Steps to build a dimensional model :
The dimensional model is build on a star-like schema, with dimensions surrounding the fact table. To build the schema, the following design model is used:
  • Choose the business process : Like sales , Inventory etc
  • Declare the Grain : This will determine how much data you are going to store
  • Identify the dimensions : Descriptive fields and which can't be aggregated
  • Identify the Fact : Measures which typically can be aggregated .
Associated exercise :
 Lets take a real life example associated with every ones daily life . We all juggle around our monthly budgets but still mostly we find at the end of the month that the budget could not satisfy all our needs . So at that point we need to answer questions like how much we spent where so as plan our monthly budgets.
-In this case what's the name of the process ? Let's call it monthly budget or say monthly expenses
-What is the granularity of data ? We are going to capture data every day so lowest granularity is "Daily"
-What are the dimensions involved ? Time , Product , Services , WhoSpent etc .
-What are the facts ? Dollar spent , Units bought  ,Dollars earned etc
Now prepare a DM for this scenario .

Let me introduce  a new term today . It's called "Snow Flaking" .
We have to understand multiple things related to a snowflake schema . First of what is it , then how is it different from a regular star schema and finally what is it's purpose .

A snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake in shape. Closely related to the star schema, the snowflake schema is represented by centralized fact tables which are connected to multiple dimensions . In the snowflake schema, however, dimensions are normalized into multiple related tables whereas the star schema's dimensions are denormalized with each dimension being represented by a single table.Remember that the "snowflaking" effect only affects the dimension tables and not the fact tables.



 
 
 
 
 
 
 
 
Diagram2 : Dimensional Model showing a Snowflake  Schema

No comments:

Post a Comment