Modern Business Intelligence tools let developers and end users abstract away table relationships through metadata capabilities built into the tool. Tool administrators are responsible for setting up connections, table definitions and, in certain advanced tools, table relationships.

Users then go into the tool and pick the tables and fields they need from a catalog or library to create their queries. They do not have to know the physical location of the data or the connection details for accessing it.
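
As a rough illustration (the table and column names here are hypothetical), a user might simply pick "Region" and "Sum of Sales" from the catalog, and the metadata layer would resolve those choices into physical SQL along these lines:

-- Hypothetical physical query generated by the tool's metadata layer
Select r.region_name, sum(f.sales_amount) as total_sales
from sales_fact f,
     region_dim r
where f.region_id = r.region_id
group by r.region_name;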

While this is a great feature, I have often come across shops where inefficient metadata design and poorly written queries cause a great deal of performance problems. The analysis here applies to several tools in the marketplace today, so I am writing this post at a conceptual level that readers can translate into their own tools and environments.

 

[Image: metadata]

 

No matter how many promises BI vendors make, you will achieve optimal performance only when the majority of the work is done by the database engines. Below are the best- and worst-case scenarios when it comes to performance. Keep these in mind before designing your metadata layer, logical models and queries.

 

[Image: db vs bi server]

 

 

Let us examine the examples below to understand how performance is impacted by where the work gets done.

 

Scenario 1: Single table query

Select col1, sum(col2) as amount from d1_table where col1 = 'x' group by col1;

 

The BI server recognizes that the query belongs to a single connection. The query is pushed to the database and the BI server receives only the final results.

Performance: As good as DB performance

Scenario 2: Query with more than one table from the same database connection

 

Select t2.col1, sum(t1.col2) as amount
from d1_table1 t1,
     d1_table2 t2
where t1.col1 = 'x'
and t1.col1 = t2.col1
group by t2.col1;

 

 

 

The BI server recognizes that both tables belong to the same connection. The query is pushed to the database and the BI server receives only the final results.

Performance: As good as DB performance

Scenario 3: Query with more than one table from different database connections

 

Select t2.col1, sum(t1.col2) as amount
from d1_table1 t1,
     d2_table2 t2
where t1.col1 = 'x'
and t1.col1 = t2.col1
group by t2.col1;

The BI server recognizes that the query uses tables from different databases. So the BI server fetches the individual table data from each connection and performs the join and aggregation on the BI server itself (a sketch of what this looks like follows the performance note below).

This is very bad for performance, especially if the tables involved are large:

Increases stress on the network.

Increases stress on the BI server.

 

Performance: Bad
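
Conceptually, the federated execution looks something like the following (the exact decomposition varies by tool; this is only a sketch using the Scenario 3 tables):

-- Sent to database connection 1; the filter can be pushed down
Select col1, col2 from d1_table1 where col1 = 'x';

-- Sent to database connection 2; nothing can be pushed down
Select col1 from d2_table2;

-- The BI server then joins the two result sets in its own memory and
-- computes sum(col2) grouped by col1, so detail rows from both tables
-- must travel across the network before any reduction happens.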

Scenario 4: Query between a table and a file

This is a similar concept to Scenario 3. The BI server recognizes that the join cannot be pushed to a database, so it brings all the necessary data onto the BI server to perform the required join, aggregation and sort operations.

 

Recommendations:

I have discussed a common problem across different tools. The exact solution is different for each tool, but conceptually they are the same: make sure your joins and aggregations are performed by the databases as much as possible. This is the secret sauce of a high-performing BI architecture.
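
As one concrete illustration of pushing work down: if a cross-database join truly cannot be avoided, letting each database aggregate its own side first (where the tool or a hand-written query allows it) means only summarized rows ever reach the BI server. A rough sketch using the Scenario 3 tables:

-- Pushed to connection 1: filter and aggregate before anything leaves the database
Select col1, sum(col2) as amount
from d1_table1
where col1 = 'x'
group by col1;

-- Pushed to connection 2: only the join keys are needed
Select distinct col1 from d2_table2;

-- The BI server now joins two small, pre-aggregated result sets
-- instead of two full detail tables.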


Over the years I have noticed this practice at several companies: in order to help developers avoid errors, or to avoid time spent on proper training and knowledge transfer, complex queries are converted into views and handed off to the development teams.

These views do help cut down developer errors and reduce the knowledge transfer required. However, there is a huge hidden cost to using views for developing reports and analytics.

For example, let's review the view definition below.

 

Create or replace view myview as
Select t1.col1, t2.col2, t3.col3, t4.col4, t5.col5, t6.col6
From table1 t1,
     table2 t2,
     table3 t3,
     table4 t4,
     table5 t5,
     table6 t6
Where t1.col1 = t2.col2
…etc

 

The view above involves six tables in a join. Whenever this view is used for reporting or querying, the database is going to perform the JOIN across all of the tables involved, whether or not the query actually refers to columns from each individual table.

 

Users will see a significant slowdown due to this join across all the tables. The effect is negligible when the tables are small and results render quickly, but performance will worsen as the table sizes grow.
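
To make the cost concrete, consider a report that needs only one column. Unless the optimizer can prove the extra joins redundant and eliminate them (which generally requires declared, enforced key constraints), the database still expands the view and executes the full six-table join:

Select col1
from myview
where col1 = 'x';
-- Only table1.col1 is referenced, but myview is expanded and
-- table1 through table6 are all joined before any rows are returned.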

 

Below are a few alternative options. My advice may sound counterintuitive, but it is worth considering if you want to keep your users happy.

Option 1: For shops with BI tools

If you have a good business intelligence tool (Business Objects, OBIEE, Web Focus and others) in your shop, model the table relationships in the tool.

Only the tables needed for the selected columns will then be brought into the join.

Some of these tools also let you pull in implicit columns if you must enforce certain joins whether or not the user selects those columns.

Option 2: Writing queries directly on database without any BI tools

Avoid using complex views just for development convenience. Train your developers to build proper queries based on the business needs.

Limit views to situations where the data must be restricted and all of the join conditions are required by the business logic.
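
For example, if a report needs only col1 and col2, a developer who knows the underlying model can query just the two tables involved instead of going through myview (reusing the hypothetical tables from the view definition above):

Select t1.col1, t2.col2
from table1 t1,
     table2 t2
where t1.col1 = t2.col2;
-- Only the two tables the report actually needs are joined,
-- rather than the six-table join hidden inside myview.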
