On the Past, Present and Future of OLAP
Recent advances in column oriented databases and the proliferation of self-service BI makes a lot of us wonder: is OLAP/Cube technology still relevant? While the obvious answer is definitely yes, it is helpful to know the origins and evolution of these technologies, in order to predict their future.
These technologies were invented during the 1990’s due to mainly two reasons:
- It was (and still is) painfully complex to write SQL queries involving multiple dimensions/hierarchies, using aggregation functions at different granularities. Pivoting or dicing requires a not so intuitive rewriting of such queries
- Such queries were very slow, since traditional relational databases were at that point designed and optimized for transactions, not for analysis
Cube technology has successfully coped with these issues. It has provided a centralized dimensional model with hierarchies and measures, which both business and technical users can understand. Query languages like MDX are founded on a dimensional model. Performance was tackled by means of intelligent pre-aggregation: proprietary structures in MOLAP, caching ROLAP. What Cube technologies also incurred though, was the creation of a parallel universe, with redundancy and ETL process between them.
So almost after 20 years, what has changed and what not?
Database systems have become so fast that the performance argument may not be that relevant anymore… In an ideal world, there would be no need to store data twice, once for transactional and once for analytical purposes. Nevertheless, this is still rather a vision than a reality. We are going to get there but not anytime soon.
On the other hand, the need for a dimensional model is still there, cause this has been a very successful logical model. The world of analysis remains dimensional, regardless of the technical innovations. So no matter how we call it, the world of analysis remains dimensional.