
Is Big Data Replacing Data Warehouse? Busting the myth

For the past few decades, the Data Warehouse blazed trails; today, Big Data is the latest revolution in technology. One question that is often asked is whether Big Data will replace Data Warehousing.

Though Big Data and Data Warehousing have similarities, they are two different technologies, and the differences between them are significant. Before we delve into those differences, it is important to know what Data Warehousing and Big Data are. Broadly, a big data solution is a technology characterized by volume, velocity and variety, whereas data warehousing is an architectural concept in data computing.

First, let us look at the broad similarities between the two:

• Both hold a lot of data
• Both can be used for reporting
• Both are kept on electronic storage devices

Now, let us look a bit more deeply at both technologies:

Data Warehousing

Data Warehousing refers to extracting data from one or more homogeneous or heterogeneous data sources, transforming it, and then loading it into a data repository for analysis. This analysis supports better judgement for improving performance and can be used for reporting. The data repository produced by this process is the data warehouse. It is a conceptual architecture aimed at storing structured, subject-oriented, time-variant, non-volatile data for decision making. A Data Warehouse typically stores historical data: a copy of transaction data specifically structured for query and analysis. It traditionally brings together data from many transactional and operational systems and presents it as a consolidated, authoritative version to decision makers at all levels of the organization. A well-designed data warehouse lets us access, report on and analyze that information from every relevant angle, which yields consistent and accurate results.
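The extract, transform and load steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems (`crm_rows`, `billing_rows`), the `sales` table and its columns are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical records from two different operational systems (assumed data).
crm_rows = [
    {"customer": " Alice ", "amount": "120.50", "date": "2023-01-05"},
    {"customer": "Bob", "amount": "80", "date": "2023-01-06"},
]
billing_rows = [
    ("alice", 30.0, "2023-01-07"),
]

def extract():
    """Pull rows from each source and tag them with their origin."""
    for row in crm_rows:
        yield "crm", row["customer"], row["amount"], row["date"]
    for name, amount, day in billing_rows:
        yield "billing", name, amount, day

def transform(records):
    """Normalize names and types so every source fits one schema."""
    for source, name, amount, day in records:
        yield source, name.strip().title(), float(amount), day

def load(conn, records):
    """Write the cleaned rows into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(source TEXT, customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(conn, transform(extract()))

# Reporting query over the consolidated, structured data.
report = conn.execute(
    "SELECT customer, SUM(amount) FROM sales "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(report)  # [('Alice', 150.5), ('Bob', 80.0)]
```

Because the data is cleaned and given a fixed structure before it is loaded, the reporting query at the end stays simple and returns one consistent figure per customer, whichever system the rows came from.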

Big Data

Big Data is a technology used to store data from various sources and to manage huge volumes of it, on the scale of exabytes (1 billion GB) and zettabytes (1 trillion GB). It can store all kinds of data, whether structured, semi-structured or unstructured, including video, audio and free text, while using cheaper storage devices. The data is not processed in one place; it is spread across several servers for faster processing and is stored in its native format, without any up-front planning or modelling. To use the data, rules must be applied to it at the time a report is produced.
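The idea of storing data in its native format and applying rules only when a report is needed is often called schema-on-read. A minimal Python sketch follows. The records in `raw_store` and the function `read_sensor_readings` are made up for illustration, and a plain list stands in for a distributed store.

```python
import json

# Hypothetical raw records, kept exactly as they arrived (assumed data).
raw_store = [
    '{"type": "tweet", "user": "ann", "text": "great service"}',
    '{"type": "sensor", "device": "m-17", "reading": 21.5}',
    "free-form support note: customer called about late delivery",
    '{"type": "sensor", "device": "m-17", "reading": 22.1}',
]

def read_sensor_readings(lines):
    """Apply the rules at read time: keep only parseable sensor records."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unstructured text stays in the store; this report skips it
        if isinstance(record, dict) and record.get("type") == "sensor":
            yield record["device"], float(record["reading"])

readings = list(read_sensor_readings(raw_store))
average = sum(value for _, value in readings) / len(readings)
print(readings, round(average, 2))
```

Nothing was rejected or reshaped when the records were stored; a different report could apply different rules to the same raw data, for example reading only the tweets.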

Big Data is characterized by the volume, variety and velocity of the data, the 3Vs named by industry analyst Doug Laney in the early 2000s. In other words, Big Data is defined by the size of the data, the speed at which it arrives and the range of forms it takes:

• Volume - Organizations collect an enormous amount of data from multiple sources, which may include business transactions, social media, and sensor or machine-to-machine data. Today, newer technologies like Hadoop have made it much easier to store huge amounts of data.
• Velocity - The data streams in at unprecedented speed and must be handled in a timely manner. RFID tags, sensors and smart metering all call for dealing with these loads of data in near-real time.
• Variety - The data arrives in different formats: some is structured, such as numeric data in traditional databases, and some is unstructured, such as text documents, email, video, audio, stock ticker data and financial transactions.

Finally, let's have a quick look at how Data Warehouse and Big Data differ:

• In a Data Warehouse, the data is in structured form, whereas in Big Data the data is largely unstructured or semi-structured.
• Data in a Data Warehouse is cleansed and transformed before it is loaded, whereas Big Data holds raw data.
• A Data Warehouse stores large volumes of data, while Big Data stores enormous volumes.
• The cost of storage is comparatively high in a Data Warehouse, whereas the cost of storage is low in Big Data.
• Data in a Data Warehouse is highly secured; the same cannot yet be said of Big Data, whose security is largely open source and is still improving.

Big Data technologies are focused on advanced analytics and can be viewed as a modernization strategy for data archives. Data Warehouses were mostly built for reporting, OLAP and performance management. Hence, we can rightly state that Big Data is a complementary technology and not a replacement for a Data Warehouse. The two co-exist, each used according to business requirements.
