
There are some key differences between SAP HANA and Hadoop that make each one ideal for different kinds of data within an enterprise. Using integration technologies like SAP HANA Vora, you can make Hadoop data available within the HANA framework. Knowing when to use each solution will give you the ability to do more with your data so you can make better decisions.

The HANA Way

SAP HANA is designed to give you fast access to the data you need at your fingertips. This is often called enterprise data, and it includes bids, sales and purchase orders, parts inventory, manufacturing data, shipping schedules, and accounting. HANA stores application data in main memory, primarily in columns, unlike traditional databases that store data in rows. Because a query can read only the columns it needs, this columnar format, combined with HANA's in-memory architecture, makes data available almost in real time.
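To make the row-versus-column distinction concrete, here is a toy sketch in Python. It is only an illustration of the two layouts under simplified assumptions (a handful of hypothetical order records held in plain Python structures). It is not how SAP HANA organizes data internally. It shows why summing a single attribute is a natural fit for a columnar layout: the query touches one list instead of every full record.

```python
# Toy illustration of row-oriented vs column-oriented layouts.
# Hypothetical data; this is NOT SAP HANA's internal storage format.
from typing import Dict, List

# Row-oriented: each record's fields are stored together.
rows: List[Dict[str, object]] = [
    {"order_id": 1, "part": "bolt", "qty": 100, "amount": 25.0},
    {"order_id": 2, "part": "nut", "qty": 250, "amount": 12.5},
    {"order_id": 3, "part": "gear", "qty": 10, "amount": 80.0},
]

# Column-oriented: each attribute's values are stored contiguously.
columns: Dict[str, list] = {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_row_store(records: List[Dict[str, object]]) -> float:
    # Visits every whole record just to extract one field.
    return sum(float(r["amount"]) for r in records)


def total_amount_column_store(cols: Dict[str, list]) -> float:
    # Reads only the single column the query needs.
    return sum(cols["amount"])


print(total_amount_row_store(rows))        # 117.5
print(total_amount_column_store(columns))  # 117.5
```

Both functions return the same answer. The difference lies in how much data each has to walk through, which matters greatly at enterprise scale.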

The Hadoop Way

Hadoop is an open-source framework for storing massive amounts of unstructured data on distributed, clustered hardware. It's designed for big data, such as data from sensors, cameras, RFID readers, office documents, social media, scans, and more. As you can probably guess, that kind of data does not fit neatly in a column or a row. However, this data can be invaluable if you can find ways to format and read it so that you can compare it with other data. Hadoop's biggest benefit is that it runs on inexpensive commodity hardware, so you can store enormous amounts of data cheaply and process it over time.

How to Use Them Together: SAP HANA & Hadoop Integration

SAP offers two ways to use HANA and Hadoop in concert. The first is Apache Spark, a fast engine for the large-scale data processing that Hadoop workloads require. Spark lets you structure your Hadoop data so that it can be accessed through the SAP HANA Spark Controller and queried using SQL. It's not the easiest tool to use, but it does combine SQL, streaming, and complex analytics.

SAP HANA Vora, the second option, makes this easier. It sits on top of Spark as an in-memory query engine that delivers enriched analytics on Hadoop data. Through its interactive interface, Vora allows data scientists to model data and process it for visualization or analysis. It can work exclusively on Hadoop, providing that interface to your big data, or with HANA to extend your analytics across both enterprise data and big data. Although Vora was designed to perform Online Analytical Processing (OLAP) for big-data analysis, its role has grown because it can both read HANA data and write back to it. This eliminates the need to export HANA data and move it to HDFS to run analysis. Instead, Vora builds structured data hierarchies for Hadoop data that it can hold in memory and query at speeds similar to HANA's.

What Hadoop Can and Cannot Do

Hadoop is ideal for analyzing and working with a wide variety of data sources and types, such as social media feeds, office documents, and charts and graphs: basically anything that would be cumbersome to manage in database tables. It's designed to help you see patterns in massive amounts of data, such as social media topics and sentiments, purchasing patterns, and trends that feed recommendation engines like Amazon's. Hadoop can store all of your raw data, not just processed data, including older data that no longer needs to be held in memory and can be moved to longer-term storage.

In short, Hadoop offers enterprises one of the most affordable ways to consume massive amounts of data that you don't need on a daily basis. Because you don't need that data in real time, you can process it with batch jobs.
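Hadoop batch jobs are classically written in the MapReduce style: a map step emits key/value pairs from raw records, a shuffle groups them by key, and a reduce step combines each group. The sketch below simulates those three phases in a single Python process on a few hypothetical social media posts. The function names are made up for illustration and are not Hadoop APIs. A real job would run the same phases distributed across a cluster.

```python
# Single-process simulation of the MapReduce pattern used for Hadoop batch jobs.
# Function names are hypothetical; real Hadoop distributes these phases.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for each word in one raw, unstructured record.
    for word in record.lower().split():
        yield word.strip(".,!?#"), 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group every emitted value by its key, as happens between map and reduce.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    # Combine each key's values into one result.
    return {key: sum(values) for key, values in grouped.items()}


def run_batch_job(records: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for record in records for pair in map_phase(record))
    return reduce_phase(shuffle_phase(pairs))


posts = [
    "Loving the new boots!",
    "Boots arrived late.",
    "New jacket, new boots.",
]
counts = run_batch_job(posts)
print(counts["boots"], counts["new"])  # 3 3
```

Nothing here needs an immediate answer, which is exactly the kind of work that can run overnight on cheap storage.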

On the other hand, using Hadoop for near-real-time transactional data processing would be a disaster, because Hadoop is built for batch processing. Transactional data is typically very complex and needs to be handled in milliseconds, not overnight. In fact, using Hadoop for any application that requires fast processing of data, such as online help or searchable websites, would not keep your customers happy.

What HANA Can and Cannot Do

SAP HANA, by contrast, stores data in memory, which makes it fast to access but expensive to keep. HANA is designed to store transactional data that is accessed and aggregated often and in different ways. Everything that Hadoop is not good for, HANA excels at: online transactional processing, customer-facing search and query, and anything that demands fast performance.

However, in-memory storage is many times more expensive per gigabyte than disk storage. Consequently, HANA may not be cost-effective for storing and analyzing very large data sets, such as census data or seasonal buying trends for the past five years.

Why You Might Want Both

Now you can see why you might want to have both transactional and big data. Using SAP HANA Vora, you could take big data, such as seasonal buying trends for the past five years, and compare it in real time with the transactional data that tells you what's happening in your online store. This allows you to spot issues and opportunities much faster. The combination could show you that the spike you are seeing today might last only three months, so you can take maximum advantage of it, knowing its lifespan. At the same time, you would avoid mistaking it for a long-term trend and overstocking merchandise that will be hard to sell in a couple of months.
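The comparison described above can be sketched in a few lines. This is a simplified illustration under explicit assumptions: the historical monthly sales figures, the 20% tolerance, and the function name `classify_spike` are all hypothetical. In practice the history would come from the big-data side and the current figure from live transactional data. The idea is to measure today's sales against the average for the same month in past years before deciding whether a spike is seasonal or something new.

```python
# Hypothetical sketch: compare current sales (transactional data) with the
# same month's average over past years (historical big data).
# All figures and the tolerance value are made up for illustration.
from statistics import mean
from typing import Dict, List

# Past unit sales for one product, keyed by month (1-12), one value per year.
history: Dict[int, List[int]] = {
    11: [900, 950, 1000, 1050, 1100],
    12: [1500, 1600, 1550, 1650, 1700],
}


def classify_spike(month: int, current_units: int, tolerance: float = 0.2) -> str:
    """Label current sales relative to the seasonal baseline for that month."""
    baseline = mean(history[month])
    ratio = current_units / baseline
    if ratio > 1 + tolerance:
        return "above seasonal norm"  # not explained by seasonality alone
    if ratio < 1 - tolerance:
        return "below seasonal norm"
    return "seasonal"  # matches the usual pattern for this month


print(classify_spike(12, 1620))  # seasonal
print(classify_spike(11, 1600))  # above seasonal norm
```

A December spike that matches previous Decembers is labeled seasonal, which is the signal not to overstock for the long term.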

Getting big data and transactional data to work together takes expertise and a deep understanding of both data science and SAP systems. To learn more about deploying SAP HANA and Hadoop in your enterprise, talk to Symmetry, your SAP-certified expert.

Matt Lonstine, Director of Delivery

A leader and SAP technical veteran, Matt Lonstine oversees the SAP technologists who make up Symmetry's Delivery team, while providing strategic guidance to customers. Matt has managed and executed some of Symmetry's most complex projects, with a current focus on HANA, virtualization, disaster recovery, system hardening, and migrations to Symmetry's SAP Cloud environment.