SAP HANA is great for fast access to high-value, frequently used data, but sometimes you need to look at massive amounts of data to see bigger patterns and trends. HANA is ideal for enterprise data that you need instant access to, such as daily transactions like sales orders, purchase orders, parts inventory, and shipping schedules. However, there is also a lot of data that you might need to track over the long term.
Hadoop, on the other hand, is designed for big data, such as readings from local and remote sensors, cameras, RFID readers, and more. Hadoop stores data in less structured ways and is ideal for large information sources, such as web logs, social media, office documents, graphs, and anything that doesn’t fit easily into rows and columns.
The biggest difference, though, is that SAP HANA keeps structured enterprise data in memory, while Hadoop stores huge amounts of raw data on disk farms.
Big Data Means Distributed File Systems
As you might guess, you can’t access unstructured, raw data quickly. SAP relies on Apache Spark to impose structure on files stored in the Hadoop Distributed File System (HDFS) and make them searchable via SQL. Spark is a fast engine for large-scale data processing that combines SQL, streaming, and complex analytics. Its drawback is that it doesn’t have an easy, familiar web-based interface.
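To make the idea of structuring raw files so they can be queried with SQL a little more concrete, here is a minimal sketch. It is not Spark or Vora code: Python’s built-in sqlite3 in-memory database stands in for a SQL engine, and a few made-up web-log lines stand in for raw files in HDFS. The log format, the `web_log` table, and the helper names are all hypothetical.

```python
import sqlite3

# Hypothetical raw web-log lines, standing in for unstructured files in HDFS.
RAW_LOG = """\
2024-01-05 10:01:22 GET /products/42 200
2024-01-05 10:01:25 GET /products/17 404
2024-01-05 10:02:03 GET /products/42 200
2024-01-05 10:03:41 POST /cart 200
"""


def parse_line(line):
    """Impose a schema (day, time, method, path, status) on one raw log line."""
    day, time, method, path, status = line.split()
    return (day, time, method, path, int(status))


def load_structured(raw):
    """Parse every non-empty line into rows and load them into a SQL table."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a SQL engine
    conn.execute(
        "CREATE TABLE web_log "
        "(day TEXT, time TEXT, method TEXT, path TEXT, status INTEGER)"
    )
    rows = [parse_line(line) for line in raw.splitlines() if line.strip()]
    conn.executemany("INSERT INTO web_log VALUES (?, ?, ?, ?, ?)", rows)
    return conn


conn = load_structured(RAW_LOG)
# Once the data has a schema, it answers ordinary SQL questions.
hits = conn.execute(
    "SELECT path, COUNT(*) FROM web_log "
    "WHERE status = 200 GROUP BY path ORDER BY path"
).fetchall()
print(hits)  # [('/cart', 1), ('/products/42', 2)]
```

The parsing step is where the structure comes from; after it, the same question can be asked in plain SQL instead of custom code, which is the role Spark plays for HDFS data at far larger scale.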
Vora is an in-memory query engine that plugs into Spark to deliver enriched analytics on Hadoop data. Vora gives you an interactive interface to model the data and process it for visualization or analysis. It can work alone to provide that interface to your big data, or together with HANA to extend your analytics across both enterprise data and big data.
Vora was originally used to provide Online Analytical Processing (OLAP) for big-data analysis; however, it has evolved because of its bidirectional ability to read HANA data and write back to it. This lets Vora users run analyses that span both data sources without first moving HANA data into HDFS.
In short, Vora bridges the gap between the HANA data you access daily and the data stored in Hadoop.
How SAP HANA Vora Works
Vora offers three tools, a data browser, a SQL editor, and a data modeler, all accessible from an intuitive web interface. This means that your data team can get what they need without struggling with highly technical tools and programming languages such as Python, Java, and Scala.
Vora’s data browser allows you to see the tables, views, dimensions, and cubes that exist in the Vora engine. You can also preview data, filter and refresh columns, and download the data. The SQL editor makes it easier to run queries on the Vora engine, displaying compilation errors and warnings alongside the query results. The data modeler makes it easier to create SQL views, dimensions, and cubes without changing applications. It makes data modeling in Hadoop as easy as data modeling in HANA.
Simplifying Big Data
Most standard business functions, such as currency conversions, units of measurement, and hierarchies, are built into Vora. These built-in functions reduce the time it takes to create, join, or aggregate data models. They also allow modelers to gain useful insights into huge data sets with just a data model and a query.
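As a rough illustration of the work a built-in currency conversion saves, the sketch below normalizes order amounts to a single currency before aggregating them by region. It does not show Vora’s own conversion functions; the exchange rates, currencies, regions, and function names are invented for the example, and a real system would look rates up by date.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical exchange rates to EUR (a real system would use dated rate tables).
RATES_TO_EUR = {
    "EUR": Decimal("1.00"),
    "USD": Decimal("0.90"),
    "GBP": Decimal("1.15"),
}

# Hypothetical sales orders: (region, amount, currency).
ORDERS = [
    ("EMEA", Decimal("100.00"), "EUR"),
    ("EMEA", Decimal("200.00"), "GBP"),
    ("AMER", Decimal("50.00"), "USD"),
    ("AMER", Decimal("150.00"), "USD"),
]


def convert(amount, currency, rates=RATES_TO_EUR):
    """Convert an amount to EUR; raises KeyError for an unknown currency."""
    return amount * rates[currency]


def revenue_by_region(orders):
    """Sum order amounts per region after converting each one to EUR."""
    totals = defaultdict(Decimal)  # missing regions start at Decimal("0")
    for region, amount, currency in orders:
        totals[region] += convert(amount, currency)
    return dict(totals)


print(revenue_by_region(ORDERS))
```

Without a shared conversion step, every model that aggregates money across regions has to repeat this lookup-and-multiply logic, which is the kind of duplication built-in business functions are meant to remove.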
Compare that with working in Spark alone: before Vora, your data scientists had to deal with Spark directly. Vora sits on top of Spark, giving analysts more seamless integration with HANA and requiring less coding and debugging. Vora also comes with enterprise-level security rather than data security alone, making life easier for you at audit time and harder for hackers.
Because Need-to-Have (Enterprise) data is stored in expensive memory in HANA and Nice-to-Have (Big) data is stored on cheap drives in Hadoop, using big data to support enterprise data isn’t as easy or as fast as most businesses need. Vora helps bridge that gap.
SAP HANA Vora is an in-memory processing engine that runs on a Hadoop cluster and builds structured data hierarchies for Hadoop data. This makes it possible to combine that data with HANA data so you can run OLAP analysis in memory. Vora’s data modeling function also allows modelers to preview data models for analytics much faster, speeding up the whole process. Vora can also virtualize Hadoop data in order to join it with HANA data for reporting.
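The idea of joining two data stores in place, rather than copying one into the other first, can be sketched with Python’s sqlite3 module as a loose analogy. This is not how Vora or HANA are called: the main in-memory database stands in for HANA, a second database attached with SQLite’s `ATTACH DATABASE` stands in for Hadoop-side data, and the `customers` and `page_views` tables are hypothetical.

```python
import sqlite3

# The main connection stands in for HANA (structured enterprise data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme Corp"), (2, "Globex"), (3, "Initech")],
)

# A second, separately attached database stands in for Hadoop-side data.
conn.execute("ATTACH DATABASE ':memory:' AS bigdata")
conn.execute("CREATE TABLE bigdata.page_views (customer_id INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO bigdata.page_views VALUES (?, ?)",
    [(1, "/pricing"), (1, "/docs"), (2, "/pricing"), (1, "/pricing")],
)

# One query joins both sources where they live; no rows are copied between them.
# LEFT JOIN keeps customers with no page views (their count is 0).
report = conn.execute(
    """
    SELECT c.name, COUNT(v.url) AS views
    FROM customers AS c
    LEFT JOIN bigdata.page_views AS v ON v.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
    """
).fetchall()
print(report)  # [('Acme Corp', 3), ('Globex', 1), ('Initech', 0)]
```

The report mixes an enterprise attribute (customer name) with a behavioral metric (page views) in a single query, which is the kind of combined view that virtualizing Hadoop data alongside HANA data is meant to enable.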
How to Get Started
Don’t try this at home without at least a robust understanding of your data, data-science best practices, and SAP expertise. Combining big data with enterprise data would be challenging even if the data structures were the same. But they’re not. SAP HANA stores data in columns, Hadoop holds largely unstructured data, and traditional SQL databases are row-based. These fundamentally different data structures create a number of headaches in getting data to line up for comparison and analysis.
If your data science team is already overworked, as it is at most companies, you might want to outsource your SAP HANA Vora training and deployment. Get an SAP-certified team that can recognize how your data scientists and data modelers work, understand what you’re trying to achieve, and then combine those variables into a relatively easy-to-use solution that gives you access to both enterprise data and big data.
It’s not easy, but in the long run, combining big data with enterprise data could help you move your decision making to a whole new level, uncover some new opportunities, and help you spot and fix challenges before they become problems.