MapReduce, SQL-MapReduce® Resources
What is MapReduce?
MapReduce, or map reduce, is a programming framework developed by Google to simplify data processing across massive data sets. As people rapidly increase their online activity and digital footprint, organizations are finding it vital to quickly analyze the huge amounts of data their customers and audiences generate to better understand and serve them. MapReduce helps them do so by splitting a job into a map step that processes records in parallel and a reduce step that combines the intermediate results.
You can learn more about MapReduce at www.mapreduce.org.
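To make the map and reduce steps concrete, here is a minimal single-machine sketch in Python that counts words across a few documents. It only illustrates the programming model: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real MapReduce system would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["a b a", "b c"])` counts each word across both documents. Because every map call sees only one record and every reduce call sees only one key, the work can be spread over many nodes without changing the functions themselves.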
What is SQL-MapReduce®?
SQL-MapReduce is a framework created by Aster Data to allow developers to write powerful and highly expressive SQL-MapReduce functions in languages such as Java, C#, Python, C++, and R and push them into the platform for advanced in-database analytics. Analysts can then invoke SQL-MapReduce functions using standard SQL through Aster Data's nCluster, the first MPP analytic platform that allows applications to be fully embedded within the database engine to enable ultra-fast, deep analysis of massive data sets.
SQL-MapReduce functions are simple to write and are seamlessly integrated within SQL statements. They rely on SQL queries to manipulate the underlying data and provide input. The functions can procedurally manipulate such input data and provide outputs that can be further consumed by SQL queries or written into tables within the database.
MapReduce functions seamlessly integrate into SQL queries
Aster Data's customers use SQL-MapReduce to ask questions of their data that were previously impossible to answer, or that returned results too slowly to meet service level agreements. In these short tutorials and case studies, you will learn how companies are writing SQL-MapReduce functions for:
- Fraud Detection – A large online gaming company catches cases of fraud that previous queries could not detect. The company also reduced its fraud analytics cycle time from one week to 15 minutes, with query response time dropping from 90 minutes to 90 seconds.
- Graph Analysis – A social media company uses the SQL-MapReduce function nPath for graph analysis to understand how its users are connected and enhance the networks of its community.
- Sharing Behavior – ShareThis uses MapReduce to reduce query times as it analyzes the items that people share online to understand sharing behavior.
- Sessionization – A social network uses the SQL-MapReduce function "sessionize" to break user data into sessions based on the length of time between activity on the network. With sessionize, the SQL code dropped from more than 1,000 lines to fewer than 100 and performance improved dramatically.
- Search Behavior – An online media company uses the SQL-MapReduce function nPath to understand the paths its users follow after conducting a search, and applies that understanding to improve search results.
- Transformations – Where data transformations previously required multiple complex self-joins, a media company now uses the SQL-MapReduce function nPath to make a single pass over its data, significantly simplifying the code and improving performance.
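The sessionization case above describes splitting each user's activity into sessions based on the time between events. The following Python sketch shows that idea in its simplest form. It is not Aster Data's "sessionize" function: the name `sessionize`, the `(user_id, timestamp)` input shape, and the 30-minute default timeout are all assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def sessionize(events, timeout_seconds=1800):
    """Assign a session number to each (user_id, timestamp) event.

    events: iterable of (user_id, timestamp_in_seconds) tuples, in any order.
    Returns a list of (user_id, timestamp, session_id) tuples, where
    session_id restarts at 0 for each user (assumed convention).
    """
    result = []
    # Sort by user, then by time, so each user's events are contiguous and ordered.
    ordered = sorted(events, key=itemgetter(0, 1))
    for user_id, user_events in groupby(ordered, key=itemgetter(0)):
        session_id = 0
        previous = None
        for _, timestamp in user_events:
            # A gap longer than the timeout starts a new session.
            if previous is not None and timestamp - previous > timeout_seconds:
                session_id += 1
            result.append((user_id, timestamp, session_id))
            previous = timestamp
    return result
```

Each user's events are processed independently in a single ordered pass, which is the kind of per-partition procedural logic that is awkward to express with self-joins in plain SQL.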
Writing with SQL-MapReduce
In this tutorial series, we explain the inner workings of Aster Data's integration of SQL with MapReduce and show how to write and call a MapReduce function with SQL-MapReduce.
- SQL-MapReduce Session 1: The Basics of SQL and MapReduce Integration – Peter Pawlowski explains the benefits and limitations of SQL and MapReduce for organizations pushing their analytics to the next level.
- SQL-MapReduce Session 2: nPath – Peter Pawlowski explains how nPath, a SQL-MapReduce function for the analysis of ordered data, is integrated into a SQL query.
- SQL-MapReduce Session 3: Writing a SQL-MapReduce Function – Eric Friedman describes the execution model a developer writing a SQL-MapReduce function needs to consider. Friedman then shows how the "sessionize" function was written.
The Aster-Hadoop Data Connector uses Aster Data's patent-pending SQL-MapReduce capabilities for two-way, high-speed data transfer between Apache Hadoop and Aster Data's massively parallel analytic platform.
The integration of Aster Data and Hadoop allows businesses to leverage Hadoop for data collection and preparation, while they use Aster Data to perform complex data analytics and processing. The connector utilizes SQL-MapReduce functions for ultra-fast, two-way data loading between HDFS (Hadoop Distributed File System) and Aster Data's MPP analytic platform.
Key advantages of the Aster-Hadoop HDFS Data Connector include:
- High-performance: Fast, parallel data transfer between Hadoop and Aster nCluster.
- Ease-of-use: Analysts can invoke a SQL command to import the results of Hadoop MapReduce jobs for deeper data analysis. Aster automatically parallelizes the load.
- Data Consistency: Aster Data's data integrity and transactional consistency capabilities treat the data load as a 'transaction,' ensuring that the data load or export is always consistent and can be carried out while other queries are running in parallel in Aster.
- Extensibility: Customers can extend the Connector using SQL-MapReduce to customize it for their specific environment.
Read much more on In-Database MapReduce on our blog