Amazon EMR with the MapR Distribution for Hadoop

Amazon EMR makes it easy to provision and manage Hadoop in the AWS Cloud. Hadoop is available in multiple distributions, and Amazon EMR lets you choose either the Amazon Distribution or the MapR Distribution for Hadoop.

MapR delivers on the promise of Hadoop with a proven, enterprise-grade platform that supports a broad set of mission-critical and real-time production uses. MapR brings dependability, ease of use, and world-record speed to Hadoop, NoSQL, database, and streaming applications in one unified big data platform. MapR is used by financial services, retail, media, healthcare, manufacturing, telecommunications, and government organizations, as well as by leading Fortune 100 and Web 2.0 companies. Investors include Lightspeed Venture Partners, Mayfield Fund, NEA, and Redpoint Ventures.


Features

Enhanced ease-of-use and reliability for Apache HBase applications

Instant Recovery: MapR M7 delivers database high availability. The system automatically recovers from any node failure within seconds, allowing the application to continue operating with no impact.
Zero HBase Administration: MapR M7 lets users work with tables without running any separate services, such as RegionServers. In addition, M7 eliminates compactions and splits regions seamlessly, so the administrator does not need to run these operations manually.
Continuous Low Latency: MapR M7 provides consistently low latency by avoiding the garbage-collection pauses and compactions that degrade performance. Lower disk I/O and a smaller on-disk footprint make database operations fast and predictable.
Full Data Protection with Snapshots: M7 delivers full data protection for HBase. Snapshots enable point-in-time recovery of tables to protect against user or application errors. M7 extends snapshots to cover all data: both files and tables. HBase tables can be read directly from snapshots and recovered directly, without the downtime required to restore HBase in other distributions.
Business Continuity with Mirroring: Mirroring automatically replicates differential data across clusters in real time. It can be used to build disaster recovery solutions for databases or to provide read-only access to data from multiple locations. Because M7 does not require RegionServers to be reconstructed, databases can be brought up instantly on the mirrored site if the active site goes down.

Industry-standard interfaces

NFS: MapR provides random read/write access and a standard NFS interface so that users can mount the cluster and leverage standard file-based applications with Hadoop, including Linux utilities, file browsers and non-Java applications. When using MapR on Amazon EMR, the NFS interface is pre-mounted at /mapr.
ODBC: MapR provides an ODBC driver for Hive that conforms to the ODBC 3.52 specification, so users can run any ODBC-compliant BI tool or SQL query builder against Hadoop. MicroStrategy, Tableau, Excel, Toad, and many other commercial and open source tools are supported.
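Because the cluster is exposed as an ordinary NFS mount, standard file APIs can perform random reads and in-place writes, which HDFS clients cannot do (HDFS files are write-once and append-only). The sketch below uses nothing but Python's built-in file I/O. It assumes files live under /mapr/<cluster-name>/; the cluster name "my.cluster.com" and the helper names are hypothetical, chosen for illustration.

```python
import posixpath

# Assumption: the NFS mount is at /mapr (as stated above) and each cluster
# appears as a subdirectory; "my.cluster.com" is a hypothetical cluster name.
MAPR_NFS_ROOT = "/mapr"


def cluster_path(cluster: str, *parts: str) -> str:
    """Build a path inside the NFS-mounted cluster (hypothetical helper)."""
    return posixpath.join(MAPR_NFS_ROOT, cluster, *parts)


def overwrite_bytes(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes in place at `offset` (a random write).

    "r+b" opens an existing file for reading and writing without truncating it,
    so the rest of the file is left untouched.
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def read_bytes(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` (a random read)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

For example, `overwrite_bytes(cluster_path("my.cluster.com", "logs", "app.dat"), 7, b"mapr!!")` would patch six bytes in the middle of an existing file on the cluster, the same way it would on a local disk.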

Management

Deployment: Amazon EMR with MapR fully automates the provisioning, installation and configuration of the cluster, which can be launched via the AWS Management Console, CLI or API.
MapR Control System (MCS): MapR provides end-to-end monitoring and management for Hadoop, including hardware, storage, MapReduce and other components in the distribution.
CLI and REST API: All MCS capabilities are also exposed through the CLI and REST API. This enables users to obtain cluster information and perform operations programmatically. It also allows integration with third-party and custom monitoring/management systems.

Business continuity

File System High Availability: MapR provides a no-NameNode architecture that can tolerate multiple simultaneous failures with automatic failover and failback. The metadata is distributed and replicated, just like the data. With no NameNode, there is no practical limit to how many files can be stored, and there is no dependency on any external NAS.
MapReduce High Availability: MapR provides JobTracker HA with automatic failover and failback. If the active JobTracker fails, a JobTracker is automatically started on a different node, and all jobs and tasks continue to run without interruption.
Data Protection: MapR provides snapshots for point-in-time recovery, enabling users to recover from user and application errors. MapR uses redirect-on-write, so taking a snapshot copies no data; only blocks changed after the snapshot consume additional space, which avoids any impact on performance. Snapshots are guaranteed to be consistent, so all applications are supported.
Disaster Recovery: MapR provides mirroring between clusters, enabling disaster recovery across availability zones as well as hybrid deployments involving both on-premises and EMR clusters. For hybrid deployments, all MapR-based Hadoop distributions are supported, including EMC Greenplum MR and the Cisco UCS appliance. Only changed blocks are transferred, and all data is automatically compressed.
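The snapshot and mirroring behavior described above can be illustrated with a toy redirect-on-write volume. This is a minimal in-memory sketch of the general technique, not MapR's on-disk format or API: the `Volume` class, its block map, and the `mirror` helper are all hypothetical. A snapshot copies only the small logical-to-physical block map, writes always go to fresh physical blocks, and comparing the live map with a snapshot yields exactly the changed blocks a mirror needs to ship. Block deletion and compression during transfer are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Volume:
    """Toy redirect-on-write volume (illustrative only).

    `store` holds immutable physical blocks; `live` maps logical block
    numbers to physical block ids; each snapshot is a frozen copy of `live`.
    """
    store: Dict[int, bytes] = field(default_factory=dict)
    live: Dict[int, int] = field(default_factory=dict)
    snapshots: Dict[str, Dict[int, int]] = field(default_factory=dict)
    next_id: int = 0

    def write(self, block: int, data: bytes) -> None:
        # Redirect-on-write: never overwrite a physical block in place;
        # allocate a new one and repoint the live map at it.
        pid = self.next_id
        self.next_id += 1
        self.store[pid] = data
        self.live[block] = pid

    def snapshot(self, name: str) -> None:
        # Copies only the block map, not the data blocks themselves.
        self.snapshots[name] = dict(self.live)

    def read(self, block: int, snapshot: Optional[str] = None) -> bytes:
        mapping = self.live if snapshot is None else self.snapshots[snapshot]
        return self.store[mapping[block]]

    def changed_since(self, snapshot: str) -> Dict[int, bytes]:
        # Logical blocks whose mapping differs from the snapshot: the delta.
        base = self.snapshots[snapshot]
        return {b: self.store[p] for b, p in self.live.items() if base.get(b) != p}


def mirror(source: Volume, replica: Dict[int, bytes], since: str) -> int:
    """Send only changed blocks to a replica; return how many were sent."""
    delta = source.changed_since(since)
    replica.update(delta)
    return len(delta)
```

Note that the old contents of a rewritten block stay readable through the snapshot because the snapshot's map still points at the original physical block.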

Compression and performance

Compression: MapR automatically and transparently compresses all data that is not already compressed. This reduces disk and network I/O and increases performance. There is no need to manually compress files or modify applications to handle compression. Random reads and writes remain efficient because data is compressed in blocks, so only the blocks being accessed are decompressed, and compressed files remain splittable.
Performance: MapR features an advanced architecture that provides higher efficiency and parallelism while reducing disk and network I/O. MapR has set world records in performance benchmarks.
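Block-level compression is what keeps random access cheap: each block is compressed independently, so reading a byte range only requires decompressing the blocks that range overlaps, and any block boundary is a valid starting point for a reader (which is also what keeps files splittable). The sketch below illustrates the general idea with Python's standard `zlib` module; the 8 KB block size and function names are assumptions for illustration and do not describe MapR's actual block size, codec, or implementation.

```python
import zlib
from typing import List

BLOCK_SIZE = 8192  # hypothetical block size chosen for illustration


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split data into fixed-size blocks and compress each one independently."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def read_range(blocks: List[bytes], offset: int, length: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Return original[offset:offset + length], decompressing only the
    blocks that the requested range overlaps."""
    if length <= 0 or not blocks:
        return b""
    first = offset // block_size
    last = min((offset + length - 1) // block_size, len(blocks) - 1)
    # Only blocks first..last are decompressed; all others stay compressed.
    chunk = b"".join(zlib.decompress(blocks[i]) for i in range(first, last + 1))
    start = offset - first * block_size
    return chunk[start:start + length]
```

A read of a few hundred bytes from the middle of a large file touches one or two blocks, regardless of file size; a whole-file compression scheme would instead have to decompress everything before the requested offset.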

Editions

The M7 Edition is a complete distribution for Apache Hadoop that delivers ease of use, dependability and performance advantages for NoSQL and Hadoop applications. M7 has removed the trade-offs organizations face when looking to deploy a NoSQL solution. M7 provides scale, strong consistency, reliability and continuous low latency with an architecture that does not require compactions or background consistency checks.

The M5 Edition is also a complete distribution for Apache Hadoop that delivers enterprise-grade features for all file operations on Hadoop. Features include mirroring, snapshots, NFS HA, and data placement control, among others, aimed at the most demanding mission-critical environments.

The M3 Edition is the free edition of MapR's complete distribution for Hadoop. It delivers a platform with full random read-write capability that supports industry-standard interfaces (e.g., NFS, ODBC) and provides management, compression, and performance advantages.

MapR Feature	M7 Edition	M5 Edition	M3 Edition
Complete Distribution for Apache Hadoop	✓	✓	✓
Direct Access NFS	✓	✓	✓
Unlimited Scale	✓	✓	✓
World Record Performance	✓	✓	✓
MapR Control System (MCS)	✓	✓	✓
Volume-based Data Management	✓	✓	-
No NameNode High Availability	✓	✓	-
JobTracker High Availability	✓	✓	-
Snapshots for Files	✓	✓	-
Mirroring for Files	✓	✓	-
Rolling Upgrades	✓	✓	-
Instant Recovery for HBase applications	✓	-	-
Zero HBase Administration	✓	-	-
Continuous Low Latency for HBase	✓	-	-
Snapshots for HBase	✓	-	-
Mirroring for HBase	✓	-	-

Getting Started

The EMR Developer Guide includes detailed instructions on how to launch MapR on EMR using the AWS Management Console, CLI or API. To launch a MapR cluster using the AWS Management Console:

1. Access the EMR service on the AWS Management Console.
2. Click Create New Job Flow to start the Create a New Job Flow wizard, which launches the MapR cluster.
3. On the Define Job Flow pane of the wizard, select MapR M7, M5, or M3 from the Hadoop Version drop-down list.
4. Follow the remaining steps in the wizard to launch your job flow.

Support

AWS Premium Support customers may contact Amazon regarding any issues with MapR on EMR.

M5 and M7 users may also contact MapR 24x7 support directly by emailing support@mapr.com. All MapR users are welcome to post questions to the MapR Forums, which are continuously monitored by MapR.

Resources

EMR Developer Guide: Running MapR on EMR
MapR Academy: Free training videos on MapR, including administration and development.
MapR Documentation: Comprehensive technical documentation on MapR.