AWS Cloud
Get started with Amazon EMR

Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.

Amazon EMR securely and reliably handles a broad set of big data use cases, including log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics.

Amazon EMR Release Velocity

With versioned releases on Amazon EMR, you can easily select and use the latest open source projects on your EMR cluster, including applications in the Apache Hadoop and Spark ecosystems. Software is installed and configured by Amazon EMR, so you can spend more time on increasing the value of your data without worrying about infrastructure and administrative tasks.


Easy to Use

You can launch an Amazon EMR cluster in minutes. You don’t need to worry about node provisioning, cluster setup, Hadoop configuration, or cluster tuning. Amazon EMR takes care of these tasks so you can focus on analysis. 
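
As a rough illustration of launching a cluster programmatically, the sketch below builds a request for the `run_job_flow` call in boto3, the AWS SDK for Python. The cluster name, release label, instance type, instance count, and region are placeholder assumptions, and the role names are the EMR defaults, which may not exist in your account. Building the request is kept separate from sending it, so the first function runs without AWS credentials.

```python
def build_cluster_request(name, release_label, instance_type, instance_count):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All values passed in are illustrative; pick ones valid for your account.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Applications that Amazon EMR installs and configures on the cluster.
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running after its steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default EMR role names (assumed to exist already).
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Send the request to Amazon EMR; needs boto3 and AWS credentials."""
    import boto3  # imported here so the request builder works without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


# Placeholder values chosen only for illustration.
request = build_cluster_request("example-cluster", "emr-5.36.0", "m5.xlarge", 3)
```

Calling `launch_cluster(request)` would start a billable cluster, so it is left out of the example.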

Low Cost

Amazon EMR pricing is simple and predictable: You pay an hourly rate for every instance hour you use. You can launch a 10-node Hadoop cluster for as little as $0.15 per hour. Because Amazon EMR has native support for Amazon EC2 Spot and Reserved Instances, you can also save 50-80% on the cost of the underlying instances.
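
To make the arithmetic concrete, here is a small sketch. It assumes the quoted $0.15 per hour is the total rate for the 10-node cluster, which works out to roughly $0.015 per instance hour. The $100 on-demand figure in the last line is a hypothetical input, not a published price.

```python
QUOTED_CLUSTER_RATE = 0.15  # USD per hour for the quoted 10-node cluster
QUOTED_NODES = 10


def per_instance_rate(cluster_rate=QUOTED_CLUSTER_RATE, nodes=QUOTED_NODES):
    """Hourly rate per instance implied by a cluster-wide hourly rate."""
    return cluster_rate / nodes


def cluster_cost(nodes, hours, rate_per_instance_hour):
    """Total cost when every instance hour is billed at the same rate."""
    return nodes * hours * rate_per_instance_hour


def discounted_range(on_demand_cost, low_saving=0.50, high_saving=0.80):
    """Cost range after the 50-80% savings quoted for Spot and Reserved Instances."""
    return on_demand_cost * (1 - high_saving), on_demand_cost * (1 - low_saving)


rate = per_instance_rate()
one_day = cluster_cost(nodes=10, hours=24, rate_per_instance_hour=rate)
best_case, worst_case = discounted_range(100.0)  # hypothetical on-demand cost
```

Under these assumptions, running the 10-node cluster for 24 hours comes to about $3.60, and a $100 on-demand instance bill would fall to between roughly $20 and $50.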

Elastic

With Amazon EMR, you can provision one, hundreds, or thousands of compute instances to process data at any scale. You can easily increase or decrease the number of instances manually or with Auto Scaling, and you only pay for what you use.

Reliable

You can spend less time tuning and monitoring your cluster. Amazon EMR has tuned Hadoop for the cloud; it also monitors your cluster, retrying failed tasks and automatically replacing poorly performing instances.

Secure

Amazon EMR automatically configures Amazon EC2 firewall settings that control network access to instances, and you can launch clusters in an Amazon Virtual Private Cloud (VPC), a logically isolated network you define. For objects stored in Amazon S3, you can use Amazon S3 server-side encryption or Amazon S3 client-side encryption with EMRFS, with AWS Key Management Service or customer-managed keys.
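
One way server-side encryption with EMRFS has been expressed is as an `emrfs-site` configuration classification supplied when the cluster is created. The sketch below builds such a block. The property names reflect our understanding of the EMRFS settings and should be checked against the documentation for your release, since newer releases manage encryption through security configurations instead. The KMS key ID is a hypothetical placeholder.

```python
def emrfs_sse_configuration(kms_key_id=None):
    """Build an emrfs-site classification that enables S3 server-side encryption.

    Property names are assumptions to verify against the EMR release in use.
    """
    properties = {"fs.s3.enableServerSideEncryption": "true"}
    if kms_key_id is not None:
        # Use an AWS KMS key instead of S3-managed keys.
        properties["fs.s3.serverSideEncryption.kms.keyId"] = kms_key_id
    return [{"Classification": "emrfs-site", "Properties": properties}]


# "example-key-id" is a placeholder, not a real key.
sse_s3 = emrfs_sse_configuration()
sse_kms = emrfs_sse_configuration(kms_key_id="example-key-id")
```

The resulting list has the shape expected by the `Configurations` parameter of a cluster creation request.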

Flexible

You have complete control over your cluster. You have root access to every instance, you can easily install additional applications, and you can customize every cluster. Amazon EMR also supports multiple Hadoop distributions and applications.

Clickstream Analysis

Amazon EMR can be used to analyze clickstream data in order to segment users, understand user preferences, and deliver more effective ads.
Learn how Razorfish uses EMR for clickstream analysis »

Real-time Analytics

Consume and process real-time data from Amazon Kinesis, Apache Kafka, or other data streams with Spark Streaming on Amazon EMR. Perform streaming analytics in a fault-tolerant way and write results to Amazon S3 or HDFS.
Learn how Hearst uses Spark Streaming »

Log Analysis

Amazon EMR can be used to process logs generated by web and mobile applications. Amazon EMR helps customers turn petabytes of unstructured or semi-structured data into useful insights about their applications or users.
Learn how Yelp uses EMR to drive key website features »

Extract, Transform, Load (ETL)

Amazon EMR can be used to quickly and cost-effectively perform data transformation (ETL) workloads, such as sorting, aggregating, and joining large datasets.
Learn how Redfin uses transient EMR clusters for ETL »
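
For readers new to the three operations named above, here is a tiny local illustration in plain Python. It is not EMR-specific: on a cluster the same steps would typically run in a distributed engine such as Spark or Hive over far larger inputs. The sample records and function names are invented for the example.

```python
from collections import defaultdict

# Invented sample data: purchase records and a user lookup table.
orders = [
    {"user_id": 2, "amount": 30.0},
    {"user_id": 1, "amount": 12.5},
    {"user_id": 2, "amount": 7.5},
]
users = {1: "alice", 2: "bob"}


def total_by_user(rows):
    """Aggregate: sum the order amounts for each user_id."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["user_id"]] += row["amount"]
    return dict(totals)


def join_names(totals, names):
    """Join: attach each user's name, keeping only users present in both inputs."""
    return [
        {"user_id": uid, "name": names[uid], "total": total}
        for uid, total in totals.items()
        if uid in names
    ]


def sort_by_total(rows):
    """Sort: order the rows by total, largest first."""
    return sorted(rows, key=lambda row: row["total"], reverse=True)


report = sort_by_total(join_names(total_by_user(orders), users))
```

Here `report` lists bob first with a total of 37.5, followed by alice with 12.5.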

Predictive Analytics

Apache Spark on Amazon EMR includes MLlib for scalable machine learning algorithms, or you can use your own libraries. By storing datasets in memory, Spark can provide great performance for common machine learning workloads.
Learn how Intent Media uses Spark MLlib »

Genomics

Amazon EMR can be used to process vast amounts of genomic data and other large scientific data sets quickly and efficiently. Researchers can access genomic data hosted for free on AWS.
Learn about Apache Spark and Precision Medicine »

It's easy to get started with Amazon EMR. Follow our Getting Started Guide to launch your first Amazon EMR cluster and start analyzing data in a few clicks.

 
