Hadoop on EC2 vs Elastic Map Reduce
NickName:OckhamsRazor Ask DateTime:2013-03-03T02:31:55

I'm trying to evaluate the differences between these two options. Here are some pros and cons I can think of:

Elastic Map Reduce => Better support from Amazon, no need to administer the cluster, more expensive (?)

EC2 + Hadoop => More control over your Hadoop configuration, cheaper (?)

Has anyone benchmarked the performance of EC2 + Hadoop vis-à-vis EMR? Is there any significant difference in cost for large cluster deployments? What other differences exist?
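To make the cost part of the question concrete, here is a minimal sketch of the arithmetic, assuming EMR is billed as a per-instance-hour fee on top of the underlying EC2 price. The function name `cluster_cost` and both rates are hypothetical placeholders, not real AWS prices, and the sketch ignores the staff time needed to administer a self-managed cluster.

```python
def cluster_cost(nodes, hours, ec2_rate, emr_fee=0.0):
    """Total cost of running `nodes` instances for `hours` hours.

    ec2_rate: EC2 price per instance-hour.
    emr_fee:  EMR surcharge per instance-hour (0 for self-managed Hadoop).
    """
    return nodes * hours * (ec2_rate + emr_fee)


# Placeholder rates for illustration only, NOT real AWS prices.
EC2_RATE = 0.5
EMR_FEE = 0.125

self_managed = cluster_cost(100, 10, EC2_RATE)
on_emr = cluster_cost(100, 10, EC2_RATE, EMR_FEE)
print(f"EC2 + Hadoop: ${self_managed:.2f}, EMR: ${on_emr:.2f}")
```

Under this model the gap grows linearly with cluster size and runtime, so plugging in current prices for the instance type in question shows how much the EMR surcharge matters for a large deployment.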

Copyright Notice:Content Author:「OckhamsRazor」,Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/15177908/hadoop-on-ec2-vs-elastic-map-reduce

Answers
WestCoastProjects 2013-03-02T23:28:42

We use both approaches (EMR and EC2) at my job.

The advantages of EMR that Amar mentioned are more or less true, so if you want simplicity it may be the way to go.

But there are other considerations:

The version of Hadoop on EMR is far behind Apache head. It is approximately 0.20.205, whereas head is at 2.X, which is essentially three versions up (1.0, 1.1, 2.0...):

    hadoop@domU-12-31-39-07-B9-97:~$ ll hadoop*.jar
    lrwxrwxrwx 1 hadoop hadoop 73 Feb 5 12:00 hadoop-examples-0.20.205.jar -> /home/hadoop/.versions/0.20.205/share/hadoop/hadoop-examples-0.20.205.jar
    lrwxrwxrwx 1 hadoop hadoop 69 Feb 5 12:00 hadoop-test-0.20.205.jar -> /home/hadoop/.versions/0.20.205/share/hadoop/hadoop-test-0.20.205.jar
    lrwxrwxrwx 1 hadoop hadoop 69 Feb 5 12:00 hadoop-core-0.20.205.jar -> /home/hadoop/.versions/0.20.205/share/hadoop/hadoop-core-0.20.205.jar
    lrwxrwxrwx 1 hadoop hadoop 70 Feb 5 12:00 hadoop-tools-0.20.205.jar -> /home/hadoop/.versions/0.20.205/share/hadoop/hadoop-tools-0.20.205.jar
    lrwxrwxrwx 1 hadoop hadoop 68 Feb 5 12:00 hadoop-ant-0.20.205.jar -> /home/hadoop/.versions/0.20.205/share/hadoop/hadoop-ant-0.20.205.jar

As a direct consequence, I had to re-code and restructure my Map/Reduce program because contrib modules were missing from the older version running on EMR.

You do not have as much opportunity to use non-Map/Reduce algorithms as you would with an updated version of M/R.

With EMR you also give up the flexibility to mix and match versions of the Hadoop ecosystem.
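The version check in the answer above can be automated. The following is a minimal sketch that pulls the Hadoop version out of a jar listing like the one shown and compares it against a minimum release. The names `hadoop_versions` and `JAR_VERSION`, and the assumption that jars are named `hadoop-<module>-<version>.jar`, are illustrative and come only from the listing, not from any Hadoop tool.

```python
import re

# Matches names such as "hadoop-core-0.20.205.jar" and captures "0.20.205".
JAR_VERSION = re.compile(r"hadoop-[a-z]+-(\d+(?:\.\d+)*)\.jar")


def hadoop_versions(listing):
    """Return the set of distinct versions (as int tuples) found in a listing."""
    return {
        tuple(int(part) for part in match.group(1).split("."))
        for match in JAR_VERSION.finditer(listing)
    }


# Two lines taken from the listing in the answer.
SAMPLE = """\
lrwxrwxrwx 1 hadoop hadoop 73 Feb 5 12:00 hadoop-examples-0.20.205.jar -> /home/hadoop/.versions/0.20.205/share/hadoop/hadoop-examples-0.20.205.jar
lrwxrwxrwx 1 hadoop hadoop 69 Feb 5 12:00 hadoop-core-0.20.205.jar -> /home/hadoop/.versions/0.20.205/share/hadoop/hadoop-core-0.20.205.jar
"""

MINIMUM = (1, 0)  # hypothetical minimum release your job needs

for version in sorted(hadoop_versions(SAMPLE)):
    label = ".".join(str(part) for part in version)
    status = "too old" if version < MINIMUM else "ok"
    print(f"Hadoop {label}: {status}")
```

Tuples compare element by element, so `(0, 20, 205) < (1, 0)` holds without any string parsing tricks. Running a check like this in a bootstrap step would flag a version mismatch before a job fails on a missing module.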


Amar 2013-03-02T20:10:38

Well, administering, monitoring, and maintaining a cluster isn't a small task in itself.
With EMR you can get machines configured and running with your custom bootstrap code in no time.
Beyond that, EMR provides a lot of other tools, options, and facilities too.

With EMR you don't have to worry about terminating the cluster after the jobs are done. You can certainly implement that yourself in the EC2 + Hadoop setup, but EMR does it for you in a neat way.

You can also resize the cluster even while your jobs are running!

The Pig and Hive versions available with EMR also contain patches that make it easier to work with files in S3.

Even here in this answer you may find that EMR has been given the upper hand.
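The bootstrap and auto-termination points above map onto a cluster launch request. Here is a minimal sketch, assuming the boto3 SDK (which the answer does not mention) and its EMR `run_job_flow` call. The helper `build_emr_request`, the bucket path, the release label, and the instance type are hypothetical placeholders to adjust for your account.

```python
def build_emr_request(name, bootstrap_script, instance_count, keep_alive=False):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    keep_alive=False asks EMR to terminate the cluster once all steps finish.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # placeholder release label
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # placeholder instance type
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        "BootstrapActions": [
            {
                "Name": "custom-bootstrap",
                "ScriptBootstrapAction": {"Path": bootstrap_script, "Args": []},
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_emr_request(
    "nightly-job", "s3://example-bucket/bootstrap.sh", instance_count=5
)

# With credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Keeping the request as plain data makes it easy to review or test before anything is launched. A self-managed EC2 + Hadoop setup would need its own script to watch for job completion and shut the instances down.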

