java – Hadoop: JobConf vs. Configuration for Hadoop 1.0.4
Hi, I'm new to Hadoop and its file system. I saw two different WordCount examples, one using JobConf and the other using Configuration. What is the difference between them?
From what I've read, JobConf belongs to the old package org.apache.hadoop.mapred (deprecated in 0.20.x), while Configuration is used with the new package org.apache.hadoop.mapreduce. But now, in v1.0.4, the old package is no longer deprecated.
At present there are two ways to write MapReduce jobs in Java: one is to use (extend) the classes in the org.apache.hadoop.mapreduce package, and the other is to implement the classes in the org.apache.hadoop.mapred package.
I want to know:
> What is the difference between the mapred and mapreduce package structures, and why was mapred un-deprecated?
> Which approach is better for v1.0.4, and why: JobConf or Configuration?
> Which is better for v1.0.4: mapred or mapreduce?
Solution
If you look at the releases page, you can see that 1.0.4 corresponds to something around 0.20.20x.
To give some background, here is the discussion from the mailing list:
The "old" MapReduce API in org.apache.hadoop.mapred was deprecated in the 0.20 release series when the "new" (Context Objects) MapReduce API was added in org.apache.hadoop.mapreduce. Unfortunately, the new API was not complete in 0.20 and most users stayed with the old API. This has led to the confusing situation where the old API is generally recommended, even though it is deprecated.
So you can see that it is mainly about backward compatibility.
The bottom line: if you are starting a new application on 1.0.4, you should use mapreduce rather than mapred, because it is now the preferred approach. If you have a legacy application, you can still use the old mapred. This means that you should use Configuration rather than JobConf.
As for the difference between mapred and mapreduce, as the excerpt above explains, it comes mainly from the introduction of context objects, but there are also some other changes and new classes that are not available in the old mapred.
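To make the "context object" difference more concrete, here is a toy sketch in plain Java. The types below (OutputCollector, Reporter, OldMapper, Context, NewMapper) are simplified stand-ins written for this illustration, not Hadoop's real classes: the real ones are generic, use Writable key/value types, and have more methods. The sketch only shows the shape of the two styles: in the old style the mapper receives the output sink and the progress reporter as separate arguments, while in the new style both are reached through a single context object.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: simplified stand-ins, NOT the real Hadoop classes.
class ApiStyleDemo {

    // Old style (modeled on org.apache.hadoop.mapred): the mapper is an
    // interface, and output and progress reporting are separate arguments.
    interface OutputCollector { void collect(String key, int value); }
    interface Reporter { void progress(); }
    interface OldMapper {
        void map(long offset, String line, OutputCollector output, Reporter reporter);
    }

    // New style (modeled on org.apache.hadoop.mapreduce): the mapper is a
    // class you extend, and one Context object carries output and progress.
    static class Context {
        final List<String> records = new ArrayList<>();
        void write(String key, int value) { records.add(key + "\t" + value); }
        void progress() { /* no-op in this toy model */ }
    }
    static abstract class NewMapper {
        abstract void map(long offset, String line, Context context);
    }

    // Word-count mapper written against the old-style signature.
    static class OldWordCountMapper implements OldMapper {
        public void map(long offset, String line, OutputCollector output, Reporter reporter) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) output.collect(word, 1);
            }
            reporter.progress();
        }
    }

    // The same mapper written against the new-style signature.
    static class NewWordCountMapper extends NewMapper {
        @Override
        void map(long offset, String line, Context context) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) context.write(word, 1);
            }
            context.progress();
        }
    }

    // Hypothetical helpers that run one line through each mapper.
    static List<String> runOld(String line) {
        List<String> out = new ArrayList<>();
        new OldWordCountMapper().map(0L, line, (k, v) -> out.add(k + "\t" + v), () -> {});
        return out;
    }

    static List<String> runNew(String line) {
        Context context = new Context();
        new NewWordCountMapper().map(0L, line, context);
        return context.records;
    }

    public static void main(String[] args) {
        System.out.println(runOld("to be or not to be"));
        System.out.println(runNew("to be or not to be"));
    }
}
```

Both mappers emit the same (word, 1) pairs; only the way they reach the framework differs. In real Hadoop code the same split shows up in the driver: old-style jobs are configured through a JobConf, while new-style jobs are built from a Configuration.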