Java – running standalone Hadoop applications on multiple CPU cores

My team built a Java application using the Hadoop library to convert a pile of input files into useful output.

When I run this application from the command line (or from Eclipse or NetBeans), I haven't been able to convince it to run more than one map and/or reduce thread at a time. Given that the tool is very CPU-intensive, this "single-threadedness" is my current bottleneck.

When running it in the NetBeans profiler, I did see that the application started several threads for various purposes, but only one map/reduce was running at any given moment.

The input data consists of several input files, so Hadoop should be able to run at least one thread per input file during the map phase.

What should I do to get at least 2 or even 4 threads running concurrently (which should be possible for most of this application's processing time)?

I expect this to be something very obvious that I've overlooked.

I just found this: https://issues.apache.org/jira/browse/MAPREDUCE-1367 This implements the feature I was looking for in Hadoop 0.21: it introduces the mapreduce.local.map.tasks.maximum flag to control it.
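As a sketch, assuming Hadoop 0.21+ with the MAPREDUCE-1367 fix on the classpath, the flag can be set on the job configuration before submission (the class and job names here are hypothetical, and the mapper/reducer setup is elided):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LocalParallelDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Let the LocalJobRunner execute up to 4 map tasks concurrently.
        // This only has an effect on Hadoop 0.21+ (MAPREDUCE-1367).
        conf.setInt("mapreduce.local.map.tasks.maximum", 4);

        Job job = new Job(conf, "file-conversion"); // hypothetical job name
        // ... set mapper, reducer, input/output formats and paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If the driver goes through ToolRunner/GenericOptionsParser, the same setting can instead be passed on the command line as `-D mapreduce.local.map.tasks.maximum=4`. (This is a configuration sketch; it needs the Hadoop jars and a job definition to actually run.)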

I have since also found the solution described in this question.

Solution

I'm not sure if I'm right, but when you run a job in local mode, you can't have multiple mappers/reducers.

In any case, to set the maximum number of concurrently running mappers and reducers, use the configuration options mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum. By default, both options are set to 2, so I may be right.
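These are per-tasktracker settings, so in a (pseudo-)distributed setup they go into mapred-site.xml on each node. A minimal sketch, with illustrative values:

```xml
<!-- mapred-site.xml: per-node task slots (pre-YARN Hadoop) -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value> <!-- default is 2 -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value> <!-- default is 2 -->
  </property>
</configuration>
```

A reasonable starting value is roughly the number of CPU cores on the node, since the application is CPU-bound.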

Finally, if you want to prepare for a multi-node cluster, run the job in fully distributed mode right away, but with all the servers (namenode, datanode, tasktracker, jobtracker, ...) running on one machine.
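A minimal single-machine ("pseudo-distributed") configuration along those lines, assuming a pre-YARN Hadoop release, points both HDFS and the jobtracker at localhost (the port numbers are the conventional defaults, not requirements):

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```

With this in place the job runs through a real tasktracker rather than the LocalJobRunner, so the tasktracker slot limits above apply, and the same configuration carries over to a real cluster by changing the hostnames.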
