Detailed examples of Hadoop multi-job parallel processing

To run multiple Hadoop jobs in parallel, the following configuration was verified by testing:

First, configure Hadoop as follows:

1. Modify mapred-site.xml to add a scheduler configuration:
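The original snippet was lost in transcription; a plausible sketch, assuming the Fair Scheduler on Hadoop 1.x (consistent with the mapred-site.xml file and port 50030 mentioned in this article), would be:

```xml
<!-- mapred-site.xml: enable the Fair Scheduler so that
     multiple submitted jobs can run concurrently -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
```

With the default FIFO scheduler, a second submitted job waits for cluster slots held by the first, so a pluggable scheduler such as the Fair Scheduler is what makes the jobs actually proceed in parallel.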

2. Add the jar file path configuration:
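This snippet was also lost; a minimal sketch, assuming the `mapred.jar` property and a placeholder path, might be:

```xml
<!-- mapred-site.xml: point mapred.jar at the packaged job jar
     (the path below is a hypothetical example) so task JVMs can
     locate the job classes at runtime -->
<property>
  <name>mapred.jar</name>
  <value>/home/hadoop/jobs/myjobs.jar</value>
</property>
```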

The basic Java code is as follows:
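The original listing did not survive; below is a minimal sketch of the pattern described in the notes later in this article: `submit()` to launch jobs in parallel, then polling `isComplete()`. The input/output paths are placeholders, the driver class name is taken from note 2 below, and a real job would also need `setMapperClass`/`setReducerClass` calls:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CapUseDateTimerTask {

    // Build one job; paths are hypothetical placeholders.
    private static Job createJob(Configuration conf, String name,
                                 String in, String out) throws Exception {
        Job job = new Job(conf, name);                 // Hadoop 1.x-style constructor
        job.setJarByClass(CapUseDateTimerTask.class);  // see note 2: avoids ClassNotFoundException
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));
        return job;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job1 = createJob(conf, "job1", "/input/a", "/output/a");
        Job job2 = createJob(conf, "job2", "/input/b", "/output/b");

        // submit() returns immediately, so both jobs run at the same time
        job1.submit();
        job2.submit();

        // poll until both jobs report completion
        while (!(job1.isComplete() && job2.isComplete())) {
            Thread.sleep(5000);
        }
        System.exit(job1.isSuccessful() && job2.isSuccessful() ? 0 : 1);
    }
}
```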

Finally, assemble this code into the main method and launch it with Hadoop using:

hadoop jar <jar file> <class containing the main method>

For example:
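The concrete command was lost in transcription; with a hypothetical jar name and the driver class named in note 2 below, it would look like:

```shell
hadoop jar myjobs.jar CapUseDateTimerTask
```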

You can monitor the jobs running in parallel through the JobTracker web UI on port 50030.

Explanation:

1. Configuring the jar path solves the ClassNotFoundException that can otherwise occur at runtime after packaging the generated jar;

2. Call setJarByClass for each of the multiple jobs. Testing showed that if this class is not set, a ClassNotFoundException appears at run time; here CapUseDateTimerTask is the class containing the main method;

3. waitForCompletion and submit behave differently: waitForCompletion is serial (it blocks until the job finishes), while submit is parallel (it returns immediately). Precisely because submit is asynchronous, subsequent code must check whether each job has finished, i.e. call isComplete();

4. The Job above is org.apache.hadoop.mapreduce.Job.

The code above has passed testing in both single-machine and cluster modes.


The content of this article was collected from the web and is provided for learning reference; the copyright belongs to the original author.