Java – Hadoop: can you silently discard failed map tasks?

I am using Hadoop MapReduce to process a large amount of data. The problem is that, occasionally, a corrupted file causes a map task to throw a Java heap space error or something similar.

If possible, I would like that map task to give up whatever it is doing, be killed, and the rest of the job to carry on. I don't mind losing that bit of data; I just don't want the whole MapReduce job to fail.

Is this possible in Hadoop, and if so, how?

Solution

You can tune the mapred.max.map.failures.percent parameter. Its default value is 0. Increasing it allows a certain percentage of map tasks to fail without causing the whole job to fail.
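
As a minimal sketch, assuming the old org.apache.hadoop.mapred API, the parameter can be set per job through JobConf; the class and job names below are placeholders, and the only relevant call is setMaxMapTaskFailuresPercent:

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TolerantJob {
    public static void main(String[] args) throws Exception {
        // Placeholder job setup; mapper, reducer and I/O paths omitted.
        JobConf conf = new JobConf(TolerantJob.class);
        conf.setJobName("tolerant-job");

        // Allow up to 5% of map tasks to fail permanently without
        // failing the job (the default of 0 means any failed map
        // task kills the whole job).
        conf.setMaxMapTaskFailuresPercent(5);

        JobClient.runJob(conf);
    }
}
```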

This parameter can be set cluster-wide in mapred-site.xml (where it applies to all jobs) or on a per-job basis (which is probably safer).
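
For the newer org.apache.hadoop.mapreduce API, a hedged sketch of the per-job variant is to set the property by name on the job's Configuration. The exact property name depends on the Hadoop version: older releases use mapred.max.map.failures.percent, while newer ones map it to mapreduce.map.failures.maxpercent. Putting the same property into mapred-site.xml instead would apply it to every job on the cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TolerantJobNewApi {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tolerate up to 5% failed map tasks for this job only.
        // Property name assumed for Hadoop 2.x+; older versions use
        // "mapred.max.map.failures.percent".
        conf.setInt("mapreduce.map.failures.maxpercent", 5);

        Job job = Job.getInstance(conf, "tolerant-job");
        // ... set mapper, reducer, input/output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```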
