Get the input and output record counts of a MapReduce job in Java

I want to get the number of input and output records of the map phase and the reduce phase, and the time taken to complete the MapReduce job, from Java. These statistics are printed to the terminal, but I need to read them in Java code and show them in my own interface, after this line:

job_blocking.waitForCompletion(true);

Solution

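After this line, you can read the map and reduce record counts from the job's counters: MAP_INPUT_RECORDS and REDUCE_OUTPUT_RECORDS (and likewise MAP_OUTPUT_RECORDS and REDUCE_INPUT_RECORDS). Here job is the same Job object that the question calls job_blocking: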

// Records read by the map tasks
long map_input_records = job.getCounters()
    .findCounter("org.apache.hadoop.mapreduce.Task$Counter","MAP_INPUT_RECORDS")
    .getValue();
// Records emitted by the map tasks
long map_output_records = job.getCounters()
    .findCounter("org.apache.hadoop.mapreduce.Task$Counter","MAP_OUTPUT_RECORDS")
    .getValue();
// Records read by the reduce tasks
long reduce_input_records = job.getCounters()
    .findCounter("org.apache.hadoop.mapreduce.Task$Counter","REDUCE_INPUT_RECORDS")
    .getValue();
// Records emitted by the reduce tasks
long reduce_output_records = job.getCounters()
    .findCounter("org.apache.hadoop.mapreduce.Task$Counter","REDUCE_OUTPUT_RECORDS")
    .getValue();
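
Since the goal is to show these numbers in your own interface, one option is to collect the four values into a small holder object and build the display text from it. This is only a sketch: the JobStats class and its summary() method are hypothetical names, not part of Hadoop, and the numbers in main are made up for illustration.

```java
public class JobStats {
    private final long mapInput;
    private final long mapOutput;
    private final long reduceInput;
    private final long reduceOutput;

    public JobStats(long mapInput, long mapOutput, long reduceInput, long reduceOutput) {
        this.mapInput = mapInput;
        this.mapOutput = mapOutput;
        this.reduceInput = reduceInput;
        this.reduceOutput = reduceOutput;
    }

    /** Builds a two-line text summary that a UI label or log line could display. */
    public String summary() {
        return "Map: " + mapInput + " records in, " + mapOutput + " records out\n"
             + "Reduce: " + reduceInput + " records in, " + reduceOutput + " records out";
    }

    public static void main(String[] args) {
        // Made-up values; in a real program, pass the four longs read from
        // job.getCounters() once waitForCompletion(true) has returned.
        JobStats stats = new JobStats(1000L, 4200L, 4200L, 310L);
        System.out.println(stats.summary());
    }
}
```

Newer Hadoop releases also ship the enum org.apache.hadoop.mapreduce.TaskCounter, so findCounter(TaskCounter.MAP_INPUT_RECORDS) may work in place of the two strings; it is worth checking against the Hadoop version you run.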

For the time required to run the job, I don't know whether there is an easier way than storing the current time in a long variable before and after the job runs and taking the difference between the two.
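
The before-and-after approach can be sketched as follows. This is a minimal illustration, not Hadoop code: JobTiming, elapsedMillis and format are hypothetical names, and Thread.sleep stands in for the call to job_blocking.waitForCompletion(true). System.nanoTime() is used instead of the wall clock because it is intended for measuring elapsed time.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

public class JobTiming {

    /** Converts two System.nanoTime() readings into elapsed whole milliseconds. */
    public static long elapsedMillis(long startNanos, long endNanos) {
        return TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
    }

    /** Formats a duration in milliseconds as minutes and seconds for display. */
    public static String format(long millis) {
        long minutes = millis / 60_000;
        double seconds = (millis % 60_000) / 1000.0;
        return String.format(Locale.ROOT, "%d min %.3f s", minutes, seconds);
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        // Stand-in for: job_blocking.waitForCompletion(true);
        Thread.sleep(100);
        long end = System.nanoTime();
        System.out.println("Job took " + format(elapsedMillis(start, end)));
    }
}
```

Hadoop's Job class also appears to expose getStartTime() and getFinishTime() in recent versions, which might serve the same purpose; that has not been checked against the version used here.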
