Java – Hadoop – mapper constructor args

Is there any way to give a Hadoop mapper constructor args, possibly through a library that wraps job creation?

Here is my scenario:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class HadoopTest {

    // Extractor turns a line into a "feature"
    public static interface Extractor {
        public String extract(String s);
    }

    // A concrete Extractor, configurable with a constructor parameter
    public static class PrefixExtractor implements Extractor {
        private int endIndex;

        public PrefixExtractor(int endIndex) { this.endIndex = endIndex; }

        public String extract(String s) { return s.substring(0, this.endIndex); }
    }

    public static class Map extends Mapper<Object, Text, Text, Text> {
        private Extractor extractor;

        // Constructor configures the extractor (this is exactly what
        // Hadoop cannot call; see below)
        public Map(Extractor extractor) { this.extractor = extractor; }

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String feature = extractor.extract(value.toString());
            context.write(new Text(feature), new Text(value.toString()));
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            for (Text val : values) context.write(key, val);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "test");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}

It should be clear that because the mapper is only given to the job as a class reference (Map.class), Hadoop has no way to pass constructor arguments and configure a specific extractor.
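For context, the framework effectively does something like the following with the registered class (a simplified sketch, not Hadoop's actual call site), which is why a no-arg constructor is required:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.ReflectionUtils;

public class ReflectionDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hadoop only has the Class object from job.setMapperClass(...),
        // so it instantiates the mapper reflectively via a no-arg constructor.
        // With the constructor-arg Map above, the next line throws a
        // RuntimeException wrapping java.lang.NoSuchMethodException,
        // because no Map() constructor exists.
        Mapper<?, ?, ?, ?> mapper = ReflectionUtils.newInstance(HadoopTest.Map.class, conf);
    }
}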

Some Hadoop wrapper frameworks like Scoobi, Crunch and Scrunch (there may be more I don't know of) seem to have this capability, but I don't know how they do it.

Edit: after working with Scoobi for a while, I noticed something relevant: if an externally defined object is used in the mapper, Scoobi requires it to be serializable, and will complain at run time if it is not. So maybe the right approach is to make my Extractor serializable and deserialize it in the mapper's setup method.
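A minimal sketch of that idea, assuming standard Java serialization (the property name and the Base64 encoding are my own choices, not anything Scoobi does): serialize the configured Extractor into the Configuration when building the job, and rebuild it on the mapper side.

import java.io.*;
import java.util.Base64;
import org.apache.hadoop.conf.Configuration;

public class ExtractorSerde {
    // Hypothetical property name for carrying the serialized extractor.
    public static final String KEY = "hadooptest.extractor";

    // Job-submission side: the Extractor implementation must be Serializable.
    public static void store(Configuration conf, Serializable extractor) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(extractor);
        }
        conf.set(KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    // Mapper side: call this from setup(Context) and cast the result.
    public static Object load(Configuration conf) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(conf.get(KEY));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}

The mapper would then drop its constructor, PrefixExtractor would implement Serializable, and setup(Context) would do extractor = (Extractor) ExtractorSerde.load(context.getConfiguration());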

Also, I actually work in Scala, so Scala-based solutions are welcome (if not encouraged!).

Solution

I suggest telling your mapper which extractor to use through the Configuration object you are creating. The mapper receives the configuration in its setup method (via context.getConfiguration()). It seems you can't put an object into the configuration, since it is usually built from an XML file or the command line, but you can set an enum value and have the mapper construct its extractor itself. Having the mapper customize itself after construction isn't pretty, but that's my reading of the API.
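A minimal sketch of that approach (the property name extractor.prefix.length and the default value are my own choices; here the constructor argument itself is passed as an int rather than an enum selecting among extractor types, but both live in the Configuration the same way): the mapper gets a no-arg constructor and builds its PrefixExtractor in setup.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// No-arg mapper that builds its Extractor in setup() from the Configuration.
public class ConfiguredMap extends Mapper<Object, Text, Text, Text> {
    private HadoopTest.Extractor extractor;

    @Override
    protected void setup(Context context) {
        // Read the constructor argument out of the job configuration.
        int endIndex = context.getConfiguration().getInt("extractor.prefix.length", 1);
        extractor = new HadoopTest.PrefixExtractor(endIndex);
    }

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(new Text(extractor.extract(value.toString())),
                      new Text(value.toString()));
    }
}

On the submission side, set the value before building the job and register the new mapper:

conf.setInt("extractor.prefix.length", 3);
job.setMapperClass(ConfiguredMap.class);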
