Writing custom Java objects to Parquet

I have some custom Java objects (internally composed of other custom objects) that I want to write to HDFS in Parquet format.

Even after a lot of searching, most suggestions seem to be to use the Avro format and the internal Avro converter from Parquet to store the objects.

From what I've seen here and here, it seems I would have to write a custom WriteSupport to accomplish this.

Is there a better way? Which is better: writing custom objects directly, or going through an intermediate schema definition like Avro?

Solution

You can use Avro reflection to get the schema. The call looks like ReflectData.AllowNull.get().getSchema(CustomClass.class). I have a sample Parquet demo code snippet.
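
For example, to see what schema reflection produces for your class before writing anything (a minimal sketch; CustomClass stands in for your own type):

    import org.apache.avro.Schema;
    import org.apache.avro.reflect.ReflectData;

    // AllowNull wraps every reflected reference field in a union with null,
    // so optional fields in your objects remain legal in the schema.
    Schema schema = ReflectData.AllowNull.get().getSchema(CustomClass.class);
    System.out.println(schema.toString(true));  // pretty-printed schema JSON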

In essence, the custom Java object writer is like this:

    import org.apache.avro.reflect.ReflectData;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    import static org.apache.parquet.hadoop.ParquetFileWriter.Mode.OVERWRITE;
    import static org.apache.parquet.hadoop.metadata.CompressionCodecName.SNAPPY;

    Path dataFile = new Path("/tmp/demo.snappy.parquet");

    // Write the Team objects as a Snappy-compressed Parquet file,
    // using an Avro schema derived from the class by reflection.
    try (ParquetWriter<Team> writer = AvroParquetWriter.<Team>builder(dataFile)
            .withSchema(ReflectData.AllowNull.get().getSchema(Team.class))
            .withDataModel(ReflectData.get())
            .withConf(new Configuration())
            .withCompressionCodec(SNAPPY)
            .withWriteMode(OVERWRITE)
            .build()) {
        for (Team team : teams) {
            writer.write(team);
        }
    }

You can replace Team with your own Java class. Note that the Team class contains a List of Person objects, which is similar to your requirement, and Avro can derive the schema for it without any problem.
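
The Team class itself isn't shown here, but a minimal sketch of the shape involved could look like this (the field names are my assumption, not the original demo's):

    import java.util.List;

    // Hypothetical field names; ReflectData.AllowNull already treats
    // reference fields as nullable, and reflection needs a no-arg constructor.
    class Person {
        String firstName;
        int age;

        Person() {}
    }

    class Team {
        String name;
        List<Person> members;  // nested custom objects are reflected recursively

        Team() {}
    }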

If you want to write to HDFS, you may need to replace the path with an HDFS URI, but I haven't tried that myself.
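
An untested sketch of what that could look like (the namenode host and port are placeholders for your cluster):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // Placeholder cluster address; untested.
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020");
    Path dataFile = new Path("hdfs://namenode:8020/tmp/demo.snappy.parquet");
    // Pass conf to the builder via .withConf(conf) instead of new Configuration().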

By the way, my code was inspired by this parquet-example code.
