Example code for reading and writing Parquet data in Java

This article shows how to read and write Parquet data in Java:

First, the schema. A schema is required when writing Parquet data; when reading, the schema is recognized automatically from the file.
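As a rough illustration of what such a schema looks like, here is a minimal sketch that holds a schema in Parquet's message-type text syntax as a plain Java string. The class name ParquetSchemaText and the message name Pair are hypothetical; the fields city and ttl2 are the ones this article talks about. The sketch uses only the standard library, so it does not parse the text itself.

```java
public class ParquetSchemaText {
    // Schema in Parquet's message-type text syntax.
    // "Pair" is a hypothetical message name chosen for illustration.
    // Each field is: <repetition> <primitive type> <name> (<annotation>);
    public static final String SCHEMA = String.join("\n",
        "message Pair {",
        "  required binary city (UTF8);",   // exactly one value per record
        "  repeated binary ttl2 (UTF8);",   // zero or more values per record
        "}");

    public static void main(String[] args) {
        // Print the schema text so it can be inspected or copied.
        System.out.println(SCHEMA);
    }
}
```

Text in this form is what MessageTypeParser.parseMessageType, mentioned in this article, takes as its input when the Parquet library is on the classpath.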

The difference between repeated and required is not only how many times a value may occur, but also the data type produced after serialization. For example, when ttl2 is declared repeated it prints as WrappedArray([7,7_a]), while when it is declared required it prints as [7,7_a]. Besides MessageTypeParser.parseMessageType, a MessageType can also be generated programmatically.

Note that there is a pitfall here, and it shows up in Spark: declaring ttl2 with as(OriginalType.UTF8) has the same effect as writing required binary city (UTF8) in the schema text. With UTF8, the field can be converted to StringType when it is read. Without it, an error is reported: [B cannot be cast to java.lang.String.
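The error message itself hints at the cause: [B is the JVM's name for byte[], so the value arriving at the cast is raw bytes rather than a String. The following sketch reproduces that situation with the standard library only. It is not Parquet or Spark code; readRawBinary is a hypothetical stand-in for a reader that returns a binary column without the UTF8 annotation.

```java
import java.nio.charset.StandardCharsets;

public class BinaryCastDemo {
    // Hypothetical stand-in for a reader that returns an un-annotated
    // binary column: the caller only sees an Object holding a byte[].
    public static Object readRawBinary() {
        return "7_a".getBytes(StandardCharsets.UTF_8);
    }

    // Tries the direct cast and reports whether it succeeded.
    public static String describeCast(Object value) {
        try {
            String s = (String) value;
            return "ok: " + s;
        } catch (ClassCastException e) {
            // The exact message wording depends on the JVM version.
            return "failed: " + e.getMessage();
        }
    }

    // Decodes the bytes explicitly, which is what declaring the column
    // as UTF8 lets the reader do on the caller's behalf.
    public static String decode(Object value) {
        return new String((byte[]) value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Object value = readRawBinary();
        System.out.println(describeCast(value));
        System.out.println(decode(value));
    }
}
```

The direct cast fails because no conversion happens at cast time, while decoding with an explicit charset recovers the text.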

To resolve the [B cannot be cast to java.lang.String exception:

1. Add UTF8 when generating the Parquet file.
2. Or, when reading, supply a schema that specifies the field type.

Maven dependency (I use version 1.7)

