Spark SQL and explode on a DataFrame in Java

Is there a simple way to explode an array column on a Spark SQL DataFrame? It's relatively simple in Scala, but this function doesn't seem to be available in Java (as described in the Javadoc).

One option is to use sqlContext.sql(...) with the explode function inside the query, but I'm looking for a better and cleaner way. The DataFrames are loaded from Parquet files.

Solution

I solved this problem in the following way: suppose you have an array column called "positions", containing the job descriptions for each "fullName".

Then you go from the initial schema:

root
 |-- fullName: string (nullable = true)
 |-- positions: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- companyName: string (nullable = true)
 |    |    |-- title: string (nullable = true)
...

to the schema:

root
 |-- personName: string (nullable = true)
 |-- companyName: string (nullable = true)
 |-- positionTitle: string (nullable = true)

by doing:

DataFrame personPositions = persons.select(
    persons.col("fullName").as("personName"),
    org.apache.spark.sql.functions.explode(persons.col("positions")).as("pos"));

DataFrame test = personPositions.select(
    personPositions.col("personName"),
    personPositions.col("pos").getField("companyName").as("companyName"),
    personPositions.col("pos").getField("title").as("positionTitle"));
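Conceptually, explode behaves like a flatMap: each row is multiplied by the elements of its array column, with the scalar columns repeated. The following is a minimal plain-Java sketch of that semantics (no Spark dependency), using hypothetical Person/Position records that mirror the schema above:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ExplodeSketch {
    // Hypothetical records standing in for the DataFrame rows:
    // each person has a fullName and an array of positions.
    record Position(String companyName, String title) {}
    record Person(String fullName, List<Position> positions) {}
    // One output row per (person, position) pair, mirroring the exploded schema.
    record Row(String personName, String companyName, String positionTitle) {}

    static List<Row> explode(List<Person> persons) {
        // explode(col("positions")) acts like this flatMap: every element of
        // the array becomes its own row, with fullName repeated on each.
        return persons.stream()
                .flatMap(p -> p.positions().stream()
                        .map(pos -> new Row(p.fullName(), pos.companyName(), pos.title())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = List.of(
                new Person("Ada Lovelace", List.of(
                        new Position("Analytical Engines Ltd", "Programmer"),
                        new Position("Babbage & Co", "Analyst"))));
        explode(persons).forEach(r -> System.out.println(
                r.personName() + " | " + r.companyName() + " | " + r.positionTitle()));
    }
}
```

A person with two entries in "positions" produces two rows, just as in the Spark select above.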