Spark SQL and explode on DataFrame in Java
Is there a simple way to explode an array column of a Spark SQL DataFrame? It's relatively simple in Scala, but this function doesn't seem to be available in Java (as described in the Javadoc).
One option is to use sqlContext.sql(...) with the explode function inside the query, but I'm looking for a better and cleaner way. The DataFrames are loaded from Parquet files.
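For reference, the sqlContext.sql(...) route mentioned above would look roughly like the sketch below. This is only an assumption of what was meant: it presumes Spark 1.x, a SQLContext/HiveContext that supports the LATERAL VIEW syntax, and a hypothetical Parquet path and temp table name.

// Load the nested data and expose it as a temp table (path and name are placeholders).
DataFrame persons = sqlContext.read().parquet("/path/to/persons.parquet");
persons.registerTempTable("persons");

// Explode the array column in SQL and flatten the struct fields.
DataFrame exploded = sqlContext.sql(
    "SELECT fullName AS personName, pos.companyName, pos.title AS positionTitle " +
    "FROM persons LATERAL VIEW explode(positions) p AS pos");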
Solution
I solved this problem as follows: suppose you have an array column called "positions" that holds the job positions for each person identified by "fullName".
Then you go from the initial schema:
root
 |-- fullName: string (nullable = true)
 |-- positions: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- companyName: string (nullable = true)
 |    |    |-- title: string (nullable = true)
 ...

to the schema:
root
 |-- personName: string (nullable = true)
 |-- companyName: string (nullable = true)
 |-- positionTitle: string (nullable = true)

by doing:
DataFrame personPositions = persons.select(
    persons.col("fullName").as("personName"),
    org.apache.spark.sql.functions.explode(persons.col("positions")).as("pos"));

DataFrame test = personPositions.select(
    personPositions.col("personName"),
    personPositions.col("pos").getField("companyName").as("companyName"),
    personPositions.col("pos").getField("title").as("positionTitle"));
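Putting it together, a minimal end-to-end sketch of this approach might look like the following. It assumes Spark 1.x (where the DataFrame class exists; in Spark 2.x it becomes Dataset<Row>), the schema shown above, and a hypothetical class name and Parquet path.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import static org.apache.spark.sql.functions.explode;

public class ExplodePositions {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ExplodePositions");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Load the nested DataFrame from Parquet (the path is a placeholder).
        DataFrame persons = sqlContext.read().parquet("/path/to/persons.parquet");

        // Explode the array: one row per (person, position) pair.
        DataFrame personPositions = persons.select(
            persons.col("fullName").as("personName"),
            explode(persons.col("positions")).as("pos"));

        // Flatten the struct fields of each exploded element.
        DataFrame flat = personPositions.select(
            personPositions.col("personName"),
            personPositions.col("pos").getField("companyName").as("companyName"),
            personPositions.col("pos").getField("title").as("positionTitle"));

        flat.printSchema();
        flat.show();
    }
}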