Spark SQL and explode on a DataFrame in Java

Is there a simple way to explode an array column on a Spark SQL DataFrame? It's relatively simple in Scala, but this function doesn't seem to be available in Java (as described in the Javadoc).

One option is to use sqlContext.sql(...) with the explode function inside the query, but I'm looking for a better and cleaner way. The DataFrames are loaded from Parquet files.

Solution

I solved this problem in the following way: suppose you have an array column called "positions", containing the job descriptions for each "fullName".

Then you go from the initial schema:

root
 |-- fullName: string (nullable = true)
 |-- positions: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- companyName: string (nullable = true)
 |    |    |-- title: string (nullable = true)
...

to the schema:

root
 |-- personName: string (nullable = true)
 |-- companyName: string (nullable = true)
 |-- positionTitle: string (nullable = true)

by doing:

DataFrame personPositions = persons.select(
    persons.col("fullName").as("personName"),
    org.apache.spark.sql.functions.explode(persons.col("positions")).as("pos"));

DataFrame test = personPositions.select(
    personPositions.col("personName"),
    personPositions.col("pos").getField("companyName").as("companyName"),
    personPositions.col("pos").getField("title").as("positionTitle"));
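Conceptually, explode behaves like a flatMap: each row is multiplied by the elements of its array column, with the scalar columns repeated. The following is a minimal plain-Java sketch of that semantics (no Spark dependency), using hypothetical Person/Position records that mirror the schema above:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ExplodeSketch {
    // Hypothetical records standing in for the DataFrame rows:
    // each person has a fullName and an array of positions.
    record Position(String companyName, String title) {}
    record Person(String fullName, List<Position> positions) {}
    // One output row per (person, position) pair, mirroring the exploded schema.
    record Row(String personName, String companyName, String positionTitle) {}

    static List<Row> explode(List<Person> persons) {
        // explode(col("positions")) acts like this flatMap: every element of
        // the array becomes its own row, with fullName repeated on each.
        return persons.stream()
                .flatMap(p -> p.positions().stream()
                        .map(pos -> new Row(p.fullName(), pos.companyName(), pos.title())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = List.of(
                new Person("Ada Lovelace", List.of(
                        new Position("Analytical Engines Ltd", "Programmer"),
                        new Position("Babbage & Co", "Analyst"))));
        explode(persons).forEach(r -> System.out.println(
                r.personName() + " | " + r.companyName() + " | " + r.positionTitle()));
    }
}
```

A person with two entries in "positions" produces two rows, just as in the Spark select above.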