How to convert a UNIX epoch column to a date in an Apache Spark DataFrame using Java?

I have a JSON data file that contains an attribute [creationDate], which is a UNIX epoch of "long" numeric type. The Apache Spark DataFrame schema is as follows:

root 
 |-- creationDate: long (nullable = true) 
 |-- id: long (nullable = true) 
 |-- postTypeId: long (nullable = true)
 |-- tags: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- title: string (nullable = true)
 |-- viewCount: long (nullable = true)

I want to do some groupBy on "creationDate_year", which needs to be derived from "creationDate".

What is the easiest way to do this in a DataFrame using Java?

Solution

After checking the Spark DataFrame API and SQL functions, I came up with the following snippet:

DataFrame df = sqlContext.read().json("MY_JSON_DATA_FILE");

DataFrame df_DateConverted = df.withColumn("creationDt", from_unixtime(df.col("creationDate").divide(1000)));

The reason why the "creationDate" column is divided by 1000 is that the time unit differs: the original "creationDate" is a UNIX epoch in milliseconds, while Spark SQL's "from_unixtime" expects a UNIX epoch in seconds.
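For the groupBy-by-year part of the question, a minimal sketch is below, assuming the same Spark 1.x SQLContext API as above, an existing SQLContext named sqlContext, and the placeholder path "MY_JSON_DATA_FILE". It converts the millisecond epoch to a timestamp string with from_unixtime, derives the year with the year() function, and groups on that derived column:

import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Read the JSON file into a DataFrame (path is a placeholder).
DataFrame df = sqlContext.read().json("MY_JSON_DATA_FILE");

// creationDate is an epoch in milliseconds; from_unixtime expects seconds,
// so divide by 1000 before converting.
DataFrame dfWithDate = df.withColumn("creationDt",
        from_unixtime(df.col("creationDate").divide(1000)));

// Derive the year and group by it, e.g. counting posts per year.
DataFrame postsPerYear = dfWithDate
        .withColumn("creationDate_year", year(dfWithDate.col("creationDt")))
        .groupBy("creationDate_year")
        .count();

postsPerYear.show();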
