How to convert a UNIX epoch column to a date in an Apache Spark DataFrame using Java?
Java
I have a JSON data file that contains an attribute, creationDate, which is a UNIX epoch of "long" numeric type. The Apache Spark DataFrame schema is as follows:
root
 |-- creationDate: long (nullable = true)
 |-- id: long (nullable = true)
 |-- postTypeId: long (nullable = true)
 |-- tags: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- title: string (nullable = true)
 |-- viewCount: long (nullable = true)
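For reference, a minimal sketch of how such a schema can be inspected, assuming the Spark 1.x SQLContext API; the app name, local master, and the "MY_JSON_DATA_FILE" path are only placeholders:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class InspectSchema {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("InspectSchema").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Read the JSON file; Spark infers the schema from the data
        DataFrame df = sqlContext.read().json("MY_JSON_DATA_FILE");

        // Prints the schema tree shown above
        df.printSchema();

        sc.stop();
    }
}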
I want to group by a "creationDate_Year" column, which needs to be derived from "creationDate".
What is the easiest way to do this on a DataFrame using Java?
Solution
After checking the Spark DataFrame API and SQL functions, I came up with the following snippet:
import static org.apache.spark.sql.functions.from_unixtime;

DataFrame df = sqlContext.read().json("MY_JSON_DATA_FILE");
DataFrame df_DateConverted = df.withColumn("creationDt",
        from_unixtime(df.col("creationDate").divide(1000)));
The reason the "creationDate" column is divided by 1000 is that the time units differ: the original "creationDate" is a UNIX epoch in milliseconds, but Spark SQL's "from_unixtime" expects a UNIX epoch in seconds.
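To then get the year column asked about in the question, one possible sketch building on the snippet above, using the built-in year function on the converted timestamp; the column names "creationDt" and "creationDate_Year" and the count aggregation are only illustrative:

import static org.apache.spark.sql.functions.from_unixtime;
import static org.apache.spark.sql.functions.year;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.DataFrame;

// Epoch milliseconds -> seconds -> "yyyy-MM-dd HH:mm:ss" timestamp string
Column creationTs = from_unixtime(df.col("creationDate").divide(1000));

DataFrame dfWithYear = df
        .withColumn("creationDt", creationTs)
        .withColumn("creationDate_Year", year(creationTs));

// Group by the derived year, counting rows per year as an example aggregation
DataFrame postsPerYear = dfWithYear.groupBy("creationDate_Year").count();
postsPerYear.show();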