Java – Structured Streaming exception when using append output mode with watermark
Although I am using withWatermark(), when I run my Spark job I receive the following error message:
From what I can see in the programming guide, this exactly matches the expected usage (and sample code). Does anyone know what might be wrong?
Thank you in advance!
Relevant code (Java 8, Spark 2.2.0):
StructType logSchema = new StructType()
    .add("timestamp", TimestampType)
    .add("key", IntegerType)
    .add("val", IntegerType);

Dataset<Row> kafka = spark
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", brokers)
    .option("subscribe", topics)
    .load();

Dataset<Row> parsed = kafka
    .select(from_json(col("value").cast("string"), logSchema).alias("parsed_value"))
    .select("parsed_value.*");

Dataset<Row> tenSecondCounts = parsed
    .withWatermark("timestamp", "10 minutes")
    .groupBy(
        parsed.col("key"),
        window(parsed.col("timestamp"), "1 day"))
    .count();

StreamingQuery query = tenSecondCounts
    .writeStream()
    .trigger(Trigger.ProcessingTime("10 seconds"))
    .outputMode("append")
    .format("console")
    .option("truncate", false)
    .start();
Solution
The problem is parsed.col(...). Replacing it with col(...) will solve the problem. I recommend always using the col function instead of Dataset.col.

col returns an unresolved column, while Dataset.col returns a resolved column.

parsed.withWatermark("timestamp", "10 minutes") creates a new Dataset containing new columns with the same names. The watermark information is attached to the timestamp column in the new Dataset, not to parsed.col("timestamp"), so the columns referenced in groupBy have no watermark.

When you use unresolved columns, Spark will resolve them against the correct Dataset for you.
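To make the fix concrete, here is a sketch of the corrected aggregation from the question, keeping the same variable names; only the parsed.col calls change to the static col function (from org.apache.spark.sql.functions). It is a fragment, not a standalone program, and assumes the same parsed Dataset as above:

```java
// Use the static col/window functions instead of parsed.col, so Spark
// resolves the columns against the watermarked Dataset returned by
// withWatermark, not against the pre-watermark `parsed` Dataset.
Dataset<Row> tenSecondCounts = parsed
    .withWatermark("timestamp", "10 minutes")
    .groupBy(
        col("key"),                         // unresolved column reference
        window(col("timestamp"), "1 day"))  // unresolved column reference
    .count();
```

With unresolved references, the groupBy columns carry the watermark attached by withWatermark, so the append output mode check passes.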