Where is the union() method of the Spark DataFrame class in Java?

I am using Spark's Java connector and want to union two DataFrames, but strangely the DataFrame class only has unionAll. Is this intentional? Is there a way to union two DataFrames without duplicates?

Solution

It is safe to assume that this is intentional. Other union operators, such as RDD.union and Dataset.union, also keep duplicates.

This makes sense if you think about it. An operation equivalent to UNION ALL is purely logical and requires no data access or network traffic, whereas finding distinct elements requires a shuffle and can therefore be quite expensive. To union two DataFrames without duplicates, apply distinct() to the result explicitly:

df1.unionAll(df2).distinct()
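
To make the difference in semantics concrete, here is a minimal sketch that uses plain Java collections rather than Spark, so it runs without a cluster. The class name UnionSemantics and its two helper methods are hypothetical and exist only for illustration; they mimic what unionAll and unionAll followed by distinct() do to the rows, not how Spark executes them.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of Spark.
class UnionSemantics {

    // Analogue of df1.unionAll(df2): concatenates both inputs and keeps duplicates.
    static <T> List<T> unionAll(List<T> left, List<T> right) {
        return Stream.concat(left.stream(), right.stream())
                     .collect(Collectors.toList());
    }

    // Analogue of df1.unionAll(df2).distinct(): concatenates, then drops duplicates.
    // Note: a Java stream keeps the first occurrence in encounter order;
    // Spark's distinct() gives no ordering guarantee.
    static <T> List<T> unionDistinct(List<T> left, List<T> right) {
        return Stream.concat(left.stream(), right.stream())
                     .distinct()
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 3);
        List<Integer> b = List.of(3, 4);
        System.out.println(unionAll(a, b));      // [1, 2, 3, 3, 4]
        System.out.println(unionDistinct(a, b)); // [1, 2, 3, 4]
    }
}
```

In Spark 2.0 and later, unionAll is deprecated in favor of union, which has the same duplicate-preserving behavior, so the equivalent call there would be df1.union(df2).distinct().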