Where is the union() method on the Spark DataFrame class (Java)?
I am using Spark's Java connector and want to union two DataFrames, but strangely the DataFrame class only has unionAll. Is this intentional? And is there a way to union two DataFrames without duplicates?
Solution
Yes, I think it is safe to assume that this is intentional. Other union operators, such as RDD.union and Dataset.union, keep duplicates as well.
It makes sense if you think about it. An operation equivalent to UNION ALL is purely logical and requires no data access or network traffic, whereas finding distinct elements requires a shuffle and can therefore be quite expensive.
To union two DataFrames without duplicates, apply distinct() to the result:
df1.unionAll(df2).distinct()
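To make the difference concrete, here is a minimal plain-Java sketch of the two behaviours. It is only an analogue and does not use the Spark API: ordinary lists stand in for DataFrames, and the class and method names (UnionDemo, unionAll, unionDistinct) are hypothetical, chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java analogue (NOT the Spark API): lists stand in for DataFrames.
class UnionDemo {
    // Like unionAll: concatenates both inputs and keeps every duplicate.
    static <T> List<T> unionAll(List<T> left, List<T> right) {
        return Stream.concat(left.stream(), right.stream())
                     .collect(Collectors.toList());
    }

    // Like unionAll(...).distinct(): concatenates, then drops repeated elements.
    static <T> List<T> unionDistinct(List<T> left, List<T> right) {
        return Stream.concat(left.stream(), right.stream())
                     .distinct()
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2);
        List<Integer> b = Arrays.asList(2, 3);
        System.out.println(unionAll(a, b));      // [1, 2, 2, 3]
        System.out.println(unionDistinct(a, b)); // [1, 2, 3]
    }
}
```

This in-memory sketch does not model the cost: in Spark, distinct() compares whole rows across partitions, which is where the shuffle mentioned above comes from.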