Multithreading – Apache Spark local mode: number of cores

I am trying to understand the basics of Spark, and the Spark documentation on submitting applications in local mode. Regarding the spark-submit `--master` setting, the docs say:

Since all data is stored on a single local computer, it does not benefit from distributed operations on RDDs.

When Spark uses multiple logical cores, what benefit does it gain, and what is happening internally?

Solution

The system will allocate additional threads to process the data. Although limited to one machine, it can still take advantage of the high parallelism available in modern servers.
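Concretely, the number of worker threads is chosen with the `--master local[n]` URL at submit time. A minimal sketch (the application file `my-app.jar` is a placeholder):

```shell
# One worker thread: tasks run one at a time
spark-submit --master local[1] my-app.jar

# Four worker threads: up to 4 tasks (partitions) processed concurrently
spark-submit --master local[4] my-app.jar

# local[*] uses as many threads as there are logical cores on the machine
spark-submit --master "local[*]" my-app.jar
```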

If you have a reasonably large data set with, for example, a dozen partitions, you can measure the time required with `local[1]` vs `local[n]` (where n is the number of cores in your machine). You can also see the difference in machine utilization. If you specify only one core, Spark will use 100% of one core (plus some additional overhead for garbage collection). If you have 4 cores and specify `local[4]`, it will use 400% CPU (4 cores), and the execution time can be significantly reduced (although usually by less than a factor of 4).
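A rough way to run the comparison described above, assuming `spark-submit` is on your PATH and a job script of your own (the name `count_job.py` here is hypothetical):

```shell
# Time the same job with one thread vs. all logical cores
time spark-submit --master local[1] count_job.py
time spark-submit --master "local[*]" count_job.py

# While a job runs, watch per-core utilization of the driver process;
# with local[1] it stays near 100% CPU, with local[4] it can approach 400%
top -p "$(pgrep -f -n SparkSubmit)"
```

The wall-clock difference between the two runs only appears when the data has enough partitions to keep all threads busy; with a single partition, `local[*]` runs no faster than `local[1]`.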
