Java – alternatives to a fat jar for Spark dependencies

I know there are at least two ways to get my dependencies into a Spark EMR job. One is to create a fat jar, the other is to use the --packages option of spark-submit to specify the packages.
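For reference, the --packages approach looks roughly like this on the command line (the class, coordinates, and jar names below are placeholders, not from my actual job):

spark-submit --class com.example.MyJob \
    --packages com.example:some-dependency:1.0.0 \
    my-thin-job.jar

Spark then resolves the listed Maven coordinates at submit time instead of requiring them to be bundled into the jar.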

Building the fat jar takes a long time, ~10 minutes. Is this normal? Could we have misconfigured something?

The command-line option works, but it is error-prone.

Are there any other options? I would like it if there is (or already exists) a way to include the dependency list in the jar with Gradle and then have them downloaded automatically. Is that possible?

Update: I have received some answers. One thing I didn't make clear in the original question is that I also care about how to handle dependency conflicts, i.e. when different versions of the same jar are involved.

Update

Thanks for the answers suggesting reducing the number of dependencies or shading them where possible. For the purposes of this question, let's assume we already have the minimum number of dependencies required to run the jar.

Solution

If the Spark job has to be launched from some other application, you can use SparkLauncher. With it you can configure the jar path and run the application without creating a fat jar.

With a fat jar you must have Java installed, and launching the Spark application comes down to executing java -jar [your fat jar here]. That is hard to automate if you want to launch the application from, say, a web application.

With SparkLauncher you have the option to launch the Spark application from another application, e.g. the web application above. It is much easier.

import org.apache.spark.launcher.SparkLauncher

object Launcher extends App {

  // Build the launcher and start the Spark application as a child process
  val spark = new SparkLauncher()
    .setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar")
    .setMainClass("SparkApp")
    .setMaster("local[*]")
    .launch()

  // Block until the launched Spark process exits
  spark.waitFor()

}

Code: https://github.com/phalodi/Spark-launcher

> setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6") sets the Spark home that is used internally to call spark-submit.
> setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar") specifies the jar of our Spark application.
> setMainClass("SparkApp") is the entry point of the Spark program, i.e. the driver.
> setMaster("local[*]") sets the address of the master; here we run it on the local machine.
> launch() just starts our Spark application.
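To tie this back to the original dependency question: SparkLauncher also accepts arbitrary Spark configuration, so the same Maven coordinates that --packages takes on the command line can be passed via the spark.jars.packages property and resolved at launch time, with no fat jar. A minimal sketch, assuming placeholder paths, class names, and coordinates:

import org.apache.spark.launcher.SparkLauncher

object ThinJarLauncher extends App {

  val process = new SparkLauncher()
    .setSparkHome("/opt/spark")                  // placeholder Spark home
    .setAppResource("/opt/jobs/my-thin-job.jar") // thin jar without bundled dependencies
    .setMainClass("com.example.MyJob")           // placeholder driver class
    .setMaster("local[*]")
    // Same effect as spark-submit --packages: Spark downloads these coordinates at launch
    .setConf("spark.jars.packages", "com.example:some-dependency:1.0.0")
    .launch()

  process.waitFor()

}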

What are the benefits of SparkLauncher vs java -jar fat-jar?

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-SparkLauncher.html

https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/launcher/SparkLauncher.html

http://henningpetersen.com/post/22/running-apache-spark-jobs-from-applications
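To make one benefit from those links concrete: since Spark 1.6, SparkLauncher.startApplication() returns a SparkAppHandle, so the parent application can watch the job's state and stop it programmatically, which a bare java -jar process does not offer. A rough sketch, again with placeholder paths and class names:

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object MonitoredLauncher extends App {

  // startApplication returns a handle instead of a bare Process
  val handle = new SparkLauncher()
    .setSparkHome("/opt/spark")                  // placeholder Spark home
    .setAppResource("/opt/jobs/my-thin-job.jar") // placeholder jar path
    .setMainClass("com.example.MyJob")           // placeholder driver class
    .setMaster("local[*]")
    .startApplication(new SparkAppHandle.Listener {
      // Invoked on every state transition (CONNECTED, RUNNING, FINISHED, FAILED, ...)
      override def stateChanged(h: SparkAppHandle): Unit =
        println(s"Spark job state: ${h.getState}")
      override def infoChanged(h: SparkAppHandle): Unit = ()
    })

  // Poll until the application reaches a terminal state
  while (!handle.getState.isFinal) Thread.sleep(1000)

}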
