Java – alternatives to fat jars for Spark jobs
I know there are at least two ways to get my dependencies into Spark EMR jobs. One is to create a fat jar, the other is to use the --packages option of spark-submit to specify the packages at submission time.
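For reference, a sketch of the two approaches as spark-submit invocations (the class name, jar names, and Maven coordinates below are made-up placeholders):

# Fat jar: every dependency is already bundled inside the application jar
spark-submit --class com.example.MyApp --master yarn my-app-assembly-1.0.jar

# --packages: ship a thin jar and let Spark resolve the Maven coordinates at launch
spark-submit --class com.example.MyApp --master yarn \
    --packages org.apache.kafka:kafka-clients:2.8.0,org.apache.commons:commons-math3:3.6.1 \
    my-app-1.0.jar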
Zipping everything up into a fat jar takes a long time, ~10 minutes. Is this normal, or could we have misconfigured something?
The command-line option works, but it is error-prone.
Are there any other options? I would like it if there were (or if there already exists) a way to include the dependency list in the jar with Gradle, and then have the dependencies downloaded at launch. Is that possible? Are there any other options?
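A half-way option that already exists: Spark can read the list of Maven coordinates from a properties file through its spark.jars.packages property, so the dependency list can at least live in one versioned file next to the build, even if not inside the jar itself. A minimal sketch, with placeholder coordinates:

# job.conf -- pass it with: spark-submit --properties-file job.conf ...
spark.jars.packages  org.apache.kafka:kafka-clients:2.8.0,org.apache.commons:commons-math3:3.6.1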
Update: I have received some answers. One thing I didn't make clear in the original question is that I also care about how to handle dependency conflicts, which arise because different jars pull in different versions of the same dependency.
Update:
Thank you for the responses suggesting to reduce the number of dependencies, or to use the ones Spark provides as much as possible. For the purpose of this question, let's assume we already have the minimum number of dependencies required to run the jar.
Solution
If the Spark job has to be started from some other application, you can use SparkLauncher. You configure the jar path there, and there is no need to create a fat jar to run the application.
With a fat jar, you must have Java installed, and starting the Spark application means executing java -jar [your fat jar here]. It is hard to automate this if you want to launch the application from, say, a web application.
With SparkLauncher, you have the option of launching the Spark application from another application, such as the web application above. It is much easier.
import org.apache.spark.launcher.SparkLauncher

object Launcher extends App {
  // launch() spawns a spark-submit process and returns a java.lang.Process
  val spark = new SparkLauncher()
    .setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar")
    .setMainClass("SparkApp")
    .setMaster("local[*]")
    .launch()
  spark.waitFor()
}
Code: https://github.com/phalodi/Spark-launcher
> setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6") sets the Spark home that is used internally to call spark-submit.
> setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar") specifies the jar of our Spark application.
> setMainClass("SparkApp") is the entry point of the Spark program, i.e. the driver.
> setMaster("local[*]") sets the address of the master; here we run it on the local machine.
> launch() simply starts our Spark application.
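Relevant to the original question: SparkLauncher can also attach dependencies to a thin application jar, so no fat jar is needed. A minimal sketch, where the paths, class name, and Maven coordinates are made-up placeholders (addJar and addSparkArg are existing SparkLauncher methods):

import org.apache.spark.launcher.SparkLauncher

object ThinJarLauncher extends App {
  val spark = new SparkLauncher()
    .setSparkHome("/opt/spark")                    // hypothetical Spark install
    .setAppResource("/opt/jobs/my-app-1.0.jar")    // thin jar: application classes only
    .setMainClass("com.example.MyApp")             // hypothetical driver class
    .setMaster("yarn")
    // Ship a dependency jar that sits next to the application...
    .addJar("/opt/jobs/libs/kafka-clients-2.8.0.jar")
    // ...or let Spark resolve Maven coordinates at launch time
    .addSparkArg("--packages", "org.apache.commons:commons-math3:3.6.1")
    .launch()
  spark.waitFor()
}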
What are the benefits of SparkLauncher vs java -jar fat-jar? (See the links and the sketch below.)
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-SparkLauncher.html
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/launcher/SparkLauncher.html
http://henningpetersen.com/post/22/running-apache-spark-jobs-from-applications
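To make the benefit concrete: unlike java -jar, SparkLauncher's startApplication() (available since Spark 1.6) returns a SparkAppHandle, so the parent application can observe and control the job. A minimal sketch, with a made-up jar path and class name:

import java.util.concurrent.CountDownLatch
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object MonitoredLaunch extends App {
  val done = new CountDownLatch(1)

  val handle = new SparkLauncher()
    .setAppResource("/opt/jobs/my-app-1.0.jar")   // hypothetical application jar
    .setMainClass("com.example.MyApp")            // hypothetical driver class
    .setMaster("local[*]")
    .startApplication(new SparkAppHandle.Listener {
      // Called whenever the application's state changes (SUBMITTED, RUNNING, FINISHED, ...)
      override def stateChanged(h: SparkAppHandle): Unit = {
        println(s"state: ${h.getState}")
        if (h.getState.isFinal) done.countDown()
      }
      override def infoChanged(h: SparkAppHandle): Unit = ()
    })

  done.await()   // block until the job reaches a final state
  // handle.stop() could be used to ask a running job to stop
}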