How to avoid uploading Spark's own jars on every spark-submit

17/09/01 15:38:59 INFO yarn.Client: Uploading resource file:/usr/local/spark-2.1.1-bin-without-hadoop/spark-46d1bd70-b346-4027-bce4-9540f4b6035a/__spark_libs__4051900056689219834.zip -> hdfs://wwj.shise.com:9000/user/hadoop/.sparkStaging/application_1504148698505_0021/__spark_libs__4051900056689219834.zip
17/09/01 15:41:45 INFO yarn.Client: Uploading resource file:/Users/frank/IdeaProjects/simpleApp/target/scala-2.11/simpleApp-assembly-1.0.jar -> hdfs://wwj.shise.com:9000/user/hadoop/.sparkStaging/application_1504148698505_0021/simpleApp-assembly-1.0.jar

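As the log timestamps show, the upload took about 3 minutes. That time is spent packing all the jars under $SPARK_HOME/jars into a zip and uploading it to the .sparkStaging directory on HDFS for YARN, and it can be avoided entirely.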

These jars can simply be kept on HDFS, so the client no longer has to upload them on each submit.

First copy these jars to HDFS (-p creates the parent directory /tmp/spark if it does not exist yet):
hadoop fs -mkdir -p /tmp/spark/lib_jars/
hadoop fs -put $SPARK_HOME/jars/* /tmp/spark/lib_jars/
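
Before changing any configuration, it may be worth checking that the copy is complete. The following is only a rough sketch: check_spark_jars is a hypothetical helper name, and it assumes that `hadoop fs -ls` prints one line per file ending in the file path, which is the usual output format.

```shell
# Hypothetical helper: compare the number of jars in a local directory
# with the number of jars in an HDFS directory.
check_spark_jars() {
  local_dir=$1
  hdfs_dir=$2
  # Count *.jar files directly inside the local directory.
  local_count=$(find "$local_dir" -maxdepth 1 -name '*.jar' | wc -l | tr -d ' ')
  # Count listing lines that end in .jar (assumes the path is the last column).
  hdfs_count=$(hadoop fs -ls "$hdfs_dir" | grep -c '\.jar$')
  echo "local=$local_count hdfs=$hdfs_count"
  # Succeed only when both counts match.
  [ "$local_count" -eq "$hdfs_count" ]
}
```

It could be called as check_spark_jars "$SPARK_HOME/jars" /tmp/spark/lib_jars; a non-zero exit status suggests that some jars did not make it to HDFS.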

Edit $SPARK_HOME/conf/spark-defaults.conf (for example with vim)
and add this line:

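# An HDFS path without the hdfs:// scheme was enough here. If the jars are not found,
# try the full URI instead, e.g. hdfs://wwj.shise.com:9000/tmp/spark/lib_jars/*
# (Keep comments on their own lines; a trailing comment would become part of the value.)
spark.yarn.jars /tmp/spark/lib_jars/*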

After that, spark-submit no longer logs that it is packing the jars into a zip file and uploading them.
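
If you would rather not touch spark-defaults.conf, the same property can be passed for a single job with --conf. This is a sketch under a few assumptions: submit_simple_app is a hypothetical wrapper, the main class name SimpleApp is a guess (the original does not show it), and the full hdfs:// URI is built from the namenode address in the log above. The value is quoted so that the local shell does not try to expand the *.

```shell
# Hypothetical wrapper around spark-submit that sets spark.yarn.jars per job.
submit_simple_app() {
  # SimpleApp is an assumed main class name; replace it with your own.
  spark-submit \
    --master yarn \
    --conf "spark.yarn.jars=hdfs://wwj.shise.com:9000/tmp/spark/lib_jars/*" \
    --class SimpleApp \
    "$@"
}
```

For the jar from the log above, the call would look like submit_simple_app target/scala-2.11/simpleApp-assembly-1.0.jar.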