[Original] Experience sharing (19): Spark on YARN execution progress stays at 10% after submitting a job
Published: 2019-06-12


spark 2.1.1

We wanted to monitor the execution progress of Spark on YARN jobs, but found that after a job is submitted the reported progress always stays at 10%; only when the job succeeds or fails does it suddenly jump to 100%, which seemed rather mysterious.
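For reference, this is a minimal sketch of how such monitoring can be done with Hadoop's YarnClient API; the object name and the way the application id is passed in are illustrative assumptions, not taken from the original system:

import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.util.ConverterUtils

object ProgressProbe {
  def main(args: Array[String]): Unit = {
    // e.g. "application_1559300000000_0001", passed in by the caller (illustrative)
    val appId = ConverterUtils.toApplicationId(args(0))

    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()

    // getProgress() returns the float the ApplicationMaster last reported;
    // for Spark this is the hard-coded 0.1f analysed below.
    val report = yarnClient.getApplicationReport(appId)
    println(s"state=${report.getYarnApplicationState}, progress=${report.getProgress}")

    yarnClient.stop()
  }
}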

 

Let's first look at how a Spark on YARN job is submitted:

 

When a job is submitted to YARN (in cluster mode), spark-submit replaces the main class with Client:

childMainClass = "org.apache.spark.deploy.yarn.Client"

The details of the spark-submit process itself are covered elsewhere.
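The original post does not show the submission itself. Purely as an illustration, a YARN cluster-mode submission could also be made programmatically through Spark's SparkLauncher; the jar path and main class below are placeholders, and SPARK_HOME is assumed to be set in the environment:

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object SubmitToYarn {
  def main(args: Array[String]): Unit = {
    // Placeholder application jar and main class -- not from the original post.
    val handle: SparkAppHandle = new SparkLauncher()
      .setAppResource("/path/to/my-app.jar")
      .setMainClass("com.example.MyApp")
      .setMaster("yarn")
      .setDeployMode("cluster")
      .startApplication()

    // The handle only exposes coarse lifecycle states (SUBMITTED, RUNNING, FINISHED, ...),
    // not a percentage -- consistent with the 10% behaviour analysed in this post.
    while (!handle.getState.isFinal) {
      println(s"state=${handle.getState}")
      Thread.sleep(5000)
    }
  }
}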

 

Next, look at how Client runs:

org.apache.spark.deploy.yarn.Client

def main(argStrings: Array[String]) {
  ...
  val sparkConf = new SparkConf
  // SparkSubmit would use yarn cache to distribute files & jars in yarn mode,
  // so remove them from sparkConf here for yarn mode.
  sparkConf.remove("spark.jars")
  sparkConf.remove("spark.files")
  val args = new ClientArguments(argStrings)
  new Client(args, sparkConf).run()
  ...

def run(): Unit = {
  this.appId = submitApplication()
  ...

def submitApplication(): ApplicationId = {
  ...
  val containerContext = createContainerLaunchContext(newAppResponse)
  ...

private def createContainerLaunchContext(newAppResponse: GetNewApplicationResponse)
  : ContainerLaunchContext = {
  ...
  val amClass =
    if (isClusterMode) {
      Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
    } else {
      Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
    }

The call chain here is Client.main -> run -> submitApplication -> createContainerLaunchContext, which sets amClass. Either way the process ends up in ApplicationMaster, because ExecutorLauncher simply delegates to ApplicationMaster, as shown below:

org.apache.spark.deploy.yarn.ExecutorLauncher

object ExecutorLauncher {

  def main(args: Array[String]): Unit = {
    ApplicationMaster.main(args)
  }

}

 

Next, look at ApplicationMaster:

org.apache.spark.deploy.yarn.ApplicationMaster

def main(args: Array[String]): Unit = {
  ...
  SparkHadoopUtil.get.runAsSparkUser { () =>
    master = new ApplicationMaster(amArgs, new YarnRMClient)
    System.exit(master.run())
  }
  ...

final def run(): Int = {
  ...
  if (isClusterMode) {
    runDriver(securityMgr)
  } else {
    runExecutorLauncher(securityMgr)
  }
  ...

private def registerAM(
    _sparkConf: SparkConf,
    _rpcEnv: RpcEnv,
    driverRef: RpcEndpointRef,
    uiAddress: String,
    securityMgr: SecurityManager) = {
  ...
  allocator = client.register(driverUrl,
    driverRef,
    yarnConf,
    _sparkConf,
    uiAddress,
    historyAddress,
    securityMgr,
    localResources)

  allocator.allocateResources()
  reporterThread = launchReporterThread()
  ...

private def launchReporterThread(): Thread = {
  // The number of failures in a row until Reporter thread give up
  val reporterMaxFailures = sparkConf.get(MAX_REPORTER_THREAD_FAILURES)

  val t = new Thread {
    override def run() {
      var failureCount = 0
      while (!finished) {
        try {
          if (allocator.getNumExecutorsFailed >= maxNumExecutorFailures) {
            finish(FinalApplicationStatus.FAILED,
              ApplicationMaster.EXIT_MAX_EXECUTOR_FAILURES,
              s"Max number of executor failures ($maxNumExecutorFailures) reached")
          } else {
            logDebug("Sending progress")
            allocator.allocateResources()
          }
          ...

The call chain here is ApplicationMaster.main -> run; run calls either runDriver or runExecutorLauncher, and both eventually reach registerAM, which calls YarnAllocator.allocateResources. launchReporterThread then starts a reporter thread that keeps calling YarnAllocator.allocateResources as well. Now look at YarnAllocator:

org.apache.spark.deploy.yarn.YarnAllocator

def allocateResources(): Unit = synchronized {
  updateResourceRequests()

  val progressIndicator = 0.1f
  // Poll the ResourceManager. This doubles as a heartbeat if there are no pending container
  // requests.
  val allocateResponse = amClient.allocate(progressIndicator)

As the code shows, the progress reported to the ResourceManager is hard-coded to 0.1, i.e. 10%. That is why the execution progress of a Spark on YARN application stays at 10% for its entire run, and why trying to monitor job progress through YARN is futile.
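If actual progress numbers are needed, they have to come from Spark itself rather than from YARN. The following is not from the original post, only a sketch of one possible workaround using SparkContext.statusTracker from inside the running application:

import org.apache.spark.SparkContext

object StageProgress {
  // Rough per-stage progress, read from inside the application.
  // Illustrative workaround only; nothing here comes from the original post.
  def log(sc: SparkContext): Unit = {
    val tracker = sc.statusTracker
    for (stageId <- tracker.getActiveStageIds; info <- tracker.getStageInfo(stageId)) {
      val pct =
        if (info.numTasks == 0) 0.0
        else 100.0 * info.numCompletedTasks / info.numTasks
      println(f"stage $stageId: ${info.numCompletedTasks}/${info.numTasks} tasks ($pct%.1f%%)")
    }
  }
}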

 

Reposted from: https://www.cnblogs.com/barneywill/p/10250694.html
