
Hi All,

I am trying to run the CDC job J_FileCDC.dxj by specifying

dmxjob /run J_FileCDC.dxj

It runs successfully, but I believe it is running on the Linux edge node rather than on Hadoop/YARN as a MapReduce job. How can I force this job to run on Hadoop/YARN as a MapReduce job?

The reason is that if we automate invoking the "dmxjob..." command above through Autosys or Control-M and run, say, 100 jobs (one per table) at the same time, I believe the edge node will hit resource constraints (CPU, RAM, etc.).

For that reason, I want to know how to force a job to always run on Hadoop/YARN as a MapReduce job.

Thanks,

Kishore Veleti A.V.K.


Replies

  • Kishore,

    Why do you think the job is not running in the Hadoop Framework? Can you share the logs that you have for the run?

    Also I assume that you are running in one of our Test Drives right?

    Thanks!

    Chris

    • Hi Chris,

      Thanks for your quick response; I appreciate it.

      My understanding is that when we submit Syncsort jobs, they run as MapReduce jobs. Based on this, I checked YARN for an application ID and do not see any applications running.

      Yes, I am using the Test Drive (Cloudera + Syncsort).

      Thanks,

      Kishore Veleti A.V.K.
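      One quick way to double-check from the edge node is the yarn CLI (assuming it is on the PATH); the snippet below is guarded so it degrades gracefully on a machine where yarn is unavailable:

```shell
# Check whether any DMX-h application actually reached the cluster.
# Guarded with command -v so the snippet still runs where yarn is absent.
if command -v yarn >/dev/null 2>&1; then
    # Applications YARN is currently tracking
    yarn application -list -appStates RUNNING
    # Finished applications, filtered for DMX-h (|| true: grep may find none)
    yarn application -list -appStates FINISHED | grep -i dmx || true
else
    echo "yarn CLI not found on this node"
fi
```

      A job that ran only on the edge node (as in this thread) will not appear in either list.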

    • Hi Kishore,

      I am trying to use the Test Drive (Cloudera + Syncsort, DMX8.1.0_CDH5.3.0_MRv2_VMv1.11), but I am getting errors in all of the Use Case Accelerators when I run them in Hadoop (dmxjob /RUN job_example.dxj /HADOOP).

      Could you please tell me how to run Hadoop jobs?

      The errors are the following:

      dmxjob /RUN J_FileCDC_MultiTargets.dxj /HADOOP

      DMExpress Job : (TREXPINF) the trial will expire in 20 day(s) on Apr 10, 2016

      ************************ BEGIN JOB J_FileCDC_MultiTargets ************************

      DMExpress Job : (SUCCJOBMRABS) job conversion to "Hadoop MapReduce Flow" flow succeeded

      DMExpress Job : (TREXPINF) the trial will expire in 20 day(s) on Apr 10, 2016

      ************************ BEGIN JOB J_FileCDC_MultiTargets: subgroup T_PerformJoinCDC_MultiTargets ************************

      DMExpress Job : (HRUNMAPR) job was run on the Hadoop cluster as a MapReduce job

      DMExpress Job : (HNOTHDOP) multiple (or no) HDFS targets in the job are written to HDFS by DMExpress, and not by the Hadoop framework

      16/03/21 12:31:36 INFO mapred.FileInputFormat: Total input paths to process : 2

      16/03/21 12:31:36 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032

      16/03/21 12:31:36 INFO input.FileInputFormat: Total input paths to process : 2

      16/03/21 12:31:36 INFO mapreduce.JobSubmitter: number of splits:2

      16/03/21 12:31:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1458191904074_0031

      16/03/21 12:31:37 INFO impl.YarnClientImpl: Submitted application application_1458191904074_0031

      16/03/21 12:31:37 INFO mapreduce.Job: The url to track the job: http://DMXhTstDrvCDH:8088/proxy/application_1458191904074_0031/

      16/03/21 12:31:37 INFO mapreduce.Job: Running job: job_1458191904074_0031

      16/03/21 12:31:44 INFO mapreduce.Job: Job job_1458191904074_0031 running in uber mode : false

      16/03/21 12:31:44 INFO mapreduce.Job:  map 0% reduce 0%

      16/03/21 12:31:51 INFO mapreduce.Job: Task Id : attempt_1458191904074_0031_m_000001_0, Status : FAILED

      Error: java.io.IOException: (DMXMFAIL) the DMX-h map task did not complete as expected; see the standard error log for task attempt <attempt_1458191904074_0031_m_000001_0>

              at com.syncsort.dmexpress.hadoop.H.a(Unknown Source)

              at com.syncsort.dmexpress.hadoop.r.close(Unknown Source)

              at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:727)

              at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)

              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)

              at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)

              at java.security.AccessController.doPrivileged(Native Method)

              at javax.security.auth.Subject.doAs(Subject.java:415)

              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)

              at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

       

      16/03/21 12:31:51 INFO mapreduce.Job: Task Id : attempt_1458191904074_0031_m_000000_0, Status : FAILED

      Error: java.io.IOException: (DMXMFAIL) the DMX-h map task did not complete as expected; see the standard error log for task attempt <attempt_1458191904074_0031_m_000000_0>

              at com.syncsort.dmexpress.hadoop.H.a(Unknown Source)

              at com.syncsort.dmexpress.hadoop.r.close(Unknown Source)

              at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:727)

              at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)

              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)

              at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)

              at java.security.AccessController.doPrivileged(Native Method)

              at javax.security.auth.Subject.doAs(Subject.java:415)

              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)

              at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

       

    • Chris,

      I could not find any logs in /usr/local/dmexpress/logs on the Test Drive machine. Where can I find the logs for the job I executed, which ran successfully?

      Thanks,

      Kishore Veleti A.V.K.

    • The logs for a Hadoop run should show up in YARN, as you expect.

      When you run DMX-h from the command line with

      dmxjob /run ...

      logging is written directly to the console. Can you share that with us, as well as the dmxjob command that you are running?

      Thanks!

      Chris

    • Chris,

      Below are the logs and commands I executed. Thanks for your time.

      cd /UCA/bin
      ./prep_dmx_example.sh ALL

      cd /UCA/Jobs/HDFSLoad/DMXHDFSJobs

      -bash-4.1$ dmxjob /run J_HDFSLoad.dxj
      DMExpress Job : (TREXPINF) the trial will expire in 30 day(s) on Mar 5, 2016
      ************************ BEGIN JOB J_HDFSLoad ************************

      ************************ BEGIN TASK T_HDFSLoad ************************
      [DMExpress 8.1 Linux 2.6 x86_64 64-bit Copyright (c) 2015 Syncsort Inc.]
      [For demo use only]
      DMExpress : (TREXPINF) the trial will expire in 30 day(s) on Mar 5, 2016
      02/04/2016 19:02:21 - Processing /UCA/Jobs/HDFSLoad/DMXHDFSJobs/T_HDFSLoad.dxt, last modified on 03/11/2015 19:25:52
      02/04/2016 19:02:26 - DMExpress options validated. Processing continues.

                                      DMExpress statistics

      Source: /UCA/Data/Source/supplier.dat
          last modified on 03/11/2015 14:57:14
      Records read:                     10,000  Data read (bytes):          1,395,037
      Records copied:                   10,000  Data copied (bytes):        1,395,037
      Target: /UCA/HDFSData/Target/supplier.dat
          last modified on **********
      Records output:                   10,000  Data output (bytes):        1,385,037
      Input record length:                 192  Output record length:             191
      Memory guideline from job (MB):      100
      Virtual memory allocated (MB):        31  Physical memory used (MB):         14
      Work space used (bytes):               0

      Elapsed time:                 0:00:05.56  CPU time:                  0:00:00.54

      02/04/2016 19:02:27 - DMExpress has completed
      ************************ END TASK T_HDFSLoad   ************************
      Job has completed successfully.
      Total elapsed time: 0:00:07.19
      ************************ END JOB J_HDFSLoad   ************************

    • Ahh, this helps a lot.

      To run your job on the cluster, you need to pass the /HADOOP option when executing the dmxjob command. For example:

      dmxjob /hadoop /run J_HDFSLoad.dxj

      More detail on the options for the dmxjob command is in the help.

      Thanks!

      Chris
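      For the original scheduling scenario (many jobs fired from Autosys/Control-M, one per table), a wrapper that always passes /HADOOP keeps the processing on the cluster rather than the edge node. A minimal sketch; the table list and the J_FileCDC_<table>.dxj naming are assumptions, not actual Test Drive file names:

```shell
#!/bin/sh
# Hypothetical wrapper: submit one DMX-h job per table with /HADOOP so the
# data processing runs on the cluster; only submission uses the edge node.
JOB_DIR=/UCA/Jobs/FileCDC            # assumed location of per-table .dxj jobs

for table in customer orders supplier; do
    job="$JOB_DIR/J_FileCDC_${table}.dxj"
    # Print the command here; swap echo for the real invocation when using it:
    #   dmxjob /HADOOP /RUN "$job"
    echo "dmxjob /HADOOP /RUN $job"
done
```

      A scheduler would then call this wrapper (or one instance per table) instead of invoking dmxjob /run directly.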
