Join this group if you use the Hadoop version of DMX.

Hi All,

I just found the Syncsort DTL approach on GitHub and I am very excited about it. I need to handle a large number of files belonging to multiple Hive tables, i.e. copy them from a local Linux host to HDFS based on filter criteria, and I thought the DTL approach might help. Here are more details.

Assume that on the Linux file system we have a folder named /mydata/srct-ables-dump/, and that this folder contains these data files (text files):

    table_customers_jan01_2016.txt
    table_customers_jan02_2016.txt
    table_orders_jan01_2016.txt
    table_orders_jan02_2016.txt

We need to copy these 4 files into two different HDFS folders, /hdfs-data/hive-tables/customers and /hdfs-data/hive-tables/orders. In Hive we would like the data partitioned by business date, i.e. the date in the file name, so that the HDFS paths eventually look like /hdfs-data/hive-tables/orders/jan01_2016/table_orders_jan01_2016.txt and /hdfs-data/hive-tables/orders/jan02_2016/table_orders_jan02_2016.txt.

Since a single folder holds data for multiple tables, I don't know how to develop a Syncsort COPY task for this. To achieve it, I thought developing a DTL with a COPY task might help.
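The file-name-to-partition mapping described above can be sketched in Java (chosen because it runs on the same JVM as Groovy and the post already plans to use the Java IO API). The sketch assumes every file follows the `table_<name>_<mmmdd>_<yyyy>.txt` convention inferred from the four example names; `PartitionPaths`, `targetPath`, and `planByTable` are hypothetical names, not part of DMX-h.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartitionPaths {
    // Assumed naming convention, inferred from the example files:
    // table_<tableName>_<mmmdd>_<yyyy>.txt, e.g. table_orders_jan02_2016.txt
    private static final Pattern FILE_NAME =
            Pattern.compile("table_([a-z0-9_]+?)_([a-z]{3}\\d{2}_\\d{4})\\.txt");

    // Returns <hdfsRoot>/<table>/<businessDate>/<fileName>, or null when the
    // name does not follow the assumed convention.
    static String targetPath(String hdfsRoot, String fileName) {
        Matcher m = FILE_NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return hdfsRoot + "/" + m.group(1) + "/" + m.group(2) + "/" + fileName;
    }

    // Groups target paths by table name (sorted), skipping non-matching files.
    static Map<String, List<String>> planByTable(String hdfsRoot, List<String> fileNames) {
        Map<String, List<String>> byTable = new TreeMap<>();
        for (String name : fileNames) {
            Matcher m = FILE_NAME.matcher(name);
            if (m.matches()) {
                byTable.computeIfAbsent(m.group(1), table -> new ArrayList<>())
                        .add(targetPath(hdfsRoot, name));
            }
        }
        return byTable;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "table_customers_jan01_2016.txt",
                "table_customers_jan02_2016.txt",
                "table_orders_jan01_2016.txt",
                "table_orders_jan02_2016.txt");
        planByTable("/hdfs-data/hive-tables", files).forEach((table, paths) -> {
            System.out.println(table);
            paths.forEach(p -> System.out.println("  " + p));
        });
    }
}
```

The lazy `+?` group lets table names that themselves contain underscores (for example a hypothetical `order_items` table) still split correctly from the date part.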
At a high level:

```
/TASK COPY
/INPUT $LOCAL_DIR/table_customers_jan01_2016.txt ...
/OUTPUT $HDFS_DIR/table_customers_jan01_2016.txt ...
```

I am planning to write a for loop in a Groovy script; each iteration will build the above DTL string, write it to a temp file, and pass that file as input to the dmxjob /run command. I wanted to know from the community whether there are any issues with this approach, or alternative ways of doing the same.

ingest_data.groovy is the Groovy file name:

```groovy
// ingest_data.groovy
def loadDataFromNFSToHDFS(List<String> inputTableNames, File localDir) {
    for (eachTable in inputTableNames) {
        // Using the Java IO API, find all files whose names start with table_<eachTable>_
        def allFilesInNFSForThisTable = localDir.listFiles().findAll {
            it.isFile() && it.name.startsWith("table_${eachTable}_")
        }
        for (eachFileForTable in allFilesInNFSForThisTable) {
            // build the DTL string with the file name as an absolute path
            // store the DTL string as a temp file
            // run dmxjob /run
            // move the local file from its current position to an archive location
            // (maybe another task in the above DTL, immediately after copying the local file to HDFS)
        }
    }
}
```

Thanks,
Kishore Veleti A.V.K.
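A minimal Java sketch of the per-file steps in the loop above: write one COPY task to a temp file, build the `dmxjob /run` command for it, and archive the local file only after a successful run. The task text only mirrors the `/TASK COPY`, `/INPUT`, `/OUTPUT` outline from the post and is not complete DTL; the elided options still have to come from the DMX-h documentation. It also assumes, without confirmation, that `dmxjob` accepts the file written here and reports failure through a non-zero exit status. `CopyJobWriter`, `copyTaskOutline`, `prepareRun`, and `runAndArchive` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class CopyJobWriter {
    // Mirrors only the /TASK COPY, /INPUT, /OUTPUT outline from the post.
    // Assumption: this is NOT complete DTL; the options elided in the post
    // must still be added from the DMX-h DTL documentation.
    static String copyTaskOutline(String localFile, String hdfsFile) {
        return "/TASK COPY\n"
                + "/INPUT " + localFile + "\n"
                + "/OUTPUT " + hdfsFile + "\n";
    }

    // Writes the task text to a temp file and returns the command line the post
    // plans to use ("dmxjob /run <file>"). Nothing is executed here.
    static List<String> prepareRun(String localFile, String hdfsFile) throws IOException {
        Path taskFile = Files.createTempFile("copy_task_", ".dtl");
        taskFile.toFile().deleteOnExit();
        Files.write(taskFile, copyTaskOutline(localFile, hdfsFile).getBytes(StandardCharsets.UTF_8));
        return List.of("dmxjob", "/run", taskFile.toString());
    }

    // Runs the command and moves the local file into archiveDir only if the
    // process exits with status 0 (assumed to mean success); otherwise the
    // file stays in place so it can be retried.
    static boolean runAndArchive(List<String> command, Path localFile, Path archiveDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        if (process.waitFor() != 0) {
            return false;
        }
        Files.createDirectories(archiveDir);
        Files.move(localFile, archiveDir.resolve(localFile.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        return true;
    }

    public static void main(String[] args) throws IOException {
        List<String> command = prepareRun(
                "/mydata/srct-ables-dump/table_orders_jan02_2016.txt",
                "/hdfs-data/hive-tables/orders/jan02_2016/table_orders_jan02_2016.txt");
        // Only print the command here; call runAndArchive(...) on a host with DMX-h installed.
        System.out.println(String.join(" ", command));
    }
}
```

Archiving only after a zero exit status keeps a failed copy from losing the source file. Since every `dmxjob` invocation is a separate process launch, generating one DTL file containing a COPY task per input file, instead of one job per file, may be worth considering when there are many files.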

Replies

  • Kishore,

    Yes it does look like DTL would help here. The situation you are describing, generation of DMX-h tasks programmatically, is exactly what DTL was designed for.

    Thanks!

    Chris

  • Somehow all the text formatting of my comments above was lost; sorry for the inconvenience.