
Getting source row counts

I am extracting from operational databases and landing the data in HDFS, qualifying rows by "last_update_date". The operational DB is constantly changing, so last_update_date keeps changing too. How can I ensure the row counts in HDFS match what I expect? If I go back and re-count my source DB, the count will almost certainly differ from what I landed in HDFS, because the DB has been updated in the meantime.

Is there any "in-line count" in DMX-h? Something to count the rows as I'm extracting them and store that metadata somewhere, so AFTER I land in HDFS, I can count what is in HDFS and compare against that metadata?

Seems like this would be a common problem, just wondering how other folks are getting around this balancing problem?


Replies

  • Thanks Brett - easy, excellent solution!! Appreciate it!!

  • Hi Don,

    What we do today is use Python to parse the DMX-h log, pull out the load metrics, and store them in a custom-built metadata database.

    Thanks,

    Brett
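
For anyone wanting to try the approach Brett describes, here is a minimal sketch. It is not his actual code. The log line format matched below ("<target> records written: <count>") is a made-up assumption, since the real DMX-h log layout is not shown in this thread, so the regex will need adjusting to your own logs. SQLite stands in for the metadata database, and the table and function names are hypothetical.

```python
import re
import sqlite3

# ASSUMPTION: hypothetical log line such as "orders records written: 1,204,331".
# The real DMX-h log format is not shown in the thread; adapt this pattern.
RECORD_LINE = re.compile(
    r"^(?P<target>\S+)\s+records written:\s+(?P<count>[\d,]+)\s*$"
)


def parse_row_counts(log_text):
    """Return {target: row_count} for every log line matching RECORD_LINE."""
    counts = {}
    for line in log_text.splitlines():
        match = RECORD_LINE.match(line.strip())
        if match:
            # Drop thousands separators before converting to int.
            counts[match.group("target")] = int(
                match.group("count").replace(",", "")
            )
    return counts


def store_counts(conn, job_name, counts):
    """Save the extracted counts in a (hypothetical) load_metrics table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_metrics ("
        "job_name TEXT, target TEXT, row_count INTEGER, "
        "loaded_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.executemany(
        "INSERT INTO load_metrics (job_name, target, row_count) VALUES (?, ?, ?)",
        [(job_name, target, n) for target, n in counts.items()],
    )
    conn.commit()


def counts_match(conn, job_name, target, hdfs_count):
    """Compare a count taken in HDFS with the most recent stored load metric."""
    row = conn.execute(
        "SELECT row_count FROM load_metrics "
        "WHERE job_name = ? AND target = ? ORDER BY rowid DESC LIMIT 1",
        (job_name, target),
    ).fetchone()
    return row is not None and row[0] == hdfs_count
```

After a load finishes, you would call parse_row_counts on the job log, store the result, then count the landed rows in HDFS by whatever means you already use and pass that number to counts_match. Because the stored count comes from the extract itself, later changes in the source DB do not affect the comparison.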
