Join this group if you use the Hadoop version of DMX.


Getting source row counts

I am extracting from operational databases and landing the data in HDFS. I am filtering incrementally on "last_update_date". The operational DB is constantly changing, so the last_update_date values are constantly changing too. How can I make sure the row counts in HDFS match what I expect? If I go back and re-count the source DB, the count will almost certainly differ from what I landed in HDFS, because the DB has been updated in the meantime.

Is there any "in-line count" in DMX-H? Something that counts the rows as I extract them and stores that count somewhere, so that AFTER I land the data in HDFS, I can count what is in HDFS and compare it against that metadata?
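For illustration, here is the general "control file" pattern being asked about, written as plain Python rather than as any DMX-H feature. The rows are counted at the moment they are written, so the recorded number describes exactly what was landed and does not depend on later changes in the source. This is a minimal sketch under stated assumptions. The function names `land_with_count` and `verify_landed` and the JSON control-file layout are hypothetical. A local file stands in for an HDFS file, and a Python list stands in for the source query result.

```python
import csv
import json
import os
import tempfile
from typing import Iterable, Sequence


def land_with_count(rows: Iterable[Sequence], data_path: str, control_path: str) -> int:
    """Write rows as CSV, counting them in-line, then record the count in a control file.

    Hypothetical helper: the control file is a small JSON document stored next to the data.
    """
    count = 0
    with open(data_path, "w", newline="") as out:
        writer = csv.writer(out)
        for row in rows:
            writer.writerow(row)
            count += 1  # count exactly what was written, not what the source holds later
    with open(control_path, "w") as ctl:
        json.dump({"data_file": os.path.basename(data_path), "row_count": count}, ctl)
    return count


def verify_landed(data_path: str, control_path: str) -> bool:
    """Re-count the landed records and compare them with the count in the control file."""
    with open(control_path) as ctl:
        expected = json.load(ctl)["row_count"]
    with open(data_path, newline="") as f:
        # csv.reader counts records, so quoted fields with embedded newlines count once
        actual = sum(1 for _ in csv.reader(f))
    return actual == expected


if __name__ == "__main__":
    # Stand-in for a source cursor; in practice these rows would come from the extract query.
    source_rows = [(1, "a"), (2, "b"), (3, "multi\nline")]
    with tempfile.TemporaryDirectory() as d:
        data = os.path.join(d, "part-00000.csv")
        ctl = os.path.join(d, "part-00000.ctl.json")
        n = land_with_count(source_rows, data, ctl)
        print(n, verify_landed(data, ctl))  # prints "3 True"
```

The design choice is that the comparison runs between the landed data and a count captured during the same pass that wrote it, never against the live source. In a real Hadoop setup, the re-count step would run against the files in HDFS, for example with a Hive `count(*)` over the landed table.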

This seems like it would be a common problem. How are other folks handling this balancing problem?
