Join this group if you use the Hadoop version of DMX.
I am extracting from operational databases and landing the data in HDFS. I am qualifying rows by "last_update_date". The operational DB is constantly changing, so the last_update_date values keep changing too. How can I ensure the row counts in HDFS match what I expect? If I go back and re-count my source DB after the extract, the count will almost certainly differ from what I landed in HDFS, because rows have been updated in the meantime.
Is there any "in-line count" in DMX-H? Something that counts the rows as I'm extracting them and stores that count as metadata somewhere, so that AFTER I land in HDFS I can count what is in HDFS and compare it against that metadata?
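To make the idea concrete, here is a rough sketch of what I mean, written in plain Python rather than DMX-H. Everything in it is hypothetical: the function names, the manifest file, and the local files standing in for HDFS are only there to illustrate the pattern (count in the same pass as the extract, save the count, compare after landing). It is not a DMX-H feature or API.

```python
import json
import os
import tempfile


def extract_with_count(rows, sink_path, manifest_path):
    """Write rows to sink_path, counting them in the same pass, then save the count."""
    count = 0
    with open(sink_path, "w", encoding="utf-8") as sink:
        for row in rows:
            sink.write(",".join(str(v) for v in row) + "\n")
            count += 1
    # Hypothetical manifest: the count is taken from the rows actually
    # extracted, not from a second query against the changing source.
    with open(manifest_path, "w", encoding="utf-8") as manifest:
        json.dump({"rows_extracted": count}, manifest)
    return count


def reconcile(landed_path, manifest_path):
    """Count what actually landed and compare with the count recorded at extract time."""
    with open(manifest_path, encoding="utf-8") as manifest:
        expected = json.load(manifest)["rows_extracted"]
    with open(landed_path, encoding="utf-8") as landed:
        actual = sum(1 for _ in landed)
    return expected, actual, expected == actual


def demo():
    # Stand-in for the source query result; local files stand in for HDFS.
    source_rows = [
        (1, "a", "2015-01-01"),
        (2, "b", "2015-01-02"),
        (3, "c", "2015-01-03"),
    ]
    workdir = tempfile.mkdtemp()
    landed = os.path.join(workdir, "landed.csv")
    manifest = os.path.join(workdir, "manifest.json")
    extract_with_count(iter(source_rows), landed, manifest)
    return reconcile(landed, manifest)
```

The point is that the expected count comes from the extract stream itself, so later updates to the source DB cannot throw the comparison off.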
This seems like it would be a common problem. How are other folks getting around this balancing issue?