ETL Server: avoid redundant extracts

  • ETL Server: avoid redundant extracts

    A question for the development team:

    In the project I am currently setting up, there are six really complex (and therefore slow) ODBC extracts. Each of them is used more than once, since several transforms are based on each of them.

    I managed to cut the runtime of the entire job in half with a preceding job that runs these six extracts just once and writes the results to six flat files, so that all the transforms can read the flat files instead of redundantly re-executing the ODBC extracts.

    Could some logic be introduced into ETL Server that does something like this automatically, so one does not have to set it up manually?

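    The manual workaround can be sketched roughly like this (a Python sketch, not ETL Server code; the table names, queries, and the sqlite3 stand-in for the real ODBC connection are all illustrative assumptions):

    ```python
    import csv
    import sqlite3  # stands in here for the real ODBC connection

    # Hypothetical slow extract queries; names and SQL are illustrative only.
    EXTRACTS = {
        "customers": "SELECT id, name FROM customers",
        "orders": "SELECT id, customer_id, total FROM orders",
    }

    def run_extracts_once(conn, out_dir="."):
        """Execute each slow extract a single time and dump it to a flat
        file, so downstream transforms read the file instead of
        re-running the query."""
        paths = {}
        for name, sql in EXTRACTS.items():
            cur = conn.execute(sql)
            path = f"{out_dir}/{name}.csv"
            with open(path, "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow([d[0] for d in cur.description])  # header row
                writer.writerows(cur.fetchall())                  # data rows
            paths[name] = path
        return paths
    ```

    The transforms then point at the CSV files, and the expensive queries run exactly once per job.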

  • Interesting idea, but it is difficult to find a good logic for it:
    - When should this cache be updated: after each job, every day?
    - If you have a SQL database available, it would be faster to write the data there instead of to a file.
    Doing it manually as proposed seems performant, flexible, and not overly complicated to model.
  • Hello Andreas,

    Sure, doing that manually is not really a problem.

    What I had in mind was something like this: when ETL Server notices that, within the same job (for example a default job that contains several other jobs), the same ODBC source will be queried more than once, it caches the result or stores it in the Groovy database (isn't that done anyway?).
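    The behaviour being asked for amounts to memoizing the extract within one job run. A minimal sketch of the idea (Python, purely illustrative; `make_cached_query` is a hypothetical helper, not an ETL Server feature):

    ```python
    def make_cached_query(conn):
        """Wrap a connection so that identical SQL issued within one job
        run is executed only once; repeat requests are served from an
        in-memory cache keyed on the SQL text."""
        cache = {}

        def query(sql):
            if sql not in cache:
                cache[sql] = conn.execute(sql).fetchall()  # first hit: run it
            return cache[sql]                              # later hits: cached

        return query
    ```

    Keying on the SQL text keeps the cache scoped to a single job run, which sidesteps the invalidation question raised above: the cache simply disappears when the job finishes.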