ETL Server: avoid redundant extracts

    A question for the development team:

    In the project I am currently setting up, there are six really complex (and therefore slow) ODBC extracts. Each of them is used more than once, since various transforms are based on each of them.

    I managed to get the entire job to execute in half the time by means of a preceding job which runs these six extracts just once and writes the results into six flat files, so that all the transforms can read the flat files instead of redundantly re-executing the ODBC extracts.
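    The workaround described above can be sketched in a few lines. This is a minimal illustration, not ETL Server code: `extract_once` is a hypothetical helper, and an in-memory SQLite table stands in for the slow ODBC source. The extract runs once and lands in a flat file; every later call reads the file instead of hitting the database again.

    ```python
    import csv
    import os
    import sqlite3
    import tempfile

    def extract_once(conn, query, cache_path):
        """Run a slow extract a single time and persist the rows to a flat file.

        Later transforms call this with the same cache_path and get the file's
        contents back instead of re-executing the query against the source."""
        if os.path.exists(cache_path):
            with open(cache_path, newline="") as f:
                return [row for row in csv.reader(f)]
        rows = [[str(v) for v in row] for row in conn.execute(query)]
        with open(cache_path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        return rows

    # Demo: an in-memory SQLite table standing in for the ODBC source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,)])
    cache = os.path.join(tempfile.mkdtemp(), "orders.csv")

    first = extract_once(conn, "SELECT id FROM orders", cache)
    conn.execute("INSERT INTO orders VALUES (3)")                 # source changes...
    second = extract_once(conn, "SELECT id FROM orders", cache)   # ...but the file wins
    ```

    Note the trade-off Andreas raises below in passing: the cache file is only as fresh as its last write, so the preceding job has to delete or rewrite the files on every run.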

    Could one introduce some logic into ETL Server which does something similar automatically so one does not have to set that up manually?


    Interesting idea, but it is difficult to find good logic for it:
    - When should this "cache" be updated: after each job, every day?
    - If you have an SQL database available, it would be faster to write the data there instead of to a file.
    It seems that doing it manually, as you proposed, is performant, flexible, and not all that complicated to model.
    Hello Andreas,

    sure, it is no real problem to do that manually.

    What I had in mind was something like this: when ETL Server notices that, within the same job (for example a default job which contains multiple other jobs), an ODBC source will be queried more than once, it caches the result or stores it in the groovy database (isn't that done anyway?).