Performance impact of access to attribute cubes in rules vs. direct storage of data in cube cells

    Performance impact of access to attribute cubes in rules vs. direct storage of data in cube cells

    Dear community,

    What is the performance impact of access to attribute cubes in rules
    vs. direct storage of data in cube cells?

    Simplified example:

    I have a product dimension with 20.000 products.
    Each product has a price which is slowly changing.

    Of course there is a time dimension on level year+month.

    There will be rules on the cube:
    Turnover = Price * Units
    + additionally there are several margin calculations on top

    I would see 2 possibilities to cover this in my cube:

    1.
    Write the prices per month directly into the cube

    2.
    Transform the product dimension into a Type 2 SCD
    via an additional hierarchy level.

    Then store the price in an attribute and refer to it in a cube rule:
    ['Price'] = B: PALO.DATA("db","#_products","price")

    Product A
    Product A: 01.2013 - 06.2014; Price = 400
    Product A: 07.2014 - 12.2015; Price = 420
    Product A: 01.2016 - ; Price = 450

    Annotation:
    There is another dimension with ~20.000 elements as well in the cube, which is probably important for performance considerations.

    I would tend to the SCD approach also considering, that future products will be created manually for planning purposes
    without running through the source system ETL process.
    But are there major drawbacks in terms of performance and scaling?

    Thanks

    JCO :)
    Hi,

    after reading your specs I would personally not go for the attribute solution. The problem is, that you will have to write a rule that get the correct price depending on with year/month you query the data

    so if you have the possibility to pre-calculate the turnover in the ETL that would be better from my point of view

    as for the requirement in the planning you should let the svs calculate the turnover when you change the price or the units value

    hope this helps,
    laloune

    Post hoc, non est propter hoc
    Hi Laloune,

    thanks for your quick reply.
    I didn't thought about SVS in this regard as a kind of replacement for specific rules/ETL.
    It's indeed an interesting approach.
    I have to check this regarding maintainability as we have several cases of the above described scenario.

    However regarding prices and and SCD Type 2 approach:
    The clue in this architecture is that you don't need to do time calculations with the pricing attributes.
    With SCD2 you create a kind of sub elements ("sub products") for each product, where each sub product has its own fixed price.
    Each Sub product has a validity time range.
    It is then more a matter of the ETL process to write the number of sold units per month into the right sub product.
    In the dimension hierarchy then all the sub products are aggregated to the "real" main product with correct turnover.
    (see sample file)

    So then the lookup to attributes is just a simple PALO.DATA to the attribute cube,
    where I have currently no real feeling how expensive these lookups are in Jedox. (as I am coming from the Microsoft BI world)
    ..and if there are any other disadvantages regarding markers etc

    Thanks for your thoughts
    Jens
    Images
    • SCD Prices.png

      112.13 kB, 1,848×118, viewed 57 times
    Files
    • SCD Prices.zip

      (10.68 kB, downloaded 46 times, last: )
    ok I got it, I thought this was the same product element using different prices depending of the time.

    in this case this should be ok, though you can use a PALO.MARKER function for getting the price in the rule so that the rule will not be evaluated if the price is 0
    laloune

    Post hoc, non est propter hoc
    Ok, but coming back to the original question
    ​what is the difference in performance between:

    Variant A:
    all prices are stored directly in cube cells
    this would affect approx. 90million cells out of the whole cube

    Variant B:
    price is taken from attribute cube via rule
    ['Price'] = B: PALO.DATA("db","#_products","price")
    ..I guess such a rule is probably not evaluated for all cells of the cube,
    but for a relevant slice according to the query, correct or not?!

    I just would like to hear some experiences about the performance impact:
    ..is A a lot faster than B? or B a lot faster than A? or is there almost no difference?
    What is the personal experience of the users here in the forum?

    Thanks
    Jens