facttables + dimensions + keys

This site uses cookies. By continuing to browse this site, you are agreeing to our Cookie Policy.

  • facttables + dimensions + keys

    Fact tables + Dimensions in PALO

    From the demo i see for example that the name of a dimension is used as key for the dimension and facttable as well.

    We are using for different DWH in Oracle / SQL Server keys for fast loading and small tables.

    Will this also be supported by PALO otherwise we have to make new views/tables for slicers and facttables.

    Example:

    facttable:

    Sales: Id Employee Gender Sales
    1 100 1 eur 100
    2 200 2 eur 200

    dimension:

    employee: id name
    100 marco
    200 palo

    gender: id gender
    1 male
    2 female

    OLAP cube:

    Sales: Name Gender Sales
    marco male eur 100
    palo female eur 200

    Regards,
    Marco Groeneveld
  • RE: facttables + dimensions + keys

    Marco

    this sounds like ROLAP, where Palo thinks MOLAP. The way how your master data are organized fits perfectly for a relational database. One could possibly think of imitating this in a MOLAP database, but I believe it is more suitable to include all the necessary fields in your data export.

    Regards
    Holger
  • It is not special ROLAP, MOLAP or HOLAP.

    But in a DWH it is quite common to use keys instead of names in facttables and dimension tables.

    While the performance is better and less space is needed.

    But any idea if in Palo support of numbers as keys between facttables and dimension tables will be supported ?

    If not not a big problem but in any DWH there should be made special views without the very fast keys.

    Regards, Marco
  • Palo doesn't need to have surrogate keys, it's not querying a relational source.

    Such issues as surrogate keys won't affect a system such as palo. It won't affect the performance of palo since such a product doesn't hold its data in a relational form and therefore you don't need to perform joins to get at the data.
    Palo in essence holds data in large structures in ram.

    If you believe that surrogate keys are essential for performance then I suggest you look at Applix's TM1 which Nigel Pendse of the OLAP Report states as being very underated. TM1 is widely regarded as one of the very fastest OLAP servers on the market and it holds data in the same manner as Palo.

    Sorrogate keys are used in on relational tables when you create star/snowflake schemas.

    Surrogate keys are actually used since you can't necessarly trust the source keys, and therefore you choose to create your own keys since there may be issues with the ones present in the source system.
    In terms of performance, you only get an increase in performance with such a system since you have created a Star / snowflake schema and therefore fewer joins are needed to get your desired results.
    Your performance would obviously then depend on the underlying database engine and how well you indexed the tables.

    However none of this is relevant to a system such as Palo. It doesn't hold data in relational tables, and if you processed data into palo from a relational DWH the keys aren't needed since it doesn't query the underlying data, since you process the data into the system on a batch basis.

    If you want to use surrogate keys and relational data, then you are probably best using Mondrian, an open source ROLAP server written in Java (mondrian.sourceforge.net/). Mondrian would take advantage of your keys and indexes, but you wouldn't be able to do anywhere near the amount of things you could with Palo.

    Creating a view to use as a basis of data loading will not affect your performance since the joins would be on the surrogate keys (foreign key on the fact table and primary key on your dimension table), but you would be selecting the element name.
  • Well it is not the performance of PALO. (but I have to say I have not tested the performance of PALO based on a 20milion rows facttable yet).

    When the new PALO is available nex year I will compare it with a 20milion rows facttable and 20 dimensions from an Oracle DWH and processed to a Microsoft Analysis Services 2000 MOLAP database and queried with Pivottables in Excel / Panorama Novaview.

    I know you can work without keys and use the description in a direct way. But for the import to PALO you need to have the fast access to import the data. And this is why most DWH people and also Oracle Warehouse Builder (one of the best regarding Gartner) is using keys.

    Not using keys is used for small DWHs and maybe PALO is not developed for large OLAP solutions but I will give the Jedox people my + for all their great work!

    Any comments are welcome regarding BI Architecture for DWH, OLAP, etc.

    Regards, Marco
  • The keys really aren't any issue in importing data.
    You use the keys to perform the joins and since you join the tables on the keys you get the performance regardless. Therefore you can hold your data in a fact table using keys but create a view as an abstraction layer on top of the tables for loading the data in Palo.
    See the attached spreadsheet which shows a sample fact table and 3 dimension tables as you can see the tables are joined on the key fields, which since they are indexed are fast. However the query returns the element names (but the work is done in performing the joins - and as already stated since these fields are indexed you get quick performance)

    BTW - Palo is RAM based so that is a limitation, however a 64 bit server can address several petabytes of data so in theory you should be able to have access to very large amounts of data. Also it will probably be possible to setup a clustered solution in Linux eg computerworld.com/hardwaretopi…ory/0,10801,94564,00.html
    Files
    • DWH_Sample.zip

      (2.28 kB, downloaded 628 times, last: )