Beginner relational to Molap design help

This site uses cookies. By continuing to browse this site, you are agreeing to our Cookie Policy.

  • Beginner relational to Molap design help

    Hi,
    I'm a beginner at both MOLAP, Jedox and BI. I have a snowflake database to import into Jedox. I gather in MOLAP that essentially, the legs of the star correspond to a dimension and that the side branches of the legs are handled by creating a hierarchy within that dimension.

    I have come to some difficulties on that path - for instance, lets say my fact table is production of apples. My star schema has a location leg made up of a city and a country of which the country is a branch of the snowflake leg, as there could be a number of locations all using the same country. As far as I understand it, this should be handled in a location dimension with level 1 as the country and level 2 as the city. But this does not work as cities and countries are different objects in the relational database and should not be put into the same dimension table; If it's in Luxembourg, I have a data issue as I cannot put Luxembourg into both the city and the country levels.

    So I did not do that. Instead, I imported the two as two different dimensions, in the knowledge that the fact table (cube) would essentially keep the city Luxembourg in the country Luxembourg related correctly so long as the fact table always associated them together. However, when I came to making a drill down, I found I did not know how to find a sub set of one dimension based on another without going all the way to the cube to find out. Specifically, I wanted two combo boxes, one for country and one for city where the cities in the second combobox would always be within the country selected in the first.

    so.. what is the best implementation?
  • Hi, I undertand that your relational system stores data in the following format and you want to analyze it by geography for example.

    Paris | France | 100
    Lyon | France | 100
    Lux | Lux | 100

    I think most OLAP people stringly advise againgst having two different dimensions representing which more or less defeats the purpose of OLAP and introduces a phenomenon known as sparcity - you will never have any value at intersection Germany/Paris (unless something very dramatic happens in the political arena). I'd suggest one of the following models: either use symmetrical hierarchy or asymmetrical. The benefit of the last is that you only need one node for analysis in cases such as Luxembourg but it makes system maintenance much more difficult as you need to maintain number of exceptions. Let's say you'd need further level of analysis for big regions and split France by department but there are smaller countries, say Denmark for which this level won't make sense. In this case you'd need three different base levels to maintain: Luxembourg leaf in on the 1st level, Danish cities are on the second level and French cities on the third level. In your case, you have country and city levels. Semantically all is correct: Luxembourg is both country and city.

    In my opinion the symmetrical hierarchy (equal number of levels between root node and leaf nodes) design is much cleaner. This becomes at the cost of performance though as system needs additional aggregations. So basically there are two options:

    Symmetrical design:
    Europe (c)
    +France (c)
    ++Paris (n)
    ++Lyon (n)
    +Luxembourg (c)
    ++Luxembourg (n)

    Or alternatively
    Europe
    +Very small countries
    ++Luxembourg
    ++Andorra
    ++&c

    Asymmetrical design
    Europe (c)
    +France (c)
    ++Paris (n)
    ++Lyon (n)
    +Luxembourg (n)

    As to the data load, the first one is very simple, col1 in the relational table is leaf node and col2 is inner node if they all aggregate up to Europe, for exmaple. In case of the second model, the exceptions for Luxembourg (and others) must be maintained.

    First example:
    :child | :parent | fact
    Paris | France | 100
    Lyon | France | 100
    Lux | Lux | 100

    Second example:
    If Col2 is (France, Germany &c)
    :child | :parent | fact
    Paris | France | 100
    Lyon | France | 100
    If col2 is Luxembourg
    :child| skip handling | fact
    Lux | Lux | 100

    Hope this helps
    Madis
  • Hi,
    great thanks - yes, my data format is like;

    Paris | France | 100
    Lyon | France | 100
    Lux | Lux | 100

    I'll add another column to complicate things a little:

    Paris | France | Europe | 100
    Lyon | France | Europe | 100
    Lux | Lux | Europe | 100
    CapeTown | SA | Africa | 100

    For which I'd go for the Symmetrical design:
    Europe (c)
    +France (c)
    ++Paris (n)
    ++Lyon (n)
    +Luxembourg (c)
    ++Luxembourg (n)
    Africa (c)
    +SA(c)
    ++CapeTown(n)

    I tried to load this using the TreeFH ETL but got a circular dependency error for Luxembourg. (I created a straight field transform to load in the table and then added the levels in order in the TreeFH screen.) What ETL mode should I use such that the levels do not get confused?

    Cheers!
  • Sadly, I am not in control of the data here, and there are many instances of name re-use over the tables in my snowflake model.

    Is the only way here to prefix the element name with level so they become unique and save the name as an attribute?

    Or should I go back to the one dimension per level model?
  • I'd definitely recommend against dimension per relational column model. Data integrity problems are pretty common in organizations and the bigger an organization gets the bigger issues they will have.

    Storing human-legible names as attributes is quite good solution (in most cases localization is done this way).
  • Hi, thanks for staying with me on this:
    so, for my dataset,

    Paris | France | Europe | 100
    Lyon | France | Europe | 100
    Lux | Lux | Europe | 100
    CapeTown | SA | Africa | 100
    Lux2 | Lux | Europe | 100

    I map it to something like
    Element1 | NameAttribute1 | Element2 | NameAttribute2 | Element3 | NameAttribute3 |
    City:Paris | Paris | Country:France | France | Continent:Europe | Europe | 100
    etc...

    Then I load this using the TreeFH with three rows with inputs Element1 Element2 and Element3 and I specify an Attribute 'Name' which I fill in each row: NameAttribute3 then NameAttribute3 then NameAttribute3.

    Is this correct? I find the Attribute upload confusing, as I can only specify one input variable per row in the attribute column, but I can make more than one attribute; how do I specify the input variables for more than one attribute in one element row? (If I wanted to add a telephone code, for example, at the same time as adding the human readable name?)

    I had a look at the manual, but it only specifies one attribute, 'Name'.
  • I realise now that in the load, more than one attribute can be created in the dropdown box; changing between these allows a different attribute link to be set in each of the element rows below. I had though that the link did not change in when teh attribute in the dropdown changed, and therefore it was impossible to set more than one attribute link during a load.
  • There is one other alternative you may want to explore (I haven't done this myself but I think its doable), is to go the route of surrogate keys. Meaning, you start a separate table (external to Jedox) with the mapping of IDs to city names like this (where city_id is the primary_key)

    city_id, city_name, country_id, country_name, region_id, region_name
    145, Luxembourg, 15, Luxembourg, 1, Europe
    147, Kansas City, 25, Kansas, 2, North America
    148, Kansas City, 27, Missouri, 2, North America

    Every unique tuple of Region, Country, City gets a unique city_id.
    country_id, country_name and region_id, region_name are for unique instances of country and region.

    When you build your dimension using the city_id, country_id and region_id
    (Add the actual city, country and region strings as attributes).

    This structure will allow for duplicate city names since our external city_ids are guaranteed to be unique.

    Check out the "Working with attributes" section p.56 in the Jedox 4.0 Excel Advanced Topics manual.
  • Also, if you can't find a suitable primary key, you can construct a key like this
    CONCAT(STRING(citi_id), '-', STRING(country_id), '-', STRING(region_id))

    While its not very compact, it will provide a unique indentifier for every city (assuming once again that you go with using attributes to hold the non-unique city names.)