Sparse dimensions, encoding postal codes

This site uses cookies. By continuing to browse this site, you are agreeing to our Cookie Policy.

  • Sparse dimensions, encoding postal codes

    Hi group,
    I am very new to Palo/Jedox, so please excuse if this is a trivial question.
    I use a dimension in my cube that encodes a location, so I can aggregate/zoom in on where some transaction occured. At my source system, I have the whole address at a very granular level (i.e. country - state - postal code - street name/number).
    Now, I want to ETL this into Jedox. Here come my questions:
    1. Obviously, I can not define all the possible addresses in my dimension. So, if we restrict this example to postal code: if it has 5 digits, this would result in a dimension with 100'000 members. Currently, I solved this by aggregating on import, encoding postal codes to a discrete number of cases, e.g. 1xxxx, 2xxxx etc. Is this the way to go here?
    2. If I wanted to have it at a more granular level, i.e. having the whole postal code in the dimension, is this practical? As the dimension of the cube would be high, most of the cells would be "empty", leading to a very sparse cube.
    3. Regarding 2., what is the correct way to deal with dimension members that have not been registered before? At the moment, I used the web modeler to manually insert all the postal codes that I knew were in my DB. Can I ensure that a postal code is added to the list once it is imported for the first time? As I understand it from the ETL manual, the correct way would be to use the "Load - dimension" feature. If I plan to regularly use ETL to fill my cubes, would I have to load my dimensions prior to loading the cubes, so all the postal codes I need are available?
    Thanks a lot in advance,
    regards,
    Georg
  • Hi Georg,

    I'd recommend taking a dump of postal codes from a reference source like: geonames.org/export/
    Palo/Jedox can deal with sparse dimensions (if you check out the files in the Data directory, you'll see only the defined data points in any of the cube files.). I say its better to have the dimensions defined up front than added to later. If you add later, you'll need to trigger reaggregation (cube rebuilding).

    If you want to take the "only define the elements that I use in a dimension", you'll need to recompile your cube every time you change a dimension. Which can be painful. But if your dimension, *rarely* changes, then that may be the way to go.

    Regarding #3, yes, use the ETL to load your dimension before loading your cube and always reload your cube if your dimensions change.

    The post was edited 1 time, last by byoshimi99999 ().