SQL Table to Palo Cube

This site uses cookies. By continuing to browse this site, you are agreeing to our Cookie Policy.

  • SQL Table to Palo Cube

    I have a the following table in a SQL database, that I want to convert into dimensions/cube in palo

    The SQL table has 5 columns
    Key-part-1: Integer
    Key-part-2: Integer
    Metric-1: double
    Metric-2: double
    The Unique Key is Key-part1 + Key-part-2

    A couple common queries which are executed on the table are:
    (1) Select Key-part-2 from MyTable where Key-part-1 = 10;
    (2) Select Metric-1 from MyTable where Key-part-1 = 10;
    (3) Select Metric-2 from MyTable where Key-part-1 = 10 and key-part-2 = 100;

    I am trying to think through the best way to convert this table into palo dimensions and cube:
    Option 1: Dense Cube
    Dimension-1 = values for Key-part-1
    Dimension-2 (Metrics Dimension) = values Key-part-2, Metric-1, Metric-2

    *I have tried this approach, but when inserting into the cube I end-up over-writing the previous values i.e. palo seems to assume that Dimension-1 is a unqiue-key. So when executing select (1) always get only 1 answer.

    Option 2: Sparse Cube
    Dimension-1 = values for Key-part-1
    Dimension-2 = values for Key-part-2
    Dimension-3 (Metrics Dimension) = values Metric-1, Metric-2

    *Have tried this approach, but select (1) seems to giving strange results: for a key-part-1 of 450 dimension elements and key-part-2 of 900 dimension elements of which only 2 metrics have been populated for each key-part-1 value. When I retrieve the metrics, using key-part-1, I get 826 results - while I expected either 2 or 900. This feels like a defect.

    Option 3: Dummy Dimension to create a Dense Cube
    Dimension-1 = values for Key-part-1
    Dimension-2 (Dummy Dimension) = Create a new set of dimension values based on the number of metrics to be populated, to ensure the Dimension-1 + Dimension-2 is unique.
    Dimension-3 (Metrics Dimension) = values Key-part-2, Metric-1, Metric-2

    *Have not tried this: seems like a long way around

    Option 4: Option 2 + DFilter
    Use option 2 (sparse cube) and then use a dimension filter to retrieve Key-part-2.

    *Have not tried this: I wish I knew if I can filter-out empty values.

    Any advice on how best to approach this?
  • It seems you are still thinking relational/SQL and not multidimensional/OLAP.
    A cube is NOT focussed on such queries like the 3 examples you gave.
    If you only (or mostly) have such kind of queries you IMO shouldn't even use an OLAP cube!

    A cube is more intended to answer queries similar to the following kind of SQL:
    SELECT Key-part1, SUM(metric-1) FROM MyTable GROUP BY KeyPart1
    Furthermore the aggregation is mostly NOT related to key columns but to
    general attributes (like product color for example), e.g.:
    SELECT product_color, SUM(metric-1) FROM MyTable GROUP BY product_color
    ..often your primary key details are not even represented in the cube.

    Jedox can also filter out empty values via data filter (DFilter), equivalent to SQL Having:
    SELECT product_color, SUM(metric-1) FROM MyTable GROUP BY product_color
    HAVING SUM(metric-1)>0
    ..Jedox Paste-View does this automatically if you select the respective checkbox in the dialog.
    Otherwise (e.g. in Jedox Web Dynaranges) you do this via Subset.

    So the primary question for your dimensional model is what kind of aggregations, slices, analyses you want to reach.
    Furthermore for working with your model it is interesting which frontend you intend to use (MS Excel or Jedox Web?).

    So I recommend to read once more what the intention and working principle of OLAP / multi-dimensional databases is.
    ..and furthermore to look at the Jedox sample projects / databases and the Jedox knowledgebase.

    The post was edited 1 time, last by JCO ().

  • apart from the general comments about:

    from you options only options 2 / 4 make any sense

    depending on the relation between key-part1 and keypart 2 (1:n,n:m,m:1)
    and your aggregation needs it might also make sense to combine them
    in 1 dimension as a hierarchy:

    dimension1:
    key-part1 (cons elements)
    -> key-part1 & key-part2 (concatenated; base elements)

    => you can additionally add key-part1 and key-part2 as attributes to the base elements to be used in the attribute filter
    => or just use the textfilter to filter the elements by key parts

    dimension2:
    metrics

    But as said in my previous reply:
    It is questionable if you really need an OLAP cube for this.

    The post was edited 1 time, last by JCO ().

  • Thanks JCO

    For your question: It is questionable if you really need an OLAP cube for this.
    To provide a little more context: I have a custom front-end. It has 2 connections: a remote connection to a custom processing engine with an associated relational database and a second connection to palo server running locally on the client machine. The processing engine publishes processing results to the font-end and the front-end writes the results into the palo server running locally into a couple of cubes, finally the front-end retrieves the results from the palo server to display to the user. The retrieval rate between the front-end and the palo server for some of cubes is reasonably high (easily a few hundred queries a second), so it does not make sense from a performance perspective for the palo server to sit remotely from the custom front-end. The other aspect of publishing the results to a palo server is to allow the user to bypass the custom front-end and use Calc to directly retrieve the results from the palo cubes themselves.

    Fundamentally the front-end has the data, so for this table I can:
    (1) code the queries myself
    (2) persist and retrieve the data client side myself or create a mechanism to retrieve this from the processing engine when needed (lots or work)
    OR install some sort of SQL database client-side
    (3) but then the user still looses the capability to access the complete set of results from Calc.
    None of which seems like a good solution.

    The relationship between Key-part-1 and Key-part-2 is 1:M.

    Thanks for your suggestion in terms of using a merged key and then filtering using the attributes. I will add that to my list of options as option 5.

    Any thoughts on why option 2 did not work?
  • Spent some quality time with the debugger and got option 2 to consistently return the 900 cells I was expecting. There were some silly coding mistakes in both the cube creation and retrieval routines. Since the libpalo2 api does not seem to provide a way to suppress zeros on retrieval, will do some work to ensure the input is non-zero, so I can suppress zero's on output.

    Thank-you for your time and thougths!
  • I still don't know how you query the cube, but I use the PHP API quite heavily and there are at least 2 options:

    1.
    palo_data_export with ignore_empty = true

    2.
    palo_subset with palo_dfilter to select the dimension elements with to non-zero values in the cube
    and then use
    palo_data / palo_datav to select the values related to those elements