Talend + Palo + Time Hierarchie

This site uses cookies. By continuing to browse this site, you are agreeing to our Cookie Policy.

  • Talend + Palo + Time Hierarchie

    Hello board,

    I wanna do some Talend to Palo Export. I got some time information in a MySQL DB, anything from years down to minutes in different rows and some measures for it. Now I want to get this information into a palo cubo with 3 dimensions (one for the years, one for the monts and days, one for the hours and minutes) for the time. The export is not the real problem, Talend is good for that but I don't know how to build the time dimensions in palo before I can start. Ok I know how it works technically, but I don't know what's the best way to build such a dimension. I want to have these 3 dimensions to do some views with the data. So how can I best build the hierarchies for the dimensions? When I do some "speakin-elements" (like "jul-01") I got some heavy probs in the hourse and minutes dimension ;)

    Can anyone help me with my prob?

    If something is not understandable enough feel free to ask, my english sucks ;)

  • Dude,

    What are you doing using Talend? I couldn't get the thing running and have given up.

    The new ETL server is pretty cool if you want to give it a try. After all, it is designed to run with Palo and it is also free.

    I am personally running mysql with Palo ETL and Palo 3.0

    In your example, it may be easier to build the dimensions in excel (using the time formula functions). Based on your description, this is what I would do.......




    Time Hour

    Time Minute (1)

    or Time Minute (2)

    It really depending on what sort of data you have...... What I have given you can make the data very sparse, but you can definately change the increments or combine the two dimensions to make it one.......

    What sort of data do you have? Able to show some example?

  • Hello again.

    Ok, so let's say I wanna extract some data from twitter with the API (public time line) and I can give this into a table ... the rows of the table would look like this:

    title|pubdate|link|year|month|day|hour|minutes (the last 5 are extracted from pubdate with Talend)

    Now I "give" this to Talend with another table with relevant entrys filtered by a word maybe and it builds me a measure for the relevance of this word at a specific time. So I got a measure for the relevance at this time and I can send all the data in the mysql table also to palo. Now i want to build a hierarchie for the time so I can choose a specific date or a aggregated time space and it will show me the relevance.

    Hope this is written cleary ;)

    Can you go further on with this ecxcel thing or can you give me a link with a good tutorial on these time functions? Got no clear idea what you meant there ;)

    Thx for your help

    PS. I will do some screens if it's unclear, but my remote connection is broken atm.
  • Can you please explain "aggregated time space" what is the definition of this? per hour? per 15min?

    Let me get this correct, you want to select a time period, whether it is 10min, 15min, 20min, 1 hour, 2 hour, 1 day, 1 week and you want to know the aggregation of the measure? eg. In this hour, 20 books were published, and these were the titles?
    ie. time is your primary "search and measurement" tool?

    Therefore, you don't want to select the title and show the publication time?

    Once you answer the above, I can give you a better response.

    Re: Excel

    This is written in the manual(you can purchase this for $30 EUR), you can define the elements with N or C. N is a base level element, while C is a consolidation.

    So you would have
    C Total Year
    2009 1
    2008 1
    2007 1
    n 2009
    n 2008
    n 2007

    The above means that there are 4 items in your hierarchy. 2007, 2008, 2009 are base level elements and they all roll up to the Total Year consolidation.

    I don't have a link, maybe one of the other guys in the forum can suggest a link or a page on the manual...
  • "time is your primary "search and measurement" tool?"

    Exactly, that's it. I want to be able to aggregate (better build an average of) the measures which are e.g. on the 6th august, 10th august and 28th august so that palo show the values for the complete august. Then I want to be able to aggregate anything to the year etc. etc.

    And no, i just want to select the time ... that's anything for the moment.


    I did some consolidation with the modeller. And that's the point. Talend maybe sends 3 values + 1 measure to palo.
    1. value = 2009
    2. value = 28 august
    3. value = 17. hour and 30. minute of that our ( --> 17:30)

    Every value is in his own dimension in palo. So Talend can't great consolidated elements i have to do it before, i have to consolidate them manually, so that i got a time hierarchie in every of the 3 dimensions that way:

    1. dimension: just the year (no hierarchie)
    2. dimension: month <-- day
    3. dimension: hour <-- minute

    And after that all, i want to select:

    year --> month + day --> hour + minute and then the measure pops up.

    Is this possible
  • OK, this is a little bit more clearer. Sorry, if I sound confusing, just got out of bed, it is 7am here......

    Using your example, this is what I would do based on my example......
    I would send the following values from talend 5 values + 1 measure

    Notice how I have broken down number 2 into A and B.

    I have also broken down number 3 into A and B. This way, it will be much easier to write your codes\dimensions and it won't be as messy within the dimension

    1. value = 2009 (into the year dimension)

    2a. value = August (into the month dimension)

    2b value = 28 (into the day dimension)

    3a value - 17 (into the hour dimension)

    3b Value = 30 (into the minute dimension) - this is where you will need to decide how much incriments you will require, ie. every 15minutes.

    Now based on the above, I would have the following consolidations

    1. Consolidation = "Total Time" (Children are 2009, 2008, 2007 etc)

    2a Consolidation = "Total Year" (Children are Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec)

    2b (Consolidation = "Total Month" (Children are 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 ,25,26,27,28,29,30,31)

    3a Consolidation = "total Day" (Chidren are

    3b Consolidation = "Total Hour" (Children are 00, 15, 30, 45)

    I have to run for now. will finish this post when I come back.
  • Ok, I understand what you mean, I also got the idea to simple build a dimension for every time unit I got there. But is there (a simple) way to have more time units (e.g. hours, minutes) in one dimension like in my last text or is this simply to much for such an expirement?

    Btw. you stated that you want to give some further advise? :)

    So if I do some views of dimensions after we created them like in your example, can I simply build a drop down menu or something to choose the exact time I want to show the measure for? Hm, I mean the way it looks like in some example dashboards?

    Thx for your further help.
  • Sorry mate, went out for 2 beers yesterday afternoon. 12 hours and 12 drinks later, was not in any sort of condition to reply.......

    You can definately combine hours and minutes together if you want, it will not affect performance that much, it will just take you more time.

    Eg. To Enter Jan 1 to Dec 31 in one dimension, you will require 365 entries, compared to 12 for months dimension and 31 for days dimension. When you are working on averages, that's when seperating can be "easier". Having said that, it depends sole on the nature of your data. For example, I worked at a place that used a 4 - 4 - 5 weeks calendar system, and therefore I wouldn't seperate the Year, Month, Week. They all had to be together.

    You can create the dropdowns later by comining the two dimensions. Once the selection is made in the drop down, use some sort of "Left" or "Right" function, or "Day", "month", "Hour", "Minute" to seperate the dropdown box. Let me know if I don't make this clear.

    I can't remember what I was going to write now (Memory Blank hehehe :P ). But you have got the gist of what I was trying to say.

    Good luck with it! :)

  • You better make year-specific entries in your time dimension.

    If you have e.g. February 31st - data will find its way into this slot ... ;)

    What about storing balances? You can not add them up.

    The fun starts when you also have to look at weeks. I have built a separate cube for this.

    I do not not know about the sparsity of your data - but going down to minutes feels too ambitious for OLAP for me.
    Daily data seems max for me.

    My background is financial planning though and I always think of the chore of actually planning or budgeting without too many DUMMies and other shortcuts.
    The number of columns and lines of your output/input with OLAP is also very limited. I let SQL/ETL do the details and use OLAP for the Analytical Processing.
  • Can you please provide an example? What is an arithmetical average?

    ie. are you looking for the average of Jan, Feb, Mar? If that is the case have a consolidation called "Q1 - Avg" (instead of "Q1"), and children as Jan, Feb, Mar. Right click on the elements, and set the "Weighting" as 0.333333. That's the multipllication factor.

    If that is not what you are after, please do provide an example.
  • To do this for a specific element create a new element (Element A) "Average Count Element A" with a rule saying

    Average Count Element A = N: if palo.data(blah, !'ElementA', blah) = 1 , 1, 0)

    which will set a value of 1 to each base level where there is an equivalent value in Element A.

    You can then use this value to divide the Element A consolidation to give an average in a third element Element A Average.

    A bit clunky but it works.
    Best wishes

    John Hobson
    The Planning Factory, Lytham, UK
  • Ok, here comes the example:

    We got hierarchie that way you said Xpression:

    Total Hour <-- 1-2-3-4-5-6-7-8-9-10-...-24

    Now maybe the hour 1 got an value of 5,00% (this is the relevance of our keyword in this hour) and the hour 10 got a value of 10%.

    Now we want the value for the Total Hour:

    Now it's: 15% (aggregation).
    But it should be 7,5% (average).

    Can I do this with your "weight-method"?

    The post was edited 1 time, last by HiWi-TUC ().

  • Xpresson's method will work after a fashion - assuming that the hours al have the same relative values.

    I you are actually trying to create a consolidated percentage then the only accurate way is likely to be to store the constituent values at the N: level and then to create a C: level back calculation.

    For example if you have

    Profit % Product A = 20%


    Profit % Product B = 10%

    Then it is not necessarily the case that

    Profit % Products A + B = 15%

    as product A may have twice the volume of Product B.

    To calculate a properly weighted average Profit % for the consolidation of products A & B you need a rule at the C: level that says

    (Profit A + Profit B) / (Sales A + Sales B) * 100

    This means logically that you must store constituents of the Profit & (i.e. Sales and Profit in my example)

    Hope this helps
    Best wishes

    John Hobson
    The Planning Factory, Lytham, UK
  • Ah i nearly got you Pommie. Actually i didn't come in touch with rules but i know what you mean and i will go and read something about rules and then check your statement again.

    Fortunatly all the hours and minutes got the same relevance, so i can do the easier method just "to have something" ;)

    Thanks very much @ all.
  • Originally posted by HiWi-TUC
    Hello Xpresson,

    thx for your advise unfortunatly I can't give you a response today cause I'm very busy. But I will look at it tomorrow and do some sort of screens :)

    Anybody else a different idea?

    PS. I have to work with Talend :(


    I think what you meant was...

    PS. I have to work with Talend :)

    Once you'll see the options, advantages and power of Talend, you will understand why many people use it. Not to say it is a free and open source tool.