Metadata from MySQL

    • Metadata from MySQL

      This is a requirement for one of our clients. They have two sources of data: one is MySQL data, which is in an unstructured format, and the other is Excel data, in a structured format. I need to extract both, join them together, and then load the cubes.

      So, what I did is as follows:
      1. Extracted the MySQL data and the Excel data
      2. Denormalized the MySQL extract so that the unstructured data is transformed into structured data (see the sketch after this list)
      3. Joined the two tables - the denormalized MySQL extract and the Excel extract
      4. Used the joined table to load my cube
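
      For illustration, step 2 could be done with conditional aggregation directly in MySQL. This is only a sketch assuming a WordPress-style key/value schema: the wp_posts/wp_postmeta names and the '_order_total' key are assumptions, while '_order_type' is taken from the job log later in this thread.

        -- Sketch: pivot key/value metadata rows into one structured row per post
        -- (table names and meta_key values are illustrative, not the real schema)
        SELECT p.ID AS post_id,
               MAX(CASE WHEN m.meta_key = '_order_type'  THEN m.meta_value END) AS order_type,
               MAX(CASE WHEN m.meta_key = '_order_total' THEN m.meta_value END) AS order_total
        FROM   wp_posts p
        JOIN   wp_postmeta m ON m.post_id = p.ID
        GROUP  BY p.ID;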

      However, the entire process takes a lot of time; it sometimes takes a few hours to complete the cube load. The MySQL data has about 400,000 rows and the Excel data about 60,000 rows.

      Please suggest a better way to do this. I am meeting the client tomorrow, and any quick help is really appreciated.

      Thank you,
      Kiran
    • Hi,

      several questions:

      do you really need to import ~450K rows during each load? does all the data really change each time?
      what do you mean by "joined the two tables"? do you really mean a join or a union? in the case of a union you would not need to combine them; you could make 2 different loads using the 2 different sources
      do you use buffering mode or full mode in the MySQL connection?
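
      To illustrate the difference (the table and column names below are made up):

        -- a JOIN widens each row with matching columns from the other source
        SELECT f.id, f.amount, e.region
        FROM   mysql_fact f
        JOIN   excel_dim  e ON e.id = f.id;

        -- a UNION stacks rows of the same shape; in that case no combining step
        -- is needed, because 2 separate loads from the 2 sources give the same result
        SELECT id, amount FROM mysql_fact
        UNION ALL
        SELECT id, amount FROM excel_fact;
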
      laloune

      Post hoc, non est propter hoc
    • do you really need to import ~450K rows during each load? does all the data really change each time?

      The first load would be ~450k rows. For subsequent loads, I would use a Groovy job to pull only the transactions for a particular day.
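
      For example, the daily pull could filter on the load date. A minimal sketch, assuming the source table has a post_date column and that ${LOAD_DATE} is a variable the Groovy job sets (both are assumptions):

        -- hypothetical incremental extract for a single day
        SELECT *
        FROM   wp_posts
        WHERE  post_date >= '${LOAD_DATE}'
          AND  post_date <  '${LOAD_DATE}' + INTERVAL 1 DAY;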

      what do you mean by "joined the two tables"? do you really mean a join or a union? in the case of a union you would not need to combine them; you could make 2 different loads using the 2 different sources

      I need to join, and to add additional columns to the existing table.

      do you use buffering mode or full mode in the MySQL connection?

      Which mode would be better? Full mode would probably be better since, once the data is in, the MySQL connection can be released before further processing. Let me know if my understanding is correct.

      Regards
      Kiran
    • okay. do you have the possibility to export the data from Excel into the MySQL table?

      then you could use the standard join functionality of SQL, which would be much more performant than joining internally in the ETL (especially if you use indexes on the lookup table)
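
      For instance, the Excel sheet could be loaded into a MySQL staging table, the join key indexed, and the join then run inside MySQL instead of inside the ETL. A sketch, with all table and column names assumed:

        -- assumed staging table for the Excel extract
        CREATE TABLE excel_import (
            join_key INT,
            attr_1   VARCHAR(255)
            -- ... remaining Excel columns
        );
        CREATE INDEX idx_excel_join ON excel_import (join_key);

        -- the join now happens in MySQL, where the index can be used
        SELECT d.*, e.attr_1
        FROM   mysql_denorm d
        JOIN   excel_import e ON e.join_key = d.join_key;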

      450k rows for an initialization round is not very much. I loaded more than 75 million rows using Jedox ETL and the result was still successful after 8 hours (loaded during the night)
      laloune

      Post hoc, non est propter hoc
    • I had set up a job with a Groovy script so that the date filters are applied. The scenario is as follows:

      1. Table 1 - about 20,000 rows per day
      2. Table 1M - Metadata related to Table 1 - 16 metadata elements - hence, 320,000 rows per day
      3. Table 2 - Not sure about the number of rows
      4. Table 2M - Again not sure of the number of rows.

      1M and 2M are denormalized and joined with 1 and 2 respectively. The two joined tables are then joined together to make one big, wide table. The big table is then normalized to load the cube.
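
      As a sketch of that final join (t1_wide, t2_wide and the post_id key are placeholders for the two denormalized results):

        -- illustrative only: combine the two denormalized results into one wide table
        CREATE TABLE big_wide AS
        SELECT w1.*, w2.meta_a, w2.meta_b
        FROM   t1_wide w1
        LEFT JOIN t2_wide w2 ON w2.post_id = w1.post_id;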

      When the job ran, 1 and 1M were extracted and joined. When 2 and 2M were being extracted, we ran into the following problem:

      FATAL : Abnormally terminated execution of subload C_post_postmeta: Java heap space


      I had changed the maximum heap size in the setenv.bat file to 4096 MB (the server has 128 GB RAM), but I still get the error.

      I am attaching the job monitor log as well. You will see an error "In transform T_WP_POSTMETA the measure _order_type has not been found in source column meta_key.", which has been fixed: there was an additional filter in the extract query that prevented those fields from being extracted, and the query has been corrected.

      Need some thoughts on how to fix this.

      Please note that the number of rows in the log is higher than the daily volumes above, since this is the first load, pulling everything from Dec-01. Once the first load is in, I will set up a variable for the start date so that it becomes a daily incremental load.

      Regards,
      Kiran
      Files
      • Job Log.txt

        (10.51 kB)
    • Laloune, in reply to your earlier query:

      okay. do you have the possibility to export the data from Excel into the MySQL table? then you could use the standard join functionality of SQL, which would be much more performant than joining internally in the ETL (especially if you use indexes on the lookup table)


      The client is presently using a join query in MySQL to build the final table, but they are looking at the ETL functionality because it would give them more flexibility to add additional data easily in the future. The join query they are currently using is a pretty complex one, with about 20 joins to the same table in order to bring in the metadata. The client wants to do away with that.
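
      For reference, that query has roughly the following shape (the column aliases and the '_order_total' key are illustrative; '_order_type' comes from the job log above):

        -- one self-join of wp_postmeta per metadata element, repeated ~20 times
        SELECT p.ID,
               m1.meta_value AS order_type,
               m2.meta_value AS order_total
               -- ... ~18 more metadata columns
        FROM   wp_posts p
        LEFT JOIN wp_postmeta m1 ON m1.post_id = p.ID AND m1.meta_key = '_order_type'
        LEFT JOIN wp_postmeta m2 ON m2.post_id = p.ID AND m2.meta_key = '_order_total';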

      Regards,
      Kiran