CSV to Palo: import time


    • CSV to Palo: import time

      Hi,

      I'm using Palo ETL Server and I'm surprised by the time needed to import 100 rows from a CSV file.

      I have a Palo database with 6 dimensions:
      dimension 1: 3 elements,
      dimension 2: 5 elements,
      dimension 3: 6 elements,
      dimension 4: 12 elements,
      dimension 5: 4702 elements,
      dimension 6: 576 elements,
      which makes for a lot of possible combinations.
      And, according to Palo ETL, there are more than 5,000,000 values.

      My import contains 100 rows, but some of them target consolidated elements, so Palo reports 3457 cells written.
      It takes more than 3 min 20 s (> 200 seconds), so about 17 cells written per second (a quick check of this figure follows the log below).
      Is that a normal time?

      This is the log:

      Source Code

      2009-11-03 23:10:01,044 INFO [job Import_CSV1] (ExecutionState.java:125) - Starting execution of job Import_CSV1
      2009-11-03 23:10:01,044 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter fic_csv: C:\wamp\www\Gestidistest\import\balana_ETL_1_031109_223557_2.csv
      2009-11-03 23:10:01,044 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter failOnError: true
      2009-11-03 23:10:01,044 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter logLimit: 100
      2009-11-03 23:10:01,044 INFO [job Import_CSV1] (CubeLoad.java:454) - Starting load Load_Palo of Cube Version4
      2009-11-03 23:10:01,044 INFO [job Import_CSV1] (TableSource.java:208) - Data retrieval from extract Extract_from_CSV
      2009-11-03 23:10:01,122 INFO [job Import_CSV1] (Processor.java:161) - Committing datastore Import_Gestidis.connections.Connexion_CSV
      2009-11-03 23:13:28,456 INFO [job Import_CSV1] (CubeLoad.java:118) - Data vectors loaded: 100
      2009-11-03 23:13:28,456 INFO [job Import_CSV1] (CubeLoad.java:459) - Finished load Load_Palo of Cube Version4. Filled cells changed from 5388908 to 5392365.
      2009-11-03 23:13:28,456 INFO [job Import_CSV1] (ExecutionState.java:148) - Finished execution of job Import_CSV1 with status: Completed successfully
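
      As a quick sanity check of these figures (my own arithmetic based on the timestamps and filled-cell counts in the log above, not something the tool reports):

      Source Code

      # Rough throughput check from the figures quoted in this post
      # (hand arithmetic only; Palo ETL does not print these derived numbers).
      from datetime import datetime

      start = datetime(2009, 11, 3, 23, 10, 1)   # "Starting load Load_Palo of Cube Version4"
      end = datetime(2009, 11, 3, 23, 13, 28)    # "Finished load Load_Palo of Cube Version4"
      elapsed = (end - start).total_seconds()    # 207.0 seconds

      cells_written = 5392365 - 5388908          # "Filled cells changed from ... to ..." = 3457
      print(cells_written / elapsed)             # ~16.7 cells written per second

      # Theoretical cube size = cross product of the 6 dimensions:
      print(3 * 5 * 6 * 12 * 4702 * 576)         # ~2.9 billion possible cells,
                                                 # of which ~5.4 million are filled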


      Thanks
    • In another post, a user said:
      Originally posted by AWW
      I am importing roughly 7 million cells with Kettle into a cube with 7 dimensions.
      The import runs at 600 cells per second.


      I'm not using Kettle, but the speed difference is still big.

      I tried to import a full file (441 rows)
      and I got a Timeout Exception!

      Source Code

      2009-11-03 22:36:21,761 INFO [job Import_CSV1] (ExecutionState.java:125) - Starting execution of job Import_CSV1
      2009-11-03 22:36:21,777 INFO [job Import_CSV1] (TableSource.java:208) - Data retrieval from extract Extract_from_CSV
      2009-11-03 22:36:21,777 INFO [job Import_CSV1] (CubeLoad.java:454) - Starting load Load_Palo of Cube Version4
      2009-11-03 22:36:21,777 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter logLimit: 100
      2009-11-03 22:36:21,777 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter failOnError: true
      2009-11-03 22:36:21,777 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter fic_csv: C:\wamp\www\Gestidistest/./import/balana_ETL_1_031109_223557.csv
      2009-11-03 22:36:21,855 INFO [job Import_CSV1] (Processor.java:161) - Committing datastore Import_Gestidis.connections.Connexion_CSV
      2009-11-03 22:41:22,549 INFO [job Import_CSV1] (CubeLoad.java:118) - Data vectors loaded: 441
      2009-11-03 22:41:22,549 WARN [job Import_CSV1] (MessageHandler.java:91) - Failed to load Cube Version4: com.tensegrity.palojava.PaloException: Timeout Exception!!
      2009-11-03 22:46:23,072 ERROR [job Import_CSV1] (CubeLoad.java:468) - Cannot import Data into Cube Version4: Timeout Exception!!


      After searching the forum, I found a post about this, but I didn't find the config.xml files in the directories .\client\config\core and .\server\config\core because those directories don't exist.
      There is a config.xml file in C:\Program Files\Jedox\ETL\webapps\etlserver\config, but it does not contain the parameter "cubeCommit". Shall I add it?

      Or is there another solution for the Timeout Exception?
      Or, of course, a way to improve the execution time!

      Thanks
    • There is a config.xml file in C:\Program Files\Jedox\ETL\webapps\etlserver\config, but it does not contain the parameter "cubeCommit". Shall I add it?


      Add it... but where?

      There is no 'exports' structure in config.xml
      (my version of ETL Server is 3.0.943).

      My config.xml (default file):

      XML Source Code

      <?xml version="1.0" encoding="UTF-8"?>
      <config>
        <persistence>
          <parameter name="database">derby</parameter>
        </persistence>
        <encryption>
          <parameter name="password">proclos</parameter>
        </encryption>
        <directories>
          <parameter name="customlib">customlib</parameter>
          <parameter name="logs">logs</parameter>
          <parameter name="persistence">data/db</parameter>
          <parameter name="projects">data/projects</parameter>
          <parameter name="files">files</parameter>
        </directories>
        <projects>
          <parameter name="autosave">true</parameter>
          <parameter name="validate">true</parameter>
          <parameter name="loglevel">info</parameter>
        </projects>
        <executions>
          <parameter name="failOnError">true</parameter>
          <parameter name="logLimit">100</parameter>
        </executions>
      </config>


      Shall I add this:

      Source Code

      <exports>
        <parameter name="cubeCommit">false</parameter>
      </exports>
      ???

      Thanks
    • The execution completes with this config:

      Source Code

      <config>
        <persistence>
          <parameter name="database">derby</parameter>
        </persistence>
        <encryption>
          <parameter name="password">proclos</parameter>
        </encryption>
        <directories>
          <parameter name="customlib">customlib</parameter>
          <parameter name="logs">logs</parameter>
          <parameter name="persistence">data/db</parameter>
          <parameter name="projects">data/projects</parameter>
          <parameter name="files">files</parameter>
        </directories>
        <projects>
          <parameter name="autosave">true</parameter>
          <parameter name="validate">true</parameter>
          <parameter name="loglevel">info</parameter>
        </projects>
        <executions>
          <parameter name="failOnError">true</parameter>
          <parameter name="logLimit">100</parameter>
          <parameter name="cubeCommit">false</parameter>
        </executions>
      </config>

      => Parameter "cubeCommit" in "executions" tree

      But there are warnings.
      The log says:

      Source Code

      2009-11-04 11:21:13,780 INFO [job Import_CSV1] (ExecutionState.java:125) - Starting execution of job Import_CSV1
      2009-11-04 11:21:13,780 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter fic_csv: C:\wamp\www\Gestidistest\import\balana_ETL_1_041109_095958.csv
      2009-11-04 11:21:13,780 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter failOnError: true
      2009-11-04 11:21:13,780 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter logLimit: 100
      2009-11-04 11:21:13,795 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter cubeCommit: false
      2009-11-04 11:21:13,826 INFO [job Import_CSV1] (CubeLoad.java:454) - Starting load Load_Palo of Cube Version4
      2009-11-04 11:21:13,826 INFO [job Import_CSV1] (TableSource.java:208) - Data retrieval from extract Extract_from_CSV
      2009-11-04 11:21:15,355 INFO [job Import_CSV1] (Processor.java:161) - Committing datastore Import_Gestidis.connections.Connexion_CSV
      2009-11-04 11:26:17,229 WARN [job Import_CSV1] (MessageHandler.java:91) - Failed to load Cube Version4: com.tensegrity.palojava.PaloException: Timeout Exception!!
      2009-11-04 11:26:17,229 INFO [job Import_CSV1] (CubeLoad.java:118) - Data vectors loaded: 584
      2009-11-04 11:45:29,128 INFO [job Import_CSV1] (CubeLoad.java:459) - Finished load Load_Palo of Cube Version4. Filled cells changed from 5387582 to 5408984.
      2009-11-04 11:45:29,129 INFO [job Import_CSV1] (ExecutionState.java:148) - Finished execution of job Import_CSV1 with status: Completed with Warnings


      What does 'Completed with Warnings' mean?
      Is the import OK?
    • I had the same error using Palo 2.5 as the server. Then I split my (large) import file into smaller files, and the same data went through OK.

      I then tried it again in one chunk with 3.0, and it works.

      Regarding performance, I looked at my logs and I import 1927 lines per second (84,797 rows in 44 seconds) using 3.0 (ETL and server).
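
      For illustration, splitting a large CSV into fixed-size chunks before the load can be done with a few lines of plain Python; the function name, chunk size and output naming below are made-up examples, not a Palo ETL feature, and I assume the file has a header row:

      Source Code

      # Hypothetical helper: split a big CSV into smaller files so that each
      # load stays well below the Palo load timeout.
      import csv

      def split_csv(path, rows_per_chunk=100):
          with open(path, newline="") as src:
              reader = csv.reader(src)
              header = next(reader)              # assumes a header row
              chunk, index = [], 0
              for row in reader:
                  chunk.append(row)
                  if len(chunk) == rows_per_chunk:
                      write_chunk(path, index, header, chunk)
                      chunk, index = [], index + 1
              if chunk:
                  write_chunk(path, index, header, chunk)

      def write_chunk(path, index, header, rows):
          # writes e.g. import.csv.part000.csv, import.csv.part001.csv, ...
          with open(f"{path}.part{index:03d}.csv", "w", newline="") as dst:
              writer = csv.writer(dst)
              writer.writerow(header)
              writer.writerows(rows)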
    • 11 dimensions, 2 measures. I don't have the cubeCommit parameter...
      The cube size is 4.05285220808582E+16 cells, but it is very sparse (it contains only the ~84k cells I am using).

      I have no rules in the cube at the time of the import.

      But I tried adding the parameter and it made no difference.
      I also tried to import from a flat file (I use a DB), but no difference.

      I suppose you are running this on modern IT equipment...
      Are you loading with mode = create?

      <?xml version="1.0" encoding="UTF-8"?>
      <config>
        <persistence>
          <parameter name="database">derby</parameter>
        </persistence>
        <encryption>
          <parameter name="password">proclos</parameter>
        </encryption>
        <directories>
          <parameter name="customlib">customlib</parameter>
          <parameter name="logs">logs</parameter>
          <parameter name="persistence">data/db</parameter>
          <parameter name="projects">data/projects</parameter>
          <parameter name="files">files</parameter>
        </directories>
        <projects>
          <parameter name="autosave">true</parameter>
          <parameter name="validate">true</parameter>
          <parameter name="loglevel">info</parameter>
        </projects>
        <executions>
          <parameter name="failOnError">true</parameter>
          <parameter name="logLimit">100</parameter>
        </executions>
      </config>
    • I use the mode 'insert'.

      I made a test with the mode 'create': it is much faster! ... but the cube is cleared beforehand.
      (However, I'm impressed by the speed.) A toy sketch of the difference between the two modes follows the log below.

      The log:

      Source Code

      2009-11-05 20:20:39,596 INFO [job Import_CSV1] (ExecutionState.java:125) - Starting execution of job Import_CSV1
      2009-11-05 20:20:39,598 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter fic_csv: C:\wamp\www\Gestidistest\import\balana_ETL_1_041109_095958.csv
      2009-11-05 20:20:39,602 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter failOnError: true
      2009-11-05 20:20:39,603 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter logLimit: 100
      2009-11-05 20:20:39,603 INFO [job Import_CSV1] (ExecutionState.java:130) - Parameter cubeCommit: false
      2009-11-05 20:20:39,651 INFO [job Import_CSV1] (CubeLoad.java:454) - Starting load Load_Palo of Cube Version4
      2009-11-05 20:20:39,652 INFO [job Import_CSV1] (TableSource.java:208) - Data retrieval from extract Extract_from_CSV
      2009-11-05 20:20:41,547 INFO [job Import_CSV1] (Processor.java:161) - Committing datastore Import_Gestidis.connections.Connexion_CSV
      2009-11-05 20:20:50,025 INFO [job Import_CSV1] (CubeLoad.java:118) - Data vectors loaded: 584
      2009-11-05 20:20:50,027 INFO [job Import_CSV1] (CubeLoad.java:459) - Finished load Load_Palo of Cube Version4. Filled cells changed from 0 to 21402.
      2009-11-05 20:20:50,029 INFO [job Import_CSV1] (ExecutionState.java:148) - Finished execution of job Import_CSV1 with status: Completed successfully
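
      To make the difference between the two modes concrete, here is a toy model in plain Python (illustration only, not how Palo Server actually stores cells; element names are invented): 'insert' writes the imported cells into the cube as it is, while 'create' empties the cube first, which matches the log above where the filled-cell count restarts from 0.

      Source Code

      # Toy model of the two load modes (illustration only; not Palo's implementation).
      def load_insert(cube, rows):
          # 'insert': existing cells are kept, imported cells are written on top
          for coords, value in rows:
              cube[coords] = value
          return cube

      def load_create(cube, rows):
          # 'create': the cube is cleared first, only the imported cells remain
          cube.clear()
          for coords, value in rows:
              cube[coords] = value
          return cube

      existing = {("2009", "Jan", "ProductA"): 10.0}    # cells already in the cube
      rows = [(("2009", "Feb", "ProductA"), 5.0)]       # cells coming from the CSV
      print(len(load_insert(dict(existing), rows)))     # 2: old cell + imported cell
      print(len(load_create(dict(existing), rows)))     # 1: imported cell only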
    • Several remarks on this:

      1. If you write cube cells to consolidated elements, Palo Server splashes them, i.e. the value is distributed down to the underlying base cells. This can lead to a much higher number of filled (base) cells than the number of imported rows in the file (100 rows against 3457 cells in your case). The performance of the ETL process drops sharply when splashing is involved because of the extra work Palo Server has to do, so you should compare cases where only base cells are involved (a toy illustration follows these remarks).

      2. The warning message "Failed to load Cube Version4: com.tensegrity.palojava.PaloException: Timeout Exception!!" occurs when Palo Server does not respond to the cube load for 300 seconds. You can increase this timeout in the file component.xml, but that does not really solve the problem. I'm quite sure the reason is, once again, a high degree of splashing.

      3. In the File connection you should use a SQL statement only if it is absolutely necessary (e.g. for sorting), in order to speed up the extraction: the file has to be persisted internally before such a SQL statement can be executed.

      4. The parameter cubeCommit is not used in ETL 1.3 and no longer exists in ETL 3.0.
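
      To illustrate point 1, here is a toy model of splashing in plain Python (the element names are invented and the even split is a simplification of how Palo actually distributes the value). It shows how a single imported row aimed at a consolidated element turns into several base-cell writes:

      Source Code

      # Toy model of splashing (illustration only): a value written to a
      # consolidated element is pushed down into its base-level children.
      children = {
          "All Products": ["ProductA", "ProductB", "ProductC"],   # consolidated element
      }

      def splash(cube, element, value):
          base = children.get(element)
          if base is None:                       # base element: exactly one cell written
              cube[element] = value
              return 1
          writes = 0
          for child in base:                     # consolidated element: fan out
              writes += splash(cube, child, value / len(base))
          return writes

      cube = {}
      print(splash(cube, "ProductA", 10.0))      # 1 write for a base element
      print(splash(cube, "All Products", 10.0))  # 3 writes for a single imported row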