API / Palo Performance

  • API / Palo Performance

    Hi

    I've been playing around with the HTTP API and have created some objects in Perl to interface with it.

    In the process I have done some performance testing and would like to detail my results.

    The architecture I'm using is a 2.2 GHz dual-core MacBook Pro client running OS X 10.4, with Perl v5.8.6 built for darwin-thread-multi-2level.
    The server is a 1.5 GHz Windows 2000 machine running Palo 2.0, connected to the client via an 802.11g wifi link.

    I have run a few performance tests with my code and have found I can add 10,000 elements to a dimension in 178 seconds. That isn't exactly stellar performance, but in my experience new accounts etc. are not added all that often, and rarely 10,000 at a time.
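
    For reference, the shape of the loop I'm timing is roughly the following. This is only a sketch: the /element/create URL, its parameter names, the type code and the port are my reading of the HTTP API docs rather than confirmed, and I've left out the login/session handling.

        use strict;
        use warnings;
        use LWP::UserAgent;
        use Time::HiRes qw(time);

        my $ua     = LWP::UserAgent->new;
        my $server = 'http://palo-host:7777';    # host/port assumed
        my $count  = 10_000;

        my $start = time();
        for my $i (1 .. $count) {
            # parameter names and type code are assumptions - check the API reference
            my $res = $ua->get("$server/element/create"
                . "?database=Demo&dimension=Years&new_name=Elem$i&type=1");
            die $res->status_line unless $res->is_success;
        }
        my $elapsed = time() - $start;
        printf "added %d elements in %.1f s (%.1f elements/s)\n",
               $count, $elapsed, $count / $elapsed;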

    When I remove the elements the calls are VERY slow. I know it's not the internals of my code, since looking up the metadata to make the HTTP call takes milliseconds; the round trip itself takes about 6.4 seconds per element, which means it would take around 18 hours to delete the same 10,000 elements.
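
    To make sure the time really is the round trip and not my own code, I'm timing each destroy call on its own, roughly like this (again a sketch - parameter names are assumptions and session handling is left out):

        use strict;
        use warnings;
        use LWP::UserAgent;
        use Time::HiRes qw(time);

        my $ua     = LWP::UserAgent->new;
        my $server = 'http://palo-host:7777';        # host/port assumed
        my @ids    = @ARGV;                          # element ids/names to delete

        for my $id (@ids) {
            my $t0  = time();
            my $res = $ua->get("$server/element/destroy"
                . "?database=Demo&dimension=Years&element=$id");
            printf "destroy %s: %.2f s (%s)\n", $id, time() - $t0,
                   $res->is_success ? 'ok' : $res->status_line;
        }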

    Actually that isn't quite the case: if I leave the program running I can see the time coming down after each successive call, and after 1,000 deletes the operation is taking 5.5 seconds.
    It strikes me that Palo must be rehashing the structure on each call. If that is the case, is there a way to perform a bulk operation and then ask the server to reindex?

    Is there something wrong with the /element/destroy function, or is there a quicker way of performing this operation?

    Can anyone let me know how long this operation takes through Excel or JPalo?

    Also, do all the APIs such as JPalo, .NET etc. now use HTTP behind the scenes? If not, what are the performance differences between the legacy API and the HTTP API?

    Also, when will the source code for 2.0 be published to SourceForge? I would like to compile it on the Mac rather than having to use a lower-spec machine as the server.

    Thanks

  • RE: API / Palo Performance

    OK, I'm replying to myself.

    I knocked up some Java and tried to delete the remaining 9,000 or so elements using the JPalo HTTP interface. This time the test was run on the server itself, still via the HTTP interface, and the call is still slow, although marginally faster than from the remote client.

    My assumption about the elements being rehashed seems to be confirmed by this thread:

    64.233.183.104/search?q=cache:…=5&gl=uk&client=firefox-a

    Although that thread refers to adding elements.

    So how can I quickly delete elements, or is this not possible with the HTTP interface? The above thread seems to imply that JPalo will drop the legacy interface, and I assume the same is true for the other APIs. Are there any undocumented methods which can be used to send multiple operations at once?
  • RE: API / Palo Performance

    Hugo,

    An alternative approach might be to manipulate the database_CUBE*.csv files directly (i.e. load and unload elements while the database is 'offline'). Not ideal, I'll admit, but without a batch write facility in the API itself it may be the only option for now.
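
    Something along these lines, say - though the file name, layout and separator are all guesses on my part, so take a backup and only run it with the Palo service stopped:

        use strict;
        use warnings;

        my $file = 'database_CUBE_0.csv';                 # hypothetical file name
        my %drop = map { ("Elem$_" => 1) } 1 .. 10_000;   # elements to strip out

        open my $in,  '<', $file       or die "read $file: $!";
        open my $out, '>', "$file.new" or die "write $file.new: $!";
        while (my $line = <$in>) {
            chomp(my $copy = $line);
            my ($first) = split /;/, $copy;               # first field assumed to be the element name
            print {$out} $line unless defined $first && $drop{$first};
        }
        close $in;
        close $out;
        rename "$file.new", $file or die "rename: $!";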

    Tom
  • RE: API / Palo Performance

    Hello,

    the performance of deleting elements depends on whether an element is filled with data
    and whether it is used in a cube. Are your elements filled, and are they in a cube?

    HTTP calls are faster than going through the PHP extension, for example. Using the legacy
    interface is also slower. The extensions communicate via libpalo_ng and therefore add
    overhead too. To get better performance you could use HTTP directly from PHP.

    You ran the HTTP interface from Perl, so it would be good to know what kind of
    elements you created and what the structure looks like.

    Regards,
    Stephanie
  • RE: API / Palo Performance

    Tom - Yes, that's an option, but then it's not really a client-server system.

    Stephanie - The dimension used was the Years dimension in the Sales cube of the demo database. No data was added to any of the intersections; I simply created 10,000 elements (which took about 3 minutes)
    and then deleted them. As I said, the delay wasn't in the Perl; the delay was the server removing the elements from the dimension. It got a lot faster the smaller the dimension became, but with 10,000 elements it was very slow. Initially it was taking around 6 seconds per element, and as the dimension became smaller it dropped to around 0.2 seconds, so it looks like a scalability problem.
    Using PHP won't make any difference. I tried Java, which I'm sure is a faster language than PHP, and that was still slow, with the delay being due to the server.

    In terms of the Perl, I'm using the LWP library to communicate with the server. I have created the following objects:
    • Connection
    • Database
    • Cube
    • Dimension
    • Element


    The structures / objects are populated on demand, which is different from JPalo, which I think caches everything on the first connection.
    IDs / names are stored in associative arrays as references / pointers and blessed into the object, so there isn't much movement of data internally since everything is a reference to a memory location.
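
    To give an idea of the shape, a heavily cut-down version of the pattern looks roughly like this (the real code does lazy loading and error handling, and the URL layout here is approximate):

        package Palo::Connection;
        use strict;
        use warnings;
        use LWP::UserAgent;

        sub new {
            my ($class, %args) = @_;
            return bless {
                base => $args{base},                  # e.g. 'http://palo-host:7777'
                ua   => LWP::UserAgent->new,
                dims => {},                           # dimension objects, filled on demand
            }, $class;
        }

        sub call {
            my ($self, $path, $query) = @_;
            my $res = $self->{ua}->get("$self->{base}$path?$query");
            die $res->status_line unless $res->is_success;
            return $res->content;
        }

        package Palo::Element;

        sub new {
            my ($class, %args) = @_;
            # only references are stored, so elements are cheap to pass around
            return bless { conn => $args{conn}, dim  => $args{dim},
                           id   => $args{id},   name => $args{name} }, $class;
        }

        1;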

    However, as I said, the delay comes from the server. Should I raise this as a bug? I wouldn't expect it to be so slow.
  • RE: API / Palo Performance

    Hello Hugo,

    When adding and deleting elements a lot of additional information has to be
    updated, like the hierarchy information, the attributes and so on. The
    rate of adding elements is about 50 elements per second, as you mentioned
    before. The deletion of elements is a little slower and could be further
    optimized. I have added it as a feature request to our bug-tracking system, but
    I'm not sure how many people would use such a bulk feature.
    Maybe it would be better to delete the whole dimension instead of deleting
    each element?
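
    From Perl that would be two calls instead of 10,000, roughly like this (URL and parameter names are approximate, please check the HTTP API reference for your release; note that this removes every element in the dimension, not only the new ones):

        use strict;
        use warnings;
        use LWP::UserAgent;

        my $ua     = LWP::UserAgent->new;
        my $server = 'http://palo-host:7777';    # host/port assumed

        my $res = $ua->get("$server/dimension/destroy?database=Demo&dimension=Years");
        die $res->status_line unless $res->is_success;

        $res = $ua->get("$server/dimension/create?database=Demo&new_name=Years");
        die $res->status_line unless $res->is_success;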

    For details about JPalo you should ask in the Java API section; the questions
    there are answered directly by Tensegrity.

    Regards,
    Stephanie