    BatchUploader issue with long per-row ADD-operation duration

    Hi Isomorphic,

    I have a problem with my batch uploader. I tried to reproduce it in the EE Online Showcase, but was not able to do so.

    In my case the upload ("CSV data to server", "parsed data to browser") is pretty fast, even though there are about 5 importStrategy="display" fields, which means many SELECTs are issued against the DB.
    The add operation after clicking the "Commit" button does take a while though (0.03s upload/row vs. 2.14s add/row).

    This results in a strange state: the client-side BatchUploader says "Valid records added; some records remain in error" while the server is still processing the data. Also, all records are still present in the client and I get this message in the Developer Console:
    Code:
    -- 500x this message (the row count in my CSV), then the last two lines:
    19:55:30.239:TMR0:WARN:RPCManager:getHttpHeaders called with a null XmlHttpRequest object
    19:55:30.239:TMR0:WARN:RPCManager:getHttpHeaders called with a null XmlHttpRequest object
    19:55:30.239:TMR0:WARN:RPCManager:getHttpHeaders called with a null XmlHttpRequest object
    19:55:30.239:TMR0:WARN:RPCManager:getHttpHeaders called with a null XmlHttpRequest object
    19:55:30.240:TMR0:WARN:RPCManager:getHttpHeaders called with a null XmlHttpRequest object
    19:55:30.240:TMR0:WARN:RPCManager:getHttpHeaders called with a null XmlHttpRequest object
    19:55:32.625:TMR0:WARN:Log:RangeError: Invalid string length
    Stack from error.stack:
    I could not reproduce this on the online showcase, but you should be able to do so if you just add a Thread.sleep(2000) to the add DMI of supplyItem.
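
    To illustrate, a minimal sketch of what I mean (class and method names are only illustrative; the relevant part is the artificial 2s delay per add):
    Code:
    import com.isomorphic.datasource.DSRequest;
    import com.isomorphic.datasource.DSResponse;

    public class SupplyItemDMI {
        // Simulates a slow per-row ADD by sleeping before the normal execution
        public DSResponse add(DSRequest dsRequest) throws Exception {
            Thread.sleep(2000);          // artificial 2s delay per added row
            return dsRequest.execute();  // then perform the regular add
        }
    }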

    Now you can say that 2s for adding a row is too much, and you are right; I will definitely tune this a bit (basically remove all unneeded cacheSyncs when processing the ADDs). But even if I get down to 0.5s, there will be some point where the browser times out.
    That point will be reached well below any acceptable data volume for a data-centric enterprise application.

    For really big data volumes you will need a plain file upload plus background processing, but I think there are still "easy" data loads where you can tell the user "please split your file and retry the CSV upload".
    Could you add a setter setMaxRowsAllowed(int maxRows)? Once the server side recognizes that it has to process more rows than the threshold, it stops parsing the CSV file and returns a new error message like "Processed 500 rows, but did not reach the end of the file yet. Please split the uploaded file and upload the chunks." The grid would then stay empty or not be generated at all.
    This way the user knows what is happening and can act. Every developer can then decide whether to use this feature and set the threshold to a value where no timeouts occur for their application, or whether to use (custom-built) background processing instead. The advantage is that the user knows what is happening and that no timeouts are possible.
    Without such a feature, a developer must always expect a timeout and build a more complicated background processing with a progress check.

    A related feature would be a button to remove all erroneous rows from the grid prior to commit. Especially with partial commit mode "Prevent" this would be very useful. I would place the button next to "Commit", if enabled.
    My use case here is that I have a unique field "SourceUID" (not the PK, but uploaded). With the button, the user could upload a file where he or she already knows that some entries are duplicates, and then directly get rid of those.
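
    Something along these lines is what I have in mind on the client (just a sketch; it assumes one can get a handle on the BatchUploader's preview grid, which as far as I know is not an officially exposed API):
    Code:
    import com.smartgwt.client.widgets.IButton;
    import com.smartgwt.client.widgets.grid.ListGrid;
    import com.smartgwt.client.widgets.grid.ListGridRecord;

    public class RemoveErrorRowsButton {
        // Builds a button that removes every row currently marked as having
        // validation errors from the given grid (e.g. the upload preview grid).
        public static IButton create(final ListGrid previewGrid) {
            IButton button = new IButton("Remove rows in error");
            button.addClickHandler(event -> {
                ListGridRecord[] records = previewGrid.getRecords();
                // iterate backwards so removals don't shift indexes still to be checked
                for (int i = records.length - 1; i >= 0; i--) {
                    if (previewGrid.rowHasErrors(i)) {
                        previewGrid.removeData(records[i]);
                    }
                }
            });
            return button;
        }
    }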

    Especially the first point is important for me, as a user of mine just tried to upload 3500 rows, which I was not able to process (parsing yes, adding no) - and there was no good error message.

    Best regards
    Blama

    #2
    You're able to implement row maximums yourself by modifying the "batchUpload" DataSource - the same approach we have previously discussed with you for other kinds of validation.
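
    For example, a rough sketch along these lines (how the uploaded file is accessed and how the error is reported will depend on your setup; the field name and the getUploadedFile() call here are assumptions):
    Code:
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import com.isomorphic.datasource.DSRequest;
    import com.isomorphic.datasource.DSResponse;

    public class BatchUploadRowLimitDMI {
        private static final int MAX_ROWS = 500;  // pick a limit that avoids timeouts for you

        public DSResponse upload(DSRequest dsRequest) throws Exception {
            // Assumption: the CSV arrives as an uploaded file on this request;
            // adapt the field name / file access to your actual batchUpload setup.
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    dsRequest.getUploadedFile("csvFile").getInputStream()));
            int rows = 0;
            while (reader.readLine() != null && ++rows <= MAX_ROWS) {
                // just counting lines; stop as soon as the limit is exceeded
            }
            if (rows > MAX_ROWS) {
                DSResponse error = new DSResponse();
                error.setFailure();
                error.setData("More than " + MAX_ROWS + " rows uploaded. "
                        + "Please split the file and upload the chunks.");
                return error;
            }
            // Within the limit: proceed with normal processing.
            // (A real implementation would need to re-read or buffer the stream.)
            return dsRequest.execute();
        }
    }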

    Row count limits aren't a framework feature we'd want to add. If you're trying to handle around 10,000 rows, it should be no problem to do the checks you need to. We're not sure how it could possibly take 2s/row right now, but perhaps there are easy opportunities for caching or DB-level optimizations such as additional indexes.

    If you really can't make your code fast enough, we would suggest a different interface, where files are uploaded and the user later receives a notification that processing is done, and can review and approve the uploaded rows. We would not advise building an interface that asks the user to manually split the file.



      #3
      Hi Isomorphic,

      OK, I will do so then, and I also understand the rejection of the idea as a feature. For me, this is a quick & dirty solution, too.
      If I don't get into the 5000-rows range with my possible optimizations, I will build the background processing interface.

      Best regards
      Blama



        #4
        Hi Isomorphic,

        Originally posted by Isomorphic:
        Row count limits aren't a framework feature we'd want to add. If you're trying to handle around 10,000 rows, it should be no problem to do the checks you need to. We're not sure how it could possibly take 2s/row right now, but perhaps there are easy opportunities for caching or DB-level optimizations such as additional indexes.
        Getting rid of unneeded and expensive cacheSyncs got me down to 0.08s/row, which is OK for me. This means roughly 1.5 min per 1000 records. The 2s were due to Oracle SQL statement hard parses for the view I display with primaryKeys-equals criteria.

        For others with the same problem: calling invalidateCache() on the base ListGrid after the BatchUploader is closed is way faster than issuing a fetch per uploaded row and returning that data with the BatchUploader queue response.
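
        In code this is essentially just the following (a minimal sketch; the window wiring and variable names are only illustrative):
        Code:
        import com.smartgwt.client.widgets.Window;
        import com.smartgwt.client.widgets.grid.ListGrid;

        public class UploadWindowWiring {
            // After the window hosting the BatchUploader is closed, refresh the
            // base grid once instead of fetching every uploaded row individually.
            public static void wire(final Window uploadWindow, final ListGrid baseGrid) {
                uploadWindow.addCloseClickHandler(event -> {
                    uploadWindow.hide();
                    baseGrid.invalidateCache();  // one refetch covers all uploaded rows
                });
            }
        }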

        Best regards
        Blama
