Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
Clear All
new posts

    Enhancement: BatchUploader lookup result cache to increase performance

    Hi Isomorphic,

    please see this thread and the performance issue there.
    While writing it I had another idea how to improve BatchUploader.
    For my main table I can currently import about 1000 rows only, as otherwise the response takes to long, making the browser stop waiting for the response.
    The problem here is the (1st) upload part of BatchUploader, not the 2nd commit part.

    The queries issued serverside are mostly the same:
    • lookup country "Germany" to 455
    • lookup country "Austria" to 446
    • lookup country "Switzerland" to 447
    • lookup temperature "cold" to 1
    • lookup temperature "warm" to 2
    • lookup temperature"hot" to 3
    • lookup product "ABC" 1001
    • lookup product "DEF" 1002
    • ....
    During a run of BatchUpload the result here will never change (especially if you run the queries in a transaction, then is is even guaranteed).

    If you maintained a cache of
    "DataSource" - "key value" - "resulting id"
    during a run of BatchUpload, you'd need to run every different country lookup only once, every different temperature lookup only once, ...

    I assume that this is an easy and valuable enhancement, that would severely speed up BatchUploader, increasing the maximum number of possible records in a CSV.

    Best regards
    Blama

    #2
    Hi Isomorphic,


    let me modify the example in order to improve two other use cases:
    • lookup country "Germany" to 455 for field country_id
    • lookup country "Austria" to 446 for field country_id
    • lookup country "Switzerland" to 447 for field country_id
    • lookup temperature "cold" to 1 for field temperature_id
    • lookup temperature "warm" to 2 for field temperature_id
    • lookup temperature"hot" to 3 for field temperature_id
    • lookup product "ABC" 1001 for field product1_id
    • lookup product "DEF" 1002 for field product1_id
    • lookup product "ZYX" 1055 for field product2_id (different field also linking to product)
    • lookup product "WVU" 1054 for field product2_id (different field also linking to product)
    • ....
    If you ever implement the feature from this thread, it would be better if the cache is maintained as:
    "uploadFieldName" - "key value" - "resulting id" (because of the potentially different results per uploadField that link to the same DataSource)

    Also, if a lookup value is not found, this should be tracked as well, so that the next time the BatchUploader wants to request a value, because it does not have a stored value for it already, it does directly know that the query won't return any results. (e.g. if in all CSV entries the ISO code of a country is used instead of the name, that is stored in the DB). In this case the BatchUpload could directly add an error for the row/field in question, without issuing a query.


    Best regards
    Blama

    Comment


      #3
      This is indeed useful suggestion, thank you. It is implemented in 12.x versions and will be available for download in nightly builds since Sep 23 (tomorrow).

      Comment


        #4
        Hi Isomorphic,

        can you explain how the cache works (because of this thread)? I assume it is built in into 12.0p DataImport and only affects a single run of "upload", correct?
        If not, is there a setting to disable the cache?

        Thank you & Best regards
        Blama

        Comment


          #5
          Yes, your assumption is correct. Values are cached in a context of single run of batchUpload. Every new upload will have its own cache built during the process.

          Comment


            #6
            Hi Isomorphic,

            as you might have noticed I posted a lot about BatchUploader lately. I also had a closer look at the SQL statements sent and how this improvement works.
            It does reduce the amount of fetches a lot, as expected. Unfortunately, you only store the looked up ID if you find one.
            A text without an entry in the parent table does not lead to this negative "key not present"-information stored in the result cache.
            Instead you do this query over and over again. Of course, here the same optimization can be applied.

            It will lead to great further improvements in cases of bad upload data or a wrongly configured includeFrom in the field mentioned as displayField in the ID field with importStrategy="display".

            Best regards
            Blama

            Comment


              #7
              Hi Isomorphic,

              did you have a chance to look at this optimization as well?

              Thank you & Best regards
              Blama

              Comment


                #8
                We did, although this was implemented in 12.1 only.

                Comment


                  #9
                  Hi Isomorphic,

                  OK, thank you, perfect. We'll switch hopefully soon anyway.

                  Best regards
                  Blama

                  Comment

                  Working...
                  X