
    Potential problem with BatchUploader FK-lookup

    Hi Isomorphic,

    I noticed a potential problem in BatchUploader with FK-lookup (importStrategy="display"). I'll describe it before creating a test case, because it's not really a bug.

    Imagine a numeric-looking string field like supplyItem.SKU, where supplyItem still has itemID as its sequence PK.
    Code:
            <field name="supplyItem_ID" importStrategy="display" uploadFieldName="SKU" displayField="SKU" foreignKey="supplyItem.itemID"
                required="true">
                <title><fmt:message key="SKU" /></title>
                <validators>
                    <validator type="hasRelatedRecord" />
                </validators>
            </field>

    The data looks like this:
    Code:
    itemID SKU
    0 "1"
    1 "2"
    2 "3"
    ...
    Now I delete itemID=1

    Now imagine I want to upload an item with SKU = "2" in the CSV, but that SKU no longer exists in the DB (because I deleted it). The lookup for the SKU fails, and the supplyItem_ID value is still 2 (= the value from the CSV).
    Next, the DB insert is attempted with supplyItem_ID=2, which does exist as an itemID, so the insert succeeds, but the new row links to a different item.
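
    In Java-like pseudo-code, my assumption of what happens internally is roughly this (simulated with a plain Map; none of the names below are framework API):
    Code:
    import java.util.HashMap;
    import java.util.Map;

    // Simulation of the importStrategy="display" fallback as described above.
    // The Map stands in for the supplyItem table; this is not framework code.
    public class DisplayLookupFallback {
        public static void main(String[] args) {
            Map<String, Integer> skuToItemId = new HashMap<>();
            skuToItemId.put("1", 0);
            skuToItemId.put("3", 2); // itemID=1 (SKU "2") was deleted

            String csvValue = "2";                      // SKU from the CSV
            Integer itemId = skuToItemId.get(csvValue); // display-value lookup

            Object supplyItem_ID = (itemId != null)
                    ? itemId    // normal case: use the looked-up key
                    : csvValue; // failure case: the CSV value is kept as the key!

            // The insert then runs with supplyItem_ID=2, which exists as an itemID,
            // so the new row silently links to the wrong item (SKU "3").
            System.out.println("supplyItem_ID used for insert: " + supplyItem_ID);
        }
    }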

    This problem can occur anywhere the display field and the data field have the same data type, or wherever a cast between them succeeds.

    Best regards
    Blama

    #2
    This is a potential problem if your dataset is not properly set up for the import. But it doesn't seem like something the framework should somehow handle for you.



      #3
      Hi Isomorphic,

      this (or something similar with the same root cause) has now happened for a customer.

      They use an external ID as their logical name. It is looked up via importStrategy="display" to find the real internal database ID, and the lookup does not find anything.
      This is NOT flagged as an error in the Uploader GUI. It would be if I also defined a hasRelatedRecord validator, but IMHO that is not correct and should not be necessary.

      Also, if I use text in the CSV column, it IS flagged as a problem because of the mismatched data type. That much is expected, but it is unclear why two cells that are both logically wrong (the values don't exist) result in different outcomes.

      So the customer does not see any error, hits "Commit", and runs into a problem because of failing database requests (foreignKey errors).

      The root cause here (just taking the text-value from the CSV file if the lookup fails) is the same as in #1.

      If necessary, I can create an Oracle sequence-PK based test case for this, but I hope you can already see the problem.
      This update describes a milder version of #1, where the BatchUploader will actually add wrong data.

      Best regards
      Blama



        #4
        Sorry, we still see this as a data problem and not a framework bug. The framework is simply trying both ways to interpret the uploaded value, which is good behavior. If you don't want that, you already know how to avoid it via a hasRelatedRecord validator.



          #5
          Hi Isomorphic,

          this is true for my post #3. It's not true for #1, where wrong data would be uploaded at random.

          And for #3, without a framework code change there would be unnecessary DB workload, as you essentially do:
          Code:
          SELECT id FROM lookuptable WHERE name = 'csvdata'; --Get Id
          ....
          SELECT id FROM lookuptable WHERE id = previously_looked_up_id; --Check if record exists
          Fixing the issue from #1 will also fix #3 without a need for a hasRelatedRecord validator.

          Best regards
          Blama
          Last edited by Blama; 24 Jun 2019, 08:15. Reason: Added word "framework".



            #6
            So again, for #1, we would recommend setting up the data so you don't have an ambiguity between display value and ID, and for #3, we would recommend the hasRelatedRecord validator.

            It's true there's room for additional settings here that could optimize your #3 case, or allow you to work with ambiguous data as in #1. Perhaps something like importStrategy="displayRequired" with the behavior of treating a failed lookup as a hard failure instead of just using the provided value. But this would be an enhancement, not a bug.



              #7
              Hi Isomorphic,

              unfortunately I don't control the data for #1, so IMHO this is a bug: an end user will reasonably expect the behavior you describe with importStrategy="displayRequired", and I have to live with the chance that an external ID is numeric. Of course an overlap is unlikely, but it is still possible.

              Something like importStrategy="displayRequired" would solve #1 (and also #3, without the need for a hasRelatedRecord validator).
              Additionally, I don't see a use case where the current importStrategy="display" has an advantage over your suggested importStrategy="displayRequired", which makes this look like a bug to me.

              Best regards
              Blama



                #8
                You seem to be trying to use BatchUploader to go directly to production-ready data with no intervening review step. A workflow with such a review step is the typical use case, and in that scenario you don't want to throw error messages at the end user, blocking them from importing data that doesn't perfectly match the DB.

                So again, you have a way to be strict if you want, but it is not a bug for our product to have a permissive mode. It's, if anything, the more common use case.



                  #9
                  Hi Isomorphic,

                  right now I'm talking about a client-side BatchUploader import with "Upload" and "Commit" buttons, used for bulk upload (typically 100-2000 rows) of data.
                  We verify the data with a bunch of validators. Rows that don't match are flagged by the validators. The user can then correct errors with the "Prev"/"Next" error buttons and finally persist via "Commit". This is a straightforward use case IMHO.

                  Any row flagged as an error (a small number) can be manually reviewed, but you can't expect the end user to compare every row and every cell in detail.
                  If the framework flagged such values as potentially wrong, that would also be fine.

                  With the permissive (as you call it) behavior as it is now, the result of a "Commit" depends on the existing lookup data, with no warning or indication whatsoever. I'm sure you agree that this is a bad user experience.

                  As you also write in #8:
                  Originally posted by Isomorphic:
                  So again, you have a way to be strict if you want
                  I assume you mean using a hasRelatedRecord validator. This is true ONLY for #3, not for #1, so I'm not sure you understand my problem here.
                  In #1, for the CSV row
                  Code:
                  1 "2"
                  the final row after "Commit" will link to itemID:2, which stands for SKU:"3".


                  Also the docs for display say:
                  Code:
                  The import process expects values in the import dataset to be display values, and it will transform them to the corresponding underlying keys
                  A silent "if there is no key found I'll keep the display value even if that has a different meaning if used as key itself" is really not something anyone would expect.

                  Apart from this, one could also use server-side DataImport, which has the same problem and no way to intervene.

                  Best regards
                  Blama



                    #10
                    When we said:

                    You seem to be trying to use BatchUploader to go directly to production-ready data with no intervening review step.
                    We meant no human intervening review step. The way we ourselves have typically used BatchUploader, the uploaded data goes somewhere temporary, and there's a final review before it goes to actual production tables - not by the end user who is uploading, but by some kind of admin. In that case, we need display mapping to work just how it does, because again, we don't want to burden the uploader with errors they don't understand how to resolve.

                    So please stop claiming this is a bug when it's working the way we actually need it to!

                    We do understand that in your case of ambiguous id/display mapping (issue from post #1), you'll have to do something inconvenient like map to fake IDs first so you know which ones were mapped and not mapped. That's the kind of interesting edge case where Feature Sponsorship is very useful to you, because you get to modify the framework to solve something that, frankly, probably won't ever come up for anyone else, yet we still charge as though you're adding a feature for general use.

                    In sum: we fully understand how this behavior is inconvenient for you, but it is not at all a bug, and we're willing to add a feature that makes your use case easier and more efficient as a very small Feature Sponsorship.



                      #11
                      Hi Isomorphic,

                      I still see this as an error given the description of display, but let's agree to disagree.
                      Hopefully this will remain a minor issue for us. Also, I might be able to add validation errors myself, like this(?):

                      The general lookup mechanism is as follows, right?
                      Lookup:
                      • If found: the CSV string goes to the displayField, the looked-up ID goes to the ID field.
                      • If not found: the CSV string goes to the ID field, the displayField stays null.
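
                      For illustration, this means a failed lookup should be detectable whenever the ID field is filled while the display field is empty. A minimal sketch (my assumption, using the field names from #1; not framework code):
                      Code:
                      import java.util.Map;

                      // Sketch: after a failed display lookup, the raw CSV value stays in the
                      // ID field and the display field is left empty.
                      public class LookupFailureCheck {
                          static boolean lookupFailed(Map<String, Object> uploadRecord) {
                              Object id = uploadRecord.get("supplyItem_ID"); // field names from #1
                              Object display = uploadRecord.get("SKU");
                              return id != null && display == null;
                          }
                      }
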
                      I might be able to catch this situation in my BatchUpload-DMI after
                      Code:
                      DSResponse validatedData = batchUpload.validateUploadData(response);
                      and add validation errors myself, correct?

                      Or perhaps even a server-side validator can do this?

                      Best regards
                      Blama



                        #12
                        Hi Isomorphic,

                        I tried doing so in my BatchUpload-DMI after validateUploadData(), as written in #11.
                        I got this far:
                        Code:
                                // Add validation errors if the lookup (which has already happened) did not succeed
                                Map<String, String> lookupFields = new LinkedHashMap<String, String>();
                        
                                // Collect importStrategy="display" fields
                                for (String fieldName : dsRequest.getDataSource().getFieldNames()) {
                                    DSField dsField = dsRequest.getDataSource().getField(fieldName);
                        
                                    if (!"display".equals(dsField.getProperty("importStrategy")))
                                        continue;
                                    String name = dsField.getProperty("name");
                                    String displayField = dsField.getProperty("displayField");
                                    lookupFields.put(name, displayField);
                                }
                        The next step would be to loop over the data and add errors, but these differ from normal validation errors.
                        Output of the RPC tab in the Developer Console for a BatchUpload:
                        Code:
                        {
                            affectedRows:0,
                            data:{
                                discardedColumns:[
                                ],
                                gridRows:[
                                    {
                                        COUNTRY_ID:64,
                                        COUNTRY_NAME:"FR",
                                        ZIPCODE_FROM:":00000",
                                        ZIPCODE_TO:"206 + 454"
                                    },
                                    {
                                        COUNTRY_ID:64,
                                        RESELLER_ID:12,
                                        RESELLER_NAME:"317",
                                        COUNTRY_NAME:"FR",
                                        ZIPCODE_FROM:"02110"
                                    },
                                    {
                                        COUNTRY_ID:64,
                                        RESELLER_ID:318,
                                        COUNTRY_NAME:"FR",
                                        ZIPCODE_TO:"02130"
                                    },
                                    {
                                        COUNTRY_ID:64,
                                        RESELLER_ID:12,
                                        COUNTRY_NAME:"FR",
                                        ZIPCODE_FROM:"02130"
                                    }
                                ],
                                errors:[
                                    {
                                        errors:{
                                            RESELLER_ID:[
                                                "Field is required"
                                            ]
                                        },
                                        rowNum:0
                                    },
                                    {
                                        errors:{
                                            RESELLER_ID:[
                                                "\"318\" not found in existing database records"
                                            ],
                                            ZIPCODE_TO:[
                                                "Please set either \"State\" or \"ZIP-Code from\" (and optionally \"ZIP-Code to\")."
                                            ]
                                        },
                                        rowNum:2
                                    }
                                ]
                            },
                            invalidateCache:false,
                            isDSResponse:true,
                            operationType:"add",
                            queueStatus:0,
                            status:0
                        }
                        As you can see, there are nested errors with a rowNum attribute on the inner error objects.
                        How can I add to this, i.e. add errors for rows that don't have any errors yet, and also amend the inner errors list for rows that already have errors?

                        Thank you & Best regards
                        Blama



                          #13
                          Have you noticed the docs about dataSource.transformInputValue()? That would allow you to leave NULL, or some marker value, for rows where no mapping exists. At that point a hasRelatedRecord validator will flag the row as an error, so you get normal validation errors (you should not try to work with the specialized error format of BatchUploader, which is an internal detail).



                            #14
                            Hi Isomorphic,

                            thanks. I already assumed that BatchUploader is special in this regard and that adding the errors won't be possible.
                            I did not try DataSource.transformInputValue(), but having a look at the docs I can't see how it might help. Can you elaborate?

                            For now, I found this solution: basically my own hasRelatedRecord validator with the same optimization you use for the FK lookup (look up every value only once):

                            .ds.xml field:
                            Code:
                                    <field foreignKey="T_RESELLER.ID" name="RESELLER_ID" batchUploadOperationId="fetchUploadData" batchUploadCaseSensitive="true" importStrategy="display" uploadFieldName="Reseller" displayField="RESELLER_NAME" type="integer" required="true">
                                        <title><fmt:message key="reseller" /></title>
                                         <validators>
                                            <validator type="serverCustom" dataSource="T_RESELLER" textField="RESELLER_NAME">
                                            <serverObject lookupStyle="factory" className="com.lmscompany.lms.server.worker.validator.ValidatorLookupSucceededFactory" />
                                                <errorMessage>$errorMessage</errorMessage>
                                            </validator>
                                        </validators>
                                    </field>
                            ValidatorLookupSucceededFactory:
                            Code:
                            public class ValidatorLookupSucceededFactory {
                            
                                public static ValidatorLookupSucceeded create(HttpServletRequest httpServletRequest) throws Exception {
                                    Object validatorLookupSucceededObj = httpServletRequest.getAttribute("ValidatorLookupSucceededInstance");
                                    if (validatorLookupSucceededObj != null && !(validatorLookupSucceededObj instanceof ValidatorLookupSucceeded)) {
                                        Logger log = new Logger(ValidatorLookupSucceededFactory.class.getName());
                                        log.error("Object found in attribute \"ValidatorLookupSucceededInstance\" is not of type ValidatorLookupSucceeded.");
                                    }
                                    ValidatorLookupSucceeded validatorLookupSucceeded = (ValidatorLookupSucceeded) httpServletRequest
                                            .getAttribute("ValidatorLookupSucceededInstance");
                                    if (validatorLookupSucceeded == null) {
                                        validatorLookupSucceeded = new ValidatorLookupSucceeded();
                                        httpServletRequest.setAttribute("ValidatorLookupSucceededInstance", validatorLookupSucceeded);
                                    }
                                    return validatorLookupSucceeded;
                                }
                            }
                            Code:
                            public class ValidatorLookupSucceeded {
                                Logger log = new Logger(ValidatorLookupSucceeded.class.getName());
                                @SuppressWarnings("rawtypes")
                                MultiKeyMap map = new MultiKeyMap();
                            
                                @SuppressWarnings("unchecked")
                                public boolean condition(Object value, Validator validator, String fieldName, Map<Object, Object> record, DataSource ds,
                                        HttpServletRequest httpServletRequest) throws Exception {
                            
                                    log.info("Validating for field \"" + fieldName + "\", value \"" + (value == null ? "" : value.toString()) + "\"");
                            
                                    // Ensure correct configuration of validator
                                    if (validator.getProperty("textField") == null || validator.getProperty("textField").toString().isEmpty()) {
                                        throw new Exception("Validator must define a \"textField\" attribute.");
                                    }
                                    String textField = validator.getProperty("textField").toString();
                            
                                    // Exit early - nothing to validate
                                    if (value == null)
                                        return true;
                                    String strValue = value.toString();
                            
                                    /*
                                     * The automatic lookup works as follows:
                                     * User sends text data for ID(!) field. Lookup for importStrategy="display" checks the parent DS for the text data.
                                     * Row found: Row-ID into ID field, user text data into display field
                                     * Row not found: User text data stays in ID field, display field is empty
                                     * Latter case is checked below.
                                     */
                                    if (!record.containsKey(textField) || record.get(textField) == null || record.get(textField).toString().isEmpty()) {
                                        String errorMessage = String.format(I18n.getString("automaticLookupNotSuccessful", httpServletRequest), value);
                                        validator.addErrorMessageVariable("errorMessage", errorMessage);
                                        return false;
                                    }
                            
                                    // Here we are sure to have CSV-Text and looked up ID and can either check against the DB or use already run check results.
                                    String strCSVString = record.get(textField).toString();
                            
                                    // Check for prior DB queries with these keys
                                    if (map.containsKey(fieldName, strValue, strCSVString)) {
                                        Object o = map.get(fieldName, strValue, strCSVString);
                                        if (o != null) {
                                            validator.addErrorMessageVariable("errorMessage", o.toString());
                                            return false;
                                        } else
                                            // Prior validation runs in this request did not find an error for this combination
                                            return true;
                                    }
                            
                                    // Validation of found ID against database
                                    if (validator.getProperty("dataSource") != null && !validator.getProperty("dataSource").toString().isEmpty()) {
                                        String dataSource = validator.getProperty("dataSource").toString();
                            
                                        String errorMessage = ValidatorHasRelatedRecord.doValidation(value, dataSource, httpServletRequest);
                                        if (errorMessage != null) {
                                            validator.addErrorMessageVariable("errorMessage", errorMessage);
                                            map.put(fieldName, strValue, strCSVString, errorMessage);
                                            return false;
                                        }
                                    }
                            
                                    // No validation error found, store this information
                                    map.put(fieldName, strValue, strCSVString, null);
                                    return true;
                                }
                            }
                            This works for me, and because the validation results are cached it is also more efficient than the default hasRelatedRecord validator (this might be a nice optimization there as well).

                            Is there anything that invalidates this solution from your point of view?
                            Because then, once the open points regarding BatchUploader (Commits, client-side error messages, lookup-speed improvement when no entry is found, Queue success/failure; highest priority first) are solved, this should be fine for uploading ~5000 records with few columns.
                            Server-side speed should be OK for this; the problem is more the browser behavior for AJAX requests this big (as the client sends VERY verbose XML data):

                            (For two records, which are <50 bytes in the CSV, this data is sent in the ADD requests:
                            Code:
                            <transaction xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:type="xsd:Object"><transactionNum xsi:type="xsd:long">31</transactionNum><operations xsi:type="xsd:List"><elem xsi:type="xsd:Object"><values xsi:type="xsd:Object"><COUNTRY_ID xsi:type="xsd:long">64</COUNTRY_ID><RESELLER_ID xsi:type="xsd:long">18</RESELLER_ID><COUNTRY_NAME>FR</COUNTRY_NAME><ZIPCODE_FROM>02130</ZIPCODE_FROM><_selection_42 xsi:type="xsd:boolean">false</_selection_42><_embeddedComponents_isc_BatchUploader_0_grid xsi:nil="true"/><RESELLER_NAME>ABC-Softwarehaus GmbH (B)</RESELLER_NAME></values><operationConfig xsi:type="xsd:Object"><dataSource>V_RESELLER_SALESAREA_UPLOAD__6__2019_07_03_11_34_27</dataSource><repo xsi:nil="true"/><operationType>add</operationType><textMatchStyle>exact</textMatchStyle></operationConfig><appID>builtinApplication</appID><operation>V_RESELLER_SALESAREA_UPLOAD__6__2019_07_03_11_34_27_add</operation><oldValues xsi:type="xsd:Object"><COUNTRY_ID xsi:type="xsd:long">64</COUNTRY_ID><RESELLER_ID xsi:type="xsd:long">18</RESELLER_ID><COUNTRY_NAME>FR</COUNTRY_NAME><ZIPCODE_FROM>02130</ZIPCODE_FROM><_selection_42 xsi:type="xsd:boolean">false</_selection_42><_embeddedComponents_isc_BatchUploader_0_grid xsi:nil="true"/><RESELLER_NAME>ABC-Softwarehaus GmbH (B)</RESELLER_NAME></oldValues></elem><elem xsi:type="xsd:Object"><values xsi:type="xsd:Object"><COUNTRY_ID xsi:type="xsd:long">64</COUNTRY_ID><RESELLER_ID xsi:type="xsd:long">19</RESELLER_ID><COUNTRY_NAME>FR</COUNTRY_NAME><ZIPCODE_FROM>02130</ZIPCODE_FROM><_selection_42 xsi:type="xsd:boolean">true</_selection_42><_embeddedComponents_isc_BatchUploader_0_grid xsi:nil="true"/><RESELLER_NAME>Hansestadt Softwarekontor (HH)</RESELLER_NAME></values><operationConfig xsi:type="xsd:Object"><dataSource>V_RESELLER_SALESAREA_UPLOAD__6__2019_07_03_11_34_27</dataSource><repo xsi:nil="true"/><operationType>add</operationType><textMatchStyle>exact</textMatchStyle></operationConfig><appID>builtinApplication</appID><operation>V_RESELLER_SALESAREA_UPLOAD__6__2019_07_03_11_34_27_add</operation><oldValues xsi:type="xsd:Object"><COUNTRY_ID xsi:type="xsd:long">64</COUNTRY_ID><RESELLER_ID xsi:type="xsd:long">19</RESELLER_ID><COUNTRY_NAME>FR</COUNTRY_NAME><ZIPCODE_FROM>02130</ZIPCODE_FROM><_selection_42 xsi:type="xsd:boolean">true</_selection_42><_embeddedComponents_isc_BatchUploader_0_grid xsi:nil="true"/><RESELLER_NAME>Hansestadt Softwarekontor (HH)</RESELLER_NAME></oldValues></elem></operations></transaction>
                            Improvement suggestion: Is oldValues needed here? IMHO it can be removed, saving ~50% of the request size and XML generation.
                            )

                            In general it turns out for me that BatchUploader is good for small CSVs (<1000 rows). If you agree, could you add this to the docs, so that users who already know they will need to upload more rows don't start down this road?

                            So an upcoming task here is definitely going to be to move this processing to the background: upload the CSV as a file, pre-process it server-side into a table (most likely using server-side DataImport), show it as a normal databound ListGrid, let the user correct the data in the integration table, and then batch-transfer it to the target table.
                            As this is a background operation, we'll need some success or progress display, so this is going to be a bit more involved.

                            Best regards
                            Blama



                              #15
                              transformInput can be used as we described - we’re not sure how to elaborate further.

                              Not important anymore, as you seem to have realized the situation you described in #1 can actually be solved with a validator after all, with no enhancements required.

                              We will take a look at whether the XML can be reduced. Looks like there are easy wins.

