Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
Clear All
new posts

    12.0p Generated SQL for PostgreSQL case insensitive LIKE comparisons / Add ILIKE support

    Hi Isomorphic,

    please see this PostgreSQL 11.4 create table code, and the SQL you typically generate for iContains.
    We have a big city data table here (from Geonames) and ask it with a few ... iContains ... OR ... iContains ... in order to make suggestions in a ComboBoxItem.

    It turns out that the SQL you generate (LOWER() around the column name) makes Postgres 11.4 not use the index, while it would do so for ILIKE.
    Please see this code and the resulting plans:

    Test table:
    Code:
    --Needed for indexes supporting wildcard searches
    CREATE EXTENSION pg_trgm;
    CREATE EXTENSION btree_gin;
    
    drop table test;
    create table test
    (
        id          bigint not null
            constraint test_pkey primary key
            constraint nnc_test_id check (id IS NOT NULL),
        id_in_order bigint not null
            constraint nnc_test_id_in_order check (id_in_order IS NOT NULL),
        text_lc      varchar(40)
            constraint nnc_test_text_lc check (text_lc IS NOT NULL),
        text_uc     varchar(40)
            constraint nnc_test_text_uc check (text_uc IS NOT NULL)
    );
    
    INSERT INTO test(id, id_in_order, text_lc, text_uc)
    SELECT series.i, --id
           series.i / 10000, --100 distinct values
           LOWER(md5(random()::text)),
           UPPER(md5(random()::text))
    FROM generate_series(1, 1000000) as series(i);
    
    ANALYZE VERBOSE test;
    
    create index test_gin on test USING gin (id_in_order, text_lc gin_trgm_ops, text_uc gin_trgm_ops);
    create index test_normal on test (id_in_order, text_lc, text_uc);
    
    --See generated data
    select * from test where id_in_order = 50 order by id_in_order, text_lc, text_uc;
    
    --ILIKE uses index (same result as below)
    select id_in_order, text_lc, text_uc, id
    from test
    where id_in_order = 50
      and ((text_lc ilike '%abcd%' and text_lc is not null) or
           (text_uc ilike '%abcd%' and text_uc is not null))
    order by id_in_order, text_lc, text_uc;
    
    --LOWER around column does not use index (same result as above)
    select id_in_order, text_lc, text_uc, id
    from test
    where id_in_order = 50
      and ((lower(text_lc) like lower('%abcd%') and text_lc is not null) or
           (upper(text_uc) like upper('%abcd%') and text_uc is not null))
    order by id_in_order, text_lc, text_uc;
    Plan 1 (good) (can paste it here):
    Code:
    QUERY PLAN
    "Sort  (cost=654.78..654.78 rows=2 width=82)"
    "  Sort Key: text_lc, text_uc"
    "  ->  Bitmap Heap Scan on test  (cost=646.80..654.77 rows=2 width=82)"
    "        Recheck Cond: ((((text_lc)::text ~~* '%abcd%'::text) OR ((text_uc)::text ~~* '%abcd%'::text)) AND (id_in_order = 50))"
    "        ->  BitmapAnd  (cost=646.80..646.80 rows=2 width=0)"
    "              ->  BitmapOr  (cost=73.50..73.50 rows=200 width=0)"
    "                    ->  Bitmap Index Scan on test_gin  (cost=0.00..36.75 rows=100 width=0)"
    "                          Index Cond: ((text_lc)::text ~~* '%abcd%'::text)"
    "                    ->  Bitmap Index Scan on test_gin  (cost=0.00..36.75 rows=100 width=0)"
    "                          Index Cond: ((text_uc)::text ~~* '%abcd%'::text)"
    "              ->  Bitmap Index Scan on test_normal  (cost=0.00..573.05 rows=9667 width=0)"
    "                    Index Cond: (id_in_order = 50)"
    Plan 2 (bad):
    Code:
    QUERY PLAN
    "Index Scan using test_normal on test  (cost=0.55..13488.27 rows=154 width=82)"
    "  Index Cond: (id_in_order = 50)"
    "  Filter: (((lower((text_lc)::text) ~~ '%abcd%'::text) AND (text_lc IS NOT NULL)) OR ((upper((text_uc)::text) ~~ '%ABCD%'::text) AND (text_uc IS NOT NULL)))"
    Could you change the generated SQL here in order to use ILIKE?

    Thank you & Best regards
    Blama

    #2
    We agree that this would be a worthwhile enhancement to our Postgres support. A developer is scheduled to make this change, and we will post back on this thread soon.

    Comment


      #3
      We have now implemented optional support for ILIKE in our PostgreSQL driver. This will be present in 12.1 and 12.0 builds as of tomorrow - ie, builds dated December 21 or later. Note, the use of ILIKE depends on a server.properties flag, which is set to true in 12.1 builds, but false in 12.0 builds (for safety, because 12.0 is really a bugfix-only branch). So to see a difference with a 12.0 build, add the following line to your server.properties file:

      Code:
      sql.postgresql.useILike: true

      Comment


        #4
        Hi Isomorphic,

        thank you. I'm pretty sure this setting will help other users as well. It's also great that you made this configurable, as it might be possible that other users like ANSI SQL here better for some reason.

        Best regards
        Blama

        Comment


          #5
          Hi Isomorphic,

          I tried this feature using v12.0p_2020-03-11. It seems to work in general, but broke importStrategy="display"-fields in BatchUploader, as now the query is with a normal "=" instead with ILIKE. It was a "=" with LOWER() before.
          Code:
                  <field name="CUST_ADDRESS_COUNTRY_ID" [B]importStrategy="display"[/B] uploadFieldName="ADDRESS_COUNTRY_ID" displayField="CUST_ADDRESS_COUNTRY_NAME"
                      foreignKey="T_COUNTRY.ID">
                      <title><fmt:message key="country" /></title>
                      <validators>
                          <validator type="hasRelatedRecord">
                              <errorMessage><fmt:message key="validatorSelectCountry" /></errorMessage>
                          </validator>
                      </validators>
                  </field>
          Best regards
          Blama

          Comment


            #6
            Could you clarify - if ILIKE is equivalent to = plus lower() then how could this change the importDisplay feature?

            Comment


              #7
              Hi Isomorphic,

              this is the generated query with sql.postgresql.useILike: true in the server.properties:
              Code:
              SELECT longfieldlist FROM tablename JOIN alltheneededjoinsforallthefields [B]WHERE (T_COUNTRY.ISO_3166_1_ALPHA_2='DE')[/B]
              .

              This is the query without the new setting:
              Code:
              SELECT longfieldlist FROM tablename JOIN alltheneededjoinsforallthefields [B]WHERE (LOWER(T_COUNTRY.ISO_3166_1_ALPHA_2)='de')[/B]
              My suspicion is that the new setting should affect the "i" versions of operators contains/startsWith/endsWith and their respective not-versions. This seems to be working fine.
              It somehow also seems to have an effect on iEquals, which is not intended.

              Best regards
              Blama

              Comment


                #8
                Hi Isomorphic,

                can you reproduce this issue?

                Best regards
                Blama

                Comment


                  #9
                  Hi Isomorphic,

                  can you reproduce this issue?

                  Best regards
                  Blama

                  Comment


                    #10
                    Hi Isomorphic,

                    can you reproduce this issue?

                    Best regards
                    Blama

                    Comment


                      #11
                      Apologies for the delay, this was lost in a queue of assigned tasks. We are working on this and will follow up soon.

                      Comment


                        #12
                        We've reproduced behavior you are reporting, but this needs some clarity.

                        First, why do you say effect on iEquals operator is not intended? For case insensitive comparisons iEquals using iLike approach would get same benefits of using indexes as other case insensitive operators, like iContains for example. Can you explain this please?

                        Second, BatchUploader lookup by ID has an issue that needs to be fixed. When sql.postgresql.useILike=false we use lower(field)="value" approach. Then, if sql.postgresql.useILike=true, we could use either same lower(field)="value" approach or switch to using field ilike "value" as well. This is related to the first question, since this lookup is using case insensitive comparison (like iEquals does) and we think using iLike here would make perfect sense. Do you agree?

                        Comment


                          #13
                          Hi Isomorphic,

                          thanks for looking into this. I’m on vacation now and will test and answer in 1-2 weeks, as I don’t know the exact problem we had with it anymore.

                          Best regards
                          Blama

                          Comment


                            #14
                            Hi Isomorphic,

                            to clarify: I was only speaking about the bug.
                            IMHO it does not matter if it is fixed by going back to the old way (like with sql.postgresql.useILike=false) of
                            Code:
                            lower(field) = 'value' (where value is the lowercased version of the search string)
                            or by using
                            Code:
                            field ILIKE 'value' (where value is the lowercased version of the search string or the unmodified search string)
                            , but...
                            • I agree that using ILIKE could have performance benefits (I did not test it)
                            • At the same this would already be solveable with an index on lower(field) (see "Examples" here). This then is doable with a normal btree index, no need for Trigram extension for GIN indexes.
                            • This is not true for the iContains, which needs ILIKE to be performant.
                            • The issue with using the ILIKE approach for iEquals is that you need to escape percent sign ( %) and underscore sign (_) in the search string.

                            So IMHO it seems more straightforward to use ILIKE only for iContains/iStartsWith/iEndsWith and leave iEquals as it is.

                            Does this make sense to you?

                            Best regards
                            Blama

                            Comment


                              #15
                              Just an FYI - the developer handling this ticket is assigned to multiple high priority tasks and has just been admitted to hospital, so this will be a pretty long wait, just to set expectations.

                              Comment

                              Working...
                              X