    Paging and server-side sorting on large datasets

    Hi,

    As stated in http://www.smartclient.com/smartgwt/javadoc/com/smartgwt/client/data/ResultSet.html, paging on large datasets is usually coupled with disabling server-side sorting, to avoid traversing the whole dataset.

    It's indeed easy to see the impact of adding an order clause to the fetch request in such a case: the fetch time is much longer.

    So my question is: with a list grid displaying data from a potentially huge table, is there a way to add a default fetch order (e.g. on the primary key, descending) without killing performance?
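
    Something like this minimal client-side sketch is what I have in mind (just an illustration, not our actual code - the "auditLog" DataSource and the ID primary key match my naming, everything else is a placeholder):
    Code:
    import com.smartgwt.client.data.DataSource;
    import com.smartgwt.client.data.SortSpecifier;
    import com.smartgwt.client.types.SortDirection;
    import com.smartgwt.client.widgets.grid.ListGrid;

    public class AuditLogGrid {
        // Sketch: a grid with a default descending sort on the primary key.
        // The SortSpecifier ends up as an ORDER BY in the generated SQL.
        public static ListGrid createGrid() {
            ListGrid grid = new ListGrid();
            grid.setDataSource(DataSource.get("auditLog"));
            grid.setSort(new SortSpecifier[] {
                new SortSpecifier("ID", SortDirection.DESCENDING)
            });
            grid.fetchData();
            return grid;
        }
    }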

    Rgds

    #2
    Hi agalataud,

    I'd think that a "default fetch order" is the same as any other sort order. So you either have an ORDER BY clause in the generated SQL (triggered by a SortSpecifier in your client-side code) - or you don't.

    Try your statement with and without ORDER BY in your DB environment (SQL Developer, Toad, Query Analyzer, ...) and see if you can live with the performance loss. Watch your execution plan and check whether it even has an ORDER BY step (it may be unnecessary if the data is already pre-sorted for some reason).

    I know that for Oracle the "fetch only the first 75 rows" part is done via some JDBC settings (as there is no LIMIT in Oracle until 12c), and I do not know how JDBC transmits that information to the DB.
    But you could try to emulate it with a ROWNUM WHERE clause in order to get as close as possible to your application setup.

    Best regards,
    Blama



      #3
      Blama gives good advice here. This operation may not be particularly slow (it's not clear that you've measured it yet), and if it is, the best solution may be to add a DB index or apply other DB-level optimizations, or it might be something entirely different, such as rejecting queries that would produce too many results.

      If you ultimately find that none of the above approaches make sense and you identify some SQL that you have benchmarked as materially faster for the "fetch" operation, let us know if you have any trouble applying it through SQL Templating and we can take a look.



        #4
        Hi,

        Thanks to both of you for advice and help.

        It's clear that I need to provide more information on the query, the plans and the stats I collected.
        First of all, the order is on an indexed primary key (ID). The query is executed on a huge record set: 1M+ records. I tested on a huge volume on purpose, even though the production data size will be significantly different and we'll force users to restrict results with criteria.

        I only notice significant performance degradation when the order clause is added to the query, as shown in the traces below:

        With ORDER BY:
        Code:
        === 2014-02-18 11:29:07,805 [l0-7] INFO  SQLDriver - [builtinApplication.auditLog_fetch] Executing SQL query on 'FIRCO': SELECT COUNT(*) FROM AUD_APPLICATION_LOG, AUD_EVENT_DESCRIPTOR, SEC_OPERATOR WHERE ((AUD_APPLICATION_LOG.CREATED >= TO_DATE('2014-02-14 00:00:00','YYYY-MM-DD HH24:MI:SS') AND AUD_APPLICATION_LOG.CREATED <= TO_DATE('2014-02-15 00:00:00','YYYY-MM-DD HH24:MI:SS') AND AUD_APPLICATION_LOG.CREATED IS NOT NULL)) AND AUD_APPLICATION_LOG.CREATED_BY = SEC_OPERATOR.ID AND AUD_APPLICATION_LOG.EVENT_ID = AUD_EVENT_DESCRIPTOR.ID
        === 2014-02-18 11:29:08,328 [l0-7] INFO  SQLDataSource - [builtinApplication.auditLog_fetch] Row count query found 1235508 rows, which exceeds the DataSource's progressiveLoadingThreshold of 200000.  Switching to progressive loading mode.
        === 2014-02-18 11:29:08,328 [l0-7] DEBUG SQLDataSource - [builtinApplication.auditLog_fetch] JDBC driver windowed select rows 0->75, result size 76. Query: SELECT AUD_APPLICATION_LOG.CREATED, AUD_APPLICATION_LOG.CREATED_BY, FN_LOCALIZE(AUD_APPLICATION_LOG.EVENT_ID,'en',AUD_APPLICATION_LOG.DATA,AUD_APPLICATION_LOG.RELOBJ_ID,AUD_APPLICATION_LOG.RELOBJ_TYPE,AUD_APPLICATION_LOG.RELOBJ_LABEL) AS DATA, AUD_APPLICATION_LOG.EVENT_ID, AUD_APPLICATION_LOG.ID, AUD_APPLICATION_LOG.RELOBJ_ID, AUD_APPLICATION_LOG.RELOBJ_LABEL, AUD_APPLICATION_LOG.RELOBJ_TYPE, SEC_OPERATOR.NAME, AUD_EVENT_DESCRIPTOR.CATEGORY, AUD_EVENT_DESCRIPTOR.SEVERITY FROM AUD_APPLICATION_LOG, AUD_EVENT_DESCRIPTOR, SEC_OPERATOR WHERE ((AUD_APPLICATION_LOG.CREATED >= TO_DATE('2014-02-14 00:00:00','YYYY-MM-DD HH24:MI:SS') AND AUD_APPLICATION_LOG.CREATED <= TO_DATE('2014-02-15 00:00:00','YYYY-MM-DD HH24:MI:SS') AND AUD_APPLICATION_LOG.CREATED IS NOT NULL)) AND AUD_APPLICATION_LOG.CREATED_BY = SEC_OPERATOR.ID AND AUD_APPLICATION_LOG.EVENT_ID = AUD_EVENT_DESCRIPTOR.ID ORDER BY ID DESC
        === 2014-02-18 11:30:56,769 [l0-7] DEBUG SQLDataSource - [builtinApplication.auditLog_fetch] Using paging strategy 'jdbcScroll' - scrolling to absolute position 1
        === 2014-02-18 11:30:56,770 [l0-7] DEBUG SQLDataSource - [builtinApplication.auditLog_fetch] Scrolling / positioning took 0ms
        ==> execution took 2min

        Without ORDER BY:
        Code:
        === 2014-02-18 11:33:09,508 [0-13] INFO  SQLDriver - [builtinApplication.auditLog_fetch] Executing SQL query on 'FIRCO': SELECT COUNT(*) FROM AUD_APPLICATION_LOG, AUD_EVENT_DESCRIPTOR, SEC_OPERATOR WHERE ((AUD_APPLICATION_LOG.CREATED >= TO_DATE('2014-02-14 00:00:00','YYYY-MM-DD HH24:MI:SS') AND AUD_APPLICATION_LOG.CREATED <= TO_DATE('2014-02-15 00:00:00','YYYY-MM-DD HH24:MI:SS') AND AUD_APPLICATION_LOG.CREATED IS NOT NULL)) AND AUD_APPLICATION_LOG.CREATED_BY = SEC_OPERATOR.ID AND AUD_APPLICATION_LOG.EVENT_ID = AUD_EVENT_DESCRIPTOR.ID
        === 2014-02-18 11:33:09,576 [0-13] INFO  SQLDataSource - [builtinApplication.auditLog_fetch] Row count query found 1235508 rows, which exceeds the DataSource's progressiveLoadingThreshold of 200000.  Switching to progressive loading mode.
        === 2014-02-18 11:33:09,576 [0-13] DEBUG SQLDataSource - [builtinApplication.auditLog_fetch] JDBC driver windowed select rows 0->75, result size 76. Query: SELECT AUD_APPLICATION_LOG.CREATED, AUD_APPLICATION_LOG.CREATED_BY, FN_LOCALIZE(AUD_APPLICATION_LOG.EVENT_ID,'en',AUD_APPLICATION_LOG.DATA,AUD_APPLICATION_LOG.RELOBJ_ID,AUD_APPLICATION_LOG.RELOBJ_TYPE,AUD_APPLICATION_LOG.RELOBJ_LABEL) AS DATA, AUD_APPLICATION_LOG.EVENT_ID, AUD_APPLICATION_LOG.ID, AUD_APPLICATION_LOG.RELOBJ_ID, AUD_APPLICATION_LOG.RELOBJ_LABEL, AUD_APPLICATION_LOG.RELOBJ_TYPE, SEC_OPERATOR.NAME, AUD_EVENT_DESCRIPTOR.CATEGORY, AUD_EVENT_DESCRIPTOR.SEVERITY FROM AUD_APPLICATION_LOG, AUD_EVENT_DESCRIPTOR, SEC_OPERATOR WHERE ((AUD_APPLICATION_LOG.CREATED >= TO_DATE('2014-02-14 00:00:00','YYYY-MM-DD HH24:MI:SS') AND AUD_APPLICATION_LOG.CREATED <= TO_DATE('2014-02-15 00:00:00','YYYY-MM-DD HH24:MI:SS') AND AUD_APPLICATION_LOG.CREATED IS NOT NULL)) AND AUD_APPLICATION_LOG.CREATED_BY = SEC_OPERATOR.ID AND AUD_APPLICATION_LOG.EVENT_ID = AUD_EVENT_DESCRIPTOR.ID
        === 2014-02-18 11:33:09,605 [0-13] DEBUG SQLDataSource - [builtinApplication.auditLog_fetch] Using paging strategy 'jdbcScroll' - scrolling to absolute position 1
        === 2014-02-18 11:33:09,605 [0-13] DEBUG SQLDataSource - [builtinApplication.auditLog_fetch] Scrolling / positioning took 0ms
        ==> execution takes less than 1 sec.

        Execution plans are almost the same with or without the order clause, but when no limit/rowcount is specified, fetching 1M+ rows is logically slow. See attached files for details.

        My first guess was therefore that the paging strategy coupled with the ORDER BY clause was the cause of this degradation, as described in the 'Paging and total dataset length' section of the ResultSet documentation.



          #5
          Hi agalataud,

          and what are the timings for the two statements in SQL Developer/Toad?
          If they are both fast, there is a problem. If they take 1s and 2min there as well, everything is fine (= as expected, and you'll have to optimize elsewhere).

          It's very likely that you'll get a similar plan for a similar statement. And of course one step in an execution plan can make a big difference.

          Best regards,
          Blama



            #6
            Hi Blama,

            Both are fast - less than 0.05 sec - when I emulate paging with 'AND ROWNUM <= 75'.

            The problem is, I think, what is described in the documentation: the order clause forces a traversal of the whole result set.



              #7
              Hi agalataud,

              a "AND ROWNUM <= 75" sounds like you are adding to your existing WHERE-clause.
              As the ORDER step is always the last, you take some 75 rows and order them. Of course this is fast.

              You'll have to do a "SELECT * FROM (<old_statement_with_order_by>) x WHERE ROWNUM <= 75". That way the ORDER BY step is executed for the whole data set, and only then is the set stripped down to 75 rows.
              Is this fast as well in SQL Developer? I looked at your execution plan and saw the disk sort of 800k rows, so I'd think it will be way slower.
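
              In case you also want to time it from plain Java/JDBC, a rough sketch (the connection settings are placeholders and the inner SELECT is shortened - substitute your full ordered statement from the traces):
              Code:
              import java.sql.Connection;
              import java.sql.DriverManager;
              import java.sql.ResultSet;
              import java.sql.Statement;

              public class RownumWrapTest {
                  public static void main(String[] args) throws Exception {
                      // Placeholder connection settings - use your own environment.
                      try (Connection con = DriverManager.getConnection(
                              "jdbc:oracle:thin:@dbhost:1521:FIRCO", "user", "password");
                           Statement st = con.createStatement()) {
                          // ROWNUM is applied to the outer query only, so the whole set
                          // is ordered first and then stripped down to 75 rows.
                          String sql = "SELECT * FROM ("
                                  + " SELECT AUD_APPLICATION_LOG.* FROM AUD_APPLICATION_LOG"
                                  + " ORDER BY AUD_APPLICATION_LOG.ID DESC"
                                  + ") x WHERE ROWNUM <= 75";
                          long start = System.currentTimeMillis();
                          int rows = 0;
                          try (ResultSet rs = st.executeQuery(sql)) {
                              while (rs.next()) {
                                  rows++;
                              }
                          }
                          System.out.println(rows + " rows in "
                                  + (System.currentTimeMillis() - start) + " ms");
                      }
                  }
              }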

              Best regards,
              Blama



                #8
                You're absolutely right, thanks for pointing this out.

                Now if I reduce the query to the strict minimum (remove the function call, remove the date-related clauses), I still see a big difference in execution time in the SmartGWT logs (without order: 0.02 sec / with order: 20 sec+).

                I can't reproduce this in SQL Developer using SELECT * FROM (...) WHERE ROWNUM <= 75, where both executions (with and without ORDER BY) take between 0.01 and 0.02 sec.



                  #9
                  Hi agalataud,

                  I remember having a similar (unsolved) problem (fetch/SELECT from SmartGWT/Tomcat slow, SELECT from SQL Developer fast).

                  I'll look up the thread in the Oracle forums and post the link tomorrow. Perhaps it will help.

                  I think that it will come down to some lower-level JDBC/OCI problems and that Isomorphic can't do much here, but you never know.

                  Best regards,
                  Blama



                    #10
                    I don't reproduce in sqldeveloper using select * from (...) where rownum <= 75, where both executions (with and without order by) are between 0.01 and 0.02secs
                    If your SQL is really still something as simple as "where rownum <= 75", again, you aren't asking Oracle to do the same work that SmartGWT is asking it to do. Try the *exact same query*.

                    If that's still significantly faster or slower, you should look at things like the network connection between your SQL Developer and the DB vs what SmartGWT has to go through, JDBC driver bugs, or other issues like that.



                      #11
                      Hi Isomorphic,

                      I agree that a different SQL statement will perhaps lead to incomparable results.
                      But even with letter-by-letter identical SQL, how can one ensure that the environment is the same? One can try native vs. JDBC, but that's about all, isn't it?

                      For example: if you execute a SELECT in Toad, it just gets the first x rows. If you hit Ctrl-A, all matching rows are pushed to the client.
                      How is this done? And how can one tell the JDBC driver, when used in Toad/SQL Developer, to do your "DEBUG SQLDataSource - JDBC driver windowed select rows 0->75"?

                      Thank you & Best regards,
                      Blama



                        #12
                        *If* there turns out to be a difference, you isolate the cause like you would with any other issue - try it from Java instead of Toad, make sure the JDBC driver and version are the same, make sure the user credentials and dynamic JDBC settings are the same, make sure the network topology is the same, etc.
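
                        For example, a few lines of plain JDBC are enough to confirm which driver and database versions are in use (a sketch; the connection settings are placeholders - use the same URL and credentials as the application and compare with what Toad/SQL Developer report):
                        Code:
                        import java.sql.Connection;
                        import java.sql.DatabaseMetaData;
                        import java.sql.DriverManager;

                        public class DriverInfo {
                            public static void main(String[] args) throws Exception {
                                // Placeholder connection settings.
                                try (Connection con = DriverManager.getConnection(
                                        "jdbc:oracle:thin:@dbhost:1521:FIRCO", "user", "password")) {
                                    DatabaseMetaData md = con.getMetaData();
                                    System.out.println("Driver:   " + md.getDriverName()
                                            + " " + md.getDriverVersion());
                                    System.out.println("Database: " + md.getDatabaseProductName()
                                            + " " + md.getDatabaseProductVersion());
                                }
                            }
                        }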



                          #13
                          Hi,

                          My goal here is to use paging to limit the number of rows fetched from large datasets.
                          Executing the exact same query as the one issued by SmartGWT will of course take significantly longer if I don't limit the number of rows returned at some point.

                          What Blama advised seems to be a good way to emulate paging, as the same request is executed and then 'windowed' to a limited number of results.
                          Note that apart from the query being run in SQL Developer (so probably a different version of the JDBC driver is used), the test topology is the same as when the application is running: same network, same hosts for app and DB, same DB.

                          Now I don't know what parameters are applied to the statement and result set when SmartGWT issues the query. Maybe you could shed some light on this point so I can reproduce it in a standalone Java test case.
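
                          For reference, this is roughly the kind of standalone test I have in mind - a minimal sketch where the connection settings are placeholders, the query is the reduced one from my earlier post, and the scrollable-cursor usage is only my guess based on the 'jdbcScroll' / 'windowed select rows 0->75' log lines:
                          Code:
                          import java.sql.Connection;
                          import java.sql.DriverManager;
                          import java.sql.ResultSet;
                          import java.sql.Statement;

                          public class WindowedFetchTest {
                              public static void main(String[] args) throws Exception {
                                  // Placeholder connection settings - same DB/host as the application.
                                  try (Connection con = DriverManager.getConnection(
                                          "jdbc:oracle:thin:@dbhost:1521:FIRCO", "user", "password");
                                       // Scrollable read-only cursor, guessed from the 'jdbcScroll' strategy.
                                       Statement st = con.createStatement(
                                               ResultSet.TYPE_SCROLL_INSENSITIVE,
                                               ResultSet.CONCUR_READ_ONLY)) {
                                      st.setFetchSize(75); // ask the driver for rows in windows of 75

                                      // Reduced query; column names taken from the traces above.
                                      String sql = "SELECT ID, CREATED FROM AUD_APPLICATION_LOG ORDER BY ID DESC";
                                      long start = System.currentTimeMillis();
                                      try (ResultSet rs = st.executeQuery(sql)) {
                                          rs.absolute(1);                  // scroll to absolute position 1
                                          int rows = 1;
                                          while (rows < 76 && rs.next()) { // read only the 0->75 window
                                              rows++;
                                          }
                                      }
                                      System.out.println("Window fetched in "
                                              + (System.currentTimeMillis() - start) + " ms");
                                  }
                              }
                          }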

                          Rgds



                            #14
                            Nothing special is done, it's standard JDBC usage, applying the properties you provide in server.properties.

                            Let us know if you isolate something that appears to be a bug or performance issue with SmartGWT. It doesn't sound like you've spotted anything of the kind yet - just a slow query if you run the same one.
