Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
Clear All
new posts

    Large data sets: Memory consumption and performance issues

    Hello --

    First, my obligatory system information:

    - SmartGWT Power version: 2.3
    - Current browsers: Chrome (v6.0.472.63), IE (v8.0.6001.18702), Firefox (v3.6.9)
    - Java: Sun JVM build 1.6.0_20-b02
    - Tomcat version: 5.5.29
    - Desktop OS: Windows XP Service Pack 3
    - System memory: 4 Gb
    - Oracle version: 11g
    - Oracle JDBC driver version: ojdbc6, 11.2.0.1.0

    I currently have a table with approximately 2.25 million rows. I have defined a simple DataSource associated with this table, and associated the DataSource with a ListGrid. Upon rendering the ListGrid for the first time and fetching data, data from this table arrives quickly and successfully. Visually, the ListGrid displays 20 rows in the browser's interface, and the page size is kept at 75 rows.

    It does not take long for query performance to slow dramatically, and eventually max out the heap space, regurgitating OutOfMemoryErrors to Tomcat's logs. If I grab the ListGrid's scroll bar's cursor and drag it approximately half-way down the side of the grid, queries can take upwards of 2.5 minutes to return data (when returning data successfully). Performing this large of a cursor-drag query more than four or five times nets OutOfMemoryErrors. The JVM's maximum heap size is set at 1,024 Mb.

    The last messages directed to Tomcat's log files immediately prior to the wait (and eventual OutOfMemoryError) resemble the following:

    Code:
    === 2010-09-25 14:44:00,059 [55-6] INFO  SQLDriver - [builtinApplication.schedule_fetch] Executing SQL query on 'Oracle': SELECT COUNT(*) FROM adent.schedule WHERE ('1'='1')
    === 2010-09-25 14:44:01,665 [55-6] DEBUG SQLDataSource - [builtinApplication.schedule_fetch] JDBC driver windowed select rows 1546704->1546779, result size 75. Query: SELECT schedule.window_position, schedule.duration, schedule.spot_id, schedule.zone_id, schedule.window_open_date, schedule.status, schedule.event_id, schedule.machine_id, schedule.channel_id FROM adent.schedule WHERE ('1'='1')
    === 2010-09-25 14:45:42,169 [55-3] ERROR IDACall - Top-level servlet error: 
    java.lang.OutOfMemoryError: Java heap space
    It's important to mention that the table whose data I'm querying has a composite primary key. In addition, I have reindexed the table immediately prior to performing the queries that result in the poor performance and OutOfMemoryErrors above.

    Does anyone have suggestions on how best to optimize ListGrids and accompanying DataSources for acceptable performance when querying tables whose rows number in the millions? If Isomorphic's SQL query is, indeed, selecting all rows in this table (as the log file's query suggests) prior to isolating only those rows to return to the client, is this backend operation dependent upon the amount of memory I can allocate for the JVM's heap?

    I've attached the DataSource definition file for reference.

    Thanks for any assistance!

    Regards,

    Brad Andersen
    AZAD Consultant - ARRIS Group, Inc.
    Attached Files

    #2
    You can enable SQL-based limiting of results by setting sql.oracle.supportsSQLLimit = true in server.properties. This is off by default because it can affect certain custom queries, but is definitely fine for most queries and in particular for the query you're showing.

    Also, even counting matching rows is an expensive operation for a large table, so consider using the progressive loading pattern explained in the ResultSet docs.

    Comment


      #3
      Thank you very much for your prompt reply!

      Applying the following property (as you suggest),

      Code:
      sql.oracle.supportsSQLLimit: true
      appears to encourage even worse performance.

      Here is an example of a query generated by your server when the supportsSQLLimit property is set to true. (I've introduced newlines and tabs to make the SQL a bit more readable.)

      Code:
      SELECT * 
      FROM 
      (
          SELECT a.*, rownum myrownum 
          FROM 
          (
              SELECT schedule.window_position, schedule.duration, schedule.spot_id,
                  schedule.zone_id, schedule.window_open_date, schedule.status, 
                  schedule.event_id, schedule.machine_id, schedule.channel_id 
              FROM adent.schedule 
              WHERE ('1'='1')
          ) a 
          where rownum <=841201
      ) 
      WHERE myrownum >= 841126
      When invoked using your server within our application, the request times out. When invoking this query in Oracle's SQL Developer, the query returns successfully, but only after a whopping 680 seconds. One might assume that the inner query's retrieval of all rows less than the upper bound of the rows desired to retrieve is a culprit.

      I'm also not certain that the progressive loading solution effectively increases performance. Retrieving the total number of rows in the table requires a small fraction of the time to execute the original query.

      Code:
      === 2010-09-26 13:03:47,712 [55-3] INFO  SQLDriver - [builtinApplication.schedule_fetch] Executing SQL query on 'Oracle': SELECT COUNT(*) FROM adent.schedule WHERE ('1'='1')
      === 2010-09-26 13:03:49,210 [55-3] DEBUG SQLDataSource - [builtinApplication.schedule_fetch] Using SQL Limit query
      1.5 seconds to retrieve the row count isn't problematic, so I don't think that allowing the server to return a calculated (as opposed to queried) number of rows in the table would be effective here.

      We'll continue to investigate other solutions.

      Regards,

      Brad Andersen
      AZAD Consultant - ARRIS Group, Inc.

      Comment


        #4
        Please re-read the progressive loading discussion under the ResultSet docs - the purpose is to limit the user's ability to scroll to the end of the dataset until the number of results has been reduced by filtering to an amount that is reasonable to scroll through. Realize that with two millions rows and a 1000 px tall grid, each single pixel of the scrollbar is more than 1000 rows.

        That said, if you are seeing 680 seconds for that query, something is wrong with your DB, JDBC driver or both. There isn't a faster way to slice data from the middle of the table, according to Oracle's own advice and articles. Try a clean setup and figue out what problem makes that query slow (you can do this independently of SGWT).

        Comment


          #5
          Originally posted by Isomorphic
          Please re-read the progressive loading discussion under the ResultSet docs - the purpose is to limit the user's ability to scroll to the end of the dataset until the number of results has been reduced by filtering to an amount that is reasonable to scroll through. Realize that with two millions rows and a 1000 px tall grid, each single pixel of the scrollbar is more than 1000 rows.
          We're attempting to replicate the historical behavior of a thick client which allowed users to scroll to the end of the result set. Therefore, we'd like to minimize pagination as much as possible. However, the argument favoring performance over toeing the historical line might eventually win.

          Originally posted by Isomorphic
          That said, if you are seeing 680 seconds for that query, something is wrong with your DB, JDBC driver or both. There isn't a faster way to slice data from the middle of the table, according to Oracle's own advice and articles. Try a clean setup and figue out what problem makes that query slow (you can do this independently of SGWT).
          Indeed. Where performance has been abysmal has been performing queries against a materialized view which is aggregating data from a variety of homogeneous data sources. Upon changing the configured data source and directing queries against one of these tables from which the materialized view is built, better performance is exhibited (especially when using the supportsSQLLimit property). Something's amiss ... we'll figure it out.

          Thanks again.

          Comment


            #6
            Thanks for confirming there's a DB problem here.

            Yes, it's important to allow scrolling to the end - but above a certain size (perhaps 100k records or so) it's useless to the user and slows down the server unnecessarily.

            So we recommend switching to a progressive loading style when initial results are too large, and switching back to normal mode once filtering reduces the number of results.

            This will be the default behavior of the SQLDataSource in the future.

            Comment


              #7
              Would you mind giving an example of a custom query that is affected by the setting "sql.oracle.supportsSQLLimit: true" ?
              Thanks

              Comment

              Working...
              X