
    JDBC fetch size for export feature

    Isomorphic,

    We have scenarios where users pull reports (list grids) with 2-3 lakh (200,000-300,000) rows. Currently we have limited them to pulling 1000 rows into the ListGrid, and then they either filter records or use the scroll bar to reach the rest. We did this by setting the page size on the ListGrid.
    However, we would like the customer to be able to pull the whole report (2-3 lakh rows) using the export feature. What we observed is that the export is very slow.

    Our observations:
    Setting pageSize on a ListGrid effectively defines the fetch size for the query. If we put a page size of 1 lakh on the ListGrid, the same report we tried exporting in the previous case comes out very fast, i.e. 1 lakh rows are received from the database in one request. We checked the properties available on dsRequest that might let us set something similar for export, but didn't find any. So what is the right way to improve export performance? We suspect the fetch size being on the low side for the export is responsible.

    My guess is that this is controlled by some parameter in server.properties. If so, we wouldn't want the fetch size to apply to all queries that hit the database. How do we achieve this?

    Also, we don't want to use JNDI.

    Thanks.

    #2
    If by export you mean exportData() (note that there is also exportClientData(), which is implemented very differently), then there is no data paging going on; all data is fetched. So there's nothing to optimize in the area you are looking at. Generally, the processing that takes a while here will be:

    1. using POI to generate the XLS

    2. streaming this to the browser (if it's large)

    Let us know if you think you've found something that suggests this could be done any faster.



      #3
      Isomorphic,

      We use exportData() currently. Is there a way I can identify what is taking the time: database time or XLS creation time? The only way I can think of is intercepting in the DMI.

      In our earlier J2EE project, we had connection pooling where we could specify the JDBC fetch size for a query. We just wanted to know whether any of those settings can be applied if we use the SmartClient connection pool or the Oracle connection pool.

      Thanks.



        #4
        Use standard techniques to assess performance: look at the millisecond gaps between logs for coarse-grained information, use a Java profiler for finer-grained information.

        We really don't think the JDBC fetch size is a useful thing to tune, but you can see if your JDBC driver itself accepts such a parameter.



          #5
          I have noticed that a query which returns 50,000 rows takes about 25 minutes to export to Excel under normal conditions. However, if I use the setEndRow(50000) method of DSRequest, it takes less than a minute to export.

          So is the setEndRow() method internally setting the JDBC fetch size for the fetch operation, causing it to return faster?



            #6
            We'll see if we can reproduce that effect, but the problem might be that you have a poor setting for the default fetch size. This is what JDBC's ResultSet.setFetchSize() configures on the fly, but you can also set a system-wide default. Discussion here:

            http://www.coderanch.com/t/302254/JDBC/java/Significance-ResultSet-setFetchSize
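One way to experiment with the fetch size independently of server.properties is to set it per statement through plain JDBC. Below is a minimal sketch; the helper and method names are illustrative only, not SmartClient API, and setFetchSize() is merely a hint that the driver may honor or ignore:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FetchSizeExample {
    // Applies a row-prefetch hint to a statement before running a large
    // export query. The driver (here, Oracle's) decides how many rows to
    // actually pull per round trip.
    public static void applyFetchSize(Statement stmt, int rows) throws SQLException {
        stmt.setFetchSize(rows);
    }

    // Sketch of usage against a live connection (requires a real database):
    public static void exportRows(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            applyFetchSize(stmt, 1000); // fewer round trips for large result sets
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    // stream each row into the export writer here
                }
            }
        }
    }
}
```

With a 50,000-row result and the Oracle default prefetch of 10 rows, that is 5,000 round trips versus 50 at a fetch size of 1000, which is the kind of difference being discussed in this thread.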



              #7
              I am trying to set the default fetch size in the server.properties file, but it doesn't seem to work.
              Is there any way to check whether a property I set in server.properties is actually being applied? The logs don't tell me much.

              Here is a part of my server.properties
              Code:
              sql.Oracle.driver: oracle.jdbc.pool.OracleDataSource
              
              sql.Oracle.pool.maxActive:10
              sql.Oracle.pool.maxIdle:5
              sql.Oracle.pool.maxWait:5000
              sql.Oracle.pool.minEvictableIdleTimeMillis:5000
              sql.Oracle.pool.numTestsPerEvictionRun: 5
              sql.Oracle.pool.timeBetweenEvictionRunsMillis: 10000
              sql.Oracle.pool.defaultRowPrefetch: 1000
              Is this the right way to set the fetch size? Or is there some other way I can do it?



                #8
                For some reason you're trying to set it on the pool. sql.Oracle.driver.defaultRowPrefetch would be how to affect the driver.
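                To illustrate the suggestion above, the fragment might look like this, assuming the sql.Oracle.driver.* prefix is passed through to Oracle's defaultRowPrefetch connection property (worth verifying against your driver's documentation):

                Code:
                sql.Oracle.driver: oracle.jdbc.pool.OracleDataSource
                sql.Oracle.driver.defaultRowPrefetch: 1000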



                  #9
                  I have tried sql.Oracle.driver.defaultRowPrefetch. It doesn't seem to work.
                  I have also tried another driver (sql.Oracle.driver: oracle.jdbc.driver.OracleDriver) and setting the row prefetch size on that. That doesn't work either.
                  It is really important that we can set this fetch size, since our users usually export large numbers of records, and waiting about 20 minutes for the export is not an option. Isomorphic, please help!



                    #10
                    When we try this with a simple, single-table fetch of just over 50,000 rows, the export takes less than 30 seconds regardless of the setting of "endRow". Can you share some details of the query that you suggest is 50x slower on the same data volumes? Is it extremely complex? Does it involve a large number of joins?



                      #11
                      Isomorphic,

                      The following is one of the sample queries we have in our app.

                      Code:
                      SELECT MB.ORIGIN ORIG,
                          A1.CITY_NAME ORIGCITY,
                          A1.COUNTRY_NAME ORIGCOUNTRY,
                          A1.WORLD_AREA_NAME ORIGAREA,
                          MB.DESTINATION DEST,
                          A2.CITY_NAME DESTCITY,
                          A2.COUNTRY_NAME DESTCOUNTRY,
                          A2.WORLD_AREA_NAME DESTAREA ,
                          MB.DOMAIRL MKTAIRLINE,
                          CD.CARRIER_NAME MARKETINGAIRLINENAME ,
                          ROUND(SUM( SUM(MB.UPAX)) OVER (PARTITION BY MB.ORIGIN, A1.CITY_NAME, A1.COUNTRY_NAME, A1.WORLD_AREA_NAME , MB.DESTINATION, A2.CITY_NAME, A2.COUNTRY_NAME, A2.WORLD_AREA_NAME , MB.DOMAIRL, CD.CARRIER_NAME )*100/SUM(SUM(MB.UPAX)) OVER ( PARTITION BY MB.ORIGIN, A1.CITY_NAME, A1.COUNTRY_NAME, A1.WORLD_AREA_NAME , MB.DESTINATION, A2.CITY_NAME, A2.COUNTRY_NAME, A2.WORLD_AREA_NAME ), 2) AIRLINESHARE ,
                          SUM(MB.UPAX) BOOKINGS,
                          ROUND(SUM(REVENUE)/SUM(UPAX),2) FARE,
                          ROUND(SUM(REVENUE),0) REVENUE
                        FROM FCT_MIDT_OD_SUMMARY_MTHY MB ,
                          AIRPORT_DIM A1,
                          AIRPORT_DIM A2 ,
                          M_DIM_MONTH MTH ,
                          CARRIER_DIM CD
                        WHERE MB.ORIGIN    = A1.AIRPORT_IATA_CODE
                        AND MB.DESTINATION = A2.AIRPORT_IATA_CODE
                        AND MB.DOMAIRL     = CD.CARRIER_CODE
                        AND MB.MONTH_ID    = MTH.MO_ID
                        AND MTH.MO_ID BETWEEN '2430' AND '2441'
                        AND (((A1.WORLD_AREA_CODE = '002'
                        AND A1.WORLD_AREA_CODE   IS NOT NULL))
                        AND '1'                   ='1'
                        AND '1'                   ='1'
                        AND '1'                   ='1'
                        AND '1'                   ='1'
                        AND '1'                   ='1'
                        AND '1'                   ='1'
                        AND '1'                   ='1'
                        AND '1'                   ='1'
                        AND '1'                   ='1'
                        AND '1'                   ='1')
                        GROUP BY MB.ORIGIN,
                          A1.CITY_NAME,
                          A1.COUNTRY_NAME,
                          A1.WORLD_AREA_NAME,
                          MB.DESTINATION,
                          A2.CITY_NAME,
                          A2.COUNTRY_NAME,
                          A2.WORLD_AREA_NAME ,
                          MB.DOMAIRL,
                          CD.CARRIER_NAME
                      The table FCT_MIDT_OD_SUMMARY_MTHY here has millions of rows; the remaining joined tables have between 3,000 and 12,000 rows.
                      We are not very concerned about the complexity of the query at this point; we are trying to understand the difference in behavior across some sample use cases we tried.

                      Some statistics on the data the query would pull:

                      complete data output size of the above query: 50000 - 60000 rows
                      ListGrid page size we use: 1000
                      Time taken to populate the listgrid with 1000 rows: <2 minutes
                      Time taken when trying to export the complete 50k rows by calling exportData() on the listGrid: >20 minutes

                      Our observations so far:

                      1) Setting a higher dataPageSize on the ListGrid populates the ListGrid faster than the export of the same data takes. We set the ListGrid dataPageSize property to 50,000 and the grid was populated in less than 4 minutes, which indicates that dataPageSize affects the time taken to fetch data. Exporting that same data, however, takes more than 20 minutes, which makes us believe that dataPageSize on the ListGrid is in fact playing a role in fetch time but is not applied to the export fetch.

                      2) For the export itself, when we set endRow to a high number (50,000 in this case) on the dsRequest object, the export is pretty fast, i.e. 3-4 minutes, but it limits the result to exactly those 50,000 rows, which is understood. When we don't set endRow on the dsRequest, the export takes more than 20 minutes, even though there are only about 1,000 more rows than in the previous case. This clearly indicates endRow plays a role in the time taken to fetch data.

                      From the above we suspect the JDBC fetch size to be the culprit: it appears to be raised by dataPageSize in the ListGrid case and by endRow in the export case, and raising it improves query time significantly. Hence we are trying to find a way to set the JDBC fetch size directly.

                      Can you explain these behaviors and let us know how we can resolve this export performance issue? Also, correct us if any of our understanding is wrong.

                      Thanks.



                        #12
                        As we mentioned, we see no performance effect from setting endRow, so it may be that setting endRow is actually triggering some performance difference in your business logic, not anything in SmartGWT or Oracle.

                        However, since setting endRow seems to have an effect for you, try setting it to a number much larger than the number of exported rows. Our SQLDataSource will handle this without a problem. If the weird performance benefit still occurs in this case, you might just set endRow to a very high number - larger than any export that the user could reasonably complete without causing the server to run out of memory, for example.
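                        One way to apply this suggestion server-side is to cap endRow in a DMI before the fetch runs. The sketch below keeps the cap logic as plain Java so it stands alone; in a real DMI you would call dsRequest.setEndRow(ExportEndRowCap.cappedEndRow(...)) before executing the operation. The cap value and the helper name are assumptions for illustration, not SmartClient API or a recommended limit:

```java
public class ExportEndRowCap {
    // A generous ceiling, per the suggestion above: larger than any export a
    // user could reasonably request, but bounded so a single request cannot
    // demand an unbounded result set. The value is an assumption.
    static final long MAX_EXPORT_ROWS = 1_000_000L;

    // Returns the endRow to use: the requested value if present and sane,
    // otherwise the ceiling.
    public static long cappedEndRow(Long requestedEndRow) {
        if (requestedEndRow == null || requestedEndRow > MAX_EXPORT_ROWS) {
            return MAX_EXPORT_ROWS;
        }
        return requestedEndRow;
    }
}
```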



                          #13
                          Just a further note that we tried simulating a complex join between several million+ row tables. Still no difference with setEndRow vs not.

                          About your overall approach - you may have noted that POI does not support streaming export (at least until the next version), so while we enable streaming export with CSV, we cannot currently do so with POI. So, separately from how well the export performs, you should consider that your server is likely to run out of memory if 2-3 users simultaneously request an export.

                          Solutions to this would include deferring exports so they run on a Quartz timer and email the user when complete, or implementing a streaming export against the next version of POI. Both could be done as sponsorships.
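                          The streaming CSV path mentioned above avoids the memory problem because each row is written to the client as it is produced, instead of the whole workbook being built in memory as with pre-streaming POI. A minimal illustration of the idea (simplified quoting; not the SmartClient implementation):

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

public class CsvStreamer {
    // Streams rows to the response writer one at a time, so memory use stays
    // flat regardless of how many rows are exported.
    public static void writeCsv(Writer out, List<String[]> rows) throws IOException {
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) out.write(',');
                out.write('"');
                out.write(row[i].replace("\"", "\"\"")); // escape embedded quotes
                out.write('"');
            }
            out.write('\n');
            out.flush(); // push each row to the client immediately
        }
    }
}
```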



                            #14
                            Isomorphic,

                            Are you going to enable streaming export with CSV? Any timeline on when it would be out?

                            Thanks.



                              #15
                              As indicated earlier in this thread, this is already a feature.
