
    Fetching N rows at a time with com.isomorphic.datasource.DataSource?

    Hi,

    We have been using the server-side DataSource (i.e. com.isomorphic.datasource.DataSource) for quite a while and it works well for us.

    However, one of the operations that we use is DataSource.fetch(null), which fetches all rows in the DataSource at once. The returned object is a java.util.List that holds all the rows in the result set. When the result set is extremely large, this object consumes a great deal of memory on the server.

    We would like to instead fetch N rows at a time, but it is not obvious to us how to do that. Can anyone shed some light on this?

    One solution we have been considering is to artificially add one more column, rowNum, holding an integer value starting from 1 and counting up, and then to add a "criteria" on rowNum: the first fetch from 1 to N, the second from N+1 to 2N, and so on. Would this work? Is there a more straightforward method? (We have not used "criteria" before, so we would like to hear from you before we try it; a rough sketch of what we mean is below.)
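    Roughly what we have in mind (untested; the map-based criterion format is just our guess from the client-side AdvancedCriteria docs, and rowNum would be a column we add to the table ourselves):

    Code:
    // Hypothetical sketch of the rowNum idea -- we have NOT tried this.
    // "betweenInclusive" and the start/end properties follow the client-side
    // AdvancedCriteria format; whether fetch() accepts a criterion in this
    // map form is our guess.
    Map<String, Object> criterion = new HashMap<String, Object>();
    criterion.put("fieldName", "rowNum");
    criterion.put("operator", "betweenInclusive");
    criterion.put("start", i * N + 1);   // first pass: 1..N, second: N+1..2N, ...
    criterion.put("end", (i + 1) * N);
    List<HashMap<String, Object>> rows = ds.fetch(criterion);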

    Thanks!


    Our environment is listed below:

    1. SmartClient Version: v8.3p_2013-01-15/PowerEdition Deployment (built 2013-01-15)

    2. Browser: (not applicable)

    3. Server log: (not applicable)

    #2
    You can set startRow and endRow to fetch ranges of rows at a time. You can see in the generated SQL that these values are passed through to the database as part of the SQL (the exact SQL used to do this differs by database).

    This approach means the query is repeated for each range, of course, as opposed to streaming results from the server, which is what we do for very large CSV exports. If you're interested in having a supported & documented API for streaming in the same way we do CSV exports, consider Feature Sponsorship.



      #3
      Hi,

      I am performing a fetch() on a server-side DataSource (com.isomorphic.datasource.DataSource), not on the client.

      That is, I was following the API on this page

      http://www.smartclient.com/smartgwtee/server/javadoc/?com/isomorphic/datasource/DataSource.html

      I could not find any mention of startRow and endRow there. Can you give a few lines of example code showing how to pass the two values to fetch()?



        #4
        See the server-side JavaDoc for DSRequest: there you will see setStartRow() and setEndRow().
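        For example (fragment only -- see the JavaDoc for constructing and executing the request):

        Code:
        // Request the first 500 rows; startRow is inclusive, endRow exclusive.
        dsRequest.setStartRow(0);
        dsRequest.setEndRow(500);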



          #5
          Hi,

          Thanks for pointing me in the right direction.

          I have not used server-side DSRequest before. I wrote the following code but got a NullPointerException. Did I forget to set some of the fields of the object?

          Code:
          dsRequest.setStartRow( i*N );
          dsRequest.setEndRow( (i+1)*N - 1 );
          dsRequest.setCriteria(null);
          dsResponse = ds.executeFetch(dsRequest); // this line got NullPointerException
          nRowsOnly = dsResponse.getDataList();
          I was trying to emulate the original code below (which worked perfectly well), with the added control of startRow and endRow:

          Code:
          List<HashMap<String, Object>> rows = ds.fetch(null);
          printStackTrace() showed the following info:

          Code:
          java.lang.NullPointerException
          	at com.isomorphic.sql.SQLDataSource.processRequest(SQLDataSource.java:286)
          	at com.isomorphic.sql.SQLDataSource.executeFetch(SQLDataSource.java:258)
          ...



            #6
            It's invalid to call executeFetch() directly. Call dsRequest.execute(), or DataSource.execute(dsRequest). The former is usually simpler.
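            That is (fragment, using the variables from your earlier snippet):

            Code:
            // Either form is valid; executeFetch() is not part of the public API:
            dsResponse = dsRequest.execute();
            // or equivalently:
            dsResponse = ds.execute(dsRequest);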



              #7
              Hi,

              I rewrote the code as follows, but I still got a NullPointerException:

              Code:
              dsRequest.setStartRow( i*N );
              dsRequest.setEndRow( (i+1)*N - 1 );
              dsRequest.setCriteria(null);
              dsRequest.setOperationType("fetch");
              dsResponse = ds.execute(dsRequest); // This line got a NullPointerException
              nRowsOnly = dsResponse.getDataList();
              The error log was:

              Code:
              === 2013-09-04 12:00:34,386 [l0-9] INFO  SQLDataSource - Performing fetch operation with
              	values: 0 valueSets
              java.lang.NullPointerException
              	at com.isomorphic.sql.SQLDataSource.SQLExecute(SQLDataSource.java:1253)
              	at com.isomorphic.sql.SQLDataSource.processRequest(SQLDataSource.java:314)
              	at com.isomorphic.sql.SQLDataSource.executeFetch(SQLDataSource.java:258)
              	at com.isomorphic.datasource.DataSource.execute(DataSource.java:1367)
              ...
              Any further suggestions would be deeply appreciated.



                #8
                Perhaps too obvious to mention, but did you provide a valid DataSource ID in the constructor of this dsRequest?

                Note that setCriteria(null) is also invalid, and does nothing.
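                For example (where "myDataSource" is a placeholder for your actual DataSource ID):

                Code:
                // The DataSource ID and operation type go in the constructor:
                DSRequest dsRequest = new DSRequest("myDataSource", "fetch");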



                  #9
                  Thanks a lot. That did the trick!

                  However, I have another question now. Below is my code, which tries to read 500 rows at a time, for 10 iterations.

                  Code:
                  List<HashMap<String, Object>> nRowsOnly;
                  DSRequest dsRequest = new DSRequest(ds.getName(), "fetch");
                  DSResponse dsResponse;
                  int N = 500;
                  for (int i=0; i < 10; i++) {
                      dsRequest.setStartRow( i*N   );   // startRow is inclusive
                      dsRequest.setEndRow  ( (i+1)*N ); // endRow is exclusive, so each request asks for N rows
                      dsResponse = ds.execute(dsRequest);
                      nRowsOnly = dsResponse.getDataList();
                      long actualNumRows = nRowsOnly.size();
                      logger.error("Another " + String.valueOf(actualNumRows) + " has been read");
                      if (actualNumRows==0) {
                          break; // no more rows to fetch
                      }
                  }
                  When I looked at the 10 queries that were generated, I noticed these strange MySQL LIMIT clauses (in each, the first number is the offset and the second is the row count):

                  Code:
                  WHERE ('1'='1') LIMIT 0, 500
                  WHERE ('1'='1') LIMIT 500, 500
                  WHERE ('1'='1') LIMIT 1000, 500
                  WHERE ('1'='1') LIMIT 1500, 500
                  WHERE ('1'='1') LIMIT 2000, 500
                  WHERE ('1'='1') LIMIT 2500, 500
                  WHERE ('1'='1') LIMIT 2842, 658
                  WHERE ('1'='1') LIMIT 2684, 1316
                  WHERE ('1'='1') LIMIT 2026, 2474
                  WHERE ('1'='1') LIMIT 868, 4132
                  The actual resultSet has 3,342 rows.

                  I can understand the LIMIT clause for the first six iterations. But why, in the 7th iteration, did the LIMIT clause become "LIMIT 2842, 658" rather than simply "LIMIT 3000, 500"? In both cases the sum of the offset and the row count would be 3,500. But with the clause generated by SmartGWT, the 2,842nd through 3,000th rows are read twice -- once in the 6th iteration and once in the 7th.

                  I also expected the 8th iteration, had its LIMIT clause been "LIMIT 3500, 500", to return zero rows. If that were the case, I could use the check on actualNumRows to break the for-loop without having to know, ahead of the query, how many rows there were. But the 8th iteration actually got "LIMIT 2684, 1316" and read in 1,316 rows. That breaks my plan. In other words, I will have to do a "SELECT count(*)" first to learn the exact number of rows, so that I can put an exact loop count in the for-loop.



                    #10
                    Some more observations. The test in the last post (#9) was done with progressiveLoading set to false (the default). If I set progressiveLoading to true, thereby disabling the extra "select count(*)" that SmartGWT issues automatically, I actually got 13 queries issued.

                    Code:
                    WHERE ('1'='1') LIMIT 0, 501
                    WHERE ('1'='1') LIMIT 500, 502
                    WHERE ('1'='1') LIMIT 1000, 503
                    WHERE ('1'='1') LIMIT 1500, 504
                    WHERE ('1'='1') LIMIT 2000, 505
                    WHERE ('1'='1') LIMIT 2500, 506
                    WHERE ('1'='1') LIMIT 3000, 507
                    WHERE ('1'='1') LIMIT 3500, 508
                    WHERE ('1'='1') LIMIT 0, 509
                    WHERE ('1'='1') LIMIT 4000, 510
                    WHERE ('1'='1') LIMIT 0, 511
                    WHERE ('1'='1') LIMIT 4500, 512
                    WHERE ('1'='1') LIMIT 0, 513
                    Ten of them were from my for-loop, but three more -- those at the end with offset 0 -- were "refetch" attempts automatically issued by SmartGWT.

                    I saw lines like this for them in the log:

                    Code:
                    === 2013-09-04 18:24:49,951 [l0-2] DEBUG SQLDataSource - Fetched no records, attempting refetch from 0 to 507
                    I really want to be able to suppress the "select count(*)", because my data sets are very large and I hope not to have to make an extra pass just to count the rows.

                    Thanks!



                      #11
                      Hi,

                      Even if I am willing, for now, to do a "select count(*)", to manage carefully the number of rows read, and to be careful that endRow never exceeds the true number of rows in the data set, I still could not achieve the desired result.

                      Here is my code:

                      Code:
                      List<HashMap<String, Object>> rows = new ArrayList<HashMap<String, Object>>(); // holds all rows read so far
                      List<HashMap<String, Object>> nRowsOnly;
                      DSRequest dsRequest = new DSRequest(ds.getName(), "fetch");
                      DSResponse dsResponse;
                      int N = 500;
                      long totalRows = 0;
                      long actualTotalNumRows = 0;
                      long startRow    = 0;
                      long endRow      = N;
                      for (int i=0; i < 10 /* forever until break */; i++) {
                          dsRequest.setStartRow( startRow );
                          dsRequest.setEndRow  ( endRow   );
                          dsResponse = ds.execute(dsRequest);
                          nRowsOnly = dsResponse.getDataList();
                          rows.addAll(nRowsOnly); // emulate old behavior
                      
                          long actualNumRows = nRowsOnly.size();
                          logger.debug("Another " + String.valueOf(actualNumRows) + " has been read");
                          actualTotalNumRows += actualNumRows;
                      
                          // Determine if we need to issue another query
                          if (i==0) { // calculate this only once
                              totalRows = dsResponse.getTotalRows();
                              logger.debug("totalRows = " + String.valueOf(totalRows));
                          }
                          if (totalRows==0) {
                              break; // no data
                          } else if (actualTotalNumRows==totalRows) {
                              logger.debug("All data read.  Break ");
                              break; // all data read
                          } else {
                              startRow += N;
                              endRow   += N;
                          endRow   = (endRow > totalRows) ? totalRows : endRow; // never read past totalRows
                              logger.debug("startRow is " + String.valueOf(startRow)
                                      + "; endRow is " + String.valueOf(endRow));
                          }
                      }
                      logger.debug("Finished fetching: totally " + String.valueOf(rows.size()) + " are in the List rows");
                      The following log shows the startRow and endRow issued in each iteration.

                      Code:
                      2013-09-04 19:13:29,678 DEBUG DynDsExporterFactory,btpool0-1:98 - Another 500 has been read
                      2013-09-04 19:13:29,678 DEBUG DynDsExporterFactory,btpool0-1:104 - totalRows = 3342
                      2013-09-04 19:13:29,678 DEBUG DynDsExporterFactory,btpool0-1:115 - startRow is 500; endRow is 1000
                      2013-09-04 19:13:29,817 DEBUG DynDsExporterFactory,btpool0-1:98 - Another 500 has been read
                      2013-09-04 19:13:29,817 DEBUG DynDsExporterFactory,btpool0-1:115 - startRow is 1000; endRow is 1500
                      2013-09-04 19:13:29,992 DEBUG DynDsExporterFactory,btpool0-1:98 - Another 500 has been read
                      2013-09-04 19:13:29,993 DEBUG DynDsExporterFactory,btpool0-1:115 - startRow is 1500; endRow is 2000
                      2013-09-04 19:13:30,190 DEBUG DynDsExporterFactory,btpool0-1:98 - Another 500 has been read
                      2013-09-04 19:13:30,190 DEBUG DynDsExporterFactory,btpool0-1:115 - startRow is 2000; endRow is 2500
                      2013-09-04 19:13:30,330 DEBUG DynDsExporterFactory,btpool0-1:98 - Another 500 has been read
                      2013-09-04 19:13:30,331 DEBUG DynDsExporterFactory,btpool0-1:115 - startRow is 2500; endRow is 3000
                      2013-09-04 19:13:30,467 DEBUG DynDsExporterFactory,btpool0-1:98 - Another 500 has been read
                      2013-09-04 19:13:30,467 DEBUG DynDsExporterFactory,btpool0-1:115 - startRow is 3000; endRow is 3342
                      2013-09-04 19:13:30,786 DEBUG DynDsExporterFactory,btpool0-1:98 - Another 500 has been read
                      2013-09-04 19:13:30,786 DEBUG DynDsExporterFactory,btpool0-1:115 - startRow is 3500; endRow is 3342
                      2013-09-04 19:13:30,989 DEBUG DynDsExporterFactory,btpool0-1:98 - Another 500 has been read
                      2013-09-04 19:13:30,989 DEBUG DynDsExporterFactory,btpool0-1:115 - startRow is 4000; endRow is 3342
                      2013-09-04 19:13:31,117 DEBUG DynDsExporterFactory,btpool0-1:98 - Another 500 has been read
                      2013-09-04 19:13:31,118 DEBUG DynDsExporterFactory,btpool0-1:115 - startRow is 4500; endRow is 3342
                      2013-09-04 19:13:31,195 DEBUG DynDsExporterFactory,btpool0-1:98 - Another 500 has been read
                      2013-09-04 19:13:31,195 DEBUG DynDsExporterFactory,btpool0-1:115 - startRow is 5000; endRow is 3342
                      2013-09-04 19:13:31,195 DEBUG DynDsExporterFactory,btpool0-1:121 - Finished fetching: totally 5000 are in the List rows
                      The critical problem is in the 7th iteration, where I have already explicitly set startRow to 3000 and endRow to 3342 (trying not to read past the actual number of rows in the data set, which is 3,342). Still, SmartGWT used a row count of 500 (where did this come from?) and lowered the offset from 3000 to 2842.

                      Code:
                      WHERE ('1'='1') LIMIT 0, 500
                      WHERE ('1'='1') LIMIT 500, 500
                      WHERE ('1'='1') LIMIT 1000, 500
                      WHERE ('1'='1') LIMIT 1500, 500
                      WHERE ('1'='1') LIMIT 2000, 500
                      WHERE ('1'='1') LIMIT 2500, 500
                      WHERE ('1'='1') LIMIT 2842, 500 <--- 7th iteration
                      WHERE ('1'='1') LIMIT 2842, 500
                      WHERE ('1'='1') LIMIT 2842, 500
                      WHERE ('1'='1') LIMIT 2842, 500
                      Now I feel there is a bug in SmartGWT. Please help!



                        #12
                        Generally speaking, what you're seeing is heuristics kicking in. They try to do the best thing in the scenario where a client-side component such as a ListGrid is querying the server: the component does not have definitive knowledge of the number of rows in the table, and if it has made a request for rows that don't exist, it is going to need some reasonable set of rows returned in order for the UI to function without having to turn around and issue another request immediately.

                        That's why you see a fetch from 3000 to 3500, when there are really only 3342 rows, converted to "LIMIT 2842, 500": the client clearly wants about 500 rows near the end of the table but didn't know there were only 3342 rows, so we fetch the last 500 (offset 2842 = 3342 - 500) and return those.

                        Similarly, when the requested range is entirely beyond the available rows, the range is converted to go from 0 up to the number of rows requested.

                        The best approach for what you want to do (aside from sponsoring a true streaming API, as we previously mentioned) is to have progressiveLoading off for the first request (so you get an accurate totalRows), then on for subsequent requests, and to avoid ever setting startRow and endRow values that would fetch beyond the end of the range.
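                        In outline (an untested sketch; it assumes an operationBinding with operationId "progressiveFetch" that enables progressiveLoading, selected via setOperationId() -- both are assumptions; the operationBinding side is covered in the following posts):

                        Code:
                        // Untested sketch of the recommended pattern. "progressiveFetch" is a
                        // placeholder operationId for a binding with progressiveLoading enabled;
                        // selecting it via setOperationId() is an assumption -- check the
                        // server-side DSRequest JavaDoc.
                        int N = 500;

                        // First request: progressiveLoading off (the default), so totalRows is accurate.
                        DSRequest first = new DSRequest(ds.getName(), "fetch");
                        first.setStartRow(0);
                        first.setEndRow(N);
                        DSResponse firstResponse = ds.execute(first);
                        long totalRows = firstResponse.getTotalRows();

                        // Subsequent requests: progressive loading on, never reading past totalRows.
                        for (long start = N; start < totalRows; start += N) {
                            long end = Math.min(start + N, totalRows); // clamp to the real row count
                            DSRequest next = new DSRequest(ds.getName(), "fetch");
                            next.setOperationId("progressiveFetch");   // assumption, as noted above
                            next.setStartRow(start);
                            next.setEndRow(end);
                            DSResponse response = ds.execute(next);
                            List rows = response.getDataList();
                            // ... process this page of rows ...
                        }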



                          #13
                          Okay that makes sense.

                          How do I turn on progressiveLoading after a DataSource has already been instantiated? Or must I instantiate a new one?

                          If you can show some example code that would be great. Thanks again for your help!



                            #14
                            progressiveLoading can be set on a per-operationBinding basis.
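                            For example, in the DataSource's .ds.xml (a sketch; "progressiveFetch" is a placeholder operationId, which a server-side DSRequest would then select, e.g. via setOperationId("progressiveFetch")):

                            Code:
                            <!-- Sketch: progressiveLoading declared on one fetch binding only;
                                 "progressiveFetch" is a placeholder operationId. -->
                            <operationBindings>
                                <operationBinding operationType="fetch" operationId="progressiveFetch"
                                                  progressiveLoading="true"/>
                            </operationBindings>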



                              #15
                              Hi,

                              Thanks for pointing us in the right direction.

                              We can now correctly fetch the rows without any rows being repeated. This is done by using a new DSRequest for the last partial page -- we leverage the fact that the heuristic does NOT kick in for the first fetch, so if we fetch the last page with a fresh DSRequest, we can control the exact offset and number of rows with startRow and endRow. A condensed sketch follows.
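                              In condensed form (untested; totalRows comes from the first, non-progressive response, and N is our page size of 500):

                              Code:
                              // Untested condensation of the approach: fetch the last partial page
                              // with a brand-new DSRequest so the refetch heuristic cannot kick in.
                              long lastPageStart = (totalRows / N) * N;   // e.g. 3000 when totalRows = 3342
                              if (lastPageStart < totalRows) {
                                  DSRequest lastPage = new DSRequest(ds.getName(), "fetch");
                                  lastPage.setStartRow(lastPageStart);
                                  lastPage.setEndRow(totalRows);          // exact range, e.g. 3000..3342
                                  DSResponse lastResponse = ds.execute(lastPage);
                                  rows.addAll(lastResponse.getDataList());
                              }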

                              So far, so good.

                              However, we found that with progressiveLoading turned off, SmartGWT always issues a "select count(*)" for each fetch() operation -- even when the DataSource is the same one. It seems to me the first "select count(*)" is justified (to count the number of rows in the resultSet), but the subsequent ones are not.

                              This is causing us performance problems because the "select count(*)" operations are very expensive.

                              Is there a way for us to turn off the subsequent "select count(*)"?

                              We also tried turning on progressiveLoading (assuming that somehow we already know the total number of rows in the resultSet), but with this method we got even more interference from the heuristic: our settings of startRow and endRow are often not honored -- SmartGWT always seems to add 1 or 20 rows to the number of rows we instruct it to fetch.

                              Thanks!

