
    Optimizing server side database access code

    We use SQLDataSource for almost all of our data access. I'm doing some benchmark testing of data-intensive operations and would like some advice on how to optimize our code. As a test, I've created a simple table with an auto-generated primary key and six other columns, along with an index over two of the other columns.

    To test the performance of inserts and updates, I've created a simple DMI class that does 500 inserts into the table, then fetches each record using the indexed fields and updates it using the primary key.

    Can you please take a look at the code and let me know if there is a more efficient way to do this?

    The table and index definitions.
    Code:
    CREATE TABLE BENCHMARK ( 
    ID INTEGER GENERATED ALWAYS AS IDENTITY, 
    FIELD1 CHAR(10) CCSID 937 NOT NULL DEFAULT '' , 
    FIELD2 CHAR(10) CCSID 937 NOT NULL DEFAULT '' , 
    FIELD3 CHAR(10) CCSID 937 NOT NULL DEFAULT '' , 
    FIELD4 CHAR(10) CCSID 937 NOT NULL DEFAULT '' , 
    FIELD5 CHAR(10) CCSID 937 NOT NULL DEFAULT '' , 
    FIELD6 CHAR(10) CCSID 937 NOT NULL DEFAULT '' , 
    PRIMARY KEY( ID ) )
    
    CREATE UNIQUE INDEX BENCHMARKA 
    ON BENCHMARK ( FIELD1 ASC , FIELD2 ASC ) ;
    The ds.xml
    Code:
    <DataSource ID="benchmark" dataFormat="iscServer"
    	serverType="sql" dbName="as400" sparseUpdates="true" tableName="benchmark"
    	serverConstructor="com.islandpacific.gui.server.customDataSource.BenchmarkDS">
    	<!-- Datasource used for benchmark testing. -->
    	<fields>
    		<field name="id" title="id" type="sequence" primaryKey="true"/>
    		<field name="field1" title="field1" type="text"/> 
    		<field name="field2" title="field2" type="text"/>
    		<field name="field3" title="field3" type="text"/>
    		<field name="field4" title="field4" type="text"/>
    		<field name="field5" title="field5" type="text"/>
    		<field name="field6" title="field6" type="text"/> 
    	</fields>
    	<operationBindings>
    		<operationBinding operationType="remove" allowMultiUpdate="true"/>
    	</operationBindings>	
    </DataSource>
    The DMI class. You'll notice it also deletes the records created by previous tests first. I'm not too concerned about that since it doesn't reflect a real-life scenario.
    Code:
    package com.islandpacific.gui.server.customDataSource;
    
    import java.util.HashMap;
    import java.util.Map;
    
    import com.isomorphic.datasource.DSRequest;
    import com.isomorphic.datasource.DSResponse;
    
    public class BenchmarkDS extends IpDataSource {
    
    	@Override
    	public DSResponse executeCustom(DSRequest req) throws Exception {
    		
    		int howMany = 500;
    		DSResponse response = new DSResponse();
    		String responseString = "";
    		
    		long time = System.currentTimeMillis();
    		DSRequest removeReq = newRequestInContext(req, "benchmark", OP_REMOVE);
    		removeReq.setCriteria("field6", "value_6");
    		DSResponse removeResp = removeReq.execute();
    		time = System.currentTimeMillis() - time;
    		responseString += "<br/>Total time to delete "+removeResp.getAffectedRows()+" records was " + time + " milliseconds";
    		
    		time = System.currentTimeMillis();
    		for (int i=0; i<howMany; i++) {
    			DSRequest addReq = newRequestInContext(req, "benchmark", OP_ADD);
    			HashMap addRec = new HashMap();
    			addRec.put("field1", "value_"+i);
    			addRec.put("field2", "value_"+i);
    			addRec.put("field3", "value_"+i);
    			addRec.put("field4", "value_"+i);
    			addRec.put("field5", "value_"+i);
    			addRec.put("field6", "value_6");
    			addReq.forceInvalidateCache(true);
    			addReq.setValues(addRec);
    			addReq.execute();
    		}
    		time = System.currentTimeMillis() - time;
    		responseString += "<br/>Total time to insert "+howMany+" records was " + time + " milliseconds";
    		
    		time = System.currentTimeMillis();
    		for (int i=0; i<howMany; i++) {
    			DSRequest fetchReq = newRequestInContext(req, "benchmark", OP_FETCH);
    			HashMap criteria = new HashMap();
    			criteria.put("field1", "value_"+i);
    			criteria.put("field2", "value_"+i);
    			fetchReq.setCriteria(criteria);
    			DSResponse fetchResp = fetchReq.execute();
    			Map rec = fetchResp.getDataMap();
    			rec.put("field3", "update_"+i);
    			rec.put("field4", "update_"+i);
    			rec.put("field5", "update_"+i);
    			DSRequest updateReq = newRequestInContext(req, "benchmark", OP_UPDATE);
    			updateReq.setCriteria("id", rec.get("id"));
    			updateReq.setValues(rec);
    			updateReq.forceInvalidateCache(true);
    			updateReq.execute();
    		}
    		time = System.currentTimeMillis() - time;
    		responseString += "<br/>Total time to fetch/update "+howMany+" records was " + time + " milliseconds";
    		
    		response.setData(responseString);
    		return response;
    	}
    
    }
    The newRequestInContext method. We use this method whenever we create a server side DSRequest so it has access to the same context as the main request that came from the client.
    Code:
    	protected static DSRequest newRequestInContext(DSRequest req, String dataSource, String operationType) {
    		DSRequest theRequest;
    		theRequest = new DSRequest(dataSource, operationType);
    		theRequest.context = req.context;
    		theRequest.setRPCManager(req.getRPCManager());
    		return theRequest;
    	}

    #2
    Here is a comparison using standard JDBC versus the Iso framework.

    Using standard JDBC with prepared statements:

    Total time to delete 500 records was 78 milliseconds
    Total time to insert 500 records was 562 milliseconds
    Total time to fetch/update 500 records was 2282 milliseconds

    Using Iso framework classes:

    Total time to delete 500 records was 78 milliseconds
    Total time to insert 500 records was 7435 milliseconds
    Total time to fetch/update 500 records was 14125 milliseconds
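
    For reference, the JDBC path is a plain prepared-statement loop along these lines. This is a sketch rather than the exact test code (the connection setup is assumed, and the delete step is omitted); the table and column names come from the DDL above.
    Code:
    // Sketch of the JDBC comparison: reuses PreparedStatements for the insert
    // and fetch/update loops. Assumes an open java.sql.Connection to the same
    // database as the BENCHMARK table defined above.
    void runJdbcBenchmark(java.sql.Connection conn, int howMany) throws java.sql.SQLException {
    	try (java.sql.PreparedStatement insert = conn.prepareStatement(
    			"INSERT INTO BENCHMARK (FIELD1, FIELD2, FIELD3, FIELD4, FIELD5, FIELD6) VALUES (?,?,?,?,?,?)")) {
    		for (int i = 0; i < howMany; i++) {
    			for (int col = 1; col <= 5; col++) insert.setString(col, "value_" + i);
    			insert.setString(6, "value_6");
    			insert.executeUpdate();
    		}
    	}
    	try (java.sql.PreparedStatement fetch = conn.prepareStatement(
    			"SELECT ID FROM BENCHMARK WHERE FIELD1 = ? AND FIELD2 = ?");
    		 java.sql.PreparedStatement update = conn.prepareStatement(
    			"UPDATE BENCHMARK SET FIELD3 = ?, FIELD4 = ?, FIELD5 = ? WHERE ID = ?")) {
    		for (int i = 0; i < howMany; i++) {
    			fetch.setString(1, "value_" + i);
    			fetch.setString(2, "value_" + i);
    			try (java.sql.ResultSet rs = fetch.executeQuery()) {
    				if (rs.next()) {
    					update.setString(1, "update_" + i);
    					update.setString(2, "update_" + i);
    					update.setString(3, "update_" + i);
    					update.setInt(4, rs.getInt("ID"));
    					update.executeUpdate();
    				}
    			}
    		}
    	}
    }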



      #3
      I found one thing we had overlooked: datasources.pool.enabled: true in server.properties. That cut the time roughly in half.

      Total time to delete 500 records was 47 milliseconds
      Total time to insert 500 records was 3172 milliseconds
      Total time to fetch/update 500 records was 7594 milliseconds
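
      For reference, here is the setting as it appears in server.properties:
      Code:
      # pooling setting mentioned above
      datasources.pool.enabled: true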

      Anything else we should consider?

      I should have mentioned that we're using the dynamic DataSource generator feature, but we are using DataSource.fromXML() and caching the XML.
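
      Roughly, the caching looks like this. It's a sketch rather than our exact code: DataSource.fromXML() is the framework call named above, while dsXmlCache and buildDsXml() are just illustrative names for our own code.
      Code:
      	// Sketch: cache the generated ds.xml text so it is only built once per
      	// DataSource name. buildDsXml() stands in for our dynamic DS generator.
      	private static final java.util.Map<String, String> dsXmlCache =
      			new java.util.concurrent.ConcurrentHashMap<String, String>();
      
      	protected DataSource getDynamicDataSource(String dsName) throws Exception {
      		String xml = dsXmlCache.get(dsName);
      		if (xml == null) {
      			xml = buildDsXml(dsName); // our own generator producing the ds.xml text
      			dsXmlCache.put(dsName, xml);
      		}
      		return DataSource.fromXML(xml);
      	}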



        #4
        We are also:

        1. Doing this in the context of a transaction
        2. Re-selecting the inserted data for every insert in order to deliver the record-as-saved to the client
        3. Not re-using a PreparedStatement since every insert has potentially unrelated sets of fields
        4. Adding paging logic around fetches (e.g. a count query first)

        Note we don't really optimize for the case of a large batch of inserts, because UIs don't usually do this.



          #5
          I understand the differences and appreciate the benefits the framework brings, but unfortunately our app deals with apparel buying, and many of the use cases consist of entering information into matrix grids by color/size and then saving that to a normalized database structure where each color/size is a separate record.

          I just want to make sure we aren't overlooking something that could boost performance for these use cases.

          Regarding your point #2, in this case we don't need the record-as-saved returned to the client. Am I correct in understanding that DSRequest.forceInvalidateCache(true) disables the extra fetch? Is there a reason that API is not documented?

          Any other performance tips for optimizing I/O-intensive operations while still getting the benefits of the framework would be greatly appreciated.



            #6
            There's nothing wrong with multiple inserts, but are you saying 500 inserts plus two complete table fetches (there's no criteria and no startRow/endRow on your fetch dsRequests) is a real and common operation in your app? Because if it's not, benchmarking your actual use case might show there's nothing worth optimizing.

            Another difference: if your JDBC code isn't iterating over the complete results of the fetch, it's not equivalent.

            You can avoid the re-fetch after an insert by setting invalidateCache on the operationBinding. forceInvalidateCache() has a different purpose; ignore it.
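
            For example, against the ds.xml above (a sketch; check the OperationBinding documentation for the exact attribute semantics):
            Code:
            <!-- sketch: declare the add operation with invalidateCache so the framework
                 skips the record-as-saved re-fetch; verify against the OperationBinding docs -->
            <operationBinding operationType="add" invalidateCache="true"/>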



              #7
              There is criteria on the fetch, using the two fields that have a unique index. It results in one record returned, which is why I'm not iterating over the results.
              Code:
              			DSRequest fetchReq = newRequestInContext(req, "benchmark", OP_FETCH);
              			HashMap criteria = new HashMap();
              			criteria.put("field1", "value_"+i);
              			criteria.put("field2", "value_"+i);
              			fetchReq.setCriteria(criteria);
              			DSResponse fetchResp = fetchReq.execute();
              We're experiencing performance problems in production deployments, which is why we're looking for ways to optimize.

              Is there a way to set invalidateCache on the request itself? We've been using DSRequest.forceInvalidateCache(true) for a while now and it does appear to skip the cache-sync fetch. I don't want to find all instances of it and refactor to use a separate operationBinding if I can avoid it.



                #8
                Sorry to press, but have the performance problems been definitively traced to this series of operations? It's very easy, when trying to optimize, to bark up the wrong tree. So if this 500-insert use case isn't a direct reflection of something that's been isolated as the main problem in production, we would suggest going back to analyzing what's happening in production before doing a deep dive here. It's very common to see a team spend weeks optimizing the wrong thing and gain less than 1% from it.

                One other possible red herring: if you've got JIT enabled, run the test ~50 times and take the last results; otherwise, you may principally be measuring the JVM's time to JIT our code.
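
                In other words, something along these lines wrapped around the existing test, where runBenchmark() is just a placeholder for the timing code in your DMI:
                Code:
                // Sketch: repeat the whole benchmark and only report the final run, so JIT
                // compilation time is excluded. runBenchmark() is a placeholder for the
                // timing code in the DMI above.
                DSResponse lastResponse = null;
                for (int run = 0; run < 50; run++) {
                	lastResponse = runBenchmark(req);
                }
                return lastResponse;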

                You can also turn up the log level to see where time is being spent (timestamp deltas between statements). Note this will of course temporarily make things slower.

                That all said, if you really have 500 identical inserts and all the dynamic checking we do doesn't apply, there's nothing wrong with dropping to JDBC just for that. In this sample, we show how to do this and participate in an automatic transaction.

                Finally, on forceInvalidateCache(): generally, don't rely on undocumented APIs. In this case, if you want an API to suppress the cache sync operation, we could add one, but forceInvalidateCache() means more than that, and that's why it wasn't documented.



                  #9
                  Do you know of a good log analyzer that could help in calculating timestamp deltas between statements? I've looked at a few and none seem to have that feature, although it seems like an obvious need.



                    #10
                    Can't really recommend one with that specific feature, no. We have only looked at log analyzers from the perspective of tracking errors. You might also consider a profiler, though.

