Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
Clear All
new posts

    Errors may not be logged on file upload

    I have a DynamicForm with a FileItem field that users can use to upload files. It works great. In testing however, I've discovered that when I try to upload obscenely large files (which our users may try to do), it's possible to get a sort of "silent failure". Is there a way to catch and log this error?

    This happens if the JVM on the server throws an OutOfMemoryError "before" the "maxFileSize" attribute of the datasource is validated. What happens is that nothing is logged on the server, and the client never gets any kind of response. Instead, it appears that the stack trace for the error is actually placed in the HTTP response body itself; the HTTP response returns fairly quickly with a 200 code, but includes just the error text. I determined this using Fiddler, where I see the following response body:

    Code:
    isc.logWarn("java.lang.OutOfMemoryError: Java heap space\r\n\tat java.util.Arrays.copyOf(Arrays.java:2271)\r\n\tat java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)\r\n\tat java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)\r\n\tat java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:122)\r\n\tat java.io.FilterOutputStream.write(FilterOutputStream.java:77)\r\n\tat com.isomorphic.io.ByteCountingOutputStream.write(ByteCountingOutputStream.java:45)\r\n\tat java.io.FilterOutputStream.write(FilterOutputStream.java:125)\r\n\tat org.apache.commons.fileupload.util.Streams.copy(Streams.java:101)\r\n\tat org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)\r\n\tat org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)\r\n\tat org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:310)\r\n\tat com.isomorphic.servlet.ISCHttpServletRequest.parseRequest(ISCHttpServletRequest.java:247)\r\n\tat com.isomorphic.servlet.ISCHttpServletRequest.parseRequest(ISCHttpServletRequest.java:219)\r\n\tat com.isomorphic.servlet.ISCHttpServletRequest.getStringParams(ISCHttpServletRequest.java:119)\r\n\tat com.isomorphic.servlet.ISCHttpServletRequest.getParameter(ISCHttpServletRequest.java:300)\r\n\tat com.isomorphic.rpc.RPCManager.parseRequest(RPCManager.java:2107)\r\n\tat com.isomorphic.rpc.RPCManager.<init>(RPCManager.java:309)\r\n\tat com.isomorphic.rpc.RPCManager.<init>(RPCManager.java:289)\r\n\tat com.isomorphic.servlet.IDACall.processRequest(IDACall.java:132)\r\n\tat com.isomorphic.servlet.IDACall.doPost(IDACall.java:73)\r\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:754)\r\n\tat com.isomorphic.servlet.BaseServlet.service(BaseServlet.java:152)\r\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:847)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:329)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)\r\n\tat org.jboss.weld.servlet.ConversationPropagationFilter.doFilter(ConversationPropagationFilter.java:62)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)\r\n\tat org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)\r\n\tat org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)\r\n\tat org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:153)\r\n\tat org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)\r\n");
    There's nothing logged on the server, and no feedback to the client. It waits and eventually the user sees an "Operation timed out" popup. I happen to be using JBoss, and I'm not sure if this is just how JBoss is handling the uncaught exception or if it is being caught somewhere in the Isomorphic code. I suppose I could wrap a filter around the IDACall servlet and see if I can do something about it, but I'd rather that the IDACall servlet caught the error and returned it in a form that the client could understand, or at least log the error on the server side.

    Of course you can easily fix this by cranking up the maximum heap size appropriately, but I have concerns that once in production, this could become a config error that is hard to diagnose.

    Here's a relevant code snippet of the form:
    Code:
    final DynamicForm uploadForm = new DynamicForm();
    DataSource ds = DataSource.get("FileUpload");
    uploadForm.setDataSource(ds);
    
    HiddenItem keyItem = new HiddenItem("keyField");
    // Give any arbitrary key value, but it is required.
    keyItem.setValue("999");
    
    		
    final FileItem fileItem = new FileItem("file");
    // To start with, allow a single file.
    fileItem.setMultiple(false);
    		
       ...
    		
    ButtonItem saveItem = new ButtonItem("save", "Save");
    saveItem.addClickHandler(new com.smartgwt.client.widgets.form.fields.events.ClickHandler() {
    	@Override
    	public void onClick(
    				com.smartgwt.client.widgets.form.fields.events.ClickEvent event) {
    		uploadForm.saveData(new DSCallback() {
    
    			@Override
    			public void execute(DSResponse dsResponse, Object data,
    					DSRequest dsRequest) {
    				// I never get here on this type of error.  
    				refresh();
    			}					
    		});		
    	}
    });
    And here's the relevant ds.xml:
    Code:
    <DataSource ID="FileUpload"
    	serverConstructor="com.eterra.tdm.ui.wiring.CdiDataSourceDelegate">
    	<fields>
    		<field name="keyField" type="text" hidden="true" primaryKey="true" />
    		<field name="file" type="binary" title="File" />
    		<field name="path" type="text" title="Path" />
    	</fields>
    </DataSource>

    For version info, I'm using SmartGWT EE 4.1, version v9.1p_2014-03-25/Enterprise Deployment (from the developer console).

    #2
    How the error is reported is actually not important. The way you are testing, with presumably just you hitting the server, you happen to be getting an OutOfMemoryError for the request doing the upload. In general, with a multi-threaded server, the error could happen absolutely anywhere, and permanently wedge the server instance. So this is an error you need to prevent altogether, and nuances of reporting it aren't relevant.

    The most efficient handling is to prevent the file upload from even starting. Here is some sample code for doing this on modern browsers:

    http://stackoverflow.com/questions/3717793/javascript-file-upload-size-validation

    Note this is moderately advanced, and you will need to use some DOM selectors/traversal to get to the UploadItem since that element doesn't actually have a predictable ID when SmartGWT renders it. This is something that will eventually be built into SmartGWT, and you could sponsor it to get it done by a specific date if you need it.

    The second best, and still required approach, is to block the upload at some layer earlier than the "maxFileSize" DataSourceField attribute. The "maxFileSize" check is too late; we have to parse the request to look up your DataSource and discover this setting, and by that time, we're already out of memory.

    Depending on your deployment environment, you are probably able to set a maximum system-wide upload size either in JBoss or in webservers in front of JBoss (if you have those).

    Comment


      #3
      Good information, but

      Thanks for the information on being able to prevent the error, which is very helpful and of course we can take advantage of.

      And of course, we'd love to be able to prevent the error altogether; and I know, OOMs basically hose the JVM and you typically have to reboot it at that point. And yes, our operations engineers can change the heap size of the JVM to fix the underlying errors when they occur. The client side validation you suggested is also great! That would make the system more robust, and I'd love to see that in place.

      However, I have a problem with your assertion:
      -So this is an error you need to prevent altogether, and nuances of reporting it aren't relevant.

      This is kind of fantasy in production. It's not like a null pointer exception, where I can prove with source code analysis that it will never happen. This is a result of tuning several knobs that will be done by operations people, and even if it is well tuned, you can only assert that it will "probably" never happen given the traffic you expect.

      I guess I'm less worried about the fact that the system breaks, but more that it's pretty tough to figure out *why* it's broken. Because there were no logs, I was the second developer who looked at this test case to figure out what was going on. The first one stared at it for two days and gave up.

      I'm just asking for logging because I know that this kind of stuff drives our support people and our end users into a hateful rage.

      Sorry for the long rant, but please consider it.

      Comment


        #4
        To clarify: the nuances of how this error is reported *for an end user* aren't relevant, because you need to prevent it from happening at all, or the server is likely to die.

        Developers of course need to be able to figure out what happened. However, according to what we're seeing, this problem is already logged in the server-side log, just like any other exception, so it should have been really really easy to figure out. But we'll double-check this.

        Comment


          #5
          We've made a change where the code will now make an attempt to log this error to the server log as well.

          Note that for this particular kind of error, unique amongst all errors, even the attempt to log it may fail (if some memory needs to be allocated in the process!).

          Comment

          Working...
          X