    character encoding in DataSourceLoader servlet

    Hi all,

    I edit all my files (both source and DS XML) as UTF-8.
    My HTML files declare content charset as UTF-8.
    I load my DataSources using the DataSourceLoader servlet.

    I see the following error: all non-ASCII characters in text fields specified in the DS XML (title, pluralTitle, etc.) are garbled when I read them out on the client side.

    I narrowed the problem down to the DataSourceLoader servlet:
    - The DS XML files it uses for input are in UTF-8
    - If I call the servlet directly from a browser and check the output,
    I can see that it is not UTF-8.

    Is there a way to configure this servlet to return UTF-8 data?
    Or is this a bug?

    Thank you for your help!

    #2
    You need to set the character encoding at the JVM level (it's a startup option "file.encoding").
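
For reference, this option is passed on the java command line as -Dfile.encoding=UTF-8. The following is a minimal sketch for checking what a running JVM actually reports. The class name DefaultCharsetCheck is hypothetical and not part of SmartGWT.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {

    /** Returns a one-line report of the charset-related settings of this JVM. */
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The values depend on the platform and on the JVM startup options.
        System.out.println(report());
    }
}
```

Running it once with and once without the startup option shows whether the option reaches the JVM that hosts the servlet.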


      #3
      If possible, I'd like to avoid depending on external configuration, so I have created a subclass of the DataSourceLoader servlet that sets the encoding of the response before calling the real doGet() and doPost() methods.

      Now it does what I need.


        #4
        Originally posted by csillag
        If possible, I'd like to avoid depending on external configuration, so I have created a subclass of the DataSourceLoader servlet that sets the encoding of the response before calling the real doGet() and doPost() methods.

        Now it does what I need.

        Does it work with the export to XML or CSV? Would you mind sharing your code?


          #5
          Unless there are very unusual circumstances (like lacking control of the JVM), you want to set file.encoding as we originally suggested. Otherwise, all of your code related to network or file IO will potentially have to explicitly set a charset.


            #6
            Originally posted by Isomorphic
            Unless there are very unusual circumstances (like lacking control of the JVM), you want to set file.encoding as we originally suggested. Otherwise, all of your code related to network or file IO will potentially have to explicitly set a charset.
            I would like to follow your suggestion, but according to System.getProperties(), file.encoding _is_ set to UTF-8 by default, and the DataSourceLoader servlet still returns ISO-8859-1 data. (I am running the default built-in Jetty server in GWT Development Mode.)

            What could be wrong?


              #7
              How are you detecting the charset of the returned content? If you're seeing an HTTP header with charset specified other than UTF-8, that's not SmartGWT (with the default installation procedure, this does not happen).


                #8
                Originally posted by Isomorphic
                How are you detecting the charset of the returned content?
                When I point my browser to the servlet, several ways give consistent results:
                - Firefox's "Page Info" dialog
                - Manually changing the character encoding in the View menu until
                the chars look right

                If you're seeing an HTTP header with charset specified other than UTF-8,
                According to Firebug, the response headers are these:

                Transfer-Encoding chunked
                Server Jetty(6.1.x)

                ------------------------------------------

                Among the request headers, I have found this:

                Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7

                This might influence the servlet to return ISO-8859-1.
                (Although it should not.)

                that's not SmartGWT (with the default installation procedure, this does not happen).
                I have this servlet mapped in web.xml.
                com.isomorphic.base.Init is also declared properly,
                with load-on-startup, and it was initialized
                when Jetty (in GWT Development Mode) started.

                I don't think I have done anything out of the ordinary while installing SmartGWT.

                Do you have any idea what could be wrong?
                (I have the workaround in place, so I am only worried about other,
                not yet surfaced, consequences of the root problem behind this.)
                Last edited by csillag; 7 Jan 2010, 06:41.


                  #9
                  Ah OK. The response is being returned in UTF-8 but the server is not advertising that this is the case and the client, for whatever reason, is guessing incorrectly (based on your locale settings, presumably) and mangling the file.

                  We'll fix this by adding an explicit header "Content-Type: text/javascript;charset=utf-8" to the response.

                  You can fix this in the immediate term by adding a servlet filter that adds this header.
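
The mismatch described in this reply (UTF-8 bytes decoded by a client that guesses a different charset) can be reproduced with the JDK alone. This is only an illustrative sketch; the class and method names are hypothetical, and the text is written with Unicode escapes so the example does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    /** Encodes text as UTF-8, then decodes the bytes as ISO-8859-1, as a wrongly guessing client would. */
    static String utf8ReadAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // "e" with acute accent, one character
        String garbled = utf8ReadAsLatin1(original);
        // The single character becomes two: U+00C3 followed by U+00A9.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

If the client displays two characters where one accented character was expected, the bytes were most likely UTF-8 and the declared or guessed charset was not.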


                    #10
                    Originally posted by Isomorphic
                    Ah OK. The response is being returned in UTF-8 but the server is not advertising that this is the case and the client, for whatever reason, is guessing incorrectly (based on your locale settings, presumably) and mangling the file.
                    I might be wrong here, but this conclusion seems wrong to me.

                    If I explicitly tell my browser to interpret the data as UTF-8, it looks wrong. (The accented characters are all wrong.)
                    It only looks right if I tell the browser to interpret it as ISO-8859-1.

                    If I save the result and analyze it with any tool, it seems to be ISO-8859-1.

                    Therefore, I think it _is_ ISO-8859-1.
                    Last edited by csillag; 7 Jan 2010, 09:15.
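
One way to check a saved response, as described above, is to test whether its bytes form valid UTF-8. The following is a minimal sketch using the JDK's strict decoder; the class name Utf8Check and the sample word are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "\u00e1rv\u00edz"; // sample word with two accented letters
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.UTF_8)));      // true
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```

The check only works in one direction: accented text saved as ISO-8859-1 is almost never valid UTF-8, but every byte sequence is valid ISO-8859-1, and pure ASCII is identical in both encodings.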


                      #11
                      This would also result from the .ds.xml file being saved in ISO-8859-1 encoding.

                      When we read the file we are not specifying an encoding (therefore getting the JVM file.encoding which you say you've set to UTF-8).

                      When we write the file to the servlet output stream we are likewise not setting an encoding (so getting the platform encoding again).
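
The difference between relying on the platform encoding and naming a charset explicitly can be sketched with plain JDK streams. This is not the SmartGWT source, only an illustration of the general Java behavior; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetCopy {

    /** Decodes input with readAs and re-encodes it with writeAs, never consulting file.encoding. */
    static byte[] transcode(byte[] input, Charset readAs, Charset writeAs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), readAs);
             Writer writer = new OutputStreamWriter(out, writeAs)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer was closed (and flushed) when the try block ended.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8); // two bytes: C3 A9
        byte[] same = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        byte[] latin1 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(same.length);   // 2
        System.out.println(latin1.length); // 1
    }
}
```

A UTF-8 file read correctly but written through an ISO-8859-1 writer produces ISO-8859-1 output, so the read side and the write side each need to be checked separately.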


                        #12
                        The .ds.xml file is UTF-8. (This was one of the first things I checked.)

                        I have not yet set the file.encoding; I just read it (using System.getProperties() in one of my init servlets), and I see that it's UTF-8.

                        * * *

                        My guess would be that it is the Jetty engine that configures the ISO-8859-1 encoding on the servlet output stream (which is completely stupid).

                        Wherever it comes from, if I override it with response.setCharacterEncoding("UTF-8"), things work normally.


                          #13
                          Originally posted by Isomorphic
                          We'll fix this by adding an explicit header "Content-Type: text/javascript;charset=utf-8" to the response.
                          Thanks, good hint.

                          I still had the same issue even with the JVM property set, on Jetty in web and compiled mode and on WebSphere, in both Chrome and Firefox.
                          The characters looked good in the .ds.xml, and even in the HTTP response, but still did not show up correctly when displayed in a DataBoundComponent.

                          This fixes it for all those cases:
                          Code:
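                          import java.io.IOException;

                          import javax.servlet.ServletException;
                          import javax.servlet.http.HttpServletRequest;
                          import javax.servlet.http.HttpServletResponse;

                          import com.isomorphic.servlet.DataSourceLoader;

                          public class MyDataSourceLoader extends DataSourceLoader {

                              @Override
                              public void service(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException {
                                  // Declare UTF-8 before delegating; the charset must be set
                                  // before the response writer is obtained (international characters).
                                  res.setContentType("text/javascript;charset=utf-8");
                                  super.service(req, res);
                              }
                          }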
