    FileAssembler mangles characters

    I'm running the FileAssembler on an Ubuntu machine (previously I was running it on a Windows machine), and I've noticed that it mangles characters like à, ù, etc.
    The command file --mime reports the original .js files as UTF-8, while the assembled file is reported as us-ascii.
    Is it possible to tell the assembler to produce a UTF-8 file?

    #2
    If you're running this as a command-line process (no servlet engine), set the JVM default charset to UTF-8.
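A minimal Java sketch of why the default charset matters: when text is encoded with a charset that cannot represent à or ù (here US-ASCII), Java substitutes '?' for each such character, whereas UTF-8 keeps them as two-byte sequences. The class name `DefaultCharsetDemo` is illustrative only; it does not reproduce the FileAssembler's internals.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetDemo {

    // Encodes text with the given charset; characters the charset cannot
    // represent are replaced by its replacement byte ('?' for US-ASCII).
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // U+00E0 (a grave) and U+00F9 (u grave), written as escapes so the
        // source file itself stays pure ASCII.
        String text = "\u00e0\u00f9";
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("US-ASCII bytes  = " + Arrays.toString(encode(text, StandardCharsets.US_ASCII)));
        System.out.println("UTF-8 bytes     = " + Arrays.toString(encode(text, StandardCharsets.UTF_8)));
    }
}
```

The US-ASCII line shows `[63, 63]` (two '?' bytes), while the UTF-8 line shows `[-61, -96, -61, -71]`, i.e. 0xC3 0xA0 0xC3 0xB9 printed as signed Java bytes.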



      #3
      The command-line process is run by Jenkins, a webapp running under Tomcat 7 (JDK 7), which is started with the -Dfile.encoding=UTF-8 option.



        #4
        I've also added the -Dfile.encoding=UTF-8 option to the java command that launches the FileAssembler.
        The Ubuntu locale is also set to it_IT.UTF-8.
        The assembled file is still us-ascii.



          #5
          Same problem if I run the command directly in a shell:
          Code:
          java -Duser.country.format=IT -Duser.language.format=it -Duser.country=IT -Duser.language=it -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -cp /root/.jenkins/workspace/Jtk_dev/juveModule/web/WEB-INF/lib/*:/root/.jenkins/workspace/Jtk_dev/iscModule/web/WEB-INF/lib/*:/usr/local/tomcat/apache-tomcat-7.0.40/lib/* com.isomorphic.assembly.FileAssembler --webRoot /root/.jenkins/workspace/Jtk_dev/iscModule/web --config /root/.jenkins/workspace/Jtk_dev/juveModule/conf/assembly.juveModule.xml --config /root/.jenkins/workspace/Jtk_dev/Jtk/conf/assembly.xml --outputDir /root/.jenkins/workspace/Jtk_dev/Jtk/web
          It logs:

          Code:
          Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
          === SmartClient Init: log4j.isc.config.xml not found in CLASSPATH. Assuming log4j configuration for com.isomorphic is located in some other log4j configuration file.
          Assembling /init.js
          Assembling /roleAdmin.js
          Assembling /everyRole.js
          Note the JAVA_TOOL_OPTIONS line; even so, the files are us-ascii:

          Code:
          root@srvubuntu:~/.jenkins/workspace/Jtk_dev/iscModule/web/WEB-INF/lib# file --mime /root/.jenkins/workspace/Jtk_dev/Jtk/web/everyRole.js
          /root/.jenkins/workspace/Jtk_dev/Jtk/web/everyRole.js: text/x-c++; charset=us-ascii
          I don't know exactly what is wrong with my setup, since I've set UTF-8 everywhere (even sun.jnu.encoding=UTF-8), but it would help if I could pass an encoding option to the FileAssembler...
          Last edited by claudiobosticco; 18 Dec 2015, 02:26.
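The idea behind an explicit encoding option can be sketched in a few lines of Java: read each source file and write the combined output with an explicit UTF-8 charset, so the JVM default is never consulted. `Utf8Concat` and `concatUtf8` are hypothetical names; this is not the FileAssembler's API, and it ignores the assembly XML configuration entirely.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Utf8Concat {

    // Hypothetical helper, NOT part of the SmartClient FileAssembler:
    // concatenates the inputs into output, decoding and encoding as UTF-8
    // explicitly so the JVM default charset is never consulted.
    static void concatUtf8(List<Path> inputs, Path output) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Path in : inputs) {
            // Decode each source file as UTF-8 (malformed bytes become U+FFFD).
            sb.append(new String(Files.readAllBytes(in), StandardCharsets.UTF_8));
            // Separate files with a newline so statements don't run together.
            sb.append('\n');
        }
        // Encode the combined text as UTF-8 and write it out.
        Files.write(output, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java Utf8Concat <output> <input>...");
            System.exit(1);
        }
        List<Path> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(Paths.get(args[i]));
        }
        concatUtf8(inputs, Paths.get(args[0]));
    }
}
```

Note that `new String(bytes, UTF_8)` silently replaces malformed input with U+FFFD; a `CharsetDecoder` configured with `CodingErrorAction.REPORT` would make such input fail instead.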



            #6
            Also, these are my locale settings on Ubuntu:

            Code:
            root@srvubuntu:~# locale
            LANG=it_IT.UTF-8
            LANGUAGE=it_IT.UTF-8
            LC_CTYPE="it_IT.UTF-8"
            LC_NUMERIC="it_IT.UTF-8"
            LC_TIME="it_IT.UTF-8"
            LC_COLLATE="it_IT.UTF-8"
            LC_MONETARY="it_IT.UTF-8"
            LC_MESSAGES="it_IT.UTF-8"
            LC_PAPER="it_IT.UTF-8"
            LC_NAME="it_IT.UTF-8"
            LC_ADDRESS="it_IT.UTF-8"
            LC_TELEPHONE="it_IT.UTF-8"
            LC_MEASUREMENT="it_IT.UTF-8"
            LC_IDENTIFICATION="it_IT.UTF-8"
            LC_ALL=it_IT.UTF-8



              #7
              And this Java program:

              Code:
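              package com.juve.jtk;
              
              import java.io.ByteArrayOutputStream;
              import java.io.OutputStreamWriter;
              import java.nio.charset.Charset;
              
              public class CharSetTest {
              
                  public static void main(String[] args) {
                      // JVM system properties that control the default encoding
                      System.out.println("file.encoding=" + System.getProperty("file.encoding"));
                      System.out.println("sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding"));
                      // Charset used when no charset is passed explicitly
                      System.out.println("Default Charset=" + Charset.defaultCharset());
                      System.out.println("Default Charset in Use=" + getDefaultCharSet());
                  }
              
                  // Returns the encoding a writer picks up when none is specified
                  private static String getDefaultCharSet() {
                      OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
                      return writer.getEncoding();
                  }
              }
              It prints: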

              Code:
              root@srvubuntu:/usr/local/tomcat/apache-tomcat-7.0.40/webapps/Jtk##0042/WEB-INF/classes# java -cp . com.juve.jtk.CharSetTest
              Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
              file.encoding=UTF-8
              sun.jnu.encoding=UTF-8
              Default Charset=UTF-8
              Default Charset in Use=UTF8
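In the output above, `Default Charset` prints `UTF-8` while `Default Charset in Use` prints `UTF8`. Both refer to the same charset: `OutputStreamWriter.getEncoding()` returns a charset's historical name when it has one, whereas `Charset.name()` returns the canonical name. A small sketch that maps one to the other (the class `CharsetNames` is illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public class CharsetNames {

    // Maps any registered name or alias (e.g. "UTF8") to the canonical name.
    static String canonical(String name) {
        return Charset.forName(name).name();
    }

    public static void main(String[] args) throws IOException {
        // Like CharSetTest, create a writer without an explicit charset.
        try (OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream())) {
            // getEncoding() may return a historical name such as "UTF8".
            String historical = writer.getEncoding();
            System.out.println(historical + " -> " + canonical(historical));
        }
    }
}
```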



                #8
                We are not reproducing this. There is no hardcoding of ASCII in the SmartClient/SmartGWT toolchain; we use whatever the JVM default is. Can you take a closer look at your source files? Are they in fact being saved with UTF-8 encoding? What do you see if you run, e.g.:

                Code:
                file /roleAdmin.js
                Note that the 'file' command uses heuristics to make educated guesses about the file type and encoding. Even if you saved a file with UTF-8 encoding, if the file itself contains no characters outside the ASCII character range, then 'file' will report the file encoding as 'ASCII'.

                Also, which version of SmartClient or SmartGWT are you using?
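As noted above, `file` reports ASCII whenever a file has no bytes at or above 0x80, and such a file is also valid UTF-8. A minimal Java sketch that separates the two questions ("does the file contain any non-ASCII bytes?" and "do its bytes decode as strict UTF-8?") could be run on one of the source .js files and on the assembled output. `EncodingCheck` is an illustrative name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EncodingCheck {

    // True if every byte is in the 7-bit ASCII range (0x00-0x7F).
    static boolean isPureAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) { // Java bytes are signed: 0x80-0xFF are negative
                return false;
            }
        }
        return true;
    }

    // True if the bytes decode as UTF-8 without any malformed sequences.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EncodingCheck <file>");
            System.exit(1);
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("pure ASCII:  " + isPureAscii(data));
        System.out.println("valid UTF-8: " + isValidUtf8(data));
    }
}
```

If a source file reports `pure ASCII: false` and `valid UTF-8: true` while the assembled file reports `pure ASCII: true`, the non-ASCII characters did not survive assembly as raw bytes.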



                  #9
                  Hello, the problem is now fixed after adding:

                  Code:
                  export LANGUAGE=it_IT.UTF-8
                  export LANG=it_IT.UTF-8
                  export LC_ALL=it_IT.UTF-8
                  export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8"
                  to the .bashrc file and restarting.
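If exported variables seem to have no effect for a particular launcher, the environment seen by a process started by Jenkins or Tomcat may differ from that of an interactive shell. One way to check is a small diagnostic launched the same way as the FileAssembler. This is a sketch: `EnvCheck` is an illustrative name, and the variable list simply mirrors the exports above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class EnvCheck {

    // Variables from the .bashrc exports above that can affect encoding.
    static final List<String> VARS =
            Arrays.asList("LANG", "LANGUAGE", "LC_ALL", "JAVA_TOOL_OPTIONS");

    // Formats each variable as NAME=value, or NAME=<unset> if absent.
    static String describe(Map<String, String> env) {
        StringBuilder sb = new StringBuilder();
        for (String name : VARS) {
            String value = env.get(name);
            sb.append(name).append('=')
              .append(value == null ? "<unset>" : value)
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Environment actually inherited by this JVM process
        System.out.print(describe(System.getenv()));
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
    }
}
```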



                    #10
                    Glad to hear it - thanks for sharing the settings that worked for you. But looking back on your previous posts, it seems all of these should already have been in effect. Did you change anything else?



                      #11
                      Previously I had tried many things (rebooting each time).
                      I ran these commands:
                      sudo locale-gen "it_IT.UTF-8"
                      sudo dpkg-reconfigure locales
                      sudo update-locale LANG=it_IT.UTF-8 LC_MESSAGES=it_IT.UTF-8

                      And I've edited:
                      1. the /etc/default/locale file (edited with gedit):
                      Code:
                      LANG="it_IT.UTF-8"
                      LANGUAGE="it"
                      LC_NUMERIC="it_IT.UTF-8"
                      LC_TIME="it_IT.UTF-8"
                      LC_MONETARY="it_IT.UTF-8"
                      LC_PAPER="it_IT.UTF-8"
                      LC_IDENTIFICATION="it_IT.UTF-8"
                      LC_NAME="it_IT.UTF-8"
                      LC_ADDRESS="it_IT.UTF-8"
                      LC_TELEPHONE="it_IT.UTF-8"
                      LC_MEASUREMENT="it_IT.UTF-8"
                      LC_ALL="it_IT.UTF-8"
                      LC_MESSAGES="it_IT.UTF-8"
                      2. the /etc/environment file:
                      Code:
                      PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games"
                      LANGUAGE="it"
                      LANG="it_IT.UTF-8"
                      LC_NUMERIC="it_IT.UTF-8"
                      LC_TIME="it_IT.UTF-8"
                      LC_MONETARY="it_IT.UTF-8"
                      LC_PAPER="it_IT.UTF-8"
                      LC_IDENTIFICATION="it_IT.UTF-8"
                      LC_NAME="it_IT.UTF-8"
                      LC_ADDRESS="it_IT.UTF-8"
                      LC_TELEPHONE="it_IT.UTF-8"
                      LC_MEASUREMENT="it_IT.UTF-8"
                      LC_ALL="it_IT.UTF-8"
                      3. the ~/.pam_environment file, so that it contains:
                      Code:
                      LANGUAGE="it"
                      LANG="it_IT.UTF-8"
                      Note that I'm running an old Ubuntu release; I need to update it.

