
    Question regarding field length and string length / bytes

    Hi Isomorphic,

    I have a question regarding <field type="text" length="..." /> in .ds.xml using v11.1p_2018-11-01.
    Some introduction and observations follow; if you like, you can skip to the end, where the framework-related questions are.

    In Oracle (using version 11.2), the maximum length of a VARCHAR2 column is 4000, declared in either CHAR or BYTE semantics (which is a bit strange). For small fields, CHAR semantics works as expected, see here:
    Code:
    DROP TABLE smallsizetest;
    CREATE TABLE smallsizetest
    (
      cont VARCHAR2(30 CHAR),
      onebyte VARCHAR2(1 BYTE),
      onechar VARCHAR2(1 CHAR)
    );
    INSERT INTO smallsizetest (cont, onebyte, onechar) VALUES('normal',  null, 'a');
    INSERT INTO smallsizetest (cont, onebyte, onechar) VALUES('umlaut',  null, 'ä');
    INSERT INTO smallsizetest (cont, onebyte, onechar) VALUES('unicode', null, '😂'); --Face with tears of joy U+1F602
    INSERT INTO smallsizetest (cont, onebyte, onechar) VALUES('normal',  'a',  null);
    INSERT INTO smallsizetest (cont, onebyte, onechar) VALUES('umlaut',  'ä',  null); --Error
    INSERT INTO smallsizetest (cont, onebyte, onechar) VALUES('unicode', '😂', null); --Face with tears of joy U+1F602, Error
    COMMIT;
    SELECT * FROM smallsizetest;
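As a side note (my own sketch, not framework code): under AL32UTF8, which is Oracle's UTF-8, the per-character byte cost explains the small-field errors above. Java's UTF-8 encoder gives the same counts:

```java
import java.nio.charset.StandardCharsets;

// Sketch: UTF-8 byte cost per character, matching Oracle's AL32UTF8 storage.
public class ByteSemanticsDemo {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Bytes("a"));            // 1 byte: fits VARCHAR2(1 BYTE)
        System.out.println(utf8Bytes("ä"));            // 2 bytes: too large for VARCHAR2(1 BYTE)
        System.out.println(utf8Bytes("\uD83D\uDE02")); // 4 bytes: U+1F602, face with tears of joy
    }
}
```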
    For bigger fields near the 4000 limit, this behaves differently: the total limit is 4000 BYTE, see this test:
    Code:
    DROP TABLE sizetest;
    CREATE TABLE sizetest
    (
      cont VARCHAR2(30 CHAR),
      lenb INTEGER,
      len INTEGER,
      val VARCHAR2(4000 CHAR)
    );
    INSERT INTO sizetest(cont, lenb, len, val)
    WITH
    normal2000  AS (SELECT 'normal2000' AS cont, rpad('a', 2000, 'a') AS val FROM dual),
    normal4000  AS (SELECT 'normal4000' AS cont, rpad('a', 4000, 'a') AS val FROM dual),
    normal5000  AS (SELECT 'normal5000' AS cont, rpad('a', 5000, 'a') AS val FROM dual),
    umlaut2000  AS (SELECT 'umlaut2000' AS cont, rpad('ä', 2000, 'ä') AS val FROM dual),
    umlaut4000  AS (SELECT 'umlaut4000' AS cont, rpad('ä', 4000, 'ä') AS val FROM dual),
    umlaut5000  AS (SELECT 'umlaut5000' AS cont, rpad('ä', 5000, 'ä') AS val FROM dual),
    unicode2000  AS (SELECT 'unicode2000' AS cont, rpad('😂', 2000, '😂') AS val FROM dual),
    unicode4000  AS (SELECT 'unicode4000' AS cont, rpad('😂', 4000, '😂') AS val FROM dual),
    unicode5000  AS (SELECT 'unicode5000' AS cont, rpad('😂', 5000, '😂') AS val FROM dual),
    unioned AS (
    SELECT * FROM normal2000
    UNION ALL
    SELECT * FROM normal4000
    UNION ALL
    SELECT * FROM normal5000
    UNION ALL
    SELECT * FROM umlaut2000
    UNION ALL
    SELECT * FROM umlaut4000
    UNION ALL
    SELECT * FROM umlaut5000
    UNION ALL
    SELECT * FROM unicode2000
    UNION ALL
    SELECT * FROM unicode4000
    UNION ALL
    SELECT * FROM unicode5000
    )
    SELECT cont, lengthb(val) AS lenb, LENGTH(val) AS len, val FROM unioned;
    COMMIT;
    SELECT * FROM sizetest ORDER BY cont;
    Result:
    [Screenshot: Result.PNG]
    As you can see, the result is never longer than 4000 BYTE (and therefore sometimes less than the defined 4000 CHAR).
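The capped lengths can be reproduced outside the database; this sketch of mine (not framework code) computes the UTF-8 byte totals of the test payloads, assuming AL32UTF8 matches Java's UTF-8 (requires Java 11+ for String.repeat):

```java
import java.nio.charset.StandardCharsets;

// Sketch: total UTF-8 byte sizes of the sizetest payloads.
public class FourThousandByteCap {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Bytes("a".repeat(4000)));            // 4000 bytes: all 4000 chars fit
        System.out.println(utf8Bytes("ä".repeat(2000)));            // 4000 bytes: only 2000 chars fit
        System.out.println(utf8Bytes("\uD83D\uDE02".repeat(1000))); // 4000 bytes: only 1000 emojis fit
    }
}
```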

    Additional information, if needed:
    Code:
    SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET';
    Result: AL32UTF8

    Framework related part:
    Now (thinking only about 2-byte umlauts) I set my database column to 2000 CHAR and my .ds.xml field length to 2000 as well, so that it can hold 2000 umlauts.
    This is working as expected.

    Now a customer is complaining that 2000 is not big enough, so I considered allowing 4000 CHAR, but I suspected this might cause problems with your server code if I send 4000 characters that are in reality more than 4000 bytes.
    This is due to these docs, which say "Maximum number of characters allowed" at the very beginning.

    I have now tested this behavior for the first time. It does seem that you also somehow track bytes rather than characters, but I'm not sure what exactly the framework is doing.
    I have a field with type="text" length="2000" in the .ds.xml and a TextAreaItem.

    This is the maximum I can insert and save without the validator complaining (already on the client, Chromium 69.0.3497.81 / Win 10):
    Code:
    2000x "ä" --Umlaut, OK
    2000x "a" --Normal "a", OK
    1000x "😂" --Face with tears of joy, U+1F602. Why only 1000? 1000x "😂" plus 1x "a" is already too much. Is this a bug?
    Can you explain how your implicit length validator works - client side and server side?
    There seems to be some mix-up between character length and byte length, but it does not make sense to me.

    My goal is to set maximum field lengths in the DB (without using CLOB) and in SmartGWT such that saving can never cause a database exception.

    Best regards
    Blama
    Last edited by Blama; 21 Nov 2018, 08:52.

    #2
    We check JavaScript String.length on the client and Java String.length() on the server.
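    This is consistent with the "why only 1000?" observation above: both JavaScript String.length and Java String.length() count UTF-16 code units, and U+1F602 lies outside the Basic Multilingual Plane, so it occupies a surrogate pair (two code units). A minimal sketch:

```java
// Sketch: String.length counts UTF-16 code units, not code points.
public class CodeUnitDemo {
    public static void main(String[] args) {
        String joy = "\uD83D\uDE02"; // U+1F602, outside the Basic Multilingual Plane
        System.out.println(joy.length());                        // 2 code units (a surrogate pair)
        System.out.println(joy.codePointCount(0, joy.length())); // 1 code point
        // So a length=2000 validator admits at most 1000 such emojis.
    }
}
```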

    Your observations about obscure unicode smileys and byte length are interesting, and may indicate obscure browser bugs or internal hacks in unicode implementations.

    It's obviously not a good idea for us to scan strings on both client and server to work around issues with high-Unicode smiley faces. It's also extremely unlikely that losing a single character of maximum length per emoji would ever affect end users, so we'd suggest not spending further time on this.



      #3
      Hi Isomorphic,

      Agreed - my problem is not losing some characters at the end; my problem is failed INSERTs/UPDATEs and DB error messages when the new string is too big.
      I'll set the size to 4000 CHAR in my local DB and .ds.xml and see if I can cause problems that way.
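      One option I'm considering (a hypothetical helper of my own, not a SmartGWT API) is a server-side check of the UTF-8 byte length before the insert, since Oracle enforces the 4000-byte cap regardless of CHAR semantics:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical guard (not a SmartGWT API): rejects values that exceed
// Oracle 11.2's hard 4000-byte VARCHAR2 cap under AL32UTF8.
public class OracleVarchar2Guard {
    static final int MAX_BYTES = 4000;

    static boolean fitsVarchar2(String value) {
        return value == null
            || value.getBytes(StandardCharsets.UTF_8).length <= MAX_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(fitsVarchar2("a".repeat(4000)));            // true:  4000 bytes
        System.out.println(fitsVarchar2("ä".repeat(2001)));            // false: 4002 bytes
        System.out.println(fitsVarchar2("\uD83D\uDE02".repeat(1000))); // true:  4000 bytes
    }
}
```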

      Best regards
      Blama
