Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
Clear All
new posts

  • mathias
    replied
    Hey thanks for looking into this more. If I can, will let you know.

    Leave a comment:


  • Isomorphic
    replied
    Just to follow up here - we’ve revisited the Reflection methods involved, and there is no plausible scenario that we can see where concurrent threads could cause an issue. Plus we have the previously mentioned concurrency test cases that exercise these APIs..

    We could be wrong of course, so if you can find a way that we can reproduce the problem, we can of course look again.

    Leave a comment:


  • Isomorphic
    replied
    It’s possible that would work. It’s also possible that relative placement of jars (eg system vs entire app server vs app-specific) could make a difference.

    You may also be able to “prime” the system by eg, asking JSTranslater to translate some dummy data at server startup.

    And of course you’ve already discovered the order-of-operations / concurrency nature of the problem, so that implies a number of app-level workarounds with reordering operations.

    Unfortunately, getting to the root cause is a daunting task of basically running the JVM under a debugger.

    Leave a comment:


  • mathias
    replied
    Thanks for the heads up. Guess i've never thought of it since i've never ran into it. Also, i don't really get it, the class has surely been loaded into the JVM long before this. Oh well.

    Haha yeah thanks, you just made my day a bit better for not having to deal with backwards compatibility stuff on the server. It's enough with all the shenanigans on IOS and Android...

    Leave a comment:


  • Isomorphic
    replied
    Try googling “spring classloader problem” to see several hundred problems - a tiny subset of the bizarre problems that can come from classloader issues, which are especially rife with Spring.

    We are forced to use Reflection because if you reference a class in your code in Java, the class must be present and loaded, even if never actually used at runtime. Therefore any class that might not be present has to be used via CRAPI (Java Core Reflection API - yea CRAPI is the real acronym).

    Without using reflection for LocalDate, we could not continue to support old Java where such classes aren’t present.

    We need to do the same thing for any optional libraries that we want to support: Hibernate.

    Every framework has to go through acrobatics like this due to Java language limitations - just be glad you as an app developer don’t have to deal with it!

    Leave a comment:


  • mathias
    replied
    OK. Sounds super alien to me, never heard of or seen before.

    If i may, can i ask why you do all this reflection stuff? Why not just cast it to a LocalDate and make the method calls directly?

    Leave a comment:


  • Isomorphic
    replied
    Yes, sorry, you’re correct, there’s the underlying exception in the log and there is nothing to improve in our logging (and we internally had realized the same thing, this morning).

    We’re going to take a quick look for possible concurrency issues in the Reflection class, but honestly, odds are bad that this is our issue. We have a lot of concurrency stress tests and they make a lot of calls into Reflection.

    What this looks like is that with your particular stack of Spring classloaders and whatever JVM you’re using, sometimes LocalDates do show up as not having methods they really do have. Possibly this is part of the JVMs attempt to somehow deal with the way that certain Java packages keep moving in and out of being official Java APIs, in different Java versions.

    Leave a comment:


  • mathias
    replied
    Isn't that in the stack trace above, too? :

    Code:
    Caused by: java.lang.NoSuchMethodException: Method getLong not found on class java.time.LocalDate
    at com.isomorphic.base.Reflection.findMethod(Reflection.java:452)
    at com.isomorphic.base.Reflection.findMethod(Reflection.java:401)
    at com.isomorphic.base.Reflection.invokeMethod(Reflection.java:545)
    at com.isomorphic.js.JSTranslater.convertSimple(JSTranslater.java:1208)

    Leave a comment:


  • Isomorphic
    replied
    Great, this is clearly a concurrency or order-of-operations issue.

    As far as the stack trace, you have the stack trace from the exception our framework throws. You want to see a stack trace for whatever exception was thrown that our framework then responded to by throwing the UnconvertableException. You can get the stack trace (and exception type) of the underlying Exception using your debugger.

    Leave a comment:


  • mathias
    replied
    Well to be honest, i would have thought the issue was somewhere in your framework. To me, the logs/stack trace and the exception being thrown is a very strong indicator that some strange dealings is going on. Please don't kill me...

    I'll try to answer the rest, but first some more debugging info:
    I tried strippinging down my entry point so that it either loads the schedule grid or the timeline but not both at the same time. I loaded each about 5-10 times, no exceptions thrown.

    As for your additional questions:

    3. Well, given that i tried the above thing and it could load that exact row without issue, i don't think it's anything wrong with the actual data. I also tried with my "pre-refactoring" web, and that also loads the rows in question without issue. I don't think it's a "special data" issue.

    4. I have the entire stack trace in my first post, is that not what you mean?

    5. AH! I already answered this above i see :) Again: Yeah if i turn off one or the other i haven't been able to provoke the error. Tried each about 5 times.

    Leave a comment:


  • Isomorphic
    replied
    Hi mathias, that's potentially useful, and may suggest a concurrency issue, although it's not clear in what system (your app code, Spring, our framework..).

    What about the other questions? These two in particular are extremely important:

    3. And then, what about the data? Can you return a subset of the data and the problem disappears? Because perhaps the problem is as simple as having certain records with date values that are special in some way: they are set to the min or max millis value, so they are untranslatable.

    4. Also, finally, you should be able to see the underlying exception and its stack trace in your debugger, so what is it?
    Just note that in the almost-quote above, these questions have been numbered for easy reference. To add to these, how about:

    5. what if you simply turn off the larger, near- concurrent grid fetch. Can you still reproduce the issue with the Timeline fetch? If so, then that suggests a concurrency issue. If not, likely no concurrency issue.

    Leave a comment:


  • mathias
    replied
    OK, first to respond to questions:
    1. Yes i believe it's the first time after startup that i return a LocalDate to you.
    2. No this is not new code. I am doing a refactoring effort, but we have the same stuff in production since a long time. Our Scheduling module has been around since at least 2016. The middleware hasn't been touched in over a year at least. Production is not running 13d however. We are using LocalDate in various places in production too.
    3. It's probably the first time that 2 requests come in for the Schedule Data like this though.

    As for debugging, i did some more digging.
    If you see attached screenshot you can see what happens when my (very much trimmed down) entry point loads.


    Row 1 is generic data
    Row 2 is schedule config
    Row 3 is loading users from the user ds
    Row 4 is loading schedule data for the timeline - fails
    Row 5 is loading schedule data for the listgrid

    Now, when i look into my own logging there's some interesting stuff. First of all one point to know - row 4 and 5 loads using the same service, but the timeline only has 3 rows to show, while the grid has 116. This means that the TimeLine request will basically always be finished first, which is probably why it always is the one that fails.

    This can be seen in my logs. A couple of observations here (printout below)

    1. As you can see, the 2 fetch requests comes in almost at the same time and in parallel. In general, this could cause init/sync issues, but i know nothing about your code ofc.
    2. The ordering of timeline and grid logs can be in any order, but timeline is always finished first since it has much less rows to process.


    Code:
    2021-10-27 13:41:12.350 TRACE c.k.n.service.ScheduleService - fetch;spaceId;6;caller;1;begin;null;end;null;locationId;null;offset;0;limit;250
    2021-10-27 13:41:12.352 TRACE c.k.n.service.ScheduleService - fetch;spaceId;6;caller;1;begin;Wed Oct 27 00:00:00 CEST 2021;end;Fri Nov 26 23:59:59 CET 2021;locationId;null;offset;0;limit;4000
    2021-10-27 13:41:12.352 DEBUG ScheduleRepo db - select scheduleit0... for ListGrid
    2021-10-27 13:41:12.352 DEBUG ScheduleRepo db - select scheduleit0... for TimeLine
    
    2021-10-27 13:41:12.370 TRACE com.n.api.ds.ScheduleDS -
    RETURNING;fetchScheduleItems;spaceId;6;caller;1;reporterId;null;locationId;null;begin;Wed Oct 27 00:00:00 CEST 2021;end;Fri Nov 26 23:59:59 CET 2021;limit=4000;startRow=0;endRow=4000;total=3;size=3;componentID;isc_SchedTimeline_0
    
    2021-10-27 13:41:12.396 TRACE com.n.api.ds.ScheduleDS -RETURNING;fetchScheduleItems;spaceId;6;caller;1;reporterId;null;locationId;null;begin;null;end;null;limit=250;startRow=0;endRow=250;total=116;size=116;componentID;isc_SchedGrid_0
    I hope it's not too hard to follow! Let me know if i can do anything more

    Click image for larger version  Name:	Screen Shot 2021-10-27 at 13.42.23.png Views:	0 Size:	274.6 KB ID:	266732
    Attached Files

    Leave a comment:


  • Isomorphic
    replied
    So what's different about this DS? Is it the first DS accessed that returns data that involves a LocalDate? If that's not it, how else is it special? We don't have your whole app's code, so we can't analyze this for you - only you can.

    Also, looks like this is a pretty recent problem (based on file creation dates in comments). So again, what's new here? Is the first time using this particular data type anywhere in your app? If not, what changed right before you noticed this problem?

    And then, what about the data? Can you return a subset of the data and the problem disappears? Because perhaps the problem is as simple as having certain records with date values that are special in some way: they are set to the min or max millis value, so they are untranslatable.

    Also, finally, you should be able to see the underlying exception and its stack trace in your debugger, so what is it?

    Leave a comment:


  • mathias
    replied
    Allright, some comments first:

    1. It is not the first request to the server, far from it. Other tabs have already been initialized, my "dashboard" page is already loaded etc. It is not even the first time the datasource is called. As you can see below, i have a "custom" method defined on the datasource, which is called before anything else (it's various information related to company settings for scheduling, what the user can do etc. This is why i'm not sure how it could be framework init.
    Also, All the Spring stuff happens when the webapp is started, through ContextLoaderListener, it's not related to any actual network call i'm pretty sure.

    2. I have some more info related to the timing and what is loaded. I have to explain a bit about my scheduling tab.
    -It has *two* sub-tabs. One which just is a grid displaying the rows in descending time order. Then another that displays the same data, but in a Timeline, which only shows scheduling events today and in the future. The Timeline is what causes this issue.

    --
    What i can see is the following request process:

    1. First, a bunch of 'setup' data for my app is loaded using other datasources without issue (config, users and locations, foreign key DS data etc.)
    2. Then the ListGrid scheduling data is loaded without any problem. (uses the same datasource, spring service and everything)
    3. Finally, the Timeline scheduling data is loaded. This is what causes the problem.

    Notable is that when the timeline data is loaded, it's using a slightly different form of parameters, so not the exact same rows are returned (as explained in point 2).

    Attached is my .ds and a truncated version of my spring service, as well as my javabeans.
    It's the "end" parameter in the LRecurringSchedule that's causing the issue.

    Let me know if there's anything else can i provide.
    Attached Files

    Leave a comment:


  • Isomorphic
    replied
    OK don't be sad :) We have a whole action plan in mind :)

    Can you please take a look at the specific DS where this issue occurs and look for differences from other DataSources? And then perhaps share that DS here, letting us know what seems to be special about it.

    Also please see if carefully running through the app and not triggering that DS (if that is possible) will still lead to that DS having this error after all other DataSources have been hit. Particularly useful would be to hit any other DataSources where results include a LocaleDate.

    Also, from other threads, we know you've got Spring in the mix (there's a "spring mix" salad joke in there somewhere). Please check out the serverInit topic and verify that you see all the logging messages from normal SmartGWT server-side init before this DataSource is hit. We would be unsurprised to find that the underlying exception is something to do with framework init not having run yet, and that's why it seems to self-correct. Some functions of the framework will still work even if init hasn't run yet, but this particular DS might use things that won't work until init has run properly.

    And finally, we will be adding logging to see what the underlying exception was, but of course, using breakpoints, you could catch it now and find out what it is.

    Leave a comment:

Working...
X