Archive for the ‘Java Development’ Category

11 Year Old Java Method’s subtle multi-threading issue…

December 4, 2011



Thinking about multi-threaded code on a slow single core processor (from 2000) is subtly different than thinking about mutli-threaded code on a multi-core processor (from 2011), especially when using the following line of code:

int salt = Math.abs(new Random().nextInt());


A) Create a private static field:

private static final Random RANDOM = new Random(); //created once when the class is loaded

B) Change all the code to use the static instance, RANDOM, ensuring two threads are MUCH less likely to receive the same value:

int salt = Math.abs(RANDOM.nextInt());


So, there I was looking through the server logs of the most recent release of my team’s Java server application. And every other day or so, a weird SQL insert error would show up indicating the values of the primary key had already been inserted. Given the file based code in this area had been working for the past 11 years (in writes to the file system), the code in the new release had added pushing the records into to the SQL DB. And additional new IOException error (“is a directory”) started showing up occasionally claiming a file operation was attempted on a folder…except that I was not working with a folder, I was working with a file. Huh?!

For deeper context, let’s step back in time. The primary application was started in 1999/Nov. It’s a simple HTTP servlet application which accepts a small set of parameters (URI or post) and responds with XML (mime type text/xml). In order to debug system interactions, a copy of the XML is stored in local file system. The folder path consists of “servlet_name/year/month/day” (ex: AddressResponse/2011/12/04). And the file is named “hh_nn_ss-mmm(salt).xml” (ex: 04_58_37-987(2043482311).xml). To avoid possible collisions in case the four separate servers generating these files were to end up with the same time stamp to the millisecond for the file, a small salt value was added to the end of the file, the value in parenthesis, to reduce the possible collisions as much as possible. Implicit in this was the thinking that if two threads in the same server (process) happened to generate the same time stamp, the salt would eliminate to very small probability the two files would collide. The value is generated using java.util.Random. And the line of code which generates it is:

int salt = Math.abs(new Random().nextInt());
 Ah, the folly of insufficiently thought out problems based on implicitly defaulted parameters. Sigh. I am sure some, and perhaps many, of you can already see where this is headed. Please remember, this was code written over 11 years ago which had long been forgotten about and “just worked”.

So back to the present and the errors/exceptions – a failed insertion into the SQL DB and a claim that attempting to open a new file to write results in an exception with the message text simply saying “is a directory”. I ran through the new DB code over and over again assuming I had made some “global context via Java statics” error even though I had been diligent to avoid statics and make EVERYTHING immutable (tysvm Joshua Bloch). Finally, I happened to notice that non-primary key data was slightly different between the data already existing in the DB and the data dumped into the exception log. This implied that two separate requests were ending up with an identical file name INCLUDING THE SALT. Statistically, that seemed impossible. The servers don’t receive enough requests to actually come near the 1 in +2.1 billion chance that the same salt value would be generated.

After carefully reading the code and checking for any sort of bleed of the text value between threads (hundreds of lines of code between a number of files), I pulled all my hair out. And barely slept that night. When I got to work the next day, I almost didn’t open Eclipse as I didn’t want to get stuck chasing this. I had way too many other things I had to work. However, since I didn’t understand what the problem was, I couldn’t be sure it was just contained to this one area. There might be some sort of data loss occurring which was silent and remained hidden. And it was occurring in production. So, ignoring it was not the right or responsible thing to do.

I opened Eclipse anyway and my focus shifted to where the salt value was being generated. It seemed it was the only point where the conflict could occur. Double checked to verify there was no static bleed. I went and looked at multi-threading issues around java.util.Random. There had been some prior to 1.5. However, the servers were both now on 1.6. And I didn’t have the problem back when they were using 1.3. Then, I looked at the line again:

int salt = Math.abs(new Random().nextInt());
Then, my subconscious finally indicated that maybe I ought to make explicit what is implicitly default about this line. Instantiating java.util.Random with an empty constructor (no seed) causes Random to use System.currentTimeMillis() as the default for the seed. So, the line really reads:
int salt = Math.abs(new Random(System.currentTimeMillis()).nextInt());

And walla, there’s the problem. How? First you must understand how java.util.Random works. It’s a pseudo random number generator. What it means is that it’s not really random. Given the same starting seed value, the nextInt() method will return exactly the same sequence of values. So, new Random(0).nextInt() will always return exactly the same value every time. Almost all of the time for almost all uses of Random, using the clock (i.e. System.currentTimeMillis()) is a great way to generate a random sequence. The problem here is that two separate servlet threads happen to make the call at exactly the same time which results in exactly the same value being generated as the default for the empty Random constructor. This results in the salt value in both threads being precisely the same. And this results in both threads having identical file names. So, the thread that is successful in getting the file stored to the file system “loses” in that the second thread automatically deletes it and then saves it’s own copy. However, the DB allows the first thread’s record to be inserted and then the DB rejects the second thread’s record. And as a result, the two errors above occur.

The first error, the one claiming that opening the file to write it to the file system (message of “is a directory”) was when one thread was still in the middle of writing the file (owned the system file lock) causing the second thread to receive the exception. And the second exception, the SQL DB insert error, was when either thread finally wrote it’s record to the DB successfully causing the lagging thread to “lose” when the DBs primary key (unique index) restricted the attempted duplicate insertion.

Now that I understood the problem, some XML responses were not being logged in the file system and the SQL DB, I could conclude the problem was non-critical and didn’t require an immediate fix. The fix could go into the next release which ought to occur within the next 2-4 months. It also meant that the problem has likely existed ever since this code was generated back in 2000 and has been silently occurring, albeit very infrequently, because the file saving mechanism was always deleting the first thread’s file write with the second thread’s file write (silent file delete needs to be elevated to a log output so this could have been caught sooner).

This issue actually showed there were several problems with the code design beyond just the java.util.Random default value issue. However, the fix for this particular part of the issue (with a minimum of redesign) is to change the class where the salt is generated by moving the creation of the Random to a private static value. This will ensure the Random is generated once when the class is loaded. And then the salt related code will call nextInt() on that instance of Random (and nextInt() is thread safe, I checked).

There is still the very remote possibility that two threads could both have an identical date/time stamp and nextInt() return the same (Math.abs()) value, twice in a row. However, we are now back at the 1 in +2.1 billion chance, a risk I’d easily take…for now. I will eventually revisit the code generating the file name to eliminate file name duplication in that rare event.

It is possible to design the logic more effectively to handle file name conflicts and just regenerate the full name for the file. The probability that the time stamp and the salt will be the same on the regenerate is so very small, it would rarely occur (like maybe once every 10 years given the current server loads). And even if it happened to occur, the expense for the recalculation would be close to negligible. And given the exception, just causing the thread with the collision to sleep for a couple of milliseconds before attempting the regenerate would noise up the results, especially of the chosen sleep time happened to be a random value between 1 and 20 and retrieved from the same instance of RANDOM.

After working through all of this, I then realized I might have used this particular Random pattern in other places in the code base. A quick use of Eclipse to search the code base turned up a use in a different area, and it was using the same basic implementation. The impact there was not large enough to even worry about it as there was already a mechanism for retrying around conflicts. Still, I will update it with the same solution (moving to a private static instance initialized at class load time) and virtually eliminate conflicts.

As servers continue to increase in speed, number of processors and number of cores, conflicts which were virtually impossible 10 years ago, are much more possible now. It does make me wonder what other kinds of assumptions I have made in my code from back then which will show up as odd server log exceptions. Whatever the case, I’m sure Murphy is working hard to enlighten me as soon as possible. Whoohoo!


How Oracle’s Ellison Drove a Java advocate to .NET/C#…

October 26, 2010

You don’t know me. Larry Ellison certainly doesn’t know me. I am a nobody. A typically esoteric strange developer coder type nobody. But still a nobody in the big picture of industrial software platforms.

Yet, there are enough of me, of nobodies, to eventually matter. And thanks to Ellison, Jobs and Ballmer, I am now planning to use FLOSS Mono influenced MonoTouch to produce .NET/C# apps on the iPhone. That’s the only way I know to punch each of them in their groin while simultaneously maximizing _my_ market value. Jobs has to _hate_ my choosing to use .NET/C# on the iPhone. Ballmer has to hate I am doing anything on the iPhone. And Ellison loses the mind-share of a disillusioned Java enterprise developer. How does this help Oracle long-term?

And I know I am not the only Java developer like this. In fact, I am seeing lots of nobodies out there…enough to eventually make a difference emergently. If my attention moves away from enterprise Java and into, say, .NET/C# via MonoTouch, how does that help Oracle long-term? And if enough of us move away from Java and into whatever else is out there, Java heads towards the same fate as FORTRAN and COBOL. It will still exist. It will just do so in obscurity while its descendants thrive. Perhaps that’s the future. Java’s descendants; Android and .NET/C# via MonoTouch. How does this help Oracle long-term?

I’m not a star Java developer with a blog about how awesome Java is. I am just one of those mundane day-to-day enterprise application code sluggers that has been happy with Sun’s FLOSS directions in and around Java and OpenOffice for years. And then Oracle/Ellison purchased Sun. And I tried to stave off the feelings of dread I had about what might happen as Sun was consumed and digested by the Oracle/Ellison behemoth. But the staving didn’t work. And an internal motivation threshold was crossed. And I am now actively pursuing a .NET/C# future. First in MonoTouch. And then, in MS’s own products as they are introduced (as opposed to Java) where I’m responsible for architecture and design decisions. How does this help Oracle long-term?

I am not an Ellison fan. I’m not into the psychologically immature narcissists (probably adjectively redundant) that seem to plague US technical corporations (I don’t know if the same occurs elsewhere). I particularly don’t care for Oracle/Ellison, Apple/Jobs and Microsoft/Ballmer. Maybe it’s because I cannot relate to them and their corporate juggernauts. It’s probably why I remained a Sun fan. I like underdogs. I wasn’t very keen on Scott McNeily, Sun’s original CEO, as he seemed to be a personality alignment wannabe for those others. However, he stepped down and was replaced with Jonathon Schwartz. And he was my kind of guy. Unfortunately, he was left with quite a mess upon his arrival. And in spite of that, he was able to achieve some great progress, at least for those of use who value FLOSS. And he was able to communicate pretty effectively with his development community. Schwartz fell away as Sun entered the great maws of Oracle. I trusted Schwartz.

And as I watched Oracle digest Sun, I became aware that I was no longer comfortable with Java’s future as I didn’t trust Java’s new steward. How OpenSolaris was treated was one of the earliest signs. Then, there was the activity around Apache.

So, I began to think about “What’s next?” I couldn’t help but notice all the activity around Android. I became interested in moving in that direction. Oracle helped push me further into that interest by suing Google for software patent infringement within Android. Nothing gets me more anti a company than for them to offensively use software patents. Okay, maybe ANYONE using the DMCA as a censorship mechanism might get me hotter, still. So, how does their litigation help Oracle long-term?

And then there’s OpenOffice. Oracle’s behavior in this area is just atrociously apathetic. And then they have made things even worse in the way they treated those who might be able to be liasons to help bridge themselves into the FLOSS community’s new LibreOffice. Instead, they acted like the community clueless corporate hacks I had feared they might be. Guess which nobody isn’t in alignment with, isn’t going to advocate for and will run from the NOracle as quickly as possible? Is Oracle even thinking long-term?

Since I currently own an iPhone 3G and my two year contract doesn’t expire until April, I decided to look at developing an app for the iPhone. As I investigated creating the app, I invested two weeks into researching what my options might be. And that was my first real foray into “Objective C”. I happened to get  a copy of “Head First iPhone Development”. I was immediately put off by Objective C. And then I recalled Jobs requiring I use it (or C or C++) to make apps for the iPhone. So, I began a hunt to see what other options existed.

And that’s when a friend pointed out MonoTouch. I investigated and liked how comfortable it felt. So, I have taken a series of very low risk and low cost choices as I move to generate my first iPhone app using MonoTouch. And thus far, I am VERY happy with what I have discovered. Plus, if Windows Mobile 7 takes off, I have leveraged my iPhone experience to help me eventually do .NET/C# development on Microsoft’s new mobile OS (which is the most likely one for Nokia to pick up now that they have offed Symbian).

All in all, I am a nobody. And as such, I have to hedge my bets. And thanks to Oracle/Ellison, I have decided to go explore other ways to hedge my bets as opposed to endure their hegemony. .NET/C# are now mature enough. Mono is FLOSS free enough. And thanks to Oracle/Ellison, it appears that Java has begun its eventual decline into the same status as COBOL.

As to Android, that’s just second nature to my existing Java experience. Of COURSE I will be producing apps for that. It’s a slam dunk. And knowing that producing apps on Android is annoying Oracle/Ellison’s juggernaut, bonus!

Java-like code remains in my future. It’s just descendants of Java; C# and Android, which will be my new partners.

So long Java, and thanks for all the FLOSS!