Weekly Summary - Slow, Painful TC Spring Progress
After two weeks of debugging, on Friday I added five magic lines of code to make the last three automated Spring tests pass. With this change we can say we support Spring up to 2.0.8 in our upcoming 2.6.2 release. We still have targeted Spring 2.5.x support for an upcoming summer release.
Two weeks of debugging, five lines of code, one bug fix. This has to be improved on.
Just for my own amusement, I’m going to try to list and describe all of the things that accounted for all the time spent on this.
By midweek last week, I had finally gotten into a groove with the Eclipse debugger, stepping through one of the failing Spring container tests. By “container” test, we mean parts of the test were actually running in a web app container, Tomcat in this case. Even so, this was not a very good, tight feedback loop. The basic pattern was for me to start the test, then attach to each of the two processes running in tomcat with the debugger. (Luckily, our code was already set up to allow for debugging, although it took me awhile to find and enable the magic property.) Then I just had to step through code in the debugger, not really sure what I was looking for. I had to alternate doing this, first with Spring 2.0.1 in the classpath, since our test was passing for that version, then with Spring 2.0.8 in the classpath, and look for what was going wrong. (We have a way of easily varying the version of a library such as Spring. However, I spent basically all of Tuesday helping Hung debug a problem in our build process concerning those variants.) Each time I alternated Spring variants, I also had to tweak the source lookup in my Eclipse remote debug configuration, to pull up source from the right Spring version.
The test uses our DSO functionality, it’s basically a semi-complete running instance of the DSO client. At one point I found I needed to hit breakpoints that were occuring during DSO bootstrapping, which meant I had to set yet another magic property (in ClassProcessorHelper) to enable debugging which is normally not available prior to TC instrumentation. I had to dig through my IM chat transcripts to find the name and location of that property, which my boss had mentioned once weeks ago, and then enable it, and find through painful trial and error that it didn’t work unless I did a full clean recompile (but it worked like a charm after that).
The code I was stepping through was instrumented using AspectWerkz, although for the most part the breakpoints seemed to work fine. But the amount of code was just vast. All I can say is, I spent probably twelve to fourteen hours of straight debugging on Thursday and Friday, just hitting breakpoints, comparing state, following hunches and wild goose chases and red herrings. In the end I found the Spring class whose source had changed between 2.0.1 and 2.0.5, and again in 2.0.8. (It was org.springframework.aop.config.ScopedProxyBeanDefinitionDecorator, deep in the guts of Spring’s aop framework.)
So in the final tally, I spent unexpected amounts of time
- helping debug a build problem which prevented us from using the “variants” feature, to vary the Spring library
- setting up Eclipse projects for different versions of Spring, so I could browse 2.0.1 and 2.0.8 source code side by side
- tweaking the Spring source lookup for the Eclipse remote debugging configuration, to alternate between 2.0.1 and 2.0.8
- figuring out how to enable debugging of our container tests
- figuring out how to enable debugging during DSO client startup
- trying (in vain) to write a more lightweight unit test to tackle the problem with
- debugging for many, many hours once it was all working
And that’s not even counting actually learning AspectWerkz and writing a new pointcut and advice, which I would have had to do anyway, but which itself involved some painful trial and error.
And at the end of all of this, I sort of feel like I’ve just patched some big honking beast that I don’t fully understand. I feel like our code as it stands now (the AspectWerkz pointcuts and advices for clustering Spring) are still very tightly coupled to Spring source code in a very fragile way, and it’s only a matter of time before a new minor Spring version comes along and breaks it again. All week I was thinking, there has got to be a better way.
Our code for instrumenting and clustering Spring is really cool and mind-blowing, don’t get me wrong. I never would have thought of it in a million years. But it is a classic case of code that is tightly coupled, not modular at all. There is an impressive amount of test coverage through automated integration and system tests, which we need. But the Spring test suite takes about an hour to run, and that’s not even every test. There is a lot of AspectWerkz advices and pointcuts being used to cluster Spring, and none of them can be run and tested in isolation. You just have to start up the whole shebang and debug.
I consider myself a TDD and refactoring proponent. All week long, the pain of trying to debug this was screaming out to me that this part of the code needed refactoring and unit tests. Honestly, I’m not sure I could have done any worthwhile refactoring and still figured out the problem in the same amount of time. That’s the classic TDD/refactoring chicken and egg problem - when you’re in a time crunch, the prospect of trying to clean up some code in order to make things easier at some undermined future point seems insurmountable, and meanwhile you always tend to mentally downplay the amount of time it will take to just debug and fix the code as is. That’s the fear that comes with TDD.
In an interesting twist, I was talking to my boss Steve, and he asked me why I wasn’t refactoring. My boss wants me to refactor! He didn’t order me to or anything, but he sounds very much in favor of it. He said he almost always regrets not refactoring as he goes.
I think keeping the code clean and testable is so important to the long term health of a large, constantly evolving code base. I mean, we can’t just keep having software developers spend two weeks debugging one bug.
There’s a lot more I want to think about along these lines, but I’ve written enough for now.
Leave a comment