Our columnist, Kirk Pepperdine, interviews Ron Bodkin of Glassbox, the company that is the primary motivator behind Glassbox, the open source automated troubleshooting and monitoring agent for Java apps. Published April 2007, Author Kirk Pepperdine
Can you tell us a bit about yourself and what you do?
I'm Ron Bodkin, the Founder and CEO of Glassbox. We provide open source software to help organizations proactively solve Java application issues such as errors and slowness before they become real problems, to minimize downtime and time wasted in fighting fires.
What is Glassbox and how can it help me?
Glassbox is an easy to use plug-in for your Java VM: in 15 minutes you start seeing where problems might be, and what might be causing them. With its AJAX UI it's easy to use Glassbox and to share its findings with others in your organization: we think this means of facilitating communication is crucial to reducing downtime. And since it is Open Source, companies can customize it to their own systems and components.
Glassbox is valuable in production: it watches applications and identifies issues early on, so you know if users are hitting errors or if performance is degrading. You then get focused, clear data about the causes of problems such as database outages or slow Web service calls. Glassbox is also a great foundation for building application-specific monitoring, letting you define your own monitoring policies.
Glassbox is also useful in a QA environment. It makes it easy to identify where problems occur and flags common causes when running load tests or even functional tests. It provides URLs and parameters for incoming and outgoing requests to make bugs easier to reproduce and isolate. Of course this feature also enables isolating problems in production, by detecting which data sets or users are performing badly or failing.
In addition, Glassbox is a good tool in development to provide an overview of performance sliced by operation in your system, to see where time is spent in processing, and to flag problems under load such as thread contention. You can also configure Glassbox to provide more detailed performance logs.
Glassbox runs on Java 1.4, 5, and 6 for Tomcat, JBoss, WebLogic, WebSphere, OC4J and Oracle Application Server, and GlassFish.
What was the nucleus of the idea behind Glassbox?
Glassbox came about from two main influences. We had a lot of experience in supporting production systems and we believed that a lot of problems occur frequently and waste a great deal of time and talent in diagnosing, especially given how often people lose their cool in a crisis and jump to conclusions rather than analyzing all the possibilities. The other side was from my experience with Aspect-Oriented Programming: I wanted to make it easy for projects to get the benefits of AOP for management without having to become experts.
Are there any performance preconceptions or ideas regarding Java and performance that the Glassbox project has changed?
The biggest one in my mind is the notion that performance tools should focus on the hardest problems. We are showing how important it is to automate and simplify the common problems, to minimize downtime and expenses. Glassbox helps with identifying and more quickly fixing the long tail of performance problems, allowing you to detect the many different variations on the common themes of slow and failing behavior across components and application-specific details. Glassbox often helps organizations find death by 1000 cuts, thread contention, and unreliability in production and staging environments. These are frequently overlooked areas of performance and reliability problems.
Another preconception that I see a lot is the traditional dichotomy between management code, which is laboriously instrumented (and often of doubtful accuracy and value), and low-level automated tools. We are enabling application performance to be modeled by components and resources in a way that makes sense for a specific application.
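To make the contrast with hand-instrumented management code concrete, here is a minimal sketch of timing calls from the outside, without editing the monitored class. It swaps in a plain JDK dynamic proxy as a stand-in for an AOP "around" advice; it is not how Glassbox itself is implemented, and `OrderService`, `CallStats`, and `monitor` are hypothetical names invented for this illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Illustration only: a JDK dynamic proxy standing in for AOP-style monitoring. */
final class TimingProxyDemo {

    /** Hypothetical application interface; not part of Glassbox. */
    interface OrderService {
        String findOrder(int id);
    }

    /** Records call counts and elapsed time per method name around every call. */
    static final class CallStats implements InvocationHandler {
        private final Object target;
        final Map<String, AtomicLong> calls = new ConcurrentHashMap<>();
        final Map<String, AtomicLong> nanos = new ConcurrentHashMap<>();

        CallStats(Object target) {
            this.target = target;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            } finally {
                long elapsed = System.nanoTime() - start;
                String name = method.getName();
                calls.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
                nanos.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(elapsed);
            }
        }
    }

    /** Wraps target so that every interface call is routed through stats. */
    @SuppressWarnings("unchecked")
    static <T> T monitor(Class<T> type, T target, CallStats stats) {
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, stats);
    }

    public static void main(String[] args) {
        OrderService real = id -> "order-" + id;
        CallStats stats = new CallStats(real);
        OrderService monitored = monitor(OrderService.class, real, stats);

        monitored.findOrder(1);
        monitored.findOrder(2);

        System.out.println("findOrder calls: " + stats.calls.get("findOrder"));
        System.out.println("findOrder nanos: " + stats.nanos.get("findOrder"));
    }
}
```

The monitored class carries no timing code of its own; the measurements live in one place and can be grouped by component or resource as the application requires.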
Tell us about the Glassbox community. How did that come about and how does it work?
The Glassbox community has come to us through a wide variety of sources, from word of mouth to presentations and articles. It is highly diverse and international: we have users from 25 different countries. We see a sophisticated, engaged user base, whose members are interested in leveraging open source and contributing back. From our interaction with users, our surveys and analysis of users and Web site visitors, Glassbox is being used in production, QA, and development for a wide variety of industries and applications. Our users interact with us and each other through our forums, email, and issue tracking software. It's really hard to overstate the importance of open source in building a community and establishing effective ways to work.
What do you consider to be the biggest Java performance issues currently?
As always, writing distributed calls as if they were local is a big problem: it's a problem in database interaction, in Web services calls, and in chatty AJAX applications. This is the most frequent cause of death by 1000 cuts. Another important issue is over-synchronization. Speaking of AJAX, it shifts workloads a lot and can really cause big problems on a server if the system architecture isn't designed to handle more frequent, smaller requests. Another common problem is abysmal throughput because applications synchronize access to a bottleneck in their system, sometimes even facing deadlocks.
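A small sketch of what "writing distributed calls like they are local" can look like. Nothing here talks to a real network: `RemotePriceService` is a hypothetical stand-in that simply counts round trips, and the 5 ms latency per call is a made-up figure chosen only to make the arithmetic visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: counts simulated round trips instead of making real network calls. */
final class ChattyCallsDemo {

    /** Hypothetical remote service; every method call models one network round trip. */
    static final class RemotePriceService {
        static final long ASSUMED_ROUND_TRIP_MILLIS = 5; // made-up latency per call
        int roundTrips = 0;

        /** Fine-grained call: one round trip per item. */
        double priceOf(String sku) {
            roundTrips++;
            return lookup(sku);
        }

        /** Coarse-grained call: one round trip for the whole list. */
        List<Double> pricesOf(List<String> skus) {
            roundTrips++;
            List<Double> prices = new ArrayList<>();
            for (String sku : skus) {
                prices.add(lookup(sku));
            }
            return prices;
        }

        long simulatedLatencyMillis() {
            return roundTrips * ASSUMED_ROUND_TRIP_MILLIS;
        }

        private static double lookup(String sku) {
            return sku.length() * 1.5; // stand-in for real data
        }
    }

    /** Looks local, but pays one round trip per loop iteration. */
    static double totalChatty(RemotePriceService service, List<String> skus) {
        double total = 0;
        for (String sku : skus) {
            total += service.priceOf(sku);
        }
        return total;
    }

    /** Same result with a single round trip. */
    static double totalBatched(RemotePriceService service, List<String> skus) {
        double total = 0;
        for (double price : service.pricesOf(skus)) {
            total += price;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> skus = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            skus.add("sku-" + i);
        }
        RemotePriceService chatty = new RemotePriceService();
        RemotePriceService batched = new RemotePriceService();
        totalChatty(chatty, skus);
        totalBatched(batched, skus);

        System.out.println("chatty:  " + chatty.roundTrips + " round trips, ~"
                + chatty.simulatedLatencyMillis() + " ms");
        System.out.println("batched: " + batched.roundTrips + " round trips, ~"
                + batched.simulatedLatencyMillis() + " ms");
    }
}
```

Each individual call in the chatty version is cheap, which is why the cost tends to show up only in aggregate: 200 round trips against one for the same answer.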
How does Glassbox help you deal with these issues?
Glassbox decomposes the causes of slowness and failure for interaction with common resources like databases, Web services APIs, and clustering software to identify death by 1000 cuts and also failures from unreliable distributed components. Glassbox also detects thread contention due to Java synchronization and reports where this is causing slow performance. Glassbox also analyzes performance for DWR and GWT, the two most popular server-side Java frameworks for handling AJAX. It then reports these problems in a concise, understandable way with more supporting detail available. For Java 5 and later VM's,
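As a rough illustration of what detecting contention from Java synchronization can involve, the JDK has exposed per-thread blocking counters through `java.lang.management` since Java 5. The sketch below uses that standard API directly; whether Glassbox relies on it is not stated here, and `ContentionDemo` and `totalBlockedCount` are hypothetical names for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.atomic.AtomicLong;

/** Illustration only: forces threads to queue on one monitor and reads the JVM's counters. */
final class ContentionDemo {
    private static final Object LOCK = new Object();

    /**
     * Starts the given number of workers while the caller holds LOCK, waits until
     * each one is blocked on it, then releases. Returns the sum of the blocked
     * counts that the workers report for themselves.
     */
    static long totalBlockedCount(int workers) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        AtomicLong total = new AtomicLong();
        Thread[] threads = new Thread[workers];
        try {
            synchronized (LOCK) {
                for (int i = 0; i < workers; i++) {
                    threads[i] = new Thread(() -> {
                        synchronized (LOCK) {
                            // Read the counter while the thread is still alive;
                            // getThreadInfo returns null for terminated threads.
                            ThreadInfo info = mx.getThreadInfo(Thread.currentThread().getId());
                            total.addAndGet(info.getBlockedCount());
                        }
                    });
                    threads[i].start();
                }
                // Hold the lock until every worker is queued behind it.
                for (Thread t : threads) {
                    while (t.getState() != Thread.State.BLOCKED) {
                        Thread.sleep(1);
                    }
                }
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println("total blocked count: " + totalBlockedCount(4));
    }
}
```

`getBlockedCount()` works without extra setup. `getBlockedTime()` reports how long a thread spent blocked, but only after contention monitoring has been switched on with `setThreadContentionMonitoringEnabled(true)` on a JVM that supports it.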
What do you see as the next biggest performance issue that is lurking around the corner?
A big performance issue that's really picking up steam is the weakest link in the chain: as we build more distributed systems through service oriented architectures and computing grids, any one external system or resource can be slow or fail, exponentially multiplying the likelihood that the consuming system will be slow or unreliable. Think how often performance and reliability problems in a traditional server-based application happen because of the one distributed component, the database. While having distributed resources adds a lot of flexibility, reduces latency, and simplifies integration in many ways, it greatly increases troubleshooting complexity.
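The weakest-link effect can be put in numbers under a simplifying assumption that is not made in the interview: the consuming system needs all n of its external dependencies, and each is available independently with the same probability a. The figures in the example are invented for illustration.

```latex
% Assumption: n dependencies, each independently available with probability a.
P(\text{all available}) = \prod_{i=1}^{n} a = a^{n},
\qquad
P(\text{at least one unavailable}) = 1 - a^{n}.

% Example with made-up numbers: a = 0.99, n = 10
1 - 0.99^{10} \approx 1 - 0.904 = 0.096
```

So ten dependencies that are each up 99% of the time leave the consumer exposed to a failure somewhere in the chain nearly 10% of the time. Real dependencies are rarely independent, so this is a rough guide rather than a prediction.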
What can developers do today to prepare themselves for these issues?
I think many projects would benefit from creating prototypes of their high-risk components and load testing them early on to verify performance and reliability assumptions, then using those results to set a reasonable service level agreement with the other stakeholders for their application (users and managers). With that in mind, we see tremendous value in modeling the key layers and performance thresholds for an application with a monitoring and troubleshooting tool like Glassbox, to allow tracking against the baseline.
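One simple way to turn load-test results into a threshold that can be tracked is to compare a latency percentile against an agreed limit. The sketch below is an assumption-laden illustration rather than Glassbox functionality: `BaselineCheck`, the nearest-rank percentile method, the sample latencies, and the 200 ms limit are all chosen for this example.

```java
import java.util.Arrays;

/** Illustration only: checks load-test latencies against an agreed threshold. */
final class BaselineCheck {

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * pct percent of all samples are less than or equal to it.
     */
    static long percentile(long[] samplesMillis, double pct) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (pct <= 0 || pct > 100) {
            throw new IllegalArgumentException("pct must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    /** True when the chosen percentile is at or below the agreed threshold. */
    static boolean withinBaseline(long[] samplesMillis, double pct, long thresholdMillis) {
        return percentile(samplesMillis, pct) <= thresholdMillis;
    }

    public static void main(String[] args) {
        // Made-up response times in milliseconds from a hypothetical load test.
        long[] samples = {120, 95, 110, 400, 105, 98, 101, 130, 99, 102};
        long assumedThresholdMillis = 200;

        System.out.println("median: " + percentile(samples, 50) + " ms");
        System.out.println("p90:    " + percentile(samples, 90) + " ms");
        System.out.println("p95:    " + percentile(samples, 95) + " ms");
        System.out.println("p90 within baseline: "
                + withinBaseline(samples, 90, assumedThresholdMillis));
        System.out.println("p95 within baseline: "
                + withinBaseline(samples, 95, assumedThresholdMillis));
    }
}
```

With these samples a single 400 ms outlier leaves the median and 90th percentile comfortably under the limit while the 95th percentile breaches it, which is why the choice of percentile belongs in the service level agreement itself.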
Where do you go from here with Glassbox?
Glassbox is going ahead full speed. We've had our 2.0 beta out since last September, and are getting great feedback. We want to make it better, and optimize it. We are working on some exciting ideas and integrations, which I won't share just yet, but at the core we want to hear back from the community, and make sure that we solve the long tail of performance and reliability issues we are after. We are also very interested in contributions from others who care about making it easier to deliver reliable, high-performance Java solutions.
Thanks for that great interview, Ron.