Thursday, June 16, 2005

Walmart.com migrates to Java 1.5

I found this case study interesting for one particular reason: the author has not used unnecessary jargon to impress readers. He has used straightforward language and simple examples to describe what actually was, and still is, a huge task.

As a developer, one knows how painful any migration can get. Even a short re-engineering or refactoring of a small project takes a hell of a lot of time and energy. Moreover, there's always the fear that we might have missed something, or that something may come up in due course.

However strong a person's design or coding skills might be, the possibility of human error or oversight can never be completely removed; it can only be minimised.

Considering that Walmart.com handles 7 million sessions per day (huge by any standard), it must have been a Herculean task to upgrade the existing system from Java 1.3/1.4 to Java 1.5.

http://www.theserverside.com/articles/article.tss?l=MigratingtoJava5

How do you upgrade one of the busiest web applications of the whole Internet to Java 1.5? This is a challenge under the best conditions, considering that the application must handle up to 7 million sessions and 106 million page views per day. This article is a case study of the Java 1.5 upgrade for the application and subsystems that make up the walmart.com web site.
Upgrading to Java 1.5 was motivated by the need to have better monitoring tools and JVM capabilities on the production site, the evolution of some foundation technologies that require Java 1.5 functionality in their latest releases, and programmer demand to use the new language features.
The walmart.com store is structured as a single application with scores of subsystems. The application combines open-source, commercial, and proprietary technologies. The main web site is served by Apache and Tomcat, and it depends on:
Database stored procedures
Apache Axis
Commercial application APIs
Internal applications
Various presentation technologies
The application runs in a loosely coupled cluster of over one hundred servers. Data is managed by a massively parallel database server. The application provides and consumes web services in the form of HTTP POSTs, ad hoc file transfers, and SOAP web services. Since it all depends on Java, the upgrade to 1.5 was planned in three distinct stages over the course of 24 weeks. The goal was to have minimum operational impact.
The application development calendar is made up of eight-week cycles. A typical cycle includes:
new software development;
unit, integration, and interoperability testing;
staging; and
deployment.
The release dates are immutable because of the interdependence with many vendors’ applications. Delaying a release impacts the application itself and a large number of third-party vendors who provide services throughout the site.
Table 1 shows how the Java 1.5 upgrade was mapped to the release calendar:
Release
Activity
Notes
A
Introduce Java 1.5 tools to the development environment, but allow compilation only as Java 1.4

Define workarounds for issues so that a roll-back to JVM 1.4 is possible in case of major production issues not identified during this release or due to system load
Identify and address build and environmental issues

Training in the new Java 5 language features for the engineers
B
Deploy Java 1.5 in production, running as Java 1.4

Define “best coding practices” for developers aching to use the new language features

Resolve issues identified during release A in a permanent way and adopt Java 1.5-compatible solutions
If the production deployment causes problems, roll back to Java 1.4 and reevaluate

Some third party products, like Apache Axis, may be updated in release B
C
Deploy code written using the new Java 1.5 language features in production
Migration is complete
Table 1
The release schedule is designed to minimize operational disruption and to give the engineers time to address any development and integration issues. The rest of this article presents some of those issues and how the development and operations teams addressed them.
SWITCHING JAVA COMPILERS
The application used Jikes instead of javac prior to release A. Jikes was used in the development environment for compiling static classes and content, and at run‑time for generating JSPs. The upgrade required switching the compiler to javac and configuring it for 1.4 sources and 1.5 targets. This created minimum disruption in the core application building tools, which rely on make and Ant for compiling the application. It created a few problems among the developers because the engineering department policy allows the developers to build with any IDE of their choice as long as the site can be built with the standard make/Ant/javac command line tools. The compiler switch affected the developers in varying degrees:
Users of standard *NIX-only tools were not affected; all they had to do was check the new compiler and build files out of the CVS repository and run “make clean; make”.
Eclipse 3.1 users had the most problems getting the site to compile and execute within the scope of the IDE. None of the problems was a showstopper, and issues were resolved within 24 hours of the upgrade in most cases.
IDEA users didn’t experience major disruptions with the upgrade.
Users of jEdit and other tools didn’t report issues.
There was a small performance impact in building dynamic JSPs with the javac compiler. The impact was deemed insignificant in the context of the site, and balances out with performance advantages gained with the use of Java 1.5 Hotspot.
XML SUPPORT IN JAVA 1.5
Java 1.5 introduced several changes or enhancements to its XML API (SAX, DOM and XSLT in particular). These changes first became apparent when running code that relied on JSP’s imports. Previous versions of the XML API allowed the same XML attribute to be defined more than once in a single JSP. Quite a few JSP pages, dynamic and static, broke when running under Java 1.5 because of the attribute repetition. The two main solutions proposed to address this problem were:
Parse all the JSP files that made reference to this attribute and replace it with a unique name per file. This would allow nesting and cause minimum impact to the existing code or the site itself, since the offending attribute had limited scope.
Restructure the JSP pages and imports so that the attribute was defined once, and only once, for each page, without collisions while retaining the attribute’s name.
We discarded the first solution because, although expeditious, it was also considered a kludge. The “unique name per file” solution would help perpetuate a bad-coding practice. This would result in short-term impact to the production site but non-maintainable or brittle code in the long term. This is not acceptable for a site that relies on several hundred JSPs that, combined with database queries and templates, explode to thousands of different pages.
Implementing the second solution was relatively straightforward, albeit labor-intensive for the platform engineers. Several hundred pages were updated to reflect the change, and were individually tested. We expect to find few issues during quality assurance thanks to their work. The application engineers participated in a training session that explained the reasons for this change and adapted the production code to the new header page import sequence.
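The underlying XML rule is easy to check in isolation. The following is a minimal sketch, not the walmart.com code: it assumes the breakage resembled an element carrying the same attribute twice, and it uses the standard JAXP DocumentBuilder to show that a conforming parser rejects such input. The class name, method name, and sample markup are hypothetical, and the sketch illustrates only the general rule, not the specific JSP import failure.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DuplicateAttributeCheck {

    /** Returns true if the XML parses cleanly, false if the parser rejects it. */
    public static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // A repeated attribute on one element is a well-formedness error.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical markup: one attribute, then the same attribute twice.
        System.out.println(isWellFormed("<page title=\"a\"/>"));
        System.out.println(isWellFormed("<page title=\"a\" title=\"b\"/>"));
    }
}
```

A check like this could be run over generated page fragments before deployment, which is cheaper than finding the collisions one page at a time in production.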
SOAP WEB SERVICES
Some mission-critical applications are implemented as Axis SOAP web services. These applications presented two major challenges during the upgrade:
The Axis 1.1 developers had used the word ‘enum’ in one of their package names; enum is a word now reserved for designating enumerated types. Java enums are class‑like constructs and specialize the java.lang.Enum class.
The platform group upgraded Axis to version 1.2, which uses a different package nomenclature. This worked fine for the compiler but resulted in a run-time error because of a bug in Axis 1.2.
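To make the first challenge concrete, here is a small sketch of what 'enum' means once code is compiled as Java 5 source. The Release type and the class name are hypothetical; the only claims are the ones made in the text above, namely that enum is reserved and that enum types specialize java.lang.Enum.

```java
public class EnumKeywordDemo {

    // In Java 5 source, "enum" declares a type. It can no longer be used as
    // an identifier, so a package segment named "enum" stops compiling.
    public enum Release { A, B, C }

    /** Every enum type implicitly extends java.lang.Enum. */
    public static boolean extendsEnum() {
        return Enum.class.isAssignableFrom(Release.class);
    }

    public static void main(String[] args) {
        // int enum = 1;  // legal as Java 1.4 source, rejected as Java 5 source
        System.out.println(extendsEnum());
        System.out.println(Release.valueOf("B").ordinal());
    }
}
```

This is also why compiling with 1.4 sources during release A hid the problem from application code while still exposing it in third-party libraries that had to be rebuilt or upgraded.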
Changes to existing Java classes relying on any component upgraded along with Java 1.5 were kept to a minimum to simplify the roll-back process. The standard operating procedure for Axis implementations is to regenerate the application code with the new code base and tools such as WSDL2Java. We chose to keep the existing classes instead for release A and build a programmatic solution to the bug found in Axis 1.2 that would also work with Java 1.4/Axis 1.1. That way only the JVM and libraries need to roll back if needed. Full implementation of Axis 1.2 (or latest) will wait for releases B or C, when the development focus shifts to language and library features, not JVM/run-time migration.
HOTSPOT CAVEATS
The servers implement all possible optimization advantages because of the traffic volume that they must handle. The application relies on Hotspot to optimize frequently used methods. The introduction of the Java 1.5 JVM brought with it a few run-time exceptions in code that worked fine under the JVM 1.4.
Troubleshooting and resolving these errors is simple, though somewhat labor intensive:
Use profiling tools, if possible
If the JVM dies, analyze the core dump for clues as to which methods are causing the error
An example of the first case was a problem with the concurrent and parallel garbage collector. It was resolved by increasing the stack memory available to the GC through the use of run-time configuration parameters like CMSMarkStackSize and CMSParallelRemarkEnabled, per the Java vendor’s recommendation.
In the second case, some methods threw NullPointerExceptions during the JIT compilation, regardless of the configuration. The short-term workaround for this problem is to add the class/method name to the .hotspot_compiler configuration file. The JVM won’t compile the methods listed there. The long-term solution is for the JVM vendor to provide a bug fix.
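As a rough illustration of the workaround, the helper below formats entries for a .hotspot_compiler file. It assumes the commonly documented entry format of the word exclude, the class name with slashes instead of dots, and the method name; the article does not show the file itself, so treat the format as an assumption to verify against the JVM vendor's documentation. The class and method names passed in are hypothetical.

```java
public class HotspotExcludeLine {

    /**
     * Formats one entry in the assumed form:
     * "exclude <class name with slashes> <method name>".
     */
    public static String excludeLine(String className, String methodName) {
        return "exclude " + className.replace('.', '/') + " " + methodName;
    }

    public static void main(String[] args) {
        // Hypothetical method that failed during JIT compilation.
        System.out.println(excludeLine("com.example.catalog.PriceFormatter", "format"));
    }
}
```

Generating the entries from a list kept under version control makes it easier to delete them again once the vendor's fix arrives.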
JSTL TRANSFORM ERRORS
Some portions of the site are implemented with JSTL. They are isolated within a set of common end‑user services, so it was easy to resolve issues when they cropped up.
A number of errors were reported when the JSTL code was executed in the Java 1.5 environment. It was determined that JSTL relies on JAXP 1.2 but J2SE 1.5 provides JAXP 1.3. The solution involved two steps:
Specifying the JAXP 1.2 classes to the run-time; and
Resetting the default transformer factory from Xalan to XSLTC.
In step with our desire to disturb the environment as little as possible, these changes were made as command line switches passed to the JVM during startup; they are easy to remove if there is a need to roll Java 1.5 back in a production environment.
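One way to confirm that such switches took effect is to ask the running JVM. The sketch below is an assumption-laden illustration, not the team's tooling: it lists the -D switches the JVM was started with and reports which TransformerFactory implementation JAXP actually resolves. The property name javax.xml.transform.TransformerFactory is the standard JAXP lookup key; the article does not say which exact switches were used, and the class and method names here are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.TransformerFactory;

public class TransformerFactoryCheck {

    /** Standard JAXP system property consulted when a factory is created. */
    public static final String FACTORY_PROPERTY =
            "javax.xml.transform.TransformerFactory";

    /** Keeps only the arguments that set system properties (-Dname=value). */
    public static List<String> systemPropertySwitches(List<String> jvmArgs) {
        List<String> matches = new ArrayList<String>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-D")) {
                matches.add(arg);
            }
        }
        return matches;
    }

    /** Name of the factory implementation JAXP resolves in this JVM. */
    public static String activeFactoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        List<String> jvmArgs =
                ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("-D switches: " + systemPropertySwitches(jvmArgs));
        System.out.println(FACTORY_PROPERTY + " = "
                + System.getProperty(FACTORY_PROPERTY));
        System.out.println("active factory: " + activeFactoryClass());
    }
}
```

Because nothing is changed in code, removing the switch from the startup script restores the JVM default, which matches the roll-back goal described above.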
OTHER THINGS TO CONSIDER
The examples listed so far are the most glaring examples of incompatibilities and issues introduced by the new JVM in an otherwise stable environment. It’s likely that other issues will manifest themselves during regression testing. At this time, the application is stable and the platform development team feels that all the show stopping issues have been resolved.
Once the JVM upgrade to production is complete, the programmers will use Java 5 idioms in the code such as generics, autoboxing, enhanced ‘for’ statements, etc. starting in release B.
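For readers who have not yet seen these idioms, here is a minimal sketch of generics, autoboxing, and the enhanced 'for' statement working together. The class name and the cart example are hypothetical and are not taken from the walmart.com code base.

```java
import java.util.ArrayList;
import java.util.List;

public class Java5Idioms {

    /** Sums the list; each Integer is unboxed to an int automatically. */
    public static int sum(List<Integer> values) {
        int total = 0;
        for (Integer value : values) {   // enhanced 'for' statement
            total += value;              // auto-unboxing
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> quantities = new ArrayList<Integer>(); // generics
        quantities.add(3);                                   // autoboxing
        quantities.add(4);
        System.out.println(sum(quantities));
    }
}
```

Under Java 1.4 the same method needs an untyped List, an explicit Iterator, and a cast to Integer on every element.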
RELEASE B: JAVA 1.5 PROGRAMMING
Java 5 coding idioms were introduced at walmart.com through a series of presentations beginning in December 2004. The goal was for the engineering team to familiarize themselves with the new language features and the caveats involved in their use (or abuse). Three 90-minute overview sessions were scheduled:
December 2004: new Java 5 programming features overview: generics, autoboxing, enumerated types, enhanced ‘for’ statement and static imports
May 2005: Annotations, new concurrency API, and enhanced formatted I/O
June 2005: JVM monitoring and profiling tools (JMX)
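The later sessions cover library features rather than syntax. As a small, hedged sketch of three of them, the example below submits a task through the java.util.concurrent API, reads heap usage from the platform memory MXBean, and formats the result with String.format. The class and method names are hypothetical; annotations are left out to keep the sketch short.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Java5Libraries {

    /** Reads used heap on a worker thread and returns a formatted report. */
    public static String heapReport() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Concurrency API: submit a Callable and wait on its Future.
            Future<Long> used = pool.submit(new Callable<Long>() {
                public Long call() {
                    // Monitoring API: platform MXBean for memory usage.
                    return ManagementFactory.getMemoryMXBean()
                            .getHeapMemoryUsage().getUsed();
                }
            });
            // Formatted output: printf-style formatting.
            return String.format("heap used: %d bytes", used.get());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(heapReport());
    }
}
```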
A team of walmart.com’s engineers developed these training sessions. Additional in‑depth training provided by a third party will follow. The company has a set of non-restrictive coding guidelines in place already; the engineering group will enhance it with a set of best practices specific to the new language features.
New language features will be added only when:
they’re necessary because a third-party API requires them;
an existing third-party API is updated and integrated with the environment;
new application code is developed; or
existing application code is revised.
There won’t be a massive updating exercise to implement the Java 5 language features in every source file. The number of classes and JSP pages to update is staggering, and such an endeavor would be plagued with bugs. The goal is to add these features as their need becomes apparent.
CONCLUSION
Upgrading a production environment for a popular site to Java 1.5 is a task riddled with risks that can be minimized or addressed through conscientious planning. This article presented how problems were overcome during a run-time Java 1.5 upgrade process. The engineers in charge of this upgrade and their users (i.e. the application developers) feel comfortable with the process so far, though a few things might be done differently in the future. For example, JTidy (an HTML syntax checker that can fix malformed HTML output) was introduced along with the Java 1.5 upgrade. The engineering group spent some resources chasing ghosts, thinking that some problems introduced into the environment by JTidy were caused by the XML parser changes in J2SE 1.5. Future run-time updates will focus on JVM-specific upgrades, leaving the introduction of new functionality to the late part of the current cycle, or moving it to the next release altogether.
This article is limited to just the Java environment at walmart.com. Except for the problems presented here, the overall feeling is that a migration to Java 1.5 in a production environment can be a mostly painless exercise. No insurmountable issues are expected when the new coding features come into use. The Java 1.5 upgrade engineers feel that this update will result in performance, integrity, coding and monitoring benefits for the site’s customers, the application developers, and the operations team in charge of the site.

AUTHOR BIO
Eugene Ciurana is an author and computer engineer with 20+ years of experience in the design, implementation and deployment of mission-critical systems.
