Extending maintenance intervals: drifting into failure?
IN JANUARY 2000 an Alaska Airlines McDonnell Douglas MD-83 jet crashed into the Pacific Ocean off California, killing all 88 people on board. The jacking screw that allowed the pilots to control the pitch of the horizontal stabilizer, and with it the nose attitude of the aircraft, had failed, sending the aircraft into a nose dive into the sea. The threads on the nut of the jacking screw had worn out because grease was applied too infrequently.
On the face of it, this was a straightforward case. A simple mechanical device failed because maintenance intervals were extended in the interest of minimizing machine downtime. Sound familiar? We can all learn something from this story, not only from the accident but also from the different ways it has been investigated.
Like many aircraft, the MD-83 requires the front edge of the horizontal stabilizer to pitch up or down to compensate for shifting load balances and maintain the correct nose attitude of the aircraft. This movement is controlled by a screw jack, which moves with the stabilizer and is actuated by servo motors mounted within the stabilizer itself. The mating nut, fixed to the airframe, is made of a material considerably softer than the screw. With insufficient grease, the nut threads wore down and stripped. Once the nut threads gave way, the lower jack stop was all that was left to retain the horizontal stabilizer, and in time it failed as well. This allowed the leading edge of the horizontal stabilizer to pitch up at an extreme angle, forcing the nose of the aircraft down into an uncontrollable dive.
The TV documentary ‘Cutting Corners’ correctly dramatized the difficulty the pilots faced as they wrestled with the control column, trying to drag the nose of the aircraft up and prevent it from diving into the ocean. The documentary cited the cause of the accident as “a shocking chain of negligence and error”. It vindicated the efforts of a company whistle-blower who, like the NTSB report AAR-02/01, criticized the company for poor maintenance standards. On the surface, that’s what it looked like: inexcusable negligence.
However, Sidney Dekker, an innovative safety thinker, has delved deeper into the investigation than the 189-page NTSB report was able to. In ‘Ten questions about human error: A new view of human factors and system safety’ (London: Lawrence Erlbaum Associates) he reconstructs the unfolding history of Alaska Airlines management and the regulator responsible for monitoring it, and reveals a story that could have caught out any one of us.
Dekker plots a timeline of lubrication intervals (see graph above) and finds that they were not extended inappropriately in a single event, but gradually over a period of 34 years, from 1966 to 2000. He finds a similar story with end play checks, which provided a measure of thread wear on the nut of the screw jack. He notes that the FAA, as regulator, approved the maintenance interval extensions and that, contrary to the initial expectations of federal prosecutors, no laws were broken and no rules violated.
Instead, Dekker goes on to discover that “a complex and constantly evolving web of committees with representatives from regulators, manufacturers, subcontractors, and operators was at the heart of a fragmented, discontinuous development of maintenance standards, documents and specifications.” This is a story that sounds all too familiar to many of us, particularly those of us who serve in the larger organizations and corporations of New Zealand industry.
Using this accident as an example, Dekker presents his theory of failure drift. In summary:
1. Accidents are not normally preceded by monumentally bad decisions or deviant steps away from the ruling norm. Drifting into failure occurs slowly and by small incremental steps.  Pressures of scarcity and competition typically fuel this drift.
2. To properly understand the cause of failure drift and prevent it from recurring, it is necessary to understand how each of the incremental steps made sense to people in the organization at the time – without the benefit of hindsight.
3. With such careful study it is often plain to see that people who contribute to accidents and large engineering failures are not usually negligent or incompetent; they are normal people doing normal work in normal organizations. They are often trying to balance conflicting goals and constraints, for example cutting costs and managing resource limits on the one hand and implementing good preventative maintenance and safety on the other.

A screw jack is a simple mechanical device that can be found in many New Zealand process industries. Like many items of plant, it needs lubrication and regular maintenance downtime for continued operation. Although extending maintenance intervals looks bad on paper and embarrassing in hindsight, it is a legitimate reality for many in these tough economic times.
However, when pushing maintenance interval limits and pursuing aggressive cost-cutting strategies, it is important to factor in more thorough and extensive engineering monitoring, without which failure drift is invisible. In some cases this monitoring is needed not just over months and years but over decades, often beyond individual career lifetimes.
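One way to make this kind of drift visible is to compare every approved interval not only with the one it replaced but also with the original baseline. The short Python sketch below illustrates the idea. It is not taken from Dekker or the NTSB report: the function name, the thresholds and the interval figures are all hypothetical, chosen only to show how a series of individually modest extensions can add up to a large departure from the starting point.

```python
from dataclasses import dataclass


@dataclass
class IntervalChange:
    year: int              # year the revised interval was approved (hypothetical)
    interval_hours: float  # operating hours allowed between lubrication tasks


def drift_report(history, step_limit=0.5, baseline_limit=2.0):
    """Compare each interval with its predecessor and with the first entry.

    step_limit and baseline_limit are illustrative thresholds, not
    values from any standard or regulator.
    """
    baseline = history[0].interval_hours
    report = []
    for prev, cur in zip(history, history[1:]):
        step_increase = cur.interval_hours / prev.interval_hours - 1.0
        times_baseline = cur.interval_hours / baseline
        report.append({
            "year": cur.year,
            "step_increase": step_increase,    # change versus previous approval
            "times_baseline": times_baseline,  # cumulative change versus the start
            "step_flag": step_increase > step_limit,
            "drift_flag": times_baseline > baseline_limit,
        })
    return report


# Hypothetical history: each extension is roughly 30 percent.
history = [
    IntervalChange(1980, 500),
    IntervalChange(1985, 650),
    IntervalChange(1990, 850),
    IntervalChange(1995, 1100),
    IntervalChange(2000, 1450),
    IntervalChange(2005, 1900),
]

for row in drift_report(history):
    print(
        f"{row['year']}: +{row['step_increase']:.0%} on previous, "
        f"{row['times_baseline']:.1f}x baseline, "
        f"step flag={row['step_flag']}, drift flag={row['drift_flag']}"
    )
```

With these made-up figures no single step trips the step check, yet the last three intervals exceed twice the baseline. A review that looks only at the most recent change would see nothing unusual, which is the point Dekker makes about small incremental steps.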
Article by Andrew McGregor, director, Prosolve Ltd - mechanical and forensic engineers and project managers.









