Speech to IASS Miami 2015
Chair, Transportation Safety Board of Canada
Miami, Florida, 2 November 2015
Check against delivery.
Good morning. Thank you very much for that kind introduction. It is a real pleasure to be here.
I'd like to start today by acknowledging the excellent work being done by Flight Safety Foundation. At the TSB, we regularly cite your research in our reports. The ALAR toolkit (Footnote 1), for example, was referenced in our report last year on the crash of a Boeing 737 at Resolute Bay, in Canada's Arctic. Your Operator's Guide to Human Factors in Aviation was referenced in our investigation (Footnote 2) of a fatal Cessna Cardinal crash just outside Ottawa in 2011.
We also appreciate the frequent exposure that your magazine, AeroSafety World, has given to the TSB's reports. It helps us spread the word about the safety lessons we have learned, which may be of value to operators and regulators in other countries.
For those in today's audience who may not be familiar with the Transportation Safety Board of Canada, we are an independent federal agency, one whose only goal is to advance transportation safety. It's been our job, for 25 years now, to conduct thorough investigations into air, marine, rail and pipeline occurrences. We identify the multiple causes and contributing factors, and then we make recommendations aimed at preventing such accidents from happening again. We are not a court; we do not assign blame or determine civil or criminal liability. Nor are we the regulator; we don't make laws, and our findings are not binding on those who do. BUT. When we identify a risk, or a safety deficiency that needs to be addressed, we make sure to communicate that information—loud and clear—to those best placed to do something about it.
Sometimes, though, the recommendations we make face resistance from legislators, from industry, or from both. In part that's because change is seldom easy, and it often requires significant investment. So in 2010, when our investigators reported seeing many of the same causal factors repeated in investigation after investigation, we created the safety Watchlist: a list of the issues posing the greatest risk to Canada's transportation system. Our goal was to shine a light on those issues, which were proving stubbornly, and often fatally, resistant to change, and in doing so to encourage industry and regulators to work together. We have updated the list twice, most recently in November 2014, to highlight ongoing issues such as approach-and-landing accidents and runway incursions. These problems are well known, but stakeholders, globally and not just in Canada, have been less than successful at fixing them.
There's also a third issue on the TSB's latest Watchlist that applies to aviation as well as to other transportation modes, and it's the one I'd like to focus on today: safety management and oversight. To explore it, I'm going to pose two very simple questions:
- Why are some companies better at managing safety risks than others?
- Why does the TSB support the widespread implementation of safety management systems, or SMS?
Answering the first requires a little background on how our work has evolved, and on how we think about accident causation.
It used to be, for instance, that the focus of accident investigation was on mechanical breakdowns. Then, as technology improved, investigators started looking at the contribution of crew behaviour and human-performance limitations. Nonetheless, people still thought things were “safe” so long as everyone followed standard operating procedures. Don't break any rules or regulations, went the thinking. Make sure the equipment isn't going to fail. And above all: Pay attention to what you're doing and don't make any “stupid” mistakes.
That line of thinking held for quite a while. In fact, even today, in the immediate aftermath of an accident, people on the street and in the media still think accidents begin and end with the person or people in the flight deck. So they ask if an accident was caused by “mechanical failure” or “human error.” Or they jump to false conclusions and say, “Oh, this was caused by someone who did not follow the rules,” as if that were the end of it. Case closed.
But it's not that simple. No accident is ever caused by one person, or by one factor.
That's why our thinking had to evolve. And so it has become critical to look deeper into an accident, to understand why people make the decisions they make. Because if those decisions and those actions made sense to the people involved at the time, they could also make sense to others, in future. In other words, if we only focus on “pilot error,” we miss out on understanding the context in which the pilots were operating. And that means shifting our focus to look at the organization.
In fact, when you get right down to it, many, if not most, accidents can be attributed to a breakdown in the way organizations proactively identify and mitigate hazards and manage risks. And so the answer to the first question I posed, why some companies are better at managing safety risks than others, is this: they're better at it because their thinking about accident causation has evolved, and the way they deal with it has evolved as well. Their focus is no longer on the old maxim of “blame and punish” or “blame and retrain.” Instead, they look at the organizational factors that may have contributed: company policies and procedures, attitudes, and the organization's safety culture. They also look not just at how hazards are identified, but at how they are reported to senior management, and then at how those reports are received and acted upon. All of these things can have a tremendous impact on the operating context of an occurrence.
Allow me to offer an example where these concepts played a role.
On March 13, 2011, a Sunwing Airlines Boeing 737 was departing Toronto's Lester B. Pearson International Airport with 189 passengers and a crew of seven. During the early-morning takeoff run, at about 90 knots indicated airspeed, the autothrottle disengaged after takeoff thrust was set. As the aircraft approached the critical engine failure recognition speed, the first officer, who was the pilot flying, noticed an AIRSPEED DISAGREE alert and transferred control of the aircraft to the captain, who continued the takeoff. During the initial climb, at about 400 feet above ground level, the aircraft received a stall warning (stick shaker), followed by a flight director command to pitch to a 5° nose-down attitude. Fortunately, the takeoff was being conducted in daylight visual conditions, which allowed the captain to determine that the flight director commands were erroneous. The captain therefore ignored them and maintained a climbing attitude. The crew then advised ATC of a technical problem requiring a return to Toronto.
Now, some may consider this “no big deal,” just something that occasionally happens, in this case because of a failure in the pitot-static system. Yes, it resulted in inaccurate airspeed indications, stall warnings, and misleading commands on the flight instruments. But since the pilots handled it effectively, nothing serious came of it. There was, after all, no damage to the aircraft, and no one on board was injured.
But what if the takeoff had taken place in darkness, or in instrument meteorological conditions (IMC), when the captain could not have so easily determined that the airspeed indications were unreliable?
And that brings me to the second question I asked earlier: Why does the TSB back the implementation of safety management systems, or SMS?
To answer that, let's look at that same occurrence from an SMS perspective. After all, a mature, robust SMS, while it cannot be expected to predict and deal with every possible occurrence in advance, is supposed to have proactive processes to identify and mitigate hazards, and it should also have reactive processes to learn safety lessons from incidents when they do happen.
Now, in this case, back in September 2010, six months before the occurrence, Boeing had issued an advisory to 737NG operators regarding flight crew and airplane system recognition of, and response to, “erroneous main display airspeed situations,” which could compromise the safety of a flight. I'm going to quote directly from that advisory:
“The rate of occurrence for multi-channel unreliable airspeed events, combined with probability of flight crew inability to recognize and/or respond appropriately in a timely manner, is not sufficient to ensure that loss of continued safe flight and landing is extremely improbable.” (Footnote 3)
This is a classic example of what some researchers call a “weak signal.” Even though Boeing was pointing out that such events were occurring more frequently than predicted, Sunwing did not treat the notice as identifying a hazard that should be analyzed by its proactive process. As a result, Boeing's advisory was not circulated to flight crews.
As for the operator's reactive processes… well, even after the occurrence, the operator did not see any hazards worthy of analysis by its SMS. At least, not initially. Again, the effective performance of the crew masked the underlying risks, so much so that Sunwing delayed reporting this incident to the TSB because, despite its potentially serious consequences, it did not recognize the event as a reportable aviation occurrence.
The theme of this year's IASS event is “Excellence in Flight Safety—Journey to Resilience.” Resilient companies, it goes almost without saying, aren't just those that have the fewest accidents. Resilient companies are the ones that recognize the changing conditions and the hazards in the world—and then adapt accordingly.
From my reading, there appear to be common characteristics shared by high-reliability organizations, by resilient organizations, and by organizations that effectively manage safety risks. One of these is something called a “mindful infrastructure” (Footnote 4), which decision-makers and organizations are encouraged to develop. This involves tracking small failures, resisting oversimplification, taking advantage of shifting locations of expertise, and listening for, and heeding, those weak signals.
There are also a number of organizational factors that have been identified as indicative of a strong safety culture, including the following (Footnote 5):
- a strong organizational emphasis on safety;
- congruence between tasks and resources;
- a culture that encourages effective and free-flowing communications;
- clear mapping of its safety state; and
- a learning orientation.
SMS (and I'll be explicit, in case anyone is missing the parallel I'm drawing here) is also about exactly that: putting in place a systematic, formal process to recognize hazards, analyze them, and implement mitigating measures to reduce the risks they pose. Put another way, the features that make an organization reliable and resilient, such as being proactive and flexible, listening to weak signals, and letting ideas and communication flow freely, not just from the top down but from the bottom up, are the same features that make for a robust SMS.
And that is why the TSB backs the implementation of SMS: properly implemented, it can significantly improve resilience by giving operators a very powerful tool to anticipate and manage safety risks. In short, it is a way to look at all your operations in order to find trouble before trouble finds you.
Now, no SMS is perfect, and no company is perfectly resilient. It's impossible to predict every conceivable condition or accident in advance, and even the most robust SMS is subject to the same pressures that can affect any other corporate initiative. I'm talking about corporate attitudes, the level of commitment from senior management, competing priorities, finite budgets, etc.
In the case of the takeoff I just described, the operator had an SMS, but the hazards weren't initially recognized as worthy of analysis, nor was the incident later deemed significant enough to warrant further examination. Now, I want to be clear that the TSB is not blaming the operator; unfortunately, this happens more often than we'd like. In this case, the hazards weren't identified because, as I said, the crew's effective performance masked the underlying risks, but there are other possible reasons too. For example, at some companies, SMS may be a new concept, and those tasked with its development and implementation may still be learning. In other cases, an SMS may be put in place only grudgingly, to comply with legislation, in which case it may exist on paper only, and not at all in day-to-day operations.
As more and more operators transition to safety management systems, the regulator must recognize that some operators may not always identify and mitigate hazards as they should. The regulator (for us, Transport Canada) must therefore adjust its oversight activities to be commensurate with the maturity of an operator's SMS. Some companies may just need a regulatory push. Others may need more frequent inspections or audits, to ensure that the systems they have in place are working. This is part of the third Watchlist issue I mentioned earlier, the role of government oversight in managing safety, and I think it is going to be one of the challenges facing the transportation industry over the next few years. Don't get me wrong: air travel, particularly scheduled operations, is, and continues to be, very safe. I'm not saying otherwise. But as we strive to constantly improve that already admirable safety record, this is one area where we can do more work.
It's one thing, though, to talk about improving an SMS. It's something else to offer a concrete proposal for how to do it. I'd like to close by doing just that, with a subject that I know will be of interest to the people in this room. It's something I spoke about publicly for the first time this past summer, in a speech to the Air Line Pilots Association, and something the TSB is very interested in exploring as we go forward. I'm talking specifically about the use of cockpit voice recordings, and our belief that they can be useful in the context of a proactive, non-punitive SMS.
As the TSB has said many times over the years, when an accident occurs, a recording of the communication between the crew is often critical to our understanding of what happened, and—again—why. Currently, in Canada, this information is available only to TSB investigators, and it is privileged under legislation. We use these recordings to identify safety deficiencies, and only safety deficiencies. The TSB does not assign fault or determine criminal or civil liability.
But having access to this information, especially in conjunction with information from flight data recorders, may also be very useful for the operators involved: not, I need to stress, for punitive purposes, but to better understand what happened and why, and especially for use in the context of an effective SMS.
Because really, it's all about understanding the why, and the more information operators have, the better they can do that. Companies can be proactive, working collaboratively with their employees and employee representatives to identify trends, to take a closer look at how severe a problem may be, and to determine whether those problems are internal or external. For example, is additional training required for personnel, or are changes needed to SOPs?
We're entering some new territory with all of this and, obviously, in order for such a big change to happen, there would need to be an amendment to our legislation, one that allows the recordings to be shared and prescribes the appropriate safeguards and the exact purpose and manner in which that might happen.
In the rail mode in Canada, this discussion between the TSB, the regulator and the industry is already underway. In fact, earlier this year, we announced that the TSB will be working with Transport Canada to conduct a joint safety study on the use of locomotive voice and video recorders—a study that will, in part, identify and assess related technology issues, and the associated legislative and regulatory considerations. Since then, the regulator has announced that the results of that study will help inform the basis of any regulatory or legislative changes that may be developed.
Again, though, I want to be clear: regardless of the study's results, and regardless of what changes come from it, if the same were to happen in aviation, it would need to be made very clear that flight deck recordings must continue to be protected from punitive use.
Now, I am aware that this will be considered controversial. There are certainly many issues involved. But it's time we at least began the conversation. It is my hope, and indeed my belief, that this could lead to some big advancements. Because as more operators, not just in Canada but around the world, realize that understanding human factors (understanding the why) is critical if they are to prevent accidents, and if voice and video recordings become part of a proactive, non-punitive SMS, then our work is about to take a big leap forward in terms of safety.
The payoff—safer skies for everyone—is a goal we can all agree on.
That's food for thought.
- Footnote 1
Approach and Landing Accident Reduction (ALAR) toolkit.
- Footnote 2
TSB Investigation Report A11O0239.
- Footnote 3
Source: Boeing Fleet Team Digest 737NG-FTD-34-10006, ATA 3400-00, “Recognition and Response to Erroneous Airspeed by Flight Crew and Airplane Systems,” 13 September 2010.
- Footnote 4
Weick, K. E. & Sutcliffe, K. M. (2007). Managing the Unexpected: Resilient Performance in an Age of Uncertainty. (2nd ed.) John Wiley & Sons Inc.
- Footnote 5
Westrum, Ron. (1999). Organizational Factors in Air Navigation Systems Performance Review Paper for NAV CANADA.