Analysis of Observational Studies, Experiments, and Surveys
- Overview
Introduction
We reviewed documentation from various automation-related experiments, surveys (including our own),
observation studies, and other studies that represented a combination of approaches. In
these documents we identified and recorded supportive and contradictory evidence related
to the flight deck automation issues in 24 experiments, 29 surveys, and 15 observational studies.
Strength ratings depended on the type of study reviewed, the methodology and type of subjects used in
the study, and the type of evidence yielded by the study. Details are described below.
Experiments and Observation Studies
Some of the studies we reviewed were experiments conducted in simulators or in
laboratories. Others were observation studies in which an observer watched pilots in
simulators or in flight operations. In such studies, subjects could perform in a manner
consistent with or contradictory to the problem suggested by an issue statement. We
reviewed these studies for such supportive and contradictory evidence. Where the results
of such studies were reported as percentages of subjects performing in a manner consistent
with or in a manner contradictory to an issue, we assigned strengths according to the
following table.
| Strength | % of Subjects |
| --- | --- |
| ± 5 | 90-100% |
| ± 4 | 75-89% |
| ± 3 | 50-74% |
| ± 2 | 25-49% |
| ± 1 | 1-24% |
For example, given an experiment in which 25% of the line pilot subjects did not
know how to perform an important flight management system operation, we would have recorded
two instances of evidence related to issue105 (understanding of automation inadequate),
one supportive and one contradictory. The first would have been recorded with strength +2
for those pilots who lacked understanding, the second with strength -4 for the remaining
pilots.
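The percentage table and the 25% example above can be sketched in code. This is a minimal illustration, not the tool used in the review: the function names `strength_from_percent` and `evidence_pair` are hypothetical, and treating each band as a lower-bound threshold (so that fractional percentages fall into the band below the next cutoff) is an assumption the table does not spell out.

```python
def strength_from_percent(percent: float) -> int:
    """Return the unsigned strength (1-5) for a percentage of subjects.

    Assumption: bands are treated as lower-bound thresholds, so any value
    above 0 and below 25 maps to strength 1.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    if percent >= 90:
        return 5
    if percent >= 75:
        return 4
    if percent >= 50:
        return 3
    if percent >= 25:
        return 2
    return 1


def evidence_pair(percent_consistent: float) -> list[int]:
    """Return signed strengths for one study result.

    Positive values are supportive evidence (subjects consistent with the
    issue); negative values are contradictory evidence (the remainder).
    """
    evidence = []
    if percent_consistent > 0:
        evidence.append(strength_from_percent(percent_consistent))
    remainder = 100 - percent_consistent
    if remainder > 0:
        evidence.append(-strength_from_percent(remainder))
    return evidence


# The example from the text: 25% of pilots lacked understanding.
print(evidence_pair(25))  # [2, -4]
```

Returning both signed values from one call mirrors the practice of recording two instances of evidence from a single percentage result.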
In such a study in which we could not determine exact percentages, we used the
following table to assign strengths.
| Strength | % of Subjects |
| --- | --- |
| ± 3 | > 50% (e.g., 'most') |
| ± 1 | number cannot be determined from excerpt (e.g., 'some') |
For example, if 'most' of the data from such a study supported (or contradicted) the
problem suggested by the issue statement of issue105, we would have recorded evidence for
it with strength +3 (or -3). If 'some' of the data supported (or contradicted) issue105,
we would have recorded evidence for it with strength +1 (or -1).
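The qualitative rule above amounts to a two-entry lookup. The sketch below is illustrative only: the names `QUALITATIVE_STRENGTH` and `strength_from_quantifier` are hypothetical, and only the two example quantifiers given in the table ('most' and 'some') are covered.

```python
# Unsigned strengths for results reported without exact percentages.
QUALITATIVE_STRENGTH = {"most": 3, "some": 1}


def strength_from_quantifier(quantifier: str, supportive: bool) -> int:
    """Return a signed strength for a qualitative quantifier.

    Raises KeyError for quantifiers other than 'most' or 'some', since the
    table does not say how other wordings were treated.
    """
    magnitude = QUALITATIVE_STRENGTH[quantifier.lower()]
    return magnitude if supportive else -magnitude


print(strength_from_quantifier("most", supportive=True))   # 3
print(strength_from_quantifier("some", supportive=False))  # -1
```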
Some experimental studies we reviewed tested hypotheses. For each of these we based
evidence strength on the type of subjects used in the experiment, the type of tasks the
subjects performed, and the results. We used the following table to assign strengths.
| Strength | Subjects | Tasks | Results |
| --- | --- | --- | --- |
| +5 | line pilots trained on equipment | line operations | supportive |
| +4 | line pilots trained on equipment | simulated line operations | supportive |
| +3 | other line pilots | simulated line operations | supportive |
| +3 | line pilots trained on equipment | other simulated flight tasks | supportive |
| +2 | other line pilots | other simulated flight tasks | supportive |
| +2 | GA/student pilots | simulated line operations | supportive |
| +1 | line pilots | generic automation tasks | supportive |
| +1 | GA/student pilots | other flight automation tasks | supportive |
| -1 | line pilots | generic automation tasks | contradictory |
| -1 | GA/student pilots | other flight automation tasks | contradictory |
| -2 | other line pilots | other simulated flight tasks | contradictory |
| -2 | GA/student pilots | simulated line operations | contradictory |
| -3 | other line pilots | simulated line operations | contradictory |
| -3 | line pilots trained on equipment | other simulated flight tasks | contradictory |
| -4 | line pilots trained on equipment | simulated line operations | contradictory |
| -5 | line pilots trained on equipment | line operations | contradictory |
Subjects in such a study could be line pilots actually trained on
the equipment used in the experiment, other line pilots (not trained on the equipment used
in the experiment), general aviation (GA) pilots, or student pilots. The tasks the
subjects performed in the experiments could conceivably be tasks performed in actual line
operations or they could be tasks performed in simulated line operations, simulated flight
tasks in part-task simulators, or generic automation tasks performed in laboratories.
The results could either be supportive of or contradictory to an issue.
For example, consider an experiment conducted to test the hypothesis
that flightcrews respond more quickly to air traffic control (ATC) clearances when flying
the airplane manually than when using the flight management system (FMS). The experiment
involves line pilots using a part-task simulator modeling equipment they have been trained
on. Each flies several scenarios, flying half of them manually and half with the FMS, and
responds to ATC clearances. The results show that the mean time to begin complying with an ATC
clearance is 4.5 seconds when flying manually and 8.1 seconds with the FMS, and
that the difference is statistically significant at the p = 0.0963 level (good statistical
significance for this type of experiment). This would be supportive evidence for issue161,
"When using automation, pilot response to unanticipated events and clearances may be
slower than it would be under manual control, possibly increasing the likelihood of unsafe
conditions." We would rate the strength of this evidence as +3 (line pilots trained on
equipment, other simulated flight tasks, supportive of issue).
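The hypothesis-test table can be read as a lookup keyed on subject type and task type, with the sign set by the result. The following is a sketch under stated assumptions, not the procedure actually used: the names `MAGNITUDE` and `hypothesis_strength` are hypothetical, and the +1/-1 row labelled simply "line pilots" is assumed to cover both line pilots trained on the equipment and other line pilots.

```python
TRAINED = "line pilots trained on equipment"
OTHER_LINE = "other line pilots"
GA_STUDENT = "GA/student pilots"

# Unsigned strength for each (subjects, tasks) combination in the table.
MAGNITUDE = {
    (TRAINED, "line operations"): 5,
    (TRAINED, "simulated line operations"): 4,
    (OTHER_LINE, "simulated line operations"): 3,
    (TRAINED, "other simulated flight tasks"): 3,
    (OTHER_LINE, "other simulated flight tasks"): 2,
    (GA_STUDENT, "simulated line operations"): 2,
    # Assumption: "line pilots" in the +1/-1 rows means either kind of line pilot.
    (TRAINED, "generic automation tasks"): 1,
    (OTHER_LINE, "generic automation tasks"): 1,
    (GA_STUDENT, "other flight automation tasks"): 1,
}


def hypothesis_strength(subjects: str, tasks: str, supportive: bool) -> int:
    """Return the signed strength for a hypothesis-testing experiment."""
    try:
        magnitude = MAGNITUDE[(subjects, tasks)]
    except KeyError:
        raise ValueError(f"combination not in table: {subjects!r}, {tasks!r}") from None
    return magnitude if supportive else -magnitude


# The ATC clearance example: trained line pilots in a part-task simulator.
print(hypothesis_strength(TRAINED, "other simulated flight tasks", supportive=True))  # 3
```

Combinations that do not appear in the table raise an error rather than guessing a strength.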
Surveys
In some of the surveys we reviewed, respondents were asked to rate their
level of agreement with assertions equivalent to our issue statements. We assigned
strengths based on the percentage of respondents agreeing with or disagreeing with these
assertions, according to the following table.
| Strength | % of Respondents |
| --- | --- |
| ± 5 | 90-100% |
| ± 4 | 75-89% |
| ± 3 | 50-74% |
| ± 2 | 25-49% |
| ± 1 | 1-24% |
For example, suppose that, in such a survey, 63% of the subjects agreed
or strongly agreed with the assertion "Overall, the flight management system reduces
workload." We would have recorded evidence related to issue079 (automation may
adversely affect pilot workload) with strength -3, for these results contradict the
problem suggested by issue079's issue statement. In this case we would not have recorded
supportive evidence, because the fact that some of the respondents (maybe as many as 37%)
disagreed that the FMS reduces workload does not mean that they think that it actually
increases workload.
In some of the surveys we reviewed, subjects were asked to respond to
assertions equivalent to our issue statements by giving their level of agreement as Likert
scores (e.g., 1 means strongly disagree, 5 means strongly agree). When the results were
given as a mean score as a percentage of the maximum possible score (a percentage of 5, in
the example), we used the following table to assign strengths to evidence.
| Strength | Mean Score as a % of Maximum |
| --- | --- |
| -5 | 0-9% |
| -4 | 10-19% |
| -3 | 20-29% |
| -2 | 30-39% |
| -1 | 40-49% |
| 0 | 50% |
| +1 | 51-60% |
| +2 | 61-70% |
| +3 | 71-80% |
| +4 | 81-90% |
| +5 | 91-100% |
The table is based on the assumptions that scores could range from 0% of
maximum score (for strongly disagree) to 100% of maximum score (for strongly agree), that
the survey question was worded consistently with the asserted issue, and that 50% of
maximum score was neutral (did not agree or disagree).
If the assertion to which the subjects responded was worded to be
opposite to that of the issue statement, we reversed the strength signs. We counted
evidence both for and against the issue only when response distribution information or
minimum and maximum responses were given. For example, consider a survey in
which pilot respondents were asked to give their level of agreement with the assertion
"Pilots fully understand the flight management system." If the mean response was
1.9 on a scale of 1 (strongly disagree) to 5 (strongly agree), we would have recorded
evidence with strength +2 (1.9 is 38% of 5) since the subjects as a group tended to
disagree with the survey statement, thereby agreeing with the issue statement of issue105.
If it was also reported that the maximum response was 5, we would also have recorded
evidence with strength -1, since at least one subject strongly agreed with the survey
statement, thereby disagreeing with issue105.
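The mean-score rule, including the sign reversal for oppositely worded assertions, can be sketched as follows. The function name `likert_strength` and its parameters are hypothetical, and the handling of fractional percentages (bands below 50% closed at their lower edge, bands above 50% closed at their upper edge) is an assumption. The sketch covers only the mean score, not the additional evidence recorded from minimum and maximum responses.

```python
import math


def likert_strength(mean_score: float, max_score: float = 5,
                    reversed_wording: bool = False) -> int:
    """Map a mean Likert score to a signed strength from -5 to +5.

    reversed_wording=True flips the sign, for survey assertions worded
    opposite to the issue statement.
    """
    if max_score <= 0 or not 0 <= mean_score <= max_score:
        raise ValueError("mean_score must lie between 0 and max_score")
    # Round to suppress floating-point noise at band edges.
    percent = round(100 * mean_score / max_score, 6)
    if percent < 50:
        strength = math.floor(percent / 10) - 5   # 0-9% -> -5 ... 40-49% -> -1
    elif percent > 50:
        strength = math.ceil((percent - 50) / 10)  # 51-60% -> +1 ... 91-100% -> +5
    else:
        strength = 0
    return -strength if reversed_wording else strength


# The example from the text: mean 1.9 of 5 on an oppositely worded assertion.
print(likert_strength(1.9, 5, reversed_wording=True))  # 2
```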
In surveys in which we could not determine exact percentages, we used
the following table to assign strengths.
| Strength | % of Respondents |
| --- | --- |
| ± 3 | > 50% (e.g., 'most') |
| ± 1 | number cannot be determined from excerpt (e.g., 'some') |
For example, if 'most' of the respondents from such a study agreed with
an assertion consistent with the issue statement of issue079, we would have recorded
evidence for it with strength +3. If 'some' of the respondents agreed with the assertion,
we would have recorded evidence for it with strength +1.
Results
Experiments
We found evidence for flight deck automation issues in 24 of the
experiments we reviewed. The experiments are listed alphabetically by author and include
links to the bibliographic information and evidence found in the report.
| Investigator(s) | Short Description of Experiment | |
| --- | --- | --- |
| Airbus Industrie | An experiment designed to compare the sidestick/fly-by-wire combination and conventional controls | EVIDENCE |
| Airbus Industrie | Two experimental studies comparing the performance of conventional instruments and advanced instruments in the A310. | EVIDENCE |
| Barbato, G. | An experiment designed to evaluate the impact of automatic target cueing and pilot voice recognition and automatic target cueing integrated into a single-seat fighter cockpit simulator on the pilot. | EVIDENCE |
| Beringer, D.B. | An experiment designed to explore some of the more serious and more subtle malfunctions that have a moderate probability of causing the termination of the flight in civil aviation. | EVIDENCE |
| Beringer, D.B., & Harris, H.C., Jr. | Two experiments designed to explore pilot response during autopilot malfunctions and system malfunctions that influence the autopilot that vary in how obvious and how quickly the effects are manifest. | EVIDENCE |
| Edwards, R.E., Tolin, P., & Jonsen, G.L. | A simulation study used to assess the impact of two navigation and two flight control modes on pilot visual behavior. | EVIDENCE |
| Inagaki, T., Takae, Y., & Moray, N. | An experiment designed to explore the effect of Go/Abort messages presented on the interface to aid the pilot in making correct Go/No-Go decisions. | EVIDENCE |
| Lin, H.X. & Salvendy, G. | An experiment designed to explore whether a specific class of warnings can reduce human error by increasing their level of conceptual knowledge. | EVIDENCE |
| Lozito, S., McGann, A., & Corker, K. | An experimental simulation used to investigate the effect of using the data-linked ATC and an automated flightdeck. | EVIDENCE |
| Mosier, K.L., Skitka, L.J., Heers, S., & Burdick, M. | An experimental simulation used to investigate omission and commission errors resulting from the use of automated cues as a heuristic replacement for vigilant information seeking and processing. | EVIDENCE |
| Mumaw, R.J., Sarter, N.B., & Wickens, C.D. | A fixed-base simulator experiment of Boeing 747-400 line pilots designed to address issues related to the role of pilot monitoring in the loss of mode awareness on automated flight decks. | EVIDENCE |
| Muthard, E.K. & Wickens, C.D. | An experiment designed to investigate the effects of automation on the pilot's task of plan monitoring and making plan revisions. | EVIDENCE |
| Muthard, E.K. & Wickens, C.D. | An experiment designed to investigate the effects of automation and task loading on the pilot's task of plan monitoring and making plan revisions. | EVIDENCE |
| Petridis, R.S., Lyall, E.A., & Robideau, R.L. | A study in which the activities of pilots were coded during flight and then coding was used to analyze the effect of automation | EVIDENCE |
| Pritchett, A.R. & Johnson, E.N. | A part-task simulator study conducted to explore incidents occurring in A320 aircraft involving Vertical Speed Mode | EVIDENCE |
| Pritchett, A.R., & Johnson, E.N. | An experimental simulator study was run to test pilot detection of an error in autopilot mode selection | EVIDENCE |
| Riley, V., Lyall, E., & Wiener, E. | Two experiments were designed to identify and characterize factors that influence pilot decisions about whether or not to use automation. The first was a simple computer-based experiment and the second was a series of simulator studies. | EVIDENCE |
| Riley, V.A. | Experiments designed to provide basic empirical evidence on how selected factors influence automation use decisions | EVIDENCE |
| Roscoe, A.H. | An experiment used to compare levels of workload between B767 and B707-200 | EVIDENCE |
| Sarter, N.B. & Woods, D.D. | A part-task simulator experiment designed to address issues related to pilots' proficiency in standard tasks, mental models of the FMS, and mode awareness | EVIDENCE |
| Sarter, N.B. & Woods, D.D. | An experimental simulation study of mode awareness and pilot-automation coordination on the flight deck of the A-320 | EVIDENCE |
| Skitka, L.J., Mosier, K.L., Burdick, M., & Rosenblatt, B. | Experiment to study whether two-person crews are as likely as one-person crews to commit errors due to automation bias. | EVIDENCE |
| Speyer, J.J. & Blomberg, R.D. | An experiment involving the interrogation of pilots for scaled workload assessment ratings | EVIDENCE |
| Speyer, J.J., Fort, A., Fouillot, J.P., & Blomberg, R.D. | A comparison of workload between DC-9 and A300FF using the Static Taskload Analysis | EVIDENCE |
Observation Studies
We found evidence for the flight deck automation issues in 15 of the
observation studies we reviewed. The observation studies are listed alphabetically by
author and include links to the bibliographic information and evidence found in the
report.
| Investigator(s) | Short Description of Observation Study | |
| --- | --- | --- |
| Billings, C.E. | Presents principles and guidelines for human-centered automation in aircraft and in the aviation system. | EVIDENCE |
| Bruseberg, A., & Johnson, P. | Discusses human-computer collaboration and its relationship to different foci that can be used to model temporal aspects of tasks in dynamic and complex work situations. | EVIDENCE |
| Damos, D.L., John, R.S., & Lyall, E.A. | An observational study designed to explore the relationship between the level of automation in the flight deck and the amount of time the pilot spends performing specific activities. | EVIDENCE |
| Damos, D.L., John, R.S., & Lyall, E.A. | An observational study designed to investigate the frequency of 23 activities that were varied as a function of the level of automation in the flight deck. | EVIDENCE |
| Hughes, D. | Observation made during visit to TWA and Air Canada training centers in St. Louis and the flights on which the author rode jumpseat | EVIDENCE |
| Norman, S.D., & Orlady, H.W. | A discussion of the major ideas and concepts presented in the panels and papers about automation in the air transport system | EVIDENCE |
| Orlady, H.W. | A discussion of training issues for advanced technology aircraft | EVIDENCE |
| Palmer, E.A. & Mitchell, C.M. | Flight deck descent procedures were developed for a field evaluation of the CTAS Descent advisor conducted in the fall of 1995. | EVIDENCE |
| Sarter, N.B. & Woods, D.D. | An observation of simulator check ride in the process of transitioning to the B-737-300 aircraft | EVIDENCE |
| Sarter, N.B. & Woods, D.D. | A discussion about mode awareness problems in glass cockpits | EVIDENCE |
| Sarter, N.B. & Woods, D.D. | A study of the pilots' transition into the advanced technology B737-300 from non-advanced aircraft. | EVIDENCE |
| Wiener, E.L. | An observation made while author was riding in the jumpseat of a glass cockpit aircraft | EVIDENCE |
| Wiener, E.L. | A discussion of the management of human error in the cockpit. | EVIDENCE |
| Wise, J.A., Abbott, D.W., Tilden, D., Dyck, J.L., Guide, P.C., Ryan, L. | A discussion with workshop participants used to investigate the impact of automation in corporate aviation cockpits | EVIDENCE |
| Woods, D.D. | A discussion of intelligent interfaces | EVIDENCE |
Surveys
We found evidence for the flight deck automation issues in 29 of the
surveys we reviewed. The surveys are listed alphabetically by author and include links to
the bibliographic information and evidence found in the report.
| Investigator(s) | Short Description of Survey | |
| --- | --- | --- |
| Airbus Industrie | A detailed survey given to teams of visiting aircrew about the A320 sidestick/fly-by-wire 'proof of concept' | EVIDENCE |
| Braune, R. | A survey of Deutsche Lufthansa pilots flying in a mixed fleet of 737-200/ -300 | EVIDENCE |
| Bruseberg, A., & Johnson, P. | This paper discusses the merits of drawing analogies between human-computer interaction and human-human collaboration in the light of the ever-advancing capability of computer systems. | EVIDENCE |
| Curry, R.E. | A survey of pilots during the introduction of an advanced technology B-767 aircraft | EVIDENCE |
| Gras, A., Moricot, C., et al. | A survey of pilots working for French airline companies to assess their attitudes about flight deck automation | EVIDENCE |
| Hutchins, E., Holder, B., & Hayward, M. | A survey of line pilots' attitudes about autoflight automation. | EVIDENCE |
| James, M., McClumpha, A., Green, R., Wilson, P., & Belyavin, A. | A survey of UK commercial pilots used to assess their opinions and attitudes toward advanced automated aircraft | EVIDENCE |
| Last, S., & Alder, M. | A survey of pilots to determine the views of line pilots about the lack of feedback movement of A320 thrust levers | EVIDENCE |
| LUFTHANSA Airline | A survey of pilots about general characteristics of airplane and electronic interfaces | EVIDENCE |
| Lyall, B., Wilson, J., & Funk, K. | A survey of pilots for evidence related to flight deck automation issues. | EVIDENCE |
| Lyall, E.A. | A survey of 737 pilots to assess the effects of allowing pilots to concurrently fly two derivatives of the Boeing 737 | EVIDENCE |
| Lyall, E., Niemczyk, M. & Lyall, R. | A survey of aviation experts used to compile evidence for problems or concerns about flightdeck automation. | EVIDENCE |
| McClumpha, A.J., & James, M.R. | A survey of UK commercial pilots used to assess their opinions and attitudes toward advanced automated aircraft | EVIDENCE |
| Morters, K. | A survey of pilots used to examine issues concerning the pilot-automated flight-deck interface on the B767 | EVIDENCE |
| Noyes, J.M. & Starr, A.F. | A survey of commercial flight crews to identify user requirements for designing the next generation of warning systems. | EVIDENCE |
| Orlady, H.W., & Wheeler, W.A. | A survey of pilots of advanced technology aircraft used to investigate training and maintenance of basic flying skills | EVIDENCE |
| Rash, C.E., Adam, G.E., LeDuc, P.A., & Francis, G. | The study identified which aspects of the two cockpit designs were most favorable or troublesome to the pilots, and identified differences in opinions across pilots who flew traditional or glass cockpit designs. | EVIDENCE |
| Rudisill, M. | A survey of line pilots' attitudes about flight deck automation - I | EVIDENCE |
| Rudisill, M. | A survey of line pilots' attitudes about flight deck automation - II | EVIDENCE |
| Sarter, N.B. & Woods, D.D. | A survey of pilots' experiences with training for and the operation of the A320 automation | EVIDENCE |
| Sarter, N.B. & Woods, D.D. | A survey of B-737-300 pilots' attitudes about FMS | EVIDENCE |
| Sherman, P.J., Helmreich, R.L., & Merritt, A. | Survey of multi-national airline pilots' attitudes toward automation. | EVIDENCE |
| Speyer, J.J. | A survey of pilots about the quality of the man-machine interface | EVIDENCE |
| Stefanovich, Y., & Thouanel, B. | An exchange of opinions about the A320 among pilots | EVIDENCE |
| Wiener, E.L. | A study of the pilots' transition into the advanced technology B757 from non-advanced aircraft. | EVIDENCE |
| Wiener, E.L. | A longitudinal survey of pilots about transitioning from a traditional technology aircraft to a highly automated derivative model | EVIDENCE |
| Wiener, E.L., Chidester, T.R., Kanki, B.G., Palmer, E.A., Curry, R.E., & Gregorich, S.E. | A survey of pilots to assess subjective workload in a LOFT simulator experiment | EVIDENCE |
| Wiener, E.L., Chidester, T.R., Kanki, B.G., Palmer, E.A., Curry, R.E., & Gregorich, S.E. | A questionnaire given to pilots designed to elicit their opinions, experience level, and specific information and viewpoints on the DC-9 and MD-88 | EVIDENCE |
| Wise, J.A., Abbott, D.W., Tilden, D., Dyck, J.L., Guide, P.C., Ryan, L. | A survey of pilots who regularly fly corporate missions used to obtain information on various aspects of cockpit automation | EVIDENCE |