Randomized controlled trials were used to test the efficacy of the program on two dependent variables developed for this study, a general CT test (administered before and after the training) and a biology CT test (post-test only). Average scores were reported for nearly five hundred students, and the results were highly favourable for the program, as experimental students registered higher average gain scores on the biology CT test (d = +2.32), and also outperformed the control group on the general CT test (d = +2.09).

5. Marzano (1989) reported the results of the Tactics for Thinking program for elementary school, high school and community college students in the United States. The program was designed to teach the following twenty-two thinking strategies: attention control, deep processing, memory frameworks, power thinking, goal setting, the responsibility frame, concept attainment, concept development, pattern recognition, macro-pattern recognition, synthesizing, proceduralizing, analogical reasoning, extrapolation, decision making, evaluation of evidence, evaluation of value, elaboration, nonlinguistic patterns, everyday problem solving, academic problem solving, and invention. Reported results (mostly quasi-experimental, comparing treated and untreated groups of students) were favourable; Abrami et al. were able (given published t values) to estimate five positive effect sizes, ranging from +1.68 to +2.31, for the following skills: analogical reasoning (ninth grade), extrapolation (grades seven and ten), examination of value strategy (grade eleven), and decision making strategy (grade ten). Although sufficient data was provided only for these five effect size calculations, the Marzano study also reported many other statistically significant comparisons between treated and untreated groups; of twenty-six such comparisons, only one did not produce a significant gain for the treatment group in comparison to the respective control group.
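The step of estimating an effect size from a published t value can be sketched as follows. This is a minimal illustration that assumes an independent two-group comparison and the common conversion d = t * sqrt(1/n1 + 1/n2). The function name `d_from_t` and the input values (t = 4.0, twenty students per group) are hypothetical; they are not taken from Marzano (1989), and the exact procedure followed by Abrami et al. is not described in this text.

```python
import math


def d_from_t(t: float, n_treated: int, n_control: int) -> float:
    """Estimate a standardized mean difference (Cohen's d) from an
    independent-samples t statistic and the two group sizes.

    Assumes a two-group, between-subjects comparison; this is an
    illustrative conversion, not necessarily the one used in the source.
    """
    if n_treated < 1 or n_control < 1:
        raise ValueError("group sizes must be positive")
    return t * math.sqrt(1.0 / n_treated + 1.0 / n_control)


# Hypothetical inputs, chosen only to show the arithmetic.
example_d = d_from_t(t=4.0, n_treated=20, n_control=20)
print(f"d = {example_d:+.2f}")
```

The sign of d follows the sign of t, so a comparison favouring the control group yields a negative effect size.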

6. Using a quasi-experimental design, Riesenmy, Mitchell, Hudgins and Ebel (1991) taught self-directed critical thinking to 70 fourth and fifth grade students in St. Louis public schools. They expected that students who were taught the roles of four modes of thinking (task definer, strategist, monitor and challenger) would perform better on a problem solving post-test which demanded both lateral and vertical transfer of thinking skills. This prediction was fulfilled by the results. Three groups of treated students outscored the control group; the group who wrote immediate post-tests had greatly superior scores on average (d = +2.30); a second group tested four weeks later outscored the controls (d = +2.00), and a third group, tested eight weeks later, also outperformed the control students (d = +0.68). Thus, while the effects of critical thinking training were immediately evident, the benefits seemed to degrade over time.

7. To test the effects of problem-based learning on the development of medical students' critical thinking skills, Kamin, O'Sullivan, and Deterding (2002) used digital video case simulations followed by group discussions as an instructional method. One group of students viewed the cases on video and discussed them online, a second group saw the videos and discussed them face-to-face, while a third group received a text account of the case (rather than a video) and participated in face-to-face discussions. Content analysis of the discussions was used to assess the critical thinking demonstrated by each group; results showed that video presentation seemed to facilitate critical thinking, and the online discussion group scored highest, outperforming the text group by a wide margin (d = +2.20). The authors suggested that the online discussion format provided better opportunities for the students to concentrate on articulating their ideas.

8. In a study conducted by Champion (1975) for his doctoral dissertation, ninety-seven fourth grade students in Pennsylvania received short-term instruction in distinguishing fact from inference; two quasi-experimental comparisons between treated and untreated groups demonstrated significantly higher gains (d = +2.15 and d = +1.40) in treated students' scores on the Van Pit Thinking Test.

9. Feuerstein (1999) investigated the effects of a Media Literacy program, which was intended to teach primary school students in Israel to be critical of media advertising. The instruction included activities connected with defining and researching problems, decision-making, drawing conclusions and verifying conclusions. The dependent variable for the quasi-experimental design was a language and media test, administered before and after the four-month training course, and the results showed a large increase in average scores for the treatment group, with only a slight increase for untreated control group students (d = +2.10).

10. Daley, Shaw, Balistrieri, Glasenapp, and Placentine (1999) used the construction of concept maps (as recommended by Novak and Gowin, 1984) as a method for both the teaching and the assessment of critical thinking. Fifty-four nursing students were taught to create concept maps (diagrammatic representations of conceptual frameworks showing hierarchical organization and specifying links between ideas) as part of their training in clinical practice; their early efforts were compared with their third assignment at the end of the semester-long course. Eighteen cases were selected for analysis, and a significant improvement (d = +1.90) was recorded from the first assignment to the last, which the authors claimed was "indicative of the students' increase in conceptual and critical thinking" (Daley et al., 1999, p. 45).

Studies with Negative Effect Sizes

11. In an attempt to promote CT skills in community college students studying microbiology, Norton (1985) withdrew all instructional support in laboratory work from an experimental group (except for safety supervision). While a control group worked in pairs and were guided by the instructor in following the lab manual (which gave step-by-step instructions for identifying an unknown bacterial culture), the "independent study" group worked without instructor support, selecting procedures to be performed, performing the procedures, and interpreting the results. (All students followed the manual for the first few weeks of the term to learn the procedures, and the manual was always available to the treatment group.) After three weeks, the control group outperformed the experimental group on the Watson-Glaser Critical Thinking Appraisal (WGCTA) by a small margin (d = -0.18). Norton suggested that the WGCTA might not be sensitive to increased CT skills in this setting, that learning styles may have accounted for unmeasured influences on results, and that the treatment may have been too short in duration to have had a measurable effect.

12. Stekel (1969) also developed a program of independent study in a physical sciences laboratory setting, which offered an intact experimental group of freshman non-science majors the opportunity to select the topics that they would study, and to design their own experiments. A control group underwent a conventional program that assigned a particular experiment each week; each group was pre- and post-tested using alternate forms of the WGCTA. While the author reported that both groups increased their CT scores (at the statistical significance level of p < 0.10), the control students' average gain (3.78 points) was higher than that registered by the experimental group (2.13 points), which translates to an effect size of d = -0.22.
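The arithmetic behind a gain-score effect size of this kind can be sketched as follows, assuming that d is the difference between the two groups' mean gains divided by a pooled standard deviation. The text reports the gains and the resulting d but not the standard deviation, so the sketch only backs out the value implied by those figures under this assumption; the function names are hypothetical, and the original calculation may have used a different standardizer.

```python
def gain_score_d(gain_treated: float, gain_control: float, pooled_sd: float) -> float:
    """Effect size as (treated gain - control gain) / pooled SD.

    Assumed formula for illustration; the source does not state which
    standardizer was used.
    """
    if pooled_sd <= 0:
        raise ValueError("pooled_sd must be positive")
    return (gain_treated - gain_control) / pooled_sd


def implied_pooled_sd(gain_treated: float, gain_control: float, d: float) -> float:
    """Back out the pooled SD implied by two mean gains and a reported d."""
    if d == 0:
        raise ValueError("d must be non-zero to recover a standard deviation")
    return (gain_treated - gain_control) / d


# Figures reported for Stekel (1969): treated gain 2.13, control gain 3.78, d = -0.22.
sd = implied_pooled_sd(gain_treated=2.13, gain_control=3.78, d=-0.22)
print(f"implied pooled SD = {sd:.2f} WGCTA points")

# Round trip: the implied SD reproduces the reported effect size.
print(f"d = {gain_score_d(2.13, 3.78, sd):+.2f}")
```

Under this assumption, the 1.65-point difference in gains corresponds to roughly a fifth of a standard deviation, which is why the effect is described as small.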

13. Using the California Critical Thinking Skills Test (CCTST) as their dependent variable, and employing a quasi-experimental design, Arburn and Lowell (1999) tested the idea that training in question generation would lead to improvement in CT. Two intact community college classes in Human Anatomy and Physiology were studied; the experimental group was taught to apply a set of generic question stems to construct questions on the subject matter. This technique was meant to facilitate complex thinking, and the results indicated that both groups of students (thirty-seven control and thirty-one experimental subjects) scored slightly higher on the post-test; however, the control group gained 1.27 points, compared to 0.29 points for the treated students (d = -0.24).

14. Kemp and Sadoski (1991) used training in the appropriate formation of generalizations in attempting to increase the critical thinking of high school history students. Two intact groups of eleventh grade world history students were compared, after one class received specific training in explicit methods of forming cogent generalizations. While the authors reported no significant difference between pre- and post-test Cornell Critical Thinking Test scores, the means indicate that both groups achieved lower scores on the post-test compared with their pre-test performances. The experimental group showed a decrement of 4.47 points, while the control students scored 1.75 points lower (d = -0.28).

15. In attempting to teach analogical reasoning to approximately one hundred fifth and seventh grade students, Hartman-Haas (1984) used the Children's Association Responsing Test to assess this skill in children who had been taught a "holistic approach to improving thinking" (which included training in thinking, listening, remembering, reading, writing, speaking, active class participation, attitudes, clarification, logic and argumentation). The program lasted seven months, and post-test results (compared with matched groups who had not received the training) indicated that, while treated grade seven students scored significantly higher than their control group (d = +0.59), untrained peers outscored fifth grade students who had participated in the program (d = -0.34). The author speculated, "Seventh grade students may have had more highly developed abstraction skills than fifth graders, which may be important for consolidating and demonstrating gains from programs which emphasize the development of higher-order thinking skills" (p. 20). She also pointed out that the Grade Five class suffered from a (traumatic) interruption in their studies after their teacher was injured in a traffic accident.

16. Moffett (1998) evaluated CT through the analysis of writing samples provided by students of eighty-seven teachers in Indiana who taught grades eight through twelve. In this study, teachers were provided with a monthly set of study materials and a study guide, which were designed to promote critical thinking through activities in visual, performing, and literary arts. Two cohorts of the treated teachers' classes, and two sets of control classes (approximately one thousand seven hundred students), provided pre- and post-test essays for assessment, and the results were uniformly negative; post-test average scores for all four groups were lower than pre-test performances, and in one of the two comparisons the experimental subjects showed a greater decrement than controls (d = […]).

17. Ennis, Finkelstein, Smith, and Wilson (1969) attempted to teach conditional logic to elementary school students (grades one, two and three) by presenting fifteen audiotaped lessons in as many weeks. Each tape presented a lesson in logical thinking associated with various problem-solving tasks, and was intended to teach an aspect of using conditional logic; post-tests (the Smith-Sturgeon Conditional Reasoning Test, created for the project) assessed the children's thinking skills in the areas of inversion, conversion, contraposition and transitivity. Contrary to expectations, students exposed to the lessons performed no better at the logic tests than control groups for each grade level (and the grade two control participants scored much higher than their treated counterparts, d = […]). The researchers concluded that the training they had devised was inadequately effective under the circumstances.

18. Saucier, Stevens and Williams (2000) studied one hundred twenty nursing students in Texas who were taking a course in Nursing Care of the Family. Random assignment to control and experimental groups allowed for the latter to perform clinical case studies through simulations using computer-assisted instruction (CAI), while the control group participated in the "traditional written nursing process" for fifteen weeks. While neither process was described in detail, both were reported to include the following steps: Assessment, Nursing Diagnosis, Client Goals, Planning Intervention, Actual Intervention and Evaluation of Goal Attainment. All students were pre- and post-tested using the CCTST, and the authors reported that the experimental treatment was not a significant predictor variable in a multiple regression of post-test results. Mean scores indicated that the control group outperformed the CAI students on the CCTST (d = -0.52).

19. Gibbs, Brown and Keeley (1988) reported surprising and discouraging results of their attempt to educate fifty university faculty members in critical thinking skills. Faculty from twenty-five departments at the University of Wisconsin-Eau Claire participated in a development program that was designed to "alert faculty to the need for a greater focus on higher order cognitive thinking in their classrooms" (p. 3). The program included discussing critical thinking skills and attitudes, teaching styles for facilitating CT, self-assessment of CT engagement, and pedagogical methods consistent with CT objectives. In a randomized controlled experimental design, twenty-two faculty members who had applied to the program were designated as untreated controls, and after six four-hour training sessions over two semesters, the experimental group scored lower than the control subjects on the Ennis-Weir Critical Thinking Test (EWCTT; d = -0.66). The authors, in attempting to explain the results, pointed out that, given all the planned activities, "it was impossible to build into the training sufficient time for faculty to practice critical thinking activity" (p. 13). They also noted that the EWCTT is limited in its scope (concentrating on the identification of reasoning fallacies), and may be inappropriate for measuring a "broader concept" of CT. They suggested that "compensatory rivalry" might have motivated the control group (who "may have resented" their exclusion from the program) to take more care in responding to the test.

