Higher-order cognitive constructs must be definable and assessable in the context of processes and outcomes that regularly occur in the classroom (e.g., class discussions, teacher-made tests, and student projects). Applying curriculum-based assessment to higher-order thinking would first require a detailed analysis of the particular skills involved in higher order thinking. These skills would be transformed into specific objectives ordered from most basic to most advanced. Using this approach, teachers could assess student progress on a hierarchy of higher-order thinking skills. Progress would be assessed in terms of both the nature and quantity of thinking skills mastered. (Williams, 1999, p. 423)

Metacognition

Flavell (1979) characterized metacognitive knowledge as ideas and beliefs about our cognitive processes (including ideas about ourselves, other people, learning tasks, and strategies), and he described metacognitive experience as "any conscious cognitive or affective experiences that accompany or pertain to any intellectual exercise" (p. 906). Metacognitive knowledge directs how we manage our intellectual tasks and informs our estimate of how likely we are to succeed, while metacognitive experiences affect which goals we choose and whether or not we persist in pursuing them. Bandura (1977) described self-efficacy (including efficacy expectations and outcome expectations) as a set of psychological variables that relate to successful (or unsuccessful) learning, and Bandura (1986) elaborated the metacognitive self-regulatory mechanisms (self-observation, judgment of performance, and self-reaction) that affect learning processes.

According to Bandura (1986), self-observation (or self-monitoring) does not simply refer to noticing what occurs; it is closely bound up with judgment and self-reaction, since we may monitor both the quality of our learning performance and the rate at which we are learning. Such observations are measured against our prior standards of performance, providing the data for self-judgment and self-reaction. Bandura noted that high self-monitoring is associated with high motivation for the task, and that monitoring close in time to the task is more useful for learning than monitoring that is removed in time.

Judgments of one's performance relate to internal standards, which are the product of social influences (either modelled by others or directly taught); and the "development of evaluative standards and judgmental skills establishes the capacity for self-reactive influence" (Bandura, 1986, p. 350).

Facione's (1990) Delphi Committee elaborated a similar scheme, describing two self-regulation "sub-skills," namely self-examination and self-correction. Self-examination refers to a complex set of processes, which include the following abilities:

... to reflect on one's own reasoning and verify both the results produced and the correct application and execution of the cognitive skills involved... to make an objective and thoughtful meta-cognitive self-assessment of one's opinions and reasons for holding them... to judge the extent to which one's thinking is influenced by deficiencies in one's knowledge, or by stereotypes, prejudices, emotions or any other factors which constrain one's objectivity or rationality... [and] to reflect on one's motivations, values, attitudes and interests with a view toward determining that one has endeavored to be unbiased, fair-minded, thorough, objective, respectful of the truth, reasonable, and rational in coming to one's analyses, interpretations, evaluations, inferences, or expressions. (Facione, 1990, pp. 10-11)

Facione (1990) describes self-correction as the ability to react to errors revealed by self-examination, and specifically to "design reasonable procedures to remedy or correct, if possible, those mistakes and their causes" (p. 11). Bandura (1986) devotes considerable attention to the "self-reactive influences" involved in self-regulation, and his social cognitive analysis of the processes involved in self-correction should be of particular interest to professional educators. As Bandura notes, self-observation not only "provides the information necessary for setting realistic performance standards and for evaluating ongoing changes in behaviour" (Bandura, 1986, p. 337), but also serves as a dynamic source of new information that affects action and adaptation. In particular, self-analytic and self-diagnostic processes help us to learn about our affective and cognitive reactions to various types of events, and such efforts are instrumental in determining the conditions under which we perform well or poorly. Furthermore (as recommended by Neuringer, 1981), "By systematically varying things in their daily lives and recording the accompanying personal changes, people can discover how these factors influence their psychological functioning and sense of well-being" (Bandura, 1986, p. 338). Self-observation also contributes to self-motivation: "Goal-setting enlists evaluative self-reactions that mobilize efforts towards goal attainment... A number of factors, some relating to the persons, others to the behavior, and still others to the nature and type of self-monitoring can affect the likelihood that observing how one behaves will enlist self-reactive influence" (Bandura, 1986, p. 338).

Bandura (1986) provides an extensive set of descriptions of self-judgmental and self-reactive functions, and of the influences involved in the psychology of self-regulation. Judgment is closely tied to the development of individual standards, which are acquired through social modelling. Aside from ideal and theoretical standards, Bandura notes that comparison with others provides a convenient basis for self-judgment, and that it is important for students to recognize the value of performing to high social and academic standards (rather than being content to expend less effort by emulating the results of their less accomplished peers). He points out that an important part of self-evaluation is the development of appropriate norms that reflect one's social group and also take into account one's previous performance in the milieu. Other influences on self-reaction include the value placed on the activities being learned, with high valuation associated with maintaining or increasing one's welfare and self-esteem. "Thus, the more relevant one's performances are to one's sense of personal adequacy, the more likely self-evaluative reactions are to be elicited in that activity" (Bandura, 1986, p. 349). Self-reaction is also affected by how we perceive the determinants of our behaviour: attributing success to our own abilities and effort, rather than to external factors over which we have less control, results in greater self-satisfaction.

Bandura warns, "[I]nternalization of dysfunctional standards of self-evaluation can serve as a source of chronic misery" (Bandura, 1986, p. 357). Accordingly, to be most effective, educators must attend to the metacognitive functions that support the maintenance of coherent frameworks of ideas, whether those ideas concern the content of academic disciplines or more general human concerns (such as maintaining our social relationships, or applying for a job). Sternberg (1987) argues that schools should prepare students for life by teaching cognitive skills, including knowledge acquisition skills, performance skills, and metacognitive self-regulation. To succeed in higher learning, students must learn to combine these types of learned abilities: they must use workable learning strategies, develop "appropriate" mental representations of things and processes in the world, and be motivated to use these thinking skills.

Aulls and Shore (2008) point out, "Forms of traditional instruction are not likely to promote students to learn to be inquirers... Students are expected to be passive more than active learners who acquire factual and conceptual knowledge by hearing it or seeing it rather than thinking and doing" (pp. 15-16). These authors recommend that inquiry be treated by educators as a curricular imperative, and that teachers use inquiry-based and student-centred methods in their practices. "In order to be engaged in inquiry learning... [i]nstruction must be more centred on the learner than the teacher... For students to become more active learners, they must take on more responsibility for what and how to learn" (p. 9).

Academic and practical understandings of metacognition and metacognitive self-regulation (MSR) are essential to educators and educational researchers who seek to facilitate instruction in the definition and resolution of complex, ill-defined (academic or practical) problems. One aim of research in metalearning and metacognition is to raise teachers' and students' awareness of the relevance of metacognitive functions to learning and cognitive development; eventually, metacognitive self-regulatory functions may become well understood and widely used throughout our schools and workplaces.

Empirical Research on Teaching Thinking

In a recent effort to discover which instructional interventions are effective in facilitating the development of critical thinking (CT) skills and dispositions, and under what conditions, Abrami, Bernard, Borokhovski, Wade, Surkes et al. (2008) systematically reviewed over 3,700 abstracts and retrieved 1,300 articles and reports for closer analysis. Applying the method of quantitative meta-analysis to extract information from empirical research reports, we analyzed one hundred seventeen articles (dated from 1953 to 2003) that contained enough statistical data for us to calculate, or to estimate, effect sizes (in terms of Cohen's d, the mean difference between two groups divided by the pooled standard deviation). Some papers reported the results of more than one comparison, and we calculated one hundred sixty-one effect sizes that examined CT skills (or dispositions) from experimental, quasi-experimental, and pre-experimental studies. These effect sizes ranged from +2.90 to -1.36 (where a negative effect size indicates that the control group scored, on average, higher than the treatment group). There were one hundred thirty-seven positive effects and twenty-four negative ones, and the mean of these effect sizes (without weighting by sample size) was +0.569. This section presents a qualitative review of the twenty papers that produced the ten highest and the ten lowest effect sizes with regard to CT skill development, comparing the features of these studies with one another to see whether any pedagogical or methodological features tend to predominate in the "successful" studies (those with high positive effect sizes). Such a review may reveal features of the successful interventions that distinguish them from the unsuccessful ones.
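To make the definition of Cohen's d concrete, the short Python sketch below computes it for two groups and then takes a simple unweighted mean across several comparisons. It is illustrative only: it assumes the conventional pooled standard deviation (sample variances with n - 1 denominators, pooled over n_a + n_b - 2 degrees of freedom), which the text does not spell out, and it does not apply Hedges' small-sample correction or any pre-test adjustment that Abrami et al. (2008) may have used. The function names are my own, and all scores and effect sizes in it are invented.

```python
import math
from statistics import mean, variance


def pooled_sd(group_a, group_b):
    """Pooled standard deviation of two independent groups (sample variances, n - 1)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return math.sqrt(pooled_var)


def cohens_d(treatment, control):
    """Mean difference divided by pooled SD; negative when the control group scores higher."""
    return (mean(treatment) - mean(control)) / pooled_sd(treatment, control)


def unweighted_mean_effect(effect_sizes):
    """Simple average: every comparison counts equally, regardless of its sample size."""
    return mean(effect_sizes)


if __name__ == "__main__":
    # Hypothetical post-test scores, invented for illustration (not from any reviewed study).
    treatment = [14, 16, 15, 18, 17]
    control = [12, 13, 15, 11, 14]
    print(f"d = {cohens_d(treatment, control):+.2f}")

    # Hypothetical effect sizes from several comparisons.
    print(f"mean d = {unweighted_mean_effect([0.8, 0.4, -0.2]):+.3f}")
```

Because the mean is unweighted, a small study counts as much as a large one; a weighted meta-analytic mean would instead give studies with larger samples more influence.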
In the following sections I will summarize these studies, evaluate their study features, and discuss their relevance; I will also consider the difficulties involved in drawing general conclusions from such endeavours.

Review of Twenty Selected Studies

Studies with Positive Effect Sizes

1. Annis and Annis (1979) showed that Ethics students significantly outscored Introduction to Philosophy students and the Control class in Deduction and Interpretation, and that Logic students significantly outperformed the other three groups in Inference, while no significant differences were found in Recognition of Assumptions or Evaluation of Arguments. Since only post-test mean scores were provided, Abrami et al.'s calculations did not take pre-test scores into account, and our comparison indicated that Logic students outperformed the Control subjects by a margin of nearly three standard deviations (d = +2.91).

2. McCarthy-Tucker (1998) reported that high school freshman and sophomore students in English and Algebra who received instruction in formal logic showed much greater improvement (from pre-test to post-test) on two standardized measures of thinking, the Test of Logical Thinking (TLT) and the Content Specific Test of Logic (CSTL), than untreated control subjects (d = +2.54 and d = +0.59, respectively).

3. In a pre-experimental (one-group pre-test and post-test) study of in-service teachers, Robinson (1987) worked with eighteen educators on encouraging their elementary (kindergarten to grade three) students to think interpretively, reflectively and intelligently, and to acknowledge complexity. The teacher training program emphasized using questioning to encourage thinking, modelling (personifying listening, problem-solving, calmness, understanding and enthusiasm), and facilitating logical thought.

Teachers' mastery of CT teaching skills was evaluated by trained observers on the basis of classroom performance, using a checklist of fourteen ratings, nine of which were provided in an appendix (apparently five were inadvertently omitted): fosters a climate of openness, encourages student interaction/co-operation, demonstrates attitude of acceptance, models reasoning strategies, encourages transfer of cognitive skills to everyday life, elicits verbalization of student reasoning, probes student reasoning for clarification, encourages students to ask questions and promotes salient reflection of ideas. Raw pre- and post-test scores for each participant were provided, and they showed significant positive gains in mastery of teaching thinking skills; in this pre-experimental paradigm, the gain scores translated to an effect size of d = +2.50. For the teachers' elementary school pupils, no statistical measures were calculated, but summary results of pre- and post-evaluation thinking skills tests (knowledge, comprehension, application, analysis, synthesis and evaluation) were provided, and positive gains were reported.

4. Zohar, Weinberger and Tamir (1994) developed the Biology Critical Thinking Project to support seventh-grade biology students in Israel in developing their CT skills (which included recognizing logical fallacies, distinguishing between experimental findings and conclusions based on findings, identifying tacit and explicit assumptions, avoiding tautologies, isolating variables, testing hypotheses, and identifying relevant information).

