INDIVIDUALIZING ELEMENTARY GENERAL MUSIC INSTRUCTION: CASE STUDIES OF ASSESSMENT AND DIFFERENTIATION

By Karen Salvador

A DISSERTATION Submitted to ...
MENC also recommended that the assessment process be transparent and open to review by interested parties. Transparency refers to a teacher’s willingness to share information about the material being assessed, how it was measured, and how the assessments were scored. For example, if a parent wanted to know how a teacher determined that their child was a “limited range singer,” the teacher could share the rubric used in evaluating the child’s performance or even a video or audio recording of the child singing. If teachers design valid, authentic measures that reflect the standards and benchmarks that were taught, transparency becomes easier to achieve.
Although MENC specifies that assessments must be both reliable and valid, reliability is a necessary precursor to validity. That is, if an assessment tool is not reliable (does not yield the same or similar results in varied trials), it cannot be valid (measure what it purports to measure).
Therefore, the remainder of this study will refer to validity rather than both reliability and validity.
A thorough discussion of the first guideline, that assessment should be standards-based, is beyond the scope of this paper. Not all teachers and researchers agree that a standards-based assessment model is the right path for music education. I have already cited Eisner’s trepidation about the short-term appeal of standards-based education and his suggestion that excellent education should result in increased diversity of outcomes as well as a higher mean performance level among students. Suffice it to say that this paper is focused on how assessments allow individual teachers to personalize teaching in the moment to meet the unique music learning needs of each student.
MENC’s second recommendation—assessment should support, enhance, and reinforce learning—defines the optimal role of assessment in the elementary general music classroom as a natural outgrowth of instruction. Brummett and Haywood (1997) proposed conceptualizing teaching, learning, and evaluating as interrelated rather than separate. That is, although all of these activities occur in each music class, on some days, the balance shifts more toward one or another. The game of chess may be a useful metaphor: the player (teacher) routinely checks in with each piece (student) to ascertain needs and create strategies for moving forward. Similar to chess pieces, our students come with different needs and abilities, but, when guided by an expert, each contributes his or her own strengths. Using a variety of assessments to check in with each of the “chess pieces” allows each child to move forward in the way that is best. The problem with this analogy is that the chess pieces have no self-determination and individual pieces are sacrificed in order to win the game. Unlike a chess player, a teacher values what individual children bring to the teaching/learning transaction and hopes that each child will learn and grow.
Purposes and types of assessment. Miller, Linn, and Gronlund (2009) described the
following purposes and types of assessment when discussing general education classrooms:
In any classroom, there are substantial individual differences in aptitude and achievement. Thus, it is necessary to study the strengths and weaknesses of each student in a class so that instruction can be adapted as much as possible to individual learning needs. For this purpose (a) aptitude tests provide clues concerning learning ability, (b) reading tests indicate the difficulty of the material the student can read and understand, (c) norm-referenced achievement tests point out general areas of strength and weakness, (d) criterion-referenced achievement tests describe how well specific tasks are being performed, and (e) diagnostic tests aid in detecting and overcoming specific learning errors (Italics added, p. 454).
While this source may be considered biased toward a positivist or behaviorist model of assessment, the quote nevertheless obliquely identifies another of the many difficulties surrounding assessment in elementary general music. Teachers in general education settings have access to a variety of standardized assessment tools that have been developed and validated for each of the above specific purposes. Elementary music teachers do not have access to comparable testing resources. Although a few quality tests of elementary students’ music aptitude are available (e.g., Primary Measures of Music Audiation, Gordon, 1986), curricular expectations across various music classrooms and at different grade levels render achievement tests nearly impossible to standardize. The measurement of achievement must be based on what students were actually taught (Ravitch, 2010). Furthermore, the lack of standardized achievement tests may be a blessing in disguise, as it prevents comparison of music achievement among schools, districts, and states and the inevitable “teaching to the test” that accompanies such comparison (Eisner, 2005). Miller, Linn, and Gronlund (2009) also advocated for the use of more authentic assessment strategies, such as portfolios and performance assessments, which music teachers could certainly design. Ongoing use of a variety of assessments, including aptitude tests and authentic measurements of music achievement, could facilitate teaching and learning in a way that increases variance in student performance levels and also raises the mean level of achievement (Eisner, 2005).
Criticisms of testing. Many teachers, parents, and other stakeholders prefer that music educators refrain from adding more testing to the educational experiences of children (Shih, 1997). They express concerns that students are tested too often and for the wrong reasons.
Eisner, an outspoken proponent of this viewpoint, stated:
Most efforts at school reform operate on the assumption that the important outcomes of schooling, indeed the primary indices of academic success, are high levels of academic achievement as measured by standardized achievement tests. But what do scores on academic achievement tests predict? They predict scores on other academic achievement tests. But schools, I would argue, do not exist for the sake of high levels of performance in the context of schools, but in the contexts of life outside of the school. The significant dependent variables in education are located in the kinds of interests, voluntary activities, levels of thinking and problem solving, that students engage in when they are not in school. In other words, the real test of successful schooling is not what students do in school, but what they do outside of it (2005, p. 147).
According to Eisner, the optimal result of a unit of study would be that students would be able to ask questions and think critically about the subject at hand. To extrapolate, the real measure of the success of a music program would be evident in students’ musicking, in the questions they posed (musically and verbally), and the degree to which students sought out musical opportunities outside of school and/or applied what they learned in school music to their musicking outside of school. Eisner indicated that a culture of standards-based assessment enslaves teachers merely to enact the will of government and requires students to memorize decontextualized information in order to perform well on a test that has little meaning to the child as an individual (Robinson, 2002). However, Eisner does not argue that individual teachers should not find ways to track the progress of students so that learning can be individualized and optimized. In fact, Eisner argued persuasively for a model of “personalized teaching” (p. 4) in which heterogeneity and diversity of outcome are valued.
At the time of this study, few topics in education are as inflammatory as “high-stakes testing,” which is currently used to make decisions regarding school funding, staffing, and even teacher pay (Ravitch, 2010). Assessments also function as determinants in such “high-stakes” decisions as whether a student passes a grade level, graduates from high school, is certified as a nurse, or is granted a variety of other credentials. For the purposes of this paper, I propose that “testing” and “assessment” may serve separate functions. Testing seems intended to track group progress on specific curricular goals, to allow comparisons between classrooms, across demographic groups, and among regions. This testing is imposed from outside individual classrooms, and may or may not accurately reflect an individual student’s progress on the material he or she was taught (Ravitch, 2010). The current political and social climate sees testing as the way to prove what a child has learned, and as the way to hold schools and teachers
accountable for that learning (Eisner, 2005; Ravitch, 2010). Economic factors also intrude:
failure to raise test scores results in cuts to funding, school closures, and/or teacher firings.
However, there are few, if any, high-stakes assessments in school music programs (Colwell, 2002, p. 195). Although the National Assessment of Educational Progress (NAEP) includes a music test, this measure is administered sporadically (every 8 years or so) and does not disaggregate data at the district, building, classroom, or individual student level. Because music programs do not typically test in the same manner as other subject areas, some policymakers warn that budget-conscious leaders may view them as expendable due to a lack of proof that learning is taking place. This could place music programs in jeopardy of policy decisions such as reduced funding, reduced staffing, and program elimination (Philip, 2001).
Some teachers and researchers suggest that music educators must incorporate more testing as a way to increase funding and improve policies (Brophy, 2000; Campbell & Scott-Kassner, 2002;
Holster, 2005; Niebur, 2001; Peppers, 2010; Talley, 2005). Ravitch (2010) was critical of this position:
Tests can be designed and used well or badly. The problem [is] the misuse of testing for high-stakes purposes, the belief that tests could identify with certainty which students should be held back, which teachers and principals should be fired or rewarded, and which schools should be closed—and the idea that these changes would inevitably produce better education. Policy decisions that were momentous for students and educators came down from elected officials who did not understand the limitations of…
Although the current educational political climate might encourage school music programs to move toward a more standardized and decontextualized testing model that would allow comparisons among schools and districts and communicate testing gains to parents, administrators, policymakers, and the community, high-stakes testing is not the kind of assessment discussed and promoted in the current study. Instead, this study investigates assessment as a necessary and natural component of curriculum and instruction (Lehman, 2008).
Even outside the controversial arena of high-stakes testing, any assessment endeavor includes the caveat that “the map is not the territory.” The results of an assessment are not the same as the thing itself: any test or assessment is only a representation of the trait, ability, aptitude, or cognitive skill being measured. Not only are measurement tools inherently subject to numerous possible errors, but also each measurement is only a single snapshot on one day.
Thus, the implementation of assessments and their use in personalizing instruction require a certain humility, which was described with regard to IQ tests in the Handbook of Psychological Assessment:
Despite the many relevant areas measured by IQ tests… Many persons with quite high IQs achieve little or nothing. Having a high IQ is in no way a guarantee of success but merely means that one important prerequisite has been met… Although 50-75% of the variance of children’s academic success is dependent on nonintellectual factors (persistence, personal adjustment, family support), most of a typical assessment is spent evaluating IQ. Some of these nonintellectual areas might be quite difficult to assess, and others might be impossible to account for (Groth-Marnat, 2009, pp. 134-135).
Perhaps we as music educators and music education researchers should observe similar humility with regard to assessments of music aptitude and achievement.
Summative and formative assessments. Assessments can have summative and formative purposes. Summative assessments “…generally [take] place after a period of instruction and [require] making a judgment about the learning that has occurred” (Boston, 2003, p. 1).
Assessments given at the end of a unit of study to determine a final level of achievement are summative. Summative assessments have been criticized for being acontextual or atomistic rather than authentic and holistic (Brummett & Haywood, 1997). However, a summative assessment does not need to be a paper-and-pencil “sit still and write” experience. In the elementary music setting, a summative assessment could be a composition, a performance, or another more holistic measure of musical progress. Many elementary music teachers also believe that summative assessment is and/or should be inextricably linked to grading (Hepworth-Osiowy, 2004; Peppers, 2010; Schuler, 1996), but summative assessments do not yield an evaluative result, such as a percentage or letter grade, unless a teacher assigns one. A summative assessment could help a teacher who is required to grade, but does not need to be used in this fashion. It could give the teacher information about the skills and concepts the students have mastered or that will need to be revisited at a later time. Moreover, a well-designed summative assessment could contribute to learning even as it measures progress. For example, a capstone composition project that demonstrates a final level of achievement on specific objectives would simultaneously allow summative assessment and continued learning.