There is currently a great deal of interest in stimulating and evaluating teacher effectiveness, and in using policy to create incentives for defining, evaluating, and recognizing it. This effort is an important adjunct to the NCLB initiative to ensure “highly qualified teachers” (HQT) for all students, which has had a beneficial effect in many states, strengthening recruitment incentives, teacher preparation, and certification, and dramatically reducing the number of emergency certified teachers in many locations. The HQT provision should be continued, with modest amendments,1 while efforts are made to measure and strengthen teacher effectiveness in additional ways.
In thinking about strategies for measuring teacher effectiveness for purposes of recognizing and rewarding teachers, as well as informing teacher education and professional development programs, it is important to consider both the availability and accuracy of particular measures and the potential incentive effects of their use. For any high-stakes purpose associated with personnel decision making or compensation, multiple measures should be used, as every measure gives only a partial picture of teacher performance and is subject to error.
In addition, the system should be designed to operate so that teachers are not penalized for teaching the
students who have the greatest educational needs. Incentives should operate to recognize and reward
teachers who work with challenging students. This requires sensitivity to student and classroom
characteristics in the evaluation system.
In a system for assessing teacher effectiveness, three kinds of evidence can be considered in combination with one another:
• Performance on teaching assessments measuring standards known to be associated with student
learning (including teacher performance assessments and standards-based teacher evaluations);
• Evaluation of teaching practices that are associated with desired student outcomes and achievement
of school goals (through systematic collection of evidence about teacher planning and instruction,
work with parents and students, and contributions to the school); and
• Contributions to student learning and other student outcomes (from classroom assessments and
documentation, as well as valid tests when they are appropriate).
These three strategies are all used in the Denver, CO system of teacher compensation based on knowledge, skills, and performance, which is the most advanced such system in the nation, and one of the few to have survived the test of time. (For more detail, see http://denverprocomp.org.)
Performance-Based Assessments of Teaching
There is growing evidence that some well-designed performance-based assessments of teaching
detect aspects of teaching that are significantly related to teacher effectiveness, as measured by
student achievement gains. These include standardized teacher performance assessments like those
used for National Board Certification and for beginning teacher licensure in states like Connecticut
and California, and standards-based teacher evaluation systems used in some local districts. The
value of using such assessments is that they can both document broader aspects of teacher
effectiveness and help teachers develop greater effectiveness, as participation in
these assessments has been found to support learning both for the teachers who are being evaluated
and for the teachers or principals who are trained to serve as evaluators.
1) A number of studies have found that the National Board Certification assessment process identifies
teachers who are more effective than others who have not achieved certification.2 The certification is
designed to identify experienced, accomplished teachers, and a number of states and districts already
use it as the basis for salary bonuses or other forms of teacher recognition, including
selection as a mentor or lead teacher. Teachers generally perceive the assessments, which are
specific to each subject area, as good representations of their work as teachers and as fair
assessments of their performance. Studies suggest that participating in the assessment also helps
teachers improve their practice. Thus, this way of encouraging and recognizing teacher effectiveness may also
help stimulate improvement. California offers a $20,000 bonus, paid over four years, to Board-certified
teachers who teach in high-need schools, which has helped to distribute these accomplished
teachers more fairly to students who need them.
2) In some states, teacher performance assessments for new teachers, modeled after the National
Board assessments, are being used either in teacher education, as a basis for the initial licensing
recommendation (CA), or in the teacher induction period, as a basis for moving from a probationary to
a professional license (CT). These assessments require teachers to document their plans and
teaching for a unit of instruction, videotape and critique lessons, and collect and evaluate evidence of
student learning. These assessments have also been found to help teachers improve their practice.
Beginning teachers’ ratings on the Connecticut BEST assessment have been found to significantly
predict their students’ value-added achievement on state tests.3 A study of predictive validity is also
underway for the Performance Assessment for California Teachers (PACT). The Teach Act contains a
provision to develop a nationally available beginning teacher performance assessment, based on
these models, which could provide a useful measure of effectiveness for new teachers and could
inform assessments of teacher education.
3) Finally, standards-based teacher evaluations used by some districts have been found to be
significantly related to teachers’ student achievement gains and to help teachers improve their
practice and effectiveness.4 Like the teacher performance assessments described earlier, these
systems for observing teachers’ classroom practice are based on professional teaching standards
grounded in research on teaching and learning. They use systematic observation protocols to examine
teaching along a number of dimensions. The Denver compensation system, which uses such an
evaluation system as one of its components, describes the features of the system used there as
including: well-developed rubrics articulating different levels of teacher performance; inter-rater
reliability; a fall-to-spring evaluation cycle; and a peer and self-evaluation component.
Evaluation of Successful Teaching Practices
Effectiveness can be documented by evaluating teaching practices that are associated with desired
student outcomes and the achievement of school goals through systematic collection of evidence
about teacher planning and instruction, work with parents and students, or contributions to the
school. This might be part of a portfolio of teacher evidence about performance. The practices
included should be those that are associated with improvements in students’ school performance and
learning. For example, a teacher might document how she increased student attendance or
homework completion through regular parent conferences and calls home and show evidence of
changes in these student outcomes, as well as other outcomes associated with them, such as
In some systems, teachers receive bonuses or stipends for demonstrating that they have implemented
particular new practices associated with school-wide or district-wide goals, such as the use of common
literacy practices across classrooms, or the use of formative assessments in planning and modifying
instruction, or the implementation of a new system of writing instruction. Where possible, these
practices are documented along with evidence of how the changes have affected student participation
and learning. The rationale for using these measures of effective teaching practices is that they
support teacher development and school-wide change initiatives, and are related to improvements in
the conditions for student learning.
Teacher Contributions to Student Learning
Many states have developed data systems that could allow investigation of value-added gains in
student achievement on state tests. This offers promising new areas of research to track student
learning over time and to examine factors associated with that learning. Some have suggested that
these data could be used to evaluate teachers as well. However, there are many obstacles to using
state test data for purposes of determining the effectiveness of individual teachers for personnel
purposes. First, many other factors influence student gains beyond teachers’ efforts, including school
resources and policies that shape the conditions of learning (class sizes, availability of specialists,
administrative actions), materials that are available and the teaching strategies that are possible,
home situations that can affect students’ ability to attend school and focus productively on school
work at school and at home, and the prior education of students.
Value-added measures of teacher “effects” vary for a given teacher from year to year, class to class,
and subject to subject. They are influenced by the effects of students’ prior year teachers as well as
other student variables.
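To make concrete what a value-added “effect” is, the following is a minimal sketch under strong simplifying assumptions, not the model used by any state or district. It predicts each student’s current score from the prior-year score alone and treats a teacher’s average residual as that teacher’s estimate. The function names and the record layout are hypothetical. Because the sketch controls for nothing except prior scores, every other influence described above (school resources, home situations, prior teachers) ends up inside the teacher’s number, which is one reason such estimates can be unstable.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of ys = intercept + slope * xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("prior scores must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def value_added(records):
    """records: iterable of (teacher, prior_score, current_score).

    Returns each teacher's mean residual: how far that teacher's students
    score above or below what their prior scores alone would predict.
    """
    records = list(records)
    intercept, slope = fit_line([r[1] for r in records],
                                [r[2] for r in records])
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (intercept + slope * prior))
    return {teacher: mean(vals) for teacher, vals in residuals.items()}


# Hypothetical data: two teachers, two students each.
example = [("A", 10, 15), ("A", 20, 25), ("B", 10, 11), ("B", 20, 21)]
estimates = value_added(example)  # A comes out near +2, B near -2
```

With only a handful of students per teacher, a single unusual score moves the estimate noticeably, which is consistent with the instability described above.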
Because properly scaled tests are not available in all grade levels and subject areas, and adequate
data are not available for all individual teachers, value-added student achievement data from
state tests are typically available for no more than about 30% of elementary teachers and perhaps
10% of high school teachers. The use of these data for looking at individual teacher effects is
complicated further by data availability issues for students, due to mobility and special needs. (See
appendix.) Thus, such data may be useful in evaluating individual teachers’
effectiveness for a minority of teachers, and then only as part of a broader collection of evidence about the
teacher’s performance and practices, and only when adjustments are made to ensure that individual
student gains are properly represented.
States should be encouraged to build better data systems that include information about student
progress on a range of measures, even though it may never be possible to assess value-added gains
for most teachers on large-scale state achievement tests. These data systems will be useful for
looking at student achievement for teachers in the aggregate – to examine, for example, the effects of
teacher education and professional development or school improvement initiatives.
Other kinds of evidence can and should be assembled about student learning. In some districts and
schools, pre- and post- measures of student learning in specific subject areas and classrooms are
collected. These may be scored writing samples or reading samples, mathematics assessments,
assessments of science or history knowledge, or even musical performances. These typically provide
better measures of classroom learning in a specific course or subject area because they are
curriculum-specific and can offer more authentic measures of student learning. They are also more
likely to capture the effects of a particular teacher’s instruction and be available for most students. In
some schools, teachers use their own fall and spring classroom assessments (or pre- and post-unit
assessments) as a way of gauging student progress. These measures can also be tailored for the
learning goals of specific students (for example, special education students or English language
learners). As part of a portfolio of evidence, these measures can document teacher effectiveness in
achieving specific curriculum goals. In Denver’s system, teachers set two goals annually in
collaboration with the principal, and document student progress toward these goals using district,
school, or teacher-made assessments to show growth.
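As an illustration of how fall and spring classroom assessments might be summarized for a portfolio, the sketch below computes each student’s gain and the share of students who reached a growth target. It is a hypothetical example, not Denver’s procedure: the function name, the score layout, and the target value are all assumptions. Students missing either score are left out of the summary rather than counted as showing no growth.

```python
def growth_summary(scores, target_gain):
    """scores: dict mapping student id -> (fall_score, spring_score).

    A score of None means the student was not assessed at that point;
    such students are excluded from the summary.
    """
    gains = [spring - fall
             for fall, spring in scores.values()
             if fall is not None and spring is not None]
    if not gains:
        raise ValueError("no students have both fall and spring scores")
    met = sum(1 for g in gains if g >= target_gain)
    return {
        "students_assessed": len(gains),
        "mean_gain": sum(gains) / len(gains),
        "share_meeting_goal": met / len(gains),
    }


# Hypothetical class of four students; one is missing a spring score.
fall_spring = {"s1": (40, 55), "s2": (60, 62), "s3": (50, None), "s4": (30, 50)}
summary = growth_summary(fall_spring, target_gain=10)
```

Reporting the number of students assessed alongside the gains keeps it visible when mobile or frequently absent students drop out of the picture.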
Finally, other evidence of teacher effectiveness related to specific achievements can be part of a
portfolio of evidence. For example, a teacher might document the Westinghouse science competition
awards she helped students win, or specific breakthroughs achieved by her special education
students, with evidence of her role in supporting these accomplishments.
In any of these systems, it is also important to include evidence about the students being served and
to consider their progress in appropriate ways. Evidence in medicine as well as teaching indicates
that where assessments do not fairly represent professional practice, incentives can be created to
avoid serving high-need clients, which works against the goals of the system. (For example, mortality
ratings for cardiac surgeons in one state led doctors to stop serving very sick patients. Similarly, test
score ratings have led some schools to keep out and push out low-scoring students.) To create systems
that measure and encourage teacher effectiveness, it is important to use multiple measures of
practice, performance, and outcomes so that a more complete picture of practice emerges, so that
assessments are fair and produce the right incentives, and so that educators are encouraged to
improve what they do instead of trying to game an unfair system.
1 Amendments to the HQT provision should include: 1) requiring that both elementary and secondary teachers demonstrate teaching skills as well as content knowledge (such teaching skills to be demonstrated through performance-based evaluation during student teaching or internship or by the passage of a teacher
performance assessment); 2) ensuring that teachers entering under alternative certification pathways
complete training and assessments that allow them to meet state standards of content and teaching skills
before they are identified as “highly qualified”; and 3) allowing states to develop reasonable and appropriate
standards for certifying the content knowledge of teachers whose assignments require them to teach multiple
subjects, subject to approval in their state plans by USDOE.
2 Bond, L., Smith, T., Baker, W., & Hattie, J. (2000). The certification system of the National Board for
Professional Teaching Standards: A construct and consequential validity study (Greensboro, NC:
Center for Educational Research and Evaluation); Cavalluzzo, L. (2004). Is National Board Certification
an effective signal of teacher quality? (National Science Foundation No. REC-0107014). Alexandria,
VA: The CNA Corporation; Goldhaber, D., & Anthony, E. (2005). Can teacher quality be effectively
assessed? Seattle, WA: University of Washington and the Urban Institute; Smith, T., Gordon, B., Colby,
S., & Wang, J. (2005). An examination of the relationship of the depth of student learning and National
Board certification status (Office for Research on Teaching, Appalachian State University);
Vandevoort, L. G., Amrein-Beardsley, A., & Berliner, D. C. (2004). National Board certified teachers and
their students' achievement. Education Policy Analysis Archives, 12(46), 117.
3 Wilson, M., & Hallam, P.J. (2006). Using Student Achievement Test Scores as Evidence of External
Validity for Indicators of Teacher Quality: Connecticut’s Beginning Educator Support and Training
Program. Berkeley, CA: University of California at Berkeley.
4 Milanowski, A.T., Kimball, S.M., & White, B. (2004). The relationship between standards-based teacher
evaluation scores and student achievement. University of Wisconsin-Madison: Consortium for Policy
Research in Education.