Methodology
This study measures the effect of peer assessment on students’ academic performance. It is based on the premise that engagement in peer assessment of formative tasks and coursework assignments helps teachers in training develop critical thinking skills and content knowledge, which will be reflected in their academic achievement. A quasi-experimental design was selected because the research occurred in a natural context while variables were still controlled and manipulated (Cohen et al., 2011). True experiments seek to prevent any cross-contamination between control and experimental groups (Cohen et al., 2011), but the researcher had no control over any sharing that occurred before or after classes, as this is a residential programme. The post-test-only, nonequivalent control group design was specifically selected for this study: one group received the treatment and the other did not, and both groups sat the examination, which served as the post-test, at the same time. This is consistent with the post-test-only, nonequivalent control group design as outlined in Wiersma and Jurs (2005). Using this design, the researcher investigated whether peer assessment improves students’ academic performance. The main research question is ‘To what extent does student engagement in peer-assessment improve their academic performance?’
To guide the investigation, the following null and alternative hypotheses were used:
H0: There is no significant difference in the performance of students who were engaged in peer-assessment and those who were not.
H1: Students who are engaged in peer-assessment perform significantly better than those who are not.
Procedure. Because a quasi-experimental design was used, randomization of groups was not required (Creswell, 2005; Wiersma & Jurs, 2005); however, the researcher sought to ensure that both groups were similar. This was achieved by removing students who had previous experience in the course or special knowledge of the course material, as well as those who had failed the prerequisite course. Students were then alternately placed in the control and experimental groups. Levene’s test, reported as part of the independent samples t-test output, also indicated that the variances of the groups were homogeneous, with p-values greater than 0.05.
The 45-hour course is divided into four units but was modified slightly to infuse two hours of training in providing constructive feedback and using analytic rubrics. This was done to ensure that all participants would be comfortable using the provided instruments to rate their peers, and to coach them in giving and receiving feedback in a meaningful, non-judgmental way. Students completed three graded tasks as recommended by the Joint Board of Teacher Education (JBTE): a group oral presentation, two weeks of supervised practicum conducted as a co-teaching activity, and an individual essay. Students also completed an in-class micro-teaching activity and one essay as formative tasks. Only members of the experimental group engaged in peer-assessment and evaluation of the formative tasks; the control group received general oral feedback from the course facilitators, as usual for formative tasks. All graded work for members of the control group was marked by faculty only, using the same rubrics. Analytic rubrics were prepared by faculty for the formative tasks, while the JBTE provided rubrics for the graded tasks. Students used the rubrics to rate their peers’ performance on the stated tasks and wrote extended feedback on the same rubrics. The written feedback identified area(s) of strength to be commended, area(s) of concern or deficiency to be improved, and a suggestion for improvement. Students were not allowed to respond to the ratings or feedback provided by peers.
Peer assessors were randomly selected to avoid friendship marking: all names were placed in a bag and each assessee pulled his/her peer assessor’s name from the bag. Faculty and peer assessors engaged in standardization before any marking took place, which allowed any further clarification to be sought before marking began. Faculty also randomly selected a second, private student peer assessor, who was notified beforehand. Each task was therefore peer-assessed twice by two different assessors, one student-selected and one faculty-selected. To maintain the anonymity of the second peer assessor, that student’s name was removed from the bag before entering the classroom. This arrangement also supported inter-rater reliability. To reduce situational errors, all students received the same rubrics, tasks, time allotment, course content and scheduling. Members of the treatment group engaged in peer assessment while the control group did not. All students sat the comprehensive examination, set by the JBTE, at the end of the semester.
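The “names in a bag” draw described above can be sketched in code. The following is a minimal illustration only: the participant names, the seed, and the redraw-on-self-match rule are assumptions introduced for the example, not details of the study.

```python
import random

def draw_assessors(students, seed=None):
    """Simulate the 'names in a bag' draw: each assessee pulls one peer
    assessor's name, and no one may be drawn as their own assessor."""
    rng = random.Random(seed)
    while True:
        pool = list(students)
        rng.shuffle(pool)
        # redraw the whole set if any student would assess their own work
        if all(s != a for s, a in zip(students, pool)):
            return dict(zip(students, pool))

# hypothetical participant names for illustration
pairs = draw_assessors(["Ann", "Bea", "Cay", "Dee"], seed=7)
```

Because every assessor name comes from the same pool of participants, each student serves as an assessor exactly once, mirroring the one-name-per-draw procedure.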
Statistical Tests. The results of the examination were analyzed using SPSS. The independent samples t-test was used to compare the means of the groups. By comparing the data sets, the researcher was able to determine whether there was a statistically significant difference between the performance of students in the treatment group and those in the control group. Inferential statistics were used to enable generalizability of findings from this small group (McMillan & Schumacher, 2010).
Independent samples t-tests were conducted to compare the means of both groups on each of the four graded tasks as well as on overall performance. The rationale was to reassure the researcher that the differences observed could be attributed to the treatment. Levene’s test for homogeneity of variance was also observed, because unequal group variances increase the chances of Type I errors.
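In SPSS, both checks appear in the independent samples t-test output. An equivalent analysis can be sketched with scipy; the scores below are simulated for illustration only, and the group means, standard deviations and group size of 40 are assumptions, not the study’s raw data.

```python
import numpy as np
from scipy import stats

# Simulated scores for illustration only: the means, SDs and the
# group size of 40 are assumptions, not the study's raw data.
rng = np.random.default_rng(42)
treatment = rng.normal(loc=79.6, scale=1.70, size=40)
control = rng.normal(loc=76.5, scale=1.75, size=40)

# Levene's test for homogeneity of variance: p > 0.05 supports the
# assumption that the two groups' variances are equal.
lev_stat, lev_p = stats.levene(treatment, control)

# Independent samples t-test with pooled variance (the SPSS
# "equal variances assumed" row).
t_stat, p_val = stats.ttest_ind(treatment, control, equal_var=True)
print(f"Levene p = {lev_p:.3f}, t = {t_stat:.2f}, p = {p_val:.4f}")
```

Reporting Levene’s p alongside the t statistic mirrors the decision rule used in the study: the pooled-variance t-test is interpreted only when Levene’s p exceeds 0.05.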
Ethical and Legal Considerations
Written consent was received from all participants after the college’s review board granted permission. The researcher met with all participants, shared the purpose of the research, and explained their rights, including the right to withdraw voluntarily. The researcher also explained her responsibility to protect the participants and the practices that would promote confidentiality and anonymity, such as the use of pseudonyms and a secure holding area for documents and other data. All data and related documents were destroyed at the end of the research to protect the participants and the host college.
Validity and Reliability
To ensure validity and reliability the following considerations and actions were taken.
- Selection bias may threaten the validity of quasi-experimental research (Wiersma & Jurs, 2005). To address this, the results of the prerequisite course and the students’ GPAs were used as antecedent data to promote similarity between groups. Candidates who possessed special characteristics, such as previous knowledge of the course, were not included, in an attempt to control extraneous variables and increase internal validity. Levene’s test of equality of variance was also used to determine homogeneity of variance, indicated by p-values greater than 0.05 on the independent samples t-tests; this supports the assumption that the variances of both groups are equal.
- Statistical regression also threatens internal validity, so attention was given to the results of the prerequisite course as antecedent data in order to exclude outliers, such as students who scored exceptionally above or below the class mean. Cohen et al. (2011) explain that students who score highest on pretests are most likely to score lower on post-tests, and vice versa. By removing those cases, the groups became more equal, which allowed the researcher to more confidently attribute changes to the treatment.
- Recognizing that human judgment and students’ skill in using rubrics and providing constructive feedback may introduce some error into data collection, participants were trained to provide constructive feedback before the treatment began. Standardization sessions were also held before each grading exercise so that peer assessors, assessees and faculty would share the same understanding and interpretation of the rubrics. The use of faculty marking and secret peer assessment to complement the peer assessment acted as a form of inter-rater reliability.
- One concern related directly to the treatment is diffusion of treatment, which occurs when members of both groups communicate and may share information and knowledge (Creswell, 2005). When this occurs, members of the control group may learn about peer assessment from members of the experimental group. To address this, both groups were timetabled for the same time to limit their interactions, and their sessions were facilitated at two separate locations.
- To limit compensatory rivalry, information was managed strategically: specific details about the expectations of the treatment were shared only with the experimental group. Additionally, teacher notes were given to the control group to compensate its members and allow them to appreciate some benefit.
Sample and Sampling Technique
Eighty year-two teachers in training registered for the course Strategies of Teaching and Learning. As this is a quasi-experiment, no random selection of participants took place; instead, the researcher sought similarity of the groups on relevant characteristics (Wiersma & Jurs, 2005). To equate the groups, only students studying Early Childhood Education with GPAs of 2.7 to 3.2 participated in this study. Students who satisfied these two criteria but were resitting the course were excluded, as were students who had failed the prerequisite course. Only females are currently registered in this programme.
By excluding the stated participants, the researcher sought to control extraneous variables that might impact the outcome (Creswell, 2005). Consider, for example, a student who is resitting the course: she may have a distinct advantage, since she would already have been exposed to the course content, assessment tasks and the rubrics aligned to the graded tasks. This could have influenced her or her peers’ performance in the course and provided another explanation for the results achieved. While I may not be able to control all extraneous factors (Cohen et al., 2011), I sought to make the control and experimental groups similar so that the findings could be reasonably attributed to peer assessment. All participants are Jamaican except one Haitian and one Panamanian; one was placed in the experimental group and the other in the control group. After similarity was determined, students were randomly assigned to the experimental and control groups (Bastick & Matalon, 2007).
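The outlier-exclusion step described above (removing students who scored exceptionally above or below the class mean on the prerequisite course) can be sketched as follows. The ±2 SD cut-off, the student identifiers and the grades are illustrative assumptions; the study does not state a numeric exclusion rule.

```python
import statistics

def exclude_outliers(grades, k=2.0):
    """Drop cases more than k standard deviations from the class mean.
    The +/-2 SD cut-off is an illustrative assumption, not the study's
    stated rule."""
    mean = statistics.mean(grades.values())
    sd = statistics.stdev(grades.values())
    return {name: g for name, g in grades.items() if abs(g - mean) <= k * sd}

# hypothetical prerequisite grades for illustration
prereq = {"s01": 70, "s02": 71, "s03": 72, "s04": 73, "s05": 74,
          "s06": 70, "s07": 71, "s08": 72, "s09": 73, "s10": 74,
          "s11": 100}  # s11 scored exceptionally above the class mean
kept = exclude_outliers(prereq)
```

Applying a symmetric cut-off removes both exceptionally high and exceptionally low scorers, which is the behaviour needed to guard against statistical regression.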
Strengths and Limitations of study
- One strength of this research was the control of extraneous factors by equating the groups and using statistical measures to verify control of variance. This allows the researcher to confidently attribute changes in academic achievement to peer-assessment.
- There was no replication of the study. Before these findings can be generalized, the researcher would seek reproducibility, as this is a desirable quality of a scientific design such as an experiment (Creswell & Poth, 2018).
- There is a possible threat to internal validity based on the design used: because the grades of the prerequisite course were used in determining group assignment, a threat of history may occur.
- The design used was suitable for the research interest and questions, which increases the validity of the study.
Results and Statistical Significance
The results from the independent samples t-tests indicate that there was a significant difference between the overall course scores of the experimental group (M = 79.62, SD = 1.69) and the control group (M = 76.45, SD = 1.75); t(78) = 8.24, p < .001. This provides enough evidence, at the 95% confidence level, to reject H0: “There is no significant difference in the performance of students who were engaged in peer-assessment and those who were not” in favour of H1: students who are engaged in peer-assessment perform significantly better than those who are not.
The impact of the experiment was evident from the third assessment task, which followed the second administration of the formative task to the experimental group only. A mean difference of 5.92 between the groups was found, with the experimental group scoring M = 81.57, SD = 4.46 and the control group M = 75.65, SD = 4.12; t(78) = 6.17, p < .001.
Comparing the third assessment with the second assessment task, at which point the experimental group had experienced only one formative task, showed a widening of the gap between the group means. At the second assessment, the mean difference was a mere -0.40, with the control group scoring a slightly higher mean (M = 71.57, SD = 4.27) than the experimental group (M = 71.17, SD = 4.27); t(78) = -0.41, p = 0.682.
The results of the final examination were in line with those of the third assessment and the overall course grade, in that the mean for the experimental group (M = 87.83, SD = 1.77) was significantly higher than that of the control group (M = 82.40, SD = 2.18); t(78) = 12.22, p < .001.
The homogeneity of variance between the control and experimental groups was assessed using the results of Levene’s test for equality of variances. In all cases, the results satisfied the condition of sig. (p) values greater than 0.05. A sig. value greater than 0.05 indicates that the variability of the scores for the two groups is approximately the same (Bastick & Matalon, 2007).
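The reported statistics can be checked against the pooled-variance t formula. The sketch below assumes two equal groups of 40, which is consistent with the reported df = 78, and reproduces the overall-course result from the means and standard deviations reported above.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Independent samples t statistic with pooled variance
    (the 'equal variances assumed' row of the SPSS output)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se

# Overall course scores as reported above; two groups of 40 give df = 78.
t_overall = pooled_t(79.62, 1.69, 40, 76.45, 1.75, 40)
print(round(t_overall, 2))  # approximately 8.24, matching the reported t(78) = 8.24
```

The same function applied to the other reported means and SDs gives values close to the corresponding t statistics, which is a useful internal-consistency check on summary tables.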
The findings of this study are consistent with those of Sun et al. (2015), who found that students who participated in peer assessment performed significantly better on the corresponding exam items than those who did not (d = .122, t(319) = 3.03, p = .001). Those researchers also found that exposure to peer assessment yielded better performance on related quizzes in the short term (Cohen’s d = .115, t(298) = 2.92, p = .002). As in this study, Sun et al. (2015) concluded that the benefits of peer assessment are sustained over a period of time. The findings are also consistent with the work of Hwang et al. (2014), who also employed an experimental design and found that “students who learned with the peer assessment based game development showed learning achievements significantly better than those who learned with the game development”. Finally, it seems that peer-assessment can be applied to other contexts in the field of pedagogy to enhance student learning and achievement.
Recommendations
The most significant implication of this study is that students’ engagement in peer assessment will most likely improve their learning and academic performance. I recommend that teachers create opportunities for students to peer-assess both formative and summative tasks to develop metacognitive and other analytical skills.
I suggest that, to gain maximum benefit from peer assessment, students should use teacher-created rubrics/mark schemes, generate their own rubrics/mark schemes, and offer subjective feedback. This will promote evaluative thinking and give learners autonomy over their own learning and performance.
This research highlights the need for further research into peer assessment, as well as self-assessment and the use of rubrics, and their impact on learning and performance. The relationship between self-efficacy and peer assessment is also an area for future research evolving from this study.
The results of the experiment provide enough evidence, at the 95% confidence level, to reject the null hypothesis (H0): “There is no significant difference in the performance of students who were engaged in peer-assessment and those who were not” in favour of H1: students who are engaged in peer-assessment perform significantly better than those who are not. We therefore conclude that the use of peer assessment increases students’ academic performance.
Appendix 1: Screenshots from statistical tests
Figure 1: Random Number Generator in Excel
Figure 2: Data entry in SPSS
Figure 3: Labeling of variables in SPSS
Figure 4: T-test Output Table 1
Figure 5: T-test Output Table 2
References
Bastick, T., & Matalon, B. (2007). Research: New and practical approaches (2nd ed.). Chalkboard Press, Materials Production Unit, University of the West Indies, Mona, Kingston.
Cohen, L., Manion, L., & Morrison, K. (2011). Research methods in education (7th ed.). Routledge.
Creswell, J. W. (2005). Educational research: Planning, conducting, and evaluating quantitative and qualitative research.
Creswell, J. W., & Poth, C. N. (2018). Qualitative inquiry and research design: Choosing among five approaches (4th ed.). SAGE.
Hwang, G. J., Hung, C. M., & Chen, N. S. (2014). Improving learning achievements, motivations and problem-solving skills through a peer assessment-based game development approach. Educational Technology Research and Development, 62, 129–145. doi:10.1007/s11423-013-9320-7
McMillan, J. H., & Schumacher, S. (2010). Research in education: Evidence-based inquiry (7th ed.). Pearson.
Sun, D. L., Harris, N., Walther, G., & Baiocchi, M. (2015). Peer assessment enhances student learning: The results of a matched randomized crossover experiment in a college statistics class. PLoS ONE, 10(12), e0143177. doi:10.1371/journal.pone.0143177
Wiersma, W., & Jurs, S. G. (2005). Research methods in education: An introduction. Pearson.