Validity and Reliability of Scores Obtained on Multiple-Choice Questions: Why Functioning Distractors Matter

Syed Haris Ali
Patrick A. Carr
Kenneth G. Ruit

Abstract

Purpose: Plausible distractors are important for the accurate measurement of knowledge via multiple-choice questions (MCQs). This study demonstrates the impact of improved distractor functioning on the validity and reliability of scores obtained on MCQs.

Methods: Free-response (FR) and MCQ versions of a neurohistology practice exam were given to four cohorts of Year 1 medical students. Consistently non-functioning multiple-choice distractors (selected by fewer than 5% of examinees) were replaced with distractors developed from incorrect responses on the FR version of the items, and the revised MCQ version was administered to the two subsequent cohorts. Validity was assessed by comparing an index of expected MCQ difficulty with an index of observed MCQ difficulty; reliability was assessed via Cronbach's alpha coefficient before and after replacement of the consistently non-functioning distractors.

Results: Pre-intervention, the effect size (Cohen's d) of the difference between the mean expected and observed MCQ difficulty indices ranged from 0.4 to 0.59. Post-intervention, this difference decreased to 0.15, and Cronbach's alpha coefficient of scores obtained on the MCQ version of the exam increased.

Conclusion: Multiple-choice distractors developed from incorrect responses on a free-response version of the same items enhance the validity and reliability of scores obtained on MCQs.
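The abstract rests on three item statistics: distractor selection frequency (with the 5% functioning threshold), Cronbach's alpha for score reliability, and Cohen's d for the gap between expected and observed difficulty. The sketch below shows one plain way to compute each in Python; it is not the authors' analysis code, and the response string, answer key, score matrix, and difficulty indices are all hypothetical.

import numpy as np

def distractor_frequencies(choices, options):
    # Fraction of examinees selecting each option of a single MCQ.
    n = len(choices)
    return {opt: choices.count(opt) / n for opt in options}

def cronbach_alpha(scores):
    # Cronbach's alpha for a 0/1 score matrix (rows = examinees, columns = items):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def cohens_d(x, y):
    # Effect size of the difference between two means, using the pooled SD.
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
    return (x.mean() - y.mean()) / pooled_sd

# Flag non-functioning distractors (<5% selection frequency), per the study's criterion.
responses = list("AAABACACAABCAAAACAAB")   # hypothetical responses to one item
key = "A"                                  # hypothetical correct answer
freqs = distractor_frequencies(responses, "ABCD")
nonfunctioning = [o for o, f in freqs.items() if o != key and f < 0.05]
print(freqs, "non-functioning:", nonfunctioning)   # D is never chosen, so it is flagged

# Reliability of a hypothetical 5-examinee x 4-item score matrix.
scores = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 1]]
print("Cronbach's alpha:", round(cronbach_alpha(scores), 3))

# Effect size between hypothetical expected and observed difficulty indices.
expected = [0.70, 0.65, 0.80, 0.75, 0.60]
observed = [0.78, 0.72, 0.85, 0.80, 0.66]
print("Cohen's d:", round(cohens_d(expected, observed), 2))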

Article Details

How to Cite
Ali, S. H., Carr, P. A., & Ruit, K. G. (2016). Validity and Reliability of Scores Obtained on Multiple-Choice Questions: Why Functioning Distractors Matter. Journal of the Scholarship of Teaching and Learning, 16(1), 1–14. https://doi.org/10.14434/josotl.v16i1.19106
Section
Articles
Author Biographies

Syed Haris Ali, University of North Dakota School of Medicine and Health Sciences

Resident (postgraduate medical trainee), Dept. of Internal Medicine, University of North Dakota School of Medicine and Health Sciences

Patrick A. Carr, University of North Dakota School of Medicine and Health Sciences

Associate Professor, Dept. of Basic Sciences; Assistant Dean for Faculty Development; and Director of Education Resources

Kenneth G. Ruit, University of North Dakota School of Medicine and Health Sciences

Associate Professor, Dept. of Basic Sciences; and Associate Dean for Educational Administration and Faculty Affairs
