Validity and Reliability of Scores Obtained on Multiple-Choice Questions: Why Functioning Distractors Matter

Syed Haris Ali
Patrick A. Carr
Kenneth G. Ruit

Abstract

Purpose: Plausible distractors are important for the accurate measurement of knowledge via multiple-choice questions (MCQs). This study demonstrates the impact of improved distractor functioning on the validity and reliability of scores obtained on MCQs.

Methods: Free-response (FR) and MCQ versions of a neurohistology practice exam were given to four cohorts of Year 1 medical students. Consistently non-functioning multiple-choice distractors (selected by fewer than 5% of examinees) were replaced with distractors developed from incorrect responses on the FR version of the items, and the revised MCQ version was administered to the two subsequent cohorts. Validity was assessed by comparing an index of expected MCQ difficulty with an index of observed MCQ difficulty; reliability was assessed via Cronbach's alpha coefficient before and after replacement of the consistently non-functioning distractors.

Results: Pre-intervention, the effect size (Cohen's d) of the difference between the mean expected and observed MCQ difficulty indices ranged from 0.4 to 0.59. Post-intervention, this difference decreased to 0.15, and Cronbach's alpha coefficient of scores obtained on the MCQ version of the exam increased.

Conclusion: Multiple-choice distractors developed from incorrect responses on a free-response version of the same items enhance the validity and reliability of scores obtained on MCQs.
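The abstract rests on three item statistics: distractor selection frequency (with the 5% functioning threshold), Cronbach's alpha for score reliability, and Cohen's d for the gap between expected and observed difficulty. The sketch below shows one plain way to compute each in Python; it is not the authors' analysis code, and the response string, answer key, score matrix, and difficulty indices are all hypothetical.

import numpy as np

def distractor_frequencies(choices, options):
    # Fraction of examinees selecting each option of a single MCQ.
    n = len(choices)
    return {opt: choices.count(opt) / n for opt in options}

def cronbach_alpha(scores):
    # Cronbach's alpha for a 0/1 score matrix (rows = examinees, columns = items):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def cohens_d(x, y):
    # Effect size of the difference between two means, using the pooled SD.
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
    return (x.mean() - y.mean()) / pooled_sd

# Flag non-functioning distractors (<5% selection frequency), per the study's criterion.
responses = list("AAABACACAABCAAAACAAB")   # hypothetical responses to one item
key = "A"                                  # hypothetical correct answer
freqs = distractor_frequencies(responses, "ABCD")
nonfunctioning = [o for o, f in freqs.items() if o != key and f < 0.05]
print(freqs, "non-functioning:", nonfunctioning)   # D is never chosen, so it is flagged

# Reliability of a hypothetical 5-examinee x 4-item score matrix.
scores = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 1]]
print("Cronbach's alpha:", round(cronbach_alpha(scores), 3))

# Effect size between hypothetical expected and observed difficulty indices.
expected = [0.70, 0.65, 0.80, 0.75, 0.60]
observed = [0.78, 0.72, 0.85, 0.80, 0.66]
print("Cohen's d:", round(cohens_d(expected, observed), 2))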

Article Details

How to Cite
Ali, S. H., Carr, P. A., & Ruit, K. G. (2016). Validity and Reliability of Scores Obtained on Multiple-Choice Questions: Why Functioning Distractors Matter. Journal of the Scholarship of Teaching and Learning, 16(1), 1–14. https://doi.org/10.14434/josotl.v16i1.19106
Section
Articles
Author Biographies

Syed Haris Ali, University of North Dakota School of Medicine and Health Sciences

Resident (postgraduate medical trainee), Dept. of Internal Medicine, University of North Dakota School of Medicine and Health Sciences

Patrick A. Carr, University of North Dakota School of Medicine and Health Sciences

Associate Professor, Dept. of Basic Sciences; Assistant Dean for Faculty Development; and Director of Education Resources

Kenneth G. Ruit, University of North Dakota School of Medicine and Health Sciences

Associate Professor, Dept. of Basic Sciences; and Associate Dean for Educational Administration and Faculty Affairs
