Validity and Reliability of Scores Obtained on Multiple-Choice Questions: Why Functioning Distractors Matter