Machine learning algorithms are rapidly being adopted to aid pedagogical decision-making in applications ranging from grading to student placement. Are these algorithms fair? We prove that, when predicting students' math performance, the standard machine learning practice of selecting the model that maximizes predictive accuracy can produce algorithms that give significantly more benefit of the doubt to White and Asian students and are more punitive toward Black, Hispanic, and Native American students. This disparity is masked by comparatively high predictive accuracy for both groups. We propose new interventions that help close this performance gap without requiring a different algorithm for each student group. Together, our results suggest new best practices for applying machine learning to education-related applications.