In this episode of Education Research Rounds, we analyze Washington DC's ambitious $33 million High-Impact Tutoring Initiative, implemented in response to COVID-19 learning disruption. The 2022-2023 program served over 5,000 at-risk students through 27 different tutoring providers, offering an average of 27 sessions per student at a cost of $6,426 per student. While the program showed some improvements in school attendance, the academic impacts were surprisingly small - with effect sizes of just 0.05-0.06 standard deviations in math and reading, well below typical tutoring program effects of 0.30-0.40. Through our analysis, we explore implementation challenges, discuss the trade-offs between scale and quality, and consider important lessons for districts planning similar interventions. This study raises crucial questions about program design, cost-effectiveness, and the complexities of scaling educational interventions while maintaining quality.
[Chapter 1 - Introduction]
ALANNAH: Welcome, everyone, to AI-Learn Insights and our podcast series Education Research Rounds. I'm Alannah, and with me is my co-host, Eric. Today, we're diving into a fascinating new report on one of the most discussed interventions in education right now - High-Impact Tutoring.
ERIC: That's right, Alannah. We're looking at a first-year implementation report from Washington DC's Office of the State Superintendent of Education, or OSSE. The title is 'Implementation of the OSSE High Impact Tutoring Initiative', and it was published in August 2024.
ALANNAH: This is particularly interesting because we're seeing a lot of districts implement tutoring programs post-pandemic, but we don't often get such detailed implementation data. DC made a pretty substantial investment here, right?
ERIC: Absolutely. We're talking about a three-year, $33 million investment. The program was specifically designed to support students classified as 'at-risk' or those who experienced disrupted instruction during COVID-19. It's quite an ambitious undertaking.
ALANNAH: Can you explain what they mean by 'at-risk' in this context?
ERIC: Sure. In DC, students are classified as 'at-risk' if they qualify for TANF or SNAP benefits, have been identified as homeless, are in foster care, or if they're high school students who are at least a year older than expected for their grade.
ALANNAH: And this study is looking at the first year of implementation, correct? What kind of data are we working with?
ERIC: Yes, it covers the 2022-2023 school year. The researchers had access to quite a comprehensive dataset - we're talking about attendance records, academic assessments, demographic data, and even student surveys. They were able to track both implementation and outcomes across different schools and student groups.
ALANNAH: One thing I find interesting is the scale. This wasn't just a pilot program - they were working across multiple schools and grade levels.
ERIC: Right, and they worked with both traditional public schools and charter schools. The program involved 14 organizations providing direct tutoring services, plus another 13 providers through something called CityTutor DC. It's really a district-wide effort to scale up tutoring access.
ALANNAH: Before we dive into what they found, can you explain what they mean by 'high impact' tutoring? How is this different from regular tutoring?
ERIC: That's a great question. High-impact tutoring typically involves three key elements: small group sizes - usually no more than four students per tutor, frequent sessions - at least three times per week, and the same tutor working consistently with the same students. It's meant to be more intensive and personalized than traditional drop-in tutoring.
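Those three criteria are concrete enough to express as a simple check. Here's an illustrative sketch - the thresholds paraphrase the definition Eric just gave, not an official standard:

```python
# Illustrative check of the three "high-impact" tutoring criteria described
# above (thresholds paraphrase the episode's definition, not a formal spec).
def is_high_impact(group_size: int, sessions_per_week: int, consistent_tutor: bool) -> bool:
    return group_size <= 4 and sessions_per_week >= 3 and consistent_tutor

print(is_high_impact(group_size=3, sessions_per_week=3, consistent_tutor=True))   # True
print(is_high_impact(group_size=6, sessions_per_week=2, consistent_tutor=True))   # False: drop-in style
```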
ALANNAH: Before we get into the findings, can you tell us who conducted this research?
ERIC: The research team was led by Cynthia Pollard from UnboundEd, along with several researchers from Stanford University based at their National Student Support Accelerator. Their vision is "that every K-12 student in need will have access to an effective tutor that champions and ensures their learning and success."
ALANNAH: Excellent. When we come back, we'll dive into what they found.
[Chapter 2 - Results] 3:26
ALANNAH: Welcome back to Education Research Rounds. We're discussing DC's $33 million high-impact tutoring initiative, and now it's time to look at what they found. Eric, let's start with the simple numbers - how many students did they reach?
ERIC: They served 5,135 students - just over five thousand - with each student receiving an average of 27 tutoring sessions. If we do the math, that works out to about $6,426 per student, or roughly $236 per session.
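For listeners who want to check the arithmetic, here's a minimal sketch re-deriving those figures from the headline numbers. One caveat: dividing the rounded 27-session average into the per-student cost gives closer to $238 per session, so the quoted $236 presumably rests on an unrounded session count.

```python
# Re-deriving the episode's cost figures from the headline numbers.
total_investment = 33_000_000  # three-year budget, in dollars
students_served = 5_135        # students tutored in 2022-23
avg_sessions = 27              # average sessions per student (rounded)

cost_per_student = total_investment / students_served
cost_per_session = cost_per_student / avg_sessions

print(f"Per student: ${cost_per_student:,.0f}")   # ~$6,426
print(f"Per session: ${cost_per_session:,.0f}")   # ~$238 with the rounded average
```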
ALANNAH: Those are very high costs. Now, let's talk about impact. The study looked at both academic outcomes and attendance. What did they find?
ERIC: The findings are... well, let's just say they're not what you might expect given the investment. Looking at academic outcomes, the effect sizes were surprisingly small - frankly, downright underwhelming.
ALANNAH: Can you break down those effect sizes for our listeners? And maybe explain what we typically look for in education research?
ERIC: Sure. In education research, we typically consider 0.2 standard deviations to be a small effect, 0.5 medium, and 0.8 large. For this program, in math for grades K-8, they found only a 0.05 standard deviation improvement. In reading, it was just 0.06.
ALANNAH: Wait... those numbers seem incredibly small. How do they compare to what we typically see in tutoring research?
ERIC: That's exactly right. Typical high-impact tutoring programs usually show effects of 0.3 to 0.4 standard deviations. These results are well below what we'd consider even a small effect. It's barely noticeable.
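For reference, the effect sizes under discussion are standardized mean differences. A common textbook form is Cohen's d, shown here as general background; the report's exact estimator may differ, for example through regression adjustment:

```latex
% Standardized mean difference (Cohen's d), shown as general background;
% the study's actual estimator may be regression-adjusted.
d = \frac{\bar{x}_{\text{tutored}} - \bar{x}_{\text{comparison}}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```

On this scale, an effect of 0.05 moves the average tutored student from the 50th to roughly the 52nd percentile of the comparison distribution - barely perceptible, which is Eric's point.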
ALANNAH: What about the PARCC scores? Did they show different results? For our listeners, PARCC stands for Partnership for Assessment of Readiness for College and Careers. During the 2022-23 school year it was DC's annual standardized assessment, administered in grades 3 through 11 to measure achievement in English Language Arts/Literacy and Mathematics, and it served as one of the study's key academic outcome measures.
ERIC: Actually, the PARCC results were even more concerning. They showed negative effects: -0.10 standard deviations in ELA and -0.05 in math. Let that sink in - on the state assessment, students who received tutoring actually performed worse than students who didn't receive tutoring at all. Even more troubling, students who received more tutoring sessions showed larger negative effects.
ALANNAH: So where was the bright spot? I know they found some positive results with attendance?
ERIC: Yes, attendance was the one area where they saw some improvement. Students were 6.9% less likely to be absent on tutoring days. The effect was particularly strong for middle school students, who were 11.4% less likely to be absent, and for chronically absent students, showing a 7.3% reduction.
ALANNAH: But even there, we should talk about cost-effectiveness, right? Can you break down what that means in actual days of attendance?
ERIC: Absolutely. The average improvement translated to about 2.3 more days of attendance per year. If we do the math, that means we're spending about $2,794 per additional day of attendance per student. Even for the highest-impact group - the chronically absent students - it works out to about $1,236 per additional day. I'm not sure we can call that a meaningful improvement, given the cost.
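Those per-day figures follow directly from the numbers already quoted. A quick sketch - note that the ~5.2-day gain for chronically absent students is implied by working backward from the quoted $1,236 per day, not stated outright:

```python
# Cost per additional day of attendance, from the figures quoted above.
cost_per_student = 6_426   # dollars per tutored student
avg_extra_days = 2.3       # average additional attendance days per year

print(f"Average: ${cost_per_student / avg_extra_days:,.0f} per extra day")  # ~$2,794

# Working backward from $1,236/day for chronically absent students
# implies they gained about 5.2 extra days on average.
print(f"Implied extra days for chronic absentees: {cost_per_student / 1_236:.1f}")
```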
ALANNAH: Those numbers really put things in perspective. Are there any other outcomes we should discuss?
ERIC: They did collect some survey data showing positive student experiences with tutoring, but given the cost and scale of this intervention, the minimal academic impact is concerning.
[Chapter 3 - Methodology] 7:52
ALANNAH: Let's talk about how this program was actually implemented. One thing that jumped out to me was the number of different organizations involved in delivering the tutoring.
ERIC: Yes, this is crucial. They had 14 organizations providing direct tutoring services, plus another 13 providers through CityTutor DC. That's 27 different organizations delivering what's supposed to be 'high impact' tutoring.
ALANNAH: That raises a number of red flags about consistency, doesn't it?
ERIC: Absolutely. When you look at Table A1 in the appendix, you see huge variations in how these programs were implemented. Some providers met with students five times a week, others only twice. Session lengths ranged from 20 minutes to 90 minutes. Group sizes varied from one-on-one to groups of four. One also wonders whether there was a common design and shared training across the tutoring organizations, or whether they were all left to their own devices.
ALANNAH: So, we're not really looking at one tutoring program, are we? We're looking at 27 different versions of tutoring.
ERIC: Exactly. Big red flag indeed. And this has major implications for interpreting the results. When we see those small effect sizes, we have to wonder: Is it because tutoring doesn't work, or because there was too much variation in how it was delivered?
ALANNAH: There's also a methodological issue here, right? The study didn't have a true control group?
ERIC: That's correct. Schools selected which students would receive tutoring, often choosing those who were struggling the most. Without a randomized control group, it's really hard to isolate the impact of the tutoring itself from all other factors.
ALANNAH: Can you explain for our listeners why that matters?
ERIC: Think of it this way - if you're selecting students who are struggling the most to receive tutoring, and then comparing their outcomes to students who weren't selected, you're not making an apples-to-apples comparison. The fact that these students narrowed the gap at all might actually be more meaningful than it appears, given they started further behind. There was also considerable variation among those who received tutoring. Many students, it appears, weren't just starting behind - they were starting far behind. Some of them might not have been ready to benefit from tutoring at all; other life factors might have been more pressing. But I'm speculating here.
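Eric's point can be made concrete with a small simulation. This is purely illustrative - the numbers are invented, not drawn from the study - but it shows how assigning tutoring to the lowest-performing students makes a naive tutored-versus-untutored comparison look negative even when tutoring genuinely helps:

```python
import numpy as np

# Illustrative selection-bias simulation (invented numbers, not study data).
rng = np.random.default_rng(0)
n = 10_000

baseline = rng.normal(0, 1, n)                    # pre-program achievement
tutored = baseline < np.quantile(baseline, 0.3)   # schools pick the strugglers

true_effect = 0.20                                # tutoring genuinely helps
outcome = baseline + true_effect * tutored + rng.normal(0, 0.5, n)

# Naive comparison, in standard deviations of the outcome.
naive_gap = (outcome[tutored].mean() - outcome[~tutored].mean()) / outcome.std()
print(f"True effect: {true_effect:+.2f} SD")
print(f"Naive gap:   {naive_gap:+.2f} SD")        # roughly -1.3 SD: large, negative
```

The naive comparison recovers a large negative "effect" because it conflates who was selected with what the tutoring did.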
ALANNAH: Though with that kind of investment - $6,426 per student - shouldn't we expect to see more substantial gains regardless?
ERIC: That's the big question. And it relates back to the implementation issues. With so many providers and so much variation in delivery, there's a real possibility that resources were spread too thin, trying to coordinate across all these different organizations rather than focusing on program quality and consistency.
ALANNAH: It's a classic scale versus quality trade-off, isn't it?
ERIC: Exactly. Sometimes in our rush to help as many students as possible, we can end up diluting the very things that make an intervention effective in the first place.
[Chapter 4 - Discussion] 11:09
ALANNAH: Let's step back and look at the bigger picture here. Eric, how should we think about the cost-benefit ratio of this program?
ERIC: Well, we want to be thoughtful about this. While $33 million is a significant investment, we have to remember this was a response to an unprecedented situation with COVID learning loss. Districts were under immense pressure to act quickly.
ALANNAH: That's a good point. Though even considering the urgency, the academic returns seem unimpressive for the investment.
ERIC: They do. And I think there's an interesting disconnect between the data and how it's presented in the report. The researchers frame this as a success story, particularly emphasizing the attendance improvements.
ALANNAH: I noticed that too. They seem to present any positive movement as a win, without really contextualizing the size of these effects relative to the investment or typical tutoring outcomes.
ERIC: Right. Though I can understand why. No one wants to discourage efforts to help struggling students. But I think we do need to be clear-eyed about what worked and what didn't if we want to improve.
ALANNAH: Speaking of improvement, what changes would you suggest for the next phase of this initiative?
ERIC: I see at least three main areas for potential improvement. First, they might want to consider consolidating providers. Having 27 different organizations delivering tutoring makes it really hard to maintain consistent quality.
ALANNAH: Yes. It might make political sense to involve so many organizations, but it doesn't lead to good design. What else, Eric?
ERIC: Second, they need clearer standards for implementation. The variation in session frequency, duration, and group size across providers suggests a need for more standardization.
ALANNAH: And the third?
ERIC: Better targeting of resources. Some students received over 50 sessions while others got fewer than 5. They need to better understand why this variation occurred and ensure students are getting enough sessions to make a difference.
ALANNAH: Those sound like constructive suggestions. Anything else you think they did well that should be maintained?
ERIC: Actually, yes. Their data collection was impressive, and they did reach their target population of at-risk students. The attendance improvements, while expensive, show that the program made some difference in students' engagement with school. Those are foundations they can build on.
ALANNAH: So, it's not about abandoning the effort, but rather refining it?
ERIC: Exactly. The goal isn't to criticize, but to learn and improve. High-impact tutoring has strong research support - we just need to figure out how to implement it effectively at scale.
ERIC: Alannah, what struck you most about this study?
ALANNAH: You know, what really stands out to me is that this dataset is actually a rich trove of information that could yield more insights. The researchers reported aggregate results across all 27 providers, but I'd be really interested in seeing which providers had better results than others. The overall effect sizes might be masking some important variations - maybe some providers achieved those 0.3-0.4 standard deviation improvements we typically see in tutoring research, while others had minimal or negative effects. That kind of analysis could tell us a lot about what implementation approaches actually work at scale.
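As a sketch of the reanalysis Alannah is describing - with a hypothetical file and column names ('provider', 'tutored', 'post_score'), since the initiative's data aren't public in this form - one could compute a crude effect size per provider:

```python
import pandas as pd

# Hypothetical reanalysis sketch: effect sizes by tutoring provider.
# The file name and columns are illustrative, not the study's actual layout.
df = pd.read_csv("student_outcomes.csv")

comparison = df.loc[df["tutored"] == 0, "post_score"]
sd = df["post_score"].std()  # crude pooled SD; a real analysis would
                             # adjust for baseline scores and demographics

def provider_effect(scores: pd.Series) -> float:
    """Standardized gap between one provider's students and the comparison pool."""
    return (scores.mean() - comparison.mean()) / sd

effects = (
    df.loc[df["tutored"] == 1]
      .groupby("provider")["post_score"]
      .apply(provider_effect)
      .sort_values()
)
print(effects)  # which providers beat or trailed the comparison group?
```

Even this simple cut would show whether the aggregate 0.05-0.06 estimates mask real provider-level variation.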
ERIC: That's a fantastic point. The answers to improving this program might actually be hiding in the data they already have.
ERIC: Anything else?
ALANNAH: You know, I wonder if there's a lesson here about scale. Instead of one big $33 million initiative trying to solve everything at once, maybe DC would have been better off running multiple smaller experiments.
ERIC: What do you mean by that?
ALANNAH: Well, they could have tried different approaches with smaller groups first - maybe test out various tutoring models, different provider arrangements, various session frequencies - and see what works best before scaling up. Start small, learn what works, then expand.
ERIC: That makes a lot of sense. It's the difference between placing one big bet and running several smaller tests, iterating to learn what works best.
ALANNAH: Right. And with these smaller experiments, you could actually have proper control groups and better data on what's really driving improvements. Instead of trying to hit a home run with one big program, you could get there with a series of well-designed base hits.
ERIC: That's a really practical suggestion for districts considering similar initiatives.
[Chapter 5 - Closing] 15:47
ALANNAH: We're coming to the end of our discussion of DC's High Impact Tutoring Initiative. Before we close, I want to acknowledge something important - DC took a bold step here. They invested significantly in trying to help their most vulnerable students, and they did it with remarkable transparency.
ERIC: Absolutely. And the research team gave us an incredibly detailed look at both implementation and outcomes. That kind of transparency is crucial for improving educational interventions.
ALANNAH: This study raises some really important questions for districts considering similar programs. Eric, what do you think are the key questions education leaders should be asking?
ERIC: I think the big one is about scale versus quality. How do we maintain program quality while serving large numbers of students? And then there's the question of cost-effectiveness - how do we balance the urgent need to help students with making sure we're using resources effectively?
ALANNAH: And I think we need to consider the complexity of implementation. Twenty-seven different providers - that's a lot to manage. But you know what? These are exactly the kinds of real-world implementation challenges we need to understand better. This initiative sets a solid foundation, especially for addressing disrupted education post-pandemic. But there is much left to explore, like how to enhance scalability, better serve students with disabilities, or engage English learners more comprehensively.
ERIC: The attendance findings are also really intriguing. Even if the academic gains were minimal, the fact that students were more likely to come to school on tutoring days tells us something important about engagement.
ALANNAH: Agreed. And as we discussed earlier, there's probably more to learn from this rich dataset about what worked and what didn't across different providers and contexts.
ALANNAH: You know, one other thought occurred to me. When we look at studies like this, focusing just on test scores and attendance numbers, we might be missing something important. For some of these students, especially after the pandemic, these tutoring sessions might have been their first steps toward reconnecting with school and teachers. Sometimes small steps - like building trust with a caring adult or starting to see school as a positive place - don't show up in our traditional metrics.
ERIC: That's a really thoughtful point. These softer outcomes are harder to measure but could be foundational for longer-term success.
ALANNAH: Exactly. While we need to be clear-eyed about program effectiveness and costs, we should also recognize that rebuilding educational engagement is sometimes about small wins that add up over time.
ALANNAH: To our listeners - thank you for joining us for this episode of Education Research Rounds. We hope this discussion helps you think critically about education interventions and implementation challenges in your own context. Be sure to join us next time when we'll be discussing... well, Eric, what are we discussing next time?
ERIC: You'll have to tune in to find out! But I promise it will be equally thought-provoking.
ALANNAH: For Education Research Rounds and AI-Learn Insights, I'm Alannah...
ERIC: And I'm Eric. Thanks for listening.