
Nothing Scary About Outliers

Interested in taking a deep dive into data analytics? Then this blog post (which contains more technical information) is for you!

New Ways to Measure Child Wellbeing

Earlier this year, Partners for Our Children (POC) released our first Annual Report of Child Welfare System Performance. In it we introduced new ways to gauge how children are faring in the child welfare system – inspired by and aligned with proposed changes to the federal Child and Family Service Review measures.

Tracking how children are faring in the system is never easy. These new measures are sophisticated and allow us to take a closer look at many aspects of the system. One measure we created looks at the amount of time teenagers in state custody spend “on-the-run.” (Note what this measure isn’t: it is not the percentage of teenagers who run away, nor the amount of time an individual youth spends on-the-run. Rather, it combines both of those pieces of information to summarize the overall time spent on-the-run by youth in state custody.)

A Close Look at On-the-Run Information

We know that it’s often hard to track teenagers’ experiences in the system. This new way of measuring lets us better track – and support – these young people. We looked at all the days teenagers spend in state custody and calculated the percentage of those days on which they are flagged as “on-the-run” in the administrative data. Measuring it this way captures both the number and the duration of runaway incidents, while accounting for the number of children in state custody and how long they spend in out-of-home care (such as foster care and kin care).
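
To make the calculation concrete, here is a minimal sketch in Python, assuming a hypothetical day-level table with one row per youth per day in care. The column names (year, age, on_the_run) and the use of pandas are illustrative assumptions on our part, not POC’s actual data schema or tooling.

    import pandas as pd

    # Hypothetical day-level data: one row per youth per day in care.
    care_days = pd.DataFrame({
        "year":       [2010, 2010, 2010, 2010, 2010, 2010],
        "age":        [16,   16,   16,   17,   17,   17],
        "on_the_run": [False, True, False, False, False, True],
    })

    # Percent of all care-days flagged "on-the-run," by age and year.
    # Averaging over every care-day folds together how many youth run
    # and how long each episode lasts, scaled by total time in care.
    pct_otr = care_days.groupby(["age", "year"])["on_the_run"].mean() * 100
    print(pct_otr)

Because the denominator is all care-days, a year with more youth in care but the same runaway behavior produces the same percentage – exactly the “accounting for” described above.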

The graph below shows us that, not surprisingly, older youth spend more time on-the-run. It also appears that 2010 had an unusually low percentage of runaway days, especially for 16- and 17-year-olds.

""

Note: You can also see the On-the-Run measure on the Data Portal.

Deciphering Data: A Sometimes Messy Process

Data analysis is a messy process, and this dip in and around 2010 is a great example of that: it looks suspicious. There are a number of possible explanations, for example:

  • There could be some underlying issue where data wasn’t recorded properly in one region of the state, and was eventually corrected;
  • It could be related to the change in Children’s Administration’s data systems that began in 2009;
  • Perhaps there was a crackdown on runaway risks that caused this big dip in teenagers running away;
  • Or maybe it’s just random noise, a big coincidence.

So we did some digging. After looking at the underlying data, we did not find evidence for any single one of the explanations outlined above. Looking region by region, for example, only makes the statewide 2010 dip look coincidental: different regions have longer dips before or after 2010, and parts of those dips happen to “line up” in 2010 to create something noticeable statewide.

From there, we took a step back to ask whether there is any overall direction across the full span of our data. Indeed, we do see a downward trend in the percent of care-days spent on-the-run, though the trend is small and not “statistically significant” for any age group. The question is: can we trust the trend?
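
For readers who want to follow along, here is a hedged sketch of this kind of trend check, using ordinary least squares from Python’s statsmodels. The yearly percentages below are made-up placeholders with a dip at 2010, not POC’s actual series, and statsmodels is our choice of tool, not necessarily the one used for the report.

    import numpy as np
    import statsmodels.api as sm

    years = np.arange(2004, 2014)
    # Placeholder series of percent of care-days on-the-run, dipping in 2010.
    pct_otr = np.array([4.1, 4.3, 4.0, 4.2, 3.9, 3.8, 2.9, 3.7, 3.6, 3.5])

    X = sm.add_constant(years)   # intercept plus a linear year term
    ols = sm.OLS(pct_otr, X).fit()
    print(ols.params[1])         # slope: estimated change per year
    print(ols.pvalues[1])        # p-value: is the slope "significant"?

A small negative slope with a large p-value is the “downward but not statistically significant” pattern described above.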

Statistical significance tests offer a small amount of protection against overstating claims, but this outlier could be pulling the trend-lines downward. Should we worry that, without the 2010 dip, the trend would actually point upward? We could run the regression again leaving out the 2010 point, but throwing out a data point is a drastic step, one to avoid unless we are sure the point is an error. Instead, we use a statistical technique called robust regression, which resists the pull of outliers.
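
The post doesn’t name a specific robust method, so as one concrete illustration we use statsmodels’ RLM with Huber’s T weighting, which downweights points with large residuals instead of deleting them. The data are the same placeholder series as in the OLS sketch above; none of this is POC’s actual analysis.

    import numpy as np
    import statsmodels.api as sm

    years = np.arange(2004, 2014)
    pct_otr = np.array([4.1, 4.3, 4.0, 4.2, 3.9, 3.8, 2.9, 3.7, 3.6, 3.5])
    X = sm.add_constant(years)

    # Huber's T shrinks the influence of observations with large
    # residuals, so the 2010 dip is downweighted rather than removed.
    rlm = sm.RLM(pct_otr, X, M=sm.robust.norms.HuberT()).fit()
    print(rlm.params[1])   # robust slope, to compare with the OLS slope
    print(rlm.weights)     # 2010 gets a weight below 1, not zero

If the robust slope stays close to the ordinary one, the outlier wasn’t driving the result – which is exactly what we find below.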

The results from the robust regression turn out to be very similar to those from an ordinary regression – in fact, for thirteen-, fourteen-, and fifteen-year-olds the two are almost identical. The results for sixteen- and seventeen-year-olds differ more – they sit a little closer to “no trend” – but the changes are small.

None of the trends are “statistically significant,” so we are comfortable concluding that there is no evidence of an upward trend, even after accounting for the outlying year. We’ve also observed that this is a noisy time series, and we should expect to keep seeing a lot of variability. The trendlines for the oldest youth are especially noisy because of a smaller population: seventeen-year-olds have relatively few care-days.

The Power (and Cautions) in Data

Implementing the new measures will allow us to better quantify and understand the experiences of children and families in the child welfare system. But it is always important to take care in data analysis. Outliers – even alarming-looking ones – shouldn’t cause panic. Instead, we study them carefully, check possible explanations, and remember that one point doesn’t make a trend all by itself.