Would it surprise you if I told you that a popular and well-respected machine learning algorithm developed to predict the onset of sepsis has shown some evidence of racial bias?[1]  How can that be, you might ask, for an algorithm that is simply grounded in biology and medical data?  I’ll tell you, but I’m not going to focus on one particular algorithm.  Instead, I will use this opportunity to talk about the dozens and dozens of sepsis algorithms out there.  And frankly, because the design of these algorithms mirrors that of many other clinical algorithms, these comments apply to clinical algorithms generally.

Continue Reading Unpacking Averages: Understanding the Potential for Bias in a Sepsis Prediction Algorithm, a Case Study
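For readers who want a concrete sense of how this kind of bias gets surfaced, one common approach is simply to score the algorithm’s error rates separately by racial group and look for gaps.  Here is a minimal sketch of that subgroup audit; the file name, the column names (race, sepsis_onset, risk_score), and the 0.5 alert threshold are all hypothetical placeholders, not details of any particular sepsis model.

```python
import pandas as pd

# Hypothetical columns: 'race', 'sepsis_onset' (1 if the patient went on
# to develop sepsis), and 'risk_score' (the algorithm's output probability).
df = pd.read_csv("sepsis_predictions.csv")  # illustrative file name

THRESHOLD = 0.5  # hypothetical alert threshold; real systems tune this
df["alert"] = (df["risk_score"] >= THRESHOLD).astype(int)

def group_metrics(g: pd.DataFrame) -> pd.Series:
    """Sensitivity and specificity within one subgroup."""
    tp = ((g["alert"] == 1) & (g["sepsis_onset"] == 1)).sum()
    fn = ((g["alert"] == 0) & (g["sepsis_onset"] == 1)).sum()
    tn = ((g["alert"] == 0) & (g["sepsis_onset"] == 0)).sum()
    fp = ((g["alert"] == 1) & (g["sepsis_onset"] == 0)).sum()
    return pd.Series({
        "sensitivity": tp / (tp + fn),  # share of true sepsis cases alerted
        "specificity": tn / (tn + fp),  # share of non-sepsis cases not alerted
        "n": len(g),
    })

# Large gaps between groups are one signal that the algorithm may not
# serve all patient populations equally well.
print(df.groupby("race").apply(group_metrics))
```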

In prior posts here and here, I analyzed new data on FOIA requests, obtained from FDA through the Freedom of Information Act.  I looked at response times and then started to dive into the topics that requesters were asking about.  This is the third and final post on this data set, and it builds on the last post by taking the topics identified there to explore success rates by topic.  From there, I look at who is asking about those topics and how successful those individual companies are in their requests.

Continue Reading Unpacking Averages: Success Rates for FDA FOIAs by Topic and Requester
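The mechanics of this kind of analysis are straightforward.  Here is a minimal sketch of computing success rates by topic and then by requester within topic; the file name, the column names (topic, requester, disposition), and the disposition labels are hypothetical, since the real FOIA log uses its own schema.

```python
import pandas as pd

# Hypothetical columns: 'topic' (as assigned in the prior post's analysis),
# 'requester', and 'disposition' (e.g., 'granted', 'partial', 'denied').
logs = pd.read_csv("fda_foia_log.csv")  # illustrative file name

# Treat full and partial grants as successes; definitions can reasonably vary.
logs["success"] = logs["disposition"].isin(["granted", "partial"])

# Success rate by topic.
by_topic = logs.groupby("topic")["success"].agg(rate="mean", requests="size")
print(by_topic.sort_values("rate", ascending=False))

# Then drill into who is asking about each topic and how they fare.
by_requester = (
    logs.groupby(["topic", "requester"])["success"]
        .agg(rate="mean", requests="size")
        .sort_values("requests", ascending=False)
)
print(by_requester.head(20))
```

One design choice worth flagging: how you count partial grants materially changes the reported success rates, so it pays to state that definition up front.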

It is certainly easy, when writing code to accomplish some data science task, to start taking the data at face value.  In my mind, the data can simply become what they claim to be.  But it’s good to step back and remember the real world in which these data are collected, and how skeptical we need to be regarding their meaning.  I thought this month might be an opportunity to show how two different FDA databases produce quite different results when they should agree.

Continue Reading Unpacking Averages: The Difference Between Data and the Truth: Comparing FDA’s UDI Database with FDA’s 510(k) Database
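The core of a comparison like this is joining the two databases on a shared key, here a 510(k) number, and seeing what falls out on each side.  Below is a minimal sketch under assumed column names (premarket_number, k_number) and file names; the real GUDID and 510(k) extracts use their own schemas, so treat this purely as an illustration of the approach.

```python
import pandas as pd

# Hypothetical file and column names for the two FDA extracts.
udi = pd.read_csv("gudid_devices.csv")     # UDI (GUDID) device listings
k510 = pd.read_csv("510k_clearances.csv")  # 510(k) clearance database

# Normalize the shared key: a 510(k) number such as 'K123456'.
udi_keys = set(udi["premarket_number"].str.strip().str.upper().dropna())
k510_keys = set(k510["k_number"].str.strip().str.upper().dropna())

# If the two databases told the same story, these differences would be small.
print("In UDI but not 510(k):", len(udi_keys - k510_keys))
print("In 510(k) but not UDI:", len(k510_keys - udi_keys))
print("In both:", len(udi_keys & k510_keys))
```

Even the direction of a mismatch can be informative: for example, devices cleared long before UDI listing requirements took effect would plausibly appear in the 510(k) database but not in GUDID.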