Unpacking Averages: Device Inspection Citations That Frequently Precede Warning Letters

Overview

In this month’s post, I explore which kinds of medical device inspection citations most often precede a warning letter. I do not try to prove causation; I am simply exploring correlation. With that caveat in mind, I think it is still informative to see which types of inspectional citations precede a warning letter in a high percentage of cases. And, as I’ve said before, joining two different data sets – in this case inspectional data with warning letter data – might just reveal new insights.

Background

Before diving into the analysis, it’s important to understand the data and the outside trends or forces that might act on it. We have just gone through an extraordinary two years in which, in unprecedented fashion, FDA’s inspection process was essentially shut down. From a warning letter standpoint, without inspection data, FDA focused on compliance areas other than those it typically might. In addition, during the pandemic we had a change in presidential administrations that by most standards was significant.

As a result, I start by looking at the rate of inspection citations compared to the rate of warning letters to see what it might tell us about the underlying data.

When I look at the chart, I see three relatively clear time periods.

The Obama Administration years characterized by relatively consistent, and relatively high, numbers of citations and numbers of warning letters over the years FY 2011 through 2016.
The early Trump Administration years, which would be FY 2017 through 2019. In those years, the number of citations stayed relatively high as facility inspections continued, but the warning letters fell off dramatically. During that time period, inspection citations were simply not leading to warning letters at the same rate they had previously.
The COVID years of fiscal 2020 and fiscal 2021. In those years, in-person inspections stopped, and so citations fell off precipitously. But warning letters ticked up slightly as they were directed toward unapproved products like fake COVID tests and unauthorized personal protective equipment, warning letters that did not require facility inspections.

Thus, if I wish to focus on what inspection observations lead to warning letters, I need to recognize that during the early Trump years, frankly very few inspectional observations led to warning letters. And because the COVID experience altered the number of inspections and the nature of warning letters issued, I need to be aware of that too.

Given those anomalies, what might this data tell us about the future? Just a short while ago FDA recommenced inspections. Further, we are now under a Democratic President whose administration bears similarities to the Obama Administration. At least on the surface, it seems likely that we may return to a pattern in these data more akin to the Obama years. Thus, if we focus on the Obama years, this analysis might give guidance on what we should expect over the next couple of years.

Methodology

Inspection Data Set

I loaded the data set of all inspections organized by facility establishment number. There were 255,000 inspections in the data set, which goes back to 2009. The data include such things as the date of the inspection, whether citations were issued, and the geographic location. Importantly, they also include an inspection ID.

I filtered to select only those for medical devices. That produced about 40,000 inspections.

FDA enters the exact same citations according to different project areas within their inspectional workflow, but from my standpoint, that simply produces duplication which amounts to noise. As a result, I also filtered to identify only unique inspections regardless of the associated project area. That produced just over 31,000 inspections.
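The two filtering steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout, the product type label "Devices", and the project area names "Area A" and "Area B" are hypothetical and are not taken from FDA's data set.

```python
# Hypothetical rows: (inspection_id, establishment_number, product_type, project_area).
# All values below are made up for illustration.
inspections = [
    ("I-100", "E-1", "Devices", "Area A"),
    ("I-100", "E-1", "Devices", "Area B"),  # same inspection under a second project area
    ("I-101", "E-2", "Drugs", "Area C"),
    ("I-102", "E-3", "Devices", "Area A"),
]

def unique_device_inspections(rows):
    """Keep device rows only, then keep one row per inspection ID."""
    seen = set()
    unique = []
    for inspection_id, establishment, product_type, _project_area in rows:
        if product_type != "Devices":
            continue  # drop non-device inspections
        if inspection_id in seen:
            continue  # drop repeats of the same inspection under another project area
        seen.add(inspection_id)
        unique.append((inspection_id, establishment))
    return unique

device_inspections = unique_device_inspections(inspections)
print(device_inspections)
```

Keying the deduplication on the inspection ID alone is the design choice that matters here: the project area is ignored on purpose, so one physical inspection counts once.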

Citations Data Set

The actual content of the citations is located in a different data set. These are organized by inspection ID. There are about 220,000 total citations in this data set. I filtered for only those that relate to devices, which produced just over 41,000 citations. There are about 2700 unique device citations that FDA used. FDA groups those 2700 unique citations into about 460 categories. Down below, those categories are listed as the “short description.” These categories may reflect anywhere from one to perhaps a dozen unique citations.

Warning Letter Data Set

This is a data set that identifies the company that received the warning letter, the date of the warning letter and, importantly, the facility to which the warning was directed. When I filter this based on the program area for medical devices, I get about 1600 warning letters, again going back to fiscal 2009. Note that this data set does not include the actual warning letter texts. It is just a table listing the warning letters sent. For my post next month, I downloaded the last five years of warning letter texts to analyze the actual content of the warning letters, but I will not be talking about that in this post.

Joinder

That is all the raw data, and this is where it gets interesting (at least for a regulatory data scientist). I combined the different data sets to be able to see connections among them.

Going from left to right, I start with my core data set being the medical device warning letters. Then, to the right of that, merging the data sets on common establishment registration numbers, I add on the device inspections that relate to those manufacturing establishments that received warning letters.

My next step, again going right, is to add in all of the citations associated with a given inspection. I can do this because the citations database and the inspection database are both organized by an inspection reference number, so I can connect the dots between the citations that flowed from a specific inspection.

Ultimately my goal is to figure out what inspection citations preceded warning letters. This might seem arbitrary, but in my professional experience the most relevant inspections typically occur within the year preceding the issuance of the warning letter. Thus, I filtered the inspections using a 365-day window. That left just over 8000 citations in a database that covered the years 2009 through 2021. Those 8000 citations all occurred within 365 days prior to the issuance of a warning letter for the facility to which the citations related.
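The joinder and the 365-day filter can be sketched as follows. This is a minimal illustration, not the actual pipeline: the dictionary field names, the sample records, and the choice to treat both ends of the window as inclusive are all assumptions.

```python
from datetime import date, timedelta

# Hypothetical sample records; field names are illustrative, not FDA's column names.
warning_letters = [
    {"establishment": "E-1", "letter_date": date(2014, 6, 1)},
    {"establishment": "E-3", "letter_date": date(2013, 3, 15)},
]
inspections = [
    {"inspection_id": "I-100", "establishment": "E-1", "end_date": date(2014, 1, 10)},
    {"inspection_id": "I-090", "establishment": "E-1", "end_date": date(2012, 2, 1)},  # too old
    {"inspection_id": "I-102", "establishment": "E-3", "end_date": date(2013, 4, 1)},  # after the letter
]
citations = [
    {"inspection_id": "I-100", "category": "Report of risk to health"},
    {"inspection_id": "I-100", "category": "Personnel"},
    {"inspection_id": "I-090", "category": "Distribution records"},
    {"inspection_id": "I-102", "category": "Personnel"},
]

WINDOW = timedelta(days=365)

def citations_preceding_letters(letters, inspections, citations, window=WINDOW):
    """Return citations from inspections that ended within `window` before a
    warning letter sent to the same establishment."""
    # Index inspections by establishment and citations by inspection ID for the two joins.
    by_establishment = {}
    for insp in inspections:
        by_establishment.setdefault(insp["establishment"], []).append(insp)
    by_inspection = {}
    for cit in citations:
        by_inspection.setdefault(cit["inspection_id"], []).append(cit)

    matched = []
    for letter in letters:
        for insp in by_establishment.get(letter["establishment"], []):
            gap = letter["letter_date"] - insp["end_date"]
            if timedelta(0) <= gap <= window:  # assumed inclusive at both ends
                matched.extend(by_inspection.get(insp["inspection_id"], []))
    return matched

preceding = citations_preceding_letters(warning_letters, inspections, citations)
print([c["category"] for c in preceding])
```

In this sketch an inspection that falls inside the windows of two different letters would contribute its citations twice, so a real analysis would need to decide whether to deduplicate.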

Putting the Data in Context

I want to show changes over time, so I sorted the 8000 citations by fiscal year. Then, I want to make sure that I am only dealing with repetitive observations in a given fiscal year where I would have enough data to draw some reasonable conclusion. Therefore, I arbitrarily set a threshold that there had to be at least five such citations in a given year, regardless of the facility, to use the data. It does not seem meaningful to talk about numbers less than that in a given year.

That actually removed quite a few citations. There were many instances where only 1 to 4 citations of a given kind were issued in a given year. Even though I said the Obama years were relevant, I did not want to go back more than 10 years from now. The world just changes too much in that amount of time.

For the years of interest, 2011 through 2019 inclusive (I am going to ignore the COVID years), there were just over 400 categories of citations that met these criteria. In other words, there were an average of about 45 categories per year of citations that preceded a warning letter during the 365 day window.
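The fiscal-year grouping and the five-citation minimum described above can be sketched like this. The federal fiscal year runs from October 1 through September 30; which date is used to assign a citation to a fiscal year, and the sample records themselves, are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

def fiscal_year(d):
    """Federal fiscal year: FY2014 runs from 1 Oct 2013 through 30 Sep 2014."""
    return d.year + 1 if d.month >= 10 else d.year

MIN_PER_YEAR = 5  # the threshold described in the text

def frequent_categories(citations, minimum=MIN_PER_YEAR):
    """Count citations per (fiscal year, category); keep counts of at least `minimum`."""
    counts = Counter((fiscal_year(c_date), category) for c_date, category in citations)
    return {key: n for key, n in counts.items() if n >= minimum}

# Hypothetical citations: (date, short description).
sample = (
    [(date(2013, 11, 5), "Personnel")] * 5              # FY2014, meets the threshold
    + [(date(2014, 3, 2), "Distribution records")] * 4  # FY2014, below the threshold
    + [(date(2014, 10, 1), "Personnel")] * 2            # FY2015, below the threshold
)
kept = frequent_categories(sample)
print(kept)  # {(2014, 'Personnel'): 5}
```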

Remember that my overall objective is to find those citations that most frequently occur within one year prior to a warning letter. On the one hand, I could just use the raw numbers to assess frequency. But the problem with raw numbers is that they do not take into account which citations are simply more common, even if they are less serious. There may be some citations that FDA gives out in many cases, and so it is not surprising that those citations might precede a warning letter. I want to focus on the more serious citations that might actually trigger a warning letter.

A concrete example might make this easier to appreciate. Let’s say in a given year, one of the most common inspection citations is for inadequate complaint handling procedures. Let’s say in fiscal 2014, FDA issued 1000 such citations. Let’s say that 20 of them were directed at facilities that within the next 365 days received a warning letter.

I would argue that receiving an inspection citation for inadequate complaint handling really doesn’t tell you much about your risk of getting a warning letter. Only 2% of facilities that received such citations got a warning letter within the next 365 days. In this hypothetical, it seems that such citations are more common than serious.

Therefore, to figure out which citations are more likely indicators that a warning letter might follow, I needed to normalize the data by calculating appropriate denominators for each category of citations. In other words, I needed to calculate how many such citations there were in a given year in a given category, whether or not they preceded a warning letter, to use as a denominator. Once I did that, I then simply divided 1) the number of citations that preceded a warning letter by 2) the total number of that specific citation for that given fiscal year. By normalizing the data in this way, the percentages are more likely to be meaningful indicators that a warning letter will follow.

In contrast to the complaint handling example, hypothetically let’s say that FDA in fiscal 2014 issued 10 citations to the effect that a correction or removal conducted to reduce a risk to health posed by a device was not reported to FDA. And let’s say that five times in that year such citations preceded a warning letter. In other words, in fiscal 2014, 50% of such citations preceded a warning letter.

If your firm, after an inspection, receives a correction or removal citation of the kind described above, statistically it is more likely to receive a warning letter than if it had received a citation for inadequate complaint handling, based purely on the hypothetical data. Again, I’m just talking correlation and not causation. But this is true even though 20 inadequate complaint handling citations preceded a warning letter in FY 2014, and only 5 correction or removal citations preceded a warning letter that year.
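The normalization is a simple ratio, and the two hypothetical FY 2014 examples can be worked through in a few lines. The function name and category labels are illustrative; the numbers are the hypothetical ones used in the text, not real data.

```python
def precede_rate(preceding, total):
    """Share of a category's citations in a year that preceded a warning letter."""
    if total == 0:
        raise ValueError("no citations in this category and year")
    return preceding / total

# The two hypothetical FY 2014 categories from the text: (preceding, total issued).
hypothetical_fy2014 = {
    "Complaint handling": (20, 1000),
    "Report of risk to health": (5, 10),
}

rates = {name: precede_rate(p, t) for name, (p, t) in hypothetical_fy2014.items()}

# Rank by rate, not by raw count: the category with fewer hits comes out on top.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rate:.0%}")
# Report of risk to health: 50%
# Complaint handling: 2%
```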

Obviously, this data analysis does not take into account the unique situations presented in a given inspection beyond the inspectional citations listed. I am only explaining why I wish to divide the citations preceding a warning letter data by the frequency of such citations in order to normalize the data and give them some context. The frequency of citations matters.

Once I normalized the data for the whole data set, I picked a 50% probability as a good cutoff to show the inspectional observations most likely to precede a warning letter. I could have picked any percentage I wished. The lower the percentage, the more types of citations the analysis would report. Picking 50% produced 14 citation categories, which seems like a good set on which to focus. If I lowered the threshold to 40%, almost 40 citation categories met that criterion. If you want the list associated with that lower threshold, drop me a note.
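To make the effect of the cutoff concrete, here is a small sketch with made-up rates. The category names and percentages are hypothetical, and treating the cutoff as inclusive (at or above 50%) is an assumption, since the text does not say whether a rate of exactly 50% qualifies.

```python
# Hypothetical normalized rates keyed by (fiscal year, category); not real data.
hypothetical_rates = {
    ("FY2012", "Category A"): 0.62,
    ("FY2013", "Category B"): 0.50,
    ("FY2013", "Category C"): 0.44,
    ("FY2014", "Category D"): 0.41,
    ("FY2014", "Category E"): 0.12,
}

def select(rates, cutoff):
    """Return the (year, category) keys whose rate is at or above `cutoff`."""
    return sorted(key for key, rate in rates.items() if rate >= cutoff)

at_50 = select(hypothetical_rates, 0.50)
at_40 = select(hypothetical_rates, 0.40)
print(len(at_50), len(at_40))  # 2 4
```

Lowering the cutoff can only add entries, never remove them, which is why the 40% list contains everything on the 50% list.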

Visualization

The 50% threshold led to the following chart:

Notice, as predicted from the chart above, there are no data points meeting this filter after 2014.

Results in English

You may prefer those results in table form with an example of the actual citation to give you a better understanding of the category. In this case, I selected categories where they met the criteria in at least one year. Indeed, none of the categories met the criteria in multiple years.

Here is that table:

Short Descriptions	Example Citation
Design input – documentation	Design input requirements were not fully documented.
Design output – documentation	Design output was not adequately documented before release
Design plans – Lack of or inadequate	The design plan does not describe the design and development activities and define responsibility for implementation of design and development activities.
Design review – documentation	The design review results, including identification of the design, the date, and the individual performing the review, were not documented and filed in the design history file.
Design validation – simulated testing	The design was not validated using production units under actual or simulated use conditions.
Distribution records	Distribution records do not include the name and address of the initial consignee, the identification and quantity of devices shipped, the date shipped, and control numbers.
Evaluation, timeliness, identification	Complaints representing events that are MDR reportable were not promptly reviewed, evaluated, and investigated by a designated individual and clearly identified.
Incoming acceptance records, documentation	Acceptance or rejection of incoming product was not documented.
Info evaluated to determine if event was reportable	The written MDR procedure does not include documentation and recordkeeping requirements for all information that was evaluated to determine if an event was reportable.
Personnel	Personnel do not have the necessary education, background, training, and experience to perform their jobs.
Quality policy and objectives	The quality policy, quality objectives, and [sic] was not established by management with executive responsibility.
Report of risk to health	A correction or removal, conducted to reduce a risk to health posed by a device, was not reported in writing to FDA.
Sampling methods – Lack of or inadequate procedures	Procedures to ensure sampling methods are adequate for their intended use have not been adequately established.
Servicing – Lack of or inadequate procedures	Procedures or instructions for performing servicing activities and verifying that servicing meets specified requirements have not been adequately established.

I would note that when I reduced the threshold criteria to 40%, and got roughly 40 categories, several of those categories started to show up in multiple years.

To pick an example unique citation from the data set to illustrate a given short description category, I used machine learning. Broadly speaking, I took all of the citations that fit within the given short description category, and I selected a representative example on the basis of which citation used words that were most commonly used in all of the citations. As a result, perhaps not surprisingly, I ended up picking the longer citations that included more of the keywords.
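One simple way to implement that kind of selection is sketched below. This is an assumption about the general approach, not the actual model used: it scores each citation by how many citations in the category share each of its distinct words and keeps the highest scorer. The sample citations are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def representative(citations):
    """Pick the citation whose distinct words are most widely shared in the category."""
    # Document frequency: in how many citations does each word appear?
    doc_freq = Counter()
    for text in citations:
        doc_freq.update(set(tokenize(text)))
    # Score a citation by summing the document frequency of its distinct words.
    return max(citations, key=lambda text: sum(doc_freq[w] for w in set(tokenize(text))))

# Hypothetical citations within one short description category.
sample = [
    "Distribution records were not maintained.",
    "Distribution records do not include the name and address of the initial consignee.",
    "Distribution records do not include the date shipped.",
]
best = representative(sample)
print(best)
```

Because the score is a sum over words, a longer citation has more chances to accumulate points, which matches the tendency toward longer examples noted above.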

Interpretation

High-Level Meaning

I’d like to reiterate that all I’m really doing here is looking for correlation, i.e. those inspectional citations that from a purely statistical standpoint frequently precede the issuance of a warning letter. I’m not trying to prove that a given inspectional observation necessarily caused the warning letter to be sent.

Given the outline of the political and environmental changes from 2011 through 2021, it’s not surprising that the data emphasize citations from the early years of that decade. There were no inspection observations during the Trump years or during the COVID years that in more than 50% of the instances preceded a warning letter.

It’s also interesting that there’s no repetition in the filtered citations from year to year. That suggests that there aren’t simple inspectional citation categories that reliably, year after year, meet the 50% threshold test of preceding a warning letter. But there were repetitions when I lowered the threshold to 40%.

As already explained, in this analysis we are focused on the normalized data, having divided each category by the total number of citations in that category. The chart would look very different if we were simply counting the number of citations in a given category that precede a warning letter. Such a chart would be dominated by the most frequent citations such as failure to undertake adequate complaint handling. In contrast, the data in the chart above are arguably more meaningful because they are put in context for how frequently a given citation is issued.

More Specific Findings

At a more granular level, we can see that some deficiencies related to the design controls often will precede a warning letter. Frankly, all elements of a quality system are important, but you can imagine that FDA might be particularly disturbed if they do not have confidence that the product was well-designed to meet its intended use.

There are also categories that you might interpret as indicative of a significant failure of the quality system. For example, we could imagine that FDA would be quite concerned if a company does not have:

A quality policy.
Information adequate to evaluate whether an MDR is necessary.
In the context of risk management, an adequate assessment of risk to determine whether a recall might be necessary.

It’s not hard to see in those cases why FDA might conclude that a warning letter is appropriate.

Conclusion: A Balancing Act

I had to make some choices about how high-level vs. specific I should get in presenting these data.

On the one hand, I avoided doing this analysis at a higher level, for example compressing the over 400 inspectional citation categories into say 40 that correspond more directly with the regulations, organizing the citations into high-level categories like design controls, MDR compliance and labeling requirements. It’s easy to do that, but then a lot of information is lost in the generalization.

For example, while there are dozens of different observations that relate to design controls in some way, only five were identified in this analysis as preceding a warning letter in over 50% of the cases. Thus, while five design control citations precede a warning letter 50% of the time, there were dozens of other design control observations that did not rise to the level these five did. If I simply reported out the results at a high level (i.e. design controls in general), we would lose that more specific insight of the five that rise to the top.

On the other hand, I resisted getting too granular to the point where the results might be considered merely anecdotal, driven by unseen and unreported idiosyncratic forces. By requiring at least five citations in any category in any year to be considered, I tried to stay away from the unique circumstances that might be behind a rarely used citation.

In the same way, I could have sorted the data by such factors as the size of the company, or a particular country in which the facility resided or other such demographic factors. But I found that when I did that, it produced truly small numbers that were anecdotal. I wanted to stay at a high enough level to discern meaningful statistical associations and trends.

These are simply the judgments I made. There are many different ways to do this analysis, and in the coming months I will undoubtedly revisit this topic in a different light.

Attorney Bradley Merrill Thompson is the Chairman of the Board and Chief Data Scientist for EBG Advisors and a Member of the Firm at Epstein Becker Green.

The opinions expressed in this publication are those of the author.

Tags: Bradley Merrill Thompson, FDA, Inspection Citation, Medical Devices, Unpacking Averages™, Warning Letter
