Research Article | Open Access | Open Peer Review

Compliance with Results Reporting at ClinicalTrials.gov Before and After the 2017 FDAAA Final Rule: A Comparative Analysis

  • Zakariyya Mughal
  • Rosita Fu
  • Thomas Luechtefeld
  • Karen Chiswell
  • Nicole Kleinstreuer
  • Gary Shaw
  • George F. Tidmarsh

Submitted: Dec 17, 2024 | Published: Jan 30, 2025 | DOI: https://doi.org/10.70542/rcj-japh-art-vr3aga


Peer Reviews

Peer Review 1

Stuart L. Goldstein

University of Cincinnati College of Medicine
DOI: https://doi.org/10.70542/rcj-japh-pr-16w21ul

In general, I find the manuscript to be well written and clear. The authors replicate the methodology of the study by Anderson et al. to assess researcher compliance in reporting data from clinical trials, as mandated by the updated 2017 FDAAA Final Rule. The rationale for the current study is to assess whether compliance has improved since the Anderson report and the 2017 FDAAA Final Rule.

The authors indeed perform a comprehensive analysis, including pre- and post-rule assessments, both in aggregate for the time spans covered and annually. They also perform several important sensitivity analyses, stratifying by funding source, trial phase, drug or device trial, and trial primary purpose. They also assessed compliance within 12- and 36-month time frames. Finally, in a supplemental analysis, they repeated all these assessments for studies that were granted extensions.

The Results generally show an improvement in compliance with reporting requirements since the Updated Rule and the Anderson publication, in both the primary analysis and the sensitivity analyses. Somewhat disappointingly but not unexpectedly, even though compliance rates improved, the rates were only about 25% at 12 months and 50% at 36 months. Compliance rates were far better for industry-sponsored trials.

The Discussion is appropriate and supported by the data in the Results. The authors’ comments are free from overt speculation or bias.

My main concerns for the manuscript center around incomplete reporting of statistical analyses or results, with a reliance on subjective rather than objective measures of differences between the groups.  In addition, I have several minor comments regarding the structure and reporting in the manuscript. These concerns are highlighted individually below by the manuscript section and not by order of importance.

Abstract

1. HLACT should be spelled out in this first instance.
2. The 95%CIs should be presented for all rates.

Methods

1. The choice of the permutation test as the primary comparison is acceptable, but this test is usually performed when sample sizes are small, which is not the case in this study. Furthermore, the authors do not state the number of shuffles that were performed (usually between 1,000 and 10,000), which is standard when using the permutation test. Finally, the permutation test is usually used when prior rates are unknown, yet the Anderson paper provides these prior rates. Other non-parametric tests should have been considered.
2. The authors do not provide a level for significance in the Methods, nor do they report the statistical package used, both of which are common practice in medical reporting.
3. The authors do not compare the Kaplan-Meier curves in each of the analyses, which is possible with the log-rank test.

Results

1. The authors also provide several Kaplan-Meier graphs to give a visual comparison of reporting compliance rates between different cohorts in the sensitivity analyses. In the Results, the authors use vague terms such as “generally improved”, “much higher rates”, “increased”, and “significant improvement” without any statistical comparison. Kaplan-Meier curves can be compared statistically for objective differences using the log-rank test.
2. The authors have instances of data interpretation in the Results that should be left for the Discussion.

a. “This may reflect strict regulatory oversight of pharmaceutical interventions, generally with more structured timelines and well-defined reporting requirements, whereas different device types have varying reporting requirements and often involve longer timelines that require iterative development.”
b. “This provides strong statistical evidence that the improvements in reporting rates between Window 1 and Window 2 (12 months: +14.9%, 36 months: +8.5%) represent genuine changes in trial reporting behavior.”

3. The 95%CI should be provided for all rates in every instance.
4. No need to have a “+” sign when the text states an increase.
5. Some of the numbers don’t match when added in the Results. For example, 40.8% + 8.5% doesn’t add up to 49.2%.
6. Some of the rates are reported to one decimal place and others to none.

Discussion

1. The discussion is appropriate and not biased or speculative. It would have been interesting for the authors to have assessed whether there was a reporting bias against negative trials. I realize this is out of the scope of the current manuscript, but the authors could have cited literature that would support such a speculation.

Peer Review 2

Lakhmir Chawla

Veterans Affairs Medical Center
DOI: https://doi.org/10.70542/rcj-japh-pr-1dlyc48

Mughal et al. have conducted an analysis of reporting compliance in ClinicalTrials.gov before and after the FDAAA Final Rule. The analysis compares two time windows that are both 4 years in duration. The analysis finds that the Final Rule was associated with an increase in 12-month and 36-month reporting rates.

Overall, this is a timely article about an important issue for physicians, scientists, and patients alike.

Major Critiques:

1. The selection of windows and time for follow-up is rational, but this is not well described in the Methods. A Figure in the Supplement might be useful to illustrate this decision and follow-up.

2. It is surprising that window 1 had more HLACTs than window 2. Is there a reason for this? Why is this the case? How can you be certain there is not ascertainment bias in the way studies were selected to account for this difference?

3. In the statistics section, which permutation test was done, and why was this test selected? More information should be included.

4. Section 3.3 should be renamed – statistical analysis should not be in the Results.

5. Discussion – the order of the discussion is non-standard. The standard approach is:

a. The findings from this analysis
b. What others have shown
c. Significance of the previous and current findings
d. Strengths
e. Limitations
f. Summary

While this recommendation is stylistic, I believe that this will enhance the flow of these interesting findings.

6. Figure 4 – seems odd that the ‘other’ category got worse – this should be commented on in the Discussion.

Minor Critiques:

Abstract


1. The acronym HLACT appears to be utilized without first having it written out.

2. The use of a reference in the abstract is non-standard and the entire sentence presumes information which has not yet been introduced. This should be corrected/modified.
3. The term ‘consistent majority’ is non-standard.
4. The second sentence of the conclusions is confusing and needs to be re-written.

5. Figure 1, Anderson study should have a footnote
6. 5.1.1 What is pseudocode?

 

Author Rejoinder

Zakariyya Mughal

Insilica

We appreciate the reviewers' thorough feedback on the study design and their requests for clarification.

Permutation Tests Rationale

Typically, permutation tests are performed for small sample sizes, which is not the case here: Window 1 has N = 14,174 trials and Window 2 has N = 9,880. The number of shuffles used is given in the caption of the table (“Permutation test results at (N = 50,000 replicates), stratified by Trial Characteristics”), but it should be stated in the text as well.

We also performed the more widely used parametric chi-squared tests, but these were not reported in the paper or supplement, as both tests led to the same conclusion: the p-values for all strata fall below the significance level of 0.001 (Table 1). The chi-squared test makes one additional assumption, namely that the test statistic (in this case, the difference in frequencies) is chi-squared distributed. The permutation test simulates the null hypothesis most directly, without this additional assumption.
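To make the comparison concrete, the following is a minimal sketch of a Pearson chi-squared test on a 2x2 table (window by reported/not reported). It is not the authors' code: the function name `chi_squared_2x2` and the counts in the example are hypothetical, no continuity correction is applied, and only the Python standard library is used.

```python
import math


def chi_squared_2x2(reported_a, total_a, reported_b, total_b):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Rows are the two groups; columns are reported / not reported.
    Returns (statistic, p_value).
    """
    observed = [
        [reported_a, total_a - reported_a],
        [reported_b, total_b - reported_b],
    ]
    grand_total = total_a + total_b
    row_totals = [total_a, total_b]
    reported_total = reported_a + reported_b
    col_totals = [reported_total, grand_total - reported_total]

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts for illustration only (not taken from the article).
statistic, p_value = chi_squared_2x2(400, 1000, 500, 1000)
print(f"chi-squared = {statistic:.3f}, p = {p_value:.2e}")
```

The p-value here relies on the chi-squared approximation to the null distribution, which is the distributional assumption the permutation test avoids.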

We chose the permutation test despite the large sample sizes and greater computational requirements because we wanted consistency across the various stratified analyses, free from any distributional assumptions. Differences in the trial type distributions are discussed in the Supplement section “Trial Type Composition changes between Window 1 and Window 2” and, as can be seen in the figure, there is quite a bit of variation across the windows.
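As an illustration of the shuffling procedure described above, here is a minimal sketch of a two-sided permutation test for a difference in reporting rates. It is an assumption-laden sketch rather than the authors' implementation: the rejoinder names NumPy for the shuffling, whereas this version uses the standard library `random` module to stay self-contained, and the function name, the two-sided definition of "extreme", and the toy data are all hypothetical.

```python
import random


def permutation_test_rate_difference(outcomes_a, outcomes_b,
                                     n_replicates=10_000, seed=0):
    """Two-sided permutation test for a difference in proportions.

    outcomes_a and outcomes_b are sequences of 0/1 flags
    (1 = results reported by the deadline). Returns (observed_diff, p_value).
    """
    rng = random.Random(seed)
    n_a = len(outcomes_a)
    pooled = list(outcomes_a) + list(outcomes_b)
    n_b = len(pooled) - n_a
    total_ones = sum(pooled)
    observed = sum(outcomes_b) / n_b - sum(outcomes_a) / n_a

    extreme = 0
    for _ in range(n_replicates):
        # Shuffling the pooled labels simulates the null hypothesis that
        # group membership is unrelated to the outcome.
        rng.shuffle(pooled)
        ones_a = sum(pooled[:n_a])
        ones_b = total_ones - ones_a
        diff = ones_b / n_b - ones_a / n_a
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_replicates


# Toy data for illustration only (not taken from the article).
window_1 = [1] * 40 + [0] * 60
window_2 = [1] * 70 + [0] * 30
diff, p_value = permutation_test_rate_difference(window_1, window_2,
                                                 n_replicates=2_000)
print(f"observed difference = {diff:.3f}, p = {p_value}")
```

With this estimator, a reported p-value of 0.0 means that no shuffled replicate was at least as extreme as the observed difference, so it is best read as "smaller than 1 / n_replicates" rather than as exactly zero.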

Level of significance

We have added text to the Methods with the level of significance.

Statistical software packages

The software package used for the Kaplan-Meier analysis is ggsurvfit v1.1.0 under R v4.2.3. The permutation test was implemented using a shuffling procedure from NumPy v1.24.2 under Python v3.10.13.

Log-rank test

Yes, this additional analysis greatly clarifies and quantifies these results. Table 2 shows that most strata agree between the log-rank test and the 36-month permutation test, with three notable exceptions. ‘Funding: Other’ and ‘Phase: N/A’ have p-values very close to the alpha = 0.001 threshold in the permutation test (p = 0.00048 and p = 0.00094, respectively) while showing no significance in the log-rank test (p = 0.432 and p = 0.137). ‘Purpose: Other’ presents a different pattern, showing clear significance in the permutation test (p = 0.0) but not in the log-rank test (p = 0.0257).

Both tests are useful: the log-rank test uses the entire event history rather than a single point in time, while the permutation test at 36 months targets the point in time by which all applicable clinical trials should have submitted data, whether or not they were granted an extension.
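To show how the log-rank test uses the whole event history, here is a minimal two-group sketch written with the standard library only. It is not the authors' analysis (the rejoinder reports using R for the Kaplan-Meier work); the function name `log_rank_test`, the data layout (time to reporting in months plus a 0/1 event flag, where 0 marks a censored trial), and the example values are hypothetical.

```python
import math


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-squared statistic, p_value).

    times_*  : time to reporting (or to censoring) for each trial
    events_* : 1 if results were reported at that time, 0 if censored
    """
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for x in times_a if x >= t)
        at_risk_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n = at_risk_a + at_risk_b
        d = d_a + d_b
        # Expected events in group A under the null (hypergeometric mean).
        observed_minus_expected += d_a - d * at_risk_a / n
        if n > 1:
            variance += (d * (at_risk_a / n) * (at_risk_b / n)
                         * (n - d) / (n - 1))
    if variance == 0.0:
        return 0.0, 1.0
    statistic = observed_minus_expected ** 2 / variance
    # 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2.0))


# Toy data for illustration only; a 0 flag marks a trial censored at 36 months.
months_w1 = [10, 14, 20, 30, 36, 36]
flags_w1 = [1, 1, 1, 1, 0, 0]
months_w2 = [6, 9, 11, 12, 15, 36]
flags_w2 = [1, 1, 1, 1, 1, 0]
stat, p = log_rank_test(months_w1, flags_w1, months_w2, flags_w2)
print(f"log-rank chi-squared = {stat:.3f}, p = {p:.3f}")
```

Because every event time contributes to the statistic, two cohorts can differ at a single cut-off such as 36 months while their full curves are not distinguishable by the log-rank test, and vice versa.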

Reporting of 95% CIs

We have added these to the supplement for Version 2 of the article.
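For readers who want to reproduce interval estimates for a reporting rate, the sketch below computes a Wilson score interval. The rejoinder does not say which interval method was used, so the choice of the Wilson interval, the function name `wilson_interval`, and the example counts are assumptions made purely for illustration.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion.

    The default z corresponds to a two-sided 95% interval.
    Returns (lower, upper).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1.0 + z * z / total
    centre = (p_hat + z * z / (2.0 * total)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / total + z * z / (4.0 * total * total)
    )
    return centre - half_width, centre + half_width


# Hypothetical counts for illustration only (not taken from the article).
lower, upper = wilson_interval(successes=50, total=100)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

With sample sizes in the thousands, as in the two study windows, the simpler normal-approximation interval would give nearly identical bounds; the Wilson form is used here only because it also behaves well for small strata.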

Reporting of values

Regarding the use of a “+” prefix alongside “increase”, we chose this to emphasize that these values are changes in percentages (i.e., to indicate the unit of the value), especially when they appear next to other percentage values.

We have looked through the manuscript to ensure that all relevant reported values are rounded to one decimal place. As a consequence, sums of rounded values may not match exactly: for example, the text has 40.8% + 8.5%, which does not add up to 49.2%, because the unrounded values were 40.79% + 8.45% = 49.24%.
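The arithmetic can be checked with a short sketch that uses the three unrounded values quoted above. The helper name `round_one_decimal` is hypothetical, and round-half-up is an assumption chosen because it matches the 8.5% shown in the text.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_one_decimal(value):
    """Round a Decimal to one decimal place, with halves rounded up."""
    return Decimal(value).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


baseline = Decimal("40.79")      # Window 1 rate at 36 months, unrounded
change = Decimal("8.45")         # change between windows, unrounded
exact_total = baseline + change  # 49.24

# Rounding each part first and then adding overshoots by 0.1 ...
rounded_parts_sum = round_one_decimal(baseline) + round_one_decimal(change)
# ... whereas rounding the exact total gives the value reported in the text.
rounded_total = round_one_decimal(exact_total)

print(rounded_parts_sum, rounded_total)
```

`Decimal` is used instead of floats because binary floating point cannot represent 8.45 exactly, which would make the rounding of the half case unreliable.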

Clarification on the window selection

There is a timeline in the supplement, but to summarize, the window was chosen to be similar in size to that of Anderson et al. (2015), followed by enough time for trials completing near the window's stop date to submit their results information.

Window 2 has fewer trials than Window 1

Window 2 has an additional criterion, applied in order to match the regulation more closely: trials in Window 2 must also start after the rule's effective date of Jan 18, 2017. Trials in Window 1 do not have this additional criterion, as the rule was published in 2016.
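The extra Window 2 criterion can be sketched as a simple date filter. Only the effective date comes from the text above; the record layout, the field name `start_date`, the trial identifiers, and the use of a strict "after" comparison are hypothetical, and the other inclusion/exclusion criteria are deliberately not modeled.

```python
from datetime import date

# Effective date of the Final Rule, as stated in the rejoinder.
RULE_EFFECTIVE_DATE = date(2017, 1, 18)


def meets_window_2_start_criterion(trial):
    """Return True if the trial started after the rule's effective date.

    `trial` is a hypothetical dict with a `start_date` entry; whether the
    boundary day itself qualifies is an assumption (strictly after, here).
    """
    return trial["start_date"] > RULE_EFFECTIVE_DATE


# Made-up records for illustration only.
candidate_trials = [
    {"id": "TRIAL-A", "start_date": date(2016, 11, 2)},
    {"id": "TRIAL-B", "start_date": date(2017, 1, 18)},
    {"id": "TRIAL-C", "start_date": date(2017, 6, 30)},
]
window_2_trials = [t["id"] for t in candidate_trials
                   if meets_window_2_start_criterion(t)]
print(window_2_trials)
```

A filter of this kind removes trials that would otherwise qualify by their completion date alone, which is one way the smaller Window 2 count described above can arise.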

With regard to ascertainment bias, the HLACTs (highly likely applicable clinical trials) are approximations of the actual ACTs (applicable clinical trials). Since applicability is not fully public, we apply the inclusion/exclusion criteria outlined in the Final Rule algorithmically to both study windows. (Apart from the rule's effective date described above, the algorithm is consistent across both windows.) The team manually verified sub-samples to ensure that the criteria were properly implemented. We compute confidence intervals to further assess sampling and selection bias and to obtain robustness measurements.

Pseudocode

Pseudocode is a simplified, human-readable way of describing an algorithm without using actual programming language syntax. This enables technical and non-technical people to understand the logic of a program without the ambiguity of prose or extraneous implementation details.

To address any difficulties in reading pseudocode, funding source categorization is now explained in prose in the main part of the paper: “Trials are classified based on their funding sources in the following priority order…”.