Brain Imaging in Neonatal Clinical Trials: In Search of a Gold Standard
Abbreviations: CUS, Cranial ultrasound scanning; IVH, Intraventricular hemorrhage; MRI, Magnetic resonance imaging; PVL, Periventricular leukomalacia
Interobserver reliability and accuracy of neonatal cranial ultrasound scan (CUS) interpretation are important issues in neuroimaging. They have significant implications both for clinical trials in which brain injury is an outcome of interest and for neurodevelopmental prognostication, because it is well documented that grades 3 to 4 intraventricular hemorrhage (IVH) and cystic periventricular leukomalacia (PVL) are associated with adverse long-term outcome in preterm infants. The National Institute of Child Health and Human Development Neonatal Network PINO randomized controlled trial (RCT) afforded a unique opportunity to retrospectively compare interobserver reliability of the CUS interpretations between the central, “gold standard” readers and to determine whether the interpretations of local readers conformed with those of the central experts.
See related article, p 592
In this issue of The Journal, Hintz et al report that there was good agreement (kappa, 0.60-0.66) between the “gold standard” central experts on CUS examinations that had normal results or demonstrated grades 3 or 4 IVH.1 Congruence improved (kappa, 0.84) when the worst prognostic categories (grades 3 and 4 IVH) were combined as severe IVH. There was also excellent agreement (kappa, 0.76) of experts on whether or not the ventricles were enlarged. Agreement was poor (kappa, 0.20-0.26), however, for mild IVH (grades 1 and 2), considered alone or combined, and for PVL. Because the incidence of PVL was very low, adding PVL to other diagnostic categories was not informative. The specificity of local interpretations was excellent for all grades of IVH. Although the sensitivity of local interpretations was good for the presence of any IVH, grade 4 IVH, and severe IVH (grades 3 and 4 IVH combined), sensitivity was poor for grades 1 and 2 IVH, either alone or combined, and for PVL.
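For readers less familiar with the statistics quoted above, the following is a minimal sketch of how Cohen's kappa, sensitivity, and specificity can be computed from paired readings. The grade lists are invented for illustration and are not data from the study; the function names and the choice to treat one central reader as the reference standard are assumptions of this sketch. It also illustrates why collapsing grades 3 and 4 into a single "severe" category can raise kappa: disagreements between those two grades no longer count as disagreements.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa: agreement between two readers, corrected for chance."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must rate the same non-empty set of scans")
    n = len(reader_a)
    # Observed proportion of scans on which the two readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance, from each reader's marginal frequencies.
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both readers use a single category")
    return (observed - expected) / (1 - expected)


def sensitivity_specificity(local, reference, is_positive):
    """Sensitivity and specificity of local readings against a reference reader."""
    pairs = [(is_positive(l), is_positive(r)) for l, r in zip(local, reference)]
    tp = sum(l and r for l, r in pairs)
    fn = sum((not l) and r for l, r in pairs)
    tn = sum((not l) and (not r) for l, r in pairs)
    fp = sum(l and (not r) for l, r in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both positive and negative scans")
    return tp / (tp + fn), tn / (tn + fp)


def collapse_severe(grades):
    """Merge grades 3 and 4 into one 'severe' category; keep other grades."""
    return ["severe" if g >= 3 else g for g in grades]


# Hypothetical IVH grades (0 = no IVH) for ten scans; NOT study data.
central_1 = [0, 0, 0, 0, 1, 2, 3, 4, 3, 4]
central_2 = [0, 0, 0, 1, 0, 2, 4, 3, 3, 4]
local = [0, 0, 0, 0, 0, 2, 3, 4, 0, 4]

kappa_raw = cohen_kappa(central_1, central_2)
kappa_collapsed = cohen_kappa(collapse_severe(central_1), collapse_severe(central_2))
sens, spec = sensitivity_specificity(local, central_1, lambda g: g >= 3)

print(f"kappa, separate grades:   {kappa_raw:.2f}")
print(f"kappa, grades 3-4 merged: {kappa_collapsed:.2f}")
print(f"severe IVH, local vs central: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy example the two central readers disagree on grade 3 versus grade 4 for two scans, so merging those grades removes two disagreements and kappa rises. The real study's values depend on its own data and category frequencies.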
Despite the congruence for the diagnostic category of severe IVH between the gold standard experts and local readers, it is instructive to examine how often the expert and local readers disagreed for the major diagnostic categories of IVH. Overall, 7% to 8% of CUS examinations in this study were misclassified by local readers. Two to three percent of the CUS examinations read by local readers as having no IVH or grades 1 to 2 IVH were judged by the experts to show grades 3 or 4 IVH. Conversely, 4% to 5% of the CUS examinations read by the local readers as grades 3 or 4 IVH were judged by the central experts to have either no IVH or grades 1 to 2 IVH. Congruence was much worse for PVL, for which 56% and 80% of CUS examinations read as positive for any PVL by each of the 2 central readers were read as negative for any PVL by the local readers. Although a small percentage of misclassification may be acceptable when evaluating the results of a large, multicenter RCT, the implications of diagnostic misclassification for individual children and their families may be profound. Review of serial CUS examinations and a second independent reading when grades 3 and 4 IVH are seen would help distinguish these diagnostic categories.
The good interobserver reliability and accuracy for the diagnostic categorization of severe IVH is reassuring when considering neonatal RCTs in which severe IVH is an important outcome variable, especially because the techniques for obtaining and reading the images at the local level were not standardized and the CUS data used in the trial were extracted from clinical radiology reports. The poor interobserver reliability and accuracy for grades 1 and 2 IVH and for any PVL suggest that CUS is less useful for defining these types of brain injuries. It is not surprising that there is difficulty with the identification and classification of mild IVH. Small subependymal hemorrhages may be missed when the appropriate image is not selected for review. Small amounts of intraventricular blood may not be visible on CUS. The choroid plexus may be misinterpreted as a grade 1 hemorrhage on the coronal views. Also, in this study, severe IVH, not mild IVH, was the outcome of interest, and the central review was performed retrospectively in a very short period of time. Assessment of increased echogenicity in the periventricular regions is very subjective. It is not surprising that readers would vary in judging whether the echogenicity was “enough” to call PVL. Distinguishing between diagnostic error and variation of interpretation in this study is difficult without autopsy or magnetic resonance imaging (MRI) confirmation. Other authors have also reported greater interobserver variability between local and expert readers for grades 1 to 2 IVH and PVL compared with grades 3 and 4 IVH.2, 3 Does differentiation of mild grades of IVH matter clinically? Although several studies have reported similar early childhood neurodevelopmental outcomes in very low birth weight infants with no IVH and those with grades 1 or 2 IVH,4, 5, 6 a recent study suggests that grade 2 IVH may be an independent predictor for adverse outcome.7
Can the diagnostic accuracy of CUS for white matter injury be improved? Consistent identification of PVL and ventriculomegaly requires that standard definitions of PVL and ventricular dilation be universally adopted and that serial CUS examinations be performed with similar views during the evolution of white matter injury to detect transient echogenicity, cystic changes, and ventricular dilation. The day 28 CUS examinations used in this study may have been too late to identify transient echogenicity and too early to identify subsequent ventricular dilation. Even with ideal conditions, the capacity of ultrasound scanning to identify white matter injury is technically limited. MRI at term-equivalent age detects many subtle white matter abnormalities not evident on CUS, especially with the use of diffusion tensor imaging and volumetric assessments.8, 9, 10 These subtle white matter abnormalities are likely to be particularly useful in predicting the later, and much more frequent, neurobehavioral and developmental sequelae associated with preterm birth.11, 12 Although it is tempting to suggest that MRI just before discharge replace CUS as the gold standard for neurodevelopmental prognostication, MRI is not as readily accessible, may require sedation or anesthesia, and is much more costly compared with CUS. In addition, severe IVH, ventricular dilation, and cystic white matter injury seen on MRI, which are associated with significant motor and cognitive impairment in early childhood, are usually readily apparent on simultaneous CUS examinations.11, 13, 14, 15
In this study, the gold standard experts disagreed among themselves on the major diagnostic categories 2% to 3% of the time. Some of these discrepancies may be explained by legitimate differences in interpretation of intracerebral echodensities or by poor image quality. Radiologists generally acknowledge an interobserver variability of 10% for most types of examination.16 A review of 20 years of malpractice litigation involving radiologists found that diagnostic errors could be classified as: 1) inadequate technique, 2) perceptual errors, 3) lack of knowledge, and 4) errors of judgment.17 Of these, perceptual errors are the most frequent cause of missed diagnoses.18 Other common causes are misleading or incomplete clinical information, unavailability of earlier studies, over-calls, and missing subtle findings when other more apparent findings are seen. Improved training, availability of internet access for reference materials, attention to viewing conditions (eg, appropriate lighting), dual reading, availability of earlier studies, and standardization of performing, reviewing, and reporting studies may all assist in more accurate diagnostic reading of neonatal head ultrasound scanning examinations.16
How should the results of this study be applied to future multicenter, neonatal randomized controlled trials in which brain injury is an outcome of interest? First, clinical trial design and analyses must take into account the inherent interobserver variability in interpretation of imaging studies. Second, the techniques for obtaining, interpreting, and reporting CUS examinations at the local level must be standardized. Third, timing and type of examinations should be determined by the information required. When the question is whether an early intervention increases the risk of brain injury, a CUS must be performed before the intervention. A repeat CUS examination at 10 days of age will determine the maximum severity of IVH, and at 28 days it will demonstrate post-hemorrhagic hydrocephalus. Although this sequence of CUS examinations may also demonstrate cystic antenatal and postnatal PVL, persistent echogenicity, and ventriculomegaly, an MRI at term-equivalent age, with appropriate techniques, will best show the extent of white matter injury. Fourth, prospective, independent review of CUS and MRI examinations by central experts will quickly identify centers in which local interpretation may be a problem, maximize accuracy, and minimize interobserver variability.
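One way to see why trial design must allow for reader variability is to work through how imperfect local readings would blur a true difference between trial arms. The sketch below is illustrative only: the event rates, sensitivity, and specificity are invented, not taken from the study, and it assumes misclassification is non-differential (the same sensitivity and specificity in both arms), which may not hold in practice.

```python
def observed_rate(true_rate, sensitivity, specificity):
    """Proportion of scans read as positive, given imperfect readings.

    True positives are detected with probability `sensitivity`;
    true negatives are over-called with probability 1 - `specificity`.
    """
    for value in (true_rate, sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates must lie between 0 and 1")
    return sensitivity * true_rate + (1.0 - specificity) * (1.0 - true_rate)


# Hypothetical values, chosen only for illustration.
TRUE_RATE_CONTROL = 0.20    # assumed true rate of severe IVH, control arm
TRUE_RATE_TREATED = 0.10    # assumed true rate of severe IVH, treated arm
SENSITIVITY = 0.85          # assumed sensitivity of local readings
SPECIFICITY = 0.97          # assumed specificity of local readings

true_difference = TRUE_RATE_CONTROL - TRUE_RATE_TREATED
observed_difference = (
    observed_rate(TRUE_RATE_CONTROL, SENSITIVITY, SPECIFICITY)
    - observed_rate(TRUE_RATE_TREATED, SENSITIVITY, SPECIFICITY)
)

print(f"true risk difference:     {true_difference:.3f}")
print(f"observed risk difference: {observed_difference:.3f}")
```

Under these assumptions the observed risk difference equals the true difference multiplied by (sensitivity + specificity - 1), so it is always smaller than the true difference whenever readings are imperfect. A trial that relies on local readings would therefore need a larger sample to detect the same effect, which is one argument for central review.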
References
- Interobserver reliability and accuracy of cranial ultrasound interpretation in premature infants. J Pediatr. 2007;150:592–596
- Does variation in interpretation of ultrasonograms account for the variation in the incidence of germinal matrix/intraventricular haemorrhage between newborn intensive care units in New Zealand? Arch Dis Child Fetal Neonatal Ed. 2005;90:F494–F499
- Variable interpretation of ultrasonograms may contribute to variation in the reported incidence of white matter damage between newborn intensive care units in New Zealand. Arch Dis Child Fetal Neonatal Ed. 2006;91:F11–F16
- Neurodevelopmental outcome at 12 months of age related to cerebral ultrasound appearances of high-risk preterm infants. Early Hum Dev. 1985;11:123–132
- Neurologic and developmental status related to the evolution of visual-motor abnormalities from birth to 2 years of age in preterm infants with intraventricular hemorrhage. J Pediatr. 1989;115:296–302
- Probability of neurodevelopmental disorders estimated from ultrasound appearance of brains of very preterm infants. Dev Med Child Neurol. 1987;29:3–11
- Grades I-II intraventricular hemorrhage in extremely low birthweight infants: effects on neurodevelopment. J Pediatr. 2006;149:152–154
- Microstructural brain development after perinatal cerebral white matter injury assessed by diffusion tensor magnetic resonance imaging. Pediatrics. 2001;107:455–460
- Axial and radial diffusivity in preterm infants who have diffuse white matter changes on magnetic resonance imaging at term-equivalent age. Pediatrics. 2006;117:376–386
- Uncomplicated intraventricular hemorrhage is followed by reduced cortical volume at near-term age. Pediatrics. 2004;114:e367–e372
- Neonatal MRI to predict neurodevelopmental outcomes in preterm infants. N Engl J Med. 2006;355:685–694
- Natural history of brain lesions in extremely preterm infants studied with serial magnetic resonance imaging from birth and neurodevelopmental assessment. Pediatrics. 2006;118:536–548
- Defining the nature of the cerebral abnormalities in the premature infant: a qualitative magnetic resonance imaging study. J Pediatr. 2003;143:171–179
- Neonatal cranial ultrasound versus MRI and neurodevelopmental outcome at school age in children born preterm. Arch Dis Child Fetal Neonatal Ed. 2005;90:F489–F493
- Comparison of findings on cranial ultrasound and magnetic resonance imaging in preterm infants. Pediatrics. 2001;107:719–727
- Radiology’s Achilles’ heel: error and variation in the interpretation of the Rontgen image. Br J Radiol. 1997;70:1085–1098
- Malpractice and radiologists in Cook County, IL: trends in 20 years of litigation. AJR. 1995;165:781–788
- Error in radiology: classification and lessons in 182 cases presented at a problem case conference. Radiology. 1992;183:145–150
PII: S0022-3476(07)00332-0
doi:10.1016/j.jpeds.2007.04.002
© 2007 Mosby, Inc. All rights reserved.
