The Journal of Pediatrics
Volume 150, Issue 6 , Pages 575-577, June 2007

Brain Imaging in Neonatal Clinical Trials: In Search of a Gold Standard

  • Yvonne E. Vaucher, MD, MPH
    • Division of Neonatal/Perinatal Medicine, Department of Pediatrics, University of California, San Diego, San Diego, California
    • Reprint requests: Yvonne E. Vaucher, MD, MPH, Division of Neonatal/Perinatal Medicine, UCSD Medical Center MC-8774, 200 W Arbor Dr, San Diego, CA 92103-8774.
  • Dolores H. Pretorius, MD
    • Division of Body Imaging, Department of Radiology, University of California, San Diego, San Diego, California

Abbreviations: CUS, cranial ultrasound scanning; IVH, intraventricular hemorrhage; MRI, magnetic resonance imaging; PVL, periventricular leukomalacia

 

Interobserver reliability and accuracy of neonatal cranial ultrasound scanning (CUS) interpretation are important issues in neuroimaging. They have significant implications both for clinical trials in which brain injury is an outcome of interest and for neurodevelopmental prognostication, because it is well documented that grades 3 to 4 intraventricular hemorrhage (IVH) and cystic periventricular leukomalacia (PVL) are associated with adverse long-term outcome in preterm infants. The National Institute of Child Health and Human Development Neonatal Network PINO randomized controlled trial (RCT) afforded the unique opportunity to retrospectively compare the interobserver reliability of CUS interpretations between the central, “gold standard” readers and to determine whether the interpretations of local readers conformed with those of the central experts.

See related article, p 592

In this issue of The Journal, Hintz et al report that there was good agreement (kappa, 0.60-0.66) between the “gold standard” central experts on CUS examinations that had normal results or demonstrated grades 3 or 4 IVH.1 Congruence improved (kappa, 0.84) when the worst prognostic categories (grades 3 and 4 IVH) were combined as severe IVH. There was also excellent agreement (kappa, 0.76) of experts on whether or not the ventricles were enlarged. Agreement was poor (kappa, 0.20-0.26), however, for mild IVH (grades 1 and 2), considered alone or combined, and for PVL. Because the incidence of PVL was very low, adding PVL to other diagnostic categories was not informative. The specificity of local interpretations was excellent for all grades of IVH. Although the sensitivity of local interpretations was good for the presence of any IVH, grade 4 IVH, and severe IVH (grades 3 and 4 IVH combined), sensitivity was poor for grades 1 and 2 IVH, either alone or combined, and for PVL.
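For readers less familiar with the kappa statistic quoted above, the following is a minimal sketch of how Cohen's kappa is computed for two readers who grade the same set of scans. The function name `cohens_kappa` and the example readings are hypothetical and chosen only for illustration; they are not data from Hintz et al, and the study's own analysis may have used a different variant of kappa.

```python
from collections import Counter


def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa: agreement between two readers, corrected for chance."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must grade the same non-empty set of scans")
    n = len(reader_a)
    # Observed agreement: fraction of scans given the same category by both readers.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: product of each reader's marginal rates, summed over categories.
    count_a = Counter(reader_a)
    count_b = Counter(reader_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical readings of 100 scans (illustrative only, not study data):
# both call 8 scans severe, both call 88 not severe, and they disagree on 4.
reader_a = ["severe"] * 8 + ["not severe"] * 2 + ["not severe"] * 88 + ["severe"] * 2
reader_b = ["severe"] * 8 + ["severe"] * 2 + ["not severe"] * 88 + ["not severe"] * 2

kappa = cohens_kappa(reader_a, reader_b)
print(round(kappa, 2))  # 0.78
```

In this made-up example the readers agree on 96% of scans, yet kappa is only about 0.78, because most of that agreement would be expected by chance when one category dominates. This is one reason a rare finding such as PVL can show poor kappa despite high raw agreement.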

Despite the congruence for the diagnostic category of severe IVH between the gold standard experts and local readers, it is instructive to examine how often the expert and local readers disagreed for the major diagnostic categories of IVH. Overall, 7% to 8% of CUS examinations in this study were misclassified by local readers. Two to three percent of the CUS examinations read by local readers as having no or grades 1 to 2 IVH were judged by the experts to show grades 3 or 4 IVH. Conversely, 4% to 5% of the CUS examinations read by the local readers as grades 3 or 4 IVH were judged by the central experts to have either no IVH or grades 1 to 2 IVH. Congruence was much worse for PVL, for which 56% and 80% of CUS examinations read as positive for any PVL by each of the 2 central readers were read as negative for any PVL by the local readers. Although a small percentage of misclassification may be acceptable when evaluating the results of a large, multicenter RCT, the implications of diagnostic misclassification for individual children and their families may be profound. Review of serial CUS examinations and a second independent reading when grades 3 and 4 IVH are seen would help distinguish these diagnostic categories.
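The sensitivity and specificity of local interpretations, and the misclassification discussed above, all come from comparing each local reading with a central reading treated as the reference. The sketch below shows that calculation for a single yes/no finding (for example, severe IVH present or absent). The function name `sensitivity_specificity` and the counts are hypothetical, not taken from the study.

```python
def sensitivity_specificity(local, central):
    """Compare local yes/no readings with central reference readings.

    Returns (sensitivity, specificity, misclassification rate).
    """
    if len(local) != len(central) or not local:
        raise ValueError("need one local and one central reading per scan")
    tp = fn = fp = tn = 0
    for loc, ref in zip(local, central):
        if ref and loc:
            tp += 1  # finding present, local reader reported it
        elif ref and not loc:
            fn += 1  # finding present, local reader missed it
        elif loc:
            fp += 1  # finding absent, local reader over-called it
        else:
            tn += 1  # finding absent, local reader agreed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    misclassified = (fn + fp) / len(local)
    return sensitivity, specificity, misclassified


# Hypothetical 100 scans (illustrative only): the central reader finds severe IVH in 10;
# the local reader detects 8 of those and over-calls 3 of the 90 negative scans.
central = [True] * 10 + [False] * 90
local = [True] * 8 + [False] * 2 + [True] * 3 + [False] * 87

sens, spec, miss = sensitivity_specificity(local, central)
print(sens, round(spec, 3), miss)
```

Treating the central reading as the reference is itself an assumption: without autopsy or MRI confirmation, a disagreement may reflect a difference in interpretation rather than a local error.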

The good interobserver reliability and accuracy for the diagnostic categorization of severe IVH is reassuring when considering neonatal RCTs, in which severe IVH is an important outcome variable, especially because the techniques for obtaining and reading the images at the local level were not standardized and the CUS data used in the trial were extracted from clinical radiology reports. The poor interobserver reliability and accuracy for grades 1 and 2 IVH and for any PVL suggest that CUS is less useful for defining these types of brain injuries. It is not surprising that there is difficulty with the identification and classification of mild IVH. Small subependymal hemorrhages may be missed when the appropriate image is not selected for review. Small amounts of intraventricular blood may not be visible on CUS. The choroid plexus may be misinterpreted as a grade 1 hemorrhage on the coronal views. Also, in this study, severe IVH, not mild IVH, was the outcome of interest, and the central review was performed retrospectively in a very short period of time. Assessment of increased echogenicity in the periventricular regions is very subjective. It is not surprising that there would be variations in reading about whether the echogenicity was “enough” to call PVL. Distinguishing between diagnostic error and variation of interpretation in this study is difficult without autopsy or magnetic resonance imaging (MRI) confirmation. Other authors have also reported greater interobserver variability between local and expert readers for grades 1 to 2 IVH and PVL compared with grades 3 and 4 IVH.2, 3 Does differentiation of mild grades of IVH matter clinically? Although several studies have reported similar early childhood neurodevelopmental outcomes in very low birth weight infants with no IVH and those with grades 1 or 2 IVH,4, 5, 6 a recent study suggests that grade 2 IVH may be an independent predictor for adverse outcome.7

Can the diagnostic accuracy of CUS for white matter injury be improved? Consistent identification of PVL and ventriculomegaly requires that standard definitions of PVL and ventricular dilation be universally adopted and that serial CUS examinations be performed with similar views during the evolution of white matter injury to detect transient echogenicity, cystic changes, and ventricular dilation. The day 28 CUS examinations used in this study may have been too late to identify transient echogenicity and too early to identify subsequent ventricular dilation. Even with ideal conditions, the capacity of ultrasound scanning to identify white matter injury is technically limited. MRI at term-equivalent age detects many subtle white matter abnormalities not evident on CUS, especially with the use of diffusion tensor imaging and volumetric assessments.8, 9, 10 These subtle white matter abnormalities are likely to be particularly useful in predicting the later, and much more frequent, neurobehavioral and developmental sequelae associated with preterm birth.11, 12 Although it is tempting to suggest that MRI just before discharge replace CUS as the gold standard for neurodevelopmental prognostication, MRI is not as readily accessible, may require sedation or anesthesia, and is much more costly than CUS. In addition, severe IVH, ventricular dilation, and cystic white matter injury seen on MRI, which are associated with significant motor and cognitive impairment in early childhood, are usually readily apparent on simultaneous CUS examinations.11, 13, 14, 15

In this study, the gold standard experts disagreed among themselves on the major diagnostic categories 2% to 3% of the time. Some of these discrepancies may be explained by legitimate differences in interpretation of intracerebral echodensities or by poor image quality. Radiologists generally acknowledge an interobserver variability of 10% for most types of examination.16 A review of 20 years of medical litigation involving radiologists found that diagnostic errors could be classified as: 1) inadequate technique, 2) perceptual errors, 3) lack of knowledge, and 4) errors of judgment.17 Of these, perceptual errors are the most frequent cause of missed diagnoses.18 Other common causes are misleading or incomplete clinical information, unavailability of earlier studies, over-calls, and missing subtle findings when other more apparent findings are seen. Improved training, Internet access to reference materials, attention to viewing conditions (eg, appropriate lighting), dual reading, availability of earlier studies, and standardization of performing, reviewing, and reporting studies may all assist in more accurate diagnostic reading of neonatal head ultrasound scanning examinations.16

How should the results of this study be applied to future multicenter, neonatal randomized controlled trials in which brain injury is an outcome of interest? First, clinical trial design and analyses must take into account the inherent interobserver variability in interpretation of imaging studies. Second, the technique of obtaining, interpreting, and reporting CUS examinations at the local level must be standardized. Third, timing and type of examinations should be determined by the information required. When the question is whether an early intervention increases the risk of brain injury, a CUS must be performed before the intervention. A repeat CUS examination at 10 days of age will determine the maximum severity of IVH, and a further examination at 28 days will demonstrate post-hemorrhagic hydrocephalus. Although this sequence of CUS may also demonstrate cystic antenatal and postnatal PVL, persistent echogenicity, and ventriculomegaly, an MRI at term-equivalent age, with appropriate techniques, will best show the extent of white matter injury. Fourth, prospective, independent review of CUS and MRI examinations by central experts will quickly identify centers in which local interpretation may be a problem, maximize accuracy, and minimize interobserver variability.

References 

  1. Hintz SR, Slovis T, Bulas D, Van Meurs KP, Perritt R, Stevenson DK, et al. Interobserver reliability and accuracy of cranial ultrasound interpretation in premature infants. J Pediatr. 2007;150:592–596
  2. Harris DL, Bloomfield FH, Teele RL, Harding JE, et al. Does variation in interpretation of ultrasonograms account for the variation in the incidence of germinal matrix/intraventricular haemorrhage between newborn intensive care units in New Zealand? Arch Dis Child Fetal Neonatal Ed. 2005;90:F494–F499
  3. Harris DL, Bloomfield FH, Teele RL, Harding JE, et al. Variable interpretation of ultrasonograms may contribute to variation in the reported incidence of white matter damage between newborn intensive care units in New Zealand. Arch Dis Child Fetal Neonatal Ed. 2006;91:F11–F16
  4. Fawer CL, Calame A, Furrer MT. Neurodevelopmental outcome at 12 months of age related to cerebral ultrasound appearances of high-risk preterm infants. Early Hum Dev. 1985;11:123–132
  5. Vohr BR, Garcia-Coll C, Mayfield S, Brann B, Shaul P, Oh W. Neurologic and developmental status related to the evolution of visual-motor abnormalities from birth to 2 years of age in preterm infants with intraventricular hemorrhage. J Pediatr. 1989;115:296–302
  6. Stewart AL, Reynolds EOR, Hope PL, Hamilton PA, Baudin J, Costello AM, et al. Probability of neurodevelopmental disorders estimated from ultrasound appearance of brains of very preterm infants. Dev Med Child Neurol. 1987;29:3–11
  7. Patra K, Wilson-Costello D, Taylor HG, Mercuri-Minich N, Hack M. Grades I-II intraventricular hemorrhage in extremely low birthweight infants: effects on neurodevelopment. J Pediatr. 2006;149:152–154
  8. Hüppi PS, Murphy B, Maier SE, Zientara GP, Inder TE, Barnes PD, et al. Microstructural brain development after perinatal cerebral white matter injury assessed by diffusion tensor magnetic resonance imaging. Pediatrics. 2001;107:455–460
  9. Counsell SJ, Shen Y, Boardman JP, Larkman DJ, Kapellou O, Ward P, et al. Axial and radial diffusivity in preterm infants who have diffuse white matter changes on magnetic resonance imaging at term-equivalent age. Pediatrics. 2006;117:376–386
  10. Vasileiadis GT, Gelman N, Han VKM, Williams L, Mann R, Bureau Y, et al. Uncomplicated intraventricular hemorrhage is followed by reduced cortical volume at near-term age. Pediatrics. 2004;114:e367–e372
  11. Woodward LJ, Anderson PJ, Austin NC, Howard K, Inder TE. Neonatal MRI to predict neurodevelopmental outcomes in preterm infants. N Engl J Med. 2006;355:685–694
  12. Dyet LE, Kennea N, Counsell SJ, Maalouf EF, Ajayi-Obe M, Duggan PJ. Natural history of brain lesions in extremely preterm infants studied with serial magnetic resonance imaging from birth and neurodevelopmental assessment. Pediatrics. 2006;118:536–548
  13. Inder TE, Wells SJ, Mogridge NB, Spencer C, Volpe JJ. Defining the nature of the cerebral abnormalities in the premature infant: a qualitative magnetic resonance imaging study. J Pediatr. 2003;143:171–179
  14. Rademaker KJ, Uiterwaal CS, Beek FJA, van Haastert IC, Lieftink AF, Groenendaal F, et al. Neonatal cranial ultrasound versus MRI and neurodevelopmental outcome at school age in children born preterm. Arch Dis Child Fetal Neonatal Ed. 2005;90:F489–F493
  15. Maalouf EF, Duggan PJ, Counsell SJ, Rutherford MA, Cowan F, Azzopardi D, et al. Comparison of findings on cranial ultrasound and magnetic resonance imaging in preterm infants. Pediatrics. 2001;107:719–727
  16. Robinson PJA. Radiology’s Achilles’ heel: error and variation in the interpretation of the Röntgen image. Br J Radiol. 1997;70:1085–1098
  17. Berlin L, Berlin JW. Malpractice and radiologists in Cook County, IL: trends in 20 years of litigation. AJR. 1995;165:781–788
  18. Renfrew RL, Franken EA, Berbaum KS, Weigelt FH, Abu-Yousef MM. Error in radiology: classification and lessons in 182 cases presented at a problem case conference. Radiology. 1992;183:145–150

PII: S0022-3476(07)00332-0

doi:10.1016/j.jpeds.2007.04.002

Refers to article:

  • Interobserver Reliability and Accuracy of Cranial Ultrasound Scanning Interpretation in Premature Infants

    Susan R. Hintz, Thomas Slovis, Dorothy Bulas, Krisa P. Van Meurs, Rebecca Perritt, David K. Stevenson, W. Kenneth Poole, Abhik Das, Rosemary D. Higgins, NICHD Neonatal Research Network
    The Journal of Pediatrics June 2007 (Vol. 150, Issue 6, Pages 592-596.e5)
