Utility and Reliability:
Selected Commentary on the Subject of PennHIP vs. Hip-Extended Views
These comments are enlarged upon in my book on HD and other orthopedic disorders, available from the publisher or the author. Get the whole book for the whole picture. This magazine/website format is in response to requests by dog clubs where I lecture, and from individuals for a synopsis-comparison between diagnostic methods.
Occasionally, disturbing and puzzling aberrations are reported in the diagnosis of radiographs taken in the old hip-extended-only (OFA-type) position, and it may take a little detective work or repeated radiography, perhaps at a later age or in a different position, to resolve the mystery. Rendano and Lust at Cornell gave the all-too-common example of a dog radiographed in “the standard (ventro-dorsal) view” twice the same day; the v-d diagnosis was dysplasia at one examination and normal at the other. In most dogs, anesthesia would probably not matter noticeably, as any difference seen is probably due to the chance element that exists in the v-d procedure. In similar experiments at the University of Pennsylvania, many dogs were repeatedly radiographed in that previously “standard” ventro-dorsal position, often with different results, even though only board-certified radiologists were used in the tests. Remember, one definition of reliability is repeatability. Other dogs that seemed dysplastic at 12 months (using the OFA protocol) appeared normal to other or the same OFA evaluators in that same hip-extended view at 2 years. It had long been customary to explain away these cases by differences in estrus, exercise, position, etc. Also frequently encountered are cases in which dogs (even a few older ones, in some breeds) have subluxation but no osteophytic or other degenerative changes yet; these differences can be examples of a breed-related phenomenon.
There are even discrepancies in which palpation shows laxity, yet later (or same-age) v-d radiographs indicate “normal” hips; less often, the reverse occurs. These inconsistencies have led some orthopedic specialists to diagnose one way and a radiologist another, especially when making a distinction between adjacent “grades.” And remember, this is the hip-extended view we are talking about, with its lower reliability. I, too, have seen the same pelvis and femurs radiographed twice during the same exam, with one radiograph just good enough to call the dog OFA-Fair-normal, while the other film, taken a few minutes later, led the eventual reader to a diagnosis of Mild HD, both based on apparent laxity. The difference between any two such radiographs might sometimes be due to a slight change in the amount of femoral rotation, but it could again be simply the low reliability quotient. Such variation is inherent in the unstressed view.
Effects of Position on Appearance of Laxity
There used to be many discouraged or disillusioned fanciers who would harp on the percentage of error in such registry/diagnosis organizations as OFA and use it as an excuse to not worry about hips, but now with the availability of the distraction technique (PennHIP, which stands for the University of Pennsylvania Hip Improvement Program and is sometimes abbreviated “P-H”), they have no such excuse or reason to complain. The hip-extended procedure, in vogue and delineated by the AVMA for many years, does not show everything you might want to know about the hip joint. That position does show more of those exostoses (bone deposits) and other degenerative effects such as changes in femoral head contour than a frog-leg position with the knees spread and the femurs pointing up from the table. PennHIP’s distractive technique or the use of another wedge shows laxity much more readily, but is equal to the others in showing DJD because one of the 3 positions is indeed the very same hip-extended view. But the hip-extended view is not the best position for discovering all the laxity that truly exists. You need the distracted view.
The mid-1990s Keller/OFA work designed to compare early prediction with later results (yet still using the old hip-extended position) showed “accuracy” 85.6 percent of the time, if you just use the distinction of normal vs. borderline vs. dysplastic, but if you compared the grades of severe, moderate, mild, borderline, fair, and good—(there was none in this particular study that rated “excellent” at any age), then in only 36 of the 81 (44%) were the lone radiologist’s “juvenile readings” in agreement with the later panel-consensus ones (and most of that agreement came with the “Good” hips). The study concluded that preliminary evaluations using distraction should be more sensitive in distinguishing between eventually normal/dysplastic hips, that early joint laxity measurement or disclosure may be of benefit in some dogs, that the unstressed Norberg angle picture is less useful or accurate than measurement of displacement under stress, and that there was “good reason for further research in this area.” Although stress radiography is still not used by OFA, they had at least taken that initial tentative step in forward-thinking. But they were later to step back again from the concept, when the PennHIP distraction technique raced past them into the limelight. The OFA apparently considered that “competition” (a curiously unscientific attitude) and failed to do the research that they themselves had called reasonable and desirable. It remained for Penn, the very university in Philadelphia where OFA had been born, and which already had recent years of stress-radiography research under its belt, to finish the work.
Although deterioration/abnormalities make the definitive identification of HD (call it DJD, then), the primary predictive indication of DJD and the main definition, at least in young dogs, is joint laxity. This is best revealed by the PennHIP method. Cornell’s Lust realized before the mid-1980s the need for such a standardized method for applying a force that would displace femoral heads from the acetabula (demonstrate covert laxity) and give an actual measurement of that laxity or displacement. The old puppy palpation method (more in my book) was just too subjective for further progress.
I have said that one dog’s hip joints may be looser than another’s. But by how much? Is the difference measurable in “integral” intervals? We have often heard the phrase “You can’t be ‘a little bit pregnant,’” but that is misleading when used to describe HD and many other orthopedic disorders. You can no more determine where “abnormal” begins than you can see where orange turns to red in the spectrum of visible light, or hear whether 221 cycles per second is sharper than 220 when a tuning fork is tapped. We can see that red is not violet, but the dividing line is impossible to pin down. Where is the change from A-sharp to B-flat? Just as there is a continuous spectrum, there is a continuum of range in the tightness of hip joints. To use a pass/fail criterion is not accurate; it is arbitrary. And to say that one dog is OFA-normal and the next dog is OFA-dysplastic may be like trying to decide where the rainbow colors change: those two hypothetical dogs may be Fair and Mild-HD, with the difference in the genes not worth a sou. Hip dysplasia is distinctly a non-binary disorder; forget the idea that either a dog has the disease or it doesn’t. People fall into this idea because of the historical presentation of passing and failing grades used by the hip registries of the world prior to PennHIP.
Advantages of the Distraction Method
PennHIP offers a number of advantages over the “standard” radiographic technique: closer reflection of the forces acting in a standing or moving dog, accurate evaluation at a younger age, and a quantitative numerical value to compare one dog to the next. The older the dog, perhaps the less difference it would make using this technique versus the older view, because the purpose of demonstrating laxity is to predict future degenerative changes. We now have the ability to predict accurately, at 6 months or even 4 months of age, the likelihood of a dog developing DJD later in life. Enough data have been compiled to allow us to give more-reliable risk figures for the probability of developing DJD. PennHIP and independent researchers have found that there is no significant change in the DI at a young age compared to that determined in adulthood. The old hip-extended radiograph is not as appealing to some breeders because certification can’t be had until 1 or 2 years of age, and owners don’t get reasonable freedom from the worry that hips will change for the worse in the next year or so. Obtaining a more accurate hip-extended diagnosis later in the dog’s life means that an owner may have invested much in his dog before he can find out if the hips are good enough for breeding. And the dog may already have been bred, as well. The PennHIP method is effective in identifying early the prime aspect of HD: laxity. Smith had earlier said his method at 4 months was 80% as accurate as the standard AVMA/OFA diagnosis at 24 months, but it now appears that it may actually be even better than that. Indeed, one day the PennHIP method will be the “standard.”
Possible Difficulties with Hip-Extended View Registries
An OFA claim in their 1998 public relations release found fault with the independent University of Wisconsin study that was reported earlier in the same year (July) because the dogs studied were initially of a slightly younger age than those in one or two 1997 OFA studies, but the Wisconsin study followed a group of dogs from puppyhood to maturity at one year of age, and sought to get a representative sample of the population that way. That JAAHA report concluded, “The distraction index measurement was the most accurate in predicting the development of DJD” and “distraction index in puppies … 16-18 weeks of age was the most reliable predictor of hip dysplasia.” Later studies by Purina and Penn over the lifetime of dogs supported that statement.
Extended-hip radiographs were analyzed by both one of the authors of the 1998 Wisconsin JAAHA study and a regular OFA reader in a blinded fashion, and with OFA protocols. Results at the various ages were compared with evaluations at one year of age. At the 52-week evaluation, DJD was defined as 2 or more stress lines in the joint capsule attachments or a combination of one stress line with subluxation of femoral heads, corresponding to the OFA-AVMA definition. Statistical tools included logistic regression and p value was used in such a way as to minimize the probability of DJD prediction when in fact it did not happen. The authors also calculated percentage of correct prediction, and numbers of false negatives and false positives for each age group and method (18 combinations). The numbers of false negatives were very high with the OFA method at the younger ages (56% false negatives at 4 months, which is counter to OFA claims in publicity releases in the late 1990s). The researchers concluded that OFA views and assessments at 4 months “cannot be recommended as a routine screening method.” The most accurate predictors in this study were the DI and Norberg angle when hips were in the neutral position; this is the knees-up one, most closely duplicating the normal stance of a dog with vertical femurs (similar to one of the positions used for PennHIP). The relatively lower precision of the OFA system was shown by the disagreement in 21% of the cases when using that type of evaluation. Comparing the Norberg angle alone with Penn’s Distraction Index, we find that “the DI is a far more accurate predictor of eventual DJD than is the Norberg angle.”
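The bookkeeping described above (percentage of correct predictions plus counts of false negatives and false positives for each age-and-method combination) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Wisconsin authors' analysis: the function name `prediction_summary` and the ten sample readings are hypothetical, and the logistic regression step is not reproduced.

```python
from typing import Dict, List


def prediction_summary(early_dysplastic: List[bool],
                       final_dysplastic: List[bool]) -> Dict[str, float]:
    """Compare early predictions with the one-year outcome for the same dogs.

    True means "dysplastic / DJD expected"; False means "normal".
    """
    if len(early_dysplastic) != len(final_dysplastic) or not early_dysplastic:
        raise ValueError("need two non-empty lists of equal length")

    pairs = list(zip(early_dysplastic, final_dysplastic))
    # False negative: read as normal early, found dysplastic later.
    false_neg = sum(1 for early, final in pairs if not early and final)
    # False positive: read as dysplastic early, found normal later.
    false_pos = sum(1 for early, final in pairs if early and not final)
    correct = len(pairs) - false_neg - false_pos
    return {
        "percent_correct": 100.0 * correct / len(pairs),
        "false_negatives": false_neg,
        "false_positives": false_pos,
    }


# Hypothetical readings for ten pups (NOT data from the study).
early = [False, False, True, False, True, False, False, True, False, False]
final = [False, True, True, True, True, False, False, False, True, False]

summary = prediction_summary(early, final)
print(summary)
```

Running the same function once per age group and method would give the kind of 18-cell comparison the study reports.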
An interesting data lagniappe in this Wisconsin study was that the Labrador Retrievers had a mean DI of 0.53 and the Irish Setters had 0.46, almost identical with the PennHIP database at the time (0.52 and 0.47, respectively). Also interesting is that the Labs were from dysplastic parents but the Setters were from “OFA-Good” parents. Additionally, they found that there has been no decrease in the incidence of HD in these breeds between 1972 (when the OFA minimum age of 24 months went into effect) and 1990 (1993 OFA report).
The OFA study in 1997 only looked at some of the dogs: those better-looking ones that returned for evaluation at age 2 years; this increased the apparent (but not the real) reliability. There were, as expected, many drop-outs, since it is natural for owners who have a 6-month pup diagnosed as dysplastic not to bring it back and spend more money and time on additional follow-up radiographs. Even those that looked good to OFA at 6 months but not good to the owners’ local vets at 2 years were not all sent in to OFA. Thus, OFA’s published reliability figures were erroneously high; the percentages of false negatives as well as false positives were not valid. The skew or bias built into such selective re-radiographing would tend to exaggerate the perceived reliability beyond reality. It is important for statistical data integrity that all dogs be re-evaluated, not just the ones that look good at the local vet’s clinic. Conversely, PennHIP vets are told up front that they will lose privileges to use the name if they do not submit all radiographs.
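A small arithmetic sketch may make the effect of selective re-submission concrete. All counts below are hypothetical, chosen only for illustration and not taken from any OFA or PennHIP report. The helper `contradicted_fraction` is likewise an assumed name, and it measures the share of early "normal" readings that a later film contradicts, which may differ from the way a given study defines its false-negative rate.

```python
def contradicted_fraction(later_normal: int, later_dysplastic: int) -> float:
    """Share of early-"normal" dogs whose later film shows dysplasia."""
    total = later_normal + later_dysplastic
    if total <= 0:
        raise ValueError("at least one dog is required")
    return later_dysplastic / total


# Hypothetical cohort: 80 pups read as normal at 6 months.
# Case 1: every dog is re-radiographed and submitted at 2 years.
all_later_normal, all_later_dysplastic = 60, 20

# Case 2: only 5 of the 20 dogs that now look dysplastic are submitted.
submitted_later_normal, submitted_later_dysplastic = 60, 5

true_rate = contradicted_fraction(all_later_normal, all_later_dysplastic)
apparent_rate = contradicted_fraction(submitted_later_normal,
                                      submitted_later_dysplastic)

print(f"all dogs re-evaluated: {true_rate:.1%}")
print(f"selective submission:  {apparent_rate:.1%}")
```

With these made-up numbers the error rate appears to fall from 25% to under 8% although nothing about the hips or the method has changed; only the sample has.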
OFA’s claims to reliability had suffered an earlier wound. The word “reliability” can also be considered a quasi-synonym for “repeatability.” With the hip-extended position, it became apparent that there was an uncomfortable difference in the way various radiologists read the hip quality: a “marked conflict” was noted between the OFA radiologists and other board-certified radiologists, with the OFA panels being considerably more lenient. In order to improve the comparisons between OFA and PennHIP evaluations of DJD in the leg-extended view, the readers at Penn experimented by adopting the OFA’s more forgiving evaluation. However, this does not mean that the procedures were equivalent, because the distraction technique for looking at laxity as a predictor more than makes up for the leniency in the view of DJD. Even using those lenient criteria, however, Penn was unable to substantiate the claims of reliability made in OFA press releases (not in a scientific journal) or the OFA’s claimed progress in the fight against HD. In that direct comparison between the distraction index and the OFA-type scoring, the latter was shown to be quite unreliable (P>0.05) compared to the former (P<0.001) in predicting hip scores from four months to two years of age.
The only two OFA papers published in the veterinary scientific literature before the year 2000 were retrospective surveys in 1985 and 1997, not original research. Meanwhile, more than 20 peer-reviewed papers on and/or by PennHIP were published in that time period, and some of these were known as “concurrent cohort (prospective) studies,” which design carries more weight or validity than other types. The oft-repeated OFA claim, therefore, that PennHIP was still “experimental” could not be substantiated; that claim was curious in the light of no studies showing the OFA’s approach to be anything more than experimental, despite well over 30 years of use. At least four independent research laboratories have confirmed the findings that the distraction technique of PennHIP had significantly fewer false-negatives (saying a dog looked good when it was found not to be so in stricter or later tests) at 4, 6, and 12 months when compared to two years.
Meanwhile, in Switzerland, Dr. Mark Flückiger of the University of Zurich observed, “The conventional v-d positioning that is used in diagnosis does not truly reflect the genetic make-up of a dog… it does not correctly evaluate for joint laxity... [it] is unreliable.”
A preliminary evaluation of a hip-extended radiographic picture might help you weed out the dogs that already exhibit severe HD or DJD at that age, before a lot of expense of showing and training, and can influence their sale or choice of homes. But those who want an even earlier and far more accurate prediction of relative risk for later HD can get a PennHIP evaluation. PennHIP is far better for the younger dogs, while hip-extended views may be sufficient for the older dog, perhaps over 4 or 5 years of age.
Adoption of the PennHIP compression-distraction method as a far better revealer of probable genotype, and thus an improved predictor of eventual DJD in the individual and its progeny, will become the quantum leap in genetic disease control, at least in hips, that the breeder has wanted for a long time. All groups can benefit from the modern approach: stress radiography for early and accurate diagnosis, and (in a few breed clubs) Zuchtwert or Breed Value assessment with its use of progeny information and open database on relatives. Breeders and clubs that want faster progress and earlier indication of eventual hip joint quality should supplement their knowledge with PennHIP evaluation data.
Comparing H-E and PennHIP (again)
Distraction index (DI) is Penn’s description of laxity and, since it is a ratio, makes breed or size of dog irrelevant. At the 2000 Veterinary Orthopedic Society meeting, a survey of 260 dogs was presented that compared P-H scores with OFA evaluations. The study showed that:
- 53% of OFA-Excellents were looser than 0.30 DI;
- 64% of OFA-Goods were looser than 0.30 DI (the loosest had a 0.77 DI);
- 93% of OFA-Fairs were looser than 0.30 DI;
- 77% of all OFA-“Normals” combined were looser than 0.30 DI.
In a study of large-breed dogs, some 71% of dogs that had been reported by OFA to have no evidence of HD were found to have distraction indices high enough to be considered susceptible to DJD development. These facts merely mean that PennHIP and the DI represent a more rigorous or demanding look at laxity and risk. It may well be that you are satisfied with some DI higher than 0.30, but you should have the information before expecting to make wise choices. The 0.30 number is the practical cut-off where dogs with that or lower numbers are practically guaranteed never to develop any HD.
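To show why a ratio makes the size of the dog irrelevant, here is a minimal Python sketch. It assumes the definition commonly given in the PennHIP literature, which this article does not spell out: DI is the distance between the center of the femoral head and the center of the acetabulum on the distraction view, divided by the radius of the femoral head. The function names and the measurements are hypothetical; only the 0.30 cut-off comes from the text above.

```python
import math

DI_CUTOFF = 0.30  # cut-off cited in the article


def distraction_index(displacement_mm: float, head_radius_mm: float) -> float:
    """Assumed definition: center-to-center displacement / femoral head radius."""
    if head_radius_mm <= 0:
        raise ValueError("femoral head radius must be positive")
    if displacement_mm < 0:
        raise ValueError("displacement cannot be negative")
    return displacement_mm / head_radius_mm


def is_looser_than_cutoff(di: float, cutoff: float = DI_CUTOFF) -> bool:
    """True when the hip is looser than the chosen cut-off."""
    return di > cutoff


# Hypothetical measurements: a small dog and a large dog with the
# same proportional laxity get the same index.
small_dog = distraction_index(displacement_mm=2.0, head_radius_mm=5.0)
large_dog = distraction_index(displacement_mm=6.0, head_radius_mm=15.0)
tight_dog = distraction_index(displacement_mm=2.5, head_radius_mm=10.0)

print(small_dog, large_dog, math.isclose(small_dog, large_dog))
print(is_looser_than_cutoff(small_dog), is_looser_than_cutoff(tight_dog))
```

The cut-off is passed as a parameter because, as noted above, a breeder may be satisfied with some value other than 0.30.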
The hip-extended registries’ (SV, OFA, etc.) higher rates of false negatives mean that among those people and kennels using them, there will be a significant percentage of “cleared” or “normal” dogs that will “get” HD later (show radiographic signs). I call these the “stealth dogs” because they can slip through the inadequate radar-like screening of the hip-extended programs. This also adds to the bias in many registries, as described above. In the leg-extended system in America there are false-negative rates of about 83% in 6-month-old German Shepherds, for example, but in the PennHIP scheme, the rate is only about 12% in 4-month-old dogs and 0% at 6 months (when compared to the same-dog readings at 24 months). The leg-extended (only) position on 4-month pups gave a false-negative rate of 24%, double that of the PennHIP compression-distraction method. Even at 6 and 12 months, the OFA-type predictive tests gave false-negative rates of 15% and 12%, while the PennHIP stress-radiographic method showed zero false negatives for 6- and 12-month-old dogs.
Perhaps the PennHIP findings regarding breed differences and relative rankings may be similar to those of the OFA, but their laxity index system is far more accurate than the OFA: quantitative diagnosis means more meaningful information. Penn’s chief of surgery Gail Smith guesses that he would describe perhaps half of the “OFA normal” German Shepherd Dogs as abnormal. Of course, the description would depend on where one draws the cut-off line between normal and abnormal.
Accuracy, Precision, and Reliability in Young Dogs
Lust has called Penn’s “more physiological position... [and] improved assessment for joint laxity” both promising and useful for diagnosing HD in young dogs. The PennHIP method is effective in identifying the prime aspect or predictor of HD (laxity) in dogs as young as four months. Smith once said that he believed his method at 4 months was 80% as accurate as was the standard AVMA/OFA diagnosis at 24 months. Later statistics in an American Journal of Veterinary Research paper indicate that it is actually 96% accurate at four months and nearly 100% accurate by six months, when compared to radiographic evaluation done at twelve months of age. The way it was worded was that there was about a 96% probability that hip laxity would vary by less than plus or minus 0.15 DI units from 4 months to 24 months. That type of data is what I base my statement on, that PennHIP is more than 95% accurate at 4 or 5 months in most breeds.
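The “plus or minus 0.15 DI units” wording describes a stability check that is easy to express in code. The sketch below is only an illustration: the function name `fraction_within_tolerance` and the five paired readings are hypothetical and are not data from the AJVR paper.

```python
from typing import List, Tuple


def fraction_within_tolerance(pairs: List[Tuple[float, float]],
                              tolerance: float = 0.15) -> float:
    """Fraction of dogs whose DI changed by less than `tolerance`
    between the early and the later reading."""
    if not pairs:
        raise ValueError("at least one pair of readings is required")
    stable = sum(1 for early, late in pairs if abs(late - early) < tolerance)
    return stable / len(pairs)


# Hypothetical (DI at 4 months, DI at 24 months) pairs, NOT study data.
di_pairs = [(0.32, 0.35), (0.50, 0.58), (0.41, 0.40),
            (0.66, 0.45), (0.28, 0.30)]

stable_fraction = fraction_within_tolerance(di_pairs)
print(f"{stable_fraction:.0%} of these dogs stayed within 0.15 DI units")
```

In this made-up sample four of the five dogs stay inside the band; the paper's figure of about 96% is a statement of the same kind made over a far larger group.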
Repeatability includes within- and between-examiner evaluations. There is considerably more reliability (fulfilled expectations of getting the same results) in the more objective P-H distraction technique, even with inexperienced readers, than there is in the more subjective hip-extended method.
The assertion by Penn that OFA is not the best method American breeders have for progress in reducing HD in their line or breed involves the accuracy factor as well as the reliability factor. They call our attention to the fact that there are many dogs (usually of certain breeds) that do not develop DJD in their early years yet are OFA-assessed as dysplastic because of laxity at 2 years’ age. Even more importantly, there is the greater number that were adjudged “normal” at 2 years but later developed DJD or, if not re-radiographed, produced an unacceptably high percentage of dysplastic descendants. This led to the conclusion that the accuracy of OFA’s method is gravely flawed. Even if reliability had been higher from younger ages up to the two-year qualification age for OFA certification (and it is not), the absence of accuracy is worrisome to breeders, and diminishes the importance of OFA’s published reliability figures in 1997. Remember the difference between reliability (repeatability) and accuracy (avoiding errors).
As an example, Penn cites the 1996 OFA-type evaluation of military dogs in a longitudinal study in which all the dogs with “normal” hips at 2 years had mild degenerative changes by nine years of age. At the same time, 22 of 52 dogs that had been judged “positive” for HD at age two had similar changes by nine years! The conclusion was that the OFA-type evaluation at 2 years does give a relatively high rate of misdiagnosis, and blurs the distinction between true positive-for-HD and true negative (no HD) diagnoses even at the supposedly “safe” age of two years. Admittedly, the OFA hedges its emphasis on laxity a little by using the phrase “normal for age and breed” when grading radiographs; they do allow for some differences between Saint Bernards’ and Borzois’ hips this way, though it is still primarily a subjective evaluation.
In all studies that compared the DI with OFA score, the OFA diagnostic test was found to have even more error when evaluated longitudinally. That means following dogs over a long period of time. The data are clear on this issue. Many people have been lulled into believing that since their dogs receive one OFA score at 2 years of age, that the score is accurate, absolute, and will not change. If they would take the time, and spend the money to have repeated OFA testing done, they would find a troubling amount of error, and far more error or change than they would with PennHIP.
Distraction Index has better repeatability and accuracy for viewing laxity, especially at earlier ages such as between 4 and 24 months, than either the subjective hip score used in the hip-extended projections or the other objective measurement known as the Norberg angle. It is the best method at this point, with nothing likely to improve on it in the foreseeable future. This is not to say that it should be used to the total exclusion of other tools, however. Hip-quality production data of siblings, offspring, and parents, of the sort used in BV and ZW calculations, are additional aids to progress.
PennHIP allows a clearer picture of the genotype than the old AVMA view gives. It offers, at younger ages, identification of the most likely carriers of bad or good genes, a more quantitative evaluation (numerical index), a more natural positioning of the dog, and faster progress in reducing the incidence of HD. Many peer-reviewed vet-journal articles on the method prove very good reliability in radiographically predicting which pups are at greatest risk of developing DJD.