Looking Closer at OFA's Claims about PennHIP

Revised 2012.

A contributor to an e-mail discussion group back in 2003 reproduced a public-relations flyer from OFA without realizing that many of its statements were old, repeated ones that long ago became inaccurate and misleading by their lack of currency. I only have space here to comment on a few of these. In italics are quotes from the 1998 OFA mailing (a repeat of the pamphlet sent in 1994, which in turn repeated some older claims, and never yet retracted); my response is in regular typeface.

OFA said:
…this method [referring to PennHIP] appears to be premature. In other words, commercialization (marketing) of the method has outreached the science. [and] …the jury is still out!

Back in the early days, before PennHIP had many years and studies under its belt, the first and third statements may have had a little credence, but the second never has, even though for a time the university had decided to let a respected science company spread the word, because the university did not have the staff to do a good job at that. The science is overwhelmingly on the side of PennHIP, and the marketing efforts are overwhelmingly being carried out by OFA. The only two OFA papers published in the veterinary scientific literature before the year 2000 were retrospective surveys in 1985 and 1997, not original research. Meanwhile, more than 20 peer-reviewed papers on and/or by PennHIP were published in that time period, and some of these were “concurrent cohort (prospective) studies,” a design that carries more weight or validity than other types. Many more have followed in more recent years. The OFA’s continuing claim that PennHIP is still “experimental” therefore cannot be substantiated; it was curious in light of there being no studies (at the time of its initial release and even later) showing the OFA’s approach to be anything more than experimental, despite well over 30 years of use.

OFA said:
Dr. E.A. Leighton (JAVMA, May 13, 1997) reported on genetic progress … (OFA’s conclusion): It should be pointed out that DI was considered experimental… It will be interesting to see the results when DI is included as a selection criterion.

So, why didn’t the OFA bother to say what he said? The need for more rapid and sure progress than had been gained by use of OFA is why Dr. Leighton (geneticist at The Seeing Eye Inc.™) has said (in Genetics of canine hip dysplasia. J Am Vet Med Assoc 1997;210:1474-1479.), “To make further progress toward eliminating CHD, a measure of hip quality that will allow us to distinguish among the dogs currently in the breeding program is needed. For this reason, The Seeing Eye, Inc. has turned to the distraction index (PennHIP) as a means of assessing hip quality.” Their breeding program had reached a point where careful selection had brought them to an admirably lower level of HD; yet, because they only had “normal” dogs to use in their program, the BVA 9-point scale, and even the OFA’s 7-level scale of hip joint quality, were of no further use. All their prospective breeding dogs were “clustering” and looking perfect at #s 4, 7, and 8 on the BVA scheme (dorsal acetabular edge, caudal acetabular edge, and femoral head/neck exostoses); that is, no DJD at all, and only a few with a little laxity (#s 1, 2, and 3: Norberg angle, subluxation, and femoral head recontouring). Their scale had essentially become a de facto 3-point scale. Because of this, they adopted the PennHIP distraction index to get to the next upward step, to break through the hip-extended-view ceiling or plateau, and reach a higher level. Studies at The Seeing Eye Inc. have included PennHIP’s Distraction Index in later years. It was found that GSDs with DJD had a mean (average) DI of 0.51 (the breed average hovers around 0.4) and those with no DJD had a mean DI of 0.33. In Labs, the DJD dogs had a mean DI of 0.67 while DJD-free dogs had a mean of 0.38. What keeps people and clubs on this intermediate-altitude (OFA) level? One reason is that they (we) have been using outdated definitions of what is “normal.” PennHIP allows us to refine that definition.

OFA said:
…Vet Clinics of No Am, May 1992) on a study of 3,369 dogs from 25 breeds. Reliability of the preliminary evaluations ranged from 71.4% to 100%…
The OFA pamphlet conveniently fails to explain that their definition of “reliability” (repeatability) only compares short-term evaluations in the hip-extended view. Those figures drop precipitously when one compares repeatability of hip-extended views at end-of-life (or nearer to it).

If one applies the more stringent evaluation protocols of PennHIP, one finds a disturbing number of “OFA-normals” to indeed be not normal in any sense that most of us would consider so, as compared with the mean or average in the breed. Even an unacceptably high number of OFA-Excellents at 2 years will show up in the PennHIP position to have worse hips than would otherwise be suspected. If you want to know if a bridge can bear a load of 10 tons, you should not expect to run a meaningful test by driving your half-ton pickup over it. The dog also should be evaluated in the strictest method in order to tell if a reading of “excellent” has any validity. Otherwise, the breeder—or the person driving a big truck over a bridge—may have a false sense of security. If not much is demanded, is it a real test?

In a study of large-breed dogs (Smith GK, Gregor TP et al. Hip dysplasia diagnosis: A comparison of diagnostic methods and diagnosticians. In Proceedings, An Conf Vet Orth Soc 1992;20.), some 71% of dogs that had been reported by OFA to have no evidence of HD were found to have distraction indices high enough to be considered susceptible to DJD development. These facts merely mean that PennHIP and the DI represent a more rigorous or demanding look at laxity and risk. It may well be that you are satisfied with some DI higher than others would be, or with only the limited information afforded by the OFA view at 2 years, but you should have more information before expecting to make wise choices. Reliability connotes or even is defined as repeatability. You should visit your nearest vet school library and study these for logical, sound revelations on that subject:

Within- and between-examiner repeatability. Am J Vet Res 1997;58;#10:1076-1077.
Views: Letters to the editor re OFA reliability figures questioned. J Am Vet Med Assoc 1998;212:487-488.
Views: Letters to the editor re prevalence data and bias. J Am Vet Med Assoc 1999;214:27-28.

It is an interesting side-note that the referenced OFA pamphlet cites Vol. 22 of Vet Clinics of North America, May 1992. In that same issue is my article on HD, in which I quote the OFA director as saying, “some 60% or more of submitted radiographs are given erroneous evaluations.” Also notable is that I did not mention PennHIP, because at the time, its system was only a very few years old and statistical data had not yet been compiled and published. Much has changed in the interim, however. Even in the two years between that 1992 Vet Clin issue and the basis for the OFA’s repeatedly-resurrected statements, enough data had been generated to indicate the diagnostic superiority of the distraction technique and knees-up position. Since those days, “tons” of further evidence has piled up.

OFA said:
…the owner may [he obviously meant to say “might”] not allow submission of an obviously large DI measurement.
That is patently misleading. The typical scenario at a PennHIP-certified vet’s facility is this: The client brings in his dog, specifically for a scheduled P-H evaluation (not a spur-of-the-moment decision), the papers are filled out, the dog is taken back into the room where anesthesia and radiography are administered with the client remaining in the waiting room, the three views are taken, and after the dog starts to come out of unconsciousness and the films are dry, they are brought to the interview room for consultation with the client. The vet might make some prediction based on his subjective view of the films, but most know better than to second-guess the objective measurement with circle gauges that will be done only at the PennHIP Analysis Center in Pennsylvania. He tells the client that they both should hear in two or three weeks at most, what the DI and percentile will be. Any vet that fails to send in all radiographs taken (excepting extra ones that the client might ask to be sent to OFA or kept) will lose his accreditation. And since he has invested faith and time, and charges more for his P-H services, he will not risk that. On the other hand, any ol’ vet can take a film for OFA and ask the client if he wants to send it in with extra money when they already know that it will not get a passing certificate. Where is the bias? You know where.

OFA said:
… 5% false negative finding as reported by Dr. Jessen (Proceedings, 1972…

That 1972 report is what prompted OFA to immediately require an age of 24 months instead of the much less reliable 12 months for a “normal” rating. So it is strange that later papers by OFA claimed accuracy in predicting HD based on “prelims” at young ages, and not so strange that this claim was only made after the success of PennHIP in accurately predicting HD/DJD by estimating relative risk. This is just one example of reactionary moves that have lowered OFA’s credibility. All the hip-extended registries also have a higher rate of “false negatives,” meaning there will be a significant percentage of “cleared” or “normal” dogs that will “get” HD later (show radiographic signs). I call these the “stealth dogs” because they can slip through the radar-like screening of the hip-extended programs. This also adds to the bias in many registries. In the leg-extended system in America there are false-negative rates of about 83% in 6-month-old German Shepherds, for example, but in the PennHIP scheme, the rate is only about 12% in 4-month-old dogs and 0% at 6 months (as compared to the readings at 24 months). Put another way, PennHIP is 96% accurate at 4 months and practically 100% accurate from 6 months up! That cannot be said about the hip-extended (OFA) method. The OFA position on 4-month pups gave a false-negative rate of 24%, double that of the PennHIP compression-distraction method. Even at 6 and 12 months, the OFA-type predictive tests gave false negatives of 15% and 12%, whereas the PennHIP stress-radiographic method showed zero false negatives for 6- and 12-month-old dogs. End-of-life studies in recent years confirm that the DI remains almost unchanged throughout life, regardless of breed.
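
To make the term concrete, here is a minimal sketch of how a false-negative share, in the sense used above (cleared at the early reading, dysplastic at the 24-month reading), could be tallied. The records and the function name are hypothetical illustrations, not data from any of the studies cited.

```python
# Hypothetical paired readings: (cleared_early, dysplastic_at_24_months).
# These tuples are made-up illustration data, not study results.
records = [
    (True, False),
    (True, True),
    (True, False),
    (False, True),
    (True, False),
]


def false_negative_share(pairs):
    """Share of early-cleared dogs that were dysplastic at the later reading."""
    later_status = [dysplastic for cleared, dysplastic in pairs if cleared]
    if not later_status:
        raise ValueError("no dogs were cleared at the early reading")
    return sum(later_status) / len(later_status)


print(f"{false_negative_share(records):.0%} of early-cleared dogs later showed HD")
```

The same tally run on the same dogs with two different screening methods is what allows the kind of side-by-side comparison made in the paragraph above.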

The limited number of dogs resulted in a larger confidence interval for the PennHIP values, says the OFA publication. That is not representative of the truth. With a much more sensitive test, fewer test dogs are needed to reach statistical significance. If you used a magnet to count the pellets in random square yards of a battlefield, and it reported only 25% of what you found when you later dug up and sifted those plots, and you then used a much stronger and more accurate magnetic metal-finder that reported 99.5% of those pellets, you would not have to dig up as many square-yard plots with the latter to arrive at a better confidence level in your predictive method. OFA, SV, etc. will always have a larger number of blips on their magnetometers, but they will never improve on the locating ability (predictive value) of PennHIP’s magnets. Another way of looking at this simile: OFA can survey more acreage, but PennHIP will find more pellets.
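
One way to put numbers on the magnet simile is a simple thinning model. This model is an assumption added here for illustration, not something stated by OFA or PennHIP: suppose each pellet in a plot is found independently with probability p, so the count found is binomial and the true count is estimated as found / p. Under that assumption the variance of the estimate for a plot holding N pellets is N(1 - p)/p. The function name and the plot size of 100 pellets are hypothetical.

```python
def estimate_variance(true_count, detect_prob):
    """Variance of found / p when found ~ Binomial(true_count, p)."""
    return true_count * (1 - detect_prob) / detect_prob


# Hypothetical plot holding 100 pellets, with the two detection rates
# taken from the simile above.
weak = estimate_variance(100, 0.25)     # magnet that reports 25% of pellets
strong = estimate_variance(100, 0.995)  # finder that reports 99.5% of pellets
print(f"weak magnet: {weak:.1f}   strong finder: {strong:.4f}")
```

Under this model the per-plot estimate from the stronger detector is far less noisy, so fewer plots are needed to reach a given level of confidence.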
PennHIP may be said to have a very small percent of “false positives,” depending on breed, but only if by that you mean that the high DI figure that refers to relative risk was not echoed in later life by presence of osteophytes (OA or DJD). There are some breeds that are much more tolerant of laxity than others. The laxity quantitatively stated as DI is predictive—it refers to a certain amount of risk of later remodeling that sometimes does not occur in the useful life of the animal. However, keep in mind that while the individual dog may have relatively clean joints, it may also have many bad genes to pass along, and the PennHIP laxity test is an improved indicator of these. False positives do not hurt breeds or our progress in eliminating HD; false negatives (passing grades on a failing student) most certainly do. It is better to detect the many stealth dogs and cull them from the breeding programs than worry about losing the genes of the very few false-positive dogs (loose, but no later DJD). Don’t let some covert fifth-columnist into the ranks to poison or sabotage the gene pool in your breed or line.

OFA said:
The meaning of the measurements remains unclear and will require repeat studies, on the same dogs, at >24 months of age.
Wrong. The meaning is crystal-clear. Wrong also in the old and renewed, but not updated, intimation that repeated studies have not been done. Probably the most telling data to come to light (Smith GK, Powers MY, Biery DN, Houlton JEF, Evans RH, McKelvie PJ, Shofer FS, Gregor TP, Ballam JM, Mantz SL, Lust G, Lawler DF, Kealy RD. Effects of age and diet [restricted feeding] on hip phenotype: A lifelong study in Labrador retrievers. Proc Brit Vet Ortho Assoc, Birmingham, UK, 2003, pp. 20-22.) derives from that long study of 48 Labrador Retrievers conducted at the laboratories of Nestlé-Purina Pet Care Center. Of the dogs having “normal” hips at 2 years of age (according to OFA scoring criteria), 92% went on to develop OA radiographically or histopathologically by end of life. This is a real eye-opener, indicating a phenomenally high rate of false negative diagnoses in the OFA procedure of evaluation, and therefore poor reliability in the sense of trustworthiness. Similar patterns of hip disease would be observed in GSDs and other breeds if also followed for life. Already, there is a weighty bulk of evidence, now several years old, in GSDs comparing early DI readings with DI and DJD at four years of age.

OFA said:
PennHIP has not been available long enough to accumulate the data necessary to evaluate the effect of this test method…
Again, we hear this old, feeble, and misleading statement that should have been retired many years ago. That claim was first brought out when PennHIP was about four years old, but it’s now been almost 20 years since the first conception of the new procedure, and the volume, pertinence, and quality of data have been both impressive and convincing for quite a long time now. Although I am now more than two-thirds of my mother’s age, I will never catch up to her in years while we both live. But I long ago surpassed her in education. OFA will always (as long as it lasts) be older than PennHIP, and probably always will have a larger number of dogs in its database. However, it has already been passed in credibility and relevance in the eyes of most of the scientific community and many breeders who strive for the very best. Today, the PennHIP distraction index method is shown to be 2.5 times as sensitive as the old, standard hip-extended one. That is, it uncovers 2.5 times as much laxity. It is also 4 times more accurate.

The OFA comment in their 1998 public relations release, “over half of the dogs in the OFA study were older than in the study reported by Dr. Adams”, found fault with an independent University of Wisconsin study reported earlier in the same year (Adams WM, Dueland RT, Meinen J, et al. Early detection of canine hip dysplasia: comparison of two palpation and five radiographic methods. J Am Anim Hosp Assoc Jul/Aug 1998;34:339-346.). The objection was that the dogs studied were initially slightly younger than those in one or two 1997 OFA studies (Corley EA, Keller GC, Lattimer JC. Reliability of early radiographic evaluations for canine hip dysplasia obtained from the standard ventrodorsal radiographic projection. J Am Vet Med Assoc 1997;211:1142-46. [and] Kaneene JB, Mostosky UV, Padgett GA. Retrospective cohort study of changes in hip joint phenotype of dogs in the United States. J Am Vet Med Assoc 1997;211:1542-1544). But the Wisconsin study followed a group of dogs from puppyhood to maturity at one year of age, and sought to get a representative sample of the population that way. That JAAHA report concluded, “The distraction index measurement was the most accurate in predicting the development of DJD” and “distraction index in puppies … 16-18 weeks of age was the most reliable predictor of hip dysplasia”. Since then, as I have said, lifelong studies have proven the concept and practice of DI data as predictive to be valid.
Another potentially misleading statistic is the “32% increase” in % of OFA-Excellent among GSDs: does the change from 2.5% excellents before 1980 to 3.1% in 1986-87 excite you? Is that anything to hang your hopes on? In the same time period, the HD rate in the breed (OFA’s own statistics) went from 20.7 to 22.8%. There are a few reasons for this apparent divergence, but one first has to accept the numbers and decimal placings as significant and accurate before those can be logically explored. Even less faith in such statistics is generated by the dismal news that less than 5% of all dogs registered by AKC are evaluated for HD by any hip registry. This simply means that we breeders will never significantly affect the general breed population, but we can change the course of the core — the show-quality and working-dog representatives—and even more, you can make a big impact on your own lines. Just remember that there are only a limited number of OFA-Excellents (between 1 and 2% in the GSD), and some of them will be false-negatives.
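
The difference between a relative increase and a percentage-point increase can be sketched in a few lines. The helper below is an illustration only (its name is hypothetical), and because it works from the rounded percentages quoted above, its relative-change result need not reproduce the pamphlet’s 32% figure.

```python
def change(before_pct, after_pct):
    """Return (percentage-point change, relative change in percent)."""
    points = after_pct - before_pct
    relative = points / before_pct * 100
    return points, relative


# Rounded figures quoted in the text above.
for label, before, after in [
    ("OFA-Excellent share", 2.5, 3.1),
    ("HD rate", 20.7, 22.8),
]:
    points, relative = change(before, after)
    print(f"{label}: {points:+.1f} points, {relative:+.1f}% relative")
```

A shift of well under one percentage point can be reported as a double-digit relative increase, which is why the headline figure deserves a second look.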

Comparing H-E and PennHIP (again)

There are several important differences in approaches to the problem of HD between PennHIP and OFA in the USA, felt to a lesser extent with registries in other parts of the world. For one, the heritability of the phenotype measured using the PennHIP method (to detect passive laxity and therefore the likelihood of DJD) has been shown to be much higher than the reported heritability estimate of 0.22 for subjective hip scoring methods (the traditional leg-extended methods) for breeds thus far studied. The higher the heritability associated with a trait, colony, breed, or diagnostic method, the greater the influence of selection pressure, and the faster the potential progress. Such information holds great promise for breeders who, by applying selection pressure based on the PennHIP phenotype, desire to make rapid genetic change toward reducing the frequency and severity of DJD in future generations of dogs.

In 1995, Popovitch, Smith, and others published two studies on the differences between GSDs and Rottweilers as well as an evaluation of risk factors for DJD. See: Comparison of susceptibility for hip dysplasia between Rottweilers and German Shepherd Dogs. J Am Vet Med Assoc 1995;206(5):648-650. [and]: Evaluation of risk factors for degenerative joint disease associated with hip dysplasia in dogs. J Am Vet Med Assoc 1995;206(5):642-647. Taken together with the 1993 comparison of PennHIP and OFA, these papers satisfactorily address the differences between the subjective hip-extended view and the more objective distraction view. See: Biomechanical study of the effect of coxofemoral positioning on passive hip joint laxity in dogs. Am J Vet Res 1993;54:210-215.; Coxofemoral joint laxity from distraction radiography… ibid.; pp 1021-1042; Joint laxity and its association with hip dysplasia in Labrador Retrievers. ibid.; pp. 1990-1999. The OFA’s predictions were less accurate than the P-H predictions, partly because the former’s predictive capacity does not reach statistical significance.

At the Veterinary Orthopedic Society meeting in 2000, a survey of 260 dogs was presented that compared PennHIP scores with OFA evaluations. The study showed that:

53% of OFA-Excellents were looser than 0.3 DI (had a higher number);
64% of OFA-Goods were looser than 0.3 DI (the loosest had a 0.77 DI);
93% of OFA-Fairs were looser than 0.3 DI.
77% of all OFA-“Normals” combined had DI of >0.3 (were looser).
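
A tabulation of this kind is straightforward to reproduce from paired records. The sketch below assumes a hypothetical list of (OFA grade, DI) pairs; the values are invented for illustration and are not the survey’s data, and the function name is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical (OFA grade, distraction index) pairs; invented for illustration.
dogs = [
    ("Excellent", 0.25), ("Excellent", 0.36),
    ("Good", 0.28), ("Good", 0.45), ("Good", 0.77),
    ("Fair", 0.41), ("Fair", 0.52),
]


def share_looser_than(pairs, threshold=0.3):
    """Per grade, the fraction of dogs whose DI exceeds the threshold."""
    totals = defaultdict(int)
    looser = defaultdict(int)
    for grade, di in pairs:
        totals[grade] += 1
        if di > threshold:
            looser[grade] += 1
    return {grade: looser[grade] / totals[grade] for grade in totals}


for grade, share in share_looser_than(dogs).items():
    print(f"{grade}: {share:.0%} looser than 0.3 DI")
```

The threshold is a parameter, so the same tally can be rerun at whatever DI cutoff a breeder or club considers acceptable.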

You might be perfectly satisfied in breeding Labs with their average DI of 0.45 or 0.5, or GSDs with their average of 0.4 or 0.43; or with OFA, SV, or BVA-type “passing scores”, but your customers are not going to thank you, as some of their dogs’ descendants will give the owners both clinical and financial problems. You’d be letting too many bad genes sneak through the wide openings in your fence and muddy up your gene pool.

“OFA, a not-for-profit organization”, (as is PennHIP, by the way), makes subtle but revealing insinuations in its choice of words in such non-peer-reviewed pamphlets. When its author says, “laxity that is demonstrated by forcing the heads of the femurs away from the acetabula…”, the operative word is “forcing,” which conjures in the mind of the unwary reader some excessive and damaging action. He adeptly avoids using the same terminology to describe the hip-extended position as being one that tends to force the heads into the sockets! The neutral, non-combative term might have been “distracting the heads away…”. The force that the hip-extended procedure uses actually hides many indications of risk, while the distractive force of PennHIP reveals the otherwise covert laxity! Nor is the distraction technique any longer experimental, as the pamphlet erroneously continues to proclaim. Good science demands revising reprints when new evidence is developed, and the 1998 reprint or recirculation by OFA fails to practice that protocol.

Dr. Lust et al. were quoted (1993) in that pamphlet and website version as supposedly lending credibility to OFA’s claims of Penn’s poor predictability in pups with DI between 0.4 and 0.7, but other references in which Lust supports the distraction technique are not mentioned. (Farese JP, Lust G, Williams AJ, et al. Comparison of measurements of dorsolateral subluxation of the femoral head and maximal passive laxity for evaluation of the coxofemoral joint in dogs. Amer J Vet Res 1999;60:1571-1576.) With Drs. Rendano and Summers at Cornell, Lust noted (Lust G, Rendano VT, Summers BA. Canine hip dysplasia: concepts and diagnosis. J Am Vet Med Assoc 1985;187:638-640.) that in one dog radiographed in “the standard [OFA] view” twice the same day, there was a diagnosis of dysplasia at one examination and normal at the other. In similar trials at the University of PA, many dogs were repeatedly radiographed in that “standard” ventro-dorsal position, often with different results, even though only board-certified radiologists were used in the tests. Remember, one definition of reliability is repeatability. Other dogs, judged dysplastic at 12 months, appeared normal in the hip-extended view at 2 years, again a failing of the highly inaccurate and non-predictive hip-extended position.
In that OFA publication, it was said that “OFA would be remiss in its responsibility to either endorse or reject the PennHIP testing method…” yet spokesmen for OFA continually reject and attack the method. It reminds me of the bigot who defends a miscreant by starting his remarks with “Far be it from me to suggest a lynching…”

I don’t want you to get the idea that I want to denigrate the OFA. It has done a wonderful job in the past, in bringing HD and its consequences to the attention of thousands of new arrivals to the dog scene. And it provides an invaluable service for several other disorders, orthopedic and otherwise. It filled a vacuum for many years and stimulated much inquiry on the part of scientists who went further along that road, and breeders who used its revelations to improve their bloodlines and breeds, sometimes to a noticeable extent. It’s just that there is a higher mountain to climb, and we now have the modern tools to scale it. I am optimistic, and hold out hope that the people who run OFA will someday catch up to the present, and adopt the PennHIP method and database as an adjunct or optional “extra” to the rating service that they now provide. Give the class of breeders something in which they can “earn extra credit”.

An almost unnoticed improvement in openness on the part of OFA was shown in a September 2000 JAVMA article (Reed AL, Keller GG, Keller EA, et al. Effect of dam and sire qualitative hip conformation scores on progeny hip conformation. J Am Vet Med Assoc 2000;217:675-680) that reported on a longitudinal study of four breeds. That type of study is really a review of data already collected and in the archives, with a specific purpose in mind. In this case, it was to give further strength to the obvious: that better hip conformation in parents tended to result in better hip conformation in the progeny. But really interesting were the hints of admission that the data and conclusions garnered by OFA were imperfect; this is an admirable advance in attitude over past years’ near-hubris. Quotes include: “Because information was submitted voluntarily, caution should be used when interpreting results… we did not have hip conformation scores for every dog from every litter… There also may be various degrees of prescreening of radiographs prior to submission… [and that] bias remained constant over time.”

Also in that report we find some substantiation for the premise that breeding selection, even when using less than the best diagnostic methods, can improve average quality in a given population (showdogs, for example) up to a point. (Kaneene JB, Mostosky UV, Padgett GA. Retrospective cohort study of changes in hip joint phenotype of dogs in the United States. J Am Vet Med Assoc 1997;211:1542-1544). Conclusions generated from the archived data implied that “dam and sire hip conformation scores had a significant effect on progeny… [and] gene contributions from sire and dam were equal.” Further, “the OFA recommendations that only dogs with excellent or good… be used… and use of dogs with fair [hip] conformation is questionable.” This comes close to what I have long maintained, that using an OFA-Fair is no better than using a Mild-HD dog in one’s breeding program. On the other hand, if one were to rely only on the hip-extended view for breeding decisions re hips, the use of only dogs with OFA-Excellent hip quality will usually give “an overall improvement”. This September 2000 report also mentioned “it is more difficult to differentiate excellent hips from good hips than it is to differentiate normal hips from dysplastic hips”.

We know that the further toward the middle-right of the percentile rankings, with Distraction indices between 0.4 and 0.7, the harder it is to pinpoint the exact risk or describe fully the DJD that will occur. This is partly because control of the environments such as growth rate, calcium and calorie intake, etc., can cause great variation in dogs genetically predisposed to HD. OFA should not have been blindsided by such an obvious fact and deduction.

It is also hard to decide where to draw the line and call the hip moderate or severe, but harder to distinguish between mild and moderate, since these are subjective terms. This “middle ground” is where we have always found the most disagreement among evaluators. As a rule, those in the Fair-normal, Borderline, and Mild-HD categories are likely to have the greatest disparity in descriptions between evaluators. Some disagreement is even found in the interpretation of films rated “Good” by OFA, but more agreement among panel participants is had with hips that appear Excellent, or Moderately-to-Severely dysplastic. Remember, in this paragraph we are talking about disagreements concerning the same piece of evidence, a single film that is circulated to various people. There is much more variance if different films (same animal, radiographed at slightly different times) are evaluated. Also, there seems to be frequent (and sometimes considerable) discrepancy between a foreign evaluation such as SV, and that of OFA readers.

It is known that breeding selection for normal hips results not only in fewer cases of HD, but also in lower average severity. And this is why we would have cause to (mildly) rejoice: the higher relative numbers for “excellents” might indeed be a sign of progress, but how rapid? And how reliable are those higher percentages? Yes, using OFA-Excellents almost exclusively will give faster progress than just breeding any grade of “OFA-normal”, but there are still too many false-negatives (nice films but not so nice genotype). Even an unacceptably high number of OFA-Excellents at 2 years will show up in the more-revealing PennHIP view to have worse hips than would otherwise be suspected. There is a better way! It remains true that laxity is a risk factor for DJD (breed for breed, the most important factor), and an indication of more “bad genes” than are desirable. And that risk indicator, laxity, is best seen with the PennHIP distraction technique.
Examples of Two Report Forms Used by PennHIP:

Fred Lanting

Fred Lanting is an internationally respected show judge, approved by many registries as an all-breed judge, has judged numerous countries’ Sieger Shows and Landesgruppen events, and has many years experience as one of only two SV breed judges in the US. He presents seminars and consults worldwide on such topics as Gait-&-Structure, HD and Other Orthopedic Disorders, and The GSD. He conducts annual non-profit sightseeing tours of Europe, centered on the Sieger Show (biggest breed show in the world) and BSP.
