“There’s lies, there’s damn lies, and then there is statistics.” Mark Twain’s words apply to the outsized, misplaced influence of fertility statistics today.
All of us in medicine are motivated by the three principles of transparency, value and quality. We address the transparency and value parts in our blog “Building Value into Our Fertility Clinic.” Here we’re delving into the quality of fertility medicine. And you can be sure that the devil is in the details!
What is quality? In our field, we have lived with federally mandated reporting to the Society for Assisted Reproductive Technology (SART) since 1992. Sangita is research chair and executive council member of SART, so we are very familiar with how SART works. This mandated reporting, for better or worse, has been taken as a boon to consumers who can access the data and select their IVF center based on the numbers.
But what do the numbers really tell you? It’s critical to understand what goes into the IVF reporting algorithm, especially because it is about to change, and in coming years it will appear as if the pregnancy rates are going down for many centers.
Here is a brief breakdown that’s not complete, but intended to be instructive. Let’s say an IVF center initiates 100 cycles per year. This means that 100 women are screened, eligible and begin a stimulation in that year.
The outcomes of those 100 women are reported for that year only and include babies born within two calendar years of cycle initiation. Therefore, the report you read today is always at least one year out of date.
The single most important statistic reported by SART is the live birth rate per initiated cycle. Here are some initial caveats to the interpretation of these numbers.
IVF success rates for pregnancy online today may not reflect current practices
There are wide regional variations in practice. Some centers perform a very high proportion of their cycles using preimplantation genetic screening (PGS), a process by which embryos are biopsied in the IVF cycle but not transferred back into the uterus until a subsequent cycle, after the genetic analysis has been completed. Use of PGS can affect pregnancy rates because the embryo transfer can be deferred for long periods of time, causing the delivery of a baby to be more than a year from the initiation of the cycle.
Practices that are just adopting PGS, or that are reducing their use of PGS, will see a fall or rise in pregnancy rates if the changes in use of PGS are large. The direction of the change depends upon how PGS is used. If, for example, a center adopts PGS for only its poorest prognosis patients, its statistics will look particularly good, because these patients are shuttled out of the report for a given year and saved for a subsequent year (or, incorrectly, never reported at all if there is no embryo transfer because all of the embryos were abnormal). This loophole has been closed by SART for reporting in 2014 and beyond.
Current innovations in IVF include changes in embryo culture conditions (specifically oxygen tension), use of new incubators, lab air filtration, and other improvements that may not have been implemented when the report was made, but that are in place by the time you are ready for your treatment.
Small numbers lack statistical validity
In a program performing 100 cycles per year, it is difficult to make much sense of success rates when broken down by age. For example, this program may have very few women in the age range 38-40. If you are 39 years old and looking to see what your success might be in that particular program, there might be only 10-15 cycles in that group. One or two pregnancies more or less within a group that small will have a profound effect on the reported success rate, and may be statistically meaningless.
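To put a rough number on how unstable a small group can be, here is a minimal sketch that computes an approximate 95 percent confidence interval for a live birth rate. It uses the Wilson score interval, which is our choice for illustration and not a method SART uses or endorses. The group size of 12 cycles and the birth counts of 2, 3 and 4 are hypothetical values picked from the 10-15 cycle range mentioned above, and the function name is ours.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    successes: number of live births (hypothetical in this example)
    n: number of cycles in the age group (hypothetical in this example)
    z: normal quantile; 1.96 corresponds to roughly 95% coverage
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical age group of 12 cycles: one birth more or less moves the
# reported rate by about 8 percentage points, and the interval stays wide.
for births in (2, 3, 4):
    low, high = wilson_interval(births, 12)
    print(f"{births}/12 = {births / 12:.0%}, "
          f"approx. 95% interval {low:.0%} to {high:.0%}")
```

With 3 births in 12 cycles, the reported rate is 25 percent, but the interval runs from roughly 9 percent to 53 percent. A range that wide cannot distinguish a below-average program from an excellent one.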
One statistic that is relatively comparable across centers is the donor egg cycle live birth rate. Fresh donor egg cycle live birth rates typically run around 50 percent nationally (in 2014 it was 53.6 percent), and this can serve as a useful comparator across centers. However, with a trend toward increasing use of frozen donor eggs, the number of initiated fresh donor egg cycles may be decreasing, thereby compromising the validity of the cross-center assessment.
Different centers move patients into IVF through different pathways
At CU Advanced Reproductive Medicine, we usually encourage couples to try several cycles of ovulation stimulation (OS) prior to proceeding to IVF. This is because an OS cycle costs between 10 and 20 percent of an IVF cycle, and if your fertility profile suggests that this treatment has a good chance of getting you pregnant without the more expensive and invasive IVF treatments, it’s a win/win.
As opposed to a center that moves patients immediately into IVF, this kind of policy ultimately affects our IVF success rates as reported to SART, because the patients who have conceived through these easier, less expensive methods are no longer among our patient pool of IVF candidates. Our corresponding IVF success rates will therefore be lower, as our IVF candidate pool will be enriched in couples with multiple infertility factors, low ovarian reserve or unidentified, undiagnosable fertilization problems—all of which make it harder to conceive with any treatment, including IVF.
How a practice’s variations affect its success numbers
Let’s demonstrate how a couple of simple variations in practice affect outcome numbers. We’ve provided two hypothetical examples where there is not necessarily a right or wrong way to do things, but the decisions made by the two IVF practices affect their apparent quality based on the criteria used for reporting.
Doctors in Center X saw 150 infertile patients that year who could be candidates for IVF. All of these patients met the standard definition of “infertility.” All of these patients were offered ovulation stimulation beforehand.
Based on available scientific evidence (1, 2), about 25 percent of these patients will conceive without ever requiring IVF. Therefore, the remaining 112 or so patients who did not conceive from ovulation stimulation underwent IVF. Of these 112 patients, the mean age of the female partner was 38 years. The overall live birth rate per IVF cycle initiation reported to SART was 23 percent for that year for Center X.
Doctors in Center Y also saw 150 infertile patients that year who could be candidates for IVF. Center Y has a more liberal policy on how infertility is defined, and they are in favor of completing a fertility evaluation and offering treatment at the patients’ request, regardless of the duration of infertility.
Center Y also has a policy of preferring immediate treatment with IVF, as it has superior success rates per cycle compared with all other forms of fertility therapy even though it is significantly more expensive than more conservative treatment. All 150 patients undergo IVF that year at Center Y. The mean age of the female partner is 34. Center Y’s website reports an overall 43 percent live birth rate per initiated cycle for that year.
Which is the better center? Isn’t Center Y clearly the place you would want to go if you are going to do IVF?
When comparison shopping, look beyond the numbers & go to the SART website
Note the disclaimer stating that these statistics should not be used to compare one center to another. Little attention is usually paid to this disclaimer!
Looking at the 2014 preliminary primary outcome per intended retrieval, you can see that for women aged 38-40 (more characteristic of Center X’s patient population), a 22.4 percent live birth rate per intended retrieval is the national average. Given this knowledge, we might expect that Center X will report a 23 percent overall live birth rate, very close to the national average. For women aged under 35 years, more characteristic of Center Y’s patient population, the live birth rate is 42.6 percent per intended retrieval.
Thus, both centers are performing close to the national average for their patient population. There is no obvious advantage of one over the other based on the report. However, given the practices at Center X, it is likely that they are performing slightly above the national average with a patient population that has a worse prognosis. This might favor consideration of Center X over Center Y, despite the fact that the overall reported live birth rate for Center Y is almost double that for Center X.
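The arithmetic behind this comparison can be sketched in a few lines. The rates below are the ones quoted in this post. The variable and function names are ours, and matching each center's overall rate to a single national age band follows the simplification used in the example above; it is not a formal risk adjustment.

```python
# Figures quoted in the example: reported live birth rate per initiated cycle
# and the 2014 national rate for the age band closest to each center's patients.
centers = {
    "Center X": {"reported": 23.0, "age_band": "38-40", "national": 22.4},
    "Center Y": {"reported": 43.0, "age_band": "under 35", "national": 42.6},
}

def ratio_to_benchmark(reported, national):
    """Reported rate divided by the age-matched national rate (1.0 = average)."""
    return reported / national

# Center X: 150 patients seen, about 25 percent conceive before needing IVF.
patients_seen = 150
remaining_for_ivf = patients_seen * (1 - 0.25)  # 112.5, about 112 as quoted
print(f"Center X patients going on to IVF: about {remaining_for_ivf:.1f}")

for name, c in centers.items():
    ratio = ratio_to_benchmark(c["reported"], c["national"])
    print(f"{name}: {c['reported']}% reported vs {c['national']}% national "
          f"({c['age_band']}), ratio {ratio:.2f}")

# The raw, unadjusted comparison that a casual reader would make.
raw_ratio = centers["Center Y"]["reported"] / centers["Center X"]["reported"]
print(f"Raw Center Y / Center X ratio: {raw_ratio:.2f}")
```

The raw ratio is close to 1.9, which is where the "almost double" impression comes from. Once each center is measured against the national rate for its own age band, both ratios sit just above 1.0.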
Of course, the SART tables of statistics break down success by age of the female partner, because it is one of the most important determinants of live birth. But it’s not enough to just look at the live birth rates by age to ‘comparison shop’ for IVF centers.
As in the example above, different centers have different policies about who is eligible for IVF at their site. At CU-ARM we look critically at our own outcomes over the years and do not offer IVF as an option when the chance of success in our hands is very low (usually 5 percent or less).
Centers that are unduly restrictive and eliminate patients who have a poor prognosis from ever undertaking an IVF cycle will wind up reporting better outcomes (3). And we haven’t even started discussing how the use of PGS can alter the apparent outcomes!
What can consumers do to truly understand or cross compare IVF centers? We are at the beginning of an era of “transparency, value and quality” reporting, and it’s a science in its infancy. It has lots of imperfections.
We don’t want to deconstruct it entirely, but in the words of Mark Twain, “There’s lies, there’s damn lies, and then there is statistics.”
1. Diamond MP, Legro RS, Coutifaris C, et al. Letrozole, Gonadotropin, or Clomiphene for Unexplained Infertility. N Engl J Med 2015;373:1230-40.
2. Legro RS, Brzyski RG, Diamond MP, et al. Letrozole versus clomiphene for infertility in the polycystic ovary syndrome. N Engl J Med 2014;371:119-29.
3. Kulak D, Jindal SK, Oh C, et al. Reporting in vitro fertilization cycles to the Society for Assisted Reproductive Technology database: where have all the cycles gone? Fertil Steril 2016;105:927-31.