Research Article

Dissecting racial bias in an algorithm used to manage the health of populations


Science  25 Oct 2019:
Vol. 366, Issue 6464, pp. 447-453
DOI: 10.1126/science.aax2342
  • Fig. 1 Number of chronic illnesses versus algorithm-predicted risk, by race.

    (A) Mean number of chronic conditions by race, plotted against algorithm risk score. (B) Fraction of Black patients at or above a given risk score for the original algorithm (“original”) and for a simulated scenario that removes algorithmic bias (“simulated”: at each threshold of risk, defined at a given percentile on the x axis, healthier Whites above the threshold are replaced with less healthy Blacks below the threshold, until the marginal patient is equally healthy). The × symbols show risk percentiles by race; circles show risk deciles with 95% confidence intervals clustered by patient. The dashed vertical lines show the auto-identification threshold (the black line, which denotes the 97th percentile) and the screening threshold (the gray line, which denotes the 55th percentile).
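The replacement procedure described for panel (B) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the `Patient` record, the function name `fraction_black_above`, and the use of the chronic-condition count as the health measure are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    race: str        # "white" or "black" (hypothetical encoding)
    risk: float      # algorithm risk score
    conditions: int  # number of active chronic conditions


def fraction_black_above(patients, percentile, simulate):
    """Fraction of Black patients at or above a risk percentile.

    With simulate=True, the healthiest White patients above the threshold
    are swapped for the sickest Black patients below it, stopping once the
    marginal pair is equally healthy (assumed reading of the caption).
    """
    ranked = sorted(patients, key=lambda p: p.risk)
    cut = int(len(ranked) * percentile / 100)
    below, above = ranked[:cut], ranked[cut:]
    if not above:
        return 0.0
    n_black = sum(p.race == "black" for p in above)
    if simulate:
        # Healthiest Whites above the threshold come first.
        whites_above = sorted(
            (p for p in above if p.race == "white"),
            key=lambda p: p.conditions,
        )
        # Sickest Blacks below the threshold come first.
        blacks_below = sorted(
            (p for p in below if p.race == "black"),
            key=lambda p: p.conditions,
            reverse=True,
        )
        for white, black in zip(whites_above, blacks_below):
            if black.conditions <= white.conditions:
                break  # marginal patients are equally healthy; stop swapping
            n_black += 1  # one White patient leaves, one Black patient enters
    return n_black / len(above)
```

Calling the function once with `simulate=False` and once with `simulate=True` at each percentile would trace out the two curves ("original" and "simulated") that the caption describes.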

  • Fig. 2 Biomarkers of health versus algorithm-predicted risk, by race.

    (A to E) Racial differences in a range of biological measures of disease severity, conditional on algorithm risk score, for the most common diseases in the population studied. The × symbols show risk percentiles by race, except in (C) where they show risk ventiles; circles show risk quintiles with 95% confidence intervals clustered by patient. The y axis in (D) has been trimmed for readability, so the highest percentiles of values for Black patients are not shown. The dashed vertical lines show the auto-identification threshold (black line: 97th percentile) and the screening threshold (gray line: 55th percentile).

  • Fig. 3 Costs versus algorithm-predicted risk, and costs versus health, by race.

    (A) Total medical expenditures by race, conditional on algorithm risk score. The dashed vertical lines show the auto-identification threshold (black line: 97th percentile) and the screening threshold (gray line: 55th percentile). (B) Total medical expenditures by race, conditional on number of chronic conditions. The × symbols show risk percentiles; circles show risk deciles with 95% confidence intervals clustered by patient. The y axis uses a log scale.

  • Table 1 Descriptive statistics on our sample, by race.

    BP, blood pressure; LDL, low-density lipoprotein.

                                          White     Black
    n (patient-years)                     88,080    11,929
    n (patients)                          43,539    6079
    Demographics
      Age                                 51.3      48.6
      Female (%)                          62        69
    Care management program
      Algorithm score (percentile)        50        52
      Race composition of program (%)     81.8      18.2
    Care utilization
      Actual cost                         $7540     $8442
      Hospitalizations                    0.09      0.13
      Hospital days                       0.50      0.78
      Emergency visits                    0.19      0.35
      Outpatient visits                   4.94      4.31
    Mean biomarker values
      HbA1c (%)                           5.9       6.4
      Systolic BP (mmHg)                  126.6     130.3
      Diastolic BP (mmHg)                 75.5      75.7
      Creatinine (mg/dl)                  0.89      0.98
      Hematocrit (%)                      40.7      37.8
      LDL (mg/dl)                         103.4     103.0
    Active chronic illnesses (comorbidities)
      Total number of active illnesses    1.20      1.90
      Hypertension                        0.29      0.44
      Diabetes, uncomplicated             0.08      0.22
      Arrhythmia                          0.09      0.08
      Hypothyroid                         0.09      0.05
      Obesity                             0.07      0.18
      Pulmonary disease                   0.07      0.11
      Cancer                              0.07      0.06
      Depression                          0.06      0.08
      Anemia                              0.05      0.10
      Arthritis                           0.04      0.04
      Renal failure                       0.03      0.07
      Electrolyte disorder                0.03      0.05
      Heart failure                       0.03      0.05
      Psychosis                           0.03      0.05
      Valvular disease                    0.03      0.02
      Stroke                              0.02      0.03
      Peripheral vascular disease         0.02      0.02
      Diabetes, complicated               0.02      0.07
      Heart attack                        0.01      0.02
      Liver disease                       0.01      0.02
  • Table 2 Performance of predictors trained on alternative labels.

    For each new algorithm, we show the label on which it was trained (rows) and the concentration of a given outcome of interest (columns) at or above the 97th percentile of predicted risk. We also show the fraction of Black patients in each group.

    Algorithm training label    Concentration in highest-risk patients (SE)                   Fraction of Black patients in group with highest risk (SE)
                                Total costs     Avoidable costs   Active chronic conditions
    Total costs                 0.165 (0.003)   0.187 (0.003)     0.105 (0.002)               0.141 (0.003)
    Avoidable costs             0.142 (0.003)   0.215 (0.003)     0.130 (0.003)               0.210 (0.003)
    Active chronic conditions   0.121 (0.003)   0.182 (0.003)     0.148 (0.003)               0.267 (0.003)
    Best-to-worst difference    0.044           0.033             0.043                       0.126
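As a rough illustration of how the quantities in Table 2 fit together, the sketch below does two things. First, it computes a "concentration" at or above the 97th percentile of predicted risk, under the assumption (not spelled out in the caption) that concentration means the share of the population total of an outcome that falls on those patients. Second, it recomputes the best-to-worst difference row from the point estimates printed in the table. The function and variable names are hypothetical.

```python
def concentration_at_top(scores, outcomes, percentile=97):
    """Share of the total outcome held by patients at or above the
    given percentile of predicted risk (assumed definition)."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0])
    cut = int(len(ranked) * percentile / 100)
    total = sum(outcomes)
    top_total = sum(outcome for _, outcome in ranked[cut:])
    return top_total / total if total else 0.0


# Point estimates copied from Table 2, one row per training label.
# Columns: total costs, avoidable costs, active chronic conditions,
# fraction of Black patients in the highest-risk group.
TABLE_2 = {
    "Total costs": [0.165, 0.187, 0.105, 0.141],
    "Avoidable costs": [0.142, 0.215, 0.130, 0.210],
    "Active chronic conditions": [0.121, 0.182, 0.148, 0.267],
}


def best_to_worst(table):
    """Per-column gap between the largest and smallest value across labels."""
    columns = zip(*table.values())
    return [round(max(col) - min(col), 3) for col in columns]
```

Applied to the values above, `best_to_worst(TABLE_2)` gives `[0.044, 0.033, 0.043, 0.126]`, matching the last row of the table.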
  • Table 3 Doctors’ decisions versus algorithmic predictions.

    For those enrolled in the high-risk care management program (1.3% of our sample), we first show the fraction of the population that is Black, as well as the fraction of all costs and chronic conditions accounted for by these observations. We also show these quantities for four alternative program enrollment rules, which we simulate in our dataset (using the holdout set when we use our experimental predictors). We first calculate the program enrollment rate within each percentile bin of predicted risk from the original algorithm and either (i) randomly sample patients or (ii) sample those with the highest predicted number of active chronic conditions within a bin and assign them to the program. The resultant values are then compared with values obtained by simply assigning the aforementioned 1.3% of our sample with (iii) the highest predicted cost or (iv) the highest number of active chronic conditions to the program.

    Population                                  Fraction Black (SE)   Fraction of all costs (SE)   Fraction of all active chronic conditions (SE)
    Observed program enrollment (1.3%)          0.192 (0.003)         0.029 (0.001)                0.033 (0.001)
    Simulated alternative enrollment rules
      Random, in predicted-cost bin             0.183 (0.003)         0.044 (0.002)                0.034 (0.001)
      Predicted health, in predicted-cost bin   0.269 (0.003)         0.044 (0.002)                0.064 (0.002)
      Highest predicted cost                    0.172 (0.003)         0.100 (0.002)                0.047 (0.002)
      Worst predicted health                    0.292 (0.004)         0.067 (0.002)                0.076 (0.002)
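The four simulated enrollment rules in Table 3 can be sketched as follows. This is an illustrative reading of the caption, not the authors' implementation: the dictionary keys (`bin`, `enrolled`, `pred_conditions`, `pred_cost`) and both function names are hypothetical, and holding each bin's enrolled count fixed is an assumption about how "the program enrollment rate within each percentile bin" is preserved.

```python
import random
from collections import defaultdict


def within_bin_enrollment(patients, by_health, seed=0):
    """Rules (i) and (ii): keep each risk bin's observed number of
    enrollees, but choose who is enrolled within the bin.

    by_health=False samples at random (rule i); by_health=True takes
    those with the highest predicted chronic-condition count (rule ii).
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for patient in patients:
        bins[patient["bin"]].append(patient)
    chosen = []
    for members in bins.values():
        k = sum(1 for p in members if p["enrolled"])  # observed count in bin
        if by_health:
            ranked = sorted(
                members, key=lambda p: p["pred_conditions"], reverse=True
            )
            chosen.extend(ranked[:k])
        else:
            chosen.extend(rng.sample(members, k))
    return chosen


def top_fraction_enrollment(patients, key, fraction=0.013):
    """Rules (iii) and (iv): enroll the top 1.3% by a predicted quantity,
    e.g. key="pred_cost" or key="pred_conditions"."""
    k = round(len(patients) * fraction)
    return sorted(patients, key=lambda p: p[key], reverse=True)[:k]
```

The share of Black patients, of costs, and of chronic conditions in each returned group could then be compared against the observed enrollment row.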

Supplementary Materials

  • Dissecting racial bias in an algorithm used to manage the health of populations

    Ziad Obermeyer, Brian Powers, Christine Vogeli, Sendhil Mullainathan

    Materials/Methods, Supplementary Text, Tables, Figures, and/or References

    • Materials and Methods 
    • Figs. S1 to S5
    • Tables S1 to S4 
    • References 
