Preventing undesirable behavior of intelligent machines


Science, 22 Nov 2019, Vol. 366, Issue 6468, pp. 999-1004
DOI: 10.1126/science.aag3311
  • Fig. 1 Overview of Seldonian regression algorithms.

    The algorithm takes the behavioral constraints $(g_i, \delta_i)_{i=1}^{n}$ and training data $D$ as input and outputs either a solution $\theta_c$ or NSF (no solution found). First, the data are partitioned into two sets, $D_1$ and $D_2$. Next, a routine called Candidate Selection uses $D_1$ to select a single solution, the candidate solution $\theta_c$. It predicts that this solution will perform well under the primary objective $f$ and, because it knows the specific form of the subsequent safety test, that the solution is also likely to pass that test. The Safety Test mechanism checks whether the algorithm has sufficient confidence that $g_i(\theta_c) \le 0$ for each constraint $i \in \{1, \ldots, n\}$. If so, it returns the candidate solution $\theta_c$; otherwise, it returns NSF. The Safety Test routine uses standard statistical tools such as Student’s t test and Hoeffding’s inequality to transform sample statistics computed from $D_2$ into bounds on the probability that $g_i(a(D)) > 0$ (i.e., bounds on the probability of undesirable behavior).
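The Fig. 1 pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `seldonian_train`, `hoeffding_upper_bound`, and `Constraint` are hypothetical. Each constraint is assumed to come with a function that returns an unbiased per-data-point estimate of $g_i(\theta)$ lying in a known range $[a, b]$. The safety test uses only Hoeffding's inequality; Student's t test would usually give tighter but approximate bounds. Candidate selection is left as a caller-supplied function, whereas the caption describes a routine that also anticipates the safety test.

```python
import math
import random
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical constraint spec: (g_hat, delta_i, (a, b)), where g_hat(theta, z) returns an
# unbiased estimate of g_i(theta) from one data point z and always lies in [a, b].
Constraint = Tuple[Callable[[Any, Any], float], float, Tuple[float, float]]


def hoeffding_upper_bound(samples: Sequence[float], delta: float, a: float, b: float) -> float:
    """(1 - delta)-confidence upper bound on E[X] from i.i.d. samples with values in [a, b]."""
    n = len(samples)
    mean = sum(samples) / n
    return mean + (b - a) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def seldonian_train(
    data: Sequence[Any],
    constraints: List[Constraint],
    candidate_selection: Callable[[Sequence[Any]], Any],
    split: float = 0.5,
    seed: int = 0,
) -> Optional[Any]:
    """Return the candidate solution theta_c, or None to signal NSF (no solution found)."""
    # Partition the data into D1 (candidate selection) and D2 (safety test).
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    k = int(len(shuffled) * split)
    d1, d2 = shuffled[:k], shuffled[k:]

    # Candidate Selection sees only D1.
    theta_c = candidate_selection(d1)

    # Safety Test sees only D2: each g_i(theta_c) <= 0 must hold with confidence 1 - delta_i.
    for g_hat, delta, (a, b) in constraints:
        samples = [g_hat(theta_c, z) for z in d2]
        if hoeffding_upper_bound(samples, delta, a, b) > 0.0:
            return None  # NSF
    return theta_c


if __name__ == "__main__":
    # Toy problem: theta is a mean estimate from D1 with values in [0, 1]; the hypothetical
    # constraint g(theta) = theta - E[z] - 0.2 <= 0 forbids overestimating E[z] by more than 0.2.
    data = [0.1] * 1000
    constraints = [(lambda theta, z: theta - z - 0.2, 0.05, (-1.2, 0.8))]
    print(seldonian_train(data, constraints, lambda d1: sum(d1) / len(d1)))
    print(seldonian_train(data, constraints, lambda d1: 0.9))  # None (NSF)
```

Keeping $D_1$ and $D_2$ disjoint is what makes the safety test's bound valid: the candidate was chosen without looking at the data used to test it.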

  • Fig. 2 Seldonian regression algorithm applied to GPA prediction.

    We used five different regression algorithms to predict students’ GPAs during their first three semesters at university based on their scores on nine entrance exams. We used real data from 43,303 students in Brazil. Here, the user-selected definition of undesirable behavior corresponds to large differences in mean prediction errors (mean predicted GPA minus mean observed GPA) for applicants of different genders. This plot shows the mean prediction errors (±SD) for male and female students when using each regression algorithm. We used three standard ML algorithms: least squares linear regression (LR) (40), an artificial neural network (ANN) (41), and a random forest (RF) (42). We also used two variants of our Seldonian algorithm: QNDLR and QNDLR(λ). All three standard ML methods tend to noticeably overpredict the performance of male students and underpredict the performance of female students, whereas the two variants of our Seldonian regression algorithm do not. In particular, our algorithms ensure that, with approximately 95% probability, the expected prediction errors for men and women will be within ε = 0.05 of each other, and both effectively preclude the sexist behavior exhibited by the standard ML algorithms.
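The Fig. 2 constraint can be written as $g(\theta) = |\mathbf{E}[\text{err} \mid \text{male}] - \mathbf{E}[\text{err} \mid \text{female}]| - \epsilon$, where err is predicted minus observed GPA. The following sketch shows one way a safety test could bound it. It is not the QNDLR procedure. It assumes errors lie in $[-4, 4]$, which holds if GPAs and predictions lie in $[0, 4]$. It builds a two-sided Hoeffding interval for each group's mean error, splitting $\delta$ between the two groups via a union bound. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def hoeffding_interval(xs: Sequence[float], delta: float, lo: float, hi: float) -> Tuple[float, float]:
    """Two-sided (1 - delta)-confidence interval for E[X], X i.i.d. with values in [lo, hi]."""
    n = len(xs)
    mean = sum(xs) / n
    half_width = (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return mean - half_width, mean + half_width


def gap_upper_bound(err_male: Sequence[float], err_female: Sequence[float],
                    delta: float, lo: float = -4.0, hi: float = 4.0) -> float:
    """(1 - delta)-confidence upper bound on |E[err | male] - E[err | female]|.

    Each group's interval gets delta / 2, so both hold simultaneously with
    probability at least 1 - delta (union bound).
    """
    l_m, u_m = hoeffding_interval(err_male, delta / 2.0, lo, hi)
    l_f, u_f = hoeffding_interval(err_female, delta / 2.0, lo, hi)
    return max(u_m - l_f, u_f - l_m)


def passes_gender_gap_test(err_male: Sequence[float], err_female: Sequence[float],
                           epsilon: float = 0.05, delta: float = 0.05,
                           lo: float = -4.0, hi: float = 4.0) -> bool:
    """Safety test for g(theta) = |E[err | male] - E[err | female]| - epsilon <= 0."""
    return gap_upper_bound(err_male, err_female, delta, lo, hi) - epsilon <= 0.0
```

With an error range of width 8, this Hoeffding-based test certifies ε = 0.05 at δ = 0.05 only with more than 200,000 errors per group, even when the observed gap is exactly zero. That far exceeds the 43,303 students in the study, which illustrates why tighter tools such as Student's t test matter in practice.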

  • Fig. 3 Seldonian classification algorithm applied to GPA prediction.

    We applied classification algorithms to predict whether student GPAs will be above 3.0. Shaded regions represent SE over 250 trials. The curves labeled “Standard” correspond to common classification algorithms designed using the standard ML approach; the multiple curves for Fairlearn and Fairness Constraints correspond to different hyperparameter settings for each algorithm (14). Each row corresponds to a different fairness definition: (A) disparate impact, (B) demographic parity, (C) equal opportunity, (D) equalized odds, (E) predictive equality. The horizontal axes of all plots show the amount of training data on a logarithmic scale. The left column shows the accuracy of the trained classifiers, the center column shows the probability that each algorithm returned a solution (non-Seldonian algorithms always returned solutions), and the right column shows the probability that each classifier violated a behavioral constraint. In the failure-rate plots, the horizontal dashed line marks 100δ%, where δ = 0.05 (i.e., 5%). In all cases, the Seldonian and quasi-Seldonian algorithms returned solutions using a reasonable amount of data (center), did so without significant losses to accuracy (left), and were the only algorithms to reliably enforce all five fairness definitions (right).
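The five fairness definitions in Fig. 3 are usually expressed through group-conditional rates of a binary classifier. The sketch below computes sample versions of common formulations for two groups. These formulations are assumptions drawn from the general fairness literature, and the exact forms and thresholds used in the study may differ. The function names are hypothetical. The sketch also assumes that each group contains at least one actual positive, at least one actual negative, and at least one positive prediction. These are plug-in point estimates, not the high-confidence bounds a Seldonian safety test would use.

```python
from typing import Any, Dict, Sequence, Tuple


def group_rates(y_true: Sequence[int], y_pred: Sequence[int],
                group: Sequence[Any], g: Any) -> Tuple[float, float, float]:
    """Return (P(yhat = 1), false-negative rate, false-positive rate) within group g."""
    idx = [i for i, s in enumerate(group) if s == g]
    pos_rate = sum(y_pred[i] for i in idx) / len(idx)
    actual_pos = [i for i in idx if y_true[i] == 1]
    actual_neg = [i for i in idx if y_true[i] == 0]
    fnr = sum(1 - y_pred[i] for i in actual_pos) / len(actual_pos)
    fpr = sum(y_pred[i] for i in actual_neg) / len(actual_neg)
    return pos_rate, fnr, fpr


def fairness_statistics(y_true: Sequence[int], y_pred: Sequence[int],
                        group: Sequence[Any], a: Any, b: Any) -> Dict[str, float]:
    """Sample versions of five common fairness statistics comparing groups a and b."""
    p_a, fnr_a, fpr_a = group_rates(y_true, y_pred, group, a)
    p_b, fnr_b, fpr_b = group_rates(y_true, y_pred, group, b)
    return {
        # Ratio of positive-prediction rates; 1.0 means identical rates (often required >= 0.8).
        "disparate_impact": min(p_a / p_b, p_b / p_a),
        # The remaining entries are absolute gaps; 0.0 means no disparity.
        "demographic_parity": abs(p_a - p_b),
        "equal_opportunity": abs(fnr_a - fnr_b),
        "equalized_odds": abs(fnr_a - fnr_b) + abs(fpr_a - fpr_b),
        "predictive_equality": abs(fpr_a - fpr_b),
    }
```

A behavioral constraint would then take a form such as $g(\theta) = \text{gap} - \epsilon$ for the chosen statistic, with the user picking $\epsilon$ and $\delta$.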

  • Fig. 4 Seldonian reinforcement learning algorithm for proof-of-principle bolus calculation in type 1 diabetes.

    Results are averaged over 200 trials; shaded regions denote SE. The Seldonian algorithm is compared to an algorithm built using the standard ML approach that penalizes the prevalence of low blood sugar. (A) Probability that each method returns policies (solutions) that increase the prevalence of low blood sugar. The algorithm designed using the standard ML approach often proposed policies that increased the prevalence of low blood sugar, violating the safety constraint, even though it used an objective function (reward function) that penalized instances of hypoglycemia. In contrast, across all trials, our Seldonian algorithm was safe; it never changed the treatment policy in a way that increased the prevalence of low blood sugar. (B) Probability that each method returns a policy that differs from the initial policy. Our Seldonian algorithm was able to safely improve upon the initial policy with just 1 to 5 months of data. (C) Box plot (with outliers plotted) of the distribution of the expected returns (objective function values) of the treatment policies returned by the standard ML algorithm. The blue line depicts the sample mean; red lines within the boxes mark the medians. All points below –0.1116 [where the blue curve in (D) begins] correspond to cases where the standard ML algorithm both decreased performance and produced undesirable behavior (an increase in the prevalence of low blood sugar). (D) Similar to (C), but showing results for the Seldonian algorithm. The magenta line shows the average performance over the trials in which the algorithm produced a policy that differed from the initial policy. Notice that all points have values of at least –0.1116, indicating that our algorithm never produced undesirable behavior. Where boxes appear to be missing, they have zero width and are hidden behind the red line marking the median.
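The caption does not state how the low-blood-sugar constraint was tested, so the following is only a deliberately simplified, hypothetical sketch of such a safety test. It makes four assumptions. First, each episode involves a single stochastic choice of treatment parameters. Second, episodes were collected with the initial policy. Third, every importance weight lies in $[0, w_{\max}]$ with $w_{\max} \ge 1$. Fourth, ordinary importance sampling combined with Hoeffding's inequality bounds the change in the expected prevalence of low blood sugar, $C$. The names `low_bg_safety_test`, `Episode`, and `w_max` are illustrative.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical episode record: (action taken, probability the initial policy assigned to it,
# prevalence of low blood sugar observed during the episode, in [0, 1]).
Episode = Tuple[int, float, float]


def low_bg_safety_test(episodes: Sequence[Episode], pi_new: Callable[[int], float],
                       w_max: float, delta: float = 0.05) -> bool:
    """Return True if, with confidence 1 - delta, pi_new does not increase the expected
    prevalence of low blood sugar relative to the initial policy.

    Uses ordinary importance sampling (one action per episode) and Hoeffding's inequality.
    """
    samples = []
    for action, p_init, prevalence in episodes:
        w = pi_new(action) / p_init  # importance weight
        if not 0.0 <= w <= w_max:
            raise ValueError("importance weight outside [0, w_max]")
        # w * c is unbiased for E_new[C] and c is unbiased for E_init[C],
        # so w * c - c is unbiased for the increase E_new[C] - E_init[C].
        samples.append(w * prevalence - prevalence)
    n = len(samples)
    mean = sum(samples) / n
    # Each sample c * (w - 1) lies in [-1, w_max - 1], an interval of width w_max.
    upper = mean + w_max * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return upper <= 0.0
```

If this test fails, the algorithm returns NSF. In this setting, NSF would presumably mean keeping the initial policy, which is why the Seldonian curve in panel (B) starts at zero and rises only as data accumulate.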

Supplementary Materials

  • Preventing undesirable behavior of intelligent machines

    Philip S. Thomas, Bruno Castro da Silva, Andrew G. Barto, Stephen Giguere, Yuriy Brun, Emma Brunskill

    Materials/Methods, Supplementary Text, Tables, Figures, and/or References

    • Supplementary Text
    • Figs. S1 to S39
    • References 
