Research Article

Combining satellite imagery and machine learning to predict poverty


Science 19 Aug 2016:
Vol. 353, Issue 6301, pp. 790-794
DOI: 10.1126/science.aaf7894
  • Fig. 1 Poverty data gaps.

    (A) Number of nationally representative consumption surveys conducted in each African country between 2000 and 2010. (B) Same as (A), for DHS surveys measuring assets. (C to F) Relationship between per capita consumption expenditure (measured in U.S. dollars) and nightlight intensity at the cluster level for four African countries, based on household surveys. The nationally representative share of households at each point in the consumption distribution is shown in gray beneath each panel. Vertical red lines show the official international extreme poverty line ($1.90 per person per day), and black lines are fits to the data, with corresponding 95% confidence intervals in light blue.

  • Fig. 2 Visualization of features.

    By column: four different convolutional filters (which identify, from left to right, features corresponding to urban areas, nonurban areas, water, and roads) in the convolutional neural network model used to extract features. Each filter highlights in pink the parts of the image that activate it. By row: original daytime satellite images from Google Static Maps, filter activation maps, and overlays of the activation maps onto the original images.

  • Fig. 3 Predicted cluster-level consumption from transfer learning approach (y axis) compared to survey-measured consumption (x axis).

    Results are shown for Nigeria (A), Tanzania (B), Uganda (C), and Malawi (D). Predictions and reported r² values in each panel are from five-fold cross-validation. The black line is the best-fit line, and the red line is the international poverty line of $1.90 per person per day. Both axes are on a logarithmic scale. Countries are ordered by population size.

  • Fig. 4 Evaluation of model performance.

    (A) Performance of the transfer learning model relative to nightlights for estimating consumption, using pooled observations across the four LSMS countries. Trials were run separately for increasing percentages of the available clusters (e.g., an x-axis value of 40 indicates that all clusters below the 40th percentile in consumption were included). Vertical red lines indicate various multiples of the international poverty line. Image features were reduced to 100 dimensions using principal component analysis. (B) Same as (A), but for assets. (C) Comparison of the r² of models trained on correctly assigned images in each country (vertical lines) with the distribution of r² values obtained from trials in which the model was trained on randomly shuffled images (1000 trials per country). (D) Same as (C), but for assets. Cross-validated r² values are reported in all panels.

  • Fig. 5 Cross-border model generalization.

    (A) Cross-validated r² values for consumption predictions from models trained in one country and applied in other countries. Countries on the x axis indicate where the model was trained; countries on the y axis indicate where it was evaluated. Reported r² values are averaged over 100 folds (10 trials of 10 folds each). (B) Same as (A), but for assets.
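The Fig. 2 caption describes convolutional filters producing activation maps that are then overlaid on the input image. The following is a minimal NumPy sketch of that mechanism for a single filter on a grayscale image, assuming zero "same" padding, an odd-sized kernel, and a ReLU nonlinearity. The kernel here is a hypothetical hand-made line detector; the filters shown in Fig. 2 are learned by the network, and the functions `activation_map` and `overlay` are illustrative names, not the authors' code.

```python
import numpy as np

def activation_map(image, kernel):
    """Slide one filter over a grayscale image ('same' zero padding, odd-sized kernel),
    computing the cross-correlation a CNN convolution layer performs, then apply ReLU."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))  # zero padding
    out = np.zeros(image.shape)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)  # keep only positive responses, i.e. where the filter "fires"

def overlay(image, activation, alpha=0.5):
    """Blend the peak-normalised activation map onto the image for display."""
    peak = activation.max()
    norm = activation / peak if peak > 0 else activation
    return (1 - alpha) * image + alpha * norm

# Hypothetical example: a thin bright vertical strip (road-like) and a hand-made
# line-detecting kernel; real CNN filters are learned, not hand-designed.
image = np.zeros((8, 8))
image[:, 4] = 1.0
kernel = np.array([[-1.0, 2.0, -1.0]] * 3)
act = activation_map(image, kernel)
shown = overlay(image, act)
```

Only the column containing the strip produces a positive response, which is the sense in which a filter "highlights" the parts of an image that activate it.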
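The Fig. 3 caption reports predictions and r² values from five-fold cross-validation. The sketch below shows one way to compute such a cross-validated r² from image features to log consumption. It assumes a ridge regression model and defines r² as the coefficient of determination of the pooled out-of-fold predictions; the caption does not specify either choice, and the data are synthetic stand-ins rather than survey data.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression; the intercept is left unpenalized by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def cv_r2(X, y, k=5, lam=1.0, seed=0):
    """k-fold cross-validated r² (coefficient of determination) of out-of-fold predictions."""
    idx = np.random.default_rng(seed).permutation(len(y))
    pred = np.empty(len(y))
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)  # every cluster not in this fold
        w, b = ridge_fit(X[train_idx], y[train_idx], lam)
        pred[test_idx] = X[test_idx] @ w + b  # predict only the held-out fold
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, pred

# Hypothetical usage on synthetic data (not the paper's data):
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 10))  # stand-in for CNN image features per cluster
log_consumption = features @ rng.normal(size=10) + 0.1 * rng.normal(size=200)
r2, predicted = cv_r2(features, log_consumption)
```

Because every prediction comes from a model that never saw that cluster, the resulting r² measures out-of-sample accuracy rather than in-sample fit.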
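Fig. 4 combines three procedures: reducing image features with principal component analysis, re-running the evaluation on clusters below a consumption percentile (panels A and B), and a placebo test in which images are randomly shuffled across clusters (panels C and D). The following sketch illustrates all three on synthetic data. It assumes ridge regression and five-fold cross-validation; for speed it uses 10 principal components and 50 shuffles instead of the caption's 100 dimensions and 1000 trials. All function names are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project mean-centred features onto their top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def cv_r2(X, y, k=5, lam=1.0, seed=0):
    """k-fold cross-validated r² of a ridge regression (intercept via centering)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    pred = np.empty(len(y))
    for test_idx in np.array_split(idx, k):
        tr = np.setdiff1d(idx, test_idx)
        xm, ym = X[tr].mean(axis=0), y[tr].mean()
        Xc = X[tr] - xm
        w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y[tr] - ym))
        pred[test_idx] = (X[test_idx] - xm) @ w + ym
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def r2_below_percentile(X, y, pct, **kw):
    """Panels A/B: keep only clusters at or below the pct-th consumption percentile."""
    keep = y <= np.percentile(y, pct)
    return cv_r2(X[keep], y[keep], **kw)

def shuffled_null(X, y, n_trials=1000, seed=0, **kw):
    """Panels C/D: r² when images (feature rows) are randomly reassigned to clusters."""
    rng = np.random.default_rng(seed)
    return np.array([cv_r2(X[rng.permutation(len(y))], y, **kw) for _ in range(n_trials)])

# Hypothetical synthetic data with low-rank structure standing in for image features.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 5))
X = latent @ rng.normal(size=(5, 50)) + 0.1 * rng.normal(size=(300, 50))
y = latent @ rng.normal(size=5) + 0.3 * rng.normal(size=300)  # log consumption

X_red = pca_reduce(X, 10)                        # the caption uses 100 dimensions
real_r2 = cv_r2(X_red, y)
null_r2 = shuffled_null(X_red, y, n_trials=50)   # the caption uses 1000 trials
low_r2 = r2_below_percentile(X_red, y, 40)
```

If the features carry real information about each location, the correctly assigned r² should sit well to the right of the shuffled distribution, which is what the vertical lines in panels C and D are compared against.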
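Fig. 5 reports, for each pair of countries, the r² of a model trained in one country and evaluated in another, averaged over 10 trials of 10 folds. The sketch below builds such a matrix with rows as evaluation countries (y axis) and columns as training countries (x axis). The exact fold scheme is an assumption: in each fold the model is trained on nine-tenths of the training country and scored on one-tenth of the evaluation country, reusing the same split when the two countries coincide so that held-out clusters stay unseen. Ridge regression, the helper names, and the three synthetic "countries" are all hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Fit ridge regression (intercept via centering) and return a predict function."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - ym))
    return lambda X_new: (X_new - xm) @ w + ym

def r2(y, pred):
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def cross_border_matrix(data, n_trials=10, k=10, seed=0):
    """data maps country -> (X, y). M[i, j] is the mean r² over n_trials * k folds
    for a model trained in names[j] (x axis) and evaluated in names[i] (y axis)."""
    names = list(data)
    rng = np.random.default_rng(seed)
    M = np.zeros((len(names), len(names)))
    for j, train_c in enumerate(names):
        X_tr, y_tr = data[train_c]
        for i, eval_c in enumerate(names):
            X_ev, y_ev = data[eval_c]
            scores = []
            for _ in range(n_trials):
                tr_folds = np.array_split(rng.permutation(len(y_tr)), k)
                # Same country: reuse the training split so held-out clusters stay unseen.
                if eval_c == train_c:
                    ev_folds = tr_folds
                else:
                    ev_folds = np.array_split(rng.permutation(len(y_ev)), k)
                for f in range(k):
                    keep = np.setdiff1d(np.arange(len(y_tr)), tr_folds[f])
                    predict = fit_ridge(X_tr[keep], y_tr[keep])
                    held = ev_folds[f]
                    scores.append(r2(y_ev[held], predict(X_ev[held])))
            M[i, j] = np.mean(scores)
    return names, M

# Hypothetical countries: A and B share a feature-consumption relationship, C does not.
rng = np.random.default_rng(0)
beta = np.array([1.0, -0.5, 0.8, 0.3, -1.2])

def make_country(coef, n=200):
    X = rng.normal(size=(n, 5))
    return X, X @ coef + 0.3 * rng.normal(size=n)

data = {"A": make_country(beta), "B": make_country(beta), "C": make_country(-beta)}
names, M = cross_border_matrix(data)
```

In this toy setup a model transfers well between countries whose features relate to consumption in the same way (A and B) and fails badly, with negative r², where that relationship differs (C).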

Supplementary Materials

  • Materials/Methods, Supplementary Text, Tables, Figures, and/or References

    • Materials and Methods
    • Figs. S1 to S8
    • Tables S1 to S3
    • Full reference list
