Eric Busboom.
The variables used in the following charts are:
score_all
: Test score averaged across all students in the district.
score_ses
: Test score of high socioeconomic status (SES) students minus the test score of low-SES students; the gap between rich and poor.
score_wb
: Test score of white students minus the test score of black students; the white-black gap.
staff_black_rate
: Percentage of teachers who are black.
staff_hisp_rate
: Percentage of teachers who are Hispanic.
staff_whasian_rate
: Percentage of teachers who are white or Asian.
staff_male_rate
: Percentage of teachers who are male.
staff_teacher_rate
: Ratio of teachers to total staff.
This analysis is limited to districts where the annual cost per student is less than $20,000, which excludes very small districts that have unusual circumstances and should be analyzed independently.
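As a minimal sketch of this filter, the snippet below keeps only districts under the $20,000 threshold. It assumes that the cost_per_ada column seen in the tables further down holds the annual cost per student; the `districts` frame and its values are hypothetical and stand in for the real dataset.

```python
import pandas as pd

# Hypothetical district-level data; the values are made up for illustration.
# Assumption: cost_per_ada is the annual cost per student, in dollars.
districts = pd.DataFrame(
    {
        "district": ["A", "B", "C", "D"],
        "cost_per_ada": [9500.0, 12800.0, 31000.0, 19999.0],
        "score_all": [0.12, -0.30, 0.45, 0.05],
    }
)

COST_LIMIT = 20_000  # threshold stated in the text

# Keep districts strictly below the threshold; set the rest aside.
analysis = districts[districts["cost_per_ada"] < COST_LIMIT].reset_index(drop=True)
excluded = districts[districts["cost_per_ada"] >= COST_LIMIT]

print(f"kept {len(analysis)} districts, excluded {len(excluded)}")
```

Keeping the excluded rows in their own frame makes it easy to analyze the high-cost districts independently later.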
Correlation Matrix
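A correlation matrix of this kind can be computed with pandas. The sketch below is illustrative only: it uses synthetic data in place of the district dataset, borrows a few column names from the variable list, and assumes Pearson correlation, which the notebook does not state.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the district table; values are random, not real data.
rng = np.random.default_rng(0)
n = 500
staff_black_rate = rng.uniform(0.0, 0.6, n)
df = pd.DataFrame(
    {
        "staff_black_rate": staff_black_rate,
        # Constructed to move against staff_black_rate, purely for illustration.
        "staff_whasian_rate": 1.0 - staff_black_rate - rng.uniform(0.0, 0.3, n),
        "score_all": rng.normal(0.0, 1.0, n),
    }
)

# Pairwise Pearson correlation between every pair of columns (assumed method).
corr = df.corr(method="pearson")
print(corr.round(3))
```

The resulting matrix can then be drawn as a heatmap, for example with seaborn's `heatmap(corr, annot=True)`.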

Detailed Correlations
These tables show the correlations between the variables in the a and b columns.
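One way to produce tables in this a / b / corr layout is to reshape a correlation matrix into long form. The sketch below is an assumption about the approach, not the notebook's actual code. It uses random data under a few of the column names from the tables, and it sorts by correlation before renumbering the rows, which would explain why the row labels in the tables are not consecutive.

```python
import numpy as np
import pandas as pd

# Random stand-in data; only the column names come from the tables.
rng = np.random.default_rng(1)
df = pd.DataFrame(
    rng.normal(size=(100, 3)),
    columns=["cost_per_ada", "score_all", "score_black"],
)

corr = df.corr()

# Reshape the square matrix into one row per (a, b) pair.
pairs = (
    corr.rename_axis(index="a")
    .reset_index()
    .melt(id_vars="a", var_name="b", value_name="corr")
    .sort_values("corr", ascending=False)
    .reset_index(drop=True)
)

# Select every pair for one variable; the row labels keep their global rank.
cost_pairs = pairs[pairs["a"] == "cost_per_ada"]
print(cost_pairs)
```

Because the matrix is symmetric, each pair appears twice (once as a, b and once as b, a), and each variable's correlation with itself appears as 1.0.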
A higher share of Asian and white teachers is correlated with higher test scores in every student category. It also has a modest association with a higher college rate for black students (0.22), but much less so for students overall (0.04).
| | a | b | corr |
|---|---|---|---|
19 | staff_whasian_rate | score_all | 0.370149 |
20 | staff_whasian_rate | score_black | 0.369934 |
26 | staff_whasian_rate | score_ses | 0.302490 |
28 | staff_whasian_rate | cgr_black | 0.221133 |
30 | staff_whasian_rate | score_white | 0.183244 |
48 | staff_whasian_rate | cgr_all | 0.044013 |
49 | staff_whasian_rate | score_wb | 0.022123 |
59 | staff_whasian_rate | cost_per_ada | -0.024438 |
77 | staff_whasian_rate | enr_black_rate | -0.180846 |
Higher per-student spending is associated with lower scores for black students (-0.21) and a wider white-black gap (0.18); its correlation with scores for white students is close to zero (0.02).
| | a | b | corr |
|---|---|---|---|
0 | cost_per_ada | cost_per_ada | 1.000000 |
31 | cost_per_ada | score_wb | 0.182259 |
50 | cost_per_ada | score_white | 0.018462 |
55 | cost_per_ada | score_ses | -0.018445 |
57 | cost_per_ada | score_all | -0.021149 |
70 | cost_per_ada | cgr_all | -0.157765 |
74 | cost_per_ada | cgr_black | -0.170933 |
80 | cost_per_ada | score_black | -0.207649 |
A higher share of black teachers is associated with lower scores for black students (-0.44) and a lower college rate for black students (-0.25).
| | a | b | corr |
|---|---|---|---|
5 | staff_black_rate | enr_black_rate | 0.750346 |
34 | staff_black_rate | score_wb | 0.171786 |
46 | staff_black_rate | cost_per_ada | 0.049291 |
66 | staff_black_rate | cgr_all | -0.121854 |
71 | staff_black_rate | score_white | -0.162522 |
76 | staff_black_rate | score_ses | -0.176879 |
83 | staff_black_rate | cgr_black | -0.251551 |
85 | staff_black_rate | score_all | -0.263363 |
88 | staff_black_rate | staff_whasian_rate | -0.339720 |
89 | staff_black_rate | score_black | -0.441095 |
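The share of male teachers is only weakly correlated with test scores: the correlation with scores for black students is close to zero (-0.04), and the correlation with scores across all students is slightly negative (-0.12).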
| | a | b | corr |
|---|---|---|---|
37 | staff_male_rate | staff_black_rate | 0.131872 |
38 | staff_male_rate | cost_per_ada | 0.112925 |
40 | staff_male_rate | score_wb | 0.091101 |
42 | staff_male_rate | cgr_all | 0.080405 |
45 | staff_male_rate | enr_black_rate | 0.068373 |
61 | staff_male_rate | score_black | -0.037697 |
62 | staff_male_rate | score_white | -0.041318 |
63 | staff_male_rate | cgr_black | -0.044097 |
64 | staff_male_rate | score_ses | -0.075816 |
65 | staff_male_rate | score_all | -0.115924 |
67 | staff_male_rate | staff_whasian_rate | -0.145117 |
72 | staff_male_rate | enr_whasian_rate | -0.167281 |
Detailed Regression Plots
Detailed scatter plots with a regression line for pairs of variables. The two variables are shown in the title of each plot, where score_var is the name of the variable on the y axis and rate_var is the variable on the x axis.
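The regression line in each plot can be understood as an ordinary least-squares fit of score_var on rate_var. The sketch below assumes a simple straight-line fit and uses synthetic data with a built-in negative slope; the notebook's own plotting code is not shown in the source.

```python
import numpy as np

# Synthetic data: score_var falls as rate_var rises, plus a little noise.
rng = np.random.default_rng(7)
n = 200
rate_var = rng.uniform(0.0, 1.0, n)                           # x axis
score_var = 0.2 - 0.5 * rate_var + rng.normal(0.0, 0.05, n)   # y axis

# Least-squares straight line: score_var ~ intercept + slope * rate_var.
slope, intercept = np.polyfit(rate_var, score_var, deg=1)

# Pearson correlation between the same pair of variables.
r = np.corrcoef(rate_var, score_var)[0, 1]

# Points on the fitted line, one per observation.
fitted = intercept + slope * rate_var

print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

For a single predictor, the slope equals the correlation scaled by the ratio of the standard deviations, so the sign of the line always matches the sign of the correlation in the tables. A plotting library such as seaborn can draw the scatter and the line together with `regplot(data=df, x="rate_var", y="score_var")`.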

Imputation Test
There is a lot of missing data, so it may be worthwhile to impute missing records. This grid of KDE plots shows the original data series in orange and the imputed (KNN, n=2) data series in blue. Variables whose blue and orange curves align exactly do not require imputation; where the curves diverge greatly (such as cgr_black), the imputation is probably adversely affecting the statistics.
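A comparison of this kind can be sketched with scikit-learn's `KNNImputer`. The example assumes that "n=2" refers to the number of neighbors, and it uses synthetic data in which part of a cgr_black column is deliberately removed; it is not the notebook's own code.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Synthetic data: cgr_black depends loosely on score_all.
rng = np.random.default_rng(42)
n = 300
score_all = rng.normal(0.0, 1.0, n)
cgr_black = 0.5 + 0.1 * score_all + rng.normal(0.0, 0.05, n)
df = pd.DataFrame({"score_all": score_all, "cgr_black": cgr_black})

# Remove roughly 40% of cgr_black at random to simulate missing records.
missing = rng.random(n) < 0.4
df.loc[missing, "cgr_black"] = np.nan

# KNN imputation with two neighbors (assumed meaning of "n=2").
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns, index=df.index)

# Compare summary statistics before and after imputation.
summary = pd.DataFrame(
    {
        "original_mean": df.mean(),
        "imputed_mean": imputed.mean(),
        "original_std": df.std(),
        "imputed_std": imputed.std(),
    }
)
print(summary)
```

Observed values pass through unchanged, so any shift in the mean or standard deviation comes entirely from the filled-in records. Overlaying `df[col].plot.kde()` and `imputed[col].plot.kde()` for each column gives the orange and blue curves described above.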
