(Continuing from last week’s post …)
We all jump to conclusions quickly. Journalists are the most famous for it — three of anything within a week is a trend. But, in fact, we all do it, journalist or not, in an attempt to understand the world’s complexity.
So it’s normal to look at a few census tracts and think, Gee, that’s an area where I know a lot of new Chinese immigrants live (or that’s what the neighbours say). And it’s an expensive area. And it looks like a lot of people are claiming they have low incomes there. So, it must be that …
The flaw is that we don’t look beyond that. We don’t look at the neighbourhood next door, one that also has a lot of new Chinese immigrants, one that’s also in an expensive area, but one where the rate of low-income households is not that high. Or the area that is expensive and that has a high rate of low-income households but where the level of new Chinese immigrants is low.
Academic researchers run what are called regression analyses to test whether there is really a connection between different variables.
It’s a standard feature of almost any study that attempts to prove anything. Do people who do strength training have better results in retaining memory functions than people who do yoga? You need to make sure that the strength-trainers and the yoga-ites are comparable on every other scale: a mix of ages and occupations, a similar range of diets, comparable levels of crossword-puzzle and other brain-boosting activities, and so on. Otherwise you run the risk of concluding that strength training is better, when actually the kind of people who like to do strength training also do a number of other related things that tend to boost memory.
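To make the strength-training example concrete, here is a minimal sketch with entirely made-up participants. It assumes, purely for illustration, that the memory score depends only on whether someone does crossword puzzles, and that strength-trainers happen to do more of them. The names `participants` and `mean_score` are hypothetical.

```python
from statistics import mean

# Hypothetical participants: (exercise, does_crosswords, memory_score).
# In this made-up data the score depends only on the crossword habit.
participants = [
    ("strength", True, 80), ("strength", True, 80), ("strength", True, 80),
    ("strength", False, 60),
    ("yoga", True, 80),
    ("yoga", False, 60), ("yoga", False, 60), ("yoga", False, 60),
]

def mean_score(exercise, crosswords=None):
    """Average score for one exercise group, optionally within one crossword group."""
    return mean(
        score for ex, cw, score in participants
        if ex == exercise and (crosswords is None or cw == crosswords)
    )

# Naive comparison: ignores the crossword habit entirely.
naive_gap = mean_score("strength") - mean_score("yoga")

# Like-for-like comparisons: same crossword habit on both sides.
gap_among_puzzlers = mean_score("strength", True) - mean_score("yoga", True)
gap_among_others = mean_score("strength", False) - mean_score("yoga", False)

print("naive gap:", naive_gap)
print("gap among crossword-doers:", gap_among_puzzlers)
print("gap among everyone else:", gap_among_others)
```

The naive gap favours strength training, but it disappears once people with the same crossword habit are compared with each other. A regression does roughly this kind of like-for-like comparison for many variables at once.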
My last post got kind of long, so I skipped talking about this.
But I’m adding this to make the point to everyone trying to understand Vancouver by looking at a few census tracts at a time. It can’t be done, or at least not to the level that any serious researcher would think was credible. You have to look at the region overall and ask, for example, whether the number of new Chinese immigrants in a tract correlates at a significantly high rate with the level of low-income households in that tract. (It’s also good if you can run these kinds of correlations over time as well, e.g. as more and more new Chinese immigrants move into an area, does the rate of this or that other factor increase as well in something approaching lockstep?)
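As a rough illustration of why a handful of tracts can mislead, the sketch below computes a plain Pearson correlation twice: once on three hand-picked tracts and once on a full set. All of the numbers are invented for the example, and `pearson_r` is a hypothetical helper, not anything from the analysis further down.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical tracts: number of recent Chinese immigrants and
# percentage of low-income households. The first three are the ones
# "everyone talks about".
recent_chinese_immigrants = [10, 150, 300, 20, 280, 160, 40, 240]
pct_low_income = [6.0, 8.0, 10.0, 10.0, 6.0, 8.0, 9.0, 7.0]

r_picked = pearson_r(recent_chinese_immigrants[:3], pct_low_income[:3])
r_all = pearson_r(recent_chinese_immigrants, pct_low_income)

print(f"three hand-picked tracts: r = {r_picked:.2f}")
print(f"all eight tracts:         r = {r_all:.2f}")
```

In this made-up data the three familiar tracts line up almost perfectly, while the full set shows next to no relationship.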
This is true for some of the research showing up related to students or homemakers being listed on land titles as owners. On the face of it, it seems weird that people in those categories are listed as owners in expensive areas. But to make the case absolutely solidly, you’d need to look at what occupations are listed everywhere. Maybe it will show that a suspiciously high number of homemakers and students are buying only in particular high-end areas. But maybe it will show that a lot of homes throughout the region are allegedly owned by homemakers and students and that there’s barely a difference between the west side and anywhere else in town.
Yes, it’s a lot of work to do that. Hardly anyone is doing that kind of work. Well, except for one of my researcher friends, who ran the variables through the program to see whether there were any distinct correlations. There weren’t, except for very minor effects. I’ve provided his full analysis below. I can’t understand more than a quarter of it, but maybe some of you can. I can understand enough to see that, yet again, there’s no smoking gun.
I hope everyone gets that I’m doing this because I believe that ideas should be tested. It’s dangerous to have everyone spouting the same conventional wisdom.
. ************************************
. *
. * CENSUS TRACT DATA SETS: CORRELATES OF SHELTER COSTS > INCOME IN THE 2011 NHS – ANALYSIS FOR VANCOUVER METRO
.
THE VARIABLE NAMES & DESCRIPTIONS
variable name variable label
——————
cmaname CMA name
cmauid CMA code
ctname Census Tract name
ct_gnr CT GNR (%)
TotPriHH Total # of private HHs
AreaKm Land area in square kilometres
juris_id BCA jurisdiction code
HHdensity Density of HH – per sq km
Owners Total HHs – Owners
Renters Total HHs – Renters
HHMedInc Median HH total income
MedVal Median value of dwellings
MedRent Median monthly shelter costs for rented dwellings
Pct0_30 Pct of HH paying 0-30% of income for shelter
Pct30_99 Pct of HH paying 30-99% of income for shelter
Pct100up Pct of HH paying 100% of income or more for shelter
PctOwn30up % of owner HH spending 30% + of HH income on shelter costs
MedOwnPay Median monthly shelter costs for owned dwellings
RecImm5 Total # Recent Immigrants (Last 5 years)
RecImmInd Total # of Recent Immigrants – India
RecImmChi Total # of Recent Immigrants – China
RecImmPhil Total # of Recent Immigrants – Philippines
RecImmIran Total # of Recent Immigrants – Iran
ImmTot Total # Immigrants
TotImmInd Total # of Immigrants – India
TotImmChi Total # of Immigrants – China
TotImmPhil Total # of Immigrants – Philippines
TotImmIran Total # of Immigrants – Iran
TotImmHK Total # of Immigrants – Hong Kong
TotImmViet Total # of Immigrants – Viet Nam
. ***********************************
. *
. * REGRESSIONS ON PCT OF HOUSEHOLDS (RENTERS AND OWNERS) W/ SHELTER PAYMENTS > INCOME
These regressions test for correlates across the roughly 450 census tracts in the Vancouver CMA. The dependent variable is the estimated percentage of households in a tract with shelter costs greater than income (not sure how 0 income and 0 costs are treated). As with all regressions, each coefficient assumes the other variables are not changing: it is the effect of a higher X keeping all other variables unchanged (which is hard in reality, if you think of increasing the number of recent immigrants while keeping the total number of immigrants unchanged).
This percentage is higher in tracts with more density, higher median house values, higher median rents, and lower median income. It falls with the total number of immigrants, but rises with the number of new (past 5 years) immigrants. It is higher for Chinese recent immigrants and lower for Indian and Filipino recent immigrants.
But these effects are pretty small. The mean across tracts is 7.2% (standard deviation of 3.78 percentage points) of households (HH) reporting shelter costs > income.
The mean # of Chinese recent immigrants per tract is 80, with a standard deviation of 135. Increasing the number by 135 (a 168% increase) would raise the share of HH in a census tract reporting shelter costs > income from 7.22% to 8.30% (an increase of 18%). Or, put another way, to raise the share of households reporting shelter costs > income by one percentage point, you would need to add 106 more Chinese recent immigrants to the tract. Half of this effect is common to all recent immigrants; the marginal effect if they are Chinese is thus just half of this.
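The arithmetic above can be retraced from the regression output. The sketch below assumes that the effect of one additional Chinese recent immigrant is the sum of the RecImm5 and RecImmChi coefficients in the first Pct100up table, and it takes the mean and standard deviation from the summary statistics. Its results need not match the rounded figures quoted in the text exactly, which may come from a slightly different specification.

```python
# Coefficients copied from the first Pct100up regression table.
coef_rec_imm5 = 0.0051461      # every recent immigrant, any origin
coef_rec_imm_chi = 0.0047709   # extra effect if the recent immigrant is Chinese

# Summary statistics copied from the table of means.
mean_pct100up = 7.22           # % of households with shelter costs > income
sd_rec_imm_chi = 135.2         # std dev of Chinese recent immigrants per tract

# Assumption: the two coefficients add up for a Chinese recent immigrant.
per_immigrant = coef_rec_imm5 + coef_rec_imm_chi

# Change in the tract percentage for a one-standard-deviation increase.
rise = per_immigrant * sd_rec_imm_chi
new_share = mean_pct100up + rise

# Immigrants needed to move the tract percentage by one percentage point.
immigrants_per_point = 1 / per_immigrant

# Portion of the per-immigrant effect that is specific to Chinese origin.
chinese_share = coef_rec_imm_chi / per_immigrant

print(f"rise: {rise:.2f} points, from {mean_pct100up:.2f}% to {new_share:.2f}%")
print(f"immigrants per percentage point: {immigrants_per_point:.0f}")
print(f"share of effect specific to Chinese origin: {chinese_share:.0%}")
```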
. areg Pct100up HHdensity MedVal MedRent HHMedInc ImmTot RecImm5 RecImmInd RecImmChi RecImmPhil RecImmIr
> an if cmauid==933, absorb(juris_id)
Linear regression, absorbing indicators Number of obs = 451
F( 10, 421) = 33.94
Prob > F = 0.0000
R-squared = 0.6048
Adj R-squared = 0.5776
Root MSE = 2.4276
———————————-
Pct100up | Coef. Std. Err. t P>|t| [95% Conf. Interval]
————-+——————–
HHdensity | .0003058 .000062 4.94 0.000 .000184 .0004276
MedVal | 3.14e-06 6.72e-07 4.67 0.000 1.82e-06 4.46e-06
MedRent | .0030059 .0005704 5.27 0.000 .0018846 .0041271
HHMedInc | -.0000774 .00001 -7.73 0.000 -.0000971 -.0000577
ImmTot | -.0004567 .0002237 -2.04 0.042 -.0008963 -.000017
RecImm5 | .0051461 .0014461 3.56 0.000 .0023036 .0079887
RecImmInd | -.0045674 .0018941 -2.41 0.016 -.0082904 -.0008443
RecImmChi | .0047709 .0021152 2.26 0.025 .0006133 .0089285
RecImmPhil | -.0070449 .0025268 -2.79 0.006 -.0120116 -.0020782
RecImmIran | .0050884 .0044768 1.14 0.256 -.0037113 .0138882
_cons | 6.10212 .7518196 8.12 0.000 4.624332 7.579907
————-+——————–
juris_id | F(19, 421) = 1.221 0.236 (20 categories)
Vancouver
Pct100up Mean 7.22 std dev 3.78
PctOwn30up Mean 27.62335 std dev 6.837
PctRent30up Mean 40.77577 std dev 12.754
RecImmChi Mean 79.8 std dev 135.2
RecImmInd Mean 42.8 std dev 114
RecImmPhil Mean 52.6 std dev 75.4
HHdensity Mean 1993.744 std dev 2754.097
HHMedInc Mean 68666.32 std dev 20319.
MedVal Mean 645885.4 std dev 318822.
MedRent Mean 1008.376 std dev 287.332
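The raw coefficients are hard to compare because the variables are measured in very different units (dollars, households per square km, head counts). One common way to put them on the same footing, sketched here, is to multiply each coefficient by that variable's standard deviation. The coefficients come from the first Pct100up regression and the standard deviations from the table of means. RecImm5, ImmTot and RecImmIran are left out because no standard deviation is listed for them.

```python
# Coefficients from the first Pct100up regression.
coefficients = {
    "HHdensity": 0.0003058,
    "MedVal": 3.14e-06,
    "MedRent": 0.0030059,
    "HHMedInc": -0.0000774,
    "RecImmInd": -0.0045674,
    "RecImmChi": 0.0047709,
    "RecImmPhil": -0.0070449,
}

# Standard deviations from the table of means.
std_devs = {
    "HHdensity": 2754.097,
    "MedVal": 318822,
    "MedRent": 287.332,
    "HHMedInc": 20319,
    "RecImmInd": 114,
    "RecImmChi": 135.2,
    "RecImmPhil": 75.4,
}

# Change in Pct100up (percentage points) for a one-standard-deviation
# increase in each variable, holding the others fixed.
effects = {name: coefficients[name] * std_devs[name] for name in coefficients}

# Largest effects first, regardless of sign.
ranked = sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)

for name, effect in ranked:
    print(f"{name:<11}{effect:+.2f} percentage points per standard deviation")
```

On this scale, median household income moves the percentage the most, and the country-of-origin variables move it the least.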
In the regression below, I add a dummy variable (=1 if the tract has a median value > 1.25M) and then interact this with the # of recent Chinese immigrants. The idea is to see whether it is the high house price / high Chinese immigrant tracts that have uniquely higher rates. The main point is that there is no correlation between a higher # of recent Chinese immigrants in higher house price tracts and HH reporting shelter costs > income.
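For readers unfamiliar with dummy and interaction terms, here is a sketch of how the two extra columns could be built and how their coefficients are read. The column names `hivalue` and `hival_chi` and the 1.25M threshold come from the analysis. The helper `add_high_value_terms` and the two example tracts are hypothetical, and the exact construction used in the do-file is not shown in the log.

```python
HIGH_VALUE_THRESHOLD = 1_250_000  # median dwelling value above which hivalue = 1

def add_high_value_terms(tract):
    """Return a copy of the tract with the dummy and the interaction added."""
    hivalue = 1 if tract["MedVal"] > HIGH_VALUE_THRESHOLD else 0
    return {
        **tract,
        "hivalue": hivalue,
        # Interaction: equals RecImmChi in high-value tracts, 0 elsewhere.
        "hival_chi": hivalue * tract["RecImmChi"],
    }

# Two made-up tracts with the same number of Chinese recent immigrants.
tracts = [
    {"ctname": "A", "MedVal": 1_600_000, "RecImmChi": 200},
    {"ctname": "B", "MedVal": 700_000, "RecImmChi": 200},
]
augmented = [add_high_value_terms(t) for t in tracts]

# Coefficients from the Pct100up regression that includes the new terms.
coef_rec_imm_chi = 0.0048346
coef_hival_chi = -0.000499

# Slope on RecImmChi in ordinary tracts versus high-value tracts.
slope_ordinary = coef_rec_imm_chi
slope_high_value = coef_rec_imm_chi + coef_hival_chi

print(augmented)
print(f"ordinary tracts:   {slope_ordinary:.7f}")
print(f"high-value tracts: {slope_high_value:.7f}")
```

If expensive tracts with many Chinese recent immigrants were unusual, the `hival_chi` coefficient would be large and clearly different from zero. Here the two slopes are nearly the same.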
. areg Pct100up HHdensity MedVal MedRent HHMedInc hivalue hival_chi ImmTot RecImm5 RecImmInd RecImmChi R
> ecImmPhil RecImmIran if cmauid==933, absorb(juris_id)
Linear regression, absorbing indicators Number of obs = 451
F( 12, 419) = 28.15
Prob > F = 0.0000
R-squared = 0.6048
Adj R-squared = 0.5756
Root MSE = 2.4334
———————————-
Pct100up | Coef. Std. Err. t P>|t| [95% Conf. Interval]
————-+——————–
HHdensity | .0003059 .0000625 4.89 0.000 .000183 .0004288
MedVal | 3.20e-06 9.20e-07 3.48 0.001 1.39e-06 5.01e-06
MedRent | .0030105 .0005725 5.26 0.000 .0018852 .0041359
HHMedInc | -.0000777 .0000103 -7.56 0.000 -.0000979 -.0000575
hivalue | .0074929 1.300754 0.01 0.995 -2.549323 2.564309
hival_chi | -.000499 .0040146 -0.12 0.901 -.0083903 .0073923
ImmTot | -.0004627 .0002283 -2.03 0.043 -.0009114 -.0000139
RecImm5 | .0051772 .0014778 3.50 0.001 .0022723 .008082
RecImmInd | -.004567 .0019021 -2.40 0.017 -.0083058 -.0008282
RecImmChi | .0048346 .0021715 2.23 0.027 .0005663 .0091029
RecImmPhil | -.0070798 .0025449 -2.78 0.006 -.0120822 -.0020774
RecImmIran | .0050731 .0045187 1.12 0.262 -.003809 .0139552
_cons | 6.08185 .7996954 7.61 0.000 4.509935 7.653764
————-+——————–
juris_id | F(19, 419) = 1.214 0.241 (20 categories)
. ***********************************
. *
. * REGRESSIONS ON PERCENTAGE OF OWNER HH’S PAYING > 30% OF INCOME ON SHELTER
. *
Here I look at owner HH paying more than 30% of income. What is striking is that it is not correlated with median tract house value. It falls with median tract income and is higher in tracts w/ more recent immigrants, but not especially Chinese recent immigrants, though it is correlated with more recent immigrants from India. The recent Chinese immigrant correlation is not present for this regression, mainly because the effect is less precisely estimated, so there is more noise in the connection.
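"Less precisely estimated" can be read straight off the Coef. and Std. Err. columns. The sketch below recomputes the t statistic and an approximate 95% confidence interval for two rows of the first PctOwn30up regression. The critical value 1.9656 is an approximation for roughly 422 degrees of freedom and is an assumption of this sketch, as is the helper name `t_and_interval`.

```python
T_CRIT_95 = 1.9656  # approximate two-sided 95% critical value, about 422 df

def t_and_interval(coef, std_err, t_crit=T_CRIT_95):
    """Return the t statistic and the (low, high) confidence interval."""
    t_stat = coef / std_err
    margin = t_crit * std_err
    return t_stat, (coef - margin, coef + margin)

# (coefficient, standard error) from the first PctOwn30up regression.
rows = {
    "RecImmInd": (0.0105182, 0.0040467),
    "RecImmChi": (0.0033269, 0.0045028),
}

results = {name: t_and_interval(coef, se) for name, (coef, se) in rows.items()}

for name, (t_stat, (low, high)) in results.items():
    verdict = "excludes zero" if low > 0 or high < 0 else "includes zero"
    print(f"{name}: t = {t_stat:.2f}, interval = ({low:.7f}, {high:.7f}), {verdict}")
```

The interval for RecImmChi straddles zero, so the data cannot distinguish its effect from no effect at all, while the interval for RecImmInd sits entirely above zero.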
. areg PctOwn30up HHdensity MedVal MedRent HHMedInc ImmTot RecImm5 RecImmInd RecImmChi RecImmPhil RecIm
> mIran if cmauid==933, absorb(juris_id)
Linear regression, absorbing indicators Number of obs = 452
F( 10, 422) = 22.72
Prob > F = 0.0000
R-squared = 0.4418
Adj R-squared = 0.4035
Root MSE = 5.1884
———————————-
PctOwn30up | Coef. Std. Err. t P>|t| [95% Conf. Interval]
————-+——————–
HHdensity | .0000429 .000132 0.33 0.745 -.0002164 .0003023
MedVal | 1.30e-06 1.43e-06 0.91 0.362 -1.50e-06 4.11e-06
MedRent | .0059343 .0011865 5.00 0.000 .0036021 .0082665
HHMedInc | -.0001558 .0000214 -7.29 0.000 -.0001978 -.0001138
ImmTot | .0002155 .0004744 0.45 0.650 -.0007169 .001148
RecImm5 | .0069896 .0030906 2.26 0.024 .0009148 .0130644
RecImmInd | .0105182 .0040467 2.60 0.010 .0025641 .0184723
RecImmChi | .0033269 .0045028 0.74 0.460 -.0055238 .0121776
RecImmPhil | -.0088441 .0054003 -1.64 0.102 -.0194589 .0017708
RecImmIran | -.0022972 .0090973 -0.25 0.801 -.0201788 .0155845
_cons | 28.38539 1.561183 18.18 0.000 25.31672 31.45405
————-+——————–
juris_id | F(19, 422) = 1.893 0.013 (20 categories)
. areg PctOwn30up HHdensity MedVal MedRent HHMedInc hivalue hival_chi ImmTot RecImm5 RecImmInd RecImmCh
> i RecImmPhil RecImmIran if cmauid==933, absorb(juris_id)
Linear regression, absorbing indicators Number of obs = 452
F( 12, 420) = 18.91
Prob > F = 0.0000
R-squared = 0.4426
Adj R-squared = 0.4015
Root MSE = 5.1971
———————————-
PctOwn30up | Coef. Std. Err. t P>|t| [95% Conf. Interval]
————-+——————–
HHdensity | .0000414 .0001329 0.31 0.756 -.0002199 .0003026
MedVal | 1.79e-06 1.93e-06 0.93 0.352 -1.99e-06 5.58e-06
MedRent | .0059687 .0011897 5.02 0.000 .0036302 .0083072
HHMedInc | -.0001588 .0000219 -7.26 0.000 -.0002017 -.0001158
hivalue | .4558599 2.745186 0.17 0.868 -4.940156 5.851876
hival_chi | -.0057133 .0085679 -0.67 0.505 -.0225547 .011128
ImmTot | .0001598 .0004855 0.33 0.742 -.0007946 .0011141
RecImm5 | .0072346 .0031536 2.29 0.022 .0010358 .0134333
RecImmInd | .0105596 .0040598 2.60 0.010 .0025795 .0185397
RecImmChi | .0040504 .0046194 0.88 0.381 -.0050297 .0131304
RecImmPhil | -.0091559 .0054349 -1.68 0.093 -.0198389 .0015271
RecImmIran | -.0021824 .0091348 -0.24 0.811 -.0201381 .0157733
_cons | 28.26762 1.640656 17.23 0.000 25.0427 31.49254
————-+——————–
juris_id | F(19, 420) = 1.908 0.012 (20 categories)
. areg PctOwn30up HHdensity MedRent HHMedInc hivalue hival_chi ImmTot RecImm5 RecImmInd RecImmChi RecIm
> mPhil RecImmIran if cmauid==933, absorb(juris_id)
Linear regression, absorbing indicators Number of obs = 452
F( 11, 421) = 20.56
Prob > F = 0.0000
R-squared = 0.4415
Adj R-squared = 0.4017
Root MSE = 5.1963
———————————-
PctOwn30up | Coef. Std. Err. t P>|t| [95% Conf. Interval]
————-+——————–
HHdensity | 1.34e-06 .0001257 0.01 0.991 -.0002458 .0002485
MedRent | .0060131 .0011885 5.06 0.000 .0036769 .0083493
HHMedInc | -.0001483 .0000187 -7.92 0.000 -.0001851 -.0001115
hivalue | 1.721937 2.384254 0.72 0.471 -2.964587 6.408461
hival_chi | -.00566 .0085664 -0.66 0.509 -.0224983 .0111782
ImmTot | .0001717 .0004853 0.35 0.724 -.0007821 .0011256
RecImm5 | .0067994 .0031182 2.18 0.030 .0006702 .0129287
RecImmInd | .0110327 .0040273 2.74 0.006 .0031166 .0189487
RecImmChi | .0048544 .0045372 1.07 0.285 -.0040641 .0137728
RecImmPhil | -.0090561 .005433 -1.67 0.096 -.0197352 .0016231
RecImmIran | -.0016543 .0091158 -0.18 0.856 -.0195724 .0162638
_cons | 28.70721 1.570989 18.27 0.000 25.61925 31.79517
————-+——————–
juris_id | F(19, 421) = 1.863 0.015 (20 categories)
. ***********************************
. *
. * REGRESSIONS ON PERCENTAGE OF RENTER HH’S PAYING > 30% OF INCOME ON SHELTER
. *
.
Finally, the same exercise for the percentage of renter HH paying more than 30% of their income on shelter. As expected, there is a higher percentage of renter HH w/ shelter costs > 30% of income in tracts with higher median values and rents and lower median incomes. It is higher where there are more recent immigrants, but lower in tracts with more recent immigrants from India.
. areg PctRent30up HHdensity MedVal MedRent HHMedInc ImmTot RecImm5 RecImmInd RecImmChi RecImmPhil RecI
> mmIran if cmauid==933, absorb(juris_id)
Linear regression, absorbing indicators Number of obs = 452
F( 10, 422) = 18.45
Prob > F = 0.0000
R-squared = 0.3989
Adj R-squared = 0.3576
Root MSE = 10.1124
———————————-
PctRent30up | Coef. Std. Err. t P>|t| [95% Conf. Interval]
————-+——————–
HHdensity | .0001175 .0002572 0.46 0.648 -.000388 .0006231
MedVal | 9.53e-06 2.78e-06 3.43 0.001 4.06e-06 .000015
MedRent | .0137682 .0023125 5.95 0.000 .0092227 .0183138
HHMedInc | -.0004058 .0000416 -9.74 0.000 -.0004876 -.0003239
ImmTot | -.0008175 .0009246 -0.88 0.377 -.0026349 .0009999
RecImm5 | .0157228 .0060236 2.61 0.009 .0038827 .0275629
RecImmInd | -.0277157 .0078871 -3.51 0.000 -.0432186 -.0122128
RecImmChi | -.0115154 .0087761 -1.31 0.190 -.0287657 .0057349
RecImmPhil | -.0152668 .0105254 -1.45 0.148 -.0359555 .005422
RecImmIran | -.0260211 .017731 -1.47 0.143 -.060873 .0088309
_cons | 47.89108 3.042807 15.74 0.000 41.91013 53.87202
————-+——————–
juris_id | F(19, 422) = 1.483 0.087 (20 categories)
. areg PctRent30up HHdensity MedVal MedRent HHMedInc hivalue hival_chi ImmTot RecImm5 RecImmInd RecImmChi RecImmPhil RecImmIran if cmauid==933, absorb(juris_id)
Linear regression, absorbing indicators Number of obs = 452
F( 12, 420) = 15.35
Prob > F = 0.0000
R-squared = 0.3995
Adj R-squared = 0.3552
Root MSE = 10.1311
———————————-
PctRent30up | Coef. Std. Err. t P>|t| [95% Conf. Interval]
————-+——————–
HHdensity | .0001043 .0002591 0.40 0.688 -.000405 .0006135
MedVal | 7.85e-06 3.75e-06 2.09 0.037 4.71e-07 .0000152
MedRent | .0137653 .0023191 5.94 0.000 .0092068 .0183238
HHMedInc | -.0004007 .0000426 -9.40 0.000 -.0004844 -.0003169
hivalue | 2.444142 5.351347 0.46 0.648 -8.074617 12.9629
hival_chi | .0009968 .016702 0.06 0.952 -.031833 .0338267
ImmTot | -.0006908 .0009465 -0.73 0.466 -.0025512 .0011695
RecImm5 | .0149404 .0061474 2.43 0.016 .0028568 .0270239
RecImmInd | -.0275182 .0079141 -3.48 0.001 -.0430743 -.0119621
RecImmChi | -.0118728 .0090049 -1.32 0.188 -.029573 .0058274
RecImmPhil | -.0146099 .0105946 -1.38 0.169 -.0354349 .006215
RecImmIran | -.0254652 .0178071 -1.43 0.153 -.0604673 .0095369
_cons | 48.5118 3.198224 15.17 0.000 42.22528 54.79831
————-+——————–
juris_id | F(19, 420) = 1.354 0.146 (20 categories)
. areg PctRent30up HHdensity MedRent HHMedInc hivalue hival_chi ImmTot RecImm5 RecImmInd RecImmChi RecImmPhil RecImmIran if cmauid==933, absorb(juris_id)
Linear regression, absorbing indicators Number of obs = 452
F( 11, 421) = 16.22
Prob > F = 0.0000
R-squared = 0.3933
Adj R-squared = 0.3500
Root MSE = 10.1716
———————————-
PctRent30up | Coef. Std. Err. t P>|t| [95% Conf. Interval]
————-+——————–
HHdensity | -.000071 .0002461 -0.29 0.773 -.0005548 .0004128
MedRent | .0139598 .0023265 6.00 0.000 .0093867 .0185328
HHMedInc | -.0003548 .0000367 -9.68 0.000 -.0004269 -.0002827
hivalue | 7.987928 4.66708 1.71 0.088 -1.185753 17.16161
hival_chi | .0012302 .0167684 0.07 0.942 -.0317299 .0341903
ImmTot | -.0006385 .0009499 -0.67 0.502 -.0025057 .0012287
RecImm5 | .0130351 .0061038 2.14 0.033 .0010373 .0250328
RecImmInd | -.0254467 .0078832 -3.23 0.001 -.040942 -.0099513
RecImmChi | -.0083523 .0088814 -0.94 0.348 -.0258098 .0091052
RecImmPhil | -.0141726 .0106348 -1.33 0.183 -.0350766 .0067314
RecImmIran | -.0231531 .0178438 -1.30 0.195 -.0582271 .0119208
_cons | 50.4366 3.075147 16.40 0.000 44.39204 56.48116
————-+——————–
juris_id | F(19, 421) = 1.147 0.301 (20 categories)
.
end of do-file