Friday, May 12, 2017

Assignment 6

Part 1

The goal of this assignment was to determine whether there is a relationship between the percentage of children receiving free school meals and crime rates per hundred thousand people. The analysis was prompted by a local radio station in the town where the data was gathered, which suggested that as the number of children receiving free meals at school increases, so does the crime rate. SPSS was used to conduct a regression analysis to test this claim.

For this test, the percentage of children receiving free school meals is the independent variable, while the dependent variable is the crime rate per one hundred thousand people.

Figure 1 shows the results of the SPSS statistical test. The Standardized Coefficients column gives a value of .416, which indicates a positive linear relationship between the percentage of children receiving free school meals and crime rates. The 'Sig.' column gives a significance of .005; as the confidence level for this test is 95%, any value smaller than .05 indicates significance, so this result is significant. However, the relationship is not particularly strong: a coefficient of 0 would indicate that no relationship is present and 1 would indicate a perfect positive relationship, and .416 is well below 1. Therefore, as the percentage of children receiving free school meals increases, so does the crime rate per one hundred thousand people, but this significant positive relationship is not particularly strong.
Figure 1 - SPSS output showing the standardized regression coefficient between free school meals and crime rates, as well as the significance of the results.

The equation for calculating regression is Y = a + bX. In this case, a is 21.819, as seen in the 'Unstandardized B' column above, while b = 1.685. Therefore, the equation would be Y = 21.819 + 1.685X. With this in mind, we can calculate what the corresponding crime rate would be for a town with 23.5% of children receiving free school meals. The calculation for this would be Y = 21.819 + (1.685*23.5). The result of this is a crime rate of 61.4165 per hundred thousand people. However, when looking at the r-squared value associated with the test, we can see that this is .173 (Figure 2). This is otherwise known as the coefficient of determination and measures how well the independent variable explains the dependent variable. It is measured on a scale of 0 to 1, with 1 indicating that the independent variable explains all of the variation in the dependent variable and 0 indicating that it explains none. Therefore, the percentage of children receiving free school meals is explaining 17% of the variation in the crime rate, which is not particularly high. While the results are significant, the low r-squared suggests that the independent variable of free school meals does a poor job of explaining the dependent variable of crime rates, which means that the calculated result above may not reflect reality.


Figure 2 - Model Summary from SPSS output showing r-squared value and standard error of estimate.
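The prediction above can be checked with a few lines of Python. This is only a sketch: it assumes the standard regression form Y = a + bX (the slope is added because the reported coefficient is positive), takes the constants from the SPSS output in Figure 1, and uses the function name predict_crime_rate purely for illustration. It also shows that, with a single predictor, squaring the standardized coefficient reproduces the r-squared value in Figure 2.

```python
# Coefficients reported in the SPSS output (Figure 1).
INTERCEPT = 21.819         # a, the constant in the 'Unstandardized B' column
SLOPE = 1.685              # b, change in crime rate per one-point rise in free meals
STANDARDIZED_BETA = 0.416  # value in the Standardized Coefficients column


def predict_crime_rate(pct_free_meals: float) -> float:
    """Predicted crime rate per 100,000 people, assuming Y = a + bX."""
    return INTERCEPT + SLOPE * pct_free_meals


predicted = predict_crime_rate(23.5)

# With one independent variable, r-squared is the square of the
# standardized coefficient.
r_squared = STANDARDIZED_BETA ** 2

print(f"Predicted crime rate: {predicted:.4f} per 100,000 people")
print(f"r-squared: {r_squared:.3f}")
```

Because r-squared is only about .17, the predicted value should be read as a rough estimate rather than a reliable forecast.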
Part 2

Introduction

The goal of this section of the task is to advise a company on the best location in the city of Portland, Oregon, to place an ER, based on the number of 911 calls received per census tract and the factors influencing the number of calls. This is also useful to the City of Portland, which is interested in identifying the reasons contributing to the number of calls. The factors whose influence they are primarily interested in examining are: unemployment; low education; and college graduates. Additional variables that could contribute to a higher number of 911 calls are also of interest to the City, as these will help it make sure that adequate response teams are available in each area.

Methods

First, three single regression analyses were conducted using SPSS with the dependent variable being number of calls, and the independent variables being: unemployment; low education; and college graduates. As there are additional factors that could influence the number of calls made from within a census tract, a multiple regression was completed that looked at these additional factors, such as the number of renters and number of people born in a foreign country. These were then mapped using residual maps. Next, multiple factors were analysed using multiple regression to see which had the largest impact on the number of calls made. A multicollinearity diagnostic was used to check that the results were accurate.
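As a rough illustration of what each single regression computes, the sketch below fits a least-squares line by hand and returns the intercept, slope and r-squared. The analysis itself was run in SPSS; the function name simple_regression and the five data points are hypothetical and are not taken from the Portland data.

```python
from typing import Sequence, Tuple


def simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (intercept, slope, r_squared) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical values for five tracts, NOT the Portland data.
low_educ = [10.0, 20.0, 30.0, 40.0, 50.0]
calls = [6.0, 7.0, 9.0, 10.0, 13.0]

a, b, r2 = simple_regression(low_educ, calls)
print(f"Calls = {a:.3f} + {b:.3f} * LowEduc, r-squared = {r2:.3f}")
```

SPSS additionally reports a significance value for each coefficient, which this sketch does not compute.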


Results

Low Education - Figure 3 shows that there is a positive relationship between the number of people with low education levels and the number of 911 calls. This is significant, as the significance value is .000, which is lower than .05. This being the case, we can reject the null hypothesis and conclude that there is a linear relationship between these variables. This suggests that as the number of people with low education levels in an area increases, so does the number of 911 calls. The equation for this would be Y = 3.931 + 0.166X. Therefore, for every unit of change (increasing the low education variable by one), we would expect to see an increase of .166 in the number of 911 calls placed. Figure 4 shows us that the r-squared value is .567, which indicates that low education is explaining about 57% of the variation in the number of 911 calls.
Figure 3 - Coefficients table showing the regression analysis results between the number of people with low education levels and 911 calls.

Figure 4 - Model Summary of the regression between people with low education levels and 911 calls.



Percent College Graduates - Figure 5 shows that there is a negative linear relationship between the percentage of college graduates in a census tract and the number of 911 calls. However, the significance result of .142 is higher than .05, so this result is not significant. In this case we fail to reject the null hypothesis, meaning that there is no evidence of a linear relationship between the percentage of college graduates and the number of 911 calls. We can see from Figure 6 that there is a very small r-squared value, of only .025, which shows that the percentage of college graduates can only explain about 2% of the variation in 911 calls. This highlights the fact that the percentage of college graduates in an area does not significantly contribute to the number of 911 calls made.
Figure 5 - Coefficients table showing the relationship between percentage of college graduates and 911 calls.

Figure 6 - Model Summary of the regression between percentage of college graduates and number of 911 calls.


Unemployment - Figure 7 shows that there is a positive linear relationship between the unemployment rate and the number of 911 calls in census tracts in Portland. The significance of this is .000, which is lower than .05 and means that we can reject the null hypothesis and state that as the unemployment level increases, so does the number of 911 calls. The equation associated with this result is Y = 1.106 + .507X. Therefore, for every one unit of change in the unemployment rate, we would expect to see an increase in the number of 911 calls of .507. From Figure 8, we can see that with an r-squared value of .543, the unemployment rate is explaining 54% of the variation in the number of calls.
Figure 7 - Coefficients table showing the results of the regression analysis between unemployment rates and number of 911 calls. 

Figure 8 - Model Summary of the regression between unemployment rates and number of 911 calls.
From this, we can see that unemployment rates and low education rates have a linear relationship with the number of 911 calls placed, but the percentage of college graduates in an area does not. From the map below (Figure 9), we can see that the areas with the highest number of 911 calls are mainly in the north of the city, with the lowest number of calls on the outskirts. Given these results, we may expect to find higher levels of unemployment and low education in these high-call areas than in the other census tracts in Portland, for instance in the area in the north of the city that includes tracts 59 - 66.
Figure 9 - Map showing the number of calls from each census tract in Portland, Oregon
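The decision rule applied to the three single regressions can be summarised in a short sketch. The significance and r-squared values are those reported in the text for Figures 3 to 8; the college graduate r-squared is entered as .025 on the assumption that this is the value behind the "2%" quoted, and the variable names are illustrative only.

```python
ALPHA = 0.05  # threshold implied by the 95% confidence level

# (predictor, significance, r-squared) as reported for Figures 3 to 8.
# ASSUMPTION: the college graduate r-squared is taken as .025, consistent
# with the text's statement that it explains about 2% of the calls.
single_models = [
    ("Low education", 0.000, 0.567),
    ("College graduates", 0.142, 0.025),
    ("Unemployment", 0.000, 0.543),
]

# Keep only the predictors whose relationship is statistically significant.
significant = [m for m in single_models if m[1] < ALPHA]

# Among those, the highest r-squared explains the most variation in calls.
best = max(significant, key=lambda m: m[2])

for name, sig, r2 in single_models:
    verdict = "reject null" if sig < ALPHA else "fail to reject null"
    print(f"{name:<18} Sig. = {sig:.3f}  r-squared = {r2:.3f}  -> {verdict}")
print(f"Strongest single predictor: {best[0]}")
```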
As the Low Education variable has the highest r-squared value (.567), a residual map was created based on this.
Figure 10 - Residual Map

From the map above, we can see how far from a line of best fit each tract would fall on a graph. Tracts closest to the line of best fit are shown in beige, while tracts shaded blue or red have larger residuals, meaning that their observed number of calls falls further above or below the trend line. This shows how these areas deviate from the model created.
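A residual is simply the observed number of calls minus the number predicted by the regression line. The sketch below assumes the low education model Y = 3.931 + 0.166X with the coefficients from Figure 3; the three tracts and their values are hypothetical and are not taken from the map.

```python
INTERCEPT = 3.931  # constant from Figure 3
SLOPE = 0.166      # low education coefficient from Figure 3


def residual(low_educ: float, observed_calls: float) -> float:
    """Observed minus predicted calls; positive means above the line of best fit."""
    predicted = INTERCEPT + SLOPE * low_educ
    return observed_calls - predicted


# Hypothetical tracts (label, low education count, observed calls); not real data.
tracts = [("A", 100.0, 25.0), ("B", 50.0, 13.0), ("C", 100.0, 12.0)]

for label, low_educ, calls in tracts:
    r = residual(low_educ, calls)
    side = "above" if r > 0 else "below"
    print(f"Tract {label}: residual {r:+.3f} ({side} the line of best fit)")
```

Tracts with residuals near zero are well described by the model, while large positive or negative residuals mark tracts where something other than low education is driving the number of calls.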


In addition to the three factors explored above, multiple regression analysis was used to identify other factors contributing to the number of 911 calls. The variables included were: Jobs, Renters, LowEduc (Number of people with no HS Degree), AlcoholX (alcohol sales), Unemployed, ForgnBorn (Foreign Born Pop), Med Income, CollGrads (Number of College Grads). Figure 11 shows the coefficients table with the results of the multiple regression analysis. With an r-squared value of .780, we can see that the variables are doing a good job of explaining the number of calls made (Figure 12).



Figure 11 - Multiple Regression coefficients table

Figure 12 - Multiple regression model summary.

From Figure 11, we can see that a number of the factors have a significance above .05, meaning that they are not significant and there is no evidence of a linear relationship between these factors and the number of 911 calls. These are: renters; unemployed; foreign born; median income; and college graduates. This suggests that the only significant factors contributing to the number of calls made are low education and jobs. However, Figure 12 shows a relatively high r-squared of .760, indicating that the independent variables are doing a good job of explaining the dependent variable. This seems unusual, so a diagnostic test was used to check for multicollinearity.
When conducting multiple regression analysis, it is important to check for multicollinearity, which occurs when two or more of the independent variables correlate highly with one another. This can make the results of the multiple regression unreliable, and can make an independent variable that would be significant on its own appear to be insignificant. Figure 13 shows the results of this diagnostic test. As the condition index values are below 30, this indicates that no multicollinearity is present, so we can continue with the results shown above.

Figure 13 - Multicollinearity diagnostic.
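The condition index reported by SPSS is derived from the eigenvalues of the scaled predictor matrix and is not reproduced here. As a simpler, related illustration of the same idea, the sketch below computes the variance inflation factor (VIF) for a pair of predictors, which for two variables is 1 / (1 - r^2), where r is their Pearson correlation. This is a different diagnostic from the one used in the analysis, and the renter and unemployment figures are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1: Sequence[float], x2: Sequence[float]) -> float:
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical values for five tracts, NOT the Portland data.
renters = [120.0, 340.0, 560.0, 610.0, 900.0]
unemployed = [15.0, 30.0, 62.0, 55.0, 95.0]

r = pearson_r(renters, unemployed)
vif = vif_two_predictors(renters, unemployed)
print(f"Correlation between predictors: {r:.3f}")
print(f"VIF: {vif:.2f}")
```

A VIF of 1 means the two predictors are uncorrelated; a common rule of thumb treats values above about 10 as a sign of problematic multicollinearity.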

To identify which of these variables were the most important, a stepwise approach was also used: this method of multiple regression sorts through the data and picks out the three independent variables that have the largest influence on the dependent variable. These are: Renters, Low Education and Jobs (Figure 14). Interestingly, renters is significant here. This suggests that, despite the condition index values being below 30, some level of multicollinearity was occurring in the previous test that made it appear that the number of renters was not significant. Figure 15 shows that jobs has the highest r-squared value, so this is the variable that has the most influence on the number of 911 calls placed.




Figure 14 - Stepwise regression coefficients table
Figure 15 - Stepwise regression model summary.



Conclusion



The number of calls made to 911 in the city of Portland, Oregon, is influenced by many factors, particularly the number of jobs in the tract, the number of renters and the number of individuals with low education levels. These results will help the city of Portland to plan where a new ER would be best located - ideally within easy reach of areas with a high number of renters, individuals with low education levels and jobs. This will mean that those who need to access the service most will have shorter travel times. From the map in Figure 9, the tracts in the northern area of the city would appear to be a suitable location, as these are the areas already identified as producing the highest number of calls.