ASSESSMENT BRIEF | |
Subject Code and Title | STAT6003 : Statistics for Financial Decisions |
Assessment | Assessment 4 (individual assignment) – 30% |
Individual/Group | Group |
Length | 2000 words (+/- 10%) |
Learning Outcomes |
|
Weighting | 30% |
Total Marks | 30 marks |
The main aims to develop students’ competency in statistical literacy for decision making in the local and global business environment. It reviews statistical techniques for the quantitative evaluation of data in Financial applications. Students will develop analytical and statistical skills to enable them to transform data into meaningful information for the purpose of decision making.
Objectives:
Instructions:
This individual assignment requires you to apply statistical knowledge and skills learned from STAT6003 lectures between week 9, 10 and 11.
Submit copy of presentation Report in .docx, or .pdf format via the Assessment link in the main navigation menu in STAT6003. The Learning Facilitator will provide feedback with reference to the criteria below via the Grade Centre in the LMS portal. Feedback can be viewed in My Grades.
Assignment tasks:
The variables for this assignment are as follows: House Price Index (a)(b): Brisbane, Sydney and Melbourne, 2002–03 to 2016–17.
V1) Market Price ($000) V2) Sydney price Index V3) Annual % change
V4) Total number of square meters V5) Age of house (years)
1) Module 5 topic – Regression Analysis
You will specify a regression model for this assignment. This model can be based on a theory, several theories, your experience, and/or ideas from research article(s). Suggest you consider a regression model that is of interest to you or one that is related to your profession or one that you have knowledge about.
Y = b0 + b1 X1 + b2 X2 + b3 X3 + b4 X4 + e.
In your model, there must be one dependent variable and four independent variables
Assignment tasks:
This assignment is due in week 12 by Wednesday 11:55 pm. Please submit your assignment in Word document in the main navigation menu in STAT6003. The Learning Facilitator will provide feedback with reference to the criteria below via the Grade Centre in the LMS portal. Feedback can be viewed in My Grades.
Statistics Assignment
Report on Regression Analysis
1. Introduction
The following pages determine a regression model for the given data related to market prices for houses in Sydney. The variables included for analysis are: Market Price (in $000), Sydney price Index, Annual percentage change, Total number of square meters and Age of house (in years). Furthermore, the sample size is 18.
As can be seen from above, there are multiple variables and hence, a multiple variable regression model will be appropriate. Further, the Ordinary Least Square technique, popularly known as OLS, will be used to arrive at an appropriate regression model for provided data.
This model will have one dependent variable the value of which will be predicted through regression model. This dependent variable is the Market Price (in $000). The remaining variables will be independent variables that impact the value of dependent variable, that is, Market Price (in $000). The independent variables include: Sydney price Index, Annual percentage change, Total number of square meters and Age of house (in years).
The Sydney price index is based on base value of 100 in 2011-12 and calculated accordingly. The Annual percentage change is in percentage form and reflects the increase or decrease in market prices in year-on-year basis. The Total number of square meters reflects the size of the land on which house is built and it is measured in square meters. The age of the house reflects the number of years since the house was built.
The data for all the variables has been sourced from ABS6416.0, Residential Property Price Indexes for the Three Capital casitas.
To perform the regression analysis, Microsoft Excel will be utilised in order to generate the regression output.
The following graphs were generated using Microsoft Excel and represent relationship between the dependent variable and each of independent variables:
The above graph is a scatter plot for Sydney Price Index versus the Market Price. The independent variable (x) is the Sydney Price Index while the dependent variable (y) is the Market Price. It can be seen that there is a positive linear relationship between the two variables such that as the value of Sydney Price Index increases, the market price also increases.
The above graph is a scatter plot for Annual Percentage Change versus the Market Price. The independent variable (x) is the Annual Percentage Change while the dependent variable (y) is the Market Price. It can be seen that there is a positive linear relationship between the two variables such that as the value of Annual Percentage Change increases, the market price also increases. The trend line slope is not as steep as in case of Sydney Price Index, indicating that this variable has relatively lesser impact.
The above graph is a scatter plot for Total number of Square Meters versus the Market Price. The independent variable (x) is the Total number of Square Meters while the dependent variable (y) is the Market Price. It can be seen that there is a positive linear relationship between the two variables such that as the value of Total number of Square Meters increases, the market price also increases. The data is somewhat concentrated indicating close range values of the variables.
The above graph is a scatter plot for Age of House (years) versus the Market Price. The independent variable (x) is the Age of House (years) while the dependent variable (y) is the Market Price. It can be seen that there is a negative linear relationship between the two variables such that as the value of Age of House (years) increases, the market price decreases and vice versa. This can be related to real life where the older house will fetch a lower price and a newly constructed house will fetch a higher price.
As discussed earlier, Microsoft Excel was used to generate Regression output as follows:
SUMMARY OUTPUT | |||||||||||||||
Regression Statistics | |||||||||||||||
Multiple R | 0.900 | ||||||||||||||
R Square | 0.810 | ||||||||||||||
Adjusted R Square | 0.751 | ||||||||||||||
Standard Error | 45.001 | ||||||||||||||
Observations | 18 | ||||||||||||||
ANOVA | |||||||||||||||
Df | SS | MS | F | Significance F | |||||||||||
Regression | 4 | 1,12,068.128 | 28,017.032 | 13.835 | 0.000 | ||||||||||
Residual | 13 | 26,326.316 | 2,025.101 | ||||||||||||
Total | 17 | 1,38,394.444 | |||||||||||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% | ||||||||
Intercept | 531.463 | 80.386 | 6.611 | 0.000 | 357.801 | 705.126 | 357.801 | 705.126 | |||||||
Sydney price Index | 2.264 | 0.560 | 4.043 | 0.001 | 1.054 | 3.473 | 1.054 | 3.473 | |||||||
Annual % change | -6.310 | 3.060 | -2.062 | 0.060 | -12.919 | 0.300 | -12.919 | 0.300 | |||||||
Total number of square meters | 0.507 | 0.315 | 1.607 | 0.132 | -0.174 | 1.187 | -0.174 | 1.187 | |||||||
Age of house (years) | -2.637 | 1.146 | -2.301 | 0.039 | -5.113 | -0.161 | -5.113 | -0.161 |
We can say that the Market Price (‘000) is regressed on Sydney price Index, Annual percentage change, Total number of square meters and Age of house (in years).
From the above output, we can interpret the least squares regression equation as follows:
y^ = 531.46 + 2.264 x1 -6.310 x2 + 0.507 x3 – 2.637 x4
where,
The above equation has a constant of 531.46 which is also known as Y-intercept coefficient. It is the minimum value for y even when all x values are zero. It is the point where regression line crosses the vertical axis and is also known as ‘β0’ or ‘constant’.
x1 and x3 are greater than zero indicating that the relationship is positive such that x and y increase or decrease together.
x2 and x4 are lesser than zero indicating that the relationship is negative such that as x increases, y decreases and vice versa.
The output above provides coefficients as well some other statistically significant information, such as p-values for each of the coefficients. These values help in determining whether the variable has statistically significant relationship with the dependent variable or not.
Each of the p-value entails a null hypothesis that the variable has no correlation with the dependent variable. The alternative hypothesis is that the variable has correlation with the dependent variable. This null hypothesis can be rejected if the p-value for a coefficient is less than the significance level in consideration. Hence, in such a case, we can conclude that the particular variable has a correlation with the dependent variable.
In our case, the significance level is 0.05. The p-values for various coefficients are:
The values of R and R2 help in determining the degree of relationship between independent and dependent variables. These are also called as Coefficients of Determination.
In the given case, R value is 0.900 which indicates that there is high degree of correlation between the variables in consideration. R2 is at 0.810 with adjusted R2 being 0.751. The value of R2 is nothing but the proportion of change in dependent variable that can be explained through the independent variables. As can be seen, at 81.0%, a high proportion of change in value of ‘Market Price’ can be attributed to change in independent variables, namely, Sydney price Index, Annual percentage change, Total number of square meters and Age of house (in years).
The confidence intervals for various parameters help in determining the change in values of dependent variable with a unit change in the parameter. In the given case, the confidence interval for various parameters can be stated as:
Points that should be noted in above statements are:
The following is the output when regression is done for single variable, that is, size in total square meters.
SUMMARY OUTPUT | ||||||||
Regression Statistics | ||||||||
Multiple R | 0.30 | |||||||
R Square | 0.09 | |||||||
Adjusted R Square | 0.03 | |||||||
Standard Error | 88.83 | |||||||
Observations | 18 | |||||||
ANOVA | ||||||||
Df | SS | MS | F | Significance F | ||||
Regression | 1 | 12,150.69 | 12,150.69 | 1.54 | 0.23 | |||
Residual | 16 | 1,26,243.76 | 7,890.23 | |||||
Total | 17 | 1,38,394.44 | ||||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% | |
Intercept | 659.26 | 110.93 | 5.94 | 0.00 | 424.10 | 894.42 | 424.10 | 894.42 |
Total number of square meters | 0.64 | 0.52 | 1.24 | 0.23 | -0.46 | 1.74 | -0.46 | 1.74 |
We can say that the Market Price (‘000) is regressed on Total number of square meters. From the above output, we can interpret the least squares regression equation as follows:
y^ = 659.26 + 0.64 x1
where,
It can be seen above that Total number of square meters explains around 9% of change in the dependent variable, that is, market price. Further, when we see the p-value for this variable, it is 0.23 which is greater than 0.05. Hence, we are unable to reject the null hypothesis that there is no correlation between the selected variables. In other words, this is not a statistically significant variable for our purpose at the significance level of 0.05.
Basis above analysis, we can compare the two regression models:
In case of Model 1, R value is 0.900 which indicates that there is high degree of correlation between the variables in consideration. R2 is at 0.810 with adjusted R2 being 0.751. The value of R2 is nothing but the proportion of change in dependent variable that can be explained through the independent variables. As can be seen, at 81.0%, a high proportion of change in value of ‘Market Price’ can be attributed to change in independent variables, namely, Sydney price Index, Annual percentage change, Total number of square meters and Age of house (in years).
In case of Model 2, R value is 0.300 which indicates that there is low degree of correlation between the variables in consideration. R2 is at 0.09 with adjusted R2 being 0.03. As can be seen, at 9.0%, a very low proportion of change in value of ‘Market Price’ can be attributed to change in independent variable, Total number of square meters.
Hence, it can be seen that Model 1 is a much better fit as compared to Model 2.
From above analysis, the regression equation in case of one variable that is, total square meters is:
y^ = 659.26 + 0.64 x1
where,
When value of independent variable is given at 400 square meters, we can plug the value of x and predict a value of y as follows:
y^ = 659.26 + 0.64 x1
y^ = 659.26 + 0.64 (400)
y^ = 659.26 + 256
y^ = 915.26
Hence, value of dependent variable is 915.26. In other words, when the total size is 400 square meters, the market price of the house in Sydney is expected to be $915,260.