# STAT6003 Statistics for Financial Decisions: Report on Regression Analysis Assessment 4 Answer

 ASSESSMENT BRIEF Subject Code and Title STAT6003 : Statistics for Financial Decisions Assessment Assessment 4 (individual assignment) – 30% Individual/Group Group Length 2000 words (+/- 10%) Learning Outcomes Examine the statistical techniques for the quantitative evaluation of data in decision making for accounting, finance and business applications.Identify and apply appropriate statistical techniques to the problems and challenges.Students will develop analytical and statistical skills through Excel data analysis to manipulate data into meaningful information for the purpose of decision making. Weighting 30% Total Marks 30 marks

## Context:

The main aims to develop students’ competency in statistical literacy for decision making in the local and global business environment. It reviews statistical techniques for the quantitative evaluation of data in Financial applications. Students will develop analytical and statistical skills to enable them to transform data into meaningful information for the purpose of decision making.

Objectives:

• To more broadly understand the statistical literacy for decision making.
• Interpret statistical results and communicate their statistical analysis in business reports.

Instructions:

This individual assignment requires you to apply statistical knowledge and skills learned from STAT6003 lectures between week 9, 10 and 11.

• You will specify a regression model for this assignment. This model can be based on a theory, several theories, your experience, and/or ideas.
• Please use Excel for statistical analysis in this assignment. Relevant Excel statistical output must be properly analysed and interpreted.
• Please provide a number for every table, graph or figure used and make clear reference to the table/graph/figure in your discussion.
• The word limit for this group assignment is 2,000 words (excluding Excel output).
• This assignment is due in week 12 by Wednesday 11:55 pm.

Submit copy of presentation Report in .docx, or .pdf format via the Assessment link in the main navigation menu in STAT6003. The Learning Facilitator will provide feedback with reference to the criteria below via the Grade Centre in the LMS portal. Feedback can be viewed in My Grades.

The variables for this assignment are as follows: House Price Index (a)(b): Brisbane, Sydney and Melbourne, 2002–03 to 2016–17.

V1) Market Price (\$000) V2) Sydney price Index V3) Annual % change

V4) Total number of square meters V5) Age of house (years)

1) Module 5 topic – Regression Analysis

You will specify a regression model for this assignment. This model can be based on a theory, several theories, your experience, and/or ideas from research article(s). Suggest you consider a regression model that is of interest to you or one that is related to your profession or one that you have knowledge about.

1. Using Ordinary Least Square (OLS), estimate the model (below is a template for developing your regression model):

Y = b0 + b1 X1 + b2 X2 + b3 X3 + b4 X4 + e.

In your model, there must be one dependent variable and four independent variables

1. For statistical analysis involving any hypothesis test in this assignment, you are required to:
• Formulate the null and alternative hypotheses.
• State your statistical decision using significant value (��) of 5% for each test.
• State your conclusion in context.

1. Provide an introduction section on the rationale of your model , sample size, and the dependent and independent variables (including their unit of measurement) in this model.
2. Plot the dependent variable against each independent variable using scatter plot/dot function in Excel. Describe the relationship from the plots.
3. Present the full model in your assignment.
4. Write down the least squares regression equation and correctly interpret the equation.
5. Interpret the estimated coefficients of the regression model and discuss their sig values.
6. What is the value of the coefficient of determination for the relationship between the dependent and independent variables. Interpret this value accurately and in a meaningful way.
7. State the 95% confidence intervals for each parameters and interpret these intervals.
8. Estimate the linear regression model to investigate the relationship between the market price and the land size in total number of square meters.
9. Compare the original model (question 1) and re-estimated model (question 2) and evaluate the goodness of fit between them (Hint: Use R2 and Coefficient of determination to evaluate the goodness of fit of the model).
10. Predict the market price of a house (in \$) with a building area of 400 square meters.

This assignment is due in week 12 by Wednesday 11:55 pm. Please submit your assignment in Word document in the main navigation menu in STAT6003. The Learning Facilitator will provide feedback with reference to the criteria below via the Grade Centre in the LMS portal. Feedback can be viewed in My Grades.

Statistics Assignment

Report on Regression Analysis

1. Introduction

The following pages determine a regression model for the given data related to market prices for houses in Sydney. The variables included for analysis are: Market Price (in \$000), Sydney price Index, Annual percentage change, Total number of square meters and Age of house (in years). Furthermore, the sample size is 18.

As can be seen from above, there are multiple variables and hence, a multiple variable regression model will be appropriate. Further, the Ordinary Least Square technique, popularly known as OLS, will be used to arrive at an appropriate regression model for provided data.

This model will have one dependent variable the value of which will be predicted through regression model. This dependent variable is the Market Price (in \$000). The remaining variables will be independent variables that impact the value of dependent variable, that is, Market Price (in \$000). The independent variables include: Sydney price Index, Annual percentage change, Total number of square meters and Age of house (in years).

The Sydney price index is based on base value of 100 in 2011-12 and calculated accordingly. The Annual percentage change is in percentage form and reflects the increase or decrease in market prices in year-on-year basis. The Total number of square meters reflects the size of the land on which house is built and it is measured in square meters. The age of the house reflects the number of years since the house was built.

The data for all the variables has been sourced from ABS6416.0, Residential Property Price Indexes for the Three Capital casitas.

To perform the regression analysis, Microsoft Excel will be utilised in order to generate the regression output.

# 2. Scatter Plots

The following graphs were generated using Microsoft Excel and represent relationship between the dependent variable and each of independent variables:

The above graph is a scatter plot for Sydney Price Index versus the Market Price. The independent variable (x) is the Sydney Price Index while the dependent variable (y) is the Market Price. It can be seen that there is a positive linear relationship between the two variables such that as the value of Sydney Price Index increases, the market price also increases.

The above graph is a scatter plot for Annual Percentage Change versus the Market Price. The independent variable (x) is the Annual Percentage Change while the dependent variable (y) is the Market Price. It can be seen that there is a positive linear relationship between the two variables such that as the value of Annual Percentage Change increases, the market price also increases. The trend line slope is not as steep as in case of Sydney Price Index, indicating that this variable has relatively lesser impact.

The above graph is a scatter plot for Total number of Square Meters versus the Market Price. The independent variable (x) is the Total number of Square Meters while the dependent variable (y) is the Market Price. It can be seen that there is a positive linear relationship between the two variables such that as the value of Total number of Square Meters increases, the market price also increases. The data is somewhat concentrated indicating close range values of the variables.

The above graph is a scatter plot for Age of House (years) versus the Market Price. The independent variable (x) is the Age of House (years) while the dependent variable (y) is the Market Price. It can be seen that there is a negative linear relationship between the two variables such that as the value of Age of House (years) increases, the market price decreases and vice versa. This can be related to real life where the older house will fetch a lower price and a newly constructed house will fetch a higher price.

# 3. Regression Model

As discussed earlier, Microsoft Excel was used to generate Regression output as follows:

 SUMMARY OUTPUT Regression Statistics Multiple R 0.900 R Square 0.810 Adjusted R Square 0.751 Standard Error 45.001 Observations 18 ANOVA Df SS MS F Significance F Regression 4 1,12,068.128 28,017.032 13.835 0.000 Residual 13 26,326.316 2,025.101 Total 17 1,38,394.444 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0% Intercept 531.463 80.386 6.611 0.000 357.801 705.126 357.801 705.126 Sydney price Index 2.264 0.560 4.043 0.001 1.054 3.473 1.054 3.473 Annual % change -6.310 3.060 -2.062 0.060 -12.919 0.300 -12.919 0.300 Total number of square meters 0.507 0.315 1.607 0.132 -0.174 1.187 -0.174 1.187 Age of house (years) -2.637 1.146 -2.301 0.039 -5.113 -0.161 -5.113 -0.161

# 4. Regression Equation

We can say that the Market Price (‘000) is regressed on Sydney price Index, Annual percentage change, Total number of square meters and Age of house (in years).

From the above output, we can interpret the least squares regression equation as follows:

y= 531.46 + 2.264 x1 -6.310 x2 + 0.507 x3 – 2.637 x4

where,

• Dependent variable, Market price (\$000) is represented as: y
• Independent variable, Sydney price index is represented as x1
• Independent variable, Annual percentage change is represented as x2
• Independent variable, Total number of Square Meters is represented as x3
• Independent variable, Age of House (years) is represented as x4

# Coefficients

The above equation has a constant of 531.46 which is also known as Y-intercept coefficient. It is the minimum value for y even when all x values are zero. It is the point where regression line crosses the vertical axis and is also known as ‘β0’ or ‘constant’.

x1 and x3 are greater than zero indicating that the relationship is positive such that x and y increase or decrease together.

x2 and x4 are lesser than zero indicating that the relationship is negative such that as x increases, y decreases and vice versa.

## Sig Value

The output above provides coefficients as well some other statistically significant information, such as p-values for each of the coefficients. These values help in determining whether the variable has statistically significant relationship with the dependent variable or not.

Each of the p-value entails a null hypothesis that the variable has no correlation with the dependent variable. The alternative hypothesis is that the variable has correlation with the dependent variable. This null hypothesis can be rejected if the p-value for a coefficient is less than the significance level in consideration. Hence, in such a case, we can conclude that the particular variable has a correlation with the dependent variable.

In our case, the significance level is 0.05. The p-values for various coefficients are:

• Intercept: 0.000, which is less than 0.05. Hence, the variable is statistically significant.
• x1 : 0.001, which is less than 0.05 Hence, the variable is statistically significant.
• x2 : 0.060, which is greater than 0.05. Hence, the variable is not statistically significant.
• x3 : 0.132, which is greater than 0.05. Hence, the variable is not statistically significant.
• x: 0.039, which is less than 0.05. Hence, the variable is statistically significant.

# 6. Coefficients of Determination

The values of R and R2 help in determining the degree of relationship between independent and dependent variables. These are also called as Coefficients of Determination.

In the given case, R value is 0.900 which indicates that there is high degree of correlation between the variables in consideration. R2 is at 0.810 with adjusted R2 being 0.751. The value of R2 is nothing but the proportion of change in dependent variable that can be explained through the independent variables. As can be seen, at 81.0%, a high proportion of change in value of ‘Market Price’ can be attributed to change in independent variables, namely, Sydney price Index, Annual percentage change, Total number of square meters and Age of house (in years).

# 7. Confidence Intervals

The confidence intervals for various parameters help in determining the change in values of dependent variable with a unit change in the parameter. In the given case, the confidence interval for various parameters can be stated as:

• x1 : We can be 95% confident that with each increase in Sydney price index, the market price (\$000) will increase between 1.054 and 3.473.
• x2 : We can be 95% confident that with each decrease in Annual % change, the market price (\$000) will increase between -12.919 and 0.300.
• x3 : We can be 95% confident that with each increase in Total number of Square meters, the market price (\$000) will increase between -0.174 and 1.187.
• x: We can be 95% confident that with each increase in Age of House (years), the market price (\$000) will decrease between -5.113 and -0.161.

Points that should be noted in above statements are:

• In each case, the first value is Lower 95% confidence level and the second number is Upper 95% confidence level.
• The change is in \$000. Hence, for example, a change of 1.054 indicates an increase of \$1,054.

# 8. Market Price versus Land Size

The following is the output when regression is done for single variable, that is, size in total square meters.

 SUMMARY OUTPUT Regression Statistics Multiple R 0.30 R Square 0.09 Adjusted R Square 0.03 Standard Error 88.83 Observations 18 ANOVA Df SS MS F Significance F Regression 1 12,150.69 12,150.69 1.54 0.23 Residual 16 1,26,243.76 7,890.23 Total 17 1,38,394.44 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0% Intercept 659.26 110.93 5.94 0.00 424.10 894.42 424.10 894.42 Total number of square meters 0.64 0.52 1.24 0.23 -0.46 1.74 -0.46 1.74

We can say that the Market Price (‘000) is regressed on Total number of square meters. From the above output, we can interpret the least squares regression equation as follows:

y= 659.26 + 0.64 x1

where,

• Dependent variable, Market price (\$000) is represented as: y
• Independent variable, Total number of Square Meters is represented as x1

It can be seen above that Total number of square meters explains around 9% of change in the dependent variable, that is, market price. Further, when we see the p-value for this variable, it is 0.23 which is greater than 0.05. Hence, we are unable to reject the null hypothesis that there is no correlation between the selected variables. In other words, this is not a statistically significant variable for our purpose at the significance level of 0.05.

# 9. Re-estimated Model

Basis above analysis, we can compare the two regression models:

In case of Model 1, R value is 0.900 which indicates that there is high degree of correlation between the variables in consideration. R2 is at 0.810 with adjusted R2 being 0.751. The value of R2 is nothing but the proportion of change in dependent variable that can be explained through the independent variables. As can be seen, at 81.0%, a high proportion of change in value of ‘Market Price’ can be attributed to change in independent variables, namely, Sydney price Index, Annual percentage change, Total number of square meters and Age of house (in years).

In case of Model 2, R value is 0.300 which indicates that there is low degree of correlation between the variables in consideration. R2 is at 0.09 with adjusted R2 being 0.03. As can be seen, at 9.0%, a very low proportion of change in value of ‘Market Price’ can be attributed to change in independent variable, Total number of square meters.

Hence, it can be seen that Model 1 is a much better fit as compared to Model 2.

# 10. Prediction of Market Price

From above analysis, the regression equation in case of one variable that is, total square meters is:

y= 659.26 + 0.64 x1

where,

• Dependent variable, Market price (\$000) is represented as: y
• Independent variable, Total number of Square Meters is represented as x1

When value of independent variable is given at 400 square meters, we can plug the value of x and predict a value of y as follows:

y= 659.26 + 0.64 x1

y= 659.26 + 0.64 (400)

y= 659.26 + 256

y= 915.26

Hence, value of dependent variable is 915.26. In other words, when the total size is 400 square meters, the market price of the house in Sydney is expected to be \$915,260.