## Question

Assessment Brief: BUS 1003 Mathematics and Statistics Trimester 2, 2021

Assessment 3: Statistical Data Analysis

Group/individual: Group

Word count / Time provided: 1500 - 2000 Words

Weighting: 30%

Unit Learning Outcomes: [ULO1], [ULO2], [ULO3], [ULO4]

## Assessment Details:

The BUS1003 assessment task 3 is worth 30% of the overall assessment in the unit. This assignment is group work.

Timeframe and Submission:

The assessment must be uploaded via the Canvas assessment submission link no later than 11:59 pm on Sunday of Week 12. Unless approval for an extension is given on medical grounds (supported by a medical certificate), there will be a penalty of 5% of the maximum marks per calendar day for late submission of assignments. Although you will be provided with guidance about addressing the assignment tasks, you will need to complete the tasks in your own time.

Assessment Presentation

• Your assignment must be presented in Microsoft (MS) Word or PDF format. Copy and paste any relevant Excel outputs into this document immediately before the written answers to each task.
• If you are unfamiliar with using the MS Word Equations Editor, you may write algebraic/mathematical/statistical symbols and notation in neat handwritten form.
• Your answers must be clear. You must highlight relevant items on any required Excel outputs and refer to them in your written answers.
• When asked to perform a manual calculation (i.e., MS Excel is not specified), you must show all working. This must include intermediate steps where relevant. Failure to do so will result in a loss of marks.
• An Assessment Declaration is required and must be attached to the front of your assignment.

The dataset included with this assignment is a random sample of 450 persons from a population survey of NSW in a particular year (2016). The population consists of people who were working and drawing salaries during the survey year. You can access the dataset from the Assessment Information page on the unit website. You need to select a random sample of 70 IDs, each containing observations, where appropriate, of the eight variables, V1 to V8. The variables in the data set are as follows:

V1 = Salary (dollars per hour)

V2 = Occupational category (1=Management, 2=Sales, 3=Clerical, 4=Service, 5=Professional, 6=Other)

V3 = Sector (0=Other, 1=Manufacturing, 2=Construction)

V4 = Indicator variable for Residency Ownership (1=Homeowner, 0=Tenant)

V5 = Educational level (0= other, 1= Diploma, 2= Graduate Certificate, 3= Bachelor, 4=Master, 5= Doctorate)

V6 = Number of years of work experience

V7 = Age (years)

V8 = Indicator variable for sex (1=Female, 0=Male).
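
The brief asks for a random sample of 70 IDs from the 450 persons in the dataset. One possible way to draw such a sample is sketched below. It assumes the IDs are numbered 1 to 450, which the brief does not state; the function name and the seed value are hypothetical and only there for illustration.

```python
import random

POPULATION_SIZE = 450  # persons in the supplied dataset (from the brief)
SAMPLE_SIZE = 70       # IDs to select (from the brief)

def select_sample_ids(seed=None):
    """Return SAMPLE_SIZE distinct IDs drawn without replacement.

    Assumes IDs run from 1 to POPULATION_SIZE (an assumption, not stated in the brief).
    """
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return sorted(rng.sample(range(1, POPULATION_SIZE + 1), SAMPLE_SIZE))

sample_ids = select_sample_ids(seed=2021)  # the seed value is arbitrary
print(sample_ids[:10])
```

Sampling without replacement ensures that no person appears twice in the sample file.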

Answers to the Assessment 3 tasks must be based on the sample data file you created in Part I of the assignment. In addition, most tasks in Assessment 3 require you to obtain an Excel output before performing some analysis. There are five tasks in Assessment 3. You must meet all task requirements to receive full marks.

1. Find the frequency distribution for the Educational level (0= other, 1= Diploma, 2= Graduate Certificate, 3= Bachelor, 4=Master, 5= Doctorate). Use Excel to produce a Descriptive Statistics table for your sample "Educational level" data and paste it into your MS Word assignment document.
2. Use the relative frequency approach to find the probability distribution for the Educational level.
3. Draw the pie chart for the probability distribution of Educational level.
4. Define the probability distribution based on part (b), calculated from your own data. Show your results in the format below:

   | x    | 0 | 1 | 2 | 3 | 4 | 5 |
   |------|---|---|---|---|---|---|
   | P(x) |   |   |   |   |   |   |

1. Based on the probability distribution calculated in part (d), find the following:
2. Find the probability of exactly three.
3. Find the probability of more than two.
4. Find the probability of at least two.
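
The relative frequency approach in this task can be illustrated with a short sketch. The data below are hypothetical Educational level codes, not values from the assignment dataset, and the helper name `probability_distribution` is made up for this example. The brief requires Excel for the actual outputs, so this only shows the arithmetic.

```python
from collections import Counter

# Hypothetical Educational level codes (V5) for a small illustrative sample;
# replace with the 70 values from your own sample file.
education = [0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 1, 3, 2, 0, 3, 4, 1, 3, 2, 3]

def probability_distribution(values, categories=range(6)):
    """Relative-frequency estimate of P(x) for each category code."""
    counts = Counter(values)
    n = len(values)
    return {x: counts.get(x, 0) / n for x in categories}

p = probability_distribution(education)

p_exactly_3 = p[3]                              # P(X = 3)
p_more_than_2 = sum(p[x] for x in p if x > 2)   # P(X > 2) = P(3) + P(4) + P(5)
p_at_least_2 = sum(p[x] for x in p if x >= 2)   # P(X >= 2) = P(2) + ... + P(5)

for x, px in p.items():
    print(f"x={x}  P(x)={px:.3f}")
print(p_exactly_3, p_more_than_2, p_at_least_2)
```

Note the difference between "more than two" (which excludes x = 2) and "at least two" (which includes it).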

1. Find the frequency distribution for the indicator variable for Residency Ownership (1=Homeowner, 0=Tenant). Then, use Excel to produce a Descriptive Statistics table for your sample "Residency Ownership" data and paste it into your MS Word assignment document.
2. Use the relative frequency approach to find the probability distribution for the Residency Ownership.
3. Draw the bar chart for the probability distribution of Residency Ownership.
4. According to a report of the sample data, 26% of the people are homeowners (you need to use your own Residency Ownership proportion as the probability of success). Assume that a sample of 9 people is studied:
5. Find the probability that exactly four are homeowners.
6. Find the probability that fewer than four are homeowners.
7. Find the probability that at least six are homeowners.
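
The three probabilities above follow from the binomial distribution with n = 9 trials. A minimal sketch, using the 26% figure quoted in the brief as the probability of success (your own sample proportion would replace it), might look like this. The function name `binom_pmf` is chosen for this example only.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# n and p as quoted in the brief; use your own sample proportion for p.
n, p = 9, 0.26

p_exactly_4 = binom_pmf(4, n, p)                                  # P(X = 4)
p_less_than_4 = sum(binom_pmf(k, n, p) for k in range(0, 4))      # k = 0, 1, 2, 3
p_at_least_6 = sum(binom_pmf(k, n, p) for k in range(6, n + 1))   # k = 6, ..., 9

print(f"P(X = 4)  = {p_exactly_4:.4f}")
print(f"P(X < 4)  = {p_less_than_4:.4f}")
print(f"P(X >= 6) = {p_at_least_6:.4f}")
```

In Excel, the `BINOM.DIST` function returns the same point and cumulative probabilities.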

1. Use Excel and your sample data file to produce a suitable output; then test, at the 5% level of significance, the hypothesis that the population mean Salary (dollars per hour) is \$25.
2. Is this a one-tailed or two-tailed test? Briefly explain the reasoning behind your answer.
3. Write, in precise symbolic form, the null and alternative hypotheses.
4. Define the Z test and calculate the value of the test statistic.
5. Define critical values based on the nature of the problem.
6. Find a 95% confidence interval for the mean salary (dollars per hour) in the population.
7. Make the decision based on the critical value.
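
The steps of this task (test statistic, critical values, confidence interval, decision) can be sketched as follows. The sketch rests on several assumptions: the test is treated as two-tailed, the sample standard deviation stands in for the unknown population value, and the salary figures are hypothetical. The tiny sample is for illustration only; the assignment uses 70 observations.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_test_mean(sample, mu0, alpha=0.05):
    """Two-tailed Z test of H0: mu = mu0, using the sample standard deviation."""
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / sqrt(n)                    # standard error of the mean
    z = (x_bar - mu0) / se                          # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    ci = (x_bar - z_crit * se, x_bar + z_crit * se) # (1 - alpha) confidence interval
    reject = abs(z) > z_crit                        # decision by critical value
    return z, z_crit, ci, reject

# Hypothetical hourly salaries; replace with the V1 column of your own sample.
salaries = [22.5, 31.0, 18.75, 27.4, 24.0, 35.2, 20.1, 29.9, 26.3, 23.8]

z, z_crit, ci, reject = z_test_mean(salaries, mu0=25)
print(f"z = {z:.3f}, critical values = ±{z_crit:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}); reject H0: {reject}")
```

For a two-tailed test, rejecting H0 at the 5% level is equivalent to the hypothesised mean falling outside the 95% confidence interval, which gives a useful cross-check between parts 6 and 7.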

1. Use Excel and your sample data file to produce a descriptive summary output (remember to include the confidence bound "e" at the 1% level of significance) for the Indicator variable for sex (1=Female, 0=Male), using your sample data from task 1.
2. Define the mean proportion.
3. At the 1% level of significance, test the hypothesis that, for the Indicator variable for sex (1=Female, 0=Male) in your sample data from task 1, the proportion of males in the population is 0.45.
4. Write, in precise symbolic form, the null and alternative hypotheses.
5. Is this a one-tailed or two-tailed test? Briefly explain the reasoning behind your answer.
6. State the conclusion based on the sample evidence.
7. Find a 99% confidence interval for the proportion of males, based on the Indicator variable for sex.
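
A test of a proportion at the 1% level, together with a 99% confidence interval, could be sketched as below. This assumes a two-tailed test and the usual large-sample normal approximation. The sex codes are hypothetical, and `proportion_z_test` is a name invented for this example. Excel's Descriptive Statistics output may compute its confidence bound slightly differently, so treat the margin here as an approximation.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0, alpha=0.01):
    """Two-tailed Z test of H0: p = p0 plus a (1 - alpha) confidence interval."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)       # standard error under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 2.576 for alpha = 0.01
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat, z, z_crit, (p_hat - margin, p_hat + margin)

# Hypothetical V8 codes (1 = Female, 0 = Male); replace with your own sample.
sex = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
males = sex.count(0)  # males are coded 0, so count zeros rather than averaging

p_hat, z, z_crit, ci = proportion_z_test(males, len(sex), p0=0.45)
print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, critical values = ±{z_crit:.3f}")
print(f"99% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because males are coded 0, the mean of V8 is the proportion of females; the proportion of males is one minus that mean.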

1. Find the relationship between Salaries (dollars per hour) as the response variable and Education level as the explanatory variable. Use Excel to find the linear regression output. The belief is that as the education level increases, Salaries (dollars per hour) would increase. (You have to calculate according to your data.)
2. State the slope coefficient of the least squares regression equation.
3. State the intercept coefficient of the least squares regression equation.
4. Determine the least squares regression equation representing the approximately linear relationship between Salaries (dollars per hour) as the response variable and Education level as the explanatory variable.
5. Estimate the Salaries when the education level is Diploma.
6. Construct the 95% confidence interval for the slope parameter of the least squares regression equation.
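
The regression quantities requested above (slope, intercept, a prediction at Diploma level, and a confidence interval for the slope) can be sketched with plain arithmetic. The data pairs are hypothetical, and the function name is made up for this example. The t critical value is entered by hand from a t table for the 6 degrees of freedom of this toy data set. With 70 observations the degrees of freedom would be 68, and Excel's `T.INV.2T(0.05, 68)` would supply the value.

```python
from math import sqrt

def simple_linear_regression(x, y):
    """Least squares fit y = b0 + b1*x; returns (b0, b1, standard error of b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return b0, b1, se_b1

# Hypothetical (Education level, Salary) pairs; replace with V5 and V1 of your sample.
education = [0, 1, 1, 2, 3, 3, 4, 5]
salary = [18.0, 20.5, 22.0, 24.5, 27.0, 29.5, 33.0, 38.5]

b0, b1, se_b1 = simple_linear_regression(education, salary)

T_CRIT = 2.447  # two-tailed 5% t value for n - 2 = 6 df, read from a t table
slope_ci = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)
salary_at_diploma = b0 + b1 * 1  # Diploma is coded 1 in V5

print(f"salary = {b0:.3f} + {b1:.3f} * education")
print(f"predicted salary at Diploma: {salary_at_diploma:.2f}")
print(f"95% CI for slope: ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")
```

If the confidence interval for the slope lies entirely above zero, the data are consistent with the stated belief that salary rises with education level.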