Show all calculations/reasoning
Guide to marks: 20 marks - (a) 3, (b) 3, (c1) 5, (c 2-6) 5 (1 each), (d) 4 (1 each)
(a) Define the term probability. How is it measured? What range can the measures take and what do they mean?
(b) What is meant by the term statistical independence? How can it be identified from a relationship between two variables in a given situation?
(c) Consider the following record of daily sales of loaves of sourdough bread over the last 100 days.
Sales Units (x) | No. of days | p(x) | Exp value | More than | Less than | [x-E(x)]^2 | [x-E(x)]^2 p(x) |
0 | 5 | ||||||
1 | 15 | ||||||
2 | 20 | ||||||
3 | 25 | ||||||
4 | 20 | ||||||
5 | 15 | ||||||
Total | 100 | Variance |
(1) Copy the above table into Excel and, using formulas, complete the missing column figures (note that the 5th and 6th columns refer to cumulative probability distributions, while the last 2 columns contain variance calculations). All cells (except for cols 1 and 2) are to contain formulas, so no fudging. Answer the questions below by highlighting the answers in your table, and simply repeating these figures against answers 2 to 6. After answering the questions below, paste your Excel model into Word twice, once showing the output and once showing formulas (with row and column headings). Insert the standard deviation below the variance.
(2) What were the average daily sales? Highlight your answer in the spreadsheet and repeat it here.
(3) What was the probability of selling 2 or more loaves on any one day? Highlight your answer in the spreadsheet and repeat it here.
(4) What was the probability of selling 2 or less? Highlight your answer in the spreadsheet and repeat it here.
(5) What is the variance of the distribution? Highlight your answer in the spreadsheet and repeat it here.
(6) What is the standard deviation? Highlight your answer in the spreadsheet and repeat it here.
(d) The average sales of oranges is 4,700 with a standard deviation of 500.
(1) What is the probability that sales will be greater than 5,500 oranges?
(2) What is the probability that sales will be greater than 4,500 oranges?
(3) What is the probability that sales will be less than 4,900 oranges?
(4) What is the probability that sales will be less than 4,300 oranges?
QUESTION 2 Research Question, Constructing data table and calculating probabilities
Guide to marks: 10 marks – (1) 3, (2) 3, (3) 4
The following question involves learning/employing research skills in searching out data on the Internet, presenting it in a well constructed and informative table, and calculating some probabilities showing calculation methods.
1. Search the Internet for the latest figures you can find on the age and sex of the Australian population.
2. Using Excel, prepare a table of population numbers (not percentages) by sex (in the columns) and age (in the rows). Break age into about 5 standard groups, e.g. 0-14, 15-24, 25-54, 55-64, 65 and over. Insert the total of each row and each column. Paste the table into Word as a picture. Give the table a title, and below the table quote the source of the figures.
3. Calculate from the table, showing your calculation methods:
• The probability that any person selected at random from the population is a female.
• The probability that any person selected at random from the population is aged between 25 and 54.
• The joint probability that any person selected at random from the population is a male and aged between 55 and 64.
• The conditional probability that any person selected at random from the population is aged between 25 and 64 given that the person is a female.
QUESTION 3 Statistical Decision Making and Quality Control
Show all calculations/reasoning
Guide to marks: 20 marks – (a)10: 3 each for 1,2, and 3, 1 for conclusion, (b) 10: 2 for 1, 3 for 2, 2 for 3, 3 for 4
(a) A company wishes to set control limits for monitoring the direct labour time to produce an important product. Over the past the mean time has been 30 hours with a standard deviation of 10 hours and is believed to be normally distributed. The company proposes to collect random samples of 64 observations to monitor labour time.
Calculate the control limits for each of the 3 alternatives.
Which procedure will provide the narrowest control limits? What are they?
(b)
Hypothesis testing
Active Insurance Company’s rates for fire insurance depend on the distance a home is from the nearest fire station. A progressive community claims that the average home in its town is within 5.5 km of the nearest fire station.
Active took a sample of 64 homes, which produced a mean of 5.8 km from the nearest fire station. Is there sufficient evidence to refute the town’s contention that the mean distance is not greater than the claimed 5.5 km if σ (sigma) = 2.4 km? Use α = 0.05.
1. Show the null and alternative hypotheses.
2. Calculate the critical value.
3. Should the town’s claim be accepted or rejected?
4. Sketch the situation.
END OF ASSIGNMENT 1
1.
a. Probability is the branch of mathematics that deals with the likelihood that an event occurs. The probability of a given event is expressed as a number between 0 and 1.
It is calculated by dividing the number of favourable outcomes by the total number of equally likely outcomes, which is represented mathematically as:
P(A) = (number of outcomes favourable to A) / (total number of possible outcomes)
Probability is measured on a scale ranging from 0 to 1, where 0 means that the event is impossible and 1 means that the event is certain to happen. Hence, an event becomes more probable as its probability moves closer to 1 and farther from 0.
b. Statistical independence means that the occurrence of one event has no effect on the probability of another event; the likelihood of either event is in no way influenced by whether the other occurs.
Whether two events (A and B) are independent can be identified by calculating P(A), P(B) and P(A∩B), and then checking whether P(A∩B) equals P(A)P(B). If P(A∩B) = P(A)P(B), A and B are independent events; if the two are not equal, A and B are dependent.
It can be represented diagrammatically through a Venn diagram of events A and B.
In the Venn diagram, the only common part between A and B is the overlap A∩B. The events are independent when the probability of this overlap, P(A∩B), is equal to P(A)P(B), which means that the two events can occur together without either affecting the probability of the other.
E.g. Drawing a spade from a deck of cards and rolling a 3 on a die do not affect each other and can occur at the same time independently. The likelihood of rolling a 3 stays the same whichever card is drawn, as does the likelihood of drawing a spade whatever the die shows.
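The card-and-die example can be checked by enumerating every equally likely outcome and comparing P(A∩B) with P(A)P(B). This is only a minimal sketch: the names `outcomes`, `prob`, `is_spade` and `is_three` are illustrative, and a standard 52-card deck and a fair six-sided die are assumed.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: one card from a standard 52-card deck, one roll of a fair die.
suits = ["spades", "hearts", "diamonds", "clubs"]
ranks = range(1, 14)
cards = list(product(suits, ranks))   # 52 (suit, rank) pairs
die = range(1, 7)
outcomes = list(product(cards, die))  # 312 equally likely (card, roll) pairs

def prob(event):
    """Probability of an event as favourable outcomes / total outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

def is_spade(outcome):
    return outcome[0][0] == "spades"

def is_three(outcome):
    return outcome[1] == 3

p_a = prob(is_spade)                                  # P(A)
p_b = prob(is_three)                                  # P(B)
p_ab = prob(lambda o: is_spade(o) and is_three(o))    # P(A∩B)

# Independence test: P(A∩B) must equal P(A)P(B).
independent = p_ab == p_a * p_b
print(p_a, p_b, p_ab, independent)
```

Exact fractions are used so that the equality test is not affected by floating-point rounding.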
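c.
Sales Units (x) | No. of days | p(x) | Cumulative F(x) | x p(x) | 2 or more | 2 or less | [x-E(x)]^2 | [x-E(x)]^2 p(x) |
0 | 5 | 0.05 | 0.05 | 0 | 0 | 0.05 | 8.1225 | 0.406125 |
1 | 15 | 0.15 | 0.2 | 0.15 | 0 | 0.2 | 3.4225 | 0.513375 |
2 | 20 | 0.2 | 0.4 | 0.4 | 0.2 | 0.4 | 0.7225 | 0.1445 |
3 | 25 | 0.25 | 0.65 | 0.75 | 0.45 | 0 | 0.0225 | 0.005625 |
4 | 20 | 0.2 | 0.85 | 0.8 | 0.65 | 0 | 1.3225 | 0.2645 |
5 | 15 | 0.15 | 1 | 0.75 | 0.8 | 0 | 4.6225 | 0.693375 |
Total | 100 | 1 | | 2.85 | 0.8 | 0.4 | Variance | 2.0275 |
The "2 or more" and "2 or less" columns are running totals of p(x) over the qualifying rows; their final values answer (3) and (4). The deviations in the last two columns are taken from the overall mean E(x) = 2.85.
(2) Average daily sales: E(x) = Σ x p(x) = 2.85 loaves
(3) P(x ≥ 2) = 0.2 + 0.25 + 0.2 + 0.15 = 0.80
(4) P(x ≤ 2) = 0.05 + 0.15 + 0.2 = 0.40
(5) Variance = Σ [x-E(x)]^2 p(x) = 2.0275
(6) Standard deviation = √ 2.0275
= 1.4239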
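The expected value, cumulative probabilities, variance and standard deviation of the sales distribution can be cross-checked outside Excel. The sketch below works directly from the day counts in the table and measures every deviation from the overall mean E(x) = Σ x p(x); the variable names are illustrative only.

```python
# Daily sales record from the table: units sold -> number of days.
days = {0: 5, 1: 15, 2: 20, 3: 25, 4: 20, 5: 15}
total_days = sum(days.values())

# Relative-frequency probabilities p(x).
p = {x: n / total_days for x, n in days.items()}

# Expected value E(x) = sum of x * p(x).
mean = sum(x * px for x, px in p.items())

# Variance = sum of [x - E(x)]^2 * p(x); the standard deviation is its square root.
variance = sum((x - mean) ** 2 * px for x, px in p.items())
std_dev = variance ** 0.5

# Cumulative probabilities for "2 or more" and "2 or less".
p_two_or_more = sum(px for x, px in p.items() if x >= 2)
p_two_or_less = sum(px for x, px in p.items() if x <= 2)

print(round(mean, 4), round(variance, 4), round(std_dev, 4))
print(round(p_two_or_more, 2), round(p_two_or_less, 2))
```

Run as written, this should give a mean of about 2.85 loaves, a variance of about 2.0275 and a standard deviation of about 1.4239.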
d) Average Sales= 4700
Standard Deviation= 500
Assuming that sales are normally distributed, we convert each sales figure to a Z value and look up the corresponding cumulative probability in the Standard Normal table.
1. P(X > 5500): Z = (Sales - Avg. Sales) / Std. Deviation
= (5500 - 4700)/500 = 1.6.
The cumulative probability for Z = 1.6 in the Standard Normal table = 0.9452. Since the Standard Normal table gives the area to the left of Z, we have to subtract 0.9452 from 1 to get the area to the right.
Hence, required probability = 1 - 0.9452
= 0.0548
2. P(X > 4500): Z = (Sales - Avg. Sales) / Std. Deviation
= (4500 - 4700)/500 = -0.4.
The cumulative probability for Z = -0.4 in the Standard Normal table = 0.3446.
Since the Standard Normal table gives the area to the left of Z, we have to subtract 0.3446 from 1 to get the area to the right.
Hence, required probability = 1 - 0.3446
= 0.6554
3. P(X < 4900): Z = (Sales - Avg. Sales) / Std. Deviation
= (4900 - 4700)/500 = 0.4.
The cumulative probability for Z = 0.4 in the Standard Normal table = 0.6554. This is already the area to the left of Z.
Hence, required probability = 0.6554
4. P(X < 4300): Z = (Sales - Avg. Sales) / Std. Deviation
= (4300 - 4700)/500 = -0.8.
The cumulative probability for Z = -0.8 in the Standard Normal table = 0.2119. This is already the area to the left of Z.
Hence, required probability = 0.2119
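The four table look-ups can be verified with Python's standard library, which provides the normal cumulative distribution directly. This is a sketch under the same assumption as above (sales normally distributed with mean 4,700 and standard deviation 500); the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model: orange sales ~ Normal(mean 4700, standard deviation 500).
sales = NormalDist(mu=4700, sigma=500)

# cdf(x) is the area to the left of x, so "greater than" needs 1 - cdf(x).
p_gt_5500 = 1 - sales.cdf(5500)
p_gt_4500 = 1 - sales.cdf(4500)
p_lt_4900 = sales.cdf(4900)
p_lt_4300 = sales.cdf(4300)

for label, value in [
    ("P(X > 5500)", p_gt_5500),
    ("P(X > 4500)", p_gt_4500),
    ("P(X < 4900)", p_lt_4900),
    ("P(X < 4300)", p_lt_4300),
]:
    print(label, round(value, 4))
```

The results should agree with the table values to four decimal places, since the table itself is rounded.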
2.
1. Age and sex structure of Australian population:
0-14 years: 17.8% (male 2,122,139/female 2,012,670)
15-24 years: 12.79% (male 1,524,368/female 1,446,663)
25-54 years: 41.45% (male 4,903,130/female 4,725,976)
55-64 years: 11.83% (male 1,363,331/female 1,384,036)
65 years and over: 16.14% (male 1,736,951/female 2,013,149) (2017 est.)
Reference : https://www.indexmundi.com/australia/age_structure.html
2.
Age | Population in the age group | Percentage in total population | Male in the total population | Female in the total population |
0-14 | 4,134,809 | 17.80% | 2,122,139 | 2,012,670 |
15-24 | 2,971,031 | 12.79% | 1,524,368 | 1,446,663 |
25-54 | 9,629,106 | 41.45% | 4,903,130 | 4,725,976 |
55-64 | 2,747,367 | 11.83% | 1,363,331 | 1,384,036 |
65 years and above | 3,750,100 | 16.14% | 1,736,951 | 2,013,149 |
Total | 23,232,413 | 100% | 11,649,919 | 11,582,494 |
(i) Probability = 11,582,494 / 23,232,413
= 49.85%
(ii) Probability = 9,629,106/ 23,232,413
= 41.45%
(iii) Joint probability of being male and aged 55-64 = males aged 55-64 / total population
Probability = 1,363,331 / 23,232,413
= 5.87%
(iv) Since the question states that the person is a female, the sample space is restricted to the female population. Hence the conditional probability is:
Probability = (4,725,976 + 1,384,036) / 11,582,494
= 6,110,012 / 11,582,494
= 52.75%
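The marginal, joint and conditional probabilities can be recomputed from the population counts in the table. The following is a minimal sketch; the dictionary layout and variable names are illustrative, while the figures are the 2017 estimates quoted above.

```python
# Population counts by age group and sex (2017 estimates quoted in the table).
population = {
    "0-14": {"male": 2_122_139, "female": 2_012_670},
    "15-24": {"male": 1_524_368, "female": 1_446_663},
    "25-54": {"male": 4_903_130, "female": 4_725_976},
    "55-64": {"male": 1_363_331, "female": 1_384_036},
    "65+": {"male": 1_736_951, "female": 2_013_149},
}

total = sum(sum(group.values()) for group in population.values())
females = sum(group["female"] for group in population.values())

# (i) Marginal probability of selecting a female.
p_female = females / total

# (ii) Marginal probability of selecting someone aged 25-54.
p_25_54 = sum(population["25-54"].values()) / total

# (iii) Joint probability: male AND aged 55-64, over the whole population.
p_male_and_55_64 = population["55-64"]["male"] / total

# (iv) Conditional probability: aged 25-64 GIVEN female, over females only.
females_25_64 = population["25-54"]["female"] + population["55-64"]["female"]
p_25_64_given_female = females_25_64 / females

print(f"{p_female:.2%} {p_25_54:.2%} {p_male_and_55_64:.2%} {p_25_64_given_female:.2%}")
```

Note the different denominators: the joint probability divides by the whole population, whereas the conditional probability divides by the female total only.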
QUESTION 3 Statistical Decision Making and Quality Control
(a) A company wishes to set control limits for monitoring the direct labour time to produce an important product. Over the past the mean time has been 30 hours with a standard deviation of 10 hours and is believed to be normally distributed. The company proposes to collect random samples of 64 observations to monitor labour time.
If management wishes to establish x-bar control limits covering the 95% confidence interval, calculate the appropriate UCL and LCL.
We know that:
N = 64
Mean (µ) = 30
Standard Deviation (σ) = 10
Also, the 95% control limits for the sample mean are given by µ ± z(α/2) σ/√n
At 95% CI,
α = 0.05, so zα/2 = z0.025 = 1.96 from the table of Normal distribution.
Then, the 95% confidence interval for µ is calculated as follows:
= 30 ± (1.96)*10/√64
= 30 ± (1.96)*1.25
= 30 ± 2.45
= 32.45 and 27.55
Clearly, the appropriate UCL and LCL at 95% CI are 32.45 hours and 27.55 hours, respectively.
If management wishes to use smaller samples of 16 observations; calculate the control limits covering the 95% confidence interval.
Now, we know that:
N = 16
Mean (µ) = 30
Standard Deviation (σ) = 10
Also, the 95% control limits for the sample mean are given by µ ± z(α/2) σ/√n
At 95% CI,
α = 0.05, so zα/2 = z0.025 = 1.96 from the table of Normal distribution.
Then, the 95% confidence interval for µ is calculated as follows:
= 30 ± (1.96)*10/√16
= 30 ± (1.96)*2.5
= 30 ± 4.9
= 34.9 and 25.1
Clearly, the appropriate UCL and LCL at 95% CI are 34.9 hours and 25.1 hours, respectively.
Management is considering three alternative procedures in order to maintain tighter control over labour time:
1. Sampling more frequently using 16 observations and setting confidence intervals of 90%
2. Maintaining 95% confidence intervals and increasing sample size to 64 observations
3. Setting 95% confidence intervals and using sample sizes of 36 observations.
Calculate the control limits for each of the 3 alternatives.
Which procedure will provide the narrowest control limits? What are they?
For the first alternative (n = 16, 90% confidence), we know that,
N = 16; Mean (µ) = 30; Standard Deviation (σ) = 10; α = 0.10, so zα/2 = z0.05 = 1.645 from the table of Normal distribution.
Then, the 90% confidence interval for µ is calculated as follows:
= 30 ± (1.645)*10/√16
= 30 ± (1.645)*2.5
= 30 ± 4.1125
= 34.1125 hours and 25.8875 hours
For the second alternative (n = 64, 95% confidence), we know that,
N = 64; Mean (µ) = 30; Standard Deviation (σ) = 10; α = 0.05, so zα/2 = z0.025 = 1.96 from the table of Normal distribution.
Then, the 95% confidence interval for µ is calculated as follows:
= 30 ± (1.96)*10/√64
= 30 ± (1.96)*1.25
= 30 ± 2.45
= 32.45 hours and 27.55 hours
For the third alternative (n = 36, 95% confidence), we know that,
N = 36; Mean (µ) = 30; Standard Deviation (σ) = 10; α = 0.05, so zα/2 = z0.025 = 1.96 from the table of Normal distribution.
Then, the 95% confidence interval for µ is calculated as follows:
= 30 ± (1.96)*10/√36
= 30 ± (1.96)*1.6667
= 30 ± 3.2667
= 33.2667 hours and 26.7333 hours
The above results can be presented as follows (rounded off to one decimal):
CI | N | UCL (hrs) | LCL (hrs) | Range (hrs) |
90% | 16 | 34.1 | 25.9 | 8.2 |
95% | 64 | 32.5 | 27.6 | 4.9 |
95% | 36 | 33.3 | 26.7 | 6.5 |
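Clearly, the tightest control limits are in scenario 2, where CI = 95% and n = 64: 27.55 hours to 32.45 hours. This is because the standard error σ/√n shrinks as the sample size grows, and n = 64 gives the smallest standard error (1.25 hours). A higher confidence level on its own widens the interval, but here the larger sample more than offsets the smaller z multiplier of the 90% alternative.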
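The three alternatives can be compared with a small helper that applies µ ± z σ/√n for any sample size and confidence level. This is a sketch only: `control_limits` and `alternatives` are illustrative names, and the exact z values from the standard library differ very slightly from the rounded table values 1.645 and 1.96.

```python
from math import sqrt
from statistics import NormalDist

def control_limits(mean, sigma, n, confidence):
    """Return (LCL, UCL) for the sample mean: mean ± z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z multiplier
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width

# Process parameters from the question: mean 30 hours, standard deviation 10 hours.
alternatives = {
    "n=16, 90%": control_limits(30, 10, 16, 0.90),
    "n=64, 95%": control_limits(30, 10, 64, 0.95),
    "n=36, 95%": control_limits(30, 10, 36, 0.95),
}

for name, (lcl, ucl) in alternatives.items():
    print(f"{name}: LCL={lcl:.2f} UCL={ucl:.2f} width={ucl - lcl:.2f}")

# The narrowest limits are those with the smallest UCL - LCL.
narrowest = min(alternatives, key=lambda k: alternatives[k][1] - alternatives[k][0])
print("Narrowest:", narrowest)
```

Writing the limits as a function of n and the confidence level makes it easy to see that the width is driven by z/√n, so a fourfold increase in sample size halves the interval.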
(b) Hypothesis testing
Active Insurance Company’s rates for fire insurance depend on the distance a home is from the nearest fire station. A progressive community claims that the average home in its town is within 5.5 km of the nearest fire station.
Active took a sample of 64 homes, which produced a mean of 5.8 km from the nearest fire station. Is there sufficient evidence to refute the town’s contention that the mean distance is not greater than the claimed 5.5 km if σ (sigma) = 2.4 km? Use α = 0.05.
We know that,
n = 64; Mean (µ) = 5.5; Standard Deviation (σ) = 2.4; α = 0.05, so z(α/2) = z(0.025) = 1.96 from the table of Normal distribution.
Then, the two-sided 95% interval for the sample mean around the claimed µ is calculated as follows:
= 5.5 ± (1.96)*2.4/√64
= 5.5 ± (1.96)*0.3
= 5.5 ± 0.588
= 6.09 km and 4.91 km
The sample mean of 5.8 km lies within the above interval.
1. Show the null and alternative hypotheses.
H0: µ ≤ 5.5 (the town's claim)
H1: µ > 5.5
2. Calculate the critical value.
We will use a one-tailed (right) test with α = 0.05, because only a mean distance greater than 5.5 km would refute the claim. Since σ is known and n is reasonably large, we will use a one-sample z-test. The corresponding critical value from the table is z = 1.645, which is equivalent to a critical sample mean of 5.5 + (1.645)*0.3 = 5.99 km.
The Z statistic is calculated as follows:
Z = (x̄ − µ) / (σ/√n)
Z = (5.8 − 5.5)/(2.4/√64)
Z = 0.3/0.3
Z = 1
Since it is a right-tailed test, the p-value is the area to the right of Z = 1.0. The table gives 0.8413 to the left of Z = 1.0, so the p-value is 1 − 0.8413 = 0.1587.
Since Z = 1 is less than the critical value of 1.645 (equivalently, 0.1587 is greater than 0.05), this result is not significant at α = 0.05. Hence, we do not have sufficient evidence to reject the null hypothesis, and the town's claim cannot be refuted.
The sample mean of 5.8 km is higher than the claimed 5.5 km, but a difference of this size can arise from sampling variation alone; there is no significant evidence that the mean distance is greater than 5.5 km.
It should also be noted that the sample mean of 5.8 km is below the critical sample mean of 5.99 km and within the two-sided interval of 4.91-6.09 km.
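The test can be reproduced numerically. The sketch below treats it as an upper-tailed one-sample z-test (H1: µ > 5.5), which is an assumption about how the town's claim is framed; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the question.
mu0 = 5.5      # claimed mean distance (km)
sigma = 2.4    # known population standard deviation (km)
n = 64         # sample size
xbar = 5.8     # observed sample mean (km)
alpha = 0.05   # significance level

std_error = sigma / sqrt(n)            # standard error of the sample mean
z_stat = (xbar - mu0) / std_error      # test statistic

# Upper-tailed test (assumed): reject H0 only if z_stat exceeds the critical z.
z_crit = NormalDist().inv_cdf(1 - alpha)
xbar_crit = mu0 + z_crit * std_error   # same critical value on the km scale
p_value = 1 - NormalDist().cdf(z_stat)

reject_h0 = z_stat > z_crit
print(round(z_stat, 3), round(z_crit, 3), round(xbar_crit, 2), round(p_value, 4), reject_h0)
```

Comparing the statistic with the critical value and comparing the p-value with α are equivalent decision rules, so both should lead to the same conclusion.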