# Statistics Assignment on Version of Laptop for Window User Assignment Answer

Dataset overview

Suppose market research company XYZ shows potential buyers two versions of a laptop and asks them how much they would pay for version 1 and version 2. Some of the buyers come from region A and some come from region B; for example, region A could be Sydney and region B could be outside of Sydney. A dataset showing the buyers' answers is given on Moodle.
The variables of the data set are:
“Which region? Region A (A) or Region B (B)”
“How much would you pay for version 1?”
“How much would you pay for version 2?”
“Would you pay more for version 1? Yes (y) or No (n)”

The columns “student number” and “which sample” are not variables to analyse; they are used to give every student their own sample.

Task overview

You need to submit a discussion of your dataset as a Word file.
You also need to submit an Excel file that shows you can summarize the dataset without using the automatic dataset summarizer and p-value calculator.

Instructions explaining how to discuss the dataset in a word file

a) Use the automatic dataset summarizer to make a summary that lets you investigate the relationship between the variables “Which region?” and “How much would they pay for version 1?”.  Paste the summary into the word file. Briefly comment on the relationship between the variables.

b) Use the automatic dataset summarizer to make a graph that lets you investigate the relationship between the variables “How much they would pay for version 1” and “How much would they pay for version 2” and paste it into the word file. Briefly comment on the relationship between the variables.

c) Use the automatic dataset summarizer to make a summary that lets you investigate the relationship between the variables “Which region?” and “would you pay more for version 1” and paste the summary into the word file. Briefly comment on the relationship between the variables.

d) Find a 95% confidence interval for the variable “would you pay more for version 1”

e) Find the test stat for testing the claim that people would pay more than \$1000 for version 1 on average.

This is the same as finding the zscore for the sample mean if you assume the population mean is 1000 and the population  standard deviation is the same as the sample  standard deviation.

To answer this question you need to find the sample size, mean and standard deviation of the variable
“how much would you pay for version 1”. This is not difficult: you can use the =count(), =average() and =stdev() commands, and you would also get the relevant information when you do part (b) of the excel task.

f) Test the claim there is a relationship between the variables “Which region?” and “How much would they pay for version 1?”. Use the automatic dataset summarizer to get the p-value. Interpret the p-value in simple terms.  You are not required to discuss H0 and H1.

g) Test the claim there is a relationship between the variables “Which region?” and “Would they pay more for version 1?”. Use the automatic dataset summarizer to get the p-value. Interpret the p-value in simple terms. You are not required to discuss H0 and H1.

h) Briefly describe some other variables that could be used in a dataset if you wanted to help a business that makes laptops and explain why the variables would be useful. (300 words)

i) Briefly describe what is meant by the phrase “lurking variables” and explain why it is important to consider lurking variables when writing a report (300 words)

j) Briefly describe how you would make a report that uses the information from at least 5 of the previous parts of the assignment, the previous parts of the assignment are parts (a),(b),(c),(d),(e),(f),(g),(h),(i) given above  (300 words)

Instructions for the excel file: demonstrate that you can make summaries without using the automatic dataset summarizer

1. Paste your dataset into the excel file
2. Use a pivot table to make a summary that lets you investigate the relationship between the variables “Which region?” and “How much would they pay for version 1”
3. Make a graph that lets you investigate the relationship between the variables
“How much they would pay for version 1” and “How much would they pay for version 2”.
4. Use a pivot table to make a summary that lets you investigate the relationship between the variables “Which region?” and “would you pay more for version 1”

## Answer

Statistics Assignment

Window User

Use the automatic dataset summarizer to make a summary that lets you investigate the relationship between the variables “Which region?” and “How much would they pay for version 1?”.  Paste the summary into the word file. Briefly comment on the relationship between the variables.

Pivot table output as generated in Microsoft Excel is as follows:

| Row Labels | Average of how much would they pay for version 1? |
|---|---|
| A | \$986.39 |
| B | \$1,047.73 |
| Grand Total | \$1,016.45 |

For version 1, people in Region A are willing to pay an average of \$986.39, while people in Region B are willing to pay an average of \$1,047.73, so respondents in Region B would pay about \$61 more on average.
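As a quick consistency check on the pivot table, the grand total should equal the size-weighted average of the two region means. The sketch below assumes the region sample sizes of 102 and 98 that are reported in the count tables later in this answer; the dictionary layout and function name are illustrative only.

```python
# Region means come from the pivot table above; the sample sizes (102 and 98)
# are taken from the count tables reported later in this answer.
regions = {
    "A": {"n": 102, "mean_v1": 986.39},
    "B": {"n": 98, "mean_v1": 1047.73},
}


def weighted_mean(groups):
    """Combine per-group means into an overall mean, weighting by group size."""
    total_n = sum(g["n"] for g in groups.values())
    total_sum = sum(g["n"] * g["mean_v1"] for g in groups.values())
    return total_sum / total_n


grand_mean = weighted_mean(regions)
difference = regions["B"]["mean_v1"] - regions["A"]["mean_v1"]

print(f"Grand mean: {grand_mean:.2f}")
print(f"Region B minus Region A: {difference:.2f}")
```

The weighted mean agrees with the Grand Total row to the cent, which suggests the region averages and counts are mutually consistent.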

Price is a continuous variable because it can take any value within a range. Region is a nominal variable that sorts respondents into categories.

Use the automatic dataset summarizer to make a graph that lets you investigate the relationship between the variables “How much they would pay for version 1” and “How much would they pay for version 2” and paste it into the word file. Briefly comment on the relationship between the variables.

Pivot table output as generated in Microsoft Excel is as follows:

| Row Labels | Average of how much would they pay for version 1? | Average of how much would they pay for version 2? |
|---|---|---|
| A | \$986.39 | \$986.05 |
| B | \$1,047.73 | \$927.20 |
| Grand Total | \$1,016.45 | \$957.22 |

The table was also used to create a bar graph, since a graph is easier to read. In Region A there is very little difference between the average prices for Version 1 and Version 2 (\$986.39 versus \$986.05), while in Region B the difference is large (\$1,047.73 versus \$927.20). The two regions therefore show sharply contrasting patterns.

Use the automatic dataset summarizer to make a summary that lets you investigate the relationship between the variables “Which region?” and “would you pay more for version 1” and paste the summary into the word file. Briefly comment on the relationship between the variables.

Pivot table output as generated in Microsoft Excel is as follows:

| Count of Would they pay more for version 1? | n | y | Grand Total |
|---|---|---|---|
| A | 55 | 47 | 102 |
| B | 34 | 64 | 98 |
| Grand Total | 89 | 111 | 200 |

In Region A, 55 people are not willing and 47 people are willing to pay more for Version 1, whereas in Region B, 34 people are not willing and 64 are willing. A far larger share of Region B respondents (64 of 98) than Region A respondents (47 of 102) would pay more for Version 1.

Find a 95% confidence interval for the variable “would you pay more for version 1”

The required information is as follows:

| Count of Would they pay more for version 1? | n | y | Sample Size | Standard Error | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|
| A | 53.9% | 46.1% | 102 | 0.04935 | 0.3640 | 0.5575 |
| B | 34.7% | 65.3% | 98 | 0.04808 | 0.5588 | 0.7473 |

• The counts of people who answered ‘y’ or ‘n’ in each region have been converted to percentages of that region's sample size. This gives the proportion of people who answered ‘y’.
• Standard error of the sample proportion is calculated as: $s_p = \sqrt{p(1-p)/n}$, where $p$ is the proportion who answered ‘y’ and $n$ is the region's sample size.
• 95% confidence interval limits have been calculated as:
• Lower: p - 1.96*Standard Error
• Upper: p + 1.96*Standard Error
• 95% confidence interval for the proportion of Region A is [0.3640, 0.5575]
• 95% confidence interval for the proportion of Region B is [0.5588, 0.7473]
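The interval calculation above can be reproduced from the raw counts in the earlier pivot table (47 of 102 in Region A and 64 of 98 in Region B answered ‘y’). This is a minimal sketch using the normal-approximation interval described in the bullets; the function and variable names are illustrative.

```python
from math import sqrt


def proportion_ci(successes, n, z=1.96):
    """Return (p, standard error, lower, upper) for a normal-approximation CI."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)
    return p, se, p - z * se, p + z * se


# (number answering 'y', sample size) per region, from the count table above.
counts = {"A": (47, 102), "B": (64, 98)}
results = {region: proportion_ci(y, n) for region, (y, n) in counts.items()}

for region, (p, se, lower, upper) in results.items():
    print(f"Region {region}: p={p:.4f}, SE={se:.5f}, "
          f"95% CI=[{lower:.4f}, {upper:.4f}]")
```

The two intervals do not overlap, which is consistent with the observation that a larger share of Region B respondents would pay more for Version 1.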

Find the test stat for testing the claim that people would pay more than \$1000 for version 1 on average.

A one-sample z-test, which compares the sample mean against the fixed value of \$1,000, will be set up to test the following:

• Research Hypothesis: People would pay more than \$1000 for version 1 on average
• Null hypothesis and Alternative hypothesis:
• Null H0: µ = 1000
• Alternative H1: µ > 1000
• Significance level assumed α = 0.05
• Using the Z.TEST function in Microsoft Excel, p-value = 0.0000 (to four decimal places)
• Conclusion: Since p-value = 0.0000 is less than assumed significance level of α = 0.05, we can reject the null hypothesis. Hence, we can conclude that at α = 0.05, there is statistically significant evidence that people would pay more than \$1,000 for Version 1 on average.
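The test statistic behind this p-value is the z-score of the sample mean, $z = (\bar{x} - \mu_0) / (s / \sqrt{n})$. The sketch below shows the calculation; the sample mean (1016.45) and sample size (200) come from the pivot tables in this answer, but the sample standard deviation is not reported here, so `HYPOTHETICAL_SD` is a placeholder that must be replaced with the value from `=stdev()` on the real data. With the placeholder, the output will not match the p-value quoted above.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, hypothesized_mean, sample_sd, n):
    """Return (z, one-sided p-value) for H1: population mean > hypothesized_mean."""
    standard_error = sample_sd / sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    p_value = 1 - NormalDist().cdf(z)  # upper-tail area of the standard normal
    return z, p_value


# Placeholder only: the real sample standard deviation is not given in this answer.
HYPOTHETICAL_SD = 100.0

z, p_value = one_sample_z(
    sample_mean=1016.45,     # Grand Total average for version 1
    hypothesized_mean=1000,  # value stated in H0
    sample_sd=HYPOTHETICAL_SD,
    n=200,                   # total sample size
)
print(f"z = {z:.4f}, one-sided p-value = {p_value:.4f}")
```

The one-sided upper-tail p-value matches the direction of H1 (µ > 1000), which is also what Excel's Z.TEST returns.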

Test the claim there is a relationship between the variables “Which region?” and “How much would they pay for version 1?”. Use the automatic dataset summarizer to get the p-value. Interpret the p-value in simple terms.  You are not required to discuss H0 and H1.

Pivot table output as generated in Microsoft Excel is as follows:

| Row Labels | Average of how much would they pay for version 1? | Count of how much would they pay for version 1? |
|---|---|---|
| A | 986.39 | 102 |
| B | 1,047.73 | 98 |
| Grand Total | 1,016.45 | 200 |

The above table gives the average price people are willing to pay for Version 1 in Region A and Region B, along with the sample size of each region.

Filter: how much would they pay for version 1? = (Multiple Items)

| Row Labels | Count of how much would they pay for version 1? |
|---|---|
| A | 45 |
| B | 63 |
| Grand Total | 108 |

This table indicates the number of people who are willing to pay more than \$1000 for version 1 in Region A and Region B.

Using data from both tables, we can calculate proportion as follows:

Region A: 45 of the 102 respondents would pay more than \$1000, so p = 45/102 = 0.4412

Region B: 63 of the 98 respondents would pay more than \$1000, so p = 63/98 = 0.6429

Test the claim there is a relationship between the variables “Which region?” and “Would they pay more for version 1?”. Use the automatic dataset summarizer to get the p-value. Interpret the p-value in simple terms. You are not required to discuss H0 and H1.

Pivot table output as generated in Microsoft Excel is as follows:

| Count of Would they pay more for version 1? | n | y | Grand Total |
|---|---|---|---|
| A | 53.92% | 46.08% | 100.00% |
| B | 34.69% | 65.31% | 100.00% |
| Grand Total | 44.50% | 55.50% | 100.00% |

The table shows percentage of people in each region categorized by their willingness to pay (or not pay) more for Version 1. The pivot uses count expressed as percentage of row total to get required percentages. Hence,

• Region A: p = 0.4608
• Region B: p = 0.6531
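The answer stops at the two proportions. One conventional way to attach a p-value to a relationship between two categorical variables is a chi-square test of independence on the count table (55/47 for Region A, 34/64 for Region B). This is an assumption for illustration: it is not necessarily the method the automatic dataset summarizer uses, and no continuity correction is applied. The function name is hypothetical.

```python
from math import erfc, sqrt

# Observed counts from the earlier pivot table: region -> answer -> count.
observed = {
    "A": {"n": 55, "y": 47},
    "B": {"n": 34, "y": 64},
}


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 degree of freedom)."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    grand_total = sum(row_totals.values())

    chi2 = 0.0
    for r in rows:
        for c in cols:
            # Expected count if region and answer were independent.
            expected = row_totals[r] * col_totals[c] / grand_total
            chi2 += (table[r][c] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_square_2x2(observed)
print(f"chi-square = {chi2:.3f}, p-value = {p_value:.4f}")
```

In simple terms, a small p-value here would mean that a gap this large between the two regions' ‘y’ proportions would rarely arise by chance if region and willingness to pay more were unrelated.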

Briefly describe some other variables that could be used in a dataset if you wanted to help a business that makes laptops and explain why the variables would be useful. (300 words).

The above analysis focuses only on the price of the two versions of the laptop. However, there are other important variables, such as the demographics of the population (age, gender, educational level, profession, etc.), which can indicate demand for the laptops.

Apart from this, the features that people want in a laptop can also be recorded as variables to help the business decide which kinds of laptops to make in larger quantities. Examples include screen size, number of USB ports, RAM and processor.

If the above variables are studied in conjunction with the prices, it can give a much more comprehensive picture that can help the businesses making laptops in deciding the production line as well as the price points. Further, it can also assist the businesses making laptops to forecast demand for the period.

Briefly describe what is meant by the phrase “lurking variables” and explain why it is important to consider lurking variables when writing a report (300 words)

As the name suggests, a lurking variable is a hidden or unknown variable that is not part of the data analysis or research study. Its absence from the study does not mean that it is unimportant. Rather, a lurking variable has a strong relationship with the variables included in the study and can affect them positively or negatively. Because its effect is not accounted for, it can distort the data analysis and the conclusions drawn in a report.

Briefly describe how you would make a report that uses the information from at least 5 of the previous parts of the assignment, the previous parts of the assignment are parts (a),(b),(c),(d),(e),(f),(g),(h),(i) given above  (300 words)

Laptops have become a necessity today as the world becomes linked through digitization. However, with increasing technological developments, people want to own the latest technology and discard the older one, leading to a high rate of obsolescence. At the same time, there are various types of consumers who would be willing to pay varying amounts of money for a laptop, as seen above.

A business making laptops needs to forecast demand for the right type or version of laptops such that there is minimal risk of obsolescence. Further, it also needs to determine price points that are attractive to the customer and maximize profit for the company.

The above data analysis focuses on the price that customers are willing to pay. Further, the research is limited to two regions, A and B, and to two versions of the laptop, Version 1 and Version 2. In Region A, most people are not willing to pay more for Version 1, and the average prices for the two versions are very close. In Region B, on the other hand, most people are willing to pay more for Version 1, and the average prices for the two versions differ widely. Statistically, at a significance level of 0.05, people are willing to pay more than \$1,000 on average for the Version 1 laptop. At the 95% confidence level, a higher percentage of people were willing to pay more for Version 1 in Region B.

Hence, a business making laptops could target Region B for high-end versions of laptop and price them well over \$1,000 to earn higher profits.

However, if the above variable of price is studied in conjunction with other variables, such as features, demographics of population, etc., it can give a much more comprehensive picture that can help the businesses making laptops in deciding the production line as well as the price points. Further, it can also assist the businesses making laptops to forecast demand for the period.