**401077 Introduction to Biostatistics, Autumn 2019**

**Assignment 1 (Due Sunday August 18, 2019)**

Please answer all 5 questions. Record your answers in the template document provided and submit via Turnitin before 11:59pm on the due date. The marks allocated to each question are shown in the assignment. A total of 30 marks are available and this assignment is worth 30% of your overall grade.

All of the questions require you to analyse the unique assignment data set which I have created for you. This is labelled ‘dataforxxxxxxxx.RData’ where xxxxxxxx represents your Student ID number. The description of this data set is provided in the file ‘Description of your data set.docx’. You can find your data set and its description in the Assessment 1 folder in vUWS.

Note: Each student will get different answers as the data sets differ.

Question 1 (6 marks)

Using the data set on grandparent carers’ health assigned to you and R Commander:

- Is hand grip strength (the variable ‘grip’) categorical or continuous? Explain why. (1 mark)
- Graph the distribution of the WHO body mass classification (the variable ‘bmicat’) with appropriate axis labels. Write a sentence or two summarising the main characteristics of the distribution as shown in the graph. (2 marks)
- Write one or two sentences describing the distribution of Body Mass Index (the variable ‘bmi’) using appropriate summary statistics. (Hint: consider measures of centre, spread and shape. R Commander output alone is insufficient – write the answer in your own words.) (3 marks)

Question 2 (6 marks)

Using the data set on grandparent carers’ health assigned to you and R Commander:

- Graph respondents’ hand grip strength (the variable ‘grip’) against their WHO body mass classification (the variable ‘bmicat’). Using the graph alone, write a sentence or two describing the relationship between these two variables. (2 marks)
- Graph respondents’ hand grip strength (the variable ‘grip’) against their Body Mass Index (the variable ‘bmi’). Using the graph alone, write a sentence or two describing the form, direction and strength of this relationship. (4 marks)

Question 3 (6 marks)

Using the data set on grandparent carers’ health assigned to you and R Commander:

- Tabulate the relationship between occupation history (the variable ‘occ’) and WHO body mass classification (the variable ‘bmicat’). Include frequency counts and row or column percentages. (Note: R Commander output alone is insufficient – present your table(s) in Word with informative headings.) (1 mark)
- Using the results in part a) write a sentence or two describing the relationship between occupation history and WHO Body Mass Classification. (2 marks)
- If you were to select one person at random from this data set, what is the probability they would be an obese person with an occupation history of ‘heavy manual’? (1 mark)
- If you were to select one obese person at random from this data set, what is the probability they would have an occupation history of ‘heavy manual’? (1 mark)
- Are WHO Body Mass Classification (the variable ‘bmicat’) and occupation history (the variable ‘occ’) independent in this sample? Justify your answer. (1 mark)

Question 4 (6 marks)

Using the data set on grandparent carers’ health assigned to you and R Commander:

Suppose you were to select one person at random from this data set five times and record whether or not they are obese. That is, randomly select 1 person from the 233 and then put them back, then randomly select 1 person from the 233 then put them back, etc (5 times).

- Explain why the Binomial model would be an appropriate model for this scenario. (2 marks)
- Using the Binomial model, what is the probability that either 2 or 3 of these people would be obese? Explain how you derived this answer. (2 marks)
- Suppose the procedure for selecting 5 people at random described above was repeated many times. What is the mean number of obese people you would expect when averaged across all repetitions? Explain how you derived this answer. (2 marks)

Question 5 (6 marks)

Use the data set on grandparent carers’ health assigned to you and R Commander.

Suppose, in this sample, the distribution of participants’ hand grip strength (the variable ‘grip’) follows the Normal model.

- Calculate the mean and standard deviation of participants’ hand grip strength (the variable ‘grip’). (1 mark)
- Estimate the z-score for a participant who records a hand grip strength of 34 kilograms. Show your working. (1 mark)
- Using the Normal model, estimate the proportion of participants in this sample who would be expected to have a hand grip strength of 34 or more kilograms. (1 mark)
- Suppose you selected four random samples, each of 10 participants, from the 233 in your data set. Explain why the Central Limit Theorem (CLT) applies. Using the CLT, estimate the mean and standard deviation of the mean grip strengths across these four samples. (2 marks)
- Using the Normal model estimate the value of hand grip strength which is achieved or surpassed by only 5% of the grandparent carers. (1 mark)


**Question 1 (6 marks)**

- Hand grip strength (the variable “grip”) is a continuous variable: it is a numeric measurement (in kilograms) that can take any value within a range, with infinitely many possible values between any two values. A categorical variable, by contrast, takes one of a finite number of categories or groups.

- The graph shows that overweight is the most common category, obese is the next most common, and normal weight is by far the least common.
- The BMI values range from a minimum of 19.93 to a maximum of 41.29, with a centre close to 28.37.

Skewness = 0.5822436

Kurtosis = 3.413015

As the skewness is positive, the distribution of BMI is skewed to the right. The kurtosis is greater than 3 (the value for a Normal distribution), so the tails are somewhat heavier than those of a Normal distribution.

**Question 2 (6 marks)**

a) A histogram of grip strength by BMI classification is displayed below.

The graph shows that overweight respondents have higher grip strength than normal-weight and obese respondents. Normal-weight respondents have the lowest grip strength of the three groups.

b) A plot of hand grip strength against BMI is displayed below.

The correlation between bmi and grip strength is -0.267, so the relationship is negative but weak: respondents with a higher BMI tend to have slightly lower grip strength.

**Question 3 (6 marks)**

a)

*Table 1: Body Mass Classification by Occupation history*

| Occupation history | Normal | Overweight | Obese | Total |
|---|---|---|---|---|
| Heavy Manual | 8 | 37 | 20 | 65 |
| Other | 25 | 87 | 56 | 168 |
| Total | 33 | 124 | 76 | 233 |

(Created by using ‘bmicat’ and ‘occ’ variables)

*Table 2: Body Mass Classification by occupation history, as a percentage of all 233 respondents*

| Occupation history | Normal | Overweight | Obese |
|---|---|---|---|
| Heavy Manual | 3.43% | 15.88% | 8.58% |
| Other | 10.73% | 37.34% | 24.03% |
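The percentages in Table 2 can be checked against the counts in Table 1. The sketch below is a minimal illustration in Python rather than R Commander. The variable and function names are illustrative, and the counts are copied from Table 1. It computes each cell as a percentage of all respondents (as in Table 2) and also as a percentage of its row total, which is what the question refers to as row percentages.

```python
# Counts copied from Table 1 (rows: occupation history, columns: BMI class).
counts = {
    "Heavy Manual": {"Normal": 8, "Overweight": 37, "Obese": 20},
    "Other": {"Normal": 25, "Overweight": 87, "Obese": 56},
}

# Total number of respondents across all cells.
grand_total = sum(sum(row.values()) for row in counts.values())


def percentages(row, denominator):
    """Express each count in `row` as a percentage of `denominator`."""
    return {label: round(100 * n / denominator, 2) for label, n in row.items()}


# Percentage of all respondents in each cell (the layout used in Table 2).
total_pct = {occ: percentages(row, grand_total) for occ, row in counts.items()}

# Percentage within each occupation group (row percentages).
row_pct = {occ: percentages(row, sum(row.values())) for occ, row in counts.items()}

for occ in counts:
    print(occ, "| % of total:", total_pct[occ], "| % of row:", row_pct[occ])
```

Row percentages make the two occupation groups directly comparable even though one group is much larger than the other.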

We can see that normal-weight people with a heavy manual occupation history form the smallest group (8 people) and overweight people with an occupation history of other form the largest group (87 people). The cross tabulation also gives the probability that a person selected at random falls into each combination of occupation history and body mass classification.

From the table produced in R Commander, the probability that one person selected at random is an obese person with an occupation history of “heavy manual” is:

P = 20/233 (Total Number of observations)

P = 0.086

From the table produced in R Commander, the probability that one obese person selected at random has an occupation history of “heavy manual” is:

P = 20/76 (Total Number of obese people)

P = 0.26
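The two probabilities above differ only in their denominator: the first is a joint probability taken over all respondents, while the second is a conditional probability taken over obese respondents only. A minimal sketch in Python, with illustrative variable names and counts taken from Table 1, makes the distinction explicit.

```python
from fractions import Fraction

# Counts from Table 1.
obese_heavy_manual = 20   # obese AND heavy manual
obese_total = 76          # all obese respondents
n_total = 233             # all respondents

# Joint probability: P(obese and heavy manual).
p_joint = Fraction(obese_heavy_manual, n_total)

# Conditional probability: P(heavy manual | obese).
p_conditional = Fraction(obese_heavy_manual, obese_total)

print(f"P(obese and heavy manual) = {p_joint} = {float(p_joint):.3f}")
print(f"P(heavy manual | obese)   = {p_conditional} = {float(p_conditional):.3f}")
```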

To assess whether two variables are independent, we use the Chi-Square test. The null hypothesis is that the two variables are independent and the alternative hypothesis is that they are dependent. The p-value is above 0.05, so we fail to reject the null hypothesis: the sample provides no evidence against independence of WHO body mass classification and occupation history.
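The answer above does not report the test statistic. As a rough check, the Pearson Chi-Square statistic can be computed from the counts in Table 1. The sketch below is an illustration in Python, not the R Commander output. The names are illustrative, and the p-value uses the fact that for 2 degrees of freedom the upper-tail probability of the Chi-Square distribution is exp(-x/2).

```python
from math import exp

# Observed counts from Table 1 (rows: Heavy Manual, Other;
# columns: Normal, Overweight, Obese).
observed = [
    [8, 37, 20],
    [25, 87, 56],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson Chi-Square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1) = 2 for a 2 x 3 table.
df = (len(observed) - 1) * (len(col_totals) - 1)

# For df = 2 the upper-tail probability has the closed form exp(-x / 2).
p_value = exp(-chi_square / 2)

print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.3f}")
```

With these counts the p-value comes out well above 0.05, in line with the conclusion stated above.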

**Question 4 (6 marks)**

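The Binomial model describes the number of successes in a fixed number of independent trials that each have the same probability of success. Here one person is selected at random, the outcome is either obese (success) or not obese, and the process is repeated 5 times. Because each person is put back before the next selection, the trials are independent and the probability of selecting an obese person stays the same, so the Binomial model is appropriate for this scenario.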

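Total number of obese people = 76

Total number of people = 233

Probability of selecting an obese person = 76/233 ≈ 0.326

Probability that 2 or 3 of the 5 people are obese = P(X = 2) + P(X = 3) = 0.4827864 (Binomial model with n = 5 and probability of success 0.326)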

The mean of a Binomial distribution is calculated as n*P, where n is the number of trials and P is the probability of success. Here n = 5 and the probability of success is 0.326, so the mean is 5*0.326 = 1.63. So, averaged across many repetitions, we would expect 1.63 obese people per set of 5 selections.
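The Binomial calculations in this question can be reproduced directly from the formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below is a minimal Python illustration, not the R Commander procedure. It assumes the success probability is rounded to 0.326 (76/233), and the function name `binom_pmf` is illustrative.

```python
from math import comb

n = 5        # number of selections (trials)
p = 0.326    # probability of selecting an obese person, 76/233 rounded


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability that exactly 2 or exactly 3 of the 5 people are obese.
prob_2_or_3 = binom_pmf(2, n, p) + binom_pmf(3, n, p)

# Expected number of obese people per set of 5 selections.
mean_obese = n * p

print(f"P(X = 2 or X = 3) = {prob_2_or_3:.4f}")
print(f"Mean = {mean_obese:.2f}")
```

Using the unrounded probability 76/233 instead of 0.326 changes the result only in the fourth decimal place.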

**Question 5 (6 marks)**

The mean hand grip strength is 32.19 kg and the standard deviation is 3.94 kg.

The z-score for a participant who records a hand grip strength of 34 kilograms is z = (34 - 32.19)/3.94 = 1.81/3.94 ≈ 0.46.
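The z-score, and the Normal-model proportion of participants at or above 34 kg that follows from it, can be sketched as below. This is an illustration in Python using the standard library's `statistics.NormalDist`, with the rounded mean and standard deviation reported above. Results from unrounded R Commander output may differ slightly.

```python
from statistics import NormalDist

mean_grip = 32.19   # reported mean grip strength (kg)
sd_grip = 3.94      # reported standard deviation (kg)
threshold = 34      # grip strength of interest (kg)

# z-score: how many standard deviations the threshold lies above the mean.
z = (threshold - mean_grip) / sd_grip

# Upper-tail area of the standard Normal distribution beyond z,
# i.e. the expected proportion with grip strength of 34 kg or more.
proportion_at_or_above = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"Proportion with grip >= {threshold} kg: {proportion_at_or_above:.3f}")
```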

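The Central Limit Theorem states that the means of random samples are approximately Normally distributed, with mean equal to the population mean and standard deviation equal to the population standard deviation divided by the square root of the sample size. The data given to us are almost Normally distributed, so this approximation is reasonable even for samples of 10 participants.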

By using the CLT:

32.83245, 31.93303, 33.39408, 31.57391
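Under the CLT, the mean of the sample means equals the population mean and their standard deviation (the standard error) equals the population standard deviation divided by the square root of the sample size. The following is a minimal sketch in Python that assumes the rounded mean (32.19) and standard deviation (3.94) reported earlier and a sample size of 10. The variable names are illustrative.

```python
from math import sqrt

mean_grip = 32.19    # reported mean grip strength (kg)
sd_grip = 3.94       # reported standard deviation (kg)
sample_size = 10     # participants per sample

# CLT: the sample means are centred on the population mean ...
mean_of_sample_means = mean_grip

# ... with standard deviation sigma / sqrt(n) (the standard error).
standard_error = sd_grip / sqrt(sample_size)

print(f"Mean of sample means: {mean_of_sample_means:.2f} kg")
print(f"Standard deviation of sample means: {standard_error:.3f} kg")
```

The standard error depends on the size of each sample (10), not on how many samples are drawn.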

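The value of hand grip strength which is achieved or surpassed by only 5% of the grandparent carers is the 95th percentile of the Normal model: 32.19 + 1.645 × 3.94 ≈ 38.67 kg. (The value 25.70584 is the 5th percentile, which is achieved or surpassed by 95% of the carers.)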
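A value that only 5% of a Normal distribution achieves or surpasses is its 95th percentile, whereas the 5th percentile is the value that 95% achieve or surpass. The sketch below computes both in Python with `statistics.NormalDist`. It assumes the rounded mean and standard deviation reported earlier, so its output can differ slightly from figures based on unrounded data.

```python
from statistics import NormalDist

# Normal model for grip strength, using the reported mean and SD (kg).
grip_model = NormalDist(mu=32.19, sigma=3.94)

# 95th percentile: only 5% of the distribution lies at or above this value.
upper_5_percent_cutoff = grip_model.inv_cdf(0.95)

# 5th percentile: 95% of the distribution lies at or above this value.
lower_5_percent_cutoff = grip_model.inv_cdf(0.05)

print(f"Surpassed by only 5%: {upper_5_percent_cutoff:.2f} kg")
print(f"Surpassed by 95%:     {lower_5_percent_cutoff:.2f} kg")
```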
