401077 Introduction to Biostatistics, Autumn 2019
Assignment 1 (Due Sunday August 18, 2019)
Please answer all 5 questions. Record your answers in the template document provided and submit via Turnitin before 11:59pm on the due date. The marks allocated to each question are shown in the assignment. A total of 30 marks are available and this assignment is worth 30% of your overall grade.
All of the questions require you to analyse the unique assignment data set which I have created for you. This is labelled ‘dataforxxxxxxxx.RData’ where xxxxxxxx represents your Student ID number. The description of this data set is provided in the file ‘Description of your data set.docx’. You can find your data set and its description in the Assessment 1 folder in vUWS.
Note: Each student will get different answers as the data sets differ.
Question 1 (6 marks)
Using the data set on grandparent carers’ health assigned to you and R Commander:
Question 2 (6 marks)
Using the data set on grandparent carers’ health assigned to you and R Commander:
Question 3 (6 marks)
Using the data set on grandparent carers’ health assigned to you and R Commander:
Question 4 (6 marks)
Using the data set on grandparent carers’ health assigned to you and R Commander:
Suppose you were to select one person at random from this data set five times and record whether or not they are obese. That is, randomly select 1 person from the 233 and then put them back, then randomly select 1 person from the 233 and put them back, and so on (5 times in total).
Question 5 (6 marks)
Use the data set on grandparent carers’ health assigned to you and R Commander.
Suppose that, in this sample, the distribution of participants’ hand grip strength (the variable ‘grip’) follows the Normal model.
Question 1 (6 marks)
Skewness= 0.5822436
Kurtosis= 3.413015
As the skewness is positive, the distribution of BMI is skewed to the right. The kurtosis is greater than 3 (the value for a Normal distribution), so the distribution is leptokurtic: its tails are heavier than those of a Normal distribution.
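As an illustration of where these two numbers come from, here is a minimal Python sketch of moment-based skewness and kurtosis. It assumes the definitions g1 = m3 / m2^(3/2) and b2 = m4 / m2^2, where m_k is the k-th central moment with divisor n, so a Normal distribution has b2 = 3. Software packages differ on this point: some report a bias-adjusted version or "excess" kurtosis (b2 - 3), and with that convention the Normal reference value is 0 rather than 3. The BMI values below are hypothetical and are not the assignment data.

```python
# Moment-based sample skewness and kurtosis (a sketch; other software may
# report bias-adjusted or "excess" versions of these statistics).


def central_moment(data, k):
    """Return the k-th central moment, dividing by n (not n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def skewness(data):
    """g1 = m3 / m2^(3/2); positive values indicate a right (positive) skew."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """b2 = m4 / m2^2; equals 3 for a Normal distribution (not excess kurtosis)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


if __name__ == "__main__":
    # Hypothetical BMI values for illustration only, not the assignment data.
    bmi = [22.1, 24.5, 25.3, 26.0, 27.8, 28.4, 29.9, 31.2, 34.6, 41.0]
    print(f"skewness = {skewness(bmi):.3f}, kurtosis = {kurtosis(bmi):.3f}")
```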
Question 2 (6 marks)
a) A histogram of grip strength by BMI classification is displayed below.
The plot suggests that overweight participants have higher grip strength than normal-weight and obese participants, and that normal-weight participants have the lowest grip strength of the three groups.
b) A plot of BMI against hand grip strength is displayed below.
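The correlation between BMI and grip strength is -0.267, which indicates a weak negative linear association: participants with higher grip strength tend to have slightly lower BMI. Because this is an observational correlation, it does not show that increasing grip strength would reduce BMI.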
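The sketch below computes a correlation coefficient of this kind in Python. It assumes the -0.267 above is a Pearson (product-moment) coefficient, which is the usual default. The (BMI, grip) pairs are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations (the 1/(n-1) factors cancel).
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical (BMI, grip) pairs for illustration only.
    bmi = [23.0, 25.5, 27.1, 28.4, 30.2, 33.8]
    grip = [35.2, 33.9, 34.5, 31.0, 30.8, 29.6]
    print(f"r = {pearson_r(bmi, grip):.3f}")
```

The coefficient always lies between -1 and 1, and values near 0 (such as -0.267) indicate only a weak linear relationship.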
Question 3 (6 marks)
a)
Table 1: Body Mass Classification by Occupation History
Occupation | Normal | Overweight | Obese | Total |
Heavy Manual | 8 | 37 | 20 | 65 |
Other | 25 | 87 | 56 | 168 |
Total | 33 | 124 | 76 | 233 |
(Created using the ‘bmicat’ and ‘occ’ variables.)
Table 2: Body Mass Classification by Occupation History (percentage of all 233 participants)
Occupation | Normal | Overweight | Obese |
Heavy Manual | 3.43% | 15.88% | 8.58% |
Other | 10.73% | 37.34% | 24.03% |
The smallest group is normal-weight participants with a heavy manual occupation history (8 people, 3.43%), and the largest is overweight participants with an “other” occupation history (87 people, 37.34%). Because every percentage in Table 2 is a share of all 233 participants, each one is also the probability that a person selected at random falls into that combination of BMI category and occupation history.
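Each percentage in Table 2 is a cell count from Table 1 divided by the grand total of 233. The short Python sketch below reproduces that calculation. The dictionary layout and the function name `cell_percentages` are illustrative choices and are not part of the original R Commander analysis.

```python
# Counts from Table 1 (rows: occupation history; columns: BMI category).
counts = {
    "Heavy Manual": {"Normal": 8, "Overweight": 37, "Obese": 20},
    "Other": {"Normal": 25, "Overweight": 87, "Obese": 56},
}


def cell_percentages(table):
    """Return each cell as a percentage of the grand total (the joint distribution)."""
    total = sum(sum(row.values()) for row in table.values())
    return {
        occ: {cat: 100 * n / total for cat, n in row.items()}
        for occ, row in table.items()
    }


if __name__ == "__main__":
    for occ, row in cell_percentages(counts).items():
        print(occ, {cat: f"{pct:.2f}%" for cat, pct in row.items()})
```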
From the table produced in R Commander, the probability that a person selected at random is obese and has an occupation history of “heavy manual” is:
P(Obese and Heavy Manual) = 20/233 (233 = total number of observations)
P ≈ 0.086
Similarly, the probability that an obese person selected at random has an occupation history of “heavy manual” is the conditional probability:
P(Heavy Manual | Obese) = 20/76 (76 = total number of obese people)
P ≈ 0.26
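The two probabilities above differ only in their denominator: the joint probability divides by the grand total, whereas the conditional probability divides by the column total for the obese group. Here is a minimal Python sketch using exact fractions. The helper names are hypothetical, and the counts are those in Table 1.

```python
from fractions import Fraction

# Counts from Table 1 (rows: occupation history; columns: BMI category).
counts = {
    "Heavy Manual": {"Normal": 8, "Overweight": 37, "Obese": 20},
    "Other": {"Normal": 25, "Overweight": 87, "Obese": 56},
}


def joint_probability(table, occ, cat):
    """P(occupation = occ and BMI category = cat): cell count / grand total."""
    total = sum(sum(row.values()) for row in table.values())
    return Fraction(table[occ][cat], total)


def conditional_probability(table, occ, cat):
    """P(occupation = occ | BMI category = cat): cell count / column total."""
    column_total = sum(row[cat] for row in table.values())
    return Fraction(table[occ][cat], column_total)


if __name__ == "__main__":
    joint = joint_probability(counts, "Heavy Manual", "Obese")
    cond = conditional_probability(counts, "Heavy Manual", "Obese")
    print(f"P(Obese and Heavy Manual) = {joint} = {float(joint):.3f}")
    print(f"P(Heavy Manual | Obese) = {cond} = {float(cond):.3f}")
```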
To assess whether two categorical variables are independent, we use the Chi-square test. Its null hypothesis is that the two variables are independent, and its alternative hypothesis is that they are associated. The p-value is above 0.05, so we fail to reject the null hypothesis: there is no significant evidence of an association between BMI category and occupation history. Failing to reject the null hypothesis does not prove independence; it only means the data are consistent with it.
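For reference, the sketch below computes the Pearson chi-square statistic by hand from the Table 1 counts, without a continuity correction. A 2 x 3 table has (2 - 1)(3 - 1) = 2 degrees of freedom, and for exactly 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2). That shortcut is an assumption that holds only for df = 2; for other table shapes, use a general routine such as R's `chisq.test()` or `scipy.stats.chi2_contingency`.

```python
import math

# Observed counts from Table 1 (rows: Heavy Manual, Other;
# columns: Normal, Overweight, Obese).
observed = [
    [8, 37, 20],
    [25, 87, 56],
]


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_upper_tail_df2(x):
    """P(chi-square with 2 df >= x); for 2 df this is exactly exp(-x / 2)."""
    return math.exp(-x / 2)


if __name__ == "__main__":
    stat, df = chi_square_independence(observed)
    print(f"chi-square = {stat:.3f}, df = {df}, p = {chi2_upper_tail_df2(stat):.3f}")
```

All expected counts in this table are above 5 (the smallest is about 9.2), so the usual chi-square approximation is reasonable here.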
Question 4 (6 marks)
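The binomial model describes the number of successes in a fixed number n of independent trials, each with the same probability p of success. Here one person is selected at random, with replacement, 5 times in total. Because each person is put back, the draws are independent and the probability of selecting an obese person is the same on every draw. The number of obese people selected, X, therefore follows a binomial distribution with n = 5 and p = 76/233.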
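Total number of obese people = 76
Total number of people = 233
Probability of selecting an obese person on each draw, p = 76/233 ≈ 0.326
Probability that an obese person is selected 2 or 3 times, P(X = 2) + P(X = 3) ≈ 0.483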
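The binomial probability above can be checked with a few lines of Python. This sketch uses the standard binomial probability mass function C(n, k) p^k (1 - p)^(n - k) with n = 5 and p = 76/233. The function name is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    n, p = 5, 76 / 233
    prob = binomial_pmf(2, n, p) + binomial_pmf(3, n, p)
    print(f"P(X = 2 or 3) = {prob:.4f}")
```

Using the exact p = 76/233 gives about 0.4831. Rounding p to 0.326 first gives 0.4828. Both round to 0.483. Because p + (1 - p) = 1, this sum also simplifies to 10 p^2 (1 - p)^2, which is a convenient check.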
The mean of a binomial distribution is n × p, where n is the number of trials and p is the probability of success. Here n = 5 and p = 76/233 ≈ 0.326, so the mean is 5 × 0.326 ≈ 1.63. In other words, if this five-draw experiment were repeated many times, the average number of obese people selected per experiment would be about 1.63.
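The long-run interpretation of the mean can be illustrated by simulation. This sketch repeats the five-draw experiment many times. Each draw picks one of the 233 people uniformly at random with replacement, and persons 0 to 75 are treated as the 76 obese people (an arbitrary labelling chosen for illustration). The sketch then averages the number of obese people selected. The function name, repetition count and seed are hypothetical choices.

```python
import random


def simulate_mean_obese(n_people=233, n_obese=76, draws=5, reps=100_000, seed=2019):
    """Average number of obese people selected per five-draw experiment."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        # randrange(n_people) picks one person uniformly; people 0..n_obese-1
        # are labelled obese, so each draw is obese with probability 76/233.
        total += sum(rng.randrange(n_people) < n_obese for _ in range(draws))
    return total / reps


if __name__ == "__main__":
    print(f"simulated mean = {simulate_mean_obese():.3f}, theory = {5 * 76 / 233:.3f}")
```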
Question 5 (6 marks)
The mean is 32.19 kg and the standard deviation is 3.94 kg.
The z-score for a participant who records a hand grip strength of 34 kilograms is (34 - 32.19)/3.94 ≈ 0.46.
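The z-score is the distance from the mean measured in standard deviations. Here is a minimal Python sketch using the rounded mean (32.19) and standard deviation (3.94) from this question. With those rounded values, the z-score for 34 kg comes out at about 0.46. The standard-library `statistics.NormalDist` class (Python 3.8+) converts the z-score into the proportion of carers expected to fall below that value under the Normal model.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


if __name__ == "__main__":
    z = z_score(34, 32.19, 3.94)
    # Under the Normal model, NormalDist().cdf(z) is the proportion below 34 kg.
    print(f"z = {z:.2f}, proportion below = {NormalDist().cdf(z):.3f}")
```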
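The Central Limit Theorem states that, for a sufficiently large sample size n, the sampling distribution of the sample mean is approximately Normal, with mean μ and standard deviation σ/√n, whatever the shape of the population distribution. Here grip strength is also assumed to follow the Normal model, so the Normal approximation for sample means is appropriate.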
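The following simulation sketch illustrates the Central Limit Theorem. It draws repeated samples from a deliberately skewed population and shows that the standard deviation of the sample means is close to σ/√n. The population is assumed to be Exponential(1), with σ = 1, and the sample size of 30 is hypothetical. Neither is taken from the assignment data. The `standard_error` helper applies the same σ/√n formula to the grip-strength standard deviation of 3.94.

```python
import math
import random
import statistics


def standard_error(sd, n):
    """Standard deviation of the sample mean under the CLT: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def simulated_sd_of_means(n, reps=10_000, seed=1):
    """SD of sample means of size n drawn from a skewed Exponential(1) population."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)


if __name__ == "__main__":
    n = 30  # hypothetical sample size
    print(f"simulated SD of means = {simulated_sd_of_means(n):.4f}")
    print(f"theoretical sigma/sqrt(n) = {standard_error(1.0, n):.4f}")
    print(f"grip SE for n = {n}: {standard_error(3.94, n):.3f} kg")
```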
By using CLT,
32.83245, 31.93303 33.39408, 31.57391
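The value of hand grip strength that is achieved or surpassed by only 5% of the grandparent carers is the 95th percentile, 32.19 + 1.645 × 3.94 ≈ 38.67 kg. By contrast, 25.71 kg is the 5th percentile, the value that only 5% of carers fall at or below.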
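Both percentiles can be read directly from a Normal model using Python's standard-library `statistics.NormalDist` (Python 3.8+) and its `inv_cdf` method. This sketch assumes the rounded mean of 32.19 kg and standard deviation of 3.94 kg from this question, so results may differ from software output in the second decimal place.

```python
from statistics import NormalDist

# Normal model for grip strength using the rounded summary statistics above.
grip = NormalDist(mu=32.19, sigma=3.94)

upper_5_percent_cutoff = grip.inv_cdf(0.95)  # 95th percentile: only 5% at or above
lower_5_percent_cutoff = grip.inv_cdf(0.05)  # 5th percentile: only 5% at or below

if __name__ == "__main__":
    print(f"95th percentile: {upper_5_percent_cutoff:.2f} kg")
    print(f"5th percentile:  {lower_5_percent_cutoff:.2f} kg")
```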