# MAS183 Statistical Data Analysis Assignment 1 Answer


## Question:

MAS183 Statistical Data Analysis Semester 1, 2020

Assignment 1 –

Read pages 13-14 of the Unit Information and follow the instructions carefully.

This assignment covers material in chapters 1 to 4 of the Unit Notes. must be used for all graphs and calculations. All graphs must have titles and axis titles. Numerical answers should be supported by relevant output either included at the appropriate points or appended to the end of your assignment. Data files are available from the Data Files page on LMS. The data is modified from the original sources.

Total marks = 40.

Driving heavy equipment on wet soil can compress the soil and make it harder for crops to grow. The data file SoilCompress.csv gives data on the penetrability of the same soil at different levels of compression. The data are in two columns, as follows:

Comp: indicates the compression level of the soil in which each penetrability observation was made. Values are “Compressed”, “Intermediate” or “Loose”.

Pent: the penetrability observations in unspecified units. Higher values indicate that soil is more penetrable (i.e., easier to penetrate).

We are interested in comparing the distributions of soil penetrability observed for soils at the three compression levels.

(a) Paying careful attention to graphical principles of clarity, simplicity and accuracy, provide graphs that allow the distributions of soil penetrability to be compared between soils at different levels of compression. Construct your graphical comparison using

1. Boxplots, and [5]
2. Histograms. [5]
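
The numbers behind both kinds of graph can be sketched in a few lines. The example below is only an illustration in Python: the required software is not named in this copy of the assignment, the rows are invented stand-ins for SoilCompress.csv, and the helper names (`group_by_level`, `five_number_summary`, `shared_bin_counts`) are hypothetical. It computes a five-number summary per group, which is what a basic boxplot is built from, and histogram counts on one common set of bin edges so that the three groups can be compared on the same scale.

```python
import math
import statistics
from collections import defaultdict

# Invented (Comp, Pent) pairs standing in for SoilCompress.csv.
# They are NOT the assignment data.
ROWS = [
    ("Compressed", 2.0), ("Compressed", 2.5), ("Compressed", 3.0),
    ("Compressed", 3.5), ("Compressed", 4.0),
    ("Intermediate", 3.0), ("Intermediate", 3.5), ("Intermediate", 4.0),
    ("Intermediate", 4.5), ("Intermediate", 5.0),
    ("Loose", 4.0), ("Loose", 4.5), ("Loose", 5.0),
    ("Loose", 5.5), ("Loose", 6.0),
]


def group_by_level(rows):
    """Collect penetrability values under their compression level."""
    groups = defaultdict(list)
    for level, value in rows:
        groups[level].append(value)
    return dict(groups)


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for one group."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))


def shared_bin_counts(groups, bin_width):
    """Histogram counts per group, using one common set of bin edges."""
    lowest = min(min(v) for v in groups.values())
    highest = max(max(v) for v in groups.values())
    start = bin_width * math.floor(lowest / bin_width)
    n_bins = int((highest - start) // bin_width) + 1
    counts = {}
    for level, values in groups.items():
        bins = [0] * n_bins
        for v in values:
            # Each bin covers [edge, edge + bin_width).
            bins[int((v - start) // bin_width)] += 1
        counts[level] = bins
    return start, counts


groups = group_by_level(ROWS)
start, counts = shared_bin_counts(groups, bin_width=1.0)
for level, values in groups.items():
    print(level, five_number_summary(values), counts[level])
```

Quartile conventions differ between software packages, so the quartiles reported by the required software may not match the "inclusive" method used here exactly.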

(b) Provide the mean, median and sample standard deviation for each distribution. [3]

(c) Using your graphs and statistics from parts (a) and (b), compare and contrast the three distributions of soil penetrability observations in terms of their locations, spreads and shapes. [3]

(d) Do any of the observations appear to be outliers relative to the compression group in which they occur? Justify your answer. [2]
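
As a rough illustration of the summary statistics and the outlier check, the sketch below uses Python's standard library on an invented sample for a single compression group. The 1.5 × IQR fence is a common convention and is an assumption here; the Unit Notes may define outliers differently. The function names `summarise` and `iqr_outliers` are hypothetical.

```python
import statistics


def summarise(values):
    """Mean, median and sample standard deviation (divisor n - 1)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented penetrability values for one group; NOT the assignment data.
sample = [2.0, 2.5, 3.0, 3.5, 4.0, 9.0]

stats = summarise(sample)
flagged = iqr_outliers(sample)
print(stats, flagged)
```

The check is applied within each compression group separately, since the question asks about outliers relative to the group in which they occur.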

The effects of climate change include change in sea temperatures and shifts in fish populations. The data file AnglerFish.csv contains data concerning angler fish populations in the North Sea over the period 1977 to 2001. The variables are as follows:

temp: the mean bottom temperature (in °C) of the North Sea during winter in a particular year

lat: the mean northern latitude (° of latitude) at which angler fish populations were found during that year

We are interested in the relationship (if any) between mean sea bottom temperature and the mean latitude at which angler fish populations were found.

(a) Which variable is the predictor and which is the response, and why? [2]

(b) Provide a scatterplot of the data. Circle any points that appear to be outliers. [5]

(c) Based only on the graph in part (b), briefly describe the relationship between the variables in terms of direction, shape and strength. [3]

(d) Using the equation of the least-squares line of best fit, estimate the mean latitude at which angler fish will be found in a year when the mean bottom temperature in the North Sea is 7.3 °C. [2]

(e) By hand, draw the least-squares line on your graph from part (b) (“by hand” includes using pen and ruler, or manual positioning of a line drawn using software. DO NOT use for this.) Briefly describe how you worked out where to position the line. [2]

(f) Obtain the coefficient of determination for the regression and interpret it in the context of the data. [3]

(g) Without doing further calculations, would you expect the coefficient of determination to increase, reduce, or be unchanged if any outlier(s) were removed from the analysis? Briefly justify your answer. [2]
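
The least-squares calculations in this question can be sketched as follows. This is an illustration only: it is written in Python rather than the required software (which is not named in this copy), the temperature and latitude values are invented and are not taken from AnglerFish.csv, and the function names are hypothetical. It fits the line lat = intercept + slope × temp, predicts at 7.3 °C, and computes the coefficient of determination as 1 − SS_res / SS_tot.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented values; NOT the assignment data.
temp = [6.0, 7.0, 8.0, 9.0]      # predictor: mean bottom temperature (deg C)
lat = [57.0, 57.5, 58.5, 59.0]   # response: mean latitude (degrees north)

intercept, slope = least_squares(temp, lat)
predicted = intercept + slope * 7.3
r2 = r_squared(temp, lat, intercept, slope)
print(intercept, slope, predicted, r2)
```

The least-squares line always passes through the point of means (x̄, ȳ), so that point together with one fitted value at another temperature is enough to position the line with a ruler.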

For each data variable used in this assignment (2 variables in each of questions 1 and 2), classify its data type as numerical or categorical, discrete or continuous, and nominal or ordinal, as appropriate in each case. [3]