Use of Baseline in Data Visualization
A1) Data visualization is a very powerful tool used to inform and educate readers; it is also known as information graphics. When we draw meaning from numbers, measurements, and facts, that act is a form of art, and this art is known as data visualization. Data often contains noise; once the noise is removed, we can extract relevant information or knowledge from the data. Visualizing data is not merely displaying it: we present the data in a way that makes it easy to understand the patterns where the real value lies. Visualization helps the corporate sector impact the business in the following ways:
- Cleaning: When data is combined from different sources, visualization is used to standardize it; the data is shaped in a way that makes it easy to visualize.
- Extraction: Well-known tools for data visualization and analytics include the Elastic Stack, Tableau, and Highcharts. Data visualization based on AR and VR technology provides otherwise unachievable advantages for identifying patterns across different data streams.
- Strategizing: Strategizing with data has become a necessary part of successfully applying data to business. Visualization enables continuous monitoring of strategy and supports data-driven decisions that affect performance and business results.
The baseline is very important in data visualization: when the baseline changes, the appearance of the same data can change drastically. Its importance is illustrated as follows.
Here we represent sales data with the columns year, quarter, and sales.
1. First, the regular quarterly sales, where the baseline is historical sales. In this view we see that sales decreased during the year 2014.
2. Next, the quarterly and yearly change in sales, represented as a percentage change.
The annual percentage change can then be computed from the yearly totals.
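The effect of the baseline can be sketched with a short script. The sales figures below are hypothetical (the original tables are not reproduced in the text); the point is that the same numbers look very different against an absolute baseline than against a previous-year baseline.

```python
# Hypothetical quarterly sales; the year/quarter keys and values are
# illustrative, not the original data.
sales = {
    ("2013", "Q1"): 120, ("2013", "Q2"): 135,
    ("2013", "Q3"): 150, ("2013", "Q4"): 160,
    ("2014", "Q1"): 110, ("2014", "Q2"): 125,
    ("2014", "Q3"): 140, ("2014", "Q4"): 150,
}

# Baseline 1: raw annual totals against a zero baseline.
annual = {}
for (year, _), value in sales.items():
    annual[year] = annual.get(year, 0) + value

# Baseline 2: the same data rebased to percentage change versus the
# previous year.
years = sorted(annual)
pct_change = {
    years[i]: 100 * (annual[years[i]] - annual[years[i - 1]]) / annual[years[i - 1]]
    for i in range(1, len(years))
}

print(annual)       # absolute totals
print(pct_change)   # year-over-year percentage change
```

With these numbers, 2014 shows a drop of about 7% relative to 2013, which is far more visible in the rebased view than in the raw totals.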
A2) The term exploratory data analysis (EDA) was popularized by John Tukey in the 1970s. This approach emphasizes hypothesis generation and pattern recognition from data in its raw form, which is why it is described as detective work; when the results of the analysis are represented with graphs, it is called graphical detective work. Using this approach, we try to find the hidden information or knowledge in raw data: we explore the data and extract meaningful patterns. The first step is to understand the given raw data and explore it. EDA is applied in many areas, such as marketing, geography, and operations management, and its techniques, such as data mining and data visualization, are used in auditing and in finding patterns in raw data. It focuses on testing the assumptions required for model fitting and hypothesis testing, and it is also used to handle missing values and to make transformations. Finally, we can say that exploratory data analysis is a philosophy that uses various techniques to perform the following tasks:
- It maximizes insight into the given raw data set.
- It uncovers the underlying structure.
- It extracts the important variables.
- It detects outliers and anomalies.
- It tests the underlying assumptions.
- It develops parsimonious models.
- It determines the optimal factor settings.
So, we can say that EDA is an approach to data analysis that postpones the usual assumptions about which model the data follow, in favour of a more direct approach that lets the data reveal their underlying structure and model. Most EDA techniques are graphical in nature, with a few quantitative techniques.
Particular graphical techniques in exploratory data analysis include:
- Plotting the raw data, as histograms, probability plots, block plots, etc.
- Plotting simple statistics, such as mean plots and standard deviation plots.
- Positioning such plots so as to maximize our natural pattern-recognition abilities.
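As a minimal sketch of the first technique, here is a text histogram of some randomly generated data. The data, seed, and bin width are all illustrative; a real analysis would plot the raw data with a library such as matplotlib.

```python
import random
from collections import Counter

# Illustrative data: 200 draws from a normal distribution
# (mean 50, standard deviation 10).
random.seed(0)
data = [random.gauss(50, 10) for _ in range(200)]

# Bin the values into buckets of width 10 and draw a text histogram.
bins = Counter(int(value // 10) * 10 for value in data)
for low in sorted(bins):
    print(f"{low:3d}-{low + 9:3d} | {'#' * bins[low]}")
```

Even this crude plot makes the rough bell shape of the raw data visible at a glance, which is exactly the kind of quick structural check EDA relies on.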
Example: Given some data X and Y, we compute the summary statistics.
The summary statistics result is as follows:
N = 11
Mean of X = 9.0
Mean of Y = 7.5
Intercept = 3
Slope = 0.5
Residual standard deviation = 1.237
Correlation = 0.816
The scatter plot shows that there are no outliers, the data are equally precise throughout, and the linear fit is regular.
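These summary statistics can be reproduced with a short script. The x and y values below are taken from Anscombe's well-known first dataset, which I am assuming is the example intended here because its published summary statistics match the figures above exactly; the text itself does not reproduce the raw data.

```python
import math

# Anscombe's first dataset (assumed; the summary statistics in the
# text match these values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Correlation and residual standard deviation (n - 2 degrees of freedom).
syy = sum((yi - mean_y) ** 2 for yi in y)
r = sxy / math.sqrt(sxx * syy)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
resid_sd = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(n, round(mean_x, 1), round(mean_y, 1), round(intercept, 1),
      round(slope, 2), round(r, 3), round(resid_sd, 3))
```

Running this reproduces N = 11, mean of X = 9.0, mean of Y = 7.5, intercept = 3, slope = 0.5, correlation = 0.816, and residual standard deviation = 1.237, as listed above.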
A3) Basically, a model is an extended version of the act of human thinking and communication. When we imagine something, we represent it through some kind of model or picture. Human perception, cognition, and reasoning are incapable of capturing reality in all its complexity; the brain and the senses mediate our relationship with the world. Our logical and analytical thinking and communication are based on models. Think of a statistic such as an average: if we say that the average height of U.S. adult women is 63.8 inches, that number is a model summarizing all the heights of adult women in the U.S. It is not an absolute model, only an approximation. If we take a huge dataset as it is, we cannot understand anything; it would be very difficult for the human brain to process. With this kind of thinking we can build useful models to analyse, observe, and represent reality. A visualization is a model that serves as a channel between the mental model inside the designer's brain and the model that forms inside the audience's brain.
What the literature summarized earlier tells us about the human mind is humbling. Three mind bugs are described as follows:
- When we find interesting patterns, whether they are real or not, we call this the patternicity bug.
- When we explain a pattern in the form of a story, we call this the storytelling bug.
- When we seek out information that confirms our explanation and discount information that conflicts with it, this is known as the confirmation bug.
Patternicity: The first mind bug is finding patterns, visual and otherwise. Here visualization becomes a powerful tool, transforming a mass of numbers or data into a chart or data map. However, several of the patterns detected by the eyes and brain are the result of coincidence and noise. The author Michael Shermer calls this tendency to perceive patterns patternicity.
Storytelling: When we find patterns, we try to find a cause-effect explanation. As we write the explanation, we draw conclusions, fill in the gaps, and frame the results as a story. We perceive patterns first and narrate explanations based on them; we take single events and transform them into rules.
Confirmation: Once we have written the story of our understanding of a pattern, we try to find proof or evidence that supports our arguments and stories. In doing so, we search for evidence that supports our own narration and interpret the results in whatever way achieves that goal.
A4) When we become visualization designers, we familiarize ourselves with the word research. While studying data interpretation and visualization, we apply rules about the what and the when, and we always search for the pieces missing in the model.
When we use social networking sites such as Twitter or Quora, we find interesting patterns, infer cause-effect relationships from them, and make conjectures about them. A good conjecture:
- It makes sense given what we already know about the world.
- It is testable in some way.
- It is made of ingredients that are logically and naturally connected with each other, so that we cannot change any one of them in isolation.
What we want to convey is that a good conjecture is built from several components and is hard to vary: it is hard to change any part without changing the whole conjecture. The three points below expand on this:
Hypothesizing: A conjecture that can be empirically tested is called a hypothesis. A hypothesis is an assumption about the problem posed by our query. In this context we use terms such as predictor and independent variable.
An Aside on Variables: Variables come in many flavours. We classify them on the basis of the values they take: nominal, ordinal, interval, or ratio.
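The four measurement scales can be sketched as follows; the column names and their classifications are hypothetical survey columns chosen purely for illustration.

```python
# Illustrative mapping from hypothetical survey columns to their
# measurement scale (the names are not from any real dataset).
variables = {
    "eye_color":     "nominal",   # categories with no inherent order
    "satisfaction":  "ordinal",   # ordered: low < medium < high
    "temperature_c": "interval",  # ordered, equal steps, no true zero
    "height_cm":     "ratio",     # equal steps and a meaningful zero
}

# Only interval and ratio variables support meaningful arithmetic means.
numeric_scales = {"interval", "ratio"}
averageable = [name for name, scale in variables.items()
               if scale in numeric_scales]
print(averageable)
```

The practical payoff of the classification is exactly this kind of check: the scale of a variable determines which statistics and chart types are valid for it.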
On Studies: Once we have decided on a hypothesis, it is time to test it against reality. Here we use a cross-validation approach: part of the data is set aside as training data and the rest as testing data. We then test the model on the held-out data to check whether our hypothesis is right or wrong.
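The holdout idea above can be sketched in a few lines. The data, the 80/20 split, and the one-parameter model are all hypothetical; the point is only the mechanics of fitting on one part of the data and testing on the rest.

```python
import random

# Hypothetical (feature, label) pairs with y roughly equal to 2x,
# plus uniform noise.
random.seed(42)
data = [(x, 2 * x + random.uniform(-1, 1)) for x in range(100)]

# Shuffle and split: 80% training data, 20% held-out testing data.
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# Fit a one-parameter model on the training set (least-squares slope
# through the origin), then evaluate it on the held-out test set.
slope = sum(x * y for x, y in train) / sum(x * x for x, y in train)
test_error = sum((y - slope * x) ** 2 for x, y in test) / len(test)

print(round(slope, 3), round(test_error, 3))
```

If the hypothesis (here, that y grows linearly with x) is right, the model fitted on the training data also has a small error on data it has never seen; a large test error would count as evidence against the hypothesis.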