Autumn 2018: Assignment 1
Relational Data Visualisation
For this assignment, you are required to identify and develop one (or more) visualisation(s) for relational data sets using existing tools or software. You may use the sample data sets from the tutorials as well as the provided visualisation techniques. Alternatively, you are encouraged to search for and use other visualisation tools and/or datasets from the literature. Based on the visualisation(s), you can explore the data to find insights, patterns, (ir)regularities and interesting properties.
You are also required to write a report (approximately 1500 words, with no upper limit) on the following aspects:
Note: images (as figures) are essential and should be included in the report to illustrate the visualisations, results and findings.
Marking criteria for the assignment include
Students must complete the visualisation(s) and the report individually. The report should be typed and submitted online through vUWS as a Word or PDF file. A high standard of professional English and a neat, logical structure (including a consistent and complete referencing style) are expected.
You are required to submit a declaration with the following claims (in a text file or Word file).
I hold a copy of this assignment that I can produce if the original is lost or damaged.
I hereby certify that no part of this assignment/product has been copied from any other student’s work or from any other source except where due acknowledgement is made in the assignment.
No part of this assignment/product has been written/produced for me by another person except where such collaboration has been authorised by the subject lecturer/tutor concerned.
The declaration, visualisation program(s) and data sets, and the report should be submitted via vUWS before the deadline for marking purposes. In the Turnitin submission system, the report (preferably in PDF or MS Word) is submitted separately, together with a compressed zip file containing all supporting program(s), data sets or supporting works. Please ensure the file names include your student ID. Submissions that do not follow this format are not acceptable. Hard copies and email submissions are not accepted.
Please note: it is not advisable to copy materials from illustrative samples, your friends' work, work from previous years, or other sources. In addition to Turnitin checking, I may cross-check the reports to detect plagiarism. Failure to avoid plagiarism may lead to a misconduct finding with a serious penalty.
The visualizations presented below were created from three datasets holding information about FIFA 18: player attributes, player personal data and player playing positions. The three datasets are joined with an inner join on ID, the key that is common to all three sheets. Several of the visualization types available in Tableau (charts, bubbles, maps) are presented here, and each one was chosen for a specific reason and function.
For each chart or map we discuss the technical details, the advantages and disadvantages of the visualization, the key findings from the chart and how they support better and more effective decision making, and finally our critical thinking based on the visualization. The key findings matter because they help us understand the data better, and when a decision has to be based entirely on the figures shown, they make it easier to take the right one.
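The inner join on ID described in the introduction can be sketched outside Tableau in a few lines of Python. This is only an illustration under stated assumptions: the rows are made up, the column name "Preferred position" and the helper inner_join are hypothetical, and each ID is assumed to appear at most once per sheet. It is not the actual FIFA 18 data or Tableau's own implementation.

```python
# Hypothetical miniature versions of the three sheets (made-up rows).
personal = [
    {"ID": 1, "Name": "Player A", "Age": 30, "Nationality": "Argentina", "Club": "Club X"},
    {"ID": 2, "Name": "Player B", "Age": 24, "Nationality": "Spain", "Club": "Club Y"},
    {"ID": 3, "Name": "Player C", "Age": 27, "Nationality": "Brazil", "Club": "Club Z"},
]
attributes = [
    {"ID": 1, "Heading accuracy": 71},
    {"ID": 2, "Heading accuracy": 64},
]
positions = [
    {"ID": 1, "Preferred position": "ST"},
    {"ID": 2, "Preferred position": "CM"},
    {"ID": 3, "Preferred position": "CB"},
]


def inner_join(left, right, key="ID"):
    """Keep only rows whose key is present in both tables and merge their columns.

    Assumes the key is unique within the right-hand table.
    """
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Join all three sheets on the common key.
joined = inner_join(inner_join(personal, attributes), positions)
for row in joined:
    print(row["ID"], row["Name"], row["Heading accuracy"], row["Preferred position"])
```

In this sketch the player with ID 3 is dropped because that ID is missing from the attributes sheet, which is the behaviour that distinguishes an inner join from a left or outer join.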
Figure 1: Average Age of Players
The figure above uses stacked bars to show the average age of the players at each club, grouped by Nationality for easier reading. Club is mapped to one axis and average age to the other. When the pointer is moved over a bar, we can see the average age of the players at that club and the country they belong to. This is achieved by placing the Nationality dimension on the Color mark, so that each country is represented by a different color, which makes the chart easier to understand. The AVG aggregate function is applied to the Age field to obtain the average age of the players. In addition, the worksheet has a filter based on Club: when the ALL check box is ticked we see the details for all clubs, and to see the average age of players for a particular group of clubs we can select only those clubs.
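The aggregation behind Figure 1 can be approximated as follows. This is a minimal sketch with made-up rows, assuming one row per player with Club, Nationality and Age fields; the function average_age and its clubs parameter are hypothetical names that stand in for the AVG aggregate and the Club filter in the worksheet.

```python
from collections import defaultdict

# Made-up rows standing in for the joined FIFA 18 data.
players = [
    {"Club": "Club X", "Nationality": "Brazil", "Age": 24},
    {"Club": "Club X", "Nationality": "Brazil", "Age": 30},
    {"Club": "Club X", "Nationality": "Spain", "Age": 28},
    {"Club": "Club Y", "Nationality": "Spain", "Age": 22},
]


def average_age(rows, clubs=None):
    """Average Age per (Club, Nationality) segment.

    clubs=None plays the role of the ALL check box; otherwise only the
    selected clubs are kept, like the Club filter on the worksheet.
    """
    ages = defaultdict(list)
    for row in rows:
        if clubs is None or row["Club"] in clubs:
            ages[(row["Club"], row["Nationality"])].append(row["Age"])
    return {segment: sum(values) / len(values) for segment, values in ages.items()}


all_clubs = average_age(players)
only_club_x = average_age(players, clubs={"Club X"})
for (club, nationality), avg in sorted(all_clubs.items()):
    print(club, nationality, avg)
```

Each key of the result corresponds to one colored segment of a stacked bar: the club selects the bar and the nationality selects the color.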
The main advantage of the stacked bar visualization is that all the data is represented as small bars, so anyone looking at the chart can easily understand what is conveyed just by pointing to the respective bar. The main disadvantage is that, because there are so many clubs, the complete chart does not fit on a single screen, and we need to scroll to the right to see all the clubs. Because of this, stacked bars are not well suited to large relational datasets, where showing all the facts in a single sheet is practically impossible.
The key finding from the datasets with respect to this chart is that some of the clubs based in Brazil have only Brazilian players, rather than signing players from other countries.
As part of critical thinking, this result suggests that clubs in Brazil depend on, or place more trust in, Brazilian players rather than hiring players from other countries. There could be various reasons for this, but our understanding is that Brazilian clubs place more trust in their local players.
Figure 2: Number of players from different countries playing for a particular club
The figure above shows the total number of players from each country at a particular club, together with the maximum heading accuracy for that club. The visualization used here is a bubble chart, in which each bubble holds a set of information, much like a capsule. A bubble chart has no axes; the layout is controlled by the marks on the worksheet instead. The Nationality dimension is placed on the Color mark, so each color represents a country; for example, the sandal color represents the Netherlands, as is evident from the figure. The name of the club and the maximum heading accuracy are placed on the Label mark, which is why moving the pointer over a bubble shows all the necessary information: the name of the club, the nationality of the players, the number of players from that country and the maximum heading accuracy. The Nationality dimension is also added as a filter. Once a country is selected, the chart shows only the clubs for which players from that country play, and everything else is hidden. This helps in many ways in identifying the facts.
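The quantities encoded in each bubble can be sketched as below. This is an illustration only, with made-up rows; the function name bubbles and its nationality parameter are hypothetical, and the maximum heading accuracy is computed per club and nationality pair, which is an assumption about the level of detail used in the worksheet.

```python
# Made-up rows standing in for the joined FIFA 18 data.
players = [
    {"Club": "Club X", "Nationality": "Netherlands", "Heading accuracy": 70},
    {"Club": "Club X", "Nationality": "Netherlands", "Heading accuracy": 82},
    {"Club": "Club X", "Nationality": "Germany", "Heading accuracy": 65},
    {"Club": "Club Y", "Nationality": "Netherlands", "Heading accuracy": 58},
]


def bubbles(rows, nationality=None):
    """Return {(Club, Nationality): (player count, max heading accuracy)}.

    nationality=None keeps every country; otherwise only players of the
    selected country are kept, like the Nationality filter.
    """
    result = {}
    for row in rows:
        if nationality is not None and row["Nationality"] != nationality:
            continue
        key = (row["Club"], row["Nationality"])
        count, best = result.get(key, (0, row["Heading accuracy"]))
        result[key] = (count + 1, max(best, row["Heading accuracy"]))
    return result


dutch = bubbles(players, nationality="Netherlands")
for (club, country), (count, best) in sorted(dutch.items()):
    print(club, country, count, best)
```

The count would drive the size of a bubble and the nationality its color, while the club name and maximum heading accuracy correspond to the label.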
The main advantage of the bubble chart is that all the necessary information is available in the form of capsules. Another advantage is that all the information fits on a single page and nothing is missed; we do not have to scroll down or to the right. Moving the pointer over a bubble shows all the information held in it. The disadvantage concerns the size of the bubbles: when large datasets are used, some bubbles become very small, and getting the information about such a bubble becomes a little tedious. The bubble chart can still be used for large datasets, since everything is covered in a single sheet; the only problem is the size of the bubbles. In the majority of cases, the larger the dataset, the smaller the bubbles.
One of the key findings from this chart, based on the color pattern and the size of the bubbles, is that countries such as Argentina, Germany and England have more football players than the other football-playing countries (this is very evident from the main worksheet).
As far as critical thinking is concerned, we can say that Argentina, Germany and England give more importance to football and help to nurture young and fresh talent in the sport.
Figure 3: Value and Potential of a Player
The tree map above shows the value and the potential of each player with respect to their Nationality. Potential and value are treated as independent fields here; neither depends on the other. Since this is a tree map there are no axes, and the layout is handled by the marks on the worksheet. Each top-level block represents a different country and is divided into sub-blocks, one for each player from that country. When the cursor is pointed at a sub-block we can see the name of the player, his value and potential, and the country to which he belongs. This is achieved by placing the Nationality field on the Color mark and the Potential field on the Size mark (the bigger the sub-block, the higher the potential). The name and the value of the player are assigned to the Label mark. In addition, the worksheet has a filter that shows the details for a particular country or group of countries.
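The way the marks translate into block sizes can be sketched as follows. This is a minimal illustration with made-up rows, assuming each sub-block's area is proportional to the player's Potential; the function treemap_shares and its countries parameter are hypothetical, and the actual rectangle layout that Tableau computes is not reproduced here.

```python
from collections import defaultdict

# Made-up rows standing in for the joined FIFA 18 data.
players = [
    {"Name": "Player A", "Nationality": "Spain", "Potential": 90, "Value": 50.0},
    {"Name": "Player B", "Nationality": "Spain", "Potential": 60, "Value": 5.0},
    {"Name": "Player C", "Nationality": "England", "Potential": 50, "Value": 20.0},
]


def treemap_shares(rows, countries=None):
    """Group players by country and give each one a share of the total area.

    The share is Potential divided by the total Potential of the kept rows
    (the Size mark); countries=None keeps every country, otherwise it acts
    like the country filter on the worksheet.
    """
    kept = [r for r in rows if countries is None or r["Nationality"] in countries]
    total = sum(r["Potential"] for r in kept)
    blocks = defaultdict(list)
    for r in kept:
        blocks[r["Nationality"]].append((r["Name"], r["Value"], r["Potential"] / total))
    return dict(blocks)


shares = treemap_shares(players)
for country, sub_blocks in shares.items():
    country_share = sum(share for _, _, share in sub_blocks)
    print(country, round(country_share, 2), sub_blocks)
```

The size of a country block is simply the sum of the shares of its players, so a country can occupy a large area either through many players or through a few players with high potential.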
The advantage of the tree map is that we can see the data for all the players listed in the datasheet, and in particular the value and potential of each player. The disadvantage is that we need to use the filter whenever we want the data for only a particular set of countries, because the unfiltered chart makes it difficult to pick out the information for a specific group.
One of the important key findings from this chart is that Spain has more high-potential players, in spite of having fewer players than England. The Potential of ten Spanish players is more than 600, which is the highest in the current dataset.
With respect to critical thinking, the chart suggests that Spain focuses more on the potential of its players than any other football-playing country in the dataset.