CS5603 Data Visualization Coursework: Using Tableau Software For Airline Company Assessment Answer


Question :

Coursework brief

Assessment Title Visualisation Design Task

CS5603 Data Visualisation

MAIN OBJECTIVE OF THE ASSESSMENT

In this assessment you will develop an interactive data visualisation, using a specialist tool like Tableau (or Power BI, Qlik etc), to allow demonstrably useful insight into a complex dataset. By insight we mean not only helping users to answer pre-planned questions but also supporting the process of exploration (i.e. arising from unplanned questions).

A successful approach will depend on the nature of your chosen data and the types of questions you initially intend to answer. For instance, your visualisation might be a kind of information graphic designed for publication on a popular website that tells a story around a particular topic (e.g. changes in third world prosperity over the last decade). Alternatively, your visualisation might be more akin to a dashboard designed to provide rapid insight into key performance indicators relating to an organisation’s business processes (e.g. a department-wise analysis of profitability or efficiency). Either way your final design/implementation will be constrained to a single screen view whilst following best practice in terms of visual representation, presentation and interaction.

You will work in a small group in the early stages but your resulting proposal and final solution and report should be an individual effort. Through this assessment you will demonstrate your achievement of the module learning outcomes. To pass these learning outcomes you need to design and implement an interactive data visualisation using data from a specified domain (e.g. social, business, scientific), which communicates some relevant and nontrivial insight to the user (LO1). You should then critically reflect on the issues faced throughout the process and explain how you applied theory and technical knowledge to resolve these during design and implementation (LO2).

DESCRIPTION OF THE ASSESSMENT The task is split into two parts: a Proposal and a Final Report. The proposal is a formative assessment which you will submit individually. You will start this task by working in a small group. Together you will select a common dataset and formulate relevant questions. You will then choose two questions that you will individually address and formulate an initial visualisation design artefact to indicate how you propose to solve the problem. You will submit this proposal as a two-page report. Whilst this assessment does not contribute directly to your module grade, you will receive valuable formative feedback and additional marks will be awarded to students who make demonstrable use of this feedback in their final report.

Part two, the final report, is an individual submission in which you develop the ideas presented in your proposal into a final solution. This part contributes 100% to your final grade and you must pass (C- or higher) this part to pass the module overall.

At the start of the task, you should form a group of three or four students. Preferably, you should self-select but, if necessary, you can ask the module leader to allocate you to a group. If you take the first option, then you must inform the module leader of your group membership by Week 21 at the latest. All students who are not part of a registered group by this date will be assigned to one.

You should take a structured approach to the task including the following steps (explained in more detail later):
• Dataset selection (Group)
• User type/persona specification (Group to pool alternatives, then individual choice)
• Question formulation (Group to pool questions, then individual choice)
• Requirements specification (Individual)
• Prototype designs (Individual)
• Implementation (Individual)
• Critical evaluation (of both solution and tools; Individual)

Data Selection In your group, you will select an appropriate dataset that is of interest/relevance to all members. These data should exist as a table of at least 10 columns (variables) and at least 500 rows (cases) and be in a format that can be downloaded and saved locally (e.g. as an Excel, CSV or SQLite file). If you want to use a table that you feel is suitable but does not meet these criteria, please discuss with the module leader prior to the Proposal deadline. The data should also come with sufficient metadata and documentation to enable you both to formulate relevant questions and draw credible conclusions from your analysis. A summary description of the data domain and key variables should be included in the proposal and report.

User type/Persona specification Your objective is to design a data visualisation that allows a user to achieve useful insights from your dataset. When thinking about the kinds of questions that could be asked, it’s helpful to have a particular user type or ‘persona’ in mind. For instance, if your data relates to reported crime statistics, you could describe the persona of a “house-buyer” who is moving to a new town or region and wants to find a safe area to live close to their workplace. An alternative persona might be a “senior police officer” who wants to monitor current status and changes of crime levels across their jurisdiction. By describing the user in this way, it should be relatively easy to formulate the kind of questions they are likely to ask, plus other relevant requirements. You can brainstorm personas as a group, but each persona can only be adopted by one member.

Question formulation You will then work with your group to create a pool of interesting questions that your user persona might feasibly ask of the data. Try to think of different questions for each persona. These should be open and non-trivial questions that would be difficult to answer clearly using non-visual analysis. For instance, “Were average house prices in 2014 higher in London or in Birmingham?” is a poor example, whereas “How has the geographical distribution of house-price to average earnings changed in the last 10 years?” is a good one. In your proposal, you should declare two questions. These questions may focus on different aspects of a single overarching question, but it is important that each is distinct from the other and also from those proposed by other group members. If any issues are identified in your proposal feedback, you should revise the question(s) accordingly and provide a rationale for the change in the final report.

Requirements specification When designing any kind of data visualisation, it is important to consider user requirements. Some will relate to the questions asked, e.g. what data is required (perhaps you need to join additional data to your existing set) and how should this be prepared and represented (e.g. does the user need to see a correlation or part-whole relationship?) in a way that most easily reveals the desired answer? Other relevant considerations might include relevant aspects of the user’s background (e.g. experience using statistical graphics), context of use (e.g. indoors/outdoors, private/public) and the device through which they will interact with your visualisation. You should state your key requirements as a short (max six items), bulleted list in the proposal. In the final report you should elaborate briefly on requirements, suggesting ways in which each has influenced your design.

Prototype Design Bearing in mind your data, questions and user requirements, you will sketch a prototype design (e.g. using the Paper Landscapes method) to illustrate how your intended solution will look and function. You should create your initial design, for the proposal, before you have attempted any kind of implementation i.e. your design at this stage should be guided by the theory and best practice that you have learnt in the lectures, not the limitations or capabilities of your intended visualisation tool. Pay careful consideration to representation (e.g. visual variables, graph choice), presentation (e.g. view/axis arrangement, colour schemes) and interaction (e.g. brushing, linking and filtering) choices throughout the design and implementation process to ensure you meet your stated requirements. The entire visualisation should fit within a single screen. You can assume a maximum resolution of 1080p but smaller is permitted (e.g. if your intended display medium is a mobile device). After the proposal you will refine your design, in response to feedback received and any known constraints imposed by the primary implementation tool (e.g. Tableau) and translate this into an implementation using at least one tool (i.e. Tableau). You should present both the original prototype and a screenshot of the final implementation in the report, explaining how and why the design evolved during the implementation process. For a higher grade, if you feel confident, you might also devote one or two paragraphs to discussing how HCI/UCD/UX theory, principles or methods were (or could potentially be) applied to improve the design process and overall user experience. Any discussion of this kind must be explicitly supported with relevant references.

Implementation You will then attempt to implement your revised design using Tableau. Remember, no evidence of implementation is required at the proposal stage. However, in the final report you should describe clearly but concisely how you implemented your solution, commenting on any issues faced during the process and how you overcame them (or not). For a higher grade, you can attempt a second implementation using a different tool (e.g. Power BI or Qlik). You are not required to describe the implementation in the same detail as for the Tableau solution, but you should present a screenshot of the visualisation along with a short paragraph highlighting how and why this differs from the proposed design and Tableau solution.

Critical Evaluation At the end of the final report, you should evaluate and discuss the extent to which your implementation(s) answered your planned questions. In addition, extra credit will be given if you are able to demonstrate how you were able to answer an unplanned question. This must be a genuine, non-obvious discovery that wasn’t known before starting your analysis and could not be perceived directly from the initial static overview. This means you must have performed some exploratory interaction and view transformation in order to achieve the insight. The process of achieving both planned and unplanned insights should be clearly and concisely described, using screenshots where appropriate. These ‘walkthroughs’ should be sufficiently detailed to allow a competent third-party (i.e. the marker) to repeat the process using the submitted project files. Finally, after the walkthroughs you should provide a final, critical discussion which both compares the strengths and weaknesses of the tool(s) that you used and reflects on your overall learning experience during the module and any personal learning objectives you want to achieve in the future.


Answer :

CS5603 Data Visualization Coursework

Introduction

X is an airline company in Australia whose strategy is to offer Riche products to its customers and to differentiate its business, services and performance. To achieve this, it applies data science to improve customer experience and profitability.

This data analysis is done using Tableau, into which the historical weather data for Australian cities is loaded to create a visual deck. The data covers November 2007 to June 2017 and includes weather-related variables such as Date, Location, Min & Max Temperature, Rainfall, Evaporation, Sunshine, Wind Gust Direction, Wind Gust Speed, Wind Direction at 9am and 3pm, Wind Speed, Humidity, Pressure, Temperature and Rain.

The analysis aims to answer the following questions:

  1. How well can we predict airline flight cancellations or delays in Australia if we combine weather data with historical flight data?
  2. What time of year, week or day is best for avoiding flight delays?

Design

I considered various industries that could be used for this assignment. Once I had decided to present something related to the airline industry, I looked at various data variables that can affect it. I considered ideas such as cost reduction, services that differentiate an airline from its competitors, and staffing and training issues. I rejected some of these ideas because of a lack of publicly available data, some because they involved too few variables to analyse meaningfully against the research questions, and some because they leaned towards qualitative rather than quantitative analysis, which made them a poor fit for this assignment.

As an airline customer, I realised that one of the most crucial points is how often an airline is able to keep its flights on time. Flight schedules should be maintained and unnecessary delays minimised to ensure convenience for the customer and to increase the airline’s reliability and reputation (Bronsvoort et al., 2009).

Weather conditions are one of the important factors that cause flight delays or cancellations. Flight delays also have negative impacts on airlines, especially low-fare carriers. Given the uncertainty of their occurrence, business passengers usually plan to travel many hours earlier for their appointments, increasing their trip costs, to ensure they arrive on time. Airlines, in turn, suffer penalties, fines and additional operating costs, such as crew and aircraft retention at airports. Furthermore, delays jeopardise airlines’ marketing strategies, since carriers rely on customer loyalty to support their frequent-flyer programmes and consumer choice is also affected by reliable performance. Estimating flight delays can improve the tactical and operational decisions of airline managers and warn passengers so that they can rearrange their plans (Rosenow, Lindner & Fricke, 2017).

In addition, knowing which weather conditions attract more passengers to travel to Australia can help airline companies to plan their trips and the associated costs.

When I searched for data, I found several datasets on factors that contribute to flight delays.

The dataset is the historical weather data of Australian cities from November 2007 to June 2017 in CSV format (13.6 MB in size). It has 142,194 rows and 24 columns, including Date, Location, Min & Max Temperature, Rainfall, Evaporation, Sunshine, Wind Gust Direction, Wind Gust Speed, Wind Direction at 9am and 3pm, Wind Speed at 9am and 3pm, Humidity at 9am and 3pm, Pressure at 9am and 3pm, Cloud at 9am and 3pm, Temperature at 9am and 3pm, Rain Today and Rain Tomorrow. I also plan to use other datasets that will help to answer my research questions, namely a flight delay dataset and passenger movements by month (Australian Government Website, 2019).
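The size criteria from the brief (at least 10 columns and 500 rows) can be checked before the file is loaded into Tableau. The following is a minimal sketch using only the Python standard library; the three-row sample and the function names `table_shape` and `meets_brief` are illustrative assumptions, not part of the original workflow.

```python
import csv
import io

# Hypothetical three-row sample standing in for the real weather CSV.
SAMPLE_CSV = """Date,Location,MinTemp,MaxTemp,Rainfall
2008-12-01,Sydney,19.5,22.4,15.6
2008-12-02,Sydney,19.5,25.6,6.0
2008-12-01,Cairns,24.0,31.2,0.0
"""


def table_shape(handle):
    """Return (data_rows, columns) for a CSV file-like object with a header row."""
    reader = csv.reader(handle)
    header = next(reader)
    rows = sum(1 for _ in reader)
    return rows, len(header)


def meets_brief(rows, columns, min_rows=500, min_columns=10):
    """Check the size criteria stated in the coursework brief."""
    return rows >= min_rows and columns >= min_columns


rows, columns = table_shape(io.StringIO(SAMPLE_CSV))
print(rows, columns, meets_brief(rows, columns))  # the tiny sample fails the check
print(meets_brief(142194, 24))  # the row and column counts reported in the text pass
```

For the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, e.g. `open(path, newline="")`, where `path` is wherever the CSV was saved.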

Furthermore, I planned the data analysis to answer two research questions, namely:

  1. How well can we predict airline flight cancellations or delays in Australia if we combine weather data with historical flight data?
  2. What time of year, week or day is best for avoiding flight delays?

Implementation

The project required me to use Tableau. I used historical weather data for Australian cities, covering 12 airports. The time period for all data was November 2007 to June 2017. The variables included are:

  1. Date, 
  2. Location, 
  3. Minimum & Maximum Temperature, 
  4. Rainfall, 
  5. Evaporation, 
  6. Sunshine, 
  7. Wind Gust Direction, 
  8. Wind Gust Speed, 
  9. Wind Direction at 9am and 3pm, 
  10. Wind Speed at 9am and 3 pm, 
  11. Humidity at 9am and 3pm, 
  12. Pressure at 9am and 3pm, 
  13. Cloud at 9am and 3pm, 
  14. Temperature at 9am and 3pm, 
  15. Rain Today, and 
  16. Rain Tomorrow

Additionally, I plan to use other datasets that will help to answer my research questions, namely a flight delay dataset and passenger movements by month. In all, the data sheet has 142,194 rows and 24 columns.

The Tableau dashboard has been synced and its filters have been activated. The following is a screenshot of the dashboard:

So, to answer all the questions, we only need to select a particular airport in the first graph, and all the other graphs will show data for that airport, making it easy for others to understand.

For example, if we select an airport, the wind speed, cloud and pressure graph tells us which time is best to fly, morning or evening. The Rainfall versus Risk graph, in turn, tells us which month to prefer and how high the risk level is for travel from that airport. The following are screenshots of the graphs that are generated once an airport is selected:

The above graph shows the passenger traffic, risk versus rainfall and other profiles for Sydney airport. Sydney is the busiest of the twelve selected airports by total number of passengers. Risk and rainfall are moderate compared to the other airports, and wind speed, cloud and pressure are also moderate. A passenger who wants to fly from this airport is advised to take a morning flight, as the wind speed is lower in the morning than in the evening.

The above graph shows the passenger traffic, risk versus rainfall and other profiles for Newcastle airport. Newcastle is the quietest of the twelve selected airports by total number of passengers. Risk and rainfall are moderate compared to the other airports, and wind speed, cloud and pressure are also moderate. A passenger who wants to fly from this airport is advised to take an evening flight, as there is less cloud in the evening than in the morning.

The above graph shows the passenger traffic, risk versus rainfall and other profiles for Cairns airport. Cairns has relatively little footfall compared to the total number of passengers across the twelve selected airports. Risk and rainfall are very high compared to the other airports, and wind speed, cloud and pressure are also high. A passenger who wants to fly from this airport is advised to take an evening flight, as there is less cloud in the evening than in the morning.

Walkthroughs

The data considered for this report comes from two different sources: one for the weather-related variables and one for the passengers travelling out of various airports. When the datasets were compared, twelve airports were common to both, and hence the analysis was carried out for these twelve airports.

These two datasets were then imported into Tableau and an inner join was applied on Airport and Location, which filtered the data down to the airports present in both datasets. Filtering the data is the first step; next comes linking the data together so that we have one data source from which to derive the desired analysis. Filtering yielded the 12 relevant airports around which the analysis is built.
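The effect of the inner join on Airport and Location can be illustrated outside Tableau. The following is a rough sketch in plain Python, not the Tableau implementation itself; the sample records, their values and the helper name `inner_join` are hypothetical.

```python
# Hypothetical sample records; only the key names Location and Airport come from the text.
weather = [
    {"Location": "Sydney", "Rainfall": 15.6},
    {"Location": "Cairns", "Rainfall": 0.0},
    {"Location": "Albury", "Rainfall": 0.6},
]
passengers = [
    {"Airport": "Sydney", "Month": 1, "Passengers": 1000},
    {"Airport": "Cairns", "Month": 1, "Passengers": 200},
    {"Airport": "Darwin", "Month": 1, "Passengers": 150},
]


def inner_join(left, right, left_key, right_key):
    """Pair every left row with every right row that has the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left:
        # Rows whose key has no partner on the other side are dropped.
        for match in index.get(row[left_key], []):
            joined.append({**row, **match})
    return joined


joined = inner_join(weather, passengers, "Location", "Airport")
common = sorted({row["Location"] for row in joined})
print(common)  # only places present in both samples survive the join
```

In this sample, Albury (weather only) and Darwin (passengers only) drop out, which mirrors how the join reduced the weather locations to the twelve airports shared by both sources.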

After filtering and linking, the data source is checked to confirm that it has all the fields needed for the final output. Once that is done, we start with the first worksheet, which shows the total number of passengers at each airport. This worksheet is important for understanding which airport sees the largest number of passengers flying in and out. The second worksheet plots rainfall and risk on a dual axis against airport. It is important for checking the two major factors in a flight delay, i.e. Rainfall and Risk. These were compared on a dual axis, which was synchronised so that both measures share the same scale and their values can be compared directly. The third worksheet shows the riskiest airport, assessed on three parameters (wind speed, cloud coverage and pressure), each at two times of day, 9am and 3pm. This helps to decide when a person should fly, in the morning or in the evening.

The three worksheets were then combined into a single dashboard. The first graph in the dashboard shows Total Passengers at each airport. The second is the rainfall versus risk graph. The third, at the bottom, shows wind speed, cloud and pressure for the 12 airports. All the graphs were linked by a filter, so that selecting an item in one graph removes everything unrelated from the others. As explained earlier, if we select a particular airport, all the other graphs show results for that airport only.

After everything was done, the file was saved in .twbx format so that the data is packaged with the workbook and nothing is lost.

As mentioned above, the Tableau dashboard has been synced and filters have been activated so that as soon as one of the twelve airports is selected in the first graph, all other graphs show the data for that airport. The three main aspects covered are:

  1. Total passengers from an airport
  2. Rainfall versus Airport
  3. Riskiest Airport

These aspects are discussed in detail in the following sections:

Total Number of Passengers

The above graph depicts the total number of passengers for each of the twelve airports. The data can be filtered by month, from 1 to 12, and the filter also keeps null values.

This graph therefore uses the total number of passengers travelling from each airport to show which airport has the most travellers and which has the fewest.
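The aggregation behind this worksheet, a sum of passengers per airport with a month filter that also keeps nulls, can be sketched as follows. The records, their values and the function name `total_passengers` are hypothetical and only illustrate the idea; the real totals come from the passenger dataset in Tableau.

```python
from collections import defaultdict

# Hypothetical monthly records; None stands for a missing (null) month.
records = [
    {"Airport": "Sydney", "Month": 1, "Passengers": 1000},
    {"Airport": "Sydney", "Month": 2, "Passengers": 900},
    {"Airport": "Newcastle", "Month": 1, "Passengers": 50},
    {"Airport": "Newcastle", "Month": None, "Passengers": 10},
]


def total_passengers(rows, months=range(1, 13), keep_null=True):
    """Sum passengers per airport for the selected months, optionally keeping nulls."""
    totals = defaultdict(int)
    for row in rows:
        month = row["Month"]
        if month in months or (month is None and keep_null):
            totals[row["Airport"]] += row["Passengers"]
    return dict(totals)


totals = total_passengers(records)
busiest = max(totals, key=totals.get)
quietest = min(totals, key=totals.get)
print(totals, busiest, quietest)

# Narrowing the filter to month 1 and dropping nulls changes the sums.
january = total_passengers(records, months=[1], keep_null=False)
print(january)
```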

Rainfall versus Risk of Each Airport

This graph is a scatter plot with four data fields: Rainfall, Risk, Airport name and Month. The four fields are combined so that airports can be compared: an airport is considered risky if both its rainfall and its risk are high, and vice versa. Orange circles represent Risk and blue circles represent Rainfall.

The above scatter plot shows only 2 dots for each airport, because the values in the dataset are the same from Month 2 to Month 12. As a result, Tableau effectively shows only the data for the 1st and 2nd months.
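One way to read this behaviour is that marks with identical coordinates are drawn on top of each other, so twelve monthly rows can collapse into two visible positions. The sketch below illustrates that reading under assumed values; the numbers and the helper name `distinct_marks` are hypothetical, and this is not a description of Tableau's internals.

```python
# Hypothetical (month, rainfall, risk) values for one airport; months 2 to 12 repeat.
points = [(1, 8.2, 3.1)] + [(month, 5.0, 2.0) for month in range(2, 13)]


def distinct_marks(rows):
    """Return the unique (rainfall, risk) positions, in first-seen order."""
    seen = []
    for _month, rainfall, risk in rows:
        if (rainfall, risk) not in seen:
            seen.append((rainfall, risk))
    return seen


marks = distinct_marks(points)
print(len(points), len(marks))  # twelve rows, but only two distinct positions
```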

The colour shows details for Risk Mm and Rainfall, and details are indicated for a single month. The view is filtered on ‘Rain Today’, Action (Location) and Action (Airport). The Rain Today filter keeps ‘Yes’ and ‘No’. The Action (Location) filter keeps 49 members. The Action (Airport) filter keeps 12 members. The view is also filtered on Month, which ranges from 1 to 12.

Riskiest Airport

This graph uses 7 fields: wind speed at 3pm, wind speed at 9am, cloud at 3pm, cloud at 9am, pressure at 3pm, pressure at 9am and location. It is a scatter plot and helps us to decide which airport is the riskiest by comparing wind speed, cloud and pressure at 3pm and 9am. It also helps us to decide whether to take a morning or an evening flight in order to avoid turbulent weather and the resulting delay. Dark circles are 9am and light circles are 3pm.

The above graph indicates sum of windspeed at 9am and 3pm, sum of cloud at 9am and 3pm and sum of pressure at 9am and 3pm at each location. Color shows details for Location. The data is filtered on Action (Airport) which keeps 12 members. 
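The morning-versus-evening comparison read off this graph amounts to comparing the 9am and 3pm values of a measure for each airport. A minimal sketch, assuming hypothetical readings and an illustrative helper named `calmer_slot` (the field names only mirror those in the text):

```python
# Hypothetical per-airport readings; these values are not taken from the dataset.
readings = {
    "Sydney": {"WindSpeed9am": 15.0, "WindSpeed3pm": 19.0},
    "Cairns": {"WindSpeed9am": 16.0, "WindSpeed3pm": 14.0},
}


def calmer_slot(row, field="WindSpeed"):
    """Return the time of day with the lower value of the given measure."""
    morning, afternoon = row[f"{field}9am"], row[f"{field}3pm"]
    if morning == afternoon:
        return "either"
    return "9am" if morning < afternoon else "3pm"


suggestions = {airport: calmer_slot(row) for airport, row in readings.items()}
print(suggestions)
```

The same helper could be applied to cloud or pressure by passing a different `field`, provided the records carry the matching 9am and 3pm keys.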

Evaluation

From the above analysis, we were able to observe trends and patterns across the 12 airports that answer the following questions:

  1. Which airport has the maximum number of passengers over the years: This is directly evident from the bars in the Total Passengers graph. Sydney airport has the maximum footfall, followed by Melbourne and Brisbane in that order.
  2. Which airport has the minimum number of passengers over the years: This is also directly evident from the bars in the Total Passengers graph. Newcastle airport has the minimum footfall, followed by Launceston and Townsville in that order.
  3. Which airport is the safest: Melbourne has the second-highest number of passengers, yet in the Risk and Rainfall graph it lies right at the bottom with the lowest score. In the other graph its values are moderate compared to the others, so Melbourne is the safest of the 12 airports.
  4. Which airport is the riskiest: Cairns has one of the lowest passenger numbers, but in the Risk and Rainfall graph it is right at the top. In the other graph its values are also close to the highest of all the airports, so Cairns is the riskiest of the 12.
  5. Which month is good to travel in: Since almost all the airports show higher values in the 1st month than in the other months in the Risk and Rainfall graph, it is suggested that one should avoid travelling in the first month.
  6. What time one should fly from the 12 airports (morning or evening): The Riskiest Airport graph suggests that one should avoid evening flights and opt for morning flights. The sky is clear and wind speed is moderate in the morning. Pressure is almost the same in the morning and the evening.

Hence, we were able to profile the busiest and least busy airports, and the riskiest and safest airports. The analysis also helped to identify the best time, whether morning or evening, to take a flight from various airports. Additionally, we determined the best months to fly so as to avoid weather-related delays.

However, this analysis was limited only to weather-related factors that can impact the timing of flights. There can be many other factors that can impact the timing of flights from various airports. This can be a point of direction for further research in this area.