Dataset Analysis on Modes of Transportation Used in NSW and Their Patterns
Section 1a: Introduction
This assignment discusses the provided dataset, which gives insight into the various modes of transportation used in NSW and the patterns therein. The data is analysed to understand the distribution of users amongst the modes of transportation, namely bus, train, ferry and light rail. The data also provides a breakdown by location and station.
The second dataset selected is based on the train statistics for NSW available on the government’s transportation website (Bureau of Transport Statistics, 2014). It gives insight into journeys to work made by passengers and the share of those journeys made by train. The data is divided by centre and provides total trips as well as the railway share. These numbers are given both including and excluding ‘walk only’ trips, so as to give a better idea of the use of transportation as well as of journeys covered on foot.
The citation for the article is: Bureau of Transport Statistics (2014) ‘Train Statistics 2014: Everything you need to know about Sydney Trains and NSW TrainLink’ [online]. Available at: https://www.transport.nsw.gov.au/sites/default/files/media/documents/2017/Train%20Statistics%202014.pdf
Section 1b: Dataset 1
Dataset 1 is primary data and is a subset of the main data available on the government website. The subset covers the period from 8th August to 14th August, 2016.
The data covers the count of passengers by mode of transportation (train, bus, light rail and ferry) and also provides a breakdown by location. Further, it records whether each count is a tap on or a tap off at each location.
Section 1c: Dataset 2
Dataset 2 is also primary data, collected on the basis of the 2011 Australian census. It is based on the train statistics for NSW available on the government’s transportation website (Bureau of Transport Statistics, 2014). The data gives insight into journeys to work made by passengers and the share of those journeys made by train. It is divided by centre and provides total trips as well as the railway share. These numbers are given both including and excluding ‘walk only’ trips, so as to give a better idea of the use of transportation as well as of journeys covered on foot.
Section 2a: Type of Mode
To find the share of each mode, a pivot table was created that groups the records by mode of transportation and sums the provided count for each mode.
This was used to create a pie chart as follows:
The pie chart shows that train is the most widely used mode of transportation, accounting for about 60% of the total count in the provided subset.
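The pivot-table step can also be sketched outside Excel. The following is a minimal Python illustration only: the function name `mode_shares` and the sample rows are hypothetical and are not taken from the actual dataset.

```python
from collections import defaultdict


def mode_shares(records):
    """Sum the counts per mode and return each mode's share of the total in percent."""
    totals = defaultdict(int)
    for mode, count in records:
        totals[mode] += count
    grand_total = sum(totals.values())
    return {mode: 100 * total / grand_total for mode, total in totals.items()}


# Hypothetical (mode, count) rows for illustration; NOT values from the dataset.
sample_records = [
    ("train", 30),
    ("bus", 30),
    ("train", 20),
    ("ferry", 12),
    ("lightrail", 8),
]

shares = mode_shares(sample_records)
for mode, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{mode}: {share:.1f}%")
```

The resulting shares are the same figures that a pie chart of the pivot table would display.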
Section 2b: Hypothesis Testing
The breakup of various modes of transportation is provided as follows:
Ferry and light rail each account for less than 3% of the total count, so neither can be the mode chosen by more than 50% of public transport users in NSW. The remaining two modes, bus (36.6%) and train (59.7%), account for the majority of users.
To compare these two modes, the counts for bus and train were separated into two columns. The mean and variance of each column were calculated using Excel formulae, and a two-sample z-test was then run. The output is as follows:
The one-tailed p-value is much less than the alpha of 0.05. Hence, we reject the null hypothesis that the mean counts of bus and train are equal. In other words, there is a significant difference between the mean counts of bus and train, which is consistent with train accounting for more than 50% of users during the period.
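For readers who wish to reproduce the test outside Excel, the two-sample z-test can be sketched as below. This is an illustration under stated assumptions: the sample variances are treated as the known variances, the function name `two_sample_z_test` is hypothetical, and the two lists are made-up values rather than the dataset's counts. The z-test is intended for large samples, so the short lists here only demonstrate the calculation.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z_test(sample_a, sample_b):
    """Return (z, one-tailed p, two-tailed p) for H0: mean of a equals mean of b."""
    # Standard error of the difference in means, using the sample variances.
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    p_one_tail = 1 - NormalDist().cdf(abs(z))
    return z, p_one_tail, 2 * p_one_tail


# Hypothetical counts for illustration; NOT values from the dataset.
train_counts = [120, 95, 140, 110, 130, 105]
bus_counts = [60, 75, 50, 80, 65, 70]

z, p_one, p_two = two_sample_z_test(train_counts, bus_counts)
alpha = 0.05
print(f"z = {z:.3f}, one-tailed p = {p_one:.5f}, two-tailed p = {p_two:.5f}")
print("Reject H0" if p_one < alpha else "Fail to reject H0")
```

The null hypothesis is rejected when the p-value is below alpha, which is the same decision rule applied to the Excel output.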
Section 3a: Recommendation for Underground Line
For this purpose, a pivot table was created with the various modes and locations, and a filter was applied to keep only ‘train’. The data for the three stations was then drawn from the pivot table using VLOOKUP:
This data was used to create a bar chart as follows:
From the bar chart above, it is clear that Parramatta has the highest number of passengers of the three stations at 1,578 (or 2.6% of total train users). Central Station is already busy at 3,997 passengers. Hence, the most suitable underground line appears to be Parramatta to Central Station, so as to ease the heavy footfall.
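The filter-and-lookup step can be sketched in Python as follows. This is only an illustration: the function name `station_totals`, the placeholder station names and the counts are all hypothetical and do not come from the dataset.

```python
from collections import Counter


def station_totals(records, mode, stations):
    """Return the total count per requested station, restricted to one mode."""
    totals = Counter()
    for record_mode, location, count in records:
        if record_mode == mode:  # same role as the filter on the pivot table
            totals[location] += count
    # Same role as the lookup: pick out only the stations of interest.
    return {station: totals[station] for station in stations}


# Hypothetical (mode, location, count) rows; NOT values from the dataset.
sample_records = [
    ("train", "Station A", 10),
    ("train", "Station B", 4),
    ("bus", "Station A", 7),
    ("train", "Station A", 5),
    ("train", "Station C", 6),
]

totals = station_totals(sample_records, "train", ["Station A", "Station B", "Station C"])
busiest = max(totals, key=totals.get)
print(totals, "busiest:", busiest)
```

Comparing the totals of the selected stations in this way corresponds to reading off the tallest bar in the chart.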
Section 3b: Hypothesis Testing for Tap on/off
For this purpose, a single-factor ANOVA test was conducted in MS Excel. The ‘count’ data for ‘on’ and ‘off’ was separated into two columns (for trains only) and the test was run at an alpha of 0.05. The hypotheses are:
H0: μon = μoff
H1: μon ≠ μoff
The result is as follows:
From the above, the value of F (2.47) is less than F crit (3.86). Hence, we are unable to reject the null hypothesis. In conclusion, there is no significant difference between the means of ‘tap on’ and ‘tap off’.
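The F statistic of a single-factor ANOVA can be computed by hand as sketched below. This is a minimal illustration: the function names `one_way_anova_f` and `reject_null` are hypothetical, the two sample lists are made-up values, and the critical value is assumed to be read from the Excel output or an F table rather than computed here.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def reject_null(f_stat, f_crit):
    """Decision rule: reject H0 only if F exceeds the critical value."""
    return f_stat > f_crit


# Hypothetical counts for illustration; NOT values from the dataset.
tap_on = [10, 12, 14]
tap_off = [11, 13, 15]
f_stat, df_b, df_w = one_way_anova_f(tap_on, tap_off)
print(f"F = {f_stat:.3f} with df = ({df_b}, {df_w})")

# The values reported in this section: F = 2.47 and F crit = 3.86.
print("Reject H0" if reject_null(2.47, 3.86) else "Fail to reject H0")
```

With the values reported in this section, the rule gives "Fail to reject H0", matching the conclusion drawn from the Excel output.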
Section 3c: Conclusion for section
Of the three stations selected for this purpose (Bankstown, Gosford and Parramatta), the heaviest footfall is at Parramatta. Central Station is busier still, with an even heavier footfall. We also saw that the means of tap on and tap off do not differ significantly at the given alpha level of 0.05.
Hence, it is recommended to build an underground railway line from Parramatta Station to Central Station so as to ease the heavy footfall.
Section 4: Analysis
Section 5: Discussion & Conclusion
In the above report, it was seen that during the period from 8th August to 14th August, 2016, the types of public transport utilised in NSW were bus, train, light rail and ferry. Only a small percentage of passengers used ferry (2%) and light rail (1%). The majority of people relied on buses (37%) and trains (60%).
The top 25 train stations accounted for almost 69% of the total train users, whereas the top 25 bus locations accounted for almost 53% of the total bus users during the period. The data is presented in the graphs below (count on x-axis):
This indicates that traffic can be managed much better if the focus is on the top 20 or top 25 locations instead of all of them.
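The share of users handled by the busiest locations can be computed as sketched below. This is an illustration only: the function name `top_n_share`, the location names and the counts are hypothetical and not taken from the dataset.

```python
def top_n_share(counts_by_location, n):
    """Return the percentage of the total count contributed by the n busiest locations."""
    ranked = sorted(counts_by_location.values(), reverse=True)
    return 100 * sum(ranked[:n]) / sum(ranked)


# Hypothetical totals per location; NOT values from the dataset.
sample_counts = {"Location A": 50, "Location B": 30, "Location C": 15, "Location D": 5}

share_top_2 = top_n_share(sample_counts, 2)
print(f"Top 2 locations account for {share_top_2:.1f}% of the total count")
```

Applied to the actual per-location totals with n = 25, the same calculation would give the top-25 shares for trains and buses.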
It was found that there is a significant difference between the mean counts of bus and train, indicating that train usage is much higher than bus usage and may account for more than 50% of transport users.
Hence, the focus is on improving traffic management at train stations. Central Station (6.57%) is the second busiest station, after Town Hall Station at 10.09% of train users, as per the provided data. Analysis was done to understand the need for an underground railway line from Central Station to one of the three selected stations (corresponding percentage of train users in parentheses): Parramatta (2.59%), Bankstown (0.80%) and Gosford (0.86%).
Looking at the data, Parramatta is the busiest of the three, with the highest footfall. Hence, an underground line will be most beneficial from Parramatta to Central Station.
The data also provided categorisation by tap on and tap off, but hypothesis testing showed no significant difference between the means of tap on and tap off. Hence, it seems that this categorisation can be ignored for the purpose of analysis without any significant impact on the findings.