Cloud Computing and Big Data Analytics Homework #2
Exercise 1:
Using Hadoop MapReduce, write a program that analyzes a log file and extracts and prints the lines containing a specific word such as “MR”:
- You are required to explain the workflow by specifying the Map and Reduce key-value pairs.
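As a starting point, the Map and Reduce pairs for this kind of "grep" job can be sketched in plain Python. This is only a local simulation of the map, shuffle and reduce phases, not the Hadoop API: the names grep_mapper, identity_reducer and run_job, the sample log lines, and the use of the line index in place of a byte offset are all assumptions made for illustration. Map takes (offset, line) and emits (offset, line) only when the line contains the word; Reduce passes each pair through unchanged.

```python
import re
from itertools import groupby
from operator import itemgetter


def grep_mapper(offset, line, word="MR"):
    """Map: (offset, line) -> (offset, line), emitted only if `word` occurs as a whole word."""
    if re.search(r"\b" + re.escape(word) + r"\b", line):
        yield offset, line


def identity_reducer(offset, lines):
    """Reduce: (offset, [line, ...]) -> (offset, line), passed through unchanged."""
    for line in lines:
        yield offset, line


def run_job(records, mapper, reducer):
    """Local stand-in for a MapReduce run: map, then sort and group by key, then reduce."""
    mapped = [pair for key, value in records for pair in mapper(key, value)]
    mapped.sort(key=itemgetter(0))  # simulated shuffle and sort
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, [value for _, value in group]))
    return output


# Hypothetical log content, used only to exercise the sketch.
sample_log = [
    "INFO starting MR job",
    "WARN disk almost full",
    "INFO MR job finished",
]
records = list(enumerate(sample_log))  # line index stands in for the byte offset
matches = run_job(records, grep_mapper, identity_reducer)
for _, line in matches:
    print(line)
```

Because the reducer is an identity function, a real job of this kind could also run as map-only; the reduce step is kept here so that both pairs are visible.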
Exercise 2:
Given a CSV file containing a list of flights (2016-2019):
Flight 40, Saudi Air lines, Cancelled, 15/8/2018
Flight 50, Air Italia, Not Cancelled, 18/1/2019
Flight 45, Saudi Air lines, Not Cancelled, 11/7/2019
Flight 60, Air Italia, Cancelled, 19/12/2016
Flight 70, Saudi Air lines, Not Cancelled, 18/11/2016
Write an MR program (Hadoop MapReduce) to determine the number of effective (i.e., not cancelled) flights per carrier.
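One way to reason about the pairs for this exercise is sketched below in plain Python. It simulates the phases locally and does not use the Hadoop API. It assumes that an "effective" flight is one whose status field is exactly "Not Cancelled", and that every record has four comma-separated fields; the function names are hypothetical. Map emits (carrier, 1) for each effective flight, and Reduce sums the ones per carrier.

```python
from collections import defaultdict


def flight_mapper(_offset, line):
    """Map: (offset, line) -> (carrier, 1) for each flight that was not cancelled."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4:
        return  # skip malformed records
    _flight, carrier, status, _date = fields
    if status == "Not Cancelled":  # assumed meaning of "effective"
        yield carrier, 1


def count_reducer(carrier, ones):
    """Reduce: (carrier, [1, 1, ...]) -> (carrier, total)."""
    yield carrier, sum(ones)


def run_job(lines):
    """Local stand-in for a MapReduce run: map, group by key, reduce."""
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in flight_mapper(offset, line):
            grouped[key].append(value)  # simulated shuffle
    result = {}
    for carrier in sorted(grouped):
        for key, total in count_reducer(carrier, grouped[carrier]):
            result[key] = total
    return result


# The five records listed in the exercise.
flights = [
    "Flight 40, Saudi Air lines, Cancelled, 15/8/2018",
    "Flight 50, Air Italia, Not Cancelled, 18/1/2019",
    "Flight 45, Saudi Air lines, Not Cancelled, 11/7/2019",
    "Flight 60, Air Italia, Cancelled, 19/12/2016",
    "Flight 70, Saudi Air lines, Not Cancelled, 18/11/2016",
]
counts = run_job(flights)
for carrier, total in counts.items():
    print(carrier, total)
```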
Exercise 3:
Using Spark transformations and actions, write a program that analyzes a log file and counts and prints the lines starting with the word “ERROR”.
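The shape of such a program can be previewed without a cluster. The sketch below is plain Python that mirrors the RDD steps rather than calling Spark; the function name and the sample log lines are assumptions. In PySpark the same steps would correspond to the filter transformation followed by the count and collect actions on an RDD obtained from textFile.

```python
def count_error_lines(lines):
    """Keep lines that start with "ERROR", then return their count and the lines."""
    # Transformation, mirrors: rdd.filter(lambda line: line.startswith("ERROR"))
    error_lines = [line for line in lines if line.startswith("ERROR")]
    # Actions, mirror: .count() and .collect()
    return len(error_lines), error_lines


# Hypothetical log content, used only to exercise the sketch.
sample_log = [
    "ERROR disk full",
    "INFO job started",
    "WARN ERROR rate rising",
    "ERROR connection lost",
]
total, error_lines = count_error_lines(sample_log)
print(total)
for line in error_lines:
    print(line)
```

Note that the third sample line contains "ERROR" but does not start with it, so it is excluded.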
Exercise 4:
Using Spark-MapReduce, write a program to determine the number of duplicate elements in a list of numbers contained in a given text file (numbers.txt).
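A possible decomposition is sketched below in plain Python, with each step labelled by the RDD operation it mirrors. It rests on two assumptions that the exercise does not state: numbers.txt holds whitespace-separated numbers, and "number of duplicate elements" means the number of distinct values that occur more than once. The function name is hypothetical. The Map pair is (number, 1) and the Reduce pair is (number, occurrences).

```python
def count_duplicates(text):
    """Return how many distinct tokens occur more than once in `text`."""
    # Mirrors: flatMap(lambda line: line.split())
    numbers = text.split()
    # Map pairs, mirrors: map(lambda n: (n, 1))
    pairs = [(number, 1) for number in numbers]
    # Reduce pairs, mirrors: reduceByKey(lambda a, b: a + b)
    totals = {}
    for key, one in pairs:
        totals[key] = totals.get(key, 0) + one
    # Mirrors: filter(lambda kv: kv[1] > 1)
    duplicated = [key for key, occurrences in totals.items() if occurrences > 1]
    # Mirrors: count()
    return len(duplicated)


# Hypothetical file content: 2 and 3 are repeated, 1 and 4 are not.
sample_text = "1 2 2 3\n3 3 4\n"
print(count_duplicates(sample_text))
```

Tokens are compared as strings here, so "2" and "02" would count as different values; converting each token to a number first is an alternative if the file mixes formats.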
Exercise 5:
Consider the following text files:
Messages1.txt
‘Spark is faster than Hadoop MR.
MapReduce is somewhat limited compared to Spark.’
Messages2.txt
‘Apache Spark has been developed in AMPLAB.
Later Spark has been donated to Apache Software Foundation.
Batch processing is done in memory.’
1. Write a program in Spark-MR which performs the following: Using the messages1.txt and messages2.txt files:
You are required to specify Map Pairs and Reduce Pairs, and the appropriate RDD transformations and actions to be used.
2. Show the results of executing this program on these two text files.