CS5605 Data Management in Industry 4: Reflective Essay Assessment Answer
In the present era, technology has completely changed the way we work, live and interact with each other. It has changed not only everyday human behaviour but also the working processes of almost every industry. Industry 4.0 stands for the complete integration and digitalisation of the industrial value chain, linking it with automation technology through information and communication technology. For this automation to work, industry 4.0 must make relevant information available in real time so that all entities in the chain can be connected. Managing this automation requires handling a massive quantity of information. Data management is therefore one of the essential elements of industry 4.0, because it helps derive more value from the data.
Modern businesses need to process information in a timely and valuable manner so that they can optimise business processes and make decisions. The present competitive environment forces business organisations to process information at high speed and to integrate valuable data into the production process. For example, industry 4.0 is rapidly changing production. Industry 4.0 transformations have given machines the capability to process real-time data, which helps prevent disruption in production and facilitates self-diagnosis. These purposes require handling a vast quantity of information, so a better data management process needs to be applied in industry 4.0. Data management is also necessary for making practical decisions. This study is intended to describe the ways in which industry 4.0 data can be managed efficiently so that more value can be derived from it.
Concept of industry 4.0:
According to Lee et al. (2015), industry 4.0 has a broad vision that can be characterised as bridging digital technologies and physical industrial assets in cyber-physical systems. As argued by Stock and Seliger (2016), a combination of cyber-physical systems, the internet of systems and the internet of things has made industry 4.0 possible. Industry 4.0 offers manufacturers the opportunity to optimise operations efficiently and quickly. As an example, the case of a gold mine in Africa can be considered: there, industry 4.0 technology is highly effective in identifying issues with the oxygen level during leaching. The main advantages created by industry 4.0 are optimisation and customisation.
Requirements for data processing in industry 4.0:
According to Gölzer et al. (2015), industry 4.0 always seeks improvement in the production process. As mentioned by Qi and Tao (2018), this improvement requires effective integration of standardised interfaces, semantics and data so that data can be exchanged and better communication maintained. Several publications address the requirements for data integration, but only a few of them deal with the data contents, which is also necessary. As stated by Barreto et al. (2017), effective processing of information is essential for industry 4.0. As argued by Wang and Wang (2016), decisions in industry 4.0 are generally processed by an arbitrary group of cyber-physical systems that forms an ad-hoc network. This group is usually formed on the basis of the environmental conditions and the individual conditions in production. The decentralised and autonomous control of cyber-physical systems also requires goals for the decision-making process. An overall system goal is essential for optimising the whole production network and the value chain when decisions are made. As mentioned by Li et al. (2017), enhancing the decision process in industry 4.0 requires the use of formalised knowledge extracted from historical data. This knowledge is essential to decision making. The data are mainly generated by machinery and are continuously growing; therefore, they must be processed to derive more value from them.
Big data analytics for managing data in industry 4.0:
Industry 4.0 requires effective, high-performance processing so that large data volumes can be handled. As stated by Niesen et al. (2016), big data analytics is one of the effective solutions for handling large data volumes in industry 4.0 so that more value can be derived from them. As argued by Witkowski (2017), the data-related issues faced by industry 4.0 are the continuous generation of large amounts of data, difficulty in accessing large quantities of information whenever required, and difficulty in processing and accessing the data in real time. From this, it can be said that there are mainly two big data use cases that matter for accessing and processing the information generated by industry 4.0 so that it adds value to decision making (Reis and Gins, 2017). The first use case is data mining, which is required because it handles time-consuming analytics and mining over large amounts of information (Ur Rehman et al. 2018). However, random access to information and real-time queries are not supported by this use case. Instead, it requires a complete big data solution built on distributed computing and batch processing. The relevant software solution for data mining is MapReduce together with Hadoop HDFS.
The second big data use case needed to resolve data handling issues in industry 4.0 is entry access, which is concerned with performing ad-hoc queries across the whole network to facilitate operational decision making. According to (), entry access requires a big data solution that continuously supports random access to data along with real-time queries. Consequently, MongoDB, Cassandra or SimpleDB can be relevant software solutions, depending on the application scenario and the infrastructure.
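The difference between the two use cases can be illustrated with a minimal sketch. It assumes a small in-memory list of sensor readings; the machine names, field names and function names are hypothetical and are not taken from any of the cited systems. The first function scans every record, as a data mining job would, while the second answers a single ad-hoc query through an index, as an entry-access store would.

```python
from statistics import mean

# Hypothetical sensor readings; machine ids and fields are illustrative only.
READINGS = [
    {"machine": "press-1", "ts": 1, "temp_c": 71.0},
    {"machine": "press-2", "ts": 1, "temp_c": 64.5},
    {"machine": "press-1", "ts": 2, "temp_c": 73.0},
    {"machine": "press-2", "ts": 2, "temp_c": 65.5},
]


def batch_average_temperature(readings):
    """Data mining style: scan every record once and aggregate per machine."""
    by_machine = {}
    for reading in readings:
        by_machine.setdefault(reading["machine"], []).append(reading["temp_c"])
    return {machine: mean(values) for machine, values in by_machine.items()}


def build_index(readings):
    """Entry-access style: index records by (machine, ts) for direct lookup."""
    return {(r["machine"], r["ts"]): r for r in readings}


def lookup_reading(index, machine, ts):
    """Answer one ad-hoc query without scanning the whole data set."""
    return index.get((machine, ts))


INDEX = build_index(READINGS)
```

The batch function touches all records on every call, which is acceptable for periodic analysis, whereas the indexed lookup returns one record directly, which is the behaviour an operational query needs.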
Outline of the case and topic that need to be addressed:
Industry 4.0 refers to the process of development in chain production and the management of manufacturing (Wan et al. 2016). The main idea behind industry 4.0 is to develop a social network through which machines can communicate with each other. This system is called the internet of things. When machines communicate with the manufacturer and with each other, they form a cyber-physical production system (Wagner et al. 2017). Together, these integrate the real world with the virtual world. This integration enables machines to collect and analyse live information to support decision making.
Industry 4.0 has brought about an industrial revolution by combining advanced production and operational techniques with smart digital technologies to form a digital enterprise that is autonomous and interconnected (Stock and Seliger, 2016). To facilitate these activities, industrial equipment in industry 4.0 generates a huge amount of information that holds potential business value. The big data generated by industrial machinery takes advantage of industrial internet technology (O’Donovan et al. 2015). This huge quantity of raw data helps management make good decisions, so that customer service can be improved and the maintenance costs of the organisation can be reduced (Wan et al. 2017). Therefore, in maintaining communication among different types of machinery and the manufacturer, industry 4.0 produces a huge quantity of information that needs to be managed effectively so that more value can be generated from it.
Big data analytics is one of the effective solutions for handling such large amounts of data. The software solution for big data is therefore one of the most important components of industry 4.0 (Wagner et al. 2017). Big data software solutions bring a large variety of applications, characteristics and capabilities. Apache Hadoop is one such solution; others include SimpleDB, Redis and Terrastore. Data can be managed easily by using a NoSQL database (Gilchrist, 2016). NoSQL databases can be classified by taxonomies at different levels. One criterion is the data model, of which there are mainly three types: key-value, column-oriented and document-based. Another criterion by which big data solutions can be classified is the data processing principle.
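As an illustration of the three data models, the following sketch expresses one hypothetical sensor reading in each of them using plain Python structures. It does not use the API of any real NoSQL product; the key format, column names and document layout are assumptions made only for this example.

```python
# One hypothetical sensor reading expressed in the three NoSQL data models.

# Key-value: an opaque value stored under a single key.
key_value_store = {"press-1:2": "73.0"}

# Column-oriented: a row key maps to named columns,
# each column holding values keyed by timestamp.
column_store = {"press-1": {"temp_c": {2: 73.0}, "rpm": {2: 1200}}}

# Document-based: a self-describing nested record.
document_store = [
    {
        "_id": "press-1:2",
        "machine": "press-1",
        "ts": 2,
        "metrics": {"temp_c": 73.0, "rpm": 1200},
    }
]


def temp_from_key_value(store, machine, ts):
    """The application must know the key format and parse the value itself."""
    return float(store[f"{machine}:{ts}"])


def temp_from_columns(store, machine, ts):
    """A single column can be read without touching the other columns."""
    return store[machine]["temp_c"][ts]


def temp_from_documents(store, machine, ts):
    """Documents can be filtered on any field they contain."""
    for doc in store:
        if doc["machine"] == machine and doc["ts"] == ts:
            return doc["metrics"]["temp_c"]
    return None
```

All three lookups return the same temperature; what differs is how much structure the store itself understands, which in turn affects the kinds of query it can answer efficiently.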
With the help of these data processing principles, the data gained from industry 4.0 can be processed and used in decision making so that more value can be generated from it. The data processing principles can be divided into three types (Gilchrist, 2016). The first principle is batch processing, where a standard algorithm splits the large dataset into small subsets, which are then processed individually. The common algorithm used here is MapReduce (Wagner et al. 2017). The second principle is storing data in a semi-structured data model. This makes it possible to answer real-time queries and achieve random access to the data without joins or other time-consuming operations.
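The batch processing principle can be sketched in a few lines, assuming the dataset is a simple list of numeric readings. The function names and the choice of an average as the result are hypothetical; the point is only that each subset is summarised on its own and the partial results are combined afterwards.

```python
def split_into_batches(records, batch_size):
    """Split a large list into consecutive subsets of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def summarise(batch):
    """Process one subset independently: return its (sum, count)."""
    return (sum(batch), len(batch))


def combine(partials):
    """Merge the per-batch summaries into one overall average."""
    total, count = 0.0, 0
    for batch_sum, batch_count in partials:
        total += batch_sum
        count += batch_count
    return total / count if count else None
```

For example, `combine(summarise(b) for b in split_into_batches(readings, 2))` gives the same average as processing `readings` in one pass, but each batch could be handled by a different worker.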
Analysis of the case by using the selected framework:
Among the different big data management frameworks, Apache Hadoop effectively shows how data can be managed. Apache Hadoop is a scalable, open-source, fault-tolerant framework written in Java. It is well suited to batch processing and is highly effective in processing large sets of information on clusters of commodity hardware (Mavridis and Karatza, 2017). Modern versions of Hadoop are composed of several layers, or components, that work together to process batch information. In industry 4.0, data are generated at extraordinary volumes and speeds, so storing and managing this information is difficult for the machinery itself (Ousterhout et al. 2015). To facilitate data management and data storage and to generate value, an analytical system for industry 4.0 needs to include real-time analytics.
The layers of the Hadoop framework associated with batch processing are the Hadoop Distributed File System (HDFS), YARN and MapReduce. HDFS is the distributed file system layer, which replicates and coordinates storage across the cluster nodes (Ousterhout et al. 2015). This component ensures that information remains available even when hosts fail. HDFS is used as the source of information, as the storage for information and for holding the results of calculations. The first step of data management in industry 4.0 is the collection of proper information. Modern equipment has a higher level of automation, and its data are generated by an increasing number of sensors (Mavridis and Karatza, 2017). HDFS is associated with collecting this information because it acts as the source of information for later processing. Therefore, if the Hadoop framework is applied in industry 4.0, its HDFS layer can help industry 4.0 store information.
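How replication keeps information available during host failures can be shown with a toy model. This is a deliberate simplification: it places replicas round-robin, whereas real HDFS uses a rack-aware placement policy, and the host names and function names are hypothetical. The replication factor of three matches the HDFS default.

```python
def place_blocks(num_blocks, hosts, replication=3):
    """Assign each block to `replication` distinct hosts, round-robin.

    Simplified placement for illustration; not the HDFS placement policy.
    """
    if replication > len(hosts):
        raise ValueError("replication factor exceeds number of hosts")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            hosts[(block + k) % len(hosts)] for k in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_hosts):
    """Return the blocks that still have at least one replica on a live host."""
    failed = set(failed_hosts)
    return {
        block
        for block, replicas in placement.items()
        if any(host not in failed for host in replicas)
    }
```

With four hosts and three replicas per block, every block survives the loss of any two hosts; data are lost only when all hosts holding a block's replicas fail together.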
YARN (Yet Another Resource Negotiator) is the cluster-coordinating component of the Hadoop stack. YARN manages and coordinates the cluster resources and schedules jobs. It has made it possible to run more workloads on a Hadoop cluster (Siddique et al. 2016). Therefore, if Apache Hadoop is applied in industry 4.0, its YARN component will coordinate cluster resources so that tasks working on the data held in HDFS can be scheduled across the Hadoop cluster.
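The idea of negotiating resources before scheduling jobs can be sketched as follows. This is a toy first-in, first-out, first-fit allocator written only for illustration; it is not how the YARN schedulers are implemented, and the job names, node names and memory sizes are assumptions.

```python
def schedule(jobs, node_capacity_mb):
    """Place each job's container on the first node with enough free memory.

    jobs: list of (job_id, needed_mb) in submission order.
    node_capacity_mb: dict mapping node name to free memory in MB.
    Returns (assignments, pending): placed jobs and jobs that must wait.
    """
    free = dict(node_capacity_mb)  # copy so the caller's dict is not modified
    assignments, pending = {}, []
    for job_id, needed_mb in jobs:
        for node, available in free.items():
            if available >= needed_mb:
                free[node] -= needed_mb
                assignments[job_id] = node
                break
        else:
            # No node had room, so the job waits for resources to be released.
            pending.append(job_id)
    return assignments, pending
```

Separating this allocation step from the processing engine is what allows several kinds of workload to share the same cluster.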
MapReduce is the batch processing engine of Hadoop and is therefore responsible for processing huge amounts of data (Greeshma and Pradeepini, 2016). If the Hadoop framework is applied in industry 4.0, the MapReduce component will handle the large amounts of data generated by machinery and sensors (Mavridis and Karatza, 2017). MapReduce then analyses these large amounts of data in batches. The MapReduce component therefore performs the main task in industry 4.0 data handling, which is batch processing (Ousterhout et al. 2015). The tasks it supports are rapid decision making and the storing and managing of information. It works by splitting the entire dataset into subsets and processing each of them. Because MapReduce is a batch engine, real-time queries are better served by the entry-access use case described earlier, with MapReduce supplying the analysed results on which those decisions draw.
Figure 1: MapReduce architecture
(Source: Gorsevski et al. 2019)
The MapReduce architecture performs the entire batch processing task in industry 4.0 through two types of tasks: map tasks and reduce tasks (Ghazi and Gangodkar, 2015). A map task involves two steps, splitting the input and then mapping it. A reduce task involves shuffling and then reducing (Greeshma and Pradeepini, 2016). Once the Hadoop framework is applied in industry 4.0, the MapReduce architecture divides the entire input (read from HDFS) into subtasks, which makes it easier to manage the large volume of data gathered from industry 4.0 machinery (Verma et al. 2015). The job tracker then coordinates the activities by scheduling tasks on different data nodes. A task tracker residing on every data node executes its assigned tasks so that the proper part of the job is carried out (Ghazi and Gangodkar, 2015). The job tracker and the task trackers then check the status of the entire system.
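The split, map, shuffle and reduce steps described above can be sketched in memory as follows. This is a single-process illustration of the programming model, not Hadoop code; the record fields and the choice of "maximum temperature per machine" as the job are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit (machine, temperature) pairs from one input split."""
    return [(record["machine"], record["temp_c"]) for record in split]


def shuffle_phase(mapped_outputs):
    """Shuffle: group all emitted values by key across every map output."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values to a single result (the maximum)."""
    return {key: max(values) for key, values in grouped.items()}


def run_job(splits):
    """Run map on every split, then shuffle and reduce the combined output."""
    return reduce_phase(shuffle_phase(map_phase(split) for split in splits))
```

In a real cluster each call to `map_phase` would run as a separate map task on the node holding that split, and the shuffle would move data across the network to the reduce tasks.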
When the MapReduce architecture receives a set of data from the output of industry 4.0, it first segregates the data into splits (Hernández et al. 2015). It then creates a map task for each split so that every split is processed. It is beneficial for the MapReduce function to divide the input obtained from industry 4.0 into multiple parts (Greeshma and Pradeepini, 2016), because smaller subtasks can be processed faster and in parallel. However, if the splits are too small, the overhead of managing the splits and creating the map tasks can dominate the execution time of the job. It would be better for the designer to make the split size the same as the HDFS block size.
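The trade-off in choosing the split size can be made concrete with some simple arithmetic. In the sketch below, 128 MB is the default HDFS block size in Hadoop 2 and later; the 10 GB input, the one second of overhead per task and the 1000 seconds of useful work are assumptions chosen only to show the effect, and the function names are hypothetical.

```python
MB = 1024 * 1024
GB = 1024 * MB


def map_task_count(input_bytes, split_bytes):
    """Number of map tasks: one per split, rounding the last partial split up."""
    if split_bytes <= 0:
        raise ValueError("split_bytes must be positive")
    return -(-input_bytes // split_bytes)  # integer ceiling division


def overhead_fraction(tasks, per_task_overhead_s, useful_work_s):
    """Share of total time spent on task management rather than useful work."""
    overhead = tasks * per_task_overhead_s
    return overhead / (overhead + useful_work_s)
```

Under these assumptions, a 10 GB input with 128 MB splits needs 80 map tasks and spends well under a tenth of its time on overhead, whereas 1 MB splits need 10,240 tasks and spend most of the time on overhead.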
Discussion and critical reflection:
From the discussion above, I can say that the study has achieved the aims and objectives of the research. The main aim of the research was to identify the ways in which data management can be performed in industry 4.0, and the objective was to identify the means by which such management of data is possible. I have observed that industry 4.0 has technologically advanced machinery that enables automation and that can take decisions to increase production. To improve the production rate, or to offer the manufacturer valuable information about every phase of manufacturing, industry 4.0 creates a huge amount of information. This information must be handled very carefully, because the productivity of industry 4.0 depends on the decisions taken by the manufacturer. The discussion has helped me understand that industry 4.0 creates a cyber-physical system whose large volumes of information need to be stored and managed efficiently so that more value can be generated from the data. It has also made it clearer to me that big data analytics is highly efficient in managing the information collected from industry 4.0. Because the amount of data in industry 4.0 is continually increasing, I think that big data analytics is a strong solution for handling it. Finally, the discussion has made it clearer to me that data mining and entry access are two different use cases that are both highly important in handling the large amount of information generated by industry 4.0 (Verma et al. 2015).
The raw data generated by industry 4.0 need to be handled, stored and maintained so that better decisions can be made from them. I think that data mining is an effective solution for this, because data mining is the practice of examining a large amount of pre-existing information to generate new information (Cunha et al. 2015). In this context, new information needs to be developed in order to make better decisions, because the data generated in industry 4.0 are raw data that are not in human-readable form. Therefore, I think that data mining must be involved in the industry 4.0 data handling process in order to produce information in human-readable form.
I have observed that MapReduce is an effective architecture that segments the entire input gained from industry 4.0 into subtasks so that it can be handled efficiently. I think that the other use case, entry access, is also needed in the big data solution because it helps answer queries in real time. Industry 4.0 requires decision-making intelligence inside the machinery so that the manufacturing process can continue under automation (Sehgal and Agarwal, 2016). For this decision-making intelligence, answering real-time queries is highly important, and this is what entry access provides. It therefore helps in handling the data gained from industry 4.0.
The above discussion has also made it clearer to me that it would be beneficial to apply the Apache Hadoop framework in industry 4.0. The main reason is that Apache Hadoop works well in handling big data. Since the data generated by industry 4.0 are big data, I think that applying the Apache Hadoop framework in industry 4.0 is appropriate. Moreover, Apache Hadoop includes the MapReduce architecture, which is highly efficient in splitting the entire input into multiple parts so that the information can be processed effectively (Cavallo et al. 2016). Dividing the whole input into subtasks speeds up the delivery of answers to queries and yields information in human-readable form (Greeshma and Pradeepini, 2016). Therefore, the framework is highly efficient in facilitating better decision making.
I think that the HDFS component of Apache Hadoop is efficient in storing information, which helps industry 4.0 save a huge amount of data. Therefore, I think that the application of the Apache Hadoop framework in industry 4.0 is justified. From the reflection above, I can say that the study has satisfied the aims and objectives of the research. The main reason is that the reflection has shown the ways in which the data in industry 4.0 can be handled so that they can be represented in human-readable form, supporting effective decision making that can help improve the production processes.
Science fiction stories:
President X of ABC company announced at the meeting that he wanted to improve data management in the organisation's industry 4.0 system. The cabinet room went silent.
“It is difficult for us to improve the entire industry 4.0 system of our organisation,” the first engineer said.
The second engineer said that it would be better for them to improve the maintenance procedures as a whole.
The third engineer of ABC company asked the other two engineers, “Why are we not focusing on the run-to-failure strategies for maintaining and improving it?” At that point, the second engineer added, “It will be a great idea for us to improve the availability and reliability of our equipment. This is because ageing equipment is creating difficulties for run-to-failure strategies.”
“Let’s discuss the idea with the president,” the first engineer said.
“It’s a good idea, but will you be successful in improving data management in industry 4.0?” the president said. “All we need is an improved version of the data management facilities in industry 4.0.”
“Why don't we develop a system in industry 4.0 that receives data directly from the machines and then takes decisions automatically when the machinery fails?” the first engineer said. “I want to develop a system where machines automatically repair their parts after collecting information from different machines, so that the system can identify the exact position in which repair is required.”
“It is one of the strategies that improve the entire industry 4.0 system, but the president asked about the improvement of data management,” the second engineer said.
“I think that it is one of the data management strategies, because here data are collected from the machines automatically with the help of big data analytics and then processed by the analytics so that the error can be identified in the proper location,” the first engineer said.
“Why don't we just upgrade the cybersecurity system of the entire industry 4.0 system?” the third engineer said.
“It is more important to facilitate maintenance than security,” the first engineer said. “We already have a proven security mechanism for cyber attacks.”
“We should go to the president with the idea,” the second engineer said.
“This is one of the best ideas to increase value from data management,” the president said.
“You should proceed with the idea,” the president said to the first engineer.
Conclusion:
From the above discussion, it can be concluded that applying an effective data management strategy is essential for increasing the value of the data gained from industry 4.0. Industry 4.0 creates a vast amount of information at every stage of production, mainly from machinery and sensors. These data need to be handled properly so that better information can be generated to support proper decisions and improve production efficiency. Because industry 4.0 generates such a vast amount of information, it would be better to apply big data analytics. Big data analytics is crucial for storing the information and gives the manufacturer the opportunity to access and analyse it at any time. It examines the information and presents it in human-readable form, which helps the manufacturer make proper decisions based on the data. This study has therefore provided a brief overview of big data analytics. After describing the ways of handling data in the industry, the study has provided a critical reflection on the whole study, which helps in understanding how successful the research was in achieving its goal. Finally, the study has provided a science fiction story about improving value creation through the management of data.