Towards a Data Lake for High Pressure Die Casting

This article summarizes the paper "Towards a Data Lake for High Pressure Die Casting", published by MDPI.

Figure 1. Network layout and technology stack from the HPDC cell to the storage solution

1. Overview:

  • Title: Towards a Data Lake for High Pressure Die Casting
  • Authors: Maximilian Rudack, Michael Rath, Uwe Vroomen and Andreas Bührig-Polaczek
  • Publication Year: 2022
  • Published Journal/Society: Metals
  • Keywords: High Pressure Die Casting (HPDC); data lake; internet of production; Industry 4.0; digital foundry; OPC UA; Node-Red; MinIO

2. Abstract:

The High Pressure Die Casting (HPDC) process is characterized by a high degree of automation and therefore represents a data rich production technology. From concepts such as Industry 4.0 and the Internet of Production (IoP), it is well known that the utilization of process data can facilitate improvements in product quality and productivity. In this work, we present a concept and its first steps of implementation to enable data management via a data lake for HPDC. Our goal was to design a system capable of acquiring, transmitting and storing static as well as dynamic process variables. The measurements originate from multiple data sources based on the Open Platform Communication Unified Architecture (OPC UA) within the HPDC cell and are transmitted via a streaming pipeline implemented in Node-Red and Apache Kafka. The data are subsequently stored in a data lake for HPDC that is based on a MinIO object store. In initial tests the implemented system proved to be reliable, flexible and scalable. On standard consumer hardware, data handling of several thousand measurements per minute is possible. The use of the visual programming language Node-Red enables swift reconfiguration and deployment of the data processing pipeline.

3. Research Background:

Background of the research topic:

The HPDC process is a discontinuous permanent mold based production technology used for casting near net shape metal components, primarily in the automotive industry. It involves high production volumes and complex interconnected subsystems within an HPDC cell (thermal regulation, spraying, vacuum, dosing, etc.). These systems are controlled by Programmable Logic Controllers (PLCs).

Status of previous research:

With increasing availability of interfaces like OPC UA [1,2], research is shifting towards processing data with finer granularity [3,4]. Analytical evaluation of data is recognized for its potential to increase productivity and quality [5,6]. Existing research emphasizes the need for data infrastructure to utilize modern data analysis methods.

Need for research:

Before data analysis can be applied, a data infrastructure is needed to collect, process, store, and make available machine, process, and product data [7]. This infrastructure, often termed Online Analytical Processing (OLAP), includes data warehouses and data lakes [8,9]. A data pipeline is crucial for ingesting and organizing this data [10,11].

4. Research purpose and research question:

Research purpose:

To present a concept and its initial implementation steps to enable a data lake for HPDC. This includes data collection at the machine and efficient forwarding to the cloud using a data pipeline.

Core research:

Designing a system to acquire, transmit, and store static and dynamic process variables from multiple OPC UA-based data sources within an HPDC cell. The research focuses on the data pipeline design and its performance.

5. Research methodology:

The research employed a practical implementation and testing approach.

  • Research Design: Development of a data pipeline and data lake architecture.
  • Data Collection: Data were collected from five OPC UA servers within a 500t horizontal cold chamber HPDC machine (DAK450-40 Vacural) cell at the Foundry-Institute of RWTH Aachen University. The five data sources were:
    • PLC HPDC Machine
    • PLC Cell Retrofit Sensors
    • PLC Dosing + Furnace
    • PLC Sprayhead
    • Real Time Measurement System
  • Data Pipeline Implementation:
    • Edge Server: A Raspberry Pi 4 Model B was used as an edge server to access separate networks within the HPDC cell.
    • Data Acquisition and Processing: Node-Red v2.0.6 [12,13] was used on the edge server to collect data from OPC UA servers.
    • Message Broker: Apache Kafka v2.6.0 [14] running in a Docker container [15] on a Kubernetes cluster v1.19.10 [16,17] was used as a message broker in the cloud.
    • Data Storage: A MinIO object storage version 2021-11-24T23:19:33Z [18] served as the data lake.
    • Cloud Processing: A second Node-Red instance in the cloud handled data transfer from Kafka to MinIO.
  • Analysis Method: Extensive load tests were conducted to evaluate the pipeline's performance. Metrics included throughput, latency, and ordering of messages. Preliminary tests determined the data production frequency of the OPC UA servers.
  • Research Scope: Focus on the data pipeline connecting the HPDC cell to cloud.
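
The hop from edge server to message broker to object store can be sketched in a few lines. The Python below is a minimal illustration, not the authors' Node-Red flows: the field names (`source`, `node_id`, `value`, `server_timestamp`, `edge_timestamp`), the JSON encoding, the example node identifier, and the object-key layout are all assumptions made for illustration. The Kafka and MinIO clients are deliberately left out so that the sketch runs with the standard library alone.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class Measurement:
    """One OPC UA value as it might leave the edge server (hypothetical schema)."""
    source: str            # which PLC / OPC UA server produced the value
    node_id: str           # OPC UA node identifier of the variable
    value: float
    server_timestamp: str  # ISO 8601 time reported by the OPC UA server
    edge_timestamp: str    # ISO 8601 time the edge server saw the value


def to_broker_message(m: Measurement) -> bytes:
    """Serialize a measurement to UTF-8 JSON bytes, as a broker payload might look."""
    return json.dumps(asdict(m), sort_keys=True).encode("utf-8")


def object_key(m: Measurement) -> str:
    """Build an object-store key partitioned by source and UTC date (assumed layout)."""
    ts = datetime.fromisoformat(m.server_timestamp).astimezone(timezone.utc)
    return f"{m.source}/{ts:%Y/%m/%d}/{ts:%H%M%S%f}.json"


sample = Measurement(
    source="plc-hpdc-machine",
    node_id="ns=2;s=Example.Variable",  # placeholder node id, not from the paper
    value=42.5,
    server_timestamp="2022-01-06T10:15:30.250000+00:00",
    edge_timestamp="2022-01-06T10:15:30.300000+00:00",
)
payload = to_broker_message(sample)  # what the edge would publish to a topic
key = object_key(sample)             # where the cloud side might store it
```

Carrying both a server and an edge timestamp in the payload is one way to make latency measurable downstream; whether the original flows do this is not stated in this summary.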

6. Key research results:

Key research results and presented data analysis:

  • A functional data architecture was implemented (Figure 1).
  • Node-Red flows were developed for data extraction, processing, and metadata injection (Figure 2, Figure 3, Figure 4).
  • Load tests demonstrated the pipeline's capability (Figure 5).
  • Preliminary tests characterized the behavior of the OPC UA servers (Figure 6).
  • The pipeline could handle around 300 messages per second without delays, with a maximum throughput of around 700 messages per second.
  • Latency increased with throughput beyond 300 messages per second (Figure 8).
  • Message ordering was quantified, showing an increase and plateau during load tests (Figure 7).
  • The data pipeline was tested by collecting the "serverTimestamp" [19] values reported by the OPC UA servers.
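
The three load-test metrics named in the methodology (throughput, latency, message ordering) can be computed from per-message timestamps. The sketch below is an assumption about how such metrics might be defined, since this summary does not give the paper's exact formulas: the `Record` type, the function names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Record:
    seq: int           # sequence number assigned by the producer
    sent_s: float      # time the message was sent, in seconds
    received_s: float  # time the message arrived, in seconds


def throughput_per_second(records: Sequence[Record]) -> float:
    """Messages received divided by the span between first and last arrival."""
    arrivals = [r.received_s for r in records]
    span = max(arrivals) - min(arrivals)
    if span <= 0:
        raise ValueError("need arrivals spread over a positive time span")
    return len(records) / span


def mean_latency_s(records: Sequence[Record]) -> float:
    """Average time between sending and receiving a message."""
    return sum(r.received_s - r.sent_s for r in records) / len(records)


def out_of_order_count(records: Sequence[Record]) -> int:
    """Count messages that arrive after a message with a higher sequence number."""
    highest_seen = -1
    late = 0
    for r in records:  # records are assumed to be listed in arrival order
        if r.seq < highest_seen:
            late += 1
        else:
            highest_seen = r.seq
    return late


def per_minute(rate_per_second: float) -> float:
    """Convert a per-second rate to a per-minute rate."""
    return rate_per_second * 60


# Hypothetical sample in arrival order: message 2 arrives after message 3.
records = [
    Record(seq=0, sent_s=0.0, received_s=0.125),
    Record(seq=1, sent_s=0.25, received_s=0.375),
    Record(seq=3, sent_s=0.75, received_s=0.875),
    Record(seq=2, sent_s=0.5, received_s=1.0),
    Record(seq=4, sent_s=1.0, received_s=1.125),
]
```

As a unit check, `per_minute(300)` gives 18000, so the delay-free rate of about 300 messages per second is consistent with the abstract's "several thousand measurements per minute".
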
Figure 2. Segment of the flow to access the data from the machine PLC
Figure 5. Part of the overall architecture used for load testing.
Figure 6. Response to subscriptions to the current time of the OPC UA servers depending on various requested frequencies.
Figure 8. Dependence of latency on throughput measured in five load tests conducted in identical setups.

List of figure names:

  • Figure 1. Network layout and technology stack from the HPDC cell to the storage solution.
  • Figure 2. Segment of the flow to access the data from the machine PLC.
  • Figure 3. Compounding flow on the edge server that connects to the message broker.
  • Figure 4. Compounding flow in the cloud that connects the message broker with the data lake.
  • Figure 5. Part of the overall architecture used for load testing.
  • Figure 6. Response to subscriptions to the current time of the OPC UA servers depending on various requested frequencies.
  • Figure 7. Time dependent metrics of five load tests in identical setups.
  • Figure 8. Dependence of latency on throughput measured in five load tests conducted in identical setups.

7. Conclusion:

Summary of key findings:

The developed data pipeline and technology stack are sufficient for the HPDC use case, capable of handling a significant volume of data with room for expansion. The system is highly expandable, adjustable, and transferable to other use cases with multiple PLCs providing data via OPC UA.

Summary of research results:

The research demonstrated the feasibility of a data pipeline for HPDC. The pipeline supports practical and agile development.

Academic significance of the research:

The research contributes to the implementation of Industry 4.0 and IoP concepts in HPDC by providing a practical solution for data management.

Practical implications of the research:

The system enables the collection and storage of process data, paving the way for data-driven improvements in product quality and productivity. Future work includes developing semantic metadata for interoperability and integrating visual information. The concept of bronze, silver, and gold data layers is introduced for data refinement.
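
This summary does not define the bronze, silver, and gold layers, so the following Python is only a sketch of the common layering convention (raw data, then cleaned data, then aggregated data). The cleaning rule, the aggregation, and the record fields are assumptions for illustration, not the authors' design.

```python
from collections import defaultdict
from statistics import mean


def to_silver(bronze_records):
    """Cleaned layer: keep records whose value parses as a number (assumed rule)."""
    silver = []
    for record in bronze_records:
        try:
            value = float(record["value"])
        except (TypeError, ValueError):
            continue  # drop missing or non-numeric raw values
        silver.append({"source": record["source"], "value": value})
    return silver


def to_gold(silver_records):
    """Aggregated layer: mean value per data source (assumed aggregation)."""
    grouped = defaultdict(list)
    for record in silver_records:
        grouped[record["source"]].append(record["value"])
    return {source: mean(values) for source, values in grouped.items()}


# Bronze layer: raw records exactly as ingested (hypothetical sample values).
bronze = [
    {"source": "dosing", "value": "680.0"},
    {"source": "dosing", "value": "690.0"},
    {"source": "dosing", "value": None},
    {"source": "sprayhead", "value": "bad"},
    {"source": "sprayhead", "value": "2.5"},
]
silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the bronze records untouched means any later change to the cleaning or aggregation rules can be replayed from the raw data.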

8. References:

  • [1] OPC Unified Architecture. Available online: https://opcfoundation.org (accessed on 6 January 2022).
  • [2] Mahnke, W.; Leitner, S.H.; Damm, M. OPC Unified Architecture; Springer Science & Business Media: Berlin, Germany, 2009.
  • [3] Rix, M.; Kujat, B.; Meisen, T.; Jeschke, S. An agile information processing framework for high pressure die casting applications in modern manufacturing systems. Procedia CIRP 2016, 41, 1084–1089.
  • [4] Pennekamp, J.; Glebke, R.; Henze, M.; Meisen, T.; Quix, C.; Hai, R.; Gleim, L.; Niemietz, P.; Rudack, M.; Knape, S.; et al. Towards an infrastructure enabling the internet of production. In Proceedings of the 2019 IEEE International Conference on Industrial Cyber Physical Systems (ICPS), Taipei, Taiwan, 6–9 May 2019; pp. 31–37.
  • [5] Dai, H.N.; Wang, H.; Xu, G.; Wan, J.; Imran, M. Big data analytics for manufacturing internet of things: opportunities, challenges and enabling technologies. Enterp. Inf. Syst. 2020, 14, 1279–1303.
  • [6] Rath, M.; Gannouni, A.; Luetticke, D.; Gries, T. Digitizing a Distributed Textile Production Process using Industrial Internet of Things: A Use-Case. In Proceedings of the 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems (ICPS), Victoria, BC, Canada, 10–12 May 2021; pp. 315–320.
  • [7] Lee, J.Y.; Yoon, J.S.; Kim, B.H. A big data analytics platform for smart factories in small and medium-sized manufacturing enterprises: An empirical case study of a die casting factory. Int. J. Precis. Eng. Manuf. 2017, 18, 1353–1361.
  • [8] Chen, K.Y.; Wu, T.C. Data warehouse design for manufacturing execution systems. In Proceedings of the IEEE International Conference on Mechatronics (ICM'05), Taipei, Taiwan, 10–12 July 2005; pp. 751–756.
  • [9] Hai, R.; Geisler, S.; Quix, C. Constance: An intelligent data lake system. In Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA, 26 June–1 July 2016; pp. 2097–2100.
  • [10] Lipp, J.; Rath, M.; Rudack, M.; Vroomen, U.; Bührig-Polaczek, A. Flexible OPC UA Data Load Optimizations on the Edge of Production. In Enterprise Information Systems, Proceedings of the 22nd International Conference (ICEIS 2020), Virtual Event, 5–7 May 2020; Revised Selected Papers; Springer: Cham, Switzerland, 2020; pp. 43–61.
  • [11] Raj, A.; Bosch, J.; Olsson, H.H.; Wang, T.J. Modelling Data Pipelines. In Proceedings of the 2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), Portoroz, Slovenia, 26–28 August 2020; pp. 13–20.
  • [12] Node-Red. Available online: https://nodered.org (accessed on 6 January 2022).
  • [13] Nicolae, A.; Korodi, A. Node-Red and OPC UA Based Lightweight and Low-Cost Historian with Application in the Water Industry. In Proceedings of the 2018 IEEE 16th International Conference on Industrial Informatics (INDIN), Porto, Portugal, 18–20 July 2018; pp. 1012–1017.
  • [14] Apache Kafka. Available online: https://kafka.apache.org (accessed on 6 January 2022).
  • [15] Docker. Available online: https://www.docker.com (accessed on 6 January 2022).
  • [16] Kubernetes. Available online: https://kubernetes.io (accessed on 6 January 2022).
  • [17] Burns, B.; Grant, B.; Oppenheimer, D.; Brewer, E.; Wilkes, J. Borg, omega, and kubernetes. Commun. ACM 2016, 59, 50–57.
  • [18] MinIO. Available online: https://min.io (accessed on 6 January 2022).
  • [19] OPC UA Foundation. OPC Unified Architecture—Part 4: Services, version 1.05; OPC UA Foundation: Scottsdale, AZ, USA, 2021.

9. Copyright:

  • This material is based on the paper "Towards a Data Lake for High Pressure Die Casting" by Maximilian Rudack, Michael Rath, Uwe Vroomen and Andreas Bührig-Polaczek.
  • Source of paper: https://doi.org/10.3390/met12020349

This material was created to introduce the above paper, and unauthorized use for commercial purposes is prohibited. Copyright © 2025 CASTMAN. All rights reserved.