Using Machine Learning to Predict and Reduce Porosity Defects in Die-Casting
This technical brief is based on the academic paper "Factors Analysis and Prediction in Die-casting Process for Defects Reduction" by Pavee Siriruk and Titiwetaya Yaikratok, published in the Proceedings of the International Conference on Industrial Engineering and Operations Management (2022). It is summarized and analyzed for HPDC professionals by the experts at CASTMAN.

Keywords
- Primary Keyword: Die-casting defect prediction
- Secondary Keywords: Machine learning, Predictive maintenance, Porosity defect, Hard Disk Drive (HDD) components, Decision Tree algorithm, Feature importance analysis
Executive Summary
- The Challenge: To reduce the occurrence of outer surface porosity in die-cast Hard Disk Drive (HDD) components—a critical defect that is difficult to detect at the manufacturing site but can cause significant quality issues for customers.
- The Method: Researchers collected 5 months of real-time production data, encompassing 35 machine parameters, from a die-casting machine. They applied machine learning classification models to predict defect occurrence based on this data.
- The Key Breakthrough: A Decision Tree (DT) algorithm proved to be the most effective model. It successfully identified the most critical machine parameters causing porosity and, unlike other models, was able to predict both good and defective parts.
- The Bottom Line: This research demonstrates a practical, data-driven approach using machine learning to proactively identify and control the root causes of porosity, moving from reactive inspection to predictive quality control.
The Challenge: Why This Research Matters for HPDC Professionals
In the high-stakes world of HDD component manufacturing, quality is paramount. A single failure at the end-user level can compromise crucial data and damage a manufacturer's reputation. One of the most persistent challenges in the die-casting process for these components is outer surface porosity.
As detailed in the paper's introduction, this type of defect often evades detection during standard inspection at the supplier's facility. It is only discovered after passing through the customer's processes, leading to costly quality issues and supply chain disruptions. While improving inspection methods is an option, it requires a huge investment that customers are often unwilling to absorb. This research tackles the problem at its source: instead of just trying to find the defect, it aims to prevent it by understanding the relationship between machine parameters and defect formation.
The Approach: Unpacking the Methodology
To find a solution, the researchers embarked on a data-centric project rooted in predictive maintenance and Industry 4.0 principles. The methodology, outlined in Section 4 (Data Collection), involved several key steps:
- Data Collection: Real-time data was collected from a single prototype die-casting machine over a 5-month period. This dataset included 35 distinct machine parameters (e.g., temperatures, pressures, speeds, times) for every part produced, with each part serialized for traceability.
- Defect Labeling: Each cast product was visually inspected at the final VMI station, and the outcome (OK or NG for surface porosity) was recorded and matched to its corresponding machine data via the serial number. This created a labeled dataset of 92,000 parts after data cleansing.
- Modeling: The researchers applied three supervised machine learning classification algorithms to the dataset:
- Decision Trees (DT)
- Logistic Regression (LR)
- Random Forests (RF)
- Analysis: They used a Feature Importance method (Extra Tree Classifier) to identify which of the 35 machine parameters had the most significant impact on the formation of porosity defects. The performance of each model was evaluated not just on overall accuracy, but on its ability to correctly identify both good (OK) and defective (NG) parts using metrics like the G-mean.
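The labeling step described above, matching each part's machine parameters to its inspection result by serial number, can be sketched in a few lines. This is only an illustration: the paper does not publish its data schema, so the field names (`serial`, `factor_3`, `factor_26`, `result`) and all values are hypothetical.

```python
# Hypothetical record layout and values; the paper does not publish its schema.
machine_rows = [
    {"serial": "A001", "factor_3": 4.1, "factor_26": 0.82},
    {"serial": "A002", "factor_3": 3.9, "factor_26": 0.95},
    {"serial": "A003", "factor_3": 4.3, "factor_26": 0.78},
]
inspection_rows = [
    {"serial": "A001", "result": "OK"},
    {"serial": "A002", "result": "NG"},
    # A003 has no inspection record in this toy example.
]


def label_by_serial(machine_rows, inspection_rows):
    """Attach the OK/NG inspection result to each part's machine parameters."""
    results = {row["serial"]: row["result"] for row in inspection_rows}
    labeled = []
    for row in machine_rows:
        result = results.get(row["serial"])
        if result is None:
            continue  # unmatched part: no label, so it cannot be used for training
        labeled.append({**row, "label": 1 if result == "NG" else 0})
    return labeled


labeled = label_by_serial(machine_rows, inspection_rows)
```

Dropping unmatched parts here stands in for the data cleansing step; the actual cleansing rules used in the study are not described in this brief.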
The Breakthrough: Key Findings & Data
The analysis yielded clear, actionable insights into the die-casting process.
- Finding 1: Key Process Factors Identified: The Feature Importance analysis pinpointed the most critical machine parameters contributing to porosity. As shown in Table 2 and Figure 3, the top five most influential factors were:
- Factor 26 (Pressure releasing related) - Score: 0.053
- Factor 3 (High-speed related) - Score: 0.049
- Factor 27 (Pressure releasing related) - Score: 0.047
- Factor 8 (High-speed related) - Score: 0.043
- Factor 16 (Filling pressure related) - Score: 0.042
This finding provides a clear focus for process control and optimization efforts.
- Finding 2: The "Accuracy Trap" Revealed: While Logistic Regression (LR) and Random Forest (RF) models achieved a high accuracy of 95.85%, they were fundamentally flawed for this application. The confusion matrices in Figure 5 and Figure 6 show that these models predicted zero defective parts. They achieved high accuracy simply by classifying every part as "OK." The G-mean score of 0.00 for both models (Table 3) confirms their complete inability to identify the minority class (defects).
- Finding 3: Decision Tree (DT) Proves Most Practical: In stark contrast, the Decision Tree (DT) model, with a slightly lower overall accuracy of 91.18%, was the only algorithm that could successfully predict both "OK" and "NG" outcomes. The confusion matrix in Figure 4 shows it correctly identified 28 defective parts. Its G-mean score of 0.28 (Table 3) is modest in absolute terms, but it is the only nonzero value among the three models, which makes the DT the only one that offers any defect-detection capability for this real-world problem.
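The arithmetic behind the "accuracy trap" in Finding 2 can be written out explicitly. The sketch below assumes the standard definitions of accuracy and G-mean, with OK as the positive class (TP = OK predicted OK, TN = NG predicted NG, FP = NG predicted OK, FN = OK predicted NG); the paper's own notation may differ.

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},
\qquad
\text{G-mean} = \sqrt{\frac{TP}{TP + FN} \cdot \frac{TN}{TN + FP}}

% A model that labels every part "OK" has TN = 0 and FN = 0, so:
\text{Accuracy} = \frac{TP}{TP + FP} = 0.9585
\;\Rightarrow\;
\frac{FP}{TP + FP} = 0.0415,
\qquad
\text{G-mean} = \sqrt{1 \cdot 0} = 0
```

Read this way, the 95.85% accuracy of the LR and RF models is simply the share of OK parts in the evaluated data, which implies a defect share of roughly 4.15%, and the G-mean collapses to zero because none of the NG parts are caught.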
Practical Implications for Your HPDC Operations
The conclusions of this paper offer a practical guide for leveraging data to improve quality in a real-world manufacturing environment.
- For Process Engineers: The findings in Table 2 provide a data-backed priority list for process optimization. By focusing on stabilizing and controlling the top five factors, particularly those related to pressure release and high-speed injection, engineers can target the parameters most strongly associated with porosity and work toward lower defect rates.
- For Quality Control: The Decision Tree model serves as a blueprint for a predictive quality system. Instead of relying solely on end-of-line inspection, this model can be deployed to flag parts with a high probability of being defective in real-time, allowing for immediate investigation and process correction.
- For Industry 4.0 & Data Teams: This study makes a strong case for choosing the right metric for the right problem. In imbalanced datasets (common in defect analysis), overall accuracy is a misleading metric. The G-mean and a close look at the confusion matrix are essential for selecting a model that provides real-world value, and the results show that a simpler model like a Decision Tree can be more useful than more complex ones.
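The real-time flagging idea in the Quality Control point above could look roughly like the sketch below: a trained decision tree reduces to a set of threshold checks that can be evaluated for every part as it is cast. The tree here is hand-written for illustration. The factor names follow the paper's numbering, but the structure, thresholds, and sample values are placeholders, not results from the study.

```python
# Hypothetical tree: structure and thresholds are placeholders, not values
# from the study. A real deployment would export them from a trained model.
TREE = {
    "feature": "factor_26", "threshold": 0.90,
    "low": {"label": "OK"},
    "high": {
        "feature": "factor_3", "threshold": 4.0,
        "low": {"label": "NG"},
        "high": {"label": "OK"},
    },
}


def predict(node, part):
    """Walk the tree until a leaf is reached and return its label."""
    while "label" not in node:
        branch = "low" if part[node["feature"]] <= node["threshold"] else "high"
        node = node[branch]
    return node["label"]


def flag_parts(parts, tree=TREE):
    """Return the serial numbers of parts the tree predicts as NG."""
    return [p["serial"] for p in parts if predict(tree, p) == "NG"]


# Made-up machine readings for two parts.
parts = [
    {"serial": "A001", "factor_26": 0.82, "factor_3": 4.1},
    {"serial": "A002", "factor_26": 0.95, "factor_3": 3.9},
]
flagged = flag_parts(parts)
```

Because each prediction is only a handful of comparisons, a tree like this is cheap enough to run per part, and the path taken through it gives engineers a readable reason for each flag.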
Paper Details
Factors Analysis and Prediction in Die-casting Process for Defects Reduction
1. Overview:
- Title: Factors Analysis and Prediction in Die-casting Process for Defects Reduction
- Authors: Pavee Siriruk, Titiwetaya Yaikratok
- Year of publication: 2022
- Journal/academic society of publication: Proceedings of the International Conference on Industrial Engineering and Operations Management
- Keywords: Big Data Analytics, Classification, Defects Prediction, Machine Learning, Predictive Maintenance.
2. Abstract:
Defect reduction has always been the continuous improvement topic that is being addressed in the manufacturing industry. Even nowadays, that the world is moving into the industrial 4.0, such a particular topic still has never outdated, only the new approaches have been introduced for the better achievement of defect reduction. This research aims to reduce the defects in die-casting process of the Hard Disk Drive (HDD) component manufacturing company, focusing on the effects of various machine parameters on the defects occurring in casting products. Predictive maintenance approach and machine learning have been introduced to determine the suitable data modelling technique. The most related independent factors can be identified through Feature Importance method. Decision Tree (DT) performed the best results among other classification methods. The 91.18% accuracy can be obtained by decision tree algorithm. However, the ratio of labelled data still needs to be reviewed and optimized for the future work as well as continue the actual checking on the frontline production results with the Subject-Matter Expert (SME) also required in order to obtain the best prediction results.
3. Introduction:
This research addresses the persistent challenge of defect reduction in the manufacturing industry, specifically within the die-casting process for Hard Disk Drive (HDD) components. The focus is on outer surface porosity, a defect that is difficult to detect at the manufacturer's site but causes significant quality issues for the customer. The paper proposes using multi-level production data analytics to find the relationship between machine parameters and defects, as an alternative to costly investments in new inspection technology. It aims to leverage historical machine data to build predictive models, enabling a shift towards Industry 4.0 and predictive maintenance.
4. Summary of the study:
Background of the research topic:
The study is set in the context of a 3rd tier supplier manufacturing Motor baseplates for the HDD industry. The primary issue is an "outer surface porosity defect" that cannot be 100% detected by the supplier but causes quality failures at the customer's facility. This creates a supply chain problem where the cost of improved inspection is a point of contention. The research explores a data analytics approach to control the defect's occurrence rather than just improving its detection.
Status of previous research:
The paper reviews various machine learning algorithms used for predictive maintenance (PdM) and defect prediction in different industries. It notes the use of Partial Least Squares Regression (PLSR), Artificial Neural Networks (ANN), and Random Forests (RFs) in steel industries. For classification problems with discrete outputs (like OK/NG), the paper identifies Decision Trees (DT), Logistic Regression (LR), and Random Forests (RFs) as relevant methods, citing their application in fields from railway infrastructure to aerospace engine health prediction.
Purpose of the study:
The objective is to reduce defects in the die-casting process by determining suitable data modeling techniques for predictive maintenance. The research aims to identify which machine parameters cause casting defects and to apply several machine learning algorithms to find the best predictive model.
Core study:
The core of the study involves collecting real-time machine sensor data and corresponding quality inspection data (OK/NG for porosity) for die-cast products. This data is then used to train and evaluate three classification models: Decision Tree (DT), Logistic Regression (LR), and Random Forest (RF). A feature importance analysis is also conducted to identify the machine parameters that are most correlated with the occurrence of porosity defects. The ultimate goal is to find a reliable model for predicting defects based on process conditions.
5. Research Methodology
Research Design:
The research follows a general machine learning framework consisting of four main steps:
- Data Collection: Gathering relevant data from machine sensors and inspection processes.
- Data Pre-processing: Cleaning the data and preparing it for modeling.
- Model Selection, Training, and Validation: Training several algorithms and testing them to find the most robust and accurate model.
- Model Maintenance: Acknowledging the need for ongoing model updates.
Data Collection and Analysis Methods:
- Data Collection: Data was collected over 5 months from a prototype die-casting machine. This included 35 machine parameters (continuous data) and the final inspection result (discrete data: OK/NG). The two data sources were matched using a serial number for each product. An initial 141,000 records were collected, with 92,000 remaining after cleaning.
- Data Analysis: The study used an "Extra Tree Classifier" for feature importance analysis to rank the influence of input factors. The performance of the DT, LR, and RF classification models was evaluated using a Confusion Matrix, from which metrics like Accuracy, Precision, Recall, F-Measure, and G-mean were calculated.
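The metrics named above can all be derived from the four cells of a confusion matrix. The following is a minimal sketch using the standard definitions, with OK treated as the positive class; the counts in the example call are illustrative only and are not the paper's confusion matrix.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Compute common metrics from confusion-matrix counts.

    tp: OK predicted OK    fn: OK predicted NG
    fp: NG predicted OK    tn: NG predicted NG
    """
    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    accuracy = ratio(tp + tn, tp + fn + fp + tn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # sensitivity: share of OK parts recognised
    specificity = ratio(tn, tn + fp)   # share of NG parts caught
    f_measure = ratio(2 * precision * recall, precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Illustrative counts only (not taken from the paper).
metrics = classification_metrics(tp=900, fn=60, fp=30, tn=10)
```

In this made-up example the accuracy is 0.91 while only a quarter of the NG parts are caught, which is the kind of gap the G-mean is meant to expose.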
Research Topics and Scope:
The research is focused specifically on the die-casting process for an HDD component (Motor baseplate). While 17 types of casting defects exist in the process, this study exclusively analyzes the "surface porosity" defect. The data is sourced from a single machine, intended to represent all process variations over a 5-month period.
6. Key Results:
The study produced two main sets of results. First, the feature importance analysis identified the top five machine parameters most correlated with porosity defects, with "Factor 26 (pressure releasing factor)" being the most significant. Second, the performance evaluation of the three machine learning models showed that while Logistic Regression and Random Forest had higher accuracy (95.85%), they failed to predict any of the defective parts (G-mean = 0.00). The Decision Tree model, with 91.18% accuracy, was the only one that could predict both positive (OK) and negative (NG) outcomes, achieving a G-mean of 0.28, making it the most practically useful model.
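To make the effect of the chosen metric concrete, the short sketch below ranks the three models using only the accuracy and G-mean values reported above. The helper name `best_model` is an illustrative choice, not something from the paper.

```python
# Accuracy and G-mean values as reported in this brief (Table 3 of the paper).
reported = {
    "DT": {"accuracy": 0.9118, "g_mean": 0.28},
    "LR": {"accuracy": 0.9585, "g_mean": 0.00},
    "RF": {"accuracy": 0.9585, "g_mean": 0.00},
}


def best_model(scores, metric):
    """Return the name of the model with the highest value of `metric`."""
    return max(scores, key=lambda name: scores[name][metric])


by_accuracy = best_model(reported, "accuracy")  # LR and RF tie; max() keeps the first
by_g_mean = best_model(reported, "g_mean")
```

Selecting on accuracy picks one of the two models that never predict a defect, while selecting on G-mean picks the Decision Tree, which matches the paper's conclusion.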
Figure Name List:


- Figure 1. General framework of machine learning
- Figure 2. Confusion matrix
- Figure 3. Comparing score of feature importance analysis results
- Figure 4. Confusion Matrix of DT
- Figure 5. Confusion Matrix of LR
- Figure 6. Confusion Matrix of RF
7. Conclusion:
The research successfully identified the key factors contributing to porosity defects in die-cast products, with pressure-releasing and high-speed related parameters being the most influential. The Decision Trees (DT) algorithm was determined to be the best predictive model, achieving 91.18% accuracy and, most importantly, demonstrating the ability to predict both good and defective parts (a positive G-Mean value), which other tested models could not. The paper concludes by noting a key limitation: the extremely low percentage of NG data (imbalanced dataset). Future work should focus on optimizing the dataset and continuing collaboration with Subject-Matter Experts (SMEs) to refine the prediction model for real-world practice.
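The paper leaves the question of how to rebalance the data to future work. One common option, shown here purely as an assumption and not as the authors' method, is random undersampling of the majority (OK) class. The function name, the `ok_per_ng` ratio, and the synthetic rows are all hypothetical.

```python
import random


def undersample_ok(rows, ok_per_ng=3, seed=0):
    """Keep every NG row and a random subset of the OK rows.

    rows: list of dicts with a "label" key (1 = NG, 0 = OK).
    ok_per_ng: hypothetical target number of OK rows per NG row.
    """
    ng = [r for r in rows if r["label"] == 1]
    ok = [r for r in rows if r["label"] == 0]
    keep = min(len(ok), ok_per_ng * len(ng))
    rng = random.Random(seed)          # fixed seed so the sample is repeatable
    sampled = ng + rng.sample(ok, keep)
    rng.shuffle(sampled)
    return sampled


# Synthetic data with a 4% NG rate, for illustration only.
rows = [{"label": 1} for _ in range(4)] + [{"label": 0} for _ in range(96)]
balanced = undersample_ok(rows)
```

If an approach like this is used, it should be applied to the training split only, so that the evaluation data still reflects the true defect rate seen in production.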
8. References:
- [The paper lists 28 references, including works by Aliyan E. et al. (2020), Amihai I. et al. (2018), Behera S. et al. (2019), Bukhsh Z. A. et al. (2019), Canizo M. et al. (2017), Carvalho T. P. et al. (2019), Chen X. et al. (2021), Durbhaka G. K. et al. (2016), Hsu J. Y. et al. (2020), Kaparthi S. & Bumblauskas D. (2020), Kim J. S. et al. (2020), Kolokas N. et al. (2018), Lasisi A. & Attoh-Okine N. (2018), Liao H. et al. (2006), Mathew V. et al. (2017), Nourian-Avval A. & Fatemi A. (2020), Park S. et al. (2019), Phillips J. et al. (2015), Prytz R. et al. (2015), Rai R. et al. (2021), Rønsch G. Ø. et al. (2021), Su C. J. & Huang S. F. (2018), and Zhang Z. & Zhang P. (2015).]
Conclusion & Next Steps
This research provides a valuable roadmap for enhancing quality control in die-casting. The findings offer a clear, data-driven path toward improving quality, reducing defects, and optimizing production by focusing on the process parameters that matter most.
CASTMAN is committed to applying cutting-edge industry research to solve our customers’ most challenging technical problems. If the problem discussed in this white paper aligns with your research goals, please contact our engineering team to discuss how we can help you apply these advanced principles to your research.
Expert Q&A:
- Q1: What was the primary defect studied in the paper and why was it so important? A: The study focused on "outer surface porosity" in die-cast HDD components. According to the paper's introduction, this defect is critical because it cannot be 100% detected at the manufacturer's site and is often found by the customer, leading to significant quality issues and potential HDD failure.
- Q2: Which machine parameters were found to be the most critical for causing porosity defects? A: The feature importance analysis, as shown in Table 2 of the paper "Factors Analysis and Prediction in Die-casting Process for Defects Reduction," identified the top five most influential factors. The most critical was "Factor 26 (Pressure releasing related)," followed by factors related to high-speed injection and filling pressure.
- Q3: Which machine learning model was determined to be the most effective for predicting defects? A: The Decision Tree (DT) algorithm was found to be the most effective. As stated in the conclusion, while it had a slightly lower overall accuracy (91.18%), it was the only model tested that could predict both "OK" and "NG" (defective) parts, making it the most practical for real-world defect detection.
- Q4: Why did models with higher accuracy, like Logistic Regression, perform worse in a practical sense? A: The paper explains that models like Logistic Regression (LR) and Random Forest (RF) fell into an "accuracy trap." Figures 5 and 6 show they achieved high accuracy (95.85%) by classifying all products as "OK" and failing to identify a single defect. Their G-mean score of 0.00 in Table 3 confirms they were useless for the actual task of finding the rare defective parts.
- Q5: What was a key limitation of this study that needs to be addressed in future work? A: The conclusion of the paper highlights that a key limitation was the "extremely low percentage of NG data," which creates an imbalanced dataset. The authors recommend that future work should focus on reviewing and optimizing this data ratio to improve the predictive performance for the negative (defect) class.
Copyright
- This material is an analysis of the paper "Factors Analysis and Prediction in Die-casting Process for Defects Reduction" by Pavee Siriruk and Titiwetaya Yaikratok.
- Source of the paper: Proceedings of the International Conference on Industrial Engineering and Operations Management, Istanbul, Turkey, March 7-10, 2022.
- This material is for informational purposes only. Unauthorized commercial use is prohibited.
- Copyright © 2025 CASTMAN. All rights reserved.