Data extension-based analysis and application selection of process-composition-properties of die casting aluminum alloy

Jian Yang ab, Bo Liu ab, Yunbo Zeng c, Yiben Zhang ab, Haiyou Huang de, Jichao Hong b

https://doi.org/10.1016/j.engappai.2024.108514

Abstract

This research addresses the scarcity and fragmentation of industrial data on die casting aluminum alloys. Quantifying the coupling among the process, composition, and properties of die casting aluminum alloys from small datasets is a critical step in predicting part properties and optimizing process selection. To visualize these connections and discuss how interactions between parameters affect the properties, the data are fed into a self-organizing map model. An innovative data extension method is then proposed that predicts both yield and tensile strengths with more than 96% accuracy from a small dataset. Moreover, two novel methods for selecting combined multi-parameter ranges are developed: multi-objective optimization based on an agent model, and superimposition of contour maps. Both are guided by the region in which the required mechanical properties fall. Finally, the feasibility of the application range selection method is verified experimentally. Data-driven mapping of process-composition-property relationships and the free combination of application ranges are reliable theoretical solutions that are valuable for practical applications.

Introduction

Aluminum (Al) alloys are widely used in various industrial fields (Jiang et al., 2023). Different combinations of chemical composition and process parameters strongly influence the microstructure and mechanical properties (Han and Viswanathan, 2004, Kanjilal et al., 2006, Vicario et al., 2022). The scarcity and fragmentation of actual industrial production data make it challenging to work with small-sample data. Predictive modeling that takes mechanical properties as targets and material parameters such as composition and process parameters as inputs is widely used in the materials field (Hu et al., 2018, Byberg et al., 2018, Hasan and Acar, 2022). However, for the emerging heat treatment-free integrated die casting Al alloys (Barta et al., 2021, Fan et al., 2022), few studies have addressed process-composition-property (PCP) relationships (Basori et al., 2021). Quantitative analysis of PCP coupling relationships, together with the design of reasonable prediction schemes to guide the intelligent selection of multi-parameter combination ranges for die casting Al alloys, is therefore a promising research direction.

Reproducing experimental results is expensive and time-consuming because of the large variation in Al alloy composition and the tedious processing involved. Most existing mechanical property studies are limited to qualitative analysis, so an effective method is needed to break the deadlock (Zhou et al., 2021). Currently, the commonly used methods are practical production, data mining, and deep learning approaches. Numerous experiments have shown that trial-and-error methods for determining the optimal conditions for a desired performance require time-consuming and expensive production-manufacturing-experimental cycles (Zhuang et al., 2022, Kwon, 2021). It is challenging, even impossible, to collect large datasets covering different PCP coupling relationships. Only limited datasets are available for engineering materials in practical production and existing research (Yu et al., 2021). To cope with small sample size (Soofi et al., 2022), high noise (Sajadi et al., 2023), poor quality (Chen-yang et al., 2021), and the huge exploration space for new materials, machine learning (ML) models and optimization algorithms are integrated to predict the mechanical properties of die casting Al alloys from a small dataset. Yu et al. (2021) designed a deep neural network (DNN) that uses a small dataset to map PCP coupling relationships accurately and efficiently, with higher generalization performance than other ML methods. Soofi et al. (2022) used ML and data mining techniques grounded in statistical knowledge to assist the design of commercial alloy systems from small datasets obtained in actual production.

Recently, data-driven approaches have provided an effective and efficient way forward (Zhan et al., 2021, Wang et al., 2022). These methods can use a limited amount of data to significantly accelerate the prediction of material properties in high-dimensional space (Hart et al., 2021, Maier et al., 2022). In particular, ML and deep learning methods can predict the properties of specific material combinations based on data available in the literature (Chaudry et al., 2021, Xu et al., 2020, Kim and Shin, 2023). Bang and Ince (2022) developed a two-parameter driving force model to predict the crack extension rate. Their results show that the experimental data for two Al alloys at four different R-ratios agree well with the model predictions. Zhou et al. (2021) conducted a quantitative analysis and comparative study of the subcooling of Al alloys by collecting experimental data and applying various ML algorithms. Cao et al. (2020) applied ML methods to predict the mechanical properties and corrosion resistance of Al alloys, which allows other goal-oriented design models to be created whenever training datasets are available. Pouraliakbar et al. (2016) used a genetic programming approach to build an optimal model that improves the prediction accuracy for the total strain, ultimate tensile strength (UTS), and primary grain size of Al alloys. The errors in the training and testing phases were minimized in the optimal fitness region, and the experiments showed that the model predicts ultrafine grain size reliably.

However, ML methods require a certain amount of data to achieve favorable prediction performance (Feng et al., 2021). With too little data, a model may fail to find the underlying features and tends to overfit (Cui et al., 2022, Zhang et al., 2017). Numerous studies have shown that there are many ways to make the best use of measurement data (Xu et al., 2023), one of which is to expand small-sample datasets (Chen et al., 2018, Chang et al., 2013). Based on the mean and standard deviation of the measurements, Suh et al. (2022) expanded 32 measurements by 9 times, which made it easier to identify correlations between input and output variables. Chang et al. (2014) developed a latent information function using virtual sample generation techniques to take temporal features into account when training on small-sample data. Chang et al. (2017) proposed an improved gray forecasting model that uses data smoothing metrics to extract data features and construct accurate forecasting models. Their experiments verified that the method is well suited to short-term demand forecasting with small samples. However, a common and unavoidable problem in all these studies is that generating and constructing datasets requires considerable time and cost, and must follow certain criteria (Suh et al., 2022). As these authors note, such data extension methods also apply to other types of data (Suh et al., 2022, Chang et al., 2017). Building on this foundation, we improve the data expansion method and apply it to the process-composition-property data of die casting Al alloys.
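The idea of expanding a small set of measurements from its mean and standard deviation can be illustrated with a minimal sketch. It is not the extension method of this paper or of Suh et al. (2022): it assumes that virtual samples are drawn from a normal distribution centred on the group mean, and the function name `extend_measurements`, the expansion factor, and the yield-strength readings are all hypothetical.

```python
import random
import statistics


def extend_measurements(values, factor, seed=0):
    """Create `factor` virtual samples per original measurement.

    Assumption (not from the paper): virtual samples follow a normal
    distribution with the mean and sample standard deviation of `values`.
    """
    rng = random.Random(seed)              # fixed seed for repeatable output
    avg = statistics.mean(values)
    sd = statistics.stdev(values)          # sample standard deviation
    sem = sd / len(values) ** 0.5          # standard error of the mean
    virtual = [rng.gauss(avg, sd) for _ in range(factor * len(values))]
    return {"avg": avg, "sd": sd, "sem": sem, "virtual": virtual}


# Hypothetical yield-strength readings (MPa) for one alloy/process condition.
ys = [142.0, 138.5, 145.2, 140.1]
result = extend_measurements(ys, factor=9)
print(len(result["virtual"]))              # 4 measurements x 9 = 36 samples
```

Drawing from a fitted distribution keeps the virtual samples statistically close to the originals, but it cannot add information that the original measurements do not contain, so any model trained on such data should still be validated against real measurements.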

To make the prediction accuracy and generalization ability of ML models meet practical demands, many researchers have used heuristic optimization algorithms to search locally for optimal solutions in a finite feature space (Shi et al., 2023, Jennings et al., 2019, Dashtbayazi et al., 2007, Dey et al., 2017). Menou et al. (2018) used a multi-objective genetic algorithm (GA) to optimize over a large component space and screen new, optimal high-entropy alloys. The applicability of the method was demonstrated experimentally. Fang et al. (2009) applied a least squares support vector machine and the non-dominated sorting genetic algorithm-II (NSGA-II) to a predictive model of the mechanical and electrical properties of Al-Zn-Mg-Cu series alloys. The generalization performance of the model improved significantly after the optimal hyperparameters were determined by the lattice algorithm and cross-validation. Many studies have shown the power of the Pareto frontier in GA for handling multi-objective problems (Pattanayak et al., 2015, García-Carrillo et al., 2022, Daksha et al., 2021, Solomou et al., 2018). Zhao et al. (2021) combined GA and neural networks to generate Pareto optimal solution sets of rivet and die combinations for self-piercing riveting that meet the requirements of different joint evaluation criteria. Inspired by this approach, we build a PCP agent model of die casting Al alloys based on the support vector machine (SVM). The GA-based multi-objective optimization identifies the mechanical properties under different selection criteria. This is a new attempt and challenge, by which users can quickly screen for materials that meet their needs.
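The notion of a Pareto front used by these multi-objective approaches can be sketched as follows. This is only the non-dominated filtering step, not a genetic algorithm and not NSGA-II. It assumes that every objective is to be maximized, and the candidate (YS, UTS, EL) triples are hypothetical values chosen for illustration.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical candidates: (YS in MPa, UTS in MPa, EL in %).
candidates = [
    (140, 280, 8.0),
    (150, 270, 7.5),
    (135, 275, 7.0),   # dominated by the first candidate
    (150, 285, 6.0),
]
front = pareto_front(candidates)
print(front)
```

Each candidate left on the front represents a different trade-off between strength and elongation, which is why a selection criterion is still needed to pick one of them.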

Furthermore, Zhao et al. (2021) also used artificial neural networks to build application range maps for different rivet and die combinations, which further simplified the selection of rivets and dies. Contour plots can visualize the significance of the interaction between two variables. Suh et al. (2022) drew contour plots showing how the mechanical properties of AZ61 alloy vary with aging temperature and aging time. Pattanayak et al. (2015) presented the effect of composition and process parameters on the mechanical properties of rigid materials in the form of contour plots. In this research, we plot application range contour maps for different parameter combinations based on support vector regression (SVR). The curves corresponding to the required yield strength (YS), UTS, and elongation (EL) are found according to the user’s requirements, and the contour maps are then overlaid to form a new range area. The application range usually varies with the parameter combination and the performance indicators. This intelligent combination of application range maps provides an idea for the prediction of mechanical properties and is one of the core ideas of this work.
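Overlaying contour maps amounts to intersecting the regions in which each predicted property meets its requirement. The sketch below shows that intersection on a coarse grid. The two linear surrogate functions stand in for trained SVR models, and the parameter names, grid values, and thresholds are all hypothetical rather than taken from the paper.

```python
def feasible_region(surrogates, thresholds, x_values, y_values):
    """Return the grid points where every predicted property reaches
    its required minimum (the overlap of the individual regions)."""
    region = []
    for x in x_values:
        for y in y_values:
            if all(surrogates[name](x, y) >= thresholds[name]
                   for name in thresholds):
                region.append((x, y))
    return region


# Hypothetical linear surrogates standing in for trained SVR models.
surrogates = {
    "YS": lambda si, speed: 100 + 5 * si + 2 * speed,      # MPa
    "EL": lambda si, speed: 12 - 0.5 * si - 0.5 * speed,   # %
}
thresholds = {"YS": 150, "EL": 6}    # hypothetical user requirements

si_values = [6, 7, 8, 9, 10]         # hypothetical Si content, wt%
speed_values = [1, 2, 3, 4, 5]       # hypothetical injection speed, m/s

region = feasible_region(surrogates, thresholds, si_values, speed_values)
print(region)
```

With a finer grid, the boundary of the returned point set approximates the outline of the overlaid contour curves for the chosen thresholds.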

This paper makes the following original contributions and improvements to current research:

(1) PCP relational coupling: The importance of predictors is estimated for small-sample datasets, and the coupling among the PCPs of integrated die casting Al alloys is analyzed quantitatively. To visualize the PCP connections and to discuss the effect of the interaction between different parameters on the mechanical properties, historical datasets are input to a self-organizing map (SOM).

(2) Data fission and expansion: To overcome the lack of data and to take full account of data correlation, a novel method is designed to expand the existing dataset by as much as 50 times using the average (AVG), standard deviation (SD), and standard error of the mean (SEM) of the original data (OD). A small number of these samples are selected to construct a data-driven predictive model.

(3) SVR prediction and optimization: Regression analysis verifies that the SVM model has high prediction accuracy. To reduce the sampling uncertainty, cross validation is performed to expand the distribution of validation subsets for a given set of parameters. The optimization results show mean absolute percentage errors (MAPE) of 2.98%, 3.52%, and 8.15% for YS, UTS, and EL, with prediction accuracies of 97.02%, 96.48%, and 91.85%, respectively.

(4) GA-based multi-objective optimization: A PCP agent model for die casting Al alloys is established, and GA-based multi-objective optimization is implemented to identify the mechanical properties under different selection criteria. The automatic selection of the PCP application range is guided by the range in which the required mechanical properties fall.

(5) Application scope map creation: Application range contour maps for different combinations of parameters are plotted. The corresponding curves of mechanical properties are drawn according to the user’s requirements and the contour maps are superimposed to form the available ranges. Based on the comparison of the results of die casting tests, the feasibility of the PCP application range selection method is verified.
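The accuracies reported in contribution (3) are consistent with defining accuracy as 100% minus the MAPE: 100 − 2.98 = 97.02, 100 − 3.52 = 96.48, and 100 − 8.15 = 91.85. A minimal sketch of that calculation follows. The measured and predicted values are hypothetical, and the accuracy definition is inferred from the reported numbers rather than stated in the text.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical measured and predicted strengths (MPa).
actual = [100.0, 200.0, 400.0]
predicted = [98.0, 206.0, 388.0]

error = mape(actual, predicted)     # mean of 2%, 3%, and 3%
accuracy = 100.0 - error            # accuracy inferred as 100% - MAPE
print(round(error, 2), round(accuracy, 2))
```

Because MAPE divides by the measured value, it is undefined when a measurement is zero, which is not a concern for strength and elongation data.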

The flow chart of data-driven PCP coupling-based mapping and intelligent selection of multi-parameter application ranges is illustrated in Fig. 1. The rest of the paper is organized as follows: Section 2 describes the data sources and the novel data fission method. Section 3 quantitatively analyzes the PCP coupling relationships and visualizes the connections between them using the SOM. Section 4 enhances the prediction accuracy of the SVR model using the expanded data and cross-validation. Section 5 designs two novel methods for the intelligent selection of multi-parameter combination ranges of die casting Al alloys using the proposed SVR model, and verifies the feasibility of the scheme through experiments. Section 6 summarizes the research and provides an outlook.


Section snippets

Data set acquisition

Obtaining a large data set of mechanical properties of die casting Al alloys under different chemical compositions and process conditions is difficult (Wu et al., 2020). On the one hand, there are still some technical barriers to die casting technology. On the other hand, a large number of industrial applications do not account for the data. Hence, the datasets for practical applications are often small and sparse, and only a limited number of datasets are available for engineering materials.

Correlation and predictor importance analysis

Generally, the use of many properties that do not relate well to the desired attributes leads to model complexity and increases the probability of overfitting (Guo et al., 2016, Asghari et al., 2020). When we input 13 sets of composition parameters, 5 sets of process parameters, and 3 sets of mechanical property parameters into the SVR model, training ended shortly after starting in every case because of overfitting or high data dispersion. Thus, conducting correlation analysis prior to modeling is

Establishment and development of SVR prediction model

Among ML methods, SVR is a reliable tool for solving classification problems and regression problems with high-dimensional features (Costa et al., 2023, Hu et al., 2021). SVR is a regressor that manages data in a supervised learning manner, with strong sparsity and robustness. Its accuracy is high and its generalization ability is strong when the sample size is not large. Thus, for smaller datasets, SVR will have better training results (Zhai et al., 2018). In

Application range selection for different parameter combinations

The SVR model can be used to predict the macroscopic mechanical properties of Al alloys, but cannot be used for the intelligent selection of a multi-parameter combination range of PCP. Therefore, two novel methods are designed for the intelligent selection of multi-parameter combination ranges of die casting Al alloys. First, a PCP agent model of integrated die casting Al alloy is established and multi-objective optimization based on GA is realized to identify mechanical properties under

Conclusion

(1) In real production, additional bad values and optional processing steps are discarded. In the future, efforts should be made to recover these “bad data”, which are as valuable for ML-based modeling as “good data”. This further implies that knowledge of organizational information, mechanical properties, databases, etc. should be considered in the ML model.

(2) To visualize PCP connectivity relationships, experimental datasets are used for correlation analysis with SOM, which can be

CRediT authorship contribution statement

Jian Yang: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Resources, Methodology, Formal analysis, Data curation. Bo Liu: Writing – review & editing, Supervision, Resources, Project administration, Methodology, Investigation, Funding acquisition, Conceptualization. Yunbo Zeng: Supervision, Resources, Investigation, Data curation. Yiben Zhang: Writing – review & editing, Validation, Software, Resources, Methodology, Investigation, Formal analysis,

Declaration of competing interest

1. The manuscript has not been published and is not under consideration for publication elsewhere. There is no dispute or conflict of interest about this manuscript.

2. All authors have participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version.

3. We understand that the Corresponding Author is the sole contact for the Editorial process (including

Acknowledgments

This work is supported in part by the Fundamental Research Funds for Central Universities, China (No. 06500203 and No. 00007735) and by the Chongqing Technology Innovation and Application Development project, China (No. 08400110).
