
Enhancing Efficacy of Machine Learning Model Selection Process for Big Data Science Projects by Introducing an Adaptive Method Based on Dynamic Factors

Arnav Goenka

Vellore Institute of Technology, Vellore, Tamil Nadu, India

Pages: 134-139

Vol: 13, Issue: 3, 2023

Receiving Date: 2023-08-18

Acceptance Date: 2023-09-19

Publication Date: 2023-09-24


http://doi.org/10.37648/ijrst.v13i03.014

Abstract

Data science projects typically involve a machine learning (ML) process characterized by evolving data, code, and models. For instance, as datasets grow in size, they may become suitable for ML models that require larger datasets. However, the dynamic factors influencing model selection must be better understood and explicitly represented. This paper introduces ongoing work on an adaptive method for ML model selection in big data science projects. The proposed method includes (i) identifying the factors that influence model selection based on heuristics from the literature and (ii) modelling the variability of these factors using a feature diagram and constraints that trigger adaptive reconfiguration—changes in model selection due to shifts in these factors. The method's applicability is demonstrated through an illustrative use case. By providing a clearer understanding of the dynamic factors that influence model selection, this method shows how these factors can be explicitly represented and automated. This enhanced understanding can lead to a more explicit, efficient, adaptive, and explainable model selection process, ultimately laying the groundwork for developing novel dynamic software product lines to support this process.

Keywords: data science; machine learning; big data
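
The idea in the abstract of constraints that trigger adaptive reconfiguration can be illustrated with a minimal sketch. It is not the paper's implementation: the factor names (`n_samples`, `needs_explainability`), the candidate models, the thresholds, and the helper functions are all hypothetical and chosen only for illustration. Each candidate model is treated as a feature guarded by a constraint over the current factors, and a change in the factors may change which model is selected.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Factors:
    """Hypothetical dynamic factors; names are illustrative, not from the paper."""
    n_samples: int
    needs_explainability: bool


# Each candidate model is a feature guarded by a constraint over the factors.
# Thresholds are placeholders, not values reported in the paper.
# Insertion order encodes preference: the first satisfied constraint wins.
CONSTRAINTS: Dict[str, Callable[[Factors], bool]] = {
    "neural_network": lambda f: f.n_samples >= 100_000 and not f.needs_explainability,
    "gradient_boosting": lambda f: f.n_samples >= 1_000 and not f.needs_explainability,
    "decision_tree": lambda f: f.needs_explainability,
    "logistic_regression": lambda f: True,  # fallback that is always valid
}


def valid_models(factors: Factors) -> List[str]:
    """Return every candidate whose constraint holds for the given factors."""
    return [name for name, holds in CONSTRAINTS.items() if holds(factors)]


def select_model(factors: Factors) -> str:
    """Pick the most preferred valid candidate."""
    return valid_models(factors)[0]


def reconfigure(current: str, factors: Factors) -> Optional[str]:
    """Return the new selection if the factors change it, otherwise None."""
    chosen = select_model(factors)
    return chosen if chosen != current else None


if __name__ == "__main__":
    current = select_model(Factors(n_samples=500, needs_explainability=False))
    print(current)
    # The dataset grows, so a model that needs more data becomes valid.
    print(reconfigure(current, Factors(n_samples=250_000, needs_explainability=False)))
```

Keeping the constraints in one declarative table makes the reason for each selection easy to inspect, which is in the spirit of the explicit and explainable selection process the abstract aims for.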


Disclaimer: All papers published in IJRST will be indexed on Google Search Engine as per their policy.
