ABSTRACT
Aim: This paper investigates how artificial intelligence (AI) can streamline and enhance the stages of the data analysis workflow, addressing the conventional "human bottleneck" in extracting value from complex, heterogeneous datasets.
Methodology: This study uses a systematic review, supplemented by a comparative analysis of AI-driven tools and frameworks, to assess the current capabilities and emerging opportunities for AI in automating exploratory data analysis, feature engineering, and model validation.
Findings: The reviewed evidence indicates that AI-driven approaches substantially improve feature selection, uncover intricate relationships between variables, and reveal latent groupings within data, which streamlines the generation of new engineered features. Furthermore, end-to-end automated data processing systems that incorporate automated machine learning (AutoML) techniques are increasingly able to transform raw data into informative features by automating the intermediate processing stages.
Implication: These advances have substantial implications for operational efficiency and decision-making accuracy, allowing human analysts to shift their focus from routine data manipulation to higher-level strategic interpretation and hypothesis generation. Automation also reduces the need for the extensive human expertise traditionally required for tasks such as hyperparameter tuning and model selection, thereby accelerating the development and deployment of sophisticated AI models.
Originality/Value: This work offers a novel synthesis of recent advances in automated data processing and feature engineering, highlighting how AI-driven methods address the complexities of large-scale, heterogeneous datasets. In particular, automated feature engineering methods, including those that leverage pre-trained foundation models, play a key role in identifying entities, concepts, and effective data representations that improve the efficiency and accuracy of downstream analysis.
Keywords: artificial intelligence, machine learning, data automation, feature engineering, data preprocessing.