4  AI Workflow

Resources: Slides, Videos, Exercises

DALL·E 3 Prompt: Create a rectangular illustration of a stylized flowchart representing the AI workflow/pipeline. From left to right, depict the stages as follows: ‘Data Collection’ with a database icon, ‘Data Preprocessing’ with a filter icon, ‘Model Design’ with a brain icon, ‘Training’ with a weight icon, ‘Evaluation’ with a checkmark, and ‘Deployment’ with a rocket. Connect each stage with arrows to guide the viewer horizontally through the AI processes, emphasizing these steps’ sequential and interconnected nature.

DALL·E 3 Prompt: Create a rectangular illustration of a stylized flowchart representing the AI workflow/pipeline. From left to right, depict the stages as follows: ‘Data Collection’ with a database icon, ‘Data Preprocessing’ with a filter icon, ‘Model Design’ with a brain icon, ‘Training’ with a weight icon, ‘Evaluation’ with a checkmark, and ‘Deployment’ with a rocket. Connect each stage with arrows to guide the viewer horizontally through the AI processes, emphasizing these steps’ sequential and interconnected nature.

The ML workflow is a structured approach that guides professionals and researchers through developing, deploying, and maintaining ML models. This workflow is generally divided into several crucial stages, each contributing to the effective development of intelligent systems.

In this chapter, we will explore the machine learning workflow, setting the stage for subsequent chapters that go deeper into the specifics. This chapter focuses only presenting a high-level overview of the steps involved in the ML workflow.

Learning Objectives
  • Understand the ML workflow and gain insights into the structured approach and stages of developing, deploying, and maintaining machine learning models.

  • Learn about the unique challenges and distinctions between workflows for Traditional machine learning and embedded AI.

  • Appreciate the roles in ML projects and understand their responsibilities and significance.

  • Understanding the importance, applications, and considerations for implementing ML models in resource-constrained environments.

  • Gain awareness about the ethical and legal aspects that must be considered and adhered to in ML and embedded AI projects.

  • Establish a basic understanding of ML workflows and roles to be well-prepared for deeper exploration in the following chapters.

4.1 Overview

Figure 4.1: Multi-step design methodology for the development of a machine learning model. Commonly referred to as the machine learning lifecycle

Figure 4.1 illustrates the systematic workflow required for developing a successful machine learning model. This end-to-end process, commonly referred to as the machine learning lifecycle, enables you to build, deploy, and maintain models effectively. It typically involves the following key steps:

  1. Problem Definition - Start by clearly articulating the specific problem you want to solve. This focuses on your efforts during data collection and model building.
  2. Data Collection and Preparation: Gather relevant, high-quality training data that captures all aspects of the problem. Clean and preprocess the data to prepare it for modeling.
  3. Model Selection and Training: Choose a machine learning algorithm suited to your problem type and data. Consider the pros and cons of different approaches. Feed the prepared data into the model to train it. Training time varies based on data size and model complexity.
  4. Model Evaluation: Test the trained model on new unseen data to measure its predictive accuracy. Identify any limitations.
  5. Model Deployment: Integrate the validated model into applications or systems to start operationalization.
  6. Monitor and Maintain: Track model performance in production. Retrain periodically on new data to keep it current.

Following this structured ML workflow helps guide you through the key phases of development. It ensures you build effective and robust models ready for real-world deployment, resulting in higher-quality models that solve your business needs.

The ML workflow is iterative, requiring ongoing monitoring and potential adjustments. Additional considerations include:

  • Version Control: Track code and data changes to reproduce results and revert to earlier versions if needed.
  • Documentation: Maintain detailed documentation for workflow understanding and reproduction.
  • Testing: Rigorously test the workflow to ensure its functionality.
  • Security: Safeguard your workflow and data when deploying models in production settings.

4.2 Traditional vs. Embedded AI

The ML workflow is a universal guide applicable across various platforms, including cloud-based solutions, edge computing, and TinyML. However, the workflow for Embedded AI introduces unique complexities and challenges, making it a captivating domain and paving the way for remarkable innovations. Figure 4.2 illustrates the differences between Machine Learning and Deep Learning.

Figure 4.2: Comparing traditional Machine Learning and Deep Learning. Source: BBN Times

Figure 4.3 showcases the uses of embedded ai in various industries.

Figure 4.3: Embedded AI applications. Source: Rinf.tech

4.2.1 Resource Optimization

  • Traditional ML Workflow: This workflow prioritizes model accuracy and performance, often leveraging abundant computational resources in cloud or data center environments.
  • Embedded AI Workflow: Given embedded systems’ resource constraints, this workflow requires careful planning to optimize model size and computational demands. Techniques like model quantization and pruning are crucial.

4.2.2 Real-time Processing

  • Traditional ML Workflow: Less emphasis on real-time processing, often relying on batch data processing.
  • Embedded AI Workflow: Prioritizes real-time data processing, making low latency and quick execution essential, especially in applications like autonomous vehicles and industrial automation.

4.2.3 Data Management and Privacy

  • Traditional ML Workflow: Processes data in centralized locations, often necessitating extensive data transfer and focusing on data security during transit and storage.
  • Embedded AI Workflow: This workflow leverages edge computing to process data closer to its source, reducing data transmission and enhancing privacy through data localization.

4.2.4 Hardware-Software Integration

  • Traditional ML Workflow: Typically operates on general-purpose hardware, with software development occurring independently.
  • Embedded AI Workflow: This workflow involves a more integrated approach to hardware and software development, often incorporating custom chips or hardware accelerators to achieve optimal performance.

4.3 Roles & Responsibilities

Creating an ML solution, especially for embedded AI, is a multidisciplinary effort involving various specialists. Unlike traditional software development, building an ML solution demands a multidisciplinary approach due to the experimental nature of model development and the resource-intensive requirements of training and deploying these models.

There is a pronounced need for roles focusing on data for the success of machine learning pipelines. Data scientists and data engineers handle data collection, build data pipelines, and ensure data quality. Since the nature of machine learning models depends on the data they consume, the models are unique and vary with different applications, necessitating extensive experimentation. Machine learning researchers and engineers drive this experimental phase through continuous testing, validation, and iteration to achieve optimal performance.

The deployment phase often requires specialized hardware and infrastructure, as machine learning models can be resource-intensive, demanding high computational power and efficient resource management. This necessitates collaboration with hardware engineers to ensure that the infrastructure can support the computational demands of model training and inference.

As models make decisions that can impact individuals and society, ethical and legal aspects of machine learning are becoming increasingly important. Ethicists and legal advisors are needed to ensure compliance with ethical standards and legal regulations.

Understanding the various roles involved in an ML project is crucial for its successful completion. Table 4.1 provides a general overview of these typical roles, although it’s important to note that the lines between them can sometimes blur. Let’s examine this breakdown:

Table 4.1: Roles and responsibilities of people involved in MLOps.
Role Responsibilities
Project Manager Oversees the project, ensuring timelines and milestones are met.
Domain Experts Offer domain-specific insights to define project requirements.
Data Scientists Specialize in data analysis and model development.
Machine Learning Engineers Focus on model development and deployment.
Data Engineers Manage data pipelines.
Embedded Systems Engineers Integrate ML models into embedded systems.
Software Developers Develop software components for AI system integration.
Hardware Engineers Design and optimize hardware for the embedded AI system.
UI/UX Designers Focus on user-centric design.
QA Engineers Ensure the system meets quality standards.
Ethicists and Legal Advisors Consult on ethical and legal compliance.
Operations and Maintenance Personnel Monitor and maintain the deployed system.
Security Specialists Ensure system security.

This holistic view facilitates seamless collaboration and nurtures an environment ripe for innovation and breakthroughs. As we proceed through the upcoming chapters, we will explore each role’s essence and expertise and foster a deeper understanding of the complexities involved in AI projects. For a more detailed discussion of the specific tools and techniques these roles use, as well as an in-depth exploration of their responsibilities, refer to Section 13.5.

4.4 Conclusion

This chapter has laid the foundation for understanding the machine learning workflow, a structured approach crucial for the development, deployment, and maintenance of ML models. We explored the unique challenges faced in ML workflows, where resource optimization, real-time processing, data management, and hardware-software integration are paramount. These distinctions underscore the importance of tailoring workflows to meet the specific demands of the application environment.

Moreover, we emphasized the significance of multidisciplinary collaboration in ML projects. By examining the diverse roles involved, from data scientists to software engineers, we gained an overview of the teamwork necessary to navigate the experimental and resource-intensive nature of ML development. This understanding is crucial for fostering effective communication and collaboration across different domains of expertise.

As we move forward to more detailed discussions in subsequent chapters, this high-level overview equips us with a holistic perspective on the ML workflow and the various roles involved. This foundation will prove important as we dive into specific aspects of machine learning, which will allow us to contextualize advanced concepts within the broader framework of ML development and deployment.

4.5 Resources

Here is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.

Slides

These slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to improve their understanding and facilitate effective knowledge transfer.

Videos
  • Coming soon.
Exercises

To reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.

  • Coming soon.