Artificial intelligence systems can be defined in many ways. At their simplest, they are products: software that can be entirely digital or part of a physical solution, such as a cleaning robot or robotic courier. What distinguishes products and services based on machine learning, natural language processing, or more advanced statistical methods, however, is the role of data, along with their uniqueness, sophistication, and usefulness. Data also matters in "simple" data analytics if, for example, it is highly "sensitive" and therefore constitutes legally protected information. Some solutions will use personal data, and some - for example, thanks to various anonymization techniques - will do so only to a limited extent, although there are many more variants in practice. We are limited by our imagination. And by regulations.
If we intend to create products and services based on data processing, we should know how the "life cycle" of such a product runs and what risks may surface at each stage. We need this both to understand the nature of data-driven solutions and to create and maintain a risk management system that matches the technical and business nature of the products and services being developed. At the same time, it is worth remembering that using personal data for analytics may require us to implement an appropriate organizational and technical framework, which may not be intuitive even if we are well acquainted with data protection regulations.
The life cycle of an AI-based product can be divided into several stages. Authors do not agree on how "deep" one should go into the structure of designing and using such solutions. I find the approach presented in ISO/IEC 38507:2022 convincing; it assumes the following phases:
- Conceptual phase - at this stage, we should clearly define the basic assumptions and goals of the project (product or service), as well as the data we will use, i.e., consider whether we have the data necessary for its development and, if not, whether we can obtain it from an appropriate source (here it is worth noting that an ecosystem of responsible data intermediaries is currently being created, which will soon be subject to the requirements of the Data Governance Act). We should already write down the sources we will use in the following stages.
- Design and development - in this phase, we should remember, among other things, the principle of data protection and privacy by design and by default, i.e., building responsibility for personal data into our product or service from the start; one of the more interesting aspects worth remembering is so-called data engineering, which is covered, among other places, in this year's ENISA document, Data Protection Engineering; this is also the stage at which we deal with issues such as data cleaning, annotation, or labeling, as well as - if we haven't done it before - describing data with appropriate metadata.
- Verification and validation - in simple terms, this is the stage at which we examine whether our solution (e.g., a model) is consistent with the assumptions made and whether it gives us reliable results; this is the point at which we can still correct many elements (e.g., variables, data sets) that affect the functioning of the system and that can be the source of many negative consequences, e.g., discrimination as a result of algorithmic bias, or disclosure of personal data that should not be disclosed; reliable testing is crucial here, as accepting a flawed product could cause the project to fail commercially.
- Deployment - if the results of the previous stage are satisfactory, i.e., they meet the predetermined indicators (KPIs), we can move to the so-called deployment phase, i.e., the transition from testing to production, where our product meets reality; this process can be one-time or extended over time, e.g., in connection with the periodic release of new features; it is worth remembering that models that learn continuously never leave - at least theoretically - the verification and validation phase, as they should go through it again in predetermined cases.
- Operational activities and monitoring - a critical phase that never ends unless we decide to close the project; during this phase, we should ensure effective and efficient monitoring of the indicators that are essential to us, which may include, for example, completeness, accuracy, timeliness, or compliance, and more broadly the quality of data; each organization should determine on its own which indicators will be vital to it and what will signal the need to take appropriate action, such as data substitution or correction of the data or the model itself; monitoring systems will be more or less automated, but they should unquestionably have predefined limits (thresholds) and alerts, combined with a system for communication and for assuming control of and responsibility for the performance of the product; appropriate technical solutions for monitoring are already available on the market, including as part of cloud computing services.
- Re-evaluation - it may turn out that, despite meeting our assumptions, a given solution does not meet expectations, which may be due to lack of demand or excessive risk for the organization, e.g., due to "irreversible" algorithmic bias; in such a situation, it may be necessary either to take two steps back and modify the assumptions or the solution itself, or to close down the project; this decision should be documented together with its rationale, which can be used for other projects in the future.
- A kind of retirement, i.e., archiving or permanent deletion; this is the stage at which the project ends its life, which is sometimes simply the result of loss of usefulness, and sometimes the result of some spectacular event that, for example, damaged the organization's image; either way, laws and regulations impose specific obligations related to the data (or entire solutions) that the organization has used, such as mandatory archiving of documentation or permanent deletion of data from the medium, or even destruction of the medium itself; if we use personal data, the period for which we can keep it should be based not on our own "belief" but on the relevant record of processing activities, in which the relevant periods are often recorded.
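The predefined limits (thresholds) and alerts mentioned for the operational and monitoring phase can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Threshold` structure, the `check_indicators` function, the indicator values, and the role names are hypothetical, and a real organization would choose its own indicators and limits and would most likely rely on a dedicated monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Acceptable lower limit for one monitored indicator (hypothetical structure)."""
    indicator: str
    minimum: float  # lowest acceptable value, on a 0.0-1.0 scale
    owner: str      # role that takes over when the limit is breached


def check_indicators(measurements: dict[str, float],
                     thresholds: list[Threshold]) -> list[str]:
    """Return one alert per indicator that is missing or below its limit."""
    alerts = []
    for t in thresholds:
        value = measurements.get(t.indicator)
        if value is None:
            # A missing measurement is itself a reason to alert.
            alerts.append(f"{t.indicator}: no measurement, notify {t.owner}")
        elif value < t.minimum:
            alerts.append(
                f"{t.indicator}: {value:.2f} below limit {t.minimum:.2f}, "
                f"notify {t.owner}"
            )
    return alerts


# Illustrative values only; each organization sets its own indicators and limits.
THRESHOLDS = [
    Threshold("completeness", 0.98, "data steward"),
    Threshold("accuracy", 0.90, "model owner"),
    Threshold("timeliness", 0.95, "data steward"),
]

if __name__ == "__main__":
    todays_measurements = {"completeness": 0.99, "accuracy": 0.87}
    for alert in check_indicators(todays_measurements, THRESHOLDS):
        print(alert)
```

Attaching an owner to each threshold is one simple way to express the link between an alert and the assumption of control and responsibility; the sketch treats a missing measurement as an alert too, on the assumption that silence from a monitored source should not pass unnoticed.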
Of course, it should not be overlooked that the "Design and development" and "Verification and validation" phases contain "sub-phases" of their own, related to the development of the artificial intelligence model itself: data collection, pre-processing, modeling, and evaluation, which involves validation and testing. In practice, then, there will turn out to be many more processes, and each organization should find its own path to the product, one that takes into account the goal and objectives of the project, the risks associated with it, and the principle of proportionality, which considers the scale of the business. This will require a careful analysis of what place data occupies in a given entity and whether it is treated as a potential (or already actual) source of value or as a "necessary evil" that only generates additional (legal and regulatory) handling costs.
Each of the above-mentioned phases will require a different design of processes, the involvement of the right people, and the application of technical solutions. These issues should be reflected in organizational regulations (the division of roles and responsibilities, i.e., accountability), in policies (the rules and steps we take at each stage), and in procedures, which define in more detail the steps that the appropriate people should take to ensure compliance - they are a kind of instruction and safeguard. Among the most critical elements are the introduction of appropriate project management and the development of a map of the processes associated with the product or service. This will streamline our operational activities.
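One possible way to start such a process map is to write down the phases, who is accountable for each, and which moves between phases are allowed. The Python sketch below is only an illustration under stated assumptions: the transition rules in `ALLOWED` and the roles in `ACCOUNTABLE` are hypothetical and are not taken from any standard; each organization would define its own.

```python
from enum import Enum


class Phase(Enum):
    CONCEPT = "conceptual phase"
    DESIGN = "design and development"
    VERIFICATION = "verification and validation"
    DEPLOYMENT = "deployment"
    OPERATION = "operational activities and monitoring"
    RE_EVALUATION = "re-evaluation"
    RETIREMENT = "retirement"


# Allowed moves between phases (an assumption made for illustration).
ALLOWED = {
    Phase.CONCEPT: {Phase.DESIGN},
    Phase.DESIGN: {Phase.VERIFICATION},
    Phase.VERIFICATION: {Phase.DESIGN, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: {Phase.OPERATION},
    # Continuously learning models return to verification in predetermined cases.
    Phase.OPERATION: {Phase.VERIFICATION, Phase.RE_EVALUATION},
    # Re-evaluation may send the project back, keep it running, or close it.
    Phase.RE_EVALUATION: {Phase.CONCEPT, Phase.DESIGN,
                          Phase.OPERATION, Phase.RETIREMENT},
    Phase.RETIREMENT: set(),
}

# Hypothetical accountable roles, one per phase.
ACCOUNTABLE = {
    Phase.CONCEPT: "product owner",
    Phase.DESIGN: "engineering lead",
    Phase.VERIFICATION: "validation team",
    Phase.DEPLOYMENT: "engineering lead",
    Phase.OPERATION: "operations lead",
    Phase.RE_EVALUATION: "steering committee",
    Phase.RETIREMENT: "data protection officer",
}


def advance(current: Phase, target: Phase) -> Phase:
    """Move to the target phase, or raise ValueError if the map forbids it."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"cannot move from '{current.value}' to '{target.value}'"
        )
    return target


if __name__ == "__main__":
    phase = advance(Phase.CONCEPT, Phase.DESIGN)
    print(f"{phase.value}: accountable role is {ACCOUNTABLE[phase]}")
```

Keeping the map as plain data makes it easy to review with non-technical stakeholders and to reuse in policies and procedures, although a written process map or a project management tool would serve the same purpose.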
At the same time, it is worth remembering that documentation matters: both the documentation given to the customer and the technical documentation, which is essential for current and future employees responsible for the product (a topic for another article). Communication will matter at every stage but will be most evident in the operational and monitoring phase.
Failure to consider all the pieces of this puzzle can result in unnecessary legal and regulatory risks and contribute to inefficiencies in the processes involved in creating and developing a product or service. Implementing a data-driven culture and AI & Data Governance can help ensure better and more cost-effective data management and processing. Admittedly, this requires, first of all, decisions at the board and executive level and the allocation of relevant resources. Still, in the long run, there is no other choice if we want to be an organization that is smart and secure in its approach to data and its monetization.
If you have questions or concerns, feel free to let us know.