Even if you can, it does not mean you should.
The debate on the ethics of artificial intelligence is becoming increasingly dynamic, and many researchers and policymakers are competing to create robust and practical frameworks for truly reliable or “trustworthy” artificial intelligence systems. In the last two years, we have seen many publications by international organizations, local authorities, and top-tier researchers from around the globe that aim to propose a universal set of norms and values to be applied at every stage of development.
J. Fjeld and co-authors did excellent work in 2020 when they surveyed which principles are most widely endorsed and how they are implemented and “executed” by relevant stakeholders. They concluded that “[t]he eight themes that surfaced in this research – Privacy, Accountability, Safety and Security, Transparency and Explainability, Fairness and Non-discrimination, Human Control of Technology, Professional Responsibility, and Promotion of Human Values – offer at least some view into the foundational requirements for AI that is ethical and respectful of human rights. However, there’s a wide and thorny gap between articulating these high-level concepts and their actual achievement in the real world”. Indeed, even though two years have passed, not much has changed regarding the application of ethical AI. Many organizations comply with legal and regulatory requirements, such as the GDPR in Europe, yet they do not follow the norms and values that they themselves sometimes include in their ethical (or similar) guidelines and principles.
This does not mean that entities (or, to be more precise – their representatives) are unwilling to apply such rules; the failure usually stems from a lack of understanding of the difference between ethics and legal or regulatory compliance. Many privacy and data protection regulations are clear (and similar across many regions) regarding the lawfulness of data processing. Before processing data, including for modelling, you must be sure that you have at least the following (a sketch of encoding these checks follows the list):
1. Clearly defined purpose of processing.
2. Categorized data and datasets.
3. (Informed) consent from the data subject, or another legal basis for processing that corresponds with the stated purpose.
4. Confidence that no exclusions apply, e.g., to special categories of (sensitive) data.
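To make this checklist concrete, here is a minimal sketch of such a pre-processing gate in Python. Everything here – the class, the field names, and the legal-basis strings – is an illustrative assumption of mine, not a standard library or compliance framework:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProcessingChecklist:
    """Illustrative pre-processing gate; field names are assumptions, not a standard."""
    purpose_defined: bool          # 1. clearly defined purpose of processing
    data_categorized: bool         # 2. data and datasets categorized
    legal_basis: Optional[str]     # 3. e.g. "consent", "contract", "legitimate_interest"
    exclusions_checked: bool       # 4. e.g. special categories of data reviewed

    def may_proceed(self) -> bool:
        # All four items must hold before modelling starts.
        return (self.purpose_defined
                and self.data_categorized
                and self.legal_basis is not None
                and self.exclusions_checked)

checklist = ProcessingChecklist(
    purpose_defined=True,
    data_categorized=True,
    legal_basis="consent",
    exclusions_checked=True,
)
print(checklist.may_proceed())  # True – only then move on to modelling
```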
Suppose your checklist is all “yeses,” and the process has been checked and confirmed by legal and compliance (and the data stewards). In that case, you MAY be ready to enter the next stage – modelling – which will be followed by validation and testing. With some luck, you will be able to deploy your AI system soon. Sounds pretty good and effective, so why should you care about OTHER aspects of the AI’s life cycle? Because using someone’s data is not only a legal responsibility but also an ethical one.
You may ask: why should I care about “responsible” or “ethical” aspects if I have ticked all the legal and regulatory checkboxes? Let me cite J. Martens:
“(…) if you obtain access to such data [this could be sensitive data, e.g., behavioral data], should you use it? Once more, a balancing act has to be achieved between the ability to move research forward [or do business] by developing and validating algorithms using real-life data versus the extent to which personal and sensitive data are contained in such data and how they have been obtained”.
You may be able to gather informed consent from users who haven’t read a single paragraph of your Terms & Conditions and apply some anonymization practices (including k-anonymity). Still, that does not necessarily mean the data should be used for a specific application. Data is a source of sensitive information, and using such information may not always be ethical. I do not want to give real-world examples, but many social media platforms have been criticized by authorities and the public for practices that were legal yet not ethical. And even if you are sure that the particular use of data and – what is even more critical – the outcome of processing is ethically “ok,” you should always consider additional transparency towards the groups affected by the AI system.
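For readers unfamiliar with k-anonymity: a dataset is k-anonymous when every combination of quasi-identifiers (attributes that could be linked back to a person, such as ZIP code and age band) appears at least k times. A minimal sketch of such a check follows; the column names and data are hypothetical, and passing the check is precisely the kind of technical box-ticking that, as argued above, does not by itself make a given use of the data ethical:

```python
import pandas as pd

def satisfies_k_anonymity(df: pd.DataFrame, quasi_identifiers: list, k: int) -> bool:
    """True if every combination of quasi-identifier values occurs at least k times."""
    group_sizes = df.groupby(quasi_identifiers).size()
    return bool((group_sizes >= k).all())

# Hypothetical records; "zip" and "age_band" act as quasi-identifiers here.
df = pd.DataFrame({
    "zip":       ["10115", "10115", "10117", "10117", "10117"],
    "age_band":  ["30-39", "30-39", "40-49", "40-49", "40-49"],
    "diagnosis": ["A", "B", "A", "C", "B"],
})

print(satisfies_k_anonymity(df, ["zip", "age_band"], k=2))  # True: every group has >= 2 rows
```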
Users are usually unaware of the purposes for which, and the entities with whom, their data will be used and shared. As a controller of such data, you may say: “you consented to our T&C and gave ‘informed’ consent, so we may use your data almost freely.” Yes, you certainly can. The question is whether you should, even if legal and regulatory requirements are fulfilled. Users, or at least most of them, click “yes” because they expect “free” products and services. At the same time, they hope that the data they share will be used to provide those products and services (and maybe improve them), not for other ethically dubious purposes. Yes, you are right – they should read the T&C, or at least the consent form, but how many companies (data controllers and processors) provide user-friendly content in this domain? This is an integral part of the “transparency” requirement linked to privacy and data protection, and it goes beyond simple legal and regulatory requirements.
Should I, therefore, use the data I have received from my customers or users? Ethical artificial intelligence or, to be more precise – ethical data processing – requires two principles to be applied at every stage of the life cycle:
1. Proportionality and
2. Risk-based approach.
Both principles appear in “technology-driven” regulations and should be understood as follows: always think about the consequences of your actions and ensure that you apply adequate tools and safeguards. In addition, consider consequences that are unforeseen at the moment but may negatively affect data subjects and model subjects. Walking through various scenarios, including data leaks and data breaches, may open your eyes and lead you to a more rigorous or conservative approach where that is more suitable. This should be your team’s “mantra” at every stage of thinking about data. This is how data-driven and responsible entities work.
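One way to make the risk-based approach repeatable is a simple risk register that scores each scenario and maps the score to a required posture. The sketch below is a toy illustration under assumptions I am making for demonstration only – the 1-to-5 scales, the thresholds, and the wording come from no regulation or standard:

```python
from dataclasses import dataclass

@dataclass
class ProcessingRisk:
    scenario: str     # e.g. "data leak exposes behavioural profiles"
    likelihood: int   # assumed scale: 1 (rare) .. 5 (almost certain)
    impact: int       # assumed scale: 1 (negligible) .. 5 (severe harm to data subjects)

    @property
    def score(self) -> int:
        return self.likelihood * self.impact

def required_posture(risk: ProcessingRisk) -> str:
    # Proportionality: the graver the consequences, the stronger the safeguards.
    if risk.score >= 15:
        return "do not proceed without redesign (minimize data, add safeguards)"
    if risk.score >= 8:
        return "proceed only with extra safeguards and explicit sign-off"
    return "proceed with standard controls and monitoring"

risk = ProcessingRisk("data leak exposes behavioural profiles", likelihood=2, impact=5)
print(risk.score, "->", required_posture(risk))
# 10 -> proceed only with extra safeguards and explicit sign-off
```

The point is not the particular numbers but the habit: every stage of the data life cycle gets an explicit “what could go wrong, and is our response proportionate?” step.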
PS. In upcoming articles, I will elaborate on both principles: proportionality and the risk-based approach.