Machine Learning is a concept increasingly familiar to the public and to users of digital services. Thanks to this discipline of artificial intelligence and computer science, computers can, among other things, identify patterns that describe human behavior. For this to happen, computers must be trained on large amounts of data, extracted directly from user activity and from the information that activity conveys to the machine.
So far, everything sounds good. The problem comes when we realize that, in predicting behaviors, the computer can also detect weaknesses. Put another way: if the model learns when you are going to eat ice cream or chocolate, it can also infer which of the two you prefer. This can allow the owner of the Machine Learning model to manipulate users for their own benefit. At this point, the scientific community has begun to wonder: how can we preserve or even improve the privacy of users and their data, while still building machine learning models that make such data useful?
Differential privacy: putting privacy at the center to avoid manipulation derived from Machine Learning
How does machine learning manipulate us? Take as an example any scenario where we interact with artificial intelligence trained with Machine Learning: the website of a bookstore, our favorite video app, etc. In these cases, machine learning makes it easy to model and predict clicks on certain items, offering recommendations of what to watch or what to buy based on your preferences. The available options are so many that no user could process them all. As a result, users end up being encouraged, or even conditioned, to choose from the recommendations that the Machine Learning method preselects, based on predictions of what the user will prefer.
For this reason, the Machine Learning community is working on alternatives to solve this problem. The development of the technology known as "Privacy-Preserving Machine Learning" (roughly, machine learning that preserves privacy, abbreviated PPML) is making it possible to advance and understand the trade-off between the privacy of data and the usefulness of the learning models built on it.
One of the techniques that PPML uses to protect user data is differential privacy. "We can imagine differential privacy as a mechanism that introduces noise into the data (or into the learning model) to differentiate it from the original data. In this way, we can 'hide' or dilute information that would single out the user within the original data," explains Nicolas Kourtellis, a researcher on the Telefónica scientific team.
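The classic building block behind this idea is the Laplace mechanism: a statistic is released only after noise is added, with the noise scale set by the query's sensitivity and a privacy budget epsilon. The sketch below is illustrative, not the specific mechanism used by the team's research:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a noisy version of a statistic with epsilon-differential
    privacy. The noise scale grows with the query's sensitivity and
    shrinks as the privacy budget epsilon grows (less privacy, less noise)."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

# Example: privately release a user count. Sensitivity is 1 because
# adding or removing one user changes the count by at most 1.
true_count = 1000
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
```

An analyst sees only `noisy_count`, so no single user's presence or absence can be confidently inferred from the released value.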
Machine Learning: federated learning that preserves privacy
In their latest research, the Telefónica Research team observed that differential privacy can achieve a good trade-off between data privacy and the usefulness of the Machine Learning model, even when an adversary tries to interfere with or attack the model trained with noise through differential privacy.
Another line of research that pursues an alternative path to PPML is Federated Learning (FL). FL consists of keeping user data always at the edge of the network, at its source. That is, instead of collecting the data on a server, each user's device trains its own version of the Machine Learning model locally. All the resulting models are then collected and aggregated into a single, more powerful model. But since the model each device trains on its own is not very accurate, what are known as "federated learning rounds" have to be performed: the aggregated model travels back to the devices and the process is repeated, ensuring the high fidelity and usefulness of the final model.
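The round-based process described above is typically implemented with federated averaging (FedAvg): each client takes a training step on its local data, and the server averages the resulting models, weighted by how much data each client holds. A minimal sketch, using single-step linear regression as a hypothetical stand-in for each device's local training loop:

```python
import numpy as np

def local_update(global_weights, local_data, lr=0.1):
    """One client's local training: a single gradient step of linear
    regression on the device's own data. The raw data never leaves
    the device; only the updated model is returned."""
    X, y = local_data
    preds = X @ global_weights
    grad = X.T @ (preds - y) / len(y)
    return global_weights - lr * grad

def federated_averaging(global_weights, client_datasets, rounds=10):
    """FedAvg sketch: in each round, every client trains locally and the
    server averages the resulting models, weighted by dataset size,
    then sends the aggregate back for the next round."""
    for _ in range(rounds):
        client_models = [local_update(global_weights, d) for d in client_datasets]
        sizes = np.array([len(d[1]) for d in client_datasets])
        global_weights = np.average(client_models, axis=0, weights=sizes)
    return global_weights

# Toy setup: three "devices", each holding data drawn from y = 2x.
rng = np.random.default_rng(0)
clients = []
for _ in range(3):
    X = rng.normal(size=(20, 1))
    y = 2.0 * X[:, 0] + 0.01 * rng.normal(size=20)
    clients.append((X, y))

w = federated_averaging(np.zeros(1), clients, rounds=50)
```

After enough rounds, the aggregated weight approaches the true coefficient (2.0) even though the server never sees any client's data, only their model updates.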
The catch with federated learning is that it does not by itself ensure user privacy, because the model parameters exchanged during training can leak sensitive information. To address this problem and protect user data during model learning, the Research team recently proposed the first 'Privacy-Preserving Federated Learning' (PPFL) framework. This framework can significantly improve both the privacy and the usefulness of the model while reducing the number of FL training rounds required.
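To see concretely how updates can leak less, one widely used mitigation (shown here for illustration; not necessarily the mechanism inside the team's PPFL framework) is to clip each client's update and add noise before it leaves the device, in the style of differentially private FedAvg:

```python
import numpy as np

def privatize_update(update, clip_norm, noise_multiplier, rng=None):
    """Clip a model update to bound any one client's influence, then add
    Gaussian noise scaled to the clipping norm, so the server receives
    a sanitized update rather than the raw one."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# A raw update with norm 5 is clipped down to norm 1, then perturbed.
raw_update = np.array([3.0, 4.0])
private_update = privatize_update(raw_update, clip_norm=1.0,
                                  noise_multiplier=0.1)
```

The clipping bounds how much any single client can shift the aggregate, and the noise masks the remaining signal, so an eavesdropper (or the server itself) cannot easily reconstruct the client's data from the update.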