Simplify for Success - Conversation with Katharina Koerner
We invited Katharina Koerner on #SimplifyforSuccess, a podcast series presented by Meru Data and hosted by Priya Keshav to discuss IG programs.
As a senior leader with a Ph.D. in EU law and a deep interest in new technologies and information security, Katharina spoke about homomorphic encryption and multi-party computation (MPC).
She also discussed privacy-enhancing technologies (PETs) and how they can generate more value from data sharing and data collaboration.
Listen to it here:
*Views and opinions expressed by guests do not necessarily reflect the view of Meru Data.*
Transcript:
Priya Keshav:
Hello everyone, welcome to our podcast around simplifying for success. Simplification requires discipline and clarity of thought. This is not often easy in today's fast-paced work environment. We've invited a few colleagues in the data and information governance space to share their strategies and approaches for simplification.
Today we will be talking with Katharina Koerner. Katharina is a senior leader who combines a Ph.D. in EU law with senior management experience and a deep interest in new technologies and information security. From 2006 to 2015, she served in the Austrian Ministry of Interior and the Austrian Federal Ministry of European and International Affairs as a policy advisor and legal officer. From 2015 to 2022, she was the CEO of Culture and Language Institute with eight campuses across Eastern and Western Europe. During her tenure, she founded two additional campuses in Sarajevo and Moscow and led the GDPR implementation across the headquarters in Vienna and all 10 locations. Katharina's mission is to be a translator between privacy, policy, and technology by contributing to responsible and privacy-preserving machine learning from a legal and governance perspective. Currently she is focusing on business applications and privacy compliance of privacy-enhancing technologies. Katharina is a policy fellow at the Privacy Tech Alliance (PTA), a global initiative by the Future of Privacy Forum with a mission to define, enhance, and promote the market for privacy technologies.
Today our discussion is on privacy-enhancing technologies. More than a decade ago, the Dutch and Ontario data protection authorities recognized the role of technology in protecting privacy and coined the term PET, or privacy-enhancing technology. PETs deploy a broad range of techniques to protect personal or sensitive information. The main purpose of PETs is to protect data, but at the same time ensure the data can still be used for various business reasons, whether it is analytics, machine learning, or something else.
Welcome to the show, Katharina.
Katharina Koerner:
Oh, thank you so much for having me, Priya.
Priya Keshav:
Your LinkedIn profile says you are bridging the world between privacy, policy, and IT, and I was particularly intrigued by those terms. So, what do you mean by bridging the world between privacy, policy, and IT? How do you do that?
Katharina Koerner:
Yeah, thanks for that question. For me it is really a matter of seeing things holistically. This is what motivates me. This is what I really like, and particularly in the field of PETs, I found a subject where I can do that really well. Privacy-enhancing technologies, the topic we're talking about today, have so many legal aspects, business aspects, business-enabling aspects, technical aspects, and regulatory requirements which have to be taken into account or are still about to be developed. So, it really covers a whole area of new developments and new opportunities. And that is where I come from, looking at it from those different perspectives.
Priya Keshav:
So, what is a PET? What is a Privacy Enhancing Technology? How will it accelerate an organization's data strategy?
Katharina Koerner:
Privacy-enhancing technologies are what I would personally also call privacy-by-design technologies. It is also becoming more and more common to call them partnership-enhancing technologies. Federated learning, differential privacy, multiparty computation, homomorphic encryption, secure enclaves, and synthetic data are, I think, very commonly understood to belong to that group of privacy-enhancing technologies. And regarding the question of how they can accelerate companies' data strategy: well, they really enable the use of data, or data sharing, in completely new ways.
Priya Keshav:
So, you said completely new ways, and this is something that is very important, something that at least some of our clients are actively looking at, right? So how do you incorporate privacy-enhancing technologies so that you can make sure the data is protected and safe when sharing it or doing whatever is required from a business standpoint? How do you bridge the gap between data protection on one side and value on the other side? Do you think privacy with utility is truly possible?
Katharina Koerner:
Yes, I do. I do. I think that PETs in particular really help with that: we can generate more value from data, data sharing, and data collaboration, and at the same time secure the data better and also secure the privacy properties of the data better. To take an example, the financial industry in particular is very interested in privacy-enhancing technologies, homomorphic encryption, and multi-party computation, because it eventually becomes possible to share data even with an adversary, or with someone where it is really important that confidentiality does not rest on just a paper agreement but on mathematical proofs. The other party with whom you shared data really cannot look into the data; it is impossible, proven by the mathematical protocols, to gain insight into the data that was used as input for generating the common output. The financial industry has a lot of know-your-customer requirements, so banks need to share information, and at the same time there are strict privacy regulations. That is really a tradeoff, and it is really hard to bridge those two requirements. By using, for example, homomorphic encryption, where you compute on encrypted data, you can gain insights from data that stays encrypted; or with multi-party computation, you have a collaboration on the data and you get the result without knowing what the other party actually put in. That is completely new, and that is what is so fascinating. At the same time, the technologies are quite hard to understand. I am a legal person, so it took me a while to really wrap my head around them, and that, I think, is still some kind of roadblock or bottleneck: it just takes some more time for regulators, lawyers, and data protection officers to really wrap their heads around the technologies so that we can use them in a broader way. But as soon as this has happened, and it is about to happen, I think we will see those technologies become state of the art in a couple of years.
Priya Keshav:
I agree with that. That option, as you said, is not there yet because we still don't realize these technologies exist. We all need to get comfortable and understand both the use cases and the needs around how to implement some of these technologies, right? And the regulators will have to get comfortable with how this protects privacy. Like you said, financial services are one example, but it matters in other industries too, retail, for example. It is important to be able to share data because nobody can work in isolation. Even the European regulators, and the UK, for example, are talking about AI, and they realize how important data is, how AI is important for economic growth and development, and how it can help us in many ways. But this is not possible if data sharing is not possible. So it's not so much that data sharing needs to be stopped; it's how do you safely share data? You mentioned homomorphic encryption. What is homomorphic encryption? It just sounds really cool. Why are you excited about it, and what is it?
Katharina Koerner:
So, homomorphic encryption. I completely agree with you; I think it even sounds somehow poetic. Homomorphic encryption means that you can collaborate or run functions on data that stays encrypted. In addition to the familiar concepts of securing data in transit and securing data at rest, this now also enables securing data in use. For example, say we have account information, and usually this account information is stored in the cloud. Usually you would have two options if you have to change the data or look something up: either the cloud provider decrypts the data, or you decrypt the data in the cloud, work on it, compute on it, and afterwards the data gets encrypted again; or you download the encrypted account data, decrypt it, compute on it, work with it, and then encrypt it again and upload it to the cloud again. But with homomorphic encryption it is possible to compute on the data while it stays encrypted. So you go to the cloud and work on the data while it stays encrypted; even in this insecure environment it stays encrypted, and that is a completely new concept. Researchers had been searching on that subject for years, and now it is getting more and more possible to make it robust and fast enough, and the use cases keep growing.
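To make "computing on data while it stays encrypted" concrete, here is a minimal sketch of an additively homomorphic scheme, a toy Paillier cryptosystem written from scratch. It is an illustration only: the tiny primes, the example values, and the restriction to addition are assumptions made for readability; fully homomorphic schemes, which support arbitrary computation, are far more involved.

```python
# A minimal toy Paillier sketch: multiplying two ciphertexts yields an
# encryption of the SUM of the plaintexts, so a server can "add" data
# without ever decrypting it. Illustration only: nothing here is secure.
import random
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def generate_keys():
    p, q = 293, 433                # toy primes; real keys use ~2048-bit moduli
    n = p * q
    lam = lcm(p - 1, q - 1)
    g = n + 1                      # standard simplification g = n + 1
    mu = pow(lam, -1, n)           # modular inverse works because g = n + 1
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    x = pow(c, lam, n * n)
    return (((x - 1) // n) * mu) % n

pub, priv = generate_keys()
c_a, c_b = encrypt(pub, 42), encrypt(pub, 17)
c_sum = (c_a * c_b) % (pub[0] ** 2)     # homomorphic addition on ciphertexts
print(decrypt(priv, c_sum))             # 59, computed while the inputs stayed encrypted
```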
Priya Keshav:
Obviously, if I'm able to do my processing and computations on the data without decrypting it, then the data remains secure. But what about the use cases around privacy, right? Security is just one aspect of it. The other aspect is, let's take retail or financial services, for example. When they share data, let's say I am the consumer, I become a commodity, right? Now you know way too much about me, my behavioral patterns; some of it could be something that I'm comfortable with, and some may be beyond what I had envisioned when I shared the data. So it still doesn't solve the problem of oversharing, in the sense that I gave my data to one company but now it has gone to three, or ten, companies, and they are using it for purposes that I could not have even imagined. At some point, that feels like a violation of my privacy rights, right? So even though it's secure, it does not really solve all of the other privacy issues. What do you think of that?
Katharina Koerner:
Yeah, I completely agree with you. The collection limitation, the purpose specification, the use limitation, and all those privacy principles are of course not covered by just applying some privacy-enhancing technology. That's probably why it's called "enhancing" and not something else. So I completely agree with you, although there is really ongoing research and discussion about whether, for example, homomorphic encryption and multiparty computation could also be considered to anonymize data, to do de-identification with those technologies, and that would mean the data would not be in the scope of the GDPR anymore, for example, or of other privacy-related regulations. And that's a different topic then, right? Whether we are in scope or not, and whether we even have to take care of those privacy principles if we have anonymized data. This is getting to be a really complex and super interesting field. Apple's CSAM scanning, I think, was a good example: it was also a kind of privacy-enhancing technology they used on the device, scanning at the edge, but it wasn't perceived well by the public, so the PR spin it got was really super negative. And I think when it comes, for example, to homomorphic encryption, that could happen too. So if we have a lot of consumer data and there is some computation, or some machine learning training, done with that data, or some data value generated from homomorphically encrypted consumer data, someone could argue: well, the data was encrypted, I couldn't see what was in it, I didn't have the key, so it was de-identified. But on the other hand, there is still the value that is generated from that data, even if it's considered anonymized. So I think that in the end consumers might not be comfortable with companies gaining insights and value from their data, even if one could argue it's de-identified.
Priya Keshav:
Agreed. So here is my point. What you're saying is, privacy-enhancing technologies are great, they solve some problems, but they don't completely solve all the problems, and it's important to understand their use case. It is also important to understand that just because we adopt a privacy-enhancing technology, we have not solved privacy altogether; some of the other issues remain. I think that's where I feel there is sometimes this assumption, and it applies not just to privacy-enhancing technologies, right? We were talking, for example, about cookies: people go, oh, third-party cookies are bad, but with first-party cookies there is supposedly no privacy issue because it's just us. But some of the issues still exist, and it's about looking at it more closely, understanding which issues are solved and which ones remain, and being able to provide meaningful feedback, or advice if I might call it that, to ensure that we're not dropping some of the other privacy principles and privacy issues on the table while we're incorporating these things. But you also talked a little bit about pseudonymization and anonymization. Before we go into that topic, I do want to ask you about secure multiparty computation. What is secure multiparty computation?
Katharina Koerner:
Secure multiparty computation, also called MPC, is a way of sharing data or data insights while the input data remains completely private. So, you and I could compare how much we earn without either of us learning the other's number. This is the starting point of the whole development, the millionaires' problem. I hope you are a millionaire; I'm not, but I can tell you that much. We could compute on how much we earn, and the output would show the result without revealing what our inputs were. This example is really simplified, because of course, if it's not set up correctly, one can draw conclusions about the inputs from the output; that should not happen in a real MPC architecture. But in fact it was used for a wage gap analysis in Boston, which is also a very classic example. I think they are even still doing it: a lot of companies share their salary data to see if women earn less than men. The companies, of course, didn't want to give insight into their actual numbers, so a multiparty computation architecture is applied, and it reveals only the wage gaps; in no way is it revealed how much the companies really pay their employees.
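Here is a minimal sketch of the idea behind that wage-gap example, using additive secret sharing, one common MPC building block. The party names, salary figures, and modulus are made up for illustration; the actual Boston deployment used a considerably more elaborate protocol.

```python
# A toy additive secret-sharing sketch: each party splits its private input
# into random shares, single shares reveal nothing, yet the shares can be
# combined to compute an aggregate result.
import random

P = 2**61 - 1          # prime modulus chosen for this sketch (assumption)

def share(value, n_parties):
    """Split value into n_parties additive shares modulo P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

# Hypothetical inputs; no single compute party ever sees these raw numbers.
salaries = {"company_a": 95_000, "company_b": 72_000, "company_c": 88_000}
n = len(salaries)

sums_at_party = [0] * n
for salary in salaries.values():
    for i, s in enumerate(share(salary, n)):       # one share per compute party
        sums_at_party[i] = (sums_at_party[i] + s) % P

total = sum(sums_at_party) % P       # only the aggregate is ever reconstructed
print(total / n)                     # average salary: 85000.0, inputs stay private
```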
Priya Keshav:
A good use case for this: we did a podcast with a company relating to FCPA. They had an AI model they were training, and part of the process was to identify possible violations and then obviously do some investigations around them. They were looking for ways to share this AI technology that they built in house with other companies, with the idea that some metrics would be shared so that the overall AI model could be improved. At the same time, they didn't want to disclose confidential information that was specific to the company, so that would be a great use case for multiparty computation. But multiparty computation is already used quite widely, isn't it?
Katharina Koerner:
I think there are way more use cases already for multiparty computation than we are aware of. Just from a small bit of research I did: distributed signatures and key management, for example code signing. You can use it not only when you collaborate with other companies, but also within your own company, where the key is split up and only three people together can sign, for example, code before it ships. And of course, in privacy-preserving machine learning it's one of the main technologies used. There are also use cases in blockchain, and again in financial fraud detection, digital advertising, and of course medical research. So I think it's way more common already, and it's really on the rise, or about to become a standard for specific use cases.
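The "split the key so three people must sign together" idea is usually built on threshold secret sharing. Below is a minimal sketch using Shamir's scheme; the 3-of-3 setup, the modulus, and the placeholder key value are illustrative assumptions, and real code-signing deployments would use a hardened threshold-signature implementation rather than ever reconstructing the key in one place.

```python
# A toy Shamir secret-sharing sketch of "three people must combine their
# shares to recover the signing key". Illustration only.
import random

P = 2**127 - 1   # prime modulus chosen for this sketch (assumption)

def make_shares(secret, t, n):
    """Create n shares of secret; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

signing_key = 0xC0FFEE                            # placeholder value for the sketch
shares = make_shares(signing_key, t=3, n=3)
assert reconstruct(shares) == signing_key         # all three together recover the key
assert reconstruct(shares[:2]) != signing_key     # two alone fail (with overwhelming probability)
```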
Priya Keshav:
You bring up a really good point. The EDPB recognizes MPC as a technical safeguard and a pseudonymization technique for processing personal information. Can you elaborate a little bit on that?
Katharina Koerner:
Yeah, so we have all heard of Schrems II many times, I guess. He is from Vienna like me, by the way, and his office was just the street behind my office, but I don't know him. Anyway, among the new supplementary measures that the European Data Protection Board has now published, or strongly suggested, one is "split processing," as they call it, or multiparty computation. They had invested in projects on this for quite a while before that, so it's actually quite interesting that they did that, and I think that's the reason why it's on their radar, because they could also have mentioned other privacy-enhancing technologies explicitly. So they really mentioned it as one potential technology to secure data in transnational data transfers, and therefore also something to take into consideration in a transfer impact assessment, as one tool to supplement other measures and provide the same protection for data when it is transferred, for example to the US, as it would have if it stayed in Europe. That is really an amazing step forward, and I think it is not yet widely recognized or thought through what it actually means and how we could utilize this recognition of the technology.
Priya Keshav:
Coming back to anonymization, which we touched on a little bit: I do want to compare pseudonymization versus anonymization. From a practical standpoint, I think there is a lot of confusion across the board in terms of what data will be considered truly anonymized, because sometimes people pseudonymize the data and end up thinking that they have anonymized it.
Katharina Koerner:
When data can be considered anonymized is a very complex and tricky question that would require different answers depending on which country you are in. Even in Europe there is no single clear guidance from the European Data Protection Board, or provided by the GDPR itself, or by the European Court of Justice. That is one of the main problems, I think: there is no legal clarity about what exactly is anonymization and what is pseudonymization. The ICO in the UK just came out with the second chapter of its anonymization guidance. They are really investing in more information and providing increasing legal clarity around that, and I think that's really awesome, super needed and super helpful. They use the motivated intruder test to assess whether data can be considered anonymized enough to fall outside the scope of the GDPR. It is similar with the HIPAA Privacy Rule, which is also quite an old rule: for de-identification you either have the very well-known removal of the 18 identifiers, or you have an expert determination. So if an expert says, OK, this data can be considered not individually identifiable, I cannot identify an individual from it again, then it can be considered de-identified and HIPAA does not apply anymore. I would argue that many privacy-enhancing technologies can fall under this exception, but it's still a risk because there's a lack of legal clarity around it.
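As a rough illustration of why pseudonymized data is not the same as anonymized data, here is a small sketch: a keyed hash can always be re-linked by whoever holds the key, whereas Safe Harbor-style de-identification removes the identifier outright. The field names, key, and record are hypothetical.

```python
# Contrast pseudonymization (reversible for the key holder, still personal
# data under GDPR) with outright removal of a direct identifier.
import hmac
import hashlib

SECRET_KEY = b"store-this-separately-and-rotate-it"   # hypothetical key

def pseudonymize(identifier: str) -> str:
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"patient_id": "A-1029", "zip": "78701", "diagnosis": "J45"}

# Pseudonymized: still linkable for anyone with SECRET_KEY, so purpose
# limitation, deletion duties, and the other principles continue to apply.
pseudonymized = {**record, "patient_id": pseudonymize(record["patient_id"])}

# HIPAA Safe Harbor-style de-identification instead strips direct identifiers
# outright (the rule lists 18 categories, not just this one).
deidentified = {k: v for k, v in record.items() if k != "patient_id"}

print(pseudonymized)
print(deidentified)
```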
Priya Keshav:
Do PETs now encourage people to store data permanently and give one a false sense of security and privacy? What do you think of that?
Katharina Koerner:
I think it really depends what PET you use, and we would need to look at it on a case-by-case basis. For example, if we have synthetic data, that is an anonymized data set, so we don't really ever need to delete it. But if we are not outside the scope of the regulations, then of course, like we said before, we still need to follow the regular privacy principles: we still need to destroy or delete the data, we still need a legal basis for processing, and if the legal basis is not there anymore, then we still need to apply all those different requirements, so that doesn't change anything. But in general, I think privacy-enhancing technologies are, as I said at the beginning, privacy-by-design tools. Privacy-by-design strategies can really be pursued with PETs. For example, Jaap-Henk Hoepman's privacy-by-design strategies list minimization, separation, abstraction, and hiding as strategies for privacy-friendly data processing, and I think those can be pursued much better with privacy-enhancing technologies. So we can fulfill the privacy-by-design and privacy-by-default requirements that we have in GDPR Article 25 with privacy-enhancing technologies. The GDPR also requires us to use the state of the art, and as soon as a technology is mature enough and rolled out on the market, the state of the art already begins there. That is why we both, I think, agree that in a couple of years those technologies will be far more widespread, because we can simply increase privacy and confidentiality with them. I think it is also interesting to think about the privacy triad, not only the security triad, but the privacy triad that NIST came up with in its introduction to privacy engineering and risk management in federal systems, because we hardly ever talk about that triad. It is predictability, manageability, and disassociability, and privacy-enhancing technologies also help very much with those privacy objectives: to know what processing is about to go on, to manage the processing, and to process personal information without association to individuals, which is one part of this NIST privacy triad. So in this regard, I don't think PETs can only be considered security tools; as the name says, they are really more on the privacy side, because new collaborations can also take place, like the ones we mentioned before, that go beyond security. It's really about the private inputs that stay private, which enable new forms of collaboration beyond just securing data in the commonly understood way.
Priya Keshav:
But almost all technologies have vulnerabilities, right? It's not like privacy-enhancing technologies are perfect; they bring new vulnerabilities. Can you talk a little bit about the vulnerabilities that are possible with at least two of the privacy-enhancing technologies we talked about, homomorphic encryption and multiparty computation?
Katharina Koerner:
Yeah, I think we all know that there is no zero risk. There is no absolute security. This is why we have so many great people working in the field of information security, and still so many things happen. Of course, this is also the case with privacy-enhancing technologies. As far as I could see, there is a lot of research going on, and still there is a lack of broadly accepted definitions of robustness, for example in privacy-preserving machine learning, and the security properties do not translate 100% from one use case or one technology to another. For example, we have membership inference attacks in privacy-preserving machine learning; that's a classic, where it is reverse-engineered whether a particular record was in the data set used for training. But this field is really active. People say that only 100 people in the world have heard of homomorphic encryption and multi-party computation; I have heard people saying that, but it's not true. There is this huge ecosystem of people in privacy-preserving machine learning who are looking into those security properties. So, any system has to be set up appropriately to be as secure as possible, and of course that is also true for PETs.
Priya Keshav:
We touched on this a little bit, the lack of coherence from an anonymization standpoint. You talked about how there is no consistent way to define what is truly anonymized, and there are so many definitions, so it's very hard to come up with a definition of "this is truly anonymized." Part of that is also that it's very hard to anticipate how data might be de-anonymized, because there are so many different ways: the minute you think it's anonymized, somebody finds a way to de-anonymize it, if I can use the word. The same thing applies to PETs; there are some inconsistencies from a regulatory standpoint. Can you share your thoughts on those inconsistencies, as well as what needs to be done before privacy-enhancing technologies become mainstream?
Katharina Koerner:
It would be great if we had more guidance, in terms of just regular reports or guidelines by authorities, but also of course regulations. I don't see that really coming, particularly in the US, where we are still struggling to get a coherent privacy regulation. But I think there is a lot of support for research in that field, and that will then also have to be followed up by regulation. Sometimes regulation leads, like the GDPR: it led, it was really spearheading this whole new universe of privacy. But sometimes it has to catch up, and I think in the world of PETs it's more a case of regulation catching up, and there is a lot to do here. But we do see things happening. The Promoting Digital Privacy Technologies Act was at least introduced into the US Senate in February 2021, although I don't think it will really be pursued. It defines PETs as software solutions, technical processes, and other technological means of enhancing the privacy and confidentiality of an individual's personal data, and it also explicitly mentions, for example, multi-party computation. This act would have financed more research on the topic, and I think part of that research would also be really looking into the details of when something can be considered anonymizing or de-identifying and when it cannot, because this is what we need for business. Many people talk about generating value from data that we have not even thought of yet: what data are we actually sitting on, and how could we leverage it? We need those technologies, and clear definitions of anonymization and pseudonymization, to really be able to do this, because otherwise the risk for businesses is just still too big.
Priya Keshav:
Any other closing thoughts?
Katharina Koerner:
Yeah, I just want to mention something I stumbled over last week and have been looking at in more detail, because I think it's a really super interesting method. In the realm of threat modeling, we know that threat modeling is something very useful in information security, but for privacy-specific threat modeling I find the LINDDUN approach really super interesting, so maybe some listeners are interested in that tool. You really diagram the data flows and then look at them. LINDDUN stands for linkability, identifiability, non-repudiation, detectability, disclosure of information, unawareness, and non-compliance. It looks at the data flows from those specific threat angles and then maps the vulnerabilities that have been found to specific PETs, so it would be a very practical method to bring privacy-enhancing technologies into existing workflows. I just wanted to drop it here because that was what my last LinkedIn post was about, and I think it's really something super useful. It was developed in Belgium, where they also have a cluster of really smart people and things going on in the realm of PETs, so it might be useful for someone.
Priya Keshav:
Thank you so much for taking the time to participate in the podcast, Katharina. It was very, very useful to talk a little bit about PETs. Obviously, it's hard to take a deep dive because these are pretty complex technologies and there is a lot to talk about, but at least at a very high level we were able to introduce some of the privacy-enhancing technologies that seem to make a difference today.
Katharina Koerner:
Yes, thanks so much for having me. It's really a very complex topic, I agree with you, but we just have to have the courage to approach it, because there is so much opportunity. And yeah, it's good to talk about it for sure.
Priya Keshav:
Thank you again.