
Simplify for Success - Conversation with Jeff Jockisch


Jeff Jockisch, data privacy researcher at Privacy Plan, was on #SimplifyforSuccess, a podcast series presented by Meru Data and hosted by Priya Keshav, to discuss information governance (IG) programs.


Jeff spoke about privacy-enhancing technologies and what elements constitute a privacy-enhancing technology.


He also discussed the existence of data biases in analytics programs and the importance of testing to eliminate bias.

Listen to it here:

*Views and opinions expressed by guests do not necessarily reflect the view of Meru Data.*


Transcript:


Priya Keshav:

Hello everyone, welcome to our podcast around simplifying for success. Simplification requires discipline and clarity of thought. This is not often easy in today's rapid-paced work environment. We've invited a few colleagues in the data and information governance space to share their strategies and approaches for simplification.

Today we'll be talking with Jeff Jockisch.

Hi Jeff, welcome to the show.

Jeff Jockisch:

Yeah, great to be here.

Priya Keshav:

So, tell me a little bit about yourself. I think you're doing some very interesting things, and so I'd like to hear a little bit about what you do at Privacy Plan.

Jeff Jockisch:

Sure, Priya. I'm a data privacy researcher, and I live at the intersection of privacy and data: I try to bring my background in data, cognitive computing, search engines, and building datasets to my more recent interest in data privacy. About four years ago, I started working as a marketing manager for a data privacy company and realized that I really wanted to get more fully involved in data privacy. I moved around a little bit, sort of within my own mind, as to how I wanted to do that: I tried doing some product sales in the data privacy world, and then tried to develop my own product.

And I finally realized that I just needed to do what I loved, which was data analysis, data manipulation, and data research. So I've taken those skills that I already knew how to do well and applied them to the data privacy world. That's where I am now, and it's turning out really great: I'm doing something I love and applying it to the data privacy world.

Priya Keshav:

So today we're going to talk about privacy-enhancing technology. What is privacy-enhancing technology? Do we have a consistent definition across the industry, or is there some ambiguity around what should be considered a PET, which is the short form for privacy-enhancing technology?

Jeff Jockisch:

Yeah, well, I think there's a heck of a lot of ambiguity there. I'm still trying to figure out what it is, and I don't think anybody really knows. If you asked most people, they would probably point you to what I'd label the privacy engineering companies, the companies developing tools in a more narrowly defined space, though I'm not even sure that's the best definition, right? I'm thinking of things like de-identification, homomorphic encryption, differential privacy, and federated learning tools. Those are the kinds of things that jump to the forefront of my mind, and probably most people's minds, but it's really much broader than that.

I've actually started building a database of privacy-enhancing technology companies, and if you take a more expansive view of what that might encompass, it's really much, much broader, right? I've tried to develop a taxonomy to put some parameters around it, and what I've come up with is five major segments of what I would call privacy-enhancing technology. The first is privacy program management, which is pretty well defined by the IAPP, the International Association of Privacy Professionals, and includes assessment managers, consent managers, data mapping, data subject requests, privacy managers, and privacy information managers, those kinds of things. The second segment is enterprise privacy: activity monitoring, data discovery, and enterprise communications. I also add enterprise governance into that, ESG, which the IAPP doesn't really include, but I think it needs to go into that bucket. And the third segment is privacy engineering, where I put those de-identification tools we just mentioned, along with differential privacy, federated analysis, and homomorphic encryption, plus synthetic data generation and zero-knowledge systems.

But I think there are a couple of additional segments we have to include in this list of privacy-enhancing technologies, and those get more to the new things developing in our economy. The fourth is consumer privacy products, which includes the private web, private communication tools, and monitoring and opt-out tools like privacy agents. And maybe even more important, the fifth is the whole data economy that's starting to grow up: personal data economy tools, identity management, data collaboration, even distributed ledger technologies and privacy coins. So you can see there's a huge swath of companies that could be considered privacy-enhancing technologies.
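
To make the taxonomy concrete, here is one way to encode Jeff's five segments as a simple data structure. This is a sketch based on the examples he gives above, not an official IAPP classification, and the category strings are paraphrases.

```python
# Jeff's five PET segments, encoded as a simple Python dict. The
# category names paraphrase his examples; this is not an official
# IAPP taxonomy.
PET_TAXONOMY = {
    "privacy program management": [
        "assessment managers", "consent managers", "data mapping",
        "data subject requests", "privacy information managers",
    ],
    "enterprise privacy": [
        "activity monitoring", "data discovery",
        "enterprise communications", "enterprise governance (ESG)",
    ],
    "privacy engineering": [
        "de-identification", "differential privacy",
        "federated analysis", "homomorphic encryption",
        "synthetic data generation", "zero-knowledge systems",
    ],
    "consumer privacy products": [
        "private web", "private communication tools",
        "monitoring and opt-out tools", "privacy agents",
    ],
    "personal data economy": [
        "identity management", "data collaboration",
        "distributed ledger technologies", "privacy coins",
    ],
}

def classify(vendor_category: str):
    """Return the PET segment containing a vendor category, if any."""
    for segment, categories in PET_TAXONOMY.items():
        if vendor_category in categories:
            return segment
    return None

print(classify("differential privacy"))  # -> "privacy engineering"
```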

Priya Keshav:

So, in your mind, you're taking a much broader approach to defining privacy-enhancing technologies than the narrow approach that covers just the encryption, synthetic data, and data manipulation tools that mask and protect the data itself.

That's interesting. How do we standardize definitions, and do you have any insights you'd like to share? You said you have a database of the privacy-enhancing tools that are out there. What do you find from that database? Any insights or thoughts you can share from your research so far?

Jeff Jockisch:

Well, there are not a whole lot of insights I've gained yet, because I'm really still in the data collection phase. But first of all, the sheer number of companies that fit is striking, and the venture capital funding flowing into these organizations is really quite amazing. Here's an interesting data point: if you just look at facial recognition technology, which you might consider more privacy-invasive than privacy-enhancing, something like $500 million has flowed into facial recognition companies in the past year alone. Which is pretty amazing given that everybody seems to think it's more problematic than helpful. So it's really pretty crazy what's going on in this space, but there's also a heck of a lot of money flowing into companies that want to protect privacy, and you can look at a lot of the different tools that aim to protect data. I've seen an interesting bifurcation in companies that are trying to store data for first-party platforms. As we move into this cookie-less world, companies are trying to figure out how they're going to store their first-party data, and if you look at the literature and the way companies market themselves as silos for that kind of data, you find, I think, two different approaches.

One is companies that are trying to leverage that data in a very privacy-centric way, and the other is companies that avoid those privacy words completely and focus exclusively on monetization. And I think it's interesting how that's going to shake out. Are companies going to embrace privacy fully, or are they going to try to ignore it and focus on monetization for as long as they can, right? I don't think they're going to be able to ignore the privacy problem forever. But some of them may try to for a while.

Priya Keshav:

So, you bring up a very interesting point about first-party data, data analytics, and all the technologies that are helping companies. Let's focus on those that take privacy seriously but still want to collect as much first-party data as possible and, obviously, use it in a safe manner. There are a number of technologies, each with pros and cons: using a data set to create a synthetic data set that mimics the real-world scenario, or transforming or masking the data. There are so many different ways to encrypt the data, but there are cons to it too. Some look at all this and think it just makes the problem worse, because it gives a false sense of security around privacy: people start thinking that just because the data is encrypted or somehow masked, it is now secure, which lets companies do more mining and monetization.

What are your thoughts on both sides: those who think it's a good thing, and those who look at it as a problematic solution to the problem?

Jeff Jockisch:

So, I think you hit on a really good point. There are a lot of different ways to approach this: you can generate synthetic data, you can encrypt the data, you can tokenize the data. There are some new solutions out there that claim, based on cryptography like zero-knowledge proofs and other techniques coming out of distributed ledger technology, to be able to keep the data encrypted not only in transit and at rest but during computation, so that the data is never exposed even when you're sharing it, right? Which is really interesting. I think that stuff is going to come to fruition; I'm just not sure it's actually here yet.
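
To make the encrypted-computation idea concrete, here is a minimal sketch of additive homomorphic encryption using the open-source python-paillier library (`pip install phe`). Paillier supports only addition of ciphertexts and multiplication by plaintext scalars, a simpler cousin of the fully homomorphic and zero-knowledge systems Jeff alludes to; the salary figures are invented.

```python
# Computing on data that stays encrypted: a third party derives an
# average salary without ever seeing a plaintext value.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# The data owner encrypts values before sharing them.
salaries = [52_000, 61_500, 48_250]
encrypted = [public_key.encrypt(s) for s in salaries]

# The third party needs only the public key and the ciphertexts.
encrypted_total = sum(encrypted[1:], encrypted[0])
encrypted_mean = encrypted_total * (1 / len(encrypted))

# Only the data owner, holding the private key, can read the result.
print(private_key.decrypt(encrypted_mean))  # -> 53916.66...
```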

Maybe it's still a little bit of marketing hype, but I think those solutions do exist, and they will come to market. But you're right that they're going to cause some issues on the back side. Because now we get the sense of security that, OK, all of our data is safe, right? Just like when we had all these companies saying, OK, all this data is fine because it's de-identified. Well, all that de-identified data that companies and data brokers are holding right now, we know it's really very toxic. As much as they claim it's not, almost all of it is very easy to re-identify.
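
As a toy illustration of why de-identified data re-identifies so easily, the sketch below performs a linkage attack, joining "anonymous" records to a public dataset on shared quasi-identifiers. The records are invented; the technique mirrors Latanya Sweeney's classic finding that ZIP code, birth date, and sex identify most Americans.

```python
# A linkage attack on "de-identified" data using pandas.
import pandas as pd

# "De-identified" medical data: names removed, quasi-identifiers kept.
deidentified = pd.DataFrame([
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F",
     "diagnosis": "hypertension"},
])

# A public dataset (e.g., a voter roll) sharing those quasi-identifiers.
voter_roll = pd.DataFrame([
    {"name": "Jane Doe", "zip": "02138",
     "birth_date": "1945-07-31", "sex": "F"},
])

# Joining on the shared columns re-attaches identity to the diagnosis.
reidentified = deidentified.merge(
    voter_roll, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```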

I know there are a couple of companies out there claiming that once your data is encrypted and can't be reverse-engineered, you'll now be able to collect additional personally identifying pieces of information, like sex, date of birth, political affiliation, things that in the past might have been a little unsafe to collect. And while it may be the case that you can now collect them more safely, the problem is that when you start to merge that information into things like algorithms, you've got the real problem of those algorithms potentially generating bias, because those data points are now included in the algorithms, where before you didn't collect those pieces of data at all. You've essentially gotten rid of the whole data minimization movement, right? And you've moved into a data maximization movement because you feel safe.

Priya Keshav:

So, that's an interesting point of view. Yes, that's how people feel: once you start saying you've figured out a safe way to analyze the data, to mask the data, to encrypt the data, and somehow you've de-identified it, even though there are no clear metrics for measuring whether it's truly de-identified, you move further and further away from data minimization. Part of my question is why data minimization is such a hard thing. Is it because we all love data so much that we can't ever think about getting rid of it? Is data minimization just a tough problem? Or are the two even at odds with each other?

Jeff Jockisch:

Yeah, well, I don't think data minimization is a tough concept. I think it's just tough to implement, right? I mean, I'm a privacy professional; I understand the concept of data minimization. But being a data guy, I don't like to throw away data. It's always in the back of your mind that you might want this data for something, so it's hard to want to get rid of it. I can understand how a marketer or a data scientist, and I don't consider myself a data scientist, wouldn't want to get rid of data they might eventually use. But it's really a bad idea to keep this stuff around when you don't have any specific use case for it. You know how often companies get breached now, right? It's just sitting there waiting to become a toxic asset for you if you don't have a use for it.

Priya Keshav:

How do you expect to measure the effectiveness of data masking? Do you think we'll get to where we can measure and accurately decide whether a particular technology is working well? Or is that a difficult problem to solve?

Jeff Jockisch:

Well, I think the technology is getting really good. I think we are going to be able to encrypt data down to the field level, right? And even put rules around it. I envision a world where you might think of it as putting smart contracts around every little data point, right? So that, Priya, your age as a data point might be encrypted so that it's only available to be used when you give it to somebody under certain circumstances, and that encryption says it expires after a certain amount of time and is only usable in these particular instances, by these particular people, for these particular reasons. So I think the encryption, or masking, whatever you want to call it, is going to get very good and very granular. But that doesn't mean all of our problems go away, because if we keep collecting more and more data, it could still become a problem in the future, depending on how well those encryption keys, for instance, are stored, or how other problems might present themselves.
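
As a rough sketch of the "smart contract around a data point" idea, the snippet below wraps a single encrypted field in a usage policy that is checked before decryption. It uses Fernet from the `cryptography` package; the policy fields, party names, and purposes are hypothetical, and a real system would enforce the policy cryptographically or server-side rather than in client code.

```python
# A field-level ciphertext bundled with machine-readable usage rules.
import time
from cryptography.fernet import Fernet

key = Fernet.generate_key()
fernet = Fernet(key)

wrapped_field = {
    "ciphertext": fernet.encrypt(b"age=42"),
    "policy": {
        "expires_at": time.time() + 30 * 24 * 3600,  # 30 days
        "allowed_purposes": {"age_verification"},
        "allowed_parties": {"acme-verifier"},
    },
}

def unwrap(field: dict, party: str, purpose: str) -> bytes:
    """Decrypt only if the request satisfies the attached policy."""
    policy = field["policy"]
    if time.time() > policy["expires_at"]:
        raise PermissionError("data point has expired")
    if purpose not in policy["allowed_purposes"]:
        raise PermissionError(f"purpose {purpose!r} not permitted")
    if party not in policy["allowed_parties"]:
        raise PermissionError(f"party {party!r} not permitted")
    return fernet.decrypt(field["ciphertext"])

print(unwrap(wrapped_field, "acme-verifier", "age_verification"))
```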

Priya Keshav:

So, you brought up this point about introducing bias. And the other point might also be a governance issue: when you have a proliferation of data, it just keeps growing exponentially. Do you think the industry understands bias? Microsoft, Google, and a few others have obviously put in time and effort and are taking the concept of bias very seriously, but if you leave big tech out, do you think the rest of the companies understand bias and think about ways to manage, or at least measure, the risk of introducing data biases into their analytics programs?

Jeff Jockisch:

No, I really don't think they think about it. I don't think anybody is intentionally trying to put bias into their products. But I think they really don't understand it, I don't think they think about it, and I don't think they have the tools to measure it. And these are hard problems, right? Because if you don't know what you're looking for, and you don't really understand the problem because you're not looking for it, you're not going to find it.

Look at the well-documented case of Amazon's hiring system that was biased against women. It's pretty obvious they didn't design the system to be biased against women, right? But it turned out that it was, because they didn't think the whole system through and they didn't test it to make sure the outcomes weren't going to have bias in them. So you have to test for outcomes as well as for inputs, and that's the real problem, right? It's hard to do those things. And unless we have testing suites, a real understanding, and a lot of people trained in how to do that, it's just not going to happen.
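
Here is a minimal sketch of the kind of outcome testing Jeff describes: compare a model's selection rates across groups and flag disparate impact using the EEOC "four-fifths" rule of thumb. The outcomes below are invented, and a real audit would use purpose-built tooling (e.g., Fairlearn or AIF360) and many more metrics than this one.

```python
# Flag disparate impact in model outcomes via group selection rates.
from collections import defaultdict

def selection_rates(decisions):
    """decisions: iterable of (group, was_selected) pairs."""
    totals, hits = defaultdict(int), defaultdict(int)
    for group, selected in decisions:
        totals[group] += 1
        hits[group] += int(selected)
    return {g: hits[g] / totals[g] for g in totals}

def disparate_impact(decisions, threshold=0.8):
    """Return (impact ratio, flagged) per the four-fifths rule."""
    rates = selection_rates(decisions)
    worst, best = min(rates.values()), max(rates.values())
    ratio = worst / best if best else 1.0
    return ratio, ratio < threshold

# Hypothetical hiring-model outputs: (applicant group, shortlisted).
outcomes = ([("men", True)] * 60 + [("men", False)] * 40
            + [("women", True)] * 30 + [("women", False)] * 70)

ratio, flagged = disparate_impact(outcomes)
print(f"impact ratio={ratio:.2f}, potential bias={flagged}")  # 0.50, True
```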

Priya Keshav:

So, the reason these biases are there is that the biases exist in the real world; all the AI or any of these algorithms are doing is extrapolating the bias that exists and maybe putting it on steroids. Take the example of Amazon: they were biased against women because the technology industry is predominantly dominated by men, right? So it's the history and the data that extrapolate themselves into biases; the algorithms are not just making them up. They're based on facts and data from the past. So, in that sense, are we creating a new problem? Or are we just now understanding a problem that already exists?

Jeff Jockisch:

Well, I think as we pump out more data, that data is already going to have those implicit biases in it. And if we start creating more and more algorithms without doing any of this testing or any of this thought process, we're just going to embed the bias, right? And if we don't understand that, it's going to make things worse. That's why we have to start doing the testing, the analysis, and the thinking. It was almost easier, I think, when we used things like credit scores, where we only have a few variables that we sort of understand, like FICO scores, right? But as we make those scores more complicated by throwing in things like our social profiles and other data we dump in from the Internet, we don't really understand what those pieces of information might include. We run the chance of adding in proxy variables for things like sex and age and other things we just don't understand, right? And the problem is, if you have deep learning algorithms, or just other types of algorithms, sucking in data and spitting out conclusions, it's a black box: you don't really understand how it's coming to those conclusions. Very often it's using proxy variables for sensitive data points, and those are going to include bias. If you're not testing for outcomes, you're going to have bias. It's just very likely.
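
One common way to check for the proxy variables Jeff describes is to ask whether the supposedly innocent features can predict the sensitive attribute itself. The sketch below uses scikit-learn, with synthetic data standing in for a real feature matrix: an AUC near 0.5 means little leakage, while an AUC near 1.0 means strong proxies exist.

```python
# Proxy-variable check: can the features predict a sensitive attribute
# that was never collected as a feature?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
sensitive = rng.integers(0, 2, size=1_000)          # e.g., sex (never a model input)
proxy = sensitive + rng.normal(0, 0.3, size=1_000)  # e.g., a correlated social signal
noise = rng.normal(size=1_000)
X = np.column_stack([proxy, noise])                 # the "innocent" feature matrix

auc = cross_val_score(LogisticRegression(), X, sensitive,
                      cv=5, scoring="roc_auc").mean()
print(f"sensitive-attribute AUC from features alone: {auc:.2f}")
```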

Priya Keshav:

True. So, I was reading your profile and noticed something interesting that I wanted to ask you about. You mentioned that your tracer record tech effectively monitors for data theft by hackers. How do you introduce a tracer record into a data set? And how do you then monitor for theft?

Jeff Jockisch:

Yeah, so I'm actually not doing much of that anymore. It was a product idea I was playing with, but the concept is really cool, right? The idea is to embed security into your data. The FBI is actually doing some of this, and we filed a patent, but we didn't get it, and that's the reason I backed away from pursuing it as a business venture. The idea is to take synthetic data and embed it into your actual data records. Say you've got an employee data set or a sales record data set, right? You throw in a few spiked records, and those records might have fake names, fake email addresses, fake credit card numbers, fake phone numbers, fake addresses. When that data gets stolen in a data breach, or maybe an employee steals some contact records, whatever the case may be, then if those records get used, if the phone number gets called, if the email address gets pinged, if the address gets used, if the credit card number gets used, it essentially hits a tripwire that says, oh, we've been breached, right? So the concept is that rather than having an intrusion detection system around your data, where you may not know somebody exfiltrated your information until nine months later when they decide to tell you, you get a notification as soon as your data has been taken and somebody tries to use it. Obviously, they may not use it immediately, so maybe that tripwire doesn't get triggered, but in a lot of cases it can be an early warning system.
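
A minimal sketch of this tracer-record (honeytoken) idea: seed a dataset with synthetic records whose identifiers appear nowhere else, then treat any use of those identifiers as a breach tripwire. The names and identifiers below are invented, and a real deployment would watch mailboxes, phone lines, and card networks for hits rather than a single function call.

```python
# Honeytokens: synthetic records that double as breach tripwires.
TRACER_RECORDS = [
    {"name": "Alex Quine", "email": "a.quine@tracer.example",
     "phone": "+1-555-0142"},
    {"name": "Mira Voss", "email": "m.voss@tracer.example",
     "phone": "+1-555-0199"},
]
TRACER_IDENTIFIERS = {
    value
    for record in TRACER_RECORDS
    for value in (record["email"], record["phone"])
}

def seed(real_records: list) -> list:
    """Mix tracer records into the real dataset before storage."""
    return real_records + TRACER_RECORDS

def alert_breach(identifier: str) -> None:
    # In production: page the security team, open an incident, etc.
    print(f"TRIPWIRE: tracer {identifier} used, likely breach")

def check_activity(identifier: str) -> None:
    """Call for every inbound email/call/charge; fire on a tracer hit."""
    if identifier in TRACER_IDENTIFIERS:
        alert_breach(identifier)

check_activity("a.quine@tracer.example")  # TRIPWIRE fires
```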

Priya Keshav:

You've spent a lot of time providing privacy consulting services as well as privacy datasets. Any general trends you see from a privacy implementation perspective that you'd like to share with us?

Jeff Jockisch:

Well, I think more and more companies are realizing that they've got to do more than just basic privacy policies and privacy-washing their businesses; they've got to actually embrace privacy by design. I'm working with a couple of companies now that are going back and either doing privacy by design for new products or trying to embed privacy principles into the engineering team as they're developing new products at the startup phase. That's never really happened before in startups I've been a part of, where you're actually trying to teach your engineering team what privacy really means as you're building products from the ground floor, and I think that's really exciting. So, I think that's maybe the biggest trend. I don't know if it's happening everywhere, but it's happening with some companies I'm working with.

Priya Keshav:

No, I think that makes sense, right? You also brought up this point before: you see some companies taking privacy very seriously and making it a differentiating factor, Apple being one of them. And then there are others that will probably continue to look at data monetization and resist privacy. But do you think that, in general, one thing GDPR and CCPA have achieved is the realization that privacy is here to stay, and that people are thinking about how to incorporate some level of privacy into their programs, maybe not all the way through, to where privacy by design is a concept being taken a lot more seriously than a few years ago?

Jeff Jockisch:

Yeah, I don't think there's any doubt that privacy is here to stay. There may be some roller coaster up and down over the next 10 or 20 years, but GDPR, when it rolled out with the 4% fines, made the world sit up and take notice. It probably wasn't the only thing, but it really spurred the rest of the world into passing legislation, really across all the continents, right? It looks like China is going to pass its privacy law next month, and that's going to be a pretty seminal moment as well. I think it's just pretty amazing what's happening here, and even with the complaints about the lack of enforcement for GDPR, there are plenty of enforcement actions happening, and I think it's just going to take some time for that stuff to ramp up.

But companies are plenty scared, even if they're not quaking in their boots, about the PR impact of getting into a privacy controversy if they don't take these rules seriously. Just like people are starting to worry now about cybersecurity. And even though not every company is taking that stuff seriously, really, God help them if they're not, because cybersecurity and privacy are things you just can't ignore any longer. And I really think both of those things are only going to get more important.

Priya Keshav:

So, do you think it's compliance and the fear of fines that are driving privacy? Or do people realize that they can enhance consumer trust by being better stewards of their data?

Jeff Jockisch:

So I think it's probably a two-layered thing, right? Some people are just in it for the compliance, and to be honest, there are a lot of people making money on compliance and compliance fear, right? But beyond that, I think there are a lot of companies that really see this as a competitive advantage, and there is a competitive advantage to be had, right? I think consumers are actually beginning to wake up and say, yeah, I'd rather have this product than that product because there's a privacy advantage. We can see it in the marketplace; it's actually moving people from one product to another.

Priya Keshav:

If you look at it from a maturity curve perspective, I would say three years ago we were probably just starting. Where do you think we are now, and where do you think we'll end up eventually? If you look at it as a continuum, both from a technology maturity standpoint and in terms of industries adopting these privacy-enhancing technologies, where do you think we fall now, and how long is the road forward?

Jeff Jockisch:

Yeah, that's probably a pretty tough question.

I think we're still pretty nascent.

I think it's probably a 20-year run from here. But maybe that's even shortsighted. You know, Gartner came out with their new hype cycle for privacy tech, for privacy in general, I guess, and they've just finally put differential privacy and federated learning and things like that on the hype cycle. They're just on the upswing of the first part of the slope, right? And it usually takes a couple of years before those things even start to mature, let alone get through the downslope and back up again.

So, I think we're just really nascent.

Priya Keshav:

No, I agree. I think we haven't even defined our requirements yet. I don't know if we have a 20-year run, though, and not because we're mature, but because I feel like we'll accelerate thanks to both regulatory requirements and consumer requests, right? As companies realize this is not a nice-to-have but a must-have, it becomes more and more part of the program. I do think we're treating privacy as an add-on today, and slowly over time it'll become an integral part of everybody's life, both within the corporation and outside, right? Hopefully, we get there faster than 20 years.

Jeff Jockisch:

So, one thing that I think may be an even more interesting question is, as some of these new things start to hit corporations, like privacy and ESG, how do corporate structures internalize them, right? You've got IT with all the new cybersecurity threats, and you've got privacy. And you've got this new ethics movement, and a governance movement. How do those things fit together? Do all of those people individually report to the CEO? You've got legal in there as well. Do those functions start to combine in some interesting way? Because it seems to me that's too many direct reports to a CEO, so I wonder how that's eventually going to shake out, right? Some people think privacy should report to IT, and some people think it's the other way around. And then you've got this new ethics organization; how do they fit in? Maybe people should report through legal, and who the heck knows. So I'm really interested to see how that falls out.

Priya Keshav:

No, you're bringing up a really good point. My take on it is that you can't have, well, and again, you mentioned ethics, legal, IT, and cybersecurity, but you left out data analytics.

Jeff Jockisch:

Yeah, yeah, that too, right?

Priya Keshav:

So, you can't have four trains going on parallel paths. At some point they have to merge, or there needs to be some alignment, because otherwise it just does not make sense. There are nuances to each one, and I'm not trying to say they're all the same. But they're so intertwined that, while for our own personal interests it makes sense for a Chief Privacy Officer to be separate, for a Chief Security Officer to be separate, each reporting directly to the board or the CEO, and for the CDO to be separate, again reporting to the CEO or CTO or whoever that person is...

At some point, if you don't have collaboration and you have too many heads, you don't end up with the right kind of incentives for an integrated privacy or data management program. I do understand that they're all complex areas, and for a large company there's probably enough work for them to be separate roles, but there's got to be a lot more integration than what is there today.

So, one thing I truly believe is that for us to mature, there needs to be more cross-functional collaboration, and not just a steering committee or a meeting, but true cross-functional collaboration where the objectives align. That's something I truly believe in. But we'll see where we head.

Jeff Jockisch:

Yeah, it's going to be an interesting future.

Priya Keshav:

Yep, yep. Any closing thoughts you'd like to share?

Jeff Jockisch:

Well, I think people should be aware that the data we have is only going to grow, and it's only going to get more difficult if we don't take privacy seriously. And there's so much overlap with cybersecurity that we've got to be careful there too. Maybe one thing I'd like to mention is that I'm working on one other project people might want to be aware of, called the Data Breach Collab. It's part of the Data Collaboration Alliance, and the goal of the effort is to create a new reporting system for security incidents and data breaches that will increase information sharing, because we need a much better alert system for the cybersecurity incidents that are happening. Right now, companies are scared to report because they don't really have any incentive, and they've got a whole lot of disincentives, both the PR angle and liability. So we're trying to create a system that would allow people to report anonymously, so there's no risk to them in reporting, while keeping all that information encrypted in zero-copy environments. But we also want to give them incentives for reporting: better access to analytics about cybercrime and those kinds of trends, as well as other incentives we're still trying to figure out. I think it's an interesting initiative, and if people want to find out more, they can contact me.

Priya Keshav:

Sounds good, thank you for your time. It was great talking to you Jeff.

Jeff Jockisch:

Great, wonderful talking to you too, Priya.
