Simplify for Success - Conversation with Jeff Jockisch
ChatGPT has taken the world by storm, supposedly reaching 100 million users in just 2 months.
But what is ChatGPT? What are the privacy and other issues that can arise from frequent and widespread usage of tools like ChatGPT?
Does ChatGPT associate our data with our identity?
What happens with our data when we interact with ChatGPT? Can ChatGPT provide information on individuals? And most importantly, should we build guardrails for ChatGPT?
Tune in to our podcast #SimplifyforSuccess with privacy experts Priya Keshav, CEO and founder of Meru Data, and Jeff Jockisch, Data Privacy Researcher at Privacy Plan, as they discuss ChatGPT and its impact.
Thank you to Fesliyan Studios for the background music.
*Views and opinions expressed by guests do not necessarily reflect the view of Meru Data.*
Transcript:
Priya Keshav:
Today we will be talking to Jeff Jockisch, and our topic is ChatGPT. It seems like everywhere you go you hear about ChatGPT these days; you hear about how revolutionary it is and how it's going to disrupt how we do business. For example, I just saw a headline in PC Magazine that said ChatGPT could disrupt 19% of US jobs and asked whether yours was going to be on the list, and another article in Forbes that said ChatGPT could disrupt Google.
What about your company? I've seen statistics that TikTok took about nine months after its global launch to reach 100 million users. For Instagram, it took about two and a half years. But ChatGPT is supposed to have reached 100 million users in just two months.
It's not just the user base where you see this type of record adoption. We had just recorded this podcast, and before we could even publish it, a lot of changes had happened. For example, Italy became the first Western country to ban ChatGPT over privacy concerns. We've seen confirmed breaches due to a vulnerability in one of its open-source libraries, so we may have to record a follow-up to this podcast very soon.
Hello everyone. Welcome to our podcast around simplifying for success. Simplification requires discipline and clarity of thought. This is not often easy in today's rapid-paced work environment. We've invited a few colleagues in the data and information governance space to share their strategies and approaches for simplification. Today we'll be talking with Jeff Jockisch. Hi, Jeff. Welcome to our podcast again.
Jeff Jockisch:
Hello, Priya. How are you doing today?
Priya Keshav:
Pretty good. You are not a stranger to our listeners, but I wanted to give you some time to introduce yourself.
Jeff Jockisch:
Well, I'm Jeff Jockisch. I'm the CEO of Privacy Plan, a data privacy consultancy. I do a lot of work creating data sets about the data privacy world, and also strategic consulting on different data privacy concerns. One of the things I've been doing lately is actually building data sets about data brokers.
Priya Keshav:
So today we're here to talk about ChatGPT, OpenAI, and maybe, if we have some time, touch on data brokers as well. There is a lot of buzz around ChatGPT these days. Pretty much every time I look at LinkedIn, I see a post about how ChatGPT wrote a great poem about some topic, or people are looking to see if it will draft their policy for them. You know, it seems to give this impression that this tool has limitless capabilities and can pretty much replace humans one day. With so many people using it, it just seems like it's the most popular toy, or maybe the most useful toy, I don't know which one it is. But what are your thoughts on ChatGPT?
Jeff Jockisch:
Well, I think it's actually pretty amazing. I don't know if all of your viewers are aware, but the adoption of ChatGPT, and chat AI in general, has been phenomenal. I think it's maybe the fastest-adopted technology ever in the world. I don't know exactly what the stats are, but I've seen a few infographics that show, you know, it's sort of taking the world by storm faster than any other technology that's ever been released, which is pretty amazing. And it does seem to have capabilities that are going to change our world at a fundamental level.
It is pretty amazing. It can do things that will probably impact, you know, virtually every job that we do, making it easier. I don't know that it's going to really get rid of a whole bunch of jobs, because I think right now ChatGPT is able to help us do jobs, not replace jobs so much. It still requires what we call a human in the loop: it'll generate content, but in most cases that content really can't be utilized without human review.
Priya Keshav:
So, I want to talk a little bit about ChatGPT and privacy. Right. In order to use ChatGPT, you have to provide your email address and/or phone number, I think. So, it sort of connects whatever questions you ask to a person.
So, I'm wondering, as you start playing with it, asking it to write poems or draft a policy, it's probably learning a lot about you and associating that data with your identity, right? I'm wondering how much information somebody is revealing to ChatGPT just through interacting with it.
Any thoughts?
Jeff Jockisch:
Yeah, I think that's actually an unexplored question: how much we're revealing to ChatGPT and how much it's actually learning. I mean, there's the opposite question, which we could talk about in a moment, about what it was trained on and the privacy implications in that direction. But what a lot of people aren't looking at is the question that you just brought up, right? What is it learning about us from our interactions with it? And there's maybe a third question too, and a lot of people aren't thinking about this either.
That is that ChatGPT right now is sort of operating in a little bit of a vacuum, right? It already has a trained model, and you're asking it questions and it's sort of querying that corpus of data. But you can connect it to external databases. So, you can take that API and add information to it. You could connect it to your corporate knowledge base, for instance, and then query that information. And when you give it access to all that information, including things like your human resources database, is it learning that? You know, does it have access to your employee records? I'm sure there are probably some firewalls that can be put in place there, but we need to think about what that means when we start giving it access to more information, including our personal information and our corporate information.
Priya Keshav:
I mean, when I was thinking about this, I was reflecting on some, you know, Stanford University research that was done around Facebook and AI, and what it's capable of doing and what Facebook was capable of recognizing about individuals, right? Something that stuck with me was that all they needed was two or three data points about a user before they could probably predict certain things about a person better than their spouse.
So, you know, a couple of likes and dislikes and comments on a post, and that's all you need to feed Facebook before Facebook is able to tell you more than what your spouse can tell you about yourself, which is amazing given that the other person is living with you all your life. And now your deepest secrets are known by Facebook, right? But if you extrapolate that and think about ChatGPT, I mean, while it seems like a fun toy, and the same thing could be true of search engines, you're feeding it a lot of information. And then you sort of wonder what it's learning and how all that personal information is managed. We have no idea.
Jeff Jockisch:
Right. Yeah, we really don't. We don't know what it's learning about us at this point. I'm not sure that it's learning things on the fly. But you can be sure that OpenAI is actually collecting that information. And, you know, are they going to feed that back into the next model? That's unclear, right?
Priya Keshav:
So, another big concern that has been raised a lot when it comes to OpenAI and ChatGPT is the amount of data. I mean, obviously it's as good as it is because of how much data it's supposed to have. I saw some numbers like 570 gigabytes of data, 300 billion words from one site. And across all sites, it just kind of knows way too much information, pretty much, you know, everything that is publicly available via the Internet, right? So that probably includes our personal conversations on social media channels, websites that might be copyrighted, books that might be copyrighted.
What are your thoughts on, you know, using all that information, some of which is not, you know, open and free technically to be used and analysed the way ChatGPT analyses it, any thoughts on that?
Jeff Jockisch:
Well, I don't know if it's actually scraping social media conversations, but it definitely could have some conversations. We know that a lot of the data it has is gigabytes of data from Common Crawl, which is a large data set of website crawls, right? That is from 2008 to the present. And it doesn't explicitly say whether or not that includes, you know, website crawls from Facebook or Twitter or social media sites. My initial thought was that was not the case, that it was more like regular web pages, but it's still not clear. I guess we might be able to actually find that out by doing some more research, and probably we should figure that out.
But you know, there's other web text that's in that corpus that's from links that link out from Reddit articles that have a lot of upvotes, right? And there are also a couple of different book corpuses that are all books that are not copyrighted, that are supposed to all be public domain books. That's sort of interesting, right? So, the idea behind a lot of the data that's in this corpus is they're trying to not get stuff that's technically copyrighted in a particular way. But it's also not necessarily in the public domain specifically either, right? Nobody's essentially saying, hey, here, copy my data. So it's sort of a gray area, right, where they're grabbing this data.
Priya Keshav:
Yep, yep. I mean, there was something that I want to point out. I was listening to this NPR story, and there is a correlation to this, though it's not related to ChatGPT but to something else. They were talking about artists, and they were talking about all these AI tools that have been developed now. And they were referring to one particular company or website that was allowing artists to upload their art for free. I think it was DeviantArt. So obviously a lot of artists were uploading their art for free into this tool, and the tool itself was learning from it. And there was an AI/ML model built out of the art. And so, what it now allows people to do, and I think there were also going to be some questions around, you know, whether there would be a paid model to use the AI, is you could ask for art and it would produce art based on a certain style of painting. So the artists were questioning whether that is copyrighted material that is being used without their permission, and whether, you know, the fact that it was allowing them to upload art for free meant that it could be used to train the AI/ML model.
It was kind of a fascinating story that I was listening to on NPR. I was thinking it's the same logic, you know, or the same analogy that you can apply to ChatGPT here. We all post data to the Internet, but not necessarily for the consumption of ChatGPT, and some of it is data that is probably owned by others, but it's feeding the ML models and you don't know what is being fed to them.
And I in fact saw a post on LinkedIn where somebody had asked ChatGPT for its view of their company, basically what ChatGPT thought of that particular company, and it had provided its own thought process in terms of what the company was supposed to be good at, based on reviews and posts, and what it meant in general. That was kind of interesting, because if it could do that kind of analysis for a company, then it could probably do the same for a single person.
So I'm sure if you go to ChatGPT and ask, “Hey, what do you know about Priya Keshav?” it's probably not going to do that; I'm not famous enough. But if we asked ChatGPT about Brad Pitt, it's probably going to give you some data on Brad Pitt. So it's going to depend on how popular you are, I'm guessing, and how much information about you exists. But one would wonder if it does have anything to say about Priya Keshav, for example.
Jeff Jockisch:
Well, I think you bring up a lot of interesting points. And I wanted to clarify a little bit about my previous statement. I'm not sure I was very clear. I think that there's a lot of content that we put up on the Internet, right, that does not have clear purpose limitations on it. So, we post something to the web, we post information to our LinkedIn profiles, and we're not really clearly specifying what we expect that data to be used for, or what it's going to be used for. But when I put my information up on a LinkedIn profile, I don't expect that it's going to be scraped and used for a bunch of other purposes, right? Or when I load art up to DeviantArt, right? I don't expect it's necessarily going to be grabbed by somebody and used to train a machine learning algorithm.
And so that's really where the question is going to rest in a lot of these court cases that are being litigated even now: the scraping that's happening by these organizations, which is essentially turning into corpuses that are feeding these machine learning algorithms, is that legal? And we don't know where that's going to end up. But it's a really important question. My take is that the courts are probably going to allow this to happen, and part of the reason is because there are already some Supreme Court precedents that say that web scraping is OK unless it's sort of explicitly illegal.
There's another problem, which is that these machine learning AIs aren't explicitly reusing data. So, for instance, in a search engine, if I search for your profile, Priya, and it returns that result, it's actually returning that particular content. But if I ask ChatGPT about you, right, and it was to return some information, it's not actually returning any particular page or content about you. It's actually returning some sort of synthesized information about you from multiple different pieces of information that it might have on you, right? So, it's not actually stealing that information from any one particular source. And so, is that synthesis of information stealing? That's a much more complicated thing for the courts to determine. They still could say, yeah, that's actually stealing. It's just stealing from multiple different sources and using it in a different way.
But that's up to them to determine, and I think that's a harder thing for them to wrap their minds around. And if they were to actually say yes, that's stealing, and turn off this spigot that all this machine learning and ChatGPT and, you know, the text-to-image stuff depends on, it's going to have a profoundly huge negative impact on innovation. And so, while there might be some reasons for them to do that, my guess is they're not going to. I think they're going to allow this technology to flourish.
Priya Keshav:
I don't know where the courts would go, but it's an interesting topic, like you talked about the purpose limitation, right? But I'm leaving copyright aside, I think that's for the courts to decide.
But let's talk about the privacy itself. You just said that it wasn't going to reproduce content from somewhere, but it was going to synthesize stuff about me from various places. So, my question to you is sometimes ChatGPT is completely wrong, right? And we've seen that in some instances where it has provided either wrong information or false information or false conclusions.
So, what happens if ChatGPT does that? I mean, again, whether ChatGPT would care enough to write about me is a different question, but if it does synthesize, and if it happens to be completely wrong information about me, what does that do for my privacy? What does it do for purpose limitation, and how does one deal with those kinds of issues that might arise from tools like ChatGPT?
Jeff Jockisch:
Well, that could be a whole different barrel of monkeys, I guess.
Priya Keshav:
Yes
Jeff Jockisch:
You know, there's a couple of different issues there, right? There's privacy concerns, there's disinformation concerns, and there's ChatGPT actually making things up. And I think disinformation and AI dreaming are really two different things, right? So, one might be somebody actually putting disinformation about you into the corpuses in the machine, and ChatGPT or similar technologies just resurfacing that, right? So, what if somebody puts disinformation about you into the corpus? Right. How do you get rid of that?
Priya Keshav:
Yeah
Jeff Jockisch:
OK, that's a problem, right?
Priya Keshav:
Yes
Jeff Jockisch:
But then what if it's just making stuff up based upon, you know, there's good information in there, but it's making stuff up. It's drawing inferences that are incorrect.
Priya Keshav:
Yeah, I mean, in this case, you know, we've talked about certain types of individuals. Sometimes it extrapolates, you know, not necessarily in this context, but AI has supposedly been shown to discriminate against certain types of individuals in hiring and things like that. So, the same could happen in these kinds of conversations, where it might be picking up something that is completely wrong or making up false information. So, how do you correct, control, and manage those things?
Jeff Jockisch:
I don't think we have a clue yet, right? And we know that these models are going to have implicit biases, based upon the fact that, you know, all the data we use to train them has implicit biases in it.
Priya Keshav:
Yes
Jeff Jockisch:
I think that's going to end up coming into the answers.
But the dreaming portion, right, the fact that these AIs can sort of make things up, I don't think it's actually a widespread problem, but I think when it happens it's a huge problem. It may only be like one in a million answers where it actually dreams things up, like facts. But the fact that it does that, you know, in that one in a million answers, and could potentially make something up about a person in a critical situation, could just be devastating. And you know, if it happens more than once in a million, it could be really problematic. I mean, depending upon the use case, of course, right? And if it's making stuff up about people? God, I mean, that could be lawsuits. It could be, you know, really bad lawsuits. So, they've got to figure out how to control that. And if they don't, it could have, you know, severe ramifications, the kind of lawsuits that put you out of business.
Priya Keshav:
So, another question though: do you think it is violating existing privacy laws like GDPR? Because I don't think it has, I don't know if there is a way, I haven't seen one at least, to ask for or get access to specific personal information. Or is that based on the premise that most of what it is looking at is publicly available information, so there is nothing personal that would require them to adhere to the PI rights that are typically offered under GDPR or CCPA?
What are your thoughts on that?
Jeff Jockisch:
Yeah, you know, I mean, I see a lot of people complaining about it, but I'm not sure I've seen any real specific violations yet. I mean, if it starts to collect information about us and it starts to publish it, maybe. I don't know that ChatGPT itself is a violation. I could see that people could apply the technology in ways that would be a violation.
So, if you implement it in your company and hook it to databases, you could end up in violation very quickly. You know, I made a comment on a post by somebody else, and I could really see a data broker applying this technology, you know, linking it up to their database of personal information and letting, you know, law enforcement or other customers query their database and ask for all the information about Priya, or specific questions about Priya. That could be pretty interesting, and maybe a violation of GDPR or other privacy laws, right? It'd be interesting. Scary.
Priya Keshav:
Yep, yep.
So overall, do you think that ChatGPT is a good thing? You did mention that stopping it would be a bad thing for us. It would definitely reduce the amount of innovation that needs to happen in this space. But what would be some of the good guardrails that need to be put in place?
Jeff Jockisch:
Well, I mean, I love the technology because I think it's amazing. But I also think that it's scary as hell. So, I think, yeah, we need to use the technology, but we definitely are going to need some guardrails.
So, I don't think we know what those guardrails even look like yet, but you know, there are AI technology regulations that are coming up, right? Europe is proposing some, Canada is proposing some, and NIST just came out with some new AI regulations. ForHumanity is doing some awesome work in AI regulation and certification, and there's all kinds of AI ethics models that have been coming out recently. There's so much work that has to be done there.
I'm not sure we've really quite grasped the exact way to understand and regulate AI yet. But we're starting to get there. And companies really need to start understanding that if they're going to create or deploy these kinds of tools, they can't do it in a vacuum. They have to understand that there are standards and regulations that are going to be applied to them. And it's going to affect, you know, whole lines of their business, especially if they're in regulated industries like healthcare and finance and anything that has large amounts of personal information.
Priya Keshav:
Makes sense. Any other thoughts on ChatGPT?
Jeff Jockisch:
I think it's just changing so fast that in three months we're going to have a different view than we have right now. Maybe this conversation is completely obsolete in three months. It's changing that fast. There may actually be another player, and we may not even be talking about ChatGPT. We may be talking about a whole different, you know, player that's leapfrogged them, because this technology is evolving so fast.
If you actually look at what happened with, you know, the text-to-image AI stuff, they keep leapfrogging each other. You know, first it was DALL·E, and then it was Stable Diffusion, and then it was Midjourney. And, you know, three months from now, it may be DALL·E again that's on top, and then somebody else and then somebody else, somebody we haven't even heard of. And so ChatGPT may be on top right now, but three months from now it may not be.
Priya Keshav:
No, I agree.
Maybe we'll catch up again in three months and talk about how this has evolved. But it's a fascinating area. I see a lot of potential. I just saw somebody looking at how it could possibly answer some legal questions. So, you see a lot of industries trying to see how they can use it, at least for answering basic questions or, you know, to make somebody's job more efficient. And even those who are trying to write those policies felt that it was pretty good. So, it's been producing a lot of good results. But yeah, as you said, it's evolving fast. It's being used by a lot of people. But when you have an AI like that that's taking in so much data, it's very critical for it to have some guardrails in place so that it doesn't violate any of the laws that protect the privacy and the interests of the people.
So, but thank you so much, Jeff, appreciate you joining us today.
Jeff Jockisch:
Oh, I appreciate the opportunity, Priya. It's wonderful to talk again.
Priya Keshav:
Thank you.