Simplify for Success - Conversation with Jake Frazier
As the practice lead of Information Governance, Privacy & Security, Mr. Frazier helps legal, records, information technology and information security departments with e-discovery and IG.
He discussed the challenges in implementing data minimization and ways to overcome them.
*Views and opinions expressed by guests do not necessarily reflect the view of Meru Data.*
Hello everyone, welcome to our podcast around simplifying for success. Simplification requires discipline and clarity of thought. This is not often easy in today's rapid paced work environment.
We've invited a few colleagues in data and information governance space to share their strategies and approaches for simplification.
Today we will be talking with Jake Frazier. Hello Jake, welcome to the show.
Thanks so much. Happy New year.
You come from the consulting world, and you've been with FTI for many, many years. Would you like to introduce yourself and tell us a little bit more about your experience and what you do?
Yeah, sure, absolutely. In my current role at FTI, where I’ve been for quite some time, as you mentioned, I’m a global practice lead of a practice called information governance, privacy and security. So we're a team of about 50 folks around the world that work with corporations to solve problems wherever law and technology collide. So those, you know, nowadays are regulatory retention requirements, legal hold preservation, privacy regulations, which there's no shortage of new regulations to wrangle with, and cyber security. So that's my role.
So today we're going to talk a little bit about challenges in implementing data minimization. First of all, what is data minimization and why do you think it's difficult to implement?
Yeah, so you know, I'd say around 2000 or so, right when we saw e-discovery hit, there was a big case, Zubulake versus UBS Warburg, that really shocked the corporate world with regard to preserving data.
And I'll get in a second to my definition of data minimization, but I think it's important background to realize that a lot of companies talked to their corporate counsel and said, OK, you want us to preserve, well, tell us what to preserve, and counsel would say, we don't really know yet, so just preserve everything. And so over the past, you know, 20 years or so, that's where a lot of corporations have found themselves, in the save-everything mode. Same with regulatory retention. People really, you know, couldn't quite figure that out and they said, OK, let's just save everything, it seems the safer bet.
So fast forward to GDPR, the General Data Protection Regulation, passing just a couple of years ago, or rather coming into effect just a couple of years ago. And that kind of turned everything on its head because in GDPR, there is this principle of data minimization. And the concept, you know, comes from these European privacy laws that say, look, you can't keep private data of somebody for longer than what's called the purpose of use limitation.
Basically, if you've got customer data and that customer ceases being a customer for two years or something reasonable, there's an expectation that you'll achieve data minimization and not hoard all of that private data, so that's really data minimization, kind of one aspect in the privacy laws. So if you're a brand new company that's starting business today, you know, pretty straightforward, you can read the laws, determine how you're going to retain data, and as you put repositories or systems online, you can kind of plumb in the retention. But if you're not a company that starts, you know, from scratch, and you have 20, 30, 50 years of history, there's a lot of legacy data, and it's commingled, some of it’s offline, and it's very difficult to go and precisely determine what to save and what not to, to accomplish that data minimization. The good news is there are plenty of best practices to do so, but it's definitely not easy.
No, I agree, and like you said, I remember Zubulake very well. At that time, after Zubulake, it just created this fear of what if I don't preserve something, and so pretty much everything was about making sure your legal hold is working well and you're keeping as much data as possible. And then there is the other part too, which is the whole digital transformation and analytics.
So I was talking to a Chief Data Officer several episodes ago and one of the things that he was pointing out was that analytics is about asking questions, and you can't predict all the questions that you want to ask of your data. So you kind of presume that, well, if I keep everything, then if a new question comes I should be able to answer it. And also from a collection standpoint, you sort of say, I don't know how I'll use it, so I might as well collect as much as I can.
So, you know, the approach of collecting as much as I can without thinking about how much of it I'm going to use and keeping everything for analytics, together with the fear of spoliation from a regulatory perspective, has meant that everybody has kept everything forever. So, coming back to what you said, though: yes, if you're an organization that is 10, 15, 20, 30, or 50 years old, you have old data. So is it just a matter of going back and dealing with legacy data, or are there other fundamental challenges to data minimization that make it much more difficult to implement?
Yeah, I mean, you know, it's a great question. It's sort of, kind of, where do we start and what sequence do we go in, right? When we're looking at data minimization, say there's the first step, which is really kind of at a policy layer: to determine, OK, what regulations do we have that mean we need to, you know, make sure that we retain data. If you're a stockbroker, there are three and six year limitations that the SEC and FINRA place on you that you need to retain client communications, just as kind of one example.
And so you know, first, it's figuring out those retention policies. Second, as you mentioned, legal holds, and this is where it gets a little bit more difficult. If you’re a large company, you might have hundreds of matters that are ongoing in some stage of litigation or government investigation, and determining exactly which custodians and which data sources are subject to the legal hold is very important, and that, uh, you know, starts to get difficult. And then, you know, what about your business, right? You brought up the concept of the Chief Data Officer, who oftentimes represents the business’ view of how can we monetize and leverage this information? So even if you don't legally need to keep it, it could be that the business wants to keep it in order to leverage it, right?
In that last category, kind of business value, you know, sometimes it's interesting to talk to business leaders that say, yeah, we want that data forever, because we're going to mine it, we're going to find patterns, you know we're going to… it's going to help us figure out, you know, who buys our products and who we should sell to, and so forth.
And so oftentimes you can, kind of, accomplish both and say, okay, well, do you really need their Social Security number to determine who is buying which products? And they’re like “no”, okay, great. Well, let's either remove that, you know, from the database or mask it, or tokenize it.
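As a rough illustration of the masking and tokenizing mentioned here, the following is a minimal Python sketch and not a description of any particular product. The key, the field names, and the sample record are hypothetical. It uses a keyed hash from the standard library as the token, which is a form of pseudonymization rather than anonymization: anyone holding the key can recompute the token from a known number.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would live in a key vault, not in code.
SECRET_KEY = b"example-key-not-for-production"


def mask_ssn(ssn: str) -> str:
    """Hide all but the last four digits, e.g. for display in a report."""
    digits = [c for c in ssn if c.isdigit()]
    return "***-**-" + "".join(digits[-4:])


def tokenize_ssn(ssn: str) -> str:
    """Replace the SSN with a keyed hash (a pseudonym).

    The same SSN always yields the same token, so records can still be
    joined and counted, but the number itself is no longer stored.
    """
    digits = "".join(c for c in ssn if c.isdigit())
    return hmac.new(SECRET_KEY, digits.encode(), hashlib.sha256).hexdigest()[:16]


# Hypothetical customer record used only for illustration.
record = {"customer": "C-123", "ssn": "078-05-1120", "product": "widget"}
safe_record = {**record, "ssn": tokenize_ssn(record["ssn"])}
```

The business can still ask who buys which product from `safe_record`, because the token is stable per customer, while the raw number is no longer sitting in the analytics copy.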
And there are lots of different approaches here, pseudonymization, anonymization, to make sure that Social Security number is not sitting there, and you can still use your data, but we've minimized the risk. So once you have those three figured out, you know, the regulatory retention, the legal holds and preservation, and the business value, then you at least have, you know, that central kind of brain of what is it that we need to keep?
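The "central brain" described above can be pictured as one disposal check that consults all three inputs. This is only a sketch under stated assumptions: the categories, retention periods, custodian names, and the `may_dispose` helper are invented for illustration, and real retention periods would come from counsel, not from code.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    category: str
    created: date
    custodian: str


# Hypothetical values for illustration only.
RETENTION_YEARS = {"client_communication": 3, "account_record": 6}
CUSTODIANS_ON_HOLD = {"alice"}            # people under an active legal hold
BUSINESS_KEEP = {"geological_survey"}     # data the business has asked to keep


def may_dispose(rec: Record, today: date) -> bool:
    """Return True only if no hold, business need, or retention rule applies."""
    if rec.custodian in CUSTODIANS_ON_HOLD:
        return False  # a legal hold overrides everything else
    if rec.category in BUSINESS_KEEP:
        return False  # the business has a stated reason to keep it
    years = RETENTION_YEARS.get(rec.category)
    if years is None:
        return False  # unknown category: never dispose automatically
    # Approximate a year as 365 days; a real schedule would be more precise.
    return today >= rec.created + timedelta(days=365 * years)
```

Defaulting to "keep" for unknown categories is a deliberate choice in this sketch: the gray areas are left for a person to decide rather than for the code.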
That's when you kind of go down to the data layer. Oftentimes, it's today's current tools. Let's take Microsoft 365 as a repository. It has, you know, some built-in capabilities for legal hold and for retention. And that's great. The issue is you've got to look a little deeper and say, well, do we have backup tapes that we have stored over the years, perhaps in response to the preservation scares? Perhaps just no one knew any better and we sent, you know, full backups to Iron Mountain every quarter for the past 20 years.
And the problem is, if you dispose of the data that you don't need to keep in Microsoft 365, but those same messages that you just disposed of are on the backup tapes, you've actually just kind of moved the problem rather than solved it. So that begins kind of the analysis, I think, for data minimization. Is that what you're seeing, or do you see a little bit different approach?
So you brought up a number of things, so I want to go to the first point that you talked about, which is your retention schedule. So that itself, and then we'll come back to the data lake and then Office 365, because I just feel like there's so many, yeah, too many things in your conversation, right? But going back to the retention schedule. So typically, the retention schedules were drafted to talk about how long you need to keep, not about when you need to delete. So, you know, it's new when you turn it around and sort of say, OK, can I dispose of it? It just becomes completely new; it's almost like looking at it from that perspective doesn't work. And the other thing is, because it was focused on how long you need to keep, it was about regulatory requirements.
And oftentimes, especially when you combine it with privacy, at least what I find is that typically customer data, some parts of PII, are intertwined with certain regulatory requirements, especially around employee data and things like that. But a lot of customer data that has PI, for example your new sources of information, your new IoT data, your new AI data, may not technically be a record, and they sort of fall under this. So you brought up this point about whether you need the Social Security number. The deletion of single data points or PI sort of doesn't tie very naturally to the retention schedule, and looking beyond records at your content itself, and being able to say what kind of retention should apply, is also a new topic. So have you found challenges in terms of getting there? It's almost like you're reading the book differently, and now some of it maybe should be rewritten, and we have an industry and sort of a method of doing it which maybe, you know, worked very well in the past, but needs to be tweaked and fundamentally revised and changed for the new world. I don't know if that's the best way to explain it. Do you agree or not? Do you see that too, or what are your thoughts around that?
Yeah no, absolutely. I mean, I kind of talked about the broker dealer requirement from the SEC from a retention standpoint, and this was written a long time ago, so you know, it's been updated over the years with guidance and things like that. But I mean, for the most part, and this is where I'll give you my disclaimer: I'm an attorney, but you know, this is not meant to be legal advice or anything like that. But if you read the reg, the reg says, OK, and now I'll paraphrase and then I'll give you a quote, you need to keep, you know, all information that is client communications for three years. And when you look to the reg to say, well, what does that count as?
What do you mean? It says literally, you know, "that pertains to your business as such." So that's a case of a regulation that might make sense if you had paper and files and things like that. But if you're looking at a wiki in Microsoft 365 that, we'll say, IT uses, right? Does that have to do with your business as such as a broker dealer? Don't know, right? So there's a lot of gray area from these new sort of data sources. So yeah, I mean, I do think it would be great if we could get them rewritten. We probably won't be able to, and so you know, that's where it falls on, you know, folks like us and outside counsel to make the calls where there's a gray area.
Yeah, and that is a good point, right? There are a lot of gray areas and so working through them is important, and it's an important step that takes time and effort. So assume that, you know, we have a good retention schedule and know when to delete what. Then you brought up some points around deletion itself. It's not easy to pull the trigger because you have the fear of, of course, data integrity issues, because these systems have been around forever, and before removing something or tokenizing something you have to do the impact analysis and due diligence, which means implementing that is again a challenge in itself. But coming back to it, you do have newer technologies, and now there have been a lot of new startups and options around privacy enhancing technologies, which will allow you to tokenize, anonymize, pseudonymize some of this information. So some people believe that it gives you a false sense of security: now it's anonymized, so you can once again go back to the new normal, which is you can keep as much as you want for as long as you want, because I might be able to figure out how to use that data, versus, you know, still looking at it from a data minimization standpoint.
So do you think privacy enhancing technologies like that, that can pseudonymize, sort of work counterproductively to data minimization? Or do you think they are both tools that support the same goal?
Yeah, I mean, you know, I'll give you the standard lawyer answer of “it depends, it depends”. I mean, you know, I will say that I have examples of clients where the business gets data and gets some reports because they want to see what's going on and what their customer base is using and liking and so forth. And I've seen where they get data where, instead of a name, it's a scrambled up, you know, NQ3!... you know, just sort of gibberish, and it's very frustrating, right, for the business. So that's kind of one end of the spectrum. At the other end of the spectrum, you know, if a hacker gets into a system and you've got Social Security numbers, that's a big problem, and so that's really that kind of balance, right, between data availability for the business and data minimization for the privacy regulations.
You know, it's kind of interesting. We'll just go system by system typically and just say, OK, let's look at what's in the system. Is it a system where, you know, it's current and it really, you know, has a lot of business value? Does the historical data have business value or is it stale? If it's stale, then oftentimes there's no problem with applying disposal, as long as that system can support it. You know, where you get, let's say, upstream oil companies that have databases of coordinates of various geological studies and surveys going back 100 years, yeah, it's going to be tough to apply data minimization to that pretty valuable data. But fortunately, that data doesn't have a lot of private data, so you just kind of have to do an application-by-application, kind of a master data management approach. I mean, there are tools out there that certainly can help. It's really, I think, a little bit more of the people and the process, you know, that need to be added on top of the technology to make sure it's being used properly.
So what about data that is used to understand normal versus abnormal patterns, atypical user behavior, you know. How do you balance needs around fraud and bot activity and a bunch of other things that are more security related, the requirement to have data for that, against data minimization?
Yeah, you know, it's an interesting point. If you look at the SIEM systems that are out there that are kind of collecting all the various alerts from threat vectors, right? This is in the security space. It's interesting because getting data is typically not the problem. The problem often is that you'll get, you know, 100,000 alerts a day, right? So you know, I think it is important to keep that information in a sandbox so that you can do threat hunting proactively. And you need a good data lake going back for some amount of time so that you can spot those patterns, 'cause you know, hackers and others will often take years to accomplish what they're trying to accomplish, and you can miss it if you look at a very short period of time, like a month, or 60 days.
So you know, that's another system, the SIEM system, right? SIEM would be the system you look at and say, look, for this one, we need to keep the data. I'd say that most privacy laws do have a concept baked into them really around this idea of legitimate purpose, right? So let's say you say, we're keeping all this data, sure, there's some private data in there, but we need it to be able to do threat hunting. You know, that certainly sounds like a legitimate purpose. I think where the privacy laws are really focused is more where we're keeping around all of our customer records from the last 25 years, not 'cause we need them, we just don't feel like tackling that problem and spending any money or paying attention to it. That's where that data minimization, the privacy rights, I think, is more focused.
So going back to Office 365, our favorite topic. So obviously with COVID and us being more remote nowadays, has accelerated adoption of Teams, has accelerated adoption of Office 365 and other technologies. With it, we're also producing data in very, very large amounts.
I was looking at some numbers on the growth of Teams users and it was staggering to see that growth in the last year alone, and that kind of also applies to Zoom and other applications. So how do data minimization and information governance policies apply to the management of some of this data?
So how do you think our existing policies did? Do you see them being adequate, or have you seen clients go back and revisit them? What are some trends as well as challenges that you see in this area?
Yeah, you know, it certainly takes up a lot of our time these days. You know, Microsoft 365 has offered our corporate clients a lot of cost savings, a lot of efficiencies. It's certainly more functionality with regard to various information governance use cases, like retention and security, data leakage prevention, legal hold, you name it. There are modules that do a lot of really great things in Microsoft 365.
But to your point, I think, the new features and the new paradigm of work from home and hybrid work environments definitely accelerated Teams and Yammer, and some of the other sources and even outside of Microsoft, Box and Dropbox and Trello and Slack, and on and on. And, you know, the analogies that retention finally, kind of, got to be able to handle email don't really apply to Teams.
A lot of organizations that we've seen, we've worked with them to implement retention in Teams. That's very different from email, so email might be, you know, six months, or two years if you declare a record or something like that. Teams has been, in our experience, treated a little bit more like an ephemeral data source. And the policy is going to read something like: Teams is meant for more, kind of, transitory information, it is not meant for business records. And for at least the part of Teams that is, you know, kind of one-to-one or one-to-many chats, the retention might be five days. And it's just sort of a rolling five days, and then you can see as a user that your conversation gets chopped off after that. So that's something kind of brand new, right? It's sort of a Teams-specific retention best practice that a lot of companies have adopted, and I think that's absolutely been thrust upon corporates because of COVID and because of the heavy reliance on Teams. And that really, like I said, applies to all sorts of other non-Microsoft sources as well, and Teams is perhaps where we see it most acutely.
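To make the "rolling five days" idea concrete, here is a small Python sketch of a rolling retention window. It does not show how Microsoft 365 implements retention; the message shape and the `apply_rolling_retention` helper are assumptions made up for the example.

```python
from datetime import datetime, timedelta


def apply_rolling_retention(messages, now, keep_days=5):
    """Split chat messages into those still inside the window and those expired.

    `messages` is a list of dicts with a "sent" datetime (a hypothetical shape).
    Anything sent more than `keep_days` before `now` is treated as expired.
    """
    cutoff = now - timedelta(days=keep_days)
    kept = [m for m in messages if m["sent"] >= cutoff]
    expired = [m for m in messages if m["sent"] < cutoff]
    return kept, expired


# Hypothetical chat history for illustration.
now = datetime(2021, 1, 10, 12, 0)
chat = [
    {"id": 1, "sent": datetime(2021, 1, 2, 9, 0), "text": "old thread"},
    {"id": 2, "sent": datetime(2021, 1, 8, 15, 30), "text": "recent reply"},
]
kept, expired = apply_rolling_retention(chat, now)
```

Because the cutoff moves forward with `now`, the user sees the older part of the conversation drop away each day, which is the "chopped off" effect described above.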
Yeah, I was also glad to notice that for the default retention policy, if you don't set anything up on Teams recordings, Microsoft was recommending 60 days, which is really good because, as you said, there shouldn't be a need to keep data except, you know, in exceptional circumstances where there is something that needs to be preserved. For the most part, this should be treated as data that is temporary and needs to just go away after a few days, for sure.
Any other closing thoughts, maybe from a cultural perspective, a technology perspective, or anything else, you know, on data minimization overall, things that you have seen that are positive as well as challenges that we all need to address as an industry?
Yeah, I mean, I really see a lot of folks struggle with the concept of the right to be forgotten in the privacy regulations. And, you know, this sort of starts with a data subject access request that says, hey, you know, what data do you have on me, at the bank that I used to use six years ago? Answering that question is already hard, right? None of the systems that were built back then, or at least most of them, sort of have this notion of, hey, across our 2000 systems, where’s customer, you know, number 123? Where is their data? You know, that's tough. And then, should that data subject say, well, I want you to erase it all, that gets real tough, right?
Because especially with databases and structured data applications, oftentimes you can't just pluck data out of the table without causing some damage, and so you know, that's kind of a tough one because it's reactive to an individual or group of individuals. And for those companies that have been around for a while, you know, oftentimes those applications don't really support those use cases, so that's a tough one.
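The point about not being able to pluck a row out of a table can be shown with a tiny relational example. The sketch below uses Python's built-in `sqlite3` module with made-up `customers` and `orders` tables; the schema and the two helper functions are assumptions for illustration, not a recommended erasure procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "amount REAL)"
)
conn.execute("INSERT INTO customers VALUES (123, 'Pat Example', 'pat@example.com')")
conn.execute("INSERT INTO orders VALUES (1, 123, 49.90)")
conn.commit()


def try_hard_delete(customer_id):
    """Attempt to delete the customer row outright; report whether it worked."""
    try:
        conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The orders table still points at this customer, so the delete is refused.
        conn.rollback()
        return False


def erase_personal_fields(customer_id):
    """Blank out the personal fields but keep the row so orders stay consistent."""
    conn.execute(
        "UPDATE customers SET name = NULL, email = NULL WHERE id = ?",
        (customer_id,),
    )
    conn.commit()
```

In this toy schema the outright delete is refused because an order still references customer 123, while blanking the personal fields leaves the order history intact. Whether that kind of in-place erasure satisfies a given regulation is a legal question the code does not answer.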
And you know, we've seen some best practices evolve. There's some technology that can help, but it does certainly take a cross-functional group across the organization, you know, the Chief Data Officer or Chief Technology Officer, CIO, as well as privacy, compliance, legal, HR, sometimes, all kind of working together and having a seat at the table.
Each one of them, I think, has a necessary but not sufficient piece of the puzzle, so as long as they work together, the problems can be solved. But if there are any silos in the organization, it gets pretty tough.
You guys did this study with IAPP and we refer to that data point all the time and, I think, you mentioned the right to be forgotten, or data deletion and of course, the building of the data map, which is near and dear to us, are two of the toughest things that organizations find challenging to tackle and precisely for the reasons that you mentioned, right? Which is both of them require a cross functional alignment. And cross functional isn't very easy because we're not structured to do so. It’s just something that is fundamentally different from the way everything else within the organization is structured.
So for some reason, the execution becomes tough because we can't get everyone to align and work together. At least that seems to be one of the things that I notice as well, so.
Yeah, it is. It is nice to see when it comes together though because, you know, all those folks sitting around the table, their interests are all aligned. It might not seem like it, right? IT might think legal is making them keep everything, legal might think IT is making them keep everything because IT can't provide search functionality across various sources, right? But when you get them, you know, everybody at the table, you know, once you kind of do the marriage counseling, you start to see the cross talk and it's, oh, you have a tool that will be able to analyze for this? Can I use that for e-discovery searches? And then, for everybody, data minimization is something we see as a rallying cry for all of them. It makes everyone's job a little bit easier, saves some money, you know, fewer systems to manage. Really, maintaining a pond for future fishermen to fish in, you know, it's not something that's in the interest of really any of those stakeholders. So you know, it's good to see podcasts like this that can help kind of get the word out and, you know, get that collaboration across the table and get the folks to the table where good things can happen.
So it's a pleasure to be here.
Thank you for your time, great thoughts and thank you for taking the time to participate in the podcast.
Alright, thank you.