ISMG Editors: Will We Ever Get a Handle on API Security?
Also: Why We Should Care About the New York Times' Copyright Lawsuit Against OpenAI
Anna Delaney • January 12, 2024
In the latest weekly update, Information Security Media Group editors discussed how the surge in API usage poses challenges for organizations, why good governance is so crucial to solving API issues and how The New York Times' legal action against OpenAI and Microsoft highlights copyright concerns.
The panelists - Anna Delaney, director, productions; Mathew Schwartz, executive editor of DataBreachToday and Europe; and Rashmi Ramesh, assistant editor, global news desk - discussed:
- How the increasing prevalence of software components interacting through APIs poses a challenge, as many organizations lack awareness of how many open APIs they have, how secure those APIs are and how to properly control access to them;
- Guidance on initiating an API governance program;
- Why The New York Times is suing OpenAI and its major backer Microsoft for alleged copyright infringement and what the case could mean for companies using the latest generative AI tools.
The ISMG Editors' Panel runs weekly. Don't miss our previous installments, including the Dec. 29 edition on looking back on ISMG's interviews in 2023 and the Jan. 5 edition on why ransomware victims are still paying ransoms.
Transcript
This transcript has been edited and refined for clarity.
Anna Delaney: Hello and welcome to the ISMG Editors' Panel. I'm Anna Delaney, and this is a weekly program dedicated to keeping you informed about the latest developments and news in the fields of information and cybersecurity - and of course, AI. We've a merry gang of editors joining me today: Rashmi Ramesh, assistant editor, global news desk; and Mathew Schwartz, executive editor of DataBreachToday and Europe. Good to see you both.
Mathew Schwartz: Great to be here.
Delaney: Mat, you were talking about the escalating use of APIs and the challenges many organizations face in managing them. So tell us more.
Schwartz: Yes, so this is an interesting report that's just come out from Cloudflare, looking at dynamic traffic flowing across the internet. So much of it these days is handled by APIs - application programming interfaces. However sophisticated that might sound, all it boils down to is one software component communicating with another software component. Every time you go on your phone - if you're like me, and you're checking the weather in a slightly obsessive manner - every time you ping it to update, that's an API making a call to a server saying, hey, give me the latest weather data, and it flings it back at you. So it's very, very pervasive already, and it's growing even more pervasive.

So we have this interesting report, which gives us some trends. For example, Cloudflare said that the share of API traffic it sees crossing the internet has continued to increase and now accounts for 57% of all dynamic HTTP traffic. Dynamic means things that are generated as a one-off - like I said, you're getting your weather, or you're checking your email and doing a handshake to get that back. So many different things - your bank account, all of these - are typically handled by APIs. The visitor, the location, the device: anytime those change, that's dynamic; you're getting a different response than somebody else will. So it's huge when it comes to IoT platforms, ride-sharing or rail - as we were just talking about with the Toronto bus - taxi, legal services, multimedia, games, logistics and supply chains. These are industries where a huge amount of their traffic is down to APIs.

So this leads into some interesting discussions inside organizations that CIOs and CISOs should be leading. Cloudflare can analyze the kind of data that's flowing across its networks, and while some of these APIs announce themselves correctly, many of them don't. That has led the organization, along with a little bit more sleuthing, to conclude that about a third of all API traffic is not being accounted for by the organizations that own it. Basically, there are 31% more API endpoints than organizations know about. Why is this a concern? It's a concern or we wouldn't have a report about it. But it's a concern because if you're trying to secure APIs, you need to know that they're there. And if you're using a service like Cloudflare or another DDoS defense provider - many other options exist - it helps to know what good API traffic looks like. One of the ways that you defend against DDoS attacks is turning down your response rate. So if, let's say, 100 API calls per second is typical, and it spikes to 10,000, and you think that's because you're having a DDoS attack, you're going to attempt to defend against it in a certain way. If, however, there's been an increase to 5,000 or 6,000 in normal traffic - maybe because the Securities and Exchange Commission's X, or Twitter, account got hacked with information that suddenly spot bitcoin trading is going to be allowed, and there's this huge rush of people trading cryptocurrency - well, maybe you need to be able to support that kind of behavior, and that is the new normal. So you need to have better insights, basically, into what is going on with your APIs. Sounds fine in theory.
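To make that baseline idea concrete, here is a minimal Python sketch of tracking an API's request rate against a known-normal baseline. The window size, threshold multiplier and names are illustrative assumptions for this discussion, not Cloudflare's actual method.

    import time
    from collections import deque

    WINDOW_SECONDS = 60
    SPIKE_MULTIPLIER = 10  # illustrative: 100 req/s baseline -> flag above 1,000 req/s

    class ApiRateMonitor:
        """Track recent API call volume against a known-normal baseline."""

        def __init__(self, baseline_rps):
            self.baseline_rps = baseline_rps
            self.timestamps = deque()

        def record_call(self):
            now = time.monotonic()
            self.timestamps.append(now)
            # Drop calls that have fallen outside the sliding window.
            while self.timestamps and now - self.timestamps[0] > WINDOW_SECONDS:
                self.timestamps.popleft()

        def current_rps(self):
            # Approximate: assumes record_call() pruned the window recently.
            return len(self.timestamps) / WINDOW_SECONDS

        def looks_like_attack(self):
            # A surge far beyond baseline may be a DDoS - or, as in the SEC
            # example, a legitimate rush that becomes the new normal.
            return self.current_rps() > self.baseline_rps * SPIKE_MULTIPLIER

The point of the sketch is the comparison, not the numbers: without a discovered inventory of endpoints and a measured baseline for each, there is no way to tell the 10,000-call attack spike from the 5,000-call new normal.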
But the report here, and the finding that there's been this surge in API traffic, is a big reminder that organizations need to have some sort of formal program for looking at their APIs - not just looking at them, but discovering them. Because as with so many things, there is a huge shadow IT component. APIs may have been stood up by parts of the business without checking in with command and control - IT management, IT administrators, the CISO, the CIO. It could be a different business unit in a different country. It could be that whoever was keeping track of it moved on and people have forgotten that it exists. We have seen some massive data breaches because of APIs that were created, for example, to pass billing information to companies that work with healthcare entities, and somebody figures out that this API exists, and they can send an API call and get the information back as well. Many times these APIs are not being properly defended - for example, by requiring credentials to access them, or by limiting the organizations that can make the API calls. There are so many security components here, as with all things involving cybersecurity and IT. But if you take the time to do some good discovery, have a good governance program and think ahead about what you're going to do if these get misused, all of that can pay massive dividends in the event that things break down: hackers come calling, steal data via APIs, and you're having to do cleanup very quickly and trying to figure out what happened and what to do about it.
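To illustrate the two basic defenses Schwartz mentions - requiring credentials and limiting which callers an API answers - here is a minimal Flask sketch. The route, key store and allowlist values are hypothetical stand-ins, not taken from any real billing API.

    from flask import Flask, abort, jsonify, request

    app = Flask(__name__)

    # Hypothetical values; real keys belong in a secrets manager, and real
    # allowlists are usually enforced at the gateway or firewall layer.
    VALID_API_KEYS = {"partner-billing-key-123"}
    ALLOWED_CLIENT_IPS = {"203.0.113.10"}  # example address from the RFC 5737 range

    @app.route("/billing/<account_id>")
    def billing(account_id):
        # Reject callers that lack a known credential ...
        if request.headers.get("X-API-Key") not in VALID_API_KEYS:
            abort(401)
        # ... and callers from outside the partner allowlist.
        if request.remote_addr not in ALLOWED_CLIENT_IPS:
            abort(403)
        return jsonify({"account": account_id, "status": "ok"})

In the healthcare billing breaches Schwartz describes, it is exactly these two checks - a credential and a caller restriction - that were missing, so anyone who discovered the endpoint could query it.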
Delaney: Huge problem, this API challenge for organizations. Before Christmas, I spoke with Sandy Carielli, who worked on the report "The Eight Components of API Security," which I recommend - it's worth a read. She highlighted a positive trend: she said that a year ago, inquiries mostly focused on API discovery. Now, she sees, organizations recognize the need to invest in discovery as a foundational step, and the current inquiries she's receiving show a notable increase in API security testing concerns - questions surrounding protection, detection and response; ensuring correct API construction; establishing, as you say, a robust governance program; and ensuring protection throughout the API life cycle. So perhaps this shows a maturing in organizations' understanding and management of APIs, but there's still a way to go, as you've highlighted. And to your point about governance, do you have any recommendations for organizations about how they can go about establishing a governance program, or perhaps where to begin?
Schwartz: Great question. I think if you need some muscle or some impetus to convince the board or senior management that this needs to be taken more seriously, there are some regulations that are going to require it. In the healthcare sector, we're seeing some moves to ensure organizations are paying close attention to this. Also, the payment card industry's Data Security Standard - PCI DSS v4.0, which has been circulating for a while - is set to take effect at the end of March. And that is going to require, for the very first time, API security checks, at least in the code review and the testing process. They're looking for any attempts, for example, to abuse or bypass application features and functionality by manipulating APIs - so basically using APIs to grab data from databases, that sort of stuff. Another best practice, which isn't going to be mandatory until about 12 months from now, is knowing what API components you have, even in third-party components or software that you buy. So that's a bit more of a supply chain thing. But definitely, that shows the direction of travel: you need to be keeping an eye on these things. Unfortunately, there's a lot of legacy tech - as with all things enterprise IT, lots of legacy stuff. So this move toward discovery is great. I think it does need to get extended to your supply chain, not just the stuff that you build, so that you have a sense of what is there. Because so often, there's stuff you didn't know about, which can hurt you massively unless steps were taken to lock it down.
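For a sense of what such an API security check might look like in the testing process, here is a minimal pytest-style sketch against the hypothetical billing endpoint above. PCI DSS v4.0 does not prescribe specific tooling, so this is an illustration of the idea, not the standard's required form.

    from billing_app import app  # the hypothetical Flask sketch above, saved as billing_app.py

    def test_billing_rejects_missing_api_key():
        client = app.test_client()
        # No X-API-Key header: the endpoint should refuse, not return data.
        response = client.get("/billing/42")
        assert response.status_code == 401

    def test_billing_rejects_unknown_caller():
        client = app.test_client()
        response = client.get(
            "/billing/42",
            headers={"X-API-Key": "partner-billing-key-123"},
            environ_base={"REMOTE_ADDR": "198.51.100.99"},  # not on the allowlist
        )
        assert response.status_code == 403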
Delaney: Great advice. That's excellent. Thank you, Mat. Rashmi, The New York Times has filed a lawsuit against OpenAI and its major backer, Microsoft, accusing them of copyright infringement. So it's getting thorny in the AI world. Talk us through this case.

Rashmi Ramesh: Perfect. So a little bit of background first. The New York Times said that OpenAI used, without permission, millions of its copyrighted articles to train the large language models that power chatbots like ChatGPT. It also said that ChatGPT's responses were often nearly identical to Times articles, and that it sometimes inaccurately attributed its responses to information sourced from The Times. The Times said that these issues have a direct impact on it: OpenAI is using its content without permission to develop products that will compete directly with The Times. That threatens its business financially by taking away users, and OpenAI in turn gets a free ride by not acknowledging the investments The Times has made in its journalism. It also said that if journalists stop making original and independent content, that is not a vacuum AI can fill. So why is Microsoft party to this? Because it's OpenAI's biggest backer, it is intimately involved in its operations, it uses OpenAI's technology in its own products, and the LLM it uses also provides - and I'm quoting from the complaint - "infringing content" via Bing Chat, or Copilot as we know it now.

Now, the context of this complaint is also important. The Times said that it filed the case because it tried for months, and failed, to negotiate a deal with OpenAI under which the latter would pay the media house to license its content. And this stands out a bit, because OpenAI has struck deals with other media companies. Axel Springer, the publisher of Business Insider, for one, now allows OpenAI to use its data for about three years in exchange for what they say is tens of millions of euros - we don't know the exact amount yet. And the AP also signed a two-year deal that allows OpenAI to use some of its news content - going back even to 1985, I think - to train its algorithms. Basically, the idea is that the more accurate the input used to train the models, the more accurate the results will be. And they desperately need it. In my experience so far, at least, the results I've gotten from ChatGPT whenever I have used it - to maybe get a summary of something or see what the background is - are riddled with inaccuracies and so many misattributions. Now, The Times has not sought specific damages, but it said that it wants to hold OpenAI responsible for billions of dollars in statutory and actual damages. So that's the background. The latest update, from Monday, is that OpenAI responded to the allegations. It said that The New York Times is not telling the full story. It said that it provides an opt-out process for publishers to prevent its crawlers from accessing their sites, and that NYT adopted it in August 2023. It also called the regurgitation issue a rare bug that it's trying to fix. So we'll definitely see more of this in the coming days.

Delaney: Great overview there. So what about other media platforms? I think you cited in your article that other outlets have been experimenting with AI, particularly in the context of chatbot capabilities. Tell us more.
Ramesh: Take the AP, for example. Media companies have been using AI in various ways, with mixed results. The AP issued guidelines on what AI can be used for and what it can't be used for - for example, it cannot be used to create publishable content and images for the news service. But I have seen several news platforms that are using it to generate images for their news stories. The Guardian and Insider also published statements similar to the AP's, saying that they will not use it to create original content, but only to make journalists' content better in terms of things like structure and readability, and to help hone their marketing strategies. The Times recently hired an editorial director to lead its AI initiatives. A little bit of a self-plug moment here: we also started an AI-focused website a few months ago, and my primary job is to write content for it. But some media firms have already gotten into a bit of hot water over how they've used AI. Sports Illustrated publisher Arena Group fired several people who were overseeing the use of AI to generate content, because stories were allegedly attributed to fake bylines. And CNET began publishing AI-written stories and found out much later that there were errors in more than half of them. That has been my experience as well: if you use ChatGPT or any other AI chatbot, check, check, check and check everything. I have seen it make up facts, I've seen it misattribute content, hallucinate - you name it, it does it. But that doesn't mean it's of no help at all. It takes care of a lot of repetitive tasks. It's great for brainstorming content ideas, finding sources, finding experts and getting a lot of background information. So definitely use it. Experiment as much as possible. But always, always, always take the results with a truckload of salt, at least at the moment.
Delaney: Wise words. Certainly a useful tool. So there is the question of what happens here - what's the verdict going to be? But there's a bigger question as well, perhaps on all of our minds: What does this mean for the future of journalism? Maybe it's too soon to say, but maybe you've got your own thoughts, Rashmi?
Ramesh: The one clear outcome that I see from all of this drama is that we'll have some idea of how AI can be used in journalism, as companies continue to experiment with it, use OpenAI's and other companies' LLMs, and develop their own GPT models. This specific case will also maybe help set clearer guidelines for companies using journalism content to train their LLMs, and the NYT case will most likely set precedent for future violations as well. And I know how everyone talks about how journalism is dead because chatbots can now write stories. Well, anyone - including a chatbot - can regurgitate press releases with proper instructions. But actual journalism requires legwork. It requires humans speaking with humans and connecting all of those dots to weave a story that evokes curiosity and then sates that curiosity. So in my opinion, AI will only help with that, not hinder it.
Delaney: Let's hope so. Mat, are you of the same opinion?
Schwartz: Yeah, definitely. We've seen so many interesting use cases with AI. Just this week, I was reading about it being used to discover chemical substances that can serve as replacements for other sorts of materials - they had trained a very particular AI to solve chemical problems, for example. So there's so much potential here. And I think we can get thrown off sometimes when we try to use these tools ourselves and they don't always work the way we think they should. But we're seeing so many new ways of training them to do very specific or very complicated types of tasks that I think - not to be cliched - the sky is still the limit with a lot of the applications that we're going to be seeing.
Delaney: Right. Well, let's move on to our final question, just for fun. If you could interview any historical figure about their thoughts on cybersecurity or AI, who would it be? And what question would you ask them? Go for it, Rashmi.
Ramesh: I would pick Salvador Dali, because his art was all about bending reality and perception, and I think he would have a very, very unique perspective and offer a new angle on a world that is equally fluid and deceptive. So I would probably just ask him: In your dreamscape, where logic literally melts and clocks drip, how would you represent the threats and defenses of the cybersecurity world?
Delaney: Fantastic. I'd love to know what Dali thinks. Brilliant. Mat?
Schwartz: The only caution there is that, as a surrealist, his answer to "What's the secret to cybersecurity?" might be "watermelon" or something, or "hair" - I don't know. You might not like what you hear. I think we could use a little more levity, so I would love to interview somebody like Mark Twain. There's a quote attributed to him: "If you don't read the newspaper, you're uninformed. If you do read the newspaper, you're misinformed." And there are so many wonderful quips and observations from him about not taking things too seriously, having good perspective on things, and always trying to be a good person even though others around you might seem like scoundrels or fools, that sort of thing. And with the degradation in the discourse that we've been having, with the implosion of things like Twitter - now known as X - I just think we need maybe a little more levity, a little more lightness, and to collectively, or maybe just personally, take things a little less seriously.
Delaney: Excellent, wise words. I think it's interesting how we've all gone for creatives - writers, artists. I've chosen the Romantic poet Lord Byron. I'd love to know his thoughts on, say, the challenges of preserving privacy, ethics and individual freedoms in this age of AI and digital interconnectedness. It's interesting that we haven't gone for traditional technologists - what does that say about us? Well, thank you so much. This has been an absolutely brilliant, excellent discussion.
Schwartz: Thank you so much, Anna, for having us.
Ramesh: Thank you, Anna.
Delaney: It's my pleasure. And thank you so much for watching. Until next time.