The chatbot revolution has left our world awash in AI-generated text: It has infiltrated our news feeds, term papers, and inboxes. It’s so absurdly abundant that industries have sprung up to provide moves and countermoves. Some companies offer services to identify AI-generated text by analyzing the material, while others say their tools will “humanize” your AI-generated text and make it undetectable. Both types of tools have questionable performance, and as chatbots get better and better, it will only get more difficult to tell whether words were strung together by a human or an algorithm.

Here’s another approach: Adding some sort of watermark or content credential to text from the start, which lets people easily check whether the text was AI-generated. New research from Google DeepMind, described today in the journal Nature, offers a way to do just that. The system, called SynthID-Text, doesn’t compromise “the quality, accuracy, creativity, or speed of the text generation,” says Pushmeet Kohli, vice president of research at Google DeepMind and a coauthor of the paper. But the researchers acknowledge that their system is far from foolproof, and isn’t yet available to everyone—it’s more of a demonstration than a scalable solution. 

Google has already integrated this new watermarking system into its Gemini chatbot, the company announced today. It has also open-sourced the tool and made it available to developers and businesses, allowing them to use the tool to determine whether text outputs have come from their own large language models (LLMs), the AI systems that power chatbots. However, only Google and those developers currently have access to the detector that checks for the watermark. As Kohli says: “While SynthID isn’t a silver bullet for identifying AI-generated content, it is an important building block for developing more reliable AI identification tools.”

The Rise of Content Credentials

Content credentials have been a hot topic for images and video, and have been viewed as one way to combat the rise of deepfakes. Tech companies and major media outlets have joined together in an initiative called C2PA, which has worked out a system for attaching encrypted metadata to image and video files indicating if they’re real or AI-generated. But text is a much harder problem, since text can so easily be altered to obscure or eliminate a watermark. While SynthID-Text isn’t the first attempt at creating a watermarking system for text, it is the first one to be tested on 20 million prompts.

Outside experts working on content credentials see the DeepMind research as a good step. It “holds promise for improving the use of durable content credentials from C2PA for documents and raw text,” says Andrew Jenks, Microsoft’s director of media provenance and executive chair of the C2PA. “This is a tough problem to solve, and it is nice to see some progress being made,” says Bruce MacCormack, a member of the C2PA steering committee.

How Google’s Text Watermark Works

SynthID-Text works by discreetly interfering in the generation process: It alters some of the words that a chatbot outputs to the user in a way that’s invisible to humans but clear to a SynthID detector. “Such modifications introduce a statistical signature into the generated text,” the researchers write in the paper. “During the watermark detection phase, the signature can be measured to determine whether the text was indeed generated by the watermarked LLM.”

The LLMs that power chatbots work by generating sentences word by word, looking at the context of what has come before to choose a likely next word. Essentially, SynthID-Text interferes by randomly assigning number scores to candidate words and having the LLM output words with higher scores. Later, a detector can take in a piece of text and calculate its overall score; watermarked text will have a higher score than non-watermarked text. The DeepMind team checked their system’s performance against other text watermarking tools that alter the generation process, and found that it did a better job of detecting watermarked text.
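The scoring-and-detection idea described above can be illustrated with a toy sketch. This is not Google’s actual SynthID-Text algorithm (the paper describes a more sophisticated tournament-based sampling scheme); it is a minimal illustration of the general approach, where a keyed pseudorandom score nudges word choice during generation and a detector recomputes the same scores. The names `g_score`, `SECRET_KEY`, and the stand-in vocabulary are all hypothetical.

```python
import hashlib
import random

SECRET_KEY = "demo-key"  # hypothetical key shared by generator and detector

def g_score(context, token, key=SECRET_KEY):
    """Pseudorandom score in [0, 1) keyed on the secret, recent context, and candidate word."""
    h = hashlib.sha256(f"{key}|{' '.join(context[-3:])}|{token}".encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def generate(vocab, n_words, seed=0):
    """Toy 'LLM': at each step the model proposes a few candidate words,
    and the watermarking step picks the candidate with the highest keyed score."""
    rng = random.Random(seed)
    text = []
    for _ in range(n_words):
        candidates = rng.sample(vocab, 4)  # stand-in for the model's top candidates
        text.append(max(candidates, key=lambda w: g_score(text, w)))
    return text

def detect(words):
    """Mean keyed score over the text; unwatermarked text hovers near 0.5."""
    return sum(g_score(words[:i], w) for i, w in enumerate(words)) / len(words)

vocab = [f"w{i}" for i in range(50)]
marked = generate(vocab, 100)
plain = random.Random(1).choices(vocab, k=100)
# Watermarked text should score well above the unwatermarked baseline.
print(detect(marked), detect(plain))
```

Because each watermarked word was chosen as the highest-scoring of several candidates, its expected score is well above the roughly 0.5 average of ordinary text, which is the statistical signature the detector measures. It also shows why heavy editing or paraphrasing weakens detection: replacing words replaces high-scoring choices with average-scoring ones.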

However, the researchers acknowledge in their paper that it’s still easy to alter a Gemini-generated text and fool the detector. Even though users wouldn’t know which words to change, if they edit the text significantly or even ask another chatbot to summarize the text, the watermark would likely be obscured.

Testing Text Watermarks at Scale

To be sure that SynthID-Text truly didn’t make chatbots produce worse responses, the team tested it on 20 million prompts given to Gemini. Half of those prompts were routed to the SynthID-Text system and got a watermarked response, while the other half got the standard Gemini response. Judging by the “thumbs up” and “thumbs down” feedback from users, the watermarked responses were just as satisfactory to users as the standard ones.

Which is great for Google and the developers building on Gemini. But tackling the full problem of identifying AI-generated text (which some call AI slop) will require many more AI companies to implement watermarking technologies—ideally, in an interoperable manner so that one detector could identify text from many different LLMs. And even in the unlikely event that all the major AI companies signed on to some agreement, there would still be the problem of open-source LLMs, which can easily be altered to remove any watermarking functionality.

MacCormack of C2PA notes that detection is a particular problem when you start to think practically about implementation. “There are challenges with the review of text in the wild,” he says, “where you would have to know which watermarking model has been applied to know how and where to look for the signal.” Overall, he says, the researchers still have their work cut out for them. This effort “is not a dead end,” says MacCormack, “but it’s the first step on a long road.”

Original article: https://spectrum.ieee.org/watermark

Last modified: October 24, 2024
