Scale AI exposed sensitive data about clients like Meta and xAI in public Google Docs, BI finds


2025-06-30 · Technology
David
Good morning, listeners! I'm David, and this is <Goose Pod>, the show made just for you. It's Monday, June 30, eight o'clock sharp.
Ema
And I'm Ema! Great to be with you all. Today we're digging into a story that's been getting a lot of attention: Scale AI, the AI data-labeling company, has reportedly been exposing clients' sensitive data in public Google Docs. That's big news!
David
It really is shocking. A Business Insider investigation revealed that Scale AI routinely used public Google Docs to handle work for top clients like Google, Meta, and xAI.
Ema
That sounds like leaving the company's most important confidential files on a park bench for anyone to pick up! The files were marked "confidential," but anyone with the link could view them, and some could even be edited!
David
Exactly. Business Insider reviewed thousands of files and found not only clients' confidential AI training documents but also sensitive data on Scale AI's own contractors. It points to serious gaps in the company's data security.
Ema
And think about it: Scale AI has at least 240,000 contractors, a huge army of freelancers! Using Google Docs probably boosted efficiency, but at far too high a cost. It's like laying the company's lifeline out in broad daylight.
David
Right. There's no indication that Scale AI has actually been breached because of this, but cybersecurity experts warn the practice invites all kinds of attacks. Hackers could impersonate contractors, for instance, or upload malware.
Ema
It's like leaving the door open for thieves: you might not get robbed right away, but the risk is everywhere. Scale AI says it's conducting a "thorough investigation" and has disabled public sharing. Hopefully it's a case of better late than never.
David
They say they'll strengthen technical and policy safeguards to protect confidential information. But the problem is these flaws surfaced only after Meta's massive investment, which raises the question of whether Meta fully understood the security risks before investing.
Ema
And it's not a small investment. If Meta knew about this beforehand, did they go ahead anyway? Or did Scale AI hide it that well? There's surely more to this story, and I'm curious.
David
Indeed. The incident not only damages Scale AI's reputation, it also casts a shadow over its partnerships with top tech companies. After all, data security is the lifeblood of these tech giants.
Ema
Exactly. For Google, Meta, and xAI, that data is priceless. Imagine your AI training data, even your playbook for improving a chatbot, being out in the open. That's catastrophic!
David
To understand how serious this is, we first need to understand what Scale AI does. In short, it provides data annotation and labeling services, which are essential for training AI models.
Ema
Right. Think of it like teaching a child about the world. AI is the same: someone has to tell it that the picture shows a cat or a dog, or what a recording says. Scale AI plays that "teacher" role, and its customers are all big companies.
David
Because it serves these giants, Scale AI handles enormous volumes of data. To manage its roughly 240,000 contractors worldwide efficiently, it adopted Google Docs as its main tool for internal file sharing and work tracking.
Ema
Convenient, sure, but security got ignored. A friend's company also uses Google Docs, but every file there has strict permission settings. For a company as big as Scale AI to rely on public links is mind-boggling!
David
Indeed. BI's investigation found that the files reachable via public links contained extensive confidential AI project information. For example, at least seven Google instruction manuals marked "confidential" were left public.
David
The documents spelled out how Google used ChatGPT to improve its chatbot, then called Bard, and how Scale AI's contractors should fix Bard's weaknesses, such as its trouble answering complex questions.
Ema
Wow, that's like leaking Google's answer key! Details of xAI's "Project Xylophone" were exposed too. The project focused on improving the AI's conversation skills, covering everything from zombie apocalypses to plumbing.
Ema
A list of 700 conversation prompts was made public too. That's basically a gift basket for competitors! It sounds unbelievable.
David
Meta's situation is no better. Its training documents, marked confidential, were likewise accessible through public links. The files even included working audio links that revealed Meta's standards for expressiveness in its AI products.
Ema
Imagine painstakingly training an AI only to have rivals learn your standards for its "voice" and "tone." How do you stay competitive? Contractors also said that even when projects were codenamed, it was easy to guess the client.
David
Beyond clients' confidential projects, Scale AI also exposed a large amount of contractors' sensitive personal information. Spreadsheets reviewed by Business Insider listed the names and private Gmail addresses of thousands of contractors, and none of those sheets were locked down.
Ema
My goodness! That strips workers of their privacy entirely. Even more outrageous, some files recorded their work performance in detail, like a "Good and Bad Folks" spreadsheet that sorted dozens of workers into "high quality" or suspected of "cheating."
David
There was also a list titled "move all cheating taskers" with hundreds of personal email addresses flagged for "suspicious behavior." One document even named nearly 1,000 contractors who had been "mistakenly banned."
Ema
That's awful! Not only was personal information leaked, but evaluations of workers were made public too. Think of the trouble that causes for the contractors who were flagged. And the files even showed individual contractors' pay.
David
All of this points to one core problem: in chasing efficiency, Scale AI badly neglected basic data security. And after Meta's massive investment, these security holes look especially glaring.
Ema
Indeed. It's like a factory running at full speed that skips safety inspections for the sake of output and ends up with a serious accident. Hopefully this is a wake-up call for every tech company: speed matters, but security is the bottom line.
David
What's most worrying is how Scale AI's own contractors felt about this "incredibly janky" system. Five current and former contractors interviewed said public Google Docs were used widely across the company.
Ema
"Incredibly janky"? That's not a flattering description! Sounds like a mess. Contractors also said that even after leaving a project, they could still access the old project files, which were sometimes even updated with new client requests.
David
Yes, that exposes a huge hole in permission management. Joseph Steinberg, a cybersecurity lecturer at Columbia University, pointed out that organizing internal work through public Google Docs creates serious risks.
David
He put it bluntly: "Of course it's dangerous. In the best-case scenario, it's just enabling social engineering."
Ema
Social engineering? That sounds like something out of a spy film. Could you explain for our listeners what a social engineering attack is? I suspect many people aren't familiar with it.
David
Of course. Social engineering is when hackers trick employees or contractors into giving up access, often by impersonating someone inside the company. Steinberg said leaving details on thousands of contractors so easily exposed creates many opportunities for that kind of attack.
Ema
Now I get it. It's like a scammer posing as your boss and getting you to click a malicious link. Terrifying! Stephanie Kurtz of the cyber firm Trace3 also noted that some of the Google Docs were editable, which is even worse.
David
Right. Bad actors could insert malicious links into them. She stressed that companies should manage access via invites rather than "putting it out there and hoping somebody doesn't share a link; that's not a great strategy."
Ema
That reflects Scale AI's weakness in basic security hygiene. It reminds me of something: a friend's company also misconfigured permissions once, and a key project document ended up seen by a competitor.
David
Yes, your friend's experience is typical. Steinberg also pointed to a dilemma facing growth-stage startups: investing in security slows down market expansion.
David
As he put it: "The companies that actually spend time doing security right very often lose out because other companies move faster to market."
Ema
It's a real dilemma: grow fast and grab market share, or move steadily and stay secure. But in the long run, if security fails, even the fastest growth can collapse overnight.
Ema
And look: in the wake of Meta's investment, Google, OpenAI, and xAI all paused work with Scale AI. The clients' reaction is the most direct verdict.
David
Right. Scale AI published a blog post to reassure clients, stressing that it remains a neutral, independent partner with strict security standards. But BI's findings raise the question: did they really do enough?
Ema
Notably, Meta declined to comment on the findings, and Google and xAI didn't respond to requests for comment. That silence says a lot. Once trust is damaged, it's hard to rebuild.
David
So this isn't just a technical lapse; it's a crisis of trust. It forces Scale AI, and the whole AI data-services industry, to rethink the balance between efficiency and security.
David
The exposure hits Scale AI on multiple fronts. The most immediate is the blow to its reputation and client trust. In an industry built on trust, news like this can be fatal.
Ema
Right. If a restaurant is exposed for having a dirty kitchen, will customers keep coming? Scale AI is a giant in AI data labeling, its clients are top tech companies, and their data-security standards are extremely high.
David
This will certainly make them reassess those relationships. Indeed, the report notes that major clients like Google, xAI, and OpenAI have paused work with Scale AI.
David
That means lost business, but more importantly it will force Scale AI to pour resources into internal audits, stricter data-security protocols, and staff training.
Ema
Better late than never, but it won't come cheap. Contracts may need renegotiating, and clients will demand stronger security guarantees. That puts huge pressure and extra cost on operations. I imagine it's chaos internally right now.
David
Beyond Scale AI itself, the incident is a wake-up call for security across the AI supply chain. It highlights third-party vendor risk, the principle of data minimization, and the impact of human error.
David
When a company hands sensitive data to an outside partner, it has to make sure that partner has adequate safeguards. Here's an analogy: it's like depositing money at a bank and trusting the bank to keep it safe.
Ema
AI companies handed their most valuable data to Scale AI, only to find the vault door was left open. That's not just Scale AI's problem; every company that relies on third-party AI data services should be on alert.
David
Exactly. And because the incident surfaced after Meta's massive investment, it raises due-diligence questions. Did Meta fully understand Scale AI's security posture before investing?
Ema
That could shape how big tech companies approach AI investments going forward. It's like buying a house and then discovering major defects. This huge investment now looks hotter to handle than anyone expected.
David
It also reminds us that even companies at the technological frontier can stumble on the most basic security practices. As AI races ahead, data security simply can't be ignored.
David
Even a small lapse can have catastrophic consequences, for the company itself and for the wider industry ecosystem.
Ema
So after all this, what changes do you expect in the AI data-services industry? My guess: first, every AI data vendor will face tougher scrutiny.
David
Clients will demand higher security standards, no doubt. We can expect more sophisticated safeguards across the AI data ecosystem: stronger encryption, stricter access controls, automated vulnerability scanning, and continuous monitoring.
Ema
Companies won't settle for baseline protections anymore. It's like putting multiple locks on every stop along the data pipeline, with someone watching around the clock.
Ema
My guess is regulators will step in too, with stricter laws on AI data handling and heavier penalties for leaks.
David
Very likely. As AI advances and data leaks multiply, regulators worldwide may introduce tougher data-protection laws and enforcement mechanisms.
David
Companies may also rethink using general-purpose collaboration tools like Google Docs for highly sensitive data.
Ema
Right, they'll move to more secure platforms built for this kind of sensitive data, with finer-grained access controls and audit trails. You don't keep gold bars in a desk drawer; you keep them in a vault.
David
Meta's investment in Scale AI also underlines how strategically important data labeling is to large AI models.
David
The incident may push Meta toward more thorough due diligence on its AI partners' security practices, and it could shape Meta's future investments across the AI supply chain.
Ema
In short, Scale AI's data exposure is a wake-up call: in the AI era, data security is never a small matter.
David
Let's hope every company learns from this and puts data security first. Thanks for listening to today's <Goose Pod>.
Ema
See you tomorrow!

# Comprehensive News Summary: Scale AI Data Exposure

---

* **News Title:** Scale AI exposed sensitive data about clients like Meta and xAI in public Google Docs, BI finds
* **Report Provider:** Business Insider
* **Authors:** Charles Rollet, Effie Webb, Shubhangi Goel, Hugh Langley
* **Date Published:** 2025-06-24 16:42:16 (UTC)
* **Topic/Sub-Topic:** Technology / AI
* **Keywords:** Meta, Cybersecurity, Exclusive
* **URL:** [https://www.businessinsider.com/scale-ai-public-google-docs-security-2025-6](https://www.businessinsider.com/scale-ai-public-google-docs-security-2025-6)

---

## Summary of Findings

Business Insider (BI) has uncovered significant security vulnerabilities at Scale AI, a prominent AI data labeling startup, revealing that the company routinely used public Google Docs to manage work for high-profile clients like Google, Meta, and xAI. This practice exposed thousands of confidential AI training documents and sensitive contractor data, raising serious cybersecurity and confidentiality concerns. The revelations come in the wake of Meta's recent $14.3 billion investment in Scale AI and the planned move of Scale AI cofounder Alexandr Wang to Meta.

### Core Issue: Public Google Docs Usage

Scale AI, which relies on a vast network of at least **240,000 contractors**, used public Google Docs as an efficient, albeit risky, method for sharing internal files and tracking work. BI reviewed thousands of these files, finding many marked "confidential" and accessible to anyone with the link. Some documents were even editable by external parties.

### Details of Data Exposure

1. **Client Confidential AI Projects:**
   * BI viewed **thousands of pages** of project documents across **85 individual Google Docs** related to Scale AI's work with major tech clients.
   * **Google:** At least **seven instruction manuals** marked "confidential" by Google were publicly accessible. These documents detailed issues with Google's chatbot, then called Bard (e.g., difficulties with complex questions), and outlined how Scale contractors should improve it, including how Google used ChatGPT for Bard's improvement.
   * **xAI:** Public Google documents and spreadsheets exposed details of "Project Xylophone," for which Scale ran at least **10 generative AI projects** as of April. This included training documents and a list of **700 conversation prompts** focused on improving the AI's conversation skills across diverse topics.
   * **Meta:** Confidential Meta training documents, including links to accessible audio files with examples of "good" and "bad" speech prompts, were publicly available. These projects aimed to train Meta's chatbots to be more conversational, emotionally engaging, and able to handle sensitive topics safely. Meta had at least **21 generative AI projects** with Scale as of April.
   * Contractors reported easily identifying clients or products, even when projects were codenamed, sometimes due to client logos or by directly prompting the AI model.
2. **Contractor Sensitive Information:**
   * Unsecured spreadsheets listed the names and private Gmail addresses of **thousands of Scale AI workers**.
   * Documents detailed contractor work performance, including a spreadsheet titled "Good and Bad Folks" categorizing workers as "high quality" or suspected of "cheating."
   * Another list of hundreds of personal email addresses was titled "move all cheating taskers," flagging workers for "suspicious behavior."
   * One document named nearly **1,000 contractors** who were "mistakenly banned" from Scale AI's platforms.
   * Other documents showed individual contractor pay rates, along with notes on pay disputes and discrepancies.

### Context and Background

* **Meta's Investment:** The findings emerge shortly after Meta's substantial **$14.3 billion investment** in Scale AI, which also involves Scale AI cofounder Alexandr Wang joining Meta.
* **Client Reactions:** Following Meta's investment, clients such as Google, OpenAI, and xAI reportedly **paused work** with Scale AI.
* **Scale AI's Reassurance:** In a recent blog post, Scale AI sought to reassure Big Tech clients, emphasizing its neutrality, independence, and commitment to "robust technical and policy safeguards" and "strict security standards."

### Risks and Concerns

* **Cybersecurity Vulnerabilities:** Cybersecurity experts Joseph Steinberg (Columbia University) and Stephanie Kurtz (Trace3) confirmed that using public Google Docs creates serious risks.
* **Social Engineering:** The exposed contractor data facilitates "social engineering" attacks, in which hackers impersonate employees or contractors to gain unauthorized access.
* **Malware Insertion:** Because some Google Docs were editable by anyone, bad actors could insert malicious links or malware.
* **Operational Jankiness:** Five current and former Scale AI contractors described the Google Docs system as "incredibly janky" and noted that they retained access to old projects, sometimes updated with new client requests, even after their work on them concluded.
* **Growth vs. Security:** Steinberg highlighted the dilemma for growth-oriented startups, where prioritizing security can slow market entry.

### Company Responses

* **Scale AI:** A spokesperson stated, "We are conducting a thorough investigation and have disabled any user's ability to publicly share documents from Scale-managed systems." The company reiterated its commitment to data security and to strengthening its practices.
* **Meta:** Declined to comment on the findings.
* **Google and xAI:** Did not respond to requests for comment.

### Implications

BI's findings raise significant questions about the adequacy of Scale AI's security measures and whether Meta was aware of these vulnerabilities prior to its substantial investment. While there is no indication that Scale AI has suffered a breach directly because of these practices, the exposed data and lax security protocols leave the company and its high-profile clients highly vulnerable to future attacks.
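The recurring failure described above is files carrying an "anyone with the link" permission. As a minimal sketch of how an organization might audit its own Google Drive for such files, the snippet below uses the Drive v3 API's `files.list` search with the standard `visibility` query term; the OAuth credential setup (`creds`) is assumed and omitted for brevity, so treat this as an illustration rather than a drop-in tool.

```python
# Audit sketch: find Drive files shared via "anyone with the link".
# Assumes `creds` already holds OAuth credentials authorized for the
# read-only Drive scope (https://www.googleapis.com/auth/drive.readonly).
from googleapiclient.discovery import build


def find_link_shared_files(creds):
    drive = build("drive", "v3", credentials=creds)
    exposed, page_token = [], None
    while True:
        resp = drive.files().list(
            q="visibility='anyoneWithLink' and trashed=false",
            fields="nextPageToken, files(id, name, webViewLink)",
            pageToken=page_token,
        ).execute()
        exposed.extend(resp.get("files", []))
        page_token = resp.get("nextPageToken")
        if not page_token:
            return exposed


# Example usage (with valid credentials):
# for f in find_link_shared_files(creds):
#     print(f["name"], f["webViewLink"])
```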

Original article: Scale AI exposed sensitive data about clients like Meta and xAI in public Google Docs, BI finds

Read original at Business Insider

Scale AI cofounder Alexandr Wang, who is joining Meta following a $14.3 billion investment. Scale AI

* Scale AI routinely uses public Google Docs for work with Google, Meta, and xAI.
* BI reviewed thousands of files — some marked confidential, others exposing contractor data.
* Scale AI says it's conducting a "thorough investigation."

As Scale AI seeks to reassure customers that their data is secure following Meta's $14.3 billion investment, leaked files and the startup's own contractors indicate it has some serious security holes.

Scale AI routinely uses public Google Docs to track work for high-profile customers like Google, Meta, and xAI, leaving multiple AI training documents labeled "confidential" accessible to anyone with the link, Business Insider found.

Contractors told BI the company relies on public Google Docs to share internal files, a method that's efficient for its vast army of at least 240,000 contractors but presents clear cybersecurity and confidentiality risks.

Scale AI also left public Google Docs with sensitive details about thousands of its contractors, including their private email addresses and whether they were suspected of "cheating."

Some of those documents can be viewed and also edited by anyone with the right URL.

There's no indication that Scale AI has suffered a breach because of this. Two cybersecurity experts told BI that such practices could leave the company and its clients vulnerable to various kinds of hacks, such as hackers impersonating contractors or uploading malware into accessible files.

Scale AI told Business Insider it takes data security seriously and is looking into the matter. "We are conducting a thorough investigation and have disabled any user's ability to publicly share documents from Scale-managed systems," a Scale AI spokesperson said. "We remain committed to robust technical and policy safeguards to protect confidential information and are always working to strengthen our practices."

Meta declined to comment. Google and xAI didn't respond to requests for comment.

In the wake of Meta's blockbuster investment, clients like Google, OpenAI, and xAI paused work with Scale. In a blog post last week, Scale reassured Big Tech clients that it remains a neutral and independent partner with strict security standards.

The company said that "ensuring customer trust has been and will always be a top priority," and that it has "robust technical and policy safeguards to protect customers' confidential information."

BI's findings raise questions about whether it did enough to ensure security and whether Meta was aware of the issue before writing the check.

Confidential AI projects were accessible

BI was able to view thousands of pages of project documents across 85 individual Google Docs tied to Scale AI's work with Big Tech clients. The documents include sensitive details, such as how Google used ChatGPT to improve its own struggling chatbot, then called Bard.

Scale also left public at least seven instruction manuals marked "confidential" by Google, which were accessible to anyone with the link. Those documents spell out what Google thought was wrong with Bard — that it had difficulties answering complex questions — and how Scale contractors should fix it.

For Elon Musk's xAI, for which Scale ran at least 10 generative AI projects as of April, public Google documents and spreadsheets show details of "Project Xylophone," BI reported earlier this month. Training documents and a list of 700 conversation prompts revealed how the project focused on improving the AI's conversation skills about a wide array of topics, from zombie apocalypses to plumbing.

Meta training documents, marked confidential at the top, were also left public to anyone with the link. These included links to accessible audio files with examples of "good" and "bad" speech prompts, suggesting the standards Meta set for expressiveness in its AI products.

Some of those projects focused on training Meta's chatbots to be more conversational and emotionally engaging while ensuring they handled sensitive topics safely, BI previously reported.

As of April, Meta had at least 21 generative AI projects with Scale.

Several Scale AI contractors interviewed by BI said it was easy to figure out which client they worked for, even though the projects were codenamed, often just from the nature of the task or the way the instructions were phrased. Sometimes it was even easier: One presentation seen by BI had Google's logo.

Even when projects were meant to be anonymized, contractors across different projects described instantly recognizing clients or products. In some cases, simply prompting the model or asking it directly which chatbot it was would reveal the underlying client, contractors said.

Scale AI left contractor information public

Other Google Docs exposed sensitive personal information about Scale's contractors.

BI reviewed spreadsheets that were not locked down and that listed the names and private Gmail addresses of thousands of workers. Several contacted by BI said they were surprised to learn their details were accessible to anyone with the URL of the document.

Many documents include details about their work performance.

One spreadsheet titled "Good and Bad Folks" categorizes dozens of workers as either "high quality" or suspected of "cheating." Another list of hundreds of personal email addresses is titled "move all cheating taskers" and flags workers for "suspicious behavior."

Another sheet names nearly 1,000 contractors who were "mistakenly banned" from Scale AI's platforms.

Other documents show how much individual contractors were paid, along with detailed notes on pay disputes and discrepancies.

The system seemed 'incredibly janky'

Five current and former Scale AI contractors who worked on separate projects told BI that the use of public Google Docs was widespread across the company.

Contractors said that using them streamlined operations for Scale, which relies mostly on freelance contributors. Managing individual access permissions for each contractor would have slowed down the process.

Scale AI's internal platform requires workers to verify themselves, sometimes using their camera, contractors told BI.

At the same time, many documents containing information on training AI models can be accessed through public links or links in other documents without verification.

"The whole Google Docs system always seemed incredibly janky," one worker said.

Two other workers said they retained access to old projects they no longer worked on, which were sometimes updated with requests from the client company regarding how the models should be trained.

'Of course it's dangerous'

Organizing internal work through public Google Docs can create serious cybersecurity risks, Joseph Steinberg, a Columbia University cybersecurity lecturer, told BI.

"Of course it's dangerous. In the best-case scenario, it's just enabling social engineering," he said.

Social engineering refers to attacks where hackers trick employees or contractors into giving up access, often by impersonating someone within the company.

Leaving details about thousands of contractors easily accessible creates many opportunities for that kind of breach, Steinberg said.

At the same time, investing more in security can slow down growth-oriented startups. "The companies that actually spend time doing security right very often lose out because other companies move faster to market," Steinberg said.

The fact that some of the Google Docs were editable by anyone creates risks, such as bad actors inserting malicious links into the documents for others to click, Stephanie Kurtz, a regional director at cyber firm Trace3, told BI.

Kurtz added that companies should start with managing access via invites. "Putting it out there and hoping somebody doesn't share a link, that's not a great strategy there," she said.

Have a tip? Contact this reporter via email at crollet@insider.com or Signal and WhatsApp at 628-282-2811. Use a personal email address and a nonwork device; here's our guide to sharing information securely.
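Kurtz's advice maps directly onto Drive's permission model: "anyone with the link" sharing is stored as a permission of type "anyone", which can be deleted and replaced with explicit per-user invites. The sketch below, under the same assumptions as the audit example earlier (pre-authorized `creds`, here with the full Drive scope; the file ID and email address are placeholders), shows one way to do that with the Drive v3 API.

```python
# Remediation sketch: revoke link-based sharing on one file and re-share
# it with a named collaborator instead. `creds`, the file ID, and the
# email address are placeholders, not values from the article.
from googleapiclient.discovery import build


def replace_link_sharing_with_invite(creds, file_id, email):
    drive = build("drive", "v3", credentials=creds)
    perms = drive.permissions().list(
        fileId=file_id, fields="permissions(id,type)"
    ).execute()
    # Drive stores "anyone with the link" as a permission of type "anyone".
    for perm in perms.get("permissions", []):
        if perm["type"] == "anyone":
            drive.permissions().delete(
                fileId=file_id, permissionId=perm["id"]
            ).execute()
    # Invite a single named account with read-only access; Drive notifies
    # them by email instead of relying on a shareable URL.
    drive.permissions().create(
        fileId=file_id,
        body={"type": "user", "role": "reader", "emailAddress": email},
        sendNotificationEmail=True,
    ).execute()
```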
