Google metadata

9/27/2023

Two weeks ago, a viral tweet accused Google of scraping Google Docs for data on which to train its AI tools. The initial tweet has nearly 10 million views, and it’s been retweeted thousands of times. In a follow-up, its author claimed that Google “used docs and emails to train their AI for years.” The fact that this may not even be true is almost beside the point. (Google says it doesn’t use data from its free or enterprise Workspace products - that includes Gmail and Docs - to train its generative AI models unless it has user permission, though it does train some Workspace AI features like spellcheck and Smart Compose using anonymized data.)

“Up until this point, tech companies have not done what they’re doing now with generative AI, which is to take everyone’s information and feed it into a product that can then contribute to people’s professional obsolescence and totally decimate their privacy in ways previously unimaginable,” said Ryan Clarkson, whose law firm is behind class action lawsuits against OpenAI, Microsoft, and Google.

Google’s general counsel, Halimah DeLaine Prado, said in a statement that the company has been clear that it uses data from public sources, adding that “American law supports using public information to create new beneficial uses, and we look forward to refuting these baseless claims.”

Exactly what rights we may have over our own information, however, is still being worked out in lawsuits, worker strikes, regulator probes, executive orders, and possibly new laws. Those might take care of your data in the future, but what can you do about what these companies already took, used, and profited from? The answer is probably not a whole lot.

Generative AI companies are hungry for your data. Simply put, generative AI systems need as much data as possible to train on. The more they get, the better they can generate approximations of how humans sound, look, talk, and write. The internet provides massive amounts of data that’s relatively easy to gobble up through web scraping tools and APIs. But that gobbling process doesn’t distinguish between copyrighted works and personal data: if it’s out there, it takes it.

“In the absence of meaningful privacy regulations, that means that people can scrape really widely all over the internet, take anything that is ‘publicly available’ - that top layer of the internet for lack of a better term - and just use it in their product,” said Ben Winters, who leads the Electronic Privacy Information Center’s AI and Human Rights Project and co-authored its report on generative AI harms.

You may be okay with all of this, or think the good that generative AI can do far outweighs whatever bad went into building it.
