Weekly Roundup: Copyright battles and AI election interference
Copyright and patent questions at every stage of machine learning are being tested in court, AI in political campaigns gets more coverage, plus highlights from this week's AI regulation news.
Is the pirated piratable?
Copyright issues are driving a good deal of the news cycle on both the input and output sides of machine learning. On the input side, OpenAI has moved to dismiss a lawsuit claiming it infringed on a group of authors’ copyrighted material. Vice reports the following key facts:
Although OpenAI has not released any information on the training data for GPT-4, the firm has admitted that training data for its earlier GPT-3 model included "internet-based books corpora," meaning a database of books available online.
The authors suing OpenAI in two separate lawsuits…allege that every output that ChatGPT makes is thus a derivative work of their books and infringes copyright.
The motion to dismiss emphasizes:
Scale: “For the purpose of ‘training’ a model of this type, it is the volume of text used, more than any particular selection of text[, that matters].”
Purpose: “[The plaintiffs] believe their texts were a tiny part of the dataset that OpenAI used to teach its models to derive the rules underlying human language” (emphasis added)
Non-derivative outputs: “…outputs are, in only a remote and colloquial sense, ‘based on’ an enormous training dataset that allegedly included Plaintiffs’ books.”
Limited copyright monopolies: Highlighted in four subheadings: “The Copyright Act Grants Only Specific, Enumerated Rights; Copyright Does Not Protect Ideas, Facts, or Language; Substantial Similarity Is Required for Infringement; Fair Use Is Not Infringement.”
This case tests how far existing copyright law can restrict generative AI. It is also one of the first tests of how the courts will understand outputs vis-à-vis training data. If the argument holds that outputs are “based on” the training data in only “a remote and colloquial sense,” there will undoubtedly be consequences for whether and how data owners are compensated.
The Wall Street Journal has an excellent podcast episode on this subject from an artist’s perspective.
For copyright controversies on the output side: A few weeks ago, a federal judge ruled against plaintiff Stephen Thaler’s attempt to copyright AI-produced art. Mr. Thaler and his supporter, Ryan Abbott, argue that machine learning outputs should be copyrightable. Mr. Thaler believes that his AI system, DABUS, is sentient.
Mr. Abbott, however, takes a more measured legal approach: he thinks the patent and copyright system should incentivize people to use AI for the common good.
In short, Abbott says, copyright and patent regimes should exist to encourage creation, not limit it. Rather than searching for a vague legal line in the sand where an AI-human collaboration becomes protectable, we should sweep away the line entirely. Intellectual property rights should be granted regardless of how a thing was made, including in the absence of a human inventor or author.
Assessing authorship
The U.S. Copyright Office has been aware of AI copyright issues for decades (as early as 1965, according to its latest notice of inquiry). This week, it posted a notice of inquiry requesting information on several points regarding AI and copyright. In summary:
1. Using copyrighted works to train AI models:
How and where data is collected, curated, and used in training.
What kind of remuneration system(s) might be feasible and effective if permission and compensation are required?
2. The “copyrightability” of material generated using AI systems:
What should the proper scope of copyright protection be for material created using generative AI?
3. Potential liability for infringing works generated using AI systems:
How should liability be apportioned between the user whose instructions prompted the output and the developers of the system and dataset?
4. The treatment of generative AI outputs that imitate the identity or style of human artists:
Personal attributes (voices, likeness, or style) are not generally protected by copyright law, but their copying may implicate varying domestic laws and international treaties.
The copyright status of generative AI output can only be assessed if it is known that AI created the work in the first place. To help identify AI-generated images and potential infringement, Google DeepMind released an AI image watermarking tool, SynthID, this week (in response to a request by the White House earlier this summer). Companies, however, are torn on their responsibility to disclose AI-generated content.
The Center for AI and Digital Policy (CAIDP) summarized recent AI copyright developments and published guidance on submitting comments. Comments are due by November 15.
In other news
Senator Chuck Schumer (D-NY) will host the first of his “AI Insight Forums” on September 13, with Elon Musk of Tesla, Sundar Pichai of Google, Sam Altman of OpenAI, and Satya Nadella of Microsoft.
A bill introduced in the New York State Senate would restrict the use “of electronic monitoring or an automated employment decision [tool]…unless such tool has been the subject of a bias audit…and the results” have been made public.
New events have been added to the Events page!
Data wars continue
X updated its privacy policy to collect new troves of data, including biometrics. The new policy states the intent to use the data to train machine learning models.
The Guardian blocked OpenAI from scraping its data.
Generating election (in)security
Wired reports that building an AI disinformation machine costs as little as $400.
The impact, while not to be ignored, may not be as dire as some fear. Americans tend to tune out political ads, and only 1 in 5 receive news from social media, the primary platform for disinformation campaigns. AI can, however, push stories to prominence by manipulating platform algorithms.
Comments on the proposal to have the FEC regulate AI-generated content in campaigns close on October 16.
Think tanks publish new regulatory frameworks
The Centre for the Governance of AI’s report “describes trade-offs in the design of international governance arrangements for civilian artificial intelligence.”
The Brookings Institution’s report “proposes a new regulatory approach…to allow federal regulators to flexibly govern algorithms.”
“AI summer” is coming to an end
Bloomberg Law released a summary of regulatory actions states are taking.
If you missed some AI news this summer or need a good timeline, CSET published an excellent summary of the past year’s AI news and regulation events.
Is there a topic you want to hear more about, or do you want to collaborate with Pioneering Oversight? If so, I’d love to hear from you!