Roundtable

How Copyright Shapes Your Datasets and What To Do About It

Amanda Levendowski

2021 Roundtable

Abstract

Grappling with copyright law is unavoidable for ML researchers. Copyright protects works like text, photographs, and videos--all of which are used as ML training data, often without the consent of the copyright owner. Relying on public domain works (like works published pre-1926), Creative Commons-licensed data (like Wikipedia), or ubiquitous data (like the Enron emails) seems like an easy way to avoid dealing with copyright. Unfortunately, relying only on those works predictably introduces bias into ML algorithms. This Workshop will not provide any legal advice, but it will equip researchers to understand copyright law and its relationship to ML bias, explain how the fair use doctrine may allow some copyrighted works to be used as training data without consent, and point to resources for obtaining legal advice related to copyright and ML research. Attendees will be able to participate in a Q&A after the presentation.

These are some of the resources mentioned in the discussion:

Friendly Neighborhood Tech Clinics (no single website, but offices are scattered throughout the US and possibly other countries)
Paper: Resisting Face Surveillance with Copyright Law
Paper: How Copyright Law Can Fix Artificial Intelligence's Implicit Bias Problem
Paper: Fair Learning by Mark Lemley and Bryan Casey
