• 3 Posts
  • 398 Comments
Joined 1 year ago
Cake day: June 16th, 2023

  • But at what point does that guidance just become the dataset you removed from the training data?

    The whole point is that it didn’t know the concepts beforehand, and no, it doesn’t become the dataset. What the model learns from the training data is encoded in its weights during training. Once training ends, the weights are locked in and the dataset is never consulted again.

    To get it to run Doom, they used Doom.

    To realize a new genre, you’ll “just” have to make that game the old-fashioned way first.

    Or you could train a more general model. These things happen in steps, research is a process.
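
To make the point about locked-in weights concrete, here is a toy sketch (not the Doom model or anything like it, just a two-parameter linear fit by gradient descent). The names `train` and `predict` and the toy data are made up for illustration. It shows the general idea: the data shapes the weights during training, and afterwards predictions need only the weights.

```python
def train(dataset, epochs=5000, lr=0.01):
    """Fit y ~ w*x + b by gradient descent and return only the weights."""
    w, b = 0.0, 0.0
    n = len(dataset)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in dataset) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in dataset) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(weights, x):
    """Inference uses the frozen weights and nothing else."""
    w, b = weights
    return w * x + b


# Hypothetical toy data following y = 3x + 1.
dataset = [(float(x), 3.0 * x + 1.0) for x in range(10)]

weights = train(dataset)
del dataset  # the training data is gone from here on

# An input that never appeared in the data; the result is close to 3*20 + 1.
prediction = predict(weights, 20.0)
print(weights, prediction)
```

After `del dataset`, nothing in `predict` can look anything up in the training data. Whatever it outputs comes from the two numbers in `weights`.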

  • You should read this letter by Katherine Klosek, the director of information policy and federal relations at the Association of Research Libraries.

    Why are scholars and librarians so invested in protecting the precedent that training AI LLMs on copyright-protected works is a transformative fair use? Rachael G. Samberg, Timothy Vollmer, and Samantha Teremi (of UC Berkeley Library) recently wrote that maintaining the continued treatment of training AI models as fair use is “essential to protecting research,” including non-generative, nonprofit educational research methodologies like text and data mining (TDM). If fair use rights were overridden and licenses restricted researchers to training AI on public domain works, scholars would be limited in the scope of inquiries that can be made using AI tools. Works in the public domain are not representative of the full scope of culture, and training AI on public domain works would omit studies of contemporary history, culture, and society from the scholarly record, as Authors Alliance and LCA described in a recent petition to the US Copyright Office. Hampering researchers’ ability to interrogate modern in-copyright materials through a licensing regime would mean that research is less relevant and useful to the concerns of the day.