The generative AI models that power deepfake tools were trained on billions of images scraped from the internet, including social media photos, news photography, and other platform content. General-purpose training sets are vast, but the more targeted threat is fine-tuning: taking a general model and training it further on a small collection of reference photos of a specific individual so that it can generate that person's likeness in any scenario. Fine-tuning can require as few as 10-20 photos, a number trivially available for anyone with a public social media presence. The collection of data for AI training is currently subject to little legal regulation, though the outputs, deepfake non-consensual intimate imagery (NCII), are clearly regulated.

Key facts about this term

  1. Fine-tuning on scraped photos enables personalized deepfakes. General AI models are fine-tuned on scraped photos of a specific individual to create models capable of generating any scene featuring that person's realistic likeness, including intimate scenes.
  2. Public social media is the primary scraping source. Profile photos, tagged photos, and publicly visible posts can all be harvested by AI scraping bots. Minimizing public photo exposure reduces fine-tuning risk.
  3. The TAKE IT DOWN Act regulates outputs, not training. Current U.S. law does not specifically regulate AI training data collection. The TAKE IT DOWN Act regulates the resulting intimate imagery (the output), regardless of how the model was trained.

Frequently asked questions

Can I opt my photos out of AI training datasets?

In some cases. Some AI companies offer opt-out mechanisms for their training data, and tools such as Google's AI opt-out and HaveIBeenTrained.com let you request removal from known datasets. Compliance is voluntary, however, so an opt-out request does not guarantee removal.

Is scraping social media photos for AI training illegal?

Courts have issued conflicting rulings on whether scraping violates copyright or platform terms of service. Publishing NCII generated from scraped photos, however, is clearly illegal under the TAKE IT DOWN Act, even if the legality of the underlying data collection remains contested.