SPARCS - Topic Of The Week

Comments on Data Types and AI

The robots will eat anything you give them

You paste a paragraph into an AI tool to clean it up or summarize it. It feels harmless, routine even. But the moment that information is submitted, it leaves your immediate control.

It doesn’t feel like sharing sensitive data. It feels like getting help. And that small, everyday action is where most data exposure begins.

AI tools have surged in popularity recently, from large language models (LLMs) such as Claude to transcription tools such as Granola AI. That is understandable, but many people are still unclear about what kind of data should or should not be shared with these tools, and why.

We often assume that if we pay for or subscribe to a tool, it is safe to use with high-risk data such as personal data, financial information, student records, or confidential materials. That is not entirely accurate, especially when using personal, trial, or free AI tools.

Even if a tool meets certain security standards, that does not mean every type of sensitive data is appropriate to enter into it. Security controls reduce risk, but they do not eliminate it. Before entering any data, ask yourself whether real data is actually necessary, or if a placeholder or example would work just as well.

Where your data actually goes

When you use cloud-based AI tools like ChatGPT, Gemini, or Claude, your data is processed on external servers. From that point on, you no longer fully control how it is stored, reviewed, or used. In many cases:

  • Your prompts and conversations may be stored, either temporarily or longer-term
  • They may be reviewed by humans to improve system quality
  • They could be used to train future versions of the AI (depending on settings and policies)

Even when anonymized, data is not always truly anonymous. Small details can sometimes be combined to reveal more than intended; a job title, a department name, and a date, for instance, may be enough to identify a specific person.

Researchers have also demonstrated techniques for extracting or inferring sensitive information from AI systems under certain conditions. While such attacks are not common in everyday use, they reinforce an important point: once data is shared externally, some level of control is lost.

Why this matters

The biggest risk is not misuse but routine use: the everyday habit of pasting information without thinking twice.
Most data exposure does not come from malicious intent; it happens during normal tasks such as debugging code, summarizing notes, drafting emails, or reviewing documents.

In early 2023, engineers at Samsung’s semiconductor division unintentionally leaked sensitive internal data by pasting proprietary source code and confidential meeting notes into ChatGPT while trying to work more efficiently, exposing that information outside the organization.

The same pattern applies in everyday settings, including universities. This might involve pasting assignment content, working with unpublished research, summarizing internal communications, or asking AI to review documents.

The consequences of sharing data can be significant and often extend beyond a single incident. Exposure of sensitive information can lead to privacy violations, loss of trust, or misuse of personal data. In some cases, it may also result in legal or regulatory consequences, as well as loss of intellectual property or reputational damage.

Sensitive information shared with AI tools may be retained, exposed, or combined with other data in ways that affect your privacy, finances, or digital security.

How to Safely Use AI Tools Without Exposing Sensitive Data

This is not about avoiding AI tools or discouraging their use. It is about understanding how to use them responsibly and what not to share.

Never enter:

  • Your passwords or PINs
  • Social Security, passport, or tax ID numbers
  • Credit card or banking details
  • Internal business documents
  • Medical records or diagnosis details
  • Secrets, whether personal, professional, or private

Use caution with:

  • Student or customer-related information
  • Internal emails or meeting notes
  • Contracts or administrative documents
  • Source code or research data
  • Early-stage ideas that provide competitive advantage

Safer alternatives:

  • Synthetic examples (fake but realistic data)
  • Redacted content (“My password is [REDACTED]”)
  • Placeholders instead of real values (see the sketch after this list)
  • Public or non-sensitive information
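
If you routinely paste notes, logs, or code into an AI tool, a small script can swap the most obvious sensitive values for placeholders before anything leaves your machine. The sketch below is a hypothetical Python illustration of that idea; the patterns are deliberately simplified and will miss plenty, so treat it as a starting point rather than a reliable PII filter.

    import re

    # Hypothetical pre-paste redaction sketch. These patterns are simplified
    # examples for illustration, not a complete or reliable PII detector.
    PATTERNS = {
        "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
        "[SSN]":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),           # US SSN format
        "[CARD]":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),          # 13-16 digit card numbers
        "[PHONE]": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), # US phone numbers
    }

    def redact(text: str) -> str:
        """Replace likely sensitive values with placeholders before sharing."""
        for placeholder, pattern in PATTERNS.items():
            text = pattern.sub(placeholder, text)
        return text

    note = "Reach me at jdoe@example.edu or 301-555-0123; SSN 123-45-6789."
    print(redact(note))  # Reach me at [EMAIL] or [PHONE]; SSN [SSN].

Even with a helper like this, skim the text yourself before pasting it anywhere: names, project titles, and context clues are exactly the details no simple pattern will catch.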

Generally safe uses:

  • Brainstorming ideas
  • Drafting outlines or emails without sensitive details
  • Working with publicly available information
  • Practicing with sample or fictional data

Housekeeping tips:

  • Clear your chat history regularly
  • Opt out of data use for AI training (check your settings!)
  • Review privacy policies and app permissions for the tools you use

Remember that AI is not your friend; the replies you get may feel personal, but it is still a tool and should be treated like a public-facing system rather than a private conversation.

Even if there are settings that claim to minimize or delete what’s saved, AI is still a machine, and machines, like elephants, don’t truly forget. Once your information has been entered, there’s no guarantee it can’t be stored, accessed, or resurfaced later in ways you didn’t expect.

Simple rule: if you wouldn’t post it on a public website, don’t put it into an AI tool.

While UMD offers AI tools with stronger privacy and security protections than publicly available tools, the advice above still applies. If you have any questions about data protection in the UMD AI tools, please reach out to DIT’s AI Solutions team. If you have questions about data sensitivity and usage, please contact sparcs@umd.edu.
