The corpus, or training data, used in many AI tools is proprietary and collected from various sources, so we can't know exactly what the collection contains. As a result, it's essential to be aware of potential problems with the output. The cautions and caveats collected on this page are just a starting point to get you thinking. Just because information is machine-generated doesn't mean it is accurate or impartial.
Because AI tools use aggregated information to predict which words, images, or musical notes are likely to follow or appear next to others, responses sometimes veer toward what is statistically likely rather than what is accurate or correct. These responses are called "hallucinations." Hallucinations include fabricated references, invented facts, and potentially dangerous instructions. Many students have used generative AI to produce a works cited list or list of sources for a paper, only to discover that most of the suggested articles and books don't actually exist.
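To see why likelihood-driven prediction can drift away from the truth, here is a minimal sketch of next-word prediction using a toy bigram model. The tiny "training text" and the generate helper are invented for illustration; real AI tools use vastly larger models, but the principle of choosing likely continuations is the same:

```python
from collections import defaultdict, Counter

# Toy "training text": the model only learns which words tend to
# follow which, not whether any statement is true. (Invented example.)
training_text = (
    "the library is open late "
    "the library is closed on holidays "
    "the citation is fabricated "
    "the citation is accurate "
    "the citation is fabricated"
)

# Count how often each word follows each other word (a bigram model).
follows = defaultdict(Counter)
words = training_text.split()
for prev, nxt in zip(words, words[1:]):
    follows[prev][nxt] += 1

def generate(start, length=3):
    """Repeatedly append the statistically likeliest next word."""
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break
        # Pick the most frequent follower: likely, not necessarily true.
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

print(generate("the"))  # -> "the citation is fabricated"
```

The model says "fabricated" simply because that word most often followed "is" in its training text; no fact-checking of any kind is involved, which is exactly how a fluent but false answer can emerge.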
Always fact-check AI output before using it! While generative AI is incredibly useful, keep in mind that the information it gives you may be a blend of fact and fiction, so be careful and critical.
The training data for AI tools comes from platforms where people have published information. Training data may come from social media, online forums such as Reddit, news sources like The New York Times or The Providence Journal, Wikipedia, collections of factual information, scholarly articles, and even the text of fictional books and fantasy stories.
While we don't know the exact sources of the training data, we do know that the materials were written by people with the access and means to publish on those platforms. Those who can't publish online or through traditional venues may not be represented. As a result, the output may skew toward the experiences of those with the skills and access to publish online rather than reflecting the full breadth of human experience.
Moreover, AI tools may produce output that reflects biases in the data and material they are trained on. Just as you should be critically aware of bias when evaluating news articles and other sources of information, you should watch for biased output when using AI tools. Just like humans, AI can produce information that reflects cultural, gender, or racial stereotypes.
The algorithms used to explore the training data are also proprietary, meaning users can't know precisely how the data is being mined. How the algorithms are programmed may reflect the conscious or subconscious biases of their developers.
Unless users change the tool's settings, the prompts and content provided to the tool may become part of its training data. This means that sensitive information shared with the tool could be tied to an account or surface in a response to someone else's query. Uploading a paper or PDF to an AI tool could mean losing control of your intellectual property, or someone else's.
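One practical precaution before pasting anything into an AI tool is to strip obviously sensitive details first. The sketch below is a minimal, hypothetical example; the redact helper and its patterns are invented for illustration and are not a complete privacy solution. It masks email addresses and phone numbers before the text ever leaves your machine:

```python
import re

# Illustrative patterns only; real PII detection needs far more care.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matches with placeholder tags before sending text anywhere."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

prompt = "Summarize this: contact Jane at jane.doe@example.edu or 401-555-0123."
print(redact(prompt))
# Summarize this: contact Jane at [EMAIL REDACTED] or [PHONE REDACTED].
```

Even with a step like this, the safest habit is simply not to share anything with an AI tool that you wouldn't want stored or reused.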
AI tools may also generate responses that are partially or wholly plagiarized from an author's work, without citation. Using those responses could put you at risk of a plagiarism charge.
While some news organizations, such as the Associated Press, have licensed their work for use in specific AI tools, some authors and other news organizations (notably The New York Times) are suing OpenAI (the company responsible for ChatGPT) for the unauthorized reuse of their published content. As a result, the use of AI tool output could infringe on someone's copyright.
This work is licensed under a Creative Commons Attribution 4.0 International License.