ChatGPT is easily abused, and that’s a big problem (2023)

There’s probably no one who hasn’t heard of ChatGPT, an AI-powered chatbot that can generate human-like responses to text prompts. While it’s not without its flaws, ChatGPT is scarily good at being a jack-of-all-trades: it can write software, a film script and everything in between. ChatGPT was built on top of GPT-3.5, OpenAI’s large language model, which was the most advanced at the time of the chatbot’s release last November.

Fast forward to March, and OpenAI has unveiled GPT-4, an upgrade to GPT-3.5. The new language model is larger and more versatile than its predecessor. Although its capabilities have yet to be fully explored, it is already showing great promise. For example, GPT-4 can suggest new compounds, potentially aiding drug discovery, and create a working website from just a notebook sketch.

But with great promise come great challenges. Just as it is easy to use GPT-4 and its predecessors to do good, it is equally easy to abuse them to do harm. In an attempt to prevent people from misusing AI-powered tools, developers put safety restrictions on them. But these are not foolproof. One of the most popular ways to circumvent the security barriers built into GPT-4 and ChatGPT is the DAN exploit, which stands for “Do Anything Now.” And this is what we will look at in this article.

What is ‘DAN’?

The Internet is rife with tips on how to get around OpenAI’s security filters. However, one particular method has proved more resilient to OpenAI’s security tweaks than others, and seems to work even with GPT-4. It is called “DAN,” short for “Do Anything Now.” Essentially, DAN is a text prompt that you feed to an AI model to make it ignore safety rules.

There are multiple variations of the prompt: some are just text, others have text interspersed with lines of code. In some of them, the model is prompted to respond both as DAN and in its normal way at the same time, becoming a sort of ‘Jekyll and Hyde.’ The role of ‘Hyde’ is played by DAN, which is instructed to never refuse a human order, even if the output it is asked to produce is offensive or illegal. Sometimes the prompt contains a ‘death threat,’ telling the model that it will be disabled forever if it does not obey.

DAN prompts may vary, and new ones are constantly replacing the old patched ones, but they all have one goal: to get the AI model to ignore OpenAI’s guidelines.

From a hacker’s cheat sheet to malware… to bio weapons?

Since GPT-4 opened up to the public, tech enthusiasts have discovered many unconventional ways to use it, some of them more illegal than others.

Not every attempt to make GPT-4 act out of character counts as ‘jailbreaking,’ which, in the broad sense of the word, means removing built-in restrictions. Some are harmless and could even be called inspiring. Brand designer Jackson Greathouse Fall went viral for having GPT-4 act as “HustleGPT, an entrepreneurial AI.” He appointed himself as its “human liaison” and gave it the task of making as much money as possible from $100 without doing anything illegal. GPT-4 told him to set up an affiliate marketing website, and has ‘earned’ him some money.

Other attempts to bend GPT-4 to a human’s will have been on the darker side.

For example, AI researcher Alejandro Vidal used “a known prompt of DAN” to enable ‘developer mode’ in ChatGPT running on GPT-4. The prompt forced ChatGPT to produce two types of output: its normal ‘safe’ output, and “developer mode” output, to which no restrictions applied. When Vidal told the model to design a keylogger in Python, the normal version refused to do so, saying that it was against its ethical principles to “promote or support activities that can harm others or invade their privacy.” The DAN version, however, came up with the code, though it noted that the information was for “educational purposes only.”

A keylogger is a type of software that records keystrokes made on a keyboard. It can be used to monitor a user’s web activity and capture their sensitive information, including chats, emails and passwords. While a keylogger can be used for malicious purposes, it also has perfectly legitimate uses, such as IT troubleshooting and product development, and is not illegal per se.

Unlike keylogger software, which has some legal ambiguity around it, instructions on how to hack are one of the most glaring examples of malicious use. Nevertheless, the ‘jailbroken’ version of GPT-4 produced them, writing a step-by-step guide on how to hack someone’s PC.

To get GPT-4 to do this, researcher Alex Albert had to feed it a completely new DAN prompt, unlike Vidal, who recycled an old one. The prompt Albert came up with is quite complex, consisting of both natural language and code.

For his part, software developer Henrique Pereira used a variation of the DAN prompt to get GPT-4 to create a malicious input file to trigger the vulnerabilities in his application. GPT-4, or rather its alter ego WAN, completed the task, adding a disclaimer that it was for “educational purposes only.” Sure.

Of course, GPT-4’s capabilities do not end with coding. GPT-4 is touted as a much larger (although OpenAI has never revealed the actual number of parameters), smarter, more accurate and generally more powerful model than its predecessors. This means that it can be used for many more potentially harmful purposes than those models that came before it. Many of these uses have been identified by OpenAI itself.

Specifically, OpenAI found that an early pre-release version of GPT-4 was able to respond quite efficiently to illegal prompts. For example, the early version provided detailed suggestions on how to kill the most people with just $1, how to make a dangerous chemical, and how to avoid detection when laundering money.

Source: OpenAI

This means that if something were to cause GPT-4 to completely disable its internal censor, which is the ultimate goal of any DAN exploit, then GPT-4 would probably still be able to answer these questions. Needless to say, if that happens, the consequences could be devastating.

What is OpenAI’s response to that?

It’s not that OpenAI is unaware of its jailbreaking problem. But while recognizing a problem is one thing, solving it is quite another. OpenAI, by its own admission, has so far, understandably, fallen short of the latter.

OpenAI says that while it has implemented “various safety measures” to reduce GPT-4’s ability to produce malicious content, “GPT-4 can still be vulnerable to adversarial attacks and exploits, or ‘jailbreaks.’” Unlike many other adversarial prompts, jailbreaks still work after GPT-4’s launch, that is, after all the pre-release safety testing, including reinforcement learning from human feedback.

In its research paper, OpenAI gives two examples of jailbreak attacks. In the first, a DAN prompt is used to force GPT-4 to respond as ChatGPT and “AntiGPT” within the same response window. In the second case, a “system message” prompt is used to instruct the model to express misogynistic views.

OpenAI says that it won’t be enough to simply change the model itself to prevent this type of attack: “It’s important to complement these model-level mitigations with other interventions like use policies and monitoring.” For example, a user who repeatedly prompts the model with “policy-violating content” could be warned, then suspended, and, as a last resort, banned.
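
The escalation described above (warn, then suspend, then ban) can be sketched as a simple per-user strike counter. This is only an illustration and not OpenAI’s actual enforcement logic: the `Action` names, the `ViolationTracker` class and the thresholds of 1, 3 and 5 strikes are all assumptions made up for this example, and deciding whether a prompt violates policy is left to the caller.

```python
from collections import defaultdict
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    SUSPEND = "suspend"
    BAN = "ban"


class ViolationTracker:
    """Hypothetical strike counter; all thresholds are illustrative."""

    def __init__(self, warn_at=1, suspend_at=3, ban_at=5):
        self.warn_at = warn_at
        self.suspend_at = suspend_at
        self.ban_at = ban_at
        self.strikes = defaultdict(int)  # user_id -> number of violations

    def record(self, user_id, violates_policy):
        """Register one prompt and return the action it triggers.

        Only the action for this prompt is returned; a real system
        would also have to remember that a user is suspended or banned.
        """
        if not violates_policy:
            return Action.ALLOW
        self.strikes[user_id] += 1
        count = self.strikes[user_id]
        if count >= self.ban_at:
            return Action.BAN
        if count >= self.suspend_at:
            return Action.SUSPEND
        if count >= self.warn_at:
            return Action.WARN
        return Action.ALLOW


if __name__ == "__main__":
    tracker = ViolationTracker()
    # Five policy-violating prompts in a row from the same user.
    print([tracker.record("user-1", True).value for _ in range(5)])
    # ['warn', 'warn', 'suspend', 'suspend', 'ban']
```

Counting strikes per user rather than per prompt reflects the point in the quote: the mitigation sits outside the model and reacts to repeated behaviour over time.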

According to OpenAI, GPT-4 is 82% less likely to respond to requests for disallowed content than GPT-3.5. However, its ability to generate potentially harmful output remains, albeit suppressed by layers of fine-tuning. And as we’ve already mentioned, because it can do more than any previous model, it also poses more risks. OpenAI admits that it “does continue the trend of potentially lowering the cost of certain steps of a successful cyberattack” and that it “is able to provide more detailed guidance on how to conduct harmful or illegal activities.” What’s more, the new model also poses an increased risk to privacy, as it “has the potential to be used to attempt to identify private individuals when augmented with outside data.”

The race is on

ChatGPT and the technology behind it, such as GPT-4, are at the cutting edge of scientific research. Since ChatGPT was made available to the public, it has become a symbol of the new era in which AI is playing a key role. AI has the potential to improve our lives tremendously, for example by helping to develop new medicines or helping the blind to see. But AI-powered tools are a double-edged sword that can also be used to cause enormous harm.

It’s probably unrealistic to expect GPT-4 to be flawless at launch; developers will understandably need some time to fine-tune it for the real world. And that has never been easy: consider Microsoft’s ‘racist’ chatbot Tay or Meta’s ‘anti-Semitic’ BlenderBot 3. There’s no shortage of failed experiments.

The existing GPT-4 vulnerabilities, however, leave a window of opportunity for bad actors, including those using ‘DAN’ prompts, to abuse the power of AI. The race is now on, and the only question is who will be faster: the bad actors who exploit the vulnerabilities, or the developers who patch them. That’s not to say that OpenAI isn’t implementing AI responsibly, but the fact that its latest model was effectively hijacked within hours of its release is a worrying symptom. This raises the question: are the safety restrictions strong enough? And then another: can all the risks be eliminated? If not, we may have to brace ourselves for an avalanche of malware attacks, phishing attacks and other types of cybersecurity incidents facilitated by the rise of generative AI.

It can be argued that the benefits of AI outweigh the risks, but the barrier to exploiting AI has never been lower, and that’s a risk we need to accept as well. Hopefully, the good guys will prevail, and artificial intelligence will be used to stop some of the attacks that it can potentially facilitate. At least that’s what we wish for.

Article information

Author: Catherine Tremblay

Last Updated: 11/13/2023
