Escape / Unescape Tools

Escape and unescape special characters for HTML, JavaScript, SQL, Shell commands, Regular Expressions, and XML to prevent injection attacks and ensure proper encoding.

HTML Escape: Escapes HTML special characters to prevent XSS attacks and display HTML code as text

Common Escape Sequences

HTML

< → &lt;
> → &gt;
& → &amp;
" → &quot;

JavaScript

" → \"
' → \'
newline → \n
tab → \t

SQL

' → ''
\ → \\ (MySQL)
" → \" (MySQL)
NUL byte → \0 (MySQL)

⚠️ Security Notice

Always escape user input before using it in HTML, SQL queries, shell commands, or other contexts where injection attacks are possible. This tool is for development purposes - use proper security libraries in production applications.

Why Escape Characters?

Escaping special characters is crucial for security and proper data handling in programming. It prevents injection attacks and ensures data is interpreted correctly.

✓Security: Prevent XSS, SQL injection, and command injection
✓Data Integrity: Preserve special characters in strings
✓Compatibility: Ensure proper parsing across systems

Common Use Cases

•HTML: Display code snippets, prevent XSS attacks
•JavaScript: Include quotes in strings, JSON data
•SQL: Safely insert user data into queries
•Shell: Handle filenames with spaces, special chars
•Regex: Match literal special characters
•XML: Encode data in XML/SOAP messages

Security Best Practices

Always Escape

• User input before displaying in HTML
• Variables in SQL queries (or use prepared statements)
• User data in shell commands
• Dynamic content in JavaScript strings
• Untrusted data in XML documents

Additional Measures

• Use parameterized queries for databases
• Implement Content Security Policy (CSP)
• Validate input on both client and server
• Use security libraries in production
• Regular security audits and updates

Injection Attack Examples

XSS (Cross-Site Scripting)

<script>alert('XSS')</script>

Unescaped HTML can execute malicious scripts

SQL Injection

'; DROP TABLE users; --

Unescaped SQL can destroy database tables

Command Injection

; rm -rf /

Unescaped shell commands can damage systems

How Character Escaping Works

Fundamentals of Character Escaping

Character escaping transforms special characters into safe representations that won't be interpreted as code or commands. Each context (HTML, JavaScript, SQL, etc.) has its own set of special characters that need escaping. For example, in HTML, < becomes &lt; so that it cannot start a tag. In JavaScript strings, quotes must be escaped, as in "He said \"hello\"", so that they don't close the string prematurely.

The fundamental principle: special characters that have syntactic meaning in a language must be represented differently when they appear as data rather than code. Without escaping, interpreters can't distinguish between data containing special characters and actual code. This confusion is the root cause of injection vulnerabilities—attackers inject code disguised as data.

Context awareness is critical. A string safe in one context may be dangerous in another. The value <script>alert(1)</script> is harmless while it is held in a JavaScript string variable but dangerous once it is written into HTML. Proper escaping requires knowing the destination context and applying the appropriate transformation. Escaping for the wrong context provides no protection and may introduce new vulnerabilities.

Encoding versus escaping: related but distinct concepts. Encoding transforms data for transmission or storage (Base64, URL encoding), while escaping prevents interpretation as code. URL encoding converts spaces to %20 for transport; URL escaping in JavaScript uses encodeURIComponent() to prevent injection. Sometimes the same technique serves both purposes, but understanding the distinction clarifies when and why to apply each.

Double-escaping problems occur when data is escaped multiple times, producing incorrect output. Escaping < produces &lt;, and escaping that result again produces &amp;lt; because the ampersand itself gets escaped. This creates visual artifacts (the browser displays &lt; instead of <). Always track escaping state: has this data been escaped already? Frameworks often handle this automatically, but manual escaping requires care to avoid double-escaping or missing escaping entirely.
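
A minimal Python sketch of the double-escaping effect, using the standard library's html.escape. The sample string is an assumption chosen for illustration.

```python
import html

raw = "a < b"

once = html.escape(raw)    # escaped exactly once: correct for HTML output
twice = html.escape(once)  # escaped again by mistake: the & in &lt; becomes &amp;

print(once)   # a &lt; b
print(twice)  # a &amp;lt; b

# Unescaping one level recovers the correctly escaped form.
recovered = html.unescape(twice)
```

A browser renders the second value as the visible text "a &lt; b", which is the artifact described above.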

HTML and XML Escaping

HTML entity encoding replaces reserved characters with named or numeric entities. Five characters require escaping in HTML: < becomes &lt;, > becomes &gt;, & becomes &amp;, " becomes &quot;, and ' becomes &#39; or &apos;. These entities prevent characters from being interpreted as HTML syntax. For example, <script> becomes &lt;script&gt;, which displays as text rather than executing.

Attribute context requires additional care. In HTML attributes, the quote character that delimits the attribute value must be escaped. In <div title="value">, a double quote inside value must be written as &quot;. If using single-quoted attributes (<div title='value'>), escape single quotes as &#39; instead. Unquoted attributes are still valid HTML but require escaping spaces and many other characters, so always use quoted attributes.
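
As a rough illustration of attribute escaping, the Python sketch below builds a double-quoted title attribute. The helper name title_attribute and the payload are hypothetical; html.escape with quote=True is the standard library call.

```python
import html

def title_attribute(value: str) -> str:
    """Return a double-quoted title attribute with the value escaped."""
    # quote=True also escapes " and ', so the value cannot close the attribute.
    return f'title="{html.escape(value, quote=True)}"'

payload = '" onmouseover="alert(1)'
print(title_attribute(payload))
# title="&quot; onmouseover=&quot;alert(1)"
```

Because the injected quote is emitted as &quot;, the browser keeps the whole payload inside the attribute value instead of starting a new attribute.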

JavaScript-in-HTML contexts are particularly dangerous. Event handlers like onclick="doSomething()" contain JavaScript within HTML attributes, creating nested contexts. Data must be escaped for JavaScript first, then for HTML attributes. Simply HTML-escaping isn't sufficient—attackers can inject JavaScript code that remains valid after HTML decoding. Use JSON encoding for passing data to inline JavaScript.

XML escaping follows similar rules to HTML but is stricter. The five XML entities (< > & " ') must be escaped in content and attributes. CDATA sections provide an alternative: <![CDATA[content]]> treats content as literal text without entity escaping. However, CDATA itself must not contain ]]>, requiring splitting if needed. XML parsers reject malformed entities, providing some protection against injection.
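
The two XML options above (entity escaping and CDATA with splitting) might look like this in Python. The helpers xml_escape and to_cdata are hypothetical names; escape comes from the standard xml.sax.saxutils module, and the round trip through ElementTree is only there to check the result.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def xml_escape(text: str) -> str:
    """Escape the five XML special characters for content or attribute values."""
    # escape() handles &, < and > itself; the quotes are passed as extra entities.
    return escape(text, {'"': "&quot;", "'": "&apos;"})

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ]]> so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

value = 'Tom & "Jerry" <cat>'
doc = f'<msg note="{xml_escape(value)}">{to_cdata("a ]]> b")}</msg>'

# Parsing the document gives back the original, unescaped values.
parsed = ET.fromstring(doc)
```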

Unicode and character encoding add complexity. HTML supports numeric character references (&#65; for A in decimal, &#x41; for the same character in hex). Some characters (like zero-width spaces, directional overrides) can create visual confusion or bypass filters without being traditional injection. Normalize Unicode (NFC) before escaping to prevent bypasses using equivalent representations. Ensure your application uses UTF-8 consistently to avoid encoding-based attacks.

JavaScript and JSON Escaping

JavaScript string escaping uses backslash for special characters. Single and double quotes must be escaped (\' and \"), as must backslash itself (\\). Control characters use escape sequences: \n (newline), \t (tab), \r (carriage return). Unicode escapes (\uXXXX) represent characters by code point. Template literals use backticks and require escaping literal backticks (\`) and, when no interpolation is intended, the ${ sequence (\${).

JSON is stricter than JavaScript. Only double-quoted strings are allowed, with no single quotes or template literals. Required escapes: " becomes \", \ becomes \\, control characters use \n \t \r etc., and Unicode characters can be written as \uXXXX. JSON has no representation for undefined, functions, dates, or regular expressions; it supports only strings, numbers, booleans, null, objects, and arrays. JSON.stringify() omits undefined and function properties (or writes null for them inside arrays), converts Date objects to ISO 8601 strings, and serializes a RegExp as an empty object; it throws on BigInt values and circular references.
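
A short sketch of JSON string escaping using Python's json module, with made-up sample values. Note that the behaviour for unsupported types shown here is Python's: json.dumps raises TypeError, which differs from JavaScript's JSON.stringify.

```python
import json

text = 'He said "hi"\n\tpath: C:\\temp'

# dumps() adds the surrounding double quotes and escapes ", \ and control chars.
encoded = json.dumps(text)

# By default non-ASCII characters are written as \uXXXX escapes.
accented = json.dumps("é")
print(accented)  # "\u00e9"

# Types with no JSON representation are rejected rather than silently dropped.
try:
    json.dumps({"tags": {"a", "b"}})  # a set is not JSON serializable
    rejected = False
except TypeError:
    rejected = True
```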

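HTML-in-JavaScript strings create nested contexts requiring careful escaping. When generating HTML from JavaScript, as in const html = "<div>" + data + "</div>", the data must be HTML-escaped before concatenation. If data contains markup with an event handler, such as <img src=x onerror=alert(1)>, the handler runs when the HTML is inserted into the DOM with innerHTML (a <script> element inserted this way does not execute, but event handler attributes do). Prefer textContent, which never parses its value as HTML; use innerHTML only with properly escaped content, or better yet, build nodes with DOM methods such as createElement and createTextNode.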

Regular expression escaping prevents regex metacharacters from being interpreted as patterns. Characters like . * + ? [ ] ( ) { } ^ $ | \ have special meaning in regex. To match them literally, escape with backslash: \. matches a period, not any character. When creating regex from user input, escape all metacharacters or use the RegExp constructor carefully. Unescaped user input in regex enables ReDoS (Regular Expression Denial of Service) attacks.
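
To illustrate literal matching, here is a small Python sketch using re.escape. The helper find_literal is a hypothetical name.

```python
import re

def find_literal(needle: str, haystack: str) -> list[str]:
    """Find every occurrence of needle as literal text, not as a pattern."""
    return re.findall(re.escape(needle), haystack)

text = "1.5 and 125"

# Unescaped, the dot matches any character, so "125" also matches.
print(re.findall("1.5", text))    # ['1.5', '125']

# Escaped, only the literal "1.5" matches.
print(find_literal("1.5", text))  # ['1.5']
```

Escaping user input this way also keeps users from supplying their own pattern syntax, such as nested quantifiers.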

Script tag content handling is tricky. <script> tags treat content as JavaScript, not HTML. HTML entities aren't decoded there, so &lt; stays as the literal text &lt; rather than becoming <. The string </script> inside a script tag closes the tag, even inside a string literal: var x = "</script>"; breaks the page. Write it as "<\/script>" or use JSON encoding that also escapes <. External scripts with src attributes avoid this entirely and are recommended for non-trivial JavaScript.
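
One common way to make JSON safe inside an inline script block is to replace <, > and & with \uXXXX escapes after encoding. The Python sketch below assumes that approach; json_for_script is a hypothetical helper, not a standard library function.

```python
import json

def json_for_script(value) -> str:
    """Serialize value as JSON that cannot close an enclosing <script> tag."""
    encoded = json.dumps(value)
    # In JSON output these characters only occur inside strings, where the
    # \uXXXX form is an equivalent spelling of the same character.
    return (encoded.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))

data = {"comment": "</script><script>alert(1)</script>"}
snippet = f"<script>var data = {json_for_script(data)};</script>"
```

The only </script> left in snippet is the real closing tag, and a JSON or JavaScript parser still decodes the original string.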

SQL and Database Escaping

SQL injection is prevented primarily by parameterized queries (prepared statements), not escaping. Parameterized queries separate SQL code from data: SELECT * FROM users WHERE id = ? with parameter binding. The database driver handles escaping automatically and correctly. This is the gold standard for SQL injection prevention—always use prepared statements when available.
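
A minimal sketch of parameter binding using Python's built-in sqlite3 module and an in-memory database. The users table, its rows, and the find_user helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("alice",), ("O'Reilly",)])

def find_user(name: str) -> list[tuple]:
    """Look up a user by name; the value is bound, never spliced into the SQL."""
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()

print(find_user("O'Reilly"))     # [(2, "O'Reilly")]
print(find_user("' OR '1'='1"))  # []
```

The injection attempt is treated as an ordinary (non-matching) name because the SQL text never changes.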

When parameterization isn't possible (dynamic table/column names, LIMIT clauses in some databases), manual escaping is necessary. Single quotes are escaped by doubling: O'Reilly becomes O''Reilly. Backslash escaping (\') works in MySQL with certain settings but not all databases. Database-specific escape functions (mysql_real_escape_string, pg_escape_string) handle this correctly for each database.

Identifier escaping (table/column names) uses different delimiters. MySQL uses backticks: `user-name`, PostgreSQL uses double-quotes: "user-name", SQL Server uses square brackets: [user-name]. These allow identifiers with special characters or reserved words. However, never build identifiers from user input—use allowlists of valid names instead. Even with escaping, dynamic identifiers are risky.
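
The allowlist approach for identifiers could be sketched as follows in Python. The column names and the order_by_clause helper are hypothetical.

```python
# Hypothetical set of columns the application allows users to sort by.
ALLOWED_SORT_COLUMNS = {"name", "created_at"}

def order_by_clause(column: str) -> str:
    """Return an ORDER BY clause only for columns on the allowlist."""
    if column not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {column!r}")
    # Safe to interpolate: the value is one of our own constants.
    return f'ORDER BY "{column}"'

print(order_by_clause("name"))  # ORDER BY "name"
```

The user's input only selects among known identifiers, so no user-supplied text reaches the SQL string.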

LIKE clause escaping requires extra care. LIKE uses % (multi-character wildcard) and _ (single-character wildcard). When searching for literal %, it must be escaped: user input % becomes \% (or database-specific syntax). If not escaped, user input could wildcard-match more than intended. Define an escape character: LIKE ? ESCAPE '\' and escape %, _, and \ in the parameter.
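
A sketch of LIKE escaping combined with a bound parameter, using Python's sqlite3 module. The items table, its rows, and the helper names are assumptions. SQLite accepts ESCAPE '\' as written; other databases may need the backslash spelled differently.

```python
import sqlite3

def escape_like(term: str, escape_char: str = "\\") -> str:
    """Escape LIKE wildcards so the term matches literally."""
    # Escape the escape character first so later replacements are not doubled.
    return (term.replace(escape_char, escape_char * 2)
                .replace("%", escape_char + "%")
                .replace("_", escape_char + "_"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany("INSERT INTO items VALUES (?)",
                 [("100% cotton",), ("1000 meters",)])

def search(term: str) -> list[str]:
    """Substring search where % and _ in the term are treated as literals."""
    pattern = "%" + escape_like(term) + "%"
    rows = conn.execute(
        "SELECT name FROM items WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]

print(search("100%"))  # ['100% cotton']
```

Without escape_like, the pattern %100%% would also match "1000 meters".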

NoSQL injection affects MongoDB and similar databases. While traditional SQL injection doesn't apply, similar issues exist. User input in query operators can inject logic: {username: req.body.username} is vulnerable if req.body.username is {$gt: ""} (matches all users). Validate input types—expect strings, reject objects. Use schema validation and proper ORM/ODM libraries that parameterize queries properly.

Shell and Command Escaping

Shell command injection is extremely dangerous because shells interpret many special characters: ; | & $ ` \ " ' ( ) { } [ ] < > * ? ! and more. Injecting ; allows executing additional commands. Backticks or $() execute subcommands. Pipes and redirects alter command behavior. Properly escaping all these characters is error-prone—better to avoid shell execution entirely.

Safer alternatives to shell execution include direct process spawning. In Node.js, use child_process.execFile() or spawn() instead of exec()—these don't invoke a shell, so shell metacharacters are literal. Python's subprocess.run() with shell=False does the same. Pass command and arguments as separate array elements, not a single string: ['ls', '-la', userInput] is safe, 'ls -la ' + userInput is not.
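
The argument-list approach can be sketched in Python with subprocess.run, which does not use a shell unless shell=True is passed. To stay portable, the child process here is the Python interpreter itself echoing its argument; echo_argument is a hypothetical helper.

```python
import subprocess
import sys

def echo_argument(user_input: str) -> str:
    """Run a child process with user_input as one literal argument (no shell)."""
    result = subprocess.run(
        # Each list element becomes exactly one argv entry in the child.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")

print(echo_argument("; echo pwned"))  # ; echo pwned
```

The semicolon reaches the child as plain text because no shell ever parses the command line.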

When shell execution is unavoidable (complex pipelines, shell-specific features), use proper quoting. POSIX shells support single quotes which prevent all expansion: 'user input' treats everything literally except single quotes. To include a single quote, close the quoted string, add a backslash-escaped quote, and reopen the string: 'don'\''t'. Double quotes prevent most expansion but allow $ and backticks. Libraries like shell-escape handle this complexity correctly.
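
In Python, the standard library's shlex.quote performs this kind of POSIX single-quote escaping. The sketch below builds a pipeline string without running it; build_grep_command is a hypothetical helper, and shlex.split is used only to show how a POSIX shell would tokenize the result.

```python
import shlex

def build_grep_command(pattern: str, path: str) -> str:
    """Build a shell pipeline with both user-supplied values quoted."""
    # "--" marks the end of options, so a pattern starting with "-" stays a pattern.
    return f"grep -- {shlex.quote(pattern)} {shlex.quote(path)} | wc -l"

command = build_grep_command("it's; rm -rf /", "my file.txt")

# Tokenizing the command shows each value survives as a single word.
tokens = shlex.split(command)
```

shlex.quote writes an embedded single quote as '"'"' rather than '\'' but the shell reads both forms the same way.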

Argument escaping for specific commands may have special requirements. Arguments starting with - might be interpreted as options. Use -- to signal end of options: command -- -userInput treats -userInput as an argument, not an option. Some commands have --arg=value syntax; others require separate tokens. Read command documentation and test thoroughly with adversarial inputs.

Path traversal prevention is related to shell escaping. User input in file paths can include ../ to access parent directories, potentially accessing sensitive files. Validate paths: resolve to absolute paths, then check they're within allowed directories. Use path.resolve() to normalize the path and verify the result still sits under the allowed base directory; path.join() on its own does not stop ../ sequences. Never directly concatenate user input into file paths or shell commands.
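
The resolve-then-check step might look like this in Python, using os.path in place of the Node.js path functions mentioned above. The helper safe_join and the /srv/uploads directory are hypothetical.

```python
import os

def safe_join(base_dir: str, user_path: str) -> str:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments and follows symlinks before the check.
    target = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return target

inside = safe_join("/srv/uploads", "docs/report.txt")
```

Calls such as safe_join("/srv/uploads", "../../etc/passwd") raise ValueError instead of returning a path outside the base directory.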

Context-Specific Considerations

LDAP injection requires escaping special characters in Lightweight Directory Access Protocol queries. Characters like * ( ) \ NUL must be escaped with backslash-hex notation: * becomes \2a, ( becomes \28. LDAP filters are similar to SQL queries—never concatenate user input. Use parameterized LDAP libraries or strict validation and escaping. LDAP injection can bypass authentication or access unauthorized data.
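
The backslash-hex escaping for LDAP filter values can be sketched in a few lines of Python. escape_ldap_filter is a hypothetical helper written for illustration; LDAP client libraries usually provide an equivalent function, which is preferable to a hand-rolled one.

```python
# Characters with special meaning in LDAP search filters and their escapes.
_LDAP_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}

def escape_ldap_filter(value: str) -> str:
    """Escape a value for use inside an LDAP search filter."""
    # Character-by-character mapping avoids re-escaping inserted backslashes.
    return "".join(_LDAP_FILTER_ESCAPES.get(ch, ch) for ch in value)

user_input = "*)(uid=*"
ldap_filter = f"(cn={escape_ldap_filter(user_input)})"
print(ldap_filter)  # (cn=\2a\29\28uid=\2a)
```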

CSV injection (formula injection) is a less-known but serious vulnerability. Spreadsheet applications interpret cells starting with = + - @ as formulas. User input starting with these characters can execute formulas that access files, call web services, or execute commands (depending on application). Escape by prefixing with single quote: '=1+1 displays literally. Export systems must sanitize CSV data to prevent formula injection.
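
A small Python sketch of the single-quote prefix described above, applied while writing CSV with the standard csv module. The helper names are hypothetical. Note that this simple rule also prefixes legitimate values such as negative numbers, so real exports may need a more selective check.

```python
import csv
import io

FORMULA_PREFIXES = ("=", "+", "-", "@")

def neutralize_cell(value: str) -> str:
    """Prefix cells that a spreadsheet would otherwise treat as formulas."""
    if value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value

def rows_to_csv(rows) -> str:
    """Write rows as CSV text with every cell neutralized first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([neutralize_cell(str(cell)) for cell in row])
    return buffer.getvalue()

output = rows_to_csv([["name", "note"], ["bob", "=1+1"]])
```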

Email header injection happens when user input appears in email headers without validation. Newlines (\r\n) in headers can inject additional headers like Bcc: or change Subject:. Validate email addresses strictly—no newlines, control characters, or unexpected syntax. Use email libraries that parameterize headers correctly rather than string concatenation. Similar issues affect HTTP response header injection.
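
A minimal sketch of header validation in Python: reject any value containing a carriage return or line feed, then let the standard email.message.EmailMessage class build the headers. The helpers safe_header_value and build_message are hypothetical names.

```python
from email.message import EmailMessage

def safe_header_value(value: str) -> str:
    """Reject header values that contain CR or LF characters."""
    if "\r" in value or "\n" in value:
        raise ValueError("header values must not contain CR or LF")
    return value

def build_message(to_addr: str, subject: str, body: str) -> EmailMessage:
    """Build a message whose headers come from validated values only."""
    msg = EmailMessage()
    msg["To"] = safe_header_value(to_addr)
    msg["Subject"] = safe_header_value(subject)
    msg.set_content(body)
    return msg

message = build_message("user@example.com", "Hello", "Hi there")
```

A subject such as "Hi\r\nBcc: attacker@example.com" raises ValueError before any header is set.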

XML External Entity (XXE) attacks exploit XML parsers that process external entities. An attacker includes <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]> to read local files. Disable external entity processing in XML parsers—this is configuration, not escaping. Parse untrusted XML with external entities disabled. Validate and sanitize XML input, not just escape characters.

Content Security Policy (CSP) provides defense-in-depth for web applications. Even with perfect escaping, CSP limits damage from successful XSS by restricting script sources, forbidding inline scripts, and blocking eval(). Set strict CSP headers: script-src 'self' disallows inline and external scripts except from same origin. CSP doesn't replace escaping but provides an additional security layer when escaping fails.

FAQ

What's the difference between escaping and encoding?

Escaping prevents characters from being interpreted as code by transforming them (HTML: < to &lt;). Encoding transforms data for transmission or storage (Base64, URL encoding). While related, they serve different purposes. Sometimes the same technique does both: URL encoding %20 for spaces encodes for HTTP transport and escapes to prevent injection. Context determines which term applies.

Do I always need to escape user input?

Yes, when incorporating user input into code contexts (HTML, SQL, shell). However, the method varies: use parameterized queries for SQL (not escaping), use textContent instead of innerHTML for HTML (prevents interpretation as HTML), or avoid shell execution entirely. "Escape user input" is shorthand for "safely handle user input in code contexts"—sometimes this means parameterization or avoidance rather than literal escaping.

Can I just remove dangerous characters instead of escaping?

No, this approach (denylisting) is fragile and often breaks legitimate use cases. Users with names like O'Brien or businesses with <Company> can't use your application. Attackers find bypasses using Unicode equivalents, alternative syntax, or overlooked characters. Instead, escape properly to preserve the data while removing the danger. Validation (ensuring input matches expected format) complements but doesn't replace escaping.

My framework already escapes output. Do I still need to worry?

Modern frameworks (React, Vue, Angular) auto-escape by default, providing good protection for most cases. However, some contexts bypass escaping: dangerouslySetInnerHTML in React, v-html in Vue. SQL still requires parameterized queries—frameworks don't handle database escaping. Shell commands, external APIs, and other contexts need explicit handling. Understand what your framework protects and what requires manual attention.