URL Encoder/Decoder

Encode special characters for URLs and decode URL-encoded strings. Essential for web developers working with query parameters and form data.

encodeURI() Result (Preserves URL structure)

Best for complete URLs. Preserves: : / ? # @ ! $ & ' ( ) * + , ; = (square brackets [ ] are percent-encoded)

encodeURIComponent() Result (Encodes structural characters too)

Best for query parameters. Only preserves: A-Z a-z 0-9 - _ . ! ~ * ' ( )

URL Encoding Guide

When to use encodeURI():

• Encoding a complete URL
• Preserving the URL structure
• Working with full paths

When to use encodeURIComponent():

• Encoding query parameters
• Encoding form data
• Encoding URL fragments

Common encoded characters: Space=%20 or +, &=%26, ?=%3F, /=%2F, #=%23, @=%40

What is URL Encoding?

URL encoding (also known as percent encoding) is a mechanism to encode information in a Uniform Resource Identifier (URI) under certain circumstances. It replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits.

This encoding is necessary because URLs can only contain a limited set of characters from the ASCII character set. Special characters like spaces, ampersands, and non-ASCII characters must be encoded.

Common Use Cases

• Query Parameters: Encoding form data in URL parameters
• API Requests: Encoding data for GET requests
• Search Queries: Encoding search terms with special characters
• Form Submissions: Encoding form data before transmission
• File Paths: Encoding file names with special characters

Quick Reference

Common Encodings

• Space: %20
• ! : %21
• # : %23
• & : %26
• + : %2B

Reserved Characters

• : / ? # [ ] @
• ! $ & ' ( ) * + , ; =
• Must be encoded in certain contexts

Safe Characters

• A-Z, a-z, 0-9
• - . _ ~
• No encoding needed

How URL Encoding Works

The Mechanics of Percent Encoding

URL encoding, formally called percent encoding, converts characters into a format that can be safely transmitted over the internet. The process is straightforward: letters (A-Z, a-z), digits (0-9), and the marks - . _ ~ are unreserved and pass through unchanged, and reserved characters pass through when they act as delimiters. Any other character is converted to a percent sign (%) followed by two hexadecimal digits for each byte of its UTF-8 encoding.

For example, a space character has the ASCII value 32, which is 0x20 in hexadecimal, so it encodes to %20. The ampersand (&) has ASCII value 38 (0x26), encoding to %26. The encoding process: take the character, find its UTF-8 byte representation, convert each byte to hexadecimal, and prepend each with %. For multi-byte UTF-8 characters like emoji, each byte gets separately encoded.
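The steps above can be sketched in a few lines of JavaScript. This is a minimal illustration rather than how any browser implements it: percentEncode is a hypothetical helper name, and it assumes that only the RFC 3986 unreserved set should pass through unencoded.

```javascript
// Hypothetical helper: percent-encode every byte outside the unreserved set.
function percentEncode(text) {
  const unreserved = /[A-Za-z0-9\-._~]/;
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes of the input
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && unreserved.test(ch)) {
      out += ch; // unreserved ASCII passes through unchanged
    } else {
      // every other byte becomes % plus two uppercase hex digits
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("a b&c")); // a%20b%26c
console.log(percentEncode("😀"));    // %F0%9F%98%80
```

In real code the built-in encodeURIComponent() is usually the better choice. The sketch only makes the byte-by-byte process visible.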

The encoding rules distinguish between different parts of a URL. In query strings (everything after ?), spaces can be encoded as either %20 or +, though %20 is more universal. In path components (/path/to/resource), + doesn't mean space, so %20 must be used. Reserved characters like / and ? must be encoded when they're used as data rather than delimiters. For instance, if a query parameter value contains a question mark, it must be encoded as %3F.
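The difference between + and %20 shows up in JavaScript's built-in functions. The short sketch below uses only standard globals (URLSearchParams, encodeURIComponent, decodeURIComponent); the variable names are illustrative.

```javascript
// "+" means space only under form-style (application/x-www-form-urlencoded) parsing.
const formStyle = new URLSearchParams("q=hello+world").get("q");
const percentStyle = new URLSearchParams("q=hello%20world").get("q");
console.log(formStyle);    // hello world
console.log(percentStyle); // hello world

// decodeURIComponent() does plain percent-decoding and leaves "+" alone.
const plainDecode = decodeURIComponent("hello+world");
console.log(plainDecode); // hello+world

// Going the other way, the two serializers write a space differently.
const viaComponent = encodeURIComponent("hello world");
const viaParams = new URLSearchParams({ q: "hello world" }).toString();
console.log(viaComponent); // hello%20world
console.log(viaParams);    // q=hello+world
```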

Decoding reverses this process. When a browser or server encounters %XX, it interprets XX as hexadecimal and converts it back to the original character. Multiple percent-encoded bytes are combined to reconstruct multi-byte UTF-8 characters. The URL https://example.com/search?q=hello%20world decodes to https://example.com/search?q=hello world.
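As a rough sketch of that reverse process, the hypothetical helper below collects raw bytes and then decodes them as UTF-8. It assumes malformed sequences such as %zz should be left as literal text, which is a choice made for this illustration. The built-in decodeURIComponent() throws a URIError in that situation instead.

```javascript
// Hypothetical helper: percent-decode a string by rebuilding its UTF-8 bytes.
function percentDecode(text) {
  const encoder = new TextEncoder();
  const bytes = [];
  let i = 0;
  while (i < text.length) {
    const hex = text.slice(i + 1, i + 3);
    if (text[i] === "%" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // %XX becomes one raw byte
      i += 3;
    } else {
      // literal characters contribute their own UTF-8 bytes
      const ch = String.fromCodePoint(text.codePointAt(i));
      bytes.push(...encoder.encode(ch));
      i += ch.length;
    }
  }
  // consecutive bytes are combined back into multi-byte characters here
  return new TextDecoder().decode(Uint8Array.from(bytes));
}

console.log(percentDecode("hello%20world"));  // hello world
console.log(percentDecode("%F0%9F%98%80"));   // 😀
```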

Modern URLs can contain international characters through Internationalized Resource Identifiers (IRIs). These are converted to URIs by percent-encoding their UTF-8 representation. The domain part uses Punycode for international domain names (IDN), while the path and query parts use percent encoding. This allows URLs like https://例え.jp/検索?q=テスト to be represented as ASCII-compatible URIs.
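One way to observe this conversion is the standard URL class available in browsers and Node.js. The snippet below is only a sketch; the exact Punycode host output assumes a runtime with full internationalization support.

```javascript
// The URL parser converts the host to Punycode and percent-encodes path and query.
const url = new URL("https://例え.jp/検索?q=テスト");

console.log(url.hostname); // Punycode form of the domain (starts with "xn--")
console.log(url.pathname); // /%E6%A4%9C%E7%B4%A2
console.log(url.search);   // ?q=%E3%83%86%E3%82%B9%E3%83%88
console.log(url.href);     // the full ASCII-only URI
```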

Historical Development and RFC Standards

URL encoding emerged with the birth of the World Wide Web in the early 1990s. Tim Berners-Lee co-authored the original URL specification, RFC 1738 (1994), which established the percent encoding mechanism. The need arose because URLs had to work across different computer systems with varying character sets, and many systems couldn't handle non-ASCII characters or certain special characters safely.

The specification evolved through several RFCs. RFC 2396 (1998) refined URI syntax and clarified which characters needed encoding. RFC 3986 (2005), the current standard, unified URI handling and defined the unreserved characters (A-Z, a-z, 0-9, -, ., _, ~) that never need encoding. This RFC also formalized the distinction between URIs (Uniform Resource Identifiers) and URLs (Uniform Resource Locators), with URLs being a subset of URIs.

The plus sign (+) as an encoding for space is a legacy from HTML form submission (application/x-www-form-urlencoded). In this encoding, spaces in form data are converted to + for backward compatibility with early web servers. However, %20 is the standard percent encoding for space according to RFC 3986. Modern APIs typically accept both but generate %20 to avoid ambiguity.

RFC 3987 (2005) introduced IRIs (Internationalized Resource Identifiers), allowing Unicode characters directly in URLs. However, these must still be converted to percent-encoded ASCII for actual transmission over HTTP. Browsers handle this conversion transparently—you can type a URL with Chinese characters, and the browser encodes it automatically before sending the request.

Reserved vs Unreserved Characters

RFC 3986 divides characters into several categories with specific encoding rules. Unreserved characters (A-Z, a-z, 0-9, -, ., _, ~) can appear in URLs without encoding and maintain their literal meaning everywhere. These 66 characters form the safe core of URL syntax and never need percent encoding.

Reserved characters have special meaning in URL syntax and must be percent-encoded when used literally: : / ? # [ ] @ are gen-delims (general delimiters), while ! $ & ' ( ) * + , ; = are sub-delims (sub-component delimiters). For example, / separates path segments, so if you want an actual slash in a filename, it must be encoded as %2F. Similarly, ? starts the query string, so a literal question mark in data needs to be %3F.
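To make the slash example concrete, here is a small sketch that encodes each path segment separately so that reserved characters inside the data cannot be mistaken for delimiters. buildPath is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: join data values into a path, one encoded segment each.
function buildPath(...segments) {
  return "/" + segments.map((segment) => encodeURIComponent(segment)).join("/");
}

// The "/" and "?" inside the data are encoded; the separators between segments are not.
console.log(buildPath("files", "reports/2024", "what?.txt"));
// /files/reports%2F2024/what%3F.txt
```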

Context matters critically. In query string values, & separates parameters, so it must be encoded as %26 when it's part of the data. In the path component, & has no special meaning and doesn't need encoding (though it's often encoded anyway for consistency). The colon (:) doesn't need encoding in query strings and is also allowed in path segments, with one exception: RFC 3986 forbids it in the first segment of a relative reference, where it could be mistaken for a scheme separator.

Some characters always need encoding regardless of context: space, <, >, ", {, }, |, \, ^, `, and all non-ASCII characters. These are considered unsafe because they can cause parsing problems in various contexts. The space is particularly interesting—it's the most commonly encoded character, appearing as %20 in URLs and + in form data.

Practical Implementation in Web Development

JavaScript provides multiple encoding functions, each with subtle differences. encodeURI() encodes a complete URI, preserving reserved characters like : / ? # so it doesn't break the URL structure. encodeURIComponent() encodes everything except letters, digits, and - _ . ! ~ * ' ( ), making it suitable for encoding individual query parameters or path segments. For example, encodeURIComponent("hello world") produces "hello%20world", while encodeURIComponent("http://example.com") produces "http%3A%2F%2Fexample.com".
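Running both functions on the same input makes the difference easier to see. The URL below is a made-up example.

```javascript
const input = "https://example.com/a b?x=1&y=ü#top";

// encodeURI() keeps the structural characters and encodes only the space and "ü".
const wholeUrl = encodeURI(input);
console.log(wholeUrl);
// https://example.com/a%20b?x=1&y=%C3%BC#top

// encodeURIComponent() treats the whole string as data and encodes the delimiters too.
const asData = encodeURIComponent(input);
console.log(asData);
// https%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1%26y%3D%C3%BC%23top
```

The second form is what you would want when passing one URL as a query parameter value inside another URL.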

The correct function depends on what you're encoding. Use encodeURIComponent() for query parameter values: const url = `https://api.example.com/search?q=${encodeURIComponent(userInput)}`. Use encodeURI() for complete URLs when you need to preserve the URL structure but encode special characters. Never use the deprecated escape() function—it uses a non-standard encoding and doesn't handle UTF-8 properly.

Form submissions automatically handle encoding. When a form is submitted with method="GET", the browser constructs a query string, encoding parameter names and values using application/x-www-form-urlencoded format (spaces as +, reserved characters percent-encoded). With method="POST" and enctype="application/x-www-form-urlencoded", the same encoding is applied to the request body. For enctype="multipart/form-data", encoding is handled differently to support file uploads.
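The same form-style serialization is available to scripts through the standard URLSearchParams class. The field names and values below are hypothetical.

```javascript
// Hypothetical form fields, serialized the way a urlencoded form submission would be.
const fields = { name: "Ada Lovelace", note: "1+1=2 & more" };
const body = new URLSearchParams(fields).toString();
console.log(body); // name=Ada+Lovelace&note=1%2B1%3D2+%26+more

// Parsing reverses it: "+" becomes a space and %2B becomes a literal plus sign.
const parsed = new URLSearchParams(body);
console.log(parsed.get("note")); // 1+1=2 & more
```

The string in body can be used as a GET query string or as a POST body with the application/x-www-form-urlencoded content type.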

Backend languages have their own encoding functions. Python uses urllib.parse.quote() and quote_plus(), PHP has urlencode() and rawurlencode(), and Node.js uses encodeURIComponent(). The key difference between variants is usually whether spaces encode to %20 or +. The "raw" versions (rawurlencode, quote) use %20, matching RFC 3986, while the standard versions use + for spaces to match HTML form encoding. Note that Python's quote() leaves / unencoded by default; pass safe='' to encode it as well.

Common Pitfalls and Security Considerations

Double encoding is a frequent mistake. If you encode data multiple times—once in client JavaScript and again on the server—you end up with %2520 (percent-encoded percent sign) instead of %20. The decoded result becomes "%20" instead of a space. Always be clear about which layer handles encoding and avoid redundant encoding.
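The effect is easy to reproduce with a couple of calls:

```javascript
const once = encodeURIComponent("hello world");
const twice = encodeURIComponent(once); // the "%" itself is encoded as %25
console.log(once);  // hello%20world
console.log(twice); // hello%2520world

// A single decode of the double-encoded value leaves "%20" as text, not a space.
const decodedOnce = decodeURIComponent(twice);
console.log(decodedOnce); // hello%20world
```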

Security vulnerabilities arise from improper URL handling. URL-based injection attacks can occur when user input isn't properly encoded before being inserted into URLs. An attacker might inject & to add extra parameters, or use %0A (newline) for HTTP response splitting in older systems. Always encode user data before including it in URLs, and validate decoded data on the server.
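A small sketch of the parameter-injection case follows. The host shop.example, the admin parameter, and the input string are all invented for illustration.

```javascript
const userInput = "shoes&admin=true"; // hypothetical hostile input

// Concatenating raw input lets the "&" start a second parameter.
const unsafe = "https://shop.example/search?q=" + userInput;
console.log(new URL(unsafe).searchParams.get("admin")); // true

// Encoding first keeps the whole string inside the value of q.
const safe = "https://shop.example/search?q=" + encodeURIComponent(userInput);
console.log(new URL(safe).searchParams.get("admin")); // null
console.log(new URL(safe).searchParams.get("q"));     // shoes&admin=true
```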

Open redirect vulnerabilities exploit improper validation of redirect URLs. If your code does something like redirect(userProvidedUrl) without validation, attackers can use it for phishing by redirecting to malicious sites. Always validate redirect destinations against a whitelist, or at minimum ensure they're relative URLs or match your domain.
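One possible shape for such a check is sketched below, assuming the policy is "same origin only". isSafeRedirect is a hypothetical helper, and a real application may need a stricter or different policy, such as an explicit list of allowed paths.

```javascript
// Hypothetical helper: accept only redirect targets that stay on our own origin.
function isSafeRedirect(target, ownOrigin) {
  try {
    // Relative targets resolve against ownOrigin; absolute ones keep their own origin.
    const resolved = new URL(target, ownOrigin);
    return resolved.origin === new URL(ownOrigin).origin;
  } catch {
    return false; // unparseable input is rejected
  }
}

console.log(isSafeRedirect("/account", "https://example.com"));                   // true
console.log(isSafeRedirect("https://evil.example/login", "https://example.com")); // false
console.log(isSafeRedirect("//evil.example/login", "https://example.com"));       // false
```

Comparing parsed origins, rather than checking whether the string starts with a trusted prefix, avoids being fooled by protocol-relative targets like the third one.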

Canonicalization issues occur when different URL encodings represent the same resource. %2F and / are semantically different in URLs: %2F is literal data while / is a path separator. However, some servers incorrectly treat them the same, potentially allowing directory traversal attacks. Modern web servers and frameworks generally handle this correctly, but legacy systems may be vulnerable. When validating URLs for security, check the decoded form of each component, since that is what the application will actually act on.
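The sketch below illustrates both points: splitting on / before decoding keeps an encoded slash inside its segment, and a traversal check should look at the decoded segments. hasTraversal is a hypothetical helper and not a complete defense. It assumes encoded slashes are never legitimate in this application, and decodeURIComponent() will throw on malformed input, which a caller would need to handle.

```javascript
// Split first, then decode: "%2F" stays inside its segment as data.
const segments = "/files/a%2Fb/c"
  .split("/")
  .filter((segment) => segment !== "")
  .map((segment) => decodeURIComponent(segment));
console.log(segments); // [ 'files', 'a/b', 'c' ]

// Hypothetical check: flag decoded segments that could escape a base directory.
function hasTraversal(rawPath) {
  return rawPath
    .split("/")
    .map((segment) => decodeURIComponent(segment))
    .some((s) => s === ".." || s.includes("/") || s.includes("\\"));
}

console.log(hasTraversal("/files/..%2F..%2Fetc/passwd")); // true
console.log(hasTraversal("/files/%2E%2E/secret"));        // true
console.log(hasTraversal("/files/report.txt"));           // false
```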

Special Cases and Edge Cases

International domain names (IDN) use Punycode encoding in the domain part of URLs, not percent encoding. The domain 例え.jp appears in URLs as xn--r8jz45g.jp. This is separate from percent encoding and is handled automatically by browsers. When you type an international domain, the browser converts it to Punycode before making the request.

Fragment identifiers (the part after #) follow different encoding rules in practice. RFC 3986 requires characters outside the fragment's allowed set to be percent-encoded, but many modern single-page applications put complex data in fragments, sometimes using custom encoding schemes. Browsers don't send fragments to servers in HTTP requests, so they are handled purely client-side and encoding practices vary widely.

Emoji and other Unicode characters require multiple bytes of UTF-8 encoding. The emoji 😀 is U+1F600, which in UTF-8 is four bytes: F0 9F 98 80, so it encodes as %F0%9F%98%80. Different programming languages handle this differently: some automatically use UTF-8, while others might use different encodings by default. Modern systems almost always use UTF-8 for URL encoding, but legacy systems might use ISO-8859-1 or other encodings, causing mojibake when the encoder and decoder disagree.

FAQ

What's the difference between %20 and + for encoding spaces?

%20 is the standard percent encoding for space according to RFC 3986 and works everywhere in URLs. + is a legacy encoding from HTML forms (application/x-www-form-urlencoded) that only represents space in query strings. Use %20 for path components and when in doubt; + only works reliably in query parameters.

Should I use encodeURI() or encodeURIComponent() in JavaScript?

Use encodeURIComponent() for individual URL parts (query parameters, path segments) as it encodes all special characters. Use encodeURI() only when encoding a complete URL while preserving its structure (: / ? # characters). For most cases like query parameters, encodeURIComponent() is the correct choice.

Why do some URLs have %25 in them?

%25 is the percent-encoded form of the % character itself. This happens with double-encoding (encoding already-encoded data) or when a literal % is part of the data being transmitted. If you see %2520, that's double-encoded space (%20 encoded again), usually indicating an encoding bug.

Do I need to encode the entire URL or just parts of it?

Only encode the data parts, not the structure. Encode query parameter values, path segments with user data, and fragment content. Don't encode the protocol (http://), domain name, or structural characters (/ ? & = #) unless they're part of the actual data. Use encodeURIComponent() for data parts and construct URLs manually.