RFC 1738 restricts the characters that may be used in a URL as follows:

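Only the alphanumerics 0-9a-zA-Z, the special characters $-_.+!*'(), and reserved characters used for their reserved purposes may appear unencoded in a URL.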

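HTML, by contrast, allows the full ISO-8859-1 (ISO-Latin) character set in documents, and HTML4 extends that to the entire Unicode character set. That freedom applies to document text, not to URLs, so characters outside the set above still have to be encoded when they appear in a URL.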
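As a rough illustration of the RFC 1738 rule quoted above, the sketch below checks single characters against the unencoded set. The helper names isRfc1738Safe and charsNeedingEncoding are hypothetical, and the check treats every character outside that set (including reserved ones such as "/") as needing encoding, as if it were used outside its reserved role.

```javascript
// Hypothetical helper: true if a single character is in RFC 1738's
// unencoded set, i.e. alphanumerics plus $-_.+!*'(),
// Reserved characters like "/" or "?" are NOT accepted here, which models
// them being used outside their reserved role.
const RFC1738_SAFE = /^[0-9A-Za-z$\-_.+!*'(),]$/;

function isRfc1738Safe(ch) {
  return RFC1738_SAFE.test(ch);
}

// Hypothetical helper: the characters of a string that fall outside the set.
function charsNeedingEncoding(str) {
  return Array.from(str).filter((ch) => !isRfc1738Safe(ch));
}

console.log(isRfc1738Safe("a")); // true
console.log(isRfc1738Safe("$")); // true
console.log(isRfc1738Safe(" ")); // false
const flagged = charsNeedingEncoding("a b/c\u00e9"); // [" ", "/", "\u00e9"]
```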

So which characters actually need to be encoded?

1. ASCII control characters

Why: they are non-printable.

Range: 00-1F and 7F (hex) in ISO-8859-1.
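A small sketch of the range check for this category, assuming the input is a JavaScript string; isAsciiControl and controlCharsIn are hypothetical helper names. It also shows that encodeURIComponent escapes such characters.

```javascript
// Hypothetical helper: ASCII control characters are 0x00-0x1F plus 0x7F (DEL).
function isAsciiControl(ch) {
  const code = ch.charCodeAt(0);
  return code <= 0x1f || code === 0x7f;
}

// Hypothetical helper: list the control characters in a string as
// two-digit uppercase hex code points.
function controlCharsIn(str) {
  return Array.from(str)
    .filter(isAsciiControl)
    .map((ch) => ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0"));
}

const sample = "line1\nline2\tend";
const found = controlCharsIn(sample); // ["0A", "09"]
console.log(encodeURIComponent(sample)); // line1%0Aline2%09end
```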

2. Non-ASCII characters

Why: they are not in the ASCII set, so they are not considered legal in URLs.

Range: 80-FF (hex) in ISO-Latin.
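One practical wrinkle worth seeing in code: a character such as "é" has the single ISO-Latin code point E9, but encodeURIComponent percent-encodes the character's UTF-8 bytes instead. The sketch below assumes an environment with the standard TextEncoder (modern browsers, Node.js); isNonAsciiLatin1 and utf8PercentBytes are hypothetical names.

```javascript
// Hypothetical helper: is the character in the non-ASCII ISO-Latin range 0x80-0xFF?
function isNonAsciiLatin1(ch) {
  const code = ch.charCodeAt(0);
  return code >= 0x80 && code <= 0xff;
}

// Hypothetical helper: percent-encode every UTF-8 byte of a string.
// For non-ASCII characters this matches what encodeURIComponent produces.
function utf8PercentBytes(str) {
  return Array.from(new TextEncoder().encode(str))
    .map((b) => "%" + b.toString(16).toUpperCase().padStart(2, "0"))
    .join("");
}

const eAcute = "\u00e9"; // "é", code point E9 in ISO-Latin
console.log(isNonAsciiLatin1(eAcute));   // true
console.log(utf8PercentBytes(eAcute));   // %C3%A9
console.log(encodeURIComponent(eAcute)); // %C3%A9
```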

3. Reserved characters

Why: URLs use certain reserved characters to define their syntax. When one of these characters appears in a URL without its special role, it must be encoded.

Characters: $ & + , / : ; = ? @

Character                      Code point (hex)   Code point (dec)
Dollar ("$")                   24                 36
Ampersand ("&")                26                 38
Plus ("+")                     2B                 43
Comma (",")                    2C                 44
Forward slash/Virgule ("/")    2F                 47
Colon (":")                    3A                 58
Semicolon (";")                3B                 59
Equals ("=")                   3D                 61
Question mark ("?")            3F                 63
'At' symbol ("@")              40                 64
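To see the reserved-character rule in action, the sketch below encodes only the ten characters listed above (encodeReserved is a hypothetical helper) and compares it with the two built-in JavaScript functions: encodeURI assumes reserved characters are playing their syntactic role and leaves them alone, while encodeURIComponent assumes they are data and escapes them.

```javascript
// The ten reserved characters from the list above.
const RESERVED = new Set(["$", "&", "+", ",", "/", ":", ";", "=", "?", "@"]);

// Hypothetical helper: escape only reserved characters, leave the rest as-is.
// All ten are ASCII with code points >= 0x24, so two hex digits always suffice.
function encodeReserved(str) {
  let out = "";
  for (const ch of str) {
    out += RESERVED.has(ch)
      ? "%" + ch.charCodeAt(0).toString(16).toUpperCase()
      : ch;
  }
  return out;
}

// A query value that happens to contain reserved characters.
const value = "a&b=c/d";
console.log(encodeReserved(value));     // a%26b%3Dc%2Fd
console.log(encodeURI(value));          // a&b=c/d (reserved characters kept)
console.log(encodeURIComponent(value)); // a%26b%3Dc%2Fd
```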

4. Unsafe characters

Why: some characters can be misinterpreted or altered when they appear in a URL, so they must also be encoded:

Character                      Code point (hex)   Code point (dec)
Space                          20                 32
  Why encode? Significant sequences of spaces may be lost in some uses (especially multiple spaces).
Quotation mark ('"')           22                 34
'Less Than' symbol ("<")       3C                 60
'Greater Than' symbol (">")    3E                 62
  Why encode? These characters are often used to delimit URLs in plain text.
'Pound' character ("#")        23                 35
  Why encode? It marks where a fragment identifier (bookmarks/anchors in HTML) begins.
Percent character ("%")        25                 37
  Why encode? It is used to URL-encode/escape other characters, so it should itself also be encoded.
Misc. characters:
Left Curly Brace ("{")         7B                 123
Right Curly Brace ("}")        7D                 125
Vertical Bar/Pipe ("|")        7C                 124
Backslash ("\")                5C                 92
Caret ("^")                    5E                 94
Tilde ("~")                    7E                 126
Left Square Bracket ("[")      5B                 91
Right Square Bracket ("]")     5D                 93
Grave Accent ("`")             60                 96
  Why encode? Some systems may modify these characters.
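The reason "%" must itself be encoded shows up as soon as you try to decode: a bare "%" not followed by two hex digits is malformed. The sketch below uses the built-in encodeURIComponent/decodeURIComponent; canDecode is a hypothetical helper. Note that encodeURIComponent escapes every character in the table above except "~", which RFC 3986 later reclassified as unreserved, so the last step escapes it by hand to follow RFC 1738's stricter list.

```javascript
// Encoding "%", " " and "#" keeps the value unambiguous and decodable.
const text = "50% off #1 ~home";
const encoded = encodeURIComponent(text);
console.log(encoded); // 50%25%20off%20%231%20~home

// Hypothetical helper: decodeURIComponent throws URIError on malformed input,
// e.g. a raw "%" that is not followed by two hex digits.
function canDecode(s) {
  try {
    decodeURIComponent(s);
    return true;
  } catch (err) {
    if (err instanceof URIError) return false;
    throw err;
  }
}
console.log(canDecode(encoded));   // true
console.log(canDecode("50% off")); // false

// encodeURIComponent leaves "~" unescaped; escape it explicitly for RFC 1738.
const strict = encoded.replace(/~/g, "%7E");
console.log(strict); // 50%25%20off%20%231%20%7Ehome
```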

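How do you URL-encode a character?

The URL encoding of a character consists of a "%" sign followed by the two-digit hexadecimal representation of the character's ISO-Latin code point.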

For example:

space = %20
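A minimal sketch of the "%" plus two hex digits rule, assuming the input is one character in the ISO-Latin range 00-FF; percentEncodeLatin1 is a hypothetical name, not a built-in.

```javascript
// Hypothetical helper: encode one character with code point 0x00-0xFF
// (the ISO-Latin range) as "%" followed by two uppercase hex digits.
function percentEncodeLatin1(ch) {
  const code = ch.charCodeAt(0);
  if (ch.length !== 1 || code > 0xff) {
    throw new RangeError("not an ISO-Latin character: " + ch);
  }
  // padStart keeps the two-digit form for code points below 0x10.
  return "%" + code.toString(16).toUpperCase().padStart(2, "0");
}

console.log(percentEncodeLatin1(" "));      // %20
console.log(percentEncodeLatin1("\n"));     // %0A
console.log(percentEncodeLatin1("\u00e9")); // %E9
```

Keep in mind that for non-ASCII characters this single-byte form differs from what encodeURIComponent produces (%C3%A9 for "é"), and decodeURIComponent("%E9") throws a URIError because E9 on its own is not valid UTF-8.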

In JavaScript, this can be done with the encodeURIComponent function.
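A typical use is building a query string, encoding each key and value separately so that reserved characters inside the data cannot break the URL syntax. buildQuery is a hypothetical helper; encodeURIComponent and decodeURIComponent are the standard built-ins.

```javascript
// Hypothetical helper: encode each key and value, then join with "=" and "&".
function buildQuery(params) {
  return Object.entries(params)
    .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
    .join("&");
}

const query = buildQuery({ q: "c++ & rust", page: "2" });
console.log(query); // q=c%2B%2B%20%26%20rust&page=2

// decodeURIComponent reverses the encoding of a single component.
console.log(decodeURIComponent("c%2B%2B%20%26%20rust")); // c++ & rust
```

If you use URLSearchParams instead, note that it follows the application/x-www-form-urlencoded format and writes a space as "+" rather than "%20".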
