Changes: HTML entities

Latest revision as of 09:14, 13 April 2017

Tip 1005

created 2005 · complexity basic · author Jos van den Oever · version 6.0

There are several ways to deal with HTML entities so that text can be edited, for example, while it contains a simple ampersand (&) rather than its HTML entity (&amp;).

Simple search and replace

This code allows you to easily escape or unescape HTML entities: Change (<, >, &) to (&lt;, &gt;, &amp;), or the reverse.

This does not escape all characters that should be escaped—just the most common.

" Escape/unescape & < > HTML entities in range (default current line).
function! HtmlEntities(line1, line2, action)
  let search = @/
  let range = 'silent ' . a:line1 . ',' . a:line2
  if a:action == 0  " must convert &amp; last
    execute range . 'sno/&lt;/</eg'
    execute range . 'sno/&gt;/>/eg'
    execute range . 'sno/&amp;/&/eg'
  else              " must convert & first
    execute range . 'sno/&/&amp;/eg'
    execute range . 'sno/</&lt;/eg'
    execute range . 'sno/>/&gt;/eg'
  endif
  nohl
  let @/ = search
endfunction
command! -range -nargs=1 Entities call HtmlEntities(<line1>, <line2>, <args>)
noremap <silent> <Leader>h :Entities 0<CR>
noremap <silent> <Leader>H :Entities 1<CR>

If you add the above code to your vimrc, you can HTML escape the current line by typing \H, and unescape by typing \h (assuming the default backslash leader key). The same keys can be used to operate on all lines in a visually selected area, for example, select several lines then type \h to unescape them.

In addition, a user command is defined. It defaults to operating on the current line, but accepts a range. The argument is 0 to unescape, or 1 to escape, for example:

" Unescape lines 10 to 20 inclusive.
:10,20Entities 0

" Escape all lines.
:%Entities 1
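
The comments in the function above note that & must be converted first when escaping, and &amp; last when unescaping. The following is a minimal sketch of why, using Vim's built-in substitute() on a string instead of on buffer lines. The names EscapeStr and UnescapeStr are hypothetical helpers for illustration and are not part of the tip.

```vim
" Hypothetical helpers (not part of the tip): escape and unescape a string.
function! EscapeStr(text)
  " Escape & first; otherwise the & in a freshly inserted &lt; or &gt;
  " would itself be escaped again.
  let result = substitute(a:text, '&', '\&amp;', 'g')
  let result = substitute(result, '<', '\&lt;', 'g')
  let result = substitute(result, '>', '\&gt;', 'g')
  return result
endfunction

function! UnescapeStr(text)
  " Unescape &amp; last; otherwise &amp;lt; would first become &lt;
  " and then wrongly become <.
  let result = substitute(a:text, '&lt;', '<', 'g')
  let result = substitute(result, '&gt;', '>', 'g')
  let result = substitute(result, '&amp;', '\&', 'g')
  return result
endfunction
```

For example, :echo EscapeStr('a < b && c') displays a &lt; b &amp;&amp; c, and :echo UnescapeStr('&amp;lt;') displays &lt; rather than <. In the replacement argument of substitute(), a plain & stands for the matched text, which is why the literal ampersand is written as \& here.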

Automagic escaping

A script is available (unicodeswitch) that automagically converts entities when files are read and written, so you can view the characters, and write the codes, or vice versa. It was originally written for Java Unicode escapes, but there is also a setting for HTML codes.

The script is for numeric &#nnn; style encoding, not the named HTML entities.
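
For a one-off conversion to numeric codes without installing a script, something like the following may be enough. It is only a sketch, not how unicodeswitch works: the function and command name NumericEntities is hypothetical, 'encoding' is assumed to be utf-8 so that char2nr() returns the Unicode code point, and composing characters are not handled.

```vim
" Hypothetical helper: replace every non-ASCII character in the range
" with a decimal numeric character reference such as &#229;.
" Assumes 'encoding' is utf-8.
function! NumericEntities(line1, line2)
  for lnum in range(a:line1, a:line2)
    let text = getline(lnum)
    " \= evaluates the rest of the replacement as an expression.
    let text = substitute(text, '[^\x00-\x7F]', '\="&#" . char2nr(submatch(0)) . ";"', 'g')
    call setline(lnum, text)
  endfor
endfunction
command! -range NumericEntities call NumericEntities(<line1>, <line2>)
```

With this in your vimrc, :NumericEntities converts the current line and :%NumericEntities converts the whole buffer. A line containing å, for example, ends up containing &#229; instead.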

Perl HTML::Entities

Note: Vim needs to be compiled with the "perl" feature enabled for this to work.

A slightly more complex solution that escapes all characters uses Perl. You need Perl and HTML-Parser.

function! HTMLEncode()
perl << EOF
 use HTML::Entities;
 @pos = $curwin->Cursor();
 $line = $curbuf->Get($pos[0]);
 $encvalue = encode_entities($line);
 $curbuf->Set($pos[0],$encvalue)
EOF
endfunction

function! HTMLDecode()
perl << EOF
 use HTML::Entities;
 @pos = $curwin->Cursor();
 $line = $curbuf->Get($pos[0]);
 $encvalue = decode_entities($line);
 $curbuf->Set($pos[0],$encvalue)
EOF
endfunction

nnoremap <Leader>h :call HTMLEncode()<CR>
nnoremap <Leader>H :call HTMLDecode()<CR>

To convert a line, put the cursor in the line and type \h or \H.
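
If the same vimrc is used on machines where Vim may lack the Perl interface, the mappings can be made conditional. This is a small sketch that uses Vim's built-in has() feature test and assumes the two functions above are defined elsewhere in the vimrc.

```vim
" Map the keys only when this Vim was compiled with the Perl interface.
if has('perl')
  nnoremap <Leader>h :call HTMLEncode()<CR>
  nnoremap <Leader>H :call HTMLDecode()<CR>
endif
```

Without the feature, \h and \H simply stay unmapped instead of raising an error when pressed.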

Ruby HTMLEncode

Note: Vim needs to be compiled with the "ruby" feature enabled for this to work.

The following is a simpler alternative using Ruby.

function! HTMLEncode()
ruby << EOF
  @str=VIM::Buffer.current.line
  VIM::Buffer.current.line=@str.unpack("U*").collect {|s| (s > 127 ? "&##{s};" : s.chr) }.join("")
EOF
endfunction

nnoremap <Leader>h :call HTMLEncode()<CR>

Language specific HTML-entities

Language-specific characters, for example the Norwegian special characters, can be replaced throughout the whole buffer without selecting text first, because they are never part of code syntax. With the following, typing ,r checks all the text and replaces the three Norwegian special characters, in both upper and lower case, with their entities. This can easily be applied to other languages.

" Replace all Norwegian special characters with entities.
nnoremap <silent> ,r :call ReplaceNorChar()<CR>
function! ReplaceNorChar()
  silent %s/Æ/\&AElig;/eg
  silent %s/Ø/\&Oslash;/eg
  silent %s/Å/\&Aring;/eg
  silent %s/æ/\&aelig;/eg
  silent %s/ø/\&oslash;/eg
  silent %s/å/\&aring;/eg
endfunction

Add it to your ~/.vimrc or ~/.vim/ftplugin/html.vim.
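
When the list of characters grows, one substitute command per character becomes repetitive. The following is a table-driven sketch of the same idea; the names g:entity_table and ReplaceFromTable are hypothetical, and the script file is assumed to be saved as UTF-8. The \C in the pattern forces a case-sensitive match, so that Æ is not also matched by æ when 'ignorecase' is set.

```vim
" Assumes this file is saved as UTF-8.
scriptencoding utf-8

" Hypothetical table mapping each character to its entity name.
" Add more entries for other languages as needed.
let g:entity_table = {
      \ 'Æ': 'AElig', 'Ø': 'Oslash', 'Å': 'Aring',
      \ 'æ': 'aelig', 'ø': 'oslash', 'å': 'aring'
      \ }

function! ReplaceFromTable()
  for [char, name] in items(g:entity_table)
    " Builds a command such as: silent %s/\CÆ/\&AElig;/eg
    execute 'silent %s/\C' . char . '/\&' . name . ';/eg'
  endfor
endfunction
```

The function can be called from the same ,r mapping in place of ReplaceNorChar(). The order in which items() returns the entries does not matter here, because no replacement contains a character that appears as a key.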

Comments

The entity for the word under the cursor can be checked with PHP (this replaces the current line with the output):

.! php -r "echo htmlentities('<cword>');"

command Entities :call Entities()
function Entities()
  silent s/À/\&Agrave;/eg
  silent s/Á/\&Aacute;/eg
  silent s/Â/\&Acirc;/eg
  silent s/Ã/\&Atilde;/eg
  silent s/Ä/\&Auml;/eg
  silent s/Å/\&Aring;/eg
  silent s/Æ/\&AElig;/eg
  silent s/Ç/\&Ccedil;/eg
  silent s/È/\&Egrave;/eg
  silent s/É/\&Eacute;/eg
  silent s/Ê/\&Ecirc;/eg
  silent s/Ë/\&Euml;/eg
  silent s/Ì/\&Igrave;/eg
  silent s/Í/\&Iacute;/eg
  silent s/Î/\&Icirc;/eg
  silent s/Ï/\&Iuml;/eg
  silent s/Ð/\&ETH;/eg
  silent s/Ñ/\&Ntilde;/eg
  silent s/Ò/\&Ograve;/eg
  silent s/Ó/\&Oacute;/eg
  silent s/Ô/\&Ocirc;/eg
  silent s/Õ/\&Otilde;/eg
  silent s/Ö/\&Ouml;/eg
  silent s/Ø/\&Oslash;/eg
  silent s/Ù/\&Ugrave;/eg
  silent s/Ú/\&Uacute;/eg
  silent s/Û/\&Ucirc;/eg
  silent s/Ü/\&Uuml;/eg
  silent s/Ý/\&Yacute;/eg
  silent s/Þ/\&THORN;/eg
  silent s/ß/\&szlig;/eg
  silent s/à/\&agrave;/eg
  silent s/á/\&aacute;/eg
  silent s/â/\&acirc;/eg
  silent s/ã/\&atilde;/eg
  silent s/ä/\&auml;/eg
  silent s/å/\&aring;/eg
  silent s/æ/\&aelig;/eg
  silent s/ç/\&ccedil;/eg
  silent s/è/\&egrave;/eg
  silent s/é/\&eacute;/eg
  silent s/ê/\&ecirc;/eg
  silent s/ë/\&euml;/eg
  silent s/ì/\&igrave;/eg
  silent s/í/\&iacute;/eg
  silent s/î/\&icirc;/eg
  silent s/ï/\&iuml;/eg
  silent s/ð/\&eth;/eg
  silent s/ñ/\&ntilde;/eg
  silent s/ò/\&ograve;/eg
  silent s/ó/\&oacute;/eg
  silent s/ô/\&ocirc;/eg
  silent s/õ/\&otilde;/eg
  silent s/ö/\&ouml;/eg
  silent s/ø/\&oslash;/eg
  silent s/ù/\&ugrave;/eg
  silent s/ú/\&uacute;/eg
  silent s/û/\&ucirc;/eg
  silent s/ü/\&uuml;/eg
  silent s/ý/\&yacute;/eg
  silent s/þ/\&thorn;/eg
  silent s/ÿ/\&yuml;/eg
endfunction

@@ Line 11: / Line 11: @@
 |category2=
 }}
-There are several ways to deal with HTML entities.
+There are several ways to deal with HTML entities so that text can be edited, for example, while it contains a simple ampersand (<code>&</code>) rather than its HTML entity (<code>&amp;amp;</code>).
-==Simple search & replace==
+==Simple search and replace==
-This code allows you to escape your HTML entities with one shortcut key: Change (<tt><, >, &</tt>) to (<tt>&amp;lt;, &amp;gt;, &amp;amp;</tt>), or the reverse.
+This code allows you to easily escape or unescape HTML entities: Change (<code><</code>, <code>></code>, <code>&</code>) to (<code>&amp;lt;</code>, <code>&amp;gt;</code>, <code>&amp;amp;</code>), or the reverse.
-Note that this does not escape all characters that should be escaped, just the most common.
+This does not escape all characters that should be escaped—just the most common.
 <pre>
+" Escape/unescape & < > HTML entities in range (default current line).
-function HtmlEscape()
+function! HtmlEntities(line1, line2, action)
-  silent s/&/\&amp;amp;/eg
+  let search = @/
-  silent s/</\&amp;lt;/eg
+  let range = 'silent ' . a:line1 . ',' . a:line2
-  silent s/>/\&amp;gt;/eg
+  if a:action == 0  " must convert &amp;amp; last
+    execute range . 'sno/&amp;lt;/</eg'
+    execute range . 'sno/&amp;gt;/>/eg'
+    execute range . 'sno/&amp;amp;/&/eg'
+  else              " must convert & first
+    execute range . 'sno/&/&amp;amp;/eg'
+    execute range . 'sno/</&amp;lt;/eg'
+    execute range . 'sno/>/&amp;gt;/eg'
+  endif
+  nohl
+  let @/ = search
 endfunction
+command! -range -nargs=1 Entities call HtmlEntities(<line1>, <line2>, <args>)
+noremap <silent> <Leader>h :Entities 0<CR>
+noremap <silent> <Leader>H :Entities 1<CR>
+</pre>
+If you add the above code to your [[vimrc]], you can HTML escape the current line by typing <code>\H</code>, and unescape by typing <code>\h</code> (assuming the default backslash leader key). The same keys can be used to operate on all lines in a visually selected area, for example, select several lines then type <code>\h</code> to unescape them.
-function HtmlUnEscape()
-  silent s/&amp;lt;/</eg
-  silent s/&amp;gt;/>/eg
-  silent s/&amp;amp;/\&/eg
-endfunction
+In addition, a user command is defined. It defaults to operating on the current line, but accepts a range. The argument is <code>0</code> to unescape, or <code>1</code> to escape, for example:
-map <silent> <c-h> :call HtmlEscape()<CR>
+<pre>
-map <silent> <c-u> :call HtmlUnEscape()<CR>
+" Unescape lines 10 to 20 inclusive.
+:10,20Entities 0
+" Escape all lines.
+:%Entities 1
 </pre>
-If you add this code to your vimrc, you can escape visually-selected HTML with ctrl-h, and unescape with ctrl-u.
 ==Automagic escaping==
-There's also script that does this for you automagically when you read and write files, so you can view the characters, and write the codes, or vice versa: {{script|id=909}}.
+A script is available ({{script|id=909|text=unicodeswitch}}) that automagically converts entities when files are read and written, so you can view the characters, and write the codes, or vice versa. It was originally written for Java unicodes, but there is also a setting for HTML codes.
-Originally written for Java unicodes, but there is also a setting for html codes.
-The script is for &nnn style encoding, not the html entities.
+The script is for <code>&nnn</code> style encoding, not the HTML entities.
-==perl HTML::Entities==
+==Perl HTML::Entities==
-''Note: Vim needs to compiled with the "perl" feature enabled for this to work''
+''Note: Vim needs to compiled with the "perl" feature enabled for this to work.''
-A slightly more complex solution that escape all characters is using perl, you will need [http://www.perl.org/ perl] and [http://search.cpan.org/dist/HTML-Parser/ HTML-Parser]
+A slightly more complex solution that escapes all characters uses Perl. You need [http://www.perl.org/ Perl] and [http://search.cpan.org/dist/HTML-Parser/ HTML-Parser].
 <pre>
 function! HTMLEncode()
@@ Line 68: / Line 80: @@
 endfunction
-map <Leader>h :call HTMLEncode()<CR>
+nnoremap <Leader>h :call HTMLEncode()<CR>
-map <Leader>H :call HTMLDecode()<CR>
+nnoremap <Leader>H :call HTMLDecode()<CR>
 </pre>
-Go to the line and do <tt>\h</tt> or <tt>\H</tt> to check it out.
+To convert a line, put the cursor in the line and type <code>\h</code> or <code>\H</code>.
-==ruby version of HTMLEncode()==
+==Ruby HTMLEncode==
-''Note: Vim needs to be compiled with the "ruby" feature enabled for this to work''
+''Note: Vim needs to be compiled with the "ruby" feature enabled for this to work.''
-The following is a simpler ruby solution to the perl version of HTMLEncode above.
+The following is a simpler alternative using Ruby.
 <pre>
 function! HTMLEncode()
+ruby << EOF
-ruby << EOF
   @str=VIM::Buffer.current.line
   VIM::Buffer.current.line=@str.unpack("U*").collect {|s| (s > 127 ? "&##{s};" : s.chr) }.join("")
 EOF
 endfunction
-map <Leader>h :call HTMLEncode()<CR>
+nnoremap <Leader>h :call HTMLEncode()<CR>
 </pre>
 ==Language specific HTML-entities==
-To change e.g. the Norwegian special characters there is no need to select text and not check all the text since it is never part of code-syntax (as far as I know). With the following, pressing &quot;,r&quot; from normal-mode will check all the text and replace all three Norwegian special chars with entities (and can easily be applied to other languages):
+To change, for example, Norwegian special characters, there is no need to select text and not check all the text since it is never part of code-syntax. With the following, typing <code>,r</code> will check all the text and replace all three Norwegian special characters with entities. This can easily be applied to other languages.
 <pre>
-" To replace all Norwegian special chars with entities.
+" Replace all Norwegian special characters with entities.
-nmap <silent> ,r :call ReplaceNorChar()<CR>
+nnoremap <silent> ,r :call ReplaceNorChar()<CR>
 function! ReplaceNorChar()
-	silent! %s/Æ/\&amp;AElig;/eg
+  silent %s/Æ/\&amp;AElig;/eg
-	silent! %s/Ø/\&amp;Oslash;/eg
+  silent %s/Ø/\&amp;Oslash;/eg
-	silent! %s/Å/\&amp;Aring;/eg
+  silent %s/Å/\&amp;Aring;/eg
-	silent! %s/æ/\&amp;aelig;/eg
+  silent %s/æ/\&amp;aelig;/eg
-	silent! %s/ø/\&amp;oslash;/eg
+  silent %s/ø/\&amp;oslash;/eg
-	silent! %s/å/\&amp;aring;/eg
+  silent %s/å/\&amp;aring;/eg
 endfunction
 </pre>
@@ Line 107: / Line 118: @@
 Add it to your ~/.vimrc or ~/.vim/ftplugin/html.vim.
+==See also==
+*[http://www.html-entities.org/ Quick reference and decode/encode tool]
+*[http://puzzlersworld.com/misc/html_escape_tool.html Html escape tool]
 ==Comments==
@@ Line 182: / Line 196: @@
 endfunction
 </pre>
-<small>--Preceding [[Vim Tips Wiki:Quick reference|unsigned]] comment added by [[User:212.145.191.182|212.145.191.182]] 11:11, July 23, 2010</small>
-:I formatted the above. What was the dot doing in <tt>s./old/new/eg</tt>? Is the dot supposed to be before the <tt>s</tt> (the current line)? If so, it is redundant because the default is the current line. I removed the dot from each command. [[User:JohnBeckett|JohnBeckett]] 12:07, July 23, 2010 (UTC)
-very well done, thank you