Tip 1005

created 2005 · complexity basic · author Jos van den Oever · version 6.0

There are several ways to deal with HTML entities.

Simple search & replace

This code lets you escape HTML special characters with a single shortcut key: it changes (<, >, &) to (&lt;, &gt;, &amp;), or the reverse.

Note that this does not escape every character that should be escaped, only the most common ones.

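function! HtmlEscape()
  " Escape & first so the entities added below are not escaped again
  silent s/&/\&amp;/eg
  silent s/</\&lt;/eg
  silent s/>/\&gt;/eg
endfunction

function! HtmlUnEscape()
  silent s/&lt;/</eg
  silent s/&gt;/>/eg
  " Unescape &amp; last so that &amp;lt; becomes &lt; rather than <
  silent s/&amp;/\&/eg
endfunction

map <silent> <c-h> :call HtmlEscape()<CR>
map <silent> <c-u> :call HtmlUnEscape()<CR>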

If you add this code to your vimrc, you can escape visually-selected HTML with ctrl-h, and unescape with ctrl-u.
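Since the functions above leave some characters unescaped, here is a minimal sketch of a variant that also escapes double quotes and works on a whole range in one call. The names HtmlEscapeRange (function and command) are hypothetical and not part of the original tip. It uses substitute() on the line text, so it does not change the last search pattern.

```vim
" Sketch only: HtmlEscapeRange is a hypothetical name, not part of the tip.
function! HtmlEscapeRange() range
  for lnum in range(a:firstline, a:lastline)
    let text = getline(lnum)
    " Escape & first so the entities added below are not escaped again.
    let text = substitute(text, '&', '\&amp;', 'g')
    let text = substitute(text, '<', '\&lt;', 'g')
    let text = substitute(text, '>', '\&gt;', 'g')
    " Double quotes matter inside attribute values.
    let text = substitute(text, '"', '\&quot;', 'g')
    call setline(lnum, text)
  endfor
endfunction

" Defaults to the current line; accepts a range such as :'<,'>HtmlEscapeRange
command! -range HtmlEscapeRange <line1>,<line2>call HtmlEscapeRange()
```

With this in your vimrc, :%HtmlEscapeRange would escape the whole buffer.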

Automagic escaping

There's also a script that does this for you automagically when you read and write files, so you can view the characters and write the codes, or vice versa: script#909.

It was originally written for Java Unicode escapes, but it also has a setting for HTML codes.

Note that the script handles &nnn style encoding, not the named HTML entities.
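To illustrate the difference between numeric codes and named entities, here is a small sketch that rewrites non-ASCII characters as numeric references. It is not the code of script#909. The function name ToNumericRefs is hypothetical, the decimal &#nnn; output format is an assumption about what "&nnn style" means, and the code assumes 'encoding' is utf-8.

```vim
" Sketch only: ToNumericRefs is a hypothetical name, and the decimal
" &#nnn; output format is an assumption, not taken from script#909.
function! ToNumericRefs(text)
  " Replace every non-ASCII character with its decimal character number.
  return substitute(a:text, '[^\x00-\x7F]', '\="&#" . char2nr(submatch(0)) . ";"', 'g')
endfunction
```

For example, :echo ToNumericRefs('café') shows the é rewritten as a numeric code, whereas the named entity would be &eacute;.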

perl HTML::Entities

Note: Vim needs to be compiled with the "perl" feature enabled for this to work.
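One way to find out whether your Vim has the feature is to test has('perl'), as in the sketch below (the message texts are only illustrative). The output of :version also lists +perl or -perl.

```vim
" Sketch only: the message texts are illustrative.
if has('perl')
  echomsg 'Perl interface available'
else
  echomsg 'This Vim was built without the perl feature'
endif
```

The same test can be wrapped around the function definitions and mappings in your vimrc so that they are only defined when Perl is available.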

A slightly more complex solution that escapes all characters uses Perl. You will need Perl and the HTML-Parser distribution, which provides HTML::Entities.

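function! HTMLEncode()
perl << EOF
 use HTML::Entities;
 @pos = $curwin->Cursor();
 $line = $curbuf->Get($pos[0]);
 $encvalue = encode_entities($line);
 # Write the encoded text back to the current line
 $curbuf->Set($pos[0], $encvalue);
EOF
endfunction

function! HTMLDecode()
perl << EOF
 use HTML::Entities;
 @pos = $curwin->Cursor();
 $line = $curbuf->Get($pos[0]);
 $encvalue = decode_entities($line);
 # Write the decoded text back to the current line
 $curbuf->Set($pos[0], $encvalue);
EOF
endfunction

map <Leader>h :call HTMLEncode()<CR>
map <Leader>H :call HTMLDecode()<CR>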

Go to the line and press \h or \H to try it out (this assumes the default <Leader>, a backslash).

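You can check the result against PHP. The following command replaces the current line with the htmlentities() encoding of the word under the cursor: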

.! php -r "echo htmlentities('<cword>');"

command Entities :call Entities()
function! Entities()
  " In the replacement, a bare & means "the matched text", so it is written \&
  silent s/À/\&Agrave;/eg
  silent s/Á/\&Aacute;/eg
  silent s/Â/\&Acirc;/eg
  silent s/Ã/\&Atilde;/eg
  silent s/Ä/\&Auml;/eg
  silent s/Å/\&Aring;/eg
  silent s/Æ/\&AElig;/eg
  silent s/Ç/\&Ccedil;/eg
  silent s/È/\&Egrave;/eg
  silent s/É/\&Eacute;/eg
  silent s/Ê/\&Ecirc;/eg
  silent s/Ë/\&Euml;/eg
  silent s/Ì/\&Igrave;/eg
  silent s/Í/\&Iacute;/eg
  silent s/Î/\&Icirc;/eg
  silent s/Ï/\&Iuml;/eg
  silent s/Ð/\&ETH;/eg
  silent s/Ñ/\&Ntilde;/eg
  silent s/Ò/\&Ograve;/eg
  silent s/Ó/\&Oacute;/eg
  silent s/Ô/\&Ocirc;/eg
  silent s/Õ/\&Otilde;/eg
  silent s/Ö/\&Ouml;/eg
  silent s/Ø/\&Oslash;/eg
  silent s/Ù/\&Ugrave;/eg
  silent s/Ú/\&Uacute;/eg
  silent s/Û/\&Ucirc;/eg
  silent s/Ü/\&Uuml;/eg
  silent s/Ý/\&Yacute;/eg
  silent s/Þ/\&THORN;/eg
  silent s/ß/\&szlig;/eg
  silent s/à/\&agrave;/eg
  silent s/á/\&aacute;/eg
  silent s/â/\&acirc;/eg
  silent s/ã/\&atilde;/eg
  silent s/ä/\&auml;/eg
  silent s/å/\&aring;/eg
  silent s/æ/\&aelig;/eg
  silent s/ç/\&ccedil;/eg
  silent s/è/\&egrave;/eg
  silent s/é/\&eacute;/eg
  silent s/ê/\&ecirc;/eg
  silent s/ë/\&euml;/eg
  silent s/ì/\&igrave;/eg
  silent s/í/\&iacute;/eg
  silent s/î/\&icirc;/eg
  silent s/ï/\&iuml;/eg
  silent s/ð/\&eth;/eg
  silent s/ñ/\&ntilde;/eg
  silent s/ò/\&ograve;/eg
  silent s/ó/\&oacute;/eg
  silent s/ô/\&ocirc;/eg
  silent s/õ/\&otilde;/eg
  silent s/ö/\&ouml;/eg
  silent s/ø/\&oslash;/eg
  silent s/ù/\&ugrave;/eg
  silent s/ú/\&uacute;/eg
  silent s/û/\&ucirc;/eg
  silent s/ü/\&uuml;/eg
  silent s/ý/\&yacute;/eg
  silent s/þ/\&thorn;/eg
  silent s/ÿ/\&yuml;/eg
endfunction

--Preceding unsigned comment added by 11:11, July 23, 2010

I formatted the above. What was the dot doing in s./old/new/eg? Is the dot supposed to be before the s (the current line)? If so, it is redundant because the default is the current line. I removed the dot from each command. JohnBeckett 12:04, July 23, 2010 (UTC)
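
The long list of substitutions in Entities() could also be driven from a table. This is only a sketch: g:html_entity_table and EntitiesFromTable are hypothetical names, and the table holds just three sample pairs rather than the full list. It assumes the file is read as UTF-8.

```vim
" Sketch only: g:html_entity_table and EntitiesFromTable are hypothetical
" names, and the table holds just three sample pairs.
let g:html_entity_table = {'À': 'Agrave', 'é': 'eacute', 'ß': 'szlig'}

function! EntitiesFromTable() range
  for lnum in range(a:firstline, a:lastline)
    let text = getline(lnum)
    for [char, name] in items(g:html_entity_table)
      " \C forces a case-sensitive match; \& inserts a literal ampersand.
      let text = substitute(text, '\C' . char, '\&' . name . ';', 'g')
    endfor
    call setline(lnum, text)
  endfor
endfunction
```

Call it with a range, for example :%call EntitiesFromTable(). The \C prefix keeps À and à distinct even when 'ignorecase' is set.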