JohnBeckett (talk | contribs) (more wikify and minor tweaks) |
(→Patterns including end-of-line: Adding non-greedy variant to example) Tag: Visual edit |
Latest revision as of 20:56, 31 July 2020
Vim can search for text that spans multiple lines. For example, the search /hello\_sworld finds "hello world" in a single line, and also finds "hello" ending one line, with "world" starting the next line. In a search, \s finds space or tab, while \_s finds newline or space or tab: an underscore adds a newline to any character class.
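The difference between \s and \_s can be checked in a throwaway buffer. The following is a minimal sketch, not part of the original tip; the function name DemoWhitespaceSearch and the sample lines are made up for illustration, and only the built-in search() function is used.

```vim
" Hypothetical helper: compares \s and \_s in a scratch buffer.
function! DemoWhitespaceSearch() abort
  new
  setlocal buftype=nofile
  call setline(1, ['say hello', 'world again', 'hello world'])
  call cursor(1, 1)
  " Flags: 'c' accepts a match at the cursor, 'n' leaves the cursor alone,
  " 'W' stops at the end of the buffer instead of wrapping around.
  let same_line_only = search('hello\sworld', 'cnW')
  let across_lines = search('hello\_sworld', 'cnW')
  bwipeout!
  return [same_line_only, across_lines]
endfunction

" Prints [3, 1]: \s only matches on line 3, \_s already matches on line 1.
echo DemoWhitespaceSearch()
```

search() returns the line number where the match starts, so the second value shows that \_s lets the match begin on line 1 and continue onto line 2.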
This tip shows how to search over multiple lines, and presents a useful command so entering :S hello world finds "hello" followed by "world" separated by spaces or tabs or newlines, and :S! hello world allows any non-word characters, including newlines, between the words.
Patterns including end-of-line
The search /^abcd finds abcd at the beginning of a line, and /abcd$ finds abcd at the end of a line. However, in /abcd^efgh and /abcd$efgh the ^ and $ are just ordinary characters with no special meaning. By contrast, each of the following has a special meaning anywhere in a search pattern.
\n  | a newline character (line ending)
\_s | a whitespace (space or tab) or newline character
\_^ | the beginning of a line (zero width)
\_$ | the end of a line (zero width)
\_. | any character including a newline
Example searches:
/abcd\n*efgh
- Finds abcd followed by zero or more newlines then efgh.
- Finds abcdefgh or abcd followed by blank lines and efgh.
- The blank lines have to be empty (no space or tab characters).
/abcd\_s*efgh
- Finds abcd followed by any whitespace or newlines then efgh.
- Finds abcdefgh or abcd followed by blank lines and efgh.
- The blank lines can contain any number of space or tab characters.
- There may be whitespace after abcd or before efgh.
/abcd\_$\_s*efgh
- Finds abcd at end-of-line followed by any whitespace or newlines then efgh.
- There must be no characters (other than a newline) following abcd.
- There can be any number of space, tab or newline characters before efgh.
/abcd\_s*\_^efgh
- Finds abcd followed by any whitespace or newlines then efgh where efgh begins a line.
- There must be no characters (other than a newline) before efgh.
- There can be any number of space, tab or newline characters after abcd.
/abcd\_$efgh
- Finds nothing because \_$ is "zero width" so the search is looking for abcdefgh where abcd is also at end-of-line (which cannot occur).
/abcd\_^efgh
- Finds nothing because \_^ is "zero width" so the search is looking for abcdefgh where efgh is also at beginning-of-line (which cannot occur).
/abcd\_.\{-}efgh
- Finds abcd followed by any characters or newlines (as few as possible) then efgh.
- Finds abcdefgh or abcd followed by any characters then efgh.
/abcd\(\_s.*\)\{0,18\}\_sefgh
/abcd\(\_s.\{-\}\)\{0,18\}\_sefgh
- Finds a block of 0 to 18 lines enclosed by abcd and efgh. The first pattern is greedy (it matches as many lines as possible, so a single match may swallow several blocks). To have each block matched and highlighted separately, use the second pattern, which is non-greedy.
- Limiting the number of lines is important: replacing the \{0,18\} count with a star can make Vim consume 100% CPU.
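The zero-width behaviour of \_$ and \_^ in the examples above can be checked with a two-line scratch buffer. This is an illustrative sketch only; DemoZeroWidth is a hypothetical name, and the buffer contents are chosen to match the abcd/efgh examples.

```vim
" Hypothetical helper: tries several patterns on a buffer holding
" 'abcd' on line 1 and 'efgh' on line 2.
function! DemoZeroWidth() abort
  new
  setlocal buftype=nofile
  call setline(1, ['abcd', 'efgh'])
  call cursor(1, 1)
  let results = {}
  for pat in ['abcd\_$efgh', 'abcd\_^efgh', 'abcd\nefgh',
        \ 'abcd\_$\_s*efgh', 'abcd\_s*\_^efgh']
    " search() returns 0 when nothing matches, else the starting line.
    let results[pat] = search(pat, 'cnW')
  endfor
  bwipeout!
  return results
endfunction

" The first two patterns give 0 (no match); the other three give 1.
echo DemoZeroWidth()
```

The two failing patterns never consume the newline, which is why something like \n or \_s* is needed between the two halves.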
Searching for multiline HTML comments
It is common for comments in HTML documents to span several lines:
<!-- This comment
covers two lines. -->
The following search finds any HTML comment:
/<!--\_.\{-}-->
The atom \_. finds any character including end-of-line. The multi \{-} matches as few as possible (stopping at the first "-->"; the multi * is too greedy and would stop at the last occurrence).
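To see the effect of \{-} versus *, the sketch below reports the line on which a match ends when the buffer holds two comments. It is an illustration under assumptions: DemoCommentEnd and the sample HTML lines are invented here, and searchpos() with the 'e' flag is used to get the end of the match.

```vim
" Hypothetical helper: returns the line where the first match of
" a:pattern ends, in a scratch buffer containing two HTML comments.
function! DemoCommentEnd(pattern) abort
  new
  setlocal buftype=nofile
  call setline(1, ['<!-- first', 'comment -->', '<p>text</p>',
        \ '<!-- second -->'])
  call cursor(1, 1)
  " 'e' reports the end of the match instead of its start.
  let end_line = searchpos(a:pattern, 'cenW')[0]
  bwipeout!
  return end_line
endfunction

" Non-greedy: the match ends on line 2, at the first '-->'.
echo DemoCommentEnd('<!--\_.\{-}-->')
" Greedy: the match runs on to line 4, the last '-->' in the buffer.
echo DemoCommentEnd('<!--\_.*-->')
```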
Syntax highlighting may not be accurate, particularly with long comments. The following command improves the accuracy when jumping around in the file, but may be slower (:help :syn-sync):
:syntax sync fromstart
Searching over multiple lines
A pattern can find any specified characters, for example, [aeiou] matches 'a' or 'e' or 'i' or 'o' or 'u'. In addition, Vim defines several character classes. For example, \a is [A-Za-z] (matches any alphabetic character), and \A is [^A-Za-z] (opposite of \a; matches any non-alphabetic character). :help /\a
An underscore can be used to extend a character class to include a newline (end of line). For example, searching for \_[aeiou] finds a newline or a vowel, so \_[aeiou]\+ matches any sequence of vowels, even a sequence spanning multiple lines. Similarly, \_a\+ matches any sequence of alphabetic characters, even when spanning multiple lines.
The following search pattern finds "hello world" where any non-alphabetic characters separate the words:
hello\_[^a-zA-Z]*world
The above pattern (which is equivalent to hello\_A*world) matches "helloworld", and "hello? ... world", and similar strings, even if "hello" is on one line and "world" is on a following line.
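As a quick check of the two equivalent patterns, the following sketch searches a scratch buffer where "hello" and "world" are separated by punctuation and a line break. The function name DemoNonAlphaGap and the sample text are assumptions made for this example.

```vim
" Hypothetical helper: tries both spellings of the pattern, then shows
" that letters between the words prevent a match.
function! DemoNonAlphaGap() abort
  new
  setlocal buftype=nofile
  call setline(1, ['hello? ...', 'world', 'hello big world'])
  call cursor(1, 1)
  let with_class = search('hello\_A*world', 'cnW')
  let with_collection = search('hello\_[^a-zA-Z]*world', 'cnW')
  " From line 3 on, only 'hello big world' is left to search.
  call cursor(3, 1)
  let with_letters = search('hello\_A*world', 'cnW')
  bwipeout!
  return [with_class, with_collection, with_letters]
endfunction

" Prints [1, 1, 0].
echo DemoNonAlphaGap()
```

Both forms find the match starting on line 1, while "hello big world" is rejected because "big" consists of alphabetic characters.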
Searching over multiple lines with a user command
The script below defines the command :S that will search for a phrase, even when the words are on different lines. Examples:
:S hello world
- Searches for "hello" followed by "world", separated by whitespace including newlines.
:S! hello world
- Searches for "hello" followed by "world", separated by any non-word characters (whitespace, newlines, punctuation).
- Finds, for example, "hello, world" and "hello+world" and "hello ... world". The words can be on different lines.
After entering the command, press n or N to search for the next or previous occurrence.
Put the following in your vimrc (or in file searchmultiline.vim in your plugin directory):
" Search for the ... arguments separated with whitespace (if no '!'),
" or with non-word characters (if '!' added to command).
function! SearchMultiLine(bang, ...)
  if a:0 > 0
    let sep = (a:bang) ? '\_W\+' : '\_s\+'
    let @/ = join(a:000, sep)
  endif
endfunction
command! -bang -nargs=* -complete=tag S call SearchMultiLine(<bang>0, <f-args>)|normal! /<C-R>/<CR>
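To make it easier to see what ends up in the search register, the sketch below repeats the join() step from SearchMultiLine() in a standalone function. BuildMultiLinePattern is a hypothetical name used only for this illustration; it is not defined by the script above.

```vim
" Hypothetical helper mirroring the join() call in SearchMultiLine():
" returns the pattern instead of storing it in the search register.
function! BuildMultiLinePattern(bang, words) abort
  let sep = a:bang ? '\_W\+' : '\_s\+'
  return join(a:words, sep)
endfunction

" Pattern for ':S hello world': hello\_s\+world
echo BuildMultiLinePattern(0, ['hello', 'world'])
" Pattern for ':S! hello world': hello\_W\+world
echo BuildMultiLinePattern(1, ['hello', 'world'])
```

Because the words are joined without escaping, any regex metacharacters typed as part of a word remain active in the resulting pattern.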
See also
- Searching: how to search
- Search patterns: regex information and examples
- Search for visually selected text: search for selected text; finds targets on multiple lines