Changes: Search across multiple lines

Latest revision as of 20:56, 31 July 2020

Tip 242

created 2002 · complexity intermediate · version 6.0

Vim can search for text that spans multiple lines. For example, the search /hello\_sworld finds "hello world" in a single line, and also finds "hello" ending one line, with "world" starting the next line. In a search, \s finds space or tab, while \_s finds newline or space or tab: an underscore adds a newline to any character class.

This tip shows how to search over multiple lines, and presents a user command, :S, for doing so. Entering :S hello world finds "hello" followed by "world" separated by spaces, tabs or newlines, and :S! hello world allows any non-word characters, including newlines, between the words.

Patterns including end-of-line

The search /^abcd finds abcd at the beginning of a line, and /abcd$ finds abcd at the end of a line. However, in /abcd^efgh and /abcd$efgh the ^ and $ are just ordinary characters with no special meaning. By contrast, each of the following has a special meaning anywhere in a search pattern.

`\n`	a newline character (line ending)
`\_s`	a whitespace (space or tab) or newline character
`\_^`	the beginning of a line (zero width)
`\_$`	the end of a line (zero width)
`\_.`	any character including a newline

Example searches:

/abcd\n*efgh: Finds abcd followed by zero or more newlines then efgh. Finds abcdefgh or abcd followed by blank lines and efgh. The blank lines have to be empty (no space or tab characters).

/abcd\_s*efgh: Finds abcd followed by any whitespace or newlines then efgh. Finds abcdefgh or abcd followed by blank lines and efgh. The blank lines can contain any number of space or tab characters. There may be whitespace after abcd or before efgh.

/abcd\_$\_s*efgh: Finds abcd at end-of-line followed by any whitespace or newlines then efgh. There must be no characters (other than a newline) following abcd. There can be any number of space, tab or newline characters before efgh.

/abcd\_s*\_^efgh: Finds abcd followed by any whitespace or newlines then efgh where efgh begins a line. There must be no characters (other than a newline) before efgh. There can be any number of space, tab or newline characters after abcd.

/abcd\_$efgh: Finds nothing because \_$ is "zero width" so the search is looking for abcdefgh where abcd is also at end-of-line (which cannot occur).

/abcd\_^efgh: Finds nothing because \_^ is "zero width" so the search is looking for abcdefgh where efgh is also at beginning-of-line (which cannot occur).

/abcd\_.\{-}efgh: Finds abcd followed by any characters or newlines (as few as possible) then efgh. Finds abcdefgh or abcd followed by any characters then efgh.

/abcd\(\_s.*\)\{0,18\}\_sefgh and /abcd\(\_s.\{-\}\)\{0,18\}\_sefgh: Find a block of 0 to 18 lines enclosed by abcd and efgh. The first pattern is greedy (it captures as many lines as possible, which may span multiple matches). To highlight each match separately, use the second pattern, which is non-greedy. Limiting the number of lines is important: replacing \{0,18\} with a star can cause Vim to consume 100% CPU.
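The example searches above can be tried on a scratch buffer. The following is a minimal sketch, assuming it is saved to a file and run with :source. The buffer text is made up for illustration. search() returns the line number where a match starts, or 0 when nothing matches.

```vim
" Sketch: try some of the patterns above on made-up text in a scratch buffer.
new
setlocal buftype=nofile
call setline(1, ['abcd', '', '  efgh', 'abcdefgh'])
call cursor(1, 1)

" Flags: 'c' accepts a match at the cursor, 'n' leaves the cursor in place,
" 'W' stops at the end of the buffer instead of wrapping around.

" Expected 4: line 3 starts with spaces, so only 'abcdefgh' matches.
echo search('abcd\n*efgh', 'cnW')

" Expected 1: \_s* also skips the spaces that lead line 3.
echo search('abcd\_s*efgh', 'cnW')

" Expected 1: abcd ends line 1, then whitespace and newlines, then efgh.
echo search('abcd\_$\_s*efgh', 'cnW')

" Expected 0: \_$ is zero width, so this pattern can never match.
echo search('abcd\_$efgh', 'cnW')

bwipeout!
```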

Searching for multiline HTML comments

It is common for comments in HTML documents to span several lines:

<!-- This comment
 covers two lines. -->

The following search finds any HTML comment:

/<!--\_.\{-}-->

The atom \_. finds any character including end-of-line. The multi \{-} matches as few as possible (stopping at the first "-->"; the multi * is too greedy and would stop at the last occurrence).
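As one possible use of this pattern, the same search can drive a substitute command. The commands below are an illustration and not part of the original tip: the first only counts the comments, the second deletes them.

```vim
" Count the HTML comments without changing the buffer
" (the n flag reports the number of matches only).
:%s/<!--\_.\{-}-->//gn

" Delete every HTML comment, including multiline ones. The e flag
" suppresses the error message when nothing matches.
:%s/<!--\_.\{-}-->//ge
```

Because \{-} keeps each match as short as possible, text between two separate comments is left untouched. Deleting a comment that spans several lines joins what remains of its first and last lines.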

Syntax highlighting may not be accurate, particularly with long comments. The following command will improve the accuracy when jumping in the file, but may be slower (:help :syn-sync):

:syntax sync fromstart

Searching over multiple lines

A pattern can find any specified characters, for example, [aeiou] matches 'a' or 'e' or 'i' or 'o' or 'u'. In addition, Vim defines several character classes. For example, \a is [A-Za-z] (matches any alphabetic character), and \A is [^A-Za-z] (opposite of \a; matches any non-alphabetic character). :help /\a

An underscore can be used to extend a character class to include a newline (end of line). For example, searching for \_[aeiou] finds a newline or a vowel, so \_[aeiou]\+ matches any sequence of vowels, even a sequence spanning multiple lines. Similarly, \_a\+ matches any sequence of alphabetic characters, even when spanning multiple lines.

The following search pattern finds "hello world" where any non-alphabetic characters separate the words:

hello\_[^a-zA-Z]*world

The above pattern (which is equivalent to hello\_A*world) matches "helloworld", and "hello? ... world", and similar strings, even if "hello" is on one line and "world" is on a following line.
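The behaviour described above can be checked with a short script. This is a sketch on made-up text, assuming it is run with :source. searchpos() returns the [line, column] where a match starts, or [0, 0] when nothing matches.

```vim
" Sketch: made-up text where "hello" and "world" are split across lines.
new
setlocal buftype=nofile
call setline(1, ['say hello,', '  world!', 'hello big world'])
call cursor(1, 1)

" Both forms should report [1, 5]: the comma, the newline and the leading
" spaces between the two words are all non-alphabetic.
echo searchpos('hello\_[^a-zA-Z]*world', 'cnW')
echo searchpos('hello\_A*world', 'cnW')

" Starting on line 3, no match is expected ([0, 0]) because the
" alphabetic word "big" separates the two words.
call cursor(3, 1)
echo searchpos('hello\_A*world', 'cnW')

bwipeout!
```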

Searching over multiple lines with a user command

The script below defines the command :S that will search for a phrase, even when the words are on different lines. Examples:

:S hello world: Searches for "hello" followed by "world", separated by whitespace including newlines.
:S! hello world: Searches for "hello" followed by "world", separated by any non-word characters (whitespace, newlines, punctuation). Finds, for example, "hello, world" and "hello+world" and "hello ... world". The words can be on different lines.

After entering the command, press n or N to search for the next or previous occurrence.

Put the following in your vimrc (or in file searchmultiline.vim in your plugin directory):

" Search for the ... arguments separated with whitespace (if no '!'),
" or with non-word characters (if '!' added to command).
function! SearchMultiLine(bang, ...)
  if a:0 > 0
    let sep = (a:bang) ? '\_W\+' : '\_s\+'
    let @/ = join(a:000, sep)
  endif
endfunction
command! -bang -nargs=* -complete=tag S call SearchMultiLine(<bang>0, <f-args>)|normal! /<C-R>/<CR>
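To see what the command searches for, the join() call from the function can be run on its own. The sketch below shows the patterns that :S hello world and :S! hello world would store in the search register; the two words are only sample arguments.

```vim
" Pattern built by :S hello world (words separated by whitespace or newlines).
echo join(['hello', 'world'], '\_s\+')
" hello\_s\+world

" Pattern built by :S! hello world (words separated by non-word characters).
echo join(['hello', 'world'], '\_W\+')
" hello\_W\+world
```

The arguments are inserted into the pattern without escaping, so regex characters typed after :S keep their special meaning in the search.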

References

:help pattern

@@ Line 5: / Line 5: @@
 |created=2002
 |complexity=intermediate
-|author=vim_power
+|author=
 |version=6.0
 |rating=31/16
@@ Line 11: / Line 11: @@
 |category2=
 }}
+Vim can search for text that spans multiple lines. For example, the search <code>/hello\_sworld</code> finds "hello world" in a single line, and also finds "hello" ending one line, with "world" starting the next line. In a search, <code>\s</code> finds space or tab, while <code>\_s</code> finds newline or space or tab: an underscore adds a newline to any character class.
-One of the most uncelebrated features of Vim is the ability to span a search across multiple lines.
+This tip shows how to search over multiple lines, and presents a useful command so entering <code>:S&nbsp;hello&nbsp;world</code> finds "hello" followed by "world" separated by spaces or tabs or newlines, and <code>:S!&nbsp;hello&nbsp;world</code> allows any non-word characters, including newlines, between the words.
-All of the following match line beginnings or endings anywhere in the search pattern, unlike <tt>^</tt> and <tt>$</tt>.
+==Patterns including end-of-line==
+The search <code>/^abcd</code> finds <code>abcd</code> at the beginning of a line, and <code>/abcd$</code> finds <code>abcd</code> at the end of a line. However, in <code>/abcd^efgh</code> and <code>/abcd$efgh</code> the <code>^</code> and <code>$</code> are just ordinary characters with no special meaning. By contrast, each of the following has a special meaning anywhere in a search pattern.
 {| class="cleartable"
-| <tt>\n</tt> || the newline character itself
+| <code>\n</code> || a newline character (line ending)
+|-
+| <code>\_s</code> || a whitespace (space or tab) or newline character
 |-
-| <tt>\_^</tt> || the beginning of a line
+| <code>\_^</code> || the beginning of a line (zero width)
 |-
-| <tt>\_$</tt> || the end of a line but before any newline character
+| <code>\_$</code> || the end of a line (zero width)
 |-
-| <tt>\_s</tt> || a space, tab character, or newline character
+| <code>\_.</code> || any character including a newline
 |}
+Example searches:
-For example, <tt>/{\_s</tt> finds <tt>{</tt> followed by a whitespace or newline character.
+;<code>/abcd\n*efgh</code>
+:Finds <code>abcd</code> followed by zero or more newlines then <code>efgh</code>.
+:Finds <code>abcdefgh</code> or <code>abcd</code> followed by blank lines and <code>efgh</code>.
+:The blank lines have to be empty (no space or tab characters).
+;<code>/abcd\_s*efgh</code>
-Some of these can be confusing to work with. For example, this works as expected:
+:Finds <code>abcd</code> followed by any whitespace or newlines then <code>efgh</code>.
-<pre>
+:Finds <code>abcdefgh</code> or <code>abcd</code> followed by blank lines and <code>efgh</code>.
-end one line\_^begin the next
+:The blank lines can contain any number of space or tab characters.
-</pre>
+:There may be whitespace after <code>abcd</code> or before <code>efgh</code>.
+;<code>/abcd\_$\_s*efgh</code>
-<tt>\_$</tt> is not equivalent. It also is a zero-length marker, but that means the end-of-line characters remain between it and the next line. The following never matches, because <tt>u</tt> doesn't match the end-of-line character.
+:Finds <code>abcd</code> at end-of-line followed by any whitespace or newlines then <code>efgh</code>.
-<pre>
+:There must be no characters (other than a newline) following <code>abcd</code>.
-end one line\_$um
+:There can be any number of space, tab or newline characters before <code>efgh</code>.
-</pre>
+;<code>/abcd\_s*\_^efgh</code>
-This does what you want:
+:Finds <code>abcd</code> followed by any whitespace or newlines then <code>efgh</code> where <code>efgh</code> begins a line.
-<pre>
+:There must be no characters (other than a newline) before <code>efgh</code>.
-end one line\nnext line
+:There can be any number of space, tab or newline characters after <code>abcd</code>.
-</pre>
+;<code>/abcd\_$efgh</code>
-<tt>\_s</tt> is a different kind of beast. You can insert the underscore in any of the character-class atoms to include line-ends in the class. In this case the match position moves past a line-end when it matches. This means you can search for things like <tt>\_S\+</tt> to match any sequence of NON-whitespace characters, even across multiple lines, or <tt>\_[abc]</tt> to match sequences of characters containing only the letters <tt>a</tt>, <tt>b</tt>, or <tt>c</tt>, that can span multiple lines.
+:Finds nothing because <code>\_$</code> is "zero width" so the search is looking for <code>abcdefgh</code> where <code>abcd</code> is also at end-of-line (which cannot occur).
+;<code>/abcd\_^efgh</code>
-The last member of the set is <tt>\_.</tt>, which matches any character in the buffer, including line-ends. <tt>\_.*</tt> matches the rest of the buffer from the current position. Use this with caution, because it can easily match much more than you want or slow down your search considerably. Consider using a non-greedy search (<tt>\_.\{-}</tt>) instead.
+:Finds nothing because <code>\_^</code> is "zero width" so the search is looking for <code>abcdefgh</code> where <code>efgh</code> is also at beginning-of-line (which cannot occur).
+;<code>/abcd\_.\{-}efgh</code>
-==References==
+:Finds <code>abcd</code> followed by any characters or newlines (as few as possible) then <code>efgh</code>.
-*{{help|pattern}}
+:Finds <code>abcdefgh</code> or <code>abcd</code> followed by any characters then <code>efgh</code>.
+;<code>/abcd\(\_s.*\)\{0,18\}\_sefgh</code><code>/abcd\(\_s.\{-\}\)\{0,18\}\_sefgh</code>
+:Finds a block of 0 to 18 lines enclosed by <code>abcd</code> and <code>efgh</code>. The first option is greedy (i.e. it captures as many lines as possible which may span multiple matches).  If you want each match highlighted separately use the second regex as it will be non-greedy..
+:limiting the number of lines is important, replacing this by a star will cause vim to consume 100% CPU.
+==Searching for multiline HTML comments==
-==Comments==
+It is common for comments in HTML documents to span several lines:
-{{todo}}
-*Sort out following mess. A very quick scan suggests some later comments fix problems/questions in earlier comments.
-*What is the Python script at the bottom? Surely way over-the-top?
-Haven't got time to look now. [[User:JohnBeckett|JohnBeckett]] 22:48, November 6, 2009 (UTC)
-----
-To seek out HTML comments over ''multiple'' lines, for example:
 <pre>
-<!-- foobar does
+<!-- This comment
- not exist -->
+ covers two lines. -->
 </pre>
-Use the search:
+The following search finds any HTML comment:
 <pre>
-/<!--\_p\{-}-->
+/<!--\_.\{-}-->
 </pre>
-We used <tt>\{-}</tt> the "few as possible" operator rather than <tt>*</tt> which is too greedy when there are many such comments in the file.
+The atom <code>\_.</code> finds any character including end-of-line. The multi <code>\{-}</code> matches as few as possible (stopping at the first "<code>--></code>"; the multi <code>*</code> is too greedy and would stop at the last occurrence).
+Syntax highlighting may be not be accurate, particularly with long comments. The following command will improve the accuracy when jumping in the file, but may be slower ({{help|:syn-sync}}):
-The key is of course <tt>\_p</tt> which is printable characters including EOL end-of-lines.
-However, the highlighting is very erratic when the span over number of lines exceeds, say, 30. And highlighting is rather spotty when there are shifts in screen views. This is due to the default that improves highlighting performance.
-If you want to ensure the most accurate highlighting, try:
 <pre>
 :syntax sync fromstart
 </pre>
+==Searching over multiple lines==
-This can slow things down on large files with complex highlighting. {{help|:syn-sync}}
+A pattern can find any specified characters, for example, <code>[aeiou]</code> matches 'a' or 'e' or 'i' or 'o' or 'u'. In addition, Vim defines several character classes. For example, <code>\a</code> is <code>[A-Za-z]</code> (matches any alphabetic character), and <code>\A</code> is <code>[^A-Za-z]</code> (opposite of <code>\a</code>; matches any non-alphabetic character). {{help|/\a}}
+An underscore can be used to extend a character class to include a newline (end of line). For example, searching for <code>\_[aeiou]</code> finds a newline or a vowel, so <code>\_[aeiou]\+</code> matches any sequence of vowels, even a sequence spanning multiple lines. Similarly, <code>\_a\+</code> matches any sequence of alphabetic characters, even when spanning multiple lines.
-----
-For some reason <tt>&lt;!--\_p\{-}--></tt> doesn't work if your comments are indented (with opening and closing comment tag indented).
+The following search pattern finds "hello world" where any non-alphabetic characters separate the words:
-Here's another way to highlight HTML comments using conventional regex:
 <pre>
+hello\_[^a-zA-Z]*world
-/<\!--\(.\|\n\)*-->
 </pre>
+The above pattern (which is equivalent to <code>hello\_A*world</code>) matches "helloworld", and "hello? ... world", and similar strings, even if "hello" is on one line and "world" is on a following line.
-However, this one will spill over to the next comment if there's more than one so it's not too useful.
+==Searching over multiple lines with a user command==
-----
+The script below defines the command <code>:S</code> that will search for a phrase, even when the words are on different lines. Examples:
-The Tab character is among the control chars, thus not matched with <tt>\p</tt> per default.
+;<code>:S hello world</code>
-----
+:Searches for "hello" followed by "world", separated by whitespace including newlines.
-The script attached to http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=256743 offers a convenient way to address this issue, you can type ":S foo bar" and it is translated to "/sfoo\_s\+bar"
+;<code>:S! hello world</code>
+:Searches for "hello" followed by "world", separated by any non-word characters (whitespace, newlines, punctuation).
+:Finds, for example, "hello, world" and "hello+world" and "hello ... world". The words can be on different lines.
+After entering the command, press <code>n</code> or <code>N</code> to search for the next or previous occurrence.
-This works even better for me. Put the following into file <tt>~/.vim/project/blanksearch.vim</tt>:
-<pre>
-:py <<EOF
-import vim
-def MySearch(*args):
-    s="\\_s\\+".join(args)
-    vim.command("/"+s)
-EOF
-command -nargs=* -complete=tag S :py MySearch(<f-args>)
-</pre>
+Put the following in your [[vimrc]] (or in file <code>searchmultiline.vim</code> in your plugin directory):
-Note the tab (not spaces!) in the two indented lines.
+<source lang="vim">
+" Search for the ... arguments separated with whitespace (if no '!'),
+" or with non-word characters (if '!' added to command).
+function! SearchMultiLine(bang, ...)
+  if a:0 > 0
+    let sep = (a:bang) ? '\_W\+' : '\_s\+'
+    let @/ = join(a:000, sep)
+  endif
+endfunction
+command! -bang -nargs=* -complete=tag S call SearchMultiLine(<bang>0, <f-args>)|normal! /<C-R>/<CR>
+</source>
+==See also==
-The advantage of this version is that <tt>N</tt> and <tt>n</tt> work afterwards.
+*[[Searching]] how to search
+*[[Search patterns]] regex information and examples
+*[[Search for visually selected text]] search for selected text; finds targets on multiple lines
+==References==
-----
+*{{help|pattern}}
+==Comments==