UnixWorld Online: Tutorial: Article No. 009

The vi/ex Editor, Part 7: A Little ``R'' and ``r'': The Fine Points of those Replacement Commands

By Walter Alan Zintz.

This installment of our Vi/Ex tutorial series is a diversion from the subjects I promised at the end of the previous part -- the change is my fault, and yet it is necessary. When I blithely suggested last time that the R command is just like the familiar r command, except for a few differences I mentioned, I was leading you astray.

There are several differences that can cause problems in certain uses unless you understand them. And you won't really comprehend the greatest of those differences until you know about metacharacters in insert mode. But as an encouragement to follow all this, consider that almost all of what I say here about the R command is also valid for all the other commands that put you into text insertion mode: a A i I o O c s :a :i et cetera.

There's more to R than to r

The r command replaces whatever character is presently under the cursor, so there must be some character under the cursor for it to replace -- otherwise it just gives you an error beep. Not so with R. You can give the R command on an empty line; whatever you type after that, up to the next escape character, will take the place of that empty line just as though you had typed past the end of an existing line after giving an R command. (I was going to say ``just as though you had given an a command'', but I'm now very leery of making comparisons that are incomplete without paragraphs of explanations.) You can even start entering text into a brand-new file via the R command.

The feature above can be useful in various situations; I only have space to mention one. At times I want to type new characters to replace blank spaces in a place where some of the lines are empty. Those empty lines contain no blanks; indeed, no characters at all. But I do not have to look at each line before I start typing on it, to see whether I should use an R or an a command, because R will work in either case.

The R command is more forgiving of your typing errors, too. Whatever character you type after an r is final. If you accidentally typed the wrong character, you can only put back what was there by typing a u command, if the mistake was the last editing command you typed, or put in the replacement you had in mind by returning the cursor to the spot and running another, more careful, r command.

But if you mistype during an R command, you can backspace over the error with the backspace key. Then you can type in the character (or characters; you can back up multiple spaces by repeating the backspace key) you should have typed. And if you simply typed too far, you'll be glad to know that backspacing doesn't just remove the incorrect characters, it restores the characters that were there, either right away or as soon as you hit the escape key. You can even backspace over everything you've typed during this R command before you type escape, because the editor does not object to a replacement string length of zero.
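
As a rough illustration only, the backspacing behavior just described can be modeled in a few lines of Python. The function name replace_mode and the use of '\b' to stand for the backspace key are assumptions of this sketch; it ignores linebreaks entirely and says nothing about how the editor is really implemented.

```python
def replace_mode(line, col, keys):
    """Model typing `keys` after an R command given at column `col` of `line`.

    '\b' in `keys` stands for the backspace key. This is a simplified,
    hypothetical model of the behavior described in the text, not the
    editor's own code, and it does not handle linebreaks.
    """
    typed = []                      # replacement characters typed so far
    for key in keys:
        if key == "\b":
            if typed:               # cannot back up past where R started
                typed.pop()         # the original character comes back
        else:
            typed.append(key)
    # Each surviving typed character overwrites one original character;
    # typing beyond the end of the line simply appends.
    return line[:col] + "".join(typed) + line[col + len(typed):]


print(replace_mode("hello world", 6, "there"))       # plain replacement
print(replace_mode("hello world", 6, "thx\b\b"))     # two characters restored
print(replace_mode("hello world", 6, "abc\b\b\b"))   # zero-length replacement
print(replace_mode("", 0, "new text"))               # R on an empty line
```

Because only the characters still standing when you hit escape overwrite anything, backing up over everything leaves the line exactly as it was.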

One caveat here, though, lest my clarification turn out to need a clarification of its own. With either of these commands it is possible to break a line, just by typing the return key as a replacement character, and with the R command this linebreaking can be done either while actually replacing characters or when typing on beyond the end of the existing line. With almost all versions of the editor, it is not possible to backspace over an inserted linebreak, even while you are still in R insertion mode.

The most important difference, though, is the handling of metacharacters. Yes, text insertion utilizes metacharacters too, quite apart from the ones that the replacement patterns in :substitute commands use. The r command recognizes hardly any of these metacharacters, and quoting those in as literal characters is very simple. The R command, though, recognizes almost all of them, and quoting characters in with R is rather complicated.

Quoting in Characters

The phrase ``quoting in'' is standard terminology, but it is rather misleading in the editor. Unlike Unix shells, the editor does not use any of the ASCII quotation marks: ` ' " (backquote, single and double quote) to quote characters into a file. Instead, it uses the backslash (``\'') and control-V (``^V''); the latter is what you send when you press the V key while holding the CONTROL or CTRL key down. In either case, you quote a character in by typing the quoting character just prior to the character you want to quote in. So if @ is your line kill character, and you want to put that character in the text you are typing in, you would have to type either \@ or ^V@ to get it there. And if you want several consecutive characters quoted in, you must quote each of them individually. That is, if you want to put @@@ into a line, you must type either ^V@^V@^V@ or \@\@\@ to put that string there.

But \ and ^V are not always interchangeable. In many cases either will work; but sometimes you must choose the right one. Which one to use depends both on what character you want to quote in and whether you're using the r or R command.

One obvious use for quoting is to insert a character that normally erases part or all of what you've just typed in. The ASCII backspace character, control-H, must be quoted in, and so must your own line-kill character (@ in the example above) and your own erase character if it is not control-H. With the r command you quote in any of these with a backslash; when using R you may quote any of these in using either backslash or control-V.

A pause here, to answer a question that might be in the minds of people who know a little about Unix internals. Ordinarily it is the asynchronous serial terminal line (or TTY) driver that recognizes the erase and line-kill characters and edits the input line accordingly without including these characters in the final result. Then, how can one enter these same input-line characters into the edit buffer if they don't get past the TTY driver? Because Vi/Ex places the TTY driver into a special ``raw'' mode that ignores the line-editing characters, passing them on to the editor. Otherwise you would not be able to quote these characters in. Also, the editor is set up to discover your erase and line-kill characters by querying your personal environment, and then interpret these characters as the line driver would have. A nifty feature -- but unfortunately, the editor has no way to let the user turn this feature off.

The editor's creators came up with a curious method for repeating short text insertions, where the text to go in is always the same but any outgoing text varies. They decided that when you are in screen mode, and have just gone into typing-in-text submode, and make Control-@ (``^@'') the first character you type in, then the editor should insert the last piece of text you had previously inserted (if it was not more than 128 characters long) and take you back to command mode. Unfortunately, they never made this work as promised.

In actuality, ^@ operates anywhere in a text insertion, not just in the first character position. What a ^@ does there depends on the situation. If your last c d y command, or one of their variants such as s D etcetera, removed or copied a full line of text or parts of two or more lines, or if you haven't run one of those commands in your current editing session, then typing ^@ is just a nuisance. It will take you out of text input submode and probably move the cursor back a few characters from where the input ended.

But if you have done at least one c d y command or a variant, and if the very last one you did removed or copied only a part of a single line of text, then surprise! Typing a ^@ in this case will do three things:

  1. Unless you typed it at the first character position on a line, it will move the cursor back one character. This will move over the last character you typed in if you've typed any, or over one existing character if you type ^@ as the first character of your insertion, but will not erase the character it passes over.
  2. Just to the left of the new cursor position, the editor will insert the text that was removed or copied by your last c d y command or variant. (If you went into text-insertion submode via a c command or a variant of it, the text you just took out is what will be put back in.)
  3. Finally, the text insertion will automatically end and you will be back in command submode, with the cursor positioned at the start of the last simple word that was inserted by the ^@ metacharacter.

Quoting a ^@ into your text isn't possible, because the editor reserves that character for internal use and will not accept it as itself in any file you may edit. Not that there would be any reason to put ^@ in a file anyway: it is the ASCII character NUL, a padding character that is routinely inserted in data streams by device drivers, and just as routinely stripped at the receiving end, so any ^@ characters you might add would be lost in the shuffle. But when you are using the R command, or any other command that lets you insert an indefinite amount of text, you can quote a ^@ anyway by preceding it with a ^V. The result will be to quote ^[Pb into your file at that point; this being the command string the editor issues to perform the odd operation I've detailed above.

Those of you who are skillful with the editor may wonder why the ^@ insertion operates only when your last text extraction was a fragment of one line. After all, the P command by itself inserts the contents of the unnamed buffer, and that buffer holds whatever was extracted last, be it half a line or a hundred lines, doesn't it? The answer lies in one of the editor's undocumented features. When you give a command to insert text, even the r command that only inserts a single character, the editor simultaneously flushes the unnamed buffer and leaves it empty -- if and only if that buffer contained more than a fragment of one line. So, when you entered the text insertion mode from which ^@ operates, you emptied the unnamed buffer unless there was only a fragment of one line in it.

At times you may want to use the beautify option to the set command. This tells the editor to throw away most, but not all, control characters you may try to type in -- the exceptions usually are the tab (^I), newline (^J), and form feed (^L) -- in order to keep you from inadvertently putting in invisible control characters that will be hard to detect later. This option is normally off, but you can type :se bf to turn it on.

But even when you want most control characters thrown out, there will be occasions when one must go in. This is not possible using an r command. The usual r technique of backslashing will usually bite back in this case -- the editor will interpret the control character by acting on its control meaning rather than inserting it in the text. Using R, though, you can insert most control characters by preceding each with ^V.

Even this may not be enough. Some systems are set up so that when certain control characters are typed in, even though preceded by ^V, the system acts on them as control characters before the editor ever sees them. To get around this problem, many implementations of the editor, especially older ones, interpret an ordinary character typed right after a ^V as a control character. That is, on these systems, typing ^VF or ^Vf while running an R command inserts a ^F in the file, just as typing ^V^F would on systems that don't have this problem.

Readers Ask

Here are the latest questions, and my solutions, from inquiring readers with problems you might face someday.

Tommy Spratlin writes:

Hi Walter,

In moving files from Windows machines to UNIX, some of our users do binary transfers which result in ^M characters in the ASCII files. Usually they occur at the ends of individual lines and I do:


:1,$ s/^M//g

where ^M is generated by ^V^M and everything works fine to delete these characters. I now have a new problem: I found a file with ^M characters embedded in it, but the file is one long line. I need to replace them with Vi's line-end character to split this long line into multiple lines. But I can't because it's the same as pressing the ENTER or RETURN key in the middle of the substitution command. How can I replace the superfluous carriage return? We have several files like this and it's causing problems viewing them with Web browsers.

I tried substituting a newline with the character code and the octal code unsuccessfully, and tried the ^M as a last unsuccessful resort.

Things aren't as complicated as you make them seem, Tommy. First of all, Web browsers generally ignore carriage-return and/or linefeed characters while formatting text for display. If your browser is choking on these all-one-line files, it is probably because the lines are too long for your browser, or for some other cause not related to embedded ^M characters.

Now, as you have deduced, the difference between Microsoft and Unix text file formats is that Microsoft operating systems seem to favor carriage-return followed by linefeed (^J) as the line separator, while Unix systems use linefeed alone.

As you've discovered, you cannot directly quote a ^J into any editor command. And yet, you put a ^J into your file every time you hit return during text entry, although the return key on most terminals sends a ^M character. That's the trick; the substitute command regards a ^M in the replacement pattern as a signal to insert a ^J and discard the ^M. So you only need to get that ^M into the replacement pattern by typing in your command line like this:


:1,$ s/^V^M/^V^M/g

You just have to overlook the appearance of futility in this command line, as though it were going to replace each ^M with itself. That first ^M is in the outgoing pattern, so it matches a real ^M. The second, in the replacement pattern, calls for a ^J as I explained above.

However, these all-one-line files may be too long for the Vi editor, which cannot handle lines much more than a thousand characters long in most common implementations, with shorter limits in older versions. The editor will truncate lines that exceed the limit, with only a minimal and rather cryptic warning. In such cases, use the tr utility to replace the ^M characters (which is a very straightforward job with that tool), before you bring the file into the Vi editor.
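
If tr is not at hand, the same preprocessing can be sketched in Python. This is only an illustration of the idea, not part of the editor or of tr; the function name cr_to_lf and the sample data are assumptions made for the example.

```python
def cr_to_lf(data: bytes) -> bytes:
    """Replace every carriage return (^M, 0x0D) with a linefeed (^J, 0x0A)."""
    return data.replace(b"\r", b"\n")


# Hypothetical sample: one long "line" whose real lines are separated by ^M.
sample = b"first line\rsecond line\rthird line\r"
print(cr_to_lf(sample).decode("ascii"))
```

Working on bytes rather than text keeps Python from doing any newline translation of its own, so only the ^M characters change.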

You may wonder then, how you would use the substitute command to put ^M characters into your file. The answer is to backslash the quoted-in ^M. To add a ^M at the end of every line in your file, so as to conform it to Microsoft practice, type this command:


:%s/$/\^V^M

(Note that it is important to type the \ first, then the ^V, followed by the ^M.) The ^V puts the immediately-following ^M into the command line, and the backslash tells the command that this ^M is to be considered a real one, not a metacharacter for ^J. In fact, these are the general principles for quoting characters almost everywhere except in typing-in-text mode:

  1. Precede a character by ^V to keep that character from being interpreted as a metacharacter at the moment you type it. In this case, you don't want typing ^M to immediately end the substitution command.
  2. Precede a character by a backslash to keep that character from acting as a metacharacter later, when what you've typed is interpreted by the editor -- for example, when what you have typed in is run as a command, or interpreted as a search pattern. This command uses a backslash to keep the command from inserting ^J instead of ^M at the time it executes.
  3. When you must use both, as in this case, type the backslash before you type the ^V. (If you think that this backslash would then affect the immediately following ^V rather than the later ^M, remember that the ^V is not there when the backslash takes effect. The ^V disappears as soon as it tells the editor to insert the ^M in the command instead of taking the ^M as the signal to end the command.)

Finally, you can replace linefeed characters with something else via line mode commands, but you must use two commands and only one of them is the substitute command. Suppose you need to change a short file's format from a number of lines to the format Tommy encountered: a single line with ^M separators. That is, replace each ^J (except the last) with a ^M. (This had better be a fairly short file, because even newer versions of the editor can't handle any lines longer than 1024 characters.)

Start by using a command similar to the one above to put ^M at the end of every line except the last. (Since these ^M characters are to separate lines, there's no use for one at the end of the last line.) Then use this command:


:%j!

to join all the lines into one. The ``j'' in this command line is the shortest abbreviation for the line mode join command, and the ``!'' switch at the end of it tells the command not to insert blank space between the lines it joins.
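
For comparison, here is a small Python sketch of the end result those two editor commands are meant to produce: every linefeed except the last becomes a ^M. The function name join_with_cr is an assumption of the sketch, and it works on a string in memory rather than on the edit buffer.

```python
def join_with_cr(text: str) -> str:
    """Turn linefeed-separated lines into one line with ^M separators.

    The final linefeed is kept, so the result is still one complete line.
    """
    lines = text.split("\n")
    if lines and lines[-1] == "":   # drop the empty piece after the last linefeed
        lines.pop()
    return "\r".join(lines) + "\n"


print(repr(join_with_cr("one\ntwo\nthree\n")))
```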

Thai-Nghia Dinh writes:

Hi,

I have a question (rather simple, really) but no one seems to know the answer. Not even the help desk (with all the Vi gurus :) ). I'm hoping you can help me with it.

I have a text file of unknown length. Each line of the file can be very short or very long (from 3 characters up to 1000 characters).

Within this file, I'm trying to locate (search) the nth occurrence of a word.

Here are a few things I've tried:

  1. The simple solution would be (from visual command mode): a /foobar command followed by the n command typed n-1 times. (But what if n is large, say 200 or greater?)
  2. :1,$ global /^/ /foobar/ (and its variations) Nothing useful...

Can you suggest a better way?

Yes, although it involves a slightly tricky procedure. Consider the following command string:


:$|/\<foobar\>/s//QQQ

The first command in this string takes us to the last line of our file and -- incidentally -- displays it on our screen, which is not important here. The second command searches forward for a line containing ``foobar'' as a word, and starting from the last line the search must wrap around and find the first instance in the file. Then that second command replaces the word ``foobar'' with ``QQQ'', leaving the cursor at the point where the substitution was made.

Now let us make an addition to the start of this command string:


:1,199g/^/$|/\<foobar\>/s//QQQ

This revised string repeats the procedure 199 times; each time the first instance of ``foobar'' remaining in the file is the one replaced. So we end up sitting on the ``QQQ'' string that replaced the 199th instance of ``foobar''; simply typing n will bring us to the 200th instance. And if we move off that 200th instance for any reason, going to the top of the file and searching for ``foobar'' will bring us right back to it, because the first 199 are now gone.

When we are finished with that 200th ``foobar'', this command:


:%s/QQQ/foobar/g

will change those 199 ``QQQ'' strings back to ``foobar''. Of course, if there is any chance that ``QQQ'' might occur in the document as itself, we can choose another dummy string.
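
If you only need to know where the nth instance is, and would rather not alter the file even temporarily, the counting can be sketched outside the editor. The Python below is a minimal illustration under stated assumptions: the function name nth_word_occurrence is invented for the example, and Python's \b word boundary is used as a stand-in for the editor's \< and \> (a close match when the word is made of letters, digits and underscores).

```python
import re


def nth_word_occurrence(text, word, n):
    """Return (line_number, column) of the n-th whole-word match, both 1-based.

    Returns None when the text holds fewer than n matches.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    count = 0
    for line_number, line in enumerate(text.split("\n"), start=1):
        for match in pattern.finditer(line):
            count += 1              # every instance counts, even several per line
            if count == n:
                return line_number, match.start() + 1
    return None


sample = "foobar one\nno match here\nxfoobar foobar foobar\n"
print(nth_word_occurrence(sample, "foobar", 3))
```

With the line number in hand, typing that number followed by G in the editor takes you to the right line.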

And while I'm at it, I've got another question.

How do I delete all lines beginning with a certain string, say, !@#$ (or foobar for that matter)? And a related question: how to delete lines containing the word foobar (anywhere within the line)?

The first command line following will solve your first problem, and the second will solve your second:


:g/^foobar/d
:g/\<foobar\>/d
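
To make plain what each of those two commands selects, here is a hedged Python sketch of the same filtering applied to a list of lines. The function names are invented for the illustration, and \b again stands in for the editor's \< and \>. Note that startswith compares literally, so a prefix such as !@#$ needs no special treatment here, whereas in an editor search pattern a trailing $ would have to be backslashed to keep it from meaning end-of-line.

```python
import re


def delete_lines_starting_with(lines, prefix):
    """Keep only the lines that do not begin with `prefix` (compare :g/^foobar/d)."""
    return [line for line in lines if not line.startswith(prefix)]


def delete_lines_containing_word(lines, word):
    """Keep only the lines that lack `word` as a whole word (compare :g/\\<foobar\\>/d)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [line for line in lines if not pattern.search(line)]


lines = ["foobar first", "a foobar here", "foobarbaz", "!@#$ odd line", "nothing"]
print(delete_lines_starting_with(lines, "foobar"))
print(delete_lines_containing_word(lines, "foobar"))
print(delete_lines_starting_with(lines, "!@#$"))
```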

Next Time Around

To make room to answer two readers' questions, I had to skip presenting three great Vi tools -- autoindent, abbreviate, and map! -- and the effect their metacharacters have in text-insertion mode. They'll be first up in the next part of this tutorial.

More answers to reader questions are coming, too. I have queries to answer about the semicolon address separator and about yanking within macros -- and if a few more significant problems arrive here, I'll try to fit them in, too.

And this time you won't have to wait and wait for the next tutorial part. As I write this paragraph, I'm already in the middle of creating the next part, so you should see it within a month after this part appears online.
