Find and replace in several files at once

I often need to find a set of characters in several files and replace it with something else. Opening each file and pressing Ctrl + H to find and replace is not very practical. Or, for example, I recently needed a list of all the images inserted in my site's pages.

You can do all this very easily with the sed command. sed is a stream editor for filtering and transforming text.

First, I use the grep command to find the files that contain the string; let's call it "old":

grep "old" *

The asterisk is expanded by the shell into every entry in the current directory, so grep searches all of those files (subdirectories themselves are not searched; grep just warns that they are directories). You can also use */*, */*/* and so on to reach files inside subdirectories, or grep -r to search recursively through ALL subdirectories. For each match, grep prints the name of the file followed by the whole line that contains "old".
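If you only want the names of the files (handy when you are about to hand them to sed), grep's -l option prints just the names of the matching files instead of the matching lines. Here is a minimal sketch in a throwaway directory; all the file and folder names are made up for illustration:

```shell
#!/bin/sh
# Throwaway demo directory; the file names below are hypothetical.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p posts/2019
printf 'some old text\n' > top.txt
printf 'nothing here\n' > other.txt
printf 'very old post\n' > posts/2019/post.md

# -l prints only the names of the files that contain a match.
grep -l "old" *.txt            # top.txt

# -r descends into every subdirectory; with -l it lists every matching file.
grep -rl "old" . | sort        # ./posts/2019/post.md and ./top.txt
```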

Now you can use sed to search for all the occurrences of "old" in that file and replace them with the "new" string:

sed -i 's/old/new/g' file

Or find and replace in a list of files, separated by a space:

sed -i 's/old/new/g' file1 file2 file3
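The grep and sed steps can also be chained, so that sed only touches the files grep actually found. This is a sketch assuming GNU grep, xargs and sed (as on most Linux systems); the demo files are hypothetical:

```shell
#!/bin/sh
# Throwaway demo directory; the file names below are hypothetical.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p sub
printf 'old and old again\n' > a.txt
printf 'nothing to see\n' > b.txt
printf 'the old line\n' > sub/c.txt

# grep -rl --null lists the matching files separated by NUL bytes,
# which keeps file names with spaces intact.
# xargs -0 passes them to sed; -r means "don't run sed at all if nothing matched".
grep -rl --null "old" . | xargs -0 -r sed -i 's/old/new/g'

cat a.txt sub/c.txt            # new and new again / the new line
```

b.txt is never opened by sed, because grep did not list it.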

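The -i option edits the files in place (attaching a suffix, such as -i.bak, keeps a backup copy of each original file; on macOS/BSD sed, -i requires an argument, so use -i '' for no backup). In s/old/new/g, s is the substitute command: it tries to match "old" (which is a regular expression) against each line, the pattern space, and if it succeeds, replaces the match with "new". The trailing g means global, i.e. replace every occurrence on the line and not just the first one.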
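A small sketch showing what the g flag and a backup suffix do, using a hypothetical file named file in a throwaway directory (the -i.bak form works with GNU sed):

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'old old\nold\n' > file

# Without g, only the first "old" on each line is replaced.
sed 's/old/new/' file          # prints "new old" then "new"

# With g, every "old" on every line is replaced.
sed 's/old/new/g' file         # prints "new new" then "new"

# A suffix attached to -i edits in place and keeps the original as file.bak.
sed -i.bak 's/old/new/g' file
cat file.bak                   # still "old old" then "old"
```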

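Beware: don't use -r with sed expecting it to search recursively the way grep -r does. In GNU sed, -r means something completely different: it enables extended regular expressions (the same as -E).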

If you are lazy, you can also use wildcards. For example, to target all the files in a folder called directory, you could use:

sed -i 's/old/new/g' directory/*

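This replaces "old" in every file directly inside directory. Only that level is covered: files in its subdirectories are not edited, and sed prints an error for each subdirectory the wildcard hands it.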

Another interesting command is d, or "delete", which deletes the lines it is applied to. It becomes even more useful when preceded by an exclamation mark (!d): the ! negates the match, so the command applies to every line that does NOT match, much like != means "not equal" in programming. This will help you do other kinds of searches, as I'll show you later.
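To make the difference concrete, here is a sketch that runs both forms on a hypothetical three-line file:

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' '<p>intro</p>' '<img src="a.png">' '<p>outro</p>' > sample.html

# /<img/d deletes every line containing "<img" and prints the rest.
sed '/<img/d' sample.html      # <p>intro</p> and <p>outro</p>

# /<img/!d deletes every line NOT containing "<img", keeping only the image.
sed '/<img/!d' sample.html     # <img src="a.png">
```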

Advanced

There is an even more powerful command that uses our good ol' friend find. However, it is much harder to remember and type, so you will probably want to make a bash alias for it.

This is how it looks:

find -L . -type f -exec sed -i 's|old|new|g' '{}' \;

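In this case, you don't even need to specify the path to your files: find searches recursively from your current directory down (-L makes it follow symbolic links, and using | instead of / as the s delimiter means slashes in the strings don't need escaping). The downside is speed: because \; runs a separate sed process for every file found, it is slower than a single sed call on a list of files. So use it if you are very, VERY lazy.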

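Be careful when running it at the root of a git repo: find also descends into the .git directory, so sed rewrites git's own files too, including the binary index, and you may get errors like this: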


$ git status
error: bad index file sha1 signature
fatal: index file corrupt

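you can delete the index and let git reset rebuild it from the last commit (your working files are left alone, but anything you had staged becomes unstaged):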


$ rm -f .git/index
$ git reset
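To avoid the problem in the first place, and since an alias can't take arguments in the middle of a command, a small shell function may work better. This is a sketch assuming GNU find and GNU sed; replace_everywhere is a hypothetical name, and I left out -L (add it back if you need to follow symbolic links):

```shell
#!/bin/sh
# Hypothetical helper. Usage: replace_everywhere OLD NEW
# OLD is a regular expression. Neither argument may contain "|",
# because "|" is used as the delimiter of the s command.
replace_everywhere() {
  # -name .git -prune stops find from descending into any .git directory,
  # so git's index and objects are never touched.
  # -exec ... {} + passes many files to each sed call instead of one per file.
  find . -name .git -prune -o -type f -exec sed -i "s|$1|$2|g" {} +
}
```

You would call it from the folder you want to process, e.g. replace_everywhere old new.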

Example with sed

I recently used sed on the files of my http://javasnippets.tk site (see blog post). I had saved my SVG files as name.html instead of name.svg, so I renamed them. But since these SVG files were in the Jekyll _includes folder, they were embedded in several other pages of the site. For example, github.html, which stored an SVG version of the GitHub logo, was included in 11 files:


$ grep "github.html" *
agecalc.html:	<span class="button text-center">{% include github.html %} <a href="https://github.com/{{ site.git_username }}/{{ site.git_repo}}/tree/master/ageCalc">View Age Calculation‘s repo</a></span>
ascii.html:		<span class="button text-center">{% include github.html %} <a href="https://github.com/{{ site.git_username }}/{{ site.git_repo}}/tree/master/ascii">View ASCII‘s repo</a></span>
commons.html: ...

...
etc.

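Instead of opening 11 files to find and replace by hand, I used a single sed command (the backslash before the dot makes it a literal ".", since an unescaped . in a regular expression matches any character):

sed -i 's/github\.html/github.svg/g' *.html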
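A quick way to see the difference between an unescaped and an escaped dot in the pattern, using two made-up lines in a hypothetical page.html:

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' '{% include github.html %}' 'githubXhtml' > page.html

# Unescaped: "." matches any character, so "githubXhtml" is changed too.
sed 's/github.html/github.svg/g' page.html
# {% include github.svg %}
# github.svg

# Escaped: "\." matches only a literal dot, so only the include changes.
sed 's/github\.html/github.svg/g' page.html
# {% include github.svg %}
# githubXhtml
```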

Getting a list of the images in all your site pages

I had a bunch of files in the img folder of one of my sites, left over from all the testing, optimizing, etc. I didn't want them cluttering my production site, so I needed to keep only the ones that were actually inserted through an img tag and remove the rest. How to do that?

If it had been a WordPress installation, I could have just gone to phpMyAdmin and typed some good ol' SQL to get a list of the used images. But it was a Jekyll site, so this is what I did:


$ cd sitefolder
$ sed '/<img/!d' _posts/* > output.txt
$ sed -i 's/.*}//g' output.txt && sed -i 's/".*//g' output.txt

Let's break down what we did here.

On its own, /<img/d would delete every line containing the string <img, but if we negate it with !, it deletes every line BUT the ones containing <img. With _posts/*, I'm telling sed to read all the files in the _posts directory, which is where Jekyll stores your blog posts. Finally, > output.txt is shell redirection: it works with any command you type in your terminal and writes the command's output to a file named output.txt instead of the screen (you can change the name to something that makes more sense to you).

So I get an output.txt with these contents:


<img src="{{ site.baseurl }}/img/filename1.png" width="579" height="630" alt="Alt text">
<img src="{{ site.baseurl }}/img/filename2.jpg" width="482" height="573" alt="Alt text" style="margin-top: 1em">

...

There is still a lot of stuff there that isn't the image names ({{ site.baseurl }} is Jekyll's Liquid notation for automatically inserting the base URL of the site). Let's look at the second and third sed commands. What do they do?

The first:


sed -i 's/.*}//g' output.txt

is just searching for the pattern .*} in output.txt and replacing it with... well, with nothing. So what is that expression targeting? .* matches any sequence of characters, and since it is greedy, .*} matches everything from the start of the line up to and including the last } on that line. The second sed command does something similar:


sed -i 's/".*//g' output.txt

it runs on what the first sed left behind, finds the first " on each line and replaces it, together with everything after it (that is, .*), with nothing. Since sed works line by line, we end up with an output.txt file whose contents are:
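The two substitutions can also run in a single sed call, since several -e expressions are applied in order to each line. A sketch on sample lines modeled on the output.txt above (the file names are placeholders); like the original, it assumes one <img> per line and no } after the src value:

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
cat > output.txt <<'EOF'
<img src="{{ site.baseurl }}/img/filename1.png" width="579" height="630" alt="Alt text">
<img src="{{ site.baseurl }}/img/filename2.jpg" width="482" height="573" alt="Alt text" style="margin-top: 1em">
EOF

# 1) greedy .*} removes everything up to the last "}" on the line
# 2) ".* removes the first double quote and everything after it
sed -i -e 's/.*}//' -e 's/".*//' output.txt

cat output.txt
# /img/filename1.png
# /img/filename2.jpg
```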


/img/filename1.png
/img/filename2.jpg

...

And now it's very easy to manipulate that file for your purposes. For example, if you wanted to copy those images to a new path, you would need to add cp (the Linux copy command) plus a space at the beginning of each line, and a space plus the new path for your files at the end of each line. Then you could run the resulting file in your terminal.

But we can do that with sed too! Take our previous "double sed" command. If, instead of replacing with nothing, we replace with "cp ." in the first sed (the dot turns the absolute /img/... path into the relative ./img/...) and with " newpath" (a space plus the new path) in the second sed, we're done! Inside single quotes, the spaces don't need escaping.


$ cd sitefolder
$ sed '/<img/!d' _posts/* > output.txt
$ sed -i 's/.*}/cp ./g' output.txt && sed -i 's/".*/ newpath/g' output.txt

This would generate an output.txt file containing:


cp ./img/filename1.png newpath
cp ./img/filename2.jpg newpath

...

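Make sure the newpath directory exists first (otherwise every cp would copy to one file named newpath, overwriting it each time), then run that file with your shell from inside sitefolder:


$ mkdir -p newpath
$ sh output.txt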

You will have all your relevant images (only the ones that are actually embedded in your posts) copied magically to a new path.

TA-DA!

And now, go for a cookie :-)

Want to know more?

If you want a good resource about sed with exhaustive examples, Bruce Barnett wrote an excellent guide at Grymoire.
