Or any type of file, really. I needed to quickly remove links from an old website built from flat HTML files. On the Linux command line, I found I could do:
perl -pi -e 's/SEARCH/REPLACE/g' *.html
To replace all instances of SEARCH with REPLACE in *.html.
Except I needed to do a fair bit of escaping, because HTML is full of characters that can mean something else to the shell or to Perl's regex syntax (the / in particular, since it is also the delimiter of the s/// command).
So let’s say the string I needed to remove was:
<a title="Search Engine Optimisation" href="http://superspammyseocompany.com/" target="_self"><span>Search Engine Optimisation</span></a> by <a title="Super Spammy SEO Company" href="http://superspammyseocompany.com/" target="_self">Super Spammy SEO Company</a>
I copy + pasted this into vim, and then every time these characters occur:
<, >, / and "
I put a \ in front of each of these, which gave me:
\<a title=\"Search Engine Optimisation\" href=\"http:\/\/superspammyseocompany.com\/\" target=\"_self\"\>\<span\>Search Engine Optimisation\<\/span\>\<\/a\> by \<a title=\"Super Spammy SEO Company\" href=\"http:\/\/superspammyseocompany.com\/\" target=\"_self\"\>Super Spammy SEO Company\<\/a\>
Which was a bit of work, but still much more fun than manually removing the link from each file.
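Note that these characters worked fine for me without a backslash:
= (equals), . (dot), and _ (underscore)
Strictly speaking, an unescaped . is a regex wildcard that matches any character, so the pattern still matches the link, just more loosely. Write \. if you need an exact match.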
So my final command was:
perl -pi -e 's/\<a title=\"Search Engine Optimisation\" href=\"http:\/\/superspammyseocompany.com\/\" target=\"_self\"\>\<span\>Search Engine Optimisation\<\/span\>\<\/a\> by \<a title=\"Super Spammy SEO Company\" href=\"http:\/\/superspammyseocompany.com\/\" target=\"_self\"\>Super Spammy SEO Company\<\/a\>//' *.html
I’d already initialised a git repository and committed the files so I could easily restore the files in case of a mistake. A quick look through the links showed it all worked perfectly, and it saved me so much time I thought I’d write this post about it.
Bonus: I outputted all the changed files to list.html, which had one filename per line, like:
./file1.html
./file2.html
./file3.html
Here’s the vim command to turn them all into links, for easy human checking:
:%s/^\(.*\)$/<a href="\1">\1<\/a><br>/