cURL for SEO

Long live the command line!

I’ve been quite fond of the command line since my early days on Linux, some 15 years ago.
It makes a lot of sense for SEOs to learn how to use it, for many different types of tasks.
And it’s often quicker than any other tool.

Today, let’s talk about cURL.
This command line tool is designed to transfer data using URLs. It’s more than 20 years old, fast and robust.
As an SEO, I use it daily for quick checks. Here are some.

Fetching a file

Simply download a file. For instance, an HTML page:

$ curl https://www.example.com/my-page.html  

It will print the source code for my-page.html in your terminal.
To save the file to your disk, simply redirect the output:

$ curl https://www.example.com/my-page.html > my-file.html  
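
Redirection works, but cURL can also write the file itself, with the -o option (or -O to keep the remote file name):

$ curl -o my-file.html https://www.example.com/my-page.html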

You could also use the pipe (|) to chain it with another command. For example, let’s see if our page contains a <title> tag:

$ curl https://www.example.com/my-page.html | grep "<title>"

This will print any line with a <title> tag. Or nothing if there aren’t any.
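
One note: when you pipe cURL into another command, it prints a progress meter to stderr. Add -s (silent) to keep things clean:

$ curl -s https://www.example.com/my-page.html | grep "<title>"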

Printing the headers only

I often just want to check the HTTP headers for a URL. It’s very simple with cURL, and the -I argument:

$ curl -I https://www.example.com/  

This will print the headers in the terminal, allowing you to check for X-Robots-Tag and other funky stuff. Note that -I sends a HEAD request, which some servers handle differently than a regular GET.
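
For instance, here’s a quick way to look for that header directly (grep -i makes the match case-insensitive):

$ curl -sI https://www.example.com/ | grep -i "x-robots-tag"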

To get both the headers and the body, use a lowercase -i instead:

$ curl -i https://www.example.com/  

Customising User-agent

You might want to use a specific User-agent to test some pages. And that’s quite easy with cURL too. Let’s use the standard Googlebot-Desktop UA:

$ curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://www.example.com/  
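
If you want to confirm the header is actually being sent, httpbin.org has an endpoint that simply echoes your User-agent back:

$ curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://httpbin.org/user-agent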

And that’s it!

Following redirects

Let’s try some more complex arguments. Imagine you want to follow all the steps of a redirect chain.
Here’s how to do it with cURL:

$ curl -sLD - -o /dev/null -w "%{url_effective}" https://httpbin.org/redirect/3
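
A quick breakdown of those arguments:

-s hides the progress meter
-L follows redirects
-D - dumps the response headers to stdout
-o /dev/null discards the body
-w "%{url_effective}" prints the final URL at the end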

Try this in your terminal: you will see 3 successive 302 redirects, then the final https://httpbin.org/get URL printed at the end.

Creating aliases

As we just saw, some cURL commands can be quite long and complex.
But we can create aliases to replace these complex commands with shortcuts.

Here are my most used cURL aliases:

# Fetch with the Googlebot smartphone User-agent
alias curlm='curl -A "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.103 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
# Fetch with the Googlebot desktop User-agent
alias curld='curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
# Follow redirects silently and print the final URL
alias curlr='curl -sLD - -o /dev/null -w "%{url_effective}"'

Simply paste these lines into your .bash_profile file (usually in your home directory) and restart your terminal, or run source ~/.bash_profile.
You will then be able to just use curlm https://www.example.com/ to fetch a URL using Googlebot’s mobile User-agent, and so on.
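
For example, with the curlr alias defined above, the redirect command from earlier becomes:

$ curlr https://httpbin.org/redirect/3

which prints https://httpbin.org/get.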

cURL has a lot of other options. Check out the man page (man curl) for more.
Cheers!
