You are on page 1of 31

Foot-printing & Reconnaissance #1

Google Hacking

“search term”

Force an exact-match search. Use this to refine results for ambiguous


searches, or to exclude synonyms when searching for single words.

Example: “Bill Gates”

OR

Search for X or Y. This will return results related to X or Y, or both. Note: The


pipe (|) operator can also be used in place of “OR.”

Examples: musk OR gates / musk | gates

AND

Search for X and Y. This will return only results related to both
X and Y. Note: It doesn’t really make much difference for regular searches, as
Google defaults to “AND” anyway. But it’s very useful when paired with
other operators.

Example: musk AND gates
-
Exclude a term or phrase. In our example, any pages returned will be related
to jobs but not Apple (the company).

Example: jobs -apple

*
Acts as a wildcard and will match any word or phrase.

Example: steve * apple 

()
Group multiple terms or search operators to control how the search is
executed.

Example: (imac OR iphone) apple

$
Search for prices. Also works for Euro (€), but not GBP (£) 

Example: iphone $629
define:
A dictionary built into Google, basically. This will display the meaning of a
word in a card-like result in the SERPs.

Example: define:hacker

cache:
Returns the most recent cached version of a web page (providing the page is
indexed, of course).

Example: cache:apple.com

filetype:
Restrict results to those of a certain filetype. E.g., PDF, DOCX, TXT, PPT,
etc. Note: The “ext:” operator can also be used—the results are identical.

Example: apple filetype:pdf / apple ext:pdf

site:
Limit results to those from a specific website.

Example: site:apple.com
related:
Find sites related to a given domain.

Example: related:apple.com

intitle:
Find pages with a certain word (or words) in the title. In our example, any
results containing the word “apple” in the title tag will be returned.

Example: intitle:apple

allintitle:
Similar to “intitle,” but only results containing all of the specified words in
the title tag will be returned.

Example: allintitle:apple iphone

inurl:
Find pages with a certain word (or words) in the URL. For this example, any
results containing the word “apple” in the URL will be returned.

Example: inurl:apple
allinurl:
Similar to “inurl,” but only results containing all of the specified words in
the URL will be returned.

Example: allinurl:apple iphone

intext:
Find pages containing a certain word (or words) somewhere in the content.
For this example, any results containing the word “apple” in the page
content will be returned.

Example: intext:apple

allintext:
Similar to “intext,” but only results containing all of the specified words
somewhere on the page will be returned.

Example: allintext:apple iphone
AROUND(X)
Proximity search. Find pages containing two words or phrases within X words
of each other. For this example, the words “apple” and “iphone” must be
present in the content and no further than four words apart.

Example: apple AROUND(4) iphone

weather:
Find the weather for a specific location. This is displayed in a weather
snippet, but it also returns results from other “weather” websites.

Example: weather:san francisco

stocks:
See stock information (i.e., price, etc.) for a specific ticker.

Example: stocks:aapl

map:
Force Google to show map results for a locational search.

Example: map:silicon valley
movie:
Find information about a specific movie. Also finds movie showtimes if the
movie is currently showing near you.

Example: movie:steve jobs

in
Convert one unit to another. Works with currencies, weights, temperatures,
etc.

Example: $329 in GBP

source:
Find news results from a certain source in Google News.

Example: apple source:the_verge

x_x
Not exactly a search operator, but acts as a wildcard for Google
Autocomplete.

Example: apple CEO _ jobs
#..#
Search for a range of numbers. In the example below, searches related to
“WWDC videos” are returned for the years 2010–2014, but not for 2015 and
beyond.

Example: wwdc video 2010..2014

inanchor:
Find pages that are being linked to with specific anchor text. For this
example, any results with inbound links containing either “apple” or
“iphone” in the anchor text will be returned.

Example: inanchor:apple iphone

allinanchor:
Similar to “inanchor,” but only results containing all of the specified words in
the inbound anchor text will be returned.

Example: allinanchor:apple iphone

loc:placename
Find results from a given area.

Example: loc:”san francisco” apple


location:
Find news from a certain location in Google News.

Example: loc:”san francisco” apple

Vulnerable web servers


The following Google Dork can be used to detect vulnerable or hacked
servers that allow appending “/proc/self/cwd/” directly to the URL of your
website.

inurl:/proc/self/cwd

As you can see in the following screenshot, vulnerable server results will
appear, along with their exposed directories that can be surfed from your own
browser.
Open FTP servers
Google does not only index HTTP-based servers, it also indexes open FTP
servers.

With the following dork, you’ll be able to explore public FTP servers, which
can often reveal interesting things.

intitle:"index of" inurl:ftp

In this example, we found an important government server with their FTP


space open. Chances are that this was on purpose — but it could also be a
security issue.
ENV files
.env files are the ones used by popular web development frameworks to
declare general variables and configurations for local and online dev
environments.

One of the recommended practices is to move these .env files to somewhere


that isn’t publicly accessible. However, as you will see, there are a lot of devs
who don’t care about this and insert their .env file in the main public website
directory.

As this is a critical dork we will not show you how do it; instead, we will only
show you the critical results:
You’ll notice that unencrypted usernames, passwords and IPs are directly
exposed in the search results. You don’t even need to click the links to get the
database login details.

SSH private keys

SSH private keys are used to decrypt information that is exchanged in the
SSH protocol. As a general security rule, private keys must always remain on
the system being used to access the remote SSH server, and shouldn’t be
shared with anyone.

With the following dork, you’ll be able to find SSH private keys that were
indexed by uncle Google.

intitle:index.of id_rsa -id_rsa.pub

Let’s move on to another interesting SSH Dork.

If this isn’t your lucky day, and you’re using a Windows operating system with
PUTTY SSH client, remember that this program always logs the usernames
of your SSH connections.

In this case, we can use a simple dork to fetch SSH usernames from PUTTY
logs:

filetype:log username putty

Here’s the expected output:


Email lists
It’s pretty easy to find email lists using Google Dorks. In the following
example, we are going to fetch excel files which may contain a lot of email
addresses.

filetype:xls inurl:”email.xls"

We filtered to check out only the .edu domain names and found a popular
university with around 1800 emails from students and teachers.
site:.edu filetype:xls inurl:"email.xls"

Remember that the real power of Google Dorks comes from the unlimited
combinations you can use. Spammers know this trick too, and use it on a
daily basis to build and grow their spamming email lists.

Live cameras
Have you ever wondered if your private live camera could be watched not
only by you but also by anyone on the Internet?

The following Google hacking techniques can help you fetch live camera web
pages that are not restricted by IP.

Here’s the dork to fetch various IP based cameras:

inurl:top.htm inurl:currenttime

To find WebcamXP-based transmissions:

intitle:"webcamXP 5"

And another one for general live cameras:

inurl:”lvappl.htm"

There are a lot of live camera dorks that can let you watch any part of the
world, live. You can find education, government, and even military cameras
without IP restrictions.

If you get creative you can even do some white hat penetration testing on
these cameras; you’ll be surprised at how you’re able to take control of the
full admin panel remotely, and even re-configure the cameras as you like.
Find indexation errors
Google indexation errors exist for most sites.

It could be that a page that should be indexed, isn’t. Or vice-versa.

Let’s use the site: operator to see how many pages Google has indexed
for ahrefs.com.

~1,040.

But how many of these pages are blog posts?

Let’s find out.
~249. That’s roughly ¼.

Let’s investigate further.

OK, so it seems that a few odd pages are being indexed.

(This page isn’t even live—it’s a 404)

Such pages should be removed from the SERPs by noindexing them.

Let’s also narrow the search to subdomains and see what we find.

~731 results.

SIDENOTE. Here, we’re using the wildcard (*) operator to find all


subdomains belonging to the domain, combined with the exclusion operator
(-) to exclude regular www results.
Here’s a page residing on a subdomain that definitely shouldn’t be indexed.
It gives a 404 error for a start.

Here are a few other ways to uncover indexation errors with Google
operators:

• site:yourblog.com/category — find WordPress blog category pages;


• site:yourblog.com inurl:tag — find WordPress “tag” pages.

Find non-secure pages


(non-https)
HTTPs is a must these days, especially for ecommerce sites.

But did you know that you can find unsecure pages with the site: operator?

Let’s try it for asos.com.

~2.47M unsecure pages.
It looks like ASOS don’t currently use SSL—unbelievable for such a large site.

But here’s another crazy thing:

ASOS is accessible at both the https and http versions.

And we learned all that from a simple site: search!

SIDENOTE. Sometimes, when using this tactic, pages will be indexed without


the https. But when you click-through, you will be directed to the https
version. So don’t assume that your pages are unsecure just because they
appear as such in Google’s index. Always click a few of them to double-
check. 
Find duplicate content
issues
Duplicate content = bad.

Here’s a pair of Abercrombie and Fitch jeans from ASOS with this brand


description:

With third-party brand descriptions like this, they’re often duplicated on


other sites.

But first, I’m wondering how many times this copy appears on asos.com.

~4.2K.
Now I’m wondering if this copy is even unique to ASOS.

Let’s check.
No, it isn’t.

That’s 15 other sites with this exact same copy—i.e., duplicate content.

Sometimes duplicate content issues can arise from similar product pages,
too.

For example, similar or identical products with different quantity counts.

Here’s an example from ASOS:

You can see that—quantities aside—all of these product pages are the same.

But duplicate content isn’t only a problem for ecommerce sites.

If you have a blog, then people could be stealing and republishing your
content without attribution.

You can use the tool: FIND STOLEN CONTENT IN SECONDS website:
https://ahrefs.com/content-explorer
Find odd files on your
domain (that you may have
forgotten about)
Keeping track of everything on your website can be difficult.

(This is especially true for big sites.)

For this reason, it’s easy to forget about old files you may have uploaded.

PDF files; Word documents; Powerpoint presentations; text files; etc.

Let’s use the filetype: operator to check for these on ahrefs.com.

SIDENOTE. Remember, you can also use the ext: operator—it does the


same thing.
Here’s one of those files:

I’ve never seen that piece of content before. Have you?

But we can extend this further than just PDF files.


By combining a few operators, it’s possible to return results for all supported
file types at once.

SIDENOTE. The filetype operator does also support things


like .asp, .php, .html, etc.

It’s important to delete or noindex these if you’d prefer people didn’t come
across them.

Find guest post opportunities


Guest post opportunities… there are TONS of ways to find them, such as:

But you already knew about that method, right!? 😉

SIDENOTE. For those who haven’t seen this one before, it uncovers so-
called “write for us” pages in your niche—the pages many sites create when
they’re actively seeking guest contributions. 
So let’s get more creative.

First off: don’t limit yourself to “write for us.”

You can also use:

• “become a contributor"
• “contribute to”
• “write for me” (yep—there are solo bloggers seeking guest posts,
too!)
• “guest post guidelines”
• inurl:guest-post
• inurl:guest-contributor-guidelines
• etc.

SIDENOTE. Did you notice I’m using the pipe (“|”) operator instead of “OR”
this time? Remember, it does the same thing. 

You can even search for multiple footprints AND multiple keywords.

Looking for opportunities in a specific country?

Just add a site:.tld operator.

Find sites that feature


infographics… so you can
pitch YOURS
Infographics get a bad rap.

Most likely, this is because a lot of people create low-quality, cheap


infographics that serve no real purpose… other than to “attract links.”

But infographics aren’t always bad.

Here’s the general strategy for infographics:

• create infographic
• pitch infographic
• get featured, get link (and PR!)
But who should you pitch your infographic to?
Just any old sites in your niche?

NO.

You should pitch to sites that are actually likely to want to feature your


infographic.

The best way to do this is to find sites that have featured infographics before.

Here’s how:

SIDENOTE. It can also be worth searching within a recent date range—e.g.,


the past 3 months. If a site featured an infographic two years ago, that
doesn’t necessarily mean they still care about infographics. Whereas if a site
featured an infographic in the past few months, chances are they still
regularly feature them. But as the “daterange:” operator no longer seems to
work, you’ll have to do this using the in-built filter in Google search. 

But again, this can kick back some serious junk.

So here’s a quick trick:

• use the above search to find a good, relevant infographic (i.e., well-
designed, etc.)
• search for that specific infographic
Here’s an example:

This found ~2 results from the last 3 months. And 450+ all-time results.

Do this for a handful of infographics and you’ll have a good list of prospects.
8. Find more link
prospects… AND check
how relevant they really are
Let’s assume you’ve found a site that you want a link from.

It’s been manually vetted for relevance… and all looks good.

Here’s how to find a list of similar sites or pages:

This returned ~49 results—all of which were similar sites.

SIDENOTE. In the example above, we’re looking for similar sites to Ahrefs’
blog—not Ahrefs as a whole.

But let’s assume that I know nothing about this site (yoast), how could I
quickly vet this prospect?

Here’s how:

• do a site:domain.com search, and note down the number of results;


• do a site:domain.com [niche] search, then also note down the number
of results;
• divide the second number by the first—if it’s above 0.5, it’s a good,
relevant prospect; if it’s above 0.75, it’s a super-relevant prospect.
Let’s try this with yoast.com.

Here’s the number of results for a simple site: search:


And site: [niche]:

So that’s 3,950 / 3,330 = ~0.84.

(Remember, >0.75 translates to a very relevant prospect, usually)

Now let’s try the same for a site that I know to be irrelevant: greatist.com.

Number of results for site:greatist.com search: ~18,000

Number of results for site:greatist.com SEO search: ~7

(18,000 / 7 = ~0.0004 = a totally irrelevant site)

IMPORTANT! This is a great way to quick eliminate highly-irrelevant tactics,


but it’s not foolproof—you will sometimes get strange or unenlightening
results. I also want to stress that it’s certainly no replacement for manually
checking a potential prospect’s website. You should ALWAYS thoroughly
check a prospects site before reaching out to them. Failure to do that
= SPAMMING.

HERE’S ANOTHER WAY TO FIND SIMILAR DOMAINS/PROSPECTS…


Site Explorer > relevant domain > Competing Domains
For example, let’s assume I was looking for more SEO-related link prospects.
I could enter ahrefs.com/blog into Site Explorer.
Then check the Competing Domains.
This will reveal domains competing for the same keywords.

Find social profiles for


outreach prospects
Got someone in mind that you want to reach out to?

Try this trick to find their contact details:

SIDENOTE. You NEED to know their name for this one. This is usually quite
easy to find on most websites—it’s just the contact details that can be
somewhat elusive.
Here are the top 4 results:
Find internal linking
opportunities

Internal links are important.

They help visitors to find their way around your site.

And they also bring SEO benefits (when used wisely).

But you need to make sure that you’re ONLY adding internal links where
relevant.

Let’s say that you just published a big list of SEO tips.

Wouldn’t it be cool to add an internal link to that post from any other posts
where you talk about SEO tips?

Definitely.

It’s just that finding relevant places to add such links can be difficult—
especially with big sites.

So here’s a quick trick:

For those of you who still haven’t gotten the hang of search operators, here’s
what this does:

• Restricts the search to a specific site;


• Excludes the page/post that you want to build internal links to;
• Looks for a certain word or phrase in the text.
Find sponsored post
opportunities
Sponsored posts are paid-for posts promoting your brand, product or
service.

These are NOT link building opportunities.

Google’s guidelines states the following;

Buying or selling links that pass PageRank. This includes exchanging money
for links, or posts that contain links; exchanging goods or services for links; or

sending someone a “free” product in exchange for them writing about it and

including a link

This is why you should ALWAYS nofollow links in sponsored posts.

But the true value of a sponsored post doesn’t come down to links anyway.

It comes down to PR—i.e., getting your brand in front of the right people.

Here’s one way to find sponsored post opportunities using Google search
operators:

~151 results.
Here are a few other operator combinations to use:

• [niche] intext:”this is a sponsored post by”


• [niche] intext:”this post was sponsored by”
• [niche] intitle:”sponsored post”
• [niche] intitle:”sponsored post archives” inurl:”category/sponsored-
post”
• “sponsored” AROUND(3) “post”

SIDENOTE. The examples above are exactly that—examples. There are
almost certainly other footprints you can use to find such posts. Don’t be
afraid to try other ideas.

WANT TO KNOW HOW MUCH TRAFFIC EACH OF THESE SITES GET?


DO THIS.
Use this Chrome bookmarklet to extract the Google search results.
Batch Analysis > paste the URLs > select “domain/*” mode > sort by organic
search traffic

Now you have a list of the sites with the most traffic, which are usually the
best opportunities.

You might also like