Study on Implementation and Impact of Google Hacking in Internet Security
Muharman Lubis1, Nurul Ibtisam binti Yaacob2, Hafizah binti Reh3 and Montadzah Ambag
Abdulghani4
International Islamic University of Malaysia (IIUM)
1muharmanlubis@gmail.com, 2nibisam@gmail.com, 3ta_hafizBS@yahoo.com, 4mon_ebdulgani@yahoo.com
ABSTRACT - As the number of websites and the amount of
information have increased, modern lives rely more on search
engines to scoop up relevant pieces of information out of the
information sea. In response to the rapidly growing amount of
information on the web, major search engine companies
such as Google, Yahoo, and MSN crawl web servers and index
crawled information more frequently and thoroughly on a global
level. Furthermore, to stay on top of the competitive search engine
market, they diligently improve their search algorithms and endeavor
to provide Internet users with easy-to-use search interfaces. Due to
this diligent and competing effort of search engine companies,
Internet users can freely access billions of pages of information
regardless of time and space constraints with simple typing and
clicking. Google Hacking uses the Google search engine to locate
sensitive information or to find vulnerabilities that may be
exploited. This paper evaluates how much effort it takes to get
Google Hacking to work and how serious the threat of Google
Hacking is. The paper discusses the implementation and impact of
Google hacking in Internet security.
Index Terms - Google hacking, Internet security, Google
implementation, impact
I. INTRODUCTION
The idea of hacking may conjure up stylized images of
electronic vandalism, espionage, dyed hair and body piercing,
but its essence is something other than that. Most people
associate hacking with breaking the law and assume that all
those who engage in hacking activities are criminals.
Indeed, there are people who use hacking techniques to break
the law, but hacking is not really about that [9]. Hacking is
a way of understanding what is possible, sensible and ethical
in the twenty-first century, and it is embedded in our lives
because the hack needs a social and cultural context [10]. In
order to understand what hacking is, we need to know the
difference between hacking and cracking [11]. The definition
of hacking has become confused with cracking because of the
role of the media and the thoughtlessness of some people. All
these hacking activities exist within a set of communal
relations, each of which expresses a different aspect of
hacking.
Recently, hackers have been divided into three categories,
black hats, white hats and grey hats, which are referred to
as malicious hackers, ethical hackers and ambiguous hackers
respectively [17]. Black hat hackers are people who hack
computing systems for their own benefit, or who break into
systems illegally for personal gain, notoriety or other
less-than-legitimate purposes [14][15]. For example, they may
hack into an online store's computer system and steal the
credit card numbers stored in it. White hat hackers are those
who write and test open source software, work for
corporations or are hired by companies to help them beef up
their security, work for the government to help catch and
prosecute black hat hackers, and otherwise use their hacking
skills for noble and legal purposes.
There are also hackers who refer to themselves as gray hat
hackers, operating somewhere between the two primary
groups [14]. Gray hat hackers might break the law
but they consider themselves to have a noble purpose in
doing so. For example, they might crack systems without
authorization and then notify the system owners of the
systems' fallibility as a public service, or find security holes
in software and then publish them to force the software
vendors to create patches or fixes for the problem.
Google Hacking is one of the most popular techniques in
hacking activities. It was publicly introduced by Johnny Long
around 2004 and is defined as “the art of creating complex
search engine queries in order to filter through large
amounts of search results for information related to
computer security” [16]. Attackers can use Google Hacking
to uncover sensitive information about a company or to
uncover potential security vulnerabilities through the Internet.
Other people can use Google Hacking to determine whether
their own websites are disclosing sensitive information, which
is known as penetration testing or vulnerability assessment.
When a computer connects to a network and begins
communicating with other computers, it is essentially taking
a risk. Internet security involves the protection
of a computer's Internet account and files from intrusion by
an unknown user. Basic security measures involve
protection by well-selected passwords, change of file
permissions and backup of the computer's data. In this paper,
implementation is defined as “carrying out, execution, or
practice of a plan, a method, or any design for doing
something” [16] and impact is defined as “a forceful
consequence; a strong effect”. Both relate to the use of the
Google Hacking methodology by malicious people or groups
in ways that affect Internet security.
II. BACKGROUND
Nowadays, it is hard to find information because the
number of resources available on the Internet is increasing
at a rapid rate. Consequently, the search engine, also
known as web search engine or automated web search,
has been introduced as one of the services provided on the
Internet. Search engines are designed to help find
information stored on a computer system or on a
website and to minimize the time required to find
it [8]. One of the most powerful, efficient and
effective search engines is Google. Currently, the Google
search engine indexes up to 12 billion pages [17] and,
believe it or not, it is the starting point of many
hacking activities. This use of the Google search engine
is known as Google Hacking.
Organizations usually disclose too much information on
their Web servers without ever knowing that a leak or
weakness is there, and it is then utilized by malicious
hackers. Further, search engines like Google have powerful
features that allow users to find sensitive information
stored in the far corners of Web-connected servers and even
perform a vulnerability-searching attack. In the past 2 years,
Google hacking is a term that has become commonplace
not only in the security community but in the
mainstream media as well [13]. Google hacking
involves using the popular Google search engine to locate
sensitive or confidential online information that should be
protected but is not.
Using search engines to uncover sensitive information is
not a new concept. Nonetheless, with the numerous
advanced search operators that Google makes available on
its enormous database, carefully crafted query strings can
reveal jaw-dropping results [3]. Usually, the filtering
is performed through advanced Google operators.
Attackers can use Google Hacking to
uncover sensitive information about a company or to
uncover potential security vulnerabilities, while a security
professional can use Google Hacking to determine if their
websites are disclosing sensitive information. Google
Hacking turned out to be a very powerful and flexible
hacking approach, and Google cached pages were found to be
very helpful while performing Google Hacking
[2]. Google crawls web pages and stores a copy of them on
its local servers. The authors of [2] tried to use Google cached
pages to anonymously browse a target's site without sending
a single packet to its server. Google grabs most of a page
through its crawls but omits images and some other
space-consuming media. When Google cached pages are viewed by
simply clicking on the cached link on the results page, the
hackers will end up connecting to the target's server to get
the rest of the page content.
When a user enters a keyword in the search box, the query
is run against the pages that Google's spider has gathered
from the Web. Google then returns a results page that
consists of a list of site names, a summary or snippet of
each site, the URL of the actual page,
a cached link that shows the page as it looked when the
spider last visited the page and a link to pages that have
similar content [3]. Google's search results are dynamic.
When a query is submitted through Google’s web interface,
Google takes the user to a generated results page that can be
represented by a single URL, which appears in the address
bar of the user's browser, for instance:
http://www.google.com/search?hl=en&q=%22peanut+butter+and%22+jelly&btnG=Search
The question mark (?) denotes the end of the URL and
the start of the arguments, the ampersand (&) separates
arguments, (hl) represents the language in which the results
page will be printed, (q) represents the start of the query
string, (%22) represents the hexadecimal value of the
double quotes character, the plus sign (+) represents a
space, and (btnG=Search) denotes that the Search button
was pressed on Google’s web interface [13].
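The decoding rules described above can be checked mechanically. The following is a minimal sketch using Python's standard urllib.parse module; the URL string is the example as read from the text (an assumption, since the printed copy is partly garbled), and the function name split_search_url is only illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# Example results URL as read from the text (an assumption: the printed
# copy of this URL is partly garbled).
EXAMPLE_URL = (
    "http://www.google.com/search"
    "?hl=en&q=%22peanut+butter+and%22+jelly&btnG=Search"
)


def split_search_url(url):
    """Return (path, params), where params maps each argument to its decoded value."""
    parts = urlsplit(url)
    # parse_qs splits the arguments on '&', decodes %XX escapes such as %22
    # (the double quote) and turns each '+' back into a space.
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return parts.path, params


path, params = split_search_url(EXAMPLE_URL)
print(path)           # everything before the '?' that is not host or scheme
print(params["hl"])   # interface language
print(params["q"])    # the decoded query string, quotes and spaces restored
```

Editing the value of q and encoding it again is the programmatic equivalent of editing the URL directly in the browser's address bar.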
Knowledgeable Google users can edit the URL directly
inside their browser's address bar and hit enter to get the
new search results very quickly. As a security
professional, it is critical to understand these URLs so that
one can perform a Google hacking vulnerability
assessment. The Google Hacking Database (GHDB) is a
database of queries contributed by hackers and used to identify
sensitive data on a website, such as error messages, files
containing passwords, files containing usernames (no
passwords), files containing juicy info (no usernames or
passwords, but interesting stuff nonetheless), pages
containing logon portals, pages containing network or
vulnerability data such as firewall logs, sensitive online
shopping information, various online devices and
vulnerable servers [17][19].
III. IMPLEMENTATION
Google hacking has already become popular not only in
hacker communities but also among common people who do
not really understand hacking procedures.
The method and design of using the Google search engine for
hacking activities involve combining the basic and advanced
Google operators to make searching and finding as specific
as possible. The implementation can be divided into four
kinds: formal design, which refers to the googledorks and
googleturds introduced by Johnny Long and recognized at the
academic and media level; Google automated scanners, which
refers to specific software built by communities or
individuals to facilitate Google hacking; manual exploration,
which refers to individual attempts by a malicious hacker to
deepen Google hacking knowledge; and lastly, integrated
hacking, which puts Google hacking as the beginning process
before other methods of hacking.
Table 1. Advanced Google Operators
Search Service | Advanced Search Operators
Web Search | allinanchor:, allintext:, allintitle:, allinurl:, cache:, define:, filetype:, id:, inanchor:, info:, intext:, intitle:, inurl:, phonebook:, related:
Image Search | allintitle:, allinurl:, filetype:, inurl:, intitle:
Groups | allintext:, allintitle:, author:, group:, insubject:, intext:, intitle:
Directory | allintext:, allintitle:, ext:, filetype:
News | allintext:, allintitle:, allinurl:, intext:, intitle:, inurl:, location:, source:
Product Search | allintext:, allintitle:
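Combining operators such as those in Table 1 into one query, and then into a results URL, is simple string work. The sketch below is only an illustration: the helper names build_query and build_search_url are hypothetical, example.com is a placeholder domain, and the site: operator is used to restrict the search to a domain the tester owns.

```python
from urllib.parse import urlencode


def build_query(terms="", **operators):
    """Join free-text terms with operator:value pairs, e.g. intitle:"index of"."""
    parts = [terms] if terms else []
    for name, value in operators.items():
        # Quote values that contain spaces so the operator covers the whole phrase.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    return " ".join(parts)


def build_search_url(query, language="en"):
    """Encode the query into the hl and q arguments of a results URL."""
    return "http://www.google.com/search?" + urlencode({"hl": language, "q": query})


# Placeholder domain: restrict the search to a site you are responsible for.
query = build_query(intitle="index of", inurl="backup", site="example.com")
url = build_search_url(query)
print(query)
print(url)
```

Keeping query construction separate from URL encoding means the same query text can be pasted into the search box or logged for a later review.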
One well-known implementation that utilizes the
Google search engine is the “googledork”. It is the attempt
by its first introducer, Johnny Long, to standardize the
function of Google Hacking on his website as a way of
sharing knowledge. The term “googledork”, coined by Long,
originally meant “An inept or foolish person as
revealed by Google” [19], but after a great deal of media
attention the term came to describe those who “troll the
Internet for confidential goods.” Either description is
fine; what matters is that the term googledork conveys
the concept that sensitive stuff is on the web and Google
can help you find it. The official googledorks page lists
many different examples of unbelievable things that have
been dug up through Google by the maintainer of the page,
Johnny Long; they fall into around 14 categories in the
GHDB [17][19]. Each listing shows the Google search
required to find the information, along with a description of
why the data found on each page is extremely interesting.
On the other hand, the syntax and operator functions of the
Google search engine are not free of errors or weaknesses
either. These are recognized as “Googleturds”, which are
defined as the little dirty pieces of Google ‘waste’ [1]. These
search results seem to have stemmed from typos Google
found while crawling a web page. Google can also reveal
many personal data when its advanced search parameters
are used. The implementation of googleturds has been
quite striking, revealing credit card numbers, web
directories, passwords and much more. Google has become more
concerned with correcting the errors in its operators and
patterns because of strong pressure from organizations and
companies over the huge impact of the use of these errors.
Google hacking involves the use of certain types of
search queries to look for Web site vulnerabilities. There are
more than approximately 1,500 such queries, mostly stored in
the Google Hacking Database website by Johnny Long, while
some are spread across other discussion websites or blogs.
This Google Hacking trend slowly but surely brings more
people to develop easy-to-use tools or software that combine
certain googledork methods in order to search effectively
and efficiently. Unfortunately, organizations sometimes
set up their systems in a way that allows
Google to index and save a lot more information than they
intended [23].
Another implementation is the Google automated scanner.
One of the most famous is Goolag Scanner, a Windows-based
auditing tool that was built around the concept of "Google
hacking". The Cult of the Dead Cow hacker group released
it as an open-source tool designed to enable IT workers to
quickly scan their Web sites for security vulnerabilities and
at-risk sensitive data using a collection of specially crafted
Google search terms, to provide a very easy and legitimate
tool for security professionals to test their own Web sites for
vulnerabilities, and to raise awareness about Web security
in and of itself [22]. Many attempts have been
made to carry out the Google hacking process
automatically with software that even a newbie can
use. Goolag Scanner, Gooscan, Google Hacks and
SubDomain Lookup are examples of software that
facilitates these activities, and their number will probably
grow over time along with the popularity of Google.
Interestingly, this approach will lead many more people to
try using Google operators to search intentionally for
private data on the Internet, which forces organizations and
companies to put their best effort into preventing them. On
the other hand, a malicious expert hacker could take
advantage of the situation in which masses of newbies try to
use Google hacking. Once they know the pattern, in-depth
searching through manual exploration can be
done effectively and efficiently. The ease of use of Google
Hacking is remarkable: at any time and anywhere, one can
use the Google search engine to find private data for
various purposes. Private data searches are
grouped into four different sections according to the privacy
level. These are identification data, sensitive data,
confidential data and secret data searches [21].
Identification data relates to the personal identity of a user.
It can be found with keywords like name, address, phone,
email, curriculum vitae, username, etc., optionally for a
certain person or within certain document types, as in the
following query, which finds many lists of identities:
allintext:name email phone address ext:pdf
Meanwhile, sensitive data relates to data that is public but
contains private data whose disclosure might anger the owner,
like emails, forum postings, sensitive directories and
Web 2.0 based applications, as in the following query, which
finds sensitive directories:
intitle:"index of" inurl:/backup
Confidential data relates to private data that
should be accessible only to a certain group or person, like
passwords, chat logs, confidential files, online webcams, etc.,
but Google can still reveal this kind of data, as in the
following query, which finds the addresses of online webcams:
inurl:"viewerframe?mode=motion"
Lastly, secret data relates to private data accessible
only to the owners, like encrypted messages, private keys,
secret keys, etc. Encrypted messages can be found
with a query like the following:
-intext:"and" ext:enc
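For a defender, the four privacy levels above can double as a checklist for auditing one's own site. The following is a rough sketch under stated assumptions: the queries are the examples from the text as far as they can be read (the secret-data entry keeps only its ext:enc part), example.com is a placeholder, and the function names are hypothetical. It only builds query strings and sends nothing to any search engine.

```python
# Example queries for each privacy level, as read from the text. Some of
# them are reconstructed from a garbled copy, so treat them as illustrative.
PRIVACY_LEVEL_QUERIES = {
    "identification": "allintext:name email phone address ext:pdf",
    "sensitive": 'intitle:"index of" inurl:/backup',
    "confidential": 'inurl:"viewerframe?mode=motion"',
    "secret": "ext:enc",  # assumption: only the file-extension part is kept
}


def scope_to_site(query, domain):
    """Restrict a query to one domain with the site: operator."""
    return f"site:{domain} {query}"


def self_audit_queries(domain):
    """Return one site-restricted query per privacy level for the given domain."""
    return {
        level: scope_to_site(query, domain)
        for level, query in PRIVACY_LEVEL_QUERIES.items()
    }


# Placeholder domain: use a domain you are authorized to assess.
for level, query in self_audit_queries("example.com").items():
    print(f"{level:15} {query}")
```

Restricting every query with site: keeps the exercise within a vulnerability assessment of one's own pages rather than a search for other people's data.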
All kinds of manual search exploration need only in-depth
knowledge of formats, patterns, operators and parameters,
plus practice. A person can become an expert in Google
hacking independently, as an autodidact. However, the
worst case in using the Google search engine is
combining it with other hacking methods, so that it is only
the beginning process, like footprinting, port scanning,
information gathering, etc., of a further hacking process. This
is known as integrated hacking.
The prospect of the Google search engine being used as only
the first step to discover current weaknesses or
vulnerabilities has made preventing and recognizing it a hot
topic. Nowadays, pretty much any
hacking incident most likely begins with Google [1], so the
use of Google Hacking is only the beginning, but the
hazard of its impact already resides there. Organizations,
corporations and even single individuals who store their
data on a website should develop their own strategy,
policy and procedure to keep their data secure and safe
from being revealed.
IV. IMPACT
Search engines, especially Google, have already
become an important tool in our daily Internet activity,
so it is difficult to tell at the outset
which users or groups intend to do Google hacking.
Consequently, there is no certain method to identify them;
only the protection and response of both Google
and the website's system administrator can be
measured at this moment. The massive use of Google
hacking has had direct and indirect impacts on Internet
security. In this study we classify the direct impact into
low impact, associated mostly with exploration,
moderate impact, associated mostly with exposure, and
high impact, associated mostly with exploitation, each with
both positive and negative impact. Trend and
standardization; awareness of Internet security; and strategy,
policy and procedure form the indirect impact.
Low Impact
Free access online newspaper | Finding information regarding certain people
Mass quantity Google hacking user | Google lock the “googleturds”
Find sub domain lookup | Google hacking as Artificial Intelligence
Google Proxy Server hacking | Awareness on Internet security
Moderate Impact
Find out vulnerabilities in server, files, web files, web application, unauthenticated program, various online devices, etc. | Google Analytics Implementation Checklist
Application Fraud | Enhance Tools to Protect the Privacy
Google Hacking Exposed VoIP | Penetration Testing
Footprinting | Standardization Hacking Process Analysis
Social Engineering | Certification Ethical Hacker
Google Hacking as Malicious Code | Honeypot Project Google hacking trap
High Impact
Oracle Database Exploration | Fraud Prevention
Telecommunication | Safeguarding privacy against misuse and exploits
Indexes & Reveal Sensitive/Confidential Data |
SAP Enterprise Portal Security Exposure | Enhance Defense Security Strategy
Identity Theft | Fraud Management Industry
Advanced MySQL Exploitation | Internet security management Development (Prevention, Protection, Response, Recovery)
Apache Database Exploration | Internet safety, security and privacy manuals
Table 2. Google Hacking Impact
Over the last several years, identity theft has been one of
the fastest growing crimes. Unfortunately, the Internet has been
facilitating this phenomenon since it represents a
tremendous open repository of sensitive identity
information available to those who know how to find it,
including fraudsters [20]. The exploitation of a single
identity is already a bad case, let alone many identities
being used by one person; it causes huge damage to
the company, the system and the related users
themselves. Google hacking has become a trend in the
information and communication technology community,
which has even pursued standardization, the process of
developing and agreeing upon technical standards among
its members. Both trend and standardization have also become
hot topics of discussion, influenced by the massive use of
Google hacking.
Nowadays, information, systems, and networks are
pervasive and ubiquitous, all of them provided throughout
the Internet. The Internet's vast resources are an excellent
means for everyone to explore, research, and enjoy new
information and interests. The Internet is a public place,
however, so it has become important to teach Internet users
how to stay safe on the Internet, because danger also lies
there.
Recently, we need to learn about Google Hacking to
provide a good level of protection for our sites and to check
for sensitive information disclosure as a strategy and policy in
anticipating Google hacking. As we become more familiar
with manual hacks, we can start using some of the
automated Google Hacking tools. They automate the hacks
and help ensure that every single page within our site is
protected. Automated tools allow for periodic security
checks with a frequency that is simply impossible to achieve
with manual hacks. Here are the common activities, based on
strategy, policy and procedure, that organizations usually
carry out in assuring Internet security [18]:
1. Ensure host and network security basics are in place.
2. Build/publish security features (authentication, role
management, key management, audit/log, crypto and
protocols).
3. Use external penetration testers to find problems.
4. Create and share standard policy.
5. Identify gate locations and gather necessary artifacts.
6. Know all regulatory pressures and unify approach.
7. Identify personally identifiable information (PII)
obligations.
8. Provide awareness training.
9. Create security standards.
10. Perform security feature review.
11. Identify software defects found in operations
monitoring and feed them back to development.
12. Ensure QA supports edge/boundary value condition
testing.
13. Create or interface with incident response.
14. Create data classification scheme and inventory.
15. Use automated tools along with manual review.
Shared knowledge of strategies, policies and
procedures is an advantage that every company or
organization has in fighting back against the Google hacking
threat. Even though full security never exists, it does
protect confidential or sensitive data from the script kiddies
and newbies of Google Hacking, whose numbers grow day after
day as a result of the openness of
knowledge on the Internet. Discussion and improvement
should be done frequently to test the
strength of the strategy, policy and procedures, while the
process of assuring and monitoring should also be done
effectively.
The maintenance of these three approaches is
difficult, like any other maintenance process. We
cannot avoid it, however, because risk management is the
attempt to prevent such disasters from striking the security
of the company. Prevention is better than cure, in this
case a security measure rather than a disaster recovery.
Google hacking could become a serious threat to
an organization or company. If hackers spend enough time
analyzing the target and understanding how the queries
find information, they will be able to find the information
that they want even if the information is confidential or
sensitive. Moreover, the well-known implementations that
utilize the Google search engine, the “googledork” and the
Google automated scanners, help hackers to easily find
sensitive data using Google. This shows that Google hacking
is almost effortless for today's hackers.
V. CONCLUSION
To be publicly accessible is the nature of web sites and
applications. Combined with search engine functionality, this
makes it effortless for attackers to access an organization's
site or find out information about the organization. Some
organizations do not realize that even directory listings,
error pages and hidden login pages can be indexed, and that
when a search engine “indexes” a site, it inadvertently
provides loads of information for potential attackers. This
is what Google hacking is all about. However, there are some
probable solutions and preventive measures for these problems.
The best way to face Google hacking is basic
risk management, which consists of prevention, protection and
response.
Prevention from Google hacking relates to anticipation
of needs, management wishes, hazards and risks [4]. One of
the strategies in approaching Google hacking prevention
is reducing risk in the existing website so that a
disaster has a lower probability of occurrence. However,
these methods can be implemented only if the services are
provided on a small scale without the need for mobility
and intensive data sharing. Examples are keeping sensitive
data off the web, considering removing the site from
Google's index [7], automatic scanning, data mitigation,
running regularly scheduled assessments, etc.
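One common way to keep a page out of a search index is a robots meta tag with the noindex directive. The paper does not name this mechanism, so the sketch below is an assumption about how such a check could look: it uses Python's standard html.parser module, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            # The content attribute is a comma-separated list of directives.
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_marked_noindex(html):
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(is_marked_noindex(page))
```

Running such a check over pages that hold sensitive material is one possible form of the regularly scheduled assessment mentioned above.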
Protection from Google hacking relates to the process of
keeping something, such as confidential information or
sensitive data, safe from being hacked, for instance with a
security token associated with a resource such as a file.
Usually, the approach an organization can take for
protection is balancing availability, integrity,
confidentiality and performance, for example by installing a
firewall, robots.txt [6], Google Hack Honeypot [5], etc.
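Whether a robots.txt file actually covers the directories that should stay out of search results can be tested offline. The sketch below uses Python's standard urllib.robotparser module; the robots.txt content, the paths and the example.com base URL are made-up placeholders, and blocked_paths is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""


def blocked_paths(robots_txt, paths, user_agent="Googlebot",
                  base="http://example.com"):
    """Return the paths that the given crawler is asked not to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths
            if not parser.can_fetch(user_agent, base + path)]


candidates = ["/admin/login.php", "/index.html", "/backup/db.sql"]
print(blocked_paths(ROBOTS_TXT, candidates))
```

A robots.txt file only asks well-behaved crawlers to stay away and is itself publicly readable, so it complements access control rather than replacing it.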
Response from a company that has been hacked by a
Google hacker is very important in order to avoid the
recurrence of the same kind of incident. If the company does
not respond to the incident, more sensitive
or confidential data might be stolen by the hacker.
Furthermore, since the replacement cost of, say, stolen
research data is high, a company of course does not want to
bear that cost again after one incident. Examples of response
are reporting the incident, educating employees, an incident
response policy, etc.
Thus, we as users should be aware that
even hidden login pages can be indexed and that when a search
engine “indexes” a site, it unwittingly provides treasure
troves of information for potential attackers. We should
improve our information security in order to keep our
confidential or sensitive data from being hacked and, finally,
we should also prepare ourselves with the best methods of
protection and prevention against Google Hacking.
REFERENCES
[1] Long, J. (2004), Google Hacking Mini-Guide, retrieved on 1 Feb 2010,
from http://www.informit.com/articles/article.aspx?p=170880
[2] Billig, J., Danilchenko, Y. and Frank, C. (2008), “Evaluation of
Google Hacking”, Proceedings of InfoSecCD Conference '08,
pp. 27-32, September 26-27, 2008, Kennesaw, GA, USA
[3] Lancor, L. and Workman, R. (2007), Using Google Hacking to Enhance
Defense Strategies, Proceedings of SIGCSE '07, pp. 491-495
[4] Hawi Pala (HE), Preventing Google hacking est proet
ow web applation, retrieved on 1 Feb 2010 from
Tein date sm lnes 29902 ton! S30Sen
eon ng ang
[5] SourceForge.net, What is GHH?, retrieved on 1 Feb 2010 from
http://ghh.sourceforge.net
[6] The Web Robots Pages, retrieved on 1 Feb 2010 from
http://www.robotstxt.org
[7] Webmaster Tools, Removing my own content from Google, retrieved on
1 Feb 2010 from ho ios ggtecom ene hl
[8] Comer, D. (200), The Internet Book: Everything You Need to Know
about Computer Networking and How the Internet Works,
Prentice Hall
[9] Erickson, J. (2003), Hacking: The Art of Exploitation, No Starch Press
[10] Jordan, T. (2008), Hacking: Digital Media and Technological
Determinism, Polity
[11] Knittel, J. and Soto, M., Everything You Need to Know about the
Dangers of Computer Hacking, The Rosen Publishing Group
[12] Leyden, J. (2005), Hacking Google for Fun and Profit. Retrieved
February 2010, from The Channel Register website
[13] Long, J. and Skoudis, E. (2005), Google Hacking for Penetration
Testers, Syngress
[14] Shinder, D. L. and Cross, M. (2008), Scene of the Cybercrime, 2/E,
Syngress
[15] Wang, J. (2009), Computer Network Security, Springer
[16] Wikipedia (2010), Google Hacking. Retrieved February 8, 2010
from http://en.wikipedia.org/wiki/Google_hacking
[17] Verena XP (2007), Google Hacking. Retrieved February
2010 from College of Engineering, San Jose State University
website:
Iw ngr sed mernaleousesonpe23hnudenprese
a ats GOOG PCILACRNG a
[18] McGraw, G., Chess, B. and Migues, S. (2010), Software [In]security:
What Works in Software Security, Retrieved February 28, 2010
from: hun. rm comaesurleprp 1308005
[19] GHDB, Googledorks, Google Hacking Database. Retrieved Daas 3
2010 at pin ara com?
[20] Abdelhalim, A. and Traore, I. (2007), The Impact of Google
Hacking on Identity and Application Fraud, PACRIM'07
[21] Tatli, E. I. (200), Google Hacking against Privacy, Keser
Sonia, Gey ly 26
[22] Vijayan, J. (2008), Hacker group releases automated ‘Google hacking’
tool. Retrieved February 3, 2010 at
‘ping compurword cm eileDO82238acer grou
ater sional Cross asin ta
[23] Scalet, S. D. (200), 5 Ways Google Is Shaking the Security World.
Retrieved February 2010 from
[pin csvonnecom aisle 22131V/5 Ways Guole Is St
Hinge Sein Wor pane=