Mining Unique Information Sources & Deep-Invisible-Hidden-Opaque Web Recap Final

Published by: ClosetLibrarian on Aug 13, 2011
Copyright:Attribution Non-commercial


Anna F. Shallenberger
President & “Chief Archer”
Shallenberger Intelligence Services anna@targetedknowledge.com 203.258.2383 917.591.6732 fax www.targetedknowledge.com http://twitter.com/ClosetLibrarian
http://www.slideshare.net/ClosetLibrarian

June 14, 2011

http://closetlibrarian.blogspot.com
www.linkedin.com/in/annafayshallenberger www.ci2020.com/profile/AnnaFShallenberger

http://tinyurl.com/AIIP-AFS
Anna F Shallenberger, All Rights Reserved, for educational use only, not for redistribution or commercial re-use

Definitions vary as to what it is / is not
Many names: deep, invisible, hidden, opaque, etc.
Surface web is the “visible” portion
Baseline research, particularly re size, is dated
Term coined by Michael Bergman…


• Pages ranked lower due to Search Engine Optimization [SEO]
• Sites coded to exclude bots
• Dynamic content generated by page search: stats, etc.
• Search engine chooses not to cover the whole site due to the volume of content
• Format: video/images without text/tags, or with incomplete ones
• Site/pages not linked from pages the search engine crawler(s) browse
• Password-protected pages
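
One of the reasons above, sites coded to exclude bots, can be checked directly, because the exclusion rules normally sit in a site's robots.txt file. The sketch below is a minimal illustration using Python's standard `urllib.robotparser`; the robots.txt text, the `example.com` URLs, the `ExampleBot` user agent and the helper name `is_crawlable` are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at <site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /reports/internal/
"""


def is_crawlable(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


public_ok = is_crawlable(ROBOTS_TXT, "https://www.example.com/about.html")
members_ok = is_crawlable(ROBOTS_TXT, "https://www.example.com/members/list.html")
print(public_ok, members_ok)
```

Paths a site disallows here are candidates for manual browsing, since a compliant search engine will not have indexed them.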


Figure 1. Search Engines: Dragging a Net Across the Web's Surface


Q4 NPD Search & Portal Site Study, reported by Search Engine Watch


Pearl Grow …
ID Leakage Points [factoring in copyright & other IP concerns]
Non-Central Hosts of Content
• E.g. content not controlled by “HQ”
Surgical Manual Browsing
Dark Web Browsers [also use pathfinders]
Leverage Your {on & offline} Networks ….


Conduct Search

Evaluate What You Find…

Mine the most on point for more ideas …

Refine your search strategy, Continue your investigation, Repeat process as needed…


www.pearltrees.com


Look For…
• Concepts/Terms/Catch Phrases, etc.
• Names: Experts/Reports/Publications
• URL roots, e.g. are the most relevant hits loaded on the same part of a site?

Revisit strategy, adding incremental terms and/or re-weighting / editing Boolean linkages
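
The URL-roots check above can be done by eye, but a few lines of Python make the pattern easier to spot when there are many hits. This is only a sketch: the hit list is invented, and `url_root` is a hypothetical helper that treats "root" as the host plus the first directory of the path.

```python
from collections import Counter
from urllib.parse import urlparse


def url_root(url, depth=1):
    """Return the host plus the first `depth` directory names of the path."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    # A path ending in "/" is all directories; otherwise the last segment
    # is the document itself and is left out of the root.
    directories = segments if parts.path.endswith("/") else segments[:-1]
    return "/".join([parts.netloc] + directories[:depth])


# Hypothetical list of the most on-point hits from a search.
RELEVANT_HITS = [
    "http://www.example.com/investors/presentations/2011-q1.pdf",
    "http://www.example.com/investors/events/analyst-day.html",
    "http://www.example.com/news/release-42.html",
    "http://emea.example.com/investors/briefing.pdf",
]

root_counts = Counter(url_root(u) for u in RELEVANT_HITS)
for root, count in root_counts.most_common():
    print(count, root)
```

A root that keeps recurring is a good candidate for manual browsing or for a site-restricted follow-up search.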


Don’t forget the reverse
• What are the key terms repeating in the “false drops”?
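
As a rough illustration of mining both the on-point hits and the false drops, the sketch below finds terms that repeat in one set but never appear in the other, then folds them into a revised Boolean query. The sample snippets, the stopword list and the helper names are assumptions made for the example, and the AND / OR / NOT syntax is generic: each search tool has its own operators.

```python
import re
from collections import Counter

STOPWORDS = {"the", "for", "and", "with", "from"}


def terms(text):
    """Lower-case words of three or more letters, minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def distinctive_terms(target_docs, other_docs, min_docs=2):
    """Terms found in at least min_docs target docs and in none of the others."""
    counts = Counter(t for doc in target_docs for t in terms(doc))
    seen_elsewhere = set().union(*(terms(d) for d in other_docs))
    return sorted(t for t, n in counts.items()
                  if n >= min_docs and t not in seen_elsewhere)


def refine_query(base_query, add_terms, exclude_terms):
    """Build a revised Boolean query string from the mined terms."""
    query = base_query
    if add_terms:
        query = f"({query}) AND ({' OR '.join(add_terms)})"
    if exclude_terms:
        query += f" NOT ({' OR '.join(exclude_terms)})"
    return query


# Hypothetical result snippets, sorted by hand into good hits and false drops.
ON_POINT = [
    "Deep web databases hold archived filings",
    "Searching archived databases beyond the surface web",
]
FALSE_DROPS = [
    "Deep sea diving web forum",
    "Diving gear for deep water",
]

add_terms = distinctive_terms(ON_POINT, FALSE_DROPS)
drop_terms = distinctive_terms(FALSE_DROPS, ON_POINT)
query = refine_query('"deep web"', add_terms, drop_terms)
print(query)
```

The output is a starting point to edit by hand, not a finished strategy; judgment about which terms to keep still matters more than the counts.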


Information Leakage Points [factoring in copyright & other IP concerns]
• Reuse by others: clients, ex-employees
• Conference presentations
• Case studies or other sales & marketing collateral
• Continuing education [especially MBA classes]
• Social media

Analogy: similar to emotional conversations, where the “speaker” may or may not
• Intend for the public [or so much of it] to “hear” [have access to] it
• Fully comprehend others’ valuation of the information
• Understand the originator’s perspective: “But I only told that to… And they promised not to tell anyone…”

Non-Central Content Hosts, e.g. content not “HQ” controlled
• Branch offices of consultants, research firms, usually non-US
• Biz units migrating tech platforms [generally post-merger]
• Satellite campuses, larger academic institutions
• Event-driven sites: conferences, product introductions, etc…
• Non-merger partnerships / joint initiatives
• “Relationships” NEC: specialized social networks, non-profits… [sometimes exec bios NOT pasted verbatim from corp site]


Dark Web / Specialized Browsers, etc


Surgical Browsing
• Identify potential high-value sites
• Navigate manually

Create Your Own Site Index Using a Browser

It can include
• Downloading {smaller} sites into Adobe to browse offline
• Looking for cross-linking to the site, especially several layers in
• Locating historical content in caches or archiving sites
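
For a saved page, a home-made site index can start as nothing more than a list of its links, grouped by how many layers deep they sit, plus the outside hosts it cross-links to. The sketch below uses only the Python standard library; the sample HTML, the `example.com` addresses and the names `LinkCollector` and `index_page` are illustrative assumptions, not part of the original method.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def index_page(base_url, html):
    """Group same-site links by path depth and list the outside hosts."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    site = urlparse(base_url).netloc
    internal = defaultdict(set)
    external = set()
    for link in collector.links:
        parts = urlparse(link)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar links
        if parts.netloc == site:
            depth = len([s for s in parts.path.split("/") if s])
            internal[depth].add(parts.path or "/")
        else:
            external.add(parts.netloc)
    return dict(internal), external


# Hypothetical page content, as if saved from a browser.
SAMPLE_HTML = """
<a href="/research/">Research</a>
<a href="/research/2011/q2/report.pdf">Q2 report</a>
<a href="events/launch.html">Launch event</a>
<a href="http://partner.example.org/joint/">Partner</a>
<a href="mailto:info@example.com">Contact</a>
"""

internal, external = index_page("http://www.example.com/", SAMPLE_HTML)
for depth in sorted(internal):
    print(depth, sorted(internal[depth]))
print("outside hosts:", sorted(external))
```

Links several layers in, and hosts outside the main site, are the ones least likely to turn up in a general search engine.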

Be Careful
• While limiting searches by doc type [pdf etc.] is effective, searchable layers can mask them behind other file types


Leveraging Who You Know, Digitally & Offline
• What do people in that field read, on & offline? What would they consider a waste of time?

A large part of the challenge is indexing…
• And you need to ID what they “miss”

Sometimes there’s no GPS; you must already know where you’re going, or at least a mid-point…


Be Flexible, but still… Plan Ahead
• Needles in haystacks aren’t invisible, but they can be more work to locate
• Some even hide in plain sight
• Have a plan, but flex it as needed
• Take good notes, bookmark good leads, save best hits
  • You might not find them again, or they change
• ALWAYS consider the source
• Manage time spent, don’t get lost


Planning Ahead
*AFS example based on model designed by KnowledgeInform

Based on the questions you are seeking to answer
ID potential sources, & “Pearl Grow” from there


ID/Target Sources {categorization subjective – many fit multiple}
• Influencers
• Consultants / Think Tanks
• Pollsters / Market Researchers
• Academia
• Governmental
• NGOs & Advocacy Groups
• Trade/Professional Associations
• Other Niche Organizations
• Businesses & Publishers, NEC
• Aggregators/Re-packagers/Peer-Sharing, NEC
• PEOPLE NEC


• Deep Web: Surfacing Hidden Value
  www.brightplanet.com/images/uploads/DeepWebWhitePaper_20091015.pdf
• Marcus Zillman: Deep Web Research
  www.llrx.com/features/deepweb2011.htm or www.deepwebresearch.info
• Chris Sherman @ Information Online
  www.docstoc.com/Docs/Document-Detail-14.aspx?doc_id=84592274
• August Jackson [for SCIP] Getting Most ….
  http://homepage.mac.com/cornfed/internetdeepweb.pdf
• Using Web Investigative Reporting Tool
  www.slideshare.net/tccj/web-as-investigative-tool or http://campuscoverage.org/sites/default/files/Docs/Presentations/CCPInternet.ppt
• Model & Analyze Deep Web
  www.scribd.com/doc/59496007/Modeling-and-Analyze-the-Deep-Web-Surfacing-Hidden-Value
• Accurate & Efficient Crawling Deep Web
  www.scribd.com/doc/57147960/Accurate-And-Efficient-Crawling-The-Deep-Web-Surfacing-Hidden-Value
• Web & Twitter Archiving @ Library of Congress
  www.slideshare.net/nullhandle/web-and-twitter-archiving-at-the-library-of-congress
• CRS report to Congress
  www.docstoc.com/docs/84024621/CRS-Report-for-Congress
• How Much Information [UC Berkeley]
  http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/printable_report.pdf

No member of a crew is praised for the rugged individuality of his rowing. ~Ralph Waldo Emerson

Thanks & Best Regards,
Anna F. Shallenberger
203.258.2383 cell
917.591.6732 fax
anna@targetedknowledge.com
http://twitter.com/ClosetLibrarian
http://www.slideshare.net/ClosetLibrarian
http://closetlibrarian.blogspot.com

An experienced researcher, educator, author, blogger, strategist & consultant, Anna Shallenberger, aka the ClosetLibrarian, was recently recognized in Best of the Business Web & featured on SlideShare’s home page. At SLA 2011, Anna was a panelist for “Integrating with Sales & Marketing to Capture & Deliver Intelligence” & led an “Intelligence Café” discussion regarding Unique Information Sources & the Deep Web. She was also a spotlight panelist @ SLA 2010 & served as conference planner for the CI Division.
