Table Of Contents

CHAPTER 1: THE STRUCTURE OF HTTP REQUESTS
The HTTP Recipes Examples Site
Figure 1.1: The HTTP Recipes Web Site
Figure 1.2: The Homepage for this Book
The Structure of Surfing
Examining HTTP Requests
Figure 1.3: A Typical Web Page
Listing 1.1: HTML for the Typical Web Page
HTTP Requests and Responses
Figure 1.4: Result of GET Request
Figure 1.5: An HTML Form
Listing 1.2: The HTML Form
Figure 1.6: Result of the POST Request
HTTP Headers
Listing 1.3: Typical Request Headers
Listing 1.4: Typical Response Headers
Recipes
Listing 1.5: Port Already in Use
Figure 1.7: Hello World
Listing 1.6: Simple Web Server (SimpleWebServer.cs)
Listing 1.7: File Based Web Server (WebServer.cs)
Summary
CHAPTER 2: EXAMINING HTTP TRAFFIC
Using a Network Analyzer
Understanding WireShark
Figure 2.1: Internet Explorer Options
Figure 2.2: WireShark
Figure 2.3: Select an Interface
Figure 2.4: Capturing Packets
Figure 2.5: Captured Packets
Figure 2.6: Filter Options
Figure 2.7: Filtered to Only HTTP Packets
Figure 2.8: The Parts of an HTTP Request Packet
Figure 2.9: An HTTP Request
Figure 2.10: An HTTP Response
Figure 2.11: A Reassembled PDU
Figure 2.12: Ready to Create a Cookie
Figure 2.13: Cookie as Part of a HTTP Response
Figure 2.14: Cookie as Part of a HTTP Request
Figure 2.15: An HTML Form
Figure 2.16: An HTTP Form Request
CHAPTER 3: SIMPLE REQUESTS
Figure 3.1: The Address of a Web Page
Constructing URLs
What is a URL?
Table 3.1: Common HTML Schemes
Encoding Special Characters into a URL
Reading from URLs
Listing 3.1: Download a Web Page (GetPage.cs)
Figure 3.2: The Current Time
Listing 3.2: HTML Source for the Current Time
Listing 3.3: Get the Time in St. Louis (GetTime.cs)
Listing 3.4: The HTML for the Cities List
Listing 3.5: Get the Time for Select Cities (GetCityTime.cs)
Figure 3.4: An Image to Download
Listing 3.6: Download a Binary File (DownloadBinary.cs)
Table 3.2: How Operating Systems End Lines
Listing 3.7: Download a Text File (DownloadText.cs)
CHAPTER 4: BEYOND SIMPLE REQUESTS
Using HttpWebRequest
Table 4.1: HTTP Request Header Methods and Functions
Table 4.2: Identities of Several Major Browsers
Using HttpWebResponse
Table 4.3: MIME Families
Table 4.4: Common MIME Types
Other Useful Options
Listing 4.1: Scan a URL for HTTP Response Headers (ScanURL.cs)
Listing 4.2: Scan for Web Sites (ScanSites.cs)
Figure 4.1: Scan for Sites
Listing 4.3: Download Text or Binary (DownloadURL.cs)
Listing 4.4: Monitor Site (MonitorSite.cs)
CHAPTER 5: SECURE HTTP REQUESTS
Using HTTPS in C#
Figure 5.1: HTTPS Verification Seal
Figure 5.2: The HTTP Recipes Certificate
Understanding HTTP Authentication
Figure 5.3: Ready to Enter a Protected Area
Figure 5.4: Enter your ID and Password
Listing 5.1: Is a Connection HTTPS (IsHTTPS.cs)
Listing 5.2: Download Authenticated URL (AuthDownloadURL.cs)
CHAPTER 6: EXTRACTING DATA
Peekable Stream
Listing 6.1: The Peekable Stream (PeekableInputStream.cs)
Parsing HTML
Listing 6.2: Parsing HTML (ParseHTML.cs)
Encapsulating HTML Tags
Listing 6.3: HTML Tags (HTMLTag.cs)
Figure 6.1: An HTML Choice List
Listing 6.4: Parse a Choice List (ParseChoiceList.cs)
Figure 6.2: An HTML List
Listing 6.5: Parse an HTML List (ParseList.cs)
Figure 6.3: An HTML Table
Listing 6.6: Parse a Table (ParseTable.cs)
Figure 6.4: Hyperlinks
Listing 6.7: Parse Hyperlinks (ExtractLinks.cs)
Figure 6.5: HTML Images
Listing 6.8: Extracting Images from HTML (ExtractImages.cs)
Figure 6.6: A List of Subpages
Figure 6.7: The Missouri Sub-Page
Listing 6.9: Parse HTML Sub-Pages (ExtractSubPage.cs)
Figure 6.8: A Partial HTML Page
Listing 6.10: Parse HTML Partial-Pages (ExtractPartial.cs)
Figure 7.3: A Multipart Form
Processing Forms
Listing 7.1: Form Utility (FormUtility.cs)
Listing 7.2: Using HTTP GET (FormGet.cs)
Listing 7.3: Using HTTP POST (FormPOST.cs)
Figure 7.4: A Successful Upload
Listing 7.4: Using Multipart Forms to Upload (FormUpload.cs)
CHAPTER 8: HANDLING SESSIONS AND COOKIES
URL Variables for State
Cookies for State
Table 8.1: Extracting from Cookieless Session
Listing 8.1: Cookieless Session (Cookieless.cs)
Table 8.2: Extracting from Cookie Based Session
Listing 8.2: Cookie-Based Session (UseCookie.cs)
CHAPTER 9: USING JAVASCRIPT
Understanding JavaScript
Common JavaScript Techniques
Figure 9.1: An Automatic Choice List
Figure 9.2: JavaScript Includes
Figure 9.3: A JavaScript Enabled Form
Figure 9.4: Amortization Data
Interpreting JavaScript
Listing 9.1: Automatic Choice Lists (DownloadAtricle.cs)
Listing 9.2: JavaScript Includes (Includes.cs)
Listing 9.3: JavaScript Forms (JavaScriptForms.cs)
CHAPTER 10: WORKING WITH AJAX SITES
Table 10.1: AJAX Components
Understanding AJAX
Figure 10.1: A Simple HTML Page
Figure 10.2: A Simple Page in the DOM Inspector
Figure 10.3: An AJAX Drop-List
Figure 10.4: Viewing Missouri
Figure 10.5: WireShark Examining AJAX
Listing 10.1: Non-XML AJAX Bot (AjaxNonXML.cs)
Figure 10.6: Searching for States
Figure 10.7: Displaying a State
Listing 10.2: XML AJAX Bot (AjaxXML.cs)
CHAPTER 11: HANDLING WEB SERVICES
Notable Public Web Services
Table 11.1: Large Websites Offering Web Services
Using the Google API
Hybrid Bots
Understanding SOAP
Listing 11.1: Pig Latin Server’s WSDL
Listing 11.2: Pig Latin SOAP Request
Listing 11.3: Pig Latin Server’s SOAP Response
Figure 11.1: Simple Project
Figure 11.2: Adding a Web Reference
Figure 11.3: The Google Search Service
Figure 11.4: The Google Search Service Added
Figure 11.5: Links Between Sites
Listing 11.4: Scanning for Links (GoogleSearch.cs)
Listing 11.5: Using .NET to Access a SOAP Server (PigLatinTranslate.cs)
Listing 11.6: A Google Hybrid Bot (WhenBorn.cs)
CHAPTER 12: WORKING WITH RSS FEEDS
Using RSS with a Web Browser
Figure 12.1: A RSS Enabled Site
Figure 12.2: The HTTP Recipes Feed
RSS Format
Listing 12.1: A RSS 1.0 File
Listing 12.2: A RSS 2.0 File
Parsing RSS Files
Listing 12.3: The RSS Class (RSS.cs)
Listing 12.4: The RSSItem Class (RSSItem.cs)
Listing 12.5: Display an RSS Feed (LoadRSS.cs)
Listing 12.6: Find an RSS Feed (FindRSS.cs)
CHAPTER 13: USING A SPIDER
Using the Heaton Research Spider
Table 13.1: Spider Configuration Options
Listing 13.1: A Configuration file for the Spider (spider.conf)
Table 13.2: The spider_host Table
Table 13.3: The spider_workload Table
Listing 13.2: Example CREATE TABLE DDL for Microsoft Access
Table 13.4: Spider Statuses
Listing 13.3: The SpiderReportable Interface (SpiderReportable.cs)
Table 13.5: Functions and Methods of the SpiderReportable Interface
Listing 13.4: Find Broken Links (CheckLinks.cs)
Listing 13.5: Report Broken Links (LinkReport.cs)
Listing 13.6: Download a Site (DownloadSite.cs)
Listing 13.7: Report Download Information (SpiderReport.cs)
Listing 13.8: Download the World (WorldSpider.cs)
Listing 13.9: Report for World Spider (WorldSpiderReport.cs)
Figure 13.1: Monitoring a Spider
Listing 13.10: Display Spider Statistics (SpiderStats.cs)
CHAPTER 14: INSIDE THE HEATON RESEARCH SPIDER
Table 14.1: The Heaton Research Spider Classes
The Spider Class
Listing 14.1: The Spider Class (Spider.cs)
Table 14.2: Instance Variables for the Spider Class
Other Important Classes in the Heaton Research Spider
Listing 14.2: Configuring the Spider (SpiderOptions.cs)
Workload Management
Table 14.3: URL States
Figure 14.1: URL State Diagram
Listing 14.5: Workload Management (WorkloadManager.cs)
Table 14.4: Methods and Functions in the WorkloadManager Interface
Implementing a Memory Based WorkloadManager
Listing 14.6: Memory Workload Manager (MemoryWorkloadManager.cs)
Table 14.5: Instance Variables of the MemoryWorkloadManager
Table 15.1: Instance Variables of the RepeatableStatement Class
Implementing a SQL Based Workload Manager
Listing 15.2: SQL Workload Management (SQLWorkloadManager.cs)
Table 15.2: Instance Variables of the RepeatableStatement Class
CHAPTER 16: WELL BEHAVED BOTS
Using a CAPTCHA
Figure 16.1: Four CAPTCHAs
User Agent Filtering
Robots Exclusion Standard
Using Filters with the Heaton Research Spider
Listing 16.1: Spider Configuration (Spider.conf)
Listing 16.2: The SpiderFilter Interface (SpiderFilter.cs)
Implementing a robots.txt Filter
Listing 16.3: A robots.txt Filter (RobotsFilter.cs)
APPENDIX A: DOWNLOADING EXAMPLES
APPENDIX B: SETTING UP EXAMPLES
Figure B.1: The Examples Archive
Running from a Command Prompt
Figure B.2: Running an Example from the Command Prompt
Running from Visual Studio
Figure B.3: The Recipes in Visual Studio
Compiling without Visual Studio
APPENDIX C: USEFUL CLASSES, METHODS AND FUNCTIONS
Reusable Functions and Methods
Table C.1: Reusable Functions and Methods
Reusable Classes
Table C.2: Reusable Classes
All Recipes
APPENDIX D: SETTING UP YOUR DATABASE
DDL for MySQL
Listing D.1: MySQL DDL Script
DDL for Microsoft Access
Listing D.2: Microsoft Access DDL Script
DDL for Oracle
Listing D.3: Oracle DDL Script
OLEDB for .NET
Listing D.4: A Spider.CONF File for OLEDB
APPENDIX E: HTTP RESPONSE CODES
1xx Informational
2xx Success
3xx Redirection
HTTP Programming Recipes for C# Bots

Published by: Julio Santos on Apr 12, 2011
Copyright: Attribution Non-commercial


04/06/2013
