• what is data?
• data formats used on the Web
• finding data
• downloading data
• using that data
• working with downloaded data
• distributing data
• data quality issues
• data is here defined as statistical information, generally stored in a database or analyzed and output in a variety of formats
• data results from research, studies, surveys and data collection efforts
• data can be found on the Web in a variety of formats:
  • as raw data (as in databases or spreadsheets), or
  • as processed data in the form of tables, graphs and maps
How Are Data Made Available?
• Narrative - reports
• tables
• databases (possibly including documentation)
• raw data
• figures and graphs
• Comma or tab delimited
• maps
• multiple formats (including facts)
• metadata
• questionnaires
What File Formats Are Being Used?
• ASCII, DOC, HTML, PDF, XLS or WKS, EXE, MTW (Minitab), DAT
• many others, from graphic formats to GIS formats to database formats
Explanations of File Extensions
• ASCII - plain text with no boldings or control characters
• DOC - Microsoft Word format
• RTF - Rich Text Format (output from MS Word)
• HTML - HyperText Markup Language format
• PDF - Adobe Portable Document Format (proprietary)
Database and Spreadsheet
• DBF - dBASE format
  • used by dBASE III, dBASE I, Foxpro, Vfoxbase
• Paradox - database format
• Quattro - spreadsheet format (Quattro Pro)
• WKS - Lotus 123 spreadsheet format
• XLS - Microsoft Excel spreadsheet format
• EXE - an executable file which you click on and the file opens up the compressed data into a format you can use
• MTW - Minitab format
• DAT files are basically text files formatted for use with SAS and contain data only
• uuencoded
• TAR - tar files (UNIX): several files compacted and output as one file
• ZIP - zipped
Common Graphics Formats
• BMP - bitmap file format
• GIF - Graphics Interchange Format
• JPG - Joint Photographic Experts Group (after the group which created the standard)
• TIF - another graphics format
• WMF (Windows Metafile)
GIS Formats
• *.AGF - Atlas GIS/Pro
• *.BNA - Atlas Mapmaker
• *.DXF - AutoCAD
• *.E00 - Arc/Info
• *.GCM - Geo Concept
• *.TAB, *.MIF/*.MID - MapInfo
• ArcView, Maptitude, MapViewer - exchange formats such as *.BNA, *.DXF and *.MIF/*.MID
• bits and bytes
• paper (e.g. reports, memos, summary data, press releases)
• HTML
• 9 track tapes
• cartridges
• CD-ROMS
• Note: formats go out of fashion as new technology comes along
• will establishing standards help the problem of too many formats?
• yes, no, maybe
• we still need to deal with the data which has already been produced
• new formats are being developed all the time
• knowing which format the data is in helps IF we know how to use that format
• knowing how to move from one format to another is also useful
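
When a file arrives with a missing or misleading extension, its first few bytes often reveal the format. The sketch below is only an illustration: the function name `guess_format` and the short signature table are assumptions made for this example, the table covers just a few of the formats named above, and the ASCII fallback is deliberately crude.

```python
# Leading-byte signatures for a few of the formats named in these slides.
SIGNATURES = [
    (b"%PDF", "PDF"),
    (b"PK\x03\x04", "ZIP"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPG"),
    (b"BM", "BMP"),
    (b"MZ", "EXE"),
]


def guess_format(head: bytes) -> str:
    """Guess a file format from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    # uuencoded files are text and start with a line like "begin 644 NAME".
    if head.startswith(b"begin "):
        return "uuencoded"
    # Crude fallback: if every byte is 7-bit, treat the file as ASCII text.
    if head and all(byte < 128 for byte in head):
        return "ASCII"
    return "unknown"
```

To use it, read a small prefix of the file in binary mode, for example `guess_format(open(path, "rb").read(16))`. Short signatures such as `BM` can also match ordinary text, so treat the result as a hint rather than a verdict.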
Where Do You Find Data on the Web?
• Government at all levels
• Foundations and Think Tanks
• Pharmaceutical companies
• Disease specific Associations (e.g. American Heart Association)
• Textbooks, Reports and other documents
• other sites ... search on: health data
Use Search Engines to Find Data
• robots which search across the entire Web
• use the term: health data to turn up quite a few Websites
• use site specific search engines to locate information within one site
• guides are hierarchical arrangements of Websites, usually organized by subject
• several very good sites on statistics appear on the Web
Preformatted Data (Tables or Figures)
• this is often data which is requested time and time again
• e.g. Bureau of Labor Statistics: http://stats.bls.gov/top20.html and http://146.142.4.24/cgi-bin/surveymost?bls
• e.g. White House Briefing Room
• look for sites which enable you to retrieve data according to your needs
• these are usually searchable databases where you type in the population and variables you want data on
• they are most often found at government sites, such as CDC WONDER
Downloading Data Graphics
• click on the image with the right mouse button, then save the file to disk
• insert graphics into your document with the source information appended
• go to a Website which has an HTML table
• In Netscape: File | Save as File (in HTML format)
  • Netscape will save the whole page
  • delete out what you don't want
• In Excel: File | Open File | change file format using down arrow | click on file
• Excel will turn the page from HTML to a spreadsheet
• or copy and paste
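
The same HTML-table-to-spreadsheet step can be done with a short script. This is a minimal sketch using only Python's standard library, not a description of what Excel does internally. The names `TableExtractor` and `html_table_to_csv` are invented for this example. It collects every table row on the page into one comma-delimited result and tolerates the unclosed `<td>` and `<tr>` tags common on older pages.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every table row on a page as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside a row
        self._cell = None   # text pieces of the current cell, or None

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()          # closes a previous row left open
            self._row = []
        elif tag in ("td", "th"):
            if self._row is None:
                self._row = []
            self._end_cell()         # closes a previous cell left open
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()


def html_table_to_csv(html: str) -> str:
    """Return the table rows found in an HTML page as comma-delimited text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

Save the returned text with a `.csv` extension and Excel will open it directly as a spreadsheet.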
Excel (XLS) Files
• click on the file and either open, view and save the file, OR click on the file and save directly to disk
Word Tables or Figures
• for tables, copy and paste
• for figures, click once to select, then Edit | Copy | Edit | Paste
• will depend on the database format and whether you can search the database for information
• or download the data in tab- or comma-delimited format
PDF - Moving Tables and Graphs into Office 97
• what you can and cannot do:
  • you can bring a PDF table or graph into Word and PowerPoint
  • you can make the graph or table larger or smaller
  • you cannot edit the contents of the graph
  • a PDF document created from Word cannot be converted back into Word
Moving Tables from PDF...
• open the Adobe Viewer and the document you wish to view
• highlight the text or graphic:
  • from the Tools menu, select Select Graphics
  • the cursor changes to a cross-hair
  • draw a rectangle around the text or graphic by clicking once in the upper left-hand corner of the text and drawing at a diagonal to the other corner
• click on Edit | Copy, switch to the receiving document and hit Edit | Paste
• the selected material is copied as a WMF (Windows Metafile) graphic
• note the WMF is a space hog
• change the format to GIF or JPG using a graphics program like Paint Shop Pro
Uuencoded Data - What to do with it?
• data sent to you in uuencoded format needs to be uudecoded
• format is: uudecode FILENAME.EXT (the file name is often in caps)
• sometimes the uudecoding doesn't work
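
If no uudecode program is at hand, the decoding can be sketched with Python's standard `binascii` module. The function name `uudecode` is an assumption for this example. It expects the usual layout: a `begin <mode> <name>` line, the encoded lines, then `end`. A missing `end` line, which usually means a truncated message, is one common reason uudecoding "doesn't work".

```python
import binascii


def uudecode(text):
    """Decode one uuencoded block; return (file name from the header, data)."""
    lines = iter(text.splitlines())
    # Skip mail headers or other text until the "begin <mode> <name>" line.
    for line in lines:
        if line.startswith("begin "):
            _, _mode, name = line.split(" ", 2)
            break
    else:
        raise ValueError("no 'begin' line found")

    chunks = []
    for line in lines:
        if line.strip() == "end":
            return name, b"".join(chunks)
        if not line.strip():
            continue  # ignore blank lines inside the block
        chunks.append(binascii.a2b_uu(line))
    raise ValueError("no 'end' line found: the file is probably truncated")
```

Write the returned bytes to disk in binary mode under the returned name to recover the original file.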
• raw data - information that has not been organized or analyzed
• raw data can be imported into SPSS if it's organized in columns
• import tab and comma delimited ASCII files into Excel as well
• binary - a format for representing data used by some applications, including EXE files and numerical data
• you must unzip the files using one of a number of unzipping programs
• my two favorites are:
  • Winzip v 6.2
  • PkZip
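
Unzipping can also be scripted. The sketch below uses Python's standard `zipfile` module; the function name `unzip` and its arguments are chosen for this example only.

```python
import zipfile


def unzip(archive_path, dest_dir):
    """List the members of a ZIP archive, then extract them into dest_dir."""
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    return names
```

The returned list shows what the archive contained, which is a quick way to check that the data file and any documentation both arrived.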
Retrieving Data through FTP
• some sites make their data available on anonymous (public) FTP subdirectories
• you FTP to the site, log in as anonymous, give your email address as your password, locate the file, select it and have the server send you the data
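
The anonymous FTP steps can be sketched with Python's standard `ftplib` module. The function name `fetch_anonymous` and all of its arguments are placeholders for this example; substitute the host, directory and file name of the site you are actually using.

```python
from ftplib import FTP


def fetch_anonymous(host, directory, filename, email, local_path):
    """Download one file from an anonymous FTP server in binary mode."""
    with FTP(host) as ftp:
        # Anonymous login: user name "anonymous", email address as password.
        ftp.login(user="anonymous", passwd=email)
        # Move to the subdirectory that holds the data.
        ftp.cwd(directory)
        with open(local_path, "wb") as out:
            # RETR asks the server to send the file; each block is written to disk.
            ftp.retrbinary("RETR " + filename, out.write)
    return local_path
```

Binary mode is used so that zipped, executable and spreadsheet files arrive unchanged.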
Retrieving Data via Gopher
• use a gopher menu or gopher on your browser to burrow down the menu to get the data you want
• when you reach the data it will be in ASCII format (most of the time)
• Gopher format on the Web: gopher://site.edu/subdirectory/subdirectory/filename.ext
HTTPing Data to Your Desktop, or, Get Data from a Web Source
• look at Excel Help for "Get data from a Web Source"
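
Outside Excel, pulling a file from a Web address to your desktop takes only a few lines. This is a minimal sketch with Python's standard `urllib`; the function name `download` is an assumption for this example, and it is not a description of how Excel's own feature works.

```python
import shutil
from urllib.request import urlopen


def download(url, local_path):
    """Copy whatever the URL returns to a local file, byte for byte."""
    with urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path
```

The saved file can then be opened in Excel or a statistical package like any other downloaded file.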
Now We Have the Data. So, Now What?
• What tools can we use?
  • Excel
  • Access
  • Statistical Software
  • other software
One Example of Converting Data from One Format to Another
• you may need to convert a tab or comma delimited file to Excel
• in Excel, use the Import command: File | Open | Type of File | Text Files | select file you wish to import (double-click)
• the Text Import Wizard will help you specify into which column you want to put your text
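
Moving between delimited formats can also be done without the wizard. The sketch below re-writes tab-delimited text as comma-delimited text using Python's standard `csv` module; the function name `convert_delimited` and its parameter names are chosen for this example.

```python
import csv
import io


def convert_delimited(text, source_delimiter="\t", target_delimiter=","):
    """Re-write delimited text with a different field separator."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=source_delimiter)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=target_delimiter, lineterminator="\n")
    # Fields that contain the new separator are quoted automatically.
    writer.writerows(reader)
    return out.getvalue()
```

Quoting matters here: a value such as `Ohio, northern` would otherwise be split into two columns once commas become the separator.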
File Translation Utility
• DataImport Version 5 - $189/single user
• http://www. ... htm
• pulls data from virtually any computer report and converts it into nearly 40 spreadsheet and database formats
• e.g. conversion of hospital financial and admissions data for analysis
What is an Excel Web Query?
• A query that can retrieve data from several locations:
  • Internet (World Wide Web)
  • Intranet
  • Hard drive
• using HTTP, FTP
Web Queries - Basic Process
• open a new workbook in Excel
• create a Web query or use an existing query
  • queries are located in the Queries folder on your installation disk
  • you may need to install them
• activate MS Internet Explorer / Netscape Communicator
• connect to the Internet through your ISP
• run the query
• examine the data
Create Your Own Web Query (.iqy) files
• open a new workbook in Excel
• you can create your own Web Queries
• read the Excel Help Documentation to get details
• look under Web Queries, then Create a Web Query
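
A Web query file is plain text, so one can be written by hand or by script. The sketch below assumes the commonly described minimal layout of three lines: the query type `WEB`, the version `1`, and the address to fetch. Check the Excel Help documentation for the full set of options. The function name `write_iqy` and the example address are placeholders.

```python
def write_iqy(path, url):
    """Write a minimal Web query file: type line, version line, then the URL."""
    # newline="\r\n" makes Python write DOS-style line endings.
    with open(path, "w", newline="\r\n") as query_file:
        query_file.write("WEB\n1\n" + url + "\n")
    return path
```

For example, `write_iqy("cases.iqy", "http://www.example.org/table.html")` produces a file that can be placed in the Queries folder and run from Excel.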
Data Quality
• how do you evaluate the quality of data on a site?
• consider who is the author (government agency? pharmaceutical company? interested layperson? who?)
• is complete documentation available?
  • variables defined and located on media
• is the data in a format you can easily use?
• How current is the data?
• is the source of the data made clear?
• what is the sample size?
• weighting and other statistical necessities?
• is it easily transported to a statistical package such as Epi Info, SAS, SPSS?
• what format is the data in? AND do you have the hardware to work with the data?
  • 9-track tape? Is the tape medium old or has it been refreshed? Old tapes sometimes cause data loss when you try to read them
  • cartridge?
  • CD-ROM?
• why was the data collected?
• Does the data collected reflect "real life"?
Data Quality - Role of Documentation and Metadata
• documentation helps the viewer of the data understand how the data was created and how to use it
• metadata is data about the data
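
As a small illustration of metadata at work, the sketch below checks the column names of a data file against a codebook. Everything in it is hypothetical: the `CODEBOOK` entries, the column names and the function name `undocumented_columns` are invented for the example.

```python
# Hypothetical codebook: one entry per column, describing what the values mean.
CODEBOOK = {
    "age": {"label": "Age at interview", "units": "years"},
    "sex": {"label": "Sex of respondent", "codes": {"1": "male", "2": "female"}},
}


def undocumented_columns(header, codebook):
    """Return the column names in a data file that the metadata does not describe."""
    return [name for name in header if name not in codebook]
```

A non-empty result, such as a weighting column with no definition, is a signal to ask the data provider for fuller documentation before analyzing the file.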
Making Data Available
• email attachments
• Web using Excel and HTML
• diskette and Zip cartridges
• CD-ROM
Convert an Excel Spreadsheet to HTML
• File | Save as HTML
• the HTML wizard will appear and will ask you to specify a range of cells and charts to convert
• you decide if you wish to create a whole new HTML page or just a table which is then inserted into another HTML page
• tell Excel where you want to put the file
• then click Finish
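
The "just a table" option can be approximated outside Excel. This minimal sketch turns rows of values into an HTML table fragment for pasting into another page. The function name `rows_to_html_table` is an assumption for this example, and the output is not meant to match what the Excel wizard generates.

```python
from html import escape


def rows_to_html_table(rows, header=True):
    """Render a list of rows as an HTML table; the first row becomes header cells."""
    lines = ["<table>"]
    for index, row in enumerate(rows):
        tag = "th" if header and index == 0 else "td"
        # escape() protects characters such as < and & inside cell values.
        cells = "".join(f"<{tag}>{escape(str(value))}</{tag}>" for value in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Calling `rows_to_html_table([["Year", "Cases"], [1996, 12]])` returns a four-line fragment with one header row and one data row.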
Really Easy Distribution of Data via Email
• data can be sent as ASCII text in the body of an email or in other formats as attachments
• people can create data in one format but might have to change it to another format to send it over the Internet
• e.g. create a graph in Excel and send in a uuencoded format or as an attachment
• unless you know the secret of uudecoding the file, how can you retrieve the graph?
• attachments are preferred to uuencoding a file
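
Building a message with an attachment can be sketched with Python's standard `email` package. The function name `build_message` and the addresses in the usage note are placeholders. The sketch only constructs the message; sending it (for instance with `smtplib`) depends on your own mail server and is left out.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, body, filename, data):
    """Build an email with a plain-text body and one binary attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    # A generic binary type; the library encodes the bytes for transport.
    msg.add_attachment(data, maintype="application", subtype="octet-stream",
                       filename=filename)
    return msg
```

For example, `build_message("me@example.org", "you@example.org", "Case counts", "Spreadsheet attached.", "cases.xls", open("cases.xls", "rb").read())` wraps an Excel file so that the recipient's mail program can save it without any manual decoding.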