Data Mining and Homeland Security: An Overview
Data mining has become one of the key features of many homeland security initiatives. Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. In the context of homeland security, data mining can be a potential means to identify terrorist activities, such as money transfers and communications, and to identify and track individual terrorists themselves, such as through travel and immigration records.
While data mining represents a significant advance in the type of analytical tools currently available, there are limitations to its capability. One limitation is that although data mining can help reveal patterns and relationships, it does not tell the user the value or significance of these patterns. These types of determinations must be made by the user. A second limitation is that while data mining can identify connections between behaviors and/or variables, it does not necessarily identify a causal relationship. Successful data mining still requires skilled technical and analytical specialists who can structure the analysis and interpret the output.
Data mining is becoming increasingly common in both the private and public sectors. Industries such as banking, insurance, medicine, and retailing commonly use data mining to reduce costs, enhance research, and increase sales. In the public sector, data mining applications initially were used as a means to detect fraud and waste, but have grown to also be used for purposes such as measuring and improving program performance. However, some of the homeland security data mining applications represent a significant expansion in the quantity and scope of data to be analyzed.
Some efforts that have attracted a higher level of congressional interest include the Terrorism Information Awareness (TIA) project (now discontinued) and the Computer-Assisted Passenger Prescreening System II (CAPPS II) project (now canceled and replaced by Secure Flight). Other initiatives that have been the subject of congressional interest include the Multi-State Anti-Terrorism Information Exchange (MATRIX), the Able Danger program, the Automated Targeting System (ATS), and data collection and analysis projects being conducted by the National Security Agency (NSA).
As with other aspects of data mining, while technological capabilities are important, there are other implementation and oversight issues that can influence the success of a project’s outcome. One issue is data quality, which refers to the accuracy and completeness of the data being analyzed. A second issue is the interoperability of the data mining software and databases being used by different agencies. A third issue is mission creep, or the use of data for purposes other than those for which the data were originally collected. A fourth issue is privacy. Questions that may be considered include the degree to which government agencies should use and mix commercial data with government data, whether data sources are being used for purposes other than those for which they were originally designed, and the possible application of the Privacy Act to these initiatives. It is anticipated that congressional oversight of data mining projects will grow as data mining efforts continue to evolve.
This report will be updated as events warrant.