Course Agenda
Week Day Theory Topic Hands-on Lab
2
Today’s Agenda (Day-3)
3
What is Ethics?
Ethics are shared values that help us distinguish right from wrong.
Ethics are the basis for the rules we all voluntarily choose to follow because that
makes the world a better place for all of us.
Ethics flow from shared values, which could be on account of religion, or not.
Ethics guide the creation of laws, so the two are often in consonance.
Not everyone will be ethical, even when there are shared values
– That is why people go to jail for theft
There is tremendous excitement about Data Science precisely because of the many
ways in which it provides us with a “better” way to do something.
But there are possible undesired consequences, for privacy, fairness, etc.
Dvijesh Shastri
13
The positive aspects
14
The positive aspects
15
The positive aspects
AI applications in crop management and food production help feed the world.
https://www.youtube.com/watch?v=-YCa8RntsRE&t=17s
Optimization of business processes using machine learning will make businesses more
productive, increasing wealth and providing more employment.
https://youtu.be/kbPRrHjji3g
Automation can replace the tedious and dangerous tasks that many workers face, and
free them to concentrate on more interesting aspects.
https://www.youtube.com/watch?v=Rc8QP-0zUZE
16
The positive aspects
People with disabilities will benefit from AI-based assistance in seeing, hearing,
and mobility.
17
Data Science and AI Need Ethics
• While Data Science and AI can help in so many ways, they can hurt too.
18
Ethical Considerations
19
Ethical Considerations
20
Who generates Big Data?
• Machines
• People
• Organizations
21
22
Who generates Big Data?
• Machines
• People
• Organizations
23
Machine-Generated Data
24
Sensor Data
25
Who generates Big Data?
• Machines
• People
• Organizations
26
People-Generated Data
27
Huge Volume and High Velocity
A recent report by DOMO estimates the amount of
data generated every minute on popular online
platforms:
http://www.internetlivestats.com/one-second/
28
Companies analyze 12 TB of Twitter data every day to measure “sentiment”
around their products
29
KB (10^3) MB (10^6) GB (10^9) TB (10^12) PB (10^15) EB (10^18) ZB (10^21) YB (10^24)
30
Who generates Big Data?
• Machines
• People
• Organizations
31
Organization-Generated Data
Healthcare data collected in electronic
health record (EHR) systems
Government Data
32
Why?
Fraud detection
Meet Competitive Pressure
Provide better, customized
services for an edge (e.g. in
Customer Relationship
Management)
33
Some Stats
34
Data Gathering and Privacy Implications
35
Facebook Tags
Tag: Label identifying a person in a photo
Facebook allows users to tag people who are on their list of friends
About 100 million tags added per day in Facebook
Facebook uses facial recognition to suggest name of friend appearing in photo
Does this feature increase risk of improper tagging?
36
Facebook Login
Allows people to login to Web sites or apps using their Facebook credentials
App’s developer has permission to access information from person’s Facebook
profile: name, location, email address, and friends list
37
Malls Track Shoppers’ Cell Phones
In 2011 two malls recorded movements of shoppers by tracking locations of cell
phones
How much time people spend in each store?
Do people who shop at X also shop at Y?
Are there unpopular areas of mall?
Small signs informed shoppers of study
After protest, mall quickly halted study
38
iPhone Apps Upload Address Books
In 2012 a programmer discovered Path was uploading
iPhone address books without permission
40
Enhanced 911 Services
Cell phone providers in the United States are required to track locations of active cell phones
to within 100 meters
41
Rewards or Loyalty Programs
Shoppers who belong to store’s rewards program can save money on many of
their purchases
42
Body Scanners (1 of 2)
Some department stores have 3-D body scanners
43
Advanced Imaging Technology (AIT) Scanners
Transportation Security Administration began installing AIT scanners in 2007
AIT scanners revealed anatomical features
Electronic Privacy Information Center sued government in 2010, saying systems
violated 4th Amendment and various laws
TSA announced it would develop new software that would replace passenger-specific images with generic outlines
All body scanners producing passenger-specific images were removed in 2013
44
Advanced Imaging Technology Scanner
When the first advanced imaging technology scanners were deployed in American airports, they revealed anatomical features
in great detail. (Paul Ellis/AFP/Getty Images)
45
RFID Tags
RFID: Radio frequency identification
An RFID tag is a tiny wireless transmitter
Manufacturers are replacing bar codes with RFID
tags
Contain more information
Can be scanned more easily
If tag cannot be removed or disabled, it becomes
a tracking device
Employees take inventory more quickly and make fewer errors when items
are marked with RFID tags. (Marc F. Henning/Alamy)
46
Implanted Chips
Taiwan: Every domesticated dog must have an implanted microchip
Size of a grain of rice; implanted into ear
Chip contains name, address of owner
Allows lost dogs to be returned to owners
RFID tags approved for use in humans
Can be used to store medical information
Can be used as a “debit card”
47
Mobile Apps
Many apps on Android smartphones and iPhones collect location information and sell
it to advertisers and data brokers
Angry Birds
Brightest Flashlight
Flurry: a company specializing in analyzing data collected from mobile apps
Has access to data from > 500,000 apps
48
OnStar
OnStar manufactures a communication system incorporated into the rear-view mirror
Emergency, security, navigation, and diagnostics services provided to subscribers
Two-way communication and GPS
Automatic communication when airbags deploy
Service center can even disable gas pedal
49
Automobile “Black Boxes”
Modern automobiles come equipped with a “black box”
Maintains data for five seconds:
Speed of car
Amount of pressure being put on brake pedal
Seat belt status
After an accident, investigators can retrieve and gather information from “black box”
50
Medical Records
Advantages of changing from paper-based to electronic medical records
Quicker and cheaper for information to be shared among caregivers
Lower medical costs
Improve quality of medical care
Once information is in a database, it is more difficult to control how it is disseminated
51
Digital Video Recorders
TiVo service allows subscribers to record programs and watch them later
TiVo collects detailed information about viewing habits of its subscribers
Data collected second by second, making it valuable to advertisers and others
interested in knowing viewing habits
52
Cookies
Cookie: File placed on computer’s hard drive by a Web server
Contains information about visits to a Web site
Allows Web sites to provide personalized services
Put on hard drive without user’s permission
You can set Web browser to alert you to new cookies or to block cookies entirely
53
Privacy
Balanced against the individual’s right to privacy is the value that society gains
from sharing data.
54
The Problem of Re-identification
The problem is that the shared de-identified data may be subject to re-identification.
Examples:
If the data strips out the name, social security number, and street
address, but includes date of birth, gender, and zip code, then, as
shown by Latanya Sweeney (2000), 87% of the U.S. population can
be uniquely re-identified.
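Sweeney’s result can be illustrated with a linkage attack: join a “de-identified” table to a public dataset on the shared quasi-identifiers. This is a minimal sketch with entirely hypothetical records and names; the field names and tables are illustrative, not from any real dataset.

```python
# Hypothetical data: a "de-identified" medical table and a public voter roll.
deidentified_medical = [
    {"dob": "1965-03-14", "gender": "F", "zip": "02138", "diagnosis": "flu"},
    {"dob": "1971-07-02", "gender": "M", "zip": "02139", "diagnosis": "asthma"},
]
public_voter_roll = [
    {"name": "A. Smith", "dob": "1965-03-14", "gender": "F", "zip": "02138"},
    {"name": "B. Jones", "dob": "1980-01-01", "gender": "M", "zip": "02140"},
]

# The quasi-identifiers Sweeney studied: date of birth, gender, zip code.
QUASI_IDS = ("dob", "gender", "zip")

def reidentify(medical, voters):
    """Return (name, diagnosis) pairs where a medical record matches
    exactly one voter on all quasi-identifiers."""
    matches = []
    for rec in medical:
        key = tuple(rec[q] for q in QUASI_IDS)
        hits = [v for v in voters if tuple(v[q] for q in QUASI_IDS) == key]
        if len(hits) == 1:  # a unique match re-identifies the record
            matches.append((hits[0]["name"], rec["diagnosis"]))
    return matches

print(reidentify(deidentified_medical, public_voter_roll))
# -> [('A. Smith', 'flu')]
```

Even though the medical table contains no names, the unique combination of the three quasi-identifiers links the diagnosis back to a named individual.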
55
Netflix Prize
Netflix offered $1 million prize to any group that could come up with a significantly better
algorithm for predicting user ratings (2006)
Released more than 100 million movie ratings from a half million customers
Stripped ratings of private information
Researchers demonstrated that ratings not truly anonymous if a little more information from
individuals was available
U.S. Federal Trade Commission complaint and lawsuit
Netflix canceled sequel to Netflix Prize (2010)
56
AOL Search Dataset
AOL researcher Dr. Chowdhury posted three months’ worth of user queries from
650,000 users (2006)
No names used; random integers used to label all queries from particular users
Researchers identified some users from queries; e.g., many people performed
searches on their own names
New York Times investigation led to public outcry
AOL took down the dataset, but it had already been copied and reposted
AOL fired Dr. Chowdhury and his supervisor
57
Solution for Re-identification
Generalizing fields: Replacing the exact birth date with just the year of birth, or a
broader range like “20-30 years old.” Deleting a field altogether can be seen as a
form of generalizing to “any”.
k-anonymity: a database is k-anonymized if every record in the database is
indistinguishable from at least k−1 other records. If there are records that are
more unique than this, they would have to be further generalized.
Aggregate querying: An API for queries against the database is provided, and
valid queries receive a response that summarizes the data with a count or
average (e.g., for each zip code, the percentage of people with cancer).
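The first two defenses above can be sketched together: generalize the quasi-identifiers, then check k-anonymity by counting how many records share each generalized tuple. This is a minimal sketch with hypothetical records; the generalization rules (birth year only, 3-digit zip prefix) are illustrative choices.

```python
from collections import Counter

# Hypothetical records with two quasi-identifiers.
records = [
    {"birth": "1990-05-01", "zip": "77004"},
    {"birth": "1990-11-23", "zip": "77009"},
    {"birth": "1985-02-10", "zip": "77004"},
    {"birth": "1985-08-30", "zip": "77002"},
]

def generalize(rec):
    """Replace the exact birth date with the year only,
    and the zip code with a 3-digit prefix."""
    return (rec["birth"][:4], rec["zip"][:3] + "**")

def k_anonymity(recs):
    """Largest k such that every record is indistinguishable from
    at least k-1 others on the generalized quasi-identifiers."""
    counts = Counter(generalize(r) for r in recs)
    return min(counts.values())

print(k_anonymity(records))  # -> 2, i.e., the table is 2-anonymous
```

If the minimum group size were 1, some record would be unique and would need further generalization before release.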
58
Solution for Re-identification
59
General Data Protection Regulation
General Data Protection Regulation (GDPR): set of rules governing collection of
information from citizens of European Union
Requires companies to…
Disclose information they are seeking to collect
Disclose why they are collecting it
Get permission before collecting it
Responding to GDPR, most large American companies are adopting new privacy
guidelines
Web-site banners informing users, asking for consent
60
Ethical Considerations
61
Predictive Policing
Hypothesis: Criminals behave in a predictable way
Times of crimes fall into patterns
Some areas have higher incidence of crimes
Predictive policing: use of data mining to deploy police officers to areas where crimes
are more likely to occur
Police in Santa Cruz and Los Angeles saw significant declines in property crime
62
Facebook Beacon
2007: Facebook announced Beacon, a targeted advertising device
Facebook user makes purchase
Facebook broadcasts purchase to user’s friends
Based on opt-out policy: users enrolled unless explicitly asked to be excluded
A significant source of advertising revenue for Facebook
MoveOn.org led online campaign lobbying Facebook to switch to an opt-in policy
Mark Zuckerberg apologized, and Facebook switched to an opt-in policy
63
Google’s Personalized Search
Secondary use: Information collected for one purpose is used for another purpose
Google keeps track of your search queries and Web pages you have visited
It uses this information to infer your interests and determine which pages to
return
Example: “bass” could refer to fishing or music
64
Limiting Information Google Saves
You can limit amount of information Google saves about your activities
Privacy Checkup lets you pause collection of personal information
Search queries and other Google activity
Location information collected from signed-in devices
– Where you have gone
– How often you have gone there
– How long you have stayed
– Customary routes of travel
Contact and calendar information
Recordings of your voice and accompanying audio
YouTube search queries
YouTube videos you have watched
65
Secondary Uses of Information
66
Collaborative Filtering
Form of data mining
Analyze information about preferences of large number of people to predict what one
person may prefer
Explicit method: ask people to rank preferences
Implicit method: keep track of purchases
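The idea above can be sketched as user-based collaborative filtering: predict one person’s rating of an item as a similarity-weighted average of other users’ ratings. The ratings, users, and items here are hypothetical, and cosine similarity is just one common choice of similarity measure.

```python
import math

# Hypothetical explicit ratings (1-5 stars).
ratings = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob":   {"item1": 1, "item2": 4, "item3": 2},
    "carol": {"item1": 5, "item2": 3},
}

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        sim = cosine(ratings[user], r)
        num += sim * r[item]
        den += sim
    return num / den if den else None

print(round(predict("carol", "item3"), 2))  # -> 3.17
```

Carol’s prediction lands near Alice’s rating (4) rather than Bob’s (2), because Carol’s past ratings agree with Alice’s.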
67
Ownership of Transaction Information
Who controls transaction information?
Buyer?
Seller?
Both?
Opt-in: Consumer must explicitly give permission before the organization can share info
Opt-out: Organization can share info until consumer explicitly forbids it
Opt-in is a barrier for new businesses, so direct marketing organizations prefer opt-out
68
“Target”-ing Pregnant Women
Most people keep shopping at the same stores, but new parents have malleable
shopping habits
69
Credit Reports
Example of how information about customers can itself become a commodity
Credit bureaus
Keep track of an individual’s assets, debts, and history of paying bills and
repaying loans
Sell credit reports to banks, credit card companies, and other potential
lenders
System gives you more choices in where to borrow money
Poor credit can hurt employment prospects
70
Targeted Direct Mail
Businesses mail advertisements only to those most likely to purchase products
Data brokers provide customized mailing lists created from information gathered online
and offline
Example of making inferences for targeted direct mail
Shopping for clothes online + frequent fast-food dining + subscribing to premium
cable TV channels → more likely to be obese
Two shoppers visiting same site may pay different prices based on inferences about
their relative affluence
71
Microtargeting
Political campaigns determine voters most likely to support
particular candidates
Voter registration
Voting frequency
Consumer data
GIS data
Target direct mailings, emails, text messages, home visits to
most likely supporters
72
Social Network Analysis
Collect information from social networks to inform decisions
Bharti Airtel (India) offers special promotions to “influencers”
Police use Facebook and Twitter posts to deploy officers on big party nights
Banks combine social network data with credit reports to determine creditworthiness
73
Cambridge Analytica (1 of 3)
Robert Mercer’s vision: Use data analytics to help conservative candidates and causes
Mercer formed joint venture with SCL Group and invested $15 million in new firm:
Cambridge Analytica
SCL Group hired Aleksandr Kogan to gather data about American voters
Kogan created survey app: “thisisyourdigitallife”
Promoted survey using Amazon’s Mechanical Turk
Users paid $1 or $2 to take personality test
Users had to access app using Facebook Login
Users agreed that app would download information about them and their Facebook
friends
74
Cambridge Analytica (2 of 3)
Personal data collected from 270,000 people who took surveys and as many as 87 million
people who were on their friends’ lists
Kogan shared profiles with Cambridge Analytica
About 30 million profiles were detailed enough that Cambridge Analytica could combine data
with other data they had, creating psychographic profiles
Classified voters over five personality traits: openness, conscientiousness, extroversion,
agreeableness, neuroticism
Strategy: target ads based on psychographic profile
Ted Cruz campaign hired Cambridge Analytica to help with microtargeting
Value of advice debatable
Campaign staffers said predictions were bad
75
Cambridge Analytica (3 of 3)
Trump campaign hired Cambridge Analytica in fall 2016 – firm promised to provide names of
millions of voters likely to vote for Trump
“Data breach” story broke in spring 2018
Facebook response
– Not a breach – everyone who used Kogan’s app had granted their consent, and privacy
settings of their friends allowed their information to be shared
– Kogan had perpetrated a fraud by sharing data with Cambridge Analytica
– Suspended accounts of Kogan and Cambridge Analytica
Mark Zuckerberg was called to Washington, D.C., and testified for 10 hours in front of two
Congressional Committees
May 2018: Cambridge Analytica filed for bankruptcy
76
Ethical Considerations
77
Lethal autonomous weapons
The UN defines a lethal autonomous weapon as one that locates, selects, and
engages (i.e., kills) human targets without human supervision.
Autonomous weapons have been called the “third revolution in warfare” after
gunpowder and nuclear weapons. Their military potential is obvious.
Israel’s Harop missile is a “loitering munition” with a ten-foot wingspan and a fifty-pound
warhead. It searches for up to six hours in a given geographical region for any target that meets
a given criterion and then destroys it. The criterion could be “emits a radar signal resembling
antiaircraft radar” or “looks like a tank.”
78
Ethical Side
On the ethical side, some find it simply morally unacceptable to delegate the
decision to kill humans to a machine.
Germany’s ambassador in Geneva has stated that it “will not accept that the decision over life
and death is taken solely by an autonomous system”.
Gen. Paul Selva, at the time the second-ranking military officer in the United States, said in
2017, “I don’t think it’s reasonable for us to put robots in charge of whether or not we take a
human life.”
António Guterres, the head of the United Nations, stated in 2019 that “machines with the power
and discretion to take lives without human involvement are politically unacceptable, morally
repugnant and should be prohibited by international law.”
79
NGOs
More than 140 NGOs in over 60 countries are part of the Campaign to Stop Killer
Robots
The Future of Life Institute organized an open letter that was signed by over 4,000 AI
researchers and 22,000 others.
80
Counterarguments
81
The U.S. Stands
82
What would have happened if there had been no human in the loop…
83
Cyberattacks against autonomous weapons
84
Summary
85
Ethical Considerations
86
Surveillance and security
As of 2018, there were as many as 350 million surveillance cameras in China and
70 million in the United States.
Surveillance is spreading even in low-tech countries, and some governments with
reputations for mistreating their citizens use it to disproportionately target
marginalized communities.
As we interact with computers for increasing amounts of our daily lives, more data
on us is being collected by governments and corporations. Data collectors have a
moral and legal responsibility to be good stewards of the data they hold.
87
Closed-Circuit Television Cameras
First use in Olean, New York in 1968
Now more than 30 million cameras in the U.S.
New York City’s effort in lower Manhattan
$201 million for 3,000 new cameras
License plate readers
Radiation detectors
Effectiveness of cameras debated
88
Boston Marathon Bombing Suspects
After the Boston Marathon bombing, images from surveillance cameras played an important role in the apprehension of the
suspects. (FBI/Law Enforcement Bulletin)
89
Machine Learning for Cybersecurity
Machine learning can be a powerful tool for both sides in the cybersecurity battle.
Attackers can use automation to probe for insecurities and they can apply reinforcement
learning for phishing attempts and automated blackmail.
Defenders can use unsupervised learning to detect anomalous incoming traffic patterns and
various machine learning techniques to detect fraud.
One forecast puts the market for machine learning in cybersecurity at about $100
billion by 2021.
90
Ethical Considerations
91
Fairness and bias (27.3.3)
Example: a system may pick up the racial or gender prejudices of human judges
from the examples in the training set.
92
How can we defend against these biases?
1. Understand the limits of the data you are using. It has been suggested that data sets
should come with annotations: declarations of provenance, security, conformity, and
fitness for use.
2. De-bias the data. We could over-sample from minority classes to defend against
sample size disparity.
Techniques such as SMOTE (the synthetic minority over-sampling technique) and ADASYN (the adaptive
synthetic sampling approach for imbalanced learning) provide principled ways of oversampling.
3. Invent new machine learning models and algorithms that are more resistant to bias.
A final idea is to let a system make initial recommendations that may be biased,
but then train a second system to de-bias the recommendations of the first one.
The IBM AI Fairness 360 system provides a framework for all of these ideas. There will be
increased use of tools like this in the future.
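Defense 2 above (de-biasing the data by over-sampling minority classes) can be sketched without any library. This is plain random over-sampling with replacement on hypothetical data; real SMOTE goes further and interpolates new synthetic minority points between neighbors, for which the imbalanced-learn package is a common choice.

```python
import random

def oversample(dataset, label_key="label", seed=0):
    """Duplicate minority-class examples (with replacement) until
    every class has as many examples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for ex in dataset:
        by_class.setdefault(ex[label_key], []).append(ex)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for examples in by_class.values():
        balanced.extend(examples)
        # Resample existing minority examples to close the gap.
        balanced.extend(rng.choices(examples, k=target - len(examples)))
    return balanced

# Hypothetical imbalanced training set: 8 majority vs. 2 minority examples.
data = [{"x": i, "label": "majority"} for i in range(8)] + \
       [{"x": 100 + i, "label": "minority"} for i in range(2)]
balanced = oversample(data)
print(sum(1 for ex in balanced if ex["label"] == "minority"))  # -> 8
```

After balancing, a learner no longer minimizes its loss by simply ignoring the minority class, which is the sample-size-disparity failure the slide describes.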
93
How do you make sure that the systems you build will be fair?
A set of best practices has been emerging (although they are not always followed).
94
Ethical Considerations
95
Trust
96
A Possible Solution: Certification
97
Transparency
Another aspect of trust is transparency: consumers want to know what is going on inside a
system, and that the system is not working against them, whether due to intentional malice, an
unintentional bug, or pervasive societal bias that is recapitulated by the system.
Example: When an AI system turns you down for a loan, you deserve an explanation.
An AI system that can explain itself is called explainable AI (XAI).
98
Counterargument for explanation
An explanation about one case does not give you a summary over other cases.
If the bank explains, “Sorry, you didn’t get the loan because you have a history of
previous financial problems,” you don’t know if that explanation is accurate or if
the bank is secretly biased against you for some reason.
In this case, you require not just an explanation, but also an audit of past
decisions, with aggregated statistics across various demographic groups, to see if
their approval rates are balanced.
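The audit described above amounts to aggregating decisions per demographic group rather than explaining a single case. A minimal sketch, using a hypothetical decision log and group labels:

```python
from collections import defaultdict

# Hypothetical log of loan decisions with a demographic group label.
decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

def approval_rates(log):
    """Fraction of applications approved, per demographic group."""
    totals, approved = defaultdict(int), defaultdict(int)
    for d in log:
        totals[d["group"]] += 1
        approved[d["group"]] += d["approved"]  # True counts as 1
    return {g: approved[g] / totals[g] for g in totals}

print(approval_rates(decisions))
```

Here group A is approved about twice as often as group B; a large gap like this in real aggregated statistics is what would trigger a closer look at the bank’s decision process.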
99
Bot or Human
100
Ethical Considerations
101
The future of work
PwC (Rao and Verweij, 2017) predicts that AI will contribute $15 trillion annually to
global GDP by 2030.
The healthcare and automotive/transportation industries stand to gain the most in the short
term.
The mainstream economic view for most of the 20th century: technological
unemployment was at most a short-term phenomenon. Increased productivity would
always lead to increased wealth and increased demand, and thus net job growth
(e.g., bank tellers, next slide)
102
Bank tellers Example
ATMs replaced humans in the job of counting out cash for withdrawals, but that
made it cheaper to operate a bank branch, so the number of branches increased:
1. leading to more bank employees overall.
2. The nature of the work also changed, becoming less routine and requiring more advanced
business skills.
The net effect of automation seems to be eliminating tasks rather than jobs.
The majority of commentators predict that the same will hold true with AI
technology, at least in the short run.
103
Things will be different this time around…
But some analysts think that this time around, things will be different.
In 2019, IBM predicted that 120 million workers would need retraining due to
automation by 2022, and
Oxford Economics predicted that 20 million manufacturing jobs could be lost to
automation by 2030.
Frey and Osborne (2017) survey 702 different occupations, and estimate that
47% of them are at risk of being automated, meaning that at least some of the
tasks in the occupation can be performed by machine.
For example, almost 3% of the workforce in the U.S. are vehicle drivers, and in some districts,
as much as 15% of the male workforce are drivers. The task of driving is likely to be eliminated
by driverless cars/trucks/buses/taxis.
104
Automation in Occupation or Task
McKinsey estimates that only 5% of occupations are fully automatable, but that
60% of occupations can have about 30% of their tasks automated.
Examples:
Truck Drivers
Radiologists
105
Example 1: Truck Drivers
107
Other AI-Applications
Care for the elderly: In developed countries, in 2015, there were fewer than 30
retirees per 100 workers; by 2050 there may be over 60 per 100 workers. Care for
the elderly will be an increasingly important role, one that can partially be filled by
AI.
Farming industry: In 1900, over 40% of the U.S. workforce was in agriculture, but
by 2000 that had fallen to 2%. That is a huge disruption in the way we work, but it
happened over a period of 100 years, and thus across generations, not in the
lifetime of one worker.
108
Negative effect: Winner-Take-All Society
Example:
If farmer Ali is 10% better than farmer Bo, then Ali gets about 10% more income:
Ali can charge slightly more for superior goods, but there is a limit on how much
can be produced on the land, and how far it can be shipped.
But if software app developer Cary is 10% better than Dana, it may be that Cary
ends up with 99% of the global market.
109
Summary
110
Ethical Considerations
111
Robot rights
If robots have no consciousness, no qualia, then few would argue that they deserve
rights.
But if robots can feel pain, if they can dread death, if they are considered
“persons,” then the argument can be made that they have rights and deserve to
have their rights recognized.
112
Robot Rights
If robots have rights, then they should not be enslaved, and there is a question of
whether reprogramming them would be a kind of enslavement.
Another ethical issue involves voting rights: a rich person could buy thousands of
robots and program them to cast thousands of votes—should those votes count? If
a robot clones itself, can they both vote?
113
Avoiding the Dilemmas
Ernie Davis argues for avoiding the dilemmas of robot consciousness by never
building robots that could possibly be considered conscious.
114
Ethical Considerations
115
AI Safety
Almost any technology has the potential to cause harm in the wrong hands, but
with AI and robotics, the hands might be operating on their own.
It would be unethical to distribute an unsafe AI agent. We require our agents to
avoid accidents, to be resistant to adversarial attacks and malicious abuse, and in
general to cause benefits, not harms.
That is especially true as AI agents are deployed in safety-critical applications,
such as driving cars, controlling robots in dangerous factory or construction
settings, and making life-or-death medical decisions.
116
Data Science and AI Need Ethics
• While Data Science and AI can help in so many ways, they can hurt too.
• By developing a shared sense of ethical values we can reap benefits while
minimizing harms.
117
Debate Topic
Problem statement: Was Edward Snowden, the person who leaked information
about US surveillance programs, a hero or a traitor?
118
References
Online
119
Slides presented are Intellectual Property of Dr. Dvijesh Shastri and usage rights belong to him.
120