Course Agenda
Week Day Theory Topic Hands-on Lab
2
Today’s Agenda (Day-3)
3
What is Ethics?
Ethics are shared values that help us distinguish right from wrong.
Ethics are the basis for the rules we all voluntarily choose to follow because that
makes the world a better place for all of us.
Ethics flow from shared values, which could be on account of religion, or not.
Ethics guide the creation of laws, so the two are often in consonance.
Not everyone will be ethical, even when there are shared values
– That is why people go to jail for theft
There is tremendous excitement about Data Science precisely because of the many
ways in which it provides us with a “better” way to do something.
But there are possible undesired consequences, for privacy, fairness, etc.
Dvijesh Shastri
13
The positive aspects
14
The positive aspects
15
The positive aspects
AI applications in crop management and food production help feed the world.
https://www.youtube.com/watch?v=-YCa8RntsRE&t=17s
Optimization of business processes using machine learning will make businesses more
productive, increasing wealth and providing more employment.
https://youtu.be/kbPRrHjji3g
Automation can replace the tedious and dangerous tasks that many workers face, and
free them to concentrate on more interesting aspects.
https://www.youtube.com/watch?v=Rc8QP-0zUZE
16
The positive aspects
People with disabilities will benefit from AI-based assistance in seeing, hearing,
and mobility.
17
Data Science and AI Need Ethics
• While Data Science and AI can help in so many ways, they can hurt too.
18
Ethical Considerations
19
Ethical Considerations
20
Who generates Big Data?
• Machines
• People
• Organizations
21
22
Who generates Big Data?
• Machines
• People
• Organizations
23
Machine-Generated Data
24
Sensor Data
25
Who generates Big Data?
• Machines
• People
• Organizations
26
People-Generated Data
27
Huge Volume and High Velocity
A recent report by DOMO estimates the amount of
data generated every minute on popular online
platforms:
http://www.internetlivestats.com/one-second/
28
Companies analyze 12 TB of Twitter data every day to measure “sentiment”
around their products
29
KB (10^3) MB (10^6) GB (10^9) TB (10^12) PB (10^15) EB (10^18) ZB (10^21) YB (10^24)
30
Who generates Big Data?
• Machines
• People
• Organizations
31
Organization-Generated Data
Healthcare data collected in electronic
health record (EHR) systems
Government Data
32
Why?
Fraud detection
Meet Competitive Pressure
Provide better, customized
services for an edge (e.g. in
Customer Relationship
Management)
33
Some Stats
34
Data Gathering and Privacy Implications
35
Facebook Tags
Tag: Label identifying a person in a photo
Facebook allows users to tag people who are on their list of friends
About 100 million tags added per day in Facebook
Facebook uses facial recognition to suggest name of friend appearing in photo
Does this feature increase risk of improper tagging?
36
Facebook Login
Allows people to login to Web sites or apps using their Facebook credentials
App’s developer has permission to access information from person’s Facebook
profile: name, location, email address, and friends list
37
Malls Track Shoppers’ Cell Phones
In 2011 two malls recorded movements of shoppers by tracking locations of cell
phones
How much time people spend in each store?
Do people who shop at X also shop at Y?
Are there unpopular areas of mall?
Small signs informed shoppers of study
After protest, mall quickly halted study
38
iPhone Apps Upload Address Books
In 2012 a programmer discovered Path was uploading
iPhone address books without permission
40
Enhanced 911 Services
Cell phone providers in the United States are required to track locations of active cell phones
to within 100 meters
41
Rewards or Loyalty Programs
Shoppers who belong to store’s rewards program can save money on many of
their purchases
42
Body Scanners (1 of 2)
Some department stores have 3-D body scanners
43
Advanced Imaging Technology (AIT) Scanners
Transportation Security Administration began installing AIT scanners in 2007
AIT scanners revealed anatomical features
Electronic Privacy Information Center sued government in 2010, saying systems
violated 4th Amendment and various laws
TSA announced it would develop new software that would replace passenger-specific images with generic outlines
All body scanners producing passenger-specific images were removed in 2013
44
Advanced Imaging Technology Scanner
When the first advanced imaging technology scanners were deployed in American airports, they revealed anatomical features
in great detail. (Paul Ellis/AFP/Getty Images)
45
RFID Tags
RFID: Radio frequency identification
An RFID tag is a tiny wireless transmitter
Manufacturers are replacing bar codes with RFID
tags
Contain more information
Can be scanned more easily
If tag cannot be removed or disabled, it becomes
a tracking device
Employees take inventory more quickly and make fewer errors when items
are marked with RFID tags. (Marc F. Henning/Alamy)
46
Implanted Chips
Taiwan: Every domesticated dog must have an implanted microchip
Size of a grain of rice; implanted into ear
Chip contains name, address of owner
Allows lost dogs to be returned to owners
RFID tags approved for use in humans
Can be used to store medical information
Can be used as a “debit card”
47
Mobile Apps
Many apps on Android smartphones and iPhones collect location information and sell
it to advertisers and data brokers
Angry Birds
Brightest Flashlight
Flurry: a company specializing in analyzing data collected from mobile apps
Has access to data from > 500,000 apps
48
OnStar
OnStar manufactures a communication system incorporated into the rear-view mirror
Emergency, security, navigation, and diagnostics services provided to subscribers
Two-way communication and GPS
Automatic communication when airbags deploy
Service center can even disable gas pedal
49
Automobile “Black Boxes”
Modern automobiles come equipped with a “black box”
Maintains data for five seconds:
Speed of car
Amount of pressure being put on brake pedal
Seat belt status
After an accident, investigators can retrieve and gather information from “black box”
50
Medical Records
Advantages of changing from paper-based to electronic medical records
Quicker and cheaper for information to be shared among caregivers
Lower medical costs
Improve quality of medical care
Once information is in a database, it is more difficult to control how it is disseminated
51
Digital Video Recorders
TiVo service allows subscribers to record programs and watch them later
TiVo collects detailed information about viewing habits of its subscribers
Data collected second by second, making it valuable to advertisers and others
interested in knowing viewing habits
52
Cookies
Cookie: File placed on computer’s hard drive by a Web server
Contains information about visits to a Web site
Allows Web sites to provide personalized services
Put on hard drive without user’s permission
You can set Web browser to alert you to new cookies or to block cookies entirely
53
Privacy
Balanced against the individual’s right to privacy is the value that society gains
from sharing data.
54
The Problem of Re-identification
The problem is that the shared de-identified data may be subject to re-identification.
Examples:
If the data strips out the name, social security number, and street
address, but includes date of birth, gender, and zip code, then, as
shown by Latanya Sweeney (2000), 87% of the U.S. population can
be uniquely re-identified.
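Sweeney’s result can be illustrated with a linkage attack: join a “de-identified” table to a public dataset on the shared quasi-identifiers. This is a minimal sketch with entirely hypothetical records and names; the field names and tables are illustrative, not from any real dataset.

```python
# Hypothetical data: a "de-identified" medical table and a public voter roll.
deidentified_medical = [
    {"dob": "1965-03-14", "gender": "F", "zip": "02138", "diagnosis": "flu"},
    {"dob": "1971-07-02", "gender": "M", "zip": "02139", "diagnosis": "asthma"},
]
public_voter_roll = [
    {"name": "A. Smith", "dob": "1965-03-14", "gender": "F", "zip": "02138"},
    {"name": "B. Jones", "dob": "1980-01-01", "gender": "M", "zip": "02140"},
]

# The quasi-identifiers Sweeney studied: date of birth, gender, zip code.
QUASI_IDS = ("dob", "gender", "zip")

def reidentify(medical, voters):
    """Return (name, diagnosis) pairs where a medical record matches
    exactly one voter on all quasi-identifiers."""
    matches = []
    for rec in medical:
        key = tuple(rec[q] for q in QUASI_IDS)
        hits = [v for v in voters if tuple(v[q] for q in QUASI_IDS) == key]
        if len(hits) == 1:  # a unique match re-identifies the record
            matches.append((hits[0]["name"], rec["diagnosis"]))
    return matches

print(reidentify(deidentified_medical, public_voter_roll))
# -> [('A. Smith', 'flu')]
```

Even though the medical table contains no names, the unique combination of the three quasi-identifiers links the diagnosis back to a named individual.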
55
Netflix Prize
Netflix offered $1 million prize to any group that could come up with a significantly better
algorithm for predicting user ratings (2006)
Released more than 100 million movie ratings from a half million customers
Stripped ratings of private information
Researchers demonstrated that ratings not truly anonymous if a little more information from
individuals was available
U.S. Federal Trade Commission complaint and lawsuit
Netflix canceled sequel to Netflix Prize (2010)
56
AOL Search Dataset
AOL researcher Dr. Chowdhury posted three months’ worth of user queries from
650,000 users (2006)
No names used; random integers used to label all queries from particular users
Researchers identified some users from queries; e.g., many people performed
searches on their own names
New York Times investigation led to public outcry
AOL took down the dataset, but it had already been copied and reposted
AOL fired Dr. Chowdhury and his supervisor
57
Solution for Re-identification
Generalizing fields: Replacing the exact birth date with just the year of birth, or a
broader range like “20-30 years old.” Deleting a field altogether can be seen as a
form of generalizing to “any”.
k-anonymity: a database is k-anonymized if every record in the database is
indistinguishable from at least k−1 other records. If there are records that are
more unique than this, they would have to be further generalized.
Aggregate querying: An API for queries against the database is provided, and
valid queries receive a response that summarizes the data with a count or
average (e.g., for each zip code, the percentage of people with cancer).
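The first two defenses above can be sketched together: generalize the quasi-identifiers, then check k-anonymity by counting how many records share each generalized tuple. This is a minimal sketch with hypothetical records; the generalization rules (birth year only, 3-digit zip prefix) are illustrative choices.

```python
from collections import Counter

# Hypothetical records with two quasi-identifiers.
records = [
    {"birth": "1990-05-01", "zip": "77004"},
    {"birth": "1990-11-23", "zip": "77009"},
    {"birth": "1985-02-10", "zip": "77004"},
    {"birth": "1985-08-30", "zip": "77002"},
]

def generalize(rec):
    """Replace the exact birth date with the year only,
    and the zip code with a 3-digit prefix."""
    return (rec["birth"][:4], rec["zip"][:3] + "**")

def k_anonymity(recs):
    """Largest k such that every record is indistinguishable from
    at least k-1 others on the generalized quasi-identifiers."""
    counts = Counter(generalize(r) for r in recs)
    return min(counts.values())

print(k_anonymity(records))  # -> 2, i.e., the table is 2-anonymous
```

If the minimum group size were 1, some record would be unique and would need further generalization before release.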
58
Solution for Re-identification
59
General Data Protection Regulation
General Data Protection Regulation (GDPR): set of rules governing collection of
information from citizens of European Union
Requires companies to…
Disclose information they are seeking to collect
Disclose why they are collecting it
Get permission before collecting it
Responding to GDPR, most large American companies are adopting new privacy
guidelines
Web-site banners informing users, asking for consent
60
Ethical Considerations
61
Predictive Policing
Hypothesis: Criminals behave in a predictable way
Times of crimes fall into patterns
Some areas have higher incidence of crimes
Predictive policing: use of data mining to deploy police officers to areas where crimes
are more likely to occur
Police in Santa Cruz and Los Angeles saw significant declines in property crime
62
Facebook Beacon
2007: Facebook announced Beacon, a targeted advertising device
Facebook user makes purchase
Facebook broadcasts purchase to user’s friends
Based on opt-out policy: users enrolled unless explicitly asked to be excluded
A significant source of advertising revenue for Facebook
MoveOn.org led online campaign lobbying Facebook to switch to an opt-in policy
Mark Zuckerberg apologized, and Facebook switched to an opt-in policy
63
Google’s Personalized Search
Secondary use: Information collected for one purpose is used for another purpose
Google keeps track of your search queries and Web pages you have visited
It uses this information to infer your interests and determine which pages to
return
Example: “bass” could refer to fishing or music
64
Limiting Information Google Saves
You can limit amount of information Google saves about your activities
Privacy Checkup lets you pause collection of personal information
Search queries and other Google activity
Location information collected from signed-in devices
– Where you have gone
– How often you have gone there
– How long you have stayed
– Customary routes of travel
Contact and calendar information
Recordings of your voice and accompanying audio
YouTube search queries
YouTube videos you have watched
65
Secondary Uses of Information
66
Collaborative Filtering
Form of data mining
Analyze information about preferences of large number of people to predict what one
person may prefer
Explicit method: ask people to rank preferences
Implicit method: keep track of purchases
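The idea above can be sketched as user-based collaborative filtering: predict one person’s rating of an item as a similarity-weighted average of other users’ ratings. The ratings, users, and items here are hypothetical, and cosine similarity is just one common choice of similarity measure.

```python
import math

# Hypothetical explicit ratings (1-5 stars).
ratings = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob":   {"item1": 1, "item2": 4, "item3": 2},
    "carol": {"item1": 5, "item2": 3},
}

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        sim = cosine(ratings[user], r)
        num += sim * r[item]
        den += sim
    return num / den if den else None

print(round(predict("carol", "item3"), 2))  # -> 3.17
```

Carol’s prediction lands near Alice’s rating (4) rather than Bob’s (2), because Carol’s past ratings agree with Alice’s.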
67
Ownership of Transaction Information
Who controls transaction information?
Buyer?
Seller?
Both?
Opt-in: Consumer must explicitly give permission before the organization can share info
Opt-out: Organization can share info until consumer explicitly forbids it
Opt-in is a barrier for new businesses, so direct marketing organizations prefer opt-out
68
“Target”-ing Pregnant Women
Most people keep shopping at the same stores, but new parents have malleable
shopping habits
69
Credit Reports
Example of how information about customers can itself become a commodity
Credit bureaus
Keep track of an individual’s assets, debts, and history of paying bills and
repaying loans
Sell credit reports to banks, credit card companies, and other potential
lenders
System gives you more choices in where to borrow money
Poor credit can hurt employment prospects
70
Targeted Direct Mail
Businesses mail advertisements only to those most likely to purchase products
Data brokers provide customized mailing lists created from information gathered online
and offline
Example of making inferences for targeted direct mail
Shopping for clothes online + frequent fast-food dining + subscribing to premium
cable TV channels → more likely to be obese
Two shoppers visiting same site may pay different prices based on inferences about
their relative affluence
71
Microtargeting
Political campaigns determine voters most likely to support
particular candidates
Voter registration
Voting frequency
Consumer data
GIS data
Target direct mailings, emails, text messages, home visits to
most likely supporters
72
Social Network Analysis
Collect information from social networks to inform decisions
Bharti Airtel (India) offers special promotions to “influencers”
Police use Facebook and Twitter posts to deploy officers on big party nights
Banks combine social network data with credit reports to determine creditworthiness
73
Cambridge Analytica (1 of 3)
Robert Mercer’s vision: Use data analytics to help conservative candidates and causes
Mercer formed joint venture with SCL Group and invested $15 million in new firm:
Cambridge Analytica
SCL Group hired Aleksandr Kogan to gather data about American voters
Kogan created survey app: “thisisyourdigitallife”
Promoted survey using Amazon’s Mechanical Turk
Users paid $1 or $2 to take personality test
Users had to access app using Facebook Login
Users agreed that app would download information about them and their Facebook
friends
74
Cambridge Analytica (2 of 3)
Personal data collected from 270,000 people who took surveys and as many as 87 million
people who were on their friends’ lists
Kogan shared profiles with Cambridge Analytica
About 30 million profiles were detailed enough that Cambridge Analytica could combine data
with other data they had, creating psychographic profiles
Classified voters over five personality traits: openness, conscientiousness, extroversion,
agreeableness, neuroticism
Strategy: target ads based on psychographic profile
Ted Cruz campaign hired Cambridge Analytica to help with microtargeting
Value of advice debatable
Campaign staffers said predictions were bad
75
Cambridge Analytica (3 of 3)
Trump campaign hired Cambridge Analytica in fall 2016 – firm promised to provide names of
millions of voters likely to vote for Trump
“Data breach” story broke in spring 2018
Facebook response
– Not a breach – everyone who used Kogan’s app had granted their consent, and privacy
settings of their friends allowed their information to be shared
– Kogan had perpetrated a fraud by sharing data with Cambridge Analytica
– Suspended accounts of Kogan and Cambridge Analytica
Mark Zuckerberg was called to Washington, D.C., and testified for 10 hours in front of two
Congressional Committees
May 2018: Cambridge Analytica filed for bankruptcy
76
Ethical Considerations
77
Lethal autonomous weapons
The UN defines a lethal autonomous weapon as one that locates, selects, and
engages (i.e., kills) human targets without human supervision.
Autonomous weapons have been called the “third revolution in warfare” after
gunpowder and nuclear weapons. Their military potential is obvious.
Israel’s Harop missile is a “loitering munition” with a ten-foot wingspan and a fifty-pound
warhead. It searches for up to six hours in a given geographical region for any target that meets
a given criterion and then destroys it. The criterion could be “emits a radar signal resembling
antiaircraft radar” or “looks like a tank.”
78
Ethical Side
On the ethical side, some find it simply morally unacceptable to delegate the
decision to kill humans to a machine.
Germany’s ambassador in Geneva has stated that it “will not accept that the decision over life
and death is taken solely by an autonomous system”.
Gen. Paul Selva, at the time the second-ranking military officer in the United States, said in
2017, “I don’t think it’s reasonable for us to put robots in charge of whether or not we take a
human life.”
António Guterres, the head of the United Nations, stated in 2019 that “machines with the power
and discretion to take lives without human involvement are politically unacceptable, morally
repugnant and should be prohibited by international law.”
79
NGOs
More than 140 NGOs in over 60 countries are part of the Campaign to Stop Killer
Robots
The Future of Life Institute organized an open letter that was signed by over 4,000 AI
researchers and 22,000 others.
80
Counterarguments
81
The U.S. Stands
82
What would have happened if there had been no human in the loop…
83
Cyberattacks against autonomous weapons
84
Summary
85
Ethical Considerations
86
Surveillance and security
As of 2018, there were as many as 350 million surveillance cameras in China and
70 million in the United States.
Surveillance is spreading even in low-tech countries, and some governments with
reputations for mistreating their citizens use it to disproportionately target
marginalized communities.
As we interact with computers for increasing amounts of our daily lives, more data
on us is being collected by governments and corporations. Data collectors have a
moral and legal responsibility to be good stewards of the data they hold.
87
Closed-Circuit Television Cameras
First use in Olean, New York in 1968
Now more than 30 million cameras in the U.S.
New York City’s effort in lower Manhattan
$201 million for 3,000 new cameras
License plate readers
Radiation detectors
Effectiveness of cameras debated
88
Boston Marathon Bombing Suspects
After the Boston Marathon bombing, images from surveillance cameras played an important role in the apprehension of the
suspects. (FBI/Law Enforcement Bulletin)
89
Machine Learning for Cybersecurity
Machine learning can be a powerful tool for both sides in the cybersecurity battle.
Attackers can use automation to probe for insecurities and they can apply reinforcement
learning for phishing attempts and automated blackmail.
Defenders can use unsupervised learning to detect anomalous incoming traffic patterns and
various machine learning techniques to detect fraud.
One forecast puts the market for machine learning in cybersecurity at about $100
billion by 2021.
90
Ethical Considerations
91
Fairness and bias (27.3.3)
Example: a system may pick up the racial or gender prejudices of human judges
from the examples in the training set.
92
How can we defend against these biases?
1. Understand the limits of the data you are using. It has been suggested that data sets
should come with annotations: declarations of provenance, security, conformity, and
fitness for use.
2. De-bias the data. We could over-sample from minority classes to defend against
sample size disparity.
Techniques such as SMOTE (the synthetic minority over-sampling technique) and ADASYN (the adaptive
synthetic sampling approach for imbalanced learning) provide principled ways of oversampling.
3. Invent new machine learning models and algorithms that are more resistant to bias.
A final idea is to let a system make initial recommendations that may be biased,
but then train a second system to de-bias the recommendations of the first one.
The IBM AI Fairness 360 system provides a framework for all of these ideas. There will be
increased use of tools like this in the future.
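Defense 2 above (de-biasing the data by over-sampling minority classes) can be sketched without any library. This is plain random over-sampling with replacement on hypothetical data; real SMOTE goes further and interpolates new synthetic minority points between neighbors, for which the imbalanced-learn package is a common choice.

```python
import random

def oversample(dataset, label_key="label", seed=0):
    """Duplicate minority-class examples (with replacement) until
    every class has as many examples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for ex in dataset:
        by_class.setdefault(ex[label_key], []).append(ex)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for examples in by_class.values():
        balanced.extend(examples)
        # Resample existing minority examples to close the gap.
        balanced.extend(rng.choices(examples, k=target - len(examples)))
    return balanced

# Hypothetical imbalanced training set: 8 majority vs. 2 minority examples.
data = [{"x": i, "label": "majority"} for i in range(8)] + \
       [{"x": 100 + i, "label": "minority"} for i in range(2)]
balanced = oversample(data)
print(sum(1 for ex in balanced if ex["label"] == "minority"))  # -> 8
```

After balancing, a learner no longer minimizes its loss by simply ignoring the minority class, which is the sample-size-disparity failure the slide describes.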
93
How do you make sure that the systems you build will be fair?
A set of best practices has been emerging (although they are not always followed).
94
Ethical Considerations
95
Trust
96
A Possible Solution: Certification
97
Transparency
Another aspect of trust is transparency: consumers want to know what is going on inside a
system, and that the system is not working against them, whether due to intentional malice, an
unintentional bug, or pervasive societal bias that is recapitulated by the system.
Example: When an AI system turns you down for a loan, you deserve an explanation.
An AI system that can explain itself is called explainable AI (XAI).
98
Counterargument for explanation
An explanation about one case does not give you a summary over other cases.
If the bank explains, “Sorry, you didn’t get the loan because you have a history of
previous financial problems,” you don’t know if that explanation is accurate or if
the bank is secretly biased against you for some reason.
In this case, you require not just an explanation, but also an audit of past
decisions, with aggregated statistics across various demographic groups, to see if
their approval rates are balanced.
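The audit described above amounts to aggregating decisions per demographic group rather than explaining a single case. A minimal sketch, using a hypothetical decision log and group labels:

```python
from collections import defaultdict

# Hypothetical log of loan decisions with a demographic group label.
decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

def approval_rates(log):
    """Fraction of applications approved, per demographic group."""
    totals, approved = defaultdict(int), defaultdict(int)
    for d in log:
        totals[d["group"]] += 1
        approved[d["group"]] += d["approved"]  # True counts as 1
    return {g: approved[g] / totals[g] for g in totals}

print(approval_rates(decisions))
```

Here group A is approved about twice as often as group B; a large gap like this in real aggregated statistics is what would trigger a closer look at the bank’s decision process.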
99
Bot or Human
100
Ethical Considerations
101
The future of work
PwC (Rao and Verweij, 2017) predicts that AI will contribute $15 trillion annually to
global GDP by 2030.
The healthcare and automotive/transportation industries stand to gain the most in the short
term.
The mainstream economic view for most of the 20th century: technological
unemployment was at most a short-term phenomenon. Increased productivity would
always lead to increased wealth and increased demand, and thus net job growth
(e.g., bank tellers, next slide)
102
Bank tellers Example
ATMs replaced humans in the job of counting out cash for withdrawals, but that
made it cheaper to operate a bank branch, so the number of branches increased:
1. leading to more bank employees overall.
2. The nature of the work also changed, becoming less routine and requiring more advanced
business skills.
The net effect of automation seems to be eliminating tasks rather than jobs.
The majority of commentators predict that the same will hold true with AI
technology, at least in the short run.
103
Things will be different this time around…
But some analysts think that this time around, things will be different.
In 2019, IBM predicted that 120 million workers would need retraining due to
automation by 2022, and
Oxford Economics predicted that 20 million manufacturing jobs could be lost to
automation by 2030.
Frey and Osborne (2017) survey 702 different occupations, and estimate that
47% of them are at risk of being automated, meaning that at least some of the
tasks in the occupation can be performed by machine.
For example, almost 3% of the workforce in the U.S. are vehicle drivers, and in some districts,
as much as 15% of the male workforce are drivers. The task of driving is likely to be eliminated
by driverless cars/trucks/buses/taxis.
104
Automation in Occupation or Task
McKinsey estimates that only 5% of occupations are fully automatable, but that
60% of occupations can have about 30% of their tasks automated.
Examples:
Truck Drivers
Radiologists
105
Example 1: Truck Drivers
107
Other AI-Applications
Care for the elderly: In developed countries, in 2015, there were fewer than 30
retirees per 100 workers; by 2050 there may be over 60 per 100 workers. Care for
the elderly will be an increasingly important role, one that can partially be filled by
AI.
Farming industry: In 1900, over 40% of the U.S. workforce was in agriculture, but
by 2000 that had fallen to 2%. That is a huge disruption in the way we work, but it
happened over a period of 100 years, and thus across generations, not in the
lifetime of one worker.
108
Negative effect: Winner-Take-All Society
Example:
If farmer Ali is 10% better than farmer Bo, then Ali gets about 10% more income:
Ali can charge slightly more for superior goods, but there is a limit on how much
can be produced on the land, and how far it can be shipped.
But if software app developer Cary is 10% better than Dana, it may be that Cary
ends up with 99% of the global market.
109
Summary
110
Ethical Considerations
111
Robot rights
If robots have no consciousness, no qualia, then few would argue that they deserve
rights.
But if robots can feel pain, if they can dread death, if they are considered
“persons,” then the argument can be made that they have rights and deserve to
have their rights recognized.
112
Robot Rights
If robots have rights, then they should not be enslaved, and there is a question of
whether reprogramming them would be a kind of enslavement.
Another ethical issue involves voting rights: a rich person could buy thousands of
robots and program them to cast thousands of votes—should those votes count? If
a robot clones itself, can they both vote?
113
Avoiding the Dilemmas
Ernie Davis argues for avoiding the dilemmas of robot consciousness by never
building robots that could possibly be considered conscious.
114
Ethical Considerations
115
AI Safety
Almost any technology has the potential to cause harm in the wrong hands, but
with AI and robotics, the hands might be operating on their own.
It would be unethical to distribute an unsafe AI agent. We require our agents to
avoid accidents, to be resistant to adversarial attacks and malicious abuse, and in
general to cause benefits, not harms.
That is especially true as AI agents are deployed in safety-critical applications,
such as driving cars, controlling robots in dangerous factory or construction
settings, and making life-or-death medical decisions.
116
Data Science and AI Need Ethics
• While Data Science and AI can help in so many ways, they can hurt too.
• By developing a shared sense of ethical values we can reap benefits while
minimizing harms.
117
Debate Topic
Problem statement: Was Edward Snowden, the person who leaked information
about US surveillance programs, a hero or a traitor?
118
References
Online
119
Slides presented are Intellectual Property of Dr. Dvijesh Shastri and usage rights belong to him.
120