Red Act Text Document Using Python

This Python code extracts text from a PDF file, redacts personally identifiable information (PII) using regular expressions, and saves the redacted text to a new PDF file. It redacts phone numbers, email addresses, organization names, and employee names by replacing them with generic placeholders. Detected phone numbers, organization names, and employee names are also logged to a CSV file, one row per entity with its type.

Uploaded by veejay78
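
The redaction and logging steps described above can also be done in a single pass, so that every replaced match is recorded with its type. The following is only a minimal sketch, not the uploader's method: the `RULES` table, the `redact_and_log` helper, and the sample sentence are hypothetical, and email addresses are handled first (an assumption) so that digits inside an address are not logged as a separate phone number.

```python
import csv
import io
import re

# Hypothetical rule table: (entity type, compiled pattern, placeholder).
RULES = [
    ("Email", re.compile(r"\S+@\S+\.\S+"), "yyyyyy"),
    ("Phone no", re.compile(r"\b\d{10}\b"), "xxxxxx"),
    ("Organization",
     re.compile(r"\b(?:Google|Microsoft|Apple)\b", re.IGNORECASE),
     "ORGANIZATION_NAME"),
    ("Employee",
     re.compile(r"\b(?:Jane|John|Mary)\b", re.IGNORECASE),
     "EMPLOYEE_NAME"),
]


def redact_and_log(text):
    """Return (redacted_text, log); log holds one (match, type) per replacement."""
    log = []
    for entity_type, pattern, placeholder in RULES:
        # Default arguments bind the current rule to this callback.
        def replace(match, entity_type=entity_type, placeholder=placeholder):
            log.append((match.group(0), entity_type))
            return placeholder

        # re.sub calls `replace` once per match and uses its return value.
        text = pattern.sub(replace, text)
    return text, log


# Hypothetical sample text, standing in for the extracted PDF content.
sample = "Call John at 9876543210 or mail john.doe@example.com about the Google deal."
redacted, log = redact_and_log(sample)

# Write the log in the same "Entity", "Type" layout, here to an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Entity", "Type"])
writer.writerows(log)
csv_text = buffer.getvalue()

print(redacted)
print(csv_text)
```

Because the callback records the exact matched text, the log cannot drift out of step with what was actually replaced, which can happen when redaction and logging use separate patterns.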

import csv
import re

import pdfminer.high_level
from reportlab.pdfgen import canvas

# Patterns and name lists shared by the redaction and logging steps
PHONE_PATTERN = r"\b\d{10}\b"
EMAIL_PATTERN = r"\S+@\S+\.\S+"
ORG_NAMES = ["Google", "Microsoft", "Apple"]
EMP_NAMES = ["Jane", "John", "Mary"]


def name_pattern(name):
    # Match the name as a whole word so "John" does not hit "Johnson"
    return r"\b" + re.escape(name) + r"\b"


def redact_pii(text):
    # Phone numbers
    text = re.sub(PHONE_PATTERN, "xxxxxx", text)

    # Email addresses
    text = re.sub(EMAIL_PATTERN, "yyyyyy", text)

    # Organization names
    for org_name in ORG_NAMES:
        text = re.sub(name_pattern(org_name), "ORGANIZATION_NAME", text, flags=re.IGNORECASE)

    # Employee names
    for emp_name in EMP_NAMES:
        text = re.sub(name_pattern(emp_name), "EMPLOYEE_NAME", text, flags=re.IGNORECASE)

    return text


# Open the PDF file and read its contents
with open("/content/demo.pdf", "rb") as pdf_file:
    content = pdfminer.high_level.extract_text(pdf_file)

# Redact PII content
redacted_content = redact_pii(content)

# Log the detected PII entities to a CSV file
with open("redact.csv", "w", newline="") as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(["Entity", "Type"])

    # Use the same pattern as redact_pii so every redacted number is logged
    for phone_number in re.findall(PHONE_PATTERN, content):
        csv_writer.writerow([phone_number, "Phone no"])

    for org_name in ORG_NAMES:
        if re.search(name_pattern(org_name), content, flags=re.IGNORECASE):
            csv_writer.writerow([org_name, "Organization"])

    for emp_name in EMP_NAMES:
        if re.search(name_pattern(emp_name), content, flags=re.IGNORECASE):
            csv_writer.writerow([emp_name, "Employee"])

# Save the redacted PDF file
pdf_canvas = canvas.Canvas("redacted_demo.pdf")
y = 750
for line in redacted_content.split("\n"):
    if y < 50:
        # Start a new page once the current one is full
        pdf_canvas.showPage()
        y = 750
    pdf_canvas.drawString(50, y, line)
    y -= 20
pdf_canvas.save()
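
The PDF step above places one line every 20 points starting at y = 750, and `drawString` itself neither wraps long lines nor starts a new page. One way to keep the text on the page is to compute the layout separately from the drawing. The sketch below assumes a hypothetical `layout_lines` helper, and its `max_chars=90` limit is a rough guess rather than a measured width for any particular font.

```python
import textwrap


def layout_lines(text, top=750, bottom=50, line_height=20, max_chars=90):
    """Yield (page_index, y, line) for each wrapped line of `text`."""
    page, y = 0, top
    for raw_line in text.split("\n"):
        # textwrap.wrap returns [] for blank lines; keep them as vertical space.
        for line in textwrap.wrap(raw_line, width=max_chars) or [""]:
            if y < bottom:
                # The current page is full: continue at the top of the next one.
                page += 1
                y = top
            yield page, y, line
            y -= line_height


# Hypothetical input: 40 short lines, more than fit on one page at this spacing.
sample_text = "\n".join(f"line {i}" for i in range(40))
placed = list(layout_lines(sample_text))
print(placed[35], placed[36])
```

Each tuple can then drive the drawing loop: call `pdf_canvas.showPage()` whenever the page index increases, and `pdf_canvas.drawString(50, y, line)` for every line.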
