
TAIYŌAI INC.

Data Quality Trial Task for QA Analyst/Engineer

Objective:
Clean and standardize entity names (sponsoring government agencies and awarded
companies) from SAM.gov (USA) and the E-Procurement Government of India databases.
● https://sam.gov/data-services
● https://eprocure.gov.in/eprocure/app
○ Please do in-depth research on the above sources to find the right URLs for the
active tenders and historic tender records datasets.
○ Within these, find the “entities”, i.e. the sponsoring government. Candidates
need to focus on only two key attributes:
A. The attribute that names the sponsoring government agency, i.e. the
government agency giving the contract — for example, the Department of Defense
on SAM.gov or the Ministry of Road Transport and Highways on E-Procurement India.
B. The supplier, i.e. the company that won or was awarded the contract — for
example, Tata Steel or Bouygues Construction. Note that the bigger focus is on the
company/supplier name. The Tata Group, say, has multiple entities such as Tata
Steel, and Tata Steel itself may appear as Tata Steel Europe, Tata Steel USA, or
other such variants. The objective is to clean these up into consistent and
reliable final names.
○ Candidates can show:
A. Manual ways they fixed a small subsample of 100 records.
B. A basis for automating this using LLMs or Python operations; candidates who
execute on this will get extra points.

Part 1: Data Cleaning and Standardization


Task: Manually clean and standardize a subset (100 records) of entity names from the provided
datasets.
Focus: Pay special attention to the company/supplier names, ensuring consistency and
reliability in the final names. For example, different iterations of "Tata Steel" should be
standardized.
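Even for the manual pass, simple rule-based normalization catches many of the easy inconsistencies (casing, punctuation, legal-form suffixes) before any judgment calls are needed. The sketch below is one possible starting point; the suffix list is illustrative and would need to be extended per dataset.

```python
import re

# Common legal-form suffixes to strip (illustrative list; extend per dataset)
LEGAL_SUFFIXES = r"\b(inc|incorporated|ltd|limited|llc|pvt|private|corp|corporation|co)\b\.?"

def normalize_name(raw: str) -> str:
    """Normalize a supplier/agency name: trim, lowercase, drop punctuation
    noise and common legal suffixes, collapse whitespace, and title-case."""
    name = raw.strip().lower()
    name = re.sub(r"[.,]", " ", name)         # punctuation to spaces
    name = re.sub(LEGAL_SUFFIXES, "", name)   # drop legal-form suffixes
    name = re.sub(r"\s+", " ", name).strip()  # collapse whitespace
    return name.title()
```

With this, "TATA STEEL LTD." and "Tata Steel Limited" both reduce to "Tata Steel", leaving only genuinely ambiguous variants for manual review.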

Part 2: Automation Proposal and Script Development


Task: Develop a basic automation script or method using Python and language models (e.g.,
the OpenAI API, Llama 2) to standardize entity names in the datasets.
Expectation:
Provide a working script or a detailed proposal for automating the standardization process.
Demonstrate the script's effectiveness on a subset of the data.
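As one possible basis for the "Python operations" route (no LLM required), raw names can be mapped to a curated canonical list via fuzzy string matching. The sketch below uses the standard library's `difflib`; the threshold value is an assumption and would need tuning on real data, with below-threshold names routed to manual review (or to an LLM fallback).

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def map_to_canonical(names, canonical, threshold=0.7):
    """Map each raw name to the closest canonical name if similarity clears
    the threshold; otherwise keep the raw name for manual review."""
    mapping = {}
    for raw in names:
        best = max(canonical, key=lambda c: similarity(raw, c))
        mapping[raw] = best if similarity(raw, best) >= threshold else raw
    return mapping
```

An LLM-based variant would replace `similarity` with a prompt asking the model to pick the matching canonical entity, which handles abbreviations and reorderings that character-level matching misses, at higher cost per record.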

Part 3: Scalability and Production Readiness


Task: Document how the proposed method can be scaled and implemented in a production
environment.

Details:
Include considerations for continuous data updating and processing large volumes of data.
Explain how the method adheres to data quality and standards.

Evaluation Criteria
Standards & Quality: Accuracy and consistency in the final cleaned and standardized data.
Scalability: The potential of the method to handle large datasets efficiently in a production
environment.
Documentation: Clarity and comprehensiveness of the documentation, including reasoning for
scaling the solution.

Deliverables
Candidates should submit a Google Drive folder containing:

1. Python Scripts: Code for data cleaning and standardization.


2. Sample Data: Original and final cleaned datasets (100 records minimum).

3. Documentation:
● Detailed explanation of the methods used.
● Plan for scaling the solution to a production environment.
Additional Task Details
● Data Sources: Use SAM.gov and the E-Procurement Government of India for sourcing
data.
● Tools and Languages: Utilize Python and the OpenAI API, Llama 2, or other
language models.

Bonus Points: Candidates showing effective automation methods may qualify for extra points
and a higher starting salary.

Task Submission: Kindly fill out this form to submit your task:
https://forms.gle/d3KqrzM9KYY3watx6
