
Checkpoint 3

Uploaded by marwaan.nabil1


Preprocessing of Genotyping & scRNA-seq Data

Checkpoint (1) 25/09/2023

Director: Pr. Lluis Quintana-Murci
Co-supervisor: Dr. Maxime Rotival
Human Evolutionary Genetics Unit
Marwan Sharawy
Pipeline progress
• Combining a k-nearest-neighbor step with k = 10 (KNN10) and doublet detection to identify cryptic doublets.

• Filtering out cells with more than 20% mitochondrial reads.

• Excluding cells with a low number of detected genes: for each library, compute the 2nd percentile (quantile 0.02) of genes detected per cell and remove cells below this threshold.

• Total cells removed: 86,039 (11% of total cells).

• Performing scran normalization, feature selection, dimensionality reduction (PCA) and clustering.
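The two per-cell QC filters above can be sketched in NumPy, assuming a dense cells × genes raw count matrix, a boolean mask marking mitochondrial genes and a library label for every cell. `qc_keep_mask` is a hypothetical helper written for illustration, not the pipeline's actual code; at the scale of this dataset the same logic would normally run on sparse matrices.

```python
import numpy as np

def qc_keep_mask(counts, is_mito, library, max_mito_frac=0.20, gene_quantile=0.02):
    """Return a boolean mask of cells passing both QC filters (hypothetical helper).

    counts  : (n_cells, n_genes) raw UMI counts
    is_mito : (n_genes,) boolean, True for mitochondrial genes
    library : (n_cells,) library label of each cell
    """
    counts = np.asarray(counts)
    library = np.asarray(library)
    total = counts.sum(axis=1)

    # Fraction of each cell's reads coming from mitochondrial genes (0 for empty cells)
    mito_frac = np.divide(counts[:, is_mito].sum(axis=1), total,
                          out=np.zeros(len(total), dtype=float), where=total > 0)
    keep = mito_frac <= max_mito_frac

    # Number of detected genes (non-zero counts) per cell
    n_genes = (counts > 0).sum(axis=1)
    for lib in np.unique(library):
        in_lib = library == lib
        # 2nd percentile of detected genes, computed separately within each library
        threshold = np.quantile(n_genes[in_lib], gene_quantile)
        keep[in_lib] &= n_genes[in_lib] >= threshold
    return keep
```

Cells where the returned mask is `False` are the ones removed; computing the threshold inside the loop keeps it library-specific, so a shallow library is not judged against a deep one.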
Total merged libraries

726k total cells after filtering, across 88 libraries


Filtered out: 86k cells

• True_DBL (true doublets): 64,226
• low_Gene_Counts (low gene counts): 16,280
• Cryptic_doublet (cryptic doublets): 5,443
Cryptic doublets accuracy

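[Figure: true positive rate (TPR) of cryptic-doublet calls for doublet detection (DD) combined with KNN5, KNN10 and/or scr (DD & KNN5, DD & KNN10, DD & scr, DD & KNN5 & scr, DD & KNN10 & scr); TPR values shown: 89%, 88%, 85%, 84% and 0.81%.]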

• Anticipated cryptic doublets across all libraries: 5,352 (computed as True_DBL × 1/12 = 64,226 / 12).

• Considering a 12% false positive rate, about 642 false positives are expected in the dataset (12% of 5,352).

• Out of ~725,000 total cells, these 642 cells are expected to be wrongly flagged as cryptic doublets (about 0.09% of all cells).
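The expected-count arithmetic on this slide can be checked in a few lines; the doublet count, the 1/12 ratio, the 12% rate and the ~725,000 total are all taken from the slides, so this only reproduces the calculation rather than adding new data.

```python
# Values taken from the slides
true_dbl = 64_226        # cells flagged as doublets (True_DBL)
total_cells = 725_000    # approximate total number of cells
fp_rate = 0.12           # false positive rate used on the slide

expected_cryptic = true_dbl / 12           # anticipated cryptic doublets
expected_fp = expected_cryptic * fp_rate   # expected false positives
fp_fraction = expected_fp / total_cells    # share of all cells

print(f"expected cryptic doublets: {expected_cryptic:.0f}")  # 5352
print(f"expected false positives: {expected_fp:.0f}")        # 642
print(f"fraction of all cells: {fp_fraction:.3%}")           # 0.089%
```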
Feature selection

• Compute the coefficient of variation for all genes.
• Select N genes based on their coefficient of variation and mean expression, and use these features for downstream analysis:
  • mean > 0.0125 and dispersion > 0.2
  • mean > 0.001 and dispersion < 1
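A simplified sketch of the mean/dispersion filter, assuming a cells × genes matrix of normalized expression, dispersion defined as variance/mean, and the two criteria combined with a logical OR (the slide does not say how they are combined). Tools such as Scanpy instead threshold a binned, normalized dispersion computed from log-scale data, so these thresholds are not directly comparable with theirs; `mean_dispersion_filter` is a hypothetical name.

```python
import numpy as np

def mean_dispersion_filter(expr):
    """Flag genes passing either mean/dispersion criterion (hypothetical helper).

    expr : (n_cells, n_genes) normalized expression matrix
    Returns (mask, cv): boolean gene mask and per-gene coefficient of variation.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=0)
    var = expr.var(axis=0, ddof=1)
    # Dispersion (variance-to-mean ratio) and coefficient of variation per gene;
    # all-zero genes give NaN, which fails every comparison below
    with np.errstate(divide="ignore", invalid="ignore"):
        dispersion = var / mean
        cv = np.sqrt(var) / mean
    crit_1 = (mean > 0.0125) & (dispersion > 0.2)
    crit_2 = (mean > 0.001) & (dispersion < 1)
    return crit_1 | crit_2, cv
```

The returned `cv` can be used to rank the genes that pass the mask when only the top N are wanted.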
Feature selection

• Testing deviance-based feature selection, which works directly on raw counts [Germain et al., 2020].
• Deviance quantifies how well a constant expression profile across cells fits each gene; genes that depart most from constant expression have the highest deviance.

Additionally, any gene labeled as highly deviant (among the top 12,000 most deviant genes) in all 88 libraries is kept.
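The deviance score can be sketched with the binomial approximation described by Townes et al. (2019): each gene is compared against a null model in which it takes a constant share of every cell's counts. This is a from-scratch NumPy version under that assumption, for a dense cells × genes raw count matrix; it is not the code used in this pipeline, and `binomial_deviance` is a hypothetical name.

```python
import numpy as np

def binomial_deviance(counts):
    """Per-gene binomial deviance against a constant-expression null model.

    counts : (n_cells, n_genes) raw UMI counts
    """
    y = np.asarray(counts, dtype=float)
    n = y.sum(axis=1, keepdims=True)   # library size of each cell, shape (n_cells, 1)
    pi = y.sum(axis=0) / n.sum()       # each gene's overall share of all counts
    mu = n * pi                        # expected counts if the share were constant

    def xlog_ratio(a, b):
        # a * log(a / b), with the convention 0 * log(0) = 0
        out = np.zeros_like(a)
        pos = a > 0
        out[pos] = a[pos] * np.log(a[pos] / b[pos])
        return out

    dev = 2 * (xlog_ratio(y, mu) + xlog_ratio(n - y, n - mu))
    return dev.sum(axis=0)             # sum over cells -> one score per gene
```

Genes would then be ranked per library, for example with `np.argsort(dev)[::-1][:12_000]` to take the 12,000 most deviant ones.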

• mean > 0.0125 and dispersion > 0.2
• mean > 0.001 and dispersion < 1
• highly deviant in all 88 libraries

• Total genes filtered out: 21,241

• Total genes retained: 15,885
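One reading of how these criteria combine, sketched under assumptions: a gene is kept if it passes the mean/dispersion filter or if it is among the top 12,000 most deviant genes in every library (the intersection across libraries). The per-library deviance scores and the dispersion mask are taken as inputs, and both helper names are hypothetical.

```python
import numpy as np

def deviant_in_all_libraries(deviance_by_library, top_k=12_000):
    """Boolean mask of genes ranked in the top_k by deviance in every library.

    deviance_by_library : (n_libraries, n_genes) deviance of each gene per library
    """
    dev = np.asarray(deviance_by_library, dtype=float)
    n_genes = dev.shape[1]
    in_all = np.ones(n_genes, dtype=bool)
    for lib_dev in dev:
        top = np.zeros(n_genes, dtype=bool)
        top[np.argsort(lib_dev)[::-1][:top_k]] = True  # top_k most deviant genes here
        in_all &= top                                  # must be top-ranked in every library
    return in_all

def retained_genes(hvg_mask, deviance_by_library, top_k=12_000):
    # Assumed union of the mean/dispersion filter and the deviance criterion
    return np.asarray(hvg_mask, dtype=bool) | deviant_in_all_libraries(deviance_by_library, top_k)
```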
Clustering

• Computing the neighborhood graph: calculate a Euclidean distance matrix on the PCA-reduced expression space for all cells, then connect each cell to its k most similar cells.

(PCA + Euclidean distance + kNN + Leiden)
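A minimal NumPy sketch of the graph-building half of this recipe (PCA, Euclidean distances, kNN), assuming a small, already normalized cells × genes matrix. A dense n × n distance matrix only fits in memory for small inputs; at the scale of this dataset, tools typically use (approximate) nearest-neighbor search instead. The Leiden step is not shown, and the function names are hypothetical.

```python
import numpy as np

def pca(x, n_components):
    """Project cells onto the top principal components via SVD."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]   # PC scores of each cell

def knn_graph(pcs, k):
    """Return, for each cell, the indices of its k nearest cells (Euclidean)."""
    sq = (pcs ** 2).sum(axis=1)
    # Pairwise squared Euclidean distances: |a|^2 + |b|^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2 * pcs @ pcs.T
    np.fill_diagonal(d2, np.inf)                     # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]
```

Each row of the result lists one cell's neighbors; turning these rows into graph edges gives the kNN graph that a community-detection method such as Leiden then partitions.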

• Currently testing:

(OT + kNN + Leiden)

• OT (optimal transport) defines distances between high-dimensional data represented as probability distributions.
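To make the OT idea concrete: for two point clouds of equal size with uniform weights, the optimal transport (Wasserstein-1) distance reduces to the cheapest one-to-one matching between their points. The brute-force sketch below only illustrates that definition on tiny inputs; the slide does not specify how OT distances are built between cells here, and practical implementations use dedicated solvers (for example network-simplex or Sinkhorn algorithms). `wasserstein_uniform` is a hypothetical name.

```python
import itertools
import math

def wasserstein_uniform(xs, ys):
    """Exact W1 distance between two uniform point clouds of equal size.

    Brute force over all matchings, so only usable for a handful of points.
    """
    if len(xs) != len(ys):
        raise ValueError("point clouds must have the same number of points")
    n = len(xs)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Total cost of moving each point xs[i] onto its matched point ys[perm[i]]
        cost = sum(math.dist(xs[i], ys[j]) for i, j in enumerate(perm))
        best = min(best, cost)
    return best / n   # each point carries mass 1/n
```

For example, shifting two points one unit upward, `wasserstein_uniform([(0, 0), (1, 0)], [(0, 1), (1, 1)])`, gives a distance of 1.0.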
Integration

• scVI integration of the 88 libraries

• Leiden clustering: 20 clusters at resolution 0.5

Cell type annotation

• Using CellID

• Two references used to automatically annotate cell types:
  • hao20 (the same reference Yann used)
  • 150k cells from Yann's dataset
To Do

• Manual cell annotation

• Rerun Yann's scripts (multiBatchNorm followed by Harmony integration) to confirm my results.

• Improve integration accuracy by re-running integration with scANVI, an scVI-based model that incorporates annotated cell types.

• Conduct an in-depth analysis of the Pilot 5 libraries, focusing on individuals sampled in both Pilot 5 and V3, involving different SARS-CoV-2 and influenza strains.

• Obtain preliminary insights into the effects of different viruses by 28 November.

• 13–15 October: preparing for the exam on 16 October.

• 27 November: mandatory poster session (Sorbonne).
