Getting started with Deep Learning for Natural Language Processing: Learn how to build NLP applications with Deep Learning (English Edition)

Ebook · 678 pages · 7 hours

About this ebook

Natural language processing (NLP) is one of the areas where many machine learning and deep learning techniques are applied.
This book covers a wide range of topics, including the fundamentals of machine learning, understanding and optimizing hyperparameters, convolutional neural networks (CNN), and recurrent neural networks (RNN). It covers not only the classical concepts of text processing but also recent advancements, and it will empower you to design networks with the least computational and time complexity. Beyond the basics of natural language processing, it helps in deciphering the logic behind advanced concepts and architectures such as batch normalization, position embeddings, DenseNet, the attention mechanism, highway networks, Transformer models, and Siamese networks, along with recent advancements such as ELMo-BiLM, SkipThought, and BERT. It also provides practical, step-by-step implementations of deep learning techniques for topic modelling, text generation, named entity recognition, text summarization, and language translation. In addition, advanced and open research topics such as generative adversarial networks and speech processing are covered.
Language: English
Release date: January 13, 2021
ISBN: 9789389898125

    Book preview

    Getting started with Deep Learning for Natural Language Processing - Sunil Patel

    CHAPTER 1

    Understanding the Basics of Learning Process

    This chapter covers the most basic aspects of machine learning. It will help you understand the basic mathematical representation of a learning algorithm and teach you how to design a machine learning model from scratch. After constructing this model, we will look at the methods used to gauge the model’s prediction accuracy using different accuracy metrics. Going further, we will cover the bias-variance problem and diagnose it with a technique called learning curves. Once we get our model correct, we need it to generalize well on unknown datasets, which is addressed through regularization. After perfecting such a model, it must be deployed efficiently to improve speed and accuracy.

    Structure

    In this chapter, we will cover the following topics:

    Learning from data

    Error/noise reduction

    Bias-variance reduction

    Learning curves

    Regularization

    Training and inference

    The three learning principles

    Objective

    This chapter covers building a simple model, training it efficiently, and gauging its accuracy. It will also help you understand how popular software and hardware acceleration can be used for faster training and inference. We’ll end this chapter with three learning principles that are extremely important in machine learning.

    Pre-requisites

    I have provided some of the examples through code; the code for this chapter is present in the ch1 folder of the GitHub repository (https://github.com/bpbpublications/Getting-started-with-Deep-Learning-for-Natural-Language-Processing). Basic knowledge of the following Python packages is required to understand this chapter:

    NumPy

    Scikit-Learn

    Matplotlib

    Pandas

    This chapter has one example that uses PyTorch for demonstration. If you don’t know PyTorch, we will cover it in detail in Chapter 2, Text Pre-Processing Techniques in NLP.

    Learning from Data

    In this data-centric world, a small improvement to an existing application can potentially be worth millions. We all remember the big prize ($1,000,000) that Netflix gave to the team that improved its recommendation algorithm’s accuracy by 10.06%. Similar opportunities exist in financial planning, be it Forex forecasting or trade market analysis, where minute improvements can provide beneficial results. To improve something, one must explore the entire logic behind the process; this is what we call learning from data. Machine learning is an interesting area where one can use historical data to make a system capable of identifying the observed patterns in new data. However, machine learning cannot be applied to every problem; the following rules of thumb help decide whether machine learning should be applied to a given problem:

    There must exist a hidden pattern

    We cannot find such a pattern by applying simple mathematical approaches

    There must be historical/relevant data about the task concerned

    In this chapter, I will start with a basic perceptron model to give you a taste of a learning model. After building the perceptron model, I will discuss error/noise and its detection using bias-variance analysis and learning curves. With the learning curve, we will look at how the complexity of the algorithm helps mitigate a high-bias problem. Then, we will cover regularization techniques to achieve better generalization. All these techniques help in getting a better model, after which it’s time to deploy the model efficiently. Later in the chapter, we will cover techniques for faster and better inferencing and then look at three learning principles that are not directly related to machine learning but help in achieving state-of-the-art performance. Before going further, let’s briefly discuss the mathematical formalization of a supervised learning problem.

    Let’s assume that we are working in the supervised learning paradigm. Supervised learning takes finite pairs of X and Y for learning. X and Y can be of different types depending on the learning goal. The following table shows some examples, with the nature of X and Y described for different learning problems:

    Table 1.1

    Here, Y is the set of labels for X. Each X and Y is paired, as shown in the following equation:

    (X, Y) = ((x1, y1), …, (xn, yn))

    Where x1, …, xn are the individual data points in X, and y1, …, yn are the corresponding labels in Y. The main task of our hypothesis function f is to apply it over xi to predict ŷi = f(xi). The goal is to predict ŷi so that it is the same as, or near, the original label yi, and the error (E) between the predicted labels and the original ones, |Ŷ – Y|, tends to 0. The overall procedure to learn any function can be summarized in the following flow diagram:

    Figure 1.1: Supervised learning paradigm with all the major components involved in training and evaluation.

    As shown in the preceding figure, a hypothesis h is used for making predictions. Here, the hypothesis can be anything: linear regression, logistic regression, a polynomial model, an SVM, a Random Forest, or a neural network. The hypothesis h is also called the function f in general terms.

    There are also other types of learning techniques, like unsupervised learning and reinforcement learning. An unsupervised learning paradigm has no label attached to the data; it only has x1, …, xn ∈ X, and there is no Y. In fact, unsupervised methodologies are gaining popularity nowadays and are responsible for pushing the state of the art in vision and NLP even higher; popular models like BERT and Megatron are examples. On the other hand, reinforcement learning is a technique in which an agent tries to maximize an immediate or cumulative reward by learning/adapting to a given environment. We will learn about and apply unsupervised learning to NLP problems in the upcoming chapters.

    Implementing the Perceptron Model

    Although it is a slight digression, we will first discuss the perceptron model. A perceptron with no activation function is a linear model. The perceptron algorithm was invented by Frank Rosenblatt in 1957 at the Cornell Aeronautical Laboratory, and to date it remains one of the most important and widely used models in machine learning.

    Let’s take all the features, that is, x1, …, xn, and try to derive a hypothesis h that maps xi ⇒ ŷi, where ŷi is the predicted label and yi is the original label. Our hypothesis h can be one of the simplest possible perceptron models with a linear activation function. The preliminary hypothesis can be thought of as in the following equation:

    h(x) = Σi wi·xi + b

    Where wi are the learnable weights, which change as per the feedback signal received from the calculated loss, and b is the bias term. The bias is used in the perceptron network for phase shifting. The preceding equation is combined with a step function: if the predicted value is above the threshold, the label is 1; otherwise, it is 0:

    ŷ = Step(WᵀX)

    Where WᵀX is the matrix product of the weights and the input. Matrix-based computations are faster than iteration-based computations, and GPUs support matrix operations well. To see whether this hypothesis can solve our problem, I have taken a hypothetical dataset and applied the discussed hypothesis. In the present model, I will use the Mean Squared Error (MSE) loss, which can be defined as:

    MSE = (1/n) Σi (ŷi − yi)²

    We will use Stochastic Gradient Descent (SGD) as the optimizer, which tries to find the best solution by exploring the multi-dimensional loss terrain. We will learn how SGD works while discussing RMSProp (another, more advanced optimizer) in the upcoming chapters. Although terms like SGD and MSE might seem intimidating, they are simple mathematical equations whose purpose will become clearer as we move ahead in the chapter.
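    Although we will only study SGD in detail later, its weight update can be sketched in the standard textbook form (a generic formula, not one specific to this book):

    $$ w \leftarrow w - \eta \,\frac{\partial L_{\mathrm{MSE}}}{\partial w} $$

    Here, $\eta$ is the learning rate, and the gradient of the MSE loss with respect to each weight tells SGD in which direction to nudge that weight to reduce the loss.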

    Generating and Understanding Fake Image Data and Binary Labels

    The following fake data with two classes—0 and 1—are generated to validate the preceding hypothesis:

    Figure 1.2: An imaginary dataset to demonstrate learning using a simple perceptron model.

    This simple dataset is a 10×10 matrix, with each location holding a random float indicated by the variable color temperature. Class 0 has 1 at locations [5, 4], [4, 5], [5, 6], [6, 5] and 0 at locations [3, 3], [3, 8], [8, 3], [8, 8]. On the other hand, Class 1 has 0 at locations [1, 4], [4, 1], [5, 6], [6, 5] and 1 at locations [3, 3], [3, 8], [8, 3], [8, 8]. Only 8 of the 100 points are indicative, and our algorithm must differentiate between the two classes based on these 8 points. The other 92 points are noise in the dataset, and the algorithm must be good enough to ignore this noise.
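    As a rough illustration (this is only a sketch and not the book's dummy_data.py script; the exact coordinates and noise range may differ), such a sample could be generated as follows:

    import numpy as np

    def make_fake_sample(label):
        grid = np.random.rand(10, 10)                # background noise
        ones = [(5, 4), (4, 5), (5, 6), (6, 5)]      # indicative points
        zeros = [(3, 3), (3, 8), (8, 3), (8, 8)]
        for r, c in ones:
            grid[r, c] = 1.0 if label == 0 else 0.0  # flipped for class 1
        for r, c in zeros:
            grid[r, c] = 0.0 if label == 0 else 1.0
        return grid.reshape(1, 100)                  # flattened for the perceptron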

    Understanding Our First Tiny Machine Learning Model

    A simple model can be visualized as follows:

    Figure 1.3: Simple Perceptron model with the step function.

    As shown in the figure, our model accepts 100 input features (a 10×10 matrix, flattened). Each input xi is multiplied by a weight wi to produce the final output. The input of size [1, 100] is multiplied with the weight matrix of size [100, 2]; the multiplication yields a [1, 2] output, on which the threshold (step) is applied. The main question is, why are there two outputs? Generally, the output is provided as one-hot encoded: 0 is encoded as [1, 0], and 1 is encoded as [0, 1]. Now, let’s understand why one-hot encoding is used. If we used only one output, which could be 0 or 1, the model might predict 0.5 every time and settle for this as an easier alternative to accurately predicting 0 and 1. This problem worsens as the number of classes increases, but one-hot encoding forces the network to predict the extremes well and minimizes loss.
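    As a quick illustration (a sketch, not code from the book), integer labels can be one-hot encoded as follows:

    import numpy as np

    def one_hot(labels, num_classes=2):
        # e.g. [0, 1, 1] -> [[1, 0], [0, 1], [0, 1]]
        encoded = np.zeros((len(labels), num_classes))
        encoded[np.arange(len(labels)), labels] = 1.0
        return encoded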

    Coding the Model with PyTorch

    I have used PyTorch version 0.4.1 to design the preceding simple model. PyTorch lets us design the network as per our thought process, and the following is the simple network we just discussed. If you don’t get this, we will cover these things in detail in Chapter 2, Text Processing Techniques. Here, a network can be designed simply by extending the torch.nn.Module class. Then, you must implement two functions, namely __init__() and forward(). The __init__ function initializes as well as defines the various layers. In our network, we have defined a simple linear layer (nn.Linear) that takes two arguments: the input dimension and the output dimension. In our case, the input dimension is 100 and the output dimension is 2. The following code block illustrates how to construct a simple network that takes a 100-dimensional vector as the input and produces a two-dimensional output:

    import torch.nn as nn

    class simple_module_1(nn.Module):

        def __init__(self):
            super(simple_module_1, self).__init__()
            # a single linear layer: 100 input features -> 2 output classes
            self.simple_linear = nn.Linear(100, 2)

        def forward(self, input):
            return self.simple_linear(input)

    We don’t have to worry about the weight and its dimensions, as the framework will take care of that based on the input and output size.
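    To sanity-check the shapes described above, the module can be instantiated and fed a random input. The following is a small usage sketch (it assumes torch is imported alongside the class defined previously):

    import torch

    simple_module = simple_module_1()      # the class defined above
    dummy_input = torch.rand(1, 100)       # one flattened 10x10 sample
    output = simple_module(dummy_input)
    print(output.shape)                    # torch.Size([1, 2])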

    Confirming the Convergence of the Model

    After training for 200 epochs, the network learns the peculiarities that distinguish the classes. As the epochs progress, the loss decreases and the accuracy increases.

    The following image shows the progress of loss/accuracy as the epochs progress:

    Figure 1.4: The progress of loss/accuracy as epochs progress.

    To demonstrate this simplest learning model, I have generated dummy data; you can experiment with it using the dummy_data.py Python script. I used an imaginary data generator function to produce the dummy data and plugged it into the apply_perceptron.py script. apply_perceptron.py also contains functions for data generation, the PyTorch model, the loss calculator, the optimizer, the accuracy calculator, and a function to render the plot.

    The following code uses mean squared error as the measure of the error. It is one of several loss functions, chosen according to the learning goal; we will discuss the mathematics of these loss functions in the next section, on error/noise reduction. The SGD optimizer decreases the loss by back-propagating the error and adjusting the weights:

    # define Loss

    objective = nn.MSELoss(reduce=True)

    # define Optimizer

    optimizer = optim.SGD(simple_module.parameters(),lr=0.01)
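    The complete training loop lives in apply_perceptron.py; as a rough sketch of what such a loop does (the tensor names inputs and targets are illustrative, assuming inputs of shape [N, 100] and one-hot targets of shape [N, 2]), it repeats the following steps each epoch:

    for epoch in range(200):
        optimizer.zero_grad()                    # clear gradients from the previous step
        predictions = simple_module(inputs)      # forward pass
        loss = objective(predictions, targets)   # MSE between predictions and one-hot labels
        loss.backward()                          # back-propagate the error
        optimizer.step()                         # SGD weight update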

    Error/Noise Reduction

    Suppose we have a dataset, as shown in the following figure:

    Figure 1.5: Data before and after applying the non-linear transformation.

    Suppose we have the data shown in the preceding figure (A), where each point xi belongs to the superset X, in short, xi ∈ X. To this data, we apply a transformation φ, and the new data distribution, shown in the same figure (B), can be given as zi = φ(xi), with zi ∈ Z. Looking at figure (B), we can see that the data is now linearly separable, and any linear model, like linear regression or the perceptron, can separate such data. Transformations like φ are never perfect and often end up categorizing some of the points incorrectly; as shown in the preceding figure (B), some of the points fall on the wrong side of the separating line. We now need measures to quantify these errors.

    Mathematically, we measure such errors by loss functions like the following:

    Mean squared error: MSE = (1/n) Σi (ŷi − yi)²

    Mean absolute error: MAE = (1/n) Σi |ŷi − yi|

    Cross-entropy error: CE = −(1/n) Σi [yi log(ŷi) + (1 − yi) log(1 − ŷi)]

    The loss function can also be problem-specific. In the next chapter, we will define the sequence-to-sequence loss, which is used particularly for language translation. The triplet loss is a custom loss function used in style transfer applications. In fact, we must assign different penalties to different cases to meet specific requirements; we will discuss such a case later in this chapter.
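    Written out in code, the three losses above look like this (a plain NumPy sketch, assuming y_true and y_pred are arrays of the same shape and y_pred holds probabilities in the cross-entropy case):

    import numpy as np

    def mse(y_true, y_pred):
        return np.mean((y_pred - y_true) ** 2)

    def mae(y_true, y_pred):
        return np.mean(np.abs(y_pred - y_true))

    def binary_cross_entropy(y_true, y_pred, eps=1e-12):
        y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
        return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))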

    Understanding Confusion Matrix and Derived Measures

    To quantify the goodness of our model, we generally use the confusion matrix. It is a table that allows us to visualize the performance of our algorithm. It has four values, based on four cases:

    True Positives (TP): These are cases in which we predicted yes, correctly

    True Negatives (TN): We predicted no, correctly

    False Positives (FP): We predicted yes, wrongly

    False Negatives (FN): We predicted no, wrongly

    For a binary problem, the confusion matrix looks as follows:

                        Predicted: Yes      Predicted: No
    Actual: Yes         TP                  FN
    Actual: No          FP                  TN

    Some of the derived performance measures, like accuracy, precision, sensitivity, and specificity, can be calculated as follows:

    Specificity, Selectivity, or True Negative Rate (TNR) = TN / (TN + FP)

    Sensitivity, Recall, Hit Rate, or True Positive Rate (TPR) = TP / (TP + FN)

    Precision or Positive Predictive Value (PPV) = TP / (TP + FP)

    Accuracy = (TP + TN) / (TP + TN + FP + FN)

    Apart from these measures, the confusion matrix helps derive other measures like F1-Score, Matthews Correlation Coefficient (MCC) and informedness.
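    These measures are easy to compute in practice; the following sketch uses scikit-learn's confusion_matrix for a binary problem (the labels here are made up):

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)                 # recall / TPR
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)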

    Defining Weighted Loss Function

    As discussed, loss function should be designed as per the end goal, and mistakes should be penalized accordingly by providing proper weights. To understand this concept, let’s look at a few use cases:

    Case 1 - A case of the grocery market: Suppose a grocery store decides to provide 25% off to their loyal customers, who are identified at the checkout counter by face recognition. Now, we gauge the accuracy of facial recognition based on the confusion matrix. In this case, our confusion matrix should be designed after considering the following:

    TP and TN will have 0 penalty

    FP means a customer was not very frequent but still got the discount

    FN means a loyal customer’s face was not identified correctly, and so they failed to get discount

    In this case, more penalties should be applied to FN cases. We could not detect a loyal customer due to a system error, and we did not provide them with a discount. It leaves a bad impression, and we could lose the customer, so a higher penalty should be imposed on FN cases. FP cases do not do any harm; if a customer accidentally gets the discount, they may like it and may visit again. FP cases may eventually enhance word-of-mouth promotion. In the loss function, if the penalty for FP cases is one, the penalty for FN cases should be 100.

    Case 2 - An ATM security system: Let’s say a company is designing the ATM feature where you press your thumb and get your money. The cases are as listed:

    TN and TP are normal cases with 0 penalty

    FP are cases where a person did not have an account, but got access to a random account and money when they put their thumb

    FN are cases when a person with an account was not recognized

    Here, FP cases should be taken very seriously. If the penalty for FN is 1, the penalty for FP should be 10,000.
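    One simple way to encode such asymmetric penalties (a sketch, not the book's implementation) is a weighted binary cross-entropy in which missed positives and false alarms carry different weights; for the grocery case fn_weight would be 100 and fp_weight 1, while for the ATM case fp_weight would dominate:

    import torch

    def weighted_bce(y_prob, y_true, fn_weight=100.0, fp_weight=1.0):
        # y_prob: predicted probability of the positive class, y_true: 0/1 labels
        eps = 1e-8
        fn_term = fn_weight * y_true * torch.log(y_prob + eps)            # punishes missed positives (FN)
        fp_term = fp_weight * (1 - y_true) * torch.log(1 - y_prob + eps)  # punishes false alarms (FP)
        return -(fn_term + fp_term).mean()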

    Later, we will discuss regularization, which helps reduce the error of our algorithm and improves generalization to the test data. This is the reality of real-life data: most often you will deal with noisy data, and you will rarely find clean data without noisy targets. A noisy target can be defined as the sum of a deterministic target and some noise. Deterministic noise can be handled by using different hypotheses or different feature sets, but stochastic noise is out of our control, as the features needed to explain it are not present in our example set. Also, the confusion matrix is not the only type of measure used to quantify learning.

    BLEU Score

    The BLEU score is another measure, used for the quality of language translation and text summarization. The mean Average Precision (mAP) is used to measure the quality of object detection algorithms. The GLUE benchmark is used to measure the quality of language embedding tasks. There are many such measures, which we will discuss in detail.
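    For instance, NLTK provides a sentence-level BLEU implementation out of the box (an illustrative snippet; the tokens are made up):

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = [["the", "cat", "sat", "on", "the", "mat"]]   # list of reference token lists
    candidate = ["the", "cat", "is", "on", "the", "mat"]
    score = sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)
    print(round(score, 3))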

    Bias-Variance Problem

    Our goal is clear: keep the error, particularly the test error, minimal. The two major sources of error are bias and variance. To understand the logic behind the learning curve, we must understand bias and variance and the trade-off between them:

    Figure 1.6: Illustrating bias-variance trade-off in the data.

    Bias: Bias is a source of error, as a model with higher bias pays less attention to the training data and oversimplifies the model. Mathematically, bias can be formalized as follows.

    Suppose we have training data x1, …, xn and real-valued labels y1, …, yn associated with each data point xi. Let’s say we define a function f̂ that effectively estimates yi from xi with minimum error E. Here, f̂ can be any learning algorithm, like logistic regression, a perceptron, or a more complex polynomial model. Mathematically, f̂ should be very near to the ideal function f, as shown in the following equation:

    Ŷ = f̂(X) + E

    where the error E is minimal and tends to 0 in the ideal case. Bias can be mathematically represented as:

    Bias[f̂(x)] = E[f̂(x)] − f(x)

    where E[·] denotes the expectation over different training sets. There exist outliers in any distribution, and our function can never be perfect enough to take care of all the points and still generalize well, so there will always be some error E, regardless of the hypothesis used. High bias is also known as under-fitting, where the model is incapable of learning from the data and ends up with high training and test error.

    Let’s look at the possible cause of the high bias:

    No pattern exists: The first assumption under which we apply machine learning is that the data must have a pattern. If there is no pattern, no hypothesis can model it, and the model ends up with high training and test error. No model can solve this problem unless more representative data is provided. An imbalanced dataset can also cause this issue.

    Modeling errors: The model’s architecture is incapable of learning the intricacies of the data. There can be multiple reasons, like the use of the wrong activation function or too high a learning rate. There can also be an issue if the data is not evenly distributed among batches, resulting in improper gradient propagation per batch and, therefore, no learning. An improperly implemented loss function can be another cause.

    There are some definite ways to tackle the high-bias problem:

    Train longer: Sometimes, algorithms take longer to learn higher-level abstraction and then converge quickly. This scenario is observed with image segmentation problems and more with deeper convolution neural network models. Allowing the algorithm to run for longer with a lower learning rate helps with convergence.

    Train complex model / new model architecture: Another reason is that the model is not complex enough to learn the patterns in the data. For example, if the feed-forward model is applied to the text sentiment analysis problem, it may perform poorly as compared to the performance of the recurrent neural network applied to the same problem. Using non-linear kernels in SVM can help mitigate the problem.

    Normalizing/increasing features: Adding more characteristic features can help. For example, one may never be able to predict the price bucket of a car by looking only at features like its color and length, but adding features like its maximum speed, gear-shifting pattern, and pickup can help. In the case of numerical features, normalizing each feature between 0 and 1 can help with better convergence (see the short sketch after this list).

    Adding regularization: This technique certainly helps and is widely used in the deeper convolution neural network models. Batch normalization is a well-known technique.
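    The normalization mentioned above can be as simple as min-max scaling (a minimal sketch; scikit-learn's MinMaxScaler does the same with extra conveniences):

    import numpy as np

    def min_max_scale(feature):
        # squeeze a 1-D numeric feature into the [0, 1] range
        return (feature - feature.min()) / (feature.max() - feature.min() + 1e-12)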

    Variance: As we saw with the bias-related problem, the model was not flexible/complex enough to identify the patterns in the data. Variance sits on the other side, where the model just mugs up the data. In other words, the model fits itself specifically to the training data and does not generalize to the test data. Variance, also known as overfitting, can be mathematically denoted as:

    Var[f̂(x)] = E[(f̂(x) − E[f̂(x)])²]

    High variance can be solved using the following techniques:

    Increase data: We experience the high-variance problem when there are fewer data points and a highly complex model. In this case, our model learns all the intricacies of the training data, the training error tends to reach zero, and no more learning occurs. Such a model performs poorly when applied to test data belonging to a slightly different distribution than the training data.

    Decrease the number of features: If the training data was so thoroughly preprocessed that it contains all the necessary features, the model simply takes these direct features and learns around them. When unprocessed test data is provided, the model cannot find those features and performs poorly.

    Regularization: The model may be fitting to noise in the data, which is why it gives poor results on test data. We can reduce the model’s flexibility by applying constraints, which is known as regularization. There is a definite solution in data science for this problem: the application of regularization techniques. Four types of regularization techniques can help overcome overfitting (a brief code sketch follows this list):

    L1 regularization

    L2 regularization

    Elastic Net

    Dropout

    Reduce model complexity: Sometimes, reducing the number of model parameters also helps a lot. Shrinking the network by decreasing the number of layers in a feed-forward network or CNN can prove helpful, and decreasing the hidden-state size in a recurrent neural network can also help.
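    In PyTorch terms, the regularization techniques listed above look roughly like this (a hedged sketch using standard APIs, not code from the book; Elastic Net is simply the sum of the L1 and L2 penalties):

    import torch.nn as nn
    import torch.optim as optim

    model = nn.Sequential(nn.Linear(100, 50),
                          nn.ReLU(),
                          nn.Dropout(p=0.5),        # Dropout regularization
                          nn.Linear(50, 2))

    # L2 regularization via weight decay in the optimizer
    optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

    # L1 regularization added manually to the loss
    def l1_penalty(model, l1_lambda=1e-4):
        return l1_lambda * sum(p.abs().sum() for p in model.parameters())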

    The trade-off: We have learned about bias and variance, so it’s time to diagnose where the problem lies. Several methods can help identify such a problem, one of which is to plot the learning curve. After understanding the bias-variance problem, it is essential to differentiate which problem you are facing and solve it systematically. Mathematically, bias, variance, and some irreducible error contribute to the overall error. This can be represented as:

    Total Error = Bias² + Variance + Irreducible Error

    To model the bias-variance trade-off practically, we will take the help of polynomial equations of different degrees. Predictions are made on a held-out test set, and the MSE is reported. To demonstrate the bias-variance problem, we will use scikit-learn classes like PolynomialFeatures, LinearRegression, and Pipeline. The following code block defines a function that produces labels (Y) for a given value of X; this function will help us generate example data:

    import numpy as np

    def true_fun(X):
        """
        Given X, it provides its mapping to Y using the function np.cos(1.5 * np.pi * X)
        :param X:
        :return:
        """
        return np.cos(1.5 * np.pi * X)

    We take 30 samples and fit polynomials of degree 1, 3, 9, and 15. np.random.rand(n_samples) will generate random samples from a uniform distribution over [0, 1). The labels for these samples are generated by the true_fun function, with a small amount of Gaussian noise added:

    n_samples = 30

    degrees = [1, 3, 9, 15]

    X = np.sort(np.random.rand(n_samples))

    y = true_fun(X) + np.random.randn(n_samples) * 0.1

    SciKit Learn Functions to Build Pipeline Quickly

    We calculate polynomial features for each degree, and these polynomial features are used to perform linear regression; that is, a model built on polynomial features is fit with linear regression. Let’s use the Pipeline class to stack the polynomial features and linear regression into one pipeline, and then use the pipeline to predict the test data:

    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import cross_val_score
    import matplotlib.pyplot as plt

    for i in range(len(degrees)):
        polynomial_features = PolynomialFeatures(degree=degrees[i], include_bias=False)
        linear_regression = LinearRegression()
        pipeline = Pipeline([("polynomial_features", polynomial_features), ("linear_regression", linear_regression)])
        pipeline.fit(X[:, np.newaxis], y)
        # Evaluate the models using cross-validation
        scores = cross_val_score(pipeline, X[:, np.newaxis], y, scoring="neg_mean_squared_error", cv=10)
        X_test = np.linspace(0, 1, 100)
        plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")

    We can see that the polynomial of degree 1 does not fit the data; this is called underfitting. A polynomial with a degree higher than 3 fits the training data so well that it does not generalize to the test data and ends up with a higher MSE; this is called overfitting. The following plot makes it clear that the MSE is lowest at degree = 3, which is the point of equilibrium between bias, variance, and model complexity. You may visit the bias_variance.py script to run or experiment with the code that produced the plots. The following image is generated by fitting polynomials of different degrees to demonstrate the bias-variance trade-off:

    Figure 1.7: Demonstrating the bias-variance trade-off using polynomials of different degrees.

    Ideally, the plot of error versus model complexity looks like this:

    Figure 1.8: Showing equilibrium between bias and variance.

    As the model’s complexity increases, the bias decreases while the variance increases. The model complexity at which the combined contribution of bias and variance is minimal should be selected for inference. Low complexity means the model is not flexible enough to absorb the complexity of the data, so it suffers from high bias; as we gradually increase the complexity, there is a point where bias and variance together are at a minimum. Increasing the complexity further makes the variance grow, which is the model over-fitting problem.

    Managing the Bias and Variance

    Let’s look at certain measures that can manage the bias and variance problem:

    Bagging and resampling: Bagging refers to creating various samples of the data using random selection with replacement and using each sample to train a separate model. At test time, predictions are made using all these models, and the final prediction is obtained by averaging or voting. This technique is also known as bootstrap aggregating. The Random Forest model makes good use of the bagging technique: numerous trees are constructed, one per sample, and the entire model’s bias is the same as that of a single tree. So, creating a forest of such
