
Software Construction

Regular Expressions and Grammars, Parser Generators
Overview
• Comprehend grammar productions and regular expression operators.
• Evaluate if a given grammar or regular expression matches a character
sequence.
• Create grammar or regular expressions to match character sequences
and parse them into structured data.
Regular Expressions
A regular expression (regex) is a concise pattern or sequence of characters
used for pattern matching and text processing, defining rules to search,
validate, or manipulate text data.
In essence, a regular expression serves as a template for finding, extracting,
or manipulating specific patterns or substrings within a larger body of text.
It consists of a combination of regular characters, metacharacters, and
special symbols that collectively define the pattern to be matched.
Regular Expressions Cont.
• Regular expressions are widely used in tasks such as text search and
replacement, data validation, parsing structured text, and
extracting information from unstructured text.
• They provide a concise and flexible means of handling complex text-
processing requirements.
Key components and concepts
1. Characters: Regular characters (letters, digits, symbols) represent themselves in
patterns (e.g., "a" matches "a"). The regular expression "hello" will match the string
"hello" in the input text.
2. Metacharacters: Special characters with predefined meanings. Some common
metacharacters include . (matches any single character; by default in most engines,
any character except a newline), * (matches zero or more occurrences of the preceding
character or group), + (matches one or more occurrences), and ? (matches zero or one
occurrence).
3. Character Classes: Enclosed in square brackets, they specify sets of characters to
match (e.g., "[0-9]" matches any digit, and "[aeiou]" matches any lowercase vowel).
4. Anchors: Anchors specify the position of a match within the input text.
"^" matches the start of a line, "$" matches the end.
• Regular Characters:
• Expression: "apple"
• Matches: "apple" in "I have an apple."

• Metacharacters:
• Expression: "a.b"
• Matches: "a.b" in "a.b is a regex pattern."
• "acb" (matches)
• "a1b" (matches)
• "axb" (matches)
• "a b" (matches, as the space is considered a character)

• Character Classes:
• Expression: "[0-9]"
• Matches: "7" in "The number is 7."

• Anchors:
• Expression: "^start"
• Matches: "start" at the beginning of a line.
• "start the engine" (matches)
• "starting point" (matches)
• "start with a smile" (matches)
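
The four components above can be checked directly with Python's re module. This is a minimal sketch; the variable names and the extra non-matching inputs ("ab", "restart") are illustrative additions, not part of the slides.

```python
import re

# Regular characters: "apple" matches itself inside a longer string.
literal = re.search(r"apple", "I have an apple.")

# Metacharacter ".": stands for exactly one character, so "ab" does not match.
dot_results = {s: bool(re.fullmatch(r"a.b", s)) for s in ["acb", "a1b", "a b", "ab"]}

# Character class: [0-9] matches a single digit.
digit = re.search(r"[0-9]", "The number is 7.")

# Anchor: "^start" only matches when "start" is at the beginning.
anchored = [bool(re.search(r"^start", s)) for s in ["start the engine", "restart"]]

print(literal.group())  # apple
print(dot_results)      # {'acb': True, 'a1b': True, 'a b': True, 'ab': False}
print(digit.group())    # 7
print(anchored)         # [True, False]
```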


Key components and concepts
5. Quantifiers: Specify repetition: "{n}" (exactly n times), "{n,}" (at least
n times), and "{n,m}" (between n and m times). For example, "{3}" means exactly
three times and "{1,3}" means one to three times.
6. Grouping: Parentheses "()" create sub-patterns for grouping.
7. Alternation: "|" represents choices (e.g., "cat|dog" matches "cat" or "dog").
8. Escape Sequences: A backslash "\" escapes a metacharacter (e.g., "\."
matches a literal period).
• Quantifiers:
• Expression: "\d{3}"
• Matches: "123" in "The code is 12345."

• Grouping:
• Expression: "(apple|banana)"
• Matches: "apple" or "banana" in "I like apple and banana."

• Alternation:
• Expression: "cat|dog"
• Matches: "cat" or "dog" in "I have a cat and a dog."

• Escape Sequences:
• Expression: "\."
• Matches: "." in "The price is $5.99."
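
The quantifier, grouping, alternation, and escape examples above can be reproduced as follows. This is a small sketch using Python's re module; the variable names are chosen here for illustration.

```python
import re

# Quantifier: \d{3} takes the first run of exactly three digits.
quantified = re.search(r"\d{3}", "The code is 12345.").group()

# Grouping: with one capture group, findall returns the captured text.
fruits = re.findall(r"(apple|banana)", "I like apple and banana.")

# Alternation: either alternative may match at each position.
pets = re.findall(r"cat|dog", "I have a cat and a dog.")

# Escape sequence: \. matches a literal period, not "any character".
periods = re.findall(r"\.", "The price is $5.99.")

print(quantified)  # 123
print(fruits)      # ['apple', 'banana']
print(pets)        # ['cat', 'dog']
print(periods)     # ['.', '.']
```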
Example
• The regular expression \d{3}-\d{2}-\d{4} matches a string in the format
"123-45-6789", which is a common pattern for Social Security numbers.

• Regular expressions are an essential tool for tasks like data validation (e.g., email
or phone number validation), text parsing (e.g., extracting data from log files), and
text searching (e.g., finding specific words or patterns in a document). They are
supported in many programming languages, including Python, JavaScript, Java,
and Perl, among others.
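
As a quick illustration of the Social Security number pattern above, the following sketch checks whole strings against it. The helper name looks_like_ssn is hypothetical, and the check only tests the digit layout, not whether the number is real.

```python
import re

SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")

def looks_like_ssn(text):
    """Return True if the whole string has the form ddd-dd-dddd."""
    return SSN_RE.fullmatch(text) is not None

print(looks_like_ssn("123-45-6789"))  # True
print(looks_like_ssn("123-456-789"))  # False
```

Using fullmatch rather than search is a design choice for validation: it rejects strings that merely contain a matching substring.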
Examples
• Matching an Email Address
• Extracting Phone Numbers
• Validating URLs
• Parsing CSV Data
• Matching HTML Tags
Matching an Email Address

Expression: ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$

Matches: john.doe@example.com

Description: Validates email addresses by matching the common email pattern.
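
A minimal sketch of how this expression might be used for validation in Python. The function name is_valid_email is an assumption for illustration. Note that the {2,4} quantifier rejects longer top-level domains, so this pattern is a simplification rather than a complete email validator.

```python
import re

EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")

def is_valid_email(address):
    """Return True if the address fits the simplified pattern above."""
    return EMAIL_RE.match(address) is not None

print(is_valid_email("john.doe@example.com"))  # True
print(is_valid_email("john.doe@example"))      # False: no top-level domain
print(is_valid_email("user@site.museum"))      # False: "museum" exceeds 4 letters
```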
Extracting Phone Numbers

Expression: (\d{3}-\d{3}-\d{4})|(\(\d{3}\)\s*\d{3}-\d{4})

Matches: "555-123-4567" or "(555) 123-4567"

Description: Captures phone numbers in both hyphen and parentheses formats.
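
The extraction can be sketched with re.finditer, which yields every match in a text. The sample sentence and the function name extract_phone_numbers are made up for this example.

```python
import re

PHONE_RE = re.compile(r"(\d{3}-\d{3}-\d{4})|(\(\d{3}\)\s*\d{3}-\d{4})")

def extract_phone_numbers(text):
    """Return every phone number found, in either supported format."""
    # group(0) is the whole match, regardless of which alternative matched.
    return [match.group(0) for match in PHONE_RE.finditer(text)]

sample = "Call 555-123-4567 or (555) 123-4567 today."
print(extract_phone_numbers(sample))  # ['555-123-4567', '(555) 123-4567']
```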
Validating URLs

Expression: ^(https?|ftp)://[^\s/$.?#].[^\s]*$

Matches: https://www.example.com/page

Description: Validates URLs, allowing the HTTP, HTTPS, and FTP protocols.
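
A short sketch applying this expression in Python. The function name is_valid_url and the test inputs are illustrative assumptions. The pattern only checks the overall shape of a URL, not whether the host exists.

```python
import re

URL_RE = re.compile(r"^(https?|ftp)://[^\s/$.?#].[^\s]*$")

def is_valid_url(url):
    """Return True if the URL starts with http, https, or ftp and has no spaces."""
    return URL_RE.match(url) is not None

print(is_valid_url("https://www.example.com/page"))  # True
print(is_valid_url("ftp://files.example.org"))       # True
print(is_valid_url("mailto:someone@example.com"))    # False: unsupported scheme
print(is_valid_url("http://exa mple.com"))           # False: contains a space
```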
Parsing CSV Data

Expression: ("[^"]*"|[^,]+)(,|$)

Matches: "John Doe", "30", "Male" in "John Doe,30,Male"

Description: Parses CSV (Comma-Separated Values) data into individual fields.
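
One way to apply this expression is sketched below; the function name split_csv_line is hypothetical. The pattern skips empty fields (for example, the middle field in "a,,b"), so for real data Python's built-in csv module is usually the safer choice.

```python
import re

FIELD_RE = re.compile(r'("[^"]*"|[^,]+)(,|$)')

def split_csv_line(line):
    """Split one CSV line into fields, removing surrounding double quotes."""
    fields = []
    for match in FIELD_RE.finditer(line):
        field = match.group(1)
        # A quoted field may contain commas; strip the enclosing quotes.
        if len(field) >= 2 and field.startswith('"') and field.endswith('"'):
            field = field[1:-1]
        fields.append(field)
    return fields

print(split_csv_line('John Doe,30,Male'))     # ['John Doe', '30', 'Male']
print(split_csv_line('"Doe, John",30,Male'))  # ['Doe, John', '30', 'Male']
```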
Matching HTML Tags

Expression: <\s*([a-zA-Z0-9]+)\s*>.*<\/\s*\1\s*>

Matches: "<div>Content</div>"

Description: Matches and captures HTML tags and their corresponding closing tags.
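
The backreference \1 in this expression requires the closing tag name to equal the captured opening tag name. The sketch below shows that behavior, along with a caveat: .* is greedy, so with repeated tags the match runs to the last closing tag. The sample strings are illustrative.

```python
import re

TAG_RE = re.compile(r"<\s*([a-zA-Z0-9]+)\s*>.*<\/\s*\1\s*>")

ok = TAG_RE.search("<div>Content</div>")
mismatch = TAG_RE.search("<div>Content</span>")
greedy = TAG_RE.search("<b>one</b> and <b>two</b>")

print(ok.group(1))      # div
print(mismatch)         # None: closing tag name differs from the opening one
print(greedy.group(0))  # <b>one</b> and <b>two</b>
```

Because of cases like the last one, a dedicated HTML parser is generally preferable to a regular expression for nested or repeated markup.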
These examples illustrate the versatility of regular expressions in
tasks such as validation, data extraction, and pattern matching,
combining various components and concepts.
Software Applications That Demonstrate
The Use Of Regular Expressions
Text Editors and IDEs (Python code using re
module):
• import re

• text = "Hello, World! This is a sample text."


• pattern = r"sample \w+"
• result = re.search(pattern, text)
• print(result.group()) # Matches "sample text"
Programming Languages (JavaScript code):
• const text = "Email: john@example.com";
• const pattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}\b/;
• const match = text.match(pattern);
• console.log(match[0]); // Matches "john@example.com"
Database Management (SQL query):

• SELECT * FROM users WHERE email REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$';
Web Development (Python code using
Django):
• import re

• url = "https://www.example.com/page/123"
• pattern = r"/page/(\d+)"
• match = re.search(pattern, url)
• page_number = match.group(1) # Extracts "123"
Data Science and Analytics (Python code
using Pandas):
• import pandas as pd

• data = pd.DataFrame({'text': ['apple123', 'banana456', 'cherry789']})


• pattern = r'(\D+)(\d+)'
• data[['fruit', 'number']] = data['text'].str.extract(pattern)
Network Security (Python code using Scapy
for packet analysis):
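• import re
• from scapy.all import rdpcap, Raw
• packets = rdpcap('network_traffic.pcap')
• # Bytes pattern, because raw packet payloads are bytes
• pattern = rb"user=(\w+)&password=(\w+)"
• for packet in packets:
•     # Only inspect raw payloads that begin with an HTTP POST request
•     if packet.haslayer(Raw) and packet[Raw].load.startswith(b"POST"):
•         match = re.search(pattern, packet[Raw].load)
•         if match:
•             print("Username:", match.group(1).decode())
•             print("Password:", match.group(2).decode())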
Command-Line Tools (Bash script):
• # Extract email addresses from a file
• grep -oE "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}\b" input.txt
Text Processing Software (Python code using
Tesseract OCR)
• from PIL import Image
• import pytesseract

• image = Image.open('sample.png')
• text = pytesseract.image_to_string(image)
Content Management Systems (CMS)
(WordPress URL rewriting):
• RewriteEngine On
• RewriteRule ^news/([0-9]+)/?$ index.php?news_id=$1 [L]
Search Engines (Python code for search query
parsing):
• import re

• query = "apple OR banana NOT cherry"


• pattern = r"\b(AND|OR|NOT)\b|(\w+)"  # group 1: operator, group 2: search term
• tokens = re.findall(pattern, query)
Compilation
Compilation is the process of converting source code into object code. This is done by the compiler.
The compiler checks the source code for syntactic or structural errors, and if there are no errors in
the source code, it generates the target code.
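Process
[Figure: phases of compilation, from a high-level language (HLL) to a low-level language (LLL): Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Intermediate Code Generator (3-address code), Code Optimizer, Target Code Generation, with Error Handling and the Symbol Table alongside the phases.]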
Parser
• A parser is a software component or tool that is used to analyze and
interpret the structure of a given input, typically in the form of text or
code, based on a specific grammar or syntax.
• The primary purpose of a parser is to break down the input into its
constituent elements and determine whether it conforms to the expected
structure defined by the grammar.
Parser
• Parsers can be classified into different types based on their parsing
techniques, including recursive descent parsers, LL parsers, LR parsers,
and more. The choice of parser and parsing technique depends on the
complexity of the language or grammar being parsed and the specific
requirements of the application.
Parser Cont.
• A parser is the compiler component that breaks the data coming from the
lexical analysis phase into smaller elements.

• A parser takes input in the form of a sequence of tokens and produces
output in the form of a parse tree.

• Parsing is of two types: top-down parsing and bottom-up parsing.
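
To make the token-to-tree step concrete, here is a minimal sketch of a top-down (recursive descent) parser. The tiny arithmetic grammar, the function names, and the nested-tuple tree representation are all assumptions made for this example; the tree is a simplified form of a parse tree.

```python
import re

# Hypothetical grammar for this sketch:
#   expr -> term (("+" | "-") term)*
#   term -> NUMBER | "(" expr ")"
TOKEN_RE = re.compile(r"\s*(\d+|[()+\-])")

def tokenize(text):
    """Lexical analysis: turn the input string into a list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_expr(tokens, i=0):
    """Parse an expr starting at tokens[i]; return (tree, next_index)."""
    tree, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in ("+", "-"):
        op = tokens[i]
        right, i = parse_term(tokens, i + 1)
        tree = (op, tree, right)  # left-associative operator node
    return tree, i

def parse_term(tokens, i):
    """Parse a term: a number or a parenthesized expr."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok.isdigit():
        return int(tok), i + 1
    if tok == "(":
        tree, i = parse_expr(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise SyntaxError("expected ')'")
        return tree, i + 1
    raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    """Tokenize and parse a complete input string."""
    tokens = tokenize(text)
    tree, i = parse_expr(tokens)
    if i != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return tree

print(tokenize("1 + (2 - 3)"))  # ['1', '+', '(', '2', '-', '3', ')']
print(parse("1 + (2 - 3)"))     # ('+', 1, ('-', 2, 3))
```

Each grammar rule maps to one function, which is what makes this a top-down parser: parsing starts at the start symbol (expr) and works down toward the tokens. Input that does not conform to the grammar, such as "1 + ", raises SyntaxError.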
Predictive parser
