Data Codes

Er. Prateek Solanki

• Data Forms
• Data conversion and representation
• Data Formats
• Alphanumeric Data
• Image Data
• Audio Data
• Data Input
• Data Compression
• Internal Computer Data Format

• Human communication
Includes language, images and sounds

• Computers
Process and store all forms of data in binary format

• Conversion to computer-usable representation using data formats
Define the different ways human data may be represented, stored and processed by a computer

• Proprietary formats
Unique to a product or company
E.g., Microsoft Word, WordPerfect

• Standards (evolve in two ways):
Proprietary formats become de facto standards (e.g., Adobe PostScript)
Invented by an international standards organization (e.g., Motion Pictures Experts Group, MPEG)

Type of Data                  Standard(s)
Alphanumeric                  Unicode, ASCII, EBCDIC
Image (bitmapped)             GIF (Graphics Interchange Format), TIF (tagged image file format), PNG (portable network graphics), JPEG
Image (object)                PostScript, SWF (Macromedia Flash), SVG
Outline graphics and fonts    PostScript, TrueType
Sound                         WAV, AVI, MP3, MIDI, WMA
Page description              PDF (Adobe Portable Document Format), HTML, XML
Video                         Quicktime, MPEG-2, RealVideo, WMV

• Characters: letters (r, T), number digits (0 to 9), punctuation (e.g., !), special-purpose characters ($, &)

• Four codes/standards to represent letters and numbers:
BCD (Binary-Coded Decimal)
ASCII (American Standard Code for Information Interchange)
EBCDIC (Extended Binary Coded Decimal Interchange Code)
Unicode

BCD (Binary-Coded Decimal)

• Four bits per digit

Digit    Bit pattern
0        0000
1        0001
2        0010
3        0011
4        0100
5        0101
6        0110
7        0111
8        1000
9        1001

Note: the following 6 bit patterns are not used:
1010 1011 1100 1101 1110 1111

 709310 = ? (in BCD) 7 0 9 3 0111 0000 1001 0011 10 .

ASCII (American Standard Code for Information Interchange)

• Developed by ANSI (American National Standards Institute)
• Defined in ANSI document X3.4-1977
• 7-bit code
8th bit is unused (or used for a parity bit or to indicate "extended" character set)
• 2^7 = 128 different codes
• Two general types of codes:
95 are "Printing" codes (displayable on a console)
33 are "Control" codes (control features of the console or communications channel)
• Represents Latin alphabet, Arabic numerals, standard punctuation characters
Plus small set of accents and other European special characters (Latin-I ASCII)
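The counts above (128 codes, 95 printing, 33 control) and the optional parity bit can be checked with a short Python sketch. The slides do not say which parity convention is used, so even parity is an assumption here, and `with_even_parity` is a hypothetical helper name.

```python
# The 7-bit code space: 2**7 = 128 codes.
ALL_CODES = range(2 ** 7)

# Printing codes run from space (0x20) through tilde (0x7E);
# the rest (0x00-0x1F plus DEL, 0x7F) are control codes.
printing = [code for code in ALL_CODES if 0x20 <= code <= 0x7E]
control = [code for code in ALL_CODES if code not in printing]


def with_even_parity(code: int) -> int:
    """Return an 8-bit value whose 8th bit makes the number of 1 bits even.

    Even parity is an assumption of this sketch, not something the slides specify.
    """
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit code")
    parity = bin(code).count("1") % 2
    return (parity << 7) | code


print(len(ALL_CODES), len(printing), len(control))  # 128 95 33
print(format(with_even_parity(ord("a")), "08b"))    # 11100001
print(format(with_even_parity(ord("A")), "08b"))    # 01000001
```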

     0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
000  NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
001  DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
010  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
011  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
100  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
101  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
110  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
111  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL

• Row labels (000 to 111) are the 3 most significant bits; column labels (0000 to 1111) are the 4 least significant bits
• E.g., ‘a’ = 1100001 (row 110, column 0001)
• 95 Printing codes: SP through ~ (rows 010 to 111, except DEL)
• 33 Control codes: rows 000 and 001, plus DEL
• Alphabetic codes: A to Z (rows 100 and 101), a to z (rows 110 and 111)
• Numeric codes: 0 to 9 (row 011)
• Punctuation, etc.: the remaining printing codes
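To make the table lookup concrete, here is a small Python sketch that splits a character's 7-bit code into its row (3 most significant bits) and column (4 least significant bits) and names the group it falls in. The helper names `table_position` and `category` are illustrative only, and the group labels simply mirror the slide titles.

```python
def table_position(char: str) -> tuple:
    """Return (row, column): the 3 most and 4 least significant bits."""
    code = ord(char)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    return format(code >> 4, "03b"), format(code & 0b1111, "04b")


def category(char: str) -> str:
    """Name the group of the ASCII table that a character belongs to."""
    code = ord(char)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    if code < 0x20 or code == 0x7F:
        return "control"
    if char.isalpha():
        return "alphabetic"
    if char.isdigit():
        return "numeric"
    return "punctuation, etc."


print(table_position("a"))  # ('110', '0001')
print(category("a"))        # alphabetic
print(category("7"))        # numeric
print(category("\n"))       # control
print(category("?"))        # punctuation, etc.
```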

ASCII table in hexadecimal: columns give the most significant digit (MSD), rows the least significant digit (LSD)

LSD \ MSD  0    1    2    3    4    5    6    7
0          NUL  DLE  SP   0    @    P    `    p
1          SOH  DC1  !    1    A    Q    a    q
2          STX  DC2  "    2    B    R    b    r
3          ETX  DC3  #    3    C    S    c    s
4          EOT  DC4  $    4    D    T    d    t
5          ENQ  NAK  %    5    E    U    e    u
6          ACK  SYN  &    6    F    V    f    v
7          BEL  ETB  '    7    G    W    g    w
8          BS   CAN  (    8    H    X    h    x
9          HT   EM   )    9    I    Y    i    y
A          LF   SUB  *    :    J    Z    j    z
B          VT   ESC  +    ;    K    [    k    {
C          FF   FS   ,    <    L    \    l    |
D          CR   GS   -    =    M    ]    m    }
E          SO   RS   .    >    N    ^    n    ~
F          SI   US   /    ?    O    _    o    DEL

Example: 74 (hex) = 111 0100 = t

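"Hello, world" in ASCII

Character   ASCII
H           1001000
e           1100101
l           1101100
l           1101100
o           1101111
,           0101100
(space)     0100000
w           1110111
o           1101111
r           1110010
l           1101100
d           1100100

Hex: 919766CDEB1077DFCB664

1001000 1100101 1101100 1101100 1101111 0101100
9 1 9 7 6 6 C D E B
0100000 1110111 1101111 1110010 1101100 1100100
1 0 7 7 D F C B 6 6 4

The 84 bits are regrouped four at a time, so hex digits do not line up with character boundaries (the last 2 bits of the first row carry into the first hex digit of the second row).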
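The packing shown above (concatenate the 7-bit codes, then regroup into 4-bit hex digits) can be reproduced with a short Python sketch. The function name `pack_7bit_to_hex` is hypothetical, and right-padding with zero bits when the total is not a multiple of 4 is an assumption; "Hello, world" needs no padding because 12 x 7 = 84 bits.

```python
def pack_7bit_to_hex(text: str) -> str:
    """Concatenate 7-bit ASCII codes and regroup the bit string into hex digits."""
    for char in text:
        if ord(char) > 0x7F:
            raise ValueError(f"{char!r} is not a 7-bit ASCII character")
    bits = "".join(format(ord(char), "07b") for char in text)
    # Assumption: pad on the right with zeros to reach a multiple of 4 bits.
    bits += "0" * (-len(bits) % 4)
    return "".join(
        format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4)
    )


print(pack_7bit_to_hex("Hello, world"))  # 919766CDEB1077DFCB664
```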

Code    Hexadecimal code    Meaning
CR      0D                  carriage return
LF      0A                  line feed
HT      09                  horizontal tab
DEL     7F                  delete
NULL    00                  null
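These hexadecimal values can be confirmed from Python's own escape sequences. The dictionary name `CONTROL_CHARS` below is just an illustrative label.

```python
# Map each control code's name to the corresponding Python character.
CONTROL_CHARS = {
    "CR": "\r",      # carriage return
    "LF": "\n",      # line feed
    "HT": "\t",      # horizontal tab
    "DEL": "\x7f",   # delete
    "NULL": "\x00",  # null
}

# Format each code point as a two-digit hexadecimal number.
hex_codes = {name: format(ord(char), "02X") for name, char in CONTROL_CHARS.items()}
print(hex_codes)
# {'CR': '0D', 'LF': '0A', 'HT': '09', 'DEL': '7F', 'NULL': '00'}
```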


EBCDIC (Extended Binary Coded Decimal Interchange Code)

• 8-bit code
• Developed by IBM
• IBM and compatible mainframes only
• Rarely used today (common in archival data)
• Character codes differ from ASCII

Character   ASCII (hex)   EBCDIC (hex)
Space       20            40
A           41            C1
b           62            82

• Conversion software to/from ASCII available
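As one example of such conversion software, Python's standard codecs include several EBCDIC code pages. The sketch below assumes code page `cp037` (one EBCDIC variant among several; others may assign some characters differently) and uses a hypothetical helper `hex_code` to reproduce the comparison table.

```python
def hex_code(char: str, encoding: str) -> str:
    """Return the single-byte code of a character as two hex digits."""
    return char.encode(encoding).hex().upper()


# Assumption: cp037 stands in for "EBCDIC"; it is one of several code pages.
for char in (" ", "A", "b"):
    print(repr(char), hex_code(char, "ascii"), hex_code(char, "cp037"))
# ' ' 20 40
# 'A' 41 C1
# 'b' 62 82

# Round trip: text -> EBCDIC bytes -> text.
ebcdic_bytes = "Hello".encode("cp037")
print(ebcdic_bytes.decode("cp037"))  # Hello
```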

Unicode

• Most common 16-bit form represents 65,536 characters
• ASCII Latin-I subset of Unicode
Values 0 to 255 in Unicode table
• Multilingual: defines codes for
Nearly every character-based alphabet
Large set of ideographs for Chinese, Japanese and Korean
Composite characters for vowels and syllabic clusters required by some languages
• Allows software modifications for local-languages
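A brief Python sketch can illustrate two of these points: a 16-bit form has 2^16 = 65,536 possible values, and characters in the ASCII Latin-I range keep the same numeric value in Unicode. Using the `utf-16-be` codec as the "16-bit form" is an assumption of this sketch; the slides do not name a specific encoding.

```python
# A 16-bit code can distinguish 2**16 values.
print(2 ** 16)  # 65536

# Characters in the Latin-I range (0 to 255) have the same value in Unicode.
for char in ("A", "é"):
    latin1_value = char.encode("latin-1")[0]
    print(char, latin1_value, ord(char), latin1_value == ord(char))
# A 65 65 True
# é 233 233 True

# Assumption: utf-16-be stands in for the 16-bit form; each of these
# characters occupies exactly two bytes (four hex digits).
print("A".encode("utf-16-be").hex())   # 0041
print("日".encode("utf-16-be").hex())  # 65e5
```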
