
# Introduction to Computer and Program Design

Lesson 4
Data Types
James C.C. Cheng
Department of Computer Science
National Chiao Tung University
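The basic data types
- There are 13 basic data types in C. The sizes below are typical of 32-bit x86 compilers (Dev-C++, MSC); the C standard fixes only minimum ranges, so check with sizeof on your own platform.

| Name | Size (bytes) | Range |
|---|---|---|
| char | 1 | -128 to 127 |
| unsigned char | 1 | 0 to 255 |
| short | 2 | -32768 to 32767 |
| unsigned short | 2 | 0 to 65535 |
| int | 4 | $-2^{31}$ to $2^{31} - 1$ |
| unsigned int | 4 | 0 to $2^{32} - 1$ |
| long | 4 | $-2^{31}$ to $2^{31} - 1$ |
| unsigned long | 4 | 0 to $2^{32} - 1$ |
| __int64, long long | 8 | $-2^{63}$ to $2^{63} - 1$ |
| unsigned __int64 | 8 | 0 to $2^{64} - 1$ |
| float | 4 | ±(1.175494351e-38 to 3.402823466e38) |
| double | 8 | ±(2.2250738585072014e-308 to 1.7976931348623158e308) |
| long double | 12 in DevC++, 8 in MSC | x86 80-bit extended precision format |

- In C++, bool represents a Boolean value, true or false.
- The size of bool depends on the compiler; it is 1 byte in most cases.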
The basic data types
- sizeof operator
  - Returns the number of bytes occupied by a variable or a data type

    size_t sizeof( name );

  - size_t is an unsigned integer type
    - In a 32-bit environment: a 32-bit unsigned type (e.g. unsigned long)
    - In a 64-bit environment: a 64-bit unsigned type (e.g. unsigned long long)
  - sizeof(type name)
    - ex: sizeof(int); sizeof(double);
  - sizeof(variable name)
    - ex: int x; sizeof(x); double d; sizeof(d);
Integers
- The integer data types are:
  - bool, char, short, int, long, __int64
    - bool in C99: #include <stdbool.h>
  - unsigned char, unsigned short, unsigned int, unsigned long, unsigned __int64
Integers
- Decimal system
  - Each digit ∈ {0, 1, …, 9}
  - $3472_{10} = (3 \times 10^3) + (4 \times 10^2) + (7 \times 10^1) + (2 \times 10^0)$
- Binary system
  - Each digit ∈ {0, 1}
  - Binary → Decimal
    - $1011_2 = (1 \times 2^3) + (0 \times 2^2) + (1 \times 2^1) + (1 \times 2^0) = 11_{10}$
  - Decimal → Binary: divide by 2 repeatedly; the remainders, read from last to first, are the binary digits
    - 13 mod 2 = 1
    - 13/2 = 6, 6 mod 2 = 0
    - 6/2 = 3, 3 mod 2 = 1
    - 3/2 = 1, 1 mod 2 = 1
    - 1/2 = 0 (stop)
    - $13_{10} = 1101_2$
  - Addition: $1 + 0 = 1_2$; $1 + 1 = 10_2$
  - Subtraction: $1 - 0 = 1_2$; $10_2 - 1 = 1_2$
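The repeated-division procedure for decimal → binary works for any base. The following is a minimal sketch in C, assuming bases 2 to 16; the function name `to_base` and its signature are illustrative and not part of the lecture.

```c
#include <limits.h>
#include <stddef.h>

/* Write n in the given base (2..16) into out using repeated division.
 * Returns out, or NULL if the base is unsupported or cap is too small.
 * Hypothetical helper: name and signature are assumptions. */
char *to_base(unsigned long n, unsigned base, char *out, size_t cap)
{
    static const char digits[] = "0123456789ABCDEF";
    char tmp[sizeof(unsigned long) * CHAR_BIT]; /* worst case: base 2 */
    size_t len = 0;
    size_t i;

    if (base < 2 || base > 16)
        return NULL;
    do {
        tmp[len++] = digits[n % base]; /* remainder = next lowest digit */
        n /= base;                     /* quotient feeds the next step  */
    } while (n != 0);
    if (len + 1 > cap)                 /* digits plus terminating '\0'  */
        return NULL;
    for (i = 0; i < len; i++)          /* remainders come out reversed  */
        out[i] = tmp[len - 1 - i];
    out[len] = '\0';
    return out;
}
```

For example, `to_base(13, 2, buf, sizeof buf)` fills `buf` with "1101", matching the worked division steps.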
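Integers
- Hexadecimal system
  - Each digit ∈ {0-9, A, B, C, D, E, F}
  - Hex → Decimal
    - $\mathrm{3A7C}_{16} = (3 \times 16^3) + (10 \times 16^2) + (7 \times 16^1) + (12 \times 16^0) = 14972_{10}$
  - Decimal → Hex
    - Method 1: the same as decimal → binary
    - Method 2: decimal → binary → hex
  - Addition
    - $(2 + 7)_{16} = 9_{16}$
    - $(1 + 9)_{16} = \mathrm{A}_{16}$
    - $(2 + \mathrm{B})_{16} = \mathrm{D}_{16}$
    - $(9 + 9)_{16} = 12_{16}$
  - Subtraction
    - $(5 - 2)_{16} = 3_{16}$
    - $(\mathrm{F} - 2)_{16} = \mathrm{D}_{16}$
    - $(10 - 1)_{16} = \mathrm{F}_{16}$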
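Integers
- Binary → Hex
  - Group the bits in fours, starting from the least significant digit
    - 11001110101101 = 0011,0011,1010,1101
  - Look up each group in the hex ↔ binary table
    - 0011,0011,1010,1101 = 33AD

| Hex | Binary |
|---|---|
| 0 | 0000 |
| 1 | 0001 |
| 2 | 0010 |
| 3 | 0011 |
| 4 | 0100 |
| 5 | 0101 |
| 6 | 0110 |
| 7 | 0111 |
| 8 | 1000 |
| 9 | 1001 |
| A | 1010 |
| B | 1011 |
| C | 1100 |
| D | 1101 |
| E | 1110 |
| F | 1111 |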
Integers
- The limits of unsigned integers
  - 1 bit: 0, 1
  - …
  - 4 bit: $0 \sim (2^4 - 1) = 0 \sim 15$
  - 1 byte: $0 \sim (2^8 - 1) = 0 \sim 255$
  - 2 byte: $0 \sim (2^{16} - 1) = 0 \sim 65{,}535$
  - 4 byte: $0 \sim (2^{32} - 1) = 0 \sim 4{,}294{,}967{,}295$
  - 8 byte: $0 \sim (2^{64} - 1) = 0 \sim 18{,}446{,}744{,}073{,}709{,}551{,}615$

    #include <limits.h>

    unsigned char uc = UCHAR_MAX;
    unsigned short us = USHRT_MAX;
    unsigned int u = UINT_MAX;
    unsigned long ul = ULONG_MAX;
    unsigned long long ull = ULLONG_MAX;
Integers
- The limits of signed integers

    #include <limits.h>

    char c = CHAR_MIN;
    c = CHAR_MAX;
    short s = SHRT_MIN;
    s = SHRT_MAX;
    int i = INT_MIN;
    i = INT_MAX;
    long l = LONG_MIN;
    l = LONG_MAX;
    long long ll = LLONG_MIN;
    ll = LLONG_MAX;
Integers
- Signed integer
  - How to store a signed integer?
  - Method 1: Sign bit

        1 0 0 1 1 0 1 1 = -27   (sign bit = 1)
        0 0 0 1 1 0 1 1 = 27    (sign bit = 0)

    - The range is $\pm(0 \sim 2^7 - 1) = \pm(0 \sim 127)$
    - Pros: simple
    - Cons:
      - Negative zero
      - Plain binary addition gives wrong results: $1001{,}1011_2 + 0001{,}1011_2 = 1011{,}0110_2$, i.e. -27 + 27 = -54 …!?
Integers
- Signed integer
  - Method 2: One's complement
    - $27 = 0001{,}1011_2$
    - $-27 = -(0001{,}1011_2)$ → invert every bit (0 ↔ 1) → $1110{,}0100_2$
    - The range is $\pm(0 \sim 2^7 - 1) = \pm(0 \sim 127)$
    - Pros:
      - Simple
      - No addition and subtraction problems (a carry out of the top bit must be added back to the lowest bit)
        - $1110{,}0100_2 + 0001{,}1011_2 = 1111{,}1111_2$, i.e. -27 + 27 = -0
        - $1110{,}0100_2 + 0000{,}0001_2 = 1110{,}0101_2$, i.e. -27 + 1 = -26
    - Cons:
      - Negative zero
Integers
- Signed integer
  - Method 3: Two's complement
    - $27 = 0001{,}1011_2$
    - $-27 = 1{,}0000{,}0000_2 - 0001{,}1011_2 = 1110{,}0101_2$
    - The range is $-2^7 \sim (2^7 - 1) = -128 \sim 127$
    - Pros:
      - No negative zero
      - No addition and subtraction problems
        - $1110{,}0101_2 + 0001{,}1011_2 = 1{,}0000{,}0000_2$; keeping only 8 bits gives $0000{,}0000_2$, i.e. -27 + 27 = 0
        - $1110{,}0101_2 + 0000{,}0001_2 = 1110{,}0110_2 = -(1{,}0000{,}0000_2 - 1110{,}0110_2) = -(0001{,}1010_2)$, i.e. -27 + 1 = -26
    - Cons:
      - It needs more conversion cost
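The one's- and two's-complement encodings can be reproduced on 8-bit values with bitwise operators. This is a small sketch, assuming the fixed-width type `uint8_t` from `<stdint.h>`; the function names are illustrative.

```c
#include <stdint.h>

/* One's complement: invert every bit. */
uint8_t ones_complement8(uint8_t x)
{
    return (uint8_t)~x;
}

/* Two's complement: invert every bit and add 1, which equals
 * 1,0000,0000 (binary) minus x, truncated to 8 bits. */
uint8_t twos_complement8(uint8_t x)
{
    return (uint8_t)(~x + 1u);
}
```

With x = 0x1B (27), `ones_complement8` gives 0xE4 (1110,0100) and `twos_complement8` gives 0xE5 (1110,0101), the same bit patterns as in the worked examples.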
Integers
- Constants:
  - bool bT = true, bF = false, bTrue = 5438, bFalse = 0;
    - Zero is false; any non-zero value is true.
  - char cA = 'A', cB = 66, cC = 65603, cD = 0x43, cE = 0105, cEnter = '\n';
    - Hex constant: use the prefix "0x"
    - Octal constant: use the prefix "0"
    - 65603 (0x10043) does not fit in a char; typically only the low byte 0x43 ('C') is kept
  - short i = 10, j = 32767, k = -32768, s = 32768, t = -32769;
    - 32768 and -32769 are outside the range of a 2-byte short; they typically wrap around to -32768 and 32767
  - int x = 10, y = -100, z = 0x02AB2F;

    int nMax = 2147483647;
    int nMinA = -2147483648; // It causes a warning message:
                             // -2147483648 is parsed as -(2147483648)
    int nMinB = (-2147483647 - 1); // OK!
Integers
- Constants:
  - long n = 120, m = 12345L, a = 0xFFFFL;
    - long integer constant: use the suffix "L"
  - __int64 nn = 10, mm = 9876543210LL;
    - long long constant: use the suffix "LL"
    - 64-bit integers in scanf and printf: use the length modifier ll or I64 (I64 is specific to Microsoft's C runtime)

    __int64 nn = 10;
    long long mm = 9876543210LL;
    scanf("%I64d%lld", &nn, &mm);
    printf("%lld, %I64d \n", nn, mm); // OK, using "ll" or "I64"
    // Notice that the "l" is lowercase. "L" means "long double", so %Ld is treated as %d

    nn = 7; mm = 1;
    printf("%d, %I64d \n", nn, mm); // In x86, the second output is wrong!?

  - Bytes of the two arguments as passed on 32-bit x86 (little-endian):

        nn: 07 00 00 00 00 00 00 00
        mm: 01 00 00 00 00 00 00 00

  - %d consumes only the first 4 bytes of nn, so %I64d then reads the upper 4 bytes of nn followed by the lower 4 bytes of mm.
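One portable way to avoid mismatched 64-bit format specifiers is the `PRId64` macro from C99's `<inttypes.h>`, which expands to the right length modifier for `int64_t` on the current platform. The sketch below is a minimal illustration; the helper name `format_i64` is an assumption, not something from the lecture.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit signed integer into buf using the portable PRId64
 * macro instead of a compiler-specific "%lld" or "%I64d".
 * Returns the number of characters snprintf would have written. */
int format_i64(char *buf, size_t cap, int64_t v)
{
    return snprintf(buf, cap, "%" PRId64, v);
}
```

The same pattern applies directly to printf, for example `printf("%" PRId64 "\n", mm);` when mm is declared as `int64_t`.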
Integers
- Constants:
  - unsigned char, unsigned short and unsigned int

    unsigned char uc = -1;
    unsigned short us = -1;
    unsigned int ui = -1;
    printf("%u, %u, %u\n", uc, us, ui);

  - unsigned long: the suffix is "UL"

    unsigned long un0 = 123UL, un1 = -1UL;
    printf("%lu, %lu\n", un0, un1); // %lu matches unsigned long

  - unsigned __int64:
    - In Dev C++, the suffix is "LLU"
    - In VC++, the suffix is "ULL"

    unsigned long long unn0 = 9876543210000001234LLU;
    unsigned long long unn1 = -1LLU;
    printf("%llu, %I64u\n", unn0, unn1);
Integers
- integers in printf and scanf
  - In printf
    - An argument of 4 bytes or fewer is passed as 4 bytes
    - long long is passed as 8 bytes
  - %d: signed decimal integer
  - %i: signed decimal integer (in scanf, it also accepts hex and octal input)
  - %u: unsigned decimal integer
  - %o: unsigned octal integer
  - %x: unsigned hex integer in lowercase (in scanf, it accepts both lowercase and uppercase)
  - %X: unsigned hex integer in uppercase (only in printf)
  - %c: character
  - %p: address in hex digits; 8 digits on 32-bit, 16 digits on 64-bit (only in printf)
  - %n: stores the number of characters written/read so far; the argument shall be a pointer to an integer.
    - In VC++, this feature is disabled.
Integers
- integers in printf and scanf
  - data length:
    - hh: char
    - h: short
    - l: long
    - ll: long long

    char c = -1;
    printf("%02hhX\n", c); // Failed in DevC++
    short s = -1;
    printf("%04hX\n", s);
    long l = -1L;
    printf("%08lX\n", l);
    long long ll = -1LL;
    printf("%016llX\n", ll);
Integers
- Implicit typecast
  - small → large

    short s1 = 10;
    int n1 = 0xFF0000;
    if(n1 > s1) // short → int
        printf("Hello\n");

    long long nn = 0x7FFFFFFF00000000ULL;
    if(nn > n1) // int → long long
        printf("World!\n");

  - signed → unsigned

    unsigned int u = 0;
    int i = -1;
    if(i > u) // int → unsigned int
        printf("-1 > 0\n");
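The signed → unsigned conversion can be sidestepped by checking the sign before comparing. The sketch below contrasts the implicit conversion with one possible guarded comparison; both function names are hypothetical.

```c
/* Naive comparison: i is implicitly converted to unsigned int,
 * so -1 becomes UINT_MAX and compares greater than 0. */
int naive_greater(int i, unsigned int u)
{
    return i > u;
}

/* Guarded comparison: a negative i can never exceed an unsigned value,
 * so only convert once i is known to be non-negative. */
int safe_greater(int i, unsigned int u)
{
    return i >= 0 && (unsigned int)i > u;
}
```

Here `naive_greater(-1, 0u)` returns 1, reproducing the "-1 > 0" surprise, while `safe_greater(-1, 0u)` returns 0.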
Floating-point numbers
- float: 32-bit IEEE-754 floating-point number
- double: 64-bit IEEE-754 floating-point number
- long double
  - In VC++, long double = double
  - In GCC 4.3 or above, sizeof(long double) is 12 bytes
    - In an x86 environment, only 10 of them are used
    - Why does it need 12 bytes? In a 32-bit environment, the data access unit is 4 bytes.
    - x86 80-bit extended precision format
Floating-point numbers
- Decimal fraction → Binary?
  - $0.5_{10} = 2^{-1} = 0.1_2$
  - $0.25_{10} = 2^{-2} = 0.01_2$
  - $0.125_{10} = 2^{-3} = 0.001_2$
  - …
- $0.1011_2 = (1 \times 2^{-1}) + (0 \times 2^{-2}) + (1 \times 2^{-3}) + (1 \times 2^{-4}) = 0.5_{10} + 0.125_{10} + 0.0625_{10} = 0.6875_{10}$
- $0.375_{10} = 0.011_2$: multiply by 2 repeatedly and collect the integer parts
  - 0.375 × 2 = 0.75 → 0
  - 0.75 × 2 = 1.5 → 1
  - 0.5 × 2 = 1.0 → 1
- Some decimal fractions cannot be represented exactly with a finite number of binary digits
  - $0.4_{10} = 0.011001100110\ldots_2 = 0.\overline{0110}_2$
Floating-point numbers
- Decimal fraction → Binary?
  - integer part → binary
  - fraction part → binary
- $13.5625_{10} = 1101.1001_2$
  - $13_{10} = 1101_2$
  - $0.5625_{10} = 0.1001_2$
Floating-point numbers
- IEEE 754 (The IEEE Standard for Floating-Point Arithmetic)
  - IEEE: Institute of Electrical and Electronics Engineers
  - 32-bit IEEE-754
    - Sign: 1 bit
    - Exponent: 8 bits (offset 127)
    - Fraction (mantissa): 23 bits
  - 64-bit IEEE-754
    - Sign: 1 bit
    - Exponent: 11 bits (offset 1023)
    - Fraction (mantissa): 52 bits
Floating-point numbers
- IEEE 754 (The IEEE Standard for Floating-Point Arithmetic)
  - Example: $13.5625_{10} = 1101.1001_2 = 1.1011001_2 \times 2^3$
  - For 32-bit float:
    - $= (-1)^0 \times 1.1011001_2 \times 2^{130-127}$
    - Sign = 0
    - Exponent = 130 = $10000010_2$
    - Fraction = $10110010\ldots0_2$
    - Bit pattern: 0100,0001,0101,1001,0000,0000,0000,0000 = 0x41590000
  - For 64-bit float:
    - $= (-1)^0 \times 1.1011001_2 \times 2^{1026-1023}$
    - Sign = 0
    - Exponent = 1026 = $100{,}0000{,}0010_2$
    - Fraction = $10110010\ldots0_2$
    - Bit pattern: 0100,0000,0010,1011,0010,0000,…,0000 = 0x402B20...0
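The three fields of a 32-bit float can be extracted with shifts and masks once the bits are copied into an integer. The following is a sketch that assumes `float` is a 32-bit IEEE-754 value (true on mainstream platforms but not guaranteed by the C standard); the type `FloatFields` and the function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three IEEE-754 single-precision fields. */
typedef struct {
    unsigned sign;      /* 1 bit                     */
    unsigned exponent;  /* 8 bits, biased by 127     */
    uint32_t fraction;  /* 23 bits, without hidden 1 */
} FloatFields;

/* Copy the bytes of a float into a 32-bit integer.
 * Assumption: sizeof(float) == 4 and IEEE-754 layout. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

FloatFields split_float(float f)
{
    uint32_t bits = float_bits(f);
    FloatFields r;
    r.sign = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.fraction = bits & 0x7FFFFFu;
    return r;
}
```

For 13.5625f, `float_bits` yields 0x41590000, and `split_float` reports sign 0, exponent 130, and fraction 0x590000 (1011001 followed by sixteen zeros).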
Floating-point numbers
- x86 80-bit extended precision format (for gcc's long double)
  - Sign: 1 bit
  - Exponent: 15 bits (offset 16383)
  - Integer part: 1 bit
    - In 80387 or above, this bit is always 1
  - Fraction: 63 bits
  - Layout: Sign | Exponent | Integer | Fraction
- $\text{Value} = (-1)^{\text{sign}} \times 2^{\text{exponent} - 16383} \times (1.\text{fraction})_2$
- Reference:
  - http://en.wikipedia.org/wiki/Extended_precision
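Floating-point numbers
- $13.5625_{10} + 0.15625_{10} = 13.71875_{10}$
  - $= (-1)^0 \times 2^3 \times 1.1011001_2 + (-1)^0 \times 2^{-3} \times 1.01_2$
  - $= 2^3 \times 1.1011001_2 + 2^3 \times 2^{-6} \times 1.01_2$
  - $= 2^3 \times 1.1011001_2 + 2^3 \times 0.00000101_2$
  - $= 2^3 \times 1.10110111_2$
  - $= 13.71875_{10}$
- $13.5625_{10} - 0.15625_{10} = 13.40625_{10}$
  - $= (-1)^0 \times 2^3 \times 1.1011001_2 + (-1)^1 \times 2^{-3} \times 1.01_2$
  - $= 2^3 \times 1.1011001_2 + (-1) \times 2^3 \times 2^{-6} \times 1.01_2$
  - $= 2^3 \times 1.1011001_2 + (-1) \times 2^3 \times 0.00000101_2$
  - $= 2^3 \times 1.1011001_2 + 2^3 \times 1.11111011_2$ (2's complement of $0.00000101_2$)
  - $= 2^3 \times 11.10101101_2$ (overflow; the carry is discarded) $= 2^3 \times 1.10101101_2$
  - $= 13.40625_{10}$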
Floating-point numbers
- Truncation and Rounding
  - Example: $1234567.2_{10}$
    - $= 1{,}0010{,}1101{,}0110{,}1000{,}0111.0011{,}0011{,}0011\ldots_2$
    - $= 1.0010{,}1101{,}0110{,}1000{,}0111\;0011{,}0011{,}0011\ldots_2 \times 2^{20}$
    - For 32-bit float: $= (-1)^0 \times 1.0010{,}1101{,}0110{,}1000{,}0111\;0011{,}0011{,}0011\ldots_2 \times 2^{147-127}$
    - Sign = 0
    - Exponent = 147 = $10010011_2$
    - Fraction (23 bits), from 0010,1101,0110,1000,0111 0011,0011,0011…
      - Truncation: 0010,1101,0110,1000,0111 001
      - Rounding: 0010,1101,0110,1000,0111 010 (1234567.25)
  - Example: $1234567.15_{10}$
    - $= 1{,}0010{,}1101{,}0110{,}1000{,}0111.0010{,}0110{,}0110\ldots_2$
    - $= 1.0010{,}1101{,}0110{,}1000{,}0111\;0010{,}0110{,}0110\ldots_2 \times 2^{20}$
    - For 32-bit float: $= (-1)^0 \times 1.0010{,}1101{,}0110{,}1000{,}0111\;0010{,}0110{,}0110\ldots_2 \times 2^{147-127}$
    - Sign = 0
    - Exponent = 147 = $10010011_2$
    - Fraction (23 bits), from 0010,1101,0110,1000,0111 0010,0110,0110…
      - Truncation and rounding agree: 0010,1101,0110,1000,0111 001 (1234567.125)
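The rounding behaviour in these two examples can be observed by converting a double to float and back. This is a minimal sketch, assuming IEEE-754 arithmetic with the default round-to-nearest mode; the helper names `to_float` and `rounding_error` are illustrative.

```c
/* Round a double to the nearest float. The volatile store forces the
 * value to be narrowed to 32 bits even if the compiler would otherwise
 * keep it in a wider register. */
float to_float(double d)
{
    volatile float f = (float)d;
    return f;
}

/* How far the stored float is from the value that was requested. */
double rounding_error(double d)
{
    return (double)to_float(d) - d;
}
```

Under those assumptions, `to_float(1234567.2)` equals 1234567.25f and `to_float(1234567.15)` equals 1234567.125f, so the stored values are off by about +0.05 and -0.025 respectively.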
Floating-point numbers
- Limits: float values (b = bias)

| Sign | Exponent (e) | Fraction (f) | Value |
|---|---|---|---|
| 0 | 00..00 | 00..00 | +0 |
| 0 | 00..00 | 00..01 ~ 11..11 | Positive denormalized real: $0.f \times 2^{-b+1}$ |
| 0 | 00..01 ~ 11..10 | XX..XX | Positive normalized real: $1.f \times 2^{e-b}$ |
| 0 | 11..11 | 00..00 | +Infinity |
| 0 | 11..11 | 00..01 ~ 11..11 | NaN |
| 1 | 00..00 | 00..00 | -0 |
| 1 | 00..00 | 00..01 ~ 11..11 | Negative denormalized real: $-0.f \times 2^{-b+1}$ |
| 1 | 00..01 ~ 11..10 | XX..XX | Negative normalized real: $-1.f \times 2^{e-b}$ |
| 1 | 11..11 | 00..00 | -Infinity |
| 1 | 11..11 | 00..01 ~ 11..11 | NaN |
Floating-point numbers
- Limits

    #include <float.h>

    float f = FLT_MIN;
    f = FLT_MAX;
    double d = DBL_MIN;
    d = DBL_MAX;
    long double ld = LDBL_MIN;
    ld = LDBL_MAX;
    printf("%f\n", f);
    printf("%f\n", d);
    printf("%Lf\n", ld); // only in GCC 4.3 or above
Floating-point numbers
- printf()
  - float arguments are converted to double
    - %f, %e, %E, %g, and %G need an 8-byte argument
  - Format: %[width][.precision]{f|e|E|g|G}
    - width: the minimum number of characters printed; if width is not given, all characters of the value are printed.
    - precision:
      - for f/e/E: the number of digits after the decimal point
      - for g/G: the maximum number of significant digits printed
      - default is six

    float r = 0.00000123;
    printf("%10.8f, %4.2e, %4.2g\n", r, r, r);

    Output (older Microsoft C runtimes print three exponent digits; other libraries print two, e.g. 1.23e-06):
    0.00000123, 1.23e-006, 1.2e-006
Floating-point numbers
- scanf()
  - %f: float, 4-byte data
  - %lf: double, 8-byte data
  - %Lf: long double
    - Only in GCC 4.3 or above, in a *.c file
    - G++ does not accept %Lf
Floating-point numbers
- Constants:
  - float f = 1.1234f;
    - Use the suffix "f"
  - double r0 = 1.12345, r1 = 123, r2 = 0x64, r3 = 123LLU;
    - Without any prefix or suffix
- Typecast
  - integer → float

    float fx = 33.3f;
    printf("%f\n", fx * 2 / 3);

  - float → double

    #include <float.h>

    float fx = 0.0f;
    double dx = fx + DBL_MIN;
    printf("%f\n", dx);
Floating-point numbers
- Never test a floating-point number for exact equality with a value

    float A = 1.2f;
    float B = 12.0f;
    scanf("%f%f", &A, &B); // type 1.2 and 12
    float D = A - B / 10.0f;
    // D = 1.2 - 12/10.0 should be zero
    if(D == 0.0) // Oops!
        printf("!\n");
    else
        printf("%f\n", D); // This line may be printed, depending on how intermediate results are rounded
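A common alternative to `==` is to accept any difference smaller than a tolerance. The sketch below is one possible formulation, not the only one: the name `nearly_equal` and the choice to scale the tolerance by the larger magnitude (with a floor of 1) are assumptions made for illustration.

```c
/* Absolute value without needing <math.h>. */
static float abs_f(float x)
{
    return x < 0.0f ? -x : x;
}

/* Return 1 if a and b differ by no more than tol, scaled by the larger
 * of |a|, |b| and 1 so the test behaves sensibly for both small and
 * large magnitudes. Hypothetical helper for illustration. */
int nearly_equal(float a, float b, float tol)
{
    float diff = abs_f(a - b);
    float scale = abs_f(a) > abs_f(b) ? abs_f(a) : abs_f(b);
    if (scale < 1.0f)
        scale = 1.0f;
    return diff <= tol * scale;
}
```

In the equality example, `if (nearly_equal(D, 0.0f, 1e-6f))` would take the "zero" branch whether or not D came out as exactly 0. The right tolerance depends on the application.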
Floating-point numbers
- Be careful when using a floating-point number as a loop counter

    float i;
    for(i=0.0f; i<=1.0f; i+=0.1f)
        printf("%f\n", i);
    // How many lines will be displayed?
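Because 0.1 has no exact binary representation, the accumulated sum drifts and the number of iterations of such a loop is hard to predict. One way around this, sketched below under the assumption that the step count is known in advance, is to count with an integer and derive the floating-point value from it; the function name `sample_unit_interval` is hypothetical.

```c
#include <stddef.h>

/* Fill out[0..steps] with evenly spaced values from 0.0 to 1.0.
 * The loop counter is an integer, so exactly steps + 1 values are
 * produced. steps must be greater than 0, and out must have room
 * for steps + 1 floats. */
size_t sample_unit_interval(float *out, size_t steps)
{
    size_t k;
    for (k = 0; k <= steps; k++)
        out[k] = (float)k / (float)steps; /* computed fresh, no accumulation */
    return steps + 1;
}
```

Calling it with steps = 10 always yields 11 values, the last of which is exactly 1.0f.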
Union
- One memory space that can be accessed as different data types
- The size of the union is at least the size of the largest member.

    #include <stdio.h>

    union U{
        unsigned long i;
        float f;
    };

    int main(){
        union U u; // "U u;" without the union keyword is valid only in C++
        u.f = 13.5625f;
        printf("%08lX\n", u.i); // 41590000 (where unsigned long is 4 bytes)
        printf("%08X\n", u.f); // In x86: 00000000
        printf("%08X%08X\n", u.f); // In x86: 00000000402B2000
        // Three printing methods for float. Which one is the best?
    }
Characters
- ASCII (American Standard Code for Information Interchange)
  - '0'~'9': 48~57
  - 'A'~'Z': 65~90
  - 'a'~'z': 97~122
  - http://en.wikipedia.org/wiki/ASCII

    char a = 50, b = 70, c = 100;
    printf("%c%c%c\n", a, b, c);

- Constants:
  - Use single quotation marks ' '

    char a = 'A';
    printf("%c: %d\n", a, a);
    a = '9';
    printf("%c: %d\n", a, a);
    a = 'xyz'; // multi-character constant: its value is implementation-defined
    printf("%c: %d\n", a, a);
    a = "uvw"; // Compiling error!
Characters
- Escape Sequences
  - \n: (10) newline
  - \t: (9) tab
  - \b: (8) backspace
  - \r: (13) return key
    - In MS Windows, the end of a line consists of two characters: \r\n (13, 10)
  - \': single quotation mark
  - \": double quotation mark
  - \\: backslash
  - \0: null
  - \OOO: OOO is a 3-digit octal ASCII code
    - char e = '\050';
  - \xOO: OO is a 2-digit hex ASCII code
    - char h = '\x41';

    char a = '\n', b = '\t', c = '\b';
    char d = '\r', i = '\'', j = '\"';
    char f = '\\', g = '\0';
Strings
- A character array
  - The last character of a string must be \0
- Constant string: use double quotation marks " "
- String declaration and initialization:

    char s1[5] = "abcd"; // a writable char array
    char s2[5] = {'A', 'B', 'C', 'D', 0}; // a writable char array
    char s3[] = "xyz"; // a writable char array
    char s4[] = { 'X', 'Y', 'Z', 0}; // a writable char array
    char* s5 = "uvw"; // a char pointer that points to read-only data
    s5[0] = 'T'; // Runtime error (undefined behavior, typically a crash)
    char* s6 = { 'U', 'V', 'W', 0}; // Compiler error
Strings
- Misuse

    char s1[] = "abcd";
    char s2[] = "ABCD";
    s1 = s2;
    // Compiling error! The array name cannot be an l-value

    char *s1 = "abcd";
    char s2[] = "ABCD";
    printf("%s\n%s\n", s1, s2);
    s1 = s2; // The pointer can be an l-value
    printf("%s\n%s\n", s1, s2); // ABCD, ABCD
    s2[0] = 'T'; // Watch the side effect!
    printf("%s\n%s\n", s1, s2); // TBCD, TBCD

- Side effect
  - An expression has a side effect when, besides returning a value, it modifies some observable state.
  - Here s1 and s2 refer to the same array after the assignment, so writing through s2 also changes what s1 prints.
Strings
- scanf()

    char A[20] = {0}; // Declare a string of 20 zero characters
    scanf("%s", A); // type "ABC EFG XYZ"
    printf("%s\n", A); // only "ABC" is stored: %s stops at whitespace

    char A[20] = {0}, B[20] = {0}, C[20] = {0};
    scanf("%s%s%s", A, B, C); // type "ABC EFG XYZ"
    printf("%s %s %s\n", A, B, C);

  - %[ ]: Read only the specified characters
    - Input ends when a non-matching character is reached or the field width is reached.

    char A[20] = {0}, B[20] = {0}, C[20] = {0};
    scanf("%[0-9]%[A-Z]%[a-z]", A, B, C);
    printf("%s%s%s\n", A, B, C);

    scanf("%[ -~, '\t']", A); // Read ASCII 9 and 32 to 126
    printf("%s\n", A);
    scanf("%[ -~, '\t', '\n' ]", A); // Oops….
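The scanset conversions can be tried without keyboard input by using sscanf on a string, and a field width keeps each read inside its 20-byte buffer. This is a minimal sketch; the function name `split_fields` is an assumption made for illustration.

```c
#include <stdio.h>

/* Split a string such as "123ABCxyz" into a run of digits, a run of
 * uppercase letters, and a run of lowercase letters. The width 19
 * leaves room for the terminating '\0' in each 20-byte buffer.
 * Returns the number of fields that were successfully read. */
int split_fields(const char *in, char digits[20], char upper[20], char lower[20])
{
    return sscanf(in, "%19[0-9]%19[A-Z]%19[a-z]", digits, upper, lower);
}
```

With the input "123ABCxyz" the function returns 3. With "12abc" it returns 1, because %[A-Z] stops the scan at the first non-matching character.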
Strings
- scanf()
  - %[^ ]: Read all characters except the specified ones

    char A[20] = {0}, B[20] = {0}, C[20] = {0};
    scanf("%[^0-9]%[^A-Z]%[^a-z]", A, B, C);
    printf("%s %s %s\n", A, B, C);

    scanf("%[^'\n']", A); // Read all characters except '\n' (and ', since the quotes are taken literally)
    printf("%s\n", A);
Strings
- gets()
  - char* gets ( char *s);
  - It reads characters from stdin and stores them into s until a newline character or the end-of-file is reached.
  - On success, it returns s. Otherwise, it returns NULL.

    #include <stdio.h>

    char A[100] = {0};
    while( gets(A) != NULL ){
        printf("%s\n", A);
    }
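gets() cannot be told how large the buffer is, and it was removed from the language in C11. A bounded alternative is fgets(), which keeps the trailing newline. The sketch below wraps it so the result resembles what gets() returned; the helper names `chomp` and `read_line` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Cut the string at the first newline, if there is one. */
void chomp(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}

/* Read at most cap - 1 characters of one line from in into buf and
 * drop the trailing newline. Returns buf, or NULL on end-of-file or
 * error. Lines longer than the buffer are returned in pieces. */
char *read_line(char *buf, size_t cap, FILE *in)
{
    if (fgets(buf, (int)cap, in) == NULL)
        return NULL;
    chomp(buf);
    return buf;
}
```

The gets() loop then becomes `while (read_line(A, sizeof A, stdin) != NULL) printf("%s\n", A);`.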
Typecast
- Change the data type of a variable
  - x = (x's type name) y;
- Ex:

    int n = 1234;
    float f = (float) n;
    char c = (char) f;
    double d = (double) c;
#define
- Text replacement
- Usage:

    #define name replacement_text

- Ex:

    #define PI 3.14159
    #define EXP 2.71828
    #define NULL_STR

    double p = PI;
    printf("%f\n", p); // 3.14159
    printf("%f\n", EXP); // 2.71828
    printf("%f\n", NULL_STR); // Compile Error
    printf("%f\n", NULL_STREXP); // Compile Error
    printf("%f\n", NULL_STR EXP); // 2.71828
#define
- Macro
- Ex:

    #define MIN_INT(x) (-2147483647 - 1)
    #define INC(x) (++x)
    #define MIN(x, y) (x<y?x:y)
    #define _MIN(x, y) x<y?x:y
    // Notice: _MIN has no outer parentheses

    float x = PI * MIN(2.0f, 3.0f); // x = 6.28318
    float y = PI * _MIN(2.0f, 3.0f); // y = 3.0 !?
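The surprising result comes from operator precedence after text replacement: `PI * _MIN(2.0f, 3.0f)` expands to `PI * 2.0f<3.0f?2.0f:3.0f`. The sketch below compares an unparenthesized macro, a fully parenthesized one, and a plain function; all three names are illustrative.

```c
/* Unparenthesized: the surrounding expression can regroup the operands. */
#define MIN_UNSAFE(x, y) x < y ? x : y

/* Fully parenthesized: safe against precedence, but each argument may
 * still be evaluated twice. */
#define MIN_SAFE(x, y) (((x) < (y)) ? (x) : (y))

/* A function evaluates each argument exactly once. */
static inline int min_int(int a, int b)
{
    return a < b ? a : b;
}
```

For instance, `2 * MIN_UNSAFE(2, 3)` evaluates as `(2 * 2 < 3) ? 2 : 3`, which is 3, whereas `2 * MIN_SAFE(2, 3)` is 4. Passing `i++` to MIN_SAFE increments i twice when it is the smaller argument, which min_int avoids.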
#define
- Do not use #define to define a data type
- Ex:

    #define uint unsigned int
    uint a, b;
    // OK! a and b are unsigned int

    char *s1 = " Hello ", *s2 = " World ";

    #define CSTR char *
    CSTR s3 = " OK ", s4 = " oops ";
    // s4 is not a char *: the line expands to char * s3 = " OK ", s4 = " oops ";
typedef
- To define a data type
- Usage:

    typedef original_typename new_typename; // note: the semicolon is required

- Ex:

    typedef unsigned int uint;
    uint a, b; // OK! a and b are unsigned int

    typedef char * cstr;
    cstr sA = "Hello", sB = "world!";
    printf("%s, %s\n", sA, sB); // OK!

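- Design a function named inverseN that reverses an unsigned long

- Given a double array A that stores the results of some mathematical function, i.e. A[i] = f(ix).

    void dev(const double* A, double* B, int n);

    A[i] = 0 if i < 0 or i >= n

- Design generic min & max functions that can handle all of the basic C data types

    void GMin(const void *pa, const void *pb, void *pOut, size_t n);
    void GMax(const void *pa, const void *pb, void *pOut, size_t n);

  pa and pb both point to inputs of the same data type, pOut is the output, and n is the data size (in bytes)

- Using the previous exercise, implement a generic bubble sort

    void bbsort(void *p, int n, size_t m, char dir);

  p points to the data array to be sorted, n is the number of array elements, and m is the data size of each element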