PostgreSQL, Second Edition

by Korry Douglas; Susan Douglas

--------------------------------------------------------------------------------

Publisher: Sams
Pub Date: July 26, 2005
Print ISBN-10: 0-672-32756-2
Print ISBN-13: 978-0-672-32756-8

Pages: 1032

www.allitebooks.com
The Real Value in Free Software

These days, it seems that most discussion of open-source software centers around the idea that you should not have to tie
your future to the whim of some giant corporation. People say that open-source software is better than proprietary software
because it is developed and maintained by the users instead of a faceless company out to lighten your wallet.

I think that the real value in free software is education. I have never learned anything by reading my own code[1]. On the
other hand, it's a rare occasion when I've looked at code written by someone else and haven't come away with another tool
in my toolkit. People don't think alike. I don't mean that people disagree with each other; I mean that people solve problems
in different ways. Each person brings a unique set of experiences to the table. Each person has his own set of goals and
biases. Each person has his own interests. All of these things will shape the way you think about a problem. Often, I'll find
myself in a heated disagreement with a colleague only to realize that we are each correct in our approach. Just because I'm
right doesn't mean that my colleague can't be right as well.

[1] Maybe I should say that I have never learned anything new by reading my own code. I've certainly looked
at code that I've written and wondered what I was thinking at the time, learning that I'm not nearly as clever
as I had remembered. Oddly enough, those who have read my code have reached a similar conclusion.

Open-source software is a great way to learn. You can learn about programming. You can learn about design. You can learn
about debugging. Sometimes, you'll learn how not to design, code, or debug; but that's a valuable lesson, too. You can learn
small things, like how to cache file descriptors on systems where file descriptors are a scarce and expensive resource, or how
to use the select() function to implement fine-grained timers. You can learn big things, like how a query optimizer works,
how to write a parser, or how to develop a good memory-management strategy.

PostgreSQL is a great example. I've been using databases for the last two decades. I've used most of the major commercial
databases: Oracle, Sybase, DB2, and MS SQL Server. With each commercial database, there is a wall of knowledge between
my needs and the vendor's need to protect his intellectual property. Until I started exploring open-source databases, I had
an incomplete understanding of how a database works. Why was this particular feature implemented that way? Why am I
getting poor performance when I try this? That's a neat feature; I wonder how they did that? Every commercial database
tries to expose a small piece of its inner workings. The explain statement will show you why the database makes its
optimization decisions. But you only get to see what the vendor wants you to see. The vendor isn't trying to hide things from
you (in most cases), but without complete access to the source code, they have to pick and choose how to expose
information in a meaningful way. With open-source software, you can dive deep into the source code and pull out all the
information you need. While writing this book, I've spent a lot of time reading through the PostgreSQL source code. I've
added a lot of my own code to reveal more information so that I could explain things more clearly. I can't do that with a
commercial database.

There are gems of brilliance in most open-source projects. In a well-designed, well-factored project, you will find designs and
code that you can use in your own projects. Many open-source projects are starting to split their code into reusable libraries.
The Apache Portable Runtime is a good example. The Apache Web server runs on many diverse platforms. The Apache
development team saw the need for a layer of abstraction that would provide a portable interface to system functions such as
shared memory and network access. They decided to factor the portability layer into a library separate from their main
project. The result is the Apache Portable Runtime, a library of code that can be used in other open-source projects (such as
PostgreSQL).

Some developers hate to work on someone else's code. I love working on code written by another developer; I always learn
something from the experience. I strongly encourage you to dive into the PostgreSQL source code. You will learn from it. You
might even decide to contribute to the project.

—Korry Douglas

Introduction

PostgreSQL is a relational database with a long history. In the late 1970s, the University of California at Berkeley began
development of PostgreSQL's ancestor, a relational database known as Ingres. Relational Technologies turned Ingres into a
commercial product. Relational Technologies became Ingres Corporation and was later acquired by Computer Associates.
Around 1986, Michael Stonebraker from UC Berkeley led a team that added object-oriented features to the core of Ingres;
the new version became known as Postgres. Postgres was again commercialized, this time by a company named Illustra,
which became part of the Informix Corporation. Andrew Yu and Jolly Chen added SQL support to Postgres in the mid-'90s.
Prior versions had used a different, Postgres-specific query language known as Postquel. In 1996, many new features were
added, including the MVCC transaction model, more adherence to the SQL92 standard, and many performance
improvements. Postgres once again took on a new name: PostgreSQL.

Today, PostgreSQL is developed by an international group of open-source software proponents known as the PostgreSQL
Global Development Group. PostgreSQL is an open-source product; it is not proprietary in any way. Red Hat has recently
commercialized PostgreSQL, creating the Red Hat Database, but PostgreSQL itself will remain free and open source.

PostgreSQL Features

PostgreSQL has benefited well from its long history. Today, PostgreSQL is one of the most advanced database servers
available. Here are a few of the features found in a standard PostgreSQL distribution:

• Object-relational: In PostgreSQL, every table defines a class. PostgreSQL implements inheritance between tables (or,
if you like, between classes). Functions and operators are polymorphic.

• Standards compliant: PostgreSQL syntax implements most of the SQL92 standard and many features of SQL99.
Where differences in syntax occur, they are most often related to features unique to PostgreSQL.

• Open source: An international team of developers maintains PostgreSQL. Team members come and go, but the core
members have been enhancing PostgreSQL's performance and feature set since at least 1996. One advantage to
PostgreSQL's open-source nature is that talent and knowledge can be recruited as needed. The fact that this team is
international ensures that PostgreSQL is a product that can be used productively in any natural language, not just
English.

• Transaction processing: PostgreSQL protects data and coordinates multiple concurrent users through full transaction
processing. The transaction model used by PostgreSQL is based on multi-version concurrency control (MVCC). MVCC
provides much better performance than you would find with other products that coordinate multiple users through
table-, page-, or row-level locking.

• Referential integrity: PostgreSQL implements complete referential integrity by supporting foreign and primary key
relationships as well as triggers. Business rules can be expressed within the database rather than relying on an
external tool.

• Multiple procedural languages: Triggers and other procedures can be written in any of several procedural languages.
Server-side code is most commonly written in PL/pgSQL, a procedural language similar to Oracle's PL/SQL. You can
also develop server-side code in Tcl, Perl, even bash (the open-source Linux/Unix shell).

• Multiple-client APIs: PostgreSQL supports the development of client applications in many languages. This book
describes how to interface to PostgreSQL from C, C++, ODBC, Perl, PHP, Tcl/Tk, and Python.

• Unique data types: PostgreSQL provides a variety of data types. Besides the usual numeric, string, and date types,
you will also find geometric types, a Boolean data type, and data types designed specifically to deal with network
addresses.

• Extensibility: One of the most important features of PostgreSQL is that it can be extended. If you don't find
something that you need, you can usually add it yourself. For example, you can add new data types, new functions
and operators, and even new procedural and client languages. There are many contributed packages available on the
Internet. For example, Refractions Research, Inc. has developed a set of geographic data types that can be used to
efficiently model spatial (GIS) data.
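The referential-integrity item in the list above can be sketched in a few lines. This is only an illustration, not an example from the book: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a PostgreSQL server, and the customers and rentals tables, their columns, and the sample values are all hypothetical. The REFERENCES clause has the same form in PostgreSQL.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL so the sketch needs no server.
conn = sqlite3.connect(":memory:")
# SQLite-specific switch; SQLite does not enforce foreign keys unless asked.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: every rental must refer to an existing customer.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT)"
)
conn.execute("""
    CREATE TABLE rentals (
        tape_id     TEXT,
        customer_id INTEGER REFERENCES customers (customer_id)
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Danny Johnson')")
conn.execute("INSERT INTO rentals VALUES ('AB-12345', 1)")  # customer 1 exists

try:
    # There is no customer 99, so the database itself rejects this row.
    conn.execute("INSERT INTO rentals VALUES ('AB-67472', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

The rule lives in the table definition, so every client application gets the same check without repeating it in its own code.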

What Versions Does This Book Cover?

The first edition of this book covered versions 7.1 through 7.3. In this edition, we've updated the basics and added coverage
for the new features introduced in versions 7.4 and 8.0. Throughout the book, I'll be sure to let you know which features
work only in new releases, and, in a few cases, I'll explain features that have been deprecated (that is, features that are
obsolete). You can use this book to install, configure, tune, program, and manage PostgreSQL versions 7.1 through 8.0.

Fortunately, the PostgreSQL developers try very hard to maintain forward compatibility: new features tend not to break
existing applications. This means that all the features discussed in this book should still be available and substantially similar
in later versions of PostgreSQL. I have tried to avoid talking about features that have not been released at the time of
writing; where I have mentioned future developments, I will point them out.

Who Is This Book For?

If you are already using PostgreSQL, you should find this book a useful guide to some of the features that you might be less
familiar with. The first part of the book provides an introduction to SQL and PostgreSQL for the new user. You'll also find
information that shows how to obtain and install PostgreSQL on a Unix/Linux host, as well as on Microsoft Windows.

If you are developing an application that will store data in PostgreSQL, the second part of this book will provide you with a
great deal of information relating to PostgreSQL programming. You'll find information on both server-side and client-side
programming in a variety of languages.

Every database needs occasional administrative work. The final part of the book should be of help if you are a PostgreSQL
administrator, or a developer or user who needs to do occasional administration. You will also find information on how to
secure your data against inappropriate use.

Finally, if you are trying to decide which database to use for your current project (or for future projects), this book should
provide all the information you need to evaluate whether PostgreSQL will fit your needs.

What Topics Does This Book Cover?

PostgreSQL is a huge product. It's not easy to find the right mix of topics when you are trying to fit everything into a single
book. This book is divided into three parts.

The first part, "General PostgreSQL Use," is an introduction and user's guide for PostgreSQL. Chapter 1, "Introduction to
PostgreSQL and SQL," covers the basics: how to obtain and install PostgreSQL (if you are running Linux, chances are you
already have PostgreSQL and it may be installed). The first chapter also provides a gentle introduction to SQL and discusses
the sample database we'll be using throughout the book. Chapter 2, "Working with Data in PostgreSQL," describes the many
data types supported by a standard PostgreSQL distribution; you'll learn how to enter values (literals) for each data type,
what kind of data you can store with each type, and how those data types are combined into expressions. Chapter 3,
"PostgreSQL SQL Syntax and Use," fills in some of the details we glossed over in the first two chapters. You'll learn how to
create new databases, new tables and indexes, and how PostgreSQL keeps your data safe through the use of transactions.
Chapter 4, "Performance," describes the PostgreSQL optimizer. I'll show you how to get information about the decisions
made by the optimizer, how to decipher that information, and how to influence those decisions.

Part II, "Programming with PostgreSQL," is all about PostgreSQL programming. In Chapter 5, "Introduction to PostgreSQL
Programming," we start off by describing the options you have when developing a database application that works with
PostgreSQL (and there are a lot of options). Chapter 6, "Extending PostgreSQL," briefly describes how to extend PostgreSQL
by adding new functions, data types, and operators. Chapter 7, "PL/pgSQL," describes the PL/pgSQL language. PL/pgSQL is a
server-based procedural language. Code that you write in PL/pgSQL executes within the PostgreSQL server and has very fast
access to data. Each chapter in the remainder of the programming section deals with a client-based API. You can connect to
a PostgreSQL server using a number of languages. I show you how to interface to PostgreSQL using C, C++, ecpg, ODBC,
JDBC, Perl, PHP, Tcl/Tk, Python, and Microsoft's .NET. Chapters 8 through 18 all follow the same pattern: you develop a
series of client applications in a given language. The first client application shows you how to establish a connection to the
database (and how that connection is represented by the language in question). The next client adds error checking so that
you can intercept and react to unusual conditions. The third client in each chapter demonstrates how to process SQL
commands from within the client. The final client wraps everything together and shows you how to build an interactive query
processor using the language being discussed. Even if you program in only one or two languages, I would encourage you to
study the other chapters in this section. I think you'll find that looking at the same application written in a variety of
languages will help you understand the philosophy followed by the PostgreSQL development team, and it's a great way to
start learning a new language. Chapter 19, "Other Useful Programming Tools," introduces you to a few programming tools
(and interfaces) that you might find useful: PL/Java and PL/Perl. I'll also show you how to use PostgreSQL inside of bash shell
scripts.

The final part of this book (Part III, "PostgreSQL Administration") deals with administrative issues. The final six chapters of
this book show you how to perform the occasional duties required of a PostgreSQL administrator. In the first two chapters,
Chapter 20, "Introduction to PostgreSQL Administration," and Chapter 21, "PostgreSQL Administration," you'll learn how to
start up, shut down, back up, and restore a server. In Chapter 22, "Internationalization and Localization," you will learn how
PostgreSQL supports internationalization and localization. PostgreSQL understands how to store and process a variety of
single-byte and multi-byte character sets including Unicode, ASCII, and Japanese, Chinese, Korean, and Taiwan EUC. In
Chapter 23, "Security," I'll show you how to secure your data against unauthorized uses (and unauthorized users). In
Chapter 24, "Replicating PostgreSQL with Slony," you'll learn how to replicate data with PostgreSQL's Slony replication
system. Chapter 25, "Contributed Modules," introduces a few open-source projects that work well with PostgreSQL. I'll show
you how to query a PostgreSQL database using XML, how to configure and use TSEARCH2 (a full-text indexing and search
system), and how to install and use PgAdmin III, a graphical user interface specifically designed for PostgreSQL.

What's New in the Second Edition?

The first edition of this book hit the shelves in February 2003; at that time, the PostgreSQL developers had just released
version 7.3.2. Release 7.4 was unleashed in November 2003. In January 2005, the PostgreSQL developers released version
8.0, a major release full of new features. We timed the second edition of this book to coincide with the release of version 8.0
(the book will appear in bookstores a few months after 8.0 hits the streets). In this edition, we've added coverage for all of
the (major) new features in 7.3, 7.4, and 8.0, including

• Installing, securing, and managing PostgreSQL on Windows hosts

• Tablespaces

• Schemas

• New quoting mechanisms for string values

• New data types (ANYARRAY, ANYELEMENT, VOID)

• The standards-conforming INFORMATION_SCHEMA

• Nested transactions (SAVEPOINTs)

• The new PostgreSQL buffer manager

• Auto-vacuum

• Prepared-statement execution (the PREPARE/EXECUTE model)

• Set-returning functions

• Exception handling in PL/pgSQL

• libpqxx, the new PostgreSQL interface for C++ clients

• New features in ecpg (the embedded SQL processor for C)

• New features in the ODBC, JDBC (Java), Perl, Python, PHP, and Tcl/Tk client interfaces

• npgsql, the PostgreSQL .NET data provider

• Other useful programming tools (PL/Java, pgbash, pgcurl, etc.)

• Point-in-time recovery

• Replication

• Using PostgreSQL with XML

• Full-text search

We hope you enjoy this book and find it useful. The PostgreSQL developers have done an incredible job of enhancing what
was already a world-class database product. Now dig in.

Part I: General PostgreSQL Use

1 Introduction to PostgreSQL and SQL

2 Working with Data in PostgreSQL

3 PostgreSQL SQL Syntax and Use

4 Performance

Chapter 1. Introduction to PostgreSQL and SQL

PostgreSQL is an open-source, client/server, relational database. PostgreSQL offers a unique mix of features that compare well to
the major commercial databases such as Sybase, Oracle, and DB2. One of the major advantages to PostgreSQL is that it is open
source: you can see the source code for PostgreSQL. PostgreSQL is not owned by any single company. It is developed, maintained,
broken, and fixed by a group of volunteer developers around the world. You don't have to buy PostgreSQL; it's free. You won't have
to pay any maintenance fees (although you can certainly find commercial sources for technical support).

PostgreSQL offers all the usual features of a relational database plus quite a few unique features. PostgreSQL offers inheritance (for
you object-oriented readers). You can add your own data types to PostgreSQL. (I know, some of you are probably thinking that you
can do that in your favorite database.) Most database systems allow you to give a new name to an existing type. Some systems
allow you to define composite types. With PostgreSQL, you can add new fundamental data types. PostgreSQL includes support for
geometric data types such as point, line segment, box, polygon, and circle. PostgreSQL uses indexing structures that make
geometric data types fast. PostgreSQL can be extended: you can build new functions, new operators, and new data types in the
language of your choice. PostgreSQL is built around client/server architecture. You can build client applications in a number of
different languages, including C, C++, Java, Python, Perl, TCL/Tk, and others. On the server side, PostgreSQL sports a powerful
procedural language, PL/pgSQL (okay, the language is sportier than the name). You can add procedural languages to the server. You
will find procedural languages supporting Perl, TCL/Tk, and even the bash shell.

A Sample Database

Throughout this book, I'll use a simple example database to help explain some of the more complex concepts. The sample database
represents some of the data storage and retrieval requirements that you might encounter when running a video rental store. I won't
pretend that the sample database is useful for any real-world scenarios; instead, this database will help us explore how PostgreSQL
works and should illustrate many PostgreSQL features.

To begin with, the sample database (which is called movies) contains three kinds of records: customers, tapes, and rentals.

Whenever a customer walks into our imaginary video store, you will consult your database to determine whether you already know
this customer. If not, you'll add a new record. What items of information should you store for each customer? At the very least, you
will want to record the customer's name. You will want to ensure that each customer has a unique identifier: you might have two
customers named "Danny Johnson," and you'll want to keep them straight. A name is a poor choice for a unique identifier, because names
might not be unique, and they can often be spelled in different ways. ("Was that Danny, Dan, or Daniel?") You'll assign each
customer a unique customer ID. You might also want to store the customer's birth date so that you know whether he should be
allowed to rent certain movies. If you find that a customer has an overdue tape rental, you'll probably want to phone him, so you
had better store the customer's phone number. In a real-world business, you would probably want to know much more information about
each customer (such as his home address), but for these purposes, you'll keep your storage requirements to a minimum.

Next, you will need to keep track of the videos that you stock. Each video has a title and a duration; you'll store those. You might
own several copies of the same movie and you will certainly have many movies with the same duration, so you can't use either one
for a unique identifier. Instead, you'll assign a unique ID to each video.

Finally, you will need to track rentals. When a customer rents a tape, you will store the customer ID, tape ID, and rental date.

Notice that you won't store the customer name with each rental. As long as you store the customer ID, you can always retrieve the
customer name. You won't store the movie title with each rental, either; you can find the movie title by its unique identifier.

At a few points in this book, we might make changes to the layout of the sample database, but the basic shape will remain the same.
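The three kinds of records described above might be laid out as follows. This is a rough sketch rather than the book's own definition: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a PostgreSQL server, and the column names, data types, and every sample value (phone numbers, tape ID, title, dates) are assumptions made up for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the "movies" database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,  -- unique customer ID
        customer_name TEXT NOT NULL,
        phone         TEXT,
        birth_date    TEXT                  -- kept as an ISO date string here
    );
    CREATE TABLE tapes (
        tape_id  TEXT PRIMARY KEY,          -- unique ID for each video
        title    TEXT NOT NULL,
        duration TEXT
    );
    CREATE TABLE rentals (
        tape_id     TEXT    REFERENCES tapes (tape_id),
        customer_id INTEGER REFERENCES customers (customer_id),
        rental_date TEXT
    );
""")

# Two customers share a name; only the customer ID tells them apart.
conn.execute("INSERT INTO customers VALUES (1, 'Danny Johnson', '555-1234', '1970-01-01')")
conn.execute("INSERT INTO customers VALUES (2, 'Danny Johnson', '555-9876', '1985-06-15')")
conn.execute("INSERT INTO tapes VALUES ('AB-12345', 'The Godfather', '2:55')")
conn.execute("INSERT INTO rentals VALUES ('AB-12345', 2, '2005-07-26')")


def rental_report(connection):
    """Return (customer_name, phone, title, rental_date) for every rental."""
    return connection.execute("""
        SELECT c.customer_name, c.phone, t.title, r.rental_date
          FROM rentals r
          JOIN customers c ON c.customer_id = r.customer_id
          JOIN tapes t     ON t.tape_id     = r.tape_id
    """).fetchall()


print(rental_report(conn))
```

The rentals table holds only IDs and a date. The join in rental_report recovers the customer name and movie title from those IDs, which is why neither needs to be stored with each rental.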

Basic Database Terminology

Before we get into the interesting stuff, it might be useful to get acquainted with a few of the terms that you will encounter in your
PostgreSQL life. PostgreSQL has a long history: you can trace its history back to 1977 and a program known as Ingres. A lot has changed
in the relational database world since 1977. When you are breaking ground with a new product (as the Ingres developers were), you don't
have the luxury of using standard, well-understood, and well-accepted terminology; you have to make it up as you go along. Many of the
terms used by PostgreSQL have synonyms (or at least close analogies) in today's relational marketplace. In this section, I'll show you a few
of the terms that you'll encounter in this book and try to explain how they relate to similar concepts in other database products.

• Schema

A schema is a named collection of tables (see table). A schema can also contain views, indexes, sequences, data types, operators,
and functions. Other relational database products use the term catalog.

• Database

A database is a named collection of schemas. When a client application connects to a PostgreSQL server, it specifies the name of the
database that it wants to access. A client cannot interact with more than one database per connection, but it can open any number of
connections in order to access multiple databases simultaneously.

• Command

A command is a string that you send to the server in hopes of having the server do something useful. Some people use the word
statement to mean command. The two words are very similar in meaning and, in practice, are interchangeable.

• Query

A query is a type of command that retrieves data from the server.

• Table (relation, file, class)

A table is a collection of rows. A table usually has a name, although some tables are temporary and exist only to carry out a
command. All the rows in a table have the same shape (in other words, every row in a table contains the same set of columns). In
other database systems, you may see the terms relation, file, or even class; these are all equivalent to a table.

• Column (field, attribute)

A column is the smallest unit of storage in a relational database. A column represents one piece of information about an object.
Every column has a name and a data type. Columns are grouped into rows, and rows are grouped into tables. In Figure 1.1, the
shaded area depicts a single column.

Figure 1.1. A column (highlighted).

The terms field and attribute have similar meanings.

• Row (record, tuple)

A row is a collection of column values. Every row in a table has the same shape (in other words, every row is composed of the same
set of columns). If you are trying to model a real-world application, a row represents a real-world object. For example, if you are
running an auto dealership, you might have a vehicles table. Each row in the vehicles table represents a car (or truck, or
motorcycle, and so on). The kinds of information that you store are the same for all vehicles (that is, every car has a color, a
vehicle ID, an engine, and so on). In Figure 1.2, the shaded area depicts a row.

Figure 1.2. A row (highlighted).

You may also see the terms record or tuple; these are equivalent to a row.

• Composite type

Starting with PostgreSQL version 8, you can create new data types that are composed of multiple values. For example, you could
create a composite type named address that holds a street address, city, state/province, and postal code. When you create a table
that contains a column of type address, you can store all four components in a single field. We discuss composite types in more
detail in Chapter 2, "Working with Data in PostgreSQL."

• Domain

A domain defines a named specialization of another data type. Domains are useful when you need to ensure that a single data type
is used in several tables. For example, you might define a domain named accountNumber that contains a single letter followed by
four digits. Then you can create columns of type accountNumber in a general ledger accounts table, an accounts receivable customer
table, and so on.

• View

A view is an alternative way to present a table (or tables). You might think of a view as a "virtual" table. A view is (usually) defined
in terms of one or more tables. When you create a view, you are not storing more data; you are instead creating a different way of
looking at existing data. A view is a useful way to give a name to a complex query that you may have to use repeatedly.
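To make the "virtual table" idea concrete, here is a minimal sketch. It is not taken from the book: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a PostgreSQL server, and the customer_rentals view, the table layouts, and the sample rows are hypothetical. The CREATE VIEW statement has the same form in PostgreSQL.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; all names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE rentals (tape_id TEXT, customer_id INTEGER, rental_date TEXT);
    INSERT INTO customers VALUES (1, 'Jones, Henry');
    INSERT INTO customers VALUES (2, 'Rubin, William');
    INSERT INTO rentals VALUES ('AB-12345', 1, '2005-07-01');
    INSERT INTO rentals VALUES ('MC-68873', 2, '2005-07-03');

    -- The view stores no rows of its own; it only gives the join a name.
    CREATE VIEW customer_rentals AS
        SELECT c.customer_name, r.tape_id, r.rental_date
          FROM rentals r
          JOIN customers c ON c.customer_id = r.customer_id;
""")

before = conn.execute("SELECT COUNT(*) FROM customer_rentals").fetchone()[0]

# A new row in the underlying table shows up in the view immediately.
conn.execute("INSERT INTO rentals VALUES ('AB-67472', 1, '2005-07-05')")
after = conn.execute("SELECT COUNT(*) FROM customer_rentals").fetchone()[0]

print(before, after)
```

Because the view is evaluated each time it is queried, the second count reflects the newly inserted rental without any change to the view itself.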

• Client/server

PostgreSQL is built around a client/server architecture. In a client/server product, there are at least two programs involved. One is a
client and the other is a server. These programs may exist on the same host or on different hosts that are connected by some sort
of network. The server offers a service; in the case of PostgreSQL, the server offers to store, retrieve, and change data. The client
asks a server to perform work; a PostgreSQL client asks a PostgreSQL server to serve up relational data.

• Client

A client is an application that makes requests of the PostgreSQL server. Before a client application can talk to a server, it must
connect to a postmaster (see postmaster) and establish its identity. Client applications provide a user interface and can be written
in many languages. Chapters 8 through 19 will show you how to write a client application.

• Server

The PostgreSQL server is a program that services commands coming from client applications. The PostgreSQL server has no user
interface: you can't talk to the server directly; you must use a client application.

• Postmaster

Because PostgreSQL is a client/server database, something has to listen for connection requests coming from a client application.
That's what the postmaster does. When a connection request arrives, the postmaster creates a new server process in the host
operating system.

€ Transact ion

A t r ansact ion is a collect ion of dat abase oper at ions t hat are t reat ed as a unit . Post gr eSQL guar ant ees t hat all t he oper at ions w it hin a
t ransact ion com plet e or t hat none of t hem com plet e. This is an im por t ant pr oper t y —it ensur es t hat if som et hing goes w rong in t he
m iddle of a t r ansact ion, changes m ade befor e t he point of failur e will not be reflect ed in t he dat abase. A t r ansact ion usually st ar t s
w it h a BEGIN com m and and ends w it h a COMMIT or ROLLBACK ( see t he next ent r ies) .

• Commit

A commit marks the successful end of a transaction. When you perform a commit, you are telling PostgreSQL that you have
completed a unit of operation and that all the changes that you made to the database should become permanent.

• Rollback

A rollback marks the unsuccessful end of a transaction. When you roll back a transaction, you are telling PostgreSQL to discard any
changes that you have made to the database (since the beginning of the transaction).

• Index

An index is a data structure that a database uses to reduce the amount of time it takes to perform certain operations. An index can
also be used to ensure that duplicate values don't appear where they aren't wanted. I'll talk about indexes in Chapter 4,
"Performance."

• Tablespace

A tablespace defines an alternative storage location where you can create tables and indexes. When you create a table (or index),
you can specify the name of a tablespace. If you don't specify a tablespace, PostgreSQL creates all objects in the same directory
tree. You can use tablespaces to distribute the workload across multiple disk drives.

• Result set

When you issue a query to a database, you get back a result set. The result set contains all the rows that satisfy your query. A result
set may be empty.
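
To make the transaction, commit, rollback, view, and result set entries a little more concrete, here is a minimal sketch. The accounts table, its columns, and the account_totals view are hypothetical names invented for this illustration; they are not part of the sample database used in the rest of the chapter.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE accounts (
    account_id INTEGER UNIQUE,
    balance    NUMERIC(7,2)
);

INSERT INTO accounts VALUES (1, 100.00);
INSERT INTO accounts VALUES (2, 50.00);

-- Move 25.00 from account 1 to account 2 as a single unit of work.
BEGIN;
UPDATE accounts SET balance = balance - 25.00 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 25.00 WHERE account_id = 2;
COMMIT;    -- both changes become permanent together

-- Start another transaction, then change your mind.
BEGIN;
DELETE FROM accounts;
ROLLBACK;  -- the DELETE is discarded; both rows are still there

-- A view gives a name to a query without storing any additional data.
CREATE VIEW account_totals AS
    SELECT count(*) AS account_count, sum(balance) AS total_balance
      FROM accounts;

-- Querying the view returns a result set computed from the accounts table.
SELECT * FROM account_totals;
```

If the server failed between the two UPDATE commands, neither change would be visible afterward, which is the all-or-nothing guarantee described in the Transaction entry.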

Prerequisites

Before I go much further, let's talk about installing PostgreSQL. Chapters 21, "PostgreSQL Administration," and 23, "Security,"
discuss PostgreSQL installation in detail, but I'll show you a typical installation procedure here.

When you install PostgreSQL, you can start with prebuilt binaries or you can compile PostgreSQL from source code. In this chapter,
I'll show you how to install PostgreSQL on a Linux host starting from prebuilt binaries. If you decide to install PostgreSQL from
source code, many of the steps are the same. I'll show you how to build PostgreSQL from source code in Chapter 21.

In older versions of PostgreSQL, you could run the PostgreSQL server on a Windows host but you had to install a Unix-like
infrastructure (Cygwin) first: PostgreSQL wasn't a native Windows application. Starting with PostgreSQL version 8.0, the
PostgreSQL server has been ported to the Windows environment as a native Windows application. Installing PostgreSQL on a
Windows server is very simple; simply download and run the installer program. You do have a few choices to make, and we cover
the entire procedure in Chapter 21.

Installing PostgreSQL Using an RPM

The easiest way to install PostgreSQL is to use a prebuilt RPM package. RPM is the Red Hat Package Manager. It's a software
package designed to install (and manage) other software packages. If you choose to install using some method other than RPM,
consult the documentation that comes with the distribution you are using.

PostgreSQL is distributed as a collection of RPM packages; you don't have to install all the packages to use PostgreSQL. Table 1.1
lists the RPM packages available as of release 7.4.5.

Table 1.1. PostgreSQL RPM Packages as of Release 7.4.5

Package               Description
postgresql            Clients, libraries, and documentation
postgresql-server     Programs (and data files) required to run a server
postgresql-devel      Files required to create new client applications
postgresql-jdbc       JDBC driver for PostgreSQL
postgresql-tcl        Tcl client and PL/Tcl
postgresql-python     PostgreSQL's Python library
postgresql-test       Regression test suite for PostgreSQL
postgresql-libs       Shared libraries for client applications
postgresql-docs       Extra documentation not included in the postgresql base package
postgresql-contrib    Contributed software

Don't worry if you don't know which of these you need; I'll explain most of the packages in later chapters. You can start working
with PostgreSQL by downloading the postgresql, postgresql-libs, and postgresql-server packages. The actual files (at the
www.postgresql.org website) have names that include a version number: postgresql-7.4.5-2PGDG.i686.rpm, for example.

I strongly recommend creating an empty directory, and then downloading the PostgreSQL packages into that directory. That way
you can install all the PostgreSQL packages with a single command.

After you have downloaded the desired packages, use the rpm command to perform the installation procedure. You must have
superuser privileges to install PostgreSQL.

To install the PostgreSQL packages, cd into the directory that contains the package files and issue the following command:

# rpm -ihv *.rpm

The rpm command installs all the packages in your current directory. You should see results similar to what is shown in Figure 1.3.

Figure 1.3. Using the rpm command to install PostgreSQL.

The RPM installer should have created a new user (named postgres) for your system. This user ID exists so that all database files
accessed by PostgreSQL can be owned by a single user.

Each RPM package is composed of many files. You can view the list of files installed for a given package using the rpm -ql
command:

# rpm -ql postgresql-server
/etc/rc.d/init.d/postgresql
/usr/bin/initdb
/usr/bin/initlocation
...
/var/lib/pgsql/data
# rpm -ql postgresql-libs
/usr/lib/libecpg.so.3
/usr/lib/libecpg.so.3.2.0
/usr/lib/libpgeasy.so.2
...
/usr/lib/libpq.so.2.1

At this point (assuming that everything worked), you have installed PostgreSQL on your system. Now it's time to create a database
to play, er, work in.

While you have superuser privileges, issue the following commands:

# su - postgres
bash-2.04$ echo $PGDATA
/var/lib/pgsql/data
bash-2.04$ initdb

The first command (su - postgres) changes your identity from the OS superuser (root) to the PostgreSQL superuser (postgres).
The second command (echo $PGDATA) shows you where the PostgreSQL data files will be created. The final command creates the
two prototype databases (template0 and template1).

You should get output that looks like that shown in Figure 1.4.

Figure 1.4. Creating the prototype databases using initdb.

You now have two empty databases named template0 and template1. You really should not create new tables in either of these
databases: a template database contains all the data required to create other databases. In other words, template0 and
template1 act as prototypes for creating other databases. Instead, let's create a database that you can play in. First, start the
postmaster process. The postmaster is a program that listens for connection requests coming from client applications. When a
connection request arrives, the postmaster starts a new server process. You can't do anything in PostgreSQL without a
postmaster. Figure 1.5 shows you how to get the postmaster started.

Figure 1.5. Creating a new database with createdb.


After starting the postmaster, use the createdb command to create the movies database (this is also shown in Figure 1.5). Most
of the examples in this book take place in the movies database.

Notice that I used the pg_ctl command to start the postmaster [1].

[1]
You can also arrange for the postmaster to start whenever you boot your computer, but the exact instructions
vary depending on which operating system you are using. See the section titled "Arranging for PostgreSQL Startup
and Shutdown" in Chapter 21.

The pg_ctl program makes it easy to start and stop the postmaster. To see a full description of the pg_ctl command, enter the
command pg_ctl --help. You will get the output shown in Figure 1.6.

Figure 1.6. pg_ctl options.


If you use a recent RPM file to install PostgreSQL, the two previous steps (initdb and pg_ctl start) can be automated. If you
find a file named postgresql in the /etc/rc.d/init.d directory, you can use that shell script to initialize the database and start
the postmaster. The /etc/rc.d/init.d/postgresql script can be invoked with any of the command-line options shown in Table
1.2.

Table 1.2. /etc/rc.d/init.d/postgresql Options

Option     Description
start      Start the postmaster
stop       Stop the postmaster
status     Display the process ID of the postmaster if it is running
restart    Stop and then start the postmaster
reload     Force the postmaster to reread its configuration files without performing a full restart

At this point, you should use the createuser command to tell PostgreSQL which users are allowed to access your database. Let's
allow the user 'bruce' into our system (see Figure 1.7).

Figure 1.7. Creating a new PostgreSQL user.

That's it! You now have a PostgreSQL database up and running.

Connecting to a Database

Assuming that you have a copy of PostgreSQL up and running, it's pretty simple to connect to the database. Here is an example:

$ psql -d movies
Welcome to psql, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit

movies=# \q

The psql program is a text-based interface to a PostgreSQL database. When you are running psql, you won't see a graphical
application: no buttons or pictures or other bells and whistles, just a text-based interface. Later, I'll show you another client application
that does provide a graphical interface (pgaccess).

psql supports a large collection of command-line options. To see a summary of the options that you can use, type psql --help:

$ psql --help
This is psql, the PostgreSQL interactive terminal.

Usage:
psql [options] [dbname [username]]

Options:
-a Echo all input from script
-A Unaligned table output mode (-P format=unaligned)
-c <query> Run only single query (or slash command) and exit
-d <dbname> Specify database name to connect to (default: korry)
-e Echo queries sent to backend
-E Display queries that internal commands generate
-f <filename> Execute queries from file, then exit
-F <string> Set field separator (default: "|") (-P fieldsep=)
-h <host> Specify database server host (default: domain socket)
-H HTML table output mode (-P format=html)
-l List available databases, then exit
-n Disable readline
-o <filename> Send query output to filename (or |pipe)
-p <port> Specify database server port (default: hardwired)
-P var[=arg] Set printing option 'var' to 'arg' (see \pset command)
-q Run quietly (no messages, only query output)
-R <string> Set record separator (default: newline) (-P recordsep=)
-s Single step mode (confirm each query)
-S Single line mode (newline terminates query)
-t Print rows only (-P tuples_only)
-T text Set HTML table tag options (width, border) (-P tableattr=)
-U <username> Specify database username (default: Administrator)
-v name=val Set psql variable 'name' to 'value'
-V Show version information and exit
-W Prompt for password (should happen automatically)
-x Turn on expanded table output (-P expanded)
-X Do not read startup file (~/.psqlrc)

For more information, type \? (for internal commands) or \help (for SQL commands) from within psql, or consult the psql section in the
PostgreSQL manual, which accompanies the distribution and is also available at http://www.postgresql.org. Report bugs to
pgsql-bugs@postgresql.org.

The most important options are -U <user>, -d <dbname>, -h <host>, and -p <port>.

The -U option allows you to specify a username other than the one you are logged in as. For example, let's say that you are logged in to
your host as user bruce and you want to connect to a PostgreSQL database as user sheila. This psql command makes the connection (or
at least tries to):

$ whoami
bruce
$ psql -U sheila -d movies

Impersonating Another User

The -U option may or may not allow you to impersonate another user. Depending on how your PostgreSQL administrator
has configured database security, you might be prompted for sheila's password; if you don't know the proper password,
you won't be allowed to impersonate her. (Chapter 23 discusses security in greater detail.) If you don't provide psql with
a username, it will assume the username that you used when you logged in to your host.

You use the -d option to specify to which database you want to connect. If you don't specify a database, PostgreSQL will assume that you
want to connect to a database whose name is your username. For example, if you are logged in as user bruce, PostgreSQL will assume
that you want to connect to a database named bruce.

The -d and -U option letters are not strictly required, because psql also accepts the database name and username as positional
arguments. The command line for psql should be of the following form:

psql [options] [dbname [username]]

If you are connecting to a PostgreSQL server that is running on the host that you are logged in to, you probably don't have to worry about
the -h and -p options. If, on the other hand, you are connecting to a PostgreSQL server running on a different host, use the -h option to
tell psql which host to connect to. You can also use the -p option to specify a TCP/IP port number; you only have to do that if you are
connecting to a server that uses a nonstandard port (PostgreSQL usually listens for client connections on TCP/IP port number 5432). Here
are a few examples:

$ # connect to a server waiting on the default port on host 192.168.0.1
$ psql -h 192.168.0.1

$ # connect to a server waiting on port 2000 on host arturo
$ psql -h arturo -p 2000

If you prefer, you can specify the database name, host name, and TCP/IP port number using environment variables rather than using the
command-line options. Table 1.3 lists some of the psql command-line options and the corresponding environment variables.

Table 1.3. psql Environment Variables

Command-Line Option    Environment Variable    Meaning
-d <dbname>            PGDATABASE              Name of database to connect to
-h <host>              PGHOST                  Name of host to connect to
-p <port>              PGPORT                  Port number to connect to
-U <user>              PGUSER                  PostgreSQL username

A (Very) Simple Query

At this point, you should be running the psql client application. Let's try a very simple query:

$ psql -d movies
Welcome to psql, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

movies=# SELECT user;
 current_user
---------------
 korry
(1 row)

movies=# \q

Let's take a close look at this session. First, you can see that I started the psql program with the -d movies option; this tells psql that I
want to connect to the movies database.

After greeting me and providing me with a few crucial hints, psql issues a prompt: movies=#. psql encodes some useful information into
the prompt, starting with the name of the database that I am currently connected to (movies in this case). The character that follows the
database name can vary. A = character means that psql is waiting for me to start a command. A - character means that psql is waiting
for me to complete a command (psql allows you to split a single command over multiple lines; the first line is prompted by a = character
and subsequent lines are prompted by a - character). If the prompt ends with a ( character, you have entered more opening parentheses
than closing parentheses.

You can see the command that I entered following the prompt: SELECT user;. Each SQL command starts with a verb, in this case
SELECT. The verb tells PostgreSQL what you want to do and the rest of the command provides information specific to that command. I am
executing a SELECT command. SELECT is used to retrieve information from the database. When you execute a SELECT command, you
have to tell PostgreSQL what information you are interested in. I want to retrieve my PostgreSQL user ID so I SELECT user. The final part
of this command is the semicolon (;). Each SQL command must end with a semicolon.
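
As a small sketch of the points above, the following shows that a command may span several lines and is sent to the server only when psql reaches the terminating semicolon. Only the text you would type is shown here, not the prompts; current_user and current_database() are built-in PostgreSQL features, and the layout is just one way to split the command.

```sql
-- One command on one line.
SELECT user;

-- A command split across several lines. Nothing is executed until
-- the semicolon on the last line is entered.
SELECT
    current_user,
    current_database()
;
```

While you are typing the second command, psql switches from the = prompt to the - prompt after the first line to remind you that the command is not yet complete.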

After I enter the SELECT command (and press the Return key), psql displays the results of my command:

current_user
---------------
korry
(1 row)

When you execute a SELECT command, psql starts by displaying a row of column headers. I have selected only a single column of
information so I see only a single column header (each column header displays the name of the column). Following the row of column
headers is a single row of separator characters (dashes). Next come zero or more rows of the data that I requested. Finally, psql shows
a count of the number of data rows displayed.

I ended this session using the \q command.

Tips for Interacting with PostgreSQL

The psql client has a lot of features that will make your PostgreSQL life easier.

Besides PostgreSQL commands (SELECT, INSERT, UPDATE, CREATE TABLE, and so on), psql provides a number of internal
commands (also known as meta-commands). PostgreSQL commands are sent to the server; meta-commands are
processed by psql itself. A meta-command begins with a backslash character (\). You can obtain a list of all the
meta-commands using the \? meta-command:

movies=# \?
\a toggle between unaligned and aligned mode
\c[onnect] [dbname|- [user]]
connect to new database (currently 'movies')
\C <title> table title
\copy ... perform SQL COPY with data stream to the client machine
\copyright show PostgreSQL usage and distribution terms
\d <table> describe table (or view, index, sequence)
\d{t|i|s|v} list tables/indices/sequences/views
\d{p|S|l} list permissions/system tables/lobjects
\da list aggregates
\dd [object] list comment for table, type, function, or operator
\df list functions
\do list operators
\dT list data types
\e [file] edit the current query buffer or [file]
with external editor
\echo <text> write text to stdout
\encoding <encoding> set client encoding
\f <sep> change field separator
\g [file] send query to backend (and results in [file] or |pipe)
\h [cmd] help on syntax of sql commands, * for all commands
\H toggle HTML mode (currently off)
\i <file> read and execute queries from <file>
\l list all databases
\lo_export, \lo_import, \lo_list, \lo_unlink
large object operations
\o [file] send all query results to [file], or |pipe
\p show the content of the current query buffer
\pset <opt> set table output
<opt> = {format|border|expanded|fieldsep|
null|recordsep|tuples_only|title|tableattr|pager}
\q quit psql
\qecho <text> write text to query output stream (see \o)
\r reset (clear) the query buffer
\s [file] print history or save it in [file]
\set <var> <value> set internal variable
\t show only rows (currently off)
\T <tags> HTML table tags
\unset <var> unset (delete) internal variable
\w <file> write current query buffer to a <file>
\x toggle expanded output (currently off)
\z list table access permissions
\! [cmd] shell escape or command
movies=#

The most important meta-commands are \? (meta-command help) and \q (quit). The \h (SQL help) meta-command is
also very useful. Notice that unlike SQL commands, meta-commands don't require a terminating semicolon, which means
that meta-commands must be entered entirely on one line. In the next few sections, I'll show you some of the other
meta-commands.

Creating Tables

Now that you have seen how to connect to a database and issue a simple query, it's time to create some sample data to
work with.

Because you are pretending to model a movie-rental business (that is, a video store), you will create tables that model the
data that you might need in a video store. Start by creating three tables: tapes, customers, and rentals.

The tapes table is simple: For each videotape, you want to store the name of the movie, the duration, and a unique identifier
(remember that you may have more than one copy of any given movie, so the movie name is not sufficient to uniquely
identify a specific tape).

Here is the command you should use to create the tapes table:

CREATE TABLE tapes (
    tape_id  CHARACTER(8) UNIQUE,
    title    CHARACTER VARYING(80),
    duration INTERVAL
);

Let's take a close look at this command.

The verb in this command is CREATE TABLE, and its meaning should be obvious: you want to create a table. Following the
CREATE TABLE verb is the name of the table (tapes) and then a comma-separated list of column definitions, enclosed within
parentheses.

Each column in a table is defined by a name and a data type. The first column in tapes is named tape_id. Column names
(and table names) must begin with a letter or an underscore character [2] and should be 31 characters or fewer [3]. The
tape_id column is created with a data type of CHARACTER(8). The data type you define for a column determines the set of
values that you can put into that column. For example, if you want a column to hold numeric values, you should use a
numeric data type; if you want a column to hold date (or time) values, you should use a date/time data type. tape_id holds
alphanumeric values (a mixture of numbers and letters), so I chose a character data type, with a length of eight characters.

[2]
You can begin a column or table name with nonalphabetic characters, but you must enclose the name in
double quotes. You have to quote the name not only when you create it, but each time you reference it.

[3]
You can increase the maximum identifier length beyond 31 characters if you build PostgreSQL from a
source distribution. If you do so, you'll have to remember to increase the identifier length each time you
upgrade your server, or whenever you migrate to a different server.

The tape_id column is defined as UNIQUE. The word UNIQUE is not a part of the data type; the data type is CHARACTER(8).
The keyword 'UNIQUE' specifies a column constraint. A column constraint is a condition that must be met by a column. In this
case, each row in the tapes table must have a unique tape_id. PostgreSQL supports a variety of column constraints (and
table constraints). I'll cover constraints in Chapter 2.
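
To see what the UNIQUE constraint does in practice, consider this minimal sketch. It assumes the tapes table defined above and uses the INSERT command, which is introduced later in this chapter; the tape IDs, titles, and durations are made-up values.

```sql
-- Assumes the tapes table defined above; the values are invented.
INSERT INTO tapes VALUES ('AB-12345', 'The Godfather', '175 minutes');

-- Same tape_id, different title: PostgreSQL should reject this row
-- with a unique-constraint violation and leave the table unchanged.
INSERT INTO tapes VALUES ('AB-12345', 'Rear Window', '112 minutes');

-- A different tape_id is fine, even though the title is repeated
-- (a second copy of the same movie).
INSERT INTO tapes VALUES ('AB-67472', 'The Godfather', '175 minutes');
```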

The title is defined as CHARACTER VARYING(80). The difference between CHARACTER(n) and CHARACTER VARYING(n) is that
a CHARACTER(n) column is fixed length: it will always contain a fixed number of characters (namely, n characters). A
CHARACTER VARYING(n) column can contain a maximum of n characters. I'll mention here that CHARACTER(n) can be
abbreviated as CHAR(n), and CHARACTER VARYING(n) can be abbreviated as VARCHAR(n). I chose CHAR(8) as the data type
for tape_id because I know that a tape_id will always contain exactly eight characters, never more and never less. Movie
titles, on the other hand, are not all the same length, so I chose VARCHAR(80) for those columns. A fixed length data type is
a good choice when the data that you store is in fact fixed length; and in some cases, fixed length data types can give you a
performance boost. A variable length data type saves space (and often gives you better performance) when the data that
you are storing is not all the same length and can vary widely.
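
The difference between the two character types can be observed directly. The following is a sketch under stated assumptions: char_demo is a hypothetical scratch table, and octet_length() is a built-in PostgreSQL function that reports the number of bytes in a value.

```sql
-- Hypothetical scratch table for comparing the two types.
CREATE TABLE char_demo (
    fixed_col    CHAR(8),
    variable_col VARCHAR(8)
);

-- 'abc' is shorter than 8 characters: fixed_col is padded with spaces
-- to 8 characters, while variable_col holds just the three characters.
INSERT INTO char_demo VALUES ('abc', 'abc');

-- Compare the number of bytes held in each column.
SELECT octet_length(fixed_col), octet_length(variable_col) FROM char_demo;

-- A value longer than the declared length is rejected with an error
-- (unless the excess characters are all spaces).
INSERT INTO char_demo VALUES ('abcdefghij', 'abcdefghij');
```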

The duration column is defined as an INTERVAL. An INTERVAL stores a period of time such as 2 weeks, 1 hour 45 minutes,
and so on.

I'll be discussing PostgreSQL data types in detail in Chapter 2. Let's move on to creating the other tables in this example
database.

The customers table is used to record information about each customer for the video store.

CREATE TABLE customers (
    customer_id   INTEGER UNIQUE,
    customer_name VARCHAR(50),
    phone         CHAR(8),
    birth_date    DATE,
    balance       NUMERIC(7,2)
);

Each customer will be assigned a unique customer_id. Notice that customer_id is defined as an INTEGER, whereas the
identifier for a tape was defined as a CHAR(8). A tape_id can contain alphabetic characters, but a customer_id is entirely
numeric [4].

[4]
The decision to define customer_id as an INTEGER was arbitrary. I simply wanted to show a few more data
types here.

I've used two other data types here that you may not have seen before: DATE and NUMERIC. A DATE column can hold date
values (century, year, month, and day). PostgreSQL offers other date/time data types that can store different date/time
components. For example, a TIME column can store time values (hours, minutes, seconds, and microseconds). A TIMESTAMP
column gives you both date and time components: centuries through microseconds.
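
The following sketch shows a literal value of each date/time type mentioned so far. The specific dates and times are arbitrary, chosen only for illustration.

```sql
-- Literal values of each type; the dates and times are arbitrary.
SELECT DATE '2005-07-26';                  -- year, month, and day
SELECT TIME '13:45:30';                    -- hours, minutes, and seconds
SELECT TIMESTAMP '2005-07-26 13:45:30';    -- date and time together
SELECT INTERVAL '1 hour 45 minutes';       -- a span of time, like tapes.duration

-- Date/time values support arithmetic: a date plus a period of time.
SELECT DATE '2005-07-26' + INTERVAL '2 weeks';
```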

A NUMERIC column, obviously, holds numeric values. When you create a NUMERIC column, you have to tell PostgreSQL the
total number of digits that you want to store and the number of fractional digits (that is, the number of digits to the right of
the decimal point). The balance column contains a total of seven digits, with two digits to the right of the decimal point.

Now, let's create the rentals table:
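
A quick way to get a feel for NUMERIC(7,2) is to cast a few literal values to that type. This is a sketch only; the literals are arbitrary.

```sql
-- NUMERIC(7,2): up to 7 digits in total, 2 of them after the decimal
-- point, so the largest value that fits is 99999.99.
SELECT CAST(12345.678 AS NUMERIC(7,2));   -- extra fractional digits are rounded: 12345.68
SELECT CAST(99999.99  AS NUMERIC(7,2));   -- fits exactly
SELECT CAST(123456.78 AS NUMERIC(7,2));   -- error: too many digits left of the decimal point
```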

CREATE TABLE rentals (
    tape_id     CHARACTER(8),
    customer_id INTEGER,
    rental_date DATE
);

When a customer comes in to rent a tape, you will add a row to the rentals table to record the transaction. There are three
pieces of information that you need to record for each rental: the tape_id, the customer_id, and the date that the rental
occurred. Notice that each row in the rentals table refers to a customer (customer_id) and a tape (tape_id). In most cases,
when one row refers to another row, you want to use the same data type for both columns.

What Makes a Relational Database Relational?

Notice that each row in the rentals table refers to a row in the customers table (and a row in the tapes
table). In other words, there is a relationship between rentals and customers and a relationship between
rentals and tapes. The relationship between two rows is established by including an identifier from one row
within the other row. Each row in the rentals table refers to a customer by including the customer_id. That's
the heart of the relational database model: the relationship between two entities is established by including
the unique identifier of one entity within the other.
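
Those shared identifiers are what let a single query follow the relationships. The sketch below assumes the three tables defined above (with some rows already in them) and matches each rental to its customer and its tape; the short aliases r, c, and t are just local names chosen for this example.

```sql
-- For each rental, look up the customer and the tape it refers to.
SELECT c.customer_name, t.title, r.rental_date
  FROM rentals r, customers c, tapes t
 WHERE r.customer_id = c.customer_id   -- follow rentals -> customers
   AND r.tape_id     = t.tape_id;      -- follow rentals -> tapes
```

Because the matching columns were declared with the same data types (INTEGER for customer_id, CHARACTER(8) for tape_id), the comparisons need no conversions.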

Viewing Table Descriptions

At this point, you've defined three tables in the movies database: tapes, customers, and rentals. If you want to view the
table definitions, you can use the \d meta-command in psql (remember that a meta-command is not really a SQL command,
but a command understood by the psql client). The \d meta-command comes in two flavors: If you include a table name (\d
customers), you will see the definition of that table; if you don't include a table name, \d will show you a list of all the tables
defined in your database.

$ psql -d movies
Welcome to psql, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms


\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit

movies=# \d
List of relations
Name | Type | Owner
-----------+-------+---------------
customers | table | bruce
rentals | table | bruce
tapes | table | bruce
(3 rows)

movies=# \d tapes
Table "tapes"

Column | Type | Modifiers


----------+-----------------------+-----------
tape_id | character(8) |
title | character varying(80) |
duration | interval |

Indexes:
"tapes_tape_id_key" UNIQUE, btree (tape_id)

movies=# \d customers
Table "customers"
Attribute | Type | Modifier
--------------+-----------------------+----------
customer_id | integer |
customer_name| character varying(50) |
phone | character(8) |
birth_date | date |
balance | numeric(7,2) |
Index: customers_customer_id_key

movies=# \d rentals
Table "rentals"
Attribute | Type | Modifier
-------------+--------------+----------
tape_id | character(8) |
customer_id | integer |
rental_date | date |

movies=#

I'll point out a few things about the \d meta-command.

Notice that for each column in a table, the \d meta-command returns three pieces of information: the column name (or
Attribute), the data type, and a Modifier.

The data type reported by the \d meta-command is spelled out; you won't see char(n) or varchar(n), you'll see
character(n) and character varying(n) instead.

The Modifier column shows additional column attributes. The most commonly encountered modifiers are NOT NULL and
DEFAULT .... The NOT NULL modifier appears when you create a mandatory column; mandatory means that each row in the
table must have a value for that column. The DEFAULT ... modifier appears when you create a column with a default value.
A default value is inserted into a column when you don't specify a value for that column. If you don't specify a default
value, PostgreSQL inserts the special value NULL. I'll discuss NULL values and default values in more detail in Chapter 2.
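
As a rough illustration of where those modifiers come from, the following sketch creates a throwaway table with two mandatory columns and one column that has a default value. The table name members and its columns are hypothetical; they are not part of the movies database used in this chapter.

```sql
-- Hypothetical table, used only to show how modifiers arise.
CREATE TABLE members
(
    member_id   INTEGER      NOT NULL,     -- mandatory: every row needs a value
    member_name VARCHAR(50)  NOT NULL,     -- mandatory as well
    balance     NUMERIC(7,2) DEFAULT 0.00  -- used when INSERT supplies no balance
);

-- balance is omitted here, so the new row receives the default (0.00).
INSERT INTO members ( member_id, member_name ) VALUES ( 1, 'Jones, Henry' );
```

If you then type \d members in psql, the modifier column should show not null next to member_id and member_name and a default entry next to balance; the exact wording varies between PostgreSQL versions.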

You might have noticed that the listings for the tapes and customers tables show that an index has been created.
PostgreSQL automatically creates an index for you when you define UNIQUE columns. An index is a data structure that
PostgreSQL can use to ensure uniqueness. Indexes are also used to increase performance. I'll cover indexes in more detail
in Chapter 3, "PostgreSQL SQL Syntax and Use."

Depending on which version of PostgreSQL you're using, you may see each table name listed as Table "public.table-name".
The "public" part is the name of the schema that the table is defined in.

Adding New Records to a Table

The two previous sections showed you how to create some simple tables and how to view the table definitions. Now let's see
how to insert data into these tables.

Using the INSERT Command

The most common method to get data into a table is by using the INSERT command. Like most SQL commands, there are a
number of different formats for the INSERT command. Let's look at the simplest form first:

INSERT INTO table VALUES ( expression [,...] );

A Quick Introduction to Syntax Diagrams

In many books that describe a computer language (such as SQL), you will see syntax diagrams. A syntax
diagram is a precise way to describe the syntax for a command. Here is an example of a simple syntax
diagram:

INSERT INTO table VALUES ( expression [,...] );

In this book, I'll use the following conventions:

• Words that are presented in uppercase must be entered literally, as shown, except for the case. When
you enter these words, it doesn't matter if you enter them in uppercase, lowercase, or mixed case, but
the spelling must be the same. SQL keywords are traditionally typed in uppercase to improve
readability, but the case does not really matter otherwise.

• A lowercase italic word is a placeholder for user-provided text. For example, the table placeholder
shows where you would enter a table name, and expression shows where you would enter an
expression.

• Optional text is shown inside a pair of square brackets ([]). If you include optional text, don't include
the square brackets.

• Finally, ,... means that you can repeat the previous component one or more times, separating
multiple occurrences with commas.

So, the following INSERT commands are (syntactically) correct:

INSERT INTO states VALUES ( 'WA', 'Washington' );
INSERT INTO states VALUES ( 'OR' );

This command would not be legal:

INSERT states VALUES ( 'WA' 'Washington' );

There are two problems with this command. First, I forgot to include the INTO keyword (following INSERT).
Second, the two values that I provided are not separated by a comma.

When you use an INSERT statement, you have to provide the name of the table and the values that you want to include in
the new row. The following command inserts a new row into the customers table:

INSERT INTO customers VALUES
(
1,
'William Rubin',
'555-1212',
'1970-12-31',
0.00
);

This command creates a single row in the customers table. Notice that you did not have to tell PostgreSQL how to match up
each value with a specific column: in this form of the INSERT command, PostgreSQL assumes that you listed the values in
column order. In other words, the first value that you provide will be placed in the first column, the second value will be
stored in the second column, and so forth. (The ordering of columns within a table is defined when you create the table.)

If you don't include one (or more) of the trailing values, PostgreSQL will insert default values for those columns. The
default value is typically NULL.

Notice that I have included single quotes around some of the data values. Numeric data should not be quoted; most other
data types must be. In Chapter 2, I'll cover the literal value syntax for each data type.

In the second form of the INSERT statement, you include a list of columns and a list of values:

INSERT INTO table ( column [,...] ) VALUES ( expression [,...] );

Using this form of INSERT, I can specify the order of the column values:

INSERT INTO customers
(
customer_name, birth_date, phone, customer_id, balance
)
VALUES
(
'William Rubin',
'1970-12-31',
'555-1212',
1,
0.00
);

As long as the column values match up with the order of the column names that you specified, everybody's happy.

The advantage to this second form is that you can omit the value for any column (at least any column that allows NULLs).
If you use the first form (without column names), you can only omit values for trailing columns. You can't omit a value in
the middle of the row because PostgreSQL can only match up column values in left-to-right order.

Here is an example that shows how to INSERT a customer who wasn't willing to give you his date of birth:

INSERT INTO customers
(
customer_name, phone, customer_id, balance
)
VALUES
(
'William Rubin',
'555-1212',
1,
0.00
);

This is equivalent to either of the following statements:

INSERT INTO customers
(
customer_name, birth_date, phone, customer_id, balance
)
VALUES
(
'William Rubin',
NULL,
'555-1212',
1,
0.00
);

or

INSERT INTO customers VALUES
(
1,
'William Rubin',
'555-1212',
NULL,
0.00
);

There are two other forms for the INSERT command. If you want to create a row that contains only default values, you can
use the following form:

INSERT INTO table DEFAULT VALUES;

Of course, if any of the columns in your table are unique and have a default value other than NULL, you can only insert a
single row with default values; a second attempt would duplicate the default.

The final form for the INSERT statement allows you to insert one or more rows based on the results of a query:

INSERT INTO table ( column [,...] ) SELECT query;

I haven't really talked extensively about the SELECT statement yet (that's in the next section), but I'll show you a simple
example here:

INSERT INTO customer_backup SELECT * FROM customers;

This INSERT command copies every row in the customers table into the customer_backup table. It's unusual to use
INSERT...SELECT... to make an exact copy of a table (in fact, there are easier ways to do that). In most cases, you will use
the INSERT...SELECT... command to make an altered version of a table; you might add or remove columns or change the
data using expressions.
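
To make the idea of an altered copy more concrete, here is a minimal sketch. The customer_summary table and its balance_due column are hypothetical names invented for this illustration, and the WHERE clause (covered later in this chapter) is used only to show that the copy need not include every row.

```sql
-- Hypothetical target table holding a trimmed-down copy of customers.
CREATE TABLE customer_summary
(
    customer_id   INTEGER,
    customer_name VARCHAR(50),
    balance_due   NUMERIC(7,2)
);

-- Copy three of the five columns, change one value with an expression,
-- and keep only the customers who owe money.
INSERT INTO customer_summary ( customer_id, customer_name, balance_due )
    SELECT customer_id, upper( customer_name ), balance
      FROM customers
     WHERE balance > 0;
```

The column list after the table name plays the same role as in the second form of INSERT: the values produced by the SELECT are matched to those columns from left to right.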

Using the COPY Command

If you need to load a lot of data into a table, you might want to use the COPY command. The COPY command comes in two
forms. COPY ... TO writes the contents of a table into an external file. COPY ... FROM reads data from an external file into
a table.

Let's start by exporting the customers table:

COPY customers TO '/tmp/customers.txt';

This command copies every row in the customers table into a file named '/tmp/customers.txt'. Take a look at the
customers.txt file:

1 Jones, Henry 555-1212 1970-10-10 0.00
2 Rubin, William 555-2211 1972-07-10 15.00
3 Panky, Henry 555-1221 1968-01-21 0.00
4 Wonderland, Alison 555-1122 1980-03-05 3.00

If you compare the file contents with the definition of the customers table:

movies=# \d customers
                Table "customers"
  Attribute   |         Type          | Modifier
--------------+-----------------------+----------
 customer_id  | integer               |
 customer_name| character varying(50) |
 phone        | character(8)          |
 birth_date   | date                  |
 balance      | numeric(7,2)          |
Index: customers_customer_id_key

You can see that the columns in the text form match (left to right) with the columns defined in the table: the leftmost
column is the customer_id, followed by customer_name, phone, and so on. Each column is separated from the next by a tab
character and each row ends with an invisible newline character. You can choose a different column separator (with the
DELIMITERS 'delimiter' option), but you can't change the line terminator. That means that you have to be careful editing a
COPY file using a DOS (or Windows) text editor because most of these editors terminate each line with a
carriage-return/newline combination. That will confuse the COPY ... FROM command when you try to import the text file.

The inverse of COPY ... TO is COPY ... FROM. COPY ... FROM imports data from an external file into a PostgreSQL table.
When you use COPY ... FROM, the format of the text file is very important. The easiest way to find the correct format is to
export a few rows using COPY ... TO, and then examine the text file.

If you decide to create your own text file for use with the COPY ... FROM command, you'll have to worry about a lot of
details like proper quoting, column delimiters, and such. Consult the PostgreSQL reference documentation for more details.
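
The export-then-import round trip described above might look like the following sketch. The customers_import table is a hypothetical scratch table, and the example assumes a PostgreSQL version that supports the LIKE clause in CREATE TABLE. Keep in mind that the file name is interpreted by the database server, not by psql, so the server must be able to read and write that path, and your account needs sufficient privileges to do so.

```sql
-- Export every row of customers to a text file on the database server.
COPY customers TO '/tmp/customers.txt';

-- Hypothetical scratch table with the same column layout as customers.
CREATE TABLE customers_import ( LIKE customers );

-- Load the exported rows back in. The file is in exactly the format that
-- COPY ... TO produced, so no hand editing is needed.
COPY customers_import FROM '/tmp/customers.txt';
```

Importing into a scratch table first is a cautious design choice: you can inspect the loaded rows before touching the real table.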

Installing the Sample Database

If you want, you can download a sample database from this book's website: http://www.conjectrix.com/pgbook.

After you have downloaded the bookdata.tar.gz file, you can unpack it with either of the following commands:

$ tar -zxvf bookdata.tar.gz

or

$ gunzip -c bookdata.tar.gz | tar -xvf -

The bookdata.tar.gz file contains a number of files and will unpack into your current directory. After unpacking, you will
see a subdirectory for each chapter (okay, for most chapters; not all chapters include sample code or sample data).

You can use the chapter1/load_sample.sql file to create and populate the three tables that I have discussed (tapes,
customers, and rentals). To use the load_sample.sql file, execute the following command:

$ psql -d movies -f chapter1/load_sample.sql

This command drops the tapes, customers, and rentals tables (if they exist), creates them, and adds a few sample rows to
each one.

Retrieving Data from the Sample Database

At this point, you should have a sample database (movies) that contains three tables (tapes, customers, and rentals) and a
few rows in each table. You know how to get data into a table; now let's see how to view that data.

The SELECT statement is used to retrieve data from a database. SELECT is the most complex statement in the SQL language,
and the most powerful. Using SELECT, you can retrieve entire tables, single rows, a group of rows that meet a set of
constraints, combinations of multiple tables, expressions, and more. To help you understand the basics of the SELECT
statement, I'll try to break it down into each of its forms and move from the simple to the more complex.

SELECT Expression

In its simplest form, you can use the SELECT statement to retrieve one or more values from a set of predefined functions.
You've already seen how to retrieve your PostgreSQL user id:

movies=# select user;
 current_user
---------------
 korry
(1 row)

movies=# \q

Other values that you might want to see are

select 5;           -- returns the number 5 (whoopee)
select sqrt(2.0);   -- returns the square root of 2
select timeofday(); -- returns current date/time
select now();       -- returns time of start of transaction
select version();   -- returns the version of PostgreSQL you are using

select now(), timeofday();

Commenting

The -- characters introduce a comment; any text that follows is ignored.

The previous example shows how to SELECT more than one piece of information: just list all the values that you want,
separated by commas.

The PostgreSQL User's Guide contains a list of all the functions that are distributed with PostgreSQL. In Chapter 2, I'll
show you how to combine columns, functions, operators, and literal values into more complex expressions.

SELECT * FROM Table

You probably won't use the first form of the SELECT statement very often; it just isn't very exciting. Moving to the next
level of complexity, let's see how to retrieve data from one of the tables that you created earlier:

movies=# SELECT * FROM customers;
 customer_id |    customer_name     |  phone   | birth_date | balance
-------------+----------------------+----------+------------+---------
           3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00
           1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00
           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00
           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00
(4 rows)

When you write a SELECT statement, you have to tell PostgreSQL what information you are trying to retrieve. Let's take a
closer look at the components of this SELECT statement.

Following the SELECT keyword, you specify a list of the columns that you want to retrieve. I used an asterisk (*) here to
tell PostgreSQL that we want to see all the columns in the customers table.

Next, you have to tell PostgreSQL which table you want to view; in this case, you want to see the customers table.

Now let's look at the results of this query. A SELECT statement returns a result set. A result set is a table composed of
all the rows and columns (or fields) that you request. A result set may be empty.

You asked PostgreSQL to return all the columns in the customers table; notice that the columns are displayed (from left to
right) in the order that you specified when you created the table. You may have noticed that the rows are returned in an
(apparently) arbitrary order. That's an important thing to keep in mind: unless you specifically request that PostgreSQL
return rows in a particular order, you won't be able to predict which rows will come first [5]. This is a performance
feature; if you don't care about row ordering, let PostgreSQL return the rows in the fastest possible way.

[5] Okay, some people probably could predict the order in which the rows will appear. Those people have way too much free
time and consider a propeller to be fashionable headwear. They are also very good at inducing sleep.
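
If you do care about the order, the usual way to request one is an ORDER BY clause. The two queries below are only a small sketch against the sample customers table; they are illustrations rather than part of the chapter's own examples.

```sql
-- Ask for a predictable ordering instead of relying on whatever order
-- PostgreSQL happens to find convenient.
SELECT * FROM customers ORDER BY customer_id;

-- Sort on more than one column: largest balance first, and customers
-- with equal balances in name order.
SELECT customer_name, balance
  FROM customers
 ORDER BY balance DESC, customer_name;
```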

Making the Most of the psql Console

You'll be spending a lot of time using the psql console so it's a good idea to get to know it well. psql can do more than
just send a command to the server and display the result. You can use bash-style tab completion to reduce the amount of
typing you have to do. To use tab completion, just type in the first few characters of a word and then press the TAB key;
psql will try to complete the rest of the word. Tab completion is smart. If you type in the first few characters of a
command and then press TAB, psql tries to complete the command name. If you've already entered DROP DATABASE and
then press TAB, psql shows you a list of databases; type in the first few characters of a database, press TAB, and psql
completes the name of the database. If you press TAB in a context where psql expects to find a user name (like DROP USER
<TAB>), you'll see a list of users. psql can complete column names, data type names, domain names, aggregate names,
function names, index names, table names, view names, database names, encodings, languages, schemas, and users.

You can also change the format that psql uses to display query results. By default, psql uses a format named aligned
(each column is preceded by a column header and values are aligned within a grid). You can also choose unaligned, html,
or latex format. To change the output format, use the command \pset format format-name. For example, to switch to
html format, type in the command \pset format html. Once you're in HTML mode, query results will include the HTML
tags required to display the results in tabular form. You probably want to send HTML output to a file (rather than seeing
all of the formatting commands in your terminal window). Use the \o filename command to route query results to the given
filename. See the psql manual page ($ man psql) for complete details. Play around with the formatting options. Play
around with tab completion. Change your psql prompt. psql packs a lot of power into an easy-to-use interface.
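
Putting the formatting commands together, a short psql session that writes one query result to an HTML file might look like the sketch below. The file name /tmp/customers.html is an arbitrary choice made for this illustration; lines that begin with a backslash are psql meta-commands rather than SQL.

```sql
-- Switch the output format to HTML and route query results to a file.
\pset format html
\o /tmp/customers.html

-- This result is written to the file instead of the terminal.
SELECT customer_name, balance FROM customers;

-- \o with no file name sends results back to the terminal;
-- then restore the default aligned format.
\o
\pset format aligned
```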

SELECT Single-Column FROM Table

If you don't want to view all of the columns from a table, you can replace the * (following the SELECT keyword) with the
name of a column:

movies=# SELECT title FROM tapes;
     title
---------------
 The Godfather
 The Godfather
 Casablanca
 Citizen Kane
 Rear Window
(5 rows)

Again, the rows are presented in an arbitrary order. But this time you see only a single column. You may have noticed that
"The Godfather" appears twice in this list. That happens because our imaginary video store owns two copies of that movie.
I'll show you how to get rid of duplicates in a moment.

SELECT Column-List FROM Table

So far, you have seen how to select all the columns in a table and how to select a single column. Of course, there is a
middle ground: you can select a list of columns:

movies=# SELECT customer_name, birth_date FROM customers;
    customer_name     | birth_date
----------------------+------------
 Jones, Henry         | 1970-10-10
 Rubin, William       | 1972-07-10
 Panky, Henry         | 1968-01-21
 Wonderland, Alice N. | 1969-03-05
(4 rows)

Instead of naming a single column after the SELECT keyword, you can provide a comma-separated list of column names. Column
names can appear in any order, and the results will appear in the order you specify.

SELECT Expression-List FROM Table

In addition to selecting columns, you can also select expressions. Remember, an expression is a combination of columns,
functions, operators, literal values, and other expressions that will evaluate to a single value. Here is an example:

movies=# SELECT
movies-# customer_name,
movies-# birth_date,
movies-# age( birth_date )
movies-# FROM customers;
    customer_name     | birth_date |             age
----------------------+------------+------------------------------
 Jones, Henry         | 1970-10-10 | 31 years 4 mons 3 days 01:00
 Rubin, William       | 1972-07-10 | 29 years 7 mons 3 days 01:00
 Panky, Henry         | 1968-01-21 | 34 years 23 days
 Wonderland, Alice N. | 1969-03-05 | 32 years 11 mons 8 days
(4 rows)

In this example, I've selected two columns and an expression. The expression age( birth_date ) is evaluated for each row
in the table. The age() function subtracts the given date from the current date [6].

[6] Technically, the age() function subtracts the given timestamp (date+time) from the current date and time.

Selecting Specific Rows

The preceding few sections have shown you how to specify which columns you want to see in a result set. Now let's see how
to choose only the rows that you want.

First, I'll show you how to eliminate duplicate rows; then I'll introduce the WHERE clause.

SELECT [ALL | DISTINCT | DISTINCT ON]

In an earlier example, you selected the titles of all the videotapes owned by your video store:

movies=# SELECT title FROM tapes;
     title
---------------
 The Godfather
 The Godfather
 Casablanca
 Citizen Kane
 Rear Window
(5 rows)

Notice that "The Godfather" is listed twice (you own two copies of that video). You can use the DISTINCT clause to filter
out duplicate rows:

movies=# SELECT DISTINCT title FROM tapes;
     title
---------------
 Casablanca
 Citizen Kane
 Rear Window
 The Godfather
(4 rows)

You now have a single row with the value "The Godfather." Let's see what happens when you add the tape_id back into the
previous query:

movies=# SELECT DISTINCT title, tape_id FROM tapes;
     title     | tape_id
---------------+----------
 Casablanca    | MC-68873
 Citizen Kane  | OW-41221
 Rear Window   | AH-54706
 The Godfather | AB-12345
 The Godfather | AB-67472
(5 rows)

We're back to seeing "The Godfather" twice. What happened? The DISTINCT clause removes duplicate rows, not duplicate column
values; and when the tape IDs are added to the result, the rows containing "The Godfather" are no longer identical.

If you want to filter rows that have duplicate values in one (or more) columns, use the DISTINCT ON() form:

movies=# SELECT DISTINCT ON (title) title, tape_id FROM tapes;
     title     | tape_id
---------------+----------
 Casablanca    | MC-68873
 Citizen Kane  | OW-41221
 Rear Window   | AH-54706
 The Godfather | AB-12345
(4 rows)

Notice that one of the "The Godfather" rows has been omitted from the result set. If you don't include an ORDER BY clause
(I'll cover that in a moment), you can't predict which row in a set of duplicates will be included in the result set.

You can list multiple columns (or expressions) in the DISTINCT ON() clause.
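
As a small sketch of how ORDER BY makes the surviving row predictable, the query below sorts by title first (PostgreSQL expects the DISTINCT ON expressions to come first in the ORDER BY list) and then by tape_id. Choosing the lowest tape_id as the tie-breaker is just an assumption made for this illustration.

```sql
-- Keep one row per title. Within each title the rows are sorted by
-- tape_id, and DISTINCT ON keeps the first row of each group.
SELECT DISTINCT ON ( title ) title, tape_id
  FROM tapes
 ORDER BY title, tape_id;
```

With the sample data shown above, this should always report tape AB-12345 for "The Godfather" rather than leaving the choice to chance.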

The WHERE Clause

The next form of the SELECT statement includes the WHERE clause. Here is the syntax diagram for this form:

SELECT expression-list FROM table WHERE conditions

Using the WHERE clause, you can filter out rows that you don't want included in the result set. Let's see a simple example.
First, here is the complete customers table:

movies=# SELECT * FROM customers;
 customer_id |    customer_name     |  phone   | birth_date | balance
-------------+----------------------+----------+------------+---------
           3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00
           1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00
           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00
           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00
(4 rows)

Now pick out only those customers who owe you some money:

movies=# SELECT * FROM customers WHERE balance > 0;
 customer_id |    customer_name     |  phone   | birth_date | balance
-------------+----------------------+----------+------------+---------
           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00
           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00
(2 rows)

In this example, I've used a single condition to restrict the rows included in the result set: balance > 0.

When PostgreSQL executes a SELECT statement, it evaluates the WHERE clause as it processes each row. If all the conditions
specified by the WHERE clause are met, the row will be included in the result set (if a row meets all the conditions in the
WHERE clause, the row satisfies the WHERE clause).

Here is an example that is slightly more complex:

movies=# SELECT customer_name, phone FROM customers
movies-#   WHERE
movies-#     ( balance = 0 )
movies-#     AND
movies-#     ( AGE( birth_date ) < '35 years' )
movies-# ;
 customer_name |  phone
---------------+----------
 Jones, Henry  | 555-1212
(1 row)

In this query, I've specified two conditions, separated by an AND operator. The conditions are: balance = 0 and
AGE( birth_date ) < '35 years' [7]. As before, PostgreSQL reads each row in the customers table and evaluates the WHERE
clause. If a given row is to be included in the result set, it must satisfy two constraints: balance must be equal to zero
and the customer must be younger than 35 years of age. If either of these conditions is false for a given row, that row will
not be included in the result set.

[7] I'll show you how to format various date/time related values in Chapter 2.

AND is one of the logical operators supported by PostgreSQL. A logical operator is used to combine logical expressions. A
logical expression is an expression that evaluates to TRUE, FALSE, or unknown (NULL). The other two logical operators are
OR and NOT.

Let's see how the OR operator works:

movies=# SELECT customer_id, customer_name, balance, AGE(birth_date)
movies-#   FROM customers
movies-#   WHERE
movies-#     ( balance = 0 )
movies-#     OR
movies-#     ( AGE( birth_date ) < '35 years' )
movies-# ;
 customer_id | customer_name  | balance |               age
-------------+----------------+---------+----------------------------------
           3 | Panky, Henry   |    0.00 | 36 years 8 mons 29 days 23:00:00
           1 | Jones, Henry   |    0.00 | 34 years 10 days
           2 | Rubin, William |   15.00 | 32 years 3 mons 10 days
(3 rows)

The OR operator evaluates to TRUE if either (or both) of the conditions is TRUE. The row for customer 1 (id = 1) is included
in the result set because it satisfies the first condition (balance = 0); it would be included even if it did not satisfy
the second condition. The row for customer 2 (id = 2) is included in the result set because it satisfies the second
condition, but not the first. You can see the difference between AND and OR: a row satisfies the AND operator if both
conditions are TRUE; a row satisfies the OR operator if either condition is TRUE (or if both are TRUE).

The NOT operator is simple:

movies=# SELECT * FROM customers
movies-#   WHERE
movies-#     NOT ( balance = 0 )
movies-# ;
 customer_id |    customer_name     |  phone   | birth_date | balance
-------------+----------------------+----------+------------+---------
           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00
           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00
(2 rows)

NOT evaluates to TRUE if its operand is FALSE and evaluates to FALSE if its operand is TRUE. The NOT operator inverts (or
reverses) a test. Without the NOT operator, the previous example would have returned all customers where the balance column
was equal to zero. With the NOT operator, you get the other rows instead.

There is one other point that I should mention about the WHERE clause: just because you mention a column in the WHERE
clause does not mean that you have to include the column in the result set. For example:

movies=# SELECT customer_id, customer_name FROM customers
movies-#   WHERE
movies-#     balance != 0
movies-# ;
 customer_id |    customer_name
-------------+----------------------
           4 | Wonderland, Alice N.
           2 | Rubin, William
(2 rows)

This example also shows a more common alternative to the NOT operator. The != operator means "is not equal to." The !=
operator is not an exact replacement for NOT; it can only be used to check for inequality, whereas NOT is used to reverse
the sense of any logical expression.
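
The following sketch contrasts the two. The first pair of queries should return the same rows; the third uses NOT to reverse a compound condition, which a single != test cannot express. The compound condition is borrowed from the earlier AND example purely for illustration.

```sql
-- Two equivalent ways to find customers with a nonzero balance.
-- <> is the SQL-standard spelling of !=; PostgreSQL accepts both.
SELECT customer_id, customer_name FROM customers WHERE balance <> 0;
SELECT customer_id, customer_name FROM customers WHERE NOT ( balance = 0 );

-- NOT reverses the sense of the whole parenthesized expression.
SELECT customer_id, customer_name
  FROM customers
 WHERE NOT ( balance = 0 AND AGE( birth_date ) < '35 years' );
```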

NULL Values

Sometimes when you add data to a table, you find that you don't know what value you should include for a column. For
example, you may encounter a customer who does not want to provide you with his or her birthday. What value should be
recorded in the birth_date column for that customer? You don't really want to make up an answer; you want a date value that
means "unknown." This is what the NULL value is for. NULL usually means that you don't know what value should be entered
into a column, but it can also mean that a column does not apply. A NULL value in the birth_date column certainly means
that we don't know a customer's birth_date, not that birth_date does not apply [8]. On the other hand, you might want to
include a rating column in the tapes table. A NULL value in the rating column might imply that the movie was produced
before ratings were introduced and therefore the rating column does not apply.

[8] I am making the assumption that the customers for your video store have actually been born. For some of you, that may
not be a valid assumption.

Some columns should not allow NULL values. In most cases, it would not make sense to add a customer to your customers table
unless you know the customer's name. Therefore, the customer_name column should be mandatory (in other words,
customer_name should not allow NULL values).

Let's drop and re-create the customers table so that you can tell PostgreSQL which columns should allow NULL values:

movies=# DROP TABLE customers;
DROP
movies=# CREATE TABLE customers (
movies-#   customer_id   INTEGER UNIQUE NOT NULL,
movies-#   customer_name VARCHAR(50)    NOT NULL,
movies-#   phone         CHAR(8),
movies-#   birth_date    DATE,
movies-#   balance       DECIMAL(7,2)
movies-# );
CREATE

The NOT NULL modifier tells PostgreSQL that the customer_id and customer_name columns are mandatory. If you don't specify
NOT NULL, PostgreSQL assumes that a column is optional. You can include the keyword NULL to make your choices more obvious:

movies=# DROP TABLE customers;
DROP
movies=# CREATE TABLE customers (
movies-#   customer_id   INTEGER UNIQUE NOT NULL,
movies-#   customer_name VARCHAR(50)    NOT NULL,
movies-#   phone         CHAR(8)        NULL,
movies-#   birth_date    DATE           NULL,
movies-#   balance       DECIMAL(7,2)   NULL
movies-# );
CREATE

Notice that a column of any data type can support NULL values.

The NULL value has a unique property that is often the source of much confusion. NULL is not equal to any value, not even
itself. NULL is not less than any value, and NULL is not greater than any value. Let's add a customer with a NULL balance:

movies=# INSERT INTO customers
movies-# VALUES
movies-# (
movies(# 5, 'Funkmaster, Freddy', '555-FUNK', NULL, NULL
movies(# )
movies-# ;

Now we have five customers:

movies=# SELECT * FROM customers;
 customer_id |    customer_name     |  phone   | birth_date | balance
-------------+----------------------+----------+------------+---------
           3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00
           1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00
           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00
           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00
           5 | Funkmaster, Freddy   | 555-FUNK |            |
(5 rows)

One of these customers has a NULL balance. Let's try a few queries:

movies=# SELECT * FROM customers WHERE balance > NULL;
 customer_id | customer_name | phone | birth_date | balance
-------------+---------------+-------+------------+---------
(0 rows)

This query did not return any rows. You might think that it should have returned customer number 2 (Rubin, William); after
all, 15.00 is surely greater than 0. But remember, NULL is not equal to, greater than, or less than any other value. NULL is
not the same as zero. Rather than using relational operators ('=', '!=', '<', or '>') to test for NULL, you should use
either the IS or IS NOT operator.

movies=# SELECT * FROM customers WHERE balance IS NULL;
 customer_id |   customer_name    |  phone   | birth_date | balance
-------------+--------------------+----------+------------+---------
           5 | Funkmaster, Freddy | 555-FUNK |            |
(1 row)

movies=# SELECT * FROM customers WHERE balance IS NOT NULL;
 customer_id |    customer_name     |  phone   | birth_date | balance
-------------+----------------------+----------+------------+---------
           3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00
           1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00
           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00
           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00
(4 rows)

The NULL value introduces another complication. If NULL is not greater than, equal to, or less than any other value, what would 'NULL + 4' mean? Is NULL + 4 greater than NULL? It can't be, because that would imply that NULL is less than NULL + 4 and, by definition, NULL can't be less than another value. What does all this mean? It means that you can't do math with a NULL value.

movies=# SELECT customer_id, customer_name, balance, balance+4 FROM customers;
 customer_id |    customer_name     | balance | ?column?
-------------+----------------------+---------+----------
           3 | Panky, Henry         |    0.00 |     4.00
           1 | Jones, Henry         |    0.00 |     4.00
           4 | Wonderland, Alice N. |    3.00 |     7.00
           2 | Rubin, William       |   15.00 |    19.00
           5 | Funkmaster, Freddy   |         |
(5 rows)

This query shows what happens when you try to perform a mathematical operation using NULL. When you try to add 4 to NULL, you end up with NULL.
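
If you want to experiment with these rules outside of psql, the following is a minimal sketch in Python. It assumes that SQLite (through Python's bundled sqlite3 module) is an acceptable stand-in for PostgreSQL; the expressions themselves are plain SQL. A NULL result comes back as Python's None, and SQLite reports TRUE as 1. The helper name evaluate() is made up for this illustration.

```python
import sqlite3

# Assumption: SQLite (bundled with Python) stands in for PostgreSQL here.
# The expressions are plain SQL; a NULL result comes back as None.
conn = sqlite3.connect(":memory:")

def evaluate(expression):
    """Evaluate a single SQL expression and return its value."""
    return conn.execute("SELECT " + expression).fetchone()[0]

results = {
    "NULL = NULL":  evaluate("NULL = NULL"),    # unknown, not TRUE
    "15.00 > NULL": evaluate("15.00 > NULL"),   # unknown, so a WHERE clause rejects the row
    "NULL + 4":     evaluate("NULL + 4"),       # arithmetic with NULL yields NULL
    "NULL IS NULL": evaluate("NULL IS NULL"),   # IS NULL gives a definite answer
}

for expression, value in results.items():
    print(f"{expression:<14} -> {value}")
```

Only the IS NULL test produces a definite answer; every other expression evaluates to NULL, which is why the WHERE balance > NULL query returned no rows.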

The NULL value complicates logic operators as well. Most programmers are familiar with two-valued logic operators (that is, logic operators that are defined for the values TRUE and FALSE). When you add in NULL values, the logic operators become a bit more complex. Tables 1.4, 1.5, and 1.6 show the truth tables for each logical operator.

Table 1.4. Truth Table for Three-Valued AND Operator

a       b       a AND b
-----   -----   -------
TRUE    TRUE    TRUE
TRUE    FALSE   FALSE
TRUE    NULL    NULL
FALSE   FALSE   FALSE
FALSE   NULL    FALSE
NULL    NULL    NULL

Source: PostgreSQL User's Guide

Table 1.5. Truth Table for Three-Valued OR Operator

a       b       a OR b
-----   -----   ------
TRUE    TRUE    TRUE
TRUE    FALSE   TRUE
TRUE    NULL    TRUE
FALSE   FALSE   FALSE
FALSE   NULL    NULL
NULL    NULL    NULL

Source: PostgreSQL User's Guide

Table 1.6. Truth Table for Three-Valued NOT Operator

a       NOT a
-----   -----
TRUE    FALSE
FALSE   TRUE
NULL    NULL

Source: PostgreSQL User's Guide
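
The three truth tables follow two simple rules: FALSE always wins an AND, TRUE always wins an OR, and otherwise a NULL operand makes the result NULL. Here is a small Python model of those rules. It is only a sketch, not PostgreSQL's implementation; it represents TRUE, FALSE, and NULL as True, False, and None, and the function names and3(), or3(), and not3() are invented for the illustration.

```python
from itertools import product

# Model: SQL TRUE / FALSE / NULL are represented as Python True / False / None.

def and3(a, b):
    """Three-valued AND: FALSE wins; otherwise NULL makes the result NULL."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """Three-valued OR: TRUE wins; otherwise NULL makes the result NULL."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    """Three-valued NOT: NULL stays NULL."""
    return None if a is None else not a

def show(value):
    """Render a model value the way the truth tables spell it."""
    return "NULL" if value is None else str(value).upper()

# Print every combination of operands with the AND and OR results.
for a, b in product([True, False, None], repeat=2):
    print(f"{show(a):<5} {show(b):<5} AND={show(and3(a, b)):<5} OR={show(or3(a, b))}")
```

The loop prints all nine operand combinations, including the mirror-image rows (such as NULL AND FALSE) that the tables leave out because AND and OR are symmetric.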

I don't mean to scare you away from the NULL value (it's very useful and often necessary), but you do have to understand the complications that it introduces.

NULLIF() and COALESCE()

PostgreSQL offers two operators that can convert a NULL value to some other value or convert a specific value into NULL.

The COALESCE() operator will substitute a default value whenever it encounters a NULL. For example, pretend that you've added two more columns, male_lead and female_lead, to the tapes table so that it looks like this:

movies=# SELECT * from tapes;
 tape_id  |     title     |    male_lead    |  female_lead   | duration
----------+---------------+-----------------+----------------+----------
 AB-12345 | The Godfather | Marlon Brando   |                | 02:55:00
 AB-67472 | The Godfather | Marlon Brando   |                | 02:55:00
 MC-68873 | Casablanca    | Humphrey Bogart | Ingrid Bergman | 01:42:00
 OW-41221 | Citizen Kane  |                 |                | 01:59:00
 AH-54706 | Rear Window   | James Stewart   | Grace Kelly    |
 AH-44289 | The Birds     |                 | Tippi Hedren   | 01:59:00
(6 rows)

You can use the COALESCE() operator to transform a NULL male_lead into the word 'Unknown':

movies=# SELECT title, COALESCE( male_lead, 'Unknown' ) FROM tapes;
     title     |    coalesce
---------------+-----------------
 The Godfather | Marlon Brando
 The Godfather | Marlon Brando
 Casablanca    | Humphrey Bogart
 Citizen Kane  | Unknown
 Rear Window   | James Stewart
 The Birds     | Unknown
(6 rows)

The COALESCE() operator is more talented than we've shown here: it can search through a list of values, returning the first non-NULL value it finds. For example, the following query prints the male_lead, or, if male_lead is NULL, the female_lead, or, if both are NULL, 'Unknown':

movies=# SELECT title, COALESCE( male_lead, female_lead, 'Unknown' )
movies-#   AS "Starring"
movies-#   FROM TAPES;
     title     |    Starring
---------------+-----------------
 The Godfather | Marlon Brando
 The Godfather | Marlon Brando
 Casablanca    | Humphrey Bogart
 Citizen Kane  | Unknown
 Rear Window   | James Stewart
 The Birds     | Tippi Hedren
(6 rows)

You can string together any number of expressions inside of the COALESCE() operator (as long as all expressions evaluate to the same type) and COALESCE() will evaluate to the leftmost non-NULL value in the list. If all of the expressions inside COALESCE() are NULL, the entire expression evaluates to NULL.

The NULLIF() operator translates a non-NULL value into NULL. NULLIF() is often used to do the opposite of COALESCE(): COALESCE() transforms NULL into a default value, and NULLIF() translates a default value into NULL. In many circumstances, you want to treat a numeric value and a NULL value as being the same thing. For example, the balance column (in the customers table) is NULL until a customer actually rents a tape: a NULL balance implies that you haven't actually done any business with the customer yet. But a NULL balance also implies that the customer owes you no money. To convert a NULL balance to 0, use COALESCE( balance, 0 ). To convert a zero balance to NULL, use NULLIF( balance, 0 ). When PostgreSQL evaluates a NULLIF( arg1, arg2 ) expression, it compares the two arguments; if they are equal, the expression evaluates to NULL; if they are not equal, the expression evaluates to the value of arg1.
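
To see both operators side by side, here is a short Python sketch. It assumes that SQLite (via Python's bundled sqlite3 module) can stand in for PostgreSQL; COALESCE() and NULLIF() are standard SQL, so the expressions are the same ones you would type into psql. The three sample rows are a subset of the tapes table shown earlier.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; COALESCE() and NULLIF()
# are standard SQL and follow the rules described in the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tapes (title TEXT, male_lead TEXT, female_lead TEXT)")
conn.executemany(
    "INSERT INTO tapes VALUES (?, ?, ?)",
    [
        ("Casablanca",   "Humphrey Bogart", "Ingrid Bergman"),
        ("Citizen Kane", None,              None),
        ("The Birds",    None,              "Tippi Hedren"),
    ],
)

# COALESCE() returns the leftmost non-NULL argument.
starring = conn.execute(
    "SELECT title, COALESCE(male_lead, female_lead, 'Unknown') "
    "FROM tapes ORDER BY title"
).fetchall()

# NULLIF(a, b) returns NULL when a equals b, otherwise it returns a.
zero_to_null, kept = conn.execute("SELECT NULLIF(0, 0), NULLIF(15, 0)").fetchone()

# Round trip: NULLIF() turns the zero into NULL, COALESCE() turns it back.
round_trip = conn.execute("SELECT COALESCE(NULLIF(0, 0), 0)").fetchone()[0]

print(starring)
print(zero_to_null, kept, round_trip)
```

The round trip at the end shows why the two operators are described as opposites: one hides a default value behind NULL and the other restores it.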

The CASE Expression

The CASE expression is a more generic form of NULLIF() and COALESCE(). A CASE expression lets you map any given value into some other value. You can write a CASE expression in two different forms. The first form (called the simple form) looks like this:

CASE expression1
  WHEN value1 THEN result1
  WHEN value2 THEN result2
  ...
  [ ELSE resultn ]
END

When PostgreSQL evaluates a simple CASE expression, it computes the value of expression1 and then compares the result to value1. If expression1 equals value1, the CASE expression evaluates to result1. If not, PostgreSQL compares expression1 to value2; if they match, the CASE expression evaluates to result2. PostgreSQL continues searching through the WHEN clauses until it finds a match. If none of the values match expression1, the expression evaluates to the value specified in the ELSE clause. If PostgreSQL gets all the way to the end of the list and you haven't specified an ELSE clause, the CASE expression evaluates to NULL. Note that result1, result2, … resultn must all have the same data type.

You can see that NULLIF( balance, 0 ) is equivalent to

CASE balance
  WHEN 0 THEN NULL
  ELSE balance
END

The second, more flexible form of the CASE expression is called the searched form:

CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
...
[ ELSE resultn ]
END

When PostgreSQL evaluates a searched CASE expression, it first evaluates condition1. If condition1 evaluates to true, the value of the CASE expression is result1. If condition1 evaluates to false (or to NULL), PostgreSQL evaluates condition2. If that condition evaluates to true, the value of the CASE expression is result2. Otherwise, PostgreSQL moves on to the next condition. PostgreSQL continues to evaluate each condition until it finds one that evaluates to true. If none of the conditions is true, the CASE expression evaluates to resultn. If PostgreSQL gets all the way to the end of the list and you haven't specified an ELSE clause, the CASE expression evaluates to NULL.

As in the simple form, result1, result2, … resultn must all have the same data type. However, in the searched form, the conditions don't have to be similar to each other. For example, if you want to classify the titles in your tapes collection (and you're a big Jimmy Stewart fan), you might use a CASE expression like this:

movies=# SELECT
movies-#   title, male_lead, duration,
movies-#   CASE
movies-#     WHEN duration < '1 hour 45 min'  THEN 'short movie'
movies-#     WHEN male_lead = 'James Stewart' THEN 'great movie'
movies-#     WHEN duration > '2 hours'        THEN 'long movie'
movies-#   END
movies-# FROM
movies-#   tapes;
     title     |    male_lead    | duration |    case
---------------+-----------------+----------+-------------
 The Godfather | Marlon Brando   | 02:55:00 | long movie
 The Godfather | Marlon Brando   | 02:55:00 | long movie
 Casablanca    | Humphrey Bogart | 01:42:00 | short movie
 Citizen Kane  |                 | 01:59:00 |
 Rear Window   | James Stewart   |          | great movie
 The Birds     |                 | 01:59:00 |
(6 rows)
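
Citizen Kane and The Birds end up with a NULL classification because none of the WHEN conditions is true for them and the expression has no ELSE clause. The following Python sketch shows the same classification with an ELSE clause added. Several things here are assumptions made for the illustration: SQLite stands in for PostgreSQL, durations are stored as whole minutes in a hypothetical minutes column (rather than as an INTERVAL), and the 'average movie' label is invented.

```python
import sqlite3

# Assumptions: SQLite stands in for PostgreSQL, durations are stored as
# whole minutes in a hypothetical "minutes" column, and the ELSE label
# 'average movie' is invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tapes (title TEXT, male_lead TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO tapes VALUES (?, ?, ?)",
    [
        ("The Godfather", "Marlon Brando",   175),
        ("Casablanca",    "Humphrey Bogart", 102),
        ("Citizen Kane",  None,              119),
        ("Rear Window",   "James Stewart",   None),
        ("The Birds",     None,              119),
    ],
)

QUERY = """
    SELECT title,
           CASE
               WHEN minutes < 105               THEN 'short movie'
               WHEN male_lead = 'James Stewart' THEN 'great movie'
               WHEN minutes > 120               THEN 'long movie'
               ELSE 'average movie'             -- reached when no WHEN is true
           END
      FROM tapes
"""

labels = dict(conn.execute(QUERY).fetchall())
for title, label in labels.items():
    print(f"{title:<14} {label}")
```

Notice that Rear Window still reaches the second WHEN clause: its NULL duration makes the first condition NULL rather than true, so evaluation simply moves on to the next condition.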

The ORDER BY Clause

So far, all the queries that you have seen return rows in an arbitrary order. You can add an ORDER BY clause to a SELECT command if you need to impose a predictable ordering. The general form of the ORDER BY clause is[9]

ORDER BY expression [ ASC | DESC ] [, ...]

[9] PostgreSQL supports another form for the ORDER BY clause: ORDER BY expression [ USING operator ] [, ...]. This might seem a little confusing at first. When you specify ASC, PostgreSQL uses the < operator to determine row ordering. When you specify DESC, PostgreSQL uses the > operator. The second form of the ORDER BY clause allows you to specify an alternative operator.

The ASC and DESC terms mean ascending and descending, respectively. If you don't specify ASC or DESC, PostgreSQL assumes that you want to see results in ascending order. The expression following ORDER BY is called a sort key.

Let's look at a simple example:

movies=# SELECT * FROM customers ORDER BY balance;
 customer_id |    customer_name     |  phone   | birth_date | balance
-------------+----------------------+----------+------------+---------
           3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00
           1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00
           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00
           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00
           5 | Funkmaster, Freddy   | 555-FUNK |            |
(5 rows)

You can see that this SELECT command returns the result set in ascending order of the balance column. Here is the same query, but in descending order:

movies=# SELECT * FROM customers ORDER BY balance DESC;
 customer_id |    customer_name     |  phone   | birth_date | balance
-------------+----------------------+----------+------------+---------
           5 | Funkmaster, Freddy   | 555-FUNK |            |
           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00
           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00
           3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00
           1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00
(5 rows)

This time, the largest balance is first, followed by successively smaller values.

You may have noticed something odd about how the ORDER BY clause handles the customer named Freddy Funkmaster. Recall from the previous section that NULL cannot be compared to other values. By its very nature, the ORDER BY clause must compare values. PostgreSQL resolves this issue with a simple rule: NULL is always considered larger than all other values when evaluating an ORDER BY clause.

You can include multiple sort keys in the ORDER BY clause. The following query sorts customers in ascending balance order, and then in descending birth_date order:

movies=# SELECT * FROM customers ORDER BY balance, birth_date DESC;
 customer_id |    customer_name     |  phone   | birth_date | balance
-------------+----------------------+----------+------------+---------
           1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00
           3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00
           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00
           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00
           5 | Funkmaster, Freddy   | 555-FUNK |            |
(5 rows)

When an ORDER BY clause contains multiple sort keys, you are telling PostgreSQL how to break ties. You can see that customers 1 and 3 have the same value (0.00) in the balance column, and you have asked PostgreSQL to order rows using the balance column. What happens when PostgreSQL finds two rows with the same balance? When two sort key values are equal, PostgreSQL moves to the next sort key to break the tie. If two sort key values are not equal, sort keys with a lower precedence are ignored. So, when PostgreSQL finds that customers 1 and 3 have the same balance, it moves to the birth_date column to break the tie.

If you don't have a sort key with a lower precedence, you won't be able to predict the ordering of rows with duplicate sort key values.

You can include as many sort keys as you like.
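
The tie-breaking rule and the "NULL is larger than everything" rule can be modeled in a few lines of Python. This is only a sketch of the ordering rules described above, not of how PostgreSQL sorts internally; None plays the role of NULL, and the helper names are invented for the illustration.

```python
from datetime import date

# Sample rows copied from the customers table:
# (customer_id, birth_date, balance). None plays the role of NULL.
customers = [
    (3, date(1968, 1, 21), 0.00),
    (1, date(1970, 10, 10), 0.00),
    (4, date(1969, 3, 5), 3.00),
    (2, date(1972, 7, 10), 15.00),
    (5, None, None),
]

def nulls_largest(value, placeholder):
    """Sort key that treats None (NULL) as larger than every real value."""
    return (value is None, placeholder if value is None else value)

def order_by_balance_then_birth_date_desc(rows):
    """Model of ORDER BY balance, birth_date DESC."""
    # Python's sort is stable, so sort by the low-precedence key first...
    rows = sorted(rows, key=lambda r: nulls_largest(r[1], date.min), reverse=True)
    # ...then by the high-precedence key; tied rows keep their birth_date order.
    return sorted(rows, key=lambda r: nulls_largest(r[2], 0.0))

ordered_ids = [row[0] for row in order_by_balance_then_birth_date_desc(customers)]
print(ordered_ids)  # [1, 3, 4, 2, 5]
```

The result matches the query output above: customers 1 and 3 tie on balance, so the later birth_date comes first, and the NULL balance sorts last in ascending order.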

LIMIT and OFFSET

Occasionally, you will find that you want to answer a question such as "Who are my top 10 salespeople?" In most relational databases, this is a difficult question to ask. PostgreSQL offers two extensions that make it easy to answer "Top n" or "Bottom n"-type questions. The first extension is the LIMIT clause. The following query shows the two customers who owe you the most money:

movies=# SELECT * FROM customers ORDER BY balance DESC LIMIT 2;
 customer_id |   customer_name    |  phone   | birth_date | balance
-------------+--------------------+----------+------------+---------
           5 | Funkmaster, Freddy | 555-FUNK |            |
           2 | Rubin, William     | 555-2211 | 1972-07-10 |   15.00
(2 rows)

You can see here that I used an ORDER BY clause so that the rows are sorted such that the highest balances appear first; in most cases, you won't use a LIMIT clause without also using an ORDER BY clause. (Freddy Funkmaster appears first because his NULL balance is considered larger than all other values.) Let's change this query a little. This time we want the top five customers who have a balance over $10:

movies=# SELECT * FROM customers
movies-#   WHERE
movies-#     balance > 10
movies-#   ORDER BY balance DESC
movies-#   LIMIT 5;
 customer_id | customer_name  |  phone   | birth_date | balance
-------------+----------------+----------+------------+---------
           2 | Rubin, William | 555-2211 | 1972-07-10 |   15.00
(1 row)

This example shows that the LIMIT clause won't always return the number of rows that were specified. Instead, LIMIT returns no more than the number of rows that you request. In this sample database, you have only one customer who owes you more than $10.

The second extension is the OFFSET n clause. The OFFSET n clause tells PostgreSQL to skip the first n rows of the result set. For example:

movies=# SELECT * FROM customers ORDER BY balance DESC OFFSET 1;
 customer_id |    customer_name     |  phone   | birth_date | balance
-------------+----------------------+----------+------------+---------
           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00
           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00
           3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00
           1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00
(4 rows)

In this case, we are viewing all the customers except the customer with the greatest balance. It's common to use LIMIT and OFFSET together:

movies=# SELECT * FROM customers
movies-#   ORDER BY balance DESC LIMIT 2 OFFSET 1;
 customer_id |    customer_name     |  phone   | birth_date | balance
-------------+----------------------+----------+------------+---------
           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00
           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00
(2 rows)
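
One common reason to combine LIMIT and OFFSET is to step through a result set one page at a time. The sketch below assumes that SQLite (via Python's sqlite3 module) can stand in for PostgreSQL, since both accept LIMIT n OFFSET m. The fetch_page() helper and the page size are hypothetical. The query leaves out NULL balances so that "largest balance first" really means the customers who owe the most, and it adds customer_id as a second sort key so that the two 0.00 balances always land on the same page.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; both accept LIMIT n OFFSET m.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, customer_name TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Jones, Henry",         0.00),
        (2, "Rubin, William",       15.00),
        (3, "Panky, Henry",         0.00),
        (4, "Wonderland, Alice N.", 3.00),
        (5, "Funkmaster, Freddy",   None),
    ],
)

def fetch_page(page_number, page_size=2):
    """Return one page of customer names, largest balance first (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT customer_name
          FROM customers
         WHERE balance IS NOT NULL          -- skip customers with no balance yet
         ORDER BY balance DESC, customer_id -- customer_id breaks ties predictably
         LIMIT ? OFFSET ?
        """,
        (page_size, (page_number - 1) * page_size),
    )
    return [name for (name,) in rows]

for page in (1, 2, 3):
    print(page, fetch_page(page))
```

The third page comes back empty, which is the same "no more than n rows" behavior that the LIMIT 5 query demonstrated.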

Formatting Column Results

So far, you have seen how to tell PostgreSQL which rows you want to view, which columns you want to view, and the order in which the rows should be returned. Let's take a short side-trip here and learn how to change the appearance of the values that you select.

Take a look at the following query:

movies=# SELECT customer_id, customer_name, balance, balance+4 FROM customers;
 customer_id |    customer_name     | balance | ?column?
-------------+----------------------+---------+----------
           3 | Panky, Henry         |    0.00 |     4.00
           1 | Jones, Henry         |    0.00 |     4.00
           4 | Wonderland, Alice N. |    3.00 |     7.00
           2 | Rubin, William       |   15.00 |    19.00
           5 | Funkmaster, Freddy   |         |
(5 rows)

PostgreSQL inserts two lines of text between your query and the result set. These two lines are (obviously) column headings. You can see that the header for each of the first three columns contains the name of the column. What about the last column? When you SELECT an expression, PostgreSQL uses "?column?" for the field header[10].

[10] Actually, if you SELECT a function (such as AGE() or SQRT()), PostgreSQL will use the name of the function for the field header.

You can change field headers using the AS clause:

movies=# SELECT customer_id, customer_name,
movies-#   balance AS "Old balance",
movies-#   balance + 4 AS "New balance"
movies-#   FROM customers;
 customer_id |    customer_name     | Old balance | New balance
-------------+----------------------+-------------+-------------
           3 | Panky, Henry         |        0.00 |        4.00
           1 | Jones, Henry         |        0.00 |        4.00
           4 | Wonderland, Alice N. |        3.00 |        7.00
           2 | Rubin, William       |       15.00 |       19.00
           5 | Funkmaster, Freddy   |             |
(5 rows)

Notice that you can provide a field header for table columns as well as for expressions. If you rename a field and the query includes an ORDER BY clause that refers to the field, the ORDER BY should use the new name, not the original one:

movies=# SELECT customer_id, customer_name,
movies-#   balance AS "Old balance",
movies-#   balance + 4 AS "New balance"
movies-#   FROM customers
movies-#   ORDER BY "Old balance";
 customer_id |    customer_name     | Old balance | New balance
-------------+----------------------+-------------+-------------
           3 | Panky, Henry         |        0.00 |        4.00
           1 | Jones, Henry         |        0.00 |        4.00
           4 | Wonderland, Alice N. |        3.00 |        7.00
           2 | Rubin, William       |       15.00 |       19.00
           5 | Funkmaster, Freddy   |             |
(5 rows)

This section explained how to change the column headers for a SELECT command. You can also change the appearance of the data values. In the next section, I'll show you a few examples using date values for illustration.

Working with Date Values

PostgreSQL supports six basic date, time, and date/time data types, as shown in Table 1.7. I'll use the term temporal to cover date, time, and date/time data types.

Table 1.7. PostgreSQL Temporal Data Types

Data Type Name             Type of Data Stored   Earliest Date/Time   Latest Date/Time
------------------------   -------------------   ------------------   ----------------
TIMESTAMP                  Date/Time             4713 BC              1465001 AD
TIMESTAMP WITH TIME ZONE   Date/Time             1903 AD              2037 AD
INTERVAL                   Interval              -178000000 years     178000000 years
DATE                       Date                  4713 BC              32767 AD
TIME                       Time                  00:00:00.00          23:59:59.99
TIME WITH TIME ZONE        Time                  00:00:00.00+12       23:59:59.99-12

I'll cover the details of the date/time data types in Chapter 2. You have already seen two of these temporal data types. The customers table contains a DATE column (birth_date):

movies=# \d customers
                Table "customers"
    Column     |         Type          | Modifiers
---------------+-----------------------+-----------
 customer_id   | integer               | not null
 customer_name | character varying(50) | not null
 phone         | character(8)          |
 birth_date    | date                  |
 balance       | numeric(7,2)          |
Indexes:
    "customers_customer_id_key" UNIQUE, btree (customer_id)

movies=# SELECT customer_name, birth_date FROM customers;
    customer_name     | birth_date
----------------------+------------
 Jones, Henry         | 1970-10-10
 Rubin, William       | 1972-07-10
 Panky, Henry         | 1968-01-21
 Wonderland, Alice N. | 1969-03-05
 Funkmaster, Freddy   |
(5 rows)

You've also seen the INTERVAL data type; the AGE() function returns an INTERVAL:

movies=# SELECT customer_name, AGE( birth_date ) FROM customers;
    customer_name     |             age
----------------------+------------------------------
 Jones, Henry         | 31 years 4 mons 8 days 01:00
 Rubin, William       | 29 years 7 mons 8 days 01:00
 Panky, Henry         | 34 years 28 days
 Wonderland, Alice N. | 32 years 11 mons 13 days
 Funkmaster, Freddy   |
(5 rows)

Date/time values are usually pretty easy to work with, but there is a complication that you need to be aware of. Let's say that I need to add a new customer:

movies=# INSERT INTO customers
movies-#   VALUES
movies-#   (
movies(#     7, 'Gull, Jonathon LC', '555-1111', '02/05/1984', NULL
movies(#   );

This customer has a birth_date of '02/05/1984'. Does that mean "February 5th 1984" or "May 2nd 1984"? How does PostgreSQL know which date I meant? The problem is that a date such as '02/05/1984' is ambiguous: you can't know which date this string represents without knowing something about the context in which it was entered. 'May 02 1984', on the other hand, is unambiguous.

PostgreSQL enables you to enter and display dates in a number of formats; some date formats are ambiguous and some are unambiguous. The DATESTYLE runtime variable tells PostgreSQL how to format dates when displaying data and how to interpret ambiguous dates that you enter.

The DATESTYLE variable can be a little confusing. DATESTYLE is composed of two parts. The first part, called the convention, tells PostgreSQL how to interpret ambiguous dates. The second part, called the display format, determines how PostgreSQL displays date values. The convention controls date input and the display format controls date output. Table 1.8 shows the DATESTYLE display formats.

Table 1.8. DATESTYLE Display Formats

Display Format   US Convention     European Convention
--------------   ---------------   -------------------
ISO              1984-05-02        1984-05-02
GERMAN           02.05.1984        02.05.1984
POSTGRES         Wed May 02 1984   Wed 02 May 1984
SQL              05/02/1984        02/05/1984

Let's talk about the display format first. PostgreSQL supports four different display formats. Three of the display formats are unambiguous and one is ambiguous.

The default display format is named ISO. In ISO format, dates always appear in the form 'YYYY-MM-DD'. The next display format is GERMAN. In GERMAN format, dates always appear in the form 'DD.MM.YYYY'. The ISO and GERMAN formats are unambiguous because the format never changes. The POSTGRES format is also unambiguous, but the display format can vary. PostgreSQL needs a second piece of information (the convention) to decide whether the month should appear before the day (US convention) or the day should appear before the month (European convention). In POSTGRES format, date values display the day-of-the-week and month name in abbreviated text form; for example, 'Wed May 02 1984' (US) or 'Wed 02 May 1984' (European).

The final display format is SQL. SQL format is ambiguous. In SQL format, the date 'May 02 1984' is displayed as '05/02/1984' (US), or as '02/05/1984' (European).

As I mentioned earlier, the ISO and GERMAN display formats are unambiguous. In ISO format, the month always precedes the day. In GERMAN format, the day always precedes the month. If you choose POSTGRES or SQL format, you must also specify the order in which you want the month and day components to appear. You can specify the desired display format and month/day ordering (that is, the convention) in the DATESTYLE runtime variable:

movies=# SET DATESTYLE TO 'US,ISO';            -- 1984-05-02
movies=# SET DATESTYLE TO 'US,GERMAN';         -- 02.05.1984
movies=# SET DATESTYLE TO 'US,POSTGRES';       -- Wed May 02 1984
movies=# SET DATESTYLE TO 'US,SQL';            -- 05/02/1984

movies=# SET DATESTYLE TO 'EUROPEAN,ISO';      -- 1984-05-02
movies=# SET DATESTYLE TO 'EUROPEAN,GERMAN';   -- 02.05.1984
movies=# SET DATESTYLE TO 'EUROPEAN,POSTGRES'; -- Wed 02 May 1984
movies=# SET DATESTYLE TO 'EUROPEAN,SQL';      -- 02/05/1984

The convention part of the DATESTYLE variable determines how PostgreSQL will make sense of the date values that you enter. The convention also affects the ordering of the month and day components when displaying a POSTGRES or SQL date. Note that you are not restricted to entering date values in the format specified by DATESTYLE. For example, if you have chosen to display dates in 'US,SQL' format, you can still enter date values in any of the other formats.

Recall that the ISO and GERMAN date formats are unambiguous: the ordering of the month and day components is predefined. A date entered in POSTGRES format is unambiguous as well, because you enter the name of the month so it cannot be confused with the day. If you choose to enter a date in SQL format, PostgreSQL will look to the first component of DATESTYLE (that is, the convention) to determine whether you want the value interpreted as a US or a European date. Let's look at a few examples.

movies=# SET DATESTYLE TO 'US,ISO';
movies=# SELECT CAST( '02/05/1984' AS DATE );
1984-02-05

movies=# SET DATESTYLE TO 'EUROPEAN,ISO';
movies=# SELECT CAST( '02/05/1984' AS DATE );
1984-05-02

In this example, I've asked PostgreSQL to display dates in ISO format, but I've entered a date in an ambiguous format. In the first case, you can see that PostgreSQL interpreted the ambiguous date using US conventions (the month precedes the day). In the second case, PostgreSQL uses European conventions to interpret the date.

Now let's see what happens when I enter an unambiguous date:

movies=# SET DATESTYLE TO 'US,ISO';
SET VARIABLE
movies=# SELECT CAST( '1984-05-02' AS DATE );
1984-05-02

movies=# SET DATESTYLE TO 'EUROPEAN,ISO';
SET VARIABLE
movies=# SELECT CAST( '1984-05-02' AS DATE );
1984-05-02

This time, there can be no confusion: an ISO-formatted date is always entered in 'YYYY-MM-DD' format. PostgreSQL ignores the convention.

So, you can see that I can enter date values in many formats. If I choose to enter a date in an ambiguous format, PostgreSQL uses the convention part of the current DATESTYLE to interpret the date. I can also use DATESTYLE to control the display format.
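
The role of the convention is easy to model: it simply decides which component of an ambiguous string is the month. The following Python sketch is an illustration of that idea only (it does not talk to PostgreSQL, and it handles just the 'nn/nn/yyyy' form); the names INPUT_FORMATS and parse_ambiguous() are invented for the example.

```python
from datetime import datetime

# Model of the DATESTYLE convention: which component of an ambiguous
# 'nn/nn/yyyy' string is the month. The names here are illustrative only.
INPUT_FORMATS = {
    "US": "%m/%d/%Y",        # month precedes day
    "EUROPEAN": "%d/%m/%Y",  # day precedes month
}

def parse_ambiguous(text, convention):
    """Interpret an ambiguous date string and return it in ISO form."""
    parsed = datetime.strptime(text, INPUT_FORMATS[convention]).date()
    return parsed.isoformat()

print(parse_ambiguous("02/05/1984", "US"))        # 1984-02-05
print(parse_ambiguous("02/05/1984", "EUROPEAN"))  # 1984-05-02
```

The two results are the same dates that the CAST examples produced under the 'US,ISO' and 'EUROPEAN,ISO' settings.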

Matching Patterns

In the previous two sections, you took a short detour to learn a little about how to format results. Now let's get back to the task of producing the desired results.

The WHERE clause is used to restrict the number of rows returned by a SELECT command[11]. Sometimes, you don't know the exact value that you are searching for. For example, you may have a customer ask you for a film, but he doesn't remember the exact name, although he knows that the film has the word "Citizen" in the title. PostgreSQL provides two features that make it possible to search for partial alphanumeric values.

[11] Technically, the WHERE clause constrains the set of rows affected by a SELECT, UPDATE, or DELETE command. I'll show you the UPDATE and DELETE commands a little later.

LIKE and NOT LIKE

The LIKE operator provides simple pattern-matching capabilities. LIKE uses two special characters that indicate the unknown part of a pattern. The underscore (_) character matches any single character. The percent sign (%) matches any sequence of zero or more characters. Table 1.9 shows a few examples.

Table 1.9. Pattern Matching with the LIKE Operator

String                   Pattern         Result
----------------------   -------------   --------------
The Godfather            %Godfather%     Matches
The Godfather            %Godfather      Matches
The Godfather            %Godfathe_      Matches
The Godfather            ___ Godfather   Matches
The Godfather            Godfather%      Does not match
The Godfather            _Godfather      Does not match
The Godfather: Part II   %Godfather      Does not match

Now let's see how to use the LIKE operator in a SELECT command:

movies=# SELECT * FROM tapes WHERE title LIKE '%Citizen%';
 tape_id  |        title         | duration
----------+----------------------+----------
 OW-41221 | Citizen Kane         |
 KJ-03335 | American Citizen, An |
(2 rows)

The LIKE operator is case-sensitive:

movies=# SELECT * FROM tapes WHERE title LIKE '%citizen%';
 tape_id | title | duration
---------+-------+----------
(0 rows)

If you want to perform case-insensitive pattern matching, use the ILIKE operator:

movies=# SELECT * FROM tapes WHERE title ILIKE '%citizen%';
 tape_id  |        title         | duration
----------+----------------------+----------
 OW-41221 | Citizen Kane         |
 KJ-03335 | American Citizen, An |
(2 rows)

You can, of course, combine LIKE and ILIKE with the NOT operator to return rows that do not match a pattern:

movies=# SELECT * FROM tapes WHERE title NOT ILIKE '%citizen%';
 tape_id  |     title     |   duration
----------+---------------+--------------
 AB-12345 | The Godfather |
 AB-67472 | The Godfather |
 MC-68873 | Casablanca    |
 AH-54706 | Rear Window   |
 OW-42200 | Sly           | 01:36
 OW-42201 | Stone         | 4 days 01:36
(6 rows)
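
If you would like to check the LIKE rules without a database at hand, they can be modeled with a few lines of Python. This is a sketch of the matching rules only, under a couple of stated simplifications: it ignores LIKE's escape character, and the function names like_to_regex() and like() are invented for the illustration. The key point it demonstrates is that a LIKE pattern must account for the entire string.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE pattern into an equivalent regular expression.

    Simplification: LIKE's escape character is not handled.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any sequence of zero or more characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches itself
    return "".join(parts)

def like(string, pattern, ignore_case=False):
    """Model of LIKE (or of ILIKE when ignore_case is True)."""
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    # fullmatch(): the pattern has to cover the whole string, as LIKE requires.
    return re.fullmatch(like_to_regex(pattern), string, flags) is not None

print(like("The Godfather", "%Godfather%"))                    # True
print(like("The Godfather", "Godfather%"))                     # False
print(like("Citizen Kane", "%citizen%"))                       # False
print(like("Citizen Kane", "%citizen%", ignore_case=True))     # True
```

Negating the result of like() models NOT LIKE and NOT ILIKE.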

Pa t t e r n M a t ch in g w it h Re gu la r Ex pr e ssion s

The LIKE and ILIKE oper at or s ar e easy t o use, but t hey ar en't v er y pow er ful. For t unat ely, Post gr eSQL let s y ou sear ch for dat a using
regular expressions. A regular expression is a st ring t hat specifies a pat t er n. The language t hat you use t o cr eat e regular ex pr essions is far
m ore pow er ful t han t he LIKE and ILIKE oper at or s. You have pr obably used regular expressions before; pr ogr am s such as grep, awk, and
t he Unix ( and DOS) shells use regular ex pr essions.

The LIKE and ILIKE oper at or s define t w o pat t ern- m at ching charact ers; t he r egular expression operat or defines far m ore. First , t he
charact er " ." w it hin a regular expression oper at es in t he sam e w ay as t he " _" char act er in a LIKE pat t er n: it m at ches any single charact er.
The char act er s " .*" in a r egular expression oper at e in t he sam e w ay as t he " %" char act er in a LIKE pat t er n: t hey m at ch zero or m or e
occurrences of any single char act er .

Not ice t hat in a regular ex pr ession, y ou use t wo char act er s t o m at ch a sequence of charact ers, w her eas y ou use a single char act er in a
LIKE pat t ern. The regular expression ".*" is act ually t wo regular ex pr essions com bined int o one com plex expr ession. As I m ent ioned
earlier, t he " ." char act er m at ches any single charact er. The " *" char act er m at ches zero or m or e occurrences of t he pat t er n t hat pr ecedes it .
So, " .*" m eans t o m at ch any single char act er , zero or m ore t im es. There ar e t hree ot her r epet it ion operat ors: The " +" char act er m at ches
one or m or e occur r ences of t he preceding pat t er n, and t he "?" char act er m at ches zer o or one occur r ence of t he pr eceding pat t er n. I f y ou
need t o get r eally fancy ( I never hav e) , you can use t he form " {x[,y]}" t o m at ch at least x and no m ore t han y occurrences of t he
preceding pat t er n.

You can also search for things other than ".". For example, the character "^" matches the beginning of a string and "$" matches the end.
The regular expression syntax even includes support for character classes. The pattern "[[:upper:]]*[[:digit:]]" will match any string that
includes zero or more uppercase characters followed by a single digit. (A class name such as [:upper:] must itself appear inside a
bracket expression, which is why the brackets are doubled.)

The "|" character gives you a way to search for a string that matches either of two patterns. For example, the regular expression
"(^God)|.*Donuts.*" would match a string that either starts with the string "God" or includes the word "Donuts".
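
The repetition operators, character classes, and alternation described above can be tried directly from psql. The following is a minimal
sketch, assuming PostgreSQL's ~ regular-expression match operator (which this passage has not introduced); the test strings are
invented for illustration.

```sql
-- Each expression yields a boolean. The ~ operator is case-sensitive and
-- reports a match if the pattern matches any part of the string, so the
-- patterns below use ^ and $ wherever the whole string must match.
SELECT 'aaa'      ~ '^a+$'                      AS one_or_more,   -- true
       'color'    ~ '^colou?r$'                 AS zero_or_one,   -- true
       'aaaa'     ~ '^a{2,3}$'                  AS bounded,       -- false: four a's exceed the bound
       'ABC7'     ~ '^[[:upper:]]*[[:digit:]]$' AS char_classes,  -- true
       'Godzilla' ~ '(^God)|.*Donuts.*'         AS alternation;   -- true: starts with "God"
```

Swapping in your own strings is a quick way to check a pattern before you use it in a WHERE clause.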

Regular expressions are extremely powerful, but they can get awfully complex. If you need more information, Chapter 4 of the PostgreSQL
User's Manual provides an exhaustive reference to the complete regular expression syntax.

Table 1.10 shows how to construct regular expressions that match the same strings matched by the LIKE patterns shown in Table 1.9.

Table 1.10. Pattern Matching with Regular Expressions

String                   Pattern          Result
The Godfather            .*Godfather      Matches
The Godfather            .*Godfather.*    Matches
The Godfather            .*Godfathe.      Matches
The Godfather            ... Godfather    Matches
The Godfather            Godfather.*      Does not match
The Godfather            .Godfather       Does not match
The Godfather: Part II   .*Godfather      Does not match
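
One detail is worth checking for yourself. A LIKE pattern must match the whole string, but PostgreSQL's ~ operator (an assumption
here, since the operator is not shown in this passage) reports a match when the pattern matches any part of the string. The sketch
below suggests that the "Does not match" results hold only when the pattern is anchored with "^" and "$".

```sql
-- Unanchored patterns may match in the middle of the string;
-- anchored patterns must cover the string from start to end.
SELECT 'The Godfather'          ~ 'Godfather.*'   AS unanchored,  -- true: match starts mid-string
       'The Godfather'          ~ '^Godfather.*$' AS anchored,    -- false: string starts with "The "
       'The Godfather'          ~ '^.*Godfather$' AS ends_with,   -- true
       'The Godfather: Part II' ~ '^.*Godfather$' AS has_suffix;  -- false: text follows "Godfather"
```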


Aggregates

PostgreSQL offers a number of aggregate functions. An aggregate is a collection of things; you can think of an aggregate as the
set of rows returned by a query. An aggregate function is a function that operates on an aggregate (nonaggregate functions
operate on a single row within an aggregate). Most of the aggregate functions operate on a single value extracted from each
row; this is called an aggregate expression.

COUNT()

COUNT() is probably the simplest aggregate function. COUNT() returns the number of objects in an aggregate. The COUNT()
function comes in four forms:

• COUNT(*)
• COUNT( expression )
• COUNT( ALL expression )
• COUNT( DISTINCT expression )

In the first form, COUNT(*) returns the number of rows in an aggregate:

movies=# SELECT * FROM customers;

customer_id | customer_name | phone | birth_date | balance


-------------+----------------------+----------+------------+---------
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
5 | Funkmaster, Freddy | 555-FUNK | |
7 | Gull, Jonathon LC | 555-1111 | 1984-02-05 |
8 | Grumby, Jonas | 555-2222 | 1984-02-21 |
(7 rows)

movies=# SELECT COUNT(*) FROM customers;


count
-------
7
(1 row)

movies=# SELECT COUNT(*) FROM customers WHERE customer_id < 5;


count
-------
4
(1 row)

You can see from this example that the COUNT(*) function pays attention to the WHERE clause. In other words, COUNT(*) returns
the number of rows that filter through the WHERE clause; that is, the number of rows in the aggregate.

In the second form, COUNT( expression ) returns the number of non-NULL values in the aggregate. For example, you might
want to know how many customers have a non-NULL balance:

movies=# SELECT COUNT( balance ) FROM customers;


count
-------
4
(1 row)

movies=# SELECT COUNT(*) - COUNT( balance ) FROM customers;


?column?
----------
3
(1 row)

The first query returns the number of non-NULL balances in the customers table. The second query returns the number of NULL
balances.

The third form, COUNT( ALL expression ), is equivalent to the second form. PostgreSQL includes the third form for
completeness; it complements the fourth form.

COUNT( DISTINCT expression ) returns the number of distinct non-NULL values in the aggregate.

movies=# SELECT DISTINCT balance FROM customers;


balance
---------
0.00
3.00
15.00

(4 rows)

movies=# SELECT COUNT( DISTINCT balance ) FROM customers;


count
-------
3
(1 row)

You might notice a surprising result in that last example. The first query returns the distinct balances in the customers table.
Notice that PostgreSQL tells you that it returned four rows: there are four distinct values. The second query returns a count of
the distinct balances, and it says that there are only three.

Is this a bug? No, both queries returned the correct information. The first query includes the NULL value in the result set.
COUNT(), and in fact all the aggregate functions (except for COUNT(*)), ignore NULL values.
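
To see the four forms side by side, here is a minimal sketch. It assumes a hypothetical scratch table named count_demo that holds
the same seven balances as the customers table, and it uses a multi-row INSERT, which assumes a reasonably recent PostgreSQL
release.

```sql
-- Hypothetical scratch table holding the balances from the customers table.
CREATE TEMP TABLE count_demo ( customer_id INTEGER, balance NUMERIC(7,2) );
INSERT INTO count_demo VALUES
    (3, 0.00), (1, 0.00), (4, 3.00), (2, 15.00),
    (5, NULL), (7, NULL), (8, NULL);

SELECT COUNT(*)                  AS all_rows,          -- 7: counts rows, NULLs included
       COUNT( balance )          AS non_null,          -- 4: NULL balances are skipped
       COUNT( ALL balance )      AS all_keyword,       -- 4: same as COUNT( balance )
       COUNT( DISTINCT balance ) AS distinct_non_null  -- 3: 0.00, 3.00, and 15.00
  FROM count_demo;
```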

SUM()

The SUM( expression ) function returns the sum of all the values in the aggregate expression. Unlike COUNT(), you can't use
SUM() on entire rows[12]. Instead, you usually specify a single column:

[12]
Actually, you can SUM(*), but it probably doesn't do what you would expect. SUM(*) is equivalent to COUNT(*).

movies=# SELECT SUM( balance ) FROM customers;


sum
-------
18.00
(1 row)

Notice that the SUM() function expects an expression. The name of a numeric column is a valid expression. You can also specify
an arbitrarily complex expression as long as that expression results in a numeric value.

You can also SUM() an aggregate of intervals. For example, the following query tells you how long it would take to watch all the
tapes in your video store:

movies=# SELECT SUM( duration ) FROM tapes;


sum
--------------
4 days 03:12
(1 row)

AVG()

The AVG( expression ) function returns the average of an aggregate expression. Like SUM(), you can find the average of a
numeric aggregate or an interval aggregate.

movies=# SELECT AVG( balance ) FROM customers;


avg
--------------
4.5000000000
(1 row)

movies=# SELECT AVG( balance ) FROM customers


movies-# WHERE balance IS NOT NULL;
avg
--------------
4.5000000000
(1 row)

These queries demonstrate an important point: the aggregate functions completely ignore rows where the aggregate expression
evaluates to NULL. The aggregate produced by the second query explicitly omits any rows where the balance is NULL. The
aggregate produced by the first query implicitly omits NULL balances. In other words, the following queries are equivalent:

SELECT AVG( balance ) FROM customers;


SELECT AVG( balance ) FROM customers WHERE balance IS NOT NULL;
SELECT SUM( balance ) / COUNT( balance ) FROM customers;

But these queries are not equivalent:

SELECT AVG( balance ) FROM customers;


SELECT SUM( balance ) / COUNT( * ) FROM customers;

Why not? Because COUNT( * ) counts all rows whereas COUNT( balance ) omits any rows where the balance is NULL.
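
The arithmetic is easy to verify. The sketch below assumes a hypothetical scratch table named avg_demo with the same seven
balances; the COALESCE() call in the last column is my addition, shown as one way to count a NULL balance as zero if that is
what you actually want.

```sql
-- Hypothetical scratch table: four known balances and three NULLs.
CREATE TEMP TABLE avg_demo ( balance NUMERIC(7,2) );
INSERT INTO avg_demo VALUES (0.00), (0.00), (3.00), (15.00), (NULL), (NULL), (NULL);

SELECT AVG( balance )                    AS avg_balance,        -- 18.00 / 4 = 4.5
       SUM( balance ) / COUNT( balance ) AS sum_over_non_null,  -- 18.00 / 4 = 4.5
       SUM( balance ) / COUNT( * )       AS sum_over_all_rows,  -- 18.00 / 7, about 2.57
       AVG( COALESCE( balance, 0 ))      AS avg_null_as_zero    -- also 18.00 / 7, about 2.57
  FROM avg_demo;
```

Which denominator is right depends on what a NULL balance means in your data: "unknown" argues for AVG( balance ), whereas
"nothing owed" argues for treating NULL as zero.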

MIN() and MAX()

The MIN( expression ) and MAX( expression ) functions return the minimum and maximum values, respectively, of an
aggregate expression. The MIN() and MAX() functions can operate on numeric, date/time, or string aggregates:

movies=# SELECT MIN( balance ), MAX( balance ) FROM customers;


min | max
------+-------
0.00 | 15.00
(1 row)

movies=# SELECT MIN( birth_date ), MAX( birth_date ) FROM customers;


min | max
------------+------------
1968-01-21 | 1984-02-21
(1 row)

movies=# SELECT MIN( customer_name ), MAX( customer_name )


movies-# FROM customers;
min | max
--------------------+----------------------
Funkmaster, Freddy | Wonderland, Alice N.
(1 row)

Other Aggregate Functions

In addition to COUNT(), SUM(), AVG(), MIN(), and MAX(), PostgreSQL also supports the STDDEV( expression ) and
VARIANCE( expression ) aggregate functions. These last two aggregate functions compute the standard deviation and variance of an
aggregate, two common statistical measures of variation within a set of observations.
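
Here is a short sketch of both functions, assuming a hypothetical scratch table named spread_demo. In current PostgreSQL releases,
STDDEV() and VARIANCE() compute the sample statistics (dividing by n - 1), and like the other aggregates they skip NULL values.

```sql
-- Hypothetical scratch table: the four known balances plus one NULL.
CREATE TEMP TABLE spread_demo ( balance NUMERIC(7,2) );
INSERT INTO spread_demo VALUES (0.00), (0.00), (3.00), (15.00), (NULL);

-- The mean of the four non-NULL values is 4.5; the squared deviations
-- (20.25 + 20.25 + 2.25 + 110.25) sum to 153, and 153 / (4 - 1) = 51.
SELECT VARIANCE( balance ) AS variance,  -- 51
       STDDEV( balance )   AS std_dev    -- square root of 51, about 7.14
  FROM spread_demo;
```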

Grouping Results

The aggregate functions are useful for summarizing information. The result of an aggregate function is a single value. Sometimes,
you really want an aggregate function to apply to each of a number of subsets of your data. For example, you may find it
interesting to compute some demographic information about your customer base. Let's first look at the entire customers table:

movies=# SELECT * FROM customers;

customer_id | customer_name | phone | birth_date | balance


-------------+----------------------+----------+------------+---------
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
5 | Funkmaster, Freddy | 555-FUNK | |
7 | Gull, Jonathon LC | 555-1111 | 1984-02-05 |
8 | Grumby, Jonas | 555-2222 | 1984-02-21 |
(7 rows)

Look at the birth_date column. Notice that you have customers born in three distinct decades (four if you count NULL as a
decade):

movies=# SELECT DISTINCT( EXTRACT( DECADE FROM birth_date ))


movies-# FROM customers;
date_part
-----------
196
197
198
(4 rows)

The EXTRACT() function extracts a date component from a date/time value. The DECADE component looks a little strange, but it
makes sense to know whether the decade of the '60s refers to the 1960s or the 2060s, now that we are past Y2K.

Now that you know how many decades are represented in your customer base, you might next want to know how many
customers were born in each decade. The GROUP BY clause helps answer this kind of question:

movies=# SELECT COUNT(*), EXTRACT( DECADE FROM birth_date )
movies-# FROM customers
movies-# GROUP BY EXTRACT( DECADE FROM birth_date );
count | date_part
-------+-----------
2 | 196
2 | 197
2 | 198
1 |
(4 rows)

The GROUP BY clause is used with aggregate functions. PostgreSQL collects the rows that share the same value of the GROUP BY
expression into groups and applies the aggregate function to each group.

There is an easier way to build this query. The problem with this query is that you had to repeat the EXTRACT( DECADE FROM
birth_date ) phrase. Instead, you can use the AS clause to name the decade field, and then you can refer to that field by name
in the GROUP BY clause:

movies=# SELECT COUNT(*), EXTRACT( DECADE FROM birth_date ) AS decade


movies-# FROM customers
movies-# GROUP BY decade;
count | decade
-------+--------
2 | 196
2 | 197
2 | 198
1 |
(4 rows)

In these examples, the result set happens to come back sorted by the GROUP BY fields, but GROUP BY by itself does not guarantee
any particular ordering. If you want a specific ordering, you can use the ORDER BY clause with GROUP BY. The following query
shows how many customers you have for each decade, sorted by the count:

movies=# SELECT
movies-# COUNT(*) as "Customers",
movies-# EXTRACT( DECADE FROM birth_date ) as "Decade"
movies-# FROM customers
movies-# GROUP BY "Decade"
movies-# ORDER BY "Customers";
Customers | Decade
-----------+--------
1 |
2 | 196
2 | 197
2 | 198
(4 rows)
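
If the order of the groups matters to you, it is safest to ask for it explicitly. The sketch below assumes a hypothetical scratch
table named decade_demo with the birth dates from the customers table; the NULLS LAST option is an assumption about your
PostgreSQL release (it is not available in older versions).

```sql
-- Hypothetical scratch table: one row per customer birth date.
CREATE TEMP TABLE decade_demo ( customer_name VARCHAR(50), birth_date DATE );
INSERT INTO decade_demo VALUES
    ('Panky, Henry',         '1968-01-21'),
    ('Jones, Henry',         '1970-10-10'),
    ('Wonderland, Alice N.', '1969-03-05'),
    ('Rubin, William',       '1972-07-10'),
    ('Funkmaster, Freddy',   NULL),
    ('Gull, Jonathon LC',    '1984-02-05'),
    ('Grumby, Jonas',        '1984-02-21');

-- Most recent decade first; the NULL decade is pushed to the end.
SELECT COUNT(*) AS customers,
       EXTRACT( DECADE FROM birth_date ) AS decade
  FROM decade_demo
 GROUP BY decade
 ORDER BY decade DESC NULLS LAST;
-- Rows come back as 198, 197, 196, and finally the NULL decade.
```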

The NULL decade looks a little funny in this result set. You have one customer (Freddy Funkmaster) who was too vain to tell you
when he was born. You can use the HAVING clause to eliminate aggregate groups:

movies=# SELECT COUNT(*), EXTRACT( DECADE FROM birth_date ) as decade


movies-# FROM customers
movies-# GROUP BY decade
movies-# HAVING EXTRACT( DECADE FROM birth_date ) IS NOT NULL;
count | decade
-------+--------
2 | 196
2 | 197
2 | 198
(3 rows)

You can see that the HAVING clause is similar to the WHERE clause. The WHERE clause determines which rows are included in the
aggregate, whereas the HAVING clause determines which groups are included in the result set.
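
The two clauses can appear in the same query, which makes the difference easier to see. This is a minimal sketch that assumes a
hypothetical scratch table named having_demo holding the birth dates and balances from the customers table.

```sql
-- Hypothetical scratch table: birth date and balance for each customer.
CREATE TEMP TABLE having_demo ( birth_date DATE, balance NUMERIC(7,2) );
INSERT INTO having_demo VALUES
    ('1968-01-21', 0.00), ('1970-10-10', 0.00), ('1969-03-05', 3.00),
    ('1972-07-10', 15.00), (NULL, NULL), ('1984-02-05', NULL), ('1984-02-21', NULL);

SELECT EXTRACT( DECADE FROM birth_date ) AS decade,
       COUNT(*)       AS customers,
       SUM( balance ) AS total_balance
  FROM having_demo
 WHERE birth_date IS NOT NULL   -- drops the row with no birth date before grouping
 GROUP BY decade
HAVING SUM( balance ) > 0       -- drops whole groups after aggregation
 ORDER BY decade;
-- Returns decade 196 (2 customers, 3.00) and decade 197 (2 customers, 15.00).
-- The 198 group is dropped because SUM() over only NULL balances is NULL.
```

A condition that refers to an aggregate function, such as SUM( balance ) > 0, can only go in the HAVING clause; a condition on
individual rows is usually better placed in the WHERE clause.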

Multi-Table Joins

So far, all the queries that you've seen involve a single table. Most databases contain multiple tables and there are relationships
between these tables. This sample database has an example:

movies=# \d rentals

Table "rentals"
Attribute | Type | Modifier
-------------+--------------+----------
tape_id | character(8) | not null
rental_date | date | not null
customer_id | integer | not null

Here's a description of the rentals table from earlier in this chapter:

"When a customer comes in to rent a tape, we will add a row to the rentals table to record the transaction. There are
three pieces of information that we need to record for each rental: the tape_id, the customer_id, and the date that the
rental occurred. Notice that each row in the rentals table refers to a customer (customer_id) and a tape (tape_id)."

You can see that each row in the rentals table refers to a tape (tape_id) and to a customer (customer_id). If you SELECT from the
rentals table, you can see the tape ID and customer ID, but you can't see the movie title or customer name. What you need here is a
join. When you need to retrieve data from multiple tables, you join those tables.

PostgreSQL (and all relational databases) supports a number of join types. The most basic join type is a cross-join (or Cartesian
product). In a cross-join, PostgreSQL joins each row in the first table to each row in the second table to produce a result table. If you
are joining against a third table, PostgreSQL joins each row in the intermediate result with each row in the third table.

Let's look at an example. We'll cross-join the rentals and customers tables. First, I'll show you each table:

movies=# SELECT * FROM rentals;


tape_id | rental_date | customer_id
----------+-------------+-------------
AB-12345 | 2001-11-25 | 1
AB-67472 | 2001-11-25 | 3
OW-41221 | 2001-11-25 | 1
MC-68873 | 2001-11-20 | 3
(4 rows)

movies=# SELECT * FROM customers;

customer_id | customer_name | phone | birth_date | balance


-------------+----------------------+----------+------------+---------
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
5 | Funkmaster, Freddy | 555-FUNK | |
7 | Gull, Jonathon LC | 555-1111 | 1984-02-05 |
8 | Grumby, Jonas | 555-2222 | 1984-02-21 |
(7 rows)

Now I'll join these tables. To perform a cross-join, we simply list each table in the FROM clause:


movies=# SELECT rentals.*, customers.customer_id, customers.customer_name


movies-# FROM rentals, customers;
tape_id | rental_date | customer_id | customer_id | customer_name
----------+-------------+-------------+-------------+----------------------
AB-12345 | 2001-11-25 | 1 | 3 | Panky, Henry
AB-12345 | 2001-11-25 | 1 | 1 | Jones, Henry
AB-12345 | 2001-11-25 | 1 | 4 | Wonderland, Alice N.
AB-12345 | 2001-11-25 | 1 | 2 | Rubin, William
AB-12345 | 2001-11-25 | 1 | 5 | Funkmaster, Freddy
AB-12345 | 2001-11-25 | 1 | 7 | Gull, Jonathon LC
AB-12345 | 2001-11-25 | 1 | 8 | Grumby, Jonas
AB-67472 | 2001-11-25 | 3 | 3 | Panky, Henry
AB-67472 | 2001-11-25 | 3 | 1 | Jones, Henry
AB-67472 | 2001-11-25 | 3 | 4 | Wonderland, Alice N.
AB-67472 | 2001-11-25 | 3 | 2 | Rubin, William
AB-67472 | 2001-11-25 | 3 | 5 | Funkmaster, Freddy
AB-67472 | 2001-11-25 | 3 | 7 | Gull, Jonathon LC
AB-67472 | 2001-11-25 | 3 | 8 | Grumby, Jonas
OW-41221 | 2001-11-25 | 1 | 3 | Panky, Henry
OW-41221 | 2001-11-25 | 1 | 1 | Jones, Henry
OW-41221 | 2001-11-25 | 1 | 4 | Wonderland, Alice N.
OW-41221 | 2001-11-25 | 1 | 2 | Rubin, William
OW-41221 | 2001-11-25 | 1 | 5 | Funkmaster, Freddy
OW-41221 | 2001-11-25 | 1 | 7 | Gull, Jonathon LC
OW-41221 | 2001-11-25 | 1 | 8 | Grumby, Jonas
MC-68873 | 2001-11-20 | 3 | 3 | Panky, Henry
MC-68873 | 2001-11-20 | 3 | 1 | Jones, Henry
MC-68873 | 2001-11-20 | 3 | 4 | Wonderland, Alice N.
MC-68873 | 2001-11-20 | 3 | 2 | Rubin, William
MC-68873 | 2001-11-20 | 3 | 5 | Funkmaster, Freddy
MC-68873 | 2001-11-20 | 3 | 7 | Gull, Jonathon LC
MC-68873 | 2001-11-20 | 3 | 8 | Grumby, Jonas
(28 rows)

You can see that PostgreSQL has joined each row in the rentals table to each row in the customers table. The rentals table contains
four rows; the customers table contains seven rows. The result set contains 4 × 7, or 28, rows.

Cross-joins are rarely useful; they usually don't represent real-world relationships.
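
The row count of a cross-join is always the product of the two table sizes, which you can confirm with COUNT(*). The sketch below
assumes two hypothetical scratch tables (cj_rentals and cj_customers) and also shows the explicit CROSS JOIN keyword, which
produces the same result as the comma-separated FROM list.

```sql
-- Hypothetical scratch tables: four rentals and three customers.
CREATE TEMP TABLE cj_rentals   ( tape_id CHAR(8), customer_id INTEGER );
CREATE TEMP TABLE cj_customers ( customer_id INTEGER, customer_name VARCHAR(50) );
INSERT INTO cj_rentals VALUES
    ('AB-12345', 1), ('AB-67472', 3), ('OW-41221', 1), ('MC-68873', 3);
INSERT INTO cj_customers VALUES
    (1, 'Jones, Henry'), (2, 'Rubin, William'), (3, 'Panky, Henry');

SELECT COUNT(*) FROM cj_rentals, cj_customers;            -- 4 * 3 = 12
SELECT COUNT(*) FROM cj_rentals CROSS JOIN cj_customers;  -- also 12
```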

The second type of join, the inner-join, is very useful. An inner-join starts with a cross-join, and then throws out the rows that you don't
want. Take a close look at the results of the previous query. Here are the first seven rows again:

tape_id | rental_date | customer_id | customer_id | customer_name


----------+-------------+-------------+-------------+----------------------
AB-12345 | 2001-11-25 | 1 | 3 | Panky, Henry
AB-12345 | 2001-11-25 | 1 | 1 | Jones, Henry
AB-12345 | 2001-11-25 | 1 | 4 | Wonderland, Alice N.
AB-12345 | 2001-11-25 | 1 | 2 | Rubin, William
AB-12345 | 2001-11-25 | 1 | 5 | Funkmaster, Freddy
AB-12345 | 2001-11-25 | 1 | 7 | Gull, Jonathon LC
AB-12345 | 2001-11-25 | 1 | 8 | Grumby, Jonas

. . . . .
. . . . .
. . . . .

These seven rows were produced by joining the first row in the rentals table:

tape_id | rental_date | customer_id


----------+-------------+-------------
AB-12345 | 2001-11-25 | 1

with each row in the customers table. What is the real-world relationship between a rentals row and a customers row? Each row in the
rentals table contains a customer ID. Each row in the customers table is uniquely identified by a customer ID. So, given a rentals
row, we can find the corresponding customers row by searching for a customer where the customer ID is equal to
rentals.customer_id. Looking back at the previous query, you can see that the meaningful rows are those WHERE
customers.customer_id = rentals.customer_id.

Qualifying Column Names

Notice that this WHERE clause mentions two columns with the same name (customer_id). You may find it helpful to
qualify each column name by prefixing it with the name of the corresponding table, followed by a period. So,
customers.customer_id refers to the customer_id column in the customers table and rentals.customer_id refers to
the customer_id column in the rentals table. Adding the table qualifier is sometimes required (when a command involves
two columns with identical names), but it is useful in other cases as well.

Now you can construct a query that will show us all of the rentals and the names of the corresponding customers:

movies=# SELECT rentals.*, customers.customer_id, customers.customer_name


movies-# FROM rentals, customers
movies-# WHERE customers.customer_id = rentals.customer_id;

tape_id | rental_date | customer_id | customer_id | customer_name


----------+-------------+-------------+-------------+---------------
AB-12345 | 2001-11-25 | 1 | 1 | Jones, Henry
AB-67472 | 2001-11-25 | 3 | 3 | Panky, Henry
OW-41221 | 2001-11-25 | 1 | 1 | Jones, Henry
MC-68873 | 2001-11-20 | 3 | 3 | Panky, Henry
(4 rows)

To execute this query, PostgreSQL could start by creating the cross-join between all the tables involved, producing an intermediate
result table. Next, PostgreSQL could throw out all the rows that fail to satisfy the WHERE clause. In practice, this would be a poor
strategy: Cross-joins can get very large quickly. Instead, the PostgreSQL query optimizer analyzes the query and plans an execution
strategy to minimize execution time. I'll cover query optimization in Chapter 4.
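
The same inner-join can also be written with the explicit INNER JOIN ... ON syntax, which keeps the join condition next to the
tables being joined. This is a sketch under assumptions: the scratch tables ij_rentals and ij_customers and the short aliases r and
c are hypothetical names chosen for illustration.

```sql
-- Hypothetical scratch tables mirroring rentals and (part of) customers.
CREATE TEMP TABLE ij_rentals   ( tape_id CHAR(8), rental_date DATE, customer_id INTEGER );
CREATE TEMP TABLE ij_customers ( customer_id INTEGER, customer_name VARCHAR(50) );
INSERT INTO ij_rentals VALUES
    ('AB-12345', '2001-11-25', 1), ('AB-67472', '2001-11-25', 3),
    ('OW-41221', '2001-11-25', 1), ('MC-68873', '2001-11-20', 3);
INSERT INTO ij_customers VALUES
    (1, 'Jones, Henry'), (2, 'Rubin, William'), (3, 'Panky, Henry');

-- Equivalent to: FROM ij_rentals, ij_customers WHERE c.customer_id = r.customer_id
SELECT r.tape_id, r.rental_date, c.customer_name
  FROM ij_rentals AS r
       INNER JOIN ij_customers AS c ON c.customer_id = r.customer_id
 ORDER BY r.tape_id;
-- Four rows: two tapes rented by Jones and two by Panky; Rubin has no rentals.
```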

Join Types

We've seen two join types so far: cross-joins and inner-joins. Now we'll look at outer-joins. An outer-join is similar to an inner-join: a
relationship between two tables is established by correlating a column from each table.

In an earlier section, you wrote a query that answered the question: "Which customers are currently renting movies?" How would you
answer the question: "Who are my customers and which movies are they currently renting?" You might start by trying the following
query:

movies=# SELECT customers.*, rentals.tape_id


movies-# FROM customers, rentals
movies-# WHERE rentals.customer_id = customers.customer_id;

customer_id | customer_name | phone | birth_date | balance | tape_id


-------------+---------------+----------+------------+---------+----------
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 | AB-12345
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 | AB-67472
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 | OW-41221
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 | MC-68873
(4 rows)

Well, that didn't work. This query showed you which customers are currently renting movies (and the movies that they are renting).
What we really want is a list of all customers and, if a customer is currently renting any movies, all the movies rented. This is an
outer-join. An outer-join preserves all the rows in one table (or both tables) regardless of whether a matching row can be found in the
second table.

The syntax for an outer-join is a little strange. Here is an example:

movies=# SELECT customers.customer_name, rentals.tape_id


movies-# FROM customers LEFT OUTER JOIN rentals
movies-# ON customers.customer_id = rentals.customer_id;
customer_name | tape_id
----------------------+----------
Jones, Henry | AB-12345
Jones, Henry | OW-41221
Rubin, William |
Panky, Henry | AB-67472
Panky, Henry | MC-68873
Wonderland, Alice N. |
Funkmaster, Freddy |
Gull, Jonathon LC |
Grumby, Jonas |
(9 rows)

This query is a left outer-join. Why left? Because you will see each row from the left table (the table to the left of the LEFT OUTER JOIN
phrase). An inner-join would list only two customers ("Jones, Henry" and "Panky, Henry"); the other customers have no rentals.

A RIGHT OUTER JOIN preserves each row from the right table. A FULL OUTER JOIN preserves each row from both tables.
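
A right outer-join is simply the mirror image of a left outer-join. The sketch below assumes two hypothetical scratch tables
(oj_customers and oj_rentals) and also shows a common follow-up: because unmatched rows carry NULL in the columns of the other
table, an IS NULL test finds the customers who have no rentals at all.

```sql
-- Hypothetical scratch tables: three customers, four rentals.
CREATE TEMP TABLE oj_customers ( customer_id INTEGER, customer_name VARCHAR(50) );
CREATE TEMP TABLE oj_rentals   ( tape_id CHAR(8), customer_id INTEGER );
INSERT INTO oj_customers VALUES
    (1, 'Jones, Henry'), (2, 'Rubin, William'), (3, 'Panky, Henry');
INSERT INTO oj_rentals VALUES
    ('AB-12345', 1), ('AB-67472', 3), ('OW-41221', 1), ('MC-68873', 3);

-- The customers table is on the right, so every customer is preserved.
SELECT c.customer_name, r.tape_id
  FROM oj_rentals AS r
       RIGHT OUTER JOIN oj_customers AS c ON c.customer_id = r.customer_id;
-- Five rows: two for Jones, two for Panky, and one for Rubin with a NULL tape_id.

-- Keep only the preserved rows that found no match.
SELECT c.customer_name
  FROM oj_customers AS c
       LEFT OUTER JOIN oj_rentals AS r ON r.customer_id = c.customer_id
 WHERE r.tape_id IS NULL;
-- One row: Rubin, William
```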

The following query shows a list of all customers, all tapes, and any rentals:

movies=# SELECT customers.customer_name, rentals.tape_id, tapes.title


movies-# FROM customers FULL OUTER JOIN rentals
movies-# ON customers.customer_id = rentals.customer_id
movies-# FULL OUTER JOIN tapes
movies-# ON tapes.tape_id = rentals.tape_id;
customer_name | tape_id | title
----------------------+----------+----------------------
Jones, Henry | AB-12345 | The Godfather
Panky, Henry | AB-67472 | The Godfather
| | Rear Window
| | American Citizen, An
Panky, Henry | MC-68873 | Casablanca
Jones, Henry | OW-41221 | Citizen Kane
Rubin, William | |
Wonderland, Alice N. | |
Funkmaster, Freddy | |
Gull, Jonathon LC | |
Grumby, Jonas | |
| | Sly
| | Stone
(13 rows)

UPDATE

Now that you've seen a number of ways to view your data, let's see how to modify (and delete) existing data.

The UPDATE command modifies data in one or more rows. The general form of the UPDATE command is

UPDATE table SET column = expression [, ...] [WHERE condition]

Using the UPDATE command is straightforward: The WHERE clause (if present) determines which rows will be updated and the SET
clause determines which columns will be updated (and the new values).
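
The expression in a SET clause does not have to be a constant; it can refer to the current value of the column being updated.
Here is a minimal sketch, assuming a hypothetical scratch table named update_demo and an invented 1.50 late fee.

```sql
-- Hypothetical scratch table: three customers and their balances.
CREATE TEMP TABLE update_demo ( customer_id INTEGER, balance NUMERIC(7,2) );
INSERT INTO update_demo VALUES (4, 3.00), (2, 15.00), (1, 0.00);

-- Add a late fee to every customer who already owes money.
-- On the right side of =, balance is the value before the update.
UPDATE update_demo
   SET balance = balance + 1.50
 WHERE balance > 0;             -- psql reports: UPDATE 2

SELECT customer_id, balance FROM update_demo ORDER BY customer_id;
-- 1 |  0.00
-- 2 | 16.50
-- 4 |  4.50
```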

You might have noticed in earlier examples that one of the tapes had a duration of '4 days, 01:36'; that's obviously a mistake.
You can correct this problem with the UPDATE command as follows:

movies=# UPDATE tapes SET duration = '4 hours 36 minutes'


movies-# WHERE tape_id = 'OW-42201';
UPDATE 1

movies=# SELECT * FROM tapes;


tape_id | title | duration
----------+----------------------+----------
AB-12345 | The Godfather |
AB-67472 | The Godfather |
MC-68873 | Casablanca |
OW-41221 | Citizen Kane |
AH-54706 | Rear Window |
OW-42200 | Sly | 01:36
KJ-03335 | American Citizen, An |
OW-42201 | Stone Cold | 04:36
(8 rows)

Using the UPDATE command, you can update all the rows in the table, a single row, or a set of rows; it all depends on the WHERE
clause. The SET clause in this example updates a single column in all the rows that satisfy the WHERE clause. If you want to update
multiple columns, list each assignment, separated by commas:

movies=# UPDATE tapes


movies-# SET duration = '1 hour 52 minutes', title = 'Stone Cold'
movies-# WHERE tape_id = 'OW-42201';
UPDATE 1
movies=# SELECT * FROM tapes;
tape_id | title | duration
----------+----------------------+----------
AB-12345 | The Godfather |
AB-67472 | The Godfather |
MC-68873 | Casablanca |
OW-41221 | Citizen Kane |
AH-54706 | Rear Window |
OW-42200 | Sly | 01:36
KJ-03335 | American Citizen, An |
OW-42201 | Stone Cold | 01:52
(8 rows)

The UPDATE statement displays the number of rows that were modified. The following UPDATE will modify three of the seven rows
in the customers table:


movies=# SELECT * FROM customers;

customer_id | customer_name | phone | birth_date | balance


-------------+----------------------+----------+------------+---------
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
5 | Funkmaster, Freddy | 555-FUNK | |
7 | Gull, Jonathon LC | 555-1111 | 1984-02-05 |
8 | Grumby, Jonas | 555-2222 | 1984-02-21 |
(7 rows)

movies=# UPDATE customers


movies-# SET balance = 0
movies-# WHERE balance IS NULL;
UPDATE 3
movies=# SELECT * FROM customers;
customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
5 | Funkmaster, Freddy | 555-FUNK | | 0.00
7 | Gull, Jonathon LC | 555-1111 | 1984-02-05 | 0.00
8 | Grumby, Jonas | 555-2222 | 1984-02-21 | 0.00
(7 rows)

DELETE

Like UPDATE, the DELETE command is simple. The general format of the DELETE command is

DELETE FROM table [ WHERE condition ]

The DELETE command removes all rows that satisfy the (optional) WHERE clause. Here is an example:


movies=# SELECT * FROM tapes;


tape_id | title | duration
----------+----------------------+----------
AB-12345 | The Godfather |
AB-67472 | The Godfather |
MC-68873 | Casablanca |
OW-41221 | Citizen Kane |
AH-54706 | Rear Window |
OW-42200 | Sly | 01:36
KJ-03335 | American Citizen, An |
OW-42201 | Stone Cold | 01:52
(8 rows)
movies=# BEGIN WORK;
BEGIN
movies=# DELETE FROM tapes WHERE duration IS NULL;
DELETE 6
movies=# SELECT * FROM tapes;
tape_id | title | duration
----------+------------+----------
OW-42200 | Sly | 01:36
OW-42201 | Stone Cold | 01:52
(2 rows)

movies=# ROLLBACK;
ROLLBACK

Before we executed the DELETE command, there were eight rows in the tapes table, and six of these tapes had a NULL
duration.

You can see that the DELETE statement returns the number of rows deleted ("DELETE 6"). After the DELETE statement, only
two tapes remain.

If you omit the WHERE clause in a DELETE command, PostgreSQL will delete all rows. Similarly, forgetting the WHERE clause for
an UPDATE command updates all rows. Be careful!
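
One habit that helps: run a SELECT with the WHERE clause first, and only then reuse exactly the same clause in the DELETE (or
UPDATE). The sketch below illustrates the idea with a hypothetical scratch table named delete_demo; it is a suggestion, not a
PostgreSQL feature.

```sql
-- Hypothetical scratch table: three tapes, two with an unknown duration.
CREATE TEMP TABLE delete_demo ( tape_id CHAR(8), title VARCHAR(80), duration INTERVAL );
INSERT INTO delete_demo VALUES
    ('AB-12345', 'The Godfather', NULL),
    ('MC-68873', 'Casablanca',    NULL),
    ('OW-42200', 'Sly',           '1 hour 36 minutes');

-- Step 1: preview the rows that the DELETE would remove.
SELECT tape_id, title FROM delete_demo WHERE duration IS NULL;   -- 2 rows

-- Step 2: reuse exactly the same WHERE clause in the DELETE.
DELETE FROM delete_demo WHERE duration IS NULL;                  -- psql reports: DELETE 2

SELECT COUNT(*) FROM delete_demo;                                -- 1
```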

A (Very) Short Introduction to Transaction Processing

You might have noticed two new commands in this example. The BEGIN WORK and ROLLBACK commands are used for
transaction processing. A transaction is a group of commands. Usually, a transaction includes one or more table modifications
(INSERTs, DELETEs, and UPDATEs).

BEGIN WORK marks the beginning of a transaction. Inside of a transaction, any changes that you make to the database are
temporary changes. There are two ways to mark the end of a transaction: COMMIT and ROLLBACK. If you COMMIT a
transaction, you are telling PostgreSQL to write all the changes made within the transaction into the database; in other
words, when you COMMIT a transaction, the changes become permanent. When you ROLLBACK a transaction, all changes
made within the transaction are discarded.

You can see that transactions are handy in that you can discard your changes if you change your mind. But transactions are
important for another reason. PostgreSQL guarantees that all the modifications in a transaction will complete, or none of
them will complete. The classic example of the importance of this property is to pretend that you are transferring money
from one bank account to another. This transaction might be written in two steps. The first step is to subtract an amount
from the first account. The second step is to add the amount to the second account. Now consider what would happen if your
system crashed after completing the first step, but before the second step. Somehow, you've lost money! If you wrap these
steps in a transaction, PostgreSQL promises that the first step will be rolled back if the second step fails (actually, the
transaction will be rolled back unless you perform a COMMIT).
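
The bank transfer might look something like the following sketch. The accounts_demo table, the account numbers, and the amounts
are all hypothetical; the sample database in this chapter has no such table.

```sql
-- Hypothetical scratch table: two accounts holding 150.00 in total.
CREATE TEMP TABLE accounts_demo (
    account_id INTEGER PRIMARY KEY,
    balance    NUMERIC(9,2) NOT NULL
);
INSERT INTO accounts_demo VALUES (1, 100.00), (2, 50.00);

BEGIN WORK;
-- Step 1: take the money out of the first account.
UPDATE accounts_demo SET balance = balance - 25.00 WHERE account_id = 1;
-- Step 2: put the money into the second account.
UPDATE accounts_demo SET balance = balance + 25.00 WHERE account_id = 2;
-- Both changes become permanent together, or not at all.
COMMIT;

SELECT account_id, balance FROM accounts_demo ORDER BY account_id;
-- 1 | 75.00
-- 2 | 75.00
```

If anything goes wrong between BEGIN WORK and COMMIT, issuing ROLLBACK instead leaves both balances exactly as they were, so
the total stays at 150.00 either way.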

I'll cover the transaction processing features of PostgreSQL in great detail in Chapter 3.

Creating New Tables Using CREATE TABLE...AS

Let's turn our attention to something completely different. Earlier in this chapter, you learned how to use the INSERT statement to
store data in a table. Sometimes, you want to create a new table based on the results of a SELECT command. That's exactly what
the CREATE TABLE...AS command is designed to do.

The format of CREATE TABLE...AS is

CREATE [ TEMPORARY | TEMP ] TABLE table [ (column [, ...] ) ]
    AS select_clause

When you execute a CREATE TABLE...AS command, PostgreSQL automatically creates a new table. Each column in the new table
corresponds to a column returned by the SELECT clause. If you include the TEMPORARY (or TEMP) keyword, PostgreSQL will create a
temporary table. This table is invisible to other users and is destroyed when you end your PostgreSQL session. A temporary table is
useful because you don't have to remember to remove the table later; PostgreSQL takes care of that detail for you.

Let's look at an example. A few pages earlier in the chapter, you created a complex join between the customers, rentals, and
tapes tables. Let's create a new table based on that query so you don't have to keep entering the same complex query[13]:

[13] Some readers are probably thinking, "Hey, you should use a view to do that!" You're right; you'll soon see that I
just needed a bad example.

movies=# CREATE TABLE info AS
movies-# SELECT customers.customer_name, rentals.tape_id, tapes.title
movies-# FROM customers FULL OUTER JOIN rentals
movies-# ON customers.customer_id = rentals.customer_id
movies-# FULL OUTER JOIN tapes
movies-# ON tapes.tape_id = rentals.tape_id;
SELECT
movies=# SELECT * FROM info;
customer_name | tape_id | title
----------------------+----------+----------------------
Jones, Henry | AB-12345 | The Godfather
Panky, Henry | AB-67472 | The Godfather
| | Rear Window
| | American Citizen, An
Panky, Henry | MC-68873 | Casablanca
Jones, Henry | OW-41221 | Citizen Kane
Rubin, William | |
Wonderland, Alice N. | |
Funkmaster, Freddy | |
Gull, Jonathon LC | |
Grumby, Jonas | |
| | Sly
| | Stone Cold
(13 rows)

This is the same complex query that you saw earlier. I'll point out a few things about this example. First, notice that the SELECT
command selected three columns (customer_name, tape_id, title), so the result table has three columns. Next, you can create a
table using an arbitrarily complex SELECT command. Finally, notice that the TEMPORARY keyword is not included; therefore, info is
a permanent table and is visible to other users.

What happens if you try to create the info table again?

movies=# CREATE TABLE info AS
movies-# SELECT customers.customer_name, rentals.tape_id, tapes.title
movies-# FROM customers FULL OUTER JOIN rentals
movies-# ON customers.customer_id = rentals.customer_id
movies-# FULL OUTER JOIN tapes
movies-# ON tapes.tape_id = rentals.tape_id;
ERROR: Relation 'info' already exists

As you might expect, you receive an error message because the info table already exists. CREATE TABLE...AS will not
automatically drop an existing table. Now let's see what happens if you include the TEMPORARY keyword:

movies=# CREATE TEMPORARY TABLE info AS
movies-# SELECT * FROM tapes;
SELECT
movies=# SELECT * FROM info;
tape_id | title | duration
----------+----------------------+----------
AB-12345 | The Godfather |
AB-67472 | The Godfather |
MC-68873 | Casablanca |
OW-41221 | Citizen Kane |
AH-54706 | Rear Window |
OW-42200 | Sly | 01:36
KJ-03335 | American Citizen, An |
OW-42201 | Stone Cold | 01:52
(8 rows)

This time, the CREATE TABLE...AS command succeeded. When I SELECT from info, I see a copy of the tapes table. Doesn't this
violate the rule that I mentioned earlier ("CREATE TABLE...AS will not automatically drop an existing table")? Not really. When you
create a temporary table, you are hiding any permanent table of the same name; the original (permanent) table still exists. Other
users will still see the permanent table. If you DROP the temporary table, the permanent table will reappear:

movies=# SELECT * FROM info;
tape_id | title | duration
----------+----------------------+----------
AB-12345 | The Godfather |
AB-67472 | The Godfather |
MC-68873 | Casablanca |
OW-41221 | Citizen Kane |
AH-54706 | Rear Window |
OW-42200 | Sly | 01:36
KJ-03335 | American Citizen, An |
OW-42201 | Stone Cold | 01:52
(8 rows)

movies=# DROP TABLE info;
DROP
movies=# SELECT * FROM info;
customer_name | tape_id | title
----------------------+----------+----------------------
Jones, Henry | AB-12345 | The Godfather
Panky, Henry | AB-67472 | The Godfather
| | Rear Window
| | American Citizen, An
Panky, Henry | MC-68873 | Casablanca
Jones, Henry | OW-41221 | Citizen Kane
Rubin, William | |
Wonderland, Alice N. | |
Funkmaster, Freddy | |
Gull, Jonathon LC | |
Grumby, Jonas | |
| | Sly
| | Stone Cold
(13 rows)

Using VIEW

In the previous section, I used the CREATE TABLE...AS command to create the info table so that you didn't have to type in the
same complex query over and over again. The problem with that approach is that the info table is a snapshot of the underlying
tables at the time that the CREATE TABLE...AS command was executed. If any of the underlying tables change (and they
probably will), the info table will be out of sync.

Fortunately, PostgreSQL provides a much better solution to this problem: the view. A view is a named query. The syntax you use
to create a view is nearly identical to the CREATE TABLE...AS command:

CREATE VIEW view AS select_clause;

Let's get rid of the info table and replace it with a view:

movies=# DROP TABLE info;
DROP
movies=# CREATE VIEW info AS
movies-# SELECT customers.customer_name, rentals.tape_id,tapes.title
movies-# FROM customers FULL OUTER JOIN rentals
movies-# ON customers.customer_id = rentals.customer_id
movies-# FULL OUTER JOIN tapes
movies-# ON tapes.tape_id = rentals.tape_id;
CREATE

While using psql, you can see a list of the views in your database using the \dv meta-command:

movies=# \dv
List of relations
Name | Type | Owner
------+------+---------------
info | view | bruce
(1 row)

You can see the definition of a view using the \d view-name meta-command:

movies=# \d info
View "info"
Attribute | Type | Modifier
---------------+-----------------------+----------
customer_name | character varying(50) |
tape_id | character(8) |
title | character varying(80) |
View definition: SELECT customers.customer_name,
rentals.tape_id, tapes.title
FROM (( customers FULL JOIN rentals
ON ((customers.customer_id = rentals.customer_id)))
FULL JOIN tapes
ON ((tapes.tape_id = rentals.tape_id)));

You can SELECT from a view in exactly the same way that you can SELECT from a table:

movies=# SELECT * FROM info WHERE tape_id IS NOT NULL;
customer_name | tape_id | title
---------------+----------+---------------
Jones, Henry | AB-12345 | The Godfather
Panky, Henry | AB-67472 | The Godfather
Panky, Henry | MC-68873 | Casablanca
Jones, Henry | OW-41221 | Citizen Kane
(4 rows)

The great thing about a view is that it is always in sync with the underlying tables. Let's add a new rentals row:

movies=# INSERT INTO rentals VALUES( 'KJ-03335', '2001-11-26', 8 );
INSERT 38488 1

and then repeat the previous query:

movies=# SELECT * FROM info WHERE tape_id IS NOT NULL;
customer_name | tape_id | title
---------------+----------+----------------------
Jones, Henry | AB-12345 | The Godfather
Panky, Henry | AB-67472 | The Godfather
Grumby, Jonas | KJ-03335 | American Citizen, An
Panky, Henry | MC-68873 | Casablanca
Jones, Henry | OW-41221 | Citizen Kane
(5 rows)
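
The difference between a snapshot table and a view can be reproduced in a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (both accept CREATE TABLE...AS and CREATE VIEW), and the simplified rentals table and the names rentals_snapshot, rentals_view, and count_rows() are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rentals (tape_id TEXT, customer_id INTEGER)")
conn.execute("INSERT INTO rentals VALUES ('AB-12345', 1)")

# A snapshot: the query runs once and its result is copied into a new table.
conn.execute("CREATE TABLE rentals_snapshot AS SELECT * FROM rentals")
# A view: only the query is stored; it runs again each time the view is read.
conn.execute("CREATE VIEW rentals_view AS SELECT * FROM rentals")

# Change the underlying table after both objects exist.
conn.execute("INSERT INTO rentals VALUES ('KJ-03335', 8)")


def count_rows(name):
    """Count the rows visible through the named table or view."""
    return conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]


snapshot_rows = count_rows("rentals_snapshot")
view_rows = count_rows("rentals_view")
print(snapshot_rows, view_rows)
```

The snapshot still holds the single row that existed when it was created, while the view reports both rows.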

To help you understand how a view works, you might imagine that the following sequence of events occurs each time you SELECT
from a view:

1. PostgreSQL creates a temporary table by executing the SELECT command used to define the view.

2. PostgreSQL executes the SELECT command that you entered, substituting the name of the temporary table everywhere that
you used the name of the view.

3. PostgreSQL destroys the temporary table.

This is not what actually occurs under the covers, but it's the easiest way to think about views.

Unlike other relational databases, PostgreSQL treats all views as read-only: you can't INSERT, DELETE, or UPDATE a view.

To destroy a view, you use the DROP VIEW command:

movies=# DROP VIEW info;
DROP

Summary

This chapter has given you a gentle introduction to PostgreSQL. You have seen how to install PostgreSQL on your system and
how to configure it for use. You've also created a sample database that you'll use throughout the rest of this book.

In the next chapter, I'll discuss the many PostgreSQL data types in more depth, and I'll give you some guidelines for
choosing between them.

Chapter 2. Working with Data in PostgreSQL

When you create a table in PostgreSQL, you specify the type of data that you will store in each column. For example, if you
are storing a customer name, you will want to store alphabetic characters. If you are storing a customer's birth date, you will
want to store values that can be interpreted as dates. An account balance would be stored in a numeric column.

Every value in a PostgreSQL database is defined within a data type. Each data type has a name (NUMERIC, TIMESTAMP,
CHARACTER, and so on) and a range of valid values. When you enter a value in PostgreSQL, the data that you supply must
conform to the syntax required by the type. PostgreSQL defines a set of functions that can operate on each data type; you
can also define your own functions. Every data type has a set of operators that can be used with values of that type. An
operator is a symbol used to build up complex expressions from simple expressions. You're already familiar with arithmetic
operators such as + (addition) and - (subtraction). An operator represents some sort of computation applied to one or more
operands. For example, in the expression 5 + 3, + is the operator and 5 and 3 are the operands. Most operators require two
operands, some require a single operand, and others can function in either context. An operator that works with two
operands is called a binary operator. An operator that works with one operand is called a unary operator.

You can convert most values from one data type to another. I'll describe type conversion at the end of this chapter.

This chapter explores each of the data types built into a standard PostgreSQL distribution (yes, you can also define your own
custom data types). For each type, I'll show you the range of valid values, the syntax required to enter a value of that type,
and a list of operators that you can use with that type.

Each section includes a table showing which operators you can use with a specific data type. For example, in the discussion
of character data types, you will see that the string concatenation operator (||) can be used to append one string value to
the end of another string value. The operator table in that section shows that you use the string concatenation operator to
join two CHARACTER values, two VARCHAR values, or two TEXT values. What the table does not show is that you can use the
string concatenation operator to append an INTEGER value to the end of a VARCHAR. PostgreSQL automatically converts the
INTEGER value into a string value and then applies the || operator. It's important to keep this point in mind as you read
through this chapter: the operator tables don't show all possible combinations, only the combinations that don't require type
conversion.

Later in this chapter, I'll give a brief description of the process that PostgreSQL uses to decide whether an operator (or
function) is applicable, and if so, which values require automatic type conversion. For a detailed explanation of the process,
see Chapter 5 of the PostgreSQL User's Guide that came with your copy of PostgreSQL.

Besides the operators listed in this section, PostgreSQL offers a huge selection of functions that you can call from within
expressions. For a complete, up-to-date list of functions, see the PostgreSQL User's Guide.

NULL Values

NULL values represent missing, unknown, or not-applicable values. For example, let's say that you want to add a
membership_expiration_date to the customers table. Some customers might be permanent members whose memberships
will never expire. For those customers, the membership_expiration_date is not applicable and should be set to NULL. You
may also find some customers who don't want to provide you with their birth dates. The birth_date column for these
customers should be NULL.

In one case, NULL means not applicable. In the other case, NULL means don't know. A NULL membership_expiration_date
does not mean that you don't know the expiration date; it means that the expiration date does not apply. A NULL
birth_date does not mean that the customer was never born(!); it means that the date of birth is unknown.

Of course, when you create a table, you can specify that a given column cannot hold NULL values (NOT NULL). When you do
so, you aren't affecting the data type of the column; you're just saying that NULL is not a legal value for that particular
column. A column that prohibits NULL values is mandatory; a column that allows NULL values is optional.

You may be wondering how a data type could hold all values legal for that type, plus one more value. The answer is that
PostgreSQL knows whether a given column is NULL not by looking at the column itself, but by first examining a NULL indicator
(a single bit) stored separately from the column. If the NULL indicator for a given row/column is set to TRUE, the data stored
in the row/column is meaningless. This means that a data row is composed of values for each column plus an array of
indicator bits, one bit for each optional column.

Character Values

There are three character (or, as they are more commonly known, string) data types offered by PostgreSQL. A string value is just that: a
string of zero or more characters. The three string data types are CHARACTER(n), CHARACTER VARYING(n), and TEXT.

A value of type CHARACTER(n) can hold a fixed-length string of n characters. If you store a value that is shorter than n, the value is padded
with spaces to increase the length to exactly n characters. You can abbreviate CHARACTER(n) to CHAR(n). If you omit the "(n)" when you
create a CHARACTER column, the length is assumed to be 1.

The CHARACTER VARYING(n) type defines a variable-length string of at most n characters. VARCHAR(n) is a synonym for
CHARACTER VARYING(n). If you omit the "(n)" when creating a CHARACTER VARYING column, you can store strings of any length in that column.

The last string type is TEXT. A TEXT column is equivalent to a VARCHAR column without a specified length: a TEXT column can store strings of
any length.

Syntax for Literal Values

A string value is a sequence of characters surrounded by a pair of delimiters. Prior to PostgreSQL version 8.0, you had to use a pair of single
quote characters to delimit a string value. Starting with version 8.0, you can also define your own delimiters for each string value using a form
known as dollar quoting. Each of the following is a valid string value:

'I am a string'
'3.14159265'
''

You can also write these same string values using dollar quoting as follows:

$$I am a string$$
$$3.14159265$$
$$$$

The first example is obviously a string value. '3.14159265' is also a string value; at first glance it may look like a numeric value, but the fact that it
is surrounded by single quotes tells you that it is really a string. The third example ('') is also a valid string: it is the string composed of zero
characters (that is, it has a length of zero). It is important to understand that an empty string is not the same as a NULL value. An empty string
means that you have a known value that just happens to be empty, whereas NULL implies that the value is unknown. Consider, for example,
that you are storing an employee name in your database. You might create three columns to hold the complete name: first_name,
middle_name, and last_name. If you find an employee whose middle_name is NULL, that should imply that the employee might have a middle
name, but you don't know what it is. On the other hand, if you find an employee who has no middle name, you should store that middle_name
as an empty string. Again, NULL implies that you don't have a piece of information; an empty string means that you do have the information,
but it just happens to be empty.
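
The distinction is easy to see in a query. The following sketch again uses Python's sqlite3 module in place of a PostgreSQL server; the employees table follows the three columns named above, and the two sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (first_name TEXT, middle_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Alice", None, "Wonderland"),  # None is stored as NULL: middle name unknown
        ("Henry", "", "Jones"),         # empty string: known to have no middle name
    ],
)

# IS NULL finds the row whose middle name is unknown.
unknown = conn.execute(
    "SELECT first_name FROM employees WHERE middle_name IS NULL"
).fetchall()

# Comparing against '' finds the row whose middle name is known to be empty.
no_middle = conn.execute(
    "SELECT first_name FROM employees WHERE middle_name = ''"
).fetchall()

# Comparing against NULL with = yields NULL rather than TRUE, so no row matches.
never = conn.execute(
    "SELECT first_name FROM employees WHERE middle_name = NULL"
).fetchall()
```

Here unknown contains only Alice's row, no_middle contains only Henry's row, and never is empty.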

If a string is delimited with single quotes, how do you represent a string that happens to include a single quote? There are four choices. First,
you can embed a single quote within a string by entering two adjacent quotes. For example, the string "Where's my car?" could be entered as:

'Where''s my car?'

Two other alternatives involve an escape character. An escape is a special character that tells PostgreSQL that the character (or characters)
following the escape is to be interpreted as a directive instead of as a literal value. In PostgreSQL, the escape character is the backslash (\).
When PostgreSQL sees a backslash in a string literal, it discards the backslash and interprets the following characters according to the following
rules:

\b is the backspace character
\f is the form feed character
\r is the carriage-return character
\n is the newline character
\t is the tab character
\xxx (where xxx is an octal number) means the character whose ASCII value is xxx.

If any character, other than those mentioned, follows the backslash, it is treated as its literal value. So, if you want to include a single quote in
a string, you can escape the quote by preceding it with a backslash:

'Where\'s my car?'

Or you can embed a single quote (or any character) within a string by escaping its ASCII value (in octal), as in

'Where\047s my car?'

Finally, you can use dollar quoting. To write the string "Where's my car?" in dollar-quoted form, use this format:

$$Where's my car?$$

Notice that in this form, the embedded single quote doesn't cause any problems. When you write a string in dollar-quoted form, the single
quote character has no special meaning; it's just another character. You may be thinking that dollar quoting just trades one special delimiter (a
single quote) for another (two dollar signs). After all, what happens if you want to embed two consecutive dollar signs in a string value? OK,
that's not very likely, but PostgreSQL doesn't just ignore the problem; it lets you define your own delimiters.

In its simplest form, a dollar-quote delimiter is just a pair of dollar signs. To define your own delimiter, simply include a tag between the
two dollar signs at the beginning of the string and include the same tag between the two dollar signs at the end of the string. You get to choose
the tag, but be aware that tags are case sensitive. Here's a string written using a custom delimiter:

$MyTag$That restaurant's rated 3 $$$; it must be expensive$MyTag$

When you define your own delimiter, embedded single quotes lose their special meaning and so do consecutive dollar signs. You can define a
custom delimiter for each string value that you write, but remember that you don't have to define a custom delimiter unless your string
contains consecutive dollar signs.

To summarize, here are the four ways that you can embed a single quote within a string:

'It''s right where you left it'
'It\'s right where you left it'
'It\047s right where you left it'
$$It's right where you left it$$
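
Two of these forms are mechanical enough to generate with a helper. The sketch below is a hypothetical illustration in Python, not part of PostgreSQL or any client library: single_quote() doubles embedded quotes, and dollar_quote() lengthens the tag (by appending the letter x, an arbitrary choice) until the closing delimiter cannot be confused with the string's own contents. It does not check that a caller-supplied tag is valid.

```python
def single_quote(text):
    """Return text as a single-quoted literal, doubling embedded quotes."""
    return "'" + text.replace("'", "''") + "'"


def dollar_quote(text, tag=""):
    """Return text as a dollar-quoted literal with a delimiter that is safe to use."""
    delimiter = "$" + tag + "$"
    # The first occurrence of the delimiter in (text + delimiter) must be the
    # closing delimiter itself. If it appears any earlier, the string would be
    # cut short, so extend the tag and check again.
    while (text + delimiter).find(delimiter) != len(text):
        tag += "x"
        delimiter = "$" + tag + "$"
    return delimiter + text + delimiter
```

For the string It's right where you left it, single_quote() produces the first form in the list above and dollar_quote() produces the last. A string that already contains consecutive dollar signs gets a tagged delimiter instead of a bare $$.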

Supported Operators

PostgreSQL offers a large number of string operators. One of the most basic operations is string concatenation. The concatenation operator
(||) is used to combine two string values into a single TEXT value. For example, the expression

'This is ' || 'one string'

will evaluate to the value: 'This is one string'. And the expression

'The current time is ' || now()

will evaluate to a TEXT value such as, 'The current time is 2002-01-01 19:45:17-04'.

PostgreSQL also gives you a variety of ways to compare string values. All comparison operators return a BOOLEAN value; the result will be TRUE,
FALSE, or NULL. A comparison operator will evaluate to NULL if either of the operands is NULL.

The equality (=) and inequality (<>) operators behave the way you would expect: two strings are equal if they contain the same characters (in
the same positions); otherwise, they are not equal. You can also determine whether one string is greater than or less than another (and of
course, greater than or equal to and less than or equal to).

Table 2.1[1] shows a few sample string comparisons.

[1] You might find the format of this table a bit confusing at first. In the first column, I use the '•' character to represent any one
of the operators listed in the remaining columns. So, the first row of the table tells you that 'string' < 'string' evaluates to
FALSE, 'string' <= 'string' evaluates to TRUE, 'string' = 'string' evaluates to TRUE, and so forth. I'll use the '•'
character throughout this chapter to indicate an operator.

Table 2.1. Sample String Comparisons

                           Operator (•)
Expression             <      <=     =      <>     >=     >
'string' • 'string'    FALSE  TRUE   TRUE   FALSE  TRUE   FALSE
'string1' • 'string'   FALSE  FALSE  FALSE  TRUE   TRUE   TRUE
'String1' • 'string'   TRUE   TRUE   FALSE  TRUE   FALSE  FALSE
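
The rows of Table 2.1 can be recomputed with a short sketch. It relies on Python's own string comparison, which orders strings by character code; that agrees with the table, but it is only an analogy, since the ordering PostgreSQL uses depends on how the database is configured. The OPERATORS list and the compare_row() helper are names made up for this example.

```python
import operator

# SQL operator symbols paired with the equivalent Python comparison,
# in the same left-to-right order as the columns of Table 2.1.
OPERATORS = [
    ("<", operator.lt),
    ("<=", operator.le),
    ("=", operator.eq),
    ("<>", operator.ne),
    (">=", operator.ge),
    (">", operator.gt),
]


def compare_row(left, right):
    """Evaluate left <op> right for every operator in OPERATORS."""
    return [func(left, right) for _symbol, func in OPERATORS]


for left, right in [("string", "string"), ("string1", "string"), ("String1", "string")]:
    print(repr(left), "•", repr(right), compare_row(left, right))
```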

You can also use pattern-matching operators with string values. PostgreSQL defines eight pattern-matching operators, but the names are a bit
contrived and not particularly intuitive.

Table 2.2 contains a summary of the string operators.

Table 2.2. String Operators

Operator   Meaning                              Case Sensitive?
||         Concatenation                        Not applicable
~          Matches regular expression           Yes
~~         Matches LIKE expression              Yes
~*         Matches regular expression           No
~~*        Matches LIKE expression              No
!~         Does not match regular expression    Yes
!~~        Does not match LIKE expression       Yes
!~*        Does not match regular expression    No
!~~*       Does not match LIKE expression       No

The first set of pattern-matching operators is related to the LIKE keyword. ~~ is equivalent to LIKE. The ~~* operator is equivalent to ILIKE;
it is a case-insensitive version of LIKE. !~~ and !~~* are equivalent to NOT LIKE and NOT ILIKE, respectively.

The second set of pattern-matching operators is used to match a string value against a regular expression (regular expressions are
described in more detail in Chapter 1, "Introduction to PostgreSQL and SQL"). The naming convention for the regular expression operators is
similar to that for the LIKE operators: regular expression operators are indicated with a single tilde and LIKE operators use two tildes. The ~
operator compares a string against a regular expression (returning True if the string satisfies the regular expression). ~* compares a string
against a regular expression, ignoring differences in case. The !~ operator returns False if the string value matches the regular expression
(and returns True if the string does not satisfy the regular expression). The !~* operator returns False if the string value matches the regular
expression, ignoring differences in case, and returns True otherwise.
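
The relationship among the four regular expression operators can be sketched with Python's re module. This is an analogy rather than PostgreSQL's implementation: the function names regex_match() and regex_no_match() are hypothetical, and Python's regular expression syntax is not identical to PostgreSQL's. Like ~, re.search() succeeds when the pattern matches anywhere in the string.

```python
import re


def regex_match(value, pattern, ignore_case=False):
    """Mimic ~ (or ~* when ignore_case is True): match anywhere in value."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, value, flags) is not None


def regex_no_match(value, pattern, ignore_case=False):
    """Mimic !~ (or !~* when ignore_case is True): the negation of regex_match."""
    return not regex_match(value, pattern, ignore_case)
```

For example, regex_match("The Godfather", "godfather") is False because the comparison is case sensitive, while the same call with ignore_case=True is True; regex_no_match() returns the opposite result in each case.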

Type Conversion Operators

There are two important operators that you should know about before we go much further (actually, it's one operator, but you
can write it two different ways).

The CAST() operator is used to convert a value from one data type to another. There are two ways to write the CAST()
operator:

CAST(expression AS type)
expression::type

No matter which way you write it, the expression is converted into the specified type. Of course, not every value can be
converted into every type. For example, the expression CAST( 'abc' AS INTEGER ) results in an error (specifically, 'pg_atoi:
error in "abc": can't parse "abc"') because 'abc' obviously can't be converted into an integer.

Most often, your casting requirements will come in either of two forms: you will need to CAST() a string value into some other
type, or you will need to convert between related types (for example, INTEGER into NUMERIC). When you CAST() a string value
into another data type, the string must be in the form required by the literal syntax for the target data type. Each of the
following sections describes the literal syntax required by each type. When you convert between related data types, you may
gain or lose precision. For example, when you convert from a fractional numeric type into an integer type, the value is
rounded:

movies=# SELECT CAST( CAST( 12345.67 AS FLOAT8 ) AS INTEGER );
?column?
----------
12346
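
Rounding is worth noticing because many programming languages truncate instead when they convert a fractional value to an integer. The following Python lines contrast the two behaviors for the same value; they illustrate the general idea only, and how PostgreSQL treats values that fall exactly halfway between two integers is not covered here.

```python
value = 12345.67

rounded = round(value)   # nearest integer, like the CAST() result shown above
truncated = int(value)   # fractional part simply dropped

print(rounded, truncated)
```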

Numeric Values

PostgreSQL provides a variety of numeric data types. Of the six numeric types, four are exact (SMALLINT, INTEGER, BIGINT,
NUMERIC(p,s)) and two are approximate (REAL, DOUBLE PRECISION).

Three of the four exact numeric types (SMALLINT, INTEGER, and BIGINT) can store only integer values. The fourth
(NUMERIC(p,s)) can accurately store any value that fits within the specified number (p) of digits.

The approximate numeric types, on the other hand, cannot store all values exactly. Instead, an approximate data type stores
an approximation of a real number. The DOUBLE PRECISION type, for example, can store a total of 15 significant digits, but
when you perform calculations using a DOUBLE PRECISION value, you can run into rounding errors. It's easy to see this
problem:

movies=# select 2000.3 - 2000.0;
?column?
-------------------
-------------------
0.299999999999955
(1 row)
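
The same effect appears in any language that uses binary floating point. In the Python sketch below, float is an 8-byte binary floating-point type on common platforms, so it is used as a stand-in for an approximate type, while decimal.Decimal stands in for an exact decimal type. Both are analogies chosen for this example, not PostgreSQL types.

```python
from decimal import Decimal

# Binary floating point: 2000.3 cannot be represented exactly, so the
# difference is close to, but not equal to, 0.3.
approximate = 2000.3 - 2000.0

# Decimal arithmetic: both operands are stored digit for digit.
exact = Decimal("2000.3") - Decimal("2000.0")

print(approximate)
print(exact)
```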

Size, Precision, and Range-of-Values

The four exact data types can accurately store any value within a type-specific range. The exact numeric types are described
in Table 2.3.

Table 2.3. Exact Numeric Data Types

Type Name      Size in Bytes   Minimum Value           Maximum Value
SMALLINT       2               -32768                  +32767
INTEGER        4               -2147483648             +2147483647
BIGINT         8               -9223372036854775808    +9223372036854775807
NUMERIC(p,s)   11+(p/2)        No limit                No limit

The NUMERIC(p,s) data type can accurately store any number that fits within the specified number of digits. When you
create a column of type NUMERIC(p,s), you can specify the total number of decimal digits (p) and the number of fractional
digits (s). The total number of decimal digits is called the precision, and the number of fractional digits is called the scale.
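
To make precision and scale concrete, here is a small Python sketch that models a NUMERIC(p,s) column with the decimal module. The to_numeric() helper is hypothetical, and two of its behaviors are assumptions that the text above does not spell out: values are rounded to the scale with halves rounded away from zero, and a value with too many digits to the left of the decimal point is rejected.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_numeric(value, precision, scale):
    """Round value to `scale` fractional digits and check that it fits `precision`."""
    quantum = Decimal(10) ** -scale  # scale 2 gives Decimal("0.01")
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    # Only (precision - scale) digits are available left of the decimal point.
    if abs(rounded) >= Decimal(10) ** (precision - scale):
        raise ValueError(f"{value} does not fit in NUMERIC({precision},{scale})")
    return rounded
```

With a precision of 5 and a scale of 2, to_numeric("123.456", 5, 2) keeps two fractional digits and returns 123.46, while to_numeric("1000", 5, 2) raises an error because only three digits are available for the integer part.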

Table 2.3 shows that there is no limit to the values that you can store in a NUMERIC(p,s) column. In fact, there is a limit
(normally 1,000 digits), but you can adjust the limit by changing a symbol and rebuilding your PostgreSQL server from
source code.

The two approximate numeric types are named REAL and DOUBLE PRECISION. Table 2.4 shows the size and range for each of
these data types, while Table 2.5 shows alternative names for the data types.

Table 2.4. Approximate Numeric Data Types

Type Name          Size in Bytes   Range
REAL               4               6 decimal digits
DOUBLE PRECISION   8               15 decimal digits

Table 2.5. Alternate Names for Numeric Data Types

Common Name        Synonyms
SMALLINT           INT2
INTEGER            INT, INT4
BIGINT             INT8
NUMERIC(p,s)       DECIMAL(p,s)
REAL               FLOAT4
DOUBLE PRECISION   FLOAT, FLOAT8

SERIAL, BIGSERIAL, and SEQUENCES

Besides the numeric data types already described, PostgreSQL supports two "advanced" numeric types:
SERIAL and BIGSERIAL. A SERIAL column is really an INTEGER whose value automatically increases
(or decreases) by a defined increment as you add new rows. Likewise, a BIGSERIAL is a BIGINT that increases
in value. When you create a BIGSERIAL or SERIAL column, PostgreSQL will automatically create a SEQUENCE
for you. A SEQUENCE is an object that generates sequence numbers for you. I'll talk more about SEQUENCEs
later in this chapter.

Syntax for Literal Values

When you need to enter a numeric literal, you must follow the formatting rules defined by PostgreSQL. There are two distinct
styles for numeric literals: integer and fractional (the PostgreSQL documentation refers to fractional literals as floating-point
literals).

Let's start by examining the format for fractional literals. Fractional literals can be entered in any of the following forms[2]:

[2] Syntax diagrams are described in detail in Chapter 1.

[-]digits.[digits][E[+|-]digits]
[-][digits].digits[E[+|-]digits]
[-]digitsE[+|-]digits

Here are some examples of valid fractional literals:

3.14159
2.0e+15
0.2e-15
4e10

A numeric literal that contains only digits is considered to be an integer literal:

[-]digits

Here are some examples of valid integer literals:

-100
55590332
9223372036854775807
-9223372036854775808
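
The two literal styles can be expressed as regular expressions. The Python sketch below is a hypothetical classifier written for this discussion, not PostgreSQL's actual scanner; the names FRACTIONAL, INTEGER, and classify() are invented, and the third fractional pattern assumes that an exponent marker (E or e) separates the digits from the exponent, as in the example 4e10.

```python
import re

FRACTIONAL = re.compile(
    r"""-?(
        \d+\.\d*([eE][+-]?\d+)?    # digits . [digits] [exponent]
      | \d*\.\d+([eE][+-]?\d+)?    # [digits] . digits [exponent]
      | \d+[eE][+-]?\d+            # digits exponent (no decimal point)
    )""",
    re.VERBOSE,
)
INTEGER = re.compile(r"-?\d+")


def classify(literal):
    """Return 'integer', 'fractional', or 'invalid' for a numeric literal."""
    if INTEGER.fullmatch(literal):
        return "integer"
    if FRACTIONAL.fullmatch(literal):
        return "fractional"
    return "invalid"
```

Each of the fractional examples listed earlier (3.14159, 2.0e+15, 0.2e-15, and 4e10) is classified as fractional, and each of the integer examples is classified as integer.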

A fractional literal is always considered to be of type DOUBLE PRECISION. An integer literal is considered to be of type
INTEGER, unless the value is too large to fit into an integer, in which case it will be promoted first to type BIGINT, then to
NUMERIC or REAL if necessary.

Supported Operators

PostgreSQL supports a variety of arithmetic, comparison, and bit-wise operators for the numeric data types. Tables 2.6 and
2.7 give some examples of the arithmetic operators.

Table 2.6. Arithmetic Operators for Integers

Data Types     Valid Operators (•)
INT2 • INT2    + - * / %
INT2 • INT4    + - * / %
INT4 • INT2    + - * / %
INT4 • INT4    + - * / %
INT4 • INT8    + - * /
INT8 • INT4    + - * /
INT8 • INT8    + - * / %

Table 2.7. Arithmetic Operators for Floats

Data Types         Valid Operators (•)
FLOAT4 • FLOAT4    * + - /
FLOAT4 • FLOAT8    * + - /
FLOAT8 • FLOAT4    * + - /
FLOAT8 • FLOAT8    * + - / ^

You use the comparison operators to determine the relationship between two numeric values. PostgreSQL supports the usual
operators: <, <=, <> (not equal), =, >, and >=. You can use the comparison operators with all possible combinations of the
numeric data types (some combinations will require type conversion).

PostgreSQL also provides a set of bit-wise operators that you can use with the integer data types. Bit-wise operators work on
the individual bits that make up the two operands.

The easiest way to understand the bit-wise operators is to first convert your operands into binary notation. For example:

decimal 12 = binary 00001100
decimal  7 = binary 00000111
decimal 21 = binary 00010101

Next, let's look at each operator in turn.

The AND (&) operator compares corresponding bits in each operand and produces a 1 if both bits are 1 and a 0 otherwise. For
example:

00001100 &    00000111 &
00010101      00010101
--------      --------
00000100      00000101

The OR (|) operator compares corresponding bits in each operand and produces a 1 if either (or both) bit is 1 and a 0
otherwise. For example:

00001100 | 00000111 |
00010101 00010101
-------- --------
00011101 00010111

The XOR (#) operator is similar to OR. XOR compares corresponding bits in each operand, and produces a 1 if either bit, but
not both bits, is 1, and produces a 0 otherwise.

00001100 # 00000111 #
00010101 00010101
-------- --------
00011001 00010010

PostgreSQL also provides two bit-shift operators.

The left-shift operator (<<) shifts the bits in the first operand n bits to the left, where n is the second operand. The leftmost n
bits are discarded, and the rightmost n bits are set to 0. A left-shift by n bits is equivalent to multiplying the first operand by
2^n. For example:

00001100 << 2(decimal) = 00110000
00010101 << 3(decimal) = 10101000

The right-shift operator (>>) shifts the bits in the first operand n bits to the right, where n is the second operand. The
rightmost n bits are discarded, and the leftmost n bits are set to 0. A right-shift by n bits is equivalent to dividing the first
operand by 2^n and discarding any remainder:

00001100 >> 2(decimal) = 00000011
00010101 >> 3(decimal) = 00000010

The final bit-wise operator is the binary NOT (~). Unlike the other bit-wise operators, NOT is a unary operator: it takes a single
operand. When you apply the NOT operator to a value, each bit in the original value is toggled: ones become zeroes and
zeroes become ones. For example:

~00001100 = 11110011
~00010101 = 11101010
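If you want to check the examples above without a database at hand, the following Python sketch reproduces them. It is only a model of the arithmetic: Python spells XOR as ^ where PostgreSQL uses #, and the hypothetical helper bits() masks each result to eight bits so that it prints the same way as the diagrams (Python integers are not limited to eight bits).

```python
def bits(value, width=8):
    """Format the low `width` bits of value as a binary string."""
    return format(value & ((1 << width) - 1), "0{}b".format(width))


a, c = 12, 21

and_result = bits(a & c)    # PostgreSQL: 12 & 21
or_result = bits(a | c)     # PostgreSQL: 12 | 21
xor_result = bits(a ^ c)    # PostgreSQL: 12 # 21 (Python spells XOR as ^)
shl_result = bits(c << 3)   # PostgreSQL: 21 << 3
shr_result = bits(c >> 3)   # PostgreSQL: 21 >> 3
not_result = bits(~a)       # PostgreSQL: ~12, masked to 8 bits for display
```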

Table 2.8 shows the data types that you can use with the bit-wise operators.

Table 2.8. Bit-Wise Operators for Integers

Data Types     Valid Operators (θ)
INT2 θ INT2    # & | << >>
INT4 θ INT4    # & | << >>
INT8 θ INT4    << >>
INT8 θ INT8    # & |


Date/Time Values

PostgreSQL supports four basic temporal data types plus a couple of extensions that deal with time zone issues.

The DATE type is used to store dates. A DATE value stores a century, year, month, and day.

The TIME data type is used to store a time-of-day value. A TIME value stores hours, minutes, seconds, and microseconds. It
is important to note that a TIME value does not contain a time zone; if you want to include a time zone, you should use the
type TIME WITH TIME ZONE. TIMETZ is a synonym for TIME WITH TIME ZONE.

The TIMESTAMP data type combines a DATE and a TIME, storing a century, year, month, day, hour, minutes, seconds, and
microseconds. Unlike the TIME data type, a TIMESTAMP does include a time zone. If, for some reason, you want a date/time
value that does not include a time zone, you can use the type TIMESTAMP WITHOUT TIME ZONE.

The last temporal data type is the INTERVAL. An INTERVAL represents a span of time. I find that the easiest way to think
about INTERVAL values is to remember that an INTERVAL stores some (possibly large) number of seconds, but you can group
the seconds into larger units for convenience. For example, CAST( '1 week' AS INTERVAL ) is equal to CAST( '604800
seconds' AS INTERVAL ), which is equal to CAST( '7 days' AS INTERVAL ); you can use whichever format you find
easiest to work with.
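The same "a span is just a number of seconds" idea shows up in other languages. As a rough analogy (not PostgreSQL code), Python's standard datetime.timedelta treats the three spellings of one week as equal values:

```python
from datetime import timedelta

# Three ways of writing the same span of time.
one_week = timedelta(weeks=1)
as_seconds = timedelta(seconds=604800)
as_days = timedelta(days=7)

# The span expressed as a plain number of seconds.
total = one_week.total_seconds()
```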

Table 2.9 lists the size and range for each of the temporal data types.

Table 2.9. Temporal Data Type Sizes and Ranges

Data Type                      Size (in bytes)  Range
DATE                           4                01-MAR-4801 BC to 31-DEC-32767
TIME [WITHOUT TIME ZONE]       4                00:00:00.00 to 23:59:59.99
TIME WITH TIME ZONE            12               00:00:00.00+12 to 23:59:59.00-12
TIMESTAMP [WITH TIME ZONE]     8                24-NOV-4714 BC to 31-DEC-5874897
TIMESTAMP WITHOUT TIME ZONE    8                24-NOV-4714 BC to 31-DEC-5874897
INTERVAL                       12               -178000000 YEARS to +178000000 YEARS

The data types that contain a time value (TIME, TIME WITH TIME ZONE, TIMESTAMP, TIMESTAMP WITH TIME ZONE, and
INTERVAL) have microsecond precision. The DATE data type has a precision of one day.

Syntax for Literal Values

I covered date literal syntax pretty thoroughly in Chapter 1; see the section titled "Working with Date Values."

You may recall from Chapter 1 that date values can be entered in many formats, and you have to tell PostgreSQL how to
interpret ambiguous values. Fortunately, the syntax for TIME, TIMESTAMP, and INTERVAL values is much more
straightforward.

A TIME value stores hours, minutes, seconds, and microseconds. The syntax for a TIME literal is

hh:mm[:ss[.µ]][AM|PM]

where hh specifies the hour, mm specifies the number of minutes past the hour, ss specifies the number of seconds, and µ
specifies the number of microseconds. If you include an AM or PM indicator, the hh component must be less than or equal to
12; otherwise, the hour can range from 0 to 24.
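To see how the pieces of a TIME literal fit together, here is a small Python sketch that splits a literal into its components and applies the AM/PM rule described above. TIME_RE and parse_time are hypothetical names; the sketch does not check the minute and second ranges and is not PostgreSQL's own parser.

```python
import re

# Hypothetical pattern for the hh:mm[:ss[.fraction]][AM|PM] form.
TIME_RE = re.compile(
    r"^(\d{1,2}):(\d{2})(?::(\d{2})(?:\.(\d{1,6}))?)?\s*(AM|PM)?$",
    re.IGNORECASE,
)


def parse_time(text):
    """Return (hour, minute, second, microsecond) on a 24-hour clock."""
    match = TIME_RE.match(text.strip())
    if match is None:
        raise ValueError("not a TIME literal: " + text)
    hh, mm, ss, frac, meridian = match.groups()
    hour, minute, second = int(hh), int(mm), int(ss or 0)
    # Pad the fraction on the right so '.5' means 500000 microseconds.
    micro = int((frac or "0").ljust(6, "0"))
    if meridian:
        if hour > 12:
            raise ValueError("hour must be 12 or less with AM/PM")
        # 12 AM is hour 0 and 12 PM is hour 12 on a 24-hour clock.
        hour = hour % 12 + (12 if meridian.upper() == "PM" else 0)
    elif hour > 24:
        raise ValueError("hour must be between 0 and 24")
    return hour, minute, second, micro
```

For example, parse_time("2:35 PM") yields hour 14, and a literal such as '13:00 PM' is rejected because the hour exceeds 12.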

Entering a TIME WITH TIME ZONE value is a bit more complex. A TIME WITH TIME ZONE value is a TIME value, plus a time
zone. The time zone component can be specified in two ways. First, you can include an offset (in hours and minutes) from
UTC:

hh:mm[:ss[.µ]][AM|PM][{+|-}HH[:MM]]

where HH is the number of hours and MM is the number of minutes distant from UTC. Negative values are considered to be
west of the prime meridian, and positive values are east of the prime meridian.

You can also use a standard time zone abbreviation (such as UTC, PDT, or EST) to specify the time zone:

hh:mm[:ss[.µ]][AM|PM][ZZZ]

Table 2.10 shows all the time zone abbreviations accepted by PostgreSQL version 8.0.

Table 2.10. PostgreSQL Time Zone Names

Names  Offset  Description

FJST  -13:00  Fiji Summer Time
FJT  -12:00  Fiji Time
IDLW  -12:00  International Date Line, West
BST  -11:00  Bering Summer Time
NT  -11:00  Nome Time
NUT  -11:00  Niue Time
AHST  -10:00  Alaska-Hawaii Std Time
CAT  -10:00  Central Alaska Time
HST  -10:00  Hawaii Std Time
THAT  -10:00  Tahiti Time
TKT  -10:00  Tokelau Time
MART  -09:30  Marquesas Time
AKST  -09:00  Alaska Standard Time
GAMT  -09:00  Gambier Time
HDT  -09:00  Hawaii/Alaska Daylight Time
YST  -09:00  Yukon Standard Time
AKDT  -08:00  Alaska Daylight Time
PST  -08:00  Pacific Standard Time
YDT  -08:00  Yukon Daylight Time
MST  -07:00  Mountain Standard Time
PDT  -07:00  Pacific Daylight Time
CST  -06:00  Central Standard Time
EAST  -06:00  Easter Island Time
GALT  -06:00  Galapagos Time
MDT  -06:00  Mountain Daylight Time
ZP6  -06:00  UTC +6 hours
ACT  -05:00  Atlantic/Porto Acre Time
CDT  -05:00  Central Daylight Time
COT  -05:00  Colombia Time
EASST  -05:00  Easter Island Summer Time
ECT  -05:00  Ecuador Time
EST  -05:00  Eastern Standard Time
PET  -05:00  Peru Time
ZP5  -05:00  UTC +5 hours
ACST  -04:00  Atlantic/Porto Acre Summer Time
AMT  -04:00  Amazon Time (Porto Velho)
AST  -04:00  Atlantic Std Time (Canada)
BOT  -04:00  Bolivia Time
CLT  -04:00  Chile Time
ECT  -04:00  Eastern Caribbean Time
EDT  -04:00  Eastern Daylight Time
GYT  -04:00  Guyana Time
PYT  -04:00  Paraguay Time
VET  -04:00  Venezuela Time
ZP4  -04:00  UTC +4 hours
NFT  -03:30  Newfoundland Standard Time
NST  -03:30  Newfoundland Standard Time
ADT  -03:00  Atlantic Daylight Time
AMST  -03:00  Amazon Summer Time (Porto Velho)
ART  -03:00  Argentina Time
AWT  -03:00  Brazil Time
BRT  -03:00  Brasilia Time
BST  -03:00  Brazil Standard Time
CLST  -03:00  Chile Summer Time
FKST  -03:00  Falkland Islands Summer Time
GFT  -03:00  French Guiana Time
PYST  -03:00  Paraguay Summer Time
UYT  -03:00  Uruguay Time
WGT  -03:00  West Greenland Time
NDT  -02:30  Newfoundland Daylight Time
BRST  -02:00  Brasilia Summer Time
FKT  -02:00  Falkland Islands Time
FNT  -02:00  Fernando de Noronha Time
PMDT  -02:00  Pierre & Miquelon Daylight Time
UYST  -02:00  Uruguay Summer Time
WGST  -02:00  West Greenland Summer Time
AZOT  -01:00  Azores Time
EGT  -01:00  East Greenland Time
FNST  -01:00  Fernando de Noronha Summer Time
SET  -01:00  Seychelles Time
WAT  -01:00  West Africa Time
AZOST  +00:00  Azores Summer Time
EGST  +00:00  East Greenland Summer Time
GMT  +00:00  Greenwich Mean Time
UTC  +00:00  Universal Coordinated Time
UT  +00:00  Universal Time
WET  +00:00  Western Europe
ZULU  +00:00  Universal Time
Z  +00:00  ISO-8601 Universal Time
BST  +01:00  British Summer Time
CET  +01:00  Central European Time
DNT  +01:00  Dansk Normal Time
FST  +01:00  French Summer Time
MET  +01:00  Middle Europe Time
MEWT  +01:00  Middle Europe Winter Time
MEZ  +01:00  Middle Europe Zone
NOR  +01:00  Norway Standard Time
SWT  +01:00  Swedish Winter Time
WEST  +01:00  Western Europe Summer Time
WETDST  +01:00  Western Europe Daylight Savings Time
BDST  +02:00  British Double Summer Time
CEST  +02:00  Central European Dayl.Time
CETDST  +02:00  Central European Dayl.Time
EET  +02:00  Eastern Europe, USSR Zone 1
FWT  +02:00  French Winter Time
IST  +02:00  Israel Time
MEST  +02:00  Middle Europe Summer Time
METDST  +02:00  Middle Europe Daylight Time
SST  +02:00  Swedish Summer Time
BT  +03:00  Baghdad Time
EAT  +03:00  East Africa Time
EAT  +03:00  Indian Antananarivo Time
EEST  +03:00  Eastern Europe Summer Time
EETDST  +03:00  Eastern Europe Daylight Time
HMT  +03:00  Hellas Mediterranean Time
MSK  +03:00  Moscow Time
IRT  +03:30  Iran Time
IT  +03:30  Iran Time
AMT  +04:00  Armenia Time (Yerevan)
AZT  +04:00  Azerbaijan Time
EAST  +04:00  Indian Antananarivo Savings Time
GET  +04:00  Georgia Time
MSD  +04:00  Moscow Summer Time
MUT  +04:00  Mauritius Island Time
RET  +04:00  Reunion Island Time
SCT  +04:00  Mahe Island Time
AFT  +04:30  Kabul Time
AMST  +05:00  Armenia Summer Time (Yerevan)
AZST  +05:00  Azerbaijan Summer Time
GEST  +05:00  Georgia Summer Time
IOT  +05:00  Indian Chagos Time
KGT  +05:00  Kyrgyzstan Time
MVT  +05:00  Maldives Island Time
PKT  +05:00  Pakistan Time
TFT  +05:00  Kerguelen Time
TJT  +05:00  Tajikistan Time
TMT  +05:00  Turkmenistan Time
UZT  +05:00  Uzbekistan Time
YEKT  +05:00  Yekaterinburg Time
NPT  +05:45  Nepal Standard Time
ALMT  +06:00  Almaty Time
BDT  +06:00  Dacca Time
BTT  +06:00  Bhutan Time
DUSST  +06:00  Dushanbe Summer Time
KGST  +06:00  Kyrgyzstan Summer Time
LKT  +06:00  Lanka Time
MAWT  +06:00  Mawson, Antarctica
NOVT  +06:00  Novosibirsk Standard Time
OMST  +06:00  Omsk Time
UZST  +06:00  Uzbekistan Summer Time
YEKST  +06:00  Yekaterinburg Summer Time
CCT  +06:30  Indian Cocos (Island) Time
MMT  +06:30  Myanmar Time
ALMST  +07:00  Almaty Savings Time
CVT  +07:00  Christmas Island Time (Indian Ocean)
CXT  +07:00  Christmas Island Time (Indian Ocean)
DAVT  +07:00  Davis Time (Antarctica)
ICT  +07:00  Indochina Time
JAVT  +07:00  Java Time
KRAST  +07:00  Krasnoyarsk Summer Time
NOVST  +07:00  Novosibirsk Summer Time
OMSST  +07:00  Omsk Summer Time
WAST  +07:00  West Australian Std Time
JT  +07:30  Java Time
AWST  +08:00  Western Australia
BNT  +08:00  Brunei Darussalam Time
BORT  +08:00  Borneo Time (Indonesia)
CCT  +08:00  China Coast Time
HKT  +08:00  Hong Kong Time
IRKT  +08:00  Irkutsk Time
KRAT  +08:00  Krasnoyarsk Standard Time
MYT  +08:00  Malaysia Time
PHT  +08:00  Philippine Time
ULAT  +08:00  Ulan Bator Time
WADT  +08:00  West Australian DST
WST  +08:00  West Australian Standard Time
MT  +08:30  Moluccas Time
AWSST  +09:00  Western Australia Time
IRKST  +09:00  Irkutsk Summer Time
JAYT  +09:00  Jayapura Time (Indonesia)
JST  +09:00  Japan Std Time, USSR Zone 8
KST  +09:00  Korea Standard Time

PWT  +09:00  Palau Time
ULAST  +09:00  Ulan Bator Summer Time
WDT  +09:00  West Australian DST
YAKT  +09:00  Yakutsk Time
ACST  +09:30  Central Australia
CAST  +09:30  Central Australian ST
SAST  +09:30  South Australian Std Time
SAT  +09:30  South Australian Std Time
AEST  +10:00  Australia Eastern Std Time
DDUT  +10:00  Dumont-d'Urville Time (Antarctica)
EAST  +10:00  East Australian Std Time
EST  +10:00  Australia Eastern Std Time
GST  +10:00  Guam Std Time, USSR Zone 9
KDT  +10:00  Korea Daylight Time
LIGT  +10:00  From Melbourne, Australia
MPT  +10:00  North Mariana Islands Time
PGT  +10:00  Papua New Guinea Time
TRUK  +10:00  Truk Time
VLAT  +10:00  Vladivostok Time
YAKST  +10:00  Yakutsk Summer Time
YAPT  +10:00  Yap Time (Micronesia)
ACSST  +10:30  Central Australia Time
CADT  +10:30  Central Australian DST
CST  +10:30  Australia Central Std Time
LHST  +10:30  Lord Howe Standard Time, Australia
SADT  +10:30  South Australian Daylight Time
AESST  +11:00  Eastern Australia
LHDT  +11:00  Lord Howe Daylight Time, Australia
MAGT  +11:00  Magadan Time
NCT  +11:00  New Caledonia Time
PONT  +11:00  Ponape Time (Micronesia)
VLAST  +11:00  Vladivostok Summer Time
VUT  +11:00  Vanuatu Time
ANAT  +12:00  Anadyr Time (Russia)
CKT  +12:00  Cook Islands Time
GILT  +12:00  Gilbert Islands Time
IDLE  +12:00  International Date Line, East
KOST  +12:00  Kosrae Time
MAGST  +12:00  Magadan Summer Time
MHT  +12:00  Kwajalein Time
NZST  +12:00  New Zealand Standard Time
NZT  +12:00  New Zealand Time
PETT  +12:00  Petropavlovsk-Kamchatski Time
TVT  +12:00  Tuvalu Time
WAKT  +12:00  Wake Time
WFT  +12:00  Wallis and Futuna Time
CHAST  +12:45  Chatham Island Time
ANAST  +13:00  Anadyr Summer Time (Russia)
NZDT  +13:00  New Zealand Daylight Time
PETST  +13:00  Petropavlovsk-Kamchatski Summer Time
PHOT  +13:00  Phoenix Islands (Kiribati) Time
TOT  +13:00  Tonga Time
CHADT  +13:45  Chatham Island Daylight Time
LINT  +14:00  Line Islands Time (Kiribati)

I mentioned earlier in this section that an INTERVAL value represents a time span. I also mentioned that an INTERVAL stores
some number of seconds. The syntax for an INTERVAL literal allows you to specify the number of seconds in a variety of
units.

The format of an INTERVAL value is

quantity unit [quantity unit ...][AGO]

The unit component specifies a number of seconds, as shown in Table 2.11. The quantity component acts as a multiplier
(and may be fractional). If you have multiple quantity unit groups, they are all added together. The optional phrase AGO
will cause the INTERVAL to be negative.

Table 2.11. INTERVAL Units

Description        Seconds        Unit Names
Microsecond [3]    .000001        us, usec, usecs, useconds, microsecon
Millisecond [3]    .001           ms, msecs, mseconds, millisecon
Second             1              s, sec, secs, second, seconds
Minute             60             m, min, mins, minute, minutes
Hour               3600           h, hr, hrs, hours
Day                86400          d, day, days
Week               604800         w, week, weeks
Month (30 days)    2592000        mon, mons, month, months
Year               31557600       y, yr, yrs, year, years
Decade             315576000      dec, decs, decade, decades
Century            3155760000     c, cent, century, centuries
Millennium         31557600000    mil, mils, millennia, millennium

[3]
Millisecond and microsecond can be used only in combination with another date/time component. For
example, CAST( '1 SECOND 5000 MSEC' AS INTERVAL ) results in an interval of six seconds.

You can use the EXTRACT( EPOCH FROM interval ) function to convert an INTERVAL into a number of seconds. A few sample
INTERVAL values are shown in Table 2.12. The Display column shows how PostgreSQL would format the Input Value for
display. The EPOCH column shows the value that would be returned by extracting the EPOCH from the Input Value.

Table 2.12. Sample INTERVAL Values

Input Value                Display               EPOCH
.5 minutes                 00:00:30              30
22 seconds 1 msec          00:00:22.00           22.001
22.001 seconds             00:00:22.00           22.001
10 centuries 2 decades     1020 years            32188752000
1 week 2 days 3.5 msec     9 days 00:00:00.00    777600.0035
Supported Operators

There are two types of operators that you can use with temporal values: arithmetic operators (addition and subtraction) and
comparison operators.

You can add an INT4, a TIME, or a TIMETZ to a DATE. When you add an INT4, you are adding a number of days. Adding a
TIME or TIMETZ to a DATE results in a TIMESTAMP. Table 2.13 lists the valid data type and operator combinations for temporal
data types. The last column in Table 2.13 shows the data type of the resulting value.

Table 2.13. Arithmetic Date/Time Operators

Data Types               Valid Operators (θ)    Result Type
DATE θ DATE              -                      INTEGER
DATE θ TIME              +                      TIMESTAMP
DATE θ TIMETZ            +                      TIMESTAMP WITH TIME ZONE
DATE θ INT4              + -                    DATE
TIME θ DATE              +                      TIMESTAMP
TIME θ INTERVAL          + -                    TIME
TIMETZ θ DATE            +                      TIMESTAMP WITH TIME ZONE
TIMETZ θ INTERVAL        + -                    TIMETZ
TIMESTAMP θ TIMESTAMP    -                      INTERVAL
TIMESTAMP θ INTERVAL     + -                    TIMESTAMP WITH TIME ZONE
INTERVAL θ TIME          +                      TIME WITHOUT TIME ZONE

Table 2.14. Arithmetic Date/Time Operator Examples

Example                                                                            Result
'23-JAN-2003'::DATE - '23-JAN-2002'::DATE                                          365
'23-JAN-2003'::DATE + '2:35 PM'::TIME                                              2003-01-23 14:35:00
'23-JAN-2003'::DATE + '2:35 PM GMT'::TIMETZ                                        2003-01-23 09:35:00-05
'23-JAN-2003'::DATE + 2::INT4                                                      2003-01-25
'2:35 PM'::TIME + '23-JAN-2003'::DATE                                              2003-01-23 14:35:00
'2:35 PM'::TIME + '2 hours 5 minutes'::INTERVAL                                    16:40:00
'2:35 PM EST'::TIMETZ + '23-JAN-2003'::DATE                                        2003-01-23 14:35:00-05
'2:35 PM EST'::TIMETZ + '2 hours 5 minutes'::INTERVAL                              16:40:00-05
'23-JAN-2003 2:35 PM EST'::TIMESTAMP - '23-JAN-2002 1:00 PM EST'::TIMESTAMP        365 days 01:35
'23-JAN-2003 2:35 PM EST'::TIMESTAMP + '3 days 2 hours 5 minutes'::INTERVAL        2003-01-26 16:40:00-05
'2 hours 5 minutes'::INTERVAL + '2:34 PM'::TIME                                    16:39:00

Table 2.14 shows how each of the arithmetic operators behaves when applied to date/time values.
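If it helps to cross-check a few rows of Table 2.14 outside the database, the same calendar arithmetic can be sketched with Python's standard datetime module. This is only an analogy: the values below ignore time zones entirely, and the variable names are made up for illustration.

```python
from datetime import date, datetime, time, timedelta

d1, d2 = date(2003, 1, 23), date(2002, 1, 23)

days_between = (d1 - d2).days                  # DATE - DATE gives a day count
two_days_later = d1 + timedelta(days=2)        # DATE + INT4 adds days
stamp = datetime.combine(d1, time(14, 35))     # DATE + TIME gives a timestamp

# TIMESTAMP - TIMESTAMP gives a span of time.
span = datetime(2003, 1, 23, 14, 35) - datetime(2002, 1, 23, 13, 0)

# TIMESTAMP + INTERVAL gives a later timestamp.
shifted = stamp + timedelta(days=3, hours=2, minutes=5)
```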

Using the temporal comparison operators, you can determine the relationship between two date/time values. For purposes of
comparison, an earlier date/time value is considered to be less than a later date/time value.

Table 2.15 shows how you can combine the various temporal types with comparison operators.

Table 2.15. Date/Time Comparison Operators

Data Types               Valid Operators (θ)
date θ date              < <= <> = >= >
time θ time              < <= <> = >= >
timetz θ timetz          < <= <> = >= >
timestamp θ timestamp    < <= <> = >= >


Boolean (Logical) Values

PostgreSQL supports a single Boolean (or logical) data type: BOOLEAN (BOOLEAN can be abbreviated as BOOL).

Size and Valid Values

A BOOLEAN can hold the values TRUE, FALSE, or NULL, and consumes a single byte of storage.

Syntax for Literal Values

Table 2.16 shows the alternate spellings for BOOLEAN literals.

Table 2.16. BOOLEAN Literal Syntax

Common Name Synonyms


TRUE true, 't', 'y', 'yes', 1

FALSE false, 'f', 'n', 'no', 0

Supported Operators

The only operators supported for the BOOLEAN data type are the logical operators shown in Table 2.17:

Table 2.17. Logical Operators for BOOLEAN

Data Types           Valid Operators (θ)
BOOLEAN θ BOOLEAN    AND OR NOT

I covered the AND, OR, and NOT operators in Chapter 1. For a complete definition of these operators, see Tables 1.3, 1.4, and
1.5.
Geometric Data Types

PostgreSQL supports six data types that represent two-dimensional geometric objects. The most basic geometric data type is the
POINT; as you might expect, a POINT represents a point within a two-dimensional plane.

A POINT is composed of an x-coordinate and a y-coordinate; each coordinate is a DOUBLE PRECISION number.

The LSEG data type represents a two-dimensional line segment. When you create an LSEG value, you specify two points: the starting
POINT and the ending POINT.

A BOX value is used to define a rectangle; the two points that define a box specify opposite corners.

A PATH is a collection of an arbitrary number of POINTs that are connected. A PATH can specify either a closed path or an open
path. In a closed path, the beginning and ending points are considered to be connected, and in an open path, the first and last
points are not connected. PostgreSQL provides two functions to force a PATH to be either open or closed: POPEN() and PCLOSE().
You can also specify whether a PATH is open or closed using special literal syntax (described later).

A POLYGON is similar to a closed PATH. The difference between the two types is in the supporting functions.

A center POINT and a (DOUBLE PRECISION) floating-point radius represent a CIRCLE.

Table 2.18 summarizes the geometric data types.

Table 2.18. Geometric Data Types

Type       Meaning                Defined By
POINT      2D point on a plane    x- and y-coordinates
LSEG       Line segment           Two points
BOX        Rectangle              Two points
PATH       Open or closed path    n points
POLYGON    Polygon                n points
CIRCLE     Circle                 Center point and radius

Syntax for Literal Values

When you enter a value for a geometric data type, keep in mind that you are working with a list of two-dimensional points (except in
the case of a CIRCLE, where you are working with a POINT and a radius).

A single POINT can be entered in either of the following two forms:

'( x, y )'
' x, y '

The LSEG and BOX types are constructed from a pair of POINTs. You can enter a pair of POINTs in any of the following formats:

'(( x1, y1 ), ( x2, y2 ))'
'( x1, y1 ), ( x2, y2 )'
'x1, y1, x2, y2'

The PATH and POLYGON types are constructed from a list of one or more POINTs. Any of the following forms is acceptable for a PATH
or POLYGON literal:

'(( x1, y1 ), ..., ( xn, yn ))'
'( x1, y1 ), ..., ( xn, yn )'
'( x1, y1, ..., xn, yn )'
'x1, y1, ..., xn, yn'

You can also use the syntax '[( x1, y1 ), ..., ( xn, yn )]' to enter a PATH literal. A PATH entered in this form is considered
to be an open PATH.

A CIRCLE is described by a central point and a floating-point radius. You can enter a CIRCLE in any of the following forms:

'< ( x, y ), r >'
'(( x, y ), r )'
'( x, y ), r'
'x, y, r'

Notice that the surrounding single quotes are required around all geometric literals; in other words, geometric literals are entered
as string literals. If you want to create a geometric value from individual components, you will have to use a geometric conversion
function. For example, if you want to create a POINT value from the results of some computation, you would use:

POINT( 4, 3*height )

The POINT( DOUBLE PRECISION x, DOUBLE PRECISION y ) function creates a POINT value from two DOUBLE PRECISION values.
There are similar functions that you can use to create any geometric type starting from individual components. Table 2.19 lists the
conversion functions for geometric types.

Table 2.19. Type Conversion Operators for the Geometric Data Types

Result Type    Meaning
POINT          POINT( DOUBLE PRECISION x, DOUBLE PRECISION y )
LSEG           LSEG( POINT p1, POINT p2 )
BOX            BOX( POINT p1, POINT p2 )
PATH           PATH( POLYGON poly )
POLYGON        POLYGON( PATH path )
               POLYGON( BOX b ) yields a 4-point polygon
               POLYGON( CIRCLE c ) yields a 12-point polygon
               POLYGON( INTEGER n, CIRCLE c ) yields an n-point polygon
CIRCLE         CIRCLE( BOX b )
               CIRCLE( POINT point, DOUBLE PRECISION radius )

Sizes and Valid Values

Table 2.20 lists the size of each geometric data type.

Table 2.20. Geometric Data Type Storage Requirements

Type       Size (in bytes)
POINT      16 (2 * sizeof DOUBLE PRECISION)
LSEG       32 (2 * sizeof POINT)
BOX        32 (2 * sizeof POINT)
PATH       4 + (32 * number of points) [4]
POLYGON    4 + (32 * number of points) [4]
CIRCLE     24 (sizeof POINT + sizeof DOUBLE PRECISION)

[4]
The size of a PATH or POLYGON is equal to 4 + (size of LSEG * number of segments).

Supported Operators

PostgreSQL features a large collection of operators that work with the geometric data types. I've divided the geometric operators
into two broad categories (transformation and proximity) to make it a little easier to talk about them.

Using the transformation operators, you can translate, rotate, and scale geometric objects. The + and - operators translate a
geometric object to a new location. Consider Figure 2.1, which shows a BOX defined as BOX( POINT( 3,5 ), POINT( 1,2 )).

Figure 2.1. BOX( POINT( 3,5 ), POINT( 1,2 )).

If you use the + operator to add the POINT( 2,1 ) to this BOX, you end up with the object shown in Figure 2.2.

Figure 2.2. Geometric translation.

You can see that the x-coordinate of the POINT is added to each of the x-coordinates in the BOX, and the y-coordinate of the POINT
is added to the y-coordinates in the BOX. The - operator works in a similar fashion: the x-coordinate of the POINT is subtracted
from the x-coordinates of the BOX, and the y-coordinate of the POINT is subtracted from each y-coordinate in the BOX.

Using the + and - operators, you can move a POINT, BOX, PATH, or CIRCLE to a new location. In each case, the x-coordinate in the
second operand (a POINT) is added to or subtracted from each x-coordinate in the first operand, and the y-coordinate in the second
operand is added to or subtracted from each y-coordinate in the first operand.

The multiplication and division operators (* and /) are used to scale and rotate. The multiplication and division operators treat the
operands as points in the complex plane. Let's look at some examples.

Figure 2.3 shows the result of multiplying BOX(POINT(3,2),POINT(1,1)) by POINT(2,0).

Figure 2.3. Point multiplication: scaling by a positive value.

You can see that each coordinate in the original box is multiplied by the x-coordinate of the point, resulting in
BOX(POINT(6,4),POINT(2,2)). If you had multiplied the box by POINT(0.5,0), you would have ended up with
BOX(POINT(1.5,1),POINT(0.5,0.5)). So the effect of multiplying an object by POINT(x,0) is that each coordinate in the object moves
away from the origin by a factor x. If x is negative, the coordinates move to the other side of the origin, as shown in Figure 2.4.

Figure 2.4. Point multiplication: scaling by a negative value.

The x-coordinate controls scaling. The y-coordinate controls rotation. When you multiply any given geometric object by
POINT(0,y), each point in the object is rotated around the origin. When y is equal to 1, each point is rotated counterclockwise by 90°
about the origin. When y is equal to -1, each point is rotated -90° about the origin (or 270°). When you rotate a point without
scaling, the length of the line segment drawn between the point and origin remains constant, as shown in Figure 2.5.

Figure 2.5. Point multiplication-rotation.

You can combine rotation and scaling into the same operation by specifying non-zero values for both the x- and y-coordinates. For
more information on using complex numbers to represent geometric points, see http://www.clarku.edu/~djoyce/complex.
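Because the transformation operators treat points as complex numbers, you can experiment with them using any language that has a complex type. The sketch below uses Python's built-in complex numbers to model translation (+) and scaling/rotation (*) on the corner points of a box. The function names translate and transform are hypothetical, and the sketch simply maps each corner; it does not reorder the corners the way a real BOX value might be normalized.

```python
def translate(points, offset):
    """Model the + operator: add the offset to every point."""
    return [p + offset for p in points]


def transform(points, factor):
    """Model the * operator: complex multiplication scales and rotates."""
    return [p * factor for p in points]


box = [complex(3, 2), complex(1, 1)]   # opposite corners of a box

moved = translate([complex(3, 5), complex(1, 2)], complex(2, 1))
doubled = transform(box, complex(2, 0))    # scale by 2
flipped = transform(box, complex(-1, 0))   # scale by -1: other side of the origin
rotated = transform(box, complex(0, 1))    # rotate 90 degrees counterclockwise
```

Note that abs(rotated[0]) equals abs(box[0]) (up to floating-point rounding): rotating without scaling leaves the distance from the origin unchanged.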

Table 2.21 shows the valid combinations for geometric types and geometric operators.

Table 2.21. Transformation Operators for the Geometric Types

Data Types        Valid Operators (θ)
POINT θ POINT     * + - /
BOX θ POINT       * + - /
PATH θ POINT      * + - /
CIRCLE θ POINT    * + - /

The proximity operators allow you to determine the spatial relationships between two geometric objects.

First, let's look at the three containment operators. The ~ operator evaluates to TRUE if the left operand contains the right operand.
The @ operator evaluates to TRUE if the left operand is contained within the right operand. The ~= operator returns TRUE if the left
operand is the same as the right operand; two geometric objects are considered identical if the points that define the objects are
identical (two circles are considered identical if the radii and center points are the same).

The next two operators are used to determine the distance between two geometric objects.

The ## operator returns the closest point between two objects. You can use the ## operator with the operand types shown
in Table 2.22.

Table 2.22. Closest-Point Operators

Operator             Description
LSEG a ## BOX b      Returns the point in BOX b that is closest to LSEG a
LSEG a ## LSEG b     Returns the point in LSEG b that is closest to LSEG a
POINT a ## BOX b     Returns the point in BOX b that is closest to POINT a
POINT a ## LSEG b    Returns the point in LSEG b that is closest to POINT a

The distance (<->) operator returns (as a DOUBLE PRECISION number) the distance between two geometric objects. You can use
the distance operator with the operand types in Table 2.23.

Table 2.23. Distance Operators

Operator                  Description (or Formula)
BOX a <-> BOX b           (@@ BOX a) <-> (@@ BOX b)
CIRCLE a <-> CIRCLE b     ((@@ CIRCLE a) <-> (@@ CIRCLE b)) - (radius a + radius b)
CIRCLE a <-> POLYGON b    0 if any point in POLYGON b is inside CIRCLE a; otherwise, distance between center of
                          CIRCLE a and closest point in POLYGON b
LSEG a <-> BOX b          (LSEG ## BOX) <-> (LSEG ## (LSEG ## BOX))
LSEG a <-> LSEG b         Distance between closest points (0 if LSEG a intersects LSEG b)
PATH a <-> PATH b         Distance between closest points
POINT a <-> BOX b         POINT a <-> (POINT a ## BOX b)
POINT a <-> CIRCLE b      POINT a <-> ((@@ CIRCLE b) - CIRCLE b radius)
POINT a <-> LSEG b        POINT a <-> (POINT a ## LSEG b)
POINT a <-> PATH b        Distance between POINT a and closest points
POINT a <-> POINT b       SQRT((POINT a.x - POINT b.x)^2 + (POINT a.y - POINT b.y)^2)
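The point-to-point formula is the ordinary Euclidean distance, and it is easy to try out. The Python sketch below implements it and, as a second illustration, a circle-to-circle distance computed as the distance between centers less the two radii. Both function names are hypothetical, and the circle calculation is an assumption of this sketch: it does not say what PostgreSQL returns when two circles overlap.

```python
import math


def point_distance(a, b):
    """Euclidean distance between two (x, y) tuples."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


def circle_distance(center_a, radius_a, center_b, radius_b):
    """Distance between centers, less both radii (no clamping for overlap)."""
    return point_distance(center_a, center_b) - (radius_a + radius_b)
```

For example, the points (0,0) and (3,4) are 5 units apart, and two circles of radius 1 and 2 whose centers are 10 units apart are separated by a gap of 7 units.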

Next, you can determine the spatial relationships between two objects using the left-of (<<), right-of (>>), below (<^), and above
(>^) operators.

There are three overlap operators. && evaluates to TRUE if the left operand overlaps the right operand. The &> operator evaluates to
TRUE if the leftmost point in the first operand is left of the rightmost point in the second operand. The &< evaluates to TRUE if the
rightmost point in the first operand is right of the leftmost point in the second operand.

The intersection operator (#) returns the intersecting points of two objects. You can find the intersection of two BOX es, or the
intersection of two LSEG s. The intersection of two BOX es evaluates to a BOX. The intersection of two LSEG s evaluates to a single
POINT.
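
To see why the intersection of two BOXes is itself a BOX, here is a rough Python sketch. It assumes each box is written as a pair of opposite corners, ((x1, y1), (x2, y2)); the function name box_intersection is hypothetical, and returning None for boxes that do not meet is my choice rather than documented server behavior.

```python
def box_intersection(a, b):
    """Return the overlapping region of two boxes, or None if they don't meet."""
    # Normalize each box so that we know its left/right and bottom/top edges.
    a_left, a_right = sorted((a[0][0], a[1][0]))
    a_bottom, a_top = sorted((a[0][1], a[1][1]))
    b_left, b_right = sorted((b[0][0], b[1][0]))
    b_bottom, b_top = sorted((b[0][1], b[1][1]))

    # The overlap starts at the larger of the low edges and ends at the
    # smaller of the high edges.
    left, right = max(a_left, b_left), min(a_right, b_right)
    bottom, top = max(a_bottom, b_bottom), min(a_top, b_top)
    if left > right or bottom > top:
        return None
    return ((left, bottom), (right, top))

print(box_intersection(((0, 0), (2, 2)), ((1, 1), (3, 3))))
```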

Finally, the ?# operator evaluates to TRUE if the first operand intersects with or overlaps the second operand.

The final set of geometric operators determines the relationship between a line segment and an axis, or the relationship between
two line segments.

The ?- operator evaluates to TRUE if the given line segment is horizontal (that is, parallel to the x-axis). The ?| operator evaluates
to TRUE if the given line segment is vertical (that is, parallel to the y-axis). When you use the ?- and ?| operators with a line
segment, they function as prefix unary operators. You can also use the ?- and ?| operators as infix binary operators (meaning that
the operator appears between two values), in which case they operate as if you specified two points on a line segment.

The ?-| operator evaluates to TRUE if the two operands are perpendicular. The ?|| operator evaluates to TRUE if the two operands
are parallel. The perpendicular and parallel operators can be used only with values of type LSEG.
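
The four tests just described boil down to simple arithmetic on the direction of each segment. The following Python sketch shows one way to express them; the function names are hypothetical, a segment is assumed to be a pair of (x, y) tuples, and the exact comparisons with zero are only safe for integer coordinates (floating-point data would need a tolerance).

```python
def direction(seg):
    """Return (dx, dy) for a segment given as ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = seg
    return (x2 - x1, y2 - y1)

def is_horizontal(seg):
    # No change in y means the segment is parallel to the x-axis.
    return direction(seg)[1] == 0

def is_vertical(seg):
    # No change in x means the segment is parallel to the y-axis.
    return direction(seg)[0] == 0

def is_perpendicular(s, t):
    # Two directions are perpendicular when their dot product is zero.
    (sx, sy), (tx, ty) = direction(s), direction(t)
    return sx * tx + sy * ty == 0

def is_parallel(s, t):
    # Two directions are parallel when their cross product is zero.
    (sx, sy), (tx, ty) = direction(s), direction(t)
    return sx * ty - sy * tx == 0
```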

The final geometric operator (@@) returns the center point of an LSEG, PATH, BOX, POLYGON, or CIRCLE.

Table 2.24 summarizes the proximity operators.


Table 2.24. Proximity Operators for the Geometric Types

Data Types            Valid Operators (•)

POINT • POINT         <-> << <^ >> >^ ?- ?| @
POINT • LSEG          ## <-> @
POINT • BOX           ## <-> @
POINT • PATH          <-> @
POINT • POLYGON       @
POINT • CIRCLE        <-> @
LSEG • LSEG           # ## < <-> <= <> = > >= ?# ?-| ?||
LSEG • BOX            ## <-> ?# @
BOX • POINT           * + - /
BOX • BOX             # && &< &> < <-> << <= <^ = > >= >> >^ ?# @ ~ ~=
PATH • POINT          * + - / ~
PATH • PATH           + < <-> <= = > >= ?#
POLYGON • POINT       ~
POLYGON • POLYGON     && &< &> <-> >> << @ ~ ~=
CIRCLE • POINT        * + - / ~
CIRCLE • POLYGON      <->
CIRCLE • CIRCLE       && &< &> < <-> << <= <> <^ = > >= >> >^ @ ~ ~=

Table 2.25 summarizes the names of the proximity operators for geometric types.

Table 2.25. Geometric Proximity Operator Names

Operator    Meaning

#           Intersection or point count (for polygons)
##          Point of closest proximity
<->         Distance between
<<          Left of?
>>          Right of?
<^          Below?
>^          Above?
&&          Overlaps
&>          Overlaps to left
&<          Overlaps to right
?#          Intersects or overlaps
@           Contained in
~           Contains
~=          Same as
?-          Horizontal
?|          Vertical
?-|         Perpendicular
?||         Parallel
@@          Center

Object IDs (OID)

An OID is a 32-bit, positive whole number. Every row[5] in a PostgreSQL database contains a unique identifier[6]: the object ID (or OID).
Normally, the OID column is hidden. You can see the OID for a row by including the OID column in the target list of a SELECT statement:

[5] By default, all tables are created such that every row contains an OID. You can omit the object IDs using the WITHOUT
OIDS clause of the CREATE TABLE command.

[6] The PostgreSQL documentation warns that object IDs are currently unique within a database cluster; but in a future
release, an OID may be unique only within a single table.


movies=# SELECT OID, * FROM customers;

oid | customer_id | customer_name | phone | birth_date | balance
-------+-------------+----------------------+----------+------------+---------
33876 | 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
33877 | 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
33878 | 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
33879 | 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
33889 | 5 | Funkmaster, Freddy | 555-FUNK | | 0.00
33890 | 7 | Gull, Jonathon LC | 555-1111 | 1984-02-05 | 0.00
33891 | 8 | Grumby, Jonas | 555-2222 | 1984-02-21 | 0.00

You can create a column of type OID if you want to explicitly refer to another object (usually a row in another table). Think back to the
rentals table that you developed in Chapter 1. Each row in the rentals table contains a tape_id, a customer_id, and a rental date.
The rentals table currently looks like this:

movies=# \d rentals
Table "public.rentals"
Attribute | Type | Modifier
-------------+--------------+----------
tape_id | character(8) | not null
rental_date | date | not null
customer_id | integer | not null

movies=# SELECT * FROM rentals;

tape_id | rental_date | customer_id
----------+-------------+-------------
AB-12345 | 2001-11-25 | 1
AB-67472 | 2001-11-25 | 3
OW-41221 | 2001-11-25 | 1
MC-68873 | 2001-11-20 | 3
KJ-03335 | 2001-11-26 | 8
(5 rows)

Each value in the tape_id column refers to a row in the tapes table. Each value in the customer_id column refers to a row in the
customers table. Rather than storing the tape_id and customer_id in the rentals table, you could store OIDs for the corresponding rows.
The following CREATE TABLE ... AS command creates a new table, rentals2, that is equivalent to the original rentals table:

movies=# CREATE TABLE rentals2 AS
movies-# SELECT
movies-# t.oid AS tape_oid, c.oid AS customer_oid, r.rental_date
movies-# FROM
movies-# tapes t, customers c, rentals r
movies-# WHERE
movies-# t.tape_id = r.tape_id
movies-# AND
movies-# c.customer_id = r.customer_id;

This statement (conceptually) works as follows. First, you retrieve a row from the rentals table. Next, you use the rentals.customer_id
column to retrieve the matching customers row and the rentals.tape_id column to retrieve the matching tapes row. Finally, you store
the OID of the customers row and the OID of the tapes row (and the rental_date) in a new rentals2 row.

Now, when you SELECT from the rentals2 table, you will see the object IDs for the customers row and the tapes row:

movies=# SELECT * FROM rentals2;

tape_oid | customer_oid | rental_date
----------+--------------+-------------
38337 | 38333 | 2001-11-25
38338 | 38335 | 2001-11-25
38394 | 38393 | 2001-11-26
38339 | 38335 | 2001-11-20
38340 | 38333 | 2001-11-25

You can re-create the data in the original table by joining the corresponding customers and tapes records, based on their respective OIDs:

movies=# SELECT t.tape_id, r.rental_date, c.customer_id
movies-# FROM
movies-# tapes t, rentals2 r, customers c
movies-# WHERE
movies-# t.oid = r.tape_oid AND
movies-# c.oid = r.customer_oid
movies-# ORDER BY t.tape_id;

tape_id | rental_date | customer_id
----------+-------------+-------------
AB-12345 | 2001-11-25 | 1
AB-67472 | 2001-11-25 | 3
KJ-03335 | 2001-11-26 | 8
MC-68873 | 2001-11-20 | 3
OW-41221 | 2001-11-25 | 1
(5 rows)

Here are a couple of warnings about using OIDs in your own tables.

The first concern has to do with backups. The standard tool for performing a backup of a PostgreSQL database is pg_dump. By default,
pg_dump will not archive OIDs. This means that if you back up a table that contains an OID column (referring to another object) and then
restore that table from the archive, the relationships between objects will be lost, unless you remembered to tell pg_dump to archive OIDs.
This happens because when you restore a row from the archive, it might be assigned a different OID.

The second thing you should consider when using OIDs is that they offer no real performance advantages. If you are coming from an
Oracle or Sybase environment, you might be thinking that an OID sounds an awful lot like a ROWID. It's true that an OID and a ROWID
provide a unique identifier for a row, but that is where the similarity ends. In an Oracle environment, you can use a ROWID as the fastest
possible way to get to a specific row. A ROWID encodes the location (on disk) of the row that it belongs to; when you retrieve a row by
ROWID, you can bypass any index[7] searches and go straight to the data. An OID is just a 32-bit number; you can create an index on the OID
column, but you could also create an index on any other (unique) column to achieve the same results. In fact, the only time that it might
make sense to use an OID to identify a row is when the primary key[7] for a table is very long.

[7] Don't be too concerned if you aren't familiar with the concept of indexes or primary keys. I'll cover each of those topics a
bit later.

Finally, I should point out that OIDs can wrap. In an active database cluster, it's certainly possible that 4 billion objects can be created.
That doesn't mean that all 4 billion objects have to exist at the same time, just that 4 billion OIDs have been created since the cluster was
created. When the OID generator wraps, you end up with duplicate values. This may sound a little far-fetched, but it does happen and it is
not easy to recover from. There really is no good reason to use an OID as a primary key; use SERIAL (or BIGSERIAL) instead.
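
If wrapping is hard to picture, the following Python sketch models a bare 32-bit unsigned counter. It is an illustration only: the names OID_LIMIT and next_oid are hypothetical, and I am assuming nothing about how the server treats reserved or invalid values when it wraps.

```python
OID_LIMIT = 2 ** 32  # number of distinct values in an unsigned 32-bit counter

def next_oid(current):
    """Advance the counter by one, starting over once the range is used up."""
    return (current + 1) % OID_LIMIT

# Start three steps below the top of the range and hand out five values.
oid = OID_LIMIT - 3
issued = []
for _ in range(5):
    oid = next_oid(oid)
    issued.append(oid)

# The last values are small numbers again, the same numbers that were
# handed out when the counter was new.
print(issued)
```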

Syntax for Literal Values

The format in which you enter literal OID values is the same that you would use for unsigned INTEGER values. An OID literal is simply a
sequence of decimal digits.

Size and Valid Values

As I mentioned earlier, an OID is an unsigned 32-bit (4-byte) integer. An OID column can hold values between 0 and 4294967295. The
value 0 represents an invalid OID.

Supported Operators

You can compare two OID values, and you can compare an OID value against an INTEGER value. Table 2.26 shows which operators you can
use with the OID data type.

Table 2.26. OID Operators

Data Types     Valid Operators

OID • OID      < <= <> = >= >
OID • INT4     < <= <> = >= >
INT4 • OID     < <= <> = >= >


BLOBs

Most database systems provide a data type that can store raw data, and PostgreSQL is no exception. I use the term raw data
to mean that the database doesn't understand the structure or meaning of a value. In contrast, PostgreSQL does understand
the structure and meaning of other data types. For example, when you define an INTEGER column, PostgreSQL knows that
the bytes of data that you place into that column are supposed to represent an integer value. PostgreSQL knows what an
integer is: it can add integers, multiply them, convert them to and from string form, and so on. Raw data, on the other hand,
is just a collection of bits; PostgreSQL can't infer any meaning in the data.

PostgreSQL offers the type BYTEA for storing raw data. A BYTEA column can theoretically hold values of any length, but it
appears that the maximum length is 1GB.

The size of a BYTEA value is 4 bytes plus the actual number of bytes in the value.

Syntax for Literal Values

Entering a BYTEA value can be a little tricky. A BYTEA literal is entered as a string literal: It is just a string of characters
enclosed within single quotes. Given that, how do you enter a BYTEA value that includes a single quote? If you look back to
the discussion of string literal values (earlier in this chapter), you'll see that you can include special characters in a string
value by escaping them. In particular, a single quote can be escaped in one of three ways:

• Double up the single quotes ('This is a single quote''')

• Precede the single quote with a backslash ('This is a single quote \'')

• Include the octal value of the character instead ('This is a single quote \047')

There are two other characters that you must escape when entering BYTEA literals. A byte whose value is zero (not the
character 0, but the null byte) must be escaped, and the backslash character must be escaped. You can escape any character
using the "\\ddd" form (where ddd is an octal number). You can escape any printable character using the "\\c" form. So, if
you want to store a BYTEA value that includes a zero byte, you could enter it like this:

'This is a zero byte \\000'

If you want to store a BYTEA value that includes a backslash, you can enter it in either of the following forms:

'This is a backslash \\\\'
'This is also a backslash \\134'

If you compare these rules to the rules for quoting string literals, you'll notice that BYTEA literals require twice as many
backslash characters. This is a quirk of the design of the PostgreSQL parser. BYTEA literals are processed by two different
parsers. The main PostgreSQL parser sees a BYTEA literal as a string literal (gobbling up the first set of backslash characters).
Then, the BYTEA parser processes the result, gobbling up the second set of backslash characters.

So, if you have a BYTEA value such as This is a backslash \, you quote it as 'This is a backslash \\\\'. After the
string parser processes this string, it has been turned into 'This is a backslash \\'. The BYTEA parser finally transforms
this into This is a backslash \.
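
The two-pass behavior is easier to believe once you watch it happen. The Python sketch below is a simplified model, not the server's parser: it assumes both passes follow the same two rules (a backslash plus three octal digits becomes that byte, and a backslash plus any other character becomes that character), and the names strip_one_level and bytea_value are hypothetical.

```python
import re

# One escape sequence: a backslash followed either by three octal digits
# or by any single character.
_ESCAPE = re.compile(r"\\([0-3][0-7]{2}|.)", re.DOTALL)

def strip_one_level(text):
    """Model one parser pass: consume one layer of backslash escapes."""
    def replace(match):
        body = match.group(1)
        if len(body) == 3:
            # Three octal digits name a character by its numeric value.
            return chr(int(body, 8))
        # Otherwise the escaped character stands for itself.
        return body
    return _ESCAPE.sub(replace, text)

def bytea_value(typed):
    """Run the text between the quotes through both passes."""
    after_string_parser = strip_one_level(typed)
    after_bytea_parser = strip_one_level(after_string_parser)
    return after_bytea_parser.encode("latin-1")

# Four typed backslashes survive as a single backslash byte.
typed = "This is a backslash " + "\\" * 4
print(strip_one_level(typed))
print(bytea_value(typed))
```

Feeding the zero-byte example through bytea_value() shows the same halving: the two typed backslashes become one after the first pass, and the second pass turns the remaining \000 into a single null byte.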

Supported Operators

PostgreSQL offers a single BYTEA operator: concatenation. You can append one BYTEA value to another BYTEA value using the
concatenation (||) operator.

Note that you can't compare two BYTEA values, even for equality/inequality. You can, of course, convert a BYTEA value into
another value using the CAST() operator, and that opens up other operators.

Large-Objects

The BYTEA data type is currently limited to storing values no larger than 1GB. If you need to store values larger than will fit
into a BYTEA column, you can use large-objects. A large-object is a value stored outside of a table. For example, if you want
to store a photograph with each row in your tapes table, you would add an OID column to hold a reference to the
corresponding large-object:

movies=# ALTER TABLE tapes ADD COLUMN photo_id OID;
ALTER

Each value in the photo_id column refers to an entry in the pg_largeobject system table. PostgreSQL provides a function
that will load an external file (such as a JPEG file) into the pg_largeobject table:

movies=# INSERT INTO tapes VALUES
movies-# (
movies(# 'AA-55892',
movies(# 'Casablanca',
movies(# '102 min',
movies(# lo_import('/tmp/casablanca.jpg' )
movies(# );

The lo_import() function loads the named file into pg_largeobject and returns an OID value that refers to the large-object.
Now when you SELECT this row, you see the OID, not the actual bits that make up the photo:

movies=# SELECT * FROM tapes WHERE tape_id = 'AA-55892';

tape_id | title | duration | photo_id
----------+------------+----------+----------
AA-55892 | Casablanca | 01:42:00 | 510699

If you want to write the photo back into a file, you can use the lo_export() function:

movies=# SELECT lo_export( 510699, '/tmp/Casablanca.jpg' );

lo_export
-----------
1
(1 row)

To see all large-objects in the current database, use psql's \lo_list metacommand:

movies=# \lo_list
Large objects
ID | Description
--------+-------------
510699 |
(1 row)

You can remove large-objects from your database using the lo_unlink() function:

movies=# SELECT lo_unlink( 510699 );

lo_unlink
-----------
1
(1 row)

movies=# \lo_list
Large objects
ID | Description
----+-------------
(0 rows)

How do you get to the actual bits behind the reference OID? You can't, at least not with psql. Large-object support must be
built into the client application that you are using. psql is a text-oriented tool and has no way to display a photograph, so the
best that you can do is to look at the raw data in the pg_largeobject table. A few client applications, such as the Conjectrix
Workstation, do support large-objects and can interpret the raw data properly, in most cases.

Network Address Data Types

PostgreSQL supports three data types that are designed to hold network addresses, both IP[8] (logical) and MAC[9] (physical)
addresses. I don't think there are many applications that require the storage of an IP or MAC address, so I won't spend too much
time describing them. The PostgreSQL User's Guide contains all the details that you might need to know regarding network data
types.

[8] IP stands for Internet Protocol, the substrate of the Internet.

[9] The acronym MAC stands for one or more of the following: Machine Address Code, Media Access Control, or
Macaroni And Cheese.

MACADDR

The MACADDR type is designed to hold a MAC address. A MAC address is a hardware address, usually the address of an ethernet
interface.

CIDR

The CIDR data type is designed to hold an IP network address. A CIDR value contains an IP network address and an optional
netmask (the netmask determines the number of meaningful bits in the network address).

INET

An INET value can hold the IP address of a network or of a network host. An INET value contains a network address and an
optional netmask. If the netmask is omitted, it is assumed that the address identifies a single host (in other words, there is no
discernible network component in the address).

Note that an INET value can represent a network or a host, but a CIDR is designed to represent the address of a network.

Syntax for Literal Values

The syntax required for literal network values is shown in Table 2.27.

Table 2.27. Literal Syntax for Network Types

Type       Syntax                 Examples

INET       a.b.c.d[/e]            192.168.0.1
                                  192.168.150.0/26
                                  130.155.16.1/20
CIDR       a[.b[.c[.d]]][/e]      192.168.0.0/16
                                  192.168/16
MACADDR    xxxxxx:xxxxxx          0004E2:3695C0
           xxxxxx-xxxxxx          0004E2-3695C0
           xxxx.xxxx.xxxx         0004.E236.95C0
           xx-xx-xx-xx-xx-xx      00-04-E2-36-95-C0
           xx:xx:xx:xx:xx:xx      00:04:E2:36:95:C0
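
All five MACADDR layouts carry the same twelve hexadecimal digits; only the separators differ. The Python sketch below shows one way to reduce them to a single form. The function name normalize_macaddr is hypothetical, and this sketch is more forgiving than a real parser would be (for example, it accepts mixed separators).

```python
import re

def normalize_macaddr(text):
    """Rewrite any of the five layouts as lowercase xx:xx:xx:xx:xx:xx."""
    # Throw away the separators; what remains should be 12 hex digits.
    digits = re.sub(r"[:.\-]", "", text)
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", digits):
        raise ValueError("not a MAC address: " + repr(text))
    digits = digits.lower()
    # Regroup the digits two at a time.
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))

for literal in ("0004E2:3695C0", "0004E2-3695C0", "0004.E236.95C0",
                "00-04-E2-36-95-C0", "00:04:E2:36:95:C0"):
    print(literal, "->", normalize_macaddr(literal))
```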

Starting with version 7.4, you can also store IPv6 (colon-separated) addresses in an INET or CIDR value.

An INET or CIDR value consumes either 12 bytes or 24 bytes of storage (depending on the number of bits in the address). A
MACADDR value consumes 6 bytes of storage.

Supported Operators

PostgreSQL provides comparison operators that you can use to compare two INET values, two CIDR values, or two MACADDR
values. The comparison operators work by first checking the common bits in the network components of the two addresses; then,
if those are equal, the address with the greatest number of netmask bits is considered the largest value. If the number of bits in
the netmask is equal (and the network components of the addresses are equal), then the entire address is compared. The net
effect (pun intended) is that 192.168.0.22/24 is considered greater than 192.168.0.22/20.
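
Those three steps can be traced in a short Python sketch built on the standard ipaddress module. This is my reading of the rule as stated above, not the server's source code; the function name compare_inet is hypothetical, and the sketch assumes both arguments belong to the same address family.

```python
import ipaddress

def compare_inet(a, b):
    """Return -1, 0, or 1 by following the three comparison steps."""
    ia = ipaddress.ip_interface(a)
    ib = ipaddress.ip_interface(b)
    bits_a = ia.network.prefixlen
    bits_b = ib.network.prefixlen

    # Step 1: compare only the leading bits that both netmasks cover.
    shift = ia.max_prefixlen - min(bits_a, bits_b)
    net_a = int(ia.ip) >> shift
    net_b = int(ib.ip) >> shift
    if net_a != net_b:
        return -1 if net_a < net_b else 1

    # Step 2: the value with more netmask bits is the larger one.
    if bits_a != bits_b:
        return -1 if bits_a < bits_b else 1

    # Step 3: everything else is equal, so compare the whole address.
    addr_a, addr_b = int(ia.ip), int(ib.ip)
    return (addr_a > addr_b) - (addr_a < addr_b)

print(compare_inet("192.168.0.22/24", "192.168.0.22/20"))
```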

When you are working with two INET (or CIDR) values, you can also check for containership. Table 2.28 describes the network
address operators.

Table 2.28. Network Address Operators

Operator                     Meaning

INET1 < INET2                True if operand 1 is less than operand 2
CIDR1 < CIDR2
MACADDR1 < MACADDR2

INET1 <= INET2               True if operand 1 is less than or equal to operand 2
CIDR1 <= CIDR2
MACADDR1 <= MACADDR2

INET1 <> INET2               True if operand 1 is not equal to operand 2
CIDR1 <> CIDR2
MACADDR1 <> MACADDR2

INET1 = INET2                True if operand 1 is equal to operand 2
CIDR1 = CIDR2
MACADDR1 = MACADDR2

INET1 >= INET2               True if operand 1 is greater than or equal to operand 2
CIDR1 >= CIDR2
MACADDR1 >= MACADDR2

INET1 > INET2                True if operand 1 is greater than operand 2
CIDR1 > CIDR2
MACADDR1 > MACADDR2

INET1 << INET2               True if operand 1 is contained within operand 2
CIDR1 << CIDR2

INET1 <<= INET2              True if operand 1 is contained within operand 2
CIDR1 <<= CIDR2              or if operand 1 is equal to operand 2

INET1 >> INET2               True if operand 1 contains operand 2
CIDR1 >> CIDR2

INET1 >>= INET2              True if operand 1 contains operand 2 or if
CIDR1 >>= CIDR2              operand 1 is equal to operand 2
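
The containership tests in Table 2.28 have close cousins in Python's standard ipaddress module, which makes for a handy way to experiment. The sketch below is an approximation: the function names are hypothetical, and I am assuming that any host bits to the right of the netmask can be ignored when testing containment.

```python
import ipaddress

def contained_within(a, b):
    """Rough model of a << b: a lies inside b and is not the same network."""
    net_a = ipaddress.ip_network(a, strict=False)
    net_b = ipaddress.ip_network(b, strict=False)
    return net_a != net_b and net_a.subnet_of(net_b)

def contained_within_or_equal(a, b):
    """Rough model of a <<= b: a lies inside b, or the two are equal."""
    net_a = ipaddress.ip_network(a, strict=False)
    net_b = ipaddress.ip_network(b, strict=False)
    return net_a.subnet_of(net_b)

print(contained_within("192.168.1.5", "192.168.1.0/24"))
print(contained_within("192.168.1.0/24", "192.168.1.0/24"))
print(contained_within_or_equal("192.168.1.0/24", "192.168.1.0/24"))
```

The >> and >>= operators are the same tests with the operands swapped.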

Sequences

One problem that you will most likely encounter in your database life is the need to generate unique identifiers. We've already seen
one example of this in the customers table: the customer_id column is nothing more than a unique identifier. Sometimes, an
entity that you want to store in your database will have a naturally unique identifier. For example, if you are designing a database
to track employee information (in the United States), a Social Security number might make a good identifier. Of course, if you
employ people who are not U.S. citizens, the Social Security number scheme will fail. If you are tracking information about
automobiles, you might be tempted to use the license plate number as a unique identifier. That would work fine until you needed to
track autos in more than one state. The VIN (or Vehicle Identification Number) is a naturally unique identifier.

Quite often, you will need to store information about an entity that has no naturally unique ID. In those cases, you are likely to
simply assign a unique number to each entity. After you have decided to create a uniquifier[10], the next problem is coming up with
a sequence of unique numbers.

[10] I'm not sure that "uniquifier" is a real word, but I've used it for quite some time and it sure is a lot easier to say
than "disambiguator."

PostgreSQL offers help in the form of a SEQUENCE. A SEQUENCE is an object that automatically generates sequence numbers. You
can create as many SEQUENCE objects as you like: Each SEQUENCE has a unique name.

Let's create a new SEQUENCE that you can use to generate unique identifiers for rows in your customers table. You already have a
few customers, so start the sequence numbers at 10:

movies=# CREATE SEQUENCE customer_id_seq START 10;
CREATE

The "\ds" command (in psql) shows you a list of the SEQUENCE objects in your database:

movies=# \ds
List of relations
Name | Type | Owner
-----------------+----------+------
customer_id_seq | sequence | korry
(1 row)

Now, let's try using this SEQUENCE. PostgreSQL provides a number of functions that you can call to make use of a SEQUENCE. The
one that you are most interested in at the moment is the nextval() function. When you call the nextval() function, you provide
(in the form of a string) the name of the SEQUENCE as the only argument.

For example, when you INSERT a new row in the customers table, you want PostgreSQL to automatically assign a unique
customer_id:

movies=# INSERT INTO
movies-# customers( customer_id, customer_name )
movies-# VALUES
movies-# (
movies-# nextval( 'customer_id_seq' ), 'John Gomez'
movies-# );

movies=# SELECT * FROM customers WHERE customer_name = 'John Gomez';

customer_id | customer_name | phone | birth_date | balance
-------------+---------------+-------+------------+--------
10 | John Gomez | | |
(1 row)

You can see that the SEQUENCE (customer_id_seq) generated a new customer_id, starting with the value that you requested. You
can use the currval() function to find the value that was just generated by your server process:

movies=# SELECT currval( 'customer_id_seq' );

currval
---------
10

The complete syntax for the CREATE SEQUENCE command is

CREATE SEQUENCE name
[ INCREMENT increment ]
[ MINVALUE min ]
[ MAXVALUE max ]
[ START start_value ]
[ CACHE cache_count ]
[ CYCLE ]

Notice that the only required item is the name.

The INCREMENT attribute determines the amount added to generate a new sequence number. This value can be positive or
negative, but not zero. Positive values cause the sequence numbers to increase in value as they are generated (that is, 0, 1, 2, and
so on). Negative values cause the sequence numbers to decrease in value (that is, 3, 2, 1, 0, and so on).

The MINVALUE and MAXVALUE attributes control the minimum and maximum values (respectively) for the SEQUENCE.

What happens when a SEQUENCE has reached the end of its valid range? You get to decide: If you include the CYCLE attribute, the
SEQUENCE will wrap around. For example, if you create a cyclical SEQUENCE with MINVALUE 0 and MAXVALUE 3, you will retrieve the
following sequence numbers: 0, 1, 2, 3, 0, 1, 2, 3, .... If you don't include the CYCLE attribute, you will see: 0, 1, 2, 3, error:
reached MAXVALUE.

The START attribute determines the first sequence number generated by a SEQUENCE. The value for the START attribute must be
within the MINVALUE and MAXVALUE range.

The default values for most of the SEQUENCE attributes depend on whether the INCREMENT is positive or negative. The default value
for the INCREMENT attribute is 1. If you specify a negative INCREMENT, the MINVALUE defaults to -2147483647, and MAXVALUE
defaults to -1. If you specify a positive INCREMENT, MINVALUE defaults to 1, and MAXVALUE defaults to 2147483647. The default
value for the START attribute is also dependent on the sign of the INCREMENT. A positive INCREMENT defaults the START value to the
MINVALUE attribute. A negative INCREMENT defaults the START value to the MAXVALUE attribute.

Remember, these are the defaults; you can choose any meaningful combination of values that you like (within the valid range of a
BIGINT).

The default SEQUENCE attributes are summarized in Table 2.29.

Table 2.29. Sequence Attributes

Attribute Name    Default Value

INCREMENT         1
MINVALUE          INCREMENT > 0 ? 1
                  INCREMENT < 0 ? -2147483647
MAXVALUE          INCREMENT > 0 ? 2147483647
                  INCREMENT < 0 ? -1
START             INCREMENT > 0 ? MINVALUE
                  INCREMENT < 0 ? MAXVALUE
CACHE             1
CYCLE             False
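
If the conditional notation in Table 2.29 is hard to read, the same rules can be written as a small Python function. This is just a restatement of the table; the function name sequence_defaults and the dictionary layout are my own invention.

```python
def sequence_defaults(increment=1):
    """Return the default attributes for a given INCREMENT, per Table 2.29."""
    if increment == 0:
        raise ValueError("INCREMENT can be positive or negative, but not zero")
    if increment > 0:
        minvalue, maxvalue = 1, 2147483647
        start = minvalue   # ascending sequences start at the bottom
    else:
        minvalue, maxvalue = -2147483647, -1
        start = maxvalue   # descending sequences start at the top
    return {
        "INCREMENT": increment,
        "MINVALUE": minvalue,
        "MAXVALUE": maxvalue,
        "START": start,
        "CACHE": 1,
        "CYCLE": False,
    }

print(sequence_defaults())
print(sequence_defaults(-1))
```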

The CACHE attribute is a performance-tuning parameter; it determines how many sequence numbers are generated and held in
memory. In most cases, you can simply use the default value (1). If you suspect that sequence number generation is a bottleneck
in your application, you might consider increasing the CACHE attribute, but be sure to read the warning in the PostgreSQL
documentation (see the CREATE SEQUENCE section).

You can view the attributes of a SEQUENCE by treating it as a table and selecting from it[11]:

[11] There are four other columns in a SEQUENCE, but they hold bookkeeping information required to properly
maintain the SEQUENCE.

movies=# SELECT
movies-# increment_by, max_value, min_value, cache_value, is_cycled
movies-# FROM
movies-# customer_id_seq;

increment_by | max_value | min_value | cache_value | is_cycled
--------------+-----------+-----------+-------------+-----------
1 | 3 | 0 | 1 | f

PostgreSQL provides three functions that work with SEQUENCEs. I described the nextval() and currval() functions earlier;
nextval() generates (and returns) a new value from a SEQUENCE, and currval() retrieves the most recently generated value. You
can reset a SEQUENCE to any value between MINVALUE and MAXVALUE by calling the setval() function. For example:

movies=# SELECT nextval( 'customer_id_seq' );
ERROR: customer_id_seq.nextval: reached MAXVALUE (3)

movies=# SELECT setval( 'customer_id_seq', 0 );

setval
--------
0
(1 row)

movies=# SELECT nextval( 'customer_id_seq' );

nextval
---------
1
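
To tie nextval(), currval(), setval(), and CYCLE together, here is a toy model of a SEQUENCE in Python. It is a sketch for experimenting with the rules described in this section, not a description of how the server is built: the class name Sequence and its internals are hypothetical, and CACHE is left out entirely.

```python
class Sequence:
    """Toy SEQUENCE with INCREMENT, MINVALUE, MAXVALUE, START, and CYCLE."""

    def __init__(self, minvalue, maxvalue, increment=1, start=None, cycle=False):
        if increment == 0:
            raise ValueError("INCREMENT must not be zero")
        self.minvalue, self.maxvalue = minvalue, maxvalue
        self.increment, self.cycle = increment, cycle
        if start is None:
            # Ascending sequences start at MINVALUE, descending at MAXVALUE.
            start = minvalue if increment > 0 else maxvalue
        self._next = start
        self._last = None  # nothing has been generated yet

    def nextval(self):
        value = self._next
        if value > self.maxvalue or value < self.minvalue:
            if not self.cycle:
                raise OverflowError("reached the end of the valid range")
            # CYCLE: wrap around to the opposite end of the range.
            value = self.minvalue if self.increment > 0 else self.maxvalue
        self._last = value
        self._next = value + self.increment
        return value

    def currval(self):
        if self._last is None:
            raise RuntimeError("nextval() has not been called yet")
        return self._last

    def setval(self, value):
        if not self.minvalue <= value <= self.maxvalue:
            raise ValueError("value is outside MINVALUE..MAXVALUE")
        self._last = value
        self._next = value + self.increment
        return value

cyclical = Sequence(minvalue=0, maxvalue=3, cycle=True)
print([cyclical.nextval() for _ in range(8)])
```

As in the psql session above, calling setval() with 0 on this model makes the following nextval() return 1.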

Now that you know how SEQUENCEs work in PostgreSQL, let's revisit the SERIAL data type. I mentioned earlier in this chapter that
a SERIAL is really implemented as a SEQUENCE (see the "SERIAL, BIGSERIAL, and SEQUENCES" sidebar). Remember that a SERIAL
provides an automatically increasing (or decreasing) unique identifier. That sounds just like a SEQUENCE, so what's the difference? A
SEQUENCE is a standalone object, whereas SERIAL is a data type that you can assign to a column.

Let's create a new table that contains a SERIAL column:

movies=# CREATE TABLE serial_test ( pkey SERIAL, payload INTEGER );

NOTICE: CREATE TABLE will create implicit
        sequence 'serial_test_pkey_seq' for
        SERIAL column 'serial_test.pkey'
NOTICE: CREATE TABLE/UNIQUE will create implicit
        index 'serial_test_pkey_key' for table 'serial_test'
CREATE

The CREATE TABLE command is normally silent. When you create a table with a SERIAL column, PostgreSQL does a little extra work
on your behalf. First, PostgreSQL creates a SEQUENCE for you. The name of the SEQUENCE is based on the name of the table and the
name of the column. In this case, the SEQUENCE is named serial_test_pkey_seq. Next, PostgreSQL creates a unique index. We
haven't really talked about indexes yet: for now, know that a unique index on the pkey column ensures that you have no duplicate
values in that column. PostgreSQL performs one more nicety for you when you create a SERIAL column. The \d command (in psql)
shows you this last step:

movies=# \d serial_test
Table "public.serial_test"
Attribute | Type | Modifier
-----------+---------+-------------------------------------------------
pkey | integer | not null default nextval('serial_test_pkey_seq')
payload | integer |
Index: serial_test_pkey_key

PostgreSQL has created a default value for the pkey column. A column's default value is used whenever you insert a row but omit a value for that column. For example, if you execute the command INSERT INTO serial_test( payload ) VALUES( 24307 );, you have not provided an explicit value for the pkey column. In this case, PostgreSQL evaluates the default value for pkey and inserts the resulting value. Because the default value for pkey is a call to the nextval() function, each new row is assigned a new (unique) sequence number.
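
As a minimal sketch of that default in action (assuming the serial_test table created above and a fresh psql session), you might insert a couple of rows without mentioning pkey and then look at what PostgreSQL filled in. The second payload value is made up for illustration.

```sql
-- Omit pkey so that PostgreSQL evaluates the column default, nextval().
INSERT INTO serial_test( payload ) VALUES( 24307 );
INSERT INTO serial_test( payload ) VALUES( 24308 );   -- hypothetical second row

-- Each row should carry its own pkey value drawn from the sequence.
SELECT pkey, payload FROM serial_test ORDER BY pkey;

-- currval() reports the value most recently handed out by nextval()
-- for this sequence within the current session.
SELECT currval( 'serial_test_pkey_seq' );
```

You can still supply an explicit pkey value in an INSERT; the default is consulted only when the column is omitted.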

Arrays

One of the unique features of PostgreSQL is the fact that you can define a column to be an array. Most commercial database systems require that a single column within a given row can hold no more than one value. With PostgreSQL, you aren't bound by that rule: you can create columns that store multiple values (of the same data type).

The customers table defined in Chapter 1 contained a single balance column. What change would you have to make to the database if you wanted to store a month-by-month balance for each customer, going back at most 12 months? One alternative would be to create a separate table to store monthly balances. The primary key of the cust_balance table might be composed of the customer_id and the month number (either 0-11 or 1-12, whichever you found more convenient)[12]. This would certainly work, but in PostgreSQL, it's not the only choice.

[12] The relationship between the customers table and the cust_balance table is called a parent/child relationship. In this case, the customers table is the parent and cust_balance is the child. The primary key of a child table is composed of the parent key plus a uniquifier (that is, a value, such as the month number, that provides a unique identifier within a group of related children).

You know that there are never more than 12 months in a year and that there are never fewer than 12 months in a year. Parent/child relationships are perfect when the parent has a variable number of children, but they aren't always the most convenient choice when the number of child records is fixed.
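
For comparison, here is a rough sketch of what the child-table alternative could look like. The column names month_number and balance are assumptions made for illustration, and the table-level PRIMARY KEY clause used here is a form of constraint covered later.

```sql
-- Hypothetical child table: one row per customer per month.
CREATE TABLE cust_balance (
    customer_id   INTEGER,
    month_number  INTEGER,          -- the uniquifier, 1 through 12
    balance       DECIMAL(7,2),
    PRIMARY KEY ( customer_id, month_number )
);

-- Fetching the March balance for customer 8 requires a lookup
-- on both parts of the key.
SELECT balance
  FROM cust_balance
 WHERE customer_id = 8 AND month_number = 3;
```

Every customer would need 12 rows in this table, which is the overhead that the array approach avoids.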

Instead, you could store all 12 monthly balance values inside the customers table. Here is how you might create the customers table using an array to store the monthly balances:

CREATE TABLE customers (
    customer_id       INTEGER UNIQUE,
    customer_name     VARCHAR(50),
    phone             CHAR(8),
    birth_date        DATE,
    balance           DECIMAL(7,2),
    monthly_balances  DECIMAL(7,2)[12]
);

Notice that I have added a new column named monthly_balances: this is an array of 12 DECIMAL values. I'll show you how to put values into an array in a moment.

You can define an array of any data type: the built-in types, user-defined types, even other arrays. When you create an array of arrays, you are actually creating a multidimensional array. For example, if you wanted to store month-by-month balances for the three previous years, you could have created the monthly_balances field as

monthly_balances DECIMAL(7,2)[3][12]

This would give you an array of three 12-element arrays.

There is no limit to the number of members in an array. There is also no limit to the number of dimensions in a multidimensional array.

Now, let's talk about inserting and updating array values. When you want to insert a new row into the customers table, you provide values for each member in the monthly_balances array as follows:

INSERT INTO customers
(
    customer_id, customer_name, phone, birth_date, balance, monthly_balances
)
VALUES
(
    8,
    'Wink Wankel',
    '555-1000',
    '1988-12-25',
    0.00,
    '{1,2,3,4,5,6,7,8,9,10,11,12}'
);

To INSERT values into an array, you enclose all the array elements in single quotes and braces ({}) and separate multiple elements with a comma. Starting with PostgreSQL version 7.4, you can use an alternate form to write an array value. To express the same array value in constructor syntax form, you would write:

ARRAY[1,2,3,4,5,6,7,8,9,10,11,12]

There are two advantages to the new array constructor syntax. First, the meaning is a bit more obvious when you're looking at a piece of unfamiliar SQL code. Second, if you write an array value in '{ element1, element2, ... }' form, you have to wrap a string value in double quotes whenever it contains a comma, a brace, or another special character, in addition to the single quotes around the whole literal. In array constructor form, you simply write ordinary string literals. The following array values are equivalent:

'{"Panky, Henry","Rubin, William","Grumby, Jonas"}'

ARRAY[ 'Panky, Henry', 'Rubin, William', 'Grumby, Jonas' ]

Inserting values into a multidimensional array is treated as if you were inserting an array of arrays. For example, if you had a table defined as

CREATE TABLE arr
(
    pkey serial,
    val  int[2][3]
);

you would INSERT a row as

INSERT INTO arr( val ) VALUES( '{ {1,2,3}, {4,5,6} }' );

Or, to write the same array value in constructor form:

INSERT INTO arr( val ) VALUES( ARRAY[ [1,2,3], [4,5,6] ] );
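
Once a row like that is stored, you address an individual member with one subscript per dimension. The following is a small sketch, assuming the arr table holds only the row inserted above.

```sql
-- Second inner array, first element: the value 4 in the row shown above.
SELECT val[2][1] FROM arr;

-- array_dims() reports the bounds of each dimension;
-- for a 2-by-3 array it returns [1:2][1:3].
SELECT array_dims( val ) FROM arr;
```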

Looking back at the customers table now, if you SELECT the row that you INSERTed, you'll see:

movies=# \x
Expanded display is on

movies=# SELECT
movies-# customer_name, monthly_balances
movies-# FROM customers
movies-# WHERE customer_id = 8;
-[ RECORD 1 ]----+------------------------------------
customer_name    | Wink Wankel
monthly_balances | {1.00,2,3,4,5,6,7,8,9,10,11,12.00}

To make this example a little more readable in book form, I have used psql's \x command to rearrange the display format here. I have also edited out some of the trailing zeroes in the monthly_balances column.

You can retrieve specific elements within an array:

movies=# SELECT
movies-# customer_name, monthly_balances[3]
movies-# FROM customers
movies-# WHERE customer_id = 8;
customer_name | monthly_balances
---------------+------------------
Wink Wankel | 3.00
(1 row)

Or you can ask for a range[13] of array elements:

[13] The PostgreSQL documentation refers to a contiguous range of array elements as a slice.

movies=# SELECT
movies-# customer_name, monthly_balances[1:3]
movies-# FROM customers
movies-# WHERE customer_id = 8;
customer_name | monthly_balances
---------------+------------------------
Wink Wankel | {"1.00","2.00","3.00"}
(1 row)

The index for an array starts at 1 by default. I'll show you how to change the range of an index in a moment.

You can use an array element in any situation where you can use a value of the same data type. For example, you can use an array element in a WHERE clause:

movies=# SELECT
movies-# customer_name, monthly_balances[1:3]
movies-# FROM customers
movies-# WHERE monthly_balances[1] > 0;
customer_name | monthly_balances
---------------+------------------------
Wink Wankel | {"1.00","2.00","3.00"}
(1 row)
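
Array elements can also take part in ordinary arithmetic. As a sketch (the first_quarter alias is made up for this example), you might total the first three months for each customer:

```sql
-- Add three DECIMAL elements together, just as you would add three columns.
SELECT customer_name,
       monthly_balances[1] + monthly_balances[2] + monthly_balances[3]
         AS first_quarter
  FROM customers
 WHERE monthly_balances[1] > 0;
```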

There are three ways to UPDATE an array. If you want to UPDATE all elements in an array, simply SET the array to a new value:

movies=# UPDATE customers SET
movies-#   monthly_balances = '{12,11,10,9,8,7,6,5,4,3,2,1}'
movies-# WHERE customer_id = 8;

If you want to UPDATE a single array element, simply identify the element:

movies=# UPDATE customers SET monthly_balances[1] = 22;

Finally, you can UPDATE a contiguous range of elements:

movies=# UPDATE customers SET monthly_balances[1:3] = '{11,22,33}';

Now, there are a few odd things you should know about arrays in PostgreSQL.

First, the array bounds that you specify when you create a column are optional. I don't just mean that you can omit an array bound when you create a column (although you can), I mean that PostgreSQL won't enforce any limits that you try to impose. For example, you created the monthly_balances column as a 12-element array. PostgreSQL happily lets you put a value into element 13, 14, or 268. The array_dims() function tells you the upper and lower bounds of an array value:

movies=# SELECT array_dims( monthly_balances ) FROM customers
movies-# WHERE
movies-#   customer_id = 8;
 array_dims
------------
 [1:12]

You can increase the size of an array by updating values adjacent to those that already exist[14]. For example, the monthly_balances column for customer 8 (Wink Wankel) contains 12 elements, numbered 1 through 12. You can add new elements at either end of the range (array subscripts can be negative):

[14] The PostgreSQL documentation warns that you can't expand a multidimensional array.

movies=# UPDATE customers SET
movies-#   monthly_balances[13] = 13
movies-# WHERE
movies-#   customer_id = 8;
UPDATE 1

movies=# SELECT array_dims( monthly_balances ) FROM customers
movies-# WHERE
movies-#   customer_id = 8;
 array_dims
------------
 [1:13]

movies=# UPDATE customers SET
movies-#   monthly_balances[-1:0] = '{ -1, 0 }'
movies-# WHERE
movies-#   customer_id = 8;
UPDATE 1

movies=# SELECT array_dims( monthly_balances ) FROM customers
movies-# WHERE
movies-#   customer_id = 8;
 array_dims
------------
 [-1:13]

Note that you can expand an array only by updating elements that are directly adjacent to the existing elements. For example, the array for customer number 8 now contains elements -1:13. You can't add an element 15 without first adding element 14:

movies=# UPDATE customers SET
movies-#   monthly_balances[15] = 15
movies-# WHERE
movies-#   customer_id = 8;
ERROR: Invalid array subscripts

Next, the syntax for inserting or updating array values is a bit misleading. Let's say that you want to insert a new row in your customers table, but you only want to provide a balance for month number 3:

movies=# INSERT INTO customers
movies-#   ( customer_id, customer_name, monthly_balances[3] )
movies-# VALUES
movies-#   ( 9, 'Samuel Boney', '{300}' );

This appears to work, but there is danger lurking here. Let's go back and retrieve the data that you just inserted:

movies=# SELECT customer_name, monthly_balances[3]
movies-# FROM customers
movies-# WHERE
movies-#   customer_id = 9;
 customer_name | monthly_balances
---------------+------------------
 Samuel Boney  |

Where'd the data go? If you SELECT all array elements, the data is still there:

movies=# SELECT customer_name, monthly_balances
movies-# FROM customers
movies-# WHERE
movies-#   customer_id = 9;
 customer_name | monthly_balances
---------------+------------------
 Samuel Boney  | {"300"}

The array_dims() function gives you a pretty good hint:

movies=# SELECT array_dims( monthly_balances ) FROM customers
movies-# WHERE
movies-#   customer_id = 9;
 array_dims
------------
 [1:1]

According to array_dims(), the high and low subscript values are both 1. You explicitly INSERTed the value 300 into array element 3, but PostgreSQL (silently) decided to place it into element 1 anyway. This seems a bit mysterious to me, but that's how it works.

The final oddity concerns how PostgreSQL handles NULL values and arrays. An array can be NULL, but an individual element cannot: you can't have an array in which some elements are NULL and others are not. Furthermore, PostgreSQL silently ignores an attempt to UPDATE an array member to NULL:

movies=# SELECT customer_name, monthly_balances
movies-# FROM
movies-#   customers
movies-# WHERE
movies-#   customer_id = 8;
-[ RECORD 1 ]----+------------------------------------
customer_name    | Wink Wankel
monthly_balances | {1.00,2,3,4,5,6,7,8,9,10,11,12.00}

movies=# UPDATE customers SET
movies-#   monthly_balances[1] = NULL
movies-# WHERE
movies-#   customer_id = 8;
UPDATE 1

You won't get any error messages when you try to change an array element to NULL, but a SELECT statement will show that the UPDATE had no effect:

movies=# SELECT customer_name, monthly_balances
movies-# FROM
movies-#   customers
movies-# WHERE
movies-#   customer_id = 8;
-[ RECORD 1 ]----+------------------------------------
customer_name    | Wink Wankel
monthly_balances | {1.00,2,3,4,5,6,7,8,9,10,11,12.00}

If you keep these three oddities in mind, arrays can be very useful. Remember, though, that an array is not a substitute for a child table. You should use an array only when the number of elements is fixed by some real-world constraint (12 months per year, 7 days per week, and so on).

Column Constraints

When you create a PostgreSQL table, you can define column constraints[15]. A column constraint is a rule that must be satisfied whenever you insert or update a value in that column.

[15] You can also define table constraints. A table constraint applies to the table as a whole, not just a single column. We'll discuss table constraints in Chapter 3, "PostgreSQL SQL Syntax and Use."

It's very important to understand that when you define a column constraint, PostgreSQL won't ever let your table get into a state in which the constraints are not met. If you try to INSERT a value that violates a constraint, the insertion will fail. If you try to UPDATE a value in such a way that it would violate a constraint, the modification will be rejected.

You can also define constraints that establish relationships between two tables. For example, each row in the rentals table contains a tape_id (corresponding to a row in the tapes table). You could define a constraint to tell PostgreSQL that the rentals.tape_id column REFERENCES the tapes.tape_id column. I'll discuss the implications of a REFERENCES constraint in a moment.

Needless to say, column constraints are a very powerful feature.

NULL/NOT NULL

Let's start with the most basic column constraints: NULL and NOT NULL. You've already seen some examples of the NOT NULL constraint (in Chapter 1):

CREATE TABLE customers (
    customer_id    INTEGER UNIQUE NOT NULL,
    customer_name  VARCHAR(50) NOT NULL,
    phone          CHAR(8),
    birth_date     DATE,
    balance        DECIMAL(7,2)
);

I have specified that the customer_id and customer_name columns are NOT NULL. The meaning of a NOT NULL constraint is pretty clear: the column is not allowed to contain a NULL value[16]. If you try to INSERT a NULL value into the customer_id or customer_name columns, you will receive an error:

[16] A column that has been defined to be NOT NULL is also known as a mandatory column. A column that can accept NULL values is said to be optional.

INSERT INTO customers VALUES
(
    11,
    NULL,
    '555-1984',
    '10-MAY-1980',
    0
);

ERROR: ExecAppend: Fail to add null value in not null
       attribute customer_name

You'll also get an error if you try to UPDATE either column in such a way that the result would be NULL:

UPDATE customers SET customer_name = NULL WHERE customer_id = 1;

ERROR: ExecReplace: Fail to add null value in not null
       attribute customer_name

The opposite of NOT NULL is NULL. You can explicitly define a NULL constraint, but it really doesn't function as a constraint. A NULL constraint does not force a column to contain only NULL values (that would be pretty pointless). Instead, a NULL constraint simply tells PostgreSQL that NULL values are allowed in a particular column. If you don't specify that a column is mandatory, it is considered optional.
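
To see the three spellings side by side, here is a small sketch using a throwaway table. The table name null_test and its columns are hypothetical.

```sql
CREATE TABLE null_test (
    required_value  INTEGER NOT NULL,   -- mandatory
    optional_value  INTEGER NULL,       -- explicitly optional
    implicit_value  INTEGER             -- optional by default
);

-- Accepted: only the optional columns are NULL.
INSERT INTO null_test VALUES ( 1, NULL, NULL );

-- Rejected: the mandatory column cannot hold a NULL value.
INSERT INTO null_test VALUES ( NULL, 2, 3 );
```

The optional_value and implicit_value columns behave identically; the explicit NULL only documents your intent.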

UNIQUE

The UNIQUE constraint ensures that a column will contain unique values; that is, there will be no duplicate values in the column. If you look back to the previous section, you'll see that you specified that the customer_id column should be UNIQUE. If you try to INSERT a duplicate value into a UNIQUE column, you will receive an error message:

movies=# SELECT * FROM customers;
 customer_id |    customer_name     |  phone   | birth_date | balance
-------------+----------------------+----------+------------+---------
           1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00
           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00
           3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00
           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00
movies=# INSERT INTO customers VALUES
movies-# (
movies-#   1,
movies-#   'John Gomez',
movies-#   '555-4272',
movies-#   '1982-06-02',
movies-#   0.00
movies-# );

ERROR: Cannot insert a duplicate key into unique
       index customers_customer_id_key

When you create a UNIQUE column, PostgreSQL will ensure that an index exists for that column. If you don't create one yourself, PostgreSQL will create one for you. We'll talk more about indexes in Chapter 3.

PRIMARY KEY

Almost every table that you create will have one column (or possibly a set of columns) that uniquely identifies each row. For example, each tape in the tapes table is uniquely identified by its tape_id. Each customer in your customers table is identified by a UNIQUE customer_id. In relational database lingo, the set of columns that act to identify a row is called the primary key.

Quite often, you will find that a table has more than one unique column. For example, a table holding employee information might have an employee_id column and a social_security_number (SSN) column. You could argue that either of these would be a reasonable primary key. The employee_id would probably be the better choice for at least three reasons. First, you are likely to refer to an employee record in other tables (for example, withholdings and earnings), and an employee_id is (most likely) shorter than an SSN. Second, an SSN is considered private information, and you don't want to expose an employee's SSN to everyone who has access to one of the related files. Third, it is entirely possible that some of your employees may not have Social Security numbers (they may not be U.S. citizens), and you can't define a column as the PRIMARY KEY if that column allows NULL values.

PostgreSQL provides a constraint, PRIMARY KEY, that you can use to define the primary key for a table. Practically speaking, identifying a column (or a set of columns) as a PRIMARY KEY is the same as defining the column to be NOT NULL and UNIQUE. But the PRIMARY KEY constraint does offer one advantage over NOT NULL and UNIQUE: documentation. When you create a PRIMARY KEY, you are stating that the columns that comprise the key should be used when you need to refer to a row in that table. Each row in the rentals table, for example, contains a reference (rentals.tape_id) to a tape and a reference (rentals.customer_id) to a customer. You should define the customers.customer_id column as the PRIMARY KEY of the customers table:

CREATE TABLE customers (
    customer_id    INTEGER PRIMARY KEY,
    customer_name  VARCHAR(50) NOT NULL,
    phone          CHAR(8),
    birth_date     DATE,
    balance        DECIMAL(7,2)
);

You should also define the tapes.tape_id column as the primary key of the tapes table:

CREATE TABLE tapes (
    tape_id  CHARACTER(8) PRIMARY KEY,
    title    CHARACTER VARYING(80)
);
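
Because a PRIMARY KEY behaves like NOT NULL plus UNIQUE, both kinds of violation are rejected. The following sketch assumes the tapes table was just created with the definition above; the tape_id value ZZ-00001 and the titles are made up for illustration.

```sql
-- Accepted: the first row with this key.
INSERT INTO tapes VALUES ( 'ZZ-00001', 'Test Title One' );

-- Rejected: a second row with the same key violates uniqueness.
INSERT INTO tapes VALUES ( 'ZZ-00001', 'Test Title Two' );

-- Rejected: a primary key column cannot hold a NULL value.
INSERT INTO tapes VALUES ( NULL, 'Test Title Three' );
```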

Now, let's look at the other half of the equation: the REFERENCES constraint.

REFERENCES

A foreign key is a column (or group of columns) in one table that refers to a row in another table. Usually, but not always, a foreign key refers to the primary key of another table.

The REFERENCES constraint tells PostgreSQL that one table refers to another table (or more precisely, a foreign key in one table refers to the primary key of another). Let's look at an example:

CREATE TABLE rentals (
    tape_id      CHARACTER(8) REFERENCES tapes,
    customer_id  INTEGER REFERENCES customers,
    rental_date  DATE
);

I've now defined rentals.tape_id and rentals.customer_id to be foreign keys. In this example, the rentals.tape_id column is also called a reference and the tapes.tape_id column is called the referent.

There are a few implications to the REFERENCES constraint that you will need to consider. First, the REFERENCES constraint is a constraint: PostgreSQL does not allow you to change the database in such a way that the constraint would be violated. You cannot add a rentals row that refers to a nonexistent tape (or to a nonexistent customer):

movies=# SELECT * FROM tapes;
 tape_id  |     title
----------+---------------
 AB-12345 | The Godfather
 AB-67472 | The Godfather
 MC-68873 | Casablanca
 OW-41221 | Citizen Kane
 AH-54706 | Rear Window

movies=# INSERT INTO rentals VALUES
movies-# (
movies(#   'OW-00000',
movies(#   1,
movies(#   '2002-02-21'
movies(# );
ERROR: <unnamed> referential integrity violation -
       key referenced from rentals not found in tapes

The next thing to consider is that you cannot (normally) DELETE a referent; doing so would violate the REFERENCES constraint:

movies=# SELECT * FROM rentals;
 tape_id  | customer_id | rental_date
----------+-------------+-------------
 AB-12345 |           1 | 2001-11-25
 AB-67472 |           3 | 2001-11-25
 OW-41221 |           1 | 2001-11-25
 MC-68873 |           3 | 2001-11-20
(4 rows)

movies=# DELETE FROM tapes WHERE tape_id = 'AB-12345';
ERROR: <unnamed> referential integrity violation -
       key in tapes still referenced from rentals

Sometimes, it's not appropriate for a REFERENCES constraint to block the deletion of a referent. You can specify the action that PostgreSQL should take when the referent is deleted. The default action (also known as NO ACTION and RESTRICT) is to prevent the deletion of a referent if there are still any references to it. The next alternative, CASCADE, deletes all rows that refer to a value when the referent is deleted. The final two choices break the link between the reference and the referent: SET NULL updates any references to NULL whenever a referent is deleted, whereas SET DEFAULT updates any references to their default values when a referent is deleted.

If you want to specify one of the alternatives, you would use the following syntax when you create the REFERENCES constraint:

REFERENCES table [ (column) ] ON DELETE
    NO ACTION | RESTRICT | CASCADE | SET NULL | SET DEFAULT
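
As a sketch of that syntax in use, here is one way the rentals table might be declared if you were creating it from scratch. It assumes the tapes and customers tables defined earlier; the particular actions chosen are only for illustration.

```sql
CREATE TABLE rentals (
    -- Deleting a tape also deletes every rental that refers to it.
    tape_id      CHARACTER(8) REFERENCES tapes ON DELETE CASCADE,
    -- Deleting a customer keeps the rental but clears the reference.
    customer_id  INTEGER REFERENCES customers ON DELETE SET NULL,
    rental_date  DATE
);

-- With this definition the DELETE is allowed, and the matching
-- rentals rows disappear along with the tape.
DELETE FROM tapes WHERE tape_id = 'AB-12345';
```

CASCADE is convenient, but it can remove more rows than you expect, so it is worth choosing the action for each reference deliberately.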

By default, a REFERENCES constraint also prevents you from changing data in such a way that the constraint would be violated. You can use the ON UPDATE clause to relax the constraint a little, much the same as the ON DELETE clause.

The syntax required for ON UPDATE is

REFERENCES table [ (column) ] ON UPDATE
    NO ACTION | RESTRICT | CASCADE | SET NULL | SET DEFAULT

There is a subtle difference between the ON UPDATE clause and the ON DELETE clause. When you DELETE a referent, the entire row disappears, so the behavior of the ON DELETE clause is obvious. When you UPDATE a referent row, you may change values other than the referent column(s). If you UPDATE a referent row, but you don't update the referent column, you can't introduce a constraint violation, so the ON UPDATE action doesn't come into play. If you do change the referent column, the ON UPDATE action is triggered.

The NO ACTION and RESTRICT actions simply prevent a constraint violation; this is identical to the ON DELETE clause. The CASCADE action causes all references to be updated whenever a referent changes. The SET NULL and SET DEFAULT actions work the same for ON UPDATE as for ON DELETE.
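
The two clauses can appear together on the same reference. The sketch below is an alternative definition of rentals, again assuming the tapes and customers tables from earlier; the replacement tape_id value AB-99999 is hypothetical.

```sql
CREATE TABLE rentals (
    tape_id      CHARACTER(8) REFERENCES tapes
                     ON UPDATE CASCADE
                     ON DELETE RESTRICT,
    customer_id  INTEGER REFERENCES customers,
    rental_date  DATE
);

-- Changing only the title leaves the referent column alone,
-- so the ON UPDATE action never comes into play.
UPDATE tapes SET title = 'The Godfather (1972)' WHERE tape_id = 'AB-12345';

-- Changing the referent column triggers ON UPDATE CASCADE:
-- matching rentals rows are rewritten to the new tape_id.
UPDATE tapes SET tape_id = 'AB-99999' WHERE tape_id = 'AB-12345';
```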

CHECK()

By defining a CHECK() constraint on a column, you can tell PostgreSQL that any values inserted into that column must satisfy an arbitrary Boolean expression. The syntax for a CHECK() constraint is

[CONSTRAINT constraint-name] CHECK( boolean-expression )

For example, if you want to ensure that the balance column is a positive value, but less than $10,000.00, you might use the following:

CREATE TABLE customers
(
    customer_id    INTEGER UNIQUE,
    customer_name  VARCHAR(50),
    phone          CHAR(8),
    birth_date     DATE,
    balance        DECIMAL(7,2)
                       CONSTRAINT invalid_balance
                       CHECK( balance > 0 AND balance < 10000 )
);

Now, if you try to INSERT an invalid value into the balance column, you'll cause an error:

INSERT INTO customers VALUES
(
    10,
    'John Smallberries',
    '555-8426',
    '1970-JAN-02',
    20000
);

ERROR: ExecAppend: rejected due to CHECK constraint invalid_balance
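
A few more probes show where the boundaries of this constraint lie. This is a sketch that assumes the customers table defined just above; the customer rows are invented for the example.

```sql
-- Accepted: 250.00 is greater than 0 and less than 10000.
INSERT INTO customers VALUES ( 12, 'Test Customer A', '555-0101', '1970-01-02', 250.00 );

-- Rejected: zero is not greater than 0, so a zero balance fails the check.
INSERT INTO customers VALUES ( 13, 'Test Customer B', '555-0102', '1970-01-02', 0.00 );

-- Accepted: with a NULL balance the expression evaluates to NULL rather
-- than false, and a CHECK() constraint only rejects a false result.
INSERT INTO customers VALUES ( 14, 'Test Customer C', '555-0103', '1970-01-02', NULL );
```

If a NULL balance should also be refused, you would combine the CHECK() constraint with NOT NULL on the same column.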


Expression Evaluation and Type Conversion

Now that you have seen all the standard PostgreSQL data types, it's time to talk about how you can combine values of different types into complex expressions.

First, you should understand that an expression represents a value. In a well-designed language, you can use an expression anywhere you can use a value. An expression can be as simple as a single value: 3.14159 is an expression. A complex expression is created by combining two simple expressions with an operator. An operator is a symbol that represents some sort of operation to be applied to one or two operands. For example, the expression "customer_balance * 1.10" uses the multiplication operator (*) to multiply customer_balance by 1.10. In this example, customer_balance is the left operand, * is the operator, and 1.10 is the right operand. This expression combines two different kinds of values: customer_balance is (presumably) a column in one of your tables, whereas 1.10 is a literal value (informally called a constant). You can combine column values, literal values, function results, and other expressions to build complex expressions.
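
Here is a short sketch of expressions at work, using the balance column of the customers table from earlier in place of the customer_balance column named above. The projected_balance alias is made up for this example.

```sql
-- An expression in the select list: a column value combined with a literal.
SELECT customer_name, balance * 1.10 AS projected_balance
  FROM customers;

-- The same expression used where a plain value would also be legal,
-- combined with a function result, in a WHERE clause.
SELECT customer_name
  FROM customers
 WHERE balance * 1.10 > 10
   AND length( customer_name ) > 5;
```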

Most operators (such as *, +, and <) require two operands: these are called binary operators. Other operators (such as !!, the factorial operator) work with a single value: these are called unary operators[17]. Some operators (such as -) can function as either.

[17] You may also see the terms dyadic (meaning two-valued) and monadic (meaning single-valued). These terms have the distinct advantage that you will never have to worry about accidentally saying "urinary operator" in polite company.

For some expressions, particularly those expressions that mix data types, PostgreSQL must perform implicit type conversions[18]. For example, there is no predefined operator that allows you to add an INT2 to a FLOAT8. PostgreSQL can convert the INT2 into a FLOAT8 before performing the addition, and there is an operator that can add two FLOAT8 values. Every computer language defines a set of rules[19] that govern automatic type conversion; PostgreSQL is no exception.

[18] A type conversion that is automatically provided by PostgreSQL is called a coercion. A type conversion caused explicitly by the programmer (using the CAST() or '::' operator) is called a cast.

[19] A given language might simply prohibit automatic type conversion, but most languages try to help out the programmer a bit.
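
The difference between a coercion and a cast is easy to see in a few one-line queries. This is only a sketch; the literal values are arbitrary.

```sql
-- Coercion: no INT2 + FLOAT8 operator exists, so PostgreSQL converts
-- the INT2 operand to FLOAT8 on its own before adding.
SELECT CAST( 3 AS INT2 ) + CAST( 1.5 AS FLOAT8 );

-- Casts: the programmer asks for the conversion explicitly,
-- using either of the two spellings.
SELECT CAST( '42' AS INTEGER ) + 1;
SELECT '42'::INTEGER + 1;
```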

PostgreSQL is rather unique in its depth of support for user-defined data types. In most RDBMSs, you can define new data types, but you are really just providing a different name for an existing data type (although you might be able to constrain the set of legal values in the new type). With PostgreSQL, you can add new data types that are not necessarily related to the existing data types. When you add a new data type to PostgreSQL, you can also define a set of operators that can operate on the new type. Each operator is implemented as an operator function, usually, but not necessarily, written in C. When you use an operator in an expression, PostgreSQL must find an operator function that it can use to evaluate the expression. The point of this short digression is that although most languages can define a static set of rules governing type conversion, the presence of user-defined data types requires a more dynamic approach. To accommodate user-defined data types, PostgreSQL consults a table named pg_operator. Each row in the pg_operator table contains an operator name (such as + or #), the operand data types, and the data type of the result. For example, (in PostgreSQL version 7.1.2) there are 31 rows in pg_operator that describe the + operator: one row describes the + operator when applied to two POINT values, another row describes the + operator when applied to two INTERVAL values, and a third row describes the + operator when applied to an INT2 and an INT4.

You can see the complete list of operators using the \do command in the psql query tool.
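If you'd rather see the operand and result types side by side, you can also query pg_operator yourself. The following is only a sketch: it assumes a server that provides the regtype alias (which displays a type OID as a type name), and the number of rows you get back will vary from version to version.

```sql
-- List every flavor of the + operator known to this database.
-- oprleft and oprright hold the operand types; oprresult holds the result type.
SELECT oprname,
       oprleft::regtype   AS left_type,
       oprright::regtype  AS right_type,
       oprresult::regtype AS result_type
  FROM pg_operator
 WHERE oprname = '+'
 ORDER BY oprleft, oprright;
```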

When searching for an operator function, PostgreSQL first searches the pg_operator table for an operator that exactly
matches the data types involved in the expression. For example, given the expression:

CAST( 1.2 AS DECIMAL ) + CAST( 5 AS INTEGER )

PostgreSQL searches for a function named '+' that takes a DECIMAL value as the left operand and an INTEGER value as the right
operand. If it can't find a function that meets those criteria, the next step is to determine whether it can coerce one (or both)
of the values into a different data type. In our example, PostgreSQL could choose to convert either value: The DECIMAL value
could be converted into an INTEGER, or the INTEGER value could be converted into a DECIMAL. Now we have two operator
functions to choose from: One function can add two DECIMAL values and the other can add two INTEGER values. If
PostgreSQL chooses the INTEGER + INTEGER operator function, it will have to convert the DECIMAL value into an INTEGER;
this will result in loss of precision (the fractional portion of the DECIMAL value will be rounded to the nearest whole number).
Instead, PostgreSQL will choose the DECIMAL + DECIMAL operator, coercing the INTEGER value into a DECIMAL.

So to summarize, PostgreSQL first looks for an operator function in which the operand types exactly match the expression
being evaluated. If it can't find one, PostgreSQL looks through the list of operator functions that could be applied by coercing
one (or both) operands into a different type. If type coercion would result in more than one alternative, PostgreSQL tries to
find the operator function that will maintain the greatest precision.

The process of selecting an operator function can get complex and is described more fully in Chapter 5 of the PostgreSQL
User's Guide.
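To see the difference between coercion and an explicit cast for yourself, you might try a pair of queries like the ones below. This is just an illustrative sketch; the column alias (total) is made up for the example.

```sql
-- No exact DECIMAL + INTEGER operator is needed here: the INTEGER operand is
-- coerced to DECIMAL and the DECIMAL + DECIMAL operator does the work.
SELECT CAST( 1.2 AS DECIMAL ) + CAST( 5 AS INTEGER ) AS total;   -- 6.2

-- Casting the DECIMAL yourself forces the INTEGER + INTEGER operator instead;
-- the fractional part of 1.2 is lost before the addition takes place.
SELECT CAST( CAST( 1.2 AS DECIMAL ) AS INTEGER ) + CAST( 5 AS INTEGER ) AS total;   -- 6
```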

Table 2.30 lists the type conversion functions supplied with a standard PostgreSQL distribution.

Table 2.30. Explicit Type Conversion Functions

Result Type | Source Type
------------+----------------------------------------------------------------
BOX         | CIRCLE, POLYGON
DATE        | TIMESTAMPTZ, DATE, TEXT
INTERVAL    | INTERVAL, TEXT, TIME
LSEG        | BOX
MACADDR     | TEXT
NUMERIC     | BIGINT, SMALLINT, INTEGER, REAL, DOUBLE PRECISION
OID         | TEXT
PATH        | POLYGON
POINT       | PATH, LSEG, BOX, POLYGON, CIRCLE
POLYGON     | PATH, CIRCLE, BOX
TEXT        | INET, DOUBLE PRECISION, NAME, OID, SMALLINT, INTEGER,
            | INTERVAL, TIMESTAMP WITH TIME ZONE, TIME WITH TIME ZONE, TIME,
            | BIGINT, DATE, MACADDR, CHAR, REAL
TIME        | TEXT, TIME, TIMESTAMP WITH TIME ZONE, INTERVAL
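A few of the conversions from Table 2.30 can be written as explicit casts like these. The literal values are invented for illustration, and the exact display format of each result depends on your server settings.

```sql
-- CIRCLE to BOX: produces a box derived from the circle
SELECT CAST( CIRCLE '((0,0),2)' AS BOX );

-- INTEGER to NUMERIC
SELECT CAST( 42 AS NUMERIC );

-- TIMESTAMP WITH TIME ZONE to DATE, using the '::' form of the cast
SELECT TIMESTAMPTZ '2005-07-26 10:30:00-04'::DATE;

-- DATE to TEXT
SELECT CAST( DATE '2005-07-26' AS TEXT );
```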


Creating Your Own Data Types

PostgreSQL allows you to create your own data types. This is not unique among relational database systems, but PostgreSQL's depth of support
is unique. In other RDBMSs, you can define one data type in terms of another (predefined) data type. For example, you might create a new
numeric data type to hold an employee's age, with valid values between 18 and 100. This is still a numeric data type: you must define the new
type as a subset of an existing type. PostgreSQL calls such a "refined" data type a domain. Starting with PostgreSQL version 8.0, you can also
create composite data types. A composite type is a single data type made up of multiple fields. For example, you might define a composite type
named address that contains a street number, city, state/province, and postal code. When you define a composite type, each component has a
separate name and data type.

With PostgreSQL, you can create entirely new types that have no relationship to existing types. When you define a custom data type (in
PostgreSQL), you determine the syntax required for literal values, the format for internal data storage, the set of operators supported for the
new type, and the set of (predefined) functions that can operate on values of that type.

There are a number of contributed packages that add new data types to the standard PostgreSQL distribution. For example, the PostGIS project
(http://postgis.refractions.net) adds geographic data types based on specifications produced by the Open GIS Consortium. The /contrib
directory of a standard PostgreSQL distribution contains a cube data type as well as an implementation of ISBN/ISSN (International Standard
Book Number/International Standard Serial Number) data types.

Creating a new data type is too advanced for this chapter. If you are interested in defining a new data type, see Chapter 6, "Extending
PostgreSQL." In the next two sections, we'll show you how to create and work with domains and composite data types.

Refining Data Types with CREATE DOMAIN

A domain is a user-defined data type that refines an existing data type. You typically create a domain when you need to store the same kind of
data in many tables (or many times within the same table). For example, if you create a phone_number domain, you can store a phone number in
a customer table (in fact, you may store many phone numbers per customer), in a salesman table, a vendor table, and so on. At first glance, you
may think that you could simply add a CHARACTER(13) column to each table, but a phone number isn't simply a collection of 13 characters; it
has a specific format. Here in the U.S., a phone number is often written as

(800)555-1212

The three-digit area code is surrounded by parentheses, then you see the prefix, a dash, and the last four digits of the phone number. To create
a phone_number domain that enforces these constraints, you could execute the command

CREATE DOMAIN phone_number AS CHAR(13)
  CHECK( VALUE ~ '\\([[:digit:]]{3}\\)[[:digit:]]{3}-[[:digit:]]{4}' );

The first part of this command is straightforward: you're creating a domain named phone_number as a constrained version of a 13-character
CHAR field. The second part of the command (the CHECK() clause) is a constraint. In this case, you're telling PostgreSQL that the VALUE stored in
a phone_number field must match the given regular expression (if you're not accustomed to reading complex regular expressions, this one
specifies that VALUE must be an open parenthesis followed by three digits, followed by a close parenthesis, followed by three digits, a dash, and
then four digits).
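A quick way to convince yourself that the constraint works is to cast a couple of literals to the domain. The sketch below assumes that phone_number was created exactly as shown and that your server interprets the doubled backslashes the way the release described here does (on a server where standard_conforming_strings is on, you would write the pattern with single backslashes instead).

```sql
-- Accepted: the value matches the (nnn)nnn-nnnn pattern.
SELECT CAST( '(800)555-1212' AS phone_number );

-- Rejected with a check-constraint violation: the area code
-- is not enclosed in parentheses.
SELECT CAST( '800-555-1212' AS phone_number );
```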

Once you've created a domain, you can define columns of that type. For example, to add a home phone number to the customers table, use the
command:

ALTER TABLE customers ADD COLUMN home_phone phone_number;

Now here's the payoff. Once you've defined a domain (and all of the constraints you want to apply to the domain), you can use the domain in
multiple tables, or many times in the same table. To add two more phone numbers to the customers table, use this command:

ALTER TABLE customers
  ADD COLUMN cell_phone phone_number,
  ADD COLUMN work_phone phone_number;

You've defined the constraints once, but you've created three columns that enforce those constraints. If you don't define a phone_number
domain, you'll have to specify the constraints every time you add a phone number to a table.

When should you define a domain? Any time you store the same kind of object in multiple tables (or many times in the same table). You should
also define a domain for any column that participates in a PRIMARY/FOREIGN KEY relationship. For example, the rentals table contains two foreign
keys: The rentals.customer_id column refers to customers.customer_id and rentals.tape_id refers to tapes.tape_id (for the sake of
simplicity, we haven't actually defined PRIMARY/FOREIGN KEY constraints in the sample data for this book). Given these relationships, it's clear
that the data type of rentals.customer_id must be identical to the data type of customers.customer_id (and that rentals.tape_id and
tapes.tape_id must have the same type). The safest way to ensure that the data types match is to create a customer_id domain and a
tape_id domain and define the key columns using those types, as shown in the following sequence of commands:

movies=# CREATE DOMAIN tape_id AS CHARACTER(8);
CREATE DOMAIN
movies=# CREATE DOMAIN customer_id AS INTEGER;
CREATE DOMAIN
movies=# ALTER TABLE customers ALTER COLUMN customer_id TYPE customer_id;
ALTER TABLE
movies=# ALTER TABLE rentals ALTER COLUMN customer_id TYPE customer_id;
ALTER TABLE
movies=# ALTER TABLE tapes ALTER COLUMN tape_id TYPE tape_id;
ALTER TABLE
movies=# ALTER TABLE rentals ALTER COLUMN tape_id TYPE tape_id;
ALTER TABLE

Notice that the tape_id and customer_id domains are unconstrained. You don't have to attach constraints to a domain; an unconstrained
domain is still useful because it defines a logical data type. In fact, you can attach new constraints to a domain later (or change existing
constraints) using the ALTER DOMAIN command and PostgreSQL will ensure that existing data conforms to the new constraints (you'll be
rewarded with an error message if you have any data that fails to satisfy the new constraints).
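For instance, you might tighten the customer_id domain after the fact. The constraint name used here (customer_id_positive) is invented for the example; treat this as a sketch rather than part of the sample database.

```sql
-- Attach a constraint to the previously unconstrained customer_id domain.
-- PostgreSQL checks the existing customer_id values and reports an error
-- if any of them fail the new test.
ALTER DOMAIN customer_id
  ADD CONSTRAINT customer_id_positive CHECK ( VALUE > 0 );

-- Remove the constraint again if you change your mind.
ALTER DOMAIN customer_id
  DROP CONSTRAINT customer_id_positive;
```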

The complete syntax for CREATE DOMAIN is shown here:

CREATE DOMAIN name [AS] data_type
    [ DEFAULT expression ]
    [ constraint [ ... ] ]

where constraint is one or more of the following:

    [ CONSTRAINT constraint_name ] NULL
    [ CONSTRAINT constraint_name ] NOT NULL
    [ CONSTRAINT constraint_name ] CHECK( expression )

If you include a DEFAULT expression clause, the given value becomes the default for any columns of type name. In other words, if you omit a
column of type name in an INSERT command, PostgreSQL inserts expression instead of the usual NULL value. The default value should satisfy
any constraints that you attach to the domain.
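Here is a small sketch that puts the DEFAULT and CONSTRAINT clauses together. The star_rating domain and the tape_reviews table are hypothetical; they are not part of the sample database used in this book.

```sql
-- Hypothetical domain: a 1-to-5 rating that defaults to 3.
CREATE DOMAIN star_rating AS INTEGER
  DEFAULT 3
  CONSTRAINT star_rating_range CHECK ( VALUE BETWEEN 1 AND 5 );

-- Hypothetical table that uses the domain.
CREATE TABLE tape_reviews
(
  tape_id CHARACTER(8),
  stars   star_rating
);

-- The stars column is omitted, so PostgreSQL stores the domain's default (3).
INSERT INTO tape_reviews( tape_id ) VALUES ( 'AB-12345' );

-- Rejected: 9 violates the star_rating_range constraint.
INSERT INTO tape_reviews( tape_id, stars ) VALUES ( 'AB-67890', 9 );
```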

Once you've created a column whose type is defined by a domain, you can treat that column in the same way you would treat any other column
of the base data type. For example, if you define a domain whose base type is CHARACTER, you can insert string values using the same syntax
you would use for values of type CHARACTER. You can also create indexes that include domain values. A domain is a refinement of some other
data type.

Creating and Using Composite Types

One of the new and powerful features introduced in PostgreSQL version 8.0 is the composite data type. A composite type is a data type
composed of one or more named fields. For example, you might want to create a composite type named address composed of a street number,
city, state/province, and postal code. The following command will do just that:

movies=# CREATE TYPE address AS
movies-# (
movies(# street_number VARCHAR,
movies(# city VARCHAR,
movies(# state CHAR(2),
movies(# postal_code VARCHAR
movies(# );

Once you've defined a composite type, you can create columns based on the new type. When you add a column of composite type, you're adding
a single field that happens to be composed of multiple fields. For example, to add an address to the customers table, execute the following
command:

movies=# ALTER TABLE customers ADD COLUMN home_address address;
ALTER TABLE

Now take a look at the definition of the customers table:

movies=# \d customers;
Table "public.customers"
Column | Type | Modifiers
---------------+-----------------------+-----------
customer_id | integer | not null
customer_name | character varying(50) | not null
phone | character(8) |
birth_date | date |
balance | numeric(7,2) |
home_address | address |
Indexes:
"customers_customer_id_key" UNIQUE, btree (customer_id)

The ALTER TABLE command added a single field named home_address. So what happened to the street_number, city, state, and postal_code
fields? They're inside the home_address column. Let's fill in the home_address for one of your customers:

movies=# UPDATE customers
movies-# SET home_address = ( '200 Main Street', 'Springfield', 'CA', '90210' )
movies-# WHERE customer_id = 3;

Notice that the composite value that you're inserting is enclosed in parentheses. There are three ways that you can write a composite literal. The
easiest method is the one you've just seen: simply enclose the component values in a set of parentheses. If you only have a single component to
INSERT (that is, you're inserting default values for the other components), you must use the row constructor form instead:

ROW( '200 Main Street' )

You can also write the composite value as a string:

'(200 Main Street,Springfield,CA,90210)'

The row constructor form is typically the easiest way to build a composite literal because you don't have to worry about doubling-up any
embedded quotes.
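To make the quoting difference concrete, here is the same (invented) address written both ways. The customer_id value and the street name are hypothetical, and the string form shown is a sketch of how the field that contains an apostrophe could be double-quoted inside the literal.

```sql
-- Row constructor form: each component is an ordinary string literal, so an
-- embedded apostrophe is doubled once, just as it would be anywhere else.
UPDATE customers
   SET home_address = ROW( '12 O''Brien Way', 'Springfield', 'CA', '90210' )
 WHERE customer_id = 4;

-- String form: the whole composite value is one string literal; here the first
-- field is wrapped in double quotes and the apostrophe is still doubled.
UPDATE customers
   SET home_address = '("12 O''Brien Way",Springfield,CA,90210)'
 WHERE customer_id = 4;
```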

Now take a look at what PostgreSQL stored in the row you just updated:

movies=# SELECT customer_name, home_address FROM customers;
customer_name | home_address
----------------------+------------------------------------------
Jones, Henry |
Wonderland, Alice N. |
Rubin, William |
Panky, Henry | ("200 Main Street",Springfield,CA,90210)
(4 rows)

You see a single column (home_address) with four values in it. How do you get to the individual components in the home_address column?
Simply refer to (columnName).fieldname, like this:

movies=# SELECT customer_name, (home_address).postal_code FROM customers;
customer_name | postal_code
----------------------+-------------
Jones, Henry |
Wonderland, Alice N. |
Rubin, William |
Panky, Henry | 90210
(4 rows)

That command retrieved the postal_code component of the home_address column. The parentheses are required because the PostgreSQL
parser can't tell if home_address.postal_code refers to a column (postal_code) within a table (home_address) or a field (postal_code) within a
composite column (home_address).
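The same parenthesized notation works in other parts of a query. The two statements below are a sketch that assumes the customers table and address type defined earlier in this section.

```sql
-- Expand every field of the composite column into its own result column.
SELECT customer_name, (home_address).* FROM customers;

-- Use a single component in a WHERE clause.
SELECT customer_name
  FROM customers
 WHERE (home_address).state = 'CA';
```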

Of course, you can UPDATE a single component in a composite column as well:

movies=# UPDATE customers
movies-# SET home_address.postal_code = '94404'
movies-# WHERE customer_id = 3;

Notice that you can't include the parentheses around home_address in this case. Why? Because the parser would never expect to see a table
name following the word SET and therefore can't mistake home_address as the name of a table.

You can't easily[20] create an index on a composite column, but you can create an index on an individual component (or on multiple components)
even though the syntax is a bit mysterious. To create an index on home_address.city plus home_address.state, use the following command:

movies=# CREATE INDEX customer_location ON customers
movies-# (
movies(# (( home_address ).city ),
movies(# (( home_address ).state )
movies(# );

[20] You can create an index on a composite column, but you'll have to define an index operator class for each composite type; it's
much easier to create an index on each component instead.

Take careful note of the parentheses: when you create an index on a field within a composite column, you must use the syntax
'(( columnName ).fieldname )'.

If you ask psql to display the layout of a table that contains a composite column, you won't see the component fields listed:

movies=# \d customers;
Table "public.customers"
Column | Type | Modifiers
---------------+-----------------------+-----------
customer_id | integer | not null
customer_name | character varying(50) | not null
phone | character(8) |
birth_date | date |
balance | numeric(7,2) |
home_address | address |
Indexes:
"customers_customer_id_key" UNIQUE, btree (customer_id)
"customer_location" btree (((home_address).city), ((home_address).state))

To see the definition of a composite type, use the command \d typename:

movies=# \d address
Composite type "public.address"
Column | Type
---------------+-------------------
street_number | character varying
city | character varying
state | character(2)
postal_code | character varying

There are a few restrictions on composite types in PostgreSQL version 8.0. You can't attach constraints to a composite type. That's really not a
problem because you can attach constraints to a domain and define a composite type that uses the domain. You can't create a domain whose
base type is a composite type, but you can create a composite type that includes a domain. You can create a composite type that contains fields
of composite type (meaning that you can nest one composite type within another). Nesting composite types can cause some confusion when you
need to create a literal value of the outermost type (you need a lot of parentheses).
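The following sketch shows a domain used inside a composite type, and one composite type nested inside another. All of the names here (us_state, mailing_address, contact) are hypothetical, chosen so that they don't collide with the address type created earlier.

```sql
-- A domain that carries the constraint...
CREATE DOMAIN us_state AS CHAR(2)
  CHECK ( VALUE = UPPER( VALUE ));

-- ...a composite type that includes the domain...
CREATE TYPE mailing_address AS
(
  street      VARCHAR,
  city        VARCHAR,
  state       us_state,
  postal_code VARCHAR
);

-- ...and a composite type that nests the first composite type.
CREATE TYPE contact AS
(
  contact_name VARCHAR,
  home         mailing_address
);

-- A literal of the outermost type: note the nested row constructors.
SELECT ROW( 'Henry Panky',
            ROW( '200 Main Street', 'Springfield', 'CA', '90210' )::mailing_address
          )::contact;
```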

Overall, composite types take you one step closer to modeling complex real-world objects inside of a PostgreSQL database. When you combine
composite types and domains, you have a powerful mechanism for enforcing constraints on complex objects.

Summary

As you can see, PostgreSQL offers a data type to fit almost every need. In this chapter, I've described each data type
included in a standard PostgreSQL distribution. The syntax for literal values may seem a bit contrived for some of the data
types, but the fact that PostgreSQL allows you to define new data types requires a few concessions (fortunately, very few).

I've listed all the standard operators in this chapter because they are a bit under-documented in the PostgreSQL User's
Guide. Functions, on the other hand, are well documented (as well as constantly changing); refer to Chapter 4 of the
PostgreSQL User's Guide for an up-to-date list of functions.

In Chapter 3, we'll explore a variety of topics that should round out your knowledge of PostgreSQL from the perspective of a
user. Later chapters will cover PostgreSQL programming and PostgreSQL administration.

Chapter 3. PostgreSQL SQL Syntax and Use

The first two chapters explored the basics of the SQL language and looked at the data types supported by PostgreSQL. This chapter
covers a variety of topics that should round out your knowledge of PostgreSQL.

We'll start by looking at the rules that you have to follow when choosing names for tables, columns, indexes, and such. Next, you'll
see how to create, destroy, and view PostgreSQL databases. In Chapter 1, "Introduction to PostgreSQL and SQL," you created a few
simple tables; in this chapter, you'll learn all the details of the CREATE TABLE command. I'll also talk about indexes. I'll finish up by
talking about transaction processing and locking. If you are familiar with Sybase, DB2, or Microsoft SQL Server, I think you'll find
that the locking model used by PostgreSQL is a refreshing change.

PostgreSQL Naming Rules

When you create an object in PostgreSQL, you give that object a name. Every table has a name, every column has a name, and so
on. PostgreSQL uses a single data type to define all object names: the name type.

A value of type name is a string of 63 or fewer characters[1]. A name must start with a letter or an underscore; the rest of the string
can contain letters, digits, and underscores.

[1] You can increase the length of the name data type by changing the value of the NAMEDATALEN symbol before
compiling PostgreSQL.

If you examine the entry corresponding to name in the pg_type table, you will find that a name is really 64 characters long. Because
the name type is used internally by the PostgreSQL engine, it is a null-terminated string. So, the maximum length of a name value is
63 characters. You can enter more than 63 characters for an object name, but PostgreSQL stores only the first 63 characters.

Both SQL and PostgreSQL reserve certain words and, normally, you cannot use those words to name objects. Examples of reserved
words are

ANALYZE
BETWEEN
CHARACTER
INTEGER
CREATE

You cannot create a table named INTEGER or a column named BETWEEN. A complete list of reserved words can be found in Appendix
B of the PostgreSQL User's Guide.

If you find that you need to create an object that does not meet these rules, you can enclose the name in double quotes. Wrapping a
name in quotes creates a quoted identifier. For example, you could create a table whose name is "3.14159". The double quotes are
required, but are not actually a part of the name (that is, they are not stored and do not count against the 63-character limit). When
you create an object whose name must be quoted, you have to include the quotes not only when you create the object, but every
time you refer to that object. For example, to select from the table mentioned previously, you would have to write

SELECT filling, topping, crust FROM "3.14159";

Here are a few examples of both valid and invalid names:

my_table -- valid
my_2nd_table -- valid
échéanciers -- valid: accented and non-Latin letters are allowed
"2nd_table" -- valid: quoted identifier
"create table" -- valid: quoted identifier
"1040Forms" -- valid: quoted identifier
2nd_table -- invalid: does not start with a letter or an underscore
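As a quick sketch of these rules in action, the following statements create and query a table whose name starts with a digit and whose first column is named after a reserved word. The column names are made up for the example.

```sql
-- Both the table name and the first column name must be quoted:
-- one starts with a digit, the other is a reserved word.
CREATE TABLE "2nd_table" ( "between" INTEGER, my_column INTEGER );

-- The quotes are required every time you refer to those names.
SELECT "between", my_column FROM "2nd_table";
```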

Quoted names are case-sensitive. "1040Forms" and "1040FORMS" are two distinct names. Unquoted names are converted to
lowercase, as shown here:

movies=# CREATE TABLE FOO( BAR INTEGER );
CREATE
movies=# CREATE TABLE foo( BAR INTEGER );
ERROR: Relation 'foo' already exists
movies=# \d
List of relations
Name | Type | Owner
------------------+-------+---------------
1040FORMS | table | bruce
1040Forms | table | sheila
customers | table | bruce
foo | table | bruce
rentals | table | bruce
tapes | table | bruce
(6 rows)

The names of all objects must be unique within some scope. Every database must have a unique name; the name of a schema must
be unique within the scope of a single database, the name of a table must be unique within the scope of a single schema, and
column names must be unique within a table. The name of an index must be unique within a schema.

The Importance of the COMMENT Command

If you've been a programmer (or database developer) for more than, say, two days, you understand the importance of commenting
your code. A comment helps new developers understand how your program (or database) is structured. It also helps you remember
what you were thinking when you come back to work after a long weekend. If you're writing procedural code (in C, Java, PL/pgSQL,
or whatever language you prefer), you can intersperse comments directly into your code. If you're creating objects in a PostgreSQL
database, where do you store the comments? In the database, of course. The COMMENT command lets you associate a comment with
just about any object that you can define in a PostgreSQL database. The syntax for the COMMENT command is very simple:

COMMENT ON object-type object-name IS comment-text;

where object-type and object-name are taken from the following:

DATABASE database-name
SCHEMA schema-name
TABLE table-name
COLUMN table-name.column-name
INDEX index-name
DOMAIN domain-name
TYPE data-type-name
VIEW view-name
CONSTRAINT constraint-name ON table-name
SEQUENCE sequence-name
TRIGGER trigger-name ON table-name

You can also define comments for other object types (functions, operators, rules, even languages), but the object types that we've
shown here are the most common (see the PostgreSQL reference documentation for a complete list).

To add a comment to a table, for example, you would execute a command such as

COMMENT ON TABLE customers IS 'List of active customers';

You can only store one comment per object: if you COMMENT ON an object twice, the second comment replaces the first. To drop a
comment, execute a COMMENT command, but specify NULL in place of the comment-text string, like this:

COMMENT ON TABLE customers IS NULL;

Once you have added a comment to an object, you can view the comment (in psql) using the command \dd object-name-pattern,
like this:

movies=# \dd customers
Object descriptions
Schema | Name | Object | Description
--------+-----------+--------+--------------------------
public | customers | table | List of active customers
(1 row)

The \dd command will show you any commented object whose name matches the object-name-pattern. The \dd command will not
show comments that you've assigned to a column within a table. To see column-related comments, use the command \d+ [table-name].
To see the comment assigned to each database, use the command \l+.
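If you need to read comments from SQL rather than from psql, the built-in obj_description() and col_description() functions can retrieve them. The sketch below assumes the customers table from the sample database; the comment text is invented for the example.

```sql
-- Attach a comment to a single column.
COMMENT ON COLUMN customers.balance IS 'Amount currently owed by the customer';

-- Retrieve the comment attached to the table itself.
SELECT obj_description( 'customers'::regclass, 'pg_class' );

-- Retrieve the comment attached to the balance column; the column number
-- is looked up in pg_attribute rather than hard-coded.
SELECT col_description( attrelid, attnum )
  FROM pg_attribute
 WHERE attrelid = 'customers'::regclass
   AND attname  = 'balance';
```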

Creating, Destroying, and Viewing Databases

Before you can do anything else with a PostgreSQL database, you must first create the database. Before you get too much further, it might
be a good idea to see where a database fits into the overall scheme of PostgreSQL. Figure 3.1 shows the relationships between clusters,
databases, schemas, and tables.

Figure 3.1. Clusters, databases, schemas, and tables.

At the highest level of the PostgreSQL storage hierarchy is the cluster. A cluster is a collection of databases. Each cluster exists within a
single directory tree, and the entire cluster is serviced by a single postmaster process. A cluster is not named: there is no way to refer to a
cluster within PostgreSQL, other than by contacting the postmaster servicing that cluster. The $PGDATA environment variable should point to the
root of the cluster's directory tree. The postmaster listens for connection requests
coming from client applications. When a connection request is received (and the user's credentials are authenticated), the postmaster
starts a new server process and connects the client to the server. A single client connection can only interact with a single database at any
given time (but a client application can certainly open multiple connections if it needs to interact with several databases simultaneously). A
postmaster process can connect a client application to any of the databases in the cluster serviced by that postmaster.

Four system tables are shared between all databases in a cluster: pg_group (the list of user groups), pg_database (the list of databases
within the cluster), pg_shadow (the list of valid users), and pg_tablespace (the list of tablespaces).
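Because pg_database is shared, you can list every database in the cluster from whichever database you happen to be connected to. A minimal sketch:

```sql
-- Returns one row per database in the cluster, regardless of
-- which database the current connection is using.
SELECT datname FROM pg_database ORDER BY datname;
```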

Each cluster contains one or more databases. Every database has a name that must follow the naming rules described in the previous
section. Database names must be unique within a cluster. A database is a collection of schemas.

A schema is a named collection of tables (as well as functions, data types, and operators). The schema name must be unique within a
database. Table names, function names, index names, type names, and operators must be unique within the schema. A schema exists
primarily to provide a naming context. You can refer to an object in any schema within a single database by prefixing the object name with
the schema name and a dot (schema-name.object-name). For example, if you have a schema named bruce, you can create (and then query)
a table within that schema as

CREATE TABLE bruce.ratings ( ... );
SELECT * FROM bruce.ratings;

Each connection has a schema search path. If the object that you are referring to is found on the search path, you can omit the schema
name. However, because table names are not required to be unique within a database, you may find that there are two tables with the
same name within your search path (or a table may not be in your search path at all). In those circumstances, you can include the schema
name to remove any ambiguity.

To view the schema search path, use the command SHOW SEARCH_PATH:

movies=# SHOW SEARCH_PATH;
search_path
--------------
$user,public
(1 row)

The default search path, shown here, is $user,public. The $user part equates to your PostgreSQL user name. For example, if I connect to
psql as user bruce, my search path is bruce,public. If a schema named bruce does not exist, PostgreSQL will just ignore that part of the
search path and move on to the schema named public. To change the search path, use SET SEARCH_PATH TO:

movies=# SET SEARCH_PATH TO 'bruce','sheila','public';
SET
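To see how the search path resolves an ambiguous name, consider this sketch. The east and west schemas and the stores table are hypothetical; they exist only to show two tables with the same name.

```sql
-- Two schemas, each containing a table named stores.
CREATE SCHEMA east;
CREATE SCHEMA west;
CREATE TABLE east.stores ( store_name VARCHAR );
CREATE TABLE west.stores ( store_name VARCHAR );

SET search_path TO east, west, public;

SELECT * FROM stores;        -- resolves to east.stores (first match on the path)
SELECT * FROM west.stores;   -- qualify the name to reach the other table
SELECT current_schema();     -- first schema on the search path that exists: east
```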

You create a new schema with the CREATE SCHEMA command and destroy a schema with the DROP SCHEMA command:

movies=# CREATE SCHEMA bruce;
CREATE SCHEMA

movies=# CREATE TABLE bruces_table( pkey INTEGER );
CREATE TABLE

movies=# \d
List of relations
Name | Schema | Type | Owner
----------------+--------+-------+-------
bruces_table | bruce | table | bruce
tapes | public | table | bruce
(2 rows)

movies=# DROP SCHEMA bruce;
ERROR: Cannot drop schema bruce because other objects depend on it
Use DROP ... CASCADE to drop the dependent objects too

movies=# DROP SCHEMA bruce CASCADE;
NOTICE: Drop cascades to table bruces_table
DROP SCHEMA

Notice that you won't be able to drop a schema that is not empty unless you include the CASCADE clause. Schemas are a relatively new feature that first appeared in PostgreSQL version 7.3. Schemas are very useful. At many sites, you may need to keep a "development" system and a "production" system. You might consider keeping both systems in the same database, but in separate schemas. Another (particularly clever) use of schemas is to separate financial data by year. For example, you might want to keep one year's worth of data per schema. The table names (invoices, sales, and so on) remain the same across all schemas, but the schema name reflects the year to which the data applies. You could then refer to data for 2001 as FY2001.invoices, FY2001.sales, and so on. The data for 2002 would be stored in FY2002.invoices, FY2002.sales, and so on. This is a difficult problem to solve without schemas because PostgreSQL does not support cross-database access. In other words, if you are connected to database movies, you can't access tables stored in another database. Starting with PostgreSQL 7.3, you can keep all your data in a single database and use schemas to partition the data.
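
The year-per-schema arrangement might be sketched like this. The column definitions are hypothetical and chosen only for illustration; note that unquoted identifiers are folded to lowercase, so FY2001.invoices and fy2001.invoices name the same table:

```sql
-- Hypothetical schema, table, and column names, for illustration only.
CREATE SCHEMA fy2001;
CREATE SCHEMA fy2002;

CREATE TABLE fy2001.invoices ( invoice_id INTEGER, amount DECIMAL(9,2) );
CREATE TABLE fy2002.invoices ( invoice_id INTEGER, amount DECIMAL(9,2) );

-- One query can span both years because the schemas live in the same database.
SELECT 2001 AS fiscal_year, SUM( amount ) AS total FROM fy2001.invoices
UNION ALL
SELECT 2002 AS fiscal_year, SUM( amount ) AS total FROM fy2002.invoices;
```

Had each year been stored in its own database, the final query could not have been written as a single statement.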

When you create a schema, you can specify an optional tablespace; by default, tables created within the schema will be stored in the schema's tablespace. We discuss tablespaces in more detail in the next section.

Tablespaces

Starting with PostgreSQL version 8.0, you can store database objects (tables and indexes) in alternate locations using a new feature called a tablespace. A tablespace is a name that you give to some directory within your computer's filesystem. Once you create a tablespace (we'll show you how in a moment), you can create schemas, tables, and indexes within that tablespace. A tablespace is defined within a single cluster; all databases within a cluster can refer to the same tablespace.

To create a new tablespace, use the CREATE TABLESPACE command:

CREATE TABLESPACE tablespacename


[ OWNER username ]
LOCATION 'directory'

The tablespacename parameter must satisfy the normal rules for all identifiers; it must be 63 characters or shorter and must start with a letter (or the name must be quoted). In addition, you can't create a tablespace whose name begins with the characters 'pg_' since those names are reserved for the PostgreSQL development team. If you omit the OWNER username clause, the new tablespace is owned by the user executing the CREATE TABLESPACE command. By default, you can't create an object in a tablespace unless you are the owner of that tablespace (or you are a PostgreSQL superuser). You can grant CREATE privileges to other users with the GRANT command (see Chapter 23, "Security," for more information on the GRANT command).

The interesting part of a CREATE TABLESPACE command is the LOCATION 'directory' clause. The LOCATION clause names a directory; objects created within the tablespace are stored in that directory. There are a few rules that you must follow before you can create a tablespace:

• You must be a PostgreSQL superuser

• PostgreSQL must be running on a system that supports symbolic links (that means you can't create tablespaces on a Windows host)

• The directory must already exist (PostgreSQL won't create the directory for you)

• The directory must be empty

• The directory name must be shorter than 991 characters

• The directory must be owned by the owner of the postmaster process (typically a user named postgres)

If all of those conditions are satisfied, PostgreSQL creates the new tablespace.
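
As a minimal sketch, a superuser could create a tablespace on behalf of one user and then let a second user place objects in it. The tablespace name, the user names (bruce and sheila), and the directory are all hypothetical; the directory must already satisfy the rules listed above:

```sql
-- Run as a PostgreSQL superuser. All names and the path are hypothetical.
CREATE TABLESPACE archivespace OWNER bruce LOCATION '/archiveDrive/pg';

-- bruce owns the tablespace; sheila needs an explicit privilege to use it.
GRANT CREATE ON TABLESPACE archivespace TO sheila;

-- Connected as sheila, this statement is now permitted:
CREATE TABLE sheilas_table ( pkey INTEGER ) TABLESPACE archivespace;
```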

When you create a tablespace, the PostgreSQL server performs a number of actions behind the scenes. First, the permissions on the directory are changed to 700 (read, write, and execute permissions for the directory owner, all other permissions denied). Next, PostgreSQL creates a single file named PG_VERSION in the given directory (the PG_VERSION file stores the version number of the PostgreSQL server that created the tablespace; if the PostgreSQL developers change the structure of a tablespace in a future version, PG_VERSION will help any conversion tools understand the structure of an existing tablespace). If the permission change succeeds, PostgreSQL adds a new row to the pg_tablespace table (a cluster-wide table) and assigns a new OID (object-id) to that row. Next, the server uses the OID to create a symbolic link between your cluster and the given directory. For example, consider the following scenario:

movies# CREATE TABLESPACE mytablespace LOCATION '/fastDrive/pg';


CREATE TABLESPACE

movies# SELECT oid, spcname, spclocation


movies-# FROM
movies-# pg_tablespace
movies-# WHERE
movies-# spcname = 'mytablespace';
oid | spcname | spclocation
-------+--------------+--------------
34281 | mytablespace | /fastDrive/pg

In this case, PostgreSQL assigned the new tablespace (mytablespace) an OID of 34281. PostgreSQL creates a symbolic link that points from $PGDATA/pg_tblspc/34281 to /fastDrive/pg. When you create an object (a table or index) inside of this tablespace, the object is not created directly inside of the /fastDrive/pg directory. Instead, PostgreSQL creates a subdirectory in the tablespace and then creates the object within that subdirectory. The name of the subdirectory corresponds to the OID of the database (that is, the object-id of the database's entry in the pg_database table) that holds the new object. If you create a new table within the mytablespace tablespace, like this:

movies# CREATE TABLE foo ( data VARCHAR ) TABLESPACE mytablespace;


CREATE TABLE

Then find the OID of the new table and the OID of the database (movies):

movies# SELECT oid FROM pg_class WHERE relname = 'foo';


oid
-------
34282
(1 row)

movies# SELECT oid FROM pg_database WHERE datname = 'movies';


oid
-------
17228
(1 row)

You can see the relationships between the tablespace, the database subdirectory, and the new table:

$ ls -l $PGDATA/pg_tblspc
total 0
lrwxrwxrwx 1 postgres postgres 12 Nov 9 19:31 34281 -> /fastDrive/pg

$ ls -l /fastDrive/pg
total 8
drwx------ 2 postgres postgres 4096 Nov 9 19:50 17228
-rw------- 1 postgres postgres 4 Nov 9 19:31 PG_VERSION

$ ls -l /fastDrive/pg/17228
total 0
-rw------- 1 postgres postgres 0 Nov 9 19:50 34282

Notice that $PGDATA/pg_tblspc/34281 is a symbolic link that points to /fastDrive/pg (34281 is the OID of mytablespace's entry in the pg_tablespace table), PostgreSQL has created a subdirectory (17228) for the movies database, and the table named foo was created in that subdirectory (in a file whose name, 34282, corresponds to the table's OID). By creating a subdirectory for each database, PostgreSQL ensures that you can safely store objects from multiple databases within the same tablespace without worrying about OID collisions.
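
If you would rather not collect those numbers one query at a time, the following sketch gathers all three parts of the path ($PGDATA/pg_tblspc/tablespace-OID/database-OID/file-name) in a single statement. It assumes the foo table and the mytablespace tablespace from the example above. The file name is taken from pg_class.relfilenode, which typically starts out equal to the table's OID but is not guaranteed to stay that way:

```sql
SELECT t.oid         AS tablespace_oid,   -- name of the symbolic link in $PGDATA/pg_tblspc
       d.oid         AS database_oid,     -- name of the per-database subdirectory
       c.relfilenode AS file_name         -- name of the file that holds the table
  FROM pg_class c
  JOIN pg_tablespace t ON t.oid = c.reltablespace
 CROSS JOIN pg_database d
 WHERE c.relname = 'foo'
   AND d.datname = current_database();
```

The join on reltablespace only finds tables that were placed in an explicitly named tablespace; a table stored in its database's default tablespace records a reltablespace of 0.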

When you create a cluster (which is done for you automatically when you install PostgreSQL), PostgreSQL silently creates two tablespaces for you: pg_default and pg_global. PostgreSQL creates objects in the pg_default tablespace when it can't find a more appropriate tablespace. The pg_default tablespace is always located in the $PGDATA/base directory. The pg_global tablespace stores cluster-wide tables like pg_database, pg_group, and pg_tablespace; you can't create objects in the pg_global tablespace.

The name of the pg_default tablespace can be a bit misleading. You may think that PostgreSQL always creates an object in pg_default if you omit the TABLESPACE tablespacename clause, but that's not the case. Instead, PostgreSQL follows an inheritance hierarchy to find the appropriate tablespace. If you specify a TABLESPACE tablespacename clause when you execute a CREATE TABLE or CREATE INDEX command, the server creates the object in the given tablespacename. If you don't specify a tablespace and you're creating an index, the index is created in the tablespace of the parent table (that is, the table that you are indexing). If you don't specify a tablespace and you're creating a table, the table is created in the tablespace of the parent schema. If you are creating a schema and you don't specify a tablespace, the schema is created in the tablespace of the parent database. If you are creating a database and you don't specify a tablespace, the database is created in the tablespace of the template database (typically, template1). So, an index inherits its tablespace from the parent table, a table inherits its tablespace from the parent schema, a schema inherits its tablespace from the parent database, and a database inherits its tablespace from the template database.

To view the tablespaces defined in a cluster, use the \db (or \db+) command in psql:

movies=# \db+
List of tablespaces
Name | Owner | Location | Access privileges
--------------+----------+------------------+------------------
mytablespace | postgres | /fastDrive/pg |
pg_default | postgres | |
pg_global | postgres | | {pg=C/pg}
(3 rows)

To see a list of objects defined within a given tablespace, use the following query:

SELECT relname FROM pg_class


WHERE reltablespace =
(
SELECT oid FROM pg_tablespace WHERE spcname = 'tablespacename'
);

Don't confuse schemas and tablespaces. They both provide organization for the tables and indexes in your cluster, but they are definitely not the same thing. A tablespace affects the physical organization of data within a cluster (that is, a tablespace defines where your data is stored). A schema affects the logical organization of data within a database: a schema affects name resolution; a tablespace does not. A schema acts as a part of a name; once you've created an object, you can ignore its physical location (its tablespace).
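
A short sketch may make the distinction concrete. The schema names here are hypothetical, and mytablespace is assumed to be the tablespace created earlier in this section:

```sql
-- Hypothetical schemas; mytablespace is the tablespace created earlier.
CREATE SCHEMA development;
CREATE SCHEMA production;

-- Same table name, different schemas, same physical location.
CREATE TABLE development.invoices ( invoice_id INTEGER ) TABLESPACE mytablespace;
CREATE TABLE production.invoices  ( invoice_id INTEGER ) TABLESPACE mytablespace;

-- The schema is part of the name; the tablespace never appears in a query.
SELECT * FROM development.invoices;
SELECT * FROM production.invoices;
```

The two tables are told apart by schema alone, even though their files sit side by side in the same directory.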

Creating New Databases

Now let's see how to create a new database and how to remove an existing one.

The syntax for the CREATE DATABASE command is

CREATE DATABASE database-name


[ WITH [ OWNER [=] {username|DEFAULT}]
[ TEMPLATE [=] {template-name|DEFAULT}]
[ ENCODING [=] {encoding|DEFAULT}]
[ TABLESPACE [=] tablespace ]]

The database-name must follow the PostgreSQL naming rules described earlier and must be unique within the cluster.

If you don't include the OWNER=username clause or you specify OWNER=DEFAULT, you become the owner of the database. If you are a PostgreSQL superuser, you can create a database that will be owned by another user using the OWNER=username clause. If you are not a PostgreSQL superuser, you can still create a database if you have the CREATEDB privilege, but you cannot assign ownership to another user. Chapter 21, "PostgreSQL Administration," describes the process of defining user privileges.

The TEMPLATE=template-name clause is used to specify a template database. A template defines a starting point for a database. If you don't include a TEMPLATE=template-name clause or you specify TEMPLATE=DEFAULT, the database named template1 is copied to the new database. All tables, views, data types, functions, and operators defined in the template database are duplicated into the new database. If you add objects (usually functions, operators, and data types) to the template1 database, those objects will be propagated to any new databases that you create based on template1. You can also trim down a template database if you want to reduce the size of new databases. For example, you might decide to remove the geometric data types (and the functions and operators that support those types) if you know that you won't need them. Or, if you have a set of functions that are required by your application, you can define the functions in the template1 database and all new databases will automatically include those functions. If you want to create an as-distributed database, you can use template0 as your template database. The template0 database is the starting point for template1 and contains only the standard objects included in a PostgreSQL distribution. You should not make changes to the template0 database, but you can use the template1 database to provide a site-specific set of default objects.
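
The propagation of template objects can be sketched as follows. The function name (site_answer) and the database names are hypothetical, and the first step assumes that you are connected to template1 as a superuser:

```sql
-- Step 1: while connected to template1, add a site-specific function.
-- site_answer is a hypothetical name used only for this sketch.
CREATE FUNCTION site_answer() RETURNS INTEGER AS 'SELECT 42' LANGUAGE SQL;

-- Step 2: create one database from each template.
CREATE DATABASE with_site_objects;                          -- copies template1
CREATE DATABASE as_distributed WITH TEMPLATE = template0;   -- copies template0

-- Step 3: after connecting to with_site_objects, the function is available.
-- The same call fails in as_distributed, which never saw the function.
SELECT site_answer();
```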

You can use the ENCODING=character-set clause to choose an encoding for the string values in the new database. An encoding determines how the bytes that make up a string are interpreted as characters. For example, specifying ENCODING=SQL_ASCII tells PostgreSQL that characters are stored in ASCII format, whereas ENCODING=ISO-8859-8 requests ECMA-121 Latin/Hebrew encoding. When you create a database, all characters stored in that database are encoded in a single format. When a client retrieves data, the client/server protocol automatically converts between the database encoding and the encoding being used by the client. Chapter 22, "Internationalization and Localization," discusses encoding schemes in more detail.

The TABLESPACE=tablespace-name clause tells PostgreSQL that you want to create the database in an alternate location (that is, the database should not be created in the usual $PGDATA/base directory). You must create a tablespace before you can use it. If you don't include a TABLESPACE clause in the CREATE DATABASE command, the new database is created in the same tablespace as the template database.
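
Putting the four clauses together, a complete command might look like the following sketch. The database name (rentals_archive) and the owner (sheila) are hypothetical, mytablespace is assumed to exist already, and you would need to be a superuser to assign ownership to another user:

```sql
-- rentals_archive and sheila are hypothetical names; mytablespace must already exist.
CREATE DATABASE rentals_archive
    WITH OWNER      = sheila
         TEMPLATE   = template0
         ENCODING   = 'SQL_ASCII'
         TABLESPACE = mytablespace;
```

Each clause is optional, so you can drop any line that should simply take its default.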

If you're using an older version of PostgreSQL (older than 8.0), you can't use tablespaces to create a database in a non-standard location. Instead, you must use a feature known as a location. In versions of PostgreSQL older than 8.0, the last option for the CREATE DATABASE command is the LOCATION=path clause. In most cases, you will never have to use the LOCATION option, which is good because it's a little strange.

If you do have need to use an alternate location, you will probably want to specify the location by using an environment variable. The environment variable must be known to the postmaster process at the time the postmaster is started and it should contain an absolute pathname.

The LOCATION=path clause can be confusing. The path might be specified in three forms:

• The path contains a /, but does not begin with a / (this specifies a relative path)

• The path begins with a / (this specifies an absolute path)

• The path does not include a /

Relative locations are not allowed by PostgreSQL, so the first form is invalid.

Absolute paths are allowed only if you defined the C/C++ preprocessor symbol "ALLOW_ABSOLUTE_DBPATHS" at the time you compiled your copy of PostgreSQL. If you are using a prebuilt version of PostgreSQL, the chances are pretty high that this symbol was not defined and therefore absolute paths are not allowed.

So, the only form that you can rely on in a standard distribution is the last: a path that does not include any "/" characters. At first glance, this may look like a relative path that is only one level deep, but that's not how PostgreSQL sees it. In the third form, the path must be the name of an environment variable. As I mentioned earlier, the environment variable must be known to the postmaster process at the time the postmaster is started, and it should contain an absolute pathname. Let's look at an example:

$ export PG_ALTERNATE=/bigdrive/pgdata
$ initlocation PG_ALTERNATE
$ pg_ctl restart -l /tmp/pg.log -D $PGDATA
...
$ psql -q -d movies
movies=# CREATE DATABASE bigdb WITH LOCATION=PG_ALTERNATE;
...

First, I've defined (and exported) an environment variable named PG_ALTERNATE. I've defined PG_ALTERNATE to have a value of /bigdrive/pgdata; that's where I want my new database to reside. After the environment variable has been defined, I need to initialize the directory structure; the initlocation script will take care of that for me. Now I have to restart the postmaster so that it can see the PG_ALTERNATE variable. Finally, I can start psql (or some other client) and execute the CREATE DATABASE command specifying the PG_ALTERNATE environment variable.

This all sounds a bit convoluted, and it is. The PostgreSQL developers consider it a security risk to allow users to create databases in arbitrary locations. Because the postmaster must be started by a PostgreSQL administrator, only an administrator can choose where databases can be created. So, to summarize the process:

1. Create a new environment variable and set it to the path where you want new databases to reside.

2. Initialize the new directory using the initlocation application.

3. Stop and restart the postmaster.

4. Now, you can use the environment variable with the LOCATION=path clause.

createdb

The CREATE DATABASE command creates a new database from within a PostgreSQL client application (such as psql). You can also create a new database from the operating system command line. The createdb command is a shell script that invokes psql and executes the CREATE DATABASE command for you. For more information about createdb, see the PostgreSQL Reference Manual or invoke createdb with the --help flag:

$ createdb --help
createdb creates a PostgreSQL database.

Usage:
createdb [OPTION]... [DBNAME] [DESCRIPTION]

Options:
-D, --tablespace=TABLESPACE default tablespace for the database
-E, --encoding=ENCODING encoding for the database
-O, --owner=OWNER database user to own the new database
-T, --template=TEMPLATE template database to copy
-e, --echo show the commands being sent to the server
-q, --quiet don't write any messages
--help show this help, then exit
--version output version information, then exit

Connection options:
-h, --host=HOSTNAME database server host or socket directory
-p, --port=PORT database server port
-U, --username=USERNAME user name to connect as
-W, --password prompt for password

By default, a database with the same name as the current user is created.

Report bugs to <pgsql-bugs@postgresql.org>.


Dropping a Database

Getting rid of an old database is easy. The DROP DATABASE command will delete all of the data in a database and remove the database from the cluster.

For example:

movies=# CREATE DATABASE redshirt;


CREATE DATABASE
movies=# DROP DATABASE redshirt;
DROP DATABASE

There are no options to the DROP DATABASE command; you simply include the name of the database that you want to remove. There are a few restrictions. First, you must own the database that you are trying to drop, or you must be a PostgreSQL superuser. Next, you cannot drop a database from within a transaction block; you cannot roll back a DROP DATABASE command. Finally, the database must not be in use, even by you. This means that before you can drop a database, you must connect to a different database (template1 is a good candidate). An alternative to the DROP DATABASE command is the dropdb shell script. dropdb is simply a wrapper around the DROP DATABASE command; see the PostgreSQL Reference Manual for more information about dropdb.
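
For instance, if you happen to be connected to the database that you want to remove, a psql script along these lines would satisfy the "not in use" restriction. This is only a sketch; it assumes the redshirt database from the example above and that no other session is connected to it:

```sql
-- \connect is a psql meta-command, not an SQL statement.
\connect template1

-- Issued outside of any transaction block, from a different database.
DROP DATABASE redshirt;
```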

Viewing Databases

Using psql, there are two ways to view the list of databases. First, you can ask psql to simply display the list of databases and then exit. The -l option does this for you:

$ psql -l
List of databases
Name | Owner | Encoding
-----------+-------------+----------
template0 | postgres | UNICODE
template1 | postgres | UNICODE
movies | bruce | UNICODE
(3 rows)
$

From within psql, you can use the \l or \l+ meta-commands to display the databases within a cluster:

movies=# \l+
List of databases
Name | Owner | Encoding | Description
-----------+---------------+----------+--------------------------
template0 | postgres | UNICODE |
template1 | postgres | UNICODE | Default template database
movies | bruce | UNICODE | Virtual Video database
(3 rows)
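
If you are working from a client other than psql, you can get a similar list by querying the pg_database system table directly. The following is a sketch of one way to do that, not necessarily the query that psql itself runs:

```sql
SELECT datname                         AS name,
       pg_get_userbyid( datdba )       AS owner,
       pg_encoding_to_char( encoding ) AS encoding
  FROM pg_database
 ORDER BY datname;
```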

Creating New Tables

The previous section described how to create and drop databases. Now let's move down one level in the PostgreSQL storage hierarchy and talk about creating and dropping tables.

You've created some simple tables in the first two chapters; it's time to talk about some of the more advanced features of the CREATE TABLE command. Here is the command that you used to create the customers table:

CREATE TABLE customers (


customer_id INTEGER UNIQUE,
customer_name VARCHAR(50),
phone CHAR(8),
birth_date DATE,
balance DECIMAL(7,2)
);

This command creates a permanent table named customers. A table name must meet the naming criteria described earlier in this chapter. When you create a table, PostgreSQL automatically creates a new data type [2] with the same name as the table. This means that you can't create a table whose name is the same as an existing data type.

[2]
This seems to be a holdover from earlier days. You can't actually do anything with this data type.

When you execute this command, the customers table is created in the database that you are connected to. If you are using PostgreSQL 7.3 or later, the customers table is created in the first schema in your search path. (If you are using a version older than 7.3, your copy of PostgreSQL does not support schemas.) If you want the table to be created in some other schema, you can prefix the table name with the schema qualifier, for example:

CREATE TABLE joes_video.customers( ... );

The new table is owned by you. You can't give ownership to another user at the time you create the table, but you can change it later using the ALTER TABLE...OWNER TO command (described later).
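
As a quick sketch of that two-step process, the following creates a table in the joes_video schema and then hands it to a hypothetical user named sheila. The column list is abbreviated for brevity, and depending on your PostgreSQL version you may need to be a superuser to change a table's owner:

```sql
-- joes_video is the schema from the example above; sheila is a hypothetical user.
CREATE TABLE joes_video.customers (
    customer_id   INTEGER UNIQUE,
    customer_name VARCHAR(50)
);

-- Ownership cannot be set in CREATE TABLE, so it is changed afterward.
ALTER TABLE joes_video.customers OWNER TO sheila;
```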

When you create a table (or an index), you can tell PostgreSQL to store the object in a specific tablespace by including a TABLESPACE tablespacename clause, like this:

CREATE TABLE joes_video.customers( ... ) TABLESPACE mytablespace;

If you don't specify a tablespace, PostgreSQL creates the table in the tablespace assigned to the schema (if you're creating an index without specifying a tablespace, the index is created in the tablespace of the parent table).
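
The same clause works for an index. In this sketch the index name (customers_name_idx) is hypothetical, and the joes_video.customers table and the mytablespace tablespace are assumed to exist already:

```sql
-- customers_name_idx is a hypothetical index name; mytablespace must already exist.
CREATE INDEX customers_name_idx
    ON joes_video.customers ( customer_name )
    TABLESPACE mytablespace;

-- Check where each object ended up; a reltablespace of 0 means
-- "the default tablespace of this database".
SELECT relname, reltablespace
  FROM pg_class
 WHERE relname IN ( 'customers', 'customers_name_idx' );
```

Comparing the reltablespace values against the oid column of pg_tablespace tells you which tablespace holds each object.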

Temporary Tables

I mentioned earlier that the customers table is a permanent table. You can also create temporary tables. A permanent table persists after you terminate your PostgreSQL session; a temporary table is automatically destroyed when your PostgreSQL session ends. Temporary tables are also local to your session, meaning that other PostgreSQL sessions can't see temporary tables that you create. Because temporary tables are local to each session, you don't have to worry about colliding with the name of a table created by another session.

If you create a temporary table with the same name as a permanent table, you are effectively hiding the permanent table. For example, let's create a temporary table that hides the permanent customers table:

CREATE TEMPORARY TABLE customers (


customer_id INTEGER UNIQUE,
customer_name VARCHAR(50),
phone CHAR(8),
birth_date DATE,
balance DECIMAL(7,2)
);

Notice that the only difference between this command and the command that you used to create the permanent customers table is the TEMPORARY keyword [3]. Now you have two tables, each named customers. If you now SELECT from or INSERT into the customers table, you will be working with the temporary table. Prior to version 7.3, there was no way to get back to the permanent table except by dropping the temporary table:

[3]
You can abbreviate TEMPORARY to TEMP.

movies=# SELECT * FROM customers;


customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
8 | Wink Wankel | 555-1000 | 1988-12-25 | 0.00
(5 rows)

movies=# CREATE TEMPORARY TABLE customers


movies-# (
movies(# customer_id INTEGER UNIQUE,
movies(# customer_name VARCHAR(50),
movies(# phone CHAR(8),
movies(# birth_date DATE,
movies(# balance DECIMAL(7,2)
movies(# );
CREATE
movies=# SELECT * FROM customers;
customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
(0 rows)

movies=# DROP TABLE customers;


DROP

movies=# SELECT * FROM customers;


customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
8 | Wink Wankel | 555-1000 | 1988-12-25 | 0.00
(5 rows)

Starting with release 7.3, you can access the permanent table by including the name of the schema where the permanent table resides.
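
Here is a minimal sketch of that technique. It assumes that the permanent customers table lives in the public schema; the temporary table's column list is abbreviated:

```sql
-- Assumes the permanent customers table lives in the public schema.
CREATE TEMPORARY TABLE customers ( customer_id INTEGER, customer_name VARCHAR(50) );

SELECT * FROM customers;          -- the temporary table (empty)
SELECT * FROM public.customers;   -- the permanent table, reached through its schema

DROP TABLE customers;             -- drops the temporary table; the permanent one remains
```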

A temporary table is like a scratch pad. You can use a temporary table to accumulate intermediate results. Quite often, you will find that a complex query can be formulated more easily by first extracting the data that interests you into a temporary table. If you find that you are creating a given temporary table over and over again, you might want to convert that table into a view. See the section titled "Using Views" in Chapter 1 for more information about views.
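
The scratch-pad idea, and its conversion to a view, might be sketched like this. The names owing and customers_owing are hypothetical; the query uses the customers table from this chapter:

```sql
-- Scratch-pad step: capture the customers who owe money.
CREATE TEMPORARY TABLE owing AS
    SELECT customer_id, customer_name, balance
      FROM customers
     WHERE balance > 0;

SELECT customer_name FROM owing ORDER BY balance DESC;

-- If you keep rebuilding the same scratch table, a view may serve better.
CREATE VIEW customers_owing AS
    SELECT customer_id, customer_name, balance
      FROM customers
     WHERE balance > 0;
```

The temporary table holds a snapshot taken when it was created, whereas the view is evaluated each time you query it.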

Table Constraints

In Chapter 2, "Working with Data in PostgreSQL," we explored the various constraints that you can apply to a column: NOT NULL, UNIQUE, PRIMARY KEY, REFERENCES, and CHECK(). You can also apply constraints to a table as a whole or to groups of columns within a table.

First, let's look at the CHECK() constraint. The syntax for a CHECK() constraint is

[CONSTRAINT constraint-name] CHECK( boolean-expression )

When you define a CHECK() constraint for a table, you are telling PostgreSQL that any insertions or updates made to the table must satisfy the boolean-expression given within the constraint. The difference between a column constraint and a table constraint is that a column constraint should refer only to the column to which it relates. A table constraint can refer to any column in the table.

For example, suppose that you had an orders table to track customer orders:

CREATE TABLE orders


(
customer_number INTEGER,
part_number CHAR(8),
quantity_ordered INTEGER,
price_per_part DECIMAL(7,2)
);

You could create a table-related CHECK() constraint to ensure that the extended price (that is, quantity_ordered times price_per_part) of any given order is at least $5.00:

CREATE TABLE orders


(
customer_number INTEGER,
part_number CHAR(8),
quantity_ordered INTEGER,
price_per_part DECIMAL(7,2),

CONSTRAINT verify_minimum_order
CHECK (( price_per_part * quantity_ordered) >= 5.00::DECIMAL )
);

Each time a row is inserted into the orders table (or the quantity_ordered or price_per_part columns are updated), the verify_minimum_order constraint is evaluated. If the expression evaluates to FALSE, the modification is rejected. If the expression evaluates to TRUE or NULL, the modification is allowed.
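
The three possible outcomes can be seen with a few INSERT commands. This sketch assumes the orders table defined above, including the verify_minimum_order constraint; the part numbers are made up:

```sql
-- Assumes the orders table defined above, including verify_minimum_order.
INSERT INTO orders VALUES ( 1, 'PART-001', 10,   1.00 );  -- 10.00 >= 5.00 is TRUE: accepted
INSERT INTO orders VALUES ( 2, 'PART-002', 1,    1.00 );  -- 1.00 >= 5.00 is FALSE: rejected
INSERT INTO orders VALUES ( 3, 'PART-003', NULL, 1.00 );  -- expression is NULL: accepted
```

If a NULL quantity should not slip through, you would add a NOT NULL constraint to the column in addition to the CHECK() constraint.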

You may have noticed that a table constraint looks very much like a column constraint. PostgreSQL can tell the difference between the two types by their placement within the CREATE TABLE statement. A column constraint is placed within a column definition, after the column's data type and before the comma. A table constraint is listed outside of a column definition. The only tricky spot is a table constraint that follows the last column definition; you normally would not include a comma after the last column. If you want a constraint to be treated as a table constraint, be sure to include a comma following the last column definition. At the moment, PostgreSQL does not treat table constraints and column constraints differently, but in a future release it may.
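
The placement rules are easier to see side by side. The order_lines table in this sketch is hypothetical and exists only to show where each kind of constraint goes:

```sql
-- Hypothetical table used only to show placement.
CREATE TABLE order_lines
(
  quantity_ordered INTEGER CHECK ( quantity_ordered > 0 ),  -- column constraint: before the comma
  price_per_part   DECIMAL(7,2),                            -- this comma ends the last column definition
  CHECK ( price_per_part * quantity_ordered >= 5.00 )       -- table constraint: its own list entry
);
```

Without the comma after the price_per_part definition, the second CHECK() would be parsed as a column constraint attached to price_per_part.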

Each of the table constraint varieties is related to a type of column constraint.

The UNIQUE table constraint is identical to the UNIQUE column constraint, except that you can specify that a group of columns must be unique. For example, here is the rentals table as currently defined:

CREATE TABLE rentals


(
tape_id CHARACTER(8),
customer_id INTEGER,
rental_date DATE
);

Let's modify this table to reflect the business rule that any given tape cannot be rented twice on the same day:

CREATE TABLE rentals


(
tape_id CHARACTER(8),
customer_id INTEGER,
rental_date DATE,

UNIQUE( rental_date, tape_id )

);

Now when you insert a row into the rentals table, PostgreSQL will ensure that there are no other rows with the same combination of rental_date and tape_id. Notice that I did not provide a constraint name in this example; constraint names are optional.
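
To see the group constraint at work, consider the following sketch. It assumes the rentals table defined above, with UNIQUE( rental_date, tape_id ); the tape ID, customer numbers, and dates are made up:

```sql
-- Assumes the rentals table defined above, with UNIQUE( rental_date, tape_id ).
INSERT INTO rentals VALUES ( 'AB-12345', 1, '2002-07-01' );  -- accepted
INSERT INTO rentals VALUES ( 'AB-12345', 2, '2002-07-02' );  -- accepted: same tape, different day
INSERT INTO rentals VALUES ( 'AB-12345', 3, '2002-07-01' );  -- rejected: same tape, same day
```

Neither column has to be unique on its own; only the combination is checked.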

The PRIMARY KEY table constraint is identical to the PRIMARY KEY column constraint, except that you can specify that the key is composed of a group of columns rather than a single column.

The REFERENCES table constraint is similar to the REFERENCES column constraint. When you create a REFERENCES column constraint, you are telling PostgreSQL that a column value in one table refers to a row in another table. More specifically, a REFERENCES column constraint specifies a relationship between two columns. When you create a REFERENCES table constraint, you can relate a group of columns in one table to a group of columns in another table. Quite often, you will find that the unique identifier for a table (that is, the PRIMARY KEY) is composed of multiple columns. Let's say that the Virtual Video Store is having great success and you decide to open a second store. You might want to consolidate the data for each store into a single database. Start by creating a new table:

CREATE TABLE stores


(
store_id INTEGER PRIMARY KEY,
location VARCHAR
);

Now, change the definition of the customers table to include a store_id for each customer:

CREATE TABLE customers (


store_id INTEGER REFERENCES stores( store_id ),
customer_id INTEGER,
customer_name VARCHAR(50),
phone CHAR(8),
birth_date DATE,
balance DECIMAL(7,2),

PRIMARY KEY( store_id, customer_id )


);

The store_id column in the customers table refers to the store_id column in the stores table. Because store_id is the primary key to the stores table, you could have written the REFERENCES constraint in either of two ways:

store_id INTEGER REFERENCES stores( store_id )

or

store_id INTEGER REFERENCES stores

Also, notice that the primary key for this table is composed of two columns: store_id and customer_id. I can have two customers with the same customer_id as long as they have different store_ids.

Now you have to change the rentals table as well:

CREATE TABLE rentals


(
store_id INTEGER,
tape_id CHARACTER(8),
customer_id INTEGER,
rental_date DATE,

UNIQUE( rental_date, tape_id ),


FOREIGN KEY( store_id, customer_id ) REFERENCES customers
);

The customers table has a two-part primary key. Each row in the rentals table refers to a row in the customers table, so the FOREIGN KEY constraint must specify a two-part foreign key. Again, because the foreign key refers to the primary key of the customers table, I can write this constraint in either of two forms:

FOREIGN KEY( store_id, customer_id )


REFERENCES customers( store_id, customer_id )

or

FOREIGN KEY( store_id, customer_id )


REFERENCES customers

Now that I have the referential integrity constraints defined, they will behave as described in the "Column Constraints" section of Chapter 2. Remember, a table constraint functions the same as a column constraint, except that table constraints can refer to more than one column.

Dropping Tables

Dropping a table is much easier than creating a table. The syntax for the DROP TABLE command is

DROP TABLE table-name [, ...];

If you are using PostgreSQL 7.3 or later, you can qualify the table name with a schema. For example, here is the command to destroy the rentals table:

DROP TABLE rentals;

If the rentals table existed in some schema other than your current schema, you would qualify the table name:

DROP TABLE sheila.rentals;

You can destroy a table only if you are the table's owner or if you are a PostgreSQL superuser. Notice that I used the word destroy here rather than drop. It's important to realize that when you execute a DROP TABLE command, you are destroying all the data in that table.

PostgreSQL has a nice feature that I have not seen in other databases: You can roll back a DROP TABLE command. Try the following experiment. First, let's view the contents of the tapes table:

movies=# SELECT * FROM tapes;

tape_id | title | duration


----------+---------------+----------
AB-12345 | The Godfather |
AB-67472 | The Godfather |
MC-68873 | Casablanca |
OW-41221 | Citizen Kane |
AH-54706 | Rear Window |
(5 rows)

Now, start a multistatement transaction and destroy the tapes table:

movies=# BEGIN WORK;


BEGIN

movies=# DROP TABLE tapes;

NOTICE: DROP TABLE implicitly drops referential integrity trigger


from table "rentals"
DROP

If you try to SELECT from the tapes table, you'll find that it has been destroyed:

movies=# SELECT * FROM tapes;


ERROR: Relation "tapes" does not exist

If you COMMIT this transaction, the table will permanently disappear; let's ROLLBACK the transaction instead:

movies=# ROLLBACK;
ROLLBACK

The ROLLBACK threw out all changes made since the beginning of the transaction, including the DROP TABLE command. You should be able to SELECT from the tapes table again and see the same data that was there before:

movies=# SELECT * FROM tapes;


tape_id | title | duration
----------+---------------+----------
AB-12345 | The Godfather |
AB-67472 | The Godfather |
MC-68873 | Casablanca |
OW-41221 | Citizen Kane |
AH-54706 | Rear Window |
(5 rows)

This is a very nice feature. You can roll back CREATE TABLE, DROP TABLE, CREATE VIEW, DROP VIEW, CREATE INDEX, DROP INDEX, and so on. I'll discuss transactions a bit later in this chapter. For now, I'd like to point out a few details that I glossed over in the previous example. You may have noticed that the DROP TABLE command produced a NOTICE.

movies=# DROP TABLE tapes;


NOTICE: DROP TABLE implicitly drops referential integrity trigger
from table "rentals"
DROP

When you drop a table, PostgreSQL will automatically DROP any indexes defined for that table as well as any triggers or rules. If other tables refer to the table that you dropped (by means of a REFERENCES constraint), PostgreSQL will automatically drop the constraints in the other tables. However, any views that refer to the dropped table will not be removed: a view can refer to many tables, and PostgreSQL would not know how to remove a single table from a multitable SELECT.

Inheritance

Another PostgreSQL feature that is uncommon in relational database systems is inheritance. Inheritance is one of the foundations of the object-oriented programming paradigm. Using inheritance, you can define a hierarchy of related data types (in PostgreSQL, you define a hierarchy of related tables). Each layer in the inheritance hierarchy represents a specialization of the layer above it[4].

[4] We'll view an inheritance hierarchy with the most general types at the top and the most specialized types at the bottom.

Let's look at an example. The Virtual Video database defines a table that stores information about the tapes that you have in stock:

movies=# \d tapes
Column | Type | Modifiers
----------+-----------------------+-----------
tape_id | character(8) | not null
title | character varying(80) | not null
duration | interval |

For each tape, you store the tape_id, title, and duration. Let's say that you decide to jump into the twenty-first century and rent DVDs as well as videotapes. You could store DVD records in the tapes table, but a tape and a DVD are not really the same thing. Let's create a new table that defines the characteristics common to both DVDs and videotapes:

CREATE TABLE video


(
video_id CHARACTER(8) PRIMARY KEY,
title VARCHAR(80),
duration INTERVAL
);

Now, create a table to hold the DVDs. For each DVD you have in stock, you want to store everything in the video table plus a region_id and an array of audio_tracks. Here is the new table definition:

movies=# CREATE TABLE dvds


movies-# (
movies(# region_id INTEGER,
movies(# audio_tracks VARCHAR[]
movies(# ) INHERITS ( video );

Notice the last line in this command: You are telling PostgreSQL that the dvds table inherits from the video table. Now let's INSERT a new DVD:

movies=# INSERT INTO dvds VALUES


movies-# (
movies(# 'ASIN-750', -- video_id
movies(# 'Star Wars', -- title
movies(# '121 minutes', -- duration
movies(# 1, -- region_id
movies(# '{English,Spanish}' -- audio_tracks
movies(# );

Now, if you SELECT from the dvds table, you'll see the information that you just inserted:

video_id | title | duration | region_id | audio_tracks


----------+-----------+----------+-----------+-------------------
ASIN-750 | Star Wars | 02:01:00 | 1 | {English,Spanish}

At this point, you might be thinking that the INHERITS clause did nothing more than create a row template that PostgreSQL copied when you created the dvds table. That's not the case. If you simply want to create a table that has the same structure as another table, use the LIKE table-name clause instead of the INHERITS table-name clause. When we say that dvds inherits from video, we are not simply saying that a DVD is like a video; we are saying that a DVD is a video. Let's SELECT from the video table now; remember, you haven't explicitly inserted any data into the video table, so you might expect the result set to be empty:

movies=# SELECT * FROM video;


video_id | title | duration
----------+-----------+----------
ASIN-750 | Star Wars | 02:01:00

A DVD is a video. When you SELECT from the video table, you see only the columns that comprise a video. When you SELECT from the dvds table, you see all the columns that comprise a DVD. In this relationship, you say that the dvds table specializes[5] the more general video table.

[5] Object-oriented terminology defines many different phrases for this inheritance relationship: specialize/generalize, subclass/superclass, and so on. Choose the phrase that you like.

If you are using a version of PostgreSQL older than 7.2, you must code this query as SELECT * FROM video* to see the DVD entries. Starting with release 7.2, SELECT will include descendant tables and you have to say SELECT * FROM ONLY video to suppress descendants.

You now have a new table to track your DVD inventory; let's go back and redefine the tapes table to fit into the inheritance hierarchy. For each tape, we want to store a video_id, a title, and a duration. This is where we started: the video table already stores all this information. You should still create a new table to track videotapes, because at some point in the future you may find information that relates to a videotape, but not to a DVD:

movies=# CREATE TABLE tapes ( ) INHERITS( video );


CREATE

This CREATE TABLE command creates a new table identical in structure to the video table. Each row in the tapes table will contain a video_id, a title, and a duration. Insert a row into the tapes table:

movies=# INSERT INTO tapes VALUES


movies-# (
movies(# 'ASIN-8YD',
movies(# 'Flight To Mars(1951)',
movies(# '72 min'
movies(# );
INSERT

When you SELECT from the tapes table, you should see this new row:

movies=# SELECT * FROM tapes;


video_id | title | duration
----------+----------------------+----------
ASIN-8YD | Flight To Mars(1951) | 01:12:00
(1 row)

And because a tape is a video, you would also expect to see this row in the video table:

movies=# SELECT * FROM video;


video_id | title | duration
----------+--------------------------------+----------
ASIN-750 | Star Wars | 02:01:00
ASIN-8YD | Flight To Mars(1951) | 01:12:00
(2 rows)

Now here's the interesting part. A DVD is a video: any row that you add to the dvds table shows up in the video table. A tape is a video: any row that you add to the tapes table shows up in the video table. But a DVD is not a tape (and a tape is not a DVD). Any row that you add to the dvds table will not show up in the tapes table (and vice versa).

If you want a list of all the tapes you have in stock, you can SELECT from the tapes table. If you want a list of all the DVDs in stock, SELECT from the dvds table. If you want a list of all videos in stock, SELECT from the video table.

In this example, the inheritance hierarchy is only two levels deep. PostgreSQL imposes no limit to the number of levels that you can define in an inheritance hierarchy. You can also create a table that inherits from multiple tables; the new table will have all the columns defined in the more general tables.

I should caution you about two problems with the current implementation of inheritance in PostgreSQL. First, indexes are not shared between parent and child tables. On one hand, that's good because it gives you good performance. On the other hand, that's bad because PostgreSQL uses an index to guarantee uniqueness. That means that you could have a videotape and a DVD with the same video_id. Of course, you can work around this problem by encoding the type of video in the video_id (for example, use a T for tapes and a D for DVDs). But PostgreSQL won't give you any help in fixing this problem. The other potential problem with inheritance is that triggers are not shared between parent and child tables. If you define a trigger for the topmost table in your inheritance hierarchy, you will have to remember to define the same trigger for each descendant.

We have redefined some of the example tables many times in the past two chapters. In a real-world environment, you probably won't want to throw out all your data each time you need to make a change to the definition of an existing table. Let's explore a better way to alter a table.

ALTER TABLE

Now that you have a video table, a dvds table, and a tapes table, let's add a new column to all three tables that you can use to record the rating of the video (PG, G, R, and so on).

You could add the rating column to the tapes table and to the dvds table, but you really want the rating column to be a part of every video. The ALTER TABLE ... ADD COLUMN command adds a new column for you, leaving all the original data in place:

movies=# ALTER TABLE video ADD COLUMN rating VARCHAR;


ALTER

Now, if you look at the definition of the video table, you will see the new column:

movies=# \d video
Table "video"
Column | Type | Modifiers
----------+-----------------------+-----------
video_id | character(8) | not null
title | character varying(80) |
duration | interval |
rating | character varying |
Indexes:
"video_pkey" PRIMARY KEY, btree (video_id)

After the ALTER TABLE command completes, each row in the video table has a new column; the value of every rating column will be NULL. Because you have changed the definition of a video, and a DVD is a video, you might expect that the dvds table will also contain a rating column:

movies=# \d dvds
Table "dvds"
Column | Type | Modifiers
--------------+-----------------------+-----------
video_id | character(8) | not null
title | character varying(80) |
duration | interval |
region_id | integer |
audio_tracks | character varying[] |
rating | character varying |
Inherits: video

Similarly, the tapes table will also inherit the new rating column:

movies=# \d tapes
Table "tapes"
Column | Type | Modifiers
----------+-----------------------+-----------
video_id | character(8) | not null
title | character varying(80) |
duration | interval |
rating | character varying |
Inherits: video

Starting with PostgreSQL version 8.0, you can change the data type of an existing column using ALTER TABLE. For example, to change the data type of the customers.customer_id column from INTEGER to NUMERIC( 7,2 ), you could execute the command:

ALTER TABLE customers ALTER COLUMN customer_id TYPE NUMERIC( 7,2 )

As long as PostgreSQL knows how to convert a value from the old data type to the new data type, you can freely change data types. If PostgreSQL doesn't know how to convert between the old and new types, you can include a USING expression clause to tell PostgreSQL how to perform the conversion. The expression following the USING keyword typically refers to the original column value. For example, if you want to change the data type of customers.customer_id and multiply each customer_id by 100 at the same time, use the following command:

ALTER TABLE customers ALTER COLUMN customer_id TYPE NUMERIC( 7,2 ) USING customer_id * 100

You can also refer to other columns in the USING expression. For example, say that you are currently storing each customer name in two columns, last_name and first_name, and you've decided to combine them into a single column named customer_name. You can do that with the following commands:

movies=# ALTER TABLE customers


movies-# ALTER COLUMN last_name
movies-# TYPE VARCHAR USING ( last_name || ',' || first_name ),
movies-# DROP COLUMN first_name;
ALTER TABLE

movies=# ALTER TABLE customers


movies-# RENAME COLUMN last_name TO customer_name;
ALTER TABLE

The first ALTER TABLE command performs two alterations. First, for each row in the table, it evaluates the expression last_name || ',' || first_name and assigns that value to the last_name column (converting the result into type VARCHAR along the way). Next, the (first) ALTER TABLE command removes the first_name column from each row. You're left with a single column called last_name that contains the concatenation of the original last_name and first_name columns (with a comma in between). The second ALTER TABLE command renames the last_name column to customer_name.

Keep in mind that some ALTER TABLE commands will take longer to execute than others. It takes very little time to change the name of a column. It can take quite a while to change the data type of a column (because PostgreSQL has to traverse every row in the table and write out a new version). If you use ALTER TABLE ... SET TABLESPACE to move a table from one tablespace to another, the server must physically copy each block in the table. In most cases, it's faster to execute a series of ALTER TABLE commands than it is to read the old data into a client application, change each row, and then write the result back to the server. When you use an ALTER TABLE command, the entire transformation occurs within the server; if you modify the structure of a table using a custom-written client application, you have to send every row to the client, perform the transformation, and then send every row back to the server.

The ALTER TABLE command is useful when you are in the development stages of a project. Using ALTER TABLE, you can add new columns to a table, define default values, rename columns (and tables), add and drop constraints, change the data type of a column, and transfer ownership. The capabilities of the ALTER TABLE command seem to grow with each new release; see the PostgreSQL Reference Manual for more details.

Adding Indexes to a Table

Most of the tables that you have created so far have no indexes. An index serves two purposes. First, an index can be used to guarantee uniqueness. Second, an index provides quick access to data (in certain circumstances).

Here is the definition of the customers table that you created in Chapter 1:

CREATE TABLE customers (


customer_id INTEGER UNIQUE,
customer_name VARCHAR(50),
phone CHAR(8),
birth_date DATE,
balance DECIMAL(7,2)
);

When you create this table, PostgreSQL will display a rather terse message:

NOTICE: CREATE TABLE / UNIQUE will create implicit


index 'customers_customer_id_key' for table 'customers'

What PostgreSQL is trying to tell you here is that even though you didn't explicitly ask for one, an index has been created on your behalf. The implicit index is created so that PostgreSQL has a quick way to ensure that the values that you enter into the customer_id column are unique.

Think about how you might design an algorithm to check for duplicate values in the following list of names:

Grumby, Jonas
Hinkley, Roy
Wentworth, Eunice
Floyd, Heywood
Bowman, David
Dutton, Charles
Poole, Frank
Morbius, Edward
Farman, Jerry
Stone, Jeremy
Dutton, Charles
Manchek, Arthur

A first attempt might simply start with the first value and look for a duplicate later in the list, comparing Grumby, Jonas to Hinkley, Roy, then Wentworth, Eunice, and so on. Next, you would move to the second name in the list and compare Hinkley, Roy to Wentworth, Eunice, then Floyd, Heywood, and so on. This algorithm would certainly work, but it would turn out to be slow as the list grew longer. Each time you add a new name to the list, you have to compare it to every other name already in the list.
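
The first-attempt algorithm can be sketched in C roughly as follows. This is only an illustration of the idea, not anything PostgreSQL does internally; the function name has_duplicate_naive is an assumption made up for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if any two entries in
 * names[0..count-1] are equal. Every name is compared with every name
 * that follows it, so the work grows roughly with the square of the
 * list length. */
bool has_duplicate_naive(const char *names[], size_t count)
{
    for (size_t i = 0; i < count; i++)
        for (size_t j = i + 1; j < count; j++)
            if (strcmp(names[i], names[j]) == 0)
                return true;
    return false;
}
```

Called with the 12 names above, the function reports a duplicate once the outer loop reaches the first Dutton, Charles entry and the inner loop reaches the second.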

A better solution would be to first sort the list:

Bowman, David
Dutton, Charles
Dutton, Charles
Farman, Jerry
Floyd, Heywood
Grumby, Jonas
Hinkley, Roy
Manchek, Arthur
Morbius, Edward
Poole, Frank
Stone, Jeremy
Wentworth, Eunice

After the list is sorted, it's easy to check for duplicates, because any duplicate values appear next to each other. To check the sorted list, you start with the first name, Bowman, David, and compare it to the second name, Dutton, Charles. If the second name is not a duplicate of the first, you know that you won't find any duplicates of the first name later in the list. Now when you move to the second name on the list, you compare it to the third name, and you can see that there is a duplicate. Likewise, when you add a new name to the sorted list, you can stop searching for duplicate values as soon as you encounter a value that sorts after the name you are adding.
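
Here is a minimal C sketch of the sort-then-compare-neighbors approach. The names has_duplicate_sorted and compare_names are hypothetical, and the sketch sorts the caller's array in place with the standard qsort() function; it illustrates the idea only and is not how PostgreSQL enforces uniqueness.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int compare_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sorts names[] in place, then reports whether
 * any two neighboring entries are equal. After sorting, duplicates can
 * only appear next to each other, so one pass is enough. */
bool has_duplicate_sorted(const char *names[], size_t count)
{
    qsort(names, count, sizeof names[0], compare_names);
    for (size_t i = 1; i < count; i++)
        if (strcmp(names[i - 1], names[i]) == 0)
            return true;
    return false;
}
```

The sort costs more than a single pass, but once the list is kept in sorted order, each later check only has to look at neighbors.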

An index is similar in concept to a sorted list, but it's even better. An index provides a quick way for PostgreSQL to find data within a range of values. Let's see how an index can help narrow a search. First, let's assign a number to each of the names in the sorted list, just for easy reference (I've removed the duplicate value):

1. Bowman, David
2. Dutton, Charles
3. Farman, Jerry
4. Floyd, Heywood
5. Grumby, Jonas
6. Hinkley, Roy
7. Manchek, Arthur
8. Morbius, Edward
9. Poole, Frank
10. Stone, Jeremy
11. Wentworth, Eunice

Now let's build a (simplistic) index (see Figure 3.2). The English alphabet contains 26 letters; split this roughly in half and choose to keep track of where the "Ms" start in the list. In this list, names beginning with an M start at entry number 7. Keep track of this pair (M, 7) and call it the root of your index.

Figure 3.2. One-level index.

Now when you insert a new name, Tyrell, Eldon, you start by comparing it to the root. The root of the index tells you that names starting with the letter M are found starting at entry number 7. Because the list is sorted, and you know that Tyrell will sort after M, you can start searching for the insertion point at entry 7, skipping entries 1 through 6. Also, you can stop searching as soon as you encounter a name that sorts later than Tyrell.

As your list of names grows, it would be advantageous to add more levels to the index (see Figure 3.3). The letter M splits the alphabet (roughly) in half. Add a second level to the index by splitting the range between A and M (giving you G), and splitting the range between M and Z (giving you T).

Figure 3.3. Two-level index.

Now when you want to add Tyrell, Eldon to the list, you compare Tyrell against the root and find that Tyrell sorts later than M. Moving to the next layer of the index, you find that Tyrell sorts later than T, so you can jump straight to slot number 11 and insert the new value.
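
The lookup described above can be sketched in C. To keep the example short, the two-level tree (root M, children G and T) is flattened into a small sorted array of (letter, entry) pairs; that simplification, the struct index_entry type, and the function names are all assumptions made for this sketch, and a real B-Tree is considerably more involved. Entry numbers are 1-based to match the numbered list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical index entry: names whose first letter is >= letter
 * begin at 1-based list position entry. */
struct index_entry {
    char   letter;
    size_t entry;
};

/* Returns the 1-based entry number at which a search for name can
 * begin. index[] must be sorted by letter. Falls back to entry 1 when
 * no index entry applies. */
size_t search_start(const struct index_entry *index, size_t index_len,
                    const char *name)
{
    size_t start = 1;
    for (size_t i = 0; i < index_len; i++)
        if (name[0] >= index[i].letter)
            start = index[i].entry;
    return start;
}

/* Returns the 1-based entry number where name belongs in the sorted
 * list, scanning forward from the position suggested by the index and
 * stopping at the first entry that does not sort before name. */
size_t find_insert_slot(const char *sorted[], size_t count,
                        const struct index_entry *index, size_t index_len,
                        const char *name)
{
    size_t slot = search_start(index, index_len, name);
    while (slot <= count && strcmp(sorted[slot - 1], name) < 0)
        slot++;
    return slot;
}
```

With the 11 sorted names and the index entries (G, 5), (M, 7), and (T, 11), looking up Tyrell, Eldon starts at entry 11 and stops there immediately, because Wentworth, Eunice already sorts later than Tyrell.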

You can see that you can add as many index levels as you need. Each level divides the parent's range in half, and each level reduces the number of names that you have to search to find an insertion point[6].

[6] Technically speaking, the index diagrams discussed here depict a clustered index. In a clustered index, the leaf nodes in the index tree are the data rows themselves. In a non-clustered index, the leaf nodes are actually row pointers; the rows are not kept in sorted order. PostgreSQL does not support clustered indexes. I've diagrammed the index trees in clustered form for clarity. A clustered index provides fast, sequential access along one index path, but it is very expensive to maintain.

Using an index is similar in concept to the way you look up words in a dictionary. If you have a dictionary handy, pull it off the shelf and take a close look at it. If it's like my dictionary, it has those little thumb-tab indentations, one for each letter of the alphabet. If I want to find the definition of the word "polyglot," I'll find the thumb-tab labeled "P" and start searching about halfway through that section. I know, because the dictionary is sorted, that "polyglot" won't appear in any section prior to "P" and it won't appear in any section following "P." That little thumb-tab saves a lot of searching.

You also can use an index as a quick way to check for uniqueness. If you are inserting a new name into the index structure shown earlier, you simply search for the new name in the index. If you find it in the index, it is obviously a duplicate.

I mentioned earlier that PostgreSQL uses an index for two purposes. You've seen that an index can be used to search for unique values. But how does PostgreSQL use an index to provide faster data access?

Let's look at a simple query:

SELECT * FROM characters WHERE name >= 'Grumby' AND name < 'Moon';

Now assume that the list of names that you worked with before is actually a table named characters and you have an index defined for the name column, as in Figure 3.4.

Figure 3.4. Two-level index (again).

When PostgreSQL parses through the SELECT statement, it notices that you are constraining the result set to a range of names and that you have an index on the name column. That's a convenient combination. To satisfy this statement, PostgreSQL can use the index to start searching at entry number 5. Because the rows are already sorted, PostgreSQL can stop searching as soon as it finds the first entry greater than "Moon" (that is, the search ends as soon as you hit entry number 8). This kind of operation is called a partial index scan.

Think of how PostgreSQL would process this query if the rows were not indexed. It would have to start at the beginning of the table and compare each row against the constraints; PostgreSQL can't terminate the search without processing every row in the table. This kind of operation is called a full table scan, or table scan.
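
To make the difference concrete, here is a hedged C sketch that contrasts the two strategies on an in-memory array of names. It models the index as nothing more than a sorted array searched with binary search, which is a simplification; the function names lower_bound, range_scan, and full_scan are hypothetical and are not PostgreSQL APIs. Each scan reports how many entries it had to examine.

```c
#include <stddef.h>
#include <string.h>

/* Binary search: position of the first entry in sorted[] that is >= key
 * (returns count when every entry sorts before key). */
size_t lower_bound(const char *sorted[], size_t count, const char *key)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(sorted[mid], key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Range scan over a sorted list: counts names with low <= name < high.
 * *visited receives the number of entries examined during the forward
 * scan (the binary search steps are not counted). */
size_t range_scan(const char *sorted[], size_t count,
                  const char *low, const char *high, size_t *visited)
{
    size_t matches = 0;
    *visited = 0;
    for (size_t i = lower_bound(sorted, count, low); i < count; i++) {
        (*visited)++;
        if (strcmp(sorted[i], high) >= 0)
            break;              /* sorted order: nothing later can match */
        matches++;
    }
    return matches;
}

/* Full scan over rows in any order: every row must be examined. */
size_t full_scan(const char *rows[], size_t count,
                 const char *low, const char *high, size_t *visited)
{
    size_t matches = 0;
    *visited = count;
    for (size_t i = 0; i < count; i++)
        if (strcmp(rows[i], low) >= 0 && strcmp(rows[i], high) < 0)
            matches++;
    return matches;
}
```

For the 11 sorted names and the range 'Grumby' to 'Moon', both functions find the same three matches (entries 5 through 7), but the range scan examines only four entries, stopping at Morbius, Edward, while the full scan examines all 11.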

Because this kind of index can access data in sorted order, PostgreSQL can use such an index to avoid a sort that would otherwise be required to satisfy an ORDER BY clause.

In these examples, we are working with small tables, so the performance difference between a full table scan and an indexed range read is negligible. As tables become larger, the performance difference can be huge. Chapter 4, "Performance," discusses how the PostgreSQL query optimizer chooses when it is appropriate to use an index.

PostgreSQL actually supports several kinds of indexes. The previous examples show how a B-Tree index works[7]. Another type of index is the Hash index. A Hash index uses a technique called hashing to evenly distribute keys among a number of hash buckets. Each key value added to a hash index is run through a hashing function. The result of a hashing function is a bucket number. A simplistic hashing function for string values might sum the ASCII value of each character in the string and then compute the sum modulo the number of buckets to get the result. In C, you might write this function as

[7] The "B" in B-Tree stands for "Balanced." A balanced tree is a type of data structure that retains its performance characteristics even in the face of numerous insertions and deletions. The most important feature of a B-Tree is that it takes about the same amount of time to find any given record.

int hash_string( char * key, int bucket_count )


{
int hash = 0;
int i;

for( i = 0; i < strlen( key ); i++ )


hash = hash + key[i];

return( hash % bucket_count );


}

Let's run each of the names in the characters table through this function to see what kind of numbers you get back (I've used a
bucket_count of 5):

hash_string() Value    Name
1                      Grumby, Jonas
2                      Hinkley, Roy
3                      Wentworth, Eunice
4                      Floyd, Heywood
4                      Bowman, David
3                      Dutton, Charles
3                      Poole, Frank
0                      Morbius, Edward
0                      Farman, Jerry
0                      Stone, Jeremy
4                      Manchek, Arthur

The numbers returned don't really have any intrinsic meaning; they simply serve to distribute a set of keys among a set of
buckets.

Now let's reformat this table so that the contents are grouped by bucket number:

Bucket Number    Bucket Contents
0                Morbius, Edward
                 Farman, Jerry
                 Stone, Jeremy
1                Grumby, Jonas
2                Hinkley, Roy
3                Wentworth, Eunice
                 Dutton, Charles
                 Poole, Frank
4                Floyd, Heywood
                 Bowman, David
                 Manchek, Arthur

You can see that the hash function (hash_string()) did a respectable job of distributing the names among the five hash buckets.
Notice that we did not have to assign a unique hash value to each key; hash keys are seldom unique. The important feature of a
good hash function is that it distributes a set of keys fairly evenly.

Now that you have a Hash index, how can you use it? First, let's try to insert a new name: Lowell, Freeman. The first thing you do
is run this name through your hash_string() function, giving you a hash value of 4. Now you know that if Lowell, Freeman is already
in the index, it will be in bucket number 4; all you have to do is search that one bucket for the name you are trying to insert.
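
To experiment with the idea, here is a small Python sketch of the same scheme. It is a port of the simplistic hash_string() function, not PostgreSQL's real hash index code, and the helper names build_buckets and contains are invented for the sketch. The bucket numbers it produces depend on the exact stored strings (trailing padding, for example, changes the sum), so they need not match the table above.

```python
def hash_string(key, bucket_count):
    """Sum the character codes in key, then take the remainder modulo bucket_count."""
    return sum(ord(ch) for ch in key) % bucket_count


def build_buckets(keys, bucket_count):
    """Distribute keys among bucket_count lists according to hash_string()."""
    buckets = [[] for _ in range(bucket_count)]
    for key in keys:
        buckets[hash_string(key, bucket_count)].append(key)
    return buckets


def contains(buckets, key):
    """Look for key in the single bucket it hashes to; other buckets are never read."""
    return key in buckets[hash_string(key, len(buckets))]


if __name__ == "__main__":
    names = ["Grumby, Jonas", "Hinkley, Roy", "Floyd, Heywood", "Bowman, David"]
    buckets = build_buckets(names, 5)
    for number, contents in enumerate(buckets):
        print(number, contents)
    print(contains(buckets, "Lowell, Freeman"))
```

The point of contains() is that an equality search reads one bucket, no matter how many buckets exist.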

There are a couple of important points to note about Hash indexes.

First, you may have noticed that each bucket can hold many keys. Another way to say this is that each key does not have a unique
hash value. If you have too many collisions (that is, too many keys hashing to the same bucket), performance will suffer. A good
hash function distributes keys evenly among all hash buckets.

Second, notice that a hash table is not sorted. The name Floyd, Heywood hashes to bucket 4, but Farman, Jerry hashes to
bucket 0. Consider the SELECT statement that we looked at earlier:

SELECT * FROM characters WHERE name >= 'Grumby' AND name < 'Moon';

To satisfy this query using a Hash index, you have to read the entire contents of each bucket. Bucket 1 contains one row that
meets the constraints (Grumby, Jonas), bucket 2 contains one row (Hinkley, Roy), and bucket 4 contains one row (Manchek, Arthur).
A Hash index offers no advantage to a range read. A Hash index is good for searches based on equality. For example, the SELECT
statement

SELECT * FROM characters WHERE name = 'Grumby, Jonas';

can be satisfied simply by hashing the string that you are searching for. A Hash index is also useful when you are joining two tables
where the join constraint is of the form table1-column = table2-column [8]. A Hash read cannot be used to avoid a sort required
to satisfy an ORDER BY clause.

[8]
This type of join is known as an equi-join.

PostgreSQL supports two other types of index structures: the R-Tree index and the GiST index. An R-Tree index is best suited for
indexing spatial (that is, geometric or geographic) data. A GiST index is a B-Tree index that can be extended by defining new query
predicates [9]. More information about GiST indexes can be found at http://gist.cs.berkeley.edu/.

[9]
A predicate is a test. A simple predicate is the less-than operator (<). An expression such as a < 5 tests whether
the value of a is less than 5. In this expression, < is the predicate and it is called the less-than predicate. Other
predicates are =, >, >=, and so on.

Tradeoffs

The previous section showed that PostgreSQL can use an index to speed the process of searching for data within a range of values
(or data with an exact value). Most queries (that is, SELECT commands) in PostgreSQL include a WHERE clause to limit the result
set. If you find that you are often searching for results based on a range of values for a specific column or group of columns, you
might want to consider creating an index that covers those columns.

However, you should be aware that an index represents a performance tradeoff. When you create an index, you are trading write
performance for read performance. An index can significantly reduce the amount of time it takes to retrieve data, but it will also
increase the amount of time it takes to INSERT, DELETE, and UPDATE data. Maintaining an index introduces substantial overhead
when you modify the data within a table.

You should consider this tradeoff when you feel the need to add a new index to a table. Adding an index to a table that is updated
frequently will certainly slow the updates. A good candidate for an index is a table that you SELECT from frequently but seldom
update. A customer list, for example, doesn't change often (possibly several times each day), but you probably query the customer
list frequently. If you find that you often query the customer list by phone number, it would be beneficial to index the phone
number column. On the other hand, a table that is updated frequently, but seldom queried, such as a transaction history table,
would be a poor choice for an index.

Creating an Index

Now that you have seen what an index can do, let's look at the process of adding an index to a table. The process of creating a new
index can range from simple to somewhat complex.

Let's add an index to the rentals table. Here is the structure of the rentals table for reference:

CREATE TABLE rentals
(
    tape_id     CHARACTER(8) REFERENCES tapes,
    customer_id INTEGER REFERENCES customers,
    rental_date DATE
);

The syntax for a simple CREATE INDEX command is

CREATE [UNIQUE] INDEX index-name ON table-name( column [,...] );

You want to index the rental_date column in the rentals table:

CREATE INDEX rentals_rental_date ON rentals ( rental_date );

You haven't specified any optional information in this command (I'll get to the options in a moment), so PostgreSQL creates a
B-Tree index named rentals_rental_date. PostgreSQL considers using this index whenever it finds a WHERE clause that refers to the
rental_date column using the <, <=, =, >=, or > operator. This index also can be used when you specify an ORDER BY clause that
sorts on the rental_date column.

Multicolumn Indexes

A B-Tree index (or a GiST index) can cover more than one column. Multicolumn indexes are usually created when
you have many values on the second column for each value in the first column. For example, you might want to
create an index that covers the rental_date and tape_id columns, because you have many different tapes rented on any
given date. PostgreSQL can use multicolumn indexes for selection or for ordering. When you create a multicolumn
index, the order in which you name the columns is important. PostgreSQL can use a multicolumn index when you
are selecting (or ordering by) a prefix of the key. In this context, a prefix may be the entire key or a leading portion
of the key. For example, the command SELECT * FROM rentals ORDER BY rental_date could not use an index
that covers tape_id plus rental_date, but it could use an index that covers rental_date plus tape_id.
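
The prefix rule is easier to see with a small model. In the Python sketch below, a sorted list of tuples stands in for a multicolumn index; the rental rows and the helper rentals_on are invented for illustration, and real B-Tree indexes are of course more elaborate than a sorted list.

```python
import bisect
from datetime import date, timedelta

# Hypothetical rentals rows as (rental_date, tape_id) pairs.
rentals = [
    (date(2001, 11, 25), "AB-67472"),
    (date(2001, 11, 25), "OW-41221"),
    (date(2001, 11, 26), "MC-68873"),
    (date(2001, 11, 24), "AB-12345"),
]

# An index on (rental_date, tape_id) is ordered by date first, then by tape.
index_date_tape = sorted(rentals)

# An index on (tape_id, rental_date) is ordered by tape first, so dates are scattered.
index_tape_date = sorted((tape, day) for day, tape in rentals)


def rentals_on(index, day):
    """Prefix lookup: every entry whose leading column equals day."""
    low = bisect.bisect_left(index, (day,))
    high = bisect.bisect_left(index, (day + timedelta(days=1),))
    return index[low:high]


if __name__ == "__main__":
    print(rentals_on(index_date_tape, date(2001, 11, 25)))
    print([day for day, _ in index_date_tape])   # already in rental_date order
    print([day for _, day in index_tape_date])   # not in rental_date order
```

Reading index_date_tape from front to back yields rows in rental_date order, which is why it can satisfy ORDER BY rental_date. Reading index_tape_date yields dates in no useful order, so a separate sort would still be needed.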

The index-name must be unique within the database: You can't have two indexes with the same name, even if they are defined on
different tables. New rows are indexed as they are added, and deleted rows are removed. If you change the rental_date for a
given row, the index will be updated automatically. If you have any data in the rentals table, each row will be included in the
index.

Indexes and NULL Values

Earlier, I mentioned that an index includes a pointer for every row in a table. That statement isn't 100% accurate.
PostgreSQL will not index NULL values in R-Tree, Hash, and GiST indexes. Because such an index will never include
NULL values, it cannot be used to satisfy the ORDER BY clause of a query that returns all rows in a table. For
example, if you define a GiST index covering the phone column in the customers table, that index would not include
rows where phone was NULL. If you executed the command SELECT * FROM customers ORDER BY phone,
PostgreSQL would have to perform a full table scan and then sort the results. If PostgreSQL tried to use the phone
index, it would not find all rows. If the phone column were defined as NOT NULL, then PostgreSQL could use the
index to avoid a sort. Or, if the SELECT command included the clause WHERE phone IS NOT NULL, PostgreSQL could
use the index to satisfy the ORDER BY clause. An R-Tree, Hash, or GiST index that covers an optional (that is,
NULLs-allowed) column will not be used to speed table joins, either.

A B-Tree index (the default index type) does include NULL values.

If you don't specify an index type when creating an index, you'll get a B-Tree index. Let's change the rentals_rental_date index
into a Hash index. First, drop the original index:

DROP INDEX rentals_rental_date;

Then you can create a new index:

CREATE INDEX rentals_rental_date ON rentals USING HASH ( rental_date );


The only difference between this CREATE INDEX command and the previous one is that I have included a USING clause. You can
specify USING BTREE (which is the default), USING HASH, USING RTREE, or USING GIST.

This index cannot be used to satisfy an ORDER BY clause. In fact, this index can be used only when rental_date is compared using
the = operator.

I dropped the B-Tree index before creating the Hash index, but that is not strictly necessary. It is perfectly valid (but unusual) to
have two or more indexes that cover the same column, as long as the indexes are uniquely named. If we had both a B-Tree index
and a Hash index covering the rental_date column, PostgreSQL could use the Hash index for = comparisons and the B-Tree index
for other comparisons.

Functional Indexes and Partial Indexes

Now let's look at two variations on the basic index types: functional indexes and partial indexes.

A column-based index catalogs the values found in a column (or a set of columns). A functional index (or more precisely a
function-valued index) catalogs the values returned by a given function. This might be easiest to understand by looking at an
example. Each row in the customers table contains a phone number. You can use the exchange [10] portion of the phone number to
determine whether a given customer is located close to your store. For example, you may know that the 555, 556, and 794 exchanges
are within five miles of your virtual video store. Let's create a function that extracts the exchange from a phone number:

[10]
In the U.S., a phone number is composed of an optional three-digit area code, a three-digit exchange, and a
four-digit, ummm, number.

-- exchange_index.sql
--
CREATE OR REPLACE FUNCTION get_exchange( CHARACTER )
RETURNS CHARACTER AS '

DECLARE
result CHARACTER(3);
BEGIN

result := SUBSTR( $1, 1, 3 );

return( result );
END;
' LANGUAGE 'plpgsql' WITH ( ISCACHABLE );

Don't be too concerned if this looks a bit confusing; I'll cover the PL/pgSQL language in more detail in Chapter 7, "PL/pgSQL." This
function (get_exchange()) accepts a single argument, presumably a phone number, and extracts the first three characters. You
can call this function directly from psql:

movies=# SELECT customer_name, phone, get_exchange( phone )
movies-# FROM customers;

    customer_name     |  phone   | get_exchange
----------------------+----------+--------------
 Jones, Henry         | 555-1212 | 555
 Rubin, William       | 555-2211 | 555
 Panky, Henry         | 555-1221 | 555
 Wonderland, Alice N. | 555-1122 | 555
 Wink Wankel          | 555-1000 | 555

You can see that given a phone number, get_exchange() returns the first three digits. Now let's create a function-valued index
that uses this function:

CREATE INDEX customer_exchange ON customers ( get_exchange( phone ));

When you insert a new row into a table that has a column-based index, PostgreSQL will index the values in the columns covered by
that index. When you insert a new row into a table that has a function-valued index, PostgreSQL will call the function that you
specified and then index the return value.

After the customer_exchange index exists, PostgreSQL can use it to speed up queries such as

SELECT * FROM customers WHERE get_exchange( phone ) = '555';


SELECT * FROM customers ORDER BY get_exchange( phone );
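
As a rough mental model (not PostgreSQL's actual implementation), a function-valued index stores the result of the function, computed once when a row is indexed, and a query on that expression looks up the stored result instead of calling the function for every row. The Python sketch below mirrors get_exchange(); build_function_index, customers_in_exchange, and the third customer row are invented for illustration.

```python
from collections import defaultdict


def get_exchange(phone):
    """Mirror of the PL/pgSQL get_exchange(): the first three characters of phone."""
    return phone[:3]


def build_function_index(rows, func, column):
    """Map func(row[column]) to the positions of the rows that produce that value."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[func(row[column])].append(position)  # function is called at index time
    return index


customers = [
    {"customer_name": "Jones, Henry", "phone": "555-1212"},
    {"customer_name": "Rubin, William", "phone": "555-2211"},
    {"customer_name": "Hypothetical, Customer", "phone": "794-0000"},  # invented row
]

customer_exchange = build_function_index(customers, get_exchange, "phone")


def customers_in_exchange(exchange):
    """Analogue of WHERE get_exchange( phone ) = exchange, answered from the index."""
    return [customers[i]["customer_name"] for i in customer_exchange.get(exchange, [])]


if __name__ == "__main__":
    print(customers_in_exchange("555"))
    print(customers_in_exchange("794"))
```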

Now you have an index that you can use to search the customer list for all customers that are geographically close. Let's pretend
that you occasionally want to send advertising flyers to those customers closest to you: you might never use the
customer_exchange index for any other purpose. If you need the customer_exchange index for only a small set of customers, why
bother maintaining that index for customers outside of your vicinity? This is where a partial index comes in handy. When you create
an index, you can include a WHERE clause in the CREATE INDEX command. Each time you insert (or update) a row, the WHERE clause
is evaluated. If a row satisfies the constraints of the WHERE clause, that row is included in the index; otherwise, the row is not
included in the index. Let's DROP the customer_exchange index and replace it with a partial, function-valued index:

movies=# DROP INDEX customer_exchange;
DROP
movies=# CREATE INDEX customer_exchange
movies-# ON customers ( get_exchange( phone ))
movies-# WHERE
movies-# get_exchange( phone ) = '555'
movies-# OR
movies-# get_exchange( phone ) = '556'
movies-# OR
movies-# get_exchange( phone ) = '794';
CREATE

Now the customer_exchange partial index contains entries only for customers in the 555, 556, or 794 exchange.

There are three performance advantages to a partial index:

• A partial index requires less disk space than a full index.

• Because fewer rows are cataloged in a partial index, the cost of maintaining the index is lower.

• When a partial index is used in a query, PostgreSQL will have fewer index entries to search.

Partial indexes and function-valued indexes are variations on the four basic index types. You can create a function-valued Hash
index, B-Tree index, R-Tree index, or GiST index. You can also create a partial variant of any index type. And, as you have seen,
you can create partial function-valued indexes (of any type). A function-valued index doesn't change the organization of an index,
just the values that are actually included in the index. The same is true for a partial index.
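
A partial index can be modeled the same way: the WHERE clause becomes a predicate that is checked whenever a row is indexed, and rows that fail the test are simply left out. The Python sketch below is illustrative only; the PartialIndex class, build_partial_index, and every phone number other than 555-1212 are invented for the example.

```python
NEARBY_EXCHANGES = {"555", "556", "794"}


def get_exchange(phone):
    """First three characters of the phone number."""
    return phone[:3]


class PartialIndex:
    """Toy index that catalogs only the rows accepted by predicate."""

    def __init__(self, key_func, predicate):
        self.key_func = key_func
        self.predicate = predicate
        self.entries = {}  # key -> list of row ids

    def insert(self, row_id, value):
        """Index the row if it passes the predicate; report whether it was indexed."""
        if not self.predicate(value):
            return False  # row lies outside the indexed subset, nothing to maintain
        self.entries.setdefault(self.key_func(value), []).append(row_id)
        return True


def build_partial_index(phones_by_id):
    index = PartialIndex(get_exchange, lambda phone: get_exchange(phone) in NEARBY_EXCHANGES)
    for row_id, phone in phones_by_id.items():
        index.insert(row_id, phone)
    return index


sample_phones = {1: "555-1212", 2: "556-0000", 3: "212-9999", 4: "794-1234"}

if __name__ == "__main__":
    print(build_partial_index(sample_phones).entries)
```

Row 3 never reaches the index, which is where the savings in space and maintenance come from.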

Creating Indexes on Array Values

Most indexes cover scalar-valued columns (columns that store a single value). PostgreSQL also allows you to define indexes that
cover array values. In fact, you can create an index that covers the entire array or (starting with PostgreSQL version 7.4) an index
that covers individual elements within an array. In Chapter 2 we showed you a modified version of the customers table that
included an array column (monthly_balances). You can add this column to your working copy of the customers table with the
following command:

movies=# ALTER TABLE customers
movies-# ADD COLUMN
movies-# monthly_balances DECIMAL( 7, 2 )[ 12 ];
ALTER TABLE

To create an index that covers a single element of the monthly_balances array (say, the element corresponding to the month of
February), you could execute the following command:

movies=# CREATE INDEX customers_feb
movies-# ON customers(( monthly_balances[2] ));
CREATE INDEX

Notice that you need an extra set of parentheses around monthly_balances[2]. Once you've created the customers_feb index,
PostgreSQL can use it to satisfy queries such as

movies=# SELECT * FROM customers WHERE monthly_balances[2] = 10;


movies=# SELECT * FROM customers ORDER BY monthly_balances[2];

To create an index that covers the entire monthly_balances array, execute the command

movies=# CREATE INDEX customers_by_monthly_balance
movies-# ON customers( monthly_balances );
CREATE INDEX

When you create an index that covers an array column, the syntax is the same as you would use to cover a scalar (single-valued)
column. The PostgreSQL optimizer can use the customers_by_monthly_balance index to satisfy an ORDER BY clause such as

movies=# SELECT * FROM customers ORDER BY monthly_balances;

However, you may be surprised to find that the optimizer will not use customers_by_monthly_balance to satisfy a WHERE clause
such as

movies=# SELECT * FROM customers WHERE monthly_balances[1] = 10;

The PostgreSQL optimizer will use the customers_by_monthly_balance index to satisfy a WHERE clause that compares the entire
monthly_balances array against another array, like this:

movies=# SELECT * FROM customers WHERE monthly_balances = '{10}';

But be aware that these queries are not equivalent. The first WHERE clause (monthly_balances[1] = 10) selects any row where
monthly_balances[1] is equal to 10, regardless of the other monthly_balances in that row. The second WHERE clause
(monthly_balances = '{10}') selects only those rows where monthly_balances[1] = 10 and all other monthly_balances values
are NULL.
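
The distinction between comparing one element and comparing the whole array can be illustrated with ordinary Python lists. This is an analogy rather than a model of PostgreSQL arrays: the sample rows are invented, and a Python list represents months with no balance by leaving them out, not by storing NULL.

```python
# Hypothetical customers with different monthly_balances arrays.
balance_rows = [
    {"customer_id": 1, "monthly_balances": [10]},
    {"customer_id": 2, "monthly_balances": [10, 25, 0]},
    {"customer_id": 3, "monthly_balances": [5, 10]},
]


def first_element_equals(rows, value):
    """Analogue of WHERE monthly_balances[1] = value (SQL arrays count from 1)."""
    return [
        row["customer_id"]
        for row in rows
        if row["monthly_balances"] and row["monthly_balances"][0] == value
    ]


def whole_array_equals(rows, array):
    """Analogue of WHERE monthly_balances = '{...}': the entire array must match."""
    return [row["customer_id"] for row in rows if row["monthly_balances"] == array]


if __name__ == "__main__":
    print(first_element_equals(balance_rows, 10))
    print(whole_array_equals(balance_rows, [10]))
```

Customers 1 and 2 both have 10 as their first balance, but only customer 1 has an array that consists of that single value and nothing else.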

Indexes and Tablespaces

When you create an index, you can tell PostgreSQL to store the index in a specific tablespace by including a TABLESPACE
tablespacename clause, like this:

CREATE INDEX rentals_rental_date
    ON rentals ( rental_date ) TABLESPACE mytablespace;

If you don't specify a tablespace, PostgreSQL creates the index in the tablespace assigned to the table that you are indexing. You
can move an existing index to a different tablespace using the ALTER INDEX command. For example, to move the
rentals_rental_date index to mytablespace, you would execute the command

ALTER INDEX rentals_rental_date SET TABLESPACE mytablespace;

You may want to store a table and its indexes in different tablespaces in order to spread the workload among multiple physical disk
drives.

Getting Information About Databases and Tables

When you create a table, PostgreSQL stores the definition of that table in the system catalog. The system catalog is a collection of
PostgreSQL tables. You can issue SELECT statements against the system catalog tables just like any other table, but there are easier
ways to view table and index definitions.

When you are using the psql client application, you can view the list of tables defined in your database using the \d meta-command:

movies=# \d
List of relations
Name | Type | Owner
------------------+-------+---------------
customers | table | bruce
rentals | table | bruce
tapes | table | bruce

To see the detailed definition of a particular table, use the \d table-name meta-command:

movies=# \d tapes
Table "tapes"
Column | Type | Modifiers
----------+-----------------------+-----------
tape_id | character(8) | not null
title | character varying(80) | not null
duration | interval |

You can also view a list of all indexes defined in your database. The \di meta-command displays indexes:

movies=# \di
List of relations
Schema | Name | Type | Owner | Table
--------+---------------------------+-------+-------+-----------
public | customers_customer_id_key | index | korry | customers

You can see the full definition for any given index using the \d index-name meta-command:

movies=# \d customers_customer_id_key
Index "public.customers_customer_id_key"
Column | Type
-------------+---------
customer_id | integer
UNIQUE, btree, for table "public.customers"

Table 3.1 shows a complete list of the system catalog-related meta-commands in psql:

Table 3.1. System Catalog Meta-Commands

Command              Result
\dd object-name      Display comments for object-name
\db                  List all tablespaces
\dn                  List all schemas
\d or \dt            List all tables
\di                  List all indexes
\ds                  List all sequences
\dv                  List all views
\dS                  List all PostgreSQL-defined tables
\d table-name        Show table definition
\d index-name        Show index definition
\d view-name         Show view definition
\d sequence-name     Show sequence definition
\dp                  List all privileges
\dl                  List all large objects
\da                  List all aggregates
\df                  List all functions
\dc                  List all conversions
\dC                  List all casts
\df function-name    List all functions with given name
\do                  List all operators
\do operator-name    List all operators with given name
\dT                  List all types
\dD                  List all domains
\dg                  List all groups
\du                  List all users
\l                   List all databases in this cluster

Alternative Views (Oracle-Style Dictionary Views)

One of the nice things about an open-source product is that code contributions come from many different places. One
such project exists to add Oracle-style dictionary views to PostgreSQL. If you are an experienced Oracle user, you will
appreciate this feature. The orapgsqlviews project contributes Oracle-style views such as all_views, all_tables,
user_tables, and so on. For more information, see http://gborg.postgresql.org.

PostgreSQL version 8.0 introduced a set of views known as the INFORMATION_SCHEMA. The views defined in the INFORMATION_SCHEMA
give you access to the information stored in the PostgreSQL system tables. The INFORMATION_SCHEMA is defined as part of the SQL
standard and you'll find an INFORMATION_SCHEMA in most commercial (and a few open-source) database systems. If you become familiar
with the views defined in the INFORMATION_SCHEMA, you'll find it much easier to move from one RDBMS system to another: every
INFORMATION_SCHEMA contains the same set of views, each containing the same set of columns. For example, to see a list of the tables
defined in your current database, you could execute the command:

SELECT table_schema, table_name, table_type FROM information_schema.tables;

You can execute that same query in DB2, MS SQL Server, or Informix (sadly, Oracle doesn't support the INFORMATION_SCHEMA standard
at the time we are writing this). So what can you find in the INFORMATION_SCHEMA?

• schemata: Lists the schemas (in the current database) that are owned by you

• tables: Lists all tables in the current database (actually, you only see those tables that you have the right to access in some
way)

• columns: Lists all columns in all tables that you have the right to access

• views: Lists all of the views you have access to in the current database

• table_privileges: Shows the privileges you hold (or that you granted) for each accessible object in the current database

• domains: Lists all of the domains defined in the current database

• check_constraints: Lists all of the CHECK constraints defined for the accessible tables (or domains) in the current database

There are more views in the INFORMATION_SCHEMA than we've described here (in fact, there are a total of 39 INFORMATION_SCHEMA views
in PostgreSQL 8.0). See Chapter 30, "The Information Schema," of the PostgreSQL user guide for a complete list.

Why would you want to use the INFORMATION_SCHEMA instead of psql's \d commands? We can think of three reasons. First, you can use
the INFORMATION_SCHEMA inside of your own client applications; you can't do that with the \d commands because they are part of the
psql console application (itself a PostgreSQL client) instead of the PostgreSQL server. Second, by using the views defined in the
INFORMATION_SCHEMA, you can read the PostgreSQL system tables using the same queries that you would use to read the DB2 system
tables (or Sybase or SQL Server). That makes your client applications a bit more portable. Finally, you can write custom queries against
the views defined in the INFORMATION_SCHEMA; you can't customize the \d commands. For example, if you need to find all of the date
columns in your database, just look inside of INFORMATION_SCHEMA.columns, like this:

SELECT DISTINCT table_name
    FROM information_schema.columns WHERE data_type = 'date';

Need to know which columns can hold a NUMERIC value of at least seven digits? Use this query:

SELECT table_name, column_name, numeric_precision
    FROM information_schema.columns
    WHERE data_type = 'numeric' AND numeric_precision >= 7;

Of course, you can find all the information exposed by the INFORMATION_SCHEMA in the PostgreSQL system tables (pg_class, pg_index,
and so on), but the INFORMATION_SCHEMA is often much easier to work with. The INFORMATION_SCHEMA views usually contain
human-readable names for things like data type names, table names, and so on. The PostgreSQL system tables typically contain OIDs
that you have to JOIN to another table in order to come up with a human-readable name.

Transaction Processing

Now let's move on to an important feature in any database system: transaction processing.

A transaction is a group of one or more SQL commands treated as a unit. PostgreSQL promises that all commands within a
transaction will complete or that none of them will complete. If any command within a transaction does not complete, PostgreSQL
will roll back all changes made within the transaction.

PostgreSQL makes use of transactions to ensure database consistency. Transactions are needed to coordinate updates made by
two or more concurrent users. Changes made by a transaction are not visible to other users until the transaction is committed.
When you commit a transaction, you are telling PostgreSQL that all the changes made within the transaction are logically
complete, the changes should be made permanent, and the changes should be exposed to other users. When you roll back a
transaction, you are telling PostgreSQL that the changes made within the transaction should be discarded and not made visible to
other users.

To start a new transaction, execute a BEGIN command. To complete the transaction and have PostgreSQL make your changes permanent, execute the COMMIT command. If you want PostgreSQL to revert all changes made within the current transaction, execute the ROLLBACK command [11].

[11]
BEGIN can also be written as BEGIN WORK or BEGIN TRANSACTION. COMMIT can also be written as COMMIT WORK or COMMIT TRANSACTION. ROLLBACK can also be written as ROLLBACK WORK or ROLLBACK TRANSACTION.

It's important to realize that all SQL commands execute within a transaction. If you don't explicitly BEGIN a transaction, PostgreSQL will automatically execute each command within its own transaction.

Persistence

I used to think that single-command transactions were pretty useless: I was wrong. Single-command transactions are important because a single command can access multiple rows. Consider the following: Let's add a new constraint to the customers table.

movies=# ALTER TABLE customers ADD CONSTRAINT
movies-# balance_exceeded CHECK( balance <= 50 );

This constraint ensures that no customer is allowed to have a balance exceeding $50.00. Just to prove that it works, let's try setting a customer's balance to some value greater than $50.00:

movies=# UPDATE CUSTOMERS SET balance = 100 where customer_id = 1;
ERROR: ExecReplace: rejected due to CHECK constraint balance_exceeded

You can see that the UPDATE is rejected. What happens if you try to update more than one row? First, let's look at the data already in the customers table:

movies=# SELECT * FROM customers;
customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
8 | Wink Wankel | 555-1000 | 1988-12-25 | 0.00
(5 rows)

Now, try to UPDATE every row in this table:

movies=# UPDATE customers SET balance = balance + 40;
ERROR: ExecReplace: rejected due to CHECK constraint balance_exceeded

This UPDATE command is rejected because adding $40.00 to the balance for Rubin, William violates the balance_exceeded constraint. The question is, were any of the customers updated before the error occurred? The answer is: probably. You don't really know for sure because any changes made before the error occurred are rolled back. The net effect is that no changes were made to the database:

movies=# SELECT * FROM customers;
customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
8 | Wink Wankel | 555-1000 | 1988-12-25 | 0.00
(5 rows)

If some of the changes persisted while others did not, you would have to somehow find the persistent changes yourself and revert them. You can see that single-command transactions are far from useless. It took me a while to learn that lesson.

What about multicommand transactions? PostgreSQL treats a multicommand transaction in much the same way that it treats a single-command transaction. A transaction is atomic, meaning that all the commands within the transaction are treated as a single unit. If any of the commands fail to complete, PostgreSQL reverts the changes made by other commands within the transaction.
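
The all-or-nothing behavior of a single multi-row UPDATE can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL (so the error text and numeric formatting differ from the psql session above), and the table is a trimmed-down copy of customers with just three columns.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for PostgreSQL here.
# isolation_level=None puts the connection in autocommit mode, so each
# statement runs in its own transaction, as described in the text.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE customers ("
    " customer_id INTEGER PRIMARY KEY,"
    " customer_name TEXT,"
    " balance NUMERIC CONSTRAINT balance_exceeded CHECK (balance <= 50))"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Jones, Henry", 0.00),
        (2, "Rubin, William", 15.00),
        (3, "Panky, Henry", 0.00),
        (4, "Wonderland, Alice N.", 3.00),
        (8, "Wink Wankel", 0.00),
    ],
)

query = "SELECT customer_id, balance FROM customers ORDER BY customer_id"
before = conn.execute(query).fetchall()

# Adding 40 pushes Rubin, William to 55, which violates the CHECK
# constraint, so the whole UPDATE is rejected.
try:
    conn.execute("UPDATE customers SET balance = balance + 40")
    update_failed = False
except sqlite3.IntegrityError:
    update_failed = True

after = conn.execute(query).fetchall()

print("update failed:", update_failed)
print("table unchanged:", before == after)
```

Even if some rows were modified before the constraint violation was detected, none of those modifications survive: the table contents before and after the failed command are identical.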

Transaction Isolation

I mentioned earlier in this section that the changes made within a transaction are not visible to other users until the transaction is committed. To be a bit more precise, uncommitted changes made in one transaction are not visible to other transactions [12].

[12]
This distinction is important when using (or developing) a client that opens two or more connections to the same database. Transactions are not shared between multiple connections. If you make an uncommitted change using one connection, those changes will not be visible to the other connection (until committed).

Transaction isolation helps to ensure consistent data within a database. Let's look at a few of the problems solved by transaction isolation.

Consider the following transactions:

User: bruce                      Time   User: sheila
BEGIN TRANSACTION                T1     BEGIN TRANSACTION
UPDATE customers                 T2
  SET balance = balance - 3
  WHERE customer_id = 2;
                                 T3     SELECT SUM( balance )
                                          FROM customers;
                                 T4     COMMIT TRANSACTION;
ROLLBACK TRANSACTION;            T5

At time T1, bruce and sheila each begin a new transaction. bruce updates the balance for customer 2 at time T2. At time T3, sheila computes the SUM() of the balances for all customers, completing her transaction at time T4. At time T5, bruce rolls back his transaction, discarding all changes within his transaction. If these transactions were not isolated from each other, sheila would have an incorrect answer: her answer was calculated using data that was rolled back.

This problem is known as the dirty read problem: without transaction isolation, sheila would read uncommitted data. The solution to this problem is known as READ COMMITTED. READ COMMITTED is one of the two transaction isolation levels supported by PostgreSQL. A transaction running at the READ COMMITTED isolation level is not allowed to read uncommitted data. I'll show you how to change transaction levels in a moment.

There are other data consistency problems that are avoided by isolating transactions from each other. In the following scenario, sheila will receive two different answers within the same transaction:

User: bruce                      Time   User: sheila
BEGIN TRANSACTION;               T1     BEGIN TRANSACTION;
                                 T2     SELECT balance
                                          FROM customers
                                          WHERE customer_id = 2;
UPDATE customers                 T3
  SET balance = 20
  WHERE customer_id = 2;
COMMIT TRANSACTION;              T4
                                 T5     SELECT balance
                                          FROM customers
                                          WHERE customer_id = 2;
                                 T6     COMMIT TRANSACTION;

Again, bruce and sheila each start a transaction at time T1. At T2, sheila finds that customer 2 has a balance of $15.00. bruce changes the balance for customer 2 from $15.00 to $20.00 at time T3 and commits his change at time T4. At time T5, sheila executes the same query that she executed earlier in the transaction, but this time she finds that the balance is $20.00. In some applications, this isn't a problem; in others, this interference between the two transactions is unacceptable. This problem is known as the non-repeatable read.

Here is another type of problem:

User: bruce                      Time   User: sheila
BEGIN TRANSACTION;               T1     BEGIN TRANSACTION;
                                 T2     SELECT * FROM customers;
INSERT INTO customers VALUES     T3
  (
    6,
    'Neville, Robert',
    '555-9999',
    '1971-03-20',
    0.00
  );
COMMIT TRANSACTION;              T4
                                 T5     SELECT * FROM customers;
                                 T6     COMMIT TRANSACTION;

In this example, sheila again executes the same query twice within a single transaction. This time, bruce has inserted a new row in between sheila's queries. Notice that this is not a case of a dirty read: bruce has committed his change before sheila executes her second query. At time T5, sheila finds a new row. This is similar to the non-repeatable read, but this problem is known as the phantom read problem.

The answer to both the non-repeatable read and the phantom read is the SERIALIZABLE transaction isolation level. A transaction running at the SERIALIZABLE isolation level is only allowed to see data committed before the transaction began.

In PostgreSQL, transactions usually run at the READ COMMITTED isolation level. If you need to avoid the problems present in READ COMMITTED, you can change isolation levels using the SET TRANSACTION command. The syntax for the SET TRANSACTION command is

SET TRANSACTION ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE };

The SET TRANSACTION command affects only the current transaction (and it must be executed before the first DML [13] command within the transaction). If you want to change the isolation level for your session (that is, change the isolation level for future transactions), you can use the SET SESSION command:

[13]
A DML (data manipulation language) command is any command that can update or read the data within a table. SELECT, INSERT, UPDATE, FETCH, and COPY are DML commands.

SET SESSION CHARACTERISTICS AS
TRANSACTION ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE }

PostgreSQL version 8.0 introduces a new transaction processing feature called a SAVEPOINT. A SAVEPOINT is a named marker that you define within the stream of commands that make up a transaction. Once you've defined a SAVEPOINT, you can ROLLBACK any changes that you've made since that point without discarding changes made prior to the SAVEPOINT. In other words, you can ROLLBACK part of a transaction (the trailing part) without rolling back the entire transaction. To create a SAVEPOINT, execute a SAVEPOINT command within a transaction. The syntax for a SAVEPOINT command is very simple:

SAVEPOINT savepoint-name

The savepoint-name must follow the normal rules for an identifier; it must be unique within the first 64 characters and must start with a letter or underscore (or it must be a quoted identifier). A SAVEPOINT gives a name to a point in time; in particular, a point between two SQL commands. Consider the following sequence:

movies=# SELECT customer_id, customer_name FROM customers;
customer_id | customer_name
-------------+----------------------
3 | Panky, Henry
1 | Jones, Henry
4 | Wonderland, Alice N.
2 | Rubin, William
(4 rows)

movies=# START TRANSACTION;
START TRANSACTION

movies=# INSERT INTO customers VALUES( 5, 'Kemp, Hans' );
INSERT 44272 1

movies=# SELECT * FROM customers;
customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
5 | Kemp, Hans | | |
(5 rows)

At this point, you've started a new transaction and inserted a new row, but you haven't committed your changes yet. Now define a SAVEPOINT named P1 and insert a second row:

movies=# SAVEPOINT P1;
SAVEPOINT

movies=# INSERT INTO customers VALUES( 6, 'Falkstein, Gerhard' );
INSERT 44273 1

The SAVEPOINT command inserted a marker into the transaction stream. If you execute a ROLLBACK command at this point, both of the newly inserted rows will be discarded (in other words, all of the changes you've made in this transaction will be rolled back):

movies=# ROLLBACK;
ROLLBACK
movies=# SELECT * FROM customers;
customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
(4 rows)

Now repeat the same sequence of commands, but this time around, execute a qualified ROLLBACK command, like this:

movies=# ROLLBACK TO SAVEPOINT P1;
ROLLBACK
movies=# SELECT * FROM customers;
customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
5 | Kemp, Hans | | |
(5 rows)

When you ROLLBACK to a SAVEPOINT, changes made since the SAVEPOINT are discarded, but not changes made before the SAVEPOINT. So, you see that the customers table retains the first row that you inserted, but not the second row. When you ROLLBACK to a SAVEPOINT, you are still in the middle of a transaction; you must complete the transaction with a COMMIT or ROLLBACK command.

Here are a few important points to keep in mind when you're working with SAVEPOINTs:

• You can nest SAVEPOINTs. For example, if you create a SAVEPOINT named P1, then create a second SAVEPOINT named P2, you have created a nested SAVEPOINT (P2 is nested within P1). If you ROLLBACK TO SAVEPOINT P2, PostgreSQL discards any changes made since P2, but preserves changes made between P1 and P2. On the other hand, if you ROLLBACK TO SAVEPOINT P1, PostgreSQL discards all changes made since P1, including all changes made since P2. Nested SAVEPOINTs are handy when you are working with a multilevel table structure such as ORDERS and LINEITEMS (where you have multiple line items per order). If you define a SAVEPOINT prior to modifying each order, and a second, nested SAVEPOINT prior to modifying each line item, you can ROLLBACK changes made to a single line item, changes made to a single order, or an entire transaction.

• You can use the same SAVEPOINT name as often as you like within a single transaction; the new SAVEPOINT simply replaces the old SAVEPOINT [14]. Again, this is useful when you are working with a multilevel table structure. If you create a SAVEPOINT prior to processing each line item and you give each of those SAVEPOINTs the same name, you can ROLLBACK changes made to the most recently processed line item.

[14]
PostgreSQL doesn't follow the SQL standard when you create two SAVEPOINTs with the same name within the same transaction. PostgreSQL simply hides the old SAVEPOINT; the SQL standard states that the old SAVEPOINT should be destroyed. If you need the SQL-prescribed behavior, you can destroy the old SAVEPOINT with the command RELEASE SAVEPOINT savepoint-name.

• If you ROLLBACK to a SAVEPOINT, the SAVEPOINT is not destroyed; you can make more changes in the transaction and ROLLBACK to the SAVEPOINT again. However, any SAVEPOINTs nested within that SAVEPOINT will be destroyed. To continue the ORDERS and LINEITEMS example, if you ROLLBACK the changes made to an ORDERS row, you also discard changes made to the LINEITEMS for that ORDER and you destroy the SAVEPOINT that you created for the most recent line item.

• If you make a mistake (such as a typing error), you can ROLLBACK to the most recent SAVEPOINT instead of abandoning the entire transaction. That's a very nice feature. If you've used PostgreSQL for any length of time, you've surely exercised your vocabulary after watching PostgreSQL throw out a long and complex transaction because you made a simple typing error. If you insert SAVEPOINTs in your transaction, you won't lose as much work when your fingers fumble a table name.
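
The SAVEPOINT walkthrough above can be replayed from a program. The following is a minimal sketch, not a PostgreSQL client: it assumes SQLite (through Python's built-in sqlite3 module) as a stand-in, because SQLite accepts the same SAVEPOINT and ROLLBACK TO SAVEPOINT commands used here. The two-column customers table and the customer with customer_id 7 are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
# isolation_level=None means transactions are controlled explicitly
# with BEGIN / COMMIT, as in the psql session.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT)"
)

conn.execute("BEGIN")
conn.execute("INSERT INTO customers VALUES (5, 'Kemp, Hans')")

# Mark a point in the transaction, then make a change after it.
conn.execute("SAVEPOINT p1")
conn.execute("INSERT INTO customers VALUES (6, 'Falkstein, Gerhard')")

# Discard only the work done since p1; the first INSERT is kept.
conn.execute("ROLLBACK TO SAVEPOINT p1")

# The savepoint still exists after rolling back to it, so it can be
# used a second time (customer 7 is a made-up row for illustration).
conn.execute("INSERT INTO customers VALUES (7, 'Example, Customer')")
conn.execute("ROLLBACK TO SAVEPOINT p1")

# The transaction is still open and must be completed explicitly.
conn.execute("COMMIT")

names = [
    row[0]
    for row in conn.execute(
        "SELECT customer_name FROM customers ORDER BY customer_id"
    )
]
print(names)
```

Only the row inserted before the SAVEPOINT survives the COMMIT, which matches the result of the ROLLBACK TO SAVEPOINT P1 session shown earlier.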

Multi-Versioning and Locking

Most commercial (and open-source) databases use locking to coordinate multiuser updates. If you are modifying a table, that table is locked against updates and queries made by other users. Some databases perform page-level or row-level locking to reduce contention, but the principle is the same: other users must wait to read the data you have modified until you have committed your changes.

PostgreSQL uses a different model called multi-versioning, or MVCC for short (locks are still used, but much less frequently than you might expect). In a multi-versioning system, the database creates a new copy of the rows you have modified. Other users see the original values until you commit your changes; they don't have to wait until you finish. If you roll back a transaction, other users are not affected; they did not have access to your changes in the first place. If you commit your changes, the original rows are marked as obsolete and other transactions running at the READ COMMITTED isolation level will see your changes. Transactions running at the SERIALIZABLE isolation level will continue to see the original rows. Obsolete data is not automatically removed from a PostgreSQL database. It is hidden, but not removed. You can remove obsolete rows using the VACUUM command. The syntax of the VACUUM command is

VACUUM [ VERBOSE ] [ ANALYZE ] [ table ]

I'll talk about the VACUUM command in more detail in the next chapter.

The MVCC transaction model provides for much higher concurrency than most other models. Even though PostgreSQL uses multiple versions to isolate transactions, it is still necessary to lock data in some circumstances.
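
To make the multi-versioning idea more concrete, here is a toy model in Python. It is a deliberately simplified sketch and not a description of PostgreSQL's internals: the ToyStore and ToyReader classes are hypothetical, only committed row versions are modeled, and a simple commit counter plays the role of a clock. A READ COMMITTED reader looks at the latest committed version each time it runs a query, whereas a SERIALIZABLE reader keeps using the snapshot it took when its transaction started.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RowVersion:
    value: float
    committed_at: int  # value of the commit counter when this version appeared


class ToyStore:
    """Keeps every committed version of every row (nothing is vacuumed)."""

    def __init__(self) -> None:
        self.commit_counter = 0
        self.versions: Dict[int, List[RowVersion]] = {}

    def commit_write(self, key: int, value: float) -> None:
        # A committed change adds a new version; the old one becomes obsolete
        # but stays in the store.
        self.commit_counter += 1
        self.versions.setdefault(key, []).append(
            RowVersion(value, self.commit_counter)
        )

    def read(self, key: int, snapshot: int) -> Optional[float]:
        # Return the newest version that was committed at or before the snapshot.
        visible = [v for v in self.versions.get(key, []) if v.committed_at <= snapshot]
        return visible[-1].value if visible else None


class ToyReader:
    """A reading transaction running at one of the two isolation levels."""

    def __init__(self, store: ToyStore, level: str) -> None:
        self.store = store
        self.level = level
        self.start_snapshot = store.commit_counter

    def select(self, key: int) -> Optional[float]:
        if self.level == "SERIALIZABLE":
            snapshot = self.start_snapshot          # frozen at transaction start
        else:
            snapshot = self.store.commit_counter    # refreshed for every query
        return self.store.read(key, snapshot)


store = ToyStore()
store.commit_write(2, 15.00)                 # customer 2 starts with $15.00

rc = ToyReader(store, "READ COMMITTED")
ser = ToyReader(store, "SERIALIZABLE")
first_rc, first_ser = rc.select(2), ser.select(2)

store.commit_write(2, 20.00)                 # another user commits an update

second_rc, second_ser = rc.select(2), ser.select(2)
print(first_rc, second_rc)    # the READ COMMITTED reader sees the new balance
print(first_ser, second_ser)  # the SERIALIZABLE reader still sees the original
```

In this model the obsolete $15.00 version has to stay in the store for as long as some reader might still need it, which gives an intuition for why obsolete rows are hidden rather than removed immediately.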

Try this experiment. Open two psql sessions, each connected to the movies database. In one session, enter the following commands:

movies=# BEGIN WORK;
BEGIN
movies=# INSERT INTO customers VALUES
movies-# ( 5, 'Manyjars, John', '555-8000', '1960-04-02', 0 );
INSERT

In the other session, enter these commands:

movies=# BEGIN WORK;
BEGIN
movies=# INSERT INTO customers VALUES
movies-# ( 6, 'Smallberries, John', '555-8001', '1960-04-02', 0 );
INSERT

When you press the Enter (or Return) key, this INSERT statement completes immediately. Now, enter this command into the second session:

movies=# INSERT INTO customers VALUES
movies-# ( 5, 'Gomez, John', '555-8000', '1960-04-02', 0 );

This time, when you press Enter, psql hangs. What is it waiting for? Notice that in the first session, you already added a customer whose customer_id is 5, but you have not yet committed this change. In the second session, you are also trying to insert a customer whose customer_id is 5. You can't have two customers with the same customer_id (because you have defined the customer_id column to be the unique PRIMARY KEY). If you commit the first transaction, the second session would receive a duplicate value error. If you roll back the first transaction, the second insertion will continue (because there is no longer a constraint violation). PostgreSQL won't know which result to give you until the transaction completes in the first session.

Summary

Chapter 1, "Introduction to PostgreSQL and SQL," showed you some of the basics of retrieving and modifying data using PostgreSQL. In Chapter 2, "Working with Data in PostgreSQL," you learned about the many data types offered by PostgreSQL. This chapter has filled in some of the scaffolding: you've seen how to create new databases, new tables, and new indexes. You've also seen how PostgreSQL solves concurrency problems through its multi-versioning transaction model.

The next chapter, Chapter 4, "Performance," should help you understand how the PostgreSQL server decides on the fastest way to execute your SQL commands.

Chapter 4. Performance

In the previous three chapters, you have seen how to create new databases and tables. You have also seen a variety of ways to retrieve data. Inevitably, you will run into a performance problem. At some point, PostgreSQL won't process data as quickly as you would like. This chapter should prepare you for that situation: after reading this chapter, you'll have a good understanding of how PostgreSQL executes a query and what you can do to make queries run faster.

How PostgreSQL Organizes Data

Before you can really dig into the details of performance tuning, you need to understand some of the basic architecture of PostgreSQL.

You already know that in PostgreSQL, data is stored in tables and tables are grouped into databases. At the highest level of organization, databases are grouped into clusters; a cluster of databases is serviced by a postmaster.

Let's see how this data hierarchy is stored on disk. You can see all databases in a cluster using the following query:

perf=# SELECT datname, oid FROM pg_database;
datname | oid
-----------+-------
perf | 16556
template1 | 1
template0 | 16555

From this list, you can see that I have three databases in this cluster. You can find the storage for these databases by looking in the $PGDATA directory:

$ cd $PGDATA
$ ls
base pg_clog pg_ident.conf pg_xlog postmaster.opts
global pg_hba.conf PG_VERSION postgresql.conf postmaster.pid

The $PGDATA directory has a subdirectory named base. The base subdirectory is where your databases reside:

$ cd ./base
$ ls -l
total 12
drwx------ 2 postgres pgadmin 4096 Jan 01 20:53 1
drwx------ 2 postgres pgadmin 4096 Jan 01 20:53 16555
drwx------ 3 postgres pgadmin 4096 Jan 01 22:38 16556

Notice that there are three subdirectories underneath $PGDATA/base. The name of each subdirectory corresponds to the oid of one entry in the pg_database table: the subdirectory named 1 contains the template1 database, the subdirectory named 16555 contains the template0 database, and the subdirectory named 16556 contains the perf database.

Let's look a little deeper:

$ cd ./1
$ ls
1247 16392 16408 16421 16429 16441 16449 16460 16472
1249 16394 16410 16422 16432 16442 16452 16462 16474
1255 16396 16412 16423 16435 16443 16453 16463 16475
1259 16398 16414 16424 16436 16444 16454 16465 16477
16384 16400 16416 16425 16437 16445 16455 16466 pg_internal.init
16386 16402 16418 16426 16438 16446 16456 16468 PG_VERSION
16388 16404 16419 16427 16439 16447 16457 16469
16390 16406 16420 16428 16440 16448 16458 16471

Again, you see a lot of files with numeric filenames. You might guess that these numbers also correspond to oids, and (by chance) you would often be correct. Every table (and index) in a database is catalogued in the pg_class system table. To find a table in pg_class, search for a row where the relname column is equal to the name of the table. For example, to find the pg_class entry for the pg_group table, execute the command

SELECT * FROM pg_class WHERE relname = 'pg_group';

The pg_class.relfilenode value for a table determines the name of the file that stores the table (likewise, the pg_class.relfilenode value for an index determines the name of the file that stores the index). In most cases, the OID of a table's pg_class entry matches the table's relfilenode, but that's not always the case. If you ALTER a table in such a way that PostgreSQL must first make a new copy of the table and then drop the original, the table's relfilenode will change. The relfilenode may also change if you CLUSTER the table (PostgreSQL makes a new copy of the table in the desired order and then drops the original). The relfilenode value for an index may change if you rebuild the index with a REINDEX command, or if you ALTER the data type of a column covered by the index.

To see the correspondence between a table's relfilenode and its filename, simply compare the output from the following SELECT command to the filesystem directory that contains the database:

test=# SELECT relfilenode, relname FROM pg_class ORDER BY relfilenode;
relfilenode | relname
-------------+---------------------------------
0 | pg_xactlock
1247 | pg_type
1249 | pg_attribute
1255 | pg_proc
1259 | pg_class
... | ...

Each table is stored in its own disk file and the name of the file is determined by the relfilenode of the table's entry in the pg_class table.
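
The directory layout described so far ($PGDATA, then base, then the database's oid, then the table's relfilenode) can be summarized in a small helper. This is only an illustrative sketch: the function name table_file_path and the /usr/local/pgsql/data location are assumptions, not part of PostgreSQL.

```python
import posixpath


def table_file_path(pgdata: str, database_oid: int, relfilenode: int) -> str:
    """Build the on-disk path for a table, following the layout in the text.

    pgdata        -- the cluster's data directory (the value of $PGDATA)
    database_oid  -- the oid of the database's row in pg_database
    relfilenode   -- the relfilenode of the table's row in pg_class
    """
    return posixpath.join(pgdata, "base", str(database_oid), str(relfilenode))


# Hypothetical $PGDATA; database oid 1 is template1 and relfilenode 1247 is
# pg_type in the listings shown above.
path = table_file_path("/usr/local/pgsql/data", 1, 1247)
print(path)
```

The helper takes the relfilenode rather than the table's oid because, as noted above, the two can differ after commands such as CLUSTER.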

There are two more columns in pg_class that might help explain PostgreSQL's storage structure:

perf=# SELECT relname, oid, reltuples, relpages FROM pg_class
perf-# ORDER BY oid;
 relname | oid | reltuples | relpages
--------------+------+-----------+----------
pg_type | 1247 | 143 | 2
pg_attribute | 1249 | 795 | 11
pg_proc | 1255 | 1263 | 31
pg_class | 1259 | 101 | 2
pg_shadow | 1260 | 1 | 1
pg_group | 1261 | 0 | 0
... | ... | ... | ...

The reltuples column tells you how many tuples are in each table. The relpages column shows how many pages are required to store the current contents of the table. How do these numbers correspond to the actual on-disk structures? If you look at the table files for a few tables, you'll see that there is a relationship between the size of the file and the value in the relpages column:

$ ls -l 1247 1249
-rw------- 1 postgres pgadmin 16384 Jan 01 20:53 1247
-rw------- 1 postgres pgadmin 90112 Jan 01 20:53 1249

The file named 1247 (pg_type) is 16,384 bytes long and consumes two pages. The file named 1249 (pg_attribute) is 90,112 bytes long and consumes 11 pages. A little math will show that 16,384/2 = 8,192 and 90,112/11 = 8,192: each page is 8,192 (8K) bytes long. In PostgreSQL, all disk I/O is performed on a page-by-page basis [1]. When you select a single row from a table, PostgreSQL will read at least one page; it may read many pages if the row is large. When you update a single row, PostgreSQL will write the new version of the row at the end of the table and will mark the original version of the row as invalid.

[1]
Actually, most disk I/O is performed on a page-by-page basis. Some configuration files and log files are accessed in other forms, but all table and index access is done in pages.

The size of a page is fixed at 8,192 bytes. You can increase or decrease the page size if you build your own copy of PostgreSQL from source, but all pages within a database will be the same size. The size of a row is not fixed; different tables will yield different row sizes. In fact, the rows within a single table may differ in size if the table contains variable-length columns. Given that the page size is fixed and the row size is variable, it's difficult to predict exactly how many rows will fit within any given page.

The perf Database and the recalls Table

The sample database that you have been using so far doesn't really hold enough data to show performance relationships. Instead, I've created a new database (named perf) that holds some large tables. I've downloaded the recalls database from the U.S. National Highway Traffic Safety Administration [2]. This database contains a single table with 39,241 rows. Here is the layout of the recalls table:


perf=# \d recalls
Table "recalls"
Column | Type | Modifiers
-------------+-------------------------+-----------
record_id | numeric(9,0) |
campno | character(9) |
maketxt | character(25) |
modeltxt | character(25) |
yeartxt | character(4) |
mfgcampno | character(10) |
compdesc | character(75) |
mgftxt | character(30) |
bgman | character(8) |
endman | character(8) |
vet | character(1) |
potaff | numeric(9,0) |
ndate | character(8) |
odate | character(8) |
influenced | character(4) |
mfgname | character(30) |
rcdate | character(8) |
datea | character(8) |
rpno | character(3) |
fmvss | character(3) |
desc_defect | character varying(2000) |
con_defect | character varying(2000) |
cor_action | character varying(2000) |
Indexes: recall_record_id

Notice that there is only one index and it covers the record_id column.

[2]
This data (ftp://ftp.nhtsa.dot.gov/rev_recalls/) is in the form of a flat ASCII file. I had to import the data into my perf database.

The recalls table in the perf database contains 39,241 rows in 4,413 pages:

perf=# SELECT relname, reltuples, relpages, oid FROM pg_class
perf-# WHERE relname = 'recalls';
relname | reltuples | relpages | oid
---------+-----------+----------+-------
recalls | 39241 | 4413 | 96409

Given that a page is 8,192 bytes long, you would expect that the file holding this table ($PGDATA/base/16556/96409) would be 36,151,296 bytes long:

$ ls -l $PGDATA/base/16556/96409
-rw------- 1 postgres pgadmin 36151296 Jan 01 23:34 96409
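
The expected size is just the page count multiplied by the page size. As a quick check of that arithmetic, here is a minimal Python sketch; the function name and the constant are mine, chosen only for illustration:

```python
PAGE_SIZE = 8192  # bytes per page, as stated in the text


def expected_file_size(relpages, page_size=PAGE_SIZE):
    """Return the expected size (in bytes) of a table file holding relpages pages."""
    return relpages * page_size


# relpages for the recalls table, taken from the pg_class query above
print(expected_file_size(4413))  # 36151296
```

The result matches the size reported by ls.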

Figure 4.1 shows how the recalls table might look on disk. (Notice that the rows are not sorted; they appear in the approximate order of insertion.)

Figure 4.1. The recalls table as it might look on disk.

If a row is too large to fit into a single 8K block [3], PostgreSQL will write part of the data into a TOAST [4] table. A TOAST table acts as an extension to a normal table. It holds values too large to fit inline in the main table.

[3]
PostgreSQL tries to store at least four rows per heap page and at least four entries per index page.

[4]
The acronym TOAST stands for "the oversized attribute storage technique."

Indexes are also stored in page files. A page that holds row data is called a heap page. A page that holds index data is called an index page. You can locate the page file that stores an index by examining the index's entry in the pg_class table. And, just like tables, it is difficult to predict how many index entries will fit into each 8K page [5]. If an index entry is too large, it is moved to an index TOAST table.

[5]
If you want more information about how data is stored inside a page, I recommend the pg_filedump utility from Red Hat.

In PostgreSQL, a page that contains row data is a heap block. A page that contains index data is an index block. You will never find heap blocks and index blocks in the same page file.

Page Caching

Two of the fundamental performance rules in any database system are:

• Memory access is fast; disk access is slow.

• Memory space is scarce; disk space is abundant.

Accordingly, PostgreSQL tries very hard to minimize disk I/O by keeping frequently used data in memory. When the first server process starts, it creates an in-memory data structure known as the buffer cache. The buffer cache is organized as a collection of 8K pages; each page in the buffer cache corresponds to a page in some page file. The buffer cache is shared between all processes servicing a given database.

When you select a row from a table, PostgreSQL will read the heap block that contains the row into the buffer cache. If there isn't enough free space in the cache, PostgreSQL will move some other block out of the cache. If a block being removed from the cache has been modified, it will be written back out to disk; otherwise, it will simply be discarded. Index blocks are buffered as well.

In the "Gathering Performance Information" section, you'll see how to measure the performance of the cache and how to change its size.

Summary

This section gave you a good overview of how PostgreSQL stores data on disk. With some of the fundamentals out of the way, you can move on to more performance issues.

Gathering Performance Information

With release 7.2, the PostgreSQL developers introduced a new collection of performance-related system views. These views return two distinct kinds of information. The pg_stat views characterize the frequency and type of access for each table in a database. The pg_statio views will tell you how much physical I/O is performed on behalf of each table.

Let's look at each set of performance-related views in more detail.

The pg_stat_all_tables view contains one row for each table in your database. Here is the layout of pg_stat_all_tables:

perf=# \d pg_stat_all_tables
View "pg_stat_all_tables"
Column | Type | Modifiers
---------------+---------+-----------
relid | oid |
schemaname | name |
relname | name |
seq_scan | bigint |
seq_tup_read | bigint |
idx_scan | numeric |
idx_tup_fetch | numeric |
n_tup_ins | bigint |
n_tup_upd | bigint |
n_tup_del | bigint |

The seq_scan column tells you how many sequential (that is, table) scans have been performed for a given table, and seq_tup_read tells you how many rows were processed through table scans. The idx_scan and idx_tup_fetch columns tell you how many index scans have been performed for a table and how many rows were processed by index scans. The n_tup_ins, n_tup_upd, and n_tup_del columns tell you how many rows were inserted, updated, and deleted, respectively.

Query Execution

If you're not familiar with the terms "table scan" or "index scan," don't worry; I'll cover query execution later in this chapter (see "Understanding How PostgreSQL Executes a Query").

The real value in pg_stat_all_tables is that you can find out which tables in your database are most heavily used. This view does not tell you how much disk I/O is performed against each table file, nor does it tell you how much time it took to perform the operations.

The following query finds the top 10 tables in terms of number of rows read:

SELECT relname, idx_tup_fetch + seq_tup_read AS Total
FROM pg_stat_all_tables
WHERE idx_tup_fetch + seq_tup_read != 0
ORDER BY Total desc
LIMIT 10;

Here's an example that shows the result of this query in a newly created database:

perf=# SELECT relname, idx_tup_fetch + seq_tup_read AS Total
perf-# FROM pg_stat_all_tables
perf-# WHERE idx_tup_fetch + seq_tup_read != 0
perf-# ORDER BY Total desc
perf-# LIMIT 10;

relname | total
--------------+-------
recalls | 78482
pg_class | 57425
pg_index | 20901
pg_attribute | 5965
pg_proc | 1391

It's easy to see that the recalls table is heavily used: you have read 78,482 tuples from that table.

There are two variations on the pg_stat_all_tables view. The pg_stat_sys_tables view is identical to pg_stat_all_tables, except that it is restricted to showing system tables. Similarly, the pg_stat_user_tables view is restricted to showing only user-created tables.

You can also see how heavily each index is being used: the pg_stat_all_indexes, pg_stat_user_indexes, and pg_stat_sys_indexes views expose index information.

Although the pg_stat views tell you how heavily each table is used, they don't provide any information about how much physical I/O is performed on behalf of each table. The second set of performance-related views provides that information.

The pg_statio_all_tables view contains one row for each table in a database. Here is the layout of pg_statio_all_tables:

perf=# \d pg_statio_all_tables
View "pg_statio_all_tables"
Column | Type | Modifiers
-----------------+---------+-----------
relid | oid |
schemaname | name |
relname | name |
heap_blks_read | bigint |
heap_blks_hit | bigint |
idx_blks_read | numeric |
idx_blks_hit | numeric |
toast_blks_read | bigint |
toast_blks_hit | bigint |
tidx_blks_read | bigint |
tidx_blks_hit | bigint |

This view provides information about heap blocks (heap_blks_read, heap_blks_hit), index blocks (idx_blks_read, idx_blks_hit), toast blocks (toast_blks_read, toast_blks_hit), and index toast blocks (tidx_blks_read, tidx_blks_hit). For each of these block types, pg_statio_all_tables exposes two values: the number of blocks that had to be read from disk and the number of block requests that were satisfied from PostgreSQL's cache. For example, the heap_blks_read column contains the number of heap blocks read from disk for a given table, and heap_blks_hit tells you how many heap block requests found the page already in the cache.
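
A common way to summarize a read/hit pair is as a cache hit ratio: hits divided by total block requests (hits plus reads). The following Python sketch shows the calculation. The function name is hypothetical and the sample numbers are made up for illustration; they are not taken from a real table.

```python
def cache_hit_ratio(blks_read, blks_hit):
    """Return the fraction of block requests satisfied from the cache.

    blks_read: blocks that had to be fetched from disk (e.g. heap_blks_read)
    blks_hit:  block requests found in the cache (e.g. heap_blks_hit)
    Returns None when no blocks were requested at all.
    """
    total = blks_read + blks_hit
    if total == 0:
        return None
    return blks_hit / total


# Made-up counters: 25 disk reads and 75 cache hits
print(cache_hit_ratio(25, 75))  # 0.75
```

The same function works for any of the pairs in the view (idx_blks, toast_blks, or tidx_blks).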

PostgreSQL exposes I/O information for each index in the pg_statio_all_indexes, pg_statio_user_indexes, and pg_statio_sys_indexes views.

Let's try a few examples and see how you can use the information exposed by pg_statio_all_tables.

I've written a simple utility (called timer) that makes it a little easier to see the statistical results of a given query. This utility takes a snapshot of pg_stat_all_tables and pg_statio_all_tables, executes a given query, and finally compares the new values in pg_stat_all_tables and pg_statio_all_tables. Using this utility, you can see how much I/O was performed on behalf of the given query. Of course, the database must be idle except for the query under test.
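
The source for timer is not shown here, so the following Python sketch is only a guess at how the comparison step might work. It assumes each snapshot has already been loaded into a dictionary keyed by table name; the function name, the dictionary layout, and the sample counter values are all hypothetical.

```python
def diff_snapshots(before, after):
    """Return per-table counter deltas between two statistics snapshots.

    Each snapshot maps a table name to a dict of counter name -> value.
    Tables whose counters did not change are left out of the result.
    """
    deltas = {}
    for table, counters in after.items():
        old = before.get(table, {})
        changed = {name: value - old.get(name, 0)
                   for name, value in counters.items()}
        if any(changed.values()):
            deltas[table] = changed
    return deltas


# Hypothetical snapshots taken before and after one full scan of recalls
before = {
    "recalls": {"seq_scan": 1, "seq_tup_read": 39241, "heap_blks_read": 4413},
    "pg_am": {"seq_scan": 1, "seq_tup_read": 1, "heap_blks_read": 1},
}
after = {
    "recalls": {"seq_scan": 2, "seq_tup_read": 78482, "heap_blks_read": 8826},
    "pg_am": {"seq_scan": 1, "seq_tup_read": 1, "heap_blks_read": 1},
}

deltas = diff_snapshots(before, after)
print(deltas)
# {'recalls': {'seq_scan': 1, 'seq_tup_read': 39241, 'heap_blks_read': 4413}}
```

Because the counters are cumulative, any other activity between the two snapshots would show up in the deltas, which is why the database must be otherwise idle.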

Execute this simple query and see what kind of I/O results you get:

$ timer "SELECT * FROM recalls"

+-------------+--------------------------------+-------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|pg_aggregate | 0 | 0 | 1 | 0 | 1 | 1 | 2 | 0 |
|pg_am | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
|pg_amop | 0 | 0 | 2 | 10 | 10 | 24 | 4 | 16 |
|pg_amproc | 0 | 0 | 1 | 5 | 6 | 6 | 2 | 10 |
|pg_attribute | 0 | 0 | 8 | 14 | 21 | 65 | 6 | 57 |
|pg_cast | 0 | 0 | 2 | 6 | 60 | 8 | 2 | 118 |
|pg_class | 4 | 740 | 5 | 32 | 18 | 17 | 7 | 34 |
|pg_database | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
|pg_index | 2 | 146 | 3 | 11 | 8 | 12 | 4 | 12 |
|pg_namespace | 2 | 10 | 1 | 2 | 2 | 1 | 2 | 2 |
|pg_opclass | 0 | 0 | 2 | 11 | 5 | 73 | 4 | 6 |
|pg_operator | 0 | 0 | 4 | 6 | 10 | 10 | 4 | 26 |
|pg_proc | 0 | 0 | 6 | 8 | 14 | 14 | 12 | 31 |
|pg_rewrite | 0 | 0 | 1 | 1 | 2 | 2 | 2 | 2 |
|pg_shadow | 0 | 0 | 1 | 2 | 3 | 3 | 4 | 2 |
|pg_statistic | 0 | 0 | 3 | 5 | 33 | 8 | 2 | 64 |
|pg_trigger | 0 | 0 | 1 | 1 | 2 | 2 | 2 | 2 |
|pg_type | 0 | 0 | 2 | 5 | 7 | 7 | 2 | 12 |
|recalls | 1 | 39241 | 4413 | 0 | 0 | 0 | 0 | 0 |
+-------------+------+-------+----------+------+------+-------+---------+------+
|Totals | 11 | 40139 | 4458 | 119 | 202 | 253 | 61 | 394 |
+-------------+------+-------+----------+------+------+-------+---------+------+

The timer utility shows that a simple query generates a lot of buffer traffic. The PostgreSQL server must parse and plan the query and it consults a number of system tables to do so; that explains the buffer interaction incurred on behalf of all of the tables that start with pg_. The recalls table generates most of the buffer traffic.

You can invoke the timer utility with one argument or two. The first argument contains the text of the query that you want to measure. The second argument, if present, is the name of the table that you're interested in. If you omit the second argument, you'll see I/O measurements for every table that was hit during the query (including the PostgreSQL system tables). If you include the second argument, you'll only see the I/O measurements for that table. In most of the discussion that follows, we'll filter out the I/O performed against the system tables.

This query retrieved 39,241 rows in a single table scan. This scan read 4,413 heap blocks from disk and found none in the cache. Normally, you would hope to see a cache ratio much better than 4,413 to 0! In this particular case, I had just started the postmaster so there were few pages in the cache and none were devoted to the recalls table. Now, try this experiment again to see if the cache ratio gets any better:


$ timer "SELECT * FROM recalls" recalls

+-------------+--------------------------------+-------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|recalls | 1 | 39241 | 4413 | 0 | 0 | 0 | 0 | 0 |
+-------------+------+-------+----------+------+------+-------+---------+------+

You get exactly the same results for the recalls table: no cache hits. Why not? We did not include an ORDER BY clause in this query so PostgreSQL returned the rows in (approximately) the order of insertion. When we execute the same query a second time, PostgreSQL starts reading at the beginning of the page file and continues until it has read the entire file. Because my cache is only 512 blocks in size, the first 512 blocks have been forced out of the cache by the time I get to the end of the table scan. The next time I execute the same query, the final 512 blocks are in the cache, but you are looking for the leading blocks. The end result is no cache hits.
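
You can reproduce this effect with a toy model. The Python sketch below assumes a plain least-recently-used (LRU) replacement policy and ignores the buffers that PostgreSQL needs for system tables. Both are simplifications of mine, and the real server's replacement strategy may differ. The function name is hypothetical.

```python
from collections import OrderedDict


def scan_with_lru(cache, blocks, capacity):
    """Request each block in order through an LRU cache.

    cache is an OrderedDict that persists between scans.
    Returns (disk_reads, cache_hits) for this scan.
    """
    reads = hits = 0
    for block in blocks:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
            hits += 1
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
            reads += 1
    return reads, hits


# Two full scans of a 4,413-block table through a 512-block cache
cache = OrderedDict()
print(scan_with_lru(cache, range(4413), 512))  # (4413, 0)
print(scan_with_lru(cache, range(4413), 512))  # (4413, 0)

# Two scans of a (hypothetical) 300-block table that fits in the cache
small = OrderedDict()
print(scan_with_lru(small, range(300), 512))   # (300, 0)
print(scan_with_lru(small, range(300), 512))   # (0, 300)
```

In this model, a table larger than the cache never benefits from a repeated sequential scan, because each block is evicted before it is requested again.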

Just as an experiment, try to increase the size of the cache to see if you can force some caching to take place.

The PostgreSQL cache is kept in a segment of memory shared by all backend processes. You can see this using the ipcs -m command [6]:

$ ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 1409024 postgres 600 5021696 3

[6]
In case you are curious, the key value uniquely identifies a shared memory segment. The key is determined by multiplying the postmaster's port number by 1,000 and then incrementing until a free segment is found. The shmid value is generated by the operating system (key is generated by PostgreSQL). The nattch column tells you how many processes are currently using the segment.

The shared memory segment contains more than just the buffer cache: PostgreSQL also keeps some bookkeeping information in shared memory. With 512 pages in the buffer cache and an 8K block size, you see a shared memory segment that is 5,021,696 bytes long. Let's increase the buffer cache to 513 pages and see what effect that has on the size of the shared memory segment. There are two ways that you can adjust the size of the cache. You could change PostgreSQL's configuration file ($PGDATA/postgresql.conf), changing the shared_buffers variable from 512 to 513. Or, you can override the shared_buffers configuration variable when you start the postmaster:

$ pg_ctl stop
waiting for postmaster to shut down......done
postmaster successfully shut down
$ #
$ # Note: specifying -o "-B 513" is equivalent
$ # to setting shared_buffers = 513 in
$ # the $PGDATA/postgresql.conf file
$ #
$ pg_ctl start -o "-B 513" -l /tmp/pg.log
postmaster successfully started

Now you can use the ipcs -m command to see the change in the size of the shared memory segment:

$ ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 1409024 postgres 600 5038080 3

The shared memory segment increased from 5,021,696 bytes to 5,038,080 bytes. That's a difference of 16,384 bytes, which happens to be the size of two blocks. Why two? Because PostgreSQL keeps some bookkeeping information in shared memory in addition to the buffer cache; the amount of extra space required depends on the number of shared buffers. PostgreSQL won't add two blocks each time you increment shared_buffers by 1; it just happens that when you increase from 512 to 513, you cross a threshold that requires an extra page in the bookkeeping system.
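
If you want to verify the arithmetic, the following Python sketch divides the growth of the segment by the block size. The function name is hypothetical; the byte counts are the two values reported by ipcs -m above.

```python
BLOCK_SIZE = 8192  # bytes per block (8K)


def segment_growth(before_bytes, after_bytes, block_size=BLOCK_SIZE):
    """Return (whole_blocks, leftover_bytes) by which the segment grew."""
    return divmod(after_bytes - before_bytes, block_size)


# Segment size with shared_buffers = 512, then with shared_buffers = 513
print(segment_growth(5021696, 5038080))  # (2, 0)
```

The segment grew by exactly two blocks with no bytes left over.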

Now, let's get back to the problem at hand. We want to find out if doubling the buffer count will result in more cache hits and therefore fewer I/O operations. Remember, a table scan on the recalls table resulted in 4,413 heap blocks read and 0 cache hits. Let's double the size of the shared buffer cache (from 512 to 1,024 blocks). Try the same query again and check the results:

$ pg_ctl stop
waiting for postmaster to shut down......done
postmaster successfully shut down
$ pg_ctl start -o "-B 1024" -l /tmp/pg.log
postmaster successfully started
$ ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 1409024 postgres 600 9338880 3

$ timer "SELECT * FROM recalls" recalls

+-----------------------------------+----------------------------------+
| SEQUENTIAL I/O | INDEXED I/O |
| scans | tuples | heap_blks |cached| scans | tuples | idx_blks |cached|
+-------+--------+-----------+------+-------+--------+----------+------+
| 1 | 39241 | 4413 | 0 | 0 | 0 | 0 | 0 |
+-------+--------+-----------+------+-------+--------+----------+------+

You have to run this query twice because you shut down and restarted the postmaster to adjust the cache size. When you shut down the postmaster, the cache is destroyed (you can use the ipcs -m command to verify this).

$ timer "SELECT * FROM recalls" recalls

+-----------------------------------+----------------------------------+
| SEQUENTIAL I/O | INDEXED I/O |
| scans | tuples | heap_blks |cached| scans | tuples | idx_blks |cached|
+-------+--------+-----------+------+-------+--------+----------+------+
| 1 | 39241 | 4413 | 0 | 0 | 0 | 0 | 0 |
+-------+--------+-----------+------+-------+--------+----------+------+

Still the same results as before: PostgreSQL does not seem to buffer any of the data blocks read from the recalls table. Actually, each block is buffered as soon as it is read from disk; but as before, the blocks read at the beginning of the table scan are pushed out by the blocks read at the end of the scan. When you execute the same query a second time, you start at the beginning of the table and find that the blocks that you need are not in the cache.

You could increase the cache size to be large enough to hold the entire table (somewhere around 4,413 + 120 blocks should do it), but that's a large shared memory segment, and if you don't have enough physical memory, your system will start to thrash.

Let's try a different approach. PostgreSQL has enough room for 1,024 pages in the shared buffer cache. The entire recalls table consumes 4,413 pages. If you use the LIMIT clause to select a subset of the recalls table, you should see some caching. I'm going to lower the cache size back to its default of 512 pages before we start:

$ pg_ctl stop
waiting for postmaster to shut down......done
postmaster successfully shut down
$ pg_ctl start -o "-B 512" -l /tmp/pg.log
postmaster successfully started

You know that it takes 4,413 pages to hold the 39,241 rows in recalls, which gives you an average of about 9 rows per page. We have 512 pages in the cache; let's assume that PostgreSQL needs about 180 of them for its own bookkeeping, leaving us 332 pages. So, you should ask for 9 * 332 (or 2,988) rows:
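
The estimate can be written out step by step. In this Python sketch the function name is hypothetical, and the 180-page reservation is the same rough assumption made in the text, not a measured value.

```python
def rows_for_cache(total_rows, total_pages, cache_pages, reserved_pages):
    """Estimate how many rows fit in the usable part of the buffer cache.

    Returns (rows_per_page, usable_pages, row_limit).
    """
    rows_per_page = round(total_rows / total_pages)  # 39241 / 4413 is about 8.9
    usable_pages = cache_pages - reserved_pages
    return rows_per_page, usable_pages, rows_per_page * usable_pages


print(rows_for_cache(39241, 4413, 512, 180))  # (9, 332, 2988)
```

Because rows vary in size, the average is only a guide; the first rows in the table may pack more (or less) densely than 9 per page.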

$ ./timer "SELECT * FROM recalls LIMIT 2988" recalls


+-------------+--------------------------------+-------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|recalls | 1 | 2988 | 208 | 0 | 0 | 0 | 0 | 0 |
+-------------+------+-------+----------+------+------+-------+---------+------+

PostgreSQL read 208 heap blocks. If everything worked, those pages should still be in the cache. Let's run the query again:

$ ./timer "SELECT * FROM recalls LIMIT 2988" recalls


+-------------+--------------------------------+-------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|recalls | 1 | 2988 | 0 | 208 | 0 | 0 | 0 | 0 |
+-------------+------+-------+----------+------+------+-------+---------+------+

Now you're getting somewhere. PostgreSQL needed 208 heap blocks and found all 208 of them in the cache, so it read none from disk.

Dead Tuples

Now let's look at another factor that affects performance. Make a simple update to the recalls table:

perf=# UPDATE recalls SET potaff = potaff + 1;
UPDATE

This command increments the potaff column of each row in the recalls table. (Don't read too much into this particular UPDATE. I chose potaff simply because I needed an easy way to update every row.) Now, after restarting the database, go back and SELECT all rows again:

$ timer "SELECT * FROM recalls" recalls

+-------------+--------------------------------+-------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|recalls | 1 | 39241 | 8803 | 0 | 0 | 0 | 0 | 0 |
+-------------+--------------------------------+-------------------------------+

That's an interesting result: you still retrieved 39,241 rows, but this time you had to read 8,803 pages to find them. What happened? Let's see if the pg_class table gives any clues:

perf=# SELECT relname, reltuples, relpages
perf-# FROM pg_class
perf-# WHERE relname = 'recalls';
relname | reltuples | relpages
---------+-----------+----------
recalls | 39241 | 4413

No clues there: pg_class still thinks you have 4,413 heap blocks in this table. Let's try counting the individual rows:

perf=# SELECT count(*) FROM recalls;
count
-------
39241

At least that gives you a consistent answer. But why does a simple UPDATE cause you to read twice as many heap blocks as before?

When you UPDATE a row, PostgreSQL performs the following operations:

1. The new row values are written to the table.

2. The old row is deleted from the table.

3. The deleted row remains in the table, but is no longer accessible.

This means that when you executed the statement "UPDATE recalls SET potaff = potaff + 1", PostgreSQL inserted 39,241 new rows and deleted 39,241 old rows. We now have 78,482 rows, half of which are inaccessible.
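
The three steps above can be modeled in a few lines of Python. This is a deliberately simplified sketch: a tuple is just a dictionary with a "live" flag, whereas the real server tracks visibility with transaction identifiers. The function and field names are hypothetical.

```python
def update_all(heap):
    """Simulate an UPDATE that touches every live row in the heap.

    For each live tuple, a new version is appended and the old version
    is marked dead; the dead version is NOT removed from the heap.
    """
    new_versions = []
    for tup in heap:
        if tup["live"]:
            tup["live"] = False  # old version stays in place, but hidden
            new_versions.append({"potaff": tup["potaff"] + 1, "live": True})
    heap.extend(new_versions)


# Start with 39,241 live tuples, then update every row once
heap = [{"potaff": 0, "live": True} for _ in range(39241)]
update_all(heap)

live = sum(1 for tup in heap if tup["live"])
print(len(heap), live, len(heap) - live)  # 78482 39241 39241
```

The heap now holds twice as many tuples, so a full scan has to read roughly twice as many blocks, even though only half of the tuples are visible.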

Why does PostgreSQL carry out an UPDATE command this way? The answer lies in PostgreSQL's MVCC (multi-version concurrency control) feature. Consider the following commands:

perf=# BEGIN WORK;
BEGIN
perf=# UPDATE recalls SET potaff = potaff + 1;
UPDATE

Notice that you have started a new transaction, but you have not yet completed it. If another user were to SELECT rows from the recalls table at this point, he must see the old values; you might roll back this transaction. In other database systems (such as DB2, Sybase, and SQL Server), the other user would have to wait until you either committed or rolled back your transaction before his query would complete. PostgreSQL, on the other hand, keeps the old rows in the table, and other users will see the original values until you commit your transaction. If you roll back your changes, PostgreSQL simply hides your modifications from all transactions (leaving you with 78,482 rows, half of which are inaccessible).

When you DELETE rows from a table, PostgreSQL follows a similar set of rules. The deleted rows remain in the table, but are hidden. If you roll back a DELETE command, PostgreSQL will simply make the rows visible again.

Now you also know the difference between a tuple and a row. A tuple is some version of a row.

When you make a change to a table, the tuples that you've changed are hidden from other users until you COMMIT your changes. If you INSERT 100,000 new rows into a table that previously contained only a few rows, another user might suddenly see a decrease in query performance even though he can't see the new rows. If you roll back your changes, other users will never see the new rows, but the dead tuples that you've created will continue to affect performance until someone VACUUMs the table.

You can see that these hidden tuples can dramatically affect performance: updating every row in a table doubles the number of heap blocks required to read the entire table.

There are at least three ways to remove dead tuples from a database. One way is to export all (visible) rows and then import them again using pg_dump and pg_restore. Another method is to use CREATE TABLE ... AS to make a new copy of the table, drop the original table, and rename the copy. The preferred way is to use the VACUUM command. I'll show you how to use the VACUUM command a little later (see the section "Table Statistics").

Index Performance

You've seen how PostgreSQL batches all disk I/O into 8K blocks, and you've seen how PostgreSQL maintains a buffer cache to reduce disk I/O. Let's find out what happens when you throw an index into the mix. After restarting the postmaster (to clear the cache), execute the following query:

$ timer "SELECT * FROM recalls ORDER BY record_id;" recalls

+-----------------------------------+----------------------------------+
| SEQUENTIAL I/O | INDEXED I/O |
| scans | tuples | heap_blks |cached| scans | tuples | idx_blks |cached|
+-------+--------+-----------+------+-------+--------+----------+------+
| 0 | 0 | 26398 | 12843| 1 | 39241 | 146 | 0 |
+-------+--------+-----------+------+-------+--------+----------+------+

You can see that PostgreSQL chose to execute this query using an index scan (remember, you have an index defined on the record_id column). This query read 146 index blocks and found none in the buffer cache. You also read 26,398 heap blocks from disk and found another 12,843 in the cache. You can see that the buffer cache helped the performance a bit, but you still read over 26,000 heap blocks from disk, and you need only 4,413 to hold the entire recalls table. Why did you need to read each heap block (approximately) six times?

Think of how the recalls table is stored on disk (see Figure 4.2).

Figure 4.2. The recalls table on disk.

Notice that the rows are not stored in record_id order. In fact, they are stored in order of insertion. When you create an index on the record_id column, you end up with a structure like that shown in Figure 4.3.

Figure 4.3. The recalls table structure after creating an index.


Consider how PostgreSQL uses the record_id index to satisfy the query. After the first block of the record_id index is read into the buffer cache, PostgreSQL starts scanning through the index entries. The first index entry points to a recalls row on heap block 2, so that heap block is read into the buffer cache. Now, PostgreSQL moves on to the second index entry; this one points to a row in heap block 1. PostgreSQL reads heap block 1 into the buffer cache, throwing out some other page if there is no room in the cache. Figure 4.2 shows a partial view of the recalls table: remember that there are actually 4,413 heap blocks and 146 index blocks needed to satisfy this query. It's the random ordering of the rows within the recalls table that kills the cache hit ratio.

Let's try reordering the recalls table so that rows are inserted in record_id order. First, create a work table with the same structure as recalls:

perf=# CREATE TABLE work_recalls AS
perf-# SELECT * FROM recalls ORDER BY record_id;
SELECT

Then, drop the original table, rename the work table, and re-create the index:

perf=# DROP TABLE recalls;
DROP
perf=# ALTER TABLE work_recalls RENAME TO recalls;
ALTER
perf=# CREATE INDEX recalls_record_id ON recalls( record_id );
CREATE

At this point, you have the same data as before, consuming nearly the same amount of space:

perf=# SELECT relname, relpages, reltuples FROM pg_class
perf-# WHERE relname IN ('recalls', 'recalls_record_id' );
relname | relpages | reltuples
-------------------+----------+-----------
recalls_record_id | 146 | 39241
recalls | 4422 | 39241
(2 rows)

After restarting the postmaster (again, this clears out the buffer cache so you get consistent results), let's re-execute the previous query:

$ timer "SELECT * FROM recalls ORDER BY record_id;" recalls

+-----------------------------------+----------------------------------+
| SEQUENTIAL I/O | INDEXED I/O |
| scans | tuples | heap_blks |cached| scans | tuples | idx_blks |cached|
+-------+--------+-----------+------+-------+--------+----------+------+
| 0 | 0 | 4423 | 34818| 1 | 39241 | 146 | 0 |
+-------+--------+-----------+------+-------+--------+----------+------+

That made quite a difference. Before reordering, you read 26,398 heap blocks from disk and found 12,843 in the cache, for a cache hit ratio of about 33% (12,843 of 39,241 block requests). After physically reordering the rows to match the index, you read 4,423 heap blocks from disk and found 34,818 in the cache, for a hit ratio of about 89%; cache hits now outnumber disk reads by nearly eight to one. This makes a huge performance difference. Now as you read through each index page, the heap records appear next to each other; you won't be thrashing heap pages in and out of the cache. Figure 4.4 shows how the recalls table looks after reordering.
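
A toy simulation shows why physical ordering matters so much. The Python sketch below replays an index scan as a sequence of heap block requests through an LRU cache. It assumes exactly 9 rows per block, a 512-block cache, and a plain LRU policy; the names are hypothetical. The shuffled case is a worst case (fully random placement), so its numbers will not match the measurements above, where the original table was only partly out of order.

```python
import random
from collections import OrderedDict


def index_scan(heap_block_of_row, capacity):
    """Fetch rows in index order through an LRU cache.

    heap_block_of_row[i] is the heap block holding the i-th row in index order.
    Returns (disk_reads, cache_hits).
    """
    cache = OrderedDict()
    reads = hits = 0
    for block in heap_block_of_row:
        if block in cache:
            cache.move_to_end(block)
            hits += 1
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used block
            cache[block] = True
            reads += 1
    return reads, hits


ROWS, ROWS_PER_BLOCK, CACHE_BLOCKS = 39241, 9, 512  # illustrative values

# Reordered heap: consecutive index entries point into the same block.
clustered = [n // ROWS_PER_BLOCK for n in range(ROWS)]

# Unordered heap: the same blocks, visited in a shuffled order.
shuffled = clustered[:]
random.Random(42).shuffle(shuffled)

print(index_scan(clustered, CACHE_BLOCKS))  # (4361, 34880)
print(index_scan(shuffled, CACHE_BLOCKS))   # far more disk reads than above
```

In the reordered case every block is read from disk exactly once and every other row fetch is a cache hit, which is the same pattern you see in the timer output: 4,423 reads plus 34,818 hits adds up to one block request per row.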

Figure 4.4. The recalls table on disk after reordering.

We reordered the recalls table by creating a copy of the table (in the desired order), dropping the original table, and then renaming the copy back to the original name. You can also use the CLUSTER command; it does exactly the same thing.

Understanding How PostgreSQL Executes a Query

Before going much further, you should understand the procedure that PostgreSQL follows whenever it executes a query on your behalf.

After the PostgreSQL server receives a query from the client application, the text of the query is handed to the parser. The parser scans through the query and checks it for syntax errors. If the query is syntactically correct, the parser will transform the query text into a parse tree. A parse tree is a data structure that represents the meaning of your query in a formal, unambiguous form.

Given the query

SELECT customer_name, balance FROM customers WHERE balance > 0 ORDER BY balance

the parser might come up with a parse tree structured as shown in Figure 4.5.

Figure 4.5. A sample parse tree.

After the parser has completed parsing the query, the parse tree is handed off to the planner/optimizer.

The planner is responsible for t raversing t he parse t ree and finding all possible plans for execut ing t he query. The plan m ight include a
sequent ial scan t hrough t he ent ire t able and index scans if useful indexes have been defined. I f t he query involves t wo or m ore t ables, t he
planner can suggest a num ber of different m et hods for j oining t he t ables. The execut ion plans are developed in t erm s of query operat ors.
Each query operat or t ransform s one or m ore input set s int o an int erm ediat e result set . The Seq Scan operat or, for exam ple, t ransform s an
input set ( t he physical t able) int o a result set , filt ering out any rows t hat don't m eet t he query const raint s. The Sort operat or produces a
result set by reordering t he input set according t o one or m ore sort keys. I 'll describe each of t he query operat ors in m ore det ail a lit t le lat er.
Figure 4.6 shows an exam ple of a sim ple execut ion plan ( it is a new exam ple; it is not relat ed t o t he parse t ree in Figure 4.5) .

Figure 4.6. A simple execution plan.

You can see that complex queries are broken down into simple steps. The input set for a query operator at the bottom of the tree is usually a
physical table. The input set for an upper-level operator is the result set of a lower-level operator.

When all possible execution plans have been generated, the optimizer searches for the least-expensive plan. Each plan is assigned an
estimated execution cost. Cost estimates are measured in units of disk I/O. An operator that reads a single block of 8,192 bytes (8K) from
the disk has a cost of one unit. CPU time is also measured in disk I/O units, but usually as a fraction. For example, the amount of CPU time
required to process a single tuple is assumed to be 1/100th of a single disk I/O. You can adjust many of the cost estimates. Each query
operator has a different cost estimate. For example, the cost of a sequential scan of an entire table is computed as the number of 8K blocks
in the table, plus some CPU overhead.
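
The sequential-scan arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the units involved, not the planner's actual code: the constant names and the function seq_scan_cost are hypothetical, and the table size of 8,825 blocks is an assumption chosen so that the result lines up with the 9217.41 estimate that EXPLAIN prints for the recalls table in this chapter.

```python
# Hypothetical constants that mirror the cost units described in the text.
PAGE_READ_COST = 1.0     # reading one 8K block from disk costs one unit
CPU_TUPLE_COST = 0.01    # processing one tuple costs 1/100th of a disk I/O


def seq_scan_cost(blocks, rows):
    """Estimated total cost of a sequential scan over an entire table."""
    return blocks * PAGE_READ_COST + rows * CPU_TUPLE_COST


# Assumed table size: 8,825 blocks holding 39,241 rows.
recalls_cost = seq_scan_cost(8825, 39241)
print(round(recalls_cost, 2))
```

Under these assumptions the block reads dominate the estimate; the per-tuple CPU charge adds only a few hundred units.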

After choosing the (apparently) least-expensive execution plan, the query executor starts at the beginning of the plan and asks the topmost
operator to produce a result set. Each operator transforms its input set into a result set; the input set may come from another operator lower
in the tree. When the topmost operator completes its transformation, the results are returned to the client application.

EXPLAIN

The EXPLAIN statement gives you some insight into how the PostgreSQL query planner/optimizer decides to execute a query.

First, you should know that the EXPLAIN statement can be used only to analyze SELECT, INSERT, DELETE, UPDATE, and DECLARE...CURSOR
commands.

The syntax for the EXPLAIN command is

EXPLAIN [ANALYZE][VERBOSE] query;

Let's start by looking at a simple example:

perf=# EXPLAIN ANALYZE SELECT * FROM recalls;
NOTICE: QUERY PLAN:

Seq Scan on recalls (cost=0.00..9217.41 rows=39241 width=1917)
                    (actual time=69.35..3052.72 rows=39241 loops=1)
Total runtime: 3144.61 msec

The format of the execution plan can be a little mysterious at first. For each step in the execution plan, EXPLAIN prints the following
information:

• The type of operation required.

• The estimated cost of execution.

• If you specified EXPLAIN ANALYZE, the actual cost of execution. If you omit the ANALYZE keyword, the query is planned but not
executed, and the actual cost is not displayed.

In this example, PostgreSQL has decided to perform a sequential scan of the recalls table (Seq Scan on recalls). There are many
operators that PostgreSQL can use to execute a query. I'll explain the operation type in more detail in a moment.

There are three data items in the cost estimate. The first set of numbers (cost=0.00..9217.41) is an estimate of how "expensive" this
operation will be. "Expensive" is measured in terms of disk reads. Two numbers are given: The first number represents how quickly the first
row in the result set can be returned by the operation; the second (which is usually the most important) represents how long the entire
operation should take. The second data item in the cost estimate (rows=39241) shows how many rows PostgreSQL expects to return from
this operation. The final data item (width=1917) is an estimate of the width, in bytes, of the average row in the result set.

If you include the ANALYZE keyword in the EXPLAIN command, PostgreSQL will execute the query and display the actual execution costs.

Cost Estimates

I will remove the cost estimates from some of the EXPLAIN results in this chapter to make the plan a bit easier to read. Don't
be confused by this; the EXPLAIN command will always print cost estimates.

This was a simple example. PostgreSQL required only one step to execute this query (a sequential scan on the entire table). Many queries
require multiple steps and the EXPLAIN command will show you each of those steps. Let's look at a more complex example:

perf=# EXPLAIN ANALYZE SELECT * FROM recalls ORDER BY yeartxt;
NOTICE: QUERY PLAN:

Sort (cost=145321.51..145321.51 rows=39241 width=1911)
     (actual time=13014.92..13663.86 rows=39241 loops=1)
  -> Seq Scan on recalls (cost=0.00..9217.41 rows=39241 width=1917)
                         (actual time=68.99..3446.74 rows=39241 loops=1)
Total runtime: 16052.53 msec

This example shows a two-step query plan. In this case, the first step is actually listed at the end of the plan. When you read a query plan, it
is important to remember that each step in the plan produces an intermediate result set. Each intermediate result set is fed into the next step
of the plan.

Looking at this plan, PostgreSQL first produces an intermediate result set by performing a sequential scan (Seq Scan) on the entire recalls
table. That step should take about 9,217 disk page reads, and the result set will have about 39,241 rows, averaging 1,917 bytes each. Notice
that these estimates are identical to those produced in the first example; in both cases, you are executing a sequential scan on the
entire table.

After the sequential scan has finished building its intermediate result set, it is fed into the next step in the plan. The final step in this
particular plan is a sort operation, which is required to satisfy our ORDER BY clause [7]. The sort operation reorders the result set produced by
the sequential scan and returns the final result set to the client application.

[7] An ORDER BY clause does not require a Sort operation in all cases. The planner/optimizer may decide that it can use an
index to order the result set.

The Sort operation expects a single operand: a result set. The Seq Scan operation expects a single operand: a table. Some operations
require more than one operand. Here is a join between the recalls table and the mfgs table:

perf=# EXPLAIN SELECT * FROM recalls, mfgs
perf-# WHERE recalls.mfgname = mfgs.mfgname;
NOTICE: QUERY PLAN:

Merge Join
  -> Sort
       -> Seq Scan on recalls
  -> Sort
       -> Seq Scan on mfgs

If you use your imagination, you will see that this query plan is actually a tree structure, as illustrated in Figure 4.7.

Figure 4.7. Execution plan viewed as a tree.

When PostgreSQL executes this query plan, it starts at the top of the tree. The Merge Join operation requires two result sets for input, so
PostgreSQL must move down one level in the tree; let's assume that it traverses the left child first. Each Sort operation requires a single
result set for input, so again the query executor moves down one more level. At the bottom of the tree, the Seq Scan operation simply reads
a row from a table and returns that row to its parent. After a Seq Scan operation has scanned the entire table, the left-hand Sort operation
can complete. As soon as the left-hand Sort operation completes, the Merge Join operator will evaluate its right child. In this case, the
right-hand child evaluates the same way as the left-hand child. When both Sort operations complete, the Merge Join operator will execute,
producing the final result set.
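
The traversal just described can be sketched as a walk over a small tree. The Node class and completion_order function below are hypothetical names used only for illustration; the sketch records the order in which operators finish, left child first, and deliberately ignores the row-at-a-time exchange between a parent and its children.

```python
class Node:
    """One operator in an execution plan, with zero or more child operators."""

    def __init__(self, name, *children):
        self.name = name
        self.children = children


def completion_order(node, done=None):
    """Return operator names in the order in which they finish."""
    if done is None:
        done = []
    for child in node.children:        # left child first, then right child
        completion_order(child, done)
    done.append(node.name)             # a parent finishes after its children
    return done


plan = Node("Merge Join",
            Node("Sort (left)", Node("Seq Scan on recalls")),
            Node("Sort (right)", Node("Seq Scan on mfgs")))

for step in completion_order(plan):
    print(step)
```

Execution starts at the top of the tree, but the operators at the bottom are the first to finish, which is why the first step of a plan is listed last in EXPLAIN output.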

So far, you've seen three query execution operators in the execution plans. PostgreSQL currently has 19 query operators. Let's look at each
in more detail.

Seq Scan

The Seq Scan operator is the most basic query operator. Any single-table query can be carried out using the Seq Scan operator.

Seq Scan works by starting at the beginning of the table and scanning to the end of the table. For each row in the table, Seq Scan evaluates
the query constraints [8] (that is, the WHERE clause); if the constraints are satisfied, the required columns are added to the result set.

[8] The entire WHERE clause may not be evaluated for each row in the input set. PostgreSQL evaluates only the portions of the
clause that apply to the given row (if any). For a single-table SELECT, the entire WHERE clause is evaluated. For a multi-table
join, only the portion that applies to the given row is evaluated.

As you saw earlier in this chapter, a table can include dead (that is, deleted) rows and rows that may not be visible because they have not
been committed. Seq Scan does not include dead rows in the result set, but it must read the dead rows, and that can be expensive in a
heavily updated table.

The cost estimate for a Seq Scan operator gives you a hint about how the operator works:

Seq Scan on recalls (cost=0.00..9217.41 rows=39241 width=1917)

The startup cost is always 0.00. This implies that the first row of a Seq Scan operator can be returned immediately and that Seq Scan does
not read the entire table before returning the first row. If you open a cursor against a query that uses the Seq Scan operator (and no other
operators), the first FETCH will return immediately; you won't have to wait for the entire result set to be materialized before you can FETCH
the first row. Other operators (such as Sort) do read the entire input set before returning the first row.
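
The difference between an operator that can return its first row immediately and one that must read its whole input can be sketched with Python generators. The names seq_scan and sort_op are hypothetical stand-ins for the operators, and the reads list exists only to count how many rows have been read so far.

```python
def seq_scan(table, reads):
    """Yield rows one at a time, recording each row as it is read."""
    for row in table:
        reads.append(row)
        yield row


def sort_op(child):
    """Drain the child completely, then yield the rows in sorted order."""
    rows = sorted(child)
    yield from rows


table = [3, 1, 2]

reads = []
first_row = next(seq_scan(table, reads))
scan_reads = len(reads)      # rows read before the first row came back

reads = []
first_sorted_row = next(sort_op(seq_scan(table, reads)))
sort_reads = len(reads)      # rows read before the first sorted row came back

print(scan_reads, sort_reads)
```

In this sketch the scan hands back its first row after a single read, while the sort has consumed every row in the table before it returns anything.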

The planner/optimizer chooses a Seq Scan if there are no indexes that can be used to satisfy the query. A Seq Scan is also used when the
planner/optimizer decides that it would be less expensive (or just as expensive) to scan the entire table and then sort the result set to meet
an ordering constraint (such as an ORDER BY clause).

Index Scan

An Index Scan operator works by traversing an index structure. If you specify a starting value for an indexed column (WHERE record_id >=
1000, for example), the Index Scan will begin at the appropriate value. If you specify an ending value (such as WHERE record_id < 2000),
the Index Scan will complete as soon as it finds an index entry at or beyond the ending value.

The Index Scan operator has two advantages over the Seq Scan operator. First, a Seq Scan must read every row in the table; it can only
remove rows from the result set by evaluating the WHERE clause for each row. Index Scan may not read every row if you provide starting
and/or ending values. Second, a Seq Scan returns rows in table order, not in sorted order. Index Scan will return rows in index order.

Not all indexes are scannable. The B-Tree, R-Tree, and GiST index types can be scanned; a Hash index cannot.

The planner/optimizer uses an Index Scan operator when it can reduce the size of the result set by traversing a range of indexed values, or
when it can avoid a sort because of the implicit ordering offered by an index.
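
A range scan over an ordered index can be sketched with a sorted Python list and the standard bisect module. This is an analogy, not how a B-Tree is actually stored: the function index_scan and the sample record_ids are hypothetical, and the bounds match the WHERE record_id >= 1000 and WHERE record_id < 2000 examples above.

```python
import bisect


def index_scan(index_keys, low, high):
    """Return the keys k with low <= k < high from a sorted list of keys."""
    start = bisect.bisect_left(index_keys, low)   # jump straight to the starting value
    result = []
    for key in index_keys[start:]:
        if key >= high:                           # first entry at or past the end: stop
            break
        result.append(key)
    return result


record_ids = [10, 500, 1000, 1500, 1999, 2000, 3000]
matches = index_scan(record_ids, 1000, 2000)
print(matches)
```

The scan never looks at the entries below the starting value or beyond the first entry that fails the ending condition, and the rows come back in index order.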

Sort

The Sort operator imposes an ordering on the result set. PostgreSQL uses two different sort strategies: an in-memory sort and an on-disk
sort. You can tune a PostgreSQL instance by adjusting the value of the sort_mem runtime parameter. If the size of the result set exceeds
sort_mem, Sort will distribute the input set to a collection of sorted work files and then merge the work files back together again. If the result
set will fit in sort_mem*1024 bytes, the sort is done in memory using the QSort algorithm.
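
The two strategies can be sketched as follows. This is a simplified model rather than PostgreSQL's implementation: the function sort_rows is hypothetical, the memory limit is expressed as a number of rows instead of kilobytes, and in-memory lists stand in for the sorted work files.

```python
import heapq


def sort_rows(rows, sort_mem_rows):
    """Sort in memory if the input fits; otherwise merge sorted runs."""
    rows = list(rows)
    if len(rows) <= sort_mem_rows:
        return sorted(rows)                       # the in-memory case
    # Each sorted run stands in for one sorted work file on disk.
    runs = [sorted(rows[i:i + sort_mem_rows])
            for i in range(0, len(rows), sort_mem_rows)]
    return list(heapq.merge(*runs))               # merge the runs back together


small = sort_rows([2, 1], sort_mem_rows=10)
large = sort_rows([5, 3, 8, 1, 9, 2], sort_mem_rows=2)
print(small, large)
```

Both paths produce the same ordering; the second simply trades memory for extra passes over the data.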

A Sort operator never reduces the size of the result set; it does not remove rows or columns.

Unlike Seq Scan and Index Scan, the Sort operator must process the entire input set before it can return the first row.

The Sort operator is used for many purposes. Obviously, a Sort can be used to satisfy an ORDER BY clause. Some query operators require
their input sets to be ordered. For example, the Unique operator (we'll see that in a moment) eliminates rows by detecting duplicate values
as it reads through a sorted input set. Sort will also be used for some join operations, group operations, and for some set operations (such
as INTERSECT and UNION).

Unique

The Unique operator eliminates duplicate values from the input set. The input set must be ordered by the columns, and the columns must be
unique. For example, the following command

SELECT DISTINCT mfgname FROM recalls;

might produce this execution plan:

Unique
  -> Sort
       -> Seq Scan on recalls

The Sort operation in this plan orders its input set by the mfgname column. Unique works by comparing the unique column(s) from each row
to the previous row. If the values are the same, the duplicate is removed from the result set.

The Unique operator removes only rows; it does not remove columns and it does not change the ordering of the result set.

Unique can return the first row in the result set before it has finished processing the input set.
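
Comparing each row with the one before it takes only a few lines to sketch. The generator below is a hypothetical illustration, not PostgreSQL source; it relies on the input already being sorted, which is the job of the Sort operator beneath Unique in the plan.

```python
def unique(sorted_rows):
    """Yield each row that differs from the row immediately before it."""
    previous = object()            # sentinel that compares unequal to any real row
    for row in sorted_rows:
        if row != previous:
            yield row              # rows can be returned as soon as they are seen
        previous = row


mfgnames = ["FORD", "FORD", "GM", "GM", "GM", "VOLVO"]
distinct = list(unique(mfgnames))
print(distinct)
```

Because it remembers only the previous row, the sketch can return its first row long before it reaches the end of the input.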

The planner/optimizer uses the Unique operator to satisfy a DISTINCT clause. Unique is also used to eliminate duplicates in a UNION.

LIMIT

The LIMIT operator is used to limit the size of a result set. PostgreSQL uses the LIMIT operator for both LIMIT and OFFSET processing. The
LIMIT operator works by discarding the first x rows from its input set, returning the next y rows, and discarding the remainder. If the query
includes an OFFSET clause, x represents the offset amount; otherwise, x is zero. If the query includes a LIMIT clause, y represents the LIMIT
amount; otherwise, y is at least as large as the number of rows in the input set.
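
The x and y bookkeeping maps directly onto itertools.islice from the Python standard library. The function limit and its parameter names are hypothetical; offset plays the role of x and count plays the role of y, with None meaning that no LIMIT clause was given.

```python
from itertools import islice


def limit(rows, offset=0, count=None):
    """Skip `offset` rows, then return at most `count` rows (all rows if None)."""
    stop = None if count is None else offset + count
    return islice(rows, offset, stop)


both = list(limit(range(10), offset=2, count=3))
offset_only = list(limit(range(5), offset=3))
limit_only = list(limit(range(10), count=2))
print(both, offset_only, limit_only)
```

Since islice stops pulling from its input once it has enough rows, the sketch never reads the remainder of a large input set.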

The ordering of the input set is not important to the LIMIT operator, but it is usually important to the overall query plan. For example, the
query plan for this query

perf=# EXPLAIN SELECT * FROM recalls LIMIT 5;
NOTICE: QUERY PLAN:

Limit (cost=0.00..0.10 rows=5 width=1917)
  -> Seq Scan on recalls (cost=0.00..9217.41 rows=39241 width=1917)

shows that the LIMIT operator rejects all but the first five rows returned by the Seq Scan. On the other hand, this query

perf=# EXPLAIN ANALYZE SELECT * FROM recalls ORDER BY yeartxt LIMIT 5;
NOTICE: QUERY PLAN:

Limit (cost=0.00..0.10 rows=5 width=1917)
  -> Sort (cost=145321.51..145321.51 rows=39241 width=1911)
       -> Seq Scan on recalls (cost=0.00..9217.41 rows=39241 width=1917)

shows that the LIMIT operator returns the first five rows from an ordered input set.

The LIMIT operator never removes columns from the result set, but it obviously removes rows.

The planner/optimizer uses a LIMIT operator if the query includes a LIMIT clause, an OFFSET clause, or both. If the query includes only a
LIMIT clause, the LIMIT operator can return the first row before it processes the entire set.

Aggregate

The planner/optimizer produces an Aggregate operator whenever the query includes an aggregate function. The following functions are
aggregate functions: AVG(), COUNT(), MAX(), MIN(), STDDEV(), SUM(), and VARIANCE().

Aggregate works by reading all the rows in the input set and computing the aggregate values. If the input set is not grouped, Aggregate
produces a single result row. For example:

movies=# EXPLAIN SELECT COUNT(*) FROM customers;

Aggregate (cost=22.50..22.50 rows=1 width=0)
  -> Seq Scan on customers (cost=0.00..20.00 rows=1000 width=0)

If the input set is grouped, Aggregate produces one result row for each group:

movies=# EXPLAIN
movies-# SELECT COUNT(*), EXTRACT( DECADE FROM birth_date )
movies-# FROM customers
movies-# GROUP BY EXTRACT( DECADE FROM birth_date );
NOTICE: QUERY PLAN:

Aggregate (cost=69.83..74.83 rows=100 width=4)
  -> Group (cost=69.83..72.33 rows=1000 width=4)
       -> Sort (cost=69.83..69.83 rows=1000 width=4)
            -> Seq Scan on customers (cost=0.00..20.00 rows=1000 width=4)

Notice that the row estimate of an ungrouped aggregate is always 1; the row estimate of a group aggregate is 1/10th of the size of the input
set.
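
The Sort, Group, and Aggregate pipeline in the grouped plan can be approximated with itertools.groupby, which (like the Group operator) expects its input to be ordered by the grouping key. The function group_count and the sample birth years are hypothetical; dividing a year by 10 stands in for EXTRACT( DECADE FROM birth_date ).

```python
from itertools import groupby


def group_count(rows, key):
    """Return (group key, COUNT(*)) pairs, one per group."""
    ordered = sorted(rows, key=key)                 # the Sort step
    return [(k, sum(1 for _ in members))            # the Aggregate step, per group
            for k, members in groupby(ordered, key=key)]


birth_years = [1968, 1972, 1969, 1984]
per_decade = group_count(birth_years, key=lambda year: year // 10)
print(per_decade)
```

Four input rows collapse into three result rows here, one for each decade that appears in the input.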

Append

The Append operator is used to implement a UNION. An Append operator will have two or more input sets. Append works by returning all rows
from the first input set, then all rows from the second input set, and so on until all rows from all input sets have been processed.

Here is a query plan that shows the Append operator:

perf=# EXPLAIN
perf-# SELECT * FROM recalls WHERE mfgname = 'FORD'
perf-# UNION
perf-# SELECT * FROM recalls WHERE yeartxt = '1983';

Unique
  -> Sort
       -> Append
            -> Subquery Scan *SELECT* 1
                 -> Seq Scan on recalls
            -> Subquery Scan *SELECT* 2
                 -> Seq Scan on recalls

The cost estimate for an Append operator is simply the sum of cost estimates for all input sets. An Append operator can return its first row
before processing all input rows.

The planner/optimizer uses an Append operator whenever it encounters a UNION clause. Append is also used when you select from a table
involved in an inheritance hierarchy. In Chapter 3, "PostgreSQL SQL Syntax and Use," I defined three tables, as shown in Figure 4.8.

Figure 4.8. Inheritance hierarchy.

The dvds table inherits from video, as does the tapes table. If you SELECT from dvds or tapes, PostgreSQL will respond with a simple query
plan:

movies=# EXPLAIN SELECT * FROM dvds;

Seq Scan on dvds (cost=0.00..20.00 rows=1000 width=122)

movies=# EXPLAIN SELECT * FROM tapes;

Seq Scan on tapes (cost=0.00..20.00 rows=1000 width=86)

Remember, because of the inheritance hierarchy, a dvd is a video and a tape is a video. If you SELECT from video, you would expect to see
all dvds, all tapes, and all videos. The query plan reflects the inheritance hierarchy:

movies=# EXPLAIN SELECT * FROM video;

Result (cost=0.00..60.00 rows=3000 width=86)
  -> Append (cost=0.00..60.00 rows=3000 width=86)
       -> Seq Scan on video (cost=0.00..20.00 rows=1000 width=86)
       -> Seq Scan on tapes video (cost=0.00..20.00 rows=1000 width=86)
       -> Seq Scan on dvds video (cost=0.00..20.00 rows=1000 width=86)

Look closely at the width clause in the preceding cost estimates. If you SELECT from the dvds table, the width estimate is 122 bytes per row.
If you SELECT from the tapes table, the width estimate is 86 bytes per row. When you SELECT from video, all rows are expected to be 86
bytes long. Here are the commands used to create the tapes and dvds tables:

movies=# CREATE TABLE tapes ( ) INHERITS( video );

movies=# CREATE TABLE dvds
movies-# (
movies(#   region_id INTEGER,
movies(#   audio_tracks VARCHAR[]
movies(# ) INHERITS ( video );

You can see that a row from the tapes table is identical to a row in the video table; you would expect them to be the same size (86 bytes).
A row in the dvds table contains a video plus a few extra columns, so you would expect a dvds row to be longer than a video row. When you
SELECT from the video table, you want all videos. PostgreSQL discards any columns that are not inherited from the video table.

Result

The Result operator is used in three contexts.

First, a Result operator is used to execute a query that does not retrieve data from a table:

movies=# EXPLAIN SELECT timeofday();

Result

In this form, the Result operator simply evaluates the given expression(s) and returns the results.

Result is also used to evaluate the parts of a WHERE clause that don't depend on data retrieved from a table. For example:

movies=# EXPLAIN SELECT * FROM tapes WHERE 1 <> 1;

Result
  -> Seq Scan on tapes

This might seem like a silly query, but some client applications will generate a query of this form as an easy way to retrieve the metadata
(that is, column definitions) for a table.

In this form, the Result operator first evaluates the constant part of the WHERE clause. If the expression evaluates to FALSE, no further
processing is required and the Result operator completes. If the expression evaluates to TRUE, Result will return its input set.

The planner/optimizer also generates a Result operator if the top node in the query plan is an Append operator. This is a rather obscure rule
that has no performance implications; it just happens to make the query planner and executor a bit simpler for the PostgreSQL developers to
maintain.

Nested Loop

The Nested Loop operator is used to perform a join between two tables. A Nested Loop operator requires two input sets (given that a
Nested Loop joins two tables, this makes perfect sense).

Nested Loop works by fetching each row from one of the input sets (called the outer table). For each row in the outer table, the other input
(called the inner table) is searched for a row that meets the join qualifier.
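
A minimal sketch of the idea, assuming rows are Python dictionaries and the join qualifier is equality on a shared column. The function nested_loop and the sample rows are hypothetical, and the inner input is searched with a plain loop here, whereas a real plan may use an index lookup for that step.

```python
def nested_loop(outer, inner, key):
    """For each outer row, search the inner input for rows with a matching key."""
    for outer_row in outer:
        for inner_row in inner:                    # search the inner table
            if outer_row[key] == inner_row[key]:   # the join qualifier
                yield (outer_row, inner_row)


rentals = [
    {"rental_id": 1, "customer_id": 2},
    {"rental_id": 2, "customer_id": 1},
    {"rental_id": 3, "customer_id": 2},
]
customers = [
    {"customer_id": 1, "customer_name": "Jones, Henry"},
    {"customer_id": 2, "customer_name": "Rubin, William"},
]

joined = [(r["rental_id"], c["customer_name"])
          for r, c in nested_loop(rentals, customers, "customer_id")]
print(joined)
```

The result follows the order of the outer input, because the outer loop drives the join.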

Here is an example:

perf=# EXPLAIN
perf-# SELECT * FROM customers, rentals
perf-# WHERE customers.customer_id = rentals.customer_id;

Nested Loop
  -> Seq Scan on rentals
  -> Index Scan using customer_id on customers

The outer table is always listed first in the query plan (in this case, rentals is the outer table). To execute this plan, the Nested Loop
operator will read each row [9] in the rentals table. For each rentals row, Nested Loop reads the corresponding customers row using an
indexed lookup on the customer_id index.

[9] Actually, Nested Loop reads only those rows that meet the query constraints.

A Nested Loop operator can be used to perform inner joins, left outer joins, and unions.

Because Nested Loop does not process the entire inner table, it can't be used for other join types (full, right join, and so on).

Merge Join

The Merge Join operator also joins two tables. Like the Nested Loop operator, Merge Join requires two input sets: an outer table and an
inner table. Each input set must be ordered by the join columns.

Let's look at the previous query, this time executed as a Merge Join:

perf=# EXPLAIN
perf-# SELECT * FROM customers, rentals
perf-# WHERE customers.customer_id = rentals.customer_id;

Merge Join
  -> Sort
       -> Seq Scan on rentals
  -> Index Scan using customer_id on customers

Merge Join starts reading the first row from each table (see Figure 4.9).

Figure 4.9. Merge Join: Step 1.

If the join columns are equal (as in this case), Merge Join creates a new row containing the necessary columns from each input table and
returns the new row. Merge Join then moves to the next row in the outer table and joins it with the corresponding row in the inner table
(see Figure 4.10).

Figure 4.10. Merge Join: Step 2.

Next, Merge Join reads the third row in the outer table (see Figure 4.11).

Figure 4.11. Merge Join: Step 3.

Now Merge Join must advance the inner table twice before another result row can be created (see Figure 4.12).

Figure 4.12. Merge Join: Step 4.

After producing the result row for customer_id = 3, Merge Join moves to the last row in the outer table and then advances the inner table
to a matching row (see Figure 4.13).

Figure 4.13. Merge Join: Step 5.

Merge Join completes by producing the final result row (customer_id = 4).

You can see that Merge Join works by walking through two sorted tables and finding matches; the trick is in keeping the pointers
synchronized.
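
The pointer-walking can be sketched as follows. This is a deliberately simplified inner join, not the general algorithm: the function merge_join is hypothetical, both inputs must already be sorted on the join key, and the key is assumed to be unique in the inner input so that the inner pointer never has to move backward. The sample rows are made up for illustration.

```python
def merge_join(outer, inner, key):
    """Inner-join two inputs sorted on `key`; assumes `key` is unique in `inner`."""
    inner_iter = iter(inner)
    inner_row = next(inner_iter, None)
    for outer_row in outer:
        # Advance the inner pointer until it catches up with the outer row.
        while inner_row is not None and inner_row[key] < outer_row[key]:
            inner_row = next(inner_iter, None)
        if inner_row is None:
            break                                  # inner input is exhausted
        if inner_row[key] == outer_row[key]:
            yield (outer_row, inner_row)           # keys match: emit a joined row


rentals = [{"customer_id": 1}, {"customer_id": 1},
           {"customer_id": 3}, {"customer_id": 4}]
customers = [{"customer_id": n} for n in (1, 2, 3, 4)]

matched_ids = [o["customer_id"]
               for o, i in merge_join(rentals, customers, "customer_id")]
print(matched_ids)
```

With this sample data, the third outer row (customer_id = 3) forces the inner pointer to advance twice before a match is found. Each input is read only once, which is the payoff for sorting both sides first.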

This example shows an inner join, but the Merge Join operator can be used for other join types by walking through the sorted input sets in
different ways. Merge Join can do inner joins, outer joins, and unions.

Hash and Hash Join

The Hash and Hash Join operators work together. The Hash Join operator requires two input sets, again called the outer and inner tables.
Here is a query plan that uses the Hash Join operator:

movies=# EXPLAIN
movies-# SELECT * FROM customers, rentals
movies-# WHERE rentals.customer_id = customers.customer_id;

Hash Join
  -> Seq Scan on customers
  -> Hash
       -> Seq Scan on rentals

Unlike other join operators, Hash Join does not require either input set to be ordered by the join column. Instead, the inner table is always a
hash table, and the ordering of the outer table is not important.

The Hash Join operator starts by creating its inner table using the Hash operator. The Hash operator creates a temporary Hash index that
covers the join column in the inner table.

Once the hash table (that is, the inner table) has been created, Hash Join reads each row in the outer table, hashes the join column (from
the outer table), and searches the temporary Hash index for a matching value.
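
The build-then-probe sequence can be sketched with a Python dictionary standing in for the temporary hash index. The function hash_join and the sample rows are hypothetical; the sketch performs an inner join only.

```python
from collections import defaultdict


def hash_join(outer, inner, key):
    """Build a hash table on the inner input, then probe it with each outer row."""
    table = defaultdict(list)
    for inner_row in inner:                        # the Hash step
        table[inner_row[key]].append(inner_row)
    for outer_row in outer:                        # the probe step
        for inner_row in table.get(outer_row[key], []):
            yield (outer_row, inner_row)


customers = [
    {"customer_id": 2, "customer_name": "Rubin, William"},
    {"customer_id": 1, "customer_name": "Jones, Henry"},
]
rentals = [
    {"rental_id": 1, "customer_id": 1},
    {"rental_id": 2, "customer_id": 2},
    {"rental_id": 3, "customer_id": 1},
]

joined = [(c["customer_name"], r["rental_id"])
          for c, r in hash_join(customers, rentals, "customer_id")]
print(joined)
```

Neither input is sorted in this example, which is the point: the hash table replaces the ordering that Merge Join depends on.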

A Hash Join operator can be used to perform inner joins, left outer joins, and unions.

Group

The Group operator is used to satisfy a GROUP BY clause. A single input set is required by the Group operator, and it must be ordered by the
grouping column(s).

Group can work in two distinct modes. If you are computing a grouped aggregate, Group will return each row in its input set, following each
group with a NULL row to indicate the end of the group (the NULL row is for internal bookkeeping only, and it will not show up in the final
result set). For example:

movies=# EXPLAIN
movies-# SELECT COUNT(*), EXTRACT( DECADE FROM birth_date )
movies-# FROM customers
movies-# GROUP BY EXTRACT( DECADE FROM birth_date );
NOTICE: QUERY PLAN:

Aggregate (cost=69.83..74.83 rows=100 width=4)
  -> Group (cost=69.83..72.33 rows=1000 width=4)
       -> Sort (cost=69.83..69.83 rows=1000 width=4)
            -> Seq Scan on customers (cost=0.00..20.00 rows=1000 width=4)

Notice that the row count in the Group operator's cost estimate is the same as the size of its input set.

If you are not computing a group aggregate, Group will return one row for each group in its input set. For example:

movies=# EXPLAIN
movies-# SELECT EXTRACT( DECADE FROM birth_date ) FROM customers
movies-# GROUP BY EXTRACT( DECADE FROM birth_date );

Group (cost=69.83..69.83 rows=100 width=4)
  -> Sort (cost=69.83..69.83 rows=1000 width=4)
       -> Seq Scan on customers (cost=0.00..20.00 rows=1000 width=4)

In this case, the estimated row count is 1/10th of the Group operator's input set.

Subquery Scan and Subplan

A Subquery Scan operator is used to satisfy a UNION clause; Subplan is used for subselects. These operators scan through their input sets,
adding each row to the result set. Each of these operators is used for internal bookkeeping purposes and really doesn't affect the overall
query plan; you can usually ignore them.

Just so you know when they are likely to be used, here are two sample query plans that show the Subquery Scan and Subplan operators:

perf=# EXPLAIN
perf-# SELECT * FROM recalls WHERE mfgname = 'FORD'
perf-# UNION
perf-# SELECT * FROM recalls WHERE yeartxt = '1983';

Unique
  -> Sort
       -> Append
            -> Subquery Scan *SELECT* 1
                 -> Seq Scan on recalls
            -> Subquery Scan *SELECT* 2
                 -> Seq Scan on recalls

movies=# EXPLAIN
movies-# SELECT * FROM customers
movies-# WHERE customer_id IN
movies-# (
movies(# SELECT customer_id FROM rentals
movies(# );
NOTICE: QUERY PLAN:

Seq Scan on customers (cost=0.00..3.66 rows=2 width=47)
  SubPlan
    -> Seq Scan on rentals (cost=0.00..1.04 rows=4 width=4)

Tid Scan

The Tid Scan (tuple ID scan) operator is rarely used. A tuple is roughly equivalent to a row. Every tuple has an identifier that is unique
within a table; this is called the tuple ID. When you select a row, you can ask for the row's tuple ID:

movies=# SELECT ctid, customer_id, customer_name FROM customers;

 ctid  | customer_id | customer_name
-------+-------------+----------------------
 (0,1) |           3 | Panky, Henry
 (0,2) |           1 | Jones, Henry
 (0,3) |           4 | Wonderland, Alice N.
 (0,4) |           2 | Rubin, William
(4 rows)

The "ctid" is a special column (similar to the OID) that is automatically a part of every row. A tuple ID is composed of a block number and a
tuple number within the block. All the rows in the previous sample are stored in block 0 (the first block of the table file). The customers row
for "Panky, Henry" is stored in tuple 1 of block 0.

After you know a row's tuple ID, you can request that row again by using its ID:

movies=# SELECT customer_id, customer_name FROM customers
movies-# WHERE ctid = '(0,3)';

 customer_id | customer_name
-------------+----------------------
           4 | Wonderland, Alice N.
(1 row)

The tuple ID works like a bookmark. A tuple ID, however, is valid only within a single transaction. After the transaction completes, the tuple
ID should not be used.

The Tid Scan operator is used whenever the planner/optimizer encounters a constraint of the form ctid = expression or expression =
ctid.

The fastest possible way to retrieve a row is by its tuple ID. When you SELECT by tuple ID, the Tid Scan operator reads the block specified in
the tuple ID and returns the requested tuple.

Materialize

The Materialize operator is used for some subselect operations. The planner/optimizer may decide that it is less expensive to materialize a
subselect once than to repeat the work for each top-level row.

Materialize will also be used for some merge-join operations. In particular, if the inner input set of a Merge Join operator is not produced
by a Seq Scan, an Index Scan, a Sort, or a Materialize operator, the planner/optimizer will insert a Materialize operator into the plan.
The reasoning behind this rule is not obvious; it has more to do with the capabilities of the other operators than with the performance or the
structure of your data. The Merge Join operator is complex; one requirement of Merge Join is that the input sets must be ordered by the
join columns. A second requirement is that the inner input set must be repositionable; that is, Merge Join needs to move backward and
forward through the input set. Not all ordered operators can move backward and forward. If the inner input set is produced by an operator
that is not repositionable, the planner/optimizer will insert a Materialize.

SetOp (Intersect, Intersect All, Except, Except All)

There are four SetOp operators: SetOp Intersect, SetOp Intersect All, SetOp Except, and SetOp Except All. These operators are
produced only when the planner/optimizer encounters an INTERSECT, INTERSECT ALL, EXCEPT, or EXCEPT ALL clause, respectively.

All SetOp operators require two input sets. The SetOp operators work by first combining the input sets into a sorted list, and then groups of
identical rows are identified. For each group, the SetOp operator counts the number of rows contributed by each input set. Finally, each
SetOp operator uses the counts to determine how many rows to add to the result set.

I think this will be easier to understand by looking at an example. Here are two queries; the first selects all customers born in the 1960s:

movies=# SELECT * FROM customers
movies-# WHERE EXTRACT( DECADE FROM birth_date ) = 196;
customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00

The second selects all customers with a balance greater than 0:

movies=# SELECT * FROM customers WHERE balance > 0;
customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00

Now, combine these two queries with an INTERSECT clause:

movies=# EXPLAIN
movies-# SELECT * FROM customers
movies-# WHERE EXTRACT( DECADE FROM birth_date ) = 196
movies-# INTERSECT
movies-# SELECT * FROM customers WHERE balance > 0;
SetOp Intersect
  -> Sort
       -> Append
            -> Subquery Scan *SELECT* 1
                 -> Seq Scan on customers
            -> Subquery Scan *SELECT* 2
                 -> Seq Scan on customers

The query executor starts by executing the two subqueries and then combining the results into a sorted list. An extra column is added that
indicates which input set contributed each row:

customer_id | customer_name | birth_date | balance | input set
-------------+----------------------+------------+---------+----------
2 | Rubin, William | 1972-07-10 | 15.00 | inner
3 | Panky, Henry | 1968-01-21 | 0.00 | outer
4 | Wonderland, Alice N. | 1969-03-05 | 3.00 | outer
4 | Wonderland, Alice N. | 1969-03-05 | 3.00 | inner

The SetOp operator finds groups of duplicate rows (ignoring the input set pseudo-column). For each group, SetOp counts the number of rows
contributed by each input set. The number of rows contributed by the outer set is called count(outer). The number of rows contributed by
the inner result set is called count(inner).

Here is how the sample looks after counting each group:

customer_id | customer_name | birth_date | balance | input set
-------------+----------------------+------------+---------+----------
2 | Rubin, William | 1972-07-10 | 15.00 | inner
count(outer) = 0
count(inner) = 1
3 | Panky, Henry | 1968-01-21 | 0.00 | outer
count(outer) = 1
count(inner) = 0
4 | Wonderland, Alice N. | 1969-03-05 | 3.00 | outer
4 | Wonderland, Alice N. | 1969-03-05 | 3.00 | inner
count(outer) = 1
count(inner) = 1

The first group contains a single row, contributed by the inner input set. The second group contains a single row, contributed by the outer
input set. The final group contains two rows, one contributed by each input set.

When SetOp reaches the end of a group of duplicate rows, it determines how many copies to write into the result set according to the
following rules:

• INTERSECT: If count(outer) > 0 and count(inner) > 0, write one copy of the row to the result set; otherwise, the row is not
included in the result set.

• INTERSECT ALL: If count(outer) > 0 and count(inner) > 0, write n copies of the row to the result set, where n is the lesser of
count(outer) and count(inner).

• EXCEPT: If count(outer) > 0 and count(inner) = 0, write one copy of the row to the result set.

• EXCEPT ALL: If count(outer) > count(inner), write n copies of the row to the result set, where n is count(outer) - count(inner).
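
The counting rules are easy to try out away from the database. The following Python sketch is only an illustration, not PostgreSQL's implementation: the function name setop and the two sample lists are made up for this example, rows are plain tuples, and the ALL variants follow standard SQL semantics (the lesser of the two counts for INTERSECT ALL, and count(outer) - count(inner) for EXCEPT ALL).

```python
from itertools import groupby

def setop(outer_rows, inner_rows, mode):
    """Sort-and-count sketch of INTERSECT [ALL] and EXCEPT [ALL] (hypothetical helper)."""
    # Tag every row with the input set that contributed it, then sort so that
    # identical rows sit next to each other (the Append and Sort steps).
    tagged = [(row, "outer") for row in outer_rows]
    tagged += [(row, "inner") for row in inner_rows]
    tagged.sort(key=lambda pair: pair[0])

    result = []
    for row, group in groupby(tagged, key=lambda pair: pair[0]):
        tags = [tag for _, tag in group]
        count_outer = tags.count("outer")
        count_inner = tags.count("inner")
        if mode == "INTERSECT":
            copies = 1 if count_outer > 0 and count_inner > 0 else 0
        elif mode == "INTERSECT ALL":
            copies = min(count_outer, count_inner)
        elif mode == "EXCEPT":
            copies = 1 if count_outer > 0 and count_inner == 0 else 0
        elif mode == "EXCEPT ALL":
            copies = max(count_outer - count_inner, 0)
        else:
            raise ValueError("unknown set operation: " + mode)
        result.extend([row] * copies)
    return result

# The two sample result sets from the text, reduced to (customer_id, customer_name).
born_in_1960s = [(3, "Panky, Henry"), (4, "Wonderland, Alice N.")]
positive_balance = [(2, "Rubin, William"), (4, "Wonderland, Alice N.")]

print(setop(born_in_1960s, positive_balance, "INTERSECT"))
# prints [(4, 'Wonderland, Alice N.')]
```

Only customer 4 appears in both input sets, so that is the only row that INTERSECT writes to the result set.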
Execution Plans Generated by the Planner

The EXPLAIN command only shows you the execution plan that PostgreSQL considered to be the least expensive. Unfortunately, you can't
convince PostgreSQL to show the other execution plans that it considered. The most common performance question that we hear is "why didn't
the database use my index?" If you could see all the alternatives, you could usually (but not always) answer that question.

When the optimizer generates a set of execution plans for a query, it starts by generating a set of plans that traverse each base table involved
in the query. For a single-table query, there is only one base table and the planner generates a single set of execution plans. For a multitable
query (a join), the planner starts by generating a set of traversal plans for each table.

There are only three ways that PostgreSQL can scan an individual table: a table scan (Seq Scan), an Index Scan, or a tuple-ID scan (TID
Scan). For each table involved in the query, the planner generates one plan that makes a pass over the table using a Seq Scan operator. If the
WHERE clause of the query selects one or more rows by ctid value, the planner generates a TID Scan. Next, the planner examines each index
defined for the table. In theory, any single-table query can be satisfied by making a complete scan of a B-Tree index (assuming that the index
is not a partial index), but the planner knows that a complete Index Scan is always more expensive than a complete Seq Scan and won't
consider an Index Scan unless it offers some advantage.

An index is useful if it can reduce the number of tuples read from the table. If the WHERE clause for a query contains an expression of the form
indexCol operator constant-expression or constant-expression operator indexCol, PostgreSQL may be able to use the index to read a
subset of the table. For example, the recalls table has a B-Tree index that covers the record_id column. If you execute the query

perf=# SELECT * FROM recalls WHERE record_id > 8000;

the planner examines the expression record_id > 8000 and finds that it is written in the form indexCol operator constant-expression;
the record_id column is an indexCol. Notice that PostgreSQL looks for a constant-expression, not just a constant. That means that an
expression such as record_id > (800 * 10) is acceptable as well. You can also include function calls in the constant-expression as long as
the functions are not volatile. A volatile function (such as random()) can return a different value from row to row as PostgreSQL scans through
the table. A constant-expression can include any operator that is not implemented by a volatile function.

Prior to version 8.0, PostgreSQL would only use an index if the data type of the indexed value exactly matched the data type of the
constant-expression. Starting with 8.0, PostgreSQL will use an index if constant-expression can be coerced (that is, converted) to the same
type as the indexed value.

An index is also useful if it can produce rows in a desired order. If an index produces rows in the sequence required by the ORDER BY clause,
the planner will generate an Index Scan plan for the table. Some of the query operators (MergeJoin, Unique, Group, and Setop) require an
ordered input set. For example, the Unique operator requires its input set to be ordered by the set of columns required to be unique. If an
index can produce rows in the order required by one of these operators, the planner will generate an Index Scan plan for the table. A Hash
index cannot produce rows in any particular order and therefore can't contribute to the ordering of a table.

What happens if you have two (or more) indexes that are useful to a given query? The planner generates a plan for each index and the
optimizer chooses the least expensive plan among all of the alternatives.

Once the planner has generated a set of plans for each base table, it generates a set of plans to join the tables together according to the WHERE
clause. PostgreSQL can join two tables together using any of three query operators: Merge Join, Hash Join, or Nested Loop. Consider a
simple two-table query such as

movies=# SELECT * FROM rentals, customers
movies-# WHERE rentals.customer_id = customers.customer_id;

Assuming that you have B-Tree indexes that cover rentals.customer_id and customers.customer_id, the planner would generate the
following plans to traverse each table individually:

SeqScan( rentals )
IndexScan( rentals.customer_id )
SeqScan( customers )
IndexScan( customers.customer_id )

To join these two tables together, the planner produces a set of execution plans.

First, the planner joins rentals and customers using the MergeJoin operator. Given that there are two paths through each table, the planner
produces four MergeJoin plans for the combination of rentals and customers:

MergeJoin( IndexScan( rentals.customer_id ), IndexScan( customers.customer_id ))
MergeJoin( Sort( SeqScan( rentals )), Sort( SeqScan( customers )))
MergeJoin( IndexScan( rentals.customer_id ), Sort( SeqScan( customers )))
MergeJoin( Sort( SeqScan( rentals )), IndexScan( customers.customer_id ))

Notice that the MergeJoin operator requires both input sets to be ordered by the join column; because a SeqScan operator does not produce
rows in any particular order, the planner inserts a Sort operator where needed.

Next, the planner produces a set of four NestedLoop plans:


NestedLoop( IndexScan( rentals.customer_id ), IndexScan( customers.customer_id ))
NestedLoop( SeqScan( rentals ), SeqScan( customers ))
NestedLoop( IndexScan( rentals.customer_id ), SeqScan( customers ))
NestedLoop( SeqScan( rentals ), IndexScan( customers.customer_id ))

Then, the planner considers a set of four HashJoin plans:

HashJoin
(
IndexScan( rentals.customer_id ),
Hash( IndexScan( customers.customer_id ))
)

HashJoin
(
SeqScan( rentals ),
Hash( SeqScan( customers ))
)

HashJoin
(
IndexScan( rentals.customer_id ) ,
Hash( SeqScan( customers ))
)

HashJoin
(
SeqScan( rentals ),
Hash( IndexScan( customers.customer_id ))
)

The HashJoin operator requires the inner input set to be a hash table, so the planner inserts a Hash operator in front of each of the inner
tables.

For a simple join, the planner has considered 12 plans. But it's not finished yet. The planner generates a second set of plans using customers
as the outer table and rentals as the inner table (in the first set of join plans, rentals served as the outer table and customers served as the
inner table):

MergeJoin( IndexScan( customers.customer_id ), IndexScan( rentals.customer_id ))
MergeJoin( Sort( SeqScan( customers )), Sort( SeqScan( rentals )))
MergeJoin( IndexScan( customers.customer_id ), Sort( SeqScan( rentals )))
MergeJoin( Sort( SeqScan( customers )), IndexScan( rentals.customer_id ))
NestedLoop( IndexScan( customers.customer_id ), IndexScan( rentals.customer_id ))
NestedLoop( SeqScan( customers ), SeqScan( rentals ))
NestedLoop( IndexScan( customers.customer_id ), SeqScan( rentals ))
NestedLoop( SeqScan( customers ), IndexScan( rentals.customer_id ))
HashJoin( IndexScan( customers.customer_id ), Hash( IndexScan( rentals.customer_id )))
HashJoin( SeqScan( customers ), Hash( SeqScan( rentals )))
HashJoin( IndexScan( customers.customer_id ), Hash( SeqScan( rentals )))
HashJoin( SeqScan( customers ), Hash( IndexScan( rentals.customer_id )))

Once it's finished, the planner has considered 24 plans to join these two tables. In general, the planner will consider

joinOperatorCount x (( pathCount( table1 ) x pathCount( table2 )) x 2 )

plans to join any two tables, where pathCount(table) is the number of possible paths (Seq Scans, Index Scans, and TID Scans) through a
given table and joinOperatorCount is always 3 in PostgreSQL (MergeJoin, NestedLoop, and HashJoin).
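
To check the arithmetic, here is a small Python sketch that enumerates the candidate joins for the rentals/customers example and compares the total with the formula. The function names count_join_plans and enumerate_join_plans are invented for this illustration, and the sketch leaves out the Sort and Hash operators that the planner adds where an input must be ordered or hashed; it only counts combinations.

```python
from itertools import product

JOIN_OPERATORS = ("MergeJoin", "NestedLoop", "HashJoin")

def count_join_plans(path_count_1, path_count_2, join_operator_count=3):
    """joinOperatorCount x ((pathCount(table1) x pathCount(table2)) x 2)."""
    return join_operator_count * ((path_count_1 * path_count_2) * 2)

def enumerate_join_plans(paths_1, paths_2):
    """List every (operator, outer path, inner path) combination (hypothetical helper)."""
    plans = []
    # Each table takes a turn as the outer table, hence the factor of 2.
    for outer_paths, inner_paths in ((paths_1, paths_2), (paths_2, paths_1)):
        for operator, outer, inner in product(JOIN_OPERATORS, outer_paths, inner_paths):
            plans.append((operator, outer, inner))
    return plans

rentals_paths = ["SeqScan(rentals)", "IndexScan(rentals.customer_id)"]
customers_paths = ["SeqScan(customers)", "IndexScan(customers.customer_id)"]

plans = enumerate_join_plans(rentals_paths, customers_paths)
print(len(plans), count_join_plans(2, 2))
# prints 24 24
```

If one of the tables also offered a TID Scan path, the same formula would give 3 x ((3 x 2) x 2) = 36 candidate plans.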

As you've seen, a two-table join will result in 24 possible plans (assuming that there are two paths through each table). Add a third table and
the number of possible plans skyrockets. So how does the planner generate plans for a three-table join? It first generates a set of plans to join
two of the three tables into a single result set and then generates a set of plans to join the intermediate result set to the remaining table. With
three tables (a, b, and c), you find the following combinations (note: I've abbreviated MergeJoin and HashJoin here to better fit the printed
page):

Merge( a, Join( b, c )) NestedLoop( a, Join( b, c )) Hash( a, Join( b, c ))
Merge( a, Join( c, b )) NestedLoop( a, Join( c, b )) Hash( a, Join( c, b ))
Merge( b, Join( a, c )) NestedLoop( b, Join( a, c )) Hash( b, Join( a, c ))
Merge( b, Join( c, a )) NestedLoop( b, Join( c, a )) Hash( b, Join( c, a ))
Merge( c, Join( a, b )) NestedLoop( c, Join( a, b )) Hash( c, Join( a, b ))
Merge( c, Join( b, a )) NestedLoop( c, Join( b, a )) Hash( c, Join( b, a ))
Merge( Join( a, b ), c ) NestedLoop( Join( a, b ), c ) Hash( Join( a, b ), c )
Merge( Join( a, c ), b ) NestedLoop( Join( a, c ), b ) Hash( Join( a, c ), b )
Merge( Join( b, a ), c ) NestedLoop( Join( b, a ), c ) Hash( Join( b, a ), c )
Merge( Join( b, c ), a ) NestedLoop( Join( b, c ), a ) Hash( Join( b, c ), a )
Merge( Join( c, a ), b ) NestedLoop( Join( c, a ), b ) Hash( Join( c, a ), b )
Merge( Join( c, b ), a ) NestedLoop( Join( c, b ), a ) Hash( Join( c, b ), a )

And considering that any of these two-table joins results in 24 possible plans, you're suddenly looking at 864 possible plans (36 combinations
x 24 plans each)! If you add a fourth table, the planner considers the plans needed to join three of the four tables into an intermediate result,
then joins the fourth table to that.
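
The 864 figure can be reproduced with a few lines of Python. This is a counting sketch only: the helper three_table_join_shapes is a made-up name, and it simply pairs each table with every ordered two-table join of the other two, on either side of each join operator, using the abbreviated operator names from the list above.

```python
from itertools import permutations

def three_table_join_shapes(tables):
    """Enumerate every way to join one table to a two-table join (hypothetical helper)."""
    shapes = []
    for single, first, second in permutations(tables, 3):
        for operator in ("Merge", "NestedLoop", "Hash"):
            # The two-table join can serve as the inner or the outer input.
            shapes.append(f"{operator}({single}, Join({first}, {second}))")
            shapes.append(f"{operator}(Join({first}, {second}), {single})")
    return shapes

shapes = three_table_join_shapes(["a", "b", "c"])
plans_per_two_table_join = 24  # from the rentals/customers example

print(len(shapes), len(shapes) * plans_per_two_table_join)
# prints 36 864
```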

In practice, the planner won't take the time to generate every possible plan; the planner contains a number of heuristics that avoid generating
plans that are known to be more expensive than plans already seen. For example, the planner knows that a complete Index Scan is more
expensive than a complete Seq Scan and it won't generate a plan that includes a complete Index Scan unless the ordering of the result set is
important.

In fact, when you reach a certain point, the plan generator switches from a near-exhaustive search to an algorithm known as the genetic query
optimizer. The genetic optimizer evolves a plan by mutating and recombining possible join plans and then evaluating each generation for its
"fitness." As each generation emerges, the genetic optimizer selects those mutations and recombinations that result in lower-cost execution plans.
The plan that eventually evolves is not guaranteed to be the best possible plan, but it is typically a "good" plan. By default, PostgreSQL uses
the genetic query optimizer when the FROM clause of a query refers to 12 or more tables.
The ARC Buffer Manager

It's important to keep a few points in mind when you're trying to tune a PostgreSQL database. First, the shared buffer cache is shared. All of
the examples in this chapter were built using a single-session database; if you try to reproduce these experiments, be sure you're the only
one using the database or your results will vary widely. Second, the important part of a buffer management scheme isn't the part that
determines what goes into the cache; it's the part that determines what gets thrown out of the cache. When PostgreSQL reads data from a
table, it first checks to see if the required page is in the cache. If PostgreSQL finds it in the cache, it stops looking. If the required page isn't
in the cache, PostgreSQL must read it in from disk. That means that every page in every table is read into the cache as soon as a query (or
other command) refers to the page. If your buffer cache is large enough, PostgreSQL will never evict a page from the cache (although it will
write modified pages to disk).

When PostgreSQL adds a page to the shared cache and finds that the cache is already full, it must evict some other page. Prior to release
8.0, PostgreSQL would always evict the least-recently-used page. Pretend that you have a very small buffer cache (say three pages). When
you execute a command that causes a table scan (a scan of every page from beginning to end), the server starts by reading the first page
into the cache. Next, the server processes every tuple on that page (ignoring dead and uncommitted tuples as it goes).

Once it has finished processing the first page, it unpins that page in the cache (meaning that that page is no longer in use and can be evicted
if necessary). The server then moves on to the second page. Because there are still two free pages in the cache, PostgreSQL just reads the
second page from disk, stores that page in the cache, processes each tuple in that page, and then unpins that page. The server repeats this
sequence for the third page. When PostgreSQL comes to the fourth page, it finds that the cache is full and evicts the least-recently-used page
(page one) from the cache, replacing it with page four. That leaves you with pages two, three, and four in the cache. When the server reads
in page five, it evicts page two (the least-recently-used page) from the cache, leaving you with pages three, four, and five in the cache. That
sequence continues until the server has finished reading the entire table; when you're finished, the cache contains the last three pages from
the table. If you execute the same command again, the sequence is the same except that PostgreSQL will have to evict one of the last three
pages before it can add the first page to the cache. If instead you execute a command that can be satisfied by looking at the last three pages
of the table, PostgreSQL will find those pages in the cache and won't read them from disk.
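
You can replay this eviction sequence with a toy model. The Python sketch below is not PostgreSQL code: the LRUCache class, its access method, and the disk_reads counter are names invented for this illustration, and pinning is ignored because the example runs a single scan at a time.

```python
from collections import OrderedDict

class LRUCache:
    """Toy least-recently-used page cache (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # least recently used page comes first
        self.disk_reads = 0

    def access(self, page):
        """Return True on a cache hit, False if the page had to be read from disk."""
        if page in self.pages:
            self.pages.move_to_end(page)      # mark as most recently used
            return True
        self.disk_reads += 1
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)    # evict the least recently used page
        self.pages[page] = True
        return False

    def contents(self):
        return list(self.pages)

cache = LRUCache(capacity=3)
for page in range(1, 6):          # scan pages one through five
    cache.access(page)

print(cache.contents())           # prints [3, 4, 5]
print(cache.disk_reads)           # prints 5
```

After the scan reaches page five, the cache holds pages three, four, and five, just as described above. A second scan from page one would miss on every page, because each page is evicted before the scan comes back around to it.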

Of course, if another user is running a command at the same time, he's using the same buffer cache and the eviction sequence will be
completely different.

As you've seen earlier in this chapter, a table scan can evict all of the pages from an LRU cache.

Starting with version 8.0, PostgreSQL uses a new caching mechanism that constantly adapts itself to a changing workload. The ARC (adaptive
replacement cache) scheme effectively uses two caches: One is a traditional LRU cache and the other is an LFU cache. LFU stands for
"least-frequently-used" as opposed to "least-recently-used." PostgreSQL divides the shared memory segment into one cache that buffers recently
used pages and a second cache that buffers frequently used pages. That means that if your shared buffer cache contains 1,024 pages, some
buffers will hold recently used pages and some will hold frequently used pages. How many pages does PostgreSQL devote to each
cache? It depends on your workload: PostgreSQL adjusts the relative size of each cache as it runs. If the server sees a period of high "locality
of reference" (meaning that the current workload is frequently accessing a small set of pages), it devotes more space to the LFU cache
(taking pages away from the LRU cache). If the server sees a request for a page that was recently evicted from the LRU cache, it devotes
more space to the LRU cache (taking pages away from the LFU cache). To see how ARC affects the shared buffer cache, we'll show you two
simple queries. We'll run the queries first in PostgreSQL version 7.4.2, then again in version 8.0.

$ timer \
> "SELECT * FROM recalls WHERE record_id > 8000 AND record_id < 8050" recalls
+-------------+--------------------------------+------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|recalls | 0 | 0 | 4 | 45 | 1 | 49 | 3 | 0 |
+-------------+------+-------+----------+------+------+-------+---------+------+

To satisfy this query, PostgreSQL uses the index that covers the record_id column. Because we just restarted the 7.4 server (and we have
enough room for 512 pages in the shared buffer cache), PostgreSQL had to read from disk all three of the index blocks that we hit.

$ timer \
> "SELECT * FROM recalls WHERE record_id > 8000 AND record_id < 8050" recalls
+-------------+--------------------------------+-------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|recalls | 0 | 0 | 0 | 49 | 1 | 49 | 0 | 3 |
+-------------+--------------------------------+-------------------------------+

When we execute the same query again, PostgreSQL reads the same three index blocks, but this time, it finds them in the cache. Now we'll
execute a query that causes a table scan:

$ timer "SELECT * FROM recalls" recalls
+-------------+--------------------------------+-------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|recalls | 1 | 39241 | 4400 | 0 | 0 | 0 | 0 | 0 |
+-------------+--------------------------------+-------------------------------+

This query made a complete pass through the table, shuffling all 4,400 heap blocks through a cache that can only hold 512 blocks. When the
query completes, the last 512 or so heap blocks that we read are still in the cache. Now go back and execute the first query (the one that
causes a partial index scan):

$ timer \
> "SELECT * FROM recalls WHERE record_id > 8000 AND record_id < 8050" recalls
+-------------+--------------------------------+-------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|recalls | 0 | 0 | 4 | 45 | 1 | 49 | 3 | 0 |
+-------------+--------------------------------+-------------------------------+

Notice that PostgreSQL needed the same three index blocks again, but the intervening table scan had evicted them from the cache, so
they had to be read from disk.

Now here is the same sequence running in a version 8.0 server. Again, we'll execute the same query that caused a partial index scan:

$ timer \
> "SELECT * FROM recalls WHERE record_id > 8000 AND record_id < 8050" recalls
+-------------+--------------------------------+-------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|recalls | 0 | 0 | 5 | 0 | 1 | 49 | 4 | 0 |
+-------------+--------------------------------+-------------------------------+

And we'll execute it again just to make sure that the index blocks did in fact stay in the cache:

$ timer \
> "SELECT * FROM recalls WHERE record_id > 8000 AND record_id < 8050" recalls
+-------------+--------------------------------+-------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|recalls | 0 | 0 | 0 | 5 | 1 | 49 | 0 | 4 |
+-------------+--------------------------------+-------------------------------+

Now we'll execute a query that, in version 7.4.2, threw the index blocks out of the cache:

$ timer "SELECT * FROM recalls" recalls
+-------------+--------------------------------+-------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|recalls | 1 | 39241 | 4400 | 5 | 0 | 0 | 0 | 0 |
+-------------+--------------------------------+-------------------------------+

And finally, we'll repeat the partial index scan query:

$ timer \
> "SELECT * FROM recalls WHERE record_id > 8000 AND record_id < 8050" recalls
+-------------+--------------------------------+-------------------------------+
| | SEQUENTIAL I/O | INDEXED I/O |
| |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached|
|-------------+------+-------+----------+------+------+-------+---------+------+
|recalls | 0 | 0 | 0 | 5 | 1 | 49 | 0 | 4 |
+-------------+--------------------------------+-------------------------------+

This time, the 8.0 server has retained the frequently used index blocks in the LFU cache.
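
To get a feel for why a second list protects the index blocks, here is a deliberately simplified Python sketch. It is not the ARC algorithm and not PostgreSQL's buffer manager: the TwoListCache class and its fixed list sizes are assumptions made for this illustration, the "frequent" list simply holds pages that have been referenced at least twice, and the adaptive resizing described above is left out entirely.

```python
from collections import OrderedDict

class TwoListCache:
    """Simplified two-list cache: pages seen once versus pages seen again.

    Illustration only; real ARC also adapts the size of each list.
    """

    def __init__(self, recent_capacity, frequent_capacity):
        self.recent_capacity = recent_capacity
        self.frequent_capacity = frequent_capacity
        self.recent = OrderedDict()     # pages referenced once
        self.frequent = OrderedDict()   # pages referenced at least twice

    def access(self, page):
        """Return True on a cache hit, False on a miss."""
        if page in self.frequent:
            self.frequent.move_to_end(page)
            return True
        if page in self.recent:
            # Second reference: promote the page to the frequent list.
            del self.recent[page]
            if len(self.frequent) >= self.frequent_capacity:
                self.frequent.popitem(last=False)
            self.frequent[page] = True
            return True
        # Miss: new pages only ever displace other pages in the recent list.
        if len(self.recent) >= self.recent_capacity:
            self.recent.popitem(last=False)
        self.recent[page] = True
        return False

cache = TwoListCache(recent_capacity=4, frequent_capacity=4)
index_blocks = ["idx1", "idx2", "idx3", "idx4"]

for _ in range(2):                     # run the index query twice
    for block in index_blocks:
        cache.access(block)

for heap_block in range(4400):         # then scan the whole table once
    cache.access(("heap", heap_block))

print(all(cache.access(block) for block in index_blocks))
# prints True
```

The table scan touches each heap block only once, so those blocks cycle through the recent list and never displace the index blocks that were promoted by the repeated index query.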
Table Statistics

You've seen all the operators that PostgreSQL can use to execute a query. Remember that the goal of the optimizer is to find the plan with the
least overall expense. Each operator uses a different algorithm for estimating its cost of execution. The cost estimators need some basic
statistical information to make educated estimates.

Table statistics are stored in two places in a PostgreSQL database: pg_class and pg_statistic.

The pg_class system table contains one row for each table defined in your database (it also contains information about views, indexes, and
sequences). For any given table, the pg_class.relpages column contains an estimate of the number of 8KB pages required to hold the table.
The pg_class.reltuples column contains an estimate of the number of tuples currently contained in each table.

Note that pg_class holds only estimates. When you create a new table, the relpages estimate is set to 10 pages and reltuples is set to 1,000
tuples. As you INSERT and DELETE rows, PostgreSQL does not maintain the pg_class estimates. You can see this here:

movies=# SELECT * FROM tapes;
tape_id | title | duration
----------+---------------+----------
AB-12345 | The Godfather |
AB-67472 | The Godfather |
MC-68873 | Casablanca |
OW-41221 | Citizen Kane |
AH-54706 | Rear Window |
(5 rows)

movies=# CREATE TABLE tapes2 AS SELECT * FROM tapes;
SELECT
movies=# SELECT reltuples, relpages FROM pg_class
movies-# WHERE relname = 'tapes2';
reltuples | relpages
-----------+----------
1000 | 10

This example creates the tapes2 table by duplicating the tapes table. You know that tapes2 really holds five tuples (and probably requires a
single disk page), but PostgreSQL has not updated the initial default estimate.

There are three commands that you can use to update the pg_class estimates: VACUUM, ANALYZE, and CREATE INDEX.

The VACUUM command removes any dead tuples from a table and recomputes the pg_class statistical information:

movies=# VACUUM tapes2;
VACUUM
movies=# SELECT reltuples, relpages FROM pg_class WHERE relname = 'tapes2';
reltuples | relpages
-----------+----------
5 | 1
(1 row)

The pg_statistic system table holds detailed information about the data in a table. Like pg_class, pg_statistic is not automatically
maintained when you INSERT and DELETE data. The pg_statistic table is not updated by the VACUUM or CREATE INDEX command, but it is
updated by the ANALYZE command:

movies=# SELECT staattnum, stawidth, stanullfrac FROM pg_statistic
movies-# WHERE starelid =
movies-# (
movies(# SELECT oid FROM pg_class WHERE relname = 'tapes2'
movies(# );
staattnum | stawidth | stanullfrac
-----------+----------+-------------
(0 rows)

movies=# ANALYZE tapes2;
ANALYZE

movies=# SELECT staattnum, stawidth, stanullfrac FROM pg_statistic
movies-# WHERE starelid =
movies-# (
movies(# SELECT oid FROM pg_class WHERE relname = 'tapes2'
movies(# );
staattnum | stawidth | stanullfrac
-----------+----------+-------------
1 | 12 | 0
2 | 15 | 0
3 | 4 | 0
(3 rows)

PostgreSQL defines a view (called pg_stats) that makes the pg_statistic table a little easier to deal with. Here is what the pg_stats view
tells us about the tapes2 table:

movies=# SELECT attname, null_frac, avg_width, n_distinct FROM pg_stats
movies-# WHERE tablename = 'tapes2';
attname | null_frac | avg_width | n_distinct
---------+-----------+-----------+------------
tape_id | 0 | 12 | -1
title | 0 | 15 | -0.8
(2 rows)

You can see that pg_stats (and the underlying pg_statistic table) contains one row for each column in the tapes2 table (except for the
duration column where every value happens to be NULL). The null_frac value tells you the fraction of rows where a given column
contains NULL. In this case, there are no NULL values in the tape_id and title columns, so null_frac is set to 0 for each of them. avg_width
contains the average width (in bytes) of the values in a given column. The n_distinct value tells you how many distinct values are present for
a given column. If n_distinct is positive, it indicates the actual number of distinct values. If n_distinct is negative, its absolute value is the
number of distinct values divided by the number of rows. A value of -1 tells you that every row in the table contains a unique value for that column.
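
The two encodings of n_distinct can be folded into one small helper. This is a Python sketch for illustration; estimated_distinct_values is a made-up name rather than anything PostgreSQL provides, and the row counts come from the tapes2 and recalls examples in this chapter.

```python
def estimated_distinct_values(n_distinct, row_count):
    """Turn a pg_stats.n_distinct value into a number of distinct values.

    Positive values are already a count; negative values are the negated
    ratio of distinct values to rows, so they scale with the table size.
    """
    if n_distinct >= 0:
        return n_distinct
    return -n_distinct * row_count

print(estimated_distinct_values(-1, 5))       # tape_id: every row is distinct
print(estimated_distinct_values(-0.8, 5))     # title: four distinct titles in five rows
print(estimated_distinct_values(825, 39241))  # recalls.datea: a plain count
```

The negative form is useful because the estimate remains reasonable as the table grows: a column in which every row is unique stays at -1 no matter how many rows you add.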

pg_stats also contains information about the actual values in a table:

movies=# SELECT attname, most_common_vals, most_common_freqs
movies-# FROM pg_stats
movies-# WHERE tablename = 'tapes2';
attname | most_common_vals | most_common_freqs
---------+-------------------+-------------------
tape_id | |
title | {"The Godfather"} | {0.4}
(2 rows)

The most_common_vals column is an array containing the most common values in a given column. The most_common_freqs value tells you how
often each of the most common values appears. By default, ANALYZE stores the 10 most common values (and the frequency of those 10 values).
You can increase or decrease the number of common values using the ALTER TABLE ... SET STATISTICS command.

Looking back at the recalls table, you can see that the datea column contains 825 distinct values:

perf=# \x
Expanded display is on.
perf=# SELECT
perf-# n_distinct, most_common_vals, most_common_freqs
perf-# FROM
perf-# pg_stats
perf-# WHERE
perf-# tablename = 'recalls' AND attname = 'datea';
-[ RECORD 1 ]-----+-------------------------------------------------------------
n_distinct | 825
most_common_vals | {19791012,19921230,20001129,19980814,19950524,19950901,...}
most_common_freqs | {0.319667,0.005,0.005,0.00466667,0.00433333,0.00433333,...}

(We've turned on psql's expanded display with the \x command and trimmed the results a bit to make them easier to read.) The most
commonly found datea value is 19791012 and it occurs in approximately 32% of all rows. The second most common value is 19921230 and
that value is found in .5% of all rows. Go ahead and create an index that covers the datea column and then ANALYZE the recalls table:

perf=# CREATE INDEX recalls_by_datea ON recalls( datea );
CREATE INDEX
perf=# ANALYZE recalls;
ANALYZE

You might expect PostgreSQL to use this index in a query that selects rows based on datea values, and sometimes it does:

perf=# EXPLAIN SELECT * FROM recalls WHERE datea = '19921230';
                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using recalls_by_datea (cost=0.00..31.44 rows=31 width=1908)
   Index Cond: (datea = '19921230'::bpchar)

You can see that the optimizer chose a partial index scan (using the recalls_by_datea index) to satisfy this query. Now try to select a
different set of rows:

perf=# EXPLAIN SELECT * FROM recalls WHERE datea = '19791012';
                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on recalls (cost=0.00..9015.09 rows=10690 width=1908)
   Filter: (datea = '19791012'::bpchar)

In this case, the optimizer thinks it would be faster to perform a complete table scan on the recalls table, ignoring the rows that fail to satisfy
the WHERE clause.

The only thing that's changed between the two queries is the value that you're searching for. The recalls table hasn't changed; the statistics
haven't changed. Why would PostgreSQL use an index to retrieve the second most-frequently-found value, but not the most-frequently-found
value? Because the optimizer knows (based on pg_stats.most_common_vals and pg_stats.most_common_freqs) that it must process 32%
of the recalls table in the second case, and reading that much of a table via an index is more expensive than reading it via a sequential scan.
To retrieve the second most-frequently-found value, the optimizer knows that it will only read 0.5% of the table via the index.
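To make the arithmetic behind that choice concrete, here is a minimal Python sketch. It is not the optimizer's real cost model: the row count of 30000 and the 5% cutoff are assumptions chosen only for illustration, and the function names are hypothetical. The value and frequency lists are copied from the (trimmed) pg_stats output shown earlier.

```python
def estimate_matching_rows(value, most_common_vals, most_common_freqs, total_rows):
    """Estimate how many rows equal `value` from a most-common-values list."""
    for val, freq in zip(most_common_vals, most_common_freqs):
        if val == value:
            return round(freq * total_rows)
    return None  # value is not in the list; this sketch does not cover that case


def prefer_index_scan(value, most_common_vals, most_common_freqs, cutoff=0.05):
    """Toy rule: use the index only when the value is rare (cutoff is an assumption)."""
    freqs = dict(zip(most_common_vals, most_common_freqs))
    return freqs.get(value, 0.0) < cutoff


# Copied from the pg_stats output for recalls.datea (list trimmed as in the text).
vals = ["19791012", "19921230", "20001129"]
freqs = [0.319667, 0.005, 0.005]

# 30000 is a hypothetical table size, not a figure taken from the recalls table.
for v in vals[:2]:
    print(v, estimate_matching_rows(v, vals, freqs, 30000), prefer_index_scan(v, vals, freqs))
```

The point is only that the same query shape can lead to different plans because the estimated fraction of the table differs so widely between the two values.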

Another statistic exposed by pg_stats is called histogram_bounds. The histogram_bounds column contains an array of values for each column
in your table. These values are used to partition your data into approximately equally sized chunks. For example, here are the
histogram_bounds values for the recalls.potaff column:

perf=# SELECT histogram_bounds FROM pg_stats
perf-#   WHERE tablename = 'recalls' and attname = 'potaff';
                      histogram_bounds
------------------------------------------------------------
 {3,104,305,700,1503,3203,6503,15263,48003,210003,32000003}

Because there are 11 values shown (which delimit 10 buckets), the histogram_bounds show that 10% of the potaff values fall between 3 and 104,
10% of the potaff values fall between 104 and 305, 10% fall between 305 and 700, and so on. The optimizer uses the histogram_bounds to decide
how much of an index it will need to traverse in order to search for a specific value. For example, if you search for a potaff value of 32000004
(which fits into the last histogram "bucket"), PostgreSQL knows that it will only have to traverse the last 10% of the recalls_by_potaff index to
satisfy the query. On the other hand, if you search for the value 210003, PostgreSQL must traverse the last 20% of the index. Here's how the
optimizer handles a search over the last 10% of the index:

perf=# EXPLAIN SELECT * FROM recalls WHERE potaff >= 14500003;
                                QUERY PLAN
---------------------------------------------------------------------------------
 Index Scan using recalls_by_potaff (cost=0.00..17.99 rows=4 width=1908)
   Index Cond: (potaff >= 14500003::numeric)

The optimizer has chosen an index scan to satisfy this query because traversing 10% of the index is less expensive than a complete table scan.
Now consider a similar query, but this time, you're searching for a value known to fall within the last 20% of the index:

perf=# EXPLAIN SELECT * FROM recalls WHERE potaff >= 210003;
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on recalls (cost=0.00..9015.09 rows=3928 width=1908)
   Filter: (potaff >= 210003::numeric)

This time, the optimizer has chosen a table scan. The structure of this query is identical to the previous query; only the search value has
changed.
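The bucket reasoning used above can be sketched in a few lines of Python. This is a coarse reading of the explanation in the text, not the planner's actual arithmetic (as far as I know, the real planner also interpolates within a bucket rather than counting whole buckets). The function name is hypothetical, and a value that sits exactly on a boundary is counted with the lower bucket so that the result matches the "last 20%" figure quoted for 210003.

```python
from bisect import bisect_left

# histogram_bounds for recalls.potaff, copied from the query output above
BOUNDS = [3, 104, 305, 700, 1503, 3203, 6503, 15263, 48003, 210003, 32000003]


def fraction_to_traverse(bounds, value):
    """Fraction of the index at or above the bucket that holds `value`."""
    buckets = len(bounds) - 1                  # 11 bounds delimit 10 buckets
    index = bisect_left(bounds, value) - 1     # boundary values go to the lower bucket
    index = min(max(index, 0), buckets - 1)    # clamp values outside the histogram
    return (buckets - index) / buckets


for search_value in (32000004, 14500003, 210003):
    print(search_value, fraction_to_traverse(BOUNDS, search_value))
```

With these bounds, the sketch reports one bucket (10%) for 32000004 and 14500003, and two buckets (20%) for 210003, which lines up with the two plans shown above.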

The last statistic stored in pg_stats is an indication of whether the rows in a table are stored in column order:

movies=# SELECT attname, correlation FROM pg_stats
movies-#   WHERE tablename = 'tapes2';
 attname | correlation
---------+-------------
 tape_id |         0.7
 title   |        -0.5
(2 rows)

A correlation of 1 means that the rows are sorted by the given column. In practice, you will see a correlation of 1 only for brand new
tables (whose rows happened to be sorted before insertion) or tables that you have reordered using the CLUSTER command.
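If you want a feel for what such a number measures, the following Python sketch computes a correlation between the physical position of each row and the rank of its column value. It is an illustration under my own assumptions (a plain rank correlation over a small in-memory list with at least two rows), not the computation that ANALYZE performs, and the function name is hypothetical.

```python
def ordering_correlation(values):
    """Correlate physical row position with the sorted rank of each value."""
    n = len(values)
    # rank[i] is where row i would land if the rows were sorted by value
    order = sorted(range(n), key=lambda i: values[i])
    rank = [0] * n
    for sorted_position, row in enumerate(order):
        rank[row] = sorted_position
    mean = (n - 1) / 2
    covariance = sum((pos - mean) * (rank[pos] - mean) for pos in range(n))
    variance = sum((pos - mean) ** 2 for pos in range(n))
    return covariance / variance


print(ordering_correlation([10, 20, 30, 40, 50]))   # rows already in column order
print(ordering_correlation([50, 40, 30, 20, 10]))   # rows in reverse order
print(ordering_correlation([30, 10, 50, 20, 40]))   # rows in no particular order
```

Rows stored in column order give 1, rows stored in reverse order give -1, and a shuffled table lands somewhere in between.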
Performance Tips

That wraps up the discussion of performance in PostgreSQL. Here are a few tips that you should keep in mind whenever you
run into an apparent performance problem:

• VACUUM and ANALYZE your database after any large change in data values. This will give the query optimizer a better
  idea of how your data is distributed.

• Use the CREATE TABLE AS or CLUSTER commands to cluster rows with similar key values. This makes an index
  traversal much faster.

• If you think you have a performance problem, use the EXPLAIN command to find out how PostgreSQL has decided to
  execute your query.

• You can influence the optimizer by disabling certain query operators. For example, if you want to ensure that a query
  is executed as a sequential scan, you can disable the Index Scan operator by executing the following command: "SET
  ENABLE_INDEXSCAN TO OFF;". Disabling an operator does not guarantee that the optimizer won't use that operator;
  it just considers the operator to be much more expensive. The PostgreSQL User Manual contains a complete list of
  runtime parameters.

• You can also influence the optimizer by adjusting the relative costs for certain query operations. See the descriptions
  for CPU_INDEX_TUPLE_COST, CPU_OPERATOR_COST, CPU_TUPLE_COST, EFFECTIVE_CACHE_SIZE, and RANDOM_PAGE_COST
  in the PostgreSQL User Manual.

• Minimize network traffic by doing as much work as possible in the server. You will usually get better performance if
  you can filter data on the server rather than in the client application.

• One source of extra network traffic that might not be so obvious is metadata. If your client application retrieves 10
  rows using a single SELECT, one set of metadata is sent to the client. On the other hand, if you create a cursor to
  retrieve the same set of rows, but execute 10 FETCH commands to grab the data, you'll also get 10 (identical) sets of
  metadata.

• Use server-side procedures (triggers and functions) to perform common operations. A server-side procedure is
  parsed, planned, and optimized the first time you use it, not every time you use it.
Part II: Programming with PostgreSQL

5 Introduction to PostgreSQL Programming

6 Extending PostgreSQL

7 PL/pgSQL

8 The PostgreSQL C API—libpq

9 A Simpler C API—libpgeasy

10 The New PostgreSQL C++ API—libpqxx

11 Embedding SQL Commands in C Programs—ecpg

12 Using PostgreSQL from an ODBC Client Application

13 Using PostgreSQL from a Java Client Application

14 Using PostgreSQL with Perl

15 Using PostgreSQL with PHP

16 Using PostgreSQL with Tcl and Tcl/Tk

17 Using PostgreSQL with Python

18 npgsql: The .NET Data Provider

19 Other Useful Programming Tools

Chapter 5. Introduction to PostgreSQL Programming

PostgreSQL is a client/server database. When you use PostgreSQL, there are at least two processes involved: the client and the server. In
a client/server environment, the server provides a service to one or more clients. The PostgreSQL server provides data storage and
retrieval services. A PostgreSQL client is an application that receives data storage and retrieval services from a PostgreSQL server. Quite
often, the client and the server exist on different physical machines connected by a network. The client and server can also exist on a
single host. As you will see, the client and the server do not have to be written in the same computer language. The PostgreSQL server is
written in C; many client applications are written in other languages.

In this chapter, I'll introduce you to some of the concepts behind client/server programming for PostgreSQL. I'll also show you the options
you have for server-side programming languages and for client-side programming interfaces. I also discuss the basic structure of a
PostgreSQL client application, regardless of which client-side language you choose. Finally, I explore the advantages and disadvantages of
server-side versus client-side code.

Server-Side Programming

The task of programming for PostgreSQL falls into two broad categories: server-side programming and client-side programming.

Server-side code (as the name implies) is code that executes within a PostgreSQL server. Server-side code executes the same way
regardless of which language was used to implement any given client. If the client and server are running on different physical hosts, all
server-side code executes on the server machine and within the server process. If the client and server are running on the same machine,
server-side code still runs within the server process. In most cases, server-side code is written in one of the procedural languages
distributed with PostgreSQL.

PostgreSQL version 7.1 ships with three procedural languages: PL/pgSQL, PL/Tcl, and PL/Perl. Release 7.2 adds PL/Python to the mix. You
can also write server-side procedures in SQL. Later versions of PostgreSQL add support for PL/Java. You can even write server-side
procedures in the form of bash shell scripts using PL/bash.

You can use procedural languages to create functions that execute within the server. A function is a named sequence of statements that
you can use within an SQL expression. When you write a function in a server-side language, you are extending the server. These server
extensions are also known as stored procedures.

PL/pgSQL

If you have ever used a commercial database system (Oracle, Sybase, or SQL Server, for example), you have probably used a SQL-based
procedural language. Oracle's procedural language is called PL/SQL; Sybase and SQL Server use Transact-SQL. PL/pgSQL is very similar to
these procedural languages.

PL/pgSQL combines the declarative nature of SQL commands with structures offered by other languages. When you create a PL/pgSQL
function, you can declare local variables to store intermediate results. PL/pgSQL offers a variety of loop constructs (FOR loops, WHILE
loops, and cursor iteration loops). PL/pgSQL gives you the capability to conditionally execute sections of code based on the results of a
test. You can pass parameters to a PL/pgSQL function, making the function reusable. You can also invoke other functions from within a
PL/pgSQL function.

Chapter 7, "PL/pgSQL," provides an in-depth description of PL/pgSQL.

Other Procedural Languages Supported by PostgreSQL

One of the more unusual aspects of PostgreSQL (compared to other database systems) is that you can write procedural code in more than
one language. As noted previously, the standard distribution of PostgreSQL includes PL/pgSQL, PL/Perl, PL/Tcl, and, as of release 7.2,
PL/Python.

The latter three languages each enable you to create stored procedures using a subset of the host language. PostgreSQL restricts each to
a subset of the language to ensure that a stored procedure can't do nasty things to your environment.

Specifically, the PostgreSQL procedural languages are not allowed to perform I/O external to the database (in other words, you can't use
a PostgreSQL procedural language to do anything outside of the context of the server). If you find that you need to affect your external
environment, you can load an untrusted procedural language, but be aware that you will be introducing a security risk when you do so.

When you install PostgreSQL from a standard distribution, none of the server-side languages are installed. You can pick and choose which
languages you want to install in the server. If you don't use a given language, you can choose not to install it. I'll show you how to install
server-side languages in Chapter 7.

You can see which languages are currently installed in your database server with the following query:

movies=# select * from pg_language;
 lanname  | lanispl | lanpltrusted | lanplcallfoid | lanvalidator | lanacl
----------+---------+--------------+---------------+--------------+---------
 internal | f       | f            |             0 |         2246 |
 c        | f       | f            |             0 |         2247 |
 sql      | f       | t            |             0 |         2248 | {=U/pg}
(3 rows)

You can see that my server currently supports three languages: internal, C, and sql. The lanispl column tells us that none of these are
considered to be procedural languages. You may be thinking that C should be considered a procedural language, but in this context a
procedural language is one that can be installed and de-installed from the server. You can determine whether a language is trusted by
examining the lanpltrusted column. A trusted language promises not to provide elevated privileges to a user. If a language is not a
trusted language, only PostgreSQL superusers can create a new function in that language.

Extending PostgreSQL Using External Languages

PostgreSQL-hosted procedural languages are not the only tools available for extending the server. You can also add extensions to a
PostgreSQL server by creating custom data types, new functions, and new operators written in an external language (usually C or C++).

When you create procedural-language extensions, the source code (and the object code, if any) for those functions is stored in tables
within the database. When you create a function using an external language, the function is not stored in the database. Instead, it is
stored in a shared library that is linked into the server when first used.

You can find many PostgreSQL extensions on the Web. For example, the PostGIS project adds a set of data types and supporting
functions for dealing with geographic data. The contrib directory of a PostgreSQL distribution contains an extension for dealing with ISBNs
and ISSNs.

In Chapter 6, "Extending PostgreSQL," I'll show you a few simple examples of how to add custom data types and functions written in C.

Client-Side APIs

When you want to build applications that access a PostgreSQL database, you use one (or more) of the client application
programming interfaces (or APIs for short). PostgreSQL has a rich variety of APIs that support a number of programming
languages.

PostgreSQL supports the APIs shown in Table 5.1.

Table 5.1. PostgreSQL Client APIs

Interface Name   Supported Languages   Described In
--------------   -------------------   ------------
libpq            C/C++                 Chapter 8
libpgeasy        C/C++                 Chapter 9
libpq++          C++                   Chapter 10
ecpg             C/C++                 Chapter 11
ODBC             C/C++                 Chapter 12
JDBC             Java                  Chapter 13
Perl             Perl                  Chapter 14
PHP[1]           PHP                   Chapter 15
pgtcl            TCL                   Chapter 16
PyGreSQL         Python                Chapter 17
pg.el[1]         Emacs Lisp            Not covered

[1] The standard PostgreSQL distribution does not include the PHP or Emacs interfaces, but they are available
separately on the Web.

Table 5.1 is not all-inclusive. You can write PostgreSQL clients using languages not mentioned in Table 5.1. For example,
Kylix (Borland's Pascal offering for Linux) offers a PostgreSQL interface. Also, many other languages (such as Microsoft
Access and Visual Basic) provide access to PostgreSQL through the ODBC interface. In recent versions, the PostgreSQL
development team has removed some interfaces from the core PostgreSQL distribution. If the language that you want to use
is not directly supported in the core distribution, surf the gborg.postgresql.org website to find the interface you need.

General Structure of Client Applications

This is a good time to discuss, in general terms, how a client application interacts with a PostgreSQL database. All the client APIs
have a common structure, but the details vary greatly from language to language.

Figure 5.1 illustrates the basic flow of a client's interaction with a server.

Figure 5.1. Client/server interaction.

An application begins interacting with a PostgreSQL database by establishing a connection.

Because PostgreSQL is a client/server database, some sort of connection must exist between a client application and a database
server. In the case of PostgreSQL, client/server communication takes the form of a network link. If the client and server are on
different systems, the network link is a TCP/IP socket. If the client and server are on the same system, the network link is either a
Unix-domain socket or a TCP/IP connection. A Unix-domain socket is a link that exists entirely within a single host: the network is
a logical network (rather than a physical network) within the OS kernel.

Connection Properties

Regardless of whether you are connecting to a local server or a remote server, the API uses a set of properties to establish the
connection. Connection properties are used to identify the server (a network port number and host address), the specific database
that you want to connect to, your user ID (and password if required), and various debugging and logging options. Each API allows
you to explicitly specify connection properties, but you can also use default values for some (or all) of the properties. Most of the
client-side APIs let you specify connection properties in the form of a string of keyword=value pairs. For example, to connect to a
database named accounting on a host named jersey, you would use a property string such as

"dbname=accounting host=jersey"

Each keyword=value defines a single connection property. If you omit a connection property, PostgreSQL checks for an
environment variable that corresponds to the property and, if the environment variable doesn't exist, PostgreSQL uses a
hard-coded default value. See Table 5.2.

Table 5.2. Keywords and Environment Variables

Keyword           Environment Variable   Description
---------------   --------------------   -----------
dbname            PGDATABASE             Specifies the name of the database that you want to connect to. If not
                                         specified, the client application tries to connect to a database with the
                                         same name as your user name.

user              PGUSER                 Specifies the PostgreSQL user name you want to connect as. If not
                                         specified, the client uses your operating system identity.

host              PGHOST                 Specifies the name (or IP address) of the computer that hosts the database
                                         you want to connect to. If the value starts with a '/', the client assumes
                                         that you want to connect to a Unix-domain socket located in that directory.
                                         If not specified, the client connects to a Unix-domain socket in /tmp.

hostaddr          PGHOSTADDR             Specifies the IP address of the computer that hosts the database you want
                                         to connect to. When you specify hostaddr (instead of host), you avoid a
                                         name lookup (which can be slow on some networks). If not specified, the
                                         client uses the value of the host property to find the server.

port              PGPORT                 Specifies the TCP/IP port number to connect to (or, if you're connecting to
                                         a Unix-domain socket, the socket filename extension). If not specified, the
                                         default port is 5432.

connect_timeout   PGCONNECT_TIMEOUT      Specifies the maximum amount of time (in seconds) to wait for the
                                         connection process to complete. If not specified (or if you specify a value
                                         of 0), the client will wait forever.

sslmode           PGSSLMODE              Specifies whether the client will attempt (or accept) an SSL-secured
                                         connection. Possible values are disable, allow, prefer, and require.
                                         disable and require are obvious, but allow and prefer seem a bit
                                         mysterious. If sslmode is allow, the client first attempts an insecure
                                         connection, but allows an SSL connection if an insecure connection can't be
                                         built. If sslmode is prefer, the client first attempts a secure connection,
                                         but accepts an insecure connection if a secure connection can't be built.
                                         If not specified, the default sslmode is prefer.

service           PGSERVICE              Specifies the name of a service as defined in the pg_service.conf file
                                         (see next section).
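
The lookup order described above (explicit property first, then the matching environment variable, then a hard-coded default) can be sketched in Python. This is a simplified model rather than libpq's implementation: the function names are hypothetical, the parser ignores quoting, and the environment is passed in as a plain dictionary so that the example does not depend on your real shell settings. The keyword-to-variable mapping and the defaults are taken from Table 5.2.

```python
ENV_VARS = {
    "dbname": "PGDATABASE",
    "user": "PGUSER",
    "host": "PGHOST",
    "hostaddr": "PGHOSTADDR",
    "port": "PGPORT",
    "connect_timeout": "PGCONNECT_TIMEOUT",
    "sslmode": "PGSSLMODE",
    "service": "PGSERVICE",
}
DEFAULTS = {"port": "5432", "sslmode": "prefer"}


def parse_conninfo(conninfo):
    """Split 'keyword=value keyword=value' into a dict (no quoting support)."""
    return dict(pair.split("=", 1) for pair in conninfo.split())


def resolve_properties(conninfo, environ, os_user):
    """Apply the precedence: connection string, environment, then defaults."""
    props = parse_conninfo(conninfo)
    for keyword, env_name in ENV_VARS.items():
        if keyword not in props and env_name in environ:
            props[keyword] = environ[env_name]
    for keyword, value in DEFAULTS.items():
        props.setdefault(keyword, value)
    props.setdefault("user", os_user)          # falls back to the OS identity
    props.setdefault("dbname", props["user"])  # falls back to the user name
    return props


print(resolve_properties("dbname=accounting host=jersey",
                         {"PGPORT": "5433", "PGHOST": "guernsey"}, "korry"))
```

In the sample call, host stays jersey because the connection string wins over PGHOST, while port comes from PGPORT because the string does not mention it.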

A more convenient way to encode connection parameters is to use the pg_service.conf file. When you specify a service name
(with the PGSERVICE environment variable or the service=service-name connection property), the client application (actually the
libpq library) opens a file named $PREFIX/etc/pg_service.conf and searches for a section that matches the service-name that
you provided. If libpq locates the section that you named, it reads connection properties from that section. A typical
pg_service.conf file might look similar to the following:

[accounting]
dbname=accounting
host=jersey
sslmode=require

[development]
dbname=accounting
host=guernsey
sslmode=prefer

Each service begins with the service name (enclosed in square brackets) and continues until the next section (or the end of the
file). A service is simply a collection of connection properties in the usual keyword=value format. The sample above defines two
services (one named accounting and the other named development).

The nice thing about using a service name is that you can consolidate all your connection properties in a single location and then
give only the service name to your database users. When you connect to a database using a service name, the client application
loads the service definition first and then processes the connection string. That means that you can specify both a service name and a
connection string (properties found in the connection string will override the properties specified in the service). Environment
variables are only consulted in a last-ditch effort to find missing values.
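
Because a service file uses the familiar bracketed-section layout, Python's standard configparser module can read it, which makes the override rule easy to demonstrate. The sketch below only models the behavior described above; it is not how libpq parses the file, the function names are hypothetical, and the file contents are supplied as a string instead of being read from $PREFIX/etc/pg_service.conf.

```python
import configparser

SERVICE_FILE = """\
[accounting]
dbname=accounting
host=jersey
sslmode=require

[development]
dbname=accounting
host=guernsey
sslmode=prefer
"""


def load_service(service_text, service_name):
    """Return the properties listed under [service_name]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(service_text)
    if not parser.has_section(service_name):
        raise KeyError(f"service {service_name!r} not found")
    return dict(parser.items(service_name))


def connect_properties(service_text, conninfo):
    """Load the service first, then let the connection string override it."""
    explicit = dict(pair.split("=", 1) for pair in conninfo.split())
    props = {}
    if "service" in explicit:
        props.update(load_service(service_text, explicit["service"]))
    props.update(explicit)
    return props


print(connect_properties(SERVICE_FILE, "service=development host=localhost"))
```

Here the development service supplies dbname and sslmode, but host comes from the connection string.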

After a server connection has been established, the API gives you a handle. A handle is nothing more than a chunk of data that you
get from the API and that you give back to the API when you want to send or receive data over the connection. The exact form of a
handle varies depending on the language that you are using (or more precisely, the data type of a handle varies with the API that
you use). For example, in libpq (the C API), a handle is a void pointer: you can't do anything with a void pointer except to give it
back to the API. In the case of libpq++ and JDBC, a handle is embedded within a class.

After you obtain a connection handle from the API, you can use that handle to interact with the database. Typically, a client will
want to execute SQL queries and process results. Each API provides a set of functions that will send a SQL command to the
database. In the simplest case, you use a single function; more complex applications (and APIs) can separate command execution
into two phases. The first phase sends the command to the server (for error checking and query planning) and the second phase
actually carries out the command; you can repeat the execution phase as many times as you like. The advantage to a two-phase
execution method is performance. You can parse and plan a command once and execute it many times, rather than parsing and
planning every time you execute the command. Two-phase execution can also simplify your code by factoring the work required to
generate a command into a separate function: One function can generate a command and a separate function can execute the
command.
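
The saving from two-phase execution is easy to see in a toy model. The Python class below is entirely hypothetical (it talks to no database and is not any PostgreSQL API); it simply counts how many times the "parse and plan" step runs when a command is prepared once versus sent from scratch each time.

```python
class ToyServer:
    """Counts parse/plan work for one-shot versus two-phase execution."""

    def __init__(self):
        self.parse_count = 0
        self.plans = {}

    def run_once(self, sql, params):
        self.parse_count += 1            # every one-shot command is parsed and planned
        return (sql, tuple(params))

    def prepare(self, name, sql):
        self.parse_count += 1            # phase one: parse and plan a single time
        self.plans[name] = sql

    def execute(self, name, params):
        sql = self.plans[name]           # phase two: reuse the stored plan
        return (sql, tuple(params))


server = ToyServer()
server.prepare("by_date", "SELECT * FROM recalls WHERE datea = $1")
for date in ("19921230", "20001129", "19980814"):
    server.execute("by_date", [date])
print(server.parse_count)
```

Three executions cost one parse in the two-phase style; calling run_once three times would cost three.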

After you use an API to send a command to the server, you get back three types of results. The first result that comes back from
the server is an indication of success or failure: every command that you send to the server will either fail or succeed. If your
command fails, you can use the API to retrieve an error code and a translation of that code into some form of textual message.

If the server tells you that the command executed successfully, you can retrieve the next type of result: metadata. Metadata is
data about data. Specifically, metadata is information about the results of the command that you just executed. If you already
know the format of the result set, you can ignore the metadata.

When you execute a command such as INSERT, UPDATE, or DELETE, the metadata returned by the server is simply a count of the
number of rows affected by the command. Some commands return no metadata. For example, when you execute a CREATE TABLE
command, the only results that you get from the server are success or failure (and an error code if the command fails). When you
execute a SELECT command, the metadata is more complex. Remember that a SELECT statement can return a set of zero or more
rows, each containing one or more columns. This is called the result set. The metadata for a SELECT statement describes each of
the columns in the result set.

Field Versus Column in Result Sets

    When discussing a result set, the PostgreSQL documentation makes a distinction between a field and a column. A
    column comes directly from a table (or a view). A field is the result of a computation in the SELECT statement. For
    example, if you execute the command SELECT customer_name, customer_balance * 1.05 FROM customers,
    customer_name is a column in the result set and customer_balance * 1.05 is a field in the result set. The
    difference between a field and a column is mostly irrelevant and can be ignored; just be aware that the
    documentation uses two different words for the same meaning.

When the server sends result set metadata, it returns the number of rows in the result set and the number of fields. For each field
in the result set, the metadata includes the field name, data type information, and the size of the field (on the server).
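
The three kinds of results described above (success or failure, metadata, and rows) show up in most database APIs. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, because it needs no server; it is not a PostgreSQL client, and SQLite reports less metadata than PostgreSQL does (field names only here, with no type or size). The run_command helper and the new_balance alias are my own names.

```python
import sqlite3


def run_command(conn, sql):
    """Run one command and sort the outcome into status, metadata, and rows."""
    try:
        cursor = conn.execute(sql)
    except sqlite3.Error as exc:
        # First kind of result: failure, plus a textual message
        return {"ok": False, "error": str(exc), "fields": [], "rows": [], "affected": None}
    # Second kind: metadata (field names for a SELECT, a row count otherwise)
    fields = [column[0] for column in cursor.description] if cursor.description else []
    # Third kind: the rows themselves
    rows = cursor.fetchall()
    affected = None if fields else cursor.rowcount
    return {"ok": True, "error": None, "fields": fields, "rows": rows, "affected": affected}


conn = sqlite3.connect(":memory:")
print(run_command(conn, "CREATE TABLE customers (customer_name TEXT, customer_balance REAL)"))
print(run_command(conn, "INSERT INTO customers VALUES ('Ann', 100.0)"))
print(run_command(conn, "SELECT customer_name, customer_balance * 1.05 AS new_balance FROM customers"))
print(run_command(conn, "SELECT * FROM no_such_table"))
```

The last call fails, so the caller sees only the status and the error message; the SELECT before it returns field names and one row.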

I should mention here that most client applications don't really need to deal with all the metadata returned by the server. In
general, when you write an application you already know the structure of your data. You'll often need to know how many rows
were returned by a given query, but the other metadata is most useful when you are processing ad-hoc commands (commands
that are not known to you at the time you are writing your application).

After you process the metadata (if you need to), your application will usually process all the rows in the result set. If you execute a
SELECT statement, the result set will include all the rows that meet the constraints of the WHERE clause (if any). In some
circumstances, you will find it more convenient to DECLARE a cursor for the SELECT statement and then execute multiple FETCH
statements. When you execute the DECLARE statement, you won't get metadata. However, as you execute FETCH commands, you
are constructing a new result set for each FETCH and the server has to send metadata describing the resulting fields, and that can be
expensive.

After you have finished processing the result set, you can execute more commands, or you can disconnect from the server.

LISTEN/NOTIFY

Sometimes, you might want a client application to wait for some server-side event to occur before proceeding. For example, you
might need a queuing system that writes a work order into a PostgreSQL table and then expects a client application to carry out
that work order. The most obvious way to write a client of this sort is to put your client application to sleep for a few
seconds (or a few minutes), then, when your application awakens, check for a new record in the work-order table. If the record
exists, do your work and then repeat the whole cycle.

There are two problems with this approach. First, your client application can't be very responsive. When a new work order is
added, it may take a few seconds (or a few minutes) for your client to notice (it's fast asleep after all). Second, your client
application might spend a lot of time searching for work orders that don't exist.

PostgreSQL offers a solution to this problem: the LISTEN/NOTIFY mechanism. A PostgreSQL server can signal client applications
that some event has occurred by executing a NOTIFY eventName command. All client applications that are listening for that event
are notified that the event has occurred. You get to choose your own event names. In a work-order application, you might define
an event named workOrderReceived. To inform the server that you are interested in that event, the client application executes a
LISTEN workOrderReceived command (to tell the server that you are no longer interested in an event, simply UNLISTEN
workOrderReceived). When a work order arrives at the server (via some other client application), executing the command NOTIFY
workOrderReceived will inform all clients that a workOrderReceived event has occurred (actually, PostgreSQL will only notify
those clients listening for that specific event).

Each client-side API offers a different LISTEN mechanism and you rarely execute a LISTEN command yourself; instead, you call an
API function that executes the LISTEN command for you (after arranging to intercept the event in a language-specific way).

Regardless of the language that you choose, you should be aware that notifications are only sent at the end of a successful
transaction. If you ROLLBACK a transaction, any NOTIFY commands executed within that transaction are ignored. That makes sense
if you think about it: If your application adds a work order record, but then aborts the transaction, you don't want to wake
client applications with a false alarm.
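
The commit-time rule can be pictured with a small C model. This is only a sketch of the behavior described above, not the way the server implements it: the demo_session type and the demo_* functions are hypothetical names invented for this illustration. NOTIFY requests are queued while a transaction is open; a commit delivers them and a rollback throws them away.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_PENDING 16

/* Hypothetical model of one session; nothing here is a PostgreSQL API. */
typedef struct
{
    const char *pending[DEMO_MAX_PENDING]; /* events raised in the open transaction */
    int         pending_count;
    int         delivered_count;           /* events actually sent to listeners     */
} demo_session;

/* BEGIN: start with an empty queue of pending notifications. */
static void demo_begin(demo_session *s)
{
    assert(s != NULL);
    s->pending_count = 0;
}

/* NOTIFY: remember the event, but do not deliver it yet. */
static int demo_notify(demo_session *s, const char *event)
{
    if (s->pending_count >= DEMO_MAX_PENDING)
        return -1;
    s->pending[s->pending_count++] = event;
    return 0;
}

/* COMMIT: the transaction succeeded, so every queued event is delivered. */
static void demo_commit(demo_session *s)
{
    s->delivered_count += s->pending_count;
    s->pending_count = 0;
}

/* ROLLBACK: the queued events are discarded; listeners never hear of them. */
static void demo_rollback(demo_session *s)
{
    s->pending_count = 0;
}
```

In this model, a demo_notify() followed by demo_rollback() leaves delivered_count unchanged, which is the "no false alarm" behavior the text describes.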

Choosing an Application Environment

When you choose an environment for your code, there are a number of issues to consider. To start with, you have to decide
whether the feature that you want to build should be server-side code, client-side code, or a combination of both.

Server-Side Code

There are several advantages to adding functionality as server-side code.

The first consideration is performance. If you are creating an application that needs to access many rows of data, it will
execute faster on the server. You won't have to send the data across the network to the client (network traffic is very
expensive in terms of performance).

Next, you should consider code reuse. If you add a feature in the form of a server-side function, that feature can be used by
any client application. You can also use server-side functions within SQL queries.

Another advantage to creating server-side functions is that you can use a server function as a trigger. A trigger function is
executed whenever a particular condition occurs. For example, you can define a trigger that executes whenever a row is
deleted from a particular table.

Finally, server-side code is portable. Any function that you write in a server-side procedural language runs on any platform
that supports PostgreSQL. Of course, if you write a server-side function that requires specific server-side features (such as
other functions or data types), those features must be installed in each server.

Client-Side Code

Client-side code is useful for building the user interface. You can't build a user interface using one of the server-side
procedural languages: they execute within the context of the server and the server has no user interface.

One of the interesting things to note about the client APIs is that most of them are implemented using the libpq API (ODBC
and JDBC are not). This means, for example, that if you are using libpq++ from a C++ application and you call a member
function of the PgDatabase class, it will be translated into one or more calls to the libpq library.

The ODBC and JDBC interfaces are not implemented using libpq. Instead, they talk directly to the backend database using
the same network protocol as libpq. If you ever decide to implement your own client API, you can choose either method:
implement your API in terms of libpq (or one of the other APIs), or talk directly to the server using the same underlying
network protocol.

Mixing Server-Side and Client-Side Code

A particularly powerful strategy is to create an application using a mixture of client-side code and stored procedures. Many
commercial applications are shipped with two types of code. When you use one of these packages, you install a set of stored
procedures into the database; then you install external client applications that make use of the custom procedures.

This arrangement gives you all the advantages of server-side code (performance, portability, and reusability) plus the
capability to create a pleasant user interface in the client.

Chapter 6. Extending PostgreSQL

PostgreSQL is an extensible database. You can add new functions, new operators, and custom data types to the PostgreSQL
server.

In this chapter, I'll show you how to add two simple functions, a new data type, and a set of operators that work with the new
type. The examples build on each other, so it would be a good idea to read this chapter in sequence rather than skipping
around too much. The sample code used in this chapter was developed using PostgreSQL release 8.0.

We'll start by adding a new function to the PostgreSQL server. The details are important, but the process is not difficult. After
you know how to add one function to the server, it's easy to add others.

Extending the PostgreSQL Server with Custom Functions

An extension function is loaded into a running PostgreSQL server process as needed. If you don't actually use an extension, it
will not be loaded. Extension functions must be created in the form of a dynamically loadable object module. In the Windows
world, an extension is contained within a DLL. In the Linux/Unix environment, an extension is contained within a shared object
module.

There are two phases to the process of adding an extension function to the PostgreSQL server. First, you create the extension
function in the language of your choice, compiling it into a dynamic object module (.dll or .so). Next, you tell the PostgreSQL
server about the function. The CREATE FUNCTION command adds a new function to a database.

I'll show you two examples that should help clarify this process.

PostgreSQL and Portability

Some of the steps required to write a PostgreSQL extension function in C may seem rather odd at first. You
may feel more comfortable with the process if you understand the problem that the PostgreSQL authors were
trying to fix.

When you call a function in a typical C program, you know at the time you write your code how to call that
function. You know how many arguments are required and you know the data type of each argument. If you
provide an incorrect number of parameters or incorrect data types, it is highly likely that your program will
crash. For example, the fopen() function (from the C Runtime Library) requires two parameters:

FILE * fopen( const char * filename, const char * mode )

If you omit the mode parameter or send a numeric data type instead of a pointer, your program will fail in some
way.

Now, suppose that your program prompts the user for the name of a dynamic object module and the name of a
function within that module. After you load the given module into your program, you have to call the named
function. If you know which function the user will select, you can formulate your function call properly at the
time you write your code. What happens if the user selects some other function that takes a completely
different argument list? How can you formulate the function call if you don't know the parameter list? There is
no portable way to do that, and PostgreSQL aims to be extremely portable.

So, the PostgreSQL authors decided to change the way you pass arguments to an extension function. Rather
than declaring a separate formal parameter for each value passed to the function, PostgreSQL marshals all the
arguments into a separate data structure and passes the address of the marshaled form to your extension.
When you need to access function parameters, you get to them through the marshaled form.

This is similar in concept to the way the main() function of a C program behaves. You can't know, at the time
you write the main() function, how many command-line parameters you will receive. (You might know how
many parameters you should receive, but how many you will receive is not quite the same animal.) The startup
routine in the C Runtime Library marshals the command-line arguments into a data structure (the argv[]
array) and passes you the address of that structure. To find the actual values specified on the command line,
you must use the data structure rather than formal parameters.
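
To make the idea of a marshaled argument list concrete, here is a minimal, self-contained sketch in plain C. It is not PostgreSQL code: demo_datum, demo_call_info, and the demo_* functions are hypothetical stand-ins that I have invented for this illustration, and the real structures in the PostgreSQL headers differ in detail. The point is only that every function shares one C signature, so a caller can invoke a function chosen at run time without knowing its parameter list in advance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a universal value type. */
typedef intptr_t demo_datum;

#define DEMO_MAX_ARGS 8

/* Hypothetical stand-in for the marshaled form of one function call. */
typedef struct
{
    short      nargs;               /* number of arguments actually passed */
    demo_datum arg[DEMO_MAX_ARGS];  /* the marshaled argument values       */
} demo_call_info;

/* Every "extension" has the same C signature, whatever arguments it expects. */
typedef demo_datum (*demo_function)(demo_call_info *info);

/* Adds two integers that it reads from the marshaled structure. */
static demo_datum demo_add(demo_call_info *info)
{
    assert(info->nargs == 2);
    return info->arg[0] + info->arg[1];
}

/* Takes one C string (passed as a pointer-sized value) and returns its length. */
static demo_datum demo_length(demo_call_info *info)
{
    const char *s = (const char *)info->arg[0];
    size_t      n = 0;

    assert(info->nargs == 1);
    while (s[n] != '\0')
        n++;
    return (demo_datum)n;
}

/* The caller marshals the arguments and makes one uniform call. */
static demo_datum demo_call(demo_function fn, short nargs, const demo_datum *args)
{
    demo_call_info info;
    short          i;

    assert(nargs >= 0 && nargs <= DEMO_MAX_ARGS);
    info.nargs = nargs;
    for (i = 0; i < nargs; i++)
        info.arg[i] = args[i];
    return fn(&info);
}
```

Notice that demo_call() never needs to know whether it is calling demo_add() or demo_length(); that is the property the marshaled calling convention buys.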

Older versions of PostgreSQL used a strategy that became less portable as operating systems advanced into
the 64-bit arena. The old strategy is known as the "version-0 calling convention." The new strategy is called the
"version-1 calling convention." PostgreSQL still supports both calling conventions, but you should stick to the
version-1 convention for better portability.

For more information on the difference between the version-0 and version-1 conventions, see section 12 of the
PostgreSQL Programmer's Guide.

There are two important consequences to the version-1 convention. First, all version-1 functions return the
same data type: a Datum. A Datum is a sort of universal data type. Any PostgreSQL data type can be accessed
through a Datum. PostgreSQL provides a set of macros that make it easy to work with Datums. Second, a
version-1 function makes use of a set of macros to access function arguments. Every version-1 function is
declared in the same way:

Datum function-name(PG_FUNCTION_ARGS);

As you read through the examples in this chapter, keep in mind that the PostgreSQL authors had to solve the
portability problem.

The first example adds a simple function, named filesize, to the PostgreSQL server. Given the name of a file, it returns the
size of the file (in bytes). If the file does not exist, cannot be examined, or is not a regular[1] file, this function returns NULL.
You might find this function (and the filelist() function shown later) useful for performing system administration tasks from
within a PostgreSQL application. After you have created the filesize function, you can call it like this:

[1] In this context, a file is considered "regular" if it is not a directory, named pipe, symbolic link, device file, or
socket.

movies=# SELECT filesize( '/bin/bash' );


filesize
----------
512668

We'll develop the filesize function in C (see Listing 6.1).

The filesize function takes a single argument: a pathname in the form of a TEXT value. This function returns the size of the
named file as an INTEGER value.

Listing 6.1. filesize.c

 1 /*
 2 ** Filename: filesize.c
 3 */
 4
 5 #include "postgres.h"
 6 #include "fmgr.h"
 7 #include <sys/stat.h>
 8
 9 PG_FUNCTION_INFO_V1(filesize);
10
11 Datum filesize(PG_FUNCTION_ARGS)
12 {
13     text * fileNameText = PG_GETARG_TEXT_P(0);
14     size_t fileNameLen = VARSIZE( fileNameText ) - VARHDRSZ;
15     char * fileName = (char *)palloc( fileNameLen + 1 );
16     struct stat statBuf;
17
18     memcpy( fileName, VARDATA( fileNameText ), fileNameLen );
19     fileName[fileNameLen] = '\0';
20
21     if( stat(fileName, &statBuf) == 0 && S_ISREG(statBuf.st_mode))
22     {
23         pfree( fileName );
24
25         PG_RETURN_INT32((int32)statBuf.st_size );
26     }
27     else
28     {
29         pfree( fileName );
30
31         PG_RETURN_NULL();
32     }
33 }

Lines 5 and 6 #include two header files supplied by PostgreSQL. These files (postgres.h and fmgr.h) provide data type
definitions, function prototypes, and macros that you can use when writing extensions. The <sys/stat.h> file included at line
7 defines the layout of the struct stat object used by the stat() function (described later).

Line 9 uses the PG_FUNCTION_INFO_V1() macro to tell PostgreSQL that the function (filesize()) uses the version-1 calling
convention.

At line 11, you see the signature used for all version-1 functions. The filesize() function returns a Datum and expects a
single argument. PG_FUNCTION_ARGS is a preprocessor symbol that expands to declare a consistently named parameter. So,
your function definition expands from this:

Datum filesize(PG_FUNCTION_ARGS)

to this:

Datum filesize( FunctionCallInfo fcinfo )

This might seem a little strange at first, but the version-1 argument accessor macros are written so that the single function
argument must be named fcinfo.

At line 13, you create a variable of type text. text is one of the data types defined in the postgres.h header file (or in a file
included by postgres.h). Whenever you write an extension function, you will be working with two sets of data types. Each
function parameter (and the return value) will have a SQL data type and a C data type. For example, when you call the
filesize function from within PostgreSQL, you pass a TEXT parameter: TEXT is the SQL data type. When you implement the
filesize function in C, you receive a text value: text is the C data type. The name for the C data type is usually similar to
the name of the corresponding SQL data type. For clarity, I'll refer to the PostgreSQL data types using uppercase letters and
the C data types using lowercase letters.

Notice that a macro is used to retrieve the address of the TEXT value. I mentioned earlier that an extension function must use
macros to access parameters, and this is an example of such a macro. The PG_GETARG_TEXT_P(n) macro returns the nth
parameter, which must be of type TEXT. The return value of PG_GETARG_TEXT_P(n) is a pointer to a text value. There are many
argument-accessor macros, each corresponding to a specific parameter type: PG_GETARG_INT32(n), PG_GETARG_BOOL(n),
PG_GETARG_OID(n), and so on. See the fmgr.h PostgreSQL header file for a complete list.

We'll be using the stat() function (from the C Runtime library) to find the size of a given file. stat() expects to find the
pathname in the form of a null-terminated string. PostgreSQL has given you a text value, and text values are not
null-terminated. You will need to convert fileNameText into a null-terminated string.

If fileNameText is not null-terminated, how do you know the length of the pathname? Let's take a peek at the definition of
the text data type (from the c.h PostgreSQL header file):

struct varlena
{
    int32 vl_len;
    char  vl_data[1];
};

typedef struct varlena text;

You can see that a text value is defined by the struct varlena structure. The vl_len member tells you how many bytes are
required to hold the entire structure. The characters that make up the text value start at the address of the vl_data[0]
member. PostgreSQL supplies two macros that make it easy to work with variable-length data structures. The VARHDRSZ
symbol contains the size of the fixed portion of a struct varlena. The VARSIZE() macro returns the size of the entire data
structure. The VARDATA() macro returns a pointer to the first byte of the TEXT value. The length of the TEXT value is
VARSIZE() - VARHDRSZ. You store that length in the fileNameLen variable.
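
If you want to experiment with this layout outside the server, the following sketch may help. It is an illustration only: demo_text, DEMO_HDRSZ, DEMO_SIZE(), and DEMO_DATA() are hypothetical stand-ins that mimic struct varlena, VARHDRSZ, VARSIZE(), and VARDATA(), and the sketch uses malloc() because it runs as an ordinary program rather than inside PostgreSQL. Real extension code should use the macros from the PostgreSQL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct varlena: a length header plus payload. */
typedef struct
{
    int32_t len;     /* total size in bytes, header included */
    char    data[];  /* payload, NOT null-terminated         */
} demo_text;

#define DEMO_HDRSZ    (offsetof(demo_text, data))  /* plays the role of VARHDRSZ  */
#define DEMO_SIZE(t)  ((size_t)(t)->len)           /* plays the role of VARSIZE() */
#define DEMO_DATA(t)  ((t)->data)                  /* plays the role of VARDATA() */

/* Build a length-prefixed value from a C string (the terminator is dropped). */
static demo_text *demo_text_from_cstring(const char *s)
{
    size_t     payload = strlen(s);
    demo_text *t = malloc(DEMO_HDRSZ + payload);

    if (t == NULL)
        return NULL;
    t->len = (int32_t)(DEMO_HDRSZ + payload);
    memcpy(DEMO_DATA(t), s, payload);
    return t;
}

/* Copy the payload into a freshly allocated, null-terminated string. */
static char *demo_text_to_cstring(const demo_text *t)
{
    size_t payload = DEMO_SIZE(t) - DEMO_HDRSZ;  /* same arithmetic as line 14 */
    char  *s = malloc(payload + 1);

    if (s == NULL)
        return NULL;
    memcpy(s, DEMO_DATA(t), payload);
    s[payload] = '\0';
    return s;
}
```

The second function follows the same three steps that filesize() performs at lines 14 through 19: compute the payload length, allocate one extra byte, then copy and terminate.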

At line 15, you allocate enough space to hold a copy of the null-terminated string. The palloc() function is similar to
malloc(): It allocates the requested number of bytes and returns a pointer to the new space. You should use palloc() and pfree()
when you write extension functions rather than malloc() and free(). Memory obtained from palloc() belongs to a PostgreSQL
memory context that the server releases automatically, so a forgotten pfree() does not become a permanent memory leak the way
a forgotten free() can.

Lines 18 and 19 create a null-terminated copy of the TEXT value, and line 21 passes the null-terminated string to the stat()
function. If the stat() function succeeds, it fills in the statBuf structure and returns 0.

If you succeeded in retrieving the file status information and the file is a regular file, free the null-terminated string (using
pfree()) and return the file size. Notice that you must use a macro to translate the return value (an int32) into a Datum.

If the stat() function failed (or the file is not a regular file), you free the null-terminated string and return NULL. Again, you
use a macro to produce the return value in the form of a Datum.
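
You can try the stat() logic on its own before wrapping it in an extension. The helper below is a sketch: regular_file_size() is a hypothetical name of my own, and it signals failure with -1 because a standalone C function has no SQL NULL to return. It also returns a long long rather than an int32; the (int32) cast at line 25 of Listing 6.1 cannot represent sizes of 2GB or more.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
** Hypothetical helper: returns the size of a regular file in bytes, or -1
** if the path cannot be examined or does not name a regular file.
*/
static long long regular_file_size(const char *path)
{
    struct stat statBuf;

    assert(path != NULL);

    /* stat() returns 0 on success and fills in statBuf */
    if (stat(path, &statBuf) != 0)
        return -1;

    /* directories, pipes, sockets, and device files are not "regular" */
    if (!S_ISREG(statBuf.st_mode))
        return -1;

    return (long long)statBuf.st_size;
}
```

Calling regular_file_size( "/" ) returns -1 because the root directory exists but is not a regular file, which corresponds to the case in which filesize() returns NULL.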

Now that you have crafted the filesize function, you need to compile it into a shared object module. You usually compile a C
source file into a standalone executable program, but PostgreSQL expects to find the filesize function in a shared object
module. The procedure for producing a shared object module is different for each compiler; section 31.9 of the PostgreSQL
reference documentation describes the process for a number of compilers. Listing 6.2 shows the makefile that I've used to
compile the filesize function using Fedora Core (Linux):

Listing 6.2. makefile

# File name: makefile


SERVER_INCLUDES += -I $(shell pg_config --includedir)
SERVER_INCLUDES += -I $(shell pg_config --includedir-server)

CFLAGS += -g $(SERVER_INCLUDES)

.SUFFIXES: .so

.c.so:
	$(CC) $(CFLAGS) -fpic -c $<
	$(CC) $(CFLAGS) -shared -o $@ $(basename $<).o

To compile filesize using this makefile, you would issue the following command:

$ make -f makefile filesize.so

After the compile step is completed, you are left with a file named filesize.so in your current directory. The preferred
location for a PostgreSQL extension can be found using the pg_config command:

$ pg_config --pkglibdir
/usr/local/pg800/lib/postgresql

You can copy the filesize.so file to this directory, but I prefer to create a symbolic link pointing back to my development
directory instead. After an extension is completely debugged, I delete the symbolic link and copy the final version into the
preferred location. To create a symbolic link, use the following command:

$ ln -s `pwd`/filesize.so `pg_config --pkglibdir`

At this point, you have a shared object module, but you still have to tell PostgreSQL about the function that you want to
import into the server.

The CREATE FUNCTION command tells PostgreSQL everything it needs to know to call your function:

movies=# CREATE OR REPLACE FUNCTION
movies-# filesize( TEXT ) RETURNS INTEGER AS
movies-# 'filesize.so', 'filesize' LANGUAGE 'C'
movies-# STRICT;
CREATE

This command defines a function named filesize( TEXT ). This function returns an INTEGER value. The function is written in
C and can be found in the file filesize.so in the preferred extension directory. You can specify a complete pathname to the
shared object module if you want to, but in most cases it's easier to just put it where PostgreSQL expects to find it, as I've
done here. You can also omit the filename extension (the .so part), as long as you follow the shared object module-naming
rules imposed by your host operating system.

I've defined filesize() as a strict function. The STRICT attribute tells PostgreSQL that this function will always return NULL if
any argument is NULL. If PostgreSQL knows that a function is STRICT, it can avoid calling the function with a NULL argument
(a performance optimization). STRICT also makes it easier for you to implement your extension functions; you don't have to
check for NULL arguments if you declare your functions to be STRICT.

Now you can call the function from within a PostgreSQL session:

movies=# SELECT filesize( '/bin/bash' );


filesize
----------
512668
(1 row)
movies=# SELECT filesize( 'non-existent file' );
filesize
----------

(1 row)

Debugging PostgreSQL Extensions

One of the challenges you'll face in creating PostgreSQL extensions is figuring out how to debug them. Relax,
it's easy. First, remember that the extension function that you create is loaded into the server (not the client).
That means that when you fire up a debugger, you want to attach it to the server process. How do you find the
server process? Call the pg_backend_pid() function once the server is up and running: pg_backend_pid()
returns the process ID of the server that your client is connected to. Next, load the shared-object file (the file
that contains your extension function) into the server with the LOAD command. At this point, your server is
waiting for you, and it's time to attach the debugger. If you're using the gdb debugger, you can attach to a running
process with the command:

gdb postgres server-process-id

But remember, the PostgreSQL server process is owned by user postgres: If you try to attach without the
proper privileges, gdb will just laugh at you. Make sure you su postgres before you run gdb. When you ask
gdb to attach to a running process, the second argument is ignored (it has to be there, but it really doesn't
matter which string you use).

If you have the proper privileges, gdb should now be waiting for you to enter a command. Now you can set a
breakpoint at your extension function and gdb will interrupt the server when that function is invoked. Notice
that the server process is suspended until you tell the debugger to continue (if you try to execute a command
from your PostgreSQL client application, the client will hang until the server wakes up again). Once you've told
gdb to continue, you can go back to your client application and issue a command that invokes the function that
you're interested in.

If you're debugging an extension function on a Windows host, the procedure is similar: Find the process ID of
the server and attach a Windows debugger to that process.

To summarize:

• Start a client application (such as psql)

• From within the client: SELECT pg_backend_pid();

• From within the client: LOAD 'extension-file.so';

• Start another terminal session and su postgres

• Fire up the debugger: $ gdb postgres server-process-id

• Set a breakpoint: (gdb) break my-function

• Tell gdb to continue: (gdb) cont

• Go back to the client application and execute a command that will call your extension function

Returning Multiple Values from an Extension Function

The second extension that you will add works well with the filesize function. Given the name of a directory, the filelist
function returns a list of all files (and subdirectories) contained in that directory. The filesize function (from the previous
example) returns a single value; filelist will return multiple rows. An extension function that can return multiple results is
called a set-returning function, or SRF.

PostgreSQL's SRF Interface

Before you read too much further, I should tell you that there's an easy way to write set-returning functions and
another method that's almost as easy. I'll describe both methods, starting with the slightly more difficult approach.
Starting with version 7.3, the PostgreSQL developers introduced a set of wrapper functions (and macros) that put a
friendlier face on top of the original method. Under the hood, your SRF is doing the same thing whether you use the
new approach or the old approach, but it's a little easier to understand the new SRF interface if you can peek under
the covers.

When you are finished creating the filelist function, you can use it like this:

movies=# SELECT filelist( '/usr' );


filelist
------------
.
..
bin
dict
etc
games
html
include
kerberos
lib
libexec
local
sbin
share
src
tmp
X11R6
(17 rows)

In this example, the user has invoked the filelist function only once, but 17 rows were returned. An SRF is actually called
multiple times. In this case, the filelist() function is called 18 times. The first time through, filelist() does any preparatory
work required and then returns the first result. For each subsequent call, filelist() returns another row until the result set is
exhausted. On the 18th call, filelist() returns a status that tells the server that there are no more results available.
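
The calling pattern is easier to see in miniature. The sketch below is a plain C model of the protocol, not PostgreSQL code: demo_function_state, demo_is_done, demo_count_to(), and demo_drive() are hypothetical names invented for this illustration. The function keeps its progress in a context that survives between calls, returns one value per call, and finally reports that it has nothing more to return; demo_drive() plays the role of the server.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { DEMO_MULTIPLE_RESULT, DEMO_END_RESULT } demo_is_done;

/* Hypothetical per-function storage that survives from one call to the next. */
typedef struct
{
    void *extra;   /* NULL until the first call sets up a context */
} demo_function_state;

typedef struct
{
    int next;      /* next value to return  */
    int limit;     /* last value to return  */
} demo_counter_ctx;

/* Returns 1..limit, one value per call, then reports DEMO_END_RESULT. */
static int demo_count_to(demo_function_state *state, int limit, demo_is_done *is_done)
{
    demo_counter_ctx *ctx;

    if (state->extra == NULL)
    {
        /* First call: do the preparatory work and remember it. */
        ctx = malloc(sizeof(*ctx));
        if (ctx == NULL)
        {
            *is_done = DEMO_END_RESULT;
            return 0;
        }
        ctx->next = 1;
        ctx->limit = limit;
        state->extra = ctx;
    }

    ctx = state->extra;

    if (ctx->next <= ctx->limit)
    {
        *is_done = DEMO_MULTIPLE_RESULT;   /* "here is a row, call me again" */
        return ctx->next++;
    }

    /* Final call: release the context and say that the set is exhausted. */
    free(ctx);
    state->extra = NULL;
    *is_done = DEMO_END_RESULT;
    return 0;
}

/* Plays the role of the server; returns the number of calls it had to make. */
static int demo_drive(int limit, int *rows)
{
    demo_function_state state = { NULL };
    demo_is_done        is_done;
    int                 calls = 0;

    *rows = 0;
    for (;;)
    {
        (void)demo_count_to(&state, limit, &is_done);
        calls++;
        if (is_done == DEMO_END_RESULT)
            break;
        (*rows)++;
    }
    return calls;
}
```

With a limit of 17, demo_drive() makes 18 calls and counts 17 rows, which matches the arithmetic in the filelist example: one call per row, plus a final call that only reports the end of the set.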

Like the filesize function, filelist takes a single argument: a directory name in the form of a TEXT value. This function
returns a set of TEXT values (SETOF TEXT). Listing 6.3 shows the first part of the filelist.c source file:

Listing 6.3. filelist.c (Part 1)

 1 /*
 2 ** Filename: filelist.c
 3 */
 4
 5 #include "postgres.h"
 6 #include "fmgr.h"
 7 #include "nodes/execnodes.h"
 8
 9 #include <dirent.h>
10
11 typedef struct
12 {
13     int              dir_ctx_count;
14     struct dirent ** dir_ctx_entries;
15     int              dir_ctx_current;
16 } dir_ctx;
17
18 PG_FUNCTION_INFO_V1(filelist);
19

filelist.c #includes four header files, the first three of which are supplied by PostgreSQL. postgres.h and fmgr.h provide data
type definitions, function prototypes, and macros that you will need to create extensions. The nodes/execnodes.h header file
defines a structure (ReturnSetInfo) that you need because filelist returns a set of values. You will use the scandir() function
to retrieve the directory contents from the operating system. The fourth header file defines a few data types that are used by
scandir().

Line 11 defines a structure that keeps track of your progress. In the first invocation, you will set up a context structure
(dir_ctx) that you can use for each subsequent call. The dir_ctx_count member indicates the number of files and subdirectories
in the given directory. The dir_ctx_entries member is a pointer to an array of pointers to struct dirent structures. Each
member of this array points to a description of a file or subdirectory. dir_ctx_current keeps track of the current position as
you traverse the dir_ctx_entries array.
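
If you have not used scandir() before, the following standalone sketch shows the call in isolation. count_directory_entries() is a hypothetical helper of my own, not part of filelist.c. It assumes a POSIX system: scandir() allocates both the array and each entry with malloc(), so the caller releases them with free().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for scandir() and alphasort() */
#endif

#include <assert.h>
#include <dirent.h>
#include <stdlib.h>

/*
** Hypothetical helper: returns the number of entries that scandir() finds
** in a directory, or -1 if the directory cannot be read.
*/
static int count_directory_entries(const char *path)
{
    struct dirent **entries = NULL;
    int             count;
    int             i;

    assert(path != NULL);

    /* NULL filter: keep every entry; alphasort: order the entries by name */
    count = scandir(path, &entries, NULL, alphasort);

    if (count < 0)
        return -1;

    /* scandir() used malloc() for each entry and for the array itself */
    for (i = 0; i < count; i++)
        free(entries[i]);
    free(entries);

    return count;
}
```

The return value of scandir() is the value that filelist() stores in dir_ctx_count, and the array that it fills in is the one that dir_ctx_entries points to.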

Line 18 tells PostgreSQL that filelist() uses the version-1 calling convention.

Listing 6.4 shows the filelist() function:

Listing 6.4. filelist.c (Part 2)

 20 Datum filelist(PG_FUNCTION_ARGS)
 21 {
 22     FmgrInfo      * fmgr_info  = fcinfo->flinfo;
 23     ReturnSetInfo * resultInfo = (ReturnSetInfo *)fcinfo->resultinfo;
 24     text          * startText  = PG_GETARG_TEXT_P(0);
 25     int             len        = VARSIZE( startText ) - VARHDRSZ;
 26     char          * start      = (char *)palloc( len+1 );
 27     dir_ctx       * ctx;
 28
 29     memcpy( start, VARDATA( startText ), len );
 30     start[len] = '\0';
 31
 32     if( fcinfo->resultinfo == NULL )
 33         elog(ERROR, "filelist: context does not accept a set result");
 34
 35     if( !IsA( fcinfo->resultinfo, ReturnSetInfo ))
 36         elog(ERROR, "filelist: context does not accept a set result");
 37
 38     if( fmgr_info->fn_extra == NULL )
 39     {
 40         dir_ctx * new_ctx;
 41
 42         fmgr_info->fn_extra = MemoryContextAlloc( fmgr_info->fn_mcxt,
 43                                                   sizeof( dir_ctx ));
 44
 45         new_ctx = (dir_ctx *)fmgr_info->fn_extra;
 46
 47         new_ctx->dir_ctx_count = scandir( start,
 48                                           &new_ctx->dir_ctx_entries,
 49                                           NULL,
 50                                           alphasort );
 51         new_ctx->dir_ctx_current = 0;
 52     }
 53
 54     ctx = (dir_ctx *)fmgr_info->fn_extra;
 55
 56     if( ctx->dir_ctx_count == -1 )
 57     {
 58         pfree( fmgr_info->fn_extra );
 59
 60         fmgr_info->fn_extra = NULL;
 61
 62         resultInfo->isDone = ExprEndResult;
 63
 64         PG_RETURN_NULL();
 65     }
 66
 67     if( ctx->dir_ctx_current < ctx->dir_ctx_count )
 68     {
 69         struct dirent * entry;
 70         size_t          nameLen;
 71         size_t          resultLen;
 72         text          * result;
 73
 74         entry     = ctx->dir_ctx_entries[ctx->dir_ctx_current];
 75         nameLen   = strlen( entry->d_name );
 76         resultLen = nameLen + VARHDRSZ;
 77
 78         result = (text *)palloc( resultLen );
 79
 80         VARATT_SIZEP( result ) = resultLen;
 81
 82         memcpy( VARDATA( result ), entry->d_name, nameLen );
 83
 84         resultInfo->isDone = ExprMultipleResult;
 85
 86         /*
 87         ** Advance to the next entry in our array of
 88         ** filenames/subdirectories
 89         */
 90         ctx->dir_ctx_current++;
 91
 92         PG_RETURN_TEXT_P( result );
 93     }
 94     else
 95     {
 96         free( ctx->dir_ctx_entries );
 97
 98         pfree( fmgr_info->fn_extra );
 99
100         fmgr_info->fn_extra = NULL;
101
102         resultInfo->isDone = ExprEndResult;
103
104         PG_RETURN_NULL();
105     }
106 }

Line 20 declares filelist() using the standard version-1 calling convention (remember, a version-1 function always returns a
Datum and uses the PG_FUNCTION_ARGS preprocessor symbol as an argument list).

The C preprocessor translates line 20 into

Datum filelist( FunctionCallInfo fcinfo )

As you can see, you can access the single argument to filelist() through the variable fcinfo. All version-1 extension functions
expect a FunctionCallInfo structure. Here is the definition of the FunctionCallInfo data type:

typedef struct FunctionCallInfoData
{
    FmgrInfo    *flinfo;                 /* ptr to lookup info used for this call */
    struct Node *context;                /* pass info about context of call */
    struct Node *resultinfo;             /* pass or return extra info about result */
    bool         isnull;                 /* true if result is NULL */
    short        nargs;                  /* # arguments actually passed */
    Datum        arg[FUNC_MAX_ARGS];     /* Function arguments */
    bool         argnull[FUNC_MAX_ARGS]; /* T if arg[i] is NULL */
} FunctionCallInfoData;

There is quite a bit of information in this structure. For now, you need to know about only two of the structure members; the
rest of the members are manipulated using macros, so you should pretend that you don't see them. The two members that you are
interested in are flinfo and resultinfo. The flinfo member points to a structure of type FmgrInfo. The FmgrInfo structure
looks like this:

typedef struct FmgrInfo
{
PGFunction fn_addr; /* function or handler to be called */
Oid fn_oid; /* OID of function (NOT of handler, if any) */
short fn_nargs; /* 0..FUNC_MAX_ARGS, or -1 if variable arg */
bool fn_strict; /* func. is "strict" (NULL in = NULL out) */
bool fn_retset; /* func. returns a set (multiple calls) */
void *fn_extra; /* extra space for use by handler */
MemoryContext fn_mcxt; /* memory context to store fn_extra in */
} FmgrInfo;

Look closely at the FmgrInfo and FunctionCallInfo structures. Why would you need two structures to represent a function call?

The FmgrInfo structure contains information about the definition of a function; in other words, the stuff you tell PostgreSQL in the CREATE FUNCTION command can be found in the FmgrInfo structure. The FunctionCallInfo structure represents a single invocation of a function. If you call the same function 20 times, you'll have 20 different FunctionCallInfo structures, each pointing to a single FmgrInfo structure. You can see the difference by comparing FmgrInfo.fn_nargs with FunctionCallInfo.nargs. FmgrInfo.fn_nargs tells you how many arguments were listed in the CREATE FUNCTION command; FunctionCallInfo.nargs tells you how many arguments were passed to this particular invocation.

Line 23 declares a variable called fmgr_info; you'll use this to get to the FmgrInfo structure for this function. Line 24 declares a variable that you will use to get to the ReturnSetInfo structure. I'll describe the ReturnSetInfo structure in a moment.

Lines 24 through 30 turn the text argument into a null-terminated string. This is basically the same procedure you used in the filesize() function.

Lines 32 through 36 perform some sanity checks. It's possible to call the filelist() function in an inappropriate context. We know that filelist() returns multiple rows, so it makes sense to call that function as a target of a SELECT command. You could also call filelist() in the WHERE clause of a SELECT command, but that would be an inappropriate context (because of that multiple-row problem). When you write a function that returns a set of values, you should ensure that your function is being called in an appropriate context the way we do here.

Line 38 is where the interesting stuff starts. fmgr_info->fn_extra is a pointer that you can use for your own purposes; PostgreSQL doesn't do anything with this structure member except provide it for your use. The first time filelist() is called, the fmgr_info->fn_extra member is NULL. In each subsequent call, fmgr_info->fn_extra is equal to whatever you set it to in the previous call. Sounds like a great place to keep context information. Remember the dir_ctx structure you looked at earlier? That structure holds the information that you use to keep track of your progress as you walk through the array of file entries in a given directory.

At line 42, you know that fmgr_info->fn_extra is NULL: That implies that you have not yet started traversing a directory list. So, you allocate a dir_ctx structure and point fmgr_info->fn_extra to the new structure. The next time you are called, fmgr_info->fn_extra will point to the same dir_ctx structure (remember, there is only one FmgrInfo structure, regardless of how many times this function is called).

You may be thinking that I should have used palloc() to allocate the dir_ctx structure. In most extension functions, that is precisely what you should do. But in the case of an SRF, you want to allocate information related to the FmgrInfo structure in a different memory context[2], the context pointed to in the fmgr_info structure.

[2] You can think of a memory context as a pool of memory. Unlike malloc(), the MemoryContextAlloc() function allocates memory from a specific pool (malloc() allocates all memory from the same pool). A memory context has a lifetime (or scope). When the scope completes, all memory allocated within that scope is automatically released. The palloc() function is just a wrapper around MemoryContextAlloc(). The memory context used by palloc() is destroyed at the end of a transaction (or possibly sooner).

Lines 47 through 50 do the real grunt work. You use the scandir() function to create an array of struct dirent structures. Each element in this array (new_ctx->dir_ctx_entries) describes a file or subdirectory. The scandir() function expects four parameters. The first parameter is the name of the directory that you are interested in; you pass the null-terminated string (start) that you crafted earlier in this function. The second parameter is a bit complex: it's a pointer to a pointer to an array of struct dirent pointers. You know that your dir_ctx.dir_ctx_entries member is a pointer to such an array, so you pass the address of dir_ctx_entries and scandir() points dir_ctx_entries to the new array. The third parameter is a pointer to a selection function. If you want to choose which files and subdirectories to include in the result set, you can write your own selection function and pass its address to scandir(). You want all files and subdirectories, so you just pass in a NULL to tell scandir() not to filter the result set. The final scandir() parameter is a pointer to a comparison function. If you don't provide a comparison function, scandir() won't sort the result set. Use the alphasort() function from the C Runtime Library; it's already written, and you aren't too concerned about performance here. For more information on scandir() and alphasort(), see the scandir() man page.
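To make the third parameter concrete, here is a minimal standalone sketch (plain POSIX C, no PostgreSQL headers) of a selection function that drops the "." and ".." entries. The names skip_dot_entries() and list_directory() are hypothetical helpers invented for this illustration; filelist() itself passes NULL and keeps every entry.

```c
#define _XOPEN_SOURCE 700   /* ask for the POSIX.1-2008 scandir()/alphasort() declarations */

#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical selection function: nonzero means "keep this entry". */
static int skip_dot_entries(const struct dirent *entry)
{
    return strcmp(entry->d_name, ".") != 0 &&
           strcmp(entry->d_name, "..") != 0;
}

/*
 * Hypothetical helper: scan dir_name and sort the entries with alphasort().
 * Returns the number of entries (or -1 on failure) and points *entries
 * at an array that scandir() allocates with malloc().
 */
static int list_directory(const char *dir_name, struct dirent ***entries,
                          int include_dots)
{
    assert(dir_name != NULL && entries != NULL);

    return scandir(dir_name, entries,
                   include_dots ? NULL : skip_dot_entries,
                   alphasort);
}
```

Calling list_directory() with include_dots set to zero should yield the same listing minus the two dot entries. Whoever calls it owns the returned array and is responsible for releasing it with free().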

Finish initializing the dir_ctx structure by setting dir_ctx_current to zero. dir_ctx_current is incremented as you walk through the dir_ctx_entries.

Now that the initialization is complete, you can return your first result. But first, a quick review. You know that PostgreSQL calls this function many times and it continues to call filelist() until you set resultInfo->isDone to ExprEndResult. You can detect the initial call to filelist() by the fact that fmgr_info->fn_extra is NULL. In the initial call, you allocate a context structure and point fmgr_info->fn_extra to the new structure; the next time that filelist() is called, fmgr_info->fn_extra will not be NULL, so you know that you can skip the initialization step. Next, populate the context structure by calling the scandir() function: scandir() allocates an array of struct dirent structures and gives you a pointer to that array.
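The first-call protocol is easier to see without the server in the way. The following sketch is only a model: MockFmgrInfo, MockCtx, MockIsDone, and mock_filelist() are stand-ins invented for this illustration (they are not PostgreSQL types), and malloc() takes the place of MemoryContextAlloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for FmgrInfo: only the member this sketch needs. */
typedef struct
{
    void *fn_extra;            /* NULL until the first call stores a context */
} MockFmgrInfo;

/* Stand-in for the isDone values that a real SRF stores in ReturnSetInfo. */
typedef enum { MOCK_MULTIPLE_RESULT, MOCK_END_RESULT } MockIsDone;

/* Per-scan state, playing the role of dir_ctx. */
typedef struct
{
    const char **names;
    int          count;
    int          current;
} MockCtx;

/*
 * Returns the next name, or NULL once the set is exhausted. The caller
 * keeps calling until *is_done is MOCK_END_RESULT.
 */
static const char *mock_filelist(MockFmgrInfo *flinfo, const char **names,
                                 int count, MockIsDone *is_done)
{
    MockCtx *ctx;

    if (flinfo->fn_extra == NULL)
    {
        /* First call: build the context and park it in fn_extra. */
        ctx = malloc(sizeof(*ctx));
        assert(ctx != NULL);
        ctx->names = names;
        ctx->count = count;
        ctx->current = 0;
        flinfo->fn_extra = ctx;
    }

    ctx = flinfo->fn_extra;

    if (ctx->current < ctx->count)
    {
        *is_done = MOCK_MULTIPLE_RESULT;
        return ctx->names[ctx->current++];
    }

    /* End of set: release the context and reset fn_extra for the next scan. */
    free(ctx);
    flinfo->fn_extra = NULL;
    *is_done = MOCK_END_RESULT;
    return NULL;
}
```

A driver loop that calls mock_filelist() until it reports MOCK_END_RESULT plays the part of the server. Because fn_extra is reset to NULL at the end, the same MockFmgrInfo can be reused for a second scan.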

Line 54 retrieves the address of your context structure from fmgr_info->fn_extra.

Lines 56 through 65 take care of the case where the scandir() function fails to return any directory entries. The scandir() function returns the number of directory entries retrieved; it returns -1 on failure.

The details in this section of code are important. First, you must free the context structure that you allocated in the initial call (using pfree()). You also set fmgr_info->fn_extra to NULL; if you forget this step, the next call to filelist() will find a stale context structure and won't reinitialize. Remember, there is one FunctionCallInfo structure for each invocation, but there is never more than one FmgrInfo structure; you'll get the same FmgrInfo structure each time filelist() is invoked. Line 62 tells PostgreSQL that you have reached the end of the result set and line 64 returns a NULL Datum.

Lines 67 through 93 take care of returning a single result to the caller.

Lines 74 through 82 create a text value from a null-terminated directory entry (actually, you ignore most of the struct dirent structure and just return the name portion). You first allocate a new text structure using palloc(), then set the structure size and copy the directory entry name into place. Notice that you don't copy the null terminator: a text value should not be null-terminated. At line 84, you tell PostgreSQL that you are returning a result and that there may be more results, so it should keep calling. Next, you increment the array index so that the next call to filelist() will return the next directory entry. Finally, you return the directory entry to the caller in the form of a text value.

Notice that the context structure in this section of code has not been freed. You need to preserve the dir_ctx structure until you have processed the last directory entry.
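Here is a rough model of the layout that lines 74 through 82 build: a length header followed by the bytes, with no null terminator. MockText, MOCK_VARHDRSZ, mock_cstring_to_text(), and mock_text_to_cstring() are hypothetical stand-ins for text, VARHDRSZ, and the palloc()-based code in the listing; real extension code should stick with the server's own macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a text value. */
typedef struct
{
    int32_t vl_len;     /* total size in bytes, header included */
    char    vl_dat[];   /* the characters, NOT null-terminated */
} MockText;

#define MOCK_VARHDRSZ ((int32_t) sizeof(int32_t))

/* Build a MockText from a C string; the terminator is left behind. */
static MockText *mock_cstring_to_text(const char *src)
{
    int32_t   name_len   = (int32_t) strlen(src);
    int32_t   result_len = name_len + MOCK_VARHDRSZ;
    MockText *result     = malloc((size_t) result_len);

    assert(result != NULL);
    result->vl_len = result_len;
    memcpy(result->vl_dat, src, (size_t) name_len);
    return result;
}

/* Go the other way: copy the bytes out and append a terminator. */
static char *mock_text_to_cstring(const MockText *src)
{
    int32_t len = src->vl_len - MOCK_VARHDRSZ;
    char   *dst = malloc((size_t) len + 1);

    assert(dst != NULL);
    memcpy(dst, src->vl_dat, (size_t) len);
    dst[len] = '\0';
    return dst;
}
```

The second helper mirrors the conversion that filelist() performs on its argument: the length comes from the header, never from searching for a terminator.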

You reach lines 96 through 104 once you have returned all directory entries. This section is nearly identical to the code that deals with a scandir() failure (lines 58-64). In fact, the only difference is that you have one more thing to clean up. When you called the scandir() function, it allocated an array of struct dirent structures using malloc(). You have to free() that array before you finish up.
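One detail is worth a hedge here: the scandir() man page says that each struct dirent is allocated with malloc(), as is the array that collects them, so a thorough cleanup releases both levels. The helper below, free_scandir_result(), is a hypothetical name used only for this sketch; it is not part of the listing.

```c
#define _XOPEN_SOURCE 700   /* ask for the POSIX.1-2008 scandir()/alphasort() declarations */

#include <assert.h>
#include <dirent.h>
#include <stdlib.h>

/*
 * Hypothetical cleanup helper: free every entry that scandir() allocated,
 * then free the array of pointers itself. A NULL array is ignored, and a
 * count of -1 (scandir() failure) simply skips the loop.
 */
static void free_scandir_result(struct dirent **entries, int count)
{
    if (entries == NULL)
        return;

    for (int i = 0; i < count; i++)
        free(entries[i]);

    free(entries);
}
```

If you initialize the array pointer to NULL before calling scandir(), you can call this helper unconditionally, even when scandir() fails.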

That completes the C part of this function; now you have to compile it into a shared object module and tell PostgreSQL where to find it. You can use the same makefile that you used to compile the filesize function:

$ make -f makefile filelist.so

As before, you'll create a symbolic link between filelist.so and PostgreSQL's preferred package directory:

$ ln -s `pwd`/filelist.so `pg_config --pkglibdir`

Now the only thing remaining is to tell PostgreSQL about the new function:

movies=# CREATE FUNCTION filelist( TEXT )
movies-# RETURNS SETOF TEXT
movies-# AS 'filelist.so' LANGUAGE 'C';
CREATE

Now, let's call filelist() to see how it works:

movies=# SELECT filelist( '/usr' );
filelist
------------
.
..
bin
dict
etc
games
html
include
kerberos
lib
libexec
local
sbin
share
src
tmp
X11R6
(17 rows)

Notice that the results appear in sorted order. The ordering comes from the alphasort() function that you passed to scandir(). If you don't care about the ordering, you can specify a NULL comparison function instead. Of course, we can ask PostgreSQL to order the data itself:

movies=# SELECT filelist( '/usr' ) ORDER BY filelist DESC;
filelist
------------
X11R6
tmp
src
share
sbin
local
libexec
lib
kerberos
include
html
games
etc
dict
bin
..
.
(17 rows)

Now that you know how to create an SRF the hard way, I'll describe the new SRF interface that was introduced with PostgreSQL version 7.3.

The PostgreSQL SRF Interface

First off, you should know that the new SRF interface is simply a wrapper around the old method. Set-returning functions are still invoked multiple times. The first time through, an SRF initializes its own context structure and stores that structure away so that each subsequent invocation can find it. In the old approach, an SRF examined fmgr_info->fn_extra to determine whether it was being invoked for the first time (if fmgr_info->fn_extra is NULL, this is the first call). In the new approach, you call the SRF_IS_FIRSTCALL() macro instead. You can probably guess what this macro does: It returns TRUE if fmgr_info->fn_extra is NULL (implying that this is the first call). In fact, here's the definition of the SRF_IS_FIRSTCALL() macro:

#define SRF_IS_FIRSTCALL() ( fcinfo->flinfo->fn_extra == NULL )

No great surprises there.

Once you know that you're looking at the first invocation of an SRF, you typically allocate a context structure of some sort and store the address of the structure in fmgr_info->fn_extra. In the new approach, you call the SRF_FIRSTCALL_INIT() macro. This macro allocates its own context structure (a structure of type FuncCallContext), records the address of the structure in fmgr_info->fn_extra, and returns the address back to your SRF. A FuncCallContext structure looks like this:

typedef struct FuncCallContext
{
uint32 call_cntr;
uint32 max_calls;
TupleTableSlot * slot;
void * user_fctx;
AttInMetadata * attinmeta;
MemoryContext multi_call_memory_ctx;
TupleDesc tuple_desc;
} FuncCallContext;

If the SRF_FIRSTCALL_INIT() macro stores its own pointer in fmgr_info->fn_extra, where are you supposed to store the address of your context structure? In the user_fctx field: that pointer is reserved for your own personal use, just like fmgr_info->fn_extra was reserved for your use in the old SRF mechanism. I'll explain the other members of the FuncCallContext structure in a moment.

Now that you have a pointer to a spanking new FuncCallContext (remember, SRF_FIRSTCALL_INIT() returns the address of the structure), you can allocate your own context structure and store its address in user_fctx:

...
FuncCallContext * srf = SRF_FIRSTCALL_INIT();
dir_ctx * ctx;

ctx = (dir_ctx *) MemoryContextAlloc( srf->multi_call_memory_ctx,
                                     sizeof( dir_ctx ));

srf->user_fctx = ctx;
...

Notice that the FuncCallContext structure holds a MemoryContext named multi_call_memory_ctx. Any data that you need to save from one invocation to the next must be allocated from the multi_call_memory_ctx or PostgreSQL will discard that data as soon as the first invocation completes (multi_call_memory_ctx is equivalent to fmgr_info->fn_mcxt in the old SRF mechanism).

Each time your SRF is invoked (even the first time), you should call the SRF_PERCALL_SETUP() macro. Like SRF_FIRSTCALL_INIT(), SRF_PERCALL_SETUP() returns a pointer to the FuncCallContext structure. The context pointer that you saved in user_fctx is still there. You can use that pointer to get to the context structure that you allocated (and initialized) the first time through.

The new SRF mechanism provides two more macros: SRF_RETURN_NEXT() and SRF_RETURN_DONE(). As you might expect, these macros return information to the caller. The SRF_RETURN_NEXT() macro returns a value (a Datum) to the caller and tells the server to call you again to retrieve the next value in the result set (remember, you're writing a set-returning function; the server will call your function until you indicate that you have no more results to add to the set). The SRF_RETURN_DONE() macro returns a NULL value to the caller and tells the server that you have no more results to add to the result set. SRF_RETURN_DONE() also deallocates the FuncCallContext structure, so you should perform any cleanup work before you call SRF_RETURN_DONE(); you won't get another chance.

To show you how all of these macros fit together, Listing 6.5 shows the filelist() function again, this time created with the new SRF mechanism:

Listing 6.5. filelistSRF.c


1 /*
2 ** Filename: filelistSRF.c
3 */
4
5 #include "postgres.h"
6 #include "funcapi.h"
7
8 #include <dirent.h>
9 #include <sys/stat.h>
10
11 typedef struct
12 {
13 struct dirent ** dir_ctx_entries;
14 } dir_ctx;
15
16 PG_FUNCTION_INFO_V1(filelist);
17
18 Datum filelist(PG_FUNCTION_ARGS)
19 {
20 text * startText = PG_GETARG_TEXT_P(0);
21 int len = VARSIZE( startText ) - VARHDRSZ;
22 char * start = (char *)palloc( len+1 );
23 dir_ctx * ctx;
24 FuncCallContext * srf;
25
26 memcpy( start, startText->vl_dat, len );
27 start[len] = '\0';
28
29 if( SRF_IS_FIRSTCALL())
30 {
31 srf = SRF_FIRSTCALL_INIT();
32
33 srf->user_fctx = MemoryContextAlloc( srf->multi_call_memory_ctx,
34 sizeof( dir_ctx ));
35
36 ctx = (dir_ctx *)srf->user_fctx;
37
38 srf->max_calls = scandir(start,&ctx->dir_ctx_entries,NULL,alphasort);
39 srf->call_cntr = 0;
40 }
41
42 srf = SRF_PERCALL_SETUP();
43 ctx = (dir_ctx *)srf->user_fctx;
44
45 if( srf->max_calls == -1 )
46 SRF_RETURN_DONE( srf );
47
48 if( srf->call_cntr < srf->max_calls )
49 {
50 struct dirent * entry;
51 size_t nameLen;
52 size_t resultLen;
53 text * result;
54
55 entry = ctx->dir_ctx_entries[srf->call_cntr];
56 nameLen = strlen( entry->d_name );
57 resultLen = nameLen + VARHDRSZ;
58
59 result = (text *)palloc( resultLen );
60
61 VARATT_SIZEP( result ) = resultLen;
62
63 memcpy( VARDATA( result ), entry->d_name, nameLen );
64
65 SRF_RETURN_NEXT( srf, PointerGetDatum( result ));
66 }
67 else
68 {
69 SRF_RETURN_DONE( srf );
70 }
71 }

I'll point out a few of the differences. First, notice that you don't need quite as many #include files when you use the new mechanism (the new funcapi.h header takes care of including any required headers). Next, take a look at the dir_ctx structure at line 11. If you compare that to the original version, you'll notice that the new version is much shorter. The FuncCallContext structure already contains placeholders for some of the data that used to be in dir_ctx. I'll explain more in a moment.

The next significant change appears at line 29. The new version of filelist() calls the SRF_IS_FIRSTCALL() macro to decide whether to initialize itself or return the next value in the result set. At line 31 you see a call to the SRF_FIRSTCALL_INIT() macro. That macro returns a pointer to the FuncCallContext structure that you're supposed to use for this invocation and for future invocations. The call to MemoryContextAlloc() (line 33) allocates space for a dir_ctx from the srf->multi_call_memory_ctx context. srf->multi_call_memory_ctx is a memory pool that survives from invocation to invocation.

Now take a look at lines 38 and 39. The call to scandir() returns the number of files that it finds in the given directory. In the previous version, you stored the file count in dir_ctx->dir_ctx_count. In the new version, you don't need an extra field (dir_ctx_count) to hold the file count; the FuncCallContext structure already has a field that serves the same purpose: max_calls. The SRF mechanism doesn't actually do anything with max_calls; it just gives you a place to store a number. At line 38, filelist() stores the file count in srf->max_calls. The FuncCallContext structure also has a replacement for the dir_ctx_current field that you saw in the original version of this function: the call_cntr field. call_cntr starts at 0 and is incremented each time you call SRF_RETURN_NEXT(). To summarize, the new version of filelist() stores the file count in srf->max_calls and uses srf->call_cntr to index into the array of filenames.
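The bookkeeping can be modeled in a few lines. This is only a sketch: MockFuncCallContext and mock_next() are invented for the illustration and copy just three members of the real FuncCallContext; the increment that mock_next() performs by hand is the part that SRF_RETURN_NEXT() takes care of in the real interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the three FuncCallContext members discussed here. */
typedef struct
{
    uint32_t  call_cntr;   /* number of results returned so far */
    uint32_t  max_calls;   /* total number of results, set on the first call */
    void     *user_fctx;   /* caller-owned context (here, an array of names) */
} MockFuncCallContext;

/* Returns the next name, or NULL once call_cntr reaches max_calls. */
static const char *mock_next(MockFuncCallContext *srf)
{
    const char **names = srf->user_fctx;

    if (srf->call_cntr < srf->max_calls)
    {
        /* Index with call_cntr, then advance it for the next call. */
        return names[srf->call_cntr++];
    }

    /* Nothing left; a real SRF would call SRF_RETURN_DONE() here. */
    return NULL;
}
```

With max_calls set to the entry count and call_cntr starting at zero, repeated calls walk the array once and then report the end of the set.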

Every time the server calls this function, filelist() calls SRF_PERCALL_SETUP() to retrieve a pointer to the FuncCallContext structure (see line 42). If filelist() decides that it has no more filenames to add to the result set, it calls SRF_RETURN_DONE() to tell PostgreSQL that it has finished its work (lines 46 and 69). If filelist() does have another result, it creates a text structure and calls SRF_RETURN_NEXT() (see the previous version of this function for a more complete explanation). SRF_RETURN_NEXT() increments srf->call_cntr, returns the Datum (result) to the server, and tells the server that it should call this function again to retrieve the next result.

You can see that the new version of this function is very similar to the old version. The SRF macros simply hide a few of the quirks imposed by the PostgreSQL calling convention. Which approach should you use? The down-and-dirty approach or the new SRF-macro-based approach? That depends on your goals. If you need to write an extension function that will work in an older version of PostgreSQL (older than version 7.3), use the original approach (the SRF macros were added in version 7.3). If not, consider the new approach. It's possible that the SRF calling convention may change in the future and it seems safer to assume that the PostgreSQL developers will hide as many changes as possible behind the SRF macros. If you choose to use the old approach, you may find that you have to change your source code when you upgrade to a future release.

Returning Complete Rows from an Extension Function

If you've read through the first few sections in this chapter, you know how to write an extension function that returns a single scalar value (that's what the filesize() function does). You also know how to return a set of scalar values (that's what the filelist() function does). In this section, I'll show you how to return a set of rows (or, as the PostgreSQL developers prefer to call them, tuples).

To illustrate the sequence that you must follow to return multiple tuples from an extension function, I'll create a new function, fileinfo(), that combines filesize() and filelist(). You call fileinfo() with the name of a directory and it returns a SETOF tuples. Each tuple contains three columns: a filename, the size of the file (or NULL if the size is not known), and the file type (or NULL if the type is not known). When you've finished, you can call the fileinfo() function like this:

movies=# SELECT * FROM fileinfo( '/dev' );
 filename      |  filesize  | filetype
---------------+------------+----------
 .             |       9380 | d
 ..            |       4096 | d
 adsp          |          0 | c
 agpgart       |          0 | c
 arpd          |          0 | c
 audio         |          0 | c
 cdrom         |          0 | b
 console       |          0 | c
 core          | 1073156096 | -
 cpu           |        360 | d
...

To start, you must define a data type that describes each row returned by the fileinfo() function:


movies=# CREATE TYPE _fileinfo AS ( filename TEXT, filesize INTEGER, filetype CHAR(1));
CREATE TYPE

I'll create a few helper functions that will simplify the fileinfo() function. Listing 6.6 shows the getFileInfo(), getFileType(), and text2cstring() functions:

Listing 6.6. fileinfo.c (Part 1)


1 /*
2 ** Filename: fileinfo.c
3 */
4 #include "postgres.h"
5 #include "funcapi.h"
6
7 #include <dirent.h>
8 #include <sys/stat.h>
9
10 typedef struct
11 {
12 struct dirent ** dir_ctx_entries;
13 char * dir_ctx_name;
14 } dir_ctx;
15
16 static bool getFileInfo(struct stat * buf, char * dirName, char * fileName)
17 {
18 char * pathName = (char *) palloc(strlen(dirName)+1+strlen(fileName)+1);
19
20 strcpy( pathName, dirName );
21 strcat( pathName, "/" );
22 strcat( pathName, fileName );
23
24 if( stat( pathName, buf ) == 0 )
25 return( true );
26 else
27 return( false );
28 }
29
30 static char getFileType( mode_t mode )
31 {
32 if( S_ISREG(mode))
33 return( '-' );
34 if( S_ISDIR(mode))
35 return( 'd' );
36 if( S_ISCHR(mode))
37 return( 'c' );
38 if( S_ISBLK(mode))
39 return( 'b' );
40 if( S_ISFIFO(mode))
41 return( 'p' );
42 if( S_ISLNK(mode))
43 return( 'l' );
44 if( S_ISSOCK(mode))
45 return( 's' );
46
47 return( '?' );
48
49 }
50
51 static char * text2cstring( text * src )
52 {
53 int len = VARSIZE( src ) - VARHDRSZ;
54 char * dst = (char *)palloc( len+1 );
55
56 memcpy( dst, src->vl_dat, len );
57 dst[len] = '\0';
58
59 return( dst );
60 }
61

The getFileInfo() helper function (lines 16 through 28) calls stat() to retrieve metadata that describes the given file. The caller provides three parameters: the address of a struct stat structure that getFileInfo() fills in, the name of the directory where the target file resides, and the name of the target file itself. If the stat() function succeeds, getFileInfo() returns true and the caller can find the metadata for the file in the struct stat structure. If the stat() function fails, getFileInfo() returns false.

The second helper function, getFileType(), translates a mode_t (returned by the stat() function) into a single character that represents a file type. getFileType() returns one of the following values:

• d (directory)
• c (character device)
• b (block device)
• p (named pipe)
• l (symbolic link)
• s (socket)
• ? (unknown)
• - (a "regular" file; that is, not one of the above)
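If you want to experiment with the two helpers outside the server, a standalone sketch along the following lines should do. The names file_type_char() and path_type() are hypothetical, and a fixed-size buffer stands in for palloc(). Note one difference: stat() follows symbolic links, so a function built on it reports the type of the link's target; the sketch calls lstat() instead, which is a deliberate substitution for illustration, not what Listing 6.6 does.

```c
#define _XOPEN_SOURCE 700   /* ask for lstat() and the S_ISxxx() macros */

#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Map a mode_t onto the same one-character codes that getFileType() uses. */
static char file_type_char(mode_t mode)
{
    if (S_ISREG(mode))  return '-';
    if (S_ISDIR(mode))  return 'd';
    if (S_ISCHR(mode))  return 'c';
    if (S_ISBLK(mode))  return 'b';
    if (S_ISFIFO(mode)) return 'p';
    if (S_ISLNK(mode))  return 'l';
    if (S_ISSOCK(mode)) return 's';
    return '?';
}

/*
 * Build "dir_name/file_name" and report the type of that path.
 * Returns '?' if the path is too long or cannot be examined.
 */
static char path_type(const char *dir_name, const char *file_name)
{
    char        path[4096];
    struct stat buf;
    int         n = snprintf(path, sizeof(path), "%s/%s", dir_name, file_name);

    if (n < 0 || (size_t) n >= sizeof(path))
        return '?';

    /* lstat() describes the link itself rather than the link's target. */
    if (lstat(path, &buf) != 0)
        return '?';

    return file_type_char(buf.st_mode);
}
```

Unlike getFileInfo(), this sketch folds failure into the '?' code instead of returning a separate true or false, which keeps the example short but would hide the distinction that fileinfo() uses to produce NULL columns.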

The last helper function is text2cstring() (see lines 51 through 60). This function converts a TEXT value into a dynamically allocated, null-terminated string. The fileinfo() function (which I'll describe next) calls text2cstring() to convert its TEXT argument into the form expected by scandir().

The fileinfo() function is shown in Listing 6.7:

Listing 6.7. fileinfo.c (Part 2)


62 PG_FUNCTION_INFO_V1(fileinfo);
63
64 Datum fileinfo(PG_FUNCTION_ARGS)
65 {
66 char * start = text2cstring( PG_GETARG_TEXT_P(0));
67 dir_ctx * ctx;
68 FuncCallContext * srf;
69
70 if( SRF_IS_FIRSTCALL())
71 {
72 TupleDesc tupdesc;
73 MemoryContext oldContext;
74
75 srf = SRF_FIRSTCALL_INIT();
76
77 oldContext = MemoryContextSwitchTo( srf->multi_call_memory_ctx );
78
79 ctx = (dir_ctx *) palloc( sizeof( dir_ctx ));
80
81 tupdesc = RelationNameGetTupleDesc( "_fileinfo" );
82
83 srf->user_fctx = ctx;
84 srf->max_calls = scandir( start, &ctx->dir_ctx_entries, NULL, alphasort );
85 srf->attinmeta = TupleDescGetAttInMetadata( tupdesc );
86
87 ctx->dir_ctx_name = pstrdup( start ); /* copy into the multi-call context; start was allocated per call */
88
89 MemoryContextSwitchTo( oldContext );
90
91 }
92
93 srf = SRF_PERCALL_SETUP();
94 ctx = (dir_ctx *)srf->user_fctx;
95
96 if( srf->max_calls == -1 )
97 SRF_RETURN_DONE( srf );
98
99 if( srf->call_cntr < srf->max_calls )
100 {
101 struct dirent * entry;
102 char * values[3];
103 struct stat statBuf;
104 char fileSizeStr[10+1] = {0};
105 char fileTypeStr[1+1] = {0};
106 HeapTuple tuple;
107
108 entry = ctx->dir_ctx_entries[srf->call_cntr];
109 values[0] = entry->d_name;
110
111 if( getFileInfo( &statBuf, ctx->dir_ctx_name, entry->d_name ))
112 {
113 snprintf( fileSizeStr, sizeof( fileSizeStr ), "%ld", (long) statBuf.st_size );
114 fileTypeStr[0] = getFileType( statBuf.st_mode );
115
116 values[1] = fileSizeStr;
117 values[2] = fileTypeStr;
118 }
119 else
120 {
121 values[1] = NULL;
122 values[2] = NULL;
123 }
124
125 tuple = BuildTupleFromCStrings( srf->attinmeta, values );
126
127 SRF_RETURN_NEXT( srf, HeapTupleGetDatum( tuple ));
128 }
129 else
130 {
131 SRF_RETURN_DONE( srf );
132 }
133 }

The fileinfo() function calls scandir() to generate an array that contains the names of all files in the given directory and then calls getFileInfo() (which in turn calls stat()) to retrieve the metadata for each file. The server calls fileinfo() until it stops returning values. Each invocation returns a single tuple (of type _fileinfo) that contains a filename and the size and type of that file.

fileinfo() starts by converting its argument from a TEXT value into a null-terminated string (the scandir() function that fileinfo() calls at line 84 requires a null-terminated string). At line 70, fileinfo() calls the SRF_IS_FIRSTCALL() macro to decide whether it should create and initialize a new context structure or use a structure created by a prior invocation.

The fileinfo() function has to do a little more memory-management work than the other functions you've seen in this chapter. The earlier functions
allocated memory from the srf->multi_call_memory_ctx (or fmgr_info->fn_mcxt) pool. fileinfo() also allocates memory from that pool, but
fileinfo() calls other PostgreSQL functions that allocate memory as well. For example, at line 81, you see a call to RelationNameGetTupleDesc().
That function allocates memory using the server's palloc() function. You must ensure that RelationNameGetTupleDesc() (and any function called by
RelationNameGetTupleDesc()) allocates memory from the correct MemoryContext. Each MemoryContext has its own lifetime (or scope). If
RelationNameGetTupleDesc() allocates memory from a MemoryContext with a lifetime that's too short (that is, a lifetime that ends before the last
call to fileinfo()), you'll find that the data created by RelationNameGetTupleDesc() is de-allocated out from under you. On the other hand, if
RelationNameGetTupleDesc() allocates memory from a MemoryContext with a lifetime that's too long, you'll create a memory leak.

Take a look at line 81. Notice that you call RelationNameGetTupleDesc() with a single argument (the name of the tuple type). Since you can't pass a
MemoryContext to RelationNameGetTupleDesc(), how do you tell that function which MemoryContext to use? The answer is deceptively simple. Look
closely at the call to palloc() at line 79. palloc() is the most commonly used memory allocation function in the PostgreSQL server. palloc()
allocates memory from the MemoryContext pointed to by the CurrentMemoryContext global variable. If you want to talk RelationNameGetTupleDesc()
into using a specific MemoryContext, you have to point CurrentMemoryContext to that context. That's what the code at line 77 does. Call
MemoryContextSwitchTo() whenever you need to change the lifetime of data allocated by palloc(). Notice that the call at line 77 selects
srf->multi_call_memory_ctx (which is a MemoryContext that survives as long as fileinfo() has more tuples to return). After MemoryContextSwitchTo()
returns, palloc() will allocate memory from that MemoryContext until somebody calls MemoryContextSwitchTo() again. MemoryContextSwitchTo()
switches to a new MemoryContext and returns the previous value. You should restore the original MemoryContext when you're finished with the new
one (see line 89).
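
If the save/switch/restore dance is hard to picture, here is a toy model of it in plain C. This is only a sketch: ToyContext, toy_switch_to(), toy_alloc(), toy_build_descriptor(), and toy_first_call_setup() are names I made up for illustration, and they merely imitate the behavior described above. They are not part of the PostgreSQL API, and the toy context only counts bytes instead of owning and freeing memory the way a real MemoryContext does.

```c
#include <stdlib.h>

/* Toy stand-in for a MemoryContext (hypothetical, NOT the PostgreSQL type). */
typedef struct ToyContext
{
    const char *name;       /* label, for illustration only         */
    size_t      allocated;  /* bytes charged to this context so far */
} ToyContext;

/* Plays the role of the CurrentMemoryContext global variable. */
static ToyContext *toy_current_context = NULL;

/* Imitates MemoryContextSwitchTo(): install a context, return the old one. */
static ToyContext *toy_switch_to(ToyContext *next)
{
    ToyContext *previous = toy_current_context;

    toy_current_context = next;
    return previous;
}

/* Imitates palloc(): charge the request to whichever context is current. */
static void *toy_alloc(size_t size)
{
    toy_current_context->allocated += size;
    return malloc(size);
}

/* A callee with no context argument, in the spirit of
** RelationNameGetTupleDesc(): it simply allocates from the current context. */
static void *toy_build_descriptor(void)
{
    return toy_alloc(64);
}

/* The pattern itself: save the old context, switch, allocate, restore. */
static void *toy_first_call_setup(ToyContext *multi_call_ctx)
{
    ToyContext *old_ctx = toy_switch_to(multi_call_ctx);
    void       *desc    = toy_build_descriptor();

    toy_switch_to(old_ctx);
    return desc;
}
```

The callee never sees a context argument; the caller steers the allocation purely by changing the "current" context and then putting the old one back.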

Once the correct MemoryContext is in place, fileinfo() allocates a new dir_ctx context structure (see line 79). Next, fileinfo() calls the
RelationNameGetTupleDesc() function to retrieve the TupleDesc that defines the _fileinfo type (remember, the _fileinfo type describes the
layout of the tuples returned by fileinfo(); you created the _fileinfo type earlier with a CREATE TYPE command). A TupleDesc is a structure that
describes the shape of a tuple. It contains (among other things) the number of columns in the tuple and a description of each column. You don't have
to peek inside of a TupleDesc (unless you want to) but fileinfo() needs the descriptor to build a return value.

After retrieving the tuple descriptor, fileinfo() records the address of its new context structure so it can find the structure in future invocations (line
83). Next, fileinfo() calls the scandir() function to generate an array that contains the name of each file in the given directory (start). scandir()
records the address of the array in ctx->dir_ctx_entries.
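
In case you haven't used scandir() before, the following stand-alone sketch shows the calling convention outside of the server. The list_directory() helper is hypothetical (it is not part of the filelist listing); it assumes a POSIX system and uses malloc()-style cleanup with free(), which is how scandir() expects its results to be released.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX.1-2008 scandir() */

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/*
** Hypothetical helper: returns the number of entries in path,
** or -1 if the directory can't be read. If verbose is non-zero,
** each entry name is printed on its own line.
*/
static int list_directory(const char *path, int verbose)
{
    struct dirent **entries = NULL;
    int             count;
    int             i;

    /* NULL filter = keep every entry; alphasort = order by name */
    count = scandir(path, &entries, NULL, alphasort);

    if (count < 0)
        return -1;

    for (i = 0; i < count; i++)
    {
        if (verbose)
            printf("%s\n", entries[i]->d_name);

        free(entries[i]);   /* each entry is allocated separately */
    }

    free(entries);          /* ...and so is the array itself */

    return count;
}
```

Calling list_directory( "/dev", 1 ) prints one name per line, which is the same list of names that fileinfo() walks through one invocation at a time.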

When the fileinfo() function returns a value to the caller, it does so by building a tuple out of a collection of null-terminated strings; each string
corresponds to one of the columns in the tuple. The TupleDesc that fileinfo() retrieved at line 81 doesn't contain quite enough information to
convert C strings into a tuple. Fortunately, PostgreSQL provides a function that translates a TupleDesc into a new structure that contains all of the
data you'll need: TupleDescGetAttInMetadata(). The code at line 85 calls this function and stores the address of the resulting structure in
srf->attinmeta (which the PostgreSQL developers conveniently included for just this purpose). A little later, fileinfo() will use the new structure to
create the return tuple.

The initialization phase completes by storing a copy of the directory name (line 87) and restoring the MemoryContext that was in place when
fileinfo() was first called (line 89).

The code at lines 93 through 99 should be familiar by now; see the previous section ("The PostgreSQL SRF Interface") if you need a refresher. Every
time fileinfo() is called, it calls the SRF_PERCALL_SETUP() macro to find the appropriate FuncCallContext structure and then extracts the address
of the dir_ctx structure created by the initial invocation.

I mentioned earlier that fileinfo() creates a return tuple out of a collection of null-terminated strings. Each tuple contains three columns: a filename,
the size of the file, and the file type. Accordingly, fileinfo() creates three null-terminated strings (one for each column). The values[] array (see
line 102) contains a pointer to each null-terminated string. After filling in values[], fileinfo() will call BuildTupleFromCStrings() to convert the
strings into a tuple.

The first null-terminated string (values[0]) contains the name of one file found in the dir_ctx_entries[] array. The assignment statement at line
109 copies the address of the file name into values[0].

The other null-terminated strings (values[1] and values[2]) contain the file size and the file type (respectively). To find the size of the file,
fileinfo() calls the getFileInfo() function you saw earlier. If successful, getFileInfo() returns true and fills in the statBuf structure with (among
other things) the file size and type. After converting the file size into a null-terminated string (line 113) and translating the file mode into a
human-readable file type (line 114), fileinfo() fills in the rest of the values[] array.

If getFileInfo() fails for some reason, the return tuple should contain a NULL filesize and a NULL filetype to indicate that those values are
"unknown." Setting a column to NULL is easy: just set the corresponding entry in the values[] array to NULL (see lines 121 and 122).

By the time it reaches line 125, fileinfo() has gathered all of the information it needs to create the return tuple. The values[] array contains three
string pointers (or one string pointer and two NULLs). To convert the null-terminated strings into a tuple, fileinfo() calls PostgreSQL's
BuildTupleFromCStrings(). That function uses the tuple description produced by the earlier call to TupleDescGetAttInMetadata() and the pointers
in values[] to create a tuple in the form expected by the PostgreSQL server. fileinfo() returns the tuple to the server by invoking the
SRF_RETURN_NEXT() macro that I described earlier (see "The PostgreSQL SRF Interface" for more information).
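
The way values[] gets filled in can be tried outside of the server. The sketch below is a stand-alone approximation, not the code from the filelist listing: fill_values() is a hypothetical helper, it calls stat() directly instead of going through getFileInfo(), and the mapping from file mode to a one-letter type (d, c, b, or -) is my assumption based on the sample output shown later in this section.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
** Hypothetical helper: fill a three-element values[] array the way
** the text describes. values[0] is the file name; values[1] and
** values[2] point into caller-supplied buffers, or are set to NULL
** if the file can't be examined.
*/
static void fill_values(const char *path, char *values[3],
                        char *sizeBuf, size_t sizeBufLen,
                        char typeBuf[2])
{
    struct stat statBuf;

    values[0] = (char *) path;

    if (stat(path, &statBuf) == 0)
    {
        /* Column 2: the file size, converted to a string */
        snprintf(sizeBuf, sizeBufLen, "%lld", (long long) statBuf.st_size);

        /* Column 3: the file mode, reduced to a one-letter type */
        if (S_ISDIR(statBuf.st_mode))
            typeBuf[0] = 'd';
        else if (S_ISCHR(statBuf.st_mode))
            typeBuf[0] = 'c';
        else if (S_ISBLK(statBuf.st_mode))
            typeBuf[0] = 'b';
        else
            typeBuf[0] = '-';

        typeBuf[1] = '\0';

        values[1] = sizeBuf;
        values[2] = typeBuf;
    }
    else
    {
        /* "Unknown" size and type become NULL columns */
        values[1] = NULL;
        values[2] = NULL;
    }
}
```

Inside the server, an array filled in like this is what you would hand to BuildTupleFromCStrings(); a NULL pointer in the array becomes a NULL column in the tuple.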

When fileinfo() has finished processing all of the file names found in dir_ctx_entries[], it invokes the SRF_RETURN_DONE() macro instead to tell
the server that it has finished building the result set.

If you want to try this function yourself, compile and install it (as described earlier) and execute the following command to tell the PostgreSQL server
how to find and invoke the function:

CREATE OR REPLACE FUNCTION fileinfo(TEXT)
  RETURNS SETOF _fileinfo
  AS 'filelist.so','fileinfo' LANGUAGE C
  STRICT;

If you are rewarded with a message that states type _fileinfo is not yet defined, you forgot to execute the CREATE TYPE command that I
mentioned at the beginning of this section.

You can call the fileinfo() function in any context where you would normally SELECT from a table. For example, to find the names of all files in
the /dev directory:

movies=# SELECT * FROM fileinfo( '/dev' );
   filename    |  filesize  | filetype
---------------+------------+----------
 .             |       9380 | d
 ..            |       4096 | d
 adsp          |          0 | c
 agpgart       |          0 | c
 arpd          |          0 | c
 audio         |          0 | c
 cdrom         |          0 | b
 console       |          0 | c
 core          | 1073156096 | -
 cpu           |        360 | d
...

One of the cool things about PostgreSQL functions is that you can mix functions that are written in different languages. For example, you can call
fileinfo() (which is written in C) from a function written in PL/pgSQL (one of PostgreSQL's procedural languages). In fact, Listing 6.8 shows a
PL/pgSQL function that returns SETOF _fileinfo tuples (just like the fileinfo() function). This function calls fileinfo() to recursively descend
through an entire directory tree, returning one tuple for each file (and subdirectory) that it finds.

Listing 6.8. dirtree.sql


-- File: dirtree.sql

CREATE OR REPLACE FUNCTION dirtree( TEXT ) RETURNS SETOF _fileinfo AS $$
  DECLARE
    file  _fileinfo%rowtype;
    child _fileinfo%rowtype;
  BEGIN

    -- Visit every entry in the directory named by $1
    FOR file IN SELECT * FROM fileinfo( $1 ) LOOP
      -- Skip the "." and ".." entries so the recursion terminates
      IF file.filename != '.' and file.filename != '..' THEN
        file.filename = $1 || '/' || file.filename;

        -- Descend into subdirectories, returning their entries first
        IF file.filetype = 'd' THEN
          FOR child in SELECT * FROM dirtree( file.filename ) LOOP
            RETURN NEXT child;
          END LOOP;
        END IF;
        RETURN NEXT file;
      END IF;
    END LOOP;

    RETURN;

  END
$$ LANGUAGE 'PLPGSQL';

Don't worry if you don't understand the dirtree() function yet. I'll describe the PL/pgSQL language in full detail in Chapter 7, "PL/pgSQL." The
important thing to note here is that dirtree(), a function written in PL/pgSQL, can call fileinfo(), a function written in C. Adding useful extension
functions to PostgreSQL is not too difficult (assuming that you are comfortable working in C). Now that you understand the mechanism for creating
new functions, I'd like to turn your attention to the process of creating a new data type. When you add a new data type to PostgreSQL, you must
create a few supporting extension functions, so be sure you understand the material covered so far.
Extending the PostgreSQL Server with Custom Data Types

The customers table in this sample application contains a column named balance. I've made the assumption that the values
in the balance column are expressed in local currency (that is, U.S. dollars in the U.S., British pounds in the U.K.). This
assumption serves us well until our corner video store opens a web site and starts accepting orders from foreign customers.

PostgreSQL doesn't have a predefined data type that represents a foreign currency value, so let's create one. You want to
store three pieces of information for each foreign currency value: the name of the currency (pounds, dollars, drachma, and
so on), the number of units, and the exchange rate at the time the foreign currency value was created. Call your new data
type FCUR (Foreign Currency). After you have fully defined the FCUR data type, you can create tables with FCUR columns,
enter and display FCUR values, convert between FCUR values and other numeric types, and use a few operators (+, -, *, /) to
manipulate FCUR values.
Internal and External Forms

Before going much further, it is important to understand the difference between the external form of a value and the internal
form.

The external form of a data type defines how the user enters a value and how a value is displayed to the user. For example,
if you enter a numeric value, you might enter the characters 7218942. If you enter these characters from a client that uses
an ASCII encoding, you have entered the character values 37, 32, 31, 38, 39, 34, and 32 (in hexadecimal notation). The
external form of a data type is used to interact with the user.

The internal form of a data type defines how a value is represented inside the database. The preceding numeric value
might be translated from the string 7218942 into the four-byte integer value 00 6E 26 FE (again in hexadecimal notation).
The internal form of a data type is used within the database.
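
You can check the arithmetic in that example with a few lines of C. This is just a sketch to make the two forms concrete: int_from_external() and internal_bytes() are hypothetical helper names, and the byte order shown (most significant byte first) is chosen to match the way the value is written above, not the way any particular machine stores it.

```c
#include <stdint.h>
#include <stdlib.h>

/* External form (a string of digit characters) to internal form
** (a four-byte integer). No error checking; illustration only. */
static int32_t int_from_external(const char *external)
{
    return (int32_t) strtol(external, NULL, 10);
}

/* Break the internal form into four bytes, most significant first,
** so that the bytes can be compared with the hexadecimal notation
** used in the text. */
static void internal_bytes(int32_t value, unsigned char out[4])
{
    uint32_t bits = (uint32_t) value;

    out[0] = (unsigned char) ((bits >> 24) & 0xFF);
    out[1] = (unsigned char) ((bits >> 16) & 0xFF);
    out[2] = (unsigned char) ((bits >> 8) & 0xFF);
    out[3] = (unsigned char) (bits & 0xFF);
}
```

The external form is seven bytes long (one per character), while the internal form always occupies four bytes, no matter how many digits you type.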

Why have two forms? Most programming languages can deal with numeric values implicitly (that is, without requiring the
programmer to implement simple arithmetic operations). For example, the C programming language defines a built-in data
type named int. An int value can store integer (that is, whole) numbers within some range determined by the compiler. The
C compiler knows how to add, subtract, multiply, and divide int values. A C programmer is not required to perform the bit
manipulations himself; the compiler emits the code required to perform the arithmetic.

Most programmers share a common understanding of what it means to add two integer values. When you add two integer
values, you expect the result to be the arithmetic sum of the values. Another way to state this is to say that the + operator,
when applied to two integer operands, should return the arithmetic sum of the operands, most likely in the form of an
integer.

What would you expect the result to be if you applied the + operator to two string values? If each string contained only a
sequence of one or more digits, such as '1' + '34', you might expect the result to be the string '35'. What would happen
if you tried adding '1' + 'red'? That's pretty hard to predict. Because it is difficult to come up with a good arithmetic
definition of the + operator when applied to strings, many programming languages define + to mean concatenation when
applied to string operands. So, the expression '1' + 'red' would evaluate to the string '1red'.

So, to summarize a bit, the external form of a numeric value is a string of numeric digits, sign characters, and a radix point.
When you choose the internal form for a numeric value, you want to choose a representation that makes it easy to define
and implement mathematical operations.

You've already seen the external and internal form of the TEXT data type. The external form of a TEXT value is a string of
characters enclosed in single quotes (the quotes are not part of the value; they just mark the boundaries of the value). If you
need to include single quotes in a TEXT value, the external form defines a set of rules for doing so. The internal form of a
TEXT value is defined by the TEXT data type. The TEXT structure contains a length and an array of characters.
Defining a Simple Data Type in PostgreSQL

Now that you understand the difference between internal and external forms, it should be obvious that PostgreSQL needs to
convert values between these forms. When you define a new data type, you tell PostgreSQL how to convert a value from external
form to internal form and from internal form to external form.

Let's create a simple type that mimics the built-in TEXT data type. Data type descriptions are stored in the pg_type system table.
We are interested in three of the columns:

movies=# SELECT typinput, typoutput, typlen
movies-#   FROM pg_type
movies-#   WHERE typname = 'text';
 typinput | typoutput | typlen
----------+-----------+--------
 textin   | textout   |     -1

The typinput column tells you the name of the function that PostgreSQL uses to convert a TEXT value from external form to
internal form; in this case, the function is named textin. The typoutput column contains the name of the function (textout) that
PostgreSQL uses to convert from internal to external form. Finally, typlen specifies how much space is required to hold the internal
form of a TEXT value. TEXT values are of variable length, so the space required to hold the internal form is also variable (-1 in this
column means variable length). If TEXT were a fixed-length type, the typlen column would contain the number of bytes required to
hold the internal form.

Now you have enough information to create a new data type. Here is the command that you'll use to create a type named
mytexttype:

movies=# CREATE TYPE mytexttype
movies-# (
movies(#   INPUT=textin,
movies(#   OUTPUT=textout,
movies(#   INTERNALLENGTH=VARIABLE
movies(# );

The INPUT=textin clause tells PostgreSQL which function to call when it needs to convert a mytexttype value from external to
internal form. The OUTPUT=textout clause tells PostgreSQL which function converts a mytexttype value from internal to external
form. The final clause, INTERNALLENGTH=VARIABLE, tells PostgreSQL how much space is required to hold the internal form of a
mytexttype value; you specify VARIABLE here to tell PostgreSQL that you are not defining a fixed-length data type.

You have essentially cloned the TEXT [3] data type. Because you are using the same input and output functions as the TEXT type,
the internal and external forms of a mytexttype value are identical to the internal and external forms of a TEXT value.

[3]
You have created an extremely limited clone. At this point, you can enter and display mytexttype values, but you
can't do anything else with them. You have not defined any operators that can manipulate mytexttype values.

After you execute this CREATE TYPE command, you can use the mytexttype data type to create new columns:

movies=# CREATE TABLE myTestTable
movies-# (
movies(#   pkey INTEGER,
movies(#   value mytexttype
movies(# );
CREATE

You can also enter mytexttype values. Because you borrowed the textin and textout functions, you have to enter values
according to the rules for a TEXT value:

movies=# INSERT INTO myTestTable
movies-#   VALUES ( 1, 'This is a mytexttype value in external form' );

Now, let's define a new data type from scratch.
Defining the Data Type in C

We'll start out by defining the internal form for an FCUR value. As I mentioned before, you want to store three pieces of
information for each value: the name of the currency (dollars, euros, yen, and so on), the number of units, and the exchange
rate at the time the value was created. Why do you need to store the exchange rate with each value? Because exchange
rates vary over time, and you need to know the rate at the time the value is created.

Because you are going to use the C programming language to implement the required conversion functions, you need to
define a structure [4] containing the three components. Listing 6.9 shows the first few lines of the implementation file:

[4]
This is not necessarily the most efficient (or even realistic) way to store a foreign currency value, but it
works well for purposes of illustration. In a real-world implementation, you would not want to store monetary
values using floating-point data types because of their inherent lack of precision. You would also want more
control over the format of the currency name.

Listing 6.9. fcur.c (Part 1)

  1  /*
  2  ** File name: fcur.c
  3  */
  4
  5  #include "postgres.h"
  6  #include "fmgr.h"
  7
  8  typedef struct
  9  {
 10    char   fcur_name[4];  /* Currency name      */
 11    float4 fcur_units;    /* Units of currency  */
 12    float4 fcur_xrate;    /* Exchange rate      */
 13  } fcur;
 14
 15  static char * baseCurrencyName    = "US$";
 16  static char * unknownCurrencyName = "???";
 17

Start by #including the postgres.h and fmgr.h header files, just like you did for the earlier examples. The fcur structure
defines the internal form for your fcur data type. Store the currency name (fcur_name) as a three-character, null-terminated
string. The fcur_units member stores the number of currency units as a floating-point number. The exchange
rate is stored as a floating-point number in fcur_xrate.

At lines 15 and 16, you define two currency names. The baseCurrencyName is the name of the local currency. When the
fcur_name of a value is equal to baseCurrencyName, the value is said to be normalized. A normalized value will always have
an exchange rate (fcur_xrate) of 1.0: One U.S. dollar always equals one U.S. dollar. The unknownCurrencyName is used
when the user enters a value containing a number of units and an exchange rate, but fails to provide the currency name.
We'll use each of these variables in a moment.
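
To make the idea of a normalized value concrete, here is a small stand-alone sketch. It is not part of fcur.c: the fcur_sketch structure uses plain float members in place of float4 so that it compiles without the PostgreSQL headers, and fcur_is_normalized() is a hypothetical helper that I'm using only to illustrate the definition.

```c
#include <string.h>

/* Stand-alone copy of the internal form (float stands in for float4) */
typedef struct
{
    char  fcur_name[4];   /* Currency name     */
    float fcur_units;     /* Units of currency */
    float fcur_xrate;     /* Exchange rate     */
} fcur_sketch;

static const char *baseCurrencyName = "US$";

/* Hypothetical helper: a value is normalized when its currency
** name matches the base currency name. */
static int fcur_is_normalized(const fcur_sketch *value)
{
    return strncmp(value->fcur_name,
                   baseCurrencyName,
                   sizeof(value->fcur_name)) == 0;
}
```

A value such as { "US$", 3.0, 1.0 } is normalized; a value such as { "GPB", 5.0, 0.687853 } is not.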
Defining the Input and Output Functions in C

Now you will create the input and output functions for this data type. At this point, you have to decide what your external
form will look like. You know that you need to deal with three components: the number of units, an optional exchange rate,
and an optional currency name. You want the typical case (units only) to be easy to enter, so you will accept input in any of
the following forms:

units
units(exchange-rate)
units(exchange-rate/currency-name)

If you see a number (and nothing else), assume that you have a number of units of the base currency. If you see a number
followed by an open parenthesis, you will expect an exchange rate to follow. If the exchange rate is followed by a slash
character, expect a currency name. Of course, we expect a closing parenthesis if we see an open one.

Table 6.1 shows a few valid FCUR external values (assuming that baseCurrencyName is "US$"):

Table 6.1. Sample FCUR Values (in External Form)

External Form         Meaning

'1'                   1 U.S. dollar
'1(.5)'               1 unit of unknownCurrencyName with an exchange rate of 0.5
'3(1/US$)'            3 U.S. dollars
'5(.687853/GPB)'      5 British pounds with an exchange rate of .687853 pounds per 1 U.S. dollar
'10(7.2566/FRF)'      10 French francs with an exchange rate of 7.2566 francs per 1 U.S. dollar
'1.52(1.5702/CA$)'    1.52 Canadian dollars with an exchange rate of 1.5702 Canadian dollars per 1 U.S. dollar

The input function is named fcur_in (see Listing 6.10), and it converts from external (FCUR) form to internal (fcur) form.
This function expects a single parameter: a pointer to a null-terminated string containing the external form of an fcur value.

Listing 6.10. fcur.c (Part 2)


 18  /*
 19  ** Name: fcur_in()
 20  **
 21  ** Converts an fcur value from external form
 22  ** to internal form.
 23  */
 24
 25  PG_FUNCTION_INFO_V1(fcur_in);
 26
 27  Datum fcur_in(PG_FUNCTION_ARGS)
 28  {
 29    char * src     = PG_GETARG_CSTRING(0);
 30    char * workStr = pstrdup( src );
 31    char * units   = NULL;
 32    char * name    = NULL;
 33    char * xrate   = NULL;
 34    fcur * result  = NULL;
 35    char * endPtr  = NULL;
 36
 37    /* strtok() will find all of the components for us */
 38
 39    units = strtok( workStr, "(" );
 40    xrate = strtok( NULL, "/)" );
 41    name  = strtok( NULL, ")" );
 42
 43    result = (fcur *)palloc( sizeof( fcur ));
 44
 45    memset( result, 0x00, sizeof( fcur ));
 46
 47    result->fcur_units = strtod( units, &endPtr );
 48
 49    if( xrate )
 50    {
 51      result->fcur_xrate = strtod( xrate, &endPtr );
 52    }
 53    else
 54    {
 55      result->fcur_xrate = 1.0;
 56    }
 57
 58    if( name )
 59    {
 60      strncpy( result->fcur_name,
 61               name,
 62               sizeof( result->fcur_name ) - 1 ); /* keep the terminator */
 63    }
 64    else
 65    {
 66      strncpy( result->fcur_name,
 67               unknownCurrencyName,
 68               sizeof( result->fcur_name ) - 1 );
 69    }
 70
 71    PG_RETURN_POINTER( result );
 72  }
 73

Notice that this looks suspiciously similar to the extension functions you saw earlier in this chapter. In particular, fcur_in()
returns a Datum and uses PG_FUNCTION_ARGS to declare the parameter list. This similarity exists because fcur_in() is an
extension function, so everything that you already know about writing extension functions applies to this discussion as well.

You use the strtok() function (from the C Runtime Library) to parse out the external form. strtok() is a destructive
function; it modifies the string that you pass to it. So the first thing you need to do in this function is to make a copy of the
input string. Use the pstrdup() function to make the copy. pstrdup() is similar to the strdup() function from the C
Runtime Library, except that the memory that holds the copy is allocated using palloc() and must be freed using pfree().
You use pstrdup() to avoid any memory leaks should you forget to clean up after yourself.

Lines 39, 40, and 41 parse the input string into three components. Remember, you will accept input strings in any of the
following forms:

units
units(exchange-rate)
units(exchange-rate/currency-name)

The units component must be a string representing a floating-point number. You will use the strtod() runtime function to
convert units into a float4, so the format of the input string must meet the requirements of strtod(). Here is an excerpt
from the Linux strtod() man page that describes the required form:

The expected form of the string is optional leading white
space as checked by isspace(3), an optional plus (``+'')
or minus sign (``-'') followed by a sequence of digits
optionally containing a decimal-point character, optionally
followed by an exponent. An exponent consists of an
``E'' or ``e'', followed by an optional plus or minus
sign, followed by a non-empty sequence of digits. If the
locale is not "C" or "POSIX", different formats may be
used.

The optional exchange-rate component is also converted to a float4 by strtod().

The currency-name component is simply a three-character string. Values such as "US$" (U.S. dollar), "GPB" (British pound),
and "CA$" (Canadian dollar) seem reasonable. In your sample data type, you won't do any validation on this string. In a
real-world implementation, you would probably want to match the currency name with a table of valid (and standardized)
spellings.

The first call to strtok() returns a null-terminated string containing all characters up to (but not including) the first ( in
workStr. If workStr doesn't contain a ( character, units will contain the entire input string. The second call to strtok()
picks out the optional exchange-rate component. The final call to strtok() picks out the optional currency-name.
After you have tokenized the input string into units, exchange rate, and currency name, you can allocate space for the
internal form at line 43. Notice that palloc() is used here.

The rest of this function is pretty simple. You use strtod() to convert the units and exchange rate into the fcur structure. If
the user didn't provide you with an exchange rate, assume that it must be 1.0. You finish building the fcur structure by
copying in the first three characters of the currency name, or unknownCurrencyName if you didn't find a currency name in the
input string.
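
If you would like to experiment with the parsing logic without building a server extension, the following stand-alone sketch follows the same three strtok() calls. It is an approximation, not the code from fcur.c: parsed_fcur and parse_fcur() are hypothetical names, malloc() and free() stand in for pstrdup() and pfree(), float stands in for float4, and I've added one check (an empty input string) that fcur_in() doesn't perform.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-alone copy of the internal form (hypothetical name) */
typedef struct
{
    char  name[4];   /* Currency name, null-terminated */
    float units;     /* Units of currency              */
    float xrate;     /* Exchange rate                  */
} parsed_fcur;

/* Hypothetical helper: parse units, units(xrate), or
** units(xrate/name). Returns 0 on success, -1 on failure. */
static int parse_fcur(const char *src, parsed_fcur *out)
{
    size_t  len = strlen(src);
    char   *workStr;
    char   *units;
    char   *xrate;
    char   *name;

    /* strtok() modifies its argument, so work on a private copy */
    workStr = malloc(len + 1);

    if (workStr == NULL)
        return -1;

    memcpy(workStr, src, len + 1);

    units = strtok(workStr, "(");
    xrate = strtok(NULL, "/)");
    name  = strtok(NULL, ")");

    if (units == NULL)
    {
        free(workStr);
        return -1;
    }

    memset(out, 0x00, sizeof(*out));

    out->units = (float) strtod(units, NULL);
    out->xrate = xrate ? (float) strtod(xrate, NULL) : 1.0f;

    /* Copy at most three characters; memset() supplied the terminator */
    strncpy(out->name, name ? name : "???", sizeof(out->name) - 1);

    free(workStr);
    return 0;
}
```

Feeding this function the string 5(.687853/GPB) yields 5 units, an exchange rate of .687853, and the name GPB; the string 1(.5) yields the name ??? because no currency name follows the exchange rate.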

Line 71 returns the Datum to the caller.

That's pretty simple! Of course, I omitted all the error-checking code that you would need in a real-world application.

Now, let's look at the output function. fcur_out(), shown in Listing 6.11, converts an fcur structure from internal to
external form.

Listing 6.11. fcur.c (Part 3)


 74  /*
 75  ** Name: fcur_out()
 76  **
 77  ** Converts an fcur value from internal form
 78  ** to external form.
 79  */
 80
 81  PG_FUNCTION_INFO_V1(fcur_out);
 82
 83  Datum fcur_out(PG_FUNCTION_ARGS)
 84  {
 85    fcur * src = (fcur *)PG_GETARG_POINTER( 0 );
 86    char * result;
 87    char   work[16+sizeof(src->fcur_name)+16+4];
 88
 89    sprintf( work, "%g(%g/%s)",
 90             src->fcur_units,
 91             src->fcur_xrate,
 92             src->fcur_name );
 93
 94    result = (char *)palloc( strlen( work ) + 1 );
 95
 96    strcpy( result, work );
 97
 98    PG_RETURN_CSTRING( result );
 99
100  }
101

This function is much shorter than the input function. That's typically the case because your code has far fewer decisions to
make.

You format the fcur components into a work buffer at lines 89 through 92: sprintf() takes care of all the grunt work.
Notice that you are formatting into an array of characters large enough to hold the largest result that you can expect (two
16-digit numbers, a currency name, two parentheses, a slash, and a null terminator). Some of you might not like using a
fixed-size buffer with sprintf(); use snprintf() if you have it and you are worried about buffer overflows.

After you have a formatted string, use palloc() to allocate the result string. (In case you were wondering, you format into a
temporary buffer first so that you can allocate a result string of the minimum possible size.) At line 96, you copy the
temporary string into the result string and then return that string at line 98.
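
For those of you who prefer snprintf(), here is one way the formatting step might look outside of the server. Treat it as a sketch rather than a replacement for Listing 6.11: format_fcur() is a hypothetical helper, malloc() stands in for palloc(), and the 64-byte work buffer is simply a size I picked that is comfortably larger than anything the %g conversions will produce.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
** Hypothetical helper: build the external form units(xrate/name)
** in a right-sized, malloc()'d string. Returns NULL if the result
** would not fit in the work buffer or if the allocation fails.
** The caller is responsible for calling free().
*/
static char *format_fcur(float units, float xrate, const char *name)
{
    char  work[64];
    char *result;
    int   needed;

    /* snprintf() never writes more than sizeof(work) bytes and
    ** reports how many characters the full result requires */
    needed = snprintf(work, sizeof(work), "%g(%g/%s)", units, xrate, name);

    if (needed < 0 || (size_t) needed >= sizeof(work))
        return NULL;

    result = malloc((size_t) needed + 1);

    if (result == NULL)
        return NULL;

    memcpy(result, work, (size_t) needed + 1);

    return result;
}
```

Because snprintf() reports the length that it needed, you can detect a truncated result instead of silently writing past the end of the buffer.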

I should point out an important consideration about the input and output functions that you have just written. It's very
important that the format of the string produced by the output function match the format understood by the input function.
When you back up a table using pg_dump, the archive contains the external form of each column. When you restore from the
archive, the data must be converted from external form to internal form. If the two formats don't match, you won't be able
to restore your data.
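
One inexpensive way to guard against a mismatch is a round-trip check: format a value, parse the result, format it again, and compare the two strings. The sketch below shows the idea in stand-alone C. It is not taken from fcur.c: round_trips() is a hypothetical helper, and it parses with sscanf() rather than strtok() just to keep the example short, so it only understands the full units(xrate/name) form.

```c
#include <stdio.h>
#include <string.h>

/*
** Hypothetical helper: returns 1 if the external form of the given
** components survives a trip through a parser and back, 0 otherwise.
*/
static int round_trips(float units, float xrate, const char *name)
{
    char  external[64];
    char  again[64];
    char  parsedName[4] = "";
    float parsedUnits   = 0.0f;
    float parsedXrate   = 0.0f;

    /* Internal to external (the output direction) */
    snprintf(external, sizeof(external), "%g(%g/%s)", units, xrate, name);

    /* External to internal (the input direction); the name is
    ** limited to three characters, like fcur_name */
    if (sscanf(external, "%f(%f/%3[^)])",
               &parsedUnits, &parsedXrate, parsedName) != 3)
        return 0;

    /* Format the parsed value again and compare the two strings */
    snprintf(again, sizeof(again), "%g(%g/%s)",
             parsedUnits, parsedXrate, parsedName);

    return strcmp(external, again) == 0;
}
```

A three-character name such as GPB survives the trip. A four-character name does not, because the input side keeps only three characters; that is exactly the kind of mismatch that would bite you when you restore from an archive.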
Defining the Input and Output Functions in PostgreSQL

Now that you have created the input (external to internal) and output (internal to external) functions in C, you must compile
them into a shared object module:

$ make -f makefile fcur.so

Next, create a symbolic link between fcur.so and PostgreSQL's preferred package directory so that PostgreSQL knows how
to find your code:

$ ln -s `pwd`/fcur.so `pg_config --pkglibdir`

Now you can define t he input and out put funct ions in Post greSQL:

movies=# CREATE OR REPLACE FUNCTION fcur_in( opaque )
movies-# RETURNS opaque
movies-# AS 'fcur.so' LANGUAGE 'C'
movies-# IMMUTABLE STRICT;
CREATE
movies=# CREATE OR REPLACE FUNCTION fcur_out( opaque )
movies-# RETURNS opaque
movies-# AS 'fcur.so' LANGUAGE 'C'
movies-# IMMUTABLE STRICT;

Notice that each of these functions expects an opaque parameter and returns an opaque value. You might be thinking that
fcur_in() should take a null-terminated string and return a FCUR. That makes sense except for two minor problems:
PostgreSQL doesn't have a SQL data type that represents a null-terminated string and PostgreSQL doesn't know anything
about the FCUR data type yet. Okay, those aren't exactly minor problems. PostgreSQL helps you out a little here by letting
you define these functions in terms of opaque. The opaque data type tells PostgreSQL that a SQL data type doesn't define the
data that you are working with. One of the special properties of an opaque function is that you can't call it directly:

movies=# SELECT fcur_in( '5(1.3/GPB)' );


ERROR: getTypeOutputInfo: Cache lookup of type 0 failed

This error message means, "don't try that again."

We've defined each of these functions with two additional attributes. The IMMUTABLE attribute tells PostgreSQL that calling
this function twice with the same argument(s) is guaranteed to return the same result. If PostgreSQL knows that a function
is IMMUTABLE, it can optimize certain operations by computing the return value once and caching the result. The STRICT
attribute tells PostgreSQL that the function returns NULL whenever any of its arguments is NULL, so PostgreSQL doesn't
bother calling a STRICT function with a NULL argument.

Defining the Data Type in PostgreSQL

At this point, PostgreSQL knows about your input and output functions. Now you can tell PostgreSQL about your data type:

CREATE TYPE FCUR ( INPUT=fcur_in, OUTPUT=fcur_out, INTERNALLENGTH=12 );

This command creates a new data type (how exciting) named FCUR. The input function is named fcur_in, and the output function is named
fcur_out. The INTERNALLENGTH=12 clause tells PostgreSQL how much space is required to hold the internal value. I computed this value by
hand: just add up the size of each member of the fcur structure and be sure that you account for any pad bytes. The safest way to compute the
INTERNALLENGTH is to use your C compiler's sizeof() operator.
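
As a rough illustration of the difference between adding up member sizes by hand and asking the compiler, here is a small standalone sketch. The fcur_sketch layout (two float members and a four-byte name) is an assumption made for this example, and the three helper functions are hypothetical; substitute the real fcur structure from fcur.c when you try this yourself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; the real fcur structure lives in fcur.c. */
typedef struct
{
    float fcur_units;
    float fcur_xrate;
    char  fcur_name[4];
} fcur_sketch;

/* Sum of the member sizes: what you get by adding them up by hand. */
size_t fcur_member_bytes( void )
{
    return( sizeof( float ) + sizeof( float ) + sizeof( char[4] ));
}

/* What the compiler actually reserves, including any pad bytes. */
size_t fcur_internal_length( void )
{
    return( sizeof( fcur_sketch ));
}

/* Pad bytes added by the compiler (zero if the members pack tightly). */
size_t fcur_pad_bytes( void )
{
    size_t total   = fcur_internal_length();
    size_t members = fcur_member_bytes();

    assert( total >= members );

    return( total - members );
}
```

If fcur_pad_bytes() ever reports a nonzero value on your platform, the hand-computed total is too small and the value returned by fcur_internal_length() is the one that belongs in the INTERNALLENGTH clause.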

Let's create a table that uses this data type and insert a few values:

movies=# CREATE TABLE fcur_test( pkey INT, val FCUR );


CREATE
movies=# INSERT INTO fcur_test VALUES( 1, '1' );
INSERT
movies=# INSERT INTO fcur_test VALUES( 2, '1(.5)' );
INSERT
movies=# INSERT INTO fcur_test VALUES( 3, '3(1/US$)' );
INSERT
movies=# INSERT INTO fcur_test VALUES( 4, '5(.687853/GBP)' );
INSERT
movies=# INSERT INTO fcur_test VALUES( 5, '10(7.2566/FRF)' );
INSERT
movies=# INSERT INTO fcur_test VALUES( 6, '1(1.5702/CA$)' );
INSERT
movies=# INSERT INTO fcur_test VALUES( 7, '1.5702(1.5702/CA$)' );
INSERT

Now let's see what those values look like when you retrieve them:

movies=# SELECT * FROM fcur_test;


pkey | val
------+--------------------
1 | 1(1/???)
2 | 1(0.5/???)
3 | 3(1/US$)
4 | 5(0.687853/GBP)
5 | 10(7.2566/FRF)
6 | 1(1.5702/CA$)
7 | 1.5702(1.5702/CA$)

Not bad. The question marks are kind of ugly, but the data that you put in came back out.

At this point, you officially have a new data type. You can put values in and you can get values out. Let's add a few functions that make the FCUR
type a little more useful.

It would be nice to know if two FCUR values represent the same amount of money expressed in your local currency. In other words, you want a
function, fcur_eq, which you can call like this:

movies=# SELECT fcur_eq( '1', '1.5702(1.5702/CA$)' );


fcur_eq
---------
t
(1 row)

movies=# SELECT fcur_eq( '1', '3(1.5702/CA$)' );


fcur_eq
---------
f
(1 row)

The first call to fcur_eq tells you that 1.5702 Canadian dollars is equal to 1 U.S. dollar. The second call tells you that 3 Canadian dollars are not
equal to 1 U.S. dollar.

To compare two FCUR values, you must convert them into a common currency. The normalize() function shown in Listing 6.12 does just that.

Listing 6.12. fcur.c (Part 4)

102 /*
103 ** Name: normalize()
104 **
105 ** Converts an fcur value into a normalized
106 ** double by applying the exchange rate.
107 */
108
109 static double normalize( fcur * src )
110 {
111 return( src->fcur_units / src->fcur_xrate );
112 }

The normalize() function converts a given FCUR value into our local currency. You can use normalize() to implement the fcur_eq() function,
shown in Listing 6.13.

Listing 6.13. fcur.c (Part 5)

115 /*
116 ** Name: fcur_eq()
117 **
118 ** Returns true if the two fcur values
119 ** are equal (after normalization), otherwise
120 ** returns false.
121 */
122
123 PG_FUNCTION_INFO_V1(fcur_eq);
124
125 Datum fcur_eq(PG_FUNCTION_ARGS)
126 {
127 fcur * left = (fcur *)PG_GETARG_POINTER(0);
128 fcur * right = (fcur *)PG_GETARG_POINTER(1);
129
130 PG_RETURN_BOOL( normalize( left ) == normalize( right ));
131 }
132

This function is straightforward. You normalize each argument, compare them using the C == operator, and return the result as a BOOL Datum.
You declare this function as STRICT so that you don't have to check for NULL arguments.
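
To see the normalize-then-compare idea outside of the server, consider the following standalone sketch. It does not use the PostgreSQL function-call macros; fcur_sketch, normalize_sketch(), and fcur_eq_sketch() are hypothetical names, and the structure layout is assumed for the sake of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the chapter's fcur structure (layout assumed). */
typedef struct
{
    float fcur_units;
    float fcur_xrate;
    char  fcur_name[4];
} fcur_sketch;

/* Divide the units by the exchange rate to get a local-currency amount. */
double normalize_sketch( const fcur_sketch * src )
{
    assert( src != NULL );

    return( (double)src->fcur_units / (double)src->fcur_xrate );
}

/* Two values are equal if their normalized amounts are exactly equal. */
bool fcur_eq_sketch( const fcur_sketch * left, const fcur_sketch * right )
{
    return( normalize_sketch( left ) == normalize_sketch( right ));
}
```

With this sketch, 1.5702 units at a rate of 1.5702 normalizes to exactly 1, so it compares equal to 1 unit at a rate of 1. Keep in mind that == on floating-point values is an exact comparison; two amounts that look the same when printed can still differ in their last few bits.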

Now you can compile your code again and tell PostgreSQL about your new function (fcur_eq()):

$ make -f makefile fcur.so


$ psql -q
movies=# CREATE OR REPLACE FUNCTION fcur_eq( fcur, fcur )
movies-# RETURNS bool
movies-# AS 'fcur.so' LANGUAGE 'C'
movies-# IMMUTABLE STRICT;

Now you can call this function to compare any two FCUR values:

movies=# SELECT fcur_eq( '1', '1.5702(1.5702/CA$)' );


fcur_eq
---------
t
(1 row)

movies=# SELECT fcur_eq( '1', NULL );


fcur_eq
---------

(1 row)

The fcur_eq function is nice, but you really want to compare FCUR values using the = operator. Fortunately, that's easy to do:

movies=# CREATE OPERATOR =


movies-# (
movies-# leftarg = FCUR,
movies-# rightarg = FCUR,
movies-# procedure = fcur_eq
movies-# );

This command creates a new operator named =. This operator has a FCUR value on the left side and a FCUR value on the right side. PostgreSQL
calls the fcur_eq function whenever it needs to evaluate this operator.

Now you can evaluate expressions such as

movies=# SELECT * FROM fcur_test WHERE val = '1';


pkey | val
------+--------------------
1 | 1(1/???)
7 | 1.5702(1.5702/CA$)
(2 rows)

The operator syntax is much easier to read than the functional syntax. Let's go ahead and add the other comparison operators: <>, <, <=, >, and
>= (see Listing 6.14). They all follow the same pattern as the = operator: You normalize both arguments and then compare them as double
values.

Listing 6.14. fcur.c (Part 6)

133 /*
134 ** Name: fcur_ne()
135 **
136 ** Returns true if the two fcur values
137 ** are not equal (after normalization),
138 ** otherwise returns false.
139 */
140
141 PG_FUNCTION_INFO_V1(fcur_ne);
142
143 Datum fcur_ne(PG_FUNCTION_ARGS)
144 {
145 fcur * left = (fcur *)PG_GETARG_POINTER(0);
146 fcur * right = (fcur *)PG_GETARG_POINTER(1);
147
148 PG_RETURN_BOOL( normalize( left ) != normalize( right ));
149 }
150
151 /*
152 ** Name: fcur_lt()
153 **
154 ** Returns true if the left operand
155 ** is less than the right operand.
156 */
157
158 PG_FUNCTION_INFO_V1(fcur_lt);
159
160 Datum fcur_lt(PG_FUNCTION_ARGS)
161 {
162 fcur * left = (fcur *)PG_GETARG_POINTER(0);
163 fcur * right = (fcur *)PG_GETARG_POINTER(1);
164
165 PG_RETURN_BOOL( normalize( left ) < normalize( right ));
166 }
167
168 /*
169 ** Name: fcur_le()
170 **
171 ** Returns true if the left operand
172 ** is less than or equal to the right
173 ** operand.
174 */
175
176 PG_FUNCTION_INFO_V1(fcur_le);
177
178 Datum fcur_le(PG_FUNCTION_ARGS)
179 {
180 fcur * left = (fcur *)PG_GETARG_POINTER(0);
181 fcur * right = (fcur *)PG_GETARG_POINTER(1);
182
183 PG_RETURN_BOOL( normalize( left ) <= normalize( right ));
184 }
185
186 /*
187 ** Name: fcur_gt()
188 **
189 ** Returns true if the left operand
190 ** is greater than the right operand.
191 */
192
193 PG_FUNCTION_INFO_V1(fcur_gt);
194
195 Datum fcur_gt(PG_FUNCTION_ARGS)
196 {
197 fcur * left = (fcur *)PG_GETARG_POINTER(0);
198 fcur * right = (fcur *)PG_GETARG_POINTER(1);
199
200 PG_RETURN_BOOL( normalize( left ) > normalize( right ));
201 }
202
203 /*
204 ** Name: fcur_ge()
205 **
206 ** Returns true if the left operand
207 ** is greater than or equal to the right operand.
208 */
209
210 PG_FUNCTION_INFO_V1(fcur_ge);
211
212 Datum fcur_ge(PG_FUNCTION_ARGS)
213 {
214 fcur * left = (fcur *)PG_GETARG_POINTER(0);
215 fcur * right = (fcur *)PG_GETARG_POINTER(1);
216
217 PG_RETURN_BOOL( normalize( left ) >= normalize( right ));
218 }

Now you can tell PostgreSQL about these functions:

movies=# CREATE OR REPLACE FUNCTION fcur_ne( fcur, fcur )
movies-# RETURNS boolean
movies-# AS 'fcur.so' LANGUAGE 'C'
movies-# IMMUTABLE STRICT;
CREATE
movies=# CREATE OR REPLACE FUNCTION fcur_lt( fcur, fcur )
movies-# RETURNS boolean
movies-# AS 'fcur.so' LANGUAGE 'C'
movies-# IMMUTABLE STRICT;
CREATE
movies=# CREATE OR REPLACE FUNCTION fcur_le( fcur, fcur )
movies-# RETURNS boolean
movies-# AS 'fcur.so' LANGUAGE 'C'
movies-# IMMUTABLE STRICT;
CREATE
movies=# CREATE OR REPLACE FUNCTION fcur_gt( fcur, fcur )
movies-# RETURNS boolean
movies-# AS 'fcur.so' LANGUAGE 'C'
movies-# IMMUTABLE STRICT;
CREATE
movies=# CREATE OR REPLACE FUNCTION fcur_ge( fcur, fcur )
movies-# RETURNS boolean
movies-# AS 'fcur.so' LANGUAGE 'C'
movies-# IMMUTABLE STRICT;
CREATE

And you can turn each of these functions into an operator:

movies=# CREATE OPERATOR <>


movies-# (
movies-# leftarg = fcur,
movies-# rightarg = fcur,
movies-# procedure = fcur_ne,
movies-# commutator = <>
movies-# );
CREATE

movies=# CREATE OPERATOR <


movies-# (
movies-# leftarg = fcur,
movies-# rightarg = fcur,
movies-# procedure = fcur_lt,
movies-# commutator = >
movies-# );
CREATE

movies=# CREATE OPERATOR <=


movies-# (
movies-# leftarg = fcur,
movies-# rightarg = fcur,
movies-# procedure = fcur_le,
movies-# commutator = >=
movies-# );
CREATE

movies=# CREATE OPERATOR >


movies-# (
movies-# leftarg = fcur,
movies-# rightarg = fcur,
movies-# procedure = fcur_gt,
movies-# commutator = <
movies-# );
CREATE

movies=# CREATE OPERATOR >=


movies-# (
movies-# leftarg = fcur,
movies-# rightarg = fcur,
movies-# procedure = fcur_ge,
movies-# commutator = <=
movies-# );
CREATE

Notice that there is a commutator for each of these operators. The commutator can help PostgreSQL optimize queries that involve the operator.

For example, let's say that you have an index that covers the balance column. With a commutator, the query

SELECT * FROM customers WHERE balance > 10 and new_balance > balance;

can be rewritten as

SELECT * FROM customers WHERE balance > 10 and balance < new_balance;

This allows PostgreSQL to perform a range scan using the balance index. The commutator for an operator is the operator that PostgreSQL can
use to swap the order of the operands. For example, > is the commutator for < because if x > y, y < x. Likewise, < is the commutator for >.
Some operators are commutators for themselves. For example, the = operator is a commutator for itself. If x = y is true, then y = x is also true.

There are other optimizer hints that you can associate with an operator. See the CREATE OPERATOR section of the PostgreSQL Reference Manual
for more information.

I'll finish up this chapter by defining one more operator (addition) and two functions that extend the usefulness of the FCUR data type.

First, let's look at a function that adds two FCUR values (see Listing 6.15):

Listing 6.15. fcur.c (Part 7)

259 /*
260 ** Name: fcur_add()
261 **
262 ** Adds two fcur values, returning the result
263 ** If the operands are expressed in the same
264 ** currency (and exchange rate), the result
265 ** will be expressed in that currency,
266 ** otherwise, the result will be in normalized
267 ** form.
268 */
269
270 PG_FUNCTION_INFO_V1(fcur_add);
271
272 Datum fcur_add(PG_FUNCTION_ARGS)
273 {
274 fcur * left = (fcur *)PG_GETARG_POINTER(0);
275 fcur * right = (fcur *)PG_GETARG_POINTER(1);
276 fcur * result;
277
278 result = (fcur *)palloc( sizeof( fcur ));
279
280 if( left->fcur_xrate == right->fcur_xrate )
281 {
282 if( strcmp( left->fcur_name, right->fcur_name ) == 0 )
283 {
284 /*
285 ** The two operands have a common currency - preserve
286 ** that currency by constructing a new fcur with the
287 ** same currency type.
288 */
289 result->fcur_xrate = left->fcur_xrate;
290 result->fcur_units = left->fcur_units + right->fcur_units;
291 strcpy( result->fcur_name, left->fcur_name );
292
293 PG_RETURN_POINTER( result );
294 }
295 }
296
297 result->fcur_xrate = 1.0;
298 result->fcur_units = normalize( left ) + normalize( right );
299 strcpy( result->fcur_name, baseCurrencyName );
300
301 PG_RETURN_POINTER( result );
302
303 }
304

This function returns a FCUR datum; at line 278, we use palloc() to allocate the return value. fcur_add() has a nice feature: If the two
operands have a common currency and a common exchange rate, the result is expressed in that currency. If the operands are not expressed in a
common currency, the result will be a value in local currency.

Lines 289 through 291 construct the result in a case where the operand currencies are compatible. If the currencies are not compatible, the
result is constructed at lines 297 through 299.
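
The two paths are easier to follow when you can run them outside of the server. The sketch below mirrors the logic of fcur_add() under a few stated assumptions: fcur_sketch is a hypothetical stand-in for the fcur structure, the base currency name is assumed to be US$ (as the sample output suggests), and the result is returned by value rather than allocated with palloc().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BASE_CURRENCY_NAME "US$"   /* assumption: the local currency */

/* Hypothetical stand-in for the chapter's fcur structure (layout assumed). */
typedef struct
{
    float fcur_units;
    float fcur_xrate;
    char  fcur_name[4];
} fcur_sketch;

double normalize_sketch( const fcur_sketch * src )
{
    return( (double)src->fcur_units / (double)src->fcur_xrate );
}

fcur_sketch fcur_add_sketch( const fcur_sketch * left, const fcur_sketch * right )
{
    fcur_sketch result;

    assert( left != NULL && right != NULL );

    if( left->fcur_xrate == right->fcur_xrate &&
        strcmp( left->fcur_name, right->fcur_name ) == 0 )
    {
        /* Same currency and rate: keep the result in that currency. */
        result.fcur_xrate = left->fcur_xrate;
        result.fcur_units = left->fcur_units + right->fcur_units;
        strcpy( result.fcur_name, left->fcur_name );

        return( result );
    }

    /* Otherwise, add the normalized amounts in the local currency. */
    result.fcur_xrate = 1.0f;
    result.fcur_units = (float)( normalize_sketch( left ) + normalize_sketch( right ));
    strcpy( result.fcur_name, BASE_CURRENCY_NAME );

    return( result );
}
```

Adding 1 CA$ and 2 CA$ at the same rate takes the first path and yields 3 CA$; adding 3 US$ and 2 CA$ takes the second path and yields a value of roughly 4.27 expressed in the local currency.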

Let's tell PostgreSQL about this function and make an operator (+) out of it:

movies=# CREATE OR REPLACE FUNCTION fcur_add( fcur, fcur )


movies-# RETURNS fcur
movies-# AS 'fcur.so' LANGUAGE 'C'
movies-# IMMUTABLE STRICT;
CREATE
movies=# CREATE OPERATOR +
movies-# (
movies-# leftarg = fcur,
movies-# rightarg = fcur,
movies-# procedure = fcur_add,
movies-# commutator = +
movies-# );
CREATE

Now, try it:

movies=# SELECT *, val + '2(1.5702/CA$)' AS result FROM fcur_test;


pkey | val | result
------+--------------------+--------------------
1 | 1(1/???) | 2.27372(1/US$)
2 | 1(0.5/???) | 3.27372(1/US$)
3 | 3(1/US$) | 4.27372(1/US$)
4 | 5(0.687853/GBP) | 8.54272(1/US$)
5 | 10(7.2566/FRF) | 2.65178(1/US$)
6 | 1(1.5702/CA$) | 3(1.5702/CA$)
7 | 1.5702(1.5702/CA$) | 3.5702(1.5702/CA$)
(7 rows)

Notice that the result values for rows 6 and 7 are expressed in Canadian dollars.

Creating other arithmetic operators for the FCUR type is simple. If the operands share a common currency (and exchange rate), the result should
be expressed in that currency. I'll let you add the rest of the arithmetic operators.

The last two functions that I wanted to show you (see Listing 6.16) will convert FCUR values to and from REAL values. Internally, the REAL data
type is known as a float4.

Listing 6.16. fcur.c (Part 8)

220 /*
221 ** Name: fcur_to_float4()
222 **
223 ** Converts the given fcur value into a
224 ** normalized float4.
225 */
226
227 PG_FUNCTION_INFO_V1(fcur_to_float4);
228
229 Datum fcur_to_float4(PG_FUNCTION_ARGS)
230 {
231 fcur * src = (fcur *)PG_GETARG_POINTER(0);
232
233 PG_RETURN_FLOAT4( normalize( src ));
234
235 }

The fcur_to_float4() function converts an FCUR value into a normalized FLOAT4 (that is, REAL) value. There isn't anything fancy in this
function; you let normalize() do the heavy lifting.

Listing 6.17 shows the float4_to_fcur() function:

Listing 6.17. fcur.c (Part 9)

237 /*
238 ** Name: float4_to_fcur()
239 **
240 ** Converts the given float4 value into an
241 ** fcur value
242 */
243
244 PG_FUNCTION_INFO_V1(float4_to_fcur);
245
246 Datum float4_to_fcur(PG_FUNCTION_ARGS)
247 {
248 float4 src = PG_GETARG_FLOAT4(0);
249 fcur * result = (fcur *)palloc( sizeof( fcur ));
250
251 result->fcur_units = src;
252 result->fcur_xrate = 1.0;
253
254 strcpy( result->fcur_name, baseCurrencyName );
255
256 PG_RETURN_POINTER( result );
257 }

The float4_to_fcur() function is a bit longer, but it's not complex. You allocate space for the result using palloc(); then create the result
as a value expressed in your local currency.

When you tell PostgreSQL about these functions, you won't follow the same form that you have used in earlier examples:

movies=# CREATE OR REPLACE FUNCTION FCUR( FLOAT4 )
movies-# RETURNS FCUR
movies-# AS 'fcur.so','float4_to_fcur'
movies-# LANGUAGE 'C'
movies-# IMMUTABLE STRICT;
CREATE

Notice that the internal (C) name for this function is float4_to_fcur(), but the external (PostgreSQL) name is FCUR. Older versions of
PostgreSQL (release 7.2 or older) know that the FCUR function can be used to implicitly convert a FLOAT4 (or REAL) value into a FCUR value.
In those versions, PostgreSQL considers a function to be a conversion function if all of the following are true:

• The name of the function is the same as the name of a data type.

• The function returns a value whose type is the same as the function's name.

• The function takes a single argument of some other data type.

You can see that the FCUR function meets these criteria. Let's create the FLOAT4 function along the same pattern:

movies=# CREATE OR REPLACE FUNCTION FLOAT4( FCUR )


movies-# RETURNS FLOAT4
movies-# AS 'fcur.so','fcur_to_float4'
movies-# LANGUAGE 'C'
movies-# IMMUTABLE STRICT;
CREATE

If you're using PostgreSQL version 7.3 or later, you must explicitly tell the PostgreSQL server that FLOAT4( FCUR ) and FCUR( FLOAT4 ) are
conversion functions. To create a CAST that will convert a FLOAT4 value to an FCUR value, execute the following command:

movies=# CREATE CAST( FLOAT4 AS FCUR ) WITH FUNCTION FCUR( float4 ) AS IMPLICIT;

The CREATE CAST command specifies the source type (FLOAT4), the target type (FCUR), and the signature of the conversion function
(FCUR( float4 )). The AS IMPLICIT clause tells PostgreSQL that it can silently convert FLOAT4 values to FCUR values whenever it needs to; you
don't have to write things like CAST( 4.0 AS FCUR ) once you've defined an IMPLICIT CAST.

Don't forget to create a CAST that will convert values in the other direction:

movies=# CREATE CAST( FCUR AS FLOAT4) WITH FUNCTION FLOAT4( FCUR ) AS IMPLICIT;

Now PostgreSQL knows how to (implicitly) convert between FLOAT4 values and FCUR values. Why is that so important? You can now use a FCUR
value in any context in which a FLOAT4 value is allowed. If you haven't defined a particular function (or operator), PostgreSQL will implicitly
convert the FCUR value into a FLOAT4 value and then choose the appropriate function (or operator).

For example, you have not defined a multiplication operator for your FCUR data type, but PostgreSQL knows how to multiply FLOAT4 values:

movies=# SELECT *, (val * 5) as "Result" FROM fcur_test;


pkey | val | Result
------+--------------------+------------------
1 | 1(1/???) | 5
2 | 1(0.5/???) | 10
3 | 3(1/US$) | 15
4 | 5(0.687853/GBP) | 36.3449764251709
5 | 10(7.2566/FRF) | 6.89027905464172
6 | 1(1.5702/CA$) | 3.18430781364441
7 | 1.5702(1.5702/CA$) | 5

You can now multiply FCUR values. Notice that the Result column does not contain FCUR values. PostgreSQL converted the FCUR values into
FLOAT4 values and then performed the multiplication. Of course, you can cast the result back to FCUR form. Here, we use the abs() (absolute
value) function to convert from FCUR to FLOAT4 form and then cast the result back into FCUR form:

movies=# SELECT *, CAST( abs(val) AS FCUR ) FROM fcur_test;


pkey | val | fcur
------+--------------------+-----------------
1 | 1(1/???) | 1(1/US$)
2 | 1(0.5/???) | 2(1/US$)
3 | 3(1/US$) | 3(1/US$)
4 | 5(0.687853/GBP) | 7.269(1/US$)
5 | 10(7.2566/FRF) | 1.37806(1/US$)
6 | 1(1.5702/CA$) | 0.636862(1/US$)
7 | 1.5702(1.5702/CA$) | 1(1/US$)
(7 rows)

Notice that all the result values have been normalized into your local currency.

Indexing Custom Data Types

At this point, you have a reasonably complete custom data type. You can create and display FCUR values, store them in a
table, compare two FCUR values, and convert them to (and from) other data types. But you're missing one important feature:
You can't create an index that includes an FCUR value. Once you have all of the comparison operators (<, <=, =, >=, and so
on) in place, you are two short steps away.

To index values of a given data type, you must create an operator class that tells PostgreSQL which operators it should use
for that type. You may recall from Chapter 3, "PostgreSQL Syntax and Use," that PostgreSQL supports a number of index
types (B-tree, hash, R-tree, and GiST). Each index type requires a different set of operators. For example, to build a B-tree
index, PostgreSQL can make use of five different operators: <, <=, =, >=, and >. Before you can create an operator class that
PostgreSQL can use to build B-tree indexes over FCUR values, you'll need one more function.

Listing 6.18 shows the fcur_cmp() function that compares two FCUR values:

Listing 6.18. fcur.c (Part 10)

305 PG_FUNCTION_INFO_V1(fcur_cmp);
306
307 Datum fcur_cmp(PG_FUNCTION_ARGS)
308 {
309 fcur * left = (fcur *)PG_GETARG_POINTER(0);
310 fcur * right = (fcur *)PG_GETARG_POINTER(1);
311 double left_dbl = normalize( left );
312 double right_dbl = normalize( right );
313
314 if( left_dbl > right_dbl )
315 PG_RETURN_INT32( 1 );
316 else if( left_dbl < right_dbl )
317 PG_RETURN_INT32 ( -1 );
318 else
319 PG_RETURN_INT32 ( 0 );
320 }

PostgreSQL will call fcur_cmp() repeatedly as it builds a B-tree index. fcur_cmp() expects two arguments, both of type
fcur. After normalizing the values, fcur_cmp() returns +1 if the first argument is greater than the second, -1 if the second
argument is greater than the first, or 0 if the arguments are equal.
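
A three-way comparison of this kind is the same contract that the C library's qsort() expects from its callback, which makes for a convenient way to experiment with the ordering outside of PostgreSQL. The following standalone sketch is only an analogy for what a B-tree build does with fcur_cmp(); fcur_sketch, fcur_cmp_sketch(), and fcur_sort_sketch() are hypothetical names, and the structure layout is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the chapter's fcur structure (layout assumed). */
typedef struct
{
    float fcur_units;
    float fcur_xrate;
    char  fcur_name[4];
} fcur_sketch;

/* Returns +1, -1, or 0 after normalizing both values, like fcur_cmp(). */
int fcur_cmp_sketch( const void * l, const void * r )
{
    const fcur_sketch * left  = l;
    const fcur_sketch * right = r;
    double              left_dbl;
    double              right_dbl;

    assert( left != NULL && right != NULL );

    left_dbl  = (double)left->fcur_units  / (double)left->fcur_xrate;
    right_dbl = (double)right->fcur_units / (double)right->fcur_xrate;

    if( left_dbl > right_dbl )
        return( 1 );
    else if( left_dbl < right_dbl )
        return( -1 );
    else
        return( 0 );
}

/* Sorts an array of values into ascending order of normalized amount. */
void fcur_sort_sketch( fcur_sketch * values, size_t count )
{
    qsort( values, count, sizeof( fcur_sketch ), fcur_cmp_sketch );
}
```

Sorting 3(1/US$), 1(1.5702/CA$), and 1(0.5/???) with this sketch puts the Canadian dollar value first and the 3 U.S. dollar value last, because the order depends on the normalized amounts rather than on the raw unit counts.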

Don't forget to tell PostgreSQL how to find this function:

CREATE OR REPLACE FUNCTION FCUR_CMP( FCUR, FCUR )


RETURNS INT4
AS 'fcur.so'
LANGUAGE 'C'
IMMUTABLE STRICT;

Now you have all of the pieces in place; you can create an operator class with the following command:

CREATE OPERATOR CLASS fcur_ops


DEFAULT FOR TYPE fcur USING BTREE as
OPERATOR 1 <,
OPERATOR 2 <=,
OPERATOR 3 =,
OPERATOR 4 >=,
OPERATOR 5 >,
FUNCTION 1 fcur_cmp( fcur, fcur );

That's it; once you've created an operator class for type FCUR, you can create an index that includes values of that type. To
create an operator class for the other index types (hash, R-tree, and GiST), you'll have to create a few more support
functions. See section 31 ("Extending SQL") of the PostgreSQL reference documentation for more details.

Chapter 7. PL/pgSQL

PL/pgSQL (Procedural Language/PostgreSQL) is a language that combines the expressive power of SQL with the more typical
features of a programming language. PL/pgSQL adds control structures such as conditionals, loops, and exception handling to
the SQL language. When you write a PL/pgSQL function, you can include any and all SQL commands, as well as the
procedural statements added by PL/pgSQL.

Functions written in PL/pgSQL can be called from other functions. You can also define a PL/pgSQL function as a trigger. A
trigger is a procedure that executes when some event occurs. For example, you might want to execute a PL/pgSQL function
when a new row is added to a table; that's what a trigger is for. You can define triggers for the INSERT, UPDATE,
and DELETE commands.

Installing PL/pgSQL

PostgreSQL can support a variety of procedural languages. Before you can use a procedural language, you have to install it
into the database. Fortunately, this is a simple procedure.

The createlang shell script installs PL/pgSQL into a database. If you install PL/pgSQL in the template1 database, it will
automatically be installed in all databases created from that template. The format for createlang is

createlang plpgsql database-name

To install PL/pgSQL in the movies database, execute the following command:

$ createlang plpgsql movies

Notice that this is a command-line utility, not a psql command.

Language Structure

PL/pgSQL is termed a block-structured language. A block is a sequence of statements between a matched set of DECLARE/BEGIN and END
statements. Blocks can be nested, meaning that one block can entirely contain another block, which in turn can contain other blocks, and so
on. For example, here is a PL/pgSQL function:

1 --
2 -- ch07.sql
3 --
4
5 CREATE OR REPLACE FUNCTION my_factorial(value INTEGER) RETURNS INTEGER AS $$
6 DECLARE
7 arg INTEGER;
8 BEGIN
9
10 arg := value;
11
12 IF arg IS NULL OR arg < 0 THEN
13 RAISE NOTICE 'Invalid Number';
14 RETURN NULL;
15 ELSE
16 IF arg = 1 THEN
17 RETURN 1;
18 ELSE
19 DECLARE
20 next_value INTEGER;
21 BEGIN
22 next_value := my_factorial(arg - 1) * arg;
23 RETURN next_value;
24 END;
25 END IF;
26 END IF;
27 END;
28 $$ LANGUAGE 'plpgsql';

The body of my_factorial() is actually the string between the opening dollar quotes (following the word AS) and the closing dollar quotes
(just before the word LANGUAGE).

This function contains two blocks of code. The first block starts at line 6 and ends at line 27. The second block, which is nested inside the
first, starts at line 19 and ends at line 24. The first block is called an outer block because it contains the inner block.

I'll talk about variable declarations in more detail in a moment, but I want to point out a few things here. At line 7, we declare a variable
named arg. This variable has a well-defined lifetime. arg comes into existence when the function reaches the first DECLARE statement and
goes out of existence as soon as the function reaches the END statement at line 27. The lifetime of a variable is also referred to as its scope.
You can refer to a variable in any statement within the block that defines the scope of the variable. If you try to refer to a variable outside of
its scope, you will receive a compilation error. Remember that you have two (nested) blocks in this function: the outer block and the inner
block. Variables declared in an outer block can be used in inner blocks, but the reverse is not true. At line 22 (which is in the inner block), we
use the arg variable, which was declared in the outer block. The variable next_value is declared within the inner block: If you try to use
next_value in the outer block, you'll get an error.
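
If you have been following along in C, it may help to know that C's braces give you a very similar scoping rule. The sketch below is only an analogy, not a translation that PostgreSQL would recognize: my_factorial_sketch() and FACTORIAL_INVALID are hypothetical names, the sentinel value stands in for the NULL result, and (unlike the PL/pgSQL version) this sketch treats 0 the same way as 1.

```c
#include <assert.h>

#define FACTORIAL_INVALID (-1)   /* stands in for the NULL result */

int my_factorial_sketch( int value )
{
    int arg = value;             /* outer block: visible everywhere below */

    if( arg < 0 )
        return( FACTORIAL_INVALID );

    if( arg <= 1 )
        return( 1 );

    /* Only values of 2 or more reach the inner block. */
    assert( arg >= 2 );

    {
        int next_value;          /* inner block: visible only inside these braces */

        next_value = my_factorial_sketch( arg - 1 ) * arg;

        return( next_value );
    }

    /* next_value is out of scope here; referring to it would not compile. */
}
```

Just as in the PL/pgSQL function, the inner block can use arg (declared in the enclosing block), but nothing outside of the inner braces can see next_value.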

This function (my_factorial()) contains two blocks, one nested within the other. You can nest blocks as deeply as you need to. You can also
define blocks that are not nested. Here is the my_factorial() function again, but this time, I've included a few more blocks:

1 --
2 -- ch07.sql
3 --
4
5 CREATE FUNCTION my_factorial( value INTEGER ) RETURNS INTEGER AS $$
6 DECLARE
7 arg INTEGER;
8 BEGIN
9
10 arg := value;
11
12 IF arg IS NULL OR arg < 0 THEN
13 BEGIN
14 RAISE NOTICE 'Invalid Number';
15 RETURN NULL;
16 END;
17 ELSE
18 IF arg = 1 THEN
19 BEGIN
20 RETURN 1;
21 END;
22 ELSE
23 DECLARE
24 next_value INTEGER;
25 BEGIN
26 next_value := my_factorial(arg - 1) * arg;
27 RETURN next_value;
28 END;
29 END IF;
30 END IF;
31 END;
32 $$ LANGUAGE 'plpgsql';

This version still has an outer block (lines 6 through 31), but you have multiple inner blocks: lines 13 through 16, lines 19 through 21, and
lines 23 through 28. As I said earlier, variables declared in an outer block can be used in inner blocks but the reverse is not true. If you had
declared any variables in the block starting at line 19, you could not use any of those variables past the end of the block (at line 21).

Notice that you can indicate the beginning of a block with a DECLARE statement or with a BEGIN statement. If you need to declare any
variables within a block, you must include a DECLARE section. If you don't need any local variables within a block, the DECLARE section is
optional (an empty DECLARE section is perfectly legal).

Quoting Embedded Strings

Prior to version 8.0, including string literals in a PL/pgSQL function was difficult and error prone. Because the body of a PL/pgSQL function is
itself a string, you had to double up the quote characters around any string literals within the function.

Take a close look at line 14 in the previous example:

RAISE NOTICE 'Invalid Number';

Notice that the string literal Invalid Number is surrounded by a set of single quotes. You can write an embedded string value that way
because the body of the function is defined in a string delimited by PostgreSQL's new dollar-quoting mechanism. If you don't use
dollar-quoting to define the function body, you must double up the quotes, like this:

RAISE NOTICE ''Invalid Number'';

If you're using a version of PostgreSQL older than 8.0, you can't use dollar-quoting and you'll have to write embedded string literals in one of
the other forms described in Chapter 2, "Working with Data in PostgreSQL." You could have written the embedded string in any of the three
following forms:

RAISE NOTICE ''Invalid Number'';

RAISE NOTICE \'Invalid Number\';

RAISE NOTICE \047Invalid Number\047;
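
Doubling quotes by hand is exactly the kind of mechanical chore that is easy to get wrong. To make the first form concrete, here is a small standalone sketch in C that doubles every single quote in a string, which is the transformation you would otherwise apply to a function body yourself. The double_quotes_sketch() function is a hypothetical helper written for this illustration; it is not part of PostgreSQL or its client libraries.

```c
#include <assert.h>
#include <stddef.h>

/*
** Copies src into dst, writing every single quote twice. Returns
** the length of the result (not counting the null terminator), or
** -1 if dst is too small to hold the doubled string.
*/
int double_quotes_sketch( const char * src, char * dst, size_t dstlen )
{
    size_t out = 0;

    assert( src != NULL && dst != NULL );

    for( ; *src != '\0'; src++ )
    {
        size_t need = ( *src == '\'' ) ? 2 : 1;

        /* Leave room for these characters plus the terminator. */
        if( out + need >= dstlen )
            return( -1 );

        if( *src == '\'' )
            dst[out++] = '\'';

        dst[out++] = *src;
    }

    if( out >= dstlen )
        return( -1 );

    dst[out] = '\0';

    return( (int)out );
}
```

Given the statement from line 14, this sketch produces the doubled form shown first in the list above. With dollar-quoting, of course, none of this is necessary.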

CREATE FUNCTION

Now, let's go back and look at the components of a function in more detail.

You define a new PL/pgSQL function using the CREATE FUNCTION command. The CREATE FUNCTION command comes in two forms. The first
form is used for language interpreters that are embedded into the PostgreSQL server; PL/pgSQL functions fall into this category:

CREATE [OR REPLACE] FUNCTION name ( [[argname] argtype [, ...] ] )


RETURNS return_type
AS $$definition$$
LANGUAGE langname
[ WITH ( attribute [, ...] ) ]

The second form is used for functions that are written in an external language and compiled into a dynamically loaded object module:

CREATE [OR REPLACE] FUNCTION name ( [[argname] argtype [, ...] ] )


RETURNS return_type
AS $$obj_file$$, $$link_symbol$$
LANGUAGE langname
[ WITH ( attribute [, ...] ) ]

I covered compiled functions in more detail in Chapter 6, "Extending PostgreSQL." For this chapter, I'll focus on the first form. Don't forget, if
you're using a version of PostgreSQL older than 8.0, you can't use $$ to delimit string values and you'll have to carefully quote embedded
strings as described earlier in this chapter.

Each function has a name. However, the name alone is not enough to uniquely identify a PostgreSQL function. Instead, the function name
and the data types of each argument (if any) are combined into a signature. A function's signature uniquely identifies the function within a
database. This means that you can define many my_factorial() functions:

CREATE FUNCTION my_factorial( INTEGER )...
CREATE FUNCTION my_factorial( REAL )...
CREATE FUNCTION my_factorial( NUMERIC )...

Each of these functions is uniquely identified by its signature. When you call one of these functions, you provide the function name and an
argument; PostgreSQL determines which function to use by comparing the data type of the argument that you provide with the function
signatures. If an exact match is found, PostgreSQL uses that function. If PostgreSQL can't find an exact match, it tries to find the closest
match.
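
Here is a small sketch of signature-based selection in action. The function name which_version() is hypothetical; each overload simply reports which signature PostgreSQL chose. I've stuck to calls where an exact match exists, because the rules for choosing the "closest" match are more involved.

```sql
CREATE FUNCTION which_version( INTEGER ) RETURNS TEXT AS $$
BEGIN
    RETURN 'INTEGER version';
END;
$$ LANGUAGE 'plpgsql';

CREATE FUNCTION which_version( REAL ) RETURNS TEXT AS $$
BEGIN
    RETURN 'REAL version';
END;
$$ LANGUAGE 'plpgsql';

CREATE FUNCTION which_version( NUMERIC ) RETURNS TEXT AS $$
BEGIN
    RETURN 'NUMERIC version';
END;
$$ LANGUAGE 'plpgsql';

SELECT which_version( 42 );         -- an integer literal: INTEGER version
SELECT which_version( 42.5 );       -- a numeric literal:  NUMERIC version
SELECT which_version( 42::REAL );   -- an explicit cast:   REAL version
```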

When you create a new function, you specify a list of arguments required by that function. In most programming languages, you would
declare a name and a type for each function argument. In PL/pgSQL, you are required to declare only the data type. The first argument is automatically
named "$1", the second argument is named "$2", and so forth, up to a maximum of 32 arguments (if you're using a version of PostgreSQL
older than 8.0, you're limited to 16 arguments per function). Starting with PostgreSQL version 8.0, you can include argument names in the
CREATE FUNCTION command. That means that you can define the my_factorial() function like this:

CREATE FUNCTION my_factorial( inputArgument INTEGER )...
CREATE FUNCTION my_factorial( inputArgument REAL )...
CREATE FUNCTION my_factorial( inputArgument NUMERIC )...

Inside of my_factorial(), you can refer to the first argument as $1 or as inputArgument. If you include argument names in the CREATE
FUNCTION command, the names are not considered to be part of the function signature. If you define a function such as

CREATE FUNCTION my_factorial( inputArgument INTEGER )...

the function's signature is my_factorial( INTEGER ). You can DROP the function without specifying argument names (in fact, if you do
specify argument names in a DROP FUNCTION command, the names are ignored).
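
As a quick illustration, the following sketch creates a function with a named argument and then drops it by signature alone. The function add_one() and its argument name are invented for this example, and named arguments assume PostgreSQL 8.0 or later.

```sql
CREATE FUNCTION add_one( input_value INTEGER ) RETURNS INTEGER AS $$
BEGIN
    -- input_value and $1 refer to the same argument
    RETURN input_value + 1;
END;
$$ LANGUAGE 'plpgsql';

SELECT add_one( 41 );              -- returns 42

-- The argument name is not part of the signature, so the
-- data type alone is enough to identify the function.
DROP FUNCTION add_one( INTEGER );
```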

You can use predefined data types, user-defined data types, and arrays of those types in a PL/pgSQL function.

It is important to remember that PL/pgSQL does not support default parameters. If you define a function that requires three parameters, you
cannot call that function with fewer (or more) parameters. If you find that you need a function with a variable argument list, you can usually
overload your function to obtain the same effect. When you overload a function, you define two (or more) functions with the same name but
different argument lists. For example, let's define a function to compute the due date for a tape rental:

1 --
2 -- ch07.sql
3 --
4
5 CREATE FUNCTION compute_due_date(DATE) RETURNS DATE AS $$
6 DECLARE
7
8 due_date DATE;
9 rental_period INTERVAL := '7 days';
10
11 BEGIN
12
13 due_date := $1 + rental_period;
14
15 RETURN due_date;
16
17 END;
18 $$ LANGUAGE 'plpgsql';

This function takes a single parameter, a DATE value, and returns the date one week later. You might want a second version of this function
that expects the rental date and a rental period:

20 -- ch07.sql
21 --
22 CREATE FUNCTION compute_due_date(DATE, INTERVAL) RETURNS DATE AS $$
23 BEGIN
24
25 RETURN( $1 + $2 );
26
27 END;
28 $$ LANGUAGE 'plpgsql';

Now you have two functions named compute_due_date(). One function expects a DATE value, and the other expects a DATE value and an
INTERVAL value. The first function, compute_due_date(DATE), provides the equivalent of a default parameter. If you call compute_due_date()
with a single argument, the rental_period defaults to seven days.
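
One variation on this idea, sketched here under my own assumptions, is to let the short form delegate to the long form so that the calculation lives in only one place. I've used a hypothetical name, due_date_for(), so that the sketch doesn't collide with the compute_due_date() functions in ch07.sql.

```sql
CREATE FUNCTION due_date_for( DATE, INTERVAL ) RETURNS DATE AS $$
BEGIN
    RETURN( $1 + $2 );
END;
$$ LANGUAGE 'plpgsql';

-- The one-argument overload supplies the "default" rental period
-- and hands the real work to the two-argument overload.
CREATE FUNCTION due_date_for( DATE ) RETURNS DATE AS $$
BEGIN
    RETURN( due_date_for( $1, INTERVAL '7 days' ) );
END;
$$ LANGUAGE 'plpgsql';

SELECT due_date_for( DATE '2005-07-26' );                      -- 2005-08-02
SELECT due_date_for( DATE '2005-07-26', INTERVAL '3 days' );   -- 2005-07-29
```

If the default rental period ever changes, only the one-argument overload needs to be edited.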

I'd like to point out two things about the compute_due_date(DATE, INTERVAL) function.

First, a stylistic issue: the RETURN statement takes a single argument, the value to be returned to the caller. You can RETURN any expression
that evaluates to the return_type of the function (we'll talk more about a function's return_type in a moment). I find it easier to read a
RETURN statement if the expression is enclosed in parentheses (see line 25).

Second, you'll notice that I did not DECLARE any local variables. You can treat parameter variables just like any other variable; I used them in
an expression in line 25. It's a rare occasion when you should settle for the automatic variable names supplied for function parameters. The
name "$1" doesn't convey much meaning beyond telling you that this variable happens to be the first parameter. You should really provide a
meaningful name for each parameter; this gives the reader some idea of what you intended to do with each parameter.

If you're using an older version of PostgreSQL (or you're writing code that must work on an older version), you can use the ALIAS statement
to give a second, more meaningful name to a parameter. Here is the compute_due_date(DATE, INTERVAL) function again, but this time I
have given alternate names to the parameters:

20 -- ch07.sql
21 --
22 CREATE FUNCTION compute_due_date(DATE, INTERVAL) RETURNS DATE AS '
23 DECLARE
24 rental_date ALIAS FOR $1;
25 rental_period ALIAS FOR $2;
26 BEGIN
27
28 RETURN( rental_date + rental_period );
29
30 END;
31 ' LANGUAGE 'plpgsql';

ALIAS gives you an alternate name for a parameter: you can still refer to an aliased parameter using the $n form, but I don't recommend it.
Why bother to give a meaningful name to a parameter and then ignore it?

Starting with PostgreSQL version 8.0, you can skip the ALIAS commands and simply name the arguments in the CREATE FUNCTION command,
like this:

CREATE FUNCTION compute_due_date(rental_date DATE, rental_period INTERVAL)
  RETURNS DATE AS ...

When you create a function, you must declare the data type of the return value. Our compute_due_date() functions return a value of type
DATE. A value is returned from a function using the RETURN expression statement. Keep in mind that PL/pgSQL will try to convert the
returned expression into the type that you specified when you created the function. If you tried to RETURN( ''Bad Value'' ) from the
compute_due_date() function, you would get an error (Bad Date External Representation). We'll see a special data type a little later
(TRIGGER, or in versions older than 7.3, OPAQUE) that can be used only for trigger functions.

If you're writing a PL/pgSQL function that you want to run in a version of PostgreSQL older than 7.3, you must ensure that the function
returns a value, even if it only returns NULL. Starting with version 7.3, you can define functions that return type void. A function that
returns type void doesn't actually return a value; you would call such a function for the side effects provided by the function.
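
A function that returns void might look something like the following sketch. The name log_message() is hypothetical, and the only side effect here is a NOTICE; a real function would more likely write to a table.

```sql
CREATE FUNCTION log_message( TEXT ) RETURNS void AS $$
BEGIN
    -- The side effect: report the message to the client
    RAISE NOTICE 'log: %', $1;

    -- A bare RETURN ends the function without producing a value
    RETURN;
END;
$$ LANGUAGE 'plpgsql';

SELECT log_message( 'tape returned late' );
```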

I'll skip over the function body [1] for the moment and look at the final component [2] required to define a new function. PostgreSQL functions
can be written in a variety of languages. When you create a new function, the last component that you specify is the name of the language in
which the body of the function is written. All the functions that you will see in this chapter are written in PL/pgSQL, which PostgreSQL knows
as LANGUAGE 'plpgsql'.

[1] The function body is everything between the AS keyword and the LANGUAGE keyword. The function body is specified in the
form of a string.

[2] When you create a function, you can also specify a set of optional attributes that apply to that function. These attributes tell
PostgreSQL about the behavior of the function so that the query optimizer can know whether it can take certain shortcuts when
evaluating the function. See the CREATE FUNCTION section in the PostgreSQL Programmer's Guide for more information.

DROP FUNCTION

Before you experiment much more with PL/pgSQL functions, it might be useful for you to know how to replace the definition of a function.

If you are using PostgreSQL 7.2 or later, you can use the CREATE OR REPLACE FUNCTION ... syntax. If a function with the same signature
already exists, PostgreSQL will silently replace the old version of the function; otherwise, a new function is created.

If you are using a version of PostgreSQL older than 7.2, you will have to DROP the old function before you can create a new one. The syntax
for the DROP FUNCTION command is

DROP FUNCTION name( [[argname] argtype [, ...] ] );

Notice that you have to provide the complete signature when you drop a function; otherwise, PostgreSQL would not know which version of
the function to remove.

Of course, you can use the DROP FUNCTION command to simply remove a function; you don't have to replace it with a new version.
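
The following sketch walks through both approaches using a throwaway function that I've named greeting() for this illustration (it is not part of the sample code for this chapter).

```sql
-- Create the function (or replace it if it already exists)
CREATE OR REPLACE FUNCTION greeting() RETURNS TEXT AS $$
BEGIN
    RETURN 'hello';
END;
$$ LANGUAGE 'plpgsql';

-- Same signature, so the old definition is silently replaced
CREATE OR REPLACE FUNCTION greeting() RETURNS TEXT AS $$
BEGIN
    RETURN 'hello again';
END;
$$ LANGUAGE 'plpgsql';

SELECT greeting();          -- returns 'hello again'

-- Remove the function; the (empty) argument list is required
DROP FUNCTION greeting();
```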

Function Body

Now that you have an overview of the components of a PL/pgSQL function, let's look at the function body in greater detail. I'll start by showing
you how to include documentation (that is, comments) in your PL/pgSQL functions. Next, I'll look at variable declarations. Finally, I'll finish up
this section by describing the different kinds of statements that you can use inside of a PL/pgSQL function.

Comments

There are two comment styles in PL/pgSQL. The most frequently seen comment indicator is the double dash: --. A double dash introduces a
comment that extends to the end of the current line. For example:

-- This line contains a comment and nothing else

DECLARE
  customer_id INTEGER;  -- This is also a comment
  -- due_date DATE;     -- This entire line is a comment
                        -- because it begins with a '--'

PL/pgSQL understands C-style comments as well. A C-style comment begins with the characters /* and ends with the characters */. A C-style
comment can span multiple lines:

/*
NAME: compute_due_date()

DESCRIPTION: This function will compute the due date for a tape
rental.

INPUT:
$1 -- Date of original rental

RETURNS: A date indicating when the rental is due.


*/

CREATE FUNCTION compute_due_date( DATE ) RETURNS DATE


...

Choosing a comment style is purely a matter of personal preference. Of course, the person choosing the style may not be you; you may have
to conform to coding standards imposed by your customer (and/or employer). I tend to use only the double-dash comment style in PL/pgSQL
code. If I want to include a multi-line comment, I start each line with a double dash:

----------------------------------------------------------------------
-- NAME: compute_due_date()
--
-- DESCRIPTION: This function will compute the due date for a tape
-- rental.
--
-- INPUT:
-- $1 -- Date of original rental
--
-- RETURNS: A date indicating when the rental is due.

CREATE FUNCTION compute_due_date( DATE ) RETURNS DATE


...

I find that the double-dash style looks a little cleaner.

Variables

The variable declarations that you've seen up to this point have all been pretty simple. There are actually five ways to introduce a new variable
(or at least a new variable name) into a PL/pgSQL function.

• Each parameter defines a new variable.

• You can declare new variables in the DECLARE section of a block.

• You can create an alternate name for a function parameter using the ALIAS statement.

• You can define a new name for a variable (invalidating the old name) using the RENAME statement.

• The iterator variable for an integer-based FOR loop is automatically declared to be an integer.

Let's look at these variables one at a time.

Function Parameters

I mentioned earlier in this chapter that each parameter in a PL/pgSQL function is automatically assigned a name. The first parameter (in
left-to-right order) is named $1, the second parameter is named $2, and so on. You define the data type for each parameter in the function
definition, for example:

CREATE FUNCTION write_history( DATE, rentals )...

This function expects two parameters. The first parameter is named $1 and is of type DATE. The second parameter is named $2 and is of type
rentals. If you're using a newer version of PostgreSQL (8.0 or later), you can also define your own names for function parameters:

CREATE FUNCTION write_history( historyDate DATE, rentalRecord rentals )...

In this case, you've given two names to each parameter. You can refer to the first parameter as historyDate or as $1 and the second
parameter as rentalRecord or $2. When you include parameter names in a CREATE FUNCTION command, you're assigning aliases for the
parameters without explicitly writing ALIAS commands.

Notice that the write_history() function (in the preceding code line) expects an argument of type rentals. In the sample database,
'rentals' is actually the name of a table. Inside of the write_history() function, you can use the rentalRecord parameter (also known as
$2) as if it were a row in the rentals table. That means that you can work with rentalRecord.tape_id, rentalRecord.customer_id,
rentalRecord.rental_date, or $2.tape_id, $2.customer_id, and $2.rental_date.

When you call this function, you need to pass a row from the rentals table as the second argument. For example:

SELECT write_history( CURRENT_DATE, rentals ) FROM rentals;
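
To show a row-typed parameter from end to end, here is a self-contained sketch. The demo_rentals table and the describe_rental() function are stand-ins that I've invented so the example doesn't depend on the sample database; the named parameter assumes PostgreSQL 8.0 or later.

```sql
CREATE TABLE demo_rentals (
    tape_id     CHAR(8),
    customer_id INTEGER,
    rental_date DATE
);

INSERT INTO demo_rentals VALUES ( 'AB-12345', 1, DATE '2005-07-26' );

-- The parameter's type is the name of the table, so the
-- parameter behaves like a row of that table.
CREATE FUNCTION describe_rental( rentalRecord demo_rentals ) RETURNS TEXT AS $$
BEGIN
    RETURN CAST( rentalRecord.tape_id AS TEXT )
        || ' rented by customer '
        || CAST( rentalRecord.customer_id AS TEXT );
END;
$$ LANGUAGE 'plpgsql';

-- Pass each row of the table to the function
SELECT describe_rental( demo_rentals ) FROM demo_rentals;
```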

DECLARE

The second way to introduce a new variable into a PL/pgSQL function is to list the variable in the DECLARE section of a block. The name of a
non-parameter variable can include alphabetic characters (A-Z), underscores, and digits. Variable names must begin with a letter (A-Z or a-z)
or an underscore. Names are case-insensitive: my_variable can also be written as My_Variable, and both still refer to the same variable.

The PL/pgSQL documentation mentions that you can force a variable name to be case-sensitive by enclosing it in double quotes. For example,
"pi". As of PostgreSQL 7.1.3, this does not seem to work. You can enclose a variable name within double quotes if you need to start the name
with a digit.

Oddly enough, you can actually DECLARE a variable whose name starts with a '$', $3, for example, but I wouldn't recommend it; I would expect
that this feature (bug?) may be removed (fixed?) at some point in the future.

The complete syntax for a variable declaration is

var-name [CONSTANT] var-type [NOT NULL] [{ DEFAULT | := } expression];

Some of the examples in this chapter have declared variables using the most basic form:

due_date DATE;
rental_period INTERVAL := ''7 days'';

The first line creates a new variable named due_date. The data type of due_date is DATE. Because I haven't explicitly provided an initial value
for due_date, it will be initialized to NULL.

The second line defines a new INTERVAL variable named rental_period. In this case, I have provided an initial value, so rental_period will
be initialized to the INTERVAL value '7 days'. I could have written this declaration as

rental_period INTERVAL DEFAULT ''7 days'';

In the DECLARE section of a block, DEFAULT is synonymous with ':='.

The initializer expression must evaluate to a value of the correct type. If you are creating an INTEGER variable, the initializer expression must
evaluate to an INTEGER value or to a type that can be coerced into an INTEGER value.

In newer versions of PostgreSQL, you can declare array variables by writing a set of square brackets (and an optional element count) following
the data type. For example, the declaration

monthly_balances NUMERIC(7,2)[12] := '{}';

defines a variable named monthly_balances as an array of 12 numeric values. There is one counter-intuitive quirk that you should know about
when you declare an array variable. If you define an array without an initializer, the array is NULL; apparently, that's not the same thing as
saying that the array is full of NULL values. You can't insert individual values into a NULL array. That means that code such as the following will
silently fail:

DECLARE
  monthly_balances NUMERIC(7,2)[12];
BEGIN
  monthly_balances[1] := 10;
  monthly_balances[2] := monthly_balances[1] * 1.10;
...

You can't insert a value into a NULL array, but you can copy an entire array over the top of a NULL array:

DECLARE
new_balances NUMERIC(7,2)[12];
old_balances NUMERIC(7,2)[12] := '{}';
BEGIN
new_balances := old_balances;
...

So you can only put a value into an array by initializing it or by copying another array over the top of it.
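
The safe pattern, then, is to initialize the array before you assign to individual elements. The following is a minimal sketch of that pattern; the function name first_two_balances() is hypothetical, and the exact behavior of assignments into a NULL array may depend on the PostgreSQL version you are running.

```sql
CREATE FUNCTION first_two_balances() RETURNS NUMERIC[] AS $$
DECLARE
    -- Start with an empty (but not NULL) array
    monthly_balances NUMERIC(7,2)[] := '{}';
BEGIN
    monthly_balances[1] := 10;
    monthly_balances[2] := monthly_balances[1] * 1.10;

    RETURN monthly_balances;
END;
$$ LANGUAGE 'plpgsql';

SELECT first_two_balances();
```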

You can define PL/pgSQL functions that take array values as arguments, and you can return array values from PL/pgSQL functions.

Prior to PostgreSQL version 8.0, the DECLARE section had a couple of surprises up its sleeve. First, you could not use any of the function
parameters in the initializer expression, even if you ALIASed them. The following is illegal:

CREATE FUNCTION compute_due_date(DATE) RETURNS DATE AS '


DECLARE
due_date DATE := $1 + ''7 days''::INTERVAL;
...

ERROR: Parameter $1 is out of range

The second issue was that once you created a variable in a DECLARE section, you could not use that variable later within the same DECLARE
section. That meant that you couldn't do something like

CREATE FUNCTION do_some_geometry(REAL) RETURNS REAL AS '


DECLARE
pi CONSTANT REAL := 3.1415926535;
radius REAL := 3.0;
area REAL := pi * ( radius * radius );
...

ERROR: Attribute 'pi' not found

Both of these problems have been fixed in release 8.0.

Notice in the previous example that I declared pi to be a 'CONSTANT REAL'. When you define a variable as CONSTANT, you prevent assignment
to that variable. You must provide an initializer for a CONSTANT.

The final modifier for a variable declaration is NOT NULL. Defining a variable to be NOT NULL means that you will receive an error if you try to
set that variable to NULL. You must provide an initializer when you create a NOT NULL variable [3].

[3] This makes perfect sense if you think about it. If you don't provide an initializer, PL/pgSQL will initialize each variable to
NULL; you can't do that if you have declared the variable to be NOT NULL.

Now you can put all these pieces together. The following declarations are identical in function:

pi CONSTANT REAL NOT NULL DEFAULT 3.1415926535;
pi CONSTANT REAL NOT NULL := 3.1415926535;
pi CONSTANT REAL := 3.1415926535;

Each declares a REAL variable named pi, with an initial value of 3.1415926535. The NOT NULL clause is superfluous here because we have
declared pi to be a constant and we have given it a non-null initial value; it's not a bad idea to include NOT NULL for documentation purposes.

The default value for a variable is computed each time you enter the block that declares it. If you define a default value in terms of an
expression, the variables and functions within that expression can change value from one execution to the next. For example, if an inner block
declares a variable whose default value is defined by a variable in an outer block, the default value will vary with the outer variable.
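
You can watch the default value being recomputed with a sketch like the following. The function show_defaults() and its variable names are invented for this illustration; the inner block is entered three times, and its variable is re-initialized on every entry.

```sql
CREATE FUNCTION show_defaults() RETURNS INTEGER AS $$
DECLARE
    outer_value INTEGER := 1;
BEGIN
    FOR n IN 1 .. 3 LOOP
        outer_value := outer_value * 10;

        -- The default for inner_value is evaluated each time
        -- this inner block is entered.
        DECLARE
            inner_value INTEGER := outer_value + 1;
        BEGIN
            RAISE NOTICE 'inner_value = %', inner_value;
        END;
    END LOOP;

    RETURN outer_value;
END;
$$ LANGUAGE 'plpgsql';

-- Raises three notices: inner_value = 11, 101, and then 1001
SELECT show_defaults();
```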

Pseudo Data Types: %TYPE and %ROWTYPE

When you create a PL/pgSQL variable, you must declare its data type. Before moving on to the ALIAS command, there are a few pseudo data
types that you should know about.

%TYPE lets you define one variable to be of the same type as another. Quite often, you will find that you need to temporarily store a value that
you have retrieved from a table, or you might need to make a copy of a function parameter. Let's say that you are writing a function to process
a rentals record in some way:

CREATE FUNCTION process_rental( rentals ) RETURNS BOOLEAN AS $$


DECLARE
original_tape_id CHAR(8);
original_customer_id INTEGER;
original_rental_row ALIAS FOR $1;

BEGIN

original_tape_id := original_rental_row.tape_id;
original_customer_id := original_rental_row.customer_id;
...

In this snippet, you are making a local copy of the rentals.tape_id and rentals.customer_id columns. Without %TYPE, you have to ensure
that you use the correct data types when you declare the original_tape_id and original_customer_id variables.

That might not sound like such a big deal now, but what about six months later when you decide that eight characters isn't enough to hold a
tape ID?

Instead of doing all that maintenance work yourself, you can let PL/pgSQL do the work for you. Here is a much better version of the
process_rental() function:

CREATE FUNCTION process_rental( rentals ) RETURNS BOOLEAN AS $$


DECLARE
original_tape_id rentals.tape_id%TYPE;
original_customer_id rentals.customer_id%TYPE;
original_rental_row ALIAS FOR $1;

BEGIN

original_tape_id := original_rental_row.tape_id;
original_customer_id := original_rental_row.customer_id;
...

By using %TYPE, I've told PL/pgSQL to create the original_tape_id variable using whatever type rentals.tape_id is defined to be. I've also
created original_customer_id with the same data type as the rentals.customer_id column.

This is an extremely powerful feature. At first blush, it may appear to be just a simple timesaving trick that you can use when you first create a
function. The real power behind %TYPE is that your functions become self-maintaining. If you change the data type of the rentals.tape_id
column, the process_rental() function will automatically inherit the change. You won't have to track down all the places where you have
made a temporary copy of a tape_id and change the data types.

You can use the %TYPE feature to obtain the type of a column or type of another variable (as shown in the code that follows). You cannot use
%TYPE to obtain the type of a parameter. Starting with PostgreSQL version 7.2, you can use %TYPE in the argument list for a function. For
example:

CREATE FUNCTION process_rental( rentals, rentals.customer_id%TYPE )


RETURNS BOOLEAN AS '
DECLARE
original_tape_id rentals.tape_id%TYPE;
original_customer_id rentals.customer_id%TYPE;
original_rental_row ALIAS FOR $1;
...

%TYPE lets you access the data type of a column (or variable). %ROWTYPE provides similar functionality. You can use %ROWTYPE to declare a
variable that has the same structure as a row in the given table. For example:

CREATE FUNCTION process_rental( rentals ) RETURNS BOOLEAN AS $$


DECLARE
original_tape_id rentals.tape_id%TYPE;
original_customer_id rentals.customer_id%TYPE;
original_rental_row rentals%ROWTYPE;
...

The original_rental_row variable is defined to have the same structure as a row in the rentals table. You can access columns in
original_rental_row using the normal dot syntax: original_rental_row.tape_id, original_rental_row.rental_date, and so on.
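
Here is a complete, runnable sketch that combines %TYPE and %ROWTYPE. The demo_tapes table and the tape_title() function are made up for this illustration (they are not part of the sample database), and the initializer that refers to $1 assumes PostgreSQL 8.0 or later.

```sql
CREATE TABLE demo_tapes (
    tape_id CHAR(8),
    title   TEXT
);

INSERT INTO demo_tapes VALUES ( 'AB-12345', 'The Godfather' );

CREATE FUNCTION tape_title( demo_tapes.tape_id%TYPE ) RETURNS TEXT AS $$
DECLARE
    -- Same type as the tape_id column, whatever that happens to be
    wanted_id  demo_tapes.tape_id%TYPE := $1;

    -- Same structure as a row in the demo_tapes table
    tape_row   demo_tapes%ROWTYPE;
BEGIN
    SELECT INTO tape_row * FROM demo_tapes WHERE tape_id = wanted_id;

    RETURN tape_row.title;
END;
$$ LANGUAGE 'plpgsql';

SELECT tape_title( 'AB-12345' );
```

If you later widen the tape_id column and re-create the function, neither declaration has to change.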

Using %ROWTYPE, you can define a variable that has the same structure as a row in a specific table. A bit later in this chapter, I'll show you how
to process dynamic queries (see the section "EXECUTE"); that is, a query whose text is not known at the time you are writing your function.
When you are processing dynamic queries, you won't know which table to use with %ROWTYPE.

Other Pseudo Types

The RECORD data type is used to declare a composite variable whose structure will be determined at execution time. I'll describe the RECORD
type in more detail a bit later (see the section "Loop Constructs").

PostgreSQL version 7.3 introduced a new pseudo type named TRIGGER. A function defined with a return type of TRIGGER can only be used as a
trigger function. I'll describe trigger functions later in this chapter (see the section titled "Triggers").

The final pseudo data type is OPAQUE. The OPAQUE type can be used only to define the return type of a function [4]. You cannot declare a variable
(or parameter) to be of type OPAQUE. In fact, you can use OPAQUE only to define the return type of a trigger function. OPAQUE is an obsolete
name; you should define trigger functions using the TRIGGER type instead.

[4] You can use OPAQUE to define the data type of a function argument, but not when you are creating a PL/pgSQL function.
Remember, functions can be defined in a number of different languages.

ALIAS and RENAME

Now, let's move on to the next method that you can use to define a new variable, or at least a new name for an existing variable. You've
already seen the ALIAS statement earlier in this chapter. The ALIAS statement creates an alternative name for a function parameter. You
cannot ALIAS a variable that is not a function parameter. Using ALIAS, you can define any number of names that equate to a parameter:

CREATE FUNCTION foo( INTEGER ) RETURNS INTEGER AS '
DECLARE
param_1 ALIAS FOR $1;
my_param ALIAS FOR $1;
arg_1 ALIAS FOR $1;
BEGIN
$1 := 42;
-- At this point, $1, param_1, my_param and arg_1
-- are all set to 42.
...

As we've mentioned already, if you're using PostgreSQL version 8.0 or later, you can skip the ALIAS commands and simply name each
parameter in the argument list.

The RENAME statement is similar to ALIAS; it provides a new name for an existing variable. Unlike ALIAS, RENAME invalidates the old variable
name. You can RENAME any variable, not just function parameters. The syntax for the RENAME statement is

RENAME old-name TO new-name

Here is an example of the RENAME statement:

CREATE FUNCTION foo( INTEGER ) RETURNS INTEGER AS '


DECLARE
RENAME $1 TO param1;
BEGIN
...

Important Note

The RENAME statement does not work in PostgreSQL versions 7.1.2 through at least 7.4, but it appears to function correctly in
version 8.0.

RENAME and ALIAS can be used only within the DECLARE section of a block.

FOR Loop Iterator

So far, you have seen four methods for introducing a new variable or a new variable name. In each of the preceding methods, you explicitly
declare a new variable (or name) in the DECLARE section of a block and the scope of the variable is the block in which it is defined. The final
method is different.

One of the control structures that you will be looking at soon is the FOR loop. The FOR loop comes in two flavors: the first flavor is used to
execute a block of statements some fixed number of times; the second flavor executes a statement block for each row returned by a query. In
this section, I will talk only about the first flavor.

Here is an example of a FOR loop:

FOR i IN 1 .. 12 LOOP
balance := balance + customers.monthly_balances[i];
END LOOP;

In this example, you have defined a loop that will execute 12 times. Each statement within the loop (you have only a single statement) will be
executed 12 times. The variable i is called the iterator for the loop (you may also see the term loop index to describe the iterator). Each time
you go through this loop, the iterator (i) is incremented by 1.

The iterator for an integer FOR loop is automatically declared for you. The type of the iterator is INTEGER. It is important to remember that the
iterator for an integer FOR loop is a new variable. If you have already declared a variable with the same name as the iterator, the original
variable will be hidden for the remainder of the loop. For example:

...
DECLARE
  i REAL := 0;
  balance NUMERIC(9,2) := 0;
BEGIN

--
-- At this point, i = 0
--

FOR i IN 1 .. 12 LOOP

--
-- we now have a new copy of i, it will vary from 1 to 12
--

balance := balance + customers.monthly_balances[i];


END LOOP;

--
-- Now, if we access i, we will find that it is
-- equal to 0 again
--

Notice that while you are inside the loop, there are two variables named i: the inner variable is the loop iterator, and the outer variable was
declared inside of this block. If you refer to i inside the loop, you are referring to the inner variable. If you refer to i outside the loop, you are
referring to the outer variable. A little later, I'll show you how to access the outer variable from within the loop.
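
The hiding behavior is easy to verify with a small sketch. The function show_iterator_scope() is a hypothetical name; it declares an outer i, runs a loop whose iterator is also named i, and then returns the outer variable.

```sql
CREATE FUNCTION show_iterator_scope() RETURNS REAL AS $$
DECLARE
    i REAL := 0;
BEGIN
    FOR i IN 1 .. 3 LOOP
        -- Inside the loop, i is the INTEGER iterator
        RAISE NOTICE 'inside the loop, i = %', i;
    END LOOP;

    -- Outside the loop, i is the REAL variable declared above;
    -- the loop never changed it.
    RETURN i;
END;
$$ LANGUAGE 'plpgsql';

SELECT show_iterator_scope();   -- returns 0
```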

Now that you have seen how to define new variables, it's time to move on. This next section explains each type of statement that you can use
in the body of a PL/pgSQL function.

PL/pgSQL Statement Types

At the beginning of this chapter, I said that PL/pgSQL adds a set of procedural constructs to the basic SQL language. In this next section, I'll
examine the statement types added by PL/pgSQL. PL/pgSQL includes constructs for looping, exception and error handling, simple assignment,
and conditional execution (that is, IF/THEN/ELSE). Although I don't describe them here, it's important to remember that you can also include
any SQL command in a PL/pgSQL function.

Assignment

The most commonly seen statement in many programs is the assignment statement. Assignment lets you assign a new value to a variable. The
format of an assignment statement should be familiar by now; you've already seen it in most of the examples in this chapter:

target := expression;

target should identify a variable, a function parameter, a column, or in some cases, a row. If target is declared as CONSTANT, you will receive
an error. When PL/pgSQL executes an assignment statement, it starts by evaluating the expression. If expression evaluates to a value
whose data type is not the same as the data type of target, PL/pgSQL will convert the value to the target type. (In cases where conversion is
not possible, PostgreSQL will reward you with an error message.)

The expression is actually evaluated by the PostgreSQL server, not by PL/pgSQL. This means that expression can be any valid PostgreSQL
expression. Chapter 2, "Working with Data in PostgreSQL," describes PostgreSQL expressions in more detail.
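
The following sketch shows a few assignments, including two that rely on PL/pgSQL converting the value to the type of the target. The function assignment_demo() and its variable names are invented for this illustration.

```sql
CREATE FUNCTION assignment_demo() RETURNS INTEGER AS $$
DECLARE
    total INTEGER;
    label TEXT;
BEGIN
    total := 2 + 3 * 4;       -- the server evaluates the expression: 14
    label := total;           -- INTEGER value converted to TEXT: '14'
    total := label || '0';    -- TEXT value '140' converted to INTEGER

    RETURN total;
END;
$$ LANGUAGE 'plpgsql';

SELECT assignment_demo();     -- returns 140
```

If label held a string that doesn't look like a number, the last assignment would fail with a conversion error.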

SELECT INTO

The assignment statement is one way to put data into a variable; SELECT INTO is another. The syntax for a SELECT INTO statement is

SELECT INTO destination [, ...] select-list FROM ...;

A typical SELECT INTO statement might look like this:

...
DECLARE
customer customers%ROWTYPE;
BEGIN
SELECT INTO customer * FROM customers WHERE customer_id = 10;
...

When this statement is executed, PL/pgSQL sends the query "SELECT * FROM customers WHERE customer_id = 10" to the server. This query
should not return more than one row. The results of the query are placed into the customer variable. Because I specified that customer is of
type customers%ROWTYPE, the query must return a row shaped exactly like a customers row; otherwise, PL/pgSQL signals an error.

I could also SELECT INTO a list of variables, rather than into a single composite variable:

DECLARE
phone customers.phone%TYPE;
name customers.customer_name%TYPE;
BEGIN
SELECT INTO name, phone
customer_name, customers.phone FROM customers
WHERE customer_id = 10;
...

Notice that I had to explicitly request customers.phone in this query. If I had simply requested phone, PL/pgSQL would have assumed that I
really wanted to execute the query:

SELECT customer_name, NULL FROM customers where customer_id = 10;

Why? Because I have declared a local variable nam ed phone in t his funct ion, and PL/ pgSQL would subst it ut e t he current value of phone
wherever it occurred in t he query. Because phone ( t he local variable) is init ialized t o NULL, PL/ pgSQL would have st uffed NULL int o t he query.
You should choose variable nam es t hat don't conflict wit h colum n nam es, or fully qualify colum n nam e references.

Of course, you can also SELECT INTO a RECORD variable and the RECORD will adapt its shape to match the results of the query.

I mentioned earlier that the query specified in a SELECT INTO statement must return no more than one row. What happens if the query returns
no data? The variables that you are selecting into are set to NULL. You can also check the value of the predefined variable FOUND (described
later in this chapter) to determine whether a row was actually retrieved. What happens if the query returns more than one row? If you're using
an older version of PostgreSQL, PL/pgSQL will throw an error at you. If you're using PostgreSQL version 8.0 or later, the target variables are
filled in with values from the first row returned by the SELECT command.
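To make the no-data case concrete, here is a minimal sketch that tests FOUND right after a SELECT INTO. The demo_customers table and the demo_customer_name() function are hypothetical names invented for this illustration; they are not part of the sample database.

```sql
-- Hypothetical table used only for this illustration
CREATE TEMPORARY TABLE demo_customers (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(50)
);
INSERT INTO demo_customers VALUES ( 1, 'Jones, Henry' );

CREATE OR REPLACE FUNCTION demo_customer_name( INTEGER ) RETURNS VARCHAR AS $$
DECLARE
    target_id ALIAS FOR $1;
    cust_name demo_customers.customer_name%TYPE;
BEGIN
    -- The variable names are chosen so that they cannot be mistaken for column names
    SELECT INTO cust_name d.customer_name
      FROM demo_customers d
     WHERE d.customer_id = target_id;

    IF NOT FOUND THEN
        -- No row was retrieved, so cust_name is still NULL
        RETURN 'no such customer';
    END IF;

    RETURN cust_name;
END;
$$ LANGUAGE plpgsql;
```

Calling demo_customer_name( 1 ) should hand back the stored name, and a call with an unknown customer_id should fall into the NOT FOUND branch.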

A bit later in this chapter, you'll see the FOR-IN-SELECT loop that can handle an arbitrary number of rows (see the section "Loop Constructs").

Conditional Execution

Using the IF statement, you can conditionally execute a section of code. The most basic form of the IF statement is

IF expression THEN
    statements
END IF;

The expression must evaluate to a BOOLEAN value or to a value that can be coerced into a BOOLEAN value. If expression evaluates to TRUE,
the statements between THEN and END IF are executed. If expression evaluates to FALSE or NULL, the statements are not executed.

Here are some sample IF statements:

IF ( now() > rentals.rental_date + rental_period ) THEN
    late_fee := handle_rental_overdue();
END IF;

IF ( customers.balance > maximum_balance ) THEN
    PERFORM customer_over_balance( customers );
    RETURN( FALSE );
END IF;

In each of these statements, the condition expression is evaluated by the PostgreSQL server. If the condition evaluates to TRUE, the statements
between THEN and END IF are executed; otherwise, they are skipped and execution continues with the statement following the END IF.

You can also define a new block within the IF statement:

IF ( tapes.dist_id IS NULL ) THEN
    DECLARE
        default_dist_id CONSTANT integer := 0;
    BEGIN
        ...
    END;
END IF;

The obvious advantage to defining a new block within an IF statement is that you can declare new variables. It's usually a good idea to declare
variables with the shortest possible scope; you won't pollute the function's namespace with variables that you need in only a few places, and
you can assign initial values that may rely on earlier computations.

The next form of the IF statement provides a way to execute one section of code if a condition is TRUE and a different section of code if the
condition is not TRUE. The syntax for an IF-THEN-ELSE statement is

IF expression THEN
    statements_1
ELSE
    statements_2
END IF;

In this form, statements_1 will execute if expression evaluates to TRUE; otherwise, statements_2 will execute. Note that statements_2 will
not execute if the expression is TRUE. Here are some sample IF-THEN-ELSE statements:

IF ( now() > rentals.rental_date + rental_period ) THEN
    late_fee := handle_rental_overdue();
ELSE
    late_fee := 0;
END IF;

IF ( customers.balance > maximum_balance ) THEN
    PERFORM customer_over_balance( customers );
    RETURN( FALSE );
ELSE
    rental_ok := TRUE;
END IF;

An IF-THEN-ELSE is almost equivalent to two IF statements. For example, the following

IF ( now() > rentals.rental_date + rental_period ) THEN
    statements_1
ELSE
    statements_2
END IF;

is nearly identical to

IF ( now() > rentals.rental_date + rental_period ) THEN
    statements_1
END IF;

IF ( now() <= rentals.rental_date + rental_period ) THEN
    statements_2
END IF;

The difference between these two scenarios is that using IF-THEN-ELSE, the condition expression is evaluated once; but using two IF
statements, the condition expression is evaluated twice. In many cases, this distinction won't be important; but in some circumstances, the
condition expression may have side effects (such as causing a trigger to execute), and evaluating the expression twice will double the side
effects. The two forms also differ when the condition evaluates to NULL: the ELSE branch executes statements_2, but neither of the two
separate IF statements executes anything.
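The following sketch makes the side-effect difference visible. It assumes a hypothetical sequence named demo_calls, and nextval() stands in for any condition that changes something each time it is evaluated; both function names are invented for this illustration.

```sql
-- Hypothetical sequence; each nextval() call is a visible side effect
CREATE TEMPORARY SEQUENCE demo_calls;

CREATE OR REPLACE FUNCTION demo_if_else() RETURNS BIGINT AS $$
BEGIN
    -- The condition is evaluated once, so the sequence advances by one
    IF ( nextval( 'demo_calls' ) > 1000 ) THEN
        NULL;   -- stands in for statements_1
    ELSE
        NULL;   -- stands in for statements_2
    END IF;
    RETURN currval( 'demo_calls' );
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION demo_two_ifs() RETURNS BIGINT AS $$
BEGIN
    -- Each IF evaluates its own condition, so the sequence advances by two
    IF ( nextval( 'demo_calls' ) > 1000 ) THEN
        NULL;
    END IF;
    IF ( nextval( 'demo_calls' ) <= 1000 ) THEN
        NULL;
    END IF;
    RETURN currval( 'demo_calls' );
END;
$$ LANGUAGE plpgsql;
```

If you call demo_if_else() and then demo_two_ifs() in the same session, the second result should be two higher than the first, because the two-IF version consumed two sequence values.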

You can nest IF-THEN-ELSE statements:

IF ( today > compute_due_date( rentals )) THEN
    --
    -- This rental is past due
    --
    ...
ELSE
    IF ( today = compute_due_date( rentals )) THEN
        --
        -- This rental is due today
        --
        ...
    ELSE
        --
        -- This rental is not late and it's not due today
        --
        ...
    END IF;
END IF;

PostgreSQL versions 7.2 and later support a more convenient way to nest IF-THEN-ELSE-IF statements:

IF ( today > compute_due_date( rentals )) THEN
    --
    -- This rental is past due
    --
    ...
ELSIF ( today = compute_due_date( rentals )) THEN
    --
    -- This rental is due today
    --
    ...
ELSE
    --
    -- This rental is not late and it's not due today
    --
    ...
END IF;

The ELSIF form is functionally equivalent to a nested IF-THEN-ELSE-IF but you need only a single END IF statement. Notice that the spelling is
ELSIF, not ELSE IF. You can include as many ELSIF sections as you like.
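Here is a small, self-contained sketch of the ELSIF form. The function name demo_due_status() and its two DATE parameters are assumptions made for the example; they take the place of the compute_due_date() call used above.

```sql
CREATE OR REPLACE FUNCTION demo_due_status( due_date DATE, today DATE ) RETURNS TEXT AS $$
BEGIN
    IF ( today > due_date ) THEN
        RETURN 'past due';
    ELSIF ( today = due_date ) THEN
        RETURN 'due today';
    ELSE
        -- Also reached when either argument is NULL, because a NULL
        -- condition is treated the same way as FALSE
        RETURN 'not due';
    END IF;
END;
$$ LANGUAGE plpgsql;
```

The conditions are tested from top to bottom, and only the statements belonging to the first TRUE condition run.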

Loop Constructs

Next, let's look at the loop constructs offered by PL/pgSQL. Using a loop, you can repeat a sequence of statements until a condition occurs. The
most basic loop construct is the LOOP statement:

[<<label>>]
LOOP
    statements
END LOOP;

In this form, the statements between LOOP and END LOOP are repeated until an EXIT or RETURN statement exits the loop. If you don't include
an EXIT or RETURN statement, your function will loop forever. I'll explain the optional <<label>> in the section that covers the EXIT statement.

You can nest loops as deeply as you need:

 1  row := 0;
 2
 3  LOOP
 4      IF( row = 100 ) THEN
 5          EXIT;
 6      END IF;
 7
 8      col := 0;
 9
10      LOOP
11          IF( col = 100 ) THEN
12              EXIT;
13          END IF;
14
15          PERFORM process( row, col );
16
17          col := col + 1;
18
19      END LOOP;
20
21      row := row + 1;
22  END LOOP;
23
24  RETURN( 0 );

In the preceding code snippet, there are two loops. Because the inner loop is completely enclosed within the outer loop, the inner loop executes
each time the outer loop repeats. The statements in the outer loop execute 100 times. The statements in the inner loop (lines 10 through 19)
execute 100 x 100 times.

The EXIT statement at line 5 causes the outer LOOP to terminate; when you execute that statement, execution continues at the statement
following the END LOOP for the enclosing loop (at line 24). The EXIT statement at line 12 will change the point of execution to the statement
following the END LOOP for the enclosing loop (at line 21).

I'll cover the EXIT statement in more detail in the next section.

The next loop construct is the WHILE loop. The syntax for a WHILE loop is

[<<label>>]
WHILE expression LOOP
    statements
END LOOP;

The WHILE loop is used more frequently than a plain LOOP. A WHILE loop is equivalent to

[<<label>>]
LOOP
    IF( NOT ( expression )) THEN
        EXIT;
    END IF;

    statements

END LOOP;

The condition expression must evaluate to a BOOLEAN value or to a value that can be coerced to a BOOLEAN. The expression is evaluated each
time execution reaches the top of the loop. If expression evaluates to TRUE, the statements within the loop are executed. If expression
evaluates to FALSE or NULL, execution continues with the statement following the END LOOP.

Here is the nested loop example again, but this time, I have replaced the IF tests with WHILE loops:

row := 0;

WHILE ( row < 100 ) LOOP

    col := 0;

    WHILE ( col < 100 ) LOOP

        PERFORM process( row, col );

        col := col + 1;

    END LOOP;

    row := row + 1;
END LOOP;

RETURN( 0 );

You can see that the WHILE loop is much neater and easier to understand than the previous form. It's also a lot easier to introduce a bug if you
use a plain LOOP and have to write the IF tests yourself.

The third loop construct is the FOR loop. There are two forms of the FOR loop. In the first form, called the integer-FOR loop, the loop is
controlled by an integer variable:

[<<label>>]
FOR iterator IN [ REVERSE ] start-expression .. end-expression LOOP
    statements
END LOOP;

In this form, the statements inside the loop are repeated while the iterator is less than or equal to end-expression (or greater than or equal
to end-expression if the loop direction is REVERSE). Just before the first iteration of the loop, iterator is initialized to start-expression. At the
bottom of the loop, iterator is incremented by 1 (or -1 if the loop direction is REVERSE); and if it is still within the end-expression, execution
jumps back to the first statement in the loop.

An integer-FOR loop is equivalent to:

[<<label>>]
DECLARE
    iterator  INTEGER;
    increment INTEGER;
    end_value INTEGER;
BEGIN
    IF( loop-direction = REVERSE ) THEN
        increment := -1;
    ELSE
        increment := 1;
    END IF;

    iterator  := start-expression;
    end_value := end-expression;

    LOOP
        -- Stop as soon as the iterator has moved past end_value
        IF( ( increment = 1 AND iterator > end_value ) OR
            ( increment = -1 AND iterator < end_value ) ) THEN
            EXIT;
        END IF;

        statements

        iterator := iterator + increment;

    END LOOP;
END;

The start-expression and end-expression are evaluated once, just before the loop begins. Both expressions must evaluate to an INTEGER
value or to a value that can be coerced to an INTEGER.
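As a rough illustration of the REVERSE direction, the sketch below counts down from a caller-supplied value. The function name demo_countdown() is hypothetical; notice that in a REVERSE loop the larger bound is written first.

```sql
CREATE OR REPLACE FUNCTION demo_countdown( INTEGER ) RETURNS TEXT AS $$
DECLARE
    top    ALIAS FOR $1;
    result TEXT := '';
BEGIN
    -- Both bounds are evaluated once, before the first iteration.
    -- The iterator i is declared by the loop itself.
    FOR i IN REVERSE top .. 1 LOOP
        result := result || i || ' ';
    END LOOP;

    RETURN result || 'liftoff';
END;
$$ LANGUAGE plpgsql;
```

If top is less than 1, the loop body never runs and the function simply returns 'liftoff'.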

Here is the example code snippet again, this time written in the form of an integer-FOR loop:

 1  FOR row IN 0 .. 99 LOOP
 2
 3      FOR col IN 0 .. 99 LOOP
 4
 5          PERFORM process( row, col );
 6
 7      END LOOP;
 8
 9  END LOOP;
10
11  RETURN( 0 );

This version is more readable than the version that used a WHILE loop. All the information that you need in order to understand the loop
construct is in the first line of the loop. Looking at line 1, you can see that this loop uses a variable named row as the iterator; and unless
something unusual happens inside the loop, row starts at 0 and increments to 99.

There are a few points to remember about the integer-FOR loop. First, the iterator variable is automatically declared: it is defined to be an
INTEGER and is local to the loop. Second, you can terminate the loop early using the EXIT (or RETURN) statement. Third, you can change the
value of the iterator variable inside the loop: Doing so can affect the number of iterations through the loop.

You can use this last point to your advantage. In PL/pgSQL, there is no way to explicitly specify a loop increment other than 1 (or -1 if the loop
is REVERSEd). But you can change the effective increment by modifying the iterator within the loop. For example, let's say that you want to
process only odd numbers inside a loop:

1  ...
2  FOR i IN 1 .. 100 LOOP
3      ...
4      i := i + 1;
5      ...
6  END LOOP;
7  ...

The first time you go through this loop, i will be initialized to 1. At line 4, you increment i to 2. When you reach line 6, the FOR loop will
increment i to 3 and then jump back to line 3 (the first line in the loop). You can, of course, increment the loop iterator in whatever form you
need. If you fiddle with the loop iterator, be sure to write yourself a comment that explains what you're doing.
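An alternative that leaves the iterator alone is to loop over a plain counter and derive the value you need from it. The sketch below (demo_sum_odds() is a hypothetical name) visits the odd numbers from 1 through 99 that way. It does not depend on how a particular PostgreSQL release treats assignments to the iterator, which is worth checking on your own server; later releases also accept a BY clause on the integer-FOR loop, but treat its availability in your version as something to verify.

```sql
CREATE OR REPLACE FUNCTION demo_sum_odds() RETURNS INTEGER AS $$
DECLARE
    total INTEGER := 0;
    odd   INTEGER;
BEGIN
    FOR n IN 0 .. 49 LOOP
        odd   := 2 * n + 1;      -- 1, 3, 5, ..., 99
        total := total + odd;    -- stands in for whatever processing you need
    END LOOP;

    RETURN total;
END;
$$ LANGUAGE plpgsql;
```

The loop runs 50 times, once for each odd number, and the iterator itself is never modified.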

The second form of the FOR loop is used to process the results of a query. The syntax for this form is

[<<label>>]
FOR iterator IN query LOOP
    statements
END LOOP;

In this form, which I'll call the FOR-IN-SELECT form, the statements within the loop are executed once for each row returned by the query.
query must be a SQL SELECT command. Each time through the loop, iterator will contain the next row returned by the query. If the query
does not return any rows, the statements within the loop will not execute.

The iterator variable must either be of type RECORD or of a %ROWTYPE that matches the structure of a row returned by the query. Even if the
query returns a single column, the iterator must be a RECORD or a %ROWTYPE.

Here is a code snippet that shows the FOR statement:

DECLARE
    rental rentals%ROWTYPE;
BEGIN

    FOR rental IN SELECT * FROM rentals ORDER BY rental_date LOOP
        IF( rental_is_overdue( rental )) THEN
            PERFORM process_late_rental( rental );
        END IF;
    END LOOP;

END;

A %ROWTYPE iterator is fine if the query returns an entire row. If you need to retrieve a partial row, or you want to retrieve the result of a
computation, declare the iterator variable as a RECORD. Here is an example:

DECLARE
    my_record RECORD;
BEGIN

    FOR my_record IN
        SELECT tape_id, compute_due_date(rentals) AS due_date FROM rentals
    LOOP
        PERFORM
            check_for_late_rental( my_record.tape_id, my_record.due_date );
    END LOOP;

END;

A RECORD variable does not have a fixed structure. The fields in a RECORD variable are determined at the time that a row is assigned. In the
previous example, you assign a row returned by the SELECT to the my_record RECORD. Because the query returns two columns, my_record will
contain two fields: tape_id and due_date. A RECORD variable can change its shape. If you used the my_record variable as the iterator in a
second FOR-IN-SELECT loop in this function, the field names within the RECORD would change. For example:

DECLARE
    my_record RECORD;
BEGIN

    FOR my_record IN SELECT * FROM rentals LOOP
        -- my_record now holds a row from the rentals table
        -- I can access my_record.tape_id, my_record.rental_date, etc.
    END LOOP;

    FOR my_record IN SELECT * FROM tapes LOOP
        -- my_record now holds a row from the tapes table
        -- I can now access my_record.tape_id, my_record.title, etc.
    END LOOP;
END;

You also can process the results of a dynamic query (that is, a query not known at the time you write the function) in a FOR loop. To execute a
dynamic query in a FOR loop, the syntax is a bit different:

[<<label>>]
FOR iterator IN EXECUTE query-string LOOP
    statements
END LOOP;

Notice that this is nearly identical to a FOR-IN-SELECT loop. The EXECUTE keyword tells PL/pgSQL that the following string may change each time
the statement is executed. The query-string can be an arbitrarily complex expression that evaluates to a string value; of course, it must evaluate
to a valid SELECT statement. The following function shows the FOR-IN-EXECUTE loop:

CREATE OR REPLACE FUNCTION my_count( VARCHAR ) RETURNS INTEGER AS '
  DECLARE
    query     ALIAS FOR $1;
    count     INTEGER := 0;
    my_record RECORD;
  BEGIN
    FOR my_record IN EXECUTE query LOOP
      count := count + 1;
    END LOOP;
    RETURN count;
  END;
' LANGUAGE 'plpgsql';

EXIT

An EXIT statement (without any operands) terminates the enclosing loop, and execution continues at the statement following the end of that
loop.

The full syntax for the EXIT statement is

EXIT [label] [WHEN boolean-expression];

All the EXIT statements that you have seen in this chapter have been simple EXIT statements. A simple EXIT statement unconditionally
terminates the most closely nested loop.

If you include WHEN boolean-expression in an EXIT statement, the EXIT becomes conditional: the EXIT occurs only if boolean-expression
evaluates to TRUE. For example:

1  FOR i IN 1 .. 12 LOOP
2      balance := customer.customer_balances[i];
3      EXIT WHEN ( balance = 0 );
4      PERFORM check_balance( customer, balance );
5  END LOOP;
6
7  RETURN( 0 );

When execution reaches line 3, the WHEN expression is evaluated. If the expression evaluates to TRUE, the loop will be terminated and execution
will continue at line 7.

This statement should really be named EXIT...IF. The EXIT...WHEN expression is not evaluated after each statement, as the name might
imply.
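To see that the WHEN condition is tested only at the point where the EXIT statement appears, consider this minimal sketch. The function name demo_count_until_zero() and the array argument are assumptions made for the example; the array plays the role of the customer_balances column.

```sql
CREATE OR REPLACE FUNCTION demo_count_until_zero( NUMERIC[] ) RETURNS INTEGER AS $$
DECLARE
    balances ALIAS FOR $1;
    counted  INTEGER := 0;
BEGIN
    FOR i IN 1 .. 12 LOOP
        -- Tested once per pass, right here, and nowhere else in the loop.
        -- A missing (NULL) entry is treated as the end of the data.
        EXIT WHEN ( balances[i] IS NULL OR balances[i] = 0 );
        counted := counted + 1;
    END LOOP;

    RETURN counted;
END;
$$ LANGUAGE plpgsql;
```

The function returns the number of leading nonzero balances; it stops counting at the first zero or at the end of the array.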

Labels: EXIT Targets and Name Qualifiers

Now let's turn our attention to the subject of labels. A label is simply a string of the form

<<label>>

You can include a label prior to any of the following:

• A DECLARE section

• A LOOP

• A WHILE loop

• An integer FOR loop

• A FOR...SELECT loop

A label can perform two distinct functions. First, a label can be referenced in an EXIT statement. For example:

 1  <<row_loop>>
 2  FOR row IN 0 .. 99 LOOP
 3
 4      <<column_loop>>
 5      FOR col IN 0 .. 99 LOOP
 6
 7          IF( process( row, col ) = FALSE ) THEN
 8              EXIT row_loop;
 9          END IF;
10
11      END LOOP;
12
13  END LOOP;
14
15  RETURN( 0 );

Normally, an EXIT statement terminates the most closely nested block (or loop). When you refer to a label in an EXIT statement, you can
terminate more than one nested block. When PL/pgSQL executes the EXIT statement at line 8, it will terminate the <<column_loop>> block and
the <<row_loop>> block. You can't EXIT a block unless it is active: In other words, you can't EXIT a block that has already ended or that has
not yet begun.

The second use for a label has to do with variable scoping. Remember that an integer-FOR loop creates a new copy of the iterator variable. If
you have already declared the iterator variable outside of the loop, you can't directly access it within the loop. Consider the following example:

1  <<func>>
2  DECLARE
3      month_num INTEGER := 6;
4  BEGIN
5      FOR month_num IN 1 .. 12 LOOP
6          PERFORM compute_monthly_info( month_num );
7      END LOOP;
8  END;

Line 3 declares a variable named month_num. When execution reaches line 5, PL/pgSQL will create a second variable named month_num (and
this variable will vary between 1 and 12). Within the scope of the new variable (between lines 5 and 7), any reference to month_num will refer
to the new variable created at line 5. If you want to refer to the outer variable, you can qualify the name as func.month_num. In general terms,
you can refer to any variable in a fully qualified form. If you omit the label qualifier, a variable reference refers to the variable with the shortest
lifetime (that is, the most recently created variable).
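The sketch below shows both names side by side. The function name demo_label_scope() is hypothetical, and the loop is shortened to three iterations to keep the result small.

```sql
CREATE OR REPLACE FUNCTION demo_label_scope() RETURNS TEXT AS $$
<<func>>
DECLARE
    month_num INTEGER := 6;
    seen      TEXT := '';
BEGIN
    FOR month_num IN 1 .. 3 LOOP
        -- month_num is the loop's own iterator;
        -- func.month_num is the variable declared in the outer block
        seen := seen || month_num || '/' || func.month_num || ' ';
    END LOOP;

    RETURN seen;
END;
$$ LANGUAGE plpgsql;
```

Each pass appends the iterator value followed by the outer value, so the outer month_num shows up as 6 every time while the iterator runs from 1 to 3.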

RETURN

Every PL/pgSQL function must terminate with a RETURN statement. There are two forms for the RETURN statement:

RETURN expression;
RETURN;

Use the first form when you're writing a PL/pgSQL function that returns a simple value and the second form when you're writing a function that
returns a SETOF values. If your function returns a SETOF values, you'll use the RETURN NEXT statement (described in the next section) to build
up a result set as you go.

When a RETURN statement executes, four things happen:

1. The expression (if any) is evaluated and, if necessary, coerced into the appropriate data type. The RETURN type of a function is declared
when you create the function. In the example "CREATE FUNCTION func() RETURNS INTEGER ...", the RETURN type is declared to be an
INTEGER. If the RETURN expression does not evaluate to the declared RETURN type, PL/pgSQL will try to convert it to the required type.
If you are writing a function that returns a SETOF values, you should omit the expression.

2. The current function terminates. When a function terminates, all code blocks within that function terminate, and all variables declared
within that function are destroyed.

3. The return value (obtained by evaluating expression or executing some number of RETURN NEXT statements) is returned to the caller. If
the caller assigns the return value to a variable, the assignment completes. If the caller uses the return value in an expression, the caller
uses the return value to evaluate the expression. If the function was called by a PERFORM statement, the return value is discarded.

4. The point of execution returns to the caller.

If you fail to execute a RETURN statement, you will receive an error (control reaches end of function without RETURN). You can include
many RETURN statements in a function, but only one will execute: whichever RETURN statement is reached first.

RETURN NEXT

If you've defined a function that returns a SETOF values, you don't use the RETURN statement to give a value to the caller. Instead, you execute
a series of zero or more RETURN NEXT statements. The syntax for a RETURN NEXT statement is

RETURN NEXT expression;

Each time you execute a RETURN NEXT statement, PL/pgSQL evaluates the expression and adds the result to the function's result set. If you
are returning a SETOF rows, expression must evaluate to a row value. If you are returning a SETOF arrays, each expression must evaluate to
an array (of the proper type). If you are returning a SETOF simple values, each expression must evaluate to a simple value of the appropriate
type. If you are returning a SETOF anyarray or anyelement, see the discussion of polymorphic functions later in this chapter.

When you have finished building the result set, simply RETURN from the function.

The following example defines a function that returns the monthly balances for a given customer in the form of a SETOF NUMERIC values:

CREATE OR REPLACE FUNCTION getBalances( id INTEGER ) RETURNS SETOF NUMERIC AS $$
  DECLARE
    customer customers%ROWTYPE;
  BEGIN

    SELECT * FROM customers INTO customer WHERE customer_id = id;

    FOR month IN 1..12 LOOP

      IF customer.monthly_balances[month] IS NOT NULL THEN
        RETURN NEXT customer.monthly_balances[month];
      END IF;

    END LOOP;

    RETURN;

  END;
$$ LANGUAGE 'plpgsql';

Notice that this function will execute the RETURN NEXT statement anywhere from 0 to 12 times; that means that the result set built by this
function may contain anywhere from 0 to 12 rows. If you don't execute a RETURN NEXT statement, the result set built by the function will be
empty.

A function that returns a SETOF values acts like a table. That means that a SETOF function is typically written to the right of the FROM in a
SELECT command. For example, to call the getBalances() function you just saw, you would write a query such as the following. (Note: These
queries won't work for you unless you've added a monthly_balances array to the customers table):

movies=# SELECT customer_id, customer_name, balance, monthly_balances
movies-#   FROM customers;
 customer_id |    customer_name     | balance | monthly_balances
-------------+----------------------+---------+------------------
           1 | Jones, Henry         |    0.00 |
           4 | Wonderland, Alice N. |    3.00 |
           2 | Rubin, William       |   15.00 |
           3 | Panky, Henry         |    0.00 | {5.00,52.20}
(4 rows)

movies=# SELECT * FROM getBalances( 3 );
 getbalances
-------------
        5.00
       52.20
(2 rows)

movies=# SELECT * FROM getBalances( 2 );
 getbalances
-------------
(0 rows)

Notice that the first call to getBalances( 3 ) returned two rows because there are two entries in the monthly_balances column for customer
number 3. The second call returned zero rows.
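If you haven't added a monthly_balances column to your customers table, you can still experiment with RETURN NEXT using a function that needs no table at all. This is only a sketch: demo_nonzero() is a hypothetical function that takes the array as an argument and assumes a one-dimensional array with the default lower bound of 1.

```sql
CREATE OR REPLACE FUNCTION demo_nonzero( NUMERIC[] ) RETURNS SETOF NUMERIC AS $$
DECLARE
    vals ALIAS FOR $1;
BEGIN
    -- array_upper() yields NULL for an empty array, so fall back to 0
    FOR i IN 1 .. coalesce( array_upper( vals, 1 ), 0 ) LOOP
        IF vals[i] IS NOT NULL AND vals[i] <> 0 THEN
            RETURN NEXT vals[i];    -- adds one row to the result set
        END IF;
    END LOOP;

    RETURN;                         -- hands the finished result set to the caller
END;
$$ LANGUAGE plpgsql;
```

You would call it the same way as getBalances(), for example SELECT * FROM demo_nonzero( '{5.00,0,52.20}' ), which skips the zero entry and returns the other two values as separate rows.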

PERFORM

A function written in PL/pgSQL can contain SQL commands intermingled with PL/pgSQL-specific statements. Remember, a SQL command is
something like CREATE TABLE, INSERT, UPDATE, and so on; whereas PL/pgSQL adds procedural statements such as IF, RETURN, or WHILE. If
you want to create a new table within a PL/pgSQL function, you can just include a CREATE TABLE command in the code:

CREATE FUNCTION process_month_end( ) RETURNS BOOLEAN AS '
  BEGIN
    ...
    CREATE TABLE temp_data ( ... );
    ...
    DROP TABLE temp_data;
    ...
  END;
' LANGUAGE 'plpgsql';

You can include almost any SQL command just by writing the command inline. The exception is the SELECT command. A SELECT command
retrieves data from the server. If you want to execute a SELECT command in a PL/pgSQL function, you normally provide variables to hold the
results:

DECLARE
    customer customers%ROWTYPE;
BEGIN
    ...
    SELECT INTO customer * FROM customers WHERE( customer_id = 1 );
    --
    -- The customer variable will now hold the results of the query
    --
    ...
END;

On rare occasions, you may need to execute a SELECT statement, but you want to ignore the data returned by the query. Most likely, the
SELECT statement that you want to execute will have some side effect, such as executing a function. You can use the PERFORM statement to
execute an arbitrary SELECT command without using the results. For example:

...
-- PERFORM takes the place of the SELECT keyword
PERFORM my_function( rentals ) FROM rentals;
...

You can also use PERFORM to evaluate an arbitrary expression, again discarding the results:

...
PERFORM record_timestamp( timeofday() );
...

EXECUTE

The EXECUTE statement is similar to the PERFORM statement. Whereas the PERFORM statement evaluates a SQL expression and discards the
results, the EXECUTE statement executes a dynamic SQL command, and then discards the results. The difference is subtle but important. When
the PL/pgSQL processor compiles a PERFORM expression statement, the query plan required to evaluate the expression is generated and
stored along with the function. This means that expression must be known at the time you write your function. The EXECUTE statement, on
the other hand, executes a SQL statement that is not known at the time you write your function. You may, for example, construct the text of a
SQL statement within your function, or you might accept a string value from the caller and then execute that string.

Here is a function that uses the EXECUTE command to time the execution of a SQL command:

CREATE FUNCTION time_command( VARCHAR ) RETURNS INTERVAL AS '
  DECLARE
    beg_time TIMESTAMP;
    end_time TIMESTAMP;
  BEGIN

    beg_time := timeofday( );
    EXECUTE $1;
    end_time := timeofday( );

    RETURN( end_time - beg_time );
  END;
' LANGUAGE 'plpgsql';

You would call the time_command() function like this:

movies=# SELECT time_command( 'SELECT * FROM rentals' );
 time_command
--------------
 00:00:00.82
(1 row)

With the EXECUTE statement, you can execute any SQL command (including calls to PL/pgSQL functions) and the results will be discarded,
except for the side effects.
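Here is a sketch of the other case mentioned earlier, where the function builds the command text itself. The demo_scratch table and the demo_empty_table() function are hypothetical; quote_ident() is used so that an unusual table name is quoted properly before it is spliced into the command.

```sql
-- Hypothetical scratch table used only for this illustration
CREATE TEMPORARY TABLE demo_scratch ( n INTEGER );
INSERT INTO demo_scratch VALUES ( 1 );
INSERT INTO demo_scratch VALUES ( 2 );

CREATE OR REPLACE FUNCTION demo_empty_table( VARCHAR ) RETURNS INTEGER AS $$
DECLARE
    target ALIAS FOR $1;
BEGIN
    -- The command text is assembled at run time, so it cannot be planned in advance
    EXECUTE 'DELETE FROM ' || quote_ident( target );
    RETURN 0;
END;
$$ LANGUAGE plpgsql;

SELECT demo_empty_table( 'demo_scratch' );
```

After the call, demo_scratch should be empty. A plain DELETE written inline would not work here because the table name is not known until the function runs.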

GET DIAGNOSTICS

PL/pgSQL provides a catch-all statement that gives you access to various pieces of result information: GET DIAGNOSTICS. Using GET
DIAGNOSTICS, you can retrieve a count of the rows affected by the most recent UPDATE or DELETE command and the object-ID of the most
recently inserted row. The syntax for the GET DIAGNOSTICS statement is

GET DIAGNOSTICS variable = [ROW_COUNT|RESULT_OID], ...;

ROW_COUNT is meaningless until you have executed an UPDATE or DELETE command. Likewise, RESULT_OID is meaningless until you execute an
INSERT command.
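A minimal sketch of ROW_COUNT in action follows. The demo_fees table and the demo_waive_zero_fees() function are hypothetical names made up for this example.

```sql
-- Hypothetical table used only for this illustration
CREATE TEMPORARY TABLE demo_fees ( tape_id INTEGER, fee NUMERIC );
INSERT INTO demo_fees VALUES ( 1, 0 );
INSERT INTO demo_fees VALUES ( 2, 0 );
INSERT INTO demo_fees VALUES ( 3, 5 );

CREATE OR REPLACE FUNCTION demo_waive_zero_fees() RETURNS INTEGER AS $$
DECLARE
    affected INTEGER;
BEGIN
    DELETE FROM demo_fees WHERE fee = 0;

    -- Capture the count immediately, before another SQL command replaces it
    GET DIAGNOSTICS affected = ROW_COUNT;

    RETURN affected;
END;
$$ LANGUAGE plpgsql;
```

The first call should report the two rows that were deleted; a second call finds nothing left to delete and reports zero.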

Error Handling

PostgreSQL version 8.0 introduced a new error-handling scheme to PL/pgSQL. Prior to version 8.0, any error that occurred during a PL/pgSQL
function would abort the function and the transaction that called the function. Beginning with version 8.0, you can intercept error conditions
(PL/pgSQL calls them exceptions) and handle them gracefully.

To trap an exception, include an EXCEPTION section just before the END of a block. The syntax for an EXCEPTION section is

EXCEPTION
    WHEN condition [OR condition...] THEN
        statements
    [ WHEN condition [OR condition...] THEN
        statements
      ...
    ]

The condition is derived from the error descriptions listed in Appendix A of the PostgreSQL reference documentation. Table 7.1 shows an
excerpt from Appendix A. To convert one of these errors into a condition, just find the error code that you want to trap and write the error
description, replacing each space with an underscore.

Table 7.1. Sample PostgreSQL Error Codes

Error Code   Description
----------   -----------
Class 08     Connection Exception
08000        CONNECTION EXCEPTION
08003        CONNECTION DOES NOT EXIST
08006        CONNECTION FAILURE
08001        SQLCLIENT UNABLE TO ESTABLISH SQLCONNECTION
08004        SQLSERVER REJECTED ESTABLISHMENT OF SQLCONNECTION
08007        TRANSACTION RESOLUTION UNKNOWN
08P01        PROTOCOL VIOLATION

For example, to trap error 08006, you would write an EXCEPTION section like this:

BEGIN
    ...
EXCEPTION
    WHEN connection_failure THEN
        RAISE EXCEPTION 'Connection To Server Lost';
END;

If any of the statements between BEGIN and EXCEPTION throws a connection_failure error, PL/pgSQL immediately jumps to the first
statement in the exception handler (in this case, the RAISE EXCEPTION statement), bypassing the rest of the statements in the block.

You can't trap every condition listed in Appendix A; in particular, you can't trap successful_completion, any of the conditions listed in the
WARNING category, or any of the conditions listed in the NO DATA category.

You can trap a whole category of error conditions by writing an EXCEPTION handler for that category. You can distinguish between errors and
categories by looking at the last three characters of the error code. If they are 000 (as in 08000), you're looking at a category. To trap any of
the errors in the connection_exception class, just write an EXCEPTION section like this:

BEGIN
    ...
EXCEPTION
    WHEN connection_exception THEN
        RAISE EXCEPTION 'Something went wrong with the server connection';
END;

That sequence is equivalent to:

BEGIN
    ...
EXCEPTION
    WHEN
        connection_does_not_exist OR
        connection_failure OR
        sqlclient_unable_to_establish_sqlconnection OR
        sqlserver_rejected_establishment_of_sqlconnection OR
        transaction_resolution_unknown OR
        protocol_violation
    THEN
        RAISE EXCEPTION 'Something went wrong with the server connection';
END;

PL/pgSQL defines a catch-all condition, named others, that you can use to trap any exceptions not trapped by another handler.

A single exception may match multiple exception handlers. For example, consider the following EXCEPTION section:

BEGIN
    ...
EXCEPTION
    WHEN connection_failure THEN
        RAISE EXCEPTION 'Connection Lost';
    WHEN connection_exception THEN
        RAISE EXCEPTION 'Something went wrong with the server connection';
    WHEN others THEN
        RAISE EXCEPTION 'Something broke';
END;

If a connection_failure occurs, all three handlers match the exception: The connection_failure handler matches exactly; the
connection_exception handler matches because a connection_failure is a member of the connection_exception category; and the
others handler matches because others will match any exception. Which handler executes? The first one that matches. That means that you
should always list the handlers from most-specific to most-general. If you were to write the others handler first, the connection_failure and
connection_exception handlers could never execute.
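Connection errors are awkward to provoke on demand, so the sketch below applies the same most-specific-first ordering to the data exception class (class 22) instead. The function name demo_ratio() is hypothetical; division_by_zero and data_exception are condition names taken from Appendix A.

```sql
CREATE OR REPLACE FUNCTION demo_ratio( TEXT, TEXT ) RETURNS TEXT AS $$
BEGIN
    RETURN ( $1::INTEGER / $2::INTEGER )::TEXT;
EXCEPTION
    WHEN division_by_zero THEN      -- a single, specific error
        RETURN 'cannot divide by zero';
    WHEN data_exception THEN        -- any other error in the same class
        RETURN 'bad number';
    WHEN others THEN                -- everything else
        RETURN 'something broke';
END;
$$ LANGUAGE plpgsql;
```

Dividing by zero reaches the first handler. An argument that can't be converted to an INTEGER raises a different class 22 error, so it skips the first handler and is caught by the second.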

Remember that you can nest blocks within a single PL/pgSQL function. Each block can have its own EXCEPTION section. When an exception
occurs, PL/pgSQL searches through the currently active blocks to find a handler for that exception. If the first (most deeply nested) block
hasn't defined a handler for the exception, PL/pgSQL aborts the first block and looks at the surrounding block. If that block hasn't defined a
handler for the exception, PL/pgSQL aborts the second block as well and continues to the next block. If PL/pgSQL can't find a handler, it aborts
the entire function and reports the exception to the caller of the function.
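Here is a minimal sketch of that search. The function name nested_handlers() is hypothetical; the inner block deliberately defines a handler for the wrong condition so that the exception has to travel outward to the surrounding block.

```sql
-- Hypothetical demonstration function; not part of the sample database.
CREATE OR REPLACE FUNCTION nested_handlers( ) RETURNS TEXT AS $$
DECLARE
    n INTEGER := 0;
BEGIN                                   -- outer block
    BEGIN                               -- inner (most deeply nested) block
        n := 1 / n;                     -- raises division_by_zero
    EXCEPTION
        WHEN no_data_found THEN         -- does not match, so the inner block is aborted
            RETURN 'handled by inner block';
    END;
    RETURN 'no exception';
EXCEPTION
    WHEN division_by_zero THEN          -- the surrounding block has a matching handler
        RETURN 'handled by outer block';
END;
$$ LANGUAGE 'plpgsql';

SELECT nested_handlers();
```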

When a PL/pgSQL function enters a block that includes an EXCEPTION section, it creates a "subtransaction" by executing the internal equivalent
of a SAVEPOINT command. If you have one block nested within another (and each block defines exception handlers), you have two
subtransactions, one nested within the other. If an exception occurs, PL/pgSQL rolls back nested subtransactions as it searches for an
exception handler. When PL/pgSQL finds an exception handler, it rolls back that subtransaction as well and then executes the handler. Consider the
following code snippet:

...
FOR tape IN SELECT * FROM tapes LOOP
  BEGIN                                   -- outer block: one subtransaction per tape

    PERFORM update_tape( tape );

    FOR rental IN SELECT * FROM rentals WHERE rentals.tape_id = tape.tape_id LOOP

      BEGIN                               -- inner block: one subtransaction per rental

        PERFORM update_rental_1( rental );
        PERFORM update_rental_2( rental );

      EXCEPTION
        WHEN insufficient_privilege THEN
          RAISE NOTICE 'Privilege denied';
      END;

    END LOOP;
  EXCEPTION
    WHEN others THEN
      RAISE NOTICE 'Unable to process all tapes';
  END;
END LOOP;
...

This snippet contains two loops, one nested within the other. The outer loop reads through the tapes table and, for each tape, calls a function
named update_tape() (presumably another PL/pgSQL function). The inner loop reads each rentals record for the current tape and calls two
functions with each rental.

Every time the PL/pgSQL interpreter executes the first BEGIN statement, it creates a new subtransaction which we'll call Touter. Likewise,
every time PL/pgSQL executes the second BEGIN statement, it creates a new subtransaction, Tinner, nested within Touter. Now consider what
happens when an exception occurs.

If the update_rental_2() function throws an exception, PL/pgSQL aborts Tinner (rolling back any changes made by update_rental_1() and
update_rental_2()) and then searches for a handler that matches the exception. If update_rental_2() throws an insufficient_privilege
exception, PL/pgSQL finds the inner-most exception handler, jumps to the first RAISE NOTICE statement, and then moves on to the statement
following the inner-most block. If update_rental_2() throws any other exception, PL/pgSQL ignores the inner-most exception handler
(because it doesn't match the exception), aborts Touter (rolling back any changes made by update_tape(), update_rental_1(), and
update_rental_2()), jumps to the second RAISE NOTICE statement, and moves on to the statement following the outer-most block.

If the update_tape() function throws an exception, PL/pgSQL aborts Touter (rolling back any changes made by update_tape()), jumps to the
second RAISE NOTICE statement, and then moves on to the statement following the outer-most block.

Notice that an exception always aborts the inner-most subtransaction. PL/pgSQL will continue aborting nested subtransactions until it finds a
handler for the exception. If no handler is found, the entire transaction is aborted. By using nested subtransactions (and nested exception
handlers) in this way, the inner subtransaction contains all of the updates for a single rental. If the inner subtransaction aborts, only those
changes made to the current rental are rolled back. The outer subtransaction contains all of the updates for a single tape (including all of the
updates for all rentals of that tape). If you abort the outer subtransaction, all changes made to the tape are rolled back and all changes made
to the rentals of that tape are rolled back as well.
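You can watch a subtransaction roll back with a small experiment. This is a sketch only: the subxact_demo table and the subxact_rollback_demo() function are hypothetical names invented for the demonstration, and the failure is a forced division by zero rather than a privilege error.

```sql
-- Hypothetical scratch table, used only for this demonstration.
CREATE TABLE subxact_demo( step TEXT );

CREATE OR REPLACE FUNCTION subxact_rollback_demo( ) RETURNS INTEGER AS $$
DECLARE
    zero INTEGER := 0;
    kept INTEGER;
BEGIN
    INSERT INTO subxact_demo VALUES ( 'outer: before inner block' );

    BEGIN                                         -- starts a subtransaction
        INSERT INTO subxact_demo VALUES ( 'inner: about to fail' );
        zero := 1 / zero;                         -- raises division_by_zero
    EXCEPTION
        WHEN division_by_zero THEN
            -- By the time the handler runs, the inner INSERT has been rolled back.
            RAISE NOTICE 'inner block rolled back';
    END;

    INSERT INTO subxact_demo VALUES ( 'outer: after inner block' );

    SELECT count(*) INTO kept FROM subxact_demo;
    RETURN kept;
END;
$$ LANGUAGE 'plpgsql';

SELECT subxact_rollback_demo();     -- counts the rows that survived
SELECT * FROM subxact_demo;         -- only the two "outer" rows remain
```

The row inserted inside the failing block disappears, but the work done before and after that block is kept, which is the same behavior the tape and rental loops rely on.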

RAISE

In addition to trapping errors, PL/pgSQL provides a way to generate an error: the RAISE statement. Exceptions are
usually generated when an error occurs while executing an SQL (or PL/pgSQL) statement, but you can explicitly raise an exception using the
RAISE statement. The syntax for a RAISE statement is

RAISE severity 'message' [, variable [...]];

The severity determines how far the error message will go and whether the error should abort the current transaction.

Valid values for severity include

• DEBUG: The message is written to the server's log file and otherwise ignored. The function runs to completion, and the current
transaction is not affected.

• NOTICE: The message is written to the server's log file and sent to the client application. The function runs to completion, and the
current transaction is not affected.

• EXCEPTION: The message is written to the server's log file and PL/pgSQL throws a raise_exception exception that you can trap with
an EXCEPTION handler as described in the previous section.

The message string must be a literal value: you can't use a PL/pgSQL variable in this slot, and you cannot include a more complex expression.
If you need to include variable information in the error message, you can sneak it into the message by including a % character wherever you
want the variable value to appear. For example:

rentals.tape_id := ''AH-54706'';
RAISE DEBUG ''tape_id = %'', rentals.tape_id;

When these statements are executed, the message tape_id = AH-54706 will be written to the server's log file. For each (single) % character in
the message string, you must include a variable. If you want to include a literal percent character in the message, write it as %%. For example:

percentage := 20;
RAISE NOTICE ''Top (%)%%'', percentage;

translates to Top (20)%.
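The placeholder rules can be collected into one small function. Treat this as a sketch: raise_demo() and its parameter names are hypothetical, and because the function body is dollar-quoted, the message strings use ordinary single quotes instead of the doubled quotes shown above.

```sql
-- Hypothetical demonstration function; not part of the sample database.
CREATE OR REPLACE FUNCTION raise_demo( tape_id TEXT, percentage INTEGER ) RETURNS void AS $$
BEGIN
    -- One variable for the single % placeholder
    RAISE NOTICE 'tape_id = %', tape_id;

    -- Two placeholders, so two variables, substituted left to right
    RAISE NOTICE 'tape_id = %, percentage = %', tape_id, percentage;

    -- %% produces one literal percent sign
    RAISE NOTICE 'Top (%)%%', percentage;
END;
$$ LANGUAGE 'plpgsql';

SELECT raise_demo( 'AH-54706', 20 );
```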

The RAISE statement is useful for debugging your PL/pgSQL code; it's even better for debugging someone else's code. I find that the DEBUG
severity is perfect for leaving evidence in the server log. When you ship a PL/pgSQL function to your users, you might want to leave a few
RAISE DEBUG statements in your code. This can certainly make it easier to track down an elusive bug (remember, users never write down error
messages, so you might as well arrange for the messages to appear in a log file). I use the RAISE NOTICE statement for interactive debugging.
When I am first building a new PL/pgSQL function, the chances are very slim that I'll get it right the first time. (Funny, it doesn't seem to
matter how trivial or complex the function is.) I start out by littering my code with RAISE NOTICE statements; I'll usually print the value of
each function parameter as well as key information from each record that I SELECT. As it becomes clearer that my code is working, I'll either
remove or comment out (using "--") the RAISE NOTICE statements. Before I send out my code to a victim, er, user, I'll find strategic places
where I can leave RAISE DEBUG statements. The RAISE DEBUG statement is perfect for reporting things that should never happen. For example,
because of the referential integrity that I built into the tapes, customers, and rentals tables, I should never find a rentals record that refers
to a nonexistent customer. I'll check for that condition (a missing customer) and report the error with a RAISE DEBUG statement. Of course, in
some circumstances, a missing customer should really trigger a RAISE EXCEPTION, but if I just happen to notice the problem in passing and it
really doesn't affect the current function, I'll just note it with a RAISE DEBUG. So, the rule I follow is: if the condition prevents further
processing, I RAISE an EXCEPTION; if the condition should never happen, I RAISE a DEBUG message; if I am still developing my code, I RAISE a
NOTICE.
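The first part of that rule ties RAISE back to the EXCEPTION section. The sketch below uses a hypothetical function, check_balance(), and assumes a PostgreSQL release that provides the SQLERRM variable inside an exception handler (8.1 or later, as far as I know).

```sql
-- Hypothetical demonstration function; not part of the sample database.
CREATE OR REPLACE FUNCTION check_balance( balance NUMERIC ) RETURNS TEXT AS $$
BEGIN
    IF balance < 0 THEN
        -- The condition prevents further processing, so raise an exception
        RAISE EXCEPTION 'negative balance: %', balance;
    END IF;

    RETURN 'ok';
EXCEPTION
    WHEN raise_exception THEN
        -- SQLERRM holds the text of the message that was raised
        RETURN 'trapped: ' || SQLERRM;
END;
$$ LANGUAGE 'plpgsql';

SELECT check_balance( 10 );   -- no exception is raised
SELECT check_balance( -5 );   -- the raise_exception handler runs
```

In real code you would more likely let the exception propagate to the caller; trapping it in the same block is done here only to show that RAISE EXCEPTION produces a raise_exception condition.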

Cursors

Direct cursor support is new in PL/pgSQL version 7.2. Processing a result set using a cursor is similar to processing a result set using a FOR
loop, but cursors offer a few distinct advantages that you'll see in a moment.

You can think of a cursor as a name for a result set. You must declare a cursor variable just as you declare any other variable. The following
code snippet shows how you might declare a cursor variable:

...
DECLARE
rental_cursor CURSOR FOR SELECT * FROM rentals;
...

rental_cursor is declared to be a cursor for the result set of the query SELECT * FROM rentals. When you declare a variable of type
CURSOR, you must include a query. The cursor variable is said to be bound to this query, and the variable is a bound cursor variable.

Before you can use a bound cursor, you must open the cursor using the OPEN statement:

...
DECLARE
rental_cursor CURSOR FOR SELECT * FROM rentals;
BEGIN

OPEN rental_cursor;

...

If you try to OPEN a cursor that is already open, you will receive an error message (cursor "name" already in use). If you try to FETCH
(see the section that follows) from a cursor that has not been opened, you'll receive an error message (cursor "name" is invalid). When
you use a cursor, you first DECLARE it, then OPEN it, FETCH from it, and finally CLOSE it, in that order. You can repeat the OPEN, FETCH, CLOSE
cycle if you want to process the cursor results again.
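The following sketch runs the OPEN, FETCH, CLOSE cycle twice on the same cursor. The function name cursor_cycle_demo() is hypothetical, and the cursor reads from generate_series() instead of the rentals table so that the example runs without the sample database.

```sql
-- Hypothetical demonstration function; not part of the sample database.
CREATE OR REPLACE FUNCTION cursor_cycle_demo( ) RETURNS INTEGER AS $$
DECLARE
    numbers CURSOR FOR SELECT n FROM generate_series( 1, 3 ) AS n;
    num     INTEGER;
    total   INTEGER := 0;
BEGIN
    OPEN numbers;
    FETCH numbers INTO num;       -- first row of the result set
    total := total + num;
    CLOSE numbers;

    OPEN numbers;                 -- reopening runs the query again from the start
    FETCH numbers INTO num;       -- first row again, not the second
    total := total + num;
    CLOSE numbers;

    RETURN total;
END;
$$ LANGUAGE 'plpgsql';

SELECT cursor_cycle_demo();
```

Because each OPEN starts the query over, both FETCH statements see the first row, and the function returns 2 rather than 3.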

FETCH

After a bound cursor has been opened, you can retrieve the result set (one row at a time) using the FETCH statement. When you fetch a row
from a cursor, you have to provide one or more destination variables that PL/pgSQL can stuff the results into. The syntax for the FETCH
statement is

FETCH cursor-name INTO destination [ , destination [...]];

The destination (or destinations) must match the shape of a row returned by the cursor. For example, if the cursor SELECTs a row from
the rentals table, there are three possible destinations:

• A variable of type rentals%ROWTYPE

• Three variables: one of type rentals.tape_id%TYPE, one of type rentals.customer_id%TYPE, and the last of type
rentals.rental_date%TYPE

• A variable of type RECORD

Let's look at each of these destination types in more detail.

When you FETCH into a variable of some %ROWTYPE, you can refer to the individual columns using the usual variable.column notation. For
example:

...
DECLARE
rental_cursor CURSOR FOR SELECT * FROM rentals;
rental rentals%ROWTYPE;
BEGIN

OPEN rental_cursor;

FETCH rental_cursor INTO rental;


--
-- I can now access rental.tape_id,
-- rental.customer_id, and rental.rental_date
--
IF ( overdue( rental.rental_date )) THEN
...

Next, I can FETCH into a comma-separated list of variables. In the previous example, the rental_cursor cursor will return rows that each
contain three columns. Rather than fetching into a %ROWTYPE variable, I can declare three separate variables (of the appropriate types) and
FETCH into those instead:

...
DECLARE
rental_cursor CURSOR FOR SELECT * FROM rentals;
tape_id rentals.tape_id%TYPE;
customer_id rentals.customer_id%TYPE;
rental_date rentals.rental_date%TYPE;
BEGIN

OPEN rental_cursor;

FETCH rental_cursor INTO tape_id, customer_id, rental_date;

IF ( overdue( rental_date )) THEN


...

You are not required to use variables declared with %TYPE, but this is the perfect place to do so. At the time you create a function, you usually
know which columns you will be interested in, and declaring variables with %TYPE will make your functions much less fragile in cases where
the referenced column types might change.

You cannot combine composite variables and scalar variables in the same FETCH statement [5]:

[5] This seems like a bug to me. You may be able to combine composite and scalar variables in a future release.

...
DECLARE
rental_cursor CURSOR FOR SELECT *, now() - rental_date FROM rentals;
rental rentals%ROWTYPE;
elapsed INTERVAL;
BEGIN

OPEN rental_cursor;

FETCH rental_cursor INTO rental, elapsed; -- WRONG! Can't combine
                                          -- composite and scalar
                                          -- variables in the same
                                          -- FETCH

IF ( overdue( rental.rental_date )) THEN


...

The third type of destination that you can use with a FETCH statement is a variable of type RECORD. You may recall from earlier in this
chapter that a RECORD variable is something of a chameleon: it adjusts to whatever kind of data you put into it. For example, the
following snippet uses the same RECORD variable to hold two differently shaped rows:

...
DECLARE
rental_cursor CURSOR FOR SELECT * FROM rentals;
customer_cursor CURSOR FOR SELECT * FROM customers;
my_data RECORD;
BEGIN
OPEN rental_cursor;
OPEN customer_cursor;

FETCH rental_cursor INTO my_data;


-- I can now refer to:
-- my_data.tape_id
-- my_data.customer_id
-- my_data.rental_date

FETCH customer_cursor INTO my_data;


-- Now I can refer to:
-- my_data.customer_id
-- my_data.customer_name
-- my_data.phone
-- my_data.birth_date
-- my_data.balance
...

After you have executed a FETCH statement, how do you know whether a row was actually retrieved? If you FETCH after retrieving the entire
result, no error occurs. Instead, each PL/pgSQL function has an automatically declared variable named FOUND. FOUND is a BOOLEAN variable
that is set by the PL/pgSQL interpreter to indicate various kinds of state information. Table 7.2 lists the points in time where PL/pgSQL sets
the FOUND variable and the corresponding values.

Table 7.2. FOUND Events and Values

Event                           Value
------------------------------  ---------------------------
Start of each function          FALSE
Start of an integer-FOR loop    FALSE
Within an integer-FOR loop      TRUE
Start of a FOR...SELECT loop    FALSE
Within a FOR...SELECT loop      TRUE
Before SELECT INTO statement    FALSE
After SELECT INTO statement     TRUE (if rows are returned)
Before FETCH statement          FALSE
After FETCH statement           TRUE (if a row is returned)
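The SELECT INTO rows of the table are easy to check on their own. In this sketch, found_demo() is a hypothetical function that searches the numbers 1 through 3 (produced by generate_series(), so no sample tables are needed) and simply returns whatever FOUND holds after the SELECT INTO.

```sql
-- Hypothetical demonstration function; not part of the sample database.
CREATE OR REPLACE FUNCTION found_demo( wanted INTEGER ) RETURNS BOOLEAN AS $$
DECLARE
    hit INTEGER;
BEGIN
    -- FOUND is set to TRUE only if this SELECT INTO returns a row
    SELECT n INTO hit FROM generate_series( 1, 3 ) AS n WHERE n = wanted;

    RETURN FOUND;
END;
$$ LANGUAGE 'plpgsql';

SELECT found_demo( 2 );   -- a row is returned, so FOUND is TRUE
SELECT found_demo( 9 );   -- no row is returned, so FOUND is FALSE
```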

So, you can see that FOUND is set to TRUE if a FETCH statement returns a row. Let's see how to put all the cursor-related statements together
into a single PL/pgSQL function:

...
DECLARE
next_rental CURSOR FOR SELECT * FROM rentals;
rental rentals%ROWTYPE;
BEGIN
OPEN next_rental;

LOOP
FETCH next_rental INTO rental;
EXIT WHEN NOT FOUND;
PERFORM process_rental( rental );
END LOOP;

CLOSE next_rental;
END;
...

The first thing you do in this code snippet is OPEN the cursor. Next, you enter a LOOP that will process every row returned from the cursor.
Inside of the LOOP, you FETCH a single record, EXIT the loop if the cursor is exhausted, and call another function (process_rental()) if not.
After the loop terminates, close the cursor using the CLOSE statement.
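If you want to run that loop without the sample tables, here is a self-contained sketch of the same pattern, shown next to the FOR loop that does the same job. Both function names (sum_with_cursor() and sum_with_for()) are hypothetical, and the rows come from generate_series() rather than from rentals.

```sql
-- Hypothetical demonstration functions; not part of the sample database.
CREATE OR REPLACE FUNCTION sum_with_cursor( ) RETURNS INTEGER AS $$
DECLARE
    next_number CURSOR FOR SELECT n FROM generate_series( 1, 5 ) AS n;
    num         INTEGER;
    total       INTEGER := 0;
BEGIN
    OPEN next_number;

    LOOP
        FETCH next_number INTO num;
        EXIT WHEN NOT FOUND;          -- the cursor is exhausted
        total := total + num;
    END LOOP;

    CLOSE next_number;
    RETURN total;
END;
$$ LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION sum_with_for( ) RETURNS INTEGER AS $$
DECLARE
    rec   RECORD;
    total INTEGER := 0;
BEGIN
    -- The FOR loop opens, fetches from, and closes an implicit cursor
    FOR rec IN SELECT n FROM generate_series( 1, 5 ) AS n LOOP
        total := total + rec.n;
    END LOOP;

    RETURN total;
END;
$$ LANGUAGE 'plpgsql';

SELECT sum_with_cursor(), sum_with_for();   -- both add up the same five rows
```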

So far, it looks like a cursor loop is pretty much the same as a FOR-IN-SELECT loop. What else can you do with a cursor?

Parameterized Cursors

You've seen that you must provide a SELECT statement when you declare a CURSOR. Quite often, you'll find that you don't know the exact
values involved in the query at the time you're writing a function. You can declare a parameterized cursor to solve this problem.

A parameterized cursor is similar in concept to a parameterized function. When you define a function, you can declare a set of parameters
(these are called the formal parameters, or formal arguments); those parameters can be used within the function to change the results of the
function. If you define a function without parameters, the function will always return the same results (unless influenced by global, external
data). Each language imposes restrictions on where you can use a parameter within a function. In general, function parameters can be used
anywhere that a value-yielding expression can be used. When you make a call to a parameterized function, you provide a value for each
parameter: The values that you provide (these are called the actual parameters, or actual arguments) are substituted inside of the function
wherever the formal parameters appear.

When you define a cursor, you can declare a set of formal parameters; those parameters can be used with the cursor to change the result set
of the query. If you define a cursor without parameters, the query will always return the same result set, unless influenced by external data.
PL/pgSQL restricts the places that you can use a parameter within a cursor definition. A cursor parameter can be used anywhere that a
value-yielding expression can be used. When you open a cursor, you must specify values for each formal parameter. The actual parameters
are substituted inside of the cursor wherever the formal parameters appear.

Let's look at an example:

1 ...
2 DECLARE
3 next_customer CURSOR (ID INTEGER) FOR
4 SELECT * FROM customers WHERE
5 customer_id = ID;
6 customer customers%ROWTYPE;
7 target_customer ALIAS FOR $1;
8 BEGIN
9
10 OPEN next_customer( target_customer );
11 ...

Lines 3, 4, and 5 declare a parameterized cursor. This cursor has a single formal parameter: an INTEGER named ID. Notice (at the end of line
5) that I have used the formal parameter within the cursor definition. When I open this cursor, I'll provide an INTEGER value for the ID
parameter. The actual parameter that I provide will be substituted into the query wherever the formal parameter is used. So, if
target_customer is equal to, say, 42, the cursor opened at line 10 will read:

SELECT * FROM customers WHERE customer_id = 42;

The full syntax for a cursor declaration is

variable-name CURSOR
[ (param-name param-type [, param-name param-type ...] ) ]
FOR select-query;

The full syntax for an OPEN statement is

OPEN cursor-name [ ( actual-param-value [, actual-param-value...] ) ];

You would parameterize a cursor for the same reasons that you would parameterize a function: you want the results to depend on the actual
arguments. When you parameterize a cursor, you are also making the cursor more reusable. For example, I might want to process all the
rentals in my inventory, but I want to process the rentals one customer at a time. If I don't use a parameterized cursor, I have to declare one
cursor for each of my customers (and I have to know the set of customers at the time I write the function). Using a parameterized cursor, I
can declare the cursor once and provide different actual arguments each time I open the cursor:

CREATE OR REPLACE FUNCTION process_rentals_by_customer( ) RETURNS void AS $$
DECLARE
  next_customer CURSOR FOR SELECT * FROM customers;
  next_rental CURSOR( ID integer ) FOR
    SELECT * FROM rentals WHERE customer_id = ID;
  customer customers%ROWTYPE;
  rental rentals%ROWTYPE;
BEGIN

  OPEN next_customer;

  LOOP
    FETCH next_customer INTO customer;
    EXIT WHEN NOT FOUND;

    OPEN next_rental( customer.customer_id );

    LOOP
      FETCH next_rental INTO rental;
      EXIT WHEN NOT FOUND;

      PERFORM process_rental( customer, rental );

    END LOOP;

    CLOSE next_rental;
  END LOOP;

  CLOSE next_customer;

  RETURN;

END;

$$ LANGUAGE 'plpgsql';

Notice that you can OPEN and CLOSE a cursor as often as you like. A cursor must be closed before it can be opened again. Each time you open a
parameterized cursor, you can provide new actual parameters.
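To see the actual parameters change between one OPEN and the next without needing the sample tables, you could try a sketch like this one. The function name first_multiples() and the cursor parameter divisor are hypothetical; the cursor selects from generate_series() instead of rentals.

```sql
-- Hypothetical demonstration function; not part of the sample database.
CREATE OR REPLACE FUNCTION first_multiples( ) RETURNS TEXT AS $$
DECLARE
    multiples CURSOR ( divisor INTEGER ) FOR
        SELECT n FROM generate_series( 1, 20 ) AS n WHERE n % divisor = 0;
    num    INTEGER;
    result TEXT;
BEGIN
    OPEN multiples( 3 );              -- divisor = 3 for this OPEN
    FETCH multiples INTO num;         -- first multiple of 3
    result := num::TEXT;
    CLOSE multiples;                  -- must close before opening again

    OPEN multiples( 7 );              -- same cursor, new actual parameter
    FETCH multiples INTO num;         -- first multiple of 7
    result := result || ',' || num::TEXT;
    CLOSE multiples;

    RETURN result;
END;
$$ LANGUAGE 'plpgsql';

SELECT first_multiples();
```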

Cursor References

Now, let's turn our attention to another aspect of cursor support in PL/pgSQL: cursor references.

When you declare a CURSOR variable, you provide a SELECT statement that is bound to the cursor. You can't change the text of the query
after the cursor has been declared. Of course, you can parameterize the query to change the results, but the shape of the query remains the
same: If the query returns rows from the tapes table, it will always return rows from the tapes table.

Instead of declaring a CURSOR, you can declare a variable to be of type REFCURSOR. A REFCURSOR is not actually a cursor, but a reference to a
cursor. The syntax for declaring a REFCURSOR is

DECLARE
ref-name REFCURSOR;
...

Notice that you do not specify a query when creating a REFCURSOR. Instead, a cursor is bound to a REFCURSOR at runtime. Here is a simple
example:

1 ...
2 DECLARE
3 next_rental CURSOR FOR SELECT * FROM rentals;
4 next_tape CURSOR FOR SELECT * FROM tapes;
5 rental rentals%ROWTYPE;
6 tape tapes%ROWTYPE;
7 next_row REFCURSOR;
8 BEGIN
9 OPEN next_rental;
10 next_row := next_rental;
11 FETCH next_rental INTO rental;
12 FETCH next_row INTO rental;
13 CLOSE next_rental;
14
15 next_row := next_tape;
16 OPEN next_tape;
17 FETCH next_row INTO tape;
18 CLOSE next_row;
19 ...

In this block, I've declared two cursors and one cursor reference. One of the cursors returns rows from the rentals table, and the other
returns rows from the tapes table.

At line 9, the next_rental cursor opens. At line 10, I give a value to the next_row cursor reference. We now have two ways to access the
next_rental cursor: through the next_rental cursor variable and through the next_row cursor reference. At this point, next_row refers to
the next_rental cursor. You can see (at lines 11 and 12) that you can FETCH a row using either variable. Both FETCH statements return a
row from the rentals table.

At line 15, the next_row cursor reference points to a different cursor. Now, when you FETCH from next_row, you'll get a row from the tapes
table. Notice that you can point next_row to a cursor that has not yet been opened. You can CLOSE a cursor using a cursor reference, but you
can't OPEN a cursor using a cursor reference.

Actually, you can open a cursor using a REFCURSOR; you just can't open a named cursor. When you declare a CURSOR variable, you are really
creating a PostgreSQL cursor whose name is the same as the name of the variable. In the previous example, you created one cursor (not just
a cursor variable) named next_rental and a cursor named next_tape. PL/pgSQL allows you to create anonymous cursors using REFCURSOR
variables. An anonymous cursor is a cursor that doesn't have a name [6]. You create an anonymous cursor using the OPEN statement, a
REFCURSOR, and a SELECT statement:

[6] An anonymous cursor does in fact have a name, but PostgreSQL constructs the name, and it isn't very reader-friendly. An
anonymous cursor has a name such as <unnamed cursor 42>.

1 ...
2 DECLARE
3 next_row REFCURSOR;
4 BEGIN
5 OPEN next_row FOR SELECT * FROM customers;
6 ...

At line 5, you are creating an anonymous cursor and binding it to the next_row cursor reference. After an anonymous cursor has been
opened, you can treat it like any other cursor. You can FETCH from it, CLOSE it, and lose it. That last part might sound a little fishy, so let me
explain further. Take a close look at the following code fragment:

1 CREATE FUNCTION leak_cursors( INTEGER ) RETURNS INTEGER AS '


2 DECLARE
3 next_customer CURSOR FOR SELECT * FROM customers;
4 next_rental REFCURSOR;
5 customer customers%ROWTYPE;
6 rental rentals%ROWTYPE;
7 count INTEGER := 0;
8 BEGIN
9
10 OPEN next_customer;
11
12 LOOP
13 FETCH next_customer INTO customer;
14 EXIT WHEN NOT FOUND;
15 OPEN next_rental FOR
16 SELECT * FROM rentals
17 WHERE rentals.customer_id = customer.customer_id;
18
19 LOOP
20 FETCH next_rental INTO rental;
21 EXIT WHEN NOT FOUND;
22
23 RAISE NOTICE ''customer_id = %, rental_date = %'',
24 customer.customer_id, rental.rental_date;
25
26 count := count + 1;
27 END LOOP;
28
29 next_rental := NULL;
30
31 END LOOP;
32 CLOSE next_customer;
33 RETURN( count );
34 END;
35 ' LANGUAGE 'plpgsql';

This function contains two loops: an outer loop that reads through the customers table and an inner loop that reads each rental for a given
customer. The next_customer cursor is opened (at line 10) before the outer loop begins. The next_rental cursor is bound and opened (at
lines 15, 16, and 17) just before the inner loop begins. After the inner loop completes, I set the next_rental cursor reference to NULL and
continue with the outer loop. What happens to the cursor that was bound to next_rental? I didn't explicitly close the cursor, so it must
remain open. After executing the assignment statement at line 29, I have no way to access the cursor again. Remember, it's an anonymous
cursor, so I can't refer to it by name. This situation is called a resource leak. A resource leak occurs when you create an object (in this case, a
cursor) and then you lose all references to that object. If you can't find the object again, you can't free the resource. Avoid resource leaks;
they're nasty and can cause performance problems. Resource leaks will also cause your code to fail if you run out of a resource (such as
memory space). We can avoid the resource leak shown in this example by closing the next_rental cursor before setting it to NULL.
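Here is a sketch of the leak-free pattern. The function name no_leak_demo() is hypothetical, and the anonymous cursor reads from generate_series() rather than from rentals so that the example is self-contained; the important part is the order of the last two statements in the loop.

```sql
-- Hypothetical demonstration function; not part of the sample database.
CREATE OR REPLACE FUNCTION no_leak_demo( ) RETURNS INTEGER AS $$
DECLARE
    next_row REFCURSOR;
    num      INTEGER;
    fetched  INTEGER := 0;
BEGIN
    FOR i IN 1..3 LOOP
        -- Bind and open a new anonymous cursor on each pass
        OPEN next_row FOR SELECT n FROM generate_series( 1, i ) AS n;

        LOOP
            FETCH next_row INTO num;
            EXIT WHEN NOT FOUND;
            fetched := fetched + 1;
        END LOOP;

        CLOSE next_row;        -- release the anonymous cursor first ...
        next_row := NULL;      -- ... then it is safe to drop the reference
    END LOOP;

    RETURN fetched;            -- 1 + 2 + 3 rows were fetched in total
END;
$$ LANGUAGE 'plpgsql';

SELECT no_leak_demo();
```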

You've seen what not to do with a cursor reference, but let's see what cursor references are really good for. The nice thing about a cursor
reference is that you can pass the reference to another function, or you can return a reference to the caller. These are powerful features. By
sharing cursor references between functions, you can factor your PL/pgSQL code into reusable pieces.

One of the more effective ways to use cursor references is to separate the code that processes a cursor from the code that creates the
cursor. For example, you may find that you need a function to compute the total amount of money that you have received from a given
customer over a given period of time. I might start by creating a single function that constructs a cursor and processes each row in that
cursor:

...
OPEN next_rental FOR
SELECT * FROM rentals WHERE
customer_id = $1 AND
rental_date BETWEEN $2 AND $3;

LOOP
FETCH next_rental INTO rental;
-- accumulate rental values here
...

This is a good start, but it works only for a single set of conditions: a given customer and a given pair of dates. Instead, you can factor this
one function into three separate functions.

The first function creates a cursor that, when opened, will return all rentals records for a given customer within a given period; the cursor is
returned to the caller:

CREATE FUNCTION
select_rentals_by_customer_interval( INTEGER, DATE, DATE )
RETURNS REFCURSOR AS '
DECLARE
next_rental REFCURSOR;
BEGIN
OPEN next_rental FOR
SELECT * FROM RENTALS WHERE
customer_id = $1 AND
rental_date BETWEEN $2 AND $3;
RETURN( next_rental );
END;
' LANGUAGE 'plpgsql';

The second function, given a cursor that returns rentals records, computes the total value of the rentals accessible through that cursor:

CREATE FUNCTION
compute_rental_value( REFCURSOR )
RETURNS NUMERIC AS '
DECLARE
total NUMERIC(7,2) := 0;
rental rentals%ROWTYPE;
next_rental ALIAS FOR $1;
BEGIN
LOOP
FETCH next_rental INTO rental;
EXIT WHEN NOT FOUND;
-- accumulate rental values here
--
-- pretend that this is a complex
-- task which requires loads of amazingly
-- clever code
...
END LOOP;
RETURN( total );
END;
' LANGUAGE 'plpgsql';

The last function invokes the first two:

CREATE FUNCTION
compute_value_by_customer_interval( INTEGER, DATE, DATE )
RETURNS NUMERIC AS '
DECLARE
curs REFCURSOR;
total NUMERIC(7,2);
BEGIN
curs := select_rentals_by_customer_interval( $1, $2, $3 );
total := compute_rental_value( curs );
CLOSE curs;
RETURN( total );
END;
' LANGUAGE 'plpgsql';

The advantage to this approach is that you can construct a cursor using different selection criteria and call compute_rental_value(). For
example, you might want to compute the total values of all rentals of a given tape:

CREATE FUNCTION compute_tape_value( VARCHAR )
RETURNS NUMERIC AS '
DECLARE
curs REFCURSOR;
total NUMERIC(7,2);
BEGIN
OPEN curs FOR SELECT * FROM rentals WHERE tape_id = $1;
total := compute_rental_value( curs );
CLOSE curs;
RETURN( total );
END;
' LANGUAGE 'plpgsql';

Triggers

So far, all the functions defined in this chapter have been called explicitly, either by using a SELECT function()
command or by using the function within an expression. You can also call certain PL/pgSQL functions automatically. A trigger
is a function that is called whenever a specific event occurs in a given table. An INSERT command, an UPDATE command, or a
DELETE command can cause a trigger to execute.

Let's look at a simple example. You currently have a customers table defined like this:

CREATE TABLE customers
(
customer_id integer primary key,
customer_name character varying(50) not null,
phone character(8),
birth_date date,
balance decimal(7,2)
);

You want to create a new table that you can use to archive any rows that are deleted from the customers table. You also
want to archive any updates to the customers table. Name this table customer_archive:

CREATE TABLE customer_archive
(
customer_id integer,
customer_name character varying(50) not null,
phone character(8),
birth_date date,
balance decimal(7,2),
user_changed varchar,
date_changed date,
operation varchar
);

Each row in the customer_archive table contains a complete customers record plus a few pieces of information about the
modification that took place.

Now, let's create a trigger function that executes whenever a change is made to a row in the customers table. A trigger
function is a function that takes no arguments and returns a special data type: TRIGGER. (I'll talk more about the information
returned by a trigger in a moment.)

CREATE FUNCTION archive_customer() RETURNS TRIGGER AS '
BEGIN
INSERT INTO customer_archive
VALUES
(
OLD.customer_id,
OLD.customer_name,
OLD.phone,
OLD.birth_date,
OLD.balance,
CURRENT_USER,
now(),
TG_OP
);
RETURN NULL;
END;
' LANGUAGE 'plpgsql';

Notice that I am using a variable in this function that I have not declared: OLD. Trigger functions have access to several
predefined variables that make it easier to find information about the context in which the trigger event occurred. The OLD
variable contains a copy of the original row when a trigger is executed because of an UPDATE or DELETE command. The NEW
variable contains a copy of the new row when a trigger is executed for an UPDATE or INSERT command.

When this trigger executes, it creates a new row in the customer_archive table. The new row will contain a copy of the
original customers row, the name of the user making the modification, the date that the modification was made, and the
type of operation: TG_OP will be set to 'UPDATE', 'INSERT', or 'DELETE'.

Table 7.3 contains a complete list of the predefined variables that you can use inside of a trigger function:

Table 7.3. Predefined Trigger Variables

Name        Type      Description
----------  --------  -------------------------------------------------------------------
NEW         %ROWTYPE  New values (for UPDATE and INSERT)
OLD         %ROWTYPE  Old values (for UPDATE and DELETE)
TG_NAME     name      Name of trigger
TG_WHEN     text      BEFORE or AFTER
TG_LEVEL    text      ROW or STATEMENT [7]
TG_OP       text      INSERT, UPDATE, or DELETE
TG_RELID    oid       Object ID of trigger table
TG_RELNAME  name      Name of trigger table
TG_NARGS    integer   Count of the optional arguments given to the CREATE TRIGGER command
TG_ARGV[]   text[]    Optional arguments given to the CREATE TRIGGER command

[ 7]
St at em ent t riggers are not support ed in Post greSQL, so TG_LEVEL will always be set to ROW.

Now that you have created a function, you have to define it as a trigger function. The CREATE TRIGGER command associates
a function with an event (or events) in a given table. Here is the command that you use for the archive_customer()
function:

1 CREATE TRIGGER archive_customer


2 AFTER DELETE OR UPDATE
3 ON customers
4 FOR EACH ROW
5 EXECUTE PROCEDURE archive_customer();

This is a rather unwieldy command, so let's look at it one line at a time.

The first line tells PostgreSQL that you want to create a new trigger. Each trigger has a name; in this case,
archive_customer. Trigger names must be unique within each table (in other words, I can have two triggers named foo as
long as the triggers are defined for two different tables). Inside the trigger function, the TG_NAME variable holds the
name of the trigger.

Line 2 specifies the event (or events) that cause this trigger to fire. In this case, I want the trigger to occur AFTER a
DELETE command or an UPDATE command. Altogether, PostgreSQL can fire a trigger BEFORE or AFTER an UPDATE command, an
INSERT command, or a DELETE command. In the trigger function, TG_WHEN is set to either BEFORE or AFTER, and TG_OP is set
to INSERT, UPDATE, or DELETE.

Line 3 associates this trigger with a specific table. This is not an optional clause; each trigger must be associated with
a specific table. You can't, for example, define a trigger that will execute on every INSERT statement regardless of the
table involved. You can use the TG_RELNAME variable in the trigger function to find the name of the associated table.
TG_RELID holds the object-ID (OID) of the table.

A single DELETE or UPDATE statement can affect multiple rows. The FOR EACH clause (line 4) determines whether a trigger
will execute once for each row or once for the entire statement. PostgreSQL does not support statement-level triggers at
the moment, so the only choice is FOR EACH ROW. Inside of the trigger function, TG_LEVEL can contain either ROW or
STATEMENT; but the only value currently implemented is ROW.

Line 5 finally gets around to telling PostgreSQL which function you actually want to execute when the specified events
occur.
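
To watch the trigger at work, a short session along the following lines may help. It is only a sketch: it assumes that
the archive_customer trigger shown above has been created and that the customers table holds a row whose customer_id is 3
(an arbitrary value chosen for illustration). The whole session runs inside a transaction that is rolled back, so no data
is changed permanently.

```sql
BEGIN;

-- Assumption: customers contains a row with customer_id = 3.
UPDATE customers SET balance = balance + 10 WHERE customer_id = 3;

-- The AFTER UPDATE trigger has copied the pre-update row (OLD) into
-- customer_archive, along with CURRENT_USER, now(), and TG_OP.
SELECT * FROM customer_archive;

-- Deleting the row fires the same trigger again, this time for a DELETE.
DELETE FROM customers WHERE customer_id = 3;

SELECT * FROM customer_archive;

-- Undo everything, including the archive rows written by the trigger.
ROLLBACK;
```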

The full syntax for the CREATE TRIGGER command is

CREATE TRIGGER trigger-name


[BEFORE | AFTER] [ INSERT | DELETE | UPDATE [OR ...]]
ON table-name FOR EACH ROW
EXECUTE PROCEDURE function-name [(args)];

TRIGGER Return Values

A trigger function can return a value just like any other function, but the value that you return can have far-reaching
consequences. If you return NULL from a row-level BEFORE trigger, PostgreSQL cancels the rest of the operation for that
row. That means that PostgreSQL won't fire any subsequent triggers and the INSERT, UPDATE, or DELETE won't occur for that
row. If you return a non-NULL value from a row-level BEFORE trigger, the value that you return must match the structure of
the table that you're modifying. If PostgreSQL is executing an UPDATE or INSERT command, the row value that you return
from the trigger function is used in place of the original value.

PostgreSQL ignores the return value of an AFTER trigger. PostgreSQL also ignores the return value of a statement-level
BEFORE trigger.
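
The two BEFORE-trigger outcomes described above can be sketched with a small, self-contained example. The price_list
table and the check_price() function are hypothetical names invented for this illustration; they are not part of the
sample database.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE price_list
(
  item_name  VARCHAR,
  price      NUMERIC
);

CREATE FUNCTION check_price() RETURNS TRIGGER AS $$
BEGIN
  -- Returning NULL from a row-level BEFORE trigger cancels the
  -- operation for this row: nothing is inserted or updated.
  IF NEW.price IS NULL THEN
    RETURN NULL;
  END IF;

  -- Returning a modified NEW row makes PostgreSQL store the
  -- modified values in place of the ones the command supplied.
  IF NEW.price < 0 THEN
    NEW.price := 0;
  END IF;

  RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';

CREATE TRIGGER check_price
  BEFORE INSERT OR UPDATE
  ON price_list
  FOR EACH ROW
  EXECUTE PROCEDURE check_price();

INSERT INTO price_list VALUES ( 'widget', -5 );    -- stored with price = 0
INSERT INTO price_list VALUES ( 'gadget', NULL );  -- skipped, no row stored
```

After the two INSERT commands, price_list holds a single row, the widget, with its price changed to 0.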

TRIGGER Function Arguments

Notice that the CREATE TRIGGER command allows you to specify optional arguments (indicated by args in the preceding
syntax diagram). You can include a list of string literals when you create a trigger (any arguments that are not of string
type are converted into strings). The arguments that you specify are made available to the trigger function through the
TG_NARGS and TG_ARGV variables. TG_NARGS contains an integer count of the number of arguments. TG_ARGV contains an array
of strings corresponding to the values that you specified when you created the trigger: TG_ARGV[0] contains the first
argument, TG_ARGV[1] contains the second argument, and so on. You can use the optional trigger arguments to pass extra
information that might help the trigger function know more about the context in which the trigger has executed. You might
find this useful when using the same function as a trigger for multiple tables; although in most situations, the TG_NAME,
TG_RELNAME, and TG_OP variables provide enough context information.
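
Here is a minimal sketch of a trigger function that reads its arguments. The change_log and demo_tapes tables, the
log_change() function, and the 'inventory' label are all hypothetical names chosen for this illustration.

```sql
-- Hypothetical tables, used only for this illustration.
CREATE TABLE change_log ( note TEXT );
CREATE TABLE demo_tapes ( tape_id INTEGER, title VARCHAR );

CREATE FUNCTION log_change() RETURNS TRIGGER AS $$
DECLARE
  label TEXT := 'unlabeled';
BEGIN
  -- TG_NARGS holds the number of arguments given to CREATE TRIGGER;
  -- TG_ARGV[0] is the first of them.
  IF TG_NARGS > 0 THEN
    label := TG_ARGV[0];
  END IF;

  INSERT INTO change_log
    VALUES ( label || ': ' || TG_OP || ' on ' || TG_RELNAME::TEXT );

  RETURN NULL;  -- ignored, because this is an AFTER trigger
END;
$$ LANGUAGE 'plpgsql';

-- The string literal 'inventory' arrives in the function as TG_ARGV[0].
CREATE TRIGGER log_change
  AFTER INSERT OR UPDATE OR DELETE
  ON demo_tapes
  FOR EACH ROW
  EXECUTE PROCEDURE log_change( 'inventory' );

INSERT INTO demo_tapes VALUES ( 1, 'The Godfather' );

SELECT note FROM change_log;  -- inventory: INSERT on demo_tapes
```

Because the label comes from the CREATE TRIGGER command rather than from the function body, the same log_change()
function could be attached to a second table with a different label.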

Polymorphic Functions

Starting with PostgreSQL version 8.0, you can write polymorphic functions in PL/pgSQL. A polymorphic function is a function
with at least one parameter of type ANYELEMENT or ANYARRAY. The types ANYELEMENT and ANYARRAY are called polymorphic types
because they can assume different "shapes" at run-time.

Here's a simple polymorphic function that will return the greater of two arguments:

-- ch07.sql
CREATE OR REPLACE FUNCTION max( arg1 ANYELEMENT, arg2 ANYELEMENT )
RETURNS ANYELEMENT AS $$
BEGIN

IF( arg1 > arg2 ) THEN


RETURN( arg1 );
ELSE
RETURN( arg2 );
END IF;

END;
$$ LANGUAGE 'plpgsql';

When you call this function with two INTEGER values, PL/pgSQL treats the function as if you had defined it as

CREATE OR REPLACE FUNCTION max( arg1 INTEGER, arg2 INTEGER )


RETURNS INTEGER AS $$

The polymorphic arguments arg1 and arg2 are assumed to be of type INTEGER.

If you call this function with two TEXT values, arg1 and arg2 are considered to be of type TEXT and the return value is
also assumed to be of type TEXT. In fact, you can call this function with two arguments of almost any type. The only
restriction is that the function must compile properly for a given type. In the case of the max() function, that means that
there must be a > operator that compares two values of that type (since the function compares arg1 and arg2 using the >
operator).

When you call a polymorphic function, the actual values that you provide for polymorphic parameters must all be of the same
type. You can't call the max() function with an INTEGER and a TEXT argument because arg1 and arg2 are both defined as
ANYELEMENT parameters. You can mix polymorphic arguments with other data types; you just have to ensure that all
polymorphic arguments are of the same type. If you define ANYARRAY arguments, the elements within those arrays must match
the type of other polymorphic parameters.
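
A few calls make the same-type rule concrete. This sketch assumes that the max() function from ch07.sql, shown above, has
already been created; the literal values are arbitrary.

```sql
-- Both polymorphic arguments are INTEGER, so the result is INTEGER.
SELECT max( 3, 7 );                          -- returns 7

-- Both polymorphic arguments are TEXT, so the result is TEXT.
SELECT max( 'apple'::TEXT, 'pear'::TEXT );

-- Mixing INTEGER and TEXT fails: no single type satisfies both
-- ANYELEMENT parameters, so PostgreSQL cannot find a matching function.
-- Remove the comment marker to see the error.
-- SELECT max( 3, 'pear'::TEXT );
```

The explicit ::TEXT casts are there so that the string literals have a known type when PostgreSQL resolves the polymorphic
parameters.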

You can also write functions that return a value of type ANYELEMENT or ANYARRAY. When you call such a function, PostgreSQL
infers the data type of the return value from the data type of the polymorphic arguments. You can't write a function that
returns a polymorphic value unless the function expects at least one ANYELEMENT (or ANYARRAY) argument.

Here's a function that returns a polymorphic value. firstSmaller() finds the first element in arg2 that's smaller than
arg1. arg2 must be a one-dimensional array:

-- ch07.sql
CREATE OR REPLACE FUNCTION firstSmaller( arg1 ANYELEMENT, arg2 ANYARRAY )
RETURNS ANYELEMENT AS $$
BEGIN

FOR i IN array_lower( arg2, 1 ) .. array_upper( arg2, 1 ) LOOP

IF arg2[i] < arg1 THEN


RETURN( arg2[i] );
END IF;

END LOOP;

RETURN NULL;

END;
$$ LANGUAGE 'plpgsql';

You can call this function with an INTEGER value and an array of INTEGERs, or a TEXT value and an array of TEXT values, or
a NUMERIC value and an array of NUMERIC values, and so on. If the polymorphic arguments (arg1 and arg2) are of type
INTEGER, the return value will be of type INTEGER. If you call firstSmaller() with NUMERIC values, the return value will be
of type NUMERIC.
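
The following calls sketch how the return type follows the argument types. They assume that firstSmaller() from ch07.sql
has been created as shown above; the array contents are arbitrary.

```sql
-- INTEGER value and INTEGER array: 9 and 7 are not smaller than 5,
-- 3 is, so the function returns 3 (an INTEGER).
SELECT firstSmaller( 5, ARRAY[ 9, 7, 3, 1 ] );           -- returns 3

-- NUMERIC value and NUMERIC array: the result is NUMERIC.
SELECT firstSmaller( 2.5, ARRAY[ 4.0, 1.5 ] );           -- returns 1.5

-- No element is smaller than arg1, so the loop ends and
-- the function returns NULL.
SELECT firstSmaller( 1, ARRAY[ 9, 7, 3 ] );
```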

A function that returns a polymorphic value automatically inherits an extra variable named $0. You can ALIAS $0 to a more
descriptive name, such as result, to make it easier to read your code. The type of $0 is the same as the type of the return
value; in other words, the data type of $0 matches the data type of the polymorphic arguments.

The sum() function, shown here, returns a polymorphic value.


-- ch07.sql
CREATE OR REPLACE FUNCTION sum( arg1 ANYARRAY ) RETURNS ANYELEMENT AS $$
DECLARE
result ALIAS FOR $0;
BEGIN

result := 0;

FOR i IN array_lower( arg1, 1 ) .. array_upper( arg1, 1 ) LOOP

IF arg1[i] IS NOT NULL THEN


result := result + arg1[i];
END IF;

END LOOP;

RETURN( result );

END;

$$ LANGUAGE 'plpgsql';

The data type for $0 is inferred from the type of arg1. PL/pgSQL always initializes the return value to NULL. Because this
function accumulates result as it progresses through the arg1 array, you must re-initialize result to 0 before you can add
to it (remember, NULL + 5 is not the same as 0 + 5; NULL + 5 = NULL).

Note that you can't call the sum() function with an array of TEXT values because PostgreSQL doesn't define a TEXT + TEXT
operator. You can call sum() with an array of any numeric type (INTEGER, REAL, NUMERIC, and so on).
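
As a quick check, the calls below sketch how sum() behaves with two numeric array types. They assume that the sum()
function from ch07.sql has been created as shown above; the array contents are arbitrary.

```sql
-- INTEGER array: $0 (aliased as result) is an INTEGER.
SELECT sum( ARRAY[ 1, 2, 3 ] );        -- returns 6

-- NUMERIC array: $0 is a NUMERIC.
SELECT sum( ARRAY[ 1.5, 2.25 ] );      -- returns 3.75

-- A TEXT array fails inside the function, because there is no
-- TEXT + TEXT operator. Remove the comment marker to see the error.
-- SELECT sum( ARRAY[ 'a', 'b' ]::TEXT[] );
```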

PL/pgSQL and Security

By default, a PL/pgSQL function executes with the privileges of the user that calls it. That's safe because an unprivileged
user won't gain extra privileges simply by calling a PL/pgSQL function. However, there are times when you may want to
convey extra privileges to a function. For example, you might hide sensitive information (such as payroll data) from a
clerical user, but you want that user to "close the books" at the end of each month. Presumably, the close_the_books()
function can do its work without exposing secret data to the user. If that's the case, you can tell PostgreSQL that you
want the close_the_books() function to inherit the privileges of the author of the function. To convey extra privileges to
a function, just add a SECURITY clause to the function definition. The SECURITY clause follows the function body and can
precede or follow the LANGUAGE clause:

CREATE [OR REPLACE] FUNCTION name ( [[argname] argtype [, ...] ] )


RETURNS return_type
AS $$definition$$
LANGUAGE langname | [ SECURITY INVOKER | SECURITY DEFINER ]

For example:

CREATE OR REPLACE FUNCTION close_the_books( ) RETURNS void AS $$


BEGIN
...
END;
$$ LANGUAGE 'plpgsql' SECURITY DEFINER;

If you don't include a SECURITY clause, PostgreSQL assumes SECURITY INVOKER (meaning that the function executes with the
privileges of the invoker). Be aware that when you convey extra privileges to close_the_books(), you are also conveying
extra privileges to any SECURITY INVOKER functions called by close_the_books().
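
The following sketch fills in one possible close_the_books() so that the privilege arrangement can be seen end to end. The
payroll table, its columns, and the clerk user are hypothetical names chosen for this illustration; the clerk user is
assumed to exist already, and the statements are meant to be run by the privileged author of the function.

```sql
-- Hypothetical table holding the sensitive data.
CREATE TABLE payroll
(
  employee  VARCHAR,
  salary    NUMERIC,
  closed    BOOLEAN DEFAULT FALSE
);

CREATE OR REPLACE FUNCTION close_the_books() RETURNS void AS $$
BEGIN
  -- Runs with the privileges of the function's author, not the caller.
  UPDATE payroll SET closed = TRUE;
  RETURN;
END;
$$ LANGUAGE 'plpgsql' SECURITY DEFINER;

-- The clerk gets no direct access to the payroll table...
REVOKE ALL ON payroll FROM clerk;

-- ...and only the clerk (besides the author) may call the function.
REVOKE ALL ON FUNCTION close_the_books() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION close_the_books() TO clerk;
```

With this arrangement, clerk can run SELECT close_the_books() even though a direct SELECT or UPDATE on payroll by clerk is
rejected.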

Chapter 15. Using PostgreSQL with PHP

PHP is a general-purpose programming language, but PHP is most commonly used to build dynamic web pages. A dynamic web
page is a document that is regenerated each time it is displayed. For example, each time you point your web browser to
cnn.com, you see the latest news. PHP is useful for building dynamic web pages because you can embed PHP programs within
HTML documents. In fact, you can produce HTML documents from a PHP script.

PHP Architecture Overview

The job of a web server (such as Apache or Microsoft's IIS) is to reply to requests coming from a client (usually a web
browser). When a browser connects to a web server, it requests information by sending a URL (Uniform Resource Locator).
For example, if you browse to the URL http://www.postgresql.org/software.html, your web browser connects to the server at
www.postgresql.org and requests a file named software.html.

After the web server has received this request, it must decide how to reply. If the requested file cannot be found, you'll
see the all too familiar HTTP 404 - File not found. Most web servers will choose a response based on the extension of the
requested file. A filename ending with .html (or .htm) is usually associated with a text file containing an HTML document.

Occasionally, you'll see a URL that ends in the suffix .php. A .php file is a script that is executed by a PHP processor
embedded within the web server. The script is executed each time a client requests it. The web browser never sees the .php
script; only the web server sees it. As the .php script executes, it sends information back to the browser (usually in the
form of an HTML document).

Listing 15.1 shows a simple PHP script.

Listing 15.1. Simple.php

1 <?php
2 # Filename: Simple.php
3 echo "Hey there, I'm a PHP script!";
4 ?>

When you run this script (I'll show you how in a moment), the PHP interpreter will send the string "Hey there, I'm a PHP
script!" to the browser.

PHP syntax might look a little strange at first, so here's a quick explanation. The script starts with the characters
<?php: This tells the web server that everything that follows, up to the next ?>, is a PHP script and should be interpreted
by the PHP processor. The next line is treated as a comment because it starts with a # character (PHP understands other
comment characters, such as "//" as well). The third line is where stuff happens: this is PHP's echo statement. echo is
pretty easy to understand; it just sends a string to the web browser. The characters on line 4 (?>) mark the end of the
script.

Web browsers don't understand how to interpret PHP scripts; they prefer HTML documents. If you can use PHP to send textual
data from the server to the browser, you can also send HTML documents (because an HTML document is textual data). This next
PHP script (see Listing 15.2) will create an HTML document (and send it to the browser) as it executes.

Listing 15.2. SimpleHTML.php

1 <?php
2 # Filename: SimpleHTML.php
3 echo "<HTML>\n";
4 echo "<HEAD>\n";
5 echo "<TITLE>SimpleHTML</TITLE>\n";
6 echo "<BODY>\n";
7 echo "<CENTER>I'm another simple PHP script</CENTER>\n";
8 echo "</BODY>\n";
9 echo "</HTML>";
10 ?>

When you use a web browser to request this file (SimpleHTML.php), the server will execute the script and send the following
text to the browser:

<HTML>
<HEAD>
<TITLE>SimpleHTML</TITLE>
<BODY>
<CENTER>I'm another simple PHP script</CENTER>
</BODY>
</HTML>

The web browser interprets this as an HTML document and displays the result, as shown in Figure 15.1.

Figure 15.1. SimpleHTML.php in a browser.

Of course, if you want to display static HTML pages, PHP doesn't really offer any advantages; we could have produced this
HTML document without PHP's help. The power behind a PHP script is that it can produce a different page each time it
executes. Listing 15.3 shows a script that displays the current time (in the server's time zone).

Listing 15.3. Time.php

1 <?php
2 //Filename: Time.php
3
4 $datetime = date( "Y-m-d H:i:s (T)" );
5
6 echo "<HTML>\n";
7 echo "<HEAD>\n";
8 echo "<TITLE>Time</TITLE>\n";
9 echo "<BODY>\n";
10 echo "<CENTER>";
11 echo "The current time " . $datetime;
12 echo "</CENTER>\n";
13 echo "</BODY>\n";
14 echo "</HTML>";
15 ?>

Line 4 retrieves the current date and time, and assigns it to the variable $datetime. Line 11 appends the value of
$datetime to a string literal and echoes the result to the browser. When you request this PHP script from within a browser,
you see a result such as that shown in Figure 15.2.

Figure 15.2. Time.php in a browser.

If you request this document again (say by pressing the Refresh button), the web server will execute the script again and
display a different result.

Prerequisites

To try the examples in this chapter, you will need access to a web server that understands PHP. I'll be using the Apache
web server with PHP installed, but you can also use PHP with Microsoft's IIS, Netscape's web server, and many other
servers. To find out if the PostgreSQL interface is available in your copy of PHP, call the phpinfo() function and look for
a section titled "PostgreSQL Support." If you see that section, you're ready to go. If you don't, you might have to compile
PHP from source code. You can learn how to compile PHP for your platform at the Zend website (http://www.zend.com). Be sure
to add the --with-pgsql option when you configure the PHP source code.

I'll assume that you are comfortable reading simple HTML documents and have some basic familiarity with PHP in general.
Most of this chapter focuses on the details of interacting with a PostgreSQL database from PHP. If you need more
information regarding general PHP programming, visit http://www.zend.com.

Client 1: Connecting to the Server

The first PHP/PostgreSQL client establishes a connection to a PostgreSQL server and displays the name of the database to
which you connect. Listing 15.4 shows the client1a.php script.

Listing 15.4. client1a.php

1 <?php
2 //Filename: client1a.php
3
4 $connect_string = "dbname=movies user=bruce";
5
6 $db_handle = pg_connect( $connect_string );
7
8 echo "<HTML>\n";
9 echo "<HEAD>\n";
10 echo "<TITLE>client1</TITLE>\n";
11 echo "<BODY>\n";
12 echo "<CENTER>";
13 echo "Connected to " . pg_dbname( $db_handle );
14 echo "</CENTER>\n";
15 echo "</BODY>\n";
16 echo "</HTML>";
17 ?>

This script connects to a database whose name is hard-coded in the script (at line 4). At line 6, you attempt to make a
connection by calling the pg_connect() function. pg_connect() returns a database handle (also called a database resource).
Many of the PostgreSQL-related functions require a database handle, so you need to capture the return value in a variable
($db_handle).

When you call pg_connect(), you supply a connection string that contains a list of property=value pairs [1]. Table 15.1
lists some of the properties that can appear in a pg_connect() connection string. In client1a.php, you specified two
properties: dbname=movies and user=bruce.

[1] When you call pg_connect() with a single argument, PHP calls the PQconnectdb() function from PostgreSQL's libpq API.
PHP is yet another PostgreSQL API implemented in terms of libpq.

Table 15.1. Connection Attributes

Connect-string Property   Environment Variable   Example
user                      PGUSER                 user=korry
password                  PGPASSWORD             password=cows
dbname                    PGDATABASE             dbname=accounting
host                      PGHOST                 host=jersey
hostaddr                  PGHOSTADDR             hostaddr=127.0.0.1
port                      PGPORT                 port=5432

If you don't specify one or more of the connect-string properties, default values are derived from the environment
variables shown in Table 15.1. If necessary, pg_connect() will use hard-coded default values for the host (localhost) and
port (5432) properties. See the section titled "Connection Properties" in Chapter 5, "Introduction to PostgreSQL
Programming," for a complete description of the connection properties that you can use when you call pg_connect().

I'm not very comfortable with the idea of leaving usernames and passwords sitting around in the web server's document tree.
It's just too easy to make a configuration error that will let a surfer grab your PHP script files in plain-text form. If
that happens, you've suddenly exposed your PostgreSQL password to the world.

A better solution is to factor the code that establishes a database connection into a separate PHP script and then move
that script outside the web server's document tree. Listing 15.5 shows a more secure version of your basic PostgreSQL/PHP
script.

Listing 15.5. client1b.php

1 <?php
2 //Filename: client1b.php
3
4 include( "secure/my_connect_pg.php" );
5
6 $db_handle = my_connect_pg( "movies" );
7
8 echo "<HTML>\n";
9 echo "<HEAD>\n";
10 echo "<TITLE>client1</TITLE>\n";
11 echo "<BODY>\n";
12 echo "<CENTER>";
13 echo "Connected to " . pg_dbname( $db_handle );
14 echo "</CENTER>\n";
15 echo "</BODY>\n";
16 echo "</HTML>";
17 ?>

If you compare this to client1a.php, you'll see that I've replaced the call to pg_connect() with a call to
my_connect_pg(). I've also added a call to PHP's include() directive. The include() directive is similar to the #include
directive found in most C programs: include(filename) inlines the named file into the PHP script (.php). Now let's look at
the my_connect_pg.php file (see Listing 15.6).

Listing 15.6. my_connect_pg.php

1 <?php
2 // File: my_connect_pg.php
3
4 function my_connect_pg( $dbname )
5 {
6 $connect_string = "user=korry password=cows dbname=";
7 $connect_string .= $dbname;
8
9 return( pg_connect( $connect_string ));
10 }
11 ?>

This script defines a function, named my_connect_pg(), which you can call to create a PostgreSQL connection.
my_connect_pg() expects a single string argument, which must specify the name of a PostgreSQL database.

Notice that the username and password are explicitly included in this script. Place this script outside of the web server's
document tree so that it can't fall into the hands of a web surfer. The question is: Where should you put it? When you call
the include() directive (or the related require() function), you can specify an absolute path or a relative path. An
absolute path starts with a / (or drive name or backslash in Windows). A relative path does not. The PHP interpreter uses a
search path (that is, a list of directory names) to resolve relative pathnames. You can find the search path using PHP's
ini_get() function:

...
echo "Include path = " . ini_get( "include_path" );
...

The ini_get() function returns a variable defined in PHP's initialization file [2]; in this case, the value of
include_path. On my system, ini_get("include_path") returns ".:/usr/local/php". PHP searches for include files in the
current directory (that is, the directory that contains the including script), and then in /usr/local/php. If you refer
back to Listing 15.5, you'll see that I am including secure/my_connect_pg.php. Combining the search path and relative
pathname, PHP will find my include file in /usr/local/php/secure/my_connect_pg.php. The important detail here is that
/usr/local/php is outside the web server's document tree (/usr/local/htdocs).

[2] You can find PHP's initialization file using echo get_cfg_var( "cfg_file_path" ).

The my_connect_pg.php script not only secures the PostgreSQL password, it also gives you a single connection function that
you can call from any script; all you need to know is the name of the database that you want.

If everything goes well, the user will see the message "Connected to movies."

Let's see what happens when you throw a few error conditions at this script. First, try to connect to a nonexistent
database (see Figure 15.3).

Figure 15.3. Connecting to a nonexistent database.

That's not a friendly error message. Let's see what happens when you try to connect to a database that does exist, but
where the PostgreSQL server has been shut down (see Figure 15.4).

Figure 15.4. Connecting to a database that has been shut down.

Again, not exactly the kind of message that you want your users to see. In the next section, I'll show you how to intercept
this sort of error and respond a little more gracefully.

Client 2: Adding Error Checking

You've seen that PHP will simply dump error messages into the output stream sent to the web browser. That makes it easy to
debug PHP scripts, but it's not particularly kind to your users.

There are two error messages displayed in Figure 15.4. The first error occurs when you call the pg_connect() function.
Notice that the error message includes the name of the script that was running at the time the error occurred. In this
case, my_connect_pg.php encountered an error on line 9; that's the call to pg_connect(). The second error message comes
from line 13 of client1b.php, where you try to use the database handle returned by my_connect_pg(). When the first error
occurred, pg_connect() returned an invalid handle and my_connect_pg() returned that value to the caller.

Listing 15.7 shows a new version of the client script that intercepts both error messages.

Listing 15.7. client2a.php

1 <?php
2 //Filename: client2a.php
3
4 include( "secure/my_connect_pg.php" );
5
6 $db_handle = @my_connect_pg( "movies" );
7
8 echo "<HTML>\n";
9 echo "<HEAD>\n";
10 echo "<TITLE>client2a</TITLE>\n";
11 echo "<BODY>\n";
12 echo "<CENTER>";
13
14 if( $db_handle == FALSE )
15 echo "Sorry, can't connect to the movies database";
16 else
17 echo "Connected to " . pg_dbname( $db_handle );
18
19 echo "</CENTER>\n";
20 echo "</BODY>\n";
21 echo "</HTML>";
22 ?>

If you compare this script with client1b.php, you'll see that they are very similar. The first change is at line 6: I've
added a @ character in front of the call to my_connect_pg(). The @ character turns off error reporting for the expression
that follows. The next change is at line 14. Rather than blindly using the database handle returned by my_connect_pg(), you
should first ensure that it is a valid handle. pg_connect() (and therefore my_connect_pg()) will return FALSE to indicate
that a connection could not be established. If you find that $db_handle is FALSE, client2a displays a friendly error
message; otherwise, it displays the name of the database to which you are connected (see Figure 15.5).

Figure 15.5. A friendlier error message.

This looks much nicer, but now we've lost the details that we need to debug connection problems. What we really want is a
friendly error message for the user, but details for the administrator.

You can achieve this using a custom-written error handler. Listing 15.8 shows a custom error handler that emails the text
of any error messages to your administrator.

Listing 15.8. my_handler.php

1 <?php
2
3 // Filename: my_handler.php
4
5 function my_handler( $errno, $errmsg, $fname, $lineno, $context )
6 {
7
8 $err_txt = "At " . date("Y-m-d H:i:s (T)");
9 $err_txt .= " an error occurred at line " . $lineno;
10 $err_txt .= " of file " . $fname . "\n\n";
11 $err_txt .= "The text of the error message is:\n";
12 $err_txt .= $errmsg;
13
14 mail( "administrator", "Website error", $err_txt );
15 }
16 ?>

In a moment, you'll modify the client2a.php script so that it installs this error handler before connecting to PostgreSQL.

An error handler function is called whenever a PHP script encounters an error. The default error handler writes error
messages into the output stream sent to the web browser. The custom error handler shown in Listing 15.8 builds an email
message from the various error message components and then uses PHP's mail() function to send the message to an address of
your choice.

Now, let's modify the client so that it uses my_handler() (see Listing 15.9).

Listing 15.9. client2b.php

1 <?php
2 //Filename: client2b.php
3
4 include( "secure/my_connect_pg.php" );
5 include( "my_handler.php" );
6
7 set_error_handler( "my_handler" );
8
9 $db_handle = my_connect_pg( "movies" );
10
11 echo "<HTML>\n";
12 echo "<HEAD>\n";
13 echo "<TITLE>client2b</TITLE>\n";
14 echo "<BODY>\n";
15 echo "<CENTER>";
16
17 if( $db_handle == FALSE )
18 echo "Sorry, can't connect to the movies database";
19 else
20 echo "Connected to " . pg_dbname( $db_handle );
21
22 echo "</CENTER>\n";
23 echo "</BODY>\n";
24 echo "</HTML>";
25
26 restore_error_handler();
27 ?>

You'll make four minor changes to client2a.php. First, include() my_handler.php. Next, call set_error_handler() to direct
PHP to call my_handler() rather than the default error handler (see line 7). Third, remove the @ from the call to
my_connect_pg(); you want errors to be reported now, you just want them reported through my_handler(). Finally, at line 26,
restore the default error handler (because this is the last statement in your script, this isn't strictly required).

Now, if you run client2b.php, you'll see a user-friendly error message, and you should get a piece of email similar to
this:

From daemon Sat Jan 12 09:15:59 2002


Date: Sat, 12 Jan 2002 09:15:59 -0400
From: daemon <daemon@davinci>
To: bruce@virtual_movies.com
Subject: Website error

At 2002-02-12 09:15:59 (EDT) an error occurred at line 9


of file /usr/local/php/secure/my_connect_pg.php

The text of the error message is:


pg_connect() unable to connect to PostgreSQL server: could
not connect to server: No such file or directory

Is the server running locally and accepting


connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

Now, you know how to suppress error messages (using the @ operator) and how to intercept them with your own error handler.

In the remaining samples in this chapter, I will omit most error handling code so that you can see any error messages in your web browser; that should make debugging a little easier.

Now, it's time to move on to the next topic: query processing.


Client 3: Query Processing

The tasks involved in processing a query (or other command) using PHP are similar to those required in other PostgreSQL APIs. The first step is to execute the command; then you can (optionally) process the metadata returned by the command; and finally, you process the result set.

We're going to switch gears here. So far, we have been writing PHP scripts that are procedural: one PHP command follows the next. We've thrown in a couple of functions to factor out some repetitive details (such as establishing a new connection). For the next example, you'll create a PHP class, named my_table, that will execute a command and process the results. You can reuse this class in other PHP scripts; and each time you extend the class, all scripts automatically inherit the changes.

Let's start by looking at the first script that uses the my_table class and then we'll start developing the class. Listing 15.10 shows client3a.php.

Listing 15.10. client3a.php

 1 <HTML>
 2 <HEAD>
 3 <TITLE>client3a</TITLE>
 4 <BODY>
 5
 6 <?php
 7   //Filename: client3a.php
 8
 9   include( "secure/my_connect_pg.php" );
10   include( "my_table_a.php" );
11
12   $db_handle = my_connect_pg( "movies" );
13
14   $table = new my_table( $db_handle, "SELECT * FROM customers;" );
15   $table->finish();
16
17   pg_close( $db_handle );
18
19 ?>
20
21 </BODY>
22 </HTML>

I rearranged the code in this client so that the static (that is, unchanging) HTML code is separated from the PHP script; that makes it a little easier to discern the script.

At line 10, client3a.php includes the my_table_a.php file. That file contains the definition of the my_table class, and we'll look at it in greater detail in a moment. Line 14 creates a new my_table object named $table. The constructor function for the my_table class expects two parameters: a database handle and a command string. my_table() executes the given command and formats the results into an HTML table. At line 15, the call to $table->finish() completes the HTML table. Finally, you call pg_close() to close the database connection; that's not strictly necessary, but it's good form.

Listing 15.11 shows my_table_a.php.

Listing 15.11. my_table_a.php

 1 <?php
 2
 3 // Filename: my_table_a.php
 4
 5 class my_table
 6 {
 7   var $result;
 8   var $columns;
 9
10   function my_table( $db_handle, $command )
11   {
12     $this->result  = pg_query( $db_handle, $command );
13     $this->columns = pg_num_fields( $this->result );
14     $row_count     = pg_num_rows( $this->result );
15
16     $this->start_table();
17
18     for( $row = 0; $row < $row_count; $row++ )
19       $this->append_row( $this->result, $row );
20   }
21
22   function start_table()
23   {
24     echo '<TABLE CELLPADDING="2" CELLSPACING="0" BORDER=1>';
25     echo "\n";
26   }
27
28   function finish()
29   {
30     print( "</TABLE>\n" );
31
32     pg_free_result( $this->result );
33   }
34
35   function append_row( $result, $row )
36   {
37     echo( "<TR>\n" );
38
39     for( $col = 0; $col < $this->columns; $col++ )
40     {
41       echo " <TD>";
42       echo pg_fetch_result( $result, $row, $col );
43       echo "</TD>\n";
44     }
45
46     echo( "</TR>\n" );
47   }
48 }
49
50 ?>

my_table_a.php defines a single class named my_table. At lines 7 and 8, you see the two instance variables for this class. $this->result contains a handle to a result set. $this->columns stores the number of columns in the result set.

The constructor for my_table (lines 10 through 20) expects a database handle and a command string. At line 12, the constructor calls the pg_query() function to execute the given command. pg_query() returns a result set handle if successful, and returns FALSE if an error occurs. You'll see how to intercept pg_query() errors in a moment. After you have a result set, you can call pg_num_fields() to determine the number of columns in the result set and pg_num_rows() to find the number of rows.

pg_query() in Earlier PHP Versions

In older versions of PHP, the pg_query() function was named pg_exec(), pg_num_fields() was named pg_numfields(), and pg_num_rows() was named pg_numrows(). If you run into complaints about invalid function names, try the old names.

At line 16, the call to the start_table() member function prints the HTML table header. Finally, at lines 18 and 19, the constructor iterates through each row in the result set and calls append_row() to create a new row in the HTML table. We'll look at append_row() shortly.

The start_table() and finish() member functions create the HTML table header and table footer, respectively. finish() also frees up the resources consumed by the result set by calling pg_free_result().

The append_row() member function starts at line 35. append_row() expects two parameters: a result set handle ($result) and a row number ($row). At line 37, append_row() writes the HTML table-row tag (<TR>). The loop at lines 39 through 44 processes each column in the given row. For each column, append_row() writes the HTML table-data tag (<TD>) and the table-data closing tag (</TD>). In between these tags, you see a call to pg_fetch_result() that retrieves a single value from the result set. When you call pg_fetch_result(), you provide three parameters: a result set handle, a row number, and a column number. pg_fetch_result() returns NULL if the requested value is NULL[3]. If not NULL, pg_fetch_result() will return the requested value in the form of a string. Note that the PHP/PostgreSQL documentation states that numeric values are returned as float or integer values. This appears not to be the case; all values are returned in string form.

[3] In PHP 4.0 and above, NULL is equal to FALSE, but not identical to FALSE. This means that NULL == FALSE evaluates to TRUE, but NULL === FALSE does not.

Now if you load client3a.php in your web browser, you'll see a table similar to that shown in Figure 15.6.

Figure 15.6. client3a.php loaded into your web browser.

Other Ways to Retrieve Result Set Values

Besides pg_fetch_result(), PHP provides a number of functions that retrieve result set values.

The pg_fetch_row() function returns an array of values that correspond to a given row. pg_fetch_row() requires two parameters: a result resource (also known as a result set handle) and a row number.

pg_fetch_row( resource result, int row_number )

Listing 15.12 shows the my_table.append_row() member function implemented in terms of pg_fetch_row().

Listing 15.12. append_row() Using pg_fetch_row()

...
1 function append_row( $result, $row )
2 {
3 echo( "<TR>\n" );
4
5 $values = pg_fetch_row( $result, $row );
6
7 for( $col = 0; $col < count( $values ); $col++ )
8 {
9 echo " <TD>";
10 echo $values[$col];
11 echo "</TD>\n";
12 }
13 echo( "</TR>\n" );
14 }
...

In this version, you fetch the requested row at line 5. When the call to pg_fetch_row() completes, $values will contain an array of column values. You can access each array element using an integer index, starting at element 0.

The next function, pg_fetch_array(), is similar to pg_fetch_row(). Like pg_fetch_row(), pg_fetch_array() returns an array of column values. The difference between these functions is that pg_fetch_array() can return a normal array (indexed by column number), an associative array (indexed by column name), or both. pg_fetch_array() expects one, two, or three parameters:

pg_fetch_array( resource result [, int row [, int result_type ]] )

The third parameter can be PGSQL_NUM, PGSQL_ASSOC, or PGSQL_BOTH. When you specify PGSQL_NUM, pg_fetch_array() operates identically to pg_fetch_row(); the return value is an array indexed by column number. When you specify PGSQL_ASSOC, pg_fetch_array() returns an associative array indexed by column name. If you specify PGSQL_BOTH, you will get back an array that can be indexed by column number as well as by column name. Listing 15.13 shows the append_row() function rewritten to use pg_fetch_array().

Listing 15.13. append_row() Using pg_fetch_array()

...
1 function append_row( $result, $row )
2 {
3 echo( "<TR>\n" );
4
5 $values = pg_fetch_array( $result, $row, PGSQL_ASSOC );
6
7 foreach( $values as $column_value )
8 {
9 echo " <TD>";
10 echo $column_value;
11 echo "</TD>\n";
12 }
13
14 echo( "</TR>\n" );
15 }
...

You should note that this version of append_row() misses the point of using PGSQL_ASSOC. It ignores the fact that pg_fetch_array() has returned an associative array. Associative arrays make it easy to work with a result set if you know the column names ahead of time (that is, at the time you write your script), but they really don't offer much of an advantage for ad hoc queries. To really take advantage of pg_fetch_array(), you would write code such as

...
$result = pg_query( $dbhandle, "SELECT * FROM customers;" );

for( $row = 0; $row < pg_num_rows( $result ); $row++ )
{
    $customer = pg_fetch_array( $result, $row, PGSQL_ASSOC );

    do_something_useful( $customer["customer_name"] );

    do_something_else( $customer["id"], $customer["phone"] );
}
...

You can also obtain an associative array by calling pg_fetch_assoc( resource result [, int row] ). Calling pg_fetch_assoc() is equivalent to calling pg_fetch_array( ..., PGSQL_ASSOC ).

Another function useful for static queries is pg_fetch_object(). pg_fetch_object() returns a single row in the form of an object. The object returned has one field for each column, and the name of each field will be the same as the name of the column. For example:

...
$result = pg_query( $dbhandle, "SELECT * FROM customers;" );

for( $row = 0; $row < pg_num_rows( $result ); $row++ )
{
    $customer = pg_fetch_object( $result, $row );

    do_something_useful( $customer->customer_name );

    do_something_else( $customer->id, $customer->phone );
}
...

There is no significant difference between an object returned by pg_fetch_object() and an associative array returned by pg_fetch_array(). With pg_fetch_array(), you reference a value using $array[$column] syntax. With pg_fetch_object(), you reference a value using $object->$column syntax. Choose whichever syntax you prefer.

One warning about pg_fetch_object() and pg_fetch_array( ..., PGSQL_ASSOC ): if your query returns two or more columns with the same column name, you will lose all but one of the columns. You can't have an associative array with duplicate index names, and you can't have an object with duplicate field names.
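
The three array shapes, and the duplicate-name pitfall, can be pictured without a database. The sketch below is only an illustration: the column names and values are hypothetical (imagine a join that returns two columns named id), and array_combine() stands in for the way an associative row is keyed by column name; it is not how pg_fetch_array() is implemented.

```php
<?php
// Hypothetical column names and values, as a join such as
//   SELECT customers.id, customer_name, rentals.id ...
// might report them. No database is involved.
$names  = array( "id", "customer_name", "id" );
$values = array( "3", "Panky, Henry", "42" );

$by_number = $values;                           // shaped like PGSQL_NUM
$by_name   = array_combine( $names, $values );  // shaped like PGSQL_ASSOC
$both      = $by_number + $by_name;             // shaped like PGSQL_BOTH

echo count( $by_number ), " values when indexed by number\n";
echo count( $by_name ), " values when indexed by name\n";
echo "id = ", $by_name["id"], "\n";             // the first id has been overwritten
```

Indexed by number, all three values survive; indexed by name, the second id replaces the first. Giving each column a distinct alias in the query (for example, with AS) avoids the collision.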

Metadata Access

You've seen that pg_fetch_object() and pg_fetch_array() expose column names to you, but the PHP/PostgreSQL API lets you get at much more metadata than just the column names.

The PHP/PostgreSQL interface is written using libpq (PostgreSQL's C-language API). Most of the functions available through libpq can be called from PHP, including the libpq metadata functions. Unfortunately, this means that PHP shares the limitations that you find in libpq.

In particular, the pg_field_size() function returns the size of a field. pg_field_size() expects two parameters:

int pg_field_size( resource $result, int $column_number )

The problem with this function is that the size reported is the number of bytes required to store the value on the server. It has nothing to do with the number of bytes seen by the client (that is, the number of bytes seen by your PHP script). For variable-length data types, pg_field_size() will return -1. If you're using a newer version of PHP (at least version 4.2.0) you can call pg_field_prtlen() to find the string length of a given value. You can call pg_field_prtlen() in either of the following forms:

int pg_field_prtlen( resource $result, int $row_number, int $column_number )

int pg_field_prtlen( resource $result, int $row_number, string $column_name )

The pg_field_type() function returns the name of the data type for a given column. pg_field_type() requires two parameters:

string pg_field_type( resource $result, int $column_number )

The problem with pg_field_type() is that it is not 100% accurate. pg_field_type() knows nothing of user-defined types or domains. Also, pg_field_type() won't return details about parameterized data types. For example, a column defined as NUMERIC( 7,2 ) is reported as type NUMERIC. Note: pg_field_type() has been improved in PHP version 5; it now queries the server to retrieve the name of the column's data type so it will return the correct name for user-defined types and domains (but it still doesn't return details about parameterized types).

Having conveyed the bad news, let's look at the metadata functions that are a little more useful for most applications.

You've already seen pg_num_rows() and pg_num_fields(). These functions return the number of rows and columns (respectively) in a result set.

The pg_field_name() and pg_field_num() functions are somewhat related. pg_field_name() returns the name of a column, given a column number index. pg_field_num() returns the column number index of a field given the field's name.

Let's enhance the my_table class a bit by including column names in the HTML table that we produce. Listing 15.14 shows a new version of the start_table() member function.

Listing 15.14. my_table.start_table()

1 function start_table()
2 {
3 echo '<TABLE CELLPADDING="2" CELLSPACING="0" BORDER=1>';
4
5 for( $col = 0; $col < $this->columns; $col++ )
6 {
7 echo " <TH>";
8 echo pg_field_name( $this->result, $col );
9 echo "</TH>\n";
10 }
11 echo "\n";
12 }

I used the <TH> tag here instead of <TD>, so that the browser knows that these are table header cells (table header cells are typically bolded and centered).

Now when you browse to client3a.php, you see a nice set of column headers as shown in Figure 15.7.

Figure 15.7. client3a.php with column headers.

Let's fix one other problem as long as we are fiddling with metadata. You may have noticed that the last row in Figure 15.7 looks a little funky: the phone number cell has not been drawn the same as the other cells. That happens when we try to create a table cell for a NULL value. If you look at the code that you built for the HTML table, you'll see that the last row has an empty <TD></TD> cell. For some reason, web browsers draw an empty cell differently.

To fix this problem, you can modify append_row() to detect NULL values (see Listing 15.15).

Listing 15.15. my_table.append_row()

 1 function append_row( $result, $row )
 2 {
 3   echo( "<TR>\n" );
 4
 5   for( $col = 0; $col < $this->columns; $col++ )
 6   {
 7     echo " <TD>";
 8
 9     if( pg_field_is_null( $result, $row, $col ) == 1 )
10       echo "&nbsp;";
11     elseif( strlen( pg_fetch_result( $result, $row, $col )) == 0 )
12       echo "&nbsp;";
13     else
14       echo pg_fetch_result( $result, $row, $col );
15     echo "</TD>\n";
16   }
17
18   echo( "</TR>\n" );
19 }

At line 9, you detect NULL values using the pg_field_is_null() function. If you encounter a NULL, you echo a non-breaking space character (&nbsp;) instead of an empty string. You have the same problem (a badly drawn border) if you encounter an empty string, and you fix it the same way (lines 11 and 12). Now, when you display a table, all the cells are drawn correctly, as shown in Figure 15.8.

Figure 15.8. client3a.php, final version.

There are a few more metadata functions that you can use in PHP, and you will need those functions in the next client that you write.
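
The NULL and empty-string handling in Listing 15.15 can be tried out on its own. The following sketch assumes a hypothetical helper named cell_text() that receives a value already fetched from the result set (PHP NULL standing in for a SQL NULL); it is not part of the my_table class.

```php
<?php
// Hypothetical helper: returns the text to place between <TD> and </TD>.
function cell_text( $value )
{
    if( $value === NULL )               // a SQL NULL
        return "&nbsp;";
    elseif( strlen( $value ) == 0 )     // an empty string
        return "&nbsp;";
    else
        return $value;
}

// Three sample phone values: present, empty, and NULL
foreach( array( "555-1212", "", NULL ) as $phone )
    echo " <TD>" . cell_text( $phone ) . "</TD>\n";
```

Testing with === rather than == matters here: a loose comparison would also treat the string "0" as if it were NULL, so a legitimate zero would be replaced by a blank cell.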

PHP, PostgreSQL, and Associative Functions

One of the more interesting abstractions promised (but not yet offered) by PHP and the PHP/PostgreSQL API is the associative function. An associative function gives you a way to execute a SQL command without having to construct the entire command yourself. Let's say that you need to INSERT a new row into the customers table. The most obvious way to do this in PHP is to build up an INSERT command by concatenating the new values and then executing the command using pg_query(). Another option is to use the pg_insert() function. With pg_insert(), you build an associative array. Each element in the array corresponds to a column. The key for a given element is the name of the column, and the value for the element is the value that you want to insert. For example, you can add a new row to the customers table with the following code:

...
$customer["id"] = 8;
$customer["customer_name"] = "Smallberries, John";
$customer["birth_date"] = "1985-05-14";

pg_insert( $db_handle, "customers", $customer );


...
In this code snippet, you have created an associative array with three entries. When you execute the call to pg_insert(), PHP will construct the following INSERT command:

INSERT INTO customers
(
    id,
    customer_name,
    birth_date
)
VALUES
(
    8,
    'Smallberries, John',
    '1985-05-14'
);

PHP knows the name of the table by looking at the second argument to pg_insert(). The column names are derived from the keys in the $customer array, and the values come from the values in the associative array.
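
To make the mapping from array to command concrete, here is a rough sketch of how such a command could be assembled by hand. build_insert() is a hypothetical helper written for this illustration; it is not pg_insert() itself, and its quoting rule (numbers bare, everything else single-quoted with embedded quotes doubled) is a simplifying assumption.

```php
<?php
// Hypothetical helper: builds the kind of INSERT command described above
// from a table name and an associative array of column => value pairs.
function build_insert( $table, $row )
{
    $values = array();

    foreach( $row as $value )
    {
        if( is_int( $value ) || is_float( $value ))
            $values[] = (string) $value;    // numbers are written as-is
        else
            $values[] = "'" . str_replace( "'", "''", $value ) . "'";
    }

    return "INSERT INTO $table ( " . implode( ", ", array_keys( $row ))
         . " ) VALUES ( " . implode( ", ", $values ) . " );";
}

$customer = array();
$customer["id"]            = 8;
$customer["customer_name"] = "Smallberries, John";
$customer["birth_date"]    = "1985-05-14";

echo build_insert( "customers", $customer ), "\n";
```

The array keys become the column list and the array values become the VALUES list, in the same order.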

Besides pg_insert(), you can call pg_delete() to build and execute a DELETE command. When you call pg_delete(), you provide a database handle, a table name, and an associative array. The associative array is used to construct a WHERE clause for the DELETE command. The values in the associative array are ANDed together to form the WHERE clause.

You can also use pg_select() to construct and execute a SELECT * command. pg_select() is similar to pg_delete(): it expects a database handle, a table name, and an associative array. Like pg_delete(), the values in the associative array are ANDed together to form a WHERE clause.
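
The ANDed WHERE clause can be sketched the same way. build_where() below is a hypothetical helper, not something PHP provides; it assumes that every condition is a simple equality test and that quoting every value as a string literal is acceptable.

```php
<?php
// Hypothetical helper: turns column => value pairs into a WHERE clause
// in which every condition must hold (the conditions are ANDed together).
function build_where( $conditions )
{
    $terms = array();

    foreach( $conditions as $column => $value )
        $terms[] = "$column = '" . str_replace( "'", "''", (string) $value ) . "'";

    return "WHERE " . implode( " AND ", $terms );
}

$conditions = array( "customer_name" => "Smallberries, John", "id" => 8 );

echo "DELETE FROM customers ", build_where( $conditions ), ";\n";
echo "SELECT * FROM customers ", build_where( $conditions ), ";\n";
```

Each additional entry in the array narrows the set of rows affected, because every condition must match.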

Finally, the pg_update() function expects two associative arrays in addition to the database handle and table name. The first array should contain the data (column names and values) to be updated, and the second array is used to form a WHERE clause.

As of PHP version 5.0, the associative functions are documented as experimental and are likely to change. Watch for these functions in a future release.

Client 4: An Interactive Query Processor

You now have most of the pieces that you need to build a general-purpose query processor within a web browser. Our next client simply prompts the user for a SQL command, executes the command, and displays the results.

If you want to try this on your own web server, be sure that you understand the security implications. If you follow the examples in this chapter, your PHP script will use a hard-coded username to connect to PostgreSQL. Choose a user with very few privileges. In fact, most PHP/PostgreSQL sites should probably define a user account specifically designed for web access. If you're not careful, you'll grant John Q. Hacker permissions to alter important data.

We'll start out with a simple script and then refine it as we discover problems.

First, you need an HTML page that displays a welcome and prompts the user for a SQL command. Listing 15.16 shows the client4.html document.

Listing 15.16. client4.html

1 <HTML>
2
3 <!-- Filename: client4.html -->
4
5 <HEAD>
6 <TITLE>client4a</TITLE>
7 <BODY>
8 <CENTER>
9 <FORM ACTION="client4a.php" METHOD="POST">
10 <I>Enter SQL command:</I><br>
11
12 <INPUT TYPE="text"
13 NAME="query"
14 SIZE="80"
15 ALIGN="left"
16 VALUE="">
17
18 <BR><BR>
19 <INPUT TYPE="submit" VALUE="Execute command">
20 </FORM>
21 </CENTER></BODY>
22 </HTML>

This HTML document defines a form that will be posted to the server (see line 9). After the user enters a command and presses the Execute Command button, the browser will request the file client4a.php. We'll look at client4a.php in a moment. When you request this page in a web browser, you will see a form similar to that shown in Figure 15.9.

Figure 15.9. client4.html.

Now let's look at the second half of the puzzle: client4a.php (see Listing 15.17).

Listing 15.17. client4a.php

1 <HTML>
2 <HEAD>
3 <TITLE>Query</TITLE>
4 <BODY>
5 <?php
6
7 # Filename: client4a.php
8
9 include( "secure/my_connect_pg.php" );
10 include( "my_table_e.php" );
11
12 $command_text = $HTTP_POST_VARS[ "query" ];
13
14 if( strlen( $command_text ) == 0 )
15 {
16 echo "You forgot to enter a command";
17 }
18 else
19 {
20 $db_handle = my_connect_pg( "movies" );
21
22 $table = new my_table( $db_handle, $command_text );
23 $table->finish();
24
25 pg_close( $db_handle );
26 }
27 ?>
28 </BODY>
29 </HTML>

Most of this script should be pretty familiar by now. You include secure/my_connect_pg.php to avoid embedding a username and password inline. Next, include my_table_e.php so that you can use the my_table class (my_table_e.php includes all the modifications you made to the original version of my_table_a.php).

At line 12, client4a.php retrieves the command entered by the user from the $HTTP_POST_VARS[] variable. Look back at lines 12 through 16 of Listing 15.16 (client4.html). You are defining an INPUT field named query. When the user enters a value and presses the Execute Command button, the browser posts the query field to client4a.php. PHP marshals all the posted values into a single associative array named $HTTP_POST_VARS[] (also available as $_POST since PHP version 4.1.0). The key for each value in this array is the name of the posted variable. So, you defined a field named query, and you can find the value of that field in $HTTP_POST_VARS[ "query" ].

If you try to execute an empty command using pg_query(), you'll be rewarded with an ugly error message. You can be a little nicer to your users by intercepting empty commands at lines 14 through 17 and displaying a less intimidating error message.

The remainder of this script is straightforward: simply establish a database connection and use the my_table class to execute the given command and display the result.

Let's run this script to see how it behaves (see Figures 15.10 and 15.11).

Figure 15.10. Submitting a query with client4.html.

Figure 15.11. Submitting a query with client4.html: result.

That worked nicely. Let's try another query (see Figures 15.12 and 15.13).

Figure 15.12. Causing an error with client4.html.

Figure 15.13. Causing an error with client4.html: result.

Hmmm... that's not what we were hoping for. What went wrong? Actually, there are several problems shown here. First, PHP is reporting that we have an erroneous backslash on line 12 of my_table_e.php. Line 12 is inside of the my_table constructor and it sends the following command to the server:

$this->result = pg_query( $db_handle, $command );

There are no backslashes on that line; there are no backslashes in the command that you entered. Where are the backslashes coming
from? If you echo $HTTP_POST_VARS["query"], you'll see that PHP has added escape characters to the command entered by the
user. You entered SELECT * FROM customers WHERE birth_date = '1984-02-21', and PHP changed this to SELECT * FROM
customers WHERE birth_date = \'1984-02-21\'. According to the PHP manual, all single-quotes, double-quotes, backslashes, and
NUL characters are escaped with a backslash when they come from a posted value. [4]

[4]
You can disable the automatic quoting feature by setting the magic_quotes_gpc configuration variable to off. I would
not recommend changing this value; you're likely to break many PHP scripts.

This is easy to fix. You can simply strip the escape characters when you retrieve the command text from $HTTP_POST_VARS[]. Changing
client4a.php, line 12, to

$command_text = $HTTP_POST_VARS[ "query" ];
if( get_magic_quotes_gpc())
    $command_text = stripslashes( $command_text );

will make it possible to execute SQL commands that contain single-quotes.
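
To see why stripping the slashes restores the command, it can help to model the escaping outside of PHP. What follows is a minimal sketch in Python (chosen only so that it runs without a web server); add_slashes() and strip_slashes() are hypothetical helpers that approximate what PHP's addslashes() and stripslashes() do, and they are not part of PHP or PostgreSQL.

```python
def add_slashes(text: str) -> str:
    """Approximate the automatic quoting: escape ', ", backslash and NUL."""
    out = []
    for ch in text:
        if ch == "\0":
            out.append("\\0")          # NUL becomes backslash + '0'
        elif ch in ("'", '"', "\\"):
            out.append("\\" + ch)      # other special characters get a leading backslash
        else:
            out.append(ch)
    return "".join(out)


def strip_slashes(text: str) -> str:
    """Undo add_slashes(): drop each escaping backslash, keep what follows it."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append("\0" if nxt == "0" else nxt)
            i += 2
        else:
            if ch != "\\":             # a lone trailing backslash is discarded
                out.append(ch)
            i += 1
    return "".join(out)


typed = "SELECT * FROM customers WHERE birth_date = '1984-02-21'"
posted = add_slashes(typed)            # what the script receives
print(posted)
print(strip_slashes(posted) == typed)  # the round trip restores the original text
```

The point of the sketch is only that the two steps are inverses of each other: whatever the user typed comes back unchanged once the escape characters are removed.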

That was the first problem. The second problem is that you don't want the end-user to see these nasty-looking PHP/PostgreSQL error
messages. To fix this problem, you need to intercept the error message and display it yourself. Listing 15.18 shows a new version of
the my_table constructor.

Listing 15.18. my_table.my_table()

 1 function my_table( $db_handle, $command )
 2 {
 3   $this->result = @pg_query( $db_handle, $command );
 4
 5   if( $this->result == FALSE )
 6   {
 7     echo pg_last_error( $db_handle );
 8   }
 9   else
10   {
11     $this->columns = pg_num_fields( $this->result );
12     $row_count = pg_num_rows( $this->result );
13
14     $this->start_table( $command );
15
16     for( $row = 0; $row < $row_count; $row++ )
17       $this->append_row( $this->result, $row );
18   }
19 }

We've restructured this function a bit. Because the goal is to intercept the default error message, we suppress error reporting by
prefixing the call to pg_query() with an @. At line 5, determine whether pg_query() returned a valid result set resource. If you are
used to using PostgreSQL with other APIs, there is an important difference lurking here. In other PostgreSQL APIs, you get a result set
even when a command fails, and the error message is part of the result set. In PHP, pg_query() returns FALSE when an error occurs. You
must call pg_last_error() to retrieve the text of the error message (see line 7).

If you have succeeded in executing the given command, you can build an HTML table from the result set as before.

Now, if you run into an error condition, the result is far more palatable (see Figures 15.14 and 15.15).

Figure 15.14. Causing an error with client4.html, part 2.

Figure 15.15. Causing an error with client4.html, part 2: result.


Notice that you see only one error message this time. In Figure 15.13, you saw multiple error messages. Not only had client4a failed
to intercept the original error, but it went on to use an invalid result set handle; when you fix the first problem, the other error
messages go away.

At this point, you can execute queries and intercept error messages. Let's see what happens when you execute a command other than
SELECT. First, enter the command shown in Figure 15.16.

Figure 15.16. Executing an INSERT command.

After clicking on the Execute Command button, you see the result displayed in Figure 15.17.

Figure 15.17. Executing an INSERT command: result.


Hmmm…that's a bit minimalist for my taste. You should at least see a confirmation that something has happened. When you execute a
non-SELECT command, the pg_query() function will return a result set resource, just like it does for a SELECT command. You can
differentiate between SELECT and other commands by the fact that pg_num_fields() always returns 0 for non-SELECT commands.

Let's make one last modification to the my_table constructor, in Listing 15.19, so that it gives feedback regardless of which type of
command was executed.

Listing 15.19. my_table.my_table(): Final Form

 1 function my_table( $db_handle, $command )
 2 {
 3   $this->result = @pg_query( $db_handle, $command );
 4
 5   if( $this->result == FALSE )
 6   {
 7     echo pg_last_error( $db_handle );
 8   }
 9   else
10   {
11     $this->columns = pg_num_fields( $this->result );
12
13     if( $this->columns == 0 )
14     {
15       echo $command;
16       echo "<BR>";
17       echo pg_affected_rows( $this->result );
18       echo " row(s) affected";
19
20       if( pg_last_oid( $this->result ) != 0 )
21         echo ", OID = ". pg_last_oid( $this->result );
22     }
23     else
24     {
25       $row_count = pg_num_rows( $this->result );
26
27       $this->start_table( $command );
28
29       for( $row = 0; $row < $row_count; $row++ )
30         $this->append_row( $this->result, $row );
31     }
32   }
33 }

This version checks the result set column count at line 13. If you find that the result set contains 0 columns, echo the command text
and the number of rows affected by the command (that gives you feedback similar to what you would see using the psql client). You
can also call the pg_last_oid() function. pg_last_oid() returns the OID (object ID) of the most recently inserted row. pg_last_oid()
returns 0 if the command was not an INSERT or if more than one row was inserted.

The final results are shown in Figure 15.18.

Figure 15.18. Executing an INSERT command: final result.
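
The three-way decision made by the final constructor (error, command without columns, query with columns) is not specific to PHP. The following is a rough analogy in Python, assuming the standard sqlite3 module and an in-memory SQLite database rather than PostgreSQL, so that it runs anywhere. Here cursor.description being None plays the role of pg_num_fields() returning 0, and cursor.rowcount stands in for pg_affected_rows(); run_command() is a hypothetical helper name.

```python
import sqlite3


def run_command(conn, command):
    """Execute one SQL command and classify the outcome.

    Returns a dict whose "kind" is "error", "status" (no result columns),
    or "rows" (a result set with at least one column).
    """
    try:
        cur = conn.execute(command)
    except sqlite3.Error as exc:
        # Comparable to pg_query() returning FALSE: report the message ourselves.
        return {"kind": "error", "message": str(exc)}

    if cur.description is None:
        # No columns: INSERT, UPDATE, DELETE, or DDL.
        return {"kind": "status", "affected": cur.rowcount}

    # At least one column: a SELECT-style result set.
    return {
        "kind": "rows",
        "columns": [col[0] for col in cur.description],
        "rows": cur.fetchall(),
    }


conn = sqlite3.connect(":memory:")
run_command(conn, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
print(run_command(conn, "INSERT INTO customers (name) VALUES ('Jones, Henry')"))
print(run_command(conn, "SELECT id, name FROM customers"))
print(run_command(conn, "SELEKT 1"))
```

The order of the checks matters in both versions: test for failure first, then look at the column count, and only then try to read rows.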

Now that you know how to write an interactive query processor using PHP, don't forget the security implications of doing so. Make
sure that you connect to the PostgreSQL server using an account with very few privileges: If you give a visitor lots of privileges and a
way to execute any command he wants, you're inviting disaster.

More typically, you ask a visitor to your site to type in a value that you use to construct an SQL command. For example, you may
prompt the user for a "Customer Name" and retrieve the matching record with a query such as

$query = "SELECT * FROM customers WHERE customer_name = '";
$query .= $_POST['customer_name'] . "'";

That works, sometimes. What happens if the user types in a value such as "Freddy's Fudge Factory"? Then your query becomes

SELECT * FROM customers WHERE customer_name = 'Freddy's Fudge Factory'

See the problem? As far as PostgreSQL is concerned, a string literal is enclosed in a pair of single quotes. The PostgreSQL parser
thinks the customer name ends at Freddy, and the rest of the customer name is just a syntax error. To fix that sort of problem, just quote
any string values with the pg_escape_string() function, like this:

$customer_name = pg_escape_string( $_POST['customer_name'] );
$query = "SELECT * FROM customers WHERE customer_name = '$customer_name'";
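
The essence of the fix is that an embedded single quote must be doubled before the value is placed inside a string literal. Here is a small sketch of that idea in Python; quote_literal() and build_query() are hypothetical helpers written for illustration, and they model only the quote-doubling rule of standard SQL, not everything that pg_escape_string() may do (backslash handling, for example, is left out).

```python
def quote_literal(value: str) -> str:
    """Return value as an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_query(customer_name: str) -> str:
    """Build the customer lookup with the user-supplied name safely quoted."""
    return (
        "SELECT * FROM customers WHERE customer_name = "
        + quote_literal(customer_name)
    )


print(build_query("Freddy's Fudge Factory"))
```

With the quote doubled, the parser sees one complete literal instead of a literal followed by stray text. If your version of PHP provides pg_query_params(), passing the value as a separate parameter avoids building the literal by hand at all.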

Other Features

There are a number of PostgreSQL-related PHP functions that I have not covered in this chapter.

Newer versions of PHP have added support for asynchronous query processing (see pg_send_query(), pg_connection_busy(),
and pg_get_result()). Asynchronous query processing probably won't be of much use when you are constructing
dynamic web pages, but clever coders can use asynchronous queries to provide intermediate feedback for long-running
operations (sorry, I'm not that clever).

PHP offers a set of functions that can give you information about a database connection. We used the pg_dbname() function
in the first client (see Listing 15.4) to display the name of the database to which we were connected. You can also use the
pg_port() and pg_options() functions to retrieve the port number and options associated with a database connection. PHP
provides a pg_host() function that is supposed to return the name of the host where the server resides. Be very careful
calling pg_host(); if you have established a local connection (that is, using a Unix-domain socket), calling pg_host() may
crash your web server because of a bug in the PHP/PostgreSQL interface.

Another function offered by PHP is pg_pconnect(). The pg_pconnect() function establishes a persistent connection to a
PostgreSQL database. Persistent connections are cached by the web server and can be reused the next time a browser
requests a document that requires access to the same database. See the PHP manual for information about the pros and
cons of persistent connections.

Finally, PHP supports the PostgreSQL large-object interface. You can use the large-object interface to read (or write) large
data items such as images or audio files.

Part III: PostgreSQL Administration

20 Introduction to PostgreSQL Administration

21 PostgreSQL Administration

22 Internationalization and Localization

23 Security

24 Replicating PostgreSQL Data with Slony

25 Contributed Modules

Chapter 20. Introduction to PostgreSQL Administration

This book is divided into three parts. The first part of the book was designed as a guide to new PostgreSQL users. The middle section
covered PostgreSQL programming. The third section is devoted to the topic of PostgreSQL administration. These three parts
correspond to the real-world roles that we play when using PostgreSQL.

Users are concerned mostly with getting data into the database and getting it back out again. Programmers try to provide users with
the functionality that they need. Administrators are responsible for ensuring that programmers and end users can perform their jobs.
Quite often, one person will fill two or three roles at the same time.

When you wear the hat of an administrator, you ensure that your users can store their data in a secure, reliable, high-availability,
high-performance database.

Secure means that your data is safe from intruders. You must ensure that authorized users can do the things they need to do. You
also need to ensure that users cannot gain access to data that they should not see.

Reliable means the data that goes into a database can be retrieved without corruption. Any data transformations should be expected,
not accidental.

High-availability means that the database is available when needed. Your users should expect that the database is ready to use when
they log in. Routine maintenance should follow a predictable schedule and should not interfere with normal use. High-availability may
also affect your choice of operating system and hardware. You may want to choose a cluster configuration to prevent problems in the
event of a single point of failure.

High-performance means that a user should be able to perform required tasks within an acceptable amount of time. A
high-performance database should also feel responsive.

In this chapter, I'll introduce you to some of the tasks that a PostgreSQL administrator must perform. The remaining chapters cover
each topic in greater detail.

Security

A PostgreSQL administrator is responsible for ensuring that authorized users can do what they need to do. An administrator is also
responsible for making sure that authorized users can do only what they need to do. Another critical job is to keep intruders away
from the user's data.

There are two aspects to PostgreSQL security: authentication and access. Authentication ensures that a user is in fact who he
claims to be. After you are satisfied that a user has proven his identity, you must ensure that he can access the data that he needs.

Each user (or group) requires access to a specific set of resources. For example, an accounting clerk needs access to vendor and
customer records, but may not require access to payroll data. A payroll clerk, on the other hand, needs access to payroll data, but
not to customer records. One of your jobs as an administrator is to grant the proper privileges to each user.

Another aspect of security in general is the problem of securing PostgreSQL's runtime environment. Depending on your security
requirements (that is, the sensitivity of your data), it may be appropriate to install network firewalls, secure routers, and possibly
even biometric access controls. Securing your runtime environment is a problem that is not unique to PostgreSQL, and I won't
explore that topic further in this book.

Chapter 23, "Security," shows you how to grant and revoke user privileges and also covers how to prevent tampering by intruders.
I'll show you how to secure PostgreSQL data, configuration, and program files on Linux/Unix systems and on Windows hosts.

Chapter 25. Contributed Modules

Most of the software that I've described in this book is considered to be part of the core PostgreSQL distribution. The core distribution is
managed by the core development team, a small, well-organized, and highly dedicated team of professional developers and designers. But
there are a huge number of developers who contribute software to the PostgreSQL community. Some of the contributed packages are
included in the core distribution (in the contrib directory) and you can find many others at the PgFoundry and Gborg web sites
(www.pgfoundry.org and gborg.postgresql.org).

Contributed software is a broad term that describes open-source software designed to work with PostgreSQL. That includes everything from
graphical SQL client applications and graphical management applications to procedural languages such as PL/Java and PL/perl. You can also
find contributed packages that will help you convert data and programs from other systems (such as MySQL, Oracle, and mSQL) into
PostgreSQL. At the Gborg web site, you can find an ODBC driver, a JDBC driver (for Java applications), a DBD:: driver (for Perl applications),
and interfaces for applications written in C, C++, Visual Basic, C#, Tcl/Tk, Ruby, Python, and maybe even Cobol. You'll find database design
tools, monitoring tools, administrator tools, developer tools, even complete business applications.

In this chapter, I'll describe two of the contributed packages that come in the core PostgreSQL distribution: xml2 and tsearch2. The xml2
package lets you store XML documents inside a PostgreSQL database, query those documents using XPath queries, and convert XML
documents using XSLT stylesheets. tsearch2 is a full-text indexing and searching package that lets you turn your PostgreSQL server into a
search engine.

Exchanging PostgreSQL Data with XML

XML is the wave of the future. Well, it's a wave in some future anyway. XML was designed to let you and me write applications that can
exchange structured data. An XML document is a self-describing textual representation of data, often structured in a hierarchical form. In this
section, I'll assume that you have some knowledge of XML, XPath queries, and XSLT stylesheets. If you aren't familiar with those
technologies, read on anyway; I'll show you a few examples that should help you understand the basic concepts.

You can store XML data in a PostgreSQL database without any help from third-party software. For example, let's say that one of your
distributors offers a new service to the video store that you're running. Every so often the distributor sends you an XML document that
describes a number of films. A typical document is shown in Listing 25.1.

Listing 25.1. films.xml

<films>
<film>
<name>Casablanca</name>
<year>1942</year>
<writers>
<writer>Murray Burnett</writer>
<writer>Joan Alison</writer>
<writer>Julius J. Epstein</writer>
<writer>Philip G. Epstein</writer>
<writer>Howard Koch</writer>
</writers>
<leads>
<lead>Humphrey Bogart</lead>
<lead>Ingrid Bergman</lead>
<lead>Peter Lorre</lead>
</leads>
<directors>
<director>Michael Curtiz</director>
</directors>
</film>

<film>
<name>Rear Window</name>
<year>1954</year>
<writers>
<writer>Cornell Woolrich</writer>
<writer>John Michael Hayes</writer>
</writers>
<leads>
<lead>James Stewart</lead>
<lead>Grace Kelly</lead>
<lead>Raymond Burr</lead>
</leads>
<directors>
<director>Alfred Hitchcock</director>
</directors>
</film>
<film>
<name>The Godfather</name>
<year>1972</year>
<writers>
<writer>Mario Puzo</writer>
<writer>Francis Ford Coppola</writer>
</writers>
<leads>
<lead>Marlon Brando</lead>
<lead>Al Pacino</lead>
<lead>James Caan</lead>
<lead>Robert Duvall</lead>
<lead>Diane Keaton</lead>
<lead>Talia Shire</lead>
</leads>
<directors>
<director>Francis Ford Coppola</director>
</directors>
</film>
</films>

This document (films.xml) describes three films: Casablanca, Rear Window, and The Godfather. Each description contains a name, a year
(the year that the film was released), a collection of writers, a collection of leads (leading actors and actresses), and a collection of
directors.

To store this document in a PostgreSQL database, you could simply INSERT the whole thing into a TEXT column. In practice, you'd probably
want to split the document into separate records (one for each film) and store each description in a separate row. Let's do that. First, create a
table (filminfo) that will hold the film descriptions like this:

$ psql movies
Welcome to psql 8.0.0, the PostgreSQL interactive terminal.
...
movies=# CREATE TABLE filminfo
movies-# (
movies(# film_name VARCHAR PRIMARY KEY,
movies(# description TEXT
movies(# );
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit
index "filminfo_pkey" for table "filminfo"
CREATE TABLE

Now you have a container, but how do you get the XML objects into the filminfo table? The distributor has given you an XML document;
you want to split that document into separate objects and then INSERT those objects into the filminfo table.

Sounds like a perfect job for XSLT (Extensible Stylesheet Language Transformations). Listing 25.2 shows an XSLT document
(splitFilms.xsl) that will do the trick.

Listing 25.2. splitFilms.xsl

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <xsl:for-each select="films/film">
      <xsl:text>INSERT INTO filminfo VALUES('</xsl:text>
      <xsl:value-of select="name"/>
      <xsl:text>','</xsl:text>
      <xsl:copy-of select="."/>
      <xsl:text>');
</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

splitFilms.xsl will parse through a list of films and create an INSERT command for each film that it finds. Given the first film described in
films.xml (see Listing 25.1), splitFilms.xsl will produce an INSERT command that looks like this:

INSERT INTO filminfo VALUES('Casablanca','<film>
<name>Casablanca</name>
<year>1942</year>
<writers>
<writer>Murray Burnett</writer>
<writer>Joan Alison</writer>
<writer>Julius J. Epstein</writer>
<writer>Philip G. Epstein</writer>
<writer>Howard Koch</writer>
</writers>
<leads>
<lead>Humphrey Bogart</lead>
<lead>Ingrid Bergman</lead>
<lead>Peter Lorre</lead>
</leads>
<directors>
<director>Michael Curtiz</director>
</directors>
</film>');

You can save the INSERT commands to a text file or, better yet, just pipe the output produced by splitFilms.xsl directly into the psql
command. If you're using the libxslt package, you can execute splitFilms.xsl like this:

$ xsltproc splitFilms.xsl films.xml | psql movies
INSERT 846648 1
INSERT 846649 1
INSERT 846650 1

There are three films described in films.xml and psql reports three INSERT commands. That's a good sign.
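
If you don't have an XSLT processor handy, the same split can be sketched in a general-purpose language. The following Python version is an illustration under stated assumptions, not a replacement for splitFilms.xsl: it uses the standard xml.etree.ElementTree module, split_films() and quote_literal() are hypothetical helper names, and the short sample document (including the film name that contains an apostrophe) is made up for the example. Unlike the stylesheet, the sketch doubles any single quote before placing a value inside an SQL literal.

```python
import xml.etree.ElementTree as ET


def quote_literal(value: str) -> str:
    """Return value as an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def split_films(xml_text: str) -> list:
    """Produce one INSERT command per <film> element found under <films>."""
    root = ET.fromstring(xml_text)
    statements = []
    for film in root.findall("film"):
        name = film.findtext("name", default="")
        # Serialize just this <film> subtree; strip() drops trailing whitespace.
        description = ET.tostring(film, encoding="unicode").strip()
        statements.append(
            "INSERT INTO filminfo VALUES("
            + quote_literal(name) + "," + quote_literal(description) + ");"
        )
    return statements


sample = (
    "<films>"
    "<film><name>Rear Window</name><year>1954</year></film>"
    "<film><name>A Hard Day's Night</name><year>1964</year></film>"
    "</films>"
)
for statement in split_films(sample):
    print(statement)
```

You could write the resulting commands to a file or pipe them into psql, just as with the xsltproc output.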

Now that you have XML documents in your database, you can use them just like any other TEXT value. You can search inside of an XML
document using PostgreSQL's regular expression operators. For example, to find films starring Jimmy Stewart, you could execute the
following query:

movies=# SELECT film_name FROM filminfo
movies-# WHERE description ~* '<leads>.*James Stewart.*</leads>';
film_name
------------
Rear Window
(1 row)

You can treat an XML document just like any other TEXT value, but you can add a number of XML-specific features by installing the xml2
contributed module.

XPath Queries

xml2 (also known as pgxml) is a small collection of functions that let you execute XPath queries against XML documents stored in a
PostgreSQL database. xml2 does not turn your database into an XML database; you can't execute XPath queries in place of SELECT
statements. Instead, xml2 lets you include XPath queries inside of SELECT (and other) statements.

xml2 defines three functions that will return a single value from an XPath query:

xpath_string( document, query ) RETURNS TEXT
xpath_number( document, query ) RETURNS FLOAT4
xpath_bool( document, query ) RETURNS BOOL

When you call one of the xml2 query functions, you provide an XML document and an XPath query. The query function returns the object (a
number, string, Boolean value, list of values, or a nodeset) identified by the XPath. If that sounds confusing, it may help to look at an
example. Here's a query that extracts the year node stored within a film description:

movies=# SELECT film_name, xpath_string( description, 'year' ) FROM filminfo;
film_name | xpath_string
---------------+-------------
Casablanca | 1942
Rear Window | 1954
The Godfather | 1972
(3 rows)

In this case, the xpath_string() function is invoked three times (because there are three rows in the filminfo table). In each invocation,
the description column acts as an XML document and 'year' is an XPath query that navigates through the document until it finds the year
element.
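
If the per-row evaluation is hard to picture, here is a minimal model of it in Python. This is a sketch only: the functions below are hypothetical look-alikes that borrow the xml2 names, they rely on the standard xml.etree.ElementTree module (which supports just a subset of XPath, so only relative paths such as 'year' are used), and they say nothing about how xml2 is implemented.

```python
import math
import xml.etree.ElementTree as ET


def xpath_string(document: str, query: str) -> str:
    """Return the text of the first node matched by query, or '' if none."""
    node = ET.fromstring(document).find(query)
    return "" if node is None else "".join(node.itertext())


def xpath_number(document: str, query: str) -> float:
    """Return the matched text as a number, or NaN if it is not numeric."""
    try:
        return float(xpath_string(document, query).strip())
    except ValueError:
        return math.nan


# Each (film_name, description) pair stands in for one row of filminfo.
filminfo = [
    ("Casablanca", "<film><name>Casablanca</name><year>1942</year></film>"),
    ("Rear Window", "<film><name>Rear Window</name><year>1954</year></film>"),
    ("The Godfather", "<film><name>The Godfather</name><year>1972</year></film>"),
]

# The function is applied once per row, just as in the SELECT above.
for film_name, description in filminfo:
    print(film_name, xpath_string(description, "year"))
```

Because the functions return ordinary values, the same calls can be used for filtering, for example keeping only rows where xpath_number(description, "year") is less than 1960.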

You can also use the XPath query functions in other parts of the SELECT command, such as the WHERE clause. For example, to find all films
released before 1960:

movies=# SELECT film_name FROM filminfo
movies-# WHERE xpath_number( description, '/film/year' ) < 1960;
film_name
------------
Casablanca
Rear Window
(2 rows)

In fact, you can use the result of an XPath query as a table. (I'll show you how to do that in a moment.)

xml2 defines five functions that return multiple values from an XPath query. The first set of functions return a nodeset in the form of a TEXT
string (a nodeset is a collection of XML nodes):

xpath_nodeset( document, query, topTag, itemTag ) RETURNS TEXT
xpath_nodeset( document, query, itemTag ) RETURNS TEXT
xpath_nodeset( document, query ) RETURNS TEXT

Use the nodeset functions when you want to extract a set of values from an XML document and you want the result to retain the XML tags
present in the document. For example, to find all writers for a given film:

movies=# SELECT xpath_nodeset( description, 'writers' ) FROM filminfo
movies-# WHERE film_name = 'Rear Window';
xpath_nodeset
--------------
<writers>
<writer>Cornell Woolrich</writer>
<writer>John Michael Hayes</writer>
</writers>
(1 row)

If you include a topTag or itemTag, xpath_nodeset() will wrap the entire nodeset in the topTag and wrap each item inside of an <itemTag>
</itemTag> pair. The extra tags are useful if you are building a new XML document out of data already in the database.

To convert a nodeset into a more conventional (and often more useful) form, use the xpath_list() function.

movies=# SELECT xpath_list( description, 'leads/lead') FROM filminfo
movies-# WHERE film_name = 'The Godfather';
xpath_list
--------------------------------------------------------------------------
Marlon Brando,Al Pacino,James Caan,Robert Duvall,Diane Keaton,Talia Shire
(1 row)

xpath_list() expects two or three arguments. If you call xpath_list() with three arguments, the last argument determines the string that
separates each element in the list (the default separator is ",").
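
The flattening that xpath_list() performs can be modeled in a few lines. The sketch below is written in Python with the standard xml.etree.ElementTree module; the function borrows the xml2 name for readability but is a hypothetical stand-in, and it assumes (as the text above states) a default separator of ",".

```python
import xml.etree.ElementTree as ET


def xpath_list(document: str, query: str, separator: str = ",") -> str:
    """Join the text of every node matched by query into one delimited string."""
    nodes = ET.fromstring(document).findall(query)
    return separator.join("".join(node.itertext()) for node in nodes)


godfather = (
    "<film><name>The Godfather</name><leads>"
    "<lead>Marlon Brando</lead><lead>Al Pacino</lead><lead>James Caan</lead>"
    "</leads></film>"
)

print(xpath_list(godfather, "leads/lead"))          # default separator
print(xpath_list(godfather, "leads/lead", " / "))   # caller-supplied separator
```

Notice that the tags are gone in the result: only the text of each matched node survives, which is the difference between a list and a nodeset.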

The last XPath function provided by xml2 is xpath_table() and it can be somewhat confusing. xpath_table() creates a tabular result set by
executing an XPath query (or a series of queries separated by "|") against a table (or view). Here's an example:

movies=# SELECT * FROM
movies-# xpath_table('film_name','description','filminfo','year|leads/*','1=1')
movies-# AS t(film_name text, year text, leads text);
film_name | year | leads
---------------+------+---------------
Casablanca | 1942 | Humphrey Bogart
Casablanca | | Ingrid Bergman
Casablanca | | Peter Lorre
Rear Window | 1954 | James Stewart
Rear Window | | Grace Kelly
Rear Window | | Raymond Burr
The Godfather | 1972 | Marlon Brando
The Godfather | | Al Pacino
The Godfather | | James Caan
The Godfather | | Robert Duvall
The Godfather | | Diane Keaton
The Godfather | | Talia Shire
(12 rows)

The AS clause tells PostgreSQL the shape of the resulting table. (xpath_table() is defined to return a SETOF RECORD; since a RECORD has
no predefined shape, you have to tell PostgreSQL what shape to expect.)

xpath_table() expects five arguments:

xpath_table( key, document, table, xpathQueries, condition )

xpath_table() creates a SELECT command (based on the arguments that you provide) and then executes that command. Next, xpath_table()
reads through each row returned by the SELECT command and evaluates the XPath queries against the document column.

The SELECT command is constructed from the key, document, table, and condition arguments. After executing the SELECT command,
xpath_table() verifies that the query returned exactly two columns (a key and a document) and then reads through each row in the result
set, evaluating each XPath query that you provide. For each XPath query, xpath_table() evaluates that query and, if it returns a value,
stores that value in an intermediate tuple. When the tuple is complete, xpath_table() adds it to the final result set. If you invoke
xpath_table() with two or more XPath queries that return nodesets of differing sizes, the final result set will contain NULL values.

That's a rather complex description that might be better illustrated by walking through the process one step at a time.

Given the arguments

xpath_table('film_name','description','filminfo','year|leads/*','1=1')

xpath_table() starts by splitting the xpathQueries argument (year|leads/*) into individual queries. In this case, xpath_table() finds two
queries: year and leads/*. Since you've supplied two queries, the result set produced by xpath_table() will contain three columns: the first
column will contain the key field (film_name), the second column will contain the result of the first XPath query (year), and the third column
will contain the result of the second XPath query (leads/*).

Next, xpath_table() pastes together a SELECT command that looks like this:

SELECT film_name, description FROM filminfo WHERE 1=1;

xpath_table() always selects two columns from the table: the key and the document. The condition argument is tacked on to the end of
the command in the form of a WHERE clause; you must provide a condition even if you want to process every row in the given table [1].

[1] The condition argument is just tacked onto the end of the SELECT command, so you can include any text that can legally
follow the word WHERE in a SELECT command. In fact, you can force xpath_table() to join two tables by listing them both in
the table argument and specifying a join in the condition argument.

Next, xpath_table() executes the SELECT command and loops through each row that makes it through the WHERE clause. xpath_table()
constructs one or more tuples out of each row returned by the SELECT command. To fill in the first column in each new tuple, xpath_table()
simply copies the key column (film_name) from the row returned by the SELECT command. To fill in the remaining columns, xpath_table()
evaluates each XPath query against the document column. In this example, the first document (the description column) returned by the
SELECT command looks like this:

<film>
<name>Casablanca</name>
<year>1942</year>
<writers>
<writer>Murray Burnett</writer>
<writer>Joan Alison</writer>
<writer>Julius J. Epstein</writer>
<writer>Philip G. Epstein</writer>
<writer>Howard Koch</writer>
</writers>
<leads>
<lead>Humphrey Bogart</lead>
<lead>Ingrid Bergman</lead>
<lead>Peter Lorre</lead>
</leads>
<directors>
<director>Michael Curtiz</director>
</directors>
</film>

The first XPath query (year) nabs a nodeset that contains a single value (1942). The second XPath query (leads/*) returns a nodeset
containing three values (Humphrey Bogart, Ingrid Bergman, and Peter Lorre). For each row returned by the SELECT command,
xpath_table() produces one or more tuples. The number of new tuples is determined by the largest nodeset returned by the XPath queries.
In this case, the largest nodeset contains three nodes, so xpath_table() will add three tuples to the final result set. To form the new tuples,
xpath_table() copies the key value ("Casablanca") into the first column of each new tuple, and then starts copying the nodes into the
remaining columns. If xpath_table() runs out of nodes for a given column (and it will run out when the nodesets differ in length), it writes a
NULL value into the tuple instead.

When it finishes with the first row, xpath_table() repeats the process for each of the remaining rows. When it hits the second row (Rear
Window), xpath_table() again finds that the nodeset produced by the first XPath query contains a single value (1954) and the second
nodeset contains three values; that means three more rows in the final result set. The last row returned by the SELECT command (The
Godfather) produces one nodeset that contains a single value (1972) and a second nodeset that contains six values, so xpath_table() adds
six more rows to the final result set.

You can infer a few rules from this walk-through:

• The number of columns produced by xpath_table() is always one more than the number of XPath queries that you specify

• The AS clause that you define must contain one more column than the number of XPath queries that you specify

• The first column in the result set always contains values from the key column that you specify

• The number of rows produced for any given row in the source table is determined by the largest nodeset extracted from that row

• For any given row in the source table, some columns will contain NULL values if the nodesets extracted from that row differ in size
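
These rules can be modeled in a few lines of Python. This is only a sketch of the behavior described in the walk-through, not the xml2 implementation: the function name xpath_table_rows() is hypothetical, Python's xml.etree.ElementTree stands in for the XPath engine, the document is a trimmed copy of the Casablanca description, and None stands in for NULL.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest

# Trimmed copy of the Casablanca document shown earlier.
CASABLANCA = """<film>
  <name>Casablanca</name>
  <year>1942</year>
  <leads>
    <lead>Humphrey Bogart</lead>
    <lead>Ingrid Bergman</lead>
    <lead>Peter Lorre</lead>
  </leads>
</film>"""


def xpath_table_rows(key, document, queries):
    """Model the tuples built from ONE source row (hypothetical helper)."""
    root = ET.fromstring(document)
    # One nodeset (a list of text values) per XPath query.
    nodesets = [[node.text for node in root.findall(q)] for q in queries]
    # zip_longest pads the shorter nodesets with None (our stand-in for
    # NULL), so the largest nodeset decides how many tuples come out.
    return [(key, *values) for values in zip_longest(*nodesets)]


rows = xpath_table_rows("Casablanca", CASABLANCA, ["year", "leads/*"])
for row in rows:
    print(row)
```

Each tuple has one more column than the number of queries, and the year column is filled only in the first tuple because its nodeset runs out after one value.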

You can see that the XPath query functions are powerful, but they can also be unwieldy. You can simplify the xml2 functions by wrapping
them in custom-made views and functions.

For example, you can easily create a view that uses xpath_table() to extract leading actors and actresses like this:

movies=# CREATE VIEW film_leads AS
movies-# SELECT * FROM
movies-# xpath_table('film_name','description','filminfo','leads/*','1=1')
movies-# AS t(name text, leads text);
CREATE VIEW

movies=# SELECT * FROM film_leads WHERE name = 'Casablanca';
name | leads
------------+----------------
Casablanca | Humphrey Bogart
Casablanca | Ingrid Bergman
Casablanca | Peter Lorre
(3 rows)

That's much better. Of course, you can treat a view built from xpath_table() just like any other table or view. For example, you could join
the film_leads view and the tapes table to produce a list of all leading actors and actresses starring in the films that you have in stock:

movies=# SELECT DISTINCT ON( title, leads ) tape_id, title, leads
movies-# FROM tapes, film_leads WHERE name = title;
tape_id | title | leads
----------+---------------+----------------
MC-68873 | Casablanca | Humphrey Bogart
MC-68873 | Casablanca | Ingrid Bergman
MC-68873 | Casablanca | Peter Lorre
AH-54706 | Rear Window | Grace Kelly
AH-54706 | Rear Window | James Stewart
AH-54706 | Rear Window | Raymond Burr
AB-67472 | The Godfather | Al Pacino
AB-67472 | The Godfather | Diane Keaton
AB-67472 | The Godfather | James Caan
AB-67472 | The Godfather | Marlon Brando
AB-67472 | The Godfather | Robert Duvall
AB-67472 | The Godfather | Talia Shire
(12 rows)

(The DISTINCT ON clause weeds out any duplicates in case you have multiple copies of the same video, each with a different tape_id.)

You can also simplify the XPath query functions (xpath_string(), xpath_nodeset(), xpath_list(), and so on) by wrapping them in more
convenient forms. In fact, you don't even have to resort to a procedural language (such as PL/pgSQL or Java); you can write the wrapper
functions in SQL. For example, the script shown in Listing 25.3 creates a function that returns a comma-separated list of the leading actors and
actresses starring in a given film.

Listing 25.3. starring.sql

-- Filename: starring.sql

CREATE FUNCTION starring( title TEXT ) RETURNS TEXT AS
$$
SELECT xpath_list( description, 'leads/*')
FROM filminfo
WHERE film_name = $1
$$
LANGUAGE 'SQL';

You can call this function in the select-list part of a SELECT command, in the WHERE clause, or in both parts:

movies=# SELECT tape_id, title, starring( title ) FROM tapes
movies-# WHERE starring( title ) LIKE '%James Stewart%';
tape_id | title | starring
----------+-------------+--------------------------------------
AH-54706 | Rear Window | James Stewart,Grace Kelly,Raymond Burr
(1 row)

That query returns the leading actors and actresses who star in any video that you stock that features James Stewart.
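
To see what starring() hands back, here is a rough Python model of the xpath_list() call inside it. Treat it as an illustration only: the helper name xpath_list_model() is hypothetical, ElementTree stands in for the XPath engine, the Rear Window document is a cut-down stand-in reconstructed from the query output above, and the comma separator is an assumption based on that same output.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the Rear Window description column.
REAR_WINDOW = """<film>
  <name>Rear Window</name>
  <leads>
    <lead>James Stewart</lead>
    <lead>Grace Kelly</lead>
    <lead>Raymond Burr</lead>
  </leads>
</film>"""


def xpath_list_model(document, query, separator=","):
    """Join the text of every node matched by query into one string."""
    root = ET.fromstring(document)
    return separator.join(node.text for node in root.findall(query))


starring = xpath_list_model(REAR_WINDOW, "leads/*")
print(starring)
# The WHERE clause above then reduces to a substring test on that string.
print("James Stewart" in starring)
```

Because the result is a single string, a LIKE test is enough to find a film by any one of its leads.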

Converting XML Data with XSLT

At the beginning of the previous section, I showed you how to use XSLT to translate an XML document into a sequence of INSERT commands.
XSLT can convert any XML document into (just about) any other form.

The xml2 contributed module includes an XSLT processor that you can invoke from within the PostgreSQL server. (Actually, xml2 includes an
interface to the libxslt package.) That means that you can use XSLT to convert XML documents stored inside of your database, without ever
leaving the comfort of your favorite PostgreSQL client.

XSLT is often used to produce HTML web pages from XML documents, and in this section, I'll show you how to turn the XML documents
stored in the filminfo table into user-friendly web pages.

To convert an XML document using an XSLT stylesheet, call the xslt_process() function. xslt_process() expects two arguments:

xslt_process( document TEXT, stylesheet TEXT ) RETURNS TEXT

It may seem obvious, but it's worth pointing out that you can provide either argument as a TEXT literal, as an expression that evaluates to a
TEXT value, or as a reference to a column in the database. The most convenient way to use xslt_process() is to store both the XML
document and the XSLT stylesheet in a PostgreSQL table (probably in two separate tables).

To start this exercise, I'll create a table that will hold XSLT stylesheets:

movies=# CREATE TABLE transforms( name VARCHAR, stylesheet TEXT );
CREATE TABLE

Next, I'll create a function named stylesheet() that will retrieve a stylesheet from the transforms table, given the name of the desired
stylesheet:

movies=# CREATE FUNCTION stylesheet( name VARCHAR ) RETURNS TEXT AS
movies-# $$
movies$# SELECT stylesheet FROM transforms WHERE name = $1
movies$# $$ LANGUAGE 'SQL';
CREATE FUNCTION

At this point, the movies database stores XML documents inside of the filminfo table and stores XSLT stylesheets in the transforms table.
Listing 25.4 shows a script that will add a new stylesheet to the transforms table.

Listing 25.4. movieOfTheWeek.sql

INSERT INTO transforms VALUES(
'movieOfTheWeek',

$$<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:output method="html"/>
<xsl:template match="/film">
<HTML>
<HEAD><TITLE>Movie of the week</TITLE></HEAD>
<STYLE TYPE="text/css">
BODY {font-family: sans-serif}
H1 {font-size: 20pt}
H2 {font-size: 22pt; font-style: italic}
UL {font-size: 16pt}
</STYLE>
<BODY>
<H1>This week's movie</H1>
<H2><xsl:value-of select="name"/></H2>

<H1>Starring:</H1>

<UL>
<xsl:for-each select="leads/lead">
<LI><xsl:value-of select="."/></LI>
</xsl:for-each>
</UL>

<H1>Directed by:</H1>

<UL>
<xsl:for-each select="directors/director">
<LI><xsl:value-of select="."/></LI>
</xsl:for-each>
</UL>

<P>Released in: <xsl:value-of select="year"/></P>

</BODY>
</HTML>
</xsl:template>
</xsl:stylesheet>$$ );

When you execute this script, you're storing an XSLT stylesheet in the transforms table. I won't walk you through the workings of the
stylesheet; XSLT is a powerful but unfriendly language and I can't really do it justice in this book, so pick up a good XSLT book if you need
more information. I will point out a very important rule, though: the XSLT processor included with xml2 examines the first character in the
stylesheet to decide whether you've given it a real stylesheet or a reference to a remote stylesheet. If the first character is a "<", the XSLT
processor assumes that the string is the stylesheet. If the first character is anything other than a "<", the XSLT processor assumes that
you've given it a URI that identifies the real stylesheet (that is, the string is treated as if it were something like
"http://example.com/the-real-stylesheet.xsl"). When you INSERT (or UPDATE) a stylesheet into a table, make sure that the opening "<" immediately
follows the quote character(s); you don't want a newline between the "$$" and the "<" or you'll spend a great deal of time increasing your
four-letter vocabulary.
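
The first-character rule is easy to trip over, so here is a small Python sketch of it. It models only the decision described above; the function name treated_as_inline_stylesheet() is hypothetical, and this is not the code that xml2 or libxslt actually runs.

```python
def treated_as_inline_stylesheet(text):
    """Model of the rule: only a leading '<' means 'this IS the stylesheet'."""
    return text[:1] == "<"


good = '<xsl:stylesheet version="1.0">...</xsl:stylesheet>'
bad = "\n" + good  # a newline slipped in right after the opening $$

print(treated_as_inline_stylesheet(good))
print(treated_as_inline_stylesheet(bad))
# Stripping leading whitespace before storing the text avoids the trap.
print(treated_as_inline_stylesheet(bad.lstrip()))
```

Under this model, the string with the leading newline would be treated as a URI rather than as a stylesheet, which is why the listing starts the stylesheet on the same line as the "$$".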

To apply the stylesheet shown in Listing 25.4, choose a film that you want to feature (say, Casablanca) and execute the following
commands:

movies=# \t
Showing only tuples.
movies=# \o movieOfTheWeek.html
movies=# SELECT xslt_process( description, stylesheet( 'movieOfTheWeek' ))
movies-# FROM filminfo
movies-# WHERE film_name = 'Casablanca';
movies=# \o
movies=# \t
Tuples only is off

The first command (\t) turns on psql's "tuple-only mode." In this mode, psql omits the column headers (and command responses) that you
normally see when you execute a query; you only see the data values returned by each query. The second command
(\o movieOfTheWeek.html) tells psql to create a new file named movieOfTheWeek.html (in the current directory) and write command results to
that file. Once you've executed those two commands, psql will write raw query results (tuples only) to movieOfTheWeek.html.

As I mentioned earlier, xslt_process() expects two arguments. The first argument tells xslt_process() to convert the XML document
found in filminfo's description column. The second argument specifies the stylesheet that xslt_process() will use to control the
conversion process. The SELECT command calls the stylesheet() function I defined earlier (at the beginning of this section) to retrieve the
movieOfTheWeek stylesheet.

When the SELECT command completes, the \o command closes movieOfTheWeek.html and redirects psql output to your screen. The \t
command turns tuple-only mode off again so that you'll see column headers (and command tags).

The overall result here is that you've produced an HTML document (movieOfTheWeek.html) by converting an XML document using an
XSLT stylesheet. If you view movieOfTheWeek.html in a web browser, you'll see a page similar to that shown in Figure 25.1.

Figure 25.1. movieOfTheWeek.html.

Using Full-text Search

If you've ever used Google to search for a web site (and who hasn't), you've used a full-text search engine. A full-text search engine catalogs
the words and phrases found in a set of documents and then lets you search for documents that match a given pattern. The tsearch2
contributed module adds full-text search features to a PostgreSQL server. When you add the tsearch2 package to a database, you get a small
collection of new data types, a large collection of new functions (most of which you'll never invoke directly), and a new search operator.
When you use tsearch2, the documents that you catalog and search are the values stored in your database. If you have a customers table that
stores miscellaneous notes about each customer, the notes column in each row might be a searchable document. If your database includes a
parts list, you may want to catalog the description of each part. In tsearch2 terms, a document is a string that you want to catalog and
search. A pattern is a logical collection of words that you want to find. A pattern may be as simple as a single word. If a pattern contains
multiple words, you can use pattern operators to define how the words relate to the documents that you want to search. For example, you
can create a pattern that matches any document that contains every word in the pattern, a pattern that matches any document that contains
any of the words in the pattern, a pattern that matches all documents that contain none of the words in the pattern, or any combination.

Before you try out any of the examples in this section, install tsearch2 into your database and then execute the following command:

movies=# SELECT set_curcfg( 'default' );
set_curcfg
----------

(1 row)

If you don't call set_curcfg() first, the queries that I show you will fail with the error:

ERROR: could not find tsearch config by locale

I'll explain the set_curcfg() function in a moment, but let's look at a few sample queries first.

The @@ operator is similar in concept to LIKE, ILIKE, or the regular-expression operators (~, ~*, and so on): it compares a pattern against a
string. To use the @@ operator, you must convert the pattern into an object of type tsquery and the string you want to compare against into
an object of type tsvector. For example, to search the tapes table for titles that include the word "Godfather":

movies=# SELECT tape_id, title FROM tapes WHERE
movies-# to_tsvector( title ) @@ to_tsquery( 'Godfather' );
tape_id | title
----------+--------------
AB-12345 | The Godfather
AB-67472 | The Godfather
(2 rows)

If you want to search for all titles that include Rear and Window, the search pattern should look like this:

to_tsquery( 'Rear & Window' )

To search for all titles that include Rear or Window, separate the words with the | operator, like this:

to_tsquery( 'Rear | Window' )

To search for all titles that include the word Window, but not the word Rear, use the ! operator:

to_tsquery( 'Window & ! Rear' )

You can combine operators to create complex patterns. Use parentheses to group expressions. For example, this pattern

to_tsquery( 'Island & (Earth | Gilligan)' )

will match This Island Earth and Escape From Gilligan's Island, but not Escape From Devil's Island.

It's easy to confuse the tsvector and tsquery data types since they appear to be so similar. They are, in fact, very different. A tsvector
contains a catalog of the words that appear in a document. A tsquery contains a search pattern, not just a list of words. In fact, if you try to
create a tsquery from a string that contains a list of words, tsearch2 will reward you with an error message (ERROR: syntax error).
Instead, you have to tell tsearch2 how those words relate to the document that you're searching; you have to separate multiple words with a
tsquery operator (&, |, or !).
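
One way to picture the difference is to treat a tsvector as a set of words and a tsquery as a boolean test applied to that set. The Python sketch below does exactly that. It is a loose model, not how tsearch2 stores or evaluates anything: the helper names are hypothetical, the tokenizer is a crude regular expression, and "Secret Window" is a made-up title used only to exercise the ! operator.

```python
import re


def words(title):
    """Crude stand-in for a tsvector: the set of lowercased words."""
    return set(re.findall(r"[a-z]+", title.lower()))


def island_pattern(doc):
    """Hand translation of the pattern  Island & (Earth | Gilligan)."""
    return "island" in doc and ("earth" in doc or "gilligan" in doc)


def window_not_rear(doc):
    """Hand translation of the pattern  Window & ! Rear."""
    return "window" in doc and "rear" not in doc


titles = [
    "This Island Earth",
    "Escape From Gilligan's Island",
    "Escape From Devil's Island",
]
hits = [t for t in titles if island_pattern(words(t))]
print(hits)

print(window_not_rear(words("Rear Window")))
print(window_not_rear(words("Secret Window")))  # hypothetical title
```

A bare list of words gives the test nothing to combine, which is one way to remember why a tsquery needs an operator between every pair of words.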

Of course, if pattern matching were the only feature offered by tsearch2, it wouldn't be very exciting. After all, you can match patterns with
LIKE/ILIKE or the regular-expression operators. tsearch2 offers three distinct advantages over the other pattern-matching mechanisms
offered by PostgreSQL:

• Stop words

• Stemming

• Indexing

A stop word is a word that tsearch2 automatically ignores when searching and indexing. If you've ever Googled for a phrase like the world's
best margarita recipe, you may have noticed the following reply:

"the" is a very common word and was not included in your search.

"The" is a stop word: it's a word that would match just about every (English) document ever searched (other examples include "a," "and,"
"not," "some," and so on). Stop words take up extra space (and slow down searches) without contributing to the task of identifying
interesting documents. When you convert a search pattern into a tsquery object, tsearch2 strips out any stop words that it finds in the
pattern.

Stemming is the process of identifying word variations. When you convert a search pattern into a tsquery object, to_tsquery() replaces each
word in the pattern with its stem. For example, donate, donation, donating, donates, and donated are all variations of the same word. If
to_tsquery() finds one of those variants in a search pattern, it replaces each occurrence with the stem: donat. That means that a search for
donate will match documents that contain any variant of donate.

Of course, if you stem a search pattern, you must also stem the documents that you are searching through. to_tsvector() removes stop
words and stems word variations using the same set of rules used by to_tsquery().
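
The following Python sketch shows why the query and the document must pass through the same stop-word and stemming rules. It is a toy: the suffix list is an assumption chosen so that the donate variants collapse to donat, and it is not the stemmer that tsearch2 uses. The stop-word list is limited to the examples named above, and the sample sentence is invented.

```python
import re

STOP_WORDS = {"the", "a", "and", "not", "some"}   # examples named in the text
SUFFIXES = ("ing", "ion", "es", "ed", "e", "s")   # toy rules, NOT tsearch2's


def stem(word):
    """Strip the first matching suffix; a toy stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, drop stop words, and stem whatever is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {stem(t) for t in tokens if t not in STOP_WORDS}


variants = ["donate", "donation", "donating", "donates", "donated"]
stems = {stem(v) for v in variants}
print(stems)

query = normalize("donate")
document = normalize("The museum recorded a donation and thanked the donors")
print(query <= document)  # every query stem appears in the document
```

If only one side were stemmed, the query stem donat would never equal the unstemmed word donation, and the match would be lost.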

The process of stemming, stopping, and cataloging the words in a document is expensive. When you execute a query that invokes
to_tsvector() like this:

SELECT tape_id, title FROM tapes WHERE
to_tsvector( title ) @@ to_tsquery( 'Godfather' );

the PostgreSQL server will stem, stop, and catalog every row in the table. If you execute the same query (or a similar query) again, the
server has to stem, stop, and catalog every row a second time. You can greatly improve performance by building a tsvector for each
document at the time you add the document to the database.

The movies sample database that we've been using in most of this book doesn't really contain enough data to thoroughly exercise tsearch2,
but the recalls table presented in Chapter 4, "Performance," does. The recalls table (in the perf database) contains 39,241 rows of
information about automobile recalls. Each row contains three large VARCHAR fields that contain a description of a defect, the possible
consequences of the defect, and the corrective action promised by the manufacturer.

To demonstrate the performance benefits offered by tsearch2, I'll use tsearch2 to count the number of recalls that contain the word
hydraulic (in the desc_defect column).

It takes my computer approximately 6.4 seconds to execute the following query:

SELECT COUNT(*) FROM recalls WHERE
to_tsvector( desc_defect ) @@ to_tsquery( 'hydraulic' );

That query stems, stops, and catalogs every desc_defect in the recalls table, and identifies 808 rows that match the given pattern. If I
execute the same query repeatedly, each iteration takes (approximately) the same amount of time.

Just for purposes of comparison, it takes approximately 2.7 seconds to perform a similar query using a regular expression:

SELECT COUNT(*) FROM recalls WHERE
desc_defect ~* 'hydraulic';

Again, if I execute the same query repeatedly, each iteration takes roughly the same amount of time.

Why does a tsearch2-based search take nearly two and a half times longer than a regular-expression search? Because the to_tsvector()
function is stemming, stopping, and cataloging every word found in the desc_defect column. The regular-expression search simply scans
through the desc_defect column and stops as soon as it finds the word hydraulic. In fact, tsearch2 and the regular-expression search
identify different sets of matches (because of the stemming rules and parsing rules used by tsearch2).

It's usually a waste of time to stem, stop, and catalog every row for each tsearch2 query, because the vast majority of the documents remain
unchanged from query to query. Instead, I'll add a tsvector column to my recalls table and "precompute" the stem, stop, and catalog
information required for a tsearch2 query:

perf=# ALTER TABLE recalls
perf-# ADD COLUMN fts_desc_defect TSVECTOR;
ALTER TABLE

perf=# UPDATE RECALLS SET
perf-# fts_desc_defect = to_tsvector( desc_defect );
UPDATE 39241

perf=# VACUUM FULL ANALYZE recalls;
VACUUM

Now I can run a tsearch2 query again, but this time, I search the new fts_desc_defect column instead:

SELECT COUNT(*) FROM recalls WHERE
fts_desc_defect @@ to_tsquery( 'hydraulic' );

Notice that I don't have to convert fts_desc_defect into a tsvector because it already is a tsvector. This query identifies the same set of
rows selected by the first query, but this query runs in 0.22 seconds. That's a considerable improvement over the original query (6.4
seconds). But I can make it faster still.

tsearch2 provides the infrastructure required to index tsvector values. To create an index that @@ can use:

perf=# CREATE INDEX fti_desc_defect ON recalls USING GIST( fts_desc_defect );
CREATE INDEX

perf=# VACUUM FULL ANALYZE recalls;
VACUUM

Now when I search the fts_desc_defect column (using the same query), it takes less than .04 seconds (4/100ths of a second) to identify
the same set of rows. PostgreSQL uses the index on fts_desc_defect to read only those rows that match the search pattern.

At this point, I've improved performance, but I've introduced a bug. If I search the desc_defect column, PostgreSQL has to stem, stop, and
catalog every row in the recalls table (every time I search) and I get poor performance. If I search the pre-cataloged fts_desc_defect
column, I get good performance. But what happens if I add a new row to the recalls table? Or UPDATE an existing row (and change the words
in the desc_defect column)? Searching against desc_defect guarantees that I'll see the most recent data. Searching against
fts_desc_defect means that I may see obsolete data, because nothing refreshes that column when desc_defect changes. Fortunately, this
problem is easy to fix; in fact, there are two different solutions.

The most obvious way to keep fts_desc_defect up-to-date is to create a TRIGGER that recomputes the tsvector whenever I add a new row
or update an existing row. tsearch2 even comes with a function that you can use to implement the trigger:

perf=# CREATE TRIGGER tg_fts_recalls
perf-# BEFORE UPDATE OR INSERT ON RECALLS
perf-# FOR EACH ROW
perf-# EXECUTE PROCEDURE tsearch2( fts_desc_defect, desc_defect );
CREATE TRIGGER

The tsearch2() function expects two arguments: the name of a tsvector column and the name of a text (or other string-valued) column.
The trigger will effectively call to_tsvector( desc_defect ) and copy the result into the fts_desc_defect column.

I can test this trigger pretty easily:

perf=# SELECT COUNT(*) FROM recalls
perf-# WHERE fts_desc_defect @@ to_tsquery('hydraulic');
count
------
808
(1 row)

perf=# UPDATE recalls
perf-# SET desc_defect = 'busted hydraulic line'
perf-# WHERE record_id = 4909;
UPDATE 1

perf=# SELECT COUNT(*) FROM recalls
perf-# WHERE fts_desc_defect @@ to_tsquery('hydraulic');
count
------
809
(1 row)

Another way to solve this problem is to drop the fts_desc_defect column; you don't need it to gain the benefits offered by pre-cataloging
and indexing. Instead of adding an fts_desc_defect column and then creating an index that covers that column, just create a
function-based index. First, I'll clean out the tsvector column that I added earlier:

perf=# DROP TRIGGER tg_fts_recalls ON recalls;
DROP TRIGGER

perf=# DROP INDEX fti_desc_defect;
DROP INDEX

perf=# ALTER TABLE recalls DROP COLUMN fts_desc_defect;
ALTER TABLE

Now I'll create a new function-based index:


perf=# CREATE INDEX fti_desc_defect ON recalls
perf-# USING GIST( to_tsvector( desc_defect ));
CREATE INDEX

As you might expect, it takes a while to create the function-based index. PostgreSQL reads through every row in the recalls table; invokes
the to_tsvector() function to stem, stop, and catalog the desc_defect column; and then stores the result in the new index. Of course, if I
add a new row to the recalls table (or update an existing row), PostgreSQL ensures that the index is kept up-to-date.

By creating a function-based index (or, more properly, an expression-based index), I eliminate the need for a trigger, I can get rid of the
extra tsvector column (and save quite a bit of space), and I still get the performance boost offered by a pre-cataloged index.

Searching Multiple Columns

The recalls table contains three VARCHAR fields that we might want to search: desc_defect (a description of the defect), con_defect
(possible consequences of the defect), and cor_action (the corrective action promised by the manufacturer). I could search all three
columns using a query such as

perf=# SELECT COUNT(*) FROM recalls WHERE
perf-# to_tsvector(desc_defect) @@ to_tsquery('hydraulic' )
perf-# OR
perf-# to_tsvector(con_defect) @@ to_tsquery('hydraulic' )
perf-# OR
perf-# to_tsvector(cor_action) @@ to_tsquery('hydraulic' );
count
------
902
(1 row)

That works, but it's not a simple query to write. Instead of searching through each column individually, I can string all three columns together
and search through the concatenation:

perf=# SELECT COUNT(*) FROM recalls WHERE
perf-# to_tsvector(desc_defect || con_defect || cor_action)
perf-# @@ to_tsquery('hydraulic' );
count
------
902
(1 row)

Unfortunately, the simplicity of this query is misleading: it works for the recalls table, but it won't produce correct results if your documents
contain any NULL values. (The document columns in recalls contain no NULL values.) A NULL value breaks this query because the
concatenation operator (||) returns NULL whenever either operand is NULL. In short: 'hydraulic line' || NULL ||
'hydraulic piston' evaluates to NULL. That means that a NULL value in desc_defect, con_defect, or cor_action would effectively hide
the other values in that row. To fix this problem, I can rewrite the query using the coalesce() function to map NULL values into some other
value (in this case, an empty string):

perf=# SELECT COUNT(*) FROM recalls WHERE
perf-# to_tsvector(
perf(# COALESCE( desc_defect, '' ) || ' ' ||
perf(# COALESCE( con_defect, '' ) || ' ' ||
perf(# COALESCE( cor_action, '' ))
perf-# @@ to_tsquery('hydraulic' );
count
------
902
(1 row)

This query takes about 15 seconds on my computer. (The OR version takes the same amount of time.) So much for simplicity: the OR version
was easier to write and easier to understand.
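
If you want to convince yourself of the NULL rule without touching the recalls table, the short Python script below runs the two expressions through an in-memory SQLite database. SQLite is only a convenient stand-in here (an assumption on my part, since the rest of this chapter uses PostgreSQL); it follows the same rule for || that is described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Concatenating with NULL yields NULL, so the whole document disappears.
hidden = conn.execute(
    "SELECT 'hydraulic line' || NULL || 'hydraulic piston'"
).fetchone()[0]

# COALESCE maps NULL to an empty string first, so the other values survive.
visible = conn.execute(
    "SELECT COALESCE('hydraulic line', '') || ' ' || "
    "COALESCE(NULL, '') || ' ' || COALESCE('hydraulic piston', '')"
).fetchone()[0]

print(repr(hidden))
print(repr(visible))
conn.close()
```

The first result comes back as None (Python's rendering of NULL); the second keeps both phrases, separated by the two spaces that surround the empty middle value.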

However, I've been leading you down this tortuous path for a good reason. Remember that in the previous section I showed you how to
create an expression-based index? I can create an index defined by the rather complex expression that I wrote in that last query, and
PostgreSQL will use that index to search for patterns in desc_defect, con_defect, and cor_action.

perf=# CREATE INDEX fti_recalls ON recalls USING GIST(
perf(# to_tsvector(
perf(# COALESCE( desc_defect, '' ) || ' ' ||
perf(# COALESCE( con_defect, '' ) || ' ' ||
perf(# COALESCE( cor_action, '' )));
CREATE INDEX

Now, when I run the previous query (the one with all the COALESCE noise in it), I see the results in 0.44 seconds.

Simplifying tsearch2 with Customized Functions

tsearch2 queries tend to be rather unwieldy. You can simplify tsearch2 by creating a few wrapper functions that hide the details of the
complicated queries. For example, I can create a function named documents() that will return the (properly coalesced) concatenation of
desc_defect, con_defect, and cor_action:

perf=# CREATE FUNCTION documents( recall recalls ) RETURNS TEXT AS
perf-# $$
perf$# SELECT
perf$# COALESCE( $1.desc_defect, '' ) || ' ' ||
perf$# COALESCE( $1.con_defect, '' ) || ' ' ||
perf$# COALESCE( $1.cor_action, '' );
perf$# $$ LANGUAGE 'SQL' IMMUTABLE;

I can also define a function named document() (singular this time) that converts documents() into a tsvector:

perf=# CREATE FUNCTION document( recall recalls ) RETURNS TSVECTOR AS
perf-# $$
perf$# SELECT to_tsvector( documents( $1 ));
perf$# $$ LANGUAGE 'SQL' IMMUTABLE;

Now I can use the document() function in conjunction with the @@ operator:

perf=# SELECT COUNT(*) FROM recalls r
perf-# WHERE document(r) @@ to_tsquery( 'hydraulic' );
count
-----
902
(1 row)

That's much easier to type in and much easier to read. I can even create a function-based index based on document():

perf=# DROP INDEX fti_recalls;
DROP INDEX

perf=# CREATE INDEX fti_recalls ON recalls USING GIST( document( recalls ));
CREATE INDEX

The document() function returns a tsvector based on desc_defect, con_defect, and cor_action. The only thing that I can do with a
tsvector is search it, so I may as well add another function that simplifies the search:

perf=# CREATE FUNCTION find_recalls( pattern text ) RETURNS SETOF RECALLS AS
perf-# $$
perf$# SELECT * FROM recalls WHERE
perf$# to_tsvector(
perf$# COALESCE( desc_defect, '' ) || ' ' ||
perf$# COALESCE( con_defect, '' ) || ' ' ||
perf$# COALESCE( cor_action, '' ))
perf$# @@ to_tsquery( $1 );
perf$# $$ LANGUAGE 'SQL';
CREATE FUNCTION

Now I can simply invoke find_recalls() to search for a pattern in desc_defect, con_defect, or cor_action:

perf=# SELECT COUNT(*) FROM find_recalls( 'hydraulic' );
count
-----
902
(1 row)

Searching for Phrases

You can't use tsearch2 to search for phrases. tsearch2 catalogs the individual words in a document, but doesn't keep enough information to
know when one word directly follows another. You can't use tsearch2, for example, to search for a phrase such as hydraulic line.
(Enclosing the phrase in quotes won't help.) You can search for a pattern such as hydraulic & line, but that will match all documents that
contain both words, even if line appears before hydraulic or if hydraulic is separated from line by a number of other words.

To search for a phrase, you have to use LIKE, ILIKE, or a regular-expression operator. For example:

perf=# SELECT COUNT(*) FROM recalls
perf-# WHERE documents(recalls) ~* 'hydraulic line';
count
------
53
(1 row)

But you can still use tsearch2 to speed up a phrase search. It stands to reason that any document that contains the phrase hydraulic line
will contain the individual words hydraulic and line, right? Looking at it the other way around, a document cannot contain the phrase
hydraulic line unless it contains the individual words hydraulic and line. You already know how to identify the set of rows that contain the
words hydraulic and line: just add AND document(recalls) @@ to_tsquery( 'hydraulic & line' ) to the WHERE clause.

The original version of this query (regular-expression only) takes about 8.2 seconds to run on my computer. By adding tsearch2 to a
regular-expression based phrase search, I can drastically reduce the number of rows that the server will have to search. The new query looks like
this:

perf=# SELECT COUNT(*) FROM recalls
perf-# WHERE documents(recalls) ~* 'hydraulic line'
perf-# AND document(recalls) @@ to_tsquery('hydraulic & line');
count
------
53
(1 row)

The new version takes 0.12 seconds to identify the same set of rows. I can use the timer utility (described in Chapter 4) to see what the
server does with each query. Here's the timer output from the first (regular-expression only) query:

$ timer "SELECT COUNT(*) FROM recalls
> WHERE documents(recalls) ~* 'hydraulic line'"
+--------+----------------------------------+---------------------------------+
|        |          SEQUENTIAL I/O          |           INDEXED I/O           |
|        | scans| tuples | heap_blks |cached| scans| tuples | idx_blks |cached|
|--------+------+--------+-----------+------+------+--------+----------+------+
|recalls |     1|  39241 |      5399 |    0 |     0|      0 |        0 |    0 |
|--------+------+--------+-----------+------+------+--------+----------+------+

You can see that the server evaluated this query by scanning every one of the 39,241 rows in the table. Here's the timer output from the
second query, which adds the tsearch2 condition:

$ timer "SELECT COUNT(*) FROM recalls
> WHERE documents(recalls) ~* 'hydraulic line'
> AND document(recalls) @@ to_tsquery('hydraulic & line')"
+--------+----------------------------------+---------------------------------+
|        |          SEQUENTIAL I/O          |           INDEXED I/O           |
|        | scans| tuples | heap_blks |cached| scans| tuples | idx_blks |cached|
|--------+------+--------+-----------+------+------+--------+----------+------+
|recalls |     0|      0 |        71 |    5 |     1|    126 |      173 |  298 |
|--------+------+--------+-----------+------+------+--------+----------+------+

The server first uses an index scan (on fts_recalls) to quickly identify the 126 rows that satisfy the tsearch2 criteria and then matches
each of those rows against the regular expression.
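
The same filter-then-verify idea can be sketched outside the database. The following Python fragment is only a model of the technique, not a description of how PostgreSQL or tsearch2 works internally: the sample documents, build_index(), and phrase_search() are all invented for illustration. It builds a small word index, uses it to find the documents that contain every word in the phrase, and then applies the (more expensive) regular expression to those candidates alone.

```python
import re

# Hypothetical documents standing in for rows of the recalls table.
docs = {
    1: "Hydraulic line may leak near the brake booster",
    2: "The line to the hydraulic pump can chafe",
    3: "Seat belt anchor bolt may fracture",
}

def build_index(documents):
    """Map each lowercased word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index

def phrase_search(documents, index, phrase):
    """Narrow by word membership first, then confirm the phrase with a regex."""
    words = phrase.lower().split()
    # Step 1: cheap filter, ids of documents that contain every word.
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    # Step 2: expensive check, applied only to the candidates.
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return sorted(d for d in candidates if pattern.search(documents[d]))

index = build_index(docs)
print(phrase_search(docs, index, "hydraulic line"))
```

Document 2 contains both words, so it survives the first step, but the words are not adjacent, so the regular expression rejects it. Document 3 is never examined by the regular expression at all.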

Configuring tsearch2

Each time you start a new client session, tsearch2 tries to find a configuration that matches the server's locale. tsearch2 configurations are
stored in a small collection of tables: pg_ts_cfg, pg_ts_cfgmap, pg_ts_parser, and pg_ts_dict. tsearch2 comes with three predefined
configurations: default, default_russian, and simple:

perf=# SELECT * FROM pg_ts_cfg;
     ts_name     | prs_name |    locale
-----------------+----------+--------------
 default         | default  | C
 default_russian | default  | ru_RU.KOI8-R
 simple          | default  |
(3 rows)

To find the proper configuration, tsearch2 searches pg_ts_cfg for a row where the locale column matches your server's locale. If it can't find
a matching configuration, you'll see a message stating ERROR: could not find tsearch config by locale. You can find the locale used
by your server with the following query:

perf=# SELECT setting FROM pg_settings WHERE name = 'lc_ctype';
   setting
-------------
 en_US.UTF-8
(1 row)
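
The lookup is easy to picture as a scan over the pg_ts_cfg rows. The Python sketch below is a model only: the row data is copied from the listing above, find_config() is a hypothetical name, and the exact-match comparison is my assumption about how the locale column is compared.

```python
# Rows modeled on the pg_ts_cfg listing: (ts_name, prs_name, locale).
PG_TS_CFG = [
    ("default", "default", "C"),
    ("default_russian", "default", "ru_RU.KOI8-R"),
    ("simple", "default", None),
]

def find_config(server_locale, cfg_rows=PG_TS_CFG):
    """Return the ts_name whose locale equals server_locale, else raise."""
    for ts_name, _prs_name, locale in cfg_rows:
        # Assumption: an exact string comparison against the locale column.
        if locale == server_locale:
            return ts_name
    raise LookupError("could not find tsearch config by locale")

print(find_config("C"))
```

With the three predefined rows, a server running in the en_US.UTF-8 locale finds no match, which is the situation the rest of this section deals with.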

If your server's locale doesn't match any of the locales in pg_ts_cfg, you have four options:

• Specify a configuration on every call to to_tsvector() and to_tsquery()

• Call the set_curcfg() function at the beginning of every client session

• Clone an existing configuration

• Create a new configuration from scratch

The first option is simple, but it complicates your code. The to_tsvector() function comes in two flavors[2]. To use the first flavor, you
invoke to_tsvector() with a single string argument and it converts that string into a tsvector using the "current" configuration. To use the
second flavor, call to_tsvector() with two arguments: the name of a configuration and a string. to_tsvector() will convert the string into a
tsvector using the configuration that you specified in the first argument. to_tsquery() comes in two flavors as well.

[2] tsearch2 provides a third flavor for to_tsvector() and to_tsquery(). You must know the OID of a pg_ts_cfg row to use
the third form.

The second option is inconvenient and somewhat dangerous. You have to call set_curcfg() in every client session that might use tsearch2.
If you've created a trigger or index based on tsearch2, that means you have to call set_curcfg() in any session that could update a
cataloged column.

Cloning an existing configuration is often the easiest and safest choice. It is a two-step process. First, you add
a new row to the pg_ts_cfg table, then you make a copy of the corresponding entries in the pg_ts_cfgmap table. For example, if your
server's locale is en_US.UTF8, you can clone the default configuration with the following commands:

perf=# INSERT INTO pg_ts_cfg VALUES( 'default_enUS', 'default', 'en_US.UTF8');
INSERT

perf=# INSERT INTO pg_ts_cfgmap
perf-# SELECT 'default_enUS', tok_alias, dict_name
perf-# FROM pg_ts_cfgmap WHERE ts_name = 'default';
INSERT

The first command creates a new (empty) configuration named default_enUS. tsearch2 will select this configuration when the server's locale
is en_US.UTF8. The second command clones the default entries in pg_ts_cfgmap, creating an identical set of entries that belong to
default_enUS. Once you've created a clone that matches your server's locale, you should be able to use tsearch2 without specifying an
explicit configuration in each call to to_tsquery() and to_tsvector().
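
If you want to convince yourself that the INSERT ... SELECT really copies every mapping, you can rehearse the two steps against a scratch database. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL, so it is only a rehearsal of the SQL pattern: the two tables are simplified stand-ins (dict_name is plain text here), and the sample mapping rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the tsearch2 tables, seeded with sample rows.
conn.executescript("""
    CREATE TABLE pg_ts_cfg (ts_name TEXT, prs_name TEXT, locale TEXT);
    CREATE TABLE pg_ts_cfgmap (ts_name TEXT, tok_alias TEXT, dict_name TEXT);
    INSERT INTO pg_ts_cfg VALUES ('default', 'default', 'C');
    INSERT INTO pg_ts_cfgmap VALUES ('default', 'lword', '{en_stem}');
    INSERT INTO pg_ts_cfgmap VALUES ('default', 'url', '{simple}');
""")

# Step 1: add the new configuration row.
conn.execute(
    "INSERT INTO pg_ts_cfg VALUES ('default_enUS', 'default', 'en_US.UTF8')")
# Step 2: copy every mapping that belongs to 'default'.
conn.execute("""
    INSERT INTO pg_ts_cfgmap
    SELECT 'default_enUS', tok_alias, dict_name
      FROM pg_ts_cfgmap WHERE ts_name = 'default'
""")

def mappings(ts_name):
    """Return the (tok_alias, dict_name) pairs for one configuration."""
    rows = conn.execute(
        "SELECT tok_alias, dict_name FROM pg_ts_cfgmap "
        "WHERE ts_name = ? ORDER BY tok_alias", (ts_name,))
    return rows.fetchall()

print(mappings("default_enUS") == mappings("default"))
```

After both steps, the clone has exactly the same token-to-dictionary mappings as the configuration it was copied from.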

Creating a new configuration from scratch is not too complex, but you'll need an understanding of the stemming, stopping, and cataloging
process before you start.

When you create a tsvector from a text string, tsearch2 starts by invoking a parser that picks apart the text string into its component parts.
tsearch2 comes with a single parser (named default), but you can write your own parser if you have special requirements. The default
parser was designed to parse plain-text and HTML documents: It knows how to process HTML tags, HTTP headers, email and host addresses,
and so on. The parser identifies and classifies each token in a text string. For example, given the string "send 42 messages to
bruce@example.com", the default parser will identify three words (send, messages, and to), one unsigned integer (42), and an email address
(bruce@example.com). The default parser classifies each token into one (or more) of the categories shown in Table 25.1[3].

[3] The default parser also defines three categories for words and word fragments composed of Cyrillic characters: nlword,
nlhword, and part-nlhword.

Table 25.1. tsearch2 Lexical Categories

Category    | Description                                                             | Examples
------------+-------------------------------------------------------------------------+-----------------------------------------------------------------
lword       | Any word composed entirely of alphabetic characters                     | bruce
word        | Any word composed of alphabetic and numeric characters                  | bruce42, postgres81
email       | An Internet email address (user@host)                                   | bruce@example.com
url         | An HTTP or FTP URL                                                      | http://www.postgresql.org/index.html ftp://ftp.postgresql.org/index.html
host        | An Internet host name                                                   | www.postgresql.org localhost.localdomain
sfloat      | A floating-point number in scientific notation                          | 325.667E12 6.626E-34
version     | A generic version number (a number with more than one decimal point)    | 8.0.0 2.6.9.1
part_hword  | Parts of a hyphenated word                                              | post-gres-sql8
lpart_hword | Latin parts of a hyphenated word                                        | post-gre-sql8
blank       | Whitespace and any characters not matched by other rules                | (parens are considered blanks) $so are other special characters!
tag         | An HTML tag                                                             | <tr> <a href="img.png"/>
http        | The protocol component of an HTTP URL                                   | http://www.postgresql.org (the http:// part)
hword       | A hyphenated word                                                       | postgre-sql8
lhword      | A hyphenated Latin word                                                 | postgre-sql
uri         | A uniform resource identifier (usually the filename component of a URL) | http://www.postgresql.org/index.html (the /index.html part)
file        | A relative or absolute Linux/Unix pathname                              | /tmp/README.txt ../README.txt
float       | A floating-point number                                                 | 3.14159 6.626
int         | A signed integer                                                        | -32 +45
uint        | An unsigned integer                                                     | 32 45
entity      | An HTML entity                                                          | &nbsp; &#044;
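
To get a feel for what "classifying" a token means, here is a toy classifier written in Python. It is nothing like the real default parser: it splits only on whitespace, knows just five of the categories from Table 25.1, and the regular expressions are my own rough approximations. The names CATEGORIES and classify() are invented for this sketch.

```python
import re

# Checked in order; the first pattern that matches the whole token wins.
CATEGORIES = [
    ("email", re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")),
    ("uint",  re.compile(r"\d+")),
    ("int",   re.compile(r"[+-]\d+")),
    ("lword", re.compile(r"[A-Za-z]+")),
    ("word",  re.compile(r"[A-Za-z0-9]+")),
]

def classify(text):
    """Split on whitespace and label each token with a category name."""
    result = []
    for token in text.split():
        for name, pattern in CATEGORIES:
            if pattern.fullmatch(token):
                result.append((token, name))
                break
        else:
            # Nothing matched: treat the token the way blank is described.
            result.append((token, "blank"))
    return result

for token, category in classify("send 42 messages to bruce@example.com"):
    print(token, category)
```

Running the sample string through classify() labels send, messages, and to as lword tokens, 42 as a uint, and bruce@example.com as an email token.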

When the parser finishes tokenizing and classifying the text string, it ends up with a collection of token values and each token is assigned to a
category. Some of the "words" in the text string may result in multiple tokens. For example, the string http://www.postgresql.org/index.html
produces four tokens (you can call the ts_debug() function to see the result of the parsing process):

perf=# SELECT token, tok_type
perf-# FROM ts_debug('http://www.postgresql.org/index.html');
             token             | tok_type
-------------------------------+----------
 http://                       | http
 www.postgresql.org/index.html | url
 www.postgresql.org            | host
 /index.html                   | uri
(4 rows)

Next, tsearch2 iterates through the list of tokens and weeds out any that are deemed uninteresting. To decide which tokens to discard,
to_tsvector() uses the token type (and the configuration name) to locate a record in the pg_ts_cfgmap table. If to_tsvector() can't find a
matching entry in pg_ts_cfgmap, it discards the token. For example, given the tokens parsed from http://www.postgresql.org/index.html,
to_tsvector() finds:

perf=# SELECT * FROM pg_ts_cfgmap
perf-# WHERE ts_name = 'default'
perf-# AND tok_alias IN( 'http', 'url', 'host', 'uri' );
 ts_name | tok_alias | dict_name
---------+-----------+-----------
 default | url       | {simple}
 default | host      | {simple}
 default | uri       | {simple}
(3 rows)

Notice that tsearch2 won't find an entry for ts_name = 'default' and tok_alias = 'http', so it discards that token (the http:// header).
The default configuration discards blank, tag, http, and entity tokens. Discarding a "word" based on its token classification is similar to
stopping an entire category of words. Discarded tokens are not cataloged by tsearch2, so you won't be able to search for them. Of course,
you can tell tsearch2 that you want it to catalog a given category by adding that category to the pg_ts_cfgmap table. Similarly, you can tell
tsearch2 to ignore a given category (say, the file category) by removing that category from pg_ts_cfgmap.
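
The filtering step amounts to a lookup keyed on the configuration name and the token category. The Python sketch below models that lookup with a dictionary. The three mappings and the four parsed tokens come from the listings above; CFGMAP and filter_tokens() are hypothetical names, and the code is an illustration rather than tsearch2's implementation.

```python
# (ts_name, tok_alias) -> dictionary names, modeled on pg_ts_cfgmap rows.
CFGMAP = {
    ("default", "url"): ["simple"],
    ("default", "host"): ["simple"],
    ("default", "uri"): ["simple"],
}

def filter_tokens(tokens, ts_name, cfgmap=CFGMAP):
    """Keep only tokens whose category has a mapping for this configuration."""
    kept = []
    for token, tok_type in tokens:
        dicts = cfgmap.get((ts_name, tok_type))
        if dicts is not None:  # no mapping means the token is discarded
            kept.append((token, tok_type, dicts))
    return kept

# The four tokens produced from http://www.postgresql.org/index.html.
parsed = [
    ("http://", "http"),
    ("www.postgresql.org/index.html", "url"),
    ("www.postgresql.org", "host"),
    ("/index.html", "uri"),
]

for token, tok_type, dicts in filter_tokens(parsed, "default"):
    print(token, tok_type, dicts)
```

Adding a ("default", "http") entry to CFGMAP would let the http:// token through, which corresponds to adding a row to pg_ts_cfgmap. Deleting an entry has the opposite effect.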

For each token that makes it through the pg_ts_cfgmap filter, tsearch2 starts the stemming and stopping process. When to_tsvector()
finds an entry in pg_ts_cfgmap that matches the configuration name and token type, that entry identifies a dictionary processor.
to_tsvector() feeds the token into that dictionary processor and adds the result (if any) to the tsvector. The dictionary processor may
stem the token by translating it into a new token. The dictionary processor may instead stop the word by returning a NULL value. Or, the
dictionary processor may pass the token through without modification.

The tsearch2 package comes with five sample dictionary processors.

The simple dictionary processor converts each token into lowercase characters and then searches for the result in a list of stop words. If it
finds the (lowercased) token in the list, it returns NULL, otherwise it returns the lowercased token to to_tsvector() (and to_tsvector()
adds the token to the tsvector that it's building). tsearch2 installs the simple dictionary processor with an empty stop word list (which
means that every token makes it through the simple dictionary after it's been translated to lowercase). To add a stop word list (which is just
a newline-separated list of words), save the name of your stop word file in the dict_initoption column of the pg_ts_dict row
corresponding to the simple dictionary processor. For example, if you've stored a list of stop words in a file
named /usr/share/stopwords.english, execute the following command:

perf=# UPDATE pg_ts_dict SET dict_initoption = '/usr/share/stopwords.english'
perf-# WHERE dict_name = 'simple';
UPDATE
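
The behavior of the simple dictionary processor is small enough to model in a few lines of Python. This is a sketch of the behavior described above, not the processor's source code: make_simple_dict() and lexize() are names I've made up, the stop word list is passed in as a newline-separated string instead of being read from a file, and Python's None stands in for NULL.

```python
def make_simple_dict(stop_text=""):
    """Build a lexize function from a newline-separated stop word list."""
    stops = {word.lower() for word in stop_text.split()}

    def lexize(token):
        lowered = token.lower()
        # None plays the role of NULL: the token is stopped.
        return None if lowered in stops else lowered

    return lexize

no_stops = make_simple_dict()              # the empty list installed by default
with_stops = make_simple_dict("the\nand\nof\n")

print(no_stops("PostgreSQL"))
print(with_stops("The"))
```

With the empty list, every token comes back lowercased. Once the list contains the, the token The is stopped and never reaches the tsvector.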

The en_stem dictionary processor handles stop words and stemming. en_stem searches for the token in a list of stop words and discards the
token if found. (Like the simple dictionary processor, en_stem finds the stop word list in its pg_ts_dict.dict_initoption.) If the token is
not found in the stop word list, en_stem tries to convert the token into its root form by stripping off common English prefixes and suffixes. For
example, en_stem converts donate, donation, donating, donates, and donated into the stem donat. to_tsvector() stores the stem in the
tsvector that it's building. If you search for the word donate, tsearch2 will match donate, donation, donating, donates, and donated.
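
To see why stemming makes all five forms match, consider the toy stemmer below. I want to be clear that this is not the algorithm en_stem uses: the suffix list, the minimum stem length, the stop words, and the name toy_stem() are all assumptions chosen so that the donate family collapses to donat. A real stemmer applies far more careful rules.

```python
# Suffixes are tried in order; longer ones come before their shorter tails.
SUFFIXES = ("ing", "ion", "es", "ed", "e", "s")

def toy_stem(token, stop_words=frozenset({"the", "a", "of"})):
    """Return None for a stop word, otherwise a crudely stemmed token."""
    word = token.lower()
    if word in stop_words:
        return None
    for suffix in SUFFIXES:
        # Keep at least three characters so short words survive intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

forms = ["donate", "donation", "donating", "donates", "donated"]
print({toy_stem(form) for form in forms})
```

Because the same function is applied to the words in the document and to the words in the query, a search for donate is really a search for donat, and every form in the list produces that stem.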

The ru_stem dictionary processor is identical to the en_stem processor except it stems each token using rules designed for Russian text. (You
would most likely use a different list of stop words too.)

The synonym dictionary processor doesn't do any stop word processing (or stemming). When you feed a token to the synonym processor, it
searches for a match in a list of word-synonym pairs. If it finds a match, the processor returns the synonym. For example, given the list of
synonyms:

zaurus pda
newton pda
pocketpc pda
nokia phone
treo phone

The synonym processor will translate zaurus, newton, and pocketpc into pda, and will translate nokia and treo into phone. If synonym can't
find a match in the list, it returns NULL.

The last dictionary processor is named ispell_template. ispell_template is based on the ispell program and it searches for each token
in a separate dictionary file (not included with tsearch2). If ispell_template finds the token (or a variant of the token) in the dictionary, it
returns the stemmed form of the word to to_tsvector(). If ispell_template can't find the token (or a variant of the token) in the
dictionary, it returns NULL (and the token is discarded). ispell_template also uses a stop word list to filter out common words. There's an
important difference between ispell_template and en_stem. Both dictionary processors convert tokens into stem form, but
ispell_template will discard any token that it can't find in the dictionary; en_stem, on the other hand, simply passes through any token that
it can't stem. The ispell_template processor won't work until you connect it to a dictionary, a process described in the "Tsearch
Introduction" document that comes with tsearch2.

You can string multiple dictionary processors together by listing each one in the pg_ts_cfgmap.dict_name column. For example, to apply the
synonym processor and then the en_stem processor to every lword token:

perf=# UPDATE pg_ts_cfgmap
perf-# SET dict_name = '{"synonym","en_stem"}'
perf-# WHERE ts_name = 'default_enUS' AND tok_alias = 'lword';
UPDATE

tsearch2 tries each dictionary processor, in order, and stops as soon as a processor returns a non-NULL value.
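
The "first non-NULL result wins" rule is easy to model. In the Python sketch below, synonym() uses the word-synonym pairs listed earlier, and fallback_lower() is a deliberately trivial stand-in for a second processor such as en_stem (it only lowercases the token). run_chain() and the other function names are invented for this illustration, and None stands in for NULL.

```python
SYNONYMS = {"zaurus": "pda", "newton": "pda", "pocketpc": "pda",
            "nokia": "phone", "treo": "phone"}

def synonym(token):
    """Return the synonym for a token, or None when the list has no match."""
    return SYNONYMS.get(token.lower())

def fallback_lower(token):
    """Stand-in for a second processor; it simply lowercases the token."""
    return token.lower()

def run_chain(token, processors):
    """Try each processor in order; stop at the first non-None result."""
    for process in processors:
        result = process(token)
        if result is not None:
            return result
    return None

chain = [synonym, fallback_lower]
print(run_chain("Zaurus", chain))
print(run_chain("Database", chain))
```

Zaurus is answered by the synonym processor, so the second processor never sees it. Database is not in the synonym list, so the chain moves on and the second processor supplies the result.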

Now you know how all of the pieces fit together. tsearch2 uses the server's locale to find a configuration (in the pg_ts_cfg table). The
configuration identifies a parser. tsearch2 uses that parser to split a text string into a set of tokens and assigns a category to each token. The
pg_ts_cfgmap maps each configuration/token category combination into the name of a dictionary processor. (If a combination is not found in
pg_ts_cfgmap, tsearch2 discards all tokens of that category.) The dictionary processor (typically) filters each token through a list of stop
words and then stems anything that makes it through the filter.

To create a new configuration, you can write a new parser, change the pg_ts_cfgmap to include (or exclude) token categories, modify the
pg_ts_cfgmap to apply a different dictionary processor to a token category, implement a new dictionary processor, or modify the list of stop
words used by a dictionary. If you use the synonym dictionary processor, you can also modify the synonym map. In most cases, you won't
need to write any code (unless you find that you have to implement a new parser or dictionary processor); just adjust a configuration table
(or external file). If you do write a new parser or dictionary processor, consider donating it to the PostgreSQL community so other users can
benefit from your efforts.

tsearch2 offers a number of other features that I haven't described here. You know that tsearch2 can identify the documents that match a
given pattern. tsearch2 can also rank the matches according to relevance. (Check out the rank() and rank_cd() functions.) When tsearch2
finds a document that matches a pattern, you can ask the headline() function to produce a string that highlights the search words in
context. See the tsearch2 documentation for more details.

If tsearch2 doesn't have what you need, check out the OpenFTS package. OpenFTS is a user-friendly wrapper around tsearch2. You can use
OpenFTS to expose the documents in your database to users that may not know how to formulate SQL queries (and may not understand the
results). You can find OpenFTS at openfts.sourceforge.net.
