
IBSurgeon

Emergency and data availability tools for InterBase and Firebird databases

Firebird and InterBase database corruption reasons, by IBSurgeon, 2004-2008

IBSurgeon contacts:
www.ib-aid.com, support@ib-aid.com

More information:
DataGuard: protecting your data
Emergency toolset for Firebird and InterBase

Firebird and InterBase database corruption reasons

Contents
Firebird and InterBase database corruption reasons
Hardware-related errors
  Abnormal shutdown
  Light HDD corruption
  Heavy HDD corruption
  CD/DVD corruption
  Flash Drive corruption
  RAM corruption
  Lack of disk space for the database
  Lack of disk space for temp files
  Lack of disk space for interbase.log or firebird.log
Corruptions caused by maintenance carelessness
  Accidentally deleted database file
  “Disappeared” files on Linux/Unix/HP-UX/...
Erroneous implementation limits
  36.6Gb limit of table size
  BLOB maximum size restrictions
  Too many transactions in pre-6.5 InterBase versions
  Too many generators
Corruptions caused by SQL
  Creating and dropping tables during intensive users' work
This is not the end
Contacts

© IBSurgeon


Hardware-related errors

Abnormal shutdown


Abnormal shutdowns are the leading cause of corruption. Several things can trigger one. A computer with no UPS can lose power when a big mutant mole eats your city's power supply cable (or whatever other explanation your energy company might give). The cleaning lady can pull the “wrong” cable while cleaning your office. Sometimes people just power off their computers without any concern for what the machine might be doing. Any of these conditions can lead to corruption.

Nonetheless, you have probably observed that not every abnormal shutdown leads to corruption. The initial design of InterBase was, with some restrictions, forgiving toward such unstable environments. As you may know, earlier versions of InterBase were used in the fire control system of the MLRS artillery platform. Every MLRS volley was accompanied by a strong electromagnetic pulse that rebooted the on-board computer every time. It was a strict requirement that the on-board database server be able to restart within seconds and be robust against corruption caused by power loss. InterBase met both conditions. It started quickly, and its multi-generational architecture let it bypass uncommitted or even corrupted record versions while retaining its ability to read good records.

But time passes. Later versions implemented advanced caching, and “civilian” InterBase versions became more vulnerable to abnormal shutdowns. The best-known problem is related to Forced Writes on Windows. Forced Writes is a flag, set on the database file, that determines how Windows caches writes to this file. We strongly recommend setting Forced Writes to ON on Windows. Windows is very “lazy” about flushing its cache and could be holding days of unflushed work.
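Forced Writes is switched on with the gfix command-line utility: "gfix -write sync" turns it on and "gfix -write async" turns it off. The Python sketch below only wraps that call. The database path, user name and password are placeholders, and the sketch assumes gfix is on the PATH of the machine where it runs.

```python
import subprocess
from typing import List


def forced_writes_command(database: str, user: str, password: str,
                          gfix: str = "gfix") -> List[str]:
    """Build the gfix command line that switches Forced Writes ON (synchronous writes)."""
    return [gfix, "-write", "sync", "-user", user, "-password", password, database]


def enable_forced_writes(database: str, user: str, password: str,
                         gfix: str = "gfix") -> None:
    """Run gfix; raises subprocess.CalledProcessError if gfix exits with an error status."""
    subprocess.run(forced_writes_command(database, user, password, gfix), check=True)
```

For example, enable_forced_writes(r"C:\data\sales.gdb", "SYSDBA", "masterkey") would run gfix against a hypothetical database file. Building the argument list separately makes it easy to log or review the exact command before it touches a production database.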

Light HDD corruption


Light corruption of a hard drive happens when just a few clusters become corrupted. Usually, the operating system warns that the file is unreadable. The effect of such corruption is that the database file has several gaps within it that are filled with zeros or, occasionally, garbage information. The gaps break the internal structure of the database, which can give rise to a wide range of possible errors.
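The gaps left by light HDD corruption are usually filled with zeros. A quick first look at a damaged file is therefore to list page-sized stretches that contain nothing but zero bytes. The Python sketch below does that. The page size has to match the database's own page size (4096 is only a default). A zero page is a hint for further analysis, not proof of damage, since a database may also contain pages that were allocated but never written.

```python
from typing import Iterator, Optional, Tuple


def zero_filled_runs(path: str, page_size: int = 4096) -> Iterator[Tuple[int, int]]:
    """Yield (first_page, page_count) for every run of pages that are all zero bytes."""
    zero_page = bytes(page_size)
    run_start: Optional[int] = None
    page_no = 0
    with open(path, "rb") as f:
        while True:
            page = f.read(page_size)
            if not page:
                break
            if page == zero_page[:len(page)]:
                if run_start is None:
                    run_start = page_no        # a new zero-filled gap begins
            elif run_start is not None:
                yield run_start, page_no - run_start
                run_start = None               # the gap has ended
            page_no += 1
    if run_start is not None:                  # the file ends inside a gap
        yield run_start, page_no - run_start
```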

Heavy HDD corruption


Sometimes the hard drive can shatter into a completely unreadable pile of metal and plastic. In this case you have two options. The first is to try repairing the corrupted drive with a special utility like R-Studio (http://www.data-recovery-software.net/). If that doesn't help, you can enlist the aid of a hard drive recovery service. These guys can really fetch data from the Fourth Dimension. Two problems lie here. First, their fees are rather high (starting from USD 1000). You therefore have to weigh that cost against your estimate of the value of the lost data before ordering the service. Second, recovered data are almost always mixed: chains of clusters are arranged in ways that differ from the original database file.



CD/DVD corruption
If you store databases on DVDs or CDs (for archiving purposes or as a read-only dictionary database), they may break. Usually the symptom is that you can't read the database file from the DVD. The first thing to do is to extract the corrupted file from the DVD with a tool like FixDVD. Unfortunately, in 99% of cases a database file extracted from a DVD is in a bad state. The extracted file has the right size, but it is filled with a mixture of database pages and garbage or zero data. The most affordable approach is to create an image of the entire DVD (4.7 GB) and search that image for database pages with special IBSurgeon tools.
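To give an idea of what searching a raw DVD image for database pages involves, here is a much simplified Python sketch. It is not how IBSurgeon tools work. It walks the image in page-sized steps and classifies each block by its first byte.

The sketch rests on three assumptions you should check against your own database:

- The page-type codes follow the Firebird/InterBase on-disk structure, from 1 for a header page to 9 for a generator page, as listed in the dictionary below.
- The page size is 4096 bytes (the default).
- Pages in the image start on page-size boundaries, which may not hold for data recovered from scattered clusters.

```python
from collections import Counter

# ASSUMPTION: page-type codes as used by the Firebird/InterBase on-disk structure.
# Treat this as a heuristic lookup table, not a specification.
PAGE_TYPES = {1: "header", 2: "page inventory", 3: "transaction inventory",
              4: "pointer", 5: "data", 6: "index root", 7: "index b-tree",
              8: "blob", 9: "generator"}


def classify_image(path: str, page_size: int = 4096) -> Counter:
    """Count page-aligned blocks of a raw image by their first (page-type) byte."""
    counts: Counter = Counter()
    with open(path, "rb") as f:
        while True:
            block = f.read(page_size)
            if len(block) < page_size:
                break                          # ignore a trailing partial block
            if not any(block):
                counts["zero-filled"] += 1     # likely a gap, see light HDD corruption
            else:
                counts[PAGE_TYPES.get(block[0], "unknown")] += 1
    return counts
```

A large share of "unknown" blocks suggests garbage or a wrong page size. A high count of "data" blocks indicates that the image is worth a proper recovery attempt.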

Flash Drive corruption


Flash drives are a rather recent technology, with a limited number of write cycles. Early versions could not sustain even a million cycles, but the problem seems to have gone away as the technology has evolved. However, I cannot recommend flash drives for everyday use as the main storage for a Firebird or InterBase database. The work of a database server involves many read/write operations in random access mode. We have handled several corrupted databases living on flash drives whose users were not prudent enough to back them up. The corruption is similar to light HDD corruption: several pieces of the database file are lost.

RAM corruption
Amongst all hardware corruptions, RAM corruption is the real nightmare. Fortunately, it usually becomes apparent through a BSOD (Blue Screen Of Death) or other critical events that a system administrator can easily detect. But sometimes RAM corruption is so slight that only special tools can detect it, and it shows its teeth only under intensive usage. When other hardware corruptions occur, the damage is at the database page level: whole pages are lost while other pages stay intact. When RAM is faulty, any bit in the database file can be intermittently changed from 0 to 1, or vice versa. This kind of corruption is recognizable only after the fact. It shows up only when some database page becomes corrupted enough to trigger an error. So RAM corruption stays hidden until the level of damage becomes critical.

I once saw a couple of databases from a single server with corrupted RAM. The customer sent them one by one, each with different corruptions, before I asked him to send me the interbase.log. There I saw multiple “wrong record length” and “wrong page type” errors, and even several esoteric ones. We tested the RAM with the memtest tool and found RAM problems.

A further problem with RAM corruption is that validating the database with gfix on the computer with the bad RAM can produce different results each time you run it. Worse, gfix -mend, which tries to fix errors, can produce additional corruption. It visits every database page and, when writing the “amended” pages, it can lay down more wrong bits.

Lack of disk space for the database


Running out of disk space is the favorite error of lazy administrators. Corruption happens when the server tries to allocate more pages to extend the database file and discovers that there is no space available on the disk or partition. The most dangerous situation occurs when lack of disk space is combined with a large cache and Forced Writes off. The operating system tries to flush a large amount of data to disk and simply fails if there is not enough room. In this case the database will be inconsistent, because losing the cache means that all pending changes to page and record chains were interrupted.

When you try to repair corruption caused by lack of disk space with gfix, you may find an interesting side effect. The interbase.log fills with a looping sequence of “page doubly allocated” errors. gfix never finishes, the interbase.log can grow very large, and disk space can be exhausted again.
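One way to act on this is to compare the free space on the database partition with the amount of data a full cache flush could need. The Python sketch below makes a deliberately pessimistic assumption: every page in the cache may need fresh space on disk. The cache size in pages and the 1 GiB reserve are inputs you choose, not values read from the server.

```python
import shutil


def flush_headroom(db_path: str, cache_pages: int, page_size: int) -> int:
    """Free bytes left on the database's partition after a worst-case cache flush.

    Pessimistic assumption: every cached page may have to be written as a brand
    new page, so the whole cache needs fresh disk space.
    """
    free = shutil.disk_usage(db_path).free
    return free - cache_pages * page_size


def enough_room(db_path: str, cache_pages: int, page_size: int,
                reserve: int = 1 << 30) -> bool:
    """True if at least `reserve` bytes (1 GiB by default, an arbitrary choice) stay free."""
    return flush_headroom(db_path, cache_pages, page_size) >= reserve
```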

Lack of disk space for temp files


InterBase and Firebird create temporary files for sorting and merge operations. If you haven't configured a location with enough disk space for these files, the engine uses the directory specified in the TEMP system variable. If a heavy query has millions of rows to sort, the overall size of the temp files can be very large. If many queries with sorts are running at once, they can occupy so much space that free space runs out. Usually this condition is handled correctly and the client that launched the query receives an error message. The funny thing is that in old versions of InterBase and Firebird the error text was “No paper in printer”, due to a wrong mapping of the exception message to the Windows system error code. But lack of disk space for temp files can also lead to abnormal server termination and database corruption, especially with older versions of InterBase/Firebird.
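A back-of-the-envelope check can tell you whether a big sort is likely to fit into the fallback temp directory. The Python sketch below makes two assumptions:

- The fallback order is TEMP, then TMP, then the operating system default.
- Sort spill size follows a crude model: rows × average row size × number of concurrent sorts.

Real sort files can be larger or smaller. A temporary directory configured for the server takes precedence over all of this.

```python
import os
import shutil
import tempfile


def engine_temp_dir() -> str:
    """Assumed fallback: TEMP, then TMP, then the operating system default."""
    return os.environ.get("TEMP") or os.environ.get("TMP") or tempfile.gettempdir()


def estimated_sort_bytes(rows: int, avg_row_bytes: int, concurrent_sorts: int = 1) -> int:
    """Crude model: each running sort may spill all of its rows to disk once."""
    return rows * avg_row_bytes * concurrent_sorts


def sort_fits(rows: int, avg_row_bytes: int, concurrent_sorts: int = 1) -> bool:
    """True if the estimated spill is smaller than the free space in the temp directory."""
    free = shutil.disk_usage(engine_temp_dir()).free
    return estimated_sort_bytes(rows, avg_row_bytes, concurrent_sorts) < free
```

For instance, three concurrent sorts of 5 million rows averaging 200 bytes each come to 3,000,000,000 bytes under this model. That is already a noticeable chunk of a small system partition.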

Lack of disk space for interbase.log or firebird.log


If you don't watch the amount of free space on the partition where InterBase or Firebird is installed, you can run out of space there when the interbase.log or firebird.log file grows very large. Errors from all databases served by the server are written to the same log file. You are therefore likely to run into this problem if you have a lot of network errors, such as:
My_server(Server) Sat Jan 02 15:14:57 2006 INET/inet_error: read errno 10054
Now, you might well assume that such errors are rather rare and should never appear with properly designed applications. That may be so, but there are many cases where the log file blows out and exhausts free space. An example is easy to find. Imagine you have light corruption of an index in a rather big database and you decide to run gfix to fix it. The engine marks the corrupted index as wrong, frees all its pages and logs “Page XXX orphan” messages. With a large enough index you'll get thousands of such messages in the log file. They can easily eat up all available free space and lead to much heavier corruption.
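A scheduled check of the log size and of the free space next to it would catch this before the partition fills up. In the Python sketch below, the thresholds (100 MiB for the log, 1 GiB of free space) are arbitrary examples. The log path you pass in (for example /opt/firebird/firebird.log) depends on your installation. count_matching_lines shows how quickly messages such as “Page XXX orphan” are piling up.

```python
import os
import shutil
from typing import List


def log_warnings(log_path: str, max_log_bytes: int = 100 * 1024 ** 2,
                 min_free_bytes: int = 1024 ** 3) -> List[str]:
    """Return warnings if the server log is too big or its partition is nearly full."""
    warnings = []
    size = os.path.getsize(log_path)
    log_dir = os.path.dirname(os.path.abspath(log_path))
    free = shutil.disk_usage(log_dir).free
    if size > max_log_bytes:
        warnings.append(f"log is {size} bytes (threshold {max_log_bytes})")
    if free < min_free_bytes:
        warnings.append(f"only {free} bytes free on the log partition")
    return warnings


def count_matching_lines(log_path: str, needle: str = "orphan") -> int:
    """Count log lines containing `needle`, e.g. the "Page XXX orphan" flood."""
    with open(log_path, "r", encoding="utf-8", errors="replace") as f:
        return sum(needle in line for line in f)
```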



Corruptions caused by maintenance carelessness

Accidentally deleted database file


Sometimes a database file gets accidentally deleted. In versions prior to Firebird 1.0 and InterBase 7.0 it was possible to delete a database file even while it was open, i.e., during active I/O operations. The server opened the database file with the flag fmShareDenyNone (0x40), so any process was able to modify and delete it. Even with the newest versions there is no guarantee that a database file will not be deleted while the server is off or no connections are active. Usually it happens after the database has been backed up or copied to another location, when someone wrongly concludes that the original can be deleted. If the backup failed or the destination file was corrupted during the copy, you have a deleted database file situation.

The most urgent thing to do is to stop all activity on the disk where the database was located. If it is a system disk, power the computer down and remount the disk as a secondary drive to avoid any writes to it. Next, you need software for undeleting files. For Windows and Linux there are many utilities and how-to guides for retrieving and repairing files. Some of them are listed below:
http://www.stud.tu-ilmenau.de/~mojo/undelete.html
http://recover.sourceforge.net/linux/
http://home.fnal.gov/~muzaffar/undelete/README.html
If no writes have been done over the deleted file's sectors, you have a good chance of retrieving the file, perhaps even unscathed and workable. Usually, though, some parts are lost and additional recovery effort is needed.
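The deleted-after-copy scenario is easy to guard against: before removing the original, confirm that the copy is byte-for-byte identical. The Python sketch below compares sizes and SHA-256 digests. It only makes sense when the server is stopped or the database otherwise has no connections, because a live database file keeps changing. A gbak backup is never byte-identical to the database, so for a gbak backup the equivalent check is a trial restore.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large databases don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def copy_is_identical(original: str, copy: str) -> bool:
    """True only if the copy has the same size and the same SHA-256 digest."""
    if os.path.getsize(original) != os.path.getsize(copy):
        return False
    return file_digest(original) == file_digest(copy)
```

Only when copy_is_identical returns True is it reasonable to even consider deleting the original.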

“Disappeared” files on Linux/Unix/HP-UX/...


It is a well-known fact that Linux, like other Unix systems, separates a file's name (a directory entry) from the file itself (an inode). A process that has a file open keeps working with that inode even after the name has been deleted or pointed to a different file. The old file's data is freed only when the last descriptor referring to it is closed. For InterBase and Firebird this brings an onerous side effect. If you replace a database file while users are still connected, the server continues to work with the old file, which is wrongly assumed to be gone. The danger comes when the last user detaches. The server closes the old file and its contents are dropped forever. From then on only the “new” file remains, without any of the changes made in the meantime. You never know this has happened until it is too late, and then it is most likely discovered by furious users: “Where is my work from last week?!” The longest period of lost data due to such “disappearing” that I have observed was 1.5 years. It was a multi-volume database on Linux, and one of the 4 GB volumes was completely lost.

You may say this is a very rare circumstance, but I would bet a case of beer that right now at least one hundred server installations have this problem. We receive at least one repair request due to this problem every two months.
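The effect is easy to reproduce without a database server. In the Python sketch below, an open file handle plays the role of the running server. The file is replaced by name with os.replace, and the old handle keeps writing. After the handle is closed those writes are gone, while the name points to the new file. On Windows the replace step typically fails while the file is open, so the demo assumes Linux or another Unix.

```python
import os
from typing import Tuple


def replaced_while_open_demo(workdir: str) -> Tuple[bool, bytes]:
    """Show that a handle opened before a file is replaced keeps the OLD inode (POSIX only)."""
    path = os.path.join(workdir, "db.fdb")
    with open(path, "wb") as f:
        f.write(b"old")
    server_handle = open(path, "r+b")          # plays the role of the running server
    new_path = os.path.join(workdir, "new.fdb")
    with open(new_path, "wb") as f:
        f.write(b"new")
    os.replace(new_path, path)                 # the administrator "replaces" the database
    server_handle.seek(0)
    server_handle.write(b"OLD")                # the server keeps writing to the old inode
    server_handle.flush()
    same_inode = os.fstat(server_handle.fileno()).st_ino == os.stat(path).st_ino
    server_handle.close()                      # last descriptor closed: old data is freed
    with open(path, "rb") as f:
        on_disk = f.read()
    return same_inode, on_disk
```

On Linux, calling replaced_while_open_demo on an empty directory returns (False, b"new"). The handle and the name refer to different inodes, and the “server's” write never reaches the file that is left on disk.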



Erroneous implementation limits

36.6Gb limit of table size


While working on an interesting technical support incident with one of our customers, we found that the actual maximum number of rows differs from the declared limit. In the documentation set [Operations Guide for InterBase 6, page 27, InterBase Specification] we can read the following: “Maximum number of rows and columns per table: By design, 2^32 rows, because rows are enumerated with a 32-bit unsigned integer per table.” Rows are indeed enumerated by a 32-bit integer. Sadly, in practice no table can ever reach even the 2 billion record limit, not even a table with only one column. The reason lies in the algorithm that calculates free space for a new (inserted) record: an integer overflow may occur, and you will see the following error message in interbase.log:
pointer page vanished from DPM_next (249)
In reality, neither database page size nor row size affects this limit. The limit is a magic table size that is always the same (~36.6 gigabytes). It can be expressed in pages for any database page size:
MaxDataPageCount = (MaxInt / PageSize) * 17.476
where MaxInt = 2147483647. Substitute your database page size for PageSize and the result shows how many pages can be allocated for any table. For example, a table can't have more than ~9 million data pages in a database with a 4K page size. In terms of record count, a table with 2 integer columns can't grow larger than 600 million (!) records (don't forget that every record has a 14-byte header). Confused? Multiply MaxDataPageCount by PageSize and divide the result by the average record size of your biggest table. Then you'll know when this table will exceed the limit and your database will stop working.
This limit was fixed only in Firebird 2. It still exists in InterBase 2007.
Addition: another side of this limit was discovered recently in the garbage collector thread implementation in Firebird 2.0.3. This error can appear in 36.6Gb tables during garbage collection (it is caused by a combination of special conditions). We hope it will be fixed in the next sub-release, 2.0.4, and in 2.1 for sure.
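The arithmetic above is easy to script. The Python sketch below uses the constants exactly as given in the text (MaxInt = 2147483647 and the 17.476 multiplier). It applies to InterBase and to Firebird before version 2. The average record size is something you have to measure for your own table, for example from gstat record statistics.

```python
MAX_INT = 2_147_483_647     # MaxInt from the text (2**31 - 1)
PAGE_FACTOR = 17.476        # multiplier quoted in the text


def max_data_pages(page_size: int) -> int:
    """MaxDataPageCount = (MaxInt / PageSize) * 17.476, truncated to whole pages."""
    return int(MAX_INT / page_size * PAGE_FACTOR)


def max_table_bytes(page_size: int) -> int:
    """Largest table size in bytes: MaxDataPageCount * PageSize."""
    return max_data_pages(page_size) * page_size


def max_records(page_size: int, avg_record_bytes: float) -> int:
    """Approximate record ceiling for a table with the given average stored record size."""
    return int(max_table_bytes(page_size) / avg_record_bytes)
```

max_data_pages(4096) comes out at a little over 9.1 million pages, in line with the “~9 million” figure for a 4K page size. Because max_table_bytes is practically the same for every page size, changing the page size does not move this limit.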

BLOB maximum size restrictions


The BLOB mechanism has its own implementation imperfections. In earlier versions of InterBase (until 7.1) the maximum BLOB size was 512Mb; later it was raised to 2Gb. We currently have reports of DVD images (4.7Gb) being stored in Firebird 2.0.3. In any case, we do not recommend storing big files (>100Mb) inside Firebird or InterBase databases unless it is absolutely necessary. If it is, we recommend testing the limits first.

Too many transactions in pre-6.5 InterBase versions


InterBase versions before 6.5 have a limit on the maximum transaction number, which depends on the page size. The limit holds until a restore: after a backup/restore, transaction numbering starts again from scratch.

Critical number of transactions in pre-6.5 InterBase servers
Database page size    Critical number of transactions
1024 bytes            131 596 287
2048 bytes            265 814 016
4096 bytes            534 249 472
8192 bytes            1 071 120 384

2oo many generators


InterBase versions before 6.0.1.6 have one more erroneous limit: the maximum number of generators in the database, which depends on the page size.

Critical number of generators in early InterBase versions
Version    Page size=1024    Page size=2048    Page size=4096    Page size=8192
Pre 6      248               504               1016              2040
6.0.x      124               257               508               1020
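The transaction and generator limit tables can also be turned into a quick headroom check. In the Python sketch below, the numbers are copied from those tables. The next transaction number would come from the database header (gstat -h reports it as “Next transaction”). The version labels are just dictionary keys chosen for this example.

```python
# Limits copied from the tables in this section (pre-6.5 transactions,
# pre-6.0.1.6 generators); keys are database page sizes in bytes.
CRITICAL_TRANSACTIONS = {1024: 131_596_287, 2048: 265_814_016,
                         4096: 534_249_472, 8192: 1_071_120_384}
CRITICAL_GENERATORS = {
    "pre 6": {1024: 248, 2048: 504, 4096: 1016, 8192: 2040},
    "6.0.x": {1024: 124, 2048: 257, 4096: 508, 8192: 1020},
}


def transactions_left(page_size: int, next_transaction: int) -> int:
    """Transactions remaining before the critical number for this page size."""
    return CRITICAL_TRANSACTIONS[page_size] - next_transaction


def generators_left(version: str, page_size: int, generator_count: int) -> int:
    """Generators that can still be created before hitting the limit."""
    return CRITICAL_GENERATORS[version][page_size] - generator_count
```

A small or negative result from transactions_left is the signal to schedule a backup/restore cycle, which resets transaction numbering as described above.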

Corruptions caused by SQL

Creating and dropping tables during intensive users' work
Well, this is one of the most frequent ways to corrupt a database. Creating and dropping tables while users are writing data into the database can lead to confusion between system pages (used by the created/dropped tables) and user data. Usually such corruptions lead to moderate loss of data and can be recovered using the manual IBSurgeon service.



This is not the end


Well, we could continue to scare you with new types of corruption caused by new waves of technology and the ubiquitous human factor. But we would like to avoid giving the impression that Firebird and InterBase are unstable or not reliable enough for serious application development. In fact, compared with some other database platforms, the problems described above look like child's play. Instead of that wrong impression, we would like to give you confidence and a better understanding of the processes and life of your database. If you treat it the right way and take good care of it, your database will be stable, fast and reliable. Do not hesitate to contact our sales reps or our support team at support@ib-aid.com with any questions; we will be happy to help you.

Contacts

IBSurgeon contacts:


www.ib-aid.com, support@ib-aid.com

More information:
FBDataGuard: protecting your Firebird data
Emergency toolset for Firebird and InterBase

