The Top Ten Largest Databases In The World

By Lewis Keller 2/27/2012

Introduction

When I was presented with the opportunity to research the largest databases in the world, I planned to discuss the top five in detail. However, I came across a list of the top 10 largest databases in the world, so I decided to expand my discussion to cover the whole list. One thing that I'm not surprised about is that the top two are owned by the U.S. government (the Library of Congress and the Central Intelligence Agency, respectively). What does surprise me is that Google made it only to #7 on the list. Considering the vast amount of knowledge it makes available to the public, I thought that it would be somewhere within the top five. Overall, though, the sizes of these databases are pretty astounding, as several of them are hundreds of terabytes in size.

#1: The Library of Congress

The Library of Congress has 130 million documents altogether. It holds so much text that, if it were all digitized, it would total 20 terabytes! It has 5 million digital documents, and over 10,000 items are added to the database every day. However, many of these items are restricted from the general public.

I decided to test the library's online system by doing a search for "Vietnam". I immediately ran into its 10,000-item limit, which shows me how immense the online system is. The newest document I came across in my search was an article from 1991, and there were several documents from the 1960s and the 1970s. The only thing that I don't like about it is that the system gave me only five minutes to do my search before it would kick me out.

#2: The Central Intelligence Agency

One interesting thing about the CIA's database is that its size is unknown, due to the number of classified files that it contains. However, portions of it are available to the public, such as The World Factbook and the contents of the Freedom of Information Act Electronic Reading Room. The database also contains statistics on more than 250 countries and entities.

The Electronic Reading Room makes some (potentially sensitive) government documents available to the public, which can help someone find a copy of a previously passed act of law to use for research. So, with my high level of curiosity, I decided to test it, too. I did a search on Africa, and was able to come up with 98 items, which were available in both GIF and PDF formats.

#3: Amazon.com

With the wealth of items that Amazon has for sale online, one would expect it to have a large database. That expectation is right, because Amazon's database contains 42 terabytes of data. This database gathers and keeps massive amounts of intimate information about its millions of shoppers, including their religion, sexual orientation, ethnicity and income. It combines information disclosed voluntarily by customers with facts gleaned from public databases. This gives Amazon more detailed information about its customers than any other retailer.

#4: YouTube

In 2006, back when YouTube was just starting to gain its foothold in our society, its database was projected to hold 45 terabytes of data. I seriously can't imagine how many terabytes of data are on there now, six years later. The database is open to people who want to access it, which I find kind of astonishing, because of the possibility of users' personal data being exposed to the public. However, in order to gain access to the database, you must request special developer and client keys. Because videos vary so much in size and length, estimating the size of YouTube's database is difficult. YouTube's data API is geared towards developers who have experience with server-side programming languages.

#5: ChoicePoint

Consisting of 250 terabytes of personal data, ChoicePoint's database of 17 billion public records is used for background checks, insurance applications and tenant screening. The database contains information on approximately 250 million people. One thing that I don't like about ChoicePoint is that it sells data to the highest bidders, which include the U.S. government. However, much of its business is regulated by the Fair Credit Reporting Act.

#6: Sprint

Sprint has 53 million subscribers worldwide, and its database is very expansive. Large telecommunication companies like Sprint are notorious for having immense databases to keep track of all of the calls taking place on their networks. The database holds 2.85 trillion data insertions (the largest number in the world), and it processes 365 million call detail records per day. However, phone information has previously been leaked out of the database.

#7: Google

Google's database contains virtual profiles of countless users, and it contains all of the words that are used in search terms. Google searches account for more than 50% of all internet searches. Like the CIA's database, the size of Google's database is unknown (due to it being locked in a vault).

For a search through Google's database to work, a crawler first visits a page, copies the content and follows the links from that page to the pages linked to it. It repeats this process over and over until it has crawled billions of pages on the web.
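To make the crawling process more concrete, here is a minimal sketch in Python. It is only an illustration of the visit, copy, and follow-links loop described above, not Google's actual crawler. The tiny in-memory "web" (`TOY_WEB`), its made-up URLs, and the `crawl` function are all assumptions of mine, so that the example runs without any network access.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to (page text, outgoing links).
# These URLs and pages are invented purely for illustration.
TOY_WEB = {
    "a.example/home": ("welcome page", ["a.example/news", "b.example/blog"]),
    "a.example/news": ("daily news", ["a.example/home"]),
    "b.example/blog": ("a blog post", ["a.example/news", "c.example/missing"]),
}


def crawl(start_url, web, max_pages=1000):
    """Visit a page, copy its content, then follow its links, over and over."""
    copied = {}                  # url -> copied page content
    queue = deque([start_url])   # pages waiting to be visited
    seen = {start_url}           # pages already queued, so none is visited twice
    while queue and len(copied) < max_pages:
        url = queue.popleft()
        page = web.get(url)
        if page is None:
            continue             # dead link: nothing to copy
        text, links = page
        copied[url] = text       # "copies the content"
        for link in links:       # "follows the links from that page"
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return copied


pages = crawl("a.example/home", TOY_WEB)
for url, text in pages.items():
    print(url, "->", text)
```

The `seen` set is what keeps the crawler from looping forever when two pages link to each other, and `max_pages` stands in for whatever limit a real crawler would place on its work.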

#8: AT&T

AT&T's database contains 323 terabytes of data and 1.9 trillion phone call records. AT&T is so careful with its records that it has maintained calling data from decades ago, when the technology to store hundreds of terabytes of data did not yet exist. As a former AT&T customer, I have to say that that's a very impressive thing to do, because one never knows when such a call might wind up putting somebody in jail over a crime they committed 20 years ago.

#9: NERSC

NERSC's database holds 2.8 petabytes of data and is operated by more than 2,000 computer scientists. Some of the information stored in it pertains to simulations of the early universe, atomic energy research, and more. What distinguishes it from others is its successful creation of an environment that makes its resources usable for research.

#10: The World Data Centre for Climate

By raw size, this is by far the largest database on the list! It contains 330 terabytes of web and climate simulation data, plus 6 petabytes of additional data on magnetic tape. The database is so large that it has to be hosted on a machine that cost 35 million euros ($46,942,000).
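Since the figures above mix terabytes and petabytes, a quick calculation helps put them on the same scale. The sketch below is my own arithmetic, assuming decimal units (1 petabyte = 1,000 terabytes); the function name `total_petabytes` is made up for this example, and the 2.8 petabyte figure is the NERSC number quoted earlier.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1,000 TB


def total_petabytes(terabytes, petabytes):
    """Add a terabyte figure and a petabyte figure, returning petabytes."""
    return terabytes / TB_PER_PB + petabytes


# 330 TB of online data plus 6 PB on magnetic tape
wdcc_total = total_petabytes(330, 6)

# NERSC figure quoted earlier in this essay, in petabytes
nersc_total = 2.8

ratio = wdcc_total / nersc_total
print(f"WDCC total: {wdcc_total:.2f} PB, about {ratio:.1f} times NERSC")
```

Under that assumption, the combined total comes to roughly 6.33 petabytes, a little more than twice the NERSC figure.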

Conclusion

In conclusion, with the immense amount of data that they contain, each of these databases helps the general public find something that they want or need in some fashion. More importantly, though, they set a precedent for future databases. They do it through their size, their accuracy, and the data that they contain. I honestly think that databases will continue to grow in all three categories, thus providing more and more information to those who request it.

Bibliography

Credit.com. "Credit.com." 12 Questions for ChoicePoint. Web. 25 Feb. 2012. <http://www.credit.com/credit_information/credit_law/Questions-for-Choicepoint.jsp>.

Dennyson, Robert. "Top 10 Largest Databases in the World." Beyondrelational.com. 01 July 2011. Web. 25 Feb. 2012. <http://beyondrelational.com/modules/1/justlearned/388/tips/9212/top-10-largestdatabases-in-the-world.aspx>.

"Freedom of Information Act." CIA FOIA. CIA. Web. 25 Feb. 2012. <http://www.foia.cia.gov/search.asp>.

Google. "Technology Overview Company." Technology Overview Company. Web. 26 Feb. 2012. <http://www.google.com/intl/en/about/company/tech.html>.

Harris, Craig. "Amazon Database Would Put Shoppers' Intimate Details on the Line." Seattlepi.com. Seattlepi, 10 Aug. 2006. Web. 25 Feb. 2012. <http://www.seattlepi.com/business/article/Amazon-database-would-put-shoppersintimate-1211419.php>.

Lee, Kevin. "What Is a Database on YouTube?" EHow. Demand Media, 04 Jan. 2012. Web. 25 Feb. 2012. <http://www.ehow.com/info_12217150_database-youtube.html>.

"LG Optimus Slider Aka Gelato Shows up in Sprint Database with September 11 Release Date." Phone Arena. 13 June 2011. Web. 26 Feb. 2012. <http://www.phonearena.com/news/LGOptimus-Slider-aka-Gelato-shows-up-in-Sprint-database-with-September-11-releasedate_id19516>.

"Library of Congress Online Catalogs." Library of Congress Online Catalogs. Web. 25 Feb. 2012. <http://catalog.loc.gov/>.

"Model & Data: World Data Center for Climate (WDCC)." Model & Data: Welcome to the Model & Data Homepage. 19 Feb. 2008. Web. 26 Feb. 2012. <http://www.mad.zmaw.de/wdc-for-climate/>.

NERSC. "About NERSC." NERSC: National Energy Research Scientific Computing Center. Web. 26 Feb. 2012. <http://www.nersc.gov/about/>.

"Top 10 Largest Databases in the World." Focus. Focus, Inc., 2012. Web. 25 Feb. 2012. <http://www.focus.com/fyi/10-largest-databases-in-the-world/>.
