TEAM NAME: Tech Buddy
REiBlue_1.0.3.2841 (Matured Alpha)
TEAM MENTOR: Mrs. Hema Malini, Asst. Prof., SASTRA University
TEAM MEMBERS: Ashwanth Kumar, Kirubaharan A., Salaikumar @ Saravanan M., Swetha S.
Table of Contents
Unit I: Introduction
  Modes of Research Engine
  Purpose of Research Engine
  Technologies to be used
  Hardware Requirements
  Software Requirements
  iBlue Class hierarchy
Unit II: Web Search Component
  Use Case Diagram
  Activity Diagram
  Sequential Diagram
  Tools/Libraries used in implementation
Unit III: Knowledge Engine Component
  Use Case Diagram
  Activity Diagram
  Sequential Diagram
  Tools/Libraries used in implementation
Unit IV: Code Search Component
  Use Case Diagram
  Sequential Diagram
  Activity Diagram
  Tools/Libraries used in implementation
Software Requirements Specification
REVISIONS
1. Research Engine – Alpha (REiBlue_1.0.1.1086)
   Initial design of the Research Engine (Semantic Search Engine).
   Authors: Salaikumar @ Saravanan and A. Kirubaharan
2. Research Engine – Matured Alpha (REiBlue_1.0.3.2841)
   Improved design using world class standard components, using NLP for query processing and refinement, and implementing knowledge engine, code search and public SPARQL end-point for Linked Data.
   Authors: Salaikumar @ Saravanan, Ashwanth Kumar, Kirubaharan A., and Swetha S.
Document base URL: http://re-iblue.co.cc/docs/
Project Scenario: Research Engine
Team Name: Tech Buddy
Date: 8th December, 2010
Software Requirement Specification
Tech Buddy / SASTRA University / Tamil Nadu | iBlue Research Engine – SRS v1.0.3.2841

UNIT I
Introduction
Welcome to Research Engine (code name: iBlue). It is the outcome of the efforts of four people who were not happy with the way information was available on the internet, and with the difficulty involved in getting what we want, especially when we are not sure of what we want. If we want the website that contains the best cookery information, or the famous site for movie ratings, we use Google. When we want to know how a scientific expression is derived, how exactly it is put into use, or other related science concepts, we are forced to use WolframAlpha (though in Alpha stage, it has a really large entity index of science and technological information). We need a system that combines the power of both of these technologies to provide to the users what we call: "Instant Answers to all your Questions!"

Research Engine (code name: iBlue) is a semantic search engine for the people of the 21st century. In brief, it has the elegance of Google and the power of the Wolfram Alpha knowledge engine. We hope you like using it as much as we did.

Modes of Operation
It operates in three modes.
1. Web Search - Google-like searching interface, but uses a powerful clustering algorithm to categorize your results, so that you can identify what you want very easily.
2. Knowledge Engine - Works exactly like Wolfram Alpha. All the data of the knowledge base are obtained from Wikipedia (updated till 4th April, 2010).
3. Code Search - Helps you search all open source code available on SF.net, Google Code, github, and other public SVN repositories.
Purpose of Research Engine
The main purpose of Research Engine (from now on referred to as iBlue) is to be a Semantic Knowledge Engine, built from the ground up. Though the initial idea was to build a search engine, as we worked on it we understood the need for structured data online. Information repositories like Wikipedia contain structured, human-edited information on various subjects, but their contents are not machine ready.

Knowledge Engine is the implementation of an inference engine which stands on top of semantic data (information from Wikipedia), available in the Web Ontology Language (OWL) format. iBlue thus is a semantic search engine, providing a search feature to all its users based on their interests (syndicated from social networking sites, Facebook in our case). Semantic results are quantized on the basis of their preferences and presented. Code Search is a labs feature which should enable any user (especially students) to browse through the large canopy of free and open source code available online.

Technologies and Tools used in the implementation:
Rational Software Architect (RSA): IBM Rational Software Architect (RSA), made by IBM's Rational Software division, is a comprehensive modeling and development environment that uses the Unified Modeling Language (UML) for designing architecture for C++ and Java 2 Enterprise Edition (J2EE) applications and web services. Rational Software Architect is built on the Eclipse open-source software framework and includes capabilities focused on architectural code analysis, C++, and model-driven development (MDD) with the UML for creating resilient applications and web services.
J2EE: Java Platform, Enterprise Edition or Java EE is a widely used platform for server programming in the Java programming language. The Java platform (Enterprise Edition) differs from the Java Standard Edition Platform (Java SE) in that it adds libraries which provide functionality to deploy fault-tolerant, distributed, multi-tier Java software, based largely on modular components running on an application server.

IBM WASCE: IBM WebSphere Application Server Community Edition (WASCE) is a free, certified Java EE 5 application server for building and managing Java applications. It is IBM's supported distribution of Apache Geronimo that uses Tomcat for servlet container and Axis 2 for web services.

IBM DB2 Express – C: IBM DB2 Express-C is a free to download, use and redistribute edition of the IBM DB2 data server, which has both XML database and relational database management system features.

Apache Hadoop: Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.

Semantic Web (Web 3.0): Semantic Web is a group of methods and technologies to allow machines to understand the meaning - or "semantics" - of information on the World Wide Web.
Jena API: Jena is an open source Semantic Web framework for Java. It provides an API to extract data from and write to RDF graphs. The graphs are represented as an abstract "model". A model can be sourced with data from files, databases, URLs or a combination of these. A Model can also be queried through SPARQL and updated through SPARUL.

Hardware Requirements
Minimum Configuration: 1 node running - Intel Pentium processor, 512 MB RAM, 40 GB HDD, and Broadband internet connection.
Recommended Configuration: 2 – 100 nodes running - Pentium 4 Processor, 1 GB RAM, 80 GB HDD, DVD Optical drive, high speed network connectivity (optic fiber recommended) & uninterrupted power supply.

Software Requirements
1. Ubuntu 10.04 or any Linux based operating system
2. Java 1.6, preferably from Oracle, and the JAVA_HOME variable must be set to the JVM home
3. SSH package must be installed (used by Hadoop to contact other nodes on the network)
4. WebSphere Community Edition or WebSphere Application Server or any other equivalent must be installed
5. IBM DB2 Express Edition or DB2 Enterprise Edition
iBlue Class Hierarchy
[Figure: iBlue class hierarchy]
The list of all classes being used and implemented in iBlue can be visualized as above. Each class, with its methods and required fields, is mapped under its respective package. There are some more dependencies and open source tools being used in the project which do not come into this class hierarchy. You can find the complete list of components used in each module at the end of the design of each module.
UNIT II
Web Search Component
The web search module enables users of the site to perform text based searching on the entire web.
[Figure: Use Case Diagram for Web search component]
The user enters a query, which is then improved using the ontology of OpenCyc concepts. The improved query is then searched against the index. The index gives the weighted list of results
based on the query and semantics. This result set is then clustered by a cluster engine using the Lingo algorithm. The web component is also supported via AJAX. The WebSearch servlet also supports a REST based search API. When a user is logged in, the search results from the indexer are further improved using the user's connections and their ontology. The rest of the process, the clustering of results and the output formats, is the same.
[Figure: Activity diagram for Web search component]
The user activity with the web search module is explained above.
UserQuery – The user gives a query in the form of text or keywords to search the web.
isUserAuthenticated – Returns true if the user is logged in, else false.
SearchIndex – Searches the index for matching patterns of the UserQuery.
Quantization – Filters the fetched URLs based on the user's likes and dislikes, interests, activities, etc. (Available only to logged-in users.)
AddToWebHistory – Adds the search query of the user to the WebHistory table in the system database for later retrieval and filtering upon subsequent relevant queries.
Clustering – Groups the results of the UserQuery based on the semantics of the result. It uses the Lingo algorithm to categorize the search results into different categories.
Result – Contains the quantized, filtered, and then clustered results for the user. It may be in any of the following forms: text/xml, text/json, text/html.
The above sequence is also represented in the following sequence diagram.

Tools / Libraries used in Implementation:
1. Apache Nutch (http://nutch.apache.org) - Nutch is open source web-search software. It builds on Lucene and Solr, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc. Nutch can run on a single machine, but gains a lot of its strength from running in a Hadoop cluster.
2. Cloudera's Distribution for Hadoop (http://www.cloudera.com/hadoop/) - Cloudera's Distribution for Hadoop (CDH) sets a new standard for Hadoop-based data management platforms. It is the most comprehensive platform available today and significantly accelerates deployment of Apache Hadoop in your organization.
3. Apache Lucene (http://lucene.apache.org/) - Apache Lucene is a free/open source information retrieval software library. It is supported by the Apache Software Foundation and is released under the Apache Software License.
4. Facebook Connect API (http://developers.facebook.com/) - Facebook's powerful APIs enable us to create social experiences to drive growth and engagement on our web site. User context is derived from OpenGraph protocol of Facebook.
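The isUserAuthenticated and Quantization activities described for the web search module can be illustrated with a minimal sketch. It is not the iBlue implementation: the class `QuantizationSketch`, the `Hit` shape (a URL plus topic tags) and the rule "drop disliked topics, move liked topics to the front" are assumptions made only for illustration, and the example.org URLs are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the Quantization activity; not the iBlue code. */
final class QuantizationSketch {

    /** A fetched result: a URL plus the topics attached to it (assumed shape). */
    static final class Hit {
        final String url;
        final Set<String> topics;

        Hit(String url, String... topics) {
            this.url = url;
            this.topics = new HashSet<>(Arrays.asList(topics));
        }
    }

    /**
     * Drops hits that touch a disliked topic and moves hits that touch a liked
     * topic ahead of the rest, keeping the index order inside each group.
     * Guests (authenticated == false) get the index order unchanged, because
     * quantization is available only to logged-in users.
     */
    static List<String> quantize(List<Hit> hits, boolean authenticated,
                                 Set<String> likes, Set<String> dislikes) {
        List<String> preferred = new ArrayList<>();
        List<String> neutral = new ArrayList<>();
        for (Hit hit : hits) {
            if (!authenticated) {
                neutral.add(hit.url);
            } else if (intersects(hit.topics, dislikes)) {
                continue; // filtered out for this user
            } else if (intersects(hit.topics, likes)) {
                preferred.add(hit.url);
            } else {
                neutral.add(hit.url);
            }
        }
        preferred.addAll(neutral);
        return preferred;
    }

    private static boolean intersects(Set<String> a, Set<String> b) {
        for (String s : a) {
            if (b.contains(s)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("http://example.org/a", "cricket"),
                new Hit("http://example.org/b", "cooking"),
                new Hit("http://example.org/c", "movies"));
        Set<String> likes = new HashSet<>(Arrays.asList("movies"));
        Set<String> dislikes = new HashSet<>(Arrays.asList("cricket"));
        System.out.println(quantize(hits, true, likes, dislikes));
        System.out.println(quantize(hits, false, likes, dislikes));
    }
}
```

For the logged-in user the sketch prints `[http://example.org/c, http://example.org/b]`; for a guest it prints the three URLs in index order. In the real module the likes and dislikes would come from the user's Facebook context rather than from hard-coded sets.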
[Figure: Web search module sequence diagram]
UNIT III
Knowledge Base Engine (KBEngine)
Knowledge Base Engine, also known as iBlue Knowledge Engine, does all the semantic processing of data from the WWW. It can be used to query information about any entity on the web. It can also be used as an Analytic Engine, Computational engine, Inference engine, or anything you can think of. It is a very basic knowledge engine, and it can be extended to be used under any type of application and requirements. Currently the dataset is limited to 3 million articles from Wikipedia (as of July, 2010). The data is published in Linked Data format to be compatible with Open Calais, DBPedia, OpenCyc, etc.
[Figure: Use case for Knowledge Engine Component]
Users of the component include any user (logged in and guest). There are no restrictions being applied, since the information has no context of the user related to it.
It can also perform analysis using Machine learning algorithms (at the backend); this is being implemented separately and is not part of the KBEngine. The Administrators can block a particular Entity or an Entity type (example scenarios include parental control or unmatured information).
[Figure: KBEngine activity in iBlue]
The above activity depicts the usage of KBEngine with iBlue. The general operational activities are as follows.
GetUserQuery – Gets the user query through a REST based or form based medium.
TokenizeQuery – Generates the valid tokens of the query.
ASTForm – An intermediate form of representation of the query in memory, ready for computation or for querying the Knowledge Store.
GetAction – Once the query is tokenized, the corresponding action can be identified from the AST form of the query.
PerformAction – Once the action is identified, executes it.
GenerateOutput – The computed value or information is then syndicated in the form requested by the user (JSON/XML).
[Figure: KBEngine Sequence for iBlue]
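The operational activities listed above (TokenizeQuery, ASTForm, GetAction, PerformAction, GenerateOutput) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the class `KbEngineSketch`, the two query forms (`define <entity>` and `add <integers>`), the one-level AST and the in-memory `STORE` map are all hypothetical and stand in for the real Knowledge Store and query grammar.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the KBEngine activity pipeline; not the iBlue code. */
final class KbEngineSketch {

    enum Action { LOOKUP, ADD, UNKNOWN }

    /** Minimal AST: a head keyword and its argument tokens (assumed shape). */
    static final class Ast {
        final String head;
        final List<String> args;

        Ast(String head, List<String> args) {
            this.head = head;
            this.args = args;
        }
    }

    /** Stand-in for the Knowledge Store; the single entry is illustrative. */
    static final Map<String, String> STORE = new HashMap<>();
    static {
        STORE.put("sastra university", "University in Tamil Nadu");
    }

    /** TokenizeQuery: lower-cased, whitespace-separated tokens. */
    static List<String> tokenizeQuery(String query) {
        List<String> tokens = new ArrayList<>();
        for (String t : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** ASTForm: the first token becomes the head, the rest its arguments. */
    static Ast astForm(List<String> tokens) {
        if (tokens.isEmpty()) {
            return new Ast("", new ArrayList<String>());
        }
        return new Ast(tokens.get(0), tokens.subList(1, tokens.size()));
    }

    /** GetAction: identify the action from the AST head. */
    static Action getAction(Ast ast) {
        if (ast.head.equals("define")) {
            return Action.LOOKUP;
        }
        if (ast.head.equals("add")) {
            return Action.ADD;
        }
        return Action.UNKNOWN;
    }

    /** PerformAction: execute the action (ADD expects integer arguments). */
    static String performAction(Action action, Ast ast) {
        switch (action) {
            case LOOKUP:
                String value = STORE.get(String.join(" ", ast.args));
                return value == null ? "no such entity" : value;
            case ADD:
                long sum = 0;
                for (String a : ast.args) {
                    sum += Long.parseLong(a);
                }
                return Long.toString(sum);
            default:
                return "unsupported query";
        }
    }

    /** GenerateOutput: a minimal JSON rendering (escapes only \ and "). */
    static String generateOutput(Action action, String result) {
        String escaped = result.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"action\":\"" + action + "\",\"result\":\"" + escaped + "\"}";
    }

    static String answer(String query) {
        Ast ast = astForm(tokenizeQuery(query));
        Action action = getAction(ast);
        return generateOutput(action, performAction(action, ast));
    }

    public static void main(String[] args) {
        System.out.println(answer("define SASTRA University"));
        System.out.println(answer("add 2 3 4"));
    }
}
```

Keeping each activity as its own method mirrors the activity list, so a richer tokenizer or a real inference step could replace one stage without touching the others.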
Tools / Libraries used in Implementation
1. Jena (http://openjena.org/) - Jena is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS and OWL, SPARQL and includes a rule-based inference engine. Jena is open source and grown out of work with the HP Labs Semantic Web Programme. The Jena Framework includes: an RDF API; reading and writing RDF in RDF/XML, N3 and N-Triples; an OWL API; in-memory and persistent storage; and a SPARQL query engine.
2. OpenCalais (http://www.opencalais.com/) – Open Calais is a web service that helps to annotate unstructured text using the OpenCalais Ontology.
3. OpenCyc (http://sw.opencyc.org/) - OpenCyc is the open source version of the Cyc technology, the world's largest and most complete general knowledge base and commonsense reasoning engine.
4. WikiTractor (http://code.google.com/p/wikixtractor/) – TechBuddy's project for Wikipedia content extraction, which was used to generate the ~16 GB of structured data from Wikipedia using various parsers into N-Triple RDFs for processing.
5. Snorql (http://dbpedia.org/snorql/) - SPARQL Explorer for http://dbpedia.org/sparql. Validated all Wikipedia entities against the DBPedia public end-point.
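The tools above store knowledge as RDF triples serialized in N-Triples. To show what such data looks like once loaded, here is a minimal in-memory sketch that does not use Jena. The class `TripleStoreSketch`, its methods and the example.org URIs are hypothetical, and the parser deliberately handles only a simplified subset of N-Triples: URI subjects and predicates, and objects that are either URIs or plain literals without escapes, language tags or datatypes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical in-memory triple store for a simplified N-Triples subset. */
final class TripleStoreSketch {

    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // <subject> <predicate> (<uri-object> | "plain literal") .
    private static final Pattern LINE = Pattern.compile(
            "^<([^>]*)>\\s+<([^>]*)>\\s+(?:<([^>]*)>|\"([^\"]*)\")\\s*\\.$");

    private final List<Triple> triples = new ArrayList<>();

    /** Parses one line; returns false for comments or unsupported syntax. */
    boolean addLine(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return false;
        }
        String object = m.group(3) != null ? m.group(3) : m.group(4);
        triples.add(new Triple(m.group(1), m.group(2), object));
        return true;
    }

    /** Returns the objects of all triples matching the pattern; null is a wildcard. */
    List<String> objects(String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (Triple t : triples) {
            boolean subjectOk = subject == null || subject.equals(t.subject);
            boolean predicateOk = predicate == null || predicate.equals(t.predicate);
            if (subjectOk && predicateOk) {
                result.add(t.object);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TripleStoreSketch store = new TripleStoreSketch();
        store.addLine("<http://example.org/iBlue> <http://example.org/type> "
                + "<http://example.org/SearchEngine> .");
        store.addLine("<http://example.org/iBlue> <http://example.org/label> "
                + "\"Research Engine\" .");
        System.out.println(store.objects("http://example.org/iBlue",
                "http://example.org/label"));
    }
}
```

The `objects(subject, predicate)` lookup corresponds to the simplest kind of triple pattern; a SPARQL engine such as the one in Jena generalizes this to joins over many patterns and to full N-Triples syntax.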
UNIT IV
Code Search
Code search is an add-on implementation of the searching feature, purely concentrated on indexing and searching open source code online from Apache, SF.net, Google Code, and Eclipse labs projects. As said, it is an add-on implementation for the iBlue spider (the web crawler used by all the modules).
[Figure: Code Search Use case diagram]
The code search module is very similar to the Google Code Search feature. It mainly concentrates on crawling and indexing open source projects under SVN repositories. Currently it crawls SF.net, Google Code, and Apache top-level projects. The programs are classified based on the programming language used, the license under which it is available, class type, method name, file name pattern, package, and also custom keywords given by the user.
Users can search via the programming language, product license, file name pattern, class, and methods, or a custom user query. Users can also contribute an SVN URL for indexing. Administrators can delete any SVN repository that is already indexed by the crawler. The Code Index, at the backend, is available as a service so that users writing any IDE can utilize the service for providing real time code completion suggestions.
[Figure: Code Search sequence for iBlue]
The above diagram represents the sequence of actions that takes place in the system with respect to the module. The user makes a query to the Code Search server, which accepts the query and searches in the index. The weighted list of results is then returned to the user. In the meanwhile, the user client has the PrettyPrintCode JS Framework (similar to Bespin).
[Figure: Code Search Activity Diagram]
The Code Search module's activity diagram follows the same flow as the sequence. It starts off with the user query to identify the project and proceeds up to displaying the project details.
The above activity depicts the typical usage of the code search module. Each activity method is described as follows:
CodeQuery – Query from the user for the code, containing various parameters like language, type, package, class, method, file (regular expression), and license.
CodeIndexer – Analyses the CodeQuery against the code index to determine any valid search patterns.
GetRequiredIndexParameter – Returns the required properties of the index object respective to the user code query.
CodeSearcher – Searches the indices for a valid pattern match for the CodeQuery.
PrettyPrintOutput – Outputs the result in the preferred format (text/json, text/xml or text/html) after formatting the code.

Tools / Libraries used in Implementation
1. Apache Lucene (http://lucene.apache.org/) - Apache Lucene is a free/open source information retrieval software library. It is supported by the Apache Software Foundation and is released under the Apache Software License.
2. Google project hosting (http://code.google.com/projecthosting/) – Repository of open source software hosted on Google infrastructure
3. SourceForge (http://sourceforge.net/) – Repository of open source software
4. Apache Software Foundation (http://projects.apache.org/) – All world class open source projects developed over a course of time by a vibrant developer community
5. SVNKit (http://svnkit.com/) - SVNKit is a pure Java toolkit - it implements all Subversion features and provides APIs to work with Subversion working copies, access and manipulate Subversion repositories - everything within your Java application.
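The CodeQuery and CodeSearcher activities described for the code search module can be sketched as a filter over index entries. This is an assumption-laden illustration, not the Lucene-backed implementation: `CodeSearchSketch`, `Entry` and `CodeQuery` are hypothetical names, only four of the listed parameters are modelled (language, license, class, file regular expression), a `null` parameter means "not constrained", and the file paths are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of CodeQuery matching; not the iBlue code index. */
final class CodeSearchSketch {

    /** One indexed source file (assumed shape). */
    static final class Entry {
        final String file;
        final String language;
        final String license;
        final String className;

        Entry(String file, String language, String license, String className) {
            this.file = file;
            this.language = language;
            this.license = license;
            this.className = className;
        }
    }

    /** User query; a null field places no constraint on that property. */
    static final class CodeQuery {
        final String language;
        final String license;
        final String className;
        final String filePattern; // regular expression over the whole path

        CodeQuery(String language, String license, String className, String filePattern) {
            this.language = language;
            this.license = license;
            this.className = className;
            this.filePattern = filePattern;
        }
    }

    /** Returns the paths of all entries that satisfy every given parameter. */
    static List<String> search(List<Entry> index, CodeQuery query) {
        Pattern file = query.filePattern == null ? null : Pattern.compile(query.filePattern);
        List<String> hits = new ArrayList<>();
        for (Entry e : index) {
            boolean ok = matches(query.language, e.language)
                    && matches(query.license, e.license)
                    && matches(query.className, e.className)
                    && (file == null || file.matcher(e.file).matches());
            if (ok) {
                hits.add(e.file);
            }
        }
        return hits;
    }

    private static boolean matches(String wanted, String actual) {
        return wanted == null || wanted.equalsIgnoreCase(actual);
    }

    public static void main(String[] args) {
        List<Entry> index = Arrays.asList(
                new Entry("web/SearchServlet.java", "Java", "Apache-2.0", "SearchServlet"),
                new Entry("core/index.py", "Python", "MIT", "Index"),
                new Entry("web/AdminServlet.java", "Java", "GPL-3.0", "AdminServlet"));
        CodeQuery query = new CodeQuery("java", null, null, ".*Servlet\\.java");
        System.out.println(search(index, query));
    }
}
```

A linear scan is used only to keep the sketch short; an inverted index such as Lucene would answer the same query without visiting every entry.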