ADBMS ORAL QUESTIONS (UNIT I)

Q1 -> Identify the architecture and explain it.

Q2 -> Identify the architecture and explain it.

Q3 -> Identify the architecture and explain it.

Q4 -> Identify the architecture and explain it.

Q5 -> What are interconnection networks?

Q6 -> What do you mean by shared-nothing architecture?

Q7 -> What do you mean by hierarchical architecture?

Q8 -> What are the advantages and disadvantages of shared-memory architecture?

Q9 -> What are the advantages and disadvantages of shared-disk architecture?

Q10 -> What are the advantages and disadvantages of shared-nothing architecture?

Q11 -> What are the advantages and disadvantages of hierarchical architecture?

Q12 -> What is meant by I/O parallelism?

Q13 -> What are the different partitioning techniques?

Q14 -> What is horizontal partitioning?

Q15 -> Which partitioning technique results in higher throughput while maintaining good response time?

Q16 -> Which partitioning technique is suited for point queries but not for range queries?

Q17 -> How do parallel systems improve processing and I/O speed?

Q18 -> List the needs for a parallel database.

Q19 -> List the types of interconnection networks.

Q20 -> What is shared-memory architecture?

Q21 -> What is shared-disk architecture?

Q22 -> What is a parallel join?

Q23 -> What are the different ways of partitioning in a partitioned join?

Q24 -> Define range partitioning and hash partitioning.

Q25 -> When do we use the fragment-and-replicate technique?

Q26 -> Explain asymmetric fragment-and-replicate.

Q27 -> What is a parallel database system?

Q28 -> List the issues that should be considered while designing a parallel database system.

Q29 -> Why is fault tolerance of the system required?

Q30 -> Why should the system allow online changes?

Q31 -> What is "online index creation"?

Q32 -> What are the types of parallelism in parallel systems? Define them.

Q33 -> What are the performance measures of a parallel system?

Q34 -> Describe the speedup measure of parallelism performance.

Q35 -> Describe the scaleup measure of parallelism performance.

Q36 -> Define all interconnection networks in terms of their scale and communication capacity.

Q37 -> What are the types of parallel sort?

Q38 -> Explain range-partitioning sort.

Q39 -> Explain parallel external sort-merge.

Q41 -> What is data parallelism?

Q42 -> What is intra-operation parallelism? Give an example.

Q43 -> What is parallel sort?

Q44 -> What are the two steps in range-partitioning sort?

Q45 -> Give one method that is an alternative to range partitioning.

Q46 -> What do you mean by data parallelism?

Q47 -> What is inter-query parallelism?

Q48 -> What is the use of inter-query parallelism?

Q49 -> Describe a protocol for a shared-disk system.

Q50 -> "The shared-disk protocols can be extended to shared-nothing architecture." Explain.

Q51 -> Why is inter-query parallelism complicated in shared-disk or shared-nothing architectures?

Topic Name: Distributed Database System

Q1: What do you mean by a Distributed Database Management System?

Ans: A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (distributed DBMS) is the software system that manages the distributed database and makes the distribution transparent to users. The term "distributed database system" (DDBS) typically refers to the combination of the DDB and the distributed DBMS.

Q2: Which architecture is it?

Ans: Client-server database architecture.

Q3: What is the difference between a distributed file system and a distributed database system?

Ans: A distributed file system simply allows users to access files located on machines other than their own. These files have no explicit structure (they are flat), and the relationships among data in different files are not managed by the system; they are the user's responsibility. A distributed database, in contrast, is organized according to a schema that defines both the structure of the distributed data and the relationships among the data.

Q4: What are the advantages and disadvantages of a Distributed Database System?

Ans: Advantages:
- Users can be geographically separate. This is important for large corporations, where business decisions must be made by people in different locations, but those decisions must be based on company-wide data.
- Multiple machines can improve performance and scalability. Because a client-server system is distributed over several machines, performance and scalability can be improved in several ways. For example, there might be multiple replicas of a server running on separate machines, so that each handles only a fraction of the total number of clients.
- Heterogeneous systems can use the best tools for each task. Different components of an application can run on hardware that is optimized for a specific task.
- Distributed systems can reduce maintenance costs. For example, by upgrading an application image on a single server, it is possible to upgrade thousands of clients.

Disadvantages:
- Software: it is difficult to develop software for distributed systems.
- Network: saturation and lossy transmission.
- Security: easy access also applies to secret data.

Q5: What are the goals of a Distributed Database System?

Ans:
1) Transparency:
- Access: hides differences in data representation and invocation mechanisms.
- Location: hides where an object resides.
- Migration: hides from an object the ability of a system to change that object's location.
- Relocation: hides from a client the ability of a system to change the location of an object to which the client is bound.
- Replication: hides the fact that an object or its state may be replicated and that replicas reside at different locations.

2) Openness: be able to interact with services from other open systems, irrespective of the underlying environment.

3) Scalability:
- Number of users and/or processes (size scalability)
- Maximum distance between nodes (geographical scalability)
- Number of administrative domains (administrative scalability)

4) Replication: make copies of data available at different machines:
- Replicated file servers (mainly for fault tolerance)
- Replicated databases
- Mirrored web sites
- Large-scale distributed shared memory systems

Topic Name: Distributed Data Storage

Q11. Give the different approaches to storing data in a distributed database.

Ans: A relation can be stored in a distributed database using replication or fragmentation, and in either case the system should provide transparency:
i. Replication: the system maintains several identical replicas of the relation and stores each replica at a different site.
ii. Fragmentation: the system partitions the relation into several fragments and stores each fragment at a different site.
iii. Transparency: the user should not be required to know where the data is physically located or how the data can be accessed at a specific local site.

Q12. Give the advantages of data replication in distributed data storage.

Ans:
i. Availability: if one site fails, the data can be found at another site, so the system can continue to work.

ii. Increased parallelism: queries that only read the relation can be processed by several sites in parallel.

Q13. Give the disadvantages of data replication in distributed data storage.

Ans: Increased overhead on update: an update made at one site must be propagated to every site that holds a replica, so that all copies agree.

Q14. What is fragmentation?

Ans: Fragmentation consists of breaking a relation into smaller relations, or fragments, and storing the fragments (instead of the whole relation), possibly at different sites.

Q15. Distinguish between horizontal and vertical fragmentation in distributed data storage.

Ans:
Horizontal fragmentation:
1. Each fragment consists of a subset of the rows of the original relation.
2. Horizontal fragments are identified by a selection query.

Vertical fragmentation:
1. Each fragment consists of a subset of the columns of the original relation.
2. Vertical fragments are identified by a projection query.

Topic Name:

1. Characteristics of a distributed database.
- One of the goals in using a distributed database is high availability; that is, the database must function almost all the time.
- For the distributed system to be robust, it must detect failures.

2. What is the difference in function between the coordinator and its backup?

- The difference is that the backup does not take any action that affects other sites.

3. What is the function of the election algorithm?
- An election algorithm enables the sites to choose the site for the new coordinator in a decentralized manner.

4. What are the advantages of coordinator selection?
- Ability to continue processing immediately.
- The backup coordinator approach avoids a substantial amount of delay while the distributed system recovers from a coordinator failure.

5. What are the disadvantages of coordinator selection?
- There is the overhead of duplicate execution of the coordinator's tasks.
- A coordinator and its backup need to communicate regularly to ensure that their activities are synchronized.

Topic Name: Distributed Query Processing

Q1: What do you mean by query processing?

Ans: The process by which a declarative query is translated into low-level data manipulation operations.

Q2: What is the objective of query processing in distributed systems?

Ans: Easy retrieval of data: to ensure that a user query, which is posed as if the database were centralized (i.e., logically integrated), executes correctly and efficiently over data that is distributed.

Q3: Discuss the various steps in query processing.

Ans:

The query parser parses and translates a given high-level query into an intermediate form such as a relational algebra expression. The parser checks the syntax of the query and also its semantics (that is, it verifies that the relation names and attribute names in the query are names of relations and attributes in the database). A parse tree of the query is constructed and then translated into a relational algebra expression.

Q4: In a distributed system, which issues must be taken into account when choosing a strategy for query processing?

Ans:
- The cost of data transmission over the network. This cost depends on the speed of the disks and the type of network.
- The potential gain in performance from having several sites process parts of the query in parallel.

Q5: What are the general approaches to query optimization?

Ans:
Heuristic-based query optimization: given a query expression, perform selections and projections as early as possible, and eliminate duplicate computations.

Cost-based query optimization: estimate the cost of the different query expressions obtained by heuristic and algebraic manipulation, and choose the execution plan with the lowest estimated cost.

Topic Name: DIRECTORY SYSTEM

Q.1 What is a directory?

Ans: A directory is a listing of information about some class of objects. Directories are also used to store other information; for example, web browsers store personal bookmarks.

Q.2 What is the use of a directory?

Ans: Directories can be used to find information about a specific object, or to find objects that meet certain requirements. They also store the necessary information.

Q.3 What are the ways of accessing directory information?

Ans: Directory information can be made available through web interfaces. People access directory information this way, and sometimes programs also access directory information.

Q.4 What are the reasons for having protocols for accessing directory information?

Ans:
1. Directory access protocols are simplified protocols that cater to a limited type of access to data.
2. They can be implemented with database access protocols.
3. They provide a simple mechanism for naming objects in a hierarchical fashion.

Q.5 Where is the DAP protocol used?

Ans: The DAP protocol is used in a distributed directory system to specify what information is stored in each of the directory servers.

Topic Name: LDAP

1>. What is LDAP?
LDAP (Lightweight Directory Access Protocol) is a protocol for accessing online directories. It defines the communication between LDAP servers and LDAP clients: LDAP servers store "directories", which are accessed by LDAP clients.

2>. Why is LDAP called lightweight?
LDAP is called lightweight because it is a smaller and easier protocol, derived from the X.500 DAP (Directory Access Protocol) defined in the OSI network protocol stack.

3>. What is the use of LDIF?
LDIF is the LDAP Data Interchange Format, used for storing and exchanging directory information.

4>. How does communication between an LDAP server and a client take place?
A client starts an LDAP session by connecting to an LDAP server, called a Directory System Agent (DSA), by default on TCP port 389. The client then sends an operation request to the server, and the server sends responses in return. With some exceptions, the client need not wait for a response before sending the next request, and the server may send the responses in any order.

5>. What are the different operation requests made by a client when it is connected to an LDAP server?
The client may request the following operations:
- Start TLS: use the LDAPv3 Transport Layer Security (TLS) extension for a secure connection
- Bind: authenticate and specify the LDAP protocol version
- Search: search for and/or retrieve directory entries
- Compare: test whether a named entry contains a given attribute value
- Add a new entry
- Delete an entry
- Modify an entry
- Unbind: close the connection (not the inverse of Bind)

6>. What is the difference between LDAP and a database?
The largest general difference between directories and databases is complexity. Databases are capable of storing almost any arbitrary set of information and can be greatly customized for a specific purpose. They also provide a complex query interface, allowing for flexible searches that return customized results. Directories, on the other hand, tend to have very specific implementations that follow a strict pattern or schema. This allows them to be extremely fast, and allows for easy organization and comprehension of the data they store.

7>. What is DIT?
Directories are viewed as a tree, like a computer's file system. This overall tree structure is called the Directory Information Tree (DIT).

8>. What is an object? What are the different kinds of objects in a DIT?
Each entry in a directory is called an object. These objects are of two types, containers and leaves. A container is like a folder: it contains other containers or leaves. A leaf is simply an object at the end of a branch of the tree. A tree cannot contain an arbitrary set of containers and leaves; it must match the schema defined for the directory.

9>. What are the applications of LDAP?
Internet applications:
- Centralized or distributed white pages
- ISP online subscriber directory
Intranet applications:
- Internal white pages
- Certificate and CRL distribution
- System/network management database

10>. What are the contents of an LDAP query?
- Base: a node within the DIT, given by its distinguished name.
- Search condition: a Boolean combination of conditions on individual attributes.
- Scope: just the base, the base and its children, or the entire subtree beneath the base.
- Attributes: the names of the attributes to be returned.
- Limits on the number of results and on resource consumption.
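
As a rough illustration of the query components listed above, the following Python sketch models a tiny in-memory directory and a search over it. Everything in it is hypothetical: the `Entry` class, the `search` function, the scope names, and the sample `dc=example,dc=com` entries are assumptions made for illustration and are not part of any real LDAP library. The search condition is a plain Python predicate rather than a real LDAP filter string.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    dn: str                                  # distinguished name of the entry
    attrs: dict = field(default_factory=dict)


def parent_dn(dn):
    """Return the DN of the parent entry (everything after the first RDN)."""
    _, _, rest = dn.partition(",")
    return rest


def in_scope(entry_dn, base, scope):
    """Apply the three scopes described in the notes (names are hypothetical)."""
    if scope == "base":
        return entry_dn == base
    if scope == "base_and_children":
        return entry_dn == base or parent_dn(entry_dn) == base
    if scope == "subtree":
        return entry_dn == base or entry_dn.endswith("," + base)
    raise ValueError(f"unknown scope: {scope}")


def search(directory, base, scope, condition, attributes, size_limit=None):
    """Return (dn, selected attributes) for entries matching scope and condition."""
    results = []
    for entry in directory:
        if in_scope(entry.dn, base, scope) and condition(entry.attrs):
            selected = {a: entry.attrs[a] for a in attributes if a in entry.attrs}
            results.append((entry.dn, selected))
            if size_limit is not None and len(results) >= size_limit:
                break                        # limit on the number of results
    return results


DIRECTORY = [
    Entry("dc=example,dc=com", {"objectClass": "domain"}),
    Entry("ou=people,dc=example,dc=com", {"objectClass": "organizationalUnit"}),
    Entry("cn=Asha,ou=people,dc=example,dc=com",
          {"objectClass": "person", "cn": "Asha", "mail": "asha@example.com", "dept": "sales"}),
    Entry("cn=Ravi,ou=people,dc=example,dc=com",
          {"objectClass": "person", "cn": "Ravi", "mail": "ravi@example.com", "dept": "hr"}),
]

hits = search(
    DIRECTORY,
    base="ou=people,dc=example,dc=com",
    scope="subtree",
    condition=lambda a: a.get("objectClass") == "person" and a.get("dept") == "sales",
    attributes=["cn", "mail"],
    size_limit=10,
)
print(hits)
```

Each argument of `search` corresponds to one item in the list: base, scope, search condition, attributes to return, and a result limit.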

Topic Name: Commit Protocol

1: What are the types of commit protocol?

Ans: Two-phase commit protocol (2PC) and three-phase commit protocol (3PC).

2: Explain the two-phase commit protocol.

Ans: When transaction T completes its execution, that is, when all the sites at which T has executed inform the transaction coordinator Ci that T has completed, Ci starts the 2PC protocol. In the first phase Ci asks every participating site whether it is prepared to commit T. In the second phase Ci decides to commit T only if all sites voted to commit, and otherwise decides to abort; it then sends the decision to all sites.

3: What is the main disadvantage of the two-phase commit protocol?

Ans: Coordinator failure may result in blocking, where the decision either to commit or to abort the transaction may have to be postponed until Ci recovers.

4: Explain the three-phase commit protocol.

Ans: This protocol avoids the blocking problem under the assumptions that no network partition occurs and that not more than k sites fail, where k is a predetermined number. Under these assumptions the protocol avoids blocking by introducing an extra third phase in which multiple sites are involved in the decision to commit.

5: What are the assumptions of the three-phase commit protocol, and what is its disadvantage?

Ans: Assumptions: no network partition occurs, and not more than k sites fail, where k is a predetermined number. Under these assumptions the protocol avoids blocking by introducing an extra third phase in which multiple sites are involved in the decision to commit.
Disadvantage: the protocol has to be implemented carefully to ensure that network partitioning does not result in inconsistencies, where a transaction is committed in one partition and aborted in another. For this reason the 3PC protocol is not widely used.

Topic Name: Distributed Directory Trees

1) Why is a directory object used?

Ans: The directory object is used to store and retrieve information about objects.

2) Why is the naming tree called the Directory Information Tree (DIT)?

Ans: Because a directory entry is associated with each vertex of this tree, and the entry holds information about the object that has the corresponding name.

3) What is a Directory System Agent (DSA)?

Ans: A system that maintains and communicates directory information is called a Directory System Agent.

4) What is a Relative Distinguished Name (RDN)?

Ans: The name component added as we move one step down the naming tree is called the Relative Distinguished Name of the corresponding entry.

5) All directory information will be part of one "global directory". True or false?

Ans: True. All directory information will be part of one "global directory": global in the sense that it is worldwide, and global in the sense that it will be common to all directory uses.

6) How is an object represented?

Ans: An object represented in an X.500 directory always has a structured, so-called distinguished name.
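
The relationship between a distinguished name and its relative distinguished names can be sketched in a few lines of Python. The helper names and the sample entry are hypothetical, and the parsing is deliberately naive: it splits on commas and does not handle escaped commas or multi-valued RDNs.

```python
def split_dn(dn):
    """Split a distinguished name into its RDNs, leaf first."""
    return [rdn.strip() for rdn in dn.split(",") if rdn.strip()]


def path_from_root(dn):
    """List the DN of every entry on the path from the root of the DIT to this entry."""
    rdns = split_dn(dn)
    # Each step down the tree prepends one more RDN to the name.
    return [",".join(rdns[i:]) for i in range(len(rdns) - 1, -1, -1)]


dn = "cn=Asha,ou=people,dc=example,dc=com"   # hypothetical entry
print("RDN of the entry:", split_dn(dn)[0])
for step in path_from_root(dn):
    print(step)
```

Reading the printed path top to bottom shows one RDN being added at each step down the naming tree, which is exactly how the distinguished name is assembled.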

Q: Draw LDAP architecture.

ADBMS ORAL QUESTIONS TOPIC - CLIENT SERVER ARCHITECTURE (UNIT III).

1. WHAT DO YOU MEAN BY CLIENT SERVER ARCHITECTURE?
ANS: CLIENT/SERVER ARCHITECTURE DESCRIBES THE RELATIONSHIP BETWEEN TWO COMPUTER PROGRAMS IN WHICH ONE PROGRAM, THE CLIENT, MAKES A REQUEST TO ANOTHER PROGRAM, THE SERVER, WHICH FULFILLS THE REQUEST.

2. WHAT ARE THE DIFFERENT TYPES OF CLIENT SERVER ARCHITECTURES?
ANS:
1. MAINFRAME ARCHITECTURE
2. FILE SHARING ARCHITECTURE
3. SINGLE TIER ARCHITECTURE
4. TWO TIER ARCHITECTURE
5. THREE TIER ARCHITECTURE

3. WHAT IS MAINFRAME ARCHITECTURE?
ANS: WITH MAINFRAME SOFTWARE ARCHITECTURE, ALL INTELLIGENCE IS WITHIN THE CENTRAL HOST COMPUTER. USERS INTERACT WITH THE HOST THROUGH A TERMINAL THAT CAPTURES KEYSTROKES AND SENDS INFORMATION TO THE HOST.

4. WHAT IS THE LIMITATION OF MAINFRAME ARCHITECTURE?
ANS: THE LIMITATION OF MAINFRAME ARCHITECTURES IS THAT THEY DO NOT EASILY SUPPORT GRAPHICAL USER INTERFACES (GUI) OR ACCESS TO MULTIPLE DATABASES.

5. WHAT IS THE ADVANTAGE OF MAINFRAME ARCHITECTURE?
ANS: THE ADVANTAGE OF MAINFRAME ARCHITECTURE IS THAT IT IS NOT TIED TO THE HARDWARE PLATFORM. USERS CAN INTERACT THROUGH PCs AND UNIX WORKSTATIONS.

6) What are the advantages and disadvantages of client-server architecture?

Ans: Advantages:
1. Processing of the entire database system is spread over the client-server architecture.
2. It is possible to keep control over all clients through a single system; in a DBMS this is required for transaction control and management.

Disadvantages:
1. Implementation is more complex because it includes network management.
2. There is an additional burden on the DBMS server to handle concurrency.

7) What are the basic components of database system architecture? What is 1-tier architecture?

Ans: There are 3 basic components of database system architecture:
1. Presentation logic: user interface, displaying data to the user, accepting input from the user.
2. Business logic: data validation, ensuring the data is 100% correct before adding it to the database.
3. Data access logic: database communication, accessing tables and indices, packing and unpacking data.

1-tier architecture: in this architecture all three components of the application are handled in a single layer.

8) What is 2-tier architecture?

Ans: 2-tier architecture: in this architecture all three components of the application are distributed across two layers.
1st layer / primary layer: consists of all presentation logic and business logic.
2nd layer / secondary layer: consists of all data access logic.

9) What are the limitations of 2-tier architecture?

Ans: Limitations:
1. Implementing business logic in stored procedures can limit scalability.
2. This architecture is not effective for batch processing.
3. It limits interoperability, because complex processing logic is implemented in stored procedures, and stored procedures are written in the DBMS's proprietary language.
4. This architecture is difficult to administer and maintain, because when the application resides on the client, every upgrade must be delivered and tested on each client.

10) Identify which client-server (tier) architecture is shown below, and explain its components.

Ans: 3-tier architecture.

Topic: Web Fundamentals

11. What are web-based systems?

Ans: Application programs are required to access data from databases. Because the internet is now so popular, web tools are the most widely used for the user interface. Web-based systems are systems used on the internet for information exchange through such a user interface. Web-based systems are the heart of e-commerce.

12. What is the need for a web interface to a database?

Ans: With the growth of information services and e-commerce on the web, databases used for information services, decision support systems (DSS), and transaction processing must be linked with the web. This connection requires a bridge that links the application to the database, which is the web interface.

13. Which are the web fundamental components?

Ans: URL, HTML, client-side scripting, applets, servlets, server-side scripting.

14. What is a servlet?

Ans: In a two-tier web architecture the application runs as part of the web server. To serve a user request, the web server has to load and run a Java program; this facility is provided by servlets. Servlets reside on the server side and define the communication between the web server and the application program.

15. What is a web server?

Ans: A web server is a program running on the server side that accepts requests from a web browser and sends back the results in the form of HTML documents.

16. What is a web applet?

Ans: Java code can be compiled into byte code, which is platform independent and can be executed in any browser that supports Java. Java applets are downloaded as part of a web page and are used to provide a richer GUI.

17. What is HTML?

Ans: HTML is the Hypertext Markup Language, which allows formatting of text. HTML is used for designing web pages. It can display tables, forms, and style sheets, as well as other display attributes.

18. What is a URL?

Ans: URL stands for uniform resource locator. The URL field in a web browser allows the user to enter a web address or web site name; the browser then connects to that system using some protocol. The parts of a URL are:
- HTTP: the Hypertext Transfer Protocol.
- Web address: the name or IP address of the machine that runs the web server.
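
The parts of a URL can be inspected with Python's standard `urllib.parse` module. This is only an illustrative sketch; the address `www.example.com`, the port, and the path are made-up values.

```python
from urllib.parse import urlsplit

url = "http://www.example.com:8080/courses/adbms.html?unit=3"  # hypothetical URL
parts = urlsplit(url)

print("protocol:", parts.scheme)      # the protocol used to reach the server
print("host:    ", parts.hostname)    # name or IP address of the web server machine
print("port:    ", parts.port)        # optional port number
print("path:    ", parts.path)        # document requested from the server
print("query:   ", parts.query)       # optional parameters passed to the server
```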

Topic: XML Domain Specific DTDs

19. What is XML?
Ans: XML (eXtensible Markup Language) was mainly intended for document management. It is derived from SGML (Standard Generalized Markup Language). XML can represent database data, as well as many other kinds of structured data used in applications.

20. What is the standard query language for XML called?
Ans: The W3C (World Wide Web Consortium) standard query language for XML is called XQuery.

21. What is XPath?
Ans: It is a language for path expressions. A path expression is a sequence of location steps separated by "/". Using XPath we can select data from an XML document.
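
A small example may make path expressions more concrete. The sketch below uses Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath; the `bank`/`account` document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = """
<bank>
  <account><number>A-101</number><branch>Downtown</branch><balance>500</balance></account>
  <account><number>A-102</number><branch>Perryridge</branch><balance>400</balance></account>
</bank>
"""

root = ET.fromstring(DOC)

# Location steps separated by "/": every <number> under every <account>.
numbers = [e.text for e in root.findall("./account/number")]

# A predicate on a step: only accounts whose <branch> child is "Downtown".
downtown = [a.findtext("number") for a in root.findall("./account[branch='Downtown']")]

print(numbers)
print(downtown)
```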

22. What is XSLT?
Ans: It is a transformation language for XML documents that can generate XML output.

23. Which is the current approved version of XML?
Ans: XML version 1.0, declared in a document as version="1.0".

24. What is a reference?
Ans: A reference allows you to include additional text or markup in an XML document. References always begin with "&" and end with ";".

25. What is an entity reference?
Ans: An entity reference, like "&amp;", contains a name (e.g. amp) between the start and end delimiters. The name refers to predefined text or markup, like a macro in the C/C++ programming languages.

26. What is a character reference?
Ans: A character reference, like "&#65;", contains a hash mark followed by a number. The number always refers to the Unicode code for a single character, such as 65 for A.
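
The effect of both kinds of reference can be observed by parsing a small document with Python's standard `xml.etree.ElementTree` module. The `note` element and its text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# "&amp;" is an entity reference, "&#65;" is a character reference (Unicode 65 = "A").
elem = ET.fromstring("<note>Tom &amp; Jerry got grade &#65;</note>")

# The parser replaces both references with the characters they stand for.
print(elem.text)

# When the element is written back out, "&" must be escaped again as "&amp;".
print(ET.tostring(elem, encoding="unicode"))
```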

27. What is a FLWOR expression?

Ans: A FLWOR expression is an XQuery construct composed of FOR, LET, WHERE, ORDER BY, and RETURN clauses.

28. What is MathML?
Ans: MathML is intended to facilitate the use and reuse of mathematical and scientific content on the web and in other applications such as computer algebra systems, print typesetting, and voice synthesis.

Topic: SOAP

29. What is SOAP?

Ans: SOAP is the Simple Object Access Protocol. It is a protocol specification for exchanging structured information in the implementation of web services in computer networks. It is an XML-based messaging protocol.

30. SOAP relies on which language as its message format?

Ans: XML.

31. What are the common protocols SOAP relies on?

Ans: It relies on application-layer protocols (SMTP, HTTP, HTTPS, RPC, etc.), but the most commonly used are RPC and HTTP.

32. Why was XML chosen as the standard message format for SOAP?

Ans: Because of its widespread use by major corporations and open-source development efforts. Hardware appliances are also available to accelerate the processing of XML messages.

33. Advantages of SOAP?
Ans:
i) Using SOAP over HTTP allows easier communication through proxies and firewalls than previous remote execution technologies.
ii) It is versatile enough to allow the use of different transport protocols.
iii) It is platform independent.
iv) It is language independent.

34. Disadvantages of SOAP?
Ans:
i) Because of the verbose XML format, SOAP can be considerably slower than competing middleware technologies. This may not be an issue when only small messages are sent.
ii) Although SOAP is an open standard, not all languages offer appropriate support. Java, Delphi, .NET, and Flex offer excellent SOAP integration and/or IDE support. Python and PHP support is much weaker.

35. State one middleware technology that competes with SOAP.
Ans: CORBA (Common Object Request Broker Architecture).

36. What does the SOAP specification contain?

Ans: SOAP is a specification for using XML documents as messages. The SOAP specification contains:
i) A syntax for defining messages as XML documents (SOAP messages).
ii) A model for exchanging SOAP messages.
iii) A set of rules for representing data within SOAP messages (SOAP encoding).
iv) A guideline for transporting SOAP messages over HTTP.
v) A convention for performing RPC using SOAP messages.
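
To make the idea of "a message as an XML document" concrete, here is a minimal sketch that builds a SOAP-style envelope with Python's standard `xml.etree.ElementTree` module. It assumes the SOAP 1.1 envelope namespace; the function `build_envelope`, the `TransactionId` header, and the `GetBalance` request are hypothetical names invented for this example, and the sketch does not send the message anywhere.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("s", SOAP_NS)                    # use the prefix "s" on output


def build_envelope(transaction_id, service, params):
    """Build Envelope -> (Header, Body); the element names inside are hypothetical."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, "TransactionId").text = transaction_id
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, service)             # the requested service
    for name, value in params.items():
        ET.SubElement(request, name).text = str(value)
    return envelope


env = build_envelope("42", "GetBalance", {"accountNumber": "A-101"})
xml_text = ET.tostring(env, encoding="unicode")
print(xml_text)
```

In a real deployment this XML text would typically be carried as the body of an HTTP POST request, which is the transport guideline mentioned in point iv.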

37. Where will you place SOAP in 2-tier and 3-tier architectures?
Ans:
2-tier: the presentation layer and business logic are in a single layer.
3-tier: at the middle layer.

38. What are Active Server Pages?

Ans: Active Server Pages (ASPs) are web pages that contain server-side scripts in addition to the usual mixture of text and HTML tags. Server-side scripts are special commands placed in web pages that are processed before the pages are sent from the server to the web browser of someone visiting the website.

39. What are the requirements to run ASP?

Ans: Since the server must do additional processing on the ASP scripts, it must have the ability to do so. The servers that support this facility are Microsoft Internet Information Services (IIS) and Microsoft Personal Web Server (PWS).

40. What are a few basic rules for XML document elements?
1: Element names can contain letters, numbers, hyphens, underscores, periods, and (when namespaces are used) colons.
2: Element names cannot contain spaces; underscores are usually used to replace spaces.
3: Element names can start with a letter, underscore, or colon, but cannot start with other non-alphabetic characters or a number, or with the letters xml.
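
The three rules above can be turned into a small checker. This is a simplified, ASCII-only sketch written for these notes: the real XML grammar also allows many non-ASCII letters, and the function name `is_valid_element_name` is hypothetical.

```python
import re

# Rule 3: first character is a letter, underscore, or colon.
# Rule 1: remaining characters are letters, digits, hyphens, underscores, periods, colons.
# Rule 2 (no spaces) follows because a space is not in either character set.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")


def is_valid_element_name(name):
    """Check an element name against the simplified ASCII-only rules above."""
    if not NAME_RE.fullmatch(name):
        return False
    # Rule 3, last part: names must not start with the letters "xml" (any case).
    return not name.lower().startswith("xml")


for candidate in ["student_name", "student name", "2ndYear", "xmlData", "first-name.v2"]:
    print(candidate, "->", is_valid_element_name(candidate))
```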

41. What is the format used to represent XML elements?
Ans: Elements look like this and always have an opening and a closing tag: <element></element>. An element with no content may also be written as a single empty-element tag: <element/>.

42. What is Internet Information Services?
Ans: This is Microsoft's web server, designed for the Windows NT platform. According to these notes it runs only on Microsoft Windows NT 4.0, Windows 2000 Professional, and Windows 2000 Server; at the time the notes were written the current version was 5.0, which shipped as part of the Windows 2000 operating system.

43. What is DTD?
Ans: Document Type Definition (DTD) is the original way to validate XML document structure and enforce specific formatting of select text, and it is probably still the most prevalent. Although an XML declaration at the top of a DTD might suggest that it is an XML document, DTDs are not well-formed XML documents, because they follow DTD syntax rules rather than XML document syntax. An XML document refers to its DTD in a document type declaration placed directly under the XML declaration.

44. What are XML attributes?

Ans: Attributes contain values that are associated with an element and are always part of the element's opening tag: <element attribute="value"></element>. The attribute name must follow the element name, then comes an equals sign (=), then the attribute value in single or double quotes. The attribute value can itself contain quotes; if it does, one type of quote must be used inside the value and the other type around the value.
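
The quoting rule is easy to check by parsing a small element with Python's standard `xml.etree.ElementTree` module. The `course` element and its attributes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# "code" uses double quotes; "title" contains double quotes, so it is wrapped in single quotes.
doc = """<course code="ADBMS" title='The "advanced" course'/>"""
elem = ET.fromstring(doc)

print(elem.get("code"))      # value of one attribute
print(elem.get("title"))     # inner double quotes are part of the value
print(elem.attrib)           # all attributes as a dictionary
print(elem.get("missing"))   # None when the attribute is absent
```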

45. State true or false: is it necessary to close a tag in XML?

Ans: True. (Explanation: each tag is delimited by angle brackets, and every opening tag must have a matching closing tag.)

46. What is the difference between HTML and ASP?

Ans: With plain HTML we cannot make changes to web pages dynamically, whereas with ASP we can generate and change page content dynamically.

47. Define the following: i) Thin client ii) Thick client

Ans: Thin client: in this architecture the client implements only the GUI, and the server implements both the business logic and data management. Such clients are called thin clients.

Thick client: clients that implement the user interface and a part of the business logic, with the remaining part being implemented at the server level, are called thick clients.

48. Identify the following diagram.

Ans: Thick client.

49. Explain the following diagram.

Ans: It is a standard XML document. The root element is the Envelope, which uses a namespace with the prefix 's'. The envelope contains a Header, which indicates the transaction and its ID, and a Body, which indicates the requested service.

50. Explain entity references.

1. WHAT IS THE DIFFERENCE BETWEEN OLTP AND OLAP? (3 POINTS)
A=
OLTP                    OLAP
SIMPLE TRANSACTIONS     COMPLEX TRANSACTIONS
READ/WRITE QUERIES      READ QUERIES
SMALL DB                HUGE DB

2. WHAT ARE THE 3 TIERS IN DATA WAREHOUSE ARCHITECTURE?
A=
BOTTOM TIER: DATA MARTS, DATA WAREHOUSE, METADATA REPOSITORY
MIDDLE TIER: OLAP SERVER
TOP TIER: DATA MINING TOOLS, ANALYSIS TOOLS, ETC.

3. WHAT ARE THE ADVANTAGES OF A DATA WAREHOUSE?
A=
MARKETING WEAPON
CUSTOMER SUPPORT
BUSINESS INTELLIGENCE
DECISION SUPPORT

4. WHAT ARE THE APPLICATIONS OF A DATA WAREHOUSE?
A=
INFORMATION PROCESSING
ANALYTICAL PROCESSING
DATA MINING

5. WHAT IS THE USE OF THE METADATA REPOSITORY?
A= STORES DATA ABOUT DATA MARTS AND DEFINITIONS OF THE DATA WAREHOUSE

6. TYPES OF OLAP SERVERS
A=
ROLAP: RELATIONAL DBMS
MOLAP: MULTI-DIMENSIONAL VIEWS
HOLAP: COMBINES ROLAP AND MOLAP

7. WHAT IS A DATA MART?
A= CONTAINS A SUBSET OF CORPORATE-WIDE DATA THAT IS OF VALUE TO A SPECIFIC GROUP OF USERS

8. WHAT ARE THE TYPES OF DATA WAREHOUSE DESIGN?
A=
TOP DOWN: STARTS WITH OVERALL DESIGN AND PLANNING
BOTTOM UP: STARTS WITH EXPERIMENTS AND PROTOTYPES
COMBINED: COMBINES BOTH OF THE ABOVE

9. USE OF DATA PREPROCESSING
A=
IMPROVE QUALITY OF DATA
IMPROVE QUALITY OF MINING RESULTS
IMPROVE EFFICIENCY AND EASE OF MINING

10. DATA PREPROCESSING TECHNIQUES
A=
DATA CLEANING
DATA INTEGRATION
DATA TRANSFORMATION
DATA REDUCTION

Q1. What is a data warehouse? What are the features of a data warehouse?

Q2. What are the steps in the design and construction of a data warehouse? What are its components?

Q3. Explain the three-tier data warehouse architecture.

Q4. What are the applications of a data warehouse?

Q5. What are the differences between a data warehouse and data marts?

1. What is Online Analytical processing?(OLAP)  OLAP is an interactive system that permits an anlyst to view different summarizes of multidimensional data.OLAP tools support interactive analysis of summery information.

2. Explain In Brief OLAP Implementation.
OLAP is implemented on multidimensional models. In MOLAP servers, data warehouses directly store multidimensional data in special data structures (e.g., arrays) and implement the OLAP operations over these special data structures.

3. What Is a Relational OLAP (ROLAP) System?
A ROLAP system keeps the data in a relational database and relies on: Special schema design: star, snowflake. Special indexes: bitmap, multi-table join. Special tuning: maximize query throughput. It is proven technology and tends to outperform specialized MDDBs, especially on large data sets.

4. What Is a Hybrid OLAP (HOLAP) System?

A system that stores some summaries in memory and stores the base data and other summaries in a relational database is called a HOLAP system.

5. Give the OLAP components of SQL. Extended aggregation -> The SQL:1999 standard defines a rich set of aggregate functions. The new aggregate functions on single attributes are standard deviation and variance. SQL:1999 also supports a new class of binary aggregate functions, which compute statistical results on pairs of attributes; they include correlation, covariance, and regression curves, which give a line approximating the relation between the values of the pair of attributes.
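The statistics named in this answer can be illustrated outside SQL. The following is a minimal Python sketch, not SQL syntax; the function names and the sample data are made up for illustration, and population (divide by n) formulas are assumed.

```python
import math

def variance(xs):
    """Population variance of a single attribute."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    """Population standard deviation of a single attribute."""
    return math.sqrt(variance(xs))

def covariance(xs, ys):
    """Population covariance of a pair of attributes."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Correlation coefficient of a pair of attributes."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

def regression_line(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    slope = covariance(xs, ys) / variance(xs)
    intercept = sum(ys) / len(ys) - slope * sum(xs) / len(xs)
    return slope, intercept

# Hypothetical sample: ys lies exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
print(variance(xs), covariance(xs, ys))
print(correlation(xs, ys))
print(regression_line(xs, ys))
```

Because the sample points lie on a straight line, the correlation comes out at (approximately) 1 and the regression line recovers the slope and intercept used to build the data.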

Q1. What is a data cube ? ----A data cube is used to represent data along some measure of interest. Although called a “data cube”, it can be 2-dimensional, 3-dimensional, or higher dimensional.

Q2. What are the various operations on a data cube ? ----Summarization or roll-up, drill down, iceberg cubes.

Q3. What is a cross tab ? ----- A cross tab is a table where the values for one attribute (say A) form the row headers, the values for another attribute (say B) form the column headers, and each cell is identified by (ai, bj), where ai is a value for A and bj is a value for B.
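The cross-tab definition above can be sketched in a few lines of Python. This is only an illustration: the `cross_tab` helper, the attribute names and the sales rows are hypothetical, and the cell value is assumed to be a sum of a numeric measure.

```python
from collections import defaultdict

def cross_tab(records, a, b, measure):
    """Values of attribute a become row headers, values of b column headers;
    cell (ai, bj) holds the summed measure for that combination."""
    cells = defaultdict(int)
    for rec in records:
        cells[(rec[a], rec[b])] += rec[measure]
    row_headers = sorted({rec[a] for rec in records})
    col_headers = sorted({rec[b] for rec in records})
    return row_headers, col_headers, dict(cells)

# Hypothetical sales data.
sales = [
    {"item": "shirt", "color": "dark", "quantity": 8},
    {"item": "shirt", "color": "pastel", "quantity": 10},
    {"item": "skirt", "color": "dark", "quantity": 20},
    {"item": "shirt", "color": "dark", "quantity": 2},
]

row_hdrs, col_hdrs, cells = cross_tab(sales, "item", "color", "quantity")
print("      ", col_hdrs)
for r in row_hdrs:
    # Combinations that never occur are shown as 0.
    print(r, [cells.get((r, c), 0) for c in col_hdrs])
```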

Q4. What are the different data preprocessing techniques ? ----Data integration, data cleaning, data normalization, data reduction.

Q5. What are the problems with data ? ----Missing attributes and missing attribute values; improper types.

Q6. What is the need for data preprocessing ?

---- Data quality is a key issue in data mining. To increase the accuracy of the mining, we have to perform data preprocessing; otherwise, garbage in => garbage out.

• What is semantic integration? Ans:- A collection of views that gives a group of users a uniform presentation of relevant data from multiple databases is called semantic integration.

what is data integration?

Ans:- Data integration consolidates different sources into one repository, usually a data warehouse (schema reconsolidation). It relies on:
a) using metadata.
b) correlation analysis.

what are the different strategies of data reduction?

Ans:
1) data cube aggregation.
2) attribute subset selection.
3) dimensionality reduction.
4) numerosity reduction.
5) concept hierarchy generation.

what is data cleaning?

Ans:- Real-world data tend to be incomplete, noisy and inconsistent. Data cleaning fills in missing values, smooths out noise and corrects inconsistencies in the data.

which methods are used for data cleaning?

Ans:
1) Look for missing values.
2) Ignore the tuples.
3) Fill in missing values manually.
4) Use a global constant to fill in missing values.
5) Use the most probable value to fill in missing values.

Q: What sub-processes are generally involved in data transformation? Ans: Smoothing, aggregation, generalization, normalization, attribute construction.

Q: Name the different strategies for data reduction. Ans: Data cube aggregation, attribute subset selection, dimensionality reduction, numerosity reduction, discretization and concept hierarchy generation.

Q: What is the use of data reduction? Ans: To obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data.

Q: What is the aim behind data transformation? Ans: To transform or consolidate the data into forms appropriate for mining.

Q: What is meant by smoothing? Ans: Removing noise from the data is called smoothing.

Q: Compare ROLAP and MOLAP.

Q: Name any two operations on a data cube that you have performed in your practical.

Q: What is hybrid OLAP? Give its benefits.

Q: Explain data cleaning.

A: Real-world data tend to be incomplete, noisy and inconsistent. Data cleaning routines attempt to fill in missing values, smooth out noise and correct inconsistencies in the data.

1. Fill in missing values (attribute or class value):
  * Ignore the tuple: usually done when the class label is missing.
  * Use the attribute mean (or majority nominal value) to fill in the missing value.
  * Use the attribute mean (or majority nominal value) for all samples belonging to the same class.
  * Predict the missing value by using a learning algorithm: consider the attribute with the missing value as a dependent (class) variable and run a learning algorithm (usually Bayes or decision tree) to predict the missing value.
2. Identify outliers and smooth out noisy data:
  * Binning
    o Sort the attribute values and partition them into bins (see "Unsupervised discretization" below);
    o Then smooth by bin means, bin median, or bin boundaries.
  * Clustering: group values in clusters and then detect and remove outliers (automatic or manual).
  * Regression: smooth by fitting the data to regression functions.
3. Correct inconsistent data: use domain knowledge or expert decision.
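Two of the cleaning steps listed above, filling missing values with the attribute mean and smoothing by bin means, can be sketched as follows. This is a minimal illustration; the function names and data are hypothetical, missing values are assumed to be marked with `None`, and equal-frequency bins are assumed.

```python
def fill_missing_with_mean(values):
    """Replace None entries by the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins of bin_size,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

print(fill_missing_with_mean([1, None, 3]))   # [1, 2.0, 3]

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Smoothing by bin medians or bin boundaries would follow the same pattern, with only the per-bin replacement value changing.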

Q:explain data transformation

A: In data transformation, the data is transformed or consolidated into forms appropriate for mining. Data transformation involves the following:
1. Normalization:
  * Scaling attribute values to fall within a specified range.
    o Example: to transform V in [Min, Max] to V' in [0,1], apply V'=(V-Min)/(Max-Min)
  * Scaling by using mean and standard deviation (useful when min and max are unknown or when there are outliers): V'=(V-Mean)/StDev
2. Aggregation: moving up in the concept hierarchy on numeric attributes.
3. Generalization: moving up in the concept hierarchy on nominal attributes.
4. Attribute construction: replacing or adding new attributes inferred from existing attributes.
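The two normalization formulas above, V'=(V-Min)/(Max-Min) and V'=(V-Mean)/StDev, can be tried on a small list. This is a sketch under assumptions: the function names and data are made up, the population standard deviation is used, and a column whose values are all equal (which would divide by zero) is not handled.

```python
from statistics import mean, pstdev

def min_max(values):
    """Scale values into [0, 1] with V' = (V - Min) / (Max - Min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Scale values with V' = (V - Mean) / StDev (population StDev assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

data = [10, 20, 30, 40, 50]
print(min_max(data))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score(data))   # symmetric around 0, middle value maps to 0.0
```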

Q:Explain data reduction

A: Data reduction techniques can be applied to obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data. That is, mining on the reduced data set should be more efficient yet produce the same analytical result.

1. Reducing the number of attributes
  * Data cube aggregation: applying roll-up, slice or dice operations.
  * Removing irrelevant attributes: attribute selection (filtering and wrapper methods), searching the attribute space (see Lecture 5: Attribute-oriented analysis).
  * Principal component analysis (numeric attributes only): searching for a lower-dimensional space that can best represent the data.
2. Reducing the number of attribute values
  * Binning (histograms): reducing the number of attribute values by grouping them into intervals (bins).
  * Clustering: grouping values in clusters.
  * Aggregation or generalization
3. Reducing the number of tuples
  * Sampling

Q 1> What is the difference between an OLTP query and an OLAP query?
Ans=> OLTP query: 1. Used to modify data. 2. Requires a fully updated database.
OLAP query: 1. Doesn’t modify data. 2. Doesn’t require a fully updated database.

Q 2> What is OLAP?
Ans=> It is online analytical processing.

Q 3> What is OLTP?
Ans=> It is online transaction processing. OLTP requires that the data are completely up to date.

Q 4> What operations does an OLAP tool support?
Ans=> It supports: 1. Slice operation 2. Dice operation 3. Roll up operation 4. Drill down 5. Visualization operation
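The slice, dice, roll-up and drill-down operations listed above can be illustrated on a tiny in-memory cube. This is a hedged sketch, not how an OLAP server implements them: the cube contents, the `DIMS` tuple and the helper names are hypothetical, and drill-down is shown simply as re-aggregating the detailed data at a finer granularity.

```python
from collections import defaultdict

# Hypothetical cube: (item, color, size) -> quantity sold.
DIMS = ("item", "color", "size")
cube = {
    ("shirt", "dark", "small"): 2,
    ("shirt", "dark", "large"): 6,
    ("shirt", "pastel", "small"): 4,
    ("skirt", "dark", "small"): 5,
    ("skirt", "pastel", "large"): 3,
}

def roll_up(cube, keep):
    """Aggregate away every dimension not listed in keep (coarser granularity)."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, qty in cube.items():
        out[tuple(key[i] for i in idx)] += qty
    return dict(out)

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice(cube, allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

by_item = roll_up(cube, ["item"])
print(by_item)                                  # {('shirt',): 12, ('skirt',): 8}
print(roll_up(cube, ["item", "color"]))         # drill down from by_item: finer granularity
print(slice_cube(cube, "color", "dark"))
print(dice(cube, {"item": {"shirt"}, "size": {"small"}}))
```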

Q 5>what are the different kinds of OLAP tool used? Ans=> ROLAP, MOLAP, HOLAP

Q1: What is noisy data? Ans: Noise is random error or variance in a measured variable, so it is necessary to smooth the data to remove the noise.

Q2: What is data integration? Ans: Data mining often requires data integration, which combines data from multiple sources into a coherent data store.

Q3: How to transform data? Ans: Data are transformed or consolidated into forms appropriate for mining. Methods are: smoothing, aggregation, generalization, normalization, attribute construction.

Q4: What are back end tools? Ans: Data extraction, data cleaning, data transformation, load, refresh.

Q5: What are data cube measures? Ans: A data cube measure is a numerical function that can be evaluated at each point in the data cube space.

Unit 6

1. Which are different techniques of document indexing?
2. Compare data retrieval and information retrieval.
3. What are inverted index and signature file?
4. Note on indexing of documents.
5. Why is an effective index structure important?

6. What is a web crawler?
Ans: Web crawlers are programs that locate and gather information on the Web. They recursively follow hyperlinks present in known documents to find other documents. A crawler retrieves a document and adds the information found in it to a combined index; the document is generally not stored, although some search engines do cache a copy of the document to give clients faster access.

7. Describe a Web search engine.
Ans: Since the number of documents on the Web is very large, it is not possible to crawl the whole Web in a short period of time; in fact, all search engines cover only some portions of the Web, not all of it, and their crawlers may take weeks or months to perform a single crawl of all the pages they cover. There are usually many processes, running on multiple machines, involved in crawling. A database stores a set of links (or sites) to be crawled; it assigns links from this set to each crawler process. New links found during a crawl are added to the database, and may be crawled later if they are not crawled immediately. Pages found during a crawl are also handed over to an indexing system, which may be running on a different machine. Pages have to be refetched (that is, links recrawled) periodically to obtain updated information, and to discard sites that no longer exist, so that the information in the search index is kept reasonably up to date.
The indexing system itself runs on multiple machines in parallel. It is not a good idea to add pages to the same index that is being used for queries, since doing so would require concurrency control on the index, and affect query and update performance. Instead, one copy of the index is used to answer queries while another copy is updated with newly crawled pages. At periodic intervals the copies switch over, with the old one being updated while the new copy is being used for queries.

8. What is a question answering system?
Ans: Question answering systems attempt to provide direct answers to questions posed by users. Systems targeted at information on the Web typically generate one or more keyword queries from a submitted question, execute the keyword queries against Web search engines, and parse the returned documents to find the parts that answer the question.

9. Describe distinct ways a user can find information on the Web.
Ans: 1) Information extraction. 2) Querying structured data. 3) Question answering.

10. What do Web search engines do? Describe in one line.

Ans: Web search engines crawl the Web to find pages, analyze them to compute prestige measures, and index them.

Q1:- WHAT IS A SYNONYM?
ANS1:- Synonyms are words that have the same meaning but different representations.
Q2:- WHAT ARE HOMONYMS?
ANS2:- Homonyms are words that have the same pronunciation but different meanings.
Q3:- WHAT ARE ONTOLOGIES?
ANS3:- Ontologies are hierarchical structures that capture relationships between concepts; they are used to overcome the limitations of keyword-based search.
Q4:- EXPLAIN SYNONYMS WITH THE HELP OF AN EXAMPLE.
ANS4:- Synonyms are collections of words having the same meaning but different representations, e.g. "motorcycle repair" = "motorcycle maintenance".
Q5:- EXPLAIN HOMONYMS WITH THE HELP OF AN EXAMPLE.
ANS5:- Homonyms are collections of words having different meanings but the same pronunciation, e.g. "hair" and "hare".

Q.1 How is relevance ranking calculated using TF?
A. We use the frequency of occurrence of the term in the document (that is, how many times that particular term occurs) as a measure of its relevance. One way of measuring TF(d,t), the term frequency or relevance of document d to term t, is TF(d,t) = log(1 + n(d,t)/n(d)), where n(d,t) is the number of occurrences of term t in document d and n(d) is the number of terms in d.

Q.2 What is the use of an Information Retrieval System?
A. An Information Retrieval System is intended to support people who are actively seeking or searching for information, as in internet searching. Information retrieval typically assumes a static or relatively static database, against which people search.

Q.3 Explain Similarity-based Retrieval Systems.
A. Similarity-based retrieval relies on best match rather than exact match and uses techniques to compute the similarities between the query and the information items. As the user's information needs are also fuzzy, an important characteristic of this class of retrieval techniques is its support for the iterative process of retrieval.

Q.4 What is cosine similarity?
Q.5 What is the use of a similarity-based Retrieval System?

Q.1. What is the difference between Information Retrieval and Data Retrieval?
A. 1. A Data Retrieval System gives an exact match of the search elements, whereas an Information Retrieval System gives partial or best match results.
2. The query language in a Data Retrieval System is artificial, whereas natural language is used in an Information Retrieval System.
3. Complete query specification is required in a Data Retrieval System, whereas partial query specification works in an Information Retrieval System.

Q.2. Explain the components of an Information Retrieval System.
A. The typical components of an Information Retrieval System are: 1. Input 2. Processor 3. Output
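The term-frequency measure quoted earlier in this unit, TF(d,t) = log(1 + n(d,t)/n(d)), can be evaluated directly. The sketch below is illustrative only: the helper name and sample document are made up, a document is assumed to be a plain list of words, and the natural logarithm is assumed because the notes do not state a base.

```python
import math

def term_frequency(document_terms, term):
    """TF(d, t) = log(1 + n(d, t) / n(d)), where n(d, t) counts occurrences
    of the term and n(d) is the number of terms in the document."""
    n_d = len(document_terms)
    n_dt = document_terms.count(term)
    return math.log(1 + n_dt / n_d)

# Hypothetical 8-word document.
doc = "the database stores the data and the index".split()
print(term_frequency(doc, "the"))      # occurs 3 times: highest score
print(term_frequency(doc, "index"))    # occurs once: lower score
print(term_frequency(doc, "crawler"))  # absent: 0.0
```

The logarithm dampens the effect of repetition, so a term that occurs three times does not score three times as high as one that occurs once.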

Q.3. What is relevance? How is it calculated?
A. Relevance can be calculated as the cosine of the angle between the two vectors, i.e. their dot product divided by the product of their lengths (the square roots of the sums of squares of each vector's components). For vectors with non-negative weights this measure varies between 0 and 1.

Q.4. What is TF-IDF (Term Frequency - Inverse Document Frequency)?
A. A measure that combines the frequency of occurrence of a particular term in a particular document with how often that term occurs in the entire collection of interest.

Q.5. How is TF-IDF used? What is the need?
A. If a term occurs frequently in one document but also occurs frequently in every other document in the collection, then it is not a very important word, and the TF-IDF measure reduces the weight placed on it. A common term is considered less important than rare terms. With a base-10 logarithm, if a term occurs in every document then the inverse document frequency is zero; if it occurs in half of the documents, it is about 0.3; and if it occurs in 20 of 10000 documents, it is about 2.7.

Q.6. Illustrate the components of an Information Retrieval System using a diagram.

Q.7. An Information Retrieval System is best match or partial match, whereas a Data Retrieval System is exact match. Expand.
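The inverse document frequency figures and the cosine measure discussed above can be checked numerically. The sketch below assumes IDF = log10(N / df), which is consistent with the values 0 and 0.3 in the answer; under that assumption, 20 of 10000 documents gives log10(500), roughly 2.7. The function names and the example vectors are hypothetical.

```python
import math

def idf(total_docs, docs_with_term):
    """Inverse document frequency, assuming a base-10 logarithm."""
    return math.log10(total_docs / docs_with_term)

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(idf(10000, 10000))            # term in every document: 0.0
print(round(idf(10000, 5000), 1))   # term in half of the documents: 0.3
print(round(idf(10000, 20), 1))     # term in 20 of 10000 documents: 2.7

# Hypothetical term-weight vectors for a query and two documents.
query = [1.0, 2.0, 0.0]
print(cosine_similarity(query, [2.0, 4.0, 0.0]))  # same direction: close to 1
print(cosine_similarity(query, [0.0, 0.0, 3.0]))  # no shared terms: 0.0
```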
