TWO MARKS QUESTIONS AND ANSWERS
Unit I
• It is used to determine the patterns and relationships in sample data. Data mining tasks that belong to the descriptive model:
• Clustering
• Summarization
• Association rules
• Sequence discovery
12. Describe the method of generating frequent item sets without candidate generation.
Frequent-pattern growth (or FP-growth) adopts a divide-and-conquer strategy.
Steps:
• Compress the database representing frequent items into a frequent-pattern tree, or FP-tree.
• Divide the compressed database into a set of conditional databases.
• Mine each conditional database separately.
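The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a tuned implementation: the names Node, build_tree and fp_growth, the (items, count) transaction format, and the sample baskets are all made up for this example.

```python
from collections import defaultdict


class Node:
    """One node of the FP-tree: an item, its count, and links up and down."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Step 1: compress (items, count) pairs into an FP-tree.

    Returns a header table (item -> nodes holding that item) and the
    support count of every frequent item.
    """
    counts = defaultdict(int)
    for items, n in transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = Node(None, None)
    header = defaultdict(list)
    for items, n in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Steps 2 and 3: build a conditional database per item, mine it recursively."""
    header, frequent = build_tree(transactions, min_support)
    patterns = {}
    for item, support in frequent.items():
        patterns[tuple(sorted(suffix + (item,)))] = support
        # Conditional database for `item`: the prefix paths that lead to it.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        patterns.update(fp_growth(conditional, min_support, suffix + (item,)))
    return patterns


# Hypothetical market baskets, each seen once.
baskets = [["bread", "milk"], ["bread", "butter", "milk"],
           ["butter", "milk"], ["bread", "butter"]]
patterns = fp_growth([(b, 1) for b in baskets], min_support=2)
```

With a minimum support of 2, the sample baskets yield the three single items (support 3 each) and the three pairs (support 2 each); the triple appears in only one basket and is not reported. No candidate item sets are generated and tested along the way, which is the point of the method.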
1. Define clustering.
Clustering is the process of grouping physical or conceptual data objects into clusters.
22. What are the reasons for not using the linear regression model to estimate the output data?
There are many reasons. One is that the data do not fit a linear model. It is also possible that the data do represent a linear model, but the linear model generated is poor because noise or outliers exist in the data.
Noise is erroneous data, and outliers are data values that are exceptions to the usual and expected data.
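The effect of a single outlier on a fitted line can be seen in a small sketch. It assumes ordinary least squares as the fitting method; the function name fit_line and the data values are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]        # follows y = 2x exactly
with_outlier = [2, 4, 6, 8, 40]  # same data, last value is an outlier

clean_fit = fit_line(xs, clean)           # (2.0, 0.0)
skewed_fit = fit_line(xs, with_outlier)   # (8.0, -12.0)
```

Four of the five points still lie on y = 2x, yet the one outlier pulls the fitted slope from 2 to 8, so the generated linear model is poor even though the underlying data are linear.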
23. What are the two approaches used by regression to perform classification?
Regression can be used to perform classification using the following approaches:
1. Division: The data are divided into regions based on class.
2. Prediction: Formulas are generated to predict the output class value.
24. What do you mean by logistic regression?
Instead of fitting the data to a straight line, logistic regression uses a logistic curve. The formula for the univariate logistic curve is
$$P = \frac{e^{(C_0 + C_1 X_1)}}{1 + e^{(C_0 + C_1 X_1)}}$$
The logistic curve gives a value between 0 and 1, so it can be interpreted as the probability of class membership.
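A minimal Python sketch of the univariate logistic curve follows. The coefficient values, the 0.5 decision threshold, and the names logistic and predict_class are assumptions for illustration; in practice the coefficients would be estimated from training data.

```python
import math


def logistic(x1, c0, c1):
    """Univariate logistic curve: e^(c0 + c1*x1) / (1 + e^(c0 + c1*x1))."""
    z = c0 + c1 * x1
    if z >= 0:
        # Algebraically equal form that avoids overflow for large positive z.
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def predict_class(x1, c0, c1, threshold=0.5):
    """Assign class 1 when the estimated membership probability reaches the threshold."""
    return 1 if logistic(x1, c0, c1) >= threshold else 0


# Hypothetical coefficients, chosen only to show the shape of the curve.
C0, C1 = -3.0, 1.5
probabilities = [logistic(x, C0, C1) for x in range(0, 5)]
```

With these coefficients the curve crosses 0.5 at x1 = 2, so predict_class returns 0 below that point and 1 from it onward.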
3. Define OLTP.
If an on-line operational database system is used for efficient retrieval, efficient storage, and management of large amounts of data, then the system is said to perform on-line transaction processing (OLTP).
4. Define OLAP.
Data warehouse systems serve users (or knowledge workers) in the role of data analysis and decision-making. Such systems can organize and present data in various formats. These systems are known as on-line analytical processing (OLAP) systems.
11. Define dimension table.
A dimension table is used for describing a dimension.
(e.g.) A dimension table for item may contain the attributes item_name, brand, and type.
18. Point out the major difference between the star schema and the snowflake schema.
The dimension tables of the snowflake schema model may be kept in normalized form to reduce redundancies. Such a table is easy to maintain and saves storage space.
19. Which is more popular in data warehouse design, the star schema model or the snowflake schema model?
The star schema model, because the snowflake structure can reduce the effectiveness of browsing, since more joins will be needed to execute a query.
20. Define concept hierarchy.
A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level concepts.
21. Define total order.
If the attributes of a dimension form a concept hierarchy such as "street < city < province_or_state < country", then the hierarchy is said to be a total order.
Fig: Total order for location (top to bottom: Country, Province or state, City, Street)
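The location hierarchy can be represented as a chain of mappings from each low-level value to its higher-level concept. The sketch below assumes a simple dictionary representation; the names LEVELS, PARENT and generalize, and the sample place names, are illustrative only.

```python
# Levels of the location dimension, from lowest to highest.
LEVELS = ["street", "city", "province_or_state", "country"]

# Hypothetical mapping from each value to its parent one level up.
PARENT = {
    "Main Street": "Vancouver",
    "Vancouver": "British Columbia",
    "British Columbia": "Canada",
}


def generalize(value, from_level, to_level):
    """Map a value at from_level to its ancestor at to_level."""
    steps = LEVELS.index(to_level) - LEVELS.index(from_level)
    if steps < 0:
        raise ValueError("to_level must not be below from_level")
    for _ in range(steps):
        value = PARENT[value]
    return value


country = generalize("Main Street", "street", "country")  # "Canada"
```

Because the levels form a total order, every value has exactly one path upward, so a single list of levels is enough. A partial order such as the time lattice would need more than one parent mapping per level.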
22. Define partial order.
If the attributes of a dimension form a lattice such as "day < {month < quarter; week} < year", then the hierarchy is said to be a partial order.
23. Define schema hierarchy.
A concept hierarchy that is a total or partial order among attributes in a database schema is called a schema hierarchy.
24. List out the OLAP operations in the multidimensional data model.
• Roll-up
• Drill-down
• Slice and dice
• Pivot (or rotate)
25. What is the roll-up operation?
The roll-up operation, also called the drill-up operation, performs aggregation on a data cube either by climbing up a concept hierarchy for a dimension or by dimension reduction.
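Both forms of roll-up can be sketched on a tiny cube held as a Python dictionary. The cube layout (cells keyed by city and quarter), the sales figures, and the function names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sales cube: (city, quarter) -> units sold.
sales = {
    ("Vancouver", "Q1"): 100,
    ("Toronto", "Q1"): 150,
    ("Chicago", "Q1"): 200,
    ("Vancouver", "Q2"): 120,
    ("Chicago", "Q2"): 80,
}

# One step of the location concept hierarchy: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def roll_up_location(cube):
    """Roll up by climbing the location hierarchy from city to country."""
    result = defaultdict(int)
    for (city, quarter), units in cube.items():
        result[(CITY_TO_COUNTRY[city], quarter)] += units
    return dict(result)


def roll_up_drop_time(cube):
    """Roll up by dimension reduction: aggregate the time dimension away."""
    result = defaultdict(int)
    for (city, _quarter), units in cube.items():
        result[city] += units
    return dict(result)


by_country = roll_up_location(sales)
by_city = roll_up_drop_time(sales)
```

Here by_country holds one cell per (country, quarter), for example 250 units for Canada in Q1, while by_city keeps the cities but no longer has a time dimension.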
26. What is the drill-down operation?
Drill-down is the reverse of the roll-up operation. It navigates from less detailed data to more detailed data. Drill-down can be performed by stepping down a concept hierarchy for a dimension.
27. What is the slice operation?
The slice operation performs a selection on one dimension of the cube, resulting in a subcube.
28. What is the dice operation?
The dice operation defines a subcube by performing a selection on two or more dimensions.
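Slice and dice can be illustrated on a small three-dimensional cube. As a sketch only, the cube is assumed to be a dictionary keyed by (city, quarter, item); the data and the names slice_cube and dice_cube are hypothetical.

```python
DIMENSIONS = ("city", "quarter", "item")

# Hypothetical cube: (city, quarter, item) -> units sold.
cube = {
    ("Vancouver", "Q1", "phone"): 100,
    ("Vancouver", "Q2", "phone"): 120,
    ("Toronto", "Q1", "laptop"): 150,
    ("Chicago", "Q1", "phone"): 200,
    ("Chicago", "Q2", "laptop"): 80,
}


def slice_cube(cube, dimension, value):
    """Slice: select a single value on one dimension."""
    pos = DIMENSIONS.index(dimension)
    return {key: units for key, units in cube.items() if key[pos] == value}


def dice_cube(cube, **allowed):
    """Dice: select sets of allowed values on two or more dimensions."""
    checks = [(DIMENSIONS.index(dim), set(values)) for dim, values in allowed.items()]
    return {key: units for key, units in cube.items()
            if all(key[pos] in values for pos, values in checks)}


q1_only = slice_cube(cube, "quarter", "Q1")
phones_west = dice_cube(cube, city=["Vancouver", "Chicago"], item=["phone"])
```

The slice keeps the three Q1 cells; the dice keeps the three cells whose city is Vancouver or Chicago and whose item is phone.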
35. Define ROLAP.
The ROLAP model is an extended relational DBMS that maps operations on multidimensional data to standard relational operations.
36. Define MOLAP.
The MOLAP model is a special-purpose server that directly implements multidimensional data and operations.
37. Define HOLAP.
The hybrid OLAP approach combines ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP. (i.e.) A HOLAP server may allow large volumes of detail data to be stored in a relational database, while aggregations are kept in a separate MOLAP store.
42. Define indexing.
Indexing is a technique used for efficient data retrieval, that is, for accessing data faster. When a table grows in volume, the indexes also increase in size, requiring more storage.
43. What are the types of indexing?
• B-tree indexing
• Bitmap indexing
• Join indexing
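Of the three types, bitmap indexing is the easiest to show in a few lines. The sketch below assumes a table held as a list of dictionaries and stores each bitmap as a Python integer, with bit i set when row i has the indexed value; the table contents and function names are made up for illustration.

```python
def build_bitmap_index(rows, column):
    """Return {value: bitmap}; bit i of the bitmap is 1 when rows[i][column] == value."""
    index = {}
    for position, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << position)
    return index


def matching_rows(bitmap):
    """List the row positions whose bit is set in the bitmap."""
    return [pos for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]


# Hypothetical fact rows.
rows = [
    {"item": "phone", "city": "Vancouver"},
    {"item": "laptop", "city": "Toronto"},
    {"item": "phone", "city": "Toronto"},
    {"item": "phone", "city": "Vancouver"},
]

item_index = build_bitmap_index(rows, "item")
city_index = build_bitmap_index(rows, "city")

# item = phone AND city = Vancouver becomes a single bitwise AND.
phone_in_vancouver = matching_rows(item_index["phone"] & city_index["Vancouver"])
```

Combining conditions becomes a bitwise operation on the bitmaps instead of a scan of the table, which is why bitmap indexes suit columns with few distinct values.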
44. Define metadata.
Metadata in a data warehouse are data about data, (i.e.) metadata are the data that define warehouse objects. Metadata are created for the data names and definitions of the given warehouse.
45. Define VLDB.
VLDB stands for Very Large Database. If the size of a database is greater than 100 GB, then the database is said to be a very large database.
UNIT – V
6. What are the areas in which data warehouses are used at present and in future?
The potential subject areas in which data warehouses may be developed at present and also in future are:
1. Census data:
The Registrar General and Census Commissioner of India decennially compiles information on all individuals, villages, population groups, etc. This information is wide-ranging, such as the individual slip, a compilation of information on individual households, of which a database of a 5% sample is maintained for analysis. A data warehouse can be built from this database, upon which OLAP techniques can be applied. Data mining can also be performed for analysis and knowledge discovery.
2. Prices of essential commodities:
The Ministry of Food and Civil Supplies, Government of India, compiles daily data for about 300 observation centers in the entire country on the prices of essential commodities such as rice, edible oil, etc. A data warehouse can be built for this data, and OLAP techniques can be applied for its analysis.
7. What are the other areas for data warehousing and data mining?
• Agriculture
• Rural development
• Health
• Planning
• Education
• Commerce and Trade
8. Specify some of the sectors in which data warehousing and data mining are used.
• Tourism
• Program implementation
• Revenue
• Economic affairs
• Audit and accounts
18. Define DMQL.
DMQL stands for Data Mining Query Language.
It specifies clauses and syntax for performing different types of data mining tasks, for example data classification, data clustering, and mining association rules. It also uses SQL-like syntax to mine databases.