Literature Survey.xlsx

Sr.No | Paper Name | Authors | Publication | Year | Abstract | Dataset-algorithm | Advantages | Disadvantages | Result | Future-Works

Sr.No: 1
Paper Name: Mining Discriminative Patterns to Predict Health Status for Cardiopulmonary Patients
Authors: Qian Cheng, Jingbo Shang, Joshua Juen, Jiawei Han, Bruce Schatz
Publication: © 2016 ACM, BCB
Year: 2016
Abstract: In this study, we focus on the validity of predictive models for monitoring health status, including both model performance and model interpretation.
Dataset-algorithm: Linear-SVM, RBF-SVM, Decision-Tree, DPClass
Advantages: It provides a promising solution for monitoring health status by simply carrying a smartphone, and also demonstrates how demographics influence predictive models of cardiopulmonary disease.
Result: The RBF-SVM model yields the highest accuracy, while the DPClass model provides better interpretation of the model mechanisms.
Future-Works: Passive monitoring of patient status is a promising application of this study; it requires additional techniques to take motion collection out of the clinic and into the home. In addition, a precise prediction of health status is possible.

Sr.No: 2
Paper Name: DPClass: An Effective but Concise Discriminative Pattern-Based Classification Framework
Authors: Jingbo Shang, Wenzhu Tong, Jian Peng, Jiawei Han
Publication: 2016 SIAM International Conference on Data Mining
Year: 2016
Abstract: DPClass is a natural and effective way to resolve pattern-based classification by adopting discriminative patterns, which are the prefix paths from root to nodes in tree-based models (e.g., random forest), and by further compressing the number of discriminative patterns through selecting the most effective pattern combinations that fit into a generalized linear model.
Dataset-algorithm: Top-k Pattern Selection
Advantages: The discriminative pattern-based classification framework (DPClass) can perform as well as previous state-of-the-art algorithms, provides great interpretability by using only a very limited number of discriminative patterns, and predicts new data extremely fast.
Disadvantages: In real-world applications, many people favor generalized linear models over complex models, including trees and neural networks, as long as the accuracy is sufficient in practice, because they are mature, flexible, more efficient when making predictions, and easier to understand thanks to their probabilistic interpretation.
Result: DPClass addresses the general classification problem and provides interpretability by incorporating a limited number of discriminative patterns.
Future-Works: DPClass can be extended to a uniform machine learning framework, DPLearn, which supports multi-class classification, regression, and ranking along the same discriminative pattern selection direction. Another possible direction is to apply DPClass to labeled textual and sequential data, with the aim of finding interesting patterns (e.g., language patterns).
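
The prefix-path idea described in the DPClass entry can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the toy tree, the feature names (`age`, `steps`), and the helpers `prefix_paths`, `matches`, and `encode` are all hypothetical. Each root-to-node prefix path is treated as a conjunction of conditions, and an instance is encoded as a binary vector with one entry per pattern. The later step of selecting a small set of patterns for a generalized linear model is not shown.

```python
# Toy decision tree as nested dicts (hypothetical; a real forest would supply many trees).
# Internal node: {"feature", "threshold", "left", "right"}; leaf: {"leaf": label}.
TREE = {
    "feature": "age", "threshold": 60,
    "left": {"leaf": 0},
    "right": {
        "feature": "steps", "threshold": 3000,
        "left": {"leaf": 1},
        "right": {"leaf": 0},
    },
}

def prefix_paths(node, prefix=()):
    """Yield every non-empty root-to-node prefix path as a tuple of conditions.

    A condition is (feature, op, threshold), with op "<=" for the left
    branch and ">" for the right branch.
    """
    if prefix:
        yield prefix
    if "leaf" in node:
        return
    f, t = node["feature"], node["threshold"]
    yield from prefix_paths(node["left"], prefix + ((f, "<=", t),))
    yield from prefix_paths(node["right"], prefix + ((f, ">", t),))

def matches(instance, pattern):
    """Return True if the instance satisfies every condition in the pattern."""
    for f, op, t in pattern:
        v = instance[f]
        if (op == "<=" and not v <= t) or (op == ">" and not v > t):
            return False
    return True

def encode(instance, patterns):
    """Encode an instance as a 0/1 vector, one entry per candidate pattern."""
    return [1 if matches(instance, p) else 0 for p in patterns]

patterns = list(prefix_paths(TREE))
features = encode({"age": 72, "steps": 1500}, patterns)
```

The binary vector in `features` is the kind of input a linear model could then be fitted on; which patterns to keep is the selection problem the entry refers to.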

Sr.No: 3
Paper Name: Mining Electronic Health Records
Authors: Naren Ramakrishnan, David A. Hanauer, Benjamin J. Keller
Publication: IEEE
Year: 2010
Abstract: A major goal of the new initiatives is to encourage the development of a digital infrastructure for providers and patients so that care can be delivered more effectively and efficiently.
Advantages: Common features among disparate patients, whether diagnoses, procedures, or even lab data, can be discovered using association analyses.
Disadvantages: Because electronic health data standards have not yet been fully developed or agreed upon, we may end up with an infrastructure comprised of too many noninterchangeable and proprietary systems, resulting in a “tower of Babel” of such data.
Future-Works: High-quality, coded clinical information will surely allow data mining to prove its worth.

Sr.No: 4
Paper Name: A Practical Guide to Support Vector Classification
Authors: Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin
Publication: 5200 citations
Year: Initial version: 2003; last updated: May 19, 2016
Abstract: The support vector machine (SVM) is a popular classification technique. However, beginners who are not familiar with SVM often get unsatisfactory results since they miss some easy but significant steps. In this guide, we propose a simple procedure which usually gives reasonable results.
Dataset-algorithm: RBF Kernel
Advantages: For users not familiar with SVM, who often get unsatisfactory results at first, this guide gives a “cookbook” approach which usually gives reasonable results.
Disadvantages: This guide is not for SVM researchers, nor does it guarantee the highest accuracy. Also, it does not intend to solve challenging or difficult problems.
Result: Probable increase in accuracy for the provided guidelines.

Sr.No: 5
Paper Name: Generalization and Decision Tree Induction: Efficient Classification in Data Mining
Authors: Micheline Kamber, Lara Winstone, Wan Gong, Shan Cheng, Jiawei Han
Publication: Journal of Signal Processing Systems
Year: 2012
Abstract: This paper addresses the efficiency and scalability issues by proposing a data classification method which integrates attribute-oriented induction, relevance analysis, and the induction of decision trees.
Dataset-algorithm: MedGen and MedGenAdjust algorithms
Advantages: 1) Generalization by attribute-oriented induction, to compress the training data. This includes storage of the generalized data in a multidimensional data cube to allow fast accessing. 2) Relevance analysis, to remove irrelevant data attributes, thereby further compacting the training data. 3) Multi-level mining, which combines the induction of decision trees with knowledge in concept hierarchies.
Disadvantages: Efficiency and scalability become issues of concern when these algorithms are applied to the mining of very large, real-world databases. Most decision tree algorithms have the restriction that the training tuples should reside in main memory. In data mining applications, very large training sets of millions of examples are common. Hence, this restriction limits the scalability of such algorithms, where the decision tree construction can become inefficient due to swapping of the training samples in and out of main and cache memories.
Result: It provides an integration which leads to efficient, high-quality, multiple-level classification of large amounts of data, the relaxation of the requirement of perfect training sets, and the elegant handling of continuous and noisy data.
Future-Works: Our future work involves further refinement of the level-adjustment procedure of MedGenAdjust.
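
The generalization step named in this entry (attribute-oriented induction to compress the training data) can be illustrated with a small sketch. It is a simplified illustration, not the MedGen algorithm itself: the concept hierarchy, the attribute names, and the `generalize` helper are hypothetical, and the hierarchy has only one level. Raw values are replaced by higher-level concepts, and tuples that become identical are merged with a count.

```python
from collections import Counter

# Hypothetical one-level concept hierarchy: raw value -> higher-level concept.
HIERARCHY = {
    "city": {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA"},
    "degree": {"BSc": "undergraduate", "BA": "undergraduate",
               "MSc": "graduate", "PhD": "graduate"},
}

def generalize(rows, hierarchy):
    """Climb each attribute one level and merge identical tuples, keeping a count.

    Attributes without a hierarchy (here, the class label) are left unchanged.
    """
    counts = Counter()
    for row in rows:
        generalized = tuple(
            (attr, hierarchy.get(attr, {}).get(value, value))
            for attr, value in sorted(row.items())
        )
        counts[generalized] += 1
    return counts

rows = [
    {"city": "Vancouver", "degree": "BSc", "label": "junior"},
    {"city": "Toronto", "degree": "BA", "label": "junior"},
    {"city": "Seattle", "degree": "PhD", "label": "senior"},
    {"city": "Vancouver", "degree": "MSc", "label": "senior"},
]
compressed = generalize(rows, HIERARCHY)
```

Four raw tuples collapse into three generalized tuples here; on large tables with deep hierarchies the reduction is what makes later tree induction cheaper.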

Sr.No: 6
Paper Name: A Parallel Random Forest Algorithm for Big Data in a Spark Cloud Computing Environment
Authors: Jianguo Chen, Kenli Li, Zhuo Tang, Kashif Bilal, Shui Yu, Chuliang Weng, and Keqin Li
Publication: IEEE
Year: 2016
Abstract: This paper addresses the efficiency and scalability issues by proposing a data classification method which integrates attribute-oriented induction, relevance analysis, and the induction of decision trees.
Dataset-algorithm: From the perspective of task-parallel optimization, a dual parallel approach is carried out in the training process of RF, and a task Directed Acyclic Graph (DAG) is created according to the parallel training process of PRF and the dependence of the Resilient Distributed Datasets (RDD) objects. Then, different task schedulers are invoked for the tasks in the DAG. Moreover, to improve the algorithm’s accuracy for large, high-dimensional, and noisy data, a dimension-reduction approach in the training process and a weighted voting approach in the prediction process are performed prior to parallelization.
Advantages: Taking advantage of the data-parallel optimization, the training dataset is reused and the volume of data is reduced significantly.
Result: Benefitting from the task-parallel optimization, the data transmission cost is effectively reduced and the performance of the algorithm is clearly improved. Experimental results indicate the superiority and notable strengths of PRF over the other algorithms in terms of classification accuracy, performance, and scalability.
Future-Works: Focus on the incremental parallel random forest algorithm for data streams in a cloud environment, and on improving the data allocation and task scheduling mechanism for the algorithm in a distributed and parallel environment.
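
The weighted voting approach mentioned for the prediction process can be sketched as below. This is a minimal sketch under assumptions, not the PRF implementation: the `weighted_vote` helper is hypothetical, and the per-tree weights are assumed to be some accuracy estimate for each tree. Each tree's vote counts in proportion to its weight instead of counting once.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the class with the largest total weight among the trees' votes."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per tree prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Hypothetical votes from four trees and their assumed accuracy-based weights.
tree_predictions = ["sick", "healthy", "healthy", "sick"]
tree_weights = [0.9, 0.6, 0.55, 0.8]
winner = weighted_vote(tree_predictions, tree_weights)
```

With equal weights this reduces to ordinary majority voting; with unequal weights, a few reliable trees can outvote a larger number of weak ones.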

Sr.No: 7
Paper Name: Support-Vector Networks
Authors: Corinna Cortes, Vladimir Vapnik
Publication: Kluwer Academic Publishers, 1995
Year: 1995
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space.
Dataset-algorithm: In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine.
Advantages: The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Result: This result is extended to non-separable training data.
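
The idea in this entry, mapping inputs non-linearly into a feature space and constructing a linear decision surface there, can be shown on the XOR problem, which is not linearly separable in its original two dimensions. This is an illustration only: the map `phi`, the weights `W`, and the bias `B` are hand-picked assumptions, not learned by a support-vector network, and a 3-dimensional feature space stands in for the very high-dimensional one.

```python
def phi(x1, x2):
    """Hypothetical non-linear map from 2-D input to a 3-D feature space."""
    return (x1, x2, x1 * x2)

# Hand-picked weights and bias of a linear decision surface in feature space
# (chosen for illustration, not obtained by training).
W = (1.0, 1.0, -2.0)
B = -0.5

def decide(x1, x2):
    """Classify by the sign of the linear function w . phi(x) + b."""
    z = phi(x1, x2)
    score = sum(w * f for w, f in zip(W, z)) + B
    return 1 if score > 0 else 0

xor_points = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
predictions = {p: decide(*p) for p in xor_points}
```

The product term `x1 * x2` is what makes the four points separable by a plane; no line in the original input space achieves the same split.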

Sr.No: 8
Paper Name: Mining of predictive patterns in Electronic health records data
Authors: Iyad Batal and Milos Hauskrecht
Publication: PhD thesis, University of Pittsburgh
Year: 2012
Abstract: The objective of this report is to briefly review data mining and machine learning methods that aim to extract predictive patterns characterizing patient subgroups and their predictive differences in electronic health records (EHRs).
Dataset-algorithm: Temporal pattern mining
Advantages: The emergence of large-scale datasets in health care that record large amounts of information about the patients, their diseases, and treatments provides us with an opportunity to better understand the dynamics of the disease, the efficacy of treatments, and various influences affecting the well-being of a patient.
Disadvantages: First, the most critical open problem is to find ways of reducing the space in which the predictive patterns are searched. Efficient heuristics for restricting clinical variables, or the values the variables may take, when building and searching more complex patterns are necessary and need to be designed. Second, the patterns/features based on temporal abstractions may be used either for building better classification models or for the purpose of knowledge discovery when subgroups.
Result: The benefit of the patterns for classification can be judged by the analysis of models built using these features.
Future-Works: Its utility for knowledge discovery needs to be further investigated and would require carefully designed evaluation studies with human evaluators.
