
ASSIGNMENT #1

Report on PolyAnalyst
Capabilities, functionality and data mining

Group Members:
Abdul Basit (18F-BSCS-13)
Syed Hassan Raza Razvi (18F-BSCS-19)
Syed Hussian Ali Shah Gallani (18F-BSCS-37)
Hammad Abbasi (18F-BSCS-40)
Qazi Khalid Amin (18F-BSCS-41)
Furqan Hameed (18F-BSCS-14)

Dr. A. Q. Khan Institute of Computer Sciences and Information Technology

Submitted to:

Madam Sadi Mobeen


Introduction
Megaputer Intelligence is a leading developer of data and text mining software and custom
analytical solutions for various application domains. Its PolyAnalyst product is a data analysis
platform that is widely recognized by industry analysts such as Gartner, Forrester, and Hurwitz
and is used by Fortune 500 companies worldwide. Megaputer products offer a comprehensive,
integrated analytics/BI environment featuring a broad selection of cutting-edge NLP and
predictive modeling engines.

PolyAnalyst uses a client-server architecture. Its main components are outlined below.

PolyAnalyst can read from a variety of databases and statistical packages. It can also read text
from HTML, Word and PDF files. An OLAP engine allows data to be aggregated, or "sliced and
diced", before data mining algorithms are applied. It includes tools such as decision trees,
neural networks, genetic algorithms, fuzzy logic classification, case-based reasoning, text
categorization and more.
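
The idea of aggregating ("slicing and dicing") data before mining can be illustrated with a small
pure-Python sketch. It is not PolyAnalyst's OLAP engine; the sales records, field names, and the
dice and slice_cube helpers are hypothetical, chosen only to show summing a measure over two
dimensions and then slicing on one fixed dimension value.

```python
from collections import defaultdict

# Hypothetical sales records: two dimensions (region, product) and one measure (amount).
records = [
    {"region": "North", "product": "A", "amount": 120.0},
    {"region": "North", "product": "B", "amount": 80.0},
    {"region": "South", "product": "A", "amount": 200.0},
    {"region": "South", "product": "A", "amount": 50.0},
    {"region": "South", "product": "B", "amount": 30.0},
]

def dice(rows, dims, measure):
    """Sum a measure over every combination of values of the given dimensions."""
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        cube[key] += row[measure]
    return dict(cube)

def slice_cube(rows, dim, value):
    """Keep only the rows where one dimension equals a fixed value."""
    return [row for row in rows if row[dim] == value]

cube = dice(records, ["region", "product"], "amount")
south_by_product = dice(slice_cube(records, "region", "South"), ["product"], "amount")
print(cube)
print(south_by_product)
```

A real OLAP engine precomputes and indexes such aggregates so that they can be browsed
interactively; the sketch recomputes them on every call for clarity.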

PolyAnalyst Capabilities and Functionality


PA Dictionary
A PolyAnalyst dictionary is a structured list of words that can also define relationships or other
properties, such as synonyms or spelling variants. Dictionaries add context and semantic
information to natural language processing (NLP) and are very useful for intelligent spell
checking. Typical tasks include spell checking and synonym normalization.
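
As a rough illustration of how a dictionary of known words and synonyms can drive spell
checking, the following Python sketch uses the standard library's difflib.get_close_matches. It
is not PolyAnalyst's implementation; the word list, the synonym table, and the 0.8 similarity
cutoff are all hypothetical.

```python
import difflib

# Hypothetical dictionary (not a real PolyAnalyst dictionary): known words plus synonyms.
VOCABULARY = ["invoice", "customer", "payment", "refund", "delivery"]
SYNONYMS = {"client": "customer", "reimbursement": "refund", "shipment": "delivery"}

def normalize_word(word, cutoff=0.8):
    """Map a word to its canonical dictionary form, if one can be found."""
    w = word.lower()
    if w in SYNONYMS:        # synonym relationship defined in the dictionary
        return SYNONYMS[w]
    if w in VOCABULARY:      # already a known, correctly spelled word
        return w
    # Spell checking: choose the closest known word above the similarity cutoff.
    matches = difflib.get_close_matches(w, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else w   # unknown words are left unchanged

print([normalize_word(t) for t in ["Custmer", "client", "refnd", "warranty"]])
```

Raising the cutoff makes the checker more conservative (fewer, safer corrections); lowering it
catches more misspellings at the risk of wrong substitutions.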

Data Manipulation and Cleansing


Data cleansing is a very important part of the analysis workflow. It is a stage that takes place
before we move on to data analysis, whether that analysis is data-driven or user-driven. In this
section we focus on text data. Data is not always in its best shape, so we process it,
restructuring or correcting parts of it where needed. This step is called data cleansing, and its
goal is to make the data suitable for analysis.

Records can be selected according to multiple criteria. A union, intersection, or complement of
datasets can be created. Rules, automatically discovered by PolyAnalyst or entered by the user,
can be used to produce new fields. Exceptional records can be filtered out. The drill-through
feature allows selecting data points for a new dataset visually from a chart. Data can be split
into n-tile percentage intervals for any numerical variable.
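
The set operations and n-tile splitting described above can be sketched in plain Python. This is
an illustration only: the record IDs, the ages, the two selection criteria, and the ntile helper
are hypothetical, and the rank-based binning rule is an assumption (PolyAnalyst may handle ties
and interval boundaries differently).

```python
# Hypothetical dataset: record ID -> age. Each selection is represented by a set of record IDs.
ages = {1: 23, 2: 35, 3: 47, 4: 52, 5: 61, 6: 29, 7: 44, 8: 38}

young = {rid for rid, age in ages.items() if age < 40}   # selection by criterion 1
sampled = {1, 3, 5, 7}                                   # selection by criterion 2 (hypothetical)

union = young | sampled
intersection = young & sampled
complement = set(ages) - young    # complement relative to the full dataset

def ntile(values, n):
    """Assign each value an n-tile index from 1 to n by rank (equal-count intervals)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    tiles = [0] * len(values)
    for rank, i in enumerate(order):
        tiles[i] = rank * n // len(values) + 1
    return tiles

quartiles = ntile(list(ages.values()), 4)
print(union, intersection, complement, quartiles)
```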

Data Access
Both PolyAnalyst PRO and PolyAnalyst Power can directly access data held in Oracle, DB2,
Informix, Sybase, MS SQL Server, Ingres, or any other ODBC-compliant database. Data and
exploration results can be exchanged with MS Excel 7.0 or 97. New data can be added to the
project when necessary. A customized version of PolyAnalyst PRO or Power comes integrated with
IBM Visual Warehouse or Oracle Express.

Machine Learning
PolyAnalyst uses supervised and unsupervised learning algorithms to run its self-learning
engines. It provides 14 machine learning algorithms, with convenient reporting and output of
results. Of its seven unique exploration engines, the PolyNet Predictor is a newly developed,
innovative tool. It combines the Group Method of Data Handling (GMDH) and neural network
technologies to predict the values of numerical variables. The statistical significance of the
results obtained by each PolyAnalyst engine is rigorously checked.

Visualization
PolyAnalyst has an easy-to-use graphical user interface. Data and exploration results can be
visualized in numerous formats: histograms, line and point plots with zoom and drill-through
capabilities, colored charts for three dimensions, and interactive Rule-Graphs with sliders for
effective presentation of multidimensional relations, allowing the user to “feel” the discovered
relation. In addition, a special Frequencies function provides quick and thorough visualization
of the distribution of categorical, integer, or yes/no variables.
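
What a frequency view of a categorical variable computes can be shown with a short Python sketch
using collections.Counter. It is not the PolyAnalyst Frequencies function; the colors column and
the frequencies helper are hypothetical, and the text bar chart stands in for a real graphical
display.

```python
from collections import Counter

# Hypothetical categorical column.
colors = ["red", "blue", "red", "green", "red", "blue"]

def frequencies(column):
    """Return (value, count, percent) rows sorted by descending count."""
    counts = Counter(column)
    total = len(column)
    return [(value, n, 100.0 * n / total) for value, n in counts.most_common()]

# Print a simple text histogram: one '#' per occurrence.
for value, n, pct in frequencies(colors):
    print(f"{value:<6} {n:>3} {pct:5.1f}% " + "#" * n)
```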

Results Reporting
Discovered relations are readily incorporated into existing decision support (DSS) or executive
information (EIS) systems. The Print Form feature generates advanced output that mixes text,
graphics, and system reports. A PolyAnalyst project file contains all the results of the
performed data exploration. Created datasets and summary statistics can be exported to MS
Excel. For hands-on evaluation, the user can master PolyAnalyst by following carefully
documented examples step by step.

Other Capabilities

• Unlocks value hidden in massive volumes of data and text
• Solves many typical text analysis tasks:
  • Categorization
  • Clustering
  • Taxonomy building
  • Entity extraction
  • Natural language search
• Multi-dimensional reporting
• Visual link analysis
• Enterprise-level scalability
• Visual creation of analysis scenarios
• Interactive visualization and drill-down
• Statistical analysis
• Association discovery
• Multivariate analysis
• Forecasting
• Regression analysis
• Time series
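
Association discovery is usually measured with support and confidence. The following Python
sketch finds single-item rules X -> Y over a handful of hypothetical market-basket transactions;
it illustrates the general technique, not PolyAnalyst's association engine, and the thresholds
min_support=0.4 and min_confidence=0.6 are arbitrary assumptions.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def support(itemset, data):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in data if itemset <= t) / len(data)

def pair_rules(data, min_support=0.4, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) between single items meeting both thresholds."""
    items = sorted(set().union(*data))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, data)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, data)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

rules = pair_rules(transactions)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Production algorithms such as Apriori extend the same idea to larger itemsets and prune
candidates whose subsets are already infrequent.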

PolyAnalyst Limitations
Machine learning could be applied more extensively to text analysis. Connecting to other
software, such as SharePoint and Smartsheet, should be easier.

Reporting
PolyAnalyst enables the data analyst to create custom reports that deliver key results of the
analysis to business users across the organization in a clean, consistent and
easy-to-comprehend format. Interactive reports include a mixture of graphs, tables, numbers,
text and links to other PolyAnalyst objects. Reports can be scheduled for re-execution at a given
time to provide business users with results based on the most up-to-date data. Static snapshots
of reports can be exported to PDF, HTML and RTF formats.

Technical

1. Data loading and integration


a) Data sources: ODBC, OLEDB, XML, CSV, MS Excel, Web, File System, FTP
b) Document formats: PDF, ASCII, HTML, MS Word, MS RTF, RSS Feeds
c) Data types: integer, numerical, Boolean, string, etc.
d) Character formats: ASCII, Latin-1, Double-byte, UTF
e) Web data sources
2. Data integration
a) Data joins on sets of keys
b) Data merging
c) Referencing data sets in other PolyAnalyst™ projects
d) Exporting results of the analysis to external RDBMS
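
A join "on sets of keys" matches records whose values agree on several key fields at once. The
following Python sketch shows an inner hash join on a composite key; the customers and orders
tables, their field names, and the inner_join helper are hypothetical and do not represent
PolyAnalyst's join node.

```python
# Hypothetical tables sharing the composite key (region, customer_id).
customers = [
    {"region": "EU", "customer_id": 1, "name": "Acme"},
    {"region": "US", "customer_id": 1, "name": "Globex"},
]
orders = [
    {"region": "EU", "customer_id": 1, "total": 40},
    {"region": "US", "customer_id": 1, "total": 15},
    {"region": "US", "customer_id": 2, "total": 99},   # no matching customer
]

def inner_join(left, right, keys):
    """Hash join: index the right table by its key tuple, then probe with each left row."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})   # merge the two records' fields
    return joined

result = inner_join(orders, customers, ["region", "customer_id"])
print(result)
```

Using both fields as the key keeps the EU and US customers with the same customer_id apart;
joining on customer_id alone would wrongly pair them.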

3. Data cleansing, manipulation and exploratory analysis


a) Attribute name, type and value mapping
b) Data aggregation
c) Data consolidation
d) Expanding and collapsing transactions
e) Data transformations
f) Sampling and partitioning
g) Derivation of new attributes
h) OLAP
i) Statistics
j) Link analysis
k) Geospatial mapping

4. Text mining
a) Text cleansing and normalization
b) User-driven analysis
c) Data-driven analysis
d) Languages
e) Customization: support for domain-specific dictionaries
f) Customization: domain-specific taxonomies for
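
Text cleansing and normalization typically means folding case, removing accents and punctuation,
tokenizing, and dropping stop words. The following Python sketch shows those steps with the
standard re and unicodedata modules. It is a simplified, English-only illustration rather than
PolyAnalyst's text pipeline, and the stop-word list is hypothetical.

```python
import re
import unicodedata

STOPWORDS = {"the", "a", "an", "and", "of", "is"}   # hypothetical stop-word list

def normalize_text(text):
    """Strip accents, lowercase, tokenize on ASCII letters/digits, and drop stop words."""
    # Decompose accented characters (e.g. 'é' -> 'e' + combining accent), then drop the accents.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Keep runs of lowercase ASCII letters and digits; punctuation acts as a separator.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(normalize_text("The Café delivery is LATE, and the refund... pending!"))
```

Because the token pattern only accepts ASCII letters and digits, text in non-Latin scripts would
be discarded; multilingual support needs language-aware tokenizers and dictionaries.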

6. Scalability
a) Client/Server implementation
b) Client-server communications over TCP/IP
c) Utilization of hard disk instead of RAM
d) Scalable implementation of algorithms
e) 64-bit implementation available
f) Analytic scenario development prior to actual data loading
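
Using the hard disk instead of RAM means processing data in pieces rather than loading it all
into memory at once. The following Python sketch computes a column mean from a CSV file in
fixed-size chunks. It only illustrates the general out-of-core idea; the file contents, the
amount column, and the streaming_mean helper are hypothetical and not part of PolyAnalyst.

```python
import csv
import os
import tempfile
from itertools import islice

def streaming_mean(path, column, chunk_size=1000):
    """Compute a column mean by reading the CSV in fixed-size chunks,
    so only one chunk of rows is held in memory at a time."""
    total, count = 0.0, 0
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            for row in chunk:
                total += float(row[column])
                count += 1
    return total / count if count else float("nan")

# Usage: write a small hypothetical file to disk, then aggregate it two rows at a time.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["amount"])
    writer.writerows([[10], [20], [30], [40], [50]])
    path = tmp.name

result = streaming_mean(path, "amount", chunk_size=2)
print(result)
os.remove(path)
```

Memory use depends on chunk_size rather than on the file size, which is what allows datasets
larger than RAM to be processed.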

7. Usability
a) Interactive drag-and-drop experience throughout the system
b) Visual development of reusable data analysis scenarios
c) Tight integration of analytical and reporting applications
d) Visual creation of nice-looking interactive reports for business users
e) Publishing reports to popular document formats for better collaboration
f) Group nodes
g) Subject areas

8. Security
a) User name and password based authentication
b) Support for LDAP and MS Windows based authentication
c) Fully encrypted client-server communications
d) User activity logging
e) Compliance with HIPAA regulations

9. PolyAnalyst Hadoop integration


a) Working with data in HDFS
b) Data does not enter PolyAnalyst
c) Analysis is performed on Hadoop cluster
d) PolyAnalyst can export data to Hadoop
e) Implemented text indexing, taxonomy categorization, machine learning, and more
f) Blazing-fast analysis
