From Wikipedia, the free encyclopedia
[Figure: A data visualization of Wikipedia as part of the World Wide Web, demonstrating hyperlinks]

Data visualization is the study of the visual representation of data, meaning "information which has been abstracted in some schematic form, including attributes or variables for the units of information". According to Friedman (2008) the "main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn't mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key-aspects in a more intuitive way. Yet designers often fail to achieve a balance between design and function, creating gorgeous data visualizations which fail to serve their main purpose - to communicate information".

Data visualization is closely related to information graphics, information visualization, scientific visualization and statistical graphics. In the new millennium data visualization has become an active area of research, teaching and development. According to Post et al. (2002), it has united the fields of scientific and information visualization.
Contents
1 Data visualization scope
2 Related fields
  2.1 Data acquisition
  2.2 Data analysis
  2.3 Data governance
  2.4 Data management
  2.5 Data mining
3 See also
4 References
5 Further reading
6 External links

Data visualization scope

There are different approaches to the scope of data visualization. One common focus is on information presentation, as presented by Friedman (2008). Along these lines, Friendly (2008) presumes two main parts of data visualization: statistical graphics and thematic cartography. In the same vein, the article "Data Visualization: Modern Approaches" (2007) gives an overview of seven subjects of data visualization:
- Mindmaps
- Displaying news
- Displaying data
- Displaying connections
- Displaying websites
- Articles & resources
- Tools and services

All these subjects are closely related to graphic design and information representation.

On the other hand, from a computer science perspective, Frits H. Post (2002) categorized the field into a number of sub-fields:
- Visualization algorithms and techniques
- Volume visualization
- Information visualization
- Multiresolution methods
- Modelling techniques
- Interaction techniques and architectures

Related fields

Data acquisition

Data acquisition is the sampling of the real world to generate data that can be manipulated by a computer. Sometimes abbreviated DAQ or DAS, data acquisition typically involves acquisition of signals and waveforms and processing the signals to obtain desired information. The components of data acquisition systems include appropriate sensors that convert any measurement parameter to an electrical signal, which is acquired by data acquisition hardware.
Data analysis

Data analysis is the process of looking at and summarizing data with the intent to extract useful information and develop conclusions. Data analysis is closely related to data mining, but data mining tends to focus on larger data sets, with less emphasis on making inference, and often uses data that was originally collected for a different purpose. In statistical applications, some people divide data analysis into descriptive statistics, exploratory data analysis and confirmatory data analysis, where EDA focuses on discovering new features in the data, and CDA on confirming or falsifying existing hypotheses.

Types of data analysis are:
- Exploratory data analysis (EDA): an approach to analyzing data for the purpose of formulating hypotheses worth testing, complementing the tools of conventional statistics for testing hypotheses. It was so named by John Tukey.
- Qualitative data analysis (QDA) or qualitative research: the analysis of non-numerical data, for example words, photographs, observations, etc.

Data governance

Data governance encompasses the people, processes and technology required to create a consistent, enterprise view of an organisation's data in order to:
- Increase consistency & confidence in decision making
- Decrease the risk of regulatory fines
- Improve data security
- Maximize the income generation potential of data
- Designate accountability for information quality

Data management

Data management comprises all the academic disciplines related to managing data as a valuable resource. The official definition provided by DAMA is that "Data Resource Management is the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise." This definition is fairly broad and encompasses a number of professions which may not have direct technical contact with lower-level aspects of data management, such as relational database management.

Data mining

Data mining is the process of sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations and financial analysts, but is increasingly being used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods.
It has been described as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" and "the science of extracting useful information from large data sets or databases." In relation to enterprise resource planning, according to Monk (2006), data mining is "the statistical and logical analysis of large sets of transaction data, looking for patterns that can aid decision making".

OVERVIEW OF DATA VISUALIZATION
Matthew Ward, WPI CS Department

Definitions
- Visualization is the graphical presentation of information, with the goal of providing the viewer with a qualitative understanding of the information contents.
- Information may be data, processes, relations, or concepts.
- Graphical presentation may entail manipulation of graphical entities (points, lines, shapes, images, text) and attributes (color, size, position, shape).
- Understanding may involve detection, measurement, and comparison, and is enhanced via interactive techniques and providing the information from multiple views and with multiple techniques.

Characteristics of Data
- Numeric, symbolic (or mix)
- Scalar, vector, or complex structure
- Various units
- Discrete or continuous
- Spatial, quantity, category, temporal, relational, structural
- Accurate or approximate
- Dense or sparse
- Ordered or non-ordered
- Disjoint or overlapping
- Binary, enumerated, multilevel
- Independent or dependent
- Multidimensional
- Single or multiple sets
- May have similarity or distance metric
- May have intuitive graphical representation (e.g. temperature with color)
- Has semantics which may be crucial in graphical consideration
What is the dimension of data?

Assume a function with a domain and range. If for every x and y we have temperature t and pressure p, the data can be described in several ways:
- f(x, y) -> (t, p)
- f1(x, y) -> t, f2(x, y) -> p
- f3(x, y, t) -> 0 or 1, f4(x, y, p) -> 0 or 1
- f5(x, y, t, p) -> 0 or 1

The key is that the mapping must go to a single value (or vector). A mapping such as f(x, t) -> 0 or more values of elements with position x and temp t must be reduced to a single value, therefore losing information (e.g. hidden surfaces in projection). This is OK for statistics (e.g. histogram).

Graphical entities and attributes
- Entity: point, line, polyline, glyph, surface, solid, image, text
- Attribute: color/intensity, location, style, size, relative position/motion

What do we see and how well do we see it?
- Different viewers perceive different graphical/spatial/color attributes in different degrees
- Context varies our sensitivity
- According to one researcher (Cleveland), in increasing inaccuracy:
  1. Position along a common scale
  2. Position along identical, non-aligned scales
  3. Length
  4. Angle/slope
  5. Area
  6. Volume
  7. Hue/saturation/intensity (informally derived)
- Weber's law: detection is proportional to percent change, not scale
- Stevens' law: perceived scale is proportional to a power of the actual scale, e.g. power is .9 - 1.1 for length, .6 - .9 for area, .5 - .8 for volume

What makes a good visualization?
- Effective: the viewer gets it (ease of interpretation)
- Accurate: sufficient for correct quantitative evaluation. Lie factor = size of visual effect / size of data effect
- Efficient: minimize chart-junk; show data, maximize data-ink ratio, erase non-data-ink, erase redundant data-ink
- Aesthetics: must not offend viewer's senses (e.g. moire patterns)
- Adaptable: can adjust to serve multiple needs

Mapping data to graphics
- Examine cardinality of dimension with detectable variations in graphics (see above for evaluation under human perception)
- Use scaling and offset to fit in range
- Use derived values (residuals, logs) to emphasize changes
- Use projections, other combinations, to compress information, get statistics
- Use random jiggling to separate overlaps
- Use multiple views to handle hidden relations, high dimensions
- Use effective grids, keys and labels to aid understanding

Interacting with the data
- Dynamically adjust mapping
- Tour data by varying views
- Labeling to get original data
- Deleting to eliminate clutter
- Brushing/Highlighting to see correspondence in multiple views
- Zooming to focus attention
- Panning to explore neighborhoods

Common Techniques
- Charts: bar or pie
- Graphs: good for structure, relationships
- Plots: 1- to n-dimensional
- Maps: one of most effective
- Images: use color/intensity instead of distance (surfaces)
- 3-D surfaces and solids
- isosurfaces/slices
- translucency
- stereopsis
- animation
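A few of the rules in these notes reduce to one-line formulas, and a small Python sketch may make them more concrete. It covers Stevens' law, the lie factor, scaling with offset, and random jiggling. The function names, the exponent 0.7 (picked from inside the .6 - .9 range quoted for area), and the fixed random seed are all assumptions made for illustration; they do not come from the notes.

```python
import random


def stevens_perceived(actual, exponent):
    """Stevens' law: perceived scale is a power of the actual scale."""
    return actual ** exponent


def lie_factor(visual_effect, data_effect):
    """Lie factor = size of visual effect / size of data effect."""
    return visual_effect / data_effect


def scale_to_range(values, lo, hi):
    """Scaling and offset: map values linearly onto the range [lo, hi]."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # avoid division by zero for constant data
    return [lo + (v - vmin) * (hi - lo) / span for v in values]


def jitter(values, amount, seed=0):
    """Random jiggling: shift each value by at most +/- amount."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return [v + rng.uniform(-amount, amount) for v in values]


if __name__ == "__main__":
    # Doubling an area is perceived as less than a doubling (exponent < 1).
    print(stevens_perceived(2.0, 0.7))
    # Data grows by 50% but the drawing grows by 150%: lie factor of 3.
    print(lie_factor(1.5, 0.5))
    # Fit raw values into a 0..100 pixel range.
    print(scale_to_range([0, 5, 10], 0, 100))
    # Separate three overlapping points.
    print(jitter([1.0, 1.0, 1.0], 0.1))
```

A lie factor close to 1 means the graphic changes in proportion to the data. An exponent below 1 in Stevens' law is one reason area and volume encodings rank below position and length in Cleveland's ordering.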
TEMPORAL CLUSTERING

Temporal data clustering provides underpinning techniques for discovering the intrinsic structure and condensing information over temporal data. In this paper, we present a temporal data clustering framework via a weighted clustering ensemble of multiple partitions produced by initial clustering analysis on different temporal data representations. In our approach, we propose a weighted consensus function guided by clustering validation criteria to reconcile initial partitions to candidate consensus partitions from different perspectives and then introduce an agreement function to further reconcile those candidate consensus partitions to a final partition. As a result, the proposed weighted clustering ensemble algorithm provides an effective enabling technique for the joint use of different representations, which cuts the information loss in a single representation and exploits various information sources underlying temporal data. In addition, our approach tends to capture the intrinsic structure of a data set, e.g., the number of clusters. Our approach has been evaluated with benchmark time series, motion trajectory and time-series data stream clustering tasks. Simulation results demonstrate that our approach yields favorable results for a variety of temporal data clustering tasks. As our weighted cluster ensemble algorithm can combine any input partitions to generate a clustering ensemble, we also investigate its limitations by formal analysis and empirical studies.

DENDROGRAMS

A dendrogram (from Greek dendron "tree", -gramma "drawing") is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. Dendrograms are often used in computational biology to illustrate the clustering of genes or samples.

For a clustering example, suppose this data is to be clustered using Euclidean distance as the distance metric.

[Figure: Raw data]
The hierarchical clustering dendrogram would be as such:

[Figure: Traditional representation]

Here the top row of nodes represents data, the remaining nodes represent the clusters to which the data belong, and the arrows represent the distance.

http://docs.google.com/viewer?a=v&q=cache:hZKwMcIHBvAJ:iainstitute.org/documents/research/results/IA.comptencies.graphs.pdf+DENDROGRAM+DATA+STRUCTURE&hl=en&gl=in&pid=bl&srcid=ADGEESgyP3lksuKddQ2Hk7KfG0tyBc2rNFLrY9ajB3idJpj1k3pe7tXsTJpyTqwSmC_N6rQI-JnyFkwlZN9HWATETA6Qj-r6fttmgaZpCGlaKT3vYFjE75vKQLzpnT9XONi2Law8w07&sig=AHIEtbT8xLK_pz7kKbPnpm5GHuSnEGU0Yw (check this link for a dendrogram data structure example)

http://arts.anu.edu.au/bullda/dendrograms.html (check this link)

Dendrograms are often used for displaying relationships among clusters. A dendrogram shows the multidimensional distances between objects in a tree-like structure. Objects which are closest to each other in the multidimensional data space are connected by a horizontal line, forming a cluster which can be regarded as a "new" object. The new cluster and the remaining original data are again searched for the closest pair, and so on. The distance of the particular pair of objects (or clusters) is reflected in the height of the horizontal line.
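The merge procedure described above (find the closest pair, join it into a "new" object, repeat) can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The four sample points are made up. The distance between two clusters is taken as the smallest Euclidean distance between their members (single linkage), which is one common choice that the text does not specify. The function name `single_linkage` is hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)


def single_linkage(points):
    """Return the merges as (cluster_a, cluster_b, height), in merge order.

    Clusters are tuples of point indices. The height of a merge is the
    distance at which the two clusters are joined, i.e. the height of the
    horizontal line in the dendrogram.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Search all pairs of current clusters for the closest pair.
        best = None
        for a, b in combinations(clusters, 2):
            d = min(dist(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((a, b, d))
        # The merged pair becomes a "new" object for the next round.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


if __name__ == "__main__":
    # Hypothetical sample data: two tight pairs, far apart from each other.
    sample = [(0, 0), (0, 1), (5, 0), (5, 3)]
    for a, b, height in single_linkage(sample):
        print(a, "+", b, "at height", height)
```

For this sample the two close pairs merge first, at small heights, and the final merge joining them happens at a much larger height. A dendrogram draws exactly that as a tall horizontal line.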
"Cluster Analysis" is the generic term for multivariate methods which attempt to find structures ("clusters") in the data. At right you see a plot of about 150 data points from three different kinds of flowers (50 each), which clearly show two clusters. Cluster analysis can help to find such clusters automatically. Basically, cluster analysis will give answers to one of the following three questions:
- How many classes (clusters) can be observed in a data set?
- Which objects belong to which classes?
- How consistent are the classes?

In general, cluster analysis methods can be grouped into several categories:
- hierarchical methods (agglomerative and divisive clustering), e.g. dendrograms, minimal spanning trees
- non-hierarchical methods, e.g. density estimators by potential functions
- display methods, e.g. principal component plots, or non-linear mapping plots

These methods are mostly based on calculations of the distance of the observations in multidimensional data space. The results of a cluster analysis are often displayed as dendrograms which show the multidimensional relationships as a two dimensional line plot.
Dendrograms are heavily dependent upon the measure used to calculate the distances between the objects.

http://www.b-eye-network.com/view/6300 (check this link for temporal data)

What are Temporal Databases?
- Non-Temporal Databases
- Temporal Databases
- Different Forms of Temporal Databases

Non-Temporal Databases

Commercial database management systems (DBMS) such as Oracle, Sybase, Informix and O2 allow the storage of huge amounts of data. This data is usually considered to be valid now. Past or future data is not stored. Past data refers to data which was stored in the database at an earlier time instant and which might have been modified or deleted in the meantime. Past data usually is overwritten with new (updated) data. Future data refers to data which is considered to be valid at a future time instant (but not now).

A DBMS stores the data in a well-defined format. A relational DBMS, for example, stores data in tables (also called relations). Thus, a relational database actually contains a set of tables. Each table contains rows (tuples) and columns (attributes). A row contains data about a specific entity, for example, an employee. Each column specifies a certain property of these entities, for example, the employee's name, salary etc. The following table stores data about employees:

EmpID  Name    Department  Salary
10     John    Sales       12000
12     George  Research    10500
13     Ringo   Sales       15500

Object-oriented DBMS store data about entities in objects. So each employee is actually an object. The type of an object specifies the properties the object has. An employee object thus has properties such as a name, a salary etc. Sets of objects of the same type are called collections. Thus, in an object-oriented DBMS, a database contains a set of collections.

Temporal Databases

Temporal data stored in a temporal database is different from the data stored in a non-temporal database in that a time period attached to the data expresses when it was valid or stored in the database. As mentioned above, conventional databases consider the data stored in them to be valid at time instant now; they do not keep track of past or future database states. By attaching a time period to the data, it becomes possible to store different database states.

A first step towards a temporal database thus is to timestamp the data. This allows the distinction of different database states. One approach is that a temporal database may timestamp entities with time periods. Another approach is the timestamping of the property values of the entities. In the relational data model, tuples are timestamped, whereas in object-oriented data models, objects and/or attribute values may be timestamped.

What time period do we store in these timestamps? As we mentioned already, there are mainly two different notions of time which are relevant for temporal databases. One is called the valid time, the other one is the transaction time. Valid time denotes the time period during which a fact is true with respect to the real world. Transaction time is the time period during which a fact is stored in the database. Note that these two time periods do not have to be the same for a single fact. Imagine that we come up with a temporal database storing data about the 18th century. The valid time of these facts is somewhere between 1700 and 1799, whereas the transaction time starts when we insert the facts into the database, for example, January 21, 1998.

Assume we would like to store data about our employees with respect to the real world. Then, the following table could result:
EmpID  Name    Department  Salary  ValidTimeStart  ValidTimeEnd
10     John    Research    11000   1985            1990
10     John    Sales       11000   1990            1993
10     John    Sales       12000   1993            INF
11     Paul    Research    10000   1988            1995
12     George  Research    10500   1991            INF
13     Ringo   Sales       15500   1988            INF

The above valid-time table stores the history of the employees with respect to the real world. The attributes ValidTimeStart and ValidTimeEnd actually represent a time interval which is closed at its lower and open at its upper bound. Thus, we see that during the time period [1985 - 1990), employee John was working in the research department, having a salary of 11000. Then he changed to the sales department, still earning 11000. In 1993, he got a salary raise to 12000. The upper bound INF denotes that the tuple is valid until further notice. Note that it is now possible to store information about past states. We see that Paul was employed from 1988 until 1995. In the corresponding non-temporal table, this information was (physically) deleted when Paul left the company.

Different Forms of Temporal Databases

The two different notions of time, valid time and transaction time, allow the distinction of different forms of temporal databases. A historical database stores data with respect to valid time, a rollback database stores data with respect to transaction time. A bitemporal database stores data with respect to both valid time and transaction time.

As we mentioned above, commercial DBMS are said to store only a single state of the real world, usually the most recent state. Such databases usually are called snapshot databases. A snapshot database in the context of valid time and transaction time is depicted in the following picture:

[Picture: a snapshot database in the context of valid time and transaction time]

On the other hand, a bitemporal DBMS such as TimeDB stores the history of data with respect to both valid time and transaction time. Note that the history of when data was stored in the database (transaction time) is limited to past and present database states, since it is managed by the system directly, which does not know anything about future states.

A table in the bitemporal relational DBMS TimeDB may either be a snapshot table (storing only current data), a valid-time table (storing when the data is valid with respect to the real world), a transaction-time table (storing when the data was recorded in the database) or a bitemporal table (storing both valid time and transaction time). An extended version of SQL allows one to specify which kind of table is needed when the table is created. Existing tables may also be altered (schema versioning). Additionally, it supports temporal queries, temporal modification statements and temporal constraints.

The states stored in a bitemporal database are sketched in the picture below. Of course, a temporal DBMS such as TimeDB does not store each database state separately as depicted in the picture below. It stores valid time and/or transaction time for each tuple, as described above.

[Picture: the states stored in a bitemporal database]
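To illustrate how timestamped tuples make past states queryable, the valid-time employee table from the example can be written as plain Python data and filtered by year. This is a minimal sketch, not TimeDB's extended SQL. The names `EMPLOYEES`, `valid_at` and `history` are hypothetical, `INF` is modelled as floating-point infinity, and only valid time is covered (transaction time is left out).

```python
INF = float("inf")  # stands in for the open-ended upper bound "INF"

# (EmpID, Name, Department, Salary, ValidTimeStart, ValidTimeEnd)
EMPLOYEES = [
    (10, "John", "Research", 11000, 1985, 1990),
    (10, "John", "Sales", 11000, 1990, 1993),
    (10, "John", "Sales", 12000, 1993, INF),
    (11, "Paul", "Research", 10000, 1988, 1995),
    (12, "George", "Research", 10500, 1991, INF),
    (13, "Ringo", "Sales", 15500, 1988, INF),
]


def valid_at(rows, year):
    """Rows whose interval [start, end) contains the given year.

    The interval is closed at its lower bound and open at its upper bound,
    so a row ending in 1990 is no longer valid in 1990.
    """
    return [row for row in rows if row[4] <= year < row[5]]


def history(rows, emp_id):
    """All rows for one employee, ordered by ValidTimeStart."""
    return sorted((row for row in rows if row[0] == emp_id),
                  key=lambda row: row[4])


if __name__ == "__main__":
    # Reconstruct the state of the real world in 1990: John is already in
    # Sales, Paul and Ringo are employed, George has not started yet.
    for row in valid_at(EMPLOYEES, 1990):
        print(row)
    # John's full history: Research, then Sales, then a raise.
    for row in history(EMPLOYEES, 10):
        print(row)
```

Calling `valid_at` with the current year returns what a snapshot table would hold. Earlier years reproduce states that a non-temporal database would have overwritten, such as Paul's employment before 1995.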