Visual Analytics using Multidimensional Projections

EuroVis 2013 Workshop (June 19, 2013; Leipzig, Germany)

Overview

The workshop has taken place. We now solicit contributions to a special issue of Neurocomputing. Authors who did not submit to the workshop are encouraged to submit, too! The deadline for special-issue submissions is December 1, 2013.

Dimensionality reduction is an active area in machine learning. New techniques have been proposed for more than 50 years, including principal component analysis, classical scaling, Isomap, probabilistic latent trait models, stochastic neighbor embedding, and neighborhood retrieval visualization. These techniques facilitate the visualization of high-dimensional data by representing data instances as points in a two-dimensional space, in such a way that similar instances are modeled by nearby points and dissimilar instances are modeled by distant points.
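As a minimal sketch of this idea, the following Python example projects high-dimensional points to two dimensions with principal component analysis, the simplest of the techniques listed above. The function name pca_project and the two synthetic clusters are assumptions made for illustration only; NumPy is assumed to be available.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto their top principal components."""
    # Center the data so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
# Two hypothetical groups of similar instances in a 10-dimensional space.
cluster_a = rng.normal(loc=0.0, scale=0.5, size=(50, 10))
cluster_b = rng.normal(loc=5.0, scale=0.5, size=(50, 10))
X = np.vstack([cluster_a, cluster_b])

# Each row of Y is the 2D point that represents one data instance.
Y = pca_project(X)
```

In this toy setting, instances from the same cluster land close together in Y, while the two clusters end up far apart. Real data are rarely this well behaved, which is where the nonlinear techniques come in.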

Although many papers are published every year on these so-called “embedding” techniques, with the aim of improving visual representations of high-dimensional data, it appears that the techniques have not gained popularity in the EuroVis community, owing to the inherent complexity of their interpretation.

At the intersection of information visualization, machine learning, and graph drawing, this workshop will focus on issues that embedding techniques should address to bridge the gap with the information visualization community. Below is a (non-exhaustive) list of topics that we aim to address during the workshop:

Stability: Nonlinear embedding techniques are more effective at preserving similarities than linear ones. However, nonlinearities generate local optima, as a result of which different initializations lead to different representations of the same data. The differences between these embeddings create confusion for the analyst, who is unable to grasp what the different visualizations have in common. How can we design efficient and stable nonlinear embeddings?

Embedding of dynamic data: Embedding techniques usually project all the data at once. When new data arrive, how can we embed them without modifying the current embedding too much?

Multiple methods: Each embedding algorithm necessarily comes with its own set of built-in assumptions, and knowledge of these assumptions is often helpful in making sense of the visual output. How can we design black-box visualization methods that demand less understanding of the underlying assumptions from the analyst?

Evaluation and subjectivity: Visual interpretation is inherently subjective. How can we help analysts to verify whether an eye-catching pattern is real/essential or whether it just happens to be an artefact?

Inference and interactions: Nonlinear embedding techniques produce point clouds in which the axes have no meaning and pairwise distances are approximations that may contain many artefacts. What kinds of analytical tasks can be performed with such embeddings? How can we better convey the meaning of the embeddings to analysts?

Feedback: The human eye is excellent at visual analysis, and can identify regularities and anomalous data even without having to define an algorithm. How can we make use of this ability to enhance the predictive performance of machine learning and embedding techniques?

Input data: Currently, the input data in embedding techniques typically comprise high-dimensional feature vectors or pairwise distances between objects. However, this is not always the kind of data that analysts encounter in practice. How can embeddings be constructed based on partial similarity rankings, associations or co-occurrences of objects, heterogeneous data, data with missing values, relations between objects, structured objects, etc.?

Optimizing embeddings for visual analysis: Nonlinear embeddings are found by optimizing mathematical goodness-of-fit measures. Instead of using off-the-shelf embedding methods, can the measures and methods be designed so that the optimized embeddings are well suited to carrying out concrete low-level or high-level analysis tasks on the visualization?
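To make the stability topic above more concrete: two embeddings of the same data can look very different on screen while differing only by a rotation, reflection, or shift, none of which changes pairwise distances. One possible check, sketched below, is to align the two embeddings with an orthogonal Procrustes fit before comparing them. This is an illustration under stated assumptions and not a method proposed by the workshop; the function name procrustes_align and the synthetic embeddings are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def procrustes_align(A, B):
    """Rotate/reflect B onto A after centering both.

    Returns the aligned copy of B and the root-mean-square distance
    between corresponding points after alignment.
    """
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    # Orthogonal Procrustes: the R minimizing ||B0 @ R - A0|| is U @ Vt,
    # where U, Vt come from the SVD of B0.T @ A0.
    U, _, Vt = np.linalg.svd(B0.T @ A0)
    R = U @ Vt
    B_aligned = B0 @ R
    residual = np.sqrt(((A0 - B_aligned) ** 2).sum(axis=1).mean())
    return B_aligned, residual

rng = np.random.default_rng(1)
embedding_a = rng.normal(size=(40, 2))  # hypothetical result of a first run

# A second "run" that is the same layout, only rotated, mirrored and shifted.
theta = 1.0
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
mirror = np.diag([-1.0, 1.0])
embedding_b = embedding_a @ rotation @ mirror + np.array([3.0, -2.0])

# A third "run" whose layout is unrelated to the first.
embedding_c = rng.normal(size=(40, 2))

_, residual_same = procrustes_align(embedding_a, embedding_b)
_, residual_diff = procrustes_align(embedding_a, embedding_c)
```

A residual close to zero suggests that the two runs show the same structure in a different orientation, whereas a large residual suggests that the layouts genuinely disagree. The alignment does not account for differences in scale, and it does not by itself say which of the disagreeing layouts is more faithful to the data.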

Sponsors

CITEC