Naoki Saito, at UC Davis since 1997, was awarded Best Author and Best Paper presented to him by the Japan Society for Industrial and Applied Mathematics in connection with two of his recent papers.
In recent years, the advent of new sensors, measurement technologies, and social network infrastructure has provided many research opportunities for recording data of interest at various locations in complex networks, visualizing and analyzing complicated interconnected network structures, and making inferences and diagnostics. Network-based problems are ubiquitous: They appear in biology and medicine (e.g., voltages at dendrites connected to neurons, blood flow rates in a network of blood vessels); computer science (e.g., Internet traffic, email exchanges); electrical engineering (e.g., sensor networks); hydrology and geology (e.g., river flow measurements in a ramified river network); and civil engineering (e.g., traffic flow on a road network), to name just a few. Consequently, there is an explosion of interest and demand to analyze data sampled from irregular grids, graphs, and networks.
What about the mathematical and computational tools for analyzing such datasets? Time-honored harmonic analysis tools such as Fourier and wavelet transforms have been the “crown jewels” for examining regularly-sampled data. These tools have a wide range of applications, e.g. data compression, image analysis, and statistical signal processing. However, these conventional tools cannot directly handle datasets recorded on general graphs and networks: a fundamental difficulty is a lack of a proper notion of frequency on general graphs.
In collaboration with his former Ph.D. student Jeff Irion, Naoki Saito has developed the Hierarchical Graph Laplacian Eigen Transform (HGLET) and the Generalized Haar-Walsh Transform (GHWT). These transforms generate a redundant collection of basis vectors called basis dictionaries, consisting of a large number of orthonormal bases through which one can view an input signal defined on the graph nodes from many different perspectives. The fast best basis algorithm can select an orthonormal basis from these dictionaries that is most suitable (or offers the “best view”) for a given task, such as compression and classification. In the graph setting, HGLET and GHWT are true generalizations of classical hierarchical block discrete cosine transform and Haar-Walsh wavelet packets, respectively. Note that these classical tools are adopted for the JPEG and JPEG 2000 image compression standards. The essential idea behind them is as follows:
1) generate a hierarchical set of subgraphs by recursively bipartitioning a given graph; and
2) compute the expansion coefficients of an input signal on the nodes of each subgraph relative to the eigenvectors of Laplacian matrix for that subgraph (HGLET) or the basis vectors generated by recursive averaging and differencing operations (GHWT). Note that the Laplacian matrix of a graph is an analog of the finite difference approximation of the Laplace operator for the graph. Hence it is natural to use its eigenvectors for analyzing data on the graph: sines and cosines are the central objects in classical Fourier and Harmonic Analysis; they are nothing but the eigenfunctions of the Laplace operator on an interval subject to certain boundary conditions.
One of the important emerging applications of these newly developed tools is analyzing matrix data. For example, ratings of commercial products by their users lead to a matrix in which columns represent products, rows represent users, and matrix entry ai j represent user i’s rating of product j, say, on a 1–5 scale. Another example is a term-document matrix. Here, the rows correspond to words and the columns correspond to documents, and matrix entry ai j is typically the (relative) frequency with which word i appears in document j. In the case that the documents are journal articles from different fields, one would expect the prevalence and absence of certain words to be relatively consistent across documents from within each field. Such matrices are quite different from usual images and photographs. In fact, they are often more like shuffled and permuted images, possessing no spatial smoothness or coherency in general. Yet, those rows and columns are interrelated, and thus the rows of the matrix can tell us about the underlying structure of the columns, and vice versa. But how can we do that? Tools developed by Irion and Saito now come to our rescue. They allow us to discover, learn, and exploit hidden dependency and geometric structure in the matrix data in an innovative way. Here is their strategy:
Step 1: Apply spectral co-clustering to a given matrix in order to recursively bipartition its rows and columns
Step 2: Analyze row vectors of the input matrix using the GHWT dictionaries based on the column partitions and extract the basis that most sparsifies its column representation, i.e. the column best basis
Step 3: Repeat Step 2 with interchanging columns and rows to extract the row best basis
Step 4: Expand the input matrix with respect to the tensor product of the column and row best bases
Step 5: Analyze the expansion coefficients for a variety of tasks, e.g., compression, classification, etc.
In Step 1, the spectral co-clustering method, initially developed by I. Dhillon in 2001, is recursively applied to an input matrix to organize its rows and columns. Given a nonnegative data matrix A, Dhillon’s method treats its rows and columns as the two sets of nodes in a bipartite graph, where matrix entry ai j is the edge weight between the node for row i and the node for column j. Fig. 1a illustrates a small scale example. Using the Fiedler vector, i.e. eigenvector corresponding to the smallest positive eigenvalue of the Laplacian matrix of the resulting bipartite graph, the rows and the columns of A can be partitioned simultaneously. Here is a real example: the term-document matrix consisting of 1042 columns representing documents obtained from the Science News website and 1153 rows representing preselected most relevant words. Each document belongs to one of the eight categories ranging from Anthropology to Physics. Fig. 1b shows that both words and documents are embedded in the same Euclidean space using the spectral co-clustering! At the left side of the boomerang shape cluster, a majority of documents are of Astronomy (blue points) and the related words (e.g. extrasolar) are clustered closely to those documents while the right side of the boomerang mainly consists of Medical Science documents (red points) and the related words (e.g., tamoxifen).
Fig. 2b shows the reordered matrix using the Irion-Saito algorithm; the total number of orthonormal bases searched via their algorithm exceeds 10370. It is reassuring to see that the resulting best basis does in fact sparsify the input matrix nicely. However, the fine scale information may be too much emphasized in the original algorithm, which may be sensitive to “noise.” More robust and useful information is contained at the medium scale coefficients in this matrix. To emphasize such coefficients, one possibility is to weight all the coefficients in the GHWT dictionaries depending on the size of the subgraphs so that the magnitude of the finer coefficients become bigger, which discourages the best basis algorithm from selecting fine scale subgraphs. This new best basis sparsifies A less than before, yet well captures information on intermediate scales as shown below. Fig. 2c shows the partition pattern of the column best basis: the vertical axis indicates the scale information from the top (the coarsest) to the bottom (the finest) whereas the horizontal axis indicates the reordered column indices. In this figure, two coarse scale blocks stand out; one of them is marked with the red circle. Fig. 2d shows the column best basis vectors corresponding to the coefficients marked by the red circle in Fig. 2c, whose support corresponds to 51 documents with 48 belonging to Astronomy. The titles of the remaining three turn out to be the following:
• “Old Glory, New Glory: The Star-Spangled Banner gets some tender loving care” (Anthropology; on the preservation of the Star-Spangled Banner (flag) using the space-age technology)
• “Snouts: A star is born in a very odd way” (Life Sciences; on star-nosed moles)
• “Gravity tugs at the center of a priority battle” (Math/CS; on the priority war on the discovery of gravity between Newton, Halley, and Hooke)
These three articles are picked out along with the Astronomy articles because they contain some astronomical keywords as one can see. Similarly, one can analyze another coarse block in Fig. 2c, which supports 62 documents: 56 of which are in Medical Sciences and the remaining six are non-medical articles that just happen to contain medical terms.These results are based on the examination of the particular best basis vector and the corresponding coefficients at one intermediate scale. Saito thinks that their approach can further advance the state of clustering and classification of documents by linking information at multiple scales. In collaboration with Eugene Shvarts, a current Ph.D. student of his, Saito has been investigating basis vector selection algorithms that are fundamentally different from the above best basis algorithm. He envisions that these tools will be useful for finding “patterns” hidden in unorganized massive datasets of various kinds.