
Topological Data Analysis of Knowledge Networks

Adam Schroeder, Jingyi Guan, Prof. Lori Ziegelmeier (Macalester College),
Prof. Russell Funk (University of Minnesota), Prof. Jason Owen-Smith (University of Michigan)
BCS-2318171
DMS-1854703
Science of Science
• Science of Science involves analyzing large datasets to uncover the mechanisms driving scientific progress, ranging from selecting research topics to understanding career trajectories and progress within a specific field.
• Science can be conceptualized as a knowledge network in which concepts are nodes, and two concepts are connected by an edge (a paper) when they appear in the same paper abstract and are both considered relevant enough to that article. We weight each edge by the standardized year of publication of that article.
• In this study, we use a dataset of article publication information in the field of Applied Mathematics to build knowledge networks from the concepts that appear 1k to 15k times* in the field, in order to study how science evolves over time.
*Please note: The frequency cutoff for concepts is not yet optimized; our proposed optimization algorithm is discussed in the Future Work section.

Hole Opening and Closure Over Time
[Figure: hole opening and closure over time]

Mean Citations by Paper Categories
- To investigate the importance of papers contributing to distinct topological features in applied mathematics, and to discern whether papers involved in these features receive more citations, we categorized all papers in the dataset into the following five categories (as illustrated below):
Building Papers: Papers/edges on the boundary of the hole in the network.
Destroying Papers: Papers/edges not on the boundary of the hole.
Tentpole Papers: Papers/edges where at least one end involves a concept/node not on the boundary of the hole.
Opening Papers: The most recent paper among the building papers.
Closing Papers: The most recent paper among the destroying papers.
- We found that, among all paper categories, the mean citation count is highest for papers that build at least one hole and destroy at least one other hole. More notably, papers involved in topological holes have a much higher mean citation count than those that are not.
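The five paper categories can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the authors' implementation: the `Paper` record, the `categorize` helper, the node names, and the years are all hypothetical, and the building and destroying papers of the hole are taken as given rather than computed from the homology. It also assumes that tentpole papers are the destroying papers with at least one endpoint off the hole's boundary. The toy layout is chosen to agree with the poster's numbered example (2,3,4,5 building; 1,6,7,8 destroying).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    label: int   # hypothetical paper number, as in the numbered example
    u: str       # first concept (node)
    v: str       # second concept (node)
    year: int    # publication year (hypothetical values below)


def categorize(building, destroying):
    """Split the papers around one hole into the five categories."""
    # Concepts that lie on the boundary of the hole.
    boundary_nodes = {n for p in building for n in (p.u, p.v)}
    # Assumption: tentpole papers are destroying papers that touch
    # at least one concept outside the boundary.
    tentpole = [p for p in destroying
                if p.u not in boundary_nodes or p.v not in boundary_nodes]
    return {
        "building": sorted(p.label for p in building),
        "destroying": sorted(p.label for p in destroying),
        "tentpole": sorted(p.label for p in tentpole),
        "opening": max(building, key=lambda p: p.year).label,
        "closing": max(destroying, key=lambda p: p.year).label,
    }


# Toy hole: the 4-cycle a-b-c-d, an off-boundary concept x, and a chord a-c.
building = [Paper(2, "a", "b", 2002), Paper(3, "b", "c", 2003),
            Paper(4, "c", "d", 2004), Paper(5, "d", "a", 2005)]
destroying = [Paper(1, "x", "a", 2001), Paper(6, "x", "b", 2006),
              Paper(7, "x", "c", 2007), Paper(8, "a", "c", 2008)]

categories = categorize(building, destroying)
print(categories)
```

In this toy case the opening paper is the last boundary edge to appear and the closing paper is the last destroying edge, matching the definitions above.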

[Figure: example hole with numbered papers/edges]
- 2,3,4,5: Building papers
- 1,6,7,8: Destroying papers
- 1,6,7: Tentpole papers
- 5: Opening paper
- 8: Closing paper

Data
• article id: the unique identifier for the article
• year: the year of publication for the article
• concept: the concept
• relevance mean: the "relevance" score of the concept for the article; higher scores mean the concept is more relevant in the category (applied mathematics)
• frequency: how many times the concept appears in the field of the article in total
• number of times cited: number of times this paper has been cited

Topological Data Analysis
• Persistent homology provides a systematic way to study cycles in a knowledge network.
• We track cycles through time, characterizing when they first appear (are born) and when they no longer remain (die).
• These cycles can be interpreted as knowledge gaps.

Future Work
A critical task is to set an optimal threshold for concept frequency, which will define the selection of concepts to be included within the knowledge networks. To do this, we aim to identify, with linear programming, the network with high stability and a sufficient number of topological features. Specifically, we
- Constructed multiple knowledge networks based on different concept frequency cutoffs determined by different quantiles of all frequencies
- Conducted persistent homology analysis for each network to produce a space of different persistence diagrams
- Filtered out networks lacking a sufficient number of topological features
- Will select the most stable network, the one with the lowest average Euclidean distance between its persistence diagram's persistence image (PI) and its adjacent diagrams' PIs
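To illustrate how the Data fields feed the Topological Data Analysis step, here is a minimal pure-Python sketch. It is not the OAT pipeline used in the study. It makes several assumptions that the poster does not state: a hypothetical `min_relevance` cutoff, raw publication years instead of standardized ones, and a clique complex in which a triangle is filled as soon as its three edges exist. The function names and the records are invented for illustration. Persistence of 1D holes is computed with the standard boundary-matrix column reduction over Z/2.

```python
from itertools import combinations
import math


def edge_births(records, min_relevance):
    """Map each concept pair to the earliest year they share an article."""
    by_article = {}
    for article_id, year, concept, relevance in records:
        if relevance >= min_relevance:
            by_article.setdefault(article_id, (year, set()))[1].add(concept)
    births = {}
    for year, concepts in by_article.values():
        for pair in combinations(sorted(concepts), 2):
            births[pair] = min(year, births.get(pair, year))
    return births


def h1_persistence(births):
    """Return sorted (birth, death) pairs of 1D holes with birth < death."""
    nodes = sorted({n for edge in births for n in edge})
    simplices = [((n,), -math.inf) for n in nodes]
    simplices += list(births.items())
    # Brute-force triangle search; fine for a sketch, slow for large networks.
    for tri in combinations(nodes, 3):
        faces = list(combinations(tri, 2))
        if all(f in births for f in faces):
            simplices.append((tri, max(births[f] for f in faces)))
    # Faces must precede cofaces: sort by birth time, then by dimension.
    simplices.sort(key=lambda s: (s[1], len(s[0]), s[0]))
    index = {s: i for i, (s, _) in enumerate(simplices)}

    pivots, deaths, cycle_edges = {}, {}, []
    for j, (simplex, time) in enumerate(simplices):
        k = len(simplex)
        col = {index[f] for f in combinations(simplex, k - 1)} if k > 1 else set()
        while col and max(col) in pivots:
            col ^= pivots[max(col)]        # add the earlier column mod 2
        if col:
            pivots[max(col)] = col         # simplex j kills the class born at max(col)
            deaths[max(col)] = time
        elif k == 2:
            cycle_edges.append(j)          # this edge closes a new cycle
    pairs = [(simplices[i][1], deaths.get(i, math.inf)) for i in cycle_edges]
    return sorted(p for p in pairs if p[0] < p[1])


# Hypothetical records: (article id, year, concept, relevance mean).
records = [
    ("A1", 2001, "a", 0.9), ("A1", 2001, "b", 0.8),
    ("A2", 2002, "b", 0.9), ("A2", 2002, "c", 0.7),
    ("A3", 2003, "c", 0.9), ("A3", 2003, "d", 0.9),
    ("A4", 2004, "a", 0.6), ("A4", 2004, "d", 0.8),
    ("A5", 2005, "b", 0.9), ("A5", 2005, "d", 0.1),  # d is below the cutoff
    ("A6", 2006, "a", 0.9), ("A6", 2006, "c", 0.9),
]
births = edge_births(records, min_relevance=0.5)
holes = h1_persistence(births)
print(holes)
```

In this toy example the cycle a-b-c-d is born when its last edge appears in 2004 and dies in 2006, when the a-c paper lets two triangles fill it. The low-relevance record in article A5 is dropped, so it does not close the hole a year earlier.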

Goals
- Identify holes of varying complexity (dimensionality)
- Identify their opening, closing, and duration
- Characterize the papers and concepts that participate in their formation and closure, using OAT*
*OAT (Open Applied Topology) is a computational toolset developed under the ExHACT project supported by the NSF (DMS-1854703). It uses sparse and lazy computational methods to directly analyze the topological structure of high-dimensional data, enabling the study and localization of geometric patterns within datasets.

Visualization of the Closure of a 1D Hole in Applied Mathematics
[Figure: closure of a 1D hole; panel label: "A cycle formed"]

Implications
• The growth and dynamics of scientific fields can be understood in topological terms and analyzed with topological tools like persistent homology.
• Knowledge gaps with greater complexity or scope take longer to form.
• Papers that are involved in the formation or the closure of knowledge gaps may exhibit higher importance in the field.

Survival Analysis of Knowledge Gaps
[Figure: Kaplan-Meier curves]
We conducted a survival analysis of all knowledge gaps (topological holes) in the network up to dimension 4. As illustrated by the Kaplan-Meier curves above, we found that holes of higher dimensions were born later in time, and that in any given year the proportion of holes still alive is higher for higher dimensions.
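The Kaplan-Meier estimate behind the survival analysis of knowledge gaps can be sketched in a few lines of Python. This is a generic product-limit estimator, not the authors' code. The `kaplan_meier` helper and the durations below are hypothetical. A hole that is still open at the end of the observation window is treated as right-censored.

```python
def kaplan_meier(durations, closed):
    """Product-limit estimate: [(t, S(t))] at each time with a closure.

    durations[i] is how long hole i was observed (in years);
    closed[i] is True if the hole closed, False if it was still open
    when observation ended (right-censored).
    """
    event_times = sorted({t for t, c in zip(durations, closed) if c})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        closures = sum(1 for d, c in zip(durations, closed) if c and d == t)
        survival *= 1 - closures / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical lifetimes for five holes of one dimension.
durations = [2, 3, 3, 5, 8]
closed = [True, True, False, True, False]

curve = kaplan_meier(durations, closed)
for t, s in curve:
    print(f"after {t} years: {s:.2f} of holes still open")
```

Running the estimator once per hole dimension would give one curve per dimension, which is the comparison the poster describes.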
