Guan Jingyi AATRN-CIMAT 2023
Data
• article id — the unique identifier for the article
• year — the year of publication for the article
• concept — the concept
• relevance mean — the “relevance” score of the concept for the article; higher scores mean more relevant in category (for applied mathematics)
• frequency — how many times the concept appears in the field of the article in total
• number of times cited — number of times this paper has been cited
Topological Data Analysis
• Persistent homology provides a systematic way to study cycles in a knowledge network
• We track cycles through time, characterizing when they first appear (are born) and when they no longer remain (die)
• These cycles can be inferred as knowledge gaps.
[Figure legend: roles of the numbered papers]
- 2,3,4,5: Building papers
- 1,6,7,8: Destroying papers
- 1,6,7: Tentpole papers
- 5: Opening paper
- 8: Closing paper
Future Work
A critical task is to set an optimal threshold for concept frequency, which will define the selection of concepts to be included within the knowledge networks. To do this, we aim to identify the network with high stability and a good amount of topological features with linear programming. Specifically, we
- Constructed multiple knowledge networks based on different concept frequency cutoffs determined by different quantiles of all frequencies
- Conducted persistent homology analysis for each network to produce a space of different persistence diagrams
- Filtered out networks lacking sufficient amount of topological features
- Will select the most stable network which has the lowest average Euclidean distance between its persistence diagram's persistence image (PI) and its adjacent diagrams’ PI
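The final selection step listed under Future Work can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: "adjacent" is taken to mean neighbouring in the order of frequency cutoffs among the networks that survive filtering, the persistence image is a simple persistence-weighted Gaussian raster, and the function names, grid size, and `sigma` are hypothetical. The linear programming step mentioned in the text is not covered.

```python
import math

def persistence_image(diagram, grid=20, lo=0.0, hi=1.0, sigma=0.1):
    """Rasterize a diagram [(birth, death), ...] onto a grid x grid image in
    (birth, persistence) coordinates; each point contributes a Gaussian bump
    weighted by its persistence. Returns the image as a flat list."""
    step = (hi - lo) / grid
    centers = [lo + (k + 0.5) * step for k in range(grid)]
    image = []
    for p in centers:            # persistence axis
        for b in centers:        # birth axis
            value = 0.0
            for birth, death in diagram:
                pers = death - birth
                d2 = (b - birth) ** 2 + (p - pers) ** 2
                value += pers * math.exp(-d2 / (2 * sigma ** 2))
            image.append(value)
    return image

def euclidean(u, v):
    """Euclidean distance between two flat images of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def most_stable(diagrams, min_features=1):
    """diagrams: one persistence diagram per network, ordered by frequency cutoff.
    Drops diagrams with fewer than min_features points, then returns
    (index, score) of the diagram whose image has the lowest mean distance
    to the images of its neighbours among the remaining diagrams."""
    kept = [i for i, d in enumerate(diagrams) if len(d) >= min_features]
    images = {i: persistence_image(diagrams[i]) for i in kept}
    best_index, best_score = None, math.inf
    for pos, i in enumerate(kept):
        neighbours = [kept[q] for q in (pos - 1, pos + 1) if 0 <= q < len(kept)]
        if not neighbours:
            continue
        score = sum(euclidean(images[i], images[j]) for j in neighbours) / len(neighbours)
        if score < best_score:
            best_index, best_score = i, score
    return best_index, best_score

# Hypothetical diagrams for four cutoffs (values are made up for illustration).
example = [[(0.1, 0.9)], [(0.1, 0.5)], [(0.1, 0.5)], [(0.1, 0.6)]]
index, score = most_stable(example)
```

Filtering before measuring adjacency keeps sparse networks from influencing the stability score; whether the authors measure adjacency before or after filtering is not stated on the poster.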
Goals
- Identify holes of varying complexity (dimensionality)
- Identify their opening, closing, and duration
- Characterize the papers and concepts that participate in the formation and closure of them with OAT*
*OAT (Open Applied Topology) is a computational toolset developed under the ExHACT project supported by the NSF (DMS-1854703), leveraging sparse and lazy computational methods to directly analyze high-dimensional data's topological structure, enabling the study and localization of geometric patterns within datasets.
[Figure: A cycle formed]
Survival Analysis of Knowledge Gaps
We conducted a survival analysis among all knowledge gaps (topological holes) in the network up to dimension 4. As illustrated by the Kaplan-Meier curves above, we found that holes of higher dimensions were born later in time, and at any given year, the proportion of the holes that are still alive is higher for higher dimensions.
Implications
• The growth and dynamics of scientific fields can be understood in topological terms and analyzed with topological tools like persistent homology.
• Knowledge gaps with greater complexity or scope take longer to form.
• Papers that are involved in the formation or the closure of knowledge gaps may exhibit higher importance in the field.
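For readers unfamiliar with the Kaplan-Meier curves mentioned in the survival analysis, the estimator can be sketched as below. This is a generic illustration, not the authors' code: it assumes each hole is described by a duration (years between birth and death, or until the end of the observation window) and a flag saying whether its death was observed. The function name and the sample numbers are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function.

    durations: time each hole was followed (death time, or censoring time).
    observed:  1 if the hole was seen to close, 0 if it was still open
               when observation ended (right-censored).
    Returns [(t, S(t)), ...] at each time with at least one observed death.
    """
    events = sorted(zip(durations, observed))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        deaths = removed = 0
        # Group all holes that leave the risk set at the same time t.
        while i < len(events) and events[i][0] == t:
            deaths += events[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical durations per homology dimension (made-up numbers).
holes = {
    1: ([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]),
    2: ([3, 5, 6, 6, 7], [1, 0, 1, 0, 0]),
}
curves = {dim: kaplan_meier(d, o) for dim, (d, o) in holes.items()}
```

Computing one curve per dimension and comparing them at a common time is what supports a statement such as "the proportion of holes still alive is higher for higher dimensions".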