Available online at www.sciencedirect.com
ScienceDirect
www.elsevier.com/locate/procedia

Procedia Computer Science 207 (2022) 1695–1704

26th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES 2022)
Models of Exchanged Datasets and Interactions of Buyers in the Data Market: Toward Multi-Agent Simulators for System Design
Teruaki Hayashi^a,*, Hiroyasu Matsushima^b, Hiroki Sakaji^a, Yoshiaki Fukami^c, Takumi Shimizu^d
^a Department of Systems Innovation, School of Engineering, The University of Tokyo, Tokyo, Japan
^b Center for Data Science Education and Research, Shiga University, Shiga, Japan
^c Department of International Digital and Design Management, Tokyo University of Science, Tokyo, Japan
^d Faculty of Policy Management, Keio University, Kanagawa, Japan

Abstract
Value creation by reusing data as exchangeable resources has been widely studied as a new source of innovation, resulting in the
establishment of a data market and its ecosystem. However, owing to the nascent nature of the market, observable information on
exchanged datasets and on interactions among stakeholders, such as buyers and providers, is limited. As a result, little observable
information is available to support designing the market system and formulating regulations that promote the sound growth
of data markets. This study modeled exchanged datasets and data buyers as agents, the smallest units of the data market
components, and prepared seven scenarios for four market sizes. We simulated the effects of dataset and agent models with
different market sizes on the agents’ data purchases and the emergence of popular datasets and discussed the factors that affect the
purchase frequency distribution of the datasets. We present the experimental results and new implications for developing a data
market simulator. The development of this simulator is expected to significantly advance research on market understanding and
system design in data markets.
© 2022 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the 26th International Conference on Knowledge-Based and
Intelligent Information & Engineering Systems (KES 2022)
Keywords: data market; data exchange; data ecosystem; system design; simulation.

* Corresponding author. Tel.: +81-3-5841-2908.
E-mail address: hayashi@sys.t.u-tokyo.ac.jp
1877-0509 © 2022 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the 26th International Conference on Knowledge-Based and Intelligent
Information & Engineering Systems (KES 2022)
10.1016/j.procs.2022.09.227

1. Introduction

In recent years, owing to rapid technological advances, diverse and high-speed data generation and distribution have
become possible. In this climate, data have come to be characterized as an economic good, and since 2013, the year
of the big data boom, the data market, where data from different domains are exchanged and traded among various
stakeholders, has been widely studied [1, 2]. Furthermore, cross-industrial data co-creation has become an emerging
trend, in which diverse data are exchanged across different industries. However, the debate on when and what types
of regulations are appropriate for data distribution and exchange has not yet been settled, and potential data market
players are cautious about entering this new market. Therefore, for a sound development of the data market, it is urgent
to clarify the dynamic interactions among different types of datasets, regulations, and stakeholders in the market, and
to create rules for data distribution and exchange based on these interactions.
However, unlike widely known markets, such as financial and securities markets, few methods exist for observing and
evaluating the effects of new rules or regulations that are emerging in the data market. In addition, these rules and
regulations may cause unexpected side effects. Furthermore, it is unclear how stakeholders’
behaviors and incentives within a data market interact to create and drive the market.
In computational social science [3], attempts have been made to explore human behavior and social phenomena
quantitatively and theoretically using large-scale social data, such as social networking service data [4]. However, the
data market requires a different approach from the conventional one, in that the observable events and interactions of
the datasets, stakeholders, and regulations are limited. In this case, artificially modeling the behavioral players in the
market as agents and exploring the market growth process through simulations can be effective [5, 6]. In particular,
the multi-agent simulation approach provides more realistic solutions than top-down approaches in market
understanding and institutional design [7].
The ultimate goal of our study was to elucidate the dynamic interactions among datasets, regulations, and
stakeholders in the data market. To achieve this goal, an advanced simulator must be developed to model various
components of the data market and consider developing market conditions, such as the effect of new rules or
regulations, to promote the sound growth of the market. To design a data market simulator, this study modeled the
exchanged datasets and data buyers as agents, which are the smallest units of data market components.
Subsequently, based on these two models, we built an environment to simulate data exchange in the data market.
Finally, simulations were performed for seven scenarios with dataset and agent models in four cases, assuming
different market sizes. The results revealed factors that affect the purchase frequency distribution of datasets.
The primary contribution of this study is the identification of factors that affect the purchase frequency distribution
of datasets by comparing the dataset and agent models with four different market sizes. In addition, the proposed models and
settings will be extended to realize a multi-agent simulator that considers the interactions between data and
stakeholders in the data market under legal and regulatory conditions. This may yield significant progress
in market understanding and institutional design research in data markets where observable phenomena are limited.
The remainder of this paper is organized as follows. In Section 2, we outline the data market, which is the main
subject of this study, and explain the characteristics of the datasets and agents to be modelled based on the existing
studies. In Section 3, we describe the details of the proposed model and experimental settings. Section 4 presents the
results and discusses the scenarios considered using the models. Section 5 discusses the limitations of this study and
future studies for the institutional design of the data market. Section 6 draws the conclusions.

2. Approaches to Data Market and Modeling

The creation of new businesses and technologies by exchanging data in different fields has been discussed in many
previous studies, and technologies that support data exchange have been developed. In particular, the data market (also
called a data marketplace or a market of/for data) is a new entity on the scene and has been widely investigated in
recent years [1, 2, 8–11].
In this context, technologies to facilitate data distribution and exchange, such as pricing [12, 13] and data protection
models [2], have been proposed. However, the existing technologies implicitly assume that the data market is
sufficiently mature. Furthermore, studies on the institutional design to promote the growth of the data market are still
insufficient. Thus, institutional design and rule-making should be strategically promoted to foster a sound market.

However, owing to the nascent nature of the market, observable information on exchanged datasets and on interactions
among stakeholders, such as buyers and providers, is limited. As a result, little observable
information is available to support designing the market system and formulating regulations that promote the sound growth
of data markets. Consequently, data market simulators urgently need to be developed.
The smallest components of a data market simulator are the datasets exchanged in the market and the buyers
who purchase them. In this study, we introduce the concept of dataset similarity to represent both
individual datasets and the interactions between them. Hayashi and Ohsawa focused on variables to
identify the characteristics of datasets and their relationships with their metadata using a network approach [14]. Sakaji
et al. calculated the similarity of different datasets from summary sentences in metadata [15]. Based on the
approaches proposed in these studies, we model the datasets and their relationships so that the similarity between
datasets can be handled in the market simulation.
Next, the players who handle the datasets in the market must be modeled. It is not yet clear what types
of stakeholders play a role in emerging data markets. Stahl categorized data market stakeholders as suppliers,
buyers, and platform(er)s [16]. Quix et al. defined the business architecture of industrial data, called a data exchange
platform, and discussed the relationships among 11 types of business roles, such as data owners, consumers, and data brokers
[17]. Hayashi et al. did not prescribe roles for stakeholders in data marketplaces in advance but collected roles from
business cases; they found 155 types of stakeholder roles in 45 data businesses [18].
Our overview of the literature led us to assume that diverse and complex relationships exist among the various players
in a data market and its ecosystem. To simplify the simulation models in this study, first, we model only data buyers
as market players because data providers in the proposed primal model can be equated with the datasets provided to the
market. Second, we consider interactions among the data buyer agents. Such interactions could take various forms,
such as sharing information about dataset purchases, duplicating and reselling datasets, or
processing data, for example by applying machine learning or anonymization. Here, we model two types of interactions:
information sharing about data purchases and the business categorization of agents. Information sharing about data
purchases means that information about purchased datasets is shared with other agents. This model allows us to
express the phenomenon in which frequently purchased datasets are purchased even more often as their
reputation and popularity grow.
The second interaction is the business categorization of agents. This model assigns agents to categories,
such as telecommunication carriers, food businesses, and manufacturers. For example, agents engaged in food
businesses may tend to purchase traffic data to guarantee a stable food supply route and weather data for examining
the quality of ingredients. Additionally, those involved in the manufacturing industry may frequently use
electricity-related data. Thus, data purchasing behavior may be similar among agents in the same industry
category. With the exchanged datasets and buyer agents modeled in this way, the next section details the experimental
procedure and setup.

3. Experimental Procedure and Setup

3.1. Purpose and Method

The purpose of this study is to obtain profound insight into developing a multi-agent simulator for the data market
by modeling the smallest unit of market components, that is, exchanged datasets and people who purchase datasets as
agents. Moreover, data providers are considered the same as the datasets offered to the market and are not modeled in
this study. In this experiment, we created four types of datasets and three agent models and simulated seven scenarios
by combining the models.
The first step was to model the datasets traded in the market. The simplest possible model is to assume that all
provided datasets are homogeneous and without individuality (D1). Although this is an unrealistic setting, it was
targeted as a benchmark and used for comparison in this study.
The second model considers the popularity of datasets (D2). Initially, as in D1, all exchanged data are homogeneous
and have no individuality. However, when an agent purchases a dataset, the information that the dataset has been
purchased is transmitted to other agents, which increases the popularity of the dataset. Thus, when popular datasets
are purchased, they become even more popular and tend to be purchased more frequently. We set the degree
of data popularity to $\rho = 1$ for D2, which corresponds to a probability of 1/100 for the first agent to purchase a dataset
$d$ from a set of 100 datasets and 2/101 for the next agent to choose the same $d$.
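The D2 update can be read as popularity-weighted sampling. The following is a minimal sketch, assuming that every dataset starts with weight 1 and that a purchase adds the popularity degree to the weight of the purchased dataset; this is the reading that reproduces the 1/100 and 2/101 probabilities stated above. The function names are illustrative and are not taken from the original study.

```python
import random
from fractions import Fraction

RHO = 1  # degree of data popularity for D2, as stated in the text


def purchase_probability(weights, d):
    """Probability of choosing dataset d when choice is proportional to weight."""
    return Fraction(weights[d], sum(weights))


def purchase(weights, rng):
    """Pick one dataset proportionally to its weight, then raise its popularity."""
    d = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    weights[d] += RHO
    return d


# Assumption: every dataset starts with weight 1 (homogeneous, as in D1).
weights = [1] * 100
p_first = purchase_probability(weights, 0)  # 1/100 for any dataset

rng = random.Random(0)
d = purchase(weights, rng)
p_next = purchase_probability(weights, d)   # 2/101 for the dataset just bought
print(p_first, p_next)
```

Exact fractions are used here only so that the two probabilities can be compared with the values in the text without rounding.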
The third model considers data similarity (D3). Similarity quantifies the degree to which two datasets
resemble each other. For example, weather data from Japan and the United States are highly similar in that they deal with
weather, although in different regions. Likewise, employee data from Company A and Company B both contain information
about the people who belong to these companies and can be considered a highly similar pair. In contrast, weather
data from Japan and employee data from Company A may be completely different, and thus the similarity
between them is low. This study used two similarity networks of heterogeneous datasets, whose degree distribution
is either a power-law distribution (scale-free network) or a Poisson distribution (random
network). We named these models D3-1 and D3-2, respectively. In these models, when a dataset is purchased, the
popularity of that dataset and of the neighboring datasets linked to it increases. For both
D3-1 and D3-2, the degree of popularity of the purchased dataset ($\rho_{\mathrm{purchased}}$) was set to 2, and the degree of
popularity of similar datasets ($\rho_{\mathrm{similar}}$) was set to 1.
In summary, the data model includes the following four types (D1, D2, D3-1, and D3-2).
・ D1: All datasets are homogeneous and have no individuality.
・ D2: All datasets are homogeneous and have no individuality at the beginning; however, information on datasets
purchased by agents is shared, and the popularity of a purchased dataset increases by $\rho$.
・ D3: All datasets are assigned a similarity level. The information on the datasets purchased by agents is shared,
and the popularity of a purchased dataset increases by $\rho_{\mathrm{purchased}}$. In addition, the popularity of datasets similar to the
purchased dataset (neighbor datasets linked in the network) increases by $\rho_{\mathrm{similar}}$, making them more likely to be purchased.
The data similarity networks were as follows:
  ・ D3-1: A scale-free network, whose degree distribution is a power-law distribution; and
  ・ D3-2: A random network, whose degree distribution is a Poisson distribution.
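The D3 models can be sketched as a popularity update over a similarity network. The text does not state how the two networks were generated, so the sketch below assumes a preferential-attachment procedure for the scale-free case and independent random links for the random case; the generator functions, their parameters, and the initial weight of 1 are illustrative assumptions, whereas the two popularity degrees are the values given above.

```python
import random

RHO_PURCHASED = 2  # popularity added to the purchased dataset (from the text)
RHO_SIMILAR = 1    # popularity added to each neighboring dataset (from the text)


def random_network(n, p, rng):
    """Random graph (assumed generator): each pair is linked with probability p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def scale_free_network(n, m, rng):
    """Preferential attachment (assumed generator): each new node links to m
    distinct existing nodes, chosen proportionally to their current degree."""
    adj = {i: set() for i in range(n)}
    # Start from a fully connected core of m + 1 nodes so every degree is positive.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    for new in range(m + 1, n):
        existing = list(range(new))
        degrees = [len(adj[v]) for v in existing]
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(existing, weights=degrees, k=1)[0])
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
    return adj


def record_purchase(popularity, adj, d):
    """Raise the popularity of dataset d and of its neighbors in the network."""
    popularity[d] += RHO_PURCHASED
    for neighbor in adj[d]:
        popularity[neighbor] += RHO_SIMILAR


rng = random.Random(0)
adj = scale_free_network(100, 2, rng)
popularity = [1] * 100  # assumption: initial weight 1 per dataset
record_purchase(popularity, adj, 0)
```

In a scale-free network, a purchase of a hub dataset raises the popularity of many neighbors at once, which is one way to picture why D3-1 can produce more extreme popularity than D3-2.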
Next, we discuss agents that model players who purchase datasets. As for datasets, the simplest model treats all
agents as homogeneous and without individuality (A1). Thus, they choose and purchase datasets randomly, with no
preference for the datasets they purchase.
The second model includes agents with dataset preference (A2). Information about datasets purchased by other
agents is shared among agents, which makes them more likely to prefer highly popular datasets.
The third model includes agents’ business categories (A3). Each agent belongs to one business category ($c_i$), and
information on datasets purchased by other agents is shared within the category. This facilitates the agents’ purchases
of popular datasets; however, as the information is shared only within each category, the set of popular datasets is
specific to a category and differs from those of other business categories. In this study, the number of business categories was 10. The probability of
popular datasets being purchased within each category ($p_{c_i}$) was assumed to be 0.1 for all categories, and the
probability of datasets being randomly purchased was 0.9 ($= 1 - p_{c_i}$).
In summary, the following three buyer agent models purchase the datasets (A1, A2, and A3).
・ A1: All agents are homogeneous and do not interact with each other. They purchase datasets randomly.
・ A2: Information on datasets purchased by other agents is shared among agents, and they are more likely to prefer
highly popular datasets.
・ A3: Each agent belongs to one business category, and information on datasets purchased by other agents is shared
within the category. Popular datasets within categories are purchased with a probability of $p_{c_i} = 0.1$.
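The three buyer models can be sketched as three selection rules. The text does not specify how a popular dataset is picked within a category, so the sketch below assumes a choice weighted by the number of purchases recorded in that category; the round-robin assignment of agents to categories and all function names are likewise illustrative assumptions.

```python
import random
from collections import Counter

N_CATEGORIES = 10  # number of business categories (from the text)
P_CATEGORY = 0.1   # probability of buying a category-popular dataset (from the text)


def choose_a1(n_datasets, rng):
    """A1: uniform random choice with no interaction among agents."""
    return rng.randrange(n_datasets)


def choose_a2(popularity, rng):
    """A2: choice proportional to market-wide popularity weights."""
    return rng.choices(range(len(popularity)), weights=popularity, k=1)[0]


def choose_a3(category_counts, n_datasets, rng, p=P_CATEGORY):
    """A3: with probability p, pick among datasets already bought in the agent's
    category (weighted by purchase count, an assumption); otherwise pick uniformly."""
    if category_counts and rng.random() < p:
        ids = list(category_counts)
        counts = [category_counts[i] for i in ids]
        return rng.choices(ids, weights=counts, k=1)[0]
    return rng.randrange(n_datasets)


rng = random.Random(1)
history = {c: Counter() for c in range(N_CATEGORIES)}
for agent in range(100):
    c = agent % N_CATEGORIES  # assumption: agents are spread evenly over categories
    d = choose_a3(history[c], 1000, rng)
    history[c][d] += 1        # shared only within the agent's own category
```

Because each category keeps its own purchase history, a dataset that is popular in one category has no effect on choices made in another.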
Assumptions of the data market, data being handled, and agents in this experiment are as follows:
・ Values of datasets are homogeneous, and no fake datasets exist. Agents have the same level of knowledge and
a common perception of the value of the provided datasets.
・ There is one platform for data distribution and trading in the market.
・ In each step, buyer agents purchase one dataset.
・ Information on the purchased datasets is promptly shared among all agents. Thus, when an agent purchases a
dataset, the popularity of the dataset changes, affecting the next agent’s dataset purchase.
・ The order in which agents purchase datasets in each step is completely random.
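The assumptions above translate into a simple simulation loop. The sketch below combines them with the D2 and A2 rules; it assumes an initial weight of 1 per dataset and allows an agent to buy the same dataset more than once, neither of which is stated in the text, and the function name is illustrative.

```python
import random


def run_simulation(n_datasets, n_agents, n_steps, rho=1, seed=0):
    """Popularity-weighted purchasing on one platform, one purchase per agent per step."""
    rng = random.Random(seed)
    popularity = [1] * n_datasets  # assumption: initial weight 1 per dataset
    purchases = [0] * n_datasets   # purchase count per dataset
    agents = list(range(n_agents))
    for _ in range(n_steps):
        rng.shuffle(agents)        # purchase order is random in each step
        for _agent in agents:      # each agent buys exactly one dataset per step
            d = rng.choices(range(n_datasets), weights=popularity, k=1)[0]
            purchases[d] += 1
            popularity[d] += rho   # shared at once, so the next buyer sees it
    return purchases


purchases = run_simulation(n_datasets=100, n_agents=100, n_steps=5)
```

With homogeneous agents the shuffled order does not change the outcome statistics; it is kept so that the loop can later accommodate agents that differ, such as those with business categories.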

3.2. Scenarios

Table 1 lists the data and agent models combined in each scenario. The number of steps was set to 1 and 5; in one step, all agents
purchased datasets once each, and after five steps, each agent had five datasets. Each simulation was repeated 100 times, and the average
purchase frequency of the datasets and its 95% confidence interval were calculated. We set four cases, in which the
numbers of datasets and agents represent different sizes of the data market (Table 2).
Table 1. Seven scenarios of data and agent models.
Scenario #    Data model #    Agent model #
1             D1              A1
2             D2              A2
3             D3-1            A2
4             D3-2            A2
5             D2              A3
6             D3-1            A3
7             D3-2            A3

Table 2. Four cases assuming different sizes of data market.
Case #    # of agents    # of datasets
1         1,000          100
2         100            1,000
3         1,000          1,000
4         100            100
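The averaging over 100 iterations described in Section 3.2 can be sketched as follows. The text does not state how the 95% confidence interval was computed, so this sketch assumes a normal approximation (mean ± 1.96 standard errors); it uses uniform random purchasing with 100 datasets and 100 agents purely for illustration, and the function names are hypothetical.

```python
import random
import statistics
from collections import Counter


def one_run(n_datasets, n_agents, rng):
    """One step of uniform random purchasing; returns a histogram that maps a
    purchase count k to the number of datasets bought exactly k times."""
    purchases = Counter(rng.randrange(n_datasets) for _ in range(n_agents))
    hist = Counter(purchases.values())
    hist[0] = n_datasets - len(purchases)  # datasets that nobody bought
    return hist


def summarize(n_datasets, n_agents, iterations=100, seed=0):
    """Average the histograms over all iterations and attach a 95% interval."""
    rng = random.Random(seed)
    runs = [one_run(n_datasets, n_agents, rng) for _ in range(iterations)]
    max_k = max(max(h) for h in runs)
    summary = {}
    for k in range(max_k + 1):
        values = [h.get(k, 0) for h in runs]
        mean = statistics.mean(values)
        # Assumed interval: normal approximation, mean +/- 1.96 * standard error.
        half_width = 1.96 * statistics.stdev(values) / len(values) ** 0.5
        summary[k] = (mean, mean - half_width, mean + half_width)
    return summary


summary = summarize(n_datasets=100, n_agents=100)
```

Each entry of `summary` gives the mean number of datasets purchased exactly k times together with the lower and upper bounds of the interval, which is the quantity plotted in a purchase frequency distribution.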

4. Results and Discussion

4.1. Comparisons of the Four Cases

Figure 1 shows the frequency distribution of the data purchased by buyer agents. Figures 1(a) and (b) show the
results of Step 1 and Step 5, respectively. By comparing the frequency distribution of the data purchased in Steps 1
and 5 for Cases 1–4, differences and similarities of the seven scenarios are discussed. As the purpose of this study is
to examine the influence of the difference between the dataset and agent models on the purchase frequency distribution,
we did not determine or test the exact parameters of distribution fittings.

4.1.1. Case 1
In Case 1, where the number of agents is larger than that of datasets in the market, Scenarios 1, 5, 6, and 7 resemble a
Poisson distribution (or normal distribution), whereas Scenarios 2 and 4 follow an exponential distribution. Only
Scenario 3 has a distribution similar to a power-law distribution, with a long tail and a gradual, straight decay in
the double-logarithmic graph. Because many agents purchase from a relatively small pool of 100 datasets, the influence of
dataset popularity is pronounced, producing right-skewed distributions that differ from the Poisson
distribution in Scenarios 2, 3, and 4. In particular, Scenario 3 models the phenomenon in which datasets similar to
popular datasets also become popular and are purchased, and its similarity network is scale-free. Therefore, once a
dataset is purchased, it becomes popular and is thereafter purchased more frequently than other datasets,
which makes the purchase frequency follow a power-law distribution.
In addition, Scenario 2 models the popularity of datasets, and its distribution shows that the more
popular a dataset is, the more it is purchased. However, Scenario 2 does not model dataset similarity; only the
purchased datasets gain popularity, and highly popular datasets appear less often than in Scenario 3.
Scenario 4 models the characteristic that datasets similar to the purchased datasets become popular. Therefore, the
appearance of popular datasets depends on the structure of the similarity network, which is a random graph. As a random
graph does not allow datasets to gain as much popularity as a scale-free network does, the
distribution is exponential, similar to that in Scenario 2.

Fig. 1. Four cases with seven scenarios and two steps: (a) Step 1 and (b) Step 5.
The inset graphs represent a double-logarithmic graph.

Scenarios 5, 6, and 7 are models with the agents’ business categories, which resulted in a Poisson distribution, the
same as that of Scenario 1, where datasets are purchased randomly. This observation also applies to Cases 2–4, as explained
in Subsections 4.1.2–4.1.4. Scenario 5 is a model in which popular datasets appear within business categories, with a
probability of $p_{c_i} = 0.1$. The results suggest that the randomly purchased datasets have a greater influence on
purchases than the highly popular datasets within business categories. Scenarios 6 and 7 are models in which similar
datasets gain popularity within the categories, and their distributions are similar to that of Scenario 5. The results obtained
by varying the purchase probability $p_{c_i}$ of popular datasets are discussed in Subsection 4.2.

4.1.2. Case 2
In Case 2, where the number of agents is smaller than the number of datasets, only small differences exist among the scenarios.
In all scenarios, the frequency distribution of data purchases is gradually right-skewed because the number
of datasets is larger than that of agents; hence, most datasets are not purchased or are purchased only a few times (at
most seven times for Step 1 and a maximum of 38 times for Step 5, in Scenario 3). Therefore, when a large number
of datasets are provided in a market, differences among models, such as the dataset popularity and agents’ business
categories, are less influential.

4.1.3. Case 3
In Case 3, where the numbers of datasets and agents are similarly large, the distribution of Scenario 3 decays gradually,
exhibiting a power-law-like distribution, whereas the distributions of the other scenarios decline abruptly, as in the other cases.
In Step 5, the distributions fall into three types. Scenario 3 shows a gradual right-skewed power-law distribution, as in Step 1.
Scenario 2 does not extend as far as the power-law distribution; however, its shape resembles a long-tailed exponential distribution.
In contrast, Scenarios 1, 4, 5, 6, and 7 exhibit a Poisson distribution. As in Case 1, Scenarios 5, 6, and 7 in Case 3
have approximately the same Poisson distribution as Scenario 1, which is a random selection. Thus, the models of
business categories and dataset popularity have an inconspicuous influence on the purchase frequency distribution
when the number of datasets provided is larger than that of agents. Interestingly, Scenario 4 had a similar distribution
to Scenario 3 for Cases 1, 2, and 4; however, it exhibited a Poisson distribution only in Case 3, which is
discussed in detail in Subsection 4.3.

4.1.4. Case 4
In Case 4, the numbers of datasets and agents are both small, and all scenarios in Step 1 exhibit gradual right-skewed
distributions. However, Step 5 shows three types of distributions: Scenario 3 exhibits a power-law distribution, as was the
case in Step 1; Scenarios 2 and 4 are similar in shape to a long-tailed exponential distribution and do not extend as far
as the power-law distribution; and Scenarios 1, 5, 6, and 7 have a Poisson distribution. Compared with Case 3, the purchase
frequency distribution of datasets for each scenario is closely comparable, except for Scenario 4. Therefore, when the
numbers of datasets and agents are both scaled by a factor of approximately 10, as in the experimental setting of
this study, the purchase frequency distribution of a given scenario does not change significantly. Thus, differences
in the models that make up the scenarios may have a stronger influence on the distribution than differences in the
number of datasets and agents.

4.2. Influence of the Purchase Probability in the Business Categories

Scenarios 1, 5, 6, and 7 resulted in similar frequency distributions in all cases. Scenarios 5, 6, and 7 modeled agents’
business categories with popular datasets; however, these models had little influence on the purchase frequency
distribution regardless of the number of agents and datasets. In effect, the results are equivalent to buying
datasets at random. Figure 2 shows the results for Case 1 with a varying $p_{c_i}$, the parameter for popular data selection. The value
$p_{c_i} = 0$ is equivalent to a model in which, as in Scenario 1, no information sharing happens among agents and
datasets are purchased completely at random. In contrast, as the value of $p_{c_i}$ increases, popular datasets within the
categories are selected more often. When $p_{c_i}$ is in the range of 0.0–0.6, a Poisson distribution is obtained; however, when it reaches
0.9, a long-tailed characteristic appears, and a gradual straight decline occurs in the high-frequency portion, where the
number of purchases is 8 or greater. Consequently, frequently purchased datasets become
popular and are purchased increasingly often, which creates a power-law distribution.
Scenarios 6 and 7 are models in which popular datasets appear based on dataset similarity. Although their distributions
are similar when $p_{c_i}$ is in the range of 0.0–0.6, a significant difference appears at 0.9. In Scenario 6, the data similarity
follows a scale-free network; hence, highly popular datasets are purchased more often in each category at $p_{c_i} = 0.9$.
Therefore, the portion of the distribution above approximately 10 purchases exhibits a gradual
decline. In contrast, in Scenario 7, the dataset similarity follows a random graph; therefore, extremely popular datasets
do not appear as they do in Scenario 6. Consequently, the appearance of popular datasets is suppressed, and the
distribution remains a partially long-tailed Poisson distribution.
In summary, when the purchase probability $p_{c_i}$ is varied, Scenarios 5 and 6 have similar purchase frequency
distributions even though the dataset similarity in Scenario 6 follows a scale-free network, whereas Scenario 7, where the dataset
similarity follows a random graph, has a different distribution pattern.
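The effect of varying the purchase probability within business categories can be explored with a short sweep. The sketch below is an illustration under stated assumptions rather than a reproduction of Figure 2: it models only the category-based selection without dataset similarity, picks category-popular datasets in proportion to their purchase counts, assigns agents to categories in round-robin fashion, and uses 100 datasets and 1,000 agents chosen for illustration. It prints the largest purchase count observed for each probability as a rough indicator of how long the tail is.

```python
import random
from collections import Counter


def run_a3(p, n_datasets=100, n_agents=1000, n_categories=10, seed=0):
    """One step of category-based purchasing with purchase probability p."""
    rng = random.Random(seed)
    by_category = [Counter() for _ in range(n_categories)]
    total = Counter()
    order = list(range(n_agents))
    rng.shuffle(order)  # random purchase order within the step
    for agent in order:
        bought = by_category[agent % n_categories]  # assumption: round-robin categories
        if bought and rng.random() < p:
            # Assumption: category-popular datasets are weighted by purchase count.
            ids = list(bought)
            d = rng.choices(ids, weights=[bought[i] for i in ids], k=1)[0]
        else:
            d = rng.randrange(n_datasets)
        bought[d] += 1
        total[d] += 1
    return total


for p in (0.0, 0.3, 0.6, 0.9):
    total = run_a3(p)
    print(p, max(total.values()))
```

Setting the probability to 0 reduces the rule to uniform random purchasing, which matches the description of Scenario 1 given above.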

Fig. 2. Purchase frequency distributions by changing the purchase probability of business categories.
The smaller graph in each graph represents a double-logarithmic graph.

4.3. Rationale for the Purchase Frequency Distribution Difference by Market Sizes in Scenario 4

Here, the mechanism behind the distinctive behavior of Scenario 4 is examined by comparison with Scenarios 2 and 3. Scenario
4 behaves in an interesting way in that its purchase frequency distribution differs between Cases 3 and 4, where Case 3
is a setting in which 1,000 agents purchase datasets from 1,000 candidates, and Case 4 is one in which 100 agents
purchase datasets from 100 candidates. In Cases 1, 2, and 4, Scenarios 2 and 4 have the same purchase frequency distribution, which is
similar to the exponential distribution. Only in Case 3 does Scenario 4 have a Poisson distribution similar to those of Scenarios
1, 5, 6, and 7.
Interestingly, Scenarios 2 and 4 have different distributions in Case 3, whereas they have the same distribution in
Case 4, particularly in Step 5; this difference is attributable to the dataset model. The dataset popularity increases
only by purchase in Scenario 2, whereas it increases by both dataset purchase and dataset similarity in Scenario 4.
Furthermore, both Scenarios 2 and 4 have a similar distribution, with a gradual decline, in Step 1; however, a
distinctive difference occurs in Step 5. We conducted additional experiments by varying the number of steps and the
popularity of the datasets to investigate the possible factors contributing to this change.
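The two popularity rules contrasted above can be illustrated with a minimal sketch. It is not the authors' simulator: the uniform initial popularity of 1, the sequential popularity-weighted choice by each agent, the neighbor increment of 1, and the names `random_graph`, `simulate`, and `frequency_distribution` are all assumptions made for illustration. Passing `adj=None` corresponds to popularity growing by purchase only (as in Scenario 2), whereas passing a random similarity graph also raises the popularity of similar datasets (as in Scenario 4).

```python
import random
from collections import Counter


def random_graph(n, p, rng):
    """Random similarity graph as an adjacency list (each pair linked with probability p)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def simulate(n_agents, n_datasets, steps, rho=2, adj=None, seed=0):
    """Return a Counter mapping dataset id -> number of purchases.

    Assumption: every dataset starts with popularity 1 and each agent buys one
    dataset per step with probability proportional to its current popularity.
    """
    rng = random.Random(seed)
    popularity = [1] * n_datasets
    purchases = Counter()
    ids = range(n_datasets)
    for _ in range(steps):
        for _ in range(n_agents):
            d = rng.choices(ids, weights=popularity, k=1)[0]
            purchases[d] += 1
            popularity[d] += rho          # purchased dataset gains weight rho
            if adj is not None:           # similarity rule (assumed increment of 1)
                for neighbor in adj[d]:
                    popularity[neighbor] += 1
    return purchases


def frequency_distribution(purchases):
    """Map k -> number of datasets purchased exactly k times."""
    return Counter(purchases.values())


if __name__ == "__main__":
    graph = random_graph(100, 0.03, random.Random(42))
    by_purchase_only = simulate(100, 100, steps=5)
    with_similarity = simulate(100, 100, steps=5, adj=graph)
    print(sorted(frequency_distribution(by_purchase_only).items()))
    print(sorted(frequency_distribution(with_similarity).items()))
```

Varying `steps`, the market size, and the link probability of the graph in such a sketch gives a quick way to explore how the two rules reshape the tail of the distribution, although the exact shapes depend on the assumed update details.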
Figure 3 compares Scenarios 2 and 4 for Cases 3 and 4 and shows the purchase frequency distributions as
the number of steps increases from 1 to 5 and 10. In Case 4, as the number of steps increases in both scenarios, the
distribution becomes long-tailed and remarkably close to an exponential distribution. In Scenario 2 for Case 3,
the distribution becomes long-tailed as the number of steps increases, as in Case 4. However, in Scenario 4 for Case
3, the maximum number of purchases is on the order of 10, maintaining a Poisson distribution. In Scenario 2 for
Case 3, because the number of purchased datasets in each step is large, datasets that gain popularity already appear
within a step; thus, popular datasets become increasingly popular. However, in Scenario 4 for Case 3, as the number
of steps increases, the popularity of datasets similar to the purchased popular datasets also increases, according to
the random graph. Thus, more candidates appear for purchase and popular datasets do not become extremely popular,
resulting in a distribution close to the Poisson distribution in Scenario 4. Consequently, in this setting, the number
of steps taken did not significantly influence the purchase frequency distribution.

Next, purchased and popular datasets are analyzed. Popular datasets are candidates for purchase. In one step, 1,000
and 100 datasets were purchased in Cases 3 and 4, respectively, which was consistent with the number of agents in
each case. In contrast, 33,819 popular datasets appeared at Step 1 in Case 3 and 630 in Case 4, with
normal and exponential distributions, respectively. For comparison, in Scenario 2, where data similarity in the random
graph was not introduced, the number of popular datasets was twice that of purchased datasets, and both Cases 3 and
4 had an exponential distribution.
Subsequently, we compared the network characteristics of dataset similarity. In Case 3, there are 1,000 nodes,
15,103 links, and an average degree of 30.2; thus, for each purchase, one purchased dataset with a weight of 𝜌 = 2
and, on average, 30.2 nonpurchased similar datasets become popular datasets. In Case 4, there are 100 nodes,
164 links, and an average degree of 3.28; thus, for each purchase, one purchased dataset with a weight of 𝜌 = 2
and, on average, approximately 3.3 nonpurchased similar datasets become popular datasets. Therefore, the influence of
the purchased datasets on the popular datasets is larger in Case 4, and the mechanism by which popular datasets become
even more popular and increasingly purchased is more likely to occur in Case 4 than in Case 3. This results in a
purchase frequency distribution that is normal in the large market (Case 3) and close to an exponential distribution
in the small market (Case 4).
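The arithmetic behind this comparison can be checked with a short sketch. The average degree of an undirected graph is twice the number of links divided by the number of nodes. The helper `purchased_share` is a hypothetical measure introduced only for illustration: it assumes that each similar dataset gains a weight of 1 per purchase, which the text implies but does not state explicitly.

```python
def average_degree(n_nodes, n_links):
    """Average degree of an undirected graph: each link touches two nodes."""
    return 2 * n_links / n_nodes


def purchased_share(avg_degree, rho=2.0, neighbor_weight=1.0):
    """Share of the popularity added by one purchase that goes to the
    purchased dataset itself (assumes each similar dataset gains neighbor_weight)."""
    return rho / (rho + neighbor_weight * avg_degree)


# Node and link counts reported for the random similarity graph
cases = {"Case 3": (1000, 15103), "Case 4": (100, 164)}

for name, (nodes, links) in cases.items():
    k = average_degree(nodes, links)
    print(f"{name}: average degree {k:.2f}, purchased share {purchased_share(k):.3f}")
```

Under these assumptions, the purchased dataset receives roughly 6% of the added popularity in Case 3 and roughly 38% in Case 4, which is one way to quantify why the rich-get-richer effect is stronger in the smaller market.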
In Scenario 3 with a power distribution, the average degrees were 3.51 and 2.96 in Cases 3 and 4, respectively.
Compared to the average degrees of the random graph in Scenario 4, the influence of purchased datasets on the popular
datasets owing to the market size is small and insignificant; hence, purchase frequency distributions exhibit power
distributions for both Cases 3 and 4.

Fig. 3. Purchase frequency distributions by changing the steps. The inset in each graph represents a double-logarithmic graph.

5. Limitations and Future Works

The limitations of this study and implications derived from the experiment and its settings are as follows.
First, agents had no characteristics or strategies and were treated homogeneously. In practice, however, agents have
individual characteristics, and the datasets they want must differ. Some agents may randomly select datasets they want,
whereas others may preferentially purchase the most popular datasets. An agent model that captures these features
should be considered in future studies.
Second, we did not consider the price of datasets, although price is a factor when data are exchanged. Among similar
datasets, some agents may purchase the less expensive dataset, whereas others may purchase a dataset provided by a
highly trustworthy company. In addition, agents have constraints on purchases, such as budget. Therefore, it is
important to model these factors in future studies.
Third, this study only classified agents by business category, whereas the dataset model introduced a network with
respect to similarity. Moreover, the probability of popular dataset purchases may differ according to business
categories. Further studies are needed to clarify the relationships between business players in a data market, which
remain unobserved and incompletely explored.
The fourth limitation is the growth model of the data market. It is conceivable that the market grows and declines
repeatedly. Agent and dataset models are likely to vary depending on the degree of growth and scale of the market.
Therefore, parameter selection and simulator development from a growth perspective should be considered in future
studies.

6. Conclusions

Owing to the limited observability of transactions and interactions among the components of data markets and their
ecosystems, data and agent models have not been fully explored. In this study, we used a model-based approach to
model reasonable features of data markets, such as the similarity and popularity of datasets and business categories of
buyer agents. Simulations were conducted using seven scenarios with four different market sizes. We found three
major purchase frequency distributions: Poisson, exponential, and power. Furthermore, the results imply the
importance of the mechanism and parameters by which datasets become popular within business categories, as well as of
the models of data similarity. This presents new challenges: clarifying the empirical values of these parameters
and the mechanism that determines them. In future work, we will extend the models and setups and develop a
multi-agent simulator to investigate dataset and player interactions in data markets under legal and regulatory
conditions.

Acknowledgements

This study was supported by the JSPS KAKENHI (JP20H02384). We wish to thank Editage for the English
language editing.

References

[1] Florian Stahl, Fabian Schomm, Gottfried Vossen. (2014) “Data Marketplaces: An Emerging Species,” Frontiers in Artificial Intelligence and
Applications, 145–158.
[2] Fan Liang, Wei Yu, Dou An, Qingyu Yang, Xinwen Fu, Wei Zhao. (2018) “A Survey on Big Data Market: Pricing, Trading and Protection,”
IEEE Access, 6:15132–15154.
[3] David Lazer, et al. (2009) “Computational Social Science,” Science, 323(5915):721–723.
[4] Yelena Mejova, Ingmar Weber, Michael W. Macy. (2015) Twitter: A Digital Socioscope, Cambridge University Press.
[5] J. Doyne Farmer, Duncan Foley. (2009) “The Economy Needs Agent-Based Modelling,” Nature, 460:685–686.
[6] Stefano Battiston, J. Doyne Farmer, Andreas Flache, Diego Garlaschelli, Andrew G. Haldane, Hans Heesterbeek, Cars Hommes, Carlo Jaeger,
Robert May, Marten Scheffer. (2016) “Complexity Theory and Financial Regulation,” Science, 351(6275):818–819.
[7] Nigel Gilbert, Klaus G. Troitzsch. (1999) “Simulation for the Social Scientist,” Open University Press.
[8] Magdalena Balazinska, Bill Howe, Dan Suciu. (2011) “Data Markets in the Cloud: An Opportunity for the Database Community,” the VLDB
Endowment, 4(12):1482–1485.
[9] Fabian Schomm, Florian Stahl, and Gottfried Vossen. (2013) “Marketplaces for Data: an Initial Survey,” ACM SIGMOD Record, 42(1):15–
26.
[10] Yukio Ohsawa, Hiroyuki Kido, Teruaki Hayashi, and Chang Liu. (2013) “Data Jackets for Synthesizing Values in the Market of Data,” 17th
International Conference in Knowledge Based and Intelligent Information and Engineering Systems.
[11] Markus Spiekermann. (2019) “Data Marketplaces: Trends and Monetisation of Data Goods,” Intereconomics, 54:208–216.
[12] Korn Sooksatra, Wei Li, Bo Mei, Arwa Alrawais, Shengling Wang, Jiguo Yu. (2018) “Solving Data Trading Dilemma with Asymmetric
Incomplete Information Using Zero-Determinant Strategy,” International Conference on Wireless Algorithms, Systems, and Applications.
[13] Jinfei Liu, Jian Lou, Junxu Liu, Li Xiong, Jian Pei, and Jimeng Sun. (2021) “Dealer: An End-to-End model Marketplace with Differential
Privacy,” VLDB Endowment, 14(6):957–969.
[14] Teruaki Hayashi, Yukio Ohsawa. (2020) “Understanding the Structural Characteristics of Data Platforms Using Metadata and a Network
Approach,” IEEE Access, 8:35469–35481.
[15] Hiroki Sakaji, Teruaki Hayashi, Kiyoshi Izumi, Yukio Ohsawa. (2020) “Verification of Data Similarity using Metadata on a Data Exchange
Platform,” IEEE International Conference on Big Data.
[16] Florian Stahl, Fabian Schomm, Gottfried Vossen, Lara Vomfell. (2016) “A Classification Framework for Data Marketplaces,” Vietnam Journal
of Computer Science, 3:137–143.
[17] Christoph Quix, Arnab Chakrabarti, Sebastian Kleff, Jaroslav Pullmann. (2017) “Business Process Modelling for a Data Exchange Platform,”
29th International Conference on Advanced Information Systems Engineering.
[18] Teruaki Hayashi, Gensei Ishimura, Yukio Ohsawa. (2021) “Structural Characteristics of Stakeholder Relationships and Value Chain Network
in Data Exchange Ecosystem,” IEEE Access, 9:52266–52276.
