Proceedings of the Fifth International Conference on Electronics, Communication and Aerospace Technology (ICECA 2021)

IEEE Xplore Part Number: CFP21J88-ART; ISBN: 978-1-6654-3524-6

Distributed Multi-Dimensional Data Index Strategy in Cloud Computing Environment
Dawei Gui*, Guoqi He
School of Information and Intelligent Technology, Shaanxi Radio and Television University, Xi'an, Shaanxi 710119, China
guidaweisxrtu@sohu.com

2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA) | 978-1-6654-3524-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICECA52323.2021.9675845

Abstract—The widespread application of cloud computing technology has caused explosive data growth and poses new challenges to traditional data management technology. Existing cloud storage systems generally use distributed hash tables to access data. This key-value model achieves high access efficiency for single-dimensional queries, but it does not support multi-dimensional queries. Auxiliary indexing for cloud storage has therefore become a hot topic in academic research in recent years, and related results have been published in top international conferences and journals in the database field. This paper studies distributed multi-dimensional data index strategies in the cloud computing environment. The work is carried out from two aspects: multi-dimensional data indexing in cloud storage, and distributed computing.

Keywords—Distributed Computing, Multidimensional Data Index, Cloud Computing, Big Data

I. INTRODUCTION

Multidimensional data indexing has long been an important research problem in data management. With the arrival of the big data era, traditional relational data management systems are increasingly unable to meet the efficiency and scalability needs of practical applications. Large-scale distributed cloud storage systems have become a new carrier of big data. How to improve the performance of multi-dimensional data queries in the cloud computing environment is one of the core issues in the cloud computing field [1-5].

With the rapid development of Internet technology, the data generated in science, engineering, and business computing has shown an explosive growth trend. An IDC statistical report shows that the total amount of global data was about 0.8 ZB in 2009 and had reached 1.2 ZB by 2010, an increase of about 50% in just one year. The rate of data growth is still accelerating: the total was estimated to reach 35 ZB by 2020, which is 44 times the 2009 figure. This rapid growth poses severe challenges to the storage and computing capabilities of existing IT architectures in all walks of life. Business departments have been overwhelmed by having to keep increasing hardware investment to improve system scalability. As a result, the concept of cloud computing has received extensive attention from industry and academia ever since it was put forward [6-11].

Cloud computing combines the advantages of distributed processing, parallel processing, and grid computing, and is developed on that basis. It can be regarded as the commercial realization of these computer science concepts, and it is a convenient usage and service model. Major Internet companies and scientific research institutions have invested huge human and financial resources to develop related technologies and applications, such as Google's MapReduce technology, IBM's Blue Cloud project, and the Azure platform provided by Microsoft. At present, major Internet service platforms at home and abroad have widely adopted cloud computing and big data technology and have achieved good results. Most of the text and picture data of the domestic Weibo social platform and of shopping platforms is stored on cloud platforms. The hot topics on an Internet platform in a fixed time period can be summarized from user visits and the click-through rate of events during that period. A shopping platform can identify the types of products a user is likely to buy in the near future from the user's browsing information, and its recommendation system can then recommend corresponding products accurately. Based on the results of big data analysis, government departments can keep abreast of social trends and provide appropriate guidance. The results of big data analysis are therefore very important for decision-making in enterprises and institutions, and this series of technological advances poses severe challenges to the index management of cloud data [12-16].

The rapid progress of Internet technology and the increasingly frequent use of GIS technology in people's lives have produced a large amount of spatial data, and the requirements for managing this data efficiently keep changing over time. If reasonable storage and efficient indexing of spatial data can be realized in the cloud computing environment, users will be able to use spatial data more conveniently, and applications of spatial data will fit real needs better, cover a wider range, deliver greater value, and guide people's daily behavior, which contributes to the development of society [17-20].

As the most basic infrastructure in cloud computing, data storage systems play a very important role. Through cluster technology, distributed computing, virtualization, and other technologies, the cloud storage system manages a large number of cheap, heterogeneous storage media as a storage resource pool that provides services to users. In the cloud storage model, data storage and management have become both more centralized and more decentralized. Centralization is from the user's point of view: data is stored in the cloud in a unified manner, and users can obtain data simply by requesting it, without needing to know where the data comes from or how it is managed. The cloud storage
system provides users with a convenient and efficient experience. Decentralization is from the point of view of cloud data centers. Unlike traditional centralized data storage, the cloud storage system uses a large-scale distributed data storage solution in which data is stored on a large number of different data nodes. This storage architecture has obvious advantages, mainly reflected in three aspects, the first of which is high scalability. The cloud storage system adopts a parallel expansion method. A newly purchased data server only needs to have the operating system and the cloud storage software installed. After a simple configuration, it can be added to the storage pool to expand capacity [21-24].

II. THE PROPOSED METHODOLOGY

A. Distributed Computing

Most current spatio-temporal indexes are serial indexes designed for a centralized environment, and most spatio-temporal data indexes evolved from spatial indexes. In theory, high-dimensional R-trees and their variants can be used as high-dimensional spatio-temporal indexes. They use minimum space-time bounding rectangles to cluster space-time objects into a hierarchical tree structure. These bounding rectangles may not represent the entire data range, and they may partially overlap. The overlap problem is one of the bottlenecks of indexing methods based on data partitioning, because even a simple point query may need to check multiple query paths. Especially when such indexes are used for current relative-time data and moving-object data, node overlap and dead space seriously degrade index performance.

Fig. 1. Distributed Computing

To solve these problems, many spatio-temporal indexing methods have appeared in the past ten years. From the perspective of data structure, they fall mainly into three groups:
1) Spatio-temporal indexes that extend the B-tree, such as the B-tree and the STCB-tree. This type of index mainly relies on transformation: spatio-temporal data is indexed after dimensionality reduction and data transformation.
2) Spatio-temporal indexes that extend the R-tree, such as the PTR-tree, the RTR-tree, and the indoor moving-object index based on the DR-tree. This type of index adds time to the R-tree and represents spatio-temporal objects by their minimum spatio-temporal bounding rectangles. It generally improves continuous-update performance through additional time and speed parameters, or indexes spatio-temporal data through a combination of multiple trees.
3) Spatio-temporal indexes based on other data structures such as grids and lists, for example the global discrete grid index based on a linear quadtree, the moving-data index based on phase-point analysis, and the boundary-constrained multi-dimensional unified index for non-intersecting spherical entity objects. Most of these methods are also extended into tree indexes.

Thus, from the perspective of data structure, the tree is the mainstream data structure for spatio-temporal indexes.

$$v = \frac{1 \cdot n_1 + n_2 v_s}{n_1 + n_2} = 1 - P\,(1 - v_s) \tag{1}$$

ai  zi  ( zi1  2kd 1) (2)

B. Multidimensional Data Index

Multidimensional data indexing has always been one of the key research issues in the database field, and relational databases already offer some relatively mature indexing technologies. This section analyzes existing multidimensional indexes based on tree structures, dimensionality reduction methods based on space-filling curves, and bitmap indexes.

A tree structure answers queries more efficiently than a sequential file and costs less to maintain, so tree structures have attracted much attention in index research. The most representative one is the B-tree index, which has been successfully applied in a large number of data management systems and file systems. However, a B-tree only supports one-dimensional key-value queries. Multi-dimensional queries require multiple B-trees, which makes the index occupy more storage space and makes index maintenance more complicated. Many tree structures that support multi-dimensional indexing have therefore been proposed, such as the R-tree, KD-tree, quadtree, and octree. This paper mainly analyzes the R-tree index and the KD-tree index.

Fig. 2. Two-dimensional R-tree structure

Guttman proposed the R-tree in 1984. Since then, various improvements have appeared, such as R*-trees, Hilbert R-trees, and dynamic R-trees, which are collectively referred to as the R-tree index family. The R-tree is a balanced tree that extends the B-tree to multi-dimensional space. It encloses adjacent data objects in a small rectangle called the minimum bounding rectangle (MBR). Each such MBR constitutes a leaf node of the R-tree index. MBRs with smaller extents are in turn covered by a larger MBR, which forms the parent node.

For an R-tree of order M (the maximum number of entries that a node can contain), let m be the minimum number of entries contained in an intermediate node. The R-tree satisfies the following properties:
1) The root node has at least two children, unless it is a leaf node.
2) Every intermediate node other than the root has between m and M children, that is, at most M subtrees and at least m subtrees.
3) Each leaf node has at most M data items and at least m data items.
4) All leaf nodes are at the same level.
Fig. 2 shows an example of a two-dimensional R-tree structure, in which the MBR of each parent node covers the MBRs of its child nodes.
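The four properties above, together with the rule that a parent MBR covers the MBRs of its children, can be checked mechanically. The following is a minimal Python sketch under stated assumptions, not the paper's implementation: the names `Node`, `make_leaf`, `make_inner`, `check_rtree`, and `range_search`, the rectangle layout `(xmin, ymin, xmax, ymax)`, and the hand-built two-leaf tree are all introduced here for illustration. The tree is assembled by hand, so no insertion or node-splitting algorithm is shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle (MBR) that covers every rectangle in `rects`."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def covers(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


@dataclass
class Node:
    leaf: bool
    rects: List[Rect] = field(default_factory=list)       # one MBR per entry
    children: List["Node"] = field(default_factory=list)  # empty for leaf nodes

    @property
    def mbr(self) -> Rect:
        return union(self.rects)


def make_leaf(rects: List[Rect]) -> Node:
    return Node(True, list(rects))


def make_inner(children: List[Node]) -> Node:
    # Each parent entry stores the MBR of the corresponding child.
    return Node(False, [c.mbr for c in children], list(children))


def check_rtree(root: Node, m: int, M: int) -> bool:
    """Return True if the tree satisfies properties 1) to 4) from the text."""
    leaf_depths = set()

    def visit(node: Node, depth: int, is_root: bool) -> bool:
        n = len(node.rects)
        if is_root:
            if n > M or (not node.leaf and n < 2):  # property 1
                return False
        elif not (m <= n <= M):                     # properties 2 and 3
            return False
        if node.leaf:
            leaf_depths.add(depth)
            return True
        for rect, child in zip(node.rects, node.children):
            if not covers(rect, child.mbr):         # parent MBR covers child MBR
                return False
            if not visit(child, depth + 1, False):
                return False
        return True

    return visit(root, 0, True) and len(leaf_depths) == 1  # property 4


def range_search(node: Node, query: Rect) -> List[Rect]:
    """Collect the data rectangles that intersect `query`."""
    hits: List[Rect] = []
    for i, rect in enumerate(node.rects):
        if intersects(rect, query):
            if node.leaf:
                hits.append(rect)
            else:
                hits.extend(range_search(node.children[i], query))
    return hits


# Hand-built example: two leaves with two data rectangles each.
root = make_inner([
    make_leaf([(0, 0, 1, 1), (2, 2, 3, 3)]),
    make_leaf([(5, 5, 6, 6), (7, 7, 8, 8)]),
])
print(check_rtree(root, m=2, M=4))
print(range_search(root, (0.5, 0.5, 2.5, 2.5)))
```

In this sketch the range search descends only into subtrees whose MBR intersects the query rectangle, which is why overlapping MBRs force a query to follow more than one path.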
C. Cloud Computing

The emergence of cloud computing benefits from the development of the Internet and computers. Ordinary users see it as a more convenient and faster way to obtain resources and services. IT companies such as Google, IBM, Microsoft, and Amazon see cloud computing as a platform whose computing and storage can be configured continuously according to the needs of different scenarios.

As the birthplace of cloud computing, the United States has the advanced technology that drives its development and is the leader of the cloud computing industry. More and more countries have begun to participate in cloud computing applications and have formulated plans and programs, suited to their own situation, to promote its development. IBM developed and implemented its InterCloud storage software in 2013, Microsoft released a cloud service operating system called Windows Azure, Amazon released streaming computing services, and other well-known companies have also designed and implemented their own cloud storage services and cloud computing platforms.

China's research on cloud computing has also achieved fruitful results. The relevant departments have actively formulated policies and programs to promote the development of domestic cloud computing, and have made technologically advanced cities such as Beijing and Shanghai pilot cities for cloud computing research and development. Technology companies such as Tencent, Baidu, and Alibaba have launched their own cloud computing platforms in response to this trend, and important domestic enterprises and research institutions such as Lenovo, Sugon, and Huawei have invested large amounts of manpower, material, and financial resources in cloud computing projects. Driven by these enterprises and institutions, China's cloud computing technology keeps innovating, and the industry keeps optimizing and developing.

$$SSE = \sum_{i=1}^{k} \sum_{p \in C_i} \left| p - m_i \right|^2 \tag{3}$$

$$n - n_0 = (n - n_0)_{t=0} \exp\left(-\frac{t}{R}\right) \tag{4}$$

III. EXPERIMENT

Based on cloud computing, the impact of data scale on data space is shown in Fig. 3.

Fig. 3. The impact of data size on data space

The multi-dimensional range search efficiency at different selection rates is shown in Fig. 4.

Fig. 4. Multi-dimensional range search efficiency with different selection rates

Based on cloud computing, the efficiency of distributed multi-dimensional data indexing is shown in Fig. 5.

Fig. 5. Distributed multi-dimensional data index efficiency

IV. CONCLUSION

This paper studies distributed multi-dimensional data index strategies in the cloud computing environment. The work is carried out from two aspects: multi-dimensional data indexing in cloud storage, and distributed computing. First, the paper introduces the principles and methods of distributed computing. It then analyzes the strategies and methods of multi-dimensional data indexing. Finally, it analyzes distributed multi-dimensional data indexing strategies based on cloud computing.


REFERENCES

[1] J. Gantz, D. Reinsel. The digital universe decade – are you ready?[R]. USA: International Data Corporation, July 16, 2010.
[2] Armbrust M, Fox A, Griffith R, et al. A view of cloud computing[J]. Communications of the ACM, 2010, 53(4): 50-58.
[3] Mell P, Grance T. The NIST definition of cloud computing[J]. Communications of the ACM, 2010, 53(6): 50.
[4] IDC Financial Insights[EB/OL]. December 21, 2015.
[5] http://cs.com.cn/ssgs/hyzx/201510/t20151020_4820527.html, Oct 20, 2015.
[6] http://tech.caijing.com.cn/20160108/4049497.shtml, Jan 8, 2016.
[7] Agmon Ben-Yehuda O, Ben-Yehuda M, Schuster A, et al. Deconstructing Amazon EC2 spot instance pricing[J]. ACM Transactions on Economics and Computation, 2013, 1(3): 16.
[8] IBM introduces ready-to-use cloud computing[EB/OL]. http://www-03.ibm.com/press/us/en/pressrelease/22613.wss.
[9] Calder B, Wang J, Ogus A, et al. Windows Azure Storage: a highly available cloud storage service with strong consistency[C]. Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM, 2011: 143-157.
[10] Vavilapalli V K, Murthy A C, Douglas C, et al. Apache Hadoop YARN: Yet another resource negotiator[C]. Proceedings of the 4th annual Symposium on Cloud Computing. ACM, 2013: 5.
[11] DB-Engines Ranking. http://db-engines.com/en/ranking, Feb 01, 2016.
[12] Chang F, Dean J, Ghemawat S, et al. Bigtable: A distributed storage system for structured data[J]. ACM Transactions on Computer Systems, 2008, 26(2): 1-26.
[13] Decandia G, Hastorun D, Jampani M, et al. Dynamo: Amazon's highly available key-value store[C]. Proceedings of the 21st ACM Symposium on Operating Systems Principles. New York, USA: ACM Press, 2007: 205-220.
[14] Lakshman A, Malik P. Cassandra: a decentralized structured storage system[J]. ACM SIGOPS Operating Systems Review, 2010, 44(2): 35-40.
[15] Carstoiu D, Cernian A, Olteanu A. Hadoop HBase-0.20.2 performance evaluation[C]. Proceedings of the 4th International Conference on New Trends in Information Science and Service Science. New York: IEEE, 2010: 84-87.
[16] Frank C, Morin S, Chebotko A, et al. Distributed semantic web data management in HBase and MySQL cluster[C]. Proceedings of IEEE International Conference on Cloud Computing. New York: IEEE, 2011: 105-112.
[17] Aguilera M K, Golab W, Shah M A. A practical scalable distributed B-tree[J]. PVLDB, 2008, 1(1): 598-609.
[18] Ghemawat S, Gobioff H, Leung S, et al. The Google file system[J]. ACM SIGOPS Operating Systems Review, 2003, 37(5): 29-43.
[19] Kala Karun A, Chitharanjan K. A review on hadoop—HDFS infrastructure extensions[C]. Proceedings of 2013 IEEE Conference on Information & Communication Technologies. IEEE, 2013: 132-137.
[20] Maltzahn C, Molina-Estolano E, Khurana A, et al. Ceph as a scalable alternative to the Hadoop Distributed File System[J]. The USENIX Magazine, 2010, 35: 38-49.
[21] Abouzied A, Bajda-Pawlikowski K, Huang J, et al. HadoopDB in action: building real world applications[C]. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. ACM, 2010: 1111-1114.
[22] Dittrich J, Quiané-Ruiz J-A, Jindal A, et al. Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)[J]. PVLDB, 2010, 3(1): 518-529.
[23] Chandrasekar S, Dakshinamurthy R, Seshakumar P G, et al. A novel indexing scheme for efficient handling of small files in hadoop distributed file system[C]. Proceedings of 2013 International Conference on Computer Communication and Informatics. IEEE, 2013: 1-8.
[24] Kim J W, Kim I. Efficient Range-based Query Processing on the Hadoop Distributed File System[C]. Proceedings of the 2nd Multimedia Workshop, 2013.
