Data Mining Association Method
Abstract— The purpose of this study was to determine consumer buying patterns at CV. XYZ by utilizing one of the data mining methods, namely the association method. The apriori algorithm can be used to find association rules from sales transaction data in the company with the help of the RapidMiner application. Thus, the sales transaction data in the company can be reprocessed to obtain critical information. Sales transaction data will be processed using Knowledge Discovery in Databases (KDD). Testing with the RapidMiner application yielded four association rules. The best association rule is that if consumers buy Pants with code 1076, they are also likely to purchase Pants with code 0814 (confidence = 83.3% and lift = 18.5).

Keywords— Association Method, Apriori Algorithm, Association Rules, Data Mining

I. INTRODUCTION
In Indonesia, many companies are engaged in the same field. For example, garment retail is growing and is in demand among customers from young people to adults. Each company must have various strategies to compete so that its business can develop and earn profits. One way is by utilizing all sales transaction data that has accumulated in the company itself [1]. Various problems are often experienced, such as not knowing how to arrange the layout of goods based on consumers' habits of buying goods together. In addition, some companies still do not know the pattern of purchasing goods, including which goods are purchased together by consumers in one transaction [2]. Business activities that run every day also cause transaction data to increase, but the data is only stored as an archive and only used for making sales reports. In fact, the data holds beneficial information, especially for retail business people seeking to advance their business [3].
The ability and speed to process big data into useful information are very much needed by companies in formulating effective and efficient business strategies [4]. With relevant information, purchasing patterns for items can be used to improve sales performance and support the right decisions. One way is to use data mining techniques to find these patterns [5]. With data mining, large volumes of transaction data are explored for added value, yielding knowledge that could not be found manually [6]. The association method is one of the methods found in data mining. This method aims to find the relationship between items in a database [7]. The apriori algorithm, one of the association rule algorithms in data mining, forms candidate item combinations and then tests whether each combination meets the minimum support and confidence parameters, which are threshold values set by the author. In addition to these two parameters, the lift parameter is also used to measure how important the resulting rules are, based on the support and confidence values, by indicating whether product A is truly purchased together with product B [8]. A supporting application that can be used to find association rules from transaction data is RapidMiner. RapidMiner is open-source software that provides solutions for data mining, text mining and predictive analysis [9].
The author conducted research at a retail garment company. So far, to see the sales results of existing products, the company still uses the difference between the goods produced and the goods sold. In addition, consumers habitually buy more than one type of goods in one transaction. According to interviews conducted by the author with the HRD Manager, sales transactions are recorded using the Accurate application and Microsoft Excel. To find out the types of goods that consumers like, the company looks at the application and at consumers' opinions. The company does not have a dedicated method to determine consumer buying patterns. So, to find out which items are bought together by consumers in one transaction, the company looks at the sales transaction records in the Accurate application and Microsoft Excel.
Data mining using the association method with the apriori algorithm is expected to be a solution that gives the company an overview of which goods consumers usually buy together, as a reference for producing goods based on consumer habits. In addition, the results obtained can provide information about which items should be placed close together to make it easier for consumers to find them. In this way, extensive sales transaction data can be utilized as well as possible to gain knowledge that benefits the company, rather than being used only as archives or reports.
Based on the explanation above, this study aims to generate consumer purchasing patterns from the CV. XYZ sales transaction data in 2021, using RapidMiner as a supporting application to find association rules. In addition, these results also show which types of goods should be sold and which categories they come from.
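The candidate-generation-and-test procedure that the introduction attributes to the apriori algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RapidMiner process used in the study: the function name `apriori_rules`, its parameters, the toy transactions and the item names in them are all hypothetical, and the thresholds are chosen only so that the toy data yields a few rules.

```python
from itertools import combinations


def apriori_rules(transactions, min_support, min_confidence):
    """Mine rules as (antecedent, consequent, support, confidence, lift)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Share of transactions that contain every item in the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}  # itemset -> support, for all itemsets that pass min_support
    k = 1
    while candidates:
        # Test step: keep only candidates that meet the minimum support.
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Candidate step: join frequent k-itemsets into (k+1)-itemsets and
        # drop any candidate that has an infrequent k-item subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }

    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    rules.append((antecedent, consequent, supp, confidence, lift))
    return rules


# Hypothetical toy transactions; these are not the CV. XYZ data.
transactions = [
    {"pants_a", "pants_b"},
    {"pants_a", "pants_b", "shirt_c"},
    {"pants_a"},
    {"shirt_c"},
    {"pants_b", "shirt_c"},
]
rules = apriori_rules(transactions, min_support=0.4, min_confidence=0.6)
for antecedent, consequent, supp, conf, lift in rules:
    print(sorted(antecedent), "=>", sorted(consequent),
          f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

On this toy data the three-item candidate is pruned because one of its pairs is infrequent, which is the step that keeps apriori from testing every possible combination.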
The 10th International Conference on Cyber and IT Service Management (CITSM 2022)
Yogyakarta, September 20-21, 2022
$$\mathrm{Support}(A) = \frac{\sum \text{transactions containing } A}{\text{total transactions}}$$
The formula for finding the support value for two items is
as follows:
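The two-item formula did not survive in this copy of the paper. Assuming the standard definitions used in association rule mining (they are not quoted from the source), the two-item support, together with the confidence and lift measures that the paper reports, can be written as:

```latex
\begin{aligned}
\mathrm{Support}(A, B) &= \frac{\sum \text{transactions containing } A \text{ and } B}{\text{total transactions}} \\
\mathrm{Confidence}(A \Rightarrow B) &= \frac{\mathrm{Support}(A, B)}{\mathrm{Support}(A)} \\
\mathrm{Lift}(A \Rightarrow B) &= \frac{\mathrm{Confidence}(A \Rightarrow B)}{\mathrm{Support}(B)}
\end{aligned}
```

Under these definitions, the figures quoted in the abstract for Pants 1076 ⇒ Pants 0814 (confidence = 83.3%, lift = 18.5) would imply a support of about 0.833 / 18.5 ≈ 0.045 for Pants 0814 on its own, that is, the item appearing in roughly 4.5% of transactions.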
1. Identification of problems
   The research stage begins with determining the research problem, namely that CV. XYZ has a lot of sales transaction data, but it is only kept as an archive without being reused, even though the data can be reprocessed to obtain critical information.
2. Goal setting
   Once the problems are identified, the objectives to be achieved are determined, one of which is finding consumer purchasing patterns from large sales transaction data. To accomplish this goal, one must study various literature relevant to the research problem. Literature can be in journals or books on data mining that use association methods with apriori algorithms.
3. Data and information collection
   Observations and direct interviews with the company are carried out to determine the problem clearly and to obtain the 2021 sales transaction data, which will be processed to find association rules.
4. Methods and algorithms used
   The 2021 sales transaction data that has been obtained will be processed using the association method with the apriori algorithm.
5. Processing with RapidMiner
   The author uses the RapidMiner application to process the 2021 sales transaction data by entering the tabulated data into RapidMiner to find the rules in the sales transaction data.
6. Result analysis
   The author will then analyze the association rules generated by the RapidMiner application to obtain other important information.

Fig 3. RapidMiner Operators Used

Determination of the minimum limit of support and confidence
The minimum support limit used is 0.015 because item combinations most often appear three times. The items sold are of many types, which are not proportional to the number of purchases. The minimum confidence limit used is 0.4 to produce the best association rules.

Association rules results
The figure below shows the association rules obtained from the CV. XYZ sales transaction data in 2021, as produced by RapidMiner.

Fig 4. Association Rules 2021 in RapidMiner

The result is four association rules, explained as follows:
Rule 1 : If you buy PN 0773, you have a 41.7% chance of buying PN 0844. The strength of the relationship is 9.25.
Rule 2 : If you buy PN 0844, you have a 50% chance of buying PN 0773. The strength of the relationship is 9.25.
Rule 3 : If you buy PN 0814, you have a 50% chance of buying PN 1076. The strength of the relationship is 18.5.
Rule 4 : If you buy PN 1076, you have an 83.3% chance of buying PN 0814. The strength of the relationship is 18.5.

[Bar chart: confidence (%) and lift for the rules PN 0773 ⇒ PN 0844, PN 0844 ⇒ PN 0773, PN 0814 ⇒ PN 1076 and PN 1076 ⇒ PN 0814]
Fig 5. Association Rules Graph in 2021

Association rules result analysis
Analysis of the results of association rules is needed to find out more about the rules generated by RapidMiner.
1. Category
   The Pants category dominated the association rules from the 2021 sales transaction data.
2. The highest support
   All four resulting rules have the same high support, meaning that these item combinations appear often across the transactions.
3. The highest confidence
   The highest confidence comes from rule number 4 (PN 1076 ⇒ PN 0814), meaning that, among the four rules, buyers of PN 1076 are the most likely to also buy PN 0814.
4. The highest lift
   The highest lift comes from rule number 3 (PN 0814 ⇒ PN 1076) and rule number 4 (PN 1076 ⇒ PN 0814), meaning that these rules have the strongest association compared to rules number 1 and 2.
5. The best association rules
   Rule number 4 is the best association rule because its confidence and lift are the greatest, meaning that in 2021 consumers often bought Pants 1076 and Pants 0814 together.
6. The similarity of confidence from the resulting lift
   (PN 0844 ⇒ PN 0773) and (PN 0814 ⇒ PN 1076) produced the same confidence but different lift, meaning that the rule with the lower lift, PN 0844 ⇒ PN 0773, has the weaker association strength.
7. The similarity of support from the resulting confidence and lift
   From the resulting association rules, we can see that the support is equally high, but that does not mean that the confidence and lift are also the same. For example, rule number 1 has the lowest confidence, and its lift is lower than that of rules number 3 and 4.

V. CONCLUSION
The attributes contained in the CV. XYZ sales transaction data can be used to perform data mining analysis using the association method with the apriori algorithm, assisted by the RapidMiner application. Companies can find the relationships between goods from many sales transactions. This can be seen from the four association rules produced by RapidMiner, which show that items in the Pants category are the most often purchased together, especially PN 1076 ⇒ PN 0814. The best association rules can be used as recommendations for the company when producing goods the following year.
Goods with the highest support can be placed at the front because consumers purchase these items most often. Goods with the highest confidence can be placed side by side because consumers tend to buy these items together. Another option is to create a sales brochure that places items with the best association rules on one page so that consumers can easily see these items.

REFERENCE
[1] A. Oktaviani, G. TM Napitupul, D. Sarkawi, and I. Yulianti, “Penerapan Data Mining Terhadap Penjualan Pipa Pada Cv. Gaskindo Sentosa Menggunakan Metode Algoritma Apriori,” J. Ris. Inform., vol. 1, no. 4, pp. 167–172, 2019, doi: 10.34288/jri.v1i4.96.
[2] J. L. Putra, M. Raharjo, T. A. A. Sandi, R. Ridwan, and R. Prasetyo, “Implementasi Algoritma Apriori Terhadap Data Penjualan Pada Perusahaan Retail,” J. Pilar Nusa Mandiri, vol. 15, no. 1, pp. 85–90, 2019, doi: 10.33480/pilar.v15i1.113.
[3] Y. Wahyuningtias and R. Rusdiansyah, “Analisis Penerapan Asosiasi Untuk Menentukan Transaksi Penjualan Pada What’S Up Café Dengan Metode Algoritma Apriori,” J. Ris. Inform., vol. 1, no. 4, pp. 181–186, 2019, doi: 10.34288/jri.v1i4.92.
[4] N. Fitrina, K. Kustanto, and R. T. Vulandari, “Penerapan Algoritma Apriori Pada Sistem Rekomendasi Barang Di Minimarket Batox,” J. Teknol. Inf. dan Komun., vol. 6, no. 2, pp. 21–27, 2018, doi: 10.30646/tikomsin.v6i2.376.
[5] A. Sani, “Analisa Penjualan Retail dengan Metode Association Rule untuk Pengambilan Keputusan Strategis Perusahaan: Studi Kasus PT. XYZ,” Infotech, no. September, 2016, [Online]. Available: https://www.researchgate.net/profile/Asrul-Sani/publication/327680554_ANALISA_PENJUALAN_RETAIL_DENGAN_METODE_ASSOCIATION_RULE_UNTUK_PENGAMBILAN_KEPUTUSAN_STRATEGIS_PERUSAHAAN_Studi_Kasus_PT_XYZ/links/5b9e8660299bf13e60373b02/ANALISA-PENJUALAN-RETAIL-DENGA.
[6] V. N. Budiyasari, P. Studi, T. Informatika, F. Teknik, U. Nusantara, and P. Kediri, “Implementasi Data Mining Pada Penjualan kacamata Dengan Menggunakan Algoritma Apriori,” Indones. J. Comput. Inf. Technol., vol. 2, no. 2, pp. 31–39, 2017.
[7] Nurdin and D. Astika, “Penerapan Data Mining Untuk Menganalisis Penjualan Barang dengan Menggunakan Metode Apriori pada Supermarket Sejahtera Lhoksumawe,” J. Ilm. Rekayasa dan Manaj. Sist. Inf., vol. 4, pp. 77–80, 2018.
[8] D. A. N. Wulandari and L. Ningsih, “Data Mining Market Basket Analysis Menggunakan Algoritma Apriori Untuk Menentukan Persediaan Obat,” Konf. Nas. Ilmu Sos. Teknol., vol. 1, no. 1, pp. 227–235, 2017.