5 authors. All content following this page was uploaded by Mert Onuralp Gökalp on 08 October 2017.
MOA [19] and ADAMS [20] are similar to KNIME, KEPLER and RapidMiner with regard to application design and execution. While the latter are batch processing oriented
The architecture of the proposed conceptual framework consists of the following modules: big data application design, pre-processing input data streams, distributed infrastructure, and distribution of results.
The Big Data Application Design module allows system engineers to develop their own big data applications with a visual editor. Applications are represented as directed graphs, where vertices represent data mining and machine learning algorithms as well as programming constructs, and edges represent data streams that correspond to intermediate results, as shown in Figure 2. The programming nodes consume and produce data in a common standard format, so that they can handle data from various sources and be integrated with other programming nodes. Thus, the application logic can be built simply by connecting programming nodes, without worrying about their internal details and interfaces.
Figure 2. Programming Model
In this setting, a large number of Data Sources should be integrated into the platform to collect information about different aspects of a factory. Due to their heterogeneous nature, these data sources may generate data in disparate formats. Therefore, data variety is an important challenge that can hinder the adoption of big data analytics in the Industry 4.0 domain. Hence, the Preprocessing Input Data Streams module plays a central role in our framework by converting data into a common format for further processing. This conversion relies on data standardization, which defines a common standard for receiving structured, semi-structured and unstructured data from a variety of sources.
Deployed applications need fast and scalable infrastructures to handle big data use cases effectively. Therefore, big data platforms are established on a Distributed Infrastructure. User-defined applications are deployed automatically on the distributed infrastructure to handle the unique characteristics of big data. On the other hand, the requirements of big data applications vary according to use cases. For instance, a monitoring application needs to process stream data and produce results in real time, whereas a predictive analytics application needs to process bulk data to detect potential production risks in the upcoming weeks or months. There is no “one-size-fits-all” big data solution; instead, each big data platform has its own advantages and disadvantages. Therefore, the proposed framework aims to support multiple big data platforms, such as Storm, Spark and Flink, so that one of the supported platforms can be chosen according to the specific characteristics of the application under design. Moreover, by considering the designed application logic and use cases, the framework itself can recommend a suitable big data platform to run the application.
The results of the applications may be forwarded to interested parties in different forms. Each distribution channel is defined as a programming node in the visual editor, so users may select more than one distribution channel to deliver the results. In this way, certain problems in production may be forwarded to the right staff as notifications. The results can also be used as inputs to actuators; hence, manufacturing processes can be controlled and even improved. It is also possible to deliver the results to external entities via web services for data visualization or monitoring purposes.
IV. CONCLUSION
There is an abundance of tools and application frameworks for processing big data, yet new tools continue to emerge, especially for stream data. These tools are commonly developed by Internet-based companies, including Google, Twitter, LinkedIn and Yahoo, according to their business requirements and then released as open source. The low-level complexities of data processing platforms make them suitable for programmers who have knowledge of and experience in data science. On the other hand, people whose expertise and deep knowledge lie only in a specific domain may not be able to use these tools. As a result, real-time data coming from various sources cannot be integrated into the business processes of an enterprise.
Data flow based visual programming models specialized for the big data domain can solve this problem by allowing programmers to iteratively develop new techniques that utilize real-time data. In organizations, people can quickly design and develop small programs to investigate whether there are efficiency or quality issues in production and service processes. We see this approach as an important step towards the Industry 4.0 vision.
In this paper, we propose a conceptual framework which can be utilized in a smart enterprise. Its main components are designed to abstract users away from low-level complexities such as data standardization, platform-specific development, resource management, protocols, and APIs. The framework handles the collection of data from IoT and Web-based data sources, implementation of big data analytics applications containing machine learning and data mining components, translation of visually designed
programs into platform-specific ones, management of jobs among processing units, and delivery of results to people and services. From this perspective, the framework facilitates the integration of big data analytics with business processes by providing an end-to-end approach.
REFERENCES
[1] J. Lee, H. A. Kao, and S. Yang, “Service innovation and smart analytics for Industry 4.0 and big data environment,” in Procedia CIRP, 2014, vol. 16, pp. 3–8.
[2] K. Kayabay, M. O. Gökalp, M. A. Akyol, A. Koçyiğit, and P. E. Eren, “Big Data for Future Enterprises: Current State and Trends,” in 3rd International Management Information Systems Conference, İzmir, 2016, pp. 298–307.
[3] “Apache Beam.” [Online]. Available: http://beam.incubator.apache.org. [Accessed: 10-Nov-2016].
[4] “Apache Spark.” [Online]. Available: http://spark.apache.org. [Accessed: 28-Oct-2016].
[5] “Apache Flink.” [Online]. Available: http://flink.apache.org. [Accessed: 28-Oct-2016].
[6] “Apache Samoa.” [Online]. Available: https://samoa.incubator.apache.org. [Accessed: 10-Nov-2016].
[7] “Apache Storm.” [Online]. Available: http://storm.apache.org. [Accessed: 28-Oct-2016].
[8] A. Toshniwal et al., “Storm@twitter,” Proc. 2014 ACM SIGMOD Int. Conf. Manag. Data - SIGMOD ’14, pp. 147–156, 2014.
[9] “Apache S4.” [Online]. Available: http://incubator.apache.org/s4/. [Accessed: 10-Nov-2016].
[10] “Apache Samza.” [Online]. Available: http://samza.apache.org. [Accessed: 10-Nov-2016].
[11] M. Blackstock and R. Lea, “WoTKit,” in Proceedings of the Third International Workshop on the Web of Things - WOT ’12, 2012, pp. 1–6.
[12] “Node-RED.” [Online]. Available: https://nodered.org. [Accessed: 12-Nov-2016].
[13] S. Mayer, N. Inhelder, R. Verborgh, and R. Van De Wallet, “User-friendly configuration of smart environments,” in 2014 IEEE International Conference on Pervasive Computing and Communication Workshops, PERCOM WORKSHOPS 2014, 2014, pp. 163–165.
[14] H. Chen, R. H. L. Chiang, and V. C. Storey, Business Intelligence and Analytics: From Big Data to Big Impact, vol. 36, no. 4, 2012.
[15] J. Demšar, B. Zupan, G. Leban, and T. Curk, “Orange: From Experimental Machine Learning to Interactive Data Mining,” Knowl. Discov. Databases PKDD 2004, pp. 537–539, 2004.
[16] M. R. Berthold et al., “KNIME - the Konstanz information miner: version 2.0 and beyond,” ACM SIGKDD Explor. Newsl., vol. 11, no. 1, pp. 26–31, 2009.
[17] I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher, and S. Mock, “Kepler: an extensible system for design and execution of scientific workflows,” Sci. Stat. Database Manag. 2004. Proceedings. 16th Int. Conf., vol. I, pp. 423–424, 2004.
[18] I. Mierswa, M. Wurst, R. Klinkenberg, M. Scholz, and T. Euler, “YALE: Rapid prototyping for complex data mining tasks,” Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., vol. 2006, pp. 935–940, 2006.
[19] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, “MOA: Massive Online Analysis,” J. Mach. Learn. Res., vol. 11, pp. 1601–1604, 2011.
[20] P. Reutemann and J. Vanschoren, “Scientific Workflow Management with ADAMS,” Knowl. Discov. Databases, pp. 833–837, 2012.
[21] “Apache Mahout.” [Online]. Available: https://mahout.apache.org. [Accessed: 12-Nov-2016].
[22] M. O. Gokalp, A. Kocyigit, and P. E. Eren, “A Cloud Based Architecture for Distributed Real Time Processing of Continuous Queries,” in Proceedings - 41st Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2015, 2015, pp. 459–462.