
An important aspect of Big Data applications is the variability of technical needs and steps depending on the application being developed. These applications, which typically involve data ingestion, preparation (e.g., extract, transform, and load), integration, analysis, visualization, and dissemination, are referred to as Data Science Workflows. Data science workflow development is the process of combining data and processes into a configurable, structured set of steps that implement an automated computational solution for an application, with capabilities including provenance management, execution management and reporting tools, integration of distributed computation and data management technologies, the ability to ingest local and remote scripts, and sensor management and data streaming interfaces. Each data science workflow poses a set of technological challenges that can potentially be addressed by a number of Big Data tools and middleware. Rapid programmability of applications on a use-case basis requires workflow management tools that can interface with and facilitate the integration of other tools. New programming techniques are needed for building effective and scalable solutions that span data science workflows. The flexibility of workflow systems in combining tools and data makes them an ideal choice for the development of data science applications involving common Big Data programming patterns.
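
As a concrete illustration, the sketch below expresses such a workflow as a configurable, structured set of steps (ingestion, preparation, analysis, and dissemination) chained by a small driver. The step functions, the run_workflow helper, and the measurements.csv input are illustrative assumptions and do not correspond to any particular workflow system.

```python
# Minimal sketch of a data science workflow as a configurable set of steps.
import csv
import statistics
from typing import Callable, Dict, List


def ingest(path: str) -> List[Dict[str, str]]:
    """Ingestion: read raw records from a local CSV file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def prepare(records: List[Dict[str, str]]) -> List[float]:
    """Preparation (extract/transform): keep only valid numeric 'value' fields."""
    values = []
    for r in records:
        try:
            values.append(float(r["value"]))
        except (KeyError, ValueError):
            continue  # drop malformed rows
    return values


def analyze(values: List[float]) -> Dict[str, float]:
    """Analysis: compute simple summary statistics."""
    return {"count": len(values),
            "mean": statistics.mean(values) if values else 0.0}


def report(summary: Dict[str, float]) -> None:
    """Dissemination: here just print; could write to a dashboard or file."""
    print(summary)


def run_workflow(source: str, steps: List[Callable]) -> None:
    """Chain the configured steps, passing each output to the next step."""
    data = source
    for step in steps:
        data = step(data)


if __name__ == "__main__":
    run_workflow("measurements.csv", [ingest, prepare, analyze, report])
```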

Big Data workflows have been an active research area since the introduction of scientific workflows. After the development and general adoption of MapReduce as a Big Data programming pattern, a number of workflow systems were built or extended to enable programmability of MapReduce applications, including Oozie, Nova, Azkaban, and Cascading. The Kepler Workflow Environment, developed by the WorDS Center of Excellence at SDSC led by Ilkay Altintas, also provides a distributed data-parallel (DDP) programming module supporting MapReduce and other Big Data programming patterns on top of the well-known Hadoop and Spark engines to build and execute Big Data workflows. The actor-oriented approach of Kepler provides flexibility and improves application programmability due to: (i) its heterogeneous nature, in which Big Data programming patterns are embedded alongside other workflow tasks; (ii) its visual programming approach, which does not require scripting of Big Data patterns; and (iii) its adaptability for executing data-parallel applications on different execution engines.
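
To make the underlying pattern concrete, the sketch below shows the MapReduce pattern (a word count) written directly against the Spark engine in PySpark. This is not Kepler's actor-oriented DDP interface; the input.txt file and the local master setting are assumptions for illustration only.

```python
# Minimal sketch of the MapReduce pattern on the Spark engine (PySpark).
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("wordcount-sketch").setMaster("local[*]")
sc = SparkContext(conf=conf)

lines = sc.textFile("input.txt")                       # data ingestion
counts = (lines.flatMap(lambda line: line.split())     # "map": emit words
               .map(lambda word: (word, 1))            # key-value pairs
               .reduceByKey(lambda a, b: a + b))       # "reduce": sum per key

for word, count in counts.collect():
    print(word, count)

sc.stop()
```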
