BIGPROD 04042022 Expert Session Slides
This project has received funding from the European
Union's Horizon 2020 research and innovation
programme under grant agreement No 870822
Agenda
• Introduction
• A comparison of NACE and Microsoft Academic Graph (MAG) based
industry classifications
• Field of Study (FOS) code-based digitalization score
• Academy-Industry collaboration based on website data
• Q&A
Addressing the productivity paradox
• The objective of the project is:
• To extend existing econometric approaches to productivity with theoretically sound “Big data” measures that can be operationalized and validated through pilots.
• To conduct deep stakeholder consultation that mitigates the skills gap, creates transparency, enables stakeholders to influence sources and tools, and keeps policy makers informed about tools and pilots.
Data
Data processing
• 183,161 medium-high and high-tech companies in the following NACE sectors:
o Manufacture of basic pharmaceutical products and pharmaceutical preparations (21)
o Manufacture of computer, electronic and optical products (26)
o Manufacture of air and spacecraft and related machinery (30.3)
o Manufacture of chemicals and chemical products (20)
o Manufacture of weapons and ammunition (25.4)
o Manufacture of electrical equipment (27)
o Manufacture of machinery and equipment n.e.c. (28)
o Manufacture of motor vehicles, trailers and semi-trailers (29)
o Manufacture of other transport equipment (30) excluding Building of ships and boats (30.1) and excluding Manufacture of air and spacecraft and related machinery (30.3)
o Manufacture of medical and dental instruments and supplies (32.5)
• Web scraping: 96,921 med-high and high-technology companies
o Targeting various aspects of micro-level innovation activity, such as collaborative activities, company’s products, and use of data standards
• Constructing the DB
o Relational database as a PostgreSQL database, including 7 data tables and 28 variables
• Financial data: 47,826 companies
o Enriching by linking to publicly available data
• EU-27 and UK
Database structure
What is on websites anyway?
• Websites offer complementary data on companies’ innovation
activities, compared to patenting and publication activities,
particularly in downstream innovation activities (Gök et al., 2015).
• Interesting work on capturing innovation-related data from websites includes Kinne & Axenbeck (2020), Arora et al. (2020) and Li et al. (2018).
• We did a content analysis of a sample of companies in the BIGPROD data.
• The content analysis of 38 companies’ websites, including large firms and SMEs in both B2B and B2C, showed that website information can be categorized into 7 categories:
What we get from webpages
Category: Description
1. Competitive advantage: The website attempts to signal the competitive advantages of the firm’s products and services. The competitive advantages can be related to quality and technology level, multi-aspect orientation, affordability, and product/service standards.
2. Competence and capabilities: The website communicates the firm’s competencies and capabilities. This message can also be highlighted through the firm’s knowledge and capabilities in offering diverse solutions, the firm’s leadership and dominance in the market, and its relations with other firms/brands.
3. Corporate social responsibility: The website communicates corporate social responsibility in terms of how social responsibility concerns are addressed in the company’s business activities and policies. Corporate social responsibility may include sustainability issues, philanthropic activities, as well as inclusion and diversity.
4. Ethics and compliance: The website may communicate the firm’s ethics and compliance, which can be explained through codes of conduct and ethical frameworks.
5. Organizational structure: The website describes the organizational structure, investor relations, and corporate governance. This message may also cover the firm’s mission and vision, long-term strategies, and growth framework.
6. Financials: The website presents the financial documents and earnings of the company to show its profitability.
7. Supply chain: The website targets current and future suppliers and logistics partners to communicate the firm’s alliance strategies for growing the business.
A comparison of NACE and Microsoft Academic Graph (MAG) based industry classifications
European Classification of Economic Activities (NACE)
Limitations
• Distinguishing all activities and being inclusive.
• Changes in economic structures and organizations and technological
developments give rise to new activities and products, which may supersede
existing activities and products.
• The difference in the identification and grouping of similar economic activities
associated with moving to the new NACE implies a statistical break in the time
series.
• Therefore, the NACE classification is under constant review. In NACE Rev. 2, the detail of the classification increased substantially (from 514 to 615 classes).
Mitigating NACE limitations
• Since the introduction of NACE Rev. 2, attempts have been made to map and cross-validate the industry-driven codes to research and technologies (Schmoch et al., 2003). This enables analysis of relationships between industries based on their technology capabilities.
• The background for the reallocation of different indicators is that many indicators commonly used in innovation research at the meso-level to measure the output of innovation systems are measured at different scales (Neuhäusler, Frietsch and Kroll, 2019).
• In order to assess the effects of innovation indicators on various social, economic, environmental and technical events, some concordance efforts have been made to re-allocate different indicators to each other (Frietsch et al., 2017; Neuhäusler et al., 2017).
• In recent attempts, probabilistic concordance schemes have been generated for the assignment of patents and scientific publications to NACE codes (Neuhäusler, Frietsch and Kroll, 2019).
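To make the idea of a probabilistic concordance concrete, the following is a minimal sketch of fractional allocation: each source category (for example a technology field) carries weights over NACE codes, and counts are distributed in proportion to those weights. The field names, weights and the function `allocate_to_nace` are hypothetical illustrations, not the scheme published by Neuhäusler, Frietsch and Kroll (2019).

```python
from collections import defaultdict

# Hypothetical concordance: each source field maps to NACE codes with
# weights that sum to 1. These numbers are invented for illustration.
CONCORDANCE = {
    "field_A": {"26": 0.75, "27": 0.25},
    "field_B": {"21": 1.0},
}


def allocate_to_nace(counts, concordance):
    """Distribute counts per source field across NACE codes by weight."""
    totals = defaultdict(float)
    for field, count in counts.items():
        # Fields without a concordance entry are skipped in this sketch.
        for nace_code, weight in concordance.get(field, {}).items():
            totals[nace_code] += count * weight
    return dict(totals)


# Example: 8 documents in field_A and 3 in field_B.
allocation = allocate_to_nace({"field_A": 8, "field_B": 3}, CONCORDANCE)
```

Because the weights for each field sum to 1, the allocated totals add up to the number of input documents that have a concordance entry.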
Methodology
• New NACE classifications reallocation and concordance model.
• Web scraping exercise on companies' web pages to retrieve the textual content
• Map companies' activities indicated in their websites to a hierarchical topic modeling classification
• Populate companies' NACE code classification along the web-scraped, topic-modeled new classifications.
Process diagram:
• Company meta data from ORBIS: Company ID, Company Name, NACE classification, Website address
• Website scraping: automated process with Python scripting language; text cleaning and harmonizing
• Mapping text content to MAG: classification by NLP and topic modelling based on text content
• Business and economic activity classified by NACE codes (4 digits)
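The text cleaning and harmonizing step of the website scraping could look roughly like the sketch below. It uses only the Python standard library, and the specific rules (dropping script and style content, collapsing whitespace, lowercasing) are assumptions for illustration; the slides do not state which cleaning rules the project applied. The names `_TextExtractor` and `clean_page` are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_page(raw_html: str) -> str:
    """Return lowercased visible text with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.parts)
    return re.sub(r"\s+", " ", text).strip().lower()
```

The cleaned string would then be the input to the NLP and topic modelling step that maps text content to MAG.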
Microsoft Academic Graph (MAG)
Microsoft Academic Graph (MAG) is a large heterogeneous graph comprising more than 200 million publications and the related authors, venues, organizations, and fields of study.
What is digitalization and digital capability?
Use of digital technologies to innovate business routines toward more
efficient and flexible performance, providing new revenue streams
through defining new business models, and promoting competitive
advantages by exploiting value-producing opportunities.
Monitoring the digital environment (Annarelli et al., 2021)
Product digitalization
• Digitalization capabilities, such as the adoption of digital technologies, resources and infrastructure, offer a different viewpoint than a firm’s orientation toward developing digital products.
• In manufacturing industries, competitive advantages are promoted based on technology and products. Promoting new technologies is especially important in high-tech industries, where R&D inputs are directed toward new product development (Hagedoorn and Cloodt, 2003; Björkdahl, 2020).
• Turning products into innovative digital products is a profound form of digital innovation. Deploying digital components in products can increase the efficiency and functionality of the product, and also make it easier to edit and upgrade (Björkdahl, 2020).
• However, developing digitally integrated (digitized) products may require redesigning traditional physical products, which can impose high development costs. Hidden burdens should also be considered, such as restructuring the development process (for example testing the new product, which is crucial in manufacturing industries) and the reusability of development platforms (Björkdahl, 2020).
Measure development (data source)
• Measuring capabilities using conventional data sources is hampered by limited data coverage (Arora et al., 2020). Instead, this analysis measures digitalization with a novel methodology based on companies’ webpages.
• Websites provide valuable information on company behavior (Gök, Waterworth and Shapira, 2015; Kinne and Axenbeck, 2020; Axenbeck and Breithaupt, 2021).
• Because firms communicate their capabilities on their websites, webpages can be used to develop extensive internal capability measures at a large scale. Moreover, webpages provide more frequent and up-to-date data than conventional data sources (Arora et al., 2020).
Operationalization
Digitalization scores investigate the presence of FOS ids associated with computer science.

\[
\text{Product digitalization} = \frac{n_{\text{digital}}}{n_{\text{non-digital}} + n_{\text{digital}}}
\]

Here $n_{\text{digital}}$ is the number of digital products of the firm and $n_{\text{non-digital}}$ is the number of non-digital products.

\[
\text{Capability digitalization} = \frac{\sum_{i=0}^{n} x^{i}_{\text{digital}}}{\sum_{i=0}^{n} x^{i}_{\text{digital}} + \sum_{j=0}^{m} x^{j}_{\text{non-digital}}}
\]

Here $x^{i}_{\text{digital}}$ is the similarity score of FOS id $i$, known as a digital FOS id, when the website contains $n$ digital FOS ids, and $x^{j}_{\text{non-digital}}$ is the similarity score of FOS id $j$, known as a non-digital FOS id, when the website contains $m$ non-digital FOS ids.
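The two scores can be computed in a few lines. The sketch below is a minimal illustration under stated assumptions: similarity scores are given as a mapping from FOS id to score, the set of digital FOS ids is known in advance, and a firm with no products or no matched FOS ids gets a score of 0. The function names, the example FOS labels and the example values are hypothetical.

```python
from typing import Mapping, Set


def product_digitalization(n_digital: int, n_non_digital: int) -> float:
    """Share of digital products among all products of a firm."""
    total = n_digital + n_non_digital
    # Assumption: a firm with no identified products scores 0.
    return n_digital / total if total else 0.0


def capability_digitalization(
    similarities: Mapping[str, float], digital_fos: Set[str]
) -> float:
    """Share of similarity mass that falls on digital FOS ids."""
    digital = sum(s for fos, s in similarities.items() if fos in digital_fos)
    total = sum(similarities.values())
    # Assumption: a website with no matched FOS ids scores 0.
    return digital / total if total else 0.0


# Hypothetical website with two digital and one non-digital FOS id.
example_scores = {"machine learning": 0.5, "embedded system": 0.25, "metallurgy": 0.75}
digital_ids = {"machine learning", "embedded system"}
capability = capability_digitalization(example_scores, digital_ids)
product = product_digitalization(n_digital=3, n_non_digital=9)
```

Both scores lie between 0 and 1, so firms can be compared regardless of how many products or FOS ids their websites mention.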
Case examples
Number of connections
• Literature has extensively looked at the importance of collaboration and in particular the collaboration between industrial actors and research organizations (Cohen 2002; Suominen 2018)
• Particularly ecosystems have seen a lot of research
• Methods to operationalize collaboration include, but are surely not limited to
o Joint patenting (Petruzzelli 2009)
o Research grants from industry to research organizations (D’Este et al. 2013)
o Survey data (e.g. Community Innovation Survey) (Kobarg et al 2013)
o Co-publishing between academia and industry (Abramo et al. 2009)
• What we are proposing here is an additional vantage point for measuring collaboration between industrial actors and between industrial actors and research organizations.
• The measure is based on web scraping a sample of medium-high and high-technology companies from the EU and UK.
• The main motivation for the work is to offer additional measures to complement the partial views offered by existing measures.
• Making no claims of superiority, we see that the web-scraped data offers insights into:
o Collaborative differences between different industries or by the size of companies
o Deep analysis on a geospatial level, analyzing regional ecosystems and the importance of distance at scale
o Analyzing collaboration by thematic factors
o Analyzing collaborative differences between industry-industry and industry-research organization
Number of connections
• The data contains 222 756 instances of collaboration. From these:
• 57 899 are unique collaborators
• Overall, 19.4 % of the companies have mentioned their collaborative activities on their website.
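Counts like these could be derived from a list of scraped collaboration mentions, as in the minimal sketch below. It assumes each mention is a (company id, collaborator name) pair and that the total number of scraped companies is known; the data layout and the function `collaboration_summary` are hypothetical and not taken from the project database.

```python
def collaboration_summary(mentions, n_companies):
    """Summarize collaboration mentions scraped from company websites.

    mentions: iterable of (company_id, collaborator_name) pairs.
    n_companies: total number of companies whose websites were scraped.
    """
    mentions = list(mentions)
    companies_with_mentions = {company for company, _ in mentions}
    unique_collaborators = {collaborator for _, collaborator in mentions}
    share = len(companies_with_mentions) / n_companies if n_companies else 0.0
    return {
        "instances": len(mentions),
        "unique_collaborators": len(unique_collaborators),
        "share_of_companies": share,
    }


# Hypothetical toy data: two of four companies mention collaborators.
summary = collaboration_summary(
    [("c1", "University A"), ("c1", "Firm B"), ("c2", "University A")],
    n_companies=4,
)
```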
Quantitative Science and Technology Studies team,
Foresight-driven Business Strategies, VTT Technical
Research Centre of Finland