
Retest of the bioinformatics module "Digital Sustainability"

**********************************************************
6 June 2023

Please write your answers directly into this file right below the respective
questions (or delete the wrong answers in the multiple choice section), save the
file and hand it in via ILIAS.

A) OPEN SOURCE, OPEN DATA, DIGITAL SUSTAINABILITY

Open Source Software (10 points)


Assume you are working in a company that developed a great piece of AI software to
analyze the DNA of a virus. Now the boss has heard that many companies are nowadays
releasing open source software. She asks you what the reasons are for this and how
the company could benefit by also releasing its software as open source.

1) Write one reason how the company could benefit from open sourcing the code. (5
points)

Open sourcing the code attracts a wider community of developers and users who can
share knowledge and contribute new features. This cooperation among users also saves
development costs. In addition, it helps to build trust with the community, which
can lead to wider use of the software.

2) Describe one risk of open sourcing the code. (5 points)
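Competitors can benefit from the work: since the code is freely available to
everyone, other organisations can reuse it to build competing products, which weakens
the company's competitive advantage.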

Open Data (8 points)


In the lecture, you learned about many different types of open data. Now let's look
at open bioinformatics data:

1) What types of open data are relevant in bioinformatics? Please describe one
example and explain why it is important in bioinformatics. (4 points)

Relevant open data types are large collections of biological data that support
research, such as genomic data and proteomic data, available from resources like
UniProt and The Cancer Genome Atlas (TCGA).

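GENOMIC DATA
Genomic data is important because it provides detailed information about the genomic
composition of organisms; for example, it allows researchers to compare organisms or
to identify mutations associated with disease.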
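To make "genomic composition" concrete, the following minimal Python sketch reads sequences in FASTA format (a common plain-text format for genomic data) and computes their GC content, one simple measure of base composition. The toy sequences are invented for illustration, and the parser is a simplified assumption that ignores edge cases handled by dedicated libraries such as Biopython.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Fraction of G and C among A, C, G, T (N and other codes are ignored)."""
    counts = Counter(seq)
    acgt = sum(counts[base] for base in "ACGT")
    return (counts["G"] + counts["C"]) / acgt if acgt else 0.0


# Hypothetical toy records, not real genomes.
example = """>seq1 toy example
ATGCGC
GGTA
>seq2 another toy example
AATTAT
"""

for name, seq in parse_fasta(example):
    print(name, len(seq), round(gc_content(seq), 2))
# seq1 toy example 10 0.6
# seq2 another toy example 6 0.0
```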

2) What types of bioinformatics related open data is available today? Do short
research on the Internet and list one example you found including the link and data
format. (4 points)
A lot of data are available for bioinformatics research today, e.g. The Cancer
Genome Atlas (TCGA).

The Cancer Genome Atlas (TCGA)
https://www.cancer.gov/ccg/research/genome-sequencing/tcga

It uses many formats, e.g. the BAM format, which we used in the RNA sequencing
analysis to inspect aligned sequencing reads.
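BAM is the compressed binary form of the text-based SAM format, so in practice it is read with tools such as samtools or pysam. The minimal sketch below instead parses one line of SAM text to show what an aligned read record contains: the 11 mandatory tab-separated fields and the FLAG bit 0x4 that marks an unmapped read. The example reads are hypothetical.

```python
# The 11 mandatory SAM fields, in the order defined by the SAM specification.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

UNMAPPED = 0x4  # FLAG bit meaning "segment unmapped"


def parse_sam_line(line):
    """Split one SAM alignment line into a dict of its mandatory fields."""
    values = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, values[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])  # numeric fields
    record["TAGS"] = values[11:]  # optional TAG:TYPE:VALUE fields
    return record


def is_aligned(record):
    """A read counts as aligned if the 'unmapped' FLAG bit is not set."""
    return not (record["FLAG"] & UNMAPPED)


# Hypothetical read aligned to chr1 at position 100 with 8 matching bases.
line = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0"
rec = parse_sam_line(line)
print(rec["RNAME"], rec["POS"], rec["CIGAR"], is_aligned(rec))
# chr1 100 8M True
```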

Digital Sustainability (12 points)
Imagine you have a large data set of DNA sequencing data. What would you need to do
with the data in order to make it digitally sustainable?
I would make sure the data set meets the criteria for digital sustainability:
Elaborateness
Transparent structures
Semantic information: the data should carry meaningful, machine-readable information
Distributed location: the data should be stored in multiple locations
Open licensing regime
Shared tacit knowledge, to reduce dependence on single companies
Good governance
Participatory culture, to enable continuous improvement
Diversified funding
Contribution to sustainable development

Please list 2 actions and describe why this action is important for digital
sustainability. (6 points per correct action)

A) Transparency
Openly documenting the data, its format and how it was produced increases trust and
enables others to verify and improve it. It also lets people see how digital actions
affect our environment, e.g. whether they reduce waste or CO2 emissions.

B) Semantic information
Providing meaningful, machine-readable information (metadata) about the data helps
both people and machines to understand and communicate the data, so it can be found
and reused.
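One way to put transparency, semantic information and distributed storage into practice for a DNA sequencing data set is a machine-readable manifest that records the license and a SHA-256 checksum for each file, so that copies kept in other locations can be verified. The sketch below is a minimal example under assumptions: the field names (license, files, sha256) are our own choice rather than a standard metadata schema, and the file names and contents are hypothetical.

```python
import hashlib
import json


def sha256_of(data: bytes) -> str:
    """Hex-encoded SHA-256 checksum of some bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict, license_id: str) -> dict:
    """Describe every file (name, size, checksum) plus the data set's license."""
    return {
        "license": license_id,
        "files": [
            {"name": name, "bytes": len(data), "sha256": sha256_of(data)}
            for name, data in sorted(files.items())
        ],
    }


def copy_matches(manifest: dict, name: str, data: bytes) -> bool:
    """Check a copy fetched from another storage location against the manifest."""
    for entry in manifest["files"]:
        if entry["name"] == name:
            return entry["sha256"] == sha256_of(data)
    return False  # file is not listed in the manifest


# Hypothetical file contents standing in for real sequencing files.
files = {
    "sample1.fastq": b"@read1\nACGT\n+\nIIII\n",
    "sample2.fastq": b"@read2\nGGCC\n+\nIIII\n",
}
manifest = build_manifest(files, license_id="CC-BY-4.0")
print(json.dumps(manifest, indent=2))
print(copy_matches(manifest, "sample1.fastq", files["sample1.fastq"]))  # True
```

"CC-BY-4.0" is the SPDX identifier of the Creative Commons Attribution 4.0 license; which open license to use is itself an assumption here. Storing the manifest as JSON keeps it readable by both people and programs.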

B) GIT AND GITLAB

Git:
If you need to list all the different versions of a file available in your Git
repository, which command would you use? (4 points)
a) git checkout
b) git branch
c) git log
d) git status

(C) git log

What does the git clone command do in Git? (4 points)


a) Delete a repository
b) Merge two branches
c) Copy a repository into a new directory
d) Update the local repository to the latest commit

(C) Copy a repository into a new directory

GitLab:
What feature does GitLab offer to help protect your main branch from accidental
changes? (4 points)
a) Issue Tracker
b) Merge Request Approval
c) Wiki Pages
d) Time Tracking
(B) Merge Request Approval

GitLab provides an environment for developing, deploying, and maintaining
applications. Which of the following is NOT a feature provided by GitLab? (3
points)
a) Version control
b) Continuous Integration/Continuous Deployment (CI/CD)
c) Automated code review
d) High-speed internet browsing
(D) High-speed internet browsing

C) LINKED DATA

1) What is the basic principle of the Resource Description Framework (RDF) and why
is this more universal than for example tabular (flat) data? (8 Points)
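RDF is a standard way of representing and exchanging data on the web. It is used to
describe resources such as web pages, and it is machine-readable. The core structure
is a set of triples, each consisting of a subject, a predicate and an object. All RDF
statements together form a graph, which can be visualised as a diagram of nodes and
directed arcs. This is more universal than tabular (flat) data because triples do not
require a fixed schema: any statement about any resource can be added at any time,
and because resources are identified by URIs, data from different sources can be
merged into one graph.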
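The following minimal Python sketch illustrates why triples are more flexible than a flat table: a table row is converted into (subject, predicate, object) triples, and a second source then adds facts for which the table has no column. The ex: prefix is a hypothetical namespace and plain strings stand in for URIs and literals; a real application would use an RDF library such as rdflib.

```python
# A flat table: the columns are fixed in advance.
table_header = ["gene", "organism", "chromosome"]
table_rows = [["BRCA1", "Homo sapiens", "17"]]


def table_to_triples(header, rows, key="gene"):
    """Turn each table row into (subject, predicate, object) triples."""
    triples = set()
    key_index = header.index(key)
    for row in rows:
        subject = "ex:" + row[key_index]  # hypothetical URI for the row's entity
        for column, value in zip(header, row):
            if column != key:
                triples.add((subject, "ex:" + column, value))
    return triples


graph = table_to_triples(table_header, table_rows)

# A second source adds facts the table has no column for:
# no schema change is needed, the triples are simply merged into the graph.
graph |= {
    ("ex:BRCA1", "ex:associatedWith", "ex:BreastCancer"),
    ("ex:BreastCancer", "rdfs:label", "breast cancer"),
}


def objects(graph, subject, predicate):
    """All objects linked to a subject by a given predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(objects(graph, "ex:BRCA1", "ex:associatedWith"))  # ['ex:BreastCancer']
```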

2) Visit Wikidata Query Service with the prefilled query at https://w.wiki/h5f, run
the query (play button). Explain the results based on the triple patterns in the
query. (7 Points)
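The query returns a list of male software developers: each result row is a person
that satisfies every triple pattern in the query.
The "person" column contains each person's Wikidata item (identifier), while the
"person label" column lists their human-readable labels (names).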
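The query at https://w.wiki/h5f is not reproduced here. Assuming it combines triple patterns for occupation and sex or gender (on Wikidata, the properties P106 and P21) with a label, the toy sketch below shows how a SPARQL engine answers such a query: each pattern is matched against the graph and the variable bindings are joined, so only people who satisfy every pattern become result rows. The ex: names and people are hypothetical, and labels are modelled as plain rdfs:label triples, whereas the Wikidata Query Service produces the label column through its label service.

```python
def is_var(term):
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on failure."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the value it was bound to by earlier patterns.
            if p_term in new and new[p_term] != t_term:
                return None
            new[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return new


def query(graph, patterns):
    """Evaluate a list of triple patterns (a basic graph pattern) as a join."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


# Hypothetical toy graph.
graph = [
    ("ex:alice", "ex:occupation", "ex:SoftwareDeveloper"),
    ("ex:alice", "ex:gender", "ex:Female"),
    ("ex:alice", "rdfs:label", "Alice"),
    ("ex:bob", "ex:occupation", "ex:SoftwareDeveloper"),
    ("ex:bob", "ex:gender", "ex:Male"),
    ("ex:bob", "rdfs:label", "Bob"),
    ("ex:carol", "ex:occupation", "ex:Chemist"),
    ("ex:carol", "ex:gender", "ex:Male"),
    ("ex:carol", "rdfs:label", "Carol"),
]

patterns = [
    ("?person", "ex:occupation", "ex:SoftwareDeveloper"),
    ("?person", "ex:gender", "ex:Male"),
    ("?person", "rdfs:label", "?personLabel"),
]

print(query(graph, patterns))
# [{'?person': 'ex:bob', '?personLabel': 'Bob'}]
```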
