
Bioinformatics: Introduction and Methods 生物信息学: 导论与方法

Week 14: Final Exam
Graded Quiz • 40 questions • 30 min
Due Jan 25, 7:59 AM CET
Attempts: 3 every 8 hours
To pass: 80% or higher
Total points: 40
1.
Question 1

你想查询一个已知的蛋白质的三维结构是否已经被解析出来了,应该去访问的
数据库是

To which of the following databases should you refer in order to find out whether a
known protein has already had its 3D structure resolved?

1 point
OMIM

HGMD

dbGAP

PDB

2.
Question 2

以下测序质量中,代表测序错误率最低的是(单字以 phred33 形式记录)

Which of the following qualities of sequencing denotes the lowest sequencing error
rate?(single character recorded in phred33)

1 point

15

30
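
As background for this question, here is a minimal sketch of how a Phred+33 quality character relates to an error rate. It assumes the standard Phred definition (quality Q corresponds to an error probability of 10^(-Q/10)) and the Phred+33 encoding, where the character is chr(Q + 33). The function names are hypothetical and chosen only for illustration.

```python
def phred33_to_quality(ch: str) -> int:
    """Decode a single Phred+33 character into its integer quality score."""
    return ord(ch) - 33


def error_probability(q: int) -> float:
    """Convert a Phred quality score into the estimated base-call error probability."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # A higher quality score means a lower error probability.
    for q in (15, 30, 40):
        print(q, chr(q + 33), error_probability(q))
```

A higher score always corresponds to a lower error rate, so comparing the decoded integers is enough to rank the options.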

3.
Question 3

BAM 格式中不包括的信息有哪些

Which of the following information is NOT included in BAM format?


1 point

读断序列

The sequence of the read

读段比对程序的名字

The name of the read mapper program

读段的名字

The name of the read

读段的比对结果

The alignment result for reads

4.
Question 4

高通量测序技术的序列回帖算法思想最类似以下哪种?

Which of the following algorithms is most similar in its basic idea to the read mapping algorithms used in high-throughput sequencing?

1 point

Smith-Waterman 局部比对

Smith-Waterman local alignment

Kruskal 最小生成树算法

Kruskal algorithm for Minimum Spanning Tree


广度优先搜索

Breadth First Search

BLAST 索引和数据库搜索

BLAST index and database search

5.
Question 5

下列哪一种测序仪不是高通量测序仪?

Which of the following sequencers is not high-throughput?

1 point

ABI 3500 测序仪

ABI 3500 sequencer

Illumina X10 测序仪

Illumina X10 sequencer

Illumina HiSeq 测序仪

Illumina HiSeq sequencer

Ion Torrent proton 半导体测序仪

Ion Torrent proton semiconductor sequencer

6.
Question 6

以下不属于生物信息学研究内容的是

Which of the following is NOT a topic of bioinformatics research?


1 point

基因组数据挖掘

Data mining from genomic data

基因组序列比对技术

Genome sequence alignment

构建系统发育树

Construct a phylogenetic tree

动作和手势识别比对技术

Movement and gesture recognition and alignment

7.
Question 7

下列关于替换矩阵的说法哪些是正确的

Which of the following statements about substitution matrices are correct? (2 correct options)

1 point

一种替换在自然界中越容易发生,则这种替换在打分矩阵中对应的数值越小

The more easily a particular substitution occurs in nature, the smaller the score it has in the scoring matrix

PAM1 矩阵比 PAM100 矩阵效果更好

PAM1 matrix is better than PAM100


替换矩阵一定是沿主对角线对称的矩阵

The substitution matrix is always a matrix that is symmetric with respect to its main
diagonal

改变替换矩阵不会影响序列比对结果

Changing substitution matrix won't influence the result of a sequence alignment

现在人们已经找到了序列比对时最好的打分矩阵

Now people have found the best scoring matrix for sequence alignment

替换矩阵的值由且仅由经验公式决定

Values in a substitution matrix are determined solely by empirical formulas

替换矩阵中没有 gap 的罚分

The gap penalty score is not in a substitution matrix.

BLOSUM62 矩阵比 BLOSUM90 矩阵效果更好

BLOSUM62 matrix is better than BLOSUM90

PAM100 矩阵比 PAM1 矩阵效果更好

PAM100 matrix is better than PAM1

替换矩阵的值反应了碱基间的相似程度
The values in a substitution matrix reflect the similarity between bases

BLOSUM90 矩阵比 BLOSUM62 矩阵效果更好

BLOSUM90 matrix is better than BLOSUM62

8-10

8.
Question 8

Smith-Waterman 算法和 Needleman-Wunsch 算法的说法中法哪些是正确的

Which of the following statements about the Smith-Waterman algorithm and the Needleman-Wunsch algorithm are correct? (3 correct options)

1 point

Needleman-Wunsch 算法更适用于长度相似的同源序列 (1)

The Needleman-Wunsch algorithm is more suitable for homologous sequences of similar length

Smith-Waterman 的结果优于 Needleman-Wunsch 的结果 (2)

Smith-Waterman outperforms Needleman-Wunsch

单独使用 Needleman-Wunsch 不适合用于高通量测序数据分析,Smith-Waterman


则适合 (3)

Needleman-Wunsch alone is not suitable for next-generation sequencing data analysis, while Smith-Waterman alone is

单独使用 Smith-Waterman 不适合用于高通量测序数据分析,Needleman-Wunsch


则适合

Smith-Waterman alone is not suitable for next-generation sequencing data analysis, while Needleman-Wunsch alone is
单独使用 Smith-Waterman 算法和 Needleman-Wunsch 算法均不适合用于高通量
测序数据分析

Neither the Smith-Waterman algorithm nor the Needleman-Wunsch algorithm alone is suitable for next-generation sequencing data analysis

Needleman-Wunsch 只能有一个最优解,Smith-Waterman 可以有多个

Needleman-Wunsch can have only one optimal solution, while Smith-Waterman can
have multiple optimal solutions

Needleman-Wunsch 获得的是局部最优的结果,Smith-Waterman 比对获得的是全


局最优的结果

Needleman-Wunsch finds the locally optimal result, while Smith-Waterman finds the globally optimal result

Smith-Waterman 算法更适用于寻找两个蛋白序列之间相似的功能域

The Smith-Waterman algorithm is more suitable for finding similar functional domains in two protein sequences

Needleman-Wunsch 的结果优于 Smith-Waterman 的结果

Needleman-Wunsch outperforms Smith-Waterman

同时使用 Smith-Waterman 算法和 Needleman-Wunsch 算法则适合于高通量测序


数据分析

A combination of the Smith-Waterman algorithm and the Needleman-Wunsch algorithm is suitable for next-generation sequencing data analysis

1-2-3 4-6-10 3-4-10
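
To make the global versus local distinction in this question concrete, the following is a minimal sketch of both dynamic programming recurrences in one function. The scoring scheme (match +1, mismatch -1, linear gap -1) and the function name are illustrative assumptions, not values taken from the course. Only the optimal score is computed, with no traceback.

```python
def alignment_score(a: str, b: str, local: bool,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal alignment score of a and b.

    local=False: Needleman-Wunsch style (global, end to end).
    local=True:  Smith-Waterman style (best-scoring local region).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment pays for leading gaps in the first row and column.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of the bottom-right cell. Local: best cell anywhere in the table.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    x, y = "AAACGTAAA", "CCACGTCC"  # share only the short region "ACGT"
    print("global:", alignment_score(x, y, local=False))
    print("local: ", alignment_score(x, y, local=True))
```

Both variants fill a table of size (len(a)+1) × (len(b)+1), which is why neither is practical on its own for mapping millions of reads to a genome.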

9.
Question 9

大规模进行数据比对时不采用动态规划算法的最主要原因

What is the main reason that the dynamic programming algorithm is NOT used for
large-scale alignments?

1 point

消耗内存大

It costs too much memory

结果不稳定

Its outcome is unstable

结果不准确

Its result is not accurate

算法不可靠

Its algorithm is not reliable

可重复性差

Its reproducibility is poor

编程难度大

Difficult to program

运算速度慢
It runs too slowly

10.
Question 10

BLAST 有关说法中正确的有哪些

Which of the following statements are correct with respect to BLAST?

1 point

现在的 BLAST 比对数据库需要建立索引

Current BLAST needs to build indices when aligning to databases

BLAST 屏蔽低复杂度区域的步骤没有作用,可以省略

The step of masking low-complexity regions in BLAST is useless and can be skipped

BLAST 适合对高通量数据进行拼接

BLAST is suitable for aligning NGS reads to a genome

BLAST 一定能找到最优解

BLAST is guaranteed to find the optimal solution

BLAST 是目前最快的序列比对算法

BLAST is the fastest alignment algorithm ever

BLAST 运行较比动态规划算法速度慢

BLAST runs slower than dynamic programming


早期 BLAST 无法处理 gap

The early version of BLAST can't deal with gaps

1-3-4-6 1-3-4

11.
Question 11

对同一序列进行 tblastx 时,其运算量理论上是 blastn 的几倍

When aligning the same sequence, how many times greater is the theoretical computational load of tblastx than that of blastn?

1 point

1/6

1/4

1/3

12

2
1/2

1/5

1/36

36

1/24

24
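
As background, tblastx translates both the nucleotide query and the nucleotide database in all six reading frames before comparing them at the protein level, whereas blastn compares the nucleotide sequences directly. The sketch below simply enumerates the frame pairs involved. It is a rough count under the simplifying assumption that each frame-to-frame comparison costs about the same as one nucleotide comparison, ignoring differences in alphabet size and translated length.

```python
from itertools import product

# Three forward and three reverse-complement reading frames.
FRAMES = [+1, +2, +3, -1, -2, -3]

# tblastx compares every translated query frame with every translated subject frame.
frame_pairs = list(product(FRAMES, FRAMES))

if __name__ == "__main__":
    print(len(FRAMES), "frames per sequence")
    print(len(frame_pairs), "query/subject frame combinations")
```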

12.
Question 12

哪个不是我们为了成功进行 BLAST 所需要调节的参数

Which of the following parameters does NOT need to be tuned in order to run BLAST successfully?

1 point

输入序列数量

The number of input sequences


选择的打分矩阵

Choice of substitution matrix

序列的名称

The names of input sequences

种子字长

Seed word size

屏蔽或不屏蔽低复杂度区域

Masking low-complexity regions or not

1-2-4 2-3-5

13.
Question 13

针对下图的说明中错误的是

Which of the following statements is NOT correct with respect to the figure below?
1 point

该情况下设置了 1,2,3 三种隐状态

Given such an HMM, there are three hidden states: 1, 2, and 3

该情况下我们在每个状态可以观察到 a,b,c 三个值

Given such an HMM, we can observe three values at each state: a, b, and c

可能产生符号序列 abccc 的由 1 起始由 3 结束的状态序列有 10 种

There are 10 different state paths in total that start at 1, end at 3, and can generate the token sequence "abccc"

产生 abccc 的由 1 起始由 3 结束的状态中,概率最大的概率为 0.00072


Among all the state paths that start at 1, end at 3, and can generate the token sequence "abccc", the most probable one generates "abccc" with probability 0.00072

14.
Question 14

各转移概率和生成概率如下表,则存在问题的一组是

The transition probabilities and emission probabilities are given below. Then which of
the following statements is NOT correct?

1 point

生成概率的 c 行

The Row c in emission matrix

生成概率的 n 行

The Row n in emission matrix

转移概率的 c 行

The Row c in transition matrix

转移概率的 n 行

The Row n in transition matrix


15.
Question 15

转录本分析中测定转录本表达水平的“金标准”(Gold Standard)是

What is the gold standard for quantifying the expression level of transcripts in
transcript analysis?

1 point

RNA-seq

表达序列标签

Expressed Sequence Tag

实时荧光定量 PCR

Real-time fluorescent quantitative PCR

基因芯片 microarray

Gene chip microarray

固相捕获

Solid phase capture

16.
Question 16

关于 RNA-Seq 中序列回帖的 Split reads 方法,下面说法中错误的是哪些

Which of the following statements are wrong with respect to the Split reads strategy
used in reads mapping in RNA-Seq?

1 point
该方法可以将所有读断定位到基因组上

This method can map all reads to genome

该方法不能发现新的外显子

The method cannot discover new exons

该方法能够发现新的剪切体

This method is capable of discovering new splicing isoforms

该方法常与 join exon 方法组合使用

This method is often used together with the "join exon" method

该方法运行速度较慢

This method runs slowly

1-2-3 2-4-5

17.
Question 17

如下图,转录本 1 的表达量为 20,转录本 2 的表达量为 30,则基因外显子 1 和


2 的表达量分别为

As shown in the figure below, Transcript 1 has an expression level of 20 and Transcript 2 has an expression level of 30. What are the expression levels of Exon 1 and Exon 2, respectively?

1 point
40, 40

600, 30

40, 30

40, 50

20, 30

50, 30

30, 50

10, 50

18.
Question 18

已知 RNA-Seq 测序数据回帖后在某个基因区间的情况如下图所示(请仔细观察
图片,不同尝试图片可能会变)

Assume that the RNA-Seq reads are mapped back to part of a gene as shown
below(please check the picture carefully, the picture may change in different trial)
则该基因至少有几种转录本?

Then what is the minimum number of transcripts this gene could have?

1 point

19.
Question 19

在上一题中,该基因最多有多少个转录本?(假设所有转录本均已被测到)

In the previous question, what is the maximum number of transcripts this gene could
have? Assume that all the transcripts of this gene have been sequenced

1 point

3
6

20.
Question 20

下面关于长非编码 RNA(lncRNA)的说法,正确的是哪些

Which of the following statements are correct with respect to long noncoding RNAs
(lncRNAs)?

1 point

lncRNA 只能 in cis 地发挥功能

lncRNAs can only function in cis

lncRNA 都没有功能

All lncRNAs have no function

lncRNA 可以比某些编码 RNA 更长

lncRNAs can be longer than some coding RNAs

lncRNA 上没有外显子和读码框

There are no exons and open reading frames on lncRNAs


lncRNA 有可变剪接

lncRNAs can have alternative splicing

lncRNA 都没有 polyA 尾巴

No lncRNAs have polyA tails

3-4-5 3-4-5-6

21.
Question 21

关于非编码 RNA 的鉴定,下面说法错误的是

Which of the following statements is NOT correct with respect to the identification of
noncoding RNAs?

1 point

选择合适的特征组合可以提高鉴定的准确率

The accuracy of identification can be enhanced by choosing a proper set of features

可以鉴定出所有的非编码 RNA

We can identify all the ncRNAs correctly

可以利用序列的二级结构信息来鉴定非编码 RNA

We can use the structure information of sequence to identify ncRNAs

LOG-ODD score 分数越高,表明得到的 ORF 结果越可靠

The higher the LOG-ODD score is, the more reliable the ORF result would be
可以利用序列碱基保守性信息鉴定非编码 RNA

We can use the information of sequence conservation to identify ncRNAs

仅利用序列本身的特性无法实现非编码 RNA 的鉴定

It is impossible to identify ncRNAs using information from sequence only

2-3-4 2-4-6

22.
Question 22

已知一次试验中出错的概率是 0.2,而且每次试验都相互独立。则在 3 次试验中


至少有 2 次出错的概率是多少?

Assume that the probability that an error occurs in a trial is 0.2, and all trials are
independent of each other. Then what is the probability that, in three trials, there are at
least two of them that have an error occur?

1 point

0.040

0.096

0.148

0.006

0.104
0.084
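
A question of this kind follows the binomial distribution: with n independent trials and per-trial error probability p, the probability of exactly k errors is C(n, k) p^k (1 - p)^(n - k), and "at least k" sums those terms from k to n. The following is a minimal sketch of that calculation. The function name is hypothetical.

```python
from math import comb


def prob_at_least(n: int, k: int, p: float) -> float:
    """Probability of at least k errors in n independent trials with error rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Values from the question: 3 trials, error probability 0.2, at least 2 errors.
    print(prob_at_least(3, 2, 0.2))
```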

23.
Question 23

利用 Bonferroni Correction 使得对于比较 50000 个基因的实验,犯一类错误的概


率低于 0.05,则每个具有统计显著性的基因的 p-value 应小于

We use Bonferroni Correction to set an upper bound of 0.05 for the value of the
probability that the Type I error occurs in a trial where 50000 genes are compared.
Then all the p-values of significant genes should be smaller than ____

1 point

1.0e-6

1.0e-10

0.01

0.005

0.05

0.1
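
The Bonferroni correction controls the family-wise Type I error rate by dividing the desired overall significance level by the number of tests performed. The following is a minimal sketch of that arithmetic. The function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # Values from the question: overall level 0.05 across 50000 genes.
    print(bonferroni_threshold(0.05, 50000))
```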

24.
Question 24

'vitamin transporter activity' 属于 GO 分类的哪一类?

Which of the following GO categories does "vitamin transporter activity" belong to?

1 point
Biological Regulation

Biological Component

Molecular Process

Cellular Function

Molecular Function

Biological Function

Molecular Regulation

Biological Process

Cellular Component

Cellular Process

25.
Question 25

根据 KEGG 数据库,threonine dehydratase 在 Glycine, serine and threonine


metabolism 代谢通路中发挥什么作用?

According to KEGG database, what is the function of threonine dehydratase in


glycine, serine and threonine metabolism
http://www.genome.jp/kegg/

1 point

催化 Phosphoserine 转变为 Serine

It catalyzes the reaction where Phosphoserine is turned into Serine

催化 Phosphoserine 转变为 Glycine

It catalyzes the reaction where Phosphoserine is turned into Glycine

催化 Serine 和 Glycine 相互转变

It catalyzes the reaction where Serine and Glycine are transformed into each other

催化 Threonine 和 Glycine 相互转变

It catalyzes the reaction where Threonine and Glycine are transformed into each other

催化 Serine 和 Pyruvate 相互转变

It catalyzes the reaction where Serine and Pyruvate are transformed into each other

26.
Question 26

假如某次实验分析得到下面这组基因 list(Entrez Gene ID 格式)

Assume we get the gene list below in an analysis(in Entrez Gene ID format)

498

506

509

513
514

515

516

517

518

521

522

539

4508

4509

9551

10476

10632

27109

请问 KOBAS 富集性分析(默认参数)得到的最显著富集的 KEGG pathway 是?

Then what is the most enriched KEGG pathway given by KOBAS (with all
parameters set to default)?

KOBAS: http://kobas.cbi.pku.edu.cn/

1 point

Autism spectrum disorder

The citric acid (TCA) cycle and respiratory electron transport

Dravet syndrome
Carnitine shuttle

Metabolic pathways

Huntington's disease


Alzheimer's disease

Oxidative phosphorylation

Beta oxidation

27.
Question 27

对于上题中的基因 list,KOBAS 分析(默认参数)得到最显著富集的 GO term


是?

For the gene list given in the previous question, what is the most enriched GO term
given by KOBAS (with all parameters set to default)?

1 point

chemosynthesis

organelle envelope
cellular respiration

oxidative phosphorylation

ATP metabolic process

photophosphorylation

hydrogen transport

proton-transporting ATP synthase complex

cation transmembrane transporter activity

28.
Question 28

蛋白质结构域方面的信息可以从下列哪个中查到?

From which of the following can one find information about protein domains?

1 point

PolyPhen-2

SIFT

InterPro
SOAP

BLAT

MEGA

IntAct

DAMBE

KOBAS

29.
Question 29

你能从 NCBI-PubMed 数据库中查到什么信息?

What information can you retrieve from NCBI-PubMed?

1 point

物种分类层级关系

The hierarchy of taxonomy

蛋白质结构

Protein structure

基因注释信息
Gene annotation

蛋白质序列

Protein sequence

基因组序列

Genome sequence

生命科学相关图书

Books about life sciences

NCBI 网站的培训视频和教学指导

Training videos and tutorials for NCBI

基因型-表型 关联数据

Genotype-phenotype relationship data

药物设计和靶点信息

Medicine design and target data

生命科学和医学相关文献和相关资源链接

Biological and Medical literature and related URLs

30.
Question 30
UCSC 提供了下列哪些有用的工具?

Which of the following tools are provided by UCSC?

1 point

BLAST

BatchPrimer3

MEME Suite

ClinVar

MedGen

Sequence Read Archive (SRA)

ClustalW2

SIFT

PolyPhen-2

In-Silico PCR
Genome Browser

Blat

2-4-5 1-2-7 6-8-9

31.
Question 31

GO 的拓扑结构是?

What is GO topology structure?

1 point

双向星型结构

bi-directional star

双环图

dual-ring graph

层次树

Hierarchical Tree

有向无环图

Directed Acyclic Graph

无向有环图

Undirected Graph with loop


无向树

Undirected Tree

总线结构

daisy-chain

32.
Question 32

世界上第一个被发现的新基因是

The first new gene discovered in the world is

1 point

Jingwei 基因

Jingwei gene

Hun 基因

Hun gene

BC200 基因

BC200 gene

BSC4 基因

BSC4 gene

POXP2 基因

POXP2 gene
FGF4 基因

FGF4 gene

Tre2 基因

Tre2 gene

Sphinx 基因

Sphinx gene

“猴王” 基因

Monkey King gene (mkg)

XIST 基因

XIST gene

33.
Question 33

下图所示的新基因起源机制是哪一种?

What is the mechanism of new gene origination described by the figure below?

1 point

基因水平转移
Lateral gene transfer

逆转录转座

Retrotransposition

基因重复

Gene duplication

外显子/结构域重排

exon/domain shuffling

可移动元件

mobile element

从头起源

De novo origination

34.
Question 34

给定图中的物种系统发生关系和基因在各物种中是否存在,依据最简约原则如
下哪一个推断是正确的?

Assume that we know the phylogeny and the existence of some genes as shown
below. Then which of the following statements is correct if we apply Occam's razor?
1 point

MNOP 是一个在物种 5 和物种 1,2,3,4 的祖先分岐后起源的新基因

MNOP is a new gene that originated after the divergence of Species 5 from the ancestor of Species 1, 2, 3, and 4

IJKL 在物种 2,3,4,5 中独立地起源了 4 次

IJKL originated four times independently in Species 2, 3, 4, and 5

ABCD 是一个在物种 1 和 2 分岐后起源的新基因

ABCD is a new gene originated after the divergence of Species 1 and 2

EFGH 是一个在所有物种中都有的新基因

EFGH is a new gene that exists in all species

35.
Question 35

如下哪个生物信息学方法可以用来寻找新基因?

Which of the following bioinformatics methods can be used to find new genes?
1 point

SOAP

Blast

KOBAS

SIFT

BWA

36.
Question 36

如下哪个计算方法不能对一个之前未知的从头起源基因提供有用的信息?

Which of the following methods cannot provide useful information for a de novo new
gene about which we knew nothing before?

1 point

蛋白理化性质(如 pI 值)预测

Prediction of the physical and chemical properties of proteins, such as the pI value

基于已知功能基因的同源注释

Homologous annotation based on genes whose functions are known

从 RNA-Seq 数据得到的 mRNA 表达特点

The characteristics of mRNA expression obtained from RNA-Seq data


蛋白二级结构预测

Prediction of protein secondary structure

37.
Question 37

下列关于直系同源基因和旁系同源基因说法正确的是

Which of the following statements is correct with respect to orthologs and paralogs?

1 point

直系同源基因是由物种分化产生的

Orthologs are produced by speciation event

旁系同源基因是由物种分化产生的

Paralogs are produced by speciation event

旁系同源基因是由基因复制产生的

Paralogs are produced by gene duplication

直系同源基因是由基因复制产生的

Orthologs are produced by gene duplication

2-3 2-4 1-3

38.
Question 38

如下哪些技术可以用来提供转录组数据

Which of the following techniques can be used to obtain transcriptome data?

1 point
RNA-seq

Mass spectrometry

SNP chip

cDNA microarray

ALL 1-3 1-4

39.
Question 39

如下哪个物种具有人基因 SRGAP2C 的直系同源 DNA 序列

Which of the following species has orthologous DNA sequences for the human gene
SRGAP2C?

1 point

家猪

Sus scrofa domesticus

小家鼠

Mus musculus

临夏鸵鸟

Struthio linxiaensis
索氏桃花水母

Craspedacusta sowerbii

黑腹果蝇

Drosophila melanogaster

大肠杆菌

Escherichia coli

酿酒酵母

Saccharomyces cerevisiae

黑猩猩

Pan troglodytes

北极熊

Ursus maritimus

斑马鱼

Brachydanio rerio

40.
Question 40

我们今天知道的基因组上含有基因最多的物种是

To the best of our knowledge, which of the following species has the most abundant
genes?
1 point

拟南芥

Arabidopsis thaliana

小家鼠

Mus musculus

酿酒酵母

Saccharomyces cerevisiae

北极熊

Ursus maritimus

大肠杆菌

Escherichia coli

黑腹果蝇

Drosophila melanogaster

大豆

Glycine max

智人
Homo sapiens

番茄

Solanum lycopersicum

2.
Question 2

以下测序质量中,代表测序错误率最低的是(单字以 phred33 形式记录)

Which of the following qualities of sequencing denotes the lowest sequencing error
rate?(single character recorded in phred33)

1 point

40

3.
Question 3

BAM 格式中不包括的信息有哪些

Which of the following information is NOT included in BAM format?

1 point
读段序列

The sequence of the read

读段比对的染色体名字

The name of the chromosome of the read alignment

读段的结构信息

The structure information of the read

读段的质量

The quality of the read

4.
Question 4

高通量测序技术的序列回帖算法思想最类似以下哪种?

Which of the following algorithms is most similar in its basic idea to the read mapping algorithms used in high-throughput sequencing?

1 point

Smith-Waterman 局部比对

Smith-Waterman local alignment

广度优先搜索

Breadth First Search

Kruskal 最小生成树算法
Kruskal algorithm for Minimum Spanning Tree

BLAST 索引和数据库搜索

BLAST index and database search

5.
Question 5

下列哪一种测序仪不是高通量测序仪?

Which of the following sequencers is not high-throughput?

1 point

罗氏 454 焦磷酸测序仪

Roche 454 pyrosequencer

Ion Torrent PGM 半导体测序仪

Ion Torrent PGM semiconductor sequencer

ABI SOLiD 测序仪

ABI SOLiD sequencer

ABI 3730 测序仪

ABI 3730 sequencer

6.
Question 6

以下不属于生物信息学研究内容的是
Which of the following are NOT topics of bioinformatics research? (2 correct options)

1 point

氨基酸序列比对技术

Amino acid sequence alignment

序列数据库搜索

Sequence database search

转录组序列比对技术

Transcriptome sequence alignment

表型预测方法

Functional prediction methods

测序仪的水平稳定控制

Stability control of a sequencer

基因组数据挖掘

Data mining from genomic data

基因组序列比对技术

Genome sequence alignment


动作和手势识别比对技术

Movement and gesture recognition and alignment

代谢分析图模型

Graph models for pathway analysis

构建系统发育树

Construct a phylogenetic tree

1-4

7.
Question 7

下列关于替换矩阵的说法哪些是正确的

Which of the following statements about substitution matrices are correct?

1 point

替换矩阵中没有 gap 的罚分

The gap penalty score is not in a substitution matrix.

现在人们已经找到了序列比对时最好的打分矩阵

Now people have found the best scoring matrix for sequence alignment

替换矩阵的值由且仅由经验公式决定

Values in a substitution matrix are determined solely by empirical formulas

改变替换矩阵不会影响序列比对结果
Changing substitution matrix won't influence the result of a sequence alignment

BLOSUM90 矩阵比 BLOSUM62 矩阵效果更好

BLOSUM90 matrix is better than BLOSUM62
