
Open access interview: Questions

Background to the video:


The Library Research Team undertook 50 semi-structured interviews to ascertain researchers'
understanding, attitudes, and behaviours around open access (OA) publishing. The results highlighted
that most of the respondents could articulate what open access was, but could not always
differentiate between Green and Gold OA publishing. The respondents felt that OA was beneficial to
them, the research community, and the general public, but a large percentage did not automatically
publish through OA. Respondents stated that the university could encourage authors to publish via
OA by providing appropriate funding, plus regularly reminding research staff of good practice
through case studies and the experiences of colleagues.

Purpose of the video:


As a researcher, highlight your experience of open access publishing and the benefits
to you, your colleagues, the university and the wider community. Examples drawn from
your own publishing experience are particularly valuable.

Questions:
1. Please can you introduce yourself and your role at NTU?
2. What are your main research interests?
3. How do you normally publish your research outputs?
4. What does open access mean to you?
5. Why is open access important to you?
6. What are the benefits of open access publishing?
7. What advice would you give colleagues in relation to open access publishing?
Possible things to consider for inclusion in your answers

What does open access mean to you?


• Scholarly material free to the reader
• Green and Gold open access
• Deposit in IRep is the Green route; paying a publishing fee is the Gold route

Why is open access important to you?


• Wider audience
• Increased readership
• Increased potential for greater collaboration
• Potential for higher citation rate
• Compliance with funding councils and submission to REF

What are the benefits of open access publishing?


• For self and research groups: raises profile and can lead to greater collaboration
opportunities and citations
• For NTU: raises profile and can lead to more funding opportunities
• For the community: free sharing of knowledge, both for academics who work in
institutions that cannot afford journal subscriptions and for members of the public

What advice would you give colleagues in relation to open access publishing?
• If you have not published your research through Gold OA, you can still be RCUK and
REF compliant by submitting the bibliographic data and full text to IRep as soon as
the work is accepted for publication
• Deposit in IRep is easy
• Help is available: the Library Research Team can provide guidance on choosing
where to publish, copyright, embargoes, and APCs
Principal component analysis, or PCA, is a technique that is widely used for applications
such as dimensionality reduction, lossy data compression, feature extraction,
and data visualization (Jolliffe, 2002). It is also known as the Karhunen-Loève transform.

There are two commonly used definitions of PCA that give rise to the same
algorithm. PCA can be defined as the orthogonal projection of the data onto a lower
dimensional linear space, known as the principal subspace, such that the variance of
the projected data is maximized (Hotelling, 1933). Equivalently, it can be defined as
the linear projection that minimizes the average projection cost, defined as the mean
squared distance between the data points and their projections (Pearson, 1901). The
process of orthogonal projection is illustrated in Figure 12.2. We consider each of
these definitions in turn.

12.1.1 Maximum variance formulation

Consider a data set of observations {x_n} where n = 1, ..., N, and x_n is a
Euclidean variable with dimensionality D. Our goal is to project the data onto a
space having dimensionality M < D while maximizing the variance of the projected
data. For the moment, we shall assume that the value of M is given. Later in this


chapter, we shall consider techniques to determine an appropriate value of M from

the data.

To begin with, consider the projection onto a one-dimensional space (M = 1).
We can define the direction of this space using a D-dimensional vector u_1, which
for convenience (and without loss of generality) we shall choose to be a unit vector
so that u_1^T u_1 = 1 (note that we are only interested in the direction defined by u_1,
not in the magnitude of u_1 itself). Each data point x_n is then projected onto a scalar
value u_1^T x_n. The mean of the projected data is u_1^T \bar{x} where \bar{x} is the sample set mean
given by

    \bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n    (12.1)

and the variance of the projected data is given by

    \frac{1}{N} \sum_{n=1}^{N} \left( u_1^T x_n - u_1^T \bar{x} \right)^2 = u_1^T S u_1    (12.2)

where S is the data covariance matrix defined by

    S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T    (12.3)
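As a concrete check of these definitions, the following sketch (our own illustration, not from the text, using synthetic data) computes the sample mean (12.1), the covariance matrix S (12.3), and the projected variance u_1^T S u_1 (12.2), and verifies that the quadratic form agrees with the variance computed directly from the projected scalars:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 500, 3
# Synthetic data with different variances along each axis (an assumption
# for illustration only).
X = rng.normal(size=(N, D)) * np.array([2.0, 1.0, 0.5])

x_bar = X.mean(axis=0)                       # (12.1) sample mean
Xc = X - x_bar
S = Xc.T @ Xc / N                            # (12.3) covariance, 1/N convention

u1 = np.array([1.0, 0.0, 0.0])               # a unit direction vector
proj_var = u1 @ S @ u1                       # (12.2) variance via quadratic form

# Direct computation: variance of the projected scalars u1^T x_n.
direct = np.mean((X @ u1 - u1 @ x_bar) ** 2)
assert np.isclose(proj_var, direct)
```

The agreement between `proj_var` and `direct` is exactly the identity stated in (12.2).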


We now maximize the projected variance u_1^T S u_1 with respect to u_1. Clearly, this has
to be a constrained maximization to prevent \|u_1\| \to \infty. The appropriate constraint
comes from the normalization condition u_1^T u_1 = 1. To enforce this constraint,
we introduce a Lagrange multiplier that we shall denote by \lambda_1, and then make an
unconstrained maximization of

    u_1^T S u_1 + \lambda_1 \left( 1 - u_1^T u_1 \right)    (12.4)

By setting the derivative with respect to u_1 equal to zero, we see that this quantity
will have a stationary point when

    S u_1 = \lambda_1 u_1    (12.5)

which says that u_1 must be an eigenvector of S. If we left-multiply by u_1^T and make
use of u_1^T u_1 = 1, we see that the variance is given by

    u_1^T S u_1 = \lambda_1    (12.6)

and so the variance will be a maximum when we set u_1 equal to the eigenvector
having the largest eigenvalue \lambda_1. This eigenvector is known as the first principal
component.
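The result in (12.5)-(12.6) can be checked numerically: the top eigenvector of S attains a projected variance equal to its eigenvalue, and no other unit vector exceeds it. The sketch below (our own, on synthetic data) verifies both claims:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data with a clear dominant direction (illustrative assumption).
X = rng.normal(size=(1000, 4)) * np.array([3.0, 1.0, 0.5, 0.2])
S = np.cov(X, rowvar=False, bias=True)       # data covariance (1/N convention)

eigvals, eigvecs = np.linalg.eigh(S)         # eigenvalues in ascending order
u1 = eigvecs[:, -1]                          # first principal component
lambda1 = eigvals[-1]                        # largest eigenvalue

# (12.6): the variance of the projection onto u1 equals lambda_1.
assert np.isclose(u1 @ S @ u1, lambda1)

# The Rayleigh quotient of any unit vector is bounded by lambda_1.
for _ in range(100):
    v = rng.normal(size=4)
    v /= np.linalg.norm(v)
    assert v @ S @ v <= lambda1 + 1e-12
```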

We can define additional principal components in an incremental fashion by

choosing each new direction to be that which maximizes the projected variance



amongst all possible directions orthogonal to those already considered. If we consider the general
case of an M-dimensional projection space, the optimal linear projection for which the variance of
the projected data is maximized is now defined by
the M eigenvectors u_1, ..., u_M of the data covariance matrix S corresponding to the
M largest eigenvalues \lambda_1, ..., \lambda_M. This is easily shown using proof by induction.

To summarize, principal component analysis involves evaluating the mean \bar{x}
and the covariance matrix S of the data set and then finding the M eigenvectors of S
corresponding to the M largest eigenvalues. Algorithms for finding eigenvectors and

eigenvalues, as well as additional theorems related to eigenvector decomposition,

can be found in Golub and Van Loan (1996). Note that the computational cost of
computing the full eigenvector decomposition for a matrix of size D x D is O(D^3).

If we plan to project our data onto the first M principal components, then we only

need to find the first M eigenvalues and eigenvectors. This can be done with more

efficient techniques, such as the power method (Golub and Van Loan, 1996), that

scale like O(M D^2), or alternatively we can make use of the EM algorithm.
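The summary above can be sketched directly with a full eigendecomposition. This is a minimal illustration, not an optimized implementation: the function name `pca_project` and the synthetic data are our own, and for large D one would prefer the more efficient O(M D^2) techniques just mentioned.

```python
import numpy as np

def pca_project(X, M):
    """Project the rows of X onto the M leading principal components.

    Follows the summary in the text: compute the mean and covariance,
    then take the eigenvectors with the M largest eigenvalues.
    """
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    S = Xc.T @ Xc / X.shape[0]               # data covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # full O(D^3) decomposition
    order = np.argsort(eigvals)[::-1]        # sort eigenvalues descending
    U = eigvecs[:, order[:M]]                # M leading eigenvectors
    return Xc @ U, U, eigvals[order[:M]]

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
Z, U, lam = pca_project(X, 2)
# The projection is 2-dimensional and the principal directions are orthonormal.
assert Z.shape == (200, 2)
assert np.allclose(U.T @ U, np.eye(2))
```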

Because we wish to find a sequential sampling scheme, we shall suppose that a set of samples and
weights have been obtained at time step n, and that we have subsequently observed the value of
x_{n+1}, and we wish to find the weights and samples at time step n + 1. We first sample from the
distribution p(z_{n+1}|X_n). This is straightforward since, again using Bayes' theorem,

    p(z_{n+1}|X_n) = \int p(z_{n+1}|z_n, X_n) p(z_n|X_n) \, dz_n
                   = \int p(z_{n+1}|z_n) p(z_n|X_n) \, dz_n
                   = \int p(z_{n+1}|z_n) p(z_n|x_n, X_{n-1}) \, dz_n
                   = \frac{\int p(z_{n+1}|z_n) p(x_n|z_n) p(z_n|X_{n-1}) \, dz_n}{\int p(x_n|z_n) p(z_n|X_{n-1}) \, dz_n}
                   = \sum_l w_n^{(l)} p(z_{n+1}|z_n^{(l)})    (13.119)

where we have made use of the conditional independence properties

    p(z_{n+1}|z_n, X_n) = p(z_{n+1}|z_n)    (13.120)
    p(x_n|z_n, X_{n-1}) = p(x_n|z_n)    (13.121)

which follow from the application of the d-separation criterion to the graph in Figure 13.5. The
distribution given by (13.119) is a mixture distribution, and samples can be drawn by choosing a
component l with probability given by the mixing coefficients w^{(l)} and then drawing a sample
from the corresponding component.

In summary, we can view each step of the particle filter algorithm as comprising two stages. At
time step n, we have a sample representation of the posterior distribution p(z_n|X_n) expressed as
samples {z_n^{(l)}} with corresponding weights {w_n^{(l)}}. This can be viewed as a mixture
representation of the form (13.119). To obtain the corresponding representation for the next time
step, we first draw L samples from the mixture distribution (13.119), and then for each sample we
use the new observation x_{n+1} to evaluate the corresponding weights
w_{n+1}^{(l)} \propto p(x_{n+1}|z_{n+1}^{(l)}). This is illustrated, for the case of a single
variable z, in Figure 13.23.

The particle filtering, or sequential Monte Carlo, approach has appeared in the literature under
various names including the bootstrap filter (Gordon et al., 1993), survival of the fittest
(Kanazawa et al., 1995), and the condensation algorithm (Isard and Blake, 1998).
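The two stages above can be sketched for a toy one-dimensional model. The transition and likelihood functions below are illustrative assumptions (a Gaussian random walk with Gaussian observation noise), not from the text; the point is only the mechanics of sampling from the mixture (13.119) and reweighting with the new observation:

```python
import numpy as np

rng = np.random.default_rng(3)
L = 1000

def transition(z):
    # p(z_{n+1} | z_n): assumed Gaussian random walk (illustrative choice).
    return z + rng.normal(0.0, 0.5, size=z.shape)

def likelihood(x, z):
    # p(x | z): assumed Gaussian observation noise, unnormalized.
    return np.exp(-0.5 * (x - z) ** 2)

# Weighted particles {z_n^{(l)}, w_n^{(l)}} representing p(z_n | X_n).
z_n = rng.normal(size=L)
w_n = np.full(L, 1.0 / L)

x_next = 0.7                  # newly observed value x_{n+1} (assumed)

# Stage 1: draw L samples from the mixture (13.119) -- choose component l
# with probability w_n^{(l)}, then propagate through p(z_{n+1} | z_n^{(l)}).
idx = rng.choice(L, size=L, p=w_n)
z_next = transition(z_n[idx])

# Stage 2: reweight each sample with the new observation and renormalize,
# giving w_{n+1}^{(l)} proportional to p(x_{n+1} | z_{n+1}^{(l)}).
w_next = likelihood(x_next, z_next)
w_next /= w_next.sum()

assert np.isclose(w_next.sum(), 1.0)
```

Resampling in stage 1 before propagating is what keeps the particle set concentrated in regions of high posterior probability.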