3 views

Uploaded by Rajat Govekar

- A Neural Network- Based Classification Method for Inspection of Bead Shape in ERW
- Face Recognition- A Convolutional Neural Network Approach
- Machine Learning Primer 108796
- Gerke Soccer Jersey Number ICCV 2015 Paper
- Intelligent Control Assignment
- Elowsson_Deep Layered Learning in MIR
- H-DenseUNet Hybrid Densely Connected UNet for Live (1)
- Recent Techniques Used in Transmission Line Protection Final..
- MC0076(C)
- 1. Literature Review
- report xixuwang
- Study on Application of Quantum BP Neural Network to Curve Fitting
- Multivariable Feedback Control - Analysis and Design
- Uso de Redes Neurais Na Predio Da Evapotranspirao Da Gerbera Jamesonii Cultivada Em Ambiente Protegido
- Mitja Perus- Multi-level Synergetic Computation in Brain
- Rr410212 Neural Networks and Applications
- 6D-NN-Deep
- He Bi-Directional Cascade Network for Perceptual Edge Detection CVPR 2019 Paper
- 1607.02383
- Build

You are on page 1of 10

Computer Vision Group, Xerox Research Centre Europe

training on GPUs [28] or on very large clusters [29] a ne-

Fisher Vectors (FV) and Convolutional Neural Networks cessity, or their lack of geometric invariance [17]. This ex-

(CNN) are two image classification pipelines with different plains why the FV is still very competitive for certain tasks

strengths. While CNNs have shown superior accuracy on – see e.g. the winning system [19] at the Fine-Grained Vi-

a number of classification tasks, FV classifiers are typically sual Competition 2013 [3]. Hence, several works [47, 49]

less costly to train and evaluate. We propose a hybrid archi- have started exploring the combination of FVs and CNNs.

tecture that combines their strengths: the first unsupervised Our first contribution is a novel hybrid architecture that

layers rely on the FV while the subsequent fully-connected combines the best of both worlds – see Fig. 1. Its first lay-

supervised layers are trained with back-propagation. We ers are unsupervised and involve the computation and di-

show experimentally that this hybrid architecture signifi- mensionality reduction of high-dimensional FVs. This is

cantly outperforms standard FV systems without incurring followed by a set of supervised fully connected Neural Net-

the high cost that comes with CNNs. We also derive compet- work (NN) layers – akin to a Multi-Layer Perceptron (MLP)

itive mid-level features from our architecture that are read- – trained through back-propagation.

ily applicable to other class sets and even to new tasks. The motivation behind our hybrid architecture is three-

fold. First, since the convolutional layers of the CNN share

much with the coding/aggregation steps of the FV (see sec-

1. Introduction tion 3.3), we wanted to understand whether FVs were com-

The goal of image classification is to tag an image with petitive with representations learned by the convolutional

one or multiple class names based on its content. Our pri- layers of the CNN. Our experiments show that the accuracy

mary focus in this work is on learning classifiers on a large of our architecture comes close to that of the CNN model of

scale, i.e. when many classes and images are involved. This Krizhevsky et al. [28]. Second, while FVs are designed to

topic has received much attention [10, 53, 43, 31, 28, 47, 11] work well with linear classifiers (because they correspond

thanks in large part to the release of datasets such as Ima- to the explicit feature map of the Fisher Kernel), we wanted

geNet [12] and to the organization of public competitions to understand whether FV classification could be improved

such as the ImageNet Large Scale Visual Recognition Chal- with non-linear classifiers. Our work is the first to provide

lenge (ILSVRC) [42] since 2010. a positive answer thus showing that much still needs to be

In 2010-2011, the leading approaches to ILSVRC [43, learned about the FV. Third, in speech recognition MLPs on

31] were based on the Fisher Vector [38, 39] (FV) or the top of “handcrafted” features were the state-of-the-art until

closely related Super Vector (SV) [57]. These pipelines in- recently [55, 9]. Hence, we wanted to understand how such

volve extracting local descriptors, encoding them with high- a combination performed on image classification.

order statistics, aggregating them, and feeding them to ker- Because it is unpractical to collect large amounts of la-

nel classifiers (e.g. SVMs). However, in 2012 it was shown beled data for each new task, we are also interested in trans-

that Convolutional Neural Networks [30] (CNNs) trained in ferring the mid-level features learned by our architecture.

a supervised fashion with large amounts of labeled data out- Transferring features derived from image classifiers either

performed the FV [28]. CNNs are feed-forward architec- to different class sets or even to new tasks (e.g. image re-

tures involving multiple computational layers that alternate trieval or object detection) has been a very active research

linear operations such as convolutions and non-linear oper- topic [51, 50, 18, 13, 34, 1, 16]. Our second contribution is

ations such as max-pooling. Since then CNNs have consis- to show that the mid-level features derived from our hybrid

tently led the classification task at ILSVRC [28, 56, 46, 48]. architecture are competitive with those derived from CNNs.

Despite their success, deep architectures come with chal- The remainder of the article is organized as follows. Sec-

lenges. This includes the requirement for large amounts of tion 2 reviews related works. In section 3 we describe the

1

proposed architecture and explain how it relates to standard 3. System Architecture

FV and CNN pipelines. Section 4 validates the proposed

architecture experimentally. We first show its competi- Fig. 1 illustrates the proposed architecture. The first set

tiveness for large-scale classification on ILSVRC’10 and of unsupervised layers relies on the FV framework. The re-

ILSVRC’12. We then show that the mid-level features we sulting representation is then fed to a set of fully connected

supervised layers. We first detail the two sets of layers of

derive from the ILSVRC classifiers can be transferred to

other classification tasks on PASCAL VOC’07 and VOC’12 our architecture. We then highlight the main differences be-

and to instance-level image retrieval on Holidays and UKB. tween this architecture and standard FV and CNN pipelines.

3.1. Unsupervised layers

Architecture. Our architecture involves three unsupervised

2. Related Work layers that sequentially perform local feature extraction, FV

encoding and dimensionality reduction.

We now review hybrid systems that combine FVs (or re- Local feature extraction layer (u1 ). Our framework is

lated representations such as the VLAD [26]) and NNs and independent of a particular choice of local descriptors. Fol-

works that derive representations from image classifiers. lowing [45] we use SIFT and color descriptors but other lo-

cal descriptors could have been included. We resize images

Hybrid systems. Simonyan et al. [47] define a FV layer as

to 100K pixels while keeping the aspect ratio. We extract

a set of five operations: i) FV encoding, ii) supervised di-

24×24 patches every 4 pixels at 5 scales thus leading to ap-

mensionality reduction, iii) spatial stacking, iv) `2 normal-

proximately 10K patches per image. From each patch we

ization and v) PCA dimensionality reduction. Such FV lay-

extract two types of local descriptors: 128-dim SIFT de-

ers are stacked into a FV network. Peng et al. [37] explored

scriptors [32] and 96-dim Local Color Statistics (LCS) [8].

a similar idea for action recognition. Sydorov et al. [49]

Their dimensionality is reduced with PCA respectively to

on the other hand propose to improve on the FV by jointly

77-dim and 45-dim. Following [44] we concatenate to these

learning the SVM classifier and the GMM visual vocabu-

descriptors the xy coordinates as well as the scale of the

lary. This is inspired by back-propagation as used to learn

patches, thus yielding 80-dim and 48-dim descriptors.

NN parameters: the SVM gradients are back-propagated to

FV encoding layer (u2 ). Perronnin and Dance [38] ap-

compute the GMM gradients. This idea was extended to plied the Fisher Kernel [22] framework to images by using

the VLAD [36]. Our hybrid architecture is orthogonal to as a generative model of local descriptors a Gaussian Mix-

and can be combined with [47, 37] and [49, 36] – see the ture Model (GMM). This was later refined by Perronnin et

conclusion. Finally, Gong et al. [17] address the lack of ge- al. [39] and was shown to be a state-of-the-art patch encod-

ometric invariance of CNNs by extracting CNN activations ing technique [5, 27, 45]. Given a GMM with K Gaus-

from large patches, embedding them using VLAD encoding sians, let us denote its parameters by λ = {wk , µk , σk , k =

and aggregating them at multiple scales. 1 . . . K} with wk , µk , and σk respectively the mixture

weight, mean vector and standard deviation vector of Gaus-

Mid-level features from classifiers. A simple approach to sian k (assuming a diagonal covariance). In the FV frame-

compute a representation from a set of image classifiers is work, the D-dim descriptor x is embedded with a function

to input an image to these classifiers and to stack their out- Φ(x) = [ϕ1 (x), . . . , ϕK (x)] into a 2KD-dim space where

puts into a vector [51, 50]. An issue with this approach each function ϕk (x) : RD → R2D is defined by:

is that the number of classes constrains the representation

(x − µk )2

» „ « „ «–

γ(k) x − µk γ(k)

dimension. To control this dimension, Bergamo et al. [4] ϕk (x) = √ ,√ − 1 (1)

wk σk 2wk σk2

and Gordo et al. [18] proposed to learn classifiers with an

intermediate hidden layer whose number of units can be and γ(k) denotes the soft assignment of descriptor x to

parametrized and to represent a new image as the output Gaussian k. We use a GMM with 1K=1,024 Gaussians. The

of this intermediate layer. A natural extension is to learn per-patch FVs are aggregated with sum-pooling, square-

deeper architectures, i.e. architectures with more than one rooted and `2 -normalized. One FV is computed on the SIFT

hidden layer, and to use the output of these intermediate descriptors and one on the LCS descriptors. The two FVs

layers as features for the new tasks. Krizhevsky et al. [28] are concatenated into a 256K-dim representation.

showed qualitatively that the output of the penultimate layer FV dimensionality reduction layer (u3 ). These FVs are

could be used for image retrieval. This finding was quan- too high-dimensional to be processed as is both for memory

titatively validated for a number of tasks including image and computational reasons. To address this issue we use a

classification [13, 34, 56, 6, 41], image retrieval [41, 1], ob- simple PCA dimensionality reduction followed by whiten-

ject detection [16], and action recognition [34]. We show ing and `2 -normalization. PCA followed by whitening has

that the output of the intermediate layers of our architecture been recently rediscovered as a useful transformation for

can also be used as generic image representations. tasks such as retrieval and classification [23] or detection

Figure 1. Proposed hybrid architecture. Image patches are described by PCA-projected SIFT descriptors (u1 ). These descriptors are then

embedded using FV-encoding and aggregated. This image-level representation is normalized by square-rooting and `2 -normalization (u2 ).

The resulting FV is PCA-projected and re-normalized (u3 ). The supervised layers (s1 ), (s2 ), .. (sL−1 ) involve a linear projection followed

by a reLU. The last layer (sL ) involves a linear projection followed by a softmax or a sigmoid and produces the label estimates. Our

architecture can be considered deep as it stacks several unsupervised and supervised layers. To simplify the figure, we assumed that only

SIFT descriptors were extracted. However, in practice, we also leverage color descriptors (see text).

[20]. In our experiments, the FV dimensionality is reduced x(e) → 1/(1 + exp(−x(e))). This enables multiple output

from 256K down to 4K. This value stroke a good compro- classes to have strong activations (i.e. close to 1). We exper-

mise between efficiency gain and accuracy loss. imented with a number of supervised layers varying from 1

Learning. All previous layers are learned in an unsuper- to 4 (i.e. 0 to 3 hidden supervised layers).

vised fashion. A significant advantage of such a strategy Learning. To train the parameters of these supervised lay-

is that the corresponding parameters can be learned once ers, we use the standard objective that involves minimiz-

and for all on a representative set of unlabeled images. In ing the cross-entropy between the network output and the

our experiments, this is done on a subset of 10K images ground-truth. We assume that we have N labeled training

of the ILSVRC’10 training set. The PCA reduction of the images with yn,c = 1 if image n is tagged with class label

local descriptors and the GMM were learned on a sub- c and yn,c = 0 otherwise. We denote by ŷn,c the predic-

sample of 1M local descriptors randomly extracted from tion of label c for image n as output by the last supervised

these 10K images. Given that the number of training im- layer. By definition ŷn,c = xL (c). In the mono-label case,

ages was smaller than the FV dimensionality (10K<256K), we maximize:

N X C

we learned the FV-level PCA in the dual. X

yn,c log(ŷn,c ) (2)

n=1 c=1

3.2. Supervised layers

while in the multi-label case, we maximize:

The PCA-reduced FVs output by the last unsupervised XN X C

layer (u3 ) are input to the first supervised layer (s1 ). yn,c log(ŷn,c ) + (1 − yn,c ) log(1 − ŷn,c ). (3)

Architecture. The supervised part of our architecture con- n=1 c=1

sists of a set of L fully connected layers that involve a We initialize the weights from a zero-mean Gaussian.

linear projection followed by a non-linearity. If xn−1 For the optimization, we use back-propagation with mini-

is the input of (sn ), xn is its output and if σ denotes batch stochastic gradient descent and a batch size of 128.

the non-linearity, we have xn = σ(Fn (xn−1 )) where The learning rate is fixed during an iteration (one pass over

Fn (x) = Wn x + bn and Wn and bn are the parameters the data). Once the validation error does not decrease any

to be learned. For the intermediate hidden layers (s1 ) to more (i.e. it decreases by less than 0.1% absolute between

(sL−1 ), we use a rectified Linear Unit (reLU) non-linearity two iterations) we divide the step-size by 10. We repeat this

σ(x) = max(0, x) which showed improved convergence procedure twice, i.e. we have 3 learning rates. The starting

with respect to sigmoid non-linearities both in computer vi- step-size was cross-validated in our experiments. To avoid

sion [28] and speech recognition [55, 9]. As for the out- over-fitting, we use drop-out [28] at the input of all super-

put of the last layer (sL ), in the mono-label case (e.g. Ima- vised layers. We use the same drop-out rate for all layers

geNet) we use a softmax non-linearity which given the vec- although using different values might improve results [9].

tor x = [x(1), . . . , x(E)] performs the following transfor-

PE 3.3. Comparison with FV and CNN pipelines

mation: x(e) → exp(x(e))/ i=1 exp(x(i)). This ensures

that output scores are non-negative and sum to unity and Relationship with FVs. With respect to the standard large-

thus provides a probabilistic-like output. In the multi-label scale classification FV pipeline [45], our architecture ex-

case (e.g. PASCAL VOC) we use a sigmoid non-linearity: hibits two significant differences.

An advantage of our architecture is that one can view the

learning of the first layers as an unsupervised pre-training

stage. Since these layers do not need to be retrained for each

task, our architecture is fast to train (see “convergence” in

section 4.1). As a corollary, we can train efficiently mul-

tiple models for bagging purposes, starting from different

random initializations, as the unsupervised layers are fixed.

Figure 2. Comparison of a Neural Network classifier with a single Also, we note that the cost of forward-passing one sample

hidden layer and a kernel classifier based on random feature maps. through the unsupervised layers is much higher than the cost

of forward-passing through the supervised layers (by a fac-

tor ∼100). Hence, at inference time when bagging multiple

The first one is the PCA that we use to reduce the di-

models, we perform only once the costly operations through

mension of the FVs, both for memory and computational

the unsupervised layers and we repeat only the costless op-

reasons. To handle the challenge of high-dimensional vec-

erations through the supervised layers.

tors, Sánchez and Perronnin [43] use Product Quantiza-

tion (PQ) [25] which does not reduce the feature dimen- 4. Experiments

sion. This is justified by their choice of a linear classifier

whose capacity is the feature dimension + 1. In the linear We first provide a detailed analysis of the proposed

case, high-dimensional features are necessary to get a high architecture on ILSVRC’10. We then report results on

enough capacity. In our case, since we use a non-linear net- ILSVRC’12. Finally, we show that we can extract from

work, the capacity is not constrained by the dimension of the intermediate levels of our architecture mid-level features

the features that enter the supervised layers. which can be used for classification (on PASCAL VOC’07

The second difference is the fact that, in the standard FV and VOC’12) and retrieval (on Holidays and UKB).

classification pipeline, supervised learning relies on kernel

4.1. A detailed study on ILSVRC’10

classifiers, and generally on linear classifiers. In what fol-

lows, we distinguish the linear and non-linear cases. The ILSVRC’10 [42] contains 1K classes and ∼1.4M im-

linear case can be readily interpreted as a special case of ages: 1.2M for training, 50K for validation, and 150K for

our architecture where there is a single supervised layer, i.e. testing. We follow the challenge training/validation/test

L=1. The non-linear kernel case can be understood as an protocol and evaluation measure, i.e. the top-5 error rate.

architecture with two layers (L=2) where the parameters of FV dimensionality reduction. We first quantify the influ-

the first layer are not learned but fixed – see for instance ence of the FV dimensionality reduction in layer (u3 ). In

[2]. In a two-layer architecture, the first layer can be inter- this experiment, we use a single supervised layer for clas-

preted as a non-linear mapping Ψ and the second layer as a sification, i.e. a linear logistic classifier. Where we do not

linear classifier in the embedding space. Interestingly if Ψ perform dimensionality reduction, we use PQ to reduce the

involves random Gaussian projections followed by a reLU memory footprint of the 256K-dim FVs. Using the standard

non-linearity, Ψ is the feature map of the arc-cosine kernel setting of [45] (FV divided into sub-vectors of 8 dimensions

of order one as proposed by Cho and Saul [7]. This shows and 8 bits per sub-vector), we obtain a 25.5% top-5 error.

a close connection between NNs with a single hidden layer This is exactly the accuracy reported in [45] (with 1M-dim

and a reLU non-linearity, and arc-cosine kernel classifiers. descriptors in their case). If we perform PCA to 4K-dim, we

The main difference is that in the former case both layers obtain 27.2% which is only 1.7% higher. However, this 64-

are trained in a supervised fashion while in the latter only fold compression significantly speeds-up subsequent com-

the second layer is learned in a supervised fashion while the putations in supervised layers.

parameters of the first layer are set at random – see Fig. 2. Model complexity and drop-out. We now study the influ-

In section 4, we show the superiority of the proposed archi- ence of two key parameters which control the complexity of

tecture over the arc-cosine kernel alternative. our supervised layers: the number of supervised layers and

Relationship with CNNs. While FVs and CNNs seem the number of units per layer (in all the following experi-

quite different at first glance, several authors pointed out ments, the number of units is constant across hidden lay-

that they have a lot in common [47, 49]. Indeed, the FV ers). As more complex models might more easily overfit,

extraction pipeline can also be interpreted as a series of lay- we study the influence of these two parameters jointly with

ers that alternate linear and non-linear operations and can the drop-out rate. Results are summarized in Fig. 3.

be paralleled with the convolutional layers of the CNN. The We make the following observations. Drop-out does not

main difference between the proposed architecture and the help when training a linear classifier (0 hidden layer), most

standard CNN [28] is that our first layers are unsupervised certainly because we operate on low-dimensional descrip-

while in [28] all layers are learned in a supervised fashion. tors (4K) and therefore there is little opportunity to overfit.

Figure 3. Top-5 error as a function of the drop-out rate for a variable number of hidden layers and a variable number of hidden units per

layer: 1,024 (left), 2,048 (middle) and 4,096 (right) units per hidden layer (always the same number of units in each hidden layer).

The 0.5 value which is generally used in CNNs [28] does tions when learning a model with 3 hidden supervised lay-

not work in our architecture and the optimal seems to be ers with 4K units and a drop-out rate of 0.2 (the best results

around 0.2 whatever the number of hidden layers and units. reported in Fig. 3 right). We observe that the convergence

Finally, more hidden layers does not seem to help for too is much faster than what is reported in [28] (10 passes in

small a number of hidden units (see Fig. 3 left). However, our case vs. 90 in [28]). We believe that this is because the

it makes a difference for a larger number of units (see Fig. first layers of our architecture are pre-trained in an unsu-

3 right) and the performance saturates with 3 hidden layers. pervised manner and are kept fixed during the supervised

Comparison with kernel methods. In section 3.3, we dis- learning process, thus reducing the number of parameters

cussed the link between kernel classifiers and NNs with one to be learned. Running 24 threads on a Linux server with

TM

hidden layer. We especially highlighted the close relation- Intel
Core

R i5-2400 CPUs @ 3.10GHz and 128GBs of

ship between the arc-cosine kernel classifier and NNs with RAM, a pass over the data is carried out in ∼ 1h with our

1 hidden layer employing reLU non-linearities. Therefore, C++ implementation. Hence, the whole training process

we compare these two alternatives. Since the arc-cosine takes about 10h on a single server. Using multiple servers, it

kernel is not common in computer vision, we also com- is possible to train in parallel multiple models from different

pare with the RBF kernel. Because of the large number initializations and to bag their outputs.

of training samples (1.2M), we train the non-linear kernel Bagging. We consider architectures with 0, 1, 2 and 3 hid-

classifiers in the primal by leveraging explicit feature maps. den supervised layers. For each architecture, we train 8 dif-

In the arc-cosine kernel case, we use the feature map pro- ferent models using different random seed initializations.

posed by Cho and Saul [7] which involves a random Nor- To combine the output of multiple networks, we multiply

mal projection followed by a reLU non-linearity. In the the probabilistic outputs. Results are presented in Fig. 4(c)

RBF kernel case, we use the explicit feature map of Rahimi when combining 2, 4 or 8 models. The larger the number

and Recht [40] which involves a random Normal projec- of hidden layers, the greater the impact of bagging. This

tion, a random offset and a cosine non-linearity. We cross- does not come as a complete surprise. Indeed, when there

validated the kernel bandwidth of the RBF. is no hidden supervised layer the objective to be optimized

Results are reported in Fig. 4(a) as a function of the di- is convex and therefore different initializations lead to the

mensionality of the feature map (for the arc-cosine and RBF same solution. The larger the number of hidden layers, the

kernels) and of the number of units in the hidden layer (in greater the opportunity to converge to different solutions. In

the NN case). The arc-cosine and RBF kernels perform the case of 3 hidden layers, we go from a top-5 error rate of

similarly and significantly worse than a NN with one hid- 22.6% with no bagging to 20.2% with bagging.

den layer, especially for a small number of random projec- Data augmentation. A standard way to avoid overfitting

tions / hidden units. Just to perform on par with the lin- when training classifiers and especially deep architectures is

ear baseline (NN with no hidden layer), both the arc-cosine to artificially augment the data by introducing transformed

and RBF kernels need to project the original samples in a versions of the original images. For instance, Krizhevsky et

32K-dim space (8 times larger than the original 4K-dim fea- al. [28] uses a predefined set of crops, flips and color pertur-

tures). While it might be possible to obtain better results bations. To avoid manually choosing the transformations,

with these kernels by projecting in even higher dimensional we run the Iterative Transformation Pursuit (ITP) algorithm

spaces, this is much more costly than the NN alternative. of Paulin et al. [35] on a subset of ILSVRC’10 (using all

Convergence. We report in Fig. 4(b) the evolution of the 1K classes but only 30 images per class) and choose a set

objective as well as the validation accuracy over the itera- of 4 transformations on top of the original images. Hence,

(a) (b) (c)

Figure 4. Top-5 error on ILSVRC’10. (a) Comparison of a linear classifier, a NN with 1 hidden layer and kernel classifiers with explicit

feature maps. (b) Log-posterior objective and validation error as a function of the number of passes over the training data. (c) Impact of

bagging for NNs with 0, 1, 2 and 3 hidden layers and 4K units per hidden layer. For each model, we chose the best validation drop-out rate.

hidden units data augm. bagging top-5 error (in %) [42] Best challenge FV results 26.2

4K no no 22.6 [28] AlexNet CNN (Bag-5) 16.4

4K no yes 20.2 [56] Zeiler CNN (Bag-6) 14.7

4K yes no 19.8 [46] OverFeat CNN (Bag-7) 13.2

4K yes yes 18.1 [48] Simonyan CNN (Bag-7) 7.5

8K yes yes 17.6 Table 2. Comparison with the state-of-the-art on ILSVRC’12.

State-of-the-art When relevant, we indicate how many models are bagged. Top-

[45] FV 25.5 5 error (in %).

[35] FV + ITP 25.1

[47] deep Fisher network 20.8

[35] DeCAF + ITP 18.6 augmentation (original image + 4 transforms) and bag 8

[28] AlexNet CNN 17.0

models. This leads to a top-5 error rate of 17.6%. Note that

Table 1. Impact of data augmentation and bagging on ILSVRC’10

for an architecture with 3 hidden supervised layers. We also com- if we perform only training set (but not test set) augmenta-

pare with the state-of-the-art. tion we obtain 19.0%. We compare these results with the

state-of-the-art in Table 1. We report significantly better re-

sults than a FV baseline [45] and the FV + ITP results [35].

the number of images is multiplied by 5. Data augmen- Our system is competitive with respect to CNNs [28, 35].

tation is used both at training and test time. At test time, Note that Simonyan et al. [47] report a 14.3% top-5 error

we multiply the predictions corresponding to the different by combining deep Fisher networks and CNNs. Similarly,

transformations of the images. we could combine the output of our model with that of a

In the case where we have no hidden supervised layer standard CNN.

(i.e. we learn a linear classifier on top of the 4K-dim PCA-

reduced FVs), the error rate is decreased from 27.2% with

4.2. Results on ILSVRC’12

no data augmentation to 25.2% with data augmentation. In We also benchmark our architecture on ILSVRC’12 [42]

the case where we have 3 hidden supervised layers, we re- for which many results have been reported recently in the

port the results in Table 1. We observe a 2.8% improve- literature. This dataset contains 1K classes, ∼1.2M training

ment without bagging (from 22.8% to 19.8%) and a 2.1% images and 50k validation images. Since the test labels are

improvement with bagging (from 20.2% to 18.1%). not available, we report results on the validation set as is

Learned representations. It is interesting to visualize what standard practice. We use training set augmentation and

is learned by our architecture. For this purpose, we in- bag 8 models. To avoid overfitting our architecture to the

spected the output of the intermediate unsupervised and su- validation data, we use the best architecture of ILSVRC’10:

pervised layers – see Fig. 5 and its caption. The conclu- three hidden supervised layers with 8K hidden units and a

sion is that the output of supervised layers is more semanti- 0.2 drop-out rate in all supervised layers. We obtain a top-5

cally consistent than the output of unsupervised layers. This error rate of 19.8%.

makes for a strong case to use the output of supervised lay- We compare to the state-of-the-art in Table 2. Our accu-

ers as generic mid-level features – see section 4.3. racy is significantly better than the best FV-based results re-

Comparison with the state-of-the-art. To obtain the best ported on this dataset (during the challenge). However, it is

possible results with our architecture, we use three hidden somewhat below the CNN systems [28, 56, 46] and signifi-

supervised layers with 8K hidden units. We perform data cantly below the recent very deep architecture by Simonyan

Figure 5. Visualizing the output of intermediate layers for an architecture with 3 hidden supervised layers and 4K units per hidden layer. For

a given layer, we show the images with the strongest responses for four random neurons (one line per neuron). Left: x0 is considered, i.e.

the output of the last unsupervised layer (u3 ). While the neuron outputs are visually consistent (e.g. same color), they are not necessarily

semantically consistent. This is because all the learning prior to (u3 ) is unsupervised. Right: x3 is considered, i.e. the output of the

penultimate layer (s3 ). Supervised learning makes the neuron outputs much more semantically consistent.

and Zisserman [48]. We still believe that our architecture classifier data augm. bagging mAP (in %)

linear no no 71.1 ± 0.12

has merits of its own. Especially, it does not require GPUs linear yes no 72.9 ± 0.19

or very large clusters [29] to be trained. Training the system linear yes yes 74.5

from scratch on a small CPU cluster (8 machines) takes ∼10 non-linear no no 73.1 ± 0.13

days: 7 days for the extraction of the PCA-reduced FVs and non-linear yes no 74.8 ± 0.25

non-linear yes yes 76.2

3 days for the supervised training of the 8 models. This is to Table 4. mAP on VOC’07 for linear vs. non-linear classifiers,

be compared e.g. with the 2-3 weeks reported by Simonyan with/without data-augmentation and with/without bagging.

and Zisserman in [48] to train a single model with a four

NVIDIA Titan black GPU system.

layers, reLU non-linearities for the hidden layers and sig-

4.3. Transferring mid-level features

moid non-linearities for the last layer (see section 3.2). We

In standard CNN architectures trained on large datasets cross-validated the number of layers, the number of hidden

such as ImageNet, the outputs of intermediate layers have units and the drop-out rate.

been used as mid-level features in a large variety of new Data augmentation and bagging. Since we have a limited

tasks. We now show that our architecture has the same amount of labeled training VOC images, we use data aug-

property. We start from a model learned on ILSVRC’12 mentation on VOC too. To this end, we apply ITP [35] to

(3 hidden supervised layers with 4K units and a drop-out select 4 transformations (on top of the original images), and

rate of 0.2) and study how such a classifier can transfer to apply them at training and test time. The ITP learning is

new tasks for which we have less (or no) training data. We done on the training and validation sets using a linear clas-

consider both classification and instance retrieval tasks. sifier. We use the same chosen transformations in both the

linear and non-linear cases. We also consider the bagging

4.3.1 Classification of multiple systems (8 in our experiments).

Results. We show results in Table 4. As observed for

Experiments are conducted on PASCAL VOC’07 and ILSVRC’10, non-linear classifiers improve over linear ones

VOC’12 [14, 15] which contain 20 classes. VOC’07 is split despite the limited amount of training data. Data augmen-

into 2,501 training, 2,510 validation and 4,952 test images. tation and bagging always help. Our best results of 76.2%

VOC’12 is composed of 5,717 training, 5,823 validation, mAP are obtained by bagging 8 classifiers with 3 hidden

and 10,991 test images. Following the PASCAL guidelines, layers (2K units per layer).

we first propose a study on VOC’07 and report a single run Comparison with state-of-the-art. We now compare our

on VOC’12 (evaluated on the PASCAL server). On both results with other mid-level representations in Table 5. For

datasets, we report the mean Average Precision (mAP). VOC’12, we trained the system with the best VOC’07 set-

Transfer strategy. We follow the strategy that involves ting and submitted our results to the PASCAL server1 . First

extracting the output of the penultimate layer (xL−1 in we report superior performance with respect to the standard

our case), `2 -normalizing it and feeding it to a classi- FV pipeline [45] and its improved version from [6]. We also

fier [13, 16]. We consider both linear and non-linear clas-

sifiers. In the non-linear case, to stay in the spirit of the 1 Detailed results available at: http://host.robots.ox.ac.

Methods FV x0 x∗ Bag-8 + PCA FV+proj [18] CNN+VLAD [17] CNN [1] CNN [41]

dim 256K 4K 4K 512 4K 512 12K 4K 500

Holidays 82.4 83.5 80.5 ± 0.31 82.7 84.7 79.0 80.2 74.9 64.2

UKB 3.35 3.33 3.36 ± 0.01 3.37 3.43 3.36 - 3.43 3.04

Table 3. Retrieval results on Holidays and UKB. x∗ indicates that we use the output of the best intermediate layer.

Proposed Architecture 76.2 77.7

[45] baseline FV 63.9 - Results. All results are reported in Table 3. We also com-

[6] improved FV 68.0 -

pare with other compact global image representations. We

[13] from [6] DeCAF - CNN 73.4 -

[41] Razavian - CNN 77.2 - observe that the PCA-reduced FVs obtain very competitive

[56] Zeiler - CNN - 79.0 results and that using the best intermediate layer leads to a

[6] Chatfield - CNN 82.4 83.2 decrease of performance on Holidays and to only a slight

[34] Oquab - CNN 77.7 78.7

increase on UKB. However, the proposed bagging strategy,

[21] He - CNN 80.1 -

[52] Wei - CNN 81.5 81.7 that involves PCA and whitening, leads to improvements

Table 5. Comparison with other mid-level representations on PAS- with a competitive result of 84.7% (resp. 3.43) on Holi-

CAL VOC’07 and VOC’12 (mAP in %). days (resp. UKB) for 4K dimensions. [41] reports 84.3%

on Holidays and 3.64 on UKB with a costly setting that in-

volves extracting and matching multiple CNN features per

outperform off-the-shelf DeCAF results and perform on par image. Our results are obtained by extracting and matching

with the results of Razavian et al. [41], Oquab et al. [34] a single descriptor per image. Babenko et al. [1] also re-

and Zeiler and Fergus [56]. He et al. [21] and Wei et al. [52] port better results on Holidays (79.3%) for a CNN that has

report better performance but with significantly more com- been retrained using an external dataset of landmarks. This

plex transfer strategies. [21] uses a spatial pyramid pooling additional information could be used by our architecture as

layer, with 50 bins. [52] uses a significantly more com- well, e.g. by learning an extra set of layers as done on VOC.

plex classification pipeline driven by bounding box propos-

als. The best results are those of Chatfield et al. [6] with a

costly architecture. Note that [34] and [52] also report bet-

ter results with extended class sets (i.e. more than the 1,000 5. Conclusion

classes of ILSVRC’12).

Summary. We proposed a hybrid architecture for image

classification that combines the benefits of FV and CNN

4.3.2 Instance-level retrieval pipelines. A first set of unsupervised layers involves the

We study the application of our mid-level features to in- extraction and dimensionality reduction of FVs while the

stance retrieval. We consider two datasets: INRIA Holi- subsequent supervised layers are fully connected. The pro-

days [24] (Holidays) which involves 1,491 images of 500 posed approach was shown to significantly improve over

scenes and the University of Kentucky Benchmark [33] previous FV systems without incurring the high cost asso-

(UKB) which involves 10,200 images of 2,250 objects. We ciated with CNNs. Especially, our system does not require

use the standard evaluation measures: mean AP (mAP) for training with GPUs or large CPU clusters. We also showed

Holidays and 4 times the recall at top 4 on UKB. that mid-level representations extracted from our architec-

ture were competitive with the mid-level features extracted

Layer choice. We extract and `2 -normalize the output of in-

from CNN representations, on several target tasks.

termediate layers and compare them with the dot-product.

As observed by Yosinski et al. [54], the transferability of Future work. We believe that our current architecture

features decreases as the distance between the base and tar- can be improved in its last unsupervised layer, i.e. the un-

get tasks increases. For UKB, whose images are closer to supervised dimensionality reduction of FVs. On the one

the ImageNet ones, we use x2 , while for Holidays, which hand PCA significantly reduces the high-dimensional FVs,

contains mostly scenes, we use x1 . We also report results and makes the subsequent supervised training and param-

for two baselines: using the high-dimensional FVs and the eter search manageable. On the other hand, FVs contain

PCA-reduced FVs, i.e. x0 . much fine-grained information [19] that might be discarded

Bagging. Bagging can be done by fusing the scores of mul- by this operation. An alternative inspired by [47] would

tiple queries. However, the query cost of this late fusion be to use a supervised dimensionality reduction instead of

strategy increases linearly with the number of models. In- PCA. Another alternative inspired by [49] would be to back-

stead, we concatenate the representations produced by the propagate the gradients to the currently unsupervised layers

different models, and reduce the dimension of this new rep- to fine-tune them. We leave this for future work.

Acknowledgments [17] Y. Gong, L. Wang, R. Guo, and S. Lazebnik. Multi-scale

orderless pooling of deep convolutional activation features.

This work was done in the context of the Project Fire-ID, In ECCV, 2014. 1, 2, 8

supported by the Agence Nationale de la Recherche (ANR- [18] A. Gordo, J. Rodrı́guez-Serrano, F. Perronnin, and E. Val-

12-CORD-0016). veny. Leveraging category-level labels for instance-level im-

age retrieval. In CVPR, 2012. 1, 2, 8

References [19] P.-H. Gosselin, N. Murray, H. Jégou, and F. Perronnin. Re-

[1] A. Babenko, A. Slesarev, A. Chigorin, and V. S. Lempitsky. visiting the Fisher vector for fine-grained classification. PRL,

Neural codes for image retrieval. In ECCV, 2014. 1, 2, 8 2014. 1, 8

[2] Y. Bengio and Y. LeCun. Large-Scale kernel machines, chap- [20] B. Hariharan, J. Malik, and D. Ramanan. Discriminative

ter Scaling learning algorithms towards AI. MIT press, 2007. decorrelation for clustering and classification. In ECCV,

4 2012. 3

[3] A. Berg, R. Farrell, A. Khosla, J. Krause, L. Fei-Fei, J. Li, [21] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling

and S. Maji. Fine-Grained Competition (FGComp). http: in deep convolutional networks for visual recognition. In

//sites.google.com/site/fgcomp2013/, 2013. ECCV, 2014. 8

1 [22] T. Jaakkola and D. Haussler. Exploting generative models in

[4] A. Bergamo, L. Torresani, and A. Fitzgibbon. PiCoDeS: discriminative classifiers. In NIPS, 1998. 2

Learning a compact code for novel-category recognition. In [23] H. Jégou and O. Chum. Negative evidences and co-

NIPS, 2011. 2 occurrences in image retrieval: the benefit of PCA and

[5] K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. whitening. In ECCV, 2012. 2

The devil is in the details: an evaluation of recent feature [24] H. Jégou, M. Douze, and C. Schmid. Hamming embedding

encoding methods. BMVC, 2011. 2 and weak geometric consistency for large scale image search.

[6] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. ECCV, 2008. 8

Return of the devil in the details: delving deep into convolu- [25] H. Jégou, M. Douze, and C. Schmid. Product quantization

tional nets. In BMVC, 2014. 2, 7, 8 for nearest neighbor search. IEEE TPAMI, 2011. 4

[7] Y. Cho and L. Saul. Kernel methods for deep learning. NIPS, [26] H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating

2009. 4, 5 local descriptors into a compact image representation. In

[8] S. Clinchant, G. Csurka, F. Perronnin, and J.-M. Renders. CVPR, 2010. 2

XRCE’s participation to imageval. ImageEval @ CVIR, [27] H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez, and

2007. 2 C. Schmid. Aggregating local image descriptors into com-

[9] G. Dahl, T. Sainath, and G. Hinton. Improving deep neural pact codes. TPAMI, 2012. 2

networks for LVCSR using rectified linear units and dropout. [28] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet clas-

In ICASSP, 2013. 1, 3 sification with deep convolutional neural networks. In NIPS,

[10] J. Deng, A. Berg, K. Li, and L. Fei-Fei. What does classi- 2012. 1, 2, 3, 4, 5, 6

fying more than 10,000 image categories tell us? In ECCV,

[29] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Cor-

2010. 1

rado, J. Dean, and A. Ng. Building high-level features using

[11] J. Deng, N. Ding, Y. Jia, A. Frome, K. Murphy, S. Bengio, large scale unsupervised learning. In ICML, 2012. 1, 7

Y. Li, H. Neven, and H. Adam. Large-scale object classifica-

[30] Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard,

tion using label relation graphs. In ECCV, 2014. 1

W. Hubbard, and L. Jackel. Handwritten digit recognition

[12] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-

with a back-propagation network. NIPS, 1989. 1

Fei. ImageNet: A large-scale hierarchical image database.

[31] Y. Lin, F. Lv, S. Zhu, M. Yang, T. Cour, K. Yu, L. Cao, and

In CVPR, 2009. 1

T. Huang. Large-scale image classification: fast feature ex-

[13] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang,

traction and SVM training. In CVPR, 2011. 1

E. Tzeng, and T. Darrell. DeCAF: A deep convolutional acti-

vation feature for generic visual recognition. In ICML, 2014. [32] D. Lowe. Distinctive image features from scale-invariant

1, 2, 7, 8 keypoints. IJCV, 2004. 2

[14] M. Everingham, L. Van Gool, C. Williams, J. Winn, and [33] D. Nistér and H. Stewénius. Scalable recognition with a vo-

A. Zisserman. The Pascal visual object classes (VOC) chal- cabulary tree. CVPR, 2006. 8

lenge. IJCV, 2010. 7 [34] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and

[15] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, transferring mid-level image representations using convolu-

and A. Zisserman. The PASCAL Visual Object Classes tional neural networks. In CVPR, 2014. 1, 2, 8

Challenge 2012 (VOC2012) Results. http://www.pascal- [35] M. Paulin, J. Revaud, Z. Harchaoui, F. Perronnin, and

network.org/challenges/VOC/voc2012/workshop/index.html. C. Schmid. Transformation pursuit for image classification.

7 In CVPR, 2014. 5, 6, 7

[16] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich fea- [36] X. Peng, L. Wang, Y. Qiao, and Q. Peng. Boosting VLAD

ture hierarchies for accurate object detection and semantic with supervised dictionary learning and high-order statistics.

segmentation. In CVPR, 2014. 1, 2, 7 In ECCV, 2014. 2

[37] X. Peng, C. Zou, Y. Qiao, and Q. Peng. Action recognition [57] Z. Zhou, K. Yu, T. Zhang, and T. Huang. Image classification

with stacked Fisher vectors. In ECCV, 2014. 2 using super-vector coding of local image descriptors. ECCV,

[38] F. Perronnin and C. Dance. Fisher kernels on visual vocabu- 2010. 1

laries for image categorization. CVPR, 2007. 1, 2

[39] F. Perronnin, J. Sánchez, and T. Mensink. Improving the

Fisher kernel for large-scale image classification. ECCV,

2010. 1, 2

[40] A. Rahimi and B. Recht. Random features for large-scale

kernel machines. NIPS, 2007. 5

[41] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carls-

son. CNN features off-the-shelf: an astounding baseline for

recognition. In CVPR Deep Vision Workshop, 2014. 2, 8

[42] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh,

S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein,

A. Berg, and L. Fei-Fei. Imagenet large scale visual recog-

nition challenge. arXiv, 2014. 1, 4, 6

[43] J. Sánchez and F. Perronnin. High-dimensional signature

compression for large-scale image classification. In CVPR,

2011. 1, 4

[44] J. Sánchez, F. Perronnin, and T. de Campos. Modeling the

spatial layout of images beyond spatial pyramids. In PRL,

2012. 2

[45] J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek. Im-

age classification with the Fisher vector: theory and practice.

IJCV, 2013. 2, 3, 4, 6, 7, 8

[46] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and

Y. LeCun. OverFeat: Integrated recognition, localization and

detection using convolutional networks. In ICLR, 2014. 1, 6

[47] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep Fisher

networks for large-scale image classification. In NIPS, 2013.

1, 2, 4, 6, 8

[48] K. Simonyan and A. Zisserman. Very deep convolutional

networks for large-scale image recognition. arXiv, 2014. 1,

6, 7

[49] V. Sydorov, M. Sakurada, and C. Lampert. Deep Fisher ker-

nels – End to end learning of the Fisher kernel GMM param-

eters. In CVPR, 2014. 1, 2, 4, 8

[50] L. Torresani, M. Szummer, and A. Fitzgibbon. Efficient ob-

ject category recognition using classemes. In ECCV, 2010.

1, 2

[51] G. Wang, D. Hoiem, and D. Forsyth. Learning image simi-

larity from Flickr groups using stochastic intersection kernel

machines. In ICCV, 2009. 1, 2

[52] Y. Wei, W. Xia, J. Huang, B. Ni, J. Dong, Y. Zhao, and

S. Yan. CNN: single-label to multi-label. arXiv, 2014. 8

[53] J. Weston, S. Bengio, and N. Usunier. Large scale image

annotation: learning to rank with joint word-image embed-

dings. ECML, 2010. 1

[54] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. Quantifying

the transferability of features in deep neural networks. arXiv,

2014. 8

[55] M. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q. le,

P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, and G. Hinton.

On rectified linear units for speech processing. In ICASSP,

2013. 1, 3

[56] M. D. Zeiler and R. Fergus. Visualizing and understanding

convolutional networks. In ECCV. 2014. 1, 2, 6, 8

- A Neural Network- Based Classification Method for Inspection of Bead Shape in ERWUploaded byJames Phillips
- Face Recognition- A Convolutional Neural Network ApproachUploaded bynrgyku
- Machine Learning Primer 108796Uploaded byVinay Nagnath Jokare
- Gerke Soccer Jersey Number ICCV 2015 PaperUploaded byGeorge Chiu
- Intelligent Control AssignmentUploaded byMuhammad Zeeshan
- Elowsson_Deep Layered Learning in MIRUploaded byKai Lin Zhang
- H-DenseUNet Hybrid Densely Connected UNet for Live (1)Uploaded byMuhammad Zaka Ud Din
- Recent Techniques Used in Transmission Line Protection Final..Uploaded byPrabhat Sharma
- MC0076(C)Uploaded byBaijnath Gupta
- 1. Literature ReviewUploaded byMohammed Razwin K P
- report xixuwangUploaded byapi-320563531
- Study on Application of Quantum BP Neural Network to Curve FittingUploaded bymariosergio05
- Multivariable Feedback Control - Analysis and DesignUploaded bydanh
- Uso de Redes Neurais Na Predio Da Evapotranspirao Da Gerbera Jamesonii Cultivada Em Ambiente ProtegidoUploaded byrogerio_phb
- Mitja Perus- Multi-level Synergetic Computation in BrainUploaded bydcsi3
- Rr410212 Neural Networks and ApplicationsUploaded bySrinivasa Rao G
- 6D-NN-DeepUploaded byNishant Tiwari
- He Bi-Directional Cascade Network for Perceptual Edge Detection CVPR 2019 PaperUploaded byJan Kristanto
- 1607.02383Uploaded byArun Kumar S
- BuildUploaded byViet Nguyen
- Determining Optimum Structure for Artificial Neural NetworksUploaded byCaro Bouniot
- Prediction of Compressive Strength of Fiber Reinforced Concrete Using Artificial Neural NetworksUploaded byKhalid Aljanabi
- Test Engine for Dummies ExtravaganzaUploaded byRamadhan Yunus
- N18-3011.pdfUploaded byRoedson Aringo
- APPLICATION OF INSTANCE-BASED LEARNERS FOR ARRHYTHMIA DETECTION IN ECG SIGNALS.Uploaded byIJAR Journal
- applsci-08-00837.pdfUploaded byNguyễn Việt Anh
- Chapter 2Uploaded byTukang_Gunting_3476
- 1-s2.0-0148906294911290-mainUploaded byBill Maquin
- 31496Uploaded byJavier M Quispe
- Untitled (2).docxUploaded byShūbhåm ÑagayaÇh

- codes-ukUploaded byEnmanuel Benavides
- The Restless EarthUploaded byAnabel Salazar
- UT Dallas Syllabus for lit3311.002.07s taught by Theresa Towner (tmtowner)Uploaded byUT Dallas Provost's Technology Group
- Modele de Eseuri EnglezaUploaded byhelen93hasdeu6971
- Super_Safari_Pupils_Book_Level_3_Sample_Unit.pdfUploaded bytuandt09
- Gricean MaximsUploaded byMaryuri Marin
- TN WoD AttendeesUploaded byJason Altman
- Evaluation of Consumers' Satisfaction towards Rural Primary Health Care Services in Babylon GovernorateUploaded byJASH MATHEW
- Learnchase Best Power Systems for AIX – Virtualization II: Advanced PowerVM and Performance for IBM online TrainingUploaded byLearnchase
- Why Some Student Cheat on Exam . Useful Info About Cheat on ExamUploaded byLda Ampilin
- lesson plan domain and rangeUploaded byapi-356651649
- LKPDUploaded byRoms Wildah
- 8_FENGLER_16H30.pdfUploaded byminchanmon
- 1451459162_89 (2015) 37259-37267Uploaded byIsaac Batoya
- ResearchMethodSTA630SolvedMCQs2Uploaded byayaz143
- Safari - 28-Aug-2018 at 2:00 PM.pdfUploaded byHarshad Hemani
- Plagiarism Checking WebUploaded bymazhaic
- Memorial ScholarshipsUploaded byjojoj
- Timed Up and Go Test (TUG)Uploaded bycpradheep
- Training and Development Program for Apple IncUploaded bysanorita2818
- AIWTSAC-f713e1d6Uploaded byazharuddink_2
- lp2 magazineUploaded byapi-232123517
- SBM Course OutlineUploaded byIndrajit Paul
- Blue Book Updated by NARAYAN CHANGDERUploaded byUGC NET BOOK
- Generation Y Leaders Focused on Economic Justice See New Hope in Ministry as "Change Agent" ProfessionUploaded byfteleaders
- The Service Design PackageUploaded byGabriel Ds
- Formular MesuesUploaded bysheila
- 2739 Chapter 3 Exercises and Answer Key.Uploaded byluihwang
- 10.1016@j.socscimed.2018.02.026Uploaded byThiagoSoares
- year 11 drama unit of work - chris scapensUploaded byapi-378246480