Report on scientific papers.

Neural networks: building high-level features using unsupervised learning.
Wilson R. Tingo Y. tyn20@hotmail.com

State of the art
Recent studies observe that it is quite time-intensive to train deep learning algorithms, and the long training time is suspected to be partially responsible for the lack of high-level features reported in the literature. For example, researchers typically reduce the sizes of datasets and models in order to train networks in a practical amount of time, and these reductions limit the learning of high-level features. This problem is addressed by scaling up the core components involved in training deep networks: the dataset, the model, and the computational resources. First, we use a large dataset generated by sampling random frames from random YouTube videos.

ABSTRACT: Unsupervised learning makes it possible to build high-level features from unlabeled data alone. For example, is it possible to learn the features of a face (a face detector) using only unlabeled images? To answer this, a neural network composed of several locally connected layers was trained; the model has 1 billion connections, and the dataset contains 10 million 200x200 pixel images downloaded from the Internet. The network was trained using model parallelism and asynchronous SGD (stochastic gradient descent) on a cluster of 1,000 machines for three days. Our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. The network is also sensitive to other high-level concepts such as cat faces and human bodies. KEYWORDS: artificial neural networks, artificial intelligence, face detector, unsupervised learning.

Figure 2. Thirty randomly-selected training images (shown before the whitening step).

! "N#RO $%#"ON
The focus of this work is to build high-level, class-specific feature detectors from unlabeled images. It investigates the feasibility of building high-level features from only unlabeled data and, perhaps more importantly, answers an intriguing question: can a neuron learn such concepts from unlabeled data? Unsupervised feature learning and deep learning are machine-learning methodologies for building features from unlabeled data. Using unlabeled data in the wild to learn features is the key idea behind the self-taught learning framework.

A subset of training images was inspected to check the proportion of faces in the dataset; we ran an OpenCV face detector on 60x60 randomly-sampled patches from the dataset.
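This sanity check can be sketched as follows: crop random 60x60 patches and count how often a face detector fires. Everything here is illustrative — the function names are mine, and a toy brightness "detector" stands in for the OpenCV cascade the paper used, so the sketch is self-contained.

```python
import numpy as np

def estimate_face_proportion(images, detect_face, n_patches=1000, patch=60, seed=0):
    """Estimate the fraction of random patch-size crops that trigger a detector.

    detect_face: any callable patch -> bool. The paper used an OpenCV face
    detector in this role; a stand-in is used in the usage example below.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        h, w = img.shape[:2]
        # Top-left corner of a random patch x patch crop.
        y = rng.integers(h - patch + 1)
        x = rng.integers(w - patch + 1)
        if detect_face(img[y:y + patch, x:x + patch]):
            hits += 1
    return hits / n_patches

# Toy usage: a "detector" that fires on bright patches, over two fake images.
images = [np.zeros((200, 200)), np.ones((200, 200))]
ratio = estimate_face_proportion(images, lambda p: p.mean() > 0.5)
```

With a real detector (e.g. `cv2.CascadeClassifier`), `detect_face` would wrap `detectMultiScale` and report whether any face was found in the patch.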

2 ALGORITHMS
Figure 1. The architecture and parameters in one layer of the network.
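Each stage of the network, as described in the paper, stacks local filtering, L2 pooling, and local contrast normalization, and the stage is replicated three times. A toy 1D numpy sketch of the encoding path (all sizes, and the dense-matrix stand-in for local receptive fields, are my own simplifications):

```python
import numpy as np

def one_stage(x, W, pool=3, eps=1e-8):
    """One stage of the encoding path, in 1D: local filtering, L2 pooling
    over groups of `pool` filter outputs, then contrast normalization."""
    h = W @ x                              # filtering (dense stand-in for
                                           # local receptive fields)
    p = np.sqrt((h.reshape(-1, pool) ** 2).sum(axis=1) + eps)  # L2 pooling
    return (p - p.mean()) / (p.std() + eps)  # contrast normalization

rng = np.random.default_rng(0)
x = rng.normal(size=32)                        # a toy 1D "image"
s1 = one_stage(x, rng.normal(size=(12, 32)))   # stage 1 -> 4 pooled units
s2 = one_stage(s1, rng.normal(size=(6, 4)))    # stage 2 -> 2 pooled units
s3 = one_stage(s2, rng.normal(size=(3, 2)))    # stage 3 -> 1 pooled unit
```

Chaining three such stages gives the nine-layered structure (three sublayers per stage) that the report describes.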

& #R'"N"N( SE# %ONS#R$%#"ON
The training dataset is constructed by sampling frames from 10 million YouTube videos. To avoid duplicates, each video contributes only one image to the dataset. Each example is a color image of 200x200 pixels.
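The one-image-per-video rule can be sketched in a few lines. `get_random_frame` is a hypothetical helper standing in for actual video decoding, and string placeholders stand in for 200x200 frames:

```python
import random

def build_dataset(video_ids, get_random_frame, target=10_000):
    """Pick `target` distinct videos and take one random frame from each,
    so no video contributes more than one image (the de-duplication rule)."""
    random.seed(0)  # reproducible sketch
    chosen = random.sample(video_ids, min(target, len(video_ids)))
    return [get_random_frame(v) for v in chosen]

# Toy usage: string placeholders stand in for decoded 200x200 frames.
videos = [f"vid{i}" for i in range(100)]
data = build_dataset(videos, lambda v: f"frame-of-{v}", target=50)
```

Sampling videos without replacement (rather than sampling frames directly) is what guarantees the no-duplicates property.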


2.1 Architecture
The algorithm is built upon these ideas and can be viewed as a sparse deep autoencoder with three important ingredients: local receptive fields, pooling, and local contrast normalization. Our deep autoencoder is constructed by replicating three times the same stage, composed of local filtering, local pooling, and local contrast normalization. The output of one stage is the input to the next one, and the overall model can be interpreted as a nine-layered network (see Figure 1). The overall network replicates this structure three times. For simplicity, the images are in 1D.

2.2 Learning and optimization
Learning: during learning, the parameters of the second sublayers (H) are fixed to uniform weights, whereas the encoding weights W1 and decoding weights W2 of the first sublayers are adjusted using the optimization problem. Optimization: all parameters in the model were trained jointly, with the objective being the sum of the objectives of the three layers.

4 EXPERIMENTS ON FACES
In this section, we describe our analysis of the learned representations in recognizing faces.

4.1 Test set and protocols
The test set consists of 37,000 images sampled from two datasets: Labeled Faces In the Wild and ImageNet. After training, we used this test set to measure the performance of each neuron in classifying faces against distractors.

4.2 Visualization
In this section, we present two visualization techniques to verify whether the optimal stimulus of the neuron is indeed a face. The first method is visualizing the most responsive stimuli in the test set. Since the test set is large, this method can reliably detect near-optimal stimuli of the tested neuron. The second approach is to perform numerical optimization to find the optimal stimulus. Figure 3 confirms that the tested neuron indeed learns the concept of faces.

Figure 3. Top: the top 48 stimuli of the best neuron from the test set. Bottom: the optimal stimulus according to numerical constraint optimization.

4.3 Recognition
Surprisingly, the best neuron in the network performs very well in recognizing faces: it achieves 81.7% accuracy in detecting faces against the distractors.
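The two visualization techniques can be sketched with a toy linear unit standing in for the trained neuron. The weights `w`, the fake test set, and all sizes below are illustrative, not the paper's network; for the real deep network, the "gradient" step would come from backpropagation rather than a closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64)              # stand-in weights for "the best neuron"
neuron = lambda x: float(w @ x)      # toy linear unit, not the real deep net

# Method 1: most responsive stimuli -- rank a "test set" by activation.
test_set = rng.normal(size=(5000, 64))
order = np.argsort(test_set @ w)
top48 = test_set[order[-48:]]        # the 48 most responsive stimuli

# Method 2: numerical optimization -- gradient ascent on the input,
# projected back onto the unit sphere (a norm-constrained stimulus).
x = rng.normal(size=64)
for _ in range(100):
    x += 0.1 * w                     # gradient of w @ x with respect to x
    x /= np.linalg.norm(x)           # enforce the norm constraint
opt = x                              # converges to w / ||w|| for this toy unit
```

For a linear unit the constrained optimum is just the normalized weight vector, which is why the projected ascent above settles there; for the deep network the same loop yields the "optimal stimulus" image shown in the figure.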

6 CONCLUSION
In this work, we simulated high-level, class-specific neurons using unlabeled data. We achieved this by combining ideas from recently developed algorithms that learn invariances from unlabeled data.

7 References
[1] http://research.google.com/pubs/pub38115.html
[2] http://arxiv.org/abs/1112.6209
[3] http://arxiv.org/pdf/1112.6209