I'm going to talk about some of the work we do in my group. We focus on a number of different areas, so I'll start with our research philosophy and the things we do. I come from a background in epitaxial synthesis of complex oxides and designing ferroelectric domain structures; we're not going to talk about that much, so if you don't like ferroelectrics, that's fine. I do a lot of multidimensional spectroscopy of all different sorts, and I primarily work on scanning probe microscopy, so we'll talk a little bit about that, but I've also ventured into other areas of advanced spectroscopy. To deal with the size of the data we have, I really need machine learning: we have a lot of valuable data, and it's much harder to extract information from it, so I try to figure out ways to do that. And to do that practically, we really need to think about our computing infrastructure, so a lot of what my group has been focusing on is how we actually make these machine learning models and tools usable and practical for our applications.

So I wanted to start with the practical requirements we have for machine learning. First: how do we go about simply saving, searching, preserving, and sharing data? This gets into the concept of the data deluge, where you have so much data that the analysis, as I think everyone can attest, takes far longer than the acquisition, on the order of weeks to months. Another problem is that science is distributed all over the world. Everyone has their own instruments, and the data that is collected is rarely collated so that anyone else in the world can use it. This is a major problem: think about where you save your data. It's in folders and file systems, and that's not good long-term. Another
key problem is that universities, and even national labs, don't really have the infrastructure required to manage data. There's no good networking for actually moving large volumes of data, and the people responsible for managing data mostly don't have experience designing parallel file systems, or systems with the reliability, availability, and resiliency that managing data demands. The universities don't provide it, and if they do, it's at a really high cost and usually not backed up; you'll find that out later. The other problem is that the computation is slow, and I think the main reason is not actually computational speed but moving the data. If you have a data set that's one terabyte, then at standard university internet speeds of about a gigabit per second it's roughly 2.5 hours just to move it, and you can't even write to most disk systems at a gigabit per second, so it's
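As a back-of-the-envelope check (assuming an ideal, sustained 1 Gbit/s link; real throughput is lower, which is how you end up closer to 2.5 hours):

```python
# Ideal transfer time for a large data set over a network link.
# Assumes 1 TB = 10**12 bytes and no protocol or disk overhead.
def transfer_time_hours(size_bytes: float, link_bits_per_s: float) -> float:
    """Hours to move `size_bytes` over a link of `link_bits_per_s`."""
    return size_bytes * 8 / link_bits_per_s / 3600

hours = transfer_time_hours(1e12, 1e9)  # 1 TB over 1 Gbit/s
print(f"{hours:.2f} hours")  # -> 2.22 hours, before any overhead
```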
actually slower than that. Then there's also the problem that if you're running scientific instruments or experiments and you want to collect and analyze your data in real time, the computation needs to be highly available, and most computing systems run Slurm-style batch schedulers, which just doesn't work for experimental workflows. Another problem that I think is really important: machine learning is great, but how do we ensure that it's parsimonious? How do we ensure that it reflects the physics? I like a simple example, which is a bit of an exaggeration: you start with the concept of a circle and you want to learn.
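The recording cuts off here, but the parsimony point can be made concrete: a circle is fully described by three parameters (center and radius), so a physics-aware model only needs to learn those. As an illustrative sketch of my own (not from the talk), here is a least-squares circle fit using the Kåsa algebraic method with NumPy:

```python
import numpy as np

# Parsimonious model: points on a circle satisfy
#   x^2 + y^2 + D*x + E*y + F = 0,
# so learning the circle means solving for just D, E, F (Kasa method),
# rather than training a generic black-box model on the same points.
def fit_circle(x: np.ndarray, y: np.ndarray):
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x**2 + y**2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2, -E / 2            # center
    r = np.sqrt(cx**2 + cy**2 - F)     # radius
    return cx, cy, r

# Noisy samples from a circle centered at (1, -2) with radius 3.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
x = 1 + 3 * np.cos(t) + rng.normal(0, 0.02, t.size)
y = -2 + 3 * np.sin(t) + rng.normal(0, 0.02, t.size)

cx, cy, r = fit_circle(x, y)
print(round(cx, 1), round(cy, 1), round(r, 1))
```

Three interpretable numbers recover the generating circle; that is the kind of physics-reflecting parsimony being argued for.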