Data Processing with Ruby

Brian Chapados http://chapados.org

SDRuby April 3, 2008

Understanding Proteins
sequence: 1-D linear chain
>Archaeoglobus PCNA
MIDVIMTGELLKTVTRAIVALVSEARIHFLEKGLHSRAVDPANVAMVIVDIPK
DSFEVYNIDEEKTIGVDMDRIFDISKSISTKDLVELIVEDESTLKVKFGSVEYK
VALIDPSAIRKEPRIPELELPAKIVMDAGEFKKAIAAADKISDQVIFRSDKEGF
RIEAKGDVDSIVFHMTETELIEFNGGEARSMFSVDYLKEFCKVAGSGDLLTI
HLGTNYPVRLVFELVGGRAKVEYILAPRIESE

structure: 3-D after folding
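To make the "1-D linear chain" concrete: in a program, a sequence like the PCNA record shown above is just a string of one-letter residue codes. The sketch below assumes a single FASTA record held in a string and modern Ruby (2.7+, for `Enumerable#tally`); `parse_fasta` and `composition` are hypothetical helper names, not part of any library.

```ruby
# Minimal sketch: parse one FASTA record held in a string.
# Assumption: a single ">" header line followed by sequence lines.
FASTA = <<~TEXT
  >Archaeoglobus PCNA
  MIDVIMTGELLKTVTRAIVALVSEARIHFLEKGLHSRAVDPANVAMVIVDIPK
  DSFEVYNIDEEKTIGVDMDRIFDISKSISTKDLVELIVEDESTLKVKFGSVEYK
  VALIDPSAIRKEPRIPELELPAKIVMDAGEFKKAIAAADKISDQVIFRSDKEGF
  RIEAKGDVDSIVFHMTETELIEFNGGEARSMFSVDYLKEFCKVAGSGDLLTI
  HLGTNYPVRLVFELVGGRAKVEYILAPRIESE
TEXT

# Returns [name, sequence]; the wrapped sequence lines are joined into one
# string, i.e. the linear chain of residues.
def parse_fasta(text)
  lines = text.lines.map(&:strip).reject(&:empty?)
  header = lines.shift
  raise ArgumentError, "missing '>' header" unless header&.start_with?(">")
  [header.delete_prefix(">"), lines.join]
end

# Residue counts, most frequent first (ties broken alphabetically).
def composition(sequence)
  sequence.each_char.tally.sort_by { |residue, count| [-count, residue] }
end

name, seq = parse_fasta(FASTA)
puts "#{name}: #{seq.length} residues"
composition(seq).first(5).each { |residue, count| puts "#{residue} #{count}" }
```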

Structures with several components are hard to determine

X-ray scattering

C. Trame, personal communication. Sousa et al. 2000. Cell 103: 633-643.

Raw Data
Distance distribution function of particle

R           P(R)        ERROR
0.0000E+00  0.0000E+00  0.0000E+00
0.5000E+00  0.3157E-02  0.0000E+00
0.1000E+01  0.6069E-02  0.0000E+00
0.1500E+01  0.8740E-02  0.0000E+00
0.2000E+01  0.1118E-01  0.0000E+00
0.2500E+01  0.1339E-01  0.0000E+00
0.3000E+01  0.1538E-01  0.0000E+00
0.3500E+01  0.1718E-01  0.0000E+00
0.4000E+01  0.1879E-01  0.0000E+00
0.4500E+01  0.2023E-01  0.0000E+00
0.5000E+01  0.2153E-01  0.0000E+00
0.5500E+01  0.2269E-01  0.0000E+00
0.6000E+01  0.2374E-01  0.0000E+00
0.6500E+01  0.2471E-01  0.0000E+00
0.7000E+01  0.2560E-01  0.0000E+00
0.7500E+01  0.2645E-01  0.0000E+00
0.8000E+01  0.2727E-01  0.0000E+00
0.8500E+01  0.2809E-01  0.0000E+00
0.9000E+01  0.2891E-01  0.0000E+00
0.9500E+01  0.2976E-01  0.0000E+00
0.1000E+02  0.3065E-01  0.0000E+00
0.1050E+02  0.3160E-01  0.0000E+00
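Getting these columns into Ruby takes only a few lines. A minimal sketch, assuming the table begins after a header line containing `P(R)` and continues while each row holds exactly three numbers; `Point`, `parse_pr` and `trapezoid_area` are hypothetical names, and the trapezoid-rule area is only an example of downstream processing. `Float(..., exception: false)` needs Ruby 2.6+.

```ruby
# Sketch: read the R / P(R) / ERROR columns of a distance distribution table.
# Assumption: the table follows a header line containing "P(R)" and runs
# while each row holds exactly three numbers.
Point = Struct.new(:r, :pr, :error)

def numeric?(field)
  !Float(field, exception: false).nil?
end

def parse_pr(text)
  lines = text.lines.drop_while { |line| !line.include?("P(R)") }.drop(1) # rows after the header
  rows = lines.map(&:split).drop_while(&:empty?)                          # tolerate blank lines
  rows = rows.take_while { |f| f.size == 3 && f.all? { |x| numeric?(x) } }
  rows.map { |f| Point.new(*f.map { |x| Float(x) }) }
end

# Example of downstream processing: area under P(R) by the trapezoid rule.
def trapezoid_area(points)
  points.each_cons(2).sum { |a, b| (b.r - a.r) * (a.pr + b.pr) / 2.0 }
end

SAMPLE = <<~TEXT
  Distance distribution function of particle

  R           P(R)        ERROR
  0.0000E+00  0.0000E+00  0.0000E+00
  0.5000E+00  0.3157E-02  0.0000E+00
  0.1000E+01  0.6069E-02  0.0000E+00
  0.1500E+01  0.8740E-02  0.0000E+00
TEXT

points = parse_pr(SAMPLE)
puts "#{points.size} points, area under P(R) ~ #{trapezoid_area(points).round(6)}"
```

Fortran-style values such as `0.3157E-02` are accepted directly by `Kernel#Float`, so no custom number parsing is needed.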

Existing Software
Svergun group @ EMBL
http://www.embl-hamburg.de/ExternalInfo/Research/Sax/software.html

Works well, but...
  requires running each program multiple times
  “interactive” interfaces are not easily scriptable
  no, really... you have to see it to believe it
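One common workaround for prompt-driven programs is to answer the prompts from a script by writing the answers to the program's standard input. A sketch using Ruby's standard `Open3.capture2`; it assumes the program reads its answers line by line from stdin (a program that reads the terminal directly would need the `PTY` library or an expect-style tool instead). The prompting program here is a stand-in Ruby one-liner, and its prompts and parameter names are hypothetical.

```ruby
require "open3"
require "rbconfig"

# Sketch: answer an "interactive" program's prompts from a script.
# Assumption: the program reads its answers from standard input, one per line.
def run_with_answers(command, answers)
  input = answers.map { |a| "#{a}\n" }.join
  output, status = Open3.capture2(*command, stdin_data: input)
  raise "#{command.first} failed (#{status.exitstatus})" unless status.success?
  output
end

# Hypothetical stand-in for a real interactive program: two prompts, then a
# summary line echoing the answers.
FAKE_PROGRAM = [RbConfig.ruby, "-e", <<~'RUBY']
  print "Input file? "
  file = $stdin.gets.chomp
  print "Maximum distance? "
  rmax = $stdin.gets.chomp
  puts
  puts "file=#{file} rmax=#{rmax}"
RUBY

# Run the same program several times with different answers, no typing needed.
%w[10 12 14].each do |rmax|
  puts run_with_answers(FAKE_PROGRAM, ["sample.dat", rmax]).lines.last
end
# prints file=sample.dat rmax=10, then rmax=12, then rmax=14
```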

Help from Ruby
We want to use Linux clusters with hundreds of CPUs
Ruby
  wrap external programs
  write shell scripts to run external programs
Rake
  define relationships between the inputs and outputs of different programs
  launch external programs once their dependencies are satisfied
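As an illustration of the Rake idea: file tasks declare which output is built from which input, and Rake runs an action only when its output is missing or older than its prerequisites. A sketch using the Rake DSL from a plain Ruby script; the `Pipeline` module and the file names `raw.dat`, `pr.out` and `model.pdb` are hypothetical, and each action writes a placeholder file where a real pipeline would call an external program with `sh`.

```ruby
require "rake"
require "tmpdir"

# Hypothetical two-stage pipeline: raw.dat -> pr.out -> model.pdb.
# Each action writes a placeholder file; a real pipeline would run an
# external program here, e.g. with `sh "program args"`.
module Pipeline
  extend Rake::DSL # gives this module `file`, `task`, `sh`, ...

  def self.define(dir)
    raw   = File.join(dir, "raw.dat")
    pr    = File.join(dir, "pr.out")
    model = File.join(dir, "model.pdb")

    # pr.out is rebuilt only if it is missing or older than raw.dat.
    file pr => raw do |t|
      File.write(t.name, "P(R) from #{File.basename(t.prerequisites.first)}\n")
    end

    # model.pdb depends on pr.out, so Rake runs the P(R) step first.
    file model => pr do |t|
      File.write(t.name, "model from #{File.basename(t.prerequisites.first)}\n")
    end

    { raw: raw, pr: pr, model: model }
  end
end

Dir.mktmpdir do |dir|
  paths = Pipeline.define(dir)
  File.write(paths[:raw], "0.0 0.0\n") # the input must exist before building
  Rake::Task[paths[:model]].invoke     # builds pr.out, then model.pdb
  puts File.read(paths[:model])        # prints "model from pr.out"
end
```

Because the dependency graph is data, Rake (for example via its `multitask`) can also run independent branches concurrently, which is what makes it attractive for spreading runs over many CPUs.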

Do more with Ruby
quick and dirty...
  define input parameters in a script
  define common tasks in a library
more robust...
  a Ruby API for running commands
  more sophisticated information processing
  evolve towards a micro-framework
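A rough sketch of where the "more robust" direction might start: the parameters for one run live in a script as a Hash, and a small library turns them into a command line, runs it, captures the output and fails loudly on a non-zero exit. The `Runner` module, the `--key=value` flag convention and the stand-in program are assumptions for illustration; real programs each have their own argument conventions.

```ruby
require "open3"
require "rbconfig"

# Hypothetical library: a tiny API for running external commands.
module Runner
  class CommandError < StandardError; end

  # Captured result of one run.
  Result = Struct.new(:argv, :stdout, :stderr, :status) do
    def success?
      status.success?
    end
  end

  # Assumed flag convention: each parameter becomes --key=value.
  def self.argv_for(program, params)
    Array(program) + params.map { |key, value| "--#{key}=#{value}" }
  end

  # Run the command, capture stdout/stderr, and raise on a non-zero exit.
  def self.run(program, params = {})
    argv = argv_for(program, params)
    stdout, stderr, status = Open3.capture3(*argv)
    unless status.success?
      raise CommandError, "#{argv.first} exited #{status.exitstatus}: #{stderr}"
    end
    Result.new(argv, stdout, stderr, status)
  end
end

# Hypothetical script: the input parameters for one run.
# A Ruby one-liner stands in for the external program and echoes its flags;
# "--" stops Ruby from treating those flags as its own options.
PROGRAM = [RbConfig.ruby, "-e", "puts ARGV.join(' ')", "--"]
PARAMS  = { input: "sample.dat", rmax: 12 }

result = Runner.run(PROGRAM, PARAMS)
puts result.stdout # prints "--input=sample.dat --rmax=12"
```

Keeping the command construction in one place is what lets a script like this grow into the micro-framework the slide describes: the same `Result` objects can be logged, checked, or fed to the next step.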

Acknowledgements
Lab (Scripps Research Institute): John Tainer, Scott Williams, Chris Putnam
Data Collection: Beamline 12.3.1, The Advanced Light Source (ALS, LBNL)
Funding: NIH, DOE, NCI