
MASCARA: data handling, processing and calibration

R. Stuik^a,*, A.-L. Lesage^a, A. Jakobs^a, J.F.P. Spronck^a and I.A.G. Snellen^a

^a Leiden University, Leiden Observatory, P.O. Box 9513, 2300 RA, Leiden, The Netherlands

ABSTRACT

MASCARA, the Multi-site All-Sky CAmeRA, consists of several fully-automated stations distributed across the globe.
Its goal is to find exoplanets transiting the brightest stars, in the V = 4 to 8 magnitude range, currently probed neither by
space- nor by ground-based surveys. The nearby transiting planet systems that MASCARA is expected to discover will
be key targets for future detailed planet atmosphere observations. Each station contains five wide-angle cameras
monitoring the near-entire sky at each location. Once fully deployed, MASCARA will provide a nearly continuous
coverage of the dark sky, down to magnitude 8, at sub-minute cadence.

Effectively taking an image of the full sky every 6.4 seconds, MASCARA will produce approximately 500 GB of raw
data per night, per station. This data needs to be processed in order to produce calibrated light curves, for up to ~40,000
stars down to magnitude 8 and with a signal-to-noise-ratio of better than 100. The aim of the data reduction pipeline is to
process the data locally and in real time, both to immediately have quality control, as well as to prevent a data back-log.
Although the cameras are fixed and the stars are therefore drifting over the CCDs, MASCARA is a targeted mission.
Data processing consists of three main steps:
1. Compute a complete astrometric solution to sub-pixel level for each exposure and extract postage stamps for
each of the stars in the field of view.
2. Perform accurate photometry on each of the postage stamps, including background subtraction and
identification of errors in the photometry due to bad pixels, satellites, airplanes or Laser Guide Stars.
3. Remove fluctuations on time scales typical for transits, i.e., several hours, caused by for example the camera
and atmospheric transmission, color variations in stars and pixel-to-pixel gain fluctuations. Photometry on short
time scales already shows noise levels close to the photon noise limit, and using a combination of calibration
and relative photometry the red-noise component can be reduced to close to this photon noise limit, allowing for
semi-automated identification of exo-planet transits.
This paper discusses the data handling, processing and calibration and shows the first results of the pipeline.
Keywords: Transit Survey, All-Sky, Camera, Exo-planet, High-cadence, Multi-site.

1. INTRODUCTION
The last decade was marked by a rain of exoplanet discoveries, which launched the field of exo-planetology: the study of
the properties of alien worlds. While the radial velocity method determines the main orbital elements and m sin(i) of a
planet, transits are the only way to determine the planetary radius. Transiting planets are also the favorite targets for the
determination of planet atmospheric properties. Currently, a large majority of the observations dedicated to atmospheric
characterization are done on the two transiting hot-Jupiter systems, HD 209458 [1] and HD 189733 [2], which both have
brightnesses of mV ~7.7.
Most of those observations are performed from space using the Spitzer or Hubble space telescopes. Nonetheless, the
detection of molecular features in the exoplanet's atmosphere can also be done using high resolution spectra of the
system [3,4]. Since atmospheric signals typically require measurements with accuracies on the order of 10^-3 to 10^-4, these
require observation at very high signal-to-noise just in order to disentangle the planetary from the stellar signal. Hence,
with the current instrumentation, atmospheric characterization can be done from the ground only on the brightest systems
available.

* stuik@strw.leidenuniv.nl

Software and Cyberinfrastructure for Astronomy III, edited by Gianluca Chiozzi, Nicole M. Radziwill,
Proc. of SPIE Vol. 9152, 91520N · © 2014 SPIE · CCC code: 0277-786X/14/$18
doi: 10.1117/12.2055846

Proc. of SPIE Vol. 9152 91520N-1

Downloaded From: http://proceedings.spiedigitallibrary.org/ on 08/12/2017 Terms of Use: http://spiedigitallibrary.org/ss/termsofuse.aspx

The current transit surveys are targeting a stellar population of magnitude mV = 7 - 14. The two space missions CoRoT [5]
and Kepler [6] have found thousands of candidates at magnitudes fainter than mV = 11, rendering most radial velocity
follow-up very challenging. However, they did find an amazing menagerie of planets spanning from hot Jupiters to the
first Earth-sized planets and even an evaporating Mercury [7]. Together with the successful ground-based surveys such as
SWASP and HATNet, these surveys proved that planets are very common and can be found everywhere in the solar
neighborhood. Indeed, radial velocity surveys found that around 25% of nearby solar-type stars host a Neptune-sized planet,
half of them orbiting their host star within 0.1 AU, and each with a probability of 1 - 10% to transit. Detecting the brightest of
these transiting exoplanet systems requires a new survey targeting the entire sky at mV < 8.

2. SCIENCE REQUIREMENTS
To achieve the scientific goals as outlined above, we require MASCARA to do the following:
#     Description                         Requirement          Goal
RQ1   Number of MASCARA stations          ≥ 1 per hemisphere   ≥ 6
RQ2   Air mass coverage per station       1 to ≥ 2             1 to ≥ 3
RQ3   Unsaturated dynamic range           V = 4 to 8           V = 3 to 9
RQ4   Minimal signal-to-noise per hour    100 @ V = 8          100 @ V = 9
In order to reach these requirements, the following MASCARA concept has been developed:
All-sky coverage with 5 wide-field cameras. Each MASCARA station uses a total of 5 modified Atik
11000M cameras, each fitted with a Canon 24mm f/1.4 lens, used at its largest aperture.
Nearly complete to Air Mass = 2, >50% to AM=3. As can be seen in Figure 1, the 5 wide-field lenses (53°x74°) cover
nearly the complete sky to air mass 2 and more than half of the sky between air mass 2 and 3. At any given moment,
each MASCARA station can monitor about one third of all the stars in the sky.

[Figure 1 plot: "Sky coverage using 5 cameras"; horizontal axis: azimuth in degrees.]

Figure 1. Local sky coverage using the Atik detector and the Canon 24 mm f/1.4 lens for one station. The fields of
view of the cameras overlap by nearly 18%. Four cameras point towards North, East, South and West, respectively,
with an inclination of 41°. The coverage of the central camera pointing towards zenith is shaded in grey. The
sky is nearly entirely covered up to air mass 2, and partly from air mass 2 to air mass 3.



Preferably ≥6 stations worldwide. Three stations on each hemisphere will allow for continuous monitoring of
nearly all stars. Although full coverage would be desired, in practice only ~60-80% is achievable with a
limited number of stations, while the area within an angular separation of less than 7.5 degrees from the sun will
never be observable due to high sky background.
High cadence and short exposures to catch brightest stars. MASCARA will take one exposure every 6.4 seconds,
and the interline CCDs allow for fully continuous exposures. This will freeze the motion of the stars
(maximum drift of ~0.5 pixels per exposure) and will both capture the fastest events and prevent saturation
for the brightest stars. Since MASCARA is mainly limited by scintillation noise at the bright end and sky background
noise at the faint end, summed images will only be dominated by read noise when a large number of images is added up.
Fixed sidereal times to ease calibration. The station will take exposures at fixed sidereal times. This means that every
night a star will be in exactly the same location on the chip, which will allow for removal of local calibration errors at
no additional cost in hardware or software.
High reliability and minimal maintenance. The station is expected to operate continuously for at least 5 years with
minimal maintenance. The design therefore uses a minimal number of moving components.
Low sensitivity to stray light. MASCARA should be operating all nights, including nights with a (full) moon in the sky.
Because multiple cameras are used, the moon will generally only impact observations in a single camera. With the
anti-blooming protection, it was found that only the area within a degree from the moon was not usable, while the column
containing the moon is only sometimes affected by bleeding due to the moon.
Low cost. MASCARA is a small system and can be built for around 100 kEUR per site in hardware. Nearly all
components are off-the-shelf and easily replaceable in case of failure.
A full description of MASCARA can be found in Lesage et al. [8], with a technical description of the station in Spronck et
al. [9].
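The fixed-sidereal-time exposure scheme described above can be illustrated with a minimal sketch. It assumes, as an illustration only, that the 6.4 s cadence is counted in sidereal seconds, so that an integer number of exposure slots fits in one sidereal day; the function names `sidereal_slot` and `slot_start` are hypothetical and not part of the MASCARA software.

```python
# Sketch: map a local sidereal time (LST) onto a fixed exposure slot, so that
# the same slot index corresponds to the same star position every night.
# Assumption: the 6.4 s cadence is in sidereal seconds (86400 per sidereal day).

SIDEREAL_DAY_S = 86400.0   # sidereal seconds in one sidereal day
EXPOSURE_S = 6.4           # exposure cadence quoted in the text
SLOTS_PER_DAY = round(SIDEREAL_DAY_S / EXPOSURE_S)


def sidereal_slot(lst_hours: float) -> int:
    """Return the index of the exposure slot that contains the given LST."""
    lst_seconds = (lst_hours % 24.0) * 3600.0
    return int(lst_seconds // EXPOSURE_S)


def slot_start(slot: int) -> float:
    """Return the LST (in hours) at which a given slot starts."""
    return (slot % SLOTS_PER_DAY) * EXPOSURE_S / 3600.0
```

Under this assumption a sidereal day contains 13500 slots, and data taken in the same slot on different nights can be compared pixel for pixel.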
2.1 Software Requirements
The software requirements are not directly derived from the science requirements, but given by the amount of data and
the way the data is processed. The basic assumptions for designing the software are:
• MASCARA consists of a number of stations worldwide and one single, central archive.
• MASCARA is a high-cadence system with potentially limited bandwidth to the outside world. This means that the
data needs to be processed locally, at the MASCARA site and for sites with a limited bandwidth, only reduced data
can be sent to the central archive.
• The goal of the MASCARA data processing and pipeline is to provide a consistent and fully automated way of
transforming the raw images to calibrated light curves.
• The data processing is optimized to a) maximize data retention, b) maximize flexibility in reprocessing data and c)
allow for reprocessing of all data on short term.
• The final data product for MASCARA for the end user is a set of light curves, one for each individual object.
• Reprocessing of the intermediate data should remain possible for a limited amount of time.
These requirements form the basis of the design of the MASCARA data processing and pipeline.

3. SHORT DESCRIPTION OF THE MASCARA DATA PROCESSING


Each camera will be attached to its own personal computer (PC), which will control the camera, read the data from the
camera, process the data from raw images via a set of intermediate data products to a calibrated light curve for each camera,
temporarily store the data and arrange for the transfer of pre-processed data to a central archive, see Figure 2.



[Figure 2 diagram. La Palma: dome/weather/temperature sensors, CCD cameras 1 to 5 with their control PCs, and local
storage, in blocks labelled "Inside MASCARA" and "Inside SWASP". Leiden: remote storage and processing.]

Figure 2. Architecture of the MASCARA data acquisition, processing and storage.

A typical night will start with cooling down the camera to a set point, taking a set of bias images, dark images and flat
images (the enclosure is constructed in such a way that flats should be relatively homogeneous) and processing these to a
single bias, dark and flat image to be used for the night and stored on disk. While the system is starting up, the current
input catalog is constructed for that night and all data which is stored for a limited time on the PC (like the oldest raw
and pre-processed data) is deleted from the local disk. Once the bias/flats/darks have been taken and weather permits, the
enclosure is opened. The rest of the night, the camera continuously takes images until it either becomes too light or too
cloudy to proceed.
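As an illustration of how a set of calibration frames can be reduced to a single frame and applied, the following is a minimal sketch. It assumes a median combination and a median-normalised flat, which the text does not specify; `master_frame` and `calibrate` are hypothetical names, not the pipeline's own routines.

```python
import numpy as np


def master_frame(frames):
    """Median-combine a stack of calibration frames of shape (n, ny, nx).

    The median is an assumption made here; it suppresses isolated outliers
    such as cosmic-ray hits in individual frames.
    """
    return np.median(np.asarray(frames, dtype=float), axis=0)


def calibrate(raw, master_dark, master_flat=None):
    """Subtract the master dark and optionally divide by a normalised flat."""
    image = np.asarray(raw, dtype=float) - master_dark
    if master_flat is not None:
        # Normalise the flat to a median of 1 so the flux scale is preserved.
        image = image / (master_flat / np.median(master_flat))
    return image
```

Here the dark frames are assumed to have the same exposure time as the science frames, so that they already contain the bias level.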
During data taking the following steps are taken:
Collect 50 images. A set of 50 images is taken at fixed sidereal times. This constitutes a dataset of 5m20s. Each of the
images is dark-corrected and potentially corrected for bias and flat fielded.
Collect auxiliary data. Meta-data for each exposure is collected. This includes the current status of the camera and
dome, like temperature, humidity, but also external weather information, location of the sun and moon, and any other
information that might be used to correct for slow variations in the calibration of the system.
Verify/update astrometric solution. When installing the station, an astrometric solution is made for each camera. Every
5 minutes the astrometric solution is verified by computing the offset between the actual source positions and the
predicted positions. If necessary, the astrometric solution is updated, see also Section 4.
Construct an integrated image. Based on the astrometric solution, an integrated image can be reconstructed. This
image maintains the noise properties of the individual images, but at significantly lower cadence and data volume.
Aperture photometry on all tracked stars. The number of stars each camera tracks depends on the brightness limit and
area on the sky, but typically ranges from ~5000 stars down to magnitude 8 in the sparsely populated regions, i.e., near
the galactic poles, up to ~20000 stars down to magnitude 9 in the more densely populated areas near the galactic equator.
For more on the photometry, see Section 5.
Detect and flag outliers. During both the astrometric step and the photometric step, many verification steps are
made to ensure that the final photometry of each star is as accurate as possible. Each observation is accompanied by a
flag that indicates, among other things, if the star shows unexpected deviations in position, size, background, slope in
background and brightness, if there is a close neighbor, and if there are suspected hot pixels, either in the aperture or
in the background. Although the value of each of the above parameters is also saved, flagging the exposures with a
single flag allows for quick selection of ‘valid’ data points.



Trigger warnings on specific events. Potentially it is possible to trigger on specific events. This is not yet being
implemented as the main science goal does not require this; transits will need to be confirmed by multiple events
before any ‘trigger’ is given.
Store light curvelets. For each star, 50 brightness values, together with auxiliary data and flags, are saved. Considering
the data volume and number of stars, the most efficient way to process the data was found to be saving the individual
snippets of data and combining these snippets at a later stage into complete light curves.
Transfer raw and intermediate data. All data, i.e., the raw image frames as well as the summed images and light
curvelets, are initially only stored locally on a fast disk to allow the concatenation of the light curves. The raw image
files are also copied immediately to an internal secondary disk for safe keeping. For the first station on La Palma, we
have demonstrated that we can transfer the full raw dataset (500 GB per station per night) to the central storage
facility in Leiden, but other locations will not be able to provide similar data rates. Therefore a large local
storage facility is also included that provides both a buffer in case of prolonged network outage and the ability to
reprocess the data locally if needed. At locations other than La Palma, exchanging and shipping this local data
storage back to Leiden would provide a way to still maintain a good average ‘data rate.’
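The single quality flag described under "Detect and flag outliers" can be sketched as a bit mask in which each verification step owns one bit. The flag names and bit assignments below are hypothetical and chosen only to mirror the checks listed in the text; they are not the values used by the MASCARA pipeline.

```python
from enum import IntFlag


class QualityFlag(IntFlag):
    """Hypothetical bit assignments, one bit per verification step."""
    OK = 0
    POSITION = 1 << 0              # unexpected deviation in position
    SIZE = 1 << 1                  # unexpected deviation in size
    BACKGROUND = 1 << 2            # unexpected background level
    BACKGROUND_SLOPE = 1 << 3      # unexpected slope in the background
    BRIGHTNESS = 1 << 4            # unexpected deviation in brightness
    CLOSE_NEIGHBOR = 1 << 5        # close neighbor present
    HOT_PIXEL_APERTURE = 1 << 6    # suspected hot pixel in the aperture
    HOT_PIXEL_BACKGROUND = 1 << 7  # suspected hot pixel in the background


def is_valid(flag: int) -> bool:
    """A data point is 'valid' only if no bit is set."""
    return flag == QualityFlag.OK


def select_valid(fluxes, flags):
    """Keep only the flux values whose flag marks them as valid."""
    return [f for f, q in zip(fluxes, flags) if is_valid(q)]
```

A single integer comparison then selects the valid points, while the individual bits still record which check failed.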
At the end of the night several steps are taken to finalize the data set for the night:
Determine variations in bias and flat field. An additional set of bias frames, dark frames and flat fields is taken. If
there are deviations with respect to the beginning of the night, this will be additionally flagged in the light curves of
the individual stars; it may also indicate the need for, for example, cleaning of the windows.
Determine nightly transmission curves. Currently the most reliable way to determine the transmission of the system is
by tracking all stars as they move over the CCD. Considering that the average star is ‘stable,’ a median of the
transmission for all stars following the same track should give the average transmission of that point over the night.
We currently only have test data for photometric but moonlit nights; comparing the behavior over many nights
will indicate whether this process actually works reliably. Variations of the transmission from night to night could indicate
local contamination of the entrance window.
Concatenate individual light curvelets. For each star that was observable, up to ~100 light curvelets, each containing
50 data points, were generated during the night. These curvelets are concatenated to a single light curve for that
camera, for that night; nightly transmission data is incorporated and the data is packaged in a FITS file, including the
auxiliary data collected during the night.
Transfer reduced data set. The nightly light curves, together with the bias, dark and flat frames from the start and end
of the night, are transferred to the central storage facility, where the light curves are combined with the light curves
from other cameras (and stations) to form the most continuous light curve possible.
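The idea behind the nightly transmission curves, a median over all stars following the same track, can be sketched as follows. This is a simplified illustration under stated assumptions: each star is sampled at the same positions along the track, each light curve is normalised by its own median, and the function name `nightly_transmission` is hypothetical.

```python
from statistics import median


def nightly_transmission(light_curves):
    """Estimate the relative transmission along one track on the CCD.

    light_curves: list of equal-length flux lists, one per star, where index i
    refers to the same position along the track for every star.
    Returns the median normalised flux at each position, i.e. the transmission
    relative to its median value along the track.
    """
    normalised = []
    for flux in light_curves:
        level = median(flux)  # removes the star's own brightness
        normalised.append([f / level for f in flux])
    # The median over stars rejects individual variable stars and outliers.
    return [median(column) for column in zip(*normalised)]
```

Because the median is taken over many stars, a single variable or contaminated star does not bias the estimate, which is the premise that the average star is 'stable'.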
The typical data volumes for MASCARA, corrected for length of the night and average uptime are given in Table 1.



Item                                              Raw size   5 min. exposure   Light curves   Unit
Image/curve size                                  20         20                0.20           MB
Night length                                      9.8                                         hrs
Fraction of the time available for observations   0.78
Number of images/curves per night per camera      4200       84                30,000
Raw data volume per night per camera              86         1.7               5.0            GB
Data volume per night per station                 431        9                 29             GB
Data volume per month per station                 13         0.26              0.87           TB
Compressed data volume per night per station      225*       2.7*              15+            GB
Compressed data volume per month per station      6.7*       0.081*            0.44+          TB

Table 1. Typical MASCARA data volumes. Note that although the night varies between 8 and 12 hours, this
variation is fully compensated by the number of clear nights, leading to very small fluctuations in available night
time between summer and winter. The ‘5 min exposure’ column refers to an on-the-fly summation of images.
* Compression tests on raw images performed with the fpack package.
+ Reduced light curve storage size based on reduction of the number of saved parameters, as compression does not
yield a significant reduction in size. Compression using ‘rar’ yielded a further reduction in size to ~70%, but the
overhead of compressing a large number of files currently does not outweigh the reduction in storage requirements.
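The raw data volumes in Table 1 can be approximately reproduced from the night length, the usable fraction and the cadence. The sketch below is a back-of-the-envelope check, assuming 30 nights per month and decimal units (1 GB = 1000 MB); both are assumptions made here, not statements from the table.

```python
# Back-of-the-envelope estimate of the raw data volume per station.
NIGHT_LENGTH_H = 9.8        # from Table 1
USABLE_FRACTION = 0.78      # from Table 1
CADENCE_S = 6.4             # exposure cadence from the text
IMAGE_SIZE_MB = 20.0        # from Table 1
CAMERAS_PER_STATION = 5
NIGHTS_PER_MONTH = 30       # assumption

images_per_night = NIGHT_LENGTH_H * USABLE_FRACTION * 3600.0 / CADENCE_S
raw_gb_per_camera = images_per_night * IMAGE_SIZE_MB / 1000.0
raw_gb_per_station = raw_gb_per_camera * CAMERAS_PER_STATION
raw_tb_per_month = raw_gb_per_station * NIGHTS_PER_MONTH / 1000.0

print(f"images per night per camera: {images_per_night:.0f}")
print(f"raw GB per night per camera: {raw_gb_per_camera:.0f}")
print(f"raw GB per night per station: {raw_gb_per_station:.0f}")
print(f"raw TB per month per station: {raw_tb_per_month:.1f}")
```

This estimate gives about 4300 images and 86 GB per camera per night, about 430 GB per station per night and about 13 TB per month, within a few percent of the tabulated values.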

4. ASTROMETRY
Although a standard implementation for astrometry is given by astrometry.net, the solutions it delivered were not stable,
it only runs under a Linux environment, and it was not sufficiently fast. For MASCARA a dedicated two-step solution was
implemented. During the initial installation of the station, all the existing sources in the image are detected using a
5-sigma threshold above the background noise. The sources are triangulated and matched to a grid of catalogue sources
using an initial guess of the camera pixel scale based on the focal length of the lens and the pixel size. This step generates
a set of matched stars and an initial guess on the pointing and orientation of the camera, see Figure 3.
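Triangle matching relies on quantities that do not change under translation, rotation or scaling. The following is a minimal sketch of one common choice, the ratios of the sorted side lengths; it is not necessarily the invariant used in the MASCARA routine, and the brute-force `match_triangles` helper is a hypothetical illustration that would be too slow for thousands of sources.

```python
import math
from itertools import combinations


def triangle_invariant(p1, p2, p3):
    """Return (middle/longest, shortest/longest) side ratios of a triangle.

    The ratios are unchanged by translation, rotation and uniform scaling.
    The three points are assumed to be distinct.
    """
    sides = sorted(
        (math.dist(p1, p2), math.dist(p2, p3), math.dist(p3, p1)),
        reverse=True,
    )
    longest, middle, shortest = sides
    return (middle / longest, shortest / longest)


def match_triangles(image_points, catalogue_points, tol=0.01):
    """Brute-force comparison of all image and catalogue triangles."""
    matches = []
    for img in combinations(image_points, 3):
        inv_img = triangle_invariant(*img)
        for cat in combinations(catalogue_points, 3):
            inv_cat = triangle_invariant(*cat)
            if all(abs(a - b) < tol for a, b in zip(inv_img, inv_cat)):
                matches.append((img, cat))
    return matches
```

In practice the number of candidate triangles is limited, for example by using only the brightest sources, and the initial pixel-scale guess can be used to reject matches with an implausible scale.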

Using the matched stars, an iterative scheme is used to determine the pointing and distortions using a 10 parameter fit:
• Pointing of the optical axis of the lens (altitude0 and azimuth0)
• Orientation of the camera around its optical axis (rot)
• Lens optical axis on the CCD $(x_0, y_0)$
• 5th order polynomial fit of the radial distortions of the lens: $\sqrt{(x-x_0)^2+(y-y_0)^2} = \sum_{n=1}^{5} p_n \tan^n(\theta)$
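The radial part of this model can be written out as a short forward calculation. The sketch below evaluates the polynomial in tan(θ) and places a star on the CCD; the position angle `phi` around the optical axis (which would absorb the camera rotation) and the function names are assumptions introduced here for illustration.

```python
import math


def radial_distance(theta, coeffs):
    """Radial distance in pixels: r = sum over n of p_n * tan(theta)**n, n = 1..N.

    theta is the angle from the optical axis in radians; coeffs = [p_1, ..., p_N].
    """
    t = math.tan(theta)
    return sum(p * t**n for n, p in enumerate(coeffs, start=1))


def project(theta, phi, x0, y0, coeffs):
    """Place a star at angle theta from the axis and position angle phi on the CCD."""
    r = radial_distance(theta, coeffs)
    return x0 + r * math.cos(phi), y0 + r * math.sin(phi)
```

For an ideal distortion-free lens only the first coefficient is non-zero and equals the focal length expressed in pixels; the higher coefficients then describe the departure from that ideal projection.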
The resulting accuracy of the fit is determined by the accuracy of the determination of the position of the stars, the stars
used, asymmetry due to an imperfect lens and some higher-order polynomial terms in the radial direction. The higher
terms are calibrated by fitting low-order distortions to the actual positions of the stars. The residual errors after fitting are
of the order of 0.1 pix peak-to-valley, with 0.02 pixels noise in the stellar positions at the bright end due to scintillation
and the sky background.



[Figure 3 plot: "350 matching triangles after pointing solution"; both axes in detector pixels.]

Figure 3. Triangle matching during the astrometry routine.

During the on-site test campaign, with significant changes in temperature (5-7 °C/night) and with a rather unstable support of
the camera (wind impact), the astrometric solution remained stable at the sub-pixel level and solutions of the 10
parameters showed mainly a drift between similar parameters. For example, since the camera was pointed near zenith, an
offset in the rotation is nearly the same as another azimuth angle, while the pointing of the camera and the lens optical
axis on the CCD are interchangeable in a similar way, the exact solution depending on which stars are available in the
field of view.

5. PHOTOMETRY
The aperture photometry is done using a translation of the DAOphot aperture photometry routine. The photometry is
done for each catalogue star, based on the astrometric solution, i.e., no further centroiding and re-centering is applied.
Optimization of the aperture is still ongoing; the PSF varies significantly over the field of view, with large aberrations in
the corners of the field of view and an intentional defocus to 3 pixels in the center to decrease the intra-pixel sensitivity
variations. Selecting too large an aperture will lead to a large background contribution and confusion, while too small an
aperture will lead to large variations due to small errors in the offset and the large variation in PSF size between center
and edge of the field of view. The optimal aperture size is between 4 and 8 pixels.
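Aperture photometry at a fixed, catalogue-predicted position can be sketched as follows. This is a generic illustration and not the DAOphot translation used in the pipeline; the aperture and sky annulus radii are hypothetical values, and interpreting the aperture size as a radius is an assumption made here.

```python
import numpy as np


def aperture_photometry(image, x, y, r_ap=6.0, r_in=10.0, r_out=15.0):
    """Sum the flux in a circular aperture and subtract the local sky.

    (x, y) is the predicted star position in pixels; no re-centering is done.
    The sky level is the median in an annulus between r_in and r_out.
    Returns (sky-subtracted flux, sky level per pixel).
    """
    yy, xx = np.indices(image.shape)
    dist = np.hypot(xx - x, yy - y)
    aperture = dist <= r_ap
    annulus = (dist >= r_in) & (dist <= r_out)
    sky = np.median(image[annulus])
    flux = image[aperture].sum() - sky * aperture.sum()
    return flux, sky
```

Whole pixels are either inside or outside the aperture in this sketch; a more careful implementation would weight partially covered pixels.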
Even with an optimal aperture size, there is a strong variation in the system response. The photon flux varies by about
a factor of 2 between the center and the corners of the field of view, mainly due to the lens transmission and
vignetting, but also due to variations in atmospheric transmission with air mass and in the fraction of the energy
captured in the aperture, which varies with the field-dependent PSF. Although the short-term RMS variations in the flux
are at the level of 0.15-0.25% per hour at mV = 8, as extrapolated from 5-minute sets of exposures, the challenge lies in
removing long-term fluctuations due to variations in the calibration. Currently a blind correction, based on overlapping
light curves of many stars, over the full night, yields a correction to the level of 1-2% RMS, see Figure 4.
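To relate the quoted RMS values to requirement RQ4, the following sketch converts a fractional RMS into a signal-to-noise ratio and shows one way a 5-minute RMS could be extrapolated to one hour. The square-root scaling assumes uncorrelated (white) noise, which is an assumption made here and exactly what the red-noise component violates; the function names are hypothetical.

```python
import math


def rms_per_hour(rms_per_set, set_length_s=320.0):
    """Scale the RMS of one 320 s set to one hour, assuming white noise."""
    sets_per_hour = 3600.0 / set_length_s
    return rms_per_set / math.sqrt(sets_per_hour)


def snr(rms_fraction):
    """Signal-to-noise ratio corresponding to a fractional RMS."""
    return 1.0 / rms_fraction


# Fractional RMS per hour quoted in the text at mV = 8.
for rms in (0.0015, 0.0025):
    print(f"RMS {rms:.2%} per hour -> S/N of about {snr(rms):.0f}")
```

Under this reading, an hourly RMS of 0.15-0.25% corresponds to a signal-to-noise ratio of roughly 400 to 670, well above the requirement of 100, provided the long-term systematics can be removed.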



[Figure 4 plot: "MASCARA observation of HD 189733"; horizontal axis: JD + 2.4564934e6; legend: transit model, data binned
by 5 min, data binned by 1 min.]

Figure 4. The corrected light curve for HD 189733b, a known transiting exoplanet with a transit depth of 2.7%.

6. CONCLUSION AND FUTURE


The first MASCARA station is currently being integrated and tested at Leiden Observatory and will soon be tested on a
test field in the Netherlands. Once all tests have been performed to satisfaction, the station will be installed on La Palma
next to the SuperWASP station and should be operational by the end of the summer of 2014. A second station is
currently being developed, to be located in the southern hemisphere.

The data pipeline has been developed and is currently able to automatically extract light curves for all stars in the field
of view of the cameras. Calibration to the level of several percent can be done automatically, but removal of the last
systematic variations in the system response to the photon noise limit is under development. The test case of HD 189733
has demonstrated that the correct light curve can be extracted, but reliable automation of this process is still in progress.
Calibration based on sidereal sampling of the light curve should significantly reduce the systematics, as long as
exoplanets are not transiting every 24 hours.
Further challenges that are under consideration are the way to store the data obtained by MASCARA; currently the data
storage of all MASCARA data is more expensive than the hardware for the station and methods are being developed to
only store the full data set for a limited time, with only a representative sub-set of the data being stored indefinitely. The
best option currently seems to be storing the extracted light curves together with 5-minute integrated images.
Future expansion of the pipeline to include time-domain astronomy and transient events will be investigated once the
first station is operational and the main transit pipeline is working. Transient events that might be used for triggering
are: meteors, solar system bodies, satellites, and supernovae. Furthermore, the station is able to provide local data on
atmospheric transmission and clouds and we are expecting other interesting events in the very short time domain that
MASCARA is sensitive to.

REFERENCES

[1] Cody, A. M. and Sasselov, D. D., “HD 209458: Physical Parameters of the Parent Star and the Transiting
Planet,” ApJ 569, 451-458 (Apr. 2002).



[2] Bouchy, F., Udry, S., Mayor, M., Moutou, C., Pont, F., Iribarne, N., da Silva, R., Ilovaisky, S., Queloz, D.,
Santos, N. C., Segransan, D., and Zucker, S., “ELODIE metallicity-biased search for transiting Hot Jupiters. II.
A very hot Jupiter transiting the bright K star HD 189733,” A&A 444, L15-L19 (Dec. 2005).
[3] Birkby, J. L., de Kok, R. J., Brogi, M., de Mooij, E. J. W., Schwarz, H., Albrecht, S., and Snellen, I. A. G.,
“Detection of water absorption in the day side atmosphere of HD 189733 b using ground-based high-resolution
spectroscopy at 3.2 µm,” MNRAS 436, L35-L39 (Nov. 2013).
[4] de Kok, R. J., Brogi, M., Snellen, I. A. G., Birkby, J., Albrecht, S., and de Mooij, E. J. W., “Detection of
carbon monoxide in the high-resolution day-side spectrum of the exoplanet HD 189733b,” A&A 554, A82 (June
2013).
[5] Barge, P. et al., “Transiting exoplanets from the CoRoT space mission. I. CoRoT-Exo-1b: a low-density short-
period planet around a G0V star,” A&A 482, L17-L20 (May 2008).
[6] Borucki, W. J. et al., “Kepler Planet-Detection Mission: Introduction and First Results,” Science 327, 977 (Feb.
2010).
[7] Rappaport, S., Levine, A., Chiang, E., El Mellah, I., Jenkins, J., Kalomeni, B., Kite, E. S., Kotson, M., Nelson,
L., Rousseau-Nepton, L., and Tran, K., “Possible Disintegrating Short-period Super-Mercury Orbiting KIC
12557548,” ApJ 752, 1 (June 2012).
[8] Lesage, A.-L., Spronck, J.F.P., Stuik, R., Bettonvil, F., Pollacco, D., and Snellen, I.A.G., “MASCARA, the Multi-site
All-Sky CAmeRA: Concept and first results,” Proc. SPIE 9145-29 (2014).
[9] Spronck, J.F.P., Lesage, A.-L., Stuik, R., Bettonvil, F., and Snellen, I.A.G., “MASCARA: opto-mechanical design
and integration,” Proc. SPIE 9147-196 (2014).
