A First Catalog of Variable Stars Measured by the Asteroid Terrestrialimpact Last Alert System (ATLAS)
Abstract
The Asteroid Terrestrialimpact Last Alert System (ATLAS) carries out its primary planetary defense mission by surveying about 13000 deg at least four times per night. The resulting data set is useful for the discovery of variable stars to a magnitude limit fainter than , with amplitudes down to 0.01 mag for bright objects. Here we present a Data Release One catalog of variable stars based on analyzing 142 million stars measured at least 100 times in the first two years of ATLAS operations. Using a LombScargle periodogram and other variability metrics, we identify 4.7 million candidate variables which we analyze in detail. Through Space Telescope Science Institute, we publicly release lightcurves for all of them, together with a vector of 169 classification features for each star. We do this at the level of unconfirmed candidate variables in order to provide the community with a large set of homogeneously analyzed photometry and avoid prejudging which types of objects others may find most interesting. We use machine learning to classify the candidates into fifteen different broad categories based on lightcurve morphology. About 10% (430,000 stars) pass extensive tests designed to screen out spurious variability detections: we label these as ‘probable’ variables. Of these, 230,000 receive specific classifications as eclipsing binaries, pulsating, Miratype, or sinusoidal variables: these are the ‘classified’ variables. New discoveries among the probable variables number more than 300,000, while 150,000 of the classified variables are new, including about 10,000 pulsating variables, 2,000 Mira stars, and 70,000 eclipsing binaries.
1 Introduction
1.1 Variable Stars and WideField Surveys
Variable stars have profound and wideranging value for astrophysics. Pulsating variables, especially Cepheids, are a central link in the cosmic distance ladder that is foundational to our understanding of cosmology. Detached eclipsing binaries offer some of the best opportunities to get precise masses and radii of distant stars. Contact binaries present us with a rich variety of interesting phenomena, and some of them represent intermediate stages in systems evolving toward novae, stellar mergers, Xray binaries, and (possibly) Type Ia supernovae. Flare stars and spotted rotators give insight into stellar magnetic fields across the HR diagram. Pulsating red giants (especially the hugeamplitude Mira stars), and a vast diversity of exotic types of variables probe interesting astrophysics and stellar evolution scenarios.
Going back at least to the early results of the Optical Gravitational Lens Experiment (OGLE, Udalski et al., 1994), sky surveys using widefield CCD imagers have greatly increased the number of known variable stars. Even though many of the surveys do not have variable stars as their primary objective, the data they produce is revolutionizing the field of variable star research. This trend will only accelerate in the future as Gaia (Perryman, 2013), the Zwicky Transient Facility (Graham et al., 2018), and LSST publish their first timeseries photometry while ongoing surveys continue releasing interesting results.
A full review of variable stars results from widefield surveys is beyond the scope of the present work, but we briefly note a few publications and statistics for context. Surveys that have produced data used for variable star discovery and analysis include the gravitational microlensing surveys MACHO (Alcock et al., 1993) and OGLE (Udalski et al., 1994); supernova/transient surveys including the AllSky Automated Survey (ASAS, Pojmański, 1997), the AllSky Automated Survey for Supernovae (ASASSN, Shappee et al., 2014), the Palomar Transient Factory (PTF, Law et al., 2009), and the Chinabased Tsinghua UniversityNAOC Transient Survey (TNTS, Zhang et al., 2015); PanSTARRS1 (Chambers et al., 2016; Magnier et al., 2016a; Waters et al., 2016; Magnier et al., 2016c, b); the Robotic Optical Transient Search Experiment (ROTSEI), which was built to look for optical counterparts of gammaray bursts (Akerlof et al., 2000); and asteroid surveys including the Lowell Observatory NearEarth Object Search (LONEOS, Bowell et al., 1995), the Lincoln NearEarth Asteroid Research (LINEAR, Stokes et al., 2000) and the Catalina Sky Survey (Larson et al., 2003).
Data from these surveys have been used in many publications analyzing and presenting catalogs of variable stars. We list only a few examples here, reserving the OGLE surveys for a separate paragraph. Alcock et al. (1998, 2000) used data from the MACHO survey to find RR Lyrae and Scuti stars in the Galactic bulge, while the same authors also published numerous papers on MACHO variables in the Magellanic Clouds. Pojmański (2002, 2003); Pojmański & Maciejewski (2004, 2005); and Pojmański et al. (2005) used the ASAS survey to identify a total of 46,756 variable stars brighter than mag, at declinations south of +28. These include 9,581 eclipsing binaries, 4,921 pulsating stars, and 2,758 Mira variables. Using data from ROTSEI, Kinemuchi et al. (2006) discovered 1197 RR Lyrae stars down to 14th magnitude and analyzed their metalicity using metalicitydependence in their pulse waveforms. Miceli et al. (2008) discovered and analyzed 838 RR Lyrae stars in the Galactic halo, using data from LONEOS asteroid survey. Using the LINEAR survey data, Palaversa et al. (2013) discovered and classified 7000 variable stars, while Sesar et al. (2013) analyzed a partlyoverlapping set of 5000 RR Lyrae stars in the same data. About 60,000 variable stars were discovered in asteroid search data from the Catalina Sky Survey by Drake et al. (2013a, 2014a). Drake et al. (2013b), Hernitscheck et al. (2016) and Cohen et al. (2017) explored starstreams in the outer halo of the Milky Way using RR Lyrae candidates identified in data from the Catalina Sky Survey, PanSTARRS1, and the PTF, respectively. Yao et al. (2015) have released a meticulously analyzed list of 1237 variables stars from the TNTS. Jayasinghe et al. (2018) have released a catalog of 66,533 variable stars discovered in data from ASASSN. We mention in passing that a host of interesting variable star results have also been obtained using photometry from the Kepler mission (e.g. Benkö et al. (2010); Bányai et al. (2013); and many others), but we will not discuss them herein because most of the dramatic Kepler discoveries have come from probing a regime of smallamplitude, highprecision photometry that is inaccessible from the ground and hence of limited relevance to the groundbased ATLAS survey results that are the subject of this paper.
The Optical Gravitational Lens Experiment (OGLE, Udalski et al., 1994) surveys deserve a separate discussion becuse they have produced the largest homogeneous catalogs of variable stars thus far (by an order of magnitude). The OGLE surveys of the Galactic bulge have revealed about 700,000 new variables among 400 million stars analyzed (Soszyński et al., 2011a, b, 2013, 2014; Mróz et al., 2015; Soszyński et al., 2015, 2016, 2017), while several hundred thousand more variable stars have been found at more southerly declinations in the Magellanic Clouds. Besides these huge numbers of stars, the OGLE catalogs significantly exceed most of the others described here in temporal span and numbers of photometric points per star. This wealth of data has enabled many important results. These include Soszyński et al. (2015), who present the shortestperiod known mainsequence eclipsing binary, together with a fascinating astrophysical discussion of the existence (and rarity) of eclipsing binaries with periods shorter than the wellknown 0.22 day cutoff; and Soszyński et al. (2017), who discover and classify Cepheids toward the galactic center using Fourier phase coefficients; as well as others too numerous to list.
1.2 The ATLAS Survey
The Asteroid Terrestrialimpact Last Alert System (ATLAS; Tonry et al., 2018) is designed to detect small (10–140m) asteroids on their ‘final plunge’ toward impact with Earth. Because such asteroids can come from any direction and go from undetectable to impact in less than a week, ATLAS scans the whole accessible sky every few days. To achieve this, we use fully robotic 0.5m f/2 Wright Schmidt telescopes with 1056010560 pixel STA1600 CCDs yielding a 5.45.4 degree field of view with 1.86 arcsecond pixels. The first ATLAS telescope commenced operations in mid2015 on the summit of Haleakalā on the Hawaiian island of Maui, and the second was installed in January/February 2017 at Maunaloa Observatory on the big island of Hawaii. During normal operations, each ATLAS telescope takes four 30second exposures per night of 200–250 target fields covering approximately one fourth of the accessible sky. Together, the two telescopes cover half the accessible sky each night. The four observations of a given target field on a given night are typically obtained over a period of somewhat less than one hour.
The widefield, highcadence observations ATLAS makes to discover nearEarth asteroids are also wellsuited to the discovery and characterization of variable stars down to a magnitude limit fainter than . We present herein the first catalog of variable stars measured by ATLAS, including characterization of known variable stars and the discovery of about 300,000 new variables.
This initial data release is based on the first two years of operation of the Haleakalā telescope only, and covers observations taken up through the end of June, 2017. This date marked the end of a series of changes, which included the switch to dualtelescope operations; upgraded optics for both telescopes; recollimation of the telescopes to take advantage of the new optics; and changes in our observing cadence, processing pipeline, and calibration data. The optical upgrades changed the width of the typical point spread function (PSF) delivered by the Haleakalā telescope from 7 arcseconds to 4 arcseconds. The conclusion of these significant changes made it natural to consider the data before the end of June 2017 as a closed chapter, and accordingly we reanalyzed all of it with optimized and homogeneous methodology. This is the data set we analyze herein to produce ATLAS variable star Data Release One (ATLAS DR1). The more recent data are expected to be even better photometrically, but the ATLAS DR1 data set enables the discovery and/or characterization of several hundred thousand variable stars. We anticipate generating additional data releases (ATLAS DR2, DR3, etc) approximately once a year, which will include homogenously processed data from both telescopes, with adjustments to the calibration and analysis to take advantage of optical improvements.
The ATLAS telescope on Haleakalā observes with two customized, wide filters designed to optimize detection of faint objects while still providing some color information. The ‘cyan’ filter (, covering 420–650 nm) is used during the two weeks surrounding new moon; and the ‘orange’ filter (, 560–820 nm) is used in lunar bright time. As described in Tonry et al. (2018), the and filters are welldefined photometric bands with known color transformations linking them to the PanSTARRS , , and bands (Magnier et al., 2016b).
During the period covered by ATLAS DR1, the ATLAS Haleakalā telescope cycled through four bands of declination (‘Dec bands’), observing one band each night. Cumulatively, the Dec bands extended from Dec to in their narrowest configuration. The exact thickness of each Dec bands was adjusted depending on the phase of the Moon, as a bright Moon would render a large area of the sky unobservable (up to a circle of radius ) and hence require a wider Dec band in order to obtain enough viable pointings to fill the night. Within the scheduled Dec band for a given night, the telescope took four to six 30second exposures of each of typically 200 fields spanning most of the range of right ascension (RA) that was far enough () from the Sun to be observed. The exposures of each given field were all taken within a period of typically 0.5–1.5 hours, with small () dithers between them. To mitigate any systematic effects dependent on field position, a randomly selected offset of was selected and homogeneously imposed on all the pointings from each night, to ensure that over a long period there would be a large diversity of pointings in each Dec band. During the period covered by DR1, various trial adjustments were made to the extent of the Dec bands (in both RA and Dec); in the number of observations of each field per night; and in the dithering strategy. These resulted in some observations being conducted north and south of the to Dec range, but they were not sufficiently numerous to discover many variable stars. Using observations from both telescopes, variable star measurements in ATLAS DR2 will cover the entire sky north of Dec . ATLAS DR2 will contain 70% more stars and at least three times more photometric measurements than the current data release.
1.3 ATLAS Variable Stars
The ATLAS DR1 catalog we present herein makes a major contribution even in the context of the great expansion of known variable stars described in §1.1. It is based on the analysis of 142 million stars, of which 4.7 million are identified as candidate variables. All of the photometry for each of these 4.7 million stars is being publicly released: the largest catalog of candidate variables yet. With 430,000 confirmed variables (of which 300,000 are new), ATLAS DR1 is also the largest homogeneous catalog of confirmed variables apart from OGLE, and the largest to span the sky (since the OGLE variables are confined to relatively small areas targeting the Galactic bulge and the Magellanic Clouds). By using two filters ( and ) with welldefined photometric properties (Tonry et al., 2018), ATLAS obtains quantitative color information for every star. We provide AB magnitudes in the and bands that are free of any known systematic bias, together with realistic uncertainties for every measurement.
Besides the photometry, we are publicly releasing an extensive set of 169 variability features for each of the 4.7 million candidate variables, which we hope other researchers will find useful for developing the rich scientific potential of the new catalog. The payoff for developing effective data mining techniques to extract astrophysical discoveries from this and other current variable star catalogs will only increase in the future. New, larger catalogs will continue to be released by Gaia (Perryman, 2013); the Zwicky Transient Facility (Graham et al., 2018); expanded versions of ongoing surveys including OGLE and ATLAS; and ultimately the Large Synoptic Survey Telescope. The potential for major discoveries from these data is enormous and spans almost all of astronomy, from star formation and planetary habitability to supernovae and cosmology.
2 The Data: Images and Detections
A customized, fully automated pipeline processes every image from an ATLAS telescope, outputting a flatfielded, calibrated image with both astrometric and photometric solutions. For asteroid detection, we subtract from each of these images a PSFmatched template extracted from the lownoise static sky image, or ‘wallpaper’ we have built up by stacking tens of thousands of ATLAS images taken under excellent conditions and covering the accessible sky (to a coverage depth of a few tens of images per filter at most locations). The subtraction of constant sources by differencing each image relative to the wallpaper is essential to the sensitive discovery of asteroids with a low falsepositive rate. However, the variable star results we present herein are based primarily on photometry of the images prior to subtracting the wallpaper, because deviations around the mean flux in the wallpaper are less useful for variability analyses than the total flux of a star. We use the differenced images as a final check to confirm the nature of stars with only tentative variability detections based on the unsubtracted photometry.
The analysis we present herein is based on 284,000 images, which span the sky from Dec 30 to +60, with some additional coverage north and south of these limits. Within this range, most areas of the sky are covered by more than 200 images.
We perform photometry of the unsubtracted images using the DoPhot code. DoPhot (Schechter, Mateo, & Saha, 1993) measures a star’s position and flux by adopting a point spread function (PSF) model, and iteratively finding, fitting, and subtracting each star from the image. The PSF model and aperture magnitudes are derived from the brightest stars as the program iterates. By working bright to faint it permits better photometry of faint stars by first removing bright neighbors. We use a Fortran90 version from AlonsoGarcia et al. (2012) that has a number of enhancements including floating point input and a spatially varying PSF. Our particular code has additional enhancements including bug fixes, double precision model fitting, and multithreading.
DoPhot outputs a median of about 110,000 photometric measurements per image. The mean is more than twice as large because of extremely dense starfields near the Galactic plane. Although we use DoPhot with a sensitive, threshold in order to detect the faintest measurable objects, a majority of these measurements are still expected to correspond to real stars. Under good conditions, the DoPhot measures objects significantly fainter than 19th magnitude, and the median uncertainty at magnitude 18.0 is about 0.095 mag in and slightly better than 0.15 mag in . The total number of photometric measurements in our analysis may be conservatively estimated at 220,000 per image multiplied by 284,000 images: more than 60 billion individual measurements.
DoPhot performs sophisticated fitting of photometric variations across each image due to the changing PSF. To remove additional photometric variations from a variety of sources (e.g. imperfect flatfield; gradients in atmospheric transparency; and residuals from the DoPhot fit) we perform a secondorder correction to the photometry from each image. To do this, we first calculate the offset between the measured magnitude of each star (above a flux threshold to ensure lownoise measurements) and its expected magnitude based on the PanSTARRS1 DR1 catalog (Flewelling et al., 2016), using known transformations we have derived between the PanSTARRS photometry and the wider ATLAS filters (Tonry et al., 2018). We then perform a bicubic fit on cells to model the variation in observed minus expected magnitude over the image, and we correct the measured magnitudes based on this fit. Since we have many thousands of stars per image, we are able to make the fit robust against outliers due to stellar variability and other effects.
3 Photometric Analysis
3.1 The Photometric Data
We associate the individual photometric detections to particular stars by crossmatching the RA and Dec output by DoPhot with objects in the PanSTARRS1 DR1 catalog(Flewelling et al., 2016). We elect to use the external catalog because the resolution ( 1 arcsec) and hence astrometric accuracy are much better than the ATLAS images, which in the current data set have a typical PSF width of 7 arcsec. The disadvantage of using an external catalog is that we may miss objects that have only recently become visible. Thus, we would not expect novae or supernovae to appear in our current analysis, and we might also miss some extremely longperiod, high amplitude variables that have been coming out of a deep minimum in the last three years. An ATLAS catalog of transients, focused on supernovae, is currently in preparation (Smith et al., in prep).
To construct a subset of the PanSTARRS1 DR1 catalog suitable for matching to ATLAS photometric detections, we require that each star be brighter than magnitude 19 in at least one of the , , , or bands. A sample query used to construct the PS1 DR1 catalog subset is provided in §10. To obtain the best list of PS1 objects, we require that the objects exist in the stack catalogs, with StackObjectThin.bestdetection=1, StackObjectThin.primarydetection=1, and with 1 or more valid stackObjectThin.(griz)KronMag 19 for filters ,, or . For objects passing that criteria, we use the mean astrometry (objectThin.raMean, objectThin.decMean) and mean photometry (meanObject.(grizy)KronMag, meanObject.(grizy)KronMagErr)), as these positions are calibrated to Gaia DR1, and the mean photometry is more robust than the stack photometry. Since the the PanSTARRS1 DR1 catalog does not extend south of Dec 30, we augment it in this regime using the TYCHO (Hoeg et al., 1997) and APASS (Henden et al., 2016) catalogs. These have magnitude limits considerably brighter than our intrinsic limit of , so we monitor only bright stars south of Dec 30. This explains the much higher variability fraction seen in this Dec regime in Figure 4: because smalleramplitude variations can be detected for them, bright stars are much more likely than faint ones to be identified as variable (as is illustrated by Figure 3).
For convenience, we will refer to our subset of the PanSTARRS1 DR1 catalog, augmented by the Tycho and APASS catalogs where necessary, as the ‘object matching catalog’. Note well that this catalog is an input to our object match, not an output consisting of objects already found to be present in the ATLAS data. It contains many more stars than than have been effectively measured by ATLAS.
We match ATLAS photometric detections to stars in the object matching catalog using a radius of , or slightly more than 1 arcsecond. To enable good characterization of every star, we confine our current analyses to stars for which ATLAS has at least 100 matching photometric measurements. Since most areas of the sky have been covered more than 200 times, this is not extremely restrictive, but stars that are so faint (or so confused with nearby neighbors) that the typical ATLAS image detects them with less than 50% probability will not be included in the current catalog. We find that ATLAS has more than 100 measurements for 142 million of the stars in the object match catalog. The magnitudes of these stars range from about 11–19 in the band and 10.5–18.5 in , where the bright limits are determined by saturation in the ATLAS images. Figure 1 shows the distribution of these 142 million stars on the sky, and Figure 2 shows the distribution of the number of measurements per star. The median number of measurements per star is 208, while the mean is 213.3. The total number of photometric measurements we analyze herein is therefore 213 142 million, or about 30 billion measurements. Since about twice this many measurements were obtained, roughly half of them must have corresponded to stars too faint or confused to accumulate 100 measurements, or to transients/artifacts. In DR2 we plan to extend our analysis to some of these hardtomeasure stars.
3.2 Selecting Candidate Variables
We begin our variability analysis with photometric time series for 142 million stars, each comprising at least 100 measurements. Following Flewelling (2013) and Drake et al. (2013a, 2014a), we calculate the LombScargle periodogram (Lomb, 1976; Scargle, 1982) of the photometric time series for every star, and use the output false alarm probability (FAP) for each star as our initial screening for variability. The LombScargle periodogram is more computationally intensive than traditional means of identifying variables (e.g. the Stetson indices), but it is much more sensitive to lowamplitude periodic variables, and the analysis is entirely tractable with modern facilities. A LombScargle periodogram can also sensitively detect variability that is not strictly periodic, as long as it has some type of coherent behavior with time.
This initial LombScargle periodogram is carried out by a customized program called lombscar. The nominal processing carried out by lombscar is to read the time (applying a light traveltime correction to translate the times into Heliocentric Julian Days), magnitude, mag error, and filter for each measurement of a given star, and then perform two iterations of fitting. It initializes with a fit to the light curve that consists of a constant brightness equal to the median magnitude (in each filter). All magnitude uncertainties are softened by addition of 0.03 mag in quadrature. For each iteration it does the following:

Prunes “bad points” which either:

have a magnitude uncertainty that is bigger than the larger of 0.3 mag or 2 times the upper quartile of the mag errors of the data, or

have a residual with respect to the current fit which is greater than 0.8 mag (first iteration) or 0.4 mag (second iteration), or

have a residual with respect to the current fit which is greater than 30 sigma (first iteration) or 15 sigma (second iteration).


Performs a quadratic polynomial fit to the light curve minus the current Fourier fit, including separate constant terms for each filter, but all filters sharing the same time behavior.

Calculates a LombScargle periodogram using the program LS fasper on the light curve minus the polynomial fit using HIFAC=100 and OFAC= 4. LS fasper is a slightly enhanced LombScargle code that does the calculation in double precision and uses the uncertainties of the points to get the resulting powers.

Doubles the raw period output by the LombScargle periodogram, and halves the frequency.

Does a Fourier fit of the data for every output frequency from fasper that has a probability of greater than 0.85 times the highest probability, and subsamples the frequency spacing out of fasper by 5x. These Fourier fits go to (3*freq) (hence includes the base frequency out of fasper as 2*freq). The Fourier fits report a that rejects the worst 10% of points. The very best from all of these fits is deemed to be the answer for this iteration.

At the conclusion of the iteration, computes Fourier fits at aliased frequencies of +/0.5 day and +/1.0 day.
The sampling factors ‘HIFAC’ and ‘OFAC’ (see, e.g., the discussion in Press et al., 1992) with which a LombScargle periodogram is run are important for determining the range of variables to which it is sensitive. The ‘oversampling’ factor OFAC determines how fine the sampling is in frequency space, such that the maximum phase error is approximately 1/OFAC. By viewing plots of the periodogram, one can easily evaluate whether OFAC is large enough to capture all the structure, and adjust if necessary. The HIFAC parameter determines the maximum detectable frequency. For a data set with total temporal span of and number of data points , this maximum frequency is (Press et al., 1992). HIFAC may therefore be interpreted as the factor by which the highest frequency probed exceeds the Nyquist limit that would apply if the measurements were equally spaced in time. In the case of unevenly spaced data such as ours, the LombScargle periodogogram can accurately measure frequencies many times higher than the equallyspaced Nyquist limit (Press et al., 1992). Our data have a median temporal span of about 620 days, while the median value of is 208. The maximum frequency with HIFAC=100 is therefore typically 16.8 cycles/day, corresponding to a period of 0.06 days or 1.4 hours. Eclipsing binaries and pulsating objects such as RR Lyrae stars and many Scuti stars have periods longer than this; however, some Scuti objects, subdwarf B stars, and pulsating white dwarfs have periods too short for detection in our current analysis. These objects are rare and often have amplitudes too small ( mag) for ATLAS to detect anyway. Since the runtime of the LombScargle periodogram increases linearly with HIFAC, probing down to a period of, e.g, 0.5 hours would almost triple the computational cost. For the present, we have elected not to make such a large investment to obtain a small increase in variable discoveries. We will probe shorter periods, at least of a subset of the brightest stars, in DR2.
The outlierclipping applied by lombscar, as well as its subtraction of the bestfit 2ndorder polynomial from the original time series, are intended to remove bad points and systematic trends and hence to increase the detectability of variable stars with periods shorter than the temporal span of our data. They can, however, decrease our sensitivity to longperiod variables and very highamplitude variables. Since most of our data do not appear to suffer from significant long term systematics, we have the potential to be very sensitive to longperiod variables, and we have taken steps to recover this sensitivity as detailed below in §3.2.2.
The most significant variables are those with the smallest FAP values output by the LombScargle analysis, and these probabilities range down to extremely small values (e.g. ). Hence, we adopt log10(FAP) as our primary measure of the strength of a variability detection. For convenience, we will refer to log10(FAP) as PPFAP, meaning ‘Power of the Periodogram False Alarm Probability’. Besides PPFAP, we record 31 additional statitics output by lombscar. These include the number of points (total and postclipping); the median magnitudes; the coefficients of the initial polynomial; the period identified by the LombScargle analysis; the coefficients and reduced value of the Fourier fit; and others described in §9 below.
3.2.1 Variable Features
We augment these 32 statistics from lombscar by calculating, for the each of our 142 million stars, a set of 22 additional statistics intended to be sensitive to nonperiodic as well as periodic variability, using a program called varfeat. Calculated features include the 5th, 10th, 25th, 75th, 90th, and 95th percentile magnitudes; a statistic we call Hday that probes the median nightly value to identify significant variability on a timescale shorter than a night; and a statistic we call Hlong that probes nighttonight variability relative to the intranight scatter. The varfeat analysis also includes many statistics described in Sokolovsky et al. (2017): the weighted standard deviation; interquartile range; for a constantbrightness model; robust median statistic; normalized excess variance; normalized peaktopeak amplitude; inverse von Neumann ratio; WelchStetson I; Stetson J; and Stetson K. These are described in more detail in §9.
3.2.2 Final Selection of Candidates
We wish to select a subset from our full sample of 142 million stars for more intensive variability analysis that would be computationally intractable as applied to the full set of stars. We do this in three stages.
First, we select the stars that appear to be strongly variable based on the initial analysis with lombscar. For these, we adopt a threshold of PPFAP10.0, corresponding to a formal false alarm probability of . We also add all stars from AAVSO Variable Star Index (VSX; Watson et al., 2006) catalog (downloaded as of November 2017) for which we have at least 100 measurements. The number of stars in the union of strong lombscar variables with known VSX stars is 1.1 million, or 0.77% of all 142 million wellmeasured stars. VSX stars that would not have been independently included make only a small (%) contribution to the total of 1.1 million candidates identified at this stage.
Next, to avoid excluding stars with lowamplitude variability, or objects whose variability was suppressed by the outlierclipping or polynomial subtraction applied by lombscar, we select stars with weaker LombScargle variability detections, having PPFAP between 5.0 and 10.0. This adds 2.4 million candidate variables (1.67% of all stars) to our list.
Finally, to catch any additional variables that may have been missed by lombscar, we use the varfeat analysis to select a set of potentially interesting stars that all have PPFAP less than 5.0. To determine which varfeat outputs are most useful for selecting candidate variables, we make use of the fact that all of the varfeat statistics are expected to be capable of detecting periodic as well as unperiodic variability. Thus, we can examine their degree of correlation with LombScargle FAP to identify those that are most sensitive to generic variability. We do this by calculating the 90th percentile envelope of PPFAP as a function of each of the varfeat statistics. The most useful statistics are those for which the envelope reaches the highest values while still in a regime populated with a significant number of stars. We find that the best ones are ; the Robust Median Statistic; the Inverse von Neumann ratio; the WelchStetson I and Stetson J indices (all described in Sokolovsky et al., 2017); the two that we invented to probe inter and intranight variability (Hday and Hlong, see §9); and the interquartile range (Sokolovsky et al., 2017). For all of these except the interquartile range, there is a value for which the 90th percentile envelope of PPFAP rises above 20.0, corresponding to a nominal false alarm probability of . We choose thresholds for each statistic that correspond to envelope values between 10 and 20. These thresholds are 2.5 for the Robust Median Statistic; 1.4 for the Inverse von Neumann ratio; 8.0 for WelchStetson I; 6.0 for Stetson J; 7 for Hday, and 20 for Hlong. We combined all these criteria with a logical OR, and thus identified 1.3 million potentially interesting stars (0.90% of all stars) with PPFAP values of less than 5.0 in the initial screening with lombscar.
The total number of candidate variables identified by these three selections is 4.7 million, or 3.34% of the all the stars that were analyzed. The bottom panel of Figure 1 shows the distribution of these candidate variables on the sky.
3.3 Fourier Fitting
We characterize each of our candidate variables with a program called fourierperiod, which performs a sophisticated Fourier analysis aimed at resolving any period aliases and probing the lightcurve morphology in detail. We begin this analysis with another LombScargle periodogram, which differs from the initial one in three ways. First, there is no presubtraction of a polynomial fit. Second, OFAC=20 is used rather than OFAC=4, ensuring finer sampling of the periods. Third, the outlierclipping is less aggressive. We reject all points with nominal uncertainties greater than 0.2 mag, corresponding to detections with less than 5 significance. We calculate a LombScargle periodogram without any additional clipping. However, since surviving outliers can sometimes distort a truly periodic signal and greatly reduce the value of PPFAP, we also perform three iterations of 3clipping relative to a constant model, and then recalculate the periodogram of the clipped data. Whichever data set (unclipped or clipped) produces the strongest variability detection (highest value of PPFAP) is retained for further analysis. Note that the value of used in the sigmaclipping is a simpleminded RMS scatter around the median in each filter, and hence will be elevated by the star’s own variability. This makes the clipping very conservative and ensures that, for example, no points from a pure sinusoid would be rejected regardless of its amplitude.
At each period , fourierperiod subtracts the median magnitude in each filter and then fits the data with a truncated Fourier series of the form:
(1) 
Where is a constant term, allowed to be different for each filter; and the master period and the order of the Fourier fit are determined from the data as described below.
The analysis defaults to the assumption that every star is a longperiod variable. The reasons are, first, that longperiod variability can be aliased to short periods in the LombScargle analysis (so in general it isn’t safe to assume that a highfrequency periodogram peak means a short period), and second, a search for longperiod variability is computationally cheap because only a relatively small number of periods must be probed. Therefore we begin by probing periods from 5 to 1500 days. At each period, we calculate a sampling step based on a maximum phase error :
(2) 
Where is the temporal span of the data, as before. We set to 0.025: thus, whatever the actual period of the star, we will fit some period such that no point is incorrectly phased by more than 0.025 cycles. Note that this is approximately equivalent to OFAC=40 in a LombScargle analysis. The dependence of the period sampling interval illustrates why probing long periods is cheap.
We begin by fitting a pure sinusoid ( in Equation 1) at every period from 5 to 1500 days, with the sampling dictated by Equation 2. We identify the period producing the best fit based on the value, and then evaluate the remaining signal by taking the LombScargle periodogram of the residuals. If PPFAP is greater than 4.0, we add another Fourier term and scan all the periods again. Since we have two Fourier terms now, the lightcurve could be more complex and a phasing error correspondingly more serious: thus, we reduce by a factor of two relative to its initial value of 0.025. If the residuals from the twoterm Fourier fit still have PPFAP greater than 4.0, we add a third term and reduce to 1/3 its initial value. We proceed until we reach a maximum number of Fourier terms. For the longperiod analysis, we use a maximum of 4 Fourier terms. Since longperiod variables (e.g. Mira stars) often have very different amplitudes in our different ATLAS filters, the Fourier coefficients and are allowed to be different for each filter, although the master period has to be the same.
If the periodogram of the residuals still shows PPFAP4.0 after the subtraction of a 4term Fourier fit, we conclude that the long period analysis did not find a satisfactory fit and we proceed to the shortterm analysis. Here, in the interest of computational tractability, we do not probe every possible period in a wide range. Instead, we probe a set of narrow ranges based on the initial LombScargle period, intended to include all plausible values for the true period. As in the longperiod fit, we start with a pure sinusoid and add additional terms, but the maximum is now , and the criterion for a good fit is more strict: PPFAP2.0 rather than 4.0. Also, since shortperiod variables usually don’t have huge differences in amplitude and lightcurve shape between the cyan and orange filters, the Fourier coefficients and are required to be the same for both filters, although each filter still gets its own constant term . Where the amplitude and/or the shape of the lightcurve is somewhat different in the vs. the band, the fit finds an approximate average lightcurve and no serious error results.
For a fit with Fourier terms, we probe base periods that are 1, 2, 3… times longer than the LombScargle output period . For each base period, we probe the aliases of the Earth’s sidereal day, so the full set of trial periods that we probe is given by:
(3) 
Where days is the sidereal rotation period; the alias index is allowed to take on values of 3, 2, 1, 0.5, 0, 0.5, 1, 2, and 3; and is an integer ranging from 1 to the number of Fourier terms being used in the fit. If the right hand side of Equation 3 turns out to be negative, we simply take its absolute value. We note that such ‘negative aliases’ are mathematically legitimate, and that they have the initially bewildering effect of timereversing the folded lightcurve. For example, a pulsating star with a nominal period of 2.45433 days could be exhibiting the alias of a true period of 0.625769 days, even though the left hand side of Equation 3 becomes negative if we plug in , and . In this case, the lightcurve folded at the nominal period of 2.45433 days will show a slow brightening and then a rapid fading rather than the classic ‘sawtooth’ lightcurve with its rapid rise and slow fall. Refolding the data with the correct period of 0.625769 will correct the timereversal and recover the familiar sawtooth pattern with its rapid brightening and slow fade.
We note that Equation 3 probes both aliases and multiples of the initial LombScargle period, as it should since eclipsing binaries, multimode pulsators, and other objects often have true periods that are a multiple of the period corresponding to the dominant frequency that will be identified by LombScargle. Specifically, Equation 3 probes aliases of multiples of the nominal period: it does the period multiplication first and then calculates the aliases. It is possible that the reverse procedure, probing multiples of aliased periods, would do a better job of probing the actual period ambiguities in some cases. This would produce:
(4) 
The sets of frequencies produced by Equations 3 and 4 are not identical, and we will likely use Equation 4 for DR2. However, the most probable types of period ambiguity are covered by both equations — and this is especially true since we include halfinteger aliases in our application of Equation 3.
Around each value of given by Equation 3, we search a narrow range in period that corresponds to cycles over the whole temporal span (except in the unaliased case , when we search a wider range corresponding to cycles). In each case, the period sampling is given by Equation 2, and the maximum phase error is set to 0.025 divided by the number of Fourier terms being fit.
When the bestfit period (based on the minimum criterion) has been identified for a given number of Fourier terms, the periodogram FAP of the residuals from this optimal fit is calculated. If PPFAP2, the fit is considered to have captured all the variability and fitting stops. Otherwise, another Fourier term is added and the period search begins again, unless the maximum number of Fourier terms has already been reached.
Note that the Fourier fitting rapidly becomes more computationally expensive as additional terms are added in the shortperiod fit. In the last iteration, with 6 Fourier terms, six different values of are explored; for each of which we probe the usual 9 different values of the alias , making 54 different period ranges in all. The ranges also are required to be more finely sampled, since has been reduced by a factor of 6 relative to its initial value of 0.025.
The respective FAP thresholds and maximum numbers of Fourier terms for the long and short period fits are sensitive and important parameters, and we arrived at the current values to optimize results after considerable experimentation. The maximum number of Fourier terms that can be used in the short period fit is an optimum because it usually produces very good fits to eclipsing binaries and pulsating stars, but yet is low enough that the computation does not become intractable. For the longperiod fit, we found that allowing more than four Fourier terms could sometimes enable a formally acceptable longperiod fit even to a strong and obvious shortperiod object — e.g. an RR Lyrae star vulnerable to aliasing because of having a period near 0.5 sidereal days. Such cases are extremely problematic because then the Fourier code does not even attempt the shortperiod fit that would yield the correct solution. On the other hand, giving the longperiod fit an insufficient number of Fourier terms (or a tootight threshold in terms of the acceptable FAP) results in much time being wasted in futile attempts to obtain shortperiod fits to longperiod variables.
We note that here (and throughout the current paper) we focus on the time domain rather than the frequency domain. Our intent with the Fourier series is to find a periodic function that fits the data, not to analyze the frequency content of the signal. The terms of the Fourier series have fixed frequencies , , , etc, dictated by the master period that is being explored. Thus, we are not performing a CLEAN algorithmlike subtraction of successive bestfit sine waves at arbitrary frequencies until the residuals are consistent with random noise. The latter type of analysis is required, e.g., for detailed characterization of stars that pulsate with multiple periods — while our aim at present is simply a very generalized characterization of variability that will identify stars worth further study. We suspect the ATLAS data would support such sophisticated frequency analyses of many stars, and as we are making our photometric data public, we hope the current paper will serve to guide other researchers toward promising objects of study.
Our Fourier analysis code calculates and saves 94 different statistics, which are described in detail in §9. These include the period and PPFAP of the initial periodogram; the numbers of points used for the final analysis; the original RMS scatter of the data from the mean (overall and in each filter); the master period adopted in the long period fit; the residual RMS and for this fit; the number of Fourier terms used; the minimum and maximum fitted brightness (confined to times where the fit is constrained by the data); the constant terms in the final Fourier fits; the sine and cosine coefficients for each Fourier term in each filter; the residual FAP after subtracting each successive Fourier fit; the Fourier index of the term that has the most power; analogous quantities for the shortperiod fit, if applicable, including the specifications on the aliasing and periodmultiplication of the final adopted period relative to the initial Lomb Scargle output; and two statistics measuring the degree of invariance of the shortperiod Fourier fit under timereversal and 180degree phaseshifting, respectively.
Our (rather arbitrary) definition of shortperiod is days, and applies to the highestfrequency Fourier term. Thus, the shortest master period that counts as ‘long’ in our analysis is 5 days for a pure sinusoid and 10, 15, and 20 days for fits with 2, 3, and 4 fourier terms respectively. If the longperiod analysis finds a satisfactory fit (which will necessarily have a period at least as long as these respective values), no shortperiod fit will be attempted. If the best longperiod fit is not satisfactory, a short period fit can (and will) be performed even if the period found by the initial periodogram is long. This is true because any possible input period will have aliases shorter than 5 days for some value of the alias parameter .
The limit of 5.0 days for the highestfrequency Fourier term applies to the short periods as well, so that the longest master period that counts as short is 5.0 days for a pure sinusoid but can be as long as 30 days if 6 Fourier terms are used in the fit. Thus there is some potential overlap in the regimes probed by the long and shortperiod fits. Note, however, that the shortperiod fit is performed only if the longperiod fit did not find an acceptable solution, defined as a fit with PPFAP for a LombScargle periodogram of the residuals from the fit.
3.4 Statistics from Difference Imaging
In order to detect asteroids, all ATLAS images are ‘differenced’ by the subtraction of a static sky template produced from earlier ATLAS data. Both the original and difference images are saved, and our variable star analysis thus far is based on the former. However, the difference images could be very useful in identifying variable stars, especially doubtful cases.
Hence, we wrote a program to calculate 19 potentially relevant statistics from the difference images for each candidate variable star. These are not based on reaccessing the pixels of the difference image (e.g. by doing forced photometry at the locations of suspected variable stars). Rather, they are based on existing detection catalogs (‘ddc files’) automatically produced from the each difference image for purposes of asteroid detection. The format and content of the ddc files are highly evolved due to their critical place in the ATLAS primary mission of asteroid discovery, and hence the files present a concise yet sophisticated list of analytics for each detection. Besides basic quantities such as the position, magnitude, associated uncertainties, and for the PSF fit to each detection, the ddc files include the output of an image analysis program called vartest that supports ATLAS asteroid discovery by automatically performing a pixelbased analysis to classify detected objects in the difference images and rule out false positives. For each detection in a difference image, vartest assigns the probability that it is a noise fluctuation (Pno); a cosmic ray (Pcr); an electronic artifact (Pbn, Pxt); a starsubtraction residual (Psc); a bona fide asteroid or transient (Ptr) — or a variable star (Pvr). To identify possible variable stars, vartest uses astrometric consistency between the original and difference images; unusual levels of residual flux; and a bias away from zero in the statistics of nearby pixels (which should have mean zero if the detection is a subtraction residual from a nonvarying star).
The 19 statistics we calculate from the ddc files include the number of times there was any detection corresponding to the star’s position; the median magnitude and SNR of such detections; the median of the PSFfit; and several more statistics based on the vartest probabilities. The most useful of the calculated statistics turn out to be the number of detections, and the median and rank 2 values of the variable probability ‘Pvr’ output by vartest. We identify thresholds on these statistics that are able to select a set of stars with median PPFAP10 in the lombscar analysis. The significance of this is that the ddc statistics are entirely independent of the lombscar results and hence can provide an independed confirmation of variablity. The required thresholds on the ddc statistics are hard to meet: most stars, variable and not, don’t pass the test. Of randomly selected stars regardless of variability, only 0.09% meet the criteria. We had to adopt such strict thresholds to meet the requirement of median PPFAP10, in order to reasonably claim that a star only tentatively identified as variable can, if it passes, be declared variable with some confidence. For stars meeting these demanding criteria, we assign a value ddcstat=1, indicating that the statistics from the difference images provide strong evidence of its genuine variability independent of other considerations. The vast majority of stars do not pass the ddc criteria, and are assigned ddcstat=0.
3.5 Stellar Proximity Statistics
Due to its hierarchical approach — detecting and subtracting away the brightest objects, prior to attempting to measure fainter ones — Dophot is able to extract good photometry even from dense starfields where some of the stellar images overlap and are confused. Where the PSF changes over time, however, the total number of stars detected in a confused field may change: on the blurrier images, some stars will blend together and be measured as one that were identified as distinct objects in sharper frames. This change in the number of detected stars can also affect the photometry.
To probe the effect of confusion on our photometry, we used our object matching catalog, described in §3.1. For each star in the full set of 142 million wellmeasured ATLAS objects, we calculated the distance to the nearest star in the object matching catalog (dist); the distance to the nearest star of at least equal brightness (dist0); the distance to the nearest star at least two magnitudes brighter (dist2); and the distance to the nearest star at least four magnitudes brigher (dist4). We then plotted the 99.5% upper envelope of the PPFAP in a sliding box as a function of these distances (Figure 5). The PPFAP envelope, near PPFAP=10.0 for isolated stars, rises at distances smaller than 20 arcseconds. We choose to regard as potentially affected any stars with dist arcsec or dist0 arcsec regardless of PPFAP; dist or dist020 arcsec and PPFAP15.0; and dist220 arcsec with PPFAP20.0. Since PPFAP=10.0 is our nominal boundary between strong and weak variability candidates for isolated stars, our objective here is to set conservative, but approximately equivalent, thresholds for stars that may be affected by blending from neighbors. We find no evidence that dist4 provides a meaningful constraint not already captured by dist, dist0, and dist2.
We find that 64.7% of our 142 million stars have a neighbor in the object matching catalog within 20 arcsec. Hence, the variability for all these stars is potentially spurious unless PPFAP15.0. Meanwhile, 7.88% of the stars have a neighbor 2 magnitudes brighter within 20 arcseconds: their variability might be spurious up to PPFAP. Only 0.16% of stars have a neighbor within 1.5 arcsec or a neighbor of equal brightness within 5.0 arcsec. The photometry of these last stars will certainly be affected by blending, and their variability is suspect regardless of the value of PPFAP.
To all stars with variability that is potentially suspect based on the criteria above, we assign proxstat=0, indicating that proximity statistics call their variability into question. Isolated stars or stars with values of PPFAP above the respective thresholds get proxstat=1, indicating their variability status is secure, at least as far as proximity effects are concerned.
4 Classification of Variable Stars
In §3 we have described how we analyzed our photometric time series using lombscar; the calculation of additional statistics with varfeat; detailed Fourier analysis using fourierperiod; the calculation of statistics from the difference images; and finally the stellar proximity analysis to probe possible affects of image confusion on the photometry. Of these analyses, lombscar, varfeat, and the proximity analysis are applied to all 142 million wellmeasured stars; while the Fourier fit and the difference statistics are calculated only for stars already identified as being interesting based on the earlier analyses.
For the 4.7 million candidate variables for which all five analyses were performed, we calculate and save 169 different features, including the binary proxstat and ddcstat values described above. For a description of all 169 features, see §9. All of these statistics, in addition to the lightcurves, are publicly available through STScI^{1}^{1}1http://mastweb.stsci.edu/ps1casjobs/ for all 4.7 million of our candidate variables.
Based on visual examination of a few tens of thousands of lightcurves, we came up with 13 categories of stars and developed a training set for input into machine learning algorithms, which we used to classify the remainder of the 4.7 million candidate variables. The 13 categories are CBF (close eclipsing binary, full period correctly identified by fourierperiod; CBH (close eclipsing binary, period found by fourierperiod is half the true orbital period); DBF and DBH (detached eclipsing binaries with either the full or half period identified; PULSE (pulsating variables of any kind for which the period found by fourierperiod corresponds to a single pulse); MPULSE (pulsating variables for which the period corresponds to multiple pulses: hence, likely multimode pulsators); SINE (pure sine wave); NSINE (pure sine wave was fit, but the data are noisy and/or residuals indicate nonsinusoidal variations); MSINE (modulated sine wave; period corresponds to multiple cycles: analogous to MPULSE); MIRA (Miratype longperiod, highamplitude variables); LPV (generic hardtoclassify variable without much power at frequencies corresponding to periods less than 5 days); IRR (generic hardtoclassify variable with significant power at high frequencies); and ‘dubious’ (probably not a real variable). These categories were chosen based on extensive visual examination as capturing most of the morphological types of lightcurves present in our data.
We performed machine training and classification using the Google TensorFlow machine learning library on a standard Linux platform with a single GPU card. 39,100 handclassified variable stars were selected for the TensorFlow training set. Seventy features were selected for training from the full set of 169 variable star features output by the five analyses described above. We employed the TensorFlow DNNClassifier model, a simple deep neural network, with three hidden layers of 400, 800, and 400 nodes respectively in each layer. This architecture was selected after iterating with models with different numbers of hidden layers and nodes as the simplest model capable of attaining high training accuracy.
The seventy features used for machine learning are described in §9. They include the PPFAP from the LombScargle periodogram run by fourierperiod; the filterspecific raw RMS scatter; the master period, min and max brightness, residual RMS, and Fourier coefficients from both the long and (if applicable) shortperiod fits performed by fourierperiod; the two parameters that describe the invariance of the lightcurve under 180degree phase shift and under timereversal centered on the deepest minimum. They also include several statistics output by varfeat: the median magnitudes and 5th, 10th, 25th, 75th, 90th, and 95th percentile magnitudes; Hday, Hlong, , the robust median statistic, Inverse von Neumann ratio, WelchStetson I, and Stetson J statistics.
In the extended trialanderror process of finding a satisfactory methodology for the machine classification, one important breakthrough was achieved when we converted the Fourier terms from sine and cosine coefficients to amplitude and phase. Feeding the machine phase and amplitude information produced markedly more accurate classifications. We defined the amplitude and phase coefficents so that the mth Fourier term, previously given as in Equation 1 by:
(5) 
is instead expressed as:
(6) 
We choose this particular formulation because it has the property that the minimum brightness (maximum magnitude) for a given Fourier term will occur whenever the argument of the cosine is zero, and if is the same for all values of , the minimum brightness will occur at the same time for all Fourier terms. Of course, different values of are equivalent if separated by for any integer . We regularize the interpretation of for terms with by choosing so that will be as close as possible too, but greater than, . Combined with the definition in Equation 6, this also has the implication that the phase offset between and cannot be greater than .
Another breakthrough was the training of the machine classifier in two stages. In stage 1, we pool the LPV, IRR, and ‘dubious’ classifications into a single classification called HARD. This step allows the classifier to train on the most distinct classes of variable stars, achieving an accuracy of 94.1%. For stage 2, we train a second classifier using the same training set to separate HARD variable stars into LPV, IRR and ‘dubious’ classes, with an accuracy of 96.8%. Training the DNNClassifier model typically takes up to 10 minutes on our singleGPU system, and classifying all 4.7 million candidate stars using the trained model took about 10 minutes.
The probabilities output by the machine classifier for each of the 13 classes of variables, as well as a generic ‘HARD’ probability, are provided for each star along with the vector of 169 features already mentioned. Including the proxstat and ddtstat values, we thus provide a total of 185 statistics for each star. All of these are publicly available from STScI, in addition the the lightcurves.
After the final round of classification with machine learning, we use parameters output by fourierperiod to identify subsets of most of the categories that are atypical and hence potentially misclassified. We investigate these by hand and reclassify them where appropriate — an exceedingly interesting exercise since some of them are very unusual objects (see §6). Among the highamplitude stars classified by machine learning as MIRA (and a few other classes) are a handful of objects with Miralike amplitudes but colors not red enough for actual Mira variables. In addition to their relatively blue colors, they often show less smooth lightcurves than real Mira stars. We have invented a new class for these objects: SHAV, for ‘slow highamplitude variable’. They include known AGN, variables of the R Coronae Borealis type (which, though red, are not as red as Miras), and other exotic objects.
Lastly, we make use of the proxstat and ddcstat values to adjust classification as follows. We assume that stars classified as any type of eclipsing binary (CBF, CBH, DBF, and DBH), pulsator (PULSE and MPULSE), coherent sinusoid (SINE and MSINE), or Mira variable have lightcurves with specific characteristics that are unlikely to be spuriously produced by blending. Hence, we do not adjust classifications for stars in any of these types due to proxstat=0. On the other hand, the generic categories IRR and LPV, as well as the lower significance variables in the NSINE category, could be contaminated by spurious variables due to blending. Hence, we reclassify all IRR, LPV, and NSINE variables that have proxstat=0 as ‘dubious’ unless they also had ddcstat=1, in which case their classifications were left unchanged. This exception makes sense because blended stars subtract just as cleanly as unblended ones in our difference images, so ddcstat=1 rules out a blend as the cause of the original variability detection. Given this fact, we should also reclassify all the ‘dubious’ stars with ddcstat=1 as something else. Reclassifying them as IRR could be a reasonable choice, but we elected to invent a new classification to reflect the unique analytical history of these stars. Since the machine learning did not classify them as IRR, it seems reasonable that they might be even farther from coherent periodicity than most stars in the IRR category. In order to communicate this, we have elected to call them ‘STOCH’, for stochastic. Thus, our final classification includes fifteen categoires: the thirteen listed above plus SHAV and STOCH. These are given in Table 1, and described in more detail in §4.1.
Class  Description 

CBF  Close binary, full period 
CBH  Close binary, half period 
DBF  Distant binary, full period 
DBH  Distant binary, half period 
dubious  Star might not be a real variable 
IRR  Irregular: catchall for difficult shortperiod cases 
LPV  Long period variable: catchall for difficult cases 
MIRA  Highamplitude, longperiod red variable 
MPULSE  Modulated Pulse: likely multimodal pulsator 
MSINE  Modulated Sine: multiple cycles of sinewave were fit 
NSINE  Noisy Sine: pure sine was fit, but residuals are large or nonrandom 
PULSE  Pulsating variable 
SHAV  Slow HighAmplitude Variable, too blue or irregular for Mira 
SINE  Pure sine was fit with small residuals 
STOCH  Stochastic: certainly variable, yet more incoherent even than IRR 
With its broad classes derived from visual investigation of lightcurve morphologies in our particular data set, our classification scheme is quite different from the schemes adopted by most previous works on variables detected in sky surveys (e.g. Drake et al., 2013a, b, 2014a; Jayasinghe et al., 2018), which have generally adopted preexisting, astrophysicallybased classification schemes with larger total numbers of categories. Both approaches have their merits, and the ATLAS data we present herein would certainly support more categories of classification. We adopt the broad, morphological categories partly with the objective of handing the machine classifier an easier problem and hence obtaining more reliable results from it — an important consideration since the huge number of stars we have classified precludes checking more than a small fraction of them by hand. Our broad categories may also lend themselves to the detection of unusual objects or new classes of variables: each broad category provides a helpful context of objects that are in some way similar, while at the same time containing considerable substructure on which the classifier has not yet passed any judgement. In effect, this can allow the stars to tell us how they want to be classified — a topic we explore further in §5. One disadvantage of our current scheme, which we intend to correct for DR2, is that it has no separate class for spotted rotators and other periodic variables that are not eclipsing binaries, pulsators, Mira stars, or sinusoids such as ellipsoidal variables. Many spotted rotators have likely been classified as IRR or LPV, even though they may show quite regular periodicity, and a few may also have been misclassifed as eclipsing binaries or pulsating stars.
Table 2 gives the total number of stars finally classified in each category, as well as the number that were rescreened by hand (if any) and the number that turned out to be new. The new objects are identified by excluding every star recorded in the VSX or GCVS catalogs (downloaded on March 15, 2018); the catalog from the ASASSN survey presented by Jayasinghe et al. (2018); the catalogs from the Catalina Sky Survey presented by Drake et al. (2013a, b, 2014a); and the OGLE catalogs published since 2010 and covering variable stars north of Dec 50 (Soszyński et al., 2011a, b, 2013, 2014; Mróz et al., 2015; Soszyński et al., 2015, 2016, 2017). We note that most of the Catalina Sky Survey variables had already been incorporated into the VSX at the time of our download, and that due to the mostly southerly coverage of the OGLE surveys, there was very little overlap between our data set and the OGLE variables.
Class  Total  Rescreened  New  Percent New  Percent ddcstat=1 

CBF  44165  810  25901  58.65  30.98 
CBH  36582  789  26196  71.61  16.65 
DBF  11338  458  8487  74.85  10.55 
DBH  17672  1392  14121  79.91  9.05 
dubious  4307019  0  4218985  97.96  0.00 
IRR  82960  0  72137  86.95  9.87 
LPV  50909  0  29968  58.87  38.03 
MIRA  7626  627  2063  27.05  55.23 
MPULSE  5514  873  2357  42.75  33.71 
MSINE  36285  229  30702  84.61  2.74 
NSINE  64726  0  58777  90.81  1.03 
PULSE  25162  5749  8031  31.92  40.33 
SHAV  17  17  2  11.76  58.82 
SINE  29404  0  23422  79.66  3.14 
STOCH  14834  0  13076  88.15  100.00 
All EBIN  109757  3449  74705  68.06  20.56 
All pulsators  30676  6622  10388  33.86  39.14 
4.1 Categories of Variables: Examples and Discussion
In this section we provide a brief discussion of each type of variable, and in Figures 6 through 18 we present nine examples of each type except ‘dubious’. These examples are randomly chosen from previously unknown variables in each category, with no attempt to avoid showing failures of our analysis or classification. The data are phasefolded at the master period output by fourierperiod. Two full cycles are shown for each object, with the points overplotted on the bestfit Fourier model. A vertical dotted line indicates the end of the first plotted cycle: hence, its intersection with the x axis gives the period. Data from the ATLAS band are shown in blue, while the band data are shown in red. In most cases, a consistent magnitude scale is used for all nine plots so that the diversity in amplitude can be seen at a glance. Since measured magnitudes are shown without rescaling, the offset between the and band data indicates the star’s color.
For irregular variables or stars whose period is greater than the temporal span of our data, the master period is not expected to correspond to any true astrophysical frequency. For such systems, the Fourier fit should be interpreted not as a measurement of a true cyclical pattern but merely as a probe of the system’s photometric coherence. A few cases exist (especially among the longperiod objects) where the Fourier fit runs away to unreasonable values during intervals of time that are unconstrained by the data. Our analysis is designed to avoid any adverse effects from these cases (e.g., in determining the min and max fitted magnitudes, we evaluate the fits only at times corresponding to actual measurements). In some cases where the star is very red and the band Fourier fit runs away due to the resulting paucity of band points, we have refrained from plotting the band fit to avoid distraction.
CBF: Close binary, full period. These stars are contact or nearcontact eclipsing binaries for which the Fourier fit has found the correct period and hence fit the primary and secondary eclipses separately. Classification tends to be very definitive in this category, with the rate of serious misclassification being as low as 1%. Mild errors such as confusion between the CBF and DBF classes; and period errors in which the nominal period is 1.5 times the correct value, may be slightly more common.
CBH: Close binary, half period. These stars are contact or nearcontact eclipsing binaries for which the Fourier fit has settled on half the correct period and hence has overlapped the primary and secondary eclipses. Physically, the CBF and CBH stars are expected to differ in that the primary and secondary eclipses are likely to be more similar in depth in the latter class. Like CBF, CBH stars are very rarely misclassified. Even RRc variables, which are notoriously difficult to distinguish from contact binaries because of their symmetrical lightcurves, are wellseparated by our Fourier analysis, especially the phase offsets (see Figure 20). At the longest periods there may be some contamination from spotted rotators and/or extremely symmetrical Cepheidtype pulsators.
DBF: Distant binary, full period. These stars are detached eclipsing binaries for which the Fourier fit has found the correct period and hence fit the primary and secondary eclipses separately. These stars are challenging because their lightcurves are flat much of the time, causing the PPFAP values to be relatively low, and our maximum of six Fourier terms can be insufficient to fit the narrow eclipses. Hence, a few percent of them may be misclassified, and a larger fraction likely have incorrect periods. Better results could be obtained using the Box Least Squares (BLS) algorithm of Kovács et al. (2002) as was done, e.g., by Jayasinghe et al. (2018). We judged this to be too computationally intensive at present, but will likely use it in DR2.
DBH: Distant binary, half period. These stars are fully detached eclipsing binaries for which the Fourier fit has settled on half the correct period and hence has overlapped the primary and secondary eclipses. Like the DBF class, they are challenging for our analysis, and are more likely to be misclassifed or to have incorrect periods than CBF and CBH. This will be helped in DR2 by our intended use of the BLS algorithm. Nevertheless, most of them are correctly classified even in the current analysis.
PULSE: Pulsating stars showing the classical sawtooth lightcurve, regardless of period. They are expected to include both RR Lyrae and Scuti stars, and some Cepheids. These classes are resolvable based on period, color, amplitude, and the phase offsets of the various Fourier terms. The pulsating stars are in many ways the most interesting class, since they contain the RR Lyrae and Cephied stars useful for distance determination. Accordingly, we put a great deal of effort into producing a clean set of accurately classified stars. Nearly 6000 were screened by hand to check the machine classification. The final misclassification rate should be as low as 1%. Most of the misclassifications are likely to be at the longest periods, where there may be some confusion with spotted rotators; and among the faintest objects, where low SNR made the classification difficult.
MPULSE: Stars showing modulated pulsation, such that the Fourier fit has settled on a period double or triple the actual pulsation, in order to render multiple pulses of different amplitudes or shapes. These objects could be multimodal or Blazhkoeffect stars, or stars exhibiting some other kind of variability in addition to their pulsations. In the case of the known highamplitude Scuti (HADS) star CSS_J082237.3+030441, we have confirmed multiple pulsation modes using targeted highprecision photometry with the University of Hawaii 2.2 meter telescope on Mauna Kea.
SINE: Sinusoidal variables. These stars exhibit simple sinewave variability with little residual noise. Ellipsoidal variables likely dominate this class. There may also be some RR Lyrae stars of type C, especially at faint magnitudes where the lower SNR makes it difficult to detect the nonsinusoidal nature of their lightcurves. Spotted rotators can also show sinusoidal variations: stars whose rotation axis is only modestly inclined to our line of sight may have circumpolar or nearcircumpolar spots, which will produce sinusoidal variations due to their changing aspect ratio as long as the inclination is nonzero.
MSINE: Stars showing modulated sinusoids. These are exactly analogous to the MPULSE stars, except that instead of a classic sawtooth pulse lightcurve, the fundamental waveform being modulated is a simple sinusoid. Thus, MSINE stars may show 2, 3, 4, 5, or even 6 cycles through the Fourier fit. Each cycle appears a good approximation to a sine wave, but the amplitude and/or mean magnitude varies from one to the next. Physically, the MSINE stars may include spotted ellipsoidal variables; rotating stars with evolving spots; and sinusoidal pulsators such as RR Lyrae (RRC) stars that have multiple modes or multiple types of variability. Period, color, and amplitude, as well as the exact form of the modulation, will likely elucidate the more detailed classification.
NSINE: Sinusoidal variables with much residual noise, or with evidence of additional variability not captured in the fit. Many spotted rotators with evolving spots likely fall into this class, as well as faint or lowamplitude Scuti stars and ellipsoidal variables.
MIRA: Mira variables. These stars are a subset of the LPV’s that have photometric amplitudes exceeding 2.0 magnitudes in either the cyan or orange filter. They generally show extremely coherent periodicity, but the twoyear temporal baseline of our data may in many cases be insufficient to solve for the period accurately.
SHAV: These are the slow highamplitude variables, an extremely rare class with long periods and Miralike amplitudes, but with color insufficiently red for a true Mira. Only 17 of these were identified in our entire catalog. They include AGN, R Coronae Borealis stars, and at least one apparent nova. As almost all of them are known (one of them is the archetypal variable R Cor Bor itself!), we do not have a figure showing unknown SHAV stars.
LPV and IRR: The acronyms stand for ‘long period’ and ‘irregular’ variables. These classes serve as ‘catchall’ bins for objects that do not seem to fit into any of our more specific categories. The LPV class contains objects whose variations appear to be dominated by low frequencies, corresponding to days, while the IRR class contains objects whose dominant frequencies are higher. Most of the stars classified as LPV or IRR (especially the latter) don’t show coherent variations that can be folded cleanly with a single period: hence, both classes are in some sense ‘irregular’, though the characteristic timescales are different. A characteristic timescale (i.e. a dominant frequency) is usually present even though the data cannot be cleanly phased. This timescale likely corresponds to some astrophysical reality such as a rotation, orbital, or pulsation period. Both the LPV and IRR classes contain a significant minority of objects with coherent variations that can be cleanly phased with a single period. That such objects end up in our ‘catchall’ categories indicates that their periodic waveform, though coherent, is not a good match to any of our more specific classifications. Some of these may simply be faint or noisy examples of variables that should have fallen into one of the specific ATLAS categories, but were not identified by the machine classifier due to the low significance of the signal. However, others are likely periodic variables of well defined astrophysical types that don’t fit any of the ATLAS classes — e.g. spotted rotators such as BY Draconis variables. Among the objects that cannot be cleanly phased to a single period, the LPV class surely includes many semiregular red giant variables, while the IRR class has a large number of cataclysmic binaries.
While a large majority of objects in both the LPV and IRR classes are expected to be true variables, a larger fraction may be spurious than in the foregoing, more specific categories. Readers interested in studying these objects using our catalog can easily select a purer sample of true variables. A very clean (though greatly reduced) sample would be obtained by simply requiring ddcstat=1. Alternatively, a more sophisticated selection could use thresholds on amplitude, PPFAP, the ratio of the raw RMS scatter to the residual RMS from the best fit, Hday, Hlong, or any of a number of other useful statistics described in §9. Examples of database queries relevant for such selections are given in §10.
STOCH: These are the variables that don’t fit into any coherent periodic class, not even IRR. They would be classified as ‘dubious’ except that they have ddcstat=1, meaning that detections on the difference images demonstrate their genuine variability. Their physical nature is unclear, but many of them do appear to exhibit highly significant stochastic variations with very little coherence on the timescales probed by ATLAS. Some of these may be very highfrequency variables with periods too short to be captured by lombscar or fourierperiod.
4.2 The dubious variables
The majority (more than 90%) of our candidate variables are designated by the machine classifier as ‘dubious’, indicating that the significance of the variability is so low that we cannot be sure they are real. Byhand examination of randomly chosen samples suggests that at least 2%, and probably 5–10%, of these stars are actually real variables that were too faint or lowamplitude to classify definitively. These include faint RR Lyrae stars, eclipsing binaries, spotted rotators, and other classes. Many of these will likely be raised to the status of definitive variables in ATLAS DR2. Figure 19 shows the best 2% from a randomly chosen sample of 450 ‘dubious’ stars screened by hand. We are confident that all of them are true variables.
If our higher estimate of a 10% variability fraction holds, as many real variables are to be found in our ‘dubious’ category as in all the others put together. Even taking the minimum value of 2%, we have more than 80,000 true variables classified as ‘dubious’. This illustrates a common shortcoming of machine classification: in order to obtain a fairly pure sample of true positives, as we have done, one must accept that a considerable number of good objects will be discarded. However, researchers interested in finding ‘gold in the minetailings’ can likely use some of the 169 variability features we have calculated for each star to identify many of the objects for which the ‘dubious’ classification was incorrect. Additionally, we have preserved for every star the probabilities output by the machine learning for every class, including the aggregate ‘HARD’ class (see §4). Presumably most of the ‘dubious’ stars that are real variables will have probabilities significantly below 1.0 for ‘dubious’ and/or ‘HARD’. It would be easy, for example, to perform a database query (see §10) that would select all the stars that were classified as ‘dubious’ but had a probability less than 0.6 for the ‘dubious’ class — or, e.g., had a PULSE probability greater than 0.2. Such queries can likely be used not only to find real variables in our ‘dubious’ category, but even real variables of specific types.
5 Connection of ATLAS Variables with Astrophysics
5.1 Fourier Phase Offsets
Our current analysis using machine learning augmented with screening by hand has barely scratched the surface of the rich data mine comprising the 169 features we calculate for each variable star, much less the photometric data itself. Figure 20 gives one example of such rich information. The plot shows amplitude vs. period for the variables in the PULSE category, with each star color coded according to the phase offset of its first two Fourier terms. The astrophysical sequences of the RRab and RRc stars are clearly resolved, as are the Scuti variables. RRc variables have smaller phase offsets indicative of their more symmetrical lightcurves, while the similar colors of the Scuti and RRab stars indicate their similar, highly asymmetrical sawtooth lightcurves.
In the lower right of the plot is a remarkable cluster of variables with small amplitudes ( mag, periods from 1–5 days, and phase shifts clustered near . These are lightcurves of a very different type, and may indicate a new class of variables. Whether they are actually pulsators is unclear. These objects are discussed further in 6.1.
In constrast to the rich diversity of phase offsets among pulsating variables, Figure 21 shows that eclipsing binaries in our CBH class almost all have phase offsets near or , indicating that the minima of the first and second Fourier terms are being aligned to produce a relatively deep and narrow eclipse. They are clearly distinguishable from the RRc stars in this analysis, though the two types are commonly confused.
Figure 22 illustrates even finer substructure by plotting the phase offsets of the 2nd, 3rd, and 4th Fourier term relative to the first term for pulsating variables fit with the maximum number of six Fourier terms. As described in §4, the maximum possible offset decreases for higherorder Fourier terms, being always equal to 360 for the mth Fourier term. The differences between RRab and RRc type variables seen in Figure 20 appear here as well, as do the similarities between the RRab and Scuti stars. Additional substructure in each of these groups also appears, indicating systematic changes in lightcurve shape with period. New, less populated sequences appear at longer periods, which may correspond to Cepheids and W Virginis variables.
5.2 Astrophysical Nature of SINE and NSINE Classes
Even without phase offset information, which is not applicable to stars fit with a pure sinusoid, we can use features of the amplitude vs. period distributions to probe the astrophysical nature of SINE and NSINE stars. Figure 23 shows the amplitude and period for the SINE and NSINE stars compared with CBH and PULSE.
Several interesting facts emerge. A considerable number of RRc variables, but very few RRab, have been classified as SINE. This is not surpising since the lightcurves of RRc stars are much more sinusoidal than the strong sawtooth curves of the RRab — a fact that is also indicated by Figure 20. Interestingly, the number of RRc stars classified as NSINE is much smaller even though the NSINE class is far more populous than SINE.
A clear vertical edge is seen at a period of about 0.11 days in the CBH, SINE, and NSINE plots. This corresponds to the well known 0.22 day orbital period cutoff for contact binaries (Drake et al., 2014b; Soszyński et al., 2015), and indicates that many of the SINE and NSINE stars are close binary systems. The SINE and especially NSINE classes extend to far lower amplitudes than CBH. Although some of the high amplitude SINE and NSINE variables are certainly misclassified eclipsing binaries, the lowamplitude majority are mostly ellipsoidal variables in systems astrophysically similar to the CBH stars but insufficiently inclined to our line of sight to exhibit eclipses.
Comparison between PULSE, SINE, and NSINE distributions in Figure 23 strongly suggests that the shortestperiod Scutis in our sample have been designated SINE or NSINE. The likely explanation is that their sawtooth waveforms contained frequencies outside the range of our Fourier analysis.
The lowestamplitude SINE and especially NSINE stars at periods longer than 0.3 days are likely to be spotted rotators rather than ellipsoidal variables. Additionally, many SINE and NSINE stars have periods longer than the maximum covered by Figure 23. We believe most of these longerperiod stars are spotted rotators, although some ellipsoidal variables will exist among them, especially in the the SINE class.
5.3 The ColorDependent ShortPeriod Limit of Eclipsing Binaries
A simplified and approximate analysis using Kepler’s Third Law suggests the period of a contact binary (at least, one with equalmass stars) should scale as , where is the mean density of the stars. Since more massive stars have lower mean densities when on the main sequence, it follows that contact binaries composed of more massive mainsequence stars should have longer periods. Since evolved stars have lower densities than main sequence stars of equal mass, and contact binaries have shorter periods than detached binaries of equal mass, it follows that for stars of a given mass, the shortestperiod binary star will be a contact binary composed of mainsequence stars — and that the period of this shortestperiod binary star will increase with the masses of the components. Since more massive main sequence stars have bluer colors, we predict that a plot of period vs. color for eclipsing binary systems will show a lower envelope in period that increases toward the blue. We test this prediction with Figure 24.
Although the train of reasoning above ignores many details (such as the distinction between contact and overcontact systems, and that fact that mutual interactions tend to cause contact binaries not to be normal mainsequence stars), Figure 24 confirms its basic prediction: from the bluest objects plotted redward to a color of about 0.3 magnitudes, the lower envelope of the period distribution descends sharply. This suggests that, indeed, more massive contact binaries have longer periods.
The trend we have predicted experiences a sharp change at a color of 0.3 mag, and the envelope does not continue steeply descending at redder colors. This is because it has run into the shortperiod limit at of days for contact binaries, which is well known although its astrophysical causes are not understood in detail (Drake et al., 2014b; Soszyński et al., 2015). It seems likely that it has to do with a limiting mass near 0.6 , below which mainsequence binaries are not able to evolve into a state of contact, perhaps due to the greatly reduced efficiency of winddriven angular momentum loss in such low mass stars (Soszyński et al. (2015) and references therein). In this context, it is interesting that the envelope in Figure 24 does continue to descend, albeit much more slowly, from a color of 0.3 mag out to the reddest objects plotted. This may be a clue to the detailed astrophysics of the 0.22 day period limit.
5.4 Galactic Distributions of Variable Classes
It is very interesting to plot the onsky distribution of our different classes of variables in Galactic coordinates. We show such plots for RR Lyrae stars and for all eclipsing binaries (CBF, CBH, DBF, and DBH) in Figure 25. The eclipsing binaries are strongly clustered to the Galactic plane, clearly indicating a disk population. By contrast, the RR Lyrae stars are distributed widely across the sky with very little preference for the Galactic plane, but with a strong concentration toward the Galactic center — the signature of an old, halo population. Although our sensitivity is not sufficient to probe star streams in the outer halo as was done by Drake et al. (2013a, b); Hernitscheck et al. (2016) and Cohen et al. (2017) using much larger telescopes, the faintest RR Lyrae stars plotted in Figure 25 do indicate significant substructure in the halo at distances less than 30 kpc.
While Figure 25 probed disk vs. halo populations, it is interesting also to attempt to divide stars into different luminosity classes based on the characteristics of their variability. From the discussion in §5.3, we would predict that the shortestperiod eclipsing binaries would be lowmass stars with very low intrinsic luminosities, while longerperiod eclipsing binaries should generally be more luminous. Many of the LPVs will be giant stars and hence should be even more luminous than the longerperiod eclipsing binaries, which we expect will mostly still be on or near the main sequence. Finally, the Mira stars, being red supergiants, should have the greatest mean luminosity of any class. Figure 26 shows the onsky distribution of each of these four types of objects in Galactic coordinates.
We would expect that the lowestluminosity objects will be invisble to ATLAS at large distances, and hence all of them will be quite local. If the maximum distance at which we can detect them is only a few times the thickness of the Galactic disk, then the onsky distribution will show only mild clustering toward the Galactic plane. By contrast, more luminous objects that can be seen at distances equal to many times the thickness of the disk should show a stronger clustering in the Galactic plane. The enormous concentration of almost all types of stars in the Galactic bluge should dictate that the onsky distribution of any type of star that is luminous enough to be seen at the distance of the bulge will be strongly concentrated in the direction of the Galactic center. Figure 26 bears all of these expectations out beautifully. While all four types of objects are clearly disk rather than halo populations, their differing luminosities result in very different distributions on the sky, with the Mira variables, being visible at enormous distances, concentrated in the direction of the Galactic center.
6 Interesting and Mysterious Subtypes
6.1 ‘Upsidedown CBH’ Variables
These objects correspond to the possible new class of variables labeled in Figure 20. As implied by their location in that figure, they lowamplitude lightcurves with periods ranging from about 1 to 5 days — and consistent with the clustering of their phase offsets near 90, they have narrow, symmetrical maxima very similar to the minima of CBH eclipsing binaries. Figure 27 shows four of the most representative lightcurves. In the course of byhand screening, we have identified a total of about 70 such objects, but there are probably many more in our catalog.
For lack of a better term, we refer to these objects as ‘upsidedown CBH’ variables, since their lighcurves do indeed look almost exactly like a CBH turned upside down. This inversion explains why they have phase offsets near 90, in contrast to 0 or 180 for real CBH systems. For CBH stars, the minima of the first two Fourier terms align to produce a deep, narrow eclipse, while for the upside down objects, the maxima align to produce a tall, narrow peak. The upsidedown CBH variables are certaintly not any type of eclipsing binary. Our machine classification designates most of them as PULSE. It seems conceivable that they do represent a new type of pulsating variable, although we are not convinced this is the best explanation.
Almost all of these variables are ATLAS discoveries. In the few cases that do appear in the VSX, classifications are not consistent: examples include ‘EW’ (contact binary), ‘ROT’ (spotted rotator), and ‘R’ (binary star with strong reflection effects). Given the shapes of our lightcurves, the ‘EW’ and ‘R’ classifications can be ruled out immediately (the latter because it should produce a pure sinusoid). Since a conspiracy of spots could produce almost any type of waveform, a spotted rotator cannot be ruled out in any particular case. However, the fact that the objects appear to constitute a welldefined class with consistent waveforms does seem to rule out ordinary spotted rotators.
Another possibile explanation is that they are binary systems in which a compact object is accreting material from a giant companion through an optically thick accretion disk. The stream of accreting material could then be making a bright spot where it impacts the accretion disk (presumably setting up a standing shock), and the periodic appearance and disappearence of this spot as the stars orbit one another would explain the variability we see. However, the luminosity of the standing shock would have to be remarkably consistent over time to explain the very coherent lightcurves we have observed for these objects.
6.2 Eclipsing Binaries Showing the O’Connell Effect
Figure 28 shows example eclipsing binaries showing the O’Connell effect (O’Connell, 1951), in which the brightness of the two maxima are different. The astrophysical cause is unknown (Wilsey & Beaky, 2009). There are several hundred such systems in our data.
Drake et al. (2014a) concluded based on lightcurves from the Catalina Sky Survey that the O’Connell effect is probably not caused by starspots, because it remains coherent over periods of years. The ATLAS data support this assessment in a majority of cases. Wilsey & Beaky (2009) note cases where the O’Connell effect is definitely inconstant, and suggest that in these cases it is due to spots — however, they argue that more than one astrophysical cause may be at work. It seems possible that the large number of O’Connell stars in the ATLAS data set may enable the deduction of the true astrophysical explanation(s) through a statistical analysis.
6.3 TwoCycle Modulated Sine Waves
Figure 29 shows representative lightcurves with 2cycle modulated sine waves. There are hundreds of these in our data. It seems most likely that they are ellipsoidal variables or eclipsing binaries with grazing eclipses whose maxima differ in brightness due to the same astrophysical cause that produces the O’Connell effect. If so, their statistics may also provide a clue to the physical nature of the effect. At present, however, we cannot rule out the alternative hypothesis that they are multimode pulsators with very symmetrical lightcurves.
6.4 Eclipsing Binaries that are Apparently Extreme Examples of the O’Connell Effect
Figure 30 shows apparent eclipsing binaries that exhibit the O’Connell effect in such an extreme degree that it seems to call into question the nature of the systems: are they really eclipsing binaries, or are they peculiar multimode pulsators or some other exotic type? There are are at least a few dozen like this in our data.
UV Mon, the only known variable among these four examples, is classifed as an eclipsing binary by VSX and was analyzed in detail by Wilsey & Beaky (2009). The two variables on the right in Figure 30 are even more extreme than UV Mon. Are even they real eclipsing binaries? Soszyński et al. (2016) provide at least tentative support for a ‘yes’ answer: they show the lightcurve of a star (OGLEBLGECL334012), which they classify as a peculiar eclipsing binary and which closely resembles those on the right hand side of Figure 30. In particular, the lightcurve of OGLEBLGECL334012 is almost identical to ATO J330.3774+05.8334 (though the ATLAS star is two magnitudes brighter and has a longer period).
Thus, previous classifications seems to support the idea that objects like those in Figure 30 are bona fide eclipsing binaries. If so, the extreme O’Connell signature that they exhibit should place strong constraints on the astrophysical cause of the effect. For example, it seems improbable that starspots could have sufficient contrast with the photosphere to cause such a strong effect — a conclusion that Wilsey & Beaky (2009) come to even in the milder case of UV Mon.
6.5 Notched Stars
Figure 31 shows examples of a rare type of variable star (only a few tens appear to exist in our data) that we initially interpreted as pulsating stars in eclipsing systems. This is probably not the case, however, since the pulsations would then have to be improbably synchronized with the orbital period.
Of the two known objects in Figure 31, VSX classifies CSS_J154809.4+305438 simply as an Algoltype eclipsing binary, while EQ CMa is classified in VSX as an RS Canum Venaticorumtype binary system, a spotted rotator with eclipses. However, the guide entitled Variable Star Type Designations in VSX (by Sebastián Otero, Christopher Watson and Patrick Wils) states that the lightcurves of RS Canum Venaticorum stars “look like sine waves outside eclipses”, which does not seem to describe the stars in 31. Their astrophysical nature remains mysterious to us.
7 Conclusion
We analyze 142 million stars with magnitudes ranging from 10.5–19 in the ATLAS bands, and with at least 100 photometric measurements each during the first two years of ATLAS operations. Among these stars, we identify 4.7 million candidate variables. For each of these candidates, we calculate 169 variability features. We use 70 of these features as input for a machine classifier that sorts the stars into broad, astrophysically useful classes based on lightcurve morphology.
We briefly explore just a small part of the rich astrophysical potential of these data. We find them to be useful for the detailed categorization of pulsating variables and eclipsing binaries, and for probing the astrophysical nature and Galactic distribution of many variable types. The enormous statistical power of the large, homogeneously analyzed data set may enable the elucidation of longstanding astrophysical mysteries such as the cause of the O’Connell effect (O’Connell, 1951) in eclipsing binaries. Our data may also result in the discovery of new classes of variables (see §6.1).
For all 4.7 million candidate variables, we publically release all of the ATLAS photometry, plus the vector of variability features and the probabilities from our machine classifier. This is the largest catalog of candidate variables yet released with photometry. We hope that many in the astronomical community will query the data through STScI (see §10) and exploit its potential for exciting discoveries and productive synergies with other projects.
8 Acknowledgements
We acknowledge useful discussions with Mark Huber, Dan Huber, Ben Shappee, and Jennifer van Saders. Support for the ATLAS survey was provided by NASA grant NN12AR55G under the guidance of Lindley Johnson and Kelly Fast. This publication makes use of the SIMBAD online database, operated at CDS, Strasbourg, France; the VizieR online database (see Ochsenbein et al., 2000); and the International Variable Star Index (VSX) database, operated at AAVSO, Cambridge, MA, USA.
References
 Alard & Lupton (1998) Alard, C. & Lupton, R. H. 1998, ApJ, 503, 325
 Alcock et al. (1993) Alcock, C., Akerlof, C. W., Allsman, R. A., et al. 1993, Natur, 365, 621
 Alcock et al. (1998) Alcock, C., Allsman, R. A., Alves, D. R., et al. 1998, ApJ, 494, 396
 Alcock et al. (2000) Alcock, C., Allsman, R. A., Alves, D. R., et al. 2000, ApJ, 536, 798
 AlonsoGarcia et al. (2012) AlonsoGarcia, J., Mateo, M., Sen, B., et al. 2012, AJ, 143, 70
 Akerlof et al. (2000) Akerlof, C., Alexandroff, F., Allende Prieto, C., et al. 2000, AJ, 119, 1901
 Bányai et al. (2013) Bányai, E., Kiss, L. L., Bedding, T. R., et al. 2013, MNRAS, 436, 1576
 Benkö et al. (2010) Benkö, J. M., Kolenberg, K., Szabó, R., et al. 2010, MNRAS, 409, 1585
 Bowell et al. (1995) Bowell, E., Koehn, B. W., Howell, S. B., Hoffman, M., & Muinonen, K. 1995, BAAS, 27, 1057
 Chambers et al. (2016) Chambers, K. C., Magnier, E. A., Metcalfe, N., et al. 2016, arXiv:1612.05560
 Cohen et al. (2017) Cohen, J. G., Sesar, B., Bahnolzer, S., et al. 2017, ApJ, 849, 150
 Drake et al. (2013a) Drake, A. J., Catelan, M., Djorgovski, S. G., et al. 2013, ApJ, 763, 5
 Drake et al. (2013b) Drake, A. J., Catelan, M., Djorgovski, S. G., et al. 2013, ApJ, 765, 154
 Drake et al. (2014a) Drake, A. J., Graham, M. J., Djorgovski, S. G., et al. 2014, ApJS, 213, 9
 Drake et al. (2014b) Drake, A. J., Djorgovski, S. G., GarcíaÁlvarez, D., et al. 2014, ApJ, 790, 157
 Flewelling (2013) Flewelling, H. 2013, AAS meeting #221, 215.09
 Flewelling et al. (2016) Flewelling, H., Magnier, E., Chambers, K. et al. 2016, arXiv:1612.05243
 Graham et al. (2018) Graham, M. et al. 2018, AAS meeting #231, 354.16
 Henden et al. (2016) Henden, A. A., Templeton, M., & Terrell, D. 2016, yCat, 2336, 0
 Hernitscheck et al. (2016) Hernitscheck, N., Schlafly, E. F., Sesar, B., et al. 2016, ApJ, 817, 73
 Hoeg et al. (1997) Hoeg, E., Bässgen, G., Bastian, U., et al. 1997, A&A, 323, 57
 Jayasinghe et al. (2018) Jayasinghe, T. J., Kochanek, C. S., Stanek, K. Z., et al. 2018, MNRAS, submitted (arXiv:1803.01001)
 Kinemuchi et al. (2006) Kinemuchi, K., Smith, H. A., Woźniak, P. R., & McKay, T. A. 2006, AJ, 132, 1202
 Kovács et al. (2002) Kovács, G., Zucker, S., & Mazeh, T. 2002, A&A, 391, 369
 Larson et al. (2003) Larson, S., Beshore, E., Hill, R., et al. 2003, DPS, 35, 3604
 Law et al. (2009) Law, N. M., Kulkarni, S. R., Dekany, R. G., et al. 2009, PASP, 121, 1395
 Lomb (1976) Lomb, N. R. 1976, Ap&SS, 39, 447
 Magnier et al. (2016a) Magnier, E. A., Chambers, K. C., Flewelling, H. A., et al. 2016, arXiv:1612.05240
 Magnier et al. (2016b) Magnier, E. A., Schlafly, E. F., Finkbeiner, D. P., et al. 2016, arXiv:1612.05242
 Magnier et al. (2016c) Magnier, E. A., Sweeney, W. E., Chambers, K. C., et al. 2016, arXiv:1612.05244
 Miceli et al. (2008) Miceli, A., Rest, A., Stubbs, C. W., et al. 2008, ApJ, 678, 865
 Mróz et al. (2015) Mróz, P., Udalski, A., Poleski, R., et al. 2015, Acta Astron., 65, 313
 Ochsenbein et al. (2000) Ochsenbein, F., Bauer, P. & Marcout, J. 2000, ApJS, 143, 23O
 O’Connell (1951) O’Connell, D. J. K. 1951, Pub. Riverview College Obs., 2, 85
 Perryman (2013) Perryman, M. A. C. 2003, in Proc. ASP Conf. Ser. Vol. 298, Gaia Spectroscopy: Science and Technology, ed. U. Munari (San Francisco, CA: ASP), 3
 Palaversa et al. (2013) Palaversa, L., Ivezić, Ž., Eyer, L., et al. 2013, AJ, 146, 101
 Pojmański (1997) Pojmański, G. 1997, AcA, 47, 467
 Pojmański (2002) Pojmański, G. 2002, AcA, 52, 397
 Pojmański (2003) Pojmański, G. 2003, AcA, 53, 341
 Pojmański & Maciejewski (2004) Pojmański, G. & Maciejewski, G., 2004, AcA, 54, 153
 Pojmański & Maciejewski (2005) Pojmański, G. & Maciejewski, G., 2005, AcA, 55, 97
 Pojmański et al. (2005) Pojmański, G., Pilecki, B. & Maciejewski, G., 2005, AcA, 55, 275
 Press et al. (1992) Press, W. H., Teukolsky, S.A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical Recipes in C (Second Edition; New York, NY: Cambridge University Press)
 Scargle (1982) Scargle, J. D. 1982, ApJ, 263, 835
 Schechter, Mateo, & Saha (1993) Schechter, P. L., Mateo, M., & Saha, A. 1993, PASP, 105, 1342
 Sesar et al. (2013) Sesar, B., Ivezić, Ž., Stuart, J. S. 2013, AJ, 146, 21
 Sokolovsky et al. (2017) Sokolovsky, K. V. et al. 2017, MNRAS, 464, 274
 Soszyński et al. (2011a) Soszyński, I., Dziembowski, W. A., Udalski, A., et al. 2011, Acta Astron., 61, 1
 Soszyński et al. (2011b) Soszyński, I., Udalski, A., Pietrukowicz, P., et al. 2011, Acta Astron., 61, 285
 Soszyński et al. (2013) Soszyński, I., Udalski, A., Szymański, M. K., et al. 2013, Acta Astron., 63, 21
 Soszyński et al. (2014) Soszyński, I., Udalski, A., Szymański, M. K., et al. 2014, Acta Astron., 64, 177
 Soszyński et al. (2015) Soszyński, I., Stȩpień, K., Pikecki, B., et al. 2015, Acta Astron., 65, 39
 Soszyński et al. (2016) Soszyński, I., Pawlak, M. Pietrukowicz, P., et al. 2016, Acta Astron., 66, 405
 Shappee et al. (2014) Shappee, B. J., Prieto, J. L., Grupe, D., et al. 2014, ApJ, 788, 48
 Soszyński et al. (2017) Soszyński, I., Udalski, A., Szymański, M. M., et al. 2017, Acta Astron., 67, 297
 Stokes et al. (2000) Stokes, G. H., Evans, J. B., Viggh, H. E. M., Shelly, F. C. & Pearce, E. C. 2000, Icarus, 148, 21
 Tonry et al. (2018) Tonry, J. L., Stubbs, C. W., Denneau, L., Stalder, B., & Heinze, A. N. 2018, in press.
 Udalski et al. (1994) Udalski, A., Szymanski, M., Kaluzny, J., et al. 1994, ApJ, 426, 69
 Waters et al. (2016) Waters, C. Z., Magnier, E. A., Price, P. A., et al. 2016, arXiv:1612.05245
 Watson et al. (2006) Watson, C. L., Henden, A. A., & Price, A. 2006, Society for Astronomical Science Annual Syposium, 25, 47
 Wilsey & Beaky (2009) Wilsey, N. J., & Beaky, M. M. 2009, SASS, 28, 107
 Yao et al. (2015) Yao, X., Wang, L., Wang, X. et al. 2015, AJ, 150, 107
 Zhang et al. (2015) Zhang, T.M., Wang, X.F., Chen, J.C., et al. 2015, RAA, 15, 215
9 Appendix A: Description of Variable Star Features
Here we provide a brief description of all the variability features included in the analysis vector we have calculated for each star, as given in the ‘object’ table hosted by STScI (described in §10 below). As described in the text, there are 185 statistics calculated by ATLAS. The database gives 10 additonal columns giving PanSTARRS photometry, and two additional star identification columns, for a total of 197 columns. All of them are described in Table 3.
In Table 3, twocharacter prefixes encode which stage of the ATLAS analysis produced the feature: ‘fp’ means fourierperiod; ‘vf’ means varfeat; ‘df’ means the feature came from our statistical analysis of detections in the difference images; ‘ps’ means from the proximity statistics (i.e., angular distance to nearest neighboring star); and ‘ls’ means lombscar. Features used in our machineclassification are indicated with table note marks: for quantities used without alteration, and for the sine and cosine amplitudes of the Fourier terms, which were converted into amplitude and phase using Equations 5 and 6 before being used in the machine classification.
Column #  Column name  Description 

1  ATO_ID  official ATLAS name (add ATO prefix: e.g. ATO J054.8250+42.1556) 
2  ra  J2000.0 RA in degrees 
3  dec  J2000.0 Dec in degrees 
4  fp_c_pts  Number of band points fourierperiod identified as good 
5  fp_o_pts  Number of band points fourierperiod identified as good 
6  fp_LSperiod  Original period from fourierperiod’s Lomb Scargle periodogram 
7  fp_origLogFAP^{a}^{a}Used for machine classification.  PPFAP for fourierperiod’s Lomb Scargle periodogram 
8  fp_origRMS  RMS scatter of mediansubtracted input magnitudes that 
fourierperiod identified as good  
9  fp_magrms_c^{a}^{a}Used for machine classification.  RMS scatter of mediansubtracted band magnitudes that 
fourierperiod identified as good  
10  fp_magrms_o^{a}^{a}Used for machine classification.  RMS scatter of mediansubtracted band magnitudes that 
fourierperiod identified as good  
11  fp_lngfitper^{a}^{a}Used for machine classification.  Final master period from longperiod Fourier fit (days) 
12  fp_lngfitrms  RMS scatter from final longperiod fit (magnitudes) 
13  fp_lngfitchi  for longperiod Fourier fit 
14  fp_lngfournum^{a}^{a}Used for machine classification.  Number of Fourier terms used in longperiod fit 
15  fp_lngmin_c^{a}^{a}Used for machine classification.  Minimum brightness reached by longperiod fit to the band 
photometry at any time corresponding to an actual measurement (magnitudes)  
16  fp_lngmax_c^{a}^{a}Used for machine classification.  Maximum brightness reached by longperiod fit to the band 
photometry at any time corresponding to an actual measurement (magnitudes)  
17  fp_lngmin_o^{a}^{a}Used for machine classification.  Minimum brightness reached by longperiod fit to the band 
photometry at any time corresponding to an actual measurement (magnitudes)  
18  fp_lngmax_o^{a}^{a}Used for machine classification.  Maximum brightness reached by longperiod fit to the band 
photometry at any time corresponding to an actual measurement (magnitudes)  
19  fp_lngfitrms_c^{a}^{a}Used for machine classification.  RMS scatter of residuals from longperiod fit to the band data (magnitudes) 
20  fp_lngfitrms_o^{a}^{a}Used for machine classification.  RMS scatter of residuals from longperiod fit to the band data (magnitudes) 
21  fp_lngfitchi_c  for longperiod Fourier fit to the band data 
22  fp_lngfitchi_o  for longperiod Fourier fit to the band data 
23  fp_lngconst_c  Constant term in the longperiod fit to the band data (magnitudes) 
24  fp_lngconst_o  Constant term in the longperiod fit to the band data (magnitudes) 
25  fp_sin1_c^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Sine coefficient of first Fourier term in the longperiod fit to 
the band data (magnitudes)  
26  fp_cos1_c^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Cosine coefficient of first Fourier term in the longperiod fit to 
the band data (magnitudes)  
27  fp_sin1_o^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Sine coefficient of first Fourier term in the longperiod fit to 
the band data (magnitudes)  
28  fp_cos1_o^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Cosine coefficient of first Fourier term in the longperiod fit to 
the band data (magnitudes)  
29  fp_PPFAPlong1  PPFAP of residuals after subtraction of the best longperiod 
Fourier fit with one term.  
30  fp_sin2_c^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Sine coefficient of second Fourier term in the longperiod fit to 
the band data (magnitudes)  
31  fp_cos2_c^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Cosine coefficient of second Fourier term in the longperiod fit to 
the band data (magnitudes)  
32  fp_sin2_o^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Sine coefficient of second Fourier term in the longperiod fit to 
the band data (magnitudes)  
33  fp_cos2_o^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Cosine coefficient of second Fourier term in the longperiod fit to 
the band data (magnitudes)  
34  fp_PPFAPlong2  PPFAP of residuals after subtraction of the best longperiod 
Fourier fit with two terms  
35  fp_sin3_c^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 25 and 30 
36  fp_cos3_c^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 26 and 31 
37  fp_sin3_o^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 27 and 32 
38  fp_cos3_o^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 28 and 33 
39  fp_PPFAPlong3  Obvious by analogy to columns 29 and 34 
40  fp_sin4_c^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 25 and 30 
41  fp_cos4_c^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 26 and 31 
42  fp_sin4_o^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 27 and 32 
43  fp_cos4_o^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 28 and 33 
44  fp_PPFAPlong4  Obvious by analogy to columns 29 and 34 
45  fp_hifreq_c  A measure of the relative power in the highfrequency vs lowfrequency 
terms in the longperiod Fourier fit to the band  
46  fp_hifreq_o  A measure of the relative power in the highfrequency vs lowfrequency 
terms in the longperiod Fourier fit to the band  
47  fp_timerev_c  A measure of the degree of invariance of the longperiod Fourier fit 
to the band data with respect to reversal (i.e. mirroring) of the  
timeaxis about the time of minimum light (large value = invariant)  
48  fp_timerev_o  A measure of the degree of invariance of the longperiod Fourier fit 
to the band data with respect to reversal (i.e. mirroring) of the  
timeaxis about the time of minimum light (large value = invariant)  
49  fp_phase180_c  A measure of the degree of invariance of the longperiod Fourier fit 
to the band data with respect to a 180 phaseshift  
(large value = invariant)  
50  fp_phase180_o  A measure of the degree of invariance of the longperiod Fourier fit 
to the band data with respect to a 180 phaseshift  
(large value = invariant)  
51  fp_powerterm_c^{a}^{a}Used for machine classification.  Highestamplitude Fourier term in the longperiod fit to the 
band data  
52  fp_powerterm_o^{a}^{a}Used for machine classification.  Highestamplitude Fourier term in the longperiod fit to the 
band data  
53  fp_domper_c  Period corresponding to fp_powerterm_c (days) 
54  fp_domper_o  Period corresponding to fp_powerterm_o (days) 
55  fp_shortfit^{a}^{a}Used for machine classification.  Was a shortperiod fit performed? (0 = no) 
56  fp_period^{a}^{a}Used for machine classification.  Final master period from shortperiod Fourier fit (days) 
57  fp_fitrms  RMS scatter from final shortperiod fit (magnitudes) 
58  fp_fitchi  for shortperiod Fourier fit 
59  fp_fournum^{a}^{a}Used for machine classification.  Number of Fourier terms used in shortperiod fit 
60  fp_alias  Diurnal alias of final period relative to fp_LSperiod (see Equation 3) 
61  fp_multfac  Multiplication factor of final period relative to fp_LSperiod (see Equation 3) 
62  fp_phaseoff  Offset of final period relative to fp_LSperiod, in cycles over the 
full temporal span of our data  
63  fp_min_c^{a}^{a}Used for machine classification.  Minimum brightness reached by shortperiod fit to the band 
photometry at any time corresponding to an actual measurement (magnitudes)  
64  fp_max_c^{a}^{a}Used for machine classification.  Maximum brightness reached by shortperiod fit to the band 
photometry at any time corresponding to an actual measurement (magnitudes)  
65  fp_min_o^{a}^{a}Used for machine classification.  Minimum brightness reached by shortperiod fit to the band 
photometry at any time corresponding to an actual measurement (magnitudes)  
66  fp_max_o^{a}^{a}Used for machine classification.  Maximum brightness reached by shortperiod fit to the band 
photometry at any time corresponding to an actual measurement (magnitudes)  
67  fp_fitrms_c^{a}^{a}Used for machine classification.  RMS scatter of residuals from shortperiod fit to the band data (magnitudes) 
68  fp_fitrms_o^{a}^{a}Used for machine classification.  RMS scatter of residuals from shortperiod fit to the band data (magnitudes) 
69  fp_fitchi_c  for shortperiod Fourier fit to the band data 
70  fp_fitchi_o  for shortperiod Fourier fit to the band data 
71  fp_const_c^{a}^{a}Used for machine classification.  Constant term in the shortperiod fit to the band data (magnitudes) 
72  fp_const_o^{a}^{a}Used for machine classification.  Constant term in the shortperiod fit to the band data (magnitudes) 
73  fp_sin1^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Sine coefficient of first Fourier term in the shortperiod fit (magnitudes) 
74  fp_cos1^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Cosine coefficient of first Fourier term in the shortperiod fit (magnitudes) 
75  fp_PPFAPshort1  PPFAP of residuals after subtraction of the best shortperiod 
Fourier fit with one term.  
76  fp_sin2^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Sine coefficient of second Fourier term in the shortperiod fit (magnitudes) 
77  fp_cos2^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Cosine coefficient of second Fourier term in the shortperiod fit (magnitudes) 
78  fp_PPFAPshort2  PPFAP of residuals after subtraction of the best shortperiod 
Fourier fit with two terms  
79  fp_sin3^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 73 and 76 
80  fp_cos3^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 74 and 77 
81  fp_PPFAPshort3  Obvious by analogy to columns 75 and 78 
82  fp_sin4^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 73 and 76 
83  fp_cos4^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 74 and 77 
84  fp_PPFAPshort4  Obvious by analogy to columns 75 and 78 
85  fp_sin5^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 73 and 76 
86  fp_cos5^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 74 and 77 
87  fp_PPFAPshort5  Obvious by analogy to columns 75 and 78 
88  fp_sin6^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 73 and 76 
89  fp_cos6^{b}^{b}Used for machine classification after conversion of sine and cosine coefficients to an overall amplitude and phase (see Equations 5 and 6).  Obvious by analogy to columns 74 and 77 
90  fp_PPFAPshort6  Obvious by analogy to columns 75 and 78 
91  fp_hifreq  A measure of the relative power in the highfrequency vs lowfrequency 
terms in the shortperiod Fourier fit  
92  fp_timerev^{a}^{a}Used for machine classification.  A measure of the degree of invariance of the shortperiod Fourier fit 
with respect to reversal (i.e. mirroring) of the timeaxis about the  
time of minimum light (large value = invariant)  
93  fp_phase180^{a}^{a}Used for machine classification.  A measure of the degree of invariance of the shortperiod Fourier fit 
with respect to a 180 phaseshift (large value = invariant)  
94  fp_powerterm^{a}^{a}Used for machine classification.  Highestamplitude Fourier term in the shortperiod fit 
95  fp_domperiod  Period corresponding to fp_powerterm (days) 
96  vf_Nc  Number of band observations 
97  vf_No  Number of band observations 
98  vf_c_med^{a}^{a}Used for machine classification.  Weighted median magnitude 
99  vf_o_med^{a}^{a}Used for machine classification.  Weighted median magnitude 
100  vf_percentile5^{a}^{a}Used for machine classification.  5th percentile of mediansubtracted magnitudes 
101  vf_percentile10^{a}^{a}Used for machine classification.  10th percentile of mediansubtracted magnitudes 
102  vf_percentile25^{a}^{a}Used for machine classification.  25th percentile of mediansubtracted magnitudes 
103  vf_percentile75^{a}^{a}Used for machine classification.  75th percentile of mediansubtracted magnitudes 
104  vf_percentile90^{a}^{a}Used for machine classification.  90th percentile of mediansubtracted magnitudes 
105  vf_percentile95^{a}^{a}Used for machine classification.  95th percentile of mediansubtracted magnitudes 
106  vf_Hday^{a}^{a}Used for machine classification.  A statistic probing the significance of intranight variations 
107  vf_Hlong^{a}^{a}Used for machine classification.  A statistic probing the significance of internight (longterm) variations 
108  vf_wsd  Weighted standard deviation (Sokolovsky et al., 2017) 
109  vf_iqr  Interquartile range (Sokolovsky et al., 2017) 
110  vf_chin^{a}^{a}Used for machine classification.  Reduced = (Sokolovsky et al., 2017) 
111  vf_roms^{a}^{a}Used for machine classification.  Robust median statistic (Sokolovsky et al., 2017) 
112  vf_nxs  Normalized excess variance (Sokolovsky et al., 2017) 
113  vf_nppa  Normalized peaktopeak amplitude (Sokolovsky et al., 2017) 
114  vf_inu^{a}^{a}Used for machine classification.  Inverse von Neumann ratio (Sokolovsky et al., 2017) 
115  vf_WS_I^{a}^{a}Used for machine classification.  WelchStetson I (Sokolovsky et al., 2017) 
116  vf_S_J^{a}^{a}Used for machine classification.  Stetson J (Sokolovsky et al., 2017) 
117  vf_S_K  Stetson K (Sokolovsky et al., 2017) 
118  df_numdet  Number of detections at this location in the difference images 
119  df_medmag  Median magnitude of detections in the difference images (negativegoing 
detections are included by calculating magnitudes from the absolute  
value of the flux)  
120  df_meanmag  Mean magnitude of detections in the difference images (negativegoing 
detections are included by calculating magnitudes from the absolute  
value of the flux)  
121  df_medsig  Median SNR of detections in the difference images 
122  df_meansig  Mean SNR of detections in the difference images 
123  df_r2sig  SNR of second most significant difference image detection 
124  df_r1sig  SNR of most significant difference image detection 
125  df_medchin  Median of PSF fits on the difference images 
126  df_numbright  Number of positivegoing detections on the difference images 
127  df_medPvar  Median value of Pvr (probability of being a variable star) from vartest 
(max=999)  
128  df_meanPvar  Mean value of Pvr (max=999) 
129  df_r2Pvar  Second highest value of Pvr 
130  df_r1Pvar  Highest value of Pvr 
131  df_medPscar  Median value Psc (probability of being a star subtraction residual) 
from vartest (max=999)  
132  df_meanPscar  Mean value Psc (probability of being a star subtraction residual) 
from vartest (max=999)  
133  ps_dist  Angular distance to the nearest star in our PanSTARRS reference catalog 
(arcsec)  
134  ps_dist0  Angular distance to the nearest star in our PanSTARRS reference catalog 
that is at least equally bright(arcsec)  
135  ps_dist2  Angular distance to the nearest star in our PanSTARRS reference catalog 
that is at least two magnitudes brighter (arcsec)  
136  ps_dist4  Angular distance to the nearest star in our PanSTARRS reference catalog 
that is at least four magnitudes brighter (arcsec)  
137  ls_Npt  Number of photometric measurements input to lombscar 
138  ls_Nuse  Number of photometric measurements lombscar identified as good 
139  ls_c_med  Median band magnitude calculated by lombscar 
140  ls_o_med  Median band magnitude calculated by lombscar 
141  ls_Pday  Period output by lombscar (days) 
142  ls_PPFAP  PPFAP from LombScargle periodogram in lombscar 
143  ls_Chin  for Fourier+polynomial fit performed by lombscar 
144  ls_Cchin  for constantmagnitude fit performed by lombscar 
145  ls_Pchin  for polynomialonly fit performed by lombscar 
146  ls_Xchin  for polynomialonly fit performed by lombscar, without 
outlier trimming  
147  ls_Fraclo  Fraction of points with magnitudes more than 5 below the median 
148  ls_Frachi  Fraction of points with magnitudes more than 5 above the median 
149  ls_txclo  Fraction of low outliers with time diff less than 0.06 day 
150  ls_txchi  Fraction of high outliers with time diff less than 0.06 day 
151  ls_Chin_minus_1  for lombscar Fourier fit to alias 
152  ls_Chin_minus_h  for lombscar Fourier fit to alias 
153  ls_Chin_plus_h  for lombscar Fourier fit to alias 
154  ls_Chin_plus_1  for lombscar Fourier fit to alias 
155  ls_Ply1  Linear coefficient of polynomial fit by lombscar (mag/year) 
156  ls_Ply2  Quadratic coefficient of polynomial fit by lombscar (mag/year) 
157  ls_Phgap  biggest time gap with no points in folded lightcurve (fraction of ls_Pday) 
158  ls_D  Period doubling (1 if lombscar output period has been doubled relative to 
the highest peak in the LombScargle periodogram, or 0 if not)  
159  ls_RMS  RMS of residuals from lombscar Fourier fit 
160  ls_F0  Amplitude of lombscar constant Fourier term divided by RMS 
161  ls_F1cos  Amplitude of lombscar cos1 Fourier term divided by RMS 
162  ls_F1sin  Amplitude of lombscar sin1 Fourier term divided by RMS 
163  ls_F2cos  Amplitude of lombscar cos2 Fourier term divided by RMS 
164  ls_F2sin  Amplitude of lombscar sin2 Fourier term divided by RMS 
165  ls_F3cos  Amplitude of lombscar cos3 Fourier term divided by RMS 
166  ls_F3sin  Amplitude of lombscar sin3 Fourier term divided by RMS 
167  ls_F4cos  Amplitude of lombscar cos4 Fourier term divided by RMS 
168  ls_F4sin  Amplitude of lombscar sin4 Fourier term divided by RMS 
169  CLASS  Final ATLAS variable classification 
170  ddcSTAT  Difference image statistic (1=probably variable independent of any other 
information)  
171  proxSTAT  Proximity statistic (1=variability detection probably not caused by blending) 
172  prob_CBF  Machine classifier probability that this star is in the CBH category 
173  prob_CBH  Machine classifier probability that this star is in the CBF category 
174  prob_DBF  Machine classifier probability that this star is in the DBF category 
175  prob_DBH  Machine classifier probability that this star is in the DBH category 
176  prob_HARD  Machine classifier probability that this star is IRR, LPV, or ‘dubious’ 
177  prob_MIRA  Machine classifier probability that this star is in the MIRA category 
178  prob_MPULSE  Machine classifier probability that this star is in the MPULSE category 
179  prob_MSINE  Machine classifier probability that this star is in the MSINE category 
180  prob_NSINE  Machine classifier probability that this star is in the NSINE category 
181  prob_PULSE  Machine classifier probability that this star is in the PULSE category 
182  prob_SINE  Machine classifier probability that this star is in the SINE category 
183  prob_IRR  Machine classifier probability that this star is in the IRR category 
184  prob_LPV  Machine classifier probability that this star is in the LPV category 
185  prob_dubious  Machine classifier probability that this star is in the ‘dubious’ category 
186  gmag  PanSTARRS1 DR1 band magnitude 
187  gerr  Uncertainty on band magnitude 
188  rmag  PanSTARRS1 DR1 band magnitude 
189  rerr  Uncertainty on band magnitude 
190  imag  PanSTARRS1 DR1 band magnitude 
191  ierr  Uncertainty on band magnitude 
192  zmag  PanSTARRS1 DR1 band magnitude 
193  zerr  Uncertainty on band magnitude 
194  ymag  PanSTARRS1 DR1 band magnitude 
195  yerr  Uncertainty on band magnitude 
196  starID  Old version of ATLAS star ID (historical interest only) 
197  objid  Object ID, useful for linkage with the ‘detection’ database 
10 Appendix B: Sample Database Queries
In this appendix we supply instructions for querying the databases described above. Most importantly, we give example queries for the ATLAS variable star databaes presented herein. First, however, we show how to query the PanSTARRS1 DR1 database to produce a catalog like the one we used for matching ATLAS photometric dections to unique stars (see §3.1).
To query either the PanSTARRS or ATLAS databases, begin by going to the website:
Create an account, and login.
10.1 How to perform a PanSTARRS1 DR1 query like the ones we used to construct our object matching catalog
Having logged in to the website given above, click on the ‘Query’ tab. Select PanSTARRS_DR1 from the ‘Context’ dropdown menu. Type distinctive names for ‘Table’ and ‘Task Name’. The name you type under ‘Table’ is very important, since it will be the filename of the catalog produced by your query. Be sure to avoid attempting two different queries with the same table name.
Paste a query based on the example below into the big empty box filling most of the page. Click the ‘Syntax’ button at upper right to check for errors. If the syntax is OK, click the ‘Submit’ button. If you think your query might run very fast, you can use the ‘Quick’ button instead of ‘Submit’, but the PanSTARRS_DR1 database is so huge that almost nothing is quick.
When the query is finished, which could take several hours, click on the ‘MyDB’ tab to see the table that has been produced. Click on the filename and then click the ‘Download’ button that will appear near the middle of a row of buttons at top right. When the download is finished, click the ‘Output’ button to finally access the file that has been produced.
Here is an example query that select objects with ra between 120 and 135 from the PanSTARRS_DR1 database:
SELECT objectThin.objID, raMean, decMean, gMeanKronMag,gMeanKronMagErr, rMeanKronMag,rMeanKronMagErr, iMeanKronMag,iMeanKronMagErr, zMeanKronMag,zMeanKronMagErr, yMeanKronMag,yMeanKronMagErr FROM ObjectThin JOIN MeanObject ON objectThin.uniquePspsOBid = meanObject.uniquePspsOBid JOIN stackObjectThin ON objectThin.objID = stackObjectThin.objID WHERE bestdetection = 1 AND primarydetection = 1 AND ((gKronMag < 19 AND gKronMag > 0) OR (rKronMag < 19 and rKronMag > 0 ) OR (iKronMag < 19 and iKronMag > 0) OR (zKronMag < 19 and zKronMag > 0)) AND raMean >= 120. AND raMean < 135.
10.2 How to Query the ATLAS variable stars databases
Just as for PanSTARRS, begin by logging it at
and use the ‘Context’ dropdown menu to select HLSP_ATLAS_VAR.
There are three databases: object (the catalog of candidate variable stars, with 4.7 million entries); detection (all ATLAS photometric measurements of all the candidate variables, nearly a billion entries); and observation (a catalog of all ATLAS images used for DR1). To get examples of the columns present in each of the three catalogs, try the following three queries using the ‘Quick’ button:
select top 10 * from object select top 10 * from detection select top 10 * from observation
The column names in the detection and observation databases should be mostly selfexplanatory. The detection database gives the time (mjd = Modified Julian Day), celestial coordinates (ra and dec), magnitude and magnitude uncertainty (m and dm) and filter ( or ) for each ATLAS measurement.
Note well that the Modified Julian Day (MJD) values in the detection and observation databases donot have a lighttraveltime correction applied: that is, they are not the Heliocentric or Barycentric MJD. For precision timing or accurate phasing of short period variables, you must apply a lighttraveltime correction to produce HMJD or BMJD. Formulae for this correction are available online and in the Astronomical Almanac.
The object database has a huge number of columns, which are described in §9 above.
Here are some more sophisticated example queries:
Find very shortperiod objects ( days) with very significant intranight variations:
select fp_LSperiod,fp_Period, objid from object where fp_period < 0.05 and vf_hday > 20 and fp_shortfit =1
Get detections for one of the objects identified in the shortperiod query:
select * from detection where objid = 98841395045605667
Select candidate longperiod Cepheids in four stages:
1: PULSE variables with only a longperiod fit:
select * from object where fp_shortfit =0 and CLASS=’PULSE’
2: PULSE variables with a shortperiod fit leading to a master period longer than 6.2 days:
select * from object where fp_shortfit =1 and fp_period > 6.2 and fp_origLogFAP > 20 and CLASS=’PULSE’
3: Variables with a shortperiod fit that are not classified as PULSE but nevertheless have highly significant variations that might be consistent with a Cepheid. We demand highly significant variability by requiring that the PPFAP from fourierperiod’s original periodogram be more than 20, and we demand a coherent periodic fit by requiring that the ratio of raw to residual RMS be at least 4 and that for the shortperiod fit must be less than 5.
select * from object where fp_shortfit =1 and fp_period > 6.2 and fp_origLogFAP > 20 and (fp_origRMS/fp_fitrms) > 4 and fp_fitchi < 5 and CLASS!=’PULSE’
4: Variables with a longperiod fit that are not classified as PULSE but nevertheless have highly significant variations that might be consistent with a Cepheid. Besides using constraints analogous to the shortperiod query, we excluded MIRA stars and other extremely longperiod variables be requiring days.
select * from object where fp_shortfit =0 and fp_lngfitper < 50 and fp_origLogFAP > 20 and (fp_origRMS/fp_lngfitrms) > 4 and fp_lngfitchi < 5 and CLASS!=’PULSE’
The total number of Cepheid candidates matching any of the four queries is only about 600 out of the 4.7 million stars in the database.
Then here are some additional, more general queries:
Find RR Lyrae stars — extract all PULSE variables with periods between 0.3 and 0.9 days:
select ATO_ID, class, fp_period, objid from object where class=’PULSE’ and fp_period between 0.3 and 0.9 order by fp_period
Find irregular stars with huge amplitudes greater than 1.5 magnitudes, for which DDCstat=1, indicating that the statistics from the difference images show the variability is not spurious:
select ATO_ID, class, (fp_min_cfp_max_c) as delta1, (fp_min_ofp_max_o) as delta2 from object where class=’IRR’ and (((fp_min_cfp_max_c)>1.5) or ((fp_min_ofp_max_o)>1.5)) and DDCStat=1 order by delta1, delta2
Find eclipsing binaries with orbital periods below the wellknown cutoff at 0.22 days:
select ATO_ID, class, fp_period from object where (class in (’CBH’,’DBH’) and fp_period < 0.11) or (class in (’CBF’,’DBF’) and fp_period < 0.22) order by class, fp_period
Find SINE stars that might misclassified RRc based on their periods and amplitudes:
select ATO_ID, class, fp_period, fp_min_c  fp_max_c as delta from object where class = ’SINE’ and (fp_period between 0.3 and 0.48 and ((fp_min_c  fp_max_c) between 0.36 and 0.6)) order by fp_period, delta
Find LPV stars that are strong, highamplitude variables:
select ATO_ID, class, fp_origLogFAP, (fp_min_c  fp_max_c) as delta1, (fp_min_o  fp_max_o) as delta2 from object where class=’LPV’ and fp_origLogFAP>20 and (((fp_min_c  fp_max_c)>0.5) or ((fp_min_o  fp_max_o))>0.5) order by fp_origLogFAP, delta1, delta2
Find LPV stars that are probably coherent and regular, regardless of their amplitudes:
select ATO_ID, class, fp_origLogFAP, fp_lngfitper, (fp_lngmin_c  fp_lngmax_c) as delta1, (fp_lngmin_o  fp_lngmax_o) as delta2 from object where class=’LPV’ and fp_origLogFAP>20 and fp_lngfitchi < 5 and fp_powerterm_c=1 and fp_powerterm_o=1