---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.15.2
kernelspec:
  display_name: Python [conda env:tractor]
  language: python
  name: conda-env-tractor-py
---

# Automated Multiband Forced Photometry on Large Datasets
***

## Learning Goals:
By the end of this tutorial, you will be able to:
- get catalogs and images from NASA archives in the cloud where possible
- measure fluxes at any location by running forced photometry using "The Tractor" 
- employ parallel processing to make this as fast as possible
- cross match large catalogs
- plot results

## Introduction:
This code performs photometry in an automated fashion at all locations in an input catalog on 4 bands of IRAC data from IRSA and 2 bands of Galex data from MAST.  The resulting catalog is then cross-matched with a Chandra catalog from HEASARC to generate a multiband catalog to facilitate galaxy evolution studies.

The code will run on 2 different science platforms and makes full use of multiple processors to optimize run time on large datasets.

## Input:
- RA and DEC within COSMOS catalog
- desired catalog radius in arcminutes
- mosaics of that region for IRAC and Galex

## Output:
- merged, multiband, science ready pandas dataframe
- IRAC color color plot for identifying interesting populations

## Non-standard Imports
- `tractor` code which does the forced photometry from Lang et al., 2016
- `astroquery` to interface with archives APIs
- `astropy` to work with coordinates/units and data structures
- `skimage` to work with the images


## Authors:
Jessica Krick, David Shupe, Marziye JafariYazani, Brigitta Sipőcz, Vandana Desai, Steve Groom, Troy Raen


## Acknowledgements:
Kristina Nyland for the workflow of the tractor wrapper.\
MAST, HEASARC, & IRSA Fornax teams


```{code-cell} ipython3
#ensure all dependencies are installed
!pip install -r requirements.txt
```

```{code-cell} ipython3
# standard lib imports

import math
import time
import warnings
import concurrent.futures
import sys
import os
import re
from typing import NamedTuple

# Third party imports

import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import rotate
import pandas as pd
import seaborn as sns
import statsmodels
import mpld3
from tqdm import tqdm

from astropy.nddata import Cutout2D
from astropy.io import fits
from astropy.table import Table
from astropy.coordinates import SkyCoord
import astropy.units as u

from astroquery.ipac.irsa import Irsa
from astroquery.heasarc import Heasarc
from astroquery.mast import Observations
import pyvo

# Local code imports
sys.path.append('code_src/')

from display_images import display_images
import cutout
from exceptions import TractorError
import photometry
from plot_SED import plot_SED
from nway_write_header import nway_write_header
from photometry import Band
from photometry import lookup_img_pair
#from prepare_prf import prepare_prf

# temporarily let the notebook start without tractor as dependency
try:
    from find_nconfsources import find_nconfsources
except ImportError:
    print("tractor is missing")
    pass


%matplotlib inline
```

## 1. Retrieve Initial Catalog from IRSA
- Automatically set up a catalog with ra, dec, photometric redshifts, fiducial band fluxes, & probability that it is a star  
- Catalog we are using is COSMOS2015 (Laigle et al. 2016)  
- Data exploration

```{code-cell} ipython3
#pull a COSMOS catalog from IRSA using pyVO

#what is the central RA and DEC of the desired catalog
coords = SkyCoord('150.01d 2.2d', frame='icrs')  #COSMOS center acording to Simbad

#how large is the search radius, in arcmin
radius = 48.0 * u.arcmin  

#specify only select columns to limit the size of the catalog
cols = [
    'ra', 'dec', 'id', 'Ks_FLUX_APER2', 'Ks_FLUXERR_APER2', 'PHOTOZ', 'SPLASH_1_MAG',
    'SPLASH_1_MAGERR', 'SPLASH_1_FLUX', 'SPLASH_1_FLUX_ERR', 'SPLASH_2_FLUX',
    'SPLASH_2_FLUX_ERR', 'SPLASH_3_FLUX', 'SPLASH_3_FLUX_ERR', 'SPLASH_4_FLUX',
    'SPLASH_4_FLUX_ERR', 'FLUX_GALEX_NUV', 'FLUX_GALEX_FUV', 'FLUX_CHANDRA_05_2',
    'FLUX_CHANDRA_2_10', 'FLUX_CHANDRA_05_10', 'ID_CHANDRA09 ', 'type', 'r_MAG_AUTO',
    'r_MAGERR_AUTO', 'FLUX_24', 'FLUXERR_24', 'MAG_GALEX_NUV', 'MAGERR_GALEX_NUV',
    'MAG_GALEX_FUV', 'MAGERR_GALEX_FUV'
]

tap = pyvo.dal.TAPService('https://irsa.ipac.caltech.edu/TAP')
result = tap.run_sync("""
           SELECT {}
           FROM cosmos2015
           WHERE CONTAINS(POINT('ICRS',ra, dec), CIRCLE('ICRS',{}, {}, {}))=1
    """.format(','.join(cols), coords.ra.value, coords.dec.value, radius.to(u.deg).value))
cosmos_table = result.to_table()


print("Number of objects: ", len(cosmos_table))
```

### 1a. Filter Catalog
- if desired could filter the initial catalog to only include desired sources

```{code-cell} ipython3
#an example of how to filter the catalog to 
#select those rows with either chandra fluxes or Galex NUV fluxes

#example_table = cosmos_table[(cosmos_table['flux_chandra_05_10']> 0) | (cosmos_table['flux_galex_fuv'] > 0)]

```

## 2. Retrieve Image Datasets from the Cloud

+++

#### Use the fornax cloud access API to obtain the IRAC data from the IRSA S3 bucket. 

Details here may change as the prototype code is being added to the appropriate libraries, as well as the data holding to the appropriate NGAP storage as opposed to IRSA resources.

```{code-cell} ipython3
# Temporary solution, remove when the fornax API is added to the image
# This relies on the assumption that https://github.com/fornax-navo/fornax-cloud-access-API is being cloned to this environment. 
# If it's not, then run a ``git clone https://github.com/fornax-navo/fornax-cloud-access-API --depth=1`` from a terminal at the highest directory root.
# You may need to update the fork if you forked it in the past

import os
if not os.path.exists('../../fornax-cloud-access-API'):
    ! git clone https://github.com/fornax-navo/fornax-cloud-access-API --depth=1 ../../fornax-cloud-access-API
```

```{code-cell} ipython3
sys.path.append('../../fornax-cloud-access-API')

import pyvo
import fornax
```

```{code-cell} ipython3
# Getting the COSMOS address from the registry to follow PyVO user case approach. We could hardwire it.
image_services = pyvo.regsearch(servicetype='image')
irsa_cosmos = [s for s in image_services if 'irsa' in s.ivoid and 'cosmos' in s.ivoid][0]

# The search returns 11191 entries, but unfortunately we cannot really filter efficiently in the query
# itself (https://irsa.ipac.caltech.edu/applications/Atlas/AtlasProgramInterface.html#inputparam)
# to get only the Spitzer IRAC results from COSMOS as a mission. We will do the filtering in a next step before download.
cosmos_results = irsa_cosmos.search(coords).to_table()

spitzer = cosmos_results[cosmos_results['dataset'] == 'IRAC']
```

```{code-cell} ipython3
# Temporarily add the cloud_access metadata to the Atlas response. 
# This dataset has limited acces, thus 'region' should be used instead of 'open'.
# S3 access should be available from the daskhub and those who has their IRSA token set up.

fname = spitzer['fname']
spitzer['cloud_access'] = [(f'{{"aws": {{ "bucket_name": "irsa-mast-tike-spitzer-data",'
                            f'              "region": "us-east-1",'
                            f'              "access": "restricted",'
                            f'              "key": "data/COSMOS/{fn}" }} }}') for fn in fname]
```

```{code-cell} ipython3
# Adding function to download multiple files using the fornax API. 
# Requires https://github.com/fornax-navo/fornax-cloud-access-API/pull/4
def fornax_download(data_table, data_subdirectory, access_url_column='access_url',
                    fname_filter=None, verbose=False):
    working_dir = os.getcwd()
    
    data_directory = f"data/{data_subdirectory}"
    os.chdir(data_directory)
    for row in data_table:
        if fname_filter is not None and fname_filter not in row['fname']:
            continue
        handler = fornax.get_data_product(row, 'aws', access_url_column=access_url_column, verbose=verbose)
        temp_file = handler.download()
        # on-prem download returns temp file path, s3 download downloads file
        if temp_file:
            os.rename(temp_file, os.path.basename(row['fname']))
        
    os.chdir(working_dir)
```

```{code-cell} ipython3
fornax_download(spitzer, access_url_column='sia_url', fname_filter='go2_sci', 
                data_subdirectory='IRAC', verbose=False)
```

#### Use IVOA image search and Fornax download to obtain Galex from the MAST archive

```{code-cell} ipython3
#the Galex mosaic of COSMOS is broken into 4 seperate images
#need to know which Galex image the targets are nearest to.
#make a new column in dataframe which figures this out

#four centers for 1, 2, 3, 4 are
ra_center=[150.369,150.369,149.869,149.869]
dec_center=[2.45583,1.95583,2.45583,1.95583]

galex = SkyCoord(ra = ra_center*u.degree, dec = dec_center*u.degree)
catalog = SkyCoord(ra = cosmos_table['ra'], dec = cosmos_table['dec'])
#idx, d2d, d3d = match_coordinates_sky(galex, catalog)  #only finds the nearest one
#idx, d2d, d3d = galex.match_to_catalog_sky(catalog)  #only finds the nearest one

cosmos_table['COSMOS_01'] = galex[0].separation(catalog)
cosmos_table['COSMOS_02'] = galex[1].separation(catalog)
cosmos_table['COSMOS_03'] = galex[2].separation(catalog)
cosmos_table['COSMOS_04'] = galex[3].separation(catalog)

#convert to pandas
df = cosmos_table.to_pandas()

#which row has the minimum value of distance to the galex images
df['galex_image'] = df[['COSMOS_01','COSMOS_02','COSMOS_03','COSMOS_04']].idxmin(axis = 1)
```

```{code-cell} ipython3
# 76k with 15arcmin diameter IRAC images
df.describe()
```

```{code-cell} ipython3
# Assign the endpoint for the MAST Galex Simple Image Access service.
galex_sia_url = 'https://mast.stsci.edu/portal_vo/Mashup/VoQuery.asmx/SiaV1?MISSION=GALEX&'

# Query the Galex SIA service using the COSMOS coordinates defined above.
query_result = pyvo.dal.sia.search(galex_sia_url, pos=coords, size=0.0)
galex_image_products = query_result.to_table()

# Filter the products so we only download SCIENCE products with exposure time > 40000
filtered_products = galex_image_products[(galex_image_products['timExposure'] > 40000.0)]
filtered_products = filtered_products[(filtered_products['productType'] == "SCIENCE")]

# The column containing the on-prem access to fall back on if cloud is unavailable
access_url_column = query_result.fieldname_with_ucd('VOX:Image_AccessReference')

# Download filtered products from AWS
download_subdir = 'Galex'
fornax_download(filtered_products, access_url_column=access_url_column, data_subdirectory=download_subdir)
```

```{code-cell} ipython3
# Get the GALEX sky background fits files in addition to the mosaics
skybg_products = []
skybkg_pattern = re.compile(r"COSMOS_0[1-4]-[fn]d-skybg")

for row in galex_image_products:
    if skybkg_pattern.search(row['name']):
        skybg_products.append(row)

# Download skybg products from AWS
download_subdir = 'Galex'
fornax_download(skybg_products, access_url_column=access_url_column, data_subdirectory=download_subdir)
```

```{code-cell} ipython3
#make sure there aren't any troublesome rows in the catalog
#are there missing values in any rows?
df.isna().sum()

#don't mind that there are missing values for some of the fluxes
#The rest of the rows are complete
```

```{code-cell} ipython3
#out of curiosity how many of each type of source are in this catalog
#Type: 0 = galaxy, 1 = star, 2 = X-ray source, -9 is failure to fit
df.type.value_counts()
```

## 3.  Run Forced Photometry
- Calculate a flux at a given position in 2 IRAC and 2 Galex bands

+++

### 3a. Setup
- initialize data frame columns to hold the results
- collect the parameters for each band/channel
- collect the input images

```{code-cell} ipython3
# initialize columns in data frame for photometry results
cols = ["ch1flux", "ch1flux_unc", "ch2flux", "ch2flux_unc", "ch3flux", "ch3flux_unc",
        "ch4flux", "ch4flux_unc", "ch5flux", "ch5flux_unc", "ch6flux", "ch6flux_unc"]
df[cols] = 0.0

# list to collect all the bands
all_bands = []
```

```{code-cell} ipython3
# IRAC channels

irac_band_indexes = [
    0,  # ch1
    1,  # ch2
    2,  # ch3
    3,  # ch4
]

irac_fluxconversion = (1E12) / (4.254517E10) * (0.6) *(0.6)

irac_mosaic_pix_scale = 0.6

irac_cutout_width = 10 # in arcseconds, taken from Nyland et al. 2017

irac_prfs = [
    fits.open('data/IRAC/PRF_IRAC_ch1.fits')[0].data,
    fits.open('data/IRAC/PRF_IRAC_ch2.fits')[0].data,
    fits.open('data/IRAC/PRF_IRAC_ch3.fits')[0].data,
    fits.open('data/IRAC/PRF_IRAC_ch4.fits')[0].data,
]

# zip parameters for each band into a container and append to the master list
irac_bands = [
    Band(
        idx, prf, irac_cutout_width, irac_fluxconversion, irac_mosaic_pix_scale
    )
    for idx, prf in zip(irac_band_indexes, irac_prfs)
]
all_bands += irac_bands
```

```{code-cell} ipython3
# GALEX bands

galex_band_indexes = [
    4,  # nuv
    5,  # fuv
]

galex_cutout_width = 40

galex_fluxconversions = [
    3.373E1,  # uJy. fudging this to make the numbers bigger for plotting later
    1.076E2,  # uJy. fudging this to make the numbers bigger for plotting later
]

galex_mosaic_pix_scale = 1.5

prf_nuv = fits.open("data/Galex/PSFnuv_faint.fits")[0].data
prf_fuv = fits.open("data/Galex/PSFfuv.fits")[0].data
prf_nuv = prf_nuv[0:119, 0:119]
prf_fuv = prf_fuv[0:119, 0:119]

#these are much larger than the cutouts we are using, so only keep the central region which is the size of our cutouts
ngalex_pix = galex_cutout_width / galex_mosaic_pix_scale
prf_cen = int(60)
prf_nuv = prf_nuv[(prf_cen - int(ngalex_pix / 2) - 1) : (prf_cen + int(ngalex_pix / 2)),
                  (prf_cen - int(ngalex_pix / 2) - 1) : (prf_cen + int(ngalex_pix / 2))]
prf_fuv = prf_fuv[(prf_cen - int(ngalex_pix / 2) - 1) : (prf_cen + int(ngalex_pix / 2)),
                  (prf_cen - int(ngalex_pix / 2) - 1) : (prf_cen + int(ngalex_pix / 2))]
galex_prfs = [prf_nuv, prf_fuv]

# zip parameters for each band into a container and append to the master list
galex_bands = [
    Band(
        idx, prf, galex_cutout_width, flux_conv, galex_mosaic_pix_scale
    )
    for idx, prf, flux_conv in zip(galex_band_indexes, galex_prfs, galex_fluxconversions)
]
all_bands += galex_bands
```

```{code-cell} ipython3
#Collect input images
# collect the files in pairs: (science image, sky-background image)
# if the same file should be used for both, just send it once
sci_bkg_pairs = [
    # IRAC. use the science image to calculate the background
    ('data/IRAC/irac_ch1_go2_sci_10.fits', ),
    ('data/IRAC/irac_ch2_go2_sci_10.fits', ),
    ('data/IRAC/irac_ch3_go2_sci_10.fits', ),
    ('data/IRAC/irac_ch4_go2_sci_10.fits', ),
    # GALEX. calculate the background from a dedicated file
    ('data/Galex/COSMOS_01-nd-int.fits.gz', 'data/Galex/COSMOS_01-nd-skybg.fits.gz'),
    ('data/Galex/COSMOS_01-fd-int.fits.gz', 'data/Galex/COSMOS_01-fd-skybg.fits.gz'),
    ('data/Galex/COSMOS_02-nd-int.fits.gz', 'data/Galex/COSMOS_02-nd-skybg.fits.gz'),
    ('data/Galex/COSMOS_02-fd-int.fits.gz', 'data/Galex/COSMOS_02-fd-skybg.fits.gz'),
    ('data/Galex/COSMOS_03-nd-int.fits.gz', 'data/Galex/COSMOS_03-nd-skybg.fits.gz'),
    ('data/Galex/COSMOS_03-fd-int.fits.gz', 'data/Galex/COSMOS_03-fd-skybg.fits.gz'),
    ('data/Galex/COSMOS_04-nd-int.fits.gz', 'data/Galex/COSMOS_04-nd-skybg.fits.gz'),
    ('data/Galex/COSMOS_04-fd-int.fits.gz', 'data/Galex/COSMOS_04-fd-skybg.fits.gz'),
]

```

### 3b. Main Function to do the Forced Photometry

```{code-cell} ipython3
def calc_instrflux(band, ra, dec, stype, ks_flux_aper2, img_pair, df):
    """
    Calculate single-band instrumental fluxes and uncertainties at the given ra, dec 
    using tractor.

    Parameters:
    -----------
    band : `Band`
        Collection of parameters for a single band. 
        A `Band` is a named tuple with the following attributes:
            idx : int
                Identifier for the band/channel.
                (integer in [0, 1, 2, 3, 4, 5] for the four IRAC bands and two Galex bands)
            prf : np.ndarray
                Point spread function for the band/channel.
            cutout_width : int
                width of desired cutout in arcseconds
            flux_conv : float
                factor used to convert tractor result to microjanskies
            mosaic_pix_scale : float
                Pixel scale of the image
    ra, dec : float
        celestial coordinates for measuring photometry
    stype : int
        0, 1, 2, -9 for star, galaxy, x-ray source
    ks_flux_aper_2 : float
        flux in aperture 2
    img_pair : tuple
        Pair of images for science and background respectively.
        If the tuple only contains one element it will serve double duty.
        A tuple element can be a `fits.ImageHDU` or the path to a FITS file as a `str`.
    df : pd.DataFrame
        Source catalog.
        Previous arguments (ra, dec, stype, ks_flux_aper_2) come from a single row of this df.
        However, we must also pass the entire dataframe in order to find nearby sources which are possible contaminates.

    Returns:
    --------
    outband : int
        reflects the input band index for identification purposes
    flux : float
        Measured flux in microJansky.
        NaN if the forced photometery failed.
    flux_unc : float
        Flux uncertainty in microJansky, calculated from the tractor results.
        NaN if the forced photometery failed or if tractor didn't report a flux variance.
    """
    # cutout a small region around the object of interest
    subimage, bkgsubimage, x1, y1, subimage_wcs = cutout.extract_pair(
        ra, dec, img_pair=img_pair, cutout_width=band.cutout_width, mosaic_pix_scale=band.mosaic_pix_scale
    )
    
    # find nearby sources that are possible contaminants
    objsrc, nconfsrcs = find_nconfsources(
        ra, dec, stype, ks_flux_aper2, x1, y1, band.cutout_width, subimage_wcs, df
    )

    # estimate the background
    skymean, skynoise = photometry.calc_background(bkgsubimage=bkgsubimage)

    # do the forced photometry
    # if tractor fails to converge, just return NaNs
    try:
        flux_var = photometry.run_tractor(
            subimage=subimage, prf=band.prf, objsrc=objsrc, skymean=skymean, skynoise=skynoise
        )
    except TractorError:
        return (band.idx, np.nan, np.nan)

    # convert the results
    microJy_flux, microJy_unc = photometry.interpret_tractor_results(
        flux_var=flux_var, flux_conv=band.flux_conv, objsrc=objsrc, nconfsrcs=nconfsrcs
    )

    return (band.idx, microJy_flux, microJy_unc)
```

### 3c. Calculate Forced Photometry with Straightforward but Slow Method
- no longer in use but keeping to demonstrate this capability\
as well as the increase in speed when going to parallelization

```{raw-cell}
%%time
#do the calculation without multiprocessing for benchmarking

#make a copy for parallel computation
pl_df = df.copy(deep=True)

t0 = time.time()
#for each object
for row in df.itertuples():
    #for each band
    for band in range(6):
        #measure the flux with tractor
        outband, flux, unc = calc_instrflux(band, row.ra, row.dec, row.type, row.ks_flux_aper2)
        #put the results back into the dataframe
        df.loc[row.Index, 'ch{:d}flux'.format(outband+1)] = flux
        df.loc[row.Index, 'ch{:d}flux_unc'.format(outband+1)] = unc
        #print(row.ra, row.dec, row.type, row.ks_flux_aper2, band+1,
        #      outband, flux, unc)
t1 = time.time()


#10,000 sources took 1.5 hours with this code on the IPAC SP
```

### 3d. Calculate Forced Photometry - Parallelization
- Parallelization: we can either interate over the rows of the dataframe and run the four bands in parallel; or we could zip together the row index, band, ra, dec, 

```{code-cell} ipython3
paramlist = []
for row in df.itertuples():
    for band in all_bands:
        img_pair = lookup_img_pair(sci_bkg_pairs, band.idx, row.galex_image)  # file paths only
        paramlist.append(
            [row.Index, band, row.ra, row.dec, row.type, row.ks_flux_aper2, img_pair, df]
        )
print('paramlist: ', len(paramlist))
```

```{code-cell} ipython3
#proove we can do this for one object
calc_instrflux(*paramlist[0][1:])

#same thing, different syntax
# calc_instrflux(paramlist[0][1], paramlist[0][2], paramlist[0][3], paramlist[0][4], paramlist[0][5], paramlist[0][6])
```

```{code-cell} ipython3
#wrapper to measure the photometry on a single object, single band
def calculate_flux(args):
    """Calculate flux."""
    val = calc_instrflux(*args[1:])
    # add simple logging
    if (args[0] % 100) == 0 and val[0] == 0:
        with open('output/output.log', 'a') as fp: fp.write(f'{args[0]}\n')
    return(args[0], val)
```

```{code-cell} ipython3
# if results were previously saved to this location, load them
# else start a pool of workers to calculate results in parallel, and save them here
fname = f'output/results_{radius.value}.npz'

if os.path.exists(fname):
    results = np.load(fname, allow_pickle=True)['results']

else:
    from  multiprocessing import Pool
    t0 = time.time()
    with open('output/output.log', 'w') as fp: fp.write('')
    with Pool() as pool:
        results = pool.map(calculate_flux, paramlist)
    dtime = time.time() - t0
    np.savez(fname, results=results)
    print(f'Parallel calculation took {dtime} seconds')
```

```{code-cell} ipython3
## put the results into the main daraframe
for res in results:
    idx,(ich, val, err) = res
    df.loc[idx, f'ch{ich+1}flux'] = val
    df.loc[idx, f'ch{ich+1}flux_unc'] = err
```

```{code-cell} ipython3
#Count the number of non-zero ch1 fluxes
print('Parallel calculation: number of ch1 fluxes filled in =',
      np.sum(df.ch1flux > 0))
```

### 3e. Cleanup

```{code-cell} ipython3
#had to call the galex flux columns ch5 and ch6
#fix that by renaming them now
cols = {'ch5flux':'nuvflux', 'ch5flux_unc':'nuvflux_unc','ch6flux':'fuvflux', 'ch6flux_unc':'fuvflux_unc'}
df.rename(columns=cols, inplace = True)
```

```{code-cell} ipython3
#When doing a large run of a large area, save the dataframe with the forced photometry 
# so we don't have to do the forced photometry every time

df.to_pickle(f'output/COSMOS_{radius.value}arcmin.pkl')
```

```{code-cell} ipython3
#If you are not runnig the forced photometry, then read in the catalog from a previous run

#df = pd.read_pickle('output/COSMOS_48.0arcmin.pkl')
```

### 3f. Plot to Confirm our Photometry Results 
- compare to published COSMOS 2015 catalog

```{code-cell} ipython3
%%time
#plot tractor fluxes vs. catalog splash fluxes
#should see a straightline with a slope of 1

#setup to plot
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
fluxmax = 200
ymax = 100
xmax = 100
#ch1 
#first shrink the dataframe to only those rows where I have tractor photometry 
df_tractor = df[(df.splash_1_flux> 0) & (df.splash_1_flux < fluxmax)] #200
#sns.regplot(data = df_tractor, x = "splash_1_flux", y = "ch1flux", ax = ax1, robust = True)
sns.scatterplot(data = df_tractor, x = "splash_1_flux", y = "ch1flux", ax = ax1)

#add a diagonal line with y = x
lims = [
    np.min([ax1.get_xlim(), ax1.get_ylim()]),  # min of both axes
    np.max([ax1.get_xlim(), ax1.get_ylim()]),  # max of both axes
]

# now plot both limits against eachother
ax1.plot(lims, lims, 'k-', alpha=0.75, zorder=0)
ax1.set(xlabel = 'COSMOS 2015 flux ($\mu$Jy)', ylabel = 'tractor flux ($\mu$Jy)', title = 'IRAC 3.6')
ax1.set_ylim([0, ymax])
ax1.set_xlim([0, xmax])


#ch2 
#first shrink the dataframe to only those rows where I have tractor photometry 
df_tractor = df[(df.splash_2_flux> 0) & (df.splash_2_flux < fluxmax)]
#sns.regplot(data = df_tractor, x = "splash_2_flux", y = "ch2flux", ax = ax2, robust = True)
sns.scatterplot(data = df_tractor, x = "splash_2_flux", y = "ch2flux", ax = ax2)

#add a diagonal line with y = x
lims = [
    np.min([ax2.get_xlim(), ax2.get_ylim()]),  # min of both axes
    np.max([ax2.get_xlim(), ax2.get_ylim()]),  # max of both axes
]

# now plot both limits against eachother
ax2.plot(lims, lims, 'k-', alpha=0.75, zorder=0)
ax2.set(xlabel = 'COSMOS 2015 flux ($\mu$Jy)', ylabel = 'tractor flux ($\mu$Jy)', title = 'IRAC 4.5')
ax2.set_ylim([0, ymax])
ax2.set_xlim([0, xmax])


#ch3 
#first shrink the dataframe to only those rows where I have tractor photometry
df_tractor = df[(df.splash_3_flux> 0) & (df.splash_3_flux < fluxmax)]

#sns.regplot(data = df_tractor, x = "splash_3_flux", y = "ch3flux", ax = ax3, robust = True)
sns.scatterplot(data = df_tractor, x = "splash_3_flux", y = "ch3flux", ax = ax3)

#add a diagonal line with y = x
lims = [
    np.min([ax3.get_xlim(), ax3.get_ylim()]),  # min of both axes
    np.max([ax3.get_xlim(), ax3.get_ylim()]),  # max of both axes
]

# now plot both limits against eachother
ax3.plot(lims, lims, 'k-', alpha=0.75, zorder=0)
ax3.set(xlabel = 'COSMOS 2015 flux ($\mu$Jy)', ylabel = 'tractor flux ($\mu$Jy)', title = 'IRAC 5.8')
ax3.set_ylim([0, ymax])
ax3.set_xlim([0, xmax])


#ch4 
#first shrink the dataframe to only those rows where I have tractor photometry 
df_tractor = df[(df.splash_4_flux> 0) & (df.splash_4_flux < fluxmax)]

#sns.regplot(data = df_tractor, x = "splash_4_flux", y = "ch4flux", ax = ax4, robust = True)
sns.scatterplot(data = df_tractor, x = "splash_4_flux", y = "ch4flux", ax = ax4)

#add a diagonal line with y = x
lims = [
    np.min([ax4.get_xlim(), ax4.get_ylim()]),  # min of both axes
    np.max([ax4.get_xlim(), ax4.get_ylim()]),  # max of both axes
]

# now plot both limits against eachother
ax4.plot(lims, lims, 'k-', alpha=0.75, zorder=0)
ax4.set(xlabel = 'COSMOS 2015 flux ($\mu$Jy)', ylabel = 'tractor flux ($\mu$Jy)', title = 'IRAC 8.0')
ax4.set_ylim([0, ymax])
ax4.set_xlim([0, xmax])


plt.tight_layout()

fig.subplots_adjust( hspace=0.5)
fig.set_size_inches(8, 12)

#plt.savefig('output/flux_comparison.png')
```

Tractor is working for IRAC; Comparison of tractor derived fluxes with COSMOS 2015 fluxes for all four Spitzer IRAC channels.  Blue points represent each object from the subset of the COSMOS 2015 catalog.  The blue line is a linear regression robust fit to the data with uncertainties shown as the light blue wedge.  The black line is a y = x line plotted to guide the eye.

+++

## 4. Cross Match our New Photometry Catalog with an X-ray archival Catalog
We are using `nway` as the tool to do the cross match (Salvato et al. 2017).
`nway` expects input as two fits table files and outputs a third table file with all the possible matches and their probabilities of being the correct match.  We then sort that catalog and take only the best matches to be the true matches.

+++

### 4a. Retrieve the HEASARC Catalog

```{code-cell} ipython3
#first get an X-ray catalog from Heasarc
heasarc = Heasarc()
table = heasarc.query_mission_list()
mask = (table['Mission'] =="CHANDRA")
chandratable = table[mask]  

#find out which tables exist on Heasarc
#chandratable.pprint(max_lines = 200, max_width = 130)

#want ccosmoscat
mission = 'ccosmoscat'
#coords already defined above where I pull the original COSMOS catalog
ccosmoscat_rad = 1 #radius of chandra cosmos catalog
ccosmoscat = heasarc.query_region(coords, mission=mission, radius='1 degree', resultmax = 5000, fields = "ALL")
```

### 4b. Run `nway` to do the Cross-Match

```{code-cell} ipython3
#setup:

#astropy doesn't recognize capitalized units
#so there might be some warnings here on writing out the file, but we can safely ignore those

#need to make the chandra catalog into a fits table 
#and needs to include area of the survey.
nway_write_header('data/Chandra/COSMOS_chandra.fits', 'CHANDRA', float(ccosmoscat_rad**2) )


#also need to transform the main pandas dataframe into fits table for nway
#make an index column for tracking later
df['ID'] = range(1, len(df) + 1)

#need this to be a fits table and needs to include area of the survey.
rad_in_arcmin = radius.value  #units attached to this are confusing nway down the line
nway_write_header('data/multiband_phot.fits', 'OPT', float((2*rad_in_arcmin/60)**2) )

```

```{code-cell} ipython3
# call nway
!/home/jovyan/.local/bin/nway.py 'data/Chandra/COSMOS_chandra.fits' :ERROR_RADIUS 'data/multiband_phot.fits' 0.1 --out=data/Chandra/chandra_multiband.fits --radius 15 --prior-completeness 0.9
    
#!/opt/conda/bin/nway.py 'data/Chandra/COSMOS_chandra.fits' :ERROR_RADIUS 'data/multiband_phot.fits' 0.1 --out=data/Chandra/chandra_multiband.fits --radius 15 --prior-completeness 0.9
```

```{code-cell} ipython3
#Clean up the cross match results and merge them back into main pandas dataframe

#read in the nway matched catalog
xmatch = Table.read('data/Chandra/chandra_multiband.fits', hdu = 1)
df_xmatch = xmatch.to_pandas()

#manual suggests that p_i should be greater than 0.1 for a pure catalog.
#The matched catalog has multiple optical associations for some of the XMM detections.
#simplest thing to do is only keep match_flag = 1
matched = df_xmatch.loc[(df_xmatch['p_i']>=0.1) & df_xmatch['match_flag']==1]

#merge this info back into the df_optical dataframe.
merged = pd.merge(df, matched, 'outer',left_on='ID', right_on = 'OPT_ID')

#remove all the rows which start with "OPT" because they are duplications of the original catalog
merged = merged.loc[:, ~merged.columns.str.startswith('OPT')]

#somehow the matching is giving negative fluxes in the band where there is no detection 
#if there is a detection in the other band
#clean that up to make those negative fluxes = 0

merged.loc[merged['flux_chandra_2_10'] < 0, 'flux_chandra_2_10'] = 0
merged.loc[merged['flux_chandra_05_2'] < 0, 'flux_chandra_05_2'] = 0
```

```{code-cell} ipython3
#How many Chandra sources are there?
#How many Galex sources are there?

#make a new column which is a bool of existing chandra measurements
merged['cosmos_chandra_detect'] = 0
merged.loc[merged.flux_chandra_2_10 > 0,'cosmos_chandra_detect']=1

#make one for Galex too
merged['galex_detect'] = 0
merged.loc[merged.flux_galex_nuv > 0,'galex_detect']=1

#make chandra hardness ratio column:
#hard = 'flux_chandra_2_10', soft = flux_chandra_05_2, HR = (H-S)/(H+S)
merged['chandra_HR'] = (merged['flux_chandra_2_10'] - merged['flux_chandra_05_2']) / (merged['flux_chandra_2_10'] + merged['flux_chandra_05_2'])


print('number of Chandra detections =',np.sum(merged.cosmos_chandra_detect > 0))
print('number of Galex detections =',np.sum(merged.galex_detect > 0))
```

## 5. Plot Final Results
- We want to understand something about populations based on their colors

```{code-cell} ipython3
#IRAC color color plots akin to Lacy et al. 2004
#overplot galex sources
#overplot xray sources

#first select on 24 micron 
merged_24 = merged[(merged.flux_24 >= 0) ] 

#negative Galex fluxes are causing problems, so set those to zero
merged_24.loc[merged_24.fuvflux < 0, 'fuvflux'] = 0
merged_24.loc[merged_24.nuvflux < 0, 'nuvflux'] = 0

#make color columns
merged_24['F5.8divF3.6'] = merged_24.ch3flux / merged_24.ch1flux
merged_24['F8.0divF4.5'] = merged_24.ch4flux / merged_24.ch2flux

#detected in all IRAC bands
merged_allirac = merged_24[(merged_24['F8.0divF4.5'] > 0) & (merged_24['F5.8divF3.6'] > 0)]

#plot all the points
fig, ax = plt.subplots()
sns.scatterplot(data = merged_allirac, x = 'F5.8divF3.6', y = 'F8.0divF4.5',
                 ax = ax, alpha = 0.5, label = 'all')

#plot only those points with Galex detections
galex_detect = merged_allirac[merged_allirac.galex_detect > 0]
sns.scatterplot(data = galex_detect, x = 'F5.8divF3.6', y = 'F8.0divF4.5',
                 ax = ax, alpha = 0.5, label = 'Galex detect')

#plot only those points with chandra detections
chandra_detect = merged_allirac[merged_allirac.cosmos_chandra_detect > 0]
sns.scatterplot(data = chandra_detect, x = 'F5.8divF3.6', y = 'F8.0divF4.5',
                 ax = ax, label = 'Chandra detect')


ax.set(xscale="log", yscale="log")
ax.set_ylim([0.1, 10])
ax.set_xlim([0.1, 10])

ax.set(xlabel = 'log F5.8/F3.6', ylabel = 'log F8.0/F4.5')
ax.legend(loc='lower right')
plt.title('IRAC Color Color Plot')
```

This figure shows an IRAC color color plot akin to the seminal work by Lacy et al. 2004.  Points are color coded for those with Galex UV detections and those with Chandra x-ray detections. Note that the different populations are seperating out in this color color space.

```{code-cell} ipython3
#UV IR color color plot akin to Bouquin et al. 2015
fig, ax = plt.subplots()
merged['FUV-NUV'] = merged.mag_galex_fuv - merged.mag_galex_nuv
merged['NUV-3.6'] = merged.mag_galex_nuv - merged.splash_1_mag


#plot all the points
#sns.scatterplot(data = merged, x = 'NUV-3.6', y = 'FUV-NUV',
#                 ax = ax, alpha = 0.5)

#plot only those points with Galex detections
galex_detect = merged[merged.galex_detect > 0]
sns.kdeplot(data = galex_detect, x = 'NUV-3.6', y = 'FUV-NUV',
            ax = ax, fill = True, levels = 15)#scatterplot , alpha = 0.5

#plot only those points with chandra detections
#now with color coding Chandra sources by hardness ratio a la Moutard et al. 2020
chandra_detect = merged[merged.cosmos_chandra_detect > 0]
sns.scatterplot(data = chandra_detect, x = 'NUV-3.6', y = 'FUV-NUV',
                ax = ax, hue= 'chandra_HR',palette="flare")

#whew that legend for the hue is terrible
#try making it into a colorbar outside the plot instead
norm = plt.Normalize(merged['chandra_HR'].min(), merged['chandra_HR'].max())
sm = plt.cm.ScalarMappable(cmap="flare", norm=norm)
sm.set_array([])

# Remove the legend and add a colorbar
ax.get_legend().remove()
ax.figure.colorbar(sm)

#ax.set(xscale="log", yscale="log")
ax.set_ylim([-0.5, 3.5])
ax.set_xlim([-1, 7])

ax.set(xlabel = 'NUV - [3.6]', ylabel = 'FUV - NUV')
#plt.legend([],[], frameon=False)

#fig.savefig("output/color_color.png")
#mpld3.display(fig)  
```

We extend the works of Bouquin et al. 2015 and Moutard et al. 2020 by showing a GALEX - Spitzer color color diagram over plotted with Chandra detections.  Blue galaxies in these colors are generated by O and B stars and so must currently be forming stars. We find a tight blue cloud in this color space identifying those star forming galaxies.  Galaxies off of the blue cloud have had their star formation quenched, quite possibly by the existence of an AGN through removal of the gas reservoir required for star formation.  Chandra detected galaxies host AGN, and while those are more limited in number, can be shown here to be a hosted by all kinds of galaxies, including quiescent galaxies which would be in the upper right of this plot.  This likely implies that AGN are indeed involved in quenching star formation.  Additionally, we show the Chandra hardness ratio (HR) color coded according to the vertical color bar on the right side of the plot.  Those AGN with higher hardness ratios have their soft x-ray bands heavily obscured and appear to reside preferentially toward the quiescent galaxies.

+++

## References

This work made use of:

- Astroquery; Ginsburg et al., 2019, 2019AJ....157...98G

- Astropy; Astropy Collaboration 2022, Astropy Collaboration 2018, Astropy Collaboration 2013, 2022ApJ...935..167A, 2018AJ....156..123A, 2013A&A...558A..33A

- The Tractor; Lang et al. 2016, 2016AJ....151...36L

- Nyland et al. 2017 , 2017ApJS..230....9N

- Salvato et al. 2018, 2018MNRAS.473.4937S

- Laigle et al. 2016, 2016ApJS..224...24L

```{code-cell} ipython3

```