lsst.sims.maf.runComparison package

Submodules

lsst.sims.maf.runComparison.runComparison module

class lsst.sims.maf.runComparison.runComparison.RunComparison(baseDir, runNames, rundirs=None, defaultResultsDb='resultsDb_sqlite.db', verbose=False)[source]

Bases: object

Class to read multiple results databases, find requested summary metric comparisons, and store the results in DataFrames on the class.

Set up the runs to compare and open connections to all resultsDb_sqlite.db files under baseDir/runNames[1-N] and their subdirectories. There are two ways to organize the storage of (and access to) the MAF outputs. EITHER the outputs can be stored directly in the runNames directories, or in subdirectories of these:

baseDir -> run1 -> subdirectory1 (e.g. 'scheduler', containing a resultsDb_sqlite.db file)
                   ...
                -> subdirectoryN
        ...
        -> runN -> subdirectoryX

OR the outputs can be stored in a variety of different locations, with the names/locations then provided by runNames/rundirs in one-to-one correspondence. In this case, runNames may contain duplicates if there is more than one MAF output directory per run.

Parameters
  • baseDir (str) – The root directory containing all of the underlying runs and their subdirectories.

  • runNames (list of str) – The names to label different runs. Can contain duplicate entries.

  • rundirs (list) – A list of directories (relative to baseDir) where the MAF outputs in runNames reside. Optional - if not provided, assumes directories are simply the names in runNames. Must have same length as runNames (note that runNames can contain duplicate entries).
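A minimal construction sketch, assuming the two storage layouts described above; the base directory and run names here are hypothetical (the instance 'rc' is reused in the sketches below):

    from lsst.sims.maf.runComparison.runComparison import RunComparison

    # Layout 1: outputs live directly under baseDir/<runName>
    # (or in subdirectories thereof).
    rc = RunComparison(baseDir='maf_outputs',
                       runNames=['baseline_v1', 'alt_sched_v1'])

    # Layout 2: point each (possibly duplicated) run name at an
    # explicit directory relative to baseDir, one-to-one with runNames.
    rc = RunComparison(baseDir='maf_outputs',
                       runNames=['baseline_v1', 'baseline_v1'],
                       rundirs=['baseline_v1/sched', 'baseline_v1/science'])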

addSummaryStats(metricDict=None, verbose=False)[source]

Combine the summary statistics of a set of metrics into a pandas dataframe that is indexed by the opsim run name.

Parameters
  • metricDict (dict, opt) – A dictionary of metrics with all of the information needed to query a results database. The metric/metadata/slicer/summary values referenced by each metricDict entry may be unique, but do not have to be. If None (default), fetches all metric results. (This can be slow if there are a lot of metrics.)

  • verbose (bool, opt) – If False, warnings about missing summary stat information (such as when a statistic was never calculated) are suppressed. Default False.

Returns

A pandas dataframe containing a column for each of the dictionary keys in the metricDict, holding the related summary stats. The resulting dataframe is indexed by runNames:

    index      metric1           metric2
    <run_123>  <metricValue1>    <metricValue2>
    <run_124>  <metricValue1>    <metricValue2>

Return type

pandas DataFrame
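A short sketch chaining buildMetricDict and addSummaryStats on the 'rc' instance from above (the 'Nvisits' name pattern is hypothetical):

    # Fetch summary stats for all metrics whose names match 'Nvisits'.
    mDict = rc.buildMetricDict(metricNameLike='Nvisits')
    stats = rc.addSummaryStats(metricDict=mDict)
    print(stats.head())  # indexed by run name, one column per metric key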

buildMetricDict(metricNameLike=None, metricMetadataLike=None, slicerNameLike=None, subdir=None)[source]

Return a metric dictionary containing all metrics which match the various 'like' kwargs (SQL-style 'like' matching against the results database).

Parameters
  • metricNameLike (str, opt) – Match metric names like this string – i.e. will look for metrics where metricName matches 'like' this value.

  • metricMetadataLike (str, opt) – Metric Metadata like this.

  • slicerNameLike (str, opt) – Slicer name like this.

  • subdir (str, opt) – Find metrics from this subdir only. If other parameters are not specified, this returns all metrics within this subdir.

Returns

Key = self-created metric ‘name’, value = Dict{metricName, metricMetadata, slicerName}

Return type

Dict
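For example (the metric and slicer names here are hypothetical), the returned dictionary maps a generated name to the identifying information for each matching metric:

    mDict = rc.buildMetricDict(metricNameLike='CoaddM5',
                               slicerNameLike='Healpix')
    # e.g. {'CoaddM5 r band HealpixSlicer':
    #           {'metricName': 'CoaddM5',
    #            'metricMetadata': 'r band',
    #            'slicerName': 'HealpixSlicer'}}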

close()[source]

Close all connections to the results database files.

filterCols(summaryName)[source]

Return a dataframe containing only stats which match summaryName.

Parameters

summaryName (str) – The type of summary stat to match (e.g. Max, Mean).

Returns

Return type

pd.DataFrame
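A short usage sketch, assuming summary statistics have already been gathered via addSummaryStats:

    # Keep only the 'Median' summary statistic columns.
    medians = rc.filterCols('Median')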

findChanges(threshold=0.05)[source]

Return a dataframe containing only values which changed by more than threshold.

Parameters

threshold (float, opt) – Identify values which change by more than threshold (%) in the normalized values. Default 5% (0.05).

Returns

Return type

pd.DataFrame
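For instance, values that moved by more than 10% in the normalized dataframe could be isolated as follows (a sketch, assuming normalizeStats has already been run; see below):

    # Flag normalized values differing from the baseline by more than 10%.
    changed = rc.findChanges(threshold=0.10)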

generateDiffHtml(normalized=False, html_out=None, show_page=False, combined=False, fullStats=False)[source]

Use bokeh to convert a summaryStats dataframe into an interactive html table.

Parameters
  • normalized (bool, opt) – If True, generate the html table from the normalized stats.

  • html_out (str, opt) – Name of the html that will be output and saved. If no string is provided then the html table will not be saved.

  • show_page (bool, opt) – If True, the html page generated by this function will automatically open in your browser.

  • combined (bool, opt) – If True, the html produced will have columns for the original summaryStats values as well as their normalized values. The baselineRun used to calculate the normalized values will be dropped from the table.

  • fullStats (bool, opt) – If False, the final html table will not include summaryStats that contain '3Sigma', 'Rms', 'Min', 'Max', 'RobustRms', or '%ile' in their names.
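A usage sketch (the output filename is hypothetical):

    # Write an interactive table of the normalized stats and open it
    # in the browser.
    rc.generateDiffHtml(normalized=True, html_out='run_comparison.html',
                        show_page=True)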

getFileNames(metricName, metricMetadata=None, slicerName=None)[source]

For each of the runs in runlist, get the paths to the datafiles for a given metric.

Parameters
  • metricName (str) – The name of the original metric.

  • metricMetadata (str, opt) – The metric metadata specifying the metric desired (optional).

  • slicerName (str, opt) – The slicer name specifying the metric desired (optional).

Returns

Keys: runName, Value: path to file

Return type

Dict
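A sketch using a hypothetical metric name, metadata, and slicer:

    fileDict = rc.getFileNames('CoaddM5', metricMetadata='r band',
                               slicerName='HealpixSlicer')
    # e.g. {'baseline_v1': 'maf_outputs/baseline_v1/CoaddM5_r_HEAL.npz', ...}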

normalizeStats(baselineRun)[source]

Normalize the summary metric values in the dataframe resulting from addSummaryStats based on the values of a single baseline run.

Parameters
  • baselineRun (str) – The name of the opsim run that will serve as baseline.

Returns

A pandas dataframe containing a column for each of the dictionary keys in the metricDict, with values normalized to the baseline run. The resulting dataframe is indexed by the names of the opsim runs:

    index      metric1                metric2
    <run_123>  <norm_metricValue1>    <norm_metricValue2>
    <run_124>  <norm_metricValue1>    <norm_metricValue2>

Return type

pandas DataFrame

Notes

The metric values are normalized in the following way:

    norm_metric_value(run) = (metric_value(run) - metric_value(baselineRun)) / metric_value(baselineRun)
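As a worked example of this formula: a metric value of 110 in a run, against a baseline value of 100, normalizes to (110 - 100) / 100 = 0.10, i.e. a 10% increase. A minimal usage sketch (the run name is hypothetical):

    normedStats = rc.normalizeStats(baselineRun='baseline_v1')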

plotMetricData(bundleDict, plotFunc, runlist=None, userPlotDict=None, layout=None, outDir=None, savefig=False)[source]
readMetricData(metricName, metricMetadata, slicerName)[source]
sortCols(baseName=True, summaryName=True)[source]

Return the columns (in order) to display a sorted version of the stats dataframe.

Parameters
  • baseName (bool, opt) – Sort by the baseName. Default True. If True, this takes priority in the sorted results.

  • summaryName (bool, opt) – Sort by the summary stat name (summaryName). Default True.

Returns

Return type

list
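A sketch; that the returned labels index the dataframe produced by addSummaryStats is an assumption here:

    # Reorder the stats dataframe columns, grouping by base metric name first.
    cols = rc.sortCols(baseName=True, summaryName=True)
    statsSorted = stats[cols]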

lsst.sims.maf.runComparison.summaryStatPlotters module

lsst.sims.maf.runComparison.summaryStatPlotters.plotSummaryStats(self, output=None, totalVisits=True)[source]

Plot the normalized metric values as a function of opsim run.

output: str, opt

Name of figure to save to disk. If this is left as None the figure is not saved.

totalVisits: bool

If True, the total number of visits is included in the metrics plotted. When comparing runs of very different lengths, it is recommended to set this flag to False.
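A usage sketch; note the function's first parameter is named 'self', so passing the RunComparison instance ('rc' from above) is an assumption, and the output filename is hypothetical:

    from lsst.sims.maf.runComparison.summaryStatPlotters import plotSummaryStats

    # 'rc' is assumed to hold the normalized summary stats to plot.
    plotSummaryStats(rc, output='summary_stats.png', totalVisits=False)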

Module contents