lsst.sims.maf.runComparison package¶
Submodules¶
lsst.sims.maf.runComparison.runComparison module¶
class lsst.sims.maf.runComparison.runComparison.RunComparison(baseDir, runNames, rundirs=None, defaultResultsDb='resultsDb_sqlite.db', verbose=False)[source]
Bases: object
Class to read multiple results databases, find requested summary metric comparisons, and store the results in DataFrames on the class.
Sets up the runs to compare and opens connections to all resultsDb_sqlite databases under baseDir/runNames[1-N] and their subdirectories. There are two ways to organize the storage of and access to the MAF outputs. EITHER the outputs are stored directly in the runNames directories or their subdirectories:

    baseDir -> run1 -> subdirectory1 (e.g. 'scheduler', containing a resultsDb_sqlite.db file)
                    -> subdirectoryN
            -> runN -> subdirectoryX

OR the outputs are stored in a variety of different locations, with the names/locations provided by runNames and rundirs in one-to-one correspondence. In this case, runNames may contain duplicates if there is more than one MAF output directory per run.
- Parameters
baseDir (str) – The root directory containing all of the underlying runs and their subdirectories.
runNames (list of str) – The names to label different runs. Can contain duplicate entries.
rundirs (list) – A list of directories (relative to baseDir) where the MAF outputs in runNames reside. Optional - if not provided, assumes directories are simply the names in runNames. Must have same length as runNames (note that runNames can contain duplicate entries).
addSummaryStats(metricDict=None, verbose=False)[source]
Combine the summary statistics of a set of metrics into a pandas DataFrame indexed by the opsim run name.
- Parameters
metricDict (dict, opt) – A dictionary of metrics with all of the information needed to query a results database. The metric/metadata/slicer/summary values referred to by a metricDict value could be unique but don’t have to be. If None (default), then fetches all metric results. (This can be slow if there are a lot of metrics.)
verbose (bool, opt) – If False, suppress warnings about missing summary stat information (such as stats that were never calculated). Default False.
- Returns
A pandas DataFrame containing a column for each of the dictionary keys in metricDict and the related summary stats. The resulting DataFrame is indexed by runNames:

    index      metric1          metric2
    <run_123>  <metricValue1>   <metricValue2>
    <run_124>  <metricValue1>   <metricValue2>
- Return type
pandas DataFrame
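The returned DataFrame shape can be sketched with plain pandas; the run names and metric values below are made up for illustration:

```python
import pandas as pd

# Hypothetical summary-stat results keyed by metricDict entry,
# with one value per opsim run (all names here are invented).
stats = {
    "Median airmass WFD": {"run_123": 1.22, "run_124": 1.19},
    "Nvisits All": {"run_123": 2450000, "run_124": 2512000},
}
df = pd.DataFrame(stats)   # columns = metricDict keys
df.index.name = "index"    # rows indexed by run name
print(df)
```

Each metric becomes a column; each run becomes a row, which is what makes run-to-run comparison a simple column-wise operation later on.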
buildMetricDict(metricNameLike=None, metricMetadataLike=None, slicerNameLike=None, subdir=None)[source]
Return a metric dictionary based on finding all metrics which match 'like' the various kwargs.
- Parameters
metricNameLike (str, opt) – Metric name pattern; i.e. will look for metrics whose metricName matches like "value".
metricMetadataLike (str, opt) – Metric Metadata like this.
slicerNameLike (str, opt) – Slicer name like this.
subdir (str, opt) – Find metrics from this subdir only. If other parameters are not specified, this returns all metrics within this subdir.
- Returns
Key = self-created metric ‘name’, value = Dict{metricName, metricMetadata, slicerName}
- Return type
Dict
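The shape of the returned dictionary can be sketched as follows; the metric name, metadata, and slicer below are hypothetical examples, not guaranteed entries:

```python
# A sketch of a buildMetricDict return value. The outer key is the
# self-created metric 'name'; the inner dict holds the query fields.
metricDict = {
    "Median airmass WFD OneDSlicer": {
        "metricName": "Median airmass",
        "metricMetadata": "WFD",
        "slicerName": "OneDSlicer",
    },
}
print(list(metricDict.values())[0]["slicerName"])
```

This dictionary can then be passed directly as the metricDict argument of addSummaryStats.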
filterCols(summaryName)[source]
Return a DataFrame containing only stats which match summaryName.
- Parameters
summaryName (str) – The type of summary stat to match (e.g. Max, Mean).
- Returns
- Return type
pd.DataFrame
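Assuming the summary-stat name appears in the column labels of the stats DataFrame, the filtering step can be sketched in plain pandas (the column and run names below are invented):

```python
import pandas as pd

def filter_cols(df, summaryName):
    """Keep only columns whose label contains summaryName (a sketch, not the library code)."""
    keep = [c for c in df.columns if summaryName in c]
    return df[keep]

df = pd.DataFrame(
    {"Max airmass": [1.9, 2.0], "Mean airmass": [1.2, 1.3]},
    index=["run_123", "run_124"],
)
print(filter_cols(df, "Max").columns.tolist())  # ['Max airmass']
```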
findChanges(threshold=0.05)[source]
Return a DataFrame containing only values which changed by more than threshold.
- Parameters
threshold (float, opt) – Identify values which change by more than threshold (%) in the normalized values. Default 5% (0.05).
- Returns
- Return type
pd.DataFrame
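On normalized values (baseline row = 0), the threshold test reduces to a simple pandas mask; a minimal sketch, with invented run and metric names:

```python
import pandas as pd

# Normalized stats: each value is the fractional change from the baseline run.
norm = pd.DataFrame(
    {"metric1": [0.0, 0.10], "metric2": [0.0, 0.01]},
    index=["baseline", "run_124"],
)
threshold = 0.05
# Keep only the columns where some run changed by more than the threshold.
changed = norm.loc[:, (norm.abs() > threshold).any()]
print(changed.columns.tolist())  # ['metric1']
```

With the default threshold of 0.05, metric1 (a 10% change) is flagged while metric2 (1%) is not.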
generateDiffHtml(normalized=False, html_out=None, show_page=False, combined=False, fullStats=False)[source]
Use bokeh to convert a summaryStats DataFrame into an interactive html table.
- Parameters
normalized (bool, opt) – If True, generate the html table from the normalizedStats.
html_out (str, opt) – Name of the html that will be output and saved. If no string is provided then the html table will not be saved.
show_page (bool, opt) – If True, the html page generated by this function will automatically open in your browser.
combined (bool, opt) – If True, the html produced will include columns for both the original summaryStats values and their normalized values. The baselineRun used to calculate the normalized values is dropped from the table.
fullStats (bool, opt) – If False, the final html table will exclude summaryStats whose names contain '3Sigma', 'Rms', 'Min', 'Max', 'RobustRms', or '%ile'.
getFileNames(metricName, metricMetadata=None, slicerName=None)[source]
For each of the runs in runlist, get the paths to the datafiles for a given metric.
normalizeStats(baselineRun)[source]
Normalize the summary metric values in the DataFrame resulting from combineSummaryStats based on the values of a single baseline run.
- Parameters
baselineRun (str) – The name of the opsim run that will serve as baseline.
- Returns
DataFrame (pandas) – A pandas DataFrame containing a column for each of the configuration parameters given in paramNamelike and a column for each of the dictionary keys in the metricDict. The resulting DataFrame is indexed by the names of the opsim runs:

    index      metric1                metric2
    <run_123>  <norm_metricValue1>    <norm_metricValue2>
    <run_124>  <norm_metricValue1>    <norm_metricValue2>

- Notes
The metric values are normalized in the following way:

    norm_metric_value(run) = (metric_value(run) - metric_value(baselineRun)) / metric_value(baselineRun)
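The normalization formula above is a one-line row-broadcast operation in pandas; a minimal sketch with invented run names and values:

```python
import pandas as pd

# Hypothetical raw summary stats, indexed by run name.
stats = pd.DataFrame(
    {"metric1": [100.0, 110.0], "metric2": [2.0, 1.9]},
    index=["run_123", "run_124"],
)
baseline = "run_123"  # hypothetical baseline run name
# (value - baseline) / baseline, broadcast across all runs at once.
norm = (stats - stats.loc[baseline]) / stats.loc[baseline]
print(norm.loc["run_124", "metric1"])  # 0.1, i.e. a 10% increase over baseline
```

Note that the baseline run's own row becomes all zeros, which is why generateDiffHtml can drop it when combined=True.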
plotMetricData(bundleDict, plotFunc, runlist=None, userPlotDict=None, layout=None, outDir=None, savefig=False)[source]
lsst.sims.maf.runComparison.summaryStatPlotters module¶
lsst.sims.maf.runComparison.summaryStatPlotters.plotSummaryStats(self, output=None, totalVisits=True)[source]
Plot the normalized metric values as a function of opsim run.
- output: str, opt
Name of figure to save to disk. If this is left as None the figure is not saved.
- totalVisits: bool
If True, the total number of visits is included in the metrics plotted. When comparing runs of very different lengths, it is recommended to set this flag to False.