safep package

Submodules

safep.AFEP_parse module

Large datasets can be difficult to parse on a workstation due to inefficiencies in the way data is represented for pymbar. When possible, reduce the size of your dataset.

class safep.AFEP_parse.AFEPArgumentParser(*args, **kwargs)

Bases: ArgumentParser

Dedicated CLI argument parser for AFEP.

path

root path for data folder.

Type:: str|Path

fepoutre

regex for fepout files

Type:: str

replicare

regex for replica directories

Type:: str

temperature

the temperature at which the simulation was run (K)

Type:: float

detect_equilibrium

Flag to run automated equilibrium detection and downsampling

Type:: bool

fittingMethod

Method for fitting forward-backward discrepancies. Untested.

Type:: str

max_size

UNUSED. Maximum size of data to parse. Default 1GB Note: this should be much less than the total RAM available.

Type:: float

make_figures

Whether or not to generate figures. Default False.

Type:: bool

class safep.AFEP_parse.AFEPArguments(dataroot: Path, replica_pattern: str, filename_pattern: str, temperature: float, detect_equilibrium: bool, make_figures: bool, RT_kcal_per_mol: float = None, replicas: list[str] = None)

Bases: object

A more readable/testable format for CLI arguments

RT_kcal_per_mol: float

dataroot: Path

detect_equilibrium: bool

filename_pattern: str

classmethod from_AFEPArgumentParser(parser: AFEPArgumentParser)

Unpack arguments from an AFEPArgumentParser object.

Parameters:: parser (AFEPArgumentParser) – An AFEPArgumentParser object.
Returns:: The unpacked arguments.
Return type:: AFEPArguments

make_figures: bool

replica_pattern: str

replicas: list[str]

temperature: float

safep.AFEP_parse.add_summary_stats(mean, sterr, axes)

Add summary statistics to a general safep figure.

Parameters:

mean (float) – the mean free energy across replicas.
sterr (float) – the standard deviation or standard error of the free energy across replicas.
axes (matplotlib.axes) – matplotlib axes object

Returns:

the modified matplotlib axes object

Return type:

Axes

safep.AFEP_parse.add_to_general_figure(fig, axes, args, key, feprun)

Add another replica to an existing figure.

Parameters:

fig (matplotlib.figure.Figure) – The figure to add to.
axes (matplotlib.axes._subplots.Axes) – The axes to add to.
args (argparse.Namespace) – The parsed command line arguments.
key (str) – The key to use for the figure.
feprun (safep.Feprun) – The feprun replica to use.

Returns:

The figure and axes that were modified

Return type:

tuple(Fig, Axes)

safep.AFEP_parse.do_agg_data(dataax, plotax)

Aggregates data from a given matplotlib axis, computes statistical measures, and displays them on another axis.

Parameters: dataax (matplotlib.axes.Axes): The axis containing lines with data to be aggregated. plotax (matplotlib.axes.Axes): The axis where the statistical summary will be displayed.

Returns: matplotlib.axes.Axes: The plot axis with the statistical summary text added.

safep.AFEP_parse.do_general_figures_plot(args, fepruns, mean, sterr)

plot general SAFEP figures

Parameters:

args (AFEPArguments) – arguments from argparse
fepruns (dict) – fepruns dictionary
mean (float) – mean free energy across replicas
sterr (float) – sterr free energy across replicas

Returns:

figure and axes objects containing the figure panels

Return type:

fig, axes

safep.AFEP_parse.do_per_lambda_convergence_shared_axes(fepruns, mean, sterr)

Plot lambda convergence for all replicas.

Parameters:

args (AFEPArguments) – command line arguments
fepruns (dict) – fepruns dictionary
mean (float) – mean value across replicas
sterr (float) – sterr value across replicas
axes (list) – matplotlib axes list

Returns:

figure and axes objects with the per lambda convergence

Return type:

tuple[Fig, Axes]

safep.AFEP_parse.do_shared_convergence_plot(args, fepruns, dGs)

Make the convergence plot (reverse and forward cumulative averages)

Parameters:

args (AFEPArguments) – command line arguments
fepruns (dict) – fepruns dictionary
dGs (list) – list of free energies

Returns:

figure and axes with FCA/RCA convergence

Return type:

tuple(Fig, Axes)

safep.AFEP_parse.get_summary_statistics(args, fepruns)

Extract summary statistics from fepruns.

Parameters:

args (AFEPArguments) – parsed command line arguments
fepruns (dict) – dict of fepruns

Returns:

a pretty print string, individual delta Gs, and the mean and standard error across replicas

Return type:

tuple[str, list[float], float, float]

safep.AFEP_parse.initialize_general_figure(RT_kcal_per_mol, key, feprun)

Create a new general figure

Parameters:

RT_kcal_per_mol (float) – RT in kcal/mol
key (str) – The key of the initial dataset
feprun (safep.Feprun) – the feprun to plot

Returns:

The figure and axes

Return type:

Tuple(Fig,Axes)

safep.AFEP_parse.main(): Main function for parsing fep data and calculating free energy of (de)coupling

safep.AFEP_parse.make_figures(args, fepruns, dGs, mean, sterr) → None

Make general SAFEP figures and convergence plots

Parameters:

args (AFEPArguments) – arguments from argparse
fepruns (dict) – fepruns dictionary
dGs (list) – list of free energies
mean (float) – mean free energy across replicas
sterr (float) – sterr free energy across replicas

Returns:

None

Side effects:: Saves FEP_general_figures.pdf, FEP_convergence.pdf, and FEP_perLambda_convergence.pdf

safep.TI module

safep.TI.harmonicWall_U(HW, coord, L)

safep.TI.harmonicWall_dUdL(HW, coord, L)

safep.TI.make_harmonicWall(FC=10, targetFC=0, targetFE=1, upperWalls=1, schedule=None, numSteps=1000, targetEQ=500, name='HW', lowerWalls=None, lambdaExp=1.0, decoupling=None): This method should rarely be used, as long as a log file exists that can be parsed by parse_Colvars_log and the result fed to make_harmonicWall_from_Colvars

safep.TI.make_harmonicWall_from_Colvars(restraint_conf): Input: a restraint configuration as a dict produced by parse_Colvars_log

safep.TI.plot_TI(cumulative, perWindow, width=8, height=4, PDFtype='KDE', hystLim=(-1, 1), color='#0072B2', fontsize=12)

safep.TI.process_TI(dataTI, restraint, Lsched): Arguments: the TI data, restraint, and lambda schedule Function: Calculate the free energy for each lambda value, aggregate the result, and estimate the error Returns: The free energies and associated errors as functions of lambda. Both per window and cumulative.

safep.estimators module

safep.estimators.do_estimation(u_nk, method='both'): Run both exponential and BAR estimators and return the results in tidy dataframes. Arguments: u_nk in the alchemlyb format, method of fitting (String: BAR, EXP, or both) Returns: perWindow estimates (including errors), cumulative estimates (including errors)

safep.estimators.get_BAR(bar): Extract key information from an alchemlyb.BAR object. Useful for plotting. Arguments: a fitted BAR object Returns: l[ambdas], l_mid [lambda window midpoints], f [the cumulative free energy], df [the per-window free energy], ddf [the per-window errors], errors [the cumulative error]

safep.estimators.get_dG_from_data(data, temperature)

safep.estimators.get_exponential(u_nk): Get exponential estimation of the change in free energy. Arguments: u_nk in alchemlyb format Returns: l[ambdas], l_mid [lambda window midpoints], dG_f [forward estimates], dG_b [backward estimates]

safep.fileIO module

safep.fileIO.guess_lambda(fname): Guess lambda based on file name (last number in the filename divided by 100). Not very good. Arguments: file name Returns: estimated lambda value

safep.fileIO.parse_Colvars_log(filename)

Parses Colvars standard output from a log file

Returns dictionaries containing key-value pairs global_conf: top-level Colvars parameters colvars: list of dicts, one for each colvar biases: list of dicts, one for each bias

Note: within colvars, parameters of sub-objects (components and atom groups) are found as nested dictionaries in a list under the ‘children’ key, e.g.: colvars[0][‘children’][0] is the first component of the first colvar colvars[0][‘children’][0][‘children’][0] is the first atom group of that component

safep.fileIO.read_FEPOUT(fileName, step=1): Reads all data from a NAMD fepout file unlike alchemlyb’s parser. readFEPOUT reads each file in a single pass: keeping track of lambda values and appending each line to an array. The array is cast to a dataframe at the end to avoid appending to a dataframe. Arguments: fileName, step (stride) Returns: a dataframe containing all the data in a fepout file.

safep.fileIO.read_Files(files, step=1): Batch readFEPOUT Arguments: file (paths), step (stride) Returns: a list of dataframes, one dataframe per file

safep.fileIO.read_UNK(filepath): Read a u_nk that was written by saveUNK. Arguments: filepath Returns: u_nk

safep.fileIO.save_UNK(u_nk, filepath, safety=True): Write u_nk to a file Arguments: u_nk in the format of alchemlyb, filepath

safep.helpers module

safep.helpers.cum_Fn(x, m, s): Wrapper for the normal cumulative density function

safep.helpers.get_Rsq(X, Y, Yexpected): Calculate the coefficient of determination for arbitrary fits. Arguments: X (inputs), Y (experimental data), Yexpected (fitted or predicted data)

safep.helpers.moving_average(x, w): Generate a moving average over x with window width w Arguments: x (data), w (window width) Returns: a moving window average

safep.helpers.pdf_Fn(x, m, s): Wrapper for the normal probability density function

safep.plotting module

safep.plotting.add_hyst_textbox(diff, pdf_x, pdf_y, pdf_ax, xoffset=0.15, yoffset=0.95, fs=14)

Add a textbox to a probability density function (PDF) plot.

Parameters: - diff (array-like): The differences between forward and backward estimates. - pdf_x (array-like): The x-values of the PDF plot. - pdf_y (array-like): The y-values of the PDF plot. - pdf_ax (matplotlib.axes.Axes): The axes object of the PDF plot. - xoffset, yoffset (floats): the offsets for placing the textbox (default:0.15,0.95, respectively) - fs: the fontsize (default:14)

Returns: - pdf_ax (matplotlib.axes.Axes): The updated axes object with the textbox added.

Description: This function adds a textbox to a probability density function (PDF) plot. The textbox displays the mode, mean, and standard deviation of the differences between forward and backward estimates. The mode is calculated as the x-value at which the PDF reaches its maximum. The mean and standard deviation are calculated using the numpy functions np.average() and np.std(), respectively.

The textbox is positioned at the coordinates (xoffset, yoffset) relative to the axes object’s transformation. The text is formatted using LaTeX syntax and includes the mode, mean, and standard deviation values.

Example: diff = [0.1, 0.2, 0.3, 0.4, 0.5] pdf_x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5] pdf_y = [0.2, 0.4, 0.6, 0.8, 1.0, 0.8] pdf_ax = plt.gca()

pdf_ax = add_hyst_textbox(diff, pdf_x, pdf_y, pdf_ax) plt.show()

safep.plotting.convergence_plot(theax, fs, ferr, bs, berr, fwd_color='#0072B2', bwd_color='#D55E00', fwd_legend=None, bwd_legend=None, fontsize=12, errorbars=True): Convergence plot. Does the convergence plotting. Returns: a pyplot

safep.plotting.do_conv_plot(ax, X, fs, ferr, fwd_color, label=None): A minimalist convergence plot Arguments: ax[is], X (the X values), fs (forward estimates), ferr (forward errors), fwdColor (the forward color), label text Returns: ax[is]

safep.plotting.fb_discrepancy_hist(dG_f, dG_b): Plot the distribution of the hystersis Arguments: dG_f, dG_b Returns: a pyplot

safep.plotting.fb_discrepancy_plot(l_mid, dG_f, dG_b): Plot the forward-backward discrepancy (“hysteresis”) Arguments: l_mid (lambda window midpoints), dG_f (forward estimates), dG_b (backward estimates) Returns: a pyplot

safep.plotting.plot_general(cumulative, cumulative_ylim, per_window, per_window_ylim, RT, width=8, height=4, pdf_type='KDE', fontsize=12, fig=None, axes=None, hysttype='classic', label=None, color='blue', errorbars=True, xlim=[-1, 1])

Plot the general analysis of cumulative and per-window changes in delta-G.

Arguments: - cumulative: A pandas DataFrame containing the cumulative changes in delta-G. - cumulativeYlim: The y-axis limits for the cumulative plot. - perWindow: A pandas DataFrame containing the per-window changes in delta-G. - perWindowYlim: The y-axis limits for the per-window plot. - RT: The gas constant times the temperature. - width: The width of the figure. - height: The height of the figure. - PDFtype: The type of probability density function to plot (either ‘KDE’ or ‘Histogram’). - fontsize: The font size of the plot labels. - fig: The figure object to use for plotting. - axes: The axes objects to use for plotting. - hysttype: The type of hysteresis plot to use (either ‘classic’ or ‘lines’). - label: The label for the plot. - color: The color of the plot. - errorbars: Whether to include error bars in the per-window plot.

Returns: - fig: The figure object. - axes: The axes objects.

safep.plotting.plot_general_legacy(cumulative, cumulative_ylim, perWindow, per_window_ylim, RT, width=8, height=4, PDFtype='KDE', fontsize=12): This is a deprecated plotting function

safep.plotting.plot_hysteresis(axes, per_window, hysttype, color='blue', fontsize=12, pdf_type='KDE', xlim=[-1, 1], textbox=True)

safep.plotting.plot_samples(theax, u_nk, color='blue', label='Raw Data')

Plot the number of samples against the lambda values.

Parameters:

theax (matplotlib.axes.Axes) – The axes object to plot on.
u_nk (pd.DataFrame) – The input data frame containing the samples and lambda values.
color (str, optional) – The color of the plot line. Defaults to ‘blue’.
label (str, optional) – The label for the plot legend. Defaults to ‘Raw Data’.

Returns:

The axes object after plotting the data.

Return type:

matplotlib.axes.Axes

safep.processing module

safep.processing.alt_convergence(u_nk, nbins): An alternative convergence calculation that uses percentile blocks instead of cumulative samples. Inspired by block averages. Uses the BAR estimator. Arguments: u_nk, nbins (number of blocks) Returns: forward estimates, forward errors

safep.processing.batch_process(paths, RT, decorrelate, pattern, temperature, detectEQ): Read and process all NAMD FEPout files that match a given pattern Arguments: paths (list of strings), RT (float), decorrelate (boolean, whether or not to use alchemlyb’s decorrelation functions), pattern (to match), temperature, detectEQ (boolean, whether or not to use alchemlyb’s equilibrium detection) Returns: lists of: u_nks, cumulative estimates, perWindow estimates, an affix (that describes the data generation protocol)

safep.processing.bootstrap_estimate(u_nk, estimator='BAR', iterations=100, schedule=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]): Bootstrapped free energy estimates with variable sample sizes. E.g. if schedule=[10], each bootstrapped sample will only be 10% the size of the original Arguments: u_nk, estimator (BAR or EXP[onential]), iterations (number of bootstrapped datasets to generate for each sample size), schedule (percent data to sample) Returns: if estimator==’BAR’, N estimates for each schedule value. if estimator==’EXP’, N estimates of forward, backward, and the mean of the two for each schedule value

safep.processing.decorrelate_u_nk(u_nk: DataFrame, method='dE') → DataFrame

Decorrelate each window without equilibrium detection.

Parameters:

u_nk (pd.DataFrame) – the energies in the alchemlyb style
method (str, optional) – decorrelation method. See subsampling.decorrelate_u_nk. Defaults to “dE”.

Returns:

the decorrelated energies in the alchemlyb style

Return type:

pd.DataFrame

safep.processing.detect_equilibrium_u_nk(u_nk: DataFrame)

safep.processing.do_convergence(u_nk, tau=1, num_points=10, estimator='BAR'): Convergence calculation. Incrementally adds data from either the start or the end of each windows simulation and calculates the resulting change in free energy. Arguments: u_nk, tau (an error scaling factor), num_points (number of chunks) Returns: forward-sampled estimate (starting from t=start), forward-sampled error, backward-sampled estimate (from t=end), backward-sampled error

safep.processing.do_per_lambda_convergence(u_nk): Convergence calculation. Incrementally adds data from either the start or the end of each windows simulation and calculates the resulting change in free energy. Arguments: u_nk, tau (an error scaling factor), num_points (number of chunks) Returns: forward-sampled estimate (starting from t=start), forward-sampled error, backward-sampled estimate (from t=end), backward-sampled error

safep.processing.get_PDF(dG_f, dG_b, DiscrepancyFitting='LS', dx=0.01, binNum=20): Estimate the PDF of the discrepancy data by fitting to a gaussian Arguments: Forward estimates, backward estimates, DiscrepancyFitting (‘LS’ least squares of ‘ML’ maximum likelihood), dx (sample width for fitted gaussian), binNum (number of bins for histogramming) Returns: (experimental) X, Y, pdfX, pdfY, fitted parameters, (fitted) pdfXnorm, pdfYnorm, pdfYexpected (for each experimental X)

safep.processing.get_empirical_CI(allSamples, CI=0.95): Get empirical confidence intervals from a dataframe Arguments: allSamples (the dataframe), CI (the interval to determine)[0.95] Returns: (upper bounds, lower bounds, means)

safep.processing.get_limits(allSamples): Get the mean and +/- 1 standard deviation for each group in a matrix Arguments: allSamples (the dataframe) Returns: (upper bounds, lower bounds, means)

safep.processing.get_moving_ave_slope(X, FX, window): Estimates the PDF from a moving window average of a CDF Arguments: X (inputs), FX (the CDF), window (width) Returns: slopes

safep.processing.get_n_samples(u_nk: DataFrame) → Series

safep.processing.read_and_process(fepoutFiles, temperature, decorrelate, detectEQ): Read NAMD fepout files for a single calculation and carry out any decorrelation or equilibrium detection Arguments: files to parse, temperature, decorrelate (boolean, whether or not to use alchemlyb’s decorrelation functions), detectEQ (boolean, whether or not to use alchemlyb’s equilibrium detection) Returns: u_nk

safep.processing.subsample(unkGrps, lowPct, hiPct): Subsamples a u_nk dataframe using percentiles [0-100] of data instead of absolute percents. Arguments: unkGrps (u_nk grouped by fep-lambda), lowPct (lower percentile bound), hiPct (upper percentile bound) Returns: partial (a u_nk in which each window is subsampled between the lower and upper percentile bounds)

safep.processing.u_nk_from_DF(data, temperature, eqTime, warnings=True): Experimental. caveat emptor. Generate an alchemlyb-type u_nk dataframe from a dense dataframe generated by readFEPOUT Arguments: data[frame from readFEPOUT], temperature, eqTime (time to equilibrium), warnings (boolean, whether or not to print warnings) Returns: u_nk

Module contents

safep

Tools for Analyzing and Debugging (SA)FEP calculations