safep package

Submodules

safep.AFEP_parse module

safep.TI module

safep.TI.harmonicWall_U(HW, coord, L)

safep.TI.harmonicWall_dUdL(HW, coord, L)

safep.TI.make_harmonicWall(FC=10, targetFC=0, targetFE=1, upperWalls=1, schedule=None, numSteps=1000, targetEQ=500, name='HW', lowerWalls=None)

safep.TI.plot_TI(cumulative, perWindow, width=8, height=4, PDFtype='KDE', hystLim=(-1, 1), color='#0072B2', fontsize=12)

safep.TI.process_TI(dataTI, restraint, Lsched): Calculate the free energy for each lambda value, aggregate the results, and estimate the error. Arguments: dataTI (the TI data), restraint, Lsched (the lambda schedule) Returns: the free energies and associated errors as functions of lambda, both per window and cumulative.
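
As a rough illustration of the per-window and cumulative aggregation described above, the sketch below integrates mean dU/dlambda values with the trapezoidal rule. The function name `integrate_ti`, its inputs, and the choice of the trapezoidal rule are assumptions made for illustration; process_TI may aggregate differently and also estimates errors, which this sketch omits.

```python
from itertools import accumulate


def integrate_ti(lambdas, mean_dUdL):
    """Hypothetical helper: trapezoidal thermodynamic integration.

    lambdas   -- increasing lambda values, one per window
    mean_dUdL -- mean dU/dlambda sampled at each lambda
    """
    # Free energy contribution of each interval between adjacent lambdas.
    per_window = [
        0.5 * (mean_dUdL[i] + mean_dUdL[i + 1]) * (lambdas[i + 1] - lambdas[i])
        for i in range(len(lambdas) - 1)
    ]
    # Running sum of the per-window contributions.
    cumulative = list(accumulate(per_window))
    return per_window, cumulative


per_window, cumulative = integrate_ti([0.0, 0.5, 1.0], [0.0, 1.0, 2.0])
print(per_window, cumulative)
```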

safep.estimators module

safep.estimators.do_estimation(u_nk, method='both'): Run the exponential estimator, the BAR estimator, or both, and return the results in tidy dataframes. Arguments: u_nk in the alchemlyb format, method of fitting (String: BAR, EXP, or both) Returns: perWindow estimates (including errors), cumulative estimates (including errors)

safep.estimators.get_BAR(bar): Extract key information from an alchemlyb.BAR object. Useful for plotting. Arguments: a fitted BAR object Returns: l[ambdas], l_mid [lambda window midpoints], f [the cumulative free energy], df [the per-window free energy], ddf [the per-window errors], errors [the cumulative error]
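
The relationship between the returned quantities can be sketched in plain Python. This is not the library's code: `summarize_windows` is a hypothetical name, and the cumulative error is assumed to combine the per-window errors in quadrature, which holds only if the windows are treated as independent.

```python
import math


def summarize_windows(lambdas, df, ddf):
    """Hypothetical helper: derive midpoints and cumulative values.

    lambdas -- lambda values bounding the windows (len(df) + 1 entries)
    df      -- per-window free energy changes
    ddf     -- per-window errors
    """
    # Midpoint of each pair of adjacent lambda values.
    l_mid = [(a + b) / 2 for a, b in zip(lambdas[:-1], lambdas[1:])]
    # Cumulative free energy, starting from zero at the first lambda.
    f = [0.0]
    for step in df:
        f.append(f[-1] + step)
    # Assumption: independent errors, so they add in quadrature.
    errors = [math.sqrt(sum(e * e for e in ddf[:i])) for i in range(len(ddf) + 1)]
    return l_mid, f, errors


l_mid, f, errors = summarize_windows([0.0, 0.5, 1.0], [1.0, 2.0], [3.0, 4.0])
print(l_mid, f, errors)
```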

safep.estimators.get_dG_from_data(data, temperature)

safep.estimators.get_exponential(u_nk): Get exponential estimation of the change in free energy. Arguments: u_nk in alchemlyb format Returns: l[ambdas], l_mid [lambda window midpoints], dG_f [forward estimates], dG_b [backward estimates]
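
For readers unfamiliar with exponential averaging, the following sketch computes the standard Zwanzig estimate, -ln<exp(-du)>, from a list of reduced energy differences for one window. The name `exp_estimate` and the plain-list input are assumptions for illustration; get_exponential itself operates on a full u_nk dataframe.

```python
import math


def exp_estimate(delta_u):
    """Hypothetical helper: exponential-averaging estimate in units of kT.

    delta_u -- reduced energy differences u(next window) - u(this window),
               sampled in this window
    """
    # Shift by the minimum so the exponentials stay well scaled.
    shift = min(delta_u)
    mean_exp = sum(math.exp(-(du - shift)) for du in delta_u) / len(delta_u)
    return shift - math.log(mean_exp)


print(exp_estimate([2.0, 2.0, 2.0]))  # constant differences give back that constant
print(exp_estimate([1.0, 3.0]))       # lies below the arithmetic mean of 2.0
```

A backward estimate would apply the same formula to differences sampled in the neighboring window and evaluated in the opposite direction; the sign convention used by the library is not stated here.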

safep.fileIO module

safep.fileIO.guess_lambda(fname): Guess lambda from the file name (the last number in the file name divided by 100). This heuristic is crude and should not be relied on. Arguments: file name Returns: estimated lambda value
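
A minimal sketch of the heuristic as described, not the library's implementation. The name `guess_lambda_sketch` is hypothetical, and restricting the search to the base name of the path is an assumption.

```python
import re
from pathlib import Path


def guess_lambda_sketch(fname):
    """Hypothetical helper: last integer in the file name, divided by 100."""
    # Assumption: only the base name is searched, not the directories.
    numbers = re.findall(r"\d+", Path(fname).name)
    if not numbers:
        raise ValueError(f"no number found in {fname!r}")
    return int(numbers[-1]) / 100


print(guess_lambda_sketch("run2_075.fepout"))
print(guess_lambda_sketch("rep1.fepout"))  # a replica index is misread as lambda
```

The second call shows why the heuristic is fragile: any trailing number is interpreted as a lambda value, whatever it actually encodes.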

safep.fileIO.read_FEPOUT(fileName, step=1): Read all data from a NAMD fepout file, unlike alchemlyb’s parser. read_FEPOUT reads each file in a single pass, keeping track of lambda values and appending each line to an array. The array is cast to a dataframe only at the end, which avoids repeatedly appending to a dataframe. Arguments: fileName, step (stride) Returns: a dataframe containing all the data in a fepout file.

safep.fileIO.read_Files(files, step=1): Batch version of read_FEPOUT. Arguments: files (paths), step (stride) Returns: a list of dataframes, one dataframe per file

safep.fileIO.read_UNK(filepath): Read a u_nk that was written by save_UNK. Arguments: filepath Returns: u_nk

safep.fileIO.save_UNK(u_nk, filepath, safety=True): Write u_nk to a file Arguments: u_nk in the format of alchemlyb, filepath

safep.helpers module

safep.helpers.cum_Fn(x, m, s): Wrapper for the normal cumulative distribution function

safep.helpers.get_Rsq(X, Y, Yexpected): Calculate the coefficient of determination for arbitrary fits. Arguments: X (inputs), Y (experimental data), Yexpected (fitted or predicted data)
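
The coefficient of determination is conventionally 1 minus the ratio of the residual to the total sum of squares. The sketch below uses that textbook definition; `r_squared` is a hypothetical name, and whether get_Rsq uses exactly this form (or how it uses X) is not stated in this reference.

```python
def r_squared(y, y_expected):
    """Hypothetical helper: textbook coefficient of determination."""
    mean_y = sum(y) / len(y)
    # Residual sum of squares: observed versus fitted values.
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_expected))
    # Total sum of squares: observed values versus their mean.
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot


print(r_squared([1, 2, 3], [1, 2, 3]))  # perfect fit
print(r_squared([1, 2, 3], [2, 2, 2]))  # predicting the mean everywhere
```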

safep.helpers.moving_average(x, w): Generate a moving average over x with window width w Arguments: x (data), w (window width) Returns: a moving window average
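
A moving average can be sketched as follows. The name `moving_average_sketch` is hypothetical, and returning only fully covered windows (so the output is shorter than the input) is an assumption; the library may pad or handle the edges differently.

```python
def moving_average_sketch(x, w):
    """Hypothetical helper: mean over each full window of width w."""
    if w < 1 or w > len(x):
        raise ValueError("window width must be between 1 and len(x)")
    # One output per position where the window fits entirely inside x.
    return [sum(x[i:i + w]) / w for i in range(len(x) - w + 1)]


print(moving_average_sketch([1, 2, 3, 4, 5], 3))
```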

safep.helpers.pdf_Fn(x, m, s): Wrapper for the normal probability density function
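
The two normal-distribution wrappers in this module (cum_Fn and pdf_Fn) correspond to the standard formulas sketched below. The names `normal_pdf` and `normal_cdf` are hypothetical, and reading m as the mean and s as the standard deviation is an assumption based on the argument names.

```python
import math


def normal_pdf(x, m, s):
    """Normal probability density with mean m and standard deviation s."""
    z = (x - m) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2 * math.pi))


def normal_cdf(x, m, s):
    """Normal cumulative distribution function, written with the error function."""
    return 0.5 * (1 + math.erf((x - m) / (s * math.sqrt(2))))


print(normal_pdf(0.0, 0.0, 1.0))
print(normal_cdf(0.0, 0.0, 1.0))
```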

safep.plotting module

safep.plotting.add_hyst_textbox(diff, pdf_x, pdf_y, pdf_ax, xoffset=0.15, yoffset=0.95, fs=14)

Add a textbox to a probability density function (PDF) plot.

Parameters:
- diff (array-like): The differences between forward and backward estimates.
- pdf_x (array-like): The x-values of the PDF plot.
- pdf_y (array-like): The y-values of the PDF plot.
- pdf_ax (matplotlib.axes.Axes): The axes object of the PDF plot.
- xoffset, yoffset (floats): The offsets for placing the textbox (defaults: 0.15 and 0.95, respectively).
- fs: The font size (default: 14).

Returns: - pdf_ax (matplotlib.axes.Axes): The updated axes object with the textbox added.

Description: This function adds a textbox to a probability density function (PDF) plot. The textbox displays the mode, mean, and standard deviation of the differences between forward and backward estimates. The mode is calculated as the x-value at which the PDF reaches its maximum. The mean and standard deviation are calculated using the numpy functions np.average() and np.std(), respectively.

The textbox is positioned at the coordinates (xoffset, yoffset) relative to the axes object’s transformation. The text is formatted using LaTeX syntax and includes the mode, mean, and standard deviation values.

Example:

import matplotlib.pyplot as plt
from safep.plotting import add_hyst_textbox

diff = [0.1, 0.2, 0.3, 0.4, 0.5]
pdf_x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
pdf_y = [0.2, 0.4, 0.6, 0.8, 1.0, 0.8]
pdf_ax = plt.gca()

pdf_ax = add_hyst_textbox(diff, pdf_x, pdf_y, pdf_ax)
plt.show()
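
The three statistics shown in the textbox can be reproduced without matplotlib. The sketch below follows the description above; `hysteresis_stats` is a hypothetical name, and the population standard deviation is used because that is the default behaviour of np.std().

```python
import statistics


def hysteresis_stats(diff, pdf_x, pdf_y):
    """Hypothetical helper: mode, mean, and standard deviation for the textbox."""
    # Mode: the x-value at which the PDF reaches its maximum.
    peak = max(range(len(pdf_y)), key=pdf_y.__getitem__)
    mode = pdf_x[peak]
    mean = statistics.fmean(diff)
    # Population standard deviation, matching np.std() with default arguments.
    std = statistics.pstdev(diff)
    return mode, mean, std


mode, mean, std = hysteresis_stats(
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    [0.2, 0.4, 0.6, 0.8, 1.0, 0.8],
)
print(mode, mean, std)
```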

safep.plotting.convergence_plot(theax, fs, ferr, bs, berr, fwd_color='#0072B2', bwd_color='#D55E00', fwd_legend=None, bwd_legend=None, fontsize=12, errorbars=True): Convergence plot. Plots the forward (fs, ferr) and backward (bs, berr) estimates and their errors on the given axes. Returns: a pyplot

safep.plotting.do_conv_plot(ax, X, fs, ferr, fwd_color, label=None): A minimalist convergence plot Arguments: ax[is], X (the X values), fs (forward estimates), ferr (forward errors), fwd_color (the forward color), label (label text) Returns: ax[is]

safep.plotting.fb_discrepancy_hist(dG_f, dG_b): Plot the distribution of the hysteresis Arguments: dG_f (forward estimates), dG_b (backward estimates) Returns: a pyplot

safep.plotting.fb_discrepancy_plot(l_mid, dG_f, dG_b): Plot the forward-backward discrepancy (“hysteresis”) Arguments: l_mid (lambda window midpoints), dG_f (forward estimates), dG_b (backward estimates) Returns: a pyplot

safep.plotting.plot_general(cumulative, cumulative_ylim, per_window, per_window_ylim, RT, width=8, height=4, pdf_type='KDE', fontsize=12, fig=None, axes=None, hysttype='classic', label=None, color='blue', errorbars=True, xlim=[-1, 1])

Plot the general analysis of cumulative and per-window changes in delta-G.

Arguments:
- cumulative: A pandas DataFrame containing the cumulative changes in delta-G.
- cumulative_ylim: The y-axis limits for the cumulative plot.
- per_window: A pandas DataFrame containing the per-window changes in delta-G.
- per_window_ylim: The y-axis limits for the per-window plot.
- RT: The gas constant times the temperature.
- width: The width of the figure.
- height: The height of the figure.
- pdf_type: The type of probability density function to plot (either ‘KDE’ or ‘Histogram’).
- fontsize: The font size of the plot labels.
- fig: The figure object to use for plotting.
- axes: The axes objects to use for plotting.
- hysttype: The type of hysteresis plot to use (either ‘classic’ or ‘lines’).
- label: The label for the plot.
- color: The color of the plot.
- errorbars: Whether to include error bars in the per-window plot.

Returns: - fig: The figure object. - axes: The axes objects.

safep.plotting.plot_general_legacy(cumulative, cumulative_ylim, perWindow, per_window_ylim, RT, width=8, height=4, PDFtype='KDE', fontsize=12): This is a deprecated plotting function

safep.plotting.plot_hysteresis(axes, per_window, hysttype, color='blue', fontsize=12, pdf_type='KDE', xlim=[-1, 1], textbox=True)

safep.plotting.plot_samples(theax, u_nk, color='blue', label='Raw Data')

Plot the number of samples against the lambda values.

Parameters:

theax (matplotlib.axes.Axes) – The axes object to plot on.
u_nk (pd.DataFrame) – The input data frame containing the samples and lambda values.
color (str, optional) – The color of the plot line. Defaults to ‘blue’.
label (str, optional) – The label for the plot legend. Defaults to ‘Raw Data’.

Returns:

The axes object after plotting the data.

Return type:

matplotlib.axes.Axes

safep.processing module

safep.processing.alt_convergence(u_nk, nbins): An alternative convergence calculation that uses percentile blocks instead of cumulative samples. Inspired by block averages. Uses the BAR estimator. Arguments: u_nk, nbins (number of blocks) Returns: forward estimates, forward errors

safep.processing.batch_process(paths, RT, decorrelate, pattern, temperature, detectEQ): Read and process all NAMD fepout files that match a given pattern Arguments: paths (list of strings), RT (float), decorrelate (boolean, whether or not to use alchemlyb’s decorrelation functions), pattern (to match), temperature, detectEQ (boolean, whether or not to use alchemlyb’s equilibrium detection) Returns: lists of: u_nks, cumulative estimates, perWindow estimates, an affix (that describes the data generation protocol)

safep.processing.bootstrap_estimate(u_nk, estimator='BAR', iterations=100, schedule=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]): Bootstrapped free energy estimates with variable sample sizes. For example, if schedule=[10], each bootstrapped sample will be only 10% of the size of the original. Arguments: u_nk, estimator (BAR or EXP[onential]), iterations (number of bootstrapped datasets to generate for each sample size), schedule (percent data to sample) Returns: if estimator==’BAR’, N estimates for each schedule value. If estimator==’EXP’, N estimates of forward, backward, and the mean of the two for each schedule value
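
The resampling scheme can be illustrated on a plain list of numbers. In this sketch the sample mean stands in for the BAR or EXP estimator, and `bootstrap_sketch`, its `seed` argument, and the dictionary return type are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_sketch(samples, iterations=100, schedule=(10, 50, 100), seed=0):
    """Hypothetical helper: bootstrap estimates at several sample sizes.

    The mean is a stand-in for a free energy estimator.
    """
    rng = random.Random(seed)
    results = {}
    for pct in schedule:
        # Each resampled dataset holds pct percent of the original size.
        size = max(1, len(samples) * pct // 100)
        results[pct] = [
            statistics.fmean(rng.choices(samples, k=size))  # draw with replacement
            for _ in range(iterations)
        ]
    return results


data = [float(i) for i in range(200)]
estimates = bootstrap_sketch(data, iterations=20)
print({pct: len(vals) for pct, vals in estimates.items()})
```

Comparing the spread of the estimates across schedule values gives a sense of how the uncertainty shrinks as more data is used.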

safep.processing.decorrelate_u_nk(u_nk: DataFrame, method='dE') → DataFrame

safep.processing.detect_equilibrium_u_nk(u_nk: DataFrame)

safep.processing.do_convergence(u_nk, tau=1, num_points=10): Convergence calculation. Incrementally adds data from either the start or the end of each window’s simulation and calculates the resulting change in free energy. Arguments: u_nk, tau (an error scaling factor), num_points (number of chunks) Returns: forward-sampled estimate (starting from t=start), forward-sampled error, backward-sampled estimate (from t=end), backward-sampled error
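
The forward and backward sampling pattern can be sketched on a single time series. Here the mean is a stand-in for the free energy estimator, errors and tau are omitted, and `convergence_sketch` is a hypothetical name.

```python
import statistics


def convergence_sketch(samples, num_points=10):
    """Hypothetical helper: estimates from growing slices of the data."""
    n = len(samples)
    forward, backward = [], []
    for k in range(1, num_points + 1):
        size = max(1, n * k // num_points)
        # Forward: the first `size` samples, starting from t=start.
        forward.append(statistics.fmean(samples[:size]))
        # Backward: the last `size` samples, ending at t=end.
        backward.append(statistics.fmean(samples[-size:]))
    return forward, backward


forward, backward = convergence_sketch(list(range(1, 101)), num_points=4)
print(forward)
print(backward)
```

Both series use all of the data at the final point, so they must agree there; how quickly they approach each other before that point is the convergence diagnostic.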

safep.processing.do_per_lambda_convergence(u_nk): Convergence calculation. Incrementally adds data from either the start or the end of each window’s simulation and calculates the resulting change in free energy. Arguments: u_nk Returns: forward-sampled estimate (starting from t=start), forward-sampled error, backward-sampled estimate (from t=end), backward-sampled error

safep.processing.get_PDF(dG_f, dG_b, DiscrepancyFitting='LS', dx=0.01, binNum=20): Estimate the PDF of the discrepancy data by fitting to a Gaussian Arguments: dG_f (forward estimates), dG_b (backward estimates), DiscrepancyFitting (‘LS’ least squares or ‘ML’ maximum likelihood), dx (sample width for fitted Gaussian), binNum (number of bins for histogramming) Returns: (experimental) X, Y, pdfX, pdfY, fitted parameters, (fitted) pdfXnorm, pdfYnorm, pdfYexpected (for each experimental X)

safep.processing.get_empirical_CI(allSamples, CI=0.95): Get empirical confidence intervals from a dataframe Arguments: allSamples (the dataframe), CI (the interval to determine)[0.95] Returns: (upper bounds, lower bounds, means)
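
An empirical confidence interval takes percentiles of the samples directly rather than assuming a distribution. The sketch below works on one flat list; `percentile`, `empirical_ci`, and the linear interpolation rule are assumptions, whereas the library operates on a dataframe.

```python
def percentile(sorted_vals, pct):
    """Linearly interpolated percentile of an already sorted list."""
    pos = (len(sorted_vals) - 1) * pct / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def empirical_ci(samples, ci=0.95):
    """Hypothetical helper: returns (upper bound, lower bound, mean)."""
    ordered = sorted(samples)
    # Percentage of the data left out in each tail.
    tail = (1 - ci) / 2 * 100
    upper = percentile(ordered, 100 - tail)
    lower = percentile(ordered, tail)
    mean = sum(ordered) / len(ordered)
    return upper, lower, mean


upper, lower, mean = empirical_ci(list(range(101)))
print(upper, lower, mean)
```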

safep.processing.get_limits(allSamples): Get the mean and +/- 1 standard deviation for each group in a matrix Arguments: allSamples (the dataframe) Returns: (upper bounds, lower bounds, means)

safep.processing.get_moving_ave_slope(X, FX, window): Estimates the PDF from a moving window average of a CDF Arguments: X (inputs), FX (the CDF), window (width) Returns: slopes
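
Because a PDF is the derivative of its CDF, a smoothed slope of the CDF approximates the PDF. The sketch below takes finite-difference slopes and then averages them over a moving window; `windowed_slopes` is a hypothetical name, and the exact smoothing used by the library is not specified here.

```python
def windowed_slopes(X, FX, window):
    """Hypothetical helper: moving average of finite-difference slopes."""
    # Slope of the CDF between each pair of adjacent points.
    slopes = [
        (FX[i + 1] - FX[i]) / (X[i + 1] - X[i])
        for i in range(len(X) - 1)
    ]
    # Average the slopes over each full window.
    return [
        sum(slopes[i:i + window]) / window
        for i in range(len(slopes) - window + 1)
    ]


print(windowed_slopes([0, 1, 2, 3, 4], [0, 1, 4, 9, 16], 2))
```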

safep.processing.get_n_samples(u_nk: DataFrame) → Series

safep.processing.read_and_process(fepoutFiles, temperature, decorrelate, detectEQ): Read NAMD fepout files for a single calculation and carry out any decorrelation or equilibrium detection Arguments: files to parse, temperature, decorrelate (boolean, whether or not to use alchemlyb’s decorrelation functions), detectEQ (boolean, whether or not to use alchemlyb’s equilibrium detection) Returns: u_nk

safep.processing.subsample(unkGrps, lowPct, hiPct): Subsamples a u_nk dataframe using percentiles [0-100] of data instead of absolute percents. Arguments: unkGrps (u_nk grouped by fep-lambda), lowPct (lower percentile bound), hiPct (upper percentile bound) Returns: partial (a u_nk in which each window is subsampled between the lower and upper percentile bounds)
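
Percentile-based subsampling keeps the same relative slice of every window even when windows hold different numbers of samples. In this sketch a dictionary of lists stands in for the grouped u_nk, `subsample_sketch` is a hypothetical name, and percentiles are assumed to refer to position within each window's ordered samples.

```python
def subsample_sketch(windows, low_pct, hi_pct):
    """Hypothetical helper: keep samples between two percentile bounds.

    windows -- mapping of lambda value to that window's ordered samples
    """
    partial = {}
    for lam, samples in windows.items():
        # Convert percentile bounds to positions within this window.
        start = int(len(samples) * low_pct / 100)
        stop = int(len(samples) * hi_pct / 100)
        partial[lam] = samples[start:stop]
    return partial


windows = {0.0: list(range(10)), 0.5: list(range(20))}
second_half = subsample_sketch(windows, 50, 100)
print(second_half)
```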

safep.processing.u_nk_from_DF(data, temperature, eqTime, warnings=True): Experimental; caveat emptor. Generate an alchemlyb-type u_nk dataframe from a dense dataframe generated by read_FEPOUT Arguments: data[frame from read_FEPOUT], temperature, eqTime (time to equilibrium), warnings (boolean, whether or not to print warnings) Returns: u_nk

Module contents

safep

Tools for Analyzing and Debugging (SA)FEP calculations