safep package
Submodules
safep.AFEP_parse module
Large datasets can be difficult to parse on a workstation due to inefficiencies in the way data is represented for pymbar. When possible, reduce the size of your dataset.
- class safep.AFEP_parse.AFEPArgumentParser(*args, **kwargs)
Bases:
ArgumentParserDedicated CLI argument parser for AFEP.
- path
root path for data folder.
- Type:
str|Path
- fepoutre
regex for fepout files
- Type:
str
- replicare
regex for replica directories
- Type:
str
- temperature
the temperature at which the simulation was run (K)
- Type:
float
- detect_equilibrium
Flag to run automated equilibrium detection and downsampling
- Type:
bool
- fittingMethod
Method for fitting forward-backward discrepancies. Untested.
- Type:
str
- max_size
UNUSED. Maximum size of data to parse. Default 1GB Note: this should be much less than the total RAM available.
- Type:
float
- make_figures
Whether or not to generate figures. Default False.
- Type:
bool
- class safep.AFEP_parse.AFEPArguments(dataroot: Path, replica_pattern: str, filename_pattern: str, temperature: float, detect_equilibrium: bool, make_figures: bool, RT_kcal_per_mol: float = None, replicas: list[str] = None)
Bases:
objectA more readable/testable format for CLI arguments
- RT_kcal_per_mol: float
- dataroot: Path
- detect_equilibrium: bool
- filename_pattern: str
- classmethod from_AFEPArgumentParser(parser: AFEPArgumentParser)
Unpack arguments from an AFEPArgumentParser object.
- Parameters:
parser (AFEPArgumentParser) – An AFEPArgumentParser object.
- Returns:
The unpacked arguments.
- Return type:
- make_figures: bool
- replica_pattern: str
- replicas: list[str]
- temperature: float
- safep.AFEP_parse.add_summary_stats(mean, sterr, axes)
Add summary statistics to a general safep figure.
- Parameters:
mean (float) – the mean free energy across replicas.
sterr (float) – the standard deviation or standard error of the free energy across replicas.
axes (matplotlib.axes) – matplotlib axes object
- Returns:
the modified matplotlib axes object
- Return type:
Axes
- safep.AFEP_parse.add_to_general_figure(fig, axes, args, key, feprun)
Add another replica to an existing figure.
- Parameters:
fig (matplotlib.figure.Figure) – The figure to add to.
axes (matplotlib.axes._subplots.Axes) – The axes to add to.
args (argparse.Namespace) – The parsed command line arguments.
key (str) – The key to use for the figure.
feprun (safep.Feprun) – The feprun replica to use.
- Returns:
The figure and axes that were modified
- Return type:
tuple(Fig, Axes)
- safep.AFEP_parse.do_agg_data(dataax, plotax)
Aggregates data from a given matplotlib axis, computes statistical measures, and displays them on another axis.
Parameters: dataax (matplotlib.axes.Axes): The axis containing lines with data to be aggregated. plotax (matplotlib.axes.Axes): The axis where the statistical summary will be displayed.
Returns: matplotlib.axes.Axes: The plot axis with the statistical summary text added.
- safep.AFEP_parse.do_general_figures_plot(args, fepruns, mean, sterr)
plot general SAFEP figures
- Parameters:
args (AFEPArguments) – arguments from argparse
fepruns (dict) – fepruns dictionary
mean (float) – mean free energy across replicas
sterr (float) – sterr free energy across replicas
- Returns:
figure and axes objects containing the figure panels
- Return type:
fig, axes
Plot lambda convergence for all replicas.
- Parameters:
args (AFEPArguments) – command line arguments
fepruns (dict) – fepruns dictionary
mean (float) – mean value across replicas
sterr (float) – sterr value across replicas
axes (list) – matplotlib axes list
- Returns:
figure and axes objects with the per lambda convergence
- Return type:
tuple[Fig, Axes]
Make the convergence plot (reverse and forward cumulative averages)
- Parameters:
args (AFEPArguments) – command line arguments
fepruns (dict) – fepruns dictionary
dGs (list) – list of free energies
- Returns:
figure and axes with FCA/RCA convergence
- Return type:
tuple(Fig, Axes)
- safep.AFEP_parse.get_summary_statistics(args, fepruns)
Extract summary statistics from fepruns.
- Parameters:
args (AFEPArguments) – parsed command line arguments
fepruns (dict) – dict of fepruns
- Returns:
a pretty print string, individual delta Gs, and the mean and standard error across replicas
- Return type:
tuple[str, list[float], float, float]
- safep.AFEP_parse.initialize_general_figure(RT_kcal_per_mol, key, feprun)
Create a new general figure
- Parameters:
RT_kcal_per_mol (float) – RT in kcal/mol
key (str) – The key of the initial dataset
feprun (safep.Feprun) – the feprun to plot
- Returns:
The figure and axes
- Return type:
Tuple(Fig,Axes)
- safep.AFEP_parse.main()
Main function for parsing fep data and calculating free energy of (de)coupling
- safep.AFEP_parse.make_figures(args, fepruns, dGs, mean, sterr) None
Make general SAFEP figures and convergence plots
- Parameters:
args (AFEPArguments) – arguments from argparse
fepruns (dict) – fepruns dictionary
dGs (list) – list of free energies
mean (float) – mean free energy across replicas
sterr (float) – sterr free energy across replicas
- Returns:
None
- Side effects:
Saves FEP_general_figures.pdf, FEP_convergence.pdf, and FEP_perLambda_convergence.pdf
safep.TI module
- safep.TI.harmonicWall_U(HW, coord, L)
- safep.TI.harmonicWall_dUdL(HW, coord, L)
- safep.TI.make_harmonicWall(FC=10, targetFC=0, targetFE=1, upperWalls=1, schedule=None, numSteps=1000, targetEQ=500, name='HW', lowerWalls=None, lambdaExp=1.0, decoupling=None)
This method should rarely be used, as long as a log file exists that can be parsed by parse_Colvars_log and the result fed to make_harmonicWall_from_Colvars
- safep.TI.make_harmonicWall_from_Colvars(restraint_conf)
Input: a restraint configuration as a dict produced by parse_Colvars_log
- safep.TI.plot_TI(cumulative, perWindow, width=8, height=4, PDFtype='KDE', hystLim=(-1, 1), color='#0072B2', fontsize=12)
- safep.TI.process_TI(dataTI, restraint, Lsched)
Arguments: the TI data, restraint, and lambda schedule Function: Calculate the free energy for each lambda value, aggregate the result, and estimate the error Returns: The free energies and associated errors as functions of lambda. Both per window and cumulative.
safep.estimators module
- safep.estimators.do_estimation(u_nk, method='both')
Run both exponential and BAR estimators and return the results in tidy dataframes. Arguments: u_nk in the alchemlyb format, method of fitting (String: BAR, EXP, or both) Returns: perWindow estimates (including errors), cumulative estimates (including errors)
- safep.estimators.get_BAR(bar)
Extract key information from an alchemlyb.BAR object. Useful for plotting. Arguments: a fitted BAR object Returns: l[ambdas], l_mid [lambda window midpoints], f [the cumulative free energy], df [the per-window free energy], ddf [the per-window errors], errors [the cumulative error]
- safep.estimators.get_dG_from_data(data, temperature)
- safep.estimators.get_exponential(u_nk)
Get exponential estimation of the change in free energy. Arguments: u_nk in alchemlyb format Returns: l[ambdas], l_mid [lambda window midpoints], dG_f [forward estimates], dG_b [backward estimates]
safep.fileIO module
- safep.fileIO.guess_lambda(fname)
Guess lambda based on file name (last number in the filename divided by 100). Not very good. Arguments: file name Returns: estimated lambda value
- safep.fileIO.parse_Colvars_log(filename)
Parses Colvars standard output from a log file
Returns dictionaries containing key-value pairs global_conf: top-level Colvars parameters colvars: list of dicts, one for each colvar biases: list of dicts, one for each bias
Note: within colvars, parameters of sub-objects (components and atom groups) are found as nested dictionaries in a list under the ‘children’ key, e.g.: colvars[0][‘children’][0] is the first component of the first colvar colvars[0][‘children’][0][‘children’][0] is the first atom group of that component
- safep.fileIO.read_FEPOUT(fileName, step=1)
Reads all data from a NAMD fepout file unlike alchemlyb’s parser. readFEPOUT reads each file in a single pass: keeping track of lambda values and appending each line to an array. The array is cast to a dataframe at the end to avoid appending to a dataframe. Arguments: fileName, step (stride) Returns: a dataframe containing all the data in a fepout file.
- safep.fileIO.read_Files(files, step=1)
Batch readFEPOUT Arguments: file (paths), step (stride) Returns: a list of dataframes, one dataframe per file
- safep.fileIO.read_UNK(filepath)
Read a u_nk that was written by saveUNK. Arguments: filepath Returns: u_nk
- safep.fileIO.save_UNK(u_nk, filepath, safety=True)
Write u_nk to a file Arguments: u_nk in the format of alchemlyb, filepath
safep.helpers module
- safep.helpers.cum_Fn(x, m, s)
Wrapper for the normal cumulative density function
- safep.helpers.get_Rsq(X, Y, Yexpected)
Calculate the coefficient of determination for arbitrary fits. Arguments: X (inputs), Y (experimental data), Yexpected (fitted or predicted data)
- safep.helpers.moving_average(x, w)
Generate a moving average over x with window width w Arguments: x (data), w (window width) Returns: a moving window average
- safep.helpers.pdf_Fn(x, m, s)
Wrapper for the normal probability density function
safep.plotting module
- safep.plotting.add_hyst_textbox(diff, pdf_x, pdf_y, pdf_ax, xoffset=0.15, yoffset=0.95, fs=14)
Add a textbox to a probability density function (PDF) plot.
Parameters: - diff (array-like): The differences between forward and backward estimates. - pdf_x (array-like): The x-values of the PDF plot. - pdf_y (array-like): The y-values of the PDF plot. - pdf_ax (matplotlib.axes.Axes): The axes object of the PDF plot. - xoffset, yoffset (floats): the offsets for placing the textbox (default:0.15,0.95, respectively) - fs: the fontsize (default:14)
Returns: - pdf_ax (matplotlib.axes.Axes): The updated axes object with the textbox added.
Description: This function adds a textbox to a probability density function (PDF) plot. The textbox displays the mode, mean, and standard deviation of the differences between forward and backward estimates. The mode is calculated as the x-value at which the PDF reaches its maximum. The mean and standard deviation are calculated using the numpy functions np.average() and np.std(), respectively.
The textbox is positioned at the coordinates (xoffset, yoffset) relative to the axes object’s transformation. The text is formatted using LaTeX syntax and includes the mode, mean, and standard deviation values.
Example: diff = [0.1, 0.2, 0.3, 0.4, 0.5] pdf_x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5] pdf_y = [0.2, 0.4, 0.6, 0.8, 1.0, 0.8] pdf_ax = plt.gca()
pdf_ax = add_hyst_textbox(diff, pdf_x, pdf_y, pdf_ax) plt.show()
- safep.plotting.convergence_plot(theax, fs, ferr, bs, berr, fwd_color='#0072B2', bwd_color='#D55E00', fwd_legend=None, bwd_legend=None, fontsize=12, errorbars=True)
Convergence plot. Does the convergence plotting. Returns: a pyplot
- safep.plotting.do_conv_plot(ax, X, fs, ferr, fwd_color, label=None)
A minimalist convergence plot Arguments: ax[is], X (the X values), fs (forward estimates), ferr (forward errors), fwdColor (the forward color), label text Returns: ax[is]
- safep.plotting.fb_discrepancy_hist(dG_f, dG_b)
Plot the distribution of the hystersis Arguments: dG_f, dG_b Returns: a pyplot
- safep.plotting.fb_discrepancy_plot(l_mid, dG_f, dG_b)
Plot the forward-backward discrepancy (“hysteresis”) Arguments: l_mid (lambda window midpoints), dG_f (forward estimates), dG_b (backward estimates) Returns: a pyplot
- safep.plotting.plot_general(cumulative, cumulative_ylim, per_window, per_window_ylim, RT, width=8, height=4, pdf_type='KDE', fontsize=12, fig=None, axes=None, hysttype='classic', label=None, color='blue', errorbars=True, xlim=[-1, 1])
Plot the general analysis of cumulative and per-window changes in delta-G.
Arguments: - cumulative: A pandas DataFrame containing the cumulative changes in delta-G. - cumulativeYlim: The y-axis limits for the cumulative plot. - perWindow: A pandas DataFrame containing the per-window changes in delta-G. - perWindowYlim: The y-axis limits for the per-window plot. - RT: The gas constant times the temperature. - width: The width of the figure. - height: The height of the figure. - PDFtype: The type of probability density function to plot (either ‘KDE’ or ‘Histogram’). - fontsize: The font size of the plot labels. - fig: The figure object to use for plotting. - axes: The axes objects to use for plotting. - hysttype: The type of hysteresis plot to use (either ‘classic’ or ‘lines’). - label: The label for the plot. - color: The color of the plot. - errorbars: Whether to include error bars in the per-window plot.
Returns: - fig: The figure object. - axes: The axes objects.
- safep.plotting.plot_general_legacy(cumulative, cumulative_ylim, perWindow, per_window_ylim, RT, width=8, height=4, PDFtype='KDE', fontsize=12)
This is a deprecated plotting function
- safep.plotting.plot_hysteresis(axes, per_window, hysttype, color='blue', fontsize=12, pdf_type='KDE', xlim=[-1, 1], textbox=True)
- safep.plotting.plot_samples(theax, u_nk, color='blue', label='Raw Data')
Plot the number of samples against the lambda values.
- Parameters:
theax (matplotlib.axes.Axes) – The axes object to plot on.
u_nk (pd.DataFrame) – The input data frame containing the samples and lambda values.
color (str, optional) – The color of the plot line. Defaults to ‘blue’.
label (str, optional) – The label for the plot legend. Defaults to ‘Raw Data’.
- Returns:
The axes object after plotting the data.
- Return type:
matplotlib.axes.Axes
safep.processing module
- safep.processing.alt_convergence(u_nk, nbins)
An alternative convergence calculation that uses percentile blocks instead of cumulative samples. Inspired by block averages. Uses the BAR estimator. Arguments: u_nk, nbins (number of blocks) Returns: forward estimates, forward errors
- safep.processing.batch_process(paths, RT, decorrelate, pattern, temperature, detectEQ)
Read and process all NAMD FEPout files that match a given pattern Arguments: paths (list of strings), RT (float), decorrelate (boolean, whether or not to use alchemlyb’s decorrelation functions), pattern (to match), temperature, detectEQ (boolean, whether or not to use alchemlyb’s equilibrium detection) Returns: lists of: u_nks, cumulative estimates, perWindow estimates, an affix (that describes the data generation protocol)
- safep.processing.bootstrap_estimate(u_nk, estimator='BAR', iterations=100, schedule=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
Bootstrapped free energy estimates with variable sample sizes. E.g. if schedule=[10], each bootstrapped sample will only be 10% the size of the original Arguments: u_nk, estimator (BAR or EXP[onential]), iterations (number of bootstrapped datasets to generate for each sample size), schedule (percent data to sample) Returns: if estimator==’BAR’, N estimates for each schedule value. if estimator==’EXP’, N estimates of forward, backward, and the mean of the two for each schedule value
- safep.processing.decorrelate_u_nk(u_nk: DataFrame, method='dE') DataFrame
Decorrelate each window without equilibrium detection.
- Parameters:
u_nk (pd.DataFrame) – the energies in the alchemlyb style
method (str, optional) – decorrelation method. See subsampling.decorrelate_u_nk. Defaults to “dE”.
- Returns:
the decorrelated energies in the alchemlyb style
- Return type:
pd.DataFrame
- safep.processing.detect_equilibrium_u_nk(u_nk: DataFrame)
- safep.processing.do_convergence(u_nk, tau=1, num_points=10, estimator='BAR')
Convergence calculation. Incrementally adds data from either the start or the end of each windows simulation and calculates the resulting change in free energy. Arguments: u_nk, tau (an error scaling factor), num_points (number of chunks) Returns: forward-sampled estimate (starting from t=start), forward-sampled error, backward-sampled estimate (from t=end), backward-sampled error
- safep.processing.do_per_lambda_convergence(u_nk)
Convergence calculation. Incrementally adds data from either the start or the end of each windows simulation and calculates the resulting change in free energy. Arguments: u_nk, tau (an error scaling factor), num_points (number of chunks) Returns: forward-sampled estimate (starting from t=start), forward-sampled error, backward-sampled estimate (from t=end), backward-sampled error
- safep.processing.get_PDF(dG_f, dG_b, DiscrepancyFitting='LS', dx=0.01, binNum=20)
Estimate the PDF of the discrepancy data by fitting to a gaussian Arguments: Forward estimates, backward estimates, DiscrepancyFitting (‘LS’ least squares of ‘ML’ maximum likelihood), dx (sample width for fitted gaussian), binNum (number of bins for histogramming) Returns: (experimental) X, Y, pdfX, pdfY, fitted parameters, (fitted) pdfXnorm, pdfYnorm, pdfYexpected (for each experimental X)
- safep.processing.get_empirical_CI(allSamples, CI=0.95)
Get empirical confidence intervals from a dataframe Arguments: allSamples (the dataframe), CI (the interval to determine)[0.95] Returns: (upper bounds, lower bounds, means)
- safep.processing.get_limits(allSamples)
Get the mean and +/- 1 standard deviation for each group in a matrix Arguments: allSamples (the dataframe) Returns: (upper bounds, lower bounds, means)
- safep.processing.get_moving_ave_slope(X, FX, window)
Estimates the PDF from a moving window average of a CDF Arguments: X (inputs), FX (the CDF), window (width) Returns: slopes
- safep.processing.get_n_samples(u_nk: DataFrame) Series
- safep.processing.read_and_process(fepoutFiles, temperature, decorrelate, detectEQ)
Read NAMD fepout files for a single calculation and carry out any decorrelation or equilibrium detection Arguments: files to parse, temperature, decorrelate (boolean, whether or not to use alchemlyb’s decorrelation functions), detectEQ (boolean, whether or not to use alchemlyb’s equilibrium detection) Returns: u_nk
- safep.processing.subsample(unkGrps, lowPct, hiPct)
Subsamples a u_nk dataframe using percentiles [0-100] of data instead of absolute percents. Arguments: unkGrps (u_nk grouped by fep-lambda), lowPct (lower percentile bound), hiPct (upper percentile bound) Returns: partial (a u_nk in which each window is subsampled between the lower and upper percentile bounds)
- safep.processing.u_nk_from_DF(data, temperature, eqTime, warnings=True)
Experimental. caveat emptor. Generate an alchemlyb-type u_nk dataframe from a dense dataframe generated by readFEPOUT Arguments: data[frame from readFEPOUT], temperature, eqTime (time to equilibrium), warnings (boolean, whether or not to print warnings) Returns: u_nk
Module contents
safep
Tools for Analyzing and Debugging (SA)FEP calculations