gnt package

Submodules

gnt.cli module

Console script for gnt.

gnt.score module

Score module: functions to score genetic interactions

gnt.score.aggregate_guide_lfcs(df)

gnt.score.build_anchor_df(df)

Build a dataframe where each guide is defined as an anchor and is paired with all other target guides

Parameters:	df (DataFrame) – LFC df
Returns:
Return type:	DataFrame
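
As a rough illustration of the anchor layout described above, the sketch below lists every guide pair twice, once with each guide in the anchor role. It is not the package's implementation: the input column names (`guide1`, `guide2`, `gene1`, `gene2`), the `target_guide`/`target_gene` output names, and the toy data are all assumptions.

```python
import pandas as pd

# Hypothetical long-format LFC data; all names and values are made up.
melted = pd.DataFrame({
    "guide1": ["gA_1", "gA_1"],
    "guide2": ["gB_1", "ctl_1"],
    "gene1": ["A", "A"],
    "gene2": ["B", "CTRL"],
    "condition": ["day21", "day21"],
    "lfc": [-1.2, -0.4],
})

def build_anchor_sketch(df):
    """Return each pair twice so that every guide appears as an anchor."""
    forward = df.rename(columns={
        "guide1": "anchor_guide", "guide2": "target_guide",
        "gene1": "anchor_gene", "gene2": "target_gene",
    })
    reverse = df.rename(columns={
        "guide2": "anchor_guide", "guide1": "target_guide",
        "gene2": "anchor_gene", "gene1": "target_gene",
    })
    cols = ["anchor_guide", "target_guide", "anchor_gene",
            "target_gene", "condition", "lfc"]
    # Stack both orientations; drop exact duplicates (e.g. self-pairs).
    stacked = pd.concat([forward[cols], reverse[cols]], ignore_index=True)
    return stacked.drop_duplicates().reset_index(drop=True)

anchor_df = build_anchor_sketch(melted)
```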

gnt.score.calc_dlfc(df, base_lfcs)

Add together base LFCs to generate an expectation for each guide pair

Parameters:

df (DataFrame) – Combo level LFCs
base_lfcs (DataFrame) – Base LFCs (single guide phenotype)

Returns:	delta LFCs for each guide pair
Return type:	DataFrame
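
The additive expectation can be sketched as follows. This is a minimal illustration, assuming the delta LFC is the observed pair LFC minus the sum of the two base LFCs; the column names (`guide1`, `guide2`, `guide`, `base_lfc`, `sum_lfc`, `dlfc`) and the values are hypothetical and may not match the package.

```python
import pandas as pd

# Hypothetical combination-level LFCs and single-guide base LFCs.
combo = pd.DataFrame({
    "guide1": ["gA_1", "gA_1"],
    "guide2": ["gB_1", "gC_1"],
    "lfc": [-2.0, -0.5],
})
base = pd.DataFrame({
    "guide": ["gA_1", "gB_1", "gC_1"],
    "base_lfc": [-0.4, -0.6, -0.1],
})

def dlfc_sketch(df, base_lfcs):
    """Observed LFC minus the sum of the two guides' base LFCs."""
    base_map = base_lfcs.set_index("guide")["base_lfc"]
    out = df.copy()
    # Expected phenotype under the additive model.
    out["sum_lfc"] = out["guide1"].map(base_map) + out["guide2"].map(base_map)
    # Deviation from the expectation.
    out["dlfc"] = out["lfc"] - out["sum_lfc"]
    return out

dlfc_df = dlfc_sketch(combo, base)
```

In this toy data the first pair is more negative than expected (a negative delta LFC), while the second matches its expectation.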

gnt.score.check_guide_input(df)

Check whether input DataFrame has expected format

Column 1: first guide identifier
Column 2: second guide identifier
Column 3: first gene identifier
Column 4: second gene identifier
Column 5+: LFC values from different conditions

Parameters:	df (DataFrame) – LFC data with the specified columns
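
For orientation, a toy DataFrame in the layout listed above might look like the one below. The guide, gene, and condition names are made up, and `looks_like_guide_input` is a hypothetical stand-in for the kind of check this function might perform, not the package's actual logic.

```python
import pandas as pd

# Columns 1-4 are identifiers; columns 5+ hold LFCs, one per condition.
lfc_df = pd.DataFrame({
    "guide1": ["gA_1", "gA_1", "gB_1"],
    "guide2": ["gB_1", "ctl_1", "ctl_1"],
    "gene1": ["A", "A", "B"],
    "gene2": ["B", "CTRL", "CTRL"],
    "day21_repA": [-1.2, -0.4, -0.3],
    "day21_repB": [-1.0, -0.5, -0.2],
})

def looks_like_guide_input(df):
    """Illustrative check: four id columns followed by numeric LFC columns."""
    if df.shape[1] < 5:
        return False
    lfc_cols = df.columns[4:]
    return all(pd.api.types.is_numeric_dtype(df[c]) for c in lfc_cols)

ok = looks_like_guide_input(lfc_df)
```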

gnt.score.check_min_guide_pairs(df, min_pairs)

Check that each guide is paired with a minimum number of guides

Parameters:

df (DataFrame) – Anchor df with column anchor_guide
min_pairs (int) – Minimum number of guides to be paired with

Returns:	Guides that are paired with fewer than min_pairs guides
Return type:	List
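
One way to express this check, assuming the anchor DataFrame also carries a `target_guide` column (a hypothetical name), is to count distinct partners per anchor guide:

```python
import pandas as pd

# Hypothetical anchor table: g1 has three partners, g2 has one.
anchor_df = pd.DataFrame({
    "anchor_guide": ["g1", "g1", "g1", "g2"],
    "target_guide": ["g2", "g3", "g4", "g1"],
})

def guides_below_min_pairs(df, min_pairs):
    """List anchor guides paired with fewer than min_pairs distinct guides."""
    n_pairs = df.groupby("anchor_guide")["target_guide"].nunique()
    return n_pairs[n_pairs < min_pairs].index.tolist()

low = guides_below_min_pairs(anchor_df, min_pairs=2)
```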

gnt.score.combine_z_scores(df, stat)

Combine z-scores for a gene pair

Sum z-scores and divide by the square root of the number of observations

Parameters:

df (DataFrame) – Guide level statistic
stat (str) – Statistic to combine

Returns:	z_score for gene combination
Return type:	DataFrame
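
The combination rule above (sum of z-scores divided by the square root of their count) can be sketched with a pandas group-by. The grouping columns `gene_a`, `gene_b`, and `condition` are assumptions for illustration; only the `dlfc_z` statistic name appears elsewhere in this reference.

```python
import numpy as np
import pandas as pd

# Hypothetical guide-level z-scores for two gene pairs.
guide_df = pd.DataFrame({
    "gene_a": ["A", "A", "A", "A", "C"],
    "gene_b": ["B", "B", "B", "B", "D"],
    "condition": ["day21"] * 5,
    "dlfc_z": [1.0, 2.0, 3.0, 2.0, -1.5],
})

def combine_z_sketch(df, stat):
    """Combine guide-level z-scores per gene pair: sum(z) / sqrt(n)."""
    grouped = df.groupby(["gene_a", "gene_b", "condition"])[stat]
    combined = grouped.sum() / np.sqrt(grouped.count())
    return combined.rename("z_score").reset_index()

gene_z = combine_z_sketch(guide_df, "dlfc_z")
```

With four guide pairs summing to 8, the A-B pair gets a combined score of 8 / sqrt(4) = 4.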

gnt.score.filter_anchor_base_scores(df, min_pairs)

Filter out guides that are not in at least a minimum number of pairs

Parameters:

df (DataFrame) – DataFrame with column anchor_guide
min_pairs (int) – Minimum number of pairs for an anchor_guide

Returns:	DataFrame filtered to remove any guides without the minimum number of guide pairs
Return type:	DataFrame

gnt.score.fit_anchor_model(df, fit_genes, model, deg, x_col='lfc_target', y_col='lfc')

Fit linear model for a single anchor guide paired with all target guides in a condition

Parameters:

df (DataFrame) – LFCs for a single anchor guide
fit_genes (list) – Genes used to fit the linear model
model (str) – Name of model used to fit x and y data
x_col (str, optional) – X column to fit model
y_col (str, optional) – Y column to fit model
deg (int) – Degrees of freedom for spline model

Returns:

DataFrame – Guide level residuals
DataFrame – R^2 for model
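
A minimal sketch of the idea, assuming an ordinary least-squares line fitted with `numpy.polyfit` rather than the package's own model functions: the line is fitted only on rows whose target gene is in `fit_genes`, and residuals are then computed for every row. The `target_gene` column name and the toy values are hypothetical; `lfc_target` and `lfc` are the default column names from the signature above.

```python
import numpy as np
import pandas as pd

# Hypothetical data for one anchor guide in one condition.
anchor_df = pd.DataFrame({
    "target_gene": ["B", "C", "D", "E"],
    "lfc_target": [0.0, -1.0, -2.0, -1.0],   # target guide's own phenotype
    "lfc": [-0.5, -1.5, -2.5, -3.5],         # observed pair phenotype
})

def residuals_sketch(df, fit_genes, x_col="lfc_target", y_col="lfc"):
    """Fit y ~ x on fit_genes only, then score every row against the fit."""
    train = df[df["target_gene"].isin(fit_genes)]
    slope, intercept = np.polyfit(train[x_col].to_numpy(),
                                  train[y_col].to_numpy(), deg=1)
    out = df.copy()
    out["predicted"] = slope * out[x_col] + intercept
    out["residual"] = out[y_col] - out["predicted"]
    return out

resid_df = residuals_sketch(anchor_df, fit_genes=["B", "C", "D"])
```

Here gene E is held out of the fit, so its strongly negative residual is not absorbed into the fitted line.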

gnt.score.fit_models(df, fit_genes, model, deg)

Loop through anchor guides and fit a linear model

Parameters:

df (DataFrame) – LFCs for all anchor guides
fit_genes (list) – Genes used to fit the linear model
model (str) – Name of model used to fit x and y data
deg (int) – Degrees of freedom for spline model

Returns:

DataFrame – Guide level residuals
DataFrame – R^2 for each model

gnt.score.get_avg_score(df, score)

Get average score for gene pairs

Parameters:

df (DataFrame) – Guide-level DataFrame
score (str) – Column to average

Returns:	Average score for gene pairs
Return type:	DataFrame

gnt.score.get_base_lfc_from_dlfc(dlfc_df)

Calculate gene level base lfcs from the guide-level dLFC dataframe

Parameters:	dlfc_df (DataFrame) – DataFrame of delta log-fold changes
Returns:	With columns gene, condition, base_lfc
Return type:	DataFrame

gnt.score.get_base_lfc_from_resid(residual_df)

Calculate gene level base lfcs from the guide-level residual dataframe

Parameters:	residual_df (DataFrame) – DataFrame of residuals
Returns:	With columns gene, condition, base_lfc
Return type:	DataFrame

gnt.score.get_base_score(df, ctl_genes)

Get LFC of each guide when paired with controls

Parameters:

df (DataFrame) – Anchor guides paired with all of their target guides. Has columns “anchor_guide” and “lfc”
ctl_genes (list) – Control genes

Returns:	Median LFC for each anchor guide
Return type:	DataFrame
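
The base score can be pictured as a median over control pairings. The sketch below assumes the anchor DataFrame has `target_gene` and `condition` columns and names the result `base_lfc`; those names and the data are illustrative, not taken from the package source.

```python
import pandas as pd

# Hypothetical anchor table: gA_1 paired with three controls and one gene.
anchor_df = pd.DataFrame({
    "anchor_guide": ["gA_1", "gA_1", "gA_1", "gA_1"],
    "target_gene": ["CTRL", "CTRL", "CTRL", "B"],
    "condition": ["day21"] * 4,
    "lfc": [-0.5, -0.3, -0.4, -2.0],
})

def base_score_sketch(df, ctl_genes):
    """Median LFC of each anchor guide across its control pairings."""
    ctl_rows = df[df["target_gene"].isin(ctl_genes)]
    return (ctl_rows.groupby(["anchor_guide", "condition"])["lfc"]
            .median()
            .rename("base_lfc")
            .reset_index())

base = base_score_sketch(anchor_df, ctl_genes=["CTRL"])
```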

gnt.score.get_gene_dlfcs(guide_dlfcs, stat='dlfc_z')

Combine dLFCs at the gene level

Parameters:

guide_dlfcs (DataFrame) – Guide level dLFCs
stat (str, optional) – Statistic to combine at the gene level

Returns:	Gene level z_scores
Return type:	DataFrame

gnt.score.get_gene_residuals(guide_residuals, stat='residual_z')

Combine residuals at the gene level

Parameters:

guide_residuals (DataFrame) – Guide level residuals
stat (str, optional) – Statistic to combine at the gene level

Returns:	Gene level z_scores
Return type:	DataFrame

gnt.score.get_guide_dlfcs(lfc_df, ctl_genes)

Calculate delta-LFCs at the guide level

Model the LFC of each combination as the sum of the LFCs of each guide when paired with controls. The difference from this expectation is the delta log2-fold change.

Parameters:

lfc_df (DataFrame) – LFCs from combinatorial screen
Column 1: first guide identifier
Column 2: second guide identifier
Column 3: first gene identifier
Column 4: second gene identifier
Column 5+: LFC values from different conditions
ctl_genes (list) – Negative control genes (e.g. nonessential, intronic, or no site)

Returns:	delta LFCs for guide pairs
Return type:	DataFrame

gnt.score.get_guide_residuals(lfc_df, ctl_genes, fit_genes=None, min_pairs=5, model='linear', deg=6)

Calculate guide-level residuals

Parameters:

lfc_df (DataFrame) – LFCs from combinatorial screen
* Column 1 - first guide identifier
* Column 2 - second guide identifier
* Column 3 - first gene identifier
* Column 4 - second gene identifier
* Column 5+ - LFC values from different conditions
ctl_genes (list) – Negative control genes (e.g. nonessential, intronic, or no site)
fit_genes (list, optional) – Genes used to train each linear model. If None, uses all genes to fit. This can be helpful if we expect a large fraction of target_genes to be interactors
min_pairs (int, optional) – Minimum number of pairs a guide must be in
model (str, optional) – Name of model to fit to data
deg (int, optional) – Degrees of freedom for spline model. Ignored if model is not spline

Returns:

DataFrame – Guide level residuals
DataFrame – R-squared and f-statistic p-value for each linear model

gnt.score.get_no_control_guides(df, guide_base_score)

Get guides that are not paired with controls

Parameters:

df (DataFrame) – DataFrame with the column anchor_gene
guide_base_score (DataFrame) – Guide scores when paired with controls

Returns:	Guides without control pairs
Return type:	list

gnt.score.get_pop_stats(df, stat)

Get mean and standard deviation for a stat

Parameters:

df (DataFrame) – Guide level scores
stat (str) – Column to calculate statistics from

Returns:	Mean and standard deviation of the specified stat
Return type:	DataFrame
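
As an illustration only, population statistics of this kind could be computed per condition as below. Grouping by a `condition` column and using the sample standard deviation (pandas' default, `ddof=1`) are assumptions; the package may group or normalise differently.

```python
import pandas as pd

# Hypothetical guide-level scores in a single condition.
scores = pd.DataFrame({
    "condition": ["day21", "day21", "day21"],
    "dlfc": [1.0, 2.0, 3.0],
})

def pop_stats_sketch(df, stat):
    """Mean and sample standard deviation of a column within each condition."""
    return df.groupby("condition")[stat].agg(["mean", "std"]).reset_index()

stats = pop_stats_sketch(scores, "dlfc")
```

Statistics like these are typically what a guide-level score is standardised against to obtain a z-score.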

gnt.score.get_removed_guides_genes(anchor_df, guides)

Get dataframe of removed guides and genes

Parameters:

anchor_df (DataFrame) –
guides (list) – List of guides being removed

Returns:
Return type:	DataFrame

gnt.score.join_anchor_base_score(anchor_df, base_df)

Join anchor DataFrame with base LFCs on anchor_guide

Parameters:

anchor_df (DataFrame) – DataFrame with anchor and target guides
base_df (DataFrame) – Base LFC DataFrame

Returns:
Return type:	DataFrame

gnt.score.melt_df(df, id_cols=None, var_name='condition', value_name='lfc')

Helper function to melt a DataFrame

Parameters:

df (DataFrame) – Log-fold change DataFrame
id_cols (list, optional) – Specify id columns. If None, then the first four columns are used
var_name (str, optional) – New variable name
value_name (str, optional) – New value name

Returns:
Return type:	DataFrame
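
The behaviour described here maps closely onto pandas' `DataFrame.melt`. The following sketch mirrors the documented defaults (first four columns as identifiers, `condition` and `lfc` as the new column names) but is a reimplementation for illustration, not the package's code; the input data is made up.

```python
import pandas as pd

# Hypothetical wide-format input: four id columns, then one column per condition.
wide = pd.DataFrame({
    "guide1": ["gA_1", "gA_1"],
    "guide2": ["gB_1", "ctl_1"],
    "gene1": ["A", "A"],
    "gene2": ["B", "CTRL"],
    "day7": [-0.2, -0.1],
    "day21": [-1.2, -0.4],
})

def melt_sketch(df, id_cols=None, var_name="condition", value_name="lfc"):
    """Reshape wide LFC columns into one row per guide pair and condition."""
    if id_cols is None:
        id_cols = df.columns[:4].tolist()
    return df.melt(id_vars=id_cols, var_name=var_name, value_name=value_name)

long_df = melt_sketch(wide)
```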

gnt.score.model_fixed_slope(train_x, train_y, test_x, slope=1)

Predict guide phenotype using fixed slope

From: https://stackoverflow.com/questions/33292969/linear-regression-with-specified-slope

Parameters:

train_x (Series) – Single gene phenotype for training genes
train_y (Series) – Pair phenotype for training genes
test_x (Series) – Single gene phenotype for testing genes
slope (int) – Slope to fit model

Returns:

Series – Predicted phenotype of pair
DataFrame – Information about model
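
With the slope held fixed, the least-squares intercept reduces to the mean of `train_y - slope * train_x`. The sketch below applies that identity; it is an illustration of the technique rather than the package's implementation, and it returns the intercept as a plain float instead of a model-information DataFrame.

```python
import pandas as pd

def fixed_slope_sketch(train_x, train_y, test_x, slope=1):
    """Fit only the intercept of y = slope * x + b, then predict for test_x."""
    intercept = (train_y - slope * train_x).mean()
    predictions = slope * test_x + intercept
    return predictions, intercept

# Hypothetical training data lying exactly on y = x + 1.
train_x = pd.Series([0.0, 1.0, 2.0])
train_y = pd.Series([1.0, 2.0, 3.0])
test_x = pd.Series([3.0])

pred, b = fixed_slope_sketch(train_x, train_y, test_x)
```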

gnt.score.model_linear(train_x, train_y, test_x)

Predict guide phenotype using a linear model and ordinary least squares

Parameters:

train_x (Series) – Single gene phenotype for training genes
train_y (Series) – Pair phenotype for training genes
test_x (Series) – Single gene phenotype for testing genes

Returns:

Series – Predicted phenotype of pair
DataFrame – Information about OLS model

gnt.score.model_quadratic(train_x, train_y, test_x)

Predict guide phenotype using a quadratic model and ordinary least squares

Parameters:

train_x (Series) – Single gene phenotype for training genes
train_y (Series) – Pair phenotype for training genes
test_x (Series) – Single gene phenotype for testing genes

Returns:

Series – Predicted phenotype of pair
DataFrame – Information about OLS model
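
For comparison with the linear case, a second-degree least-squares fit can be sketched with `numpy.polyfit`. This is an assumption about what a quadratic model means here, not the package's implementation, and the data is synthetic.

```python
import numpy as np
import pandas as pd

def quadratic_sketch(train_x, train_y, test_x):
    """Fit y = a*x^2 + b*x + c by least squares and predict for test_x."""
    coefs = np.polyfit(train_x.to_numpy(), train_y.to_numpy(), deg=2)
    predicted = np.polyval(coefs, test_x.to_numpy())
    return pd.Series(predicted, index=test_x.index), coefs

# Synthetic training data lying exactly on y = x^2.
train_x = pd.Series([-2.0, -1.0, 0.0, 1.0])
train_y = train_x ** 2
test_x = pd.Series([2.0])

pred, coefs = quadratic_sketch(train_x, train_y, test_x)
```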

gnt.score.model_spline(train_x, train_y, test_x, deg)

Predict guide phenotype using a natural cubic spline

Parameters:

train_x (Series) – Single gene phenotype for training genes
train_y (Series) – Pair phenotype for training genes
test_x (Series) – Single gene phenotype for testing genes
deg (int) – Degrees of freedom for spline model

Returns:

Series – Predicted phenotype of pair
DataFrame – Information about GLM model

gnt.score.order_cols(df, cols, name)

Reorder values in columns to be in alphabetical order

Parameters:

df (DataFrame) – DataFrame with at least two columns to be reordered in alphabetical order
cols (List) – Indices of columns to be reordered
name (str) – Name of reordered column

Returns:	With columns in alphabetical order
Return type:	DataFrame
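
Row-wise alphabetical ordering makes a pair such as (B, A) identical to (A, B), which is useful before grouping by gene pair. The sketch below shows one way to do this; the output naming scheme (`<name>_1`, `<name>_2`) and column placement are assumptions and may differ from the package.

```python
import pandas as pd

# Hypothetical pairs with inconsistent gene order.
pairs = pd.DataFrame({
    "gene1": ["B", "A"],
    "gene2": ["A", "C"],
    "lfc": [-1.0, -0.5],
})

def order_cols_sketch(df, cols, name):
    """Sort the values of the selected columns alphabetically within each row."""
    rows = df.iloc[:, cols].itertuples(index=False, name=None)
    ordered = [sorted(row) for row in rows]
    new_names = [f"{name}_{i + 1}" for i in range(len(cols))]
    ordered_df = pd.DataFrame(ordered, columns=new_names, index=df.index)
    # Keep the remaining columns alongside the reordered ones.
    rest = df.drop(columns=df.columns[cols])
    return pd.concat([ordered_df, rest], axis=1)

ordered = order_cols_sketch(pairs, cols=[0, 1], name="gene")
```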

gnt.score.order_cols_with_meta(df, cols, meta_cols, col_name, meta_name)

Reorder values in columns to be in alphabetical order, keeping track of columns with meta-information

Parameters:

df (DataFrame) – DataFrame with at least two columns to be reordered in alphabetical order
cols (List) – Indices of columns to be reordered
meta_cols (List) – Indices of columns with meta information, ordered the same as cols
col_name (str) – Base name of reordered columns
meta_name (str) – Base name of meta information columns

Returns:	With columns in alphabetical order and their respective meta info
Return type:	DataFrame

gnt.score.remove_guides(df, rm_guides)

Remove guides from DataFrame

Parameters:

df (DataFrame) – DataFrame with columns for guides
rm_guides (list) – List of guides to remove

Returns:	Filtered to remove guides
Return type:	DataFrame
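
A plausible reading of this filter, assuming the guide columns are named `anchor_guide` and `target_guide` (the latter is a hypothetical name), is to drop any row in which either guide is on the removal list:

```python
import pandas as pd

# Hypothetical anchor table with three guide pairs.
df = pd.DataFrame({
    "anchor_guide": ["g1", "g2", "g3"],
    "target_guide": ["g2", "g3", "g1"],
    "lfc": [-1.0, -0.2, 0.1],
})

def remove_guides_sketch(df, rm_guides):
    """Drop rows where the anchor or the target guide is in rm_guides."""
    in_anchor = df["anchor_guide"].isin(rm_guides)
    in_target = df["target_guide"].isin(rm_guides)
    return df[~(in_anchor | in_target)].reset_index(drop=True)

kept = remove_guides_sketch(df, ["g1"])
```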

gnt.score.warn_no_control_guides(anchor_df, no_control_guides)

Give warning for guides without control pairs

Parameters:

anchor_df (DataFrame) –
no_control_guides (list) –

Warning

Guides without control pairs

Module contents

Top-level package for gnt. Imports the score module.