gnt package

Submodules

gnt.cli module

Console script for gnt.

gnt.score module

score module. Functions to score genetic interactions

gnt.score.aggregate_guide_lfcs(df)[source]
gnt.score.build_anchor_df(df)[source]

Build a dataframe where each guide is defined as an anchor and is paired with all other target guides

Parameters:df (DataFrame) – LFC df
Returns:
Return type:DataFrame
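
The anchor layout described above can be pictured with a small pandas sketch. It is not the package's implementation: it assumes a long-format table, and the input column names and the target_guide / target_gene columns are hypothetical. Each guide pair is emitted twice, once with each guide as the anchor.

```python
import pandas as pd

# Hypothetical long-format LFC table; all column names here are assumptions
lfc_long = pd.DataFrame({
    "guide_1": ["gA1", "gA1"],
    "guide_2": ["gB1", "gCtl1"],
    "gene_1": ["A", "A"],
    "gene_2": ["B", "CTL"],
    "condition": ["cond_1", "cond_1"],
    "lfc": [-2.0, -0.5],
})

def build_anchor_df_sketch(df):
    """Return one row per (anchor, target) ordering of every guide pair."""
    # First orientation: guide 1 is the anchor, guide 2 is the target
    forward = df.rename(columns={
        "guide_1": "anchor_guide", "guide_2": "target_guide",
        "gene_1": "anchor_gene", "gene_2": "target_gene",
    })
    # Second orientation: guide 2 is the anchor, guide 1 is the target
    reverse = df.rename(columns={
        "guide_2": "anchor_guide", "guide_1": "target_guide",
        "gene_2": "anchor_gene", "gene_1": "target_gene",
    })
    cols = ["anchor_guide", "target_guide", "anchor_gene",
            "target_gene", "condition", "lfc"]
    return pd.concat([forward[cols], reverse[cols]], ignore_index=True)

anchor_df = build_anchor_df_sketch(lfc_long)
```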
gnt.score.calc_dlfc(df, base_lfcs)[source]

Add together base lfcs to generate an expectation for each guide pair

Parameters:
  • df (DataFrame) – Combo level LFCs
  • base_lfcs (DataFrame) – Base LFCs - single guide phenotype
Returns:

delta LFCs for each guide pair

Return type:

DataFrame
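
As an illustration of the additive expectation, the sketch below sums the two single-guide base LFCs and subtracts that sum from the observed combination LFC. The column names (guide_1, guide_2, guide, base_lfc, sum_lfc, dlfc) and the merge strategy are assumptions made for the example, not the package's actual schema.

```python
import pandas as pd

# Hypothetical combination-level LFCs
combo = pd.DataFrame({
    "guide_1": ["gA1", "gA1"],
    "guide_2": ["gB1", "gC1"],
    "condition": ["cond_1", "cond_1"],
    "lfc": [-2.0, -0.5],
})
# Hypothetical single-guide (base) LFCs
base = pd.DataFrame({
    "guide": ["gA1", "gB1", "gC1"],
    "condition": ["cond_1"] * 3,
    "base_lfc": [-0.5, -0.5, -0.25],
})

def calc_dlfc_sketch(df, base_lfcs):
    """Observed LFC minus the sum of the two base LFCs."""
    first = base_lfcs.rename(columns={"guide": "guide_1", "base_lfc": "base_lfc_1"})
    second = base_lfcs.rename(columns={"guide": "guide_2", "base_lfc": "base_lfc_2"})
    out = df.merge(first, on=["guide_1", "condition"], how="inner")
    out = out.merge(second, on=["guide_2", "condition"], how="inner")
    out["sum_lfc"] = out["base_lfc_1"] + out["base_lfc_2"]  # additive expectation
    out["dlfc"] = out["lfc"] - out["sum_lfc"]               # deviation from expectation
    return out

result = calc_dlfc_sketch(combo, base)
```

In this toy data the gA1/gB1 pair is more depleted than expected (negative dLFC), while the gA1/gC1 pair is less depleted than expected (positive dLFC).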

gnt.score.check_guide_input(df)[source]

Check whether input DataFrame has expected format

  • Column 1: first guide identifier
  • Column 2: second guide identifier
  • Column 3: first gene identifier
  • Column 4: second gene identifier
  • Column 5+: LFC values from different conditions
Parameters:df (DataFrame) – LFC data with the specified columns
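
For orientation, here is a toy table in the layout listed above, together with a minimal check. The column names and the helper looks_like_guide_input are hypothetical; which conditions the real check_guide_input enforces is not stated here, so this sketch only tests column count and that columns 5+ are numeric.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Toy input: only the column order follows the documented layout
lfc_df = pd.DataFrame({
    "guide_1": ["gA1", "gA1", "gB1"],
    "guide_2": ["gB1", "gCtl1", "gCtl1"],
    "gene_1": ["A", "A", "B"],
    "gene_2": ["B", "CTL", "CTL"],
    "cond_1": [-2.1, -0.9, -0.4],
    "cond_2": [-1.8, -1.0, -0.5],
})

def looks_like_guide_input(df):
    """True if df has at least five columns and columns 5+ are numeric."""
    if df.shape[1] < 5:
        return False
    return all(is_numeric_dtype(df[col]) for col in df.columns[4:])

ok = looks_like_guide_input(lfc_df)
too_narrow = looks_like_guide_input(lfc_df.iloc[:, :4])
```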
gnt.score.check_min_guide_pairs(df, min_pairs)[source]

Check that each guide is paired with a minimum number of guides

Parameters:
  • df (DataFrame) – Anchor df with column anchor_guide
  • min_pairs (int) – minimum number of guides to be paired with
Returns:

Guides that are paired with fewer than min_pairs guides

Return type:

List
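
One way to express this check, assuming the anchor DataFrame also carries a target_guide column (an assumption; only anchor_guide is documented), is to count distinct partners per anchor guide:

```python
import pandas as pd

# Hypothetical anchor table: gA1 has three partners, gB1 has one
pairs_df = pd.DataFrame({
    "anchor_guide": ["gA1", "gA1", "gA1", "gB1"],
    "target_guide": ["gB1", "gC1", "gCtl1", "gA1"],
})

def guides_below_min_pairs(df, min_pairs):
    """List anchor guides paired with fewer than min_pairs distinct guides."""
    n_pairs = df.groupby("anchor_guide")["target_guide"].nunique()
    return n_pairs[n_pairs < min_pairs].index.tolist()

flagged = guides_below_min_pairs(pairs_df, min_pairs=2)
```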

gnt.score.combine_z_scores(df, stat)[source]

Combine z-scores for a gene pair

Sum z-scores and divide by the square root of the number of observations

Parameters:
  • df (DataFrame) – guide level statistic
  • stat (str) – statistic to combine
Returns:

z_score for gene combination

Return type:

DataFrame
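
The rule above (sum of z-scores divided by the square root of the number of observations) can be sketched with a pandas groupby. The grouping columns gene_pair and condition and the output column names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical guide-level statistics: four guide pairs for A:B, one for A:C
guide_stats = pd.DataFrame({
    "gene_pair": ["A:B"] * 4 + ["A:C"],
    "condition": ["cond_1"] * 5,
    "dlfc_z": [-2.0, -1.0, -3.0, -2.0, 0.5],
})

def combine_z_scores_sketch(df, stat):
    """Combine guide-level z-scores per gene pair: sum(z) / sqrt(n)."""
    out = (df.groupby(["condition", "gene_pair"])[stat]
             .agg(["sum", "count"])
             .reset_index())
    out["pair_z_score"] = out["sum"] / np.sqrt(out["count"])
    out = out.rename(columns={"count": "n_guide_pairs"})
    return out[["condition", "gene_pair", "n_guide_pairs", "pair_z_score"]]

combined = combine_z_scores_sketch(guide_stats, "dlfc_z")
```

Dividing by the square root of n keeps the combined score on a unit-variance scale when the individual z-scores are independent, so gene pairs covered by different numbers of guide pairs remain comparable.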

gnt.score.filter_anchor_base_scores(df, min_pairs)[source]

Filter out guides that are not in at least a minimum number of pairs

Parameters:
  • df (DataFrame) – DataFrame with column anchor_guide
  • min_pairs (int) – Minimum number of pairs for an anchor_guide
Returns:

Filtered dataframe if there are guides without the minimum number of guide pairs

Return type:

DataFrame

gnt.score.fit_anchor_model(df, fit_genes, model, deg, x_col='lfc_target', y_col='lfc')[source]

Fit linear model for a single anchor guide paired with all target guides in a condition

Parameters:
  • df (DataFrame) – LFCs for a single anchor guide
  • fit_genes (list) – Genes used to fit the linear model
  • model (str) – Name of model used to fit x and y data
  • x_col (str, optional) – X column to fit model
  • y_col (str, optional) – Y column to fit model
  • deg (int) – Degrees of freedom for spline model
Returns:

  • DataFrame – Guide level residuals
  • DataFrame – R^2 for model

gnt.score.fit_models(df, fit_genes, model, deg)[source]

Loop through anchor guides and fit a linear model

Parameters:
  • df (DataFrame) – LFCs for all anchor guides
  • fit_genes (list) – Genes used to fit the linear model
  • model (str) – Name of model used to fit x and y data
  • deg (int) – Degrees of freedom for spline model
Returns:

  • DataFrame – Guide level residuals
  • DataFrame – R^2 for each model

gnt.score.get_avg_score(df, score)[source]

Get avg score for gene pairs

Parameters:
  • df (DataFrame) – Guide-level DataFrame
  • score (str) – Column to average
Returns:

Average score for gene pairs

Return type:

DataFrame

gnt.score.get_base_lfc_from_dlfc(dlfc_df)[source]

Calculate gene level base lfcs from the guide-level dLFC dataframe

Parameters:dlfc_df (DataFrame) – DataFrame of delta log-fold changes
Returns:With columns gene, condition, base_lfc
Return type:DataFrame
gnt.score.get_base_lfc_from_resid(residual_df)[source]

Calculate gene level base lfcs from the guide-level residual dataframe

Parameters:residual_df (DataFrame) – DataFrame of residuals
Returns:With columns gene, condition, base_lfc
Return type:DataFrame
gnt.score.get_base_score(df, ctl_genes)[source]

Get LFC of each guide when paired with controls

Parameters:
  • df (DataFrame) – Anchor guides paired with all of their target guides. Has columns “anchor_guide” and “lfc”
  • ctl_genes (list) – Control genes
Returns:

Median LFC for each anchor guide

Return type:

DataFrame
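
A minimal sketch of the idea, assuming the anchor DataFrame also has target_gene and condition columns (only anchor_guide and lfc are documented): keep the rows whose target is a control gene, then take the median LFC per anchor guide.

```python
import pandas as pd

# Hypothetical anchor table
anchor_lfcs = pd.DataFrame({
    "anchor_guide": ["gA1", "gA1", "gA1", "gB1"],
    "target_gene": ["CTL", "CTL", "B", "CTL"],
    "condition": ["cond_1"] * 4,
    "lfc": [-0.25, -0.75, -2.0, -1.0],
})

def get_base_score_sketch(df, ctl_genes):
    """Median LFC of each anchor guide when paired with control genes."""
    ctl_rows = df[df["target_gene"].isin(ctl_genes)]
    return (ctl_rows.groupby(["anchor_guide", "condition"])["lfc"]
                    .median()
                    .reset_index()
                    .rename(columns={"lfc": "base_lfc"}))

base_scores = get_base_score_sketch(anchor_lfcs, ctl_genes=["CTL"])
```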

gnt.score.get_gene_dlfcs(guide_dlfcs, stat='dlfc_z')[source]

Combine dLFCs at the gene level

Parameters:
  • guide_dlfcs (DataFrame) – Guide level dLFCs
  • stat (str, optional) – Statistic to combine at the gene level
Returns:

Gene level z_scores

Return type:

DataFrame

gnt.score.get_gene_residuals(guide_residuals, stat='residual_z')[source]

Combine residuals at the gene level

Parameters:
  • guide_residuals (DataFrame) – Guide level residuals
  • stat (str, optional) – Statistic to combine at the gene level
Returns:

Gene level z_scores

Return type:

DataFrame

gnt.score.get_guide_dlfcs(lfc_df, ctl_genes)[source]

Calculate delta-LFCs at the guide level

Model the LFC of each combination as the sum of the LFCs of each guide when paired with controls. The difference from this expectation is the delta log2-fold change.

Parameters:
  • lfc_df (DataFrame) –

    LFCs from combinatorial screen

    • Column 1: first guide identifier
    • Column 2: second guide identifier
    • Column 3: first gene identifier
    • Column 4: second gene identifier
    • Column 5+: LFC values from different conditions
  • ctl_genes (list) – Negative control genes (e.g. nonessential, intronic, or no site)
Returns:

delta LFCs for guide pairs

Return type:

DataFrame

gnt.score.get_guide_residuals(lfc_df, ctl_genes, fit_genes=None, min_pairs=5, model='linear', deg=6)[source]

Calculate guide-level residuals

Parameters:
  • lfc_df (DataFrame) –

    LFCs from combinatorial screen

    • Column 1: first guide identifier
    • Column 2: second guide identifier
    • Column 3: first gene identifier
    • Column 4: second gene identifier
    • Column 5+: LFC values from different conditions
  • ctl_genes (list) – Negative control genes (e.g. nonessential, intronic, or no site)
  • fit_genes (list, optional) – Genes used to train each linear model. If None, uses all genes to fit. This can be helpful if we expect a large fraction of target_genes to be interactors
  • min_pairs (int, optional) – Minimum number of pairs a guide must be in
  • model (str, optional) – Name of model to fit to data
  • deg (int, optional) – Degrees of freedom for spline model. Ignored if model is not spline
Returns:

  • DataFrame – Guide level residuals
  • DataFrame – R-squared and f-statistic p-value for each linear model
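
The residual approach can be pictured as follows: for each anchor guide and condition, regress the pair LFC (lfc) on the target guide's own LFC (lfc_target) and keep the residuals. The sketch below uses numpy.polyfit rather than whatever fitting backend the package uses, and the per-condition z-scoring into residual_z is an assumption about how that column is derived.

```python
import numpy as np
import pandas as pd

# Hypothetical anchor table for a single anchor guide in one condition
anchor_data = pd.DataFrame({
    "anchor_guide": ["gA1"] * 4,
    "condition": ["cond_1"] * 4,
    "lfc_target": [0.0, -1.0, -2.0, -3.0],
    "lfc": [-0.5, -1.5, -2.5, -5.5],
})

def guide_residuals_sketch(df):
    """Fit lfc ~ lfc_target per anchor guide and condition, return residuals."""
    pieces = []
    for _, grp in df.groupby(["anchor_guide", "condition"]):
        slope, intercept = np.polyfit(grp["lfc_target"], grp["lfc"], deg=1)
        expected = slope * grp["lfc_target"] + intercept
        pieces.append(grp.assign(residual=grp["lfc"] - expected))
    out = pd.concat(pieces, ignore_index=True)
    # Assumed scaling: z-score residuals within each condition
    out["residual_z"] = (out.groupby("condition")["residual"]
                            .transform(lambda r: (r - r.mean()) / r.std()))
    return out

residuals = guide_residuals_sketch(anchor_data)
```

In the toy data the last pair falls furthest below the fitted line, so it receives the most negative residual.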

gnt.score.get_no_control_guides(df, guide_base_score)[source]

Get guides that are not paired with controls

Parameters:
  • df (DataFrame) – DataFrame with the column anchor_gene
  • guide_base_score (DataFrame) – Guide scores when paired with controls
Returns:

Guides without control pairs

Return type:

list

gnt.score.get_pop_stats(df, stat)[source]

Get mean and standard deviation for a stat

Parameters:
  • df (DataFrame) – Guide level scores
  • stat (str) – Column to calculate statistics from
Returns:

Mean and standard deviation of the specified stat

Return type:

DataFrame

gnt.score.get_removed_guides_genes(anchor_df, guides)[source]

Get dataframe of removed guides and genes

Parameters:
  • anchor_df (DataFrame) –
  • guides (list) – List of guides being removed
Returns:

Return type:

DataFrame

gnt.score.join_anchor_base_score(anchor_df, base_df)[source]

Join anchor DataFrame with Base LFCs on anchor_guide

Parameters:
  • anchor_df (DataFrame) – DataFrame with anchor and target guides
  • base_df (DataFrame) – Base LFC DataFrame
Returns:

Return type:

DataFrame

gnt.score.melt_df(df, id_cols=None, var_name='condition', value_name='lfc')[source]

Helper function to melt a DataFrame

Parameters:
  • df (DataFrame) – Log-fold change DataFrame
  • id_cols (list, optional) – Specify id columns. If none, then the first four columns are used
  • var_name (str, optional) – New variable name
  • value_name (str, optional) – New value name.
Returns:

Return type:

DataFrame
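
The documented defaults map directly onto pandas.DataFrame.melt. A sketch under that assumption, with hypothetical column names in the toy table:

```python
import pandas as pd

# Hypothetical wide table: four id columns followed by one column per condition
wide = pd.DataFrame({
    "guide_1": ["gA1", "gA1"],
    "guide_2": ["gB1", "gCtl1"],
    "gene_1": ["A", "A"],
    "gene_2": ["B", "CTL"],
    "cond_1": [-2.0, -0.5],
    "cond_2": [-1.5, -0.25],
})

def melt_df_sketch(df, id_cols=None, var_name="condition", value_name="lfc"):
    """Wide-to-long reshape; defaults to the first four columns as ids."""
    if id_cols is None:
        id_cols = df.columns[:4].tolist()
    return df.melt(id_vars=id_cols, var_name=var_name, value_name=value_name)

long_df = melt_df_sketch(wide)
```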

gnt.score.model_fixed_slope(train_x, train_y, test_x, slope=1)[source]

Predict guide phenotype using fixed slope

From: https://stackoverflow.com/questions/33292969/linear-regression-with-specified-slope

Parameters:
  • train_x (Series) – Single gene phenotype for training genes
  • train_y (Series) – Pair phenotype for training genes
  • test_x (Series) – Single gene phenotype for testing genes
  • slope (int, optional) – Slope to fit model
Returns:

  • Series – Predicted phenotype of pair
  • DataFrame – Information about model
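
With the slope held fixed, the least-squares intercept is the mean of train_y - slope * train_x, which is the approach in the linked Stack Overflow answer. The sketch below follows that idea; the layout of the returned model-information DataFrame is an assumption.

```python
import numpy as np
import pandas as pd

def model_fixed_slope_sketch(train_x, train_y, test_x, slope=1):
    """Fit only the intercept of y = slope * x + b, then predict for test_x."""
    intercept = float(np.mean(train_y - slope * train_x))
    predictions = slope * test_x + intercept
    # Hypothetical summary table
    model_info = pd.DataFrame({"model": ["fixed slope"],
                               "slope": [slope],
                               "const": [intercept]})
    return predictions, model_info

train_x = pd.Series([0.0, 1.0, 2.0])
train_y = pd.Series([1.0, 2.0, 3.0])
test_x = pd.Series([3.0])
predictions, model_info = model_fixed_slope_sketch(train_x, train_y, test_x)
```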

gnt.score.model_linear(train_x, train_y, test_x)[source]

Predict guide phenotype using a linear model and ordinary least squares

Parameters:
  • train_x (Series) – Single gene phenotype for training genes
  • train_y (Series) – Pair phenotype for training genes
  • test_x (Series) – Single gene phenotype for testing genes
Returns:

  • Series – Predicted phenotype of pair
  • DataFrame – Information about OLS model
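
As a rough illustration, an ordinary least squares line can be fitted with numpy.polyfit and summarized by its R^2. This is a stand-in under stated assumptions, not the package's fitting code, and the columns of the summary DataFrame are hypothetical.

```python
import numpy as np
import pandas as pd

def model_linear_sketch(train_x, train_y, test_x):
    """OLS fit of y = slope * x + const, with R^2 on the training data."""
    slope, const = np.polyfit(train_x, train_y, deg=1)
    predictions = slope * test_x + const
    fitted = slope * train_x + const
    ss_res = ((train_y - fitted) ** 2).sum()           # residual sum of squares
    ss_tot = ((train_y - train_y.mean()) ** 2).sum()   # total sum of squares
    model_info = pd.DataFrame({"model": ["linear"],
                               "slope": [slope],
                               "const": [const],
                               "r_squared": [1 - ss_res / ss_tot]})
    return predictions, model_info

lin_train_x = pd.Series([0.0, -1.0, -2.0, -3.0])
lin_train_y = 0.5 * lin_train_x - 1.0   # noise-free toy data
lin_test_x = pd.Series([-4.0])
lin_pred, lin_info = model_linear_sketch(lin_train_x, lin_train_y, lin_test_x)
```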

gnt.score.model_quadratic(train_x, train_y, test_x)[source]

Predict guide phenotype using a quadratic model and ordinary least squares

Parameters:
  • train_x (Series) – Single gene phenotype for training genes
  • train_y (Series) – Pair phenotype for training genes
  • test_x (Series) – Single gene phenotype for testing genes
Returns:

  • Series – Predicted phenotype of pair
  • DataFrame – Information about OLS model
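
A second-degree fit is still linear in its coefficients, so ordinary least squares applies once a design matrix with columns 1, x, and x^2 is built. The sketch below solves it with numpy.linalg.lstsq; the summary table layout is an assumption, and test_x is assumed to be a pandas Series.

```python
import numpy as np
import pandas as pd

def model_quadratic_sketch(train_x, train_y, test_x):
    """OLS fit of y = c0 + c1 * x + c2 * x^2."""
    def design(x):
        x = np.asarray(x, dtype=float)
        return np.column_stack([np.ones_like(x), x, x ** 2])

    coefs, *_ = np.linalg.lstsq(design(train_x),
                                np.asarray(train_y, dtype=float),
                                rcond=None)
    predictions = pd.Series(design(test_x) @ coefs, index=test_x.index)
    model_info = pd.DataFrame({"term": ["const", "x", "x^2"], "coef": coefs})
    return predictions, model_info

quad_train_x = pd.Series([-3.0, -2.0, -1.0, 0.0, 1.0])
quad_train_y = quad_train_x ** 2 + quad_train_x - 1.0   # noise-free toy data
quad_test_x = pd.Series([2.0])
quad_pred, quad_info = model_quadratic_sketch(quad_train_x, quad_train_y, quad_test_x)
```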

gnt.score.model_spline(train_x, train_y, test_x, deg)[source]

Predict guide phenotype using a natural cubic spline

Parameters:
  • train_x (Series) – Single gene phenotype for training genes
  • train_y (Series) – Pair phenotype for training genes
  • test_x (Series) – Single gene phenotype for testing genes
  • deg (int) – Degrees of freedom for spline model
Returns:

  • Series – Predicted phenotype of pair
  • DataFrame – Information about GLM model

gnt.score.order_cols(df, cols, name)[source]

Reorder values in columns to be in alphabetical order

Parameters:
  • df (DataFrame) – DataFrame with at least two columns to be reordered in alphabetical order
  • cols (List) – Indices of columns to be reordered
  • name (str) – Name of reordered column
Returns:

With columns in alphabetical order

Return type:

DataFrame
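
Sorting values within each row makes a pair such as (gB, gA) and (gA, gB) map to the same key. A sketch of that idea with numpy.sort follows; the naming of the output columns (name plus a numeric suffix) is an assumption, since the actual output layout is not described here.

```python
import numpy as np
import pandas as pd

# Hypothetical guide pairs; the first row is in reverse alphabetical order
pair_df = pd.DataFrame({
    "guide_1": ["gB", "gA"],
    "guide_2": ["gA", "gC"],
    "lfc": [-1.0, -2.0],
})

def order_cols_sketch(df, cols, name):
    """Sort the values of the selected columns alphabetically within each row."""
    col_names = df.columns[cols]
    ordered = np.sort(df[col_names].to_numpy(dtype=str), axis=1)
    out = df.copy()
    for i in range(len(cols)):
        out[f"{name}_{i + 1}"] = ordered[:, i]   # assumed naming scheme
    return out

ordered_df = order_cols_sketch(pair_df, cols=[0, 1], name="sorted_guide")
```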

gnt.score.order_cols_with_meta(df, cols, meta_cols, col_name, meta_name)[source]

Reorder values in columns to be in alphabetical order, keeping track of columns with meta-information

Parameters:
  • df (DataFrame) – DataFrame with at least two columns to be reordered in alphabetical order
  • cols (List) – Indices of columns to be reordered
  • meta_cols (List) – Indices of columns with meta information, ordered the same as cols
  • col_name (str) – Base name of reordered columns
  • meta_name (str) – Base name of meta information columns
Returns:

With columns in alphabetical order and their respective meta info

Return type:

DataFrame

gnt.score.remove_guides(df, rm_guides)[source]

Remove guides from DataFrame

Parameters:
  • df (DataFrame) – Dataframe with columns for guides
  • rm_guides (list) – List of guides to remove
Returns:

Filtered to remove guides

Return type:

DataFrame

gnt.score.warn_no_control_guides(anchor_df, no_control_guides)[source]

Warn about guides without control pairs

Parameters:
  • anchor_df (DataFrame) –
  • no_control_guides (list) –

Warning

Guides without control pairs

Module contents

Top-level package for gnt. Imports the score module