gnt package¶
Submodules¶
gnt.cli module¶
Console script for gnt.
gnt.score module¶
score module. Functions to score genetic interactions
-
gnt.score.build_anchor_df(df)[source]¶ Build a dataframe where each guide is defined as an anchor and is paired with all other target guides
Parameters: df (DataFrame) – LFC df
Return type: DataFrame
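The anchor layout described above can be illustrated with a minimal sketch. It is not the gnt implementation: the function name sketch_anchor_df and the column names guide1, guide2, gene1, gene2, target_guide, and target_gene are assumptions made for illustration. The idea is that each pair appears twice, once with each guide in the anchor role.

```python
import pandas as pd

def sketch_anchor_df(df):
    """Hypothetical sketch: list every guide once as the anchor of each pair it is in."""
    cols = ["anchor_guide", "target_guide", "anchor_gene", "target_gene", "condition", "lfc"]
    # First orientation: guide1 is the anchor, guide2 is the target
    forward = df.rename(columns={"guide1": "anchor_guide", "guide2": "target_guide",
                                 "gene1": "anchor_gene", "gene2": "target_gene"})
    # Second orientation: the roles are swapped
    reverse = df.rename(columns={"guide2": "anchor_guide", "guide1": "target_guide",
                                 "gene2": "anchor_gene", "gene1": "target_gene"})
    stacked = pd.concat([forward[cols], reverse[cols]], ignore_index=True)
    # A guide paired with itself would otherwise appear twice
    return stacked.drop_duplicates().reset_index(drop=True)

lfcs = pd.DataFrame({
    "guide1": ["a1", "a1"],
    "guide2": ["b1", "c1"],
    "gene1": ["A", "A"],
    "gene2": ["B", "C"],
    "condition": ["day21", "day21"],
    "lfc": [-1.0, -0.2],
})
anchor_df = sketch_anchor_df(lfcs)
print(anchor_df)
```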
-
gnt.score.calc_dlfc(df, base_lfcs)[source]¶ Add together base lfcs to generate an expectation for each guide pair
Parameters: - df (DataFrame) – Combo level LFCs
- base_lfcs (DataFrame) – Base LFCs - single guide phenotype
Returns: delta LFCs for each guide pair
Return type: DataFrame
-
gnt.score.check_guide_input(df)[source]¶ Check whether input DataFrame has expected format
- Column 1: first guide identifier
- Column 2: second guide identifier
- Column 3: first gene identifier
- Column 4: second gene identifier
- Column 5+: LFC values from different conditions
Parameters: df (DataFrame) – LFC data with the specified columns
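As an illustration of the expected layout, the following sketch builds a small input table and runs a simple positional check. The function sketch_check_input, the column names, and the specific checks are assumptions; the real check_guide_input may validate different things.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def sketch_check_input(df):
    """Hypothetical check: four identifier columns followed by numeric LFC columns."""
    if df.shape[1] < 5:
        raise ValueError("expected at least five columns")
    # Columns 5+ (positions 4 onward) should hold LFC values
    non_numeric = [c for c in df.columns[4:] if not is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"LFC columns must be numeric: {non_numeric}")
    return True

good = pd.DataFrame({
    "guide1": ["a1", "a2"],      # Column 1: first guide identifier
    "guide2": ["b1", "b2"],      # Column 2: second guide identifier
    "gene1": ["A", "A"],         # Column 3: first gene identifier
    "gene2": ["B", "B"],         # Column 4: second gene identifier
    "day7": [-0.3, -0.1],        # Column 5+: one LFC column per condition
    "day21": [-1.1, -0.9],
})
print(sketch_check_input(good))
```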
-
gnt.score.check_min_guide_pairs(df, min_pairs)[source]¶ Check that each guide is paired with a minimum number of guides
Parameters: - df (DataFrame) – Anchor df with column anchor_guide
- min_pairs (int) – minimum number of guides to be paired with
Returns: guides that are paired with fewer than min_pairs guides
Return type: List
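One way to picture this check is to count the distinct partners of each anchor guide. The sketch below assumes a target_guide column alongside anchor_guide; that column name and the function name are hypothetical.

```python
import pandas as pd

def sketch_underpaired_guides(df, min_pairs):
    """Hypothetical sketch: anchor guides with fewer than min_pairs distinct partners."""
    n_pairs = df.groupby("anchor_guide")["target_guide"].nunique()
    return n_pairs[n_pairs < min_pairs].index.tolist()

anchor_df = pd.DataFrame({
    "anchor_guide": ["a1", "a1", "a1", "b1"],
    "target_guide": ["b1", "c1", "d1", "a1"],
})
underpaired = sketch_underpaired_guides(anchor_df, min_pairs=2)
print(underpaired)
```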
-
gnt.score.combine_z_scores(df, stat)[source]¶ Combine z-score for a gene pair
Sum z-scores and divide by the square root of the number of observations
Parameters: - df (DataFrame) – guide level statistic
- stat (str) – statistic to combine
Returns: z_score for gene combination
Return type: DataFrame
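The combination rule stated above (sum of z-scores divided by the square root of the number of observations) can be sketched as follows. The gene_pair grouping column, the output column name pair_z_score, and the function name are assumptions for illustration, not the package's actual schema.

```python
import numpy as np
import pandas as pd

def sketch_combine_z(df, stat):
    """Hypothetical sketch: sum z-scores per gene pair and divide by sqrt(n)."""
    grouped = df.groupby("gene_pair")[stat]
    combined = grouped.sum() / np.sqrt(grouped.count())
    return combined.rename("pair_z_score").reset_index()

guide_z = pd.DataFrame({
    "gene_pair": ["A:B", "A:B", "A:B", "A:B", "A:C"],
    "residual_z": [2.0, 1.0, 3.0, 2.0, -1.0],
})
gene_z = sketch_combine_z(guide_z, "residual_z")
print(gene_z)
```

For the pair A:B, the four guide-level z-scores sum to 8 and are divided by sqrt(4) = 2, giving a combined z-score of 4.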
-
gnt.score.filter_anchor_base_scores(df, min_pairs)[source]¶ Filter out guides that are not in at least a minimum number of pairs
Parameters: - df (DataFrame) – DataFrame with column anchor_guide
- min_pairs (int) – Minimum number of pairs for an anchor_guide
Returns: Filtered dataframe if there are guides without the minimum number of guide pairs
Return type: DataFrame
-
gnt.score.fit_anchor_model(df, fit_genes, model, deg, x_col='lfc_target', y_col='lfc')[source]¶ Fit linear model for a single anchor guide paired with all target guides in a condition
Parameters: - df (DataFrame) – LFCs for a single anchor guide
- fit_genes (list) – Genes used to fit the linear model
- model (str) – Name of model used to fit x and y data
- x_col (str, optional) – X column to fit model
- y_col (str, optional) – Y column to fit model
- deg (int) – Degrees of freedom for spline model
Returns: - DataFrame – Guide level residuals
- DataFrame – R^2 for model
-
gnt.score.fit_models(df, fit_genes, model, deg)[source]¶ Loop through anchor guides and fit a linear model
Parameters: - df (DataFrame) – LFCs for all anchor guides
- fit_genes (list) – Genes used to fit the linear model
- model (str) – Name of model used to fit x and y data
- deg (int) – Degrees of freedom for spline model
Returns: - DataFrame – Guide level residuals
- DataFrame – R^2 for each model
-
gnt.score.get_avg_score(df, score)[source]¶ Get avg score for gene pairs
Parameters: - df (DataFrame) – Guide-level DataFrame
- score (str) – Column to average
Returns: Average score for gene pairs
Return type: DataFrame
-
gnt.score.get_base_lfc_from_dlfc(dlfc_df)[source]¶ Calculate gene level base lfcs from the guide-level dLFC dataframe
Parameters: dlfc_df (DataFrame) – DataFrame of delta log-fold changes
Returns: With columns gene, condition, base_lfc
Return type: DataFrame
-
gnt.score.get_base_lfc_from_resid(residual_df)[source]¶ Calculate gene level base lfcs from the guide-level residual dataframe
Parameters: residual_df (DataFrame) – DataFrame of residuals
Returns: With columns gene, condition, base_lfc
Return type: DataFrame
-
gnt.score.get_base_score(df, ctl_genes)[source]¶ Get LFC of each guide when paired with controls
Parameters: - df (DataFrame) – Anchor guides paired with all of their target guides. Has columns “anchor_guide” and “lfc”
- ctl_genes (list) – Control genes
Returns: Median LFC for each anchor guide
Return type: DataFrame
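A minimal sketch of the base score, assuming the anchor DataFrame also carries a target_gene column (hypothetical name) so that control pairs can be selected. A real screen would likely group by condition as well; that is omitted here for brevity.

```python
import pandas as pd

def sketch_base_score(df, ctl_genes):
    """Hypothetical sketch: median LFC of each anchor guide over its control pairs."""
    # Keep only rows where the anchor is paired with a control gene
    ctl_pairs = df[df["target_gene"].isin(ctl_genes)]
    return (ctl_pairs.groupby("anchor_guide")["lfc"]
            .median()
            .rename("base_lfc")
            .reset_index())

anchor_df = pd.DataFrame({
    "anchor_guide": ["a1", "a1", "a1", "a1", "b1", "b1"],
    "target_gene": ["CTRL", "CTRL", "CTRL", "B", "CTRL", "CTRL"],
    "lfc": [-1.0, -1.2, -0.8, -3.0, -0.4, -0.6],
})
base_scores = sketch_base_score(anchor_df, ctl_genes=["CTRL"])
print(base_scores)
```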
-
gnt.score.get_gene_dlfcs(guide_dlfcs, stat='dlfc_z')[source]¶ Combine dLFCs at the gene level
Parameters: - guide_dlfcs (DataFrame) – Guide level dLFCs
- stat (str, optional) – Statistic to combine at the gene level
Returns: Gene level z_scores
Return type: DataFrame
-
gnt.score.get_gene_residuals(guide_residuals, stat='residual_z')[source]¶ Combine residuals at the gene level
Parameters: - guide_residuals (DataFrame) – Guide level residuals
- stat (str, optional) – Statistic to combine at the gene level
Returns: Gene level z_scores
Return type: DataFrame
-
gnt.score.get_guide_dlfcs(lfc_df, ctl_genes)[source]¶ Calculate delta-LFCs at the guide level
Model the LFC of each combination as the sum of each guide when paired with controls. The difference from this expectation is the delta log2-fold change
Parameters: - lfc_df (DataFrame) –
LFCs from combinatorial screen
- Column 1: first guide identifier
- Column 2: second guide identifier
- Column 3: first gene identifier
- Column 4: second gene identifier
- Column 5+: LFC values from different conditions
- ctl_genes (list) – Negative control genes (e.g. nonessential, intronic, or no site)
Returns: delta LFCs for guide pairs
Return type: DataFrame
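The additive expectation described for get_guide_dlfcs can be shown with a small worked example. The base LFC values and the column names guide1, guide2, expected_lfc, and dlfc are made up for illustration; only the arithmetic (observed LFC minus the sum of the two base LFCs) comes from the description above.

```python
import pandas as pd

# Assumed base LFCs: each guide's phenotype when paired with controls
base = pd.Series({"a1": -1.0, "b1": -0.5})

pairs = pd.DataFrame({"guide1": ["a1"], "guide2": ["b1"], "lfc": [-2.5]})

# Expectation: the sum of the two single-guide phenotypes
pairs["expected_lfc"] = pairs["guide1"].map(base) + pairs["guide2"].map(base)
# Delta LFC: observed minus expected
pairs["dlfc"] = pairs["lfc"] - pairs["expected_lfc"]
print(pairs)
```

Here the expected LFC is -1.5 and the observed LFC is -2.5, so the delta LFC is -1.0: the pair is depleted more than the additive model predicts.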
-
gnt.score.get_guide_residuals(lfc_df, ctl_genes, fit_genes=None, min_pairs=5, model='linear', deg=6)[source]¶ Calculate guide-level residuals
Parameters: - lfc_df (DataFrame) –
LFCs from combinatorial screen
- Column 1: first guide identifier
- Column 2: second guide identifier
- Column 3: first gene identifier
- Column 4: second gene identifier
- Column 5+: LFC values from different conditions
- ctl_genes (list) – Negative control genes (e.g. nonessential, intronic, or no site)
- fit_genes (list, optional) – Genes used to train each linear model. If None, uses all genes to fit. This can be helpful if we expect a large fraction of target_genes to be interactors
- min_pairs (int, optional) – Minimum number of pairs a guide must be in
- model (str, optional) – Name of model to fit to data
- deg (int, optional) – Degrees of freedom for spline model. Ignored if model is not spline
Returns: - DataFrame – Guide level residuals
- DataFrame – R-squared and f-statistic p-value for each linear model
-
gnt.score.get_no_control_guides(df, guide_base_score)[source]¶ Get guides that are not paired with controls
Parameters: - df (DataFrame) – DataFrame with the column anchor_gene
- guide_base_score (DataFrame) – Guide scores when paired with controls
Returns: Guides without control pairs
Return type: list
-
gnt.score.get_pop_stats(df, stat)[source]¶ Get mean and standard deviation for a stat
Parameters: - df (DataFrame) – Guide level scores
- stat (str) – Column to calculate statistics from
Returns: Mean and standard deviation of the specified stat
Return type: DataFrame
-
gnt.score.get_removed_guides_genes(anchor_df, guides)[source]¶ Get dataframe of removed guides and genes
Parameters: - anchor_df (DataFrame) –
- guides (list) – List of guides being removed
Return type: DataFrame
-
gnt.score.join_anchor_base_score(anchor_df, base_df)[source]¶ Join anchor DataFrame with Base LFCs on anchor_guide
Parameters: - anchor_df (DataFrame) – DataFrame with anchor and target guides
- base_df (DataFrame) – Base LFC DataFrame
Return type: DataFrame
-
gnt.score.melt_df(df, id_cols=None, var_name='condition', value_name='lfc')[source]¶ Helper function to melt a DataFrame
Parameters: - df (DataFrame) – log-fold change DataFrame
- id_cols (list, optional) – Specify id columns. If none, then the first four columns are used
- var_name (str, optional) – New variable name
- value_name (str, optional) – New value name.
Return type: DataFrame
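The reshaping that melt_df performs can be approximated with pandas' own DataFrame.melt. This sketch mirrors the documented defaults (first four columns as identifiers, var_name 'condition', value_name 'lfc'), but the example data and column names are assumptions.

```python
import pandas as pd

wide = pd.DataFrame({
    "guide1": ["a1", "a2"],
    "guide2": ["b1", "b2"],
    "gene1": ["A", "A"],
    "gene2": ["B", "B"],
    "day7": [-0.3, -0.1],
    "day21": [-1.1, -0.9],
})

# Use the first four columns as identifiers, as the documented default does
id_cols = list(wide.columns[:4])
long = wide.melt(id_vars=id_cols, var_name="condition", value_name="lfc")
print(long)
```

Each input row yields one output row per condition column, so two guide pairs and two conditions give four rows.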
-
gnt.score.model_fixed_slope(train_x, train_y, test_x, slope=1)[source]¶ Predict guide phenotype using fixed slope
From: https://stackoverflow.com/questions/33292969/linear-regression-with-specified-slope
Parameters: - train_x (Series) – Single gene phenotype for training genes
- train_y (Series) – Pair phenotype for training genes
- test_x (Series) – Single gene phenotype for testing genes
- slope (int) – Slope to fit model
Returns: - Series – Predicted phenotype of pair
- DataFrame – Information about model
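With the slope held fixed, least squares leaves only the intercept to estimate, and the optimal intercept is the mean of train_y - slope * train_x. The sketch below shows that calculation with NumPy; it is an illustration under that assumption, not the gnt implementation, and it returns the intercept as a plain float rather than a DataFrame of model information.

```python
import numpy as np

def sketch_fixed_slope(train_x, train_y, test_x, slope=1):
    """Hypothetical sketch: fit only the intercept, then predict with the fixed slope."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    test_x = np.asarray(test_x, dtype=float)
    # Minimizing sum((y - slope*x - b)^2) over b gives b = mean(y - slope*x)
    intercept = float(np.mean(train_y - slope * train_x))
    return slope * test_x + intercept, intercept

predicted, intercept = sketch_fixed_slope([0, 1, 2], [-1, 0, 1], [3])
print(predicted, intercept)
```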
-
gnt.score.model_linear(train_x, train_y, test_x)[source]¶ Predict guide phenotype using a linear model and ordinary least squares
Parameters: - train_x (Series) – Single gene phenotype for training genes
- train_y (Series) – Pair phenotype for training genes
- test_x (Series) – Single gene phenotype for testing genes
Returns: - Series – Predicted phenotype of pair
- DataFrame – Information about OLS model
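As a rough illustration of the train/predict split, the following sketch fits an ordinary least squares line with numpy.polyfit and reports R^2 on the training data. NumPy is swapped in here for brevity; the library the package actually uses, and the exact contents of its model-information DataFrame, are not stated in this reference.

```python
import numpy as np

def sketch_linear(train_x, train_y, test_x):
    """Hypothetical sketch: OLS line fit on training data, prediction on test data."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    # polyfit returns coefficients from the highest degree down: slope, then intercept
    slope, intercept = np.polyfit(train_x, train_y, deg=1)
    fitted = slope * train_x + intercept
    ss_res = np.sum((train_y - fitted) ** 2)
    ss_tot = np.sum((train_y - train_y.mean()) ** 2)
    info = {"slope": slope, "intercept": intercept, "r_squared": 1 - ss_res / ss_tot}
    predicted = slope * np.asarray(test_x, dtype=float) + intercept
    return predicted, info

predicted, info = sketch_linear([-2, -1, 0, 1], [-3, -1, 1, 3], [-0.5])
# A residual is the observed pair phenotype minus the prediction
residual = -1.5 - predicted[0]
print(predicted, info, residual)
```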
-
gnt.score.model_quadratic(train_x, train_y, test_x)[source]¶ Predict guide phenotype using a quadratic model and ordinary least squares
Parameters: - train_x (Series) – Single gene phenotype for training genes
- train_y (Series) – Pair phenotype for training genes
- test_x (Series) – Single gene phenotype for testing genes
Returns: - Series – Predicted phenotype of pair
- DataFrame – Information about OLS model
-
gnt.score.model_spline(train_x, train_y, test_x, deg)[source]¶ Predict guide phenotype using a natural cubic spline
Parameters: - train_x (Series) – Single gene phenotype for training genes
- train_y (Series) – Pair phenotype for training genes
- test_x (Series) – Single gene phenotype for testing genes
- deg (int) – Degrees of freedom for spline model
Returns: - Series – Predicted phenotype of pair
- DataFrame – Information about GLM model
-
gnt.score.order_cols(df, cols, name)[source]¶ Reorder values in columns to be in alphabetical order
Parameters: - df (DataFrame) – DataFrame with at least two columns to be reordered in alphabetical order
- cols (List) – Indices of columns to be reordered
- name (str) – Name of reordered column
Returns: With columns in alphabetical order
Return type: DataFrame
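The purpose of this reordering is that a pair such as (TP53, ATM) and its mirror (ATM, TP53) end up with the same representation. A minimal sketch of the idea, with hypothetical output column names gene_a and gene_b (the actual naming produced from the name argument is not specified here):

```python
import pandas as pd

df = pd.DataFrame({
    "gene1": ["TP53", "BRCA1", "ATM"],
    "gene2": ["ATM", "PARP1", "TP53"],
})

# Sort the two values within each row so mirrored pairs become identical
ordered = [sorted(pair) for pair in zip(df["gene1"], df["gene2"])]
df["gene_a"] = [pair[0] for pair in ordered]
df["gene_b"] = [pair[1] for pair in ordered]
print(df)
```

After reordering, the first and third rows both read ATM, TP53, so they can be grouped as the same gene pair.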
-
gnt.score.order_cols_with_meta(df, cols, meta_cols, col_name, meta_name)[source]¶ Reorder values in columns to be in alphabetical order, keeping track of columns with meta-information
Parameters: - df (DataFrame) – DataFrame with at least two columns to be reordered in alphabetical order
- cols (List) – Indices of columns to be reordered
- meta_cols (List) – Indices of columns with meta information, ordered the same as cols
- col_name (str) – Base name of reordered columns
- meta_name (str) – Base name of meta information columns
Returns: With columns in alphabetical order and their respective meta info
Return type: DataFrame
Module contents¶
Top-level package for gnt. Imports the score module