compute highly variable features
calculateHVF(
gobject,
spat_unit = NULL,
feat_type = NULL,
expression_values = c("normalized", "scaled", "custom"),
method = c("cov_groups", "cov_loess", "var_p_resid"),
reverse_log_scale = FALSE,
logbase = 2,
expression_threshold = 0,
nr_expression_groups = 20,
zscore_threshold = 1.5,
HVFname = "hvf",
difference_in_cov = 0.1,
var_threshold = 1.5,
var_number = NULL,
random_subset = NULL,
set_seed = TRUE,
seed_number = 1234,
show_plot = NULL,
return_plot = NULL,
save_plot = NULL,
save_param = list(),
default_save_name = "HVFplot",
return_gobject = TRUE,
calc_gini = FALSE,
verbose = TRUE
)
gobject: giotto object
spat_unit: spatial unit
feat_type: feature type
expression_values: expression values to use
method: method used to calculate highly variable features
reverse_log_scale: logical. Reverse the log scale of the expression values (default = FALSE)
logbase: log base that was used, if reverse_log_scale is TRUE
expression_threshold: expression threshold to consider a feature detected
nr_expression_groups: (cov_groups) number of expression groups
zscore_threshold: (cov_groups) z-score threshold for selecting highly variable features
HVFname: name for highly variable features in feature metadata
difference_in_cov: (cov_loess) minimum difference in coefficient of variation required
var_threshold: (var_p_resid) variance threshold for features
var_number: (var_p_resid) number of top variance features to select
random_subset: random subset of cells to perform HVF detection on. Passing NULL runs HVF on all cells.
set_seed: logical. Whether to set a seed when random_subset is used
seed_number: seed number to use when random_subset is used
show_plot: show plot
return_plot: return ggplot object (overridden by return_gobject)
save_plot: logical. Directly save the plot
save_param: list of saving parameters from GiottoVisuals::all_plots_save_function()
default_save_name: default save name for saving. Do not change this; change save_name in save_param instead
return_gobject: logical. Return giotto object (default = TRUE)
calc_gini: logical. Whether to calculate the Gini index for each feature. Set to FALSE for performance with large datasets or dbMatrix objects.
verbose: be verbose
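The preprocessing arguments (reverse_log_scale, logbase, random_subset, set_seed, seed_number, expression_threshold) can be pictured with a minimal base-R sketch on a simulated matrix. This is not Giotto's internal code. It assumes the expression values were produced by a log(x + 1) transform, that random_subset samples cells, and that "detected" means a value above the threshold. All variable names are illustrative.

```r
# Toy matrix: 5 features x 8 cells (simulated, not Giotto data)
set.seed(1)
counts <- matrix(
  rpois(40, lambda = 5),
  nrow = 5,
  dimnames = list(paste0("feat", 1:5), paste0("cell", 1:8))
)

# Pretend these are the log-normalized values stored in the object
logbase <- 2
log_expr <- log(counts + 1, base = logbase)

# Assumption: reverse_log_scale = TRUE undoes a log(x + 1) transform
lin_expr <- logbase^log_expr - 1

# Assumption: random_subset = 4 with set_seed = TRUE and seed_number = 1234
# draws 4 cells reproducibly
set.seed(1234)
keep_cells <- sample(colnames(lin_expr), size = 4)
sub_expr <- lin_expr[, keep_cells, drop = FALSE]

# Assumption: with expression_threshold = 0, a feature is detected in a cell
# when its value is strictly above the threshold
expression_threshold <- 0
n_detected <- rowSums(sub_expr > expression_threshold)
```

Fixing the seed before sampling is what makes the subset, and therefore the selected features, reproducible between runs.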
A giotto object with highly variable features appended to the feature metadata (fDataDT()).
The method argument accepts three options. Two are based on the coefficient of variation (COV) and are described below; the third, var_p_resid, selects features by variance using var_threshold or var_number.
1. High COV within groups (cov_groups):
Features are first binned into nr_expression_groups groups by average expression, and the COV of each feature is converted into a z-score within its bin. Features with a z-score higher than the threshold (zscore_threshold) are considered highly variable.
2. High COV based on loess regression prediction (cov_loess):
A predicted COV is calculated for each feature using loess regression (COV ~ log(mean expression)). Features whose observed COV exceeds the predicted COV by more than difference_in_cov are considered highly variable.
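The two COV-based selections described above can be sketched in base R on simulated data. This is a hedged illustration of the idea, not the package's implementation: the quantile-based binning, the simulated negative binomial counts, and all variable names are assumptions made for the example.

```r
# Simulated expression: 200 features x 50 cells (not Giotto data)
set.seed(1234)
n_feat <- 200
n_cell <- 50
mu <- rep(c(1, 5, 20, 50), each = n_feat / 4)
expr <- matrix(
  rnbinom(n_feat * n_cell, mu = rep(mu, times = n_cell), size = 2),
  nrow = n_feat
)

# Per-feature mean and coefficient of variation (COV = sd / mean)
feat_mean <- rowMeans(expr)
feat_sd <- apply(expr, 1, sd)
expressed <- feat_mean > 0          # avoid division by zero
feat_mean <- feat_mean[expressed]
feat_cov <- feat_sd[expressed] / feat_mean

# 1. COV within expression groups
nr_expression_groups <- 20
zscore_threshold <- 1.5
# Assumption: bins are quantiles of mean expression
breaks <- unique(quantile(
  feat_mean,
  probs = seq(0, 1, length.out = nr_expression_groups + 1)
))
bin <- cut(feat_mean, breaks = breaks, include.lowest = TRUE)
# z-score of each feature's COV within its own bin
cov_z <- ave(feat_cov, bin, FUN = function(x) (x - mean(x)) / sd(x))
hvf_groups <- !is.na(cov_z) & cov_z > zscore_threshold

# 2. COV compared with a loess prediction
difference_in_cov <- 0.1
fit <- loess(feat_cov ~ log(feat_mean))
cov_diff <- feat_cov - predict(fit)  # observed minus predicted COV
hvf_loess <- cov_diff > difference_in_cov

# Number of features flagged by each approach
table(groups = hvf_groups, loess = hvf_loess)
```

Both approaches correct for the dependence of COV on mean expression, the first by comparing features only within a bin of similar expression and the second by subtracting a smooth trend.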
g <- GiottoData::loadGiottoMini("visium")
#> 1. read Giotto object
#> 2. read Giotto feature information
#> 3. read Giotto spatial information
#> 4. read Giotto image information
#> python already initialized in this session
#> active environment : '/usr/bin/python3'
#> python version : 3.12
calculateHVF(g)
#> hvf has already been used, will be overwritten
#> An object of class giotto
#> >Active spat_unit: cell
#> >Active feat_type: rna
#> dimensions : 634, 624 (features, cells)
#> [SUBCELLULAR INFO]
#> polygons : cell
#> [AGGREGATE INFO]
#> expression -----------------------
#> [cell][rna] raw normalized scaled
#> spatial locations ----------------
#> [cell] raw
#> spatial networks -----------------
#> [cell] Delaunay_network spatial_network
#> spatial enrichments --------------
#> [cell][rna] cluster_metagene DWLS
#> dim reduction --------------------
#> [cell][rna] pca custom_pca umap custom_umap tsne
#> nearest neighbor networks --------
#> [cell][rna] sNN.pca custom_NN
#> attached images ------------------
#> images : alignment image
#>
#>
#> Use objHistory() to see steps and params used