Spatial Features#

Author: Clarence Mah | Last Updated: May 15, 2023

Here we demonstrate how to compute spatial features with. We will use the included MERFISH U2-OS dataset.

Load Libraries#

%load_ext autoreload
%autoreload 2

import bento as bt
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

Load Data#

adata = bt.ds.sample_data()

adata = adata[adata.obs[“nucleus_shape”] != None] bt.sync(adata) adata

bt.pl.density(adata)
../_images/118bc7c92395225f0e21963d1815f9b79edfda72bd1efe61f92534628e737923.png

Shape Features#

In bento we refer to cell membrane and other subcellular boundaries, e.g. nuclear membrane, as shapes. We can easily inspect morphological properties of these shapes with a number of built-in shape features. Call bt.tl.list_shape_features() to list available features.

bt.tl.list_shape_features()
{'area': 'Compute the area of each shape.',
 'aspect_ratio': 'Compute the aspect ratio of the minimum rotated rectangle that contains each shape.',
 'bounds': 'Compute the minimum and maximum coordinate values that bound each shape.',
 'density': 'Compute the RNA density of each shape.',
 'perimeter': 'Compute the perimeter of each shape.',
 'radius': 'Compute the radius of each cell.',
 'raster': 'Generate a grid of points contained within each shape. The points lie on\n    a 2D grid, with vertices spaced `step` distance apart.',
 'second_moment': 'Compute the second moment of each shape.',
 'span': 'Compute the length of the longest diagonal of each shape.'}

For convenience, bt.tl.obs_stats() computes the area, aspect ratio, and density properties for the cell and nucleus shapes.

bt.tl.obs_stats(adata)
AnnData object modified:
    obs:
        + cell_density, cell_aspect_ratio, cell_area
AnnData object modified:
    obs:
        + nucleus_aspect_ratio, nucleus_area, nucleus_density

To visualize feature distributions, run the equivalent plotting function, bt.pl.obs_stats().

The strip plot shows individual cells and the boxes show the 4 quantiles, where the blue are the lower two quantiles and the pink are the upper two quantiles. The vertical black line denotes the 50th quantile.

Tip

Spot outlier cells to investigate more closely. These may look very different as a result of poor segmentation or drastically different cell morphology.

bt.pl.obs_stats(adata)
../_images/1f4ec33d3a93c61b88207b01c7ed43ca32f1778f67bc6f076fa653807fb358cf.png

You may be interested in additional features; the main function you will use is bt.tl.analyze_shapes(). Pass cell_shape and area to compute the area for every cell.

bt.tl.analyze_shapes(adata, "cell_shape", "area")

Or pass lists of shapes and feature names to compute all combinations simultaneously.

bt.tl.analyze_shapes(
    adata, ["cell_shape", "nucleus_shape"], ["radius", "span", "perimeter"]
)
AnnData object modified:
    obs:
        + nucleus_perimeter, cell_span, nucleus_radius, cell_perimeter, nucleus_span, cell_radius
bt.pl.obs_stats(
    adata,
    obs_cols=[
        "cell_area",
        "cell_aspect_ratio",
        "cell_density",
        "nucleus_area",
        "nucleus_aspect_ratio",
        "nucleus_density",
        "nucleus_perimeter",
        "cell_perimeter",
    ],
)
../_images/670f15b9299a2a6ebab9156a653ee74ecd21962984ca2e830d92ef61c6491d40.png

You can use standard python data manipulation/visualization tools to explore features i.e. pandas and seaborn.

Note

Bento tries to simplify quantifying these spatial features, so it is conveninent for downstream exploratory tasks to utilize these feature sets e.g. for studying relationships between cell morphology and other phenotypes, building classifiers etc.

sns.pairplot(
    data=adata.obs[["cell_area", "cell_perimeter", "cell_aspect_ratio"]],
    kind="reg",
)
<seaborn.axisgrid.PairGrid at 0x292f4a4f0>
../_images/4a180a2f2c5352a48a91783f9a232979cddb78db74fdcbfb724802387798bc4d.png

Let’s inspect some potential outliers by plotting cells with extreme nucleus_density values.

top_cells = adata.obs["nucleus_density"].sort_values(ascending=False).index[:5]
bottom_cells = adata.obs["nucleus_density"].sort_values(ascending=True).index[:5]
cells = [*top_cells, *bottom_cells]

fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for cell, ax in zip(cells, axes.flat):
    bt.pl.density(adata[cell], ax=ax, square=True, title="", binwidth=20)
    bt.pl.shapes(adata[cell], shapes="nucleus_shape", color="red", ax=ax)
../_images/7b7f4be39073a7141e76d627bd44d454c1ba7a2d71a9563e58a1ab2b47f788c7.png

Point Features#

In addition to shape-level features, we can compute subcellular spatial features for arbitrary groups of points, e.g. for every gene.

List available features with bt.tl.list_point_features.

bt.tl.list_point_features()
{'proximity': 'For a set of points, computes the proximity of points within `shape_name` as well as the proximity of points outside `shape_name`. Proximity is defined as the average absolute distance to the specified `shape_name` normalized by cell radius. Values closer to 0 denote farther from the `shape_name`, values closer to 1 denote closer to the `shape_name`.',
 'asymmetry': 'For a set of points, computes the asymmetry of points within `shape_name` as well as the asymmetry of points outside `shape_name`. Asymmetry is defined as the offset between the centroid of points to the centroid of the specified `shape_name`, normalized by cell radius. Values closer to 0 denote symmetry, values closer to 1 denote asymmetry.',
 'point_dispersion_norm': 'For a set of points, calculates the second moment of all points in a cell relative to the centroid of the total RNA signal. This value is normalized by the second moment of a uniform distribution within the cell boundary.',
 'shape_dispersion_norm': 'For a set of points, calculates the second moment of all points in a cell relative to the centroid of `shape_name`. This value is normalized by the second moment of a uniform distribution within the cell boundary.',
 'distance': 'For a set of points, computes the distance of points within `shape_name` as well as the distance of points outside `shape_name`.',
 'offset': 'For a set of points, computes the offset of points within `shape_name` as well as the offset of points outside `shape_name`. Offset is defined as the offset between the centroid of points to the centroid of the specified `shape_name`.',
 'point_dispersion': 'For a set of points, calculates the second moment of all points in a cell relative to the centroid of the total RNA signal.',
 'shape_dispersion': 'For a set of points, calculates the second moment of all points in a cell relative to the centroid of `shape_name`.',
 'ripley': 'For a set of points, calculates properties of the L-function. The L-function measures spatial clustering of a point pattern over the area of the cell.',
 'shape_enrichment': 'For a set of points, calculates the fraction of points within `shape_name` out of all points in the cell.'}

Similar to shape features, all we need to provide are the names of shape(s) and feature(s), plus an optional point grouping. By default the points are grouped by gene.

bt.tl.analyze_points(
    adata,
    shape_names=["cell_shape", "nucleus_shape"],
    feature_names=["distance", "asymmetry"],
    groupby="gene",
)
Crunching shape features...
Crunching point features...
Saving results...
Done.
AnnData object modified:
    uns:
        + cell_gene_features

Note

Bento demonstrates how utilize these features for downstream tasks, such as predicting RNA localization patterns (RNAforest).

Custom Features#

Shape Features#

You can also register your own custom computations as features. For example, let’s write a function to compute the centroid of a shape.

def cool_function(adata, shape_name, recompute=False):
    shape_prefix = shape_name.split("_")[0]
    feature_key = f"{shape_prefix}_centroid"

    if feature_key in adata.obs and not recompute:
        return

    centroid = bt.geo.get_shape(adata, shape_name).centroid
    adata.obs[feature_key] = centroid

Now we can register this function as a feature. The first argument is the name of the feature, and the second argument is the function itself.

bt.tl.register_shape_feature("centroid", cool_function)
Registered shape feature 'centroid' to `bento.tl.shape_features`.

Let’s compute the centroid for every cell.

bt.tl.analyze_shapes(adata, "cell_shape", "centroid")
AnnData object modified:
    obs:
        + cell_centroid

Point Features#

We can also register custom point features. These are a litte more complicated because we need to provide a function that computes the feature for arbitrary groups of points. For example, let’s write a function to compute the density of points within a shape.

from bento.tools._point_features import PointFeature


class ShapeDensity(PointFeature):
    def __init__(self, shape_name):
        super().__init__(shape_name)

    def extract(self, df):
        # Get shape polygon
        shape = df[self.shape_name].values[0]

        # Get number of points in shape
        count = (df[self.shape_prefix] != "-1").sum()

        # Calculate density
        density = count / shape.area

        return {f"{self.shape_prefix}_density": density}

Register the function as a point feature.

bt.tl.register_point_feature("density", ShapeDensity)
Registered point feature 'density' to `bento.tl.shape_features`.

We can apply it the every cell without specifying a grouping.

bt.tl.analyze_points(adata, "cell_shape", "density")
Crunching shape features...
Crunching point features...
Saving results...
Done.
AnnData object modified:
    uns:
        + cell_features

Or, we can apply it to every gene within each cell, which is how we generate gene-level features for RNAforest.

bt.tl.analyze_points(adata, "cell_shape", "density", groupby="gene", recompute=False)
Crunching shape features...
Crunching point features...
Saving results...
Done.