Spatial Features

Author: Clarence Mah | Last Updated: Apr 26, 2024

Here we demonstrate how to compute spatial features with Bento. We will use the included MERFISH U2-OS dataset.

Load Libraries

import bento as bt
import matplotlib.pyplot as plt
import seaborn as sns

Load Data

sdata = bt.ds.sample_data()
Done 🍱: 100%|██████████| 3/3 [00:01<00:00,  1.59it/s]
bt.pl.density(sdata)
../_images/71a537ca905c59e63314c7ac0bb7a6e386bd40a332e08db981a9425a411bd8c7.png

Shape Features

In Bento, we refer to cell membranes and other subcellular boundaries, e.g. the nuclear membrane, as shapes. We can inspect morphological properties of these shapes with a number of built-in shape features. Call bt.tl.list_shape_features() to list available features.

bt.tl.list_shape_features()
{'area': 'Compute the area of each shape.',
 'aspect_ratio': 'Compute the aspect ratio of the minimum rotated rectangle that contains each shape.',
 'bounds': 'Compute the minimum and maximum coordinate values that bound each shape.',
 'density': 'Compute the RNA density of each shape.',
 'opening': 'Compute the opening (morphological) of distance d for each cell.',
 'perimeter': 'Compute the perimeter of each shape.',
 'radius': 'Compute the radius of each cell.',
 'raster': 'Generate a grid of points contained within each shape. The points lie on\n    a 2D grid, with vertices spaced `step` distance apart.',
 'second_moment': 'Compute the second moment of each shape.',
 'span': 'Compute the length of the longest diagonal of each shape.'}
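
To make a few of these descriptions concrete, here is a minimal sketch of what 'area', 'perimeter', and 'bounds' mean for a single polygon. It is written in plain Python and is not Bento's implementation; the helper names and the rectangular "cell" outline are hypothetical and used only for illustration.

```python
import math


def polygon_area(vertices):
    """Area of a simple (non self-intersecting) polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(vertices):
    """Sum of edge lengths, including the closing edge."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))


def polygon_bounds(vertices):
    """Minimum and maximum coordinates as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical cell outline: a 4 x 2 rectangle (arbitrary units).
outline = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]

area = polygon_area(outline)
perimeter = polygon_perimeter(outline)
bounds = polygon_bounds(outline)
print(area, perimeter, bounds)
```

In practice you would not write these by hand; the sketch only shows the geometric quantities that the feature names refer to.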

For convenience, bt.tl.shape_stats() computes the area, aspect ratio, and density properties for the cell and nucleus shapes.

bt.tl.shape_stats(sdata)
100%|██████████| 3/3 [00:00<00:00, 114.64it/s]
100%|██████████| 3/3 [00:00<00:00, 102.04it/s]

To visualize feature distributions, run the equivalent plotting function, bt.pl.shape_stats().

The strip plot shows individual cells, and the boxes show the four quartiles: blue boxes cover the lower two quartiles and pink boxes cover the upper two. The vertical black line marks the median (50th percentile).

Tip

Spot outlier cells to investigate more closely. These may look very different as a result of poor segmentation or drastically different cell morphology.

bt.pl.shape_stats(sdata)
../_images/076a67aa02fcd70771e7c2279e15519b5912f3cdca5de68349bf070c473e08f0.png
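
One way to follow up on the tip about outlier cells is to flag them numerically rather than by eye. The sketch below uses the common 1.5 x IQR rule on a made-up set of cell areas; the cell IDs, the values, and the choice of rule are all assumptions for illustration and are not what the plot itself computes.

```python
import statistics

# Hypothetical per-cell areas (arbitrary units); cell_8 is deliberately extreme.
areas = {
    "cell_1": 100.0,
    "cell_2": 110.0,
    "cell_3": 120.0,
    "cell_4": 115.0,
    "cell_5": 105.0,
    "cell_6": 108.0,
    "cell_7": 112.0,
    "cell_8": 500.0,
}

# Quartile cut points: q1 (25%), median (50%), q3 (75%).
q1, median, q3 = statistics.quantiles(list(areas.values()), n=4)
iqr = q3 - q1

# Conventional fences: anything beyond 1.5 * IQR from the box is flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = {
    cell: value
    for cell, value in areas.items()
    if value < lower_fence or value > upper_fence
}
print(outliers)
```

Flagged cells are candidates for a closer look at their segmentation, not automatic exclusions.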

You may be interested in additional features; the main function you will use is bt.tl.analyze_shapes(). Pass cell_boundaries and area to compute the area for every cell.

bt.tl.analyze_shapes(sdata, "cell_boundaries", "area")
100%|██████████| 1/1 [00:00<00:00, 9892.23it/s]

Or pass lists of shapes and feature names to compute all combinations simultaneously.

bt.tl.analyze_shapes(
    sdata, ["cell_boundaries", "nucleus_boundaries"], ["radius", "span", "perimeter"]
)
100%|██████████| 6/6 [00:00<00:00, 517.16it/s]
bt.pl.shape_stats(
    sdata,
    cols=[
        "cell_boundaries_area",
        "cell_boundaries_aspect_ratio",
        "cell_boundaries_density",
        "cell_boundaries_perimeter",
        "nucleus_boundaries_area",
        "nucleus_boundaries_aspect_ratio",
        "nucleus_boundaries_density",
        "nucleus_boundaries_perimeter",
    ],
)
../_images/ac29073db701ad09c282995d571bc71476cb188780cd8515c8b50b5f8dd6980c.png
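
The column names passed to bt.pl.shape_stats() above appear to follow a "<shape>_<feature>" pattern. Assuming that pattern holds (it is inferred from the names used in this tutorial, not from a documented guarantee), a short sketch for enumerating the columns produced by a list of shapes and a list of features could look like this:

```python
from itertools import product

shape_keys = ["cell_boundaries", "nucleus_boundaries"]
feature_names = ["radius", "span", "perimeter"]

# Assumed naming convention: "<shape>_<feature>", one column per combination.
expected_columns = [
    f"{shape}_{feature}" for shape, feature in product(shape_keys, feature_names)
]

for column in expected_columns:
    print(column)
```

Two shapes and three features give six combinations, which is consistent with the 6/6 progress count reported by the analyze_shapes call above.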

You can use standard Python data manipulation and visualization tools, e.g. pandas and seaborn, to explore features.

Note

Bento tries to simplify quantifying these spatial features, so that it is convenient to use these feature sets in downstream exploratory tasks, e.g. studying relationships between cell morphology and other phenotypes, building classifiers, etc.

sns.pairplot(
    data=sdata["cell_boundaries"][
        [
            "cell_boundaries_area",
            "cell_boundaries_perimeter",
            "cell_boundaries_aspect_ratio",
        ]
    ],
    kind="reg",
)
<seaborn.axisgrid.PairGrid at 0x7f85fc295b40>
../_images/a0a6bc66d3cb227bee9be260d365283432dcbbaca9fa04e4785d942f94f00bd4.png

Point Features

In addition to shape-level features, we can compute subcellular spatial features for arbitrary groups of points, e.g. for every gene.

List available features with bt.tl.list_point_features().

bt.tl.list_point_features()
{'proximity': 'For a set of points, computes the proximity of points within `shape_key` as well as the proximity of points outside `shape_key`. Proximity is defined as the average absolute distance to the specified `shape_key` normalized by cell radius. Values closer to 0 denote farther from the `shape_key`, values closer to 1 denote closer to the `shape_key`.',
 'asymmetry': 'For a set of points, computes the asymmetry of points within `shape_key` as well as the asymmetry of points outside `shape_key`. Asymmetry is defined as the offset between the centroid of points to the centroid of the specified `shape_key`, normalized by cell radius. Values closer to 0 denote symmetry, values closer to 1 denote asymmetry.',
 'point_dispersion_norm': 'For a set of points, calculates the second moment of all points in a cell relative to the centroid of the total RNA signal. This value is normalized by the second moment of a uniform distribution within the cell boundary.',
 'shape_dispersion_norm': 'For a set of points, calculates the second moment of all points in a cell relative to the centroid of `shape_key`. This value is normalized by the second moment of a uniform distribution within the cell boundary.',
 'distance': 'For a set of points, computes the distance of points within `shape_key` as well as the distance of points outside `shape_key`.',
 'offset': 'For a set of points, computes the offset of points within `shape_key` as well as the offset of points outside `shape_key`. Offset is defined as the offset between the centroid of points to the centroid of the specified `shape_key`.',
 'point_dispersion': 'For a set of points, calculates the second moment of all points in a cell relative to the centroid of the total RNA signal.',
 'shape_dispersion': 'For a set of points, calculates the second moment of all points in a cell relative to the centroid of `shape_key`.',
 'ripley': 'For a set of points, calculates properties of the L-function. The L-function measures spatial clustering of a point pattern over the area of the cell.',
 'shape_enrichment': 'For a set of points, calculates the fraction of points within `shape_key` out of all points in the cell.'}
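
As a rough illustration of the 'offset' and 'asymmetry' descriptions above, the following plain-Python sketch computes the distance between the centroid of a group of points and a shape centroid, then normalizes it by a cell radius. It is not Bento's implementation: it ignores the split between points inside and outside `shape_key`, and the coordinates, centroid, and radius are hypothetical inputs.

```python
import math


def centroid(points):
    """Mean x and mean y of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def offset(points, shape_centroid):
    """Distance between the centroid of the points and the shape centroid."""
    return math.dist(centroid(points), shape_centroid)


def asymmetry(points, shape_centroid, cell_radius):
    """Offset normalized by cell radius; values near 0 indicate symmetry."""
    return offset(points, shape_centroid) / cell_radius


# Hypothetical transcripts of one gene in one cell; their centroid is (4, 5).
gene_points = [(3.0, 4.0), (5.0, 4.0), (3.0, 6.0), (5.0, 6.0)]
nucleus_centroid = (1.0, 1.0)  # assumed centroid of the reference shape
cell_radius = 10.0             # assumed radius of the enclosing cell

gene_offset = offset(gene_points, nucleus_centroid)
gene_asymmetry = asymmetry(gene_points, nucleus_centroid, cell_radius)
print(gene_offset, gene_asymmetry)
```

Normalizing by cell radius is what makes the value comparable across cells of different sizes.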

Similar to shape features, all we need to provide are the names of shape(s) and feature(s), plus an optional point grouping. By default the points are grouped by gene.

bt.tl.analyze_points(
    sdata,
    shape_keys=["cell_boundaries", "nucleus_boundaries"],
    feature_names=["distance", "asymmetry"],
    groupby="feature_name",
)
Crunching shape features...
Crunching point features...
Saving results...
Done.
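
Conceptually, grouping by gene means that each feature is computed once per (cell, gene) set of points. The sketch below mimics that grouping step with plain Python on a tiny made-up table. The "feature_name" key matches the groupby argument used above, while the "cell" key, the gene names, and the helper functions are hypothetical and do not reflect how Bento stores or processes points internally.

```python
from collections import defaultdict

# Hypothetical point records: one row per detected transcript.
points = [
    {"cell": "cell_1", "feature_name": "GENE_A", "x": 1.0, "y": 1.0},
    {"cell": "cell_1", "feature_name": "GENE_A", "x": 3.0, "y": 1.0},
    {"cell": "cell_1", "feature_name": "GENE_B", "x": 0.0, "y": 2.0},
    {"cell": "cell_2", "feature_name": "GENE_A", "x": 5.0, "y": 5.0},
]


def group_points(records, groupby="feature_name"):
    """Collect (x, y) coordinates for every (cell, group) pair."""
    groups = defaultdict(list)
    for record in records:
        key = (record["cell"], record[groupby])
        groups[key].append((record["x"], record["y"]))
    return dict(groups)


def group_centroids(groups):
    """Centroid of each group, as one example of a per-group statistic."""
    return {
        key: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
        )
        for key, pts in groups.items()
    }


groups = group_points(points)
centroids = group_centroids(groups)
print(centroids)
```

Passing a different key as `groupby` in this sketch changes which points are pooled together before a statistic is computed.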

Note

Bento demonstrates how to use these features for downstream tasks, such as predicting RNA localization patterns (RNAforest).

Custom Features

Warning

This section is under construction.