Stability functions
- tmplot.get_closest_topics(models: List[Any], ref: int = 0, method: str = 'sklb', top_words: int = 100, verbose: bool = True) → Tuple[ndarray, ndarray]
Finding closest topics in models.
- Parameters:
models (List[Any]) – List of supported and fitted topic models.
ref (int = 0) – Index of reference matrix (zero-based indexing).
method (str = "sklb") – Distance calculation method. Possible variants: 1) “klb” - Kullback-Leibler divergence. 2) “sklb” - Symmetric Kullback-Leibler divergence. 3) “jsd” - Jensen-Shannon divergence. 4) “jef” - Jeffrey’s divergence. 5) “hel” - Hellinger distance. 6) “bhat” - Bhattacharyya distance. 7) “tv” - Total variation distance. 8) “jac” - Jaccard index.
top_words (int = 100) – Number of top words in each topic to use in Jaccard index calculation.
verbose (bool = True) – Verbose output (progress bar).
- Returns:
closest_topics (np.ndarray) – Closest topics indices in one two-dimensional array (topics ✕ models). Columns correspond to the compared models (their indices), rows are the closest topics pairs.
dist (np.ndarray) – Closest topics distances (e.g., Kullback-Leibler or Jaccard index values). Shape of this array corresponds to the shape of the first returned argument.
Example
>>> # `models` must be an iterable of fitted models
>>> closest_topics, kldiv = tmplot.get_closest_topics(models)
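The matching idea behind the returned arrays can be illustrated with plain NumPy. The following is a minimal sketch, not tmplot's implementation: `hellinger` and `closest_topics_sketch` are hypothetical helpers, the inputs are assumed to be words ✕ topics matrices whose columns sum to 1, and each reference topic is simply paired with its nearest topic (by Hellinger distance) in every model.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (range 0..1)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def closest_topics_sketch(matrices, ref=0):
    """Hypothetical helper: pair each reference topic with its nearest topic per model.

    `matrices` is a list of words x topics arrays (columns sum to 1).
    Returns two arrays of shape (topics, models): indices and distances.
    """
    ref_m = matrices[ref]
    n_topics = ref_m.shape[1]
    closest = np.zeros((n_topics, len(matrices)), dtype=int)
    dist = np.zeros((n_topics, len(matrices)))
    for m, mat in enumerate(matrices):
        for t in range(n_topics):
            # Distance from reference topic t to every topic of model m.
            d = [hellinger(ref_m[:, t], mat[:, k]) for k in range(mat.shape[1])]
            closest[t, m] = int(np.argmin(d))
            dist[t, m] = min(d)
    return closest, dist

# Toy data: model b is model a with its two topics swapped.
a = np.array([[0.7, 0.1], [0.2, 0.1], [0.1, 0.8]])
b = a[:, ::-1]
closest, dist = closest_topics_sketch([a, b], ref=0)
print(closest)
print(dist)
```

In this toy case the column for the reference model maps every topic to itself, and the column for the second model recovers the swap. Whether the library enforces one-to-one matching is not stated in this section, so the sketch does not assume it.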
- tmplot.get_stable_topics(closest_topics: ndarray, dist: ndarray, norm: bool = True, inverse: bool = True, inverse_factor: float = 1.0, ref: int = 0, thres: float = 0.9, thres_models: int = 2) → Tuple[ndarray, ndarray]
Finding stable topics in models.
- Parameters:
closest_topics (np.ndarray) – Closest topics indices in a two-dimensional array. Columns correspond to the compared matrices (their indices), rows are the closest topics pairs. Typically, this should be the first value returned by the tmplot.get_closest_topics() function.
dist (np.ndarray) – Distance values: Kullback-Leibler divergence or Jaccard index values corresponding to the matrix of the closest topics. Typically, this should be the second value returned by the tmplot.get_closest_topics() function.
norm (bool = True) – Normalize distance values (passed as the dist argument).
inverse (bool = True) – Invert distance values by subtracting them from inverse_factor. Should be set to False if the Jaccard index was used to calculate closest topics.
inverse_factor (float = 1.0) – Value from which distance values are subtracted when inverse is True.
ref (int = 0) – Index of reference matrix (i.e. reference column index, zero-based indexing).
thres (float = 0.9) – Threshold for distance values filtering.
thres_models (int = 2) – Minimum topic recurrence frequency across all models.
- Returns:
stable_topics (np.ndarray) – Filtered matrix of the closest topics indices (i.e. stable topics).
dist (np.ndarray) – Filtered distance values corresponding to the matrix of the closest topics.
Example
>>> closest_topics, kldiv = tmplot.get_closest_topics(models)
>>> stable_topics, stable_kldiv = tmplot.get_stable_topics(
...     closest_topics, kldiv)
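To make the roles of norm, inverse, inverse_factor, thres and thres_models more concrete, here is a minimal sketch of one plausible reading of the parameter descriptions above. It is not tmplot's code: `stable_topics_sketch` is a hypothetical function, and both the scaling by the global maximum and the exclusion of the reference column from the count are assumptions.

```python
import numpy as np

def stable_topics_sketch(closest_topics, dist, norm=True, inverse=True,
                         inverse_factor=1.0, ref=0, thres=0.9, thres_models=2):
    """Hypothetical illustration of the filtering idea, not tmplot's code."""
    d = np.asarray(dist, dtype=float)
    if norm:
        d = d / d.max()  # assumption: scale by the global maximum
    if inverse:
        d = inverse_factor - d  # small distances become large scores
    # Assumption: the reference column (a model compared with itself)
    # does not count towards the recurrence frequency.
    others = np.delete(d, ref, axis=1)
    keep = (others >= thres).sum(axis=1) >= thres_models
    return np.asarray(closest_topics)[keep], d[keep]

# Three reference topics compared across three models (column 0 is the reference).
closest = np.array([[0, 2, 1], [1, 0, 0], [2, 1, 2]])
dist = np.array([[0.0, 0.10, 0.16],
                 [0.0, 1.50, 2.00],
                 [0.0, 0.05, 1.80]])
stable, scores = stable_topics_sketch(closest, dist)
print(stable)
print(scores)
```

Only the first row survives here: after normalization and inversion it scores at least 0.9 in both non-reference models, while the other two rows reach the threshold in fewer than thres_models models. With Jaccard index values, which already grow with similarity, the sketch would be called with inverse=False, in line with the note on the inverse parameter.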