{
"cells": [
{
"cell_type": "markdown",
"id": "87920721-d15a-4ad2-84c0-7b9e82df176f",
"metadata": {},
"source": [
"# Tutorial"
]
},
{
"cell_type": "markdown",
"id": "b88b066b-3d5f-4603-a49f-e94b06b97860",
"metadata": {},
"source": [
"## Importing packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec38e9d6-c56e-4703-976f-3264b7c37fb6",
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"warnings.filterwarnings('ignore')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "377a9b2f-527a-4610-acb7-bb1261da6642",
"metadata": {},
"outputs": [],
"source": [
"import tmplot as tmp\n",
"import pickle as pkl\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "aebd7122-44e0-46f9-b999-400c6930be48",
"metadata": {},
"source": [
"## Importing data"
]
},
{
"cell_type": "markdown",
"id": "dd9119c3-e46d-4870-b066-d0866810e45a",
"metadata": {},
"source": [
"Let's take the BTM model trained on a test dataset (*SearchSnippets*) as an example.\n",
"We will begin with reading it from a file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "292c08e3-0fa3-431c-bd6e-b414f086dd28",
"metadata": {},
"outputs": [],
"source": [
"with open('data/model_btm.pkl', 'rb') as file:\n",
" model = pkl.load(file)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63773c31-8b45-4e20-9b55-a61cb5f7987c",
"metadata": {},
"outputs": [],
"source": [
"docs = pd.read_csv('data/SearchSnippets.txt.gz', header=None).values.ravel()"
]
},
{
"cell_type": "markdown",
"id": "7483f62b-ffc6-466a-bb5e-71375c29bd4d",
"metadata": {},
"source": [
"## Matrices"
]
},
{
"cell_type": "markdown",
"id": "47eb8fbf-7327-42ef-9e27-fe523518dd0f",
"metadata": {},
"source": [
"Researchers working with topic models often need to obtain `phi` (words vs topics probability) and `theta` (topics vs documents probability) matrices.\n",
"*Tmplot* provides two functions for getting these matrices from `tomotopy`, `bitermplus`, and `gensim` models."
]
},
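{
"cell_type": "markdown",
"id": "sketch-phi-theta-toy",
"metadata": {},
"source": [
"A minimal sketch with hand-made toy matrices may help make the two definitions concrete. The numbers and the names `phi_toy`, `theta_toy`, `column_sums`, and `word_doc_probs` are made up for illustration and are not part of *tmplot*. The sketch assumes the orientation used in this tutorial: words by topics for `phi`, topics by documents for `theta`.\n",
"\n",
"```python\n",
"# Toy matrices with made-up numbers (not taken from the model in this tutorial).\n",
"# phi_toy[w][t] = P(w|t): rows are words, columns are topics.\n",
"phi_toy = [\n",
"    [0.7, 0.1],\n",
"    [0.2, 0.3],\n",
"    [0.1, 0.6],\n",
"]\n",
"# theta_toy[t][d] = P(t|d): rows are topics, columns are documents.\n",
"theta_toy = [\n",
"    [0.9, 0.25],\n",
"    [0.1, 0.75],\n",
"]\n",
"\n",
"\n",
"def column_sums(matrix):\n",
"    # Sum each column of a list-of-rows matrix.\n",
"    return [sum(column) for column in zip(*matrix)]\n",
"\n",
"\n",
"def word_doc_probs(phi, theta):\n",
"    # P(w|d) = sum over topics t of P(w|t) * P(t|d)\n",
"    n_docs = len(theta[0])\n",
"    return [\n",
"        [sum(p_wt * theta[t][d] for t, p_wt in enumerate(row)) for d in range(n_docs)]\n",
"        for row in phi\n",
"    ]\n",
"\n",
"\n",
"p_wd = word_doc_probs(phi_toy, theta_toy)\n",
"print(column_sums(phi_toy))    # each topic is a distribution over words\n",
"print(column_sums(theta_toy))  # each document is a distribution over topics\n",
"print(column_sums(p_wd))       # each document is therefore a distribution over words\n",
"```\n",
"\n",
"Every column sum should come out at 1 up to floating-point rounding, which is a quick sanity check worth running on real `phi` and `theta` matrices as well."
]
},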
{
"cell_type": "markdown",
"id": "80d6fd7a-b600-4331-b1bb-0bf68932faf9",
"metadata": {},
"source": [
"### Phi matrix"
]
},
{
"cell_type": "markdown",
"id": "77650819-89bb-4764-86b6-ce615a86e83d",
"metadata": {},
"source": [
"Note that you will need to pass a vocabulary for a gensim model."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c13528d2-0663-42e6-b08a-1461dfd4eaf3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | topics | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
" 4 | \n",
" 5 | \n",
" 6 | \n",
" 7 | \n",
"
\n",
" \n",
" | words | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | aaa | \n",
" 3.195102e-08 | \n",
" 3.012856e-08 | \n",
" 3.047842e-08 | \n",
" 3.542745e-08 | \n",
" 3.836165e-08 | \n",
" 2.961217e-08 | \n",
" 2.362519e-08 | \n",
" 4.831267e-08 | \n",
"
\n",
" \n",
" | aaas | \n",
" 3.837318e-05 | \n",
" 3.012856e-08 | \n",
" 3.047842e-08 | \n",
" 3.542745e-08 | \n",
" 3.836165e-08 | \n",
" 5.922729e-04 | \n",
" 6.144912e-05 | \n",
" 2.903592e-05 | \n",
"
\n",
" \n",
" | aaron | \n",
" 3.195102e-08 | \n",
" 3.012856e-08 | \n",
" 3.047842e-08 | \n",
" 3.542745e-08 | \n",
" 4.296888e-04 | \n",
" 2.961217e-08 | \n",
" 2.362519e-08 | \n",
" 4.831267e-08 | \n",
"
\n",
" \n",
" | aau | \n",
" 3.195102e-08 | \n",
" 3.012856e-08 | \n",
" 3.047842e-08 | \n",
" 3.542745e-08 | \n",
" 3.836165e-08 | \n",
" 2.961217e-08 | \n",
" 2.362519e-08 | \n",
" 4.203686e-04 | \n",
"
\n",
" \n",
" | abbreviations | \n",
" 7.990951e-05 | \n",
" 3.163800e-04 | \n",
" 3.047842e-08 | \n",
" 3.542745e-08 | \n",
" 3.836165e-08 | \n",
" 2.961217e-08 | \n",
" 2.386144e-06 | \n",
" 4.831267e-08 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"topics 0 1 2 3 \\\n",
"words \n",
"aaa 3.195102e-08 3.012856e-08 3.047842e-08 3.542745e-08 \n",
"aaas 3.837318e-05 3.012856e-08 3.047842e-08 3.542745e-08 \n",
"aaron 3.195102e-08 3.012856e-08 3.047842e-08 3.542745e-08 \n",
"aau 3.195102e-08 3.012856e-08 3.047842e-08 3.542745e-08 \n",
"abbreviations 7.990951e-05 3.163800e-04 3.047842e-08 3.542745e-08 \n",
"\n",
"topics 4 5 6 7 \n",
"words \n",
"aaa 3.836165e-08 2.961217e-08 2.362519e-08 4.831267e-08 \n",
"aaas 3.836165e-08 5.922729e-04 6.144912e-05 2.903592e-05 \n",
"aaron 4.296888e-04 2.961217e-08 2.362519e-08 4.831267e-08 \n",
"aau 3.836165e-08 2.961217e-08 2.362519e-08 4.203686e-04 \n",
"abbreviations 3.836165e-08 2.961217e-08 2.386144e-06 4.831267e-08 "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"phi = tmp.get_phi(model)\n",
"phi.head()"
]
},
{
"cell_type": "markdown",
"id": "f4c6f33a-0a87-4fcb-87ee-ee11a16d2ee8",
"metadata": {},
"source": [
"### Theta matrix"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "81c8eeed-2bd9-43cb-9288-963348d45a46",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | docs | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
" 4 | \n",
" 5 | \n",
" 6 | \n",
" 7 | \n",
" 8 | \n",
" 9 | \n",
" ... | \n",
" 990 | \n",
" 991 | \n",
" 992 | \n",
" 993 | \n",
" 994 | \n",
" 995 | \n",
" 996 | \n",
" 997 | \n",
" 998 | \n",
" 999 | \n",
"
\n",
" \n",
" | topics | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 0.354702 | \n",
" 0.294777 | \n",
" 0.178074 | \n",
" 0.332888 | \n",
" 0.596412 | \n",
" 0.726975 | \n",
" 0.099094 | \n",
" 0.257602 | \n",
" 0.532725 | \n",
" 0.471059 | \n",
" ... | \n",
" 0.007651 | \n",
" 0.085897 | \n",
" 0.025840 | \n",
" 0.019194 | \n",
" 0.033898 | \n",
" 0.020408 | \n",
" 0.030728 | \n",
" 0.036133 | \n",
" 0.084323 | \n",
" 0.024301 | \n",
"
\n",
" \n",
" | 1 | \n",
" 0.000245 | \n",
" 0.007173 | \n",
" 0.021324 | \n",
" 0.019411 | \n",
" 0.029472 | \n",
" 0.008740 | \n",
" 0.011804 | \n",
" 0.036323 | \n",
" 0.011349 | \n",
" 0.003909 | \n",
" ... | \n",
" 0.069988 | \n",
" 0.263869 | \n",
" 0.058431 | \n",
" 0.227196 | \n",
" 0.022920 | \n",
" 0.021660 | \n",
" 0.040932 | \n",
" 0.060534 | \n",
" 0.150018 | \n",
" 0.071271 | \n",
"
\n",
" \n",
" | 2 | \n",
" 0.003073 | \n",
" 0.057144 | \n",
" 0.013837 | \n",
" 0.014514 | \n",
" 0.011813 | \n",
" 0.002588 | \n",
" 0.000247 | \n",
" 0.027391 | \n",
" 0.002325 | \n",
" 0.005435 | \n",
" ... | \n",
" 0.007558 | \n",
" 0.014669 | \n",
" 0.014206 | \n",
" 0.002697 | \n",
" 0.008854 | \n",
" 0.017299 | \n",
" 0.014710 | \n",
" 0.027672 | \n",
" 0.061375 | \n",
" 0.011318 | \n",
"
\n",
" \n",
" | 3 | \n",
" 0.003678 | \n",
" 0.029281 | \n",
" 0.010010 | \n",
" 0.001287 | \n",
" 0.027349 | \n",
" 0.004351 | \n",
" 0.018189 | \n",
" 0.085879 | \n",
" 0.011453 | \n",
" 0.002965 | \n",
" ... | \n",
" 0.007010 | \n",
" 0.022462 | \n",
" 0.007516 | \n",
" 0.006018 | \n",
" 0.001193 | \n",
" 0.007400 | \n",
" 0.007335 | \n",
" 0.021119 | \n",
" 0.012309 | \n",
" 0.006168 | \n",
"
\n",
" \n",
" | 4 | \n",
" 0.000927 | \n",
" 0.035162 | \n",
" 0.001736 | \n",
" 0.319421 | \n",
" 0.024606 | \n",
" 0.042996 | \n",
" 0.019524 | \n",
" 0.036119 | \n",
" 0.001910 | \n",
" 0.039332 | \n",
" ... | \n",
" 0.016587 | \n",
" 0.056386 | \n",
" 0.005925 | \n",
" 0.003503 | \n",
" 0.001620 | \n",
" 0.006468 | \n",
" 0.004151 | \n",
" 0.018374 | \n",
" 0.008712 | \n",
" 0.087364 | \n",
"
\n",
" \n",
"
\n",
"
5 rows × 1000 columns
\n",
"
"
],
"text/plain": [
"docs 0 1 2 3 4 5 6 \\\n",
"topics \n",
"0 0.354702 0.294777 0.178074 0.332888 0.596412 0.726975 0.099094 \n",
"1 0.000245 0.007173 0.021324 0.019411 0.029472 0.008740 0.011804 \n",
"2 0.003073 0.057144 0.013837 0.014514 0.011813 0.002588 0.000247 \n",
"3 0.003678 0.029281 0.010010 0.001287 0.027349 0.004351 0.018189 \n",
"4 0.000927 0.035162 0.001736 0.319421 0.024606 0.042996 0.019524 \n",
"\n",
"docs 7 8 9 ... 990 991 992 \\\n",
"topics ... \n",
"0 0.257602 0.532725 0.471059 ... 0.007651 0.085897 0.025840 \n",
"1 0.036323 0.011349 0.003909 ... 0.069988 0.263869 0.058431 \n",
"2 0.027391 0.002325 0.005435 ... 0.007558 0.014669 0.014206 \n",
"3 0.085879 0.011453 0.002965 ... 0.007010 0.022462 0.007516 \n",
"4 0.036119 0.001910 0.039332 ... 0.016587 0.056386 0.005925 \n",
"\n",
"docs 993 994 995 996 997 998 999 \n",
"topics \n",
"0 0.019194 0.033898 0.020408 0.030728 0.036133 0.084323 0.024301 \n",
"1 0.227196 0.022920 0.021660 0.040932 0.060534 0.150018 0.071271 \n",
"2 0.002697 0.008854 0.017299 0.014710 0.027672 0.061375 0.011318 \n",
"3 0.006018 0.001193 0.007400 0.007335 0.021119 0.012309 0.006168 \n",
"4 0.003503 0.001620 0.006468 0.004151 0.018374 0.008712 0.087364 \n",
"\n",
"[5 rows x 1000 columns]"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tmp.get_theta(model).head()"
]
},
{
"cell_type": "markdown",
"id": "e406b148-2b5a-43d1-92ae-9b372e668759",
"metadata": {},
"source": [
"## Documents"
]
},
{
"cell_type": "markdown",
"id": "8726b24b-127d-422c-9966-910a19edd62e",
"metadata": {},
"source": [
"Here is how you can get documents with maximum probabilities $P(t|d)$ for each topic:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2f7dbde-d01b-40f6-8cd6-229c2941400c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" topic0 | \n",
" topic1 | \n",
" topic2 | \n",
" topic3 | \n",
" topic4 | \n",
" topic5 | \n",
" topic6 | \n",
" topic7 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" speakeasy speedtest speakeasy speed test test ... | \n",
" links jstor sici sici jstor postwar consumptio... | \n",
" imdb name julia roberts julia roberts imdb mov... | \n",
" guitars bodies amps guitars strings | \n",
" vcic unc edu vcic venture capital investment c... | \n",
" washington edu drivers device drivers device d... | \n",
" apache api dom document document xml standard ... | \n",
" hypotheses hypotheses author illustrates hypot... | \n",
"
\n",
" \n",
" | 1 | \n",
" speedtest bandwidth speed test bandwidth speed... | \n",
" econpapers repec article econpapers postwar co... | \n",
" celebrities cruise celebrity tom cruise tom cr... | \n",
" louis french fashion designer designer manufac... | \n",
" national venture capital association foster un... | \n",
" manufactures parallel serial drives | \n",
" schools dom default xml dom tutorial xml docum... | \n",
" surreal surreal | \n",
"
\n",
" \n",
" | 2 | \n",
" home bandwidth broadband speedtest bandwidth c... | \n",
" findarticles articles consumption consumer exp... | \n",
" imdb name tom cruise tom cruise imdb movies ce... | \n",
" fashion designers default fashion designers fa... | \n",
" san jose mercury news venture capital expanded... | \n",
" leonardo leonardo vinci inventor information c... | \n",
" access cards ieee access | \n",
" allposters surrealism posters surrealism poste... | \n",
"
\n",
" \n",
" | 3 | \n",
" home bandwidth broadband speedtest bandwidth c... | \n",
" financial financial international health insur... | \n",
" absolutely roberts absolutely julia roberts ph... | \n",
" fashion designers audio fashion designer net f... | \n",
" seattlepi nwsource venture seattle venture cap... | \n",
" journals searching biomedical journals engine ... | \n",
" generator xml generator sample xml instance do... | \n",
" hypotheses hypotheses nature research hypothes... | \n",
"
\n",
" \n",
" | 4 | \n",
" portfolio shareholder services manage investme... | \n",
" consumption consumer rights consumption consum... | \n",
" imdb title imdb movies celebs | \n",
" fashion fashion designers fashion designers fa... | \n",
" venture capital journal listening model ventur... | \n",
" lwn articles driver lwn device drivers kernel ... | \n",
" reference standard template library standard t... | \n",
" allposters beatles posters beatles prints allp... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" topic0 \\\n",
"0 speakeasy speedtest speakeasy speed test test ... \n",
"1 speedtest bandwidth speed test bandwidth speed... \n",
"2 home bandwidth broadband speedtest bandwidth c... \n",
"3 home bandwidth broadband speedtest bandwidth c... \n",
"4 portfolio shareholder services manage investme... \n",
"\n",
" topic1 \\\n",
"0 links jstor sici sici jstor postwar consumptio... \n",
"1 econpapers repec article econpapers postwar co... \n",
"2 findarticles articles consumption consumer exp... \n",
"3 financial financial international health insur... \n",
"4 consumption consumer rights consumption consum... \n",
"\n",
" topic2 \\\n",
"0 imdb name julia roberts julia roberts imdb mov... \n",
"1 celebrities cruise celebrity tom cruise tom cr... \n",
"2 imdb name tom cruise tom cruise imdb movies ce... \n",
"3 absolutely roberts absolutely julia roberts ph... \n",
"4 imdb title imdb movies celebs \n",
"\n",
" topic3 \\\n",
"0 guitars bodies amps guitars strings \n",
"1 louis french fashion designer designer manufac... \n",
"2 fashion designers default fashion designers fa... \n",
"3 fashion designers audio fashion designer net f... \n",
"4 fashion fashion designers fashion designers fa... \n",
"\n",
" topic4 \\\n",
"0 vcic unc edu vcic venture capital investment c... \n",
"1 national venture capital association foster un... \n",
"2 san jose mercury news venture capital expanded... \n",
"3 seattlepi nwsource venture seattle venture cap... \n",
"4 venture capital journal listening model ventur... \n",
"\n",
" topic5 \\\n",
"0 washington edu drivers device drivers device d... \n",
"1 manufactures parallel serial drives \n",
"2 leonardo leonardo vinci inventor information c... \n",
"3 journals searching biomedical journals engine ... \n",
"4 lwn articles driver lwn device drivers kernel ... \n",
"\n",
" topic6 \\\n",
"0 apache api dom document document xml standard ... \n",
"1 schools dom default xml dom tutorial xml docum... \n",
"2 access cards ieee access \n",
"3 generator xml generator sample xml instance do... \n",
"4 reference standard template library standard t... \n",
"\n",
" topic7 \n",
"0 hypotheses hypotheses author illustrates hypot... \n",
"1 surreal surreal \n",
"2 allposters surrealism posters surrealism poste... \n",
"3 hypotheses hypotheses nature research hypothes... \n",
"4 allposters beatles posters beatles prints allp... "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tmp.get_top_docs(docs, model=model)"
]
},
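{
"cell_type": "markdown",
"id": "sketch-top-docs-toy",
"metadata": {},
"source": [
"The selection above can be sketched in plain Python. This only illustrates the idea of ranking documents by $P(t|d)$ within each topic; it is not the implementation of `get_top_docs`. The helper `top_docs_per_topic` and the toy data are hypothetical.\n",
"\n",
"```python\n",
"def top_docs_per_topic(docs, theta, docs_num=2):\n",
"    # theta[t][d] = P(t|d); rank documents within each topic row.\n",
"    top = {}\n",
"    for t, row in enumerate(theta):\n",
"        ranked = sorted(range(len(row)), key=lambda d: row[d], reverse=True)\n",
"        top['topic' + str(t)] = [docs[d] for d in ranked[:docs_num]]\n",
"    return top\n",
"\n",
"\n",
"# Made-up documents and probabilities, for illustration only.\n",
"docs_toy = ['speed test', 'movie stars', 'bandwidth check']\n",
"theta_toy = [\n",
"    [0.8, 0.1, 0.6],\n",
"    [0.2, 0.9, 0.4],\n",
"]\n",
"top_toy = top_docs_per_topic(docs_toy, theta_toy, docs_num=2)\n",
"print(top_toy)\n",
"```\n",
"\n",
"Note that the same document can appear under several topics, because each topic is ranked independently."
]
},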
{
"cell_type": "markdown",
"id": "dcb89b61-e611-44cf-b264-08bee8df2746",
"metadata": {},
"source": [
"## Visualization"
]
},
{
"cell_type": "markdown",
"id": "de40a65f-d213-4e58-b92a-52fad8a75034",
"metadata": {},
"source": [
"*tmplot* takes much from [LDAvis](https://github.com/cpsievert/LDAvis), but also extends the functionality with a number of algorithms and metrics for plotting topics and terms. *tmplot* is based on [ipywidgets](https://ipywidgets.readthedocs.io/) and [Altair](https://altair-viz.github.io/) ([Vega](https://vega.github.io/)-backed package for nice plots)."
]
},
{
"cell_type": "markdown",
"id": "cdf7fe50-601a-431f-bf33-f6de019eda7e",
"metadata": {},
"source": [
"### Topics"
]
},
{
"cell_type": "markdown",
"id": "5f7b8d80-db66-4062-b878-3a8fee0e20ce",
"metadata": {},
"source": [
"First, we need to calculate the coordinates of topics based on intertopic distance values. By default, the combination of *t-distributed Stochastic Neighbor Embedding* and *symmetric Kullback-Leibler divergence* is used to calculate topics coordinates in 2D, but a number of other metrics and algorithms are also available (see `tmplot.get_topics_dist` and `tmplot.get_topics_scatter` functions for additional information). "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ea2a69d-cc8a-4545-a407-fd02bc69138a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" x | \n",
" y | \n",
" topic | \n",
" size | \n",
" label | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" -41.183987 | \n",
" -30.480648 | \n",
" 0 | \n",
" 21.160233 | \n",
" 0 | \n",
"
\n",
" \n",
" | 1 | \n",
" -11.704910 | \n",
" -34.631725 | \n",
" 1 | \n",
" 4.265470 | \n",
" 1 | \n",
"
\n",
" \n",
" | 2 | \n",
" -56.292171 | \n",
" -4.832846 | \n",
" 2 | \n",
" 20.599346 | \n",
" 2 | \n",
"
\n",
" \n",
" | 3 | \n",
" 9.921317 | \n",
" -14.181945 | \n",
" 3 | \n",
" 7.176289 | \n",
" 3 | \n",
"
\n",
" \n",
" | 4 | \n",
" -45.702721 | \n",
" 22.987968 | \n",
" 4 | \n",
" 4.535249 | \n",
" 4 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" x y topic size label\n",
"0 -41.183987 -30.480648 0 21.160233 0\n",
"1 -11.704910 -34.631725 1 4.265470 1\n",
"2 -56.292171 -4.832846 2 20.599346 2\n",
"3 9.921317 -14.181945 3 7.176289 3\n",
"4 -45.702721 22.987968 4 4.535249 4"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"topics_coords = tmp.prepare_coords(model)\n",
"topics_coords.head()"
]
},
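{
"cell_type": "markdown",
"id": "sketch-symmetric-kl-toy",
"metadata": {},
"source": [
"To give a feel for the distance step, here is a minimal sketch of a symmetric Kullback-Leibler divergence between topics, computed from the columns of a word-topic matrix. It assumes the symmetric variant that averages the two directed divergences, which is one common convention; *tmplot* may define it differently. The function names and toy numbers are hypothetical, and the 2D embedding step (t-SNE) is left out.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def kl_divergence(p, q):\n",
"    # KL(p || q) for two discrete distributions with strictly positive entries.\n",
"    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))\n",
"\n",
"\n",
"def symmetric_kl(p, q):\n",
"    # Average of the two directed divergences (an assumed convention).\n",
"    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))\n",
"\n",
"\n",
"def topic_distances(phi):\n",
"    # phi[w][t] = P(w|t); compare topics column by column.\n",
"    topics = list(zip(*phi))\n",
"    return [[symmetric_kl(p, q) for q in topics] for p in topics]\n",
"\n",
"\n",
"# Made-up word-topic probabilities; every entry is positive, so the logs are defined.\n",
"phi_toy = [\n",
"    [0.7, 0.1, 0.6],\n",
"    [0.2, 0.3, 0.3],\n",
"    [0.1, 0.6, 0.1],\n",
"]\n",
"dist = topic_distances(phi_toy)\n",
"for row in dist:\n",
"    print([round(value, 3) for value in row])\n",
"```\n",
"\n",
"The result is a square matrix with zeros on the diagonal. In this toy example topics 0 and 2 have similar word distributions, so their distance is smaller than the distance between topics 0 and 1."
]
},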
{
"cell_type": "markdown",
"id": "3200724e-18f1-4f84-b751-400ee7c42323",
"metadata": {},
"source": [
"Plotting topics:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d40b041-7e75-48e0-89b2-7d60301cb7a9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tmp.plot_scatter_topics(topics_coords, size_col='size', label_col='label')"
]
},
{
"cell_type": "markdown",
"id": "7567f299-40b3-446b-9080-3b768b7f894d",
"metadata": {},
"source": [
"### Words (or terms)"
]
},
{
"cell_type": "markdown",
"id": "23466ca3-6714-44fd-8fe1-20d2d771a02e",
"metadata": {},
"source": [
"**tmplot** also uses terms *relevance* that was introduced by [Sievert and Shirley (2014)](https://www.aclweb.org/anthology/W14-3110.pdf) for sorting terms."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91eb40cc-23ab-4c3f-808f-a4a109222bdc",
"metadata": {},
"outputs": [],
"source": [
"terms_probs = tmp.calc_terms_probs_ratio(phi, topic=0, lambda_=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2cf53d3d-d3ad-4871-9af2-afb1b6d30c9b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tmp.plot_terms(terms_probs)"
]
},
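{
"cell_type": "markdown",
"id": "sketch-relevance-toy",
"metadata": {},
"source": [
"For reference, the relevance of a word `w` to a topic `t` in Sievert and Shirley (2014) is `lambda * log P(w|t) + (1 - lambda) * log(P(w|t) / P(w))`. The sketch below evaluates that formula on made-up probabilities. It illustrates the published definition only and is not a description of what `calc_terms_probs_ratio` computes internally; the names `relevance`, `rank_terms`, and `terms_toy` are hypothetical.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def relevance(p_w_given_t, p_w, lambda_):\n",
"    # lambda_ * log P(w|t) + (1 - lambda_) * log(P(w|t) / P(w))\n",
"    return lambda_ * math.log(p_w_given_t) + (1 - lambda_) * math.log(p_w_given_t / p_w)\n",
"\n",
"\n",
"# Made-up probabilities for three words in one topic: word -> (P(w|t), P(w)).\n",
"terms_toy = {\n",
"    'speed': (0.05, 0.010),\n",
"    'test': (0.08, 0.060),\n",
"    'home': (0.06, 0.055),\n",
"}\n",
"\n",
"\n",
"def rank_terms(terms, lambda_):\n",
"    # Sort words from most to least relevant for the given lambda_.\n",
"    return sorted(\n",
"        terms,\n",
"        key=lambda w: relevance(terms[w][0], terms[w][1], lambda_),\n",
"        reverse=True,\n",
"    )\n",
"\n",
"\n",
"print(rank_terms(terms_toy, lambda_=1.0))  # ranks by P(w|t) alone\n",
"print(rank_terms(terms_toy, lambda_=0.0))  # ranks by the ratio P(w|t) / P(w) alone\n",
"```\n",
"\n",
"With `lambda_=1` the formula reduces to `log P(w|t)`, so frequent words rank first. Lower values give more weight to words that are unusually frequent in the topic compared with the corpus as a whole."
]
},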
{
"cell_type": "markdown",
"id": "79c7c6c0-46e1-40e3-bdef-c3b8eae78a99",
"metadata": {},
"source": [
"### Documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e205a32c-efb1-4de7-a4b4-e647fa901033",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" topic0 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" speakeasy speedtest speakeasy speed test test ... | \n",
"
\n",
" \n",
" | 1 | \n",
" speedtest bandwidth speed test bandwidth speed... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" topic0\n",
"0 speakeasy speedtest speakeasy speed test test ...\n",
"1 speedtest bandwidth speed test bandwidth speed..."
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"top_docs_topic0 = tmp.get_top_docs(docs, model=model, docs_num=2, topics=[0])\n",
"top_docs_topic0"
]
},
{
"cell_type": "markdown",
"id": "4786d0a0-f09e-4885-b374-e23cf53e83bb",
"metadata": {},
"source": [
"The following output is used within the interactive interface that we will explore shortly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bbe87e1f-e8e4-4cad-a51b-f19e1ee7c73a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
" | \n",
" topic0 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" speakeasy speedtest speakeasy speed test test speed internet connection speakeasy speed test | \n",
"
\n",
" \n",
" | 1 | \n",
" speedtest bandwidth speed test bandwidth speed test bandwidth bandwidth speed internet service | \n",
"
\n",
" \n",
"
"
],
"text/plain": [
""
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tmp.plot_docs(top_docs_topic0)"
]
},
{
"cell_type": "markdown",
"id": "acc84831-c2c1-4c92-90b8-f87bd05e428e",
"metadata": {},
"source": [
"## Interactive report interface"
]
},
{
"cell_type": "markdown",
"id": "90e235e4-ab52-465b-a394-7feee3a24827",
"metadata": {},
"source": [
"To run the report interface, just call `tmplot.report()` function with your model and docs. You can tweak most of the hidden parameters using keyword arguments (see function docstring)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f443db82-1715-4d8e-9d0e-5a74a2e919c4",
"metadata": {},
"outputs": [],
"source": [
"tmp.report(model, docs=docs, height=400, width=250)"
]
},
{
"cell_type": "markdown",
"id": "751407f4-5f79-4331-b7a0-a141efd09d72",
"metadata": {},
"source": [
""
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}