Information Extraction
Use LDA to extract the main themes. There is no need to calculate cosine similarity, since LDA gives the topics directly. The output is also more tangible, and it is less work. Not sure how well it will fare, though. It should work on test data even at the sentence level, because there is no dilution of embeddings from averaging over the whole document.
Train it on the whole dataset and run it on individual sentences. I guess I could even drop words from the word-topic matrix to improve it, and then tweak the matrix further with a few more tricks.
This method could be useful for text summarization as well. Forget the 4800 dimensions. It can also be used for dimensionality reduction, and the output can be fed into another network.
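A minimal sketch of the dimensionality-reduction use, again assuming scikit-learn. The texts, labels, and topic count are invented for illustration, and MLPClassifier is only a small stand-in for whatever downstream network would actually be used.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

# Toy labelled data (made up): 0 = sports, 1 = finance.
texts = [
    "The striker scored a goal and the team won the football match.",
    "The coach praised the players before the football season.",
    "The bank raised interest rates and investors sold their shares.",
    "The stock market fell and the bank reported lower profits.",
]
labels = [0, 0, 1, 1]

# Bag-of-words counts: one dimension per vocabulary word.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(texts)

# LDA compresses each document to a short topic vector.
# n_components=3 is an arbitrary choice for this sketch.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
topic_features = lda.fit_transform(counts)
print(counts.shape, "->", topic_features.shape)

# Feed the low-dimensional topic vectors into another model.
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(topic_features, labels)

# New text goes through the same vectorizer and LDA before prediction.
new_features = lda.transform(vectorizer.transform(["The bank cut rates."]))
prediction = clf.predict(new_features)
print(prediction)
```

The downstream model only ever sees n_components numbers per text, no matter how large the vocabulary or the original embedding is.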