White Papers

Unstructured Text Data Associations of Online Content in Public Domain: An insight and implementation

~ Promod George

A huge amount of free-text data flows through the internet, contributed by millions of users in the form of blog posts, reviews, articles, emails, and so on. While browsing through this enormous volume of data, two questions often arise: “How do we relate articles, entities, or content to one another?” and “Can a set of articles, entities, or content be grouped together for further analysis?”

No single Artificial Intelligence (AI) solution fits every situation; each AI approach tends to be closely coupled to the problem it solves. Even so, every data point captured becomes useful somewhere in the AI solution.

Given a large and varied collection of articles, entities, or content (Big Data, say) and a wide range of user interests, an AI/ML-based system needs to associate the right content with the right user, based on relevance and user interest.
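To make the idea of associating content with a user concrete, here is a minimal sketch, not the white paper's actual method. It assumes that each article and each user's interests can be treated as plain text, and it swaps in a simple TF-IDF weighting with cosine similarity as the relevance measure. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_articles`) and the sample data are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_articles(articles, user_interests, top_k=2):
    """Rank articles (dict of id -> text) against a user's interest text.

    Assumption: the interest profile is a short free-text description.
    Returns up to top_k (article_id, score) pairs, most relevant first.
    """
    ids = list(articles)
    vectors = tfidf_vectors([articles[i] for i in ids] + [user_interests])
    profile = vectors[-1]
    scored = [(i, cosine(vec, profile)) for i, vec in zip(ids, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical sample data, for illustration only.
articles = {
    "a1": "Spark streaming pipelines for big data ingestion",
    "a2": "Best pasta recipes for a quick dinner",
    "a3": "Indexing text documents with a search engine",
}
ranking = rank_articles(articles, "big data streaming with spark", top_k=3)
```

In this sketch the article that shares the most distinctive terms with the interest text ranks first. A production system would more likely rely on the scoring built into the text index, or on a learned model, than on hand-rolled vectors.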

This white paper provides a solution that involves periodic streaming of the raw articles, entities, or content. This is done using either simple Python-based cron jobs or more robust Spark Streaming into high-end text-indexing NoSQL stores such as Solr (or MongoDB).
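The shape of one scheduled ingestion run can be sketched as follows. This is an illustration under stated assumptions, not the white paper's implementation: a toy in-memory inverted index stands in for Solr or MongoDB, and `InvertedIndex`, `ingest_batch`, and the `fetch_new_items` callable are hypothetical names. In practice a cron job or a Spark job would invoke the ingestion step on each cycle and write to the real store.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class InvertedIndex:
    """Toy in-memory text index; a stand-in for a real store such as Solr."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.documents = {}               # document id -> raw text

    def add(self, doc_id, text):
        """Index a document, replacing any earlier version with the same id."""
        if doc_id in self.documents:
            # Drop postings of the old version so stale terms stop matching.
            for term in set(tokenize(self.documents[doc_id])):
                self.postings[term].discard(doc_id)
        self.documents[doc_id] = text
        for term in set(tokenize(text)):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


def ingest_batch(index, fetch_new_items):
    """One scheduled run: pull newly arrived items and index them.

    fetch_new_items is a hypothetical callable that returns an iterable of
    (doc_id, text) pairs, for example rows read from a feed or a crawl.
    Returns the number of items indexed in this run.
    """
    count = 0
    for doc_id, text in fetch_new_items():
        index.add(doc_id, text)
        count += 1
    return count
```

Because `add` replaces earlier versions by id, running the same batch twice leaves the index unchanged, which is a useful property when a scheduled job is retried.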


