Source: https://drive.google.com/file/d/10O8pMwSy9209mP4a9pCctCYIVXa8oGpL/view?usp=sharing

Text Analysis

The goal: determine structure, insights, and relationships within and between textual data such as articles, tweets, books, music, web page content, and code.

Approaches to Text Analysis

Sentiment Analysis

TF-IDF

Visualizations - Word Clouds & Networks

Much more!

Sentiment Analysis

Programmatically infer the emotional content of text

Sentiment Lexicon

Dataset containing words classified by their sentiment
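
A minimal sketch of lexicon-based scoring, assuming a tiny made-up lexicon (`TOY_LEXICON` and `sentiment_score` are hypothetical names, not from the lecture). Real lexicons are far larger and may use categories or graded scores instead of +1/-1.

```python
# Hypothetical toy lexicon: word -> sentiment score (+1 positive, -1 negative).
TOY_LEXICON = {
    "good": 1, "great": 1, "happy": 1,
    "bad": -1, "sad": -1, "terrible": -1,
}

def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the words in text; unknown words count as 0."""
    words = text.lower().split()
    # Strip surrounding punctuation so "great," matches the entry "great".
    return sum(lexicon.get(word.strip(".,!?;:\"'"), 0) for word in words)

print(sentiment_score("What a great, happy day!"))          # positive total
print(sentiment_score("The ending was terrible and sad."))  # negative total
```

A total above zero suggests positive text and a total below zero suggests negative text. This simple sum ignores negation ("not good") and context.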

When doing sentiment analysis…

Token

A meaningful unit of text

what you use for analysis

tokenization takes a corpus of text and splits it into tokens (words, bigrams, etc.)
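
One possible way to tokenize, sketched with a simple regular expression (the helper names `word_tokens` and `ngrams` are assumptions for illustration; dedicated text libraries handle edge cases such as hyphens and Unicode more carefully).

```python
import re

def word_tokens(text):
    """Lowercase the text and split it into word tokens."""
    # Keep runs of ASCII letters, digits, and apostrophes; drop everything else.
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n=2):
    """Return consecutive n-token tuples (bigrams when n=2)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = word_tokens("The cat sat on the mat.")
print(tokens)          # single-word tokens
print(ngrams(tokens))  # bigram tokens
```

Which token size to use depends on the question: single words suit counting and lexicon lookups, while bigrams keep a little of the surrounding context.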

Stop words

Words not helpful for analysis

extremely common words such as “the”, “of”, “to”

are typically removed from analysis
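
A short sketch of stop-word removal, assuming a small hand-written stop list (`STOP_WORDS` here is hypothetical; published stop lists contain hundreds of words).

```python
# Hypothetical mini stop list; real lists are much longer.
STOP_WORDS = {"the", "of", "to", "a", "and", "in"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop list (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stop_words]

print(remove_stop_words(["the", "history", "of", "the", "internet"]))
```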

TF-IDF

Term Frequency - Inverse Document Frequency (TF-IDF)

TF-IDF measures the originality of a word in a document, i.e. how distinctive the word is to that document. It compares the number of times the word appears in the document with the number of documents in the collection that contain the word.

Term $t$ within document $d$
$tf_{t,d}$: frequency of $t$ in $d$
$df_t$: number of documents containing $t$
$N$: total number of documents
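
A minimal sketch of the calculation, assuming the common textbook weighting (term frequency in the document) × log(total documents / documents containing the term). The exact variant used in the lecture is not shown in these notes, so treat the formula and the function name `tf_idf` as assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term (counted once per document).
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)  # raw count of each term in this document
        weights.append({
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
scores = tf_idf(docs)
print(scores[0])
```

In this toy corpus "the" appears in every document, so its weight is log(3/3) = 0, while "sat" appears in only one document and gets the highest weight in the first document.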

Word clouds display words at sizes proportional to their frequency within the textual dataset
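
A rough sketch of the sizing step behind a word cloud, assuming font size scales directly with word count (`font_sizes` and `max_size` are hypothetical names; actual word cloud tools also handle layout, colors, and their own scaling rules).

```python
from collections import Counter

def font_sizes(tokens, max_size=40):
    """Map each word to a font size proportional to its count; the top word gets max_size."""
    counts = Counter(tokens)
    if not counts:
        return {}
    top = max(counts.values())
    return {word: max_size * count / top for word, count in counts.items()}

sizes = font_sizes(["data", "data", "text", "data", "text", "code"], max_size=30)
print(sizes)
```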

What if you don’t want to remove words from their context?

Carter's Digital Garden

COGS 9 Lecture 24