Source : Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper
|What is a concordance view ?||
A concordance view shows every occurrence of a given word, together with some context. The first lookup takes a few extra seconds while an index is built.
A concordance permits us to see words in context.
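In the book this view comes from a call such as `text1.concordance("monstrous")`. To make the idea concrete without any corpus download, here is a minimal pure-Python sketch; the function name `concordance`, the `width` parameter and the bracketed output format are assumptions made for illustration, not NLTK's implementation.

```python
def concordance(tokens, target, width=3):
    """Return each occurrence of `target` with up to `width` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Toy token list standing in for a real text
tokens = "the cat sat on the mat and the dog sat on the rug".split()
for line in concordance(tokens, "sat"):
    print(line)
# the cat [sat] on the mat
# and the dog [sat] on the rug
```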
|Get words appearing in a similar range of contexts||
|Examine just the contexts that are shared||
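The book answers these two cards with `text1.similar("monstrous")` and `text2.common_contexts(["monstrous", "very"])`. The sketch below shows the underlying idea in plain Python, taking a context to be the pair (previous word, next word). The helper names are hypothetical, and NLTK additionally ranks its results, which this simplified version does not attempt.

```python
from collections import defaultdict

def contexts(tokens):
    """Map each word to the set of (previous word, next word) pairs around it."""
    ctx = defaultdict(set)
    for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
        ctx[word].add((prev, nxt))
    return dict(ctx)  # plain dict, so later lookups cannot add keys

def common_contexts(tokens, w1, w2):
    """Contexts in which both w1 and w2 occur."""
    ctx = contexts(tokens)
    return ctx.get(w1, set()) & ctx.get(w2, set())

def similar(tokens, target):
    """Words that share at least one context with `target` (unranked)."""
    ctx = contexts(tokens)
    target_ctx = ctx.get(target, set())
    return sorted(w for w in ctx if w != target and ctx[w] & target_ctx)

tokens = "a very large whale and a monstrous large shark".split()
print(similar(tokens, "monstrous"))                  # ['very']
print(common_contexts(tokens, "very", "monstrous"))  # {('a', 'large')}
```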
|Generate a dispersion plot||
A dispersion plot shows the location of a word in the text, measured in words from the beginning.
Each stripe represents an instance of a word, and each row represents the entire text.
text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
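The plot itself needs a plotting backend, but the data behind it is simply the offsets at which each word occurs. A small sketch, assuming a toy token list and a hypothetical helper named `offsets`:

```python
def offsets(tokens, target):
    """Positions (counted in tokens from the start) at which `target` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == target]

tokens = "we the people of america ask the people to vote".split()
print(offsets(tokens, "people"))   # [2, 7]
print(offsets(tokens, "america"))  # [4]
```

Each offset corresponds to one stripe on the row for that word.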
|What is a token ?||A sequence of characters (word or punctuation symbol).|
|Get the number of tokens||
|What is a set ?||
A collection of unique items: duplicates are collapsed. The vocabulary of a text is the set of its tokens.
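A short illustration of the token count and the vocabulary, using a toy token list in place of one of the book's texts (where the equivalent calls would be `len(text3)` and `set(text3)`):

```python
tokens = ["the", "cat", "and", "the", "hat", "."]

num_tokens = len(tokens)   # every occurrence counts, punctuation included
vocabulary = set(tokens)   # duplicates such as the second "the" collapse

print(num_tokens)          # 6
print(len(vocabulary))     # 5
```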
|Sort a list in Python||
|What is a word type ?||A word considered as a unique item of vocabulary.|
|Count occurrences in Python||
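Sorting and counting can both be shown on a toy token list; with the book's texts the corresponding calls would be `sorted(set(text3))` and `text3.count("smote")`.

```python
tokens = ["the", "cat", "and", "the", "hat", "."]

word_types = sorted(set(tokens))   # sorted() returns a new, ordered list
the_count = tokens.count("the")    # number of occurrences of one token

print(word_types)   # ['.', 'and', 'cat', 'hat', 'the']
print(the_count)    # 2
```

Punctuation sorts before letters because sorting follows character code order.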
|Create a list in Python||
['', '', ...]
|Concatenate lists in Python||
[ ] + [ ]
|Append to list in Python||
|Access a list index in Python||
|Access a list value in Python||
|Slicing in Python||
text[index1:index2] ; text[index1:] ; text[:index2]
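The list operations on the cards above can be tried together on one small list. The variable names here are illustrative only.

```python
sent = ["Call", "me", "Ishmael", "."]       # create a list
longer = sent + ["Some", "years", "ago"]    # concatenation builds a new list
sent.append("!")                            # append modifies the list in place

position = longer.index("Ishmael")          # index of the first occurrence
value = longer[2]                           # value stored at an index

print(position, value)   # 2 Ishmael
print(longer[1:3])       # ['me', 'Ishmael']
print(longer[4:])        # ['Some', 'years', 'ago']
print(longer[:2])        # ['Call', 'me']
```

A slice `[index1:index2]` includes `index1` and stops just before `index2`.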
|String vs list ?||A string is a sequence of characters, whereas a list is a sequence of items such as words. Both support indexing and slicing.|
|Convert a list to a string in Python||
|Convert a string to a list in Python||
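Both conversions are standard string methods: `join` glues list items together with a separator, and `split` breaks a string on whitespace.

```python
words = ["Monty", "Python"]

joined = " ".join(words)      # list -> string
split_back = joined.split()   # string -> list

print(joined)       # Monty Python
print(split_back)   # ['Monty', 'Python']
```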
|Get a text frequency distribution||
|Get the most frequent tokens||
fdist.most_common(number of tokens to get)
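In the book the distribution is built with `fdist1 = FreqDist(text1)`. As a stand-in that runs without NLTK, the sketch below uses `collections.Counter`, which NLTK 3's `FreqDist` is built on and which provides the same `most_common` method. The toy sentence is an assumption for illustration.

```python
from collections import Counter

tokens = "the cat and the dog and the bird".split()
fdist = Counter(tokens)          # maps each token to its count

top_two = fdist.most_common(2)   # the two most frequent tokens with counts
print(top_two)                   # [('the', 3), ('and', 2)]
```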
|Generate a cumulative frequency plot||
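The book draws this with `fdist1.plot(50, cumulative=True)`. The numbers behind such a plot are running totals of the counts, taken from the most frequent token downwards. A sketch using `collections.Counter` as a stand-in for `FreqDist`, on a toy sentence:

```python
from collections import Counter
from itertools import accumulate

tokens = "the cat and the dog and the bird".split()
fdist = Counter(tokens)

ranked = fdist.most_common()   # tokens ordered from most to least frequent
cumulative = list(accumulate(count for _, count in ranked))

print(cumulative)   # [3, 5, 6, 7, 8]
```

The last value always equals the total number of tokens, here 8.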
|What is a hapax ?||A word that occurs only once in a text.|
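With NLTK the list comes from `fdist1.hapaxes()`. The same selection can be sketched with a plain `Counter` by keeping the tokens whose count is exactly 1 (toy data, illustrative names):

```python
from collections import Counter

tokens = "the cat and the dog and the bird".split()
fdist = Counter(tokens)

hapaxes = [word for word, count in fdist.items() if count == 1]
print(sorted(hapaxes))   # ['bird', 'cat', 'dog']
```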
|Fine-grained word selection||
[w for w in set(text5) if len(w) > 7 and fdist5[w] > 7]
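The condition combines a length test with a frequency test. Here is a runnable variant on toy data, with `Counter` standing in for the frequency distribution and the frequency threshold lowered to 2 so the tiny sample produces a result; both choices are assumptions for illustration.

```python
from collections import Counter

# Toy data: one long frequent word, one long rare word, two short words
tokens = ["extraordinary"] * 3 + ["circumstances"] + ["whale"] * 5 + ["the"] * 9
fdist = Counter(tokens)

selected = sorted(w for w in set(tokens) if len(w) > 7 and fdist[w] > 2)
print(selected)   # ['extraordinary']
```

"circumstances" is long but too rare, while "whale" and "the" are frequent but too short.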
|What is a collocation ?||
A sequence of words that occurs together unusually often. Collocations are resistant to substitution with words that have similar senses.
|What is a bigram ?||
A word pair.
list(bigrams(['more', 'is', 'said', 'than', 'done']))
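The call above relies on NLTK's `bigrams` function. For a list of words, the same pairs can be produced in plain Python by zipping the list with itself shifted by one position, which makes the definition easy to check by hand:

```python
words = ["more", "is", "said", "than", "done"]

pairs = list(zip(words, words[1:]))   # each word paired with its successor
print(pairs)
# [('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]
```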
|Get the max value||
|Get a given frequency in a frequency distribution||
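With an NLTK `FreqDist` these two cards correspond to `fdist.max()` and `fdist.freq('monstrous')`. The sketch below computes the same two quantities from a plain `Counter` on toy data, which shows what they mean: the sample with the highest count, and a count divided by the total number of tokens.

```python
from collections import Counter

tokens = "the cat and the dog and the bird".split()
fdist = Counter(tokens)

most_frequent = max(fdist, key=fdist.get)            # sample with the highest count
relative_freq = fdist["the"] / sum(fdist.values())   # proportion of all tokens

print(most_frequent)   # the
print(relative_freq)   # 0.375
```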
|if statement in Python||
if len(word) < 5:
    print('word length is less than 5')
elif word.istitle():
    print(word, 'is a titlecase word')
else:
    print(word, 'is punctuation')
|for statement in Python||
for word in ['Call', 'me', 'Ishmael', '.']:
    print(word)
|What is word sense disambiguation ?||Working out which sense of a word was intended in a given context.|
|What is pronoun resolution ?||Detecting the subjects and objects of verbs by finding the antecedents of pronouns.|
|What is anaphora resolution ?||Identifying what a pronoun or noun phrase refers to.|
|What is semantic role labeling ?||Identifying how a noun phrase relates to the verb.|
|What is text alignment ?||Automatically pairing up the sentences of a document available in two languages. Once we have a million or more sentence pairs, we can detect corresponding words and phrases, and build a model that can be used for translating new text.|
|What is a Spoken Dialogue System ?||
A pipeline of language understanding components.
|What does RTE mean ?||Recognizing Textual Entailment|