Cornell notes : Natural Language Processing with Python - Chapter 1

Source : Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper

Question Answer
What is a concordance view ? A concordance view shows every occurrence of a given word, together with some context. The first call on a text takes a few extra seconds while an index is built.
A concordance permits us to see words in context.
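To make the idea concrete, here is a minimal plain-Python sketch of a concordance. It is not NLTK's implementation: the helper name `concordance_lines`, its `width` parameter, and the sample sentence are assumptions made for illustration.

```python
def concordance_lines(tokens, target, width=2):
    """Return each occurrence of `target` with up to `width` tokens on each side."""
    target = target.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - width):i]      # context before the word
            right = tokens[i + 1:i + 1 + width]     # context after the word
            lines.append(" ".join(left + [tok] + right))
    return lines


# Hypothetical sample text, tokenized on whitespace.
tokens = "the monstrous whale and the very monstrous sea".split()
hits = concordance_lines(tokens, "monstrous")
for line in hits:
    print(line)
```

Each printed line is one occurrence of the word surrounded by its neighbours, which is the "word in context" view described above.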
Get words appearing in a similar range of contexts
Examine just the contexts that are shared
text2.common_contexts(["monstrous", "very"])
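A rough sketch of what "shared contexts" means, in plain Python rather than NLTK: here a context is assumed to be the pair (previous token, next token). The helper name `contexts` and the sample sentence are illustrative only.

```python
def contexts(tokens, word):
    """Set of (previous token, next token) pairs around each occurrence of `word`."""
    found = set()
    for i in range(1, len(tokens) - 1):
        if tokens[i] == word:
            found.add((tokens[i - 1], tokens[i + 1]))
    return found


# Hypothetical sample text, tokenized on whitespace.
tokens = ("a very pretty day and a monstrous pretty wife , "
          "be very glad or be monstrous glad").split()

# Contexts in which both words appear.
shared = contexts(tokens, "monstrous") & contexts(tokens, "very")
print(sorted(shared))  # [('a', 'pretty'), ('be', 'glad')]
```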
Generate a dispersion plot : To determine the location of a word in the text.
Each stripe represents an instance of a word.
text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
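The data behind such a plot is just the list of positions at which each word occurs. A minimal text-only sketch, assuming a tiny made-up token list and a hypothetical helper `offsets` (NLTK draws a real chart instead):

```python
def offsets(tokens, word):
    """Positions (counted in tokens from the start) where `word` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == word]


# Hypothetical sample text, tokenized on whitespace.
tokens = "citizens love freedom and freedom needs citizens".split()

for word in ["citizens", "freedom"]:
    positions = offsets(tokens, word)
    # One row per word; '|' marks an occurrence, '.' marks any other token.
    row = "".join("|" if i in positions else "." for i in range(len(tokens)))
    print(f"{word:>10} {row}")
```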
What is a token ? A sequence of characters (such as a word or punctuation symbol) that we treat as a group.
Get the number of tokens
What is a set ? A collection of distinct items, with duplicates collapsed. The vocabulary of a text is the set of its tokens.
Sort an array in Python
What is a word type ? A word considered as a unique item of vocabulary.
Count occurrences in Python
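The four operations above (number of tokens, vocabulary as a set, sorting, counting) can be sketched with built-in Python only. The token list is a made-up example, not one of the book's texts.

```python
# Hypothetical tokenized sentence; punctuation counts as a token too.
tokens = ["the", "cat", "sat", "on", "the", "mat", "."]

n_tokens = len(tokens)             # number of tokens, duplicates included
vocabulary = set(tokens)           # word types: each distinct token once
sorted_vocab = sorted(vocabulary)  # sorted() returns a new, ordered list
n_the = tokens.count("the")        # occurrences of one particular token

print(n_tokens, len(vocabulary), n_the)  # 7 6 2
print(sorted_vocab)
```

Note that punctuation sorts before letters, so "." comes first in the sorted vocabulary.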
Create a list in Python
['', '', etc.]
Concatenate lists in Python
[ ] + [ ]
Append to list in Python
Access a list index in Python
Access a list value in Python
Slicing in Python
text[index1:index2] ; text[index1:] ; text[:index2]
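A short sketch that walks through the list operations above in order: create, concatenate, append, find an index, read a value, slice. The variable names and sample words are chosen for illustration.

```python
sent = ["Call", "me", "Ishmael", "."]   # create a list
extra = ["Some", "years", "ago"]

combined = sent + extra                 # concatenation builds a new list
combined.append("never")                # append modifies the list in place

position = combined.index("Ishmael")    # index of the first occurrence: 2
value = combined[2]                     # value stored at an index: 'Ishmael'

first_two = combined[:2]                # from the start up to (not including) 2
middle = combined[1:3]                  # from index 1 up to (not including) 3
rest = combined[4:]                     # from index 4 to the end

print(first_two, middle, rest)
```

Indexes start at 0, and the end index of a slice is excluded.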
string vs list ? A string is a sequence of characters; a list is a sequence of arbitrary items, such as words. Both can be indexed and sliced.
Convert a list to a string in Python
Convert a string to a list in Python
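Both conversions can be done with the standard string methods `join` and `split`. A small sketch with made-up values:

```python
words = ["Monty", "Python"]

sentence = " ".join(words)   # list -> string, items separated by a space
back = sentence.split()      # string -> list, split on whitespace

chars = list("NLP")          # a string can also be turned into a list of characters
first_char = sentence[0]     # strings support indexing, like lists
prefix = sentence[:5]        # ... and slicing

print(sentence, back, chars, first_char, prefix)
```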
Get a text frequency distribution
Get the most frequent tokens
fdist.most_common(n), where n is the number of tokens to get
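A frequency distribution maps each token to its count. As a stand-in that needs no NLTK install, this sketch uses `collections.Counter` from the standard library, which offers the same `most_common` method (in NLTK 3, `FreqDist` is understood to build on `Counter`). The sample text is made up.

```python
from collections import Counter

# Hypothetical sample text, tokenized on whitespace.
tokens = "the whale and the sea and the ship".split()

fdist = Counter(tokens)          # token -> number of occurrences
top_two = fdist.most_common(2)   # the two most frequent tokens with their counts

print(top_two)  # [('the', 3), ('and', 2)]
```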
Generate a cumulative frequency plot
fdist1.plot(50, cumulative=True)
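What a cumulative plot displays can be computed by hand: rank the tokens by frequency, then keep a running total of their counts. A sketch using only the standard library, with `Counter` standing in for the frequency distribution and a made-up sample text:

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample text, tokenized on whitespace.
tokens = "the whale and the sea and the ship".split()
fdist = Counter(tokens)

ranked = fdist.most_common()  # all tokens, most frequent first
cumulative = list(accumulate(count for _, count in ranked))  # running totals

for (word, _), total in zip(ranked, cumulative):
    print(f"{word:>6} {total}")
```

The last running total equals the number of tokens in the text, so the curve shows how much of the text the most frequent words account for.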
What is a hapax ? A word that occurs only once in the text.
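Words that occur only once can be pulled out of a frequency distribution by filtering on a count of 1. A sketch with `collections.Counter` as a stand-in and a made-up sample text:

```python
from collections import Counter

# Hypothetical sample text, tokenized on whitespace.
tokens = "the whale and the sea and the ship".split()
fdist = Counter(tokens)

# Keep only the tokens whose count is exactly 1.
once_only = [word for word, count in fdist.items() if count == 1]

print(once_only)  # ['whale', 'sea', 'ship']
```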
Fine-grained word selection
[w for w in set(text5) if len(w) > 7 and fdist5[w] > 7]
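The same selection pattern (long words that are also frequent) on a tiny made-up token list, with `collections.Counter` as a stand-in for the frequency distribution. The thresholds are lowered so the small example returns something.

```python
from collections import Counter

# Hypothetical token list.
tokens = ["extraordinary", "whale", "extraordinary", "sea",
          "magnificent", "extraordinary"]
fdist = Counter(tokens)

# Word types longer than 7 characters that occur more than twice.
selected = sorted(w for w in set(tokens) if len(w) > 7 and fdist[w] > 2)

print(selected)  # ['extraordinary']
```

"magnificent" is long enough but occurs only once, so the frequency condition filters it out.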
What is a collocation ? A sequence of words that occurs together unusually often. Collocations are resistant to substitution with words that have similar senses.
What is a bigram ? A word pair.
list(bigrams(['more', 'is', 'said', 'than', 'done']))
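Word pairs can also be built without NLTK by zipping a list with itself shifted by one position. A minimal sketch; the helper name `word_pairs` is an assumption, chosen to avoid confusion with NLTK's own `bigrams`.

```python
def word_pairs(tokens):
    """Pair each token with the token that follows it."""
    return list(zip(tokens, tokens[1:]))


pairs = word_pairs(["more", "is", "said", "than", "done"])
print(pairs)
```

Five tokens give four pairs, since the last token has no successor.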
Get the max value
Get a given frequency in a frequency distribution
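Both lookups can be sketched with `collections.Counter` as a stand-in for the frequency distribution: the most frequent token, then the count and relative frequency of one given token. The sample text is made up.

```python
from collections import Counter

# Hypothetical sample text, tokenized on whitespace.
tokens = "the whale and the sea and the ship".split()
fdist = Counter(tokens)

most_frequent = max(fdist, key=fdist.get)   # token with the highest count
count_the = fdist["the"]                    # raw count of a given token
share_the = count_the / sum(fdist.values()) # relative frequency of that token

print(most_frequent, count_the, share_the)  # the 3 0.375
```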
if statement in Python
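            if len(word) < 5:
                print('word length is less than 5')
            # a chain of tests: only the first branch that matches runs
            if token.islower():
                print(token, 'is a lowercase word')
            elif token.istitle():
                print(token, 'is a titlecase word')
            else:
                print(token, 'is punctuation')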
for statement in Python
            for word in ['Call', 'me', 'Ishmael', '.']:
                print(word)
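The two control structures are often combined: loop over the tokens and test each one. A small sketch; the helper name `describe` and its labels are illustrative.

```python
def describe(token):
    """Classify a token with an if / elif / else chain."""
    if token.islower():
        return "lowercase"
    elif token.istitle():
        return "titlecase"
    else:
        return "other"   # punctuation, digits, all-caps words, ...


labels = []
for token in ["Call", "me", "Ishmael", "."]:
    labels.append(describe(token))
    print(token, "is", labels[-1])
```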
What is word sense disambiguation ? Working out which sense of a word was intended in a given context.
What is pronoun resolution ? Finding the antecedent of a pronoun, which helps detect the subjects and objects of verbs (who did what to whom).
What is anaphora resolution ? Identifying what a pronoun or noun phrase refers to.
What is semantic role labeling ? Identifying how a noun phrase relates to the verb (as agent, patient, instrument, and so on).
What is text alignment ? Automatically pairing up the sentences of a document available in two languages. Once we have a million or more sentence pairs, we can detect corresponding words and phrases, and build a model that can be used for translating new text.
What is a Spoken Dialogue System ? A pipeline of language understanding components.

What does RTE mean ? Recognizing Textual Entailment

