Cornell notes : Natural Language Processing with Python - Chapter 1

Source : Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper

Question Answer
What is a concordance view ? A concordance view shows us every occurrence of a given word, together with some context. The first call on a text takes a few extra seconds because an index has to be built.
A concordance permits us to see words in context.
text1.concordance("monstrous")
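To make the idea concrete without downloading the book's corpora, here is a minimal plain-Python sketch of a concordance. It is not NLTK's implementation: the `concordance` function, its `width` parameter, and the sample sentence are all hypothetical names chosen for illustration.

```python
def concordance(tokens, target, width=3):
    """Return each occurrence of `target` with up to `width` tokens of context per side."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target.lower():
            left = tokens[max(0, i - width):i]      # context before the match
            right = tokens[i + 1:i + 1 + width]     # context after the match
            lines.append(' '.join(left + [token] + right))
    return lines

# Hypothetical sample sentence, not taken from the book's corpora.
sample = "the monstrous whale swam past a most monstrous ship".split()
for line in concordance(sample, "monstrous"):
    print(line)
```

Each printed line is one occurrence of the word surrounded by its neighbours, which is the essence of seeing a word in context.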
Get words appearing in a similar range of contexts
text1.similar("monstrous")
Examine just the contexts that are shared
text2.common_contexts(["monstrous", "very"])
Generate a dispersion plot To show where a word occurs in the text, measured as an offset from the beginning.
Each stripe represents an instance of a word, and each row represents the entire text.
text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
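The positions that a dispersion plot draws as stripes can be computed by hand. The sketch below only collects the offsets and does not plot anything; `word_offsets` and the sample tokens are hypothetical names used for illustration.

```python
def word_offsets(tokens, word):
    """Return the position (offset from the start) of every occurrence of `word`."""
    return [i for i, token in enumerate(tokens) if token == word]

# Hypothetical sample, standing in for a much longer text.
sample = "we the people of the united states".split()
offsets = word_offsets(sample, "the")
print(offsets)
```

Each offset would correspond to one stripe on the row for that word.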
What is a token ? A sequence of characters (word or punctuation symbol).
Get the number of tokens
len(text3)
What is a set ? A collection of unique items. The vocabulary of a text is the set of its tokens, with duplicates collapsed.
set(text3)
Sort a list in Python
sorted(list)
What is a word type ? A word considered as a unique item of vocabulary.
Count occurrences in Python
text4.count('a')
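The difference between tokens and word types is easiest to see on a tiny example. This sketch uses a hand-written token list rather than one of the book's texts; the variable names, including `lexical_diversity`, are chosen here for illustration.

```python
# Hypothetical token list standing in for a text such as text3.
tokens = ['to', 'be', 'or', 'not', 'to', 'be', '.']

num_tokens = len(tokens)            # every occurrence counts
vocabulary = sorted(set(tokens))    # unique items only, in sorted order
num_types = len(vocabulary)         # number of distinct word types
be_count = tokens.count('be')       # occurrences of one particular token

# Ratio of distinct types to total tokens.
lexical_diversity = num_types / num_tokens
print(num_tokens, num_types, be_count, lexical_diversity)
```

Punctuation sorts before letters, so `'.'` comes first in the vocabulary.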
Create a list in Python
['', '', ...]
Concatenate lists in Python
[ ] + [ ]
Append to list in Python
list.append('')
Get the index of the first occurrence of a value in a list in Python
text.index(value)
Get the value at a given index of a list in Python
text[index]
Slicing in Python
text[index1:index2] ; text[index1:] ; text[:index2]
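The list operations above can be tried together on a short list. This is a small sketch with made-up variable names; the sample words are only there to have something to index and slice.

```python
sent = ['Call', 'me', 'Ishmael', '.']

extended = sent + ['Some', 'years', 'ago']   # concatenation builds a new list
extended.append('never')                     # append modifies the list in place

position = extended.index('Ishmael')         # index of the first occurrence
value = extended[position]                   # value stored at that index

first_two = extended[:2]                     # from the start up to, not including, index 2
middle = extended[1:3]                       # indexes 1 and 2
rest = extended[5:]                          # from index 5 to the end
print(position, value, first_two, middle, rest)
```

Indexes start at 0, and the end index of a slice is excluded.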
string vs list ? A string is a sequence of characters. Like a list, it can be indexed, sliced, and concatenated.
Convert a list to a string in Python
''.join(list)
Convert a string to a list in Python
string.split()
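A short round trip shows how `join` and `split` relate. The separator is the string that `join` is called on, so joining with a space and joining with an empty string give different results. The variable names are chosen for illustration.

```python
words = ['Monty', 'Python']

joined = ' '.join(words)      # the separator is a single space
glued = ''.join(words)        # an empty separator glues the words together
split_back = joined.split()   # splitting on whitespace recovers the list

first_char = joined[0]        # strings can be indexed like lists
print(joined, glued, split_back, first_char)
```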
Get a text frequency distribution
FreqDist(text)
Get the most frequent tokens
fdist.most_common(n)  # n = number of tokens to get
Generate a cumulative frequency plot
fdist1.plot(50, cumulative=True)
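A frequency distribution can be approximated with the standard library when NLTK and its corpora are not at hand. The sketch below uses `collections.Counter` as a stand-in for `FreqDist`, and computes the running totals that a cumulative plot would display instead of drawing them. The sample sentence and variable names are hypothetical.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample text.
tokens = "the cat sat on the mat and the dog sat too".split()

fdist = Counter(tokens)                 # maps each token to its count
top_two = fdist.most_common(2)          # the two most frequent tokens with counts

# Counts in descending order, then their running total (the cumulative curve).
counts = [count for _, count in fdist.most_common()]
cumulative = list(accumulate(counts))
print(top_two, cumulative)
```

The last cumulative value equals the total number of tokens, which is why a cumulative plot shows what share of the text the most frequent words account for.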
What is a hapax ? A word that occurs only once in a text.
Fine-grained word selection
sorted(w for w in set(text5) if len(w) > 7 and fdist5[w] > 7)
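Both ideas, words that occur once and words selected by length and frequency, can be sketched with a `Counter` on a toy sentence. NLTK offers `FreqDist.hapaxes()` for the first; the version below is a plain-Python stand-in, with thresholds lowered to suit the tiny hypothetical sample.

```python
from collections import Counter

# Hypothetical sample text.
tokens = "the elephant saw the elephant but the giraffe saw nothing".split()
fdist = Counter(tokens)

# Words that occur exactly once.
hapaxes = sorted(w for w, count in fdist.items() if count == 1)

# Words that are both long (more than 5 characters) and repeated (more than once).
long_frequent = sorted(w for w in set(tokens) if len(w) > 5 and fdist[w] > 1)
print(hapaxes, long_frequent)
```

Combining a length condition with a frequency condition filters out both short function words and rare long words.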
What is a collocation ? A sequence of words that occurs together unusually often. Collocations are resistant to substitution with words that have similar senses ("red wine" is a collocation, "maroon wine" sounds odd).
text4.collocations()
What is a bigram ? A word pair.
list(bigrams(['more', 'is', 'said', 'than', 'done']))
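Word pairs can also be built without NLTK by zipping a list with itself shifted by one position. This is a sketch of the idea, not NLTK's `bigrams` function; `word_pairs` is a hypothetical name.

```python
def word_pairs(tokens):
    """Pair each token with the one that follows it."""
    return list(zip(tokens, tokens[1:]))

pairs = word_pairs(['more', 'is', 'said', 'than', 'done'])
print(pairs)
```

A list of n tokens yields n - 1 pairs, since the last token has no successor.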
Get the most frequent sample
fdist.max()
Get the relative frequency of a given sample
fdist.freq(3)
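The two calls above make most sense on a distribution of word lengths, where the samples are numbers. Here is a plain-Python sketch of the same two questions using `collections.Counter` as a stand-in for `FreqDist`; the sample tokens and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical sample tokens.
tokens = "Call me Ishmael . Some years ago never mind how long".split()

# Distribution of word lengths: each sample is a length, each count a number of words.
fdist = Counter(len(w) for w in tokens)
total = sum(fdist.values())

most_frequent_length = max(fdist, key=fdist.get)   # sample with the highest count
freq_of_3 = fdist[3] / total                        # share of words that are 3 characters long
print(most_frequent_length, freq_of_3)
```

The first result is a sample (a word length), not a count, and the second is a proportion between 0 and 1.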
if statement in Python
            if len(word) < 5:
                print('word length is less than 5')
            elif word.istitle():
                print(word, 'is a titlecase word')
            else:
                print(word, 'is not titlecase and has 5 or more characters')
            
for statement in Python
            for word in ['Call', 'me', 'Ishmael', '.']:
                print(word)
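Loops and conditions are usually combined: the loop visits each token and the condition decides what to do with it. The sketch below classifies tokens by case. The `describe` helper is a hypothetical name, and treating everything that is neither lowercase nor titlecase as punctuation is a simplifying assumption that would mislabel tokens such as numbers or all-caps words.

```python
def describe(token):
    """Classify a token by its case (simplified: anything else counts as punctuation)."""
    if token.islower():
        return 'lowercase word'
    elif token.istitle():
        return 'titlecase word'
    else:
        return 'punctuation'

for token in ['Call', 'me', 'Ishmael', '.']:
    print(token, ':', describe(token))
```

Only the first branch whose condition is true runs, so the order of the tests matters.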
            
What is word sense disambiguation ? Working out which sense of a word was intended in a given context.
What is pronoun resolution ? Working out who did what to whom, i.e. detecting the subjects and objects of verbs by finding the antecedents of pronouns.
What is anaphora resolution ? Identifying what a pronoun or noun phrase refers to.
What is semantic role labeling ? Identifying how a noun phrase relates to the verb.
What is text alignment ? Automatically pairing up the sentences of a document and its translation. Once we have a million or more sentence pairs, we can detect corresponding words and phrases, and build a model that can be used for translating new text.
What is a Spoken Dialogue System ? A pipeline of language understanding components.

What does RTE mean ? Recognizing Textual Entailment

I am Basile, a young software craftsman documenting his entrepreneurship journey. If you liked this article, you can follow my adventures in real time on Twitter. I’m always looking forward to meeting new people and learning from others !

My personal website : basilesamel.com