[coursera/SequenceModels/week2] NLP & Word Embeddings (summary & questions)

Published 2019-09-26 by 风铃

2.1 Introduction to Word Embeddings


2.1.1 Word Representation


Featurized representation: word embedding


Use an n-dimensional dense vector of learned features to represent each word, instead of a sparse one-hot vector over the vocabulary.
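As a toy illustration (the feature interpretation and the numbers below are illustrative, not actual learned values), compare a one-hot representation with a featurized one:

```python
import numpy as np

vocab_size = 10000           # vocabulary size used throughout the course examples
embedding_dim = 300          # a typical embedding dimension n

# One-hot representation: a 10000-dimensional sparse vector with a single 1.
o_man = np.zeros(vocab_size)
o_man[5391] = 1              # 5391 = index of "man" in the course's example vocabulary

# Featurized representation: a dense n-dimensional vector whose components loosely
# encode attributes (gender, royal, age, ...). Values here are made up.
e_man = np.array([-1.0, 0.01, 0.03])   # truncated to 3 of the n dimensions for display
```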




2.1.2 Using word embeddings



Transfer learning and word embeddings




2.1.3 Properties of word embeddings




Cosine similarity
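A minimal NumPy sketch of cosine similarity and the analogy reasoning it enables ("man is to woman as king is to ?"); the 4-dimensional vectors below are placeholders, not real embeddings:

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (||u|| * ||v||)"""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Analogy: find the word w maximizing cosine_similarity(e_w, e_king - e_man + e_woman).
# Placeholder 4-dimensional vectors for illustration only.
e_man   = np.array([-1.0,  0.0,  0.0,  0.0])
e_woman = np.array([ 1.0,  0.0,  0.0,  0.0])
e_king  = np.array([-0.95, 0.93, 0.70, 0.02])
e_queen = np.array([ 0.97, 0.95, 0.69, 0.01])

target = e_king - e_man + e_woman
print(cosine_similarity(e_queen, target))   # close to 1 for these toy vectors
```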




2.1.4 Embedding matrix
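A small NumPy sketch of how the embedding matrix $E$ relates to word embeddings: multiplying $E$ by a one-hot vector selects one column, but in practice a direct lookup is used because the full matrix-vector product is computationally wasteful (this is the point of Question 5 below). The shapes are illustrative.

```python
import numpy as np

vocab_size, embedding_dim = 10000, 300           # illustrative sizes
E = np.random.randn(embedding_dim, vocab_size)   # embedding matrix (300 x 10000)

o_1234 = np.zeros(vocab_size)
o_1234[1234] = 1            # one-hot vector for word 1234

e_slow = E @ o_1234         # 300-dim embedding, but ~3M multiplications
e_fast = E[:, 1234]         # the same embedding via a direct column lookup

assert np.allclose(e_slow, e_fast)
```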




2.2 Learning Word Embeddings: Word2Vec & GloVe


2.2.1 Learning word embeddings






2.2.2 Word2Vec
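A minimal NumPy sketch of the skip-gram softmax $P(t \mid c) = \frac{e^{\theta_t^T e_c}}{\sum_{t'=1}^{10000} e^{\theta_{t'}^T e_c}}$ that word2vec estimates (same notation and sizes as Question 8 below); the parameters are random placeholders and the training loop is omitted:

```python
import numpy as np

vocab_size, embedding_dim = 10000, 500      # sizes from Question 8
theta = np.random.randn(vocab_size, embedding_dim) * 0.01   # one theta_t per target word
E     = np.random.randn(vocab_size, embedding_dim) * 0.01   # one e_c per context word

def p_target_given_context(c):
    """Softmax over all 10000 possible target words, given context word index c."""
    logits = theta @ E[c]                   # theta_t^T e_c for every t
    logits -= logits.max()                  # numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()

probs = p_target_given_context(c=4523)      # 4523 is an arbitrary word index
print(probs.shape, probs.sum())             # (10000,) ~1.0
```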




2.2.3 GloVe word vectors
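A sketch that evaluates (but does not optimize) the GloVe objective from Question 9, $\sum_i \sum_j f(X_{ij})(\theta_i^T e_j + b_i + b'_j - \log X_{ij})^2$; the co-occurrence counts are toy data, and the weighting function is modeled on the one from the GloVe paper, with $f(0) = 0$:

```python
import numpy as np

vocab_size, embedding_dim = 100, 50          # small toy sizes
rng = np.random.default_rng(0)

X = rng.integers(0, 20, size=(vocab_size, vocab_size)).astype(float)  # toy co-occurrence counts
theta = rng.normal(size=(vocab_size, embedding_dim)) * 0.01
e     = rng.normal(size=(vocab_size, embedding_dim)) * 0.01
b, b_prime = np.zeros(vocab_size), np.zeros(vocab_size)

def f(x, x_max=100.0, alpha=0.75):
    """Weighting function with f(0) = 0 (shape borrowed from the GloVe paper)."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

# f(X_ij) = 0 whenever X_ij = 0, so those terms drop out; replace log(0) safely.
log_X = np.log(np.where(X > 0, X, 1.0))
residual = theta @ e.T + b[:, None] + b_prime[None, :] - log_X
loss = np.sum(f(X) * residual ** 2)
print(loss)
```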






2.3 Applications using Word Embeddings


2.3.1 Sentiment Classification




RNN for sentiment classification
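A minimal Keras-style sketch of such a many-to-one sentiment classifier (assuming TensorFlow 2.x; the sizes and the choice of an LSTM layer are illustrative, not the course's exact architecture):

```python
import tensorflow as tf

vocab_size, embedding_dim = 10000, 100       # illustrative sizes

model = tf.keras.Sequential([
    # In transfer learning, this layer would be initialized from pre-trained
    # word embeddings (and optionally frozen) rather than learned from scratch.
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.LSTM(128),                            # many-to-one RNN over the word sequence
    tf.keras.layers.Dense(1, activation="sigmoid"),       # y = P(happy | text)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```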




2.3.2 Debiasing word embeddings


Bias problem: word embeddings can pick up gender, ethnicity, and other biases present in the text they are trained on.
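A small NumPy sketch of the "neutralize" step from the course's debiasing exercise: project a word vector onto a bias direction $g$ (e.g. $e_{woman} - e_{man}$) and subtract that component, $e^{debiased} = e - \frac{e \cdot g}{\|g\|_2^2}\, g$. The vectors below are placeholders:

```python
import numpy as np

def neutralize(e, g):
    """Remove the component of word vector e along the bias direction g."""
    e_bias_component = (np.dot(e, g) / np.dot(g, g)) * g
    return e - e_bias_component

# Placeholder vectors: g would be something like e_woman - e_man,
# and e a gender-neutral word such as "receptionist".
g = np.array([1.0, 0.2, -0.3])
e_receptionist = np.array([0.4, -0.1, 0.5])

e_debiased = neutralize(e_receptionist, g)
print(np.dot(e_debiased, g))   # ~0: no remaining component along the bias direction
```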







Q&A



1. Question 1






Suppose you learn a word embedding for a vocabulary of 10000 words. Then the embedding vectors should be 10000-dimensional, so as to capture the full range of variation and meaning in those words.











True












False







Correct





Question 2
Correct
1 / 1 points



2. Question 2




What is t-SNE?











A linear transformation that allows us to solve analogies on word vectors












A non-linear dimensionality reduction technique







Correct








A supervised learning algorithm for learning word embeddings












An open-source sequence modeling library
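For reference, a typical way to run t-SNE on word embeddings with scikit-learn (illustrative usage, not part of the quiz):

```python
import numpy as np
from sklearn.manifold import TSNE

embeddings = np.random.randn(1000, 300)   # placeholder embedding matrix (1000 words x 300 dims)
coords_2d = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(embeddings)
print(coords_2d.shape)                    # (1000, 2): a 2-D map for visualization
```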









Question 3
Correct
1 / 1 points



3. Question 3




Suppose you download a pre-trained word embedding which has been trained on a huge corpus of text.
You then use this word embedding to train an RNN for a language task of recognizing if someone is
happy from a short snippet of text, using a small training set.
























x (input text)                    y (happy?)
I'm feeling wonderful today!      1
I'm bummed my cat is ill.         0
Really enjoying this!             1

Then even if the word “ecstatic” does not appear in your small training set, your RNN might
reasonably be expected to recognize “I’m ecstatic” as deserving a label
y=1.












True







Correct








False









Question 4
Incorrect
0 / 1 points



4. Question 4




Which of these equations do you think should hold for a good word embedding? (Check all that apply)











$e_{boy} - e_{girl} \approx e_{brother} - e_{sister}$







This should be selected








$e_{boy} - e_{girl} \approx e_{sister} - e_{brother}$







Un-selected is correct








$e_{boy} - e_{brother} \approx e_{girl} - e_{sister}$







Correct








$e_{boy} - e_{brother} \approx e_{sister} - e_{girl}$







This should not be selected



Recall the logic of analogies! The order of the words matters.








Question 5
Incorrect
0 / 1 points



5. Question 5




Let $E$ be an embedding matrix, and let $e_{1234}$ be a one-hot vector corresponding to word 1234. Then to get the embedding of word 1234, why don't we call $E \cdot e_{1234}$ in Python?











It is computationally wasteful.












The correct formula is $E^T e_{1234}$.












This doesn’t handle unknown words (<UNK>).












None of the above: Calling the Python snippet as described above is fine.







This should not be selected





Question 6
Correct
1 / 1 points



6. Question 6




When learning word embeddings, we create an artificial task of estimating $P(\text{target} \mid \text{context})$. It is okay if we do poorly on this artificial prediction task; the more important by-product of this task is that we learn a useful set of word embeddings.











True







Correct








False









Question 7
Correct
1 / 1 points



7. Question 7




In the word2vec algorithm, you estimate $P(t \mid c)$, where $t$ is the target word and $c$ is a context word. How are $t$ and $c$ chosen from the training set? Pick the best answer.











$c$ is the one word that comes immediately before $t$.












$c$ is a sequence of several words immediately before $t$.












$c$ and $t$ are chosen to be nearby words.







Correct








$c$ is the sequence of all the words in the sentence before $t$.









Question 8
Incorrect
0 / 1 points



8. Question 8




Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The
word2vec model uses the following softmax function:


$P(t \mid c) = \dfrac{e^{\theta_t^T e_c}}{\sum_{t'=1}^{10000} e^{\theta_{t'}^T e_c}}$



Which of these statements are correct? Check all that apply.











$\theta_t$ and $e_c$ are both 500-dimensional vectors.







Correct








$\theta_t$ and $e_c$ are both 10000-dimensional vectors.







Un-selected is correct








$\theta_t$ and $e_c$ are both trained with an optimization algorithm such as Adam or gradient descent.







Correct








After training, we should expect $\theta_t$ to be very close to $e_c$ when $t$ and $c$ are the same word.







This should not be selected





Question 9
Correct
1 / 1 points



9. Question 9




Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective:

$\min \sum_{i=1}^{10,000} \sum_{j=1}^{10,000} f(X_{ij}) \left( \theta_i^T e_j + b_i + b'_j - \log X_{ij} \right)^2$


Which of these statements are correct? Check all that apply.











$\theta_i$ and $e_j$ should be initialized to 0 at the beginning of training.







Un-selected is correct








$\theta_i$ and $e_j$ should be initialized randomly at the beginning of training.







Correct








$X_{ij}$ is the number of times word $i$ appears in the context of word $j$.







Correct








The weighting function $f(\cdot)$ must satisfy $f(0) = 0$.







Correct



The weighting function helps prevent learning only from extremely common word pairs; it must also satisfy $f(0) = 0$ so that pairs with $X_{ij} = 0$ contribute nothing to the sum (otherwise $\log X_{ij}$ would be undefined).








Question 10
Correct
1 / 1 points



10. Question 10




You have trained word embeddings using a text dataset of $m_1$ words. You are considering using these word embeddings for a language task, for which you have a separate labeled dataset of $m_2$ words. Keeping in mind that using word embeddings is a form of transfer learning, under which of these circumstances would you expect the word embeddings to be helpful?











$m_1 \gg m_2$







Correct








$m_1 \ll m_2$







