NLP90 : Self-learn NLP in 90 hours

Kushal Shah

--

Prerequisites : Basics of Machine Learning

The content is designed so that you spend about 6 hours per week for around 15 weeks, making 90 hours in total (assuming good familiarity with general ML algorithms and Python). Of course, you are free to speed up or take it easy!

Week 1 : General Reading

https://towardsdatascience.com/your-guide-to-natural-language-processing-nlp-48ea2511f6e1

https://medium.com/@ageitgey/natural-language-processing-is-fun-9a0bff37854e

https://www.nltk.org/book/ch01.html

Week 2 : Word Tokenization and Sentence Segmentation

https://stanfordnlp.github.io/stanza/tokenize.html#start-with-pretokenized-text

https://www.nltk.org/api/nltk.tokenize.html

https://www.guru99.com/tokenize-words-sentences-nltk.html

Take 10 paragraphs from any source, such as Wikipedia, and check whether you can apply word tokenization and sentence segmentation to this data. Also compare the results of Stanza with those of NLTK. Can you spot any important differences or patterns? A minimal sketch of the setup is given below.
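
A minimal sketch of the comparison setup, assuming nltk and stanza are installed; the sample text is a stand-in for your own Wikipedia paragraphs:

```python
import nltk
import stanza

nltk.download("punkt")    # NLTK tokenizer models
stanza.download("en")     # Stanza English models (one-time download)

text = "Dr. Smith went to Washington. He arrived at 5 p.m. It was raining."

# NLTK: sentence segmentation first, then word tokenization per sentence
nltk_sents = nltk.sent_tokenize(text)
nltk_tokens = [nltk.word_tokenize(s) for s in nltk_sents]

# Stanza: a neural pipeline that segments and tokenizes in one pass
nlp = stanza.Pipeline("en", processors="tokenize")
doc = nlp(text)
stanza_tokens = [[token.text for token in sent.tokens] for sent in doc.sentences]

print("NLTK:  ", nltk_tokens)
print("Stanza:", stanza_tokens)
```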

Week 3 : Stemming and Lemmatization

https://www.guru99.com/stemming-lemmatization-python-nltk.html

https://www.nltk.org/book/ch03.html
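
A minimal sketch contrasting the two operations using NLTK's PorterStemmer and WordNetLemmatizer; the word list is illustrative:

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet")   # lexical database used by the lemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["studies", "wolves", "better"]:
    print(word,
          "| stem:", stemmer.stem(word),
          "| lemma (noun):", lemmatizer.lemmatize(word),          # noun by default
          "| lemma (verb):", lemmatizer.lemmatize(word, pos="v")) # POS changes the result
```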

Week 4 : N-gram models

https://medium.com/swlh/language-modelling-with-nltk-20eac7e70853
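
To make the idea concrete, here is a minimal bigram model built with NLTK's nltk.lm module, which the article above also uses; the three-sentence corpus is a toy placeholder:

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]

n = 2
train_ngrams, vocab = padded_everygram_pipeline(n, corpus)

lm = MLE(n)                      # maximum-likelihood n-gram probabilities
lm.fit(train_ngrams, vocab)

print(lm.score("cat", ["the"]))  # P(cat | the) = 2/3 on this corpus
print(lm.generate(4, text_seed=["the"], random_seed=42))  # sample a continuation
```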

Week 5 : Naive Bayes & Sentiment Classification

https://www.analyticsvidhya.com/blog/2021/07/performing-sentiment-analysis-with-naive-bayes-classifier/
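
A minimal sketch with scikit-learn's MultinomialNB; the six one-line reviews are placeholders for a real labelled dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["great movie, loved it", "what a wonderful film",
         "absolutely fantastic acting", "terrible plot, boring",
         "worst film I have seen", "awful and dull"]
labels = [1, 1, 1, 0, 0, 0]      # 1 = positive, 0 = negative

vectorizer = CountVectorizer()   # bag-of-words counts
X = vectorizer.fit_transform(texts)

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vectorizer.transform(["a boring, awful movie"])))  # likely [0]
```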

Week 6 : Sentiment Classification using POS Tagging and Logistic Regression
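
No reference link is given for this week, so the sketch below is one plausible reading of the topic: use POS tagging to keep only adjectives and adverbs, which tend to carry sentiment, and feed the filtered text to a logistic regression classifier. All data and names are illustrative:

```python
import nltk
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")   # POS tagger model

def sentiment_words(text):
    # Keep only adjectives (JJ*) and adverbs (RB*)
    tagged = nltk.pos_tag(nltk.word_tokenize(text.lower()))
    return " ".join(w for w, tag in tagged if tag.startswith(("JJ", "RB")))

texts = ["a truly wonderful film", "great acting and a lovely story",
         "a dull and boring mess", "horribly bad writing"]
labels = [1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(sentiment_words(t) for t in texts)
clf = LogisticRegression().fit(X, labels)

test = vec.transform([sentiment_words("a wonderful and lovely film")])
print(clf.predict(test))    # likely [1]
```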

Week 7 : Text Classification with Logistic Regression
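
A minimal sketch using scikit-learn's TfidfVectorizer and LogisticRegression; the four toy documents stand in for a real corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["the team won the match", "a thrilling final game",
         "parliament passed the bill", "the minister gave a speech"]
labels = ["sports", "sports", "politics", "politics"]

# TF-IDF features feeding a logistic regression classifier
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["the game ended in a draw"]))   # likely ['sports']
```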

Week 8 : Word Embeddings

https://medium.com/@phylypo/a-survey-of-the-state-of-the-art-language-models-up-to-early-2020-aba824302c6
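
A minimal word2vec sketch, assuming gensim is installed; a real run needs a far larger corpus than these toy sentences:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"],
             ["cats", "and", "dogs", "are", "pets"]]

# Train 50-dimensional embeddings; hyperparameters are illustrative
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

print(model.wv["cat"][:5])                   # first 5 dimensions of the vector
print(model.wv.most_similar("cat", topn=3))  # nearest neighbours in the space
```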

Week 9 : Recurrent Neural Networks (RNNs)

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

https://www.youtube.com/watch?v=WCUNPb-5EYI

Dropout in RNNs:

https://adriangcoder.medium.com/a-review-of-dropout-as-applied-to-rnns-72e79ecd5b7b

RNN Regularization:

https://arxiv.org/abs/1409.2329
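
To tie the reading together, here is a minimal Keras RNN classifier skeleton showing the two dropout knobs discussed above: dropout on the layer inputs and recurrent_dropout on the hidden-to-hidden connections. The vocabulary size and layer widths are arbitrary:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),  # 10k-word vocab
    tf.keras.layers.SimpleRNN(64, dropout=0.2, recurrent_dropout=0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),             # binary label
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```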

Week 10 : Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU)

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

http://blog.echen.me/2017/05/30/exploring-lstms/

Stacked LSTMs:

https://machinelearningmastery.com/stacked-long-short-term-memory-networks/

https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/

LSTM Regularization:

https://machinelearningmastery.com/use-weight-regularization-lstm-networks-time-series-forecasting/
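
A minimal stacked-LSTM sketch in Keras connecting the links above: return_sequences=True lets the first LSTM feed a full sequence to the second, and an L2 kernel regularizer is attached to each recurrent layer. Hyperparameters are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import regularizers

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    tf.keras.layers.LSTM(64, return_sequences=True,          # emit every timestep
                         kernel_regularizer=regularizers.l2(1e-4)),
    tf.keras.layers.LSTM(32,                                 # final hidden state only
                         kernel_regularizer=regularizers.l2(1e-4)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```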

Week 11 & 12 : The Attention Mechanism & Transformers

https://www.youtube.com/watch?v=TQQlZhbC5ps

https://www.youtube.com/watch?v=OyFJWRnt_AY

http://nlp.seas.harvard.edu/2018/04/03/attention.html

https://www.youtube.com/watch?v=Osj0Z6rwJB4&list=PLEJK-H61XlwxpfpVzt3oDLQ8vr1XiEhev&index=2

https://kazemnejad.com/blog/transformer_architecture_positional_encoding/

https://towardsdatascience.com/master-positional-encoding-part-i-63c05d90a0c3

https://towardsdatascience.com/https-medium-com-chaturangarajapakshe-text-classification-with-transformer-models-d370944b50ca
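
A minimal NumPy sketch of the two core ideas from the material above, scaled dot-product attention and sinusoidal positional encoding; shapes are toy-sized:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

x = np.random.randn(5, 8)                    # 5 tokens, d_model = 8
x = x + positional_encoding(5, 8)            # inject order information
out = scaled_dot_product_attention(x, x, x)  # self-attention over the sequence
print(out.shape)                             # (5, 8)
```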

Week 13 : BERT

https://medium.com/@mromerocalvo/dissecting-bert-part1-6dcf5360b07f

https://www.youtube.com/c/ChrisMcCormickAI/videos

http://jalammar.github.io/illustrated-bert/

https://www.thepythoncode.com/article/finetuning-bert-using-huggingface-transformers-python
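
A minimal Hugging Face Transformers sketch that tokenizes a sentence and extracts BERT's contextual embeddings, assuming transformers and torch are installed:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("NLP is fun.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(inputs["input_ids"])              # [CLS] nlp is fun . [SEP]
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```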

Week 14 : NER using BERT

https://www.depends-on-the-definition.com/named-entity-recognition-with-bert/

https://towardsdatascience.com/named-entity-recognition-with-bert-in-pytorch-a454405e0b6a
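
A minimal sketch using a pre-trained BERT NER pipeline from Hugging Face; the checkpoint dslim/bert-base-NER is one publicly available choice, not necessarily the one used in the tutorials above:

```python
from transformers import pipeline

# aggregation_strategy="simple" merges word-piece tokens into whole entities
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

for entity in ner("Kushal Shah studied at IIT Madras in Chennai."):
    print(entity["word"], "->", entity["entity_group"], f'({entity["score"]:.2f})')
```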

Week 15 : Text Classification with BERT

https://www.tensorflow.org/text/tutorials/classify_text_with_bert

https://towardsdatascience.com/text-classification-with-bert-in-pytorch-887965e5820f
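
A minimal single-batch fine-tuning sketch in PyTorch (the TensorFlow tutorial above covers the Keras route); real training needs a proper dataset, batching, and several epochs:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)    # fresh 2-class classification head

texts = ["great movie, loved it", "terrible plot, boring"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**batch, labels=labels).loss   # cross-entropy from the head
loss.backward()
optimizer.step()
print(float(loss))
```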

--

Kushal Shah

Building Self Shiksha, a free e-learning platform for AI/ML, and teaching at Sitare University. Studied at IIT Madras and taught at IIT Delhi and IISER Bhopal.