NLP90 : Self-learn NLP in 90 hours
Prerequisites : Basics of Machine Learning
The content is designed so that you spend about 6 hours per week for around 15 weeks, for a total of 90 hours (assuming good familiarity with general ML algorithms and Python). Of course, you are free to speed up or take it easy!
Week 1 : General Reading
https://towardsdatascience.com/your-guide-to-natural-language-processing-nlp-48ea2511f6e1
https://medium.com/@ageitgey/natural-language-processing-is-fun-9a0bff37854e
https://www.nltk.org/book/ch01.html
Week 2 : Word Tokenization and Sentence Segmentation
https://stanfordnlp.github.io/stanza/tokenize.html#start-with-pretokenized-text
https://www.nltk.org/api/nltk.tokenize.html
https://www.guru99.com/tokenize-words-sentences-nltk.html
Take 10 paragraphs from any source, such as Wikipedia, and check whether you can apply word tokenization and sentence segmentation to this data. Then compare Stanza's results with NLTK's (a starter sketch follows below). Can you spot any important differences or patterns?
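A starter sketch for this exercise, assuming stanza and nltk are installed (pip install stanza nltk); the sample text is a placeholder to replace with your own paragraphs:

```python
import nltk
import stanza
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt")   # NLTK tokenizer models (newer NLTK may also need "punkt_tab")
stanza.download("en")    # Stanza English models (one-time download)

# Placeholder text -- replace with your 10 Wikipedia paragraphs.
text = "Dr. Smith went to Washington. He arrived at 9 a.m. and left by noon."

# NLTK: Punkt sentence splitter + Treebank-style word tokenizer.
nltk_sents = [word_tokenize(s) for s in sent_tokenize(text)]

# Stanza: neural pipeline that segments sentences and tokenizes jointly.
pipe = stanza.Pipeline("en", processors="tokenize")
stanza_sents = [[tok.text for tok in sent.tokens] for sent in pipe(text).sentences]

print("NLTK  :", nltk_sents)
print("Stanza:", stanza_sents)
```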
Week 3 : Stemming and Lemmatization
https://www.guru99.com/stemming-lemmatization-python-nltk.html
https://www.nltk.org/book/ch03.html
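A quick NLTK sketch contrasting the Porter stemmer with the WordNet lemmatizer; the word list is only an illustration:

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet")   # lexical database used by the lemmatizer
nltk.download("omw-1.4")   # extra WordNet data required on newer NLTK versions

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for w in ["studies", "studying", "running", "geese", "better"]:
    print(f"{w:10} stem: {stemmer.stem(w):8} "
          f"lemma(noun): {lemmatizer.lemmatize(w):8} "
          f"lemma(verb): {lemmatizer.lemmatize(w, pos='v')}")
```

Notice that the stemmer just chops suffixes (often producing non-words), while the lemmatizer maps words to dictionary forms and needs a part-of-speech hint to do so correctly.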
Week 4 : N-gram models
https://medium.com/swlh/language-modelling-with-nltk-20eac7e70853
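A toy bigram model built with nltk.lm, in the spirit of the article above; the two-sentence corpus is a placeholder for real tokenized text:

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

# Toy corpus: a list of tokenized sentences (replace with real tokenized text).
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]

n = 2  # bigram model
train_data, vocab = padded_everygram_pipeline(n, corpus)

lm = MLE(n)                # maximum-likelihood estimates, no smoothing
lm.fit(train_data, vocab)

print(lm.score("cat", ["the"]))                          # P(cat | the)
print(lm.generate(5, text_seed=["the"], random_seed=3))  # sample 5 tokens after "the"
```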
Week 5 : Naive Bayes & Sentiment Classification
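As a hands-on exercise, a minimal Naive Bayes sentiment classifier; this sketch assumes scikit-learn and uses a tiny hand-labelled dataset purely for illustration (swap in a real corpus such as NLTK's movie_reviews):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hand-labelled dataset, purely for illustration.
texts = ["I loved this movie", "What a great film", "Absolutely terrible acting",
         "I hated every minute", "Great plot and a great cast", "Terrible, boring film"]
labels = ["pos", "pos", "neg", "neg", "pos", "neg"]

# Bag-of-words counts feeding a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["a great movie", "boring and terrible"]))
```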
Week 6 : Sentiment Classification using POS Tagging and Logistic Regression
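One way to read this week's topic: derive simple features from POS tags and feed them to logistic regression. A hypothetical, toy-scale sketch assuming nltk and scikit-learn; in practice you would combine POS-based features with word features and train on a real labelled corpus:

```python
import nltk
import numpy as np
from sklearn.linear_model import LogisticRegression

nltk.download("punkt")                        # tokenizer models
nltk.download("averaged_perceptron_tagger")   # POS tagger (newer NLTK: *_eng variant)

def pos_features(text):
    """Counts of adjectives, adverbs, verbs and nouns in a text."""
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text))]
    return [sum(t.startswith(p) for t in tags) for p in ("JJ", "RB", "VB", "NN")]

texts = ["I really loved this wonderful movie",
         "An awful, boring and painfully slow film",
         "Great acting and a genuinely moving story",
         "I hated the terrible dialogue"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

X = np.array([pos_features(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(np.array([pos_features("a wonderful and great film")])))
```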
Week 7 : Text Classification with Logistic Regression
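A compact baseline assuming scikit-learn: TF-IDF features plus logistic regression on a two-class slice of the 20 Newsgroups dataset (fetched automatically on first run):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Two-class slice of 20 Newsgroups, downloaded on first use.
cats = ["rec.sport.baseball", "sci.space"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                    LogisticRegression(max_iter=1000))
clf.fit(train.data, train.target)
print("test accuracy:", accuracy_score(test.target, clf.predict(test.data)))
```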
Week 8 : Word Embeddings
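A minimal sketch of training word embeddings, assuming gensim's Word2Vec implementation; the toy corpus is far too small to produce meaningful neighbours, but it shows the workflow:

```python
from gensim.models import Word2Vec

# Toy corpus of tokenized sentences -- swap in real text for useful vectors.
sentences = [["the", "king", "rules", "the", "kingdom"],
             ["the", "queen", "rules", "the", "kingdom"],
             ["the", "dog", "chases", "the", "cat"],
             ["the", "cat", "chases", "the", "mouse"]]

# Skip-gram (sg=1) embeddings with 50 dimensions.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=200)

print(model.wv["king"][:5])                    # first few dimensions of one vector
print(model.wv.most_similar("king", topn=3))   # nearest neighbours in the toy space
```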
Week 9 : Recurrent Neural Networks (RNNs)
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
https://www.youtube.com/watch?v=WCUNPb-5EYI
Dropout in RNNs:
https://adriangcoder.medium.com/a-review-of-dropout-as-applied-to-rnns-72e79ecd5b7b
RNN Regularization:
https://arxiv.org/abs/1409.2329
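A small sequence-classification RNN in Keras (an assumption, chosen because later weeks use TensorFlow); the dropout and recurrent_dropout arguments correspond to the regularization ideas in the links above, and the task is a synthetic toy problem:

```python
import numpy as np
import tensorflow as tf

# Synthetic toy task: does a binary sequence contain more 1s than 0s?
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 20, 1)).astype("float32")
y = (X.sum(axis=(1, 2)) > 10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20, 1)),
    # dropout masks the inputs, recurrent_dropout masks the recurrent connections
    tf.keras.layers.SimpleRNN(16, dropout=0.2, recurrent_dropout=0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32)
```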
Week 10 : Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
http://blog.echen.me/2017/05/30/exploring-lstms/
Stacked LSTMs:
https://machinelearningmastery.com/stacked-long-short-term-memory-networks/
https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/
LSTM Regularization:
https://machinelearningmastery.com/use-weight-regularization-lstm-networks-time-series-forecasting/
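A sketch of a stacked LSTM with L2 weight regularization in Keras, tying together the links above; GRU is a drop-in replacement via tf.keras.layers.GRU:

```python
import tensorflow as tf

# Stacked LSTM: the first layer must return the full sequence so the second
# layer sees one vector per timestep (see the return_sequences link above).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(50, 32)),            # 50 timesteps, 32 features each
    tf.keras.layers.LSTM(64, return_sequences=True,
                         kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.LSTM(32),                  # returns only the final hidden state
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```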
Week 11 & 12 : The Attention Mechanism & Transformers
https://www.youtube.com/watch?v=TQQlZhbC5ps
https://www.youtube.com/watch?v=OyFJWRnt_AY
http://nlp.seas.harvard.edu/2018/04/03/attention.html
https://www.youtube.com/watch?v=Osj0Z6rwJB4&list=PLEJK-H61XlwxpfpVzt3oDLQ8vr1XiEhev&index=2
https://kazemnejad.com/blog/transformer_architecture_positional_encoding/
https://towardsdatascience.com/master-positional-encoding-part-i-63c05d90a0c3
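Scaled dot-product attention is small enough to write out by hand; a NumPy sketch of the formula Attention(Q, K, V) = softmax(QKᵀ/√d_k)V with random toy matrices:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query vectors, d_k = 8
K = rng.normal(size=(5, 8))   # 5 key vectors
V = rng.normal(size=(5, 8))   # 5 value vectors

out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)   # (3, 8) outputs, (3, 5) attention weights
print(attn.sum(axis=-1))       # each row of attention weights sums to 1
```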
Week 13 : BERT
https://medium.com/@mromerocalvo/dissecting-bert-part1-6dcf5360b07f
https://www.youtube.com/c/ChrisMcCormickAI/videos
http://jalammar.github.io/illustrated-bert/
https://www.thepythoncode.com/article/finetuning-bert-using-huggingface-transformers-python
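A minimal sketch with the Hugging Face transformers library (assumed installed along with PyTorch): load pretrained BERT, tokenize a sentence, and inspect its contextual embeddings. The fine-tuning tutorial above goes well beyond this:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("NLP with BERT is fun.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional contextual vector per WordPiece token (plus [CLS]/[SEP]).
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))
print(outputs.last_hidden_state.shape)   # (1, num_tokens, 768)
```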
Week 14 : NER using BERT
https://www.depends-on-the-definition.com/named-entity-recognition-with-bert/
https://towardsdatascience.com/named-entity-recognition-with-bert-in-pytorch-a454405e0b6a
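The tutorials above walk through fine-tuning BERT for NER yourself; as a quick sanity check of what the end result looks like, this sketch instead runs an already fine-tuned community checkpoint (dslim/bert-base-NER, an assumption) through the transformers pipeline:

```python
from transformers import pipeline

# A community BERT checkpoint fine-tuned for NER, downloaded from the Hub on first run.
ner = pipeline("ner",
               model="dslim/bert-base-NER",
               aggregation_strategy="simple")   # merge WordPiece pieces into entity spans

text = "Angela Merkel visited the Google office in Paris last Tuesday."
for ent in ner(text):
    print(ent["entity_group"], ent["word"], round(float(ent["score"]), 3))
```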
Week 15 : Text Classification with BERT
https://www.tensorflow.org/text/tutorials/classify_text_with_bert
https://towardsdatascience.com/text-classification-with-bert-in-pytorch-887965e5820f
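For a quick end-to-end check before fine-tuning your own classifier with the tutorials above, a transformers pipeline sketch using a sentiment checkpoint from the distilled BERT family (distilbert-base-uncased-finetuned-sst-2-english, an assumption):

```python
from transformers import pipeline

# Inference with an already fine-tuned checkpoint from the distilled BERT family;
# the tutorials above cover fine-tuning your own classifier in TF and PyTorch.
classifier = pipeline("text-classification",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

print(classifier("This course made transformers finally click for me."))
print(classifier("The lectures were confusing and way too long."))
```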