- What is NLTK?
- Natural Language Toolkit
- What is Natural Language Processing?
- NLP Tutorial 3 - Extract Text from PDF Files in Python for NLP - PDF Writer and Reader in Python
- NLTK (Natural Language Toolkit) Tutorial in Python
- NLTK: the natural language toolkit
- Natural language toolkit pdf writer
- NLTK: Build Document Classifier & Spell Checker with Python
The question is, what is Natural Language Processing? Natural Language Processing (NLP) is a branch of artificial intelligence that deals with understanding our everyday human languages and using them to interact with people.
In this video, we will download the NLTK module and all the additional resources associated with it. I hope you've downloaded Python and set it up on your PC. What we have here is a new Python project. In the main directory, I have a TXT file, requirements. So you don't need to worry about going to the Terminal or CMD and installing these yourself.
PyCharm will handle this all by itself.
If I move back to setting-up, all you do here is click yes, and it will download all the requirements that have not yet been satisfied.
It's no longer giving me that warning. Now the first step is complete. We have downloaded the NLTK module.
It is here in the Python library. Now what we want to do is get the additional resources. As I move along in the course, I'll explain why we need them. So I say: import nltk. Then I say: nltk. It shows the run setting up.
Just give it a minute. Here it is, up and running. Modules and All Packages. I have installed most of them here. What I want you to do is just go to Collections and click on "all".
Make sure that this "all" is highlighted.
Once you have clicked on "all", press "Download". It is now going to download all the resources associated with the NLTK module. It will take some time, as the files are huge, but you'll get it done.
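The downloader step can also be scripted. The call that follows "import nltk" is cut off in the transcript, so this is only a sketch under an assumption: that the call is `nltk.download()`, which with no arguments opens NLTK's interactive downloader, and that passing the collection name "all" does the same job as selecting "all" and pressing "Download". The helper name `download_nltk_resources` is hypothetical.

```python
def download_nltk_resources(collection="all"):
    """Download an NLTK data package or collection by name."""
    # Imported inside the function so that defining this helper
    # does not require NLTK to be installed yet.
    import nltk

    # nltk.download(name) fetches the named package or collection.
    # Called with no argument, nltk.download() opens the interactive
    # downloader described in the video instead.
    return nltk.download(collection)
```

Calling `download_nltk_resources()` once is enough; the "all" collection is large, so the download can take a while.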
I'll see you in the next video, and we'll proceed with an introduction to the NLTK module. I hope you get this done by then. Thank you!
So in the last video, we downloaded all the additional resources associated with our NLTK module. I hope you have downloaded them, because you will need them throughout this course.
In this video, we are going to access some very basic resources of the NLTK module.
So I say: from nltk. If I just run this right now, without typing anything else, having only imported this, you can see what it is doing. So how do I do that? I can say: print, and then I say: texts. This is a function we have, a texts function. It loads all the texts into our memory, and if I run it, I can see that. Up to here is what we had previously, and after that, this loads everything. You can see that it has loaded all the sentences that this module has, the introductory sentences.
And what if I want to access them individually? I can definitely do that. I can say: print, and if I run this now, you'll see that it gives me the first sentence it has.
So it has given me a list here, and each element of the list is a word. Okay, so this returns the first sentence.
And similarly, I can also access a text individually. What this function here did was load all the texts that are here in the nltk. So how can I access a text individually? It's similar to what we did with sentences.
We say: print, and if I do this, it is going to give me the text1 object. You can see here that it has returned the first text.
You can see the first text is "Moby Dick by Herman", and here I have a text object, which is the same as this one here. And okay, so now there are various functions associated with each text. I'll just show you something.
Can you see this here? If I type "text1." I can see a lot of functions that are available here. So in the next video, we are going to discuss some of the most frequently used functions and help you understand what they do and how we can use them to build our own applications.
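The steps in this video can be sketched in code. The import line and the arguments to print are cut off in the transcript, so the names below are assumptions: the sketch uses NLTK's `nltk.book` module, which provides `texts()`, `sents()`, `sent1` to `sent9`, and `text1` to `text9`. The wrapper name `show_intro_material` is hypothetical.

```python
def show_intro_material():
    """Print the introductory texts and sentences that ship with NLTK."""
    # Importing nltk.book loads text1..text9 and sent1..sent9 and prints
    # a banner. It needs the downloaded data, so the import is kept inside
    # the function rather than at the top of the file.
    from nltk.book import sent1, sents, text1, texts

    texts()        # lists the titles of text1 through text9
    sents()        # lists sent1 through sent9
    print(sent1)   # the first sentence, as a list of word tokens
    print(text1)   # the first text, displayed as a Text object
```

In an interactive session, the usual shortcut is `from nltk.book import *` at the top level, after which `sent1` and `text1` are available directly.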
So, in the last video, we discussed how to access the very basic introductory texts of the nltk. In this video, we will discuss some very basic introductory functions that we use extensively for analysis.
So if you remember, we had these 9 introductory texts in the nltk. Now, what I want you to know is that each text is an instance of a Text class, and the hierarchy for this class is nltk.text.Text.
So, if I say print, and ask for the type of text1 or text2, anything here, you're going to see that it shows us this hierarchy. The type of text2 is nltk.text.Text, which means that we have the "nltk" package first, then a module named "text", and then this class, "Text".
So you can see here that we have this as a class. What we will be doing is discussing some very basic functions of this class, "Text". The first function we are going to discuss is "concordance". It expects a single word as its input, searches our text for that word, and returns all the occurrences of that word with some context.
By context, I mean the words which appear before that word and the words which appear after it. We'll see now what this actually means.
So I say: text1. And if I run this now, it is going to give us all the occurrences of "man" with some context in text1.
You can see that it says displaying 25 of matches. So it has found occurrences of the word "man", and you can see here that it is also giving us some context: some words before the word "man" and some words after it.
Okay, so this is what "concordance" does. This is pretty helpful if you want to see where a word appears and in what context it usually appears. The next function we are going to discuss is "similar".
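To make the idea of "occurrences with context" concrete, here is a minimal pure-Python sketch of a concordance. It is a simplified illustration, not NLTK's implementation: the function name `concordance_lines`, the `window` parameter, and the bracket formatting are all assumptions made for this example. In NLTK itself, the corresponding call would be the `concordance` method of a Text object.

```python
def concordance_lines(tokens, word, window=4, max_lines=25):
    """Return up to max_lines strings showing `word` with surrounding tokens."""
    target = word.lower()  # match case-insensitively (a choice made for this sketch)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() != target:
            continue
        # Up to `window` tokens before and after the match form the context.
        left = " ".join(tokens[max(0, i - window):i])
        right = " ".join(tokens[i + 1:i + 1 + window])
        lines.append(f"{left} [{token}] {right}".strip())
        if len(lines) == max_lines:
            break
    return lines


tokens = "No man is an island ; every man is a piece of the continent".split()
for line in concordance_lines(tokens, "man", window=3):
    print(line)
```

Each printed line holds one occurrence of the word with a few tokens on either side, which is the same kind of view the video shows for "man" in text1.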
"similar" also expects a single word as its input. It searches for that word, collects every context in which it appears, and then returns all the other words that appear in the same contexts.
I'm just going to run this, and you'll understand it better than from my explanation. I say: print, and let's say I go for "woman". If I print this now, it is going to search for the word "woman" in text1, and then it is going to return all the words that appear in the same context as the word "woman".
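The "same context" idea can be sketched in plain Python. This is a simplified illustration, not NLTK's implementation: it assumes that a word's context is the pair (previous word, next word) and ranks other words by how many such pairs they share with the target. The name `similar_words` is hypothetical. In NLTK itself, the corresponding call would be the `similar` method of a Text object.

```python
from collections import Counter, defaultdict


def similar_words(tokens, word, num=20):
    """Return words that share (previous, next) contexts with `word`."""
    tokens = [t.lower() for t in tokens]
    word = word.lower()

    # Map every word to the set of (previous, next) pairs it appears between.
    contexts = defaultdict(set)
    for prev, cur, nxt in zip(tokens, tokens[1:], tokens[2:]):
        contexts[cur].add((prev, nxt))

    target_contexts = contexts.get(word, set())

    # Score each other word by the number of contexts it shares with the target.
    scores = Counter()
    for other, other_contexts in contexts.items():
        shared = len(other_contexts & target_contexts)
        if other != word and shared:
            scores[other] = shared
    return [w for w, _ in scores.most_common(num)]


tokens = "the man sang and the woman sang while the king sat and the woman sat".split()
print(similar_words(tokens, "woman"))
```

In this toy sentence, "man" and "king" each occur between the same neighbours as "woman", so they are the words reported as similar.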
So you can see them here, the words which appear in the same context as the word "woman": man, king, wife, hussar, fiddler, bull, laugh, writer. All of these words appear in the same context as the word "woman" in text1, which is Moby Dick. Okay, so this is what we have for similar.
Was it used very extensively at the start of the text, or at the end, or in the middle, or wherever? What this does is return a graph where, on the y axis, we have the words we have given it as input, the ones whose occurrences we want to find.
On the x axis, we have the position in the text, counted in words, and the graph shows where each word occurred. I'm just going to run this, and hopefully you'll understand it better. Okay, so let's say I go with text4 here. So I say text4. So I give it a list.
Let's say I give it "democracy", "freedom", "law". If I run this now, this function is going to give me a graph as its output. You can see that on the y axis I have the list I gave it as input: "democracy", "freedom", "law"; three words.
The word "freedom" has been used throughout the text.
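The function's name is missing from the transcript; NLTK's `dispersion_plot` method on Text objects matches the description, but that identification is an assumption. The data behind such a graph can be sketched in plain Python without any plotting library: for each word, record the token positions where it occurs, then draw one row per word. The names `dispersion_offsets` and `render_rows` are hypothetical, and the text rendering stands in for the real graph.

```python
def dispersion_offsets(tokens, words):
    """Map each word in `words` to the token positions where it occurs."""
    offsets = {word: [] for word in words}
    for position, token in enumerate(tokens):
        if token in offsets:  # exact, case-sensitive match
            offsets[token].append(position)
    return offsets


def render_rows(offsets, length, width=60):
    """Draw one text row per word; a '|' marks roughly where the word occurs."""
    rows = []
    for word, positions in offsets.items():
        row = ["."] * width
        for position in positions:
            # Scale the token position onto the fixed-width row.
            row[position * width // length] = "|"
        rows.append(f"{word:>10} {''.join(row)}")
    return rows


tokens = "we value freedom and law ; freedom needs democracy ; law protects freedom".split()
offsets = dispersion_offsets(tokens, ["democracy", "freedom", "law"])
for row in render_rows(offsets, len(tokens), width=len(tokens)):
    print(row)
```

Each row corresponds to one word on the y axis, and the horizontal position of each mark corresponds to the word offset on the x axis, so a word used throughout the text shows marks spread along the whole row.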