Let's begin with a short introduction and then move on to the code.
What is a Wordcloud?
A wordcloud is a visual representation of word frequency: the more often a word appears in a text file, the more prominent it is in the picture. We can see at a glance which word is used most and which words are used more often than others. Words that appear more frequently are drawn larger than those that appear less frequently, so the most frequent word gets the largest size in the wordcloud.
What are the uses of Wordcloud?
In the data science field, wordclouds are very useful for visualizing text files. They give a quick understanding of what a text file is about.
Wordclouds have many other uses, but let's stop here and start coding.
Creating Wordcloud
For this project, I'll create a wordcloud from a text by writing a small Python script.
Code flow:
- Import libraries
- Create uploader widget (optional but convenient)
- Declare uninteresting words
- Ignore uninteresting words and assign frequencies
- Generate wordcloud
You need a few libraries installed before creating a wordcloud. These are commonly used Python libraries.
If you don't have the wordcloud library, you can install it with this command: pip install wordcloud.
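Before importing anything, it can help to check which libraries are actually installed. Here is a minimal sketch using only the standard library. The package list is an assumption: only wordcloud is named in this post, while matplotlib and numpy are my guesses at typical companions, and the helper names are hypothetical.

```python
from importlib.util import find_spec

# Assumed package list: only "wordcloud" is named in the text;
# "matplotlib" and "numpy" are guesses at typical companions.
REQUIRED_PACKAGES = ["wordcloud", "matplotlib", "numpy"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


def report_missing(names=REQUIRED_PACKAGES):
    """Print a pip command for every package that is not installed."""
    for name in missing_packages(names):
        print(f"Missing: {name} (install with: pip install {name})")
```

Calling report_missing() prints nothing when everything is available, and one install hint per missing package otherwise.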
With the libraries imported, let's move on.
The upload() function takes care of most of the preliminary work. It creates an uploader widget that lets you pick a text file and saves all its content in a variable named file_contents.
Before running this function, you need a text file to upload, since the wordcloud is generated from it. A very simple and easy-to-follow text file is provided for you in the GitHub repository (link given at the bottom).
You can also visualize the contents of a web page by copying the text and pasting it into a file (there is no need to call upload() in that case).
Running upload() displays a browse button (mentioned above) for selecting a text file from your system. Select the required file.
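If you skip the uploader widget, the text can be read straight from disk instead. The sketch below is not the upload() function from this post; read_text_file is a hypothetical helper, and the UTF-8 default is an assumption about how your file is encoded.

```python
from pathlib import Path


def read_text_file(path, encoding="utf-8"):
    """Return the full content of a text file as one string."""
    return Path(path).read_text(encoding=encoding)


# Example (hypothetical file name):
# file_contents = read_text_file("sample.txt")
```

The result plays the same role as file_contents in the rest of the walkthrough.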
Before going further, you need to know which words and characters must be ignored. Some punctuation marks are used very frequently; the most common of them is the full stop (.).
Apart from punctuation, there are some words that I call uninteresting words. All the punctuation marks and uninteresting words are declared in two lists.
We need to exclude these words and punctuation marks so that they do not become part of the wordcloud. They are very common in any text file and would otherwise crowd out the words that matter.
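To make this concrete, here is what such lists could look like. The exact entries are my own illustrative choices, not a definitive set, and strip_punctuation is a hypothetical helper added only to show the punctuation list in use.

```python
# Illustrative lists; the exact entries are assumptions, extend them as needed.
punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
uninteresting_words = [
    "the", "a", "an", "to", "if", "is", "it", "of", "and", "or", "in",
    "at", "by", "as", "be", "was", "were", "this", "that", "with", "for",
]


def strip_punctuation(text):
    """Return text with every character listed in punctuations removed."""
    return "".join(ch for ch in text if ch not in punctuations)
```

For example, strip_punctuation("Hello, world.") gives "Hello world".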
calculate_frequencies() is the most important function. It first makes sure that punctuation marks and uninteresting words are ignored, and then stores the frequency of each remaining word in a dictionary.
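The counting step described above can be sketched as follows. This is a minimal version under a few assumptions of mine: the lists are shortened so the block runs on its own, words are lowercased so that "The" and "the" count as one word, and the function only returns the dictionary rather than drawing anything.

```python
# Assumed lists; shortened here so the block runs on its own.
punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
uninteresting_words = {"the", "a", "an", "to", "is", "it", "of", "and", "in"}


def calculate_frequencies(file_contents):
    """Count each word, ignoring punctuation and uninteresting words."""
    # Drop punctuation characters, then normalise case so "The" == "the".
    cleaned = "".join(ch for ch in file_contents if ch not in punctuations)
    frequencies = {}
    for word in cleaned.lower().split():
        # Skip uninteresting words and tokens that are not purely alphabetic.
        if word in uninteresting_words or not word.isalpha():
            continue
        frequencies[word] = frequencies.get(word, 0) + 1
    return frequencies
```

With the input "The cat and the hat. The cat!" this returns {"cat": 2, "hat": 1}. Keeping the counting separate from the drawing makes it easy to inspect the dictionary before any image is produced.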
Now we only need to call calculate_frequencies() to generate our very own wordcloud.
If you have followed all the steps carefully, this will display the required wordcloud.
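For reference, here is one way the drawing step could look when you already have a dictionary of word counts. It is a sketch, not necessarily how the script in this post does it: render_wordcloud is a hypothetical name, the image size and background colour are arbitrary choices, and it assumes the wordcloud and matplotlib libraries are installed.

```python
def render_wordcloud(frequencies, output_path=None):
    """Build a wordcloud image from a {word: count} dictionary."""
    # Imported lazily so the function can be defined without the libraries.
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)

    if output_path is not None:
        cloud.to_file(output_path)  # save as an image file, e.g. a PNG

    # Show the image without axes.
    plt.imshow(cloud.to_array(), interpolation="bilinear")
    plt.axis("off")
    plt.show()
    return cloud
```

A call such as render_wordcloud({"python": 5, "wordcloud": 3, "text": 1}) should draw "python" largest, since it has the highest count. Note that the dictionary must contain at least one word.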
This is just a basic wordcloud. You can try it with any other text file, or with content copied directly from the web.
I hope this helped you understand the basics of a wordcloud. Kindly subscribe to the Telegram channel for more updates.
Thanks!