
Getting Started with Python wordcloud


We have previously obtained the high-frequency words in “Journey to the West”, but a plain list of word counts is not very intuitive. Next, we describe how to use a word cloud to present this information.

First, we need to understand what a word cloud is. A word cloud is a visual presentation that highlights the “keywords” appearing most frequently in text data. It renders the keywords as a cloud-like color image, so that you can grasp the main meaning of the text at a glance. In this section, we use the wordcloud module.

Python wordcloud installation

First, we need to install the wordcloud module. Again, use the pip tool: enter “pip install wordcloud” on the command line and press Enter to start the installation, as shown in Figure 1.

Python install wordcloud module
Figure 1
After successful installation, we can see that the version of the currently installed wordcloud module is 1.5.0, as shown in Figure 2.

Python installed wordcloud module
Figure 2
Because the wordcloud module uses the matplotlib module, if this module has not been installed, an error will be reported when calling wordcloud, as shown in Figure 3.

Python install wordcloud module
Figure 3
So in order to use wordcloud, we also need to install matplotlib, as shown in Figure 4.

Python install matplotlib
Figure 4
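
Before running the program, it can help to confirm that the required modules can be imported. The following is a minimal sketch that uses only the standard library; the helper name missing_modules is hypothetical and is not part of wordcloud or pip.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The three modules used by the program in this section.
    missing = missing_modules(["jieba", "wordcloud", "matplotlib"])
    if missing:
        print("Please run: pip install " + " ".join(missing))
    else:
        print("All required modules are installed.")
```

If anything is reported as missing, install it with pip in the same way as shown above.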

Use of Python wordcloud

To use wordcloud, import the wordcloud module and then call wordcloud.WordCloud() to create a word cloud object. We can specify the font, image size, and background color of the word cloud.

We have added word cloud generation to the word segmentation code for Journey to the West. The code is shown below; the new parts are the import of wordcloud, the createWordCloud() function, and the call to it at the end of main().

import jieba
import wordcloud
def takeSecond(elem):
    return elem[1]
def createWordCloud(text):
    w = wordcloud.WordCloud(
        font_path="msyh.ttf", width=1000, height=500, background_color="white")
    w.generate(text)
    w.to_file("Journey to the West word cloud map.jpg")
def main():
    path = "Journey to the West.txt"
    file = open(path,"r",encoding="utf-8")
    text=file.read()
    file.close()
    words = jieba.lcut(text)
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        elif word == "Great Sage" or word=="Old Sun" or word=="Walker" or word=="Sun Dasheng" or word=="Sun Xingzhe" or word=="Monkey King" or word== "Wukong" or word=="Monkey King" or word=="Monkey":
            rword = "Sun Wukong"
        elif word == "Master" or word == "Sanzang" or word=="Saint Monk":
            rword = "Tang Monk"
        elif word == "nerd" or word=="Bajie" or word=="old pig":
            rword = "Pig Bajie"
        elif word=="Monk Sha":
            rword="Sand monk"
        elif word == "fairy" or word=="demon" or word=="demon":
            rword = "monster"
        elif word=="Buddha":
            rword="Tathagata"
        elif word=="Three Princes":
            rword="white horse"
        else:
            rword = word
        counts[rword] = counts.get(rword,0) + 1
    file = open("excludes.txt","r")
    excludes =file.read().split(",")
    file.close()
    for delWord in excludes:
        try:
            del counts[delWord]
        except KeyError:
            continue
    items = list(counts.items())
    items.sort(key = takeSecond, reverse=True)
    for i in range(min(20, len(items))):
        item=items[i]
        keyWord =item[0]
        count=item[1]
        print("{0:<10}{1:>5}".format(keyWord,count))
    createWordCloud(str(items[0:20]))
main()

As mentioned earlier, we first need to import the wordcloud module. Then we add a new custom function called createWordCloud(). Its parameter, which we call text, is the text from which the word cloud image is generated.

In this function, a word cloud object is created first, with the font path set to “msyh.ttf”, an image width of 1000 pixels, a height of 500 pixels, and a white background. Then the generate() method is called to build the word cloud from the text passed in. Finally, to_file() writes the word cloud image to the specified file.

Remember, if the text contains Chinese, we must set the font path to a font that supports Chinese. Otherwise the characters are rendered as boxes instead of text, as shown in Figure 5.

Python install wordcloud module
Figure 5
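
Because a missing font file is an easy mistake to make, it may be worth checking the font path before building the word cloud. The sketch below is only an illustration: the helper pick_font is hypothetical, and the candidate paths are assumptions that should be adjusted to wherever a Chinese-capable font lives on your system.

```python
import os


def pick_font(candidates):
    """Return the first path in `candidates` that exists as a file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # Candidate locations are assumptions, not guaranteed to exist anywhere.
    font = pick_font(["msyh.ttf", "C:/Windows/Fonts/msyh.ttc"])
    if font is None:
        print("No font file found; Chinese words would be drawn as boxes.")
    else:
        print("Using font:", font)
```

The returned path can then be passed as the font_path argument of wordcloud.WordCloud().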
In the main() function, this newly added function is called: the first 20 high-frequency words in the list items are converted into a single string and passed to it as the parameter.
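
One caveat: generate() tokenizes the string it receives and counts the words itself, so the counts stored in items may not carry over when the list is passed as a string. The wordcloud library also provides generate_from_frequencies(), which accepts a dictionary of word counts directly. Here is a minimal sketch under that assumption; the helper names top_frequencies and create_word_cloud_from_counts are hypothetical.

```python
def top_frequencies(counts, n=20):
    """Return a dict holding the n most frequent words and their counts."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])


def create_word_cloud_from_counts(frequencies, out_path, font_path="msyh.ttf"):
    """Draw a word cloud whose word sizes follow the given counts."""
    # Imported here so that top_frequencies() works without wordcloud installed.
    import wordcloud

    w = wordcloud.WordCloud(
        font_path=font_path, width=1000, height=500, background_color="white")
    w.generate_from_frequencies(frequencies)
    w.to_file(out_path)
```

In main(), the last line could then become create_word_cloud_from_counts(top_frequencies(counts), "Journey to the West word cloud map.jpg"), so that the size of each word follows the counts computed earlier.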

After running the program, you can find the word cloud image at the specified path. You can see that the more frequent a word is, the larger and more eye-catching its font, as shown in Figure 6.


Figure 6
Through this word cloud, we can see the important characters in “Journey to the West” at a glance. The whole novel revolves around Tang Monk and his three disciples, whose core task is to “obtain scriptures”. Along the way they keep running into “Monster” and “Little Demon”, but fortunately they have important backup in “Bodhisattva” and “Tathagata”.

At this point, our word segmentation analysis of the novel “Journey to the West” is complete. You can also pick some texts you like and use jieba word segmentation to analyze their main characters and plots.
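
To reuse the approach on another text, the counting step can be separated from the novel-specific names. The sketch below is one possible way to do it, not the code used above: count_words is a hypothetical helper, and the alias table replaces the long elif chain with a dictionary lookup. The tokens would come from jieba.lcut(text).

```python
from collections import Counter


def count_words(tokens, aliases=None, excludes=()):
    """Count tokens, merging aliases and skipping single characters and excluded words."""
    aliases = aliases or {}
    excluded = set(excludes)
    counts = Counter()
    for token in tokens:
        if len(token) == 1:
            continue  # single characters are rarely meaningful keywords
        word = aliases.get(token, token)  # map an alias to its canonical name
        if word in excluded:
            continue
        counts[word] += 1
    return counts


if __name__ == "__main__":
    # Made-up tokens for illustration only, not real output from the novel.
    sample = ["Wukong", "Monkey King", "Master", "Wukong", "said"]
    aliases = {"Wukong": "Sun Wukong", "Monkey King": "Sun Wukong",
               "Master": "Tang Monk"}
    for word, count in count_words(sample, aliases, excludes=["said"]).most_common(20):
        print("{0:<10}{1:>5}".format(word, count))
```

Keeping the aliases and excluded words as data makes it easy to swap in a different table for each text you analyze.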
