Identifying SEM keywords from a technical text

Identifying keywords for Search Engine Marketing (SEM) or on-site Search Engine Optimisation (SEO) is a basic task for many marketers. Usually, keywords come easily once you ask “What is my text about?” or “How would a customer search for this text?”. However, identifying keywords gets tricky when the marketer is not a subject-matter expert on the content that he or she is advertising. This post introduces a text analysis tool that helps non-experts extract meaningful keywords from a technical text.

Text analysis applies machine learning to understand and structure text. If you are a well-versed Python or R programmer, you can set up your own text analysis in a matter of minutes. However, if you are new to programming, you can use a service such as MonkeyLearn, which aims to make machine learning accessible to anyone.

The MonkeyLearn browser interface

In this example, I want to retrieve SEM keywords from a text that describes a method to manufacture tin sulfide solar panels. As I am not a solar panel engineer, I find it daunting to identify meaningful keywords in such a highly technical text. MonkeyLearn’s keyword extractor comes to the rescue. Paste your text into the text box, click “Extract Text”, and receive a list of meaningful keywords within a few seconds.

It can be difficult to identify meaningful keywords for SEM or SEO campaigns when you don’t understand the text from which you want to retrieve the keywords. Text analysis tools such as MonkeyLearn’s Keyword Extractor provide an efficient way to take the guesswork out of keyword generation. You end up with keywords that lead your audience to exactly the kind of text they have been looking for.

Programmatic keyword generation

We could leave it at that and turn back to our marketing efforts. However, MonkeyLearn offers an API that allows us to programmatically analyze lots of technical texts, set parameters, and retrieve a larger number of keywords per text (the default is 10).

The complete code is available in a Jupyter notebook on GitHub. Here is the workflow in plain English:

  • import the MonkeyLearn library
  • import the json library (to wrangle the JSON results of the API)
  • retrieve the API key (available from your MonkeyLearn account)
  • read your text input (into a list)
  • set the id of the model you want to run
  • make the API request
  • retrieve the keywords (and supporting information, e.g., the number of appearances in the text)

To get started, install the MonkeyLearn Python library from your terminal:

$ pip install monkeylearn

Set up a Jupyter notebook and load the libraries:

from monkeylearn import MonkeyLearn
import json

In order to use the MonkeyLearn API, we need an API key. When you are logged into your MonkeyLearn account, click the account icon (the circle in the upper right corner) to open your account settings; your personal API key is shown in the menu that opens. The key is also displayed in the app of the Keyword Extractor.

I like to store my API key in a text file locally (which I exclude from GitHub upload). Read the key back into memory using:

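with open("./api_key.txt", 'r') as infile:
    # strip() removes a trailing newline that would otherwise end up in the key
    api_key = infile.read().strip()
# Create the API client
monkeylearn = MonkeyLearn(api_key)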
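
If you would rather not keep a key file around, an environment variable works as well. The following is a minimal sketch and not part of the original notebook; the variable name MONKEYLEARN_API_KEY and the helper load_api_key are assumptions for illustration, so rename them as you like.

```python
import os
from pathlib import Path


def load_api_key(env_var="MONKEYLEARN_API_KEY", key_file="./api_key.txt"):
    """Return the API key from an environment variable, else from a local file."""
    # Hypothetical variable name; nothing in MonkeyLearn requires this one
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        # strip() drops the trailing newline most editors append
        return path.read_text(encoding="utf-8").strip()
    raise RuntimeError(f"No API key found in {env_var} or {key_file}")
```

With this helper in place, the client could be created with MonkeyLearn(load_api_key()).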

Next, I load my text input. In my example of the article on solar panel manufacturing, I included the title, abstract, introduction, and discussion of the scientific article.

infile_path = "./text_input.txt"
with open(infile_path,'r') as infile:
    # Read the whole file into a single string
    text_data = infile.read()
    # Replace newline characters with spaces
    text_data = text_data.replace("\n", " ")
# The keyword extractor expects input in a list
data = [text_data]
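
Since the extractor takes a list, the same pattern extends to several texts at once. Here is a minimal sketch, assuming all input files sit in one folder; the folder name ./texts and the helper load_texts are hypothetical and not taken from the notebook.

```python
from pathlib import Path


def load_texts(folder, pattern="*.txt"):
    """Read every matching file into one whitespace-normalized string each."""
    texts = []
    # sorted() keeps the order stable, so results can be matched back to files
    for path in sorted(Path(folder).glob(pattern)):
        raw = path.read_text(encoding="utf-8")
        # split() + join collapses newlines, tabs, and repeated spaces
        texts.append(" ".join(raw.split()))
    return texts
```

Calling data = load_texts("./texts") would then give one list entry per file, ready to be passed to the extractor.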

It’s time for the API call:

# Specify the model (i.e., the keyword extractor)
model_id = 'ex_YCya9nrn'
# Set up the API call
result = monkeylearn.extractors.extract(model_id, data, 
    extra_args={'max_keywords':100, 'lowercase':1})

Note the extra arguments to the extractor model. Here, I request 100 keywords and set the lowercase argument to 1, i.e., True (the default is False). At this point, we could provide additional options. For instance, we can set the text transformation to stemming instead of the default lemmatization, or provide a blacklist (e.g., to exclude keywords whose cost per click is too high). The options are listed in the keyword extractor’s documentation.

To get familiar with the API response for the extractor call, you might want to print the entire result:

print(result.body)
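
A nested response is easier to read when it is indented. This is where the json library comes in handy; a small sketch, assuming result.body is a plain list of dictionaries (which is how it is indexed further down). The helper name format_body is made up for this example.

```python
import json


def format_body(body):
    """Return the response body as indented JSON text for easier reading."""
    # ensure_ascii=False keeps non-ASCII characters readable in the output
    return json.dumps(body, indent=2, ensure_ascii=False)
```

In the notebook, this would be used as print(format_body(result.body)).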

To retrieve only the information that we need, we loop through the ‘extractions’ and grab each keyword, its count, and its relevance:

# Create a list of tuples containing 
# the keyword, its count, and its relevance
keywords = []
for keyword in result.body[0]['extractions']:
    keywords.append((keyword['parsed_value'], keyword['count'], keyword['relevance']))
# Display the results (a bare expression is shown as cell output in Jupyter)
keywords
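
The loop above reads only the first entry of the response, which is all we need for a single text. When the input list holds several texts, the same idea can be applied to every entry. A minimal sketch, assuming the response contains one entry per input text; the helper collect_keywords and the sample data are hypothetical, and the sample values are made up rather than real API output.

```python
def collect_keywords(body):
    """Return one list of (keyword, count, relevance) tuples per input text."""
    per_text = []
    for entry in body:
        rows = [
            (kw["parsed_value"], kw["count"], kw["relevance"])
            # "or []" guards against a missing or empty extractions field
            for kw in entry.get("extractions") or []
        ]
        per_text.append(rows)
    return per_text


# Hand-written stand-in for result.body (values are invented for illustration)
sample_body = [
    {"extractions": [{"parsed_value": "tin sulfide", "count": 4, "relevance": 0.95}]},
    {"extractions": []},
]
all_keywords = collect_keywords(sample_body)
```

With a real response, collect_keywords(result.body) would replace the sample data.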

The result is a list of keywords, each with the number of times it appears in the text and a relevance score. The relevance score (between 0 and 1) is high for words that are common in the data we provided but rare in the English language; such words are likely good keywords to describe the content of our text sample.
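
For a campaign, it helps to rank the keywords by relevance and hand them over as a spreadsheet. The sketch below is one way to do that and is not taken from the notebook: rank_keywords and write_csv are hypothetical helpers, and the blacklist here is a local filter applied after the API call, not the extractor's own blacklist option.

```python
import csv


def rank_keywords(keywords, min_relevance=0.0, blacklist=()):
    """Filter (keyword, count, relevance) tuples and sort by relevance, highest first."""
    blocked = {word.lower() for word in blacklist}
    kept = [
        (kw, count, float(rel))  # float() also accepts scores delivered as strings
        for kw, count, rel in keywords
        if kw.lower() not in blocked and float(rel) >= min_relevance
    ]
    return sorted(kept, key=lambda row: row[2], reverse=True)


def write_csv(rows, path):
    """Write the ranked keywords to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["keyword", "count", "relevance"])
        writer.writerows(rows)
```

In the notebook, this could be called as write_csv(rank_keywords(keywords, min_relevance=0.5), "./keywords.csv"); the threshold of 0.5 is an arbitrary example, not a recommendation.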

Conclusions

Text analysis is a convenient approach to extracting SEM keywords from a technical text. Services such as MonkeyLearn allow marketers to conduct text analysis without any programming skills. Accessing the MonkeyLearn API allows programmatic text analysis with more control over the model and the option to retrieve a higher number of keywords.
