ImperialToMetric.com – Education, Methodology, Social Sciences and Technology


How to add titles to a sitemap (free solution)

Updated on: 12/09/2023

Sitemaps are an essential tool for website navigation, organization, and search engine optimization. A sitemap is a hierarchical representation of the pages and content on a website, which helps both users and search engines understand the structure of a website and how its pages are related to one another.

Here's an example of a small sitemap that would include a title:

Sitemap with a title

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://justanexample.com/index.html</loc>
    <title>The title of this particular webpage</title>
  </url>
</urlset>
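To see why a sitemap like the one above may be flagged, note that the sitemaps.org 0.9 protocol only defines loc, lastmod, changefreq and priority as children of <url> (extensions have to live in their own namespace). The following Python sketch, which uses only the standard library, lists any undefined elements found in the sitemap namespace. The function name nonstandard_tags is hypothetical, and this is a simplified check, not a full schema validation.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Children of <url> defined by the sitemaps.org 0.9 protocol
STANDARD_TAGS = {"loc", "lastmod", "changefreq", "priority"}


def nonstandard_tags(sitemap_xml):
    """Return local names of <url> children in the sitemap namespace that the protocol does not define."""
    root = ET.fromstring(sitemap_xml)
    found = set()
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        for child in url:
            # ElementTree stores namespaced tags as "{namespace}localname"
            namespace, _, local = child.tag.rpartition("}")
            # Only elements in the sitemap namespace are checked; extensions
            # (e.g. image sitemaps) use their own namespaces and are skipped
            if namespace.lstrip("{") == SITEMAP_NS and local not in STANDARD_TAGS:
                found.add(local)
    return found


example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://justanexample.com/index.html</loc>
<title>The title of this particular webpage</title>
</url>
</urlset>"""

print(nonstandard_tags(example))  # {'title'}
```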

That being said, the sitemaps.org protocol does not define a <title> element, so search engines and sitemap validators may treat a sitemap containing titles as erroneous. You should therefore avoid adding titles to the sitemap you submit to search engines. One reason you may still want titles, though, is if you coded a search bar for your website that searches through a particular sitemap: when users enter words in the search field, both the titles and the URLs are searched and referenced in the displayed results. If this is what you are trying to do, the following explanations are for you. Remember, though, to keep your existing online sitemap.xml as is (without titles) to avoid problems with search bots (Google, Bing, etc.).
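The search idea described above can be sketched in Python. A real search bar would typically run in the browser (JavaScript) or on your server, so treat this only as an illustration of the matching logic: it assumes the titled sitemap format shown earlier and keeps every entry whose title or URL contains all the words typed by the user. The function name search_sitemap and the sample pages are hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix "sm" mapped to the sitemap namespace declared on <urlset>
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def search_sitemap(sitemap_xml, query):
    """Return (url, title) pairs whose title or URL contains every word of the query (case-insensitive)."""
    words = query.lower().split()
    results = []
    for url in ET.fromstring(sitemap_xml).findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        title = url.findtext("sm:title", default="", namespaces=NS).strip()
        searchable = f"{title} {loc}".lower()
        # An empty query matches nothing rather than everything
        if words and all(word in searchable for word in words):
            results.append((loc, title))
    return results


# Hypothetical sample in the sitemap_titles.xml format
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://justanexample.com/index.html</loc><title>Welcome page</title></url>
<url><loc>https://justanexample.com/inches.html</loc><title>Converting inches to centimetres</title></url>
</urlset>"""

print(search_sitemap(sample, "inches centimetres"))
# [('https://justanexample.com/inches.html', 'Converting inches to centimetres')]
```

Requiring every word to match keeps results narrow as the user types more words; a substring test is the simplest choice, and a real site might prefer word-boundary or fuzzy matching.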

Here is how to create a separate sitemap file called "sitemap_titles.xml" that only your search bar would use (this file includes titles as well as URLs):

  • Install Python, the programming language used in this script. You can download the latest version of Python for Windows from the official Python website: https://www.python.org/downloads/windows/. The website provides an executable installer that makes it easy to install Python on a Windows machine. During installation, make sure to check the option "Add Python to PATH" so that you can run Python from the command line.
  • You also need to install the following dependencies:

Dependencies required

Requests: a Python library for making HTTP requests.
BeautifulSoup4: a Python library for parsing HTML and XML documents.
lxml: a Python library for processing XML and HTML.



    • To install the dependencies, open your command terminal (with administrator privileges) and run the following commands:
      • pip install requests
      • pip install beautifulsoup4
      • pip install lxml
  • Once all the dependencies are installed and ready to rock, type python in your command prompt and press Enter to start the Python interpreter.
  • Once in Python, copy the following code, but do not run it yet:

Python code

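from bs4 import BeautifulSoup
import requests
import lxml.etree as ET

# URL of your original sitemap (the one WITHOUT titles)
sitemap_url = "https://YourWebsiteUrlGoesHere/sitemap.xml"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'}
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Download the original sitemap and stop with an error if it cannot be fetched
response = requests.get(sitemap_url, headers=headers, timeout=10)
response.raise_for_status()
sitemap_soup = BeautifulSoup(response.content, 'lxml-xml')

# New <urlset> root, with the sitemap namespace declared as the default namespace
root = ET.Element("{%s}urlset" % NS, nsmap={None: NS})
for loc in sitemap_soup.find_all('loc'):
    url = loc.text.strip()
    # Fetch each page and read its <title>; a page that fails to load gets "No title found"
    try:
        page = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException:
        title_text = "No title found"
    else:
        title = BeautifulSoup(page.content, 'lxml').find('title')
        title_text = title.get_text(strip=True) if title is not None else "No title found"
    child = ET.SubElement(root, "{%s}url" % NS)
    ET.SubElement(child, "{%s}loc" % NS).text = url
    ET.SubElement(child, "{%s}title" % NS).text = title_text

# Write the new sitemap; change the path to save it elsewhere
tree = ET.ElementTree(root)
tree.write("C:/sitemap_titles.xml", pretty_print=True, xml_declaration=True, encoding='UTF-8')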

    • Before running the code, replace the placeholder in the line sitemap_url = "https://YourWebsiteUrlGoesHere/sitemap.xml" with the URL of your original sitemap (the one without titles). It is easiest to make this change in a text editor such as Notepad and then paste the edited code at the Python prompt.
    • In the last line of the code, you can also change the location where the new sitemap_titles.xml file (which will include titles) is saved. Note that saving directly to C:/ may require administrator privileges.
    • Once the modifications are done, paste the code at the Python prompt and press the Enter key to run it.
    • Because the script downloads every page listed in your sitemap, it may take a while to finish (a few minutes or more for a large sitemap)… so you may want to go get yourself a coffee or tea.
    • When the process is over, the new sitemap_titles.xml file will be available in your chosen location (C:\ or another folder).
    • To exit Python, type exit() or simply close your command terminal. Voilà!
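Once the script has finished, a quick way to check the result is to read sitemap_titles.xml back and list the pages for which no title was found. The sketch below uses Python's standard library and assumes the file was saved to C:/sitemap_titles.xml, as in the script's last line, and that pages without a title were recorded as "No title found". The function name parse_titles is hypothetical.

```python
import os
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_titles(xml_bytes):
    """Return the (url, title) pairs stored in a sitemap_titles.xml document."""
    root = ET.fromstring(xml_bytes)
    pairs = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        title = url.findtext("sm:title", default="", namespaces=NS)
        pairs.append((loc, title))
    return pairs


# Same location as in the script's last line; change it if you saved the file elsewhere
path = "C:/sitemap_titles.xml"
if os.path.exists(path):
    # Read as bytes so the UTF-8 encoding in the XML declaration is honoured
    with open(path, "rb") as f:
        pairs = parse_titles(f.read())
    missing = [loc for loc, title in pairs if title == "No title found"]
    print(f"{len(pairs)} URLs read, {len(missing)} without a title")
    for loc in missing:
        print("  no title:", loc)
else:
    print("File not found:", path)
```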

Regards.

Robert



Robert Radford, M.A., Québec (Canada) © MMXXIII.
All rights reserved.