Grab Meta Tags (Title/Description) from Sites with Node.js

While creating Meta Preview Tool, I had to build a way to grab a page’s metadata to improve the site’s experience. When users paste a link into a site, they expect a preview of it: apps like Discord, iMessage, and Messenger all show link previews as soon as a URL is entered.

That’s why in this quick post I’ll show you how to grab a web page’s metadata using Node.js and Cheerio. With these tools, you can fetch and parse the metadata of most web pages.

Let’s dive in.

Full Code Snippet

Since you’re likely here for code, here’s my full code for grabbing meta title, description, and og:image using Node and Cheerio.

Why Use Node and Cheerio for Meta Tag Retrieval?

Unfortunately, we can’t simply fetch another site’s HTML with plain JavaScript in the browser: the same-origin policy blocks reading cross-origin responses unless the target site explicitly allows it through CORS headers. That’s where this script comes in: by calling fetch on the server side, we can read a page’s HTML and pull out its metadata to present however we want in our apps.

Node.js handles asynchronous I/O such as network requests efficiently, which makes it a good fit for fetching pages and retrieving their meta tags. Whether you’re using a Cloudflare Worker or a custom app on Vercel, you can use this approach to grab metadata, a classic web-scraping task. We’ll also use Cheerio, a lightweight library that brings familiar jQuery-style syntax to the server side, making it easy to parse HTML and extract data.

Explaining the Code

Importing Cheerio

First we need to install Cheerio from npm (npm install cheerio) and import it. Cheerio lets us traverse and manipulate the parsed HTML much like the browser DOM, making it easier to find the meta tags we’re after.

Validating the URL

Before fetching anything, we need a proper URL. If the input doesn’t start with “http://” or “https://”, we prepend “https://” so that fetch receives an absolute URL (for example, when a user types just example.com). I found this to be a simple way to improve the experience. If you want stricter validation, you could also use isURL from the validator package, but I wanted to keep this as simple as possible.
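A minimal sketch of this step follows. normalizeUrl is a hypothetical helper name, and the extra check with the built-in URL constructor (which throws a TypeError on malformed input) is an addition beyond the prepend-https behavior described above. Returning null lets the caller reject bad input before making any request.

```javascript
// Sketch of the URL step. normalizeUrl is a hypothetical helper name.
// Returns an absolute http(s) URL string, or null if the input can't be parsed.
function normalizeUrl(input) {
  const trimmed = String(input).trim();
  if (trimmed === "") return null;

  // Prepend https:// when no http:// or https:// scheme is present.
  const candidate = /^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;

  try {
    // The WHATWG URL constructor throws on malformed input (e.g. spaces in the host).
    new URL(candidate);
    return candidate;
  } catch {
    return null;
  }
}
```

For example, normalizeUrl("example.com") returns "https://example.com", while an input such as "not a url" returns null because the URL constructor rejects a host containing spaces.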

Fetching the HTML

With our URL ready, we can fetch the page’s HTML. Using the Fetch API, we send the request, read the response body as text, and hand that HTML to Cheerio. Note that some sites block bots, so larger-scale apps will likely want to use a proxy or a service like ScrapingBee.
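Here is a hedged sketch of the fetch step, assuming Node.js 18 or later (built-in fetch and AbortSignal.timeout). The fetchHtml name, the User-Agent string, and the 10-second timeout are illustrative choices, not part of the original code. Sending an identifiable User-Agent may help with sites that reject anonymous clients, but it won’t replace a proxy for sites that actively block scrapers.

```javascript
// Sketch of the fetch step, assuming Node.js 18+ where fetch is built in.
// The User-Agent string and 10-second timeout are illustrative assumptions.
async function fetchHtml(url) {
  const response = await fetch(url, {
    headers: {
      // Some sites may reject requests that don't identify the client.
      "User-Agent": "Mozilla/5.0 (compatible; MetaPreviewBot/1.0)",
      Accept: "text/html",
    },
    // Abort requests to slow or unresponsive hosts.
    signal: AbortSignal.timeout(10_000),
  });

  // fetch rejects only on network failures or aborts, not on HTTP error statuses.
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  return response.text();
}
```

Because fetch resolves normally for 4xx and 5xx responses, the response.ok check is what turns a 404 or 500 into an error the caller can catch.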

Parsing HTML with Cheerio

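Now we load the HTML into Cheerio. Its load function parses the HTML string and returns a jQuery-like $ function that we can use to query the document and extract its metadata.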

Extracting Metadata

With the parsed document in place, we can look for the page’s meta title, meta description, and Open Graph image. Here’s how:

  • Title: We read the text content of the <title> tag.
  • Description: Next, we check the meta tags for a description. The function looks for the standard <meta name="description"> first and falls back to the Open Graph <meta property="og:description">.
  • Image: The image adds visual flair to a preview. We look for <meta property="og:image"> and fall back to Twitter’s <meta name="twitter:image">.

If a tag isn’t found, its field defaults to an empty string, which keeps our data tidy and predictable.

Navigating Errors Gracefully

Unfortunately, web scraping isn’t always so straightforward. Errors can range from an unreachable website to an unexpected HTML structure. That’s why the script is wrapped in a try/catch. If the fetch operation hits a snag, we log the error with a clear message.

By catching these errors early, we prevent unexpected breakdowns and can return a consistent, empty metadata object.
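One way to keep that guarantee is a small wrapper that turns any thrown error into the empty metadata object. withMetadataFallback and its loader parameter (any async function that returns metadata for a URL) are hypothetical names used for illustration.

```javascript
// Sketch of the error-handling pattern: any failure yields the same empty shape.
// withMetadataFallback and loader are hypothetical names.
const EMPTY_METADATA = Object.freeze({ title: "", description: "", image: "" });

async function withMetadataFallback(url, loader) {
  try {
    // "return await" keeps a rejected promise inside this try block.
    return await loader(url);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    // Log enough context to debug later without crashing the request.
    console.error(`Failed to get metadata for ${url}: ${message}`);
    return { ...EMPTY_METADATA };
  }
}
```

The return await matters here: returning the loader’s promise without await would let a rejection escape the catch block.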

Wrapping It Up

As you can see, fetching metadata with Node.js and Cheerio is fairly simple. By handling URLs, fetching content, and parsing, we can easily package essential metadata into previews that capture attention.

Of course, you’ll need to expose this as an API endpoint (which is fairly simple with something like Next.js). Your application can then call that endpoint and show a preview of any link a user shares in your app.
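As a sketch of that endpoint, the function below uses only the standard Request and Response classes, which are available globally in Node.js 18+, Next.js route handlers, and Cloudflare Workers. handleMetadataRequest, the url query parameter, and the injected getMetadata function are assumptions for illustration.

```javascript
// Framework-agnostic sketch of a metadata endpoint using standard Request/Response.
// handleMetadataRequest and the "url" query parameter name are assumptions.
async function handleMetadataRequest(request, getMetadata) {
  const target = new URL(request.url).searchParams.get("url");

  if (!target) {
    return jsonResponse({ error: "Missing ?url= query parameter" }, 400);
  }

  // getMetadata is any async function returning { title, description, image }.
  const metadata = await getMetadata(target);
  return jsonResponse(metadata, 200);
}

// Small helper so every response carries a JSON content type.
function jsonResponse(data, status) {
  return new Response(JSON.stringify(data), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}
```

In a Next.js App Router project, for instance, a route file such as app/api/metadata/route.js could export async function GET(request) and return handleMetadataRequest(request, getMetadata) from it.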
