While building Meta Preview Tool, I needed a way to grab a page’s metadata to improve the site’s experience. When users paste links into a site, they expect a preview of the URL. Apps like Discord, iMessage, and Messenger all show previews of links as they’re entered.
In this post I’ll show you how to grab metadata for a webpage using Node.js and Cheerio. With these tools, you can fetch and parse metadata from any web page.
Full Code Snippet
Since you’re likely here for code, here’s my full code for grabbing meta title, description, and og:image using Node and Cheerio. Install Cheerio first if you haven’t already:
```bash
npm install cheerio
```
Why Server-Side JavaScript?
You can’t fetch another site’s HTML from the browser. Cross-origin security restrictions block those requests unless the target site explicitly allows it, which most don’t. Running the fetch server-side sidesteps this entirely.
Node.js handles async HTTP requests well and deploys easily anywhere. The script works as a Cloudflare Worker, a Vercel serverless function, or any standard Node server without changes.
Explaining the Code
Importing Cheerio
Cheerio parses raw HTML on the server using jQuery-style selectors. It’s fast, lightweight, and has no browser dependencies. One thing to know upfront: Cheerio doesn’t execute JavaScript. It only reads the HTML the server returns. For most sites that’s fine since meta tags are in the static HTML, but more on that in the error section.
Validating the URL
Before fetching anything, we need a proper URL. If the input doesn’t start with “http://” or “https://”, we prepend “https://” to it. I found this to be a simple way to improve the experience. If you want to be more thorough, use isURL from the validator package, but for most cases the basic check is enough.
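The normalization step can be as small as this (the `normalizeUrl` name is mine):

```javascript
// Prepend https:// when the user omits the scheme.
function normalizeUrl(input) {
  const trimmed = input.trim();
  return trimmed.startsWith('http://') || trimmed.startsWith('https://')
    ? trimmed
    : `https://${trimmed}`;
}

console.log(normalizeUrl('example.com')); // "https://example.com"
```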
Fetching the HTML
With the URL ready, we fetch the page HTML using the Fetch API. Note that some sites block automated requests by checking the User-Agent header or running Cloudflare bot protection. For a personal project or low-volume tool this works fine. For higher-volume use, you’ll want a rotating proxy or a service like ScrapingBee.
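A minimal version of the fetch step might look like this; the browser-like User-Agent header is my own addition and only helps with some sites, not Cloudflare-level protection:

```javascript
// Fetch the raw HTML for a page, failing loudly on non-2xx responses.
async function fetchHtml(url) {
  const res = await fetch(url, {
    // Some sites reject requests without a browser-like User-Agent.
    headers: { 'User-Agent': 'Mozilla/5.0 (compatible; MetaPreviewBot/1.0)' },
  });
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.text();
}
```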
Parsing HTML with Cheerio
Once we have the raw HTML, Cheerio loads it into a DOM-like structure we can query with selectors.
Extracting Metadata
With the DOM ready, we pull three fields:
- Title: Grabs the `<title>` tag text.
- Description: Checks `<meta name="description">` first, then falls back to `<meta property="og:description">`.
- Image: Checks `<meta property="og:image">` first, then falls back to `<meta name="twitter:image">`.
Any field that’s missing defaults to an empty string. Here’s what a successful result looks like:
```json
{
  "title": "Example Domain",
  "description": "This domain is for illustrative examples.",
  "image": "https://example.com/og-image.png"
}
```
One thing to watch: og:image values are sometimes relative paths rather than absolute URLs. If you’re displaying the image in your app, check whether it starts with “http” and resolve it against the page’s base URL if not.
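One way to handle this, using the WHATWG URL constructor rather than a manual “http” prefix check (the helper name is mine):

```javascript
// Resolve a possibly-relative image path against the page it came from.
// new URL(value, base) leaves absolute URLs untouched.
function resolveImageUrl(image, pageUrl) {
  if (!image) return '';
  return new URL(image, pageUrl).href;
}

console.log(resolveImageUrl('/og-image.png', 'https://example.com/blog/post'));
// "https://example.com/og-image.png"
```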
Navigating Errors Gracefully
The function wraps everything in a try/catch. If the fetch fails or the HTML is unparseable, it logs the error and returns an empty metadata object. This keeps your app from crashing when it hits an unreachable or malformed URL.
One limitation worth knowing: this won’t work for single-page applications that inject meta tags with JavaScript after page load. Cheerio only sees the raw HTML from the server. If a site’s <title> or meta description is set by React or Vue on the client, you’ll get empty strings back. For those cases you’d need a headless browser like Puppeteer.
Wiring It Up as an API Endpoint
The function is meant to run on a server, not in the browser. Here’s how to expose it as an API route in Next.js so your frontend can call it:
```javascript
// app/api/meta/route.js
import { getMetaData } from '@/lib/getMetaData';

export async function GET(request) {
  const { searchParams } = new URL(request.url);
  const url = searchParams.get('url');
  if (!url) {
    return Response.json({ error: 'url parameter required' }, { status: 400 });
  }
  const meta = await getMetaData(url);
  return Response.json(meta);
}
```
Then call it from your frontend:
```javascript
const res = await fetch(`/api/meta?url=${encodeURIComponent(inputUrl)}`);
const meta = await res.json();
```
If you’re on Cloudflare Workers the pattern is the same, just swapping in Cloudflare’s request handler format.
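For reference, a sketch of that Workers-style handler shape; `getMetaData` is stubbed here so the snippet stands alone, and in a real Worker the object would be the module’s default export:

```javascript
// Stub standing in for the getMetaData function from earlier in the post.
const getMetaData = async () => ({ title: '', description: '', image: '' });

// In a real Worker: export default { async fetch(request) { ... } }
const worker = {
  async fetch(request) {
    const { searchParams } = new URL(request.url);
    const url = searchParams.get('url');
    if (!url) {
      return Response.json({ error: 'url parameter required' }, { status: 400 });
    }
    return Response.json(await getMetaData(url));
  },
};
```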
Wrapping It Up
For most link preview use cases, this approach works well. Cheerio is lightweight, the setup is minimal, and the fallback logic handles the messiness of real-world HTML. The main edge case is JavaScript-rendered sites. If those make up a big portion of your URLs, reach for Puppeteer instead. Otherwise this is a solid starting point and has worked well in production for Meta Preview Tool.