Skip to main content

Headless WebKit now means Headless Chromium Chrome instead of headless PhantomJS

Announcement


Bad news for beloved PhantomJS:

Headless mode allows running Chromium in a headless/server environment. Expected use cases include loading web pages, extracting metadata (e.g., the DOM) and generating bitmaps from page contents -- using all the modern web platform features provided by Chromium and Blink.

To use headless, start Chrome with a command line flag:

 $ chrome --headless --remote-debugging-port=9222 https://chromium.org

Chrome Platform Status

Puppeteer

Puppeteer is a Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome or Chromium.

GitHub

Usage from Node.js


For example, the chrome-remote-interface Node.js package can be used to extract a page's DOM like this:

const CDP = require('chrome-remote-interface');

CDP((client) => {
  // Extract used DevTools domains.
  const {Page, Runtime} = client;

  // Enable events on domains we are interested in.
  Promise.all([
    Page.enable()
  ]).then(() => {
    return Page.navigate({url: 'https://example.com'});
  });

  // Evaluate outerHTML after page has loaded.
  Page.loadEventFired(() => {
    Runtime.evaluate({expression: 'document.body.outerHTML'}).then((result) => {
      console.log(result.result.value);
      client.close();
    });
  });
}).on('error', (err) => {
  console.error('Cannot connect to browser:', err);
});

Google Source




Comments