HTML-Scraping with RegEx

by ps1 | October 20, 2011

Scraping Website Data with PowerShell

To scrape information from a website with PowerShell, you can download the page's HTML code and then use a regular expression to extract the parts you are after. Here is a sample:

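# DownloadString already returns a single string, so no Out-String is needed
$webclient = New-Object System.Net.WebClient
$html = $webclient.DownloadString('http://www.cnn.com')

# (?i) = case-insensitive, (?s) = let "." span line breaks;
# [^>]* also accepts <h1> tags that carry attributes
$headerpattern = '(?is)<h1\b[^>]*>(.*?)</h1>'

# Capture group 1 holds the text between the opening and closing tag
$header = ([regex]$headerpattern).Matches($html) |
  ForEach-Object { $_.Groups[1].Value }
$header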

Downloads the HTML Content

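The script downloads the HTML content from www.cnn.com and then extracts the text of every <h1>…</h1> header. That way, you get a quick headline overview. Keep in mind that a regular expression only sees text, not document structure, so the pattern may need adjusting if the site changes its markup.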
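
To try the extraction step without touching the network, here is a minimal sketch that runs a tolerant variant of the header pattern against a made-up HTML string. The sample markup and the variable names ($sampleHtml, $pattern, $headers) are illustrative assumptions, not part of the original tip:

```powershell
# Hypothetical sample markup standing in for a downloaded page (no network needed)
$sampleHtml = @'
<html><body>
<h1>First headline</h1>
<p>Some text</p>
<H1 class="top">Second
headline</H1>
</body></html>
'@

# (?i) ignores case, (?s) lets "." match line breaks,
# and [^>]* tolerates attributes inside the opening tag
$pattern = '(?is)<h1\b[^>]*>(.*?)</h1>'

# Take capture group 1 of every match, then collapse whitespace
# so a header that wraps across lines comes back as one line
$headers = [regex]::Matches($sampleHtml, $pattern) |
  ForEach-Object { ($_.Groups[1].Value -replace '\s+', ' ').Trim() }

$headers
```

With this sample input, $headers should contain the two strings "First headline" and "Second headline". The lazy quantifier (.*?) matters here: a greedy (.*) would run from the first opening tag to the last closing tag and return a single oversized match.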
