Scraping Website Data with PowerShell
To scrape information from a website with PowerShell, you can download the page's HTML and then use a regular expression to extract the parts you are after. That’s not hard. Here is a sample:
$webclient = New-Object System.Net.WebClient
$html = $webclient.DownloadString('http://www.cnn.com') | Out-String
$headerpattern = '(?i)<h1>(.*?)</h1>'
$header = ([regex]$headerpattern).Matches($html) | ForEach-Object { $_.Groups[1].Value }
$header
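The script downloads the HTML content from www.cnn.com and then extracts the text of every <h1>…</h1> element, which gives you a quick headline overview. The (?i) prefix makes the match case-insensitive, and (.*?) captures as little text as possible, so each header is returned separately. Note that the pattern only matches a bare <h1> tag: headers whose opening tag carries attributes, or whose text spans several lines, are skipped.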
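If the headers on a page carry attributes or wrap across lines, a slightly more tolerant pattern may help. The following is only a sketch under stated assumptions: the sample markup is made up so the example runs without a network connection, and the variable names $pattern and $headers are illustrative rather than part of the original tip.

```powershell
# Sample markup standing in for a downloaded page (hypothetical content).
$html = @'
<html><body>
<H1>First headline</H1>
<h1 class="lead">Second
headline</h1>
<h2>Not a top-level header</h2>
</body></html>
'@

# (?i) ignores case, (?s) lets . match line breaks,
# and [^>]* tolerates attributes inside the opening tag.
$pattern = '(?is)<h1[^>]*>(.*?)</h1>'

$headers = [regex]::Matches($html, $pattern) | ForEach-Object {
    # Collapse runs of whitespace (including line breaks) and trim the captured text.
    ($_.Groups[1].Value -replace '\s+', ' ').Trim()
}

$headers
```

Run against the sample markup, this should list the two <h1> texts and ignore the <h2>. Regular expressions remain a rough tool for HTML, so treat the pattern as a quick filter rather than a full parser.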