Ensuring Correct Text File Encoding in PowerShell
Text files can be stored using different encodings, and to read them correctly, you must specify the right one. That’s why most cmdlets that read text files offer the -Encoding parameter (for example, Get-Content). If you don’t specify the correct encoding, special characters and umlauts are likely to come out garbled.
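To see the effect of -Encoding, here is a minimal sketch. It assumes a throwaway file in your temp folder (the file name is made up for this example) and writes the text without a byte order mark, so nothing in the file reveals which encoding was used.

```powershell
# hypothetical sample file in the temp folder
$path = Join-Path ([IO.Path]::GetTempPath()) 'encoding-demo.txt'

# "Grüße", built from character codes so the example does not depend on
# how the script file itself is encoded
$text = "Gr$([char]0xFC)$([char]0xDF)e"

# write the text as UTF-16 LE bytes without a BOM
[IO.File]::WriteAllBytes($path, [Text.Encoding]::Unicode.GetBytes($text))

# reading with the matching encoding restores the text
$right = Get-Content -Path $path -Encoding Unicode

# reading with a different encoding garbles it
$wrong = Get-Content -Path $path -Encoding UTF8

$right -eq $text   # True
$wrong -eq $text   # False

Remove-Item -Path $path
```

The second read does not fail. It simply returns the wrong characters, which is why encoding problems often go unnoticed until someone looks at the text.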
Automatically Determining Text File Encoding
Yet how do you (automatically) determine the encoding a given text file uses? Here is a handy function that can help. It reads the first four bytes of a file and compares them to the known byte order marks (BOMs):
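function Get-Encoding
{
  param
  (
    [Parameter(Mandatory,ValueFromPipeline,ValueFromPipelineByPropertyName)]
    [Alias('FullName')]
    [string]
    $Path
  )

  process
  {
    # read the first four bytes, which hold the byte order mark (BOM) if the file has one
    $bom = New-Object -TypeName System.Byte[] -ArgumentList 4
    $file = New-Object System.IO.FileStream($Path, 'Open', 'Read')
    $null = $file.Read($bom,0,4)
    $file.Close()
    $file.Dispose()

    # fallback when no BOM is found (BOM-less UTF-8 and ANSI files end up here, too)
    $enc = [Text.Encoding]::ASCII
    if ($bom[0] -eq 0x2b -and $bom[1] -eq 0x2f -and $bom[2] -eq 0x76)
      { $enc = [Text.Encoding]::UTF7 }
    if ($bom[0] -eq 0xff -and $bom[1] -eq 0xfe)
      { $enc = [Text.Encoding]::Unicode }
    # checked after UTF-16 LE because both BOMs start with ff fe
    if ($bom[0] -eq 0xff -and $bom[1] -eq 0xfe -and $bom[2] -eq 0x00 -and $bom[3] -eq 0x00)
      { $enc = [Text.Encoding]::UTF32 }
    if ($bom[0] -eq 0xfe -and $bom[1] -eq 0xff)
      { $enc = [Text.Encoding]::BigEndianUnicode }
    # 00 00 fe ff is the big-endian UTF-32 BOM; [Text.Encoding]::UTF32 is little-endian
    if ($bom[0] -eq 0x00 -and $bom[1] -eq 0x00 -and $bom[2] -eq 0xfe -and $bom[3] -eq 0xff)
      { $enc = New-Object System.Text.UTF32Encoding($true, $true) }
    if ($bom[0] -eq 0xef -and $bom[1] -eq 0xbb -and $bom[2] -eq 0xbf)
      { $enc = [Text.Encoding]::UTF8 }

    [PSCustomObject]@{
      Encoding = $enc
      Path = $Path
    }
  }
}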
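The byte values in these comparisons are byte order marks. As a quick cross-check, this sketch asks the built-in .NET encodings which bytes they write at the start of a file. The labels in the table are simply names chosen for this listing.

```powershell
# labels are chosen for this listing; values are the built-in .NET encodings
$encodings = [ordered]@{
  'UTF8'             = [Text.Encoding]::UTF8
  'Unicode'          = [Text.Encoding]::Unicode
  'BigEndianUnicode' = [Text.Encoding]::BigEndianUnicode
  'UTF32'            = [Text.Encoding]::UTF32
}

$result = foreach ($name in $encodings.Keys) {
  # GetPreamble() returns the BOM bytes this encoding writes at the start of a file
  $hex = ($encodings[$name].GetPreamble() | ForEach-Object { $_.ToString('x2') }) -join ' '
  [PSCustomObject]@{ Encoding = $name; BOM = $hex }
}

$result
```

This lists ef bb bf for UTF8, ff fe for Unicode, fe ff for BigEndianUnicode, and ff fe 00 00 for UTF32. Note that [Text.Encoding]::UTF32 is the little-endian variant, and its BOM starts with the same two bytes as the UTF-16 LE BOM.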
Here is a test run checking all text files in your user profile:
PS> dir $home -Filter *.txt -Recurse | Get-Encoding

Encoding                    Path
--------                    ----
System.Text.UnicodeEncoding C:\Users\tobwe\E006_psconfeu2019.txt
System.Text.UnicodeEncoding C:\Users\tobwe\E009_psconfeu2019.txt
System.Text.UnicodeEncoding C:\Users\tobwe\E027_psconfeu2019.txt
System.Text.ASCIIEncoding   C:\Users\tobwe\.nuget\packages\Aspose.Words\18.12.0\...
System.Text.ASCIIEncoding   C:\Users\tobwe\.vscode\extensions\ms-vscode.powers...
System.Text.UTF8Encoding    C:\Users\tobwe\.vscode\extensions\ms-vscode.powers...
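If you want a second opinion on a single file, .NET can do the BOM check for you. This is a different technique from the function above, not a replacement for it: a sketch using System.IO.StreamReader with BOM detection turned on, run against a made-up sample file in your temp folder.

```powershell
# hypothetical sample file in the temp folder, written with a UTF-16 LE BOM
$path = Join-Path ([IO.Path]::GetTempPath()) 'bom-demo.txt'
[IO.File]::WriteAllText($path, 'Hello', [Text.Encoding]::Unicode)

# the second argument is the fallback encoding when there is no BOM,
# the third argument ($true) asks StreamReader to look for a BOM
$reader = New-Object -TypeName System.IO.StreamReader -ArgumentList $path, ([Text.Encoding]::ASCII), $true
try
{
  # CurrentEncoding is only reliable after the first read
  $content  = $reader.ReadToEnd()
  $detected = $reader.CurrentEncoding
}
finally
{
  $reader.Dispose()
}

$detected.WebName   # utf-16
$content            # Hello

Remove-Item -Path $path
```

Like the function, this approach only recognizes files that actually start with a BOM. For anything else it reports the fallback encoding you passed in.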