Set-Content -Encoding parameter matters, if the content says so...

by Klaus Graefensteiner 5. November 2008 08:14

Introduction

I use a simple PowerShell script to replace some of the URLs in the HTML source of my blog posts before I publish them to www.tellingmachine.com. In my case the posts are stored as XML files. I usually write my posts in Windows Live Writer. While writing, I frequently publish the drafts for test purposes to the Visual Studio 2008 development web server that runs locally on my machine. Once a post is ready to go online, I take the XML file, run the PowerShell script against it and then copy it to my production server. Occasionally the XML files refused to open in Internet Explorer after I had run the script. It took me a few minutes to figure out why. Here is the story!

Figure 1: XML Architecture

Differences

The differences between a post published to Visual Studio and one published to the production server are mainly the root path names of the hyperlink references. In my case I need to change two kinds of relative URLs: the relative picture paths and the paths to files that are referred to by a download link. Here are the actual replace instructions:

Before moving files from Visual Studio to the production server replace...

  • "/BlogEngine.Web/file.axd" with "/file.axd" for downloads
  • "/BlogEngine.Web/image.axd" with "/image.axd" for picture links
  • "http://localhost:<port>/BlogEngine.Web" with "http://www.tellingmachine.com"
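
The replace rules can be sketched as a small function that works on a single string. This is only an illustration, assuming the development paths carry the /BlogEngine.Web prefix and the local server URL has the form http://localhost:<port>/BlogEngine.Web. The function name Convert-PostUrl and the sample input are made up for this example.

```powershell
# Hypothetical helper (not part of the original script): applies the
# replace rules to one string and returns the result.
function Convert-PostUrl {
    param([string]$Text)

    # Absolute URLs of the local development server -> production host.
    # Assumes the form http://localhost:<port>/BlogEngine.Web
    $Text = $Text -replace 'http://localhost:\d+/BlogEngine\.Web', 'http://www.tellingmachine.com'
    # Relative download links
    $Text = $Text -replace '/BlogEngine\.Web/file\.axd', '/file.axd'
    # Relative picture links
    $Text = $Text -replace '/BlogEngine\.Web/image\.axd', '/image.axd'
    return $Text
}

# Made-up sample input with one picture link and one download link
$sample = '<img src="/BlogEngine.Web/image.axd?picture=a.png" /> <a href="http://localhost:1234/BlogEngine.Web/file.axd?file=b.zip">b.zip</a>'
Convert-PostUrl -Text $sample
```

The dots in the patterns are escaped because -replace treats its first argument as a regular expression, where an unescaped dot matches any character.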

PowerShell script with a bug

Here is my first attempt to run the replace task with PowerShell. This one sometimes produced malformed XML.

   cd $home\desktop\PostPub
   dir *.xml | ForEach-Object {
       # Read the post, rewrite the development URLs, and write it back
       $text = $_ | Get-Content
       $text = $text -replace 'http://localhost:\d+/BlogEngine\.Web', 'http://www.tellingmachine.com'
       $text = $text -replace '/BlogEngine\.Web/file\.axd', '/file.axd'
       Set-Content -Path $_.FullName -Value $text -Force
   }

The bug

The web server would throw an exception when it tried to read the malformed XML post file. And when I tried to open the XML file directly in Internet Explorer, I would get the following error.

Figure 2: Error opening XML in Internet Explorer

At first I thought there was a problem with the regular expression, because I wasn't able to reproduce the issue when I did the search and replace manually in Visual Studio. It always worked in Visual Studio, but with certain blog posts running the script would cause the error every time.

The epiphany

I stared at the PowerShell script in PowerGUI, hoping to spot the problem. And to my surprise, I found the solution. Can you see it too?

Figure 3: PowerGUI hint

Exactly, look at the second line in the Locals window: encoding="utf-8". The XML declaration actually prescribes which encoding to use. From this point on the fix was a no-brainer. All I needed to do was specify the -Encoding parameter in the Set-Content cmdlet, and that's it.
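
Why the mismatch breaks the XML can be sketched without touching any files. In Windows PowerShell, Set-Content without -Encoding writes in the system's ANSI code page, so a non-ASCII character ends up as a byte sequence that differs from its UTF-8 form. The snippet below is a minimal illustration under that assumption. The sample word and the variable names are invented, and ISO-8859-1 stands in for the ANSI code page.

```powershell
# Made-up sample text with one non-ASCII character (u with umlaut, U+00FC)
$text = "Gr$([char]0x00FC)n"

# ISO-8859-1 stands in for the ANSI code page here (an assumption for this
# sketch); for this character it produces the same byte as Windows-1252.
$ansi = [System.Text.Encoding]::GetEncoding('iso-8859-1').GetBytes($text)
$utf8 = [System.Text.Encoding]::UTF8.GetBytes($text)

"ANSI : " + (($ansi | ForEach-Object { $_.ToString('X2') }) -join ' ')   # ANSI : 47 72 FC 6E
"UTF-8: " + (($utf8 | ForEach-Object { $_.ToString('X2') }) -join ' ')   # UTF-8: 47 72 C3 BC 6E

# A strict UTF-8 decoder (throwOnInvalidBytes = $true) rejects the ANSI bytes,
# which is what a reader that trusts encoding="utf-8" runs into.
$strictUtf8 = New-Object System.Text.UTF8Encoding -ArgumentList $false, $true
try {
    [void]$strictUtf8.GetString($ansi)
    $ansiIsValidUtf8 = $true
} catch {
    $ansiIsValidUtf8 = $false
}
"ANSI bytes decode as UTF-8: $ansiIsValidUtf8"   # ANSI bytes decode as UTF-8: False
```

Pure ASCII text yields identical bytes in both encodings, which would explain why only some posts were affected.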

The fixed PowerShell script

Here is the script that does the job right:

   cd $home\desktop\PostPub
   dir *.xml | ForEach-Object {
       # Read the post, rewrite the development URLs, and write it back
       $text = $_ | Get-Content
       $text = $text -replace 'http://localhost:\d+/BlogEngine\.Web', 'http://www.tellingmachine.com'
       $text = $text -replace '/BlogEngine\.Web/file\.axd', '/file.axd'
       # -Encoding UTF8 matches the encoding declared in the XML file
       Set-Content -Path $_.FullName -Value $text -Force -Encoding UTF8
   }
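
A quick way to catch this kind of problem before uploading is to load each file with an XML parser, which reads the raw bytes and honors the encoding named in the XML declaration. The following is a sketch, not part of the original script: the function name Test-XmlWellFormed, the temporary file names and the sample post are made up, and ISO-8859-1 again stands in for the ANSI code page.

```powershell
# Hypothetical check: returns $true when the file can be loaded as XML.
# Pass a full path; .NET resolves relative paths against the process
# working directory, not the current PowerShell location.
function Test-XmlWellFormed {
    param([string]$Path)
    $doc = New-Object System.Xml.XmlDocument
    try {
        $doc.Load($Path)
        return $true
    } catch {
        return $false
    }
}

# Made-up sample post that declares utf-8 and contains one non-ASCII character
$xml = '<?xml version="1.0" encoding="utf-8"?><post><title>Gr{0}n</title></post>' -f [char]0x00FC

$tempDir  = [System.IO.Path]::GetTempPath()
$goodPath = Join-Path $tempDir 'post-utf8.xml'
$badPath  = Join-Path $tempDir 'post-ansi.xml'

# Same text, written once as UTF-8 and once in a single-byte encoding
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($xml)
$ansiBytes = [System.Text.Encoding]::GetEncoding('iso-8859-1').GetBytes($xml)
[System.IO.File]::WriteAllBytes($goodPath, $utf8Bytes)
[System.IO.File]::WriteAllBytes($badPath, $ansiBytes)

$goodOk = Test-XmlWellFormed -Path $goodPath
$badOk  = Test-XmlWellFormed -Path $badPath
"UTF-8 file loads: $goodOk"   # UTF-8 file loads: True
"ANSI file loads : $badOk"    # ANSI file loads : False

# Clean up the temporary files
Remove-Item -LiteralPath $goodPath, $badPath
```

In a real run the same function could be called with $_.FullName for every file in the PostPub folder after the replace step.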

Recycle App Pool

There is one important note for BlogEngine.NET users. If you publish a blog post XML file manually via FTP to the posts folder of your virtual directory, you need to invalidate the cache by recycling the App Pool, which forces the web application to pick up the new post. Otherwise the new post won't be displayed at all.

Download

The resources that this post is based on can be downloaded here: ReplaceURLs.zip

Outlook

The encoding matters, if the content prescribes it. That is the lesson I learned here. The XML file said it clearly: "The encoding must be utf-8!".

BlogEngine.NET | Blogging | Debugging | PowerShell

License:
Creative Commons License

Copyright:
© Copyright 2014, Klaus Graefensteiner.

Disclaimer:
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
