One PowerShell Script to download them all: PDC2008 Videos, Code and PowerPoint files

by Klaus Graefensteiner 4. December 2008 00:40

Introduction

Yes, I did it. I downloaded all 65,563,667,714 bytes and saved them as 407 files on the nice new WD Passport hard drive that I brought home from PDC2008. And yes, I did it the old-fashioned manual way: right-click on the link and Save As. But you don't have to. In this blog post I provide the metadata of all the files I downloaded and a PowerShell script that will help you automate the download process. Besides the PDC content, you also get a treasure trove of PowerShell scripting techniques, ranging from dynamically generating regular expressions to loading metadata from XML and CSV files.

Figure 1: PDC2008 Download

Quickstart

Download and install PowerShell

If PowerShell is not installed yet on your computer, then download it from the PowerShell website: http://www.microsoft.com/windowsserver2003/technologies/management/powershell/download.mspx.

Download and install this script

Download the scripts and initialization files from here: Get-PDC2008Content.zip and extract them into a directory called C:\PDC2008DownloadScript.

Run the script

Start -> Run -> PowerShell. Copy and paste the appropriate script snippet from Usage.ps1 into the PowerShell console window and hit Enter.

The Problem Domain

Tracks

Microsoft made the conference material, such as videos, PowerPoint presentations and code samples, available for download to the general public at http://www.microsoftpdc.com. The presentations at the Microsoft Professional Developer Conference 2008 are grouped into several tracks, for example TL (Tools and Languages), ES (Enterprise Services), PC (Presentation and Communication), BB (Business B), SYMP (Symposium), PRE (Pre-conference session) and KYN (Keynote). Each presentation is assigned to one of these tracks and given a unique two-digit number. The track abbreviation and the number together form the unique session id. Here are some examples of session ids: PC02, PC12, TL33, SYMP01, ES07, PRE08 and KYN04.

Content Types

The content is grouped into three different categories. There are videos, sample code and the PowerPoint presentations. The content is provided as files that can be downloaded. Each type is basically defined by its file extension. .PPTX are PowerPoint presentation files, .ZIP are sample code files and .WMV are video files.

Content Matrix

Track                  .PPTX (PowerPoint)   .WMV (Video)   .ZIP (Sample Code)
BB (Business B)
ES (Enterprise S)
KYN (Keynote)
PC (Presentation C)
PRE (Pre-conference)
SYMP (Symposium)
TL (Tools Languages)

Content Files

The file name is defined by its session id and the content type. Here are some example file names: PC23.pptx, KYN02.wmv, ES01.zip, ES20.pptx and TL49.pptx. The file names follow a well-defined pattern that can be described by the following regular expression:

    '^(?<TRACK>((TL)|(PC)|(ES)|(PRE)|(SYMP)|(KYN)|(BB)))(?<INDEX>\d{2})(?<EXT>(\.WMV)|(\.ZIP)|(\.PPTX))$'
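As a minimal sketch of how such a pattern can be put to work, the snippet below uses the -match operator and its named groups to split a file name into track, session id and extension. The helper name Split-ContentFileName is hypothetical and not part of the script in this post.

```powershell
# Pattern from the post: track id, two-digit index, file extension
$FileNamePattern = '^(?<TRACK>((TL)|(PC)|(ES)|(PRE)|(SYMP)|(KYN)|(BB)))(?<INDEX>\d{2})(?<EXT>(\.WMV)|(\.ZIP)|(\.PPTX))$'

# Hypothetical helper: splits a content file name into its parts.
# Returns nothing if the name does not follow the pattern.
function Split-ContentFileName([string] $FileName)
{
    # -match is case-insensitive, so "kyn02.wmv" matches as well
    if ($FileName -match $FileNamePattern)
    {
        @{
            Track     = $matches['TRACK'].ToUpper()
            SessionId = ($matches['TRACK'] + $matches['INDEX']).ToUpper()
            Extension = $matches['EXT'].ToUpper()
        }
    }
}

$Parts = Split-ContentFileName 'KYN02.wmv'
$Parts.SessionId
```

The named groups keep the calling code readable: it asks for TRACK or EXT instead of counting parentheses.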

The Pain Relief

Who has time to browse the PDC2008 web site, find the right content and download it? Who would rather command a computer to download, for example, all Don Box videos from the PDC2008? Who likes to scan or even read page after page of irrelevant data to find the one line of information that is relevant?

Figure 2: Catching a flight

Does this situation sound and look familiar to you? You are in a hurry to catch a flight. You have a choice. For one, you could look up your flight on the departure board and find the terminal and the gate. Let's further assume that you are not familiar with this airport. Now you need to take your eyes off the board and start scanning your environment for signs that will guide you to the right terminal and gate. Alternatively, you could try to find a real person whose job it is to answer questions about flights. You would tell the person something like this: "I need to catch LH33. What terminal and gate is the airplane departing from? How can I get there quickly?" The person would say: "Terminal 2, Gate 21, and use the bus to get to the terminal!" Now - that's a user experience! Sifting through irrelevant data to find the relevant information is a task the computer is good at. Many web or desktop applications remind me of this departure board: lots of information that is not specific to an individual user's needs. I think we can do better. My script is a baby step in this direction.

Computer!

Download all PDC2008 files!

    # Download all content
    .\Get-PDC2008Content -SaveAsPath "c:\TestDownloads\" `
                         -ContentManifestPath "c:\PDC2008DownloadScript\PDC2008ContentManifest.csv" `
                         -URLMappingPath "c:\PDC2008DownloadScript\PDC2008URLMapping.xml"

Download all PDC2008 PowerPoint files!

    # Download all PowerPoint files
    .\Get-PDC2008Content -SaveAsPath "c:\TestDownloads\" `
                         -ContentManifestPath "c:\PDC2008DownloadScript\PDC2008ContentManifest.csv" `
                         -URLMappingPath "c:\PDC2008DownloadScript\PDC2008URLMapping.xml" `
                         -ContentFilter ".PPTX"

Download PowerPoint and code sample files of a specific set of sessions!

    # Download only PowerPoint files and Zip files of a specific set of sessions
    .\Get-PDC2008Content -SaveAsPath "c:\TestDownloads\" `
                         -ContentManifestPath "c:\PDC2008DownloadScript\PDC2008ContentManifest.csv" `
                         -URLMappingPath "c:\PDC2008DownloadScript\PDC2008URLMapping.xml" `
                         -SessionIds "BB01", "BB02", "BB03", "PRE01", "PRE02", "TL01" `
                         -ContentFilter ".PPTX", ".ZIP"

Force the download of all files of one specific session!

    # Get all content of one specific session. Overwrite already downloaded files
    .\Get-PDC2008Content -SaveAsPath "c:\TestDownloads\" `
                         -ContentManifestPath "c:\PDC2008DownloadScript\PDC2008ContentManifest.csv" `
                         -URLMappingPath "c:\PDC2008DownloadScript\PDC2008URLMapping.xml" `
                         -SessionIds "BB01" `
                         -Force

Download all code sample files of the TL track!

    # Download all sample code files of the TL (Tools and Languages) track
    .\Get-PDC2008Content -SaveAsPath "c:\TestDownloads\" `
                         -ContentManifestPath "c:\PDC2008DownloadScript\PDC2008ContentManifest.csv" `
                         -URLMappingPath "c:\PDC2008DownloadScript\PDC2008URLMapping.xml" `
                         -ContentFilter ".ZIP" `
                         -TrackFilter "TL" `
                         -Force

List the file names of all Keynote movies!

    # Select all keynote videos
    # Just list the files in the queue, but don't actually download them
    .\Get-PDC2008Content -SaveAsPath "c:\TestDownloads\" `
                         -ContentManifestPath "c:\PDC2008DownloadScript\PDC2008ContentManifest.csv" `
                         -URLMappingPath "c:\PDC2008DownloadScript\PDC2008URLMapping.xml" `
                         -ContentFilter ".WMV" `
                         -TrackFilter "KYN" `
                         -Force `
                         -WhatIf

List all PowerPoint files and videos of the BB and PC tracks!

    # Select all videos and PowerPoint files of the BB and PC tracks
    # Just list the files in the queue, but don't actually download them
    .\Get-PDC2008Content -SaveAsPath "c:\TestDownloads\" `
                         -ContentManifestPath "c:\PDC2008DownloadScript\PDC2008ContentManifest.csv" `
                         -URLMappingPath "c:\PDC2008DownloadScript\PDC2008URLMapping.xml" `
                         -ContentFilter ".WMV", ".PPTX" `
                         -TrackFilter "BB", "PC" `
                         -Force `
                         -WhatIf

Implementation Details

Initialization

The main component of this script is a hash table called $ContentDictionary that is loaded from a comma-separated file called PDC2008ContentManifest.csv at the beginning of script execution. This file contains a list of FileInfo objects for all files that can be downloaded from the http://www.MicrosoftPDC.com web site. The list is considered the master list of all downloadable content, which means that only files in the list can actually be downloaded with the script. All user input is verified against this list; anything that is not in the list will not be downloaded. The keys of the hash table are the file names including the file extension. The path to PDC2008ContentManifest.csv is passed to the script in the $ContentManifestPath parameter.
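The loading step can be sketched as follows. The sample rows and the helper Test-IsDownloadable are made up for illustration, and the column names Name, Length and Extension are an assumption based on what Export-Csv produces for FileInfo objects.

```powershell
# Assumption: the manifest has at least Name, Length and Extension columns
$SampleCsv = @'
Name,Length,Extension
PC23.pptx,1048576,.pptx
KYN02.wmv,734003200,.wmv
ES01.zip,204800,.zip
'@

# Write the sample manifest to a temporary file so that Import-Csv can read it
$ManifestPath = [System.IO.Path]::GetTempFileName()
Set-Content -Path $ManifestPath -Value $SampleCsv

# Import-Csv returns one object per row; index the rows by file name
$ContentDictionary = @{}
Import-Csv -Path $ManifestPath | ForEach-Object { $ContentDictionary[$_.Name] = $_ }
Remove-Item $ManifestPath

# Hypothetical helper: only files listed in the manifest count as downloadable
function Test-IsDownloadable([string] $FileName)
{
    return $ContentDictionary.ContainsKey($FileName)
}
```

Keying the hash table by file name turns the "is this file in the master list?" question into a single lookup. Note that every value read by Import-Csv, including Length, arrives as a string.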

In addition to this path, there are two more parameters that provide path information. The $URLMappingPath parameter specifies the location of a file called PDC2008URLMapping.xml. This file is a serialized hash table that maps a file extension to the root URL of the download link. Each file type has its own URL. The file is de-serialized during script initialization and stored in the $URLMappings hash table. The third path parameter is $SaveAsPath, the folder in which all downloaded files are saved. This is also the folder that the script inspects to determine whether a file has already been downloaded.
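A small round trip shows how such a mapping file can be produced and consumed with Export-Clixml and Import-Clixml. The root URLs below are placeholders on example.com, not the real download locations, and the temporary file name is arbitrary.

```powershell
# Placeholder root URLs; the real ones ship in PDC2008URLMapping.xml
$Mappings = @{
    '.pptx' = 'http://example.com/pptx/'
    '.wmv'  = 'http://example.com/wmv/'
    '.zip'  = 'http://example.com/zip/'
}

$MappingPath = Join-Path ([System.IO.Path]::GetTempPath()) 'SampleURLMapping.xml'
$Mappings | Export-Clixml -Path $MappingPath      # serialize the hash table
$URLMappings = Import-Clixml -Path $MappingPath   # de-serialize it again
Remove-Item $MappingPath

# The download URL is the root URL for the extension plus the file name
$FileName = 'PC23.pptx'
$Extension = [System.IO.Path]::GetExtension($FileName).ToLower()
$DownloadUrl = $URLMappings[$Extension] + $FileName
```

Keeping the URLs in a separate file means the script does not need to change if the content moves to a different server.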

Once the content manifest is loaded, it is analyzed and two arrays with distinct values are created. One array, $ContentTracks, stores the list of possible track ids and the other, $ContentTypes, stores the list of possible file extensions. At this point all other script parameters are validated. The values passed in using the $ContentFilter parameter are validated against $ContentTypes, and the values passed in using the $TrackFilter parameter must be contained in $ContentTracks in order to be considered valid. The parameter validation completes the initialization process.
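Here is a minimal sketch of this analysis and validation step, working on a handful of sample file names instead of the real manifest. The helper Get-InvalidSelection is hypothetical; the script in this post throws an error instead of returning the invalid values.

```powershell
# Sample file names standing in for the Name column of the manifest
$FileNames = 'PC23.pptx', 'KYN02.wmv', 'ES01.zip', 'ES20.pptx', 'TL49.pptx'

# Collect the track id and the extension of every file name
$Tracks = @()
$Types  = @()
foreach ($Name in $FileNames)
{
    if ($Name -match '^(?<TRACK>.*?)(?<INDEX>\d{2})(?<EXT>\..*)$')
    {
        $Tracks += $matches['TRACK'].ToUpper()
        $Types  += $matches['EXT'].ToUpper()
    }
}

# Reduce both lists to their distinct values
$ContentTracks = @($Tracks | Sort-Object -Unique)
$ContentTypes  = @($Types  | Sort-Object -Unique)

# Hypothetical helper: returns the requested values that are not valid
function Get-InvalidSelection([string[]] $Valid, [string[]] $Requested)
{
    @($Requested | Where-Object { $Valid -inotcontains $_ })
}

# 'XX' is not a known track, so it is reported as invalid
$InvalidTracks = @(Get-InvalidSelection $ContentTracks 'TL', 'XX')
```

Because the valid values are derived from the manifest itself, no list of tracks or extensions has to be hard-coded for the validation.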

The rest of the script can be divided into two major parts: compiling the download queue and processing the downloads.

Compiling the download queue

This component delivers the main application logic. The idea is to start with a list of all possible downloads. This original list will be reduced during the evaluation of the passed-in parameters and the files that were already downloaded. At the end the list will be passed to the component that processes the actual downloads.

Here is the logic in pseudo code:

  1. If a list of specific sessions has been passed in using the $SessionIds parameter, then all other sessions and their files will be removed from the $DownloadQueue hash table.
  2. If the $Force switch parameter has not been set, then the list of already downloaded files in $SaveAsPath will be compared with the files in $DownloadQueue and the files that are in both collections will be removed from $DownloadQueue.
  3. If a list of specific sessions has been passed in using the $SessionIds parameter, then only files that match elements of the $ContentFilter parameter are kept in the $DownloadQueue. All other files are removed.
  4. If the $SessionIds parameter is not used, then the $ContentFilter and $TrackFilter elements are matched against the files in the $DownloadQueue hash table. Files that don't match are removed from this collection.
  5. If the $WhatIf switch parameter is set, then the resulting list of files in $DownloadQueue is only displayed and not actually downloaded.
  6. All remaining files in $DownloadQueue are now ready for download processing.
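The steps above can be sketched on plain lists of file names. Select-DownloadQueue is a hypothetical, simplified stand-in for the queue logic of the script: it works on strings rather than FileInfo objects, compares names only (not file lengths), and leaves step 5 (-WhatIf) to the caller, who can simply print the result instead of downloading it.

```powershell
# Hypothetical helper that mirrors the numbered steps for a list of file names
function Select-DownloadQueue
{
    param
    (
        [string[]] $AllFiles,       # every file name in the content manifest
        [string[]] $LocalFiles,     # file names already present in the save folder
        [string[]] $SessionIds,
        [string[]] $ContentFilter,
        [string[]] $TrackFilter,
        [switch]   $Force
    )

    $Queue = $AllFiles

    # Step 1: keep only the explicitly requested sessions
    if ($SessionIds)
    {
        $Queue = @($Queue | Where-Object { $SessionIds -icontains $_.Substring(0, $_.IndexOf('.')) })
    }

    # Step 2: drop files that are already downloaded, unless -Force is set
    if (-not $Force)
    {
        $Queue = @($Queue | Where-Object { $LocalFiles -inotcontains $_ })
    }

    # Steps 3 and 4: build one regular expression from the filters.
    # The track filter is only used when no session ids were given.
    $TrackPart = '.*?'
    if ($TrackFilter -and -not $SessionIds)
    {
        $TrackPart = ($TrackFilter | ForEach-Object { [regex]::Escape($_) }) -join '|'
    }
    $TypePart = '\..*'
    if ($ContentFilter)
    {
        $TypePart = ($ContentFilter | ForEach-Object { [regex]::Escape($_) }) -join '|'
    }
    # Doubled braces survive the -f operator as literal { and }
    $Pattern = '^(?<TRACK>({0}))(?<INDEX>\d{{2}})(?<EXT>({1}))$' -f $TrackPart, $TypePart

    # Step 6: whatever matches the pattern is ready for download
    @($Queue | Where-Object { $_ -match $Pattern })
}

$All = 'BB01.pptx', 'BB01.wmv', 'TL01.zip', 'TL02.zip', 'KYN01.wmv'
$Queue = Select-DownloadQueue -AllFiles $All -LocalFiles 'TL01.zip' -ContentFilter '.ZIP' -TrackFilter 'TL'
```

Generating the regular expression from the filter values keeps the matching logic in one place: adding a track or an extension to a filter only adds one more alternative to the pattern.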

Processing the download queue

In this step the script enumerates over the $DownloadQueue and downloads each file contained in this hash table. It saves the file in the $SaveAsPath location and prints out some statistical information about the download transaction.
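A minimal sketch of this step, split into two hypothetical helpers: Get-DownloadJob only computes the URL and the target path for each queued file, and Invoke-DownloadJob performs one transfer with System.Net.WebClient. The names and the example.com URLs are assumptions for illustration; the real script combines both parts and prints more statistics.

```powershell
# Hypothetical helper: turns queued file names into download jobs
function Get-DownloadJob([string[]] $FileNames, [hashtable] $URLMappings, [string] $SaveAsPath)
{
    foreach ($Name in $FileNames)
    {
        # The extension selects the root URL for this content type
        $Extension = [System.IO.Path]::GetExtension($Name).ToLower()
        @{
            Url    = $URLMappings[$Extension] + $Name
            Target = Join-Path $SaveAsPath $Name
        }
    }
}

# Hypothetical helper: downloads one job and reports the elapsed time
function Invoke-DownloadJob([hashtable] $Job)
{
    $Client = New-Object System.Net.WebClient
    $Start = Get-Date
    try { $Client.DownloadFile($Job.Url, $Job.Target) }
    finally { $Client.Dispose() }
    $Seconds = ((Get-Date) - $Start).TotalSeconds
    Write-Output ("Saved {0} in {1:F} seconds" -f $Job.Target, $Seconds)
}

# Placeholder mapping; nothing is downloaded until Invoke-DownloadJob is called
$SampleMappings = @{ '.pptx' = 'http://example.com/pptx/'; '.zip' = 'http://example.com/zip/' }
$Jobs = @(Get-DownloadJob -FileNames 'PC23.pptx', 'ES01.zip' `
                          -URLMappings $SampleMappings `
                          -SaveAsPath ([System.IO.Path]::GetTempPath()))
```

Separating the planning from the transfer makes a dry run trivial: printing $Jobs shows exactly what would be downloaded and where it would be saved.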

Complete Script Listing

Here is the complete listing of the Get-PDC2008Content script:

    param
    (
        [string]   $SaveAsPath = $(throw "`$SaveAsPath is a required parameter. Please specify a path to a folder where your files will be saved!"),
        [string]   $ContentManifestPath = $(throw "`$ContentManifestPath is a required parameter. Please specify a path to the PDC2008ContentManifest.csv file!"),
        [string]   $URLMappingPath = $(throw "`$URLMappingPath is a required parameter. Please specify a path to the PDC2008URLMapping.xml file!"),
        [string[]] $SessionIds,
        [string[]] $ContentFilter,
        [string[]] $TrackFilter,
        [switch]   $Force,
        [switch]   $WhatIf
    )

    # Variables in script scope
    $SessionList = New-Object System.Collections.ArrayList   # Session ids explicitly provided via the SessionIds parameter
    $HasExplicitSessionIDs = $false
    $ContentManifest = $null             # Stores FileInfo objects of all downloadable content files
    $ContentDictionary = @{}             # Hash table built from the original content manifest
    $DownloadQueue = @{}                 # Hash table mapping a file name to its FileInfo object.
                                         # These are the files that actually need to be downloaded after applying all filters.
    $URLMappings = @{}                   # Hash table that maps file extensions to their download URLs
    [string[]] $ContentTracks = $null    # Distinct track ids, e.g. "PC", "TL", "BB"
    [string[]] $ContentTypes = $null     # Distinct content types, e.g. ".PPTX", ".ZIP", ".WMV"
    $RemoveList = $null                  # Objects that need to be removed from the DownloadQueue


    function Validate-Parameters
    {
        if ($(Test-Path $SaveAsPath) -eq $false)
        {
            throw ("Save as path: `"{0}`" not found" -f $SaveAsPath)
        }

        if ($SessionIds -ne $null)
        {
            Write "Sessions have been explicitly specified!"
            $script:HasExplicitSessionIDs = $true
            foreach ($s in $SessionIds) { [void] ($script:SessionList.Add($s)) }
        }
        else
        {
            Write "No explicit sessions specified!"
            $script:SessionList.Clear()
            $script:HasExplicitSessionIDs = $false
        }

        # Get a list of tracks and extensions.
        # This is needed here to validate the ContentFilter and TrackFilter parameters
        Analyze-ContentManifest $script:ContentManifest
        ValidateTrackFilter -ValidTracks $script:ContentTracks -AskedForTracks $TrackFilter
        ValidateContentFilter -ValidTypes $script:ContentTypes -AskedForTypes $ContentFilter
    }

    function ValidateTrackFilter([string[]] $ValidTracks = $script:ContentTracks, [string[]] $AskedForTracks = $TrackFilter)
    {
        if ($ValidTracks -eq $null)
        {
            throw "ValidateTrackFilter failed, because `$ValidTracks is null!"
        }

        if ($AskedForTracks -ne $null)
        {
            VerifyAllStringsAreInCollection -MasterArray $ValidTracks -ContainedArray $AskedForTracks
        }
        else
        {
            Write-Warning "No TrackFilter specified"
        }
    }

    function ValidateContentFilter([string[]] $ValidTypes = $script:ContentTypes, [string[]] $AskedForTypes = $ContentFilter)
    {
        if ($ValidTypes -eq $null)
        {
            throw "ValidateContentFilter failed, because `$ValidTypes is null!"
        }

        if ($AskedForTypes -ne $null)
        {
            VerifyAllStringsAreInCollection -MasterArray $ValidTypes -ContainedArray $AskedForTypes
        }
        else
        {
            Write-Warning "No ContentFilter specified"
        }
    }

    function VerifyAllStringsAreInCollection([object[]] $MasterArray, [object[]] $ContainedArray)
    {
        # Case-insensitive check that every requested value is a known value
        $ContainedArray | ForEach-Object {
            if ($MasterArray -inotcontains $_)
            {
                throw "$_ is not a valid selection. These selections are valid: $MasterArray"
            }
        }
    }


    function Load-ContentManifestFromCSV ([string] $CSVPath = $ContentManifestPath)
    {
        if (Test-Path $CSVPath)
        {
            $script:ContentManifest = Import-Csv -Path $CSVPath
        }
        else
        {
            throw ("CSV file: `"{0}`" not found" -f $CSVPath)
        }
        BuildContentDictionary -Dictionary $script:ContentDictionary -Manifest $script:ContentManifest
    }

    function BuildContentDictionary ([Object] $Dictionary = $script:ContentDictionary, [Object[]] $Manifest = $script:ContentManifest)
    {
        if ($Manifest -eq $null)
        {
            throw "Invalid Manifest!"
        }
        # Build the hash table from the array, keyed by file name
        $Manifest | ForEach-Object { $Dictionary[$_.Name] = $_ }

        Write "Building ContentDictionary hashtable from ContentManifest array"
        $Dictionary
    }


    function Load-URLMappingFromXML ([string] $XMLPath = $URLMappingPath)
    {
        if (Test-Path $XMLPath)
        {
            $script:URLMappings = Import-CliXML -Path $XMLPath
            if ($script:URLMappings -eq $null)
            {
                throw ("`$URLMappings is not initialized! Check the contents of the file: {0}" -f $XMLPath)
            }
        }
        else
        {
            throw ("XML file: `"{0}`" not found" -f $XMLPath)
        }
    }


    # $Length is a 64-bit integer, because video files can exceed the range of [int]
    function Download-File ([string] $URL, [string] $SaveAsFile, [long] $Length)
    {
        $WebDownloader = New-Object System.Net.WebClient
        # Release the web client and re-throw if anything in this function fails
        trap { $WebDownloader.Dispose(); throw "$_" }

        $StartTime = Get-Date
        Write ("Download started at {0}" -f $StartTime)
        Write ("Downloading {0} bytes from {1} and saving file as {2}" -f $Length, $URL, $SaveAsFile)

        $WebDownloader.DownloadFile($URL, $SaveAsFile)

        $EndTime = Get-Date
        $Duration = $EndTime - $StartTime
        Write ("Download ended at {0}" -f $EndTime)
        Write ("Download took {0:F} seconds, which is {1:F} minutes" -f $Duration.TotalSeconds, $Duration.TotalMinutes)
        $DownloadedFile = Get-Item $SaveAsFile
        # Adding one second guards against a division by zero for very fast downloads
        $Speed = $DownloadedFile.Length / ($Duration.TotalSeconds + 1)
        $MBRate = $Speed / 1MB
        Write ("Download rate is {0:F} MB/s" -f $MBRate)

        if ((Verify-DownloadedFile $SaveAsFile $Length) -eq $false)
        {
            Write-Warning ("Download of file {0} is not complete" -f $SaveAsFile)
        }
        else
        {
            Write "Download completed successfully!"
        }

        Write ""
        Write ""
        $WebDownloader.Dispose()
    }

    function Process-DownloadQueue([Object] $Queue)
    {
        if ($Queue -eq $null -or $Queue.Count -eq 0)
        {
            Write "No file in download list. Try again!"
        }
        else
        {
            $Queue.GetEnumerator() | ForEach-Object {
                $Ext = $_.Value.Extension.ToLower()
                $Name = $_.Value.Name
                $Length = $_.Value.Length
                # Report a failed download as a warning and continue with the next file
                trap { Write-Warning "$_`n`n"; continue }
                # Join-Path works whether or not $SaveAsPath ends with a backslash
                Download-File -URL ($script:URLMappings[$Ext] + $Name) -SaveAsFile (Join-Path $SaveAsPath $Name) -Length $Length
            }
        }
    }

    function Compile-DownloadQueue
    {
        Write "Compiling DownloadQueue"
        if ($script:HasExplicitSessionIDs -eq $true)
        {
            # Queue only the files whose session id (file name without extension) was requested
            $script:ContentDictionary.GetEnumerator() | ForEach-Object {
                $SessionId = $_.Key.Substring(0, $_.Key.IndexOf("."))
                foreach ($s in $script:SessionList)
                {
                    if ($SessionId -ieq $s) { $script:DownloadQueue[$_.Key] = $_.Value }
                }
            }
        }
        else
        {
            # Work on a copy so that removing queued files does not alter the master list
            $script:DownloadQueue = $script:ContentDictionary.Clone()
        }

        if ($Force -eq $false)
        {
            Remove-DownloadedFilesFromQueue -Queue $script:DownloadQueue -DownloadPath $SaveAsPath
        }

        Select-MatchingFilesFromQueue -Queue $script:DownloadQueue -Types $ContentFilter -Tracks $TrackFilter
    }

    function Select-MatchingFilesFromQueue([Object] $Queue = $script:DownloadQueue, [String[]] $Types = $ContentTypes, [String[]] $Tracks = $ContentTracks)
    {
        Write "Select-MatchingFilesFromQueue"

        # Build the track part of the regular expression, e.g. "(BB)|(PC)"
        $TrackPattern = ""
        if ($Tracks -eq $null -or $Tracks.Length -eq 0)
        {
            $TrackPattern = ".*?"
        }
        else
        {
            for ($i = 0; $i -lt $Tracks.Length; $i++)
            {
                $TrackPattern += "({0})" -f $Tracks[$i]
                if ($i -lt ($Tracks.Length - 1))
                {
                    $TrackPattern += "|"
                }
            }
        }

        # Build the extension part, e.g. "(\.WMV)|(\.PPTX)".
        # The backslash escapes the leading dot of each extension.
        $TypePattern = ""
        if ($Types -eq $null -or $Types.Length -eq 0)
        {
            $TypePattern = "\..*"
        }
        else
        {
            for ($i = 0; $i -lt $Types.Length; $i++)
            {
                $TypePattern += "(\{0})" -f $Types[$i]
                if ($i -lt ($Types.Length - 1))
                {
                    $TypePattern += "|"
                }
            }
        }
        Write "TrackFilter Regex part"
        $TrackPattern
        Write "TypeFilter Regex part"
        $TypePattern

        # Doubled braces survive the -f operator as the literal quantifier {2}
        $RegexString = '^(?<TRACK>({0}))(?<INDEX>\d{{2}})(?<EXT>({1}))$' -f $TrackPattern, $TypePattern

        Write "Queue before filter selection"
        $Queue
        $ResultQueue = @{}

        $Queue.GetEnumerator() | ForEach-Object { if ($_.Key -match $RegexString) { $ResultQueue[$_.Key] = $_.Value } }

        Write "ResultQueue after filter selection"
        $script:DownloadQueue = $ResultQueue
        $script:DownloadQueue.GetType().FullName
        $script:DownloadQueue

        Write "Types"
        $Types

        Write "Tracks"
        $Tracks
    }

    function Remove-DownloadedFilesFromQueue ([Object] $Queue, [string] $DownloadPath)
    {
        Write "Remove-DownloadedFilesFromQueue"
        if (($Queue -ne $null) -and (Test-Path $DownloadPath))
        {
            Compare-FileList -Reference $Queue -FilePath $DownloadPath
            Remove-FilesFromQueue -Current $Queue -Delta $script:RemoveList
        }
    }

    function Compare-FileList ([Object] $Reference, [string] $FilePath)
    {
        Write "Comparing Files Lists"
        if (Test-Path $FilePath)
        {
            # List the folder directly instead of changing the current location
            $LocalFiles = Get-ChildItem $FilePath | Select-Object Name, Length | Sort-Object Name
            $QueuedFiles = $Reference.Values | Select-Object Name, Length | Sort-Object Name

            Write "Files in Queue"
            $QueuedFiles

            Write "Local Files"
            $LocalFiles

            if ($LocalFiles -ne $null -and $QueuedFiles -ne $null)
            {
                # Files and their length that have matches in both lists
                $SyncRange = 100
                if ($script:ContentDictionary.Count -gt 100)
                {
                    $SyncRange = $script:ContentDictionary.Count
                }
                Write ("SyncWindow {0}" -f $SyncRange)
                $DownloadedFiles = Compare-Object -ReferenceObject $QueuedFiles -DifferenceObject $LocalFiles -Property Name, Length -IncludeEqual -ExcludeDifferent -PassThru -SyncWindow $SyncRange
                $script:RemoveList = $DownloadedFiles
                Write "Files that don't need to be downloaded anymore, because they are already downloaded!"
                $script:RemoveList
            }
            else
            {
                $script:RemoveList = $null
            }
        }
        else
        {
            throw ("FilePath: `"{0}`" not found" -f $FilePath)
        }
    }

    # Returns $true if the file exists and has exactly the expected length
    function Verify-DownloadedFile ([string] $FileName, [long] $ExpectedFileLength)
    {
        if (-not (Test-Path $FileName))
        {
            return $false
        }
        $File = Get-Item $FileName
        return (($File -ne $null) -and ($File.Length -eq $ExpectedFileLength))
    }

    function Analyze-ContentManifest([Object[]] $Manifest = $script:ContentManifest)
    {
        if ($Manifest -ne $null)
        {
            # Collect the track id and extension of every file, then reduce to distinct values
            $Manifest | ForEach-Object `
                -Begin { "Analyzing Content Manifest" } `
                -Process {
                    if ($_.Name -match '^(?<TRACK>.*?)(?<INDEX>\d{2})(?<EXT>\..*$)')
                    {
                        $script:ContentTracks += $matches.TRACK.ToUpper()
                        $script:ContentTypes += $matches.EXT.ToUpper()
                    }
                } `
                -End {
                    $script:ContentTracks = $script:ContentTracks | Sort-Object -Unique
                    $script:ContentTypes = $script:ContentTypes | Sort-Object -Unique
                }

            Write "ContentTracks"
            $script:ContentTracks
            Write "ContentTypes"
            $script:ContentTypes
        }
    }

    function Remove-FilesFromQueue ([Object] $Current = $script:DownloadQueue, [Object[]] $Delta)
    {
        if ($Delta -eq $null -or $Delta.Count -eq 0)
        {
            Write-Warning "Removing files from Download Queue. No files to remove!"
            return
        }
        if ($Current -eq $null)
        {
            throw "Removing files from Download Queue. Queue does not exist!"
        }

        if ($Current.Count -eq 0)
        {
            Write-Warning "Removing files from Download Queue. Queue is empty!"
            return
        }

        Write "Remove-FilesFromQueue"
        Write "Files to remove"
        $Delta

        Write "Queue before removal"
        $Current

        $Delta | ForEach-Object { $Current.Remove($_.Name) }

        Write "Queue after removal"
        $Current
    }

    # Main: load the metadata, validate the input, compile the queue and download
    Load-ContentManifestFromCSV
    Load-URLMappingFromXML
    Validate-Parameters
    Compile-DownloadQueue
    if ($WhatIf -eq $false)
    {
        Process-DownloadQueue -Queue $script:DownloadQueue
    }

Download

Get-PDC2008Content Script

Download the script and the content database here: Get-PDC2008Content.zip

Acceptance test script and test results

Download the acceptance test scripts and the test results here: AcceptanceTests.zip

Unit Coding scripts

Yes, Unit Coding instead of Unit Testing. Unfortunately, PowerShell as a programming language is still in its toddler stage. A professional IDE and other developer productivity tools are not available yet. This means that a lot of experimenting and debugging is needed to get a script right. During the development of this script I had to create many script units to test a language construct in isolation. These unit scripts can be downloaded here: ScriptStudies.zip

Outlook

This project is the core component of a set of three loosely coupled PowerShell scripts. The first and the last script are not yet developed. My vision is to have a script that searches the PDC2008 session descriptions for keywords and returns a list of session ids. The content for these ids can then be downloaded with the script that is the topic of this post. Once the content is downloaded, I would like to have a script that burns it onto a DVD or CD. When all scripts are finished, I would like to be able to search, for example, for the keywords "OSLO" and "Don Box", and my computer would then burn me a DVD with the PDC2008 resources that matched my search.

Es gibt viel zu tun. Packen wir's an! This sentence is in German and means: There is lots to do. Let's get started!

PowerShell | Regex

Comments

12/10/2008 4:03:18 AM #

KG

Pure PowerShell Magic!

KG United States |

12/10/2008 5:03:59 AM #

Bobby Ryan

Good Job!

Bobby Ryan United States |

12/10/2008 6:40:22 PM #

Amr Elsehemy

very Nice,
Well Done..

Amr Elsehemy Egypt |

12/11/2008 2:55:57 AM #

Gordon Bell

Great Work!  A little late for me though, since I manually downloaded most of the content already.

Gordon Bell United States |

12/11/2008 3:12:17 AM #

Jeffery Hicks

There are in fact professional PowerShell IDEs available.  PrimalScript (http://www.primalscript.com) has had PowerShell support since it was first released including a debugger which is quite a feat since PowerShell v1.0 doesn't ship with any debugger APIs. The forthcoming PrimalScript 2009 will offer even more.  

Still, I thought this was a killer script.

Jeffery Hicks
Microsoft PowerShell MVP
http://blog.sapien.com
follow me: http://www.twitter.com/JeffHicks

"Those who forget to script are doomed to repeat their work."

Now Available: Managing Active Directory With Windows PowerShell: TFM

Jeffery Hicks United States |

12/11/2008 3:23:26 PM #

mycall

Have you considered a MUCH easier approach?

1. Install WinHTTrack
2. download blogs.msdn.com/mswanson/pages/PDC2008Sessions.aspx
  a) include only .PPTX, .WMV, .ZIP and any other extensions
  b) depth can be a few levels

Yes, it isn't as fun as writing a PS1 script, but sometimes knowing great utilities is better WRT efficiency.

mycall United States |

About Klaus Graefensteiner

I like the programming of machines.

Klaus Graefensteiner
works as Developer In Test and is founder of the PowerShell Unit Testing Framework PSUnit. More...

Open Source Projects

PSUnit is a Unit Testing framework for PowerShell. It is designed for simplicity and hosted by CodePlex.
BlogShell is the tool for lazy developers who like to automate the composition of blog content during the writing of a blog post. It is hosted by CodePlex.

License:
Creative Commons License

Copyright:
© Copyright 2014, Klaus Graefensteiner.

Disclaimer:
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.