Introduction
A few weeks ago I wrote a blog post about an issue I ran into when renaming a series of files: How to rename batches of files using .NET Regular Expressions. In that post I ended up solving the problem with a C# Windows Forms application. My initial plan to use PowerShell was not feasible because of the steep learning curve of the new language. Now I am halfway through Bruce Payette's book Windows PowerShell in Action and I am finally able to solve the same problem in PowerShell.
Figure 1: First Test Flight with Powershell
The War Story
The other day I was trying to scan an old high school yearbook. I used a Canon CanoScan LiDE 600F scanner (0302B002). The Toolbox software that came with the scanner indexes the file names automatically. If you call your project e.g. “Abizeitung”, then the file of the first scan is called “Abizeitung_0001.jpg”, the file of the 10th scan is called “Abizeitung_0010.jpg” and so on. My goal was to have the index in the file name match the page number of the scanned page. After scanning 38 pages I made a mistake: I scanned the sheet with pages 37 and 38 a second time. From this point on the file index and the page numbers were out of sync. Page 39 was now “Abizeitung_0041.jpg”, and the index of every later page was shifted by 2 as well. Only later, when I was about to scan pages 120 and 121, did I notice that I had screwed up the file numbering. I deleted the duplicate scans “Abizeitung_0039.jpg” and “Abizeitung_0040.jpg” and continued scanning. This time the CanoScan Toolbox software outsmarted me again by using the two now-empty file name slots to store the scans of pages 120 and 121. The following table shows the full extent of the scan screw-up:
| File Name Index | Page Number | Comment |
| --- | --- | --- |
| 1 | 1 | |
| 2 | 2 | |
| ... | ... | |
| 37 | 37 | |
| 38 | 38 | |
| 39 | 37, 120 | First I scanned page 37 twice, which created file index 39. Later, when I discovered that page number and file index were out of synch, I deleted the file with index 39, but the scanning tool reused the empty slot for the scan of page 120. |
| 40 | 38, 121 | First I scanned page 38 twice, which created file index 40. Later, when I discovered that page number and file index were out of synch, I deleted the file with index 40, but the scanning tool reused the empty slot for the scan of page 121. |
| 41 | 39 | |
| 42 | 40 | |
| ... | ... | |
| 119 | 117 | |
| 120 | 118 | |
| 121 | 119 | The scan of page 120 filled up file index slot 39 and the scan of page 121 filled up file index slot 40. |
| 122 | 122 | Now the page number and the file index are in synch again. |
| ... | ... | |
| 136 | 136 | |
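The mapping in the table above can also be written down as a small function. This is only a sketch for illustration: the function name Get-ScannedPage is hypothetical and not part of the scripts in this post, and the boundaries are taken directly from the table.

```powershell
# Hypothetical helper: returns the page number stored under a given file index,
# following the table above (136 scans in total).
function Get-ScannedPage([int]$fileIndex)
{
    if (($fileIndex -lt 1) -or ($fileIndex -gt 136)) { throw "File index out of range: $fileIndex" }
    if ($fileIndex -le 38)  { return $fileIndex }        # index and page in synch
    if ($fileIndex -le 40)  { return $fileIndex + 81 }   # reused slots hold pages 120 and 121
    if ($fileIndex -le 121) { return $fileIndex - 2 }    # shifted by the duplicate scan
    return $fileIndex                                    # in synch again
}

# Print the mapping around the interesting boundaries.
38, 39, 40, 41, 121, 122 | foreach { '{0:d4} -> page {1}' -f $_, (Get-ScannedPage $_) }
```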
Setting Up The Stage
To test my progress during the development of my PowerShell script, I needed a set of files that simulated the problem. These files can be generated using the following PowerShell script (GenerateSampleFiles.ps1).
# Start with an empty Scans folder on the desktop and make it the current location.
if (Test-Path $Home\Desktop\Scans) {
    Remove-Item $Home\Desktop\Scans\*.*
} else {
    New-Item $Home\Desktop\Scans -ItemType directory | Out-Null
}
Set-Location $Home\Desktop\Scans

# Indexes 1-38: file index and page number match.
1..38 | foreach {$FileName = 'scan{0:d4}.txt' -f $_; set-content -path $FileName "Page $_"}
# Indexes 39-121: the duplicate scan puts every index 2 ahead of its page number.
39..121 | foreach {$FileName = 'scan{0:d4}.txt' -f $_; $PageNum = ([int] $_) - 2; set-content -path $FileName "Page $PageNum"}
# Delete the duplicates; the scanner then reuses the free slots for pages 120 and 121.
39..40 | foreach {$FileName = 'scan{0:d4}.txt' -f $_; remove-item -path $FileName}
39..40 | foreach {$FileName = 'scan{0:d4}.txt' -f $_; $PageNum = ([int] $_) + 81; set-content -path $FileName "Page $PageNum"}
# Indexes 122-136: file index and page number match again.
122..136 | foreach {$FileName = 'scan{0:d4}.txt' -f $_; set-content -path $FileName "Page $_"}
The script basically creates a set of text files with names like "scan0001.txt" and the content of the text files is like "Page 1".
Verify the correctness of the result
The following PowerShell script "VerifyResult.ps1" automates checking the result of the script that actually renames the files. It writes every file name in the current folder, together with the content of that file, to result.txt.
if (Test-Path result.txt) {Remove-Item result.txt}
dir | foreach {$a = get-content -path $_; add-content -path result.txt "$_`t`t$a"}
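As a possible extension, the comparison of file index and page number could itself be scripted instead of reading result.txt by eye. The following is a minimal sketch that assumes the sample files described above, where each file contains a single line such as "Page 1". The function name Test-ScanOrder is hypothetical.

```powershell
# Hypothetical helper: lists every file in $path whose four-digit index
# differs from the page number stored in its content ("Page N").
function Test-ScanOrder([string]$path)
{
    foreach ($file in (Get-ChildItem $path))
    {
        if ($file.Name -match '(?<INDEX>\d{4})\.[^.]+$')
        {
            $index = [int]$matches.INDEX
            $firstLine = Get-Content -Path $file.FullName | Select-Object -First 1
            if (($firstLine -match '^Page (?<PAGE>\d+)$') -and ([int]$matches.PAGE -ne $index))
            {
                '{0} contains page {1}' -f $file.Name, $matches.PAGE
            }
        }
    }
}
```

Called as Test-ScanOrder $Home\Desktop\Scans, it should print nothing once all files have been renamed correctly.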
Shift-FileIndex Cmdlet
The following script takes the child items of a folder from the pipeline and filters the file names using a regular expression. Only files that match the pattern Text + 4-Digit-Number + .Extension are processed, and the integer value of the 4-Digit-Number part also needs to be in the specified range. These files are then renamed by incrementing or decrementing the number part of the file name. If the number is shifted to a higher value, the renaming starts with the highest number and ends with the lowest. If the number is decremented, the renaming begins with the file with the lowest number. This order makes sure that a renamed file never collides with a file that still has to be renamed.
function Shift-FileIndex
{
    param ( [int]$firstIndex=$(throw "`$firstIndex is a required parameter"),
            [int]$lastIndex=$(throw "`$lastIndex is a required parameter"),
            [switch]$shiftDown,
            [int]$shiftBy=$(throw "`$shiftBy is a required parameter"))
    begin {
        if (($firstIndex -lt $lastIndex) -and ($firstIndex -gt 0) -and ($lastIndex -gt 1))
        {
            # Slot i of this array holds the file whose index is i.
            $size = $lastIndex + 1
            $fh = New-Object System.IO.FileInfo[] $size
        }
        else
        {
            throw "`$firstIndex needs to be less than `$lastIndex and both must be positive!"
        }
        $pattern = '^(?<NAME>.*?)(?<INDEX>\d{4})(?<EXT>\..*$)'
    }
    process {
        # Only collect here; the renaming has to wait until all files have arrived.
        if (($_.Name -match $pattern) -and ([int]$matches.INDEX -ge $firstIndex) -and ([int]$matches.INDEX -le $lastIndex))
        {
            $fh[[int]$matches.INDEX] = $_
        }
    }
    end {
        # Shifting down starts with the lowest index, shifting up with the highest.
        if ($shiftDown) { $order = 0..($fh.Length - 1); $offset = -$shiftBy }
        else { $order = ($fh.Length - 1)..0; $offset = $shiftBy }
        foreach ($i in $order)
        {
            if ($fh[$i] -ne $null)
            {
                $null = $fh[$i].Name -match $pattern
                $NewFileName = '{0}{1:d4}{2}' -f $matches.NAME, (([int]$matches.INDEX) + $offset), $matches.EXT
                $NewFileName
                Move-Item -LiteralPath $fh[$i].FullName (Join-Path $fh[$i].DirectoryName $NewFileName)
            }
        }
    }
}
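To make the pattern Text + 4-Digit-Number + .Extension more concrete, here is a small sketch that applies the same regular expression to a single file name and rebuilds the name with a shifted index. The sample name and the offset of 2 are only illustrative.

```powershell
$pattern = '^(?<NAME>.*?)(?<INDEX>\d{4})(?<EXT>\..*$)'
$name = 'Abizeitung_0043.jpg'

if ($name -match $pattern)
{
    # The named groups become keys of the automatic $matches hash table:
    # NAME = 'Abizeitung_', INDEX = '0043', EXT = '.jpg'
    $newName = '{0}{1:d4}{2}' -f $matches.NAME, ([int]$matches.INDEX - 2), $matches.EXT
    $newName    # Abizeitung_0041.jpg
}
```

The d4 format specifier pads the shifted number back to four digits, so the new name keeps the same layout as the old one.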
Usage
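With the Shift-FileIndex Cmdlet I am now able to repair the file names. The following script shows the usage of the Cmdlet: the first call moves the scans of pages 120 and 121 out of the way to the indexes 139 and 140, the second call shifts the indexes 41 to 121 down by 2, and the third call moves the two parked files down to their final indexes 120 and 121.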
dir $Home\Desktop\Scans | Shift-FileIndex -firstIndex 39 -lastIndex 40 -shiftBy 100
dir $Home\Desktop\Scans | Shift-FileIndex -firstIndex 41 -lastIndex 121 -shiftDown -shiftBy 2
dir $Home\Desktop\Scans | Shift-FileIndex -firstIndex 139 -lastIndex 140 -shiftDown -shiftBy 19
Open issues
I haven't figured out yet how to sort a hash table by its keys when the keys are integers. Here is a short example. As the output shows, the entries do not come out in the numeric order of their keys.
# Sorting a hash table based on the key values as integers
$a = @{}

Set-Location -Path $Home\Desktop\scans

$a[[int]1] = dir scan0001.txt
$a[[int]11] = dir scan0011.txt
$a[[int]3] = dir scan0003.txt

$a | sort -Descending

Name  Value
----  -----
3     C:\Documents and Settings\KlausG\Desktop\Scans\scan0003.txt
1     C:\Documents and Settings\KlausG\Desktop\Scans\scan0001.txt
11    C:\Documents and Settings\KlausG\Desktop\Scans\scan0011.txt
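One possible way around this, offered as a sketch rather than a definitive answer: a hash table that is piped to sort travels down the pipeline as one single object, so there is nothing to reorder. Calling GetEnumerator() emits one key/value entry at a time, and because the keys here are already integers, sorting on the Key property should then be numeric. The strings below are stand-ins for the FileInfo objects so that the snippet runs without the sample files.

```powershell
$a = @{}
$a[[int]1]  = 'scan0001.txt'
$a[[int]11] = 'scan0011.txt'
$a[[int]3]  = 'scan0003.txt'

# Enumerate the entries first, then sort them by their integer key.
$sorted = $a.GetEnumerator() | Sort-Object -Property Key -Descending
$sorted | ForEach-Object { '{0} {1}' -f $_.Key, $_.Value }
```

With the three entries above, the keys should come out in the order 11, 3, 1.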
Download
The set of files discussed in this post can be downloaded here: MyFirstPowershellOneLiner.zip
Summary
I love PowerShell. I am only halfway through Bruce's book, but I am already getting a glimpse of its potential. I especially like the dynamic aspect of it. Creating and extending types on the fly is quite exciting. Next I want to find a solution for my open issue with sorting hash tables. I think building a custom PSObject type with an integer NoteProperty and a NoteProperty that holds a FileInfo object might get me a few steps closer to solving the sorting challenge.
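The PSObject idea could look roughly like the following sketch. The property names Index and File are hypothetical, and plain file name strings stand in for the FileInfo objects so that the snippet runs without the sample files.

```powershell
# Build one object per file with an integer Index and a File property.
$items = 'scan0001.txt', 'scan0011.txt', 'scan0003.txt' | ForEach-Object {
    $item = New-Object PSObject
    # Characters 4-7 of the name are the four-digit index, e.g. '0011' -> 11.
    $item | Add-Member -MemberType NoteProperty -Name Index -Value ([int]$_.Substring(4, 4))
    $item | Add-Member -MemberType NoteProperty -Name File -Value $_
    $item
}

# Index is a real integer, so the sort is numeric rather than alphabetical.
$items | Sort-Object -Property Index -Descending | ForEach-Object { $_.File }
```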