Renaming a series of indexed files with PowerShell

by Klaus Graefensteiner 17. July 2008 06:01

Introduction

A few weeks ago I wrote a blog post about an issue I ran into when renaming a series of files: How to rename batches of files using .NET Regular Expressions. In that post I ended up solving the problem with a C# Windows Forms application. My initial plan to use PowerShell was not feasible because of the steep learning curve of the new language. Now I am halfway through Bruce Payette's book Windows PowerShell in Action and I am finally able to solve the same problem in PowerShell.

Figure 1: First Test Flight with PowerShell

The War Story

The other day I was scanning an old high school yearbook with a Canon CanoScan LiDE 600F scanner (0302B002). The Toolbox software that came with the scanner numbers the files automatically. If you call your project e.g. “Abizeitung”, the first scan is saved as “Abizeitung_0001.jpg”, the 10th scan as “Abizeitung_0010.jpg” and so on. My goal was to have the index in the file name match the page number of the scanned page. After scanning 38 pages I made a mistake: I scanned the sheet with pages 37 and 38 twice. At this point the file index and the page numbers got out of sync. Page 41 was now “Abizeitung_0043.jpg”; the index of the files for page 39 and higher was shifted by 2. Only later, when I was about to scan pages 120 and 121, did I notice that I had screwed up the file numbering. I deleted the duplicate scans “Abizeitung_0039.jpg” and “Abizeitung_0040.jpg” and continued scanning. This time the CanoScan Toolbox software outsmarted me again by using the two now-empty file name slots to store the scans of pages 120 and 121. The following table shows the full extent of the scan screw-up:

File Name Index    Page Number    Comment
1                  1
2                  2
...                ...
37                 37
38                 38
39                 37, 120        Note 1
40                 38, 121        Note 2
41                 39
42                 40
...                ...
119                117
120                118
121                119            Note 3
122                122            Note 4
...                ...
136                136

Note 1: First I scanned page 37 twice, which created file index 39. Later, when I discovered that page number and file index were out of sync, I deleted the file with index 39. But the scanning tool reused the empty file index slot and filled it with the scan of page 120.

Note 2: First I scanned page 38 twice, which created file index 40. Later, when I discovered that page number and file index were out of sync, I deleted the file with index 40. But the scanning tool reused the empty file index slot and filled it with the scan of page 121.

Note 3: The scan of page 120 filled file index slot 39 and the scan of page 121 filled file index slot 40.

Note 4: Now the page number and the file index are in sync again.
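The mapping in the table can also be written down as a small function. This is only a sketch: the function name Get-ScannedPage is made up for illustration and is not part of the scripts in this post.

```powershell
# Hypothetical helper: returns the page number that is stored under a given
# file index after the scanning mistake described in the table.
function Get-ScannedPage([int]$Index)
{
    if ($Index -le 38)  { return $Index }        # index and page match
    if ($Index -le 40)  { return $Index + 81 }   # reused slots hold pages 120 and 121
    if ($Index -le 121) { return $Index - 2 }    # shifted by the two duplicate scans
    return $Index                                # index and page match again
}

39, 40, 41, 121, 122 | ForEach-Object { 'Index {0:d4} holds page {1}' -f $_, (Get-ScannedPage $_) }
```

The three ranges that are out of sync (39 to 40, 41 to 121, and the two reused slots) are exactly the ranges that have to be renamed later.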

Setting Up The Stage

To test my progress during the development of my PowerShell script, I needed a set of files that simulated the problem. These files can be generated using the following PowerShell script (GenerateSampleFiles.ps1).

    if (Test-Path $Home\Desktop\Scans) { Remove-Item $Home\Desktop\Scans\*.* }
    else { New-Item $Home\Desktop\Scans -ItemType Directory | Out-Null }
    # Always work inside the Scans folder, whether or not it already existed.
    Set-Location $Home\Desktop\Scans

    # Indexes 1-38: file index and page number match.
    1..38 | foreach {$FileName = 'scan{0:d4}.txt' -f $_; Set-Content -Path $FileName "Page $_"}
    # Indexes 39-121: the page number lags behind the file index by 2.
    39..121 | foreach {$FileName = 'scan{0:d4}.txt' -f $_; $PageNum = ([int] $_) - 2; Set-Content -Path $FileName "Page $PageNum"}
    # The duplicate scans get deleted and their slots are reused for pages 120 and 121.
    39..40 | foreach {$FileName = 'scan{0:d4}.txt' -f $_; Remove-Item -Path $FileName}
    39..40 | foreach {$FileName = 'scan{0:d4}.txt' -f $_; $PageNum = ([int] $_) + 81; Set-Content -Path $FileName "Page $PageNum"}
    # Indexes 122-136: file index and page number match again.
    122..136 | foreach {$FileName = 'scan{0:d4}.txt' -f $_; Set-Content -Path $FileName "Page $_"}

The script basically creates a set of text files with names like "scan0001.txt" and the content of the text files is like "Page 1".
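The file names are built with the format operator -f. As a small illustration (the variable name $names is only used for this example), the placeholder {0:d4} pads an integer with leading zeros to four digits:

```powershell
# The -f operator fills the placeholders of a .NET format string.
# {0:d4} formats the first argument as a decimal number with at least 4 digits.
$names = 1, 39, 136 | ForEach-Object { 'scan{0:d4}.txt' -f $_ }
$names    # scan0001.txt, scan0039.txt, scan0136.txt
```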

Verify the correctness of the result

The following PowerShell script "VerifyResult.ps1" automates checking the result of the script that actually renames the files.

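    # Start with a fresh result file.
    if (Test-Path result.txt) {Remove-Item result.txt}
    # Write one line per file: file name, two tabs, file content. Skip the result file itself.
    dir | where {$_.Name -ne 'result.txt'} | foreach {$a = Get-Content -Path $_.FullName; Add-Content -Path result.txt "$($_.Name)`t`t$a"}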
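VerifyResult.ps1 produces a list that still has to be read by a human. The comparison itself could also be scripted. The following is a minimal sketch, assuming the sample files from above (a 4-digit index in the name and "Page n" as content); the function name Get-IndexMismatch is hypothetical.

```powershell
# Hypothetical helper: lists every file whose 4-digit index differs from
# the page number stored in its first line ("Page <n>").
function Get-IndexMismatch([string]$Folder)
{
    Get-ChildItem -Path $Folder | ForEach-Object {
        if ($_.Name -match '(?<INDEX>\d{4})\.')
        {
            # Save the index before the next -match overwrites $matches.
            $index = [int]$matches.INDEX
            $content = Get-Content -LiteralPath $_.FullName | Select-Object -First 1
            if ($content -match '^Page (?<PAGE>\d+)$')
            {
                $page = [int]$matches.PAGE
                if ($page -ne $index) { '{0} contains page {1}' -f $_.Name, $page }
            }
        }
    }
}
```

Called as Get-IndexMismatch $Home\Desktop\Scans, it should print nothing once all files have been renamed correctly.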

Shift-FileIndex Cmdlet

The following script gets all child items of a specified folder and filters filenames using a Regular Expression. Only files that match the pattern Text + 4-Digit-Number + .Extension will be permitted. Also the integer value of the 4-Digit-Number part needs to be in the specified range. These files then will be renamed by incrementing or decrementing the integer value of the number part in the file name. If the number gets shifted to a higher value then the renaming starts with the highest number and ends with the lowest. If the number gets decremented the value shift begins with the file with the lowest number.

    function Shift-FileIndex
    {
        param ( [int]$firstIndex = $(throw "`$firstIndex is a required parameter"),
                [int]$lastIndex = $(throw "`$lastIndex is a required parameter"),
                [switch]$shiftDown,
                [int]$shiftBy = $(throw "`$shiftBy is a required parameter"))
        begin {
            if (($firstIndex -ge $lastIndex) -or ($firstIndex -le 0))
            {
                throw "`$firstIndex needs to be less than `$lastIndex and both need to be greater than 0!"
            }
            $pattern = '^(?<NAME>.*?)(?<INDEX>\d{4})(?<EXT>\..*$)'
            # One array slot per possible index; slots without a matching file stay $null.
            $fh = New-Object System.IO.FileInfo[] ($lastIndex + 1)
        }
        end {
            # Without a process block, $input holds every piped-in item here.
            $input | foreach-object {
                if ($_.Name -match $pattern)
                {
                    $index = [int]$matches.INDEX
                    if (($index -ge $firstIndex) -and ($index -le $lastIndex)) { $fh[$index] = $_ }
                }
            }
            # Shifting down starts at the lowest index, shifting up at the highest,
            # so a file is never renamed onto a name that is still waiting to be moved.
            if ($shiftDown) { $order = 0..$lastIndex; $delta = -$shiftBy }
            else { $order = $lastIndex..0; $delta = $shiftBy }
            foreach ($i in $order)
            {
                if ($fh[$i] -ne $null)
                {
                    $null = $fh[$i].Name -match $pattern
                    $NewFileName = '{0}{1:d4}{2}' -f $matches.NAME, ([int]$matches.INDEX + $delta), $matches.EXT
                    $NewFileName
                    Move-Item -LiteralPath $fh[$i].FullName -Destination (Join-Path $fh[$i].DirectoryName $NewFileName)
                }
            }
        }
    }
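To see what the regular expression in Shift-FileIndex does with a single file name, it helps to try it in isolation. The following sketch uses the file name from the war story; the variable names are chosen for this example only.

```powershell
$pattern = '^(?<NAME>.*?)(?<INDEX>\d{4})(?<EXT>\..*$)'

if ('Abizeitung_0043.jpg' -match $pattern)
{
    # -match fills the automatic variable $matches with the named groups.
    $name  = $matches.NAME          # Abizeitung_
    $index = [int]$matches.INDEX    # 43
    $ext   = $matches.EXT           # .jpg

    # Rebuild the file name with the index shifted down by 2.
    $newName = '{0}{1:d4}{2}' -f $name, ($index - 2), $ext
    $newName                        # Abizeitung_0041.jpg
}
```

The cast to [int] matters: without it the index would stay the string "0043" and could not be used for range checks or arithmetic.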

Usage

With the Shift-FileIndex Cmdlet I am now able to repair the file names. The following script shows the usage of the Cmdlet.

    dir $Home\Desktop\Scans | Shift-FileIndex -firstIndex 39 -lastIndex 40 -shiftBy 100
    dir $Home\Desktop\Scans | Shift-FileIndex -firstIndex 41 -lastIndex 121 -shiftDown -shiftBy 2
    dir $Home\Desktop\Scans | Shift-FileIndex -firstIndex 139 -lastIndex 140 -shiftDown -shiftBy 19
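The three calls first park pages 120 and 121 at the unused indexes 139 and 140, then close the gap, and finally move the parked pages to their proper place. The sequence can be checked without touching any files. The following is a sketch that simulates it on a hash table (file index as key, page number as value); Move-Index is a hypothetical in-memory stand-in for Shift-FileIndex, not part of the original scripts.

```powershell
# Build the broken state: file index -> page number.
$files = @{}
1..38    | ForEach-Object { $files[$_] = $_ }
39..40   | ForEach-Object { $files[$_] = $_ + 81 }
41..121  | ForEach-Object { $files[$_] = $_ - 2 }
122..136 | ForEach-Object { $files[$_] = $_ }

# Hypothetical stand-in for Shift-FileIndex: moves the entries in
# [First, Last] by Delta and stops if a target index is already taken.
function Move-Index([hashtable]$Table, [int]$First, [int]$Last, [int]$Delta)
{
    $order = $First..$Last
    if ($Delta -gt 0) { $order = $Last..$First }   # shifting up: highest index first
    foreach ($i in $order)
    {
        if ($Table.ContainsKey($i + $Delta)) { throw "Index $($i + $Delta) is already in use" }
        $Table[$i + $Delta] = $Table[$i]
        $Table.Remove($i)
    }
}

Move-Index -Table $files -First 39 -Last 40 -Delta 100       # park pages 120 and 121
Move-Index -Table $files -First 41 -Last 121 -Delta (-2)     # close the gap
Move-Index -Table $files -First 139 -Last 140 -Delta (-19)   # move the parked pages home

$outOfSync = @($files.Keys | Where-Object { $files[$_] -ne $_ })
"Entries still out of sync: $($outOfSync.Count)"
```

Running the second step before the first would fail in this simulation, because indexes 39 and 40 would still be occupied.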

Open issues

I haven't figured out yet how to sort a hash table by its keys when the keys are integers. Here is a short example. Piping the hash table to sort does not return the entries in numeric key order:

    # Sorting a hash table based on the key values as integers
    $a = @{}

    Set-Location -Path $Home\Desktop\scans

    $a[[int]1] = dir scan0001.txt
    $a[[int]11] = dir scan0011.txt
    $a[[int]3] = dir scan0003.txt

    $a | sort -Descending

    Name                           Value
    ----                           -----
    3                              C:\Documents and Settings\KlausG\Desktop\Scans\scan0003.txt
    1                              C:\Documents and Settings\KlausG\Desktop\Scans\scan0001.txt
    11                             C:\Documents and Settings\KlausG\Desktop\Scans\scan0011.txt

Download

The set of files discussed in this post can be downloaded here: MyFirstPowershellOneLiner.zip

Summary

I love PowerShell. I am only halfway through Bruce's book, but I am already getting a glimpse of its potential. I especially like its dynamic nature. Creating and extending types on the fly is quite exciting. Next I want to find a solution for my open issue with sorting hash tables. I think building a custom PSObject type with an integer NoteProperty and a NoteProperty that holds a FileInfo object might get me a few steps closer to solving the sorting challenge.
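As a rough idea of what such a custom object could look like, here is a minimal sketch. It is an assumption about how the idea might be implemented, not a finished solution: the property names Index and File are invented for this example, and the File property holds a plain file name instead of a FileInfo object so that the example runs without any files on disk.

```powershell
# Wrap each file name in an object that carries its index as a real integer.
$items = 'scan0001.txt', 'scan0011.txt', 'scan0003.txt' | ForEach-Object {
    $fileName = $_
    $item = New-Object PSObject
    # Strip all non-digits from the name and cast the rest to [int].
    $item | Add-Member -MemberType NoteProperty -Name Index -Value ([int]($fileName -replace '\D', ''))
    $item | Add-Member -MemberType NoteProperty -Name File -Value $fileName
    $item
}

# Sort-Object compares the Index property numerically because it is an [int].
$sorted = $items | Sort-Object Index -Descending
$sorted | ForEach-Object { '{0,4}  {1}' -f $_.Index, $_.File }
```

With the sample names above, the index 11 sorts before 3 and 1, which is the numeric order the hash table example does not deliver.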


Comments

7/18/2008 11:00:47 PM #

Jeffrey Snover

Very nice stuff Klaus!
You just "started" with PowerShell???? That's pretty impressive.  If that is your first script then I'm really looking forward to seeing your second one. :)
You should consider participating/contributing to PSCX http://www.codeplex.com/PowerShellCX

BTW - I blogged about your script:

blogs.msdn.com/.../shift-fileindex.aspx

Jeffrey Snover [MSFT]
Windows Management Partner Architect
Visit the Windows PowerShell Team blog at:    http://blogs.msdn.com/PowerShell
Visit the Windows PowerShell ScriptCenter at:  www.microsoft.com/.../msh.mspx

7/19/2008 7:01:04 AM #

Richard

To sort a hash table

PS> $a = @{}
PS> $a[[int]1] = "dir scan0001.txt"
PS> $a[[int]11] = "dir scan0011.txt"
PS> $a[[int]3] = "dir scan0003.txt"
PS> $a

Name                           Value
----                           -----
3                              dir scan0003.txt
1                              dir scan0001.txt
11                             dir scan0011.txt

use the getenumerator() method before the sort

PS> $a.getEnumerator() | Sort Key -Descending

Name                           Value
----                           -----
11                             dir scan0011.txt
3                              dir scan0003.txt
1                              dir scan0001.txt


PS> $a.getEnumerator() | Sort Name -Descending

Name                           Value
----                           -----
11                             dir scan0011.txt
3                              dir scan0003.txt
1                              dir scan0001.txt

Hope this helps


License:
Creative Commons License

Copyright:
© Copyright 2014, Klaus Graefensteiner.

Disclaimer:
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
