Meta-Programming with PowerShell and Regular Expressions

by Klaus Graefensteiner 25. November 2008 01:13

Introduction

I am almost done with my biggest PowerShell scripting project so far. The last feature I implemented was a flexible pattern matching facility: a procedure that filters a set of file names based on the first letters and the extension of each file name. The allowed name prefixes and extensions are provided as string arrays that can contain zero or more elements. The challenge, as always with PowerShell, is to understand its type conversion magic. One approach is to take baby steps while changing the static regex string into a dynamically built one. This blog post captures that progression and its result.

Figure 1: Pattern matching white board session

Regex Progression

The following PowerShell snippets show the steps of converting a static expression into a dynamically generated Regex. In this case the different prefixes are combined as alternations and so are the values of the file extensions.
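As a minimal sketch of the alternation idea, the function below turns a string array into a pattern such as (TL)|(PC). The name ConvertTo-Alternation and its parameters are hypothetical and not part of the original script; the sketch also assumes that the values should be matched literally, which is why it calls [regex]::Escape.

```powershell
# Hypothetical helper, not part of the original script: builds a regex
# alternation from a string array, or returns a fallback pattern when the
# array is null or empty.
function ConvertTo-Alternation {
    param(
        [string[]] $Values,
        [string]   $Fallback
    )

    if ($null -eq $Values -or $Values.Length -eq 0) {
        return $Fallback
    }

    # [regex]::Escape protects characters such as "." that have a regex meaning
    $escaped = $Values | ForEach-Object { '({0})' -f [regex]::Escape($_) }
    return ($escaped -join '|')
}

ConvertTo-Alternation -Values 'TL', 'PC' -Fallback '.*?'       # (TL)|(PC)
ConvertTo-Alternation -Values '.WMV', '.ZIP' -Fallback '\..*'  # (\.WMV)|(\.ZIP)
ConvertTo-Alternation -Values $null -Fallback '.*?'            # .*?
```

The fallback patterns are the same ones the listings in this post use when no prefix or extension is given.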

    [String[]] $Types = ".ZIP", ".WMV"
    #[String[]] $Tracks = "TL"
    #[String[]] $Tracks = $null
    [String[]] $Tracks = "TL", "PC"

    [String[]] $FileNames = "TL01.ZIP", "TL02.ZIP", "TL02.WMV", "TL02.PPTX",
                            "TL03.PPTX", "BB02.ZIP", "BB03.WMV", "BB03.PPTX",
                            "PC05.PPTX", "PC09.WMV", "PC09.PPTX"

    Write "Static Regex matches all`n`n"
    # Static Regex matches all
    $FileNames | ForEach-Object {
        ($_ -match  '^(?<TRACK>.*?)(?<INDEX>\d{2})(?<EXT>\..*)$');
        $matches.TRACK; $matches.EXT; $matches[0] }

    Write "Static Regex matches only TL sessions`n`n"
    # Static Regex matches only TL sessions
    # Note: $matches keeps the values of the last successful match, so the
    # values printed after a False result belong to an earlier file name
    $FileNames | ForEach-Object {
        ($_ -match  '^(?<TRACK>(TL))(?<INDEX>\d{2})(?<EXT>\..*)$');
        $matches.TRACK; $matches.EXT; $matches[0] }

    Write "Static Regex matches TL and PC sessions`n`n"
    # Static Regex matches only TL and PC sessions
    $FileNames | ForEach-Object {
        ($_ -match  '^(?<TRACK>((TL)|(PC)))(?<INDEX>\d{2})(?<EXT>\..*)$');
        $matches.TRACK; $matches.EXT; $matches[0] }

    Write "Static Regex matches TL and PC sessions and .WMV files`n`n"
    # Static Regex matches only TL and PC sessions and .WMV files
    $FileNames | ForEach-Object { $_ ;
        ($_ -match  '^(?<TRACK>((TL)|(PC)))(?<INDEX>\d{2})(?<EXT>(\.WMV))$');
        $matches.TRACK; $matches.EXT; $matches[0] }

    Write "Static Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"
    # Static Regex matches only TL and PC sessions and .WMV and .ZIP files
    $FileNames | ForEach-Object { $_ ;
        ($_ -match '^(?<TRACK>((TL)|(PC)))(?<INDEX>\d{2})(?<EXT>((\.WMV)|(\.ZIP)))$');
        $matches.TRACK; $matches.EXT; $matches[0] }

    Write "Static Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"
    # Static Regex matches only TL and PC sessions and .WMV and .ZIP files
    # Replaced ' with " in regex string
    $FileNames | ForEach-Object { $_ ;
        ($_ -match "^(?<TRACK>((TL)|(PC)))(?<INDEX>\d{2})(?<EXT>((\.WMV)|(\.ZIP)))$");
        $matches.TRACK; $matches.EXT; $matches[0] }

    Write "Static Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"
    # Static Regex matches only TL and PC sessions and .WMV and .ZIP files
    # Refactoring Track group and Extension group filter values by using the
    # format operator
    # The quantifier in \d{2} looks like a placeholder to -f, so the third
    # argument "{2}" replaces it with the literal text {2} again
    $RegexString = "^(?<TRACK>({0}))(?<INDEX>\d{2})(?<EXT>({1}))$" `
        -f "(TL)|(PC)", "(\.WMV)|(\.ZIP)", "{2}"
    $FileNames | ForEach-Object { $_ ; ($_ -match $RegexString  );
        $matches.TRACK; $matches.EXT; $matches[0] }

    Write "Static Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"
    # Static Regex matches only TL and PC sessions and .WMV and .ZIP files
    # Adding the filter variables
    $TrackFilter = "(TL)|(PC)"
    $TypeFilter = "(\.WMV)|(\.ZIP)"
    $RegexString = "^(?<TRACK>({0}))(?<INDEX>\d{2})(?<EXT>({1}))$" `
        -f $TrackFilter, $TypeFilter, "{2}"
    $FileNames | ForEach-Object { $_ ; ($_ -match $RegexString  );
        $matches.TRACK; $matches.EXT; $matches[0] }

    Write "Dynamic Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"
    # Dynamic Regex matches only TL and PC sessions and .WMV and .ZIP files
    # Building values of filter variables dynamically

    $TrackFilter = ""
    if ($null -eq $Tracks -or $Tracks.Length -eq 0)
    {
        # No track given: accept any prefix
        $TrackFilter = ".*?"
    }
    else
    {
        for($i=0; $i -lt $Tracks.Length; $i++)
        {
            # [regex]::Escape makes sure the value is matched literally
            $TrackFilter += "({0})" -f [regex]::Escape($Tracks[$i])
            if( $i -lt ($Tracks.Length - 1))
            {
                $TrackFilter += "|"
            }
        }
    }

    $TypeFilter = ""
    if ($null -eq $Types -or $Types.Length -eq 0)
    {
        # No type given: accept any extension
        $TypeFilter = "\..*"
    }
    else
    {
        for($i=0; $i -lt $Types.Length; $i++)
        {
            # [regex]::Escape turns ".ZIP" into "\.ZIP"
            $TypeFilter += "({0})" -f [regex]::Escape($Types[$i])
            if( $i -lt ($Types.Length - 1))
            {
                $TypeFilter += "|"
            }
        }
    }
    $TrackFilter
    $TypeFilter
    $RegexString = "^(?<TRACK>({0}))(?<INDEX>\d{2})(?<EXT>({1}))$" `
        -f $TrackFilter, $TypeFilter, "{2}"
    $FileNames | ForEach-Object { $_ ; ($_ -match $RegexString  );
        $matches.TRACK; $matches.EXT; $matches[0] }

    Write "Dynamic Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"
    # Dynamic Regex matches only TL and PC sessions and .WMV and .ZIP files
    # Adding the matches to an array of strings

    $TrackFilter = ""
    if ($null -eq $Tracks -or $Tracks.Length -eq 0)
    {
        $TrackFilter = ".*?"
    }
    else
    {
        for($i=0; $i -lt $Tracks.Length; $i++)
        {
            $TrackFilter += "({0})" -f [regex]::Escape($Tracks[$i])
            if( $i -lt ($Tracks.Length - 1))
            {
                $TrackFilter += "|"
            }
        }
    }

    $TypeFilter = ""
    if ($null -eq $Types -or $Types.Length -eq 0)
    {
        $TypeFilter = "\..*"
    }
    else
    {
        for($i=0; $i -lt $Types.Length; $i++)
        {
            $TypeFilter += "({0})" -f [regex]::Escape($Types[$i])
            if( $i -lt ($Types.Length - 1))
            {
                $TypeFilter += "|"
            }
        }
    }
    $TrackFilter
    $TypeFilter

    $RegexString = "^(?<TRACK>({0}))(?<INDEX>\d{2})(?<EXT>({1}))$" `
        -f $TrackFilter, $TypeFilter, "{2}"
    # Where-Object keeps only the file names for which -match returns True
    $FilterMatches = $FileNames | Where-Object {$_ -match $RegexString}
    Write "FilterMatches"
    $FilterMatches
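The listing above passes "{2}" as a third argument so that the quantifier in \d{2} survives the format operator. As an alternative sketch, .NET composite formatting also treats doubled braces as literal braces, so the quantifier can be written as {{2}} and the extra argument dropped. The variable names $viaArgument and $viaEscaping are only used for this illustration.

```powershell
$TrackFilter = '(TL)|(PC)'
$TypeFilter  = '(\.WMV)|(\.ZIP)'

# Variant used in the listing: "{2}" is handed in as a third argument
$viaArgument = '^(?<TRACK>({0}))(?<INDEX>\d{2})(?<EXT>({1}))$' -f $TrackFilter, $TypeFilter, '{2}'

# Variant with doubled braces: {{2}} is emitted as the literal text {2}
$viaEscaping = '^(?<TRACK>({0}))(?<INDEX>\d{{2}})(?<EXT>({1}))$' -f $TrackFilter, $TypeFilter

$viaArgument -eq $viaEscaping     # True, both variants build the same pattern
'TL02.WMV' -match $viaEscaping    # True
'BB02.ZIP' -match $viaEscaping    # False, BB is not one of the tracks
```

Which variant reads better is a matter of taste; the doubled braces keep the argument list limited to the values that actually vary.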

The Final Script

And here is the final solution.

    [String[]] $Types = ".ZIP", ".WMV"
    [String[]] $Tracks = "TL", "PC"

    [String[]] $FileNames = "TL01.ZIP", "TL02.ZIP", "TL02.WMV", "TL02.PPTX",
                            "TL03.PPTX", "BB02.ZIP", "BB03.WMV", "BB03.PPTX",
                            "PC05.PPTX", "PC09.WMV", "PC09.PPTX"

    Write "Dynamic Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"

    # Build the alternation of track prefixes, e.g. (TL)|(PC)
    $TrackFilter = ""
    if ($null -eq $Tracks -or $Tracks.Length -eq 0)
    {
        # No track given: accept any prefix
        $TrackFilter = ".*?"
    }
    else
    {
        for($i=0; $i -lt $Tracks.Length; $i++)
        {
            # [regex]::Escape makes sure the value is matched literally
            $TrackFilter += "({0})" -f [regex]::Escape($Tracks[$i])
            if( $i -lt ($Tracks.Length - 1))
            {
                $TrackFilter += "|"
            }
        }
    }

    # Build the alternation of file extensions, e.g. (\.ZIP)|(\.WMV)
    $TypeFilter = ""
    if ($null -eq $Types -or $Types.Length -eq 0)
    {
        # No type given: accept any extension
        $TypeFilter = "\..*"
    }
    else
    {
        for($i=0; $i -lt $Types.Length; $i++)
        {
            # [regex]::Escape turns ".ZIP" into "\.ZIP"
            $TypeFilter += "({0})" -f [regex]::Escape($Types[$i])
            if( $i -lt ($Types.Length - 1))
            {
                $TypeFilter += "|"
            }
        }
    }
    $TrackFilter
    $TypeFilter

    # The third argument "{2}" restores the quantifier of \d{2}
    $RegexString = "^(?<TRACK>({0}))(?<INDEX>\d{2})(?<EXT>({1}))$" `
        -f $TrackFilter, $TypeFilter, "{2}"
    $FilterMatches = $FileNames | Where-Object {$_ -match $RegexString}
    Write "FilterMatches"
    $FilterMatches
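The final script only keeps the matching file names. If the captured TRACK, INDEX and EXT groups are needed as well, one possible sketch is shown below. It reads $Matches only after a successful match, because PowerShell leaves $Matches untouched when -match returns False. The shortened file list, the hard-coded pattern and the variable $Parsed are assumptions made for this illustration.

```powershell
[string[]] $FileNames = 'TL01.ZIP', 'TL02.WMV', 'BB02.ZIP', 'PC09.WMV', 'PC09.PPTX'
$RegexString = '^(?<TRACK>((TL)|(PC)))(?<INDEX>\d{2})(?<EXT>((\.WMV)|(\.ZIP)))$'

$Parsed = foreach ($name in $FileNames) {
    # $Matches is only refreshed by a successful -match, so read it inside the if
    if ($name -match $RegexString) {
        New-Object PSObject -Property @{
            Name  = $Matches[0]
            Track = $Matches['TRACK']
            Index = $Matches['INDEX']
            Ext   = $Matches['EXT']
        }
    }
}

# One row per matching file: TL01.ZIP, TL02.WMV and PC09.WMV
$Parsed | Select-Object Name, Track, Index, Ext
```

Returning objects instead of plain strings makes it easy to group or sort the result by track or extension later on.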

Download

The script files can be downloaded here: DynamicRegexFilterForFileNames.zip

Outlook

There is quite a bit of magic in PowerShell. I am still not quite sure how to wave the magic wand correctly, but I am getting there. Most importantly, I still enjoy exploring its potential.

PowerShell | Regex

Comments

vardis (United Kingdom), 12/2/2008 8:59:34 PM:

Good work - keep it up.

About Klaus Graefensteiner

I like the programming of machines.

Klaus Graefensteiner works as a Developer in Test and is the founder of the PowerShell unit testing framework PSUnit.

License:
Creative Commons License

Copyright:
© Copyright 2014, Klaus Graefensteiner.
