Meta-Programming with PowerShell and Regular Expressions

by Klaus Graefensteiner 11/25/2008 12:13:33 AM

Introduction

I am almost done with my biggest PowerShell scripting project so far. The last feature I implemented was a flexible pattern matching facility: a procedure that filters a set of file names by the first letters of the name and by the file extension. The allowed name prefixes and extensions are supplied as string arrays that can contain zero or more elements. The challenge, as always with PowerShell, is to understand its type conversion magic. One way to approach it is to take baby steps, changing the static regex string into a dynamically built regex string a little at a time. This blog post captures that progression and the result.

Figure 1: Pattern matching white board session

Regex Progression

The following PowerShell snippets show the steps of converting a static expression into a dynamically generated regex. The track prefixes are combined into one alternation, and the file extensions into another.
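
One note before the listing: every step relies on the automatic $matches variable, which -match fills only when a match succeeds. The sketch below is an illustration under my own assumptions (the pattern and the two sample names are hypothetical, not taken from the project) of how a failed match leaves the previous values in place.

```powershell
# Hypothetical pattern and sample names, chosen only for this illustration.
$pattern = '^(?<TRACK>[A-Z]+)(?<INDEX>\d{2})(?<EXT>\.\w+)$'

$first = 'TL01.ZIP' -match $pattern      # succeeds and fills $matches
$trackAfterFirst = $matches.TRACK        # 'TL'

$second = 'notes.txt' -match $pattern    # fails; $matches is left untouched
$trackAfterSecond = $matches.TRACK       # still 'TL' from the earlier match

"{0} {1} {2} {3}" -f $first, $trackAfterFirst, $second, $trackAfterSecond
```

This is why it is safer to read $matches only after checking the Boolean result of -match.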

    [String[]] $Types = ".ZIP", ".WMV"
    #[String[]] $Tracks = "TL"
    #[String[]] $Tracks = $null
    [String[]] $Tracks = "TL", "PC"

    [String[]] $FileNames = "TL01.ZIP", "TL02.ZIP", "TL02.WMV", "TL02.PPTX",
                            "TL03.PPTX", "BB02.ZIP", "BB03.WMV", "BB03.PPTX",
                            "PC05.PPTX", "PC09.WMV", "PC09.PPTX"

    Write-Output "Static Regex matches all`n`n"
    # Static regex matches all
    $FileNames | ForEach-Object {
        $isMatch = $_ -match '^(?<TRACK>.*?)(?<INDEX>\d{2})(?<EXT>\..*)$'
        $isMatch
        # $matches keeps the last successful match, so read it only on success
        if ($isMatch) { $matches.TRACK; $matches.EXT; $matches[0] }
    }

    Write-Output "Static Regex matches only TL sessions`n`n"
    # Static regex matches only TL sessions
    $FileNames | ForEach-Object {
        $isMatch = $_ -match '^(?<TRACK>(TL))(?<INDEX>\d{2})(?<EXT>\..*)$'
        $isMatch
        if ($isMatch) { $matches.TRACK; $matches.EXT; $matches[0] }
    }

    Write-Output "Static Regex matches TL and PC sessions`n`n"
    # Static regex matches only TL and PC sessions
    $FileNames | ForEach-Object {
        $isMatch = $_ -match '^(?<TRACK>((TL)|(PC)))(?<INDEX>\d{2})(?<EXT>\..*)$'
        $isMatch
        if ($isMatch) { $matches.TRACK; $matches.EXT; $matches[0] }
    }

    Write-Output "Static Regex matches TL and PC sessions and .WMV files`n`n"
    # Static regex matches only TL and PC sessions and .WMV files
    $FileNames | ForEach-Object { $_
        $isMatch = $_ -match '^(?<TRACK>((TL)|(PC)))(?<INDEX>\d{2})(?<EXT>(\.WMV))$'
        $isMatch
        if ($isMatch) { $matches.TRACK; $matches.EXT; $matches[0] }
    }

    Write-Output "Static Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"
    # Static regex matches only TL and PC sessions and .WMV and .ZIP files
    $FileNames | ForEach-Object { $_
        $isMatch = $_ -match '^(?<TRACK>((TL)|(PC)))(?<INDEX>\d{2})(?<EXT>((\.WMV)|(\.ZIP)))$'
        $isMatch
        if ($isMatch) { $matches.TRACK; $matches.EXT; $matches[0] }
    }

    Write-Output "Static Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"
    # Static regex matches only TL and PC sessions and .WMV and .ZIP files
    # Replaced ' with " in the regex string
    $FileNames | ForEach-Object { $_
        $isMatch = $_ -match "^(?<TRACK>((TL)|(PC)))(?<INDEX>\d{2})(?<EXT>((\.WMV)|(\.ZIP)))$"
        $isMatch
        if ($isMatch) { $matches.TRACK; $matches.EXT; $matches[0] }
    }

    Write-Output "Static Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"
    # Static regex matches only TL and PC sessions and .WMV and .ZIP files
    # Refactoring the track group and extension group filter values by using
    # the format operator.
    # -f reads the {2} in \d{2} as a placeholder, so the third argument
    # replaces {2} with the literal text {2}.
    $RegexString = "^(?<TRACK>({0}))(?<INDEX>\d{2})(?<EXT>({1}))$" `
        -f "(TL)|(PC)", "(\.WMV)|(\.ZIP)", "{2}"
    $FileNames | ForEach-Object { $_
        $isMatch = $_ -match $RegexString
        $isMatch
        if ($isMatch) { $matches.TRACK; $matches.EXT; $matches[0] }
    }

    Write-Output "Static Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"
    # Static regex matches only TL and PC sessions and .WMV and .ZIP files
    # Adding the filter variables
    $TrackFilter = "(TL)|(PC)"
    $TypeFilter = "(\.WMV)|(\.ZIP)"
    $RegexString = "^(?<TRACK>({0}))(?<INDEX>\d{2})(?<EXT>({1}))$" `
        -f $TrackFilter, $TypeFilter, "{2}"
    $FileNames | ForEach-Object { $_
        $isMatch = $_ -match $RegexString
        $isMatch
        if ($isMatch) { $matches.TRACK; $matches.EXT; $matches[0] }
    }

    Write-Output "Dynamic Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"
    # Dynamic regex matches only TL and PC sessions and .WMV and .ZIP files
    # Building the values of the filter variables dynamically

    $TrackFilter = ""
    # Put $null on the left: with an array on the left, -eq filters elements
    if ($null -eq $Tracks -or $Tracks.Length -eq 0)
    {
        # No tracks given: accept any prefix
        $TrackFilter = ".*?"
    }
    else
    {
        for ($i = 0; $i -lt $Tracks.Length; $i++)
        {
            $TrackFilter += "({0})" -f $Tracks[$i]
            if ($i -lt ($Tracks.Length - 1))
            {
                $TrackFilter += "|"
            }
        }
    }

    $TypeFilter = ""
    if ($null -eq $Types -or $Types.Length -eq 0)
    {
        # No types given: accept any extension
        $TypeFilter = "\..*"
    }
    else
    {
        for ($i = 0; $i -lt $Types.Length; $i++)
        {
            # The backslash escapes the leading dot of each extension
            $TypeFilter += "(\{0})" -f $Types[$i]
            if ($i -lt ($Types.Length - 1))
            {
                $TypeFilter += "|"
            }
        }
    }
    $TrackFilter
    $TypeFilter
    $RegexString = "^(?<TRACK>({0}))(?<INDEX>\d{2})(?<EXT>({1}))$" `
        -f $TrackFilter, $TypeFilter, "{2}"
    $FileNames | ForEach-Object { $_
        $isMatch = $_ -match $RegexString
        $isMatch
        if ($isMatch) { $matches.TRACK; $matches.EXT; $matches[0] }
    }

    Write-Output "Dynamic Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"
    # Dynamic regex matches only TL and PC sessions and .WMV and .ZIP files
    # Adding the matches to an array of strings

    $TrackFilter = ""
    if ($null -eq $Tracks -or $Tracks.Length -eq 0)
    {
        $TrackFilter = ".*?"
    }
    else
    {
        for ($i = 0; $i -lt $Tracks.Length; $i++)
        {
            $TrackFilter += "({0})" -f $Tracks[$i]
            if ($i -lt ($Tracks.Length - 1))
            {
                $TrackFilter += "|"
            }
        }
    }

    $TypeFilter = ""
    if ($null -eq $Types -or $Types.Length -eq 0)
    {
        $TypeFilter = "\..*"
    }
    else
    {
        for ($i = 0; $i -lt $Types.Length; $i++)
        {
            $TypeFilter += "(\{0})" -f $Types[$i]
            if ($i -lt ($Types.Length - 1))
            {
                $TypeFilter += "|"
            }
        }
    }
    $TrackFilter
    $TypeFilter

    $RegexString = "^(?<TRACK>({0}))(?<INDEX>\d{2})(?<EXT>({1}))$" `
        -f $TrackFilter, $TypeFilter, "{2}"
    # Where-Object keeps only the file names for which -match returns True
    $FilterMatches = $FileNames | Where-Object { $_ -match $RegexString }
    Write-Output "FilterMatches"
    $FilterMatches
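
The listing passes "{2}" as a third argument so that the {2} quantifier in \d{2} survives the format operator. A possible alternative, sketched here with the same filter values, is to double the braces: -f uses .NET composite formatting, where {{ and }} stand for literal braces. The variable names $viaThirdArgument and $viaDoubledBraces are mine, introduced only for the comparison.

```powershell
$TrackFilter = "(TL)|(PC)"
$TypeFilter  = "(\.WMV)|(\.ZIP)"

# The listing's approach: {2} is a live placeholder, filled with the text "{2}".
$viaThirdArgument = '^(?<TRACK>({0}))(?<INDEX>\d{2})(?<EXT>({1}))$' -f $TrackFilter, $TypeFilter, '{2}'

# Alternative: {{ and }} are literal braces to -f, so no third argument is needed.
$viaDoubledBraces = '^(?<TRACK>({0}))(?<INDEX>\d{{2}})(?<EXT>({1}))$' -f $TrackFilter, $TypeFilter

# Both strings should be identical.
$viaThirdArgument -eq $viaDoubledBraces
```

Either form produces the same regex string; the doubled braces just keep the argument list limited to the values that actually vary.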

The Final Script

And here is the final solution.

    [String[]] $Types = ".ZIP", ".WMV"
    [String[]] $Tracks = "TL", "PC"

    [String[]] $FileNames = "TL01.ZIP", "TL02.ZIP", "TL02.WMV", "TL02.PPTX",
                            "TL03.PPTX", "BB02.ZIP", "BB03.WMV", "BB03.PPTX",
                            "PC05.PPTX", "PC09.WMV", "PC09.PPTX"

    Write-Output "Dynamic Regex matches TL and PC sessions and .WMV and .ZIP files`n`n"

    $TrackFilter = ""
    # Put $null on the left: with an array on the left, -eq filters elements
    if ($null -eq $Tracks -or $Tracks.Length -eq 0)
    {
        # No tracks given: accept any prefix
        $TrackFilter = ".*?"
    }
    else
    {
        for ($i = 0; $i -lt $Tracks.Length; $i++)
        {
            $TrackFilter += "({0})" -f $Tracks[$i]
            if ($i -lt ($Tracks.Length - 1))
            {
                $TrackFilter += "|"
            }
        }
    }

    $TypeFilter = ""
    if ($null -eq $Types -or $Types.Length -eq 0)
    {
        # No types given: accept any extension
        $TypeFilter = "\..*"
    }
    else
    {
        for ($i = 0; $i -lt $Types.Length; $i++)
        {
            # The backslash escapes the leading dot of each extension
            $TypeFilter += "(\{0})" -f $Types[$i]
            if ($i -lt ($Types.Length - 1))
            {
                $TypeFilter += "|"
            }
        }
    }
    $TrackFilter
    $TypeFilter

    # The third argument replaces the {2} in \d{2} with the literal text {2}
    $RegexString = "^(?<TRACK>({0}))(?<INDEX>\d{2})(?<EXT>({1}))$" `
        -f $TrackFilter, $TypeFilter, "{2}"
    $FilterMatches = $FileNames | Where-Object { $_ -match $RegexString }
    Write-Output "FilterMatches"
    $FilterMatches
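
The two filter loops do the same job, so they could be folded into one helper. The following is only a sketch of that idea, not part of the downloadable script: New-AlternationPattern is a hypothetical function name, and it swaps in -join and [regex]::Escape for the hand-written loop and the hard-coded backslash.

```powershell
# Hypothetical helper: turns a list of literal values into a regex alternation,
# or returns the fallback pattern when the list is null or empty.
function New-AlternationPattern {
    param(
        [string[]] $Values,
        [string]   $Fallback
    )
    if ($null -eq $Values -or $Values.Count -eq 0) { return $Fallback }
    # [regex]::Escape makes characters such as "." match literally.
    $escaped = $Values | ForEach-Object { '({0})' -f [regex]::Escape($_) }
    return ($escaped -join '|')
}

$TrackFilter = New-AlternationPattern -Values 'TL', 'PC' -Fallback '.*?'
$TypeFilter  = New-AlternationPattern -Values '.ZIP', '.WMV' -Fallback '\..*'
# Doubled braces keep the {2} quantifier literal under the -f operator.
$RegexString = '^(?<TRACK>({0}))(?<INDEX>\d{{2}})(?<EXT>({1}))$' -f $TrackFilter, $TypeFilter

# A shortened sample of the file names used in this post.
$FileNames = 'TL01.ZIP', 'TL02.PPTX', 'BB02.ZIP', 'PC09.WMV'
$FilterMatches = $FileNames | Where-Object { $_ -match $RegexString }
$FilterMatches
```

Escaping each value also means a track or extension that happens to contain a regex metacharacter is matched as plain text.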

Download

The script files can be downloaded here: DynamicRegexFilterForFileNames.zip

Outlook

There is quite a bit of magic in PowerShell. I am still not quite sure how to wave the magic wand correctly, but I am getting there. Most importantly, I still enjoy exploring its potential.
