Finding missing and duplicate lines in text files using PowerShell

by Klaus Graefensteiner 22. July 2008 07:17

Introduction

Every now and then I need to investigate bugs that surface because there are fewer or more countable "things" than expected. In my particular case I was dealing with an application that manages subscription handles to memory registers of programmable logic controllers (PLCs). These programs are called Data Access Servers. Occasionally we get calls from customers reporting that the number of handles is lower, or sometimes higher, than expected. The first step in debugging this situation is to find out which handles are missing or which ones are duplicates. This blog post describes how PowerShell's Compare-Object cmdlet turns this task, which used to be a pain in the "peep", into a piece of cake.

Figure 1: Topology analysis of planet Jupiter

Procedure

The PLC memory addresses that clients are subscribing to are maintained in a list data structure. This list can be displayed in an MMC snap-in and it can be exported to a text file.

Step 1: Export the item list

Open the Wonderware SMC and navigate to your DA Server node and then to Diagnostics\Client Groups\YourGroupName. Right-click and export the items into a text file.

Figure 2: Exporting Client Group Items

Figure 3: Item list in Notepad

Step 2: Fire up PowerShell

Script for missing items

This script requires two files. The first was exported while the number of items was correct; I named it ClientItemsBefore.txt. The second was exported when items were missing; it is called ClientItemsAfter.txt.

    # Load the lines of the first file (@() keeps the result an array)
    $a = @(Get-Content -Path $home\desktop\ClientItemsBefore.txt)
    # Initialize the arrays of item names that we are going to compare.
    # Each array starts with a dummy element (0); it is present in both
    # lists, so it never shows up as a difference.
    $ins1 = ,0
    $ins2 = ,0

    # Skip the heading line and start with the first data line.
    # Keep only the item name, which is the first tab-separated field.
    for ($i = 1; $i -lt $a.Length; $i++)
    {
        $in = ($a[$i].Split("`t"))[0]
        $ins1 += $in
    }

    # Load the lines of the second file
    $a = @(Get-Content -Path $home\desktop\ClientItemsAfter.txt)
    # Skip the heading line and start with the first data line
    for ($i = 1; $i -lt $a.Length; $i++)
    {
        $in = ($a[$i].Split("`t"))[0]
        $ins2 += $in
    }
    # Find the differences between the two lists
    Compare-Object $ins1 $ins2
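
To read the result, it helps to know that Compare-Object emits one object per difference, with an InputObject and a SideIndicator property. The following is only a minimal sketch with made-up item names (they are assumptions, not taken from a real export file) that shows how the side indicator tells missing items apart from unexpected ones:

```powershell
# Hypothetical item names standing in for the first column of the export files
$before = 'Tank1.Level', 'Pump1.Speed', 'Valve3.Position'
$after  = 'Tank1.Level', 'Valve3.Position'

# One output object per difference, carrying InputObject and SideIndicator
$diff = Compare-Object -ReferenceObject $before -DifferenceObject $after

# '<=' marks items found only in the reference list (the missing items)
$missing = $diff | Where-Object { $_.SideIndicator -eq '<=' } |
    ForEach-Object { $_.InputObject }

# '=>' marks items found only in the difference list (unexpected extras)
$extra = $diff | Where-Object { $_.SideIndicator -eq '=>' } |
    ForEach-Object { $_.InputObject }

$missing
```

Items that occur in both lists are not reported unless you add the -IncludeEqual switch.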

Script for duplicate items

In this case I exported the items into a file called ClientItemsWithDuplicates.txt.

    cls
    # Load the lines of the export file (@() keeps the result an array)
    $a = @(Get-Content -Path $home\desktop\ClientItemsWithDuplicates.txt)
    # Initialize the array of item names that we are going to compare.
    # It starts with a dummy element (0), which occurs once in both
    # lists and therefore never shows up as a difference.
    $ins1 = ,0

    # Skip the heading line and start with the first data line.
    # Keep only the item name, which is the first tab-separated field.
    for ($i = 1; $i -lt $a.Length; $i++)
    {
        $in = ($a[$i].Split("`t"))[0]
        $ins1 += $in
    }

    # Get rid of duplicates
    $ins2 = $ins1 | Sort-Object -Unique

    # Compare the list with duplicates against the de-duplicated list
    Compare-Object $ins1 $ins2
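
If you also want to know how often each duplicate occurs, grouping is an alternative to the compare-against-unique approach. This is a different technique than the script above uses, sketched here with hypothetical item names:

```powershell
# Hypothetical item names; in practice this would be the first column of the export
$items = 'ItemA', 'ItemB', 'ItemA', 'ItemC', 'ItemB', 'ItemA'

# Group identical names and keep only the groups with more than one member
$duplicates = $items | Group-Object | Where-Object { $_.Count -gt 1 }

# Show each duplicated name together with its number of occurrences
$duplicates | Select-Object Name, Count
```

Like Compare-Object and Sort-Object -Unique, Group-Object compares strings case-insensitively by default.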

That's it

Before PowerShell I would, in the missing items case, combine the two item export files into one file. Then I would use a LogParser query with a Having clause to find the items that are not duplicates, that is, the ones that occur in only one of the two files. If I expected duplicates in one file, I would use LogParser with an appropriate Having clause to find them. Is PowerShell going to replace LogParser?
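
For comparison, the combine-and-count idea can be sketched in PowerShell as well. This is not the original LogParser query, only a rough equivalent with hypothetical item names, and it assumes that neither list contains duplicates of its own:

```powershell
# Hypothetical item names for the two exports
$before = 'ItemA', 'ItemB', 'ItemC'
$after  = 'ItemA', 'ItemC'

# Concatenate both lists, then keep the names that occur exactly once
$onlyInOne = ($before + $after) | Group-Object |
    Where-Object { $_.Count -eq 1 } |
    ForEach-Object { $_.Name }

$onlyInOne
```

Unlike Compare-Object, this does not tell you which of the two files the item came from.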

The script and item export files can be downloaded here: FindingMissingItems.zip

About Klaus Graefensteiner

I like the programming of machines.

Klaus Graefensteiner works as a developer in Test at Rockwell Automation and is the founder of PSUnit, the PowerShell unit testing framework.