Validating CSV Headers In Your Scripts

EDIT: I forgot to put the link to the project in the original post. Whoops. I added it to the bottom of this post.

I tend to work with a lot of CSV files in my scripts, either importing them as data sources to control the scripts or as exports of data sets to provide to other people and report on what was done.

I was working on my HPE Automation project and one of the things that came up in my mind was how can I hand this script off, and validate the data in the csv is correct. Beyond that - how can I validate at the very get go that the headers of the CSV file are correct. If this was data coming in through as parameters to a function this would be easy since the validation rules could be used. I could also wrap a try-catch around every segment of code that even looks at a vale of the csv - provided it is one of the commands that actually returns an error when no results are returned.

With all of these differences - and wanting to make it easy to hand off to another team - I began working on a CSV validation tool. Before embarking on this journey I did search the web for something that handled this already - and came up empty (and if you know of something - please tell me!)

So my criteria for this module are as follows:

  1. Validate the headers of a CSV file - with the option to mark a file as invalid if more headers are present other than what is provided in an array.

  2. Return an object which contains the following data

    1. The missing headers

    2. A boolean noting if all headers required are present

    3. A boolean noting if all the required headers are accounted for, or if there are more headers than defined required headers

    4. A boolean noting if the CSV - based on the required headers and switch indicating if extra headers are allowed - if the CSV is valid

  3. Be expandable to have additional rules instead of just headers. This would be in the form of a custom object and would define the header and the rules in it. This is a secondary objective - and as of current is something coming later.

After a little work the initial form of the project has taken shape, and I was able to easily integrate it into a lot of work projects I am working on. The current script handles requirements 1 and 2 and can be expanded to handle requirement 3 shortly.

So how does it work?

After installing and importing the module there is currently only one command: test-csv.

This command has only three parameters currently.

  1. Path - The path to the CSV file to validate

  2. Headers - an array of headers to make sure are present in the CSV file

  3. DisallowHeadersWithoutRules - if this switch is passed then if the CSV file has more headers than are present in the headers array, the file determined to not be valid. Only files that have exactly the same headers as the headers parameter are considered valid.

An example of running the script with results is below. In this test the header temp was provided in the headers array, and is marked as a missing header in the returned object under MissingHeaders.

TestCsvImage1.PNG

An example of a valid CSV file result is shown below.

TestCsvImage2.PNG

With this module now working at least partially what I want, I can now add the following line of code to my code to determine that my CSV at least has the headers the script is expecting before continuing on:

$csvTestResults = test-csv "c:\temp\sampleCsv.csv" -headers @("Name")
if($($csvTestResults.CSVValid))
{
...Process...
}
else{
throw "The csv file used is invalid.  The missing headers are $($csvtestresults.missingheaders -join ',')"
}

Now when the script fails - the person from the other team running it knows WHY it failed without me having to write custom validation code each and every time. This saves each of us headaches and allows for a consistent experience for all of the scripts that they use

Link to the github repo for this script: Link