SSIS Custom Transformation Component: INITCAP

Introduction

SSIS (SQL Server Integration Services) is an ETL tool, provided by Microsoft, for effectively managing data loading in enterprise-level Data warehouses. It provides various types of source and destination adapters for synchronizing data among heterogeneous data sources. Many inbuilt transformation components are also available for data manipulation and data streaming in the Dataflow task, although sometimes these transformations seem to be insufficient in transforming data as per business requirements. In such cases, .Net code can be integrated in SSIS using Scripts Component. The challenge comes when you require such functionality in many SSIS packages, because then managing and manipulation of .Net code becomes difficult. SSIS Custom Component written in .Net comes in handy for developers to overcome such situations.

Development Foundation

BIDS (Business Intelligence Development Studio) is an integrated platform for the design and development of SSIS package and Custom components. This article demonstrates the development of Custom component INITCAP, the integration of custom component in BIDS, and the usage of this component in an SSIS package. As the name suggests, “INTICAP” can be used in an SSIS package for manipulation of text data in “Proper Case.”

Let’s start with this interesting journey.

This example is using SQL Server 2008 R2 and .Net 3.5 framework on 64-bit windows environment. If you have a different environment, setting a system of executable and referencing libraries will be changed accordingly. The solution is also available on GitHub here.

Step One: Create New SSIS Project

Using BIDS, create a new class library project “SSISCustomComponents” using C# or your preferable language and add the following references in the project. The default location for these references will be C;/Program Files (x86)/Microsoft SQL Server/100/SDK/Assemblies

  • Microsoft.SqlServer.Dts.Design
  • Microsoft.SqlServer.DTSPipelineWrap
  • Microsoft.SqlServer.DTSRuntimeWrap
  • Microsoft.SqlServer.ManagedDTS
  • Microsoft.SqlServer.PipelineHost

Step Two: Add New Class File

Add “INITCAP.cs” class file in the project. Inherit this class from the PipelineComponent class and assign the required attributes value using DtsPipelineComponent as given here:

using System;
using System.Linq;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Globalization;
using System.Threading;

namespace SSISCustomComponents
{
    [DtsPipelineComponent(DisplayName = "INITCAP"
     , Description = "INITCAP Component for Dataflow Task- Developed by Tejpal Baghel"
     , ComponentType = ComponentType.Transform
     , IconResource="Icon.ico")]
    public class INITCAP : PipelineComponent
    {
    
    }
}

Step Three: ProvideComponentProperties Method

This method will execute when this component is added to a data flow task as design time. The inputs, outputs, and custom properties of this component are defined by this method. Exceptional text processing can be controlled using the “EnableTextProcessing” property, set to either true or false. If this property value is set to true, then exceptional text processing will be applied as per values assigned to “UpperCase,” “LowerCase,” and “MixedCase” custom properties.

public override void ProvideComponentProperties()
    {
        	ComponentMetaData.Name = "INITCAP";
        	ComponentMetaData.Description = "INITCAP Component for Dataflow Task- Developed by Tejpal Baghel";
        	ComponentMetaData.ContactInfo = "Tejpal.Baghel@3PillarGlobal.com";
         RemoveAllInputsOutputsAndCustomProperties();
 
       IDTSInput100 inputObject = ComponentMetaData.InputCollection.New();
        	inputObject.Name = "InputToINITCAP";
        	inputObject.ErrorRowDisposition = DTSRowDisposition.RD_FailComponent;
 
       IDTSOutput100 outputObject = ComponentMetaData.OutputCollection.New();
        	outputObject.Name = "OutputFromINITCAP";
        	outputObject.SynchronousInputID = inputObject.ID; //Synchronous Transform       	
 
       IDTSCustomProperty100 TextProcessing = ComponentMetaData.CustomPropertyCollection.New();
        	TextProcessing.Name = "EnableTextProcessing";
        	TextProcessing.Description = "Enable Text Processing";
        	TextProcessing.Value = "True";
 
       IDTSCustomProperty100 AlwaysUpperCase = ComponentMetaData.CustomPropertyCollection.New();
        	AlwaysUpperCase.Name = "UpperCase";
        	AlwaysUpperCase.Description = "Always Upper Case";
        	AlwaysUpperCase.Value = "IND,US,UK";
 
       IDTSCustomProperty100 AlwaysLowerCase = ComponentMetaData.CustomPropertyCollection.New();
        	AlwaysLowerCase.Name = "LowerCase";
        	AlwaysLowerCase.Description = "Always Lower Case";
        	AlwaysLowerCase.Value = "in,out,for,and,or,not,yes,no";
 
       IDTSCustomProperty100 MixedCaseInput = ComponentMetaData.CustomPropertyCollection.New();
        	MixedCaseInput.Name = "MixedCase";
        	MixedCaseInput.Description = "Mixed Case";
        	MixedCaseInput.Value = "Po Box|PO Box,hod|HoD";

       IDTSCustomProperty100 ColumnsPrefix = ComponentMetaData.CustomPropertyCollection.New();
        	ColumnsPrefix.Name = "ColumnsPrefix";
        	ColumnsPrefix.Description = "ColumnsPrefix";
         ColumnsPrefix.Value = "INITCAP_";

 
        	AddErrorOutput("ErrorFromINITCAP", inputObject.ID, outputObject.ExclusionGroup);
    }

Part A: Custom Logger Method

This is a user-defined centralized and reusable method for fire error/exceptions entry in a log pipeline.

private void Logger(string MessageText)
    	{
        	bool cancel = false;
        	this.ComponentMetaData.FireError(0, this.ComponentMetaData.Name, MessageText, "", 0, out cancel);
    	}

Part Four: Validate Method

The Validate method will execute repeatedly when a component is edited for verification of the component’s configuration in SSIS designer at design time. DTSValidationStatus enum provides information of configuration status.

Enum Member NameValueDescription
VS_ISVALID0Component is correctly configured and ready for execution
VS_ISBROKEN1Component has an error that can be rectified by editing the component in the designer
VS_NEEDSNEWMETADATA2Indicates that an error exists in the ComponentMetaData
VS_ISCORRUPT4Indicates that the component has discovered errors that should only occur if the ComponentMetaSData property has been modified directly

Two options will be performed in this method:

  1. The component is validating text string, so the data type should be either DT_STR or DT_WSTR
  2. New columns should be added in the output stream and named to INITCAP_<Column Name>. The value of <Column Name> will extract from the input stream, while the “ColumnsPrefix” custom property provides initial value and the default value is INITCAP_.
    public override DTSValidationStatus Validate()
            {
                IDTSInput100 input = ComponentMetaData.InputCollection[0];
                string errorMsg = "Wrong datatype specified for {0}. It accepts DT_STR and DT_WSTR";
    
                for (int x = 0; x < input.InputColumnCollection.Count; x++)
                {
                    if (!(input.InputColumnCollection[x].DataType == DataType.DT_STR
                       || input.InputColumnCollection[x].DataType == DataType.DT_WSTR))
                    {
                         Logger(String.Format(errorMsg, input.InputColumnCollection[x].Name));
    return DTSValidationStatus.VS_ISCORRUPT;
                    }
                }
    
    IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
                foreach (IDTSInputColumn100 inputColumn in input.InputColumnCollection)
                {
                    bool IsPresent = false;
                    foreach (IDTSOutputColumn100 outputColumn in output.OutputColumnCollection)
                    {
                        if (outputColumn.Name == ComponentMetaData.CustomPropertyCollection["ColumnsPrefix"].Value.ToString()
    + inputColumn.Name)
                        {
                            IsPresent = true;
                        }
                    }
    
                    if (!IsPresent)
                    {
                        IDTSOutputColumn100 outputcol = output.OutputColumnCollection.New();
                        outputcol.Name = ComponentMetaData.CustomPropertyCollection["ColumnsPrefix"].Value.ToString()
     + inputColumn.Name;
                        outputcol.Description = String.Format("{0} contains valid data", inputColumn.Name);
                        outputcol.SetDataTypeProperties(inputColumn.DataType,inputColumn.Length, 0, 0,inputColumn.CodePage);
                    }
                }
                return DTSValidationStatus.VS_ISVALID;
            }

Step Five: ReinitializeMetaData Method

The ReinitializeMetaData method will execute in response to the component returning VS_NEEDSNEWMETADATA from the Validate method.

public override void ReinitializeMetaData()
        {
            ComponentMetaData.RemoveInvalidInputColumns();
            ReinitializeMetaData();
        }

Step Six: InsertOutputColumnAt Method

This prevents users from adding a new output column using the advanced editor. This is done because the component has already added an output column in the Validate method.

public override IDTSOutputColumn100 InsertOutputColumnAt(int outputID,int outputColumnIndex,string name,string description)
        {
            Logger(string.Format("Fail to add output column name to {0} ", ComponentMetaData.Name));
            throw new Exception(string.Format("Fail to add output column name to {0} ", ComponentMetaData.Name), null);
        }

Step Seven: PreExecute Method

This executes once at runtime prior to starting the execution.

public override void PreExecute()
        {
            IDTSInput100 input = ComponentMetaData.InputCollection[0];
            inputBufferColindex = new int[input.InputColumnCollection.Count];

            Enumerable
                .Range(0, input.InputColumnCollection.Count)
                .ToList()
                .ForEach(i =>
                {
                    IDTSInputColumn100 inputCol = input.InputColumnCollection[i];
                    inputBufferColindex[i] = BufferManager
                                               .FindColumnByLineageID(input.Buffer, inputCol.LineageID);
                });


            IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
            outputBufferColindex = new int[output.OutputColumnCollection.Count];

            Enumerable
                .Range(0, input.InputColumnCollection.Count)
                .ToList()
                .ForEach(i =>
                {
                    IDTSOutputColumn100 outputCol = output.OutputColumnCollection[i];
                    outputBufferColindex[i] = BufferManager
                                                .FindColumnByLineageID(input.Buffer, outputCol.LineageID);
                });
        }

Step Eight: ProcessInput Method

This is the most usable method; it will execute repeatedly as the data flow task receives full buffers from the upstream components. Because this component is synchronous, the incoming rows will be added to the output stream at the same time. The method for “Proper Case” text processing will execute for every row in this method. The EndOfRowset property will be used for the identify end of the input data stream.

public override void ProcessInput(int inputID, PipelineBuffer buffer)
  {
   IDTSInput100 input = ComponentMetaData.InputCollection.GetObjectByID(inputID);
   if (!buffer.EndOfRowset)
   {
while (buffer.NextRow())
	{
for (int x = 0; x < inputBufferColindex.Length; x++)
            {
               string ProperCaseData = "";
try
               {
	ProperCaseData = InitCap_ProcessText(buffer.GetString(inputBufferColindex[x]));
   buffer.SetString(outputBufferColindex[x], ProperCaseData);
               }
               catch (Exception ex)
               {
	//buffer.DirectErrorRow(outputBufferColindex[x], -1, inputBufferColindex[x]);
               }
}
}
}
}

Step Nine: InitCap_ProcessText Method

This is a user-defined method for text processing. Incoming text data will process and prepare the output text in “Proper Case” as required. This function executes another function, “Exceptional_TextProcessing,” for handling exceptional cases during text processing, but only if the component property “EnableTextProcessing” value is set to true.

private string InitCap_ProcessText(string InputText)
    {
       string ProperCaseText = InputText.ToString();
       CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
       TextInfo textInfo = cultureInfo.TextInfo;
       ProperCaseText = textInfo.ToTitleCase(ProperCaseText.ToLower());
 if (Convert.ToBoolean(ComponentMetaData.CustomPropertyCollection["EnableTextProcessing"].Value.ToString()) == true)

        	{
            	ProperCaseText = Exceptional_TextProcessing(ProperCaseText);
        	}
     	return ProperCaseText.Trim();
    }

	
 private string Exceptional_TextProcessing(string ProperCaseText)
    {
        	CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
        	TextInfo textInfo = cultureInfo.TextInfo;
        	string[] ExceptionalCase;
        	string[] ExceptionalCase_Mixed;
        	string TempString;
        	ProperCaseText = " " + ProperCaseText + " ";
 
        	// tranformation exceptions: words to be lowercased
        	ExceptionalCase = textInfo.ToTitleCase(ComponentMetaData.CustomPropertyCollection["LowerCase"].Value.ToString().ToLower()).Split(',');
        	foreach (string myValue in ExceptionalCase)
        	{
            	TempString = " " + myValue.Trim() + " ";
            	if (ProperCaseText.Contains(myValue))
            	{
                	ProperCaseText = ProperCaseText.Replace(TempString, TempString.ToLower());
            	}
        	}
 
        	// tranformation exceptions: words to be UPPERCASED
        	ExceptionalCase = textInfo.ToTitleCase(ComponentMetaData.CustomPropertyCollection["UpperCase"].Value.ToString().ToLower()).Split(',');
        	foreach (string myValue in ExceptionalCase)
        	{
            	TempString = " " + myValue.Trim() + " ";
            	if (ProperCaseText.Contains(TempString))
            	{
                	ProperCaseText = ProperCaseText.Replace(TempString, TempString.ToUpper());
            	}
       	 }
 
        	// tranformation exceptions: words to be Mixed-CASE (business specific)
        	ExceptionalCase = ComponentMetaData.CustomPropertyCollection["MixedCase"].Value.ToString().Split(',');
        	foreach (string myValue in ExceptionalCase)
        	{
            	ExceptionalCase_Mixed = myValue.Split('|');
            TempString = " " + textInfo.ToTitleCase(ExceptionalCase_Mixed[0].ToLower()) + " ";
            	if (ProperCaseText.Contains(TempString))
    	        {
                	ProperCaseText = ProperCaseText.Replace(TempString, ExceptionalCase_Mixed[1].ToString());
            	}
        	}
        	return ProperCaseText.Trim();
    }

Step Ten: Deployment

Below are the steps for the deployment of custom SSIS components

  1. Open the project properties window, click the Build tab, and set Build configuration to “All Configuration” and platform to “Any CPU”
  2. On the signing tab, sign the assembly with a strong name using the strong name key file and password
  3. Build the project and copy the output assembly (either manually or using the command prompt) to the C:/Program Files (x86)/Microsoft SQL Server/100/DTS/PipelineComponents directly for 64-bit machine
  4. Open the command prompt with Administrator privileges and install (uninstall if already installed) the assembly to the GAC using the following commands
    gacutil -u SSISCustomComponents
    gacutil -i “C:/Program Files (x86)/Microsoft SQL Server/100/DTS/PipelineComponenets/SSISCustomComponents.dil”
    Note: if you have multiple versions of the .Net framework installed on deployment machine, then use the correct version of gacutil.exe to add a file in GAC. In this example, the cd C:/Program (x86)/Microsoft SDKs/Windows/v7.0A/bin/x64 command is used to point to the correct gacutil as shown below
     ssis_pic1
  5. Open directory C:/Windows/assembly manually. The SSISCustomComponents assembly should be there as shown below
    ssis_pic2

Step Eleven: Demo Using SSIS

The steps for using custom components in the SSIS package are given below. The sample package is designed for generating output text data in “Proper Case” (INITCAP).

  1. Open BIDS and create a new SQL Server integration services project
  2. Add a new package and rename it to “Demo_INITCAP”
  3. Prepare sample text data in an excel file with three columns (e.g. name, address, city) and fill some rows in it
  4. Add a dataflow task in control flow, then add Excel Source and Destination adapters and configure them correctly
  5. Right click in the Data Flow Transformer section of the SSIS Toolbox and select the Choose Items menu
    ssis_pic3
  6. Inside the dialogue box, select the SSIS Data Flow Items tab and check the INITCAP Item in the list
    ssis_pic4
  7. Now INITCAP Transformation will be available in Toolbox. Drag and drop it into the dataflow and configure it as shown below
    ssis_pic5
  8. The default custom property settings of the INITCAP component will look like the following image and can be configured as per required text processing. EnableTextProcessing can be set to False for only proper case output. ColumnsPrefix can be changed as required
    ssis_pic6
  9. A new column with the name INITCAP_<Column Name> will be created automatically for every given input in the dataflow as shown below
    ssis_pic7
    Execute the package and verify the result in the output file. A quick look at the data formatting using the INITCAP transformation captured by the data viewer is shown below. INITCAP_Name, INITCAP_Address, and INITCAP_City are the output columns created by this transformation, based on the Name, Address, and City columns respectively. The data in the output columns are produced in initial caps locl (“Proper Case”).
    ssis_pic8

Conclusion

This article demonstrates all of the steps for the development, configuration, and usability of a solution-oriented custom component written in .Net for an enterprise-level SSIS solution. Although this is a basic example to start with, the SSIS custom component can also be implemented for business-centric requirements.

 

Tejpal Baghel

Tejpal Baghel

Technical Lead

Tejpal Baghel is a Technical Lead working at 3Pillar Global. Tejpal has 8 years of experience in data warehousing and business intelligence solution development using MSBI technology. He has enterprise business intelligence solution development experience in healthcare, financial management, B2B integration and revenue forecasting. Prior to joining 3Pillar, he has worked in Tata Consultancy Services (TCS) and Accretive Health. Tejpal is a graduate in Computer Science from Agra University in India. He is actively involved in social activities and believes in the power of politics for building a better nation.

Leave a Reply

Related Posts

How to Manage the “Need for Speed” Without Sacri... The pace of innovation today is faster than it has ever been. Customers are much more active and vocal thanks to social and mobile channels, and the c...
Determining the First Release The first thing you release needs to put the solution to your customer's most important problem in their hands. Deciding what the most important probl...
The Art of Building Rapid (and Valuable) Proofs of Concept Clients and stakeholders want results. They want assurances that their investment is well spent and they're building the right product. The software d...
Are You Doing Stuff or Creating Value? You can put a bunch of stickies on the wall, create tons of JIRA tickets, and commit lots of code, but are you creating value? Is the work your produc...
Costovation – Giving Your Customers Exactly What They ... On this episode of The Innovation Engine podcast, we delve into “cost-ovation,” or innovation that gives your customers exactly what they want – and n...