Data Profiling using SSIS
Applies to: SQL Server 2012 Standard Edition.
Data profiling is the process of examining the data to obtain statistics about
it and use those statistics to better understand the data, to identify problems
with the data, and to help properly design ETL processes.
To profile data using SSIS, please perform the following steps:
1. Create an Integration Services Project with SQL Server Data Tools.
2. Create a new ADO .NET Connection to the data source you would like to
profile. Top create this connection you will need to provide the server
name, the authentication type (Windows Authentication/SQL
Authentication), and the database where the table(s) you will be
profiling is(are) located.
3. Double click the Data Profiling Task on the Common section of the
SSIS Toolbox or drag it to Control Flow surface.
4. Configure the Data Profiling Task.
4.1 Specify the XML files where de results of the data profiling will be
stored.
Let's edit the Data Profiling task and create a new file connection.
4.2 Specify the profile type that will be performed and the table use as data
source.
Now on the Profile Requests tab of the Data Profiling Task Editor, let's select
the "Column Value Distribution Profile Request" from the Profile Type dropdown
list. Take a look at all available profile types. For a detailed description of
each profile type, please read the
Data Profiling Task article on TechNet.
Next, let's provide the connection, table and column on that table that will be
analyzed.
5. On the Debug menu, click "Start Debugging" to start analyzing the data.
6. Finally, let's use Data Profile Viewer, to see the statistics returned by the
profiling process.
Double click the Data Profiling Task.
Make a click on the Open Profile Viewer button.