Chapter 1: Developing SSIS packages

 Chapter Summary

  • Creating SSIS packages involves working with BIDS and creating a new SSIS project.
  • The main object in an SSIS project is a package, which contains the business logic to manage workflow and process data.
  • Within a package, the control flow lets you create tasks and containers, which provide the ability to run process-oriented operations.
  • The Data Flow Task is the second core object (behind the control flow) in an SSIS package, enabling data-processing operations.
  • The data flow uses source adapters, destination adapters, and transformations.

 Lesson 1: Creating SSIS Packages and Data Sources

1. you are asked to combine data from an Excel workbook and a database table and then push the results to a fixed-width flat file. Can you accomplish this task by using the Import And Export Wizard?

No. The Import And Export Wizard lets you work with only a single source and a single destination. To combine data merging or data cleansing tasks, you need to either create a new package specifically for that purpose or modify a package previously created by the wizard.

2. you need to create both SSIS packages to process your data and SSAS cubes to perform analysis. Can you create both objects in a single project?

No. you cannot create both SSIS and SSAS objects in one project because BIDS does not let you combine objects used for different platforms. you need to build two separate projects in BIDS: one for the SSIS packages and another for the SSAS cubes and dimensions.

3. What is the difference between a project data source and a package connection?

Both project data sources and package connections are connection strings. However, a data source resides outside the package and can be used as the connection reference for more than one package. A package connection does not have to be associated with a data source.

4. If a connection references a data source and the data source is changed, when will the connection be updated?

Connections are updated by their associated data sources only when the package is opened for editing in BIDS.

Lesson 2: Creating and Editing Control Flow Objects

1. What is the difference between a control flow task and a control flow container?

Control flow tasks perform operations, whereas containers coordinate and group tasks. for example, a foreach Loop Container can look through the files in a system folder, and a file System Task embedded within the container can then move the files to a new folder location.

2. To run a stored procedure within a SQL Server database, what task would you choose?

The Execute SQL Task can run a stored procedure within SQL Server or any relational database for which you have an installed data provider. The syntax of the statement entered in the Execute SQL Task will be in the native language of the underlying database.

Lesson 3: Using Data Flow Adapters and Transformations

1. How would you use SSIS to import a file from an fTP server to a SQL Server table?

First, you would use an fTP Task to copy the file to the machine on which SSIS is installed. you can then import the file into a SQL Server table by using a Data flow Task configured with a flat file Source adapter and either a SQL Server Destination adapter or an OLEDB Destination adapter.

2. you need to migrate a user-created Access database to SQL Server, but the Data flow toolbox does not contain an Access source adapter. How do you import this data into SQL Server?

Although not listed in the toolbox, Access is one of the many database sources and destinations that SSIS works with. To extract data from Access, you first need to make a package connection to the Microsoft Jet OLEDB Provider. You can then use the OLEDB Source adapter to select the table or perform a custom query.

3. The Multicast Transformation and the Conditional Split Transformation both can have multiple outputs. Which transformation would you use if you needed to send rows matching certain conditions out one output and rows matching different conditions out another?

The Conditional Split Transformation lets you define expressions against which the rows from the source are evaluated. for every row, the expressions are evaluated in order, and a row is sent out the first output when the matching expression evaluates to True. Therefore, any single row can go out only one output. With a Multicast Transformation, on the other hand, all rows go out every output.

4. Describe the transformations you could use to combine data from two different database tables that exist on two different servers.

To combine data from two different database tables that exist on two different servers, you could use the Merge Join Transformation, which combines datasets by joining the rows across a set of common keys. This transformation allows an inner join, a left outer join, or a full outer join. you could also use a Lookup Transformation to associate data from two sources. The Lookup can cache a table in memory and, through matching columns, can return new columns to the data flow.

Creating an ETL Solution

 

Case scenario

The business development department of Adventure Works has requested that you implement a data mart that it can use to analyze reseller sales against salesperson sales targets. Your first task is to create a series of SSIS packages that move data from the source Enterprise Resource Planning (ERP) system to a data mart database that contains fact tables and dimension tables.

1. How would you work within BIDS to create SSIS project structures, packages, project data sources, and package connections?

2. What transformations would you use, and how would you implement the data flow that loads dimension tables?

3. What transformations would you use, and how would you implement the data flow that loads fact tables?

Answers

 

1. The best practice for creating a set of packages that all work together is to create a single SSIS project within BIDS. You would then create multiple packages within the project, one for each of the different dimensions and fact tables. Because all the packages would use the same source connection string to the ERP system and the same destination connection string to the data mart, you should create two data sources within the project: one for the source and one for the destination. Last, each package would need to reference the project data sources, so within each package, you would create package connections based on the project data sources.

2. The SSIS data flow contains a Slowly Changing Dimension Transformation that can handle changes and new records in the dimension tables. Before using the Slowly Changing Dimension Transformation, you need to create a source adapter to the ERP system that pulls the data to be compared with the dimension tables. You might need to use a transformation to clean and correct any data anomalies, and after those steps are complete, you can connect the data to the Slowly Changing Dimension Transformation. Ideally, you would create a separate SSIS package for each dimension package so that they could be reused for different groups of package executions.

3. Fact tables contain surrogate keys that reference the dimension tables but also contain the business keys from the source. So as you are pulling the reseller sales data and the sales quotas, you can use a Lookup Transformation to get the surrogate keys from dimension tables by joining across the business keys. You can then insert new rows into the fact tables by using the surrogate keys and measures. As you do when using dimensions, you will typically have a separate package for each fact table that needs processing.