Chapter 1: Developing SSIS packages
Chapter Summary
- Creating SSIS packages involves working with BIDS and creating a new SSIS
project.
- The main object in an SSIS project is a package, which contains the business
logic to manage workflow and process data.
- Within a package, the control flow lets you create tasks and containers, which
provide the ability to run process-oriented operations.
- The Data Flow Task is the second core object (behind the control flow) in
an SSIS package, enabling data-processing operations.
- The data flow uses source adapters, destination adapters, and transformations.
Lesson 1: Creating SSIS Packages and Data Sources
1. you are asked to combine data from an Excel workbook
and a database table and then push the results to a fixed-width flat file. Can
you accomplish this task by using the Import And Export Wizard?
No. The Import And Export Wizard lets you work with only a single source and a
single destination. To combine data merging or data cleansing tasks, you need to
either create a new package specifically for that purpose or modify a package previously
created by the wizard.
2. you need to create both SSIS packages to process your
data and SSAS cubes to perform analysis. Can you create both objects in a single
project?
No. you cannot create both SSIS and SSAS objects in one project because BIDS does
not let you combine objects used for different platforms. you need to build two
separate projects in BIDS: one for the SSIS packages and another for the SSAS cubes
and dimensions.
3. What is the difference between a project data source
and a package connection?
Both project data sources and package connections are connection strings. However,
a data source resides outside the package and can be used as the connection reference
for more than one package. A package connection does not have to be associated with
a data source.
4. If a connection references a data source and the data
source is changed, when will the connection be updated?
Connections are updated by their associated data sources only when the package
is opened for editing in BIDS.
Lesson 2: Creating and Editing Control Flow Objects
1. What is the difference between a control flow task and
a control flow container?
Control flow tasks perform operations, whereas containers coordinate and group
tasks. for example, a foreach Loop Container can look through the files in a system
folder, and a file System Task embedded within the container can then move the files
to a new folder location.
2. To run a stored procedure within a SQL Server database,
what task would you choose?
The Execute SQL Task can run a stored procedure within SQL Server or any relational
database for which you have an installed data provider. The syntax of the statement
entered in the Execute SQL Task will be in the native language of the underlying
database.
Lesson 3: Using Data Flow Adapters and Transformations
1. How would you use SSIS to import a file from an fTP
server to a SQL Server table?
First, you would use an fTP Task to copy the file to the machine on which SSIS
is installed. you can then import the file into a SQL Server table by using a Data
flow Task configured with a flat file Source adapter and either a SQL Server Destination
adapter or an OLEDB Destination adapter.
2. you need to migrate a user-created Access database to
SQL Server, but the Data flow toolbox does not contain an Access source adapter.
How do you import this data into SQL Server?
Although not listed in the toolbox, Access is one of the many database sources
and destinations that SSIS works with. To extract data from Access, you first need
to make a package connection to the Microsoft Jet OLEDB Provider. You can then use
the OLEDB Source adapter to select the table or perform a custom query.
3. The Multicast Transformation and the Conditional Split
Transformation both can have multiple outputs. Which transformation would you use
if you needed to send rows matching certain conditions out one output and rows matching
different conditions out another?
The Conditional Split Transformation lets you define expressions against which
the rows from the source are evaluated. for every row, the expressions are evaluated
in order, and a row is sent out the first output when the matching expression evaluates
to True. Therefore, any single row can go out only one output. With a Multicast
Transformation, on the other hand, all rows go out every output.
4. Describe the transformations you could use to combine
data from two different database tables that exist on two different servers.
To combine data from two different database tables that exist on two different
servers, you could use the Merge Join Transformation, which combines datasets by
joining the rows across a set of common keys. This transformation allows an inner
join, a left outer join, or a full outer join. you could also use a Lookup Transformation
to associate data from two sources. The Lookup can cache a table in memory and,
through matching columns, can return new columns to the data flow.
Creating an ETL Solution
Case scenario
The business development department of Adventure Works has requested that you
implement a data mart that it can use to analyze reseller sales against salesperson
sales targets. Your first task is to create a series of SSIS packages that move
data from the source Enterprise Resource Planning (ERP) system to a data mart database
that contains fact tables and dimension tables.
1. How would you work within BIDS to create SSIS project structures, packages,
project data sources, and package connections?
2. What transformations would you use, and how would you implement the data flow
that loads dimension tables?
3. What transformations would you use, and how would you implement the data flow
that loads fact tables?
Answers
1. The best practice for creating a set of packages that all work together is
to create a single SSIS project within BIDS. You would then create multiple packages
within the project, one for each of the different dimensions and fact tables. Because
all the packages would use the same source connection string to the ERP system and
the same destination connection string to the data mart, you should create two data
sources within the project: one for the source and one for the destination. Last,
each package would need to reference the project data sources, so within each package,
you would create package connections based on the project data sources.
2. The SSIS data flow contains a Slowly Changing Dimension Transformation that
can handle changes and new records in the dimension tables. Before using the Slowly
Changing Dimension Transformation, you need to create a source adapter to the ERP
system that pulls the data to be compared with the dimension tables. You might need
to use a transformation to clean and correct any data anomalies, and after those
steps are complete, you can connect the data to the Slowly Changing Dimension Transformation.
Ideally, you would create a separate SSIS package for each dimension package so
that they could be reused for different groups of package executions.
3. Fact tables contain surrogate keys that reference the dimension tables but
also contain the business keys from the source. So as you are pulling the reseller
sales data and the sales quotas, you can use a Lookup Transformation to get the
surrogate keys from dimension tables by joining across the business keys. You can
then insert new rows into the fact tables by using the surrogate keys and measures.
As you do when using dimensions, you will typically have a separate package for
each fact table that needs processing.