Integration Services (SSIS)




ETL (Extract, Transform, Load) is the set of functions performed when pulling data out of one database and placing it into another of a different type. ETL is used to migrate data, often from relational databases into decision support systems.
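As a rough illustration, the following hand-written T-SQL shows the kind of extract-transform-load step such a tool automates. The database and table names (SourceDB, ReportingDB, Customers, DimCustomer) are hypothetical and chosen only to show the pattern:

-- Extract rows from the source system, transform them in flight, and load them into the target.
-- SourceDB, ReportingDB, Customers, and DimCustomer are made-up names used only for illustration.
INSERT INTO ReportingDB.dbo.DimCustomer (CustomerID, CustomerName, Country)
SELECT c.CustomerID,
       UPPER(LTRIM(RTRIM(c.CompanyName))) AS CustomerName,  -- simple cleanup transformation
       ISNULL(c.Country, 'Unknown') AS Country              -- handle missing values
FROM SourceDB.dbo.Customers AS c;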

Why we need Integration Services:
1- When data lives in different databases and different file formats (for example, some data is in SQL Server and the rest is in flat files, Access, Oracle, or DB2) and we want to migrate some or all of it to one server, such as SQL Server, for future reporting.
2- To export data from the current server to another server used for reporting, in order to improve the performance of the online server or to build cubes.

We are experts in ETL (SSIS) integration (2000, 2005, 2008, 2008 R2). We have created more than 1,000 complex packages, all running on schedules, and we can solve any ETL problem you have.

Why choose us: because we can accomplish the job you are asking for in half the time of any other company's estimate, bug-free, very clean, and meeting all your needs.


SSIS architecture :


A sample SSIS package:



Microsoft SQL Server 2005/2008 Integration Services (SSIS) is a platform for building high-performance data integration solutions, including extraction, transformation, and loading (ETL) packages for data warehousing. SSIS is the new name for the component formerly branded as Data Transformation Services (DTS).



With SSIS you can:
o- Execute DDL query statements against an Analysis Services server
o- Process an Analysis Services cube
o- Insert data from a file into a database
o- Execute a data mining query
o- Execute an SSIS package from windows tasks or SQL server tasks
o- Perform file system operations such as copy or delete
o- Perform FTP operations
o- Send e-mail with attachment files prepared from the database
o- Transfer an entire database between two SQL Servers
o- Transfer custom error messages between two SQL Servers
o- Transfer jobs between two SQL Servers
o- Perform operations on XML data
o- Evaluate and split rows in a dataset
o- Convert data to a different data type
o- Group rows that contain similar values
o- Look up values using fuzzy matching
o- Merge two sorted datasets
o- Build a pivot table from a dataset
o- Execute a custom script




ETL Development with

SQL Server Integration Services

Overview

The purpose of this lab is to introduce you to ETL development using an actual ETL tool. The tool we will use is called SQL Server Integration Services, or SSIS. The goal is to help you get a feel for how the concepts you’ve learned in class apply in a real-world scenario.

Disclaimer!

It should be noted that this lab does not demonstrate everything you need to know about the ETL process, nor does it serve as a training tool for SSIS; both are beyond the scope of this course. When you’re finished with the lab you’ll know your way around SSIS, but more importantly you’ll understand how the ETL process gets implemented.

To complete this lab you will need the following:

·         Access to the Northwind, NorthwindDW, and NorthwindStage Databases on Microsoft SQL Server 2012. This should be available through your iSchool vSphere login.

·         Access to the SQL Server 2012 Data Tools (used to author SSIS packages).

·         You should connect to the computer running your SQL Server database before starting this lab.

·         The Northwind Product Inventory Analysis Detailed Design Excel workbook.

·         Execute the SQL scripts to create the Northwind Product Inventory Analysis star schema and stage tables.

The last two items may be obtained where you downloaded this lab.

Part 1: Case Study: Northwind Product Inventory Analysis

As you might recall from the first lab, Northwind Traders’ management needs a means to track inventory levels over time. The current transactional database keeps order levels but does not store them chronologically. In its analysis, the data warehouse team has determined that a daily periodic snapshot of inventory levels can be used to address this situation.

In addition, the data warehouse team has completed a physical design, an architecture plan, and a source-to-target mapping, and now they would like you to implement the ETL solution!

Star Schema for Product Inventory
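(The star schema diagram is not reproduced here. As a rough sketch, and assuming the column names used later in this lab, the tables look something like the DDL below; the SQL scripts provided with the lab are the authoritative definitions, and the data types shown are assumptions. DimSupplier and DimDate follow the same pattern and are omitted.)

-- Sketch only: the real tables come from the provided SQL scripts.
CREATE TABLE DimProduct (
    ProductKey      INT IDENTITY(1,1) PRIMARY KEY,   -- surrogate key
    ProductID       INT NOT NULL,                    -- business (natural) key from Northwind
    ProductName     NVARCHAR(40) NOT NULL,
    CategoryName    NVARCHAR(15) NULL,
    QuantityPerUnit NVARCHAR(20) NULL,
    Discontinued    BIT NOT NULL,
    RowIsCurrent    BIT NOT NULL DEFAULT 1           -- Type 2 slowly changing dimension flag
);

CREATE TABLE FactInventory (
    DateKey      INT NOT NULL,       -- foreign key to DimDate
    ProductKey   INT NOT NULL,       -- foreign key to DimProduct
    SupplierKey  INT NOT NULL,       -- foreign key to DimSupplier
    UnitsInStock SMALLINT NULL,      -- daily snapshot measures
    UnitsOnOrder SMALLINT NULL,
    ReorderLevel SMALLINT NULL
);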

Part 2: ETL Solution Walk-Through

In this part, we’ll walk through the creation of the ETL solution to take a daily snapshot of all active products and populate them into the warehouse. Here’s what we’ll do:

1.       Stage our data from the source Northwind database into NorthwindStage. We will stage the data to avoid placing excess stress on the “production” Northwind database. ;-)

2.       Move and transform our staged data into the data warehouse.

3.       Implement two solutions: an initial data load and a periodic snapshot load.

IMPORTANT: Before You Begin:

Prior to starting this part of the lab you should execute ALL THREE SQL scripts included with this lab. These scripts will create the data mart tables, staging tables, and date dimension (populated with rows for dates from 1984-2024). After you execute all three scripts you should see these tables in your SQL Server instance:

RUN THIS FIRST: DateDimensionSQLScript

RUN THIS SECOND: Star-Schema-Product-Inventory-Analysis

RUN THIS THIRD: Stage-Product-Inventory-Analysis

NOTE: There should be 14,610 rows in the dimDate table
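A quick way to verify the scripts ran correctly, assuming the table and column names created above:

-- Confirm the date dimension was populated as expected.
SELECT COUNT(*) AS RowsInDimDate FROM dimDate;              -- should return 14,610
SELECT MIN(DateKey) AS FirstDateKey, MAX(DateKey) AS LastDateKey FROM dimDate;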

Step 1: Launch SQL Data Tools

In order to build our ETL solution, we will first need to run SQL Server Data Tools and create an SSIS package:

DO THIS:

1.       Click the Start button.

2.       Click on the SQL Server Data Tools icon

3.       From the menu, click File > New > Project

4.       Select Integration Services Project from the New Project dialog.

5.       Click OK

Step 2: Finding your way around the SSIS user interface.

Before we dive in and create our first package, it would be helpful to give you a quick tour of the user interface. Each of the items we point to is explained below the screenshot.


1.       SSIS consists of files called packages. A package contains one Control Flow and one or more Data Flows. You create the package flows by dragging items from the SSIS Toolbox onto the main package design surface.

2.       The Control Flow tab is your main workspace for control flows. A Control Flow outlines a set of tasks to be carried out by the package. It works like a visual programming language, and you can set up tasks to run sequentially, concurrently, or based on various conditions.

3.       The Data Flow tab is your workspace for Data Flows. A Data Flow represents a visual transformation of data from a source to a target. Your data source can be just about anything: another DBMS, a text file, an XML file, a web service, etc. Your data target is typically a file or DBMS.

4.       The SSIS Toolbox contains different elements you can add to your design surface depending on whether you are in the Control Flow or Data Flow tab.

5.       The Solution Explorer is a collection of everything used as part of your project. It contains your SSIS Packages, the global connections used by the packages, and any global variables required to make the solution work.

6.       The Connection Managers tab displays the data source connections available to your package.

7.       The Properties window is where you can change the characteristics of the elements you’ve added to your design surface.

Step 3: Create the InventoryStage Package

In this first step we will create an SSIS package called InventoryStage to stage data from our Northwind OLTP database. 

To create and rename the package:

1.       In Solution Explorer, Right-click on SSIS Packages and select New SSIS Package
(NOTE: If you do not see Solution Explorer open it from the VIEW menu.)

2.       Right-Click on the package and choose Rename from the menu

3.       Name the package InventoryStage.dtsx

4.       Make sure this package appears in your Solution explorer window under SSIS packages

TIP: It’s a good idea to save your work after you complete every step in this lab to avoid a situation where you might lose work.

Step 3a: Set up the StageDateKey global variable

In our project solution we’ll need a global variable. This global variable will represent the current date, which is the date used for capturing the snapshot of our inventory history at that point in time. (Recall: product inventory analysis is a periodic snapshot fact table, with the snapshots occurring daily.) In a real-world setup you might retrieve this information from the system clock, but it is advisable to control this value during development so that you can test your package against multiple dates.

To create your global variable:

1.       Double-click on Project.params section in the Solution Explorer. This will open the parameters canvas for the entire project.

2.       Click on the Add Parameter icon  to create a new parameter.

3.       Enter information as follows:

4.       Note: the value is today’s date in the format YYYYMMDD.

Step 3b: Set up our global connections

Our project will also need connections to various data sources and targets. The best practice is to place these connections in the project (instead of the package itself) so that they can be shared across multiple packages.

To setup your connections:

1.       In Solution Explorer right click on Connection Managers

2.       From the menu select New Connection Manager

3.       From the Add SSIS Connection Manager dialog, choose OLEDB and click Add…

4.       From the Configure OLE DB Connection Manager screen, click the New… button to launch the connection manager.

5.       In this step we will create a connection to the Northwind database. From the Connection Manager dialog, fill out the information to complete the data source connection.
Server name: . (yes, that’s a period; it means the local server)
Select a database name: Northwind

6.       You can click Test Connection to verify the connection works, and when it does click OK to save.

7.       Repeat steps 1 – 6 and add connections for NorthwindDW and NorthwindStage.

8.       When you’re finished, you should have 3 connections under the Connection Managers folder for each of the connections. (NOTE: I’ve renamed the Northwind database connection to NorthwindOLTP)
Here’s a screenshot:

Step 3c: Truncate tables task

We will need to erase our stage tables before populating them with data. We might not want to do this in a production environment where we need to keep several days’ worth of staging data, but during ETL development this is a handy step since it wipes our tables clean at the start of each run.

Insert truncate tables task

1.       Click on the Control Flow tab if you have not already.

2.       Drag and drop the Execute SQL task onto the surface:

Notice the red X on the task; this means the task will not run because it is not set up properly. Let’s set it up.

3.       Double-click on the task to open the Execute SQL Task Editor

4.       Under the Name property, enter Execute SQL Task Truncate Stage Tables

5.       Under the Connection property, choose LocalHost.NorthwindStage

6.  Under the SQLStatement property, click on the Builder Button   and type in the following SQL:
truncate table StageProduct;

truncate table StageSupplier;

truncate table StageInventory;

7.       Click OK after entering your query, then click OK again to close the task editor.

8.       If you did all steps correctly you should see the following on your design surface:

9.       Execute our package. Let’s try out our single-task package by pressing F5.

Step 3d: Executing and debugging a package.

It’s useful to execute your package to verify that what you’ve done is working properly.

·         To execute your package, press the F5 key. This will place your package in run mode. You will not be able to modify the design surface in this mode.

·         If your package executes correctly, all tasks in the package will display a green checkmark next to them. For Example:

·         If your package does not execute correctly, you will see a Red X next to the step with the error. For example:

·         To debug a package error, click on the Progress tab. The error will be listed under an exclamation point . This will help you isolate and correct the error.

·         To stop package execution and return to design mode so you can fix an error or continue to add tasks, press Shift+F5.

Step 3e: Adding a Data Flow task

In this next step, you’ll create a data flow task to move data out of our source NorthwindOLTP database and into the NorthwindStage database. Remember when we stage tables, we’re extracting raw data from our source systems as-is into a staging database. We do this to reduce the I/O load on our production systems.

I will walk you through doing this for Products and then leave you to complete it for the other staging tables.

To add a data flow task:

1.       Drag and drop the Data Flow task onto the design surface

2.       Rename the data flow task to StageProduct by changing the Name property in the Properties window.

3.       Double-click on StageProduct to edit the data flow. This will open up the data flow design surface.

4.       Let’s start with the data source. Drag the Source Assistant from the SSIS Toolbox into the design surface. This will open the following dialog.

5.       Select the source type SQL Server and the connection manager LocalHost.NorthwindOLTP, then click OK. You will see your data source in the data flow design surface:

Because of the Red X you know we need to configure the source.

6.       Double-click on the OLE DB Source to bring up the OLE DB Source Editor. The general rule of thumb is you configure this source to pull the exact data you need to stage. No more. No Less.

To accomplish this we will use an SQL command.

7.       Make sure the Connection Manager and data access mode match the screenshot.

8.  To edit the SQL command text, click the Build Query… button and paste in the following SQL:

SELECT Products.ProductID, Products.ProductName, Categories.CategoryName, Products.QuantityPerUnit, Products.Discontinued
FROM Products
INNER JOIN Categories ON Products.CategoryID = Categories.CategoryID

9.       After you paste it, your screen should look like the screenshot in step 6. You can click Preview… to see the query output and verify it.

10.   After you verify the data click the OK button to close the OLE DB Source Editor.

11.   Rename the source to say OLE DB Source ProductAndCategory like this screenshot:

12.   One thing we need to do is add our global variable StageDateKey to the input pipeline. We do this by adding a Derived Column to the design surface.

13.   Now we need to connect the data flow. Drag the blue arrow from the OLE DB Source to the Derived Column to connect the output of the former to the input of the latter. When you do it correctly, you should see the following on your design surface:

Now we need to configure our derived column to add StageDateKey to the data flow pipeline.

14.   Double-click the Derived Column to configure it. The Derived Column Transformation Editor will appear:

15.   Drag the variable $Project::StageDateKey into the Expression section of the dialog, then set the Derived Column Name to StageDateKey as it appears in the screenshot:

16.   When you have it configured, click OK to save. Now that we have the data in the pipeline, it is time to add the destination for the data.

17.   Drag the Destination Assistant from the SSIS Toolbox into the design surface. This will open the following dialog.

18.   Select a destination type of SQL Server and a connection manager of LocalHost.NorthwindStage then click OK. You will see the following in your design surface

19.   Associate the output from the Derived Column as input into the OLE DB Destination by dragging the blue arrow from the former to the latter. When you’ve completed the task you should see this:

20.   Double-click on OLE DB Destination  to configure it, bringing up the OLE DB Destination Editor

21.   Set the Connection Manager to match the screenshot. Be sure to include StageProduct as the table.

22.   Click on the Mappings section in the dialog, and make sure all columns from the source are mapped to the appropriate destination column.

23.   To add a mapping, simply drag from input column to destination column. When you’re done, click OK to save.

24.   Rename OLE DB Destination to say OLE DB Destination StageProducts and your design surface should look like this:

25.   If it does, you have completed the data flow. Click on the Control Flow tab.

Step 3f: Connecting the tasks

Back in your control flow design surface you should see two tasks:

Connecting the task flow.

1.       Drag the green arrow from Execute SQL Task Truncate Stage Tables and drop it on StageProduct. When you finish your control flow should look like this:

2.       Press F5 to execute the package. Troubleshoot and correct any errors you encounter. You will see this if your package runs successfully:

Step 3g: Create the other data flows in the tasks

ON YOUR OWN: Try to repeat steps 3e & 3f two more times to create data flow tasks for StageInventory and StageSupplier. Here are some useful tips to help you get it done:

StageSupplier Data Flow

·         Source:

o   Database: NorthwindOLTP

o   SQL: SELECT  SupplierID, CompanyName, ContactName, ContactTitle, City, Region, Country FROM  Suppliers

·         Derived Columns

·         Destination:

o   Database: NorthwindStage

o   Table: StageSupplier

o   Custom Mappings: RegionNA to Region

StageInventory Data Flow

·         Source:

o   Database: NorthwindOLTP

o   SQL: SELECT  SupplierID, UnitsInStock, UnitsOnOrder, ReorderLevel, ProductID
FROM Products

·         Derived Columns:

·         Destination:

o   Database: NorthwindStage

o   Table: StageInventory

When you’re finished your control flow should look like this:

Make sure your InventoryStage.dtsx is saved and executes correctly before moving on to the next part.
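As a sanity check after the package runs, you might query the stage tables directly in SQL Server Management Studio (table names as used above):

-- Each stage table should now contain rows pulled from the Northwind OLTP database.
SELECT 'StageProduct' AS TableName, COUNT(*) AS RowCnt FROM StageProduct
UNION ALL
SELECT 'StageSupplier', COUNT(*) FROM StageSupplier
UNION ALL
SELECT 'StageInventory', COUNT(*) FROM StageInventory;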

Step 4: Build the InventoryMart Package

In this last step, we will build out the other half of our ETL process – from staging tables to data mart. Like step 3, this will be a long multi-step process. I will walk you through the first data flow and then let you figure out the rest on your own. The InventoryMart package consists of 3 data flows:

Before you begin:

1.       In Solution Explorer, Right-click on SSIS Packages and select New SSIS Package
(NOTE: If you do not see Solution Explorer open it from the VIEW menu.)

2.       Right-Click on the package and choose Rename from the menu

3.       Name the package InventoryMart.dtsx

From here on out, we will use the InventoryMart package.

Step 4.a:  Create DimProduct data flow task in the InventoryMart package

Let’s create the DimProduct data flow task to pull data from the NorthwindStage database into the NorthwindDW database.

1.       From the Control Flow tab, add a new Data Flow Task and name it DimProduct.

2.       Double-click on DimProduct  to edit the data flow

3.       Drag and drop the Source Assistant onto the data flow design surface.

4.       Select SQL Server as the source type, LocalHost.NorthwindStage as the connection

5.       Rename the OLE DB Source to OLE DB Source StageProduct

6.       Configure the source to use the table StageProduct, matching this screenshot:

7.       Back at the design surface, drag and drop the Slowly Changing Dimension onto the surface.

8.       Drag the output blue arrow from the OLE DB Source and drop onto the Slowly Changing Dimension

9.       Double-click on the Slowly Changing Dimension to start the wizard:

This wizard will help you configure the load into the target database so that only data that has changed will be added or updated. Click Next> to begin.

10.   In the first step we must choose the destination dimension table and business key. The business key is the set of columns that determines that a row is unique, so we can say “this is the same dimension member.” This value is typically the natural key for the data, or the primary key from the original OLTP source.
We’ll configure the table to be DimProduct and the business key to be ProductID.

After your screen matches the screenshot, click Next>

11.   Next we need to configure the SCD columns. Your choices are: Fixed Attribute = throw an error if this changes; Changing Attribute = overwrite history (Type 1); Historical Attribute = new record for changed data (Type 2).

Configure your screen to match this screenshot, click Next>

12.   In the next step we need to configure the columns used by the historical attribute to track changes. We’ll use our RowIsCurrent column in the table to track these changes. (A rough T-SQL equivalent of this Type 2 behavior is sketched at the end of this step.)

Complete the screen as shown and click Next>

13.   We will not use the Inferred dimension members.

Disable this setting and click Next>

14.   Click Finish to save the destination flow. Your design surface should look like this:

15.   Close your data flow and run your control flow. Make sure it runs successfully before continuing.
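For reference, here is roughly what the wizard’s Type 2 (historical attribute) handling does to DimProduct when a tracked attribute changes, sketched as hand-written T-SQL. The @ variables and column list are illustrative only; the wizard generates its own transformations rather than this exact SQL:

-- When a tracked attribute changes for an existing ProductID:
-- 1) expire the current row, then 2) insert a new current row.
UPDATE DimProduct
SET RowIsCurrent = 0
WHERE ProductID = @ProductID AND RowIsCurrent = 1;

INSERT INTO DimProduct (ProductID, ProductName, CategoryName, QuantityPerUnit, Discontinued, RowIsCurrent)
VALUES (@ProductID, @ProductName, @CategoryName, @QuantityPerUnit, @Discontinued, 1);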


 

Step 4.b:  Create DimSupplier data flow tasks

On your own: Build out the data flow tasks for DimSupplier. These steps should be similar to what you just did in step 4.a

DimSupplier

·         Source

o   Database: NorthwindStage

o   Table: StageSupplier

·         Slowly Changing Dimension

o   Business Key: SupplierID

o   Other Attributes: Type 2 (HistoricalAttribute)

Data Flow:

 

TIP: Run your package to make sure the logic is correct. It should be noted that on subsequent runs, only the data that changes is imported from stage into our data mart. This way we can just blindly process staged data and the logic in InventoryMart will figure out the new data. Awesome!

Step 4.c:  Create data flow for FactInventory

Because the data flow for populating the fact table is a little different, we’ll walk you through this process. Fact tables need to look up dimension surrogate keys from their natural source key (a.k.a. business key) counterparts before being added to the destination.
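Conceptually, the three lookups you are about to build do the same work as this single query. This is a sketch only, assuming the stage and dimension column names used earlier in the lab:

-- Resolve surrogate keys from the natural keys carried in the stage table.
SELECT dp.ProductKey, ds.SupplierKey, dd.DateKey,
       si.UnitsInStock, si.UnitsOnOrder, si.ReorderLevel
FROM StageInventory AS si
INNER JOIN DimProduct AS dp ON dp.ProductID = si.ProductID AND dp.RowIsCurrent = 1
INNER JOIN DimSupplier AS ds ON ds.SupplierID = si.SupplierID AND ds.RowIsCurrent = 1
INNER JOIN DimDate AS dd ON dd.DateKey = si.StageDateKey;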

1.       From the Control Flow tab, add a new Data Flow Task and name it FactInventory.

2.       Double-click on FactInventory to edit the data flow

3.       Drag and drop the Source Assistant onto the data flow design surface.

4.       Select SQL Server as the source type, LocalHost.NorthwindStage as the connection

5.       Rename the OLE DB Source to OLE DB Source StageInventory

6.       Configure the source to use the table StageInventory.

7.       Drag and drop the Lookup onto the data flow design surface.

8.       Connect the blue arrow from OLE DB Source to Lookup

9.       Rename Lookup  to Lookup ProductKey

10.   Double-click on the Lookup ProductKey to configure it. This will open the Lookup Transformation Editor dialog:

11.   Click on the Connection section of the dialog. The goal of this screen is to retrieve only the data we need to properly look up the dimension key when given the natural key. We must also take into account that, as part of our slowly changing dimension, we should ignore rows that are no longer current. The SQL pattern we use is always the same:

SELECT DimensionPK, DimensionNaturalKey FROM DimensionTable WHERE RowIsCurrent=1

That being said, configure the connection like this:
Connection: LocalHost.NorthwindDW
SQL: SELECT ProductKey, ProductID  FROM  DimProduct  WHERE (RowIsCurrent = 1)
Here’s a screenshot

12.   Next, click on Columns. The purpose of this section is to retrieve the lookup value and place it in the data flow. We usually set this up to find the natural key and retrieve the dimension key. This screenshot shows how this should be configured:

13.   Click OK to save the lookup.

14.   Next we need to lookup the supplier key. Repeat steps 7-13, but this time:
- Input / Output Selection: Lookup Match Output
- Connect the data flow from Lookup ProductKey to Lookup SupplierKey
- SQL: SELECT SupplierKey, SupplierID FROM DimSupplier  WHERE RowIsCurrent=1
- Lookup: SupplierKey using SupplierID
When you’re finished your data flow should look like this:

15.   Now we need to lookup the Date key. Repeat steps 7-13, but this time:
- Input / Output Selection: Lookup Match Output
- Connect the data flow from Lookup SupplierKey to Lookup DateKey
- SQL: SELECT DateKey FROM DimDate
- Lookup: DateKey using StageDateKey
When you’re finished your data flow should look like this:

16.   In the last part, we need to process the fact table. Again we will use the Slowly Changing Dimension wizard. You might be confused by this: why use a dimension wizard to process a fact table? The wizard does something very useful here: it helps us detect “new” rows so we avoid inserting the same fact more than once, which is critical to the ETL fact table process. (A rough T-SQL equivalent is sketched at the end of this step.)

17.   Drag and drop the Slowly Changing dimension onto the design surface, and then connect the output of the Lookup DateKey to it.

18.   Double-click on the Slowly Changing Dimension to configure it.

19.   Configure the table and keys as shown in this screenshot:

Notice we need to use our Dimension FK’s as business keys. This will ensure the same fact is not entered more than once. Click Next>

20.   Configure the SCD columns to match the screenshot.

Setting the measures as Type 1 attributes (replace) means that if we need to update a fact retroactively, we will be able to do so. Click Next>


 

 

21.   Select Changing attributes, as shown in the screenshot. In this form, when the same business key arrives, our fact data will be overwritten.

Click Next>

22.   Disable inferred dimension members, click Next>

23.   Click Finish to save. Your entire fact table data flow will look like this:

24.   Return to the Control Flow tab, and reorganize your control flow to match this screenshot:

25.   Execute your package and make sure it works. Debug any issues you have.
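For reference, the “detect new rows” behavior the Slowly Changing Dimension transformation gives you on the fact table is roughly equivalent to the hand-written pattern below. It is a sketch, not what the wizard literally generates, and #ResolvedInventory is a hypothetical name for the output of the lookups above:

-- Insert only facts whose key combination is not already in the fact table;
-- rows that already exist are updated instead (the Type 1 / changing attribute behavior).
INSERT INTO FactInventory (DateKey, ProductKey, SupplierKey, UnitsInStock, UnitsOnOrder, ReorderLevel)
SELECT src.DateKey, src.ProductKey, src.SupplierKey, src.UnitsInStock, src.UnitsOnOrder, src.ReorderLevel
FROM #ResolvedInventory AS src
WHERE NOT EXISTS (SELECT 1 FROM FactInventory AS f
                  WHERE f.DateKey = src.DateKey
                    AND f.ProductKey = src.ProductKey
                    AND f.SupplierKey = src.SupplierKey);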


 

Part 3: On Your Own: ETL Solution for Sales Orders

In this part, try to build out an initial load for the Northwind Sales Orders data mart. Don’t worry about updates, just the initial sales load, like we did in this lab. Here’s a high level outline of the process:

1.       Create a new SSIS project – NorthwindSales

2.       In connection Managers, create your project connections to Northwind, NorthwindStage, and NorthwindDW

3.       Add a new SalesStage Package to the project.

a.       Add data flow tasks to stage all dimensions

                                                               i.      Note: you can skip the dimensions you’ve already staged, like Product

                                                              ii.      To save time, have the destination data flow task create the table on the target in the stage database

b.      Add data flow task to stage the fact table.

                                                               i.      Be sure to include the natural keys from the dimensions.

                                                              ii.      Write SQL to roll up the fact table to the appropriate grain (see the sketch after this outline).

4.       Add a new SalesMart Package to the project.

a.       Add data flow tasks to move data from staging to the dimension tables

                                                               i.      Consult your documentation for types of SCD’s

b.      Add a data flow task to move data from staging into the fact table.

                                                               i.      Follow the pattern we used in the last step of part 2 to configure the data flow to lookup dimension keys from the natural keys.

c.       Organize the control flow of all the packages so that the dimensions are populated before the fact table.

5.       When you’re finished, zip up BOTH your part 2 and part 3 solutions and submit to the assignment drop box of our LMS.
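For step 3.b.ii above, the roll-up query will look something like the sketch below. It assumes the standard Northwind Orders and Order Details tables and an order-date/product grain; adjust the grouping and measures to match the grain in your detailed design:

-- Roll Northwind order lines up to one row per order date and product.
SELECT CONVERT(INT, CONVERT(CHAR(8), o.OrderDate, 112)) AS OrderDateKey,   -- YYYYMMDD
       od.ProductID,
       SUM(od.Quantity) AS TotalQuantity,
       SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS TotalSalesAmount
FROM Orders AS o
INNER JOIN [Order Details] AS od ON od.OrderID = o.OrderID
GROUP BY CONVERT(INT, CONVERT(CHAR(8), o.OrderDate, 112)), od.ProductID;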

Good Luck!
