The other night, I gave a presentation on SSIS architecture to a local user group, and I thought I'd share the discussion here. This is part of a beginner's course on SSIS I'm teaching, where I share tips on how to design your packages in a way that's both supportable and extensible.
The discussion the other night was on what to do when you have multiple tables to load. Do you put them in a single package or in multiple packages, each with a single table load? Of course, like everything else in databases, there's no single answer that works across the board. You have to look at every scenario you write and make this decision again and again. It's not that bad, though; after a while, you get a feel for it, and it becomes much easier.
[ Keep up with app dev issues and trends with InfoWorld's Fatal Exception and Strategic Developer ]
Before I get into specifics, I'd like to say a couple words on why this is necessary. I've been supporting both DTS and SSIS for many years. In the process, I've seen all types of architectures for processes, and I can say they're not all created equal.
Some are much easier to support from the DBA side and to maintain from the dev side, so you have to decide on your goals for your process. Is your goal to move the data as fast as possible with little regard to support or extensibility? Are you looking for a quick recovery should something go wrong? Maybe you want to move the entire process easily to different servers or at least part of the process to a different server.
These are some of the possibilities, but the point is that you can't architect a process until you've determined your goals. How would you know if you're successful unless you measure your progress against concrete plans?
We see this in high-availability scenarios a lot. Someone will say they need high availability for their big box and instantly put in a cluster -- probably because it sounds cool. A few months later, the data typically becomes corrupted in some way and the admins are shocked to find out that the cluster hasn't protected them. This is because they didn't bother writing down their high-availability goals and, thus, lack a way to measure the solution.
I'll add that this is the same decision process you go through for replication scenarios. How do you want to divide up your publications? It's impractical to think that you'll have 80 tables in the same publication -- if something goes wrong and you have to resync the data via snapshot, you run the risk of greatly increase its duration. If you break it up, you can resync only the affected tables.








