Categories: Nerd Stuff, SQL Posted by NateChurch on 12/15/2011 11:21 AM | Comments (0)

I have been on a team where we are creating a solution for a corporation that will be deployed to different business units around the world as a standardized solution. In this case the solution is for oil field work and will help each business unit optimize the production of wells based on the injection of water into other wells, a process called waterflooding. The customers for this application will be the engineers who will monitor the metrics in order to make decisions on how to optimize the process by controlling the variables available to that engineer. For instance, they can increase water flow to an adjacent well if they see production drop off from another well or they can even recommend drilling new wells.

                As you can imagine this application is very data intensive. The process of injecting water creates many standard metrics which we are going to measure with our application. Some of these could be water rate, water pressure, or oil out for instance. With a user base that is largely technical and mathematically driven, they want as many different numbers as they can get to make their decision. This can present a problem because the information they want may be outside of the standard dataset we will provide.

                This need for additional data is also driven by each business unit’s differing geography and geology. For instance, in West Texas it is often cheaper to drill a new well if a greater injection rate is needed, where an offshore rig in deep water would find the same solution cost prohibitive. Other the other hand, offshore wells may take more expensive measurements, but performing these same measurements hundreds of times in a West Texas field is too expensive to do. 

                We are tasked with creating a solution that has as many of the basic attributes as possible for our first release so the engineers can make the nest decision, but giving the ability to expand those data sets as needed. We will hand the finished product to each business unit’s IT staff and they will then take over ownership. We will send out patches and upgrades to the solution, which they will then install. Our finished product will consist of a database, ETL, and a user interface, so the extensibility is needed throughout.

                With these goals in mind, I was asked to come up with different solutions for creating an extensible dataset. I have detailed some of these solutions below. My goal was to create a dataset that was both modular, meaning that it was standardized for flexibility, but also extensible, or able to be expaned. Since we are handing the tables over to the business unit IT staff, it is important to maintain logical divisions between the added code. This will make code we have developed less likely to break when extended. It will also facilitate smooth upgrades and patches.

 

Here are the solutions I provided to my client. Obviously certain choices are preferred over others.

1.       Add columns to the original source tables.

a.       Positives: This is arguably the simplest

b.      Negatives:  Presents problems when upgrading the solution and can affect any tuned performance you have set up.

In the sample, our original columns are in blue and the added columns are in red. So, we have added Latitude and Longitude

 

2.       Create custom tables and join them via views to the source tables.

a.       This is another simple solution where you create one table preferably in another schema for every table where you wish to have custom values. In our sample, the solution tables are in the dbo schema and the user tables are in the bu schema.

b.      Positives:

                                                               i.      This table is owned and managed by the users

                                                             ii.      Can be put into separate schema for increased security (which is also nice for use on alternate filegroups or alternate backup strategies if needed)

                                                            iii.      View can auto-extend

Note:  (This can be done by using SELECT dbo.col1, dbo.col2, bu.* …)

                                                           iv.      Helps with upgrades

                                                             v.      Does not affect performance of the original tables outside of the view

                                                           vi.      Users create their own ETL to populate these tables.

                   c.       Negatives:

                                                               i.      Must account for multiples in the custom attribute tables, so row counts stay the same.  For instance, if there are two rows for WellID BH1 then there will be more rows in the final view. Two ways you can handle this via a “TOP 1” or by adding a date added column and taking the most recent.

 In the sample, you can see we joined the table with a simple outer join and the view is populated with values for both tables. Be sure to use an outer join so you have the flexibility to keep your interface running when there are no custom attributes provided by the users. This is especially important because there may be two ETL processes(solution and user) updating a single row in the view. It is also important your interface can handle NULLs.


 

3.       Create one custom value table  and join those via views to the source tables

a.       This solution creates a common attribute table so there are less new tables. It basically creates a virtual or dynamic tables, that you can create via a PIVOT.

b.      Positives:

                                                               i.      Custom value tables are great at keeping all values in one location for copying during upgrades

                                                             ii.      These are helpful because you have them created and added dynamically via the interface. 

                                                            iii.      Users can update the custom valued table using their own ETL.

                                                           iv.      Can be put into separate schema for increased security

                                                             v.      Stored Procedures can be written so they don’t have to change.

c.       Negatives:

                                                               i.      Doesn’t allow for multiples of the same attribute.

                                                             ii.      Requires modification of the views to add new attributes.

                                                            iii.      If you extend to far (too many attributes) you will start to see performance degrade.

 

In the sample you can see the custom table contains the custom values for all the tables. In this case, you would want a Primary, Clustered Index on the first three columns (Table, RowID, Attribute). The values are then added to the original values using a PIVOT clause. Optionally you could create a view that does the PIVOT before the final view for the end to maintain(see dotted line in sample).

 

 

4.       Create an XML column on a lookup table and store the additional information there.

a.       This would involve creating an XML schema and creating XML documents for each row. You could choose to store this in the original table, a custom table for each domain, or one custom table for all domans (similar to the designs of Options 2 And 3 but with XML)

b.      Positives:

                                                               i.      It can be indexed as XML as well.

                                                             ii.      Customer can update these using their own ETL, separate from those provided.

c.       Downside:

                                                               i.       It is difficult to update without an interface to do so.

                                                             ii.      User’s ETL coding is more complex.

                                                            iii.      Views must be manually changed for each new attribute

 

In the sample you can see that the there is a table that contains custom XML attributes. This option could also be created one table for the whole solution (similar to the last option) or it could be created as an additional column on the original table. I have gone away from that, because I don’t like BLOBs attached on the original tables. In this case I wouldn’t so that I can move them to other filegroups if needed.

 

 

Sample Scripts:

I have included some sample SQL scripts for option 3 and 4 in case you wish to see how these might work code wise.

ModularExtensibleDataSets-DynamicTables.sql (2.69 kb)

ModularExtensibleDataSets-XMLColumn.sql (764.00 bytes)

 




Comments are closed