SQL Server and GDPR, what should you be doing?

24 07 2018

SQL SERVER…..GDPR…..ONE OF THESE IS MY FAVOURITE THING TO BLOG ABOUT….THE OTHER IS GDPR.

Fortunately, this blog isn’t about GDPR! It’s about SQL Server, or more specifically, how to secure your data in SQL Server. However, if you care about GDPR (and let’s face it, we’re constantly told we should!) and you happen to have a few SQL Server databases kicking around, lying bare to the world, then it’s probably worth your while reading on.

Over recent months, most of our customers at Web Applications UK have enquired about what they should be doing database-wise to make sure they’re protecting their data. SQL Server gives you a few different options on this front, depending on what version you have. Unfortunately, there’s no one-size-fits-all answer. However, I’ll attempt to summarise the key characteristics of each of these options.

Former Oldham Athletic footballer Fitz Hall, nicknamed “One Size” (get it??). He doesn’t like talking about GDPR.

 

Transparent Data Encryption (TDE)

What is it?

TDE is nothing new. In fact, it’s been a DBA staple since SQL Server 2008. TDE encrypts the physical files – including backups – by using a certificate, much like most other forms of encryption. However, it does NOT encrypt the data itself: data in memory, or travelling over the network, is left in the clear.

When is this a good option?

Well…if someone walks off with your backups, or even the raw data files, they’re going to have a pretty tough time restoring your databases. So if this is a concern, then TDE is a great option. It’s also super easy to implement – your applications won’t need to change one bit!
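For reference, enabling TDE is only a handful of statements. Here’s a minimal sketch – the database name, certificate name, and password are placeholders, so swap in your own:

```sql
USE master;
GO
-- Master key protects the certificate (store this password somewhere safe!)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPasswordHere>';
GO
-- Certificate protects the database encryption key (back this up immediately!)
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE Certificate';
GO
USE MyDatabase;
GO
-- Database encryption key, protected by the certificate
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE TDECert;
GO
-- Turn encryption on
ALTER DATABASE MyDatabase SET ENCRYPTION ON;
```

Remember to back up the certificate and its private key straight away – without them, your encrypted backups are just noise.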

However…

  • It’s an Enterprise-only feature (boo-hoo!), and your disaster recovery (DR) server will need Enterprise too if you ever need to restore the backup!
  • It doesn’t encrypt the data across the wire, so if someone intercepts your data in transit, TDE won’t protect you. If you have a rogue DBA in your data centre, TDE isn’t going to stop them simply dumping all your data into another database and walking off with it.
  • If you’re prone to forgetting passwords, or losing encryption keys, then you’re not going to have a fun time with TDE.

Further reading

https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/transparent-data-encryption?view=sql-server-2017

 

Always Encrypted (AE)

Always Encrypted was new in SQL Server 2016. AE means that the data is always encrypted within the database, and in transit. The data can only be decrypted by an Always Encrypted-enabled client driver, such as ADO.NET or ODBC. This requires a certificate to be installed on your app server. So anyone stealing your database, or even just querying AE columns through SSMS, is going to see the data in its encrypted state (i.e. a load of junk).

AE encrypts the data at column level, so you target specific columns that you want to encrypt, such as passwords, credit card numbers, and other bits of Personally Identifiable Information (PII). Basically, you get to pick and choose (to some extent) what data you want to encrypt.

When is this a good option?

  • It’s available in all editions (yay!)
  • The data is protected all the way down to the application server, so that rogue DBA in the remote data centre isn’t going to be able to see your data. Equally, anyone intercepting the data in transit will only see the encrypted version. It’s a great solution for providing separation between those who own the data and those who manage it.
  • It involves very little in the way of application changes – you just need to make sure you’re using an Always Encrypted enabled driver in your application.

However…

  • Remember, SQL Server doesn’t hold the encryption keys, which makes things tricky when you want to compare data. Range-scan queries (e.g. LIKE, <, >, BETWEEN) have issues. Some of this logic will need to be shifted to your application layer. Not fun! However, ask yourself: do I actually need to perform these kinds of searches on my encrypted columns?
  • Full-text indexes on an encrypted column aren’t supported.
  • Every client application that needs access to the data (e.g. a data warehouse) needs a copy of the certificate, so there’s a maintenance overhead.
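To give a flavour of the syntax, here’s a hypothetical column definition. It assumes a column encryption key called CEK_Auto1 has already been created (e.g. via the SSMS wizard), and the table/column names are made up for illustration:

```sql
CREATE TABLE dbo.CustomerSecure (
    CustomerID int IDENTITY(1,1) PRIMARY KEY,
    -- Deterministic encryption allows equality lookups on the column,
    -- but requires a BIN2 collation for string types
    CreditCardNumber nvarchar(25) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (
            COLUMN_ENCRYPTION_KEY = CEK_Auto1,
            ENCRYPTION_TYPE = DETERMINISTIC,
            ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
        ) NOT NULL
);
```

Your application then needs Column Encryption Setting=Enabled in its connection string so the driver knows to decrypt.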

Further reading

https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-database-engine?view=sql-server-2017

https://www.brentozar.com/archive/2017/08/use-always-encrypted/

 

Dynamic Data Masking (DDM)

Yes, that’s MASK, not encrypt, hash, or any other fancy stuff. Quite simply, DDM allows you to put a configurable mask in front of a column’s data, so undeserving prying eyes can see only some, or none of it. The amount they can see depends on the masking function you apply. Phone numbers and email addresses are obvious uses here.

Quick Example:

CREATE TABLE TESTMAIL (EMAIL NVARCHAR(100) MASKED WITH (FUNCTION = 'partial(1, "xxxx", 0)') NULL);
INSERT INTO TESTMAIL VALUES ('test@test.com');
SELECT EMAIL FROM TESTMAIL;

Result to user with the UNMASK permission on the table: test@test.com

Result to user WITHOUT the UNMASK permission on the table: txxxx

When is this a good option?

  • You want column-level control of what data is protected.
  • You want to use SQL Server permissions to control who can see behind the mask.
  • You want to search the masked data using range-scans (LIKE, <, > etc).
  • You want something that’s fast to apply to existing columns.
  • You don’t want to modify your existing queries.

 

However…

  • It’s not completely secure – just because you can’t see the data, doesn’t mean you can’t figure out what it is. For example, you may ‘mask’ the SALARY column in your HR database, but someone running ...WHERE Name = 'Joe Bloggs' AND SALARY BETWEEN @VAL1 AND @VAL2... isn’t going to have to work too hard to figure out Joe’s salary.
  • If your application uses a single login throughout, DDM is going to be harder to implement. DDM relies on the login context, so you’re going to need to use EXECUTE AS in your queries, or multiple connection strings in your application. The latter has the downside of preventing connection pooling and spinning up multiple database connections.
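To see the mask in action from another user’s point of view, you can impersonate a low-privileged user (MaskedReader is a made-up name) and grant or revoke the UNMASK permission:

```sql
-- Hypothetical user with SELECT rights but no UNMASK permission
CREATE USER MaskedReader WITHOUT LOGIN;
GRANT SELECT ON TESTMAIL TO MaskedReader;

EXECUTE AS USER = 'MaskedReader';
SELECT EMAIL FROM TESTMAIL;   -- sees the masked value
REVERT;

-- Let them behind the mask
GRANT UNMASK TO MaskedReader;
```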

Further reading

https://docs.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking?view=sql-server-2017

 

Row-level Security (RLS)

Like AE, RLS was new in SQL Server 2016 (all editions). RLS is concerned with removing rows of data from the results of a query (as opposed to masking or encrypting), depending on permissions. If a user isn’t allowed to read/modify a particular subset of rows from a table, then they’re filtered out. The filtering is performed by an inline table-valued function that you write, bound to the table by a security policy, so you can apply your own business logic as required.

When is this a good option?

  • You want to restrict access to specific rows in a table (both READS and WRITES), based on the user context.
  • When you want to centralise the access-restriction logic for your application and ensure consistency.
  • When you have complex business rules that govern who can modify what data.

However…

  • Think about performance! You are applying a predicate function to the query and it’s going to slow it down.
  • Like DDM, RLS requires apps to execute in the context of the end user, which could lead to many open connections. Using a dynamic ‘EXECUTE AS [user]’ could be one way around this.
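As a sketch of the moving parts, assuming a hypothetical dbo.Sales table with a SalesRep column holding the owning user name:

```sql
-- Predicate function: only rows belonging to the current user (or dbo) pass
CREATE FUNCTION dbo.fn_SalesRepPredicate(@SalesRep sysname)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS Allowed
       WHERE @SalesRep = USER_NAME() OR USER_NAME() = 'dbo';
GO
-- Bind it to the table: FILTER hides rows from reads,
-- BLOCK stops users inserting rows they wouldn't be able to see
CREATE SECURITY POLICY dbo.SalesFilter
ADD FILTER PREDICATE dbo.fn_SalesRepPredicate(SalesRep) ON dbo.Sales,
ADD BLOCK PREDICATE dbo.fn_SalesRepPredicate(SalesRep) ON dbo.Sales AFTER INSERT
WITH (STATE = ON);
```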

Further reading

https://docs.microsoft.com/en-us/sql/relational-databases/security/row-level-security?view=sql-server-2017

 

Conclusion

Hopefully this has given you a basic overview of what your options are. Which is right for you will depend on your business requirements. If you have Enterprise edition, then turning on TDE from the outset seems like a no-brainer, right? The other options are going to require a little more thought. Just spare one of those thoughts for the app development team – you’re likely going to need to work with them on this!

Thanks for reading!





SQL Server Waits: How, What and Why

1 08 2017

I’m British, and we’re used to waiting in line for things, but why is my SQL Server waiting?

This is a question that comes up a lot, and is a lesson 101 problem for any Database Admin. We know that SQL Server waits, but is this bad? Should we be doing something, or buying something? What do these ‘waits’ really mean, and how do I get rid of them?

Well, let’s take a breath and not panic. SQL Server waits for many reasons, and they’re not all something to worry about! In fact, many of them can be ignored altogether, and if you weren’t seeing any waits whatsoever, then you’d probably be better off saving yourself some cash and sticking your data in an Excel spreadsheet.

But no one likes waiting, and reducing these waits is good, so let’s examine SQL Server Waits in more detail. We’ll start by talking about what we mean by a ‘wait’, then go on to talk about why they occur, before looking at some useful scripts to help us identify them. Finally, we’ll talk about what actions we could take to reduce these waits.

What is a Wait?

SQL Server executes your queries using something known as a ‘worker thread’. Put simply – if one of these threads wants to execute, but a required resource is not available, it must wait! Any active thread that is not currently executing is classed as ‘waiting’, and of course if a thread is waiting, our query will take longer to complete.

When these threads ‘wait’, SQL Server logs lots of super-useful information, such as how long it’s been waiting, and why it’s waiting (known as the wait ‘type’). Once we understand it, we can use this information to make informed decisions on how to reduce these wait times.

What can cause waits?

Let’s think about what happens when SQL Server executes a query:

  1. SQL Server requests the required data pages from the physical disk and they are loaded into its memory buffer pool (if not already there)
  2. SQL Server accesses the data pages from the buffer pool and applies the required locks
  3. The query is executed

So considering the above, let’s think about what might cause SQL Server to wait:

Disk I/O

Those old spindly disks just can’t keep up sometimes! If SQL Server must read the pages it needs from disk, your thread is going to relinquish its CPU time and wait until those data pages are loaded into memory.

Page Contention

SQL Server manages access to the data pages by a series of locks. For example, by default your SELECT statement will need to place a SHARED lock of some sort on the required data pages to ensure they’re not modified by any other transaction until it’s complete. However, the execution thread could be blocked from achieving this by another thread, which is holding an incompatible lock. This behaviour is critical to maintaining the integrity of the database transactions, and guess what? If your thread can’t acquire the locks it needs, it’s going to have to sit there and wait.

CPU

Finally, of course your thread needs time on the CPU to be executed. If a thousand other requests also need executing, your query may just have to wait in line for its turn (or more specifically – a queue – more on this below)!

How SQL Server manages a waiting thread

Whenever a thread is ‘active’, SQL Server holds it in one of three states: Runnable, Running or Suspended. Throughout its lifecycle, your thread will likely pass through all 3 of these states at least once.

It will come as no surprise that ‘Running’ means your thread is being executed, and the other two mean it isn’t! Both ‘Runnable’ and ‘Suspended’ are ‘waiting’ states, and as mentioned earlier, SQL Server will track the time your thread spends in this state.

You may find that your thread sits in one of the ‘waiting’ states for a prolonged period, and you need to know why! Let’s examine each state in more detail.

Runnable

Threads in the runnable state are not waiting on anything, other than CPU time. They’re sat in a queue, and will be dealt with on a First-In-First-Out (FIFO) basis.

Seeing threads in the runnable state is completely normal on even a moderately busy system. However, if threads are staying in this state for long periods, this is more likely a sign of CPU pressure. A CPU with 32 logical cores can only execute 32 threads at any one time, so anything beyond that is going to cause a wait!

Running

As mentioned, the Runnable queue is a FIFO queue, so the thread at the ‘head’ of this queue will eventually get some CPU time, and move into the Running state. Your query is being executed! Woohoo! However…

Suspended

When your thread is running on the CPU, it doesn’t get to just run and run until it’s done. It’s actually given just 4ms (yes, that’s milliseconds!) of CPU time before it’s forced to relinquish control. This could be even shorter if it needs a resource at some point that it has to wait for (such as a lock, or disk I/O). Once that happens, the thread moves onto the Suspended list (otherwise known as the ‘waiter’ list).

Note that this is a list, as opposed to a queue – there’s no knowing when your resource will be available, and your thread must wait until it is!

If you consistently see high numbers of threads in the Suspended state then you’ve got contention issues on a resource other than CPU (more on this below).

Once the resource is available, your thread will move back onto the Runnable queue, and the cycle continues…

Suspended -> Runnable -> Running (4ms) -> Suspended -> Runnable -> Running (4ms) -> etc…

How do I find out why my SQL Server is waiting?

So now we understand waits, let’s examine how we can identify what types of waits our server is suffering from.

This is where SQL Server itself can help us. We can check the current state of our thread by checking the following dynamic management view (DMV) while our query is executing:

SELECT * FROM sys.dm_exec_requests WHERE session_id > 50;

(note – session_id 50 or below are internal processes and can generally be ignored for this purpose)

That will give us some useful information, but it usually shows us an ever-changing picture, and doesn’t really give us enough to start troubleshooting. There are a couple of other DMVs for diagnosing wait stats which you should be aware of, and when the 3 are used in combination, they become extremely useful:

sys.dm_os_wait_stats

This DMV returns information about all the waits encountered by threads that have executed. It gives an historical view of the data (since the server was started, or stats were purged), rather than a current view.

sys.dm_os_waiting_tasks

This DMV returns information about what tasks are in the waiter list (Suspended) now i.e. waiting on some resource. It gives a current view of the data, and therefore is most useful when the system is currently experiencing blocking issues.
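For example, here’s a quick look at the top cumulative waits on a server, filtering out a handful of wait types that are generally benign (the community scripts below maintain a much longer ignore-list than this sketch):

```sql
SELECT TOP 10
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms  -- portion of the wait spent queuing for CPU
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',
                        'SQLTRACE_BUFFER_FLUSH', 'XE_TIMER_EVENT',
                        'BROKER_TO_FLUSH', 'CHECKPOINT_QUEUE')
ORDER BY wait_time_ms DESC;
```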

There are a tonne of scripts out there, written by some very clever members of the SQL Server community, which do a lot of the work for you and really take the pain out of understanding what on earth is going on! I’m not here to state which ones you should and shouldn’t use, but these are the ones that I keep in my locker:

Paul Randal – Tell me where it hurts!

https://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/

This is a very useful script which wraps some aggregation code around sys.dm_os_wait_stats and shows you the cumulative wait stats for your server.

Pinal Dave – Current running queries

https://blog.sqlauthority.com/2009/01/07/sql-server-find-currently-running-query-t-sql/

Pinal Dave’s blog is a treasure-trove of useful scripts. This particular script is actually very simple – it’s an expanded version of sys.dm_exec_requests that throws in the actual query text for good measure, so you can see exactly what queries are currently being executed on your server.

Pedro Lopes – SQL Swiss Army Knife – Locking, blocking and active transactions

https://blogs.msdn.microsoft.com/blogdoezequiel/2012/09/18/sql-swiss-army-knife-11-1-locking-blocking-and-active-transactions/

As the names suggests, this script gives you pretty much everything you need for diagnosing current waits, and it does this by combining most of the key DMVs. It will show you what’s running right now, what’s waiting, what’s being blocked, and lots of other useful information.

Brent Ozar – Triage Wait Stats in SQL Server

https://www.brentozar.com/responder/triage-wait-stats-in-sql-server

This script gives you the best of both worlds – it will tell you the overall wait times on your server since it was started, and then give you a snapshot of your current waits. Brent’s blog also includes some links to sp_WhoIsActive from Adam Machanic, which I would definitely recommend you check out.

Interpreting wait stats – Perfmon is your friend

Fixing waits is rarely a case of just running a DMV and saying “Ah yes, there’s the problem”, however we can typically draw some reasonable conclusions from what we find.

As mentioned, if threads are remaining in the Runnable state for a long time, then you’ve likely got CPU pressure. Running a Perfmon trace and monitoring the System – Processor Queue Length and Processor – % Processor Time counters will help confirm this. Also, look at the signal_wait_time_ms column from sys.dm_os_wait_stats – this will tell you how much of the cumulative wait time was spent solely on waiting for CPU.
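A rough way to gauge this from the DMV is to compare signal waits to total waits. A high percentage hints at CPU pressure – the exact threshold is debatable (figures around 20-25% are often quoted), so treat this as a guide, not a verdict:

```sql
-- What proportion of all wait time was spent queuing for CPU?
SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms)
            AS numeric(5,2)) AS SignalWaitPct
FROM sys.dm_os_wait_stats;
```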

If reducing the workload on your server is not an option (it rarely is), then upgrading your CPUs might be your only choice. Remember, however, that waiting is normal. So don’t start filling out purchase requests unless they’re actually causing you a problem.

Threads that are Suspended are waiting on something other than CPU. Don’t make the mistake of instantly thinking, “OK, I need faster disks”, or “I need to buy more memory”. We usually have to introduce a further level of monitoring to come to these conclusions with any certainty, and for this I usually turn to Perfmon again. Brent Ozar has written a great blog on the best SQL Server Perfmon counters to analyze, and how to draw the right conclusions.

Conclusion

Waits are normal, and necessary, but should not be ignored. Navigating your way through the maze of information and identifying where improvements should be made can be somewhat challenging. It’s important to take a holistic look at what’s occurring on your server before jumping to any conclusions. The starting point is to just ask SQL Server – hey, what are you waiting for?

 

Feel free to leave a comment below.

 





SQL Server – Clustered vs Non-clustered indexes

31 08 2016

A question was asked on LinkedIn (https://www.linkedin.com/groups/137774/137774-6171222761881600002) regarding the difference between clustered and non-clustered indexes. Since LinkedIn limits the characters you can enter, I’m posting my (hastily written) reply here.

I often explain it to SQL newbies using the example of a dictionary vs a store catalogue. Let’s say you pick up a copy of the Oxford English Dictionary and you want to find the definition of a word, e.g. ‘Database’. Because all the information in the dictionary is sorted in a helpful order, i.e. alphabetical, you just scan through the dictionary, in order, and then stop at ‘D’. When you find it, you not only find the word ‘Database’, but also some information associated with the word, such as an explanation of what it means, and maybe a synonym. Your query has been fully satisfied. Happy days!

That…is a clustered index in a nutshell. It’s your data, physically sorted based on the index key, which in this case, was the word itself. In reality, the clustered index key is usually the primary key, which will often be an integer (although it certainly doesn’t have to be). Since the clustered index physically sorts our data, we can only have 1 per table, although it could contain multiple columns.

 

Now let’s imagine you want to buy a birthday present for your better-half. You pop into a store and pick up a thick product catalogue. The store catalogue is sorted by product category (furniture, homeware, electricals etc) and each product within the category includes the full description, price, pictures etc.  Here, our clustered index is on the product category.

However you’re only interested in gifts that are priced under $10 (you’re a generous soul!). How would you find these products efficiently? It could take you A LONG TIME to scan the entire store catalogue and find every product under $10! This search would be very inefficient and you’d more than likely get bored and give up.

The good news is that the store have included an index at the back of their catalogue. This lists every product, but in this index they’re sorted by price, low -> high. This means you can scan the products in price order and very quickly see all the products you’re interested in, never even reading the ones you’re not!

There’s a slight downside to this approach, which is that when you scan the price index, it doesn’t contain all the other information you’re after, such as the pictures of the product and its full description. If they were included as well, the price index would be almost as big as the catalogue itself! Instead, our price index contains the price (our index key), the name of the product (an included column), and crucially, a pointer to the page where the rest of the information (e.g. our pictures) can be physically found – in our clustered index!

That…is a simple non-clustered index in a nutshell. It creates an additional index which allows SQL Server to efficiently search for the data pages needed based on the search predicate provided in the query (in our example, the price). Also, because these indexes are separate from our physical table, we can have many non-clustered indexes on one table. Be aware, however, that each index will make the overall size of your database bigger (the index has to be stored somewhere after all!) and there is an impact on your DML statements, since the index has to be maintained.
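In T-SQL, the catalogue analogy might look something like this (table and column names are made up for illustration):

```sql
CREATE TABLE dbo.Products (
    ProductID   int IDENTITY(1,1) NOT NULL,
    Category    nvarchar(50) NOT NULL,
    ProductName nvarchar(200) NOT NULL,
    Price       numeric(9,2) NOT NULL,
    Description nvarchar(max) NULL
);

-- The 'catalogue' itself: rows physically ordered by category (one per table)
CREATE CLUSTERED INDEX CIX_Products_Category ON dbo.Products (Category);

-- The 'price index' at the back: key is Price, ProductName is an included
-- column, and each entry points back to the clustered index for the rest
CREATE NONCLUSTERED INDEX IX_Products_Price
ON dbo.Products (Price) INCLUDE (ProductName);
```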

This is a somewhat rushed explanation but I hope that helps!





SQL Server – Why a Table-Valued function could be killing your performance

23 08 2016

In this blog I’m going to walk you through a serious SQL Server performance problem I had in a customer’s production environment. It turned out that although the results were good, a misbehaving table-valued function in one query was executing thousands of times more than necessary, and killing the performance.

The code, as I found it, looked like this. Nothing unusual right? So let’s lift up the bonnet and see what’s leaking underneath:

SELECT C.CustomerID, O.OrderDate, TF.[Address]
FROM   #First100Customers AS C
INNER JOIN dbo.Orders AS O ON C.CustomerID = O.CustomerID
CROSS APPLY ( SELECT * FROM dbo.tfnGetRecentAddress(O.CustomerID)) AS TF

Note – For this example we’re using SQL Server 2012 (or earlier). The behaviour is different in later versions, which we’ll come on to.

Here’s the data script so you can follow along:

IF EXISTS (SELECT * FROM sys.objects WHERE Name = 'Orders' AND type = 'U')
    DROP TABLE dbo.Orders
GO
IF EXISTS (SELECT * FROM sys.objects WHERE Name = 'CustomerAddress' AND type = 'U')
    DROP TABLE dbo.CustomerAddress
GO
IF EXISTS (SELECT * FROM sys.objects WHERE Name = 'Customers' AND type = 'U')
    DROP TABLE dbo.Customers
GO
CREATE TABLE dbo.Customers (CustomerID int IDENTITY(1,1) PRIMARY KEY, Name nvarchar(200))

CREATE TABLE dbo.Orders (OrderID int IDENTITY(1,1) PRIMARY KEY, OrderDate datetime, CustomerID int REFERENCES Customers(CustomerID))

CREATE TABLE dbo.CustomerAddress (ID int IDENTITY(1,1), CustomerID int REFERENCES dbo.Customers(CustomerID), [Address] nvarchar(500), CreatedDate datetime DEFAULT getdate())

Now let’s populate our tables with some dummy data. Note that the script is creating 1,000 customers, each with 2 addresses and 10 orders:

DECLARE @Cnt smallint = 0, @oCnt int, @CustomerID int

WHILE @Cnt < 1000
BEGIN
    -- Create customer
    INSERT INTO Customers (Name) VALUES ('aaa')
    SET @CustomerID = SCOPE_IDENTITY()

    -- Create 2 addresses per customer
    INSERT INTO CustomerAddress (CustomerID, [Address]) VALUES (@CustomerID, 'Dummy Address 1')
    INSERT INTO CustomerAddress (CustomerID, [Address]) VALUES (@CustomerID, 'Dummy Address 2')

    -- Create 10 orders per customer
    SET @oCnt = 0
    WHILE @oCnt < 10
    BEGIN
        INSERT INTO Orders (OrderDate, CustomerID) VALUES (getdate(), @CustomerID)
        SET @oCnt = @oCnt + 1
    END

    SET @Cnt = @Cnt + 1
END

 

And finally, let’s create our table-valued function. What the TVF does is irrelevant to the problem I’m highlighting, but it’s nice to use a real-world example, so this one picks up the most recent address of the customer (and yes, you’re right – this should be an inline TVF!):

CREATE FUNCTION dbo.tfnGetRecentAddress(@ID int)
RETURNS @tblAddress TABLE
(
    [Address] varchar(500) NOT NULL
)
AS
BEGIN
    INSERT INTO @tblAddress ([Address])
    SELECT TOP 1 [Address]
    FROM dbo.CustomerAddress
    WHERE CustomerID = @ID
    ORDER BY CreatedDate DESC
    RETURN
END
GO
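As an aside, the inline version hinted at above might look like this. The optimizer can expand an inline TVF directly into the calling query’s plan, which avoids this whole class of problem (the function name is my own):

```sql
CREATE FUNCTION dbo.tfnGetRecentAddressInline(@ID int)
RETURNS TABLE
AS
RETURN
    SELECT TOP 1 [Address]
    FROM dbo.CustomerAddress
    WHERE CustomerID = @ID
    ORDER BY CreatedDate DESC;
GO
```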

Right, we’ve got all our objects in place, now we can move onto the problem in hand. Let’s say we write a query that returns the first 100 customers from the Customers table. We want it to output the CustomerID, their order date history, and their most recent address (using our TVF). To help the example, we’ll cache these customers in a temp table:

CREATE TABLE #First100Customers (CustomerID int PRIMARY KEY)

INSERT INTO #First100Customers
SELECT CustomerID
FROM   dbo.Customers
WHERE  CustomerID <= 100

SELECT C.CustomerID, O.OrderDate, TF.[Address]
FROM   #First100Customers AS C
INNER JOIN dbo.Orders AS O ON C.CustomerID = O.CustomerID
CROSS APPLY ( SELECT *
FROM dbo.tfnGetRecentAddress(O.CustomerID)) AS TF


Nothing special about that. We get back 1,000 rows (100 customers x 10 orders), just as we’d expect. So job done, let’s go home, right? Well… not quite. Take a look at the execution plan for our query, and in particular, that TVF!

Wow! 10,000 executions of our TVF! Why? Our temp table contains only 100 customers, the JOIN onto Orders will result in 1,000 rows, so we could legitimately expect our TVF to be called 1,000 times. So why on earth is our TVF being executed 10,000 times?

Looking at the plan it starts to become clearer. The optimizer has re-written our query and decided to scan Orders first, which results in 10,000 rows (the entire table), and each ID returned is then passed into the TVF! It then executes the JOIN onto #First100Customers, which throws away the 9,000 rows that don’t match the JOIN condition. So our results look good, however the TVF was still executed 10,000 times – that’s at least 9,000 times more than was necessary.

Let’s examine how the same poorly-performing query behaves in SQL Server 2014 and SQL Server 2016:

Interesting, now the optimiser has scanned and joined our 2 tables first, which resulted in 1,000 rows, and just 1,000 executions of our TVF – exactly what we expected.

So the answer is to tell the customer to upgrade their SQL Servers, right? Er, well – no. In our example, we can solve the problem and provide a significant performance boost, across all SQL versions, with a small code tweak.

SELECT C.CustomerID, O.OrderDate, TF.[Address]
FROM   #First100Customers AS C
INNER JOIN dbo.Orders AS O ON C.CustomerID = O.CustomerID
CROSS APPLY ( SELECT *
FROM dbo.tfnGetRecentAddress(C.CustomerID)) AS TF

The change is subtle, so I’ve highlighted it above – previously we were passing in the CustomerID from our Orders table. The data value is exactly the same, but by switching to the CustomerID from the smaller table, we force the optimiser to scan #First100Customers first, which in turn forces the optimal plan.

Let’s look at our new query plan. This plan is the same in all versions of SQL Server >= 2008.

Remember when we started there were 10,000 executions! What an improvement! Only 100 executions – 1 per entry in #First100Customers!

To many of you, this may seem very obvious and natural. However, for a junior developer, it’s an easy mistake to make, because the results look correct. Yet we got a sub-optimal query plan, and therefore a slow-running query. In SQL Server 2014/2016, the impact was moderate, but in 2012 or earlier, it was massive.

So what do you think? Even in the later versions, is this a bug in SQL Server? It’s arguable that the optimiser should be smart enough to recognise that #First100Customers.CustomerID and Order.CustomerID are the same value (it’s the JOIN condition!), and therefore only ever execute the TVF 100 times, regardless of which reference is passed.

Well, it doesn’t, let’s get over it. At least we know not to make that mistake again!

Before I wrap up, let’s change our example slightly. The first example illustrates an important point, but arguably it’s a junior coding mistake. However, consider this scenario where that isn’t the case:

CREATE FUNCTION dbo.tfnGetOrderPrice(@OrderID int)
RETURNS @tblOrders TABLE
(
    Price numeric(9,2) NOT NULL
)
AS
BEGIN
    INSERT INTO @tblOrders (Price)
    VALUES (10.99)
    RETURN
END

GO

SELECT C.CustomerID, O.OrderDate, TF.Price
FROM   #First100Customers AS C
INNER JOIN dbo.Orders AS O ON C.CustomerID = O.CustomerID
CROSS APPLY ( SELECT *
FROM dbo.tfnGetOrderPrice(O.OrderID)) AS TF

This is a new query, calling a different TVF which takes the OrderID as the parameter (and just returning a hard-coded price for the purposes of this example). Let’s examine the execution plan in SQL 2012 (or earlier):

Unsurprisingly, we’re back to 10,000 executions, and a poorly performing query. So how can we resolve this so the TVF gets executed just 1,000 times as we would expect (remember, it’s 100 customers x 10 orders)? We know from the last example that upgrading to 2014 or 2016 will fix the plan, but is there another way?

Well not that I know of, at least not an elegant way! Unfortunately, we’re going to have to start hacking.

The hack is that we can pass a dummy parameter from #First100Customers to the TVF that does nothing. That is to say, it does nothing except that the dependency forces the optimiser to produce a different plan:

ALTER FUNCTION dbo.tfnGetOrderPrice(@OrderID int, @DummyID int)
RETURNS @tblOrders TABLE
(
    Price numeric(9,2) NOT NULL
)
AS
BEGIN
    INSERT INTO @tblOrders (Price)
    VALUES (10.99)
    RETURN
END
GO

SELECT C.CustomerID, O.OrderDate, TF.Price
FROM   #First100Customers AS C
INNER JOIN dbo.Orders AS O ON C.CustomerID = O.CustomerID
CROSS APPLY ( SELECT *
FROM dbo.tfnGetOrderPrice(O.OrderID, C.CustomerID)) AS TF

You can see that thanks to our hack, we now have an optimal plan, resulting in 1,000 executions of our TVF, instead of 10,000.

I try to follow the mantra of “assume hacking isn’t possible”. Sometimes you don’t have a choice, but please comment if you know a nicer way of solving this problem.

My aim in this blog wasn’t to teach you a fairly dirty hack. It was to highlight this particular behaviour of TVFs, and to stress the importance of examining your execution plans carefully. If you’re new to SQL Server, and want to make that leap from coder to developer, then it’s important to learn that SQL Server programming doesn’t stop at the moment your query returns the results you expect.

Thanks for reading and please comment below.





Getting started with the Raspberry PI, PIFace and Scratch.

8 07 2013

I’ve just gone through the slightly torturous process of setting up my Raspberry PI (RPi) and PiFace. For my own reference, and for others, here’s a list of useful resources. Hopefully this will give you all the pieces of the ‘Pi’ that you need (sorry!!!).

Will add more detail as soon as I can. I’m posting this straight away so others can start benefiting immediately. Note that some of the steps, such as ‘enabling mesh in Scratch’, may not be necessary – it depends on which version of Linux you’re using. Personally I’ve found the one referenced above to be the most stable, although it does require a little more manual setup than others.

Note: Always shut down your RPi correctly (type sudo shutdown -h now from the LXTerminal window). Otherwise, be prepared to re-image your SD card on a regular basis!

A quick intro to Linux

I’m by no means an expert on Linux. I hadn’t had the ‘pleasure’ of it until I bought my RPi. For the non-technical among you, here’s a quick guide to help you get up to speed.

You’ve all used Windows, right? Most of you have probably used iOS (iPhone) or Android on your phones. These are all examples of a special type of software called an ‘Operating System’, which can be thought of as the ‘head teacher’ of your device. The operating system is responsible for running all the other software on your computer, as well as a whole bunch of other stuff like memory management, security, etc.

Linux is another one of these Operating Systems, and it’s the one required by the RPi. Just like Microsoft Windows (which has many versions, including XP, Vista, and Windows 7 & 8), Linux also has many versions out there. However, because Linux is open source (its source code is ‘unlocked’), anyone with the technical skills can make their own version. Consequently, when you come to install Linux on your Pi, you may find yourself bombarded with all the different ‘versions’ and ‘builds’ to choose from.

The guys behind the RPi have made a customised version of Linux which is tweaked to play especially nicely with the Pi. You can find that version by following the link above. Note that there are other versions out there which have been customised specifically for use with Scratch; however, I haven’t tried them personally.

Keep an eye on my tweets (@ShaunRAustin) and I’ll post more useful stuff as I find it.





Teaching Scratch

19 07 2012

Introduction

This is my second blog on my experiences in teaching programming to children. The first discussed what happened when I went into a primary school to teach some of the basics of Computer Science, such as binary, logic, and cryptology.

IT in education is a big topic in the UK at the moment, and has been ever since Education Secretary Michael Gove announced that he wanted to replace traditional ICT lessons with Computer Science lessons. This is something I’ve felt strongly about for some time, since my personal memories of learning IT at school are:

  • Enter data in Excel, sum up values.
  • Type letter in Word, maybe do a mail merge.
  • Create a PowerPoint presentation and make text fly in!

The problem is that kids nowadays already know how to do all of that by the time they leave primary school, and often well before. My four-year-old was using a computer before riding his first bike, and his primary school have recently bought a bunch of iPads (admittedly iPad 2s, but let’s not be picky)! The world of apps has opened up in front of our children and they’re inquisitive; now let’s make them capable!

Scratch

I won’t go into much detail about what Scratch is, as it’s relatively well established, other than repeating the marketing blurb from the website:

Scratch is a programming language that makes it easy to create your own interactive stories, animations, games, music, and art — and share your creations on the web.
 

Scratch is perfect as an introduction to programming because you can produce something in no time at all, and produce something tangible at that! Rather than just learning a concept or a methodology, the students end up with something they can share with their family and peers.

The lessons took place in the week commencing 7th July with a group of 10 students from the Waterhead Academy, Oldham. Thankfully, the Waterhead Academy’s IT lessons are already far better than my own were (and, I suspect, many other schools’). Some of the children had used Scratch before and some hadn’t, but all were there because they wanted to be; they had an interest in programming, and it was my job to nourish it.

Each lesson was one hour long, starting with ‘what is software?’ and working right through to making a game. Overall the response was everything I’d hoped for and more. The students were fantastic: they followed the tasks I set them, but also expanded on them as they thought of their own ideas, to the point where much of my lesson plan went out of the window and the students were dictating where we went next. From lesson one they made it clear they wanted to make games (no surprise!), so I brought that in early and ditched one of the other planned lessons. A game was important because, however simple, it allowed the kids to make the connection between what we were doing and what real software developers do, and having something tangible that they could play with and show their peers was a big motivation to do a good job!

I wanted to share a summary of what we did in each lesson. I got some of the ideas from a series of online lesson plans, which can be found here. It’s important to note that I had a mixed-ability group; some of the students had used Scratch before and were familiar with concepts such as IF…ELSE…FOREVER.

Click here to download the lesson plan : Scratch Lesson Plan





Things Network Admins do to seriously annoy developers when building their machines

16 03 2012

One of the cool things about my job is that every time we hire a bunch of new developers, my machine gets passed down to them and I get a new, more powerful one. However, this does mean placing my trust in a dubious bunch of people known as ‘Network Administrators’, who have the job of building my new PC (they can’t watch server logs all day). This is my list of infuriating things that network admins “forget” to do when setting up a new machine for a developer:

  • Don’t make the developer an Administrator.
  • Leave off random bits of crucial software, e.g. IIS.
  • Schedule anti-virus scans to run during working hours, and try to ensure that it runs long enough to clash with the next run.
  • Disable random critical Windows features.
  • Don’t install printers or other useful devices.
  • Don’t enable Remote Desktop.
  • Don’t activate software licenses.
  • Don’t perform a Windows update.

I’d be interested in hearing everyone else’s particular annoyances, and I’ll send the full list to our network team, assuming my email is set up.