
Rob Farley

- Owner/Principal with LobsterPot Solutions (an MS Gold Partner consulting firm), Microsoft MVP (SQL Server) and leader of the SQL User Group in Adelaide, Australia.

  • 24 hours to pass until 24 Hours of PASS

    There’s a bunch of stuff going on at the moment in the SQL world, so if you’ve missed this particular piece of news, let me tell you a bit about it.

    Twice a year, the SQL community puts on its biggest virtual event – 24 Hours of PASS. And the next one is tomorrow – March 21st, 2012. Twenty-four sessions, back-to-back, featuring some of the best presenters in the SQL world, coming from all over the globe in an online collaboration that so far has well over thirty thousand registrations across the presentations. Some people are signed up for all 24 sessions, some only one.

    Traditionally, LiveMeeting has been used as the platform for this event, but this year we’re going with a new platform – IBTalk. It promises big, and we’re hoping it won’t let us down. LiveMeeting has been great, and we thank Microsoft for providing it as a platform for the past few years. However, as the event has grown, we’ve found that a new approach is necessary. Last year we searched for a new platform, and IBTalk ticked the right boxes. The feedback from the presenters and moderators so far has been overwhelmingly positive, and we’re hoping this is going to really enhance the user experience.

    One of my favourite features of the platform is the language support. It provides a pretty good translation service. Users who join a session will see a flag on the left of the screen. If they click it, they can change the language to one of 15 on offer. Picking one changes all the labels on everything. It even translates the text in the Q&A window.

    What this means is that someone from Brazil can ask their question in Portuguese, and the presenter will see it in English. Then if the answer is typed in English, the questioner will be able to see the answer, also in Portuguese. Or they can switch to English to see it as the answerer typed it. I know there’s always the risk of bad translations going on, but I’ve heard good things about this translation service.

    But there’s more – IBTalk are providing staff to type up closed captioning live during the event. So if English isn’t your first language, don’t worry!

    Picking a language will also give you subtitles in it. I’m hoping that this event is the start of PASS being able to reach people from all corners of the world. Wouldn’t it be great to find that this event is successful, and that the next 24HOP (later in the year, our Summit Preview event) has just as many non-English speakers tuning in as English speakers?

    If you haven’t been planning which sessions you’re going to attend, you really should get over to sqlpass.org/24hours and have a look through what’s on offer. There’s some amazing material from some of the industry’s brightest, covering a wide range of topics, from classic SQL areas to the brand new SQL 2012 features. There really should be something for every SQL professional. Check the time zones though – if you’re in the US you might be on Summer time, and an hour closer to GMT than normal.

    Massive thanks must go to Microsoft, SQL Sentry and Idera for sponsoring this event. Without sponsors we wouldn’t be able to put any of this on. These companies are helping 24HOP continue to grow into an event for the whole world.

    See you tomorrow!

    @rob_farley | #24hop | #sqlpass

  • Be the surgeon

    It’s a phrase I use often, especially when teaching, and I wish I had realised the concept years earlier. (And of course, it fits with this month’s T-SQL Tuesday topic, hosted by Argenis Fernandez.)

    When I’m sick enough to go to the doctor, I see a GP. I used to typically see the same guy, but he’s moved on now. When the GP can roughly identify the area of the problem, I get referred to a specialist, sometimes a surgeon.

    Being a surgeon requires a refined set of skills. It’s why they often don’t like to be called “Doctor”, and prefer the traditional “Mister” (the history is that the doctor used to make the diagnosis, and then hand the patient over to the person who didn’t have a doctorate, but rather was an expert cutter, typically from a background in butchering). But if you ask the surgeon about the pain you have in your leg sometimes, you’ll get told to ask your GP. It’s not that your surgeon isn’t interested – they just don’t know the answer.

    IT is the same now.

    That wasn’t something that I really understood when I got out of university. I knew there was a lot to know about IT – I’d just done an honours degree in it. But I also knew that I’d done well in just about all my subjects, and felt like I had a handle on everything. I got into developing, and still felt that having a good level of understanding about every aspect of IT was a good thing.

    This got me through for the first six or seven years of my career.

    But then I started to realise that I couldn’t compete.

    I’d moved into management, and was spending my days running projects, rather than writing code. The kids were getting older. I’d had a bad back injury (ask anyone with chronic pain how it affects your ability to concentrate, retain information, etc.). But most of all, IT was getting larger.

    I knew kids without lives who knew more than I did. And I felt like I could easily identify people who were better than me in whatever area I could think of. Except writing queries (this was before I discovered technical communities, and people like Paul White and Dave Ballantyne). And so I figured I’d specialise.

    I wish I’d done it years earlier.

    Now, I can tell you plenty of people who are better than me at any area you can pick. But there are also people who might now include me in some of their lists too. If I’d stayed the GP, I’d be stuck in management, and finding that there were better managers than me too.

    If you’re reading this, SQL could well be your thing. But it might not be either. Your thing might not even be in IT. Find out, and then see if you can be a world-beater at it.

    But it gets even better, because you can find other people to complement the things that you’re not so good at.

    My company, LobsterPot Solutions, has six people in it at the moment. I’ve hand-picked those six people, along with the one who quit. The great thing about it is that I’ve been able to pick people who don’t necessarily specialise in the same way as me. I don’t write their T-SQL for them – generally they’re good enough at that themselves. But I’m on-hand if needed. Consider Roger Noble, for example. He’s doing stuff in HTML5 and jQuery that I could never dream of doing, creating an amazing HTML5 version of PivotViewer. Or Ashley Sewell, a guy who does project management far better than I do. I could go on. My team is brilliant, and I love them to bits. We’re all surgeons, and when we work together, I like to think we’re pretty good!

    @rob_farley

  • 24 Hours of PASS

    Hooray – we’re ready to announce the details of the 24 Hours of PASS event that’s coming to an internet connection near you on March 21st! Read on for the day’s schedule, some of my thoughts on the 24 Hours of PASS concept, information about the platform, and an announcement which I think is really quite a big deal and worth making a fuss over.

    24 Hours Straight

    So – March 21st. No overlap with any other days, just 24 Hours of PASS, squeezed back into a single day. The last few events have been split over two days, running from noon to midnight (GMT), which conveniently fits in daylight hours of the Eastern United States. To help with the current push for internationalisation (and there’s more on that later in this post), we’re starting at midnight GMT (which is a perfectly reasonable 7pm in New York), and running through to the following midnight GMT (which is the next 7pm in New York, strangely enough). I’d like to be able to give prizes to people who attend all twenty-four, but I’m not quite that keen. Tell you what though – I’ll cheer on Twitter and my blog for anyone who tells me they’ve made it through all of them. I know people have done it before, and hopefully plenty will again.

    VC involvement

    The Virtual Chapter portfolio isn’t mine. It’s Denise McInerney’s. She’s doing a brilliant job, and we both see a strong correlation between 24HOP and the VCs. There are likely to be lots of people who tune in to 24HOP who haven’t really noticed that there are virtual chapters that meet, providing excellent online meetings, several times every month! We’ve roped in many of the VC leaders to help choose the sessions, host the sessions, and more. In fact, I suspect that many of the sessions that were submitted but not chosen could well be getting selected for the Virtual Chapters in the weeks and months to come. The Virtual Chapters are one of the best things about PASS, and it’s great that 24HOP can be showcasing them this time around. Most of the VCs are represented, including Performance, Data Architecture, PowerShell, Professional Development, and the larger ones like BI and AppDev. Even the newly forming BigData VC. The Oracle VC isn’t. Sorry, Scott.

    Platform

    LiveMeeting is such a great product. If you haven’t tried it out, you really should. It’s been our platform of choice here at 24HOP for ages now. Unfortunately, 24HOP seems to have outgrown it. People who have tuned in to recent events will have noticed that the video hasn’t always streamed as nicely as we’d like, and although we really do love LiveMeeting, we’re going to try a different platform – called IBTalk. This thing looks really impressive, and has some really cool features too. Most importantly, it should scale well. And record nicely. And allow eval forms nicely. And make coffee* (*feature list may not be completely accurate).

    Internationalisation

    I know you’ve already read about how the event is going to be 24 hours straight, ignoring the fact that numbers may well be lower while the US sleeps. But that’s not the only way in which 24HOP is becoming more international...

    During the event, the IBTalk people are going to provide Live Closed Captioning. So you can get subtitles during the event!

    But there’s more...

    This captioning will be available in fifteen different languages!

    English, French, Spanish, Portuguese, Italian, Dutch, German, Russian, Turkish, Arabic, Traditional Chinese, Simplified Chinese, Japanese, Korean, and Thai. You want subtitles in Thai? No problem. Watching with a Turkish friend? Sure, no worries. Even if you’re just more comfortable in Dutch than English...

    PASS is serious about trying to reach people around the world. With these languages, I think most people are covered. There are a few gaps, such as not having Swedish or Bangla – but apparently just about everyone in IT in those areas speaks English anyway (so I’m told).

    The platform also has an amazing feature that translates questions, so if you're more comfortable asking a question in Russian or Portuguese, you can do that. The presenter will see the question in their own language, and the attendee will read the answer in theirs.

    The Schedule!

    As well as what you can see below, the schedule is live over at http://www.sqlpass.org/24hours/spring2012/SessionsbySchedule.aspx and http://www.sqlpass.org/24hours/spring2012/SessionsbyTrack.aspx. Please head over to the PASS site to register for the sessions you’re interested in. But in the meantime, check out this list! I’m sure you’ll recognise plenty of the names, but probably not all. You’ll also notice there are plenty of speakers from different parts of the world, including the UK (like Mark and Neil), Australia (like Julie), and Israel (like Ami).

    Time (GMT) | Title | Speaker | Track / VC
    00:00 - 01:00 | SSIS Tips & Tricks | Josef Richberg | AppDev
    01:00 - 02:00 | Persistence In The Cloud: How to use Azure Storage | David Giard | Azure
    02:00 - 03:00 | Tier-1 BI in the Age of Bees and Elephants | Denny Lee | BigData / BI
    03:00 - 04:00 | Performance Tuning for Pirates! | John Sterrett | Perf
    04:00 - 05:00 | Integrating DQS, MDS and Your Data Warehouse | Christopher Price | BI
    05:00 - 06:00 | High Volume Data Processing Techniques Without Driving Your DBA Crazy! | Julie Koesmarno | Perf
    06:00 - 07:00 | Upgrading and Overhauling Your SSIS Packages for 2012 | Jessica Moss | BI
    07:00 - 08:00 | A Window into Your Data: Using Window Functions in T-SQL | Steve Hughes | AppDev
    08:00 - 09:00 | VertiPaq under the hood | Marco Russo | BI
    09:00 - 10:00 | STOP! Consolidate and Listen! | Jorge Segarra | Virtualization
    10:00 - 11:00 | Where are my (primary) keys? | Ami Levin | Data Architecture
    11:00 - 12:00 | SQL Server 2012 Memory Management | Neil Hambly | DBA
    12:00 - 13:00 | Enterprise Data Mining with SQL Server | Mark Tabladillo | BI
    13:00 - 14:00 | Automate Policy-Based Management using PowerShell | Allen White | PowerShell
    14:00 - 15:00 | SQL Server First Responder Kit | Kendra Little | DBA
    15:00 - 16:00 | Fitting Microsoft Hadoop into your Enterprise BI Strategy | Cindy Gross | BigData / BI
    16:00 - 17:00 | Implementing SQL Server 2012 on Windows Server Core | Edwin Sarmiento | PowerShell
    17:00 - 18:00 | A Deep Dive in SQL Server 2012 Data Warehousing | Dejan Sarka | BI
    18:00 - 19:00 | Panel: I Was Young and Didn't Know Any Better | Karen Lopez | ProfDev
    19:00 - 20:00 | Introducing SQL Server Data Tools (SSDT) | Nabeel Derhem | AppDev
    20:00 - 21:00 | Moves Like Jagger - Upgrading to SQL Server 2012 | Mark Broadbent | DBA
    21:00 - 22:00 | What to Look For in SQL Server 2012 Execution Plans | Grant Fritchey | AppDev
    22:00 - 23:00 | SQL 2012 - HA and DR Lots of New Options | Joseph D’Antoni | DBA
    23:00 - 00:00 | Improving the Performance of your Data Warehouse Queries with Columnstore Indexes | Benjamin Nevarez | Perf

    It’s going to be a massive event. Get registering soon!

    [Edit: Most of this post also appears at: http://www.sqlpass.org/Community/PASSBlog/entryid/410/24-Hours-of-PASS.aspx]

  • BigData featuring in 24 Hours of PASS

    This is not the big announcement that I’ve promised about 24 Hours of PASS (which starts five weeks from now!) – that’s coming later this week (and for those of you who haven’t picked up on it – 24HOP is the portfolio I’ve been given since becoming an elected director of PASS).

    The announcement that is coming later in the week will contain the full schedule (which is being finalised as we speak), and will also include another piece of news that I’m very excited to be able to announce. (Although by now I’m curious about whether this will end up being an anti-climax for those of you expecting me to announce that we’re going to be broadcasting from the moon or something)

    But in keeping with this month’s T-SQL Tuesday (which feels like one of the only things that gets me taking time out to write – hard to think that it’s two years since I hosted), I thought I’d mention what’s happening on the Big Data side with 24 Hours of PASS.

    If you haven’t been paying attention, Big Data is the Big Topic du jour. Microsoft is running with the Apache Hadoop platform – which is hugely significant when you consider the power of open source here. I’m not going to suggest that you’re missing out if you’re not already jumping in with both feet – but I think this area is set to grow significantly.

    Anyway – I’m going to leak the details for two of the 24 Hours of PASS sessions. You’ll be able to register for these in a few days’ time.

    At 02:00 - 03:00 GMT (what is this in your time zone?), we have:

    Denny Lee: Tier-1 BI in the Age of Bees and Elephants

    In this age of Big Data, data volumes become exceedingly larger while the technical problems and business scenarios become more complex. This session provides concrete examples of how these can be solved. Highlighted will be the use of Big Data technologies including Hadoop (elephants) and Hive (bees) with Analysis Services. Customer examples including Klout and Yahoo! (with their 24TB cube) will highlight both the complexities and solutions to these problems.

    And at 15:00 - 16:00 GMT (what is this in your time zone?), we have:

    Cindy Gross: Fitting Microsoft Hadoop into your Enterprise BI Strategy

    What is this Big Data thing and why should you care? Learn about Microsoft's Hadoop connectors, our Hive add-in for Excel, what Pig and Hive are, and more. Big Data is an exciting foray into the world of data previously too big to load and query in an affordable manner. With the new Big Data tools you can expand your BI reach and be the hero who helps your company make better business decisions.

    Both these sessions are going to be great. If you’ve heard either Denny or Cindy present before, you know that they’re both technically excellent and fantastic presenters too. What’s more, with these sessions being roughly 12 hours apart, hopefully you can get to at least one, no matter where in the world you are. (In fact, Denny’s session is in the timeslot that the Adelaide SQL Server User Group uses – so we’ll be showing this on the big screen at our regular event).

  • APPLY – not exactly set-based

    In my last post, I showed a technique for dealing with working columns when writing T-SQL. The idea was around using APPLY to be able to push values from the existing set through calculations (but preferably not scalar functions, of course), producing new columns which can be used further down the query, even in the WHERE and GROUP BY clauses. Useful stuff indeed.

    But there is a limitation which I didn’t cover, and I feel that this is worth writing about for this month’s T-SQL Tuesday. The theme allows people to write about previous Tuesday topics – so I’m going to revisit topics 17 and 25 as I go a little further into APPLY.

    These working columns can only be applied if they are calculations on a single row of data. The resulting set might be bigger or smaller based on how many rows are affected, but the input (at least logically) should be considered on a row-by-row basis.

    Consider the following query:

    SELECT p.Name, r.RevName
    FROM
    Production.ProductSubcategory AS s
    CROSS APPLY
    (SELECT REVERSE(s.Name) AS RevName) AS r
    CROSS APPLY
    (
        SELECT TOP (1) *
        FROM Production.Product AS p
        WHERE p.ProductSubcategoryID = s.ProductSubcategoryID
        ORDER BY p.ListPrice DESC
    ) AS p
    ORDER BY s.ProductSubcategoryID;

    You see two CROSS APPLYs here – one creating a simple working column in REVERSE(s.Name), the other even involving another table to do a lookup. This second CROSS APPLY doesn’t necessarily produce any rows – if there is no matching Product, the resultset won’t contain any rows for that ProductSubcategory. I’ve shown some of its results below:

    [screenshot: sample results]

    But what about the following query, which gives the same 37 rows on the AdventureWorks database:

    SELECT
    (
        SELECT TOP (1) p.Name
        FROM Production.Product AS p
        WHERE p.ProductSubcategoryID = s.ProductSubcategoryID
        ORDER BY p.ListPrice DESC
    ),
    REVERSE(s.Name) AS RevName
    FROM Production.ProductSubcategory AS s
    ORDER BY s.ProductSubcategoryID;

    This is very similar, but you’ll notice that instead of developing the result set in the FROM clause, I’ve used the calculations directly in the SELECT clause to produce the same result.

    Except that it’s not the same.

    Here, my TOP sub-query can only produce a single value. You’ll notice I change my query to fetch only a single column now – any more would give an error. And I’m lucky I’ve selected TOP (1), not TOP (2) or TOP (1) WITH TIES – both of which would be legal in my APPLY version, but not here. You’ll see that APPLY provides additional flexibility.
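
    To see that extra flexibility, consider a minimal sketch against the same AdventureWorks tables – TOP (2) and multiple columns are both fine inside APPLY, though both would be illegal in the scalar sub-query version:

    -- Each subcategory's two most expensive products: two rows and two
    -- columns per outer row, something a scalar sub-query could never return.
    SELECT s.Name, p2.ProductName, p2.ListPrice
    FROM Production.ProductSubcategory AS s
    CROSS APPLY
    (
        SELECT TOP (2) p.Name AS ProductName, p.ListPrice
        FROM Production.Product AS p
        WHERE p.ProductSubcategoryID = s.ProductSubcategoryID
        ORDER BY p.ListPrice DESC
    ) AS p2
    ORDER BY s.ProductSubcategoryID;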

    So now consider the following query:

    SELECT
        p.Name,
        p.ProductSubcategoryID,
        COUNT(*) OVER (PARTITION BY p.ProductSubcategoryID) AS SubcatCnt
    FROM Production.Product AS p
    ORDER BY p.ProductSubcategoryID, p.Name;

    Both this query and the APPLY-based version I’ll show shortly look at the list of Products, and produce a count of the number of products which are in the same Subcategory. I’m sure you’re all very familiar with the OVER clause and the use of the windowing technique provided by the PARTITION BY clause.

    If we run this first query and look at some of the results, you’ll see the marvellous windowing technique, which I’ve highlighted using ZoomIt. You’ll see that the SubcatCnt column has the value 3 for the Subcategory with 3 items, and 6 for the one with 6, and so on.

    Excellent stuff. But you’ll probably also be aware that you can’t use windowing functions like this in the WHERE clause – they’re applied only in the SELECT clause.

    Having seen my working columns trick though, you might think otherwise.

    Check this out – it’s perfectly legal!

    SELECT
        p.Name,
        p.ProductSubcategoryID,
        sc.SubcatCnt
    FROM Production.Product AS p
    CROSS APPLY
    (
        SELECT COUNT(*) OVER (PARTITION BY p.ProductSubcategoryID) AS SubcatCnt
    ) AS sc
    ORDER BY p.ProductSubcategoryID, p.Name;

    Amazing stuff. I can now use SubcatCnt in the WHERE clause. Job done!

    But I’m having you on. It’s a trick. It’s rubbish.

    You see, APPLY logically works on a single row at a time, as I said earlier. Look at the same block of results for this second query.

    Loads of 1s. Not exactly helpful. It’s accurate though – there is a count of one row in each partition of the set that is logically passed in each time.

    APPLY works out the result for each row individually. It doesn’t do it for the whole set. So if you’re wanting working columns that need to be applied to multiple rows at once, then you need to think of another solution.
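
    For the subcategory count, here’s a minimal sketch of one such solution: compute the windowed count in a derived table, and filter on it in the outer query (the threshold is arbitrary).

    -- The window function runs over the whole set inside the derived table,
    -- so the outer WHERE clause sees real counts, not loads of 1s.
    SELECT x.Name, x.ProductSubcategoryID, x.SubcatCnt
    FROM
    (
        SELECT p.Name, p.ProductSubcategoryID,
            COUNT(*) OVER (PARTITION BY p.ProductSubcategoryID) AS SubcatCnt
        FROM Production.Product AS p
    ) AS x
    WHERE x.SubcatCnt >= 3
    ORDER BY x.ProductSubcategoryID, x.Name;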

    APPLY is good, but not that good. Know its limitations and you’ll be better off.

  • A T-SQL Tip: Working calculations

    T-SQL Tuesday again and this month is on T-SQL Tips (thanks Allen!). In some ways it’s a tough topic, because there are things I don’t really consider tips that other people do, and vice-versa. This one’s legitimate though: Using CROSS APPLY for working columns in calculations.

    Let me give you an example.

    Back in 2009, I wrote a blog post on Julian (YYDDD) dates. http://msmvps.com/blogs/robfarley/archive/2009/03/25/converting-to-and-from-julian-format-in-t-sql.aspx.

    Someone asked me recently about using this for time periods, for example, someone’s age when they play a football match. For example, Cesc Fàbregas holds the record as the youngest Arsenal player, aged 16 years and 177 days. Given someone’s birthdate and date of interest, this seems like it should be quite easy to work out, but can get frustratingly tricky.

    The idea is quite straightforward – you just count the number of days from their most recent birthday. So let’s have a think.

    Counting the number of years between the startdate and the enddate is a nice place to start. I could count the number of days, but that might give me rounding errors thanks to leap years. Counting the years simply compares the year components (1987 and 2003, say), but the player might not have had his birthday yet in the final year. In that case, we’d want to go with the year before. Anyway – once we’ve figured out how old the player is, we can figure out when their last birthday was and count the days since then. Easy.

    But so easy to make a mistake somewhere.

    And this is where CROSS APPLY can come into its own, by allowing us to use working columns.

    Let’s start with a few players. Some young ones, and Thierry Henry, just because. I’m using the VALUES method, but you could have a Players table just as easily.

    select *
    from
    (
        values
         ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date))
        ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date))
        ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date))
        ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date))
    ) p (Name, DoB, Debut)

    [screenshot: the four players with Name, DoB and Debut]

    First, I want to make it clear which column is my startdate and which is my enddate. It’s too easy to be thinking “startdate” and pick up “Debut” here, because this sounds very much like the same thing. There’s the startdate of their playing career, and the startdate of the calculation. To avoid confusion, I’m going to do some simple column-renaming. This gives me more reusable code, and APPLY even means I never have to worry about whether I already have these column names referring to something else, because they’re going to get their own table alias too.

    select *
    from
    (
        values
         ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date))
        ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date))
        ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date))
        ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date))
    ) p (Name, DoB, Debut)
    CROSS APPLY
       (select p.DoB as startdate, p.Debut as enddate) as working

    [screenshot: the same rows, now with startdate and enddate columns]

    Might seem like a bit of a waste to you, but it means so much to me. Really.

    Now I want to count how many years there are between my startdate and enddate, and work out when the startdate is this year. I could do this in one step, sure, but I want to be able to check my working really thoroughly.

    select *
    from
    (
        values
         ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date))
        ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date))
        ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date))
        ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date))
    ) p (Name, DoB, Debut)
    CROSS APPLY
    (select p.DoB as startdate, p.Debut as enddate) as working
    CROSS APPLY
    (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd
    CROSS APPLY
    (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty

    [screenshot: YearsDifferent and StartDateThisYear columns added]

    Now I can easily test to see if I need to subtract a year or not.

    select *
    from
    (
        values
         ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date))
        ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date))
        ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date))
        ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date))
    ) p (Name, DoB, Debut)
    CROSS APPLY
    (select p.DoB as startdate, p.Debut as enddate) as working
    CROSS APPLY
    (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd
    CROSS APPLY
    (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty
    CROSS APPLY
    (select case when sdty.StartDateThisYear > working.enddate then -1 else 0 end as YearOffset) yo
    CROSS APPLY
    (select dateadd(year,yd.YearsDifferent + yo.YearOffset,working.startdate) as AdjustedYear) ay

    [screenshot: YearOffset and AdjustedYear columns added]

    I’m sure you can see where this is going. I now have their latest birthday before the date I’m looking for, and I can easily turn this into a Julian Date format.

    select p.*, jd.JulianDiff
    from
    (
        values
         ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date))
        ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date))
        ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date))
        ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date))
    ) p (Name, DoB, Debut)
    CROSS APPLY
       (select p.DoB as startdate, p.Debut as enddate) as working
    CROSS APPLY
       (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd
    CROSS APPLY
       (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty
    CROSS APPLY
       (select case when sdty.StartDateThisYear > working.enddate then -1 else 0 end as YearOffset) yo
    CROSS APPLY
       (select dateadd(year,yd.YearsDifferent + yo.YearOffset,working.startdate) as AdjustedYear) ay
    CROSS APPLY
       (select (yd.YearsDifferent + yo.YearOffset) * 1000 as YearComponent) yc
    CROSS APPLY
       (select datediff(day,ay.AdjustedYear,working.enddate) as DayComponent) dc
    CROSS APPLY
       (select yc.YearComponent + dc.DayComponent as JulianDiff) as jd
    ;

    [screenshot: final results including JulianDiff]

    The great thing about this is that I can be very confident of my working, being able to check each step of the calculation along the way. I can even turn it into a simple TVF, whilst keeping the calculations just as verifiable:

    create function dbo.YYDDD(@startdate date, @enddate date) returns table
    as return
    (
    select jd.JulianDiff
    FROM
    (values (@startdate, @enddate)) working (startdate, enddate)
    CROSS APPLY
       (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd
    CROSS APPLY
       (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty
    CROSS APPLY
       (select case when sdty.StartDateThisYear > working.enddate then -1 else 0 end as YearOffset) yo
    CROSS APPLY
       (select dateadd(year,yd.YearsDifferent + yo.YearOffset,working.startdate) as AdjustedYear) ay
    CROSS APPLY
       (select (yd.YearsDifferent + yo.YearOffset) * 1000 as YearComponent) yc
    CROSS APPLY
       (select datediff(day,ay.AdjustedYear,working.enddate) as DayComponent) dc
    CROSS APPLY
       (select yc.YearComponent + dc.DayComponent as JulianDiff) as jd
    )
    ;

    select p.*, jd.JulianDiff
    from
    (
        values
         ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date))
        ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date))
        ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date))
        ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date))
    ) p (Name, DoB, Debut)
    CROSS APPLY dbo.YYDDD(DoB, Debut) jd;

    I know it’s really simple, but it’s an amazingly useful tip for making sure that you do things right.

    Oh, and as for the plan – SQL just rolls it all up into a single Compute Scalar as if we’d written it out the long way, happily showing us that doing it step-by-step ourselves is no worse at all.

    [screenshot: plan showing a single Compute Scalar]

    @rob_farley

  • Stored Procedures with SSRS? Hmm… not so much

    Little Bobby Tables’ mother says you should always sanitise your data input. Except that I think she’s wrong. The SQL Injection aspect is for another post, where I’ll show you why I think SQL Injection is the same kind of attack as many other attacks, such as the old buffer overflow. Here, though, I want to have a bit of a whinge about the way that some people sanitise data input, and even a whinge about people who insist on using stored procedures for SSRS reports.

    Let me say that again, in case you missed it the first time:

    I want to have a whinge about people who insist on using stored procedures for SSRS reports.

    Let’s look at the data input sanitisation aspect – except that I’m going to call it ‘parameter validation’. I’m talking about code that looks like this:

    create procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as
    begin
        /* First check that @eomdate is a valid date */
        if isdate(@eomdate) != 1
        begin
            select 'Please enter a valid date' as ErrorMessage;
            return;
        end

        /* Then check that time has passed since @eomdate */
        if datediff(day,@eomdate,sysdatetime()) < 5
        begin
            select 'Sorry - EOM is not complete yet' as ErrorMessage;
            return;
        end
       
        /* If those checks have succeeded, return the data */
        select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales
        from Sales.SalesOrderHeader
        where OrderDate >= dateadd(month,-1,@eomdate)
            and OrderDate < @eomdate
        group by SalesPersonID
        order by SalesPersonID;
    end

    Notice that the code checks that a date has been entered. Seriously??!! This must only be to check for NULL values being passed in, because anything else would have to be a valid datetime to avoid an error.
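
    A quick sketch to convince yourself (the literal 'banana' is just an arbitrary non-date):

    /* The conversion to datetime happens before the procedure body ever runs,
       so ISDATE never gets to see an invalid date - only NULL slips through. */
    exec dbo.GetMonthSummaryPerSalesPerson 'banana'; -- fails with a conversion error
    exec dbo.GetMonthSummaryPerSalesPerson NULL;     -- the only call the ISDATE check catches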

    The other check is maybe fair enough, but I still don’t like it.

    The two problems I have with this stored procedure are the result sets and the small fact that the stored procedure even exists in the first place. But let’s consider the first one of these problems for starters. I’ll get to the second one in a moment.

    If you read Jes Borland (@grrl_geek)’s recent post about returning multiple result sets in Reporting Services, you’ll be aware that Reporting Services doesn’t support multiple result sets from a single query. And when it says ‘single query’, it includes ‘stored procedure call’. It’ll only handle the first result set that comes back. But that’s okay – we have RETURN statements, so our stored procedure will only ever return a single result set. Sometimes that result set might contain a single field called ErrorMessage, but it’s still only one result set.

    Except that it’s not okay, because Reporting Services needs to know what fields to expect. Your report needs to hook into your fields, so SSRS needs to have a way to get that information. For stored procs, it uses an option called FMTONLY.

    When Reporting Services tries to figure out what fields are going to be returned by a query (or stored procedure call), it doesn’t want to have to run the whole thing. That could take ages. (Maybe it’s seen some of the stored procedures I’ve had to deal with over the years!)

    So it turns on FMTONLY before it makes the call (and turns it off again afterwards). FMTONLY is designed to be able to figure out the shape of the output, without actually running the contents. It’s very useful, you might think.

    set fmtonly on
    exec dbo.GetMonthSummaryPerSalesPerson '20030401';
    set fmtonly off

    Without the FMTONLY lines, this stored procedure returns a result set that has three columns and fourteen rows. But with FMTONLY turned on, those rows don’t come back.

    But what I do get back hurts Reporting Services.

    [screenshot: the three result set shapes returned under FMTONLY]

    It doesn’t run the stored procedure at all. It just looks for anything that could be returned and pushes out a result set in that shape. Despite the fact that I’ve made sure that the logic will only ever return a single result set, the FMTONLY option kills me by returning three of them.

    It would have been much better to push these checks down into the query itself.

    alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as
    begin
        select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales
        from Sales.SalesOrderHeader
        where
        /* Make sure that @eomdate is valid */
            isdate(@eomdate) = 1
        /* And that it's sufficiently past */
        and datediff(day,@eomdate,sysdatetime()) >= 5
        /* And now use it in the filter as appropriate */
        and OrderDate >= dateadd(month,-1,@eomdate)
        and OrderDate < @eomdate
        group by SalesPersonID
        order by SalesPersonID;
    end

    Now if we run it with FMTONLY turned on, we get the single result set back. But let’s consider the execution plan when we pass in a date that fails the checks.

    First let’s look at one that returns data. I’ve got a semi-useful index in place on OrderDate, which includes the SalesPersonID and TotalDue fields. It does the job, despite a hefty Sort operation.

    [screenshot: plan with an Index Seek feeding a Sort]

    …compared to one that uses a future date:

    [screenshot: plan for the future date – same operators, thinner arrows]

    You might notice that the estimated costs are similar – the Index Seek is still 28%, the Sort is still 71%. But the size of that arrow coming out of the Index Seek is a whole bunch smaller.

    The coolest thing here is what’s going on with that Index Seek. Let’s look at some of the properties of it.

    [screenshot: Index Seek properties]

    Glance down it with me… Estimated CPU cost of 0.0005728, 387 estimated rows, estimated subtree cost of 0.0044385, ForceSeek false, Number of Executions 0.

    That’s right – it doesn’t run. So much for reading plans right-to-left...

    The key is the Filter on the left of it. It has a Startup Expression Predicate in it, which means that it doesn’t call anything further down the plan (to the right) if the predicate evaluates to false.

    [screenshot: Filter properties showing the Startup Expression Predicate]

    Using this method, we can make sure that our stored procedure contains a single query, and therefore avoid any problems with multiple result sets. If we wanted, we could always use UNION ALL to make sure that we can return an appropriate error message.

    alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as
    begin
        select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, /*Placeholder: */ '' as ErrorMessage
        from Sales.SalesOrderHeader
        where
        /* Make sure that @eomdate is valid */
            isdate(@eomdate) = 1
        /* And that it's sufficiently past */
        and datediff(day,@eomdate,sysdatetime()) >= 5
        /* And now use it in the filter as appropriate */
        and OrderDate >= dateadd(month,-1,@eomdate)
        and OrderDate < @eomdate
        group by SalesPersonID

        /* Now include the error messages */
        union all
        select 0, 0, 0, 'Please enter a valid date' as ErrorMessage
        where isdate(@eomdate) != 1
        union all
        select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage
        where datediff(day,@eomdate,sysdatetime()) < 5

        order by SalesPersonID;
    end

    But still I don’t like it, because it’s now a stored procedure with a single query. And I don’t like stored procedures that should be functions.

    That’s right – I think this should be a function, and SSRS should call the function. And I apologise to those of you who are now planning a bonfire for me. Guy Fawkes’ night has already passed this year, so I think you miss out. (And I’m not going to remind you about when the PASS Summit is in 2012.)

    create function dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime)
    returns table as
    return (
        select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, '' as ErrorMessage
        from Sales.SalesOrderHeader
        where
        /* Make sure that @eomdate is valid */
            isdate(@eomdate) = 1
        /* And that it's sufficiently past */
        and datediff(day,@eomdate,sysdatetime()) >= 5
        /* And now use it in the filter as appropriate */
        and OrderDate >= dateadd(month,-1,@eomdate)
        and OrderDate < @eomdate
        group by SalesPersonID
        union all
        select 0, 0, 0, 'Please enter a valid date' as ErrorMessage
        where isdate(@eomdate) != 1
        union all
        select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage
        where datediff(day,@eomdate,sysdatetime()) < 5
    );

    We’ve had to lose the ORDER BY – but that’s fine, as that’s a client thing anyway. Our reports can still leverage this stored query, but we’re recognising that it’s a query, not a procedure. A procedure is designed to DO stuff, not just return data. We even get entries in sys.columns that confirm the shape of the result set, which makes sense, because a table-valued function is the right mechanism to return data.
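
    If you want to see those sys.columns entries for yourself, here’s a minimal sketch (using the function name from above):

    select c.column_id, c.name, t.name as type_name
    from sys.columns as c
    join sys.types as t on t.user_type_id = c.user_type_id
    where c.object_id = object_id('dbo.GetMonthSummaryPerSalesPerson')
    order by c.column_id;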

    And we get so much more flexibility with this.

    If you haven’t seen the simplification stuff that I’ve preached on before, jump over to http://bit.ly/SimpleRob and watch the video of when I broke a microphone and nearly fell off the stage in Wales. You’ll see the impact of being able to have a simplifiable query. You can also read the procedural functions post I wrote recently, if you didn’t follow the link from a few paragraphs ago.

    So if we want the list of SalesPeople that made any kind of sales in a given month, we can do something like:

    select SalesPersonID
    from dbo.GetMonthSummaryPerSalesPerson(@eomonth)
    order by SalesPersonID;

    This doesn’t need to look up the TotalDue field, which makes a simpler plan.

    select *
    from dbo.GetMonthSummaryPerSalesPerson(@eomonth)
    where SalesPersonID is not null
    order by SalesPersonID;

    This one can avoid having to do the work on the rows that don’t have a SalesPersonID value, pushing the predicate into the Index Seek rather than filtering the results that come back to the report.

    If we had joins involved, we might see some of those being simplified out. We also get the ability to include query hints in individual reports. We shift from having a single-use stored procedure to having a reusable stored query – and isn’t that one of the main points of modularisation?
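
    As a sketch of that last point – a hint lives in the individual report’s query, not in the shared definition (RECOMPILE is just an illustrative choice here):

    select SalesPersonID, NumSales
    from dbo.GetMonthSummaryPerSalesPerson(@eomonth)
    order by SalesPersonID
    option (recompile); -- this report wants a fresh plan; other reports are unaffected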

    Stored procedures in Reporting Services are just a bit limited for my liking. They’re useful in plenty of ways, but if you insist on using stored procedures all the time rather than queries that use functions – that’s rubbish.

    @rob_farley

  • SQLRally Nordic gets underway

    PASS is becoming more international, which is great.

    The SQL Community has always been international – it’s not as if data is only generated in North America. And while it’s easy for organisations to have a North American focus, PASS is taking steps to become international. Regular readers will be aware that I’m one of three advisors to the PASS Board of Directors, with a focus on developing PASS as a more global organisation.

    With this in mind, it’s great that today is Day 1 of SQLRally Nordic, being hosted in Sweden – not only a non-American country, but one that doesn’t have English as its major language.

    The event is being put on by the amazing Johan Åhlén and Raoul Illyés, two guys I met earlier this year, but the thing that amazes me is the incredible support that this event has from the SQL community.

    It’s been sold out for a long time, and when you see the list of speakers, it’s not surprising.

    Some of the industry’s biggest names from Microsoft have turned up, including Mark Souza (who is also a PASS Director), Thomas Kejser and Tobias Thernström. Business Intelligence experts such as Jen Stirrup, Chris Webb, Peter Myers, Marco Russo and Alberto Ferrari are there, as are some of the most awarded SQL MVPs, such as Itzik Ben-Gan, Aaron Bertrand and Kevin Kline. The sponsor list is also brilliant, with names such as HP, FusionIO, SQL Sentry, Quest and SolidQ complemented by Swedish companies like Cornerstone, Informator, B3IT and Addskills.

    As someone who is interested in PASS becoming global, I’m really excited to see this event happening, and I hope it’s a launch-pad into many other international events hosted by the SQL community.

    If you have the opportunity, thank Johan and Raoul for putting this event on, and the speakers and sponsors for helping support it. The noise from Twitter is that everything is going fantastically well, and everyone involved should be thoroughly congratulated!

    @rob_farley

  • When is a SQL function not a function?

    Should SQL Server even have functions? (Oh yeah – this is a T-SQL Tuesday post, hosted this month by Brad Schulz.)

    Functions play an important part in programming, in almost any language. A function is a piece of code that is designed to return something, as opposed to a piece of code which isn’t designed to return anything (which is known as a procedure). SQL Server is no different. You can call stored procedures, even from within other stored procedures, and you can call functions and use them in other queries. Stored procedures might query something, and therefore ‘return data’, but a function in SQL is considered to have the type of the thing returned, and can be used accordingly in queries. Consider the internal GETDATE() function.

    SELECT GETDATE(), SomeDatetimeColumn
    FROM dbo.SomeTable;

    There’s no logical difference between the value being returned by the function and the value coming from the table column. Both are datetime values – if you didn’t have inside knowledge, you wouldn’t necessarily be able to tell which was which. And so as developers, we find ourselves wanting to create functions that return all kinds of things – functions which look up values based on codes, functions which do string manipulation, and so on.

    But it’s rubbish.

    Ok, it’s not all rubbish, but it mostly is. And this isn’t even considering the SARGability impact. It’s far more significant than that. (When I say the SARGability aspect, I mean “because you’re unlikely to have an index on the result of some function that’s applied to a column, so try to invert the function and query the column in an unchanged manner”)
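
    As a minimal sketch of that inversion (assuming an index on OrderDate):

    -- Not SARGable: the column is wrapped in a function, so the index can't seek.
    SELECT SalesOrderID FROM Sales.SalesOrderHeader WHERE YEAR(OrderDate) = 2003;

    -- SARGable: invert the function, leave the column alone, and the seek is back.
    SELECT SalesOrderID FROM Sales.SalesOrderHeader
    WHERE OrderDate >= '20030101' AND OrderDate < '20040101';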

    I’m going to consider the three main types of user-defined functions in SQL Server:

    • Scalar
    • Inline Table-Valued
    • Multi-statement Table-Valued

    I could also look at user-defined CLR functions, including aggregate functions, but not today. I figure that most people don’t tend to get around to doing CLR functions, and I’m going to focus on the T-SQL-based user-defined functions.

    Most people split these types of function up into two types. So do I. Except that most people pick them based on ‘scalar or table-valued’. I’d rather go with ‘inline or not’.

    If it’s not inline, it’s rubbish. It really is.

    Let’s start by considering the two kinds of table-valued function, and compare them.

    These functions are going to return the sales for a particular salesperson in a particular year, from the AdventureWorks database.

    CREATE FUNCTION dbo.FetchSales_inline(@salespersonid int, @orderyear int)
    RETURNS TABLE AS 
    RETURN (
        SELECT e.LoginID as EmployeeLogin, o.OrderDate, o.SalesOrderID
        FROM Sales.SalesOrderHeader AS o
        LEFT JOIN HumanResources.Employee AS e
        ON e.EmployeeID = o.SalesPersonID
        WHERE o.SalesPersonID = @salespersonid
        AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101')
        AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101')
    )
    ;
    GO

    CREATE FUNCTION dbo.FetchSales_multi(@salespersonid int, @orderyear int)
    RETURNS @results TABLE (
        EmployeeLogin nvarchar(512),
        OrderDate datetime,
        SalesOrderID int
        )
    AS
    BEGIN
        INSERT @results (EmployeeLogin, OrderDate, SalesOrderID)
        SELECT e.LoginID, o.OrderDate, o.SalesOrderID
        FROM Sales.SalesOrderHeader AS o
        LEFT JOIN HumanResources.Employee AS e
        ON e.EmployeeID = o.SalesPersonID
        WHERE o.SalesPersonID = @salespersonid
        AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101')
        AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101')
        ;
        RETURN
    END
    ;
    GO

    You’ll notice that I’m being nice and responsible with the use of the DATEADD function, so that I have SARGability on the OrderDate filter.

    Regular readers will be hoping I’ll show what’s going on in the execution plans here. Here I’ve run two SELECT * queries with the “Show Actual Execution Plan” option turned on.
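
    Something like this, say – the particular salesperson and year don’t matter, 281 and 2003 are just concrete values for the sketch:

    SELECT * FROM dbo.FetchSales_inline(281, 2003);
    SELECT * FROM dbo.FetchSales_multi(281, 2003);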

    [screenshot: actual plans for the two queries]

    Notice that the ‘Query cost’ of the multi-statement version is just 2% of the ‘Batch cost’. But also notice there’s trickery going on. And it’s nothing to do with that extra index that I have on the OrderDate column.

    Trickery.

    Look at it – clearly, the first plan is showing us what’s going on inside the function, but the second one isn’t. The second one is blindly running the function, and then scanning the results. There’s a Sequence operator which is calling the TVF operator, and then calling a Table Scan to get the results of that function for the SELECT operator. But surely it still has to do all the work that the first one is doing...

    To see what’s actually going on, let’s look at the Estimated plan.

    [screenshot: estimated plans, including the TVF’s inner plan]

    Now, we see the same plans (almost) that we saw in the Actuals, but we have an extra one – the one that was used for the TVF. Here’s where we see the inner workings of it. You’ll probably recognise the right-hand side of the TVF’s plan as looking very similar to the first plan – but it’s now being called by a stack of other operators, including an INSERT statement to be able to populate the table variable that the multi-statement TVF requires.

    And the cost of the TVF is 57% of the batch!

    But it gets worse.

    Let’s consider what happens if we don’t need all the columns. We’ll leave out the EmployeeLogin column.

    [screenshot: the two plans with EmployeeLogin omitted]

    Here, we see that the inline function call has been simplified down. It doesn’t need the Employee table. The join is redundant and has been eliminated from the plan, making it even cheaper. But the multi-statement plan runs the whole thing as before, only removing the extra column when the Table Scan is performed.

    A multi-statement function is a lot more powerful than an inline one. An inline function can only be the result of a single sub-query. It’s essentially the same as a parameterised view, because views demonstrate this same behaviour of extracting the definition of the view and using it in the outer query. A multi-statement function is clearly more powerful because it can contain far more complex logic.
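
    To sketch that analogy (the view name here is illustrative): the optimizer expands a view definition into the outer query before optimizing, exactly as it does with an inline TVF.

    CREATE VIEW dbo.SalesWithLogins AS
        SELECT o.SalesPersonID, e.LoginID AS EmployeeLogin, o.OrderDate, o.SalesOrderID
        FROM Sales.SalesOrderHeader AS o
        LEFT JOIN HumanResources.Employee AS e
        ON e.EmployeeID = o.SalesPersonID;
    GO
    -- Gets the same simplification treatment as the inline function:
    SELECT OrderDate, SalesOrderID
    FROM dbo.SalesWithLogins
    WHERE SalesPersonID = 281
    AND OrderDate >= '20030101' AND OrderDate < '20040101';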

    But a multi-statement function isn’t really a function at all. It’s a stored procedure. It’s wrapped up like a function, but behaves like a stored procedure. It would be completely unreasonable to expect that a stored procedure could be simplified down to recognise that not all the columns might be needed, but yet this is part of the pain associated with this procedural function situation.

    The biggest clue that a multi-statement function is more like a stored procedure than a function is the “BEGIN” and “END” statements that surround the code. If you try to create a multi-statement function without these statements, you’ll get an error – they are very much required. When I used to present on this kind of thing, I even used to call it “The Dangers of BEGIN and END”, and yes, I’ve written about this type of thing before in a similarly-named post over at my old blog.

    Now how about scalar functions...

    Suppose we wanted a scalar function to return the count of these.

    CREATE FUNCTION dbo.FetchSales_scalar(@salespersonid int, @orderyear int)
    RETURNS int
    AS
    BEGIN
        RETURN (
            SELECT COUNT(*)
            FROM Sales.SalesOrderHeader AS o
            LEFT JOIN HumanResources.Employee AS e
            ON e.EmployeeID = o.SalesPersonID
            WHERE o.SalesPersonID = @salespersonid
            AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101')
            AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101')
        );
    END
    ;
    GO

    Notice the evil words? They’re required. Try to remove them, and you just get an error.

    That’s right – any scalar function is procedural, despite the fact that you wrap up a sub-query inside that RETURN statement. It’s as ugly as anything. Hopefully this will change in future versions.

    Let’s have a look at how this is reflected in an execution plan. Here’s a query, its Actual plan, and its Estimated plan:

    SELECT e.LoginID, y.year, dbo.FetchSales_scalar(p.SalesPersonID, y.year) AS NumSales
    FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year)
    CROSS JOIN Sales.SalesPerson AS p
    LEFT JOIN HumanResources.Employee AS e
    ON e.EmployeeID = p.SalesPersonID;

    [screenshot: actual plan]

    [screenshot: estimated plan]

    We see here that the cost of the scalar function is about twice that of the outer query. Nicely, the query optimizer has worked out that it doesn’t need the Employee table, but that’s a bit of a red herring here. There’s actually something way more significant going on.

    If I look at the properties of that UDF operator, it tells me that the Estimated Subtree Cost is 0.337999.

    [screenshot: UDF operator properties showing the Estimated Subtree Cost]

    If I just run the query SELECT dbo.FetchSales_scalar(281,2003); we see that the UDF cost is still unchanged.

    [screenshot: plan for the single function call]

    You see, this 0.337999 is the cost of running the scalar function ONCE.

    But when we ran that query with the CROSS JOIN in it, we returned quite a few rows. 68 in fact. Could’ve been a lot more, if we’d had more salespeople or more years.

    [screenshot: the 68 rows returned]

    And so we come to the biggest problem. This procedure (I don’t want to call it a function) is getting called 68 times – each call about twice as expensive as the outer query. And because it’s called in a separate context, there is even more overhead that I haven’t considered here.

    The cheek of it, to say that the Compute Scalar operator here costs 0%! I know a number of IT projects that could’ve used that kind of costing method, but that’s another story that I’m not going to go into here.

    Let’s look at a better way.

    Suppose our scalar function had been implemented as an inline one. Then it could have been expanded out like a sub-query. It could’ve run something like this:

    SELECT e.LoginID, y.year,
    (SELECT COUNT(*)
        FROM Sales.SalesOrderHeader AS o
        LEFT JOIN HumanResources.Employee AS e
        ON e.EmployeeID = o.SalesPersonID
        WHERE o.SalesPersonID = p.SalesPersonID
        AND o.OrderDate >= DATEADD(year,y.year-2000,'20000101')
        AND o.OrderDate < DATEADD(year,y.year-2000+1,'20000101')
        ) AS NumSales
    FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year)
    CROSS JOIN Sales.SalesPerson AS p
    LEFT JOIN HumanResources.Employee AS e
    ON e.EmployeeID = p.SalesPersonID;

    [screenshot: plan for the expanded sub-query version]

    Don’t worry too much about the Scan of the SalesOrderHeader underneath a Nested Loop. If you remember from plenty of other posts on the matter, execution plans don’t push the data through. That Scan only runs once. The Index Spool sucks the data out of it and populates a structure that is used to feed the Stream Aggregate. The Index Spool operator gets called 68 times, but the Scan only once (the Number of Executions property demonstrates this).

    Here, the Query Optimizer has a full picture of what’s being asked, and can make the appropriate decision about how it accesses the data. It can simplify it down properly.

    To get this kind of behaviour from a function, we need it to be inline. But without inline scalar functions, we need to make our function be table-valued. Luckily, that’s ok.

    CREATE FUNCTION dbo.FetchSales_inline2(@salespersonid int, @orderyear int)
    RETURNS table
    AS
    RETURN (SELECT COUNT(*) as NumSales
        FROM Sales.SalesOrderHeader AS o
        LEFT JOIN HumanResources.Employee AS e
        ON e.EmployeeID = o.SalesPersonID
        WHERE o.SalesPersonID = @salespersonid
        AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101')
        AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101')
    );
    GO

    But we can’t use this as a scalar. Instead, we need to use it with the APPLY operator.

    SELECT e.LoginID, y.year, n.NumSales
    FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year)
    CROSS JOIN Sales.SalesPerson AS p
    LEFT JOIN HumanResources.Employee AS e
    ON e.EmployeeID = p.SalesPersonID
    OUTER APPLY dbo.FetchSales_inline2(p.SalesPersonID, y.year) AS n;

    And now, we get the plan that we want for this query.

    [screenshot: plan for the APPLY version]

    All we’ve done is tell the function that it’s returning a table instead of a single value, and removed the BEGIN and END statements. We’ve had to name the column being returned, but what we’ve gained is an actual inline simplifiable function. And if we wanted it to return multiple columns, it could do that too. I really consider this function to be superior to the scalar function in every way.

    It does need to be handled differently in the outer query, but in many ways it’s a more elegant method there too. The function calls can be put in the FROM clause, where they can then be used in the WHERE or GROUP BY clauses without fear of calling the function multiple times (another horrible side effect of scalar functions).
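
    For example, here’s a sketch reusing the inline function from above – NumSales sits happily in the WHERE clause, and the function still folds into a single simplifiable query (the threshold is arbitrary):

    SELECT e.LoginID, y.year, n.NumSales
    FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year)
    CROSS JOIN Sales.SalesPerson AS p
    LEFT JOIN HumanResources.Employee AS e
    ON e.EmployeeID = p.SalesPersonID
    OUTER APPLY dbo.FetchSales_inline2(p.SalesPersonID, y.year) AS n
    WHERE n.NumSales > 50
    ORDER BY n.NumSales DESC;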

    So please. If you see BEGIN and END in a function, remember it’s not really a function, it’s a procedure. And then fix it.

    @rob_farley

  • Table-valued parameters in SQL 2005

    Can’t be done? Oh, but it can. Let me show you.

    Just quickly: a Table-Valued Parameter is a useful feature introduced in SQL 2008 that lets you pass a read-only, table-typed parameter into a stored procedure. To use it you need to have a user-defined table type, so that you can define what is going to be passed in. You can read about them at http://msdn.microsoft.com/en-us/library/bb510489.aspx

    The 2008 code looks something like this:

    /* First create a database to play in */
    create database TVPDemo;
    go
    use TVPDemo;
    go

    /* And create a table with some sample data. I’m getting mine from AdventureWorks */
    select ProductID, Name as ProductName, ListPrice, ProductSubcategoryID
    into dbo.Products
    from AdventureWorks.Production.Product;
    go

    /* Now for the real stuff – create a table type */
    create type dbo.NewProducts as table
    (ProductName nvarchar(100) collate Latin1_General_CI_AS
    ,ListPrice money
    ,SubCategory int
    ,DeleteMe bit
    );
    go

    /* And a stored procedure which uses this table type */
    create procedure dbo.MaintainProducts(@NewProducts dbo.NewProducts readonly) as
    begin
    /* Obviously we could MERGE – that’d work nicely here. But I want 2005 features */

       /* Update some products into Products table */  
       update p set ListPrice = n.ListPrice, ProductSubcategoryID = n.SubCategory
       from dbo.Products p
       join @NewProducts n
       on n.ProductName = p.ProductName
       where n.DeleteMe = cast(0 as bit);

       /* Insert some */  
       insert dbo.Products (ProductName, ListPrice, ProductSubcategoryID)
       select n.ProductName, n.ListPrice, n.SubCategory
       from @NewProducts n
       where not exists (select * from dbo.Products p where p.ProductName = n.ProductName)
       and n.DeleteMe = cast(0 as bit);

       /* And delete some */  
       delete p
       from dbo.Products p
       join @NewProducts n
       on n.ProductName = p.ProductName
       where n.DeleteMe = cast(1 as bit);

       /* Now list them all, returning this to the client */ 
       select *
       from dbo.Products;

    end
    go

    /* Now let’s familiarise ourselves with what’s in Product now */
    select *
    from dbo.Products;

    /* And do some maintenance on it. We create a table variable of the appropriate type, populate it and call the proc */
    declare @SomeNewProducts dbo.NewProducts;
    insert @SomeNewProducts (ProductName, ListPrice, SubCategory, DeleteMe)
    select 'Blade', 0.1, 1, 0
    union all
    select 'Blade2', 0.1, null, 0
    union all
    select 'Bearing Ball', 1, 2, 1
    ;

    exec dbo.MaintainProducts @SomeNewProducts;

    /*
    When we run this stored procedure, the latest version of dbo.Products is output, so we can clearly see the new record, and the absence of the one we deleted.
    Lovely
    */

    But this wasn’t possible in SQL 2005. We didn’t have user-defined table types, and we certainly didn’t have table-valued parameters.

    Except that we could still do something very similar. This was something I’d taken for granted, but after I showed it to someone at the PASS Summit, and then someone else, I was persuaded to write a blog post on it.

    If you haven’t seen this idea before, I’m sure you’ll kick yourself. It’s remarkably simple, but I think it’s quite powerful. Like I said – I’d taken it for granted.

    The idea is this: make a VIEW with an INSTEAD OF trigger, using the inserted table instead of the table variable.

    That INSTEAD OF trigger is essentially where your stored procedure is kept. A trigger is still a procedure, it’s just not stored in the traditional list of stored procedures. But it will act just like one.

    As for the view – that can just be a placeholder. Think of it as simply defining the columns you need to handle. You don’t need a FROM clause, and you don’t even need any rows to come back. I like to put a contradiction in there so that it’s obvious no real values can ever come out.

    So a trigger doesn’t take a table-valued parameter, but it can leverage the inserted and deleted tables that are available in triggers. For us, we’re just interested in the former. Have a look at the code, and you’ll see what I mean.

    This code can run on SQL 2005 (well, it can also run on later versions, but that’s less important).

    /* First let’s set up a new database, just like we did in SQL 2008*/
    create database TVPDemo;
    go
    use TVPDemo;
    go

    select ProductID, Name as ProductName, ListPrice, ProductSubcategoryID
    into dbo.Products
    from AdventureWorks.Production.Product;
    go

    /* Here’s the tricky bit. Make a view. Focus on the columns. I put WHERE 0=1 in, just to make it cleaner */
    create view dbo.NewProducts as
    select
        cast(N'' as nvarchar(100)) collate SQL_Latin1_General_CP1_CI_AS as ProductName,
        cast(0 as money) as ListPrice,
        cast(0 as int) as SubCategory,
        cast(0 as bit) as DeleteMe
    where 0=1
    ;
    go

    /* This trigger contains the same code as in the 2008 stored procedure.
    * But instead of having a table variable, we use the inserted table.
    */
    create trigger dbo.MaintainProducts on dbo.NewProducts instead of insert as
    begin
       /* Update some products into Products table */  
       update p set ListPrice = i.ListPrice, ProductSubcategoryID = i.SubCategory
       from dbo.Products p
       join inserted i
       on i.ProductName = p.ProductName
       where i.DeleteMe = cast(0 as bit);

       /* Insert some */  
       insert dbo.Products (ProductName, ListPrice, ProductSubcategoryID)
       select i.ProductName, i.ListPrice, i.SubCategory
       from inserted i
       where not exists (select * from dbo.Products p where p.ProductName = i.ProductName)
       and i.DeleteMe = cast(0 as bit);

       /* And delete some */  
       delete p
       from dbo.Products p
       join inserted i
       on i.ProductName = p.ProductName
       where i.DeleteMe = cast(1 as bit);

      /* Now list them all, returning this to the client */
       select *
       from dbo.Products;

    end
    go

    /* Look what’s in there now */
    select *
    from dbo.Products;

    /* Remember there’s never anything in here */
    select *
    from dbo.NewProducts;

    /* Now we simply insert into our view. As we do, the trigger runs immediately and makes the changes */
    insert dbo.NewProducts (ProductName, ListPrice, SubCategory, DeleteMe)
    select 'Blade', 0.1, 1, 0
    union all
    select 'Blade2', 0.1, null, 0
    union all
    select 'Bearing Ball', 1, 2, 1

    So there you have it – a useful TVP equivalent in versions prior to SQL 2008. I get that I’m probably writing this post about ten years too late. Sorry about that.

    But if you’re not fond of the idea of having to declare and populate a table variable, then perhaps this idea is for you. One day, TVPs won’t be READONLY any more (that doesn’t seem to be the case for SQL Server 2012, unfortunately), and when that happens, you’ll definitely want to be using TVPs. In the meantime, this method will support any kind of inserting, whether it’s row-by-row or the results of a single SELECT statement.
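
    For example, if the new rows were sitting in a staging table, a single INSERT...SELECT into the view fires the trigger once for the whole set. (dbo.ProductStaging is a made-up name here – any query producing these columns would do.)

    insert dbo.NewProducts (ProductName, ListPrice, SubCategory, DeleteMe)
    select s.ProductName, s.ListPrice, s.SubCategory, s.DeleteMe
    from dbo.ProductStaging s; /* hypothetical staging table */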

    @rob_farley

  • Highlights and Lowlights of PASS Summit 2011

    This was a proper big week.

    The PASS Summit ran from Tuesday night to Friday, but I’d arrived in America the Friday before. So by the time it actually started, I had that strange feeling that things were wrapping up. My calendar was ridiculously full. The stuff that I was aware of ahead of time looked like this:

    Friday 7th: Arrive in America. Travel to Portland. Speaker Dinner.

    Saturday 8th: SQL Saturday #92 (two sessions to give, plus a song performance with Buck during the morning break). Charity dinner for The Leukemia & Lymphoma Society (whom LobsterPot sponsor).

    Sunday 9th: Walk the Portland Half Marathon. Travel to Seattle. Collect my kilt. Register for the Summit.

    Monday 10th: Early morning prayer meeting. Deliver a pre-conference seminar. Insiders Dinner in evening.

    Tuesday 11th: Early morning prayer meeting. Meeting about SQL Saturday. PASS Board Meeting. Insiders Day at the Microsoft Campus. Opening Night Party, including being a Quiz Bowl contestant. Speaker/Volunteer Dinner.

    Wednesday 12th: Early morning prayer meeting. Chapter leaders’ meeting. Chapter lunch. Book signing. Lightning Talk to deliver (my song). Global Growth meeting. Exhibitors’ party. Parties for SQL People, SQL Sentry, SolidQ and SQLKaraoke.

    Thursday 13th: Early morning prayer meeting. WIT Lunch. Spotlight session to deliver. Redgate Dinner. Party at Gameworks.

    Friday 14th: Early morning prayer meeting. More book signing. Board Q&A session. Board photos.

    Saturday 15th: Sleep in and fly home.

    That’s the short version. Really. There were a lot of other things that were squeezed in – in particular, the fact that I had promised Audrey Hammonds about six months ago that I would run through her presentation with her, and one other thing that became somewhat significant: THE FACT THAT ALLEN KINSEL ARRANGED FOR ME TO PERFORM MY SONG DURING THE FRIDAY KEYNOTE.

    Readers of my blog (and plenty of other blogs) will know by now that I played my guitar and sang at the start of the Friday Keynote, with Buck Woody accompanying me on his guitar and backing vocals. What you might not know is that this only got arranged late on Wednesday night. Allen had seen me perform it (again with Buck) during the Lightning Talks on Wednesday afternoon, and by the time the night was over, Buck and I were booked in to perform it in front of the three and a half thousand delegates at the biggest keynote of the week. I’ve played and sung in front of a couple of hundred people before, but never even close to that many. To say I had mixed emotions would’ve been an understatement. I didn’t hesitate to say yes, and was excited, but was also phenomenally nervous.

    As a late entry to a tight schedule, we got two minutes only and had to cut the song short. We didn’t sing the bridge section, so stopped after just Verse Chorus Verse Chorus. It was tremendous fun, and I loved seeing faint glows of phones being waved around in the crowd. I desperately wanted to look into the crowd as I was walking off stage, but my nerves and the fear of tripping on the guitar lead froze me completely, and I didn’t turn my head at all. I was told later that there was a standing ovation – but I was just full of emotion, and so tired. So tired. It was my own fault – I’d gone out with Microsoft people after the various parties, and when they’d all gone to bed for a couple of hours around 5:30am, I’d got myself ready to lead a quick song at the 6am prayer meeting before heading off to my 6:15am sound check, prior to the 7:15am book signing. I got through the day (although I fell asleep for a few minutes during Audrey’s session, which I will need to watch to see how she did). I eventually got to bed around 10:30pm Friday night, and about twelve hours later managed to get up with just enough time to pack and check out of the hotel by noon.

    But this wasn’t the highlight for me.

    Nor was the highlight the fact that my level of involvement was so much greater, now that I’m a Board Member. I have to admit that I always get so much more out of events like this if I’m involved. I appreciate that I can get to more sessions if I’m not already laden with other entries on my calendar, but I would rather serve others wherever possible. I’m still new on the board, but as an advisor, I’m hoping to be able to influence things like the Summit more and more next year, and maybe there will be a few areas in which we can find opportunity to improve it still.

    Serving is great – but wasn’t the highlight.

    The highlight was the people. It always is, and it always will be. Right from the moment I arrived in America and tracked down John & Yanni Robel and Jes Borland. Arriving in Portland and seeing Jeremiah & Kendra, Buck Woody, Tim Ford, Erin Stellato and many more at the SQLSaturday events. Participating in the Portland Half Marathon with Erin, Jes, Yanni, Brent Ozar, Karen Lopez, Doug Lane (who did 10km) and Allen White (who did the full marathon), and lots of people wearing the LobsterPot logo on their shirts. And despite not being able to run (I physically can’t run because of an old back injury, so I just walked it), I didn’t even finish last of the half marathoners! For all the extra pain I felt because my back isn’t really up to walking 21km in 3 hours, it was worth it. I’m especially proud of Yanni, who has achieved amazing things despite having a nasty blood cancer.

    Despite Sunday’s ridiculously early start and very long day, each day still began with a 6am prayer meeting. These gave me the opportunity to start each day on a good note, to meet even more people (like Matt “Mrs” Slocum), and to figure out some of why I always click so well with friends like Stacia Misner.

    Even the time I spent moving from one spot to another was a chance to see people and grow friendships deeper. I think of the night when the walk back from one of the parties had taken me past the other hotels, and I bumped into the “Damn Strates” (Jason & Sarah), only to spend an hour talking with them about a number of different things. Or the time spent talking with one of the industry’s newest and brightest lights, Jes Borland, who thinks she can out-talk me. Or the time with fellow board members JRJ, MarkS & Lara, which is time I can’t help but enjoy as the conversations switch over and over from flippant to serious and back again.

    Oh, and the lowlight was losing my prescription sunglasses. That’s a real pain.

    The SQL community has something very special, that other communities should be jealous of. These people genuinely love each other, and it’s really good to see and to be part of it. I’ve seen the same at SQLBits, but not in many other contexts.

    @rob_farley

    PS: Apologies to everyone for not having links on your names. I wrote this on a plane without connectivity.

  • Supporting Leukemia Research at the Portland Half Marathon

    My company LobsterPot Solutions gives quite a bit to charity. Sometimes it’s working for reduced rates (or even free), but sometimes we give money directly.

    A Seattle-based friend of mine has CML – one of the types of leukemia – and when I found out she was raising money for the Leukemia & Lymphoma Society by running marathons through their Team In Training program, I wanted to help. LobsterPot donated money to her cause, and I had expected that might be the end of the matter.

    Later, I was contacted for my company logo. It seems that the amount we gave was enough to have the logo printed on the backs of shirts. In fact, the LobsterPot logo is going to be worn by 170 people at races all season, including the Portland Half Marathon. As I was going to be within a few hours’ drive of Portland on the day, I decided to check it out for myself.

    I even entered the event – the Portland Half Marathon 2011.

    I can’t run. I had a back injury in 2003, which has stopped me being able to cope with the impact of running. But I walked it, in a purple shirt provided by the Leukemia & Lymphoma Society with the LobsterPot logo on the back. It took me over three hours to get around the distance, and although walking is good for my back, I was still very sore at the end.

    Walking that course though, all 21km of it, I saw lots of purple shirts with LobsterPot claws on them, the wearers all encouraging each other to stay strong, and to finish. Many of these people, including my friend Yanni, are leukemia survivors, and it was inspiring.

    So now I have a challenge for you. If you are a business owner, then find ways to give back. It doesn’t have to be much, but it should be something. If you don’t know anyone you can sponsor for this type of event, visit Yanni’s fundraising page at http://tinyurl.com/HelpYanni. She’s running a full marathon in January, and has just started raising funds.

    Get yourself over to http://tinyurl.com/HelpYanni and see how to donate to her. Who knows – if you’re a company there could be a hidden marketing value there. Three years ago she was told she had an incurable blood cancer. Just over a week ago, she completed a half-marathon. Isn’t that the kind of spirit you want your company to reward?

    But don’t do it for yourself, do it for Yanni and those like her.


  • I should've looked the other way

    The words for that song I did at the PASS Summit 2011 are as follows. On the Friday, I stopped where the bridge starts. Various recordings of it are making it to YouTube, such as one in which the song starts around 2:20 in.

    I should've looked the other way

    Verse 1:
    My query sucks - it takes too long
    So long I wrote this song
    The plan's not big - it ain't a giant
    And yet I have an angry client
    Performance now has made her weary
    So I've come in to fix her query
    I promise I won't ever fail her
    Say "Trust me, love, I'm from Australia!"

    Chorus:
    I need to find you
    But I don't want to search every row
    My predicate's residual
    My seek just runs too slow
    I thought I'd caught a glimpse of you
    Been searching for all day
    But all along, I'd done it wrong
    I should've looked the other way

    Verse 2:
    A trace is on, I know the reads
    That fetch the bytes the query needs
    There's spooling from a CTE
    They've got recursion needlessly
    I need to dig a little further
    I worry there might be a cursor
    The DBA has the plan_handle
    He says it's not corrupt, he knows Paul Randal!

    Repeat chorus

    Bridge:
    There is an index covering predicates with keys
    But my developer has used inequalities
    There is a range scan
    Hiding truth
    Hiding cost
    Hiding you...

    Repeat chorus x2

    I should've looked the other way

    © 2011 Rob Farley ;)

    @rob_farley

  • Data, Information and Knowledge

    Hopefully my connection is slightly better during today’s keynote than it was during yesterday’s, when “Live Blogging” didn’t really cut it. The PASS staff saw the problem and have resolved it (thanks guys!).

    Quentin Clark has jumped on stage to talk some more about SQL Server 2012, and he started with the expression “Data, Information and Knowledge”. I love this – I see Business Intelligence as being about extracting information from data, and it’s good to see Microsoft applying this priority across the whole SQL platform.

    He’s also talking about the 12 biggest features of SQL Server 2012, which he says has more new features than any release of SQL Server yet.

    1. Required 9s

    • Integration Services as a Server
    • HA for StreamInsight
    • SQL Server AlwaysOn.

    SQL Server has seen uptime as a key component for a long time, but providing High Availability for StreamInsight is particularly significant. StreamInsight is about consuming data at significant rates and running queries against that data while it’s still on the move – before it’s even reached the relational database. High Availability for StreamInsight should provide better strategies for ensuring that streaming data isn’t lost. Businesses suffer badly when they lose data, and SQL Server 2012 should reduce this problem almost completely.

    2. Blazing-Fast Performance

    • Performance Enhancements – RDBMS, SSAS, SSIS
    • ColumnStore Index

    3. Rapid Data Exploration
    4. Managed Self-Service BI

    • Power View + PowerPivot
    • Administration from SharePoint
    • Reporting Alerts

    Yes, “Power View” has a space in it. It’s the new name for Crescent, which is about self-service reporting using a Silverlight experience. I’m all for allowing users to interact with the data in powerful ways, but I’m also concerned about how to manage this. SharePoint seems to continue as the main platform for this, and although I’d love to see the administration of these reports be done inside SQL itself (instead of SharePoint), I get that SharePoint is currently the platform of choice.

    5. Credible, Consistent Data

    • BI Semantic Model
    • Data Quality Services
    • Master Data Services

    I’m not going to comment on this stuff right now – it’s been talked about plenty already, but the enhancements definitely look good. The new stuff around DQS lets you fix up data at a number of extra points and have it pushed back into the underlying warehouse.

    6. Organisational Compliance

    • Expanded Audit – User-defined, Filtering
    • User-defined Server Roles

    This is useful. Audit is one of the massive compliance features, and yet an administrator has always been able to turn it off. With the ability to have user-defined server roles, auditing becomes a much more realistic option, as the sketch below suggests.
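
    To give a rough idea of the T-SQL involved – a sketch only, where the role, login, audit name and file path are all made up, and you’d still need audit specifications to say which events to capture:

    /* A user-defined server role - new in SQL Server 2012 */
    CREATE SERVER ROLE AuditAdmins;
    ALTER SERVER ROLE AuditAdmins ADD MEMBER SomeLogin; /* placeholder login */
    GO
    /* A server audit with a filter predicate - the filtering is also new in 2012 */
    CREATE SERVER AUDIT ComplianceAudit
    TO FILE (FILEPATH = 'C:\Audits\')
    WHERE database_name = 'SalesDB'; /* hypothetical database name */
    GO
    ALTER SERVER AUDIT ComplianceAudit WITH (STATE = ON);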

    7. Peace of Mind

    • Production-simulated Application Testing
    • System Center Advisor & Management Packs
    • Expanded Support – Premier Mission Critical

    The Replay tools have become Distributed Replay, which introduces a ton of really good options.

    8. Scalable Data Warehousing
    9. Fast Time to Solution

    • SQL Server Appliances – Optimised and Pre-tuned
    • HW + SW + Support – Just Add Power
    • Choice of Hardware

    Appliances are stepping up. Buy an appliance, add the network connection and electricity, and it’s ready to accept data in twenty minutes. The number of new options available suggests that the future of hardware buying will be even more focused on the appliance concept.

    10. Extend Any Data, Anywhere

    • Greater Interoperability – new drivers for PHP, Java & Hadoop
    • ODBC Drivers for Linux & Change Data Capture for SSIS & Oracle
    • Beyond Relational: FileTable, 2D Spatial, Semantic Search

    I really like the idea of CDC for Oracle, although I doubt there will be any…

    11. Optimised Productivity

    • SQL Server Data Tools (formerly “Juneau”)
    • Unified Across Database & BI
    • Deployment & Targeting Freedom

    12. Scale on Demand

    • AlwaysOn
    • Deployment Across Public & Private
    • Elastic Scale

    And of course, many of these items contribute to making a much more cloud-ready platform. Cloud isn’t for everyone, but Microsoft are certainly taking steps to make it a more feasible option.

    And we’re over – hitting publish now.

    @rob_farley

  • Mashing up data

    In the PASS Summit Keynote (Day 1), they’re demonstrating tools to join data from Excel, Marketplace, SQL Azure, using the “Data Explorer”.

    The thing I’m liking about this is that you can specify whether you want to do a lookup or a merge. Both are joins, just two different types. I like this because when I write joins in T-SQL queries, I tend to have it in my head whether I’m expecting a Nested Loop (lookup) or a Merge Join. It seems Data Explorer lets me decide that right up front.
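
    In T-SQL, that up-front decision is the kind of thing you can express with join hints. A quick sketch against AdventureWorks (my own illustration, not something from the demo):

    /* Ask for lookup-style behaviour (a Nested Loop) */
    SELECT p.SalesPersonID, o.SalesOrderID
    FROM Sales.SalesPerson AS p
    INNER LOOP JOIN Sales.SalesOrderHeader AS o
    ON o.SalesPersonID = p.SalesPersonID;

    /* Or ask for a Merge Join instead */
    SELECT p.SalesPersonID, o.SalesOrderID
    FROM Sales.SalesPerson AS p
    INNER MERGE JOIN Sales.SalesOrderHeader AS o
    ON o.SalesPersonID = p.SalesPersonID;

    Of course, hints like these are mostly for exploring plan behaviour – normally the optimizer makes that call itself.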

    So you can pull data down from all kinds of different places, and publish it back out through the Marketplace. This means you can enhance other data sets, which other people can then enhance further, and this can continue to make some incredible data sources that can be consumed in OData just about anywhere.
