Why I use Memory Optimized Tables (MOT)

Boy, that was a long hiatus…

I suddenly realized this month’s #TSQL2SDAY was on a topic I felt I could finally get back to blogging on.

Back in December 2016 I wrote a blog post on a project where I used In-Memory OLTP, now better known as memory optimized tables. To this day I rarely meet anyone who uses the technology, despite having held plenty of Hands-On Labs for Microsoft where MOT is part of the lab.

My first try at MOT was a pretty specific use case – I had a 26+ million row data set that I needed to do look-ups against to find out whether a specific person (anonymized ID) passed through a specific point in an airport as a domestic transfer passenger (starting their travel at another Norwegian airport the same day), as an international transfer passenger, or as someone starting their journey from that airport. Doing look-ups against a regular table caused the expected locks and took forever. Using temp tables was OK, but still slow. For this use case, MOT was perfect! Not only did I not have to worry about locks, but the hash index that only exists for memory optimized tables works great for single-value lookups – which the anonymized passenger ID is. So I got a huge speed increase – the runtime was only 1/4 of the original.
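
To give an idea of what that looked like, here is a minimal sketch – assuming SQL Server 2016 or later, and with made-up object names rather than the actual schema: a non-durable memory optimized lookup table with a hash index on the anonymized ID, and a bucket count sized above the ~26 million unique values. @AnonymizedId simply stands in for the value coming from the data flow.

CREATE TABLE [dbo].[PassengerLookup]
(
	[AnonymizedId] [nvarchar](50) NOT NULL,
	[TransferType] [varchar](20) NULL,

	INDEX Ix_AnonymizedId
	HASH([AnonymizedId]) WITH (BUCKET_COUNT=30000000)

)WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_ONLY )

-- The lookup itself is a plain single-value equality predicate,
-- which is exactly what the hash index is built for
SELECT [TransferType]
FROM [dbo].[PassengerLookup]
WHERE [AnonymizedId] = @AnonymizedId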

Now, this was the only time I used MOT for a while – until earlier this year, when I was re-designing an Azure DB. Unlike last time, when I implemented MOT as part of an ETL flow, this time I was working on a regular table. The table in question stores events on bags coming in, and there are 6-10 different events per bag depending on where it's headed. Most of these events carry unique timestamps or values that are not present in the other events, so I need to aggregate or look through the various events per bag to find the information I need. The different values and aggregations are presented in views that are then loaded into an Analysis Services model, which is refreshed every hour. And again, I ran into lock waits, which slowed down the whole process from inserting events to processing the model.

So, to speed up the processing – and since I was upgrading the server anyway – I changed the table to a memory optimized table. It is a lot easier to implement this in Azure than it was back on my 2016 box, but of course you don't have any control over the hardware, so you can't upgrade just the memory – you have to go up in pricing tiers instead. I do have other tables with the same issue, but analyzing the different queries I noticed that the bags table was the biggest bottleneck. And after fiddling around with different scenarios, I ended up setting the server to a size that allowed me to make the bags table a MOT and still keep the server costs within budget.
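
Since you can't alter an existing disk-based table into a memory optimized one, the switch boils down to creating a new table, copying the data over and then swapping the two. A rough sketch of what that looked like for the bags table – the table and column names here are illustrative, not the real schema – with the table kept durable (SCHEMA_AND_DATA) since it holds the actual events:

CREATE TABLE [dbo].[BagEvents_InMem]
(
	[BagEventId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
	[BagId] [nvarchar](20) NOT NULL INDEX Ix_BagId NONCLUSTERED HASH WITH (BUCKET_COUNT=10000000),
	[EventType] [tinyint] NOT NULL,
	[EventTime] [datetime2](3) NOT NULL

)WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA )

-- Copy the existing rows over; once the copy is verified, the old disk-based
-- table can be dropped or renamed and the views pointed at the new one
INSERT INTO [dbo].[BagEvents_InMem] ([BagId], [EventType], [EventTime])
SELECT [BagId], [EventType], [EventTime]
FROM [dbo].[BagEvents];

In Azure this works as long as the tier supports In-Memory OLTP – there is no MEMORY_OPTIMIZED filegroup to set up like there is on-premises.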

I can’t boast of a performance increase as big as the one I got out of my first project, but performance improved by roughly 1/3. So all in all a decent improvement.

MOT is still not a SQL Server feature for everyone, but for scenarios like the ones above – and with the improvements introduced in SQL Server 2017 and later – I feel more people ought to be aware of the benefits it brings.

Thanks for the invite, Steve. I hope people enjoy this blog post.

Speaking at SQLNexus and SQLGrillen!

Whoa, we're already in February!
After a great SQLSatSlovenia – fantastic city and awesome event by the way – I decided to submit some sessions for SQLNexus (2/5-3/5-17) for the fun of it.

And out of the blue, I got an e-mail last week telling me that my session "Don't cross the streams! (A closer look at Stream Analytics)" had been selected! The Azure Stream Analytics query language is a subset of T-SQL, which at times makes for annoying limitations for people used to T-SQL. Yet since it has a very specific role to play, the session tries to show you how to make the best use of the language.
I'm ever so slightly nervous, yet also very happy to get the chance to deliver this session. SQLNexus is one of the largest – if not the largest – MS Data Platform events in the Nordics.

And to make the first 6 months of 2017 even better, I recently got the news that my introductory session on Azure IoT was accepted to SQLGrillen on 2/6-17, a German event organized by the always enthusiastic William Durkin. I had submitted a session when the call for speakers was announced in October, and was really pleased when I was accepted. They went from 10 submissions last year to about 190 this year, so being one of the few who get to speak here is humbling. And hey, who doesn't want to attend an event with the motto "Databases, Bratwurst & Beer"?
The event is even sold out already!

I’ve submitted sessions to a few other events this spring as well, but so far these two are the only ones I’ll be speaking at. Time will tell if this changes. 🙂 Just being lucky enough to speak at 2 events is fantastic!

Speaking at SQLSaturday Slovenia!

Less than a week to go, but still thought I’d put it out here – my only international speaking engagement this year is in Ljubljana December 10th.
While I've done some lightning talks and closed-forum sessions, this year has been pretty calm. Which is good, as I've had more responsibilities added to my daytime job.

Last week, on December 1st, I presented the same session for the user group in Oslo that I've been selected to deliver in Ljubljana:
An introduction to Azure IoT, Stream Analytics and PowerBI.

It’s a great topic, and I hope people will enjoy the session on Saturday.

http://www.sqlsaturday.com/567/eventhome.aspx

InMemory OLTP for ETL

I've had a new project at work where I need to do some heavy lookups (the lookup table is 26+ million rows in a full load), and performance has been horrible.
I needed to do lookups on the same data set that I'm sending through a data flow in SSIS.
The reason for this is a business case of finding out where the people who shopped at a particular store arrived from and where they are heading.
The data we have tells us some of this, but also gives hints for the people whose origin or destination we don't know.
So what I'm doing is applying a set of business rules to the data and doing lookups in several steps based on those rules.

Now, in a regular disk-based database this ends in lock conflicts, so sending the data into a temp table or a regular cloned table is the only solution.
This adds to the loading times and creates other resource conflicts. The job got too heavy to function on a full load.
Let's just say that if it didn't crash the server, the SSIS job took 12+ hours…
I've been looking into ways of improving this load, and stumbled across SQL Judo's post on in-memory staging.

As we were in the process of upgrading from SQL Server 2012 to 2016 by the end of September, I decided to investigate further. And the SQL Judo post links to another important tidbit – the hash index type for memory optimized tables that was introduced with SQL Server 2014. It has one job – returning a single value – and it supposedly does it well.
You can find code samples and an explanation here.

Naturally, the code samples are simple and might not fit what you want to do exactly, but you get the general idea. Just remember the main challenge with this index: you have to have a bucket count that's large enough for the number of unique values in your table…

All right, so I start by making a hashed column for what constitutes a unique row in my data set, and then put the hash index on this. You don't have to create a hashed column – that's just due to the nature of my data set. You simply put the index on the column that you want to run the lookup on.
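
How you build that hashed column depends on your data, but the idea is simply to hash the concatenation of the columns that make a row unique while loading the staging table. A minimal sketch – the source table name and the choice of columns here are just for illustration:

-- Build the lookup hash from the columns that identify the row,
-- so the hash index can resolve it with a single-value seek
SELECT
	CONVERT(varbinary(50),
		HASHBYTES('SHA1',
			CONCAT([Bnr], '|', [Dep_Code], '|', [DateSk]))) AS HASH_Dep
FROM [demo].[SourceRows];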

Great! Now I’m ready to start creating my first InMemory table…
Then I get this error:

Cannot create memory optimized tables. To create memory optimized tables, the database must have a MEMORY_OPTIMIZED_FILEGROUP that is online and has at least one container.

Ah, of course – you need to prep the database for In-Memory OLTP… A quick search in BOL reveals the needed steps:

ALTER DATABASE imoltp ADD FILEGROUP imoltp_mod CONTAINS MEMORY_OPTIMIZED_DATA  

ALTER DATABASE imoltp ADD FILE (name='imoltp_mod1', filename='c:\data\imoltp_mod1') 
TO FILEGROUP imoltp_mod  

Ok, this time though, we’re good to go.

CREATE TABLE [demo].[TempLookup]
(
	[OLTP_ID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
	[Bnr] [nvarchar](15) COLLATE Danish_Norwegian_CI_AS NULL,
	[Dep_Code] [varchar](5) COLLATE Danish_Norwegian_CI_AS NULL,
	[Initials] [nvarchar](3) COLLATE Danish_Norwegian_CI_AS NULL,
	[Dest_Code] [varchar](5) COLLATE Danish_Norwegian_CI_AS NULL,
	[DateSk] [int] NULL,
	[TimeSk] [int] NULL,
	[platform] [nvarchar](max) COLLATE Danish_Norwegian_CI_AS NULL,
	[Sequence number] [nvarchar](10) COLLATE Danish_Norwegian_CI_AS NULL,
	[EndTimeSk] [int] NULL,
	HASH_Dep [varbinary] NULL  INDEX Ix_Hash_Dep  NONCLUSTERED HASH WITH (BUCKET_COUNT=30000000),
	HASH_Dest [varbinary] NULL INDEX Ix_Hash_Dest NONCLUSTERED HASH WITH (BUCKET_COUNT=30000000),
	HASH_Time [varbinary] NULL INDEX Ix_Hash_Time NONCLUSTERED HASH WITH (BUCKET_COUNT=30000000)
)WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_ONLY )

I tested both SCHEMA_AND_DATA and SCHEMA_ONLY, which are the two durability options available for In-Memory tables.
SCHEMA_AND_DATA guarantees that the data won't be lost if the server goes down, but you lose a bit of speed since the data has to be persisted to disk.
In my tests, SCHEMA_ONLY gave so much better performance that I decided it was worth giving up the extra safety that durable data would have given me.
For a staging or transformation step in SSIS, in my experience you rarely have to worry about losing the data – just add a load_date filter if you need to and start the job from scratch.
But this might be important to you, so take a moment to decide on this setting.

Alright, back to my little table. I run into this error:

Msg 12317, Level 16, State 78, Line 20
Clustered indexes, which are the default for primary keys, are not supported with memory optimized tables. Specify a NONCLUSTERED index instead.

Oh yeah, sorry about that… In-Memory – or Hekaton if you like – has its quirks, or limitations.
Just adding that crucial little "NON" to the primary key and we should be OK.

CREATE TABLE [demo].[TempLookup]
(
	[OLTP_ID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
	[Bnr] [nvarchar](15) COLLATE Danish_Norwegian_CI_AS NULL,
	[Dep_Code] [varchar](5) COLLATE Danish_Norwegian_CI_AS NULL,
	[Initials] [nvarchar](3) COLLATE Danish_Norwegian_CI_AS NULL,
	[Dest_Code] [varchar](5) COLLATE Danish_Norwegian_CI_AS NULL,
	[DateSk] [int] NULL,
	[TimeSk] [int] NULL,
	[platform] [nvarchar](max) COLLATE Danish_Norwegian_CI_AS NULL,
	[Sequence number] [nvarchar](10) COLLATE Danish_Norwegian_CI_AS NULL,
	[EndTimeSk] [int] NULL,
	HASH_Dep [varbinary](50) NULL,
	HASH_Dest [varbinary](50) NULL,
	HASH_Time [varbinary](50) NULL,


	INDEX Ix_Hash_Dep 
	HASH(HASH_Dep) WITH (BUCKET_COUNT=30000000),
	INDEX Ix_Hash_Dest
	HASH(HASH_Dest) WITH (BUCKET_COUNT=30000000),
	INDEX Ix_Hash_Time
	HASH(HASH_Time) WITH (BUCKET_COUNT=30000000)

)WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_ONLY )
GO

SET ANSI_PADDING OFF
GO

Alright, this did the trick. I now have a table ready for the task at hand.
Now I swap the lookup queries so they use the hashed column as the lookup value, and start the package.
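
The swapped lookup itself is nothing fancy – a sketch below, reusing the columns from the table above and with @LookupHash standing in for the hash value coming out of the data flow:

-- Single-value equality on the hashed column, resolved by the hash index Ix_Hash_Dep
SELECT [OLTP_ID], [Dest_Code], [EndTimeSk]
FROM [demo].[TempLookup]
WHERE [HASH_Dep] = @LookupHash;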

The result? While running through 26 million rows still takes a while, the job now finishes in about 3 hours. Yes, that's 1/4 of the time it took before. Granted, we did have to double the memory on the server – but we had plans for doing that anyway.
And the job still fails at times, but when it does, it fails at the beginning of the job – leaving no error messages in the regular SSIS execution log.
This could be a bug, or it could be a resource conflict – and perhaps warrants another blog post when I've figured it out.
But for now, if it fails, I just restart the job and it runs through within the normal DWH loading window. So all is well.

My first forays into the realm of Powershell

PowerShell has always been the realm of IT Pros and DBAs for me. Though I've done my share of accidental DBA stuff, I haven't put any effort into learning PowerShell.

I know it's a powerful scripting language, but there are just so many things to learn and too few hours in which to learn them.
However, after working closely with IT Pros during regular releases at a couple of clients over the last few years, and seeing how much time we developers spend every release writing release documents detailed enough for IT to do the deployment without too many mistakes, I've come to realise I need to learn PowerShell. If developers deliver the deployment in a way IT is comfortable with (i.e. as PowerShell scripts), we free up time for development instead.

So, in February I managed to read enough blog posts to create a simple PowerShell script for deploying a multi-dimensional cube, and a co-worker found a script to deploy SSIS .ispac files. The SSAS script simply executes the XMLA deployment script that the SSAS Deployment Wizard creates when you deploy to file instead of to the server. Slightly more work for me, but less for IT – meaning one less point of confusion and/or failure.

We did run into problems with the SSIS script though. Several different attempts at scripts later, we still get the same "file not found" error when starting the SSIS package from SQL Server Agent. I'm guessing there's an encryption problem, though I can't replicate it when I execute the script on my test server. However – in my experience, most of us develop with the "encrypt sensitive data with user key" protection level by default, and since the deployment wizard changes the encryption settings, this hasn't come up as a problem before.

Researching this issue had me stumbling across this blog post from WIDBA, on how to copy SSIS projects from one server to another. This seemed to be a great solution to my problem. It’s supposed to a) create a new folder on the destination server if the folder doesn’t exist and then b) loop through the source server catalog and copy each project it finds to the destination server.

However, I could not get the script to work properly. For some reason it struggles with returning the projects in the folder of the SSISDB catalog, so not only did it fail to create a new folder – or see whether one already existed – but it also failed to loop through the projects. And I'm too much of a freshman to find exactly what needed to be fixed in the code to solve the problem, so instead I had to find a way around it…
As I know the folder will always exist – this is a production server we're deploying to, after all – I don't need to verify that it exists or attempt to create it if it doesn't.

And since I want us developers to simply deploy the projects that are ready for release, I wanted a dedicated folder on our dev server to make sure that only the approved versions were released – so I created separate source and destination folder parameters in the script.

The result looks like this:
$SourceServer = "Development"
$DestinationServer = "Production"
$ProjectFolder = "Deployment"
$destProjectFolder = "SSIS"

[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Management.IntegrationServices")
$ISNamespace = "Microsoft.SqlServer.Management.IntegrationServices"

Write-Host "Connecting to server ..."

# Create a connection to the source and destination servers
$constr = "Data Source=$SourceServer;Initial Catalog=master;Integrated Security=SSPI;"
$destconstr = "Data Source=$DestinationServer;Initial Catalog=master;Integrated Security=SSPI;"
$con = New-Object System.Data.SqlClient.SqlConnection $constr
$destcon = New-Object System.Data.SqlClient.SqlConnection $destconstr

# Create the Integration Services objects
$ssis = New-Object $ISNamespace".IntegrationServices" $con
$destssis = New-Object $ISNamespace".IntegrationServices" $destcon

# Grab the SSISDB catalog on both servers
$catalog = $ssis.Catalogs | Where-Object {$_.Name -eq "SSISDB"}
$destcatalog = $destssis.Catalogs | Where-Object {$_.Name -eq "SSISDB"}

# Grab the source folder and the destination folder for the projects
$curFolder = $catalog.Folders | Where-Object {$_.Name -eq $ProjectFolder}
$destFolder = $destcatalog.Folders | Where-Object {$_.Name -eq $destProjectFolder}

# Deploy every project found in the source folder to the destination folder
Write-Host "Starting Project Deployments for " $curFolder.Name
foreach ($proj in $curFolder.Projects)
{
    if ($con.State -eq "Closed") { $con.Open() }

    # Fetch the project binary from the source catalog...
    $cmd = New-Object System.Data.SqlClient.SqlCommand
    $cmd.CommandType = "StoredProcedure"
    $cmd.Connection = $con
    $cmd.CommandText = "SSISDB.catalog.get_project"
    $cmd.Parameters.Add("@folder_name", $curFolder.Name) | Out-Null
    $cmd.Parameters.Add("@project_name", $proj.Name) | Out-Null
    [byte[]]$results = $cmd.ExecuteScalar()

    # ...and deploy it to the destination folder
    if ($results -ne $null) {
        $deployedProject = $destFolder.DeployProject($proj.Name, $results)
        Write-Host $proj.Name $deployedProject.Status
    }
    else {
        Write-Host $proj.Name " failed - no results from Get_Project"
    }
}

As you can see, it's not a major change – but at least I got it to work.

And this has the bonus of making the pre-release days a much simpler affair for us developers.
No need to write detailed descriptions for each SSIS project that IT shall release.
No need to remind IT to "always do a get latest" from TFS – especially nice if you run into a bug that you missed in pre-release testing and need to re-deploy.
And IT only has to deal with a single script – in a scripting language and tooling they are familiar with – which doesn't even have to be downloaded for every release.

I had the first release deployment with this script today, and it worked exactly as well as I had hoped – though today was merely a single-project release.

Meanwhile, I've also been playing around with Azure – and the PowerShell scripts that come bundled with the service – and I realise how much simpler my life will be if I can get a grip on the PowerShell language.

So, now I've invested in books on PowerShell – and need to find time to learn more. This has the potential to save a lot of time in a production environment in the future, as well as making my play dates in the cloud that much easier. (And if you attended my session at SQLUG Oslo on 14/4 – had I mastered PowerShell properly, you would most likely have been able to see the demo I had planned on showing…)

New speaking opportunities

Well, as mentioned earlier – I have been invited to do a webinar for the PASS Business Intelligence Virtual Chapter, and the date is Monday, February 16th – so at the time of writing, just 3 days away.

And I’ve been asked by the local user group to hold a session in April, though the date hasn’t been announced yet.

So much looking forward to this, as well as seeing if any of my submissions to SQLSaturdays this year will be accepted.

Update 18. March:

The date for the SQLUG session has been announced: the 14th of April.
Looking forward to doing a live session.
Details can be found here

SQLHangout

As mentioned in a previous post, I was invited by Boris Hristov to one of his regular #SQLHangouts.
The topic for our session was the things you need to think about regarding performance when working with PowerPivot or Tabular mode – it's a whole different animal than your regular multi-dimensional cube. We also talked a bit about QlikView and Tableau – and Microsoft Power BI.

I had heard rumours that it's fun to be part of these hangouts – and of course, watching the other sessions always confirms this, what with Boris being an all-around good guy and a good host. And it was true: great fun. I hope you enjoy it as well, and that someone can benefit from watching it.

We recorded it today – and you can now watch the result here:

New year’s resolutions

Well, I really don’t do new year’s resolutions. I tend to fail miserably on them for some reason.

But this year, I’ve decided to have a few.

The first one is this blog. I find that I run into stuff at work that for some reason I need to get off my chest without bothering my poor wife.
So, a blog should be the perfect solution. Who knows, I might post something others find useful as well?

Secondly, I plan on getting better at speaking at events. And the only way of doing that is to get practice.
So I need to come up with topics that are interesting enough to share – and then spend time sending them off to event organizers to see if someone bothers inviting me.

Surprisingly, it seems I've managed to spark some interest. So far I've been invited by Boris Hristov to participate in one of his SQL Hangouts – which is really cool.
I’ll leave it to him to announce the date, but I’m really looking forward to this – the SQL Hangouts tend to be fun to watch at least.

Secondly, I’ve been given the chance to hold my second webinar for the PASS BI virtual chapter. Date to be announced when they release the February schedule, but I’ll be talking about PowerBI, SSAS Tabular and Azure.
I’m really amazed at this development, and hope I can continue the year as well as it starts – but only time will tell.

Ehlo world!

Well, this is it! Another blog among the plethora of SQL and BI blogs out there.

Hope I can write something that can be of use to others…

The header image is the view Westwards from a small hill a few kilometers from my house. I currently live on an island called Sotra outside Bergen on the edge of the North Sea.