
Fun With SQL (OVER and PARTITION clauses)

Filed under Hiring, SQL

I’ve been interviewing over the last few weeks for a new position (if anyone has a need for a very experienced .NET developer with some recent Ruby/Rails/JavaScript/CoffeeScript/jQuery etc. chops, shoot me an email!).

At any rate, one company had an interesting SQL challenge.

In a nutshell, given a table with an ID field and a FLAG bit field, find all the runs of consecutive rows where the FLAG field is 1 (i.e., bad records).
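The source table itself is simple. Something like this (the exact column types are my assumption; the description only mentions an ID field and a FLAG bit field):

CREATE TABLE Flags (
   id INT NOT NULL,
   flag BIT NOT NULL
)
GO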

The trick was that the output needed to look like:

ID RANGE        Error Count
4000-6000       2001

First, let’s insert the CSV file into a table….

BULK INSERT Flags
FROM 'c:\dev\darin\sqltest\sqltest.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2
)
GO
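
FIRSTROW = 2 skips what’s presumably a header row, so the CSV would look roughly like this (the values here are made up for illustration):

id,flag
1,0
2,0
3,1
4,1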

They supplied a 1.6 million row sample table, and expected sub-10-second response time. Sooooo, you also might want to add a clustered index to the ID field.
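
Something along these lines (the index name is my own; use whatever convention you like):

CREATE CLUSTERED INDEX IX_Flags_Id ON Flags (id)
GO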

Let’s see some SQL!

With the sample table in place, I could get down to business.

I’ll admit, I had to google the syntax of the OVER and PARTITION clauses in T-SQL. They’re not something I use every day.

First, I needed all the rows, with a group number grouping all the consecutive rows whose flag value was 1. I used a Common Table Expression for this. I could have used temp tables and it would have resulted in about the same performance, but… what the hey!

WITH parts AS (
   SELECT *, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
   FROM flags
   WHERE flag = 1
)
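
The id - ROW_NUMBER() trick is what does the grouping: within a run of consecutive flagged ids, the id and the row number both increase by one, so their difference stays constant; a gap in the ids bumps the difference to a new value. With some made-up ids, it works out like this:

id     ROW_NUMBER()   grp
10     1              9
11     2              9
12     3              9
50     4              46
51     5              46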

Next, I needed the count of each group as part of the result set….

WITH counts AS (
   SELECT *, COUNT(*) OVER (PARTITION BY grp) AS cnt
   FROM parts
)

With those two CTEs in place, all that was left was the output Select….

SELECT cast(min(id) as varchar(20)) + '-' + cast(max(id) as varchar(20)) as IDRange, 
   cnt as ErrorCount
FROM counts
WHERE cnt >= 1000
GROUP BY grp, cnt
ORDER BY IDRange
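
For the record, those three pieces have to be chained into a single statement to actually run (the CTEs are separated by commas, with the final SELECT immediately after):

WITH parts AS (
   SELECT *, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
   FROM flags
   WHERE flag = 1
),
counts AS (
   SELECT *, COUNT(*) OVER (PARTITION BY grp) AS cnt
   FROM parts
)
SELECT cast(min(id) as varchar(20)) + '-' + cast(max(id) as varchar(20)) as IDRange,
   cnt as ErrorCount
FROM counts
WHERE cnt >= 1000
GROUP BY grp, cnt
ORDER BY IDRange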

In this case, I only cared about blocks of 1000 or more error rows. The concatenated SELECT is just to make a nice output format. Realistically, I’d probably only select MIN(ID), MAX(ID), CNT.

I haven’t used CTEs a lot, but I’m finding more and more uses for them.

And the OVER clause is something I really need to get more familiar with.

For the 1.6 million row source table, I was getting sub-2-second response times on this query. No doubt there are ways to speed this up, but that’s not too shabby.

Hiring Good People

Filed under Hiring

The topic of hiring came up recently, and it reminded me of a novel approach an employer I once worked with used to find people.

They posted notices on all the major job search boards (that’s how I came across them in the first place), and in them they gave no email address, just the URL of a zip file containing the source of a broken project.

The task was to fix the project. Once you’d done so and were able to run the thing to completion, the end result was the email address where you needed to submit your resume.

The trick was that you couldn’t just find the email address in the source. The project really did have to be debugged and corrected first.

It wasn’t terribly tough. If I remember correctly, I fixed it up one morning before going in to my then-current job, and submitted my resume more on a lark than anything else. The company was, after all, located in Chicago (I had no interest in moving to Chicago), and they made no mention of telecommuting.

In the end, I liked the guys involved, got the job and stayed for more than two years.

I was speaking with the CIO who concocted the test at one point. He said he got something like nine resumes submitted, and every one of them looked like a great fit. That’s in contrast to the thousands of resumes that come in from a typical post to, say, Dice.

In the entire time I’ve used the various job sites out there, I’ve never seen this approach used by anyone else.