Clearing LOG Files: cluster index and identity

According to Brian Knight in "SQL Server 2000 for Experienced DBAs",
"Avoid creating clustered indexes on identity columns.
Clustered indexes perform better on range queries,
such as a date. When you have a clustered index on
an identity column, you risk your data receiving hot
spots, which are caused by many people updating the
same data page."
I would like to know what is the consenus in this forum on this.It really depends on your usage; there is no single silver bullet answer.
"rkusenet" <rkusenet@.sympatico.ca> wrote in message
news:2unifgF2c9ng0U1@.uni-berlin.de...
> According to Brian Knight in "SQL Server 2000 for Experienced DBAs",
> "Avoid creating clustered indexes on identity columns.
> Clustered indexes perform better on range queries,
> such as a date. When you have a clustered index on
> an identity column, you risk your data receiving hot
> spots, which are caused by many people updating the
> same data page."
> I would like to know what is the consenus in this forum on this.
> --
>
>|||I prefer the exact opposite. I think hot spots (up to a point) are a good
thing. Hot spotting data gives the cache manager something to grab hold of.
Clustering on an Identity column also guarantees inserts are at the end of
the table, thus avoiding the dreaded page split. Hot spotting was very bad
under older versions of SQL, but SQL 2000 copes with it pretty well.
Clustering on a narrow unique key also has benefits, especially if you have
a lot of non-clustered indexes on a table.
Having said that, I do not use the identity column as my Primary Key. I
create a PK constraint using a non-clustered index on a natural key.
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
I support the Professional Association for SQL Server
www.sqlpass.org
"rkusenet" <rkusenet@.sympatico.ca> wrote in message
news:2unifgF2c9ng0U1@.uni-berlin.de...
> According to Brian Knight in "SQL Server 2000 for Experienced DBAs",
> "Avoid creating clustered indexes on identity columns.
> Clustered indexes perform better on range queries,
> such as a date. When you have a clustered index on
> an identity column, you risk your data receiving hot
> spots, which are caused by many people updating the
> same data page."
> I would like to know what is the consenus in this forum on this.
> --
>
>|||And I fall exactly opposite of Geof... Here are my reasons... When the
table performance is targeted for quick queries, as opposed to fast inserts
( which is most of the time in my experiecnce)... Then don't clustere on the
identity column, cluster on a Where clause key becuase..
clustered indexes are useful when
you are returning many rows in your queries
there are many duplicate clustered keys ( since they are stored together
they will be returned together)
you are doing many range searches
you are doing tons or order by on the clustered key...
Are any of these true when you use an identity column? How many times will
you do select * from emp where empid > 50 ,,, or empid between 100 and 500
(rarely if ever)... You might be ordering by empid... But for the most part,
if you cluster on the identity column you will NOT be able to benefit from
the nice things that clustering can bring to a select.. So most of the time
I cluster on a WHERE clause column which gives the biggest bang ( deciding
which one is sometimes tough tho.)
Here is the exception:
IF the primary performance item for this table is INSERT SPEED, then the
best way to get that to be fast is to cluster on the identity column. This
cuts page allocation in half, because of a special algorythm used in the
Engine... THe inserts will be fast, but the cost will be select speed...
Reasonable people differ here... You should choose you method ( in my
opinion) based on the performance priorities for the table..
hope this helps..
Wayne Snyder, MCDBA, SQL Server MVP
Mariner, Charlotte, NC
www.mariner-usa.com
(Please respond only to the newsgroups.)
I support the Professional Association of SQL Server (PASS) and it's
community of SQL Server professionals.
www.sqlpass.org
"Geoff N. Hiten" <SRDBA@.Careerbuilder.com> wrote in message
news:e5NK88EwEHA.2804@.TK2MSFTNGP14.phx.gbl...
> I prefer the exact opposite. I think hot spots (up to a point) are a good
> thing. Hot spotting data gives the cache manager something to grab hold
of.
> Clustering on an Identity column also guarantees inserts are at the end of
> the table, thus avoiding the dreaded page split. Hot spotting was very
bad
> under older versions of SQL, but SQL 2000 copes with it pretty well.
> Clustering on a narrow unique key also has benefits, especially if you
have
> a lot of non-clustered indexes on a table.
> Having said that, I do not use the identity column as my Primary Key. I
> create a PK constraint using a non-clustered index on a natural key.
> --
> Geoff N. Hiten
> Microsoft SQL Server MVP
> Senior Database Administrator
> Careerbuilder.com
> I support the Professional Association for SQL Server
> www.sqlpass.org
> "rkusenet" <rkusenet@.sympatico.ca> wrote in message
> news:2unifgF2c9ng0U1@.uni-berlin.de...
>|||Good point. Range queries on the clustered index are going to be faster if
you cluster on a natural key. I have one severely de-normalized
transactional table that I had to cluster on the most used foreign key to
get any decent performance. On a properly normalized schema, I find that
more complex queries run faster clustering on an Identity column. It allows
very fast index comparisons so you can leverage multiple indexes to create a
'virtual' covered index. It also increases index density so you get more
effective use of cache memory.
As always, your application will determine your specific needs. Test, test,
and test again so you know what your application is doing to the system. Be
ready to make changes and measure again so you can make real comparisons and
recommendations.
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
I support the Professional Association for SQL Server
www.sqlpass.org
"Wayne Snyder" <wayne.nospam.snyder@.mariner-usa.com> wrote in message
news:%23FdcEtNwEHA.3908@.TK2MSFTNGP12.phx.gbl...
> And I fall exactly opposite of Geof... Here are my reasons... When the
> table performance is targeted for quick queries, as opposed to fast
inserts
> ( which is most of the time in my experiecnce)... Then don't clustere on
the
> identity column, cluster on a Where clause key becuase..
> clustered indexes are useful when
> you are returning many rows in your queries
> there are many duplicate clustered keys ( since they are stored together
> they will be returned together)
> you are doing many range searches
> you are doing tons or order by on the clustered key...
> Are any of these true when you use an identity column? How many times will
> you do select * from emp where empid > 50 ,,, or empid between 100 and 500
> (rarely if ever)... You might be ordering by empid... But for the most
part,
> if you cluster on the identity column you will NOT be able to benefit from
> the nice things that clustering can bring to a select.. So most of the
time
> I cluster on a WHERE clause column which gives the biggest bang (
deciding
> which one is sometimes tough tho.)
> Here is the exception:
> IF the primary performance item for this table is INSERT SPEED, then the
> best way to get that to be fast is to cluster on the identity column. This
> cuts page allocation in half, because of a special algorythm used in the
> Engine... THe inserts will be fast, but the cost will be select speed...
> Reasonable people differ here... You should choose you method ( in my
> opinion) based on the performance priorities for the table..
> hope this helps..
>
>
> --
> Wayne Snyder, MCDBA, SQL Server MVP
> Mariner, Charlotte, NC
> www.mariner-usa.com
> (Please respond only to the newsgroups.)
> I support the Professional Association of SQL Server (PASS) and it's
> community of SQL Server professionals.
> www.sqlpass.org
> "Geoff N. Hiten" <SRDBA@.Careerbuilder.com> wrote in message
> news:e5NK88EwEHA.2804@.TK2MSFTNGP14.phx.gbl...
good[vbcol=seagreen]
> of.
of[vbcol=seagreen]
> bad
> have
>|||Geoff N. Hiten wrote:
> Good point. Range queries on the clustered index are going to be
> faster if you cluster on a natural key. I have one severely
> de-normalized transactional table that I had to cluster on the most
> used foreign key to get any decent performance. On a properly
> normalized schema, I find that more complex queries run faster
> clustering on an Identity column. It allows very fast index
> comparisons so you can leverage multiple indexes to create a
> 'virtual' covered index. It also increases index density so you get
> more effective use of cache memory.
> As always, your application will determine your specific needs.
> Test, test, and test again so you know what your application is doing
> to the system. Be ready to make changes and measure again so you can
> make real comparisons and recommendations.
>
Just to add my 2 cents. Clustering on any column that changes causes
sever fragmentation of the table. So if the OP chooses to cluster on a
column(s) used to filter queries, make sure that those columns are not
likely to change over time.
The OP should also realize that the clustered key is incorporated into
all non-clustered indexes, so it should be as short as possible.
David Gugick
Imceda Software
www.imceda.com|||In message <2unifgF2c9ng0U1@.uni-berlin.de>, rkusenet
<rkusenet@.sympatico.ca> writes
>According to Brian Knight in "SQL Server 2000 for Experienced DBAs",
>"Avoid creating clustered indexes on identity columns.
> Clustered indexes perform better on range queries,
> such as a date. When you have a clustered index on
> an identity column, you risk your data receiving hot
> spots, which are caused by many people updating the
> same data page."
>I would like to know what is the consenus in this forum on this.
>
It really depends on your requirements and usage.
In an OLAP environment I would agree with his comment however in an OLTP
environment this would cause performance issues on large tables. What is
worse is you increase the risk of index fragmentation and therefore
require more frequent rebuilds.
If the table is used for linking purposes in an OLTP environment with
few inserts then again his statement would hold true.
It all depends ...
Kind Regards,
--
Andrew D. Newbould E-Mail: newsgroups@.NOSPAMzadsoft.com
ZAD Software Systems Web : www.zadsoft.com

Thursday, March 22, 2012

cluster index and identity

No comments:

Post a Comment

Clearing LOG Files

Blog Archive

About Me