[b]
How you can improve performance on the server
Do not be deceived by the apparent fast performance in your development environment. Unless you have a great testing and staging environment - and most people don't - performance problems often do not manifest themselves until you reach the production environment, and by then it's too late. Once the problem has reached the customer it costs at least 100 times as much to fix, so use techniques covered in this article to help prevent the problem in the first instance.
[b][size=18]
Indexes
It is very important your index supports the way your query accesses the table. In the example below, the table is keyed on CustomerID only with no secondary indexes. If you look up an individual record by CustomerID performance will be optimal, however if you access the data any other combination of columns then your database performance is going to be relatively poor, no matter what techniques you employ when programming your components.
CODE
SELECT * FROM Customers WHERE City = 'London'
SELECT * FROM Customers WHERE CompanyName = 'Around the Horn'
Both example queries above force the database engine to search though every row in the table to find the record(s) matching the criteria as there are no indexes on the CompanyName or City columns. Even though the second query would only ever return one matching company name, the database engine does not know this so it searches the entire table to see if there are more matching records. This searching process is called a table scan and is reflected in the query plan shown below:
SQL Server has two types of indexes: clustered and nonclustered. A clustered index keeps the data rows of a table physically stored in index order. Because there is only one physical order to the rows in a table, there can be a maximum of one clustered index per table. All other indexes on a table are nonclustered.
[b][size=18]
Clustered Indexes
A clustered index is usually the most efficient method for finding a group of rows in a table, so choosing the column(s) by which to define the clustered index is an important part of application tuning. Be careful creating clustered indexes arbitrarily as each non-clustered index automatically inherits the keyed columns from the clustered index – this can make your non-clustered indexes very large – try to minimise your use of wide clustered indexes, especially if you have multiple non-clustered indexes on the same table.
You may choose to put the clustered index on a foreign key field where you wish to quickly scan a range of values - for example a particular batch, site, or customer, rather than jumping around the table retrieving data from various regions. In the example table above, you might place a clustered index on City so that all records in the same city are grouped together.
[b][size=18]
Non-Clustered Indexes
When accessing the data though a non-clustered index, the db engine must first lookup the appropriate value in the index, then go to the table to read the actual data. While this lookup is an overhead, it is still much faster than performing a table scan (looking through the entire table) to find each record. Most database engines are clever enough to skip the lookup on the actual data table if the index contains all the columns necessary to satisfy the query. Purposely creating an index that contains all the columns a query may want is called creating a covering index, as you are effectively creating a new copy of the table sorted in a different order.
Remove unused nonclustered indexes as they slow down inserts, updates, and deletes on a table (remember that data must be inserted/updated/deleted from the index too). Unused indexes also consume space on disk and possibly in memory so remove them if they are not being utilized effectively.
[b][size=18]
Table Hot Spots – (Clustered and Non-Clustered Indexes)
Be wary of creating transactional “hot-spots” on your tables in high-volume OLTP environments, where you may be inserting lots of rows at a particular point in the table – usually at the end. Common examples of this are found in tables with counter (auto-increment) fields. If the data is keyed and physically sorted by the counter field then new records are constantly appended in the same data-page (area). This adversely affects performance (creates locking contention) especially when many sessions are trying to access the area simultaneously.
[b][size=18]
Distribution Statistics
An index includes distribution statistics about the data values that is used by the database engine to determine if a particular index will be useful to solve a query. When a large number of rows are added or changed after an index is defined on a table, these statistics will be inaccurate until you run an UPDATE STATISTICS statement (ms sql 7.0). Out of date statistics can mislead the server into using a poor index or a table scan to resolve a query, causing poor query performance and reduced concurrency. An update operation that uses a table scan instead of an index, for example, will lock the entire table for the duration of the transaction.
It is important to update statistics regularly on the active tables in your application. Microsoft SQL Server can be setup to do this automatically using the database maintenance wizard.
[b][size=18]
Invalidating Your Indexes
If you’ve got indexes, use them – Applying functions on indexed columns is a common mistake that can invalidate the use of your indexes. A couple of examples are:
CODE
..WHERE LEFT( customer_name, 2 ) = ‘AN’ Will not use the indexes – forcing a table scan
..WHERE amount * 1.125 > $100 Again, amount may be in the index but amount multiplied by 1.125 is not, so the index can’t be used.
These could be re-written several ways, but perhaps the following is the best
CODE
..WHERE customer_name LIKE ‘AN%’
..WHERE amount > $112.50
This rule applies in certain cases where you are joining two tables with different datatypes, or when trimming character data.
CODE
WHERE CONVERTt(CHAR (10), table1.customer_no) = CONVERT (CHAR (10), table2.customer_no)
In this instance, good database design will ensure your tables both have the correct datatype.
[b][size=18]
Blocking Locks and Deadlocks
Applications use database locks to control data integrity in multiuser concurrency situations. Poor database design coupled with uncoordinated business processes can cause crippling database lock contention and destroy your application's performance.
Most databases offer support for locking records at different levels including row, page, and table locking. While normal processing in a multiuser environment can conveniently use locks, your application will run faster without locks. Considerable design effort should be applied to table design, indexing strategy, and query optimisation to avoid locking situations.
The following topics discuss blocking locks, deadlocks, and present some design strategies for avoiding them.
[b][size=18]
Blocking Locks
A blocking lock occurs when one lock causes another process to wait (in a queue) until the current process is entirely done with the resources. As soon as the first process is complete, the blocked process resumes operation. In a normal server environment, infrequent blocking locks are acceptable. But if blocking locks are common (rather than infrequent), there is probably some kind of design or query implementation problem.
Some of the common design problems that cause blocking locks are very wide tables, insufficient indexes, and tables that are not completely normalised. If your application is using SQL Server, you can monitor the total number of blocking locks by using the User Activity Monitor in Microsoft SQL Server's SQL Enterprise Manager, or sp_lock in transact sql to quickly identify all locks on the system, or for a specific process.
[b][size=18]
Deadlocks
Consider the situation where one partially finished transaction must wait for another transaction to complete. At the same time, the other partially finished transaction must also wait for the original transaction to complete. This is called a deadlock: when each transaction is waiting for resources used by the other. When such a deadlock occurs, the database typically cancels one of the transactions automatically.
Long simultaneous transactions that lock and unlock resources must be carefully designed to avoid deadlocks.
[b][size=18]
Client / Server Deadlocks
While most databases do a good job of automatically identifying deadlocks, they will not detect a “client/server” deadlock. These occur when an application creates two connections to the database, and a transaction on one connection (possibly run asynchronously, but not necessarily) causes a transaction on the other two wait indefinitely. The net result is your application will stall either forever, or until it times out.
[b][size=18]
Escalation
Escalation is the process of converting many fine-grain locks into fewer coarse-grain locks, reducing system overhead. Most databases automatically escalate row locks and page locks into table locks when a transaction exceeds its escalation threshold.
This can be the cause of blocking locks. If you have many page level locks; SQL may try to obtain a table level lock but may not be able to due to another process holding a single exclusive lock on a different page – even if your application many not be needing to use that page.
Lock-Avoiding Design Strategies
Most blocking problems happen because a single process holds locks for an extended period of time, causing a chain of blocked processes, all waiting on other processes for locks. There are a few design strategies that can reduce the occurrence of blocking locks and deadlocks:
· Use clustered indexes on high-usage tables.
· [b]
Make sure that UPDATE and DELETE statements use an index If not, the database may not be able to perform row-level locking. Without an index at all, the entire table will be locked in a write operation, with a poor index, the page (or even table) would be locked rather than a single row.
· [b]
Ensure there are not commit or rollback conflicts in nested transactions.
· [b]
Use stored procedures. Stored procedures are already parsed, normalized, and compiled, and they run very quickly.
· [b]
Put data service components on the database computer. Deploying data service components on the same computer as the database engine can greatly reduce network traffic and improve overall data throughput.
· [b]
Keep lookup tables locally. There are usually instances in which read-only data is accessed fairly often by your application. By keeping that data locally, lookups become extremely fast.
· [b]
Do not submit queries with long execution times. Commonly this happens when intermixing decision support queries and online transaction processing (OLTP) queries on the same database. Look for ways to optimise the query, by changing indexes, breaking a large, complex query into simpler queries, or running the query during off hours or on a separate computer.
· [b]
Rollback or commit queries that were cancelled. If you cancel a query, for example, using the Open Database Connectivity (ODBC) sqlcancel function, also issue the required number of ROLLBACK and COMMIT statements. Cancelling the query does not automatically roll back or commit the transaction. All locks acquired within the transaction are retained after the query is cancelled. Applications must properly manage transaction nesting levels by committing or rolling back cancelled transactions.
· [b]
Process all returns to completion. Especially a problem with cursors where you have asked the server for a range of rows but only fetched some of them, leaving locks on the tables potentially blocking other users.
Some Last Resort Options – Break out the emergency kit
· [b]
Avoid high row count SQL statements that can cause a table lock. due to decreased performance (but increased concurrency).
· [b]
Break long transactions up into many shorter transactions. With SQL Server, you can use "bound connections" to control the execution sequence of the shorter transactions. Shorter transactions are the key; bound connections should only be used as a last resort in specialist situations.
[b][size=18]
SQL Server Network Libraries
When you install the SQL Server ODBC driver on a computer, the only network library usually installed is Named Pipes; for performance reasons, you should install additional libraries. The most direct way to get the additional network libraries is to install the SQL Server client utilities, which will install the additional network libraries. After installation, they are available through the ODBC Data Source Administrator. The network library with the best performance is TCP/IP.
Many larger corporate networks run multiple protocols to accommodate different generations of network expansion. Some consideration should be given to the interaction of these protocols. For example, some servers running IPX will continually announce themselves with a broadcast message. This message leads to additional network traffic that may have an adverse effect on system performance. Eliminating other protocols and just running TCP/IP may not be practical, but it may be possible to segment or isolate different sections of the network to eliminate this additional network traffic in areas your applications are running.