smartmontools is an open-source tool for controlling and monitoring disks. It runs on Linux, Unix, BSD, Solaris, Mac OS, OS/2, Cygwin and Windows, and it can also be run from a boot CD or boot floppy. It supports ATA/ATAPI/SATA disks conforming to ATA-3 through ATA-8, SCSI disks, and tape devices. Its home page is smartmontools.sourceforge.net. It is actually a package containing two utilities: smartctl and smartd. The disks it monitors must support S.M.A.R.T. Virtually all current hard disks do, but the feature is often not enabled by default. There are two ways to enable it: 1) through a BIOS setup option, or 2) with the smartctl command. With smartmontools you can test a disk's health and get a warning before a failure occurs.
Using smartmontools
1. Start the monitoring daemon
# /etc/init.d/smartd start
Starting smartd: [ OK ]
2. Check whether the disk supports SMART
Most hard disks manufactured after 1993 support SMART. Use the following command to check:
# smartctl -i /dev/hdb
smartctl version 5.33 [i686-turbo-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: ST3160212A
Serial Number: 5LS2EDKN
Firmware Version: 3.AAE
User Capacity: 160,041,885,696 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Mon Sep 17 02:13:37 2007 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
The output above shows that this disk supports SMART and that it is currently enabled. If the output says "SMART support is: Disabled", SMART is not turned on. Run the following command to enable it:
# smartctl --smart=on --offlineauto=on --saveauto=on /dev/hdb
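The check described above can also be scripted. The following is a minimal sketch, not part of smartmontools: the helper name smart_state is hypothetical, and it assumes the "SMART support is:" lines look like the ones in the sample output above.

```shell
#!/bin/sh
# smart_state (hypothetical helper): reads `smartctl -i` output on stdin
# and prints "enabled", "disabled", or "unavailable".
# Assumption: the status lines start with "SMART support is:" as in the
# sample output shown in this article.
smart_state() {
    awk '
        /^SMART support is:[ \t]*Enabled/  { state = "enabled" }
        /^SMART support is:[ \t]*Disabled/ { state = "disabled" }
        END { print (state == "" ? "unavailable" : state) }
    '
}

# Example use (needs root and a real device; /dev/hdb is the device
# used throughout this article):
#   if [ "$(smartctl -i /dev/hdb | smart_state)" = "disabled" ]; then
#       smartctl --smart=on --offlineauto=on --saveauto=on /dev/hdb
#   fi
```

The function only parses text, so it can be tried on saved output without touching a disk.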
3. Check the disk's health status
# smartctl -H /dev/hdb
smartctl version 5.33 [i686-turbo-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
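Note the value after "result:". PASSED means the disk is in good health. If it shows FAILED! instead, it is best to replace the disk in the server right away. SMART can only report that a disk is no longer healthy; how long the disk will keep running after the warning is uncertain. The SMART thresholds usually leave some margin, so a disk does not die the moment it raises a warning and can generally hold on for a while. Some disks have run for years after a SMART warning, and others have failed within days, so never count on luck. Run the following command to see the detailed attributes: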
# smartctl -A /dev/hdb
smartctl version 5.33 [i686-turbo-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 114 100 006 Pre-fail Always - 81812244
3 Spin_Up_Time 0x0003 100 099 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 257
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 64781708
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 4365
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 276
187 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
189 Unknown_Attribute 0x003a 100 100 000 Old_age Always - 0
190 Unknown_Attribute 0x0022 058 053 045 Old_age Always - 773324842
194 Temperature_Celsius 0x0022 042 047 000 Old_age Always - 42 (Lifetime Min/Max 0/21)
195 Hardware_ECC_Recovered 0x001a 052 048 000 Old_age Always - 1562815
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
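FLAG shows the attribute's flag bits. The normalized value (VALUE) should stay above the threshold (THRESH); an attribute counts as failed when VALUE is less than or equal to THRESH. WHEN_FAILED records whether an attribute has failed. In the output above every entry in the WHEN_FAILED column is "-", which means no attribute has reached its threshold and the disk shows no fault. If WHEN_FAILED shows FAILING_NOW or In_the_past, the attribute is at or below its threshold now or was at some earlier time. For a Pre-fail attribute this suggests the disk may fail soon, for example because of a large number of bad sectors.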
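The threshold rule can be applied to the attribute table automatically: an attribute is suspect when VALUE is less than or equal to THRESH, or when WHEN_FAILED is anything other than "-". The sketch below assumes the ten-column layout shown above; the function name failing_attributes is hypothetical.

```shell
#!/bin/sh
# failing_attributes (hypothetical helper): reads `smartctl -A` output on
# stdin and prints one line per attribute that is at or below its
# threshold, or whose WHEN_FAILED column is not "-".
# Assumed columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
failing_attributes() {
    awk '
        # Attribute rows start with a numeric ID and a 0x-prefixed flag,
        # which skips the banner and the column header.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x/ {
            value  = $4 + 0      # "058" becomes 58
            thresh = $6 + 0
            # A threshold of 0 means the attribute can never trip.
            if ((thresh > 0 && value <= thresh) || $9 != "-")
                print $1, $2, "value=" value, "thresh=" thresh, "when_failed=" $9
        }
    '
}

# Example use:
#   smartctl -A /dev/hdb | failing_attributes
```

For the sample table in this article the function prints nothing, because every VALUE is above its THRESH and every WHEN_FAILED entry is "-".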
4. Test the disk
There are four ways to run a disk self-test by hand:
smartctl -t short       Short test, runs in the background
smartctl -t long        Long test, runs in the background
smartctl -C -t short    Short test, runs in the foreground
smartctl -C -t long     Long test, runs in the foreground
For example, to run a thorough check of the disk in the background:
# smartctl -t long /dev/hdb
smartctl version 5.33 [i686-turbo-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 54 minutes for test to complete.
Test will complete after Mon Sep 17 03:53:32 2007
Use smartctl -X to abort test.
The output above shows that the check will finish in 54 minutes and that smartctl -X can be used to abort it. To abort a running test:
# smartctl -X /dev/hdb
smartctl version 5.33 [i686-turbo-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Abort SMART off-line mode self-test routine".
Self-testing aborted!
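When a test is started from a script, the estimated duration can be read from the "Please wait ... minutes" line so the script knows when to look at the results. This is only a sketch: the function name selftest_wait_minutes is hypothetical, and it assumes the wording of that line matches the sample output above.

```shell
#!/bin/sh
# selftest_wait_minutes (hypothetical helper): reads `smartctl -t ...`
# output on stdin and prints the estimated test duration in minutes.
# Prints nothing if the "Please wait" line is not present.
selftest_wait_minutes() {
    awk '/^Please wait [0-9]+ minutes? for test to complete/ { print $3; exit }'
}

# Example use (needs root and a real device):
#   minutes=$(smartctl -t long /dev/hdb | selftest_wait_minutes)
#   if [ -n "$minutes" ]; then
#       sleep $((minutes * 60))
#       smartctl -l selftest /dev/hdb
#   fi
```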
5. View the disk logs
Use "smartctl -l logtype" to view the disk's logs. There are several log types, such as selftest and error. For example, to view the self-test log:
# smartctl -l selftest /dev/hdb
smartctl version 5.33 [i686-turbo-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Aborted by host 90% 4365 -
# 2 Extended offline Completed without error 00% 4247 -
# 3 Short offline Aborted by host 30% 4246 -
# 4 Short offline Aborted by host 10% 4246 -
# 5 Extended offline Completed without error 00% 4229 -
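A self-test log like the one above can be summarized with a few lines of awk. This is a minimal sketch with a hypothetical function name, selftest_summary; it recognizes only the two status strings that appear in the sample log and counts every other status as "other", which is worth looking at by hand.

```shell
#!/bin/sh
# selftest_summary (hypothetical helper): reads `smartctl -l selftest`
# output on stdin and counts log entries by outcome.
selftest_summary() {
    awk '
        # Log entries look like "# 1 Extended offline ...".
        /^# *[0-9]+ / {
            if (/Completed without error/) passed++
            else if (/Aborted by host/)    aborted++
            else                           other++
        }
        END { printf "passed=%d aborted=%d other=%d\n", passed, aborted, other }
    '
}

# Example use:
#   smartctl -l selftest /dev/hdb | selftest_summary
```

For the five entries shown above the summary is passed=2 aborted=3 other=0.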
To view the disk's error log:
# smartctl -l error /dev/hdb
smartctl version 5.33 [i686-turbo-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged
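The error log can be checked from a script in the same way. The sketch below uses a hypothetical function name, error_log_count. It relies on the "No Errors Logged" line shown above and on the assumption that a log with entries contains a line starting with "ATA Error Count:", which is not shown in this article.

```shell
#!/bin/sh
# error_log_count (hypothetical helper): reads `smartctl -l error` output
# on stdin and prints the number of logged errors, or "unknown" if
# neither expected line is found.
# Assumption: a non-empty log has a line such as "ATA Error Count: 3".
error_log_count() {
    awk '
        /^No Errors Logged/ { count = 0;  found = 1 }
        /^ATA Error Count:/ { count = $4; found = 1 }
        END { if (found) print count; else print "unknown" }
    '
}

# Example use:
#   if [ "$(smartctl -l error /dev/hdb | error_log_count)" != "0" ]; then
#       echo "check the error log of /dev/hdb"
#   fi
```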