Commits · 63c0c05a29750b86241eca58cde4ea494b65a94e · Summer2022 / 22b970497

Sep 28, 2022

scsi: hisi_sas: Modify v3 HW ATA completion process when SATA disk is in error status · 63c0c05a

Xingui Yang authored 2 years ago

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q63H
CVE: NA

-------------------------------------

When an NCQ error occurs, the SAS controller abnormally completes the I/Os
newly delivered to the disk, and bit8 in CQ dw3 is set to 1 to indicate
that the SATA disk is in error status. The current processing flow sets
ts->stat to SAS_OPEN_REJECT, and sas_ata_task_done() then sets the FIS
status to ATA_ERR. After analysis by ata_eh_analyze_tf(), err_mask is set
to AC_ERR_HSM. If a media error occurs four times within 10 minutes and
the chip rejects new I/Os four times, NCQ is disabled due to excessive
errors.

However, repeated media errors alone should not cause NCQ mode to be
disabled. Therefore, use sas_task_abort() to handle abnormally completed
I/Os when the SATA disk is in error status.

[10253.397429] hisi_sas_v3_hw 0000:b4:02.0: erroneous completion disk err...

63c0c05a

Aug 25, 2022

scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hw · 9a26603c

Xingui Yang authored 2 years ago

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5NIU1


CVE: NA

-------------------------------------

When CQ header dw3 SATA_DISK_ERR is set, it means this SATA disk is in
error state and the current IPTT is invalid. An invalid IPTT does not
correspond to any slot. In this scenario, new I/Os delivered to the disk
will be rejected by the controller, and all I/Os remaining on the disk
should be aborted, which we add here with the ata_link_abort() call.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

9a26603c

Revert "scsi: hisi_sas: Modify v3 HW I/O processing when SATA_DISK_ERR bit is... · 338ab8d0

Xingui Yang authored 2 years ago

Revert "scsi: hisi_sas: Modify v3 HW I/O processing when SATA_DISK_ERR bit is set and NCQ Error occurs"

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5NIU1



-----------------------------

This reverts commit fc811070.

After an error occurs in the NCQ scenario, a policy similar to that of the
ahci driver is used to handle NCQ exceptions: the libata interface
ata_link_abort() is invoked to process all I/Os on the link. Compared with
the current solution, this needs a smaller code change and makes it less
likely that NCQ is disabled, so the risk of performance degradation is
lower. So revert it.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

338ab8d0

Jul 07, 2022

scsi: hisi_sas: Modify v3 HW I/O processing when SATA_DISK_ERR bit is set and NCQ Error occurs · fc811070

Xingui Yang authored 2 years ago

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5CG2F


CVE: NA

-----------------------------------------------------------------------

SATA_DISK_ERR is bit16 of CQ dw3. When it is set to 1, it means this SATA
disk is in error status and the IPTT is invalid, for example after an NCQ
error. In this scenario, new I/Os issued to this disk will be rejected by
the SAS controller, and all I/Os remaining in the disk should be aborted.

To ensure the SAS controller does not operate on memory before all I/Os
are aborted, all I/Os remaining in the disk should be set to the aborted
state by register and completed with state SAS_ABORTED_TASK through
task_done(). The SCSI error handling thread is then woken up immediately
to analyze the cause of the error, for example by reading the log page for
error details.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

fc811070

scsi: hisi_sas: enable use_clustering · 5f32f8b7

Xingui Yang authored 2 years ago

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5B468


CVE: NA

------------------------------------------------

Enable "clustering", that is, merging of segments so that they might span
more than a single page. This fixes the issue that a 520 KB I/O request
is split.

fio test with --filename=/dev/sdb --bs=520k --iodepth=32

before:
[root@localhost ~]# cat /sys/block/sdb/queue/max_segment_size
4096

[root@localhost ~]# iostat -x
Device ... r_await rareq-sz ... aqu-sz  %util
sdb    ... 29.78   259.89   ... 5.87    9.92

after:
[root@localhost ~]# cat /sys/block/sdb/queue/max_segment_size
65536

[root@localhost ~]# iostat -x
Device ... r_await rareq-sz ... aqu-sz  %util
sdb    ... 29.80   516.03   ... 1.34    4.50
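
As a rough illustration of why the segment size matters here (a sketch,
not driver code): the minimum number of scatter-gather segments a request
needs is its size divided by the maximum segment size, rounded up. The
helper name segments_needed() is hypothetical, and the calculation assumes
every segment can be filled completely, which is the best case.

```c
#include <stddef.h>

/*
 * Minimum number of segments needed to describe `bytes` of data when each
 * segment may cover at most `max_segment_size` bytes (ceiling division).
 * Returns 0 if max_segment_size is 0.
 */
size_t segments_needed(size_t bytes, size_t max_segment_size)
{
    if (max_segment_size == 0)
        return 0;
    return (bytes + max_segment_size - 1) / max_segment_size;
}
```

For a 520 KB request, segments_needed(520 * 1024, 4096) gives 130
segments, while segments_needed(520 * 1024, 65536) gives 9. Whether 130
segments exceeds the limit that forces a split depends on the host's
scatter-gather table size, which this commit message does not state.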

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

5f32f8b7

scsi: hisi_sas: Change DMA setup lock timeout to 2.5s · 510ebd8e

Xingui Yang authored 2 years ago

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5BXH1


CVE: NA

-------------------------------------------------------------------

DMA setup lock timeout protection is applied when DMA setup frames are
received. It is a function outside the protocol and is used to prevent
SATA disk I/Os from being delivered for a long time. The default value of
100ms is too strict and easily triggers a timeout when the disk is
overloaded or faulty. Based on the average I/O latency of 300 disks, we
adjust the value to 2.5s.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

510ebd8e

May 13, 2022

scsi: hisi_sas: Change hisi_sas_control_phy() phyup timeout · 1bc9fed3

Xiang Chen authored 2 years ago

mainline inclusion
from mainline-5.17-rc1
commit 512623de5239
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4UT5N
CVE: NA

-------------------------------------------

The time of phyup depends not only on the controller but also on the type
of disk connected. As an example, from experience, for some SATA disks the
amount of time from reset/power-on to receiving the D2H FIS for phyup can
sometimes take up to, and even more than, 10s. According to the
specification of some SATA disks such as ST14000NM0018, the max time from
power-on to ready is 30s.

Based on this, the current phyup timeout of 2s is not enough. So set the
value to HISI_SAS_WAIT_PHYUP_TIMEOUT (30s) in hisi_sas_control_phy().

For v3 hw there is a pre-existing workaround for a HW bug, being that we
issue a link reset when the OOB occurs but the phyup does not. The current
phyup timeout is HISI_SAS_WAIT_PHYUP_TIMEOUT. So if this does occur from
when issuing a phy enable or similar via hisi_sas_control_phy(), the
subsequent HW workaround linkreset processing calls hisi_sas_control_phy(),
but this will pend the original phy reset timing out, so it is safe.

Link: https://lore.kernel.org/r/1645703489-87194-3-git-send-email-john.garry@huawei.com


Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
conflict:
	drivers/scsi/hisi_sas/hisi_sas.h
	drivers/scsi/hisi_sas/hisi_sas_main.c
Reviewed-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4.19.90-2205.4.0

1bc9fed3

scsi: hisi_sas: Fix SAS disk sense info print incorrectly sometimes · 8fdd6417

Xingui Yang authored 2 years ago

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ZO9V


CVE: NA

----------------------------------

Sometimes the disk responds with sense info, but the driver prints "data
underflow without sense", which is not correct. Use scsi_normalize_sense()
instead of hisi_sas_get_sense_data() to parse the sense info.

before:
data underflow without sense, rsp_code:0xf0, rc:-2.

after:
data underflow, rsp_code:0x70, sensekey:0x5, ASC:0x21, ASCQ:0x0.
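
The "before" log shows rsp_code 0xf0 and the "after" log shows 0x70, which
is consistent with masking off bit 7 of the first sense byte before
checking the format. The following is a minimal userspace sketch of that
kind of normalization, assuming the standard fixed (0x70/0x71) and
descriptor (0x72/0x73) sense layouts. The struct and function names are
hypothetical; this is not the kernel's scsi_normalize_sense()
implementation.

```c
#include <stddef.h>
#include <stdint.h>

struct sense_hdr {
    uint8_t response_code; /* first sense byte with bit 7 masked off */
    uint8_t sense_key;
    uint8_t asc;
    uint8_t ascq;
};

/*
 * Returns 1 when buf holds recognizable sense data and fills hdr,
 * 0 otherwise. Fields that lie beyond len are left as 0.
 */
int normalize_sense(const uint8_t *buf, size_t len, struct sense_hdr *hdr)
{
    if (buf == NULL || hdr == NULL || len == 0)
        return 0;

    hdr->sense_key = 0;
    hdr->asc = 0;
    hdr->ascq = 0;
    hdr->response_code = buf[0] & 0x7f; /* 0xf0 becomes 0x70 */

    if ((hdr->response_code & 0x70) != 0x70)
        return 0;

    if (hdr->response_code >= 0x72) {
        /* Descriptor format: key, ASC, ASCQ in bytes 1..3. */
        if (len > 1)
            hdr->sense_key = buf[1] & 0x0f;
        if (len > 2)
            hdr->asc = buf[2];
        if (len > 3)
            hdr->ascq = buf[3];
    } else {
        /* Fixed format: key in byte 2, ASC/ASCQ in bytes 12/13. */
        if (len > 2)
            hdr->sense_key = buf[2] & 0x0f;
        if (len > 12)
            hdr->asc = buf[12];
        if (len > 13)
            hdr->ascq = buf[13];
    }
    return 1;
}
```

With a fixed-format buffer whose first byte is 0xf0, byte 2 is 0x05 and
byte 12 is 0x21, this yields response code 0x70, sense key 0x5, ASC 0x21
and ASCQ 0x0, matching the "after" line.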

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

8fdd6417

Dec 20, 2021

scsi: hisi_sas: Add support for sata disk I/O errors report to libsas · aaea70a1

Xingui Yang authored 3 years ago

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NDM8


CVE: NA

---------------------------
If the response frame has been written back to memory and carries the
disk error and status when a SATA I/O completes abnormally, set the task
stat to SAS_PROTO_RESPONSE and let libsas handle it.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Kangfenglong <kangfenglong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

aaea70a1

Nov 04, 2021

scsi: hisi_sas: print status and error when sata io abnormally completed · a54cfa2d

Xingui Yang authored 3 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

---------------------
To help debugging efforts, print d2h status and error

D2H:
      FIS Status Bits    =    0x53
        BSY =                  0... ....  Off
        DRDY =                 .1.. ....  On
        DF =                   ..0. ....  Off
        DSC =                  ...1 ....  On
        DRQ =                  .... 0...  Off
        Alignment Error =      .... .0..  Off
        Sense Data Available = .... ..1.  On
        ERR =                  .... ...1  On
      FIS Error Bits    =    0x40
        ICRC =    0... ....  Off
        UNC =     .1.. ....  On
        MC (O) =  ..0. ....  Off
        IDNF =    ...0 ....  Off
        MCR (O) = .... 0...  Off
        ABRT =    .... .0..  Off
        EOM =     .... ..0.  Off
        CCTO =    .... ...0  Off

Here is an example print:
hisi_sas_v3_hw 0000:74:02.0: sata d2h status 0x53, error 0x40
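
To read such a log line without the table at hand, the two bytes can be
decoded with a small lookup. This is a standalone sketch that follows the
bit layout listed above; the function names and the short labels "ALIGN"
and "SENSE" are made up for illustration and are not part of the driver.

```c
#include <stdint.h>
#include <string.h>

struct fis_bit {
    uint8_t mask;
    const char *name;
};

/* Bit names follow the D2H table in this commit message. */
static const struct fis_bit status_bits[] = {
    {0x80, "BSY"}, {0x40, "DRDY"}, {0x20, "DF"}, {0x10, "DSC"},
    {0x08, "DRQ"}, {0x04, "ALIGN"}, {0x02, "SENSE"}, {0x01, "ERR"},
};

static const struct fis_bit error_bits[] = {
    {0x80, "ICRC"}, {0x40, "UNC"}, {0x20, "MC"}, {0x10, "IDNF"},
    {0x08, "MCR"}, {0x04, "ABRT"}, {0x02, "EOM"}, {0x01, "CCTO"},
};

/* Write the names of all set bits into out, separated by spaces. */
static void decode_bits(uint8_t val, const struct fis_bit *tbl, size_t n,
                        char *out, size_t len)
{
    size_t i;

    if (len == 0)
        return;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        if (!(val & tbl[i].mask))
            continue;
        if (out[0] != '\0')
            strncat(out, " ", len - strlen(out) - 1);
        strncat(out, tbl[i].name, len - strlen(out) - 1);
    }
}

void decode_d2h_status(uint8_t status, char *out, size_t len)
{
    decode_bits(status, status_bits, 8, out, len);
}

void decode_d2h_error(uint8_t error, char *out, size_t len)
{
    decode_bits(error, error_bits, 8, out, len);
}
```

For the example print, status 0x53 decodes to "DRDY DSC SENSE ERR" and
error 0x40 decodes to "UNC".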

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kangfenglong <kangfenglong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a54cfa2d

Revert "scsi: hisi_sas: use threaded irq to process CQ interrupts" · 4e89e218

Xingui Yang authored 3 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

-----------------------------------------
This reverts commit 5868da04.

This optimization patch depends on the kernel block MQ patch.
If the block MQ patch is not integrated at the upper layer,
a spinlock deadlock may occur.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Kangfenglong <kangfenglong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

4e89e218

Nov 01, 2021

scsi: hisi_sas: unsupported DIX between OS and HBA only for SATA device · c2381170

Yang Xingui authored 3 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

Signed-off-by: Yang Xingui <yangxingui@huawei.com>
Reviewed-by: Ouyangdelong <ouyangdelong@huawei.com>
Reviewed-by: Kangfenglong <kangfenglong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c2381170

Oct 30, 2021

scsi: hisi_sas: queue debugfs dump work before FLR · 0ac786a7

Yang Xingui authored 3 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

The debugfs dump should be executed before FLR runs, because some
registers must be dumped before they are reset by FLR. It is therefore
wrong to queue the debugfs dump work while running the FLR work, since
both work items are queued on the same workqueue. That means the debugfs
dump work always executes after FLR and reads data that has already been
reset.

Signed-off-by: Yang Xingui <yangxingui@huawei.com>
Reviewed-by: Kangfenglong <kangfenglong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

0ac786a7

Sep 30, 2021

scsi: hisi_sas: Optimize the code flow of setting sense data when ssp I/O abnormally completed · b2ef7f6c

yangxingui authored 3 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

---------------------------
In the data underflow scenario, if correct sense data and response frame
have been written to the host memory and the CQ RSPNS_GOOD bit is 0,
then driver sends the sense data to the upper layer.

Signed-off-by: yangxingui <yangxingui@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b2ef7f6c

Sep 23, 2021

scsi: hisi_sas: set sense data when the sas disk's I/O abnormally completed · c0762a73

yangxingui authored 3 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

---------------------------
The sense data of the SAS disk is used by the kernel and upper layers to
apply policies to disks when I/O completes abnormally.

For example, in the logs below, if the driver passes the sense data to the
upper layer, the disk may be repaired by the remap policy of the disk
management system.

[Wed Sep 15 13:03:04 2021] hisi_sas_v3_hw 0000:74:02.0: erroneous
completion iptt=3342 task=        pK-error dev id=0
sas_addr=0x5541310520e0b000 CQ hdr: 0x1503 0xd0e 0x0 0x20000
Error info: 0x1200 0x0 0x0 0x40
[Wed Sep 15 13:03:04 2021] hisi_sas_v3_hw 0000:74:02.0: data underflow,
rsp_code:0x72, sensekey:0x3, ASC:0x11, ASCQ:0x0.

Signed-off-by: yangxingui <yangxingui@huawei.com>
Reviewed-by: ouyangdelong <ouyangdelong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c0762a73

Sep 15, 2021

scsi: hisi_sas: use threaded irq to process CQ interrupts · 5868da04

Xiang Chen authored 3 years ago

mainline inclusion
from mainline-v5.10-rc4
commit 81f338e9
category: bugfix
bugzilla: NA
CVE: NA

Currently IRQ_EFFECTIVE_AFF_MASK is enabled for ARM_GIC and ARM_GIC3, so
only a single target CPU in the affinity mask is allowed to process
interrupts and the interrupt thread, and the performance of using a
threaded irq is almost the same as with a tasklet. If the config is not
enabled, however, the interrupt thread is allowed to run on all the CPUs
in the affinity mask. In that situation performance improves (by about
20%).

Note: IRQ_EFFECTIVE_AFF_MASK is configured differently for different
architecture chips, and it seems better to make it easily configurable.

Link: https://lore.kernel.org/r/1579522957-4393-2-git-send-email-john.garry@huawei.com


Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Yang Xingui <yangxingui@huawei.com>
Reviewed-by: kangfenglong <kangfenglong@huawei.com>
Reviewed-by: ouyangdelong <ouyangdelong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

5868da04

Aug 20, 2021

scsi: hisi_sas: Flush workqueue in hisi_sas_v3_remove() · fa94a1ee

Luo Jiaxing authored 3 years ago

mainline inclusion
from mainline-v5.12-rc1
commit 6834ec8b23c3eb345936022d46179b9d371e2344
category: bugfix
bugzilla: 176139
CVE: NA

------------------------------------------------------------------------

If the controller reset occurs at the same time as driver removal, it may
be possible that the interrupts have been released prior to the host
softreset, and calling pci_irq_vector() there causes a WARN:

WARNING: CPU: 37 PID: 1542 /pci/msi.c:1275 pci_irq_vector+0xc0/0xd0
Call trace:
pci_irq_vector+0xc0/0xd0
disable_host_v3_hw+0x58/0x5b0 [hisi_sas_v3_hw]
soft_reset_v3_hw+0x40/0xc0 [hisi_sas_v3_hw]
hisi_sas_controller_reset+0x150/0x260 [hisi_sas_main]
hisi_sas_rst_work_handler+0x3c/0x58 [hisi_sas_main]

To fix, flush the driver workqueue prior to releasing the interrupts to
ensure any resets have been completed.

Link: https://lore.kernel.org/r/1611659068-131975-5-git-send-email-john.garry@huawei.com


Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ouyangdelong <ouyangdelong@huawei.com>
Signed-off-by: Nifujia <nifujia1@hisilicon.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

4.19.90-2108.7.0

fa94a1ee

Jun 11, 2021

scsi: libsas: add lun number check in .slave_alloc callback · efa2c015

Yufen Yu authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 51878
CVE: NA

-------------------------------------------------

We found that offlining a SATA device on a hisi_sas controller and then
scanning the host can probe 255 non-existent devices into the system.

[root@localhost ~]# lsscsi
  [2:0:0:0]    disk    ATA      Samsung SSD 860  2B6Q  /dev/sda
  [2:0:1:0]    disk    ATA      WDC WD2003FYYS-3 1D01  /dev/sdb
  [2:0:2:0]    disk    SEAGATE  ST600MM0006      B001  /dev/sdc

 1) echo "offline" > /sys/block/sdb/device/state
 2) echo "- - -" > /sys/class/scsi_host/host2/scan

Then, we can see another 255 non-existent devices in system:
  [root@localhost ~]# lsscsi
  [2:0:0:0]    disk    ATA      Samsung SSD 860  2B6Q  /dev/sda
  [2:0:1:0]    disk    ATA      WDC WD2003FYYS-3 1D01  /dev/sdb
  [2:0:1:1]    disk    ATA      WDC WD2003FYYS-3 1D01  /dev/sdh
  ...
  [2:0:1:255]  disk    ATA      WDC WD2003FYYS-3 1D01  /dev/sdjb

After REPORT LUN command issued to the offlin...

efa2c015

Jan 11, 2021

scsi: hisi_sas: fix logic bug when alloc device with MAX device num == 1 · 7903bb3a

Wang Chao authored 4 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

We find that when HISI_SAS_MAX_ITCT_ENTRIES is set to 1, not even one
device can be allocated. This does not match the intended code logic.

The log is as follows:

[ 3.565847] hsi_sas_v3_hw 0000:74:02.0: Adding to iommu group 0
[ 3.582037] scsi host0: hisi_sas_v3_hw
[ 4.794270] hisi_sas_v3_hw 0000:74:02.0: Enable MSI auo-affinity
[ 4.872986] hisi_sas_v3_hw 0000:74:02.0: phyup: phy5 link_rate=11
[ 4.879057] hisi_sas_v3_hw 0000:74:02.0: phyup: phy0 link_rate=11
[ 4.879117] sas: phy-0:5 added to port-0:0, phy_mask:0x20 (500e004aaaaaaa1f)
[ 4.885131] hisi_sas_v3_hw 0000:74:02.0: phyup: phy1 link_rate=11
[ 4.885145] sas: DOING DISCOVERY on port 0, pid:910
[ 4.891199] hisi_sas_v3_hw 0000:74:02.0: phyup: phy2 link_rate=11
[ 4.891203] hisi_sas_v3_hw 0000:74:02.0: phyup: phy3 link_rate=11
[ 4.891209] hisi_sas_v3_hw 0000:74:02.0: phyup: phy4 link_rate=11
[ 4.897510] hisi_sas_v3_hw 0000:74:02.0: fail alloc dev: max support 1 devices
[ 4.903335] hisi_sas_v3_hw 0000:74:02.0: phyup: phy6 link_rate=11
[ 4.903340] hisi_sas_v3_hw 0000:74:02.0: phyup: phy7 link_rate=11
[ 4.909402] sas: driver on host 0000:74:02.0 cannot handle device 500e004aaaaaaa1f, error:-22
[ 4.937404] sas: DONE DISCOVERY on port 0, pid:910, result:-22
[ 4.937409] sas: broadcast received: 0
[ 4.937410] sas: phy0 matched wide port0
[ 4.937414] sas: phy-0:0 added to port-0:0, phy_mask:0x21 (500e004aaaaaaa1f)
[ 4.937431] sas: DOING DISCOVERY on port 0, pid:910
[ 4.937581] hisi_sas_v3_hw 0000:74:02.0: fail alloc dev: max support 1 devices
[ 4.944774] sas: driver on host 0000:74:02.0 cannot handle device 500e004aaaaaaa1f, error:-22
[ 4.953395] sas: DONE DISCOVERY on port 0, pid:910, result:-22

We find that this extreme case was not considered when implementing
hisi_sas_alloc_dev(). The device ID is actually allocated starting from 1
instead of 0. As a result, if the value of HISI_SAS_MAX_ITCT_ENTRIES is 1,
the driver considers that the number of devices has reached the upper
limit and returns NULL for sas_dev.
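
The effect can be reproduced with a simplified model of a circular
allocator. The sketch below is an assumption about the general shape of
such a scan, not the code of hisi_sas_alloc_dev(); all names in it are
hypothetical. The "buggy" variant starts one past the last ID and stops
when it wraps back to it, so with a table of size 1 the loop body never
runs.

```c
#include <stddef.h>

#define SLOT_UNUSED 0
#define SLOT_USED   1

struct dev_table {
    int *state;  /* SLOT_UNUSED or SLOT_USED per device ID */
    int max;     /* table size, plays the role of the max entry count */
    int last_id; /* ID handed out most recently, 0 initially */
};

/* Scans last_id + 1 .. last_id - 1 and never looks at last_id itself. */
int alloc_dev_buggy(struct dev_table *t)
{
    int last, first, i;

    if (t->max <= 0)
        return -1;
    last = t->last_id;
    first = (t->last_id + 1) % t->max;
    for (i = first; i != last; i = (i + 1) % t->max) {
        if (t->state[i] == SLOT_UNUSED) {
            t->state[i] = SLOT_USED;
            t->last_id = i;
            return i;
        }
    }
    return -1;
}

/* Visits every slot exactly once, starting after last_id. */
int alloc_dev_fixed(struct dev_table *t)
{
    int n;

    if (t->max <= 0)
        return -1;
    for (n = 0; n < t->max; n++) {
        int i = (t->last_id + 1 + n) % t->max;

        if (t->state[i] == SLOT_UNUSED) {
            t->state[i] = SLOT_USED;
            t->last_id = i;
            return i;
        }
    }
    return -1;
}
```

With max set to 1 and one free slot, alloc_dev_buggy() returns -1 while
alloc_dev_fixed() returns ID 0, which mirrors the "fail alloc dev: max
support 1 devices" message in the log.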

Signed-off-by: Wang Chao <wangchao342@hisilicon.com>
Reviewed-by: Zhu Xiongxiong <zhuxiongxiong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

7903bb3a

scsi: hisi_sas: mask corresponding RAS interrupts for hilink DFX exception · 2a5c8402

Wang Chao authored 4 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

The BIOS has already masked the corresponding RAS interrupts since hilink
changed the exception type from NFE to DFX, so the SAS driver does the same.

Signed-off-by: Wang Chao <wangchao342@hisilicon.com>
Reviewed-by: Zhu Xiongxiong <zhuxiongxiong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

2a5c8402

scsi: hisi_sas: Directly trigger SCSI error handling for completion errors · c3ae3100

Wang Chao authored 4 years ago

driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

Abort failed commands in completion path. This avoids having to wait for
block layer timeouts and triggering the SCSI error handling thread.

Link: https://lore.kernel.org/r/1594627471-235395-2-git-send-email-john.garry@huawei.com


Signed-off-by: Wang Chao <wangchao342@hisilicon.com>
Reviewed-by: Zhu Xiongxiong <zhuxiongxiong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c3ae3100

scsi: hisi_sas: use wait_for_completion_timeout() when clearing ITCT · 47a693cb

Wang Chao authored 4 years ago

driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

Injecting 2-bit ECC errors causes confusion inside the SAS controller,
which then needs a host reset to recover. If a device goes away at the
same time as the 2-bit ECC errors are injected, we may not receive the
ITCT interrupt, so clear_itct_v3_hw() waits for completion forever. The
host reset also cannot occur because it cannot acquire hisi_hba->sem, so
the system hangs.

To solve the issue, use wait_for_completion_timeout() instead of
wait_for_completion(), and also don't mark the gone device as
SAS_PHY_UNUSED when the device is gone.

Link: https://lore.kernel.org/r/1571926105-74636-4-git-send-email-john.garry@huawei.com


Signed-off-by: Wang Chao <wangchao342@hisilicon.com>
Reviewed-by: Zhu Xiongxiong <zhuxiongxiong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

47a693cb

scsi: hisi_sas: Update all the registers after suspend and resume · fae0c8dd

Wang Chao authored 4 years ago

driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

After suspend and resume, the HW registers will be set back to their
initial value. We use init_reg_v3_hw() to set some registers, but some
registers are set via firmware in ACPI "_RST" method, so add reset handler
before init_reg_v3_hw().

Link: https://lore.kernel.org/r/1567774537-20003-7-git-send-email-john.garry@huawei.com


Signed-off-by: Wang Chao <wangchao342@hisilicon.com>
Reviewed-by: Zhu Xiongxiong <zhuxiongxiong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

fae0c8dd

scsi: flip the default on use_clustering · ad5211c6

Wang Chao authored 4 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

Most SCSI drivers want to enable "clustering", that is merging of
segments so that they might span more than a single page.  Remove the
ENABLE_CLUSTERING define, and require drivers to explicitly set
DISABLE_CLUSTERING to disable this feature.

Signed-off-by: Wang Chao <wangchao342@hisilicon.com>
Reviewed-by: Zhu Xiongxiong <zhuxiongxiong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ad5211c6

Apr 22, 2020

scsi: hisi_sas: do not reset the timer to wait for phyup when phy already up · 14750584

Luo Jiaxing authored 4 years ago


driver inclusion
category: bugfix
bugzilla: NA

-----------------------------------------------------------------------

We found that after phy up, the hardware reports another OOB interrupt
that is not followed by a phy up interrupt, like:

oob ready -> phy up -> DEV found -> oob ready -> wait phy up -> timeout

We run a link reset when the wait for phy up times out, which puts a
normal disk into reset processing. So we added a workaround in the code so
that this abnormal OOB interrupt does not start the timer that waits for
phy up.

Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: gao chuan <gaochuan4@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

14750584

Jan 12, 2020

hisi_sas: Solve the bug of hisi sas tried to access other's interrupt. · ce419f98

Gao Chuan authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

hisi_sas tries to allocate 16 affinity interrupts. When there are more than
16 CPU cores, it owns all the interrupts it allocated. However, when there
are fewer than 16 CPU cores, for example 8, only 8 affinity interrupts can be
allocated, and hisi_sas sets "nvecs = 8" to record the number. When resetting
the hisi_sas host, the driver tries to operate on all of its own interrupts,
but it used "queue_count" instead of "nvecs" to find them. Since "queue_count"
is always 16, it ended up operating on interrupts that belong to others.

Feature or Bugfix:Bugfix

Signed-off-by: Gao Chuan <gaochuan4@huawei.com>
Reviewed-by: zhouyupeng <zhouyupeng1@huawei.com>
Reviewed-by: luojian <luojian5@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>

ce419f98

Dec 27, 2019

scsi: hisi_sas: Put function hisi_sas_debugfs_exit() after free_irqs and... · 51a98e77

chenxiang authored 5 years ago and

谢秀奇 committed 5 years ago

scsi: hisi_sas: Put function hisi_sas_debugfs_exit() after free_irqs and destroy workqueue when removing hisi_sas driver

driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

Currently we call hisi_sas_debugfs_exit() to remove debugfs_dir before
freeing irqs and destroying the workqueue when removing the hisi_sas
driver. If a dump is triggered before hisi_sas_debugfs_exit() but
debugfs_work runs after it, the work may refer to the already removed
debugfs_dir, which causes a NULL pointer dereference.
To avoid this, call hisi_sas_debugfs_exit() after free_irqs and after
destroying the workqueue when removing the hisi_sas driver.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: John Garry <john.garry@huawei.com>

Feature or Bugfix:Bugfix

Signed-off-by: chenxiang (M) <chenxiang66@hisilicon.com>
Reviewed-by: huangdaode <huangdaode@hisilicon.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

51a98e77

hisi_sas: Change int to unsigned int to avoid the overflow risk of left shift Operators. · 33ee8871

gaochuan4 authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

"4 << 29" exceeds the max value of int type,
so "4" needs to be changed as unsigned int type.
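
For context: 4 << 29 equals 2^31, while the largest value a 32-bit int can
hold is 2^31 - 1, so the signed shift result is not representable and the
behavior is undefined in C. Doing the shift on an unsigned 32-bit type
avoids this. The sketch below uses a hypothetical 3-bit field at bits
31:29; the macro and function names are illustrative only.

```c
#include <stdint.h>

/* Hypothetical field: a 3-bit code stored in bits 31:29 of a register. */
#define CODE_SHIFT 29

/* Shift is performed on uint32_t, so code 4 yields 0x80000000 safely. */
uint32_t code_to_reg(uint32_t code)
{
    return (code & 0x7u) << CODE_SHIFT;
}
```

Writing the constant as 4U << 29 has the same effect when the value is a
literal.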

Signed-off-by: gaochuan (E) <gaochuan4@huawei.com>
Reviewed-by: zhouyupeng1 <zhouyupeng1@huawei.com>
Reviewed-by: chenxiang <chenxiang66@hisilicon.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

33ee8871

scsi: hisi_sas: Fix for setting the PHY linkrate when disconnected · f4b46755

John Garry authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

In commit efdcad62 ("scsi: hisi_sas: Set PHY linkrate when
disconnected"), we use the sas_phy_data.enable flag to track whether the
PHY was enabled or not, so that we know if we should set the PHY negotiated
linkrate at SAS_LINK_RATE_UNKNOWN or SAS_PHY_DISABLED.

However, it is not proper to use sas_phy_data.enable, since it is only set
when libsas attempts to set the PHY disabled/enabled; hence, it may not
even have an initial value.

As a solution to this problem, introduce hisi_sas_phy.enable to track
whether the PHY is enabled or not, so that we can set the negotiated
linkrate properly when the PHY comes down.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

f4b46755

scsi: hisi_sas: Drop hisi_sas_hw.get_free_slot · 1dabd19c

John Garry authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

In commit 1273d65f ("scsi: hisi_sas: change queue depth from 512 to
4096"), the depth of each queue is the same as the max IPTT in the
system.

As such, as long as we have an IPTT allocated, we will have enough space
on any delivery queue.

All .get_free_slot functions were checking for space on the queue by
reading the DQ read pointer. Drop this, and also raise the code into
common code, as there is nothing hw specific remaining.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>

Feature or Bugfix:Bugfix

Signed-off-by: chenxiang (M) <chenxiang66@hisilicon.com>
Reviewed-by: huangdaode <huangdaode@hisilicon.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

1dabd19c

hisi_sas: Add the ncq tag for ATA cmd if it's NON-Write/Read IO. · 61620de1

Yupeng Zhou authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

Currently the driver only assigns an NCQ tag for FPDMA READ and FPDMA
WRITE; for the other NCQ commands (such as FPDMA SEND, FPDMA RECV and
NCQ_NON_DATA), the NCQ tags are all 0.
So if several NCQ commands whose tags are 0 are sent concurrently,
an I/O error occurs.
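
A minimal sketch of the idea, assuming the usual NCQ convention that the
tag is carried in bits 7:3 of the FIS count field. The opcode values are
the standard ATA ones; the struct is a cut-down stand-in for a
host-to-device FIS and the function name is hypothetical, so this is not
the driver's actual code.

```c
#include <stdint.h>

/* Standard ATA opcodes for the NCQ command family. */
enum {
    CMD_FPDMA_READ   = 0x60,
    CMD_FPDMA_WRITE  = 0x61,
    CMD_NCQ_NON_DATA = 0x63,
    CMD_FPDMA_SEND   = 0x64,
    CMD_FPDMA_RECV   = 0x65,
};

/* Only the two FIS fields needed for this illustration. */
struct h2d_fis {
    uint8_t command;
    uint8_t sector_count; /* for NCQ commands, bits 7:3 hold the tag */
};

/*
 * Returns 1 and stores the tag if the FIS carries an NCQ command,
 * returns 0 (tag untouched) for anything else.
 */
int get_ncq_tag(const struct h2d_fis *fis, uint32_t *tag)
{
    switch (fis->command) {
    case CMD_FPDMA_READ:
    case CMD_FPDMA_WRITE:
    case CMD_NCQ_NON_DATA:
    case CMD_FPDMA_SEND:
    case CMD_FPDMA_RECV:
        *tag = fis->sector_count >> 3;
        return 1;
    default:
        return 0;
    }
}
```

Handling all five opcodes in the same branch means an FPDMA SEND with tag
5 is reported as tag 5 rather than 0, so concurrent NCQ commands no longer
collide on tag 0.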

Signed-off-by: Yupeng Zhou <zhouyupeng1@huawei.com>
Reviewed-by: luojian <luojian5@huawei.com>
Reviewed-by: chenxiang <chenxiang66@hisilicon.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

61620de1

hisi_sas: Fix the issue of disk in fail when only 1 cpu online. · e3b9140e

Yupeng Zhou authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: feature
bugzilla: NA
CVE: NA

This patch fixes the issue of disk-in failure when only 1 CPU is online
by allocating the number of DQs according to the interrupts allocated.

Signed-off-by: Yupeng Zhou <zhouyupeng1@huawei.com>
Reviewed-by: Jian Luo <luojian5@huawei.com>
Reviewed-by: Chuan Gao <gaochuan4@huawei.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

e3b9140e

scsi: hisi_sas: replace "%p" with "%pK" · 4f608393

Xiang Chen authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

The format specifier "%p" can leak kernel addresses, so use "%pK" instead.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

4f608393

hisi_sas: Send the NOTIFY spinup primitive when the disk response need spinup sense key. · c20a454f

Yupeng Zhou authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: feature
bugzilla: NA
CVE: NA

This patch adds a flow that sends the NOTIFY spinup primitive
when the disk responds with a sense key indicating that spinup is needed.

Signed-off-by: Yupeng Zhou <zhouyupeng1@huawei.com>
Reviewed-by: luojian <luojian5@huawei.com>
Reviewed-by: chenxiang <chenxiang66@hisilicon.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c20a454f

hisi_sas: Fix the bist loopback issues. · 68e3fff2

Yupeng Zhou authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: feature
bugzilla: NA
CVE: NA

1. modify the init bist data for PRBS test.
2. change the filename from 'loopback mode' to 'loopback_mode'.

Signed-off-by: Yupeng Zhou <zhouyupeng1@huawei.com>
Reviewed-by: luojian <luojian5@huawei.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

68e3fff2

hisi_sas: Adjust the DQ selection method. · 04f9b4fe

Yupeng Zhou authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: feature
bugzilla: NA
CVE: NA

Adjust the DQ selection method, select the DQ by numa node ID.

Signed-off-by: Yupeng Zhou <zhouyupeng1@huawei.com>
Reviewed-by: luojian <luojian5@huawei.com>
Reviewed-by: chenxiang <chenxiang66@hisilicon.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

04f9b4fe

hisi_sas: non-Write/Read command drop off the underflow verify. · 987fcec2

Yupeng Zhou authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

1. For non-Write/Read command I/O, the response length should not be
checked.
2. Write/read I/O should be aborted if the I/O remains in the target.

Signed-off-by: Yupeng Zhou <zhouyupeng1@huawei.com>
Reviewed-by: luojian <luojian5@huawei.com>
Reviewed-by: chenxiang <chenxiang66@hisilicon.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

987fcec2

hisi_sas: increase the exception IO DFX. · 9f149ec2

Yupeng Zhou authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

Improve the DFX for exception I/O: print the sense key on I/O underflow.

Signed-off-by: Yupeng Zhou <zhouyupeng1@huawei.com>
Reviewed-by: luojian <luojian5@huawei.com>
Reviewed-by: chenxiang <chenxiang66@hisilicon.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9f149ec2

hisi_sas: optimize the code convention. · 7109a3aa

Yupeng Zhou authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

Clean up the code: there is no need to '|' 0xffffffff with other bits.

Signed-off-by: Yupeng Zhou <zhouyupeng1@huawei.com>
Reviewed-by: luojian <luojian5@huawei.com>
Reviewed-by: chenxiang <chenxiang66@hisilicon.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

7109a3aa

hisi_sas: add the bist loopback feature. · 7add7cc0

Yupeng Zhou authored 5 years ago and

谢秀奇 committed 5 years ago


driver inclusion
category: feature
bugzilla: NA
CVE: NA

add the bist loopback feature.

Signed-off-by: Yupeng Zhou <zhouyupeng1@huawei.com>
Reviewed-by: luojian <luojian5@huawei.com>
Reviewed-by: chenxiang <chenxiang66@hisilicon.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

7add7cc0