LinuxQuestions.org
Latest LQ Deal: Latest LQ Deals
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Hardware
User Name
Password
Linux - Hardware This forum is for Hardware issues.
Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?

Notices


Reply
  Search this Thread
Old 04-11-2018, 05:23 PM   #1
vanbynight
LQ Newbie
 
Registered: Apr 2018
Posts: 1

Rep: Reputation: Disabled
umem block driver errors


This is my first post here. I've been googleing this for the better part of a day and still haven't found a solution for this device driver. I've posted thsis on a couple different sites before realizing that this is probably the best one to put it on so sorry for anyone who read this twice.

I made the mistake of buying some (micro memory) PCIE Battery Backed RAM cards off ebay because the price was right. They appear to be from a EMC SAN. According to the listing they're both Curtiss Wright 5453J and there are some markings on the PCB that confirm this. It's a battery backed DRAM card with 1GB of usable space, it is supposed to appear as a block device to the operating system and is basically a small NVME SSD that wipes its storage if it loses power for more than a day. In the past they were mostly used for external journals for files systems before SSD storage became affordable.


The problem I'm encountering is when I tried booting up into kernel 4.4.0 I received an error from the umem driver that the magic number didn't match. I looked into this error and found that someone made a patch for ubuntu 14.04 running kernel 3.13. So I installed ubuntu 14.04.5 and patched the umem.ko driver and now kernel 3.13 tries to initalize the devices but I receive memory access errors.

This driver looks like it was last patched in 2002 according to the source files so I'm guessing there was a compatibility problem between kernel 2.6 and later kernels that wasn't fixed. It also was written for PCI and PCI-X (maybe older interfaces too) so I would think it should work using PCIE.

this is the patch I applied originally from https://patchwork.kernel.org/patch/8180661/
It didn't seem to work at first so I just edited the umem.c file in vim
this probably was a bad idea but the driver attached to the device after editing so that was good

Code:
Comments
Dr. Net! - Eugen Rieck - Feb. 1, 2016, 3:33 p.m.
From: Eugen Rieck <eugen@drnet.at>

Allow for multiple magic numbers per model and include the magic number
for the PCIe-based 5453CN, that shares the device id of the older
PCI-X-based 5425CN.

Signed-off-by: Eugen Rieck <eugen@drnet.at>
---
The umem driver for Micro Memory (now Curtiss-Wright) NVRAM cards relied
on a unique magic number for each PCI device id. Newer cards violate this
and were thus rejected by the driver.
Developed for 3.13.0 (Ubuntu 14.04,  ubuntu 3.13.0-76-generic), but merges
cleanly into current (4.5-rc2) as the driver is old and largely unchanged.

 umem.c |   29 +++++++++++++++++++++++------
 umem.h |    2 ++
 2 files changed, 25 insertions(+), 6 deletions(-)
Patch
--- drivers/block/umem.c.orig    2016-02-01 14:51:07.463944676 +0100
+++ drivers/block/umem.c    2016-02-01 15:03:52.815934423 +0100
@@ -32,6 +32,8 @@ 
  * 17May2002:NeilBrown   - remove init_mem initialisation.  Instead detect
  *             - a sequence of writes that cover the card, and
  *             - set initialised bit then.
+ * 01Feb2016:Eugen Rieck - allow multiple magic numbers per pci id to
+ *               acommodate newer models
  */
 
 #undef DEBUG    /* #define DEBUG if you want debugging info (pr_debug) */
@@ -799,6 +801,9 @@ 
     unsigned long    csr_base;
     unsigned long    csr_len;
     int        magic_number;
+    int        magic_numbers[MAGIC_NUMBERS_PER_DEV];
+    int         magic_number_ok = 0;
+    int         i;
     static int    printed_version;
 
     if (!printed_version++)
@@ -850,27 +855,39 @@ 
     switch (card->dev->device) {
     case 0x5415:
         card->flags |= UM_FLAG_NO_BYTE_STATUS | UM_FLAG_NO_BATTREG;
-        magic_number = 0x59;
+        magic_numbers[0] = 0x59;
+        magic_numbers[1] = 0x100;
         break;
 
     case 0x5425:
         card->flags |= UM_FLAG_NO_BYTE_STATUS;
-        magic_number = 0x5C;
+        magic_numbers[0] = 0x5C;
+        magic_numbers[1] = 0x5E;
+        magic_numbers[2] = 0x100;
         break;
 
     case 0x6155:
         card->flags |= UM_FLAG_NO_BYTE_STATUS |
                 UM_FLAG_NO_BATTREG | UM_FLAG_NO_BATT;
-        magic_number = 0x99;
+        magic_numbers[0] = 0x99;
+        magic_numbers[1] = 0x100;
         break;
 
     default:
-        magic_number = 0x100;
+        magic_numbers[0] = 0x100;
         break;
     }
 
-    if (readb(card->csr_remap + MEMCTRLSTATUS_MAGIC) != magic_number) {
-        dev_printk(KERN_ERR, &card->dev->dev, "Magic number invalid\n");
+    magic_number = readb(card->csr_remap + MEMCTRLSTATUS_MAGIC);
+    for (i = 0; i < MAGIC_NUMBERS_PER_DEV; i++) {
+        if (magic_numbers[i] == magic_number) {
+            magic_number_ok = 1;
+            break;
+        }
+        if (magic_numbers[i] >= 0x100) break;
+    }
+    if (!magic_number_ok) {
+        dev_printk(KERN_ERR, &card->dev->dev, "Magic number 0x%02x invalid for device 0x%04x\n", magic_number, card->dev->device);
         ret = -ENOMEM;
         goto failed_magic;
     }

--- drivers/block/umem.h.orig    2016-02-01 14:51:16.055944561 +0100
+++ drivers/block/umem.h    2016-02-01 14:47:38.463947477 +0100
@@ -15,6 +15,8 @@ 
 
 #define IRQ_TIMEOUT (1 * HZ)
 
+#define MAGIC_NUMBERS_PER_DEV    2
+
 /* CSR register definition */
 #define MEMCTRLSTATUS_MAGIC    0x00
 #define  MM_MAGIC_VALUE        (unsigned char)0x59

I rebuilt the module and now I can see the devices as /dev/umem[a,b]

lsblk reports the following

Code:
sda      8:0    0    20G  0 disk
├─sda1   8:1    0   512M  0 part /boot/efi
└─sda2   8:2    0  19.5G  0 part /
sdb      8:16   0 524.5G  0 disk
└─sdb1   8:17   0 524.5G  0 part /usr
umema  251:0    0     1G  0 disk
umemb  251:64   0     1G  0 disk
However dmesg reports all this garbage on boot there are two installed so that's why it's reporting two different devices

Code:
root@ubuntu-test:/home/cscott# dmesg | grep umem
[    5.409505] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[    5.409509] umem 0000:05:00.0: I/O error on sector 16/4096
[    5.409511] umem 0000:05:00.0: DMAstat -
[    5.409552] umem 0000:05:00.0: DMAstat -
[    5.409555] umem 0000:05:00.0: Memory access error detected (err count 0)
[    5.409556] umem 0000:05:00.0: Multi-bit EDC error
[    5.409558] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[    5.409560] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[    5.409564] umem 0000:05:00.0: I/O error on sector 0/4096
[    5.409565] umem 0000:05:00.0: DMAstat -
[    5.409604] umem 0000:05:00.0: DMAstat -
[    5.409607] umem 0000:05:00.0: Memory access error detected (err count 0)
[    5.409608] umem 0000:05:00.0: Multi-bit EDC error
[    5.409610] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[    5.409612] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[    5.409615] umem 0000:05:00.0: I/O error on sector 0/4096
[    5.409616] umem 0000:05:00.0: DMAstat -
[    5.409659] umem 0000:05:00.0: DMAstat -
[    5.409662] umem 0000:05:00.0: Memory access error detected (err count 0)
[    5.409663] umem 0000:05:00.0: Multi-bit EDC error
[    5.409665] umem 0000:05:00.0: Fault Address 0x0000001f00, Fault Data 0x116c30a4425832b1
[    5.409667] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[    5.409670] umem 0000:05:00.0: I/O error on sector 8/4096
[    5.409671] umem 0000:05:00.0: DMAstat -
[    5.409712] umem 0000:05:00.0: DMAstat -
[    5.409715] umem 0000:05:00.0: Memory access error detected (err count 0)
[    5.409716] umem 0000:05:00.0: Multi-bit EDC error
[    5.409718] umem 0000:05:00.0: Fault Address 0x0000002f00, Fault Data 0x10f4104798a000f0
[    5.409719] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[    5.409729] umem 0000:05:00.0: I/O error on sector 16/4096
[    5.409731] umem 0000:05:00.0: DMAstat -
[    5.409781] umem 0000:05:00.0: DMAstat -
[    5.409784] umem 0000:05:00.0: Memory access error detected (err count 0)
[    5.409785] umem 0000:05:00.0: Multi-bit EDC error
[    5.409787] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[    5.409789] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[    5.409792] umem 0000:05:00.0: I/O error on sector 0/4096
[    5.409794] umem 0000:05:00.0: DMAstat -
[    5.409833] umem 0000:05:00.0: DMAstat -
[    5.409836] umem 0000:05:00.0: Memory access error detected (err count 0)
[    5.409838] umem 0000:05:00.0: Multi-bit EDC error
[    5.409840] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[    5.409841] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[    5.409845] umem 0000:05:00.0: I/O error on sector 0/4096
[    5.409846] umem 0000:05:00.0: DMAstat -
[    5.409888] umem 0000:05:00.0: DMAstat -
[    5.409891] umem 0000:05:00.0: Memory access error detected (err count 0)
[    5.409892] umem 0000:05:00.0: Multi-bit EDC error
[    5.409895] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[    5.409896] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[    5.409900] umem 0000:05:00.0: I/O error on sector 0/4096
[    5.409901] umem 0000:05:00.0: DMAstat -
[    5.409939] umem 0000:05:00.0: DMAstat -

and so on and so forth

[   21.878134] umem 0000:05:00.0: Memory access error detected (err count 0)
[   21.878136] umem 0000:05:00.0: Multi-bit EDC error
[   21.878140] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[   21.878142] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[   21.878148] umem 0000:05:00.0: I/O error on sector 0/4096
[   21.878150] umem 0000:05:00.0: DMAstat -
[   21.878195] umem 0000:05:00.0: DMAstat -
[   21.878199] umem 0000:05:00.0: Memory access error detected (err count 0)
[   21.878201] umem 0000:05:00.0: Multi-bit EDC error
[   21.878205] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[   21.878207] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[   21.878213] umem 0000:05:00.0: I/O error on sector 0/4096
[   21.878215] umem 0000:05:00.0: DMAstat -
[   21.878262] umem 0000:05:00.0: DMAstat -
[   21.878266] umem 0000:05:00.0: Memory access error detected (err count 0)
[   21.878268] umem 0000:05:00.0: Multi-bit EDC error
[   21.878272] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[   21.878275] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[   21.878280] umem 0000:05:00.0: I/O error on sector 0/4096
[   21.878282] umem 0000:05:00.0: DMAstat -
[   21.878327] umem 0000:05:00.0: DMAstat -
[   21.878332] umem 0000:05:00.0: Memory access error detected (err count 0)
[   21.878334] umem 0000:05:00.0: Multi-bit EDC error
[   21.878337] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[   21.878340] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[   21.878345] umem 0000:05:00.0: I/O error on sector 0/4096
[   21.878347] umem 0000:05:00.0: DMAstat -
[   21.878396] umem 0000:05:00.0: DMAstat -
[   21.878400] umem 0000:05:00.0: Memory access error detected (err count 0)
[   21.878402] umem 0000:05:00.0: Multi-bit EDC error
[   21.878406] umem 0000:05:00.0: Fault Address 0x0000003f00, Fault Data 0x620c20a3b81580a0
[   21.878409] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[   21.878414] umem 0000:05:00.0: I/O error on sector 24/4096
[   21.878416] umem 0000:05:00.0: DMAstat -
[   21.878462] umem 0000:05:00.0: DMAstat -
[   21.878467] umem 0000:05:00.0: Memory access error detected (err count 0)
[   21.878469] umem 0000:05:00.0: Multi-bit EDC error
[   21.878473] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[   21.878475] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[   21.878481] umem 0000:05:00.0: I/O error on sector 0/4096
[   21.878483] umem 0000:05:00.0: DMAstat -
[   21.878528] umem 0000:05:00.0: DMAstat -
[   21.878532] umem 0000:05:00.0: Memory access error detected (err count 0)
[   21.878535] umem 0000:05:00.0: Multi-bit EDC error
[   21.878538] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[   21.878541] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[   21.878545] umem 0000:05:00.0: I/O error on sector 0/4096
[   21.878548] umem 0000:05:00.0: DMAstat -
[   21.878593] umem 0000:05:00.0: DMAstat -
[   21.878598] umem 0000:05:00.0: Memory access error detected (err count 0)
[   21.878600] umem 0000:05:00.0: Multi-bit EDC error
[   21.878604] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[   21.878607] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[   21.878612] umem 0000:05:00.0: I/O error on sector 0/4096
[   21.878614] umem 0000:05:00.0: DMAstat -
[   21.878658] umem 0000:05:00.0: DMAstat -
[   21.878662] umem 0000:05:00.0: Memory access error detected (err count 0)
[   21.878665] umem 0000:05:00.0: Multi-bit EDC error
[   21.878668] umem 0000:05:00.0: Fault Address 0x0000007f00, Fault Data 0xe5754bb094460c14
[   21.878671] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[   21.878676] umem 0000:05:00.0: I/O error on sector 56/4096
[   21.878678] umem 0000:05:00.0: DMAstat -
[   21.878723] umem 0000:05:00.0: DMAstat -
[   21.878728] umem 0000:05:00.0: Memory access error detected (err count 0)
[   21.878730] umem 0000:05:00.0: Multi-bit EDC error
[   21.878734] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[   21.878737] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[   21.878742] umem 0000:05:00.0: I/O error on sector 0/4096
[   21.878744] umem 0000:05:00.0: DMAstat -
[   21.878788] umem 0000:05:00.0: DMAstat -
[   21.878792] umem 0000:05:00.0: Memory access error detected (err count 0)
[   21.878795] umem 0000:05:00.0: Multi-bit EDC error
[   21.878798] umem 0000:05:00.0: Fault Address 0x0000000f00, Fault Data 0xf6ee04e64a340280
[   21.878800] umem 0000:05:00.0: Fault Check 0x00, Fault Syndrome 0x00
[   21.878805] umem 0000:05:00.0: I/O error on sector 0/4096
[   21.878807] umem 0000:05:00.0: DMAstat -
I get these error when I try to access either one of the block devices for a read/write operation

Right now they are sharing IRQs with other devices I'm not sure if that matters but in the bios I have them mapped to IRQs that have some USB peripherals too.

and some more debug info

output of
Code:
 lscpi -nnk
Code:
04:00.0 Memory controller [0580]: Micro Memory MM-5425CN PCI 64/66 Memory Module with Battery Backup [1332:5425] (rev 08)
        Subsystem: Micro Memory Device [1332:5453]
        Kernel driver in use: umem
05:00.0 Memory controller [0580]: Micro Memory MM-5425CN PCI 64/66 Memory Module with Battery Backup [1332:5425] (rev 08)
        Subsystem: Micro Memory Device [1332:5453]
I'm not sure if it matters but this is reporting that they are 5425-CN devices which are considerably older than the 5453J

Code:
root@ubuntu-test:/# uname -a
Linux ubuntu-test 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
From what I understand these were used in an EMC or Nutanix storage server of some sort as an external journal device. I've used older versions of these cards before for external journals and they have been quite reliable along with providing performance that is up there with the new NVME storage technology. I'm not expecting a miracle but if someone has trouble shooting advice for this type of error I'll gladly take it. I have coded a little before and can try and fix the driver if someone can point me in the right direction of what should be fixed.
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
error in smb sever in centos ext3 errors block bitmap for group 256 not in group (block 33188) GAUTAM singh Linux - Server 2 11-20-2017 06:10 AM
[SOLVED] is there a way to block ips based on apache2 errors? jonjetjon Linux - Software 4 12-02-2014 04:38 PM
Sparc boot errors (read error on block) austinramsay Gentoo 3 10-22-2013 04:22 AM
Errors when compiling kernel 2.6.23, cdrom.o and drivers/block/built-in.o stanunruh Linux - Newbie 3 08-26-2011 03:56 PM

LinuxQuestions.org > Forums > Linux Forums > Linux - Hardware

All times are GMT -5. The time now is 04:22 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration