Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
torvalds
GitHub Repository: torvalds/linux
Path: blob/master/Documentation/ABI/testing/debugfs-cxl
32129 views
What:		/sys/kernel/debug/cxl/memX/inject_poison
Date:		April, 2023
KernelVersion:	v6.4
Contact:	[email protected]
Description:
		(WO) When a Device Physical Address (DPA) is written to this
		attribute, the memdev driver sends an inject poison command to
		the device for the specified address. The DPA must be 64-byte
		aligned and the length of the injected poison is 64-bytes. If
		successful, the device returns poison when the address is
		accessed through the CXL.mem bus. Injecting poison adds the
		address to the device's Poison List and the error source is set
		to Injected. In addition, the device adds a poison creation
		event to its internal Informational Event log, updates the
		Event Status register, and if configured, interrupts the host.
		It is not an error to inject poison into an address that
		already has poison present and no error is returned. If the
		device returns 'Inject Poison Limit Reached' an -EBUSY error
		is returned to the user. The inject_poison attribute is only
		visible for devices supporting the capability.

		TEST-ONLY INTERFACE: This interface is intended for testing
		and validation purposes only. It is not a data repair mechanism
		and should never be used on production systems or live data.

		DATA LOSS RISK: For CXL persistent memory (PMEM) devices,
		poison injection can result in permanent data loss. Injected
		poison may render data permanently inaccessible even after
		clearing, as the clear operation writes zeros and does not
		recover original data.

		SYSTEM STABILITY RISK: For volatile memory, poison injection
		can cause kernel crashes, system instability, or unpredictable
		behavior if the poisoned addresses are accessed by running code
		or critical kernel structures.

What:		/sys/kernel/debug/cxl/memX/clear_poison
Date:		April, 2023
KernelVersion:	v6.4
Contact:	[email protected]
Description:
		(WO) When a Device Physical Address (DPA) is written to this
		attribute, the memdev driver sends a clear poison command to
		the device for the specified address. Clearing poison removes
		the address from the device's Poison List and writes 0 (zero)
		for 64 bytes starting at address. It is not an error to clear
		poison from an address that does not have poison set. If the
		device cannot clear poison from the address, -ENXIO is returned.
		The clear_poison attribute is only visible for devices
		supporting the capability.

		TEST-ONLY INTERFACE: This interface is intended for testing
		and validation purposes only. It is not a data repair mechanism
		and should never be used on production systems or live data.

		CLEAR IS NOT DATA RECOVERY: This operation writes zeros to the
		specified address range and removes the address from the poison
		list. It does NOT recover or restore original data that may have
		been present before poison injection. Any original data at the
		cleared address is permanently lost and replaced with zeros.

		CLEAR IS NOT A REPAIR MECHANISM: This interface is for testing
		purposes only and should not be used as a data repair tool.
		Clearing poison is fundamentally different from data recovery
		or error correction.

What:		/sys/kernel/debug/cxl/regionX/inject_poison
Date:		August, 2025
Contact:	[email protected]
Description:
		(WO) When a Host Physical Address (HPA) is written to this
		attribute, the region driver translates it to a Device
		Physical Address (DPA) and identifies the corresponding
		memdev. It then sends an inject poison command to that memdev
		at the translated DPA. Refer to the memdev ABI entry at:
		/sys/kernel/debug/cxl/memX/inject_poison for the detailed
		behavior. This attribute is only visible if all memdevs
		participating in the region support both inject and clear
		poison commands.

		TEST-ONLY INTERFACE: This interface is intended for testing
		and validation purposes only. It is not a data repair mechanism
		and should never be used on production systems or live data.

		DATA LOSS RISK: For CXL persistent memory (PMEM) devices,
		poison injection can result in permanent data loss. Injected
		poison may render data permanently inaccessible even after
		clearing, as the clear operation writes zeros and does not
		recover original data.

		SYSTEM STABILITY RISK: For volatile memory, poison injection
		can cause kernel crashes, system instability, or unpredictable
		behavior if the poisoned addresses are accessed by running code
		or critical kernel structures.

What:		/sys/kernel/debug/cxl/regionX/clear_poison
Date:		August, 2025
Contact:	[email protected]
Description:
		(WO) When a Host Physical Address (HPA) is written to this
		attribute, the region driver translates it to a Device
		Physical Address (DPA) and identifies the corresponding
		memdev. It then sends a clear poison command to that memdev
		at the translated DPA. Refer to the memdev ABI entry at:
		/sys/kernel/debug/cxl/memX/clear_poison for the detailed
		behavior. This attribute is only visible if all memdevs
		participating in the region support both inject and clear
		poison commands.

		TEST-ONLY INTERFACE: This interface is intended for testing
		and validation purposes only. It is not a data repair mechanism
		and should never be used on production systems or live data.

		CLEAR IS NOT DATA RECOVERY: This operation writes zeros to the
		specified address range and removes the address from the poison
		list. It does NOT recover or restore original data that may have
		been present before poison injection. Any original data at the
		cleared address is permanently lost and replaced with zeros.

		CLEAR IS NOT A REPAIR MECHANISM: This interface is for testing
		purposes only and should not be used as a data repair tool.
		Clearing poison is fundamentally different from data recovery
		or error correction.

What:		/sys/kernel/debug/cxl/einj_types
Date:		January, 2024
KernelVersion:	v6.9
Contact:	[email protected]
Description:
		(RO) Prints the CXL protocol error types made available by
		the platform in the format:

			0x<error number> <error type>

		The possible error types are (as of ACPI v6.5):

			0x1000	CXL.cache Protocol Correctable
			0x2000	CXL.cache Protocol Uncorrectable non-fatal
			0x4000	CXL.cache Protocol Uncorrectable fatal
			0x8000	CXL.mem Protocol Correctable
			0x10000	CXL.mem Protocol Uncorrectable non-fatal
			0x20000	CXL.mem Protocol Uncorrectable fatal

		The <error number> can be written to einj_inject to inject
		<error type> into a chosen dport.

What:		/sys/kernel/debug/cxl/$dport_dev/einj_inject
Date:		January, 2024
KernelVersion:	v6.9
Contact:	[email protected]
Description:
		(WO) Writing an integer to this file injects the corresponding
		CXL protocol error into $dport_dev ($dport_dev will be a device
		name from /sys/bus/pci/devices). The integer to type mapping for
		injection can be found by reading from einj_types. If the dport
		was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
		a CXL 2.0 error is injected.