When we are finished with return PFNs to the hypervisor, then
populate it back, and also mark the E820 MMIO and E820 gaps
as IDENTITY_FRAMEs, we then call P2M to set areas that can
be used for ballooning. We were off by one, and ended up
over-writting a P2M entry that most likely was an IDENTITY_FRAME.
For example:
1-1 mapping on 40000->40200
1-1 mapping on bc558->bc5ac
1-1 mapping on bc5b4->bc8c5
1-1 mapping on bc8c6->bcb7c
1-1 mapping on bcd00->100000
Released 614 pages of unused memory
Set 277889 page(s) to 1-1 mapping
Populating 40200-40466 pfn range: 614 pages added
=> here we set from 40466 up to bc559 P2M tree to be
INVALID_P2M_ENTRY. We should have done it up to bc558.
The end result is that if anybody is trying to construct
a PTE for PFN bc558 they end up with ~PAGE_PRESENT.
CC: stable@vger.kernel.org
Reported-by-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If the controller has no PCIe module attached, accessing of the device
configuration space causes a data bus error. Avoid this by checking the
status of the PCIe link in advance, and indicate an error if the link
is down.
Signed-off-by: Gabor Juhos <juhosg@openwrt.org>
Cc: stable@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4293/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The binding doc and dts use properties "fsl,{cd,wp}-internal" while
esdhc driver uses "fsl,{cd,wp}-controller". Fix binding doc and dts
to get them match driver code.
Reported-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: <stable@vger.kernel.org>
Acked-by: Chris Ball <cjb@laptop.org>
Though commit 602bf40 (ARM: imx6: exit coherency when shutting down
a cpu) improves the stability of imx6q cpu hotplug a lot, there are
still hangs seen with a more stressful hotplug testing.
It's expected that once imx_enable_cpu(cpu, false) is called, the cpu
will be taken down by hardware immediately, and the code after that
will not get any chance to execute. However, this is not always the
case from the testing. The cpu could possibly be alive for a few
cycles before hardware actually takes it down. So rather than letting
cpu execute some code that could cause a hang in these cycles, let's
make the cpu spin there and wait for hardware to take it down.
Cc: <stable@vger.kernel.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Commit 91f68c89d8 ("block: fix infinite loop in __getblk_slow")
is not good: a successful call to grow_buffers() cannot guarantee
that the page won't be reclaimed before the immediate next call to
__find_get_block(), which is why there was always a loop there.
Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595:
inode #19278: block 664: comm cc1: unable to read itable block" on console,
which pointed to this commit.
I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs
sometimes fails from a missing header file, under memory pressure on
ppc G5. I've never seen this on x86, and I've never seen it on 3.5-rc7
itself, despite that commit being in there: bisection pointed to an
irrelevant pinctrl merge, but hard to tell when failure takes between
18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2).
(I've since found such __ext4_get_inode_loc errors in /var/log/messages
from previous weeks: why the message never appeared on console until
yesterday morning is a mystery for another day.)
Revert 91f68c89d8, restoring __getblk_slow() to how it was (plus
a checkpatch nitfix). Simplify the interface between grow_buffers()
and grow_dev_page(), and avoid the infinite loop beyond end of device
by instead checking init_page_buffers()'s end_block there (I presume
that's more efficient than a repeated call to blkdev_max_block()),
returning -ENXIO to __getblk_slow() in that case.
And remove akpm's ten-year-old "__getblk() cannot fail ... weird"
comment, but that is worrying: are all users of __getblk() really
now prepared for a NULL bh beyond end of device, or will some oops??
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # 3.0 3.2 3.4 3.5
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is already fixed for ILK and SNB in the below commit but somehow
IVB is missed.
commit ab2f9df10d
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Mon Feb 27 12:40:10 2012 -0800
drm/i915: fix color order for BGR formats on SNB
Had the wrong bits and field definitions.
Signed-off-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
Reviewed-by: Antti Koskipää <antti.koskipaa@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Wrong order of parameters passed-in when calling hdmi/adpa
/lvds_pipe_enabled(), 2nd and 3rd parameters are reversed.
This bug was indroduced by
commit 1519b9956e
Author: Keith Packard <keithp@keithp.com>
Date: Sat Aug 6 10:35:34 2011 -0700
drm/i915: Fix PCH port pipe select in CPT disable paths
The reachable tag for this commit is v3.1-rc1-3-g1519b99
Signed-off-by: Anhua Xu <anhua.xu@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44876
Tested-by: Daniel Schroeder <sec@dschroeder.info>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The old interface is bugged and reads the wrong sensor when retrieving
the reading for the chassis fan (it reads the CPU sensor); the new
interface works fine.
Reported-by: Göran Uddeborg <goeran@uddeborg.se>
Cc: stable@vger.kernel.org
Tested-by: Göran Uddeborg <goeran@uddeborg.se>
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
This patch fixes the name of the macro used to mask the
mmc interrupt: erroneously it was used: MMC_DEFAUL_MASK.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erroneously the DWMAC_CORE_3_40 was set to 34 instead of 0x34.
This can generate problems when run on old chips because
the driver assumes that there are the extra 16 regs available
for perfect filtering.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Gianni Antoniazzi <gianni.antoniazzi-ext@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sylvain Munault reported following info :
- TCP connection get "stuck" with data in send queue when doing
"large" transfers ( like typing 'ps ax' on a ssh connection )
- Only happens on path where the PMTU is lower than the MTU of
the interface
- Is not present right after boot, it only appears 10-20min after
boot or so. (and that's inside the _same_ TCP connection, it works
fine at first and then in the same ssh session, it'll get stuck)
- Definitely seems related to fragments somehow since I see a router
sending ICMP message saying fragmentation is needed.
- Exact same setup works fine with kernel 3.5.1
Problem happens when the 10 minutes (ip_rt_mtu_expires) expiration
period is over.
ip_rt_update_pmtu() calls dst_set_expires() to rearm a new expiration,
but dst_set_expires() does nothing because dst.expires is already set.
It seems we want to set the expires field to a new value, regardless
of prior one.
With help from Julian Anastasov.
Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Julian Anastasov <ja@ssi.bg>
Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This issue was recently observed on an AMD C-50 CPU where a patch of
maximum size was applied.
Commit be62adb492 ("x86, microcode, AMD: Simplify ucode verification")
added current_size in get_matching_microcode(). This is calculated as
size of the ucode patch + 8 (ie. size of the header). Later this is
compared against the maximum possible ucode patch size for a CPU family.
And of course this fails if the patch has already maximum size.
Cc: <stable@vger.kernel.org> [3.3+]
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1344361461-10076-1-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The sub-register used to access the stack (sp, esp, or rsp) is not
determined by the address size attribute like other memory references,
but by the stack segment's B bit (if not in x86_64 mode).
Fix by using the existing stack_mask() to figure out the correct mask.
This long-existing bug was exposed by a combination of a27685c33a
(emulate invalid guest state by default), which causes many more
instructions to be emulated, and a seabios change (possibly a bug) which
causes the high 16 bits of esp to become polluted across calls to real
mode software interrupts.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This change marks interface as down on reset, otherwise the driver can't
reinitialize itself properly.
Without the change a transient problem turns out to be critical and leads
to inavailability to reset the driver without brcmsmac module unload/load
cycle:
ieee80211 phy0: wl0: PSM microcode watchdog fired at 5993 (seconds). Resetting.
brcms_c_dpc : PSM Watchdog, chipid 0xa8d9, chiprev 0x1
ieee80211 phy0: wl0: fatal error, reinitializing
ieee80211 phy0: Hardware restart was requested
ieee80211 phy0: brcms_ops_start: brcms_up() returned -19
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Cc: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Pull drm fixes from Dave Airlie:
"Intel: edid fixes, power consumption fix, s/r fix, haswell fix
Radeon: BIOS loading fixes for UEFI and Thunderbolt machines, better
MSAA validation, lockup timeout fixes, modesetting fixes
One udl dpms fix, one vmwgfx fix, a couple of trivial core changes.
There is an export added to ACPI as part of the radeon bios fixes.
I've also included the fbcon flashing cursor vs deinit race fix, that
seems the simplest place to start"
Trivial conflict in drivers/video/console/fbcon.c due to me having
already applied the fbcon flashing cursor vs deinit race fix, and Dave
had added a comment in there too.
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (22 commits)
fbcon: fix race condition between console lock and cursor timer (v1.1)
drm: Add missing static storage class specifiers in drm_proc.c file
drm/udl: dpms off the crtc when disabled.
drm: Remove two unused fields from struct drm_display_mode
drm: stop vmgfx driver explosion
drm/radeon/ss: use num_crtc rather than hardcoded 6
Revert "drm/radeon: fix bo creation retry path"
drm/i915: use hsw rps tuning values everywhere on gen6+
drm/radeon: split ATRM support out from the ATPX handler (v3)
drm/radeon: convert radeon vfct code to use acpi_get_table_with_size
ACPI: export symbol acpi_get_table_with_size
drm/radeon: implement ACPI VFCT vbios fetch (v3)
drm/radeon/kms: extend the Fujitsu D3003-S2 board connector quirk to cover later silicon stepping
drm/radeon: fix checking of MSAA renderbuffers on r600-r700
drm/radeon: allow CMASK and FMASK in the CS checker on r600-r700
drm/radeon: init lockup timeout on ring init
drm/radeon: avoid turning off spread spectrum for used pll
drm/i915: fall back to bit-banging if GMBUS fails in CRT EDID reads
drm/i915: extract connector update from intel_ddc_get_modes() for reuse
drm/i915: fix hsw uncached pte
...
Pull SCSI target fixes from Nicholas Bellinger:
"The executive summary includes:
- Post-merge review comments for tcm_vhost (MST + nab)
- Avoid debugging overhead when not debugging for tcm-fc(FCoE) (MDR)
- Fix NULL pointer dereference bug on alloc_page failulre (Yi Zou)
- Fix REPORT_LUNs regression bug with pSCSI export (AlexE + nab)
- Fix regression bug with handling of zero-length data CDBs (nab)
- Fix vhost_scsi_target structure alignment (MST)
Thanks again to everyone who contributed a bugfix patch, gave review
feedback on tcm_vhost code, and/or reported a bug during their own
testing over the last weeks.
There is one other outstanding bug reported by Roland recently related
to SCSI transfer length overflow handling, for which the current
proposed bugfix has been left in queue pending further testing with
other non iscsi-target based fabric drivers.
As the patch is verified with loopback (local SGL memory from SCSI
LLD) + tcm_qla2xxx (TCM allocated SGL memory mapped to PCI HW) fabric
ports, it will be included into the next 3.6-rc-fixes PULL request."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Remove unused se_cmd.cmd_spdtl
tcm_fc: rcu_deref outside rcu lock/unlock section
tcm_vhost: Fix vhost_scsi_target structure alignment
target: Fix regression bug with handling of zero-length data CDBs
target/pscsi: Fix bug with REPORT_LUNs handling for SCSI passthrough
tcm_vhost: Change vhost_scsi_target->vhost_wwpn to char *
target: fix NULL pointer dereference bug alloc_page() fails to get memory
tcm_fc: Avoid debug overhead when not debugging
tcm_vhost: Post-merge review changes requested by MST
tcm_vhost: Fix incorrect IS_ERR() usage in vhost_scsi_map_iov_to_sgl
Pull i2c-embedded fixes from Wolfram Sang:
"Some bugfixes for the "embedded" part of the I2C subsystem. The fixes
affect mostly drivers which have been largely reworked lately and
where regressions appeared."
* 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux:
i2c: tegra: protect suspend/resume callbacks with CONFIG_PM_SLEEP
i2c: diolan-u2c: Fix master_xfer return code
I2C: OMAP: xfer: fix runtime PM get/put balance on error
i2c: nomadik: Add default configuration into the Nomadik I2C driver
These patches fix the Samsung PWM driver and perform some minor cleanups
like fixing checkpatch and sparse warnings. Two redundant error messages
are removed and the Kconfig help text for the PWM subsystem is made more
descriptive.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJQNJEVAAoJEN0jrNd/PrOhuWMP+QHfkkRRetJXrBUq8YOISln3
zPFh0oRgbd9679dIQ23Gzm8iFlD5bN8e4JBOleRGitv5ar6btqoS4F/AFRyCncOv
cn7s2qdEQvXRF73gZwzUgwiTRBkf3dcYOdc2Y1GymzfQapIaEjsleC4nO3E0kkNf
XpjaZGQZa1PNmrb93QsSuzuJtcemjFs9d2QA5YcB/EMJ/kCsz2/rpYv+q457rddS
m/xBJyG4Q+CVSq7WxaK7X+opVx1amhALawUj+OgroW7y28HJhHt6UTZI1/zhL22B
gZqxsdjKpU66CxHUUOz3olprGAXd2YIU3VFsRV8IcU1pZpCyJS3uGMueXwS21VC0
OS3TxnaKNgOREidO22J6C3c5N2NZm9ERBLY3PMYVW0KWdeWzxJvwLRwpQoCUj0fm
OpvmzqwxIKLcNgzppTEKXzFikwZcZZQ+zrZycoHxKSqu2YJtimrP59oqDUX69Wp3
PBGrAZRU+uz9Zq18csidyL5DbLtIwEOILmMO/MnTuAQQ2z2avDTQ0xVzRPq51A/W
QyYhStguxR0e4rHmD//lf695HywilZPUlj3aqVm4kCGaasWlRyinIOwPDuOUZuEw
FcMb8+1tdNfu6mU0NgW8z3WajvG/p/pNPCMhtfsdwJTkHK5aB6sJJzFOJm7iVD8W
F/IbEubPBu0k4RrQ/BIf
=oTl5
-----END PGP SIGNATURE-----
Merge tag 'for-3.6-rc3' of git://gitorious.org/linux-pwm/linux-pwm
Pull pwm fixes from Thierry Reding:
"These patches fix the Samsung PWM driver and perform some minor
cleanups like fixing checkpatch and sparse warnings.
Two redundant error messages are removed and the Kconfig help text for
the PWM subsystem is made more descriptive."
* tag 'for-3.6-rc3' of git://gitorious.org/linux-pwm/linux-pwm:
pwm: Improve Kconfig help text
pwm: core: Fix coding style issues
pwm: vt8500: Fix coding style issue
pwm: Remove a redundant error message when devm_request_and_ioremap fails
pwm: samsung: add missing device pointer to struct pwm_chip
pwm: Add missing static storage class specifiers in core.c file
Pull ceph fixes from Sage Weil:
"Jim's fix closes a narrow race introduced with the msgr changes. One
fix resolves problems with debugfs initialization that Yan found when
multiple client instances are created (e.g., two clusters mounted, or
rbd + cephfs), another one fixes problems with mounting a nonexistent
server subdirectory, and the last one fixes a divide by zero error
from unsanitized ioctl input that Dan Carpenter found."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: avoid divide by zero in __validate_layout()
libceph: avoid truncation due to racing banners
ceph: tolerate (and warn on) extraneous dentry from mds
libceph: delay debugfs initialization until we learn global_id
- NFSv3 mounts need to fail if the FSINFO rpc call fails
- Ensure that the NFS commit cache gets torn down when we unload the
NFS module.
- Fix memory scribble issues when interrupting a LAYOUTGET rpc call
- Fix NFSv4 legacy idmapper regressions
- Fix issues with the NFSv4 getacl command
- Fix a regression when using the legacy "mount -t nfs4"
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQNPyKAAoJEGcL54qWCgDyQlAP/AmnLaJmKMPj5P4GCiA/uxMV
sr0XSSVAv+OgUtRxLNO5PDyunVIIEjwA5nGa2RZEG14+eYwNEeQehaGoaGqoKLhh
ATOCMvHHEs5qSKLckWsZWflf6U/MLimYS3D/hHVUtP0QWbBMtol3ImABs2ht/VUc
HW0wQoEG+5mhbhC87d4ku5E+OHQzYeNNmdX2VkKOmT0CwRd/bfsHxlGwMmcHldP2
TSge+MRhoKNycpy0j7nugHj2pZ12UiX+O8371OlAketnPk0OA+6RNGMyNo49IQAn
VEN9MFWc4ypH1+hJ0OvhzmUA6Zn2/lyQXUtwM1DcXeBmLI4aClb3bT3G3xKNh09O
yLEvkh2aUUlmoTSgP7vDK/Ui3QwVnsaJULvg+gywQJG6qTSFgUQfErNKgJmabHQb
d0EuumlSGnJ7qdC5nmrUp+M/CpbfGh15ax/JnTRGVK7C3jeCm18vc2np+z4EUDp/
USzrvuBu1B8eI0nQNoht0BeNf7sEJGgFIn8BJzxDakP8buXPMqEvFwA8c1JmMeBX
E6pbG2jHqPdrTJtxQTpMRqhA0MxbFlVNi5cCs6U7ifiBqbRWEXyNVgAHTwoIxIrn
ASbL0s8gZH+EMEXNOmPmFiO2NUjBCeSzAjrsTmSpWa13YmGfiroZN8N/iPzLnFf+
FR3SFUnoJHq7x4uLZoAX
=lqfh
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- NFSv3 mounts need to fail if the FSINFO rpc call fails
- Ensure that the NFS commit cache gets torn down when we unload the
NFS module.
- Fix memory scribble issues when interrupting a LAYOUTGET rpc call
- Fix NFSv4 legacy idmapper regressions
- Fix issues with the NFSv4 getacl command
- Fix a regression when using the legacy "mount -t nfs4"
* tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv3: Ensure that do_proc_get_root() reports errors correctly
NFSv4: Ensure that nfs4_alloc_client cleans up on error.
NFS: return -ENOKEY when the upcall fails to map the name
NFS: Clear key construction data if the idmap upcall fails
NFSv4: Don't use private xdr_stream fields in decode_getacl
NFSv4: Fix the acl cache size calculation
NFSv4: Fix pointer arithmetic in decode_getacl
NFS: Alias the nfs module to nfs4
NFS: Fix a regression when loading the NFS v4 module
NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done
pnfs-obj: Better IO pattern in case of unaligned offset
NFS41: add pg_layout_private to nfs_pageio_descriptor
pnfs: nfs4_proc_layoutget returns void
pnfs: defer release of pages in layoutget
nfs: tear down caches in nfs_init_writepagecache when allocation fails
Pull assorted fixes - mostly vfs - from Al Viro:
"Assorted fixes, with an unexpected detour into vfio refcounting logics
(fell out when digging in an analog of eventpoll race in there)."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
task_work: add a scheduling point in task_work_run()
fs: fix fs/namei.c kernel-doc warnings
eventpoll: use-after-possible-free in epoll_create1()
vfio: grab vfio_device reference *before* exposing the sucker via fd_install()
vfio: get rid of vfio_device_put()/vfio_group_get_device* races
vfio: get rid of open-coding kref_put_mutex
introduce kref_put_mutex()
vfio: don't dereference after kfree...
mqueue: lift mnt_want_write() outside ->i_mutex, clean up a bit
This QUANTA device is driven by the generic hid-multitouch.ko driver, and
therefore shouldn't be in the special drivers list.
This has been an oversight in 4fa3a58 ("HID: hid-multitouch: Switch to
device groups").
Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
It seems commit 4a9d4b02 (switch fput to task_work_add) reintroduced
the problem addressed in commit 944be0b2 (close_files(): add scheduling
point)
If a server process with a lot of files (say 2 million tcp sockets)
is killed, we can spend a lot of time in task_work_run() and trigger
a soft lockup.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix kernel-doc warnings in fs/namei.c:
Warning(fs/namei.c:360): No description found for parameter 'inode'
Warning(fs/namei.c:672): No description found for parameter 'nd'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
As soon as we'd installed the file into descriptor table, it can
get closed by another thread. Freeing ep in process...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It's not critical (anymore) since another thread closing the file will block
on ->device_lock before it gets to dropping the final reference, but it's
definitely cleaner that way...
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
we really need to make sure that dropping the last reference happens
under the group->device_lock; otherwise a loop (under device_lock)
might find vfio_device instance that is being freed right now, has
already dropped the last reference and waits on device_lock to exclude
the sucker from the list.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Although the possible race described in
commit 85b7059169
KVM: MMU: fix shrinking page from the empty mmu
was correct, the real cause of that issue was a more trivial bug of
mmu_shrink() introduced by
commit 1952639665
KVM: MMU: do not iterate over all VMs in mmu_shrink()
Here is the bug:
if (kvm->arch.n_used_mmu_pages > 0) {
if (!nr_to_scan--)
break;
continue;
}
We skip VMs whose n_used_mmu_pages is not zero and try to shrink others:
in other words we try to shrink empty ones by mistake.
This patch reverses the logic so that mmu_shrink() can free pages from
the first VM whose n_used_mmu_pages is not zero. Note that we also add
comments explaining the role of nr_to_scan which is not practically
important now, hoping this will be improved in the future.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Probably a leftover from the early days of self-patching, p6nops
are marked __initconst_or_module, which causes them to be
discarded in a non-modular kernel. If something later triggers
patching, it will overwrite kernel code with garbage.
Reported-by: Tomas Racek <tracek@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cc: Michael Tokarev <mjt@tls.msk.ru>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: qemu-devel@nongnu.org
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/5034AE84.90708@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When debugging is enabled, we use a temporary on-stack buffer for formatting
the key strings like "(11368871, direntry, 0xcd0750)". The buffer size is
32 bytes and sometimes it is not enough to fit the key string - e.g., when
inode numbers are high. This is not fatal, but the key strings are incomplete
and UBIFS complains like this:
UBIFS assert failed in dbg_snprintf_key at 137 (pid 1)
This is a regression caused by "515315a UBIFS: fix key printing".
Fix the issue by increasing the buffer to 48 bytes.
Reported-by: Michael Hench <michaelhench@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Tested-by: Michael Hench <michaelhench@gmail.com>
Cc: stable@vger.kernel.org [v3.3+]
If update_wall_time() is called and the current offset isn't large
enough to accumulate, avoid re-calling timekeeping_adjust which may
change the clock freq and can cause 1ns inconsistencies with
CLOCK_REALTIME_COARSE/CLOCK_MONOTONIC_COARSE.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1345595449-34965-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Andreas Schwab noticed that the 1 << tk->shift could overflow if the
shift value was greater than 30, since 1 would be a 32bit long on
32bit architectures. This issue was introduced by 1e75fa8be (time:
Condense timekeeper.xtime into xtime_sec)
Use 1ULL instead to ensure we don't overflow on the shift.
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1345595449-34965-4-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch_gettimeoffset returns a u32 value which when shifted by tk->shift
can overflow. This issue was introduced with 1e75fa8be (time: Condense
timekeeper.xtime into xtime_sec)
Cast it to u64 first.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1345595449-34965-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Andreas noticed problems with resume on specific hardware after commit
1e75fa8b (time: Condense timekeeper.xtime into xtime_sec) combined
with commit b44d50dca (time: Fix casting issue in tk_set_xtime and
tk_xtime_add)
After some digging I realized we aren't normalizing the timekeeper
after the add. Add the missing normalize call.
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1345595449-34965-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When one CPU is going down and this CPU is the last one in irq
affinity, current code is setting cpu_all_mask as the new
affinity for that irq.
But for some systems (such as in Medfield Android mobile) the
firmware sends the interrupt to each CPU in the irq affinity
mask, averaged, and cpu_all_mask includes all potential CPUs,
i.e. offline ones as well.
So replace cpu_all_mask with cpu_online_mask.
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Acked-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A137286@SHSMSX101.ccr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This comment is no longer true. We support up to 2^16 CPUs
because __ticket_t is an u16 if NR_CPUS is larger than 256.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The destination address of unicast frames forwarded through a mesh gate
was being replaced with the broadcast address. Instead leave the
original destination address as the mesh DA. If the nexthop address is
not in the mpath table it will be resolved. If that fails, the frame
will be forwarded to known mesh gates.
Reported-by: Cedric Voncken <cedric.voncken@acksys.fr>
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
So we've had a fair few reports of fbcon handover breakage between
efi/vesafb and i915 surface recently, so I dedicated a couple of
days to finding the problem.
Essentially the last thing we saw was the conflicting framebuffer
message and that was all.
So after much tracing with direct netconsole writes (printks
under console_lock not so useful), I think I found the race.
Thread A (driver load) Thread B (timer thread)
unbind_con_driver -> |
bind_con_driver -> |
vc->vc_sw->con_deinit -> |
fbcon_deinit -> |
console_lock() |
| |
| fbcon_flashcursor timer fires
| console_lock() <- blocked for A
|
|
fbcon_del_cursor_timer ->
del_timer_sync
(BOOM)
Of course because all of this is under the console lock,
we never see anything, also since we also just unbound the active
console guess what we never see anything.
Hopefully this fixes the problem for anyone seeing vesafb->kms
driver handoff.
v1.1: add comment suggestion from Alan.
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Merge fixes from Andrew Morton.
Random drivers and some VM fixes.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (17 commits)
mm: compaction: Abort async compaction if locks are contended or taking too long
mm: have order > 0 compaction start near a pageblock with free pages
rapidio/tsi721: fix unused variable compiler warning
rapidio/tsi721: fix inbound doorbell interrupt handling
drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode
mm: correct page->pfmemalloc to fix deactivate_slab regression
drivers/rtc/rtc-pcf2123.c: initialize dynamic sysfs attributes
mm/compaction.c: fix deferring compaction mistake
drivers/misc/sgi-xp/xpc_uv.c: SGI XPC fails to load when cpu 0 is out of IRQ resources
string: do not export memweight() to userspace
hugetlb: update hugetlbpage.txt
checkpatch: add control statement test to SINGLE_STATEMENT_DO_WHILE_MACRO
mm: hugetlbfs: correctly populate shared pmd
cciss: fix incorrect scsi status reporting
Documentation: update mount option in filesystem/vfat.txt
mm: change nr_ptes BUG_ON to WARN_ON
cs5535-clockevt: typo, it's MFGPT, not MFPGT
Pull media fixes from Mauro Carvalho Chehab:
"For bug fixes, at soc_camera, si470x, uvcvideo, iguanaworks IR driver,
radio_shark Kbuild fixes, and at the V4L2 core (radio fixes)."
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] media: soc_camera: don't clear pix->sizeimage in JPEG mode
[media] media: mx2_camera: Fix clock handling for i.MX27
[media] video: mx2_camera: Use clk_prepare_enable/clk_disable_unprepare
[media] video: mx1_camera: Use clk_prepare_enable/clk_disable_unprepare
[media] media: mx3_camera: buf_init() add buffer state check
[media] radio-shark2: Only compile led support when CONFIG_LED_CLASS is set
[media] radio-shark: Only compile led support when CONFIG_LED_CLASS is set
[media] radio-shark*: Call cancel_work_sync from disconnect rather then release
[media] radio-shark*: Remove work-around for dangling pointer in usb intfdata
[media] Add USB dependency for IguanaWorks USB IR Transceiver
[media] Add missing logging for rangelow/high of hwseek
[media] VIDIOC_ENUM_FREQ_BANDS fix
[media] mem2mem_testdev: fix querycap regression
[media] si470x: v4l2-compliance fixes
[media] DocBook: Remove a spurious character
[media] uvcvideo: Reset the bytesused field when recycling an erroneous buffer
Pull networking update from David Miller:
"A couple weeks of bug fixing in there. The largest chunk is all the
broken crap Amerigo Wang found in the netpoll layer."
1) netpoll and it's users has several serious bugs:
a) uses GFP_KERNEL with locks held
b) interfaces requiring interrupts disabled are called with them
enabled
c) and vice versa
d) VLAN tag demuxing, as per all other RX packet input paths, is not
applied
All from Amerigo Wang.
2) Hopefully cure the ipv4 mapped ipv6 address TCP early demux bugs for
good, from Neal Cardwell.
3) Unlike AF_UNIX, AF_PACKET sockets don't set a default credentials
when the user doesn't specify one explicitly during sendmsg().
Instead we attach an empty (zero) SCM credential block which is
definitely not what we want. Fix from Eric Dumazet.
4) IPv6 illegally invokes netdevice notifiers with RCU lock held, fix
from Ben Hutchings.
5) inet_csk_route_child_sock() checks wrong inet options pointer, fix
from Christoph Paasch.
6) When AF_PACKET is used for transmit, packet loopback doesn't behave
properly when a socket fanout is enabled, from Eric Leblond.
7) On bluetooth l2cap channel create failure, we leak the socket, from
Jaganath Kanakkassery.
8) Fix all the netprio file handling bugs found by Al Viro, from John
Fastabend.
9) Several error return and NULL deref bug fixes in networking drivers
from Julia Lawall.
10) A large smattering of struct padding et al. kernel memory leaks to
userspace found of Mathias Krause.
11) Conntrack expections in netfilter can access an uninitialized timer,
fix from Pablo Neira Ayuso.
12) Several netfilter SIP tracker bug fixes from Patrick McHardy.
13) IPSEC ipv6 routes are not initialized correctly all the time,
resulting in an OOPS in inet_putpeer(). Also from Patrick McHardy.
14) Bridging does rcu_dereference() outside of RCU protected area, from
Stephen Hemminger.
15) Fix routing cache removal performance regression when looking up
output routes that have a local destination. From Zheng Yan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
af_netlink: force credentials passing [CVE-2012-3520]
ipv4: fix ip header ident selection in __ip_make_skb()
ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
tcp: fix possible socket refcount problem
net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child()
net/core/dev.c: fix kernel-doc warning
netconsole: remove a redundant netconsole_target_put()
net: ipv6: fix oops in inet_putpeer()
net/stmmac: fix issue of clk_get for Loongson1B.
caif: Do not dereference NULL in chnl_recv_cb()
af_packet: don't emit packet on orig fanout group
drivers/net/irda: fix error return code
drivers/net/wan/dscc4.c: fix error return code
drivers/net/wimax/i2400m/fw.c: fix error return code
smsc75xx: add missing entry to MAINTAINERS
net: qmi_wwan: new devices: UML290 and K5006-Z
net: sh_eth: Add eth support for R8A7779 device
netdev/phy: skip disabled mdio-mux nodes
dt: introduce for_each_available_child_of_node, of_get_next_available_child
net: netprio: fix cgrp create and write priomap race
...
Jim Schutt reported a problem that pointed at compaction contending
heavily on locks. The workload is straight-forward and in his own words;
The systems in question have 24 SAS drives spread across 3 HBAs,
running 24 Ceph OSD instances, one per drive. FWIW these servers
are dual-socket Intel 5675 Xeons w/48 GB memory. I've got ~160
Ceph Linux clients doing dd simultaneously to a Ceph file system
backed by 12 of these servers.
Early in the test everything looks fine
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
31 15 0 287216 576 38606628 0 0 2 1158 2 14 1 3 95 0 0
27 15 0 225288 576 38583384 0 0 18 2222016 203357 134876 11 56 17 15 0
28 17 0 219256 576 38544736 0 0 11 2305932 203141 146296 11 49 23 17 0
6 18 0 215596 576 38552872 0 0 7 2363207 215264 166502 12 45 22 20 0
22 18 0 226984 576 38596404 0 0 3 2445741 223114 179527 12 43 23 22 0
and then it goes to pot
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
163 8 0 464308 576 36791368 0 0 11 22210 866 536 3 13 79 4 0
207 14 0 917752 576 36181928 0 0 712 1345376 134598 47367 7 90 1 2 0
123 12 0 685516 576 36296148 0 0 429 1386615 158494 60077 8 84 5 3 0
123 12 0 598572 576 36333728 0 0 1107 1233281 147542 62351 7 84 5 4 0
622 7 0 660768 576 36118264 0 0 557 1345548 151394 59353 7 85 4 3 0
223 11 0 283960 576 36463868 0 0 46 1107160 121846 33006 6 93 1 1 0
Note that system CPU usage is very high blocks being written out has
dropped by 42%. He analysed this with perf and found
perf record -g -a sleep 10
perf report --sort symbol --call-graph fractal,5
34.63% [k] _raw_spin_lock_irqsave
|
|--97.30%-- isolate_freepages
| compaction_alloc
| unmap_and_move
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_slowpath
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| do_page_fault
| page_fault
| |
| |--87.39%-- skb_copy_datagram_iovec
| | tcp_recvmsg
| | inet_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call
| | __recv
| | |
| | --100.00%-- (nil)
| |
| --12.61%-- memcpy
--2.70%-- [...]
There was other data but primarily it is all showing that compaction is
contended heavily on the zone->lock and zone->lru_lock.
commit [b2eef8c0: mm: compaction: minimise the time IRQs are disabled
while isolating pages for migration] noted that it was possible for
migration to hold the lru_lock for an excessive amount of time. Very
broadly speaking this patch expands the concept.
This patch introduces compact_checklock_irqsave() to check if a lock
is contended or the process needs to be scheduled. If either condition
is true then async compaction is aborted and the caller is informed.
The page allocator will fail a THP allocation if compaction failed due
to contention. This patch also introduces compact_trylock_irqsave()
which will acquire the lock only if it is not contended and the process
does not need to schedule.
Reported-by: Jim Schutt <jaschut@sandia.gov>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 7db8889ab0 ("mm: have order > 0 compaction start off where it
left") introduced a caching mechanism to reduce the amount work the free
page scanner does in compaction. However, it has a problem. Consider
two process simultaneously scanning free pages
C
Process A M S F
|---------------------------------------|
Process B M FS
C is zone->compact_cached_free_pfn
S is cc->start_pfree_pfn
M is cc->migrate_pfn
F is cc->free_pfn
In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn accordingly.
Simultaneously, Process B finishes isolating in a block and updates
compact_cached_free_pfn again to the location of its free scanner.
Process A moves to "end_of_zone - one_pageblock" and runs this check
if (cc->order > 0 && (!cc->wrapped ||
zone->compact_cached_free_pfn >
cc->start_free_pfn))
pfn = min(pfn, zone->compact_cached_free_pfn);
compact_cached_free_pfn is above where it started so the free scanner
skips almost the entire space it should have scanned. When there are
multiple processes compacting it can end in a situation where the entire
zone is not being scanned at all. Further, it is possible for two
processes to ping-pong update to compact_cached_free_pfn which is just
random.
Overall, the end result wrecks allocation success rates.
There is not an obvious way around this problem without introducing new
locking and state so this patch takes a different approach.
First, it gets rid of the skip logic because it's not clear that it
matters if two free scanners happen to be in the same block but with
racing updates it's too easy for it to skip over blocks it should not.
Second, it updates compact_cached_free_pfn in a more limited set of
circumstances.
If a scanner has wrapped, it updates compact_cached_free_pfn to the end
of the zone. When a wrapped scanner isolates a page, it updates
compact_cached_free_pfn to point to the highest pageblock it
can isolate pages from.
If a scanner has not wrapped when it has finished isolated pages it
checks if compact_cached_free_pfn is pointing to the end of the
zone. If so, the value is updated to point to the highest
pageblock that pages were isolated from. This value will not
be updated again until a free page scanner wraps and resets
compact_cached_free_pfn.
This is not optimal and it can still race but the compact_cached_free_pfn
will be pointing to or very near a pageblock with free pages.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix unused variable compiler warning when built with CONFIG_RAPIDIO_DEBUG
option off.
This patch is applicable to kernel versions starting from v3.2
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>